[jira] [Commented] (PARQUET-1253) Support for new logical type representation

2018-05-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/PARQUET-1253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16488839#comment-16488839
 ] 

ASF GitHub Bot commented on PARQUET-1253:
-

gszadovszky closed pull request #463: PARQUET-1253: Support for new logical 
type representation
URL: https://github.com/apache/parquet-mr/pull/463
 
 
   

This is a PR merged from a forked repository.
As GitHub hides the original diff on merge, it is displayed below for
the sake of provenance:

diff --git a/parquet-cascading3/src/test/java/org/apache/parquet/cascading/TestParquetTBaseScheme.java b/parquet-cascading3/src/test/java/org/apache/parquet/cascading/TestParquetTBaseScheme.java
index 7b9f817e3..97b2ccf99 100644
--- a/parquet-cascading3/src/test/java/org/apache/parquet/cascading/TestParquetTBaseScheme.java
+++ b/parquet-cascading3/src/test/java/org/apache/parquet/cascading/TestParquetTBaseScheme.java
@@ -40,14 +40,12 @@
 import org.apache.hadoop.fs.Path;
 import org.apache.hadoop.io.BytesWritable;
 import org.apache.hadoop.mapred.JobConf;
-import org.apache.hadoop.mapreduce.TaskAttemptContext;
 import org.apache.hadoop.mapreduce.TaskAttemptID;
 import org.apache.thrift.protocol.TCompactProtocol;
 import org.apache.thrift.protocol.TProtocol;
 import org.apache.thrift.protocol.TProtocolFactory;
 import org.apache.thrift.transport.TIOStreamTransport;
 import org.junit.Test;
-import static org.junit.Assert.*;
 
 import org.apache.parquet.hadoop.thrift.ThriftToParquetFileWriter;
 import org.apache.parquet.hadoop.util.ContextUtil;
@@ -55,8 +53,9 @@
 
 import java.io.File;
 import java.io.ByteArrayOutputStream;
-import java.util.HashMap;
-import java.util.Map;
+
+import static org.junit.Assert.assertEquals;
+import static org.junit.Assert.assertTrue;
 
 public class TestParquetTBaseScheme {
   final String txtInputPath = "target/test-classes/names.txt";
diff --git a/parquet-column/src/main/java/org/apache/parquet/schema/GroupType.java b/parquet-column/src/main/java/org/apache/parquet/schema/GroupType.java
index 68dba979b..3ff25b6db 100644
--- a/parquet-column/src/main/java/org/apache/parquet/schema/GroupType.java
+++ b/parquet-column/src/main/java/org/apache/parquet/schema/GroupType.java
@@ -1,4 +1,4 @@
-/* 
+/*
  * Licensed to the Apache Software Foundation (ASF) under one
  * or more contributor license agreements.  See the NOTICE file
  * distributed with this work for additional information
@@ -6,9 +6,9 @@
  * to you under the Apache License, Version 2.0 (the
  * "License"); you may not use this file except in compliance
  * with the License.  You may obtain a copy of the License at
- * 
+ *
  *   http://www.apache.org/licenses/LICENSE-2.0
- * 
+ *
  * Unless required by applicable law or agreed to in writing,
  * software distributed under the License is distributed on an
  * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
@@ -47,7 +47,7 @@
    * @param fields the contained fields
    */
   public GroupType(Repetition repetition, String name, List<Type> fields) {
-    this(repetition, name, null, fields, null);
+    this(repetition, name, (LogicalTypeAnnotation) null, fields, null);
   }
 
   /**
@@ -97,6 +97,15 @@ public GroupType(Repetition repetition, String name, OriginalType originalType,
     }
   }
 
+  GroupType(Repetition repetition, String name, LogicalTypeAnnotation logicalTypeAnnotation, List<Type> fields, ID id) {
+    super(name, repetition, logicalTypeAnnotation, id);
+    this.fields = fields;
+    this.indexByName = new HashMap();
+    for (int i = 0; i < fields.size(); i++) {
+      indexByName.put(fields.get(i).getName(), i);
+    }
+  }
+
   /**
    * @param id the field id
    * @return a new GroupType with the same fields and a new id
diff --git a/parquet-column/src/main/java/org/apache/parquet/schema/LogicalTypeAnnotation.java b/parquet-column/src/main/java/org/apache/parquet/schema/LogicalTypeAnnotation.java
new file mode 100644
index 0..e22867aec
--- /dev/null
+++ b/parquet-column/src/main/java/org/apache/parquet/schema/LogicalTypeAnnotation.java
@@ -0,0 +1,878 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the 

[jira] [Commented] (PARQUET-1253) Support for new logical type representation

2018-05-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/PARQUET-1253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16474331#comment-16474331
 ] 

ASF GitHub Bot commented on PARQUET-1253:
-

gszadovszky commented on a change in pull request #463: PARQUET-1253: Support 
for new logical type representation
URL: https://github.com/apache/parquet-mr/pull/463#discussion_r187987942
 
 

 ##
 File path: parquet-column/src/main/java/org/apache/parquet/schema/LogicalTypeAnnotation.java
 ##
 @@ -164,7 +135,7 @@ protected LogicalTypeAnnotation fromString(List<String> params) {
    */
   public abstract void accept(LogicalTypeAnnotationVisitor logicalTypeAnnotationVisitor);
 
-  public abstract LogicalTypes getType();
+  protected abstract LogicalTypeToken getType();
 
 Review comment:
   Good point. If something doesn't need to be public, I would restrict its 
access before committing: the later we do it, the harder it is to remove from 
the public API.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Support for new logical type representation
> ---
>
> Key: PARQUET-1253
> URL: https://issues.apache.org/jira/browse/PARQUET-1253
> Project: Parquet
>  Issue Type: Improvement
>  Components: parquet-mr
>Reporter: Nandor Kollar
>Assignee: Nandor Kollar
>Priority: Major
>
> The latest parquet-format 
> [introduced|https://github.com/apache/parquet-format/commit/863875e0be3237c6aa4ed71733d54c91a51deabe#diff-0f9d1b5347959e15259da7ba8f4b6252]
>  a new representation for logical types. This is not yet supported in 
> parquet-mr, so there is no way to use parametrized, UTC-normalized timestamp 
> data types. When reading and writing Parquet files, parquet-mr should use the 
> new 'logicalType' field in SchemaElement, in addition to 'converted_type', to 
> record the logical type annotation. To maintain backward compatibility, the 
> semantics of converted_type shouldn't change.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
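
The issue asks parquet-mr to write the new 'logicalType' field while leaving the meaning of 'converted_type' unchanged for older readers. The sketch below models that dual representation for the parametrized TIMESTAMP type. It is a hypothetical model ('TimestampAnnotation', 'TsUnit' and 'LegacyType' are illustrative names, not parquet-mr classes), and it assumes, following the backward-compatibility notes in parquet-format's LogicalTypes.md, that only UTC-normalized MILLIS and MICROS timestamps have a legacy 'converted_type' counterpart (TIMESTAMP_MILLIS and TIMESTAMP_MICROS).

```java
import java.util.Optional;

// Hypothetical stand-ins for the parquet-format enums; not the real API.
enum TsUnit { MILLIS, MICROS }

enum LegacyType { TIMESTAMP_MILLIS, TIMESTAMP_MICROS }

// New-style, parametrized timestamp annotation (the 'logicalType' field).
final class TimestampAnnotation {
  private final boolean isAdjustedToUTC;
  private final TsUnit unit;

  TimestampAnnotation(boolean isAdjustedToUTC, TsUnit unit) {
    this.isAdjustedToUTC = isAdjustedToUTC;
    this.unit = unit;
  }

  // Legacy 'converted_type' written next to the new field so old readers
  // keep their semantics. Assumption: only UTC-normalized timestamps have a
  // legacy counterpart, so a local (non-adjusted) timestamp gets none.
  Optional<LegacyType> toConvertedType() {
    if (!isAdjustedToUTC) {
      return Optional.empty();
    }
    return Optional.of(unit == TsUnit.MILLIS
        ? LegacyType.TIMESTAMP_MILLIS
        : LegacyType.TIMESTAMP_MICROS);
  }

  @Override
  public String toString() {
    // Parameter order (unit, isAdjustedToUTC) chosen for illustration only.
    return "TIMESTAMP(" + unit + "," + isAdjustedToUTC + ")";
  }
}

class TimestampDemo {
  public static void main(String[] args) {
    TimestampAnnotation utc = new TimestampAnnotation(true, TsUnit.MICROS);
    TimestampAnnotation local = new TimestampAnnotation(false, TsUnit.MILLIS);
    System.out.println(utc + " -> " + utc.toConvertedType().orElse(null));
    System.out.println(local + " -> " + local.toConvertedType().orElse(null));
  }
}
```

Returning an empty Optional, rather than inventing a converted type, keeps old readers from misinterpreting a local timestamp as a UTC-normalized one.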


[jira] [Commented] (PARQUET-1253) Support for new logical type representation

2018-05-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/PARQUET-1253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16474271#comment-16474271
 ] 

ASF GitHub Bot commented on PARQUET-1253:
-

nandorKollar commented on a change in pull request #463: PARQUET-1253: Support 
for new logical type representation
URL: https://github.com/apache/parquet-mr/pull/463#discussion_r187972568
 
 

 ##
 File path: parquet-column/src/main/java/org/apache/parquet/schema/LogicalTypeAnnotation.java
 ##
 @@ -164,7 +135,7 @@ protected LogicalTypeAnnotation fromString(List<String> params) {
    */
   public abstract void accept(LogicalTypeAnnotationVisitor logicalTypeAnnotationVisitor);
 
-  public abstract LogicalTypes getType();
+  protected abstract LogicalTypeToken getType();
 
 Review comment:
   Indeed, thanks! I'm wondering whether we should also narrow the scope of a 
number of other methods in LogicalTypeAnnotation to package-private. For 
example, toOriginalType is only used within the same package (apart from one 
test case), so I'm not sure it should be part of the public API.


[jira] [Commented] (PARQUET-1253) Support for new logical type representation

2018-05-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/PARQUET-1253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16474268#comment-16474268
 ] 

ASF GitHub Bot commented on PARQUET-1253:
-

nandorKollar commented on a change in pull request #463: PARQUET-1253: Support 
for new logical type representation
URL: https://github.com/apache/parquet-mr/pull/463#discussion_r187970641
 
 

 ##
 File path: parquet-column/src/main/java/org/apache/parquet/schema/LogicalTypeAnnotation.java
 ##
 @@ -19,28 +19,13 @@
 package org.apache.parquet.schema;
 
 import org.apache.parquet.Preconditions;
-import org.apache.parquet.format.BsonType;
-import org.apache.parquet.format.ConvertedType;
-import org.apache.parquet.format.DateType;
-import org.apache.parquet.format.DecimalType;
-import org.apache.parquet.format.EnumType;
-import org.apache.parquet.format.IntType;
-import org.apache.parquet.format.JsonType;
-import org.apache.parquet.format.ListType;
-import org.apache.parquet.format.LogicalType;
-import org.apache.parquet.format.MapType;
-import org.apache.parquet.format.MicroSeconds;
-import org.apache.parquet.format.MilliSeconds;
-import org.apache.parquet.format.NullType;
-import org.apache.parquet.format.StringType;
-import org.apache.parquet.format.TimeType;
-import org.apache.parquet.format.TimestampType;
 
 import java.util.List;
 import java.util.Objects;
 
 public abstract class LogicalTypeAnnotation {
-  public enum LogicalTypes {
+  // This is a private enum intended only for internal use for parsing the schema
+  public enum LogicalTypeToken {
 
 Review comment:
   Good idea, thanks Gabor!


[jira] [Commented] (PARQUET-1253) Support for new logical type representation

2018-05-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/PARQUET-1253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16473867#comment-16473867
 ] 

ASF GitHub Bot commented on PARQUET-1253:
-

gszadovszky commented on a change in pull request #463: PARQUET-1253: Support 
for new logical type representation
URL: https://github.com/apache/parquet-mr/pull/463#discussion_r187858363
 
 

 ##
 File path: parquet-column/src/main/java/org/apache/parquet/schema/LogicalTypeAnnotation.java
 ##
 @@ -164,7 +135,7 @@ protected LogicalTypeAnnotation fromString(List<String> params) {
    */
   public abstract void accept(LogicalTypeAnnotationVisitor logicalTypeAnnotationVisitor);
 
-  public abstract LogicalTypes getType();
+  protected abstract LogicalTypeToken getType();
 
 Review comment:
   nit: package-private would be fine here.


[jira] [Commented] (PARQUET-1253) Support for new logical type representation

2018-05-14 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/PARQUET-1253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16473866#comment-16473866
 ] 

ASF GitHub Bot commented on PARQUET-1253:
-

gszadovszky commented on a change in pull request #463: PARQUET-1253: Support 
for new logical type representation
URL: https://github.com/apache/parquet-mr/pull/463#discussion_r187856940
 
 

 ##
 File path: parquet-column/src/main/java/org/apache/parquet/schema/LogicalTypeAnnotation.java
 ##
 @@ -19,28 +19,13 @@
 package org.apache.parquet.schema;
 
 import org.apache.parquet.Preconditions;
-import org.apache.parquet.format.BsonType;
-import org.apache.parquet.format.ConvertedType;
-import org.apache.parquet.format.DateType;
-import org.apache.parquet.format.DecimalType;
-import org.apache.parquet.format.EnumType;
-import org.apache.parquet.format.IntType;
-import org.apache.parquet.format.JsonType;
-import org.apache.parquet.format.ListType;
-import org.apache.parquet.format.LogicalType;
-import org.apache.parquet.format.MapType;
-import org.apache.parquet.format.MicroSeconds;
-import org.apache.parquet.format.MilliSeconds;
-import org.apache.parquet.format.NullType;
-import org.apache.parquet.format.StringType;
-import org.apache.parquet.format.TimeType;
-import org.apache.parquet.format.TimestampType;
 
 import java.util.List;
 import java.util.Objects;
 
 public abstract class LogicalTypeAnnotation {
-  public enum LogicalTypes {
+  // This is a private enum intended only for internal use for parsing the schema
+  public enum LogicalTypeToken {
 
 Review comment:
   As far as I can see, this enum is used only from the schema package. I would 
suggest package-private access, so the comment is not needed and clients cannot 
misuse it.
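
The suggestion, a package-private token enum whose constants each parse their own parameters, can be sketched as below. This is a hypothetical, self-contained model: 'Annotation', 'Simple', 'describe()', 'parse' and the NAME(p1,p2) input syntax are illustrative assumptions, not parquet-mr code. Only the constant-specific fromString(List<String>) idea and the DECIMAL parameter order (precision, then scale) follow the diff quoted in this thread. In a single file the modifiers cannot show their effect across packages, so the comments state what they would restrict.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical, simplified model of the pattern; not the parquet-mr API.
abstract class Annotation {

  // Package-private: only code in the same package (e.g. a schema parser)
  // can see it, so no "internal use only" comment is needed.
  enum Token {
    DATE {
      @Override
      Annotation fromString(List<String> params) {
        return new Simple(this, "DATE");
      }
    },
    DECIMAL {
      @Override
      Annotation fromString(List<String> params) {
        if (params.size() != 2) {
          throw new IllegalArgumentException(
              "Expecting 2 parameters for decimal logical type, got " + params.size());
        }
        // Textual order is (precision, scale).
        int precision = Integer.parseInt(params.get(0).trim());
        int scale = Integer.parseInt(params.get(1).trim());
        return new Simple(this, "DECIMAL(" + precision + "," + scale + ")");
      }
    };

    // Each constant supplies its own parsing logic.
    abstract Annotation fromString(List<String> params);
  }

  // Package-private too: outside callers only see describe().
  abstract Token getType();

  public abstract String describe();

  // Hypothetical entry point for inputs such as "DATE" or "DECIMAL(9,2)";
  // assumes a well-formed "NAME" or "NAME(p1,p2)" string.
  static Annotation parse(String text) {
    int open = text.indexOf('(');
    String name = open < 0 ? text : text.substring(0, open);
    List<String> params = open < 0
        ? Collections.<String>emptyList()
        : Arrays.asList(text.substring(open + 1, text.length() - 1).split(","));
    return Token.valueOf(name.trim()).fromString(params);
  }

  private static final class Simple extends Annotation {
    private final Token token;
    private final String text;

    Simple(Token token, String text) {
      this.token = token;
      this.text = text;
    }

    @Override
    Token getType() {
      return token;
    }

    @Override
    public String describe() {
      return text;
    }
  }
}

class TokenDemo {
  public static void main(String[] args) {
    Annotation decimal = Annotation.parse("DECIMAL(9,2)");
    System.out.println(decimal.getType() + " " + decimal.describe()); // DECIMAL DECIMAL(9,2)
  }
}
```

Constant-specific bodies keep each token's arity check next to its parsing code, so adding a token cannot silently skip validation.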


[jira] [Commented] (PARQUET-1253) Support for new logical type representation

2018-05-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/PARQUET-1253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16468895#comment-16468895
 ] 

ASF GitHub Bot commented on PARQUET-1253:
-

gszadovszky commented on a change in pull request #463: PARQUET-1253: Support 
for new logical type representation
URL: https://github.com/apache/parquet-mr/pull/463#discussion_r187055416
 
 

 ##
 File path: parquet-column/src/main/java/org/apache/parquet/schema/LogicalTypeAnnotation.java
 ##
 @@ -36,42 +36,152 @@
 import org.apache.parquet.format.TimeType;
 import org.apache.parquet.format.TimestampType;
 
+import java.util.List;
 import java.util.Objects;
 
-public interface LogicalTypeAnnotation {
+public abstract class LogicalTypeAnnotation {
+  public enum LogicalTypes {
+    MAP {
+      @Override
+      protected LogicalTypeAnnotation fromString(List<String> params) {
+        return mapType();
+      }
+    },
+    LIST {
+      @Override
+      protected LogicalTypeAnnotation fromString(List<String> params) {
+        return listType();
+      }
+    },
+    UTF8 {
+      @Override
+      protected LogicalTypeAnnotation fromString(List<String> params) {
+        return stringType();
+      }
+    },
+    MAP_KEY_VALUE {
+      @Override
+      protected LogicalTypeAnnotation fromString(List<String> params) {
+        return MapKeyValueTypeAnnotation.getInstance();
+      }
+    },
+    ENUM {
+      @Override
+      protected LogicalTypeAnnotation fromString(List<String> params) {
+        return enumType();
+      }
+    },
+    DECIMAL {
+      @Override
+      protected LogicalTypeAnnotation fromString(List<String> params) {
+        if (params.size() != 2) {
+          throw new RuntimeException("Expecting 2 parameters for decimal logical type, got " + params.size());
+        }
+        return decimalType(Integer.valueOf(params.get(1)), Integer.valueOf(params.get(0)));
+      }
+    },
+    DATE {
+      @Override
+      protected LogicalTypeAnnotation fromString(List<String> params) {
+        return dateType();
+      }
+    },
+    TIME {
+      @Override
+      protected LogicalTypeAnnotation fromString(List<String> params) {
+        if (params.size() != 2) {
+          throw new RuntimeException("Expecting 2 parameters for time logical type, got " + params.size());
+        }
+        return timeType(Boolean.parseBoolean(params.get(1)), TimeUnit.valueOf(params.get(0)));
+      }
+    },
+    TIMESTAMP {
+      @Override
+      protected LogicalTypeAnnotation fromString(List<String> params) {
+        if (params.size() != 2) {
+          throw new RuntimeException("Expecting 2 parameters for timestamp logical type, got " + params.size());
+        }
+        return timestampType(Boolean.parseBoolean(params.get(1)), TimeUnit.valueOf(params.get(0)));
+      }
+    },
+    INT {
+      @Override
+      protected LogicalTypeAnnotation fromString(List<String> params) {
+        if (params.size() != 2) {
+          throw new RuntimeException("Expecting 2 parameters for integer logical type, got " + params.size());
+        }
+        return intType(Integer.valueOf(params.get(0)), Boolean.parseBoolean(params.get(1)));
+      }
+    },
+    JSON {
+      @Override
+      protected LogicalTypeAnnotation fromString(List<String> params) {
+        return jsonType();
+      }
+    },
+    BSON {
+      @Override
+      protected LogicalTypeAnnotation fromString(List<String> params) {
+        return bsonType();
+      }
+    },
+    INTERVAL {
+      @Override
+      protected LogicalTypeAnnotation fromString(List<String> params) {
+        return IntervalLogicalTypeAnnotation.getInstance();
+      }
+    };
+
+    protected abstract LogicalTypeAnnotation fromString(List<String> params);
+  }
+
   /**
    * Convert this parquet-mr logical type to parquet-format LogicalType.
    *
    * @return the parquet-format LogicalType representation of this logical type implementation
    */
-  LogicalType toLogicalType();
+  public abstract LogicalType toLogicalType();
 
   /**
    * Convert this parquet-mr logical type to parquet-format ConvertedType.
    *
    * @return the parquet-format ConvertedType representation of this logical type implementation
    */
-  ConvertedType toConvertedType();
+  public abstract ConvertedType toConvertedType();
 
   /**
    * Convert this logical type to old logical type representation in parquet-mr (if there's any).
    * Those logical type implementations, which don't have a corresponding mapping should return null.
    *
    * @return the OriginalType representation of the new logical type, or null if there's none
    */
-  OriginalType toOriginalType();
+  public abstract OriginalType toOriginalType();
 
   /**
    * Visits this logical type with the given visitor
    *
    * @param logicalTypeAnnotationVisitor the visitor to visit this type
    */
-  void accept(LogicalTypeAnnotationVisitor logicalTypeAnnotationVisitor);
+  public abstract void accept(LogicalTypeAnnotationVisitor 
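
The accept(LogicalTypeAnnotationVisitor) method in the hunk above is a visitor hook: callers branch on the concrete annotation type through double dispatch instead of instanceof checks. Below is a minimal sketch of that technique with hypothetical types ('Ann', 'DateAnn', 'DecimalAnn', 'AnnVisitor', 'Renderer'). The methods of the real LogicalTypeAnnotationVisitor are not shown in this thread, so none of this mirrors its exact signatures.

```java
// Hypothetical, simplified types; not the parquet-mr visitor interface.
interface AnnVisitor {
  void visit(DateAnn date);
  void visit(DecimalAnn decimal);
}

abstract class Ann {
  // Each subclass calls back the overload matching its own static type.
  abstract void accept(AnnVisitor visitor);
}

final class DateAnn extends Ann {
  @Override
  void accept(AnnVisitor visitor) {
    visitor.visit(this); // resolves to visit(DateAnn)
  }
}

final class DecimalAnn extends Ann {
  final int precision;
  final int scale;

  DecimalAnn(int precision, int scale) {
    this.precision = precision;
    this.scale = scale;
  }

  @Override
  void accept(AnnVisitor visitor) {
    visitor.visit(this); // resolves to visit(DecimalAnn)
  }
}

// Example visitor: renders an annotation without any instanceof checks.
final class Renderer implements AnnVisitor {
  final StringBuilder out = new StringBuilder();

  @Override
  public void visit(DateAnn date) {
    out.append("DATE");
  }

  @Override
  public void visit(DecimalAnn decimal) {
    out.append("DECIMAL(").append(decimal.precision).append(',').append(decimal.scale).append(')');
  }
}

class VisitorDemo {
  public static void main(String[] args) {
    Renderer renderer = new Renderer();
    for (Ann ann : new Ann[] {new DateAnn(), new DecimalAnn(9, 2)}) {
      ann.accept(renderer);
      renderer.out.append(' ');
    }
    System.out.println(renderer.out.toString().trim()); // DATE DECIMAL(9,2)
  }
}
```

Adding a new annotation subclass forces every visitor to handle it at compile time, which is one reason to prefer this over type switches.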

[jira] [Commented] (PARQUET-1253) Support for new logical type representation

2018-05-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/PARQUET-1253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16468841#comment-16468841
 ] 

ASF GitHub Bot commented on PARQUET-1253:
-

nandorKollar commented on a change in pull request #463: PARQUET-1253: Support 
for new logical type representation
URL: https://github.com/apache/parquet-mr/pull/463#discussion_r187041286
 
 

 ##
 File path: parquet-column/src/main/java/org/apache/parquet/schema/LogicalTypeAnnotation.java
 ##

[jira] [Commented] (PARQUET-1253) Support for new logical type representation

2018-05-09 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/PARQUET-1253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16468593#comment-16468593
 ] 

ASF GitHub Bot commented on PARQUET-1253:
-

nandorKollar commented on issue #463: PARQUET-1253: Support for new logical 
type representation
URL: https://github.com/apache/parquet-mr/pull/463#issuecomment-387680018
 
 
   @gszadovszky it looks like the versions of maven-shade-plugin and of the 
enforcer-rule dependency of maven-enforcer-plugin used in Parquet don't support 
Java 8 lambda expressions; that's why the Travis build failed. Since Parquet has 
already been upgraded to Java 8, I think we should upgrade these plugins. I'll 
create a separate Jira for this issue.


[jira] [Commented] (PARQUET-1253) Support for new logical type representation

2018-05-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/PARQUET-1253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16467616#comment-16467616
 ] 

ASF GitHub Bot commented on PARQUET-1253:
-

gszadovszky commented on a change in pull request #463: PARQUET-1253: Support 
for new logical type representation
URL: https://github.com/apache/parquet-mr/pull/463#discussion_r186779505
 
 

 ##
 File path: parquet-column/src/main/java/org/apache/parquet/schema/LogicalTypeAnnotation.java
 ##
 @@ -36,42 +36,152 @@
 import org.apache.parquet.format.TimeType;
 import org.apache.parquet.format.TimestampType;
 
+import java.util.List;
 import java.util.Objects;
 
-public interface LogicalTypeAnnotation {
+public abstract class LogicalTypeAnnotation {
+  public enum LogicalTypes {
+MAP {
+  @Override
+  protected LogicalTypeAnnotation fromString(List params) {
+return mapType();
+  }
+},
+LIST {
+  @Override
+  protected LogicalTypeAnnotation fromString(List params) {
+return listType();
+  }
+},
+UTF8 {
+  @Override
+  protected LogicalTypeAnnotation fromString(List params) {
+return stringType();
+  }
+},
+MAP_KEY_VALUE {
+  @Override
+  protected LogicalTypeAnnotation fromString(List params) {
+return MapKeyValueTypeAnnotation.getInstance();
+  }
+},
+ENUM {
+  @Override
+  protected LogicalTypeAnnotation fromString(List params) {
+return enumType();
+  }
+},
+DECIMAL {
+  @Override
+  protected LogicalTypeAnnotation fromString(List params) {
+if (params.size() != 2) {
+  throw new RuntimeException("Expecting 2 parameters for decimal 
logical type, got " + params.size());
+}
+return decimalType(Integer.valueOf(params.get(1)), 
Integer.valueOf(params.get(0)));
+  }
+},
+DATE {
+  @Override
+  protected LogicalTypeAnnotation fromString(List params) {
+return dateType();
+  }
+},
+TIME {
+  @Override
+  protected LogicalTypeAnnotation fromString(List params) {
+if (params.size() != 2) {
+  throw new RuntimeException("Expecting 2 parameters for time logical 
type, got " + params.size());
+}
+return timeType(Boolean.parseBoolean(params.get(1)), 
TimeUnit.valueOf(params.get(0)));
+  }
+},
+TIMESTAMP {
+  @Override
+  protected LogicalTypeAnnotation fromString(List params) {
+if (params.size() != 2) {
+  throw new RuntimeException("Expecting 2 parameters for timestamp 
logical type, got " + params.size());
+}
+return timestampType(Boolean.parseBoolean(params.get(1)), 
TimeUnit.valueOf(params.get(0)));
+  }
+},
+INT {
+  @Override
+  protected LogicalTypeAnnotation fromString(List params) {
+if (params.size() != 2) {
+  throw new RuntimeException("Expecting 2 parameters for integer 
logical type, got " + params.size());
+}
+return intType(Integer.valueOf(params.get(0)), 
Boolean.parseBoolean(params.get(1)));
+  }
+},
+JSON {
+  @Override
+  protected LogicalTypeAnnotation fromString(List params) {
+return jsonType();
+  }
+},
+BSON {
+  @Override
+  protected LogicalTypeAnnotation fromString(List params) {
+return bsonType();
+  }
+},
+INTERVAL {
+  @Override
+  protected LogicalTypeAnnotation fromString(List params) {
+return IntervalLogicalTypeAnnotation.getInstance();
+  }
+};
+
+protected abstract LogicalTypeAnnotation fromString(List params);
+  }
+
   /**
* Convert this parquet-mr logical type to parquet-format LogicalType.
*
* @return the parquet-format LogicalType representation of this logical type implementation
*/
-  LogicalType toLogicalType();
+  public abstract LogicalType toLogicalType();
 
   /**
* Convert this parquet-mr logical type to parquet-format ConvertedType.
*
* @return the parquet-format ConvertedType representation of this logical type implementation
*/
-  ConvertedType toConvertedType();
+  public abstract ConvertedType toConvertedType();
 
   /**
* Convert this logical type to old logical type representation in parquet-mr (if there's any).
* Those logical type implementations, which don't have a corresponding mapping should return null.
*
* @return the OriginalType representation of the new logical type, or null if there's none
*/
-  OriginalType toOriginalType();
+  public abstract OriginalType toOriginalType();
 
   /**
* Visits this logical type with the given visitor
*
* @param logicalTypeAnnotationVisitor the visitor to visit this type
*/
-  void accept(LogicalTypeAnnotationVisitor logicalTypeAnnotationVisitor);
+  public abstract void accept(LogicalTypeAnnotationVisitor 

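For readers following the interface-to-abstract-class change above, a minimal, self-contained sketch of the visitor shape it keeps may help. The classes below (`Annotation`, `StringAnnotation`, `DecimalAnnotation`, `Visitor`, `Describer`) are hypothetical stand-ins, not parquet-mr types; only the pattern of an abstract base class exposing `void accept(visitor)` mirrors the diff.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-ins for a logical type annotation hierarchy;
// only the shape (abstract base class + void accept(visitor)) mirrors the diff.
public class AnnotationVisitorSketch {

  interface Visitor {
    void visit(StringAnnotation annotation);
    void visit(DecimalAnnotation annotation);
  }

  static abstract class Annotation {
    public abstract void accept(Visitor visitor);
  }

  static final class StringAnnotation extends Annotation {
    static final StringAnnotation INSTANCE = new StringAnnotation();

    private StringAnnotation() { }

    @Override
    public void accept(Visitor visitor) {
      visitor.visit(this); // double dispatch: the overload is picked by the static type of 'this'
    }
  }

  static final class DecimalAnnotation extends Annotation {
    final int scale;
    final int precision;

    DecimalAnnotation(int scale, int precision) {
      this.scale = scale;
      this.precision = precision;
    }

    @Override
    public void accept(Visitor visitor) {
      visitor.visit(this);
    }
  }

  // A visitor that records a textual form of every annotation it visits.
  static final class Describer implements Visitor {
    final List<String> seen = new ArrayList<>();

    @Override
    public void visit(StringAnnotation annotation) {
      seen.add("STRING");
    }

    @Override
    public void visit(DecimalAnnotation annotation) {
      seen.add("DECIMAL(" + annotation.precision + "," + annotation.scale + ")");
    }
  }

  public static List<String> describeAll(List<Annotation> annotations) {
    Describer describer = new Describer();
    for (Annotation a : annotations) {
      a.accept(describer);
    }
    return describer.seen;
  }

  public static void main(String[] args) {
    List<Annotation> annotations = new ArrayList<>();
    annotations.add(StringAnnotation.INSTANCE);
    annotations.add(new DecimalAnnotation(2, 9));
    System.out.println(describeAll(annotations)); // [STRING, DECIMAL(9,2)]
  }
}
```

Making the base an abstract class rather than an interface leaves room to add shared helpers later without breaking implementors, while callers still dispatch through `accept`.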
[jira] [Commented] (PARQUET-1253) Support for new logical type representation

2018-05-08 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/PARQUET-1253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16467613#comment-16467613
 ] 

ASF GitHub Bot commented on PARQUET-1253:
-

gszadovszky commented on a change in pull request #463: PARQUET-1253: Support 
for new logical type representation
URL: https://github.com/apache/parquet-mr/pull/463#discussion_r186777588
 
 

 ##
 File path: 
parquet-column/src/main/java/org/apache/parquet/schema/LogicalTypeAnnotation.java
 ##
 @@ -36,42 +36,152 @@
 import org.apache.parquet.format.TimeType;
 import org.apache.parquet.format.TimestampType;
 
+import java.util.List;
 import java.util.Objects;
 
-public interface LogicalTypeAnnotation {
+public abstract class LogicalTypeAnnotation {
+  public enum LogicalTypes {
 
 Review comment:
   This enum is used only for parsing/printing, and we don't really want users to use it. How about a name that reflects its purpose, e.g. `LogicalTypeParseHelper`?
   Also, it would be nice to annotate or comment that this one is not part of the public API.
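The reviewer's suggestion might look roughly like the following sketch: a package-private enum (so it stays out of the public API) whose constants each parse their own parameter list. The `NAME(p1,p2)` text format and all class and method names are assumptions for illustration; the parameter order (precision then scale for DECIMAL, bit width then signedness for INT) follows the `fromString` bodies in the diff above. Annotations are represented as plain strings to keep the sketch self-contained.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: an internal parse helper enum, not a real parquet-mr API.
public class ParseHelperSketch {

  /** Internal helper for schema parsing/printing; not part of the public API. */
  enum LogicalTypeParseHelper {
    STRING {
      @Override
      String fromString(List<String> params) {
        return "string";
      }
    },
    DECIMAL {
      @Override
      String fromString(List<String> params) {
        requireParams(this, params, 2);
        // Textual order is (precision, scale), e.g. DECIMAL(9,2).
        int precision = Integer.parseInt(params.get(0));
        int scale = Integer.parseInt(params.get(1));
        return "decimal(precision=" + precision + ", scale=" + scale + ")";
      }
    },
    INT {
      @Override
      String fromString(List<String> params) {
        requireParams(this, params, 2);
        int bitWidth = Integer.parseInt(params.get(0));
        boolean isSigned = Boolean.parseBoolean(params.get(1));
        return "int(bitWidth=" + bitWidth + ", signed=" + isSigned + ")";
      }
    };

    abstract String fromString(List<String> params);

    // Shared arity check, as each parameterized constant in the diff performs.
    static void requireParams(LogicalTypeParseHelper type, List<String> params, int expected) {
      if (params.size() != expected) {
        throw new IllegalArgumentException("Expecting " + expected + " parameters for "
            + type + " logical type, got " + params.size());
      }
    }
  }

  // Splits e.g. "DECIMAL(9,2)" into the enum constant name and its parameters.
  public static String parse(String text) {
    int open = text.indexOf('(');
    String name = open < 0 ? text : text.substring(0, open);
    List<String> params = new ArrayList<>();
    if (open >= 0) {
      String inner = text.substring(open + 1, text.lastIndexOf(')'));
      for (String p : inner.split(",")) {
        params.add(p.trim());
      }
    }
    return LogicalTypeParseHelper.valueOf(name.trim()).fromString(params);
  }

  public static void main(String[] args) {
    System.out.println(parse("DECIMAL(9,2)"));  // decimal(precision=9, scale=2)
    System.out.println(parse("INT(8, false)")); // int(bitWidth=8, signed=false)
    System.out.println(parse("STRING"));        // string
  }
}
```

Keeping the enum package-private means schema parsing code in the same package can use it, while library users cannot depend on it.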


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Support for new logical type representation
> ---
>
> Key: PARQUET-1253
> URL: https://issues.apache.org/jira/browse/PARQUET-1253
> Project: Parquet
>  Issue Type: Improvement
>  Components: parquet-mr
>Reporter: Nandor Kollar
>Assignee: Nandor Kollar
>Priority: Major
>
> The latest parquet-format 
> [introduced|https://github.com/apache/parquet-format/commit/863875e0be3237c6aa4ed71733d54c91a51deabe#diff-0f9d1b5347959e15259da7ba8f4b6252]
>  a new representation for logical types. This is not yet supported 
> in parquet-mr, so there is no way to use parameterized, UTC-normalized 
> timestamp data types. When reading and writing Parquet files, parquet-mr 
> should use the new 'logicalType' field in SchemaElement, in addition to 
> 'converted_type', to record the logical type annotation. To maintain 
> backward compatibility, the semantics of converted_type shouldn't change.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (PARQUET-1253) Support for new logical type representation

2018-05-03 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/PARQUET-1253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16462219#comment-16462219
 ] 

ASF GitHub Bot commented on PARQUET-1253:
-

gszadovszky commented on a change in pull request #463: PARQUET-1253: Support 
for new logical type representation
URL: https://github.com/apache/parquet-mr/pull/463#discussion_r185746812
 
 

 ##
 File path: 
parquet-column/src/main/java/org/apache/parquet/schema/LogicalTypeAnnotation.java
 ##
 @@ -77,63 +77,116 @@ static LogicalTypeAnnotation fromOriginalType(OriginalType originalType, Decimal
 }
 switch (originalType) {
   case UTF8:
-return StringLogicalTypeAnnotation.create();
+return stringType();
   case MAP:
-return MapLogicalTypeAnnotation.create();
+return mapType();
   case DECIMAL:
 int scale = (decimalMetadata == null ? 0 : decimalMetadata.getScale());
 int precision = (decimalMetadata == null ? 0 : decimalMetadata.getPrecision());
-return DecimalLogicalTypeAnnotation.create(scale, precision);
+return decimalType(scale, precision);
   case LIST:
-return ListLogicalTypeAnnotation.create();
+return listType();
   case DATE:
-return DateLogicalTypeAnnotation.create();
+return dateType();
   case INTERVAL:
-return IntervalLogicalTypeAnnotation.create();
+return intervalType();
   case TIMESTAMP_MILLIS:
-return TimestampLogicalTypeAnnotation.create(true, LogicalTypeAnnotation.TimeUnit.MILLIS);
+return timestampType(true, LogicalTypeAnnotation.TimeUnit.MILLIS);
   case TIMESTAMP_MICROS:
-return TimestampLogicalTypeAnnotation.create(true, LogicalTypeAnnotation.TimeUnit.MICROS);
+return timestampType(true, LogicalTypeAnnotation.TimeUnit.MICROS);
   case TIME_MILLIS:
-return TimeLogicalTypeAnnotation.create(true, LogicalTypeAnnotation.TimeUnit.MILLIS);
+return timeType(true, LogicalTypeAnnotation.TimeUnit.MILLIS);
   case TIME_MICROS:
-return TimeLogicalTypeAnnotation.create(true, LogicalTypeAnnotation.TimeUnit.MICROS);
+return timeType(true, LogicalTypeAnnotation.TimeUnit.MICROS);
   case UINT_8:
-return IntLogicalTypeAnnotation.create(8, false);
+return intType(8, false);
   case UINT_16:
-return IntLogicalTypeAnnotation.create(16, false);
+return intType(16, false);
   case UINT_32:
-return IntLogicalTypeAnnotation.create(32, false);
+return intType(32, false);
   case UINT_64:
-return IntLogicalTypeAnnotation.create(64, false);
+return intType(64, false);
   case INT_8:
-return IntLogicalTypeAnnotation.create(8, true);
+return intType(8, true);
   case INT_16:
-return IntLogicalTypeAnnotation.create(16, true);
+return intType(16, true);
   case INT_32:
-return IntLogicalTypeAnnotation.create(32, true);
+return intType(32, true);
   case INT_64:
-return IntLogicalTypeAnnotation.create(64, true);
+return intType(64, true);
   case ENUM:
-return EnumLogicalTypeAnnotation.create();
+return enumType();
   case JSON:
-return JsonLogicalTypeAnnotation.create();
+return jsonType();
   case BSON:
-return BsonLogicalTypeAnnotation.create();
+return bsonType();
   case MAP_KEY_VALUE:
-return MapKeyValueTypeAnnotation.create();
+return mapKeyValueType();
   default:
 throw new RuntimeException("Can't convert original type to logical type, unknown original type " + originalType);
 }
   }
 
+
+  static StringLogicalTypeAnnotation stringType() {
+return StringLogicalTypeAnnotation.INSTANCE;
+  }
+
+  static MapLogicalTypeAnnotation mapType() {
+return MapLogicalTypeAnnotation.INSTANCE;
+  }
+
+  static ListLogicalTypeAnnotation listType() {
+return ListLogicalTypeAnnotation.INSTANCE;
+  }
+
+  static EnumLogicalTypeAnnotation enumType() {
+return EnumLogicalTypeAnnotation.INSTANCE;
+  }
+
+  static DecimalLogicalTypeAnnotation decimalType(final int scale, final int precision) {
+return new DecimalLogicalTypeAnnotation(scale, precision);
+  }
+
+  static DateLogicalTypeAnnotation dateType() {
+return DateLogicalTypeAnnotation.INSTANCE;
+  }
+
+  static TimeLogicalTypeAnnotation timeType(final boolean isAdjustedToUTC, final TimeUnit unit) {
+return new TimeLogicalTypeAnnotation(isAdjustedToUTC, unit);
+  }
+
+  static TimestampLogicalTypeAnnotation timestampType(final boolean isAdjustedToUTC, final TimeUnit unit) {
+return new TimestampLogicalTypeAnnotation(isAdjustedToUTC, unit);
+  }
+
+  static IntLogicalTypeAnnotation intType(final int bitWidth, final boolean isSigned) {
+Preconditions.checkArgument(
+  bitWidth == 8 || bitWidth == 

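The hunk above replaces per-class `create()` calls with static factory methods: parameterless annotations return a shared `INSTANCE`, while parameterized ones (`decimalType`, `timeType`, `timestampType`, `intType`) construct a new object per call. The sketch below illustrates that split with hypothetical stand-in classes (not the parquet-mr API); because parameterized instances are not shared, they need value-based `equals`/`hashCode`. The accepted bit widths (8, 16, 32, 64) are an assumption, since the `Preconditions` check is truncated above, although they match the `INT_*`/`UINT_*` cases of the switch.

```java
import java.util.Objects;

// Hypothetical, simplified sketch of the static-factory style in the hunk above.
// Class and method names are illustrative stand-ins, not the parquet-mr API.
public class FactorySketch {

  public enum TimeUnit { MILLIS, MICROS }

  // Parameterless annotation: one shared, immutable instance.
  public static final class StringType {
    static final StringType INSTANCE = new StringType();

    private StringType() { }
  }

  // Parameterized annotation: a new value object per call, compared by value.
  public static final class TimestampType {
    final boolean isAdjustedToUTC;
    final TimeUnit unit;

    TimestampType(boolean isAdjustedToUTC, TimeUnit unit) {
      this.isAdjustedToUTC = isAdjustedToUTC;
      this.unit = Objects.requireNonNull(unit, "unit");
    }

    @Override
    public boolean equals(Object o) {
      if (!(o instanceof TimestampType)) return false;
      TimestampType other = (TimestampType) o;
      return isAdjustedToUTC == other.isAdjustedToUTC && unit == other.unit;
    }

    @Override
    public int hashCode() {
      return Objects.hash(isAdjustedToUTC, unit);
    }
  }

  public static final class IntType {
    final int bitWidth;
    final boolean isSigned;

    IntType(int bitWidth, boolean isSigned) {
      this.bitWidth = bitWidth;
      this.isSigned = isSigned;
    }

    @Override
    public boolean equals(Object o) {
      if (!(o instanceof IntType)) return false;
      IntType other = (IntType) o;
      return bitWidth == other.bitWidth && isSigned == other.isSigned;
    }

    @Override
    public int hashCode() {
      return Objects.hash(bitWidth, isSigned);
    }
  }

  public static StringType stringType() {
    return StringType.INSTANCE;
  }

  public static TimestampType timestampType(boolean isAdjustedToUTC, TimeUnit unit) {
    return new TimestampType(isAdjustedToUTC, unit);
  }

  public static IntType intType(int bitWidth, boolean isSigned) {
    // Assumption: the truncated precondition admits exactly 8, 16, 32 and 64.
    if (bitWidth != 8 && bitWidth != 16 && bitWidth != 32 && bitWidth != 64) {
      throw new IllegalArgumentException("Invalid bit width for integer logical type: " + bitWidth);
    }
    return new IntType(bitWidth, isSigned);
  }

  // Mirrors a few cases of fromOriginalType; the legacy type is passed by name here.
  public static Object fromOriginalTypeName(String originalType) {
    switch (originalType) {
      case "UTF8":             return stringType();
      case "TIMESTAMP_MILLIS": return timestampType(true, TimeUnit.MILLIS);
      case "TIMESTAMP_MICROS": return timestampType(true, TimeUnit.MICROS);
      case "UINT_8":           return intType(8, false);
      case "INT_64":           return intType(64, true);
      default:
        throw new RuntimeException(
            "Can't convert original type to logical type, unknown original type " + originalType);
    }
  }
}
```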
[jira] [Commented] (PARQUET-1253) Support for new logical type representation

2018-05-03 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/PARQUET-1253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16462112#comment-16462112
 ] 

ASF GitHub Bot commented on PARQUET-1253:
-

nandorKollar commented on a change in pull request #463: PARQUET-1253: Support 
for new logical type representation
URL: https://github.com/apache/parquet-mr/pull/463#discussion_r185728821
 
 

 ##
 File path: 
parquet-column/src/main/java/org/apache/parquet/schema/LogicalTypeAnnotation.java
 ##
 @@ -77,63 +77,116 @@ static LogicalTypeAnnotation fromOriginalType(OriginalType originalType, Decimal

[jira] [Commented] (PARQUET-1253) Support for new logical type representation

2018-04-24 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/PARQUET-1253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16450178#comment-16450178
 ] 

ASF GitHub Bot commented on PARQUET-1253:
-

gszadovszky commented on a change in pull request #463: PARQUET-1253: Support 
for new logical type representation
URL: https://github.com/apache/parquet-mr/pull/463#discussion_r183800256
 
 

 ##
 File path: 
parquet-column/src/main/java/org/apache/parquet/schema/LogicalTypeAnnotation.java
 ##
 @@ -77,63 +77,116 @@ static LogicalTypeAnnotation fromOriginalType(OriginalType originalType, Decimal

[jira] [Commented] (PARQUET-1253) Support for new logical type representation

2018-04-21 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/PARQUET-1253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16446754#comment-16446754
 ] 

ASF GitHub Bot commented on PARQUET-1253:
-

gszadovszky commented on a change in pull request #463: PARQUET-1253: Support 
for new logical type representation
URL: https://github.com/apache/parquet-mr/pull/463#discussion_r183207493
 
 

 ##
 File path: 
parquet-column/src/main/java/org/apache/parquet/schema/LogicalTypeAnnotation.java
 ##
 @@ -0,0 +1,714 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.parquet.schema;
+
+import org.apache.parquet.format.BsonType;
+import org.apache.parquet.format.ConvertedType;
+import org.apache.parquet.format.DateType;
+import org.apache.parquet.format.DecimalType;
+import org.apache.parquet.format.EnumType;
+import org.apache.parquet.format.IntType;
+import org.apache.parquet.format.JsonType;
+import org.apache.parquet.format.ListType;
+import org.apache.parquet.format.LogicalType;
+import org.apache.parquet.format.MapType;
+import org.apache.parquet.format.MicroSeconds;
+import org.apache.parquet.format.MilliSeconds;
+import org.apache.parquet.format.NullType;
+import org.apache.parquet.format.StringType;
+import org.apache.parquet.format.TimeType;
+import org.apache.parquet.format.TimestampType;
+
+import java.util.Objects;
+
+public interface LogicalTypeAnnotation {
+  /**
+   * Convert this parquet-mr logical type to parquet-format LogicalType.
+   *
+   * @return the parquet-format LogicalType representation of this logical type implementation
+   */
+  LogicalType toLogicalType();
+
+  /**
+   * Convert this parquet-mr logical type to parquet-format ConvertedType.
+   *
+   * @return the parquet-format ConvertedType representation of this logical type implementation
+   */
+  ConvertedType toConvertedType();
+
+  /**
+   * Convert this logical type to old logical type representation in parquet-mr (if there's any).
+   * Those logical type implementations, which don't have a corresponding mapping should return null.
+   *
+   * @return the OriginalType representation of the new logical type, or null if there's none
+   */
+  OriginalType toOriginalType();
+
+  /**
+   * Helper method to convert the old representation of logical types (OriginalType) to new logical type.
+   */
+  static LogicalTypeAnnotation fromOriginalType(OriginalType originalType, DecimalMetadata decimalMetadata) {
+if (originalType == null) {
+  return null;
+}
+switch (originalType) {
+  case UTF8:
+return StringLogicalTypeAnnotation.create();
+  case MAP:
+return MapLogicalTypeAnnotation.create();
+  case DECIMAL:
+int scale = (decimalMetadata == null ? 0 : decimalMetadata.getScale());
+int precision = (decimalMetadata == null ? 0 : decimalMetadata.getPrecision());
+return DecimalLogicalTypeAnnotation.create(scale, precision);
+  case LIST:
+return ListLogicalTypeAnnotation.create();
+  case DATE:
+return DateLogicalTypeAnnotation.create();
+  case INTERVAL:
+return IntervalLogicalTypeAnnotation.create();
+  case TIMESTAMP_MILLIS:
+return TimestampLogicalTypeAnnotation.create(true, LogicalTypeAnnotation.TimeUnit.MILLIS);
+  case TIMESTAMP_MICROS:
+return TimestampLogicalTypeAnnotation.create(true, LogicalTypeAnnotation.TimeUnit.MICROS);
+  case TIME_MILLIS:
+return TimeLogicalTypeAnnotation.create(true, LogicalTypeAnnotation.TimeUnit.MILLIS);
+  case TIME_MICROS:
+return TimeLogicalTypeAnnotation.create(true, LogicalTypeAnnotation.TimeUnit.MICROS);
+  case UINT_8:
+return IntLogicalTypeAnnotation.create((byte) 8, false);
+  case UINT_16:
+return IntLogicalTypeAnnotation.create((byte) 16, false);
+  case UINT_32:
+return IntLogicalTypeAnnotation.create((byte) 32, false);
+  case UINT_64:
+return IntLogicalTypeAnnotation.create((byte) 64, false);
+  case INT_8:
+return IntLogicalTypeAnnotation.create((byte) 8, true);
+  case INT_16:
+return 

[jira] [Commented] (PARQUET-1253) Support for new logical type representation

2018-04-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/PARQUET-1253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16445979#comment-16445979
 ] 

ASF GitHub Bot commented on PARQUET-1253:
-

nandorKollar commented on a change in pull request #463: PARQUET-1253: Support 
for new logical type representation
URL: https://github.com/apache/parquet-mr/pull/463#discussion_r183102244
 
 

 ##
 File path: 
parquet-column/src/main/java/org/apache/parquet/schema/LogicalTypeAnnotation.java
 ##
 @@ -0,0 +1,714 @@

[jira] [Commented] (PARQUET-1253) Support for new logical type representation

2018-04-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/PARQUET-1253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16445584#comment-16445584
 ] 

ASF GitHub Bot commented on PARQUET-1253:
-

gszadovszky commented on a change in pull request #463: PARQUET-1253: Support 
for new logical type representation
URL: https://github.com/apache/parquet-mr/pull/463#discussion_r183004191
 
 

 ##
 File path: 
parquet-column/src/main/java/org/apache/parquet/schema/PrimitiveType.java
 ##
 @@ -459,6 +468,37 @@ public PrimitiveType(Repetition repetition, PrimitiveTypeName primitive,
 this.columnOrder = requireValidColumnOrder(columnOrder);
   }
 
+  public PrimitiveType(Repetition repetition, PrimitiveTypeName primitive,
 
 Review comment:
   What do you think about my previous comment? If you agree, the deprecation notes should tell users to use 
the `Types` factory instead of creating `PrimitiveType` objects directly.
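One way to follow this suggestion is sketched below with hypothetical classes (not parquet-mr's): keep the public constructor for compatibility, mark it `@Deprecated`, and point its Javadoc at a fluent factory. parquet-mr's own `Types` builder uses a similar `required(...).as(...).named(...)` chain; this sketch only mimics that shape, with string-typed fields standing in for real type objects.

```java
// Hypothetical sketch: names are illustrative, not the parquet-mr API.
public class DeprecationSketch {

  public static final class Column {
    final String name;
    final String physicalType;
    final String logicalType; // null when the column has no logical type annotation

    /**
     * @deprecated use {@link DeprecationSketch#required(String)} instead of
     *     constructing columns directly.
     */
    @Deprecated
    public Column(String name, String physicalType, String logicalType) {
      this.name = name;
      this.physicalType = physicalType;
      this.logicalType = logicalType;
    }
  }

  public static final class Builder {
    private final String physicalType;
    private String logicalType;

    private Builder(String physicalType) {
      this.physicalType = physicalType;
    }

    // Attaches an optional logical type annotation.
    public Builder as(String logicalType) {
      this.logicalType = logicalType;
      return this;
    }

    // Terminal step: the only caller of the deprecated constructor.
    public Column named(String name) {
      return new Column(name, physicalType, logicalType);
    }
  }

  public static Builder required(String physicalType) {
    return new Builder(physicalType);
  }

  public static void main(String[] args) {
    Column ts = required("INT64").as("TIMESTAMP(MILLIS,true)").named("ts");
    System.out.println(ts.name + " " + ts.physicalType + " " + ts.logicalType); // ts INT64 TIMESTAMP(MILLIS,true)
  }
}
```

Routing construction through the builder lets the library later add validation or new parameters in one place, while existing constructor callers keep compiling with a deprecation warning.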




[jira] [Commented] (PARQUET-1253) Support for new logical type representation

2018-04-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/PARQUET-1253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16445580#comment-16445580
 ] 

ASF GitHub Bot commented on PARQUET-1253:
-

gszadovszky commented on a change in pull request #463: PARQUET-1253: Support 
for new logical type representation
URL: https://github.com/apache/parquet-mr/pull/463#discussion_r183007511
 
 

 ##
 File path: 
parquet-column/src/main/java/org/apache/parquet/schema/LogicalTypeAnnotation.java
 ##
 @@ -0,0 +1,714 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.parquet.schema;
+
+import org.apache.parquet.format.BsonType;
+import org.apache.parquet.format.ConvertedType;
+import org.apache.parquet.format.DateType;
+import org.apache.parquet.format.DecimalType;
+import org.apache.parquet.format.EnumType;
+import org.apache.parquet.format.IntType;
+import org.apache.parquet.format.JsonType;
+import org.apache.parquet.format.ListType;
+import org.apache.parquet.format.LogicalType;
+import org.apache.parquet.format.MapType;
+import org.apache.parquet.format.MicroSeconds;
+import org.apache.parquet.format.MilliSeconds;
+import org.apache.parquet.format.NullType;
+import org.apache.parquet.format.StringType;
+import org.apache.parquet.format.TimeType;
+import org.apache.parquet.format.TimestampType;
+
+import java.util.Objects;
+
+public interface LogicalTypeAnnotation {
+  /**
+   * Convert this parquet-mr logical type to parquet-format LogicalType.
+   *
+   * @return the parquet-format LogicalType representation of this logical type implementation
+   */
+  LogicalType toLogicalType();
+
+  /**
+   * Convert this parquet-mr logical type to parquet-format ConvertedType.
+   *
+   * @return the parquet-format ConvertedType representation of this logical type implementation
+   */
+  ConvertedType toConvertedType();
+
+  /**
+   * Convert this logical type to old logical type representation in parquet-mr (if there's any).
+   * Those logical type implementations, which don't have a corresponding mapping should return null.
+   *
+   * @return the OriginalType representation of the new logical type, or null if there's none
+   */
+  OriginalType toOriginalType();
+
+  /**
+   * Helper method to convert the old representation of logical types (OriginalType) to new logical type.
+   */
+  static LogicalTypeAnnotation fromOriginalType(OriginalType originalType, DecimalMetadata decimalMetadata) {
+    if (originalType == null) {
+      return null;
+    }
+    switch (originalType) {
+      case UTF8:
+        return StringLogicalTypeAnnotation.create();
+      case MAP:
+        return MapLogicalTypeAnnotation.create();
+      case DECIMAL:
+        int scale = (decimalMetadata == null ? 0 : decimalMetadata.getScale());
+        int precision = (decimalMetadata == null ? 0 : decimalMetadata.getPrecision());
+        return DecimalLogicalTypeAnnotation.create(scale, precision);
+      case LIST:
+        return ListLogicalTypeAnnotation.create();
+      case DATE:
+        return DateLogicalTypeAnnotation.create();
+      case INTERVAL:
+        return IntervalLogicalTypeAnnotation.create();
+      case TIMESTAMP_MILLIS:
+        return TimestampLogicalTypeAnnotation.create(true, LogicalTypeAnnotation.TimeUnit.MILLIS);
+      case TIMESTAMP_MICROS:
+        return TimestampLogicalTypeAnnotation.create(true, LogicalTypeAnnotation.TimeUnit.MICROS);
+      case TIME_MILLIS:
+        return TimeLogicalTypeAnnotation.create(true, LogicalTypeAnnotation.TimeUnit.MILLIS);
+      case TIME_MICROS:
+        return TimeLogicalTypeAnnotation.create(true, LogicalTypeAnnotation.TimeUnit.MICROS);
+      case UINT_8:
+        return IntLogicalTypeAnnotation.create((byte) 8, false);
+      case UINT_16:
+        return IntLogicalTypeAnnotation.create((byte) 16, false);
+      case UINT_32:
+        return IntLogicalTypeAnnotation.create((byte) 32, false);
+      case UINT_64:
+        return IntLogicalTypeAnnotation.create((byte) 64, false);
+      case INT_8:
+        return IntLogicalTypeAnnotation.create((byte) 8, true);
+      case INT_16:
+        return

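The `fromOriginalType` helper in the diff maps each legacy `OriginalType` onto a parameterized annotation; for integers, the eight `INT_*`/`UINT_*` constants collapse into a single annotation carrying a bit width and a signedness flag. The sketch below mirrors that mapping with hypothetical stand-ins (`LegacyType`, `SketchIntAnnotation`, `LegacyMapping`); it is not the parquet-mr code, and the `INT_32`/`INT_64` cases are assumed by analogy because the listing stops at `INT_16`.

```java
import java.util.Objects;

// Hypothetical subset of the legacy OriginalType constants (not the parquet-mr enum).
enum LegacyType {
  INT_8, INT_16, INT_32, INT_64,
  UINT_8, UINT_16, UINT_32, UINT_64
}

// Hypothetical parameterized integer annotation: one type, two parameters.
final class SketchIntAnnotation {
  final int bitWidth;
  final boolean signed;

  SketchIntAnnotation(int bitWidth, boolean signed) {
    this.bitWidth = bitWidth;
    this.signed = signed;
  }

  // Value semantics, so two annotations with equal parameters compare equal.
  @Override
  public boolean equals(Object o) {
    if (!(o instanceof SketchIntAnnotation)) {
      return false;
    }
    SketchIntAnnotation other = (SketchIntAnnotation) o;
    return bitWidth == other.bitWidth && signed == other.signed;
  }

  @Override
  public int hashCode() {
    return Objects.hash(bitWidth, signed);
  }
}

final class LegacyMapping {
  // Same shape as fromOriginalType: null in, null out; otherwise one case per constant.
  static SketchIntAnnotation fromLegacy(LegacyType type) {
    if (type == null) {
      return null;
    }
    switch (type) {
      case INT_8:   return new SketchIntAnnotation(8, true);
      case INT_16:  return new SketchIntAnnotation(16, true);
      case INT_32:  return new SketchIntAnnotation(32, true);
      case INT_64:  return new SketchIntAnnotation(64, true);
      case UINT_8:  return new SketchIntAnnotation(8, false);
      case UINT_16: return new SketchIntAnnotation(16, false);
      case UINT_32: return new SketchIntAnnotation(32, false);
      case UINT_64: return new SketchIntAnnotation(64, false);
      default: throw new IllegalArgumentException("Unsupported type: " + type);
    }
  }
}
```

Giving the annotation value semantics means callers can compare annotations directly instead of switching on a growing list of enum constants.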
[jira] [Commented] (PARQUET-1253) Support for new logical type representation

2018-04-20 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/PARQUET-1253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16445581#comment-16445581
 ] 

ASF GitHub Bot commented on PARQUET-1253:
-

gszadovszky commented on a change in pull request #463: PARQUET-1253: Support 
for new logical type representation
URL: https://github.com/apache/parquet-mr/pull/463#discussion_r183008754
 
 

 ##
 File path: 
parquet-column/src/main/java/org/apache/parquet/schema/LogicalTypeAnnotation.java
 ##
 @@ -0,0 +1,714 @@

[jira] [Commented] (PARQUET-1253) Support for new logical type representation

2018-04-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/PARQUET-1253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16442647#comment-16442647
 ] 

ASF GitHub Bot commented on PARQUET-1253:
-

gszadovszky commented on a change in pull request #463: PARQUET-1253: Support 
for new logical type representation
URL: https://github.com/apache/parquet-mr/pull/463#discussion_r182451175
 
 

 ##
 File path: 
parquet-column/src/main/java/org/apache/parquet/schema/GroupType.java
 ##
 @@ -88,6 +88,7 @@ public GroupType(Repetition repetition, String name, OriginalType originalType,
* @param fields the contained fields
* @param id the id of the field
*/
+  @Deprecated
 
 Review comment:
   I think it is enough to deprecate the public API. If a method is not public, we can freely modify or remove it.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Support for new logical type representation
> ---
>
> Key: PARQUET-1253
> URL: https://issues.apache.org/jira/browse/PARQUET-1253
> Project: Parquet
>  Issue Type: Improvement
>  Components: parquet-mr
>Reporter: Nandor Kollar
>Assignee: Nandor Kollar
>Priority: Major
>
> Latest parquet-format 
> [introduced|https://github.com/apache/parquet-format/commit/863875e0be3237c6aa4ed71733d54c91a51deabe#diff-0f9d1b5347959e15259da7ba8f4b6252]
>  a new representation for logical types. As of now this is not yet supported 
> in parquet-mr, thus there's no way to use parametrized UTC normalized 
> timestamp data types. When reading and writing Parquet files, besides 
> 'converted_type' parquet-mr should use the new 'logicalType' field in 
> SchemaElement to tell the current logical type annotation. To maintain 
> backward compatibility, the semantic of converted_type shouldn't change.
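The description asks parquet-mr to read and write the new `logicalType` field alongside `converted_type` without changing the latter's meaning. One way to picture this for timestamps: a reader prefers `logicalType` and falls back to `converted_type`, and a writer emits `converted_type` only when an exact legacy equivalent exists. In the sketch below, `ElementSketch`, `TimestampSpec` and `TimestampCompat` are hypothetical stand-ins for the Thrift `SchemaElement` and the parquet-mr annotation classes. The rule that only UTC-adjusted timestamps have a legacy equivalent is an assumption based on the legacy types being UTC normalized; check the parquet-format compatibility rules before relying on it.

```java
import java.util.Objects;

enum TsUnit { MILLIS, MICROS }

// Hypothetical legacy converted types relevant to timestamps.
enum LegacyConverted { TIMESTAMP_MILLIS, TIMESTAMP_MICROS }

// Hypothetical parameterized timestamp annotation.
final class TimestampSpec {
  final boolean isAdjustedToUTC;
  final TsUnit unit;

  TimestampSpec(boolean isAdjustedToUTC, TsUnit unit) {
    this.isAdjustedToUTC = isAdjustedToUTC;
    this.unit = Objects.requireNonNull(unit, "unit");
  }

  @Override
  public boolean equals(Object o) {
    if (!(o instanceof TimestampSpec)) {
      return false;
    }
    TimestampSpec other = (TimestampSpec) o;
    return isAdjustedToUTC == other.isAdjustedToUTC && unit == other.unit;
  }

  @Override
  public int hashCode() {
    return Objects.hash(isAdjustedToUTC, unit);
  }
}

// Simplified stand-in for a schema element: null means "field not set".
final class ElementSketch {
  LegacyConverted convertedType;
  TimestampSpec logicalType;
}

final class TimestampCompat {
  // Reading: prefer the new field; otherwise derive it from the legacy one.
  static TimestampSpec read(ElementSketch e) {
    if (e.logicalType != null) {
      return e.logicalType;
    }
    if (e.convertedType == null) {
      return null;
    }
    switch (e.convertedType) {
      case TIMESTAMP_MILLIS:
        return new TimestampSpec(true, TsUnit.MILLIS);
      case TIMESTAMP_MICROS:
        return new TimestampSpec(true, TsUnit.MICROS);
      default:
        throw new IllegalArgumentException("Unexpected: " + e.convertedType);
    }
  }

  // Writing: always set the new field; set the legacy field only when it means
  // exactly the same thing (assumed here: UTC-adjusted values only).
  static ElementSketch write(TimestampSpec spec) {
    ElementSketch e = new ElementSketch();
    e.logicalType = spec;
    if (spec.isAdjustedToUTC) {
      e.convertedType = spec.unit == TsUnit.MILLIS
          ? LegacyConverted.TIMESTAMP_MILLIS
          : LegacyConverted.TIMESTAMP_MICROS;
    }
    return e;
  }
}
```

Because old readers only look at `converted_type`, leaving it unset for values it cannot describe keeps its semantics unchanged, as the description requires.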



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (PARQUET-1253) Support for new logical type representation

2018-04-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/PARQUET-1253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16442648#comment-16442648
 ] 

ASF GitHub Bot commented on PARQUET-1253:
-

gszadovszky commented on a change in pull request #463: PARQUET-1253: Support 
for new logical type representation
URL: https://github.com/apache/parquet-mr/pull/463#discussion_r182458050
 
 

 ##
 File path: parquet-column/src/main/java/org/apache/parquet/schema/Type.java
 ##
 @@ -146,11 +146,18 @@ public Type(String name, Repetition repetition, OriginalType originalType) {
    * @param repetition OPTIONAL, REPEATED, REQUIRED
    * @param originalType (optional) the original type to help with cross schema conversion (LIST, MAP, ...)
    * @param id (optional) the id of the fields.
+   *
+   * @deprecated use {@link #Type(String, Repetition, LogicalTypeAnnotation, ID)} instead
    */
+  @Deprecated
 
 Review comment:
   Not public, so no deprecation is required.




[jira] [Commented] (PARQUET-1253) Support for new logical type representation

2018-04-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/PARQUET-1253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16442643#comment-16442643
 ] 

ASF GitHub Bot commented on PARQUET-1253:
-

gszadovszky commented on a change in pull request #463: PARQUET-1253: Support 
for new logical type representation
URL: https://github.com/apache/parquet-mr/pull/463#discussion_r182452787
 
 

 ##
 File path: 
parquet-column/src/main/java/org/apache/parquet/schema/PrimitiveType.java
 ##
 @@ -436,13 +438,20 @@ public PrimitiveType(Repetition repetition, PrimitiveTypeName primitive,
    * @param originalType (optional) the original type (MAP, DECIMAL, UTF8, ...)
    * @param decimalMeta (optional) metadata about the decimal type
    * @param id the id of the field
+   *
+   * @deprecated use {@link #PrimitiveType(Repetition, PrimitiveTypeName, int, String, LogicalTypeAnnotation, ID)} instead
 
 Review comment:
   See above.




[jira] [Commented] (PARQUET-1253) Support for new logical type representation

2018-04-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/PARQUET-1253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16442644#comment-16442644
 ] 

ASF GitHub Bot commented on PARQUET-1253:
-

gszadovszky commented on a change in pull request #463: PARQUET-1253: Support 
for new logical type representation
URL: https://github.com/apache/parquet-mr/pull/463#discussion_r182452368
 
 

 ##
 File path: 
parquet-column/src/main/java/org/apache/parquet/schema/PrimitiveType.java
 ##
 @@ -401,15 +400,18 @@ public PrimitiveType(Repetition repetition, PrimitiveTypeName primitive,
    * @param name the name of the type
    */
    public PrimitiveType(Repetition repetition, PrimitiveTypeName primitive, int length, String name) {
-    this(repetition, primitive, length, name, null, null, null);
+    this(repetition, primitive, length, name, (LogicalTypeAnnotation) null, null, null);
    }
 
    /**
    * @param repetition OPTIONAL, REPEATED, REQUIRED
    * @param primitive STRING, INT64, ...
    * @param name the name of the type
    * @param originalType (optional) the original type to help with cross schema convertion (LIST, MAP, ...)
+   *
+   * @deprecated use {@link #PrimitiveType(Repetition, PrimitiveTypeName, String, LogicalTypeAnnotation)} instead
 
 Review comment:
   The usual pattern in Parquet for deprecation is to mention that the related method will be removed in 2.0.0. If the suggestion is to use one of the overloaded methods instead, it is fine not to mention it.
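A minimal sketch of the deprecation conventions from these review comments, as we read them: a deprecated public member whose replacement is an overload simply points to it; one without an overload replacement also states that it will be removed in 2.0.0; and a non-public member is changed without any deprecation. `ColumnSketch` and all of its members are hypothetical.

```java
// Hypothetical class illustrating the conventions; not parquet-mr code.
final class ColumnSketch {
  private final String name;
  private final String annotation; // may be null

  public ColumnSketch(String name, String annotation) {
    this.name = name;
    this.annotation = annotation;
  }

  // The replacement is an overload, so there is no need to mention 2.0.0.
  /**
   * @param name the column name
   * @deprecated use {@link #ColumnSketch(String, String)} instead
   */
  @Deprecated
  public ColumnSketch(String name) {
    this(name, null);
  }

  public String getAnnotation() {
    return annotation;
  }

  // No overload replaces this method, so the javadoc says when it goes away.
  /**
   * @return the annotation name
   * @deprecated will be removed in 2.0.0; use {@link #getAnnotation()} instead
   */
  @Deprecated
  public String getOriginalType() {
    return annotation;
  }

  // Package-private: not public API, so it can be changed or removed
  // without going through a deprecation cycle.
  String describe() {
    return annotation == null ? name : name + " (" + annotation + ")";
  }
}
```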


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> Support for new logical type representation
> ---
>
> Key: PARQUET-1253
> URL: https://issues.apache.org/jira/browse/PARQUET-1253
> Project: Parquet
>  Issue Type: Improvement
>  Components: parquet-mr
>Reporter: Nandor Kollar
>Assignee: Nandor Kollar
>Priority: Major
>
> Latest parquet-format 
> [introduced|https://github.com/apache/parquet-format/commit/863875e0be3237c6aa4ed71733d54c91a51deabe#diff-0f9d1b5347959e15259da7ba8f4b6252]
>  a new representation for logical types. As of now this is not yet supported 
> in parquet-mr, thus there's no way to use parametrized UTC normalized 
> timestamp data types. When reading and writing Parquet files, besides 
> 'converted_type' parquet-mr should use the new 'logicalType' field in 
> SchemaElement to tell the current logical type annotation. To maintain 
> backward compatibility, the semantic of converted_type shouldn't change.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (PARQUET-1253) Support for new logical type representation

2018-04-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/PARQUET-1253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16442646#comment-16442646
 ] 

ASF GitHub Bot commented on PARQUET-1253:
-

gszadovszky commented on a change in pull request #463: PARQUET-1253: Support 
for new logical type representation
URL: https://github.com/apache/parquet-mr/pull/463#discussion_r182456699
 
 

 ##
 File path: 
parquet-column/src/main/java/org/apache/parquet/schema/PrimitiveType.java
 ##
 @@ -459,6 +468,37 @@ public PrimitiveType(Repetition repetition, PrimitiveTypeName primitive,
 this.columnOrder = requireValidColumnOrder(columnOrder);
   }
 
+  public PrimitiveType(Repetition repetition, PrimitiveTypeName primitive,
 
 Review comment:
   I think in the long term we do not want to expose the construction of types in the API; we would like clients to use the builder instead. Therefore, I would suggest not adding public constructors unless your code needs them.
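The builder-first approach suggested here can be sketched with a hypothetical `PrimitiveSketch` whose constructor is private, so the only public entry point is a fluent chain in the style of the `Types` builder touched by this PR (`required(...)`, `as(...)`, `named(...)`). The class, its string form, and the use of plain strings for primitive and annotation names are illustrative assumptions, not parquet-mr's API.

```java
import java.util.Locale;

// Hypothetical primitive type; only the nested Builder can create instances.
final class PrimitiveSketch {
  enum Repetition { REQUIRED, OPTIONAL, REPEATED }

  private final Repetition repetition;
  private final String primitive;
  private final String annotation; // may be null
  private final String name;

  // Private, so the parameter list can change without breaking any caller.
  private PrimitiveSketch(Repetition repetition, String primitive, String annotation, String name) {
    this.repetition = repetition;
    this.primitive = primitive;
    this.annotation = annotation;
    this.name = name;
  }

  static Builder required(String primitive) {
    return new Builder(Repetition.REQUIRED, primitive);
  }

  static Builder optional(String primitive) {
    return new Builder(Repetition.OPTIONAL, primitive);
  }

  static final class Builder {
    private final Repetition repetition;
    private final String primitive;
    private String annotation;

    private Builder(Repetition repetition, String primitive) {
      this.repetition = repetition;
      this.primitive = primitive;
    }

    Builder as(String annotation) {
      this.annotation = annotation;
      return this;
    }

    // Terminal step: the only place a PrimitiveSketch is constructed.
    PrimitiveSketch named(String name) {
      return new PrimitiveSketch(repetition, primitive, annotation, name);
    }
  }

  @Override
  public String toString() {
    String base = repetition.name().toLowerCase(Locale.ROOT) + " " + primitive + " " + name;
    return annotation == null ? base : base + " (" + annotation + ")";
  }
}
```

New parameters (such as a logical type annotation) can then be added as builder methods without growing the set of public constructors.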




[jira] [Commented] (PARQUET-1253) Support for new logical type representation

2018-04-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/PARQUET-1253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16442645#comment-16442645
 ] 

ASF GitHub Bot commented on PARQUET-1253:
-

gszadovszky commented on a change in pull request #463: PARQUET-1253: Support 
for new logical type representation
URL: https://github.com/apache/parquet-mr/pull/463#discussion_r182452892
 
 

 ##
 File path: 
parquet-column/src/main/java/org/apache/parquet/schema/PrimitiveType.java
 ##
 @@ -436,13 +438,20 @@ public PrimitiveType(Repetition repetition, PrimitiveTypeName primitive,
    * @param originalType (optional) the original type (MAP, DECIMAL, UTF8, ...)
    * @param decimalMeta (optional) metadata about the decimal type
    * @param id the id of the field
+   *
+   * @deprecated use {@link #PrimitiveType(Repetition, PrimitiveTypeName, int, String, LogicalTypeAnnotation, ID)} instead
    */
+  @Deprecated
    public PrimitiveType(Repetition repetition, PrimitiveTypeName primitive,
        int length, String name, OriginalType originalType,
        DecimalMetadata decimalMeta, ID id) {
      this(repetition, primitive, length, name, originalType, decimalMeta, id, null);
    }
 
+  /**
+   * @deprecated use {@link #PrimitiveType(Repetition, PrimitiveTypeName, int, String, LogicalTypeAnnotation, ID, ColumnOrder)} instead
 
 Review comment:
   Not public, no need for deprecation.




[jira] [Commented] (PARQUET-1253) Support for new logical type representation

2018-04-18 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/PARQUET-1253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16442641#comment-16442641
 ] 

ASF GitHub Bot commented on PARQUET-1253:
-

gszadovszky commented on a change in pull request #463: PARQUET-1253: Support 
for new logical type representation
URL: https://github.com/apache/parquet-mr/pull/463#discussion_r182458122
 
 

 ##
 File path: parquet-column/src/main/java/org/apache/parquet/schema/Type.java
 ##
 @@ -146,11 +146,18 @@ public Type(String name, Repetition repetition, OriginalType originalType) {
    * @param repetition OPTIONAL, REPEATED, REQUIRED
    * @param originalType (optional) the original type to help with cross schema conversion (LIST, MAP, ...)
    * @param id (optional) the id of the fields.
+   *
+   * @deprecated use {@link #Type(String, Repetition, LogicalTypeAnnotation, ID)} instead
    */
+  @Deprecated
    Type(String name, Repetition repetition, OriginalType originalType, ID id) {
      this(name, repetition, originalType, null, id);
    }
 
+  /**
+   * @deprecated use {@link #Type(String, Repetition, LogicalTypeAnnotation, ID)} instead
+   */
+  @Deprecated
 
 Review comment:
   see above




[jira] [Commented] (PARQUET-1253) Support for new logical type representation

2018-04-05 Thread Nandor Kollar (JIRA)

[ 
https://issues.apache.org/jira/browse/PARQUET-1253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16426584#comment-16426584
 ] 

Nandor Kollar commented on PARQUET-1253:


Thanks Ryan for clarifying my questions!



[jira] [Commented] (PARQUET-1253) Support for new logical type representation

2018-04-05 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/PARQUET-1253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16426582#comment-16426582
 ] 

ASF GitHub Bot commented on PARQUET-1253:
-

nandorKollar commented on a change in pull request #463: PARQUET-1253: Support 
for new logical type representation
URL: https://github.com/apache/parquet-mr/pull/463#discussion_r179373718
 
 

 ##
 File path: 
parquet-hadoop/src/main/java/org/apache/parquet/hadoop/metadata/ParquetMetadata.java
 ##
 @@ -41,6 +40,10 @@
 
   private static final ObjectMapper objectMapper = new ObjectMapper();
 
+  static {
+    objectMapper.configure(SerializationConfig.Feature.FAIL_ON_EMPTY_BEANS, false);
 
 Review comment:
   Sure, no problem, I will.




[jira] [Commented] (PARQUET-1253) Support for new logical type representation

2018-04-04 Thread Ryan Blue (JIRA)

[ 
https://issues.apache.org/jira/browse/PARQUET-1253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16425994#comment-16425994
 ] 

Ryan Blue commented on PARQUET-1253:


Not including the UUID logical type in that union is probably an accident.

MAP_KEY_VALUE is no longer used. It is noted in [backward compatibility rules|https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#backward-compatibility-rules-1], but is not required for any types.

The [comment "only valid for primitives"|https://github.com/apache/parquet-format/blob/apache-parquet-format-2.5.0/src/main/thrift/parquet.thrift#L384] is incorrect. I think we can remove it. I'm not sure why the comment was there.
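The MAP_KEY_VALUE handling mentioned above can be sketched as follows. As we read the linked backward-compatibility rules, a group annotated MAP_KEY_VALUE that is not contained by a MAP-annotated group should be handled as if it were annotated MAP; when it sits inside a MAP, it is just the repeated key/value group. `GroupAnnotation`, `GroupNode` and `MapCompat` are hypothetical types, not the parquet-mr schema classes, and checking only the direct parent is a simplifying assumption.

```java
// Hypothetical group annotations (not parquet-mr's OriginalType).
enum GroupAnnotation { NONE, MAP, MAP_KEY_VALUE, LIST }

// Hypothetical schema tree node; parent is null for the root.
final class GroupNode {
  final String name;
  final GroupAnnotation annotation;
  final GroupNode parent;

  GroupNode(String name, GroupAnnotation annotation, GroupNode parent) {
    this.name = name;
    this.annotation = annotation;
    this.parent = parent;
  }
}

final class MapCompat {
  // Returns the annotation a reader should act on for this group.
  static GroupAnnotation effectiveAnnotation(GroupNode group) {
    if (group.annotation != GroupAnnotation.MAP_KEY_VALUE) {
      return group.annotation;
    }
    // Simplification: only the direct parent is checked.
    boolean insideMap = group.parent != null && group.parent.annotation == GroupAnnotation.MAP;
    // Inside a MAP this is just the repeated key/value group; elsewhere the
    // annotation was written in place of MAP, so the group is read as a map.
    return insideMap ? GroupAnnotation.NONE : GroupAnnotation.MAP;
  }
}
```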



[jira] [Commented] (PARQUET-1253) Support for new logical type representation

2018-04-04 Thread Nandor Kollar (JIRA)

[ 
https://issues.apache.org/jira/browse/PARQUET-1253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16425789#comment-16425789
 ] 

Nandor Kollar commented on PARQUET-1253:


While working on the new logical type representation, three questions came to mind:
* Although there is a Thrift struct for the UUID logical type in parquet-format, it is not included in the LogicalType union. Is this on purpose, or was it omitted accidentally? How should parquet-mr handle schemas where the UUID annotation is used but there's no corresponding LogicalType mapping?
* A similar question applies to MAP_KEY_VALUE, which is not implemented at all in the new representation. What should parquet-mr do with schemas that use it in the old representation?
* In parquet-format, the comment for {{optional LogicalType logicalType}} says {{"The logical type of this SchemaElement; only valid for primitives."}}, but I'm confused, because there are Map and List logical types, which, as far as I know, make sense only on groups. What was the intention of this comment? Am I missing anything?

[~rdblue] I can see that you worked on the new logical type representation; could you please help me clarify these questions?



[jira] [Commented] (PARQUET-1253) Support for new logical type representation

2018-04-03 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/PARQUET-1253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16424016#comment-16424016
 ] 

ASF GitHub Bot commented on PARQUET-1253:
-

gszadovszky commented on a change in pull request #463: PARQUET-1253: Support 
for new logical type representation
URL: https://github.com/apache/parquet-mr/pull/463#discussion_r178819326
 
 

 ##
 File path: parquet-column/src/main/java/org/apache/parquet/schema/Types.java
 ##
 @@ -252,7 +252,12 @@ protected final THIS repetition(Type.Repetition repetition) {
  * @return this builder for method chaining
  */
 public THIS as(OriginalType type) {
-  this.originalType = type;
+  this.logicalTypeAnnotation = LogicalTypeAnnotation.fromOriginalType(type);
+  return self();
+}
+
+public THIS as(LogicalTypeAnnotation type) {
 
 Review comment:
   This method breaks the fluent API of the builder, because you need to call 
an outside factory method of the concrete type to create a 
LogicalTypeAnnotation. If it is practically feasible, I would suggest 
refactoring the LogicalTypeAnnotation API to fit better with the fluent API of Types.
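The review point can be illustrated with a small, hypothetical builder (`FieldBuilderSketch`, `AnnotationSketchB` and `UnitSketch` are invented for this sketch and are not parquet-mr's `Types` API). When the factories are static methods on the annotation interface itself, a schema field is still declared in one chain, instead of requiring a call to a concrete class's `create(...)` as in the diff.

```java
// Hypothetical, simplified builder illustrating the review suggestion;
// not parquet-mr's Types or LogicalTypeAnnotation API.
enum UnitSketch { MILLIS, MICROS }

interface AnnotationSketchB {
  String describe();

  // Factories live on the interface, so callers never need to reach for a
  // concrete nested class such as TimestampLogicalTypeAnnotation.create(...).
  static AnnotationSketchB string() {
    return () -> "STRING";
  }

  static AnnotationSketchB timestamp(boolean isAdjustedToUTC, UnitSketch unit) {
    return () -> "TIMESTAMP(" + unit + "," + isAdjustedToUTC + ")";
  }
}

final class FieldBuilderSketch {
  private final String primitive;
  private AnnotationSketchB annotation;

  private FieldBuilderSketch(String primitive) {
    this.primitive = primitive;
  }

  static FieldBuilderSketch required(String primitive) {
    return new FieldBuilderSketch(primitive);
  }

  FieldBuilderSketch as(AnnotationSketchB annotation) {
    this.annotation = annotation;
    return this; // returning the builder keeps the chain fluent
  }

  String named(String name) {
    String suffix = annotation == null ? "" : " (" + annotation.describe() + ")";
    return "required " + primitive + " " + name + suffix + ";";
  }
}
```

With this shape, `FieldBuilderSketch.required("int64").as(AnnotationSketchB.timestamp(true, UnitSketch.MILLIS)).named("ts")` yields `required int64 ts (TIMESTAMP(MILLIS,true));`. The rendered string format is only for illustration.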


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


[jira] [Commented] (PARQUET-1253) Support for new logical type representation

2018-04-03 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/PARQUET-1253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16424020#comment-16424020
 ] 

ASF GitHub Bot commented on PARQUET-1253:
-

gszadovszky commented on a change in pull request #463: PARQUET-1253: Support 
for new logical type representation
URL: https://github.com/apache/parquet-mr/pull/463#discussion_r178810222
 
 

 ##
 File path: parquet-column/src/main/java/org/apache/parquet/schema/LogicalTypeAnnotation.java
 ##
 @@ -0,0 +1,748 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+package org.apache.parquet.schema;
+
+import org.apache.parquet.format.BsonType;
+import org.apache.parquet.format.ConvertedType;
+import org.apache.parquet.format.DateType;
+import org.apache.parquet.format.DecimalType;
+import org.apache.parquet.format.EnumType;
+import org.apache.parquet.format.IntType;
+import org.apache.parquet.format.JsonType;
+import org.apache.parquet.format.ListType;
+import org.apache.parquet.format.LogicalType;
+import org.apache.parquet.format.MapType;
+import org.apache.parquet.format.MicroSeconds;
+import org.apache.parquet.format.MilliSeconds;
+import org.apache.parquet.format.NullType;
+import org.apache.parquet.format.StringType;
+import org.apache.parquet.format.TimeType;
+import org.apache.parquet.format.TimestampType;
+
+import java.util.Objects;
+
+public interface LogicalTypeAnnotation {
+  /**
+   * Convert this parquet-mr logical type to parquet-format LogicalType.
+   *
+   * @return the parquet-format LogicalType representation of this logical type implementation
+   */
+  LogicalType toLogicalType();
+
+  /**
+   * Convert this parquet-mr logical type to parquet-format ConvertedType.
+   *
+   * @return the parquet-format ConvertedType representation of this logical type implementation
+   */
+  ConvertedType toConvertedType();
+
+  /**
+   * Convert this logical type to old logical type representation in parquet-mr (if there's any).
+   * Those logical type implementations, which don't have a corresponding mapping should return null.
+   *
+   * @return the OriginalType representation of the new logical type, or null if there's none
+   */
+  OriginalType toOriginalType();
+
+  /**
+   * Helper method to convert the old representation of logical types (OriginalType) to new logical type.
+   */
+  static LogicalTypeAnnotation fromOriginalType(OriginalType originalType) {
+    if (originalType == null) {
+      return null;
+    }
+    switch (originalType) {
+      case UTF8:
+        return StringLogicalTypeAnnotation.create();
+      case MAP:
+        return MapLogicalTypeAnnotation.create();
+      case DECIMAL:
+        return DecimalLogicalTypeAnnotation.create();
+      case LIST:
+        return ListLogicalTypeAnnotation.create();
+      case DATE:
+        return DateLogicalTypeAnnotation.create();
+      case INTERVAL:
+        return IntervalLogicalTypeAnnotation.create();
+      case TIMESTAMP_MILLIS:
+        return TimestampLogicalTypeAnnotation.create(true, LogicalTypeAnnotation.TimeUnit.MILLIS);
+      case TIMESTAMP_MICROS:
+        return TimestampLogicalTypeAnnotation.create(true, LogicalTypeAnnotation.TimeUnit.MICROS);
+      case TIME_MILLIS:
+        return TimeLogicalTypeAnnotation.create(true, LogicalTypeAnnotation.TimeUnit.MILLIS);
+      case TIME_MICROS:
+        return TimeLogicalTypeAnnotation.create(true, LogicalTypeAnnotation.TimeUnit.MICROS);
+      case UINT_8:
+        return IntLogicalTypeAnnotation.create((byte) 8, false);
+      case UINT_16:
+        return IntLogicalTypeAnnotation.create((byte) 16, false);
+      case UINT_32:
+        return IntLogicalTypeAnnotation.create((byte) 32, false);
+      case UINT_64:
+        return IntLogicalTypeAnnotation.create((byte) 64, false);
+      case INT_8:
+        return IntLogicalTypeAnnotation.create((byte) 8, true);
+      case INT_16:
+        return IntLogicalTypeAnnotation.create((byte) 16, true);
+      case INT_32:
+        return IntLogicalTypeAnnotation.create((byte) 32, true);
+      case INT_64:
+        return IntLogicalTypeAnnotation.create((byte) 64, true);
+      case ENUM:
+  
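The TIME and TIMESTAMP branches of `fromOriginalType` turn each legacy value into a parameterized annotation whose UTC flag is always `true`, since the old enum has no slot for that flag. A minimal sketch of just those branches, using hypothetical names (`LegacyTimeType`, `TimeUnitC`, `TemporalAnnotationSketch`) rather than the PR's classes:

```java
// Hypothetical, simplified model of the TIME/TIMESTAMP branches of
// fromOriginalType; not the PR's classes.
enum LegacyTimeType { TIME_MILLIS, TIME_MICROS, TIMESTAMP_MILLIS, TIMESTAMP_MICROS }

enum TimeUnitC { MILLIS, MICROS }

final class TemporalAnnotationSketch {
  final boolean isTimestamp;     // false means a TIME annotation
  final boolean isAdjustedToUTC; // the parameter the legacy enum cannot express
  final TimeUnitC unit;

  TemporalAnnotationSketch(boolean isTimestamp, boolean isAdjustedToUTC, TimeUnitC unit) {
    this.isTimestamp = isTimestamp;
    this.isAdjustedToUTC = isAdjustedToUTC;
    this.unit = unit;
  }

  // Every legacy value maps to isAdjustedToUTC = true, matching the
  // create(true, ...) calls in the diff.
  static TemporalAnnotationSketch fromLegacy(LegacyTimeType legacy) {
    switch (legacy) {
      case TIME_MILLIS:
        return new TemporalAnnotationSketch(false, true, TimeUnitC.MILLIS);
      case TIME_MICROS:
        return new TemporalAnnotationSketch(false, true, TimeUnitC.MICROS);
      case TIMESTAMP_MILLIS:
        return new TemporalAnnotationSketch(true, true, TimeUnitC.MILLIS);
      case TIMESTAMP_MICROS:
        return new TemporalAnnotationSketch(true, true, TimeUnitC.MICROS);
      default:
        throw new IllegalArgumentException("Unexpected legacy type: " + legacy);
    }
  }
}
```

The reverse direction is where the new representation adds information: an annotation with `isAdjustedToUTC = false` has no distinct legacy value to map back to.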

[jira] [Commented] (PARQUET-1253) Support for new logical type representation

2018-04-03 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/PARQUET-1253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16424018#comment-16424018
 ] 

ASF GitHub Bot commented on PARQUET-1253:
-

gszadovszky commented on a change in pull request #463: PARQUET-1253: Support 
for new logical type representation
URL: https://github.com/apache/parquet-mr/pull/463#discussion_r178809713
 
 

 ##
 File path: parquet-column/src/main/java/org/apache/parquet/schema/LogicalTypeAnnotation.java
 ##
 @@ -0,0 +1,748 @@

[jira] [Commented] (PARQUET-1253) Support for new logical type representation

2018-04-03 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/PARQUET-1253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16424015#comment-16424015
 ] 

ASF GitHub Bot commented on PARQUET-1253:
-

gszadovszky commented on a change in pull request #463: PARQUET-1253: Support 
for new logical type representation
URL: https://github.com/apache/parquet-mr/pull/463#discussion_r178799024
 
 

 ##
 File path: parquet-column/src/main/java/org/apache/parquet/schema/LogicalTypeAnnotation.java
 ##
 @@ -0,0 +1,748 @@

[jira] [Commented] (PARQUET-1253) Support for new logical type representation

2018-04-03 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/PARQUET-1253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16424017#comment-16424017
 ] 

ASF GitHub Bot commented on PARQUET-1253:
-

gszadovszky commented on a change in pull request #463: PARQUET-1253: Support 
for new logical type representation
URL: https://github.com/apache/parquet-mr/pull/463#discussion_r178821942
 
 

 ##
 File path: parquet-hadoop/src/main/java/org/apache/parquet/hadoop/metadata/ParquetMetadata.java
 ##
 @@ -41,6 +40,10 @@
 
   private static final ObjectMapper objectMapper = new ObjectMapper();
 
+  static {
+    objectMapper.configure(SerializationConfig.Feature.FAIL_ON_EMPTY_BEANS, false);
 
 Review comment:
   Could you please explain why it is necessary?
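One plausible explanation, offered as an assumption rather than the author's answer: in Jackson 1.x, `SerializationConfig.Feature.FAIL_ON_EMPTY_BEANS` is enabled by default and makes serialization throw when no serializable properties are found for a type. A parameterless annotation object (such as a string annotation) may expose none once such objects become part of the metadata serialized by this `ObjectMapper`. The sketch below only approximates that property detection with plain reflection over public getters; Jackson's actual introspection also considers fields, annotations and visibility settings, and both nested classes are hypothetical.

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

// Rough approximation of bean-style getter detection; not Jackson's algorithm.
final class EmptyBeanCheck {
  // Hypothetical parameterless annotation: exposes no getters at all.
  static final class StringAnnotationSketchD {
  }

  // Hypothetical parameterized annotation that exposes its parameters.
  static final class DecimalAnnotationSketchD {
    private final int scale;
    private final int precision;

    DecimalAnnotationSketchD(int scale, int precision) {
      this.scale = scale;
      this.precision = precision;
    }

    public int getScale() { return scale; }

    public int getPrecision() { return precision; }
  }

  // Counts public, non-static, no-arg getX()/isX() methods not inherited from Object.
  static int countGetters(Class<?> type) {
    int count = 0;
    for (Method m : type.getMethods()) {
      if (m.getDeclaringClass() == Object.class
          || m.getParameterCount() != 0
          || Modifier.isStatic(m.getModifiers())) {
        continue;
      }
      String name = m.getName();
      boolean getter = name.startsWith("get") && name.length() > 3 && m.getReturnType() != void.class;
      boolean isser = name.startsWith("is") && name.length() > 2 && m.getReturnType() == boolean.class;
      if (getter || isser) {
        count++;
      }
    }
    return count;
  }
}
```

A type for which this count is zero is the kind of "empty bean" that the default setting rejects; disabling the feature makes Jackson emit `{}` for it instead.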


[jira] [Commented] (PARQUET-1253) Support for new logical type representation

2018-04-03 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/PARQUET-1253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16424019#comment-16424019
 ] 

ASF GitHub Bot commented on PARQUET-1253:
-

gszadovszky commented on a change in pull request #463: PARQUET-1253: Support 
for new logical type representation
URL: https://github.com/apache/parquet-mr/pull/463#discussion_r178821039
 
 

 ##
 File path: parquet-hadoop/src/main/java/org/apache/parquet/format/converter/ParquetMetadataConverter.java
 ##
 @@ -586,108 +595,105 @@ Type getType(PrimitiveTypeName type) {
   }
 
   // Visible for testing
-  OriginalType getOriginalType(ConvertedType type) {
+  LogicalTypeAnnotation getOriginalType(ConvertedType type, SchemaElement schemaElement) {
     switch (type) {
       case UTF8:
-        return OriginalType.UTF8;
+        return LogicalTypeAnnotation.StringLogicalTypeAnnotation.create();
       case MAP:
-        return OriginalType.MAP;
+        return LogicalTypeAnnotation.MapLogicalTypeAnnotation.create();
       case MAP_KEY_VALUE:
-        return OriginalType.MAP_KEY_VALUE;
+        return LogicalTypeAnnotation.MapKeyValueTypeAnnotation.create();
       case LIST:
-        return OriginalType.LIST;
+        return LogicalTypeAnnotation.ListLogicalTypeAnnotation.create();
       case ENUM:
-        return OriginalType.ENUM;
+        return LogicalTypeAnnotation.EnumLogicalTypeAnnotation.create();
       case DECIMAL:
-        return OriginalType.DECIMAL;
+        if (schemaElement == null) {
+          return LogicalTypeAnnotation.DecimalLogicalTypeAnnotation.create();
+        }
+        return LogicalTypeAnnotation.DecimalLogicalTypeAnnotation.create(schemaElement.scale, schemaElement.precision);
       case DATE:
-        return OriginalType.DATE;
+        return LogicalTypeAnnotation.DateLogicalTypeAnnotation.create();
       case TIME_MILLIS:
-        return OriginalType.TIME_MILLIS;
+        return LogicalTypeAnnotation.TimeLogicalTypeAnnotation.create(true, LogicalTypeAnnotation.TimeUnit.MILLIS);
       case TIME_MICROS:
-        return OriginalType.TIME_MICROS;
+        return LogicalTypeAnnotation.TimeLogicalTypeAnnotation.create(true, LogicalTypeAnnotation.TimeUnit.MICROS);
       case TIMESTAMP_MILLIS:
-        return OriginalType.TIMESTAMP_MILLIS;
+        return LogicalTypeAnnotation.TimestampLogicalTypeAnnotation.create(true, LogicalTypeAnnotation.TimeUnit.MILLIS);
       case TIMESTAMP_MICROS:
-        return OriginalType.TIMESTAMP_MICROS;
+        return LogicalTypeAnnotation.TimestampLogicalTypeAnnotation.create(true, LogicalTypeAnnotation.TimeUnit.MICROS);
       case INTERVAL:
-        return OriginalType.INTERVAL;
+        return LogicalTypeAnnotation.IntervalLogicalTypeAnnotation.create();
       case INT_8:
-        return OriginalType.INT_8;
+        return LogicalTypeAnnotation.IntLogicalTypeAnnotation.create((byte) 8, true);
       case INT_16:
-        return OriginalType.INT_16;
+        return LogicalTypeAnnotation.IntLogicalTypeAnnotation.create((byte) 16, true);
       case INT_32:
-        return OriginalType.INT_32;
+        return LogicalTypeAnnotation.IntLogicalTypeAnnotation.create((byte) 32, true);
       case INT_64:
-        return OriginalType.INT_64;
+        return LogicalTypeAnnotation.IntLogicalTypeAnnotation.create((byte) 64, true);
       case UINT_8:
-        return OriginalType.UINT_8;
+        return LogicalTypeAnnotation.IntLogicalTypeAnnotation.create((byte) 8, false);
       case UINT_16:
-        return OriginalType.UINT_16;
+        return LogicalTypeAnnotation.IntLogicalTypeAnnotation.create((byte) 16, false);
       case UINT_32:
-        return OriginalType.UINT_32;
+        return LogicalTypeAnnotation.IntLogicalTypeAnnotation.create((byte) 32, false);
       case UINT_64:
-        return OriginalType.UINT_64;
+        return LogicalTypeAnnotation.IntLogicalTypeAnnotation.create((byte) 64, false);
       case JSON:
-        return OriginalType.JSON;
+        return LogicalTypeAnnotation.JsonLogicalTypeAnnotation.create();
       case BSON:
-        return OriginalType.BSON;
+        return LogicalTypeAnnotation.BsonLogicalTypeAnnotation.create();
       default:
-        throw new RuntimeException("Unknown converted type " + type);
+        return LogicalTypeAnnotation.NullLogicalTypeAnnotation.create();
     }
   }
 
-  // Visible for testing
-  ConvertedType getConvertedType(OriginalType type) {
-    switch (type) {
-      case UTF8:
-        return ConvertedType.UTF8;
+  LogicalTypeAnnotation getOriginalType(LogicalType type) {
+    switch (type.getSetField()) {
       case MAP:
-        return ConvertedType.MAP;
-      case MAP_KEY_VALUE:
-        return ConvertedType.MAP_KEY_VALUE;
-      case LIST:
-        return ConvertedType.LIST;
-      case ENUM:
-        return ConvertedType.ENUM;
-      case DECIMAL:
-        return ConvertedType.DECIMAL;
+        return 

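The DECIMAL branch and the new `default` branch of `getOriginalType(ConvertedType, SchemaElement)` can be sketched in isolation. The names (`LegacyTypeE`, `SchemaElementE`, `DecimalConversionSketch`) are hypothetical, and annotations are rendered as plain strings for brevity. DECIMAL picks up scale and precision from the schema element when one is passed and falls back to a parameterless decimal otherwise; any converted type not matched by a case yields a null annotation, where the old code threw a RuntimeException.

```java
// Hypothetical, simplified model of two branches of the new
// getOriginalType(ConvertedType, SchemaElement); annotations are plain strings.
enum LegacyTypeE { UTF8, DECIMAL, UNHANDLED }

final class SchemaElementE {
  final int scale;
  final int precision;

  SchemaElementE(int scale, int precision) {
    this.scale = scale;
    this.precision = precision;
  }
}

final class DecimalConversionSketch {
  static String toAnnotation(LegacyTypeE type, SchemaElementE element) {
    switch (type) {
      case UTF8:
        return "STRING";
      case DECIMAL:
        if (element == null) {
          return "DECIMAL"; // no schema element: parameterless decimal
        }
        return "DECIMAL(" + element.precision + "," + element.scale + ")";
      default:
        return "NULL"; // placeholder annotation instead of throwing
    }
  }
}
```

Returning a placeholder makes the reader tolerant of unexpected values, at the cost of silently dropping the annotation instead of failing fast.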
[jira] [Commented] (PARQUET-1253) Support for new logical type representation

2018-03-29 Thread ASF GitHub Bot (JIRA)

[ 
https://issues.apache.org/jira/browse/PARQUET-1253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16418874#comment-16418874
 ] 

ASF GitHub Bot commented on PARQUET-1253:
-

nandorKollar opened a new pull request #463: PARQUET-1253: Support for new 
logical type representation
URL: https://github.com/apache/parquet-mr/pull/463
 
 
   This PR implements in parquet-mr the new logical type representation that 
is already available in parquet-format. Reviews are welcome!
