[jira] [Commented] (PARQUET-1827) UUID type currently not supported by parquet-mr

2021-04-01 Thread Xinli Shang (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-1827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17313524#comment-17313524
 ] 

Xinli Shang commented on PARQUET-1827:
--

It seems the storage size is reduced by ~8%  for the UUID column. 

> UUID type currently not supported by parquet-mr
> ---
>
> Key: PARQUET-1827
> URL: https://issues.apache.org/jira/browse/PARQUET-1827
> Project: Parquet
>  Issue Type: Improvement
>  Components: parquet-mr
>Affects Versions: 1.11.0
>Reporter: Brad Smith
>Assignee: Gabor Szadovszky
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.12.0
>
>
> The parquet-format project introduced a new UUID logical type in version 2.4:
> [https://github.com/apache/parquet-format/blob/master/CHANGES.md]
> This would be a useful type to have available in some circumstances, but it 
> currently isn't supported in the parquet-mr library. Hopefully this feature 
> can be implemented at some point.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (PARQUET-1827) UUID type currently not supported by parquet-mr

2021-01-04 Thread Gabor Szadovszky (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-1827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17258287#comment-17258287
 ] 

Gabor Szadovszky commented on PARQUET-1827:
---

[~vitalii], there are some open points for 1.12.0. I think we will know more 
about a release date after the parquet sync next week.

> UUID type currently not supported by parquet-mr
> ---
>
> Key: PARQUET-1827
> URL: https://issues.apache.org/jira/browse/PARQUET-1827
> Project: Parquet
>  Issue Type: Improvement
>  Components: parquet-mr
>Affects Versions: 1.11.0
>Reporter: Brad Smith
>Assignee: Gabor Szadovszky
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.12.0
>
>
> The parquet-format project introduced a new UUID logical type in version 2.4:
> [https://github.com/apache/parquet-format/blob/master/CHANGES.md]
> This would be a useful type to have available in some circumstances, but it 
> currently isn't supported in the parquet-mr library. Hopefully this feature 
> can be implemented at some point.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (PARQUET-1827) UUID type currently not supported by parquet-mr

2020-12-28 Thread Vitalii Diravka (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-1827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17255609#comment-17255609
 ] 

Vitalii Diravka commented on PARQUET-1827:
--

[~gszadovszky] I found it is still not present in 1.11.1 parquet-mr version. Is 
targeted to 1.12.0 version? If yes, could you point it is in Jira ticket 
please. 
When parquet-mr 1.12.0 version is planned to be released?

> UUID type currently not supported by parquet-mr
> ---
>
> Key: PARQUET-1827
> URL: https://issues.apache.org/jira/browse/PARQUET-1827
> Project: Parquet
>  Issue Type: Improvement
>  Components: parquet-mr
>Affects Versions: 1.11.0
>Reporter: Brad Smith
>Assignee: Gabor Szadovszky
>Priority: Major
>  Labels: pull-request-available
>
> The parquet-format project introduced a new UUID logical type in version 2.4:
> [https://github.com/apache/parquet-format/blob/master/CHANGES.md]
> This would be a useful type to have available in some circumstances, but it 
> currently isn't supported in the parquet-mr library. Hopefully this feature 
> can be implemented at some point.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (PARQUET-1827) UUID type currently not supported by parquet-mr

2020-06-03 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-1827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17125599#comment-17125599
 ] 

ASF GitHub Bot commented on PARQUET-1827:
-

gszadovszky merged pull request #778:
URL: https://github.com/apache/parquet-mr/pull/778


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> UUID type currently not supported by parquet-mr
> ---
>
> Key: PARQUET-1827
> URL: https://issues.apache.org/jira/browse/PARQUET-1827
> Project: Parquet
>  Issue Type: Improvement
>  Components: parquet-mr
>Affects Versions: 1.11.0
>Reporter: Brad Smith
>Assignee: Gabor Szadovszky
>Priority: Major
>  Labels: pull-request-available
>
> The parquet-format project introduced a new UUID logical type in version 2.4:
> [https://github.com/apache/parquet-format/blob/master/CHANGES.md]
> This would be a useful type to have available in some circumstances, but it 
> currently isn't supported in the parquet-mr library. Hopefully this feature 
> can be implemented at some point.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (PARQUET-1827) UUID type currently not supported by parquet-mr

2020-06-03 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-1827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17125026#comment-17125026
 ] 

ASF GitHub Bot commented on PARQUET-1827:
-

shangxinli commented on a change in pull request #778:
URL: https://github.com/apache/parquet-mr/pull/778#discussion_r434631781



##
File path: 
parquet-avro/src/test/java/org/apache/parquet/avro/TestAvroSchemaConverter.java
##
@@ -766,6 +768,33 @@ public void testReuseNameInNestedStructureAtSameLevel() 
throws Exception {
 testParquetToAvroConversion(NEW_BEHAVIOR, schema, parquetSchema);
   }
 
+  @Test
+  public void testUUIDType() throws Exception {
+Schema fromAvro = Schema.createRecord("myrecord", null, null, false,
+Arrays.asList(new Schema.Field("uuid", 
LogicalTypes.uuid().addToSchema(Schema.create(STRING)), null, null)));
+String parquet = "message myrecord {\n" +
+"  required binary uuid (STRING);\n" +
+"}\n";
+Schema toAvro = Schema.createRecord("myrecord", null, null, false,
+Arrays.asList(new Schema.Field("uuid", Schema.create(STRING), null, 
null)));
+
+testAvroToParquetConversion(fromAvro, parquet);
+testParquetToAvroConversion(toAvro, parquet);
+  }

Review comment:
   Sounds good





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> UUID type currently not supported by parquet-mr
> ---
>
> Key: PARQUET-1827
> URL: https://issues.apache.org/jira/browse/PARQUET-1827
> Project: Parquet
>  Issue Type: Improvement
>  Components: parquet-mr
>Affects Versions: 1.11.0
>Reporter: Brad Smith
>Assignee: Gabor Szadovszky
>Priority: Major
>  Labels: pull-request-available
>
> The parquet-format project introduced a new UUID logical type in version 2.4:
> [https://github.com/apache/parquet-format/blob/master/CHANGES.md]
> This would be a useful type to have available in some circumstances, but it 
> currently isn't supported in the parquet-mr library. Hopefully this feature 
> can be implemented at some point.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (PARQUET-1827) UUID type currently not supported by parquet-mr

2020-06-03 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-1827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17124935#comment-17124935
 ] 

ASF GitHub Bot commented on PARQUET-1827:
-

gszadovszky commented on a change in pull request #778:
URL: https://github.com/apache/parquet-mr/pull/778#discussion_r434550078



##
File path: 
parquet-avro/src/test/java/org/apache/parquet/avro/TestAvroSchemaConverter.java
##
@@ -766,6 +768,33 @@ public void testReuseNameInNestedStructureAtSameLevel() 
throws Exception {
 testParquetToAvroConversion(NEW_BEHAVIOR, schema, parquetSchema);
   }
 
+  @Test
+  public void testUUIDType() throws Exception {
+Schema fromAvro = Schema.createRecord("myrecord", null, null, false,
+Arrays.asList(new Schema.Field("uuid", 
LogicalTypes.uuid().addToSchema(Schema.create(STRING)), null, null)));
+String parquet = "message myrecord {\n" +
+"  required binary uuid (STRING);\n" +
+"}\n";
+Schema toAvro = Schema.createRecord("myrecord", null, null, false,
+Arrays.asList(new Schema.Field("uuid", Schema.create(STRING), null, 
null)));
+
+testAvroToParquetConversion(fromAvro, parquet);
+testParquetToAvroConversion(toAvro, parquet);
+  }

Review comment:
   The 
[`testRoundTripConversion`](https://github.com/apache/parquet-mr/blob/master/parquet-avro/src/test/java/org/apache/parquet/avro/TestAvroSchemaConverter.java#L144)
 I'm using in `testUUIDTypeWithParquetUUID` is actually stronger than the one 
you suggested: it checks for equality (in two phases) of the initial and the 
result avro schemas (and not only for compatibility). For `testUUIDType`, 
though it is a good idea to check the compatibility of the avro schemas.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> UUID type currently not supported by parquet-mr
> ---
>
> Key: PARQUET-1827
> URL: https://issues.apache.org/jira/browse/PARQUET-1827
> Project: Parquet
>  Issue Type: Improvement
>  Components: parquet-mr
>Affects Versions: 1.11.0
>Reporter: Brad Smith
>Assignee: Gabor Szadovszky
>Priority: Major
>  Labels: pull-request-available
>
> The parquet-format project introduced a new UUID logical type in version 2.4:
> [https://github.com/apache/parquet-format/blob/master/CHANGES.md]
> This would be a useful type to have available in some circumstances, but it 
> currently isn't supported in the parquet-mr library. Hopefully this feature 
> can be implemented at some point.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (PARQUET-1827) UUID type currently not supported by parquet-mr

2020-06-02 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-1827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17123452#comment-17123452
 ] 

ASF GitHub Bot commented on PARQUET-1827:
-

gszadovszky commented on a change in pull request #778:
URL: https://github.com/apache/parquet-mr/pull/778#discussion_r433681475



##
File path: 
parquet-column/src/test/java/org/apache/parquet/schema/TestPrimitiveStringifier.java
##
@@ -309,6 +308,35 @@ public void testDecimalStringifier() {
 checkThrowingUnsupportedException(stringifier, Integer.TYPE, Long.TYPE, 
Binary.class);
   }
 
+  @Test
+  public void testUUIDStringifier() {
+PrimitiveStringifier stringifier = PrimitiveStringifier.UUID_STRINGIFIER;
+
+assertEquals("00112233-4455-6677-8899-aabbccddeeff", stringifier.stringify(
+toBinary(0x00, 0x11, 0x22, 0x33, 0x44, 0x55, 0x66, 0x77, 0x88, 0x99, 
0xaa, 0xbb, 0xcc, 0xdd, 0xee, 0xff)));
+assertEquals("----", stringifier.stringify(
+toBinary(0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 
0x00, 0x00, 0x00, 0x00, 0x00, 0x00)));
+assertEquals("----", stringifier.stringify(
+toBinary(0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 
0xff, 0xff, 0xff, 0xff, 0xff, 0xff)));
+
+assertEquals("0eb1497c-19b6-42bc-b028-b4b612bed141", stringifier.stringify(

Review comment:
   I'm happy to add tests for the edge cases like too short or too long 
inputs. Though, I would not implement additional validations because of 
performance issues. A `stringify` method would be invoked on each values; an 
additional check would highly impact performance even if it is only used from 
the tools and not really in production. A `Stringifier` is associated to the 
value at schema level which means it shall never happen that the value is 
invalid. That's why the `Stringifier` implementations do not validate the 
values.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> UUID type currently not supported by parquet-mr
> ---
>
> Key: PARQUET-1827
> URL: https://issues.apache.org/jira/browse/PARQUET-1827
> Project: Parquet
>  Issue Type: Improvement
>  Components: parquet-mr
>Affects Versions: 1.11.0
>Reporter: Brad Smith
>Assignee: Gabor Szadovszky
>Priority: Major
>  Labels: pull-request-available
>
> The parquet-format project introduced a new UUID logical type in version 2.4:
> [https://github.com/apache/parquet-format/blob/master/CHANGES.md]
> This would be a useful type to have available in some circumstances, but it 
> currently isn't supported in the parquet-mr library. Hopefully this feature 
> can be implemented at some point.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (PARQUET-1827) UUID type currently not supported by parquet-mr

2020-06-02 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-1827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17123450#comment-17123450
 ] 

ASF GitHub Bot commented on PARQUET-1827:
-

gszadovszky commented on a change in pull request #778:
URL: https://github.com/apache/parquet-mr/pull/778#discussion_r433678473



##
File path: 
parquet-avro/src/test/java/org/apache/parquet/avro/TestAvroSchemaConverter.java
##
@@ -766,6 +768,33 @@ public void testReuseNameInNestedStructureAtSameLevel() 
throws Exception {
 testParquetToAvroConversion(NEW_BEHAVIOR, schema, parquetSchema);
   }
 
+  @Test
+  public void testUUIDType() throws Exception {
+Schema fromAvro = Schema.createRecord("myrecord", null, null, false,
+Arrays.asList(new Schema.Field("uuid", 
LogicalTypes.uuid().addToSchema(Schema.create(STRING)), null, null)));
+String parquet = "message myrecord {\n" +
+"  required binary uuid (STRING);\n" +
+"}\n";
+Schema toAvro = Schema.createRecord("myrecord", null, null, false,
+Arrays.asList(new Schema.Field("uuid", Schema.create(STRING), null, 
null)));
+
+testAvroToParquetConversion(fromAvro, parquet);
+testParquetToAvroConversion(toAvro, parquet);
+  }

Review comment:
   I'll look into this just did not have time to work on this PR. Thanks a 
lot for reviewing. :)





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> UUID type currently not supported by parquet-mr
> ---
>
> Key: PARQUET-1827
> URL: https://issues.apache.org/jira/browse/PARQUET-1827
> Project: Parquet
>  Issue Type: Improvement
>  Components: parquet-mr
>Affects Versions: 1.11.0
>Reporter: Brad Smith
>Assignee: Gabor Szadovszky
>Priority: Major
>  Labels: pull-request-available
>
> The parquet-format project introduced a new UUID logical type in version 2.4:
> [https://github.com/apache/parquet-format/blob/master/CHANGES.md]
> This would be a useful type to have available in some circumstances, but it 
> currently isn't supported in the parquet-mr library. Hopefully this feature 
> can be implemented at some point.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (PARQUET-1827) UUID type currently not supported by parquet-mr

2020-05-29 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-1827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17119921#comment-17119921
 ] 

ASF GitHub Bot commented on PARQUET-1827:
-

shangxinli commented on a change in pull request #778:
URL: https://github.com/apache/parquet-mr/pull/778#discussion_r432725762



##
File path: parquet-avro/README.md
##
@@ -0,0 +1,44 @@
+
> Key: PARQUET-1827
> URL: https://issues.apache.org/jira/browse/PARQUET-1827
> Project: Parquet
>  Issue Type: Improvement
>  Components: parquet-mr
>Affects Versions: 1.11.0
>Reporter: Brad Smith
>Assignee: Gabor Szadovszky
>Priority: Major
>  Labels: pull-request-available
>
> The parquet-format project introduced a new UUID logical type in version 2.4:
> [https://github.com/apache/parquet-format/blob/master/CHANGES.md]
> This would be a useful type to have available in some circumstances, but it 
> currently isn't supported in the parquet-mr library. Hopefully this feature 
> can be implemented at some point.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (PARQUET-1827) UUID type currently not supported by parquet-mr

2020-05-29 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-1827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17119920#comment-17119920
 ] 

ASF GitHub Bot commented on PARQUET-1827:
-

shangxinli commented on a change in pull request #778:
URL: https://github.com/apache/parquet-mr/pull/778#discussion_r432725661



##
File path: 
parquet-avro/src/test/java/org/apache/parquet/avro/TestAvroSchemaConverter.java
##
@@ -766,6 +768,33 @@ public void testReuseNameInNestedStructureAtSameLevel() 
throws Exception {
 testParquetToAvroConversion(NEW_BEHAVIOR, schema, parquetSchema);
   }
 
+  @Test
+  public void testUUIDType() throws Exception {
+Schema fromAvro = Schema.createRecord("myrecord", null, null, false,
+Arrays.asList(new Schema.Field("uuid", 
LogicalTypes.uuid().addToSchema(Schema.create(STRING)), null, null)));
+String parquet = "message myrecord {\n" +
+"  required binary uuid (STRING);\n" +
+"}\n";
+Schema toAvro = Schema.createRecord("myrecord", null, null, false,
+Arrays.asList(new Schema.Field("uuid", Schema.create(STRING), null, 
null)));
+
+testAvroToParquetConversion(fromAvro, parquet);
+testParquetToAvroConversion(toAvro, parquet);
+  }

Review comment:
   If it is too much effort for doing this, it is OK not to do it. It is a 
lower priority. 





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> UUID type currently not supported by parquet-mr
> ---
>
> Key: PARQUET-1827
> URL: https://issues.apache.org/jira/browse/PARQUET-1827
> Project: Parquet
>  Issue Type: Improvement
>  Components: parquet-mr
>Affects Versions: 1.11.0
>Reporter: Brad Smith
>Assignee: Gabor Szadovszky
>Priority: Major
>  Labels: pull-request-available
>
> The parquet-format project introduced a new UUID logical type in version 2.4:
> [https://github.com/apache/parquet-format/blob/master/CHANGES.md]
> This would be a useful type to have available in some circumstances, but it 
> currently isn't supported in the parquet-mr library. Hopefully this feature 
> can be implemented at some point.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (PARQUET-1827) UUID type currently not supported by parquet-mr

2020-05-09 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-1827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17103555#comment-17103555
 ] 

ASF GitHub Bot commented on PARQUET-1827:
-

shangxinli commented on a change in pull request #778:
URL: https://github.com/apache/parquet-mr/pull/778#discussion_r422564200



##
File path: 
parquet-column/src/test/java/org/apache/parquet/schema/TestPrimitiveStringifier.java
##
@@ -309,6 +308,35 @@ public void testDecimalStringifier() {
 checkThrowingUnsupportedException(stringifier, Integer.TYPE, Long.TYPE, 
Binary.class);
   }
 
+  @Test
+  public void testUUIDStringifier() {
+PrimitiveStringifier stringifier = PrimitiveStringifier.UUID_STRINGIFIER;
+
+assertEquals("00112233-4455-6677-8899-aabbccddeeff", stringifier.stringify(
+toBinary(0x00, 0x11, 0x22, 0x33, 0x44, 0x55, 0x66, 0x77, 0x88, 0x99, 
0xaa, 0xbb, 0xcc, 0xdd, 0xee, 0xff)));
+assertEquals("----", stringifier.stringify(
+toBinary(0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 
0x00, 0x00, 0x00, 0x00, 0x00, 0x00)));
+assertEquals("----", stringifier.stringify(
+toBinary(0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 
0xff, 0xff, 0xff, 0xff, 0xff, 0xff)));
+
+assertEquals("0eb1497c-19b6-42bc-b028-b4b612bed141", stringifier.stringify(

Review comment:
   By duplicate coverage, I meant the #2, #3 tests seems repeating the same 
test as #1. The value is different, but when the test executes, they would 
execute the same code path. So I think they won't provide extra coverage. 
   
   From test perspective, negative test does provide values. In this case, we 
can test the exception is thrown as expected if it is too short. 
   
   For "if the array is longer than 16 it would simply stringify the first 16 
and skip the others", that could cause silent errors, right? 
   





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> UUID type currently not supported by parquet-mr
> ---
>
> Key: PARQUET-1827
> URL: https://issues.apache.org/jira/browse/PARQUET-1827
> Project: Parquet
>  Issue Type: Improvement
>  Components: parquet-mr
>Affects Versions: 1.11.0
>Reporter: Brad Smith
>Assignee: Gabor Szadovszky
>Priority: Major
>  Labels: pull-request-available
>
> The parquet-format project introduced a new UUID logical type in version 2.4:
> [https://github.com/apache/parquet-format/blob/master/CHANGES.md]
> This would be a useful type to have available in some circumstances, but it 
> currently isn't supported in the parquet-mr library. Hopefully this feature 
> can be implemented at some point.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (PARQUET-1827) UUID type currently not supported by parquet-mr

2020-05-09 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-1827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17103548#comment-17103548
 ] 

ASF GitHub Bot commented on PARQUET-1827:
-

shangxinli commented on a change in pull request #778:
URL: https://github.com/apache/parquet-mr/pull/778#discussion_r422563108



##
File path: 
parquet-avro/src/test/java/org/apache/parquet/avro/TestAvroSchemaConverter.java
##
@@ -766,6 +768,33 @@ public void testReuseNameInNestedStructureAtSameLevel() 
throws Exception {
 testParquetToAvroConversion(NEW_BEHAVIOR, schema, parquetSchema);
   }
 
+  @Test
+  public void testUUIDType() throws Exception {
+Schema fromAvro = Schema.createRecord("myrecord", null, null, false,
+Arrays.asList(new Schema.Field("uuid", 
LogicalTypes.uuid().addToSchema(Schema.create(STRING)), null, null)));
+String parquet = "message myrecord {\n" +
+"  required binary uuid (STRING);\n" +
+"}\n";
+Schema toAvro = Schema.createRecord("myrecord", null, null, false,
+Arrays.asList(new Schema.Field("uuid", Schema.create(STRING), null, 
null)));
+
+testAvroToParquetConversion(fromAvro, parquet);
+testParquetToAvroConversion(toAvro, parquet);
+  }

Review comment:
   Basically we found some type of avro schema is not compatible with the 
parquet schema which it is converted to. This caused problem that the data 
cannot be read. I have a test here 
(https://github.com/shangxinli/parquet-mr/commit/f80469f55b83404ea334ee4019f658ecdb5ac575#diff-536ca67880a7870cf8df8f95143bd7d7R814)
 that reproduce the issue for a nested schema. I know likely UUID type won't 
have this issue but I better to have a test for it. It is pretty easy to add 
also.  





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> UUID type currently not supported by parquet-mr
> ---
>
> Key: PARQUET-1827
> URL: https://issues.apache.org/jira/browse/PARQUET-1827
> Project: Parquet
>  Issue Type: Improvement
>  Components: parquet-mr
>Affects Versions: 1.11.0
>Reporter: Brad Smith
>Assignee: Gabor Szadovszky
>Priority: Major
>  Labels: pull-request-available
>
> The parquet-format project introduced a new UUID logical type in version 2.4:
> [https://github.com/apache/parquet-format/blob/master/CHANGES.md]
> This would be a useful type to have available in some circumstances, but it 
> currently isn't supported in the parquet-mr library. Hopefully this feature 
> can be implemented at some point.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (PARQUET-1827) UUID type currently not supported by parquet-mr

2020-05-07 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-1827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17101496#comment-17101496
 ] 

ASF GitHub Bot commented on PARQUET-1827:
-

gszadovszky commented on a change in pull request #778:
URL: https://github.com/apache/parquet-mr/pull/778#discussion_r421389575



##
File path: 
parquet-avro/src/test/java/org/apache/parquet/avro/TestAvroSchemaConverter.java
##
@@ -766,6 +768,33 @@ public void testReuseNameInNestedStructureAtSameLevel() 
throws Exception {
 testParquetToAvroConversion(NEW_BEHAVIOR, schema, parquetSchema);
   }
 
+  @Test
+  public void testUUIDType() throws Exception {
+Schema fromAvro = Schema.createRecord("myrecord", null, null, false,
+Arrays.asList(new Schema.Field("uuid", 
LogicalTypes.uuid().addToSchema(Schema.create(STRING)), null, null)));
+String parquet = "message myrecord {\n" +
+"  required binary uuid (STRING);\n" +
+"}\n";
+Schema toAvro = Schema.createRecord("myrecord", null, null, false,
+Arrays.asList(new Schema.Field("uuid", Schema.create(STRING), null, 
null)));
+
+testAvroToParquetConversion(fromAvro, parquet);
+testParquetToAvroConversion(toAvro, parquet);
+  }

Review comment:
   To be honest I am not too familiar with parquet-avro. I've made the 
changes based on the implementation/test of other logical types. Could you 
explain it in more details what you would test exactly?





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> UUID type currently not supported by parquet-mr
> ---
>
> Key: PARQUET-1827
> URL: https://issues.apache.org/jira/browse/PARQUET-1827
> Project: Parquet
>  Issue Type: Improvement
>  Components: parquet-mr
>Affects Versions: 1.11.0
>Reporter: Brad Smith
>Assignee: Gabor Szadovszky
>Priority: Major
>  Labels: pull-request-available
>
> The parquet-format project introduced a new UUID logical type in version 2.4:
> [https://github.com/apache/parquet-format/blob/master/CHANGES.md]
> This would be a useful type to have available in some circumstances, but it 
> currently isn't supported in the parquet-mr library. Hopefully this feature 
> can be implemented at some point.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (PARQUET-1827) UUID type currently not supported by parquet-mr

2020-05-07 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-1827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17101490#comment-17101490
 ] 

ASF GitHub Bot commented on PARQUET-1827:
-

gszadovszky commented on a change in pull request #778:
URL: https://github.com/apache/parquet-mr/pull/778#discussion_r421384119



##
File path: parquet-avro/README.md
##
@@ -0,0 +1,44 @@
+
> Key: PARQUET-1827
> URL: https://issues.apache.org/jira/browse/PARQUET-1827
> Project: Parquet
>  Issue Type: Improvement
>  Components: parquet-mr
>Affects Versions: 1.11.0
>Reporter: Brad Smith
>Assignee: Gabor Szadovszky
>Priority: Major
>  Labels: pull-request-available
>
> The parquet-format project introduced a new UUID logical type in version 2.4:
> [https://github.com/apache/parquet-format/blob/master/CHANGES.md]
> This would be a useful type to have available in some circumstances, but it 
> currently isn't supported in the parquet-mr library. Hopefully this feature 
> can be implemented at some point.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (PARQUET-1827) UUID type currently not supported by parquet-mr

2020-05-07 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-1827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17101484#comment-17101484
 ] 

ASF GitHub Bot commented on PARQUET-1827:
-

gszadovszky commented on a change in pull request #778:
URL: https://github.com/apache/parquet-mr/pull/778#discussion_r421381112



##
File path: 
parquet-column/src/test/java/org/apache/parquet/schema/TestPrimitiveStringifier.java
##
@@ -309,6 +308,35 @@ public void testDecimalStringifier() {
 checkThrowingUnsupportedException(stringifier, Integer.TYPE, Long.TYPE, 
Binary.class);
   }
 
+  @Test
+  public void testUUIDStringifier() {
+PrimitiveStringifier stringifier = PrimitiveStringifier.UUID_STRINGIFIER;
+
+assertEquals("00112233-4455-6677-8899-aabbccddeeff", stringifier.stringify(
+toBinary(0x00, 0x11, 0x22, 0x33, 0x44, 0x55, 0x66, 0x77, 0x88, 0x99, 
0xaa, 0xbb, 0xcc, 0xdd, 0xee, 0xff)));
+assertEquals("----", stringifier.stringify(
+toBinary(0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 
0x00, 0x00, 0x00, 0x00, 0x00, 0x00)));
+assertEquals("----", stringifier.stringify(
+toBinary(0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 
0xff, 0xff, 0xff, 0xff, 0xff, 0xff)));
+
+assertEquals("0eb1497c-19b6-42bc-b028-b4b612bed141", stringifier.stringify(

Review comment:
   The idea is to have 3 kind-of corner cases and 3 common (random but 
constant) values. What do you mean by duplicate coverage? (I think, we do not 
need exhaustive testing for the stringifiers because they only used by our 
tools for debugging purposes.)
   
   The stringifiers do not validate the data they get because of performance 
reasons. So, if the array is longer than 16 it would simply stringify the first 
16 and skip the others. In case of the length is too short then an 
`ArrayIndexOutOfBoundsException` would be thrown. Do you think we should test 
these cases? They would not reach any additional branches in the parquet code.
   Invalid characters are not possible. The full set of values of the 16 bytes 
array is covered in UUID.





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> UUID type currently not supported by parquet-mr
> ---
>
> Key: PARQUET-1827
> URL: https://issues.apache.org/jira/browse/PARQUET-1827
> Project: Parquet
>  Issue Type: Improvement
>  Components: parquet-mr
>Affects Versions: 1.11.0
>Reporter: Brad Smith
>Assignee: Gabor Szadovszky
>Priority: Major
>  Labels: pull-request-available
>
> The parquet-format project introduced a new UUID logical type in version 2.4:
> [https://github.com/apache/parquet-format/blob/master/CHANGES.md]
> This would be a useful type to have available in some circumstances, but it 
> currently isn't supported in the parquet-mr library. Hopefully this feature 
> can be implemented at some point.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (PARQUET-1827) UUID type currently not supported by parquet-mr

2020-05-02 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-1827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17098048#comment-17098048
 ] 

ASF GitHub Bot commented on PARQUET-1827:
-

shangxinli commented on a change in pull request #778:
URL: https://github.com/apache/parquet-mr/pull/778#discussion_r418987550



##
File path: 
parquet-avro/src/test/java/org/apache/parquet/avro/TestAvroSchemaConverter.java
##
@@ -766,6 +768,33 @@ public void testReuseNameInNestedStructureAtSameLevel() 
throws Exception {
 testParquetToAvroConversion(NEW_BEHAVIOR, schema, parquetSchema);
   }
 
+  @Test
+  public void testUUIDType() throws Exception {
+Schema fromAvro = Schema.createRecord("myrecord", null, null, false,
+Arrays.asList(new Schema.Field("uuid", 
LogicalTypes.uuid().addToSchema(Schema.create(STRING)), null, null)));
+String parquet = "message myrecord {\n" +
+"  required binary uuid (STRING);\n" +
+"}\n";
+Schema toAvro = Schema.createRecord("myrecord", null, null, false,
+Arrays.asList(new Schema.Field("uuid", Schema.create(STRING), null, 
null)));
+
+testAvroToParquetConversion(fromAvro, parquet);
+testParquetToAvroConversion(toAvro, parquet);
+  }

Review comment:
   Can we have checkReaderWriterCompatibility() to verify if the parquet 
and avro schema are compatible for UUID?  
   There are issues like PARQUET-1681 for avro schema and parquet schema 
conversion for other types. 





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> UUID type currently not supported by parquet-mr
> ---
>
> Key: PARQUET-1827
> URL: https://issues.apache.org/jira/browse/PARQUET-1827
> Project: Parquet
>  Issue Type: Improvement
>  Components: parquet-mr
>Affects Versions: 1.11.0
>Reporter: Brad Smith
>Assignee: Gabor Szadovszky
>Priority: Major
>  Labels: pull-request-available
>
> The parquet-format project introduced a new UUID logical type in version 2.4:
> [https://github.com/apache/parquet-format/blob/master/CHANGES.md]
> This would be a useful type to have available in some circumstances, but it 
> currently isn't supported in the parquet-mr library. Hopefully this feature 
> can be implemented at some point.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (PARQUET-1827) UUID type currently not supported by parquet-mr

2020-05-02 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-1827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17098032#comment-17098032
 ] 

ASF GitHub Bot commented on PARQUET-1827:
-

shangxinli commented on a change in pull request #778:
URL: https://github.com/apache/parquet-mr/pull/778#discussion_r418985058



##
File path: parquet-avro/README.md
##
@@ -0,0 +1,44 @@
+
> Key: PARQUET-1827
> URL: https://issues.apache.org/jira/browse/PARQUET-1827
> Project: Parquet
>  Issue Type: Improvement
>  Components: parquet-mr
>Affects Versions: 1.11.0
>Reporter: Brad Smith
>Assignee: Gabor Szadovszky
>Priority: Major
>  Labels: pull-request-available
>
> The parquet-format project introduced a new UUID logical type in version 2.4:
> [https://github.com/apache/parquet-format/blob/master/CHANGES.md]
> This would be a useful type to have available in some circumstances, but it 
> currently isn't supported in the parquet-mr library. Hopefully this feature 
> can be implemented at some point.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (PARQUET-1827) UUID type currently not supported by parquet-mr

2020-05-02 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-1827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17098030#comment-17098030
 ] 

ASF GitHub Bot commented on PARQUET-1827:
-

shangxinli commented on a change in pull request #778:
URL: https://github.com/apache/parquet-mr/pull/778#discussion_r418984184



##
File path: 
parquet-column/src/test/java/org/apache/parquet/schema/TestPrimitiveStringifier.java
##
@@ -309,6 +308,35 @@ public void testDecimalStringifier() {
 checkThrowingUnsupportedException(stringifier, Integer.TYPE, Long.TYPE, 
Binary.class);
   }
 
+  @Test
+  public void testUUIDStringifier() {
+PrimitiveStringifier stringifier = PrimitiveStringifier.UUID_STRINGIFIER;
+
+assertEquals("00112233-4455-6677-8899-aabbccddeeff", stringifier.stringify(
+toBinary(0x00, 0x11, 0x22, 0x33, 0x44, 0x55, 0x66, 0x77, 0x88, 0x99, 
0xaa, 0xbb, 0xcc, 0xdd, 0xee, 0xff)));
+assertEquals("----", stringifier.stringify(
+toBinary(0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 
0x00, 0x00, 0x00, 0x00, 0x00, 0x00)));
+assertEquals("----", stringifier.stringify(
+toBinary(0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 
0xff, 0xff, 0xff, 0xff, 0xff, 0xff)));
+
+assertEquals("0eb1497c-19b6-42bc-b028-b4b612bed141", stringifier.stringify(

Review comment:
   Are the 3 test data are randomly chosen? It seems duplicate coverage. 
   
   Could you add some negative tests like incorrect length uuids, invalid 
characters etc.  





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> UUID type currently not supported by parquet-mr
> ---
>
> Key: PARQUET-1827
> URL: https://issues.apache.org/jira/browse/PARQUET-1827
> Project: Parquet
>  Issue Type: Improvement
>  Components: parquet-mr
>Affects Versions: 1.11.0
>Reporter: Brad Smith
>Assignee: Gabor Szadovszky
>Priority: Major
>  Labels: pull-request-available
>
> The parquet-format project introduced a new UUID logical type in version 2.4:
> [https://github.com/apache/parquet-format/blob/master/CHANGES.md]
> This would be a useful type to have available in some circumstances, but it 
> currently isn't supported in the parquet-mr library. Hopefully this feature 
> can be implemented at some point.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (PARQUET-1827) UUID type currently not supported by parquet-mr

2020-05-02 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-1827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17098025#comment-17098025
 ] 

ASF GitHub Bot commented on PARQUET-1827:
-

shangxinli commented on a change in pull request #778:
URL: https://github.com/apache/parquet-mr/pull/778#discussion_r418981845



##
File path: 
parquet-column/src/main/java/org/apache/parquet/schema/PrimitiveStringifier.java
##
@@ -421,4 +422,30 @@ private String stringifyWithScale(BigInteger i) {
   }
 };
   }
+
+  static final PrimitiveStringifier UUID_STRINGIFIER = new 
PrimitiveStringifier("UUID_STRINGIFIER") {
+private final char[] digit = "0123456789abcdef".toCharArray();
+@Override
+public String stringify(Binary value) {
+  byte[] bytes = value.getBytesUnsafe();

Review comment:
   Do you want to make 'bytes' final since you are calling 
getBytesUnsafe()? 





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> UUID type currently not supported by parquet-mr
> ---
>
> Key: PARQUET-1827
> URL: https://issues.apache.org/jira/browse/PARQUET-1827
> Project: Parquet
>  Issue Type: Improvement
>  Components: parquet-mr
>Affects Versions: 1.11.0
>Reporter: Brad Smith
>Assignee: Gabor Szadovszky
>Priority: Major
>  Labels: pull-request-available
>
> The parquet-format project introduced a new UUID logical type in version 2.4:
> [https://github.com/apache/parquet-format/blob/master/CHANGES.md]
> This would be a useful type to have available in some circumstances, but it 
> currently isn't supported in the parquet-mr library. Hopefully this feature 
> can be implemented at some point.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (PARQUET-1827) UUID type currently not supported by parquet-mr

2020-05-02 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-1827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17098019#comment-17098019
 ] 

ASF GitHub Bot commented on PARQUET-1827:
-

shangxinli commented on a change in pull request #778:
URL: https://github.com/apache/parquet-mr/pull/778#discussion_r418980107



##
File path: 
parquet-column/src/main/java/org/apache/parquet/schema/LogicalTypeAnnotation.java
##
@@ -861,6 +871,36 @@ PrimitiveStringifier valueStringifier(PrimitiveType 
primitiveType) {
 }
   }
 
+  public static class UUIDLogicalTypeAnnotation extends LogicalTypeAnnotation {

Review comment:
   Why there is no implementation for hasCode() and equal()? 





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> UUID type currently not supported by parquet-mr
> ---
>
> Key: PARQUET-1827
> URL: https://issues.apache.org/jira/browse/PARQUET-1827
> Project: Parquet
>  Issue Type: Improvement
>  Components: parquet-mr
>Affects Versions: 1.11.0
>Reporter: Brad Smith
>Assignee: Gabor Szadovszky
>Priority: Major
>  Labels: pull-request-available
>
> The parquet-format project introduced a new UUID logical type in version 2.4:
> [https://github.com/apache/parquet-format/blob/master/CHANGES.md]
> This would be a useful type to have available in some circumstances, but it 
> currently isn't supported in the parquet-mr library. Hopefully this feature 
> can be implemented at some point.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Commented] (PARQUET-1827) UUID type currently not supported by parquet-mr

2020-04-03 Thread ASF GitHub Bot (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-1827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17074575#comment-17074575
 ] 

ASF GitHub Bot commented on PARQUET-1827:
-

gszadovszky commented on pull request #778: PARQUET-1827: UUID type currently 
not supported by parquet-mr
URL: https://github.com/apache/parquet-mr/pull/778
 
 
   Make sure you have checked _all_ steps below.
   
   ### Jira
   
   - [ ] My PR addresses the following [Parquet 
Jira](https://issues.apache.org/jira/browse/PARQUET/) issues and references 
them in the PR title. For example, "PARQUET-1234: My Parquet PR"
 - https://issues.apache.org/jira/browse/PARQUET-XXX
 - In case you are adding a dependency, check if the license complies with 
the [ASF 3rd Party License 
Policy](https://www.apache.org/legal/resolved.html#category-x).
   
   ### Tests
   
   - [ ] My PR adds the following unit tests __OR__ does not need testing for 
this extremely good reason:
   
   ### Commits
   
   - [ ] My commits all reference Jira issues in their subject lines. In 
addition, my commits follow the guidelines from "[How to write a good git 
commit message](http://chris.beams.io/posts/git-commit/)":
 1. Subject is separated from body by a blank line
 1. Subject is limited to 50 characters (not including Jira issue reference)
 1. Subject does not end with a period
 1. Subject uses the imperative mood ("add", not "adding")
 1. Body wraps at 72 characters
 1. Body explains "what" and "why", not "how"
   
   ### Documentation
   
   - [ ] In case of new functionality, my PR adds documentation that describes 
how to use it.
 - All the public functions and the classes in the PR contain Javadoc that 
explain what it does
   
 

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> UUID type currently not supported by parquet-mr
> ---
>
> Key: PARQUET-1827
> URL: https://issues.apache.org/jira/browse/PARQUET-1827
> Project: Parquet
>  Issue Type: Improvement
>  Components: parquet-mr
>Affects Versions: 1.11.0
>Reporter: Brad Smith
>Assignee: Gabor Szadovszky
>Priority: Major
>  Labels: pull-request-available
>
> The parquet-format project introduced a new UUID logical type in version 2.4:
> [https://github.com/apache/parquet-format/blob/master/CHANGES.md]
> This would be a useful type to have available in some circumstances, but it 
> currently isn't supported in the parquet-mr library. Hopefully this feature 
> can be implemented at some point.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)