[jira] [Commented] (PARQUET-1827) UUID type currently not supported by parquet-mr
[ https://issues.apache.org/jira/browse/PARQUET-1827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17313524#comment-17313524 ] Xinli Shang commented on PARQUET-1827: -- It seems the storage size is reduced by ~8% for the UUID column. > UUID type currently not supported by parquet-mr > --- > > Key: PARQUET-1827 > URL: https://issues.apache.org/jira/browse/PARQUET-1827 > Project: Parquet > Issue Type: Improvement > Components: parquet-mr >Affects Versions: 1.11.0 >Reporter: Brad Smith >Assignee: Gabor Szadovszky >Priority: Major > Labels: pull-request-available > Fix For: 1.12.0 > > > The parquet-format project introduced a new UUID logical type in version 2.4: > [https://github.com/apache/parquet-format/blob/master/CHANGES.md] > This would be a useful type to have available in some circumstances, but it > currently isn't supported in the parquet-mr library. Hopefully this feature > can be implemented at some point. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (PARQUET-1827) UUID type currently not supported by parquet-mr
[ https://issues.apache.org/jira/browse/PARQUET-1827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17258287#comment-17258287 ] Gabor Szadovszky commented on PARQUET-1827: --- [~vitalii], there are some open points for 1.12.0. I think we will know more about a release date after the parquet sync next week. > UUID type currently not supported by parquet-mr > --- > > Key: PARQUET-1827 > URL: https://issues.apache.org/jira/browse/PARQUET-1827 > Project: Parquet > Issue Type: Improvement > Components: parquet-mr >Affects Versions: 1.11.0 >Reporter: Brad Smith >Assignee: Gabor Szadovszky >Priority: Major > Labels: pull-request-available > Fix For: 1.12.0 > > > The parquet-format project introduced a new UUID logical type in version 2.4: > [https://github.com/apache/parquet-format/blob/master/CHANGES.md] > This would be a useful type to have available in some circumstances, but it > currently isn't supported in the parquet-mr library. Hopefully this feature > can be implemented at some point. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (PARQUET-1827) UUID type currently not supported by parquet-mr
[ https://issues.apache.org/jira/browse/PARQUET-1827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17255609#comment-17255609 ] Vitalii Diravka commented on PARQUET-1827: -- [~gszadovszky] I found it is still not present in 1.11.1 parquet-mr version. Is targeted to 1.12.0 version? If yes, could you point it is in Jira ticket please. When parquet-mr 1.12.0 version is planned to be released? > UUID type currently not supported by parquet-mr > --- > > Key: PARQUET-1827 > URL: https://issues.apache.org/jira/browse/PARQUET-1827 > Project: Parquet > Issue Type: Improvement > Components: parquet-mr >Affects Versions: 1.11.0 >Reporter: Brad Smith >Assignee: Gabor Szadovszky >Priority: Major > Labels: pull-request-available > > The parquet-format project introduced a new UUID logical type in version 2.4: > [https://github.com/apache/parquet-format/blob/master/CHANGES.md] > This would be a useful type to have available in some circumstances, but it > currently isn't supported in the parquet-mr library. Hopefully this feature > can be implemented at some point. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (PARQUET-1827) UUID type currently not supported by parquet-mr
[ https://issues.apache.org/jira/browse/PARQUET-1827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17125599#comment-17125599 ] ASF GitHub Bot commented on PARQUET-1827: - gszadovszky merged pull request #778: URL: https://github.com/apache/parquet-mr/pull/778 This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > UUID type currently not supported by parquet-mr > --- > > Key: PARQUET-1827 > URL: https://issues.apache.org/jira/browse/PARQUET-1827 > Project: Parquet > Issue Type: Improvement > Components: parquet-mr >Affects Versions: 1.11.0 >Reporter: Brad Smith >Assignee: Gabor Szadovszky >Priority: Major > Labels: pull-request-available > > The parquet-format project introduced a new UUID logical type in version 2.4: > [https://github.com/apache/parquet-format/blob/master/CHANGES.md] > This would be a useful type to have available in some circumstances, but it > currently isn't supported in the parquet-mr library. Hopefully this feature > can be implemented at some point. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (PARQUET-1827) UUID type currently not supported by parquet-mr
[ https://issues.apache.org/jira/browse/PARQUET-1827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17125026#comment-17125026 ] ASF GitHub Bot commented on PARQUET-1827: - shangxinli commented on a change in pull request #778: URL: https://github.com/apache/parquet-mr/pull/778#discussion_r434631781 ## File path: parquet-avro/src/test/java/org/apache/parquet/avro/TestAvroSchemaConverter.java ## @@ -766,6 +768,33 @@ public void testReuseNameInNestedStructureAtSameLevel() throws Exception { testParquetToAvroConversion(NEW_BEHAVIOR, schema, parquetSchema); } + @Test + public void testUUIDType() throws Exception { +Schema fromAvro = Schema.createRecord("myrecord", null, null, false, +Arrays.asList(new Schema.Field("uuid", LogicalTypes.uuid().addToSchema(Schema.create(STRING)), null, null))); +String parquet = "message myrecord {\n" + +" required binary uuid (STRING);\n" + +"}\n"; +Schema toAvro = Schema.createRecord("myrecord", null, null, false, +Arrays.asList(new Schema.Field("uuid", Schema.create(STRING), null, null))); + +testAvroToParquetConversion(fromAvro, parquet); +testParquetToAvroConversion(toAvro, parquet); + } Review comment: Sounds good This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > UUID type currently not supported by parquet-mr > --- > > Key: PARQUET-1827 > URL: https://issues.apache.org/jira/browse/PARQUET-1827 > Project: Parquet > Issue Type: Improvement > Components: parquet-mr >Affects Versions: 1.11.0 >Reporter: Brad Smith >Assignee: Gabor Szadovszky >Priority: Major > Labels: pull-request-available > > The parquet-format project introduced a new UUID logical type in version 2.4: > [https://github.com/apache/parquet-format/blob/master/CHANGES.md] > This would be a useful type to have available in some circumstances, but it > currently isn't supported in the parquet-mr library. Hopefully this feature > can be implemented at some point. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (PARQUET-1827) UUID type currently not supported by parquet-mr
[ https://issues.apache.org/jira/browse/PARQUET-1827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17124935#comment-17124935 ] ASF GitHub Bot commented on PARQUET-1827: - gszadovszky commented on a change in pull request #778: URL: https://github.com/apache/parquet-mr/pull/778#discussion_r434550078 ## File path: parquet-avro/src/test/java/org/apache/parquet/avro/TestAvroSchemaConverter.java ## @@ -766,6 +768,33 @@ public void testReuseNameInNestedStructureAtSameLevel() throws Exception { testParquetToAvroConversion(NEW_BEHAVIOR, schema, parquetSchema); } + @Test + public void testUUIDType() throws Exception { +Schema fromAvro = Schema.createRecord("myrecord", null, null, false, +Arrays.asList(new Schema.Field("uuid", LogicalTypes.uuid().addToSchema(Schema.create(STRING)), null, null))); +String parquet = "message myrecord {\n" + +" required binary uuid (STRING);\n" + +"}\n"; +Schema toAvro = Schema.createRecord("myrecord", null, null, false, +Arrays.asList(new Schema.Field("uuid", Schema.create(STRING), null, null))); + +testAvroToParquetConversion(fromAvro, parquet); +testParquetToAvroConversion(toAvro, parquet); + } Review comment: The [`testRoundTripConversion`](https://github.com/apache/parquet-mr/blob/master/parquet-avro/src/test/java/org/apache/parquet/avro/TestAvroSchemaConverter.java#L144) I'm using in `testUUIDTypeWithParquetUUID` is actually stronger than the one you suggested: it checks for equality (in two phases) of the initial and the result avro schemas (and not only for compatibility). For `testUUIDType`, though it is a good idea to check the compatibility of the avro schemas. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > UUID type currently not supported by parquet-mr > --- > > Key: PARQUET-1827 > URL: https://issues.apache.org/jira/browse/PARQUET-1827 > Project: Parquet > Issue Type: Improvement > Components: parquet-mr >Affects Versions: 1.11.0 >Reporter: Brad Smith >Assignee: Gabor Szadovszky >Priority: Major > Labels: pull-request-available > > The parquet-format project introduced a new UUID logical type in version 2.4: > [https://github.com/apache/parquet-format/blob/master/CHANGES.md] > This would be a useful type to have available in some circumstances, but it > currently isn't supported in the parquet-mr library. Hopefully this feature > can be implemented at some point. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (PARQUET-1827) UUID type currently not supported by parquet-mr
[ https://issues.apache.org/jira/browse/PARQUET-1827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17123452#comment-17123452 ] ASF GitHub Bot commented on PARQUET-1827: - gszadovszky commented on a change in pull request #778: URL: https://github.com/apache/parquet-mr/pull/778#discussion_r433681475 ## File path: parquet-column/src/test/java/org/apache/parquet/schema/TestPrimitiveStringifier.java ## @@ -309,6 +308,35 @@ public void testDecimalStringifier() { checkThrowingUnsupportedException(stringifier, Integer.TYPE, Long.TYPE, Binary.class); } + @Test + public void testUUIDStringifier() { +PrimitiveStringifier stringifier = PrimitiveStringifier.UUID_STRINGIFIER; + +assertEquals("00112233-4455-6677-8899-aabbccddeeff", stringifier.stringify( +toBinary(0x00, 0x11, 0x22, 0x33, 0x44, 0x55, 0x66, 0x77, 0x88, 0x99, 0xaa, 0xbb, 0xcc, 0xdd, 0xee, 0xff))); +assertEquals("----", stringifier.stringify( +toBinary(0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00))); +assertEquals("----", stringifier.stringify( +toBinary(0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff))); + +assertEquals("0eb1497c-19b6-42bc-b028-b4b612bed141", stringifier.stringify( Review comment: I'm happy to add tests for the edge cases like too short or too long inputs. Though, I would not implement additional validations because of performance issues. A `stringify` method would be invoked on each values; an additional check would highly impact performance even if it is only used from the tools and not really in production. A `Stringifier` is associated to the value at schema level which means it shall never happen that the value is invalid. That's why the `Stringifier` implementations do not validate the values. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > UUID type currently not supported by parquet-mr > --- > > Key: PARQUET-1827 > URL: https://issues.apache.org/jira/browse/PARQUET-1827 > Project: Parquet > Issue Type: Improvement > Components: parquet-mr >Affects Versions: 1.11.0 >Reporter: Brad Smith >Assignee: Gabor Szadovszky >Priority: Major > Labels: pull-request-available > > The parquet-format project introduced a new UUID logical type in version 2.4: > [https://github.com/apache/parquet-format/blob/master/CHANGES.md] > This would be a useful type to have available in some circumstances, but it > currently isn't supported in the parquet-mr library. Hopefully this feature > can be implemented at some point. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (PARQUET-1827) UUID type currently not supported by parquet-mr
[ https://issues.apache.org/jira/browse/PARQUET-1827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17123450#comment-17123450 ] ASF GitHub Bot commented on PARQUET-1827: - gszadovszky commented on a change in pull request #778: URL: https://github.com/apache/parquet-mr/pull/778#discussion_r433678473 ## File path: parquet-avro/src/test/java/org/apache/parquet/avro/TestAvroSchemaConverter.java ## @@ -766,6 +768,33 @@ public void testReuseNameInNestedStructureAtSameLevel() throws Exception { testParquetToAvroConversion(NEW_BEHAVIOR, schema, parquetSchema); } + @Test + public void testUUIDType() throws Exception { +Schema fromAvro = Schema.createRecord("myrecord", null, null, false, +Arrays.asList(new Schema.Field("uuid", LogicalTypes.uuid().addToSchema(Schema.create(STRING)), null, null))); +String parquet = "message myrecord {\n" + +" required binary uuid (STRING);\n" + +"}\n"; +Schema toAvro = Schema.createRecord("myrecord", null, null, false, +Arrays.asList(new Schema.Field("uuid", Schema.create(STRING), null, null))); + +testAvroToParquetConversion(fromAvro, parquet); +testParquetToAvroConversion(toAvro, parquet); + } Review comment: I'll look into this just did not have time to work on this PR. Thanks a lot for reviewing. :) This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > UUID type currently not supported by parquet-mr > --- > > Key: PARQUET-1827 > URL: https://issues.apache.org/jira/browse/PARQUET-1827 > Project: Parquet > Issue Type: Improvement > Components: parquet-mr >Affects Versions: 1.11.0 >Reporter: Brad Smith >Assignee: Gabor Szadovszky >Priority: Major > Labels: pull-request-available > > The parquet-format project introduced a new UUID logical type in version 2.4: > [https://github.com/apache/parquet-format/blob/master/CHANGES.md] > This would be a useful type to have available in some circumstances, but it > currently isn't supported in the parquet-mr library. Hopefully this feature > can be implemented at some point. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (PARQUET-1827) UUID type currently not supported by parquet-mr
[ https://issues.apache.org/jira/browse/PARQUET-1827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17119921#comment-17119921 ] ASF GitHub Bot commented on PARQUET-1827: - shangxinli commented on a change in pull request #778: URL: https://github.com/apache/parquet-mr/pull/778#discussion_r432725762 ## File path: parquet-avro/README.md ## @@ -0,0 +1,44 @@ + > Key: PARQUET-1827 > URL: https://issues.apache.org/jira/browse/PARQUET-1827 > Project: Parquet > Issue Type: Improvement > Components: parquet-mr >Affects Versions: 1.11.0 >Reporter: Brad Smith >Assignee: Gabor Szadovszky >Priority: Major > Labels: pull-request-available > > The parquet-format project introduced a new UUID logical type in version 2.4: > [https://github.com/apache/parquet-format/blob/master/CHANGES.md] > This would be a useful type to have available in some circumstances, but it > currently isn't supported in the parquet-mr library. Hopefully this feature > can be implemented at some point. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (PARQUET-1827) UUID type currently not supported by parquet-mr
[ https://issues.apache.org/jira/browse/PARQUET-1827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17119920#comment-17119920 ] ASF GitHub Bot commented on PARQUET-1827: - shangxinli commented on a change in pull request #778: URL: https://github.com/apache/parquet-mr/pull/778#discussion_r432725661 ## File path: parquet-avro/src/test/java/org/apache/parquet/avro/TestAvroSchemaConverter.java ## @@ -766,6 +768,33 @@ public void testReuseNameInNestedStructureAtSameLevel() throws Exception { testParquetToAvroConversion(NEW_BEHAVIOR, schema, parquetSchema); } + @Test + public void testUUIDType() throws Exception { +Schema fromAvro = Schema.createRecord("myrecord", null, null, false, +Arrays.asList(new Schema.Field("uuid", LogicalTypes.uuid().addToSchema(Schema.create(STRING)), null, null))); +String parquet = "message myrecord {\n" + +" required binary uuid (STRING);\n" + +"}\n"; +Schema toAvro = Schema.createRecord("myrecord", null, null, false, +Arrays.asList(new Schema.Field("uuid", Schema.create(STRING), null, null))); + +testAvroToParquetConversion(fromAvro, parquet); +testParquetToAvroConversion(toAvro, parquet); + } Review comment: If it is too much effort for doing this, it is OK not to do it. It is a lower priority. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > UUID type currently not supported by parquet-mr > --- > > Key: PARQUET-1827 > URL: https://issues.apache.org/jira/browse/PARQUET-1827 > Project: Parquet > Issue Type: Improvement > Components: parquet-mr >Affects Versions: 1.11.0 >Reporter: Brad Smith >Assignee: Gabor Szadovszky >Priority: Major > Labels: pull-request-available > > The parquet-format project introduced a new UUID logical type in version 2.4: > [https://github.com/apache/parquet-format/blob/master/CHANGES.md] > This would be a useful type to have available in some circumstances, but it > currently isn't supported in the parquet-mr library. Hopefully this feature > can be implemented at some point. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (PARQUET-1827) UUID type currently not supported by parquet-mr
[ https://issues.apache.org/jira/browse/PARQUET-1827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17103555#comment-17103555 ] ASF GitHub Bot commented on PARQUET-1827: - shangxinli commented on a change in pull request #778: URL: https://github.com/apache/parquet-mr/pull/778#discussion_r422564200 ## File path: parquet-column/src/test/java/org/apache/parquet/schema/TestPrimitiveStringifier.java ## @@ -309,6 +308,35 @@ public void testDecimalStringifier() { checkThrowingUnsupportedException(stringifier, Integer.TYPE, Long.TYPE, Binary.class); } + @Test + public void testUUIDStringifier() { +PrimitiveStringifier stringifier = PrimitiveStringifier.UUID_STRINGIFIER; + +assertEquals("00112233-4455-6677-8899-aabbccddeeff", stringifier.stringify( +toBinary(0x00, 0x11, 0x22, 0x33, 0x44, 0x55, 0x66, 0x77, 0x88, 0x99, 0xaa, 0xbb, 0xcc, 0xdd, 0xee, 0xff))); +assertEquals("----", stringifier.stringify( +toBinary(0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00))); +assertEquals("----", stringifier.stringify( +toBinary(0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff))); + +assertEquals("0eb1497c-19b6-42bc-b028-b4b612bed141", stringifier.stringify( Review comment: By duplicate coverage, I meant the #2, #3 tests seems repeating the same test as #1. The value is different, but when the test executes, they would execute the same code path. So I think they won't provide extra coverage. From test perspective, negative test does provide values. In this case, we can test the exception is thrown as expected if it is too short. For "if the array is longer than 16 it would simply stringify the first 16 and skip the others", that could cause silent errors, right? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > UUID type currently not supported by parquet-mr > --- > > Key: PARQUET-1827 > URL: https://issues.apache.org/jira/browse/PARQUET-1827 > Project: Parquet > Issue Type: Improvement > Components: parquet-mr >Affects Versions: 1.11.0 >Reporter: Brad Smith >Assignee: Gabor Szadovszky >Priority: Major > Labels: pull-request-available > > The parquet-format project introduced a new UUID logical type in version 2.4: > [https://github.com/apache/parquet-format/blob/master/CHANGES.md] > This would be a useful type to have available in some circumstances, but it > currently isn't supported in the parquet-mr library. Hopefully this feature > can be implemented at some point. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (PARQUET-1827) UUID type currently not supported by parquet-mr
[ https://issues.apache.org/jira/browse/PARQUET-1827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17103548#comment-17103548 ] ASF GitHub Bot commented on PARQUET-1827: - shangxinli commented on a change in pull request #778: URL: https://github.com/apache/parquet-mr/pull/778#discussion_r422563108 ## File path: parquet-avro/src/test/java/org/apache/parquet/avro/TestAvroSchemaConverter.java ## @@ -766,6 +768,33 @@ public void testReuseNameInNestedStructureAtSameLevel() throws Exception { testParquetToAvroConversion(NEW_BEHAVIOR, schema, parquetSchema); } + @Test + public void testUUIDType() throws Exception { +Schema fromAvro = Schema.createRecord("myrecord", null, null, false, +Arrays.asList(new Schema.Field("uuid", LogicalTypes.uuid().addToSchema(Schema.create(STRING)), null, null))); +String parquet = "message myrecord {\n" + +" required binary uuid (STRING);\n" + +"}\n"; +Schema toAvro = Schema.createRecord("myrecord", null, null, false, +Arrays.asList(new Schema.Field("uuid", Schema.create(STRING), null, null))); + +testAvroToParquetConversion(fromAvro, parquet); +testParquetToAvroConversion(toAvro, parquet); + } Review comment: Basically we found some type of avro schema is not compatible with the parquet schema which it is converted to. This caused problem that the data cannot be read. I have a test here (https://github.com/shangxinli/parquet-mr/commit/f80469f55b83404ea334ee4019f658ecdb5ac575#diff-536ca67880a7870cf8df8f95143bd7d7R814) that reproduce the issue for a nested schema. I know likely UUID type won't have this issue but I better to have a test for it. It is pretty easy to add also. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > UUID type currently not supported by parquet-mr > --- > > Key: PARQUET-1827 > URL: https://issues.apache.org/jira/browse/PARQUET-1827 > Project: Parquet > Issue Type: Improvement > Components: parquet-mr >Affects Versions: 1.11.0 >Reporter: Brad Smith >Assignee: Gabor Szadovszky >Priority: Major > Labels: pull-request-available > > The parquet-format project introduced a new UUID logical type in version 2.4: > [https://github.com/apache/parquet-format/blob/master/CHANGES.md] > This would be a useful type to have available in some circumstances, but it > currently isn't supported in the parquet-mr library. Hopefully this feature > can be implemented at some point. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (PARQUET-1827) UUID type currently not supported by parquet-mr
[ https://issues.apache.org/jira/browse/PARQUET-1827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17101496#comment-17101496 ] ASF GitHub Bot commented on PARQUET-1827: - gszadovszky commented on a change in pull request #778: URL: https://github.com/apache/parquet-mr/pull/778#discussion_r421389575 ## File path: parquet-avro/src/test/java/org/apache/parquet/avro/TestAvroSchemaConverter.java ## @@ -766,6 +768,33 @@ public void testReuseNameInNestedStructureAtSameLevel() throws Exception { testParquetToAvroConversion(NEW_BEHAVIOR, schema, parquetSchema); } + @Test + public void testUUIDType() throws Exception { +Schema fromAvro = Schema.createRecord("myrecord", null, null, false, +Arrays.asList(new Schema.Field("uuid", LogicalTypes.uuid().addToSchema(Schema.create(STRING)), null, null))); +String parquet = "message myrecord {\n" + +" required binary uuid (STRING);\n" + +"}\n"; +Schema toAvro = Schema.createRecord("myrecord", null, null, false, +Arrays.asList(new Schema.Field("uuid", Schema.create(STRING), null, null))); + +testAvroToParquetConversion(fromAvro, parquet); +testParquetToAvroConversion(toAvro, parquet); + } Review comment: To be honest I am not too familiar with parquet-avro. I've made the changes based on the implementation/test of other logical types. Could you explain it in more details what you would test exactly? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > UUID type currently not supported by parquet-mr > --- > > Key: PARQUET-1827 > URL: https://issues.apache.org/jira/browse/PARQUET-1827 > Project: Parquet > Issue Type: Improvement > Components: parquet-mr >Affects Versions: 1.11.0 >Reporter: Brad Smith >Assignee: Gabor Szadovszky >Priority: Major > Labels: pull-request-available > > The parquet-format project introduced a new UUID logical type in version 2.4: > [https://github.com/apache/parquet-format/blob/master/CHANGES.md] > This would be a useful type to have available in some circumstances, but it > currently isn't supported in the parquet-mr library. Hopefully this feature > can be implemented at some point. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (PARQUET-1827) UUID type currently not supported by parquet-mr
[ https://issues.apache.org/jira/browse/PARQUET-1827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17101490#comment-17101490 ] ASF GitHub Bot commented on PARQUET-1827: - gszadovszky commented on a change in pull request #778: URL: https://github.com/apache/parquet-mr/pull/778#discussion_r421384119 ## File path: parquet-avro/README.md ## @@ -0,0 +1,44 @@ + > Key: PARQUET-1827 > URL: https://issues.apache.org/jira/browse/PARQUET-1827 > Project: Parquet > Issue Type: Improvement > Components: parquet-mr >Affects Versions: 1.11.0 >Reporter: Brad Smith >Assignee: Gabor Szadovszky >Priority: Major > Labels: pull-request-available > > The parquet-format project introduced a new UUID logical type in version 2.4: > [https://github.com/apache/parquet-format/blob/master/CHANGES.md] > This would be a useful type to have available in some circumstances, but it > currently isn't supported in the parquet-mr library. Hopefully this feature > can be implemented at some point. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (PARQUET-1827) UUID type currently not supported by parquet-mr
[ https://issues.apache.org/jira/browse/PARQUET-1827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17101484#comment-17101484 ] ASF GitHub Bot commented on PARQUET-1827: - gszadovszky commented on a change in pull request #778: URL: https://github.com/apache/parquet-mr/pull/778#discussion_r421381112 ## File path: parquet-column/src/test/java/org/apache/parquet/schema/TestPrimitiveStringifier.java ## @@ -309,6 +308,35 @@ public void testDecimalStringifier() { checkThrowingUnsupportedException(stringifier, Integer.TYPE, Long.TYPE, Binary.class); } + @Test + public void testUUIDStringifier() { +PrimitiveStringifier stringifier = PrimitiveStringifier.UUID_STRINGIFIER; + +assertEquals("00112233-4455-6677-8899-aabbccddeeff", stringifier.stringify( +toBinary(0x00, 0x11, 0x22, 0x33, 0x44, 0x55, 0x66, 0x77, 0x88, 0x99, 0xaa, 0xbb, 0xcc, 0xdd, 0xee, 0xff))); +assertEquals("----", stringifier.stringify( +toBinary(0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00))); +assertEquals("----", stringifier.stringify( +toBinary(0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff))); + +assertEquals("0eb1497c-19b6-42bc-b028-b4b612bed141", stringifier.stringify( Review comment: The idea is to have 3 kind-of corner cases and 3 common (random but constant) values. What do you mean by duplicate coverage? (I think, we do not need exhaustive testing for the stringifiers because they only used by our tools for debugging purposes.) The stringifiers do not validate the data they get because of performance reasons. So, if the array is longer than 16 it would simply stringify the first 16 and skip the others. In case of the length is too short then an `ArrayIndexOutOfBoundsException` would be thrown. Do you think we should test these cases? They would not reach any additional branches in the parquet code. Invalid characters are not possible. The full set of values of the 16 bytes array is covered in UUID. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > UUID type currently not supported by parquet-mr > --- > > Key: PARQUET-1827 > URL: https://issues.apache.org/jira/browse/PARQUET-1827 > Project: Parquet > Issue Type: Improvement > Components: parquet-mr >Affects Versions: 1.11.0 >Reporter: Brad Smith >Assignee: Gabor Szadovszky >Priority: Major > Labels: pull-request-available > > The parquet-format project introduced a new UUID logical type in version 2.4: > [https://github.com/apache/parquet-format/blob/master/CHANGES.md] > This would be a useful type to have available in some circumstances, but it > currently isn't supported in the parquet-mr library. Hopefully this feature > can be implemented at some point. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (PARQUET-1827) UUID type currently not supported by parquet-mr
[ https://issues.apache.org/jira/browse/PARQUET-1827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17098048#comment-17098048 ] ASF GitHub Bot commented on PARQUET-1827: - shangxinli commented on a change in pull request #778: URL: https://github.com/apache/parquet-mr/pull/778#discussion_r418987550 ## File path: parquet-avro/src/test/java/org/apache/parquet/avro/TestAvroSchemaConverter.java ## @@ -766,6 +768,33 @@ public void testReuseNameInNestedStructureAtSameLevel() throws Exception { testParquetToAvroConversion(NEW_BEHAVIOR, schema, parquetSchema); } + @Test + public void testUUIDType() throws Exception { +Schema fromAvro = Schema.createRecord("myrecord", null, null, false, +Arrays.asList(new Schema.Field("uuid", LogicalTypes.uuid().addToSchema(Schema.create(STRING)), null, null))); +String parquet = "message myrecord {\n" + +" required binary uuid (STRING);\n" + +"}\n"; +Schema toAvro = Schema.createRecord("myrecord", null, null, false, +Arrays.asList(new Schema.Field("uuid", Schema.create(STRING), null, null))); + +testAvroToParquetConversion(fromAvro, parquet); +testParquetToAvroConversion(toAvro, parquet); + } Review comment: Can we have checkReaderWriterCompatibility() to verify if the parquet and avro schema are compatible for UUID? There are issues like PARQUET-1681 for avro schema and parquet schema conversion for other types. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > UUID type currently not supported by parquet-mr > --- > > Key: PARQUET-1827 > URL: https://issues.apache.org/jira/browse/PARQUET-1827 > Project: Parquet > Issue Type: Improvement > Components: parquet-mr >Affects Versions: 1.11.0 >Reporter: Brad Smith >Assignee: Gabor Szadovszky >Priority: Major > Labels: pull-request-available > > The parquet-format project introduced a new UUID logical type in version 2.4: > [https://github.com/apache/parquet-format/blob/master/CHANGES.md] > This would be a useful type to have available in some circumstances, but it > currently isn't supported in the parquet-mr library. Hopefully this feature > can be implemented at some point. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (PARQUET-1827) UUID type currently not supported by parquet-mr
[ https://issues.apache.org/jira/browse/PARQUET-1827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17098032#comment-17098032 ] ASF GitHub Bot commented on PARQUET-1827: - shangxinli commented on a change in pull request #778: URL: https://github.com/apache/parquet-mr/pull/778#discussion_r418985058 ## File path: parquet-avro/README.md ## @@ -0,0 +1,44 @@ + > Key: PARQUET-1827 > URL: https://issues.apache.org/jira/browse/PARQUET-1827 > Project: Parquet > Issue Type: Improvement > Components: parquet-mr >Affects Versions: 1.11.0 >Reporter: Brad Smith >Assignee: Gabor Szadovszky >Priority: Major > Labels: pull-request-available > > The parquet-format project introduced a new UUID logical type in version 2.4: > [https://github.com/apache/parquet-format/blob/master/CHANGES.md] > This would be a useful type to have available in some circumstances, but it > currently isn't supported in the parquet-mr library. Hopefully this feature > can be implemented at some point. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (PARQUET-1827) UUID type currently not supported by parquet-mr
[ https://issues.apache.org/jira/browse/PARQUET-1827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17098030#comment-17098030 ] ASF GitHub Bot commented on PARQUET-1827: - shangxinli commented on a change in pull request #778: URL: https://github.com/apache/parquet-mr/pull/778#discussion_r418984184 ## File path: parquet-column/src/test/java/org/apache/parquet/schema/TestPrimitiveStringifier.java ## @@ -309,6 +308,35 @@ public void testDecimalStringifier() { checkThrowingUnsupportedException(stringifier, Integer.TYPE, Long.TYPE, Binary.class); } + @Test + public void testUUIDStringifier() { +PrimitiveStringifier stringifier = PrimitiveStringifier.UUID_STRINGIFIER; + +assertEquals("00112233-4455-6677-8899-aabbccddeeff", stringifier.stringify( +toBinary(0x00, 0x11, 0x22, 0x33, 0x44, 0x55, 0x66, 0x77, 0x88, 0x99, 0xaa, 0xbb, 0xcc, 0xdd, 0xee, 0xff))); +assertEquals("----", stringifier.stringify( +toBinary(0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00))); +assertEquals("----", stringifier.stringify( +toBinary(0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff))); + +assertEquals("0eb1497c-19b6-42bc-b028-b4b612bed141", stringifier.stringify( Review comment: Are the 3 test data are randomly chosen? It seems duplicate coverage. Could you add some negative tests like incorrect length uuids, invalid characters etc. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > UUID type currently not supported by parquet-mr > --- > > Key: PARQUET-1827 > URL: https://issues.apache.org/jira/browse/PARQUET-1827 > Project: Parquet > Issue Type: Improvement > Components: parquet-mr >Affects Versions: 1.11.0 >Reporter: Brad Smith >Assignee: Gabor Szadovszky >Priority: Major > Labels: pull-request-available > > The parquet-format project introduced a new UUID logical type in version 2.4: > [https://github.com/apache/parquet-format/blob/master/CHANGES.md] > This would be a useful type to have available in some circumstances, but it > currently isn't supported in the parquet-mr library. Hopefully this feature > can be implemented at some point. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (PARQUET-1827) UUID type currently not supported by parquet-mr
[ https://issues.apache.org/jira/browse/PARQUET-1827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17098025#comment-17098025 ] ASF GitHub Bot commented on PARQUET-1827: - shangxinli commented on a change in pull request #778: URL: https://github.com/apache/parquet-mr/pull/778#discussion_r418981845 ## File path: parquet-column/src/main/java/org/apache/parquet/schema/PrimitiveStringifier.java ## @@ -421,4 +422,30 @@ private String stringifyWithScale(BigInteger i) { } }; } + + static final PrimitiveStringifier UUID_STRINGIFIER = new PrimitiveStringifier("UUID_STRINGIFIER") { +private final char[] digit = "0123456789abcdef".toCharArray(); +@Override +public String stringify(Binary value) { + byte[] bytes = value.getBytesUnsafe(); Review comment: Do you want to make 'bytes' final since you are calling getBytesUnsafe()? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > UUID type currently not supported by parquet-mr > --- > > Key: PARQUET-1827 > URL: https://issues.apache.org/jira/browse/PARQUET-1827 > Project: Parquet > Issue Type: Improvement > Components: parquet-mr >Affects Versions: 1.11.0 >Reporter: Brad Smith >Assignee: Gabor Szadovszky >Priority: Major > Labels: pull-request-available > > The parquet-format project introduced a new UUID logical type in version 2.4: > [https://github.com/apache/parquet-format/blob/master/CHANGES.md] > This would be a useful type to have available in some circumstances, but it > currently isn't supported in the parquet-mr library. Hopefully this feature > can be implemented at some point. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (PARQUET-1827) UUID type currently not supported by parquet-mr
[ https://issues.apache.org/jira/browse/PARQUET-1827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17098019#comment-17098019 ] ASF GitHub Bot commented on PARQUET-1827: - shangxinli commented on a change in pull request #778: URL: https://github.com/apache/parquet-mr/pull/778#discussion_r418980107 ## File path: parquet-column/src/main/java/org/apache/parquet/schema/LogicalTypeAnnotation.java ## @@ -861,6 +871,36 @@ PrimitiveStringifier valueStringifier(PrimitiveType primitiveType) { } } + public static class UUIDLogicalTypeAnnotation extends LogicalTypeAnnotation { Review comment: Why there is no implementation for hasCode() and equal()? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > UUID type currently not supported by parquet-mr > --- > > Key: PARQUET-1827 > URL: https://issues.apache.org/jira/browse/PARQUET-1827 > Project: Parquet > Issue Type: Improvement > Components: parquet-mr >Affects Versions: 1.11.0 >Reporter: Brad Smith >Assignee: Gabor Szadovszky >Priority: Major > Labels: pull-request-available > > The parquet-format project introduced a new UUID logical type in version 2.4: > [https://github.com/apache/parquet-format/blob/master/CHANGES.md] > This would be a useful type to have available in some circumstances, but it > currently isn't supported in the parquet-mr library. Hopefully this feature > can be implemented at some point. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (PARQUET-1827) UUID type currently not supported by parquet-mr
[ https://issues.apache.org/jira/browse/PARQUET-1827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17074575#comment-17074575 ] ASF GitHub Bot commented on PARQUET-1827: - gszadovszky commented on pull request #778: PARQUET-1827: UUID type currently not supported by parquet-mr URL: https://github.com/apache/parquet-mr/pull/778 Make sure you have checked _all_ steps below. ### Jira - [ ] My PR addresses the following [Parquet Jira](https://issues.apache.org/jira/browse/PARQUET/) issues and references them in the PR title. For example, "PARQUET-1234: My Parquet PR" - https://issues.apache.org/jira/browse/PARQUET-XXX - In case you are adding a dependency, check if the license complies with the [ASF 3rd Party License Policy](https://www.apache.org/legal/resolved.html#category-x). ### Tests - [ ] My PR adds the following unit tests __OR__ does not need testing for this extremely good reason: ### Commits - [ ] My commits all reference Jira issues in their subject lines. In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)": 1. Subject is separated from body by a blank line 1. Subject is limited to 50 characters (not including Jira issue reference) 1. Subject does not end with a period 1. Subject uses the imperative mood ("add", not "adding") 1. Body wraps at 72 characters 1. Body explains "what" and "why", not "how" ### Documentation - [ ] In case of new functionality, my PR adds documentation that describes how to use it. - All the public functions and the classes in the PR contain Javadoc that explain what it does This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > UUID type currently not supported by parquet-mr > --- > > Key: PARQUET-1827 > URL: https://issues.apache.org/jira/browse/PARQUET-1827 > Project: Parquet > Issue Type: Improvement > Components: parquet-mr >Affects Versions: 1.11.0 >Reporter: Brad Smith >Assignee: Gabor Szadovszky >Priority: Major > Labels: pull-request-available > > The parquet-format project introduced a new UUID logical type in version 2.4: > [https://github.com/apache/parquet-format/blob/master/CHANGES.md] > This would be a useful type to have available in some circumstances, but it > currently isn't supported in the parquet-mr library. Hopefully this feature > can be implemented at some point. -- This message was sent by Atlassian Jira (v8.3.4#803005)