[jira] [Commented] (PARQUET-2042) Unwrap common Protobuf wrappers and logical Timestamps, Date, TimeOfDay
[ https://issues.apache.org/jira/browse/PARQUET-2042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17789880#comment-17789880 ] ASF GitHub Bot commented on PARQUET-2042: - shangxinli merged PR #900: URL: https://github.com/apache/parquet-mr/pull/900 > Unwrap common Protobuf wrappers and logical Timestamps, Date, TimeOfDay > --- > > Key: PARQUET-2042 > URL: https://issues.apache.org/jira/browse/PARQUET-2042 > Project: Parquet > Issue Type: Improvement > Components: parquet-protobuf >Reporter: Michael Wong >Priority: Major > > Related to https://issues.apache.org/jira/browse/PARQUET-1595 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (PARQUET-2042) Unwrap common Protobuf wrappers and logical Timestamps, Date, TimeOfDay
[ https://issues.apache.org/jira/browse/PARQUET-2042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17789881#comment-17789881 ] ASF GitHub Bot commented on PARQUET-2042: - shangxinli commented on PR #900: URL: https://github.com/apache/parquet-mr/pull/900#issuecomment-1827000412 Merged > Unwrap common Protobuf wrappers and logical Timestamps, Date, TimeOfDay > --- > > Key: PARQUET-2042 > URL: https://issues.apache.org/jira/browse/PARQUET-2042 > Project: Parquet > Issue Type: Improvement > Components: parquet-protobuf >Reporter: Michael Wong >Priority: Major > > Related to https://issues.apache.org/jira/browse/PARQUET-1595 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (PARQUET-2042) Unwrap common Protobuf wrappers and logical Timestamps, Date, TimeOfDay
[ https://issues.apache.org/jira/browse/PARQUET-2042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17789870#comment-17789870 ] ASF GitHub Bot commented on PARQUET-2042: - mwong38 commented on PR #900: URL: https://github.com/apache/parquet-mr/pull/900#issuecomment-1826944950 > I just noticed this PR and sorry to see it does not check in. @mwong38 Could you try rebase it one last time? Thanks! Done. I really hope it's the last time. > Unwrap common Protobuf wrappers and logical Timestamps, Date, TimeOfDay > --- > > Key: PARQUET-2042 > URL: https://issues.apache.org/jira/browse/PARQUET-2042 > Project: Parquet > Issue Type: Improvement > Components: parquet-protobuf >Reporter: Michael Wong >Priority: Major > > Related to https://issues.apache.org/jira/browse/PARQUET-1595 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (PARQUET-2042) Unwrap common Protobuf wrappers and logical Timestamps, Date, TimeOfDay
[ https://issues.apache.org/jira/browse/PARQUET-2042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17776574#comment-17776574 ] ASF GitHub Bot commented on PARQUET-2042: - wgtmac commented on PR #900: URL: https://github.com/apache/parquet-mr/pull/900#issuecomment-1768050027 I just noticed this PR and sorry to see it does not check in. @mwong38 Could you try rebase it one last time? Thanks! > Unwrap common Protobuf wrappers and logical Timestamps, Date, TimeOfDay > --- > > Key: PARQUET-2042 > URL: https://issues.apache.org/jira/browse/PARQUET-2042 > Project: Parquet > Issue Type: Improvement > Components: parquet-protobuf >Reporter: Michael Wong >Priority: Major > > Related to https://issues.apache.org/jira/browse/PARQUET-1595 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (PARQUET-2042) Unwrap common Protobuf wrappers and logical Timestamps, Date, TimeOfDay
[ https://issues.apache.org/jira/browse/PARQUET-2042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17776140#comment-17776140 ] ASF GitHub Bot commented on PARQUET-2042: - zhaolihe commented on PR #900: URL: https://github.com/apache/parquet-mr/pull/900#issuecomment-1766145091 any reason why this PR hasn't been merged yet? > Unwrap common Protobuf wrappers and logical Timestamps, Date, TimeOfDay > --- > > Key: PARQUET-2042 > URL: https://issues.apache.org/jira/browse/PARQUET-2042 > Project: Parquet > Issue Type: Improvement > Components: parquet-protobuf >Reporter: Michael Wong >Priority: Major > > Related to https://issues.apache.org/jira/browse/PARQUET-1595 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (PARQUET-2042) Unwrap common Protobuf wrappers and logical Timestamps, Date, TimeOfDay
[ https://issues.apache.org/jira/browse/PARQUET-2042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17619125#comment-17619125 ] ASF GitHub Bot commented on PARQUET-2042: - sheinbergon commented on PR #900: URL: https://github.com/apache/parquet-mr/pull/900#issuecomment-1281551940 @shangxinli any reason why this PR hasn't been merged yet? > Unwrap common Protobuf wrappers and logical Timestamps, Date, TimeOfDay > --- > > Key: PARQUET-2042 > URL: https://issues.apache.org/jira/browse/PARQUET-2042 > Project: Parquet > Issue Type: Improvement > Components: parquet-protobuf >Reporter: Michael Wong >Priority: Major > > Related to https://issues.apache.org/jira/browse/PARQUET-1595 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (PARQUET-2042) Unwrap common Protobuf wrappers and logical Timestamps, Date, TimeOfDay
[ https://issues.apache.org/jira/browse/PARQUET-2042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17600051#comment-17600051 ] ASF GitHub Bot commented on PARQUET-2042: - mwong38 commented on PR #900: URL: https://github.com/apache/parquet-mr/pull/900#issuecomment-1236318657 > I think we are close to merge this PR. Resolve the conflict and use the imports , then we can merge. Done > Unwrap common Protobuf wrappers and logical Timestamps, Date, TimeOfDay > --- > > Key: PARQUET-2042 > URL: https://issues.apache.org/jira/browse/PARQUET-2042 > Project: Parquet > Issue Type: Improvement > Components: parquet-protobuf >Reporter: Michael Wong >Priority: Major > > Related to https://issues.apache.org/jira/browse/PARQUET-1595 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (PARQUET-2042) Unwrap common Protobuf wrappers and logical Timestamps, Date, TimeOfDay
[ https://issues.apache.org/jira/browse/PARQUET-2042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17570544#comment-17570544 ] ASF GitHub Bot commented on PARQUET-2042: - shangxinli commented on PR #900: URL: https://github.com/apache/parquet-mr/pull/900#issuecomment-1193386419 I think we are close to merge this PR. Resolve the conflict and use the imports , then we can merge. > Unwrap common Protobuf wrappers and logical Timestamps, Date, TimeOfDay > --- > > Key: PARQUET-2042 > URL: https://issues.apache.org/jira/browse/PARQUET-2042 > Project: Parquet > Issue Type: Improvement > Components: parquet-protobuf >Reporter: Michael Wong >Priority: Major > > Related to https://issues.apache.org/jira/browse/PARQUET-1595 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (PARQUET-2042) Unwrap common Protobuf wrappers and logical Timestamps, Date, TimeOfDay
[ https://issues.apache.org/jira/browse/PARQUET-2042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17570543#comment-17570543 ] ASF GitHub Bot commented on PARQUET-2042: - shangxinli commented on code in PR #900: URL: https://github.com/apache/parquet-mr/pull/900#discussion_r928306582 ## parquet-protobuf/src/main/java/org/apache/parquet/proto/ProtoMessageConverter.java: ## @@ -427,6 +485,218 @@ public void addBinary(Binary binary) { } + final class ProtoTimestampConverter extends PrimitiveConverter { + +final ParentValueContainer parent; +final LogicalTypeAnnotation.TimestampLogicalTypeAnnotation logicalTypeAnnotation; + +public ProtoTimestampConverter(ParentValueContainer parent, LogicalTypeAnnotation.TimestampLogicalTypeAnnotation logicalTypeAnnotation) { + this.parent = parent; + this.logicalTypeAnnotation = logicalTypeAnnotation; +} + +@Override +public void addLong(long value) { + switch (logicalTypeAnnotation.getUnit()) { +case MICROS: + parent.add(Timestamps.fromMicros(value)); + break; +case MILLIS: + parent.add(Timestamps.fromMillis(value)); + break; +case NANOS: + parent.add(Timestamps.fromNanos(value)); + break; + } +} + } + + final class ProtoDateConverter extends PrimitiveConverter { + +final ParentValueContainer parent; + +public ProtoDateConverter(ParentValueContainer parent) { + this.parent = parent; +} + +@Override +public void addInt(int value) { + LocalDate localDate = LocalDate.ofEpochDay(value); + com.google.type.Date date = com.google.type.Date.newBuilder() Review Comment: If there is no collision on the imports, we should use imports generally. > Unwrap common Protobuf wrappers and logical Timestamps, Date, TimeOfDay > --- > > Key: PARQUET-2042 > URL: https://issues.apache.org/jira/browse/PARQUET-2042 > Project: Parquet > Issue Type: Improvement > Components: parquet-protobuf >Reporter: Michael Wong >Priority: Major > > Related to https://issues.apache.org/jira/browse/PARQUET-1595 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (PARQUET-2042) Unwrap common Protobuf wrappers and logical Timestamps, Date, TimeOfDay
[ https://issues.apache.org/jira/browse/PARQUET-2042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17570542#comment-17570542 ] ASF GitHub Bot commented on PARQUET-2042: - shangxinli commented on code in PR #900: URL: https://github.com/apache/parquet-mr/pull/900#discussion_r928306114 ## parquet-protobuf/src/main/java/org/apache/parquet/proto/ProtoSchemaConverter.java: ## @@ -97,6 +127,46 @@ public MessageType convert(Class protobufClass) { private Builder>, GroupBuilder> addField(FieldDescriptor descriptor, final GroupBuilder builder) { if (descriptor.getJavaType() == JavaType.MESSAGE) { + if (unwrapProtoWrappers) { +String typeName = descriptor.getMessageType().getFullName(); +if (typeName.equals(PROTOBUF_TIMESTAMP_TYPE)) { + return builder.primitive(INT64, getRepetition(descriptor)).as(timestampType(true, TimeUnit.NANOS)); Review Comment: @emkornfield Do you still have a comment on this? > Unwrap common Protobuf wrappers and logical Timestamps, Date, TimeOfDay > --- > > Key: PARQUET-2042 > URL: https://issues.apache.org/jira/browse/PARQUET-2042 > Project: Parquet > Issue Type: Improvement > Components: parquet-protobuf >Reporter: Michael Wong >Priority: Major > > Related to https://issues.apache.org/jira/browse/PARQUET-1595 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (PARQUET-2042) Unwrap common Protobuf wrappers and logical Timestamps, Date, TimeOfDay
[ https://issues.apache.org/jira/browse/PARQUET-2042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17562101#comment-17562101 ] ASF GitHub Bot commented on PARQUET-2042: - mwong38 commented on code in PR #900: URL: https://github.com/apache/parquet-mr/pull/900#discussion_r912817641 ## parquet-protobuf/src/main/java/org/apache/parquet/proto/ProtoSchemaConverter.java: ## @@ -97,6 +127,46 @@ public MessageType convert(Class protobufClass) { private Builder>, GroupBuilder> addField(FieldDescriptor descriptor, final GroupBuilder builder) { if (descriptor.getJavaType() == JavaType.MESSAGE) { + if (unwrapProtoWrappers) { +String typeName = descriptor.getMessageType().getFullName(); +if (typeName.equals(PROTOBUF_TIMESTAMP_TYPE)) { + return builder.primitive(INT64, getRepetition(descriptor)).as(timestampType(true, TimeUnit.NANOS)); Review Comment: Isn't that outside the scope of ProtoSchemaConverter/ParquetWriter? I don't think it should go into the business of doing transformations. If the point of `ProtoSchemaConverter` is to give the closest representation of the Protobuf object in Parquet, then it should be NANOS and nothing more. > Unwrap common Protobuf wrappers and logical Timestamps, Date, TimeOfDay > --- > > Key: PARQUET-2042 > URL: https://issues.apache.org/jira/browse/PARQUET-2042 > Project: Parquet > Issue Type: Improvement > Components: parquet-protobuf >Reporter: Michael Wong >Priority: Major > > Related to https://issues.apache.org/jira/browse/PARQUET-1595 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (PARQUET-2042) Unwrap common Protobuf wrappers and logical Timestamps, Date, TimeOfDay
[ https://issues.apache.org/jira/browse/PARQUET-2042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17561973#comment-17561973 ] ASF GitHub Bot commented on PARQUET-2042: - shangxinli commented on code in PR #900: URL: https://github.com/apache/parquet-mr/pull/900#discussion_r912586665 ## parquet-protobuf/src/main/java/org/apache/parquet/proto/ProtoSchemaConverter.java: ## @@ -97,6 +127,46 @@ public MessageType convert(Class protobufClass) { private Builder>, GroupBuilder> addField(FieldDescriptor descriptor, final GroupBuilder builder) { if (descriptor.getJavaType() == JavaType.MESSAGE) { + if (unwrapProtoWrappers) { +String typeName = descriptor.getMessageType().getFullName(); +if (typeName.equals(PROTOBUF_TIMESTAMP_TYPE)) { + return builder.primitive(INT64, getRepetition(descriptor)).as(timestampType(true, TimeUnit.NANOS)); Review Comment: In that case, you can use the default 'TimeUnit.NANOS' while having that configured. > Unwrap common Protobuf wrappers and logical Timestamps, Date, TimeOfDay > --- > > Key: PARQUET-2042 > URL: https://issues.apache.org/jira/browse/PARQUET-2042 > Project: Parquet > Issue Type: Improvement > Components: parquet-protobuf >Reporter: Michael Wong >Priority: Major > > Related to https://issues.apache.org/jira/browse/PARQUET-1595 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (PARQUET-2042) Unwrap common Protobuf wrappers and logical Timestamps, Date, TimeOfDay
[ https://issues.apache.org/jira/browse/PARQUET-2042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17561948#comment-17561948 ] ASF GitHub Bot commented on PARQUET-2042: - mwong38 commented on code in PR #900: URL: https://github.com/apache/parquet-mr/pull/900#discussion_r912547741 ## parquet-protobuf/src/main/java/org/apache/parquet/proto/ProtoSchemaConverter.java: ## @@ -97,6 +127,46 @@ public MessageType convert(Class protobufClass) { private Builder>, GroupBuilder> addField(FieldDescriptor descriptor, final GroupBuilder builder) { if (descriptor.getJavaType() == JavaType.MESSAGE) { + if (unwrapProtoWrappers) { +String typeName = descriptor.getMessageType().getFullName(); Review Comment: That's a good point. I'll change it to compare the `Descriptor` directly to the Wrapper's Descriptor rather than the name. > Unwrap common Protobuf wrappers and logical Timestamps, Date, TimeOfDay > --- > > Key: PARQUET-2042 > URL: https://issues.apache.org/jira/browse/PARQUET-2042 > Project: Parquet > Issue Type: Improvement > Components: parquet-protobuf >Reporter: Michael Wong >Priority: Major > > Related to https://issues.apache.org/jira/browse/PARQUET-1595 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (PARQUET-2042) Unwrap common Protobuf wrappers and logical Timestamps, Date, TimeOfDay
[ https://issues.apache.org/jira/browse/PARQUET-2042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17561946#comment-17561946 ] ASF GitHub Bot commented on PARQUET-2042: - mwong38 commented on code in PR #900: URL: https://github.com/apache/parquet-mr/pull/900#discussion_r912545610 ## parquet-protobuf/src/main/java/org/apache/parquet/proto/ProtoSchemaConverter.java: ## @@ -97,6 +127,46 @@ public MessageType convert(Class protobufClass) { private Builder>, GroupBuilder> addField(FieldDescriptor descriptor, final GroupBuilder builder) { if (descriptor.getJavaType() == JavaType.MESSAGE) { + if (unwrapProtoWrappers) { +String typeName = descriptor.getMessageType().getFullName(); +if (typeName.equals(PROTOBUF_TIMESTAMP_TYPE)) { + return builder.primitive(INT64, getRepetition(descriptor)).as(timestampType(true, TimeUnit.NANOS)); Review Comment: I don't think it's worth complicating the API. The Timestamp common Proto stores time in nanoseconds. There's no good reason to deviate from that or to truncate the resolution. If the user wishes to do more manipulation, it can be done downstream. > Unwrap common Protobuf wrappers and logical Timestamps, Date, TimeOfDay > --- > > Key: PARQUET-2042 > URL: https://issues.apache.org/jira/browse/PARQUET-2042 > Project: Parquet > Issue Type: Improvement > Components: parquet-protobuf >Reporter: Michael Wong >Priority: Major > > Related to https://issues.apache.org/jira/browse/PARQUET-1595 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (PARQUET-2042) Unwrap common Protobuf wrappers and logical Timestamps, Date, TimeOfDay
[ https://issues.apache.org/jira/browse/PARQUET-2042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17561945#comment-17561945 ] ASF GitHub Bot commented on PARQUET-2042: - mwong38 commented on code in PR #900: URL: https://github.com/apache/parquet-mr/pull/900#discussion_r912545477 ## parquet-protobuf/pom.xml: ## @@ -57,6 +58,16 @@ protobuf-java ${protobuf.version} + + com.google.protobuf + protobuf-java-util + ${protobuf.version} + + + com.google.api.grpc + proto-google-common-protos Review Comment: If you want to pre-packaged common Proto classes, you need to include this. > Unwrap common Protobuf wrappers and logical Timestamps, Date, TimeOfDay > --- > > Key: PARQUET-2042 > URL: https://issues.apache.org/jira/browse/PARQUET-2042 > Project: Parquet > Issue Type: Improvement > Components: parquet-protobuf >Reporter: Michael Wong >Priority: Major > > Related to https://issues.apache.org/jira/browse/PARQUET-1595 -- This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (PARQUET-2042) Unwrap common Protobuf wrappers and logical Timestamps, Date, TimeOfDay
[ https://issues.apache.org/jira/browse/PARQUET-2042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17553412#comment-17553412 ] ASF GitHub Bot commented on PARQUET-2042: - sheinbergon commented on PR #900: URL: https://github.com/apache/parquet-mr/pull/900#issuecomment-1153562823 @mwong38 anyway I can help with finalizing this PR? > Unwrap common Protobuf wrappers and logical Timestamps, Date, TimeOfDay > --- > > Key: PARQUET-2042 > URL: https://issues.apache.org/jira/browse/PARQUET-2042 > Project: Parquet > Issue Type: Improvement > Components: parquet-protobuf >Reporter: Michael Wong >Priority: Major > > Related to https://issues.apache.org/jira/browse/PARQUET-1595 -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Commented] (PARQUET-2042) Unwrap common Protobuf wrappers and logical Timestamps, Date, TimeOfDay
[ https://issues.apache.org/jira/browse/PARQUET-2042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17525061#comment-17525061 ] ASF GitHub Bot commented on PARQUET-2042: - shangxinli commented on PR #900: URL: https://github.com/apache/parquet-mr/pull/900#issuecomment-1104043913 @mwong38 Can you address the feedback from @emkornfield before we can merge? > Unwrap common Protobuf wrappers and logical Timestamps, Date, TimeOfDay > --- > > Key: PARQUET-2042 > URL: https://issues.apache.org/jira/browse/PARQUET-2042 > Project: Parquet > Issue Type: Improvement > Components: parquet-protobuf >Reporter: Michael Wong >Priority: Major > > Related to https://issues.apache.org/jira/browse/PARQUET-1595 -- This message was sent by Atlassian Jira (v8.20.7#820007)
[jira] [Commented] (PARQUET-2042) Unwrap common Protobuf wrappers and logical Timestamps, Date, TimeOfDay
[ https://issues.apache.org/jira/browse/PARQUET-2042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17513131#comment-17513131 ] ASF GitHub Bot commented on PARQUET-2042: - emkornfield commented on a change in pull request #900: URL: https://github.com/apache/parquet-mr/pull/900#discussion_r836044690 ## File path: parquet-protobuf/src/main/java/org/apache/parquet/proto/ProtoSchemaConverter.java ## @@ -97,6 +127,46 @@ public MessageType convert(Class protobufClass) { private Builder>, GroupBuilder> addField(FieldDescriptor descriptor, final GroupBuilder builder) { if (descriptor.getJavaType() == JavaType.MESSAGE) { + if (unwrapProtoWrappers) { +String typeName = descriptor.getMessageType().getFullName(); Review comment: is there a reason you are using type equality vs descriptor equality? (I haven't checked but I would think .Equals would work? Maybe add a comment if this is intentional. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: dev-unsubscr...@parquet.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Unwrap common Protobuf wrappers and logical Timestamps, Date, TimeOfDay > --- > > Key: PARQUET-2042 > URL: https://issues.apache.org/jira/browse/PARQUET-2042 > Project: Parquet > Issue Type: Improvement > Components: parquet-protobuf >Reporter: Michael Wong >Priority: Major > > Related to https://issues.apache.org/jira/browse/PARQUET-1595 -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Commented] (PARQUET-2042) Unwrap common Protobuf wrappers and logical Timestamps, Date, TimeOfDay
[ https://issues.apache.org/jira/browse/PARQUET-2042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17513128#comment-17513128 ] ASF GitHub Bot commented on PARQUET-2042: - emkornfield commented on a change in pull request #900: URL: https://github.com/apache/parquet-mr/pull/900#discussion_r836042758 ## File path: parquet-protobuf/src/main/java/org/apache/parquet/proto/ProtoSchemaConverter.java ## @@ -35,22 +35,49 @@ import java.util.List; +import static org.apache.parquet.schema.LogicalTypeAnnotation.TimeUnit; +import static org.apache.parquet.schema.LogicalTypeAnnotation.dateType; import static org.apache.parquet.schema.LogicalTypeAnnotation.enumType; import static org.apache.parquet.schema.LogicalTypeAnnotation.listType; import static org.apache.parquet.schema.LogicalTypeAnnotation.mapType; import static org.apache.parquet.schema.LogicalTypeAnnotation.stringType; -import static org.apache.parquet.schema.PrimitiveType.PrimitiveTypeName.*; +import static org.apache.parquet.schema.LogicalTypeAnnotation.timeType; +import static org.apache.parquet.schema.LogicalTypeAnnotation.timestampType; +import static org.apache.parquet.schema.PrimitiveType.PrimitiveTypeName.BINARY; +import static org.apache.parquet.schema.PrimitiveType.PrimitiveTypeName.BOOLEAN; +import static org.apache.parquet.schema.PrimitiveType.PrimitiveTypeName.DOUBLE; +import static org.apache.parquet.schema.PrimitiveType.PrimitiveTypeName.FLOAT; +import static org.apache.parquet.schema.PrimitiveType.PrimitiveTypeName.INT32; +import static org.apache.parquet.schema.PrimitiveType.PrimitiveTypeName.INT64; /** * Converts a Protocol Buffer Descriptor into a Parquet schema. */ public class ProtoSchemaConverter { + public static final String PROTOBUF_TIMESTAMP_TYPE = "google.protobuf.Timestamp"; + public static final String PROTOBUF_DATE_TYPE = "google.type.Date"; + public static final String PROTOBUF_TIME_TYPE = "google.type.TimeOfDay"; + public static final String PROTOBUF_DOUBLE_TYPE = "google.protobuf.DoubleValue"; + public static final String PROTOBUF_FLOAT_TYPE = "google.protobuf.FloatValue"; + public static final String PROTOBUF_INT64_TYPE = "google.protobuf.Int64Value"; + public static final String PROTOBUF_UINT64_TYPE = "google.protobuf.UInt64Value"; + public static final String PROTOBUF_INT32_TYPE = "google.protobuf.Int32Value"; + public static final String PROTOBUF_UINT32_TYPE = "google.protobuf.UInt32Value"; + public static final String PROTOBUF_BOOL_TYPE = "google.protobuf.BoolValue"; + public static final String PROTOBUF_STRING_TYPE = "google.protobuf.StringValue"; + public static final String PROTOBUF_BYTES_TYPE = "google.protobuf.BytesValue"; + private static final Logger LOG = LoggerFactory.getLogger(ProtoSchemaConverter.class); private final boolean parquetSpecsCompliant; + private final boolean unwrapProtoWrappers; public ProtoSchemaConverter() { -this(false); +this(false, false); + } + + public ProtoSchemaConverter(boolean parquetSpecsCompliant) { Review comment: please add docs for all new publicly facingin parameters. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: dev-unsubscr...@parquet.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Unwrap common Protobuf wrappers and logical Timestamps, Date, TimeOfDay > --- > > Key: PARQUET-2042 > URL: https://issues.apache.org/jira/browse/PARQUET-2042 > Project: Parquet > Issue Type: Improvement > Components: parquet-protobuf >Reporter: Michael Wong >Priority: Major > > Related to https://issues.apache.org/jira/browse/PARQUET-1595 -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Commented] (PARQUET-2042) Unwrap common Protobuf wrappers and logical Timestamps, Date, TimeOfDay
[ https://issues.apache.org/jira/browse/PARQUET-2042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17513127#comment-17513127 ] ASF GitHub Bot commented on PARQUET-2042: - emkornfield commented on a change in pull request #900: URL: https://github.com/apache/parquet-mr/pull/900#discussion_r836040766 ## File path: parquet-protobuf/src/main/java/org/apache/parquet/proto/ProtoSchemaConverter.java ## @@ -97,6 +127,46 @@ public MessageType convert(Class protobufClass) { private Builder>, GroupBuilder> addField(FieldDescriptor descriptor, final GroupBuilder builder) { if (descriptor.getJavaType() == JavaType.MESSAGE) { + if (unwrapProtoWrappers) { +String typeName = descriptor.getMessageType().getFullName(); +if (typeName.equals(PROTOBUF_TIMESTAMP_TYPE)) { + return builder.primitive(INT64, getRepetition(descriptor)).as(timestampType(true, TimeUnit.NANOS)); Review comment: should TimeUnit be something configurable? I know protobuf timestamps support nanosecond resolution but might not be what applications want to store. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: dev-unsubscr...@parquet.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Unwrap common Protobuf wrappers and logical Timestamps, Date, TimeOfDay > --- > > Key: PARQUET-2042 > URL: https://issues.apache.org/jira/browse/PARQUET-2042 > Project: Parquet > Issue Type: Improvement > Components: parquet-protobuf >Reporter: Michael Wong >Priority: Major > > Related to https://issues.apache.org/jira/browse/PARQUET-1595 -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Commented] (PARQUET-2042) Unwrap common Protobuf wrappers and logical Timestamps, Date, TimeOfDay
[ https://issues.apache.org/jira/browse/PARQUET-2042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17513126#comment-17513126 ] ASF GitHub Bot commented on PARQUET-2042: - emkornfield commented on a change in pull request #900: URL: https://github.com/apache/parquet-mr/pull/900#discussion_r836039431 ## File path: parquet-protobuf/pom.xml ## @@ -57,6 +58,16 @@ protobuf-java ${protobuf.version} + + com.google.protobuf + protobuf-java-util + ${protobuf.version} + + + com.google.api.grpc + proto-google-common-protos Review comment: why this dependency, i thought well known types shipped with the proto package? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: dev-unsubscr...@parquet.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Unwrap common Protobuf wrappers and logical Timestamps, Date, TimeOfDay > --- > > Key: PARQUET-2042 > URL: https://issues.apache.org/jira/browse/PARQUET-2042 > Project: Parquet > Issue Type: Improvement > Components: parquet-protobuf >Reporter: Michael Wong >Priority: Major > > Related to https://issues.apache.org/jira/browse/PARQUET-1595 -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Commented] (PARQUET-2042) Unwrap common Protobuf wrappers and logical Timestamps, Date, TimeOfDay
[ https://issues.apache.org/jira/browse/PARQUET-2042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17512548#comment-17512548 ] ASF GitHub Bot commented on PARQUET-2042: - shangxinli commented on pull request #900: URL: https://github.com/apache/parquet-mr/pull/900#issuecomment-1079368200 Will have another look soon. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: dev-unsubscr...@parquet.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Unwrap common Protobuf wrappers and logical Timestamps, Date, TimeOfDay > --- > > Key: PARQUET-2042 > URL: https://issues.apache.org/jira/browse/PARQUET-2042 > Project: Parquet > Issue Type: Improvement > Components: parquet-protobuf >Reporter: Michael Wong >Priority: Major > > Related to https://issues.apache.org/jira/browse/PARQUET-1595 -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Commented] (PARQUET-2042) Unwrap common Protobuf wrappers and logical Timestamps, Date, TimeOfDay
[ https://issues.apache.org/jira/browse/PARQUET-2042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17512378#comment-17512378 ] ASF GitHub Bot commented on PARQUET-2042: - sheinbergon commented on pull request #900: URL: https://github.com/apache/parquet-mr/pull/900#issuecomment-1079025532 @shangxinli any news about merging this version? Are there still any blockers? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: dev-unsubscr...@parquet.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Unwrap common Protobuf wrappers and logical Timestamps, Date, TimeOfDay > --- > > Key: PARQUET-2042 > URL: https://issues.apache.org/jira/browse/PARQUET-2042 > Project: Parquet > Issue Type: Improvement > Components: parquet-protobuf >Reporter: Michael Wong >Priority: Major > > Related to https://issues.apache.org/jira/browse/PARQUET-1595 -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Commented] (PARQUET-2042) Unwrap common Protobuf wrappers and logical Timestamps, Date, TimeOfDay
[ https://issues.apache.org/jira/browse/PARQUET-2042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17510486#comment-17510486 ] ASF GitHub Bot commented on PARQUET-2042: - dossett commented on pull request #900: URL: https://github.com/apache/parquet-mr/pull/900#issuecomment-1075164744 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: dev-unsubscr...@parquet.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Unwrap common Protobuf wrappers and logical Timestamps, Date, TimeOfDay > --- > > Key: PARQUET-2042 > URL: https://issues.apache.org/jira/browse/PARQUET-2042 > Project: Parquet > Issue Type: Improvement > Components: parquet-protobuf >Reporter: Michael Wong >Priority: Major > > Related to https://issues.apache.org/jira/browse/PARQUET-1595 -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Commented] (PARQUET-2042) Unwrap common Protobuf wrappers and logical Timestamps, Date, TimeOfDay
[ https://issues.apache.org/jira/browse/PARQUET-2042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17509743#comment-17509743 ] ASF GitHub Bot commented on PARQUET-2042: - mwong38 commented on a change in pull request #900: URL: https://github.com/apache/parquet-mr/pull/900#discussion_r830941957 ## File path: parquet-protobuf/src/main/java/org/apache/parquet/proto/ProtoMessageConverter.java ## @@ -427,6 +485,218 @@ public void addBinary(Binary binary) { } + final class ProtoTimestampConverter extends PrimitiveConverter { + +final ParentValueContainer parent; +final LogicalTypeAnnotation.TimestampLogicalTypeAnnotation logicalTypeAnnotation; + +public ProtoTimestampConverter(ParentValueContainer parent, LogicalTypeAnnotation.TimestampLogicalTypeAnnotation logicalTypeAnnotation) { + this.parent = parent; + this.logicalTypeAnnotation = logicalTypeAnnotation; +} + +@Override +public void addLong(long value) { + switch (logicalTypeAnnotation.getUnit()) { +case MICROS: + parent.add(Timestamps.fromMicros(value)); + break; +case MILLIS: + parent.add(Timestamps.fromMillis(value)); + break; +case NANOS: + parent.add(Timestamps.fromNanos(value)); + break; + } +} + } + + final class ProtoDateConverter extends PrimitiveConverter { + +final ParentValueContainer parent; + +public ProtoDateConverter(ParentValueContainer parent) { + this.parent = parent; +} + +@Override +public void addInt(int value) { + LocalDate localDate = LocalDate.ofEpochDay(value); + com.google.type.Date date = com.google.type.Date.newBuilder() Review comment: I think in this particular case it's more clear to be explicit that this is the Google Protobuf Date, rather than, say, the Java built-in Date. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: dev-unsubscr...@parquet.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Unwrap common Protobuf wrappers and logical Timestamps, Date, TimeOfDay > --- > > Key: PARQUET-2042 > URL: https://issues.apache.org/jira/browse/PARQUET-2042 > Project: Parquet > Issue Type: Improvement > Components: parquet-protobuf >Reporter: Michael Wong >Priority: Major > > Related to https://issues.apache.org/jira/browse/PARQUET-1595 -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Commented] (PARQUET-2042) Unwrap common Protobuf wrappers and logical Timestamps, Date, TimeOfDay
[ https://issues.apache.org/jira/browse/PARQUET-2042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17509744#comment-17509744 ] ASF GitHub Bot commented on PARQUET-2042: - mwong38 commented on a change in pull request #900: URL: https://github.com/apache/parquet-mr/pull/900#discussion_r830942166 ## File path: parquet-protobuf/src/main/java/org/apache/parquet/proto/ProtoMessageConverter.java ## @@ -427,6 +485,218 @@ public void addBinary(Binary binary) { } + final class ProtoTimestampConverter extends PrimitiveConverter { + +final ParentValueContainer parent; +final LogicalTypeAnnotation.TimestampLogicalTypeAnnotation logicalTypeAnnotation; + +public ProtoTimestampConverter(ParentValueContainer parent, LogicalTypeAnnotation.TimestampLogicalTypeAnnotation logicalTypeAnnotation) { + this.parent = parent; + this.logicalTypeAnnotation = logicalTypeAnnotation; +} + +@Override +public void addLong(long value) { + switch (logicalTypeAnnotation.getUnit()) { +case MICROS: + parent.add(Timestamps.fromMicros(value)); + break; +case MILLIS: + parent.add(Timestamps.fromMillis(value)); + break; +case NANOS: + parent.add(Timestamps.fromNanos(value)); + break; + } +} + } + + final class ProtoDateConverter extends PrimitiveConverter { + +final ParentValueContainer parent; + +public ProtoDateConverter(ParentValueContainer parent) { + this.parent = parent; +} + +@Override +public void addInt(int value) { + LocalDate localDate = LocalDate.ofEpochDay(value); + com.google.type.Date date = com.google.type.Date.newBuilder() +.setYear(localDate.getYear()) +.setMonth(localDate.getMonthValue()) +.setDay(localDate.getDayOfMonth()) +.build(); + parent.add(date); +} + } + + final class ProtoTimeConverter extends PrimitiveConverter { + +final ParentValueContainer parent; +final LogicalTypeAnnotation.TimeLogicalTypeAnnotation logicalTypeAnnotation; + +public ProtoTimeConverter(ParentValueContainer parent, LogicalTypeAnnotation.TimeLogicalTypeAnnotation logicalTypeAnnotation) { + this.parent = parent; + this.logicalTypeAnnotation = logicalTypeAnnotation; +} + +@Override +public void addLong(long value) { + LocalTime localTime; + switch (logicalTypeAnnotation.getUnit()) { +case MILLIS: + localTime = LocalTime.ofNanoOfDay(value * 1_000_000); + break; +case MICROS: + localTime = LocalTime.ofNanoOfDay(value * 1_000); + break; +case NANOS: + localTime = LocalTime.ofNanoOfDay(value); + break; +default: + throw new IllegalArgumentException("Unrecognized TimeUnit: " + logicalTypeAnnotation.getUnit()); + } + com.google.type.TimeOfDay timeOfDay = com.google.type.TimeOfDay.newBuilder() +.setHours(localTime.getHour()) +.setMinutes(localTime.getMinute()) +.setSeconds(localTime.getSecond()) +.setNanos(localTime.getNano()) +.build(); + parent.add(timeOfDay); +} + } + + final class ProtoDoubleValueConverter extends PrimitiveConverter { + +final ParentValueContainer parent; + +public ProtoDoubleValueConverter(ParentValueContainer parent) { + this.parent = parent; +} + +@Override +public void addDouble(double value) { + parent.add(DoubleValue.of(value)); +} + } + + final class ProtoFloatValueConverter extends PrimitiveConverter { + +final ParentValueContainer parent; + +public ProtoFloatValueConverter(ParentValueContainer parent) { + this.parent = parent; +} + +@Override +public void addFloat(float value) { + parent.add(FloatValue.of(value)); +} + } + + final class ProtoInt64ValueConverter extends PrimitiveConverter { + +final ParentValueContainer parent; + +public ProtoInt64ValueConverter(ParentValueContainer parent) { + this.parent = parent; +} + +@Override +public void addLong(long value) { + parent.add(Int64Value.of(value)); +} + } + + final class ProtoUInt64ValueConverter extends PrimitiveConverter { + +final ParentValueContainer parent; + +public ProtoUInt64ValueConverter(ParentValueContainer parent) { + this.parent = parent; +} + +@Override +public void addLong(long value) { + parent.add(UInt64Value.of(value)); +} + } + + final class ProtoInt32ValueConverter extends PrimitiveConverter { + +final ParentValueContainer parent; + +public ProtoInt32ValueConverter(ParentValueContainer parent) { + this.parent = parent; +} + +@Override +public void addInt(int value) { + parent.add(Int32Value.of(value)); +} + } + + final class ProtoUInt32ValueConverter extends
[jira] [Commented] (PARQUET-2042) Unwrap common Protobuf wrappers and logical Timestamps, Date, TimeOfDay
[ https://issues.apache.org/jira/browse/PARQUET-2042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17509738#comment-17509738 ] ASF GitHub Bot commented on PARQUET-2042: - sheinbergon commented on pull request #900: URL: https://github.com/apache/parquet-mr/pull/900#issuecomment-1073708633 @mwong38 let me know if you want me to help in any way -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: dev-unsubscr...@parquet.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Unwrap common Protobuf wrappers and logical Timestamps, Date, TimeOfDay > --- > > Key: PARQUET-2042 > URL: https://issues.apache.org/jira/browse/PARQUET-2042 > Project: Parquet > Issue Type: Improvement > Components: parquet-protobuf >Reporter: Michael Wong >Priority: Major > > Related to https://issues.apache.org/jira/browse/PARQUET-1595 -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Commented] (PARQUET-2042) Unwrap common Protobuf wrappers and logical Timestamps, Date, TimeOfDay
[ https://issues.apache.org/jira/browse/PARQUET-2042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17509502#comment-17509502 ] ASF GitHub Bot commented on PARQUET-2042: - shangxinli commented on a change in pull request #900: URL: https://github.com/apache/parquet-mr/pull/900#discussion_r830653898 ## File path: parquet-protobuf/src/main/java/org/apache/parquet/proto/ProtoMessageConverter.java ## @@ -427,6 +485,218 @@ public void addBinary(Binary binary) { } + final class ProtoTimestampConverter extends PrimitiveConverter { + +final ParentValueContainer parent; +final LogicalTypeAnnotation.TimestampLogicalTypeAnnotation logicalTypeAnnotation; + +public ProtoTimestampConverter(ParentValueContainer parent, LogicalTypeAnnotation.TimestampLogicalTypeAnnotation logicalTypeAnnotation) { + this.parent = parent; + this.logicalTypeAnnotation = logicalTypeAnnotation; +} + +@Override +public void addLong(long value) { + switch (logicalTypeAnnotation.getUnit()) { +case MICROS: + parent.add(Timestamps.fromMicros(value)); + break; +case MILLIS: + parent.add(Timestamps.fromMillis(value)); + break; +case NANOS: + parent.add(Timestamps.fromNanos(value)); + break; + } +} + } + + final class ProtoDateConverter extends PrimitiveConverter { + +final ParentValueContainer parent; + +public ProtoDateConverter(ParentValueContainer parent) { + this.parent = parent; +} + +@Override +public void addInt(int value) { + LocalDate localDate = LocalDate.ofEpochDay(value); + com.google.type.Date date = com.google.type.Date.newBuilder() +.setYear(localDate.getYear()) +.setMonth(localDate.getMonthValue()) +.setDay(localDate.getDayOfMonth()) +.build(); + parent.add(date); +} + } + + final class ProtoTimeConverter extends PrimitiveConverter { + +final ParentValueContainer parent; +final LogicalTypeAnnotation.TimeLogicalTypeAnnotation logicalTypeAnnotation; + +public ProtoTimeConverter(ParentValueContainer parent, LogicalTypeAnnotation.TimeLogicalTypeAnnotation logicalTypeAnnotation) { + this.parent = parent; + this.logicalTypeAnnotation = logicalTypeAnnotation; +} + +@Override +public void addLong(long value) { + LocalTime localTime; + switch (logicalTypeAnnotation.getUnit()) { +case MILLIS: + localTime = LocalTime.ofNanoOfDay(value * 1_000_000); + break; +case MICROS: + localTime = LocalTime.ofNanoOfDay(value * 1_000); + break; +case NANOS: + localTime = LocalTime.ofNanoOfDay(value); + break; +default: + throw new IllegalArgumentException("Unrecognized TimeUnit: " + logicalTypeAnnotation.getUnit()); + } + com.google.type.TimeOfDay timeOfDay = com.google.type.TimeOfDay.newBuilder() +.setHours(localTime.getHour()) +.setMinutes(localTime.getMinute()) +.setSeconds(localTime.getSecond()) +.setNanos(localTime.getNano()) +.build(); + parent.add(timeOfDay); +} + } + + final class ProtoDoubleValueConverter extends PrimitiveConverter { + +final ParentValueContainer parent; + +public ProtoDoubleValueConverter(ParentValueContainer parent) { + this.parent = parent; +} + +@Override +public void addDouble(double value) { + parent.add(DoubleValue.of(value)); +} + } + + final class ProtoFloatValueConverter extends PrimitiveConverter { + +final ParentValueContainer parent; + +public ProtoFloatValueConverter(ParentValueContainer parent) { + this.parent = parent; +} + +@Override +public void addFloat(float value) { + parent.add(FloatValue.of(value)); +} + } + + final class ProtoInt64ValueConverter extends PrimitiveConverter { + +final ParentValueContainer parent; + +public ProtoInt64ValueConverter(ParentValueContainer parent) { + this.parent = parent; +} + +@Override +public void addLong(long value) { + parent.add(Int64Value.of(value)); +} + } + + final class ProtoUInt64ValueConverter extends PrimitiveConverter { + +final ParentValueContainer parent; + +public ProtoUInt64ValueConverter(ParentValueContainer parent) { + this.parent = parent; +} + +@Override +public void addLong(long value) { + parent.add(UInt64Value.of(value)); +} + } + + final class ProtoInt32ValueConverter extends PrimitiveConverter { + +final ParentValueContainer parent; + +public ProtoInt32ValueConverter(ParentValueContainer parent) { + this.parent = parent; +} + +@Override +public void addInt(int value) { + parent.add(Int32Value.of(value)); +} + } + + final class ProtoUInt32ValueConverter extends
[jira] [Commented] (PARQUET-2042) Unwrap common Protobuf wrappers and logical Timestamps, Date, TimeOfDay
[ https://issues.apache.org/jira/browse/PARQUET-2042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17509501#comment-17509501 ] ASF GitHub Bot commented on PARQUET-2042: - shangxinli commented on a change in pull request #900: URL: https://github.com/apache/parquet-mr/pull/900#discussion_r830652740 ## File path: parquet-protobuf/src/main/java/org/apache/parquet/proto/ProtoMessageConverter.java ## @@ -427,6 +485,218 @@ public void addBinary(Binary binary) { } + final class ProtoTimestampConverter extends PrimitiveConverter { + +final ParentValueContainer parent; +final LogicalTypeAnnotation.TimestampLogicalTypeAnnotation logicalTypeAnnotation; + +public ProtoTimestampConverter(ParentValueContainer parent, LogicalTypeAnnotation.TimestampLogicalTypeAnnotation logicalTypeAnnotation) { + this.parent = parent; + this.logicalTypeAnnotation = logicalTypeAnnotation; +} + +@Override +public void addLong(long value) { + switch (logicalTypeAnnotation.getUnit()) { +case MICROS: + parent.add(Timestamps.fromMicros(value)); + break; +case MILLIS: + parent.add(Timestamps.fromMillis(value)); + break; +case NANOS: + parent.add(Timestamps.fromNanos(value)); + break; + } +} + } + + final class ProtoDateConverter extends PrimitiveConverter { + +final ParentValueContainer parent; + +public ProtoDateConverter(ParentValueContainer parent) { + this.parent = parent; +} + +@Override +public void addInt(int value) { + LocalDate localDate = LocalDate.ofEpochDay(value); + com.google.type.Date date = com.google.type.Date.newBuilder() Review comment: Can we use import? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: dev-unsubscr...@parquet.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Unwrap common Protobuf wrappers and logical Timestamps, Date, TimeOfDay > --- > > Key: PARQUET-2042 > URL: https://issues.apache.org/jira/browse/PARQUET-2042 > Project: Parquet > Issue Type: Improvement > Components: parquet-protobuf >Reporter: Michael Wong >Priority: Major > > Related to https://issues.apache.org/jira/browse/PARQUET-1595 -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Commented] (PARQUET-2042) Unwrap common Protobuf wrappers and logical Timestamps, Date, TimeOfDay
[ https://issues.apache.org/jira/browse/PARQUET-2042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17509336#comment-17509336 ] ASF GitHub Bot commented on PARQUET-2042: - shangxinli commented on pull request #900: URL: https://github.com/apache/parquet-mr/pull/900#issuecomment-1073074540 Can you squash all the commits? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: dev-unsubscr...@parquet.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Unwrap common Protobuf wrappers and logical Timestamps, Date, TimeOfDay > --- > > Key: PARQUET-2042 > URL: https://issues.apache.org/jira/browse/PARQUET-2042 > Project: Parquet > Issue Type: Improvement > Components: parquet-protobuf >Reporter: Michael Wong >Priority: Major > > Related to https://issues.apache.org/jira/browse/PARQUET-1595 -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Commented] (PARQUET-2042) Unwrap common Protobuf wrappers and logical Timestamps, Date, TimeOfDay
[ https://issues.apache.org/jira/browse/PARQUET-2042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17502398#comment-17502398 ] ASF GitHub Bot commented on PARQUET-2042: - shangxinli commented on pull request #900: URL: https://github.com/apache/parquet-mr/pull/900#issuecomment-1060883376 I don't see a reason why this cannot be merge. I will have a look soon. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: dev-unsubscr...@parquet.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Unwrap common Protobuf wrappers and logical Timestamps, Date, TimeOfDay > --- > > Key: PARQUET-2042 > URL: https://issues.apache.org/jira/browse/PARQUET-2042 > Project: Parquet > Issue Type: Improvement > Components: parquet-protobuf >Reporter: Michael Wong >Priority: Major > > Related to https://issues.apache.org/jira/browse/PARQUET-1595 -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Commented] (PARQUET-2042) Unwrap common Protobuf wrappers and logical Timestamps, Date, TimeOfDay
[ https://issues.apache.org/jira/browse/PARQUET-2042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17501969#comment-17501969 ] ASF GitHub Bot commented on PARQUET-2042: - sheinbergon commented on pull request #900: URL: https://github.com/apache/parquet-mr/pull/900#issuecomment-1059969038 @shangxinli is there any reason this work shouldn't be merged, given @mwong38 explanation? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: dev-unsubscr...@parquet.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Unwrap common Protobuf wrappers and logical Timestamps, Date, TimeOfDay > --- > > Key: PARQUET-2042 > URL: https://issues.apache.org/jira/browse/PARQUET-2042 > Project: Parquet > Issue Type: Improvement > Components: parquet-protobuf >Reporter: Michael Wong >Priority: Major > > Related to https://issues.apache.org/jira/browse/PARQUET-1595 -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Commented] (PARQUET-2042) Unwrap common Protobuf wrappers and logical Timestamps, Date, TimeOfDay
[ https://issues.apache.org/jira/browse/PARQUET-2042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17501970#comment-17501970 ] ASF GitHub Bot commented on PARQUET-2042: - sheinbergon edited a comment on pull request #900: URL: https://github.com/apache/parquet-mr/pull/900#issuecomment-1059969038 @shangxinli is there any reason this work shouldn't be merged, given @mwong38 explanation? Is there something I can help with? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: dev-unsubscr...@parquet.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Unwrap common Protobuf wrappers and logical Timestamps, Date, TimeOfDay > --- > > Key: PARQUET-2042 > URL: https://issues.apache.org/jira/browse/PARQUET-2042 > Project: Parquet > Issue Type: Improvement > Components: parquet-protobuf >Reporter: Michael Wong >Priority: Major > > Related to https://issues.apache.org/jira/browse/PARQUET-1595 -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Commented] (PARQUET-2042) Unwrap common Protobuf wrappers and logical Timestamps, Date, TimeOfDay
[ https://issues.apache.org/jira/browse/PARQUET-2042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17501903#comment-17501903 ] ASF GitHub Bot commented on PARQUET-2042: - mwong38 commented on pull request #900: URL: https://github.com/apache/parquet-mr/pull/900#issuecomment-1059918680 After proto3 made everything optional, there is no way to know whether a primitive has been set or not. That is, you could no longer represent a "nullable" primitive. (They later brought `optional` keyword back, but the damage is done). The solution was to wrap a primitive in a message. For example, to represent a nullable `double`, there is the [DoubleValue](https://developers.google.com/protocol-buffers/docs/reference/google.protobuf#doublevalue). It's analogous to Java's boxed `Double`. Parquet supports nullable primitives natively. If we were to represent these well-known Protobuf types directly, it will be nested inside a deeper data structure; very inconvenient and a waste of space. What I did here is "unwrap" these primitives and make use of Parquet's nullaibility. Additionally, I convert other well-known types such as Timestamp and Date to Parquet's. Again, rather than represent the data structure in its raw nested form (seconds/nanos or year, month, day, etc), they are converted to Parquet's native or logical representation of Timestamp and Date. You can turn on or off this features in the `ProtoSchemaConverter` constructor. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: dev-unsubscr...@parquet.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Unwrap common Protobuf wrappers and logical Timestamps, Date, TimeOfDay > --- > > Key: PARQUET-2042 > URL: https://issues.apache.org/jira/browse/PARQUET-2042 > Project: Parquet > Issue Type: Improvement > Components: parquet-protobuf >Reporter: Michael Wong >Priority: Major > > Related to https://issues.apache.org/jira/browse/PARQUET-1595 -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Commented] (PARQUET-2042) Unwrap common Protobuf wrappers and logical Timestamps, Date, TimeOfDay
[ https://issues.apache.org/jira/browse/PARQUET-2042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17501733#comment-17501733 ] ASF GitHub Bot commented on PARQUET-2042: - sheinbergon commented on pull request #900: URL: https://github.com/apache/parquet-mr/pull/900#issuecomment-1059757563 @shangxinli @mwong38 any way I can help with merge? Currently, proto-to-parquet conversions do not support proper timestamp handling, and it looks like this work addresses the issue. Let me know. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: dev-unsubscr...@parquet.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Unwrap common Protobuf wrappers and logical Timestamps, Date, TimeOfDay > --- > > Key: PARQUET-2042 > URL: https://issues.apache.org/jira/browse/PARQUET-2042 > Project: Parquet > Issue Type: Improvement > Components: parquet-protobuf >Reporter: Michael Wong >Priority: Major > > Related to https://issues.apache.org/jira/browse/PARQUET-1595 -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Commented] (PARQUET-2042) Unwrap common Protobuf wrappers and logical Timestamps, Date, TimeOfDay
[ https://issues.apache.org/jira/browse/PARQUET-2042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17442869#comment-17442869 ] ASF GitHub Bot commented on PARQUET-2042: - shangxinli commented on pull request #900: URL: https://github.com/apache/parquet-mr/pull/900#issuecomment-967285427 @mwong38, can you put more information in the Jira on why/what is changed? This is pretty big change and it would help people to review your code. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: dev-unsubscr...@parquet.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Unwrap common Protobuf wrappers and logical Timestamps, Date, TimeOfDay > --- > > Key: PARQUET-2042 > URL: https://issues.apache.org/jira/browse/PARQUET-2042 > Project: Parquet > Issue Type: Improvement > Components: parquet-protobuf >Reporter: Michael Wong >Priority: Major > > Related to https://issues.apache.org/jira/browse/PARQUET-1595 -- This message was sent by Atlassian Jira (v8.20.1#820001)
[jira] [Commented] (PARQUET-2042) Unwrap common Protobuf wrappers and logical Timestamps, Date, TimeOfDay
[ https://issues.apache.org/jira/browse/PARQUET-2042?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17338228#comment-17338228 ] ASF GitHub Bot commented on PARQUET-2042: - mwong38 opened a new pull request #900: URL: https://github.com/apache/parquet-mr/pull/900 … logical Timestamps, Date, TimeOfDay Make sure you have checked _all_ steps below. ### Jira - [X] My PR addresses the following [Parquet Jira](https://issues.apache.org/jira/browse/PARQUET/) issues and references them in the PR title. For example, "PARQUET-1234: My Parquet PR" - https://issues.apache.org/jira/browse/PARQUET-XXX - In case you are adding a dependency, check if the license complies with the [ASF 3rd Party License Policy](https://www.apache.org/legal/resolved.html#category-x). ### Tests - [X] My PR adds the following unit tests __OR__ does not need testing for this extremely good reason: ### Commits - [X] My commits all reference Jira issues in their subject lines. In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)": 1. Subject is separated from body by a blank line 1. Subject is limited to 50 characters (not including Jira issue reference) 1. Subject does not end with a period 1. Subject uses the imperative mood ("add", not "adding") 1. Body wraps at 72 characters 1. Body explains "what" and "why", not "how" ### Documentation - [X] In case of new functionality, my PR adds documentation that describes how to use it. - All the public functions and the classes in the PR contain Javadoc that explain what it does -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Unwrap common Protobuf wrappers and logical Timestamps, Date, TimeOfDay > --- > > Key: PARQUET-2042 > URL: https://issues.apache.org/jira/browse/PARQUET-2042 > Project: Parquet > Issue Type: Improvement > Components: parquet-protobuf >Reporter: Michael Wong >Priority: Major > > Related to https://issues.apache.org/jira/browse/PARQUET-1595 -- This message was sent by Atlassian Jira (v8.3.4#803005)