Ying Xu created PARQUET-1595:
--------------------------------

             Summary: Parquet proto writer de-nest Protobuf wrapper classes
                 Key: PARQUET-1595
                 URL: https://issues.apache.org/jira/browse/PARQUET-1595
             Project: Parquet
          Issue Type: Improvement
          Components: parquet-mr
            Reporter: Ying Xu
The existing Parquet protobuf writer preserves the structure of every Protobuf Message object it writes. This works well in most cases. However, when dealing with [Protobuf wrapper messages|https://github.com/protocolbuffers/protobuf/blob/master/src/google/protobuf/wrappers.proto], users may prefer to write the de-nested value directly into the Parquet files, so that the files are easier to query in engines such as Hive or Presto. Today a wrapper field (for example an Int32Value) is written as a group containing a single nested *value* column rather than as a single primitive column.

Proposal:
* Implement a control flag, e.g., enableDenestingProtoWrappers, that controls whether Protobuf wrapper classes are de-nested.
* When this flag is set to true, write each Protobuf wrapper class as a single primitive field whose type is based on the type of the wrapped *value* field:

||Protobuf Type||Parquet Type||
|BoolValue|boolean|
|BytesValue|binary|
|DoubleValue|double|
|FloatValue|float|
|Int32Value|int32 (32-bit, signed)|
|Int64Value|int64 (64-bit, signed)|
|StringValue|binary (string)|
|UInt32Value|int32 (32-bit, unsigned)|
|UInt64Value|int64 (64-bit, unsigned)|

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
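The proposed behavior amounts to a lookup from wrapper message name to a Parquet primitive type, consulted only when the flag is on. The following is a minimal, self-contained sketch under stated assumptions, not parquet-mr code: the class `WrapperDenestingSketch`, the helper `ParquetColumnType`, and the method `denestedType` are hypothetical, the flag name is only the example given in the proposal, and Parquet types are modeled as plain strings instead of parquet-mr's schema classes. The 32-bit wrappers map to the int32 physical type because the Parquet format only permits the 32-bit INT annotation on INT32 columns.

```java
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch, not parquet-mr code: decides whether a message-typed field
// holding a google.protobuf wrapper is written as a single primitive column.
final class WrapperDenestingSketch {

    // Hypothetical stand-in for a Parquet primitive type: the physical type plus an
    // optional logical annotation, rendered as "physical (annotation)".
    static final class ParquetColumnType {
        final String physicalType;
        final String annotation; // null when no logical annotation applies

        ParquetColumnType(String physicalType, String annotation) {
            this.physicalType = physicalType;
            this.annotation = annotation;
        }

        @Override
        public String toString() {
            return annotation == null ? physicalType : physicalType + " (" + annotation + ")";
        }
    }

    // Full names of the messages in google/protobuf/wrappers.proto, mapped to the
    // Parquet type of their single "value" field. 32-bit wrappers use int32.
    static final Map<String, ParquetColumnType> WRAPPER_TYPES;

    static {
        Map<String, ParquetColumnType> m = new LinkedHashMap<>();
        m.put("google.protobuf.BoolValue", new ParquetColumnType("boolean", null));
        m.put("google.protobuf.BytesValue", new ParquetColumnType("binary", null));
        m.put("google.protobuf.DoubleValue", new ParquetColumnType("double", null));
        m.put("google.protobuf.FloatValue", new ParquetColumnType("float", null));
        m.put("google.protobuf.Int32Value", new ParquetColumnType("int32", "32-bit, signed"));
        m.put("google.protobuf.Int64Value", new ParquetColumnType("int64", "64-bit, signed"));
        m.put("google.protobuf.StringValue", new ParquetColumnType("binary", "string"));
        m.put("google.protobuf.UInt32Value", new ParquetColumnType("int32", "32-bit, unsigned"));
        m.put("google.protobuf.UInt64Value", new ParquetColumnType("int64", "64-bit, unsigned"));
        WRAPPER_TYPES = Collections.unmodifiableMap(m);
    }

    // Returns the de-nested Parquet type for a field whose message type has the given
    // full name, or null if the field should keep the default nested (group) layout.
    static ParquetColumnType denestedType(String messageFullName,
                                          boolean enableDenestingProtoWrappers) {
        if (!enableDenestingProtoWrappers) {
            return null; // flag off: keep the existing nested behavior
        }
        return WRAPPER_TYPES.get(messageFullName); // null for non-wrapper messages
    }

    public static void main(String[] args) {
        System.out.println(denestedType("google.protobuf.Int32Value", true));  // prints "int32 (32-bit, signed)"
        System.out.println(denestedType("google.protobuf.Int32Value", false)); // prints "null"
        System.out.println(denestedType("example.Address", true));             // prints "null"
    }
}
```

Keeping the flag check in one place means the default (flag off) path is unchanged for existing users. A real writer would presumably need to apply the same decision both when converting the schema and when writing values, so that the column type and the written value agree; how parquet-mr would expose the flag is not specified in this issue.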