[jira] [Updated] (PARQUET-1595) Parquet proto writer de-nest Protobuf wrapper classes

2019-06-13 Thread Ying Xu (JIRA)


 [ https://issues.apache.org/jira/browse/PARQUET-1595?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ying Xu updated PARQUET-1595:
-
Description: 
The existing Parquet protobuf writer preserves the structure of any Protobuf
Message object. This works well in most cases. However, when dealing with
[Protobuf wrapper
messages|https://github.com/protocolbuffers/protobuf/blob/master/src/google/protobuf/wrappers.proto],
users may prefer to write the de-nested value directly into the Parquet files,
so that it is easier to query in engines such as Hive or Presto.

Proposal:
 * Add a control flag, e.g., enableDenestingWrappers, that determines whether
Protobuf wrapper classes are de-nested.
 * When this flag is set to true, write each Protobuf wrapper class as a single
primitive field, based on the type of the wrapped *value* field.
  
||Protobuf Type||Parquet Type||
|BoolValue|boolean|
|BytesValue|binary|
|DoubleValue|double|
|FloatValue|float|
|Int32Value|int64 (32-bit, signed)|
|Int64Value|int64 (64-bit, signed)|
|StringValue|binary (string)|
|UInt32Value|int64 (32-bit, unsigned)|
|UInt64Value|int64 (64-bit, unsigned)|
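
The table above can be read as a simple lookup from wrapper message name to target Parquet type. The following is a minimal sketch of that lookup only, not parquet-mr code: the class name WrapperTypeTable and its methods are hypothetical, and the type strings are copied verbatim from the table.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical lookup mirroring the proposal's table; not part of parquet-mr. */
public final class WrapperTypeTable {
    // Keys are the wrapper message names from wrappers.proto;
    // values are the Parquet types exactly as listed in the proposal.
    private static final Map<String, String> PARQUET_TYPE_BY_WRAPPER = new LinkedHashMap<>();

    static {
        PARQUET_TYPE_BY_WRAPPER.put("BoolValue", "boolean");
        PARQUET_TYPE_BY_WRAPPER.put("BytesValue", "binary");
        PARQUET_TYPE_BY_WRAPPER.put("DoubleValue", "double");
        PARQUET_TYPE_BY_WRAPPER.put("FloatValue", "float");
        PARQUET_TYPE_BY_WRAPPER.put("Int32Value", "int64 (32-bit, signed)");
        PARQUET_TYPE_BY_WRAPPER.put("Int64Value", "int64 (64-bit, signed)");
        PARQUET_TYPE_BY_WRAPPER.put("StringValue", "binary (string)");
        PARQUET_TYPE_BY_WRAPPER.put("UInt32Value", "int64 (32-bit, unsigned)");
        PARQUET_TYPE_BY_WRAPPER.put("UInt64Value", "int64 (64-bit, unsigned)");
    }

    private WrapperTypeTable() {}

    /** Returns true if the message name is one of the listed wrapper types. */
    public static boolean isWrapper(String messageName) {
        return PARQUET_TYPE_BY_WRAPPER.containsKey(messageName);
    }

    /** Returns the Parquet type listed for a wrapper, or null for any other message. */
    public static String parquetTypeFor(String messageName) {
        return PARQUET_TYPE_BY_WRAPPER.get(messageName);
    }

    public static void main(String[] args) {
        // Print the mapping in table order.
        for (Map.Entry<String, String> entry : PARQUET_TYPE_BY_WRAPPER.entrySet()) {
            System.out.println(entry.getKey() + " -> " + entry.getValue());
        }
    }
}
```

Any message name that is not in the table returns null, which corresponds to leaving that message nested as the writer does today.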

 

  was:
The existing Parquet protobuf writer preserves the structure of any Protobuf
Message object. This works well in most cases. However, when dealing with
[Protobuf wrapper
messages|https://github.com/protocolbuffers/protobuf/blob/master/src/google/protobuf/wrappers.proto],
users may prefer to write the de-nested value directly into the Parquet files,
so that it is easier to query in engines such as Hive or Presto.

Proposal:
 * Add a control flag, e.g., enableDenestingProtoWrappers, that determines
whether Protobuf wrapper classes are de-nested.
 * When this flag is set to true, write each Protobuf wrapper class as a single
primitive field, based on the type of the wrapped *value* field.
 
||Protobuf Type||Parquet Type||
|BoolValue|boolean|
|BytesValue|binary|
|DoubleValue|double|
|FloatValue|float|
|Int32Value|int64 (32-bit, signed)|
|Int64Value|int64 (64-bit, signed)|
|StringValue|binary (string)|
|UInt32Value|int64 (32-bit, unsigned)|
|UInt64Value|int64 (64-bit, unsigned)|

 


> Parquet proto writer de-nest Protobuf wrapper classes
> -
>
> Key: PARQUET-1595
> URL: https://issues.apache.org/jira/browse/PARQUET-1595
> Project: Parquet
>  Issue Type: Improvement
>  Components: parquet-mr
> Reporter: Ying Xu
> Priority: Major
>
> The existing Parquet protobuf writer preserves the structure of any Protobuf
> Message object. This works well in most cases. However, when dealing with
> [Protobuf wrapper
> messages|https://github.com/protocolbuffers/protobuf/blob/master/src/google/protobuf/wrappers.proto],
> users may prefer to write the de-nested value directly into the Parquet
> files, so that it is easier to query in engines such as Hive or Presto.
> Proposal:
>  * Add a control flag, e.g., enableDenestingWrappers, that determines
> whether Protobuf wrapper classes are de-nested.
>  * When this flag is set to true, write each Protobuf wrapper class as a
> single primitive field, based on the type of the wrapped *value* field.
>   
> ||Protobuf Type||Parquet Type||
> |BoolValue|boolean|
> |BytesValue|binary|
> |DoubleValue|double|
> |FloatValue|float|
> |Int32Value|int64 (32-bit, signed)|
> |Int64Value|int64 (64-bit, signed)|
> |StringValue|binary (string)|
> |UInt32Value|int64 (32-bit, unsigned)|
> |UInt64Value|int64 (64-bit, unsigned)|
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (PARQUET-1595) Parquet proto writer de-nest Protobuf wrapper classes

2019-06-12 Thread Ying Xu (JIRA)
Ying Xu created PARQUET-1595:


 Summary: Parquet proto writer de-nest Protobuf wrapper classes
 Key: PARQUET-1595
 URL: https://issues.apache.org/jira/browse/PARQUET-1595
 Project: Parquet
  Issue Type: Improvement
  Components: parquet-mr
Reporter: Ying Xu


The existing Parquet protobuf writer preserves the structure of any Protobuf
Message object. This works well in most cases. However, when dealing with
[Protobuf wrapper
messages|https://github.com/protocolbuffers/protobuf/blob/master/src/google/protobuf/wrappers.proto],
users may prefer to write the de-nested value directly into the Parquet files,
so that it is easier to query in engines such as Hive or Presto.

Proposal:
 * Add a control flag, e.g., enableDenestingProtoWrappers, that determines
whether Protobuf wrapper classes are de-nested.
 * When this flag is set to true, write each Protobuf wrapper class as a single
primitive field, based on the type of the wrapped *value* field.
 
||Protobuf Type||Parquet Type||
|BoolValue|boolean|
|BytesValue|binary|
|DoubleValue|double|
|FloatValue|float|
|Int32Value|int64 (32-bit, signed)|
|Int64Value|int64 (64-bit, signed)|
|StringValue|binary (string)|
|UInt32Value|int64 (32-bit, unsigned)|
|UInt64Value|int64 (64-bit, unsigned)|
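
To illustrate what the flag would change, here is a rough sketch of the schema text for one wrapper-typed field with the flag off and on. It is only an illustration under assumptions: the class DenestFlagSketch and its method are hypothetical, the "optional" repetition is assumed, and the strings are a simplified rendering rather than output produced by parquet-mr.

```java
/** Hypothetical illustration of the proposed flag; not part of parquet-mr. */
public final class DenestFlagSketch {
    private DenestFlagSketch() {}

    /**
     * Renders a simplified schema line for a field whose Protobuf type is a wrapper.
     *
     * @param fieldName                     name of the field in the enclosing message
     * @param wrappedType                   Parquet type of the wrapper's "value" field
     * @param enableDenestingProtoWrappers  the flag proposed in this issue
     */
    public static String fieldSchema(String fieldName, String wrappedType,
                                     boolean enableDenestingProtoWrappers) {
        if (enableDenestingProtoWrappers) {
            // De-nested: the wrapper collapses into a single primitive field.
            return "optional " + wrappedType + " " + fieldName + ";";
        }
        // Current behavior: the wrapper message is kept as a group holding "value".
        return "optional group " + fieldName + " { optional " + wrappedType + " value; }";
    }

    public static void main(String[] args) {
        System.out.println(fieldSchema("name", "binary", false));
        System.out.println(fieldSchema("name", "binary", true));
    }
}
```

With the flag on, a query engine would address the column as name instead of name.value, which is the ease-of-querying benefit described above.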

 


