[jira] [Updated] (PARQUET-1020) Add support for Dynamic Messages in parquet-protobuf

2022-09-15 Thread Aaron Blake Niskode-Dossett (Jira)


 [ 
https://issues.apache.org/jira/browse/PARQUET-1020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron Blake Niskode-Dossett updated PARQUET-1020:
-
Fix Version/s: 1.13.0

> Add support for Dynamic Messages in parquet-protobuf
> 
>
> Key: PARQUET-1020
> URL: https://issues.apache.org/jira/browse/PARQUET-1020
> Project: Parquet
>  Issue Type: New Feature
>  Components: parquet-protobuf
>Reporter: Alex Buck
>Assignee: Alex Buck
>Priority: Major
> Fix For: 1.13.0
>
>
> Hello. We would like to pass in a DynamicMessage rather than using the 
> generated protobuf classes to allow us to make our job very generic. 
> I think this could be achieved by setting the descriptor upfront, similarly 
> to how there is a ProtoParquetOutputFormat today.
> In ProtoWriteSupport in the init method it could then generate the parquet 
> schema created by ProtoSchemaConverter using the passed in descriptor, rather 
> than taking it from the generated proto class.
> Would there be interest in incorporating this change? If so does the approach 
> above sound sensible? I am happy to do a pull request
> initial PR here: https://github.com/apache/parquet-mr/pull/414
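
The idea in the description, deriving the Parquet schema from a descriptor supplied at runtime rather than from a generated class, can be sketched with a toy model. Everything below (`FieldSpec`, `RuntimeSchemaSketch`, `toParquetSchema`, and the small type mapping) is hypothetical and only stands in for protobuf's descriptor and for ProtoSchemaConverter; it is not the parquet-protobuf API.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a protobuf field descriptor known only at runtime.
final class FieldSpec {
    final String name;
    final String protoType; // e.g. "int32", "string"

    FieldSpec(String name, String protoType) {
        this.name = name;
        this.protoType = protoType;
    }
}

final class RuntimeSchemaSketch {
    // Maps a few proto scalar types to Parquet primitive names (toy subset).
    static String parquetType(String protoType) {
        switch (protoType) {
            case "int32": return "int32";
            case "int64": return "int64";
            case "double": return "double";
            case "bool": return "boolean";
            case "string": return "binary";
            default: throw new IllegalArgumentException("unsupported type: " + protoType);
        }
    }

    // Builds a schema string from the descriptor passed in up front,
    // so no generated message class is needed.
    static String toParquetSchema(String messageName, List<FieldSpec> fields) {
        StringBuilder sb = new StringBuilder("message " + messageName + " {\n");
        for (FieldSpec f : fields) {
            sb.append("  optional ").append(parquetType(f.protoType))
              .append(' ').append(f.name).append(";\n");
        }
        return sb.append("}").toString();
    }

    public static void main(String[] args) {
        List<FieldSpec> fields = Arrays.asList(
                new FieldSpec("id", "int64"),
                new FieldSpec("name", "string"));
        System.out.println(toParquetSchema("Event", fields));
    }
}
```

The point of the sketch is only that the schema depends on data available at job start, which is what makes a generic job possible.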



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Resolved] (PARQUET-1020) Add support for Dynamic Messages in parquet-protobuf

2022-09-15 Thread Aaron Blake Niskode-Dossett (Jira)


 [ 
https://issues.apache.org/jira/browse/PARQUET-1020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron Blake Niskode-Dossett resolved PARQUET-1020.
--
Resolution: Fixed

> Add support for Dynamic Messages in parquet-protobuf
> 
>
> Key: PARQUET-1020
> URL: https://issues.apache.org/jira/browse/PARQUET-1020
> Project: Parquet
>  Issue Type: New Feature
>  Components: parquet-protobuf
>Reporter: Alex Buck
>Assignee: Alex Buck
>Priority: Major
> Fix For: 1.13.0
>
>
> Hello. We would like to pass in a DynamicMessage rather than using the 
> generated protobuf classes to allow us to make our job very generic. 
> I think this could be achieved by setting the descriptor upfront, similarly 
> to how there is a ProtoParquetOutputFormat today.
> In ProtoWriteSupport in the init method it could then generate the parquet 
> schema created by ProtoSchemaConverter using the passed in descriptor, rather 
> than taking it from the generated proto class.
> Would there be interest in incorporating this change? If so does the approach 
> above sound sensible? I am happy to do a pull request
> initial PR here: https://github.com/apache/parquet-mr/pull/414





[jira] [Updated] (PARQUET-1020) Add support for Dynamic Messages in parquet-protobuf

2022-06-09 Thread Aaron Blake Niskode-Dossett (Jira)


 [ 
https://issues.apache.org/jira/browse/PARQUET-1020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron Blake Niskode-Dossett updated PARQUET-1020:
-
Component/s: parquet-protobuf

> Add support for Dynamic Messages in parquet-protobuf
> 
>
> Key: PARQUET-1020
> URL: https://issues.apache.org/jira/browse/PARQUET-1020
> Project: Parquet
>  Issue Type: New Feature
>  Components: parquet-protobuf
>Reporter: Alex Buck
>Assignee: Alex Buck
>Priority: Major
>
> Hello. We would like to pass in a DynamicMessage rather than using the 
> generated protobuf classes to allow us to make our job very generic. 
> I think this could be achieved by setting the descriptor upfront, similarly 
> to how there is a ProtoParquetOutputFormat today.
> In ProtoWriteSupport in the init method it could then generate the parquet 
> schema created by ProtoSchemaConverter using the passed in descriptor, rather 
> than taking it from the generated proto class.
> Would there be interest in incorporating this change? If so does the approach 
> above sound sensible? I am happy to do a pull request
> initial PR here: https://github.com/apache/parquet-mr/pull/414



--
This message was sent by Atlassian Jira
(v8.20.7#820007)


[jira] [Commented] (PARQUET-1711) [parquet-protobuf] stack overflow when work with well known json type

2022-05-26 Thread Aaron Blake Niskode-Dossett (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-1711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17542642#comment-17542642
 ] 

Aaron Blake Niskode-Dossett commented on PARQUET-1711:
--

That is a very interesting issue.  I'm still learning a lot about parquet, but 
I'm not sure parquet has a notion of a recursive schema the way protobuf does 
here.  In other words, I'm not sure how parquet *could* handle this (although 
it should not be a stack overflow!)

> [parquet-protobuf] stack overflow when work with well known json type
> -
>
> Key: PARQUET-1711
> URL: https://issues.apache.org/jira/browse/PARQUET-1711
> Project: Parquet
>  Issue Type: Bug
>Affects Versions: 1.10.1
>Reporter: Lawrence He
>Priority: Major
>
> Writing the following protobuf message to a Parquet file is not possible: 
> {code:java}
> syntax = "proto3";
> import "google/protobuf/struct.proto";
> package test;
> option java_outer_classname = "CustomMessage";
> message TestMessage {
> map<string, google.protobuf.Value> data = 1;
> } {code}
> Protobuf introduced "well-known JSON types" such as 
> [ListValue|https://developers.google.com/protocol-buffers/docs/reference/google.protobuf#listvalue]
>  to work around JSON schema conversion. 
> However, writing the message above traps the parquet writer in infinite 
> recursion because of the "general type" support in protobuf. The current 
> implementation keeps following the 6 possible types defined in protobuf 
> (null, bool, number, string, struct, list) and recurses without end once it 
> reaches "struct".
> {code:java}
> java.lang.StackOverflowError
>     at java.base/java.util.Arrays$ArrayItr.<init>(Arrays.java:4418)
>     at java.base/java.util.Arrays$ArrayList.iterator(Arrays.java:4410)
>     at java.base/java.util.Collections$UnmodifiableCollection$1.<init>(Collections.java:1044)
>     at java.base/java.util.Collections$UnmodifiableCollection.iterator(Collections.java:1043)
>     at org.apache.parquet.proto.ProtoSchemaConverter.convertFields(ProtoSchemaConverter.java:64)
>     at org.apache.parquet.proto.ProtoSchemaConverter.addField(ProtoSchemaConverter.java:96)
>     at org.apache.parquet.proto.ProtoSchemaConverter.convertFields(ProtoSchemaConverter.java:66)
>     at org.apache.parquet.proto.ProtoSchemaConverter.addField(ProtoSchemaConverter.java:96)
>     at org.apache.parquet.proto.ProtoSchemaConverter.convertFields(ProtoSchemaConverter.java:66)
>     at org.apache.parquet.proto.ProtoSchemaConverter.addField(ProtoSchemaConverter.java:96)
>     at org.apache.parquet.proto.ProtoSchemaConverter.convertFields(ProtoSchemaConverter.java:66)
>     at org.apache.parquet.proto.ProtoSchemaConverter.addField(ProtoSchemaConverter.java:96)
>     at org.apache.parquet.proto.ProtoSchemaConverter.convertFields(ProtoSchemaConverter.java:66)
>     at org.apache.parquet.proto.ProtoSchemaConverter.addField(ProtoSchemaConverter.java:96)
>  {code}
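
The cycle behind the overflow can be reproduced with a toy type graph. The sketch below is an assumption-laden illustration, not the project's fix: `MsgType` and `RecursionGuardSketch` are hypothetical names, and failing fast on a repeated type is just one possible guard (a depth limit would be another). In google/protobuf/struct.proto, Value can hold a Struct and Struct maps strings to Value, which is the loop modeled here.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical, minimal model of a message type:
// field name -> nested message type (null for scalar fields).
final class MsgType {
    final String name;
    final Map<String, MsgType> fields = new LinkedHashMap<>();

    MsgType(String name) {
        this.name = name;
    }
}

final class RecursionGuardSketch {
    // Walks nested message types depth-first and counts fields; fails fast when
    // a type reappears on the current path instead of recursing forever.
    static int countFields(MsgType type, Deque<String> path) {
        if (path.contains(type.name)) {
            throw new IllegalStateException("recursive type: " + type.name);
        }
        path.push(type.name);
        int count = 0;
        for (MsgType nested : type.fields.values()) {
            count++;
            if (nested != null) {
                count += countFields(nested, path);
            }
        }
        path.pop();
        return count;
    }

    public static void main(String[] args) {
        // Value -> Struct -> Value mirrors the cycle in struct.proto.
        MsgType value = new MsgType("Value");
        MsgType struct = new MsgType("Struct");
        value.fields.put("struct_value", struct);
        struct.fields.put("fields", value);
        try {
            countFields(value, new ArrayDeque<String>());
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Without the `path.contains` check, the same walk would recurse until the stack is exhausted, matching the alternating convertFields/addField frames in the trace.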





[jira] [Commented] (PARQUET-2126) Thread safety bug in CodecFactory

2022-03-30 Thread Aaron Blake Niskode-Dossett (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-2126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17514752#comment-17514752
 ] 

Aaron Blake Niskode-Dossett commented on PARQUET-2126:
--

[~dzamo] I'm not a committer on this project, but if you'd like to submit a fix 
for this I'd be happy to review it (for whatever that would be worth).

 

I looked at the code and a couple of thoughts:

The hadoop-common library does (de)compressor pooling and has a DoNotPool 
annotation used by, yep, 
[BuiltInGzipCompressor|https://github.com/apache/hadoop/blob/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/compress/zlib/BuiltInGzipDecompressor.java#L35].

 
 * Perhaps Parquet could look for that annotation and use/reuse accordingly?
 * It's also possible that parquet doesn't need to cache the (de)compressors at 
all since the hadoop CodecPool might be doing this already (and respecting the 
DoNotPool annotation)? Most of this parquet code is over nine years old so it's 
hard to tell.
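
A minimal sketch of the first suggestion: a borrow/return pool that never retains instances whose class carries a do-not-pool marker. The annotation is declared locally so the example compiles on its own; it stands in for Hadoop's DoNotPool, and `Codec`, `PoolableCodec`, `UnpoolableCodec` and `AnnotationAwarePool` are hypothetical names, not Parquet or Hadoop classes.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.function.Supplier;

// Local stand-in for a do-not-pool marker; runtime retention lets the pool see it.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.TYPE)
@interface DoNotPool {}

// Hypothetical codec types used only for illustration.
interface Codec {}

final class PoolableCodec implements Codec {}

@DoNotPool
final class UnpoolableCodec implements Codec {}

// Each borrower gets exclusive use of an instance; marked classes are never reused.
final class AnnotationAwarePool {
    private final Supplier<Codec> factory;
    private final ConcurrentLinkedQueue<Codec> idle = new ConcurrentLinkedQueue<>();

    AnnotationAwarePool(Supplier<Codec> factory) {
        this.factory = factory;
    }

    Codec borrow() {
        Codec c = idle.poll();
        return c != null ? c : factory.get();
    }

    // Returns true if the instance was kept for reuse.
    boolean giveBack(Codec c) {
        if (c.getClass().isAnnotationPresent(DoNotPool.class)) {
            return false; // drop it; the next borrower gets a fresh instance
        }
        return idle.offer(c);
    }
}
```

The contrast with the behavior described in the issue is that nothing here hands the same instance to two callers at once: an instance is either out on loan or sitting in the idle queue.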

> Thread safety bug in CodecFactory
> -
>
> Key: PARQUET-2126
> URL: https://issues.apache.org/jira/browse/PARQUET-2126
> Project: Parquet
>  Issue Type: Bug
>  Components: parquet-mr
>Affects Versions: 1.12.2
>Reporter: James Turton
>Priority: Major
>
> The code for returning Compressor objects to the caller goes to some lengths 
> to achieve thread safety, including keeping Codec objects in an Apache 
> Commons pool that has thread-safe borrow semantics.  This is all undone by 
> the BytesCompressor and BytesDecompressor Maps in 
> org.apache.parquet.hadoop.CodecFactory which end up caching single compressor 
> and decompressor instances due to code in CodecFactory@getCompressor and 
> CodecFactory@getDecompressor.  When the caller runs multiple threads, those 
> threads end up sharing compressor and decompressor instances.
> For compressors based on Xerial Snappy this bug has no effect because that 
> library is itself thread safe.  But when BuiltInGzipCompressor from Hadoop is 
> selected for the CompressionCodecName.GZIP case, serious problems ensue.  
> That class is not thread safe and sharing one instance of it between threads 
> produces both silent data corruption and JVM crashes.
> To fix this situation, parquet-mr should stop caching single compressor and 
> decompressor instances.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Created] (PARQUET-2018) ParquetThriftWriter uses deprecated constructors

2021-04-07 Thread Aaron Blake Niskode-Dossett (Jira)
Aaron Blake Niskode-Dossett created PARQUET-2018:


 Summary: ParquetThriftWriter uses deprecated constructors
 Key: PARQUET-2018
 URL: https://issues.apache.org/jira/browse/PARQUET-2018
 Project: Parquet
  Issue Type: Improvement
  Components: parquet-thrift
Affects Versions: 1.12.0
Reporter: Aaron Blake Niskode-Dossett


ParquetThriftWriter only has constructors that rely on deprecated ParquetWriter 
constructors.  It should implement a builder by extending ParquetWriter.builder 
similar to how other parquet writer extensions have.

 

This would, at some point in the future, be a blocker for 2.0.0
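
For illustration, the migration could take roughly the following shape. This is a sketch under stated assumptions: `SketchWriter`, its fields, and the `withPageSize`/`withOverwrite` options are hypothetical and do not reflect the actual ParquetThriftWriter or ParquetWriter signatures.

```java
// Hypothetical writer used only to illustrate moving from constructors to a builder.
final class SketchWriter {
    final String path;
    final int pageSize;
    final boolean overwrite;

    /** @deprecated use {@link #builder(String)} instead. */
    @Deprecated
    SketchWriter(String path, int pageSize) {
        this(path, pageSize, false);
    }

    private SketchWriter(String path, int pageSize, boolean overwrite) {
        this.path = path;
        this.pageSize = pageSize;
        this.overwrite = overwrite;
    }

    static Builder builder(String path) {
        return new Builder(path);
    }

    static final class Builder {
        private final String path;
        private int pageSize = 1024 * 1024; // illustrative default
        private boolean overwrite = false;

        private Builder(String path) {
            this.path = path;
        }

        Builder withPageSize(int pageSize) {
            this.pageSize = pageSize;
            return this;
        }

        Builder withOverwrite(boolean overwrite) {
            this.overwrite = overwrite;
            return this;
        }

        SketchWriter build() {
            return new SketchWriter(path, pageSize, overwrite);
        }
    }
}
```

A caller would then write `SketchWriter.builder("/tmp/out").withPageSize(4096).build()`, and new options can be added to the builder without adding another constructor overload.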



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (PARQUET-1903) Improve Parquet Protobuf Usability

2021-03-31 Thread Aaron Blake Niskode-Dossett (Jira)


 [ 
https://issues.apache.org/jira/browse/PARQUET-1903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron Blake Niskode-Dossett updated PARQUET-1903:
-
Component/s: parquet-protobuf

> Improve Parquet Protobuf Usability
> --
>
> Key: PARQUET-1903
> URL: https://issues.apache.org/jira/browse/PARQUET-1903
> Project: Parquet
>  Issue Type: Improvement
>  Components: parquet-protobuf
>Reporter: David Mollitor
>Assignee: David Mollitor
>Priority: Major
>
> Check out the PR for details.
>  
>  * Move away from passing around a {{Class}} object to take advantage of Java 
> generics
>  * Make parquet-proto library more usable and straightforward
>  * Provide test examples
>  * Limited support for protocol buffer schema registry
>  





[jira] [Updated] (PARQUET-2012) ProtoParquetWriter constructors should be updated

2021-03-31 Thread Aaron Blake Niskode-Dossett (Jira)


 [ 
https://issues.apache.org/jira/browse/PARQUET-2012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron Blake Niskode-Dossett updated PARQUET-2012:
-
Priority: Minor  (was: Major)

> ProtoParquetWriter constructors should be updated
> -
>
> Key: PARQUET-2012
> URL: https://issues.apache.org/jira/browse/PARQUET-2012
> Project: Parquet
>  Issue Type: Improvement
>  Components: parquet-protobuf
>Affects Versions: 1.12.0
>Reporter: Aaron Blake Niskode-Dossett
>Assignee: Aaron Blake Niskode-Dossett
>Priority: Minor
>
> The constructors should be marked as deprecated and internal uses of them 
> switched to the Builder pattern





[jira] [Updated] (PARQUET-2012) ProtoParquetWriter constructors should be updated

2021-03-31 Thread Aaron Blake Niskode-Dossett (Jira)


 [ 
https://issues.apache.org/jira/browse/PARQUET-2012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron Blake Niskode-Dossett updated PARQUET-2012:
-
Description: The constructors should be marked as deprecated and internal 
uses of them switched to the Builder pattern  (was: The constructors should be 
updated not to rely on deprecated ParquetWriter constructors. The constructors 
should also add support for setting ParquetFileWriter.Mode)

> ProtoParquetWriter constructors should be updated
> -
>
> Key: PARQUET-2012
> URL: https://issues.apache.org/jira/browse/PARQUET-2012
> Project: Parquet
>  Issue Type: Improvement
>  Components: parquet-protobuf
>Affects Versions: 1.12.0
>Reporter: Aaron Blake Niskode-Dossett
>Assignee: Aaron Blake Niskode-Dossett
>Priority: Major
>
> The constructors should be marked as deprecated and internal uses of them 
> switched to the Builder pattern





[jira] [Created] (PARQUET-2012) ProtoParquetWriter constructors should be updated

2021-03-31 Thread Aaron Blake Niskode-Dossett (Jira)
Aaron Blake Niskode-Dossett created PARQUET-2012:


 Summary: ProtoParquetWriter constructors should be updated
 Key: PARQUET-2012
 URL: https://issues.apache.org/jira/browse/PARQUET-2012
 Project: Parquet
  Issue Type: Improvement
  Components: parquet-protobuf
Affects Versions: 1.12.0
Reporter: Aaron Blake Niskode-Dossett
Assignee: Aaron Blake Niskode-Dossett


The constructors should be updated not to rely on deprecated ParquetWriter 
constructors. The constructors should also add support for setting 
ParquetFileWriter.Mode





[jira] [Updated] (PARQUET-1917) [parquet-proto] default values are stored in oneOf fields that aren't set

2020-10-21 Thread Aaron Blake Niskode-Dossett (Jira)


 [ 
https://issues.apache.org/jira/browse/PARQUET-1917?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron Blake Niskode-Dossett updated PARQUET-1917:
-
Component/s: parquet-protobuf

> [parquet-proto] default values are stored in oneOf fields that aren't set
> -
>
> Key: PARQUET-1917
> URL: https://issues.apache.org/jira/browse/PARQUET-1917
> Project: Parquet
>  Issue Type: Bug
>  Components: parquet-protobuf
>Affects Versions: 1.12.0
>Reporter: Aaron Blake Niskode-Dossett
>Priority: Major
>
> SCHEMA
> 
> {noformat}
> message Person {
>   int32 foo = 1;
>   oneof optional_bar {
>     int32 bar_int = 200;
>     int32 bar_int2 = 201;
>     string bar_string = 300;
>   }
> }{noformat}
>  
> CODE
> 
> I set values for foo and bar_string
>  
> {noformat}
> for (int i = 0; i < 3; i += 1) {
>     com.etsy.grpcparquet.Person message = Person.newBuilder()
>             .setFoo(i)
>             .setBarString("hello world")
>             .build();
>     message.writeDelimitedTo(out);
> }{noformat}
> And then I write the protobuf file out to parquet.
>  
> RESULT
> ---
> {noformat}
> $ parquet-tools show example.parquet
> +-------+-----------+------------+--------------+
> |   foo |   bar_int |   bar_int2 | bar_string   |
> |-------+-----------+------------+--------------|
> |     0 |         0 |          0 | hello world  |
> |     1 |         0 |          0 | hello world  |
> |     2 |         0 |          0 | hello world  |
> +-------+-----------+------------+--------------+{noformat}
>  
> bar_int and bar_int2 should be EMPTY for all three rows since only bar_string 
> is set in the oneof.  0 is the default value for int, but it should not be 
> stored.





[jira] [Created] (PARQUET-1917) [parquet-proto] default values are stored in oneOf fields that aren't set

2020-09-29 Thread Aaron Blake Niskode-Dossett (Jira)
Aaron Blake Niskode-Dossett created PARQUET-1917:


 Summary: [parquet-proto] default values are stored in oneOf fields 
that aren't set
 Key: PARQUET-1917
 URL: https://issues.apache.org/jira/browse/PARQUET-1917
 Project: Parquet
  Issue Type: Bug
Affects Versions: 1.12.0
Reporter: Aaron Blake Niskode-Dossett


SCHEMA

{noformat}
message Person {
  int32 foo = 1;
  oneof optional_bar {
    int32 bar_int = 200;
    int32 bar_int2 = 201;
    string bar_string = 300;
  }
}{noformat}
 
CODE

I set values for foo and bar_string
 
{noformat}
for (int i = 0; i < 3; i += 1) {
    com.etsy.grpcparquet.Person message = Person.newBuilder()
            .setFoo(i)
            .setBarString("hello world")
            .build();
    message.writeDelimitedTo(out);
}{noformat}


And then I write the protobuf file out to parquet.
 
RESULT
---
{noformat}
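$ parquet-tools show example.parquet
+-------+-----------+------------+--------------+
|   foo |   bar_int |   bar_int2 | bar_string   |
|-------+-----------+------------+--------------|
|     0 |         0 |          0 | hello world  |
|     1 |         0 |          0 | hello world  |
|     2 |         0 |          0 | hello world  |
+-------+-----------+------------+--------------+{noformat}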
 
bar_int and bar_int2 should be EMPTY for all three rows since only bar_string 
is set in the oneof.  0 is the default value for int, but it should not be 
stored.
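
The expected behavior can be sketched with a toy model of the message. `PersonSketch`, `BarCase` and `toRow` are hypothetical stand-ins, not generated protobuf code or the parquet-protobuf writer. The assumption is that a writer consults which oneof member is set (generated Java classes expose a `get...Case()` accessor for this) and emits null for the others, while a plain proto3 scalar keeps its default.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy model of the Person message above; not generated protobuf code.
final class PersonSketch {
    enum BarCase { BAR_INT, BAR_INT2, BAR_STRING, NOT_SET }

    int foo;                            // plain proto3 scalar: default 0 is a real value
    BarCase barCase = BarCase.NOT_SET;  // which oneof member, if any, is set
    Object bar;                         // value of the member that is set

    // Produces one output row: unset oneof members become null,
    // while the plain scalar keeps its default.
    Map<String, Object> toRow() {
        Map<String, Object> row = new LinkedHashMap<>();
        row.put("foo", foo);
        row.put("bar_int", barCase == BarCase.BAR_INT ? bar : null);
        row.put("bar_int2", barCase == BarCase.BAR_INT2 ? bar : null);
        row.put("bar_string", barCase == BarCase.BAR_STRING ? bar : null);
        return row;
    }
}
```

With `barCase` set to `BAR_STRING`, the row keeps `foo` as 0 and leaves `bar_int` and `bar_int2` null, which is the result the report asks for.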





[jira] [Commented] (PARQUET-1684) [parquet-protobuf] default protobuf field values are stored as nulls

2020-07-29 Thread Aaron Blake Niskode-Dossett (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-1684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17167277#comment-17167277
 ] 

Aaron Blake Niskode-Dossett commented on PARQUET-1684:
--

Thanks [~gszadovszky] for the update and for considering this.  As a committer 
on other Apache projects, I really appreciate the thoughtful approach the 
Parquet community takes on matters like this.  Cheers.

> [parquet-protobuf] default protobuf field values are stored as nulls
> 
>
> Key: PARQUET-1684
> URL: https://issues.apache.org/jira/browse/PARQUET-1684
> Project: Parquet
>  Issue Type: Bug
>  Components: parquet-mr
>Affects Versions: 1.10.0, 1.11.0
>Reporter: George Haddad
>Assignee: Priyank Bagrecha
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.12.0
>
> Attachments: image-2020-07-28-12-24-05-087.png
>
>
> When the source is a protobuf3 message, and the target file is Parquet, all 
> the default values are stored in the output parquet as `null` instead of 
> the actual type's default value.
>  For example, if the field is of type `int32`, `double` or `enum` and it 
> hasn't been set, the parquet value is `null` instead of `0`. When the 
> field's type is a `string` that hasn't been set, the parquet value is 
> `null` instead of an empty string.





[jira] [Commented] (PARQUET-1684) [parquet-protobuf] default protobuf field values are stored as nulls

2020-07-28 Thread Aaron Blake Niskode-Dossett (Jira)


[ 
https://issues.apache.org/jira/browse/PARQUET-1684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17166579#comment-17166579
 ] 

Aaron Blake Niskode-Dossett commented on PARQUET-1684:
--

[~gszadovszky] Thank you for following up, I really appreciate it.  I am also 
not a protobuf expert, but I'll do my best here.

 

Is there a workaround?  I don't believe so.  Here's an example that I hope 
is useful.

I defined this protobuf:

 
{code:java}
message Person {
  int32 foo = 1;
  oneof optional_bar {
    int32 bar_int = 200;
    string bar_string = 201;
  }
}{code}
 

 

And I wrote some simple code to populate three instances of it (below) and 
write it to parquet.

 
{code:java}
for (int i = 0; i < 3; i += 1) {
    com.etsy.grpcparquet.Person message = Person.newBuilder()
            .setFoo(i)
            .setBarString("hello world")
            .build();
    message.writeDelimitedTo(out);
}
{code}
 

The parquet looks like this:

 
{code:java}
$ parquet-tools show example.parquet
+-------+-----------+--------------+
| foo   | bar_int   | bar_string   |
|-------+-----------+--------------|
| nan   | nan       | hello world  |
| 1     | nan       | hello world  |
| 2     | nan       | hello world  |
+-------+-----------+--------------+
{code}
 

In the first row, the fact that foo was set to zero has been lost and it shows 
as null.  The `bar_int` column shows what an actual null column would look like.  
Similar results appear in a system like BigQuery:

!image-2020-07-28-12-24-05-087.png!

 

Would this cause a potential regression?  If someone was relying on the fact 
that default values are encoded as nulls it would, but that seems unimaginable 
to be honest.

> [parquet-protobuf] default protobuf field values are stored as nulls
> 
>
> Key: PARQUET-1684
> URL: https://issues.apache.org/jira/browse/PARQUET-1684
> Project: Parquet
>  Issue Type: Bug
>  Components: parquet-mr
>Affects Versions: 1.10.0, 1.11.0
>Reporter: George Haddad
>Assignee: Priyank Bagrecha
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.12.0
>
> Attachments: image-2020-07-28-12-24-05-087.png
>
>
> When the source is a protobuf3 message, and the target file is Parquet, all 
> the default values are stored in the output parquet as `null` instead of 
> the actual type's default value.
>  For example, if the field is of type `int32`, `double` or `enum` and it 
> hasn't been set, the parquet value is `null` instead of `0`. When the 
> field's type is a `string` that hasn't been set, the parquet value is 
> `null` instead of an empty string.





[jira] [Updated] (PARQUET-1684) [parquet-protobuf] default protobuf field values are stored as nulls

2020-07-28 Thread Aaron Blake Niskode-Dossett (Jira)


 [ 
https://issues.apache.org/jira/browse/PARQUET-1684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Aaron Blake Niskode-Dossett updated PARQUET-1684:
-
Attachment: image-2020-07-28-12-24-05-087.png

> [parquet-protobuf] default protobuf field values are stored as nulls
> 
>
> Key: PARQUET-1684
> URL: https://issues.apache.org/jira/browse/PARQUET-1684
> Project: Parquet
>  Issue Type: Bug
>  Components: parquet-mr
>Affects Versions: 1.10.0, 1.11.0
>Reporter: George Haddad
>Assignee: Priyank Bagrecha
>Priority: Major
>  Labels: pull-request-available
> Fix For: 1.12.0
>
> Attachments: image-2020-07-28-12-24-05-087.png
>
>
> When the source is a protobuf3 message, and the target file is Parquet, all 
> the default values are stored in the output parquet as `null` instead of 
> the actual type's default value.
>  For example, if the field is of type `int32`, `double` or `enum` and it 
> hasn't been set, the parquet value is `null` instead of `0`. When the 
> field's type is a `string` that hasn't been set, the parquet value is 
> `null` instead of an empty string.


