Re: protobuf3 and oneof fields

2020-09-29 Thread Aaron Niskode-Dossett
I played around with the code and found a simple, maybe too simple,
solution and opened a PR.  Fingers crossed.


-- 
Aaron Niskode-Dossett, Data Engineering -- Etsy


Re: protobuf3 and oneof fields

2020-09-29 Thread Aaron Niskode-Dossett
Thank you, David, I agree with your conclusions.  I opened PARQUET-1917.


-- 
Aaron Niskode-Dossett, Data Engineering -- Etsy


Re: protobuf3 and oneof fields

2020-09-29 Thread David
Hello,

Perhaps a bit more nuance here.  I believe that the values are technically
correct (the getters should return the default value of 0), but we should
not be storing them as 0 values.  We need to check the hasBar*() methods to
determine whether each value should be stored or omitted.
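
To make the rule concrete, here is a minimal sketch in plain Java with no
protobuf or Parquet dependency.  It is not the parquet-protobuf code and not
the eventual fix; PersonRecord, has() and toRow() are hypothetical names that
only model the idea of consulting a presence check before writing a oneof
member.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class OneofWriteSketch {
    /** Hypothetical stand-in for a Person message: foo plus at most one set member of optional_bar. */
    static final class PersonRecord {
        final int foo;
        final String setCase; // name of the oneof member that is set, or null if none is set
        final Object value;   // value of that member

        PersonRecord(int foo, String setCase, Object value) {
            this.foo = foo;
            this.setCase = setCase;
            this.value = value;
        }

        // Plays the role of hasBar*(): true only for the member that is currently set.
        boolean has(String member) {
            return member.equals(setCase);
        }
    }

    static final String[] ONEOF_MEMBERS = {"bar_int", "bar_int2", "bar_string"};

    /** Builds one output row; unset oneof members become null instead of a default 0. */
    static Map<String, Object> toRow(PersonRecord p) {
        Map<String, Object> row = new LinkedHashMap<>();
        row.put("foo", p.foo);
        for (String member : ONEOF_MEMBERS) {
            row.put(member, p.has(member) ? p.value : null);
        }
        return row;
    }

    public static void main(String[] args) {
        // prints {foo=0, bar_int=null, bar_int2=null, bar_string=hello world}
        System.out.println(toRow(new PersonRecord(0, "bar_string", "hello world")));
    }
}
```

With this rule, a row where only bar_string is set carries nulls for bar_int
and bar_int2 rather than zeros.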

Thanks.


Re: protobuf3 and oneof fields

2020-09-29 Thread David
Hello,

I have been poking around the Parquet-Proto package as well.

I would expect "bar_int" and "bar_int2" to be 'null' here.

Have you filed a JIRA with this reproduction?

Thanks.


protobuf3 and oneof fields

2020-09-25 Thread Aaron Niskode-Dossett
Hello,

I am experimenting with serializing protobuf3 to parquet and have a
question about how "oneof" fields should be treated.  I will describe an
example.  I'm running parquet 1.11.1 with PARQUET-1684 applied.  That JIRA
is about how default values are written out, and seems related to my
question.

SCHEMA

message Person {
  int32 foo = 1;
  oneof optional_bar {
    int32 bar_int = 200;
    int32 bar_int2 = 201;
    string bar_string = 300;
  }
}

CODE

I set values for foo and bar_string

for (int i = 0; i < 3; i += 1) {
    // Only foo and bar_string are set; bar_int and bar_int2 are left unset.
    com.etsy.grpcparquet.Person message = Person.newBuilder()
        .setFoo(i)
        .setBarString("hello world")
        .build();
    message.writeDelimitedTo(out);
}

And then I write the protobuf file out to parquet.

RESULT
---
$ parquet-tools show example.parquet


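+-------+-----------+------------+--------------+
|   foo |   bar_int |   bar_int2 | bar_string   |
|-------+-----------+------------+--------------|
|     0 |         0 |          0 | hello world  |
|     1 |         0 |          0 | hello world  |
|     2 |         0 |          0 | hello world  |
+-------+-----------+------------+--------------+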

I would expect that bar_int and bar_int2 are EMPTY for all three rows since
only bar_string is set in the oneof.
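
For context on where the zeros probably come from, here is a small plain-Java
model of oneof semantics as I understand them: setting one member replaces
whichever member was set before, and a getter for an unset member returns the
type's default (0 for int32, "" for string).  OptionalBar and BarCase are
hypothetical names for illustration, not the generated protobuf classes.

```java
public class OneofDefaultsSketch {
    /** Which member of the oneof is currently set (hypothetical model). */
    enum BarCase { BAR_INT, BAR_INT2, BAR_STRING, NOT_SET }

    /** Models the oneof: holds at most one value at a time. */
    static final class OptionalBar {
        private BarCase which = BarCase.NOT_SET;
        private Object value;

        // Each setter replaces whatever member was previously set.
        void setBarInt(int v) { which = BarCase.BAR_INT; value = v; }
        void setBarInt2(int v) { which = BarCase.BAR_INT2; value = v; }
        void setBarString(String v) { which = BarCase.BAR_STRING; value = v; }

        BarCase getCase() { return which; }

        // Getters fall back to the type default when their member is not the set one.
        int getBarInt() { return which == BarCase.BAR_INT ? (Integer) value : 0; }
        int getBarInt2() { return which == BarCase.BAR_INT2 ? (Integer) value : 0; }
        String getBarString() { return which == BarCase.BAR_STRING ? (String) value : ""; }
    }

    public static void main(String[] args) {
        OptionalBar bar = new OptionalBar();
        bar.setBarInt(7);
        bar.setBarString("hello world"); // bar_string replaces bar_int as the set member
        // prints BAR_STRING 0 0
        System.out.println(bar.getCase() + " " + bar.getBarInt() + " " + bar.getBarInt2());
    }
}
```

A writer that only calls the getters cannot tell an unset bar_int from a
bar_int explicitly set to 0, which would explain the zeros in the table above.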

Is this the right expectation for me to have?

Thank you!

-- 
Aaron Niskode-Dossett, Data Engineering -- Etsy