Re: protobuf3 and oneof fields
I played around with the code and found a simple, maybe too simple,
solution and opened a PR. Fingers crossed.

On Tue, Sep 29, 2020 at 10:55 AM Aaron Niskode-Dossett
<aniskodedoss...@etsy.com> wrote:

> Thank you, David, I agree with your conclusions. I opened PARQUET-1917.

--
Aaron Niskode-Dossett, Data Engineering -- Etsy
Re: protobuf3 and oneof fields
Thank you, David, I agree with your conclusions. I opened PARQUET-1917.

On Tue, Sep 29, 2020 at 10:18 AM David wrote:

> Perhaps a bit more nuance here. I believe that the values are
> technically correct (they should be the default value of 0), but we
> should not be storing them as 0 values. We need to check the hasBar*()
> to determine if the value should be stored or omitted.

--
Aaron Niskode-Dossett, Data Engineering -- Etsy
Re: protobuf3 and oneof fields
Hello,

Perhaps a bit more nuance here. I believe that the values are technically
correct (they should be the default value of 0), but we should not be
storing them as 0 values. We need to check the hasBar*() to determine if
the value should be stored or omitted.

Thanks.

On Tue, Sep 29, 2020 at 10:39 AM David wrote:

> I would expect "bar_int" and "bar_int2" to be 'null' here.
>
> Have you filed a JIRA with this reproduction?
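
To make the presence check described above concrete, here is a minimal
sketch. It does not use protobuf or parquet-protobuf at all: the Person
class, the BarCase enum, and the toRow helper are hypothetical,
hand-written stand-ins for the generated class and for the writer, so
this shows only the idea (emit null for oneof members that are not set,
instead of their default value), not the actual fix.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class OneofRowSketch {
    /** Which member of the (hypothetical) optional_bar oneof is populated. */
    enum BarCase { BAR_INT, BAR_INT2, BAR_STRING, NOT_SET }

    /** Hand-written stand-in for the generated Person class; not protobuf code. */
    static final class Person {
        final int foo;
        final BarCase barCase;
        final Object barValue;

        Person(int foo, BarCase barCase, Object barValue) {
            this.foo = foo;
            this.barCase = barCase;
            this.barValue = barValue;
        }

        // Like proto3 getters, members that are not set report a default value.
        int getBarInt() { return barCase == BarCase.BAR_INT ? (Integer) barValue : 0; }
        int getBarInt2() { return barCase == BarCase.BAR_INT2 ? (Integer) barValue : 0; }
        String getBarString() { return barCase == BarCase.BAR_STRING ? (String) barValue : ""; }
    }

    /** Builds one output row, storing null for oneof members that are not set. */
    static Map<String, Object> toRow(Person p) {
        Map<String, Object> row = new LinkedHashMap<>();
        row.put("foo", p.foo);
        // Consult the oneof case first; the getter alone cannot tell 0 from "unset".
        row.put("bar_int", p.barCase == BarCase.BAR_INT ? Integer.valueOf(p.getBarInt()) : null);
        row.put("bar_int2", p.barCase == BarCase.BAR_INT2 ? Integer.valueOf(p.getBarInt2()) : null);
        row.put("bar_string", p.barCase == BarCase.BAR_STRING ? p.getBarString() : null);
        return row;
    }

    public static void main(String[] args) {
        // Same loop as the reproduction: only foo and bar_string are set.
        for (int i = 0; i < 3; i += 1) {
            Person message = new Person(i, BarCase.BAR_STRING, "hello world");
            System.out.println(toRow(message));
        }
    }
}
```

With the real generated classes, the equivalent check would presumably go
through the oneof case accessor or the descriptor-based
Message.hasField(FieldDescriptor) call rather than a hand-rolled enum.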
Re: protobuf3 and oneof fields
Hello,

I have been poking around the Parquet-Proto package as well.

I would expect "bar_int" and "bar_int2" to be 'null' here.

Have you filed a JIRA with this reproduction?

Thanks.

On Fri, Sep 25, 2020 at 9:58 AM Aaron Niskode-Dossett wrote:

> I would expect that bar_int and bar_int2 are EMPTY for all three rows
> since only bar_string is set in the oneof.
>
> Is this the right expectation for me to have?
protobuf3 and oneof fields
Hello,

I am experimenting with serializing protobuf3 to parquet and have a
question about how "oneof" fields should be treated. I will describe an
example. I'm running parquet 1.11.1 with PARQUET-1684 applied. That JIRA
is about how default values are written out, and seems related to my
question.

SCHEMA

    message Person {
      int32 foo = 1;
      oneof optional_bar {
        int32 bar_int = 200;
        int32 bar_int2 = 201;
        string bar_string = 300;
      }
    }

CODE

I set values for foo and bar_string:

    for (int i = 0; i < 3; i += 1) {
      com.etsy.grpcparquet.Person message = Person.newBuilder()
          .setFoo(i)
          .setBarString("hello world")
          .build();
      message.writeDelimitedTo(out);
    }

And then I write the protobuf file out to parquet.

RESULT
---
$ parquet-tools show example.parquet

+-----+---------+----------+-------------+
| foo | bar_int | bar_int2 | bar_string  |
|-----+---------+----------+-------------|
|   0 |       0 |        0 | hello world |
|   1 |       0 |        0 | hello world |
|   2 |       0 |        0 | hello world |
+-----+---------+----------+-------------+

I would expect that bar_int and bar_int2 are EMPTY for all three rows
since only bar_string is set in the oneof.

Is this the right expectation for me to have?

Thank you!

--
Aaron Niskode-Dossett, Data Engineering -- Etsy