[jira] [Commented] (PARQUET-1711) [parquet-protobuf] stack overflow when work with well known json type
[ https://issues.apache.org/jira/browse/PARQUET-1711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17642888#comment-17642888 ]

ASF GitHub Bot commented on PARQUET-1711:
-----------------------------------------

shangxinli closed pull request #988: PARQUET-1711: Break circular dependencies in proto definitions
URL: https://github.com/apache/parquet-mr/pull/988

> [parquet-protobuf] stack overflow when work with well known json type
> ---------------------------------------------------------------------
>
>                 Key: PARQUET-1711
>                 URL: https://issues.apache.org/jira/browse/PARQUET-1711
>             Project: Parquet
>          Issue Type: Bug
>    Affects Versions: 1.10.1
>            Reporter: Lawrence He
>            Priority: Major
>
> Writing following protobuf message as parquet file is not possible:
> {code:java}
> syntax = "proto3";
> import "google/protobuf/struct.proto";
> package test;
> option java_outer_classname = "CustomMessage";
> message TestMessage {
>     map data = 1;
> }
> {code}
> Protobuf introduced "well known json type" such like
> [ListValue|https://developers.google.com/protocol-buffers/docs/reference/google.protobuf#listvalue]
> to work around json schema conversion.
> However writing above messages traps parquet writer into an infinite loop due
> to the "general type" support in protobuf. Current implementation will keep
> referencing 6 possible types defined in protobuf (null, bool, number, string,
> struct, list) and entering infinite loop when referencing "struct".
> {code:java}
> java.lang.StackOverflowError
>     at java.base/java.util.Arrays$ArrayItr.<init>(Arrays.java:4418)
>     at java.base/java.util.Arrays$ArrayList.iterator(Arrays.java:4410)
>     at java.base/java.util.Collections$UnmodifiableCollection$1.<init>(Collections.java:1044)
>     at java.base/java.util.Collections$UnmodifiableCollection.iterator(Collections.java:1043)
>     at org.apache.parquet.proto.ProtoSchemaConverter.convertFields(ProtoSchemaConverter.java:64)
>     at org.apache.parquet.proto.ProtoSchemaConverter.addField(ProtoSchemaConverter.java:96)
>     at org.apache.parquet.proto.ProtoSchemaConverter.convertFields(ProtoSchemaConverter.java:66)
>     at org.apache.parquet.proto.ProtoSchemaConverter.addField(ProtoSchemaConverter.java:96)
>     at org.apache.parquet.proto.ProtoSchemaConverter.convertFields(ProtoSchemaConverter.java:66)
>     at org.apache.parquet.proto.ProtoSchemaConverter.addField(ProtoSchemaConverter.java:96)
>     at org.apache.parquet.proto.ProtoSchemaConverter.convertFields(ProtoSchemaConverter.java:66)
>     at org.apache.parquet.proto.ProtoSchemaConverter.addField(ProtoSchemaConverter.java:96)
>     at org.apache.parquet.proto.ProtoSchemaConverter.convertFields(ProtoSchemaConverter.java:66)
>     at org.apache.parquet.proto.ProtoSchemaConverter.addField(ProtoSchemaConverter.java:96)
> {code}

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
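Editorial illustration of the report above: the alternating `convertFields`/`addField` frames in the trace are what a recursive walk over a cyclic type graph looks like. In `google/protobuf/struct.proto`, `Struct` holds `Value` entries, `Value` can hold a `Struct` or a `ListValue`, and `ListValue` holds `Value` entries. The sketch below is a toy model in plain Java, not the protobuf `Descriptor` API and not parquet-mr code; the class name `TypeGraphSketch`, the `FIELDS` table, and the assumption that the `TestMessage` map values are `Value` messages are all hypothetical.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Toy model of message types (hypothetical; not the protobuf Descriptor API). */
final class TypeGraphSketch {
    // Message type name -> message types referenced by its fields.
    static final Map<String, List<String>> FIELDS = new LinkedHashMap<>();
    static {
        // Assumption: the map in TestMessage carries Value messages.
        FIELDS.put("TestMessage", List.of("Value"));
        // Scalar kinds (null, bool, number, string) are omitted: they cannot recurse.
        FIELDS.put("Value", List.of("Struct", "ListValue"));
        FIELDS.put("Struct", List.of("Value"));
        FIELDS.put("ListValue", List.of("Value"));
    }

    /** Returns the first cycle reachable from root as a path, or an empty list. */
    static List<String> findCycle(String root) {
        return walk(root, new ArrayDeque<>());
    }

    private static List<String> walk(String type, Deque<String> path) {
        if (path.contains(type)) {
            // The type is already on the current path: a naive converter
            // would descend into it again and never return.
            List<String> cycle = new ArrayList<>(path);
            cycle.add(type);
            return cycle;
        }
        path.addLast(type);
        for (String child : FIELDS.getOrDefault(type, List.of())) {
            List<String> cycle = walk(child, path);
            if (!cycle.isEmpty()) {
                return cycle;
            }
        }
        path.removeLast();
        return List.of();
    }
}
```

Calling `TypeGraphSketch.findCycle("TestMessage")` on this toy table yields the path TestMessage, Value, Struct, Value, which is the loop a converter without cycle tracking keeps following until the stack is exhausted.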
[jira] [Commented] (PARQUET-1711) [parquet-protobuf] stack overflow when work with well known json type
[ https://issues.apache.org/jira/browse/PARQUET-1711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17642887#comment-17642887 ]

ASF GitHub Bot commented on PARQUET-1711:
-----------------------------------------

shangxinli commented on PR #988:
URL: https://github.com/apache/parquet-mr/pull/988#issuecomment-1336214908

   Since https://github.com/apache/parquet-mr/pull/995 is merged, let's close this one. Thanks @matthieun for the contribution !
[jira] [Commented] (PARQUET-1711) [parquet-protobuf] stack overflow when work with well known json type
[ https://issues.apache.org/jira/browse/PARQUET-1711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17638109#comment-17638109 ]

J Y commented on PARQUET-1711:
------------------------------

this should be resolved after #995 now.
[jira] [Commented] (PARQUET-1711) [parquet-protobuf] stack overflow when work with well known json type
[ https://issues.apache.org/jira/browse/PARQUET-1711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17627752#comment-17627752 ]

ASF GitHub Bot commented on PARQUET-1711:
-----------------------------------------

shangxinli commented on PR #995:
URL: https://github.com/apache/parquet-mr/pull/995#issuecomment-1300567095

   LGTM
[jira] [Commented] (PARQUET-1711) [parquet-protobuf] stack overflow when work with well known json type
[ https://issues.apache.org/jira/browse/PARQUET-1711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17627754#comment-17627754 ]

ASF GitHub Bot commented on PARQUET-1711:
-----------------------------------------

shangxinli merged PR #995:
URL: https://github.com/apache/parquet-mr/pull/995
[jira] [Commented] (PARQUET-1711) [parquet-protobuf] stack overflow when work with well known json type
[ https://issues.apache.org/jira/browse/PARQUET-1711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17627285#comment-17627285 ]

ASF GitHub Bot commented on PARQUET-1711:
-----------------------------------------

emkornfield commented on PR #995:
URL: https://github.com/apache/parquet-mr/pull/995#issuecomment-1298913644

   @jinyius only committers can see the button. I was asking because different repos have different commit procedures. Should be able to merge this soon as long as @shangxinli doesn't express concerns.
[jira] [Commented] (PARQUET-1711) [parquet-protobuf] stack overflow when work with well known json type
[ https://issues.apache.org/jira/browse/PARQUET-1711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17627221#comment-17627221 ]

ASF GitHub Bot commented on PARQUET-1711:
-----------------------------------------

jinyius commented on PR #995:
URL: https://github.com/apache/parquet-mr/pull/995#issuecomment-1298742515

   yeah, i still don't see a button to merge. it now shows everything approved, checks passed, and no conflicts. i think a committer needs to merge.
[jira] [Commented] (PARQUET-1711) [parquet-protobuf] stack overflow when work with well known json type
[ https://issues.apache.org/jira/browse/PARQUET-1711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17626982#comment-17626982 ]

ASF GitHub Bot commented on PARQUET-1711:
-----------------------------------------

ggershinsky commented on PR #995:
URL: https://github.com/apache/parquet-mr/pull/995#issuecomment-1298107615

   @shangxinli are you ok with this PR in its current form?
[jira] [Commented] (PARQUET-1711) [parquet-protobuf] stack overflow when work with well known json type
[ https://issues.apache.org/jira/browse/PARQUET-1711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1762#comment-1762 ]

ASF GitHub Bot commented on PARQUET-1711:
-----------------------------------------

jinyius commented on PR #995:
URL: https://github.com/apache/parquet-mr/pull/995#issuecomment-1297295385

   @ggershinsky i'd love to just hit the button. i don't see it. the workflow for travis ci had a failure due to a transient connection issue, and so it wasn't giving me the option to merge. the ui messaging also states that "Only those with [write access](https://docs.github.com/articles/what-are-the-different-access-permissions) to this repository can merge pull requests."
[jira] [Commented] (PARQUET-1711) [parquet-protobuf] stack overflow when work with well known json type
[ https://issues.apache.org/jira/browse/PARQUET-1711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17626665#comment-17626665 ]

ASF GitHub Bot commented on PARQUET-1711:
-----------------------------------------

jinyius opened a new pull request, #995:
URL: https://github.com/apache/parquet-mr/pull/995

   - This is an alternative approach to supporting recursion to apache#445 and apache#988.
   - This approach could address the other recursion related issues (PARQUET-129, PARQUET-554).
   - TODO: ReadSupport

   ### Jira

   - [x] My PR addresses the following [Parquet Jira](https://issues.apache.org/jira/browse/PARQUET/) issues and references them in the PR title. For example, "PARQUET-1234: My Parquet PR"
     - https://issues.apache.org/jira/browse/PARQUET-1711
     - In case you are adding a dependency, check if the license complies with the [ASF 3rd Party License Policy](https://www.apache.org/legal/resolved.html#category-x).

   ### Tests

   - [x] My PR adds the following unit tests __OR__ does not need testing for this extremely good reason:
     - `ProtoSchemaConverterTest#test*Recursion`
     - `ProtoWriteSupportTest#test*Recursion`

   ### Commits

   - [x] My commits all reference Jira issues in their subject lines. In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)":
     1. Subject is separated from body by a blank line
     1. Subject is limited to 50 characters (not including Jira issue reference)
     1. Subject does not end with a period
     1. Subject uses the imperative mood ("add", not "adding")
     1. Body wraps at 72 characters
     1. Body explains "what" and "why", not "how"

   ### Documentation

   - [x] In case of new functionality, my PR adds documentation that describes how to use it.
     - All the public functions and the classes in the PR contain Javadoc that explain what it does
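Editorial illustration: the title of PR #995 describes its approach as "limiting recursion depth". The thread does not show how the PR implements this, so the following is only a minimal sketch of that general idea on a toy type table. The class `DepthLimitedConverterSketch`, the `maxRecursion` parameter, the string output, and the `[truncated]` placeholder are all hypothetical and do not reflect parquet-mr's actual API or schema representation.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy depth-limited schema walk (hypothetical; not parquet-mr code). */
final class DepthLimitedConverterSketch {
    // Message type name -> message types referenced by its fields
    // (scalar fields omitted because they cannot recurse).
    static final Map<String, List<String>> FIELDS = Map.of(
        "Struct", List.of("Value"),
        "Value", List.of("Struct", "ListValue"),
        "ListValue", List.of("Value"));

    private final int maxRecursion;
    // How many times each type currently appears on the path from the root.
    private final Map<String, Integer> seenOnPath = new HashMap<>();

    DepthLimitedConverterSketch(int maxRecursion) {
        this.maxRecursion = maxRecursion;
    }

    /** Renders the type tree, cutting off a type once it has recursed too often. */
    String convert(String type) {
        int seen = seenOnPath.getOrDefault(type, 0);
        if (seen > maxRecursion) {
            // Assumed placeholder; how a real converter encodes the cut-off
            // point is a design decision not covered by this thread.
            return type + "[truncated]";
        }
        seenOnPath.put(type, seen + 1);
        StringBuilder out = new StringBuilder(type).append('{');
        List<String> children = FIELDS.getOrDefault(type, List.of());
        for (int i = 0; i < children.size(); i++) {
            if (i > 0) {
                out.append(',');
            }
            out.append(convert(children.get(i)));
        }
        seenOnPath.put(type, seen); // leaving this type: restore the path count
        return out.append('}').toString();
    }
}
```

With `new DepthLimitedConverterSketch(0).convert("Struct")` every type may appear once per path, so the walk terminates instead of overflowing the stack. A larger `maxRecursion` keeps more nested levels at the cost of a wider schema.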
[jira] [Commented] (PARQUET-1711) [parquet-protobuf] stack overflow when work with well known json type
[ https://issues.apache.org/jira/browse/PARQUET-1711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17626663#comment-17626663 ]

ASF GitHub Bot commented on PARQUET-1711:
-----------------------------------------

jinyius commented on PR #995:
URL: https://github.com/apache/parquet-mr/pull/995#issuecomment-1297292700
[jira] [Commented] (PARQUET-1711) [parquet-protobuf] stack overflow when work with well known json type
[ https://issues.apache.org/jira/browse/PARQUET-1711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17626664#comment-17626664 ] ASF GitHub Bot commented on PARQUET-1711: - jinyius closed pull request #995: PARQUET-1711: support recursive proto schemas by limiting recursion depth URL: https://github.com/apache/parquet-mr/pull/995
[jira] [Commented] (PARQUET-1711) [parquet-protobuf] stack overflow when work with well known json type
[ https://issues.apache.org/jira/browse/PARQUET-1711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17626624#comment-17626624 ] ASF GitHub Bot commented on PARQUET-1711: - ggershinsky commented on PR #995: URL: https://github.com/apache/parquet-mr/pull/995#issuecomment-1297193514 yep, just the squash/merge button.
[jira] [Commented] (PARQUET-1711) [parquet-protobuf] stack overflow when work with well known json type
[ https://issues.apache.org/jira/browse/PARQUET-1711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17626369#comment-17626369 ] ASF GitHub Bot commented on PARQUET-1711: - emkornfield commented on PR #995: URL: https://github.com/apache/parquet-mr/pull/995#issuecomment-1296519347 @ggershinsky what is the process to merge this? Does parquet-mr just use the github UI?
[jira] [Commented] (PARQUET-1711) [parquet-protobuf] stack overflow when work with well known json type
[ https://issues.apache.org/jira/browse/PARQUET-1711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17623305#comment-17623305 ] ASF GitHub Bot commented on PARQUET-1711: - jinyius commented on PR #995: URL: https://github.com/apache/parquet-mr/pull/995#issuecomment-1289372825 can someone retry the github actions? there seemed to have been a transient issue that caused one of the test/build targets to fail. i'd like to get this change in this week.
[jira] [Commented] (PARQUET-1711) [parquet-protobuf] stack overflow when work with well known json type
[ https://issues.apache.org/jira/browse/PARQUET-1711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17620612#comment-17620612 ] ASF GitHub Bot commented on PARQUET-1711: - jinyius commented on PR #995: URL: https://github.com/apache/parquet-mr/pull/995#issuecomment-1284596663 > I would also like to recommend adding @matthieun as a co-author to this PR, per the discussion in the parallel PR. how do you do this?
[jira] [Commented] (PARQUET-1711) [parquet-protobuf] stack overflow when work with well known json type
[ https://issues.apache.org/jira/browse/PARQUET-1711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17620069#comment-17620069 ] ASF GitHub Bot commented on PARQUET-1711: - ggershinsky commented on PR #995: URL: https://github.com/apache/parquet-mr/pull/995#issuecomment-1283566340 I would also like to recommend adding @matthieun as a co-author to this PR, per the discussion in the parallel PR.
[jira] [Commented] (PARQUET-1711) [parquet-protobuf] stack overflow when work with well known json type
[ https://issues.apache.org/jira/browse/PARQUET-1711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17619032#comment-17619032 ] ASF GitHub Bot commented on PARQUET-1711: - shangxinli commented on PR #995: URL: https://github.com/apache/parquet-mr/pull/995#issuecomment-1281237456 @ggershinsky Can you have a look?
[jira] [Commented] (PARQUET-1711) [parquet-protobuf] stack overflow when work with well known json type
[ https://issues.apache.org/jira/browse/PARQUET-1711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17619025#comment-17619025 ] ASF GitHub Bot commented on PARQUET-1711: - jinyius commented on PR #995: URL: https://github.com/apache/parquet-mr/pull/995#issuecomment-1281213675 > Mostly looks reasonable, I'm not too familiar with parquet-mr @shangxinli can you recommend someone who might be able to give a better review? pinging @shangxinli :)
[jira] [Commented] (PARQUET-1711) [parquet-protobuf] stack overflow when work with well known json type
[ https://issues.apache.org/jira/browse/PARQUET-1711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17615228#comment-17615228 ] ASF GitHub Bot commented on PARQUET-1711: - matthieun commented on PR #988: URL: https://github.com/apache/parquet-mr/pull/988#issuecomment-1273621299 Hi, I am fine with whatever solution. If you choose #995 that works, please just close this one!
[jira] [Commented] (PARQUET-1711) [parquet-protobuf] stack overflow when work with well known json type
[ https://issues.apache.org/jira/browse/PARQUET-1711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17614890#comment-17614890 ] ASF GitHub Bot commented on PARQUET-1711: - jinyius commented on code in PR #995: URL: https://github.com/apache/parquet-mr/pull/995#discussion_r990915628

## parquet-protobuf/src/main/java/org/apache/parquet/proto/ProtoWriteSupport.java: ##

@@ -559,7 +564,14 @@ final void writeRawValue(Object value) {
   class BinaryWriter extends FieldWriter {
     @Override
     final void writeRawValue(Object value) {
-      ByteString byteString = (ByteString) value;
+      // Non-ByteString values can happen when recursions gets truncated.
+      ByteString byteString = value instanceof ByteString
+          ? (ByteString) value
+          // TODO: figure out a way to use MessageOrBuilder
+          : value instanceof Message
+          ? ((Message) value).toByteString()

Review Comment:
no, afaict: https://www.javadoc.io/doc/org.apache.parquet/parquet-column/latest/org/apache/parquet/io/api/RecordConsumer.html
[jira] [Commented] (PARQUET-1711) [parquet-protobuf] stack overflow when work with well known json type
[ https://issues.apache.org/jira/browse/PARQUET-1711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17614889#comment-17614889 ] ASF GitHub Bot commented on PARQUET-1711: - jinyius commented on code in PR #995: URL: https://github.com/apache/parquet-mr/pull/995#discussion_r990914702 ## parquet-protobuf/src/test/resources/Trees.proto: ## @@ -0,0 +1,37 @@ +// +// Licensed to the Apache Software Foundation (ASF) under one +// or more contributor license agreements. See the NOTICE file +// distributed with this work for additional information +// regarding copyright ownership. The ASF licenses this file +// to you under the Apache License, Version 2.0 (the +// "License"); you may not use this file except in compliance +// with the License. You may obtain a copy of the License at +// +// http://www.apache.org/licenses/LICENSE-2.0 +// +// Unless required by applicable law or agreed to in writing, +// software distributed under the License is distributed on an +// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +// KIND, either express or implied. See the License for the +// specific language governing permissions and limitations +// under the License. +// + +syntax = "proto3"; + +package Trees; + +import "google/protobuf/any.proto"; + +option java_package = "org.apache.parquet.proto.test"; + +message BinaryTree { +google.protobuf.Any value = 1; Review Comment: i think the existing non-recursive proto tests exercise the existing and newly added (the skipping behavior) code paths. 
[jira] [Commented] (PARQUET-1711) [parquet-protobuf] stack overflow when work with well known json type
[ https://issues.apache.org/jira/browse/PARQUET-1711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17614888#comment-17614888 ]

ASF GitHub Bot commented on PARQUET-1711:

jinyius commented on code in PR #995:
URL: https://github.com/apache/parquet-mr/pull/995#discussion_r990914303

## parquet-protobuf/src/test/resources/BinaryTree.par: ##
@@ -0,0 +1,50 @@
+message Trees.BinaryTree {
+  optional group value = 1 {

Review Comment:
this is parquet schema, not proto. protos should/would have a .proto suffix.
[jira] [Commented] (PARQUET-1711) [parquet-protobuf] stack overflow when work with well known json type
[ https://issues.apache.org/jira/browse/PARQUET-1711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17614887#comment-17614887 ]

ASF GitHub Bot commented on PARQUET-1711:

jinyius commented on code in PR #995:
URL: https://github.com/apache/parquet-mr/pull/995#discussion_r990914134

## parquet-protobuf/src/main/java/org/apache/parquet/proto/ProtoWriteSupport.java: ##
@@ -559,7 +564,14 @@ final void writeRawValue(Object value) {
   class BinaryWriter extends FieldWriter {
     @Override
     final void writeRawValue(Object value) {
-      ByteString byteString = (ByteString) value;
+      // Non-ByteString values can happen when recursions gets truncated.
+      ByteString byteString = value instanceof ByteString
+        ? (ByteString) value
+        // TODO: figure out a way to use MessageOrBuilder
+        : value instanceof Message
+        ? ((Message) value).toByteString()
+        // Worst-case, just dump as plain java string.
+        : ByteString.copyFromUtf8(value.toString());

Review Comment:
this is intended. for a real-time, production pipeline i'm working on, losing data as it passes through or killing the job b/c of an uncaught exception is problematic as it could lead to data loss and down time. this way, there's some way to know what the problematic data was and fix it properly asap.
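The fallback chain in the BinaryWriter change can be sketched without protobuf on the classpath. This is only an illustration of the ordering (raw bytes first, then a self-serializing value, then a string dump); the `SelfSerializing` interface and `toRawBytes` helper are hypothetical stand-ins for `Message`/`ByteString` and are not part of the PR.

```java
import java.nio.charset.StandardCharsets;

class BinaryFallbackSketch {
    /** Hypothetical stand-in for a protobuf Message: a value that can serialize itself. */
    interface SelfSerializing {
        byte[] toBytes();
    }

    /** Picks the most faithful byte representation available for a value. */
    static byte[] toRawBytes(Object value) {
        if (value instanceof byte[]) {
            return (byte[]) value;                      // already raw bytes
        }
        if (value instanceof SelfSerializing) {
            return ((SelfSerializing) value).toBytes(); // e.g. a truncated sub-message
        }
        // Worst case: keep a readable trace of the value instead of throwing.
        return String.valueOf(value).getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        SelfSerializing subtree = () -> new byte[] {0x0a, 0x01};
        System.out.println(toRawBytes(subtree).length);
        System.out.println(new String(toRawBytes(42), StandardCharsets.UTF_8));
    }
}
```

The last branch trades type fidelity for availability: the writer never throws on an unexpected value, which matches the reasoning in the review comment, but readers of the file have to be prepared for a string dump in a binary column.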
[jira] [Commented] (PARQUET-1711) [parquet-protobuf] stack overflow when work with well known json type
[ https://issues.apache.org/jira/browse/PARQUET-1711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17614886#comment-17614886 ]

ASF GitHub Bot commented on PARQUET-1711:

jinyius commented on code in PR #995:
URL: https://github.com/apache/parquet-mr/pull/995#discussion_r990913344

## parquet-protobuf/src/test/java/org/apache/parquet/proto/ProtoSchemaConverterTest.java: ##
@@ -82,264 +93,447 @@ public void testConvertAllDatatypes() throws Exception {
    * Tests that all protocol buffer datatypes are converted to correct parquet datatypes.
    */
   @Test
-  public void testProto3ConvertAllDatatypes() throws Exception {
-    String expectedSchema =
-      "message TestProto3.SchemaConverterAllDatatypes {\n" +
-      "  optional double optionalDouble = 1;\n" +
-      "  optional float optionalFloat = 2;\n" +
-      "  optional int32 optionalInt32 = 3;\n" +
-      "  optional int64 optionalInt64 = 4;\n" +
-      "  optional int32 optionalUInt32 = 5;\n" +
-      "  optional int64 optionalUInt64 = 6;\n" +
-      "  optional int32 optionalSInt32 = 7;\n" +
-      "  optional int64 optionalSInt64 = 8;\n" +
-      "  optional int32 optionalFixed32 = 9;\n" +
-      "  optional int64 optionalFixed64 = 10;\n" +
-      "  optional int32 optionalSFixed32 = 11;\n" +
-      "  optional int64 optionalSFixed64 = 12;\n" +
-      "  optional boolean optionalBool = 13;\n" +
-      "  optional binary optionalString (UTF8) = 14;\n" +
-      "  optional binary optionalBytes = 15;\n" +
-      "  optional group optionalMessage = 16 {\n" +
-      "    optional int32 someId = 3;\n" +
-      "  }\n" +
-      "  optional binary optionalEnum (ENUM) = 18;" +
-      "  optional int32 someInt32 = 19;" +
-      "  optional binary someString (UTF8) = 20;" +
-      "  optional group optionalMap (MAP) = 21 {\n" +
-      "    repeated group key_value {\n" +
-      "      required int64 key;\n" +
-      "      optional group value {\n" +
-      "        optional int32 someId = 3;\n" +
-      "      }\n" +
-      "    }\n" +
-      "  }\n" +
-      "}";
+  public void testProto3ConvertAllDatatypes() {
+    String expectedSchema = JOINER.join(

Review Comment:
wdym by "tpe"? if this isn't blocking, i'd rather avoid the busy-work to undo and redo in a different branch.
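For readers unfamiliar with the joiner style used in the rewritten test: the idea is to list each schema line as its own string and let a joiner supply the newlines, so a forgotten "\n" (as on the optionalEnum line of the removed code) cannot happen. The sketch below uses the JDK's `String.join` as a stand-in; what `JOINER` actually is in the PR is not shown in this thread, and the schema is an abridged excerpt of the one in the diff.

```java
class ExpectedSchemaSketch {
    /** Builds an abridged expected schema, one element per schema line. */
    static String expectedSchema() {
        return String.join("\n",
            "message TestProto3.SchemaConverterAllDatatypes {",
            "  optional double optionalDouble = 1;",
            "  optional group optionalMessage = 16 {",
            "    optional int32 someId = 3;",
            "  }",
            "  optional binary optionalEnum (ENUM) = 18;",
            "}");
    }

    public static void main(String[] args) {
        System.out.println(expectedSchema());
    }
}
```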
[jira] [Commented] (PARQUET-1711) [parquet-protobuf] stack overflow when work with well known json type
[ https://issues.apache.org/jira/browse/PARQUET-1711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17614882#comment-17614882 ]

ASF GitHub Bot commented on PARQUET-1711:

jinyius commented on code in PR #995:
URL: https://github.com/apache/parquet-mr/pull/995#discussion_r990912850

## parquet-protobuf/src/main/java/org/apache/parquet/proto/ProtoSchemaConverter.java: ##
@@ -99,9 +139,9 @@ private Type.Repetition getRepetition(FieldDescriptor descriptor) {
     }
   }

-  private Builder>, GroupBuilder> addField(FieldDescriptor descriptor, final GroupBuilder builder) {
+  private Builder>, GroupBuilder> addField(FieldDescriptor descriptor, final GroupBuilder builder, ImmutableSetMultimap seen, int depth) {

Review Comment:
i'm not sure encapsulation helps with readability or protection in this case. they are really tracking different things, and should be understood by readers of the traversal code to know how each piece of state is used.
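To make the two pieces of traversal state concrete, here is a minimal sketch of a schema walk that carries a path-scoped `seen` set and a `depth` counter as separate parameters. It is an assumption-laden simplification, not the PR's code: `MessageType`, `convert` and `outline` are hypothetical names, a plain `Set<String>` replaces the `ImmutableSetMultimap`, and a type that is already on the current path is cut off immediately and emitted as a binary leaf.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

class RecursionGuardSketch {
    /** Hypothetical stand-in for a protobuf message descriptor. */
    static final class MessageType {
        final String name;
        // Field name -> nested message type; null marks a primitive leaf.
        final Map<String, MessageType> fields = new LinkedHashMap<>();

        MessageType(String name) {
            this.name = name;
        }
    }

    /**
     * Appends an outline of the fields of `type` to `out`.
     * `seen` holds only the type names on the current path from the root;
     * `depth` is the nesting level and is used here just for indentation.
     */
    static void convert(MessageType type, Set<String> seen, int depth, List<String> out) {
        StringBuilder indent = new StringBuilder();
        for (int i = 0; i < depth; i++) {
            indent.append("  ");
        }
        for (Map.Entry<String, MessageType> field : type.fields.entrySet()) {
            MessageType child = field.getValue();
            if (child == null) {
                out.add(indent + "int32 " + field.getKey());
            } else if (seen.contains(child.name)) {
                // The type is already on this path: stop expanding, keep opaque bytes.
                out.add(indent + "binary " + field.getKey());
            } else {
                Set<String> nextSeen = new HashSet<>(seen); // copy so siblings are unaffected
                nextSeen.add(child.name);
                out.add(indent + "group " + field.getKey() + " {");
                convert(child, nextSeen, depth + 1, out);
                out.add(indent + "}");
            }
        }
    }

    /** Walks a root type, with the root itself counted as already on the path. */
    static List<String> outline(MessageType root) {
        Set<String> seen = new HashSet<>();
        seen.add(root.name);
        List<String> out = new ArrayList<>();
        convert(root, seen, 0, out);
        return out;
    }

    public static void main(String[] args) {
        MessageType tree = new MessageType("BinaryTree");
        tree.fields.put("value", null);
        tree.fields.put("left", tree);
        tree.fields.put("right", tree);
        outline(tree).forEach(System.out::println);
    }
}
```

Because `seen` is copied per branch, the same message type used in two sibling fields is expanded twice and is not mistaken for recursion; only an ancestor on the same path triggers the cut-off. `depth` changes on every level regardless, which is one way to read the remark that the two values track different things.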
[jira] [Commented] (PARQUET-1711) [parquet-protobuf] stack overflow when work with well known json type
[ https://issues.apache.org/jira/browse/PARQUET-1711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17614878#comment-17614878 ]

ASF GitHub Bot commented on PARQUET-1711:

jinyius commented on PR #988:
URL: https://github.com/apache/parquet-mr/pull/988#issuecomment-1272784474

> Hi @jinyius and @matthieun, Thank both of you for the contribution and we really appreciate your patience with us. Now we have two PRs for the same issue, we better merge them into one. Given this PR is earlier, would it be a good idea to incorporate #995 into this PR for what is missing? @matthieun can add @jinyius as a co-author in that case.
>
> Does it make sense to both of you?

i don't think merging will help here. both approaches do similar things in terms of traversing and expanding out the schema on recursive fields. they differ on the state used during the traversal, and they differ on how to deal with the remaining recursive data (this one silently ignores it, but mine stores it as serialized bytes).

i don't care about authorship. i want this to get fixed, and fixed properly.
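The difference between the two ways of handling data past the cut-off can be shown on a toy recursive value. This sketch is purely illustrative and does not reproduce either PR: the `Node` chain, the `Policy` enum and the `3>4` serialization format are all made up for the example.

```java
class TruncationPolicySketch {
    enum Policy { DROP, KEEP_AS_BYTES }

    /** A linked chain standing in for a recursive message. */
    static final class Node {
        final int value;
        final Node next;

        Node(int value, Node next) {
            this.value = value;
            this.next = next;
        }
    }

    /** Hypothetical serialization: the remaining values joined by '>'. */
    static String serialize(Node node) {
        StringBuilder sb = new StringBuilder();
        for (Node n = node; n != null; n = n.next) {
            sb.append(n.value);
            if (n.next != null) {
                sb.append('>');
            }
        }
        return sb.toString();
    }

    /** Writes at most `levelsLeft` nested levels, then applies the policy to the rest. */
    static String write(Node node, int levelsLeft, Policy policy) {
        if (node == null) {
            return "-";
        }
        if (levelsLeft == 0) {
            // Past the expanded schema: either lose the rest or keep it opaque.
            return policy == Policy.DROP ? "-" : "bytes[" + serialize(node) + "]";
        }
        return "{" + node.value + " " + write(node.next, levelsLeft - 1, policy) + "}";
    }

    public static void main(String[] args) {
        Node chain = new Node(1, new Node(2, new Node(3, new Node(4, null))));
        System.out.println(write(chain, 2, Policy.DROP));          // {1 {2 -}}
        System.out.println(write(chain, 2, Policy.KEEP_AS_BYTES)); // {1 {2 bytes[3>4]}}
    }
}
```

With DROP the values 3 and 4 are gone from the output and indistinguishable from an absent child; with KEEP_AS_BYTES they survive in an opaque form that a reader can decode later.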
[jira] [Commented] (PARQUET-1711) [parquet-protobuf] stack overflow when work with well known json type
[ https://issues.apache.org/jira/browse/PARQUET-1711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17614809#comment-17614809 ]

ASF GitHub Bot commented on PARQUET-1711:

shangxinli commented on PR #988:
URL: https://github.com/apache/parquet-mr/pull/988#issuecomment-1272621608

Hi @jinyius and @matthieun, Thank both of you for the contribution and we really appreciate your patience with us. Now we have two PRs for the same issue, we better merge them into one. Given this PR is earlier, would it be a good idea to incorporate https://github.com/apache/parquet-mr/pull/995 into this PR for what is missing? @matthieun can add @jinyius as a co-author in that case.

Does it make sense to both of you?
[jira] [Commented] (PARQUET-1711) [parquet-protobuf] stack overflow when work with well known json type
[ https://issues.apache.org/jira/browse/PARQUET-1711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17614580#comment-17614580 ]

ASF GitHub Bot commented on PARQUET-1711:

emkornfield commented on PR #995:
URL: https://github.com/apache/parquet-mr/pull/995#issuecomment-1272447374

Mostly looks reasonable, but I'm not too familiar with parquet-mr. @shangxinli, can you recommend someone who might be able to give a better review?
[jira] [Commented] (PARQUET-1711) [parquet-protobuf] stack overflow when work with well known json type
[ https://issues.apache.org/jira/browse/PARQUET-1711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17614579#comment-17614579 ]

ASF GitHub Bot commented on PARQUET-1711:

emkornfield commented on code in PR #995:
URL: https://github.com/apache/parquet-mr/pull/995#discussion_r990727498

## parquet-protobuf/src/test/resources/Trees.proto: ##
@@ -0,0 +1,37 @@
+//
+// Licensed to the Apache Software Foundation (ASF) under one
+// or more contributor license agreements. See the NOTICE file
+// distributed with this work for additional information
+// regarding copyright ownership. The ASF licenses this file
+// to you under the Apache License, Version 2.0 (the
+// "License"); you may not use this file except in compliance
+// with the License. You may obtain a copy of the License at
+//
+// http://www.apache.org/licenses/LICENSE-2.0
+//
+// Unless required by applicable law or agreed to in writing,
+// software distributed under the License is distributed on an
+// "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+// KIND, either express or implied. See the License for the
+// specific language governing permissions and limitations
+// under the License.
+//
+
+syntax = "proto3";
+
+package Trees;
+
+import "google/protobuf/any.proto";
+
+option java_package = "org.apache.parquet.proto.test";
+
+message BinaryTree {
+  google.protobuf.Any value = 1;

Review Comment:
it would be good to verify that something like:

    message WrappedTree {
      google.protobuf.Any non_recursive = 1;
      BinaryTree tree = 2;
    }

also gives expected results (non_recursive doesn't accidentally trigger any of the recursion logic).
[jira] [Commented] (PARQUET-1711) [parquet-protobuf] stack overflow when work with well known json type
[ https://issues.apache.org/jira/browse/PARQUET-1711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17614578#comment-17614578 ]

ASF GitHub Bot commented on PARQUET-1711:

emkornfield commented on code in PR #995:
URL: https://github.com/apache/parquet-mr/pull/995#discussion_r990727282

## parquet-protobuf/src/test/resources/BinaryTree.par: ##
@@ -0,0 +1,50 @@
+message Trees.BinaryTree {
+  optional group value = 1 {

Review Comment:
or is par not proto?
[jira] [Commented] (PARQUET-1711) [parquet-protobuf] stack overflow when work with well known json type
[ https://issues.apache.org/jira/browse/PARQUET-1711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17614576#comment-17614576 ]

ASF GitHub Bot commented on PARQUET-1711:

emkornfield commented on code in PR #995:
URL: https://github.com/apache/parquet-mr/pull/995#discussion_r990727030

##
parquet-protobuf/src/test/resources/BinaryTree.par:
##
@@ -0,0 +1,50 @@
+message Trees.BinaryTree {
+  optional group value = 1 {

Review Comment:
Aren't groups [deprecated](https://developers.google.com/protocol-buffers/docs/proto#groups)?
[jira] [Commented] (PARQUET-1711) [parquet-protobuf] stack overflow when work with well known json type
[ https://issues.apache.org/jira/browse/PARQUET-1711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17614573#comment-17614573 ]

ASF GitHub Bot commented on PARQUET-1711:

emkornfield commented on code in PR #995:
URL: https://github.com/apache/parquet-mr/pull/995#discussion_r990726331

##
parquet-protobuf/src/main/java/org/apache/parquet/proto/ProtoWriteSupport.java:
##
@@ -559,7 +564,14 @@ final void writeRawValue(Object value) {
   class BinaryWriter extends FieldWriter {
     @Override
     final void writeRawValue(Object value) {
-      ByteString byteString = (ByteString) value;
+      // Non-ByteString values can happen when recursions gets truncated.
+      ByteString byteString = value instanceof ByteString
+        ? (ByteString) value
+        // TODO: figure out a way to use MessageOrBuilder
+        : value instanceof Message
+        ? ((Message) value).toByteString()

Review Comment:
Does RecordConsumer offer a stream API, or something else, to avoid the additional array/ByteString copies?
[jira] [Commented] (PARQUET-1711) [parquet-protobuf] stack overflow when work with well known json type
[ https://issues.apache.org/jira/browse/PARQUET-1711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17614572#comment-17614572 ]

ASF GitHub Bot commented on PARQUET-1711:

emkornfield commented on code in PR #995:
URL: https://github.com/apache/parquet-mr/pull/995#discussion_r990726272

##
parquet-protobuf/src/main/java/org/apache/parquet/proto/ProtoWriteSupport.java:
##
@@ -559,7 +564,14 @@ final void writeRawValue(Object value) {
   class BinaryWriter extends FieldWriter {
     @Override
     final void writeRawValue(Object value) {
-      ByteString byteString = (ByteString) value;
+      // Non-ByteString values can happen when recursions gets truncated.
+      ByteString byteString = value instanceof ByteString
+        ? (ByteString) value
+        // TODO: figure out a way to use MessageOrBuilder
+        : value instanceof Message
+        ? ((Message) value).toByteString()
+        // Worst-case, just dump as plain java string.
+        : ByteString.copyFromUtf8(value.toString());

Review Comment:
Is this actually an intended state? If not, it is probably better to raise an exception than to write data that could be hard to recover.
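The "raise an exception" alternative from the review comment above could look roughly like the sketch below. It is only an illustration under stated assumptions: `byte[]` stands in for protobuf's `ByteString`, and `BytesConvertible` is a hypothetical stand-in for `Message`, so that the example compiles without the protobuf dependency. It is not the code from PR #995.

```java
/** Illustrative only: fail fast instead of falling back to value.toString(). */
final class StrictBinarySketch {

  /** Hypothetical stand-in for a protobuf Message, which can serialize itself. */
  interface BytesConvertible {
    byte[] toBytes();
  }

  /** Returns the bytes to write, or throws if the value has an unexpected type. */
  static byte[] toBinary(Object value) {
    if (value instanceof byte[]) {
      // Already binary (the ByteString case in the real writer).
      return (byte[]) value;
    }
    if (value instanceof BytesConvertible) {
      // A nested message whose recursion was truncated: serialize it.
      return ((BytesConvertible) value).toBytes();
    }
    // Anything else is a state we did not intend to support, so refuse to write it.
    String type = value == null ? "null" : value.getClass().getName();
    throw new IllegalArgumentException("Cannot write value of type " + type + " as binary");
  }
}
```

With this shape an unexpected value surfaces as an error at write time, instead of ending up in the file as the output of `toString()`.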
[jira] [Commented] (PARQUET-1711) [parquet-protobuf] stack overflow when work with well known json type
[ https://issues.apache.org/jira/browse/PARQUET-1711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17614570#comment-17614570 ]

ASF GitHub Bot commented on PARQUET-1711:

emkornfield commented on code in PR #995:
URL: https://github.com/apache/parquet-mr/pull/995#discussion_r990726095

##
parquet-protobuf/src/main/java/org/apache/parquet/proto/ProtoSchemaConverter.java:
##
@@ -99,9 +139,9 @@ private Type.Repetition getRepetition(FieldDescriptor descriptor) {
     }
   }

-  private Builder>, GroupBuilder> addField(FieldDescriptor descriptor, final GroupBuilder builder) {
+  private Builder>, GroupBuilder> addField(FieldDescriptor descriptor, final GroupBuilder builder, ImmutableSetMultimap seen, int depth) {

Review Comment:
Right, I was thinking of encapsulating this logic into its own class, so that `seen` and `depth` can be recorded and updated together, to:
1. Reduce the additional parameters that have to be passed through.
2. Encapsulate the logic behind more mnemonic method names (e.g. AddRecursiveStep()).
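One possible shape for the encapsulation suggested in the review comment above is sketched here. The class name `RecursionTracker` and its methods are hypothetical and exist only for illustration. They are not part of parquet-mr, and a plain JDK map of visit counts is used in place of Guava's `ImmutableSetMultimap`.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper that keeps the seen types and the current depth together. */
final class RecursionTracker {
  private final Map<String, Integer> visits; // type name -> times seen on the current path
  private final int depth;                   // current depth of the traversal
  private final int maxRecursion;

  RecursionTracker(int maxRecursion) {
    this(Map.of(), 0, maxRecursion);
  }

  private RecursionTracker(Map<String, Integer> visits, int depth, int maxRecursion) {
    this.visits = visits;
    this.depth = depth;
    this.maxRecursion = maxRecursion;
  }

  /** True when the type has already been expanded more than maxRecursion times on this path. */
  boolean shouldTruncate(String typeName) {
    return visits.getOrDefault(typeName, 0) > maxRecursion;
  }

  /** Returns a new tracker one level deeper; the receiver is left unchanged. */
  RecursionTracker addRecursiveStep(String typeName) {
    Map<String, Integer> next = new HashMap<>(visits);
    next.merge(typeName, 1, Integer::sum);
    return new RecursionTracker(Map.copyOf(next), depth + 1, maxRecursion);
  }

  int depth() {
    return depth;
  }
}
```

A converter method would then take a single `RecursionTracker` argument instead of separate `seen` and `depth` parameters, and pass `tracker.addRecursiveStep(typeName)` down to its children.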
[jira] [Commented] (PARQUET-1711) [parquet-protobuf] stack overflow when work with well known json type
[ https://issues.apache.org/jira/browse/PARQUET-1711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17614569#comment-17614569 ]

ASF GitHub Bot commented on PARQUET-1711:

emkornfield commented on code in PR #995:
URL: https://github.com/apache/parquet-mr/pull/995#discussion_r990725805

##
parquet-protobuf/src/test/java/org/apache/parquet/proto/ProtoSchemaConverterTest.java:
##
@@ -82,264 +93,447 @@ public void testConvertAllDatatypes() throws Exception {
    * Tests that all protocol buffer datatypes are converted to correct parquet datatypes.
    */
   @Test
-  public void testProto3ConvertAllDatatypes() throws Exception {
-    String expectedSchema =
-      "message TestProto3.SchemaConverterAllDatatypes {\n" +
-      " optional double optionalDouble = 1;\n" +
-      " optional float optionalFloat = 2;\n" +
-      " optional int32 optionalInt32 = 3;\n" +
-      " optional int64 optionalInt64 = 4;\n" +
-      " optional int32 optionalUInt32 = 5;\n" +
-      " optional int64 optionalUInt64 = 6;\n" +
-      " optional int32 optionalSInt32 = 7;\n" +
-      " optional int64 optionalSInt64 = 8;\n" +
-      " optional int32 optionalFixed32 = 9;\n" +
-      " optional int64 optionalFixed64 = 10;\n" +
-      " optional int32 optionalSFixed32 = 11;\n" +
-      " optional int64 optionalSFixed64 = 12;\n" +
-      " optional boolean optionalBool = 13;\n" +
-      " optional binary optionalString (UTF8) = 14;\n" +
-      " optional binary optionalBytes = 15;\n" +
-      " optional group optionalMessage = 16 {\n" +
-      "optional int32 someId = 3;\n" +
-      " }\n" +
-      " optional binary optionalEnum (ENUM) = 18;" +
-      " optional int32 someInt32 = 19;" +
-      " optional binary someString (UTF8) = 20;" +
-      " optional group optionalMap (MAP) = 21 {\n" +
-      "repeated group key_value {\n" +
-      " required int64 key;\n" +
-      " optional group value {\n" +
-      "optional int32 someId = 3;\n" +
-      " }\n" +
-      "}\n" +
-      " }\n" +
-      "}";
+  public void testProto3ConvertAllDatatypes() {
+    String expectedSchema = JOINER.join(

Review Comment:
Is it possible to separate this type of code style cleanup from functional changes?
[jira] [Commented] (PARQUET-1711) [parquet-protobuf] stack overflow when work with well known json type
[ https://issues.apache.org/jira/browse/PARQUET-1711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17614160#comment-17614160 ]

ASF GitHub Bot commented on PARQUET-1711:

jinyius commented on PR #995:
URL: https://github.com/apache/parquet-mr/pull/995#issuecomment-1271822621

ping
[jira] [Commented] (PARQUET-1711) [parquet-protobuf] stack overflow when work with well known json type
[ https://issues.apache.org/jira/browse/PARQUET-1711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17611708#comment-17611708 ]

ASF GitHub Bot commented on PARQUET-1711:

jinyius commented on PR #995:
URL: https://github.com/apache/parquet-mr/pull/995#issuecomment-1263878341

Thanks for the review. Updated to address the logging performance concern and to fix the Javadoc errors.
[jira] [Commented] (PARQUET-1711) [parquet-protobuf] stack overflow when work with well known json type
[ https://issues.apache.org/jira/browse/PARQUET-1711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17611707#comment-17611707 ]

ASF GitHub Bot commented on PARQUET-1711:

jinyius commented on code in PR #995:
URL: https://github.com/apache/parquet-mr/pull/995#discussion_r984839210

##
parquet-protobuf/src/main/java/org/apache/parquet/proto/ProtoSchemaConverter.java:
##
@@ -99,9 +139,9 @@ private Type.Repetition getRepetition(FieldDescriptor descriptor) {
     }
   }

-  private Builder>, GroupBuilder> addField(FieldDescriptor descriptor, final GroupBuilder builder) {
+  private Builder>, GroupBuilder> addField(FieldDescriptor descriptor, final GroupBuilder builder, ImmutableSetMultimap seen, int depth) {

Review Comment:
? The `seen` map already encodes the seen fields along with their depths in a single data structure. `depth` is a separate argument because it is the current depth of the traversal, and it is used to update `seen`.
[jira] [Commented] (PARQUET-1711) [parquet-protobuf] stack overflow when work with well known json type
[ https://issues.apache.org/jira/browse/PARQUET-1711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17611705#comment-17611705 ] ASF GitHub Bot commented on PARQUET-1711: - jinyius commented on code in PR #995: URL: https://github.com/apache/parquet-mr/pull/995#discussion_r984838003 ## parquet-protobuf/src/main/java/org/apache/parquet/proto/ProtoSchemaConverter.java: ## @@ -124,35 +164,61 @@ private Builder>, GroupBuilder> addR .named("list"); } - private GroupBuilder> addRepeatedMessage(FieldDescriptor descriptor, GroupBuilder builder) { -GroupBuilder>>> result = - builder + private GroupBuilder> addRepeatedMessage(FieldDescriptor descriptor, GroupBuilder builder, ImmutableSetMultimap seen, int depth) { +GroupBuilder>>> result = builder .group(Type.Repetition.OPTIONAL).as(listType()) .group(Type.Repetition.REPEATED) .group(Type.Repetition.OPTIONAL); -convertFields(result, descriptor.getMessageType().getFields()); +convertFields(result, descriptor.getMessageType().getFields(), seen, depth); return result.named("element").named("list"); } - private GroupBuilder> addMessageField(FieldDescriptor descriptor, final GroupBuilder builder) { + private Builder>, GroupBuilder> addMessageField(FieldDescriptor descriptor, final GroupBuilder builder, ImmutableSetMultimap seen, int depth) { +// Prevent recursion by terminating with optional proto bytes. 
+depth += 1; +String typeName = getInnerTypeName(descriptor); +LOG.trace("addMessageField: " + descriptor.getFullName() + " type: " + typeName + " depth: " + depth); +if (typeName != null) { + if (seen.get(typeName).size() > maxRecursion) { +return builder.primitive(BINARY, Type.Repetition.OPTIONAL).as((LogicalTypeAnnotation) null); + } +} + if (descriptor.isMapField() && parquetSpecsCompliant) { // the old schema style did not include the MAP wrapper around map groups - return addMapField(descriptor, builder); + return addMapField(descriptor, builder, seen, depth); } + +seen = ImmutableSetMultimap.builder().putAll(seen).put(typeName, depth).build(); Review Comment: it's actually not as costly as you think. guava's immutable structures are written to simply remove method access what not needed, and takes tries its best to avoid memory reallocations when using copyOf or builder patterns [[1](https://github.com/google/guava/wiki/ImmutableCollectionsExplained)][[2](https://github.com/google/guava/blob/master/guava/src/com/google/common/collect/ImmutableSetMultimap.java#L365)][[3](https://github.com/google/guava/blob/master/guava/src/com/google/common/collect/ImmutableSetMultimap.java#L306)][[4](https://github.com/google/guava/blob/master/guava/src/com/google/common/collect/ImmutableSetMultimap.java#L291)] [generally](https://stackoverflow.com/questions/1284727/mutable-or-immutable-class). it's pretty [efficient](https://github.com/DimitrisAndreou/memory-measurer/blob/master/ElementCostInDataStructures.txt). because of depth first traversal, we do want to "go back" and let the previous state of counts start again as the basis for other branch traversals. this is exactly the benefit as it helps in avoiding defensive copying of mutable data structures or clearing of fields trying to use a single instance when traversing and going back up the stack. 
[jira] [Commented] (PARQUET-1711) [parquet-protobuf] stack overflow when work with well known json type
[ https://issues.apache.org/jira/browse/PARQUET-1711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17611706#comment-17611706 ] ASF GitHub Bot commented on PARQUET-1711: - jinyius commented on code in PR #995: URL: https://github.com/apache/parquet-mr/pull/995#discussion_r984839210 ## parquet-protobuf/src/main/java/org/apache/parquet/proto/ProtoSchemaConverter.java: ## @@ -99,9 +139,9 @@ private Type.Repetition getRepetition(FieldDescriptor descriptor) { } } - private Builder>, GroupBuilder> addField(FieldDescriptor descriptor, final GroupBuilder builder) { + private Builder>, GroupBuilder> addField(FieldDescriptor descriptor, final GroupBuilder builder, ImmutableSetMultimap seen, int depth) { Review Comment: ? the seen map already encodes the depths in a single data structure. depth being a separate arg is important b/c it's the current depth in the traversal, and it's used to update the seen data structure.
[jira] [Commented] (PARQUET-1711) [parquet-protobuf] stack overflow when work with well known json type
[ https://issues.apache.org/jira/browse/PARQUET-1711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17611695#comment-17611695 ] ASF GitHub Bot commented on PARQUET-1711: - jinyius commented on code in PR #995: URL: https://github.com/apache/parquet-mr/pull/995#discussion_r984809641 ## parquet-protobuf/src/main/java/org/apache/parquet/proto/ProtoSchemaConverter.java: ## @@ -124,35 +164,61 @@ private Builder>, GroupBuilder> addR .named("list"); } - private GroupBuilder> addRepeatedMessage(FieldDescriptor descriptor, GroupBuilder builder) { -GroupBuilder>>> result = - builder + private GroupBuilder> addRepeatedMessage(FieldDescriptor descriptor, GroupBuilder builder, ImmutableSetMultimap seen, int depth) { +GroupBuilder>>> result = builder .group(Type.Repetition.OPTIONAL).as(listType()) .group(Type.Repetition.REPEATED) .group(Type.Repetition.OPTIONAL); -convertFields(result, descriptor.getMessageType().getFields()); +convertFields(result, descriptor.getMessageType().getFields(), seen, depth); return result.named("element").named("list"); } - private GroupBuilder> addMessageField(FieldDescriptor descriptor, final GroupBuilder builder) { + private Builder>, GroupBuilder> addMessageField(FieldDescriptor descriptor, final GroupBuilder builder, ImmutableSetMultimap seen, int depth) { +// Prevent recursion by terminating with optional proto bytes. +depth += 1; +String typeName = getInnerTypeName(descriptor); +LOG.trace("addMessageField: " + descriptor.getFullName() + " type: " + typeName + " depth: " + depth); Review Comment: you're correct. however, proto schema conversion shouldn't happen repeatedly in the greater flow of a processing job (ideally, just once), so this overhead isn't too bad. i'll move to the parameterized/formatted logging calls in the files i touch here. i would suggest the rest of the codebase do the same to avoid this penalty as well, but it's beyond the scope of this pr. 
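The overhead under discussion can be shown with a minimal sketch. It assumes nothing about the PR's actual logger: `java.util.logging` is used as a JDK-only stand-in, and `LazyLoggingSketch` and `Counted` are hypothetical names. The concatenated form renders its argument before the logger can check the level, while the parameterized form hands the argument over unrendered and skips formatting when the level is disabled.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

final class LazyLoggingSketch {
    // Hypothetical argument type that counts how often it is rendered.
    static final class Counted {
        int rendered = 0;

        @Override
        public String toString() {
            rendered++;
            return "TestMessage.data";
        }
    }

    static final Logger LOG = Logger.getLogger("LazyLoggingSketch");

    // Concatenation: the message string is built before log() is entered.
    static int eagerCost() {
        LOG.setLevel(Level.INFO); // FINEST is disabled
        Counted name = new Counted();
        LOG.log(Level.FINEST, "addMessageField: " + name);
        return name.rendered;
    }

    // Parameterized: the argument is only rendered if the level is enabled.
    static int lazyCost() {
        LOG.setLevel(Level.INFO); // FINEST is disabled
        Counted name = new Counted();
        LOG.log(Level.FINEST, "addMessageField: {0}", name);
        return name.rendered;
    }

    public static void main(String[] args) {
        System.out.println("eager renders: " + eagerCost());
        System.out.println("lazy renders: " + lazyCost());
    }
}
```

Other logging facades offer an equivalent placeholder form. The sketch only illustrates why the parameterized call is cheaper when trace output is off.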
[jira] [Commented] (PARQUET-1711) [parquet-protobuf] stack overflow when work with well known json type
[ https://issues.apache.org/jira/browse/PARQUET-1711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17611377#comment-17611377 ] ASF GitHub Bot commented on PARQUET-1711: - emkornfield commented on code in PR #995: URL: https://github.com/apache/parquet-mr/pull/995#discussion_r984238238 ## parquet-protobuf/src/main/java/org/apache/parquet/proto/ProtoSchemaConverter.java: ## @@ -99,9 +139,9 @@ private Type.Repetition getRepetition(FieldDescriptor descriptor) { } } - private Builder>, GroupBuilder> addField(FieldDescriptor descriptor, final GroupBuilder builder) { + private Builder>, GroupBuilder> addField(FieldDescriptor descriptor, final GroupBuilder builder, ImmutableSetMultimap seen, int depth) { Review Comment: would it make sense to consolidate seen and depth into a single data-structure that can be passed through and abstract some of the direct access to the multimap?
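One possible shape for the consolidation suggested in this review comment is sketched below. It is purely illustrative and not taken from either PR: the class `TraversalState` and its methods `root`, `descend`, and `exceeds` are hypothetical, and a plain JDK map of visit counts stands in for the Guava multimap. The idea is that callers pass a single value and never touch the underlying collection directly.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical immutable holder for "seen" counts plus the current depth.
final class TraversalState {
    private final Map<String, Integer> visits;
    private final int depth;

    private TraversalState(Map<String, Integer> visits, int depth) {
        this.visits = visits;
        this.depth = depth;
    }

    // State at the top of the traversal: nothing seen, depth 0.
    static TraversalState root() {
        return new TraversalState(Collections.emptyMap(), 0);
    }

    int depth() {
        return depth;
    }

    // True once typeName has been entered more than maxRecursion times on this path.
    boolean exceeds(String typeName, int maxRecursion) {
        return visits.getOrDefault(typeName, 0) > maxRecursion;
    }

    // Returns a new state one level deeper; this instance is left unchanged.
    TraversalState descend(String typeName) {
        Map<String, Integer> copy = new HashMap<>(visits);
        copy.merge(typeName, 1, Integer::sum);
        return new TraversalState(Collections.unmodifiableMap(copy), depth + 1);
    }
}
```

A converter method would then take one `TraversalState` parameter in place of the separate `seen` and `depth` arguments, call `exceeds` before expanding a message type, and pass `descend(typeName)` to its children.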
[jira] [Commented] (PARQUET-1711) [parquet-protobuf] stack overflow when work with well known json type
[ https://issues.apache.org/jira/browse/PARQUET-1711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17611376#comment-17611376 ] ASF GitHub Bot commented on PARQUET-1711: - emkornfield commented on code in PR #995: URL: https://github.com/apache/parquet-mr/pull/995#discussion_r984237501 ## parquet-protobuf/src/main/java/org/apache/parquet/proto/ProtoSchemaConverter.java: ## @@ -124,35 +164,61 @@ private Builder>, GroupBuilder> addR .named("list"); } - private GroupBuilder> addRepeatedMessage(FieldDescriptor descriptor, GroupBuilder builder) { -GroupBuilder>>> result = - builder + private GroupBuilder> addRepeatedMessage(FieldDescriptor descriptor, GroupBuilder builder, ImmutableSetMultimap seen, int depth) { +GroupBuilder>>> result = builder .group(Type.Repetition.OPTIONAL).as(listType()) .group(Type.Repetition.REPEATED) .group(Type.Repetition.OPTIONAL); -convertFields(result, descriptor.getMessageType().getFields()); +convertFields(result, descriptor.getMessageType().getFields(), seen, depth); return result.named("element").named("list"); } - private GroupBuilder> addMessageField(FieldDescriptor descriptor, final GroupBuilder builder) { + private Builder>, GroupBuilder> addMessageField(FieldDescriptor descriptor, final GroupBuilder builder, ImmutableSetMultimap seen, int depth) { +// Prevent recursion by terminating with optional proto bytes. 
+depth += 1; +String typeName = getInnerTypeName(descriptor); +LOG.trace("addMessageField: " + descriptor.getFullName() + " type: " + typeName + " depth: " + depth); +if (typeName != null) { + if (seen.get(typeName).size() > maxRecursion) { +return builder.primitive(BINARY, Type.Repetition.OPTIONAL).as((LogicalTypeAnnotation) null); + } +} + if (descriptor.isMapField() && parquetSpecsCompliant) { // the old schema style did not include the MAP wrapper around map groups - return addMapField(descriptor, builder); + return addMapField(descriptor, builder, seen, depth); } + +seen = ImmutableSetMultimap.builder().putAll(seen).put(typeName, depth).build(); Review Comment: if this gets modified every time through this method, is immutability useful?
[jira] [Commented] (PARQUET-1711) [parquet-protobuf] stack overflow when work with well known json type
[ https://issues.apache.org/jira/browse/PARQUET-1711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17611375#comment-17611375 ] ASF GitHub Bot commented on PARQUET-1711: - emkornfield commented on code in PR #995: URL: https://github.com/apache/parquet-mr/pull/995#discussion_r984236009 ## parquet-protobuf/src/main/java/org/apache/parquet/proto/ProtoSchemaConverter.java: ## @@ -124,35 +164,61 @@ private Builder>, GroupBuilder> addR .named("list"); } - private GroupBuilder> addRepeatedMessage(FieldDescriptor descriptor, GroupBuilder builder) { -GroupBuilder>>> result = - builder + private GroupBuilder> addRepeatedMessage(FieldDescriptor descriptor, GroupBuilder builder, ImmutableSetMultimap seen, int depth) { +GroupBuilder>>> result = builder .group(Type.Repetition.OPTIONAL).as(listType()) .group(Type.Repetition.REPEATED) .group(Type.Repetition.OPTIONAL); -convertFields(result, descriptor.getMessageType().getFields()); +convertFields(result, descriptor.getMessageType().getFields(), seen, depth); return result.named("element").named("list"); } - private GroupBuilder> addMessageField(FieldDescriptor descriptor, final GroupBuilder builder) { + private Builder>, GroupBuilder> addMessageField(FieldDescriptor descriptor, final GroupBuilder builder, ImmutableSetMultimap seen, int depth) { +// Prevent recursion by terminating with optional proto bytes. +depth += 1; +String typeName = getInnerTypeName(descriptor); +LOG.trace("addMessageField: " + descriptor.getFullName() + " type: " + typeName + " depth: " + depth); Review Comment: it's been a while since I've done java, but doesn't the string concatenation as a parameter incur overhead even if the log level is above trace (i.e. doesn't there need to be some string formatting that takes the string arguments as separate parameters)?
[jira] [Commented] (PARQUET-1711) [parquet-protobuf] stack overflow when work with well known json type
[ https://issues.apache.org/jira/browse/PARQUET-1711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17610678#comment-17610678 ] ASF GitHub Bot commented on PARQUET-1711: - jinyius commented on PR #995: URL: https://github.com/apache/parquet-mr/pull/995#issuecomment-1261287406 fixed missing dep issue. can someone approve the ci flow?
[jira] [Commented] (PARQUET-1711) [parquet-protobuf] stack overflow when work with well known json type
[ https://issues.apache.org/jira/browse/PARQUET-1711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17610358#comment-17610358 ] ASF GitHub Bot commented on PARQUET-1711: - jinyius commented on PR #988: URL: https://github.com/apache/parquet-mr/pull/988#issuecomment-1260420022 > @matthieun and @jinyius Would it be possible for you both to sync to come up with one solution? You can put the other one as co-author. imho, #995 is a superset of this pr's functionality.
[jira] [Commented] (PARQUET-1711) [parquet-protobuf] stack overflow when work with well known json type
[ https://issues.apache.org/jira/browse/PARQUET-1711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17609786#comment-17609786 ] ASF GitHub Bot commented on PARQUET-1711: - shangxinli commented on PR #988: URL: https://github.com/apache/parquet-mr/pull/988#issuecomment-1258885741 @matthieun and @jinyius Would it be possible for you both to sync to come up with one solution? You can put the other one as co-author.
[jira] [Commented] (PARQUET-1711) [parquet-protobuf] stack overflow when work with well known json type
[ https://issues.apache.org/jira/browse/PARQUET-1711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17608085#comment-17608085 ] J Y commented on PARQUET-1711: -- ping.. the pr i wrote is the byte encoding version that [~emkornfi...@gmail.com] suggested in https://issues.apache.org/jira/browse/PARQUET-1711?focusedCommentId=17543672=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17543672 > [parquet-protobuf] stack overflow when work with well known json type > - > > Key: PARQUET-1711 > URL: https://issues.apache.org/jira/browse/PARQUET-1711 > Project: Parquet > Issue Type: Bug >Affects Versions: 1.10.1 >Reporter: Lawrence He >Priority: Major > > Writing following protobuf message as parquet file is not possible: > {code:java} > syntax = "proto3"; > import "google/protobuf/struct.proto"; > package test; > option java_outer_classname = "CustomMessage"; > message TestMessage { > map data = 1; > } {code} > Protobuf introduced "well known json type" such like > [ListValue|https://developers.google.com/protocol-buffers/docs/reference/google.protobuf#listvalue] > to work around json schema conversion. > However writing above messages traps parquet writer into an infinite loop due > to the "general type" support in protobuf. Current implementation will keep > referencing 6 possible types defined in protobuf (null, bool, number, string, > struct, list) and entering infinite loop when referencing "struct". 
[jira] [Commented] (PARQUET-1711) [parquet-protobuf] stack overflow when work with well known json type
[ https://issues.apache.org/jira/browse/PARQUET-1711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17605098#comment-17605098 ]

Gidon Gershinsky commented on PARQUET-1711:
-------------------------------------------

[~emkornfield] what do you think about these 3 alternatives?
[jira] [Commented] (PARQUET-1711) [parquet-protobuf] stack overflow when work with well known json type
[ https://issues.apache.org/jira/browse/PARQUET-1711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17604993#comment-17604993 ]

ASF GitHub Bot commented on PARQUET-1711:
-----------------------------------------

jinyius commented on PR #995:
URL: https://github.com/apache/parquet-mr/pull/995#issuecomment-1247345339

ping
[jira] [Commented] (PARQUET-1711) [parquet-protobuf] stack overflow when work with well known json type
[ https://issues.apache.org/jira/browse/PARQUET-1711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17602127#comment-17602127 ]

Gidon Gershinsky commented on PARQUET-1711:
-------------------------------------------

Hi to all on this Jira. Looks like we have a number of alternative solutions to this problem today,
[jira] [Commented] (PARQUET-1711) [parquet-protobuf] stack overflow when work with well known json type
[ https://issues.apache.org/jira/browse/PARQUET-1711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17601698#comment-17601698 ]

ASF GitHub Bot commented on PARQUET-1711:
-----------------------------------------

jinyius commented on PR #988:
URL: https://github.com/apache/parquet-mr/pull/988#issuecomment-1240350078

fyi, i sent pr #995
[jira] [Commented] (PARQUET-1711) [parquet-protobuf] stack overflow when work with well known json type
[ https://issues.apache.org/jira/browse/PARQUET-1711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17601696#comment-17601696 ]

ASF GitHub Bot commented on PARQUET-1711:
-----------------------------------------

jinyius opened a new pull request, #995:
URL: https://github.com/apache/parquet-mr/pull/995

- This is an alternative approach to supporting recursion to apache#445 and apache#988.
- This approach could address the other recursion related issues (PARQUET-129, PARQUET-554).
- TODO: ReadSupport

### Jira

- [x] My PR addresses the following [Parquet Jira](https://issues.apache.org/jira/browse/PARQUET/) issues and references them in the PR title. For example, "PARQUET-1234: My Parquet PR"
  - https://issues.apache.org/jira/browse/PARQUET-1711
  - In case you are adding a dependency, check if the license complies with the [ASF 3rd Party License Policy](https://www.apache.org/legal/resolved.html#category-x).

### Tests

- [x] My PR adds the following unit tests __OR__ does not need testing for this extremely good reason:
  - `ProtoSchemaConverterTest#test*Recursion`
  - `ProtoWriteSupportTest#test*Recursion`

### Commits

- [x] My commits all reference Jira issues in their subject lines. In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)":
  1. Subject is separated from body by a blank line
  1. Subject is limited to 50 characters (not including Jira issue reference)
  1. Subject does not end with a period
  1. Subject uses the imperative mood ("add", not "adding")
  1. Body wraps at 72 characters
  1. Body explains "what" and "why", not "how"

### Documentation

- [x] In case of new functionality, my PR adds documentation that describes how to use it.
  - All the public functions and the classes in the PR contain Javadoc that explain what it does
[jira] [Commented] (PARQUET-1711) [parquet-protobuf] stack overflow when work with well known json type
[ https://issues.apache.org/jira/browse/PARQUET-1711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17598174#comment-17598174 ]

ASF GitHub Bot commented on PARQUET-1711:
-----------------------------------------

jinyius commented on code in PR #988:
URL: https://github.com/apache/parquet-mr/pull/988#discussion_r959168794

## parquet-protobuf/src/main/java/org/apache/parquet/proto/ProtoSchemaConverter.java: ##
@@ -79,12 +80,20 @@ public MessageType convert(Class protobufClass) {
   }

   /* Iterates over list of fields. **/
-  private GroupBuilder convertFields(GroupBuilder groupBuilder, List fieldDescriptors) {
+  private GroupBuilder convertFields(GroupBuilder groupBuilder, List fieldDescriptors, List parentNames) {
     for (FieldDescriptor fieldDescriptor : fieldDescriptors) {
-      groupBuilder =
-          addField(fieldDescriptor, groupBuilder)
+      final String name = fieldDescriptor.getFullName();
+      final List newParentNames = new ArrayList<>(parentNames);
+      newParentNames.add(name);
+      if (parentNames.contains(name)) {
+        // Circular dependency, skip
+        LOG.warn("Breaking circular dependency:{}{}", System.lineSeparator(),

Review Comment:
   i had been working on this issue as well and arrived at a similar solution to this one (however, without skipping/losing data) and linked to the prs in this pr conversation. ptal, and if you folks prefer it, i can submit a merge against head and close out this pr.
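The hunk quoted in the review comment above guards the recursion by carrying the full names of the fields already on the current path and skipping a field whose name reappears. A minimal sketch of that idea follows, assuming a toy message model: `ToyMessage`, `CycleSkipSketch` and `keptPaths` are hypothetical names invented for illustration and are not part of parquet-protobuf or the protobuf API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for a protobuf message type: a type name plus its fields. */
final class ToyMessage {
  final String name;
  // Field name -> nested message type, or null when the field is a primitive.
  final Map<String, ToyMessage> fields = new LinkedHashMap<>();

  ToyMessage(String name) {
    this.name = name;
  }
}

final class CycleSkipSketch {
  /** Returns the dotted paths of every field that survives cycle breaking. */
  static List<String> keptPaths(ToyMessage root) {
    List<String> kept = new ArrayList<>();
    walk(root, "", new ArrayList<>(), kept);
    return kept;
  }

  private static void walk(ToyMessage msg, String prefix, List<String> parentNames, List<String> kept) {
    for (Map.Entry<String, ToyMessage> field : msg.fields.entrySet()) {
      // Plays the role of fieldDescriptor.getFullName() in the quoted hunk.
      String fullName = msg.name + "." + field.getKey();
      if (parentNames.contains(fullName)) {
        // The same field is already on the current path: circular, so skip it.
        continue;
      }
      String path = prefix.isEmpty() ? field.getKey() : prefix + "." + field.getKey();
      kept.add(path);
      ToyMessage nested = field.getValue();
      if (nested != null) {
        // Copy the path so sibling fields do not see each other's ancestors.
        List<String> newParentNames = new ArrayList<>(parentNames);
        newParentNames.add(fullName);
        walk(nested, path, newParentNames, kept);
      }
    }
  }
}
```

For a self-referencing type `Node { value, child: Node }`, `keptPaths` yields `value`, `child`, `child.value`: the recursive field is expanded once and then dropped, which is the data loss the review comment refers to.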
[jira] [Commented] (PARQUET-1711) [parquet-protobuf] stack overflow when work with well known json type
[ https://issues.apache.org/jira/browse/PARQUET-1711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17598171#comment-17598171 ]

ASF GitHub Bot commented on PARQUET-1711:
-----------------------------------------

jinyius commented on PR #988:
URL: https://github.com/apache/parquet-mr/pull/988#issuecomment-1232470935

hmm... what timing. i actually have a pr for what i think is a more robust approach that truncates at an arbitrary recursion depth by putting the remaining recursion levels into a binary blob. this approach lets downstream querying things query the non-truncated parts fine, and allows for udfs to be defined to reinstantiate the truncated recursed fields.

i didn't submit the pr for merge quite yet b/c i'm busy trying to finish off the overall project i needed this for at work, so it's just coded against 1.12.3 and not head. ptal, and if everyone likes my proposal, i can spend a few cycles and move it to head:

schema converter pr:
- https://github.com/promotedai/parquet-mr/pull/1

write support pr:
- https://github.com/promotedai/parquet-mr/pull/2
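To make the depth-truncation proposal above concrete, here is a rough sketch under stated assumptions: the first `maxDepth` levels of a recursive record stay as ordinary values, and everything deeper is packed into an opaque byte array that a reader (for example a UDF) can decode later. `Chain`, `TruncatedChain`, `DepthTruncationSketch` and the fixed-width integer encoding are all hypothetical; the thread does not say how the linked PRs encode the remainder.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical recursive record: a value plus an optional nested record of the same type. */
final class Chain {
  final int value;
  final Chain next; // null ends the chain

  Chain(int value, Chain next) {
    this.value = value;
    this.next = next;
  }
}

/** Flattened form: the leading levels as plain values, the rest as opaque bytes. */
final class TruncatedChain {
  final List<Integer> levels;
  final byte[] remainder;

  TruncatedChain(List<Integer> levels, byte[] remainder) {
    this.levels = levels;
    this.remainder = remainder;
  }
}

final class DepthTruncationSketch {
  /** Keeps at most maxDepth levels as values and encodes deeper levels as bytes. */
  static TruncatedChain truncate(Chain head, int maxDepth) {
    List<Integer> levels = new ArrayList<>();
    Chain cur = head;
    for (int depth = 0; cur != null && depth < maxDepth; depth++) {
      levels.add(cur.value);
      cur = cur.next;
    }
    int rest = 0;
    for (Chain c = cur; c != null; c = c.next) {
      rest++;
    }
    // Assumed encoding: one big-endian 4-byte int per truncated level.
    ByteBuffer buf = ByteBuffer.allocate(Integer.BYTES * rest);
    for (Chain c = cur; c != null; c = c.next) {
      buf.putInt(c.value);
    }
    return new TruncatedChain(levels, buf.array());
  }

  /** Rebuilds the truncated tail from its byte encoding; returns null if it is empty. */
  static Chain restore(byte[] remainder) {
    ByteBuffer buf = ByteBuffer.wrap(remainder);
    List<Integer> values = new ArrayList<>();
    while (buf.remaining() >= Integer.BYTES) {
      values.add(buf.getInt());
    }
    Chain chain = null;
    for (int i = values.size() - 1; i >= 0; i--) {
      chain = new Chain(values.get(i), chain);
    }
    return chain;
  }
}
```

Unlike skipping a circular field, this keeps every value: the schema stays finite because the blob column has a fixed type, and the levels above the cut remain queryable as normal columns.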
[jira] [Commented] (PARQUET-1711) [parquet-protobuf] stack overflow when work with well known json type
[ https://issues.apache.org/jira/browse/PARQUET-1711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17585637#comment-17585637 ]

ASF GitHub Bot commented on PARQUET-1711:
-----------------------------------------

matthieun commented on PR #988:
URL: https://github.com/apache/parquet-mr/pull/988#issuecomment-1229042610

@shangxinli Let me know if this is good to merge!
[jira] [Commented] (PARQUET-1711) [parquet-protobuf] stack overflow when work with well known json type
[ https://issues.apache.org/jira/browse/PARQUET-1711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17583724#comment-17583724 ]

ASF GitHub Bot commented on PARQUET-1711:
-----------------------------------------

matthieun commented on code in PR #988:
URL: https://github.com/apache/parquet-mr/pull/988#discussion_r952835674

## parquet-protobuf/src/main/java/org/apache/parquet/proto/ProtoSchemaConverter.java: ##
@@ -79,12 +80,20 @@ public MessageType convert(Class protobufClass) {
   }

   /* Iterates over list of fields. **/
-  private GroupBuilder convertFields(GroupBuilder groupBuilder, List fieldDescriptors) {
+  private GroupBuilder convertFields(GroupBuilder groupBuilder, List fieldDescriptors, List parentNames) {
     for (FieldDescriptor fieldDescriptor : fieldDescriptors) {
-      groupBuilder =
-          addField(fieldDescriptor, groupBuilder)
+      final String name = fieldDescriptor.getFullName();
+      final List newParentNames = new ArrayList<>(parentNames);
+      newParentNames.add(name);
+      if (parentNames.contains(name)) {
+        // Circular dependency, skip
+        LOG.warn("Breaking circular dependency:{}{}", System.lineSeparator(),

Review Comment:
   Well, another option would be to add a new configuration setting to allow the user to either have it fail with a good error message, or just silently break the circle like this. However I am not familiar with how `parquet-protobuf` is configured. If I should go that route, I'd appreciate some examples!
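The option floated in the review comment above, letting the user choose between failing with a clear message and silently breaking the cycle, could look roughly like the sketch below. `CyclePolicy`, `CyclePolicySketch` and `shouldConvert` are hypothetical names; the thread does not show how parquet-protobuf would actually expose such a setting.

```java
import java.util.List;

/** Hypothetical user-facing choice for how to treat a circular field reference. */
enum CyclePolicy {
  FAIL,
  SKIP
}

final class CyclePolicySketch {
  /**
   * Returns true when the field should be converted and false when it should be
   * skipped; throws when a cycle is found and the policy is FAIL.
   */
  static boolean shouldConvert(String fieldFullName, List<String> parentNames, CyclePolicy policy) {
    if (!parentNames.contains(fieldFullName)) {
      return true; // Not on the current path, so no cycle.
    }
    if (policy == CyclePolicy.FAIL) {
      throw new IllegalStateException(
          "Circular dependency at " + fieldFullName + " via " + String.join(" -> ", parentNames));
    }
    return false; // SKIP: drop the field and let the caller log a warning.
  }
}
```

Failing by default would surface the problem at schema-conversion time instead of as a `StackOverflowError`, while `SKIP` matches the behaviour of the hunk under review.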
[jira] [Commented] (PARQUET-1711) [parquet-protobuf] stack overflow when work with well known json type
[ https://issues.apache.org/jira/browse/PARQUET-1711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17583686#comment-17583686 ]

ASF GitHub Bot commented on PARQUET-1711:
-----------------------------------------

shangxinli commented on code in PR #988:
URL: https://github.com/apache/parquet-mr/pull/988#discussion_r952778132

## parquet-protobuf/src/main/java/org/apache/parquet/proto/ProtoSchemaConverter.java: ##
@@ -79,12 +80,20 @@ public MessageType convert(Class protobufClass) {
   }

   /* Iterates over list of fields. **/
-  private GroupBuilder convertFields(GroupBuilder groupBuilder, List fieldDescriptors) {
+  private GroupBuilder convertFields(GroupBuilder groupBuilder, List fieldDescriptors, List parentNames) {
     for (FieldDescriptor fieldDescriptor : fieldDescriptors) {
-      groupBuilder =
-          addField(fieldDescriptor, groupBuilder)
+      final String name = fieldDescriptor.getFullName();
+      final List newParentNames = new ArrayList<>(parentNames);
+      newParentNames.add(name);
+      if (parentNames.contains(name)) {
+        // Circular dependency, skip
+        LOG.warn("Breaking circular dependency:{}{}", System.lineSeparator(),

Review Comment:
   In that case, we silently break the circle without throwing an exception. Is that OK?
[jira] [Commented] (PARQUET-1711) [parquet-protobuf] stack overflow when work with well known json type
[ https://issues.apache.org/jira/browse/PARQUET-1711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17583672#comment-17583672 ]

ASF GitHub Bot commented on PARQUET-1711:
-----------------------------------------

shangxinli commented on code in PR #988:
URL: https://github.com/apache/parquet-mr/pull/988#discussion_r952764513

## parquet-protobuf/src/main/java/org/apache/parquet/proto/ProtoSchemaConverter.java: ##

@@ -79,12 +80,20 @@ public MessageType convert(Class protobufClass) {
   }

   /* Iterates over list of fields. **/
-  private GroupBuilder convertFields(GroupBuilder groupBuilder, List fieldDescriptors) {
+  private GroupBuilder convertFields(GroupBuilder groupBuilder, List fieldDescriptors, List parentNames) {
     for (FieldDescriptor fieldDescriptor : fieldDescriptors) {
-      groupBuilder =
-          addField(fieldDescriptor, groupBuilder)
+      final String name = fieldDescriptor.getFullName();
+      final List newParentNames = new ArrayList<>(parentNames);
+      newParentNames.add(name);
+      if (parentNames.contains(name)) {

Review Comment:
Makes sense.
[jira] [Commented] (PARQUET-1711) [parquet-protobuf] stack overflow when work with well known json type
[ https://issues.apache.org/jira/browse/PARQUET-1711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17583076#comment-17583076 ]

ASF GitHub Bot commented on PARQUET-1711:
-----------------------------------------

matthieun commented on code in PR #988:
URL: https://github.com/apache/parquet-mr/pull/988#discussion_r951648555

## parquet-protobuf/src/main/java/org/apache/parquet/proto/ProtoSchemaConverter.java: ##

@@ -79,12 +80,20 @@ public MessageType convert(Class protobufClass) {
   }

   /* Iterates over list of fields. **/
-  private GroupBuilder convertFields(GroupBuilder groupBuilder, List fieldDescriptors) {
+  private GroupBuilder convertFields(GroupBuilder groupBuilder, List fieldDescriptors, List parentNames) {
     for (FieldDescriptor fieldDescriptor : fieldDescriptors) {
-      groupBuilder =
-          addField(fieldDescriptor, groupBuilder)
+      final String name = fieldDescriptor.getFullName();
+      final List newParentNames = new ArrayList<>(parentNames);
+      newParentNames.add(name);
+      if (parentNames.contains(name)) {
+        // Circular dependency, skip
+        LOG.warn("Breaking circular dependency:{}{}", System.lineSeparator(),

Review Comment:
It is possible to create circular dependencies; that is the problem. I am not sure in what cases they would be useful, but since they can exist, parquet should not fail with a `StackOverflowError` when it encounters them.
[jira] [Commented] (PARQUET-1711) [parquet-protobuf] stack overflow when work with well known json type
[ https://issues.apache.org/jira/browse/PARQUET-1711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17583074#comment-17583074 ]

ASF GitHub Bot commented on PARQUET-1711:
-----------------------------------------

matthieun commented on code in PR #988:
URL: https://github.com/apache/parquet-mr/pull/988#discussion_r951644453

## parquet-protobuf/src/main/java/org/apache/parquet/proto/ProtoSchemaConverter.java: ##

@@ -79,12 +80,20 @@ public MessageType convert(Class protobufClass) {
   }

   /* Iterates over list of fields. **/
-  private GroupBuilder convertFields(GroupBuilder groupBuilder, List fieldDescriptors) {
+  private GroupBuilder convertFields(GroupBuilder groupBuilder, List fieldDescriptors, List parentNames) {
     for (FieldDescriptor fieldDescriptor : fieldDescriptors) {
-      groupBuilder =
-          addField(fieldDescriptor, groupBuilder)
+      final String name = fieldDescriptor.getFullName();
+      final List newParentNames = new ArrayList<>(parentNames);
+      newParentNames.add(name);
+      if (parentNames.contains(name)) {

Review Comment:
The list is mostly used to keep the ordering, so that the dependency chain is printed in order in the warning message. I understand that if the schema definition is really deep, with many nested types, it might be slower, but overall that list never grows bigger than the deepest nesting in the schema. If this is still a concern, I am happy to switch to HashSet at the expense of maybe dumbing down the log message (printing the nesting chain out of order would not be valuable anyway, I think).
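The trade-off discussed in the review comment above (ordered chain for the log message versus fast membership checks) can possibly be sidestepped with an insertion-ordered set. The following is a minimal sketch only, not code from PR #988: the class `OrderedPathDemo` and the method `cycleOrNull` are hypothetical names, and the field names in `main` are used purely as illustrative strings.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical helper, not part of parquet-mr.
final class OrderedPathDemo {

  /**
   * Returns the dependency chain (in insertion order) if `name` is already
   * one of the parents, or null if adding `name` does not close a cycle.
   */
  static String cycleOrNull(Set<String> parentNames, String name) {
    // LinkedHashSet.contains is a hash lookup, unlike the linear scan of List.contains.
    if (!parentNames.contains(name)) {
      return null;
    }
    // Iteration follows insertion order, so the chain prints parent-first.
    return String.join(" -> ", parentNames) + " -> " + name;
  }

  public static void main(String[] args) {
    Set<String> parents = new LinkedHashSet<>();
    parents.add("google.protobuf.Struct.fields");
    parents.add("google.protobuf.Value.struct_value");
    // The second visit to Struct.fields closes the cycle.
    System.out.println(cycleOrNull(parents, "google.protobuf.Struct.fields"));
  }
}
```

Since the collection never grows beyond the deepest nesting in the schema, as the comment points out, the practical difference between the two approaches is likely small.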
[jira] [Commented] (PARQUET-1711) [parquet-protobuf] stack overflow when work with well known json type
[ https://issues.apache.org/jira/browse/PARQUET-1711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17582614#comment-17582614 ]

ASF GitHub Bot commented on PARQUET-1711:
-----------------------------------------

shangxinli commented on code in PR #988:
URL: https://github.com/apache/parquet-mr/pull/988#discussion_r950888122

## parquet-protobuf/src/main/java/org/apache/parquet/proto/ProtoSchemaConverter.java: ##

@@ -79,12 +80,20 @@ public MessageType convert(Class protobufClass) {
   }

   /* Iterates over list of fields. **/
-  private GroupBuilder convertFields(GroupBuilder groupBuilder, List fieldDescriptors) {
+  private GroupBuilder convertFields(GroupBuilder groupBuilder, List fieldDescriptors, List parentNames) {
     for (FieldDescriptor fieldDescriptor : fieldDescriptors) {
-      groupBuilder =
-          addField(fieldDescriptor, groupBuilder)
+      final String name = fieldDescriptor.getFullName();
+      final List newParentNames = new ArrayList<>(parentNames);
+      newParentNames.add(name);
+      if (parentNames.contains(name)) {
+        // Circular dependency, skip
+        LOG.warn("Breaking circular dependency:{}{}", System.lineSeparator(),

Review Comment:
I am not very familiar with Proto. By design, is a circular reference normal in proto, or is it caused by issues?
[jira] [Commented] (PARQUET-1711) [parquet-protobuf] stack overflow when work with well known json type
[ https://issues.apache.org/jira/browse/PARQUET-1711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17582613#comment-17582613 ]

ASF GitHub Bot commented on PARQUET-1711:
-----------------------------------------

shangxinli commented on code in PR #988:
URL: https://github.com/apache/parquet-mr/pull/988#discussion_r950887905

## parquet-protobuf/src/main/java/org/apache/parquet/proto/ProtoSchemaConverter.java: ##

@@ -79,12 +80,20 @@ public MessageType convert(Class protobufClass) {
   }

   /* Iterates over list of fields. **/
-  private GroupBuilder convertFields(GroupBuilder groupBuilder, List fieldDescriptors) {
+  private GroupBuilder convertFields(GroupBuilder groupBuilder, List fieldDescriptors, List parentNames) {
     for (FieldDescriptor fieldDescriptor : fieldDescriptors) {
-      groupBuilder =
-          addField(fieldDescriptor, groupBuilder)
+      final String name = fieldDescriptor.getFullName();
+      final List newParentNames = new ArrayList<>(parentNames);
+      newParentNames.add(name);
+      if (parentNames.contains(name)) {

Review Comment:
A list's `contains` would be slower than a HashSet lookup. Any reason we don't use HashSet?
[jira] [Commented] (PARQUET-1711) [parquet-protobuf] stack overflow when work with well known json type
[ https://issues.apache.org/jira/browse/PARQUET-1711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17581510#comment-17581510 ]

ASF GitHub Bot commented on PARQUET-1711:
-----------------------------------------

matthieun opened a new pull request, #988:
URL: https://github.com/apache/parquet-mr/pull/988

In case some proto definitions have circular dependencies, the proto schema converter breaks those and logs a warning, instead of failing with a `StackOverflowError`.

### Jira

- [x] My PR addresses the following [Parquet Jira](https://issues.apache.org/jira/browse/PARQUET/) issues and references them in the PR title. For example, "PARQUET-1234: My Parquet PR"
  - https://issues.apache.org/jira/browse/PARQUET-1711

### Tests

- [x] My PR adds the following unit tests __OR__ does not need testing for this extremely good reason:
  - Proto definitions with circular dependencies tested in `ProtoSchemaConverterTest`

### Commits

- [x] My commits all reference Jira issues in their subject lines. In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)":
  1. Subject is separated from body by a blank line
  2. Subject is limited to 50 characters (not including Jira issue reference)
  3. Subject does not end with a period
  4. Subject uses the imperative mood ("add", not "adding")
  5. Body wraps at 72 characters
  6. Body explains "what" and "why", not "how"

### Documentation

- [ ] In case of new functionality, my PR adds documentation that describes how to use it.
  - All the public functions and the classes in the PR contain Javadoc that explain what it does
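To illustrate the idea in the pull request description above (break the cycle and record it rather than recurse until the stack overflows), here is a minimal, self-contained sketch. It is not the code from PR #988: it uses a toy schema model (a map from a message name to the message types its fields refer to) instead of protobuf descriptors, and `CycleBreakingWalk` and `walk` are hypothetical names. Where the real converter would log a warning, this sketch records a `SKIPPED` entry.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical toy model, not the parquet-mr ProtoSchemaConverter.
final class CycleBreakingWalk {

  /** Walks the schema depth-first from `root`, skipping any type already on the current path. */
  static List<String> walk(Map<String, List<String>> schema, String root) {
    List<String> out = new ArrayList<>();
    walk(schema, root, new ArrayList<>(), out);
    return out;
  }

  private static void walk(Map<String, List<String>> schema, String type,
                           List<String> parents, List<String> out) {
    if (parents.contains(type)) {
      // Circular dependency: record the chain and stop descending.
      out.add("SKIPPED " + String.join(" -> ", parents) + " -> " + type);
      return;
    }
    out.add(type);
    // Copy the path per branch so that sibling fields do not see each other's ancestors.
    List<String> newParents = new ArrayList<>(parents);
    newParents.add(type);
    for (String child : schema.getOrDefault(type, List.of())) {
      walk(schema, child, newParents, out);
    }
  }

  public static void main(String[] args) {
    // Simplified shape of the well-known JSON types: Struct -> Value -> {Struct, ListValue -> Value}.
    Map<String, List<String>> schema = Map.of(
        "Struct", List.of("Value"),
        "Value", List.of("Struct", "ListValue"),
        "ListValue", List.of("Value"));
    walk(schema, "Struct").forEach(System.out::println);
  }
}
```

Without the `parents.contains(type)` check, the walk over this toy schema would never terminate, which is the same failure mode as the `convertFields` / `addField` recursion in the reported stack trace.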
[jira] [Commented] (PARQUET-1711) [parquet-protobuf] stack overflow when work with well known json type
[ https://issues.apache.org/jira/browse/PARQUET-1711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17570331#comment-17570331 ]

Micah Kornfield commented on PARQUET-1711:
------------------------------------------

{quote}[~emkornfield] Can we expect a fix any time soon?
{quote}
I was not planning on working on this, I don't have a lot of expertise on parquet-mr, but I can help review if someone wants to try this approach.
[jira] [Commented] (PARQUET-1711) [parquet-protobuf] stack overflow when work with well known json type
[ https://issues.apache.org/jira/browse/PARQUET-1711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17552396#comment-17552396 ] J Y commented on PARQUET-1711: -- i'd be ok of that approach: a proto option annotation to limit the recursion limit, then failing over to treat it as proto bytes. if the recursion limit is omitted/missing, then just treat the recursive definition as bytes after the first occurrence. forgive me if this is a naive question, but what's the difficulty in supporting "typing" properly to handle recursive nesting? > [parquet-protobuf] stack overflow when work with well known json type > - > > Key: PARQUET-1711 > URL: https://issues.apache.org/jira/browse/PARQUET-1711 > Project: Parquet > Issue Type: Bug >Affects Versions: 1.10.1 >Reporter: Lawrence He >Priority: Major > > Writing following protobuf message as parquet file is not possible: > {code:java} > syntax = "proto3"; > import "google/protobuf/struct.proto"; > package test; > option java_outer_classname = "CustomMessage"; > message TestMessage { > map data = 1; > } {code} > Protobuf introduced "well known json type" such like > [ListValue|https://developers.google.com/protocol-buffers/docs/reference/google.protobuf#listvalue] > to work around json schema conversion. > However writing above messages traps parquet writer into an infinite loop due > to the "general type" support in protobuf. Current implementation will keep > referencing 6 possible types defined in protobuf (null, bool, number, string, > struct, list) and entering infinite loop when referencing "struct". 
[jira] [Commented] (PARQUET-1711) [parquet-protobuf] stack overflow when work with well known json type
[ https://issues.apache.org/jira/browse/PARQUET-1711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17545732#comment-17545732 ]

Reeni Mathew commented on PARQUET-1711:

[~emkornfield] Can we expect a fix any time soon? We hit the same issue when the schema contains a self-reference.
[jira] [Commented] (PARQUET-1711) [parquet-protobuf] stack overflow when work with well known json type
[ https://issues.apache.org/jira/browse/PARQUET-1711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17543672#comment-17543672 ]

Micah Kornfield commented on PARQUET-1711:

One way to handle this is to let users specify a recursion limit and expand the schema up to that limit (the limit would need to be stored in metadata). For protos nested deeper than that limit, you either error on write or have the leaf levels of recursion store serialized proto bytes.
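The recursion-limit idea discussed above can be sketched in a few lines. This is only an illustration under stated assumptions: `RecursionLimitSketch`, its map-based message model, and the printed schema syntax are all hypothetical and are not the parquet-mr or protobuf API. Each message is modelled as a map from field name to type name; a message type may be re-entered at most `recursionLimit` times on one path, and any deeper reference is emitted as a `binary` field standing in for serialized proto bytes.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: depth-limited schema expansion with a bytes fallback. */
final class RecursionLimitSketch {
  // message name -> (field name -> type name); a toy stand-in for proto descriptors
  private final Map<String, Map<String, String>> messages;
  // how many times a message type may be re-entered on one path before falling back
  private final int recursionLimit;

  RecursionLimitSketch(Map<String, Map<String, String>> messages, int recursionLimit) {
    this.messages = messages;
    this.recursionLimit = recursionLimit;
  }

  /** Renders the expanded schema for the given root message as a single line. */
  String convert(String root) {
    return "message " + root + " " + group(root, new HashMap<>());
  }

  private String group(String messageName, Map<String, Integer> onPath) {
    // Count this message type as "open" on the current path.
    onPath.merge(messageName, 1, Integer::sum);
    StringBuilder sb = new StringBuilder("{ ");
    for (Map.Entry<String, String> field : messages.get(messageName).entrySet()) {
      String type = field.getValue();
      if (!messages.containsKey(type)) {
        // Primitive field: emit as-is.
        sb.append(type).append(' ').append(field.getKey()).append("; ");
      } else if (onPath.getOrDefault(type, 0) > recursionLimit) {
        // Limit reached: store the nested message as opaque serialized bytes.
        sb.append("binary ").append(field.getKey()).append("; ");
      } else {
        // Still within the limit: expand the nested message one more level.
        sb.append("group ").append(field.getKey()).append(' ')
          .append(group(type, onPath)).append(' ');
      }
    }
    // Leaving this message: it is no longer open on the path.
    onPath.merge(messageName, -1, Integer::sum);
    return sb.append('}').toString();
  }
}
```

With a limit of 0 this matches the "treat the recursive definition as bytes after the first occurrence" variant; a larger limit expands that many extra levels before falling back. The expansion always terminates because every recursive step increases a per-type counter that is bounded by the limit.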
[jira] [Commented] (PARQUET-1711) [parquet-protobuf] stack overflow when work with well known json type
[ https://issues.apache.org/jira/browse/PARQUET-1711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17542642#comment-17542642 ]

Aaron Blake Niskode-Dossett commented on PARQUET-1711:

That is a very interesting issue. I'm still learning a lot about parquet, but I'm not sure parquet has a notion of a recursive schema the way protobuf does here. In other words, I'm not sure how parquet *could* handle this (although it should not be a stack overflow!)
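On the point that this should not surface as a stack overflow: one possible way to fail cleanly is to detect the reference cycle before expanding the schema and report it by name. The sketch below is an assumption-laden illustration, not how parquet-mr behaves; `CycleCheckSketch` and its `references` map (message type name to the message types its fields refer to) are hypothetical names.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: report a recursive message definition instead of overflowing the stack. */
final class CycleCheckSketch {

  /** Returns the first cycle reachable from root as "A -> B -> A", or null if there is none. */
  static String findCycle(Map<String, List<String>> references, String root) {
    return visit(references, root, new ArrayDeque<>());
  }

  /** Throws a descriptive exception when the root message is recursively defined. */
  static void requireAcyclic(Map<String, List<String>> references, String root) {
    String cycle = findCycle(references, root);
    if (cycle != null) {
      throw new IllegalArgumentException("Recursive message definition: " + cycle);
    }
  }

  private static String visit(Map<String, List<String>> references, String type, Deque<String> path) {
    if (path.contains(type)) {
      // The type is already open on the current path: render the loop from its first occurrence.
      StringBuilder cycle = new StringBuilder();
      boolean inCycle = false;
      for (String seen : path) {
        inCycle |= seen.equals(type);
        if (inCycle) {
          cycle.append(seen).append(" -> ");
        }
      }
      return cycle.append(type).toString();
    }
    path.addLast(type);
    for (String child : references.getOrDefault(type, List.of())) {
      String cycle = visit(references, child, path);
      if (cycle != null) {
        return cycle;
      }
    }
    path.removeLast();
    return null;
  }
}
```

Because a type is never pushed onto the path twice, the recursion depth is bounded by the number of distinct message types. For a graph shaped like the well-known JSON types, where `Value` refers to `Struct` and `ListValue` and both refer back to `Value`, the check reports the `Value -> Struct -> Value` loop.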
[jira] [Commented] (PARQUET-1711) [parquet-protobuf] stack overflow when work with well known json type
[ https://issues.apache.org/jira/browse/PARQUET-1711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17542604#comment-17542604 ]

J Y commented on PARQUET-1711:

I believe Google's [Struct|https://github.com/protocolbuffers/protobuf/blob/main/src/google/protobuf/struct.proto] also runs afoul of this. Any new progress here?
[jira] [Commented] (PARQUET-1711) [parquet-protobuf] stack overflow when work with well known json type
[ https://issues.apache.org/jira/browse/PARQUET-1711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16989291#comment-16989291 ]

Lawrence He commented on PARQUET-1711:

We also ran into this issue with the latest master parquet build:

{code:java}
java.lang.StackOverflowError
  at org.apache.parquet.schema.Types$BasePrimitiveBuilder$1.checkBinaryPrimitiveType(Types.java:561)
  at org.apache.parquet.schema.Types$BasePrimitiveBuilder$1.visit(Types.java:448)
  at org.apache.parquet.schema.LogicalTypeAnnotation$StringLogicalTypeAnnotation.accept(LogicalTypeAnnotation.java:303)
  at org.apache.parquet.schema.Types$BasePrimitiveBuilder.build(Types.java:445)
  at org.apache.parquet.schema.Types$BasePrimitiveBuilder.build(Types.java:336)
  at org.apache.parquet.schema.Types$Builder.named(Types.java:314)
  at org.apache.parquet.proto.ProtoSchemaConverter.convertFields(ProtoSchemaConverter.java:83)
  at org.apache.parquet.proto.ProtoSchemaConverter.addMessageField(ProtoSchemaConverter.java:147)
  at org.apache.parquet.proto.ProtoSchemaConverter.addField(ProtoSchemaConverter.java:100)
  at org.apache.parquet.proto.ProtoSchemaConverter.convertFields(ProtoSchemaConverter.java:81)
  at org.apache.parquet.proto.ProtoSchemaConverter.addMessageField(ProtoSchemaConverter.java:147)
  at org.apache.parquet.proto.ProtoSchemaConverter.addField(ProtoSchemaConverter.java:100)
  at org.apache.parquet.proto.ProtoSchemaConverter.convertFields(ProtoSchemaConverter.java:81)
  at org.apache.parquet.proto.ProtoSchemaConverter.addMessageField(ProtoSchemaConverter.java:147)
  at org.apache.parquet.proto.ProtoSchemaConverter.addField(ProtoSchemaConverter.java:100)
  at org.apache.parquet.proto.ProtoSchemaConverter.convertFields(ProtoSchemaConverter.java:81)
  at org.apache.parquet.proto.ProtoSchemaConverter.addMessageField(ProtoSchemaConverter.java:147)
{code}