[jira] [Commented] (PARQUET-1711) [parquet-protobuf] stack overflow when work with well known json type
[ https://issues.apache.org/jira/browse/PARQUET-1711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17605098#comment-17605098 ]

Gidon Gershinsky commented on PARQUET-1711:
-------------------------------------------

[~emkornfield] what do you think about these 3 alternatives?

> [parquet-protobuf] stack overflow when work with well known json type
> ---------------------------------------------------------------------
>
>                 Key: PARQUET-1711
>                 URL: https://issues.apache.org/jira/browse/PARQUET-1711
>             Project: Parquet
>          Issue Type: Bug
>    Affects Versions: 1.10.1
>            Reporter: Lawrence He
>            Priority: Major
>
> Writing the following protobuf message as a parquet file is not possible:
> {code:java}
> syntax = "proto3";
> import "google/protobuf/struct.proto";
> package test;
> option java_outer_classname = "CustomMessage";
> message TestMessage {
>     map data = 1;
> } {code}
> Protobuf introduced "well known json types" such as
> [ListValue|https://developers.google.com/protocol-buffers/docs/reference/google.protobuf#listvalue]
> to work around json schema conversion.
> However, writing the above message traps the parquet writer in an infinite
> loop due to the "general type" support in protobuf. The current implementation
> keeps referencing the 6 possible types defined in protobuf (null, bool, number,
> string, struct, list) and enters an infinite loop when referencing "struct".
> {code:java}
> java.lang.StackOverflowError
>     at java.base/java.util.Arrays$ArrayItr.<init>(Arrays.java:4418)
>     at java.base/java.util.Arrays$ArrayList.iterator(Arrays.java:4410)
>     at java.base/java.util.Collections$UnmodifiableCollection$1.<init>(Collections.java:1044)
>     at java.base/java.util.Collections$UnmodifiableCollection.iterator(Collections.java:1043)
>     at org.apache.parquet.proto.ProtoSchemaConverter.convertFields(ProtoSchemaConverter.java:64)
>     at org.apache.parquet.proto.ProtoSchemaConverter.addField(ProtoSchemaConverter.java:96)
>     at org.apache.parquet.proto.ProtoSchemaConverter.convertFields(ProtoSchemaConverter.java:66)
>     at org.apache.parquet.proto.ProtoSchemaConverter.addField(ProtoSchemaConverter.java:96)
>     at org.apache.parquet.proto.ProtoSchemaConverter.convertFields(ProtoSchemaConverter.java:66)
>     at org.apache.parquet.proto.ProtoSchemaConverter.addField(ProtoSchemaConverter.java:96)
>     at org.apache.parquet.proto.ProtoSchemaConverter.convertFields(ProtoSchemaConverter.java:66)
>     at org.apache.parquet.proto.ProtoSchemaConverter.addField(ProtoSchemaConverter.java:96)
>     at org.apache.parquet.proto.ProtoSchemaConverter.convertFields(ProtoSchemaConverter.java:66)
>     at org.apache.parquet.proto.ProtoSchemaConverter.addField(ProtoSchemaConverter.java:96)
> {code}

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
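The trace in the report above shows convertFields and addField calling each other without end, because a "struct" value can contain further values. As a rough illustration of how a depth limit ends that cycle, here is a minimal, self-contained sketch. It does not use the parquet-protobuf or protobuf APIs: MessageType, convert, recursiveValueType, the "(truncated)" marker and the simplified two-type cycle are all hypothetical stand-ins chosen for the example, and this is not the code from any pull request.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Illustrative sketch only; all names here are hypothetical, not parquet-protobuf API. */
public class DepthLimitedSchemaSketch {

    /** Stand-in for a message descriptor whose fields may refer back to other messages. */
    public static final class MessageType {
        public final String name;
        // field name -> nested message type, or null for a primitive field
        public final Map<String, MessageType> fields = new LinkedHashMap<>();

        public MessageType(String name) {
            this.name = name;
        }
    }

    /**
     * Flattens the type graph into dotted column paths. Once maxDepth nested
     * messages have been expanded, the next message field is emitted as one
     * opaque leaf instead of being expanded, so the recursion terminates.
     */
    public static List<String> convert(MessageType root, int maxDepth) {
        List<String> columns = new ArrayList<>();
        convertFields(root, "", 0, maxDepth, columns);
        return columns;
    }

    private static void convertFields(MessageType type, String prefix, int depth,
                                      int maxDepth, List<String> out) {
        for (Map.Entry<String, MessageType> field : type.fields.entrySet()) {
            String path = prefix.isEmpty() ? field.getKey() : prefix + "." + field.getKey();
            MessageType nested = field.getValue();
            if (nested == null) {
                out.add(path);                      // primitive leaf
            } else if (depth >= maxDepth) {
                out.add(path + " (truncated)");     // limit reached: do not expand further
            } else {
                convertFields(nested, path, depth + 1, maxDepth, out);
            }
        }
    }

    /** Builds a simplified cycle: Value -> Struct -> Value -> ... */
    public static MessageType recursiveValueType() {
        MessageType value = new MessageType("Value");
        MessageType struct = new MessageType("Struct");
        value.fields.put("number_value", null);
        value.fields.put("string_value", null);
        value.fields.put("struct_value", struct);
        struct.fields.put("fields", value);         // closes the cycle
        return value;
    }

    public static void main(String[] args) {
        for (String column : convert(recursiveValueType(), 2)) {
            System.out.println(column);
        }
    }
}
```

Without the depth check, the same traversal over this cyclic graph would recurse until the stack overflows. How a truncated field should be represented in a real Parquet schema is a separate design question that this sketch does not answer.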
[jira] [Commented] (PARQUET-1711) [parquet-protobuf] stack overflow when work with well known json type
[ https://issues.apache.org/jira/browse/PARQUET-1711?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17604993#comment-17604993 ]

ASF GitHub Bot commented on PARQUET-1711:
-----------------------------------------

jinyius commented on PR #995:
URL: https://github.com/apache/parquet-mr/pull/995#issuecomment-1247345339

   ping
[GitHub] [parquet-mr] jinyius commented on pull request #995: PARQUET-1711: support recursive proto schemas by limiting recursion depth
jinyius commented on PR #995:
URL: https://github.com/apache/parquet-mr/pull/995#issuecomment-1247345339

   ping

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@parquet.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
[jira] [Commented] (PARQUET-1020) Add support for Dynamic Messages in parquet-protobuf
[ https://issues.apache.org/jira/browse/PARQUET-1020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17604981#comment-17604981 ]

ASF GitHub Bot commented on PARQUET-1020:
-----------------------------------------

stejskal commented on PR #963:
URL: https://github.com/apache/parquet-mr/pull/963#issuecomment-1247314834

   is there any idea when this will be released?

> Add support for Dynamic Messages in parquet-protobuf
> ----------------------------------------------------
>
>                 Key: PARQUET-1020
>                 URL: https://issues.apache.org/jira/browse/PARQUET-1020
>             Project: Parquet
>          Issue Type: New Feature
>          Components: parquet-protobuf
>            Reporter: Alex Buck
>            Assignee: Alex Buck
>            Priority: Major
>
> Hello. We would like to pass in a DynamicMessage rather than using the
> generated protobuf classes to allow us to make our job very generic.
> I think this could be achieved by setting the descriptor upfront, similarly
> to how there is a ProtoParquetOutputFormat today.
> In ProtoWriteSupport in the init method it could then generate the parquet
> schema created by ProtoSchemaConverter using the passed in descriptor, rather
> than taking it from the generated proto class.
> Would there be interest in incorporating this change? If so does the approach
> above sound sensible? I am happy to do a pull request
> initial PR here: https://github.com/apache/parquet-mr/pull/414
[GitHub] [parquet-mr] stejskal commented on pull request #963: PARQUET-1020 Add DynamicMessage writing support
stejskal commented on PR #963:
URL: https://github.com/apache/parquet-mr/pull/963#issuecomment-1247314834

   is there any idea when this will be released?