[jira] [Commented] (PARQUET-1020) Add support for Dynamic Messages in parquet-protobuf
[ https://issues.apache.org/jira/browse/PARQUET-1020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17605374#comment-17605374 ] ASF GitHub Bot commented on PARQUET-1020: - dossett commented on PR #963: URL: https://github.com/apache/parquet-mr/pull/963#issuecomment-1248173412 @stejskal This doesn't directly answer your question, but I resolved the JIRA ticket and marked it for the 1.13.0 release. I'm not aware of the timeline for that release though.

> Add support for Dynamic Messages in parquet-protobuf
>
> Key: PARQUET-1020
> URL: https://issues.apache.org/jira/browse/PARQUET-1020
> Project: Parquet
> Issue Type: New Feature
> Components: parquet-protobuf
> Reporter: Alex Buck
> Assignee: Alex Buck
> Priority: Major
> Fix For: 1.13.0
>
> Hello. We would like to pass in a DynamicMessage rather than using the
> generated protobuf classes to allow us to make our job very generic.
> I think this could be achieved by setting the descriptor upfront, similarly
> to how there is a ProtoParquetOutputFormat today.
> In ProtoWriteSupport in the init method it could then generate the parquet
> schema created by ProtoSchemaConverter using the passed in descriptor, rather
> than taking it from the generated proto class.
> Would there be interest in incorporating this change? If so does the approach
> above sound sensible? I am happy to do a pull request
> initial PR here: https://github.com/apache/parquet-mr/pull/414

--
This message was sent by Atlassian Jira (v8.20.10#820010)
[jira] [Commented] (PARQUET-1020) Add support for Dynamic Messages in parquet-protobuf
[ https://issues.apache.org/jira/browse/PARQUET-1020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17604981#comment-17604981 ] ASF GitHub Bot commented on PARQUET-1020: - stejskal commented on PR #963: URL: https://github.com/apache/parquet-mr/pull/963#issuecomment-1247314834 is there any idea when this will be released?
[jira] [Commented] (PARQUET-1020) Add support for Dynamic Messages in parquet-protobuf
[ https://issues.apache.org/jira/browse/PARQUET-1020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17570689#comment-17570689 ] ASF GitHub Bot commented on PARQUET-1020: - guillaume-fetter commented on PR #963: URL: https://github.com/apache/parquet-mr/pull/963#issuecomment-1193643505 Thank you very much!
[jira] [Commented] (PARQUET-1020) Add support for Dynamic Messages in parquet-protobuf
[ https://issues.apache.org/jira/browse/PARQUET-1020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17566350#comment-17566350 ] ASF GitHub Bot commented on PARQUET-1020: - dossett commented on PR #963: URL: https://github.com/apache/parquet-mr/pull/963#issuecomment-1183301646 Terrific, thank you!
[jira] [Commented] (PARQUET-1020) Add support for Dynamic Messages in parquet-protobuf
[ https://issues.apache.org/jira/browse/PARQUET-1020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17566347#comment-17566347 ] ASF GitHub Bot commented on PARQUET-1020: - shangxinli commented on PR #963: URL: https://github.com/apache/parquet-mr/pull/963#issuecomment-1183298278 Merged. Thanks again!
[jira] [Commented] (PARQUET-1020) Add support for Dynamic Messages in parquet-protobuf
[ https://issues.apache.org/jira/browse/PARQUET-1020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17566346#comment-17566346 ] ASF GitHub Bot commented on PARQUET-1020: - shangxinli merged PR #963: URL: https://github.com/apache/parquet-mr/pull/963
[jira] [Commented] (PARQUET-1020) Add support for Dynamic Messages in parquet-protobuf
[ https://issues.apache.org/jira/browse/PARQUET-1020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17566326#comment-17566326 ] ASF GitHub Bot commented on PARQUET-1020: - dossett commented on PR #963: URL: https://github.com/apache/parquet-mr/pull/963#issuecomment-1183261724 Thank you @shangxinli ! Do you want to merge it now or closer to the next release?
[jira] [Commented] (PARQUET-1020) Add support for Dynamic Messages in parquet-protobuf
[ https://issues.apache.org/jira/browse/PARQUET-1020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17561769#comment-17561769 ] ASF GitHub Bot commented on PARQUET-1020: - shangxinli commented on PR #963: URL: https://github.com/apache/parquet-mr/pull/963#issuecomment-1172925425 Sorry for the late response and thank you @guillaume-fetter and @dossett for the contribution. Yeah, it seems low risk and LGTM.
[jira] [Commented] (PARQUET-1020) Add support for Dynamic Messages in parquet-protobuf
[ https://issues.apache.org/jira/browse/PARQUET-1020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17554279#comment-17554279 ] ASF GitHub Bot commented on PARQUET-1020: - dossett commented on PR #963: URL: https://github.com/apache/parquet-mr/pull/963#issuecomment-1155680245 I tested this locally and it works beautifully thank you @guillaume-fetter. @shangxinli @gszadovszky
[jira] [Commented] (PARQUET-1020) Add support for Dynamic Messages in parquet-protobuf
[ https://issues.apache.org/jira/browse/PARQUET-1020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17553622#comment-17553622 ] ASF GitHub Bot commented on PARQUET-1020: - dossett commented on PR #963: URL: https://github.com/apache/parquet-mr/pull/963#issuecomment-1154013785 @guillaume-fetter I see what you mean, that makes sense. I think for my use case (reading protobuf data from kafka via the confluent schema registry and then writing to parquet) I won't get tripped up by the serializability issue. This will be a nice parquet enhancement!
[jira] [Commented] (PARQUET-1020) Add support for Dynamic Messages in parquet-protobuf
[ https://issues.apache.org/jira/browse/PARQUET-1020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17553618#comment-17553618 ] ASF GitHub Bot commented on PARQUET-1020: - guillaume-fetter commented on PR #963: URL: https://github.com/apache/parquet-mr/pull/963#issuecomment-1154004113 @dossett Depends on your use case. If you are running a simple program that does data processing on a single host, then you're good. If you are using a big data processing tool (like me here, Flink) you can't pass around a DM instance from one task to the other, or at least, I did not find a way to make it work... For unrelated reasons, we are using the SelfDescribingMessage design pattern (https://developers.google.com/protocol-buffers/docs/techniques#self-description), which is a specific message, therefore serializable. From there we wrote a parquet writer which basically converts the SelfDescribingMessage to a DynamicMessage and then writes it using this upgraded ProtoWriteSupport. It's clearly convoluted unless you are already using a SelfDescribingMessage or equivalent.
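The workaround described in this comment (ship a serializable carrier between tasks, rebuild the non-serializable message only inside the writer) can be sketched with the JDK alone. Everything below is a hypothetical stand-in: `LocalOnlyMessage` plays the role of `DynamicMessage`, `Envelope` plays the role of a SelfDescribingMessage-style carrier, and none of it is parquet-mr, protobuf, or Flink code. In a real job the rebuild step would presumably parse the payload bytes against a descriptor rather than decode strings.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for a type that is NOT Serializable (like DynamicMessage).
final class LocalOnlyMessage {
    final String schema;
    final String payload;

    LocalOnlyMessage(String schema, String payload) {
        this.schema = schema;
        this.payload = payload;
    }
}

// Hypothetical serializable carrier: it holds only bytes, so it can cross task boundaries.
final class Envelope implements Serializable {
    private static final long serialVersionUID = 1L;
    final byte[] schemaBytes;
    final byte[] payloadBytes;

    Envelope(byte[] schemaBytes, byte[] payloadBytes) {
        this.schemaBytes = schemaBytes;
        this.payloadBytes = payloadBytes;
    }

    // Rebuild the local-only object on the receiving side, e.g. inside the writer.
    LocalOnlyMessage toLocal() {
        return new LocalOnlyMessage(
            new String(schemaBytes, StandardCharsets.UTF_8),
            new String(payloadBytes, StandardCharsets.UTF_8));
    }
}

class EnvelopeDemo {
    // Simulates shipping the envelope to another task via Java serialization.
    static Envelope roundTrip(Envelope sent) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
                out.writeObject(sent);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(buffer.toByteArray()))) {
                return (Envelope) in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("envelope round trip failed", e);
        }
    }

    public static void main(String[] args) {
        Envelope sent = new Envelope(
            "example.Schema".getBytes(StandardCharsets.UTF_8),
            "example payload".getBytes(StandardCharsets.UTF_8));
        LocalOnlyMessage rebuilt = roundTrip(sent).toLocal();
        System.out.println(rebuilt.schema + " / " + rebuilt.payload);
    }
}
```

The design point is that only the carrier ever crosses a serialization boundary; the local-only object is created at the last moment and never leaves the task that writes it.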
[jira] [Commented] (PARQUET-1020) Add support for Dynamic Messages in parquet-protobuf
[ https://issues.apache.org/jira/browse/PARQUET-1020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17553557#comment-17553557 ] ASF GitHub Bot commented on PARQUET-1020: - dossett commented on PR #963: URL: https://github.com/apache/parquet-mr/pull/963#issuecomment-1153856375 Oh that's interesting @guillaume-fetter so you can't just write out a dynamic message into parquet without jumping through more hoops?
[jira] [Commented] (PARQUET-1020) Add support for Dynamic Messages in parquet-protobuf
[ https://issues.apache.org/jira/browse/PARQUET-1020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17553445#comment-17553445 ] ASF GitHub Bot commented on PARQUET-1020: - guillaume-fetter commented on PR #963: URL: https://github.com/apache/parquet-mr/pull/963#issuecomment-1153626738 Just a heads-up (because I have run into that issue), DynamicMessage is not serializable. So this means that this use-case is for local-only instances of a DynamicMessage. In my use case I need to build the DynamicMessage from another object which is serializable and do so directly in the writer, which is a bit convoluted.
[jira] [Commented] (PARQUET-1020) Add support for Dynamic Messages in parquet-protobuf
[ https://issues.apache.org/jira/browse/PARQUET-1020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17553434#comment-17553434 ] ASF GitHub Bot commented on PARQUET-1020: - guillaume-fetter commented on code in PR #963: URL: https://github.com/apache/parquet-mr/pull/963#discussion_r895442880

## parquet-protobuf/src/main/java/org/apache/parquet/proto/ProtoWriteSupport.java: ##
@@ -115,27 +120,32 @@ public void prepareForWrite(RecordConsumer recordConsumer) {
   public WriteContext init(Configuration configuration) {
 
     // if no protobuf descriptor was given in constructor, load descriptor from configuration (set with setProtobufClass)
-    if (protoMessage == null) {
-      Class pbClass = configuration.getClass(PB_CLASS_WRITE, null, Message.class);
-      if (pbClass != null) {
-        protoMessage = pbClass;
-      } else {
-        String msg = "Protocol buffer class not specified.";
-        String hint = " Please use method ProtoParquetOutputFormat.setProtobufClass(...) or other similar method.";
-        throw new BadConfigurationException(msg + hint);
+    if (descriptor == null) {
+      if (protoMessage == null) {
+        Class pbClass = configuration.getClass(PB_CLASS_WRITE, null, Message.class);
+        if (pbClass != null) {
+          protoMessage = pbClass;
+        } else {
+          String msg = "Protocol buffer class or descriptor not specified.";
+          String hint = " Please use method ProtoParquetOutputFormat.setProtobufClass(...) or other similar method.";
+          throw new BadConfigurationException(msg + hint);
+        }
       }
+      descriptor = Protobufs.getMessageDescriptor(protoMessage);
+    } else {
+      //Assume no specific Message extending class, so use DynamicMessage
+      protoMessage = DynamicMessage.class;

Review Comment:
Yes I agree. In the end I set it just for the sake of having it set, but you are right it will be more confusing than useful.
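The lookup order in the quoted hunk (a descriptor given up front wins, then a class given in the constructor, then a class named in the configuration, otherwise an error) may be easier to follow as a standalone sketch. This is not the parquet-mr class: `StubDescriptor`, the string-typed `protoMessage`, the `Map`-based configuration, the placeholder key value, and the use of `IllegalStateException` in place of `BadConfigurationException` are all assumptions made so the example runs on the JDK alone. Following the review comment above, the sketch leaves `protoMessage` unset when a descriptor is supplied; whether the merged code does the same is not shown in this thread.

```java
import java.util.Collections;
import java.util.Map;

// Hypothetical stand-in for a protobuf message descriptor.
final class StubDescriptor {
    final String fullName;

    StubDescriptor(String fullName) {
        this.fullName = fullName;
    }
}

// Sketch of the resolution order only; not the real ProtoWriteSupport.
class WriteSupportSketch {
    // Placeholder key: the real PB_CLASS_WRITE value does not appear in the thread.
    static final String PB_CLASS_WRITE = "example.proto.write.class";

    private StubDescriptor descriptor; // set when a descriptor is passed up front
    private String protoMessage;       // stands in for the generated Message class

    WriteSupportSketch(StubDescriptor descriptor, String protoMessage) {
        this.descriptor = descriptor;
        this.protoMessage = protoMessage;
    }

    StubDescriptor init(Map<String, String> configuration) {
        if (descriptor == null) {
            if (protoMessage == null) {
                // Last resort: look the class up in the job configuration.
                String pbClass = configuration.get(PB_CLASS_WRITE);
                if (pbClass != null) {
                    protoMessage = pbClass;
                } else {
                    throw new IllegalStateException(
                        "Protocol buffer class or descriptor not specified.");
                }
            }
            // Stand-in for deriving the descriptor from the generated class.
            descriptor = new StubDescriptor(protoMessage);
        }
        // A descriptor given up front is used as-is; protoMessage stays unset.
        return descriptor;
    }

    public static void main(String[] args) {
        Map<String, String> empty = Collections.emptyMap();
        System.out.println(
            new WriteSupportSketch(new StubDescriptor("dynamic.Msg"), null).init(empty).fullName);
        System.out.println(
            new WriteSupportSketch(null, "generated.Msg").init(empty).fullName);
    }
}
```

Keeping the descriptor as the single source of truth means the later schema conversion has one input regardless of how the caller configured the writer.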
[jira] [Commented] (PARQUET-1020) Add support for Dynamic Messages in parquet-protobuf
[ https://issues.apache.org/jira/browse/PARQUET-1020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17552383#comment-17552383 ] ASF GitHub Bot commented on PARQUET-1020: - dossett commented on PR #963: URL: https://github.com/apache/parquet-mr/pull/963#issuecomment-1151481731 +1 (non-binding) for this change. `DynamicMessage` is quite useful in protobuf and support here would be great, I ran into a need for it just today. cc @belugabehr in case they have thoughts. There aren't any active protobuf-parquet committers AFAICT.
[jira] [Commented] (PARQUET-1020) Add support for Dynamic Messages in parquet-protobuf
[ https://issues.apache.org/jira/browse/PARQUET-1020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17529469#comment-17529469 ] ASF GitHub Bot commented on PARQUET-1020: - guillaume-fetter opened a new pull request, #963: URL: https://github.com/apache/parquet-mr/pull/963

### Jira

- [X] My PR addresses the following [Parquet Jira](https://issues.apache.org/jira/browse/PARQUET/) issues and references them in the PR title:
  - https://issues.apache.org/jira/browse/PARQUET-1020

### Tests

- [X] My PR adds the following unit test:
  - testProto3SimplestDynamicMessage in parquet-protobuf/src/test/java/org/apache/parquet/proto/ProtoWriteSupportTest.java

### Commits

- [X] My commits all reference Jira issues in their subject lines. In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)":
  1. Subject is separated from body by a blank line
  1. Subject is limited to 50 characters (not including Jira issue reference)
  1. Subject does not end with a period
  1. Subject uses the imperative mood ("add", not "adding")
  1. Body wraps at 72 characters
  1. Body explains "what" and "why", not "how"

### Documentation

- [x] In case of new functionality, my PR adds documentation that describes how to use it.
  - All the public functions and the classes in the PR contain Javadoc that explain what it does

This is sort of a resubmission of https://github.com/apache/parquet-mr/pull/414 as the PR has been left open for quite some time, and the branch has diverged a bit. Please tell me if this is okay.
[jira] [Commented] (PARQUET-1020) Add support for Dynamic Messages in parquet-protobuf
[ https://issues.apache.org/jira/browse/PARQUET-1020?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17134247#comment-17134247 ] ASF GitHub Bot commented on PARQUET-1020: - alexcardell commented on pull request #414: URL: https://github.com/apache/parquet-mr/pull/414#issuecomment-643289745 This PR seems to have stagnated but this is exactly what we're looking for, if I fork and fix those conflicts can we reignite the discussion? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org