[GitHub] [parquet-mr] matthieun commented on a diff in pull request #988: PARQUET-1711: Break circular dependencies in proto definitions

2022-08-23 Thread GitBox


matthieun commented on code in PR #988:
URL: https://github.com/apache/parquet-mr/pull/988#discussion_r952835674


##
parquet-protobuf/src/main/java/org/apache/parquet/proto/ProtoSchemaConverter.java:
##
@@ -79,12 +80,20 @@ public MessageType convert(Class 
protobufClass) {
   }
 
   /* Iterates over list of fields. **/
-  private  GroupBuilder convertFields(GroupBuilder groupBuilder, 
List fieldDescriptors) {
+  private  GroupBuilder convertFields(GroupBuilder groupBuilder, 
List fieldDescriptors, List parentNames) {
 for (FieldDescriptor fieldDescriptor : fieldDescriptors) {
-  groupBuilder =
-  addField(fieldDescriptor, groupBuilder)
+  final String name = fieldDescriptor.getFullName();
+  final List newParentNames = new ArrayList<>(parentNames);
+  newParentNames.add(name);
+  if (parentNames.contains(name)) {
+// Circular dependency, skip
+LOG.warn("Breaking circular dependency:{}{}", System.lineSeparator(),

Review Comment:
   Well, another option would be to add a new configuration setting to allow 
the user to either have it fail with a good error message, or just silently 
break the circle like this. However I am not familiar with how 
`parquet-protobuf` is configured. If I should go that route, I'd appreciate 
some examples!



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@parquet.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [parquet-mr] matthieun commented on a diff in pull request #988: PARQUET-1711: Break circular dependencies in proto definitions

2022-08-22 Thread GitBox


matthieun commented on code in PR #988:
URL: https://github.com/apache/parquet-mr/pull/988#discussion_r951648555


##
parquet-protobuf/src/main/java/org/apache/parquet/proto/ProtoSchemaConverter.java:
##
@@ -79,12 +80,20 @@ public MessageType convert(Class 
protobufClass) {
   }
 
   /* Iterates over list of fields. **/
-  private  GroupBuilder convertFields(GroupBuilder groupBuilder, 
List fieldDescriptors) {
+  private  GroupBuilder convertFields(GroupBuilder groupBuilder, 
List fieldDescriptors, List parentNames) {
 for (FieldDescriptor fieldDescriptor : fieldDescriptors) {
-  groupBuilder =
-  addField(fieldDescriptor, groupBuilder)
+  final String name = fieldDescriptor.getFullName();
+  final List newParentNames = new ArrayList<>(parentNames);
+  newParentNames.add(name);
+  if (parentNames.contains(name)) {
+// Circular dependency, skip
+LOG.warn("Breaking circular dependency:{}{}", System.lineSeparator(),

Review Comment:
   It is possible to create circular dependencies, that is the problem. I am 
not sure in what case they would be useful, however since they can exist, 
parquet should not fail with `StackOverflowError` when it encounters them  



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@parquet.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [parquet-mr] matthieun commented on a diff in pull request #988: PARQUET-1711: Break circular dependencies in proto definitions

2022-08-22 Thread GitBox


matthieun commented on code in PR #988:
URL: https://github.com/apache/parquet-mr/pull/988#discussion_r951644453


##
parquet-protobuf/src/main/java/org/apache/parquet/proto/ProtoSchemaConverter.java:
##
@@ -79,12 +80,20 @@ public MessageType convert(Class 
protobufClass) {
   }
 
   /* Iterates over list of fields. **/
-  private  GroupBuilder convertFields(GroupBuilder groupBuilder, 
List fieldDescriptors) {
+  private  GroupBuilder convertFields(GroupBuilder groupBuilder, 
List fieldDescriptors, List parentNames) {
 for (FieldDescriptor fieldDescriptor : fieldDescriptors) {
-  groupBuilder =
-  addField(fieldDescriptor, groupBuilder)
+  final String name = fieldDescriptor.getFullName();
+  final List newParentNames = new ArrayList<>(parentNames);
+  newParentNames.add(name);
+  if (parentNames.contains(name)) {

Review Comment:
   The list is mostly used to keep the ordering, so that the dependency chain 
is printed in order in the warning message. I understand that in case the 
schema definition is really deep with nested types it might be slower, but 
overall that list is not growing any bigger than the deepest nesting in the 
schema.
   If this is still a concern, I am happy to switch to HashSet at the expense 
of maybe dumming down the log message (printing the nesting chain out of order 
would not be valuable anyway I think).



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: dev-unsubscr...@parquet.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org