[GitHub] [spark] SandishKumarHN commented on a diff in pull request #38922: [SPARK-41396][SQL][PROTOBUF] OneOf field support and recursion checks
SandishKumarHN commented on code in PR #38922: URL: https://github.com/apache/spark/pull/38922#discussion_r1053865743

## connector/protobuf/src/main/scala/org/apache/spark/sql/protobuf/utils/SchemaConverters.scala:

```diff
@@ -40,19 +40,27 @@ object SchemaConverters {
    *
    * @since 3.4.0
    */
-  def toSqlType(descriptor: Descriptor): SchemaType = {
-    toSqlTypeHelper(descriptor)
+  def toSqlType(
+      descriptor: Descriptor,
+      protobufOptions: ProtobufOptions = ProtobufOptions(Map.empty)): SchemaType = {
+    toSqlTypeHelper(descriptor, protobufOptions)
   }

-  def toSqlTypeHelper(descriptor: Descriptor): SchemaType = ScalaReflectionLock.synchronized {
+  def toSqlTypeHelper(
+      descriptor: Descriptor,
+      protobufOptions: ProtobufOptions): SchemaType = ScalaReflectionLock.synchronized {
     SchemaType(
-      StructType(descriptor.getFields.asScala.flatMap(structFieldFor(_, Set.empty)).toArray),
+      StructType(descriptor.getFields.asScala.flatMap(
+        structFieldFor(_,
+          Map(descriptor.getFullName -> 1),
+          protobufOptions: ProtobufOptions)).toArray),
       nullable = true)
   }

   def structFieldFor(
       fd: FieldDescriptor,
-      existingRecordNames: Set[String]): Option[StructField] = {
+      existingRecordNames: Map[String, Int],
```

Review Comment: @cloud-fan added a comment.

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
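The switch from `Set[String]` to `Map[String, Int]` in the diff above is what makes bounded recursion possible: the map counts how many times each message type has been entered on the current path, rather than only recording that it was seen. A minimal standalone sketch of that bookkeeping (the `descend` helper is invented here for illustration; it is not Spark's actual implementation):

```scala
// Sketch: bounded-recursion bookkeeping with a Map[String, Int].
// `seen` counts how many times each record type is on the current path;
// returning None means the branch is truncated (the field is dropped).
def descend(
    recordName: String,
    seen: Map[String, Int],
    maxDepth: Int): Option[Map[String, Int]] = {
  val timesSeen = seen.getOrElse(recordName, 0)
  if (timesSeen > maxDepth) None // recursion budget exhausted: drop this field
  else Some(seen + (recordName -> (timesSeen + 1))) // record one more visit
}
```

With a seed like `Map(descriptor.getFullName -> 1)` from the diff above, the top-level message counts as the first visit, so a depth limit of 0 truncates on the first recursive revisit.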
[GitHub] [spark] SandishKumarHN commented on a diff in pull request #38922: [SPARK-41396][SQL][PROTOBUF] OneOf field support and recursion checks
SandishKumarHN commented on code in PR #38922: URL: https://github.com/apache/spark/pull/38922#discussion_r1053690877

## connector/protobuf/src/main/scala/org/apache/spark/sql/protobuf/utils/SchemaConverters.scala:

```diff
@@ -92,17 +108,35 @@ object SchemaConverters {
           MapType(keyType, valueType, valueContainsNull = false).defaultConcreteType,
           nullable = false))
       case MESSAGE =>
-        if (existingRecordNames.contains(fd.getFullName)) {
+        // If the `recursive.fields.max.depth` value is not specified, it will default to -1;
+        // recursive fields are not permitted. Setting it to 0 drops all recursive fields,
+        // 1 allows it to be recursed once, and 2 allows it to be recursed twice and so on.
+        // A value greater than 10 is not allowed, and if a protobuf record has more depth for
+        // recursive fields than the allowed value, it will be truncated and some fields may be
+        // discarded.
+        // SQL Schema for the protobuf message `message Person { string name = 1; Person bff = 2}`
+        // will vary based on the value of "recursive.fields.max.depth".
+        // 0: struct<name: string>
+        // 1: struct<name: string, bff: struct<name: string>>
+        // 2: struct<name: string, bff: struct<name: string, bff: struct<name: string>>> ...
+        val recordName = fd.getMessageType.getFullName
```

Review Comment: @cloud-fan `fd.getFullName` gives the fully qualified name including the field name; we needed the fully qualified type name. We made this decision above. Here is the difference:

```
println(s"${fd.getFullName} : ${fd.getMessageType.getFullName}")

org.apache.spark.sql.protobuf.protos.Employee.ic : org.apache.spark.sql.protobuf.protos.IC
org.apache.spark.sql.protobuf.protos.IC.icManager : org.apache.spark.sql.protobuf.protos.Employee
org.apache.spark.sql.protobuf.protos.Employee.ic : org.apache.spark.sql.protobuf.protos.IC
org.apache.spark.sql.protobuf.protos.IC.icManager : org.apache.spark.sql.protobuf.protos.Employee
org.apache.spark.sql.protobuf.protos.Employee.em : org.apache.spark.sql.protobuf.protos.EM
```

@rangadi the previous code used `fd.getFullName`, the fully qualified name including the field name, which worked for detecting recursion; so before this change we simply threw an error on any recursive field.
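The way the schema for a self-referencing message grows with the allowed recursion depth can be reproduced in a few lines. This is an illustrative sketch only (the `personSchema` helper is invented here, not part of the PR); it renders a Catalyst-style struct string for `message Person { string name = 1; Person bff = 2 }` at a given depth limit, with the recursive `bff` field dropped once the budget is spent:

```scala
// Sketch: the schema for a self-referencing Person message, truncated after
// `maxDepth` recursive revisits of the Person type (hypothetical helper).
def personSchema(maxDepth: Int): String = {
  def build(visits: Int): String =
    if (visits > maxDepth) "struct<name: string>" // recursion truncated: bff dropped
    else s"struct<name: string, bff: ${build(visits + 1)}>"
  build(1) // the top-level message counts as the first visit
}
```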
[GitHub] [spark] SandishKumarHN commented on a diff in pull request #38922: [SPARK-41396][SQL][PROTOBUF] OneOf field support and recursion checks
SandishKumarHN commented on code in PR #38922: URL: https://github.com/apache/spark/pull/38922#discussion_r1051328356

## connector/protobuf/src/main/scala/org/apache/spark/sql/protobuf/utils/ProtobufOptions.scala:

```diff
@@ -38,6 +38,14 @@ private[sql] class ProtobufOptions(
   val parseMode: ParseMode =
     parameters.get("mode").map(ParseMode.fromString).getOrElse(FailFastMode)

+  // Setting the `recursive.fields.max.depth` to 0 allows the field to be recursed once,
```

Review Comment: @rangadi Thank you for your suggestion. I have implemented it by adding a comment and a unit test to make the example clearer to users.
[GitHub] [spark] SandishKumarHN commented on a diff in pull request #38922: [SPARK-41396][SQL][PROTOBUF] OneOf field support and recursion checks
SandishKumarHN commented on code in PR #38922: URL: https://github.com/apache/spark/pull/38922#discussion_r1051328177

## core/src/main/resources/error/error-classes.json:

```diff
@@ -1016,7 +1016,7 @@
     },
     "RECURSIVE_PROTOBUF_SCHEMA" : {
       "message" : [
-        "Found recursive reference in Protobuf schema, which can not be processed by Spark: <fieldDescriptor>"
+        "Found recursive reference in Protobuf schema, which can not be processed by Spark by default: <fieldDescriptor>. try setting the option `recursive.fields.max.depth` as 0 or 1 or 2. Going beyond 3 levels of recursion is not allowed."
```

Review Comment: @rangadi agree.
[GitHub] [spark] SandishKumarHN commented on a diff in pull request #38922: [SPARK-41396][SQL][PROTOBUF] OneOf field support and recursion checks
SandishKumarHN commented on code in PR #38922: URL: https://github.com/apache/spark/pull/38922#discussion_r1049175487

## connector/protobuf/src/main/scala/org/apache/spark/sql/protobuf/utils/SchemaConverters.scala:

```diff
@@ -92,14 +106,26 @@ object SchemaConverters {
           MapType(keyType, valueType, valueContainsNull = false).defaultConcreteType,
           nullable = false))
       case MESSAGE =>
-        if (existingRecordNames.contains(fd.getFullName)) {
+        // Setting the circularReferenceDepth to 0 allows the field to be recursed once, setting
+        // it to 1 allows it to be recursed twice, and setting it to 2 allows it to be recursed
+        // thrice. circularReferenceDepth value greater than 2 is not allowed. If it is not
+        // specified, it will default to -1, which disables recursive fields.
+        val recordName = fd.getMessageType.getFullName
+        if (existingRecordNames.contains(recordName) &&
```

Review Comment: @rangadi thanks for the review, I have made all the changes you suggested.
[GitHub] [spark] SandishKumarHN commented on a diff in pull request #38922: [SPARK-41396][SQL][PROTOBUF] OneOf field support and recursion checks
SandishKumarHN commented on code in PR #38922: URL: https://github.com/apache/spark/pull/38922#discussion_r1048042504

## connector/protobuf/src/main/scala/org/apache/spark/sql/protobuf/ProtobufDeserializer.scala:

```diff
@@ -235,7 +237,7 @@ private[sql] class ProtobufDeserializer(
           writeRecord(new RowUpdater(row), value.asInstanceOf[DynamicMessage])
           updater.set(ordinal, row)

-      case (MESSAGE, ArrayType(st: StructType, containsNull)) =>
+      case (MESSAGE, ArrayType(st: DataType, containsNull)) =>
```

Review Comment: @mposdev21 Actually, we don't need this here at all; we can add it at the top along with the other ArrayType cases.
[GitHub] [spark] SandishKumarHN commented on a diff in pull request #38922: [SPARK-41396][SQL][PROTOBUF] OneOf field support and recursion checks
SandishKumarHN commented on code in PR #38922: URL: https://github.com/apache/spark/pull/38922#discussion_r1044857157

## connector/protobuf/src/main/scala/org/apache/spark/sql/protobuf/utils/ProtobufOptions.scala:

```diff
@@ -38,6 +38,12 @@ private[sql] class ProtobufOptions(
   val parseMode: ParseMode =
     parameters.get("mode").map(ParseMode.fromString).getOrElse(FailFastMode)

+  val circularReferenceType: String = parameters.getOrElse("circularReferenceType", "FIELD_NAME")
```

Review Comment: @rangadi @baganokodo2022 thanks for the quick meeting. The conclusion was to use the descriptor type's full name, and to add unit tests with some complex schemas:

```
val recordName = fd.getMessageType.getFullName
```
[GitHub] [spark] SandishKumarHN commented on a diff in pull request #38922: [SPARK-41396][SQL][PROTOBUF] OneOf field support and recursion checks
SandishKumarHN commented on code in PR #38922: URL: https://github.com/apache/spark/pull/38922#discussion_r1044079922

## connector/protobuf/src/main/scala/org/apache/spark/sql/protobuf/utils/ProtobufOptions.scala:

```diff
@@ -38,6 +38,12 @@ private[sql] class ProtobufOptions(
   val parseMode: ParseMode =
     parameters.get("mode").map(ParseMode.fromString).getOrElse(FailFastMode)

+  val circularReferenceType: String = parameters.getOrElse("circularReferenceType", "FIELD_NAME")
```

Review Comment: @rangadi `fd.getFullName` is able to detect recursive fields with different field names; I added a unit test. Now I'm confused by ```Fail for recursion field with different field names```.
[GitHub] [spark] SandishKumarHN commented on a diff in pull request #38922: [SPARK-41396][SQL][PROTOBUF] OneOf field support and recursion checks
SandishKumarHN commented on code in PR #38922: URL: https://github.com/apache/spark/pull/38922#discussion_r1044056150

## connector/protobuf/src/main/scala/org/apache/spark/sql/protobuf/utils/ProtobufOptions.scala:

```diff
@@ -38,6 +38,12 @@ private[sql] class ProtobufOptions(
   val parseMode: ParseMode =
     parameters.get("mode").map(ParseMode.fromString).getOrElse(FailFastMode)

+  val circularReferenceType: String = parameters.getOrElse("circularReferenceType", "FIELD_NAME")
```

Review Comment: @rangadi just replacing `fd.getFullName` with the below would make sense; we don't need two different kinds of checks.

```
val recordName = fd.getFullName.substring(0, fd.getFullName().lastIndexOf("."))
```
[GitHub] [spark] SandishKumarHN commented on a diff in pull request #38922: [SPARK-41396][SQL][PROTOBUF] OneOf field support and recursion checks
SandishKumarHN commented on code in PR #38922: URL: https://github.com/apache/spark/pull/38922#discussion_r1044001778

## connector/protobuf/src/main/scala/org/apache/spark/sql/protobuf/utils/ProtobufOptions.scala:

```diff
@@ -38,6 +38,12 @@ private[sql] class ProtobufOptions(
   val parseMode: ParseMode =
     parameters.get("mode").map(ParseMode.fromString).getOrElse(FailFastMode)

+  val circularReferenceType: String = parameters.getOrElse("circularReferenceType", "FIELD_NAME")
```

Review Comment:

```
message A {
  B b = 1;
}
message B {
  A aa = 1;
  D d = 2;
}
message D {
  A aaa = 1;
  E a = 2;
}
message E {
  int32 key = 1;
}
```

@rangadi just correcting the second example. In the case of the field_name-based recursive check, we would fail to detect the recursion for the above, because the traversal path would be ```A.B.A.aa.D.d.A.aaa.E``` and the set would only contain unique values [A, B.b, A.aa, A.aaa, E.a]. But in the case of @baganokodo2022's proposal it would be ```A.B.A.D.A.E```: taking the field name out of the key gives early detection of the recursion.
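The difference between the two key choices can be checked mechanically: for the schema above, a path keyed on field names never repeats, while a path keyed on message type names repeats as soon as `A` is re-entered. A small sketch (the `firstRepeat` helper and the sample paths are illustrative, not PR code):

```scala
// Sketch: find the first element of a traversal path that was already seen,
// i.e. the point where a recursion check keyed on these strings would fire.
def firstRepeat(path: Seq[String]): Option[String] =
  path.zipWithIndex.collectFirst {
    case (name, i) if path.take(i).contains(name) => name
  }
```

Keyed on field names, `A, B.b, A.aa, D.d, A.aaa` has no repeat and the cycle goes undetected; keyed on message type names, `A, B, A, D, A, E` fires on the second `A`.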
[GitHub] [spark] SandishKumarHN commented on a diff in pull request #38922: [SPARK-41396][SQL][PROTOBUF] OneOf field support and recursion checks
SandishKumarHN commented on code in PR #38922: URL: https://github.com/apache/spark/pull/38922#discussion_r1043856365

## connector/protobuf/src/main/scala/org/apache/spark/sql/protobuf/utils/ProtobufOptions.scala:

```diff
@@ -38,6 +38,12 @@ private[sql] class ProtobufOptions(
   val parseMode: ParseMode =
     parameters.get("mode").map(ParseMode.fromString).getOrElse(FailFastMode)

+  val circularReferenceType: String = parameters.getOrElse("circularReferenceType", "FIELD_NAME")
```

Review Comment: I would have @baganokodo2022 give more details on the field-type case. We have not yet added unit tests for the field-type case and would like to discuss it before adding them.

## connector/protobuf/src/test/scala/org/apache/spark/sql/protobuf/ProtobufFunctionsSuite.scala:

```diff
@@ -26,11 +26,11 @@ import com.google.protobuf.{ByteString, DynamicMessage}
 import org.apache.spark.sql.{Column, QueryTest, Row}
 import org.apache.spark.sql.AnalysisException
 import org.apache.spark.sql.functions.{lit, struct}
-import org.apache.spark.sql.protobuf.protos.SimpleMessageProtos.SimpleMessageRepeated
+import org.apache.spark.sql.protobuf.protos.SimpleMessageProtos.{EventRecursiveA, EventRecursiveB, OneOfEvent, OneOfEventWithRecursion, SimpleMessageRepeated}
```

Review Comment: @rangadi I didn't understand; these are already two different tests.
[GitHub] [spark] SandishKumarHN commented on a diff in pull request #38922: [SPARK-41396][SQL][PROTOBUF] OneOf field support and recursion checks
SandishKumarHN commented on code in PR #38922: URL: https://github.com/apache/spark/pull/38922#discussion_r1043814379

## connector/protobuf/src/main/scala/org/apache/spark/sql/protobuf/utils/SchemaConverters.scala:

```diff
@@ -92,14 +109,38 @@ object SchemaConverters {
           MapType(keyType, valueType, valueContainsNull = false).defaultConcreteType,
           nullable = false))
       case MESSAGE =>
-        if (existingRecordNames.contains(fd.getFullName)) {
-          throw QueryCompilationErrors.foundRecursionInProtobufSchema(fd.toString())
+        // User can set circularReferenceDepth of 0 or 1 or 2.
+        // Going beyond 3 levels of recursion is not allowed.
```

Review Comment: @rangadi The user can specify the maximum allowed recursion depth for a field by setting the circularReferenceDepth property to 0, 1, or 2. Setting the circularReferenceDepth to 0 allows the field to be recursed once, setting it to 1 allows it to be recursed twice, and setting it to 2 allows it to be recursed thrice. Attempting to set the circularReferenceDepth to a value greater than 2 is not allowed. If the circularReferenceDepth is not specified, it will default to -1, which disables recursive fields.

## connector/protobuf/src/main/scala/org/apache/spark/sql/protobuf/utils/ProtobufOptions.scala:

```diff
@@ -38,6 +38,12 @@ private[sql] class ProtobufOptions(
   val parseMode: ParseMode =
     parameters.get("mode").map(ParseMode.fromString).getOrElse(FailFastMode)

+  val circularReferenceType: String = parameters.getOrElse("circularReferenceType", "FIELD_NAME")
```

Review Comment: @rangadi we already know about the field_name recursive check: using `fd.getFullName` we detect the recursion and throw an error. Another option is to detect recursion through the field type. Example below:

```
message A {
  B b;
}
message B {
  A c;
}
```

In the case of the field_name recursive check the path is ```A.B.C```, so no recursion is detected. In the case of the field_type recursive check it is ```MESSAGE.MESSAGE.MESSAGE```, so recursion will be found and we either throw an error or drop everything past a certain recursive depth.

But the field_type check will also throw an error for the case below, since the path will be ```MESSAGE.MESSAGE.MESSAGE.MESSAGE```:

```
message A {
  B b = 1;
}
message B {
  D d = 1;
}
message D {
  E e = 1;
}
message E {
  int32 key = 1;
}
```

@baganokodo2022's argument is that a field_type-based check gives users an option to drop recursion more quickly, because with a complex nested schema a recursive field_name may only be found very deep; before hitting it we might see an OOM. The field_type-based check finds the circular reference more quickly. @baganokodo2022 please correct me if I'm wrong.

## connector/protobuf/src/main/scala/org/apache/spark/sql/protobuf/utils/SchemaConverters.scala:

```diff
@@ -92,14 +109,38 @@ object SchemaConverters {
           MapType(keyType, valueType, valueContainsNull = false).defaultConcreteType,
           nullable = false))
       case MESSAGE =>
-        if (existingRecordNames.contains(fd.getFullName)) {
-          throw QueryCompilationErrors.foundRecursionInProtobufSchema(fd.toString())
+        // User can set circularReferenceDepth of 0 or 1 or 2.
+        // Going beyond 3 levels of recursion is not allowed.
+        if (protobufOptions.circularReferenceType.equals("FIELD_TYPE")) {
+          if (existingRecordTypes.contains(fd.getType.name()) &&
+              (protobufOptions.circularReferenceDepth < 0 ||
+               protobufOptions.circularReferenceDepth >= 3)) {
+            throw QueryCompilationErrors.foundRecursionInProtobufSchema(fd.toString())
+          } else if (existingRecordTypes.contains(fd.getType.name()) &&
```

Review Comment: @rangadi we have two maps with incremental counters, one for the field_name-based check and one for the field_type-based check.

## connector/protobuf/src/test/scala/org/apache/spark/sql/protobuf/ProtobufFunctionsSuite.scala:

```diff
@@ -26,11 +26,11 @@ import com.google.protobuf.{ByteString, DynamicMessage}
 import org.apache.spark.sql.{Column, QueryTest, Row}
 import org.apache.spark.sql.AnalysisException
 import org.apache.spark.sql.functions.{lit, struct}
-import org.apache.spark.sql.protobuf.protos.SimpleMessageProtos.SimpleMessageRepeated
+import org.apache.spark.sql.protobuf.protos.SimpleMessageProtos.{EventRecursiveA, EventRecursiveB, OneOfEvent, OneOfEventWithRecursion, SimpleMessageRepeated}
```

Review Comment: @rangadi yes, `Handle recursive fields in Protobuf schema, C->D->Array(C)` and `Handle recursive fields in Protobuf schema, A->B->A`
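One way to see the false positive discussed in this thread: keying the check on `fd.getType.name()` collapses every message field to the literal string `MESSAGE`, so the non-recursive chain A -> B -> D -> E looks identical to a true cycle, whereas keying on the message type's full name keeps them apart. A tiny sketch (the sample key sequences are assumed values for illustration, not PR code):

```scala
// Sketch: a chain of four nested, non-recursive message fields, keyed two ways.
val byKind = Seq("MESSAGE", "MESSAGE", "MESSAGE", "MESSAGE") // fd.getType.name()
val byTypeName = Seq("A", "B", "D", "E") // fd.getMessageType.getFullName

// A check keyed on these strings "detects recursion" iff any key repeats.
def looksRecursive(keys: Seq[String]): Boolean = keys.distinct.size < keys.size
```

`byKind` repeats immediately (a false positive for this schema), while `byTypeName` never repeats, which is why the descriptor type's full name was chosen.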
[GitHub] [spark] SandishKumarHN commented on a diff in pull request #38922: [SPARK-41396][SQL][PROTOBUF] OneOf field support and recursion checks
SandishKumarHN commented on code in PR #38922: URL: https://github.com/apache/spark/pull/38922#discussion_r1041487582

## connector/protobuf/src/main/scala/org/apache/spark/sql/protobuf/utils/SchemaConverters.scala:

```diff
@@ -92,9 +92,13 @@ object SchemaConverters {
           MapType(keyType, valueType, valueContainsNull = false).defaultConcreteType,
           nullable = false))
       case MESSAGE =>
+        // Stop recursion at the first level when a recursive field is encountered.
+        // TODO: The user should be given the option to set the recursion level to 1, 2, or 3
```

Review Comment: @rangadi planning to add the selectable recursion depth in this PR.
[GitHub] [spark] SandishKumarHN commented on a diff in pull request #38922: [SPARK-41396][SQL][PROTOBUF] OneOf field support and recursion checks
SandishKumarHN commented on code in PR #38922: URL: https://github.com/apache/spark/pull/38922#discussion_r1041487447

## connector/protobuf/src/test/resources/protobuf/functions_suite.proto:

```diff
@@ -170,4 +170,41 @@ message timeStampMsg {
 message durationMsg {
   string key = 1;
   Duration duration = 2;
-}
\ No newline at end of file
+}
+
+message OneOfEvent {
```

Review Comment: @rangadi I see a lot of use cases for a "payload" oneof field with recursive fields in it, so I thought combining oneof with recursion would be a good test. Will separate them.

## connector/protobuf/src/main/scala/org/apache/spark/sql/protobuf/ProtobufDeserializer.scala:

```diff
@@ -157,6 +157,8 @@ private[sql] class ProtobufDeserializer(
     case (null, NullType) => (updater, ordinal, _) => updater.setNullAt(ordinal)

+    case (MESSAGE, NullType) => (updater, ordinal, _) => updater.setNullAt(ordinal)
```

Review Comment: yes, correct.
[GitHub] [spark] SandishKumarHN commented on a diff in pull request #38922: [SPARK-41396][SQL][PROTOBUF] OneOf field support and recursion checks
SandishKumarHN commented on code in PR #38922: URL: https://github.com/apache/spark/pull/38922#discussion_r1041487180

## connector/protobuf/src/test/resources/protobuf/functions_suite.proto:

```diff
@@ -170,4 +170,41 @@ message timeStampMsg {
 message durationMsg {
   string key = 1;
   Duration duration = 2;
-}
\ No newline at end of file
+}
+
+message OneOfEvent {
+  string key = 1;
+  oneof payload {
```

Review Comment: @rangadi the oneof field is of message type; a oneof will be converted to a struct type.

## connector/protobuf/src/test/scala/org/apache/spark/sql/protobuf/ProtobufFunctionsSuite.scala:

```diff
@@ -693,4 +721,45 @@ class ProtobufFunctionsSuite extends QueryTest with SharedSparkSession with Prot
       errorClass = "CANNOT_CONSTRUCT_PROTOBUF_DESCRIPTOR",
       parameters = Map("descFilePath" -> testFileDescriptor))
   }
+
+  test("Unit tests for OneOf field support and recursion checks") {
```

Review Comment: will do that.
[GitHub] [spark] SandishKumarHN commented on a diff in pull request #38922: [SPARK-41396][SQL][PROTOBUF] OneOf field support and recursion checks
SandishKumarHN commented on code in PR #38922: URL: https://github.com/apache/spark/pull/38922#discussion_r1040138657

## connector/protobuf/src/main/scala/org/apache/spark/sql/protobuf/utils/SchemaConverters.scala:

```diff
@@ -92,9 +92,13 @@ object SchemaConverters {
           MapType(keyType, valueType, valueContainsNull = false).defaultConcreteType,
           nullable = false))
       case MESSAGE =>
+        // Stop recursion at the first level when a recursive field is encountered.
+        // TODO: The user should be given the option to set the recursion level to 1, 2, or 3
```

Review Comment: @rangadi @mposdev21 Instead of limiting the recursion to only one level, the user should be able to choose a recursion level of 1, 2, or 3. Going beyond 3 levels of recursion should not be allowed. Any thoughts? `spark.protobuf.recursion.level`