[GitHub] [spark] SandishKumarHN commented on a diff in pull request #38922: [SPARK-41396][SQL][PROTOBUF] OneOf field support and recursion checks

2022-12-20 Thread GitBox


SandishKumarHN commented on code in PR #38922:
URL: https://github.com/apache/spark/pull/38922#discussion_r1053865743


##
connector/protobuf/src/main/scala/org/apache/spark/sql/protobuf/utils/SchemaConverters.scala:
##
@@ -40,19 +40,27 @@ object SchemaConverters {
    *
    * @since 3.4.0
    */
-  def toSqlType(descriptor: Descriptor): SchemaType = {
-    toSqlTypeHelper(descriptor)
+  def toSqlType(
+      descriptor: Descriptor,
+      protobufOptions: ProtobufOptions = ProtobufOptions(Map.empty)): SchemaType = {
+    toSqlTypeHelper(descriptor, protobufOptions)
   }
 
-  def toSqlTypeHelper(descriptor: Descriptor): SchemaType = ScalaReflectionLock.synchronized {
+  def toSqlTypeHelper(
+      descriptor: Descriptor,
+      protobufOptions: ProtobufOptions): SchemaType = ScalaReflectionLock.synchronized {
     SchemaType(
-      StructType(descriptor.getFields.asScala.flatMap(structFieldFor(_, Set.empty)).toArray),
+      StructType(descriptor.getFields.asScala.flatMap(
+        structFieldFor(_,
+          Map(descriptor.getFullName -> 1),
+          protobufOptions: ProtobufOptions)).toArray),
       nullable = true)
   }
 
   def structFieldFor(
       fd: FieldDescriptor,
-      existingRecordNames: Set[String]): Option[StructField] = {
+      existingRecordNames: Map[String, Int],

Review Comment:
   @cloud-fan added a comment. 






[GitHub] [spark] SandishKumarHN commented on a diff in pull request #38922: [SPARK-41396][SQL][PROTOBUF] OneOf field support and recursion checks

2022-12-20 Thread GitBox


SandishKumarHN commented on code in PR #38922:
URL: https://github.com/apache/spark/pull/38922#discussion_r1053690877


##
connector/protobuf/src/main/scala/org/apache/spark/sql/protobuf/utils/SchemaConverters.scala:
##
@@ -92,17 +108,35 @@ object SchemaConverters {
         MapType(keyType, valueType, valueContainsNull = false).defaultConcreteType,
         nullable = false))
       case MESSAGE =>
-        if (existingRecordNames.contains(fd.getFullName)) {
+        // If the `recursive.fields.max.depth` value is not specified, it will default to -1;
+        // recursive fields are not permitted. Setting it to 0 drops all recursive fields,
+        // 1 allows it to be recursed once, and 2 allows it to be recursed twice and so on.
+        // A value greater than 10 is not allowed, and if a protobuf record has more depth for
+        // recursive fields than the allowed value, it will be truncated and some fields may be
+        // discarded.
+        // SQL Schema for the protobuf message `message Person { string name = 1; Person bff = 2}`
+        // will vary based on the value of "recursive.fields.max.depth".
+        // 0: struct<name: string>
+        // 1: struct<name: string, bff: struct<name: string>>
+        // 2: struct<name: string, bff: struct<name: string, bff: struct<name: string>>> ...
+        val recordName = fd.getMessageType.getFullName

Review Comment:
   @cloud-fan `fd.getFullName` gives the fully qualified name including the field name; we need the fully qualified type name here. We made this decision above.
   
   Here is the difference:
   ```
   println(s"${fd.getFullName} : ${fd.getMessageType.getFullName}")
   
   org.apache.spark.sql.protobuf.protos.Employee.ic : org.apache.spark.sql.protobuf.protos.IC
   org.apache.spark.sql.protobuf.protos.IC.icManager : org.apache.spark.sql.protobuf.protos.Employee
   org.apache.spark.sql.protobuf.protos.Employee.ic : org.apache.spark.sql.protobuf.protos.IC
   org.apache.spark.sql.protobuf.protos.IC.icManager : org.apache.spark.sql.protobuf.protos.Employee
   org.apache.spark.sql.protobuf.protos.Employee.em : org.apache.spark.sql.protobuf.protos.EM
   ```
   @rangadi The previous code used `fd.getFullName` (the fully qualified name including the field name) to detect recursion; back then we simply threw an error on any recursive field.
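
For illustration, a minimal sketch (not the PR's code) of the check this discussion converges on: key the visited map on the message *type's* full name and count how many times that type appears on the current path. The helper name and `maxDepth` parameter are made up here:

```scala
import com.google.protobuf.Descriptors.FieldDescriptor

// Returns the updated visited map, or None once the type has been entered
// more times than the configured limit (so the caller can drop that branch).
def enterMessageField(
    fd: FieldDescriptor,
    existingRecordNames: Map[String, Int],
    maxDepth: Int): Option[Map[String, Int]] = {
  val recordName = fd.getMessageType.getFullName // e.g. "...protos.IC", not "...Employee.ic"
  val depth = existingRecordNames.getOrElse(recordName, 0)
  if (depth > maxDepth) None
  else Some(existingRecordNames + (recordName -> (depth + 1)))
}
```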






[GitHub] [spark] SandishKumarHN commented on a diff in pull request #38922: [SPARK-41396][SQL][PROTOBUF] OneOf field support and recursion checks

2022-12-16 Thread GitBox


SandishKumarHN commented on code in PR #38922:
URL: https://github.com/apache/spark/pull/38922#discussion_r1051328356


##
connector/protobuf/src/main/scala/org/apache/spark/sql/protobuf/utils/ProtobufOptions.scala:
##
@@ -38,6 +38,14 @@ private[sql] class ProtobufOptions(
 
   val parseMode: ParseMode =
     parameters.get("mode").map(ParseMode.fromString).getOrElse(FailFastMode)
+
+  // Setting the `recursive.fields.max.depth` to 0 allows the field to be recursed once,

Review Comment:
   @rangadi Thank you for your suggestion. I have implemented it by adding a comment and a unit test to make the example clearer to users.
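
For context, a minimal usage sketch of how the option might be passed from user code, assuming the Spark 3.4 `from_protobuf` overload that accepts an options map; `df`, the message name `Person`, and `descFilePath` are placeholders, not from the PR:

```scala
import scala.collection.JavaConverters._
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.protobuf.functions.from_protobuf

// Allow one level of recursion instead of failing on recursive schemas.
val options = Map("recursive.fields.max.depth" -> "1").asJava
// `df` and `descFilePath` are assumed to be defined by the caller.
val parsed = df.select(
  from_protobuf(col("value"), "Person", descFilePath, options).as("person"))
```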






[GitHub] [spark] SandishKumarHN commented on a diff in pull request #38922: [SPARK-41396][SQL][PROTOBUF] OneOf field support and recursion checks

2022-12-16 Thread GitBox


SandishKumarHN commented on code in PR #38922:
URL: https://github.com/apache/spark/pull/38922#discussion_r1051328177


##
core/src/main/resources/error/error-classes.json:
##
@@ -1016,7 +1016,7 @@
   },
   "RECURSIVE_PROTOBUF_SCHEMA" : {
     "message" : [
-      "Found recursive reference in Protobuf schema, which can not be processed by Spark: <fieldDescriptor>"
+      "Found recursive reference in Protobuf schema, which can not be processed by Spark by default: <fieldDescriptor>. Try setting the option `recursive.fields.max.depth` as 0 or 1 or 2. Going beyond 3 levels of recursion is not allowed."

Review Comment:
   @rangadi Agreed.






[GitHub] [spark] SandishKumarHN commented on a diff in pull request #38922: [SPARK-41396][SQL][PROTOBUF] OneOf field support and recursion checks

2022-12-14 Thread GitBox


SandishKumarHN commented on code in PR #38922:
URL: https://github.com/apache/spark/pull/38922#discussion_r1049175487


##
connector/protobuf/src/main/scala/org/apache/spark/sql/protobuf/utils/SchemaConverters.scala:
##
@@ -92,14 +106,26 @@ object SchemaConverters {
         MapType(keyType, valueType, valueContainsNull = false).defaultConcreteType,
         nullable = false))
       case MESSAGE =>
-        if (existingRecordNames.contains(fd.getFullName)) {
+        // Setting the circularReferenceDepth to 0 allows the field to be recursed once, setting
+        // it to 1 allows it to be recursed twice, and setting it to 2 allows it to be recursed
+        // thrice. A circularReferenceDepth value greater than 2 is not allowed. If not
+        // specified, it will default to -1, which disables recursive fields.
+        val recordName = fd.getMessageType.getFullName
+        if (existingRecordNames.contains(recordName) &&

Review Comment:
   @rangadi Thanks for the review; I have made all the changes you suggested.






[GitHub] [spark] SandishKumarHN commented on a diff in pull request #38922: [SPARK-41396][SQL][PROTOBUF] OneOf field support and recursion checks

2022-12-13 Thread GitBox


SandishKumarHN commented on code in PR #38922:
URL: https://github.com/apache/spark/pull/38922#discussion_r1048042504


##
connector/protobuf/src/main/scala/org/apache/spark/sql/protobuf/ProtobufDeserializer.scala:
##
@@ -235,7 +237,7 @@ private[sql] class ProtobufDeserializer(
           writeRecord(new RowUpdater(row), value.asInstanceOf[DynamicMessage])
           updater.set(ordinal, row)
 
-      case (MESSAGE, ArrayType(st: StructType, containsNull)) =>
+      case (MESSAGE, ArrayType(st: DataType, containsNull)) =>

Review Comment:
   @mposdev21 Actually, we don't need this here at all; we can add it at the top along with the other ArrayType cases.






[GitHub] [spark] SandishKumarHN commented on a diff in pull request #38922: [SPARK-41396][SQL][PROTOBUF] OneOf field support and recursion checks

2022-12-09 Thread GitBox


SandishKumarHN commented on code in PR #38922:
URL: https://github.com/apache/spark/pull/38922#discussion_r1044857157


##
connector/protobuf/src/main/scala/org/apache/spark/sql/protobuf/utils/ProtobufOptions.scala:
##
@@ -38,6 +38,12 @@ private[sql] class ProtobufOptions(
 
   val parseMode: ParseMode =
     parameters.get("mode").map(ParseMode.fromString).getOrElse(FailFastMode)
+
+  val circularReferenceType: String = parameters.getOrElse("circularReferenceType", "FIELD_NAME")

Review Comment:
   @rangadi @baganokodo2022 Thanks for the quick meeting. The conclusion was to use the descriptor type's full name; I added unit tests with some complex schemas.
   ```
   val recordName = fd.getMessageType.getFullName
   ```









[GitHub] [spark] SandishKumarHN commented on a diff in pull request #38922: [SPARK-41396][SQL][PROTOBUF] OneOf field support and recursion checks

2022-12-08 Thread GitBox


SandishKumarHN commented on code in PR #38922:
URL: https://github.com/apache/spark/pull/38922#discussion_r1044079922


##
connector/protobuf/src/main/scala/org/apache/spark/sql/protobuf/utils/ProtobufOptions.scala:
##
@@ -38,6 +38,12 @@ private[sql] class ProtobufOptions(
 
   val parseMode: ParseMode =
     parameters.get("mode").map(ParseMode.fromString).getOrElse(FailFastMode)
+
+  val circularReferenceType: String = parameters.getOrElse("circularReferenceType", "FIELD_NAME")

Review Comment:
   @rangadi `fd.getFullName` is able to detect a recursive field even with different field names; added a unit test for it. Now I'm confused.
   ```Fail for recursion field with different field names```






[GitHub] [spark] SandishKumarHN commented on a diff in pull request #38922: [SPARK-41396][SQL][PROTOBUF] OneOf field support and recursion checks

2022-12-08 Thread GitBox


SandishKumarHN commented on code in PR #38922:
URL: https://github.com/apache/spark/pull/38922#discussion_r1044056150


##
connector/protobuf/src/main/scala/org/apache/spark/sql/protobuf/utils/ProtobufOptions.scala:
##
@@ -38,6 +38,12 @@ private[sql] class ProtobufOptions(
 
   val parseMode: ParseMode =
     parameters.get("mode").map(ParseMode.fromString).getOrElse(FailFastMode)
+
+  val circularReferenceType: String = parameters.getOrElse("circularReferenceType", "FIELD_NAME")

Review Comment:
   @rangadi Just replacing `fd.getFullName` with the below would make sense; we don't need two different kinds of checks.
   ```
   val recordName = fd.getFullName.substring(0, fd.getFullName().lastIndexOf("."))
   ```






[GitHub] [spark] SandishKumarHN commented on a diff in pull request #38922: [SPARK-41396][SQL][PROTOBUF] OneOf field support and recursion checks

2022-12-08 Thread GitBox


SandishKumarHN commented on code in PR #38922:
URL: https://github.com/apache/spark/pull/38922#discussion_r1044001778


##
connector/protobuf/src/main/scala/org/apache/spark/sql/protobuf/utils/ProtobufOptions.scala:
##
@@ -38,6 +38,12 @@ private[sql] class ProtobufOptions(
 
   val parseMode: ParseMode =
     parameters.get("mode").map(ParseMode.fromString).getOrElse(FailFastMode)
+
+  val circularReferenceType: String = parameters.getOrElse("circularReferenceType", "FIELD_NAME")

Review Comment:
   ```
   message A {
   B b = 1;
   }
   message B {
   A aa = 1;
   D d = 2;
   }
   
   message D {
   A aaa = 1;
   E a =2;
   }
   
   message E {
   int32 key = 1;
   }
   ```
   @rangadi Just correcting the second example. With a field_name-based recursive check we would fail to detect the recursion above, because the traversal path would be ```A.B.A.aa.D.d.A.aaa.E``` and the set would only contain the unique values [A, B.b, A.aa, A.aaa, E.a]. With @baganokodo2022's proposal the path would be ```A.B.A.D.A.E```: the field name is taken out, so the recursion is detected earlier.
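
A toy, self-contained model of the A/B/D/E schema above, only to illustrate why keying the visited set on the message *type* catches the A -> B -> A cycle; this is not the PR's code and the names are made up:

```scala
// Message name -> its message-typed fields as (field name, field type).
val fields: Map[String, List[(String, String)]] = Map(
  "A" -> List("b" -> "B"),
  "B" -> List("aa" -> "A", "d" -> "D"),
  "D" -> List("aaa" -> "A", "a" -> "E"),
  "E" -> Nil)

// Track visited *type* names along the current path.
def hasCycle(msg: String, visited: Set[String]): Boolean =
  visited.contains(msg) ||
    fields(msg).exists { case (_, fieldType) => hasCycle(fieldType, visited + msg) }

assert(hasCycle("A", Set.empty)) // A -> B -> A is flagged on the second visit to A
```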






[GitHub] [spark] SandishKumarHN commented on a diff in pull request #38922: [SPARK-41396][SQL][PROTOBUF] OneOf field support and recursion checks

2022-12-08 Thread GitBox


SandishKumarHN commented on code in PR #38922:
URL: https://github.com/apache/spark/pull/38922#discussion_r1043856365


##
connector/protobuf/src/main/scala/org/apache/spark/sql/protobuf/utils/ProtobufOptions.scala:
##
@@ -38,6 +38,12 @@ private[sql] class ProtobufOptions(
 
   val parseMode: ParseMode =
     parameters.get("mode").map(ParseMode.fromString).getOrElse(FailFastMode)
+
+  val circularReferenceType: String = parameters.getOrElse("circularReferenceType", "FIELD_NAME")

Review Comment:
   I would let @baganokodo2022 give more details on the field-type case.
   
   We have not yet added unit tests for the field-type case; I would like to discuss this before adding them.



##
connector/protobuf/src/test/scala/org/apache/spark/sql/protobuf/ProtobufFunctionsSuite.scala:
##
@@ -26,11 +26,11 @@ import com.google.protobuf.{ByteString, DynamicMessage}
 import org.apache.spark.sql.{Column, QueryTest, Row}
 import org.apache.spark.sql.AnalysisException
 import org.apache.spark.sql.functions.{lit, struct}
-import org.apache.spark.sql.protobuf.protos.SimpleMessageProtos.SimpleMessageRepeated
+import org.apache.spark.sql.protobuf.protos.SimpleMessageProtos.{EventRecursiveA, EventRecursiveB, OneOfEvent, OneOfEventWithRecursion, SimpleMessageRepeated}

Review Comment:
   @rangadi I didn't understand; these are already two different tests.






[GitHub] [spark] SandishKumarHN commented on a diff in pull request #38922: [SPARK-41396][SQL][PROTOBUF] OneOf field support and recursion checks

2022-12-08 Thread GitBox


SandishKumarHN commented on code in PR #38922:
URL: https://github.com/apache/spark/pull/38922#discussion_r1043814379


##
connector/protobuf/src/main/scala/org/apache/spark/sql/protobuf/utils/SchemaConverters.scala:
##
@@ -92,14 +109,38 @@ object SchemaConverters {
         MapType(keyType, valueType, valueContainsNull = false).defaultConcreteType,
         nullable = false))
       case MESSAGE =>
-        if (existingRecordNames.contains(fd.getFullName)) {
-          throw QueryCompilationErrors.foundRecursionInProtobufSchema(fd.toString())
+        // User can set circularReferenceDepth of 0 or 1 or 2.
+        // Going beyond 3 levels of recursion is not allowed.

Review Comment:
   @rangadi The user can specify the maximum allowed recursion depth for a 
field by setting the circularReferenceDepth property to 0, 1, or 2. Setting the 
circularReferenceDepth to 0 allows the field to be recursed once, setting it to 
1 allows it to be recursed twice, and setting it to 2 allows it to be recursed 
thrice. Attempting to set the circularReferenceDepth to a value greater than 2 
is not allowed. If the circularReferenceDepth is not specified, it will default 
to -1, which disables recursive fields.



##
connector/protobuf/src/main/scala/org/apache/spark/sql/protobuf/utils/ProtobufOptions.scala:
##
@@ -38,6 +38,12 @@ private[sql] class ProtobufOptions(
 
   val parseMode: ParseMode =
     parameters.get("mode").map(ParseMode.fromString).getOrElse(FailFastMode)
+
+  val circularReferenceType: String = parameters.getOrElse("circularReferenceType", "FIELD_NAME")

Review Comment:
   @rangadi We already know about the field_name recursion check: using `fd.getFullName` we detect the recursion and throw an error. Another option is to detect recursion through the field type; example below.
   
   ```
   message A {
   B b;
   }
   
   message B {
   A c;
   }
   ```
   With the field_name recursive check the path is ```A.B.C```, so no recursion is found.
   With the field_type recursive check it is ```MESSAGE.MESSAGE.MESSAGE```, so recursion is detected and we either throw an error or drop fields beyond a certain recursive depth.
   However, the field_type check would also throw an error for the case below, since the path becomes ```MESSAGE.MESSAGE.MESSAGE.MESSAGE```:
   
   ```
   message A {
   B b = 1;
   }
   
   message B {
   D d = 1;
   }
   
   message D {
   E e = 1;
   }
   
   message E {
   int32 key = 1;
   }
   ```
   @baganokodo2022's argument is that a field_type-based check gives users an option to cut off recursion more quickly, because with a complex nested schema a recursive field_name may only be found very deep in the tree, and we might hit an OOM before reaching it. A field_type-based check finds the circular reference sooner.
   
   @baganokodo2022 please correct me if I'm wrong. 
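
To make that trade-off concrete, a tiny self-contained sketch (not PR code, names made up) of how a field-type-kind check effectively counts nested MESSAGE hops and so trips on the non-recursive A -> B -> D -> E chain:

```scala
// Message name -> the message types of its message-typed fields.
val children: Map[String, List[String]] = Map(
  "A" -> List("B"), "B" -> List("D"), "D" -> List("E"), "E" -> Nil)

// Counting MESSAGE levels regardless of which message type they are.
def messageDepth(msg: String): Int =
  1 + children(msg).map(messageDepth).foldLeft(0)(math.max)

println(messageDepth("A")) // 4: a "no more than 3 MESSAGE levels" rule fires,
                           // even though no message type repeats on this path.
```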



##
connector/protobuf/src/main/scala/org/apache/spark/sql/protobuf/utils/SchemaConverters.scala:
##
@@ -92,14 +109,38 @@ object SchemaConverters {
         MapType(keyType, valueType, valueContainsNull = false).defaultConcreteType,
         nullable = false))
       case MESSAGE =>
-        if (existingRecordNames.contains(fd.getFullName)) {
-          throw QueryCompilationErrors.foundRecursionInProtobufSchema(fd.toString())
+        // User can set circularReferenceDepth of 0 or 1 or 2.
+        // Going beyond 3 levels of recursion is not allowed.
+        if (protobufOptions.circularReferenceType.equals("FIELD_TYPE")) {
+          if (existingRecordTypes.contains(fd.getType.name()) &&
+            (protobufOptions.circularReferenceDepth < 0 ||
+              protobufOptions.circularReferenceDepth >= 3)) {
+            throw QueryCompilationErrors.foundRecursionInProtobufSchema(fd.toString())
+          } else if (existingRecordTypes.contains(fd.getType.name()) &&

Review Comment:
   @rangadi We have two maps with incremental counters, one for the field_name-based check and one for the field_type-based check.
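
For readers, a hypothetical sketch of the two-counter shape described here (an interim design in this PR, later simplified to a single map keyed by the message type's full name):

```scala
// Depth counters: one keyed by field full name, one by field type name.
case class Visited(byFieldName: Map[String, Int], byTypeName: Map[String, Int])

def bump(v: Visited, fieldFullName: String, typeName: String): Visited =
  Visited(
    v.byFieldName.updated(fieldFullName, v.byFieldName.getOrElse(fieldFullName, 0) + 1),
    v.byTypeName.updated(typeName, v.byTypeName.getOrElse(typeName, 0) + 1))
```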



##
connector/protobuf/src/test/scala/org/apache/spark/sql/protobuf/ProtobufFunctionsSuite.scala:
##
@@ -26,11 +26,11 @@ import com.google.protobuf.{ByteString, DynamicMessage}
 import org.apache.spark.sql.{Column, QueryTest, Row}
 import org.apache.spark.sql.AnalysisException
 import org.apache.spark.sql.functions.{lit, struct}
-import org.apache.spark.sql.protobuf.protos.SimpleMessageProtos.SimpleMessageRepeated
+import org.apache.spark.sql.protobuf.protos.SimpleMessageProtos.{EventRecursiveA, EventRecursiveB, OneOfEvent, OneOfEventWithRecursion, SimpleMessageRepeated}

Review Comment:
   @rangadi yes, 
   `Handle recursive fields in Protobuf schema, C->D->Array(C)` and
   `Handle recursive fields in Protobuf schema, A->B->A`




[GitHub] [spark] SandishKumarHN commented on a diff in pull request #38922: [SPARK-41396][SQL][PROTOBUF] OneOf field support and recursion checks

2022-12-06 Thread GitBox


SandishKumarHN commented on code in PR #38922:
URL: https://github.com/apache/spark/pull/38922#discussion_r1041487582


##
connector/protobuf/src/main/scala/org/apache/spark/sql/protobuf/utils/SchemaConverters.scala:
##
@@ -92,9 +92,13 @@ object SchemaConverters {
         MapType(keyType, valueType, valueContainsNull = false).defaultConcreteType,
         nullable = false))
       case MESSAGE =>
+        // Stop recursion at the first level when a recursive field is encountered.
+        // TODO: The user should be given the option to set the recursion level to 1, 2, or 3

Review Comment:
   @rangadi planning to add the selectable recursion depth in this PR. 






[GitHub] [spark] SandishKumarHN commented on a diff in pull request #38922: [SPARK-41396][SQL][PROTOBUF] OneOf field support and recursion checks

2022-12-06 Thread GitBox


SandishKumarHN commented on code in PR #38922:
URL: https://github.com/apache/spark/pull/38922#discussion_r1041487447


##
connector/protobuf/src/test/resources/protobuf/functions_suite.proto:
##
@@ -170,4 +170,41 @@ message timeStampMsg {
 message durationMsg {
   string key = 1;
   Duration duration = 2;
-}
\ No newline at end of file
+}
+
+message OneOfEvent {

Review Comment:
   @rangadi I see a lot of use cases for a "payload" oneof field with recursive fields in it, so I thought combining oneof with recursion would make a good test. Will separate them.



##
connector/protobuf/src/main/scala/org/apache/spark/sql/protobuf/ProtobufDeserializer.scala:
##
@@ -157,6 +157,8 @@ private[sql] class ProtobufDeserializer(
 
       case (null, NullType) => (updater, ordinal, _) => updater.setNullAt(ordinal)
 
+      case (MESSAGE, NullType) => (updater, ordinal, _) => updater.setNullAt(ordinal)

Review Comment:
   yes, correct. 






[GitHub] [spark] SandishKumarHN commented on a diff in pull request #38922: [SPARK-41396][SQL][PROTOBUF] OneOf field support and recursion checks

2022-12-06 Thread GitBox


SandishKumarHN commented on code in PR #38922:
URL: https://github.com/apache/spark/pull/38922#discussion_r1041487180


##
connector/protobuf/src/test/resources/protobuf/functions_suite.proto:
##
@@ -170,4 +170,41 @@ message timeStampMsg {
 message durationMsg {
   string key = 1;
   Duration duration = 2;
-}
\ No newline at end of file
+}
+
+message OneOfEvent {
+  string key = 1;
+  oneof payload {

Review Comment:
   @rangadi the "Oneof" field is of message type, Oneof will be converted to a 
struct type. 



##
connector/protobuf/src/test/scala/org/apache/spark/sql/protobuf/ProtobufFunctionsSuite.scala:
##
@@ -693,4 +721,45 @@ class ProtobufFunctionsSuite extends QueryTest with SharedSparkSession with Prot
       errorClass = "CANNOT_CONSTRUCT_PROTOBUF_DESCRIPTOR",
       parameters = Map("descFilePath" -> testFileDescriptor))
   }
+
+  test("Unit tests for OneOf field support and recursion checks") {

Review Comment:
   will do that. 






[GitHub] [spark] SandishKumarHN commented on a diff in pull request #38922: [SPARK-41396][SQL][PROTOBUF] OneOf field support and recursion checks

2022-12-05 Thread GitBox


SandishKumarHN commented on code in PR #38922:
URL: https://github.com/apache/spark/pull/38922#discussion_r1040138657


##
connector/protobuf/src/main/scala/org/apache/spark/sql/protobuf/utils/SchemaConverters.scala:
##
@@ -92,9 +92,13 @@ object SchemaConverters {
         MapType(keyType, valueType, valueContainsNull = false).defaultConcreteType,
         nullable = false))
       case MESSAGE =>
+        // Stop recursion at the first level when a recursive field is encountered.
+        // TODO: The user should be given the option to set the recursion level to 1, 2, or 3

Review Comment:
   @rangadi @mposdev21 Instead of limiting the recursion to only one level, the user should be able to choose a recursion level of 1, 2, or 3. Going beyond 3 levels of recursion should not be allowed. Any thoughts?
   
   `spark.protobuf.recursion.level`





