rangadi commented on code in PR #43773:
URL: https://github.com/apache/spark/pull/43773#discussion_r1422991198


##########
connector/protobuf/src/main/scala/org/apache/spark/sql/protobuf/utils/SchemaConverters.scala:
##########
@@ -67,9 +68,23 @@ object SchemaConverters extends Logging {
       existingRecordNames: Map[String, Int],
       protobufOptions: ProtobufOptions): Option[StructField] = {
     import com.google.protobuf.Descriptors.FieldDescriptor.JavaType._
+
+    val upcastUnsignedInts = protobufOptions.upcastUnsignedInts

Review Comment:
   can use `protobufOptions.upcastUnsignedInts` directly below. 
   Or add 'option' to this variable's name. It is more readable. 



##########
connector/protobuf/src/main/scala/org/apache/spark/sql/protobuf/ProtobufDeserializer.scala:
##########
@@ -193,6 +193,11 @@ private[sql] class ProtobufDeserializer(
       case (INT, ShortType) =>
         (updater, ordinal, value) => updater.setShort(ordinal, 
value.asInstanceOf[Short])
 
+      case (INT, LongType) =>
+        (updater, ordinal, value) =>
+          updater.setLong(
+            ordinal,
+            Integer.toUnsignedLong(value.asInstanceOf[Int]))

Review Comment:
   I am less certain about over all Spark context, but for `from_protobuf()` 
this looks fine so far (since it is only enabled with an option).



##########
connector/protobuf/src/main/scala/org/apache/spark/sql/protobuf/utils/SchemaConverters.scala:
##########
@@ -67,9 +68,23 @@ object SchemaConverters extends Logging {
       existingRecordNames: Map[String, Int],
       protobufOptions: ProtobufOptions): Option[StructField] = {
     import com.google.protobuf.Descriptors.FieldDescriptor.JavaType._
+
+    val upcastUnsignedInts = protobufOptions.upcastUnsignedInts
+
     val dataType = fd.getJavaType match {
-      case INT => Some(IntegerType)
-      case LONG => Some(LongType)
+      // When the protobuf type is unsigned and upcastUnsignedIntegers has 
been set,
+      // use a larger type (LongType and Decimal(20,0) for uint32 and uint64).
+      case INT =>
+        if (fd.getLiteType == WireFormat.FieldType.UINT32 && 
upcastUnsignedInts) {

Review Comment:
   Would it be better to add a log here? 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to