justaparth commented on code in PR #43773:
URL: https://github.com/apache/spark/pull/43773#discussion_r1390779040
##########
connector/protobuf/src/main/scala/org/apache/spark/sql/protobuf/ProtobufDeserializer.scala:
##########
@@ -193,6 +193,11 @@ private[sql] class ProtobufDeserializer(
       case (INT, ShortType) =>
         (updater, ordinal, value) => updater.setShort(ordinal, value.asInstanceOf[Short])
+      case (INT, LongType) =>
+        (updater, ordinal, value) =>
+          updater.setLong(
+            ordinal,
+            Integer.toUnsignedLong(value.asInstanceOf[Int]))
Review Comment:
> It would be problematic when Spark has unsigned types. For the same reason, Parquet also doesn't support unsigned physical types for Spark.

Hey, I'm not sure I follow; do you mind explaining what you mean by this?
My goal here is to add an option that lets unsigned 32-bit and 64-bit integers coming from protobuf be represented in a type that can hold them without overflow. I actually modeled my code on how the Parquet code is written today, which I believe does this same thing by default:
https://github.com/justaparth/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetSchemaConverter.scala#L243-L270
https://github.com/justaparth/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetRowConverter.scala#L345-L351
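
For concreteness, here's a rough, illustrative sketch (not code from the PR) of why the `Integer.toUnsignedLong` widening in the diff above avoids overflow for a protobuf uint32:

```scala
// A protobuf uint32 value such as 4294967295 arrives on the JVM as the Int -1.
// Widening it through Integer.toUnsignedLong reinterprets the 32 bits as an
// unsigned value, which fits in Spark's LongType without overflow.
val raw: Int = -1                               // wire value 4294967295 decoded into a JVM Int
val widened: Long = Integer.toUnsignedLong(raw) // 4294967295L, the intended unsigned value
val naive: Long = raw.toLong                    // -1L, sign-extended, loses the unsigned meaning
assert(widened == 4294967295L && naive == -1L)
```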