justaparth commented on PR #41108:
URL: https://github.com/apache/spark/pull/41108#issuecomment-1557723788

   > What if you have a UDF that converts this to BigDecimal? Will you get the 
value back? I guess that is the intention behind why protobuf-java casts 
unsigned to signed in its Java methods. I think it is simpler to go this way. 
   
   Yeah, there is no information loss, so you can recover the right value the 
way I did in this PR (Integer.toUnsignedLong, Long.toUnsignedString). I think, 
though, that it's useful if the `spark-protobuf` library does this itself; 
making every user take the struct and apply this transformation by hand is 
cumbersome, in my opinion.
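
   For reference, here is a minimal sketch of the recovery step a user would 
otherwise have to write by hand. The class and method names are hypothetical 
(they are not part of `spark-protobuf` or of this PR); only 
Integer.toUnsignedLong and Long.toUnsignedString are the standard JDK calls 
mentioned above.

```java
import java.math.BigDecimal;

class UnsignedRecovery {
    // A protobuf uint32 arrives in Java as a signed int.
    // Widening with toUnsignedLong avoids sign extension.
    static long uint32ToLong(int raw) {
        return Integer.toUnsignedLong(raw);
    }

    // A protobuf uint64 arrives as a signed long. No Java primitive can hold
    // the full unsigned range, so go through the unsigned decimal string.
    static BigDecimal uint64ToDecimal(long raw) {
        return new BigDecimal(Long.toUnsignedString(raw));
    }

    public static void main(String[] args) {
        System.out.println(uint32ToLong(-1));     // 4294967295
        System.out.println(uint64ToDecimal(-1L)); // 18446744073709551615
    }
}
```

   The bit pattern is preserved in both cases, which is why no information is 
lost; the signed value is simply reinterpreted.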
   
   However, one additional piece of information is that **for unsigned types in 
Parquet, the default behavior is to represent them in larger types**. I put 
this in the PR description, but see the ticket 
https://issues.apache.org/jira/browse/SPARK-34817, implemented in this PR: 
https://github.com/apache/spark/pull/31921. Or see the existing code today, 
https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetSchemaConverter.scala#L243-L247
 which shows that **by default** Parquet unsigned values are actually expanded 
to larger types in Spark.
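
   To make the widening concrete, here is a small sketch of the mapping as I 
understand it from SPARK-34817. The helper names are hypothetical and the 
type names are returned as plain strings for illustration; this is not the 
code in ParquetSchemaConverter. The second method checks how many decimal 
digits the largest unsigned value of a given width needs, which is the reason 
a 20-digit decimal is enough for 64-bit unsigned values.

```java
import java.math.BigInteger;

class WidenedUnsignedTypes {
    // Assumed mapping from Parquet unsigned bit width to the Spark type used
    // by default (my reading of SPARK-34817, not copied from Spark's source).
    static String sparkTypeFor(int unsignedBitWidth) {
        switch (unsignedBitWidth) {
            case 8:  return "ShortType";
            case 16: return "IntegerType";
            case 32: return "LongType";
            case 64: return "DecimalType(20, 0)";
            default:
                throw new IllegalArgumentException("unsupported width: " + unsignedBitWidth);
        }
    }

    // Number of decimal digits in 2^bits - 1, the largest unsigned value.
    static int decimalDigitsOfMaxUnsigned(int bits) {
        BigInteger max = BigInteger.ONE.shiftLeft(bits).subtract(BigInteger.ONE);
        return max.toString().length();
    }

    public static void main(String[] args) {
        System.out.println(sparkTypeFor(32));               // LongType
        System.out.println(decimalDigitsOfMaxUnsigned(64)); // 20
    }
}
```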
   
   So, since this same problem/solution exists in another storage format, I 
think it's useful to implement this behavior here as well. I also think that it 
actually _does_ make sense to do it by default, as Parquet already does this. 
However, I'm also open to doing this transformation behind an option so that no 
existing usages are broken. Mainly, I just want to make sure we do what is the 
most correct and broadly consistent thing (I'm not really sure exactly what 
that is, and would love some other input). cc @HyukjinKwon as 
well here, since you reviewed the original PR doing this for Parquet!
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

