justaparth commented on PR #41108: URL: https://github.com/apache/spark/pull/41108#issuecomment-1557723788
> What if you have a UDF that converts this to BigDecimal? Will you get the value back? I guess that is the intention behind why protobuf-java casts unsiged to signed in its Java methods. I think it simpler to go this way. Yeah, there is no information loss so you can get the right value the way I did in this PR (Integer.toUnsignedLong, Long.toUnsignedString). I think, though, it's useful if the `spark-protobuf` library can do this; the burden of taking a struct and trying to do this transformation is cumbersome, in my opinion. However, one additional piece of information is that **for unsigned types in parquet, the default behavior is to represent them in larger types**. I put this in the PR description but see this ticket https://issues.apache.org/jira/browse/SPARK-34817 implemented in this PR: https://github.com/apache/spark/pull/31921. Or the existing code today https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetSchemaConverter.scala#L243-L247 which shows that **by default** parquet unsigned values are actually expanded to larger types in spark. So, since this same problem/solution exists in another storage format, I think its useful to implement this behavior here as well. I also think that it actually _does_ make sense to do it by default, as parquet already does this. However, i'm open also to doing this transformation behind an option so that no existing usages are broken. Mainly, I want to just make sure we do what is the most correct and broadly consistent thing to do (and i'm not really sure exactly what that is, and would love some other inputs). cc @HyukjinKwon as well here since you reviewed the original PR doing this for parquet! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
