mgaido91 commented on issue #23308: [SPARK-26308][SQL] Infer abstract decimal 
type for java/scala BigDecimal
URL: https://github.com/apache/spark/pull/23308#issuecomment-447316317
 
 
   thanks for your review @holdenk. Sure let me clarify some points. I hope 
(and think) that after the explanation, this PR will seem less scary:
    - The main issue here is that we cannot infer the precision and scale for 
decimals when we create the UDF and in general when we face a `BigDecimal` or 
`Decimal`. So we should return a generic `DecimalType` for it, without any 
precision and scale set; `DecimalType` is an `AbstractDataType` though. Notice 
that the generic `DecimalType` accepts any implementation of it, so we do not 
have issues with precision and scale, whatever they are.
    - `DataType` inherits from `AbstractDataType`. And the 
`defaultConcreteDataType` for each `DataType` is itself. So every type we use 
`forSchema` and we need a `DataType` rather than an abstract one, then we get 
the default concrete type we get the exact same thing (for `DecimalType` the 
default is `SYSTEM_DEFAULT`, so again everything is like before).
    - There was another possibility: in UDFs, after inferring the schema, 
replace all the decimal types with the abstract type. But that option seemed a 
bit hacky to me adn this one cleaner. Moreover, there is still an issue which 
this PR doesn't solve: if we have an array, struct or map containing decimals, 
the problem is still there as they need concrete data types as sub-elements. 
After this patch, we just need to create an array, struct, map abstract types 
accepting abstract members which are returned in `forSchema` to avoid also 
that. In the other case, we would need to add also them to special handling for 
UDFs, which seems to me indicating that it would be a bad design and hard to 
maintain.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

---------------------------------------------------------------------
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

Reply via email to