beliefer commented on code in PR #43781:
URL: https://github.com/apache/spark/pull/43781#discussion_r1391880151
##########
docs/sql-ref-ansi-compliance.md:
##########
@@ -240,6 +240,25 @@ The least common type resolution is used to:
- Derive the result type for expressions such as the case expression.
- Derive the element, key, or value types for array and map constructors.
Special rules are applied if the least common type resolves to FLOAT. With
float type values, if any of the types is INT, BIGINT, or DECIMAL the least
common type is pushed to DOUBLE to avoid potential loss of digits.
+
+Decimal type is a bit more complicated here, as it's not a simple type but has
parameters: precision and scale.
+A `decimal(precision, scale)` means the value can has at most `precision -
scale` digits in the integral part and `scale` digits in the fractional part.
+A least common type between decimal types should have enough digits in both
integral and fractional parts to represent all values.
+More precisely, a least common type between `decimal(p1, s1)` and `decimal(p2,
s2)` has the scale of `max(s1, s2)` and precision of `max(s1, s2) + max(p1 -
s1, p2 - s2)`.
+However, decimal types in Spark has a maximum precision: 38. If the final
decimal type needs more precision, we must do truncation.
+Since the digits in the integral part are more significant, Spark truncates
the digits in the fractional part first. For example, `decimal(48, 20)` will be
reduced to `decimal(38, 10)`.
+
+Note, arithmetic operations have special rules to calculate the least common
type for decimal inputs:
+
+| Operation | Result precision | Result scale |
+|------------|------------------------------------------|---------------------|
+| e1 + e2 | max(s1, s2) + max(p1 - s1, p2 - s2) + 1 | max(s1, s2) |
+| e1 - e2 | max(s1, s2) + max(p1 - s1, p2 - s2) + 1 | max(s1, s2) |
+| e1 * e2 | p1 + p2 + 1 | s1 + s2 |
+| e1 / e2 | p1 - s1 + s2 + max(6, s1 + p2 + 1) | max(6, s1 + p2 + 1) |
+| e1 % e2 | min(p1 - s1, p2 - s2) + max(s1, s2) | max(s1, s2) |
Review Comment:
AFAIK, the arithmetic operations did not strictly follow this rule.
##########
sql/api/src/main/scala/org/apache/spark/sql/types/DecimalType.scala:
##########
@@ -146,6 +146,18 @@ object DecimalType extends AbstractDataType {
DecimalType(min(precision, MAX_PRECISION), min(scale, MAX_SCALE))
}
+ private[sql] def boundedPreferIntegralDigits(precision: Int, scale: Int):
DecimalType = {
+ if (precision <= MAX_PRECISION) {
+ DecimalType(precision, scale)
+ } else {
+ // If we have to reduce the precision, we should retain the digits in
the integral part first,
+ // as they are more significant to the value. Here we reduce the scale
as well to drop the
+ // digits in the fractional part.
Review Comment:
Looks good.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]