GitHub user ajithme commented on a diff in the pull request:

    https://github.com/apache/spark/pull/22401#discussion_r217038517
  
    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Average.scala ---
    @@ -36,7 +36,13 @@ abstract class AverageLike(child: Expression) extends DeclarativeAggregate {
       }
     
       private lazy val sumDataType = child.dataType match {
    -    case _ @ DecimalType.Fixed(p, s) => DecimalType.bounded(p + 10, s)
    +    /*
    +     * In case of the sum of two decimals (assuming another decimal of the same precision and scale).
    +     * Refer: org.apache.spark.sql.catalyst.analysis.DecimalPrecision
    +     * Precision: max(s1, s2) + max(p1 - s1, p2 - s2) + 1
    +     * Scale: max(s1, s2)
    +     */
    +    case _ @ DecimalType.Fixed(p, s) => DecimalType.adjustPrecisionScale(s + (p - s) + 1, s)
    --- End diff ---
    
    Well, I tested as per your suggestion with my PR:
    ```
     sql("create table if not exists table1(salary decimal(2,1))")
     (1 to 22).foreach(_ => sql("insert into table1 values(9.1)"))
     sql("select avg(salary) from table1").show(false)
    
    +-----------+
    |avg(salary)|
    +-----------+
    |9.10000    |
    +-----------+
    ```
    which is the expected result, and I don't see an overflow, as the divide will readjust the precision. Can you test with my patch for an overflow specifically in the case of average?

