[GitHub] spark pull request #19496: [SPARK-22271][SQL]mean overflows and returns null...

2017-10-17 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/19496


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #19496: [SPARK-22271][SQL]mean overflows and returns null...

2017-10-17 Thread huaxingao
Github user huaxingao commented on a diff in the pull request:

https://github.com/apache/spark/pull/19496#discussion_r145182120
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala ---
@@ -2103,4 +2103,13 @@ class DataFrameSuite extends QueryTest with SharedSQLContext {
     testData2.select(lit(7), 'a, 'b).orderBy(lit(1), lit(2), lit(3)),
     Seq(Row(7, 1, 1), Row(7, 1, 2), Row(7, 2, 1), Row(7, 2, 2), Row(7, 3, 1), Row(7, 3, 2)))
   }
+
+  test("SPARK-22271: mean overflows and returns null for some decimal variables") {
+    val d = 0.034567890
+    val df = Seq(d, d, d, d, d, d, d, d, d, d).toDF("DecimalCol")
+    val result = df.select('DecimalCol cast DecimalType(38, 33))
+        .select(col("DecimalCol")).describe()
+    val mean = result.select("DecimalCol").where($"summary" === "mean")
+    assert(mean.collect.toSet === Set(Row("0.034567890")))
--- End diff --

@gatorsmile Thanks for your review. I will fix the problems.


---


[GitHub] spark pull request #19496: [SPARK-22271][SQL]mean overflows and returns null...

2017-10-16 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/19496#discussion_r145034923
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala ---
@@ -2103,4 +2103,13 @@ class DataFrameSuite extends QueryTest with SharedSQLContext {
     testData2.select(lit(7), 'a, 'b).orderBy(lit(1), lit(2), lit(3)),
     Seq(Row(7, 1, 1), Row(7, 1, 2), Row(7, 2, 1), Row(7, 2, 2), Row(7, 3, 1), Row(7, 3, 2)))
   }
+
+  test("SPARK-22271: mean overflows and returns null for some decimal variables") {
+    val d = 0.034567890
+    val df = Seq(d, d, d, d, d, d, d, d, d, d).toDF("DecimalCol")
+    val result = df.select('DecimalCol cast DecimalType(38, 33))
+        .select(col("DecimalCol")).describe()
+    val mean = result.select("DecimalCol").where($"summary" === "mean")
+    assert(mean.collect.toSet === Set(Row("0.034567890")))
--- End diff --

Nit: `collect` -> `collect()`


---


[GitHub] spark pull request #19496: [SPARK-22271][SQL]mean overflows and returns null...

2017-10-16 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/19496#discussion_r145034645
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala ---
@@ -2103,4 +2103,13 @@ class DataFrameSuite extends QueryTest with SharedSQLContext {
     testData2.select(lit(7), 'a, 'b).orderBy(lit(1), lit(2), lit(3)),
     Seq(Row(7, 1, 1), Row(7, 1, 2), Row(7, 2, 1), Row(7, 2, 2), Row(7, 3, 1), Row(7, 3, 2)))
   }
+
+  test("SPARK-22271: mean overflows and returns null for some decimal variables") {
+    val d = 0.034567890
+    val df = Seq(d, d, d, d, d, d, d, d, d, d).toDF("DecimalCol")
+    val result = df.select('DecimalCol cast DecimalType(38, 33))
+        .select(col("DecimalCol")).describe()
--- End diff --

Nit: two space indents.


---


[GitHub] spark pull request #19496: [SPARK-22271][SQL]mean overflows and returns null...

2017-10-15 Thread huaxingao
Github user huaxingao commented on a diff in the pull request:

https://github.com/apache/spark/pull/19496#discussion_r144732734
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala ---
@@ -2103,4 +2103,13 @@ class DataFrameSuite extends QueryTest with SharedSQLContext {
     testData2.select(lit(7), 'a, 'b).orderBy(lit(1), lit(2), lit(3)),
     Seq(Row(7, 1, 1), Row(7, 1, 2), Row(7, 2, 1), Row(7, 2, 2), Row(7, 3, 1), Row(7, 3, 2)))
   }
+
+  test("SPARK-22271: mean overflows and returns null for some decimal variables") {
+    val d: BigDecimal = BigDecimal(0.034567890)
--- End diff --

It's not necessary. I will remove it. Thanks for your review.


---


[GitHub] spark pull request #19496: [SPARK-22271][SQL]mean overflows and returns null...

2017-10-15 Thread kiszk
Github user kiszk commented on a diff in the pull request:

https://github.com/apache/spark/pull/19496#discussion_r144723280
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala ---
@@ -2103,4 +2103,13 @@ class DataFrameSuite extends QueryTest with SharedSQLContext {
     testData2.select(lit(7), 'a, 'b).orderBy(lit(1), lit(2), lit(3)),
     Seq(Row(7, 1, 1), Row(7, 1, 2), Row(7, 2, 1), Row(7, 2, 2), Row(7, 3, 1), Row(7, 3, 2)))
   }
+
+  test("SPARK-22271: mean overflows and returns null for some decimal variables") {
+    val d: BigDecimal = BigDecimal(0.034567890)
--- End diff --

Do we need `: BigDecimal`?


---


[GitHub] spark pull request #19496: [SPARK-22271][SQL]mean overflows and returns null...

2017-10-14 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/19496#discussion_r144689475
  
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/Average.scala ---
@@ -80,7 +80,8 @@ case class Average(child: Expression) extends DeclarativeAggregate with ImplicitCastInputTypes {
     case DecimalType.Fixed(p, s) =>
       // increase the precision and scale to prevent precision loss
       val dt = DecimalType.bounded(p + 14, s + 4)
-      Cast(Cast(sum, dt) / Cast(count, dt), resultType)
+      Cast(Cast(sum, dt) / Cast(count, DecimalType.bounded (DecimalType.MAX_PRECISION, 0)),
--- End diff --

No need to add space after `bounded`.


---


[GitHub] spark pull request #19496: [SPARK-22271][SQL]mean overflows and returns null...

2017-10-13 Thread huaxingao
GitHub user huaxingao opened a pull request:

https://github.com/apache/spark/pull/19496

[SPARK-22271][SQL]mean overflows and returns null for some decimal variables



## What changes were proposed in this pull request?

In Average.scala, it has
```
  override lazy val evaluateExpression = child.dataType match {
case DecimalType.Fixed(p, s) =>
  // increase the precision and scale to prevent precision loss
  val dt = DecimalType.bounded(p + 14, s + 4)
  Cast(Cast(sum, dt) / Cast(count, dt), resultType)
case _ =>
  Cast(sum, resultType) / Cast(count, resultType)
  }

  def setChild (newchild: Expression) = {
child = newchild
  }

```
It is possible that `Cast(Cast(sum, dt) / Cast(count, dt), resultType)` will make the precision of the decimal number bigger than 38, and this causes overflow. Since count is an integer and doesn't need a scale, I will cast it using `DecimalType.bounded(38, 0)` instead.

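To see the arithmetic behind the overflow: the test column is `DecimalType(38, 33)`, so `Average` widens it to `DecimalType.bounded(38 + 14, 33 + 4)`, which clamps to `(38, 37)` and leaves only one integral digit. Representing `count = 10` at scale 37 needs 2 + 37 = 39 digits, which exceeds the 38-digit maximum, so `Cast(count, dt)` yields null and so does the mean. A minimal sketch of this precision arithmetic (illustrative Python, not Spark's actual implementation; `bounded` and `digits_needed` are hypothetical helpers mirroring `DecimalType.bounded`):

```python
MAX_PRECISION = 38  # Spark's DecimalType.MAX_PRECISION

def bounded(precision, scale):
    # Mirrors DecimalType.bounded: clamp precision and scale to 38.
    return (min(precision, MAX_PRECISION), min(scale, MAX_PRECISION))

def digits_needed(int_value, scale):
    # Total digits required to hold an integer at the given scale.
    return len(str(abs(int_value))) + scale

# Input column from the test: DecimalType(38, 33); Average widens by (+14, +4).
p, s = 38, 33
dt = bounded(p + 14, s + 4)
print(dt)  # (38, 37): only one integral digit remains

count = 10  # ten rows in the test DataFrame
print(digits_needed(count, dt[1]))  # 39 > 38, so Cast(count, dt) overflows to null

# The fix casts count to DecimalType.bounded(38, 0) instead:
print(digits_needed(count, 0))  # 2 <= 38, no overflow
```

This is why casting count with scale 0 is safe: a row count is an integer, and at scale 0 even a 38-digit count fits within the maximum precision.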
## How was this patch tested?
In DataFrameSuite, I will add a test case. 

Please review http://spark.apache.org/contributing.html before opening a 
pull request.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/huaxingao/spark spark-22271

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/19496.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #19496


commit a3437ee4a87d1f51b362adeb20d4fcc264085ba7
Author: Huaxin Gao 
Date:   2017-10-14T04:45:27Z

[SPARK-22271][SQL]mean overflows and returns null for some decimal variables




---
