[GitHub] spark pull request #21547: [SPARK-24538][SQL] ByteArrayDecimalType support p...

2018-06-28 Thread wangyum
Github user wangyum closed the pull request at:

https://github.com/apache/spark/pull/21547


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #21547: [SPARK-24538][SQL] ByteArrayDecimalType support p...

2018-06-13 Thread wangyum
Github user wangyum commented on a diff in the pull request:

https://github.com/apache/spark/pull/21547#discussion_r195289284
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilters.scala ---
@@ -37,6 +39,23 @@ private[parquet] class ParquetFilters(pushDownDate: Boolean) {
 DateTimeUtils.fromJavaDate(date)
   }
 
+  private def decimalToBinary(precision: Int, decimal: JBigDecimal): Binary = {
--- End diff --

REF: 
https://github.com/apache/spark/blob/21a7bfd5c324e6c82152229f1394f26afeae771c/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetWriteSupport.scala#L247-L266
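The referenced writer encodes a decimal as its unscaled two's-complement bytes, padded out to the fixed byte length Parquet uses for the column's precision. A minimal sketch of that idea (illustrative names, not the PR's or the writer's exact code):

```scala
import java.math.BigDecimal

object DecimalBinarySketch {
  // Parquet's FIXED_LEN_BYTE_ARRAY decimals store an unscaled big-endian
  // two's-complement integer in a fixed number of bytes per precision
  // (e.g. 16 bytes covers precision up to 38).
  def toFixedLenBytes(decimal: BigDecimal, numBytes: Int): Array[Byte] = {
    // Minimal two's-complement representation of the unscaled value.
    val unscaled = decimal.unscaledValue().toByteArray
    require(unscaled.length <= numBytes,
      s"value needs ${unscaled.length} bytes, column allows $numBytes")
    // Sign-extend: pad with 0xFF for negative values, 0x00 otherwise.
    val pad: Byte = if (decimal.signum() < 0) -1 else 0
    val fixed = Array.fill[Byte](numBytes)(pad)
    System.arraycopy(unscaled, 0, fixed, numBytes - unscaled.length, unscaled.length)
    fixed
  }
}
```

Padding with the sign byte (rather than always with zeros) is what keeps negative decimals comparable under Parquet's byte-wise ordering.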


---




[GitHub] spark pull request #21547: [SPARK-24538][SQL] ByteArrayDecimalType support p...

2018-06-13 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/21547#discussion_r194973435
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilters.scala ---
@@ -62,6 +62,11 @@ private[parquet] class ParquetFilters(pushDownDate: Boolean) {
   (n: String, v: Any) => FilterApi.eq(
 intColumn(n),
 Option(v).map(date => dateToDays(date.asInstanceOf[Date]).asInstanceOf[Integer]).orNull)
+case decimal: DecimalType if DecimalType.isByteArrayDecimalType(decimal) =>
+  (n: String, v: Any) => FilterApi.eq(
+binaryColumn(n),
+Option(v).map(d => Binary.fromReusedByteArray(new Array[Byte](8) ++
+  v.asInstanceOf[java.math.BigDecimal].unscaledValue().toByteArray)).orNull)
--- End diff --

v.asInstanceOf -> d.asInstanceOf? The lambda binds the value as `d`, so the cast should use `d` rather than the outer `v`.
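For what it's worth, inside `Option(v).map(d => ...)` the bound `d` is the same (non-null) value as the outer `v`, so the cast behaves identically either way; the suggestion is about clarity. A tiny illustration (hypothetical values, not the PR's code):

```scala
val v: Any = new java.math.BigDecimal(42)
// Inside map, d is bound to the same value as v (when v is non-null),
// so casting v or d yields the same result; d is just the clearer choice.
val viaOuter = Option(v).map(_ => v.asInstanceOf[java.math.BigDecimal].unscaledValue())
val viaBound = Option(v).map(d => d.asInstanceOf[java.math.BigDecimal].unscaledValue())
assert(viaOuter == viaBound)
```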


---




[GitHub] spark pull request #21547: [SPARK-24538][SQL] ByteArrayDecimalType support p...

2018-06-12 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/21547#discussion_r194950951
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilterSuite.scala ---
@@ -359,6 +359,41 @@ class ParquetFilterSuite extends QueryTest with ParquetTest with SharedSQLContext
 }
   }
 
+  test("filter pushdown - decimal(ByteArrayDecimalType)") {
+val one = new java.math.BigDecimal(1)
+val two = new java.math.BigDecimal(2)
+val three = new java.math.BigDecimal(3)
+val four = new java.math.BigDecimal(4)
--- End diff --

Hm, why don't we follow the style in this file with `implicit class`?
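A hedged sketch of what the suggested implicit-class style could look like (the names here are illustrative, not the suite's actual helpers):

```scala
object DecimalTestSyntax {
  // Illustrative only: an implicit class that builds java.math.BigDecimal
  // test values tersely, in the spirit of the reviewer's suggestion.
  implicit class IntToBigDecimal(private val i: Int) {
    def bd: java.math.BigDecimal = new java.math.BigDecimal(i)
  }
}

import DecimalTestSyntax._
// 1.bd, 2.bd, ... replace the repeated `new java.math.BigDecimal(n)` vals.
assert(1.bd == new java.math.BigDecimal(1))
```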


---




[GitHub] spark pull request #21547: [SPARK-24538][SQL] ByteArrayDecimalType support p...

2018-06-12 Thread wangyum
GitHub user wangyum opened a pull request:

https://github.com/apache/spark/pull/21547

[SPARK-24538][SQL] ByteArrayDecimalType support push down to the data sources

## What changes were proposed in this pull request?


This PR adds push-down support for [ByteArrayDecimalType](https://github.com/apache/spark/blob/e28eb431146bcdcaf02a6f6c406ca30920592a6a/sql/catalyst/src/main/scala/org/apache/spark/sql/types/DecimalType.scala#L230) filters to the data sources.

## How was this patch tested?
unit tests and manual tests.

**manual tests**:
```scala
spark.range(1000).selectExpr(
  "id",
  "cast(id as decimal(9)) as d1",
  "cast(id as decimal(9, 2)) as d2",
  "cast(id as decimal(18)) as d3",
  "cast(id as decimal(18, 4)) as d4",
  "cast(id as decimal(38)) as d5",
  "cast(id as decimal(38, 18)) as d6")
  .coalesce(1)
  .write.option("parquet.block.size", 1048576)
  .parquet("/tmp/spark/parquet/decimal")
val df = spark.read.parquet("/tmp/spark/parquet/decimal/")
// Reads only about 1 MB of data
df.filter("d6 = 1").show
// Reads 174.3 MB of data
df.filter("d3 = 1").show
```
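The difference between the two filters comes from how Spark's default (non-legacy) Parquet writer picks a physical type per decimal precision; only the fixed-length byte-array case is covered by this PR. A sketch of that mapping (assumed from Spark's writer defaults, not quoted from the PR):

```scala
// Sketch of Spark's default (non-legacy) decimal-to-Parquet physical type mapping.
def parquetPhysicalType(precision: Int): String =
  if (precision <= 9) "INT32"            // small precisions fit an int
  else if (precision <= 18) "INT64"      // e.g. d3 = decimal(18) above
  else "FIXED_LEN_BYTE_ARRAY"            // e.g. d6 = decimal(38, 18): the ByteArrayDecimalType case
```

So the `d6` filter can be pushed down to Parquet (skipping non-matching row groups, hence ~1 MB read), while `d3` maps to INT64 and is outside the scope of this change.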


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/wangyum/spark SPARK-24538

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/21547.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #21547


commit 96066701ec75d3caa27994c47eab8ff64150b6a5
Author: Yuming Wang 
Date:   2018-06-13T01:35:55Z

ByteArrayDecimalType support push down to the data sources




---
