[GitHub] spark pull request #21547: [SPARK-24538][SQL] ByteArrayDecimalType support p...
Github user wangyum closed the pull request at: https://github.com/apache/spark/pull/21547

---

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request #21547: [SPARK-24538][SQL] ByteArrayDecimalType support p...
Github user wangyum commented on a diff in the pull request: https://github.com/apache/spark/pull/21547#discussion_r195289284

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilters.scala ---

```diff
@@ -37,6 +39,23 @@ private[parquet] class ParquetFilters(pushDownDate: Boolean) {
     DateTimeUtils.fromJavaDate(date)
   }

+  private def decimalToBinary(precision: Int, decimal: JBigDecimal): Binary = {
```

--- End diff --

REF: https://github.com/apache/spark/blob/21a7bfd5c324e6c82152229f1394f26afeae771c/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetWriteSupport.scala#L247-L266
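The referenced `ParquetWriteSupport` code writes decimals with precision above 18 as fixed-length byte arrays, so a pushed-down filter value has to be encoded the same way. As background, here is a minimal sketch of that encoding: the decimal's unscaled value as big-endian two's-complement bytes, sign-extended to a fixed width. The helper name `toFixedLenBytes` is hypothetical, not the actual Spark method.

```scala
import java.math.{BigDecimal => JBigDecimal}

// Hypothetical sketch (not Spark's actual helper): pad the unscaled
// two's-complement bytes of a decimal to a fixed width. The padding byte
// is 0x00 for non-negative values and 0xFF for negative ones, so the
// sign is preserved, matching Parquet's FIXED_LEN_BYTE_ARRAY decimal layout.
def toFixedLenBytes(decimal: JBigDecimal, numBytes: Int): Array[Byte] = {
  val unscaled = decimal.unscaledValue().toByteArray  // big-endian two's complement
  require(unscaled.length <= numBytes,
    s"decimal needs ${unscaled.length} bytes, only $numBytes available")
  val signByte: Byte = if (unscaled.head < 0) -1 else 0
  val fixed = Array.fill[Byte](numBytes)(signByte)    // sign-extension padding
  System.arraycopy(unscaled, 0, fixed, numBytes - unscaled.length, unscaled.length)
  fixed
}
```

Note that blindly prepending zero bytes (as in `new Array[Byte](8) ++ ...`) would corrupt negative values, which is why sign extension matters here.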
[GitHub] spark pull request #21547: [SPARK-24538][SQL] ByteArrayDecimalType support p...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/21547#discussion_r194973435

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilters.scala ---

```diff
@@ -62,6 +62,11 @@ private[parquet] class ParquetFilters(pushDownDate: Boolean) {
       (n: String, v: Any) => FilterApi.eq(
         intColumn(n),
         Option(v).map(date => dateToDays(date.asInstanceOf[Date]).asInstanceOf[Integer]).orNull)
+    case decimal: DecimalType if DecimalType.isByteArrayDecimalType(decimal) =>
+      (n: String, v: Any) => FilterApi.eq(
+        binaryColumn(n),
+        Option(v).map(d => Binary.fromReusedByteArray(new Array[Byte](8) ++
+          v.asInstanceOf[java.math.BigDecimal].unscaledValue().toByteArray)).orNull)
```

--- End diff --

`v.asInstanceOf` -> `d.asInstanceOf`?
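The reviewer's point is about the shadowed binding: inside `Option(v).map(d => ...)`, the lambda parameter `d` is the value known to be non-null, so the cast should be applied to `d` rather than reaching back out to `v`. A minimal sketch of the corrected lambda shape (the name `toUnscaledBytes` is illustrative only, not from the PR):

```scala
import java.math.{BigDecimal => JBigDecimal}

// Illustrative only: mirrors the reviewed lambda's null-safe shape.
// `Option(v)` turns a possible null into None; `map` then runs only on a
// non-null value, bound as `d`, which is what the cast should target.
val toUnscaledBytes: Any => Array[Byte] = v =>
  Option(v)
    .map(d => d.asInstanceOf[JBigDecimal].unscaledValue().toByteArray)
    .orNull
```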
[GitHub] spark pull request #21547: [SPARK-24538][SQL] ByteArrayDecimalType support p...
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21547#discussion_r194950951

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFilterSuite.scala ---

```diff
@@ -359,6 +359,41 @@ class ParquetFilterSuite extends QueryTest with ParquetTest with SharedSQLContext {

+  test("filter pushdown - decimal(ByteArrayDecimalType)") {
+    val one = new java.math.BigDecimal(1)
+    val two = new java.math.BigDecimal(2)
+    val three = new java.math.BigDecimal(3)
+    val four = new java.math.BigDecimal(4)
```

--- End diff --

Hm, why don't we follow the style in this file with `implicit class`?
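For readers unfamiliar with the pattern being suggested: an `implicit class` adds extension methods so repeated constructor boilerplate like `new java.math.BigDecimal(...)` collapses into a short method call on the literal. This is a hypothetical illustration of that style, not the actual helper in `ParquetFilterSuite`.

```scala
import java.math.{BigDecimal => JBigDecimal}

// Hypothetical illustration of the `implicit class` style: an extension
// method on Int so test values read as `1.toJBigDecimal` instead of
// `new java.math.BigDecimal(1)` repeated for each literal.
object DecimalTestImplicits {
  implicit class IntToJBigDecimal(private val i: Int) extends AnyVal {
    def toJBigDecimal: JBigDecimal = new JBigDecimal(i)
  }
}

import DecimalTestImplicits._
val one = 1.toJBigDecimal
val two = 2.toJBigDecimal
```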
[GitHub] spark pull request #21547: [SPARK-24538][SQL] ByteArrayDecimalType support p...
GitHub user wangyum opened a pull request: https://github.com/apache/spark/pull/21547

[SPARK-24538][SQL] ByteArrayDecimalType support push down to the data sources

## What changes were proposed in this pull request?

[ByteArrayDecimalType](https://github.com/apache/spark/blob/e28eb431146bcdcaf02a6f6c406ca30920592a6a/sql/catalyst/src/main/scala/org/apache/spark/sql/types/DecimalType.scala#L230) support push down to the data sources.

## How was this patch tested?

Unit tests and manual tests.

**Manual tests**:

```scala
spark.range(1000).selectExpr(
    "id",
    "cast(id as decimal(9)) as d1",
    "cast(id as decimal(9, 2)) as d2",
    "cast(id as decimal(18)) as d3",
    "cast(id as decimal(18, 4)) as d4",
    "cast(id as decimal(38)) as d5",
    "cast(id as decimal(38, 18)) as d6")
  .coalesce(1)
  .write
  .option("parquet.block.size", 1048576)
  .parquet("/tmp/spark/parquet/decimal")

val df = spark.read.parquet("/tmp/spark/parquet/decimal/")
// Only reads about 1 MB of data
df.filter("d6 = 1").show
// Reads 174.3 MB of data
df.filter("d3 = 1").show
```

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/wangyum/spark SPARK-24538

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/21547.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #21547

commit 96066701ec75d3caa27994c47eab8ff64150b6a5
Author: Yuming Wang
Date: 2018-06-13T01:35:55Z

    ByteArrayDecimalType support push down to the data sources