GitHub user wangyum opened a pull request:
https://github.com/apache/spark/pull/21547
[SPARK-24538][SQL] ByteArrayDecimalType support push down to the data sources
## What changes were proposed in this pull request?
This PR adds push-down support to the data sources for
[ByteArrayDecimalType](https://github.com/apache/spark/blob/e28eb431146bcdcaf02a6f6c406ca30920592a6a/sql/catalyst/src/main/scala/org/apache/spark/sql/types/DecimalType.scala#L230),
i.e. decimals that are too wide to be stored as int or long values and are instead backed by byte arrays.
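For context, here is a minimal sketch of how such a decimal can be encoded as the fixed-length,
big-endian two's-complement byte array that Parquet uses for wide DECIMAL columns; `decimalToFixedLenBytes`
is a hypothetical helper for illustration, not the code added by this patch:
```scala
import java.math.BigDecimal

// Encode a decimal as the big-endian two's-complement bytes of its unscaled value,
// sign-extended to the fixed length Parquet uses for DECIMAL(precision, scale)
// columns backed by FIXED_LEN_BYTE_ARRAY.
def decimalToFixedLenBytes(d: BigDecimal, numBytes: Int): Array[Byte] = {
  val unscaled = d.unscaledValue().toByteArray  // minimal big-endian representation
  require(unscaled.length <= numBytes, s"value $d does not fit in $numBytes bytes")
  val out = new Array[Byte](numBytes)
  // Sign-extend: leading bytes are 0x00 for non-negative values, 0xFF for negative ones.
  val pad: Byte = if (d.signum() < 0) -1 else 0
  java.util.Arrays.fill(out, 0, numBytes - unscaled.length, pad)
  System.arraycopy(unscaled, 0, out, numBytes - unscaled.length, unscaled.length)
  out
}

// Example: a decimal(38, 18) value occupies 16 bytes in Parquet.
decimalToFixedLenBytes(new BigDecimal("10000").setScale(18), 16)
```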
## How was this patch tested?
Unit tests and manual tests.
**manual tests**:
```scala
spark.range(10000000)
  .selectExpr(
    "id",
    "cast(id as decimal(9)) as d1",
    "cast(id as decimal(9, 2)) as d2",
    "cast(id as decimal(18)) as d3",
    "cast(id as decimal(18, 4)) as d4",
    "cast(id as decimal(38)) as d5",
    "cast(id as decimal(38, 18)) as d6")
  .coalesce(1)
  .write
  .option("parquet.block.size", 1048576)
  .parquet("/tmp/spark/parquet/decimal")

val df = spark.read.parquet("/tmp/spark/parquet/decimal/")

// Reads only about 1 MB of data
df.filter("d6 = 10000").show

// Reads 174.3 MB of data
df.filter("d3 = 10000").show
```
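To confirm that the decimal predicate actually reaches the Parquet reader, one can also inspect the
physical plan; the `PushedFilters` text in the comment below is illustrative, not output copied from
this patch:
```scala
val df = spark.read.parquet("/tmp/spark/parquet/decimal/")
// The scan node should now list the decimal predicate under PushedFilters,
// e.g. something like: PushedFilters: [IsNotNull(d6), EqualTo(d6,10000.000000000000000000)]
df.filter("d6 = 10000").explain()
```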
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/wangyum/spark SPARK-24538
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/21547.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #21547
----
commit 96066701ec75d3caa27994c47eab8ff64150b6a5
Author: Yuming Wang <yumwang@...>
Date: 2018-06-13T01:35:55Z
ByteArrayDecimalType support push down to the data sources
----