GitHub user nongli opened a pull request:
https://github.com/apache/spark/pull/11397
Spark 13518
## What changes were proposed in this pull request?
WIP: Don't merge.
Change the default of the flag so that the vectorized Parquet scanner is enabled by default.
## How was this patch tested?
The new Parquet reader should be a drop-in replacement, so it will be exercised by the
existing tests.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/nongli/spark spark-13518
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/11397.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #11397
----
commit 858080b2626394b7dd975498690dcd0cfd27bf78
Author: Nong Li <[email protected]>
Date: 2016-02-25T07:43:31Z
[SPARK-13499][SQL] Performance improvements for parquet reader.
This patch includes these performance fixes:
- Remove unnecessary setNotNull() calls. The NULL bits are cleared
already.
- Speed up RLE group decoding
- Speed up dictionary decoding by decoding NULLs directly into the result.
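The first and third items can be illustrated with a minimal, self-contained sketch. This is not the code from this patch: the class `DictionaryDecodeSketch`, the `Run` type, and the plain arrays standing in for a column are all hypothetical, and the sketch assumes the null/non-null runs have already been read from the RLE-encoded definition levels. Because the null flags start out cleared, non-null rows need no per-row "set not null" call, and null runs are written straight into the result.

```java
import java.util.Arrays;

public class DictionaryDecodeSketch {
    /** Hypothetical run of rows that are either all null or all non-null. */
    static final class Run {
        final int count;
        final boolean isNull;

        Run(int count, boolean isNull) {
            this.count = count;
            this.isNull = isNull;
        }
    }

    /**
     * Decodes dictionary ids into values, writing nulls directly into the output.
     * dictIds holds ids for the non-null rows only, in row order.
     * isNull is assumed to be all false on entry, so it is never reset here.
     */
    static void decode(Run[] runs, int[] dictIds, long[] dictionary,
                       long[] values, boolean[] isNull) {
        int row = 0;    // next output row
        int idPos = 0;  // next unread dictionary id
        for (Run run : runs) {
            if (run.isNull) {
                // Mark the whole run as null in one call.
                Arrays.fill(isNull, row, row + run.count, true);
            } else {
                // Look up each id and store the value; the null flag is already clear.
                for (int i = 0; i < run.count; i++) {
                    values[row + i] = dictionary[dictIds[idPos++]];
                }
            }
            row += run.count;
        }
    }

    public static void main(String[] args) {
        long[] dictionary = {10L, 20L};
        int[] dictIds = {1, 0, 1};
        Run[] runs = {new Run(2, false), new Run(1, true), new Run(1, false)};
        long[] values = new long[4];
        boolean[] isNull = new boolean[4];
        decode(runs, dictIds, dictionary, values, isNull);
        System.out.println(Arrays.toString(values));  // [20, 10, 0, 20]
        System.out.println(Arrays.toString(isNull));  // [false, false, true, false]
    }
}
```

Processing whole runs at a time is what keeps the per-row work small; the actual reader operates on its own column vector types rather than on plain arrays.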
In addition to the updated benchmarks, the result of these changes on TPCDS
Q55 (sf40) is:
TPCDS:             Best/Avg Time(ms)    Rate(M/s)   Per Row(ns)
---------------------------------------------------------------------------------
q55 (Before)          6398 / 6616          18.0          55.5
q55 (After)           4983 / 5189          23.1          43.3
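To put the Q55 numbers in relative terms, the small sketch below derives the speedup and the per-row reduction from the figures reported above. The class and method names are hypothetical; only the four input numbers come from the benchmark output.

```java
import java.util.Locale;

public class Q55Speedup {
    /** Ratio of the old time to the new time; values above 1 mean the new code is faster. */
    static double speedup(double before, double after) {
        return before / after;
    }

    /** Percentage by which a cost dropped from before to after. */
    static double reductionPercent(double before, double after) {
        return 100.0 * (before - after) / before;
    }

    public static void main(String[] args) {
        // Best times in ms and per-row costs in ns, taken from the q55 rows above.
        System.out.println(String.format(Locale.ROOT,
            "best-time speedup: %.2fx", speedup(6398, 4983)));
        System.out.println(String.format(Locale.ROOT,
            "per-row reduction: %.1f%%", reductionPercent(55.5, 43.3)));
    }
}
```

On these figures the best-time speedup works out to roughly 1.28x and the per-row cost drops by about 22%.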
commit 385e6c867c9b3a05d571837df88be0429b9e5a8c
Author: Nong Li <[email protected]>
Date: 2016-02-26T18:59:42Z
Update benchmark headings.
commit 358af8c5f31fdde77792b2263725f426c4acc3bd
Author: Nong Li <[email protected]>
Date: 2016-02-26T19:04:53Z
Update ceil.
commit e8090740fe374977fd1ea2ed5e7c369827a46e7b
Author: Nong Li <[email protected]>
Date: 2016-02-26T19:07:20Z
[SPARK-13518][SQL] Enable vectorized parquet scanner by default.
----