GitHub user nongli opened a pull request:

    https://github.com/apache/spark/pull/11397

    Spark 13518

    ## What changes were proposed in this pull request?
    
    WIP: Don't merge.
    
    Change the default value of the flag so that the vectorized parquet scanner is enabled by default.
    
    ## How was this patch tested?
    
    The new parquet reader should be a drop-in replacement, so it will be exercised by the existing tests.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/nongli/spark spark-13518

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/11397.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #11397
    
----
commit 858080b2626394b7dd975498690dcd0cfd27bf78
Author: Nong Li <[email protected]>
Date:   2016-02-25T07:43:31Z

    [SPARK-13499][SQL] Performance improvements for parquet reader.
    
    This patch includes these performance fixes:
      - Remove unnecessary setNotNull() calls. The NULL bits are already cleared.
      - Speed up RLE group decoding
      - Speed up dictionary decoding by decoding NULLs directly into the result.
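The two decoding changes above can be illustrated with a simplified Java sketch. It is not Spark's code, and several things are assumptions: `DictDecodeSketch`, `RleReader` and `decode` are hypothetical names; both input streams are taken to be bare Parquet RLE/bit-packing hybrid runs (no length or bit-width prefix) with bit widths below 32; and the column is a flat, nullable INT32 column with dictionary-encoded values. Definition levels are consumed one run at a time. A repeated run of non-null levels looks dictionary values up straight into the result, a repeated run of NULLs is marked with a single `Arrays.fill`, and because the null flags start out cleared, non-null rows never need a `setNotNull()`-style write.

```java
import java.util.Arrays;

// Simplified, hypothetical sketch (not Spark's code): decodes a nullable INT32 column whose
// definition levels and dictionary ids both use Parquet's RLE/bit-packing hybrid encoding.
public class DictDecodeSketch {
  /** Reads a bare RLE/bit-packing hybrid stream (no length prefix); assumes bitWidth < 32. */
  static final class RleReader {
    private final byte[] in;
    private final int bitWidth, bytesPerRleValue;    // bytesPerRleValue = ceil(bitWidth / 8)
    private int pos, remaining, rleValue, packedIdx; // remaining: values left in current run
    private boolean isRle;                           // true: repeated value, false: bit-packed
    private int[] packed = new int[0];
    RleReader(byte[] in, int bitWidth) {
      this.in = in;
      this.bitWidth = bitWidth;
      this.bytesPerRleValue = (bitWidth + 7) / 8;
    }
    private int readUnsignedVarInt() {
      int result = 0, shift = 0, b;
      do {
        b = in[pos++] & 0xFF;
        result |= (b & 0x7F) << shift;
        shift += 7;
      } while ((b & 0x80) != 0);
      return result;
    }
    /** Loads the next run header; bit-packed runs are unpacked eagerly. */
    private void readNextGroup() {
      int header = readUnsignedVarInt();
      isRle = (header & 1) == 0;
      if (isRle) {                                   // RLE run: count, then one little-endian value
        remaining = header >>> 1;
        rleValue = 0;
        for (int i = 0; i < bytesPerRleValue; i++) rleValue |= (in[pos++] & 0xFF) << (8 * i);
        return;
      }
      int n = (header >>> 1) * 8, mask = (1 << bitWidth) - 1, bitsInBuffer = 0;
      long buffer = 0;                               // bit-packed: (header >>> 1) groups of 8
      packed = new int[n];
      for (int i = 0; i < n; i++) {                  // values are packed LSB first
        while (bitsInBuffer < bitWidth) {
          buffer |= (long) (in[pos++] & 0xFF) << bitsInBuffer;
          bitsInBuffer += 8;
        }
        packed[i] = (int) (buffer & mask);
        buffer >>>= bitWidth;
        bitsInBuffer -= bitWidth;
      }
      packedIdx = 0;
      remaining = n;
    }
    int next() {                                     // one value at a time (dictionary ids)
      if (remaining == 0) readNextGroup();
      remaining--;
      return isRle ? rleValue : packed[packedIdx++];
    }
  }

  // Decodes `total` rows one definition-level run at a time. Level == maxDef means non-null,
  // and the dictionary value is written straight into `values`. isNull is assumed to be
  // cleared already (new Java arrays are all false), so only NULL rows are written.
  static void decode(RleReader defLevels, RleReader dictIds, int[] dictionary,
                     int maxDef, int total, int[] values, boolean[] isNull) {
    int row = 0;
    while (row < total) {
      if (defLevels.remaining == 0) defLevels.readNextGroup();
      int n = Math.min(defLevels.remaining, total - row); // ignore bit-packed padding
      if (defLevels.isRle && defLevels.rleValue == maxDef) {
        for (int i = 0; i < n; i++) values[row + i] = dictionary[dictIds.next()];
      } else if (defLevels.isRle) {
        Arrays.fill(isNull, row, row + n, true);          // a whole run of NULLs at once
      } else {
        for (int i = 0; i < n; i++, defLevels.packedIdx++) {
          if (defLevels.packed[defLevels.packedIdx] == maxDef) {
            values[row + i] = dictionary[dictIds.next()];
          } else {
            isNull[row + i] = true;
          }
        }
      }
      defLevels.remaining -= n;
      row += n;
    }
  }
  public static void main(String[] args) {
    // Def levels, bitWidth 1: RLE 3x1, RLE 2x0, bit-packed 1,0,1,1,0 (+3 padding).
    byte[] defBytes = {0x06, 0x01, 0x04, 0x00, 0x03, 0x0D};
    // Dictionary ids, bitWidth 2: RLE 3x2, bit-packed 0,3,1 (+5 padding).
    byte[] idBytes = {0x06, 0x02, 0x03, 0x1C, 0x00};
    int[] values = new int[10];
    boolean[] isNull = new boolean[10];
    decode(new RleReader(defBytes, 1), new RleReader(idBytes, 2),
        new int[] {100, 200, 300, 400}, 1, 10, values, isNull);
    System.out.println(Arrays.toString(values)); // [300, 300, 300, 0, 0, 100, 0, 400, 200, 0]
    System.out.println(Arrays.toString(isNull)); // [false, false, false, true, true, false, true, false, false, true]
  }
}
```

Running `main` decodes ten rows: three non-null rows from one RLE run, two NULLs from another RLE run, and five rows from a bit-packed group whose three padding values are ignored. Eagerly unpacking a bit-packed group into an `int[]` keeps the sketch short; a production reader would more likely unpack in place.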
    
    In addition to the updated benchmarks, running TPCDS Q55 at scale factor 40 (sf40)
    with these changes gives:
    
    TPCDS:                      Best/Avg Time(ms)    Rate(M/s)    Per Row(ns)
    -------------------------------------------------------------------------
    q55 (Before)                      6398 / 6616         18.0           55.5
    q55 (After)                       4983 / 5189         23.1           43.3
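As a consistency check on the table, assume (as the column headings suggest) that Rate is rows processed per second at the best time and Per Row is its reciprocal. Under that assumption both rows imply roughly 115 M rows, and the change cuts best-case time by about 22%. The Before row presumably prints 55.5 rather than 10^3/18.0 ≈ 55.6 because the displayed rate is rounded.

```latex
\begin{aligned}
N &\approx 6.398\,\text{s} \times 18.0\,\text{M/s} \approx 4.983\,\text{s} \times 23.1\,\text{M/s} \approx 115\,\text{M rows} \\
\text{Per Row} &= \frac{10^{3}}{\text{Rate}}\ \text{ns}, \qquad \frac{10^{3}}{23.1} \approx 43.3\ \text{ns} \\
\text{Speedup (best)} &= \frac{6398}{4983} \approx 1.28, \qquad 1 - \frac{4983}{6398} \approx 0.221 \\
\text{Speedup (avg)} &= \frac{6616}{5189} \approx 1.275
\end{aligned}
```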

commit 385e6c867c9b3a05d571837df88be0429b9e5a8c
Author: Nong Li <[email protected]>
Date:   2016-02-26T18:59:42Z

    Update benchmark headings.

commit 358af8c5f31fdde77792b2263725f426c4acc3bd
Author: Nong Li <[email protected]>
Date:   2016-02-26T19:04:53Z

    Update ceil.

commit e8090740fe374977fd1ea2ed5e7c369827a46e7b
Author: Nong Li <[email protected]>
Date:   2016-02-26T19:07:20Z

    [SPARK-13518][SQL] Enable vectorized parquet scanner by default.

----

