GitHub user concretevitamin opened a pull request:

    https://github.com/apache/spark/pull/1408

    [SPARK-2443][SQL] Fix slow read from partitioned tables

    This fix achieves a performance boost comparable to [PR 
#1390](https://github.com/apache/spark/pull/1390) by hoisting an array update and 
the deserializer initialization out of a potentially very long loop. Suggested by 
@yhuai. The results below have been updated for this fix.
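    The change is classic loop-invariant hoisting: rather than re-running the array update and re-constructing the deserializer on every row, both happen once before the per-row loop. A minimal sketch of the pattern in Python (illustrative only — `Deserializer`, `rows`, and `mutable_row` are hypothetical stand-ins, not Spark's actual Scala code):
    
    ```python
    class Deserializer:
        """Stand-in for an expensive-to-construct Hive deserializer."""
        def __init__(self, schema):
            self.schema = schema  # imagine costly reflection/setup here
    
        def deserialize(self, raw):
            return dict(zip(self.schema, raw))
    
    def read_partition_slow(rows, schema, mutable_row):
        # Anti-pattern: loop-invariant work inside a potentially very long loop.
        out = []
        for raw in rows:
            deserializer = Deserializer(schema)  # re-built every iteration
            mutable_row[0] = "partition-value"   # same update, repeated per row
            out.append(deserializer.deserialize(raw))
        return out
    
    def read_partition_fast(rows, schema, mutable_row):
        # The fix: hoist both invariant steps out of the loop.
        deserializer = Deserializer(schema)      # built once per partition
        mutable_row[0] = "partition-value"       # updated once per partition
        return [deserializer.deserialize(raw) for raw in rows]
    ```
    
    Both variants return identical results; only the constant-factor work done per row changes, which is what the benchmarks below reflect.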
    
    ## Benchmarks
    I generated a local text file with 10M rows of simple key-value pairs and 
loaded it into Hive as a table. Results were obtained on my local machine using 
hive/console.
    
    Without the fix:
    
    Type | Non-partitioned | Partitioned (1 part)
    ------------ | ------------ | -------------
    First run | 9.52s end-to-end (1.64s Spark job) | 36.6s (28.3s)
    Stabilized runs | 1.21s (1.18s) | 27.6s (27.5s)
    
    With this fix:
    
    Type | Non-partitioned | Partitioned (1 part)
    ------------ | ------------ | -------------
    First run | 9.57s (1.46s) | 11.0s (1.69s)
    Stabilized runs | 1.13s (1.10s) | 1.23s (1.19s)
    


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/concretevitamin/spark slow-read-2

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/1408.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1408
    
----
commit d86e437218f99179934ccd9b4d5d89c02b09459d
Author: Zongheng Yang <[email protected]>
Date:   2014-07-14T18:03:07Z

    Move update & initialization out of potentially long loop.

----

