GitHub user ueshin opened a pull request:
https://github.com/apache/spark/pull/18989
[SPARK-21781][SQL] Modify DataSourceScanExec to use concrete ColumnVector type.
## What changes were proposed in this pull request?
As mentioned at
https://github.com/apache/spark/pull/18680#issuecomment-316820409, once we have
more `ColumnVector` implementations, there may (or may not) be a significant
performance penalty, because additional implementations can prevent the JIT from
inlining the accessors or can force virtual dispatch at the call sites.
On the read path, one of the major code paths is the one generated by
`ColumnBatchScan`. Currently the generated code refers to the abstract
`ColumnVector` type, so the penalty grows as more implementations are loaded.
However, the concrete type is often known from the usage site, e.g. the
vectorized Parquet reader uses `OnHeapColumnVector`. We can therefore reference
the concrete type directly in the generated code to avoid the penalty.
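To illustrate the idea, here is a minimal sketch of the difference between an
abstract-typed and a concrete-typed call site. This uses simplified stand-in
classes, not Spark's actual `ColumnVector` API or the code that
`ColumnBatchScan` generates; `ScanSketch`, `sumAbstract`, and `sumConcrete` are
hypothetical names for illustration only.

```scala
// Simplified stand-ins for the real column vector classes.
abstract class ColumnVector {
  def getInt(rowId: Int): Int
}

final class OnHeapColumnVector(data: Array[Int]) extends ColumnVector {
  def getInt(rowId: Int): Int = data(rowId)
}

final class OffHeapColumnVector(data: Array[Int]) extends ColumnVector {
  def getInt(rowId: Int): Int = data(rowId)
}

object ScanSketch {
  // Abstract-typed parameter: once several implementations are loaded,
  // the getInt call site can become polymorphic and harder to inline.
  def sumAbstract(col: ColumnVector, numRows: Int): Long = {
    var i = 0
    var sum = 0L
    while (i < numRows) {
      sum += col.getInt(i)
      i += 1
    }
    sum
  }

  // Concrete-typed parameter: the call site stays monomorphic, so the JIT
  // can devirtualize and inline getInt regardless of how many other
  // ColumnVector subclasses exist.
  def sumConcrete(col: OnHeapColumnVector, numRows: Int): Long = {
    var i = 0
    var sum = 0L
    while (i < numRows) {
      sum += col.getInt(i)
      i += 1
    }
    sum
  }
}
```

With only one implementation loaded the JIT can usually devirtualize both
versions; the concern addressed by this PR is the case where multiple
implementations exist and the scan loop's call site becomes polymorphic.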
## How was this patch tested?
Existing tests.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/ueshin/apache-spark issues/SPARK-21781
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/18989.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #18989
----
commit 6f19db753a4cba36f92bd225eb63b60080158054
Author: Takuya UESHIN <[email protected]>
Date: 2017-08-18T04:53:00Z
Modify DataSourceScanExec to use concrete ColumnVector type.
----