GitHub user cloud-fan opened a pull request:

    https://github.com/apache/spark/pull/21319

    [SPARK-24267][SQL] explicitly keep DataSourceReader in DataSourceV2Relation

    ## What changes were proposed in this pull request?
    
    To keep `DataSourceV2Relation` immutable, we don't put the
    `DataSourceReader` in the constructor, but make it a `lazy val` instead.
    Given how `lazy val` is implemented in Scala, we actually keep a
    `DataSourceReader` instance as a field of `DataSourceV2Relation`, and
    exclude it when defining equality of `DataSourceV2Relation`.
    
    This works, but has 2 problems:
    1. After the pushdown rule runs, if `DataSourceV2Relation` gets transformed
    and returns a new copy, we redo the pushdown and re-create the
    `DataSourceReader`.
    2. The pushdown logic is defined in `DataSourceV2Relation` instead of in
    the pushdown rule, which is a little counter-intuitive.
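    
    A minimal sketch of problem 1, using hypothetical stand-ins (`Relation`,
    `Reader`, `readersCreated`) rather than Spark's actual classes. It only
    assumes standard Scala semantics: a `lazy val` in a case class body is
    cached per instance, is not part of the generated equality, and is not
    carried over by `copy`.

```scala
// Hypothetical stand-ins for illustration; these are not Spark's classes.
object LazyValSketch {
  // Counts how many readers have been built.
  var readersCreated = 0

  class Reader(val pushedFilters: Seq[String])

  case class Relation(source: String, filters: Seq[String]) {
    // Defined in the class body, so it is ignored by the generated
    // equals/hashCode and is not carried over by copy().
    lazy val reader: Reader = {
      readersCreated += 1
      new Reader(filters)
    }
  }

  def main(args: Array[String]): Unit = {
    val original = Relation("src", Seq("a > 1"))
    original.reader // builds the reader
    original.reader // cached, no new reader

    val copied = original.copy() // e.g. produced by a later transformation
    copied.reader // the copy has its own lazy val, so a second reader is built

    println(s"readers created: $readersCreated")
    println(s"relations equal: ${original == copied}")
  }
}
```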
    
    This PR proposes to implement the `lazy val` ourselves: keep
    `DataSourceReader` as an optional parameter in the constructor of
    `DataSourceV2Relation`, exclude it from the equality definition, but
    include it when copying.
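    
    The idea can be sketched as follows. This is an illustration under
    assumptions, not the code in this patch: `Relation`, `Reader`,
    `optionalReader`, `pushDown` and `readersCreated` are hypothetical names,
    and overriding `equals`/`hashCode` by hand is just one way to exclude a
    constructor parameter from equality.

```scala
// Hypothetical stand-ins for illustration; these are not Spark's classes.
object OptionalReaderSketch {
  // Counts how many readers have been built.
  var readersCreated = 0

  class Reader(val pushedFilters: Seq[String])

  case class Relation(
      source: String,
      filters: Seq[String],
      optionalReader: Option[Reader] = None) {

    // Equality deliberately ignores optionalReader.
    override def equals(other: Any): Boolean = other match {
      case that: Relation => source == that.source && filters == that.filters
      case _ => false
    }

    override def hashCode(): Int = (source, filters).hashCode()
  }

  // Hypothetical pushdown step: builds the reader once and stores it in the
  // relation, so the pushdown logic lives in the rule, not in the relation.
  def pushDown(relation: Relation, filters: Seq[String]): Relation = {
    readersCreated += 1
    relation.copy(filters = filters, optionalReader = Some(new Reader(filters)))
  }

  def main(args: Array[String]): Unit = {
    val pushed = pushDown(Relation("src", Nil), Seq("a > 1"))

    // A later transformation that returns a new copy keeps the same reader,
    // because copy() carries over every constructor parameter by default.
    val transformed = pushed.copy(source = "src")

    println(s"readers created: $readersCreated")
    println(s"same reader: ${transformed.optionalReader.get eq pushed.optionalReader.get}")
    println(s"equal without reader: ${pushed == Relation("src", Seq("a > 1"))}")
  }
}
```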
    
    ## How was this patch tested?
    
    Existing tests.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/cloud-fan/spark refactor

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/21319.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #21319
    
----
commit ca6ccb24e8f2910a3ffc07a790f2ba7f57e79056
Author: Wenchen Fan <wenchen@...>
Date:   2018-04-24T17:26:19Z

    do not create DataSourceReader many times

----

