GitHub user sameeragarwal opened a pull request:

    https://github.com/apache/spark/pull/13419

    [SPARK-15678][SQL] Drop cache on appends and overwrites

    ## What changes were proposed in this pull request?
    
    Spark SQL currently does not drop cached data when the underlying files are appended to or 
    overwritten, so subsequent reads of the same path can silently return stale results. This PR fixes that behavior.
    
    ```scala
    val dir = "/tmp/test"
    sqlContext.range(1000).write.mode("overwrite").parquet(dir)
    val df = sqlContext.read.parquet(dir).cache()
    df.count() // outputs 1000
    sqlContext.range(10).write.mode("overwrite").parquet(dir)
    sqlContext.read.parquet(dir).count() // outputs 1000 instead of 10 <-- we are still using the stale cached dataset
    ```
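    
    With this change, the cached entry for the path is dropped when the files are overwritten, so the re-read reflects the new data (expected output below follows from the description above):
    
    ```scala
    sqlContext.range(10).write.mode("overwrite").parquet(dir)
    sqlContext.read.parquet(dir).count() // outputs 10 after the fix
    ```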
    
    ## How was this patch tested?
    
    Unit tests for overwrites and appends in `ParquetQuerySuite`.
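    
    A minimal sketch of what such a test might look like (the test name and the `withTempDir` helper are assumptions for illustration, not the actual additions to `ParquetQuerySuite`):
    
    ```scala
    test("caches are dropped when parquet data is overwritten") {
      withTempDir { dir =>                              // assumed temp-dir test helper
        val path = dir.getCanonicalPath
        sqlContext.range(1000).write.mode("overwrite").parquet(path)
        sqlContext.read.parquet(path).cache().count()   // populate the cache
        sqlContext.range(10).write.mode("overwrite").parquet(path)
        // after the fix, the stale cache entry is invalidated and the new data is read
        assert(sqlContext.read.parquet(path).count() === 10)
      }
    }
    ```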
    
    


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/sameeragarwal/spark drop-cache-on-write

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/13419.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #13419
    
----
commit ee631d2d98f72d99da00d8922fc4cf6a66cf063c
Author: Sameer Agarwal <[email protected]>
Date:   2016-05-31T18:27:41Z

    Drop cache on appends and overwrites

----

