Github user mengxr commented on the pull request:

    https://github.com/apache/spark/pull/13419
  
    I would prefer refreshing the dataset every time it is reloaded, while 
keeping existing cached ones unchanged.
    
    ~~~scala
    val df1 = sqlContext.read.parquet(dir).cache()
    df1.count() // outputs 1000
    sqlContext.range(10).write.mode("overwrite").parquet(dir)
    val df2 = sqlContext.read.parquet(dir)
    df2.count() // outputs 10
    df1.count() // still outputs 1000 because it was cached
    ~~~
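    
    As a side note, under this behavior a caller who does want the fresh data 
can drop the stale cached copy explicitly and re-read. A minimal sketch 
(`df3` is a hypothetical name; `unpersist()` is the existing DataFrame API 
for releasing a cached copy):
    
    ~~~scala
    df1.unpersist()  // release the stale cached data for dir
    val df3 = sqlContext.read.parquet(dir).cache()
    df3.count()      // reflects the rewritten data: outputs 10
    ~~~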
    
    Neither approach is perfectly safe, so I don't have a strong preference 
for either.

