Github user liancheng commented on a diff in the pull request:

    https://github.com/apache/spark/pull/3039#discussion_r19699722
  
    --- Diff: sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala ---
    @@ -267,6 +267,23 @@ class SQLContext(@transient val sparkContext: 
SparkContext)
       }
     
       /**
    +   * Unregisters the temporary table with the given table name in the 
catalog. If the table has been
    +   * cached/persisted before, it can be unpersisted if required.
    +   *
    +   * @param tableName the name of the table to be unregistered.
    +   * @param unpersist whether to unpersist the table if it has been 
cached/persisted before.
    +   *
    +   * @group userf
    +   */
    +  def unregisterTempTable(tableName: String, unpersist: Boolean = false): 
Unit = {
    --- End diff --
    
    But I do think this API is not intuitive. Another option is to ask users
to uncache the table explicitly before calling `dropTempTable`, and have
`dropTempTable` ignore caching entirely. But that calling sequence is not
thread-safe, and it's verbose and error-prone from the user's perspective.
    
    The ultimate solution would be another refinement of the caching semantics,
namely making cached columnar RDDs "reference counted": a cached columnar RDD
shouldn't be removed as long as at least one temporary table still uses it.
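    A minimal sketch of that "reference counted" idea, assuming a hypothetical
`RefCountedCache` helper (the class and method names are illustrative only, not
part of Spark's actual `CacheManager`):

```scala
import scala.collection.mutable

// Hypothetical sketch: track how many temporary tables reference a cached
// entry, and only allow unpersisting once the last reference is released.
// None of these names correspond to real Spark APIs.
class RefCountedCache[K] {
  private val refCounts = mutable.Map.empty[K, Int]

  // Registering a temporary table on top of cached data bumps its count.
  def retain(key: K): Unit = synchronized {
    refCounts(key) = refCounts.getOrElse(key, 0) + 1
  }

  // Dropping a table releases one reference. Returns true only when no
  // other table still uses the cached data, i.e. it is safe to unpersist.
  def release(key: K): Boolean = synchronized {
    val remaining = refCounts.getOrElse(key, 0) - 1
    if (remaining <= 0) {
      refCounts.remove(key)
      true // last reference gone: safe to unpersist
    } else {
      refCounts(key) = remaining
      false // other tables still reference the cached data
    }
  }
}
```

    With this scheme, two temporary tables sharing one cached RDD would each
`retain` it, and dropping either table alone would not unpersist the shared
cache.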
    
    Maybe for now we can just unpersist the cache, if any, and accept the risk
of losing a cache shared by multiple tables?

