Github user pwendell commented on a diff in the pull request:
https://github.com/apache/spark/pull/1681#discussion_r15628521
--- Diff: docs/sql-programming-guide.md ---
@@ -769,3 +769,15 @@ To start the Spark SQL CLI, run the following in the Spark directory:
Configuration of Hive is done by placing your `hive-site.xml` file in
`conf/`.
You may run `./bin/spark-sql --help` for a complete list of all available
options.
+
+# Cache tables
+
+Spark SQL can cache tables using an in-memory columnar format by calling `cacheTable("tableName")`.
--- End diff --
Some grammar fixes:
```
Spark SQL can cache tables using an in-memory columnar format by calling
`cacheTable("tableName")`. Spark SQL will then scan only the required columns
and automatically select the best compression to minimize memory usage and GC
pressure. You can call `uncacheTable("tableName")` to remove the table from
memory.

Note that if you call `cache` rather than `cacheTable`, tables will _not_ be
cached in the in-memory columnar format, so we strongly recommend using
`cacheTable` whenever you want to cache tables.
```
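To make the suggested wording concrete, a minimal Scala sketch of the calls it describes might look like the following (the variable names `sc`, `sqlContext`, and the table name `records` are assumptions for illustration, not taken from the diff):

```scala
// Sketch only: assumes an existing SparkContext `sc` and a table
// already registered under the name "records" (both hypothetical here).
import org.apache.spark.sql.SQLContext

val sqlContext = new SQLContext(sc)

// Cache the table in the in-memory columnar format.
sqlContext.cacheTable("records")

// Subsequent queries against "records" scan only the required columns
// from the cached columnar data.
val result = sqlContext.sql("SELECT name FROM records")

// Remove the table from memory when it is no longer needed.
sqlContext.uncacheTable("records")
```

Calling `result.cache()` instead of `cacheTable("records")` would cache the RDD of rows, not the columnar representation, which is the distinction the note above warns about.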