Github user pwendell commented on a diff in the pull request:
https://github.com/apache/spark/pull/1681#discussion_r15628521
--- Diff: docs/sql-programming-guide.md ---
@@ -769,3 +769,15 @@ To start the Spark SQL CLI, run the following in the Spark directory:
Configuration of Hive is done by placing your `hive-site.xml` file in
`conf/`.
You may run `./bin/spark-sql --help` for a complete list of all available
options.
+
+# Cache tables
+
+Spark SQL can cache tables using an in-memory columnar format by calling `cacheTable("tableName")`.
--- End diff --
Some grammar fixes:
```
Spark SQL can cache tables using an in-memory columnar format by calling
`cacheTable("tableName")`. Spark SQL will then scan only the required columns
and automatically select the best compression to minimize memory usage and GC
pressure. You can call `uncacheTable("tableName")` to remove the table from
memory.

Note that if you call `cache` rather than `cacheTable`, tables will _not_ be
cached in the in-memory columnar format, so we strongly recommend using
`cacheTable` whenever you want to cache tables.
```
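To make the suggested wording concrete, a minimal Scala sketch of the calls it describes might look like the following (the variable names `sc`, `sqlContext`, and the table name `records` are assumptions for illustration, not taken from the diff):

```scala
// Sketch only: assumes an existing SparkContext `sc` and a table
// already registered under the name "records" (both hypothetical here).
import org.apache.spark.sql.SQLContext

val sqlContext = new SQLContext(sc)

// Cache the table in the in-memory columnar format.
sqlContext.cacheTable("records")

// Subsequent queries against "records" scan only the required columns
// from the cached columnar data.
val result = sqlContext.sql("SELECT name FROM records")

// Remove the table from memory when it is no longer needed.
sqlContext.uncacheTable("records")
```

Calling `result.cache()` instead of `cacheTable("records")` would cache the RDD of rows, not the columnar representation, which is the distinction the note above warns about.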