GitHub user rxin commented on a diff in the pull request:
https://github.com/apache/spark/pull/4958#discussion_r26095307
--- Diff: docs/sql-programming-guide.md ---
@@ -662,8 +662,142 @@ for name in names.collect():
Spark SQL supports operating on a variety of data sources through the `DataFrame` interface.
A DataFrame can be operated on as a normal RDD and can also be registered as a temporary table.
Registering a DataFrame as a table allows you to run SQL queries over its data. This section
-describes the various methods for loading data into a DataFrame.
+describes the general methods for loading and saving data using the Spark Data Sources and then
+goes into specific options that are available for the built-in data sources.
+## Generic Load/Save Functions
+
+In the simplest form, the default data source (`parquet` unless otherwise configured by
+`spark.sql.sources.default`) will be used for all operations.
+
+<div class="codetabs">
+<div data-lang="scala" markdown="1">
+
+{% highlight scala %}
+val df = sqlContext.load("people.parquet")
+df.select("name", "age").save("namesAndAges.parquet")
+{% endhighlight %}
+
+</div>
+
+<div data-lang="java" markdown="1">
+
+{% highlight java %}
+
+DataFrame df = sqlContext.load("people.parquet");
+df.select("name", "age").save("namesAndAges.parquet");
+
+{% endhighlight %}
+
+</div>
+
+<div data-lang="python" markdown="1">
+
+{% highlight python %}
+
+df = sqlContext.load("people.parquet")
+df.select("name", "age").save("namesAndAges.parquet")
+
+{% endhighlight %}
+
+</div>
+</div>
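+
+The default data source can also be changed programmatically; a minimal sketch, assuming
+`SQLContext`'s `setConf(key, value)` method:
+
+{% highlight scala %}
+// Example: make JSON the default source for subsequent load/save calls
+// that do not specify a data source explicitly.
+sqlContext.setConf("spark.sql.sources.default", "json")
+{% endhighlight %}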
+
+### Manually Specifying Options
+
+You can also manually specify the data source that will be used along with any extra options
+that you would like to pass to the data source. Data sources are specified by their fully qualified
+name (e.g., `org.apache.spark.sql.parquet`), but for built-in sources you can also use their short
+names (`json`, `parquet`, `jdbc`). DataFrames of any type can be converted into other types
+using this syntax.
+
+<div class="codetabs">
+<div data-lang="scala" markdown="1">
+
+{% highlight scala %}
+val df = sqlContext.load("json", "people.json")
+df.select("name", "age").save("parquet", "namesAndAges.parquet")
--- End diff --
data source should be the last argument (we did it for python)
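i.e., presumably something like the following (a sketch of the suggested argument order, not the current API):

{% highlight scala %}
val df = sqlContext.load("people.json", "json")
df.select("name", "age").save("namesAndAges.parquet", "parquet")
{% endhighlight %}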