GitHub user rxin commented on a diff in the pull request:
https://github.com/apache/spark/pull/4958#discussion_r26095307
--- Diff: docs/sql-programming-guide.md ---
@@ -662,8 +662,142 @@ for name in names.collect():
Spark SQL supports operating on a variety of data sources through the `DataFrame` interface.
A DataFrame can be operated on as a normal RDD and can also be registered as a temporary table.
Registering a DataFrame as a table allows you to run SQL queries over its data. This section
-describes the various methods for loading data into a DataFrame.
+describes the general methods for loading and saving data using the Spark Data Sources and then
+goes into specific options that are available for the built-in data sources.
+## Generic Load/Save Functions
+
+In the simplest form, the default data source (`parquet` unless otherwise configured by
+`spark.sql.sources.default`) will be used for all operations.
+
+<div class="codetabs">
+<div data-lang="scala" markdown="1">
+
+{% highlight scala %}
+val df = sqlContext.load("people.parquet")
+df.select("name", "age").save("namesAndAges.parquet")
+{% endhighlight %}
+
+</div>
+
+<div data-lang="java" markdown="1">
+
+{% highlight java %}
+
+DataFrame df = sqlContext.load("people.parquet");
+df.select("name", "age").save("namesAndAges.parquet");
+
+{% endhighlight %}
+
+</div>
+
+<div data-lang="python" markdown="1">
+
+{% highlight python %}
+
+df = sqlContext.load("people.parquet")
+df.select("name", "age").save("namesAndAges.parquet")
+
+{% endhighlight %}
+
+</div>
+</div>
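+
+The default data source can also be changed programmatically; a minimal sketch, assuming
+`SQLContext`'s `setConf(key, value)` method:
+
+{% highlight scala %}
+// Example: make JSON the default source for subsequent load/save calls
+// that do not specify a data source explicitly.
+sqlContext.setConf("spark.sql.sources.default", "json")
+{% endhighlight %}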
+
+### Manually Specifying Options
+
+You can also manually specify the data source that will be used along with any extra options
+that you would like to pass to the data source. Data sources are specified by their fully qualified
+name (e.g., `org.apache.spark.sql.parquet`), but for built-in sources you can also use their short
+names (`json`, `parquet`, `jdbc`). DataFrames of any type can be converted into other types
+using this syntax.
+
+<div class="codetabs">
+<div data-lang="scala" markdown="1">
+
+{% highlight scala %}
+val df = sqlContext.load("json", "people.json")
+df.select("name", "age").save("parquet", "namesAndAges.parquet")
--- End diff --
data source should be the last argument (we did it for python)
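i.e., presumably something like the following (a sketch of the suggested argument order, not the current API):

{% highlight scala %}
val df = sqlContext.load("people.json", "json")
df.select("name", "age").save("namesAndAges.parquet", "parquet")
{% endhighlight %}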