spark git commit: [SPARK-16557][SQL] Remove stale doc in sql/README.md
Repository: spark Updated Branches: refs/heads/branch-2.0 aa4690b1b -> c5f935582 [SPARK-16557][SQL] Remove stale doc in sql/README.md ## What changes were proposed in this pull request? Most of the documentation in https://github.com/apache/spark/blob/master/sql/README.md is stale. It would be useful to keep the list of projects to explain what's going on, and everything else should be removed. ## How was this patch tested? N/A Author: Reynold Xin Closes #14211 from rxin/SPARK-16557. (cherry picked from commit 2e4075e2ece9574100c79558cab054485e25c2ee) Signed-off-by: Reynold Xin Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/c5f93558 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/c5f93558 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/c5f93558 Branch: refs/heads/branch-2.0 Commit: c5f935582c07787271dcabcfbd6a7b8e776d607a Parents: aa4690b Author: Reynold Xin Authored: Thu Jul 14 19:24:42 2016 -0700 Committer: Reynold Xin Committed: Thu Jul 14 19:24:47 2016 -0700 -- sql/README.md | 75 +- 1 file changed, 1 insertion(+), 74 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/c5f93558/sql/README.md -- diff --git a/sql/README.md b/sql/README.md index b090398..58e9097 100644 --- a/sql/README.md +++ b/sql/README.md @@ -1,83 +1,10 @@ Spark SQL = -This module provides support for executing relational queries expressed in either SQL or a LINQ-like Scala DSL. +This module provides support for executing relational queries expressed in either SQL or the DataFrame/Dataset API. Spark SQL is broken up into four subprojects: - Catalyst (sql/catalyst) - An implementation-agnostic framework for manipulating trees of relational operators and expressions. - Execution (sql/core) - A query planner / execution engine for translating Catalyst's logical query plans into Spark RDDs. This component also includes a new public interface, SQLContext, that allows users to execute SQL or LINQ statements against existing RDDs and Parquet files. - Hive Support (sql/hive) - Includes an extension of SQLContext called HiveContext that allows users to write queries using a subset of HiveQL and access data from a Hive Metastore using Hive SerDes. There are also wrappers that allows users to run queries that include Hive UDFs, UDAFs, and UDTFs. - HiveServer and CLI support (sql/hive-thriftserver) - Includes support for the SQL CLI (bin/spark-sql) and a HiveServer2 (for JDBC/ODBC) compatible server. - - -Other dependencies for developers -- -In order to create new hive test cases (i.e. a test suite based on `HiveComparisonTest`), -you will need to setup your development environment based on the following instructions. - -If you are working with Hive 0.12.0, you will need to set several environmental variables as follows. - -``` -export HIVE_HOME="/hive/build/dist" -export HIVE_DEV_HOME="/hive/" -export HADOOP_HOME="/hadoop" -``` - -If you are working with Hive 0.13.1, the following steps are needed: - -1. Download Hive's [0.13.1](https://archive.apache.org/dist/hive/hive-0.13.1) and set `HIVE_HOME` with `export HIVE_HOME=""`. Please do not set `HIVE_DEV_HOME` (See [SPARK-4119](https://issues.apache.org/jira/browse/SPARK-4119)). -2. Set `HADOOP_HOME` with `export HADOOP_HOME=""` -3. Download all Hive 0.13.1a jars (Hive jars actually used by Spark) from [here](http://mvnrepository.com/artifact/org.spark-project.hive) and replace corresponding original 0.13.1 jars in `$HIVE_HOME/lib`. -4. Download [Kryo 2.21 jar](http://mvnrepository.com/artifact/com.esotericsoftware.kryo/kryo/2.21) (Note: 2.22 jar does not work) and [Javolution 5.5.1 jar](http://mvnrepository.com/artifact/javolution/javolution/5.5.1) to `$HIVE_HOME/lib`. -5. This step is optional. But, when generating golden answer files, if a Hive query fails and you find that Hive tries to talk to HDFS or you find weird runtime NPEs, set the following in your test suite... - -``` -val testTempDir = Utils.createTempDir() -// We have to use kryo to let Hive correctly serialize some plans. -sql("set hive.plan.serialization.format=kryo") -// Explicitly set fs to local fs. -sql(s"set fs.default.name=file://$testTempDir/") -// Ask Hive to run jobs in-process as a single map and reduce task. -sql("set mapred.job.tracker=local") -``` - -Using the console -= -An interactive scala console can be invoked by running `build/sbt hive/console`. -From here you can execute queries with HiveQl and manipulate DataFrame by using DSL. - -```scala -$ build/sbt hive/console - -[info] Starting scala interpreter... -import org.apache.spark.sql.cataly
spark git commit: [SPARK-16557][SQL] Remove stale doc in sql/README.md
Repository: spark Updated Branches: refs/heads/master 972673aca -> 2e4075e2e [SPARK-16557][SQL] Remove stale doc in sql/README.md ## What changes were proposed in this pull request? Most of the documentation in https://github.com/apache/spark/blob/master/sql/README.md is stale. It would be useful to keep the list of projects to explain what's going on, and everything else should be removed. ## How was this patch tested? N/A Author: Reynold Xin Closes #14211 from rxin/SPARK-16557. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/2e4075e2 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/2e4075e2 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/2e4075e2 Branch: refs/heads/master Commit: 2e4075e2ece9574100c79558cab054485e25c2ee Parents: 972673a Author: Reynold Xin Authored: Thu Jul 14 19:24:42 2016 -0700 Committer: Reynold Xin Committed: Thu Jul 14 19:24:42 2016 -0700 -- sql/README.md | 75 +- 1 file changed, 1 insertion(+), 74 deletions(-) -- http://git-wip-us.apache.org/repos/asf/spark/blob/2e4075e2/sql/README.md -- diff --git a/sql/README.md b/sql/README.md index b090398..58e9097 100644 --- a/sql/README.md +++ b/sql/README.md @@ -1,83 +1,10 @@ Spark SQL = -This module provides support for executing relational queries expressed in either SQL or a LINQ-like Scala DSL. +This module provides support for executing relational queries expressed in either SQL or the DataFrame/Dataset API. Spark SQL is broken up into four subprojects: - Catalyst (sql/catalyst) - An implementation-agnostic framework for manipulating trees of relational operators and expressions. - Execution (sql/core) - A query planner / execution engine for translating Catalyst's logical query plans into Spark RDDs. This component also includes a new public interface, SQLContext, that allows users to execute SQL or LINQ statements against existing RDDs and Parquet files. - Hive Support (sql/hive) - Includes an extension of SQLContext called HiveContext that allows users to write queries using a subset of HiveQL and access data from a Hive Metastore using Hive SerDes. There are also wrappers that allows users to run queries that include Hive UDFs, UDAFs, and UDTFs. - HiveServer and CLI support (sql/hive-thriftserver) - Includes support for the SQL CLI (bin/spark-sql) and a HiveServer2 (for JDBC/ODBC) compatible server. - - -Other dependencies for developers -- -In order to create new hive test cases (i.e. a test suite based on `HiveComparisonTest`), -you will need to setup your development environment based on the following instructions. - -If you are working with Hive 0.12.0, you will need to set several environmental variables as follows. - -``` -export HIVE_HOME="/hive/build/dist" -export HIVE_DEV_HOME="/hive/" -export HADOOP_HOME="/hadoop" -``` - -If you are working with Hive 0.13.1, the following steps are needed: - -1. Download Hive's [0.13.1](https://archive.apache.org/dist/hive/hive-0.13.1) and set `HIVE_HOME` with `export HIVE_HOME=""`. Please do not set `HIVE_DEV_HOME` (See [SPARK-4119](https://issues.apache.org/jira/browse/SPARK-4119)). -2. Set `HADOOP_HOME` with `export HADOOP_HOME=""` -3. Download all Hive 0.13.1a jars (Hive jars actually used by Spark) from [here](http://mvnrepository.com/artifact/org.spark-project.hive) and replace corresponding original 0.13.1 jars in `$HIVE_HOME/lib`. -4. Download [Kryo 2.21 jar](http://mvnrepository.com/artifact/com.esotericsoftware.kryo/kryo/2.21) (Note: 2.22 jar does not work) and [Javolution 5.5.1 jar](http://mvnrepository.com/artifact/javolution/javolution/5.5.1) to `$HIVE_HOME/lib`. -5. This step is optional. But, when generating golden answer files, if a Hive query fails and you find that Hive tries to talk to HDFS or you find weird runtime NPEs, set the following in your test suite... - -``` -val testTempDir = Utils.createTempDir() -// We have to use kryo to let Hive correctly serialize some plans. -sql("set hive.plan.serialization.format=kryo") -// Explicitly set fs to local fs. -sql(s"set fs.default.name=file://$testTempDir/") -// Ask Hive to run jobs in-process as a single map and reduce task. -sql("set mapred.job.tracker=local") -``` - -Using the console -= -An interactive scala console can be invoked by running `build/sbt hive/console`. -From here you can execute queries with HiveQl and manipulate DataFrame by using DSL. - -```scala -$ build/sbt hive/console - -[info] Starting scala interpreter... -import org.apache.spark.sql.catalyst.analysis._ -import org.apache.spark.sql.catalyst.dsl._ -import org.apache.spark.sql.catalyst.errors._ -