spark git commit: [SPARK-16557][SQL] Remove stale doc in sql/README.md

2016-07-14 Thread rxin

Repository: spark
Updated Branches:
  refs/heads/branch-2.0 aa4690b1b -> c5f935582


[SPARK-16557][SQL] Remove stale doc in sql/README.md

## What changes were proposed in this pull request?
Most of the documentation in 
https://github.com/apache/spark/blob/master/sql/README.md is stale. It would be 
useful to keep the list of projects to explain what's going on, and everything 
else should be removed.

## How was this patch tested?
N/A

Author: Reynold Xin 

Closes #14211 from rxin/SPARK-16557.

(cherry picked from commit 2e4075e2ece9574100c79558cab054485e25c2ee)
Signed-off-by: Reynold Xin 


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/c5f93558
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/c5f93558
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/c5f93558

Branch: refs/heads/branch-2.0
Commit: c5f935582c07787271dcabcfbd6a7b8e776d607a
Parents: aa4690b
Author: Reynold Xin 
Authored: Thu Jul 14 19:24:42 2016 -0700
Committer: Reynold Xin 
Committed: Thu Jul 14 19:24:47 2016 -0700

--
 sql/README.md | 75 +-
 1 file changed, 1 insertion(+), 74 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/c5f93558/sql/README.md
--
diff --git a/sql/README.md b/sql/README.md
index b090398..58e9097 100644
--- a/sql/README.md
+++ b/sql/README.md
@@ -1,83 +1,10 @@
 Spark SQL
 =
 
-This module provides support for executing relational queries expressed in either SQL or a LINQ-like Scala DSL.
+This module provides support for executing relational queries expressed in either SQL or the DataFrame/Dataset API.
 
 Spark SQL is broken up into four subprojects:
  - Catalyst (sql/catalyst) - An implementation-agnostic framework for manipulating trees of relational operators and expressions.
  - Execution (sql/core) - A query planner / execution engine for translating Catalyst's logical query plans into Spark RDDs.  This component also includes a new public interface, SQLContext, that allows users to execute SQL or LINQ statements against existing RDDs and Parquet files.
  - Hive Support (sql/hive) - Includes an extension of SQLContext called HiveContext that allows users to write queries using a subset of HiveQL and access data from a Hive Metastore using Hive SerDes.  There are also wrappers that allows users to run queries that include Hive UDFs, UDAFs, and UDTFs.
  - HiveServer and CLI support (sql/hive-thriftserver) - Includes support for the SQL CLI (bin/spark-sql) and a HiveServer2 (for JDBC/ODBC) compatible server.
-
-
-Other dependencies for developers
--
-In order to create new hive test cases (i.e. a test suite based on `HiveComparisonTest`),
-you will need to setup your development environment based on the following instructions.
-
-If you are working with Hive 0.12.0, you will need to set several environmental variables as follows.
-
-```
-export HIVE_HOME="/hive/build/dist"
-export HIVE_DEV_HOME="/hive/"
-export HADOOP_HOME="/hadoop"
-```
-
-If you are working with Hive 0.13.1, the following steps are needed:
-
-1. Download Hive's [0.13.1](https://archive.apache.org/dist/hive/hive-0.13.1) and set `HIVE_HOME` with `export HIVE_HOME=""`. Please do not set `HIVE_DEV_HOME` (See [SPARK-4119](https://issues.apache.org/jira/browse/SPARK-4119)).
-2. Set `HADOOP_HOME` with `export HADOOP_HOME=""`
-3. Download all Hive 0.13.1a jars (Hive jars actually used by Spark) from [here](http://mvnrepository.com/artifact/org.spark-project.hive) and replace corresponding original 0.13.1 jars in `$HIVE_HOME/lib`.
-4. Download [Kryo 2.21 jar](http://mvnrepository.com/artifact/com.esotericsoftware.kryo/kryo/2.21) (Note: 2.22 jar does not work) and [Javolution 5.5.1 jar](http://mvnrepository.com/artifact/javolution/javolution/5.5.1) to `$HIVE_HOME/lib`.
-5. This step is optional. But, when generating golden answer files, if a Hive query fails and you find that Hive tries to talk to HDFS or you find weird runtime NPEs, set the following in your test suite...
-
-```
-val testTempDir = Utils.createTempDir()
-// We have to use kryo to let Hive correctly serialize some plans.
-sql("set hive.plan.serialization.format=kryo")
-// Explicitly set fs to local fs.
-sql(s"set fs.default.name=file://$testTempDir/")
-// Ask Hive to run jobs in-process as a single map and reduce task.
-sql("set mapred.job.tracker=local")
-```
-
-Using the console
-=
-An interactive scala console can be invoked by running `build/sbt hive/console`.
-From here you can execute queries with HiveQl and manipulate DataFrame by using DSL.
-
-```scala
-$ build/sbt hive/console
-
-[info] Starting scala interpreter...
-import org.apache.spark.sql.cataly

spark git commit: [SPARK-16557][SQL] Remove stale doc in sql/README.md

2016-07-14 Thread rxin

Repository: spark
Updated Branches:
  refs/heads/master 972673aca -> 2e4075e2e


[SPARK-16557][SQL] Remove stale doc in sql/README.md

## What changes were proposed in this pull request?
Most of the documentation in 
https://github.com/apache/spark/blob/master/sql/README.md is stale. It would be 
useful to keep the list of projects to explain what's going on, and everything 
else should be removed.

## How was this patch tested?
N/A

Author: Reynold Xin 

Closes #14211 from rxin/SPARK-16557.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/2e4075e2
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/2e4075e2
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/2e4075e2

Branch: refs/heads/master
Commit: 2e4075e2ece9574100c79558cab054485e25c2ee
Parents: 972673a
Author: Reynold Xin 
Authored: Thu Jul 14 19:24:42 2016 -0700
Committer: Reynold Xin 
Committed: Thu Jul 14 19:24:42 2016 -0700

--
 sql/README.md | 75 +-
 1 file changed, 1 insertion(+), 74 deletions(-)
--


http://git-wip-us.apache.org/repos/asf/spark/blob/2e4075e2/sql/README.md
--
diff --git a/sql/README.md b/sql/README.md
index b090398..58e9097 100644
--- a/sql/README.md
+++ b/sql/README.md
@@ -1,83 +1,10 @@
 Spark SQL
 =
 
-This module provides support for executing relational queries expressed in either SQL or a LINQ-like Scala DSL.
+This module provides support for executing relational queries expressed in either SQL or the DataFrame/Dataset API.
 
 Spark SQL is broken up into four subprojects:
  - Catalyst (sql/catalyst) - An implementation-agnostic framework for manipulating trees of relational operators and expressions.
  - Execution (sql/core) - A query planner / execution engine for translating Catalyst's logical query plans into Spark RDDs.  This component also includes a new public interface, SQLContext, that allows users to execute SQL or LINQ statements against existing RDDs and Parquet files.
  - Hive Support (sql/hive) - Includes an extension of SQLContext called HiveContext that allows users to write queries using a subset of HiveQL and access data from a Hive Metastore using Hive SerDes.  There are also wrappers that allows users to run queries that include Hive UDFs, UDAFs, and UDTFs.
  - HiveServer and CLI support (sql/hive-thriftserver) - Includes support for the SQL CLI (bin/spark-sql) and a HiveServer2 (for JDBC/ODBC) compatible server.
-
-
-Other dependencies for developers
--
-In order to create new hive test cases (i.e. a test suite based on `HiveComparisonTest`),
-you will need to setup your development environment based on the following instructions.
-
-If you are working with Hive 0.12.0, you will need to set several environmental variables as follows.
-
-```
-export HIVE_HOME="/hive/build/dist"
-export HIVE_DEV_HOME="/hive/"
-export HADOOP_HOME="/hadoop"
-```
-
-If you are working with Hive 0.13.1, the following steps are needed:
-
-1. Download Hive's [0.13.1](https://archive.apache.org/dist/hive/hive-0.13.1) and set `HIVE_HOME` with `export HIVE_HOME=""`. Please do not set `HIVE_DEV_HOME` (See [SPARK-4119](https://issues.apache.org/jira/browse/SPARK-4119)).
-2. Set `HADOOP_HOME` with `export HADOOP_HOME=""`
-3. Download all Hive 0.13.1a jars (Hive jars actually used by Spark) from [here](http://mvnrepository.com/artifact/org.spark-project.hive) and replace corresponding original 0.13.1 jars in `$HIVE_HOME/lib`.
-4. Download [Kryo 2.21 jar](http://mvnrepository.com/artifact/com.esotericsoftware.kryo/kryo/2.21) (Note: 2.22 jar does not work) and [Javolution 5.5.1 jar](http://mvnrepository.com/artifact/javolution/javolution/5.5.1) to `$HIVE_HOME/lib`.
-5. This step is optional. But, when generating golden answer files, if a Hive query fails and you find that Hive tries to talk to HDFS or you find weird runtime NPEs, set the following in your test suite...
-
-```
-val testTempDir = Utils.createTempDir()
-// We have to use kryo to let Hive correctly serialize some plans.
-sql("set hive.plan.serialization.format=kryo")
-// Explicitly set fs to local fs.
-sql(s"set fs.default.name=file://$testTempDir/")
-// Ask Hive to run jobs in-process as a single map and reduce task.
-sql("set mapred.job.tracker=local")
-```
-
-Using the console
-=
-An interactive scala console can be invoked by running `build/sbt hive/console`.
-From here you can execute queries with HiveQl and manipulate DataFrame by using DSL.
-
-```scala
-$ build/sbt hive/console
-
-[info] Starting scala interpreter...
-import org.apache.spark.sql.catalyst.analysis._
-import org.apache.spark.sql.catalyst.dsl._
-import org.apache.spark.sql.catalyst.errors._
-
