[GitHub] spark pull request: [SPARK-2410][SQL][WIP] Cherry picked Hive Thri...

2014-07-23 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1399#issuecomment-49865796
  
QA tests have started for PR 1399. This patch merges cleanly.
View progress:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17032/consoleFull


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: [SPARK-2410][SQL][WIP] Cherry picked Hive Thri...

2014-07-23 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1399#issuecomment-49873447
  
QA results for PR 1399:
- This patch PASSES unit tests.
- This patch merges cleanly.
- This patch adds the following public classes (experimental):
  class SparkSQLOperationManager(hiveContext: HiveContext) extends OperationManager with Logging {

For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/17032/consoleFull




[GitHub] spark pull request: [SPARK-2410][SQL][WIP] Cherry picked Hive Thri...

2014-07-23 Thread marmbrus
Github user marmbrus commented on the pull request:

https://github.com/apache/spark/pull/1399#issuecomment-49959455
  
We will continue to improve the migration guide, but this LGTM for now.




[GitHub] spark pull request: [SPARK-2410][SQL][WIP] Cherry picked Hive Thri...

2014-07-23 Thread marmbrus
Github user marmbrus commented on the pull request:

https://github.com/apache/spark/pull/1399#issuecomment-49959642
  
@liancheng are you ready to merge this?  Can you remove [WIP] if so?




[GitHub] spark pull request: [SPARK-2410][SQL][WIP] Cherry picked Hive Thri...

2014-07-22 Thread pwendell
Github user pwendell commented on a diff in the pull request:

https://github.com/apache/spark/pull/1399#discussion_r15213826
  
--- Diff: docs/sql-programming-guide.md ---
@@ -573,4 +572,170 @@ prefixed with a tick (`'`).  Implicit conversions turn these symbols into expressions that can be
 evaluated by the SQL execution engine.  A full list of the functions supported can be found in the
 [ScalaDoc](api/scala/index.html#org.apache.spark.sql.SchemaRDD).
 
-<!-- TODO: Include the table of operations here. -->
\ No newline at end of file
+<!-- TODO: Include the table of operations here. -->
+
+## Running the Thrift JDBC server
+
+The Thrift JDBC server implemented here corresponds to the [`HiveServer2`](https://cwiki.apache.org/confluence/display/Hive/Setting+Up+HiveServer2) in Hive 0.12. You can test the JDBC server with the beeline script that comes with either Spark or Hive 0.12.
+
+To start the JDBC server, run the following in the Spark directory:
+
+    ./sbin/start-thriftserver.sh
+
+The default port the server listens on is 10000. Now you can use beeline to test the Thrift JDBC server:
+
+    ./bin/beeline
+
+Connect to the JDBC server in beeline with:
+
+    beeline> !connect jdbc:hive2://localhost:10000
+
+Beeline will ask you for a username and password. In non-secure mode, simply enter the username on your machine and a blank password. For secure mode, please follow the instructions given in the [beeline documentation](https://cwiki.apache.org/confluence/display/Hive/HiveServer2+Clients).
+
+Configuration of Hive is done by placing your `hive-site.xml` file in `conf/`.
+
+You may also use the beeline script that comes with Hive.
+
+### Migration Guide for Shark Users
+
+#### Reducer number
+
+In Shark, the default reducer number is 1 and can be tuned by the property `mapred.reduce.tasks`. In Spark SQL, the reducer number defaults to 200 and can be customized by the `spark.sql.shuffle.partitions` property:
--- End diff --

Yep, exactly




[GitHub] spark pull request: [SPARK-2410][SQL][WIP] Cherry picked Hive Thri...

2014-07-22 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1399#issuecomment-49715371
  
QA tests have started for PR 1399. This patch merges cleanly.
View progress:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16959/consoleFull




[GitHub] spark pull request: [SPARK-2410][SQL][WIP] Cherry picked Hive Thri...

2014-07-22 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1399#issuecomment-49721444
  
QA results for PR 1399:
- This patch FAILED unit tests.
- This patch merges cleanly.
- This patch adds the following public classes (experimental):
  class SparkSQLOperationManager(hiveContext: HiveContext) extends OperationManager with Logging {

For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16959/consoleFull




[GitHub] spark pull request: [SPARK-2410][SQL][WIP] Cherry picked Hive Thri...

2014-07-22 Thread liancheng
Github user liancheng commented on a diff in the pull request:

https://github.com/apache/spark/pull/1399#discussion_r15222777
  
--- Diff: bin/beeline ---
@@ -0,0 +1,45 @@
+#!/usr/bin/env bash
+
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+# Figure out where Spark is installed
+FWDIR="$(cd `dirname $0`/..; pwd)"
+
+# Find the java binary
+if [ -n "${JAVA_HOME}" ]; then
--- End diff --

Hit a bug while fixing this: application options with names that `SparkSubmitArguments` recognizes are stolen by `SparkSubmit` instead of being passed to the application. For example, when running `BeeLine` with `spark-submit`, passing the `--help` option shows the usage message of `SparkSubmit` rather than `BeeLine`.

Since the `spark-internal` issue also touches this part of the code, I tried to fix this bug within this PR to avoid further conflicts.
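As an aside on scripts like the one quoted above: the quoting in tests such as `[ -n "${JAVA_HOME}" ]` is load-bearing, because the unquoted and quoted forms behave differently when the variable is unset. A minimal standalone sketch (hypothetical variable name, not from the PR):

```shell
# Demonstrate the difference between unquoted and quoted -n tests on an
# unset variable. DEMO_HOME is illustrative, not a real Spark variable.
unset DEMO_HOME

if [ -n ${DEMO_HOME} ]; then
  unquoted=yes   # taken: the expansion vanishes, so `[ -n ]` tests "-n" itself
else
  unquoted=no
fi

if [ -n "${DEMO_HOME}" ]; then
  quoted=yes
else
  quoted=no      # taken: the quoted expansion is an empty string
fi

echo "unquoted=$unquoted quoted=$quoted"   # prints: unquoted=yes quoted=no
```

The one-argument form of `test` just checks that its argument is a non-empty string, which is why the unquoted branch succeeds even with nothing set.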




[GitHub] spark pull request: [SPARK-2410][SQL][WIP] Cherry picked Hive Thri...

2014-07-22 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1399#issuecomment-49749099
  
QA tests have started for PR 1399. This patch DID NOT merge cleanly!
View progress:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16967/consoleFull




[GitHub] spark pull request: [SPARK-2410][SQL][WIP] Cherry picked Hive Thri...

2014-07-22 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1399#issuecomment-49753393
  
QA tests have started for PR 1399. This patch merges cleanly.
View progress:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16968/consoleFull




[GitHub] spark pull request: [SPARK-2410][SQL][WIP] Cherry picked Hive Thri...

2014-07-22 Thread liancheng
Github user liancheng commented on the pull request:

https://github.com/apache/spark/pull/1399#issuecomment-49754843
  
Hey @pwendell, thanks for the detailed comments and review. I believe the last few commits have addressed all the issues brought up. I rebased this PR onto the most recent master. The bug found in `SparkSubmitArguments` was also fixed.

@rxin @marmbrus Would you please help review the Hive compatibility and Shark migration guide draft sections of the updated documents? Thanks in advance!




[GitHub] spark pull request: [SPARK-2410][SQL][WIP] Cherry picked Hive Thri...

2014-07-22 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1399#issuecomment-49759732
  
QA results for PR 1399:
- This patch FAILED unit tests.

For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16967/consoleFull




[GitHub] spark pull request: [SPARK-2410][SQL][WIP] Cherry picked Hive Thri...

2014-07-22 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1399#issuecomment-49763828
  
QA results for PR 1399:
- This patch FAILED unit tests.
- This patch merges cleanly.
- This patch adds the following public classes (experimental):
  class SparkSQLOperationManager(hiveContext: HiveContext) extends OperationManager with Logging {

For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16968/consoleFull




[GitHub] spark pull request: [SPARK-2410][SQL][WIP] Cherry picked Hive Thri...

2014-07-22 Thread liancheng
Github user liancheng commented on the pull request:

https://github.com/apache/spark/pull/1399#issuecomment-49814629
  
The build failure was due to the MiMa configuration; I just disabled MiMa for `hive-thriftserver`.




[GitHub] spark pull request: [SPARK-2410][SQL][WIP] Cherry picked Hive Thri...

2014-07-22 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1399#issuecomment-49814756
  
QA tests have started for PR 1399. This patch merges cleanly.
View progress:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16997/consoleFull




[GitHub] spark pull request: [SPARK-2410][SQL][WIP] Cherry picked Hive Thri...

2014-07-22 Thread pwendell
Github user pwendell commented on the pull request:

https://github.com/apache/spark/pull/1399#issuecomment-49816076
  
I only looked at this from a packaging/scripts perspective. But LGTM in 
that regard.




[GitHub] spark pull request: [SPARK-2410][SQL][WIP] Cherry picked Hive Thri...

2014-07-22 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1399#issuecomment-49819658
  
QA results for PR 1399:
- This patch PASSES unit tests.
- This patch merges cleanly.
- This patch adds the following public classes (experimental):
  class SparkSQLOperationManager(hiveContext: HiveContext) extends OperationManager with Logging {

For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16997/consoleFull




[GitHub] spark pull request: [SPARK-2410][SQL][WIP] Cherry picked Hive Thri...

2014-07-22 Thread liancheng
Github user liancheng commented on a diff in the pull request:

https://github.com/apache/spark/pull/1399#discussion_r15266402
  
--- Diff: sql/hive-thriftserver/pom.xml ---
@@ -0,0 +1,75 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!--
+  ~ Licensed to the Apache Software Foundation (ASF) under one or more
+  ~ contributor license agreements.  See the NOTICE file distributed with
+  ~ this work for additional information regarding copyright ownership.
+  ~ The ASF licenses this file to You under the Apache License, Version 2.0
+  ~ (the "License"); you may not use this file except in compliance with
+  ~ the License.  You may obtain a copy of the License at
+  ~
+  ~    http://www.apache.org/licenses/LICENSE-2.0
+  ~
+  ~ Unless required by applicable law or agreed to in writing, software
+  ~ distributed under the License is distributed on an "AS IS" BASIS,
+  ~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  ~ See the License for the specific language governing permissions and
+  ~ limitations under the License.
+  -->
+
+<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
+         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
+  <modelVersion>4.0.0</modelVersion>
+  <parent>
+    <groupId>org.apache.spark</groupId>
+    <artifactId>spark-parent</artifactId>
+    <version>1.1.0-SNAPSHOT</version>
+    <relativePath>../../pom.xml</relativePath>
+  </parent>
+
+  <groupId>org.apache.spark</groupId>
+  <artifactId>spark-hive-thriftserver_2.10</artifactId>
+  <packaging>jar</packaging>
+  <name>Spark Project Hive</name>
+  <url>http://spark.apache.org/</url>
+  <properties>
--- End diff --

This is done. But does it mean the Thrift server won't be included in the Maven artifacts, and users will have to download a prebuilt version or compile it manually if they need it?




[GitHub] spark pull request: [SPARK-2410][SQL][WIP] Cherry picked Hive Thri...

2014-07-22 Thread marmbrus
Github user marmbrus commented on a diff in the pull request:

https://github.com/apache/spark/pull/1399#discussion_r15267219
  
--- Diff: sql/hive-thriftserver/pom.xml ---
@@ -0,0 +1,75 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!--
+  ~ Licensed to the Apache Software Foundation (ASF) under one or more
+  ~ contributor license agreements.  See the NOTICE file distributed with
+  ~ this work for additional information regarding copyright ownership.
+  ~ The ASF licenses this file to You under the Apache License, Version 2.0
+  ~ (the "License"); you may not use this file except in compliance with
+  ~ the License.  You may obtain a copy of the License at
+  ~
+  ~    http://www.apache.org/licenses/LICENSE-2.0
+  ~
+  ~ Unless required by applicable law or agreed to in writing, software
+  ~ distributed under the License is distributed on an "AS IS" BASIS,
+  ~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  ~ See the License for the specific language governing permissions and
+  ~ limitations under the License.
+  -->
+
+<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
+         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
+  <modelVersion>4.0.0</modelVersion>
+  <parent>
+    <groupId>org.apache.spark</groupId>
+    <artifactId>spark-parent</artifactId>
+    <version>1.1.0-SNAPSHOT</version>
+    <relativePath>../../pom.xml</relativePath>
+  </parent>
+
+  <groupId>org.apache.spark</groupId>
+  <artifactId>spark-hive-thriftserver_2.10</artifactId>
+  <packaging>jar</packaging>
+  <name>Spark Project Hive</name>
+  <url>http://spark.apache.org/</url>
+  <properties>
--- End diff --

There should never be a reason to depend on the server from an external 
project.  I'd imagine people will only be launching it from distributions we 
create.




[GitHub] spark pull request: [SPARK-2410][SQL][WIP] Cherry picked Hive Thri...

2014-07-21 Thread liancheng
Github user liancheng commented on a diff in the pull request:

https://github.com/apache/spark/pull/1399#discussion_r15155941
  
--- Diff: bin/spark-sql ---
@@ -0,0 +1,81 @@
+#!/usr/bin/env bash
+
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+SCALA_VERSION=2.10
+
+cygwin=false
+case "`uname`" in
+CYGWIN*) cygwin=true;;
+esac
+
+# Enter posix mode for bash
+set -o posix
+
+# Figure out where Spark is installed
+FWDIR="$(cd `dirname $0`/..; pwd)"
+
+if [[ "$@" = *--help ]] || [[ "$@" = *-h ]]; then
+  echo "Usage: ./sbin/spark-sql [options]"
+  $FWDIR/bin/spark-submit --help 2>&1 | grep -v Usage 1>&2
+  exit 0
+fi
+
+ASSEMBLY_DIR="$FWDIR/assembly/target/scala-$SCALA_VERSION"
--- End diff --

Great, this is really helpful, thanks!
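The `--help` branch in the script above leans on a redirection idiom worth unpacking: `2>&1` merges stderr into the pipe so `grep -v Usage` can filter it, and `1>&2` then sends the surviving lines back to stderr. A standalone sketch with a stub in place of spark-submit (stub name and strings are illustrative):

```shell
# spark-submit prints its usage text on stderr; emulate that with a stub.
fake_submit_help() {
  echo "Usage: fake-submit [options]" 1>&2
  echo "  --master MASTER_URL" 1>&2
}

# 2>&1 merges stderr into the pipe so grep can see and filter it; the real
# script additionally appends 1>&2 to route the result back to stderr.
filtered=$(fake_submit_help 2>&1 | grep -v Usage)
echo "$filtered"   # prints:   --master MASTER_URL
```

Without the `2>&1`, the usage text would bypass the pipe entirely and `grep` would see nothing.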




[GitHub] spark pull request: [SPARK-2410][SQL][WIP] Cherry picked Hive Thri...

2014-07-21 Thread liancheng
Github user liancheng commented on a diff in the pull request:

https://github.com/apache/spark/pull/1399#discussion_r15155947
  
--- Diff: assembly/pom.xml ---
@@ -162,6 +162,11 @@
       <artifactId>spark-hive_${scala.binary.version}</artifactId>
       <version>${project.version}</version>
     </dependency>
+    <dependency>
+      <groupId>org.apache.spark</groupId>
+      <artifactId>spark-hive-thriftserver_${scala.binary.version}</artifactId>
--- End diff --

OK.




[GitHub] spark pull request: [SPARK-2410][SQL][WIP] Cherry picked Hive Thri...

2014-07-21 Thread liancheng
Github user liancheng commented on a diff in the pull request:

https://github.com/apache/spark/pull/1399#discussion_r15155961
  
--- Diff: docs/sql-programming-guide.md ---
@@ -573,4 +572,170 @@ prefixed with a tick (`'`).  Implicit conversions turn these symbols into expressions that can be
 evaluated by the SQL execution engine.  A full list of the functions supported can be found in the
 [ScalaDoc](api/scala/index.html#org.apache.spark.sql.SchemaRDD).
 
-<!-- TODO: Include the table of operations here. -->
\ No newline at end of file
+<!-- TODO: Include the table of operations here. -->
+
+## Running the Thrift JDBC server
+
+The Thrift JDBC server implemented here corresponds to the [`HiveServer2`](https://cwiki.apache.org/confluence/display/Hive/Setting+Up+HiveServer2) in Hive 0.12. You can test the JDBC server with the beeline script that comes with either Spark or Hive 0.12.
+
+To start the JDBC server, run the following in the Spark directory:
+
+    ./sbin/start-thriftserver.sh
+
+The default port the server listens on is 10000. Now you can use beeline to test the Thrift JDBC server:
+
+    ./bin/beeline
+
+Connect to the JDBC server in beeline with:
+
+    beeline> !connect jdbc:hive2://localhost:10000
+
+Beeline will ask you for a username and password. In non-secure mode, simply enter the username on your machine and a blank password. For secure mode, please follow the instructions given in the [beeline documentation](https://cwiki.apache.org/confluence/display/Hive/HiveServer2+Clients).
+
+Configuration of Hive is done by placing your `hive-site.xml` file in `conf/`.
+
+You may also use the beeline script that comes with Hive.
+
+### Migration Guide for Shark Users
+
+#### Reducer number
+
+In Shark, the default reducer number is 1 and can be tuned by the property `mapred.reduce.tasks`. In Spark SQL, the reducer number defaults to 200 and can be customized by the `spark.sql.shuffle.partitions` property:
--- End diff --

Seems like a good idea. I would also add a WARN log telling the user this property is deprecated.
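For readers skimming the thread, the two properties under discussion are set the same way at the SQL prompt; a short illustrative snippet (values are examples, property names come from the quoted guide):

```sql
-- Shark: tune the reducer count through the Hadoop property
SET mapred.reduce.tasks=10;

-- Spark SQL: the equivalent setting (defaults to 200)
SET spark.sql.shuffle.partitions=10;
```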




[GitHub] spark pull request: [SPARK-2410][SQL][WIP] Cherry picked Hive Thri...

2014-07-20 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1399#issuecomment-49548923
  
QA tests have started for PR 1399. This patch merges cleanly.
View progress:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16871/consoleFull




[GitHub] spark pull request: [SPARK-2410][SQL][WIP] Cherry picked Hive Thri...

2014-07-20 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1399#issuecomment-49548926
  
QA results for PR 1399:
- This patch FAILED unit tests.
- This patch merges cleanly.
- This patch adds the following public classes (experimental):
  class HiveThriftServer2(hiveContext: HiveContext)
  class SparkSQLCLIDriver extends CliDriver with Logging {
  class SparkSQLCLIService(hiveContext: HiveContext)
  class SparkSQLDriver(val context: HiveContext = SparkSQLEnv.hiveContext)
  class SparkSQLSessionManager(hiveContext: HiveContext)
  class SparkSQLOperationManager(hiveContext: HiveContext) extends OperationManager with Logging {

For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16871/consoleFull




[GitHub] spark pull request: [SPARK-2410][SQL][WIP] Cherry picked Hive Thri...

2014-07-20 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1399#issuecomment-49549992
  
QA tests have started for PR 1399. This patch merges cleanly.
View progress:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16873/consoleFull




[GitHub] spark pull request: [SPARK-2410][SQL][WIP] Cherry picked Hive Thri...

2014-07-20 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1399#issuecomment-49550014
  
QA results for PR 1399:
- This patch FAILED unit tests.
- This patch merges cleanly.
- This patch adds the following public classes (experimental):
  class HiveThriftServer2(hiveContext: HiveContext)
  class SparkSQLCLIDriver extends CliDriver with Logging {
  class SparkSQLCLIService(hiveContext: HiveContext)
  class SparkSQLDriver(val context: HiveContext = SparkSQLEnv.hiveContext)
  class SparkSQLSessionManager(hiveContext: HiveContext)
  class SparkSQLOperationManager(hiveContext: HiveContext) extends OperationManager with Logging {

For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16873/consoleFull




[GitHub] spark pull request: [SPARK-2410][SQL][WIP] Cherry picked Hive Thri...

2014-07-20 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1399#issuecomment-49550656
  
QA tests have started for PR 1399. This patch merges cleanly.
View progress:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16874/consoleFull




[GitHub] spark pull request: [SPARK-2410][SQL][WIP] Cherry picked Hive Thri...

2014-07-20 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1399#issuecomment-49550677
  
QA results for PR 1399:
- This patch FAILED unit tests.
- This patch merges cleanly.
- This patch adds the following public classes (experimental):
  class HiveThriftServer2(hiveContext: HiveContext)
  class SparkSQLCLIDriver extends CliDriver with Logging {
  class SparkSQLCLIService(hiveContext: HiveContext)
  class SparkSQLDriver(val context: HiveContext = SparkSQLEnv.hiveContext)
  class SparkSQLSessionManager(hiveContext: HiveContext)
  class SparkSQLOperationManager(hiveContext: HiveContext) extends OperationManager with Logging {

For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16874/consoleFull




[GitHub] spark pull request: [SPARK-2410][SQL][WIP] Cherry picked Hive Thri...

2014-07-20 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1399#issuecomment-49551156
  
QA tests have started for PR 1399. This patch merges cleanly.
View progress:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16875/consoleFull




[GitHub] spark pull request: [SPARK-2410][SQL][WIP] Cherry picked Hive Thri...

2014-07-20 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1399#issuecomment-49553961
  
QA results for PR 1399:
- This patch FAILED unit tests.
- This patch merges cleanly
- This patch adds the following public classes (experimental):
class HiveThriftServer2(hiveContext: HiveContext)
class SparkSQLCLIDriver extends CliDriver with Logging {
class SparkSQLCLIService(hiveContext: HiveContext)
class SparkSQLDriver(val context: HiveContext = SparkSQLEnv.hiveContext)
class SparkSQLSessionManager(hiveContext: HiveContext)
class SparkSQLOperationManager(hiveContext: HiveContext) extends OperationManager with Logging {

For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16875/consoleFull




[GitHub] spark pull request: [SPARK-2410][SQL][WIP] Cherry picked Hive Thri...

2014-07-20 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1399#issuecomment-49569446
  
QA tests have started for PR 1399. This patch merges cleanly.
View progress:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16890/consoleFull




[GitHub] spark pull request: [SPARK-2410][SQL][WIP] Cherry picked Hive Thri...

2014-07-20 Thread pwendell
Github user pwendell commented on a diff in the pull request:

https://github.com/apache/spark/pull/1399#discussion_r15154620
  
--- Diff: assembly/pom.xml ---
@@ -162,6 +162,11 @@
   <artifactId>spark-hive_${scala.binary.version}</artifactId>
   <version>${project.version}</version>
 </dependency>
+<dependency>
+  <groupId>org.apache.spark</groupId>
+  <artifactId>spark-hive-thriftserver_${scala.binary.version}</artifactId>
--- End diff --

Rather than have this included in the `hive` profile, we should make a 
`hive-thriftserver` profile. I could imagine users who want to build with 
support for reading Hive data, but don't care about running a JDBC server.
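A separate profile along these lines could work — a hypothetical sketch only; the profile id and module path follow this PR's layout, and the exact wiring in the root pom.xml is illustrative:

```xml
<!-- Hypothetical sketch of a standalone hive-thriftserver profile. -->
<profile>
  <id>hive-thriftserver</id>
  <modules>
    <module>sql/hive-thriftserver</module>
  </modules>
</profile>
```

Users who only want to read Hive data would then build with `-Phive` alone and skip `-Phive-thriftserver`.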




[GitHub] spark pull request: [SPARK-2410][SQL][WIP] Cherry picked Hive Thri...

2014-07-20 Thread pwendell
Github user pwendell commented on a diff in the pull request:

https://github.com/apache/spark/pull/1399#discussion_r15154735
  
--- Diff: bin/spark-sql ---
@@ -0,0 +1,81 @@
+#!/usr/bin/env bash
+
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+SCALA_VERSION=2.10
+
+cygwin=false
+case `uname` in
+CYGWIN*) cygwin=true;;
+esac
+
+# Enter posix mode for bash
+set -o posix
+
+# Figure out where Spark is installed
+FWDIR=$(cd `dirname $0`/..; pwd)
+
+if [[ "$@" = *--help ]] || [[ "$@" = *-h ]]; then
+  echo "Usage: ./sbin/spark-sql [options]"
+  $FWDIR/bin/spark-submit --help 2>&1 | grep -v Usage 1>&2
+  exit 0
+fi
+
+ASSEMBLY_DIR="$FWDIR/assembly/target/scala-$SCALA_VERSION"
--- End diff --

There is a lot of code here devoted to finding the assembly jar, checking 
java, etc. I think this is all redundant with code that already gets triggered 
when `spark-submit` is called; ideally we won't have this in multiple places.

The `spark-submit` script used to allow you to set the jar name to be 
`spark-internal` in which case it wouldn't require you to pass in a jar (since 
the Spark assembly is already added automatically). I didn't realize, but that 
was removed in 
https://github.com/apache/spark/commit/4b8ec6fc#diff-4d2ab44195558d5a9d5f15b8803ef39dL47
 and replaced with more specific versions for the shell. Anyways, it would be 
good to add back a generic `spark-internal` for other internal classes like 
this.

Then this script can be very simple: it will just call `spark-submit` and 
then pass the class as 
`org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver`.
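For illustration, a minimal sketch of what `bin/spark-sql` could then reduce to, assuming the `spark-internal` reserved jar name is restored. The helper only builds the command line so the wiring is visible; a real script would `exec` the result:

```shell
#!/usr/bin/env bash
# Hypothetical sketch: bin/spark-sql reduced to a spark-submit call.
# "spark-internal" is assumed to be a restored reserved jar name.
build_spark_sql_cmd() {
  local CLASS="org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver"
  # All user options are forwarded untouched to the CLI driver.
  echo "bin/spark-submit --class $CLASS spark-internal $*"
}

build_spark_sql_cmd -e "SELECT 1"
```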




[GitHub] spark pull request: [SPARK-2410][SQL][WIP] Cherry picked Hive Thri...

2014-07-20 Thread pwendell
Github user pwendell commented on a diff in the pull request:

https://github.com/apache/spark/pull/1399#discussion_r15154749
  
--- Diff: bin/beeline ---
@@ -0,0 +1,45 @@
+#!/usr/bin/env bash
+
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+# Figure out where Spark is installed
+FWDIR=$(cd `dirname $0`/..; pwd)
+
+# Find the java binary
+if [ -n "${JAVA_HOME}" ]; then
--- End diff --

I think a lot of this could be replaced with a simple call to `spark-submit`. 
We just need to allow the `spark-internal` reserved jar name (see other 
comment).




[GitHub] spark pull request: [SPARK-2410][SQL][WIP] Cherry picked Hive Thri...

2014-07-20 Thread pwendell
Github user pwendell commented on a diff in the pull request:

https://github.com/apache/spark/pull/1399#discussion_r15154767
  
--- Diff: docs/sql-programming-guide.md ---
@@ -573,4 +572,170 @@ prefixed with a tick (`'`).  Implicit conversions 
turn these symbols into expres
 evaluated by the SQL execution engine.  A full list of the functions 
supported can be found in the
 [ScalaDoc](api/scala/index.html#org.apache.spark.sql.SchemaRDD).
 
-<!-- TODO: Include the table of operations here. -->
\ No newline at end of file
+<!-- TODO: Include the table of operations here. -->
+
+## Running the Thrift JDBC server
+
+The Thrift JDBC server implemented here corresponds to the [`HiveServer2`]
+(https://cwiki.apache.org/confluence/display/Hive/Setting+Up+HiveServer2) 
in Hive 0.12. You can test
+the JDBC server with the beeline script that comes with either Spark or Hive 0.12.
+
+To start the JDBC server, run the following in the Spark directory:
+
+./sbin/start-thriftserver.sh
+
+The default port the server listens on is 10000.  Now you can use beeline 
to test the Thrift JDBC
+server:
+
+./bin/beeline
+
+Connect to the JDBC server in beeline with:
+
+beeline> !connect jdbc:hive2://localhost:10000
+
+Beeline will ask you for a username and password. In non-secure mode, 
simply enter the username on
+your machine and a blank password. For secure mode, please follow the 
instructions given in the
+[beeline 
documentation](https://cwiki.apache.org/confluence/display/Hive/HiveServer2+Clients)
+
+Configuration of Hive is done by placing your `hive-site.xml` file in 
`conf/`.
+
+You may also use the beeline script that comes with Hive.
+
+### Migration Guide for Shark Users
+
+#### Reducer number
+
+In Shark, the default reducer number is 1 and can be tuned by the property 
`mapred.reduce.tasks`. In Spark SQL, the reducer number defaults to 200 and can 
be customized by the `spark.sql.shuffle.partitions` property:
--- End diff --

To support older scripts, can we convert `mapred.reduce.tasks` to 
`spark.sql.shuffle.partitions` if we see it?
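As a sketch of the suggested fallback (purely illustrative — the real hook would live in Spark SQL's `SET` handling, not in shell):

```shell
#!/usr/bin/env bash
# Hypothetical sketch: map the legacy Shark/Hive property name onto the
# Spark SQL equivalent; unknown keys pass through unchanged.
translate_legacy_key() {
  case "$1" in
    mapred.reduce.tasks) echo "spark.sql.shuffle.partitions" ;;
    *) echo "$1" ;;
  esac
}

translate_legacy_key mapred.reduce.tasks   # prints spark.sql.shuffle.partitions
```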




[GitHub] spark pull request: [SPARK-2410][SQL][WIP] Cherry picked Hive Thri...

2014-07-20 Thread pwendell
Github user pwendell commented on a diff in the pull request:

https://github.com/apache/spark/pull/1399#discussion_r15154792
  
--- Diff: docs/sql-programming-guide.md ---
@@ -573,4 +572,170 @@ prefixed with a tick (`'`).  Implicit conversions 
turn these symbols into expres
 evaluated by the SQL execution engine.  A full list of the functions 
supported can be found in the
 [ScalaDoc](api/scala/index.html#org.apache.spark.sql.SchemaRDD).
 
-<!-- TODO: Include the table of operations here. -->
\ No newline at end of file
+<!-- TODO: Include the table of operations here. -->
+
+## Running the Thrift JDBC server
+
+The Thrift JDBC server implemented here corresponds to the [`HiveServer2`]
+(https://cwiki.apache.org/confluence/display/Hive/Setting+Up+HiveServer2) 
in Hive 0.12. You can test
+the JDBC server with the beeline script that comes with either Spark or Hive 0.12.
+
+To start the JDBC server, run the following in the Spark directory:
+
+./sbin/start-thriftserver.sh
+
+The default port the server listens on is 10000.  Now you can use beeline 
to test the Thrift JDBC
+server:
+
+./bin/beeline
+
+Connect to the JDBC server in beeline with:
+
+beeline> !connect jdbc:hive2://localhost:10000
+
+Beeline will ask you for a username and password. In non-secure mode, 
simply enter the username on
+your machine and a blank password. For secure mode, please follow the 
instructions given in the
+[beeline 
documentation](https://cwiki.apache.org/confluence/display/Hive/HiveServer2+Clients)
+
+Configuration of Hive is done by placing your `hive-site.xml` file in 
`conf/`.
+
+You may also use the beeline script that comes with Hive.
+
+### Migration Guide for Shark Users
+
+#### Reducer number
+
+In Shark, the default reducer number is 1 and can be tuned by the property 
`mapred.reduce.tasks`. In Spark SQL, the reducer number defaults to 200 and can 
be customized by the `spark.sql.shuffle.partitions` property:
+
+```
+SET spark.sql.shuffle.partitions=10;
+SELECT page, count(*) c FROM logs_last_month_cached
+GROUP BY page ORDER BY c DESC LIMIT 10;
+```
+
+You may also put this property in `hive-site.xml` to override the default 
value.
+
+#### Caching
+
+The `shark.cache` table property no longer exists, and tables whose names 
end with `_cached` are no longer automatically cached. Instead, we provide 
`CACHE TABLE` and `UNCACHE TABLE` statements to let users control table caching 
explicitly:
+
+```
+CACHE TABLE logs_last_month;
+UNCACHE TABLE logs_last_month;
+```
+
+**NOTE** `CACHE TABLE tbl` is lazy: it only marks table `tbl` as needing to 
be cached, but doesn't actually cache it until a query that 
touches `tbl` is executed. To force the table to be cached, you may simply 
count the table immediately after executing `CACHE TABLE`:
+
+```
+CACHE TABLE logs_last_month;
+SELECT COUNT(1) FROM logs_last_month;
+```
+
+Several caching related features are not supported yet:
+
+* User defined partition level cache eviction policy
+* RDD reloading
+* In-memory cache write through policy
+
+### Compatibility with Apache Hive
+
+#### Deploying in Existing Hive Warehouses
+
+The Spark SQL Thrift JDBC server is designed to be out-of-the-box compatible 
with existing Hive
+installations. You do not need to modify your existing Hive Metastore or 
change the data placement
+or partitioning of your tables.
+
+#### Supported Hive Features
+
+Spark SQL supports the vast majority of Hive features, such as:
+
+* Hive query statements, including:
+ * `SELECT`
+ * `GROUP BY`
+ * `ORDER BY`
+ * `CLUSTER BY`
+ * `SORT BY`
+* All Hive operators, including:
+ * Relational operators (`=`, `⇔`, `==`, `<>`, `<`, `>`, `>=`, `<=`, etc)
+ * Arithmetic operators (`+`, `-`, `*`, `/`, `%`, etc)
+ * Logical operators (`AND`, `&&`, `OR`, `||`, etc)
+ * Complex type constructors
+ * Mathematical functions (`sign`, `ln`, `cos`, etc)
+ * String functions (`instr`, `length`, `printf`, etc)
+* User defined functions (UDF)
+* User defined aggregation functions (UDAF)
+* User defined serialization formats (SerDe's)
+* Joins
+ * `JOIN`
+ * `{LEFT|RIGHT|FULL} OUTER JOIN`
+ * `LEFT SEMI JOIN`
+ * `CROSS JOIN`
+* Unions
+* Sub queries
+ * `SELECT col FROM ( SELECT a + b AS col from t1) t2`
+* Sampling
+* Explain
+* Partitioned tables
+* All Hive DDL Functions, including:
+ * `CREATE TABLE`
+ * `CREATE TABLE AS SELECT`
+ * `ALTER TABLE`
+* Most Hive Data types, including:
+ * `TINYINT`
+ * `SMALLINT`
+ * `INT`
+ * `BIGINT`
+ * `BOOLEAN`
+ * `FLOAT`
+ * `DOUBLE`
+ * `STRING`
   

[GitHub] spark pull request: [SPARK-2410][SQL][WIP] Cherry picked Hive Thri...

2014-07-20 Thread pwendell
Github user pwendell commented on a diff in the pull request:

https://github.com/apache/spark/pull/1399#discussion_r15154977
  
--- Diff: sql/hive-thriftserver/pom.xml ---
@@ -0,0 +1,75 @@
+<?xml version="1.0" encoding="UTF-8"?>
+<!--
+  ~ Licensed to the Apache Software Foundation (ASF) under one or more
+  ~ contributor license agreements.  See the NOTICE file distributed with
+  ~ this work for additional information regarding copyright ownership.
+  ~ The ASF licenses this file to You under the Apache License, Version 2.0
+  ~ (the "License"); you may not use this file except in compliance with
+  ~ the License.  You may obtain a copy of the License at
+  ~
+  ~    http://www.apache.org/licenses/LICENSE-2.0
+  ~
+  ~ Unless required by applicable law or agreed to in writing, software
+  ~ distributed under the License is distributed on an "AS IS" BASIS,
+  ~ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  ~ See the License for the specific language governing permissions and
+  ~ limitations under the License.
+  -->
+
+<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
+ xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
+  <modelVersion>4.0.0</modelVersion>
+  <parent>
+    <groupId>org.apache.spark</groupId>
+    <artifactId>spark-parent</artifactId>
+    <version>1.1.0-SNAPSHOT</version>
+    <relativePath>../../pom.xml</relativePath>
+  </parent>
+
+  <groupId>org.apache.spark</groupId>
+  <artifactId>spark-hive-thriftserver_2.10</artifactId>
+  <packaging>jar</packaging>
+  <name>Spark Project Hive</name>
+  <url>http://spark.apache.org/</url>
+  <properties>
--- End diff --

Could you force this to be excluded when we are publishing release 
artifacts?

I think you can do it with the following approach (but it might be good to 
not put a version for the deploy plug-in).
http://prystash.blogspot.com/2009/06/maven-excluding-module-from-deploy.html
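One way this could look, per the linked post — a hypothetical fragment for the module's pom, with the version intentionally left off the deploy plugin as suggested:

```xml
<!-- Hypothetical sketch: skip publishing this module's artifacts on deploy. -->
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-deploy-plugin</artifactId>
  <configuration>
    <skip>true</skip>
  </configuration>
</plugin>
```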




[GitHub] spark pull request: [SPARK-2410][SQL][WIP] Cherry picked Hive Thri...

2014-07-20 Thread pwendell
Github user pwendell commented on a diff in the pull request:

https://github.com/apache/spark/pull/1399#discussion_r15155042
  
--- Diff: 
sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkSQLCLIDriver.scala
 ---
@@ -0,0 +1,345 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.hive.thriftserver
+
+import scala.collection.JavaConversions._
+
+import java.io._
+import java.util.{ArrayList => JArrayList}
+
+import jline.{ConsoleReader, History}
+import org.apache.commons.lang.StringUtils
+import org.apache.commons.logging.LogFactory
+import org.apache.hadoop.conf.Configuration
+import org.apache.hadoop.hive.cli.{CliDriver, CliSessionState, 
OptionsProcessor}
+import org.apache.hadoop.hive.common.LogUtils.LogInitializationException
+import org.apache.hadoop.hive.common.{HiveInterruptCallback, 
HiveInterruptUtils, LogUtils}
+import org.apache.hadoop.hive.conf.HiveConf
+import org.apache.hadoop.hive.ql.Driver
+import org.apache.hadoop.hive.ql.exec.Utilities
+import org.apache.hadoop.hive.ql.processors.{CommandProcessor, 
CommandProcessorFactory}
+import org.apache.hadoop.hive.ql.session.SessionState
+import org.apache.hadoop.hive.shims.ShimLoader
+import org.apache.thrift.transport.TSocket
+
+import org.apache.spark.sql.Logging
+
+object SparkSQLCLIDriver {
+  private var prompt = "spark-sql"
+  private var continuedPrompt = "".padTo(prompt.length, ' ')
+  private var transport:TSocket = _
+
+  installSignalHandler()
+
+  /**
+   * Install an interrupt callback to cancel all Spark jobs. In Hive's 
CliDriver#processLine(),
+   * a signal handler will invoke this registered callback if a Ctrl+C 
signal is detected while
+   * a command is being processed by the current thread.
+   */
+  def installSignalHandler() {
+HiveInterruptUtils.add(new HiveInterruptCallback {
+  override def interrupt() {
+// Handle remote execution mode
+if (SparkSQLEnv.sparkContext != null) {
+  SparkSQLEnv.sparkContext.cancelAllJobs()
+} else {
+  if (transport != null) {
+// Force closing of TCP connection upon session termination
+transport.getSocket.close()
+  }
+}
+  }
+})
+  }
+
+  def main(args: Array[String]) {
+val oproc = new OptionsProcessor()
+if (!oproc.process_stage1(args)) {
+  System.exit(1)
+}
+
+// NOTE: It is critical to do this here so that log4j is reinitialized
+// before any of the other core hive classes are loaded
+var logInitFailed = false
+var logInitDetailMessage: String = null
+try {
+  logInitDetailMessage = LogUtils.initHiveLog4j()
+} catch {
+  case e: LogInitializationException =>
+logInitFailed = true
+logInitDetailMessage = e.getMessage
+}
+
+val sessionState = new CliSessionState(new 
HiveConf(classOf[SessionState]))
+
+sessionState.in = System.in
+try {
+  sessionState.out = new PrintStream(System.out, true, "UTF-8")
+  sessionState.info = new PrintStream(System.err, true, "UTF-8")
+  sessionState.err = new PrintStream(System.err, true, "UTF-8")
+} catch {
+  case e: UnsupportedEncodingException => System.exit(3)
+}
+
+if (!oproc.process_stage2(sessionState)) {
+  System.exit(2)
+}
+
+if (!sessionState.getIsSilent) {
+  if (logInitFailed) System.err.println(logInitDetailMessage)
+  else SessionState.getConsole.printInfo(logInitDetailMessage)
+}
+
+// Set all properties specified via command line.
+val conf: HiveConf = sessionState.getConf
+sessionState.cmdProperties.entrySet().foreach { item: java.util.Map.Entry[Object, Object] =>
+  conf.set(item.getKey.asInstanceOf[String], 
item.getValue.asInstanceOf[String])
  

[GitHub] spark pull request: [SPARK-2410][SQL][WIP] Cherry picked Hive Thri...

2014-07-20 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/1399#issuecomment-49572019
  
QA results for PR 1399:
- This patch FAILED unit tests.
- This patch merges cleanly
- This patch adds the following public classes (experimental):
class HiveThriftServer2(hiveContext: HiveContext)
class SparkSQLCLIDriver extends CliDriver with Logging {
class SparkSQLCLIService(hiveContext: HiveContext)
class SparkSQLDriver(val context: HiveContext = SparkSQLEnv.hiveContext)
class SparkSQLSessionManager(hiveContext: HiveContext)
class SparkSQLOperationManager(hiveContext: HiveContext) extends OperationManager with Logging {

For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16890/consoleFull




[GitHub] spark pull request: [SPARK-2410][SQL][WIP] Cherry picked Hive Thri...

2014-07-17 Thread pwendell
Github user pwendell commented on a diff in the pull request:

https://github.com/apache/spark/pull/1399#discussion_r15045195
  
--- Diff: sbin/start-thriftserver.sh ---
@@ -0,0 +1,24 @@
+#!/usr/bin/env bash
+
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+# Figure out where Spark is installed
+FWDIR=$(cd `dirname $0`/..; pwd)
+
+CLASS=org.apache.spark.sql.hive.thriftserver.HiveThriftServer2
+$FWDIR/bin/spark-class $CLASS $@
--- End diff --

Check out `spark-shell` for an example of a user-facing script that triages 
options to `spark-submit`.
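A hypothetical sketch of what that triage could look like for this script — command-line construction only; the `spark-internal` reserved jar name is assumed, and a real script would `exec` the result:

```shell
#!/usr/bin/env bash
# Hypothetical sketch: start-thriftserver.sh handing its options to
# spark-submit the way bin/spark-shell does.
build_thriftserver_cmd() {
  local CLASS="org.apache.spark.sql.hive.thriftserver.HiveThriftServer2"
  echo "bin/spark-submit --class $CLASS spark-internal $*"
}

build_thriftserver_cmd --hiveconf hive.server2.thrift.port=10000
```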




[GitHub] spark pull request: [SPARK-2410][SQL][WIP] Cherry picked Hive Thri...

2014-07-16 Thread liancheng
Github user liancheng commented on a diff in the pull request:

https://github.com/apache/spark/pull/1399#discussion_r14987086
  
--- Diff: sbin/start-thriftserver.sh ---
@@ -0,0 +1,24 @@
+#!/usr/bin/env bash
+
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+# Figure out where Spark is installed
+FWDIR=$(cd `dirname $0`/..; pwd)
+
+CLASS=org.apache.spark.sql.hive.thriftserver.HiveThriftServer2
+$FWDIR/bin/spark-class $CLASS $@
--- End diff --

Thanks, I've noticed the discussion. Marked this PR as WIP, will update 
soon.

