[GitHub] spark pull request: [SPARK-2594][SQL] Support CACHE TABLE name A...

2014-09-19 Thread marmbrus
Github user marmbrus commented on the pull request:

https://github.com/apache/spark/pull/2397#issuecomment-56244356
  
Thanks!  I've merged this to master.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-2594][SQL] Support CACHE TABLE name A...

2014-09-19 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/2397





[GitHub] spark pull request: [SPARK-2594][SQL] Support CACHE TABLE name A...

2014-09-18 Thread liancheng
Github user liancheng commented on a diff in the pull request:

https://github.com/apache/spark/pull/2397#discussion_r17710806
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/commands.scala ---
@@ -166,3 +166,20 @@ case class DescribeCommand(child: SparkPlan, output: Seq[Attribute])(
       child.output.map(field => Row(field.name, field.dataType.toString, null))
   }
 }
+
+/**
+ * :: DeveloperApi ::
+ */
+@DeveloperApi
+case class CacheTableAsSelectCommand(tableName: String, plan: LogicalPlan)
+  extends LeafNode with Command {
+
+  override protected[sql] lazy val sideEffectResult = {
+    sqlContext.catalog.registerTable(None, tableName, sqlContext.executePlan(plan).analyzed)
--- End diff --

Sorry again, you're right, I mistook `sqlContext._` for `SparkContext._`...





[GitHub] spark pull request: [SPARK-2594][SQL] Support CACHE TABLE name A...

2014-09-18 Thread ravipesala
Github user ravipesala commented on a diff in the pull request:

https://github.com/apache/spark/pull/2397#discussion_r17711807
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/commands.scala ---
@@ -166,3 +166,20 @@ case class DescribeCommand(child: SparkPlan, output: Seq[Attribute])(
       child.output.map(field => Row(field.name, field.dataType.toString, null))
   }
 }
+
+/**
+ * :: DeveloperApi ::
+ */
+@DeveloperApi
+case class CacheTableAsSelectCommand(tableName: String, plan: LogicalPlan)
+  extends LeafNode with Command {
+
+  override protected[sql] lazy val sideEffectResult = {
+    sqlContext.catalog.registerTable(None, tableName, sqlContext.executePlan(plan).analyzed)
--- End diff --

Updated the code. Please review





[GitHub] spark pull request: [SPARK-2594][SQL] Support CACHE TABLE name A...

2014-09-18 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2397#issuecomment-56001845
  
  [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20528/consoleFull) for PR 2397 at commit [`a5f0beb`](https://github.com/apache/spark/commit/a5f0beb395836c76b3e7883ef7f1f61433645500).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-2594][SQL] Support CACHE TABLE name A...

2014-09-18 Thread ravipesala
Github user ravipesala commented on the pull request:

https://github.com/apache/spark/pull/2397#issuecomment-56001847
  
Updated as per comments. Please review.





[GitHub] spark pull request: [SPARK-2594][SQL] Support CACHE TABLE name A...

2014-09-18 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2397#issuecomment-56005574
  
  [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20528/consoleFull) for PR 2397 at commit [`a5f0beb`](https://github.com/apache/spark/commit/a5f0beb395836c76b3e7883ef7f1f61433645500).
 * This patch **passes** unit tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `case class CacheTableAsSelectCommand(tableName: String, plan: LogicalPlan) extends Command`
  * `case class CacheTableAsSelectCommand(tableName: String, logicalPlan: LogicalPlan)`






[GitHub] spark pull request: [SPARK-2594][SQL] Support CACHE TABLE name A...

2014-09-18 Thread liancheng
Github user liancheng commented on the pull request:

https://github.com/apache/spark/pull/2397#issuecomment-56098131
  
@ravipesala Thanks for working on this! @marmbrus I think this is ready to 
go :)





[GitHub] spark pull request: [SPARK-2594][SQL] Support CACHE TABLE name A...

2014-09-17 Thread liancheng
Github user liancheng commented on a diff in the pull request:

https://github.com/apache/spark/pull/2397#discussion_r17648914
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/commands.scala ---
@@ -166,3 +166,20 @@ case class DescribeCommand(child: SparkPlan, output: Seq[Attribute])(
       child.output.map(field => Row(field.name, field.dataType.toString, null))
   }
 }
+
+/**
+ * :: DeveloperApi ::
+ */
+@DeveloperApi
+case class CacheTableAsSelectCommand(tableName: String, plan: LogicalPlan)
+  extends LeafNode with Command {
+
+  override protected[sql] lazy val sideEffectResult = {
+    sqlContext.catalog.registerTable(None, tableName, sqlContext.executePlan(plan).analyzed)
--- End diff --

(Probably my final comment on this PR :) )

As described in PR #2382, we shouldn't store the analyzed logical plan when 
registering tables anymore (see 
[here](https://github.com/apache/spark/pull/2382/files?diff=split#diff-5)).

To prevent duplicated code, I'd suggest importing `SQLContext._` so that we 
can leverage [the implicit conversion](https://github.com/apache/spark/blob/008a5ed4808d1467b47c1d6fa4d950cc6c4976b7/sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala#L78-L85) 
from `LogicalPlan` to `SchemaRDD`, and then simply do this:

```scala
sqlContext.executePlan(plan).logical.registerTempTable(tableName)
```





[GitHub] spark pull request: [SPARK-2594][SQL] Support CACHE TABLE name A...

2014-09-17 Thread liancheng
Github user liancheng commented on the pull request:

https://github.com/apache/spark/pull/2397#issuecomment-55857224
  
LGTM except for the analyzed logical plan issue as mentioned in my last 
comment. Thanks for working on this!





[GitHub] spark pull request: [SPARK-2594][SQL] Support CACHE TABLE name A...

2014-09-17 Thread ravipesala
Github user ravipesala commented on a diff in the pull request:

https://github.com/apache/spark/pull/2397#discussion_r17659871
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/commands.scala ---
@@ -166,3 +166,20 @@ case class DescribeCommand(child: SparkPlan, output: Seq[Attribute])(
       child.output.map(field => Row(field.name, field.dataType.toString, null))
   }
 }
+
+/**
+ * :: DeveloperApi ::
+ */
+@DeveloperApi
+case class CacheTableAsSelectCommand(tableName: String, plan: LogicalPlan)
+  extends LeafNode with Command {
+
+  override protected[sql] lazy val sideEffectResult = {
+    sqlContext.catalog.registerTable(None, tableName, sqlContext.executePlan(plan).analyzed)
--- End diff --

Thank you for your comment. Importing `sqlContext._` is a good idea, and once 
we import it we can simplify the code as shown below. Please comment on it.
```
import sqlContext._
plan.registerTempTable(tableName)
cacheTable(tableName) 
```





[GitHub] spark pull request: [SPARK-2594][SQL] Support CACHE TABLE name A...

2014-09-17 Thread liancheng
Github user liancheng commented on a diff in the pull request:

https://github.com/apache/spark/pull/2397#discussion_r17680055
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/commands.scala ---
@@ -166,3 +166,20 @@ case class DescribeCommand(child: SparkPlan, output: Seq[Attribute])(
       child.output.map(field => Row(field.name, field.dataType.toString, null))
   }
 }
+
+/**
+ * :: DeveloperApi ::
+ */
+@DeveloperApi
+case class CacheTableAsSelectCommand(tableName: String, plan: LogicalPlan)
+  extends LeafNode with Command {
+
+  override protected[sql] lazy val sideEffectResult = {
+    sqlContext.catalog.registerTable(None, tableName, sqlContext.executePlan(plan).analyzed)
--- End diff --

Ah, yes, you're right, we can use `plan` directly. And instead of importing 
`sqlContext._`, I'd import `SQLContext._` in the import section at the 
beginning of this file:

```
import org.apache.spark.sql.SQLContext._
```





[GitHub] spark pull request: [SPARK-2594][SQL] Support CACHE TABLE name A...

2014-09-17 Thread ravipesala
Github user ravipesala commented on a diff in the pull request:

https://github.com/apache/spark/pull/2397#discussion_r17707447
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/commands.scala ---
@@ -166,3 +166,20 @@ case class DescribeCommand(child: SparkPlan, output: Seq[Attribute])(
       child.output.map(field => Row(field.name, field.dataType.toString, null))
   }
 }
+
+/**
+ * :: DeveloperApi ::
+ */
+@DeveloperApi
+case class CacheTableAsSelectCommand(tableName: String, plan: LogicalPlan)
+  extends LeafNode with Command {
+
+  override protected[sql] lazy val sideEffectResult = {
+    sqlContext.catalog.registerTable(None, tableName, sqlContext.executePlan(plan).analyzed)
--- End diff --

It seems we cannot use `import org.apache.spark.sql.SQLContext._` at the 
beginning of the file to bring the implicits into scope, because there is no 
`object` defined for `SQLContext` and the implicits are members of 
`class SQLContext` only. We can only import from an instance, as in 
`import sqlContext._`. Please correct me if I am wrong.
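
The scoping rule described above can be reproduced without Spark. The following is a minimal sketch in Scala 2 syntax; `Plan`, `TableOps`, `Context` and `planToTableOps` are hypothetical stand-ins introduced only for illustration and are not the real `LogicalPlan`, `SchemaRDD` or `SQLContext`. It shows that an implicit conversion declared as a class member becomes usable only after importing from an instance.

```scala
import scala.collection.mutable
import scala.language.implicitConversions

// Hypothetical stand-in for a logical plan; NOT a Spark class.
final case class Plan(description: String)

// Hypothetical wrapper that adds registerTempTable to a Plan.
final class TableOps(plan: Plan, registry: mutable.Map[String, Plan]) {
  def registerTempTable(name: String): Unit = registry.update(name, plan)
}

// The implicit conversion is a member of the class. There is no
// `object Context`, so the conversion exists only on instances.
class Context {
  val tables: mutable.Map[String, Plan] = mutable.Map.empty

  implicit def planToTableOps(plan: Plan): TableOps = new TableOps(plan, tables)
}

object ImplicitImportDemo {
  def register(ctx: Context, name: String, plan: Plan): Unit = {
    // `import Context._` would not compile here because no companion
    // object named Context is defined. Importing from the instance works.
    import ctx._
    plan.registerTempTable(name)
  }

  def main(args: Array[String]): Unit = {
    val ctx = new Context
    register(ctx, "t", Plan("SELECT 1"))
    println(ctx.tables("t"))
  }
}
```

If a companion `object Context` declared the conversion instead, a file-level `import Context._` would be enough, but the conversion would then have no access to per-instance state such as `tables`.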





[GitHub] spark pull request: [SPARK-2594][SQL] Support CACHE TABLE name A...

2014-09-16 Thread ravipesala
Github user ravipesala commented on the pull request:

https://github.com/apache/spark/pull/2397#issuecomment-55806308
  
Changed the behavior from eager to lazy caching, and also updated the 
description.
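
For readers unfamiliar with the distinction, here is a generic sketch of lazy evaluation in Scala, assuming nothing about how Spark SQL materializes cached tables. `CachedResult` is a hypothetical helper, not part of Spark; it only illustrates that a `lazy val` defers work until first access and then reuses the result.

```scala
// Hypothetical helper for illustration; NOT part of Spark.
final class CachedResult[A](compute: () => A) {
  private var runs = 0

  // A `lazy val` is evaluated on first access and then reused,
  // so constructing the object does not trigger the computation.
  private lazy val value: A = {
    runs += 1
    compute()
  }

  def get: A = value
  def timesComputed: Int = runs
}

object LazyCachingDemo {
  def main(args: Array[String]): Unit = {
    val cached = new CachedResult(() => (1 to 5).sum)
    println(cached.timesComputed) // still zero: nothing has been read yet
    println(cached.get)           // first read triggers the computation
    println(cached.get)           // second read reuses the stored value
    println(cached.timesComputed)
  }
}
```

An eager variant would use a plain `val`, which runs `compute()` during construction whether or not the result is ever read.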





[GitHub] spark pull request: [SPARK-2594][SQL] Support CACHE TABLE name A...

2014-09-16 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2397#issuecomment-55806367
  
  [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20419/consoleFull) for PR 2397 at commit [`8059cd2`](https://github.com/apache/spark/commit/8059cd261dd079ab583531484dd02af452a74a18).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-2594][SQL] Support CACHE TABLE name A...

2014-09-16 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2397#issuecomment-55818046
  
  [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20419/consoleFull) for PR 2397 at commit [`8059cd2`](https://github.com/apache/spark/commit/8059cd261dd079ab583531484dd02af452a74a18).
 * This patch **passes** unit tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `case class CacheTableAsSelectCommand(tableName: String, plan: LogicalPlan) extends Command`
  * `case class CacheTableAsSelectCommand(tableName: String, plan: LogicalPlan)`






[GitHub] spark pull request: [SPARK-2594][SQL] Support CACHE TABLE name A...

2014-09-16 Thread liancheng
Github user liancheng commented on a diff in the pull request:

https://github.com/apache/spark/pull/2397#discussion_r17635163
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveQl.scala ---
@@ -229,7 +229,13 @@ private[hive] object HiveQl {
             SetCommand(Some(key), Some(value))
         }
       } else if (sql.trim.toLowerCase.startsWith("cache table")) {
-        CacheCommand(sql.trim.drop(12).trim, true)
+        sql.trim.drop(12).trim.split(" ").toSeq match {
+          case Seq(tableName) =>
+            CacheCommand(tableName, true)
+          case Seq(tableName,as, select@_*) =>
--- End diff --

@chenghao-intel I agree that our current HiveQL syntax extension scheme in 
Spark SQL is quite hacky and brittle... Other commands like `SET`, `ADD JAR` 
and `DFS` suffer from the same problem. However, I'd like to fix them all 
together in a future PR :)
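
To make the brittleness concrete, here is a standalone sketch of the token-matching approach used in the diff, written without any Spark dependency. `CacheStatement`, `CacheExisting`, `CacheAsSelect` and `CacheTableParser` are hypothetical names for illustration, not Spark's command classes. Unlike the diff, this variant splits on runs of whitespace and checks the `AS` token explicitly; both are assumptions about what a more defensive version might do, not what the PR does.

```scala
// Hypothetical result types for illustration; NOT Spark's command classes.
sealed trait CacheStatement
final case class CacheExisting(tableName: String) extends CacheStatement
final case class CacheAsSelect(tableName: String, query: String) extends CacheStatement

object CacheTableParser {
  private def isKeyword(token: String, keyword: String): Boolean =
    token.equalsIgnoreCase(keyword)

  def parse(sql: String): Option[CacheStatement] =
    // Tokenize on runs of whitespace, then match on the token sequence.
    sql.trim.split("\\s+").toSeq match {
      // CACHE TABLE <name>
      case Seq(cache, table, tableName)
          if isKeyword(cache, "cache") && isKeyword(table, "table") =>
        Some(CacheExisting(tableName))
      // CACHE TABLE <name> AS <query tokens...>
      case Seq(cache, table, tableName, asToken, select @ _*)
          if isKeyword(cache, "cache") && isKeyword(table, "table") &&
            isKeyword(asToken, "as") && select.nonEmpty =>
        Some(CacheAsSelect(tableName, select.mkString(" ")))
      case _ => None
    }
}
```

Even this version stays brittle: rejoining tokens with a single space rewrites whitespace inside quoted string literals in the query, which is one reason a real grammar is preferable to string splitting.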





[GitHub] spark pull request: [SPARK-2594][SQL] Support CACHE TABLE name A...

2014-09-16 Thread chenghao-intel
Github user chenghao-intel commented on a diff in the pull request:

https://github.com/apache/spark/pull/2397#discussion_r17643640
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveQl.scala ---
@@ -229,7 +229,13 @@ private[hive] object HiveQl {
             SetCommand(Some(key), Some(value))
         }
       } else if (sql.trim.toLowerCase.startsWith("cache table")) {
-        CacheCommand(sql.trim.drop(12).trim, true)
+        sql.trim.drop(12).trim.split(" ").toSeq match {
+          case Seq(tableName) =>
+            CacheCommand(tableName, true)
+          case Seq(tableName,as, select@_*) =>
--- End diff --

Thank you @ravipesala @liancheng , let's improve that in the future. :)

