[GitHub] spark pull request: [SPARK-2594][SQL] Support CACHE TABLE name A...
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/2397#issuecomment-56244356 Thanks! I've merged this to master. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-2594][SQL] Support CACHE TABLE name A...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/2397
[GitHub] spark pull request: [SPARK-2594][SQL] Support CACHE TABLE name A...
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/2397#discussion_r17710806

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/commands.scala ---
@@ -166,3 +166,20 @@ case class DescribeCommand(child: SparkPlan, output: Seq[Attribute])(
       child.output.map(field => Row(field.name, field.dataType.toString, null))
   }
 }
+
+/**
+ * :: DeveloperApi ::
+ */
+@DeveloperApi
+case class CacheTableAsSelectCommand(tableName: String, plan: LogicalPlan)
+  extends LeafNode with Command {
+
+  override protected[sql] lazy val sideEffectResult = {
+    sqlContext.catalog.registerTable(None, tableName, sqlContext.executePlan(plan).analyzed)
--- End diff --

Sorry again, you're right, I mistook `sqlContext._` for `SparkContext._`...
[GitHub] spark pull request: [SPARK-2594][SQL] Support CACHE TABLE name A...
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/spark/pull/2397#discussion_r17711807

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/commands.scala ---
@@ -166,3 +166,20 @@ case class DescribeCommand(child: SparkPlan, output: Seq[Attribute])(
       child.output.map(field => Row(field.name, field.dataType.toString, null))
   }
 }
+
+/**
+ * :: DeveloperApi ::
+ */
+@DeveloperApi
+case class CacheTableAsSelectCommand(tableName: String, plan: LogicalPlan)
+  extends LeafNode with Command {
+
+  override protected[sql] lazy val sideEffectResult = {
+    sqlContext.catalog.registerTable(None, tableName, sqlContext.executePlan(plan).analyzed)
--- End diff --

Updated the code. Please review.
[GitHub] spark pull request: [SPARK-2594][SQL] Support CACHE TABLE name A...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2397#issuecomment-56001845

[QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20528/consoleFull) for PR 2397 at commit [`a5f0beb`](https://github.com/apache/spark/commit/a5f0beb395836c76b3e7883ef7f1f61433645500).
* This patch merges cleanly.
[GitHub] spark pull request: [SPARK-2594][SQL] Support CACHE TABLE name A...
Github user ravipesala commented on the pull request: https://github.com/apache/spark/pull/2397#issuecomment-56001847 Updated as per comments. Please review.
[GitHub] spark pull request: [SPARK-2594][SQL] Support CACHE TABLE name A...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2397#issuecomment-56005574

[QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20528/consoleFull) for PR 2397 at commit [`a5f0beb`](https://github.com/apache/spark/commit/a5f0beb395836c76b3e7883ef7f1f61433645500).
* This patch **passes** unit tests.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `case class CacheTableAsSelectCommand(tableName: String, plan: LogicalPlan) extends Command`
  * `case class CacheTableAsSelectCommand(tableName: String, logicalPlan: LogicalPlan)`
[GitHub] spark pull request: [SPARK-2594][SQL] Support CACHE TABLE name A...
Github user liancheng commented on the pull request: https://github.com/apache/spark/pull/2397#issuecomment-56098131 @ravipesala Thanks for working on this! @marmbrus I think this is ready to go :)
[GitHub] spark pull request: [SPARK-2594][SQL] Support CACHE TABLE name A...
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/2397#discussion_r17648914

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/commands.scala ---
@@ -166,3 +166,20 @@ case class DescribeCommand(child: SparkPlan, output: Seq[Attribute])(
       child.output.map(field => Row(field.name, field.dataType.toString, null))
   }
 }
+
+/**
+ * :: DeveloperApi ::
+ */
+@DeveloperApi
+case class CacheTableAsSelectCommand(tableName: String, plan: LogicalPlan)
+  extends LeafNode with Command {
+
+  override protected[sql] lazy val sideEffectResult = {
+    sqlContext.catalog.registerTable(None, tableName, sqlContext.executePlan(plan).analyzed)
--- End diff --

(Probably my final comment on this PR :) ) As described in PR #2382, we shouldn't store the analyzed logical plan when registering tables any more (see [here](https://github.com/apache/spark/pull/2382/files?diff=split#diff-5)). To avoid duplicated code, I'd suggest importing `SQLContext._` so that we can leverage [the implicit conversion](https://github.com/apache/spark/blob/008a5ed4808d1467b47c1d6fa4d950cc6c4976b7/sql/core/src/main/scala/org/apache/spark/sql/SQLContext.scala#L78-L85) from `LogicalPlan` to `SchemaRDD`, and then simply do this:

```scala
sqlContext.executePlan(plan).logical.registerTempTable(tableName)
```
[GitHub] spark pull request: [SPARK-2594][SQL] Support CACHE TABLE name A...
Github user liancheng commented on the pull request: https://github.com/apache/spark/pull/2397#issuecomment-55857224 LGTM except for the analyzed logical plan issue as mentioned in my last comment. Thanks for working on this!
[GitHub] spark pull request: [SPARK-2594][SQL] Support CACHE TABLE name A...
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/spark/pull/2397#discussion_r17659871

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/commands.scala ---
@@ -166,3 +166,20 @@ case class DescribeCommand(child: SparkPlan, output: Seq[Attribute])(
       child.output.map(field => Row(field.name, field.dataType.toString, null))
   }
 }
+
+/**
+ * :: DeveloperApi ::
+ */
+@DeveloperApi
+case class CacheTableAsSelectCommand(tableName: String, plan: LogicalPlan)
+  extends LeafNode with Command {
+
+  override protected[sql] lazy val sideEffectResult = {
+    sqlContext.catalog.registerTable(None, tableName, sqlContext.executePlan(plan).analyzed)
--- End diff --

Thank you for your comment. It is a good idea to import ```sqlContext._```. If we import it, though, we can simplify the code as shown below. Please comment on it.

```
import sqlContext._
plan.registerTempTable(tableName)
cacheTable(tableName)
```
[GitHub] spark pull request: [SPARK-2594][SQL] Support CACHE TABLE name A...
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/2397#discussion_r17680055

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/commands.scala ---
@@ -166,3 +166,20 @@ case class DescribeCommand(child: SparkPlan, output: Seq[Attribute])(
       child.output.map(field => Row(field.name, field.dataType.toString, null))
   }
 }
+
+/**
+ * :: DeveloperApi ::
+ */
+@DeveloperApi
+case class CacheTableAsSelectCommand(tableName: String, plan: LogicalPlan)
+  extends LeafNode with Command {
+
+  override protected[sql] lazy val sideEffectResult = {
+    sqlContext.catalog.registerTable(None, tableName, sqlContext.executePlan(plan).analyzed)
--- End diff --

Ah, yes you're right, we can use `plan` directly. And instead of importing `sqlContext._`, I'd import `SQLContext._` in the import section at the beginning of this file:

```
import org.apache.spark.sql.SQLContext._
```
[GitHub] spark pull request: [SPARK-2594][SQL] Support CACHE TABLE name A...
Github user ravipesala commented on a diff in the pull request: https://github.com/apache/spark/pull/2397#discussion_r17707447

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/commands.scala ---
@@ -166,3 +166,20 @@ case class DescribeCommand(child: SparkPlan, output: Seq[Attribute])(
       child.output.map(field => Row(field.name, field.dataType.toString, null))
   }
 }
+
+/**
+ * :: DeveloperApi ::
+ */
+@DeveloperApi
+case class CacheTableAsSelectCommand(tableName: String, plan: LogicalPlan)
+  extends LeafNode with Command {
+
+  override protected[sql] lazy val sideEffectResult = {
+    sqlContext.catalog.registerTable(None, tableName, sqlContext.executePlan(plan).analyzed)
--- End diff --

It seems we cannot use ```import org.apache.spark.sql.SQLContext._``` at the beginning of the file to bring the implicits into scope, because there is no ```object``` defined for ```SQLContext``` and the implicits are members of ```class SQLContext``` only. We can only import them from an instance, as in ```import sqlContext._```. Please correct me if I am wrong.
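
The point about class-member implicits can be illustrated with a minimal, Spark-free sketch. Every name below (`Plan`, `RichPlan`, `Context`, `planToRich`, `ImplicitImportDemo`) is a hypothetical stand-in chosen for illustration, not one of Spark's real classes; the sketch only assumes, as the comment above states, that the conversion is defined inside a class with no companion object.

```scala
import scala.collection.mutable
import scala.language.implicitConversions

// Hypothetical stand-in for a logical plan; not Spark's LogicalPlan.
final case class Plan(description: String)

// Hypothetical wrapper that adds registerTempTable to a Plan.
final class RichPlan(plan: Plan, registry: mutable.Map[String, Plan]) {
  def registerTempTable(name: String): Unit = registry.update(name, plan)
}

// Hypothetical context class. It has no companion object, so a
// top-of-file `import Context._` would not compile.
class Context {
  val tables: mutable.Map[String, Plan] = mutable.Map.empty

  // The conversion is an instance member: it needs a Context to exist.
  implicit def planToRich(plan: Plan): RichPlan = new RichPlan(plan, tables)
}

object ImplicitImportDemo {
  def register(ctx: Context, name: String, plan: Plan): Unit = {
    import ctx._ // brings ctx.planToRich into implicit scope
    plan.registerTempTable(name)
  }
}
```

Because `planToRich` closes over the instance's `tables`, the import has to name a concrete instance (`ctx`), which mirrors why `import sqlContext._` is the form that works in the discussion above.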
[GitHub] spark pull request: [SPARK-2594][SQL] Support CACHE TABLE name A...
Github user ravipesala commented on the pull request: https://github.com/apache/spark/pull/2397#issuecomment-55806308 Changed the behavior from eager to lazy caching. And also updated the description.
[GitHub] spark pull request: [SPARK-2594][SQL] Support CACHE TABLE name A...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2397#issuecomment-55806367

[QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20419/consoleFull) for PR 2397 at commit [`8059cd2`](https://github.com/apache/spark/commit/8059cd261dd079ab583531484dd02af452a74a18).
* This patch merges cleanly.
[GitHub] spark pull request: [SPARK-2594][SQL] Support CACHE TABLE name A...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2397#issuecomment-55818046

[QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20419/consoleFull) for PR 2397 at commit [`8059cd2`](https://github.com/apache/spark/commit/8059cd261dd079ab583531484dd02af452a74a18).
* This patch **passes** unit tests.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `case class CacheTableAsSelectCommand(tableName: String, plan: LogicalPlan) extends Command`
  * `case class CacheTableAsSelectCommand(tableName: String, plan: LogicalPlan)`
[GitHub] spark pull request: [SPARK-2594][SQL] Support CACHE TABLE name A...
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/2397#discussion_r17635163

--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveQl.scala ---
@@ -229,7 +229,13 @@ private[hive] object HiveQl {
           SetCommand(Some(key), Some(value))
       }
     } else if (sql.trim.toLowerCase.startsWith("cache table")) {
-      CacheCommand(sql.trim.drop(12).trim, true)
+      sql.trim.drop(12).trim.split(" ").toSeq match {
+        case Seq(tableName) =>
+          CacheCommand(tableName, true)
+        case Seq(tableName,as, select@_*) =>
--- End diff --

@chenghao-intel I agree that currently our HiveQL syntax extension scheme is quite hacky and brittle in Spark SQL... Other commands like `SET`, `ADD JAR` and `DFS` etc. also suffer from the same problem. However, I'd like to fix them all together in a future PR :)
[GitHub] spark pull request: [SPARK-2594][SQL] Support CACHE TABLE name A...
Github user chenghao-intel commented on a diff in the pull request: https://github.com/apache/spark/pull/2397#discussion_r17643640

--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveQl.scala ---
@@ -229,7 +229,13 @@ private[hive] object HiveQl {
           SetCommand(Some(key), Some(value))
       }
     } else if (sql.trim.toLowerCase.startsWith("cache table")) {
-      CacheCommand(sql.trim.drop(12).trim, true)
+      sql.trim.drop(12).trim.split(" ").toSeq match {
+        case Seq(tableName) =>
+          CacheCommand(tableName, true)
+        case Seq(tableName,as, select@_*) =>
--- End diff --

Thank you @ravipesala @liancheng, let's improve that in the future. :)
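
For readers following the `HiveQl.scala` hunk quoted above, here is a rough, self-contained sketch of the prefix-and-split style of parsing under discussion. It is not the code that was merged: the result types (`Parsed`, `CacheTable`, `CacheTableAsSelect`, `Unrecognized`) and the `CacheTableParser` object are hypothetical, and the explicit check on the `AS` keyword is an addition made here for illustration.

```scala
// Hypothetical result types, used only for this sketch.
sealed trait Parsed
final case class CacheTable(name: String) extends Parsed
final case class CacheTableAsSelect(name: String, query: String) extends Parsed
final case class Unrecognized(sql: String) extends Parsed

object CacheTableParser {
  private val Prefix = "cache table"

  def parse(sql: String): Parsed = {
    val trimmed = sql.trim
    if (!trimmed.toLowerCase.startsWith(Prefix)) {
      Unrecognized(sql)
    } else {
      // Drop the prefix, then split the remainder on whitespace.
      trimmed.drop(Prefix.length).trim.split("\\s+").toSeq match {
        case Seq(name) if name.nonEmpty =>
          CacheTable(name)
        case Seq(name, as, rest @ _*) if as.equalsIgnoreCase("as") && rest.nonEmpty =>
          // Re-joining the tokens collapses the query's original whitespace.
          CacheTableAsSelect(name, rest.mkString(" "))
        case _ =>
          Unrecognized(sql)
      }
    }
  }
}
```

Even in this small form the brittleness mentioned in the thread shows: input such as `cache tablefoo` is accepted as a table named `foo`, and whitespace inside string literals in the query is not preserved.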