[GitHub] spark pull request: [SPARK-3343] [SQL] Add serde support for CTAS

2014-10-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/2570#issuecomment-60714744
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22337/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3343] [SQL] Add serde support for CTAS

2014-10-28 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2570#issuecomment-60714742
  
  [Test build #22337 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22337/consoleFull)
 for   PR 2570 at commit 
[`e011ef5`](https://github.com/apache/spark/commit/e011ef557be3f438c5052e65462d5fbb89b51b6d).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `case class CreateTableAsSelect[T](`
  * `  logDebug(Found class for $serdeName)`






[GitHub] spark pull request: [SPARK-3343] [SQL] Add serde support for CTAS

2014-10-28 Thread marmbrus
Github user marmbrus commented on the pull request:

https://github.com/apache/spark/pull/2570#issuecomment-60835836
  
Thanks for working on this!  I'm going to merge it in, but we should 
consider moving the semantic analyzer part to the analysis phase.  These 
execution APIs are all developer / experimental so we can change them whenever.





[GitHub] spark pull request: [SPARK-3343] [SQL] Add serde support for CTAS

2014-10-28 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/2570





[GitHub] spark pull request: [SPARK-3343] [SQL] Add serde support for CTAS

2014-10-27 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2570#issuecomment-60560204
  
  [Test build #22283 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22283/consoleFull)
 for   PR 2570 at commit 
[`53d0c7a`](https://github.com/apache/spark/commit/53d0c7a911748efce5670ec79f4f565fa5b17950).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-3343] [SQL] Add serde support for CTAS

2014-10-27 Thread chenghao-intel
Github user chenghao-intel commented on the pull request:

https://github.com/apache/spark/pull/2570#issuecomment-60560261
  
@marmbrus, I've rebased the code on the latest master (Hive 0.13.1 is 
supported, but the code is not compatible with Hive 0.12). Please let me know 
if you have concerns about this.






[GitHub] spark pull request: [SPARK-3343] [SQL] Add serde support for CTAS

2014-10-27 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2570#issuecomment-60560430
  
  [Test build #22283 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22283/consoleFull)
 for   PR 2570 at commit 
[`53d0c7a`](https://github.com/apache/spark/commit/53d0c7a911748efce5670ec79f4f565fa5b17950).
 * This patch **fails to build**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `case class CreateTableAsSelect[T](`
  * `  logDebug(Found class for $serdeName)`






[GitHub] spark pull request: [SPARK-3343] [SQL] Add serde support for CTAS

2014-10-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/2570#issuecomment-60560431
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22283/
Test FAILed.





[GitHub] spark pull request: [SPARK-3343] [SQL] Add serde support for CTAS

2014-10-27 Thread chenghao-intel
Github user chenghao-intel commented on the pull request:

https://github.com/apache/spark/pull/2570#issuecomment-60562071
  
The build failed because this PR is only compatible with Hive 0.13.1 (not 
0.12 any more).





[GitHub] spark pull request: [SPARK-3343] [SQL] Add serde support for CTAS

2014-10-27 Thread yhuai
Github user yhuai commented on the pull request:

https://github.com/apache/spark/pull/2570#issuecomment-60590668
  
Can you explain why Hive 0.12 is not supported?





[GitHub] spark pull request: [SPARK-3343] [SQL] Add serde support for CTAS

2014-10-27 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2570#issuecomment-60696885
  
  [Test build #22317 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22317/consoleFull)
 for   PR 2570 at commit 
[`2ab88c3`](https://github.com/apache/spark/commit/2ab88c350ef5f8d6e28d0a706d0414bae9e92c42).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-3343] [SQL] Add serde support for CTAS

2014-10-27 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/2570#issuecomment-60697467
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/22317/
Test FAILed.





[GitHub] spark pull request: [SPARK-3343] [SQL] Add serde support for CTAS

2014-10-27 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2570#issuecomment-60697462
  
  [Test build #22317 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22317/consoleFull)
 for   PR 2570 at commit 
[`2ab88c3`](https://github.com/apache/spark/commit/2ab88c350ef5f8d6e28d0a706d0414bae9e92c42).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `case class CreateTableAsSelect[T](`
  * `  logDebug(Found class for $serdeName)`






[GitHub] spark pull request: [SPARK-3343] [SQL] Add serde support for CTAS

2014-10-27 Thread chenghao-intel
Github user chenghao-intel commented on the pull request:

https://github.com/apache/spark/pull/2570#issuecomment-60698604
  
@yhuai Some method signatures changed after the upgrade to Hive 0.13; 
this is actually my concern about how to write the shim code.
For this case:

https://github.com/apache/hive/blob/branch-0.13/ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java#L4130

https://github.com/apache/hive/blob/branch-0.12/ql/src/java/org/apache/hadoop/hive/ql/exec/DDLTask.java#L3597





[GitHub] spark pull request: [SPARK-3343] [SQL] Add serde support for CTAS

2014-10-27 Thread yhuai
Github user yhuai commented on the pull request:

https://github.com/apache/spark/pull/2570#issuecomment-60699948
  
Is it possible to add a method in shim like the following one?
```
// Hive 0.13
def setTableDataLocation(table: Table, createTableDesc: CreateTableDesc) {
  table.setDataLocation(new Path(createTableDesc.getLocation()));
}
// Hive 0.12
def setTableDataLocation(table: Table, createTableDesc: CreateTableDesc) {
  table.setDataLocation(new Path(createTableDesc.getLocation()).toUri());
}
```





[GitHub] spark pull request: [SPARK-3343] [SQL] Add serde support for CTAS

2014-10-27 Thread chenghao-intel
Github user chenghao-intel commented on the pull request:

https://github.com/apache/spark/pull/2570#issuecomment-60702583
  
Supporting conditional compilation probably needs some workaround here; I 
think that's a general problem with the Hive upgrade. We need another PR to 
solve that before merging PRs like this one.





[GitHub] spark pull request: [SPARK-3343] [SQL] Add serde support for CTAS

2014-10-27 Thread marmbrus
Github user marmbrus commented on the pull request:

https://github.com/apache/spark/pull/2570#issuecomment-60705802
  
@chenghao-intel there is already support for conditional compilation based 
on the Hive version.  This code can go in 
`sql/hive/v0.X.0/src/main/scala/org/apache/spark/sql/hive/ShimX.scala`





[GitHub] spark pull request: [SPARK-3343] [SQL] Add serde support for CTAS

2014-10-27 Thread chenghao-intel
Github user chenghao-intel commented on the pull request:

https://github.com/apache/spark/pull/2570#issuecomment-60710618
  
Thanks, @marmbrus. I will update the code.





[GitHub] spark pull request: [SPARK-3343] [SQL] Add serde support for CTAS

2014-10-27 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2570#issuecomment-60711633
  
  [Test build #22337 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/22337/consoleFull)
 for   PR 2570 at commit 
[`e011ef5`](https://github.com/apache/spark/commit/e011ef557be3f438c5052e65462d5fbb89b51b6d).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-3343] [SQL] Add serde support for CTAS

2014-10-26 Thread marmbrus
Github user marmbrus commented on a diff in the pull request:

https://github.com/apache/spark/pull/2570#discussion_r19384148
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveStrategies.scala ---
@@ -160,17 +162,14 @@ private[hive] trait HiveStrategies {
     def apply(plan: LogicalPlan): Seq[SparkPlan] = plan match {
       case logical.InsertIntoTable(table: MetastoreRelation, partition, child, overwrite) =>
         InsertIntoHiveTable(table, partition, planLater(child), overwrite)(hiveContext) :: Nil
-
-      case logical.CreateTableAsSelect(database, tableName, child) =>
-        val query = planLater(child)
+      case logical.CreateTableAsSelect(
+             Some(database), tableName, child, allowExisting, Some(extra: ASTNode)) =>
--- End diff --

Will the database always be specified?  If so maybe it shouldn't be an 
Option.





[GitHub] spark pull request: [SPARK-3343] [SQL] Add serde support for CTAS

2014-10-26 Thread marmbrus
Github user marmbrus commented on a diff in the pull request:

https://github.com/apache/spark/pull/2570#discussion_r19384160
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/CreateTableAsSelect.scala ---
@@ -30,33 +32,46 @@ import org.apache.spark.sql.hive.MetastoreRelation
  * Create table and insert the query result into it.
  * @param database the database name of the new relation
  * @param tableName the table name of the new relation
- * @param insertIntoRelation function of creating the `InsertIntoHiveTable`
- *        by specifying the `MetaStoreRelation`, the data will be inserted into that table.
- * TODO Add more table creating properties,  e.g. SerDe, StorageHandler, in-memory cache etc.
+ * @param query the query whose result will be insert into the new relation
+ * @param allowExisting allow continue working if it's already exists, otherwise
+ *                      raise exception
+ * @param extra the extra information for this Operator, it should be the
+ *              ASTNode object for extracting the CreateTableDesc.
+
  */
 @Experimental
 case class CreateTableAsSelect(
     database: String,
     tableName: String,
-    query: SparkPlan,
-    insertIntoRelation: MetastoreRelation => InsertIntoHiveTable)
-  extends LeafNode with Command {
+    query: LogicalPlan,
+    allowExisting: Boolean,
+    extra: ASTNode) extends LeafNode with Command {
--- End diff --

It seems a little odd that an AST is making it all the way to execution.  
Should we resolve the `TableDesc` earlier?





[GitHub] spark pull request: [SPARK-3343] [SQL] Add serde support for CTAS

2014-10-26 Thread marmbrus
Github user marmbrus commented on the pull request:

https://github.com/apache/spark/pull/2570#issuecomment-60534526
  
This would be a great feature to get in before 1.2.  I made a few minor 
comments.  Also, can you rebase?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3343] [SQL] Add serde support for CTAS

2014-10-26 Thread chenghao-intel
Github user chenghao-intel commented on a diff in the pull request:

https://github.com/apache/spark/pull/2570#discussion_r19386385
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveStrategies.scala ---
@@ -160,17 +162,14 @@ private[hive] trait HiveStrategies {
     def apply(plan: LogicalPlan): Seq[SparkPlan] = plan match {
       case logical.InsertIntoTable(table: MetastoreRelation, partition, child, overwrite) =>
         InsertIntoHiveTable(table, partition, planLater(child), overwrite)(hiveContext) :: Nil
-
-      case logical.CreateTableAsSelect(database, tableName, child) =>
-        val query = planLater(child)
+      case logical.CreateTableAsSelect(
+             Some(database), tableName, child, allowExisting, Some(extra: ASTNode)) =>
--- End diff --

Yes, it's always specified here. We extract the real value in the 
pattern match so that we can use the variable `database` directly later on; 
otherwise, we would need to call `database.get`.
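
A minimal sketch of the pattern described above, assuming a simplified, hypothetical `CreateTableAsSelect` with only three fields (it is not the class from this PR, and `OptionMatchSketch` and its methods are names invented for illustration). It compares binding the value through `Some(...)` in the pattern with calling `.get` on the `Option`:

```scala
// Hypothetical, simplified stand-in for the logical plan node discussed above.
final case class CreateTableAsSelect(
    database: Option[String],
    tableName: String,
    allowExisting: Boolean)

object OptionMatchSketch {
  // Binding through Some(...) in the pattern yields a plain String named `database`.
  def qualifiedName(plan: CreateTableAsSelect): Option[String] = plan match {
    case CreateTableAsSelect(Some(database), tableName, _) =>
      Some(s"$database.$tableName")
    case _ =>
      // No database supplied: the first case does not match at all.
      None
  }

  // Same access without destructuring; throws NoSuchElementException on None.
  def qualifiedNameUnsafe(plan: CreateTableAsSelect): String =
    s"${plan.database.get}.${plan.tableName}"
}
```

One consequence worth noting: with the `Some(database)` pattern, a plan whose database is `None` silently falls through to whatever case comes next, which is part of why the question of whether the field needs to be an `Option` matters.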





[GitHub] spark pull request: [SPARK-3343] [SQL] Add serde support for CTAS

2014-10-26 Thread chenghao-intel
Github user chenghao-intel commented on a diff in the pull request:

https://github.com/apache/spark/pull/2570#discussion_r19386522
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/CreateTableAsSelect.scala ---
@@ -30,33 +32,46 @@ import org.apache.spark.sql.hive.MetastoreRelation
  * Create table and insert the query result into it.
  * @param database the database name of the new relation
  * @param tableName the table name of the new relation
- * @param insertIntoRelation function of creating the `InsertIntoHiveTable`
- *        by specifying the `MetaStoreRelation`, the data will be inserted into that table.
- * TODO Add more table creating properties,  e.g. SerDe, StorageHandler, in-memory cache etc.
+ * @param query the query whose result will be insert into the new relation
+ * @param allowExisting allow continue working if it's already exists, otherwise
+ *                      raise exception
+ * @param extra the extra information for this Operator, it should be the
+ *              ASTNode object for extracting the CreateTableDesc.
+
  */
 @Experimental
 case class CreateTableAsSelect(
     database: String,
     tableName: String,
-    query: SparkPlan,
-    insertIntoRelation: MetastoreRelation => InsertIntoHiveTable)
-  extends LeafNode with Command {
+    query: LogicalPlan,
+    allowExisting: Boolean,
+    extra: ASTNode) extends LeafNode with Command {
--- End diff --

Yes, that's a good question.
I was actually considering resolving that in `HiveQl` or `Analyzer` (in 
package hive), but that requires the hiveconf as input, which would probably 
need some refactoring of the existing code. So I marked this operator as 
`Experimental` and will keep it that way for future updates.

```
val sa = new SemanticAnalyzer(sc.hiveconf)
sa.analyze(extra, new Context(sc.hiveconf))
```






[GitHub] spark pull request: [SPARK-3343] [SQL] Add serde support for CTAS

2014-10-26 Thread chenghao-intel
Github user chenghao-intel commented on a diff in the pull request:

https://github.com/apache/spark/pull/2570#discussion_r19386550
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/CreateTableAsSelect.scala ---
@@ -30,33 +32,46 @@ import org.apache.spark.sql.hive.MetastoreRelation
  * Create table and insert the query result into it.
  * @param database the database name of the new relation
  * @param tableName the table name of the new relation
- * @param insertIntoRelation function of creating the `InsertIntoHiveTable`
- *        by specifying the `MetaStoreRelation`, the data will be inserted into that table.
- * TODO Add more table creating properties,  e.g. SerDe, StorageHandler, in-memory cache etc.
+ * @param query the query whose result will be insert into the new relation
+ * @param allowExisting allow continue working if it's already exists, otherwise
+ *                      raise exception
+ * @param extra the extra information for this Operator, it should be the
+ *              ASTNode object for extracting the CreateTableDesc.
+
  */
 @Experimental
 case class CreateTableAsSelect(
     database: String,
     tableName: String,
-    query: SparkPlan,
-    insertIntoRelation: MetastoreRelation => InsertIntoHiveTable)
-  extends LeafNode with Command {
+    query: LogicalPlan,
+    allowExisting: Boolean,
+    extra: ASTNode) extends LeafNode with Command {
--- End diff --

And of course I can do the refactoring in another PR if you feel it's 
very important to keep a stable API.





[GitHub] spark pull request: [SPARK-3343] [SQL] Add serde support for CTAS

2014-10-25 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2570#issuecomment-60475354
  
  [Test build #444 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/444/consoleFull)
 for   PR 2570 at commit 
[`3774bd4`](https://github.com/apache/spark/commit/3774bd4617cb4dec3f78a08bdf42653b682102fd).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-3343] [SQL] Add serde support for CTAS

2014-10-25 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2570#issuecomment-60477038
  
  [Test build #444 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/444/consoleFull)
 for   PR 2570 at commit 
[`3774bd4`](https://github.com/apache/spark/commit/3774bd4617cb4dec3f78a08bdf42653b682102fd).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-3343] [SQL] Add serde support for CTAS

2014-10-22 Thread chenghao-intel
Github user chenghao-intel commented on the pull request:

https://github.com/apache/spark/pull/2570#issuecomment-60178363
  
@marmbrus, sorry, I know you've been super busy recently, but can you give 
more comments on this?





[GitHub] spark pull request: [SPARK-3343] [SQL] Add serde support for CTAS

2014-10-14 Thread chenghao-intel
Github user chenghao-intel commented on the pull request:

https://github.com/apache/spark/pull/2570#issuecomment-59049695
  
@marmbrus, any more comments on this?





[GitHub] spark pull request: [SPARK-3343] [SQL] Add serde support for CTAS

2014-10-11 Thread chenghao-intel
Github user chenghao-intel commented on the pull request:

https://github.com/apache/spark/pull/2570#issuecomment-58739817
  
test this please.





[GitHub] spark pull request: [SPARK-3343] [SQL] Add serde support for CTAS

2014-10-11 Thread chenghao-intel
Github user chenghao-intel commented on the pull request:

https://github.com/apache/spark/pull/2570#issuecomment-58739815
  
It seems the failure is not related to this PR.





[GitHub] spark pull request: [SPARK-3343] [SQL] Add serde support for CTAS

2014-10-11 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2570#issuecomment-58739903
  
  [QA tests have 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21635/consoleFull)
 for   PR 2570 at commit 
[`3774bd4`](https://github.com/apache/spark/commit/3774bd4617cb4dec3f78a08bdf42653b682102fd).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-3343] [SQL] Add serde support for CTAS

2014-10-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/2570#issuecomment-58740765
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21635/
Test PASSed.





[GitHub] spark pull request: [SPARK-3343] [SQL] Add serde support for CTAS

2014-10-11 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2570#issuecomment-58740758
  
  [QA tests have 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21635/consoleFull)
 for   PR 2570 at commit 
[`3774bd4`](https://github.com/apache/spark/commit/3774bd4617cb4dec3f78a08bdf42653b682102fd).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `case class CreateTableAsSelect[T](`
  * `  logDebug(Found class for $serdeName)`






[GitHub] spark pull request: [SPARK-3343] [SQL] Add serde support for CTAS

2014-10-10 Thread chenghao-intel
Github user chenghao-intel commented on a diff in the pull request:

https://github.com/apache/spark/pull/2570#discussion_r18690123
  
--- Diff: 
sql/hive/compatibility/src/test/scala/org/apache/spark/sql/hive/execution/HiveCompatibilitySuite.scala
 ---
@@ -211,7 +211,11 @@ class HiveCompatibilitySuite extends HiveQueryFileTest with BeforeAndAfter {
 "describe_comment_indent",
 
 // Limit clause without a ordering, which causes failure.
-"orc_predicate_pushdown"
+"orc_predicate_pushdown",
+
+// Sort with Limit clause causes failure.
--- End diff --

Yes, the failure is partly due to #2859. I will keep updating the blacklist
once #2859 is merged.
For now, the unit test added in `SQLQuerySuite` should serve the same
purpose.





[GitHub] spark pull request: [SPARK-3343] [SQL] Add serde support for CTAS

2014-10-10 Thread chenghao-intel
Github user chenghao-intel commented on a diff in the pull request:

https://github.com/apache/spark/pull/2570#discussion_r18692067
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicOperators.scala
 ---
@@ -117,8 +117,10 @@ case class InsertIntoTable(
 case class CreateTableAsSelect(
 databaseName: Option[String],
 tableName: String,
-child: LogicalPlan) extends UnaryNode {
-  override def output = child.output
+child: LogicalPlan,
+allowExisting: Boolean,
+extra: AnyRef = null) extends UnaryNode {
--- End diff --

What about making `extra` a generic type? `CTAS` is probably widely
supported by different SQL dialects, so creating a specialized version for
each one may lead to duplicated code.
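
A minimal sketch of the generic-type idea, not the PR's actual code: `LogicalPlan` and `Relation` below are simplified stand-ins for the Catalyst classes, and `HiveCtasInfo` and `CtasSketch` are hypothetical names invented for illustration. The type parameter `T` takes the place of `extra: AnyRef`, so a dialect-specific planner can read its payload without a cast.

```scala
// Simplified stand-ins for Catalyst's plan classes (assumptions, not Spark's real API).
sealed trait LogicalPlan { def output: Seq[String] }

case class Relation(name: String, output: Seq[String]) extends LogicalPlan

// One CTAS node shared by all dialects; T carries whatever the dialect's parser produced.
case class CreateTableAsSelect[T](
    databaseName: Option[String],
    tableName: String,
    child: LogicalPlan,
    allowExisting: Boolean,
    extra: T) extends LogicalPlan {
  // A command-like node exposes no output columns in this sketch.
  override def output: Seq[String] = Seq.empty
}

// Hypothetical Hive-side payload; in the PR this role is played by the parsed AST node.
case class HiveCtasInfo(serde: String, storedAs: String)

object CtasSketch {
  // The Hive-specific side reads the payload with its static type intact.
  def describe(plan: CreateTableAsSelect[HiveCtasInfo]): String = {
    val db = plan.databaseName.getOrElse("default")
    s"$db.${plan.tableName}: ${plan.extra.storedAs} via ${plan.extra.serde}"
  }
}
```

Another dialect would supply its own payload type for `T` and reuse the same node, which is the duplication this suggestion tries to avoid.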





[GitHub] spark pull request: [SPARK-3343] [SQL] Add serde support for CTAS

2014-10-10 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2570#issuecomment-58625852
  
  [QA tests have 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21586/consoleFull)
 for   PR 2570 at commit 
[`366e758`](https://github.com/apache/spark/commit/366e758c1d2ad2e793936b7e6976fc215923a15a).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-3343] [SQL] Add serde support for CTAS

2014-10-10 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2570#issuecomment-58626811
  
  [QA tests have 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21587/consoleFull)
 for   PR 2570 at commit 
[`3774bd4`](https://github.com/apache/spark/commit/3774bd4617cb4dec3f78a08bdf42653b682102fd).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-3343] [SQL] Add serde support for CTAS

2014-10-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/2570#issuecomment-58628891
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21586/
Test FAILed.





[GitHub] spark pull request: [SPARK-3343] [SQL] Add serde support for CTAS

2014-10-10 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2570#issuecomment-58628884
  
  [QA tests have 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21586/consoleFull)
 for   PR 2570 at commit 
[`366e758`](https://github.com/apache/spark/commit/366e758c1d2ad2e793936b7e6976fc215923a15a).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `case class CreateTableAsSelect[T](`
  * `  logDebug(Found class for $serdeName)`






[GitHub] spark pull request: [SPARK-3343] [SQL] Add serde support for CTAS

2014-10-10 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2570#issuecomment-58630045
  
  [QA tests have 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21587/consoleFull)
 for   PR 2570 at commit 
[`3774bd4`](https://github.com/apache/spark/commit/3774bd4617cb4dec3f78a08bdf42653b682102fd).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `case class CreateTableAsSelect[T](`
  * `  logDebug(Found class for $serdeName)`






[GitHub] spark pull request: [SPARK-3343] [SQL] Add serde support for CTAS

2014-10-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/2570#issuecomment-58630054
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21587/
Test FAILed.





[GitHub] spark pull request: [SPARK-3343] [SQL] Add serde support for CTAS

2014-10-08 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2570#issuecomment-58329046
  
  [QA tests have 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21458/consoleFull)
 for   PR 2570 at commit 
[`d49596b`](https://github.com/apache/spark/commit/d49596b868094570d4238720ff59e49aab263020).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-3343] [SQL] Add serde support for CTAS

2014-10-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/2570#issuecomment-58334350
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21458/
Test PASSed.





[GitHub] spark pull request: [SPARK-3343] [SQL] Add serde support for CTAS

2014-10-08 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2570#issuecomment-58334343
  
  [QA tests have 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21458/consoleFull)
 for   PR 2570 at commit 
[`d49596b`](https://github.com/apache/spark/commit/d49596b868094570d4238720ff59e49aab263020).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `  logDebug(Found class for $serdeName)`






[GitHub] spark pull request: [SPARK-3343] [SQL] Add serde support for CTAS

2014-10-08 Thread chenghao-intel
Github user chenghao-intel commented on the pull request:

https://github.com/apache/spark/pull/2570#issuecomment-58451206
  
I've updated the code, @yhuai @marmbrus , any more comments?





[GitHub] spark pull request: [SPARK-3343] [SQL] Add serde support for CTAS

2014-10-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/2570#issuecomment-58451631
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21502/
Test FAILed.





[GitHub] spark pull request: [SPARK-3343] [SQL] Add serde support for CTAS

2014-10-08 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2570#issuecomment-58454245
  
  [QA tests have 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21509/consoleFull)
 for   PR 2570 at commit 
[`ff2e140`](https://github.com/apache/spark/commit/ff2e140ee8fd1d5548b311a228408bbf0be492a6).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-3343] [SQL] Add serde support for CTAS

2014-10-08 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/2570#issuecomment-58456682
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 

https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21509/
Test PASSed.





[GitHub] spark pull request: [SPARK-3343] [SQL] Add serde support for CTAS

2014-10-08 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2570#issuecomment-58456679
  
  [QA tests have 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21509/consoleFull)
 for   PR 2570 at commit 
[`ff2e140`](https://github.com/apache/spark/commit/ff2e140ee8fd1d5548b311a228408bbf0be492a6).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-3343] [SQL] Add serde support for CTAS

2014-10-07 Thread chenghao-intel
Github user chenghao-intel commented on a diff in the pull request:

https://github.com/apache/spark/pull/2570#discussion_r18561069
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/CreateTableAsSelect.scala
 ---
@@ -30,25 +32,38 @@ import org.apache.spark.sql.hive.MetastoreRelation
  * Create table and insert the query result into it.
  * @param database the database name of the new relation
  * @param tableName the table name of the new relation
- * @param insertIntoRelation function of creating the `InsertIntoHiveTable` 
- *by specifying the `MetaStoreRelation`, the data will be inserted into that table.
- * TODO Add more table creating properties,  e.g. SerDe, StorageHandler, in-memory cache etc.
+ * @param allowExisting allow continue working if it's already exists, otherwise
+ *  raise exception
+ * @param extra the extra information for this Operator, it should be the
+ *  ASTNode object for extracting the CreateTableDesc.
+ * @param query the query whose result will be insert into the new relation
  */
 @Experimental
 case class CreateTableAsSelect(
   database: String,
   tableName: String,
-  query: SparkPlan,
-  insertIntoRelation: MetastoreRelation = InsertIntoHiveTable)
+  allowExisting: Boolean,
+  extra: AnyRef,
+  query: LogicalPlan)
 extends LeafNode with Command {
 
   def output = Seq.empty
 
+  private[this] def sc = sqlContext.asInstanceOf[HiveContext]
+
   // A lazy computing of the metastoreRelation
   private[this] lazy val metastoreRelation: MetastoreRelation = {
-// Create the table 
-val sc = sqlContext.asInstanceOf[HiveContext]
-sc.catalog.createTable(database, tableName, query.output, false)
+// Get the CreateTableDesc from Hive SemanticAnalyzer
+val sa = new SemanticAnalyzer(sc.hiveconf)
--- End diff --

Not duplicating the code of `DDLTask.createTable()` will make our code
cleaner; I will take a look at this.





[GitHub] spark pull request: [SPARK-3343] [SQL] Add serde support for CTAS

2014-10-07 Thread chenghao-intel
Github user chenghao-intel commented on a diff in the pull request:

https://github.com/apache/spark/pull/2570#discussion_r18561142
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/CreateTableAsSelect.scala
 ---
@@ -56,7 +71,10 @@ case class CreateTableAsSelect(
   }
 
   override protected[sql] lazy val sideEffectResult: Seq[Row] = {
-insertIntoRelation(metastoreRelation).execute
+// TODO ideally, we should get the output data ready first and then
+// update the relation, just in case of failure occurs in data
+// processing. Otherwise we may not able to get a consistent.
--- End diff --

Oh, thanks, I didn't know this, will do that.





[GitHub] spark pull request: [SPARK-3343] [SQL] Add serde support for CTAS

2014-10-07 Thread chenghao-intel
Github user chenghao-intel commented on a diff in the pull request:

https://github.com/apache/spark/pull/2570#discussion_r18561343
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveQl.scala ---
@@ -492,14 +492,9 @@ private[hive] object HiveQl {
 "TOK_TABLELOCATION",
 "TOK_TABLEPROPERTIES"),
   children)
-  if (notImplemented.exists(token => !token.isEmpty)) {
-throw new NotImplementedError(
-  s"Unhandled clauses: ${notImplemented.flatten.map(dumpTree(_)).mkString("\n")}")
-  }
-
   val (db, tableName) = extractDbNameTableName(tableNameParts)
 
-  CreateTableAsSelect(db, tableName, nodeToPlan(query))
+  CreateTableAsSelect(db, tableName, allowExisting != None, nodeToPlan(query), node)
--- End diff --

I was planning to create the `CreateTableDesc` object in `HiveQl`; however,
instantiating `SemanticAnalyzer` requires passing in a `HiveConf` object, so
it's probably more straightforward to do that within the physical plan.
In the long run, though, it would be better to pass in the `HiveConf` when
calling `HiveQl.parseSql`.





[GitHub] spark pull request: [SPARK-3343] [SQL] Add serde support for CTAS

2014-10-07 Thread chenghao-intel
Github user chenghao-intel commented on the pull request:

https://github.com/apache/spark/pull/2570#issuecomment-58296226
  
Thank you @yhuai , I will update the code accordingly.





[GitHub] spark pull request: [SPARK-3343] [SQL] Add serde support for CTAS

2014-10-01 Thread chenghao-intel
Github user chenghao-intel commented on a diff in the pull request:

https://github.com/apache/spark/pull/2570#discussion_r18263442
  
--- Diff: 
sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLQuerySuite.scala 
---
@@ -32,6 +32,54 @@ case class Nested3(f3: Int)
  * valid, but Hive currently cannot execute it.
  */
 class SQLQuerySuite extends QueryTest {
+  test("CTAS with serde") {
+sql("CREATE TABLE ctas1 AS SELECT key k, value FROM src ORDER BY k, value").collect
+sql(
+  """CREATE TABLE ctas2
+| ROW FORMAT SERDE "org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe"
+| STORED AS RCFile AS
+|   SELECT key, value
+|   FROM src
+|   ORDER BY key, value""".stripMargin).collect
--- End diff --

That's a good question, I will add the `explain` command to verify if the 
properties are correctly set.
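
One way such a check could look, as a rough sketch only: `ExplainCheck.missingKeywords` is a hypothetical helper, not part of Spark's test utilities, and it works on plan text passed in as plain strings. In a real suite the lines would presumably come from running an `EXPLAIN` statement through `sql(...)` and collecting the result.

```scala
object ExplainCheck {
  // Returns the keywords that do not occur anywhere in the plan text.
  // An empty result means every expected property showed up.
  def missingKeywords(planLines: Seq[String], keywords: Seq[String]): Seq[String] = {
    val text = planLines.mkString("\n")
    keywords.filterNot(k => text.contains(k))
  }
}
```

A test could then assert that `missingKeywords(planLines, Seq("ColumnarSerDe", "RCFile"))` is empty for the `ctas2` statement above, assuming the plan output mentions the serde and the storage format.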





[GitHub] spark pull request: [SPARK-3343] [SQL] Add serde support for CTAS

2014-10-01 Thread chenghao-intel
Github user chenghao-intel commented on a diff in the pull request:

https://github.com/apache/spark/pull/2570#discussion_r18263559
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/CreateTableAsSelect.scala
 ---
@@ -30,25 +32,38 @@ import org.apache.spark.sql.hive.MetastoreRelation
  * Create table and insert the query result into it.
  * @param database the database name of the new relation
  * @param tableName the table name of the new relation
- * @param insertIntoRelation function of creating the `InsertIntoHiveTable` 
- *by specifying the `MetaStoreRelation`, the data will be inserted into that table.
- * TODO Add more table creating properties,  e.g. SerDe, StorageHandler, in-memory cache etc.
+ * @param allowExisting allow continue working if it's already exists, otherwise
+ *  raise exception
+ * @param extra the extra information for this Operator, it should be the
+ *  ASTNode object for extracting the CreateTableDesc.
+ * @param query the query whose result will be insert into the new relation
  */
 @Experimental
 case class CreateTableAsSelect(
   database: String,
   tableName: String,
-  query: SparkPlan,
-  insertIntoRelation: MetastoreRelation = InsertIntoHiveTable)
+  allowExisting: Boolean,
+  extra: AnyRef,
+  query: LogicalPlan)
 extends LeafNode with Command {
 
   def output = Seq.empty
 
+  private[this] def sc = sqlContext.asInstanceOf[HiveContext]
+
   // A lazy computing of the metastoreRelation
   private[this] lazy val metastoreRelation: MetastoreRelation = {
-// Create the table 
-val sc = sqlContext.asInstanceOf[HiveContext]
-sc.catalog.createTable(database, tableName, query.output, false)
+// Get the CreateTableDesc from Hive SemanticAnalyzer
+val sa = new SemanticAnalyzer(sc.hiveconf)
--- End diff --

I was planning to do this by re-implementing it within `HiveQl`, but the
logic of `CreateTableDesc` is quite complicated and error-prone, so reusing
the Hive `SemanticAnalyzer` is probably a simple and quick workaround. I will
add a TODO for further improvement. What do you think?





[GitHub] spark pull request: [SPARK-3343] [SQL] Add serde support for CTAS

2014-10-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/2570#issuecomment-57432988
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21099/





[GitHub] spark pull request: [SPARK-3343] [SQL] Add serde support for CTAS

2014-10-01 Thread chenghao-intel
Github user chenghao-intel commented on the pull request:

https://github.com/apache/spark/pull/2570#issuecomment-57461439
  
retest this please.





[GitHub] spark pull request: [SPARK-3343] [SQL] Add serde support for CTAS

2014-10-01 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2570#issuecomment-57462077
  
  [QA tests have 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21112/consoleFull)
 for   PR 2570 at commit 
[`fcbbc61`](https://github.com/apache/spark/commit/fcbbc611d80a31160c79645a09970fba60559d48).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-3343] [SQL] Add serde support for CTAS

2014-10-01 Thread yhuai
Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/2570#discussion_r18279773
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/CreateTableAsSelect.scala
 ---
@@ -56,7 +71,10 @@ case class CreateTableAsSelect(
   }
 
   override protected[sql] lazy val sideEffectResult: Seq[Row] = {
-insertIntoRelation(metastoreRelation).execute
+// TODO ideally, we should get the output data ready first and then
+// update the relation, just in case of failure occurs in data
+// processing. Otherwise we may not able to get a consistent.
--- End diff --

If we populate the metastore after evaluating the query, we also need to make 
sure the information stored in `CreateTableDesc` is correctly propagated to the 
`tableInfo` in the `FileSinkDesc`. Also, if `TableDesc 
org.apache.hadoop.hive.ql.plan.PlanUtils.getTableDesc(CreateTableDesc, String, 
String)` is used to create the `tableInfo`, the Hive 0.12 implementation of 
this method cannot be used because of the bug described in 
https://issues.apache.org/jira/browse/HIVE-6083. Can you add a note here?
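
The ordering discussed here (produce the data first, register the table second) 
can be sketched roughly as follows. This is a minimal, self-contained 
illustration and not the code in this pull request: `CtasOrderingSketch`, 
`writeThenRegister`, and its three callbacks are hypothetical names, and a real 
implementation would still have to hand the `CreateTableDesc` information to the 
write step before the table exists in the metastore.

```scala
import scala.util.{Failure, Success, Try}

object CtasOrderingSketch {
  /**
   * Hypothetical helper: produce the data first, register the table only if
   * that succeeded, and run the cleanup callback if either step fails.
   */
  def writeThenRegister[A](
      writeData: () => A,        // assumed: evaluates the query into a staging location
      registerTable: A => Unit,  // assumed: creates the metastore entry for that data
      cleanup: () => Unit): Try[A] = {
    // registerTable only runs when writeData did not throw.
    val result = Try(writeData()).flatMap { out =>
      Try(registerTable(out)).map(_ => out)
    }
    result match {
      case Failure(_) => cleanup() // nothing consistent to keep, so discard partial output
      case Success(_) => ()
    }
    result
  }
}
```

With this shape, a failure during data processing never leaves a table entry 
behind, which seems to be the inconsistency the TODO is worried about.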





[GitHub] spark pull request: [SPARK-3343] [SQL] Add serde support for CTAS

2014-10-01 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/2570#issuecomment-57470303
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/21112/





[GitHub] spark pull request: [SPARK-3343] [SQL] Add serde support for CTAS

2014-10-01 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2570#issuecomment-57470296
  
  [QA tests have 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21112/consoleFull)
 for   PR 2570 at commit 
[`fcbbc61`](https://github.com/apache/spark/commit/fcbbc611d80a31160c79645a09970fba60559d48).
 * This patch **passes** unit tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `  logDebug(s"Found class for $serdeName")`






[GitHub] spark pull request: [SPARK-3343] [SQL] Add serde support for CTAS

2014-10-01 Thread yhuai
Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/2570#discussion_r18280936
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/CreateTableAsSelect.scala
 ---
@@ -30,25 +32,38 @@ import org.apache.spark.sql.hive.MetastoreRelation
  * Create table and insert the query result into it.
  * @param database the database name of the new relation
  * @param tableName the table name of the new relation
- * @param insertIntoRelation function of creating the `InsertIntoHiveTable` 
- *by specifying the `MetaStoreRelation`, the data will be inserted into that table.
- * TODO Add more table creating properties,  e.g. SerDe, StorageHandler, in-memory cache etc.
+ * @param allowExisting allow continue working if it's already exists, otherwise
+ *  raise exception
+ * @param extra the extra information for this Operator, it should be the
+ *  ASTNode object for extracting the CreateTableDesc.
+ * @param query the query whose result will be insert into the new relation
  */
 @Experimental
 case class CreateTableAsSelect(
   database: String,
   tableName: String,
-  query: SparkPlan,
-  insertIntoRelation: MetastoreRelation => InsertIntoHiveTable)
+  allowExisting: Boolean,
+  extra: AnyRef,
+  query: LogicalPlan)
 extends LeafNode with Command {
 
   def output = Seq.empty
 
+  private[this] def sc = sqlContext.asInstanceOf[HiveContext]
+
   // A lazy computing of the metastoreRelation
   private[this] lazy val metastoreRelation: MetastoreRelation = {
-// Create the table 
-val sc = sqlContext.asInstanceOf[HiveContext]
-sc.catalog.createTable(database, tableName, query.output, false)
+// Get the CreateTableDesc from Hive SemanticAnalyzer
+val sa = new SemanticAnalyzer(sc.hiveconf)
--- End diff --

On one hand, I think we should avoid interacting with Hive's query compiler 
where possible. On the other hand, since we already ask Hive to process create 
table statements, it would be good to also ask Hive to process the create table 
part of CTAS queries. I guess a cleaner approach (requiring more work) would be 
to split a CTAS query into a create table part and a query part. We ask Hive to 
process the create table part (Hive will see it as a plain create table 
statement), and we take care of the query part ourselves. In that case, we 
would not need to duplicate the code of `DDLTask.createTable()`. 

For now, I think that using `SemanticAnalyzer` is fine.
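
A rough sketch of the split suggested above, for illustration only. It uses a 
simplified, hypothetical model of a parsed CTAS statement (`CtasStatement`, 
`CtasSplitSketch`, and their fields are assumptions, not Spark's or Hive's real 
AST types) and only shows how the create table part could be rendered as a 
standalone DDL statement while the query part is kept separate. The column list 
is assumed to come from the output schema of the analyzed query.

```scala
// Hypothetical, simplified model of a parsed CTAS statement.
final case class CtasStatement(
    database: String,
    tableName: String,
    allowExisting: Boolean,
    serde: Option[String],     // class name from ROW FORMAT SERDE, if given
    storedAs: Option[String],  // file format from STORED AS, if given
    selectSql: String)         // the query part, kept as text in this sketch

object CtasSplitSketch {
  /** Renders only the create table part, i.e. what would be handed to Hive as plain DDL. */
  def createTablePart(stmt: CtasStatement, columns: Seq[(String, String)]): String = {
    val ifNotExists = if (stmt.allowExisting) "IF NOT EXISTS " else ""
    val cols = columns.map { case (name, tpe) => s"$name $tpe" }.mkString(", ")
    val serde = stmt.serde.map(s => s" ROW FORMAT SERDE '$s'").getOrElse("")
    val stored = stmt.storedAs.map(f => s" STORED AS $f").getOrElse("")
    s"CREATE TABLE $ifNotExists${stmt.database}.${stmt.tableName} ($cols)$serde$stored"
  }

  /** The query part, which the engine would plan and execute itself. */
  def queryPart(stmt: CtasStatement): String = stmt.selectSql
}
```

For example, a statement for `ctas2` with a `ColumnarSerDe` and `RCFile` would 
yield one `CREATE TABLE ... ROW FORMAT SERDE ... STORED AS RCFile` string for 
Hive and the untouched `SELECT` for the query planner.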





[GitHub] spark pull request: [SPARK-3343] [SQL] Add serde support for CTAS

2014-10-01 Thread yhuai
Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/2570#discussion_r18281051
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala ---
@@ -70,6 +77,153 @@ private[hive] class HiveMetastoreCatalog(hive: HiveContext) extends Catalog with
   table.getTTable, partitions.map(part => part.getTPartition))(hive)
   }
 
+  /**
+   * Create table with specified database, table name, table description and schema
+   * @param dbName Database Name
+   * @param tblName Table Name
+   * @param crtTbl CreateTableDesc object which contains the SerDe info
+   * @param schema Schema of the new table, if not specified, will use the schema
+   *   specified in crtTbl
+   */
+  def createTable(
+  dbName: String,
+  tblName: String,
+  crtTbl: CreateTableDesc,
+  schema: Seq[Attribute]) {
+// Most of code are similar with the DDLTask.createTable(),
--- End diff --

It would be good to mention that `DDLTask` is in Hive's codebase.





[GitHub] spark pull request: [SPARK-3343] [SQL] Add serde support for CTAS

2014-10-01 Thread yhuai
Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/2570#discussion_r18281532
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala ---
@@ -70,6 +77,153 @@ private[hive] class HiveMetastoreCatalog(hive: HiveContext) extends Catalog with
   table.getTTable, partitions.map(part => part.getTPartition))(hive)
   }
 
+  /**
+   * Create table with specified database, table name, table description and schema
+   * @param dbName Database Name
+   * @param tblName Table Name
+   * @param crtTbl CreateTableDesc object which contains the SerDe info
+   * @param schema Schema of the new table, if not specified, will use the schema
+   *   specified in crtTbl
+   */
+  def createTable(
+  dbName: String,
+  tblName: String,
+  crtTbl: CreateTableDesc,
+  schema: Seq[Attribute]) {
+// Most of code are similar with the DDLTask.createTable(),
--- End diff --

Can we consolidate the two `createTable` methods? With your change, it seems the 
original `createTable` will only be used by `createTable` in `HiveContext`. 

Also, can you add comments to `HiveContext.createTable` to explain which 
table properties will be used if users call this method?





[GitHub] spark pull request: [SPARK-3343] [SQL] Add serde support for CTAS

2014-10-01 Thread yhuai
Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/2570#discussion_r18281667
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveQl.scala ---
@@ -492,14 +492,9 @@ private[hive] object HiveQl {
 TOK_TABLELOCATION,
 TOK_TABLEPROPERTIES),
   children)
-  if (notImplemented.exists(token => !token.isEmpty)) {
-throw new NotImplementedError(
-  s"Unhandled clauses: ${notImplemented.flatten.map(dumpTree(_)).mkString("\n")}")
-  }
-
   val (db, tableName) = extractDbNameTableName(tableNameParts)
 
-  CreateTableAsSelect(db, tableName, nodeToPlan(query))
+  CreateTableAsSelect(db, tableName, allowExisting != None, nodeToPlan(query), node)
--- End diff --

Do we have to pass the AST tree all the way down to the physical plan?





[GitHub] spark pull request: [SPARK-3343] [SQL] Add serde support for CTAS

2014-09-30 Thread yhuai
Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/2570#discussion_r18242621
  
--- Diff: 
sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLQuerySuite.scala 
---
@@ -32,6 +32,54 @@ case class Nested3(f3: Int)
  * valid, but Hive currently cannot execute it.
  */
 class SQLQuerySuite extends QueryTest {
+  test("CTAS with serde") {
+sql("CREATE TABLE ctas1 AS SELECT key k, value FROM src ORDER BY k, value").collect
+sql(
+  """CREATE TABLE ctas2
+| ROW FORMAT SERDE "org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe"
+| STORED AS RCFile AS
+|   SELECT key, value
+|   FROM src
+|   ORDER BY key, value""".stripMargin).collect
--- End diff --

I am not sure that checking only the contents of the created tables is enough 
to test whether the table properties (such as the SerDe) are set correctly. 





[GitHub] spark pull request: [SPARK-3343] [SQL] Add serde support for CTAS

2014-09-30 Thread yhuai
Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/2570#discussion_r18243382
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/CreateTableAsSelect.scala
 ---
@@ -30,25 +32,38 @@ import org.apache.spark.sql.hive.MetastoreRelation
  * Create table and insert the query result into it.
  * @param database the database name of the new relation
  * @param tableName the table name of the new relation
- * @param insertIntoRelation function of creating the `InsertIntoHiveTable` 
- *by specifying the `MetaStoreRelation`, the data will be inserted into that table.
- * TODO Add more table creating properties,  e.g. SerDe, StorageHandler, in-memory cache etc.
+ * @param allowExisting allow continue working if it's already exists, otherwise
+ *  raise exception
+ * @param extra the extra information for this Operator, it should be the
+ *  ASTNode object for extracting the CreateTableDesc.
+ * @param query the query whose result will be insert into the new relation
  */
 @Experimental
 case class CreateTableAsSelect(
   database: String,
   tableName: String,
-  query: SparkPlan,
-  insertIntoRelation: MetastoreRelation => InsertIntoHiveTable)
+  allowExisting: Boolean,
+  extra: AnyRef,
+  query: LogicalPlan)
 extends LeafNode with Command {
 
   def output = Seq.empty
 
+  private[this] def sc = sqlContext.asInstanceOf[HiveContext]
+
   // A lazy computing of the metastoreRelation
   private[this] lazy val metastoreRelation: MetastoreRelation = {
-// Create the table 
-val sc = sqlContext.asInstanceOf[HiveContext]
-sc.catalog.createTable(database, tableName, query.output, false)
+// Get the CreateTableDesc from Hive SemanticAnalyzer
+val sa = new SemanticAnalyzer(sc.hiveconf)
--- End diff --

Do we have to use Hive's SemanticAnalyzer?





[GitHub] spark pull request: [SPARK-3343] [SQL] Add serde support for CTAS

2014-09-29 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2570#issuecomment-57261085
  
  [QA tests have 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/21016/consoleFull)
 for   PR 2570 at commit 
[`4ea462c`](https://github.com/apache/spark/commit/4ea462cd3bc92f137fc9bae07be747564f01abc3).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-3343] [SQL] Add serde support for CTAS

2014-09-28 Thread chenghao-intel
GitHub user chenghao-intel opened a pull request:

https://github.com/apache/spark/pull/2570

[SPARK-3343] [SQL] Add serde support for CTAS



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/chenghao-intel/spark ctas_serde

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/2570.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #2570


commit 439ce779010a9786d3a1e199b7722b3144b3bf66
Author: Cheng Hao hao.ch...@intel.com
Date:   2014-09-29T02:59:54Z

Add serde support for CTAS







[GitHub] spark pull request: [SPARK-3343] [SQL] Add serde support for CTAS

2014-09-28 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2570#issuecomment-57112668
  
  [QA tests have 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20956/consoleFull)
 for   PR 2570 at commit 
[`439ce77`](https://github.com/apache/spark/commit/439ce779010a9786d3a1e199b7722b3144b3bf66).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-3343] [SQL] Add serde support for CTAS

2014-09-28 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/2570#issuecomment-57115003
  
  [QA tests have 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/20956/consoleFull)
 for   PR 2570 at commit 
[`439ce77`](https://github.com/apache/spark/commit/439ce779010a9786d3a1e199b7722b3144b3bf66).
 * This patch **passes** unit tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `  logDebug(s"Found class for $serdeName")`






[GitHub] spark pull request: [SPARK-3343] [SQL] Add serde support for CTAS

2014-09-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/2570#issuecomment-57115005
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/20956/

