[GitHub] spark pull request: [SPARK-5723][SQL]Change the default file forma...

2015-02-17 Thread marmbrus
Github user marmbrus commented on a diff in the pull request:

https://github.com/apache/spark/pull/4639#discussion_r24867218
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/commands.scala ---
@@ -165,8 +165,10 @@ case class CreateMetastoreDataSourceAsSelect(
       mode match {
         case SaveMode.ErrorIfExists =>
           sys.error(s"Table $tableName already exists. " +
--- End diff --

I'd make this an `AnalysisException`
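A minimal sketch of the suggestion, written so it runs without Spark on the classpath. `SaveModeSketch` and `AnalysisExceptionSketch` are hypothetical stand-ins for `SaveMode` and Spark SQL's `AnalysisException`, and `shouldWrite` is an illustrative helper, not the code in this PR. The point is only that the `ErrorIfExists` branch throws a typed analysis error instead of the plain `RuntimeException` that `sys.error` produces.

```scala
// Hypothetical stand-in for org.apache.spark.sql.SaveMode.
sealed trait SaveModeSketch
object SaveModeSketch {
  case object Append extends SaveModeSketch
  case object Overwrite extends SaveModeSketch
  case object ErrorIfExists extends SaveModeSketch
  case object Ignore extends SaveModeSketch
}

// Hypothetical stand-in for Spark SQL's AnalysisException.
class AnalysisExceptionSketch(val message: String) extends Exception(message)

object CtasModeCheck {
  import SaveModeSketch._

  // Decides whether a CTAS write should go ahead when the target table may exist.
  // Returns false for Ignore on an existing table; throws for ErrorIfExists.
  def shouldWrite(tableName: String, tableExists: Boolean, mode: SaveModeSketch): Boolean =
    if (!tableExists) true
    else mode match {
      case ErrorIfExists =>
        // A typed analysis error instead of sys.error's plain RuntimeException.
        throw new AnalysisExceptionSketch(s"Table $tableName already exists.")
      case Ignore             => false
      case Append | Overwrite => true
    }
}
```

Callers (and tests) can then catch the analysis error type specifically rather than any `RuntimeException`.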


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-5723][SQL]Change the default file forma...

2015-02-17 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/4639





[GitHub] spark pull request: [SPARK-5723][SQL]Change the default file forma...

2015-02-17 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4639#issuecomment-74789292
  
  [Test build #27666 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27666/consoleFull) for PR 4639 at commit [`a568137`](https://github.com/apache/spark/commit/a568137e8837b2a9c728f450e44d08f769c9179c).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-5723][SQL]Change the default file forma...

2015-02-17 Thread marmbrus
Github user marmbrus commented on a diff in the pull request:

https://github.com/apache/spark/pull/4639#discussion_r24867096
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveContext.scala ---
@@ -61,6 +61,21 @@ class HiveContext(sc: SparkContext) extends SQLContext(sc) {
   protected[sql] def convertMetastoreParquet: Boolean =
     getConf("spark.sql.hive.convertMetastoreParquet", "true") == "true"
 
+  /**
+   * When true, a table created by a Hive CTAS statement (no USING clause) will be
+   * converted to a data source table, using the data source set by spark.sql.sources.default.
+   * The table in CTAS statement will be converted when it meets any of the following conditions:
+   *   - The CTAS does not specify any of a SerDe (ROW FORMAT SERDE), a File Format (STORED AS), or
+   *     a Storage Handler (STORED BY), and the value of hive.default.fileformat in hive-site.xml
+   *     is either TextFile or SequenceFile.
+   *   - The CTAS statement specifies TextFile (STORED AS TEXTFILE) as the file format and no SerDe
+   *     is specified (no ROW FORMAT SERDE clause).
+   *   - The CTAS statement specifies SequenceFile (STORED AS SEQUENCEFILE) as the file format
+   *     and no SerDe is specified (no ROW FORMAT SERDE clause).
+   */
+  protected[sql] def convertCTAS: Boolean =
+    getConf("spark.sql.hive.convertCTAS", "false") == "true"
--- End diff --

Nit: `.toBoolean`
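The three conditions in the doc comment, together with the `.toBoolean` nit, can be sketched as a pure function. This is an illustration under assumptions, not the PR's implementation: `CtasSpec` is a hypothetical model of the parsed storage clauses, the conf is a plain `Map[String, String]` rather than Spark's configuration, and format names are compared case-insensitively. Scala's `String.toBoolean` accepts `true`/`false` in any letter case and throws `IllegalArgumentException` on anything else, whereas `== "true"` silently treats a mistyped value as false.

```scala
object ConvertCtasRules {
  // Hypothetical model of the storage clauses of a Hive CTAS statement.
  final case class CtasSpec(
      serde: Option[String],          // ROW FORMAT SERDE ...
      fileFormat: Option[String],     // STORED AS ...
      storageHandler: Option[String]) // STORED BY ...

  // Reads a boolean setting with `.toBoolean`, as suggested in the review.
  def readBoolConf(conf: Map[String, String], key: String, default: Boolean): Boolean =
    conf.getOrElse(key, default.toString).toBoolean

  private def isTextOrSequenceFile(format: String): Boolean =
    Set("textfile", "sequencefile").contains(format.toLowerCase)

  // The three conditions from the convertCTAS doc comment.
  def isConvertible(spec: CtasSpec, hiveDefaultFileFormat: String): Boolean = spec match {
    // 1. No SerDe, file format, or storage handler: fall back to hive.default.fileformat.
    case CtasSpec(None, None, None) => isTextOrSequenceFile(hiveDefaultFileFormat)
    // 2 and 3. STORED AS TEXTFILE or SEQUENCEFILE with no ROW FORMAT SERDE.
    //    This sketch also requires that no STORED BY clause is present.
    case CtasSpec(None, Some(format), None) => isTextOrSequenceFile(format)
    case _ => false
  }

  // Conversion happens only when the flag is on and the statement qualifies.
  def shouldConvert(conf: Map[String, String], spec: CtasSpec, hiveDefaultFileFormat: String): Boolean =
    readBoolConf(conf, "spark.sql.hive.convertCTAS", default = false) &&
      isConvertible(spec, hiveDefaultFileFormat)
}
```

Keeping the rule separate from the config lookup makes each condition easy to unit-test on its own.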





[GitHub] spark pull request: [SPARK-5723][SQL]Change the default file forma...

2015-02-17 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4639#issuecomment-74797547
  
  [Test build #27666 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27666/consoleFull) for PR 4639 at commit [`a568137`](https://github.com/apache/spark/commit/a568137e8837b2a9c728f450e44d08f769c9179c).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `class SparkJobInfo(namedtuple("SparkJobInfo", "jobId stageIds status")):`
  * `class SparkStageInfo(namedtuple("SparkStageInfo",`
  * `class StatusTracker(object):`






[GitHub] spark pull request: [SPARK-5723][SQL]Change the default file forma...

2015-02-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/4639#issuecomment-74797549
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27666/





[GitHub] spark pull request: [SPARK-5723][SQL]Change the default file forma...

2015-02-17 Thread marmbrus
Github user marmbrus commented on a diff in the pull request:

https://github.com/apache/spark/pull/4639#discussion_r24867128
  
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala ---
@@ -499,24 +499,69 @@ private[hive] class HiveMetastoreCatalog(hive: HiveContext) extends Catalog with
   Some(sa.getQB().getTableDesc)
 }
 
-execution.CreateTableAsSelect(
-  databaseName,
-  tableName,
-  child,
-  allowExisting,
-  desc)
+// Check if the query specifies file format or storage handler.
+val hasStorageSpec = desc match {
+  case Some(crtTbl) =>
+    crtTbl != null && (crtTbl.getSerName != null || crtTbl.getStorageHandler != null)
+  case None => false
+}
+
+if (hive.convertCTAS && !hasStorageSpec) {
+  // Do the conversion when convertHiveCTASWithoutStorageSpec is true and the query
--- End diff --

Out of date config name.
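The `hasStorageSpec` match in this hunk can be written more compactly by folding the `null` check into `Option`. A minimal sketch, assuming a hypothetical `TableDescSketch` that exposes the same two nullable getters the diff calls on Hive's table descriptor (`getSerName`, `getStorageHandler`); the flag is passed in as a plain `Boolean` named after `convertCTAS` rather than read from `HiveContext`.

```scala
object StorageSpecCheck {
  // Hypothetical stand-in for Hive's table descriptor: Java-style getters that may return null.
  final class TableDescSketch(serName: String, storageHandler: String) {
    def getSerName: String = serName
    def getStorageHandler: String = storageHandler
  }

  // True if a SerDe or a storage handler was specified.
  // flatMap(Option(_)) turns both "no desc" and "desc is null" into None.
  def hasStorageSpec(desc: Option[TableDescSketch]): Boolean =
    desc.flatMap(Option(_)).exists { d =>
      d.getSerName != null || d.getStorageHandler != null
    }

  // Mirrors the `if (hive.convertCTAS && !hasStorageSpec)` guard in the diff.
  def shouldConvert(convertCTAS: Boolean, desc: Option[TableDescSketch]): Boolean =
    convertCTAS && !hasStorageSpec(desc)
}
```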





[GitHub] spark pull request: [SPARK-5723][SQL]Change the default file forma...

2015-02-17 Thread marmbrus
Github user marmbrus commented on a diff in the pull request:

https://github.com/apache/spark/pull/4639#discussion_r24851413
  
--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/SQLQuerySuite.scala ---
@@ -42,6 +45,69 @@ class SQLQuerySuite extends QueryTest {
 )
   }
 
+  test("CTAS without serde") {
+    def checkRelation(tableName: String, isDataSourceParquet: Boolean): Unit = {
+      val relation = EliminateSubQueries(catalog.lookupRelation(Seq(tableName)))
+      relation match {
+        case LogicalRelation(r: ParquetRelation2) =>
+          if (!isDataSourceParquet) {
+            fail(
+              s"${classOf[MetastoreRelation].getCanonicalName} is expected, but found " +
+              s"${ParquetRelation2.getClass.getCanonicalName}.")
+          }
+
+        case r: MetastoreRelation =>
+          if (isDataSourceParquet) {
+            fail(
+              s"${ParquetRelation2.getClass.getCanonicalName} is expected, but found " +
+              s"${classOf[MetastoreRelation].getCanonicalName}.")
+          }
+      }
+    }
+
+    val originalConf = getConf("spark.sql.sources.convertHiveCTASWithoutStorageSpec", "false")
+
+    setConf("spark.sql.sources.convertHiveCTASWithoutStorageSpec", "true")
+
+    sql("CREATE TABLE ctas1 AS SELECT key k, value FROM src ORDER BY k, value")
+    checkRelation("ctas1", true)
+    sql("DROP TABLE ctas1")
+
+    // Specifying database name for query can be converted to data source write path
+    // is not allowed right now.
+    val message = intercept[AnalysisException] {
+      sql("CREATE TABLE default.ctas1 AS SELECT key k, value FROM src ORDER BY k, value")
+    }.getMessage
+    assert(
+      message.contains("Cannot specify database name in a CTAS statement"),
+      "When spark.sql.sources.convertHiveCTASWithoutStorageSpec is true, we should not allow " +
--- End diff --

This message is out of date. Also, "should..." -> "do not currently allow the database name to be specified".
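The test reads `originalConf` before setting the flag, presumably so it can put the old value back afterwards. One way to make that restoration hold even when an assertion fails partway through is a small loan-pattern helper. This is a sketch under assumptions: `ConfTestHelper.conf` is a hypothetical mutable map standing in for the session configuration (not Spark's `SQLConf`), and `withConf` is an illustrative name, not an API from this PR.

```scala
import scala.collection.mutable

object ConfTestHelper {
  // Hypothetical stand-in for the session configuration.
  val conf: mutable.Map[String, String] = mutable.Map.empty

  // Sets `key` to `value` while `body` runs, then restores the previous state
  // (removing the key if it was unset before), even if `body` throws.
  def withConf[T](key: String, value: String)(body: => T): T = {
    val previous = conf.get(key)
    conf(key) = value
    try body
    finally previous match {
      case Some(v) => conf(key) = v
      case None    => conf.remove(key)
    }
  }
}
```

In the test above, the `CREATE TABLE` / `checkRelation` / `DROP TABLE` steps would go inside the `withConf` body.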





[GitHub] spark pull request: [SPARK-5723][SQL]Change the default file forma...

2015-02-17 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4639#issuecomment-74745021
  
  [Test build #27646 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27646/consoleFull) for PR 4639 at commit [`8af5b2a`](https://github.com/apache/spark/commit/8af5b2ac388487c13c98ceef80ff1a8fb1a62217).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-5723][SQL]Change the default file forma...

2015-02-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/4639#issuecomment-74759366
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27646/





[GitHub] spark pull request: [SPARK-5723][SQL]Change the default file forma...

2015-02-17 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4639#issuecomment-74759353
  
  [Test build #27646 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27646/consoleFull) for PR 4639 at commit [`8af5b2a`](https://github.com/apache/spark/commit/8af5b2ac388487c13c98ceef80ff1a8fb1a62217).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-5723][SQL]Change the default file forma...

2015-02-16 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4639#issuecomment-74605485
  
  [Test build #27601 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27601/consoleFull) for PR 4639 at commit [`5a67903`](https://github.com/apache/spark/commit/5a6790398098452bd34f8fd21de495178a87239e).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `case class ShowTablesCommand(databaseName: Option[String]) extends RunnableCommand `






[GitHub] spark pull request: [SPARK-5723][SQL]Change the default file forma...

2015-02-16 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/4639#issuecomment-74605492
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/27601/





[GitHub] spark pull request: [SPARK-5723][SQL]Change the default file forma...

2015-02-16 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4639#issuecomment-74596033
  
  [Test build #27601 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27601/consoleFull) for PR 4639 at commit [`5a67903`](https://github.com/apache/spark/commit/5a6790398098452bd34f8fd21de495178a87239e).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-5723][SQL]Change the default file forma...

2015-02-16 Thread yhuai
GitHub user yhuai opened a pull request:

https://github.com/apache/spark/pull/4639

[SPARK-5723][SQL]Change the default file format to Parquet for CTAS statements.

JIRA: https://issues.apache.org/jira/browse/SPARK-5723

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/yhuai/spark defaultCTASFileFormat

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/4639.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #4639


commit 5a6790398098452bd34f8fd21de495178a87239e
Author: Yin Huai <yh...@databricks.com>
Date:   2015-02-17T00:21:54Z

    Use data source write path for Hive's CTAS statements when no storage format/handler is specified.



