[GitHub] spark pull request #16417: [SPARK-19014][SQL] support complex aggregate buff...

2016-12-28 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/16417#discussion_r94109096
  
--- Diff: 
sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/UnsafeRow.java
 ---
@@ -201,6 +210,25 @@ public void setNullAt(int i) {
 Platform.putLong(baseObject, getFieldOffset(i), 0);
   }
 
+  public void setNullData(int ordinal) {
--- End diff --

Might be confused with `setNullAt`, add a short comment for this? 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16432: [SPARK-19021][YARN] Generailize HDFSCredentialProvider t...

2016-12-28 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16432
  
**[Test build #70712 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70712/testReport)**
 for PR 16432 at commit 
[`0c1ec23`](https://github.com/apache/spark/commit/0c1ec23626b74544e456778262030cee1c623b30).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #16432: [SPARK-19021][YARN] Generailize HDFSCredentialPro...

2016-12-28 Thread jerryshao
GitHub user jerryshao opened a pull request:

https://github.com/apache/spark/pull/16432

[SPARK-19021][YARN] Generailize HDFSCredentialProvider to support non HDFS 
security filesystems

Change-Id: I85d7963c4980cf9660f495f377ba27227da4b1b1

Currently Spark can only get token renewal interval from security HDFS 
(hdfs://), if Spark runs with other security file systems like webHDFS 
(webhdfs://), wasb (wasb://), ADLS, it will ignore these tokens and not get 
token renewal intervals from these tokens. These will make Spark unable to work 
with these security clusters. So instead of only checking HDFS token, we should 
generalize to support different DelegationTokenIdentifier.

## How was this patch tested?

Manually verified in security cluster.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jerryshao/apache-spark SPARK-19021

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/16432.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #16432


commit 0c1ec23626b74544e456778262030cee1c623b30
Author: jerryshao 
Date:   2016-12-29T07:30:19Z

Generailize HDFSCredentialProvider to support non HDFS security FS

Change-Id: I85d7963c4980cf9660f495f377ba27227da4b1b1




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16415: [SPARK-19007]Speedup and optimize the GradientBoostedTre...

2016-12-28 Thread zdh2292390
Github user zdh2292390 commented on the issue:

https://github.com/apache/spark/pull/16415
  
@jkbradley   There is another problem :   predErrorCheckpointer  did not 
unpersist  the RDD  in the queue  after  the loop.  There will be 2 RDD left  
cached  after the loop .


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #16422: [SPARK-17642] [SQL] support DESC EXTENDED/FORMATT...

2016-12-28 Thread wzhfy
Github user wzhfy commented on a diff in the pull request:

https://github.com/apache/spark/pull/16422#discussion_r94107668
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala ---
@@ -586,6 +587,122 @@ case class DescribeTableCommand(
   }
 }
 
+/**
+ * Command that looks like
+ * {{{
+ *   DESCRIBE [EXTENDED|FORMATTED] table_name column_name;
+ * }}}
+ */
+case class DescribeColumnCommand(
+table: TableIdentifier,
+column: String,
+isExtended: Boolean,
+isFormatted: Boolean)
+  extends RunnableCommand {
+
+  override val output: Seq[Attribute] =
+// Column names are based on Hive.
+if (isFormatted) {
+  Seq(
+AttributeReference("col_name", StringType, nullable = false,
+  new MetadataBuilder().putString("comment", "name of the 
column").build())(),
+AttributeReference("data_type", StringType, nullable = false,
+  new MetadataBuilder().putString("comment", "data type of the 
column").build())(),
+AttributeReference("min", StringType, nullable = true,
+  new MetadataBuilder().putString("comment", "min value of the 
column").build())(),
+AttributeReference("max", StringType, nullable = true,
+  new MetadataBuilder().putString("comment", "max value of the 
column").build())(),
+AttributeReference("num_nulls", StringType, nullable = true,
+  new MetadataBuilder().putString("comment", "number of nulls of 
the column").build())(),
+AttributeReference("distinct_count", StringType, nullable = true,
+  new MetadataBuilder().putString("comment", "distinct count of 
the column").build())(),
+AttributeReference("avg_col_len", StringType, nullable = true,
+  new MetadataBuilder().putString("comment",
+"average length of the values of the column").build())(),
+AttributeReference("max_col_len", StringType, nullable = true,
+  new MetadataBuilder().putString("comment",
+"max length of the values of the column").build())(),
+AttributeReference("comment", StringType, nullable = true,
+  new MetadataBuilder().putString("comment", "comment of the 
column").build())())
+} else {
+  Seq(
+AttributeReference("col_name", StringType, nullable = false,
+  new MetadataBuilder().putString("comment", "name of the 
column").build())(),
+AttributeReference("data_type", StringType, nullable = false,
+  new MetadataBuilder().putString("comment", "data type of the 
column").build())(),
+AttributeReference("comment", StringType, nullable = true,
+  new MetadataBuilder().putString("comment", "comment of the 
column").build())())
+}
+
+  override def run(sparkSession: SparkSession): Seq[Row] = {
+val result = new ArrayBuffer[Row]
--- End diff --

yea I'll delete it.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #16422: [SPARK-17642] [SQL] support DESC EXTENDED/FORMATT...

2016-12-28 Thread wzhfy
Github user wzhfy commented on a diff in the pull request:

https://github.com/apache/spark/pull/16422#discussion_r94107442
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala ---
@@ -586,6 +587,122 @@ case class DescribeTableCommand(
   }
 }
 
+/**
+ * Command that looks like
+ * {{{
+ *   DESCRIBE [EXTENDED|FORMATTED] table_name column_name;
+ * }}}
+ */
+case class DescribeColumnCommand(
+table: TableIdentifier,
+column: String,
+isExtended: Boolean,
--- End diff --

I will remove it since the result is same with or without `isExtended`.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16417: [SPARK-19014][SQL] support complex aggregate buffer in H...

2016-12-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16417
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70706/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16417: [SPARK-19014][SQL] support complex aggregate buffer in H...

2016-12-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16417
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16417: [SPARK-19014][SQL] support complex aggregate buffer in H...

2016-12-28 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16417
  
**[Test build #70706 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70706/testReport)**
 for PR 16417 at commit 
[`8a5fe25`](https://github.com/apache/spark/commit/8a5fe253215eabc1f2686d02e4787c1fc692866f).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #16422: [SPARK-17642] [SQL] support DESC EXTENDED/FORMATT...

2016-12-28 Thread wzhfy
Github user wzhfy commented on a diff in the pull request:

https://github.com/apache/spark/pull/16422#discussion_r94106620
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala ---
@@ -300,10 +300,12 @@ class SparkSqlAstBuilder(conf: SQLConf) extends 
AstBuilder {
* Create a [[DescribeTableCommand]] logical plan.
*/
   override def visitDescribeTable(ctx: DescribeTableContext): LogicalPlan 
= withOrigin(ctx) {
-// Describe column are not supported yet. Return null and let the 
parser decide
-// what to do with this (create an exception or pass it on to a 
different system).
 if (ctx.describeColName != null) {
-  null
+  DescribeColumnCommand(
--- End diff --

yes we should


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #16422: [SPARK-17642] [SQL] support DESC EXTENDED/FORMATT...

2016-12-28 Thread wzhfy
Github user wzhfy commented on a diff in the pull request:

https://github.com/apache/spark/pull/16422#discussion_r94106492
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala ---
@@ -586,6 +587,122 @@ case class DescribeTableCommand(
   }
 }
 
+/**
+ * Command that looks like
+ * {{{
+ *   DESCRIBE [EXTENDED|FORMATTED] table_name column_name;
+ * }}}
+ */
+case class DescribeColumnCommand(
+table: TableIdentifier,
+column: String,
+isExtended: Boolean,
+isFormatted: Boolean)
+  extends RunnableCommand {
+
+  override val output: Seq[Attribute] =
+// Column names are based on Hive.
--- End diff --

I got these names by running hive. I can't find any document about the 
names, but I'll add a link of the corresponding JIRA of Hive.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16387: [SPARK-18986][Core] ExternalAppendOnlyMap shouldn't fail...

2016-12-28 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/16387
  
cc @JoshRosen Can you take a look? Thanks. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #13758: [SPARK-16043][SQL] Prepare GenericArrayData imple...

2016-12-28 Thread kiszk
Github user kiszk closed the pull request at:

https://github.com/apache/spark/pull/13758


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #13758: [SPARK-16043][SQL] Prepare GenericArrayData implementati...

2016-12-28 Thread kiszk
Github user kiszk commented on the issue:

https://github.com/apache/spark/pull/13758
  
Issues, which this PR addresses, have been solved by other approaches (i.e. 
use `UnsafePrimitiveArray`).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16431: [SPARK-19020] [SQL] Cardinality estimation of aggregate ...

2016-12-28 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16431
  
**[Test build #70710 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70710/testReport)**
 for PR 16431 at commit 
[`c064595`](https://github.com/apache/spark/commit/c0645952aef0455ac0ad6cefc6de9e624813c7bc).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16233: [SPARK-18801][SQL] Add `View` operator to help resolve a...

2016-12-28 Thread yhuai
Github user yhuai commented on the issue:

https://github.com/apache/spark/pull/16233
  
Seems it is good to know what issues need to be addressed before we can 
switch to this new approach. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16430: [SPARK-17077] [SQL] Cardinality estimation for project o...

2016-12-28 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16430
  
**[Test build #70711 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70711/testReport)**
 for PR 16430 at commit 
[`d222020`](https://github.com/apache/spark/commit/d222020c8b225174724f3ec4ad1b1fee3c0128e8).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16431: [SPARK-19020] [SQL] Cardinality estimation of aggregate ...

2016-12-28 Thread wzhfy
Github user wzhfy commented on the issue:

https://github.com/apache/spark/pull/16431
  
cc @rxin @hvanhovell @cloud-fan @srinathshankar 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #16431: [SPARK-19020] [SQL] Cardinality estimation of agg...

2016-12-28 Thread wzhfy
GitHub user wzhfy opened a pull request:

https://github.com/apache/spark/pull/16431

[SPARK-19020] [SQL] Cardinality estimation of aggregate operator

## What changes were proposed in this pull request?

Support cardinality estimation of aggregate operator

## How was this patch tested?

Add test cases

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/wzhfy/spark aggEstimation

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/16431.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #16431


commit c0645952aef0455ac0ad6cefc6de9e624813c7bc
Author: Zhenhua Wang 
Date:   2016-12-29T06:34:40Z

agg estimation




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16371: [SPARK-18932][SQL] Support partial aggregation for colle...

2016-12-28 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16371
  
**[Test build #70709 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70709/testReport)**
 for PR 16371 at commit 
[`965a4a9`](https://github.com/apache/spark/commit/965a4a93ddc79f4f34a7a32a3b3ef560ebb08665).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16228: [SPARK-17076] [SQL] Cardinality estimation for join base...

2016-12-28 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16228
  
**[Test build #70708 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70708/testReport)**
 for PR 16228 at commit 
[`df839f8`](https://github.com/apache/spark/commit/df839f8d6143bd45f82b1beecf8ab151abe8ea35).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16371: [SPARK-18932][SQL] Support partial aggregation for colle...

2016-12-28 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16371
  
**[Test build #70707 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70707/testReport)**
 for PR 16371 at commit 
[`f60e824`](https://github.com/apache/spark/commit/f60e824e6913edb21ae5e0fa6207dcaccdf8de20).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16233: [SPARK-18801][SQL] Add `View` operator to help resolve a...

2016-12-28 Thread yhuai
Github user yhuai commented on the issue:

https://github.com/apache/spark/pull/16233
  
Changes look good to me. @gatorsmile @hvanhovell what do you think?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #16233: [SPARK-18801][SQL] Add `View` operator to help re...

2016-12-28 Thread yhuai
Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/16233#discussion_r94104622
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
 ---
@@ -510,32 +510,93 @@ class Analyzer(
* Replaces [[UnresolvedRelation]]s with concrete relations from the 
catalog.
*/
   object ResolveRelations extends Rule[LogicalPlan] {
-private def lookupTableFromCatalog(u: UnresolvedRelation): LogicalPlan 
= {
+
+// If the unresolved relation is running directly on files, we just 
return the original
+// UnresolvedRelation, the plan will get resolved later. Else we look 
up the table from catalog
+// and change the default database name if it is a view.
+// We usually look up a table from the default database if the table 
identifier has an empty
+// database part, for a view the default database should be the 
currentDb when the view was
+// created. When the case comes to resolving a nested view, the view 
may have different default
+// database with that the referenced view has, so we need to use the 
variable `defaultDatabase`
+// to track the current default database.
+// When the relation we resolve is a view, we fetch the 
view.desc(which is a CatalogTable), and
+// then set the value of `CatalogTable.viewDefaultDatabase` to the 
variable `defaultDatabase`,
+// we look up the relations that the view references using the default 
database.
+// For example:
+// |- view1 (defaultDatabase = db1)
+//   |- operator
+// |- table2 (defaultDatabase = db1)
+// |- view2 (defaultDatabase = db2)
+//|- view3 (defaultDatabase = db3)
+//   |- view4 (defaultDatabase = db4)
+// In this case, the view `view1` is a nested view, it directly 
references `table2`、`view2`
+// and `view4`, the view `view2` references `view3`. On resolving the 
table, we look up the
+// relations `table2`、`view2`、`view4` using the default database 
`db1`, and look up the
+// relation `view3` using the default database `db2`.
+//
+// Note this is compatible with the views defined by older versions of 
Spark(before 2.2), which
+// have empty defaultDatabase and all the relations in viewText have 
database part defined.
+def resolveRelation(
+plan: LogicalPlan,
+defaultDatabase: Option[String] = None): LogicalPlan = plan match {
+  case u @ UnresolvedRelation(table: TableIdentifier, _) if 
isRunningDirectlyOnFiles(table) =>
+u
+  case u: UnresolvedRelation =>
+val relation = lookupTableFromCatalog(u, defaultDatabase)
+resolveRelation(relation, defaultDatabase)
+  // Hive support is required to resolve a persistent view, the 
logical plan returned by
+  // catalog.lookupRelation() should be:
--- End diff --

oh, we are not parsing view text when we do not have hive support. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #16371: [SPARK-18932][SQL] Support partial aggregation fo...

2016-12-28 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/16371#discussion_r94104643
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/collect.scala
 ---
@@ -29,12 +30,10 @@ import org.apache.spark.sql.types._
 /**
  * The Collect aggregate function collects all seen expression values into 
a list of values.
  *
- * The operator is bound to the slower sort based aggregation path because 
the number of
- * elements (and their memory usage) can not be determined in advance. 
This also means that the
- * collected elements are stored on heap, and that too many elements can 
cause GC pauses and
- * eventually Out of Memory Errors.
+ * We have to store all the collected elements in memory, and that too 
many elements can cause GC
+ * paused and eventually OutOfMemory Errors.
  */
-abstract class Collect extends ImperativeAggregate {
+abstract class Collect[T] extends TypedImperativeAggregate[T] {
--- End diff --

yeah, that's good idea.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #16233: [SPARK-18801][SQL] Add `View` operator to help re...

2016-12-28 Thread yhuai
Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/16233#discussion_r94103444
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
 ---
@@ -510,32 +510,93 @@ class Analyzer(
* Replaces [[UnresolvedRelation]]s with concrete relations from the 
catalog.
*/
   object ResolveRelations extends Rule[LogicalPlan] {
-private def lookupTableFromCatalog(u: UnresolvedRelation): LogicalPlan 
= {
+
+// If the unresolved relation is running directly on files, we just 
return the original
+// UnresolvedRelation, the plan will get resolved later. Else we look 
up the table from catalog
+// and change the default database name if it is a view.
+// We usually look up a table from the default database if the table 
identifier has an empty
+// database part, for a view the default database should be the 
currentDb when the view was
+// created. When the case comes to resolving a nested view, the view 
may have different default
--- End diff --

oh, nvm, even if we have `database.viewname`, dbname and viewname will be 
inside the table identifier. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #16233: [SPARK-18801][SQL] Add `View` operator to help re...

2016-12-28 Thread yhuai
Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/16233#discussion_r94103404
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
 ---
@@ -510,32 +510,93 @@ class Analyzer(
* Replaces [[UnresolvedRelation]]s with concrete relations from the 
catalog.
*/
   object ResolveRelations extends Rule[LogicalPlan] {
-private def lookupTableFromCatalog(u: UnresolvedRelation): LogicalPlan 
= {
+
+// If the unresolved relation is running directly on files, we just 
return the original
+// UnresolvedRelation, the plan will get resolved later. Else we look 
up the table from catalog
+// and change the default database name if it is a view.
+// We usually look up a table from the default database if the table 
identifier has an empty
+// database part, for a view the default database should be the 
currentDb when the view was
+// created. When the case comes to resolving a nested view, the view 
may have different default
--- End diff --

btw, will we allow cases like `CREATE VIEW database.viewname`?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14158: [SPARK-13547] [SQL] [WEBUI] Add SQL query in web UI's SQ...

2016-12-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14158
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70704/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14158: [SPARK-13547] [SQL] [WEBUI] Add SQL query in web UI's SQ...

2016-12-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14158
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #15996: [SPARK-18567][SQL] Simplify CreateDataSourceTable...

2016-12-28 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/15996


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14158: [SPARK-13547] [SQL] [WEBUI] Add SQL query in web UI's SQ...

2016-12-28 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14158
  
**[Test build #70704 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70704/testReport)**
 for PR 14158 at commit 
[`83cbb58`](https://github.com/apache/spark/commit/83cbb588320d945d34dfd28e49eba90bcf63b064).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15996: [SPARK-18567][SQL] Simplify CreateDataSourceTableAsSelec...

2016-12-28 Thread yhuai
Github user yhuai commented on the issue:

https://github.com/apache/spark/pull/15996
  
LGTM. Merging to master.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16430: [SPARK-17077] [SQL] Cardinality estimation for project o...

2016-12-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16430
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16430: [SPARK-17077] [SQL] Cardinality estimation for project o...

2016-12-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16430
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70703/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16430: [SPARK-17077] [SQL] Cardinality estimation for project o...

2016-12-28 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16430
  
**[Test build #70703 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70703/testReport)**
 for PR 16430 at commit 
[`12a48fa`](https://github.com/apache/spark/commit/12a48fa528bfcd8119251cd585bcef9dd555d3a3).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #16233: [SPARK-18801][SQL] Add `View` operator to help re...

2016-12-28 Thread yhuai
Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/16233#discussion_r94102704
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
 ---
@@ -510,32 +510,93 @@ class Analyzer(
* Replaces [[UnresolvedRelation]]s with concrete relations from the 
catalog.
*/
   object ResolveRelations extends Rule[LogicalPlan] {
-private def lookupTableFromCatalog(u: UnresolvedRelation): LogicalPlan 
= {
+
+// If the unresolved relation is running directly on files, we just 
return the original
+// UnresolvedRelation, the plan will get resolved later. Else we look 
up the table from catalog
+// and change the default database name if it is a view.
+// We usually look up a table from the default database if the table 
identifier has an empty
+// database part, for a view the default database should be the 
currentDb when the view was
+// created. When the case comes to resolving a nested view, the view 
may have different default
+// database with that the referenced view has, so we need to use the 
variable `defaultDatabase`
+// to track the current default database.
+// When the relation we resolve is a view, we fetch the 
view.desc(which is a CatalogTable), and
+// then set the value of `CatalogTable.viewDefaultDatabase` to the 
variable `defaultDatabase`,
+// we look up the relations that the view references using the default 
database.
+// For example:
+// |- view1 (defaultDatabase = db1)
+//   |- operator
+// |- table2 (defaultDatabase = db1)
+// |- view2 (defaultDatabase = db2)
+//|- view3 (defaultDatabase = db3)
+//   |- view4 (defaultDatabase = db4)
+// In this case, the view `view1` is a nested view, it directly 
references `table2`、`view2`
+// and `view4`, the view `view2` references `view3`. On resolving the 
table, we look up the
+// relations `table2`、`view2`、`view4` using the default database 
`db1`, and look up the
+// relation `view3` using the default database `db2`.
+//
+// Note this is compatible with the views defined by older versions of 
Spark(before 2.2), which
+// have empty defaultDatabase and all the relations in viewText have 
database part defined.
+def resolveRelation(
+plan: LogicalPlan,
+defaultDatabase: Option[String] = None): LogicalPlan = plan match {
+  case u @ UnresolvedRelation(table: TableIdentifier, _) if 
isRunningDirectlyOnFiles(table) =>
+u
+  case u: UnresolvedRelation =>
+val relation = lookupTableFromCatalog(u, defaultDatabase)
+resolveRelation(relation, defaultDatabase)
+  // Hive support is required to resolve a persistent view, the 
logical plan returned by
+  // catalog.lookupRelation() should be:
--- End diff --

Sorry. I probably missed something. A persistent view is stored in the 
external catalog. So, we can always have persistent views, right? (we have a 
InMemoryCatalog, which is another external catalog. It is not very useful 
though. But it is still an external catalog)


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #16371: [SPARK-18932][SQL] Support partial aggregation fo...

2016-12-28 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/16371#discussion_r94102016
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/collect.scala
 ---
@@ -44,39 +43,52 @@ abstract class Collect extends ImperativeAggregate {
 
   override def dataType: DataType = ArrayType(child.dataType)
 
-  override def supportsPartial: Boolean = false
-
-  override def aggBufferAttributes: Seq[AttributeReference] = Nil
-
-  override def aggBufferSchema: StructType = 
StructType.fromAttributes(aggBufferAttributes)
-
-  override def inputAggBufferAttributes: Seq[AttributeReference] = Nil
-
   // Both `CollectList` and `CollectSet` are non-deterministic since their 
results depend on the
   // actual order of input rows.
   override def deterministic: Boolean = false
 
-  protected[this] val buffer: Growable[Any] with Iterable[Any]
-
-  override def initialize(b: InternalRow): Unit = {
-buffer.clear()
-  }
-
-  override def update(b: InternalRow, input: InternalRow): Unit = {
-// Do not allow null values. We follow the semantics of Hive's 
collect_list/collect_set here.
-// See: 
org.apache.hadoop.hive.ql.udf.generic.GenericUDAFMkCollectionEvaluator
-val value = child.eval(input)
-if (value != null) {
-  buffer += value
+  protected def _serialize(obj: Iterable[Any]): Array[Byte] = {
--- End diff --

Also, the serialized format (child.dataType, LongType) of `Percentile` is 
different than `CollectList` and `CollectSet` (i.e., child.dataType).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14461: [SPARK-16856] [WEBUI] [CORE] Link the application's exec...

2016-12-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14461
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70698/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14461: [SPARK-16856] [WEBUI] [CORE] Link the application's exec...

2016-12-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14461
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14461: [SPARK-16856] [WEBUI] [CORE] Link the application's exec...

2016-12-28 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14461
  
**[Test build #70698 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70698/testReport)**
 for PR 14461 at commit 
[`d598e55`](https://github.com/apache/spark/commit/d598e55cbc60b53319742cca7b3843e155f68221).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #16422: [SPARK-17642] [SQL] support DESC EXTENDED/FORMATT...

2016-12-28 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/16422#discussion_r94101640
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala ---
@@ -586,6 +587,122 @@ case class DescribeTableCommand(
   }
 }
 
+/**
+ * Command that looks like
+ * {{{
+ *   DESCRIBE [EXTENDED|FORMATTED] table_name column_name;
+ * }}}
+ */
+case class DescribeColumnCommand(
+table: TableIdentifier,
+column: String,
+isExtended: Boolean,
+isFormatted: Boolean)
+  extends RunnableCommand {
+
+  override val output: Seq[Attribute] =
+// Column names are based on Hive.
+if (isFormatted) {
+  Seq(
+AttributeReference("col_name", StringType, nullable = false,
+  new MetadataBuilder().putString("comment", "name of the 
column").build())(),
+AttributeReference("data_type", StringType, nullable = false,
+  new MetadataBuilder().putString("comment", "data type of the 
column").build())(),
+AttributeReference("min", StringType, nullable = true,
+  new MetadataBuilder().putString("comment", "min value of the 
column").build())(),
+AttributeReference("max", StringType, nullable = true,
+  new MetadataBuilder().putString("comment", "max value of the 
column").build())(),
+AttributeReference("num_nulls", StringType, nullable = true,
+  new MetadataBuilder().putString("comment", "number of nulls of 
the column").build())(),
+AttributeReference("distinct_count", StringType, nullable = true,
+  new MetadataBuilder().putString("comment", "distinct count of 
the column").build())(),
+AttributeReference("avg_col_len", StringType, nullable = true,
+  new MetadataBuilder().putString("comment",
+"average length of the values of the column").build())(),
+AttributeReference("max_col_len", StringType, nullable = true,
+  new MetadataBuilder().putString("comment",
+"max length of the values of the column").build())(),
+AttributeReference("comment", StringType, nullable = true,
+  new MetadataBuilder().putString("comment", "comment of the 
column").build())())
+} else {
+  Seq(
+AttributeReference("col_name", StringType, nullable = false,
+  new MetadataBuilder().putString("comment", "name of the 
column").build())(),
+AttributeReference("data_type", StringType, nullable = false,
+  new MetadataBuilder().putString("comment", "data type of the 
column").build())(),
+AttributeReference("comment", StringType, nullable = true,
+  new MetadataBuilder().putString("comment", "comment of the 
column").build())())
+}
+
+  override def run(sparkSession: SparkSession): Seq[Row] = {
+val result = new ArrayBuffer[Row]
+val catalog = sparkSession.sessionState.catalog
+// Get the attribute referring to the given column
+val attribute = sparkSession.sessionState.executePlan(
--- End diff --

I don't get it, so you get the attribute just for column comment and name 
and data type?  I think `CatalogTable.schema` already have this information.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #15664: [SPARK-18123][SQL] Use db column names instead of...

2016-12-28 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/15664#discussion_r94101576
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala
 ---
@@ -211,6 +211,55 @@ object JdbcUtils extends Logging {
   }
 
   /**
+   * Returns the schema if the table already exists in the JDBC database.
+   */
+  def getSchema(conn: Connection, url: String, table: String): 
Option[StructType] = {
+val dialect = JdbcDialects.get(url)
+
+try {
+  val statement = conn.prepareStatement(dialect.getSchemaQuery(table))
+  try {
+Some(getSchema(statement.executeQuery(), dialect))
--- End diff --

Then, we can keep the existing way. See 
https://github.com/apache/spark/compare/master...gatorsmile:pr-15664Changed1


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #16422: [SPARK-17642] [SQL] support DESC EXTENDED/FORMATT...

2016-12-28 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/16422#discussion_r94101475
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala ---
@@ -586,6 +587,122 @@ case class DescribeTableCommand(
   }
 }
 
+/**
+ * Command that looks like
+ * {{{
+ *   DESCRIBE [EXTENDED|FORMATTED] table_name column_name;
+ * }}}
+ */
+case class DescribeColumnCommand(
+table: TableIdentifier,
+column: String,
+isExtended: Boolean,
+isFormatted: Boolean)
+  extends RunnableCommand {
+
+  override val output: Seq[Attribute] =
+// Column names are based on Hive.
+if (isFormatted) {
+  Seq(
+AttributeReference("col_name", StringType, nullable = false,
+  new MetadataBuilder().putString("comment", "name of the 
column").build())(),
+AttributeReference("data_type", StringType, nullable = false,
+  new MetadataBuilder().putString("comment", "data type of the 
column").build())(),
+AttributeReference("min", StringType, nullable = true,
+  new MetadataBuilder().putString("comment", "min value of the 
column").build())(),
+AttributeReference("max", StringType, nullable = true,
+  new MetadataBuilder().putString("comment", "max value of the 
column").build())(),
+AttributeReference("num_nulls", StringType, nullable = true,
+  new MetadataBuilder().putString("comment", "number of nulls of 
the column").build())(),
+AttributeReference("distinct_count", StringType, nullable = true,
+  new MetadataBuilder().putString("comment", "distinct count of 
the column").build())(),
+AttributeReference("avg_col_len", StringType, nullable = true,
+  new MetadataBuilder().putString("comment",
+"average length of the values of the column").build())(),
+AttributeReference("max_col_len", StringType, nullable = true,
+  new MetadataBuilder().putString("comment",
+"max length of the values of the column").build())(),
+AttributeReference("comment", StringType, nullable = true,
+  new MetadataBuilder().putString("comment", "comment of the 
column").build())())
+} else {
+  Seq(
+AttributeReference("col_name", StringType, nullable = false,
+  new MetadataBuilder().putString("comment", "name of the 
column").build())(),
+AttributeReference("data_type", StringType, nullable = false,
+  new MetadataBuilder().putString("comment", "data type of the 
column").build())(),
+AttributeReference("comment", StringType, nullable = true,
+  new MetadataBuilder().putString("comment", "comment of the 
column").build())())
+}
+
+  override def run(sparkSession: SparkSession): Seq[Row] = {
+val result = new ArrayBuffer[Row]
--- End diff --

why we create an `ArrayBuffer`? Doesn't it always return a single row?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #16371: [SPARK-18932][SQL] Support partial aggregation fo...

2016-12-28 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/16371#discussion_r94101455
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/collect.scala
 ---
@@ -44,39 +43,52 @@ abstract class Collect extends ImperativeAggregate {
 
   override def dataType: DataType = ArrayType(child.dataType)
 
-  override def supportsPartial: Boolean = false
-
-  override def aggBufferAttributes: Seq[AttributeReference] = Nil
-
-  override def aggBufferSchema: StructType = 
StructType.fromAttributes(aggBufferAttributes)
-
-  override def inputAggBufferAttributes: Seq[AttributeReference] = Nil
-
   // Both `CollectList` and `CollectSet` are non-deterministic since their 
results depend on the
   // actual order of input rows.
   override def deterministic: Boolean = false
 
-  protected[this] val buffer: Growable[Any] with Iterable[Any]
-
-  override def initialize(b: InternalRow): Unit = {
-buffer.clear()
-  }
-
-  override def update(b: InternalRow, input: InternalRow): Unit = {
-// Do not allow null values. We follow the semantics of Hive's 
collect_list/collect_set here.
-// See: 
org.apache.hadoop.hive.ql.udf.generic.GenericUDAFMkCollectionEvaluator
-val value = child.eval(input)
-if (value != null) {
-  buffer += value
+  protected def _serialize(obj: Iterable[Any]): Array[Byte] = {
--- End diff --

We can generalize this to cover `Percentile`, but we may not use `T <: 
Growable[Any] with Iterable[Any]` but `T <: Iterable[Any]` then.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #16422: [SPARK-17642] [SQL] support DESC EXTENDED/FORMATT...

2016-12-28 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/16422#discussion_r94101445
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala ---
@@ -586,6 +587,122 @@ case class DescribeTableCommand(
   }
 }
 
+/**
+ * Command that looks like
+ * {{{
+ *   DESCRIBE [EXTENDED|FORMATTED] table_name column_name;
+ * }}}
+ */
+case class DescribeColumnCommand(
+table: TableIdentifier,
+column: String,
+isExtended: Boolean,
--- End diff --

where do we use `isExtended`?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #16422: [SPARK-17642] [SQL] support DESC EXTENDED/FORMATT...

2016-12-28 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/16422#discussion_r94101427
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala ---
@@ -586,6 +587,122 @@ case class DescribeTableCommand(
   }
 }
 
+/**
+ * Command that looks like
+ * {{{
+ *   DESCRIBE [EXTENDED|FORMATTED] table_name column_name;
+ * }}}
+ */
+case class DescribeColumnCommand(
+table: TableIdentifier,
+column: String,
+isExtended: Boolean,
+isFormatted: Boolean)
+  extends RunnableCommand {
+
+  override val output: Seq[Attribute] =
+// Column names are based on Hive.
--- End diff --

can you add a link to the hive spec about this?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14204: [SPARK-16520] [WEBUI] Link executors to corresponding wo...

2016-12-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14204
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16417: [SPARK-19014][SQL] support complex aggregate buffer in H...

2016-12-28 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16417
  
**[Test build #70706 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70706/testReport)**
 for PR 16417 at commit 
[`8a5fe25`](https://github.com/apache/spark/commit/8a5fe253215eabc1f2686d02e4787c1fc692866f).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14204: [SPARK-16520] [WEBUI] Link executors to corresponding wo...

2016-12-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14204
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70697/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #16422: [SPARK-17642] [SQL] support DESC EXTENDED/FORMATT...

2016-12-28 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/16422#discussion_r94101415
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala ---
@@ -586,6 +587,122 @@ case class DescribeTableCommand(
   }
 }
 
+/**
+ * Command that looks like
--- End diff --

please follow other commands and add more description.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14204: [SPARK-16520] [WEBUI] Link executors to corresponding wo...

2016-12-28 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14204
  
**[Test build #70697 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70697/testReport)**
 for PR 14204 at commit 
[`11fe868`](https://github.com/apache/spark/commit/11fe868569431c0bc957e08dfb551a7a37a8aaf0).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #16371: [SPARK-18932][SQL] Support partial aggregation fo...

2016-12-28 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/16371#discussion_r94101411
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/collect.scala
 ---
@@ -44,39 +43,52 @@ abstract class Collect extends ImperativeAggregate {
 
   override def dataType: DataType = ArrayType(child.dataType)
 
-  override def supportsPartial: Boolean = false
-
-  override def aggBufferAttributes: Seq[AttributeReference] = Nil
-
-  override def aggBufferSchema: StructType = 
StructType.fromAttributes(aggBufferAttributes)
-
-  override def inputAggBufferAttributes: Seq[AttributeReference] = Nil
-
   // Both `CollectList` and `CollectSet` are non-deterministic since their 
results depend on the
   // actual order of input rows.
   override def deterministic: Boolean = false
 
-  protected[this] val buffer: Growable[Any] with Iterable[Any]
-
-  override def initialize(b: InternalRow): Unit = {
-buffer.clear()
-  }
-
-  override def update(b: InternalRow, input: InternalRow): Unit = {
-// Do not allow null values. We follow the semantics of Hive's 
collect_list/collect_set here.
-// See: 
org.apache.hadoop.hive.ql.udf.generic.GenericUDAFMkCollectionEvaluator
-val value = child.eval(input)
-if (value != null) {
-  buffer += value
+  protected def _serialize(obj: Iterable[Any]): Array[Byte] = {
--- End diff --

`Percentile` takes `OpenHashMap`, which has a bit different behavior than 
`Growable[]`.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #16422: [SPARK-17642] [SQL] support DESC EXTENDED/FORMATT...

2016-12-28 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/16422#discussion_r94101398
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala ---
@@ -300,10 +300,12 @@ class SparkSqlAstBuilder(conf: SQLConf) extends 
AstBuilder {
* Create a [[DescribeTableCommand]] logical plan.
*/
   override def visitDescribeTable(ctx: DescribeTableContext): LogicalPlan 
= withOrigin(ctx) {
-// Describe column are not supported yet. Return null and let the 
parser decide
-// what to do with this (create an exception or pass it on to a 
different system).
 if (ctx.describeColName != null) {
-  null
+  DescribeColumnCommand(
--- End diff --

shall we throw exception here if partition spec is given?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14452: [SPARK-16849][SQL] Improve subquery execution by dedupli...

2016-12-28 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/14452
  
Because the actual improvement of this de-duplication depends on the 
complexity of disjunctive predicate pushdown and CTE subquery, if we can't have 
general rule to decide whether de-duplicate or not, I think can we have a 
config (default off) to enable/disable this subquery de-duplication.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16429: [SPARK-19019][PYTHON] Fix hijacked `collections.namedtup...

2016-12-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16429
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70705/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16429: [SPARK-19019][PYTHON] Fix hijacked `collections.namedtup...

2016-12-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16429
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16429: [WIP][SPARK-19019][PYTHON] Fix hijacked `collections.nam...

2016-12-28 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16429
  
**[Test build #70705 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70705/testReport)**
 for PR 16429 at commit 
[`b688e89`](https://github.com/apache/spark/commit/b688e89ffa266fe4713e58b1052a2416f75cbb81).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16397: [SPARK-18922][TESTS] Fix more path-related test failures...

2016-12-28 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/16397
  
I just checked each is fine in a concatenated log file - 
https://gist.github.com/HyukjinKwon/8851815ede9dcae80632a5378b74d1ae


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15664: [SPARK-18123][SQL] Use db column names instead of RDD co...

2016-12-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15664
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70695/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15664: [SPARK-18123][SQL] Use db column names instead of RDD co...

2016-12-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15664
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16427: [SPARK-19012][SQL] Fix `createTempViewCommand` to throw ...

2016-12-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16427
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70693/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16427: [SPARK-19012][SQL] Fix `createTempViewCommand` to throw ...

2016-12-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16427
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #15664: [SPARK-18123][SQL] Use db column names instead of RDD co...

2016-12-28 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15664
  
**[Test build #70695 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70695/testReport)**
 for PR 15664 at commit 
[`86ab6df`](https://github.com/apache/spark/commit/86ab6df26c052ef979e42006ae9d4159a01e83b6).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16427: [SPARK-19012][SQL] Fix `createTempViewCommand` to throw ...

2016-12-28 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16427
  
**[Test build #70693 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70693/testReport)**
 for PR 16427 at commit 
[`2ec0035`](https://github.com/apache/spark/commit/2ec0035f160047ba00cdb756b845c05486af9626).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16429: [WIP][SPARK-19019][PYTHON] Fix hijacked `collections.nam...

2016-12-28 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16429
  
**[Test build #70705 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70705/testReport)**
 for PR 16429 at commit 
[`b688e89`](https://github.com/apache/spark/commit/b688e89ffa266fe4713e58b1052a2416f75cbb81).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16429: [WIP][SPARK-19019][PYTHON] Fix hijacked `collections.nam...

2016-12-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16429
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70701/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16429: [WIP][SPARK-19019][PYTHON] Fix hijacked `collections.nam...

2016-12-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16429
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16429: [WIP][SPARK-19019][PYTHON] Fix hijacked `collections.nam...

2016-12-28 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16429
  
**[Test build #70701 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70701/testReport)**
 for PR 16429 at commit 
[`9ac9d01`](https://github.com/apache/spark/commit/9ac9d012fc16496a5bea2d660b791c69a39b5f81).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16430: [SPARK-17077] [SQL] Cardinality estimation for project o...

2016-12-28 Thread wzhfy
Github user wzhfy commented on the issue:

https://github.com/apache/spark/pull/16430
  
cc @rxin @hvanhovell @cloud-fan 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #15664: [SPARK-18123][SQL] Use db column names instead of...

2016-12-28 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/15664#discussion_r94099067
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala
 ---
@@ -211,6 +211,55 @@ object JdbcUtils extends Logging {
   }
 
   /**
+   * Returns the schema if the table already exists in the JDBC database.
+   */
+  def getSchema(conn: Connection, url: String, table: String): 
Option[StructType] = {
+val dialect = JdbcDialects.get(url)
+
+try {
+  val statement = conn.prepareStatement(dialect.getSchemaQuery(table))
+  try {
+Some(getSchema(statement.executeQuery(), dialect))
--- End diff --

yes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16429: [WIP][SPARK-19019][PYTHON] Fix hijacked `collections.nam...

2016-12-28 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16429
  
**[Test build #70702 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70702/testReport)**
 for PR 16429 at commit 
[`f4c56c8`](https://github.com/apache/spark/commit/f4c56c8c0ae3a925e389dadc75243f0cbb551f62).
 * This patch **fails PySpark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16429: [WIP][SPARK-19019][PYTHON] Fix hijacked `collections.nam...

2016-12-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16429
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70702/
Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16429: [WIP][SPARK-19019][PYTHON] Fix hijacked `collections.nam...

2016-12-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16429
  
Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #15664: [SPARK-18123][SQL] Use db column names instead of...

2016-12-28 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/15664#discussion_r94099004
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala
 ---
@@ -211,6 +211,55 @@ object JdbcUtils extends Logging {
   }
 
   /**
+   * Returns the schema if the table already exists in the JDBC database.
+   */
+  def getSchema(conn: Connection, url: String, table: String): 
Option[StructType] = {
+val dialect = JdbcDialects.get(url)
+
+try {
+  val statement = conn.prepareStatement(dialect.getSchemaQuery(table))
+  try {
+Some(getSchema(statement.executeQuery(), dialect))
--- End diff --

For unsupported types, it will throw an `SQLException`, right?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14158: [SPARK-13547] [SQL] [WEBUI] Add SQL query in web UI's SQ...

2016-12-28 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14158
  
**[Test build #70704 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70704/testReport)**
 for PR 14158 at commit 
[`83cbb58`](https://github.com/apache/spark/commit/83cbb588320d945d34dfd28e49eba90bcf63b064).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16430: [SPARK-17077] [SQL] Cardinality estimation for project o...

2016-12-28 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16430
  
**[Test build #70703 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70703/testReport)**
 for PR 16430 at commit 
[`12a48fa`](https://github.com/apache/spark/commit/12a48fa528bfcd8119251cd585bcef9dd555d3a3).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14158: [SPARK-13547] [SQL] [WEBUI] Add SQL query in web UI's SQ...

2016-12-28 Thread nblintao
Github user nblintao commented on the issue:

https://github.com/apache/spark/pull/14158
  
retest this please


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #16430: [SPARK-17077] [SQL] Cardinality estimation projec...

2016-12-28 Thread wzhfy
GitHub user wzhfy opened a pull request:

https://github.com/apache/spark/pull/16430

[SPARK-17077] [SQL] Cardinality estimation project operator

## What changes were proposed in this pull request?

Support cardinality estimation for project operator.

## How was this patch tested?

Add a test case.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/wzhfy/spark projectEstimation

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/16430.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #16430


commit 12a48fa528bfcd8119251cd585bcef9dd555d3a3
Author: Zhenhua Wang 
Date:   2016-12-29T03:07:11Z

project estimation




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #15664: [SPARK-18123][SQL] Use db column names instead of...

2016-12-28 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/15664#discussion_r94098773
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala
 ---
@@ -211,6 +211,55 @@ object JdbcUtils extends Logging {
   }
 
   /**
+   * Returns the schema if the table already exists in the JDBC database.
+   */
+  def getSchema(conn: Connection, url: String, table: String): 
Option[StructType] = {
+val dialect = JdbcDialects.get(url)
+
+try {
+  val statement = conn.prepareStatement(dialect.getSchemaQuery(table))
+  try {
+Some(getSchema(statement.executeQuery(), dialect))
+  } catch {
+case _: SQLException => None
+  } finally {
+statement.close()
+  }
+} catch {
+  case _: SQLException => None
+}
+  }
+
+  /**
+   * Returns a schema using rddSchema's column sequence and tableSchema's 
column names.
+   *
+   * When appending data into some case-sensitive DBMSs like 
PostgreSQL/Oracle, we need to respect
+   * the existing case-sensitive column names instead of RDD column names 
for user convenience.
+   * See SPARK-18123 for more details.
+   */
+  def normalizeSchema(
+  rddSchema: StructType,
+  tableSchema: StructType,
+  caseSensitive: Boolean): StructType = {
+val nameMap = tableSchema.fields.map(f => f.name -> f).toMap
+val lowercaseNameMap = tableSchema.fields.map(f => f.name.toLowerCase 
-> f).toMap
+
+var schema = new StructType()
+rddSchema.fields.foreach { f =>
+  if (nameMap.isDefinedAt(f.name)) {
+// identical names
+schema = schema.add(nameMap(f.name))
+  } else if (!caseSensitive && 
lowercaseNameMap.isDefinedAt(f.name.toLowerCase)) {
+// identical names in a case-insensitive way
+schema = schema.add(lowercaseNameMap(f.name.toLowerCase))
+  } else {
+throw new AnalysisException(s"""Column "${f.name}" not found""")
--- End diff --

The error message looks ambiguous. Maybe `Column "${f.name}" in RDD not 
found in table schema`?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #15664: [SPARK-18123][SQL] Use db column names instead of...

2016-12-28 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/15664#discussion_r94098761
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala
 ---
@@ -568,10 +617,9 @@ object JdbcUtils extends Logging {
 conn.setAutoCommit(false) // Everything in the same db transaction.
 conn.setTransactionIsolation(finalIsolationLevel)
   }
-  val stmt = insertStatement(conn, table, rddSchema, dialect)
-  val setters: Array[JDBCValueSetter] = 
rddSchema.fields.map(_.dataType)
-.map(makeSetter(conn, dialect, _)).toArray
-  val numFields = rddSchema.fields.length
+  val stmt = insertStatement(conn, table, schema, dialect)
--- End diff --

This is my rough idea: 
https://github.com/apache/spark/compare/master...gatorsmile:pr-15664Changed1


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16371: [SPARK-18932][SQL] Support partial aggregation for colle...

2016-12-28 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/16371
  
@cloud-fan @marmbrus ok. I will first address @hvanhovell's comments.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14158: [SPARK-13547] [SQL] [WEBUI] Add SQL query in web UI's SQ...

2016-12-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14158
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70700/
Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14158: [SPARK-13547] [SQL] [WEBUI] Add SQL query in web UI's SQ...

2016-12-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14158
  
Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14158: [SPARK-13547] [SQL] [WEBUI] Add SQL query in web UI's SQ...

2016-12-28 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14158
  
**[Test build #70700 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70700/testReport)**
 for PR 14158 at commit 
[`83cbb58`](https://github.com/apache/spark/commit/83cbb588320d945d34dfd28e49eba90bcf63b064).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16415: [SPARK-19007]Speedup and optimize the GradientBoostedTre...

2016-12-28 Thread zdh2292390
Github user zdh2292390 commented on the issue:

https://github.com/apache/spark/pull/16415
  
@jkbradley   I test with MEMORY_ONLY  and got the same problem: 
`(ExecutorLostFailure (executor 6 exited caused by one of the running 
tasks) Reason: Container killed by YARN for exceeding memory limits. 10.2 GB of 
10 GB physical memory used. Consider boosting 
spark.yarn.executor.memoryOverhead.)`
and  it cost more than 1 hour.

So  my code should just  change the storageLevel  in predErrorCheckpointer ?



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #15664: [SPARK-18123][SQL] Use db column names instead of...

2016-12-28 Thread viirya
Github user viirya commented on a diff in the pull request:

https://github.com/apache/spark/pull/15664#discussion_r94098595
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala
 ---
@@ -211,6 +211,55 @@ object JdbcUtils extends Logging {
   }
 
   /**
+   * Returns the schema if the table already exists in the JDBC database.
+   */
+  def getSchema(conn: Connection, url: String, table: String): 
Option[StructType] = {
+val dialect = JdbcDialects.get(url)
+
+try {
+  val statement = conn.prepareStatement(dialect.getSchemaQuery(table))
+  try {
+Some(getSchema(statement.executeQuery(), dialect))
--- End diff --

`getSchema` will throw an exception when the schema contains an unsupported 
type. Now we use it to check if the table exists. Does it change current 
behavior? E.g., the insertion working before now fails.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16371: [SPARK-18932][SQL] Support partial aggregation for colle...

2016-12-28 Thread marmbrus
Github user marmbrus commented on the issue:

https://github.com/apache/spark/pull/16371
  
+1 I think we can move forward.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16371: [SPARK-18932][SQL] Support partial aggregation for colle...

2016-12-28 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/16371
  
I think we should support partial aggregation for collect, in the future we 
can define an interface to declare "should partial aggregate", so that a single 
collect function won't cause a 2-phase aggregation.

I think they are orthogonal and we can go ahead.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16415: [SPARK-19007]Speedup and optimize the GradientBoostedTre...

2016-12-28 Thread zdh2292390
Github user zdh2292390 commented on the issue:

https://github.com/apache/spark/pull/16415
  
@jkbradley   Can I  ask  what other uses are  for "The periodic 
checkpointer caches more because of some other use cases" ?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #13758: [SPARK-16043][SQL] Prepare GenericArrayData implementati...

2016-12-28 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/13758
  
I think we can close this now.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16422: [SPARK-17642] [SQL] support DESC EXTENDED/FORMATTED tabl...

2016-12-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16422
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70691/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16429: [WIP][SPARK-19019][PYTHON] Fix hijected collections.name...

2016-12-28 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16429
  
**[Test build #70702 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70702/testReport)**
 for PR 16429 at commit 
[`f4c56c8`](https://github.com/apache/spark/commit/f4c56c8c0ae3a925e389dadc75243f0cbb551f62).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16422: [SPARK-17642] [SQL] support DESC EXTENDED/FORMATTED tabl...

2016-12-28 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/16422
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16422: [SPARK-17642] [SQL] support DESC EXTENDED/FORMATTED tabl...

2016-12-28 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16422
  
**[Test build #70691 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70691/testReport)**
 for PR 16422 at commit 
[`d41a9cd`](https://github.com/apache/spark/commit/d41a9cd5980c5c9cadefd65a09d059ff8b9266e4).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #13909: [SPARK-16213][SQL] Reduce runtime overhead of a p...

2016-12-28 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/13909


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #13909: [SPARK-16213][SQL] Reduce runtime overhead of a program ...

2016-12-28 Thread cloud-fan
Github user cloud-fan commented on the issue:

https://github.com/apache/spark/pull/13909
  
thanks, merging to master!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #16429: [WIP][SPARK-19019][PYTHON] Fix hijected collections.name...

2016-12-28 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/16429
  
**[Test build #70701 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70701/testReport)**
 for PR 16429 at commit 
[`9ac9d01`](https://github.com/apache/spark/commit/9ac9d012fc16496a5bea2d660b791c69a39b5f81).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14461: [SPARK-16856] [WEBUI] [CORE] Link the application's exec...

2016-12-28 Thread nblintao
Github user nblintao commented on the issue:

https://github.com/apache/spark/pull/14461
  
@ajbozarth I finally have a chance to rebase it in the winter break. Could 
you please have a look? Thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14158: [SPARK-13547] [SQL] [WEBUI] Add SQL query in web UI's SQ...

2016-12-28 Thread nblintao
Github user nblintao commented on the issue:

https://github.com/apache/spark/pull/14158
  
@ajbozarth I finally have a chance to rebase it in the winter break. Could 
you please have a look? Thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14204: [SPARK-16520] [WEBUI] Link executors to corresponding wo...

2016-12-28 Thread nblintao
Github user nblintao commented on the issue:

https://github.com/apache/spark/pull/14204
  
@ajbozarth I finally have a chance to rebase it in the winter break. Could 
you please have a look? Thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



  1   2   3   4   5   >