[GitHub] spark pull request #15190: [SPARK-17620][SQL] Use the storage format specifi...
Github user dilipbiswal commented on a diff in the pull request: https://github.com/apache/spark/pull/15190#discussion_r79979332 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala --- @@ -988,9 +988,7 @@ class SparkSqlAstBuilder(conf: SQLConf) extends AstBuilder { .orElse(Some("org.apache.hadoop.mapred.TextInputFormat")), outputFormat = defaultHiveSerde.flatMap(_.outputFormat) .orElse(Some("org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat")), -// Note: Keep this unspecified because we use the presence of the serde to decide --- End diff -- @viirya @cloud-fan Actually, I am not sure if the above comment is in sync with the code. When we had this comment, we used CreateTableAsSelectLogicalPlan to represent the CTAS case, and we checked for the serde's presence to decide whether or not to convert it to a data source table, like the following:
```scala
if (sessionState.convertCTAS && table.storage.serde.isEmpty) {
  // Do the conversion when spark.sql.hive.convertCTAS is true and the query
  // does not specify any storage format (file format and storage handler).
  if (table.identifier.database.isDefined) {
    throw new AnalysisException(
      "Cannot specify database name in a CTAS statement " +
        "when spark.sql.hive.convertCTAS is set to true.")
  }
  val mode = if (allowExisting) SaveMode.Ignore else SaveMode.ErrorIfExists
  CreateTableUsingAsSelect(
    TableIdentifier(desc.identifier.table),
    conf.defaultDataSourceName,
    temporary = false,
    Array.empty[String],
    bucketSpec = None,
    mode,
    options = Map.empty[String, String],
    child
  )
} else {
  val desc = if (table.storage.serde.isEmpty) {
    // add default serde
    table.withNewStorage(
      serde = Some("org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe"))
  } else {
    table
  }
```
I think this code has since changed and moved to SparkSqlParser? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. 
If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #15190: [SPARK-17620][SQL] Use the storage format specified by h...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/15190 PR title should be ```Determine Serde by hive.default.fileformat when Creating Hive Serde Tables```
[GitHub] spark pull request #15190: [SPARK-17620][SQL] Use the storage format specifi...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/15190#discussion_r79978495 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala --- @@ -988,9 +988,7 @@ class SparkSqlAstBuilder(conf: SQLConf) extends AstBuilder { .orElse(Some("org.apache.hadoop.mapred.TextInputFormat")), outputFormat = defaultHiveSerde.flatMap(_.outputFormat) .orElse(Some("org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat")), -// Note: Keep this unspecified because we use the presence of the serde to decide --- End diff -- The current checking conditions are based on [ctx.createFileFormat and ctx.rowFormat](https://github.com/dilipbiswal/spark/blob/f2b93de629f378ca99f8d3086ade8dc05b41a912/sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala#L1051-L1052). Thus, I think this PR looks ok. : )
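The presence check referenced above can be sketched as follows. This is a hedged illustration, not the exact Spark code: `CreateTableContext` and the helper name are stand-ins, assuming only that absent parser clauses surface as null fields, as in ANTLR-generated contexts.

```scala
// Illustrative stand-in for the generated ANTLR parser context; in Spark the
// real fields are ctx.rowFormat and ctx.createFileFormat on the context class.
final case class CreateTableContext(rowFormat: AnyRef, createFileFormat: AnyRef)

// The decision sketched: only fall back to the default Hive serde when the
// user specified neither ROW FORMAT nor STORED AS. An absent clause shows up
// as a null field in the generated context.
def hasUserSpecifiedStorage(ctx: CreateTableContext): Boolean =
  ctx.rowFormat != null || ctx.createFileFormat != null
```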
[GitHub] spark issue #15102: [SPARK-17346][SQL] Add Kafka source for Structured Strea...
Github user koeninger commented on the issue: https://github.com/apache/spark/pull/15102 > I'd want to see some test cases though that show why the current implementation is wrong from an end-user perspective if it needs to block merging initial kafka support. PR with a failing test indicating at least one reason why it's wrong from an end-user perspective: https://github.com/zsxwing/spark/pull/4 > I do not think it is reasonable to suggest we block merging this patch on an overhaul of the DataSource API configuration system. Here's what I actually said: 'if you know your plan down the line is to use json for structured configuration, you should use it now, and provide more convenient ways to construct json later, not use "convenient" non-json hacks now.' No hyperbole about blocking on a complete overhaul, nothing that isn't backwards compatible. I'm just saying that, if the design document already recognizes that json is necessary to work around the string -> string interface... start using structured json strings now, and make it more convenient later. Or do you actually think that stuff like option("assign", "topicA:1:1,topicA:2:2,topicB:3:3") makes it clear what the arguments are? > I think @koeninger made a good suggestion to block accepting certain kafka configurations. In case it wasn't clear, I was not suggesting that preventing users from doing things they could otherwise do with Kafka is actually a good idea. I think it's a bad idea, but if you're going to run with it, you might as well be consistent about it.
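The contrast koeninger draws can be illustrated with a sketch. This is hedged: the `"assign"` option name comes from the discussion above, but the JSON layout shown is one possible encoding for illustration, not the actual API that was eventually merged.

```scala
import org.apache.spark.sql.SparkSession

object AssignOptionSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("sketch").getOrCreate()

    // The packed-string style under discussion: a reader cannot tell which
    // numbers are partitions and which are starting offsets.
    val packed = spark.readStream.format("kafka")
      .option("assign", "topicA:1:1,topicA:2:2,topicB:3:3")

    // A JSON-structured alternative (illustrative encoding only): the same
    // information with explicit topic -> partition -> offset nesting, which
    // survives the string -> string option interface unambiguously.
    val structured = spark.readStream.format("kafka")
      .option("assign", """{"topicA": {"1": 1, "2": 2}, "topicB": {"3": 3}}""")
  }
}
```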
[GitHub] spark pull request #15190: [SPARK-17620][SQL] Use the storage format specifi...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/15190#discussion_r79978157 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala --- @@ -988,9 +988,7 @@ class SparkSqlAstBuilder(conf: SQLConf) extends AstBuilder { .orElse(Some("org.apache.hadoop.mapred.TextInputFormat")), outputFormat = defaultHiveSerde.flatMap(_.outputFormat) .orElse(Some("org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat")), -// Note: Keep this unspecified because we use the presence of the serde to decide --- End diff -- The comment is not valid now. This was removed by the PR: https://github.com/apache/spark/pull/13386
[GitHub] spark pull request #15190: [SPARK-17620][SQL] Use the storage format specifi...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/15190#discussion_r79977535 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala --- @@ -988,9 +988,7 @@ class SparkSqlAstBuilder(conf: SQLConf) extends AstBuilder { .orElse(Some("org.apache.hadoop.mapred.TextInputFormat")), outputFormat = defaultHiveSerde.flatMap(_.outputFormat) .orElse(Some("org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat")), -// Note: Keep this unspecified because we use the presence of the serde to decide --- End diff -- cc @yhuai to confirm
[GitHub] spark pull request #15046: [SPARK-17492] [SQL] Fix Reading Cataloged Data So...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/15046
[GitHub] spark issue #15046: [SPARK-17492] [SQL] Fix Reading Cataloged Data Sources w...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/15046 thanks, merging to master!
[GitHub] spark issue #15174: [SPARK-17502] [SQL] [Backport] [2.0] Fix Multiple Bugs i...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/15174 Sure, will do it.
[GitHub] spark pull request #15190: [SPARK-17620][SQL] Use the storage format specifi...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/15190#discussion_r79976580 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala --- @@ -988,9 +988,7 @@ class SparkSqlAstBuilder(conf: SQLConf) extends AstBuilder { .orElse(Some("org.apache.hadoop.mapred.TextInputFormat")), outputFormat = defaultHiveSerde.flatMap(_.outputFormat) .orElse(Some("org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat")), -// Note: Keep this unspecified because we use the presence of the serde to decide --- End diff -- I think this is kept unspecified because it is intended to write the table through the Hive write path. If we specify a serde here, it will be converted to a data source table. Is that ok? cc @cloud-fan
[GitHub] spark pull request #14988: [SPARK-17425][SQL] Override sameResult in HiveTab...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/14988
[GitHub] spark issue #14988: [SPARK-17425][SQL] Override sameResult in HiveTableScanE...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/14988 LGTM, merging to master!
[GitHub] spark issue #14537: [SPARK-16948][SQL] Use metastore schema instead of infer...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14537 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65755/ Test FAILed.
[GitHub] spark issue #14537: [SPARK-16948][SQL] Use metastore schema instead of infer...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14537 Merged build finished. Test FAILed.
[GitHub] spark issue #14537: [SPARK-16948][SQL] Use metastore schema instead of infer...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14537 **[Test build #65755 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65755/consoleFull)** for PR 14537 at commit [`fa71370`](https://github.com/apache/spark/commit/fa713700f853e78053ac0be5db49250951aaa715). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #15174: [SPARK-17502] [SQL] [Backport] [2.0] Fix Multiple Bugs i...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/15174 LGTM, if you have time, can you also include https://github.com/apache/spark/pull/15160? they are kind of related. thanks!
[GitHub] spark pull request #15160: [SPARK-17609][SQL] SessionCatalog.tableExists sho...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/15160
[GitHub] spark issue #15160: [SPARK-17609][SQL] SessionCatalog.tableExists should not...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/15160 thanks for the review, merging to master!
[GitHub] spark issue #14537: [SPARK-16948][SQL] Use metastore schema instead of infer...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/14537 LGTM, pending jenkins
[GitHub] spark issue #15090: [SPARK-17073] [SQL] generate column-level statistics
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/15090 The test suite `StatisticsColumnSuite` misses negative cases. For example, so far we do not allow users to analyze temporary tables. Ideally, every exception the code could issue needs a test case.
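A negative case of the kind requested might look like the following sketch. This is hedged: it assumes a suite mixing in `SharedSQLContext` (as in the PR's tests), and the exact error-message text checked here is an assumption, not the real message.

```scala
// Sketch of a negative test: analyzing column statistics on a temporary view
// should be rejected. The asserted message fragment is an assumption.
test("analyze column stats on a temporary table is rejected") {
  withTempView("tempView") {
    spark.range(10).createOrReplaceTempView("tempView")
    val e = intercept[AnalysisException] {
      sql("ANALYZE TABLE tempView COMPUTE STATISTICS FOR COLUMNS id")
    }
    assert(e.message.contains("temporary"))
  }
}
```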
[GitHub] spark issue #15194: New feature for structured streaming: add http stream si...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15194 Can one of the admins verify this patch?
[GitHub] spark issue #15182: [SPARK-17625] [SQL] set expectedOutputAttributes when co...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/15182 LGTM
[GitHub] spark pull request #15194: New feature for structured streaming: add http st...
GitHub user zhangxinyu1 opened a pull request: https://github.com/apache/spark/pull/15194 New feature for structured streaming: add http stream sink ## What changes were proposed in this pull request? Add an http stream sink for Structured Streaming. Streaming query results can be sent to an http server through http POST requests. ## How was this patch tested? Use the [quick-example](http://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#quick-example) and configure DataStreamWriter with .format("http").option("url", httpUrl) You can merge this pull request into a Git repository by running: $ git pull https://github.com/zhangxinyu1/spark feature-for-structed-streaming-add-http-sink Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/15194.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #15194 commit 87c48c7ed284b95a27e5a6c7f59ee836a95bb588 Author: zhangxinyu1 <342689...@qq.com> Date: 2016-09-21T07:00:39Z add feature: streaming query results can be output to http server commit 489f629783768bef1024de55367c67c26c7192d0 Author: zhangxinyu1 <342689...@qq.com> Date: 2016-09-22T04:09:56Z new feature for structed streaming: http sink commit f6eca02c4a44a65e012bec8c294b861de9c19560 Author: zhangxinyu1 <342689...@qq.com> Date: 2016-09-22T04:15:35Z new feature for structed streaming: http sink commit 96f17b1397d5858a4ce709691b632852b02682e2 Author: zhangxinyu1 <342689...@qq.com> Date: 2016-09-22T04:25:03Z new feature for structed streaming: add http sink
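The usage the PR description mentions can be sketched as follows. This is hedged: the `"http"` format is what this PR proposes and does not exist in stock Spark, and `httpUrl` is a placeholder endpoint, not a real service.

```scala
import org.apache.spark.sql.SparkSession

object HttpSinkSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("http-sink").getOrCreate()
    import spark.implicits._

    // Word count from the Structured Streaming quick example linked above.
    val lines = spark.readStream.format("socket")
      .option("host", "localhost").option("port", 9999).load()
    val wordCounts = lines.as[String].flatMap(_.split(" ")).groupBy("value").count()

    // Sink each result set to an HTTP endpoint via POST, as the PR proposes.
    // "http" is the format this PR would register; httpUrl is a placeholder.
    val httpUrl = "http://localhost:8080/ingest"
    val query = wordCounts.writeStream
      .outputMode("complete")
      .format("http")
      .option("url", httpUrl)
      .start()
    query.awaitTermination()
  }
}
```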
[GitHub] spark pull request #15090: [SPARK-17073] [SQL] generate column-level statist...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/15090#discussion_r79975372 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/AnalyzeColumnCommand.scala ---
```diff
@@ -0,0 +1,189 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.command
+
+import scala.collection.mutable
+
+import org.apache.spark.sql._
+import org.apache.spark.sql.catalyst.{InternalRow, TableIdentifier}
+import org.apache.spark.sql.catalyst.analysis.EliminateSubqueryAliases
+import org.apache.spark.sql.catalyst.catalog.{CatalogRelation, CatalogTable}
+import org.apache.spark.sql.catalyst.expressions._
+import org.apache.spark.sql.catalyst.expressions.aggregate._
+import org.apache.spark.sql.catalyst.plans.logical.{Aggregate, ColumnStats, LogicalPlan, Statistics}
+import org.apache.spark.sql.execution.datasources.LogicalRelation
+import org.apache.spark.sql.types._
+
+
+/**
+ * Analyzes the given columns of the given table in the current database to generate statistics,
+ * which will be used in query optimizations.
+ */
+case class AnalyzeColumnCommand(
+    tableIdent: TableIdentifier,
+    columnNames: Seq[String]) extends RunnableCommand {
+
+  override def run(sparkSession: SparkSession): Seq[Row] = {
+    val sessionState = sparkSession.sessionState
+    val db = tableIdent.database.getOrElse(sessionState.catalog.getCurrentDatabase)
+    val tableIdentWithDB = TableIdentifier(tableIdent.table, Some(db))
+    val relation = EliminateSubqueryAliases(sessionState.catalog.lookupRelation(tableIdentWithDB))
+
+    relation match {
+      case catalogRel: CatalogRelation =>
+        updateStats(catalogRel.catalogTable,
+          AnalyzeTableCommand.calculateTotalSize(sessionState, catalogRel.catalogTable))
+
+      case logicalRel: LogicalRelation if logicalRel.catalogTable.isDefined =>
+        updateStats(logicalRel.catalogTable.get, logicalRel.relation.sizeInBytes)
+
+      case otherRelation =>
+        throw new AnalysisException("ANALYZE TABLE is not supported for " +
+          s"${otherRelation.nodeName}.")
+    }
+
+    def updateStats(catalogTable: CatalogTable, newTotalSize: Long): Unit = {
+      val (rowCount, columnStats) = computeColStats(sparkSession, relation)
+      val statistics = Statistics(
+        sizeInBytes = newTotalSize,
+        rowCount = Some(rowCount),
+        colStats = columnStats ++ catalogTable.stats.map(_.colStats).getOrElse(Map()))
+      sessionState.catalog.alterTable(catalogTable.copy(stats = Some(statistics)))
+      // Refresh the cached data source table in the catalog.
+      sessionState.catalog.refreshTable(tableIdentWithDB)
+    }
+
+    Seq.empty[Row]
+  }
+
+  def computeColStats(
+      sparkSession: SparkSession,
+      relation: LogicalPlan): (Long, Map[String, ColumnStats]) = {
+
+    // check correctness of column names
+    val attributesToAnalyze = mutable.MutableList[Attribute]()
+    val caseSensitive = sparkSession.sessionState.conf.caseSensitiveAnalysis
+    columnNames.foreach { col =>
+      val exprOption = relation.output.find { attr =>
+        if (caseSensitive) attr.name == col else attr.name.equalsIgnoreCase(col)
+      }
+      val expr = exprOption.getOrElse(throw new AnalysisException(s"Invalid column name: $col."))
+      // do deduplication
+      if (!attributesToAnalyze.contains(expr)) {
```
--- End diff -- Deduplication lacks case sensitivity handling.
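Deduplication that accounts for case sensitivity might look like this sketch. It is hedged: the helper name is hypothetical, and it only mirrors the case-sensitive/insensitive comparison the diff uses for resolution.

```scala
// Sketch: deduplicate requested column names with the same comparison rule
// used for resolution, so "c1" and "C1" collapse to one entry when the
// analysis is case-insensitive. Helper name is illustrative, not Spark's.
def dedupColumnNames(columnNames: Seq[String], caseSensitive: Boolean): Seq[String] = {
  val seen = scala.collection.mutable.LinkedHashMap.empty[String, String]
  columnNames.foreach { col =>
    // Normalize the key when case-insensitive; keep the first spelling seen.
    val key = if (caseSensitive) col else col.toLowerCase
    if (!seen.contains(key)) seen(key) = col
  }
  seen.values.toSeq
}
```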
[GitHub] spark issue #14035: [SPARK-16356][ML] Add testImplicits for ML unit tests an...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14035 **[Test build #65758 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65758/consoleFull)** for PR 14035 at commit [`13b1a67`](https://github.com/apache/spark/commit/13b1a6751902493e458af162b222aebf879d41da).
[GitHub] spark pull request #15090: [SPARK-17073] [SQL] generate column-level statist...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/15090#discussion_r79975005 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/StatisticsTest.scala ---
```diff
@@ -0,0 +1,93 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql
+
+import org.apache.spark.sql.catalyst.TableIdentifier
+import org.apache.spark.sql.catalyst.plans.logical.{ColumnStats, Statistics}
+import org.apache.spark.sql.execution.command.AnalyzeColumnCommand
+import org.apache.spark.sql.execution.datasources.LogicalRelation
+import org.apache.spark.sql.test.SharedSQLContext
+import org.apache.spark.sql.types._
+
+trait StatisticsTest extends QueryTest with SharedSQLContext {
+
+  def checkColStats(
+      df: DataFrame,
+      expectedColStatsSeq: Seq[(String, ColumnStats)]): Unit = {
+    val table = "tbl"
+    withTable(table) {
+      df.write.format("json").saveAsTable(table)
+      val columns = expectedColStatsSeq.map(_._1)
+      val tableIdent = TableIdentifier(table, Some("default"))
+      val relation = spark.sessionState.catalog.lookupRelation(tableIdent)
+      val columnStats =
+        AnalyzeColumnCommand(tableIdent, columns).computeColStats(spark, relation)._2
+      expectedColStatsSeq.foreach { expected =>
+        assert(columnStats.contains(expected._1))
+        checkColStats(colStats = columnStats(expected._1), expectedColStats = expected._2)
+      }
+    }
+  }
+
+  def checkColStats(colStats: ColumnStats, expectedColStats: ColumnStats): Unit = {
+    assert(colStats.dataType == expectedColStats.dataType)
+    assert(colStats.numNulls == expectedColStats.numNulls)
+    colStats.dataType match {
+      case _: IntegralType | DateType | TimestampType =>
+        assert(colStats.max.map(_.toString.toLong) == expectedColStats.max.map(_.toString.toLong))
+        assert(colStats.min.map(_.toString.toLong) == expectedColStats.min.map(_.toString.toLong))
+      case _: FractionalType =>
+        assert(colStats.max.map(_.toString.toDouble) == expectedColStats
+          .max.map(_.toString.toDouble))
+        assert(colStats.min.map(_.toString.toDouble) == expectedColStats
+          .min.map(_.toString.toDouble))
+      case _ =>
+        // other types don't have max and min stats
+        assert(colStats.max.isEmpty)
+        assert(colStats.min.isEmpty)
+    }
+    colStats.dataType match {
+      case BinaryType | BooleanType =>
+        assert(colStats.ndv.isEmpty)
+      case _ =>
+        // ndv is an approximate value, so we make sure we have the value, and it should be
+        // within 3*SD's of the given rsd.
+        assert(colStats.ndv.get >= 0)
+        if (expectedColStats.ndv.get == 0) {
+          assert(colStats.ndv.get == 0)
+        } else if (expectedColStats.ndv.get > 0) {
+          val rsd = spark.sessionState.conf.ndvMaxError
+          val error = math.abs((colStats.ndv.get / expectedColStats.ndv.get.toDouble) - 1.0d)
+          assert(error <= rsd * 3.0d, "Error should be within 3 std. errors.")
+        }
+    }
+    assert(colStats.avgColLen == expectedColStats.avgColLen)
+    assert(colStats.maxColLen == expectedColStats.maxColLen)
+    assert(colStats.numTrues == expectedColStats.numTrues)
+    assert(colStats.numFalses == expectedColStats.numFalses)
+  }
+
+  def checkTableStats(tableName: String, expectedRowCount: Option[Int]): Option[Statistics] = {
+    val df = sql(s"SELECT * FROM $tableName")
```
--- End diff --
```scala
val df = spark.table(tableName)
```
[GitHub] spark issue #14035: [SPARK-16356][ML] Add testImplicits for ML unit tests an...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14035 **[Test build #65757 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65757/consoleFull)** for PR 14035 at commit [`2cbcabd`](https://github.com/apache/spark/commit/2cbcabdcef32280316db1ede1a22934dacf3cf35).
[GitHub] spark pull request #15090: [SPARK-17073] [SQL] generate column-level statist...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/15090#discussion_r79974658

--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala ---
@@ -473,15 +476,20 @@ private[spark] class HiveExternalCatalog(conf: SparkConf, hadoopConf: Configurat
       }
     }
     // construct Spark's statistics from information in Hive metastore
-    if (catalogTable.properties.contains(STATISTICS_TOTAL_SIZE)) {
-      val totalSize = BigInt(catalogTable.properties.get(STATISTICS_TOTAL_SIZE).get)
-      // TODO: we will compute "estimatedSize" when we have column stats:
-      // average size of row * number of rows
+    if (catalogTable.properties.filterKeys(_.startsWith(STATISTICS_PREFIX)).nonEmpty) {
+      val colStatsProps = catalogTable.properties
+        .filterKeys(_.startsWith(STATISTICS_BASIC_COL_STATS_PREFIX))
+        .map { case (k, v) => (k.replace(STATISTICS_BASIC_COL_STATS_PREFIX, ""), v)}

--- End diff --

Add a space between `)` and `}`
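The hunk above selects table properties under a statistics prefix and strips that prefix to recover per-column keys. A hedged, self-contained sketch of that map transformation; the prefix string below is made up for the example (Spark keeps the real `STATISTICS_*` constants private to `HiveExternalCatalog`):

```scala
// Illustrative sketch of the prefix-filter-and-strip step in the hunk above.
// The Prefix value is invented for this example.
object ColStatsProps {
  val Prefix = "spark.sql.statistics.colStats."

  def extract(props: Map[String, String]): Map[String, String] =
    props.filterKeys(_.startsWith(Prefix))
      .map { case (k, v) => (k.stripPrefix(Prefix), v) } // space before `}` per the review
      .toMap
}
```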
[GitHub] spark pull request #15090: [SPARK-17073] [SQL] generate column-level statist...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/15090#discussion_r79974623

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/AnalyzeColumnCommand.scala ---
@@ -0,0 +1,189 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.command
+
+import scala.collection.mutable
+
+import org.apache.spark.sql._
+import org.apache.spark.sql.catalyst.{InternalRow, TableIdentifier}
+import org.apache.spark.sql.catalyst.analysis.EliminateSubqueryAliases
+import org.apache.spark.sql.catalyst.catalog.{CatalogRelation, CatalogTable}
+import org.apache.spark.sql.catalyst.expressions._
+import org.apache.spark.sql.catalyst.expressions.aggregate._
+import org.apache.spark.sql.catalyst.plans.logical.{Aggregate, ColumnStats, LogicalPlan, Statistics}
+import org.apache.spark.sql.execution.datasources.LogicalRelation
+import org.apache.spark.sql.types._
+
+
+/**
+ * Analyzes the given columns of the given table in the current database to generate statistics,
+ * which will be used in query optimizations.
+ */
+case class AnalyzeColumnCommand(
+    tableIdent: TableIdentifier,
+    columnNames: Seq[String]) extends RunnableCommand {
+
+  override def run(sparkSession: SparkSession): Seq[Row] = {
+    val sessionState = sparkSession.sessionState
+    val db = tableIdent.database.getOrElse(sessionState.catalog.getCurrentDatabase)
+    val tableIdentWithDB = TableIdentifier(tableIdent.table, Some(db))
+    val relation = EliminateSubqueryAliases(sessionState.catalog.lookupRelation(tableIdentWithDB))
+
+    relation match {
+      case catalogRel: CatalogRelation =>
+        updateStats(catalogRel.catalogTable,
+          AnalyzeTableCommand.calculateTotalSize(sessionState, catalogRel.catalogTable))
+
+      case logicalRel: LogicalRelation if logicalRel.catalogTable.isDefined =>
+        updateStats(logicalRel.catalogTable.get, logicalRel.relation.sizeInBytes)
+
+      case otherRelation =>
+        throw new AnalysisException("ANALYZE TABLE is not supported for " +
+          s"${otherRelation.nodeName}.")
+    }
+
+    def updateStats(catalogTable: CatalogTable, newTotalSize: Long): Unit = {
+      val (rowCount, columnStats) = computeColStats(sparkSession, relation)
+      val statistics = Statistics(
+        sizeInBytes = newTotalSize,
+        rowCount = Some(rowCount),
+        colStats = columnStats ++ catalogTable.stats.map(_.colStats).getOrElse(Map()))
+      sessionState.catalog.alterTable(catalogTable.copy(stats = Some(statistics)))
+      // Refresh the cached data source table in the catalog.
+      sessionState.catalog.refreshTable(tableIdentWithDB)
+    }
+
+    Seq.empty[Row]
+  }
+
+  def computeColStats(
+      sparkSession: SparkSession,
+      relation: LogicalPlan): (Long, Map[String, ColumnStats]) = {
+
+    // check correctness of column names
+    val attributesToAnalyze = mutable.MutableList[Attribute]()
+    val caseSensitive = sparkSession.sessionState.conf.caseSensitiveAnalysis
+    columnNames.foreach { col =>
+      val exprOption = relation.output.find { attr =>
+        if (caseSensitive) attr.name == col else attr.name.equalsIgnoreCase(col)
+      }
+      val expr = exprOption.getOrElse(throw new AnalysisException(s"Invalid column name: $col."))
+      // do deduplication
+      if (!attributesToAnalyze.contains(expr)) {
+        attributesToAnalyze += expr
+      }
+    }
+
+    // Collect statistics per column.
+    // The first element in the result will be the overall row count, the following elements
+    // will be structs containing all column stats.
+    // The layout of each struct follows the layout of the ColumnStats.
+    val ndvMaxErr = sparkSession.sessionState.conf.ndvMaxError
+    val expressions =
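The `computeColStats` fragment above resolves each requested column name against the plan's output, honoring case sensitivity, and de-duplicates repeats before aggregating. A minimal model of that resolution loop, with attributes reduced to plain strings (the real code works on Catalyst `Attribute` objects; all names here are illustrative):

```scala
// Minimal model of the column-resolution loop in computeColStats above.
// Attributes are modeled as plain strings for the sake of the example.
object ColumnResolution {
  def resolve(
      columnNames: Seq[String],
      output: Seq[String],
      caseSensitive: Boolean): Seq[String] = {
    // LinkedHashSet de-duplicates while preserving first-seen order.
    val resolved = scala.collection.mutable.LinkedHashSet.empty[String]
    columnNames.foreach { col =>
      val attr = output
        .find(a => if (caseSensitive) a == col else a.equalsIgnoreCase(col))
        .getOrElse(throw new IllegalArgumentException(s"Invalid column name: $col."))
      resolved += attr
    }
    resolved.toSeq
  }
}
```

Note that under case-insensitive analysis, `"A"` and `"a"` resolve to the same attribute and are counted once, which matches the "do deduplication" step in the diff.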
[GitHub] spark issue #15182: [SPARK-17625] [SQL] set expectedOutputAttributes when co...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/15182 LGTM pending Jenkins
[GitHub] spark issue #13512: [SPARK-15769][SQL] Add Encoder for input type to Aggrega...
Github user koertkuipers commented on the issue: https://github.com/apache/spark/pull/13512 @cloud-fan i thought about this a little more, and my suggested changes to the Aggregator api does not allow one to use a different encoder when applying a typed operation on Dataset. so i do not think it is dangerous as such. it does enable usage within the untyped grouping, which is where type conversions are already customary anyhow. its not more dangerous than say using a udaf in a DataFrame.
[GitHub] spark issue #15182: [SPARK-17625] [SQL] set expectedOutputAttributes when co...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15182 **[Test build #65756 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65756/consoleFull)** for PR 15182 at commit [`e2c3b9d`](https://github.com/apache/spark/commit/e2c3b9df0431885efbc9575beb7735590a77cf2f).
[GitHub] spark pull request #15154: [SPARK-17494] [SQL] changePrecision() on compact ...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/15154
[GitHub] spark issue #15182: [SPARK-17625] [SQL] set expectedOutputAttributes when co...
Github user wzhfy commented on the issue: https://github.com/apache/spark/pull/15182 @cloud-fan Ok.
[GitHub] spark pull request #15188: [SPARK-17627] Mark Streaming Providers Experiment...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/15188
[GitHub] spark issue #15154: [SPARK-17494] [SQL] changePrecision() on compact decimal...
Github user rxin commented on the issue: https://github.com/apache/spark/pull/15154 Merging in master/2.0.
[GitHub] spark issue #15188: [SPARK-17627] Mark Streaming Providers Experimental
Github user rxin commented on the issue: https://github.com/apache/spark/pull/15188 Merging in master/2.0.
[GitHub] spark issue #14537: [SPARK-16948][SQL] Use metastore schema instead of infer...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14537 **[Test build #65755 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65755/consoleFull)** for PR 14537 at commit [`fa71370`](https://github.com/apache/spark/commit/fa713700f853e78053ac0be5db49250951aaa715).
[GitHub] spark issue #15090: [SPARK-17073] [SQL] generate column-level statistics
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15090 **[Test build #65754 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65754/consoleFull)** for PR 15090 at commit [`5f6b581`](https://github.com/apache/spark/commit/5f6b5817d59c1b6bb48563357f625521e7c56236).
[GitHub] spark issue #15005: [SPARK-17421] [DOCS] Documenting the current treatment o...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15005 **[Test build #3286 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3286/consoleFull)** for PR 15005 at commit [`53a09cd`](https://github.com/apache/spark/commit/53a09cd5783d55048b2cf7579cf53ccc76bdf3d7). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #14537: [SPARK-16948][SQL] Use metastore schema instead o...
Github user rajeshbalamohan commented on a diff in the pull request: https://github.com/apache/spark/pull/14537#discussion_r79972251

--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveMetastoreCatalog.scala ---
@@ -237,21 +237,27 @@ private[hive] class HiveMetastoreCatalog(sparkSession: SparkSession) extends Log
           new Path(metastoreRelation.catalogTable.storage.locationUri.get),
           partitionSpec)

-        val inferredSchema = if (fileType.equals("parquet")) {
-          val inferredSchema =
-            defaultSource.inferSchema(sparkSession, options, fileCatalog.allFiles())
-          inferredSchema.map { inferred =>
-            ParquetFileFormat.mergeMetastoreParquetSchema(metastoreSchema, inferred)
-          }.getOrElse(metastoreSchema)
-        } else {
-          defaultSource.inferSchema(sparkSession, options, fileCatalog.allFiles()).get
+        val schema = fileType match {
+          case "parquet" =>
+            val inferredSchema =
+              defaultSource.inferSchema(sparkSession, options, fileCatalog.allFiles())
+
+            // For Parquet, get correct schema by merging Metastore schema data types

--- End diff --

Sure. Will change to return metastoreSchema for parq as well.
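The review exchange above settles on returning the metastore schema for Parquet rather than merging it with the inferred one. A toy sketch of that per-format dispatch, with schemas reduced to opaque strings (the real code works with `StructType` and a `FileFormat` source; every name below is invented for illustration):

```scala
// Toy model of the file-type dispatch discussed above. Schemas are opaque
// strings here, not StructType; SchemaChoice is not a Spark class.
object SchemaChoice {
  def choose(fileType: String, metastoreSchema: String, inferred: Option[String]): String =
    fileType match {
      case "parquet" =>
        // Per the review, prefer the metastore schema for Parquet as well.
        metastoreSchema
      case _ =>
        // Other formats fall back to the inferred schema when one exists.
        inferred.getOrElse(metastoreSchema)
    }
}
```

The design point is that the metastore is the source of truth for declared column types, while inference is only a fallback when no reliable declaration exists.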
[GitHub] spark issue #15005: [SPARK-17421] [DOCS] Documenting the current treatment o...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15005 **[Test build #3286 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3286/consoleFull)** for PR 15005 at commit [`53a09cd`](https://github.com/apache/spark/commit/53a09cd5783d55048b2cf7579cf53ccc76bdf3d7).
[GitHub] spark issue #15191: [SPARK-17628][Streaming][Examples] change name "Streamin...
Github user keypointt commented on the issue: https://github.com/apache/spark/pull/15191 oh I see...sorry...I'll close this one
[GitHub] spark pull request #15191: [SPARK-17628][Streaming][Examples] change name "S...
Github user keypointt closed the pull request at: https://github.com/apache/spark/pull/15191
[GitHub] spark issue #15190: [SPARK-17620][SQL] hive.default.fileformat=orc does not ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15190 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65752/ Test PASSed.
[GitHub] spark issue #15190: [SPARK-17620][SQL] hive.default.fileformat=orc does not ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15190 Merged build finished. Test PASSed.
[GitHub] spark issue #15193: [SQL]RowBasedKeyValueBatch reuse valueRow too
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15193 Can one of the admins verify this patch?
[GitHub] spark issue #15190: [SPARK-17620][SQL] hive.default.fileformat=orc does not ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15190 **[Test build #65752 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65752/consoleFull)** for PR 15190 at commit [`f2b93de`](https://github.com/apache/spark/commit/f2b93de629f378ca99f8d3086ade8dc05b41a912). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #15139: [SPARK-17315][Follow-up][SparkR][ML] Fix print of...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/15139
[GitHub] spark issue #15193: [SQL]RowBasedKeyValueBatch reuse valueRow too
Github user yaooqinn commented on the issue: https://github.com/apache/spark/pull/15193 cc @ooq @sameeragarwal @davies is it right and necessary?
[GitHub] spark issue #15192: [SPARK-14536] [SQL] fix to handle null value in array ty...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15192 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65751/ Test PASSed.
[GitHub] spark issue #15139: [SPARK-17315][Follow-up][SparkR][ML] Fix print of Kolmog...
Github user yanboliang commented on the issue: https://github.com/apache/spark/pull/15139 I will merge this into master. If anyone has more comments, I can address them at follow-up work. Thanks for your review. @felixcheung
[GitHub] spark issue #15192: [SPARK-14536] [SQL] fix to handle null value in array ty...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15192 Merged build finished. Test PASSed.
[GitHub] spark pull request #15193: [SQL]RowBasedKeyValueBatch reuse valueRow too
GitHub user yaooqinn opened a pull request: https://github.com/apache/spark/pull/15193

[SQL]RowBasedKeyValueBatch reuse valueRow too

## What changes were proposed in this pull request?

reuse the cached valueRow in RowBasedKeyValueBatch

## How was this patch tested?

existing ut

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/yaooqinn/spark reuse-value

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/15193.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #15193

commit 0f60e107904fa4d0e92185bd9fae214ee70a1a11
Author: Kent Yao
Date: 2016-09-22T02:59:23Z

    reuse valueRow too
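The PR above proposes caching a single mutable value row and re-pointing it at each entry rather than allocating a fresh object per lookup. An invented miniature of that reuse pattern (this is not Spark's actual `RowBasedKeyValueBatch` API; every name below is made up for illustration):

```scala
// Invented miniature of the row-reuse idea: one mutable holder is cached
// and re-pointed at each entry, so repeated lookups allocate nothing.
final class ValueHolder {
  private var data: Array[Int] = Array.empty
  private var offset: Int = 0
  // Re-point this holder at a new backing array and position, returning itself.
  def pointTo(d: Array[Int], o: Int): ValueHolder = { data = d; offset = o; this }
  def get: Int = data(offset)
}

final class TinyBatch(values: Array[Int]) {
  private val reusedValue = new ValueHolder // cached once, reused on every access
  def valueAt(i: Int): ValueHolder = reusedValue.pointTo(values, i)
}
```

The trade-off is the usual one for reused rows: callers must copy the holder if they retain it across calls, since the next `valueAt` overwrites its contents.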
[GitHub] spark issue #15191: [SPARK-17628][Streaming][Examples] change name "Streamin...
Github user srowen commented on the issue: https://github.com/apache/spark/pull/15191 "Foobars" is a common name in Java / Scala for "static methods related to Foobar objects". I think the current name is fine. It's not really an API anyway, just a component of an example.
[GitHub] spark issue #15192: [SPARK-14536] [SQL] fix to handle null value in array ty...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15192 **[Test build #65751 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65751/consoleFull)** for PR 15192 at commit [`9eb40db`](https://github.com/apache/spark/commit/9eb40dbcdb0894e699a38e6dc4f44dc97408f63c). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #14988: [SPARK-17425][SQL] Override sameResult in HiveTab...
Github user watermen commented on a diff in the pull request: https://github.com/apache/spark/pull/14988#discussion_r79970830

--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/HiveTableScanExec.scala ---
@@ -164,4 +164,19 @@ case class HiveTableScanExec(
   }

   override def output: Seq[Attribute] = attributes
+
+  override def sameResult(plan: SparkPlan): Boolean = plan match {
+    case other: HiveTableScanExec =>
+      val thisPredicates = partitionPruningPred.map(cleanExpression)
+      val otherPredicates = other.partitionPruningPred.map(cleanExpression)
+
+      val result = relation.sameResult(other.relation) &&
+        output.length == other.output.length &&
+        output.zip(other.output)
+          .forall(p => p._1.name == p._2.name && p._1.dataType == p._2.dataType) &&

--- End diff --

@cloud-fan
[GitHub] spark pull request #15131: [SPARK-17577][SparkR][Core] SparkR support add fi...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/15131
[GitHub] spark issue #12601: [SPARK-14525][SQL] Make DataFrameWrite.save work for jdb...
Github user JustinPihony commented on the issue: https://github.com/apache/spark/pull/12601 @srowen Ping. I don't think there is anything on my plate. This should be mergeable
[GitHub] spark issue #15131: [SPARK-17577][SparkR][Core] SparkR support add files to ...
Github user yanboliang commented on the issue: https://github.com/apache/spark/pull/15131 I will merge this into master. If anyone has more comments, I can address them at follow up work. Thanks for your review. @felixcheung @HyukjinKwon @shivaram
[GitHub] spark issue #15090: [SPARK-17073] [SQL] generate column-level statistics
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15090 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65749/ Test PASSed.
[GitHub] spark issue #15090: [SPARK-17073] [SQL] generate column-level statistics
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15090 Merged build finished. Test PASSed.
[GitHub] spark issue #15090: [SPARK-17073] [SQL] generate column-level statistics
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15090 **[Test build #65749 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65749/consoleFull)** for PR 15090 at commit [`ec02b2a`](https://github.com/apache/spark/commit/ec02b2a8b7bfb9c10d4d47e2678a44ec0f2f8af8). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #15090: [SPARK-17073] [SQL] generate column-level statistics
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15090 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65746/ Test PASSed.
[GitHub] spark issue #15090: [SPARK-17073] [SQL] generate column-level statistics
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15090 Merged build finished. Test PASSed.
[GitHub] spark issue #14851: [SPARK-17281][ML][MLLib] Add treeAggregateDepth paramete...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14851 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65753/ Test PASSed.
[GitHub] spark issue #14851: [SPARK-17281][ML][MLLib] Add treeAggregateDepth paramete...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14851 Merged build finished. Test PASSed.
[GitHub] spark issue #14851: [SPARK-17281][ML][MLLib] Add treeAggregateDepth paramete...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14851 **[Test build #65753 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65753/consoleFull)** for PR 14851 at commit [`378079d`](https://github.com/apache/spark/commit/378079d4778b4902b3d6956c504e22555aa2884c). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #15090: [SPARK-17073] [SQL] generate column-level statistics
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15090 **[Test build #65746 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65746/consoleFull)** for PR 15090 at commit [`ec02b2a`](https://github.com/apache/spark/commit/ec02b2a8b7bfb9c10d4d47e2678a44ec0f2f8af8). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #14124: [SPARK-16472][SQL] Inconsistent nullability in schema af...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14124 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65747/ Test PASSed.
[GitHub] spark issue #14124: [SPARK-16472][SQL] Inconsistent nullability in schema af...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14124 Merged build finished. Test PASSed.
[GitHub] spark issue #14124: [SPARK-16472][SQL] Inconsistent nullability in schema af...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14124 **[Test build #65747 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65747/consoleFull)** for PR 14124 at commit [`0bc06c6`](https://github.com/apache/spark/commit/0bc06c6e3e931a5f317e043aa5eeea97083b9860). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request #15090: [SPARK-17073] [SQL] generate column-level statist...
Github user wzhfy commented on a diff in the pull request: https://github.com/apache/spark/pull/15090#discussion_r79968621

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/StatisticsColumnSuite.scala ---

```scala
/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements.  See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License.  You may obtain a copy of the License at
 *
 *    http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package org.apache.spark.sql

import java.sql.{Date, Timestamp}

import org.apache.spark.sql.catalyst.parser.ParseException
import org.apache.spark.sql.catalyst.plans.logical.ColumnStats
import org.apache.spark.sql.catalyst.util.DateTimeUtils
import org.apache.spark.sql.execution.command.AnalyzeColumnCommand
import org.apache.spark.sql.types._

class StatisticsColumnSuite extends StatisticsTest {
  import testImplicits._

  test("parse analyze column commands") {
    def assertAnalyzeColumnCommand(analyzeCommand: String, c: Class[_]) {
      val parsed = spark.sessionState.sqlParser.parsePlan(analyzeCommand)
      val operators = parsed.collect {
        case a: AnalyzeColumnCommand => a
        case o => o
      }
      assert(operators.size == 1)
      if (operators.head.getClass != c) {
        fail(
          s"""$analyzeCommand expected command: $c, but got ${operators.head}
             |parsed command:
             |$parsed
           """.stripMargin)
      }
    }

    val table = "table"
    assertAnalyzeColumnCommand(
      s"ANALYZE TABLE $table COMPUTE STATISTICS FOR COLUMNS key, value",
      classOf[AnalyzeColumnCommand])

    intercept[ParseException] {
      sql(s"ANALYZE TABLE $table COMPUTE STATISTICS FOR COLUMNS")
    }
  }

  test("check correctness of columns") {
    val table = "tbl"
    val colName1 = "abc"
    val colName2 = "x.yz"
    val quotedColName2 = s"`$colName2`"
    withTable(table) {
      sql(s"CREATE TABLE $table ($colName1 int, $quotedColName2 string) USING PARQUET")

      val invalidColError = intercept[AnalysisException] {
        sql(s"ANALYZE TABLE $table COMPUTE STATISTICS FOR COLUMNS key")
      }
      assert(invalidColError.message == "Invalid column name: key.")

      withSQLConf("spark.sql.caseSensitive" -> "true") {
        val invalidErr = intercept[AnalysisException] {
          sql(s"ANALYZE TABLE $table COMPUTE STATISTICS FOR COLUMNS ${colName1.toUpperCase}")
        }
        assert(invalidErr.message == s"Invalid column name: ${colName1.toUpperCase}.")
      }

      withSQLConf("spark.sql.caseSensitive" -> "false") {
        val columnsToAnalyze = Seq(colName2.toUpperCase, colName1, colName2)
        val columnStats = spark.sessionState.computeColumnStats(table, columnsToAnalyze)
```
--- End diff --

Thanks!
[GitHub] spark issue #15185: [SPARK-17618] Fix invalid comparisons between UnsafeRow ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15185 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65750/ Test PASSed.
[GitHub] spark issue #15185: [SPARK-17618] Fix invalid comparisons between UnsafeRow ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15185 Merged build finished. Test PASSed.
[GitHub] spark issue #15185: [SPARK-17618] Fix invalid comparisons between UnsafeRow ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15185 **[Test build #65750 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65750/consoleFull)** for PR 15185 at commit [`1319e82`](https://github.com/apache/spark/commit/1319e8281ab3ec14a5ba11fca0261d19b7890ad3). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #14912: [SPARK-17357][SQL] Fix current predicate pushdown
Github user viirya commented on the issue: https://github.com/apache/spark/pull/14912 ping @cloud-fan @hvanhovell Can you review this if you have time? Thanks!
[GitHub] spark issue #14780: [SPARK-17206][SQL] Support ANALYZE TABLE on analyzable t...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/14780 @hvanhovell ok. Thanks!
[GitHub] spark issue #15046: [SPARK-17492] [SQL] Fix Reading Cataloged Data Sources w...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/15046 This is a new issue in Spark 2.1, introduced after we started physically storing the inferred schema in the metastore. BTW, I also ran the test cases on Spark 2.0, and they work well there.
[GitHub] spark issue #14851: [SPARK-17281][ML][MLLib] Add treeAggregateDepth paramete...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14851 **[Test build #65753 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65753/consoleFull)** for PR 14851 at commit [`378079d`](https://github.com/apache/spark/commit/378079d4778b4902b3d6956c504e22555aa2884c).
[GitHub] spark issue #15190: [SPARK-17620][SQL] hive.default.fileformat=orc does not ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15190 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65744/ Test PASSed.
[GitHub] spark issue #15190: [SPARK-17620][SQL] hive.default.fileformat=orc does not ...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15190 Merged build finished. Test PASSed.
[GitHub] spark issue #15190: [SPARK-17620][SQL] hive.default.fileformat=orc does not ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15190 **[Test build #65744 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65744/consoleFull)** for PR 15190 at commit [`f60e760`](https://github.com/apache/spark/commit/f60e760989ff732aa50d4bea3794e1261bc1a0cc). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #15071: [SPARK-17517][SQL]Improve generated Code for BroadcastHa...
Github user yaooqinn commented on the issue: https://github.com/apache/spark/pull/15071 @hvanhovell I think variable-length fields may lead to memory overlap in the `BuildLeft` case, since we reuse the `BufferHolder` to avoid writing the stream side repeatedly. In that case, the holder cannot `grow` properly to keep the left side from overlapping the right side. With `BuildRight` there is no such problem.
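To illustrate the kind of hazard being described, here is a deliberately simplified, hypothetical sketch (not Spark's actual `BufferHolder` from `catalyst.expressions.codegen`) of how resetting a reused growable buffer without accounting for bytes that must be retained lets a later write clobber an earlier row:

```scala
import java.nio.charset.StandardCharsets

// Toy model of a growable row buffer. A real BufferHolder tracks a cursor
// into a backing array and grows the array on demand; the names here are
// illustrative only.
final class SimpleBufferHolder(initialSize: Int = 8) {
  var buffer: Array[Byte] = new Array[Byte](initialSize)
  var cursor: Int = 0

  // Ensure `needed` more bytes fit, doubling the backing array as required.
  def grow(needed: Int): Unit = {
    if (cursor + needed > buffer.length) {
      val newBuffer = new Array[Byte](math.max(buffer.length * 2, cursor + needed))
      System.arraycopy(buffer, 0, newBuffer, 0, cursor)
      buffer = newBuffer
    }
  }

  // Append bytes at the cursor and return the offset they were written at.
  def write(bytes: Array[Byte]): Int = {
    grow(bytes.length)
    val offset = cursor
    System.arraycopy(bytes, 0, buffer, offset, bytes.length)
    cursor += bytes.length
    offset
  }
}

object BufferReuseDemo {
  def main(args: Array[String]): Unit = {
    val holder = new SimpleBufferHolder()
    // Stream-side ("left") row, written once and meant to be reused.
    val leftOffset = holder.write("left".getBytes(StandardCharsets.UTF_8))
    // Buggy reuse: resetting the cursor ignores the retained left bytes,
    // so the next ("right") row is written on top of them.
    holder.cursor = 0
    holder.write("RIGHT".getBytes(StandardCharsets.UTF_8))
    val left = new String(holder.buffer, leftOffset, 4, StandardCharsets.UTF_8)
    println(left) // no longer "left": the right side has overlapped it
  }
}
```

The fix, conceptually, is that reuse must either re-write the retained side or reserve its region before appending new data.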
[GitHub] spark issue #15146: [SPARK-17590][SQL] Analyze CTE definitions at once and a...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/15146 @hvanhovell @cloud-fan Thanks!
[GitHub] spark issue #15190: [SPARK-17620][SQL] hive.default.fileformat=orc does not ...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/15190 Please update the PR description. This is not for `orc` only.
[GitHub] spark pull request #15090: [SPARK-17073] [SQL] generate column-level statist...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/15090#discussion_r79965497

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/StatisticsTest.scala ---

```scala
package org.apache.spark.sql

import org.apache.spark.sql.catalyst.plans.logical.{ColumnStats, Statistics}
import org.apache.spark.sql.execution.datasources.LogicalRelation
import org.apache.spark.sql.test.SharedSQLContext
import org.apache.spark.sql.types._

trait StatisticsTest extends QueryTest with SharedSQLContext {

  def checkColStats(
      df: DataFrame,
      expectedColStatsSeq: Seq[(String, ColumnStats)]): Unit = {
    val table = "tbl"
    withTable(table) {
      df.write.format("json").saveAsTable(table)
      val columns = expectedColStatsSeq.map(_._1)
      val columnStats = spark.sessionState.computeColumnStats(table, columns)
```
--- End diff --

Change this too.
[GitHub] spark pull request #15090: [SPARK-17073] [SQL] generate column-level statist...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/15090#discussion_r79965425

--- Diff: sql/core/src/test/scala/org/apache/spark/sql/StatisticsColumnSuite.scala ---

```scala
      withSQLConf("spark.sql.caseSensitive" -> "false") {
        val columnsToAnalyze = Seq(colName2.toUpperCase, colName1, colName2)
        val columnStats = spark.sessionState.computeColumnStats(table, columnsToAnalyze)
```
--- End diff --

Here, you can just replace it by

```scala
val tableIdent = TableIdentifier(table, Option("default"))
val relation = spark.sessionState.catalog.lookupRelation(tableIdent)
val columnStats =
  AnalyzeColumnCommand(tableIdent, columnsToAnalyze).computeColStats(spark, relation)._2
```
[GitHub] spark pull request #15090: [SPARK-17073] [SQL] generate column-level statist...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/15090#discussion_r79965370

--- Diff: sql/core/src/main/scala/org/apache/spark/sql/internal/SessionState.scala ---

```diff
@@ -186,13 +187,27 @@ private[sql] class SessionState(sparkSession: SparkSession) {
   }
 
   /**
-   * Analyzes the given table in the current database to generate statistics, which will be
+   * Analyzes the given table in the current database to generate table-level statistics, which
+   * will be used in query optimizations.
+   */
+  def analyzeTable(tableIdent: TableIdentifier, noscan: Boolean = true): Unit = {
+    AnalyzeTableCommand(tableIdent, noscan).run(sparkSession)
+  }
+
+  /**
+   * Analyzes the given columns in the table to generate column-level statistics, which will be
    * used in query optimizations.
-   *
-   * Right now, it only supports catalog tables and it only updates the size of a catalog table
-   * in the external catalog.
    */
-  def analyze(tableName: String, noscan: Boolean = true): Unit = {
-    AnalyzeTableCommand(tableName, noscan).run(sparkSession)
+  def analyzeTableColumns(tableIdent: TableIdentifier, columnNames: Seq[String]): Unit = {
+    AnalyzeColumnCommand(tableIdent, columnNames).run(sparkSession)
+  }
+
+  // This api is used for testing.
+  def computeColumnStats(tableName: String, columnNames: Seq[String]): Map[String, ColumnStats] = {
```
--- End diff --

Avoid adding any testing-only API, if possible.
[GitHub] spark issue #15152: [SPARK-17365][Core] Remove/Kill multiple executors toget...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15152 Merged build finished. Test PASSed.
[GitHub] spark issue #15152: [SPARK-17365][Core] Remove/Kill multiple executors toget...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15152 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65740/
[GitHub] spark issue #15190: [SPARK-17620][SQL] hive.default.fileformat=orc does not ...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15190 **[Test build #65752 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65752/consoleFull)** for PR 15190 at commit [`f2b93de`](https://github.com/apache/spark/commit/f2b93de629f378ca99f8d3086ade8dc05b41a912).
[GitHub] spark issue #15152: [SPARK-17365][Core] Remove/Kill multiple executors toget...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15152 **[Test build #65740 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65740/consoleFull)** for PR 15152 at commit [`3d2fac4`](https://github.com/apache/spark/commit/3d2fac45f72dd56e03486bb269baa138cefe4e2e).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #15189: [SPARK-17549][sql] Coalesce cached relation stats in dri...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15189 Merged build finished. Test PASSed.
[GitHub] spark issue #15189: [SPARK-17549][sql] Coalesce cached relation stats in dri...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15189 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65743/
[GitHub] spark pull request #15190: [SPARK-17620][SQL] hive.default.fileformat=orc do...
Github user dilipbiswal commented on a diff in the pull request: https://github.com/apache/spark/pull/15190#discussion_r79964807

--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveDDLCommandSuite.scala ---

```diff
@@ -556,4 +558,32 @@ class HiveDDLCommandSuite extends PlanTest {
     assert(partition2.get.apply("c") == "1" && partition2.get.apply("d") == "2")
   }
 
+  test("Test default fileformat") {
+    withSQLConf("hive.default.fileformat" -> "orc") {
+      val s1 =
+        s"""
+           |CREATE TABLE IF NOT EXISTS fileformat_test (id int)
+         """.stripMargin
+      val (desc, exists) = extractTableDesc(s1)
+      assert(exists)
+      assert(desc.storage.inputFormat == Some("org.apache.hadoop.hive.ql.io.orc.OrcInputFormat"))
+      assert(desc.storage.outputFormat == Some("org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat"))
+      assert(desc.storage.serde == Some("org.apache.hadoop.hive.ql.io.orc.OrcSerde"))
+    }
+
+    withSQLConf("hive.default.fileformat" -> "parquet") {
+      val s1 =
+        s"""
+           |CREATE TABLE IF NOT EXISTS fileformat_test (id int)
+         """.stripMargin
+      val (desc, exists) = extractTableDesc(s1)
```

--- End diff --

@gatorsmile Thanks !! I have updated as per your comments.
[GitHub] spark issue #15189: [SPARK-17549][sql] Coalesce cached relation stats in dri...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/15189 **[Test build #65743 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65743/consoleFull)** for PR 15189 at commit [`5b3a65a`](https://github.com/apache/spark/commit/5b3a65a02210c696206546c43403867bcc9eb077).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `class ColStatsAccumulator(originalOutput: Seq[Attribute])`
[GitHub] spark pull request #15190: [SPARK-17620][SQL] hive.default.fileformat=orc do...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/15190#discussion_r79964525

--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveDDLCommandSuite.scala ---

```diff
@@ -556,4 +558,32 @@ class HiveDDLCommandSuite extends PlanTest {
     assert(partition2.get.apply("c") == "1" && partition2.get.apply("d") == "2")
   }
 
+  test("Test default fileformat") {
+    withSQLConf("hive.default.fileformat" -> "orc") {
+      val s1 =
+        s"""
+           |CREATE TABLE IF NOT EXISTS fileformat_test (id int)
+         """.stripMargin
+      val (desc, exists) = extractTableDesc(s1)
+      assert(exists)
+      assert(desc.storage.inputFormat == Some("org.apache.hadoop.hive.ql.io.orc.OrcInputFormat"))
+      assert(desc.storage.outputFormat == Some("org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat"))
+      assert(desc.storage.serde == Some("org.apache.hadoop.hive.ql.io.orc.OrcSerde"))
+    }
+
+    withSQLConf("hive.default.fileformat" -> "parquet") {
+      val s1 =
+        s"""
+           |CREATE TABLE IF NOT EXISTS fileformat_test (id int)
+         """.stripMargin
+      val (desc, exists) = extractTableDesc(s1)
```

--- End diff --

The same here.
[GitHub] spark pull request #15190: [SPARK-17620][SQL] hive.default.fileformat=orc do...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/15190#discussion_r79964497

--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveDDLCommandSuite.scala ---

```diff
@@ -556,4 +558,32 @@ class HiveDDLCommandSuite extends PlanTest {
     assert(partition2.get.apply("c") == "1" && partition2.get.apply("d") == "2")
   }
 
+  test("Test default fileformat") {
+    withSQLConf("hive.default.fileformat" -> "orc") {
+      val s1 =
+        s"""
+           |CREATE TABLE IF NOT EXISTS fileformat_test (id int)
+         """.stripMargin
+      val (desc, exists) = extractTableDesc(s1)
```

--- End diff --

```Scala
val (desc, exists) = extractTableDesc("CREATE TABLE IF NOT EXISTS fileformat_test (id int)")
```
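The inlined call suggested above also generalizes to a table-driven check that exercises every default file format in one loop instead of duplicating the `withSQLConf` block per format. A minimal self-contained sketch — the `ExpectedStorage` and `DefaultFileFormats` names are hypothetical, and only the `orc` class names are taken from the quoted test:

```scala
// Hypothetical table of expected Hive storage classes, keyed by the value
// of hive.default.fileformat. Only the "orc" entry is copied verbatim from
// the test quoted in the diff above.
final case class ExpectedStorage(inputFormat: String, outputFormat: String, serde: String)

object DefaultFileFormats {
  val byFormat: Map[String, ExpectedStorage] = Map(
    "orc" -> ExpectedStorage(
      "org.apache.hadoop.hive.ql.io.orc.OrcInputFormat",
      "org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat",
      "org.apache.hadoop.hive.ql.io.orc.OrcSerde"))
}
```

Inside `HiveDDLCommandSuite`, one could then iterate `DefaultFileFormats.byFormat`, wrap each entry in `withSQLConf("hive.default.fileformat" -> fmt)`, call `extractTableDesc("CREATE TABLE IF NOT EXISTS fileformat_test (id int)")`, and assert the three storage classes against the entry, so adding a new format is a one-line change to the map.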
[GitHub] spark issue #15154: [SPARK-17494] [SQL] changePrecision() on compact decimal...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/15154 Merged build finished. Test PASSed.