[GitHub] spark pull request #11119: [SPARK-10780][ML] Add an initial model to kmeans

2016-09-13 Thread sethah
Github user sethah commented on a diff in the pull request:

https://github.com/apache/spark/pull/11119#discussion_r78689714
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/KMeans.scala ---
@@ -303,6 +322,29 @@ class KMeans @Since("1.5.0") (
   @Since("1.5.0")
   def setSeed(value: Long): this.type = set(seed, value)
 
+  /** @group setParam */
+  @Since("2.1.0")
+  def setInitialModel(value: KMeansModel): this.type = set(initialModel, value)
+
+  /** @group setParam */
+  @Since("2.1.0")
+  def setInitialModel(value: Model[_]): this.type = {
--- End diff --

As a follow-on, we could eliminate the setter `def setInitialModel(value: Model[_])`. For better documentation, we could leave the param abstract in the `HasInitialModel` trait:

```scala
def hasInitialModel: Param[T]
```

Then, when we add this to new models, we implement the param there. So, in KMeansParams:

```scala
/**
 * Param for KMeansModel to use for warm start.
 * @group param
 */
final val hasInitialModel: Param[KMeansModel] = new Param[KMeansModel](this, "initialModel",
  "A KMeansModel to use for warm start")
```

That way the params are explicit about what type of model is used for the initial model, and the documentation is clearer.
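For illustration, a minimal sketch of how the abstract and concrete pieces could fit together (the trait's type parameter and its bound are my assumption, not from the PR):

```scala
import org.apache.spark.ml.Model
import org.apache.spark.ml.param.{Param, Params}

// The shared trait only declares the param; each algorithm's Params trait
// (e.g. KMeansParams, as above) defines it with a concrete model type and
// a model-specific doc string.
trait HasInitialModel[T <: Model[T]] extends Params {
  def hasInitialModel: Param[T]
}
```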





[GitHub] spark issue #14961: [SPARK-17379] [BUILD] Upgrade netty-all to 4.0.41 final ...

2016-09-13 Thread zsxwing
Github user zsxwing commented on the issue:

https://github.com/apache/spark/pull/14961
  
Confirmed the issue was introduced by https://github.com/netty/netty/commit/d58dec8862e02fc2a98f8dcdb166db4b788be50a#diff-8d83d75ebf8a18cc48bf0a0b1183c188

Adding `System.setProperty("io.netty.maxDirectMemory", "0");` to disable this feature makes the tests pass.
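For anyone hitting this, a minimal sketch of the workaround (the property name comes from the Netty commit above; wiring it into a helper like this is just one option):

```scala
// Must run before any Netty class is loaded, e.g. at the very start of a
// test suite, so Netty reads the property during static initialization.
object NettyWorkaround {
  def disableDirectMemoryAccounting(): Unit = {
    // "0" turns off the direct-memory accounting introduced in 4.0.41
    System.setProperty("io.netty.maxDirectMemory", "0")
  }
}
```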





[GitHub] spark pull request #15090: [SPARK-17073] [SQL] generate column-level statist...

2016-09-13 Thread wzhfy
Github user wzhfy commented on a diff in the pull request:

https://github.com/apache/spark/pull/15090#discussion_r78689452
  
--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/StatisticsSuite.scala ---
@@ -330,14 +332,237 @@ class StatisticsSuite extends QueryTest with TestHiveSingleton with SQLTestUtils
       val dfNoCols = spark.createDataFrame(rddNoCols, StructType(Seq.empty))
       dfNoCols.write.format("json").saveAsTable(table_no_cols)
       sql(s"ANALYZE TABLE $table_no_cols COMPUTE STATISTICS")
-      checkStats(
+      checkTableStats(
         table_no_cols,
         isDataSourceTable = true,
         hasSizeInBytes = true,
         expectedRowCounts = Some(10))
     }
   }
 
+  private def checkColStats(
--- End diff --

I used `checkTableStats` in some cases for column stats, so maybe we should put all the test cases for table/column stats into a separate file?





[GitHub] spark issue #14444: [SPARK-16839] [SQL] redundant aliases after cleanupAlias...

2016-09-13 Thread eyalfa
Github user eyalfa commented on the issue:

https://github.com/apache/spark/pull/14444
  
@HyukjinKwon, thank you very much for your analysis.
If you read the history of this PR, you'd see that at some point @hvanhovell suggested that we completely remove CreateStruct and CreateStructUnsafe and just leave a constructor that creates the named version.
I've modified the catalyst tests that relied on CreateStruct, so I guess the R tests should be modified as well.

One thing I don't really understand, which is probably related to my (complete) lack of R knowledge: in the Scala API, collect returns rows; what does it return in R, and where does the 'named_struct(...)' come from? Is it the column name in the schema?

@hvanhovell: how strong is the contract of assigning a name to an unnamed column? Should we alias the constructed tree with the backward-compatible name (when creating the named struct)?





[GitHub] spark issue #14834: [SPARK-17163][ML] Unified LogisticRegression interface

2016-09-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14834
  
**[Test build #65355 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65355/consoleFull)** for PR 14834 at commit [`f537543`](https://github.com/apache/spark/commit/f53754313e0acf2da6d2f923f716b70c7a49e616).





[GitHub] spark issue #14834: [SPARK-17163][ML] Unified LogisticRegression interface

2016-09-13 Thread sethah
Github user sethah commented on the issue:

https://github.com/apache/spark/pull/14834
  
@dbtsai Thanks for your review. I addressed all but one comment, which I left a follow-up on.





[GitHub] spark issue #14834: [SPARK-17163][ML] Unified LogisticRegression interface

2016-09-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14834
  
**[Test build #65354 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65354/consoleFull)** for PR 14834 at commit [`0c2de2c`](https://github.com/apache/spark/commit/0c2de2cf70e07dd30960cccd422a4ca4ca35b594).





[GitHub] spark issue #15092: [SPARK-17142][SQL] Complex query triggers binding error ...

2016-09-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15092
  
**[Test build #65353 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65353/consoleFull)** for PR 15092 at commit [`dc3b1b2`](https://github.com/apache/spark/commit/dc3b1b288d7340183250acf2765da61497790c64).





[GitHub] spark pull request #15090: [SPARK-17073] [SQL] generate column-level statist...

2016-09-13 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/15090#discussion_r78688956
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/AnalyzeColumnCommand.scala ---
@@ -0,0 +1,209 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.command
+
+import scala.collection.mutable
+
+import org.apache.spark.sql._
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.analysis.EliminateSubqueryAliases
+import org.apache.spark.sql.catalyst.catalog.{CatalogRelation, CatalogTable}
+import org.apache.spark.sql.catalyst.plans.logical.{BasicColStats, Statistics}
+import org.apache.spark.sql.execution.datasources.LogicalRelation
+import org.apache.spark.sql.functions._
+import org.apache.spark.sql.types._
+
+
+/**
+ * Analyzes the given columns of the given table in the current database to generate statistics,
+ * which will be used in query optimizations.
+ */
+case class AnalyzeColumnCommand(
+    tableName: String,
+    columnNames: Seq[String]) extends RunnableCommand {
+
+  override def run(sparkSession: SparkSession): Seq[Row] = {
+    val sessionState = sparkSession.sessionState
+    val tableIdent = sessionState.sqlParser.parseTableIdentifier(tableName)
+    val relation = EliminateSubqueryAliases(sessionState.catalog.lookupRelation(tableIdent))
+
+    // check correctness for column names
+    val attributeNames = relation.output.map(_.name.toLowerCase)
+    val invalidColumns = columnNames.filterNot { col => attributeNames.contains(col.toLowerCase)}
+    if (invalidColumns.nonEmpty) {
+      throw new AnalysisException(s"Invalid columns for table $tableName: $invalidColumns.")
+    }
+
+    relation match {
+      case catalogRel: CatalogRelation =>
+        updateStats(catalogRel.catalogTable,
+          AnalyzeTableCommand.calculateTotalSize(sparkSession, catalogRel.catalogTable))
+
+      case logicalRel: LogicalRelation if logicalRel.catalogTable.isDefined =>
+        updateStats(logicalRel.catalogTable.get, logicalRel.relation.sizeInBytes)
+
+      case otherRelation =>
+        throw new AnalysisException(s"ANALYZE TABLE is not supported for " +
+          s"${otherRelation.nodeName}.")
+    }
+
+    def updateStats(catalogTable: CatalogTable, newTotalSize: Long): Unit = {
+      val lowerCaseNames = columnNames.map(_.toLowerCase)
+      val attributes =
+        relation.output.filter(attr => lowerCaseNames.contains(attr.name.toLowerCase))
+
+      // collect column statistics
+      val aggColumns = mutable.ArrayBuffer[Column](count(Column("*")))
+      attributes.foreach(entry => aggColumns ++= statsAgg(entry.name, entry.dataType))
+      val statsRow: InternalRow = Dataset.ofRows(sparkSession, relation).select(aggColumns: _*)
+        .queryExecution.toRdd.collect().head
+
+      // We also update table-level stats to prevent inconsistency in case of table modification
+      // between the two ANALYZE commands for collecting table-level stats and column-level stats.
+      val rowCount = statsRow.getLong(0)
+      var newStats: Statistics = if (catalogTable.stats.isDefined) {
+        catalogTable.stats.get.copy(sizeInBytes = newTotalSize, rowCount = Some(rowCount))
+      } else {
+        Statistics(sizeInBytes = newTotalSize, rowCount = Some(rowCount))
+      }
+
+      var pos = 1
+      val colStats = mutable.HashMap[String, BasicColStats]()
+      attributes.foreach { attr =>
+        attr.dataType match {
+          case n: NumericType =>
+            colStats += attr.name -> BasicColStats(
+              dataType = attr.dataType,
+              numNulls = rowCount - statsRow.getLong(pos + NumericStatsAgg.numNotNullsIndex),
+              max = Option(statsRow.get(pos + NumericStatsAgg.maxIndex,

[GitHub] spark pull request #15092: [SPARK-17142][SQL] Complex query triggers binding...

2016-09-13 Thread jiangxb1987
GitHub user jiangxb1987 opened a pull request:

https://github.com/apache/spark/pull/15092

[SPARK-17142][SQL] Complex query triggers binding error in HashAggregateExec [BACKPORT 2.0]

## What changes were proposed in this pull request?

This PR backports #14917 to branch-2.0. It fixes an expression optimizer bug caused by the rule `ReorderAssociativeOperator`.

## How was this patch tested?

Added a new test case in `ReorderAssociativeOperatorSuite`.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jiangxb1987/spark rao-branch-2.0

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/15092.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #15092









[GitHub] spark issue #14926: [SPARK-17365][Core] Remove/Kill multiple executors toget...

2016-09-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14926
  
**[Test build #65352 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65352/consoleFull)** for PR 14926 at commit [`202482b`](https://github.com/apache/spark/commit/202482bb2fb38c1a5c164fcbd9a214937fb0b392).





[GitHub] spark pull request #14834: [SPARK-17163][ML] Unified LogisticRegression inte...

2016-09-13 Thread sethah
Github user sethah commented on a diff in the pull request:

https://github.com/apache/spark/pull/14834#discussion_r78688637
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala ---
@@ -595,55 +831,104 @@ class LogisticRegressionModel private[spark] (
    * Predict label for the given feature vector.
    * The behavior of this can be adjusted using [[thresholds]].
    */
-  override protected def predict(features: Vector): Double = {
+  override protected def predict(features: Vector): Double = if (isMultinomial) {
+    super.predict(features)
--- End diff --

Would you mind elaborating? This call ends up calling `predictRaw(features).argmax`, which equates to `margins(features).argmax`. What specialized version are you referring to? Thanks.
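For reference, a simplified sketch of the default path described above (the margin values are made up):

```scala
import org.apache.spark.ml.linalg.{Vector, Vectors}

// predict ends up as predictRaw(features).argmax: the index of the largest
// raw margin is returned as the predicted class label.
def predictFromMargins(margins: Vector): Double = margins.argmax.toDouble

val margins = Vectors.dense(0.1, 2.3, -0.4) // hypothetical per-class margins
assert(predictFromMargins(margins) == 1.0)
```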





[GitHub] spark pull request #15090: [SPARK-17073] [SQL] generate column-level statist...

2016-09-13 Thread wzhfy
Github user wzhfy commented on a diff in the pull request:

https://github.com/apache/spark/pull/15090#discussion_r78688327
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/AnalyzeColumnCommand.scala ---
@@ -0,0 +1,209 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.command
+
+import scala.collection.mutable
+
+import org.apache.spark.sql._
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.analysis.EliminateSubqueryAliases
+import org.apache.spark.sql.catalyst.catalog.{CatalogRelation, CatalogTable}
+import org.apache.spark.sql.catalyst.plans.logical.{BasicColStats, Statistics}
+import org.apache.spark.sql.execution.datasources.LogicalRelation
+import org.apache.spark.sql.functions._
+import org.apache.spark.sql.types._
+
+
+/**
+ * Analyzes the given columns of the given table in the current database to generate statistics,
+ * which will be used in query optimizations.
+ */
+case class AnalyzeColumnCommand(
+    tableName: String,
+    columnNames: Seq[String]) extends RunnableCommand {
+
+  override def run(sparkSession: SparkSession): Seq[Row] = {
+    val sessionState = sparkSession.sessionState
+    val tableIdent = sessionState.sqlParser.parseTableIdentifier(tableName)
+    val relation = EliminateSubqueryAliases(sessionState.catalog.lookupRelation(tableIdent))
+
+    // check correctness for column names
+    val attributeNames = relation.output.map(_.name.toLowerCase)
+    val invalidColumns = columnNames.filterNot { col => attributeNames.contains(col.toLowerCase)}
+    if (invalidColumns.nonEmpty) {
+      throw new AnalysisException(s"Invalid columns for table $tableName: $invalidColumns.")
+    }
+
+    relation match {
+      case catalogRel: CatalogRelation =>
+        updateStats(catalogRel.catalogTable,
+          AnalyzeTableCommand.calculateTotalSize(sparkSession, catalogRel.catalogTable))
+
+      case logicalRel: LogicalRelation if logicalRel.catalogTable.isDefined =>
+        updateStats(logicalRel.catalogTable.get, logicalRel.relation.sizeInBytes)
+
+      case otherRelation =>
+        throw new AnalysisException(s"ANALYZE TABLE is not supported for " +
+          s"${otherRelation.nodeName}.")
+    }
+
+    def updateStats(catalogTable: CatalogTable, newTotalSize: Long): Unit = {
+      val lowerCaseNames = columnNames.map(_.toLowerCase)
+      val attributes =
+        relation.output.filter(attr => lowerCaseNames.contains(attr.name.toLowerCase))
+
+      // collect column statistics
+      val aggColumns = mutable.ArrayBuffer[Column](count(Column("*")))
+      attributes.foreach(entry => aggColumns ++= statsAgg(entry.name, entry.dataType))
+      val statsRow: InternalRow = Dataset.ofRows(sparkSession, relation).select(aggColumns: _*)
+        .queryExecution.toRdd.collect().head
+
+      // We also update table-level stats to prevent inconsistency in case of table modification
+      // between the two ANALYZE commands for collecting table-level stats and column-level stats.
+      val rowCount = statsRow.getLong(0)
+      var newStats: Statistics = if (catalogTable.stats.isDefined) {
+        catalogTable.stats.get.copy(sizeInBytes = newTotalSize, rowCount = Some(rowCount))
+      } else {
+        Statistics(sizeInBytes = newTotalSize, rowCount = Some(rowCount))
+      }
+
+      var pos = 1
+      val colStats = mutable.HashMap[String, BasicColStats]()
+      attributes.foreach { attr =>
+        attr.dataType match {
+          case n: NumericType =>
+            colStats += attr.name -> BasicColStats(
+              dataType = attr.dataType,
+              numNulls = rowCount - statsRow.getLong(pos + NumericStatsAgg.numNotNullsIndex),
+              max = Option(statsRow.get(pos + NumericStatsAgg.maxIndex,

[GitHub] spark pull request #14834: [SPARK-17163][ML] Unified LogisticRegression inte...

2016-09-13 Thread sethah
Github user sethah commented on a diff in the pull request:

https://github.com/apache/spark/pull/14834#discussion_r78688210
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala ---
@@ -508,11 +680,42 @@ object LogisticRegression extends DefaultParamsReadable[LogisticRegression] {
 @Since("1.4.0")
 class LogisticRegressionModel private[spark] (
     @Since("1.4.0") override val uid: String,
-    @Since("2.0.0") val coefficients: Vector,
-    @Since("1.3.0") val intercept: Double)
+    @Since("2.1.0") val coefficientMatrix: Matrix,
+    @Since("2.1.0") val interceptVector: Vector,
+    @Since("1.3.0") override val numClasses: Int,
+    private val isMultinomial: Boolean)
   extends ProbabilisticClassificationModel[Vector, LogisticRegressionModel]
   with LogisticRegressionParams with MLWritable {
 
+  @Since("2.0.0")
+  def coefficients: Vector = if (isMultinomial) {
+    throw new SparkException("Multinomial models contain a matrix of coefficients, use " +
+      "coefficientMatrix instead.")
+  } else {
+    _coefficients
+  }
+
+  // convert to appropriate vector representation without replicating data
+  private lazy val _coefficients: Vector = coefficientMatrix match {
+    case dm: DenseMatrix => Vectors.dense(dm.values)
--- End diff --

In that case, `coefficientMatrix` is a 1 x numFeatures dense matrix; I don't believe it makes any difference whether it's row major or column major.
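A quick illustration of why the layout doesn't matter for a single-row matrix, as a standalone sketch against `ml.linalg` (the values are made up):

```scala
import org.apache.spark.ml.linalg.{DenseMatrix, Vectors}

// With a single row, the row-major and column-major layouts store the same
// flat values array, so Vectors.dense(dm.values) reads identically either way.
val values = Array(0.1, 0.2, 0.3)
val rowMajor = new DenseMatrix(1, 3, values, isTransposed = true)
val colMajor = new DenseMatrix(1, 3, values, isTransposed = false)
assert(Vectors.dense(rowMajor.values) == Vectors.dense(colMajor.values))
```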





[GitHub] spark issue #15091: [Core][Doc]:remove redundant comment

2016-09-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15091
  
**[Test build #65351 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65351/consoleFull)** for PR 15091 at commit [`c8afcb8`](https://github.com/apache/spark/commit/c8afcb8e51c20157ccd965231141b3b47b3130b6).





[GitHub] spark pull request #15091: [Core][Doc]:remove redundant comment

2016-09-13 Thread wangmiao1981
GitHub user wangmiao1981 opened a pull request:

https://github.com/apache/spark/pull/15091

[Core][Doc]:remove redundant comment

## What changes were proposed in this pull request?

In the comment, there is a redundant `the estimated`.

This PR simply removes the redundant text and adjusts the formatting.





You can merge this pull request into a Git repository by running:

$ git pull https://github.com/wangmiao1981/spark comment

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/15091.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #15091


commit f031bd91acf9c98a06afc9b6aa940248e17a8641
Author: wm...@hotmail.com 
Date:   2016-09-14T05:16:32Z

remove redundant comment







[GitHub] spark pull request #15090: [SPARK-17073] [SQL] generate column-level statist...

2016-09-13 Thread wzhfy
Github user wzhfy commented on a diff in the pull request:

https://github.com/apache/spark/pull/15090#discussion_r78687900
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala ---
@@ -98,8 +98,12 @@ class SparkSqlAstBuilder(conf: SQLConf) extends AstBuilder {
         ctx.identifier != null &&
         ctx.identifier.getText.toLowerCase == "noscan") {
       AnalyzeTableCommand(visitTableIdentifier(ctx.tableIdentifier).toString)
-    } else {
+    } else if (ctx.identifierSeq() == null) {
--- End diff --

Yeah, I'm also thinking of doing this. :)





[GitHub] spark pull request #15090: [SPARK-17073] [SQL] generate column-level statist...

2016-09-13 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/15090#discussion_r7868
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala ---
@@ -98,8 +98,12 @@ class SparkSqlAstBuilder(conf: SQLConf) extends AstBuilder {
         ctx.identifier != null &&
         ctx.identifier.getText.toLowerCase == "noscan") {
       AnalyzeTableCommand(visitTableIdentifier(ctx.tableIdentifier).toString)
-    } else {
+    } else if (ctx.identifierSeq() == null) {
--- End diff --

Then, issue an exception here. 





[GitHub] spark issue #14444: [SPARK-16839] [SQL] redundant aliases after cleanupAlias...

2016-09-13 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/14444
  
cc @shivaram Would it be sensible to print the results if the R tests fail?





[GitHub] spark pull request #15090: [SPARK-17073] [SQL] generate column-level statist...

2016-09-13 Thread wzhfy
Github user wzhfy commented on a diff in the pull request:

https://github.com/apache/spark/pull/15090#discussion_r78687294
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala ---
@@ -98,8 +98,12 @@ class SparkSqlAstBuilder(conf: SQLConf) extends AstBuilder {
         ctx.identifier != null &&
         ctx.identifier.getText.toLowerCase == "noscan") {
       AnalyzeTableCommand(visitTableIdentifier(ctx.tableIdentifier).toString)
-    } else {
+    } else if (ctx.identifierSeq() == null) {
--- End diff --

For the analyze column command, users should know exactly what they want to do, so they need to specify the columns; otherwise, we don't compute statistics for columns. AFAIK, Hive will generate stats for all columns in this case, but I think we should not do that. At least, we could provide another command like FOR ALL COLUMNS to do this, as sketched below.
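For context, the syntax shape under discussion in this PR, as a sketch (assumes a `SparkSession` named `spark`; the table and column names are made up):

```scala
// Column-level stats require an explicit column list; analyzing every column
// would need a separate, explicit variant such as "FOR ALL COLUMNS".
spark.sql("ANALYZE TABLE sales COMPUTE STATISTICS FOR COLUMNS price, quantity")
```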





[GitHub] spark pull request #14962: [SPARK-17402][SQL] separate the management of tem...

2016-09-13 Thread yhuai
Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/14962#discussion_r78687128
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/test/DataFrameReaderWriterSuite.scala ---
@@ -457,6 +457,20 @@ class DataFrameReaderWriterSuite extends QueryTest with SharedSQLContext with BeforeAndAfter
     checkAnswer(df2, df)
   }
 
+  test("save as table if a same-name temp view exists") {
+    import SaveMode._
+    for (mode <- Seq(Append, ErrorIfExists, Overwrite, Ignore)) {
+      withTable("same_name") {
+        withTempView("same_name") {
+          spark.range(10).createTempView("same_name")
+          spark.range(20).write.mode(mode).saveAsTable("same_name")
+          checkAnswer(spark.table("same_name"), spark.range(10).toDF())
+          checkAnswer(spark.table("default.same_name"), spark.range(20).toDF())
+        }
+      }
+    }
+  }
--- End diff --

Let's add comments to explain what this test is for, in case we accidentally delete it in the future.





[GitHub] spark pull request #15090: [SPARK-17073] [SQL] generate column-level statist...

2016-09-13 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/15090#discussion_r78687147
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/AnalyzeColumnCommand.scala ---
@@ -0,0 +1,209 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.command
+
+import scala.collection.mutable
+
+import org.apache.spark.sql._
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.analysis.EliminateSubqueryAliases
+import org.apache.spark.sql.catalyst.catalog.{CatalogRelation, CatalogTable}
+import org.apache.spark.sql.catalyst.plans.logical.{BasicColStats, Statistics}
+import org.apache.spark.sql.execution.datasources.LogicalRelation
+import org.apache.spark.sql.functions._
+import org.apache.spark.sql.types._
+
+
+/**
+ * Analyzes the given columns of the given table in the current database to generate statistics,
+ * which will be used in query optimizations.
+ */
+case class AnalyzeColumnCommand(
+    tableName: String,
+    columnNames: Seq[String]) extends RunnableCommand {
+
+  override def run(sparkSession: SparkSession): Seq[Row] = {
+    val sessionState = sparkSession.sessionState
+    val tableIdent = sessionState.sqlParser.parseTableIdentifier(tableName)
+    val relation = EliminateSubqueryAliases(sessionState.catalog.lookupRelation(tableIdent))
+
+    // check correctness for column names
+    val attributeNames = relation.output.map(_.name.toLowerCase)
--- End diff --

Yeah. In Spark, we have the SQLConf `spark.sql.caseSensitive` to control it.
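For example (assuming a `SparkSession` named `spark`):

```scala
// With this set to true, `key` and `KeY` are treated as different columns;
// the default is false, hence the lowercasing in the command above.
spark.conf.set("spark.sql.caseSensitive", "true")
```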





[GitHub] spark pull request #14962: [SPARK-17402][SQL] separate the management of tem...

2016-09-13 Thread yhuai
Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/14962#discussion_r78687123
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/internal/CatalogSuite.scala ---
@@ -322,6 +325,14 @@ class CatalogSuite
     assert(e2.message == "Cannot create a file-based external data source table without path")
   }
 
+  test("dropTempView if a same-name table exists") {
+    withTable("same_name") {
+      sql("CREATE TABLE same_name(i int) USING json")
+      spark.catalog.dropTempView("same_name")
+      assert(spark.sessionState.catalog.tableExists(TableIdentifier("same_name")))
+    }
+  }
--- End diff --

Let's add comments to explain what this test is for, in case we accidentally delete it in the future.





[GitHub] spark pull request #14962: [SPARK-17402][SQL] separate the management of tem...

2016-09-13 Thread yhuai
Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/14962#discussion_r78687075
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala ---
@@ -2661,4 +2661,15 @@ class SQLQuerySuite extends QueryTest with SharedSQLContext {
       data.selectExpr("`part.col1`", "`col.1`"))
     }
   }
+
+  test("CREATE TABLE USING if a same-name temp view exists") {
+    withTable("same_name") {
+      withTempView("same_name") {
+        spark.range(10).createTempView("same_name")
+        sql("CREATE TABLE same_name(i int) USING json")
+        checkAnswer(spark.table("same_name"), spark.range(10).toDF())
+        assert(spark.table("default.same_name").collect().isEmpty)
+      }
+    }
+  }
--- End diff --

Let's add comments to explain what this test is for, in case we accidentally delete it in the future.





[GitHub] spark issue #14981: [SPARK-17418] Remove Kinesis artifacts from Spark releas...

2016-09-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14981
  
Merged build finished. Test FAILed.





[GitHub] spark issue #14981: [SPARK-17418] Remove Kinesis artifacts from Spark releas...

2016-09-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14981
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65345/
Test FAILed.





[GitHub] spark issue #15090: [SPARK-17073] [SQL] generate column-level statistics

2016-09-13 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/15090
  
Like Hive, I think we should implement a built-in function, `compute_stats`. Then the implementation of `AnalyzeColumnCommand` would be much cleaner.
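To make the idea concrete, a hypothetical sketch: `compute_stats` does not exist in Spark today, the shape just mirrors Hive's built-in, and the table/column names are made up.

```scala
// One aggregate call per column would replace the hand-built aggColumns
// buffer in AnalyzeColumnCommand (hypothetical function name and table).
spark.sql("SELECT compute_stats(price), compute_stats(quantity) FROM sales")
```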





[GitHub] spark issue #14981: [SPARK-17418] Remove Kinesis artifacts from Spark releas...

2016-09-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14981
  
**[Test build #65345 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65345/consoleFull)** for PR 14981 at commit [`07eb037`](https://github.com/apache/spark/commit/07eb0372bbb70eb6a2d661dbdb28750020ba500b).
 * This patch **fails PySpark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request #15090: [SPARK-17073] [SQL] generate column-level statist...

2016-09-13 Thread wzhfy
Github user wzhfy commented on a diff in the pull request:

https://github.com/apache/spark/pull/15090#discussion_r78686975
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/AnalyzeColumnCommand.scala ---
@@ -0,0 +1,209 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.command
+
+import scala.collection.mutable
+
+import org.apache.spark.sql._
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.analysis.EliminateSubqueryAliases
+import org.apache.spark.sql.catalyst.catalog.{CatalogRelation, CatalogTable}
+import org.apache.spark.sql.catalyst.plans.logical.{BasicColStats, Statistics}
+import org.apache.spark.sql.execution.datasources.LogicalRelation
+import org.apache.spark.sql.functions._
+import org.apache.spark.sql.types._
+
+
+/**
+ * Analyzes the given columns of the given table in the current database to generate statistics,
+ * which will be used in query optimizations.
+ */
+case class AnalyzeColumnCommand(
+    tableName: String,
+    columnNames: Seq[String]) extends RunnableCommand {
+
+  override def run(sparkSession: SparkSession): Seq[Row] = {
+    val sessionState = sparkSession.sessionState
+    val tableIdent = sessionState.sqlParser.parseTableIdentifier(tableName)
+    val relation = EliminateSubqueryAliases(sessionState.catalog.lookupRelation(tableIdent))
+
+    // check correctness for column names
+    val attributeNames = relation.output.map(_.name.toLowerCase)
--- End diff --

Are `key` and `KeY` different columns?





[GitHub] spark pull request #14962: [SPARK-17402][SQL] separate the management of tem...

2016-09-13 Thread yhuai
Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/14962#discussion_r78686868
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/test/DataFrameReaderWriterSuite.scala ---
@@ -457,6 +457,20 @@ class DataFrameReaderWriterSuite extends QueryTest with SharedSQLContext with BeforeAndAfter
     checkAnswer(df2, df)
   }
 
+  test("save as table if a same-name temp view exists") {
+    import SaveMode._
+    for (mode <- Seq(Append, ErrorIfExists, Overwrite, Ignore)) {
+      withTable("same_name") {
+        withTempView("same_name") {
+          spark.range(10).createTempView("same_name")
+          spark.range(20).write.mode(mode).saveAsTable("same_name")
+          checkAnswer(spark.table("same_name"), spark.range(10).toDF())
+          checkAnswer(spark.table("default.same_name"), spark.range(20).toDF())
+        }
+      }
+    }
+  }
--- End diff --

This is a regression test.





[GitHub] spark pull request #14962: [SPARK-17402][SQL] separate the management of tem...

2016-09-13 Thread yhuai
Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/14962#discussion_r78686835
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/internal/CatalogSuite.scala ---
@@ -322,6 +325,14 @@ class CatalogSuite
     assert(e2.message == "Cannot create a file-based external data source table without path")
   }
 
+  test("dropTempView if a same-name table exists") {
+    withTable("same_name") {
+      sql("CREATE TABLE same_name(i int) USING json")
+      spark.catalog.dropTempView("same_name")
+      assert(spark.sessionState.catalog.tableExists(TableIdentifier("same_name")))
+    }
+  }
--- End diff --

This is a regression test.





[GitHub] spark pull request #14962: [SPARK-17402][SQL] separate the management of tem...

2016-09-13 Thread yhuai
Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/14962#discussion_r78686776
  
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/SQLQuerySuite.scala ---
@@ -2661,4 +2661,15 @@ class SQLQuerySuite extends QueryTest with SharedSQLContext {
       data.selectExpr("`part.col1`", "`col.1`"))
     }
   }
+
+  test("CREATE TABLE USING if a same-name temp view exists") {
+    withTable("same_name") {
+      withTempView("same_name") {
+        spark.range(10).createTempView("same_name")
+        sql("CREATE TABLE same_name(i int) USING json")
+        checkAnswer(spark.table("same_name"), spark.range(10).toDF())
+        assert(spark.table("default.same_name").collect().isEmpty)
+      }
+    }
+  }
--- End diff --

This is a regression test.





[GitHub] spark pull request #15090: [SPARK-17073] [SQL] generate column-level statist...

2016-09-13 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/15090#discussion_r78686462
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/AnalyzeColumnCommand.scala ---
@@ -0,0 +1,209 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.command
+
+import scala.collection.mutable
+
+import org.apache.spark.sql._
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.analysis.EliminateSubqueryAliases
+import org.apache.spark.sql.catalyst.catalog.{CatalogRelation, CatalogTable}
+import org.apache.spark.sql.catalyst.plans.logical.{BasicColStats, Statistics}
+import org.apache.spark.sql.execution.datasources.LogicalRelation
+import org.apache.spark.sql.functions._
+import org.apache.spark.sql.types._
+
+
+/**
+ * Analyzes the given columns of the given table in the current database to generate statistics,
+ * which will be used in query optimizations.
+ */
+case class AnalyzeColumnCommand(
+    tableName: String,
+    columnNames: Seq[String]) extends RunnableCommand {
+
+  override def run(sparkSession: SparkSession): Seq[Row] = {
+    val sessionState = sparkSession.sessionState
+    val tableIdent = sessionState.sqlParser.parseTableIdentifier(tableName)
+    val relation = EliminateSubqueryAliases(sessionState.catalog.lookupRelation(tableIdent))
+
+    // check correctness for column names
+    val attributeNames = relation.output.map(_.name.toLowerCase)
+    val invalidColumns = columnNames.filterNot { col => attributeNames.contains(col.toLowerCase)}
+    if (invalidColumns.nonEmpty) {
+      throw new AnalysisException(s"Invalid columns for table $tableName: $invalidColumns.")
+    }
+
+    relation match {
+      case catalogRel: CatalogRelation =>
+        updateStats(catalogRel.catalogTable,
+          AnalyzeTableCommand.calculateTotalSize(sparkSession, catalogRel.catalogTable))
+
+      case logicalRel: LogicalRelation if logicalRel.catalogTable.isDefined =>
+        updateStats(logicalRel.catalogTable.get, logicalRel.relation.sizeInBytes)
+
+      case otherRelation =>
+        throw new AnalysisException(s"ANALYZE TABLE is not supported for " +
+          s"${otherRelation.nodeName}.")
+    }
+
+    def updateStats(catalogTable: CatalogTable, newTotalSize: Long): Unit = {
+      val lowerCaseNames = columnNames.map(_.toLowerCase)
+      val attributes =
+        relation.output.filter(attr => lowerCaseNames.contains(attr.name.toLowerCase))
+
+      // collect column statistics
+      val aggColumns = mutable.ArrayBuffer[Column](count(Column("*")))
+      attributes.foreach(entry => aggColumns ++= statsAgg(entry.name, entry.dataType))
+      val statsRow: InternalRow = Dataset.ofRows(sparkSession, relation).select(aggColumns: _*)
+        .queryExecution.toRdd.collect().head
+
+      // We also update table-level stats to prevent inconsistency in case of table modification
+      // between the two ANALYZE commands for collecting table-level stats and column-level stats.
+      val rowCount = statsRow.getLong(0)
+      var newStats: Statistics = if (catalogTable.stats.isDefined) {
+        catalogTable.stats.get.copy(sizeInBytes = newTotalSize, rowCount = Some(rowCount))
+      } else {
+        Statistics(sizeInBytes = newTotalSize, rowCount = Some(rowCount))
+      }
+
+      var pos = 1
+      val colStats = mutable.HashMap[String, BasicColStats]()
+      attributes.foreach { attr =>
+        attr.dataType match {
+          case n: NumericType =>
+            colStats += attr.name -> BasicColStats(
+              dataType = attr.dataType,
+              numNulls = rowCount - statsRow.getLong(pos + NumericStatsAgg.numNotNullsIndex),
+              max = Option(statsRow.get(pos + NumericStatsAgg.maxIndex,

[GitHub] spark issue #14444: [SPARK-16839] [SQL] redundant aliases after cleanupAlias...

2016-09-13 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/14444
  
I see. It seems `struct(...)` does not print `struct(...)` but `named_struct(...)`, as specified in `CreateNamedStruct`.

So, the code below:

```scala
scala> spark.range(1).selectExpr("struct(1, 2)").show()
```

prints below:

**Before**

```bash
+------------------+
|struct(col1, col2)|
+------------------+
|             [1,2]|
+------------------+
```

**After**

```bash
+------------------------------+
|named_struct(col1, 1, col2, 2)|
+------------------------------+
|                         [1,2]|
+------------------------------+
```

Would it be necessary to remove both `CreateStruct` and `CreateStructUnsafe`? I think we might have to introduce a common parent if possible.

BTW, the failed R tests are as below:

```r
df <- createDataFrame(list(list(1L, 2L, 3L), list(4L, 5L, 6L)),
                      schema = c("a", "b", "c"))
result <- collect(select(df, struct("a", "c")))
expected <- data.frame(row.names = 1:2)
expected$"struct(a, c)" <- list(listToStruct(list(a = 1L, c = 3L)),
                                listToStruct(list(a = 4L, c = 6L)))
```

```r
> result
  named_struct(a, a, c, c)
1                     1, 3
2                     4, 6
> expected
  struct(a, c)
1         1, 3
2         4, 6
```

```r
result <- collect(select(df, struct(df$a, df$b)))
expected <- data.frame(row.names = 1:2)
expected$"struct(a, b)" <- list(listToStruct(list(a = 1L, b = 2L)),
                                listToStruct(list(a = 4L, b = 5L)))
```

```r
> result
  named_struct(a, a, b, b)
1                     1, 2
2                     4, 5
> expected
  struct(a, b)
1         1, 2
2         4, 5
```

Therefore, it seems we definitely need a test for the names, since these holes were identified here.
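As a user-level aside, a sketch (assuming a `SparkSession` named `spark`): an explicit alias pins the column name down, sidestepping the `struct(...)` vs `named_struct(...)` difference entirely.

```scala
import org.apache.spark.sql.functions.{col, struct}

// Aliasing the struct column gives the schema a stable, backward-compatible
// name regardless of how the expression is printed internally.
val df = spark.range(1).select(struct(col("id")).as("struct(id)"))
df.printSchema() // the top-level column is named "struct(id)"
```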





[GitHub] spark pull request #15090: [SPARK-17073] [SQL] generate column-level statist...

2016-09-13 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/15090#discussion_r78685367
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/AnalyzeColumnCommand.scala ---
@@ -0,0 +1,209 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.command
+
+import scala.collection.mutable
+
+import org.apache.spark.sql._
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.analysis.EliminateSubqueryAliases
+import org.apache.spark.sql.catalyst.catalog.{CatalogRelation, CatalogTable}
+import org.apache.spark.sql.catalyst.plans.logical.{BasicColStats, Statistics}
+import org.apache.spark.sql.execution.datasources.LogicalRelation
+import org.apache.spark.sql.functions._
+import org.apache.spark.sql.types._
+
+
+/**
+ * Analyzes the given columns of the given table in the current database to generate statistics,
+ * which will be used in query optimizations.
+ */
+case class AnalyzeColumnCommand(
+    tableName: String,
+    columnNames: Seq[String]) extends RunnableCommand {
+
+  override def run(sparkSession: SparkSession): Seq[Row] = {
+    val sessionState = sparkSession.sessionState
+    val tableIdent = sessionState.sqlParser.parseTableIdentifier(tableName)
+    val relation = EliminateSubqueryAliases(sessionState.catalog.lookupRelation(tableIdent))
+
+    // check correctness for column names
+    val attributeNames = relation.output.map(_.name.toLowerCase)
+    val invalidColumns = columnNames.filterNot { col => attributeNames.contains(col.toLowerCase)}
+    if (invalidColumns.nonEmpty) {
+      throw new AnalysisException(s"Invalid columns for table $tableName: $invalidColumns.")
+    }
+
+    relation match {
+      case catalogRel: CatalogRelation =>
+        updateStats(catalogRel.catalogTable,
+          AnalyzeTableCommand.calculateTotalSize(sparkSession, catalogRel.catalogTable))
+
+      case logicalRel: LogicalRelation if logicalRel.catalogTable.isDefined =>
+        updateStats(logicalRel.catalogTable.get, logicalRel.relation.sizeInBytes)
+
+      case otherRelation =>
+        throw new AnalysisException(s"ANALYZE TABLE is not supported for " +
--- End diff --

This `s` is useless. Right?
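
For the record, a quick REPL sketch of why: the first literal contains no `$` 
placeholder, so the `s` prefix is a no-op and a plain string literal produces 
the same result.

```scala
// REPL sketch: `s` on a literal with no interpolated variables is redundant.
val relation = "LocalRelation"
val before = s"ANALYZE TABLE is not supported for " + s"$relation."
val after  = "ANALYZE TABLE is not supported for " + s"$relation."
assert(before == after)  // identical strings either way
```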





[GitHub] spark pull request #15090: [SPARK-17073] [SQL] generate column-level statist...

2016-09-13 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/15090#discussion_r78685328
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/command/AnalyzeColumnCommand.scala
 ---
@@ -0,0 +1,209 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.command
+
+import scala.collection.mutable
+
+import org.apache.spark.sql._
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.analysis.EliminateSubqueryAliases
+import org.apache.spark.sql.catalyst.catalog.{CatalogRelation, 
CatalogTable}
+import org.apache.spark.sql.catalyst.plans.logical.{BasicColStats, 
Statistics}
+import org.apache.spark.sql.execution.datasources.LogicalRelation
+import org.apache.spark.sql.functions._
+import org.apache.spark.sql.types._
+
+
+/**
+ * Analyzes the given columns of the given table in the current database 
to generate statistics,
+ * which will be used in query optimizations.
+ */
+case class AnalyzeColumnCommand(
+tableName: String,
+columnNames: Seq[String]) extends RunnableCommand {
+
+  override def run(sparkSession: SparkSession): Seq[Row] = {
+val sessionState = sparkSession.sessionState
+val tableIdent = sessionState.sqlParser.parseTableIdentifier(tableName)
+val relation = 
EliminateSubqueryAliases(sessionState.catalog.lookupRelation(tableIdent))
+
+// check correctness for column names
+val attributeNames = relation.output.map(_.name.toLowerCase)
+val invalidColumns = columnNames.filterNot { col => 
attributeNames.contains(col.toLowerCase)}
--- End diff --

Also verify whether the list contains duplicate columns.
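
A self-contained sketch of such a check, where `columnNames` stands in for the 
command's argument and the lowercase normalization assumes case-insensitive 
analysis:

```scala
// Reject ANALYZE commands whose column list names the same column twice.
val columnNames = Seq("key", "value", "KEY")
val duplicates = columnNames.map(_.toLowerCase)
  .groupBy(identity)
  .collect { case (name, occurrences) if occurrences.size > 1 => name }
if (duplicates.nonEmpty) {
  throw new IllegalArgumentException(s"Duplicate columns: ${duplicates.mkString(", ")}")
}
```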





[GitHub] spark pull request #15090: [SPARK-17073] [SQL] generate column-level statist...

2016-09-13 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/15090#discussion_r78685262
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/command/AnalyzeColumnCommand.scala
 ---
@@ -0,0 +1,209 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.command
+
+import scala.collection.mutable
+
+import org.apache.spark.sql._
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.analysis.EliminateSubqueryAliases
+import org.apache.spark.sql.catalyst.catalog.{CatalogRelation, 
CatalogTable}
+import org.apache.spark.sql.catalyst.plans.logical.{BasicColStats, 
Statistics}
+import org.apache.spark.sql.execution.datasources.LogicalRelation
+import org.apache.spark.sql.functions._
+import org.apache.spark.sql.types._
+
+
+/**
+ * Analyzes the given columns of the given table in the current database 
to generate statistics,
+ * which will be used in query optimizations.
+ */
+case class AnalyzeColumnCommand(
+tableName: String,
+columnNames: Seq[String]) extends RunnableCommand {
+
+  override def run(sparkSession: SparkSession): Seq[Row] = {
+val sessionState = sparkSession.sessionState
+val tableIdent = sessionState.sqlParser.parseTableIdentifier(tableName)
+val relation = 
EliminateSubqueryAliases(sessionState.catalog.lookupRelation(tableIdent))
+
+// check correctness for column names
+val attributeNames = relation.output.map(_.name.toLowerCase)
--- End diff --

Please consider case sensitivity here.
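
A REPL sketch of resolution that honors the setting; the `caseSensitive` flag 
is a stand-in for reading SQLConf, not the actual conf lookup:

```scala
// Pick an equality function based on the (assumed) case-sensitivity setting.
val caseSensitive = false
val resolver: (String, String) => Boolean =
  if (caseSensitive) (a, b) => a == b else (a, b) => a.equalsIgnoreCase(b)

val attributeNames = Seq("Key", "Value")   // relation output names
val columnNames = Seq("key", "VALUE")      // user-supplied names
val invalidColumns = columnNames.filterNot(c => attributeNames.exists(resolver(_, c)))
assert(invalidColumns.isEmpty)             // all names resolve when case-insensitive
```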





[GitHub] spark pull request #15090: [SPARK-17073] [SQL] generate column-level statist...

2016-09-13 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/15090#discussion_r78685116
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala ---
@@ -98,8 +98,12 @@ class SparkSqlAstBuilder(conf: SQLConf) extends 
AstBuilder {
   ctx.identifier != null &&
   ctx.identifier.getText.toLowerCase == "noscan") {
   
AnalyzeTableCommand(visitTableIdentifier(ctx.tableIdentifier).toString)
-} else {
+} else if (ctx.identifierSeq() == null) {
--- End diff --

This has a bug. It will jump to this branch, if users input 
```SQL
ANALYZE TABLE t1 COMPUTE STATISTICS FOR COLUMNS
```
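
A self-contained sketch of the stricter branching this implies; `hasForColumns` 
is a hypothetical stand-in for inspecting the parser context, since an explicit 
`FOR COLUMNS` with an empty column list should be a parse error rather than 
falling through to a plain table-level analyze:

```scala
// Choose the command, failing fast on FOR COLUMNS with no columns.
def chooseCommand(table: String, hasForColumns: Boolean, columns: Seq[String]): String =
  if (hasForColumns && columns.isEmpty) {
    throw new IllegalArgumentException(s"Missing column list after FOR COLUMNS: $table")
  } else if (columns.nonEmpty) {
    s"AnalyzeColumnCommand($table, ${columns.mkString(", ")})"
  } else {
    s"AnalyzeTableCommand($table)"
  }
```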





[GitHub] spark issue #14962: [SPARK-17402][SQL] separate the management of temp views...

2016-09-13 Thread yhuai
Github user yhuai commented on the issue:

https://github.com/apache/spark/pull/14962
  
Is it possible to first have a PR to fix the bugs?





[GitHub] spark pull request #15090: [SPARK-17073] [SQL] generate column-level statist...

2016-09-13 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/15090#discussion_r78684701
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala ---
@@ -98,8 +98,12 @@ class SparkSqlAstBuilder(conf: SQLConf) extends 
AstBuilder {
   ctx.identifier != null &&
   ctx.identifier.getText.toLowerCase == "noscan") {
   
AnalyzeTableCommand(visitTableIdentifier(ctx.tableIdentifier).toString)
-} else {
+} else if (ctx.identifierSeq() == null) {
--- End diff --

Since this PR changes the parser, please update the comment of this 
function to reflect the latest changes. 

In addition, please add test cases in `DDLCommandSuite` to verify 
the parser's behavior.





[GitHub] spark issue #14971: [SPARK-17410] [SPARK-17284] Move Hive-generated Stats In...

2016-09-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14971
  
**[Test build #65350 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65350/consoleFull)**
 for PR 14971 at commit 
[`9e18ba1`](https://github.com/apache/spark/commit/9e18ba104527d2bb14331f4b51194002dabb2556).





[GitHub] spark issue #14118: [SPARK-16462][SPARK-16460][SPARK-15144][SQL] Make CSV ca...

2016-09-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14118
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65343/
Test PASSed.





[GitHub] spark issue #14118: [SPARK-16462][SPARK-16460][SPARK-15144][SQL] Make CSV ca...

2016-09-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14118
  
Merged build finished. Test PASSed.





[GitHub] spark issue #14118: [SPARK-16462][SPARK-16460][SPARK-15144][SQL] Make CSV ca...

2016-09-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14118
  
**[Test build #65343 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65343/consoleFull)**
 for PR 14118 at commit 
[`d5357f9`](https://github.com/apache/spark/commit/d5357f9d784cc277d58fd896738a87a7aff7ba70).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #14971: [SPARK-17410] [SPARK-17284] Move Hive-generated Stats In...

2016-09-13 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/14971
  
@hvanhovell @cloud-fan Could you help me review this PR? 
https://github.com/apache/spark/pull/15090 is changing the same code path for 
column-level statistics.

Thanks!





[GitHub] spark issue #14971: [SPARK-17410] [SPARK-17284] Move Hive-generated Stats In...

2016-09-13 Thread gatorsmile
Github user gatorsmile commented on the issue:

https://github.com/apache/spark/pull/14971
  
retest this please





[GitHub] spark pull request #15090: [SPARK-17073] [SQL] generate column-level statist...

2016-09-13 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request:

https://github.com/apache/spark/pull/15090#discussion_r78683972
  
--- Diff: 
sql/hive/src/test/scala/org/apache/spark/sql/hive/StatisticsSuite.scala ---
@@ -330,14 +332,237 @@ class StatisticsSuite extends QueryTest with 
TestHiveSingleton with SQLTestUtils
   val dfNoCols = spark.createDataFrame(rddNoCols, 
StructType(Seq.empty))
   dfNoCols.write.format("json").saveAsTable(table_no_cols)
   sql(s"ANALYZE TABLE $table_no_cols COMPUTE STATISTICS")
-  checkStats(
+  checkTableStats(
 table_no_cols,
 isDataSourceTable = true,
 hasSizeInBytes = true,
 expectedRowCounts = Some(10))
 }
   }
 
+  private def checkColStats(
--- End diff --

This test suite becomes bigger and bigger. For column stats, let us create 
a new file?





[GitHub] spark pull request #15026: [SPARK-17472] [PYSPARK] Better error message for ...

2016-09-13 Thread ericl
Github user ericl commented on a diff in the pull request:

https://github.com/apache/spark/pull/15026#discussion_r78683777
  
--- Diff: python/pyspark/broadcast.py ---
@@ -75,7 +75,13 @@ def __init__(self, sc=None, value=None, 
pickle_registry=None, path=None):
 self._path = path
 
 def dump(self, value, f):
-pickle.dump(value, f, 2)
+try:
+pickle.dump(value, f, 2)
+except pickle.PickleError:
+raise
+except Exception as e:
+msg = "Could not serialize broadcast: " + e.__class__.__name__ 
+ ": " + e.message
+raise pickle.PicklingError(msg)
--- End diff --

It seems we use print_exec() elsewhere, so I'm going to use that for consistency.





[GitHub] spark issue #15090: [SPARK-17073] [SQL] generate column-level statistics

2016-09-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15090
  
**[Test build #65349 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65349/consoleFull)**
 for PR 15090 at commit 
[`027bdcc`](https://github.com/apache/spark/commit/027bdcc59b1b01a8dac436dd3a86600c2451c95f).





[GitHub] spark pull request #14962: [SPARK-17402][SQL] separate the management of tem...

2016-09-13 Thread yhuai
Github user yhuai commented on a diff in the pull request:

https://github.com/apache/spark/pull/14962#discussion_r78683471
  
--- Diff: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
 ---
@@ -439,7 +439,7 @@ class Analyzer(
   object ResolveRelations extends Rule[LogicalPlan] {
 private def lookupTableFromCatalog(u: UnresolvedRelation): LogicalPlan 
= {
   try {
-catalog.lookupRelation(u.tableIdentifier, u.alias)
+catalog.lookupTempViewOrRelation(u.tableIdentifier, u.alias)
--- End diff --

This is also for view, right? Should we just keep the old name?





[GitHub] spark issue #14980: [SPARK-17317][SparkR] Add SparkR vignette

2016-09-13 Thread shivaram
Github user shivaram commented on the issue:

https://github.com/apache/spark/pull/14980
  
@junyangq As we discussed before, let's open a new PR for 2.0?





[GitHub] spark pull request #14980: [SPARK-17317][SparkR] Add SparkR vignette

2016-09-13 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/14980





[GitHub] spark issue #15073: [SPARK-17518] [SQL] Block Users to Specify the Internal ...

2016-09-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15073
  
**[Test build #65348 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65348/consoleFull)**
 for PR 15073 at commit 
[`9711edb`](https://github.com/apache/spark/commit/9711edb25f401703e08e51cc6f4f0495731da12a).





[GitHub] spark issue #15090: [SPARK-17073] [SQL] generate column-level statistics

2016-09-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15090
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65347/
Test FAILed.





[GitHub] spark issue #15090: [SPARK-17073] [SQL] generate column-level statistics

2016-09-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15090
  
**[Test build #65347 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65347/consoleFull)**
 for PR 15090 at commit 
[`59ae3df`](https://github.com/apache/spark/commit/59ae3dfc45751705962a1370c195c67c6302c376).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `case class BasicColStats(`
  * `case class AnalyzeColumnCommand(`
  * `trait StatsAggFunc `





[GitHub] spark issue #15090: [SPARK-17073] [SQL] generate column-level statistics

2016-09-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15090
  
Merged build finished. Test FAILed.





[GitHub] spark issue #15085: [SPARK-17484] Prevent invalid block locations from being...

2016-09-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15085
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65342/
Test PASSed.





[GitHub] spark issue #15085: [SPARK-17484] Prevent invalid block locations from being...

2016-09-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15085
  
Merged build finished. Test PASSed.





[GitHub] spark issue #15090: [SPARK-17073] [SQL] generate column-level statistics

2016-09-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15090
  
**[Test build #65347 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65347/consoleFull)**
 for PR 15090 at commit 
[`59ae3df`](https://github.com/apache/spark/commit/59ae3dfc45751705962a1370c195c67c6302c376).





[GitHub] spark issue #15085: [SPARK-17484] Prevent invalid block locations from being...

2016-09-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15085
  
**[Test build #65342 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65342/consoleFull)**
 for PR 15085 at commit 
[`f60c4be`](https://github.com/apache/spark/commit/f60c4be307cf21bf61b27942ed75887546021458).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request #15090: [SPARK-17073] [SQL] generate column-level statist...

2016-09-13 Thread wzhfy
GitHub user wzhfy opened a pull request:

https://github.com/apache/spark/pull/15090

[SPARK-17073] [SQL] generate column-level statistics

## What changes were proposed in this pull request?

Generate basic column statistics for all the atomic types:
- numeric types: max, min, num of nulls, ndv (number of distinct values)
- date/timestamp types: they are also represented as numbers internally, so 
they have the same stats as above.
- string: avg length, max length, num of nulls, ndv
- binary: avg length, max length, num of nulls
- boolean: num of nulls, num of trues, num of falses, ndv (must be 2)

Also support storing and loading these statistics.
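
A hedged usage sketch for spark-shell; the `FOR COLUMNS` syntax follows the 
parser change discussed elsewhere in this thread, and the table is made up for 
illustration:

```scala
// Create a toy table, then collect column-level statistics for it.
spark.sql("CREATE TABLE sales (price DOUBLE, region STRING)")
spark.sql("INSERT INTO sales VALUES (10.5, 'east'), (20.0, NULL)")
spark.sql("ANALYZE TABLE sales COMPUTE STATISTICS FOR COLUMNS price, region")
```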

## How was this patch tested?

add unit tests

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/wzhfy/spark colStats

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/15090.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #15090


commit 59ae3dfc45751705962a1370c195c67c6302c376
Author: Zhenhua Wang 
Date:   2016-09-14T03:03:05Z

support column-level stats







[GitHub] spark issue #15042: [SPARK-17449] [Documentation] [Relation between heartbea...

2016-09-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15042
  
**[Test build #65346 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65346/consoleFull)**
 for PR 15042 at commit 
[`1a76a56`](https://github.com/apache/spark/commit/1a76a56c25fd89ff409f856a83c5b1464d153607).





[GitHub] spark pull request #11119: [SPARK-10780][ML] Add an initial model to kmeans

2016-09-13 Thread dbtsai
Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/9#discussion_r78682701
  
--- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/KMeans.scala 
---
@@ -303,6 +322,29 @@ class KMeans @Since("1.5.0") (
   @Since("1.5.0")
   def setSeed(value: Long): this.type = set(seed, value)
 
+  /** @group setParam */
+  @Since("2.1.0")
+  def setInitialModel(value: KMeansModel): this.type = set(initialModel, 
value)
+
+  /** @group setParam */
+  @Since("2.1.0")
+  def setInitialModel(value: Model[_]): this.type = {
--- End diff --

+1 on using `KMeansModel.fromCenters(centers)`





[GitHub] spark issue #15059: [SPARK-17506][SQL] Improve the check double values equal...

2016-09-13 Thread yanboliang
Github user yanboliang commented on the issue:

https://github.com/apache/spark/pull/15059
  
Moving generic testing utils from mllib to common looks OK to me. Actually 
we have `TestingUtils` under both spark.ml.util and spark.mllib.util. If we 
would like to move, we should remove both of them. Thanks!





[GitHub] spark issue #14980: [SPARK-17317][SparkR] Add SparkR vignette

2016-09-13 Thread shivaram
Github user shivaram commented on the issue:

https://github.com/apache/spark/pull/14980
  
Thanks @junyangq and @felixcheung - Merging this into master once the 
AppVeyor check passes





[GitHub] spark issue #14981: [SPARK-17418] Remove Kinesis artifacts from Spark releas...

2016-09-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14981
  
**[Test build #65345 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65345/consoleFull)**
 for PR 14981 at commit 
[`07eb037`](https://github.com/apache/spark/commit/07eb0372bbb70eb6a2d661dbdb28750020ba500b).





[GitHub] spark issue #14980: [SPARK-17317][SparkR] Add SparkR vignette

2016-09-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14980
  
Merged build finished. Test PASSed.





[GitHub] spark issue #14980: [SPARK-17317][SparkR] Add SparkR vignette

2016-09-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14980
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65344/
Test PASSed.





[GitHub] spark issue #14980: [SPARK-17317][SparkR] Add SparkR vignette

2016-09-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14980
  
**[Test build #65344 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65344/consoleFull)**
 for PR 14980 at commit 
[`aa3f6a4`](https://github.com/apache/spark/commit/aa3f6a46fd27d7ad68973cb2426d06e20b6f0b32).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #15000: [SPARK-17437] Add uiWebUrl to JavaSparkContext and pyspa...

2016-09-13 Thread apetresc
Github user apetresc commented on the issue:

https://github.com/apache/spark/pull/15000
  
@srowen: Just to make sure I understand, are you asking me to remove the 
Java accessor here, and just plumb straight through to the Scala object from 
PySpark? Or is it fine as-is?





[GitHub] spark issue #15035: [SPARK-17477]: SparkSQL cannot handle schema evolution f...

2016-09-13 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/15035
  
We definitely shouldn't change SpecificMutableRow to do this upcast; 
otherwise we might introduce subtle bugs with type mismatches in the future.

cc @sameeragarwal to see if there is a better place to do this -- I think 
doing this in Parquet itself is pretty reasonable?






[GitHub] spark issue #14980: [SPARK-17317][SparkR] Add SparkR vignette

2016-09-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14980
  
**[Test build #65344 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65344/consoleFull)**
 for PR 14980 at commit 
[`aa3f6a4`](https://github.com/apache/spark/commit/aa3f6a46fd27d7ad68973cb2426d06e20b6f0b32).





[GitHub] spark pull request #14980: [SPARK-17317][SparkR] Add SparkR vignette

2016-09-13 Thread junyangq
Github user junyangq commented on a diff in the pull request:

https://github.com/apache/spark/pull/14980#discussion_r78679227
  
--- Diff: R/pkg/vignettes/sparkr-vignettes.Rmd ---
@@ -385,22 +385,29 @@ head(result[order(result$max_mpg, decreasing = TRUE), 
])
 
 Similar to `lapply` in native R, `spark.lapply` runs a function over a 
list of elements and distributes the computations with Spark. `spark.lapply` 
works in a manner that is similar to `doParallel` or `lapply` to elements of a 
list. The results of all the computations should fit in a single machine. If 
that is not the case you can do something like `df <- createDataFrame(list)` 
and then use `dapply`.
 
+We use `svm` in package `e1071` as an example. We use all default settings 
except for varying costs of constraints violation. `spark.lapply` can train 
those different models in parallel.
+
 ```{r}
-families <- c("gaussian", "poisson")
-train <- function(family) {
-  model <- glm(mpg ~ hp, mtcars, family = family)
+costs <- exp(seq(from = log(1), to = log(1000), length.out = 5))
--- End diff --

It runs as long as `e1071` is installed on the workers. Perhaps it's better 
to add a check there?





[GitHub] spark issue #15085: [SPARK-17484] Prevent invalid block locations from being...

2016-09-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15085
  
**[Test build #65337 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65337/consoleFull)**
 for PR 15085 at commit 
[`f69a5ea`](https://github.com/apache/spark/commit/f69a5ea6eff2c6b9f1e07a5d1551c67cdee5ed2e).
 * This patch **fails from timeout after a configured wait of `250m`**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #15085: [SPARK-17484] Prevent invalid block locations from being...

2016-09-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15085
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65337/
Test FAILed.





[GitHub] spark issue #15085: [SPARK-17484] Prevent invalid block locations from being...

2016-09-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15085
  
Merged build finished. Test FAILed.





[GitHub] spark pull request #14974: [Trivial][ML] Remove unnecessary `new` before cas...

2016-09-13 Thread zhengruifeng
Github user zhengruifeng closed the pull request at:

https://github.com/apache/spark/pull/14974





[GitHub] spark issue #14118: [SPARK-16462][SPARK-16460][SPARK-15144][SQL] Make CSV ca...

2016-09-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14118
  
**[Test build #65343 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65343/consoleFull)**
 for PR 14118 at commit 
[`d5357f9`](https://github.com/apache/spark/commit/d5357f9d784cc277d58fd896738a87a7aff7ba70).





[GitHub] spark issue #14118: [SPARK-16462][SPARK-16460][SPARK-15144][SQL] Make CSV ca...

2016-09-13 Thread lw-lin
Github user lw-lin commented on the issue:

https://github.com/apache/spark/pull/14118
  
@HyukjinKwon thanks for the information!

@srowen yea I still think this is good to go.





[GitHub] spark issue #14118: [SPARK-16462][SPARK-16460][SPARK-15144][SQL] Make CSV ca...

2016-09-13 Thread lw-lin
Github user lw-lin commented on the issue:

https://github.com/apache/spark/pull/14118
  
Jenkins retest this please





[GitHub] spark issue #15060: [SPARK-17507][ML][MLLib] check weight vector size in ANN

2016-09-13 Thread WeichenXu123
Github user WeichenXu123 commented on the issue:

https://github.com/apache/spark/pull/15060
  
@srowen the `weight` vector is randomly generated by default and automatically 
matches the required size; the check is only needed when the weights are 
specified by the user. The code modified here seems to be the only path that 
receives a user-specified `weight`. If I missed something, please tell me, thanks!
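
A minimal sketch of that check, assuming a hypothetical `expectedWeightSize` 
derived from the layer topology rather than the actual ANN internals:

```scala
// Fail fast when a user-supplied initial weight vector has the wrong length.
def validateInitialWeights(weights: Array[Double], expectedWeightSize: Int): Unit =
  require(weights.length == expectedWeightSize,
    s"Initial weight vector has length ${weights.length}, expected $expectedWeightSize")
```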





[GitHub] spark issue #15043: [SPARK-17491] Close serialization stream to fix wrong an...

2016-09-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15043
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65341/
Test FAILed.





[GitHub] spark issue #15043: [SPARK-17491] Close serialization stream to fix wrong an...

2016-09-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15043
  
Merged build finished. Test FAILed.





[GitHub] spark issue #15089: [SPARK-15621] [SQL] Support spilling for Python UDF

2016-09-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15089
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65340/
Test PASSed.





[GitHub] spark issue #15043: [SPARK-17491] Close serialization stream to fix wrong an...

2016-09-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15043
  
**[Test build #65341 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65341/consoleFull)**
 for PR 15043 at commit 
[`2f43e69`](https://github.com/apache/spark/commit/2f43e69c69e28ae76364155b9c8a178380b55ff3).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #15089: [SPARK-15621] [SQL] Support spilling for Python UDF

2016-09-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15089
  
Merged build finished. Test PASSed.





[GitHub] spark issue #15089: [SPARK-15621] [SQL] Support spilling for Python UDF

2016-09-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15089
  
**[Test build #65340 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65340/consoleFull)**
 for PR 15089 at commit 
[`4964b9a`](https://github.com/apache/spark/commit/4964b9a611ed01aaa5252ac642df94db07a38868).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #15085: [SPARK-17484] Prevent invalid block locations from being...

2016-09-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15085
  
**[Test build #65342 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65342/consoleFull)**
 for PR 15085 at commit 
[`f60c4be`](https://github.com/apache/spark/commit/f60c4be307cf21bf61b27942ed75887546021458).





[GitHub] spark issue #14834: [SPARK-17163][ML] Unified LogisticRegression interface

2016-09-13 Thread dbtsai
Github user dbtsai commented on the issue:

https://github.com/apache/spark/pull/14834
  
Only a couple of minor issues; otherwise, LGTM.





[GitHub] spark pull request #14834: [SPARK-17163][ML] Unified LogisticRegression inte...

2016-09-13 Thread dbtsai
Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/14834#discussion_r78674556
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/classification/ProbabilisticClassifier.scala
 ---
@@ -201,11 +201,24 @@ abstract class ProbabilisticClassificationModel[
   probability.argmax
 } else {
   val thresholds: Array[Double] = getThresholds
-  val scaledProbability: Array[Double] =
-probability.toArray.zip(thresholds).map { case (p, t) =>
-  if (t == 0.0) Double.PositiveInfinity else p / t
+  val probabilities = probability.toArray
+  var argMax = 0
+  var max = Double.NegativeInfinity
+  var i = 0
+  while (i < probability.size) {
--- End diff --

val length = probability.size
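
Spelled out as a self-contained sketch with toy values, hoisting the size out 
of the loop condition:

```scala
// Argmax over threshold-scaled probabilities, reading the length only once.
val probabilities = Array(0.2, 0.5, 0.3)
val thresholds = Array(0.4, 0.4, 0.2)
var argMax = 0
var max = Double.NegativeInfinity
var i = 0
val length = probabilities.length  // hoisted, per the suggestion above
while (i < length) {
  val scaled =
    if (thresholds(i) == 0.0) Double.PositiveInfinity else probabilities(i) / thresholds(i)
  if (scaled > max) { max = scaled; argMax = i }
  i += 1
}
assert(argMax == 2)  // 0.3 / 0.2 = 1.5 is the largest scaled value
```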





[GitHub] spark pull request #14834: [SPARK-17163][ML] Unified LogisticRegression inte...

2016-09-13 Thread dbtsai
Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/14834#discussion_r7867
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala
 ---
@@ -676,39 +936,54 @@ object LogisticRegressionModel extends MLReadable[LogisticRegressionModel] {
 private case class Data(
 numClasses: Int,
 numFeatures: Int,
-intercept: Double,
-coefficients: Vector)
+interceptVector: Vector,
+coefficientMatrix: Matrix,
+isMultinomial: Boolean)
 
 override protected def saveImpl(path: String): Unit = {
   // Save metadata and Params
   DefaultParamsWriter.saveMetadata(instance, path, sc)
   // Save model data: numClasses, numFeatures, intercept, coefficients
-  val data = Data(instance.numClasses, instance.numFeatures, instance.intercept,
-    instance.coefficients)
+  val data = Data(instance.numClasses, instance.numFeatures, instance.interceptVector,
+    instance.coefficientMatrix, instance.isMultinomial)
   val dataPath = new Path(path, "data").toString
   sparkSession.createDataFrame(Seq(data)).repartition(1).write.parquet(dataPath)
 }
   }
 
-  private class LogisticRegressionModelReader
-    extends MLReader[LogisticRegressionModel] {
+  private class LogisticRegressionModelReader extends MLReader[LogisticRegressionModel] {
 
 /** Checked against metadata when loading model */
 private val className = classOf[LogisticRegressionModel].getName
 
 override def load(path: String): LogisticRegressionModel = {
   val metadata = DefaultParamsReader.loadMetadata(path, sc, className)
+  val versionRegex = "([0-9]+)\\.([0-9]+)\\.(.+)".r
+  val versionRegex(major, minor, _) = metadata.sparkVersion
 
   val dataPath = new Path(path, "data").toString
   val data = sparkSession.read.format("parquet").load(dataPath)
 
-  // We will need numClasses, numFeatures in the future for multinomial logreg support.
-  // TODO: remove numClasses and numFeatures fields?
-  val Row(numClasses: Int, numFeatures: Int, intercept: Double, coefficients: Vector) =
-    MLUtils.convertVectorColumnsToML(data, "coefficients")
-      .select("numClasses", "numFeatures", "intercept", "coefficients")
-      .head()
-  val model = new LogisticRegressionModel(metadata.uid, coefficients, intercept)
+  val model = if (major.toInt < 2 || (major.toInt == 2 && minor.toInt == 0)) {
--- End diff --

How about `2.0.1`?
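
For context, a hedged sketch of the gate that condition implements, pulled
out as a standalone function (names invented here). A `2.0.x` release has
`minor == 0`, so `2.0.1` takes the legacy branch along with everything
earlier:

scala
// Sketch only: classify a Spark version string as pre-2.1 (legacy
// Double-intercept format) or 2.1+ (coefficientMatrix/interceptVector format).
def usesLegacyFormat(sparkVersion: String): Boolean = {
  val versionRegex = "([0-9]+)\\.([0-9]+)\\.(.+)".r
  val versionRegex(major, minor, _) = sparkVersion
  major.toInt < 2 || (major.toInt == 2 && minor.toInt == 0)
}

assert(usesLegacyFormat("2.0.1"))   // patch releases of 2.0 still match
assert(!usesLegacyFormat("2.1.0"))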


[GitHub] spark pull request #15085: [SPARK-17484] Prevent invalid block locations fro...

2016-09-13 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request:

https://github.com/apache/spark/pull/15085#discussion_r78674398
  
--- Diff: core/src/main/scala/org/apache/spark/storage/BlockManager.scala 
---
@@ -857,9 +862,11 @@ private[spark] class BlockManager(
 
 val startTimeMs = System.currentTimeMillis
 var blockWasSuccessfullyStored: Boolean = false
+var exceptionWasThrown: Boolean = true
 val result: Option[T] = try {
   val res = putBody(putBlockInfo)
   blockWasSuccessfullyStored = res.isEmpty
+  exceptionWasThrown = false
   res
 } finally {
   if (blockWasSuccessfullyStored) {
--- End diff --

That said, I think we could simplify this by moving the non-error-case code 
into the `try` block. Let me do that now.


[GitHub] spark pull request #15085: [SPARK-17484] Prevent invalid block locations fro...

2016-09-13 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request:

https://github.com/apache/spark/pull/15085#discussion_r78674369
  
--- Diff: core/src/main/scala/org/apache/spark/storage/BlockManager.scala 
---
@@ -857,9 +862,11 @@ private[spark] class BlockManager(
 
 val startTimeMs = System.currentTimeMillis
 var blockWasSuccessfullyStored: Boolean = false
+var exceptionWasThrown: Boolean = true
 val result: Option[T] = try {
   val res = putBody(putBlockInfo)
   blockWasSuccessfullyStored = res.isEmpty
+  exceptionWasThrown = false
   res
 } finally {
   if (blockWasSuccessfullyStored) {
--- End diff --

One concern with using a `catch` here is handling of 
`InterruptedException`: if we use `case NonFatal(e)` that won't match 
`InterruptedException` and we'll miss out on cleanup following that. If we 
catch `Throwable`, on the other hand, then I think that we'll end up clearing 
the `isInterrupted` bit for `InterruptedException`s and it'll be awkward to 
match and re-set it when rethrowing. Therefore I'd like to keep the 
exception-handling case in the `finally` block with a simple check to see if we 
entered that block via an error case.

Note that I've seen this same exception-handling idiom used in Java code, 
where code that catches and re-throws `Throwable` won't compile in older Java 
versions because of the checked exception-handling (I think that newer versions 
are a bit more permissive about throwing exceptions from a `catch` block). 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #14834: [SPARK-17163][ML] Unified LogisticRegression inte...

2016-09-13 Thread dbtsai
Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/14834#discussion_r78674092
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala
 ---
@@ -595,55 +831,104 @@ class LogisticRegressionModel private[spark] (
* Predict label for the given feature vector.
* The behavior of this can be adjusted using [[thresholds]].
*/
-  override protected def predict(features: Vector): Double = {
+  override protected def predict(features: Vector): Double = if (isMultinomial) {
+super.predict(features)
--- End diff --

Maybe we want to have a specialized version for when thresholds is not 
defined?
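
A hedged sketch of what such a specialization could look like (names are
illustrative, not the PR's code): softmax preserves ordering, so when no
thresholds are set the argmax of the raw margins is already the prediction
and the probability pass can be skipped entirely.

scala
// Sketch only: fast path when thresholds are undefined.
def predictClass(margins: Array[Double], thresholds: Option[Array[Double]]): Double =
  thresholds match {
    case None =>
      // softmax is monotone, so argmax over raw margins suffices
      margins.indices.maxBy(i => margins(i)).toDouble
    case Some(t) =>
      val maxMargin = margins.max
      val exps = margins.map(m => math.exp(m - maxMargin))
      val sum = exps.sum
      val probs = exps.map(_ / sum)
      probs.indices.maxBy { i =>
        if (t(i) == 0.0) Double.PositiveInfinity else probs(i) / t(i)
      }.toDouble
  }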


[GitHub] spark pull request #14834: [SPARK-17163][ML] Unified LogisticRegression inte...

2016-09-13 Thread dbtsai
Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/14834#discussion_r78673689
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala
 ---
@@ -508,11 +680,42 @@ object LogisticRegression extends DefaultParamsReadable[LogisticRegression] {
 @Since("1.4.0")
 class LogisticRegressionModel private[spark] (
 @Since("1.4.0") override val uid: String,
-@Since("2.0.0") val coefficients: Vector,
-@Since("1.3.0") val intercept: Double)
+@Since("2.1.0") val coefficientMatrix: Matrix,
+@Since("2.1.0") val interceptVector: Vector,
+@Since("1.3.0") override val numClasses: Int,
+private val isMultinomial: Boolean)
   extends ProbabilisticClassificationModel[Vector, LogisticRegressionModel]
   with LogisticRegressionParams with MLWritable {
 
+  @Since("2.0.0")
+  def coefficients: Vector = if (isMultinomial) {
+throw new SparkException("Multinomial models contain a matrix of coefficients, use " +
+  "coefficientMatrix instead.")
+  } else {
+_coefficients
+  }
+
+  // convert to appropriate vector representation without replicating data
+  private lazy val _coefficients: Vector = coefficientMatrix match {
+case dm: DenseMatrix => Vectors.dense(dm.values)
--- End diff --

I think you need to check `coefficientMatrix.isTransposed` even when it's dense 
here.
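
For reference, a hedged sketch of a layout-safe conversion: indexing through
`apply(i, j)` honors `isTransposed` at the cost of a copy, whereas reusing
`dm.values` directly is only safe when the stored order already matches. (For
a one-row binomial matrix the two layouts happen to coincide, but the check
costs nothing to get right.)

scala
import org.apache.spark.ml.linalg.{DenseMatrix, Vector, Vectors}

// Sketch only: flatten row-major via (i, j) indexing, which accounts for
// isTransposed, rather than trusting the raw values array's layout.
def toRowMajorVector(dm: DenseMatrix): Vector = {
  val arr = new Array[Double](dm.numRows * dm.numCols)
  var i = 0
  while (i < dm.numRows) {
    var j = 0
    while (j < dm.numCols) {
      arr(i * dm.numCols + j) = dm(i, j)
      j += 1
    }
    i += 1
  }
  Vectors.dense(arr)
}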


[GitHub] spark issue #15085: [SPARK-17484] Prevent invalid block locations from being...

2016-09-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15085
  
Merged build finished. Test PASSed.


[GitHub] spark issue #15085: [SPARK-17484] Prevent invalid block locations from being...

2016-09-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15085
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65339/
Test PASSed.


[GitHub] spark issue #15085: [SPARK-17484] Prevent invalid block locations from being...

2016-09-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/15085
  
**[Test build #65339 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65339/consoleFull)** for PR 15085 at commit [`8ab3108`](https://github.com/apache/spark/commit/8ab3108569e5812e0e81b77e3dfb0be1f7e557ce).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


[GitHub] spark issue #14691: [SPARK-16407][STREAMING] Allow users to supply custom st...

2016-09-13 Thread jodersky
Github user jodersky commented on the issue:

https://github.com/apache/spark/pull/14691
  
I like the idea! This might not be the best place to start a discussion, but I 
reckon the sink provider API could also eventually be used to provision 
built-in sinks. It would make the current stringly-typed API optional 
and provide more compile-time safety.
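
To make the contrast concrete, a purely hypothetical sketch (all names here
are invented for illustration, not an existing Spark API) of a typed provider
hook next to today's string lookup:

scala
import org.apache.spark.sql.DataFrame

// Sketch only: a provider instance the stream writer could accept directly,
// instead of resolving a "format" string to a class at runtime.
trait TypedSinkProvider {
  def createSink(options: Map[String, String]): DataFrame => Unit
}

class PrintlnSinkProvider extends TypedSinkProvider {
  override def createSink(options: Map[String, String]): DataFrame => Unit = {
    val numRows = options.get("numRows").map(_.toInt).getOrElse(20)
    batch => batch.show(numRows)
  }
}

// Today:  df.writeStream.format("consle").start()             // typo fails at runtime
// Sketch: df.writeStream.sink(new PrintlnSinkProvider).start() // hypothetical;
// passing the wrong provider type would fail at compile time instead.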


[GitHub] spark issue #15088: SPARK-17532: Add lock debugging info to thread dumps.

2016-09-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/15088
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65336/
Test PASSed.

