[GitHub] spark issue #18779: [SPARK-21580][SQL]Integers in aggregation expressions ar...

2017-08-03 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/18779
  
@maropu Because we already support it in the Dataset API, as the example at 
https://github.com/apache/spark/pull/18779#issuecomment-319880753 shows, I'm 
afraid that some users already rely on this feature. If we remove it now, there 
may be a compatibility issue.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18834: [SPARK-21618][DOCS] Note that http(s) not accepted in sp...

2017-08-03 Thread jerryshao
Github user jerryshao commented on the issue:

https://github.com/apache/spark/pull/18834
  
With #18235, I added HTTP(S) support for resources such as files and jars. 
SparkSubmit fetches them remotely and downloads them to a local tmp dir.





[GitHub] spark issue #18834: [SPARK-21618][DOCS] Note that http(s) not accepted in sp...

2017-08-03 Thread jerryshao
Github user jerryshao commented on the issue:

https://github.com/apache/spark/pull/18834
  
I think https should be supported with this JIRA 
(https://github.com/apache/spark/pull/18235).





[GitHub] spark issue #18834: [SPARK-21618][DOCS] Note that http(s) not accepted in sp...

2017-08-03 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18834
  
**[Test build #80209 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80209/testReport)**
 for PR 18834 at commit 
[`a1b8d6e`](https://github.com/apache/spark/commit/a1b8d6e7a62962a4d3f14df91c35a9b1946b2752).





[GitHub] spark pull request #16158: [SPARK-18724][ML] Add TuningSummary for TrainVali...

2017-08-03 Thread MLnick
Github user MLnick commented on a diff in the pull request:

https://github.com/apache/spark/pull/16158#discussion_r131133998
  
--- Diff: 
mllib/src/test/scala/org/apache/spark/ml/tuning/CrossValidatorSuite.scala ---
@@ -66,6 +66,29 @@ class CrossValidatorSuite
     assert(cvModel.avgMetrics.length === lrParamMaps.length)
   }
 
+  test("cross validation with tuning summary") {
+    val lr = new LogisticRegression
+    val lrParamMaps = new ParamGridBuilder()
+      .addGrid(lr.regParam, Array(0.001, 1.0, 1000.0))
+      .addGrid(lr.maxIter, Array(0, 2))
+      .build()
+    val eval = new BinaryClassificationEvaluator
+    val cv = new CrossValidator()
+      .setEstimator(lr)
+      .setEstimatorParamMaps(lrParamMaps)
+      .setEvaluator(eval)
+      .setNumFolds(3)
+    val cvModel = cv.fit(dataset)
+    assert(cvModel.hasSummary)
+    assert(cvModel.summary.params === lrParamMaps)
+    assert(cvModel.summary.trainingMetrics.count() === lrParamMaps.length)
+
+    val expected = lrParamMaps.zip(cvModel.avgMetrics).map { case (map, metric) =>
+      Row.fromSeq(map.toSeq.sortBy(_.param.name).map(_.value.toString) ++ Seq(metric.toString))
+    }
+    assert(cvModel.summary.trainingMetrics.collect().toSet === expected.toSet)
+  }
+
--- End diff --

Shall we add a test for the exception that is thrown when there is no summary?
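
A minimal sketch of what such a test would exercise, written in plain Scala so it runs without Spark. `ModelSketch` and `SummarySketch` are hypothetical stand-ins for the model and `TuningSummary`, and `NoSuchElementException` is an assumption: the diff quoted in this thread does not show which exception the PR throws.

```scala
object SummaryAccessSketch {
  // Hypothetical stand-in for TuningSummary; holds only the metrics.
  final case class SummarySketch(metrics: Seq[Double])

  // Hypothetical stand-in for CrossValidatorModel, mirroring the Option-backed field in the diff.
  class ModelSketch {
    private var trainingSummary: Option[SummarySketch] = None

    def setSummary(summary: Option[SummarySketch]): this.type = {
      this.trainingSummary = summary
      this
    }

    def hasSummary: Boolean = trainingSummary.nonEmpty

    // Assumed behavior: fail loudly when no summary was attached.
    def summary: SummarySketch = trainingSummary.getOrElse {
      throw new NoSuchElementException("No tuning summary available for this model")
    }
  }

  def main(args: Array[String]): Unit = {
    val model = new ModelSketch
    assert(!model.hasSummary)
    // The case the review comment asks to cover: reading a missing summary throws.
    val thrown =
      try { model.summary; false }
      catch { case _: NoSuchElementException => true }
    assert(thrown)
    // Once a summary is attached, the accessor returns it.
    model.setSummary(Some(SummarySketch(Seq(0.9))))
    assert(model.hasSummary && model.summary.metrics == Seq(0.9))
  }
}
```

In the real suite the same two branches would presumably be checked with ScalaTest's `intercept` on a model that has no summary attached.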





[GitHub] spark issue #16158: [SPARK-18724][ML] Add TuningSummary for TrainValidationS...

2017-08-03 Thread MLnick
Github user MLnick commented on the issue:

https://github.com/apache/spark/pull/16158
  
@hhbyyh sorry for the delay. Left a few review comments. 

Tested the examples and it looks cool! Very useful





[GitHub] spark pull request #16158: [SPARK-18724][ML] Add TuningSummary for TrainVali...

2017-08-03 Thread MLnick
Github user MLnick commented on a diff in the pull request:

https://github.com/apache/spark/pull/16158#discussion_r131133851
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/tuning/TuningSummary.scala ---
@@ -0,0 +1,58 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.ml.tuning
+
+import org.apache.spark.annotation.{Experimental, Since}
+import org.apache.spark.ml.param.ParamMap
+import org.apache.spark.sql.{DataFrame, Row, SparkSession}
+import org.apache.spark.sql.types.{StringType, StructField, StructType}
+
+/**
+ * :: Experimental ::
+ * Summary for the grid search tuning.
+ *
+ * @param params  ParamMaps for the Estimator
+ * @param metrics  corresponding evaluation metrics for the params
+ * @param bestIndex  index in params for the ParamMap of the best model.
+ */
+@Since("2.3.0")
+@Experimental
+private[tuning] class TuningSummary private[tuning](
+    private[tuning] val params: Array[ParamMap],
+    private[tuning] val metrics: Array[Double],
+    private[tuning] val bestIndex: Int) {
--- End diff --

It appears `bestIndex` is never used?





[GitHub] spark pull request #16158: [SPARK-18724][ML] Add TuningSummary for TrainVali...

2017-08-03 Thread MLnick
Github user MLnick commented on a diff in the pull request:

https://github.com/apache/spark/pull/16158#discussion_r131133018
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/tuning/CrossValidator.scala ---
@@ -133,7 +134,10 @@ class CrossValidator @Since("1.2.0") (@Since("1.4.0") override val uid: String)
     logInfo(s"Best cross-validation metric: $bestMetric.")
     val bestModel = est.fit(dataset, epm(bestIndex)).asInstanceOf[Model[_]]
     instr.logSuccess(bestModel)
-    copyValues(new CrossValidatorModel(uid, bestModel, metrics).setParent(this))
+    val model = new CrossValidatorModel(uid, bestModel, metrics).setParent(this)
+    val summary = new TuningSummary(epm, metrics, bestIndex)
+    model.setSummary(Some(summary))
--- End diff --

Just to confirm, the tuning summary will not be saved? Since it's a small 
dataframe, perhaps we should consider saving it with the model? (Can do that in 
a later PR however)





[GitHub] spark pull request #16158: [SPARK-18724][ML] Add TuningSummary for TrainVali...

2017-08-03 Thread MLnick
Github user MLnick commented on a diff in the pull request:

https://github.com/apache/spark/pull/16158#discussion_r131132815
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/tuning/CrossValidator.scala ---
@@ -229,6 +233,29 @@ class CrossValidatorModel private[ml] (
     bestModel.transformSchema(schema)
   }
 
+  private var trainingSummary: Option[TuningSummary] = None
+
+  private[tuning] def setSummary(summary: Option[TuningSummary]): this.type = {
+    this.trainingSummary = summary
+    this
+  }
+
+  /**
+   * Return true if there exists summary of model.
+   */
+  @Since("2.3.0")
+  def hasSummary: Boolean = trainingSummary.nonEmpty
+
+  /**
+   * Gets summary of model on training set. An exception is
--- End diff --

Likewise, "cross-validation performance of each model" or similar?





[GitHub] spark pull request #16158: [SPARK-18724][ML] Add TuningSummary for TrainVali...

2017-08-03 Thread MLnick
Github user MLnick commented on a diff in the pull request:

https://github.com/apache/spark/pull/16158#discussion_r131132649
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/tuning/TrainValidationSplit.scala ---
@@ -231,6 +235,29 @@ class TrainValidationSplitModel private[ml] (
 
   @Since("2.0.0")
   override def write: MLWriter = new TrainValidationSplitModel.TrainValidationSplitModelWriter(this)
+
+  private var trainingSummary: Option[TuningSummary] = None
+
+  private[tuning] def setSummary(summary: Option[TuningSummary]): this.type = {
+    this.trainingSummary = summary
+    this
+  }
+
+  /**
+   * Return true if there exists summary of model.
+   */
+  @Since("2.3.0")
+  def hasSummary: Boolean = trainingSummary.nonEmpty
+
+  /**
+   * Gets summary of model on training set. An exception is
--- End diff --

Should probably rather be "summary of model performance on the validation 
set"?





[GitHub] spark pull request #16158: [SPARK-18724][ML] Add TuningSummary for TrainVali...

2017-08-03 Thread MLnick
Github user MLnick commented on a diff in the pull request:

https://github.com/apache/spark/pull/16158#discussion_r131133463
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/tuning/TuningSummary.scala ---
@@ -0,0 +1,58 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.ml.tuning
+
+import org.apache.spark.annotation.{Experimental, Since}
+import org.apache.spark.ml.param.ParamMap
+import org.apache.spark.sql.{DataFrame, Row, SparkSession}
+import org.apache.spark.sql.types.{StringType, StructField, StructType}
+
+/**
+ * :: Experimental ::
+ * Summary for the grid search tuning.
+ *
+ * @param params  ParamMaps for the Estimator
+ * @param metrics  corresponding evaluation metrics for the params
+ * @param bestIndex  index in params for the ParamMap of the best model.
+ */
+@Since("2.3.0")
+@Experimental
+private[tuning] class TuningSummary private[tuning](
+    private[tuning] val params: Array[ParamMap],
+    private[tuning] val metrics: Array[Double],
+    private[tuning] val bestIndex: Int) {
+
+  /**
+   * Summary of grid search tuning in the format of DataFrame. Each row contains one candidate
+   * paramMap and its corresponding metric.
+   */
+  def trainingMetrics: DataFrame = {
+    require(params.nonEmpty, "estimator param maps should not be empty")
+    require(params.length == metrics.length, "estimator param maps number should match metrics")
+    val spark = SparkSession.builder().getOrCreate()
+    val sqlContext = spark.sqlContext
+    val sc = spark.sparkContext
+    val fields = params(0).toSeq.sortBy(_.param.name).map(_.param.name) ++ Seq("metrics")
--- End diff --

"metrics" is a bit generic. Perhaps it's better (and more user-friendly) to 
make this something like `metric_name metric`, so that it's obvious which 
metric was being optimized? For example, `ROC metric`, `AUC metric`, or `MSE 
metric`.
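
A minimal sketch of the naming suggested above, in plain Scala without Spark. `summaryColumns`, `paramNames`, and `metricName` are hypothetical names; how the PR would obtain the metric name from the evaluator is not shown in this thread and is left as an input here.

```scala
object MetricColumnSketch {
  // Builds the summary header: sorted param names, then one column named after the metric.
  // Both arguments are hypothetical inputs; the real code derives param names from a ParamMap.
  def summaryColumns(paramNames: Seq[String], metricName: String): Seq[String] =
    paramNames.sorted :+ s"$metricName metric"
}
```

For example, `MetricColumnSketch.summaryColumns(Seq("regParam", "maxIter"), "areaUnderROC")` yields the columns `maxIter`, `regParam`, and `areaUnderROC metric`, so the last column states what was optimized instead of the generic `metrics`.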





[GitHub] spark pull request #18834: [SPARK-21618][DOCS] Note that http(s) not accepte...

2017-08-03 Thread srowen
GitHub user srowen opened a pull request:

https://github.com/apache/spark/pull/18834

[SPARK-21618][DOCS] Note that http(s) not accepted in spark-submit jar uri

## What changes were proposed in this pull request?

Remove https from the list of supported URIs until it's explicitly supported. 
This just makes the docs consistent with the current behavior; it doesn't address 
the issue that HTTPS should ideally work. That is, this doesn't resolve the JIRA.

## How was this patch tested?

Doc build

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/srowen/spark SPARK-21618

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/18834.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #18834


commit a1b8d6e7a62962a4d3f14df91c35a9b1946b2752
Author: Sean Owen 
Date:   2017-08-03T12:57:01Z

Remove https from list of supported URIs, until it's explicitly supported







[GitHub] spark issue #18630: [SPARK-12559][SPARK SUBMIT] fix --packages for stand-alo...

2017-08-03 Thread skonto
Github user skonto commented on the issue:

https://github.com/apache/spark/pull/18630
  
@vanzin fixed the issues. Please give it another try or merge.





[GitHub] spark issue #18630: [SPARK-12559][SPARK SUBMIT] fix --packages for stand-alo...

2017-08-03 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18630
  
**[Test build #80208 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80208/testReport)**
 for PR 18630 at commit 
[`70649e2`](https://github.com/apache/spark/commit/70649e2a130daa01e4d0da28c69c96b1bc9f26a3).





[GitHub] spark issue #18807: [SPARK-21601][BUILD] Modify the pom.xml file, increase t...

2017-08-03 Thread highfei2011
Github user highfei2011 commented on the issue:

https://github.com/apache/spark/pull/18807
  
Hi @srowen @markhamstra, this problem has been solved; see 
https://github.com/apache/spark/pull/18808





[GitHub] spark issue #18804: [SPARK-21599][SQL] Collecting column statistics for data...

2017-08-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18804
  
Merged build finished. Test PASSed.





[GitHub] spark issue #18804: [SPARK-21599][SQL] Collecting column statistics for data...

2017-08-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18804
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80204/
Test PASSed.





[GitHub] spark issue #18804: [SPARK-21599][SQL] Collecting column statistics for data...

2017-08-03 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18804
  
**[Test build #80204 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80204/testReport)**
 for PR 18804 at commit 
[`a120357`](https://github.com/apache/spark/commit/a120357e1816501db7183c3d03c4dabd24db3284).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #17357: [SPARK-20025][CORE] Ignore SPARK_LOCAL* env, while deplo...

2017-08-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17357
  
Merged build finished. Test PASSed.





[GitHub] spark issue #17357: [SPARK-20025][CORE] Ignore SPARK_LOCAL* env, while deplo...

2017-08-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/17357
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80202/
Test PASSed.





[GitHub] spark issue #17357: [SPARK-20025][CORE] Ignore SPARK_LOCAL* env, while deplo...

2017-08-03 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17357
  
**[Test build #80202 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80202/testReport)**
 for PR 17357 at commit 
[`dc9cd31`](https://github.com/apache/spark/commit/dc9cd31ce20dbf5fad28a031b0989084ca671f32).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #18783: [SPARK-21254] [WebUI] History UI performance fixes

2017-08-03 Thread 2ooom
Github user 2ooom commented on the issue:

https://github.com/apache/spark/pull/18783
  
Thank you @srowen. Seems like jenkins is happy. Should we merge in this 
case?





[GitHub] spark issue #18313: [SPARK-21087] [ML] CrossValidator, TrainValidationSplit ...

2017-08-03 Thread MLnick
Github user MLnick commented on the issue:

https://github.com/apache/spark/pull/18313
  
I commented on the 
[JIRA](https://issues.apache.org/jira/browse/SPARK-21086?focusedCommentId=16112623&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16112623).

I would like to better understand the use case for keeping all the 
models (as per my JIRA comment). I suspect that #16158 may be the more useful 
approach in practice.

But overall, if there is a use case for keeping the models, I would agree 
with @jkbradley's suggestion that we offer a simple "keep all sub-models as a 
field in the model" approach, as well as consider the large-scale case with 
possibly the "dump to file" option.

In addition, we could have an option to keep "best", "all" or "_k_" models 
(user-specified as a number or %)?
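
A minimal sketch of the "best", "all", or "_k_" option, in plain Scala without Spark. `KeepPolicy`, `KeepBest`, `KeepAll`, `KeepTopK`, and `indicesToKeep` are hypothetical names, not part of any Spark API, and the sketch assumes a larger metric is better.

```scala
object KeepPolicySketch {
  // Hypothetical policy type for which sub-models to retain after tuning.
  sealed trait KeepPolicy
  case object KeepBest extends KeepPolicy
  case object KeepAll extends KeepPolicy
  final case class KeepTopK(k: Int) extends KeepPolicy

  // Returns the indices of the candidate models to keep, best first.
  // Assumes larger metric values are better.
  def indicesToKeep(metrics: Seq[Double], policy: KeepPolicy): Seq[Int] = {
    val ranked = metrics.zipWithIndex.sortBy { case (metric, _) => -metric }.map(_._2)
    policy match {
      case KeepBest    => ranked.take(1)
      case KeepAll     => ranked
      case KeepTopK(k) => ranked.take(math.max(k, 0))
    }
  }
}
```

A percentage-based variant could be expressed on top of `KeepTopK` by converting the percentage to a count before calling `indicesToKeep`.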







[GitHub] spark pull request #18538: [SPARK-14516][ML] Adding ClusteringEvaluator with...

2017-08-03 Thread mgaido91
Github user mgaido91 commented on a diff in the pull request:

https://github.com/apache/spark/pull/18538#discussion_r131121892
  
--- Diff: 
mllib/src/test/scala/org/apache/spark/ml/evaluation/ClusteringEvaluatorSuite.scala
 ---
@@ -0,0 +1,235 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.ml.evaluation
+
+import org.apache.spark.SparkFunSuite
+import org.apache.spark.ml.linalg.{Vectors, VectorUDT}
+import org.apache.spark.ml.param.ParamsSuite
+import org.apache.spark.ml.util.DefaultReadWriteTest
+import org.apache.spark.mllib.util.MLlibTestSparkContext
+import org.apache.spark.sql.Row
+import org.apache.spark.sql.types.{IntegerType, StructField, StructType}
+
+
+class ClusteringEvaluatorSuite
+  extends SparkFunSuite with MLlibTestSparkContext with 
DefaultReadWriteTest {
+
+  import testImplicits._
+
+  val dataset = Seq(Row(Vectors.dense(5.1, 3.5, 1.4, 0.2), 0),
+  Row(Vectors.dense(4.9, 3.0, 1.4, 0.2), 0),
+  Row(Vectors.dense(4.7, 3.2, 1.3, 0.2), 0),
+  Row(Vectors.dense(4.6, 3.1, 1.5, 0.2), 0),
+  Row(Vectors.dense(5.0, 3.6, 1.4, 0.2), 0),
+  Row(Vectors.dense(5.4, 3.9, 1.7, 0.4), 0),
+  Row(Vectors.dense(4.6, 3.4, 1.4, 0.3), 0),
+  Row(Vectors.dense(5.0, 3.4, 1.5, 0.2), 0),
+  Row(Vectors.dense(4.4, 2.9, 1.4, 0.2), 0),
+  Row(Vectors.dense(4.9, 3.1, 1.5, 0.1), 0),
+  Row(Vectors.dense(5.4, 3.7, 1.5, 0.2), 0),
+  Row(Vectors.dense(4.8, 3.4, 1.6, 0.2), 0),
+  Row(Vectors.dense(4.8, 3.0, 1.4, 0.1), 0),
+  Row(Vectors.dense(4.3, 3.0, 1.1, 0.1), 0),
+  Row(Vectors.dense(5.8, 4.0, 1.2, 0.2), 0),
+  Row(Vectors.dense(5.7, 4.4, 1.5, 0.4), 0),
+  Row(Vectors.dense(5.4, 3.9, 1.3, 0.4), 0),
+  Row(Vectors.dense(5.1, 3.5, 1.4, 0.3), 0),
+  Row(Vectors.dense(5.7, 3.8, 1.7, 0.3), 0),
+  Row(Vectors.dense(5.1, 3.8, 1.5, 0.3), 0),
+  Row(Vectors.dense(5.4, 3.4, 1.7, 0.2), 0),
+  Row(Vectors.dense(5.1, 3.7, 1.5, 0.4), 0),
+  Row(Vectors.dense(4.6, 3.6, 1.0, 0.2), 0),
+  Row(Vectors.dense(5.1, 3.3, 1.7, 0.5), 0),
+  Row(Vectors.dense(4.8, 3.4, 1.9, 0.2), 0),
+  Row(Vectors.dense(5.0, 3.0, 1.6, 0.2), 0),
+  Row(Vectors.dense(5.0, 3.4, 1.6, 0.4), 0),
+  Row(Vectors.dense(5.2, 3.5, 1.5, 0.2), 0),
+  Row(Vectors.dense(5.2, 3.4, 1.4, 0.2), 0),
+  Row(Vectors.dense(4.7, 3.2, 1.6, 0.2), 0),
+  Row(Vectors.dense(4.8, 3.1, 1.6, 0.2), 0),
+  Row(Vectors.dense(5.4, 3.4, 1.5, 0.4), 0),
+  Row(Vectors.dense(5.2, 4.1, 1.5, 0.1), 0),
+  Row(Vectors.dense(5.5, 4.2, 1.4, 0.2), 0),
+  Row(Vectors.dense(4.9, 3.1, 1.5, 0.1), 0),
+  Row(Vectors.dense(5.0, 3.2, 1.2, 0.2), 0),
+  Row(Vectors.dense(5.5, 3.5, 1.3, 0.2), 0),
+  Row(Vectors.dense(4.9, 3.1, 1.5, 0.1), 0),
+  Row(Vectors.dense(4.4, 3.0, 1.3, 0.2), 0),
+  Row(Vectors.dense(5.1, 3.4, 1.5, 0.2), 0),
+  Row(Vectors.dense(5.0, 3.5, 1.3, 0.3), 0),
+  Row(Vectors.dense(4.5, 2.3, 1.3, 0.3), 0),
+  Row(Vectors.dense(4.4, 3.2, 1.3, 0.2), 0),
+  Row(Vectors.dense(5.0, 3.5, 1.6, 0.6), 0),
+  Row(Vectors.dense(5.1, 3.8, 1.9, 0.4), 0),
+  Row(Vectors.dense(4.8, 3.0, 1.4, 0.3), 0),
+  Row(Vectors.dense(5.1, 3.8, 1.6, 0.2), 0),
+  Row(Vectors.dense(4.6, 3.2, 1.4, 0.2), 0),
+  Row(Vectors.dense(5.3, 3.7, 1.5, 0.2), 0),
+  Row(Vectors.dense(5.0, 3.3, 1.4, 0.2), 0),
+  Row(Vectors.dense(7.0, 3.2, 4.7, 1.4), 1),
+  Row(Vectors.dense(6.4, 3.2, 4.5, 1.5), 1),
+  Row(Vectors.dense(6.9, 3.1, 4.9, 1.5), 1),
+  Row(Vectors.dense(5.5, 2.3, 4.0, 1.3), 1),
+  Row(Vectors.dense(6.5, 2.8, 4.6, 1.5), 1),
+  Row(Vectors.dense(5.7, 2.8, 4.5, 1.3), 1),
+  Row(Vectors.dense(6.3, 3.3, 4.7, 1.6), 1),
+  Row(Vectors.dense(4.9, 2.4, 3.3, 1.0), 1),
+  Row(Vectors.dense(6.6, 2.9, 4.6, 1.3), 1),
+  Row(Vectors.dense(5.2, 2.7, 

[GitHub] spark pull request #18733: [SPARK-21535][ML]Reduce memory requirement for Cr...

2017-08-03 Thread MLnick
Github user MLnick commented on a diff in the pull request:

https://github.com/apache/spark/pull/18733#discussion_r131121125
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/tuning/CrossValidator.scala ---
@@ -112,16 +112,16 @@ class CrossValidator @Since("1.2.0") (@Since("1.4.0") override val uid: String)
       val validationDataset = sparkSession.createDataFrame(validation, schema).cache()
       // multi-model training
       logDebug(s"Train split $splitIndex with multiple sets of parameters.")
-      val models = est.fit(trainingDataset, epm).asInstanceOf[Seq[Model[_]]]
-      trainingDataset.unpersist()
       var i = 0
       while (i < numModels) {
+        val model = est.fit(trainingDataset, epm(i)).asInstanceOf[Model[_]]
         // TODO: duplicate evaluator to take extra params from input
-        val metric = eval.evaluate(models(i).transform(validationDataset, epm(i)))
+        val metric = eval.evaluate(model.transform(validationDataset, epm(i)))
         logDebug(s"Got metric $metric for model trained with ${epm(i)}.")
         metrics(i) += metric
         i += 1
       }
+      trainingDataset.unpersist()
--- End diff --

One consideration here is that we're unpersisting the training data only 
after all models (for a fold) are evaluated. This means the full dataset (train 
and validation) is in cluster memory throughout, whereas previously only one 
dataset would be in cluster memory at a time. It's possible the impact of this 
on resources may be greater than the saving on the driver from storing `1` 
instead of `numModels` models temporarily per fold?

It obviously depends on a lot of factors (dataset size, cluster resources, 
driver memory, model size, etc).
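
The trade-off above can be illustrated with a small Spark-free simulation. This is only a sketch under stated assumptions: `fit_model`, `evaluate` and the peak counters are hypothetical stand-ins, not Spark APIs, and the code only counts how many fitted models are alive on the driver at once under each loop shape.

```python
# Hypothetical stand-ins; nothing here is a Spark API.
def fit_model(params):
    """Pretend to train a model; returns an opaque 'model' object."""
    return {"params": params}


def evaluate(model):
    """Pretend to score a model on the validation split."""
    return float(model["params"])


def fit_all_then_evaluate(param_grid):
    """Old shape: fit every model first, then evaluate them.

    The training data could be released right after fitting, but all
    models are held on the driver at the same time.
    """
    models = [fit_model(p) for p in param_grid]
    peak_models = len(models)
    metrics = [evaluate(m) for m in models]
    return metrics, peak_models


def fit_and_evaluate_one_at_a_time(param_grid):
    """New shape: fit and evaluate one model per iteration.

    Only one model is held at a time, but the training data has to
    stay available until the loop finishes.
    """
    metrics = []
    peak_models = 0
    for p in param_grid:
        model = fit_model(p)
        peak_models = max(peak_models, 1)
        metrics.append(evaluate(model))
    return metrics, peak_models


grid = [1, 2, 3, 4]
old_metrics, old_peak = fit_all_then_evaluate(grid)
new_metrics, new_peak = fit_and_evaluate_one_at_a_time(grid)
```

Both shapes produce the same metrics; what differs is which resource stays occupied for longer, the driver (models) or the cluster (cached training data).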


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #18797: [SPARK-21523][ML] update breeze to 0.13.2 for an emergen...

2017-08-03 Thread srowen
Github user srowen commented on the issue:

https://github.com/apache/spark/pull/18797
  
I've figured out the problem, and I'm pretty sure it's a problem in the AFT 
test that was hidden until now. It runs AFTSurvivalRegression on this input:

```
+--------+-----+------+------+
|features|label|censor|weight|
+--------+-----+------+------+
|   [0.0]|  0.0|   0.0|   1.0|
|   [1.0]|  1.0|   0.0|   1.0|
|   [2.0]|  2.0|   0.0|   1.0|
|   [3.0]|  3.0|   0.0|   1.0|
|   [4.0]|  4.0|   0.0|   0.0|
+--------+-----+------+------+
```

The problem is one label is 0, but this is interpreted as a time to failure 
(I believe?). Somewhere the code takes the log of this value, gets a non-finite 
result (log(0) is -Infinity, which can turn into NaN in later arithmetic), and 
eventually causes the error per above.

I think we can just modify the test but wanted to see if that makes sense 
to @yanboliang  @zhengruifeng @BenFradet who have touched the AFT code?





[GitHub] spark issue #18802: [SPARK-18535][SPARK-19720][CORE][BACKPORT-2.1] Redact se...

2017-08-03 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18802
  
**[Test build #80207 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80207/testReport)**
 for PR 18802 at commit 
[`81dc26b`](https://github.com/apache/spark/commit/81dc26bd79dad088f533a6b8cc750e5c71abe378).





[GitHub] spark issue #18833: [SPARK-21625][SQL] sqrt(negative number) should be null.

2017-08-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18833
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80200/
Test PASSed.





[GitHub] spark issue #18833: [SPARK-21625][SQL] sqrt(negative number) should be null.

2017-08-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18833
  
Merged build finished. Test PASSed.





[GitHub] spark issue #18779: [SPARK-21580][SQL]Integers in aggregation expressions ar...

2017-08-03 Thread maropu
Github user maropu commented on the issue:

https://github.com/apache/spark/pull/18779
  
@viirya By the way, do we still need to support group/sort-by-ordinal in 
Dataset? If this PR is merged, the behaviour seems to change. I feel the syntax 
in Dataset is a little confusing.





[GitHub] spark issue #18833: [SPARK-21625][SQL] sqrt(negative number) should be null.

2017-08-03 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18833
  
**[Test build #80200 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80200/testReport)**
 for PR 18833 at commit 
[`f346f5b`](https://github.com/apache/spark/commit/f346f5b240f20b653d4a2c6eaf11660f7f7ff98b).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `abstract class UnaryNonnegativeExpression(f: Double => Double, name: 
String)`
  * `case class Sqrt(child: Expression) extends 
UnaryNonnegativeExpression(math.sqrt, \"SQRT\")`





[GitHub] spark issue #18779: [SPARK-21580][SQL]Integers in aggregation expressions ar...

2017-08-03 Thread maropu
Github user maropu commented on the issue:

https://github.com/apache/spark/pull/18779
  
@viirya great! thanks!





[GitHub] spark issue #18779: [SPARK-21580][SQL]Integers in aggregation expressions ar...

2017-08-03 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18779
  
**[Test build #80206 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80206/testReport)**
 for PR 18779 at commit 
[`4fce4ab`](https://github.com/apache/spark/commit/4fce4ab3da0ce425b4ba1807d165b4ab05a812b7).





[GitHub] spark issue #18779: [SPARK-21580][SQL]Integers in aggregation expressions ar...

2017-08-03 Thread 10110346
Github user 10110346 commented on the issue:

https://github.com/apache/spark/pull/18779
  
retest this please





[GitHub] spark pull request #18808: [SPARK-21605][BUILD] Let IntelliJ IDEA correctly ...

2017-08-03 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/18808





[GitHub] spark issue #18808: [SPARK-21605][BUILD] Let IntelliJ IDEA correctly detect ...

2017-08-03 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18808
  
**[Test build #3874 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3874/testReport)**
 for PR 18808 at commit 
[`d690a03`](https://github.com/apache/spark/commit/d690a03fc3b735054433b362ef2539af412bc4ff).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #14151: [SPARK-16496][SQL] Add wholetext as option for reading t...

2017-08-03 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14151
  
**[Test build #80205 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80205/testReport)**
 for PR 14151 at commit 
[`a918ccc`](https://github.com/apache/spark/commit/a918ccc2d9034370823fc87b4db9470be1508d82).





[GitHub] spark issue #18815: [SPARK-21609][WEB-UI]In the Master ui add "log directory...

2017-08-03 Thread guoxiaolongzte
Github user guoxiaolongzte commented on the issue:

https://github.com/apache/spark/pull/18815
  
Yes, I am also considering this security issue. If this only shows a path, 
then only authorized people can log on to the Spark cluster's Linux server to 
view the logs, which is relatively safe.





[GitHub] spark issue #14151: [SPARK-16496][SQL] Add wholetext as option for reading t...

2017-08-03 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14151
  
**[Test build #80203 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80203/testReport)**
 for PR 14151 at commit 
[`2edc7fe`](https://github.com/apache/spark/commit/2edc7fe4d0278ec91f0ca6051c426aba185f0019).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #14151: [SPARK-16496][SQL] Add wholetext as option for reading t...

2017-08-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14151
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80203/
Test FAILed.





[GitHub] spark issue #14151: [SPARK-16496][SQL] Add wholetext as option for reading t...

2017-08-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14151
  
Merged build finished. Test FAILed.





[GitHub] spark issue #18815: [SPARK-21609][WEB-UI]In the Master ui add "log directory...

2017-08-03 Thread srowen
Github user srowen commented on the issue:

https://github.com/apache/spark/pull/18815
  
I don't know if this is that important, so I'm not sure it's worth much 
code. Would people generally be able to access the logs anyway? Is there a 
security issue if I can see the master's logs and open them up to everyone? I 
don't know that this is a good idea.





[GitHub] spark issue #18815: [SPARK-21609][WEB-UI]In the Master ui add "log directory...

2017-08-03 Thread guoxiaolongzte
Github user guoxiaolongzte commented on the issue:

https://github.com/apache/spark/pull/18815
  
OK, I will work on it, along the lines of HBase: show the master and worker 
logs and provide online viewing. Is this okay?





[GitHub] spark issue #18797: [SPARK-21523][ML] update breeze to 0.13.2 for an emergen...

2017-08-03 Thread srowen
Github user srowen commented on the issue:

https://github.com/apache/spark/pull/18797
  
The only number that is not <= Double.PositiveInfinity is Double.NaN, because 
it has no ordering at all with respect to anything. So init must be NaN somehow.

It's called from LBFGS in Breeze, where the value is `if(state.iter == 0.0) 
1.0/norm(dir) else 1.0`. That should only be NaN if norm(dir) is NaN, which 
should only happen if the dir vector has a NaN element. And then so on, but I'm 
not seeing how the arguments from the Spark code cause this. The initial params 
are all 0.

It might still be a Breeze issue that's just now uncovered, but, haven't 
proven that yet.
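
The NaN reasoning above rests on IEEE 754 comparison rules, which the JVM and Python share. The following is a minimal sketch in plain Python rather than Breeze: `euclidean_norm` and the `direction` vector are hypothetical stand-ins for `norm(dir)` and the LBFGS search direction, chosen only to show how a single NaN element propagates into the initial step.

```python
import math

nan = float("nan")
inf = float("inf")

# Every ordered comparison involving NaN is False, so NaN is the one
# float for which `x <= inf` does not hold.
candidates = [-inf, -1.0, 0.0, 1.0, inf, nan]
not_le_inf = [x for x in candidates if not (x <= inf)]


def euclidean_norm(vec):
    """Plain 2-norm; a single NaN element makes the result NaN."""
    return math.sqrt(sum(v * v for v in vec))


# Hypothetical search direction containing one NaN element.
direction = [0.5, nan, -2.0]
init_step = 1.0 / euclidean_norm(direction)
```

Here `not_le_inf` contains only the NaN entry and `init_step` is itself NaN, which matches the chain of implications in the comment; it says nothing about where the NaN first enters in the real code.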





[GitHub] spark issue #18779: [SPARK-21580][SQL]Integers in aggregation expressions ar...

2017-08-03 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/18779
  
@maropu The PR is at #17770.





[GitHub] spark issue #18815: [SPARK-21609][WEB-UI]In the Master ui add "log directory...

2017-08-03 Thread srowen
Github user srowen commented on the issue:

https://github.com/apache/spark/pull/18815
  
OK, then how can this work? This won't render as a usable link.





[GitHub] spark issue #18815: [SPARK-21609][WEB-UI]In the Master ui add "log directory...

2017-08-03 Thread guoxiaolongzte
Github user guoxiaolongzte commented on the issue:

https://github.com/apache/spark/pull/18815
  
Yes, it is not an HTTP URI. At present it is only a path that tells the user 
where the logs are, to make it easier to look up the related log information on 
the Linux server.





[GitHub] spark issue #18804: [SPARK-21599][SQL] Collecting column statistics for data...

2017-08-03 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18804
  
**[Test build #80204 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80204/testReport)**
 for PR 18804 at commit 
[`a120357`](https://github.com/apache/spark/commit/a120357e1816501db7183c3d03c4dabd24db3284).





[GitHub] spark issue #18281: [SPARK-21027][ML][PYTHON] Added tunable parallelism to o...

2017-08-03 Thread MLnick
Github user MLnick commented on the issue:

https://github.com/apache/spark/pull/18281
  
I also think we can leave any potential improvements for parallelism on the 
Python side (as well as the test side if we come up with a good way of testing 
that fitting is actually being done in parallel) for a later PR.





[GitHub] spark issue #18281: [SPARK-21027][ML][PYTHON] Added tunable parallelism to o...

2017-08-03 Thread MLnick
Github user MLnick commented on the issue:

https://github.com/apache/spark/pull/18281
  
@ajaysaini725 Could you resolve merge conflicts and address the remaining 
outstanding review comments?

I left a few minor additional comments. Overall I think this is just about 
ready.





[GitHub] spark pull request #18281: [SPARK-21027][ML][PYTHON] Added tunable paralleli...

2017-08-03 Thread MLnick
Github user MLnick commented on a diff in the pull request:

https://github.com/apache/spark/pull/18281#discussion_r131104871
  
--- Diff: 
mllib/src/test/scala/org/apache/spark/ml/classification/OneVsRestSuite.scala ---
@@ -101,6 +101,44 @@ class OneVsRestSuite extends SparkFunSuite with 
MLlibTestSparkContext with Defau
 assert(expectedMetrics.confusionMatrix ~== ovaMetrics.confusionMatrix 
absTol 400)
   }
 
+  test("one-vs-rest: tuning parallelism does not change output") {
+val ovaPar1 = new OneVsRest()
+  .setClassifier(new LogisticRegression)
+
+val ovaModelPar1 = ovaPar1.fit(dataset)
+
+val transformedDatasetPar1 = ovaModelPar1.transform(dataset)
+
+val ovaResultsPar1 = transformedDatasetPar1.select("prediction", 
"label").rdd.map {
+  row => (row.getDouble(0), row.getDouble(1))
+}
+
+val ovaPar2 = new OneVsRest()
+  .setClassifier(new LogisticRegression)
+  .setParallelism(2)
+
+val ovaModelPar2 = ovaPar2.fit(dataset)
+
+val transformedDatasetPar2 = ovaModelPar2.transform(dataset)
+
+val ovaResultsPar2 = transformedDatasetPar2.select("prediction", 
"label").rdd.map {
+  row => (row.getDouble(0), row.getDouble(1))
+}
+
+val metricsPar1 = new MulticlassMetrics(ovaResultsPar1)
+val metricsPar2 = new MulticlassMetrics(ovaResultsPar2)
+assert(metricsPar1.confusionMatrix == metricsPar2.confusionMatrix)
+
+ovaModelPar1.models.zip(ovaModelPar2.models).foreach {
+  case (lrModel1: LogisticRegressionModel, lrModel2: 
LogisticRegressionModel) =>
+assert(lrModel1.coefficients === lrModel2.coefficients)
--- End diff --

Perhaps we should use the approx equal version for vectors and matrices 
here and above? It seems the test does pass, but perhaps that would be better, 
to avoid future flakiness for whatever reason. Also, we do so in the Python 
tests so it would be more consistent.
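
To illustrate why an approximate comparison is less fragile than exact equality, here is a minimal sketch in plain Python. `approx_equal` and the two coefficient lists are hypothetical and are not Spark's test utilities; the sketch only assumes an absolute-tolerance comparison in the spirit of the `~== ... absTol` assertion already used in this suite.

```python
import math


def approx_equal(a, b, abs_tol=1e-8):
    """True when two equal-length vectors agree element-wise within abs_tol."""
    if len(a) != len(b):
        return False
    return all(
        math.isclose(x, y, rel_tol=0.0, abs_tol=abs_tol) for x, y in zip(a, b)
    )


# Hypothetical coefficients from two runs; 0.1 + 0.2 differs from 0.3 by
# one rounding step, as a different summation order might produce.
coeffs_par1 = [0.1 + 0.2, 1.5, -3.0]
coeffs_par2 = [0.3, 1.5, -3.0]

exact_match = coeffs_par1 == coeffs_par2
approx_match = approx_equal(coeffs_par1, coeffs_par2)
```

With these values `exact_match` is false while `approx_match` is true, which is the kind of spurious failure a tolerance-based assertion guards against.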





[GitHub] spark pull request #18281: [SPARK-21027][ML][PYTHON] Added tunable paralleli...

2017-08-03 Thread MLnick
Github user MLnick commented on a diff in the pull request:

https://github.com/apache/spark/pull/18281#discussion_r131104233
  
--- Diff: python/pyspark/ml/classification.py ---
@@ -1517,20 +1518,23 @@ class OneVsRest(Estimator, OneVsRestParams, 
JavaMLReadable, JavaMLWritable):
 
 @keyword_only
 def __init__(self, featuresCol="features", labelCol="label", 
predictionCol="prediction",
- classifier=None):
+ classifier=None, parallelism=1):
 """
 __init__(self, featuresCol="features", labelCol="label", 
predictionCol="prediction", \
- classifier=None)
+ classifier=None, parallelism=1)
 """
 super(OneVsRest, self).__init__()
+self._setDefault(parallelism=1)
 kwargs = self._input_kwargs
 self._set(**kwargs)
 
 @keyword_only
 @since("2.0.0")
-def setParams(self, featuresCol=None, labelCol=None, 
predictionCol=None, classifier=None):
+def setParams(self, featuresCol="features", labelCol="label", 
predictionCol="prediction",
+  classifier=None, parallelism=1):
 """
-setParams(self, featuresCol=None, labelCol=None, 
predictionCol=None, classifier=None):
+setParams(self, featuresCol=None, labelCol=None, 
predictionCol=None, \
--- End diff --

The default args here in the doc should match the method (for 
`featuresCol`, `labelCol` and `predictionCol`)
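
One way to catch this kind of drift is to compare the docstring against the real signature. The sketch below is an assumption-laden illustration, not part of pyspark: `documented_defaults_match`, `set_params_stale` and `set_params_fixed` are hypothetical names, the functions are simplified to plain functions without `self`, and the check is a simple substring match on `name=default`.

```python
import inspect


def documented_defaults_match(func):
    """Check that every `name=default` pair of the signature is in the docstring."""
    # Normalise quotes so "features" in the docstring matches repr('features').
    doc = (func.__doc__ or "").replace('"', "'")
    for name, param in inspect.signature(func).parameters.items():
        if param.default is inspect.Parameter.empty:
            continue
        if f"{name}={param.default!r}" not in doc:
            return False
    return True


def set_params_stale(featuresCol="features", labelCol="label",
                     predictionCol="prediction", classifier=None, parallelism=1):
    """set_params_stale(featuresCol=None, labelCol=None, predictionCol=None,
    classifier=None, parallelism=1)"""


def set_params_fixed(featuresCol="features", labelCol="label",
                     predictionCol="prediction", classifier=None, parallelism=1):
    """set_params_fixed(featuresCol="features", labelCol="label",
    predictionCol="prediction", classifier=None, parallelism=1)"""


stale_ok = documented_defaults_match(set_params_stale)
fixed_ok = documented_defaults_match(set_params_fixed)
```

The stale docstring still advertises `None` defaults and fails the check, while the docstring that mirrors the method signature passes.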


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14151: [SPARK-16496][SQL] Add wholetext as option for reading t...

2017-08-03 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14151
  
**[Test build #80203 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80203/testReport)**
 for PR 14151 at commit 
[`2edc7fe`](https://github.com/apache/spark/commit/2edc7fe4d0278ec91f0ca6051c426aba185f0019).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #18816: [SPARK-21611][SQL]Error class name for log in sev...

2017-08-03 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/18816





[GitHub] spark issue #18815: [SPARK-21609][WEB-UI]In the Master ui add "log directory...

2017-08-03 Thread srowen
Github user srowen commented on the issue:

https://github.com/apache/spark/pull/18815
  
It's not an HTTP URI though, right? It's a path. I'm missing why this is 
browseable.





[GitHub] spark issue #18816: [SPARK-21611][SQL]Error class name for log in several cl...

2017-08-03 Thread srowen
Github user srowen commented on the issue:

https://github.com/apache/spark/pull/18816
  
Merged to master





[GitHub] spark pull request #18281: [SPARK-21027][ML][PYTHON] Added tunable paralleli...

2017-08-03 Thread MLnick
Github user MLnick commented on a diff in the pull request:

https://github.com/apache/spark/pull/18281#discussion_r131102727
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/classification/OneVsRest.scala ---
@@ -294,6 +296,18 @@ final class OneVsRest @Since("1.4.0") (
   @Since("1.5.0")
   def setPredictionCol(value: String): this.type = set(predictionCol, 
value)
 
+  /** @group expertGetParam */
+  override def getParallelism: Int = $(parallelism)
--- End diff --

This one can just go in the trait right?





[GitHub] spark issue #18829: [SPARK-21620][WEB-UI][CORE]Add metrics url in spark web ...

2017-08-03 Thread guoxiaolongzte
Github user guoxiaolongzte commented on the issue:

https://github.com/apache/spark/pull/18829
  
@srowen @ajbozarth
Please help review the code. Thanks.





[GitHub] spark issue #18815: [SPARK-21609][WEB-UI]In the Master ui add "log directory...

2017-08-03 Thread guoxiaolongzte
Github user guoxiaolongzte commented on the issue:

https://github.com/apache/spark/pull/18815
  
1. The master process information is as follows:
 /usr/java8/jdk/bin/java -cp 
/opt/ZDH/parcels/lib/spark_gxl/conf/:/opt/ZDH/parcels/lib/spark_gxl/jars/*:/opt/ZDH/parcels/lib/spark_gxl/conf/hdfs_conf/:/opt/ZDH/parcels/lib/spark_gxl/conf/yarn_conf
 -Dspark.ui.allowFramingFrom=ALLOW-FROM -DLOG_LEVEL=INFO -DROLE_NAME=master 
-DLOG_FILE=/data1/zdh/spark/logs/spark-root-master-zdh167.log -Xmx1024M 
org.apache.spark.deploy.master.Master --host zdh167 --port 7078 --webui-port 
8081

Log4j cannot write log files to HDFS.
The event log, by contrast, is written to HDFS through the HDFS client.

2. Currently only the log path is displayed in the UI. If possible, the 
details will be displayed in the future so that you can view them.
e.g.

![11](https://user-images.githubusercontent.com/26266482/28916845-1f737782-7876-11e7-8f9e-3323cedf5082.png)


![12](https://user-images.githubusercontent.com/26266482/28916863-2abea922-7876-11e7-9c81-94a1c3712af5.png)







[GitHub] spark issue #18668: [SPARK-21451][SQL]get `spark.hadoop.*` properties from s...

2017-08-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18668
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80198/
Test PASSed.





[GitHub] spark issue #18538: [SPARK-14516][ML] Adding ClusteringEvaluator with the im...

2017-08-03 Thread yanboliang
Github user yanboliang commented on the issue:

https://github.com/apache/spark/pull/18538
  
@gatorsmile Could you help to trigger the test job? It seems I can't do it 
now. Thanks.





[GitHub] spark issue #18668: [SPARK-21451][SQL]get `spark.hadoop.*` properties from s...

2017-08-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18668
  
Merged build finished. Test PASSed.





[GitHub] spark issue #18668: [SPARK-21451][SQL]get `spark.hadoop.*` properties from s...

2017-08-03 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18668
  
**[Test build #80198 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80198/testReport)**
 for PR 18668 at commit 
[`c629cc4`](https://github.com/apache/spark/commit/c629cc49b6af146860b3d7cecdbe4760f347e8c8).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request #18538: [SPARK-14516][ML] Adding ClusteringEvaluator with...

2017-08-03 Thread yanboliang
Github user yanboliang commented on a diff in the pull request:

https://github.com/apache/spark/pull/18538#discussion_r131100309
  
--- Diff: 
mllib/src/test/scala/org/apache/spark/ml/evaluation/ClusteringEvaluatorSuite.scala
 ---
@@ -0,0 +1,235 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.ml.evaluation
+
+import org.apache.spark.SparkFunSuite
+import org.apache.spark.ml.linalg.{Vectors, VectorUDT}
+import org.apache.spark.ml.param.ParamsSuite
+import org.apache.spark.ml.util.DefaultReadWriteTest
+import org.apache.spark.mllib.util.MLlibTestSparkContext
+import org.apache.spark.sql.Row
+import org.apache.spark.sql.types.{IntegerType, StructField, StructType}
+
+
+class ClusteringEvaluatorSuite
+  extends SparkFunSuite with MLlibTestSparkContext with 
DefaultReadWriteTest {
+
+  import testImplicits._
+
+  val dataset = Seq(Row(Vectors.dense(5.1, 3.5, 1.4, 0.2), 0),
+  Row(Vectors.dense(4.9, 3.0, 1.4, 0.2), 0),
+  Row(Vectors.dense(4.7, 3.2, 1.3, 0.2), 0),
+  Row(Vectors.dense(4.6, 3.1, 1.5, 0.2), 0),
+  Row(Vectors.dense(5.0, 3.6, 1.4, 0.2), 0),
+  Row(Vectors.dense(5.4, 3.9, 1.7, 0.4), 0),
+  Row(Vectors.dense(4.6, 3.4, 1.4, 0.3), 0),
+  Row(Vectors.dense(5.0, 3.4, 1.5, 0.2), 0),
+  Row(Vectors.dense(4.4, 2.9, 1.4, 0.2), 0),
+  Row(Vectors.dense(4.9, 3.1, 1.5, 0.1), 0),
+  Row(Vectors.dense(5.4, 3.7, 1.5, 0.2), 0),
+  Row(Vectors.dense(4.8, 3.4, 1.6, 0.2), 0),
+  Row(Vectors.dense(4.8, 3.0, 1.4, 0.1), 0),
+  Row(Vectors.dense(4.3, 3.0, 1.1, 0.1), 0),
+  Row(Vectors.dense(5.8, 4.0, 1.2, 0.2), 0),
+  Row(Vectors.dense(5.7, 4.4, 1.5, 0.4), 0),
+  Row(Vectors.dense(5.4, 3.9, 1.3, 0.4), 0),
+  Row(Vectors.dense(5.1, 3.5, 1.4, 0.3), 0),
+  Row(Vectors.dense(5.7, 3.8, 1.7, 0.3), 0),
+  Row(Vectors.dense(5.1, 3.8, 1.5, 0.3), 0),
+  Row(Vectors.dense(5.4, 3.4, 1.7, 0.2), 0),
+  Row(Vectors.dense(5.1, 3.7, 1.5, 0.4), 0),
+  Row(Vectors.dense(4.6, 3.6, 1.0, 0.2), 0),
+  Row(Vectors.dense(5.1, 3.3, 1.7, 0.5), 0),
+  Row(Vectors.dense(4.8, 3.4, 1.9, 0.2), 0),
+  Row(Vectors.dense(5.0, 3.0, 1.6, 0.2), 0),
+  Row(Vectors.dense(5.0, 3.4, 1.6, 0.4), 0),
+  Row(Vectors.dense(5.2, 3.5, 1.5, 0.2), 0),
+  Row(Vectors.dense(5.2, 3.4, 1.4, 0.2), 0),
+  Row(Vectors.dense(4.7, 3.2, 1.6, 0.2), 0),
+  Row(Vectors.dense(4.8, 3.1, 1.6, 0.2), 0),
+  Row(Vectors.dense(5.4, 3.4, 1.5, 0.4), 0),
+  Row(Vectors.dense(5.2, 4.1, 1.5, 0.1), 0),
+  Row(Vectors.dense(5.5, 4.2, 1.4, 0.2), 0),
+  Row(Vectors.dense(4.9, 3.1, 1.5, 0.1), 0),
+  Row(Vectors.dense(5.0, 3.2, 1.2, 0.2), 0),
+  Row(Vectors.dense(5.5, 3.5, 1.3, 0.2), 0),
+  Row(Vectors.dense(4.9, 3.1, 1.5, 0.1), 0),
+  Row(Vectors.dense(4.4, 3.0, 1.3, 0.2), 0),
+  Row(Vectors.dense(5.1, 3.4, 1.5, 0.2), 0),
+  Row(Vectors.dense(5.0, 3.5, 1.3, 0.3), 0),
+  Row(Vectors.dense(4.5, 2.3, 1.3, 0.3), 0),
+  Row(Vectors.dense(4.4, 3.2, 1.3, 0.2), 0),
+  Row(Vectors.dense(5.0, 3.5, 1.6, 0.6), 0),
+  Row(Vectors.dense(5.1, 3.8, 1.9, 0.4), 0),
+  Row(Vectors.dense(4.8, 3.0, 1.4, 0.3), 0),
+  Row(Vectors.dense(5.1, 3.8, 1.6, 0.2), 0),
+  Row(Vectors.dense(4.6, 3.2, 1.4, 0.2), 0),
+  Row(Vectors.dense(5.3, 3.7, 1.5, 0.2), 0),
+  Row(Vectors.dense(5.0, 3.3, 1.4, 0.2), 0),
+  Row(Vectors.dense(7.0, 3.2, 4.7, 1.4), 1),
+  Row(Vectors.dense(6.4, 3.2, 4.5, 1.5), 1),
+  Row(Vectors.dense(6.9, 3.1, 4.9, 1.5), 1),
+  Row(Vectors.dense(5.5, 2.3, 4.0, 1.3), 1),
+  Row(Vectors.dense(6.5, 2.8, 4.6, 1.5), 1),
+  Row(Vectors.dense(5.7, 2.8, 4.5, 1.3), 1),
+  Row(Vectors.dense(6.3, 3.3, 4.7, 1.6), 1),
+  Row(Vectors.dense(4.9, 2.4, 3.3, 1.0), 1),
+  Row(Vectors.dense(6.6, 2.9, 4.6, 1.3), 1),
+  Row(Vectors.dense(5.2, 

[GitHub] spark pull request #14151: [SPARK-16496][SQL] Add wholetext as option for re...

2017-08-03 Thread ScrapCodes
Github user ScrapCodes commented on a diff in the pull request:

https://github.com/apache/spark/pull/14151#discussion_r131100257
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/HadoopFileWholeTextReader.scala
 ---
@@ -0,0 +1,57 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.datasources
+
+import java.io.Closeable
+import java.net.URI
+
+import org.apache.hadoop.conf.Configuration
+import org.apache.hadoop.fs.Path
+import org.apache.hadoop.io.Text
+import org.apache.hadoop.mapreduce._
+import org.apache.hadoop.mapreduce.lib.input.CombineFileSplit
+import org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl
+
+import org.apache.spark.input.WholeTextFileRecordReader
+
+/**
+ * An adaptor from a [[PartitionedFile]] to an [[Iterator]] of [[Text]], 
which is all of the lines
+ * in that file.
+ */
+class HadoopFileWholeTextReader(file: PartitionedFile, conf: Configuration)
--- End diff --

Thank you for catching this.





[GitHub] spark issue #18816: [SPARK-21611][SQL]Error class name for log in several cl...

2017-08-03 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18816
  
**[Test build #3876 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3876/testReport)**
 for PR 18816 at commit 
[`0b1b898`](https://github.com/apache/spark/commit/0b1b898512a7070251d121a9ed35f1dc9df5b623).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #18832: [SPARK-21623][ML]fix RF doc

2017-08-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18832
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80199/
Test PASSed.





[GitHub] spark issue #18832: [SPARK-21623][ML]fix RF doc

2017-08-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18832
  
Merged build finished. Test PASSed.





[GitHub] spark issue #18628: [SPARK-18061][ThriftServer] Add spnego auth support for ...

2017-08-03 Thread steveloughran
Github user steveloughran commented on the issue:

https://github.com/apache/spark/pull/18628
  
Thanks for making sure this is consistent with other uses of 
Configuration.get(); consistency is critical here.





[GitHub] spark issue #18832: [SPARK-21623][ML]fix RF doc

2017-08-03 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18832
  
**[Test build #80199 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80199/testReport)**
 for PR 18832 at commit 
[`83c7504`](https://github.com/apache/spark/commit/83c75043fee2a20f1eb6298bd2dab1259409c3ef).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #18815: [SPARK-21609][WEB-UI]In the Master ui add "log directory...

2017-08-03 Thread srowen
Github user srowen commented on the issue:

https://github.com/apache/spark/pull/18815
  
Hm, I thought so, even in standalone mode, but I don't know.
OK, if it's a file URI, is that going to be browseable?





[GitHub] spark pull request #18668: [SPARK-21451][SQL]get `spark.hadoop.*` properties...

2017-08-03 Thread steveloughran
Github user steveloughran commented on a diff in the pull request:

https://github.com/apache/spark/pull/18668#discussion_r131095350
  
--- Diff: 
sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkSQLCLIDriver.scala
 ---
@@ -50,6 +50,7 @@ private[hive] object SparkSQLCLIDriver extends Logging {
   private val prompt = "spark-sql"
   private val continuedPrompt = "".padTo(prompt.length, ' ')
   private var transport: TSocket = _
+  private final val SPARK_HADOOP_PROP_PREFIX = "spark.hadoop."
--- End diff --

good point. I see `spark.hive` in some of my configs
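
A minimal sketch of how a prefix such as the one in the diff might be used to pick out Hadoop-side keys, assuming the properties are available as a plain `Map`. This is not the pull request's implementation; `hadoopProps` and the sample keys (including the `spark.hive.` one) are hypothetical. It also shows that keys under a different prefix are not matched.

```scala
object SparkHadoopPropsSketch {
  // Prefix value taken from the diff above.
  val SparkHadoopPropPrefix = "spark.hadoop."

  /** Keeps only keys that start with the prefix and strips it from each key. */
  def hadoopProps(sparkProps: Map[String, String]): Map[String, String] =
    sparkProps.collect {
      case (key, value) if key.startsWith(SparkHadoopPropPrefix) =>
        key.stripPrefix(SparkHadoopPropPrefix) -> value
    }

  def main(args: Array[String]): Unit = {
    val props = Map(
      "spark.hadoop.abc.def" -> "xyz",
      "spark.hive.some.key" -> "value", // hypothetical key; not matched by the prefix
      "spark.eventLog.enabled" -> "false"
    )
    println(hadoopProps(props))
  }
}
```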





[GitHub] spark issue #18528: [SPARK-13041][Mesos] Adds sandbox uri to spark dispatche...

2017-08-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18528
  
Merged build finished. Test PASSed.





[GitHub] spark issue #18528: [SPARK-13041][Mesos] Adds sandbox uri to spark dispatche...

2017-08-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18528
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80201/
Test PASSed.





[GitHub] spark issue #18528: [SPARK-13041][Mesos] Adds sandbox uri to spark dispatche...

2017-08-03 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18528
  
**[Test build #80201 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80201/testReport)**
 for PR 18528 at commit 
[`2dbf2e8`](https://github.com/apache/spark/commit/2dbf2e89d338ef82d29b58663dccbeffa6956415).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request #18668: [SPARK-21451][SQL]get `spark.hadoop.*` properties...

2017-08-03 Thread steveloughran
Github user steveloughran commented on a diff in the pull request:

https://github.com/apache/spark/pull/18668#discussion_r131094720
  
--- Diff: docs/configuration.md ---
@@ -2335,5 +2335,61 @@ The location of these configuration files varies 
across Hadoop versions, but
 a common location is inside of `/etc/hadoop/conf`. Some tools create
 configurations on-the-fly, but offer a mechanisms to download copies of 
them.
 
-To make these files visible to Spark, set `HADOOP_CONF_DIR` in 
`$SPARK_HOME/spark-env.sh`
+To make these files visible to Spark, set `HADOOP_CONF_DIR` in 
`$SPARK_HOME/conf/spark-env.sh`
 to a location containing the configuration files.
+
+# Custom Hadoop/Hive Configuration
+
+If your Spark applications interacting with Hadoop, Hive, or both, there 
are probably Hadoop/Hive
+configuration files in Spark's class path.
+
+Multiple running applications might require different Hadoop/Hive client 
side configurations.
+You can copy and modify `hdfs-site.xml`, `core-site.xml`, `yarn-site.xml`, 
`hive-site.xml` in
+Spark's class path for each application, but it is not very convenient and 
these
+files are best to be shared with common properties to avoid hard-coding 
certain configurations.
+
+The better choice is to use spark hadoop properties in the form of 
`spark.hadoop.*`. 
+They can be considered as same as normal spark properties which can be set 
in `$SPARK_HOME/conf/spark-defalut.conf`
+
+In some cases, you may want to avoid hard-coding certain configurations in 
a `SparkConf`. For
+instance, Spark allows you to simply create an empty conf and set 
spark/spark hadoop properties.
+
+{% highlight scala %}
+val conf = new SparkConf().set("spark.hadoop.abc.def","xyz")
+val sc = new SparkContext(conf)
+{% endhighlight %}
+
+Also, you can modify or add configurations at runtime:
+{% highlight bash %}
+./bin/spark-submit \ 
+  --name "My app" \ 
+  --master local[4] \  
+  --conf spark.eventLog.enabled=false \ 
+  --conf "spark.executor.extraJavaOptions=-XX:+PrintGCDetails 
-XX:+PrintGCTimeStamps" \ 
+  --conf spark.hadoop.abc.def=xyz \ 
+  myApp.jar
+{% endhighlight %}
+
+## Typical Hadoop/Hive Configurations
+
+
+
+  spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version
+  1
+  
+The file output committer algorithm version, valid algorithm version 
number: 1 or 2.
+Version 2 may have better performance, but version 1 may handle 
failures better in certain situations,
+as per <a href="https://issues.apache.org/jira/browse/MAPREDUCE-4815">MAPREDUCE-4815</a>.
+  
+
+
+
+  spark.hadoop.fs.hdfs.impl.disable.cache
--- End diff --

This is a pretty dangerous one to point people at, especially since it's 
fixed in future Hadoop versions and backported to some distros. Creating a new 
HDFS client on every worker can get very expensive if you have a Spark process 
with many threads, all fielding work from the same user (thread pools, IPC 
connections, etc.).
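
The cost can be illustrated with a small simulation. This is only a sketch under stated assumptions: it does not use Hadoop's `FileSystem` API, and `FakeClient`, `ClientFactory`, and `clientsCreated` are hypothetical names that model a per-user client cache being switched on or off.

```scala
import scala.collection.mutable

object ClientCacheSketch {
  // Hypothetical stand-in for an expensive filesystem client.
  final class FakeClient(val user: String)

  final class ClientFactory(cacheEnabled: Boolean) {
    private val cache = mutable.Map.empty[String, FakeClient]
    // Counts how many clients were actually constructed.
    var created = 0

    def get(user: String): FakeClient = synchronized {
      if (cacheEnabled) cache.getOrElseUpdate(user, create(user))
      else create(user)
    }

    private def create(user: String): FakeClient = {
      created += 1
      new FakeClient(user)
    }
  }

  /** Issues `requests` lookups for the same user and reports constructions. */
  def clientsCreated(cacheEnabled: Boolean, requests: Int): Int = {
    val factory = new ClientFactory(cacheEnabled)
    (1 to requests).foreach(_ => factory.get("alice"))
    factory.created
  }

  def main(args: Array[String]): Unit = {
    println(clientsCreated(cacheEnabled = true, requests = 100))  // 1
    println(clientsCreated(cacheEnabled = false, requests = 100)) // 100
  }
}
```

In the simulation, disabling the cache turns one construction into one per request, which is the pattern the comment warns about when each real client also owns thread pools and connections.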





[GitHub] spark issue #18815: [SPARK-21609][WEB-UI]In the Master ui add "log directory...

2017-08-03 Thread guoxiaolongzte
Github user guoxiaolongzte commented on the issue:

https://github.com/apache/spark/pull/18815
  
@srowen 
Are you sure that the master and worker logs can be stored in hdfs?
Spark's master and worker logs are generated by log4j.





[GitHub] spark pull request #18668: [SPARK-21451][SQL]get `spark.hadoop.*` properties...

2017-08-03 Thread steveloughran
Github user steveloughran commented on a diff in the pull request:

https://github.com/apache/spark/pull/18668#discussion_r131093892
  
--- Diff: docs/configuration.md ---
@@ -2335,5 +2335,61 @@ The location of these configuration files varies 
across Hadoop versions, but
 a common location is inside of `/etc/hadoop/conf`. Some tools create
 configurations on-the-fly, but offer a mechanisms to download copies of 
them.
 
-To make these files visible to Spark, set `HADOOP_CONF_DIR` in 
`$SPARK_HOME/spark-env.sh`
+To make these files visible to Spark, set `HADOOP_CONF_DIR` in 
`$SPARK_HOME/conf/spark-env.sh`
 to a location containing the configuration files.
+
+# Custom Hadoop/Hive Configuration
+
+If your Spark applications interacting with Hadoop, Hive, or both, there 
are probably Hadoop/Hive
+configuration files in Spark's class path.
+
+Multiple running applications might require different Hadoop/Hive client 
side configurations.
+You can copy and modify `hdfs-site.xml`, `core-site.xml`, `yarn-site.xml`, 
`hive-site.xml` in
+Spark's class path for each application, but it is not very convenient and 
these
+files are best to be shared with common properties to avoid hard-coding 
certain configurations.
--- End diff --

"best shared"

You can't do that anyway on a production Spark on YARN cluster, because if 
you did, lots of other things would break. How about:

```
In a Spark cluster running on YARN, these configuration files are set 
cluster-wide, and cannot safely be changed by the application.
```





[GitHub] spark pull request #18668: [SPARK-21451][SQL]get `spark.hadoop.*` properties...

2017-08-03 Thread steveloughran
Github user steveloughran commented on a diff in the pull request:

https://github.com/apache/spark/pull/18668#discussion_r131093320
  
--- Diff: docs/configuration.md ---
@@ -2335,5 +2335,61 @@ The location of these configuration files varies 
across Hadoop versions, but
 a common location is inside of `/etc/hadoop/conf`. Some tools create
 configurations on-the-fly, but offer a mechanisms to download copies of 
them.
 
-To make these files visible to Spark, set `HADOOP_CONF_DIR` in 
`$SPARK_HOME/spark-env.sh`
+To make these files visible to Spark, set `HADOOP_CONF_DIR` in 
`$SPARK_HOME/conf/spark-env.sh`
 to a location containing the configuration files.
+
+# Custom Hadoop/Hive Configuration
+
+If your Spark applications interacting with Hadoop, Hive, or both, there 
are probably Hadoop/Hive
--- End diff --

s/applications/application is/





[GitHub] spark issue #18804: [SPARK-21599][SQL] Collecting column statistics for data...

2017-08-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18804
  
Merged build finished. Test PASSed.





[GitHub] spark issue #17357: [SPARK-20025][CORE] Ignore SPARK_LOCAL* env, while deplo...

2017-08-03 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/17357
  
**[Test build #80202 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80202/testReport)**
 for PR 17357 at commit 
[`dc9cd31`](https://github.com/apache/spark/commit/dc9cd31ce20dbf5fad28a031b0989084ca671f32).





[GitHub] spark issue #18804: [SPARK-21599][SQL] Collecting column statistics for data...

2017-08-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18804
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80196/
Test PASSed.





[GitHub] spark issue #18779: [SPARK-21580][SQL]Integers in aggregation expressions ar...

2017-08-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18779
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80197/
Test FAILed.





[GitHub] spark issue #18779: [SPARK-21580][SQL]Integers in aggregation expressions ar...

2017-08-03 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18779
  
**[Test build #80197 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80197/testReport)**
 for PR 18779 at commit 
[`4fce4ab`](https://github.com/apache/spark/commit/4fce4ab3da0ce425b4ba1807d165b4ab05a812b7).
 * This patch **fails PySpark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #18804: [SPARK-21599][SQL] Collecting column statistics for data...

2017-08-03 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18804
  
**[Test build #80196 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80196/testReport)**
 for PR 18804 at commit 
[`c1ab569`](https://github.com/apache/spark/commit/c1ab569f7960846262de20340e14cf3ad939c448).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark issue #18779: [SPARK-21580][SQL]Integers in aggregation expressions ar...

2017-08-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/18779
  
Merged build finished. Test FAILed.





[GitHub] spark issue #18779: [SPARK-21580][SQL]Integers in aggregation expressions ar...

2017-08-03 Thread maropu
Github user maropu commented on the issue:

https://github.com/apache/spark/pull/18779
  
ok, thanks!





[GitHub] spark issue #18528: [SPARK-13041][Mesos] Adds sandbox uri to spark dispatche...

2017-08-03 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18528
  
**[Test build #80201 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80201/testReport)**
 for PR 18528 at commit 
[`2dbf2e8`](https://github.com/apache/spark/commit/2dbf2e89d338ef82d29b58663dccbeffa6956415).





[GitHub] spark issue #18528: [SPARK-13041][Mesos] Adds sandbox uri to spark dispatche...

2017-08-03 Thread skonto
Github user skonto commented on the issue:

https://github.com/apache/spark/pull/18528
  
@srowen fixed, ready for merge.





[GitHub] spark pull request #18811: [SPARK-21604][SQL] if the object extends Logging,...

2017-08-03 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/18811





[GitHub] spark issue #18816: [SPARK-21611][SQL]Error class name for log in several cl...

2017-08-03 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/18816
  
**[Test build #3876 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3876/testReport)**
 for PR 18816 at commit 
[`0b1b898`](https://github.com/apache/spark/commit/0b1b898512a7070251d121a9ed35f1dc9df5b623).





[GitHub] spark issue #18745: [SPARK-21544][DEPLOY] Tests jar of some module should no...

2017-08-03 Thread srowen
Github user srowen commented on the issue:

https://github.com/apache/spark/pull/18745
  
Ping @caneGuy -- adding `[test-maven]` will let us also verify this passes 
the Maven build





[GitHub] spark issue #18833: [SPARK-21625][SQL] sqrt(negative number) should be null.

2017-08-03 Thread maropu
Github user maropu commented on the issue:

https://github.com/apache/spark/pull/18833
  
PostgreSQL rejects `sqrt` of a negative number:
```
postgres=# select sqrt(3);
       sqrt       
------------------
 1.73205080756888
(1 row)

postgres=# select sqrt(-1);
ERROR:  cannot take square root of a negative number
```
So another solution is to fail during analysis.
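
For comparison, the two behaviors discussed here (returning null, as the PR title proposes, versus raising an error, as PostgreSQL does) can be sketched in plain Python. This is only an illustration and not Spark code; the function names `sqrt_null` and `sqrt_strict` are hypothetical.

```python
import math
from typing import Optional


def sqrt_null(x: float) -> Optional[float]:
    """Return None (standing in for SQL NULL) when the input is negative."""
    return math.sqrt(x) if x >= 0 else None


def sqrt_strict(x: float) -> float:
    """Raise for negative input, mirroring the PostgreSQL error message above."""
    if x < 0:
        raise ValueError("cannot take square root of a negative number")
    return math.sqrt(x)


print(sqrt_null(4.0))   # 2.0
print(sqrt_null(-1.0))  # None
```

The null-returning variant lets a query over mixed data finish, while the strict variant surfaces bad input immediately; which is preferable is exactly the trade-off raised in this thread.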





[GitHub] spark issue #18811: [SPARK-21604][SQL] if the object extends Logging, i sugg...

2017-08-03 Thread srowen
Github user srowen commented on the issue:

https://github.com/apache/spark/pull/18811
  
merged to master





[GitHub] spark issue #18779: [SPARK-21580][SQL]Integers in aggregation expressions ar...

2017-08-03 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/18779
  
@maropu I opened a PR earlier that solves that. For various reasons it will be
merged in 2.3. I am away from my laptop; I will link to it once I have access again.



On Aug 3, 2017 5:07 PM, "Takeshi Yamamuro"  wrote:

@gatorsmile @viirya I looked into why it applied some analyzer
rules into already-analyzed plans and I noticed that some rules used
transform/transformUp instead of resolveOperators in apply.
SubstituteUnresolvedOrdinals also doesn't use resolveOperators, so the rule
is applied again into an already-analyzed plan master...maropu:SPARK-21580-3.







[GitHub] spark issue #18811: [SPARK-21604][SQL] if the object extends Logging, i sugg...

2017-08-03 Thread zuotingbing
Github user zuotingbing commented on the issue:

https://github.com/apache/spark/pull/18811
  
OK, done. Thanks @srowen.





[GitHub] spark issue #18779: [SPARK-21580][SQL]Integers in aggregation expressions ar...

2017-08-03 Thread maropu
Github user maropu commented on the issue:

https://github.com/apache/spark/pull/18779
  
@gatorsmile @viirya I looked into why some analyzer rules are applied to 
already-analyzed plans, and I noticed that some rules use 
`transform`/`transformUp` instead of `resolveOperators` in `apply`. 
`SubstituteUnresolvedOrdinals` also doesn't use `resolveOperators`, so the rule 
is applied again to an already-analyzed plan: 
https://github.com/apache/spark/compare/master...maropu:SPARK-21580-3.
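
The difference described above can be illustrated with a toy model in Python. This is a sketch under stated assumptions, not Spark's implementation: `Plan`, its `analyzed` flag, and the functions `transform_up` and `resolve_operators` are hypothetical stand-ins that only mimic the idea of skipping subtrees that have already been analyzed.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Plan:
    name: str
    children: List["Plan"] = field(default_factory=list)
    analyzed: bool = False  # hypothetical flag: subtree already went through analysis


Rule = Callable[[Plan], Plan]


def transform_up(plan: Plan, rule: Rule) -> Plan:
    """Apply the rule to every node bottom-up, ignoring the analyzed flag."""
    new_children = [transform_up(c, rule) for c in plan.children]
    return rule(Plan(plan.name, new_children, plan.analyzed))


def resolve_operators(plan: Plan, rule: Rule) -> Plan:
    """Same traversal, but return already-analyzed subtrees untouched."""
    if plan.analyzed:
        return plan
    new_children = [resolve_operators(c, rule) for c in plan.children]
    return rule(Plan(plan.name, new_children, plan.analyzed))


def recording_rule(log: List[str]) -> Rule:
    """Build an identity rule that records the name of each node it visits."""
    def rule(node: Plan) -> Plan:
        log.append(node.name)
        return node
    return rule


# An unanalyzed Project on top of an already-analyzed Aggregate.
root = Plan("Project", [Plan("Aggregate", analyzed=True)])

seen_transform: List[str] = []
transform_up(root, recording_rule(seen_transform))

seen_resolve: List[str] = []
resolve_operators(root, recording_rule(seen_resolve))

print(seen_transform)  # ['Aggregate', 'Project']
print(seen_resolve)    # ['Project']
```

In this model the rule reaches the analyzed `Aggregate` node only through `transform_up`, which is the kind of re-application the comment reports.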





[GitHub] spark issue #18607: [SPARK-21362][SQL][Adding Apache Drill JDBC Dialect]

2017-08-03 Thread viirya
Github user viirya commented on the issue:

https://github.com/apache/spark/pull/18607
  
@radford1 Have you added the test as suggested by @gatorsmile?





[GitHub] spark issue #18832: [SPARK-21623][ML]fix RF doc

2017-08-03 Thread srowen
Github user srowen commented on the issue:

https://github.com/apache/spark/pull/18832
  
@sethah 





[GitHub] spark pull request #18821: [SPARK-21615][ML][MLlib][DOCS] Fix broken redirec...

2017-08-03 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/18821




