[GitHub] spark pull request: [SPARK-1405] [mllib] Latent Dirichlet Allocati...

2015-01-14 Thread jkbradley
GitHub user jkbradley opened a pull request:

https://github.com/apache/spark/pull/4047

[SPARK-1405] [mllib] Latent Dirichlet Allocation (LDA) using EM

**This PR introduces an API + simple implementation for Latent Dirichlet 
Allocation (LDA).**

The [design doc for this 
PR](https://docs.google.com/document/d/1kSsDqTeZMEB94Bs4GTd0mvdAmduvZSSkpoSfn-seAzo)
 has been updated since I initially posted it.  In particular, see the API and 
Planning for the Future sections.

## Goals

* Settle on a public API which may eventually include:
  * more inference algorithms
  * more options / functionality
* Have an initial easy-to-understand implementation which others may 
improve.
* This is NOT intended to support every topic model out there.  However, if 
there are suggestions for making this extensible or pluggable in the future, 
that could be nice, as long as it does not complicate the API or implementation 
too much.
* This may not be very scalable currently.  It will be important to check 
and improve accuracy.  For correctness of the implementation, please check 
against the Asuncion et al. (2009) paper in the design doc.

## Sketch of contents of this PR

**Dependency: This makes MLlib depend on GraphX.**

Files and classes:
* LDA.scala (441 lines):
  * class LDA (main estimator class)
  * LDA.Document  (text + document ID)
* LDAModel.scala (266 lines)
  * abstract class LDAModel
  * class LocalLDAModel
  * class DistributedLDAModel
* LDAExample.scala (245 lines): script to run LDA + a simple (private) 
Tokenizer
* LDASuite.scala (144 lines)

Data/model representation and algorithm:
* Data/model: Uses GraphX, with term vertices + document vertices (see the 
sketch after this list)
* Algorithm: EM, following [Asuncion, Welling, Smyth, and Teh.  On 
Smoothing and Inference for Topic Models.  UAI, 
2009.](http://arxiv-web3.library.cornell.edu/abs/1205.2662v1)
* For more details, please see the description in the “DEVELOPERS NOTE” 
in LDA.scala
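
To make the graph representation above concrete, here is a minimal sketch 
(hypothetical names; not code from this PR) of building a bipartite 
term-document graph in GraphX, with edges weighted by term counts:

```scala
import org.apache.spark.graphx.{Edge, Graph}
import org.apache.spark.rdd.RDD

// Hypothetical sketch: one vertex per document and one per term (the caller
// keeps the two ID ranges disjoint), with an edge weighted by how often the
// term occurs in the document. EM then updates per-vertex topic counts by
// passing messages along these edges.
def buildCorpusGraph(termCounts: RDD[(Long, Long, Double)]): Graph[Double, Double] = {
  val edges = termCounts.map { case (docId, termId, count) =>
    Edge(docId, termId, count)
  }
  Graph.fromEdges(edges, defaultValue = 0.0) // vertex attributes start at 0.0
}
```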

## Design notes

Please refer to the JIRA for more discussion + the [design doc for this 
PR](https://docs.google.com/document/d/1kSsDqTeZMEB94Bs4GTd0mvdAmduvZSSkpoSfn-seAzo)

Here, I list the main changes AFTER the design doc was posted.

Design decisions:
* logLikelihood() computes the log likelihood of the data and the current 
point estimate of parameters.  This is different from the likelihood of the 
data given the hyperparameters, which would be harder to compute.  I’d 
describe the current approach as more frequentist, whereas the harder approach 
would be more Bayesian.
* The current API takes Documents as token count vectors.  I believe there 
should be an extended API taking RDD[String] or RDD[Array[String]] in a future 
PR.  I have sketched this out in the design doc (as well as handier versions of 
getTopics returning Strings).  A usage sketch follows this list.
* Hyperparameters should be set differently for different 
inference/learning algorithms.  See Asuncion et al. (2009) in the design doc 
for a good demonstration.  I encourage good behavior via defaults and warning 
messages.
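
To make the count-vector API above concrete, a minimal usage sketch (the 
`Document` signature matches this PR; `sc`, `vocabSize`, and the `setK`/`run` 
builder calls are assumptions based on the design doc):

```scala
import org.apache.spark.mllib.clustering.LDA
import org.apache.spark.mllib.clustering.LDA.Document
import org.apache.spark.mllib.linalg.Vectors

// Each Document pairs a term-count vector with a unique document ID.
val docs = sc.parallelize(Seq(
  Document(Vectors.sparse(vocabSize, Seq((0, 2.0), (3, 1.0))), id = 0L),
  Document(Vectors.sparse(vocabSize, Seq((1, 1.0), (2, 4.0))), id = 1L)))

// Assumed builder-style estimator: k topics, EM inference as described above.
val ldaModel = new LDA().setK(5).run(docs)
```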

Items planned for future PRs:
* perplexity
* API taking Strings

## Questions for reviewers

* Should LDA be called LatentDirichletAllocation (and LDAModel be 
LatentDirichletAllocationModel)?
  * Pro: We may someday want LinearDiscriminantAnalysis.
  * Con: Very long names

* Should LDA reside in clustering?  Or do we want a sub-package?
  * mllib.topicmodel
  * mllib.clustering.topicmodel

* Does the API seem reasonable and extensible?

* Unit tests:
  * Should there be a test which checks clustering results?  E.g., train 
on a small, fake dataset with 2 very distinct topics/clusters, and ensure LDA 
finds those 2 topics/clusters.  Does that sound useful or too flaky?

## Other notes

This has not been tested much for scaling.  I have run it on a laptop for 
200 iterations on a 5MB dataset with 1000 terms and 5 topics.  Running it for 
500 iterations made it fail because of GC problems.  Future PRs will need to 
improve the scaling.

## Thanks to…

* @dlwh  for the initial implementation
  * + @jegonzal  for some code in the initial implementation
* The many contributors towards topic model implementations in Spark which 
were referenced as a basis for this PR: @akopich @witgo @yinxusen @dlwh 
@EntilZha @jegonzal  @IlyaKozlov

CC: @mengxr


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jkbradley/spark davidhall-lda

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/4047.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes 

[GitHub] spark pull request: SPARK-4746 make it easy to skip IntegrationTes...

2015-01-14 Thread squito
Github user squito commented on the pull request:

https://github.com/apache/spark/pull/4048#issuecomment-69990018
  
Oh, good point Marcelo -- I forgot to add that I've only done this for 
`core` in this PR.  I wanted to ask others whether it's worthwhile to do in the 
other projects before I go digging into each one of them.





[GitHub] spark pull request: [SPARK-2309][MLlib] Multinomial Logistic Regre...

2015-01-14 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3833#issuecomment-69991087
  
  [Test build #25565 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25565/consoleFull)
 for   PR 3833 at commit 
[`7ac4dfc`](https://github.com/apache/spark/commit/7ac4dfc4a41b20c97c29fdf60045aca64fe08a6f).
 * This patch **does not merge cleanly**.





[GitHub] spark pull request: [SPARK-1405] [mllib] Latent Dirichlet Allocati...

2015-01-14 Thread jkbradley
Github user jkbradley commented on a diff in the pull request:

https://github.com/apache/spark/pull/4047#discussion_r22963462
  
--- Diff: 
examples/src/main/scala/org/apache/spark/examples/mllib/LDAExample.scala ---
@@ -0,0 +1,244 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.examples.mllib
+
+import scala.collection.mutable.ArrayBuffer
+
+import java.text.BreakIterator
+
+import scopt.OptionParser
+
+import org.apache.log4j.{Level, Logger}
+
+import org.apache.spark.{SparkContext, SparkConf}
+import org.apache.spark.SparkContext._
+import org.apache.spark.mllib.clustering.LDA
+import org.apache.spark.mllib.clustering.LDA.Document
+import org.apache.spark.mllib.linalg.SparseVector
+import org.apache.spark.rdd.RDD
+
+
+/**
+ * An example Latent Dirichlet Allocation (LDA) app. Run with
+ * {{{
+ * ./bin/run-example mllib.DenseKMeans [options] input
--- End diff --

Thanks!





[GitHub] spark pull request: [SPARK-2309][MLlib] Multinomial Logistic Regre...

2015-01-14 Thread dbtsai
Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/3833#discussion_r22967437
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/classification/LogisticRegression.scala
 ---
@@ -61,20 +67,70 @@ class LogisticRegressionModel (
 
   override protected def predictPoint(dataMatrix: Vector, weightMatrix: 
Vector,
   intercept: Double) = {
-val margin = weightMatrix.toBreeze.dot(dataMatrix.toBreeze) + intercept
-val score = 1.0 / (1.0 + math.exp(-margin))
-threshold match {
-  case Some(t) => if (score > t) 1.0 else 0.0
-  case None => score
+// If dataMatrix and weightMatrix have the same dimension, it's binary logistic regression.
+if (dataMatrix.size == weightMatrix.size) {
+  val margin = dot(weights, dataMatrix) + intercept
+  val score = 1.0 / (1.0 + math.exp(-margin))
+  threshold match {
+    case Some(t) => if (score > t) 1.0 else 0.0
+    case None => score
+  }
+} else {
+  val dataWithBiasSize = weightMatrix.size / (nClasses - 1)
+  val dataWithBias = if (dataWithBiasSize == dataMatrix.size) {
+    dataMatrix
+  } else {
+    assert(dataMatrix.size + 1 == dataWithBiasSize)
+    MLUtils.appendBias(dataMatrix)
+  }

+  val margins = Array.ofDim[Double](nClasses)

+  val weightsArray = weights match {
+    case dv: DenseVector => dv.values
+    case _ =>
+      throw new IllegalArgumentException(
+        s"weights only supports dense vector but got type ${weights.getClass}.")
+  }

+  var i = 0
+  while (i < nClasses - 1) {
--- End diff --

There is `margins(i + 1) = margin`, and the first `margins(0)` == 0, so using 
`(0 until nClasses).map` would require a couple more if statements. I changed it 
to a for loop since it's not a tight loop.
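
To illustrate the loop under discussion, a self-contained sketch (hypothetical 
names, not the PR's code) of pivot-based multinomial prediction, where 
`margins(0)` stays 0 for the reference class:

```scala
// Flattened weights of length (nClasses - 1) * n for n features; class 0 is
// the pivot with margin 0, and class i + 1 gets the margin from weight row i.
def predictClass(features: Array[Double], weights: Array[Double], nClasses: Int): Int = {
  val n = features.length
  val margins = Array.ofDim[Double](nClasses) // margins(0) == 0.0 for the pivot
  var i = 0
  while (i < nClasses - 1) {
    var margin = 0.0
    var j = 0
    while (j < n) {
      margin += weights(i * n + j) * features(j)
      j += 1
    }
    margins(i + 1) = margin
    i += 1
  }
  margins.zipWithIndex.maxBy(_._1)._2 // argmax over margins is the prediction
}
```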





[GitHub] spark pull request: [SPARK-4803] [streaming] Remove duplicate Regi...

2015-01-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3648#issuecomment-69973099
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25554/
Test PASSed.





[GitHub] spark pull request: [SPARK-4286] Integrate external shuffle servic...

2015-01-14 Thread tnachen
Github user tnachen commented on the pull request:

https://github.com/apache/spark/pull/3861#issuecomment-69976532
  
@andrewor14 I wonder if you have time to review this?





[GitHub] spark pull request: [SPARK-5231][WebUI] History Server shows wrong...

2015-01-14 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request:

https://github.com/apache/spark/pull/4029#discussion_r22963720
  
--- Diff: core/src/main/scala/org/apache/spark/util/JsonProtocol.scala ---
@@ -469,6 +471,7 @@ private[spark] object JsonProtocol {
 
   def jobStartFromJson(json: JValue): SparkListenerJobStart = {
 val jobId = (json \ "Job ID").extract[Int]
+val submissionTime = (json \ "Submission Time").extractOpt[Long]
--- End diff --

Similarly, you should also add a backwards compatibility test; this can be 
a few lines in the existing SparkListenerJobStart backward compatibility 
test: 
https://github.com/sarutak/spark/blob/SPARK-5231/core/src/test/scala/org/apache/spark/util/JsonProtocolSuite.scala#L240





[GitHub] spark pull request: [SPARK-5249] Added type specific set functions...

2015-01-14 Thread JoshRosen
Github user JoshRosen commented on the pull request:

https://github.com/apache/spark/pull/4042#issuecomment-69979109
  
If you run the MiMa checks, I'm pretty sure that this will break binary 
compatibility because it changes the signature of a public method.  Let's see, 
though:

Jenkins, this is ok to test.





[GitHub] spark pull request: [SPARK-4803] [streaming] Remove duplicate Regi...

2015-01-14 Thread ilayaperumalg
Github user ilayaperumalg commented on the pull request:

https://github.com/apache/spark/pull/3648#issuecomment-69980475
  
The test passed now (after increasing the timeout value). Can someone 
re-run the test to see if the test result is consistent?





[GitHub] spark pull request: [SPARK-4924] Add a library for launching Spark...

2015-01-14 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3916#issuecomment-69981443
  
  [Test build #25561 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25561/consoleFull)
 for   PR 3916 at commit 
[`61919df`](https://github.com/apache/spark/commit/61919df21853eba479ddb591fb89dcecfd341988).
 * This patch merges cleanly.





[GitHub] spark pull request: SPARK-5199. Input metrics should show up for I...

2015-01-14 Thread sryza
GitHub user sryza opened a pull request:

https://github.com/apache/spark/pull/4050

SPARK-5199. Input metrics should show up for InputFormats that return 
CombineFileSplits

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/sryza/spark sandy-spark-5199

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/4050.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #4050


commit 9962dd097425442d62778f72911c6320c812f153
Author: Sandy Ryza sa...@cloudera.com
Date:   2015-01-14T21:17:02Z

SPARK-5199. Input metrics should show up for InputFormats that return 
CombineFileSplits







[GitHub] spark pull request: [SPARK-2309][MLlib] Multinomial Logistic Regre...

2015-01-14 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3833#issuecomment-69993842
  
  [Test build #25566 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25566/consoleFull)
 for   PR 3833 at commit 
[`4e16781`](https://github.com/apache/spark/commit/4e1678160f135f263b242b4cf1c28c95886bc11b).
 * This patch **fails to build**.
 * This patch **does not merge cleanly**.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-2309][MLlib] Multinomial Logistic Regre...

2015-01-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3833#issuecomment-69993846
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25566/
Test FAILed.





[GitHub] spark pull request: [SPARK-2309][MLlib] Multinomial Logistic Regre...

2015-01-14 Thread dbtsai
Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/3833#discussion_r22963566
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/classification/LogisticRegression.scala
 ---
@@ -18,30 +18,36 @@
 package org.apache.spark.mllib.classification
 
 import org.apache.spark.annotation.Experimental
-import org.apache.spark.mllib.linalg.Vector
+import org.apache.spark.mllib.linalg.BLAS.dot
+import org.apache.spark.mllib.linalg.{DenseVector, Vector}
 import org.apache.spark.mllib.optimization._
 import org.apache.spark.mllib.regression._
-import org.apache.spark.mllib.util.DataValidators
+import org.apache.spark.mllib.util.{DataValidators, MLUtils}
 import org.apache.spark.rdd.RDD
 
 /**
- * Classification model trained using Logistic Regression.
+ * Classification model trained using Multinomial/Binary Logistic 
Regression.
  *
  * @param weights Weights computed for every feature.
- * @param intercept Intercept computed for this model.
+ * @param intercept Intercept computed for this model. (Only used in 
Binary Logistic Regression.
+ *  In Multinomial Logistic Regression, the intercepts 
will not be a single values,
+ *  so the intercepts will be part of the weights.)
+ * @param nClasses The number of possible outcomes for Multinomial 
Logistic Regression.
+ * The default value is 2 which is Binary Logistic 
Regression.
  */
 class LogisticRegressionModel (
 override val weights: Vector,
-override val intercept: Double)
+override val intercept: Double,
+nClasses: Int = 2)
   extends GeneralizedLinearModel(weights, intercept) with 
ClassificationModel with Serializable {
 
   private var threshold: Option[Double] = Some(0.5)
 
   /**
* :: Experimental ::
-   * Sets the threshold that separates positive predictions from negative 
predictions. An example
-   * with prediction score greater than or equal to this threshold is 
identified as an positive,
-   * and negative otherwise. The default value is 0.5.
+   * Sets the threshold that separates positive predictions from negative 
predictions
+   * in Binary Logistic Regression. An example with prediction score 
greater than or equal to
+   * this threshold is identified as an positive, and negative otherwise. 
The default value is 0.5.
*/
--- End diff --

I think the model should have an API to predict probabilities, and we should 
have a separate transformer to apply the threshold so we can reuse the logic for 
all probabilistic models. I would like to remove the threshold logic from LOR 
entirely. @mengxr what do you think?
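
A tiny sketch of the separation being proposed (hypothetical method names, not 
an existing API): the model exposes raw probabilities, and thresholding becomes 
a reusable transformer step.

```scala
// The model would only map a margin to a probability...
def predictProbability(margin: Double): Double = 1.0 / (1.0 + math.exp(-margin))

// ...while a separate, reusable step applies the decision threshold for any
// probabilistic classifier.
def applyThreshold(score: Double, threshold: Double = 0.5): Double =
  if (score > threshold) 1.0 else 0.0
```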





[GitHub] spark pull request: [SPARK-4924] Add a library for launching Spark...

2015-01-14 Thread vanzin
Github user vanzin commented on the pull request:

https://github.com/apache/spark/pull/3916#issuecomment-69981602
  
Hi @andrewor14, thanks for all the comments. I updated the patch and I 
think I covered everything. Mainly, I fixed the issue with newlines that you 
asked about, and also now `sbt assembly` should work without having to do `sbt 
package` first.

  I notice that SparkLauncher doesn't have the full set of options found 
in SparkSubmitArguments

My thinking is that aside from the exposed APIs, everything else would be 
set using `SparkLauncher.setConf()`. I even thought about removing some other 
methods (like `setMaster()`) but decided to leave the most common ones easily 
accessible.
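
For context, a hedged sketch of the usage pattern described (builder method 
names follow the PR discussion and may differ from the final API): dedicated 
setters for common options, `setConf(key, value)` for everything else.

```scala
import org.apache.spark.launcher.SparkLauncher

// Assumed builder-style API: common options have dedicated setters, and any
// other configuration goes through setConf(key, value).
val process = new SparkLauncher()
  .setAppResource("/path/to/app.jar")   // hypothetical application jar
  .setMainClass("com.example.Main")     // hypothetical main class
  .setMaster("local[2]")
  .setConf("spark.executor.memory", "2g")
  .launch()                             // spawns spark-submit as a child process
```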





[GitHub] spark pull request: [SPARK-2309][MLlib] Multinomial Logistic Regre...

2015-01-14 Thread srowen
Github user srowen commented on a diff in the pull request:

https://github.com/apache/spark/pull/3833#discussion_r22967246
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/classification/LogisticRegression.scala
 ---
@@ -61,20 +67,70 @@ class LogisticRegressionModel (
 
   override protected def predictPoint(dataMatrix: Vector, weightMatrix: 
Vector,
--- End diff --

The argument to the gradient calculation is properly a vector of weights, 
so that need not change for API reasons. So is it just having to do the 
translation? It's a line of code, I think, although it requires a copy. Maybe 
someone else can weigh in with an opinion too.





[GitHub] spark pull request: SPARK-4687. [WIP] Add an addDirectory API

2015-01-14 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3670#issuecomment-69985480
  
  [Test build #25562 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25562/consoleFull)
 for   PR 3670 at commit 
[`8413c50`](https://github.com/apache/spark/commit/8413c5010527f51cb8fc6401201a0d5f1f8ef6e9).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-5193][SQL] Tighten up SQLContext API

2015-01-14 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4049#issuecomment-69990207
  
  [Test build #25564 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25564/consoleFull)
 for   PR 4049 at commit 
[`4a38c9b`](https://github.com/apache/spark/commit/4a38c9b15ecc04f2ae2f285af5742608fc91549b).
 * This patch merges cleanly.





[GitHub] spark pull request: SPARK-5199. Input metrics should show up for I...

2015-01-14 Thread ksakellis
Github user ksakellis commented on the pull request:

https://github.com/apache/spark/pull/4050#issuecomment-69996188
  
This mostly LGTM. My only concern is with the proliferation of copy pasta 
between the HadoopRDD and NewHadoopRDD.





[GitHub] spark pull request: [SPARK-5228][WebUI] Hide tables for Active Jo...

2015-01-14 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/4028





[GitHub] spark pull request: [SPARK-1405] [mllib] Latent Dirichlet Allocati...

2015-01-14 Thread dlwh
Github user dlwh commented on a diff in the pull request:

https://github.com/apache/spark/pull/4047#discussion_r22962641
  
--- Diff: 
examples/src/main/scala/org/apache/spark/examples/mllib/LDAExample.scala ---
@@ -0,0 +1,244 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.examples.mllib
+
+import scala.collection.mutable.ArrayBuffer
+
+import java.text.BreakIterator
+
+import scopt.OptionParser
+
+import org.apache.log4j.{Level, Logger}
+
+import org.apache.spark.{SparkContext, SparkConf}
+import org.apache.spark.SparkContext._
+import org.apache.spark.mllib.clustering.LDA
+import org.apache.spark.mllib.clustering.LDA.Document
+import org.apache.spark.mllib.linalg.SparseVector
+import org.apache.spark.rdd.RDD
+
+
+/**
+ * An example Latent Dirichlet Allocation (LDA) app. Run with
+ * {{{
+ * ./bin/run-example mllib.DenseKMeans [options] input
--- End diff --

(rename)





[GitHub] spark pull request: [SPARK-4014] Add TaskContext.attemptNumber and...

2015-01-14 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/3849





[GitHub] spark pull request: [SPARK-5234][ml]examples for ml don't have spa...

2015-01-14 Thread mengxr
Github user mengxr commented on the pull request:

https://github.com/apache/spark/pull/4044#issuecomment-69980197
  
Merged into master. Thanks!





[GitHub] spark pull request: [SPARK-5234][ml]examples for ml don't have spa...

2015-01-14 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/4044





[GitHub] spark pull request: [SPARK-1405] [mllib] Latent Dirichlet Allocati...

2015-01-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/4047#issuecomment-69989172
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25560/
Test PASSed.





[GitHub] spark pull request: SPARK-4746 make it easy to skip IntegrationTes...

2015-01-14 Thread vanzin
Github user vanzin commented on the pull request:

https://github.com/apache/spark/pull/4048#issuecomment-69990485
  
IIRC there are a few tests under `sql/` that use local-cluster too, but I 
can't name any off the top of my head.





[GitHub] spark pull request: SPARK-4746 make it easy to skip IntegrationTes...

2015-01-14 Thread pwendell
Github user pwendell commented on the pull request:

https://github.com/apache/spark/pull/4048#issuecomment-69991447
  
Just curious - what is the before and after time? I.e. what fraction of 
time does this cut down on?





[GitHub] spark pull request: [SPARK-2309][MLlib] Multinomial Logistic Regre...

2015-01-14 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3833#issuecomment-69992822
  
  [Test build #25566 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25566/consoleFull)
 for   PR 3833 at commit 
[`4e16781`](https://github.com/apache/spark/commit/4e1678160f135f263b242b4cf1c28c95886bc11b).
 * This patch **does not merge cleanly**.





[GitHub] spark pull request: SPARK-4746 make it easy to skip IntegrationTes...

2015-01-14 Thread srowen
Github user srowen commented on the pull request:

https://github.com/apache/spark/pull/4048#issuecomment-69997514
  
This is not a terribly useful observation, but this is what `surefire` vs. 
`failsafe` is for in the Maven world, without needing a custom mechanism. But we 
have the SBT build too.





[GitHub] spark pull request: [SPARK-4803] [streaming] Remove duplicate Regi...

2015-01-14 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3648#issuecomment-69973089
  
  [Test build #25554 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25554/consoleFull)
 for   PR 3648 at commit 
[`868efab`](https://github.com/apache/spark/commit/868efabd2c43a662b8ccfb1651192dfb95f80f06).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-5231][WebUI] History Server shows wrong...

2015-01-14 Thread JoshRosen
Github user JoshRosen commented on the pull request:

https://github.com/apache/spark/pull/4029#issuecomment-69976817
  
This is a nice patch, but I wonder whether there's a smaller fix that 
doesn't require changing SparkListener events; that will make it easier to 
backport that patch to `branch-1.2`.  The job page already knows the last stage 
in the job (the result stage), so I think we might be able to use the final 
stage's completion time as the job completion time and the first stage's 
submission time as the job start time.  However, there are a couple of 
corner-cases that this might miss: I could submit a job that spends a bunch of 
time queued behind other jobs before its first stage starts running, in which 
case it would be helpful to be able to distinguish between scheduler delays and 
stage durations.  Similarly, there might be a corner-case related to the job 
completion time if we have a job that spends a lot of time fetching results 
back to the driver after they've been stored in the block manager by completed 
tasks.

So, I guess the approach here seems like the right fix.  I'd guess we might 
be able to do a separate fix in branch-1.2 to use the first/last stage time 
approximations.

I have a couple of comments on the code here, so I'll comment on those 
inline.





[GitHub] spark pull request: [SPARK-5095] Support capping cores and launch ...

2015-01-14 Thread tnachen
Github user tnachen commented on a diff in the pull request:

https://github.com/apache/spark/pull/4027#discussion_r22963154
  
--- Diff: docs/running-on-mesos.md ---
@@ -226,6 +226,20 @@ See the [configuration page](configuration.html) for 
information on Spark config
 The final total amount of memory allocated is the maximum value 
between executor memory plus memoryOverhead, and overhead fraction (1.07) plus 
the executor memory.
   </td>
 </tr>
+<tr>
+  <td><code>spark.mesos.coarse.cpu.max</code></td>
--- End diff --

Good catch! Thanks





[GitHub] spark pull request: [Minor] Fix tiny typo in BlockManager

2015-01-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/4046#issuecomment-69978070
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25557/
Test PASSed.





[GitHub] spark pull request: [SPARK-1405] [mllib] Latent Dirichlet Allocati...

2015-01-14 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4047#issuecomment-69978003
  
  [Test build #25560 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25560/consoleFull)
 for   PR 4047 at commit 
[`984c414`](https://github.com/apache/spark/commit/984c414ce2bfc14fc1bef35adfca78db4770ff37).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-5235] Make SQLConf Serializable

2015-01-14 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/4031





[GitHub] spark pull request: [SPARK-1405] [mllib] Latent Dirichlet Allocati...

2015-01-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/4047#issuecomment-69979680
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25559/
Test FAILed.





[GitHub] spark pull request: [SPARK-5193][SQL] Tighten up SQLContext API

2015-01-14 Thread rxin
GitHub user rxin opened a pull request:

https://github.com/apache/spark/pull/4049

[SPARK-5193][SQL] Tighten up SQLContext API

1. Removed 2 implicits (logicalPlanToSparkQuery and baseRelationToSchemaRDD)
2. Moved extraStrategies into ExperimentalMethods.
3. Made private methods protected[sql] so they don't show up in javadocs.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/rxin/spark sqlContext-refactor

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/4049.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #4049


commit 4a38c9b15ecc04f2ae2f285af5742608fc91549b
Author: Reynold Xin r...@databricks.com
Date:   2015-01-14T20:47:49Z

[SPARK-5193][SQL] Tighten up SQLContext API

1. Removed 2 implicits (logicalPlanToSparkQuery and baseRelationToSchemaRDD)
2. Moved extraStrategies into ExperimentalMethods.
3. Made private methods protected[sql] so they don't show up in javadocs.







[GitHub] spark pull request: [SPARK-2309][MLlib] Multinomial Logistic Regre...

2015-01-14 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3833#issuecomment-69992401
  
  [Test build #25565 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25565/consoleFull)
 for   PR 3833 at commit 
[`7ac4dfc`](https://github.com/apache/spark/commit/7ac4dfc4a41b20c97c29fdf60045aca64fe08a6f).
 * This patch **fails to build**.
 * This patch **does not merge cleanly**.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-4924] Add a library for launching Spark...

2015-01-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3916#issuecomment-69995071
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25561/
Test FAILed.





[GitHub] spark pull request: [SPARK-4924] Add a library for launching Spark...

2015-01-14 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3916#issuecomment-69995062
  
  [Test build #25561 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25561/consoleFull)
 for   PR 3916 at commit 
[`61919df`](https://github.com/apache/spark/commit/61919df21853eba479ddb591fb89dcecfd341988).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: SPARK-5199. Input metrics should show up for I...

2015-01-14 Thread sryza
Github user sryza commented on a diff in the pull request:

https://github.com/apache/spark/pull/4050#discussion_r22972019
  
--- Diff: core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala ---
@@ -219,6 +220,9 @@ class HadoopRDD[K, V](
   val bytesReadCallback = if (split.inputSplit.value.isInstanceOf[FileSplit]) {
 SparkHadoopUtil.get.getFSBytesReadOnThreadCallback(
   split.inputSplit.value.asInstanceOf[FileSplit].getPath, jobConf)
+  } else if (split.inputSplit.value.isInstanceOf[CombineFileSplit]) {
+SparkHadoopUtil.get.getFSBytesReadOnThreadCallback(
+  split.inputSplit.value.asInstanceOf[CombineFileSplit].getPath(0), jobConf)
--- End diff --

The issue is that those are actually two different classes.  There's a 
CombineFileSplit for the old MR API (used by HadoopRDD) and a CombineFileSplit 
for the new one (used by NewHadoopRDD).
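
The clash can be seen from the imports alone; both are real Hadoop classes, and 
aliasing makes the distinction explicit:

```scala
// The old ("mapred") and new ("mapreduce") MapReduce APIs each define their
// own CombineFileSplit, so HadoopRDD and NewHadoopRDD must each match on a
// different class even though the handling logic looks identical.
import org.apache.hadoop.mapred.lib.{CombineFileSplit => OldCombineFileSplit}
import org.apache.hadoop.mapreduce.lib.input.{CombineFileSplit => NewCombineFileSplit}
```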





[GitHub] spark pull request: [SPARK-2909] [MLlib] [PySpark] SparseVector in...

2015-01-14 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/4025





[GitHub] spark pull request: [SPARK-1405] [mllib] Latent Dirichlet Allocati...

2015-01-14 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4047#issuecomment-69971352
  
  [Test build #25558 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25558/consoleFull)
 for   PR 4047 at commit 
[`c6e4308`](https://github.com/apache/spark/commit/c6e430867ca32ca6f409f953a2d47dd04a1e6e53).
 * This patch **fails Scala style tests**.
 * This patch **does not merge cleanly**.
 * This patch adds the following public classes _(experimental)_:
  * `  case class Document(counts: Vector, id: Long)`






[GitHub] spark pull request: [SPARK-1405] [mllib] Latent Dirichlet Allocati...

2015-01-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/4047#issuecomment-69971357
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25558/
Test FAILed.





[GitHub] spark pull request: [SPARK-2909] [MLlib] [PySpark] SparseVector in...

2015-01-14 Thread mengxr
Github user mengxr commented on the pull request:

https://github.com/apache/spark/pull/4025#issuecomment-69971384
  
Merged into master. Thanks!





[GitHub] spark pull request: [SPARK-5231][WebUI] History Server shows wrong...

2015-01-14 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request:

https://github.com/apache/spark/pull/4029#discussion_r22963608
  
--- Diff: core/src/main/scala/org/apache/spark/util/JsonProtocol.scala ---
@@ -469,6 +471,7 @@ private[spark] object JsonProtocol {
 
   def jobStartFromJson(json: JValue): SparkListenerJobStart = {
 val jobId = (json \ "Job ID").extract[Int]
+val submissionTime = (json \ "Submission Time").extractOpt[Long]
--- End diff --

For backwards-compatibility, you should use `Utils.jsonOption` here; see 
the block comment at 
https://github.com/sarutak/spark/blob/SPARK-5231/core/src/main/scala/org/apache/spark/util/JsonProtocol.scala#L46,
 plus the examples elsewhere in this file.
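
A minimal sketch of that pattern (assuming json4s and Spark's internal 
`Utils.jsonOption` helper, which returns `None` when a field is absent from 
logs written by older versions):

```scala
import org.json4s._
import org.apache.spark.util.Utils // Spark-internal helper, private[spark]

// Hypothetical excerpt: "Submission Time" may be missing from old event logs,
// so it is read as an Option instead of failing the whole parse.
def jobStartFromJson(json: JValue): (Int, Option[Long]) = {
  implicit val formats = DefaultFormats
  val jobId = (json \ "Job ID").extract[Int]
  val submissionTime = Utils.jsonOption(json \ "Submission Time").map(_.extract[Long])
  (jobId, submissionTime)
}
```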





[GitHub] spark pull request: [SPARK-4014] Add TaskContext.attemptNumber and...

2015-01-14 Thread JoshRosen
Github user JoshRosen commented on the pull request:

https://github.com/apache/spark/pull/3849#issuecomment-69978436
  
I'm going to merge this into `master` (1.3.0) since it's a blocker for some 
tests that I want to write.  I'll look into backporting this into maintenance 
branches, too, since that would allow me to backport regression tests that use 
the new methods.





[GitHub] spark pull request: SPARK-4746 make it easy to skip IntegrationTes...

2015-01-14 Thread squito
GitHub user squito opened a pull request:

https://github.com/apache/spark/pull/4048

SPARK-4746 make it easy to skip IntegrationTests

* create an `IntegrationTest` tag
* label all tests in core as an `IntegrationTest` if they use a 
`local-cluster`
* make a `unit-test` task in sbt so it's easy to skip all the integration 
tests in local development.

On my laptop, this means that I can run `~unit-test` in my sbt console, 
which takes ~5 mins on the first run.  But since it is calling `test-quick` 
under the hood, as I make changes it only re-runs the tests I've affected, so 
generally I get updated results for all tests in a second or two.

Of course this means it's skipping a bunch of important tests, but hopefully 
this is a useful subset of tests that can actually be run locally.  If you 
don't skip the IntegrationTests, it's totally impractical to ever get through 
even the first run of `test-quick` on my laptop.

An added bonus is that this set of tests can be run without having to ever 
do the `mvn package` step, since we are never launching a full cluster as an 
external process.
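
As a hedged sketch of the general mechanism (the tag name, package, and suite 
are illustrative, not necessarily what this PR uses):

    import org.scalatest.{FunSuite, Tag}

    // A ScalaTest tag object; tagged tests can be excluded by name.
    object IntegrationTest extends Tag("org.apache.spark.IntegrationTest")

    class ExampleSuite extends FunSuite {
      // Tag any test that spins up a local-cluster so local dev runs can skip it.
      test("distributed reduce over local-cluster", IntegrationTest) {
        // ... heavyweight test body ...
      }
    }

A `unit-test` style task can then exclude the tag, e.g. via ScalaTest's `-l` 
argument in sbt: `testOptions in Test += Tests.Argument("-l", 
"org.apache.spark.IntegrationTest")`.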

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/squito/spark SPARK-4746

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/4048.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #4048


commit 030cc0c0cb57d35a043e68509a3997fb3f1a3dc1
Author: Imran Rashid iras...@cloudera.com
Date:   2015-01-13T21:45:38Z

add IntegrationTest tag, and label a bunch of tests in core

commit 30f4d636387e57e9c104024db5a20afcde1b7cbb
Author: Imran Rashid iras...@cloudera.com
Date:   2015-01-14T19:36:37Z

add a unit-test task

commit 3a8503227d53554155e5766ce12d48039854f163
Author: Imran Rashid iras...@cloudera.com
Date:   2015-01-14T20:41:07Z

fix task name







[GitHub] spark pull request: [SPARK-1405] [mllib] Latent Dirichlet Allocati...

2015-01-14 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4047#issuecomment-69989155
  
  [Test build #25560 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25560/consoleFull)
 for   PR 4047 at commit 
[`984c414`](https://github.com/apache/spark/commit/984c414ce2bfc14fc1bef35adfca78db4770ff37).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: SPARK-4687. [WIP] Add an addDirectory API

2015-01-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3670#issuecomment-69993972
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25562/
Test FAILed.





[GitHub] spark pull request: SPARK-4687. [WIP] Add an addDirectory API

2015-01-14 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3670#issuecomment-69993963
  
  [Test build #25562 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25562/consoleFull)
 for   PR 3670 at commit 
[`8413c50`](https://github.com/apache/spark/commit/8413c5010527f51cb8fc6401201a0d5f1f8ef6e9).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: SPARK-4585. Spark dynamic executor allocation ...

2015-01-14 Thread sryza
GitHub user sryza opened a pull request:

https://github.com/apache/spark/pull/4051

SPARK-4585. Spark dynamic executor allocation should use minExecutors as...

... initial number

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/sryza/spark sandy-spark-4585

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/4051.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #4051


commit 9d7b6f98caff5de3db88d372853cccf012f36dc6
Author: Sandy Ryza sa...@cloudera.com
Date:   2014-12-28T03:29:11Z

SPARK-4585. Spark dynamic executor allocation should use minExecutors as 
initial number







[GitHub] spark pull request: SPARK-4585. Spark dynamic executor allocation ...

2015-01-14 Thread ksakellis
Github user ksakellis commented on a diff in the pull request:

https://github.com/apache/spark/pull/4051#discussion_r22972319
  
--- Diff: 
yarn/src/main/scala/org/apache/spark/deploy/yarn/ClientArguments.scala ---
@@ -73,12 +73,12 @@ private[spark] class ClientArguments(args: 
Array[String], sparkConf: SparkConf)
   .orNull
 // If dynamic allocation is enabled, start at the max number of 
executors
--- End diff --

Fix





[GitHub] spark pull request: [SPARK-5235] Make SQLConf Serializable

2015-01-14 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4031#issuecomment-69969756
  
  [Test build #2 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/2/consoleFull)
 for   PR 4031 at commit 
[`c2103f5`](https://github.com/apache/spark/commit/c2103f57720627f44fe8ad8dcd1af8d9e2fc31f2).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-5235] Make SQLConf Serializable

2015-01-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/4031#issuecomment-69969765
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/2/
Test PASSed.





[GitHub] spark pull request: [SPARK-1405] [mllib] Latent Dirichlet Allocati...

2015-01-14 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4047#issuecomment-69971147
  
  [Test build #25558 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25558/consoleFull)
 for   PR 4047 at commit 
[`c6e4308`](https://github.com/apache/spark/commit/c6e430867ca32ca6f409f953a2d47dd04a1e6e53).
 * This patch **does not merge cleanly**.





[GitHub] spark pull request: [SPARK-2909] [MLlib] [PySpark] SparseVector in...

2015-01-14 Thread mengxr
Github user mengxr commented on the pull request:

https://github.com/apache/spark/pull/4025#issuecomment-69971132
  
LGTM. @MechCoder The Scala code uses Breeze's index lookup, which uses 
bisection as well. You can try implementing bisection in MLlib and then doing a 
micro-benchmark. If there is a big difference, we will have the implementation 
in MLlib.
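
For reference, a sketch of the bisection idea over a sparse vector's sorted 
index array (illustrative only, not MLlib's or Breeze's actual code):

    // Binary search the sorted indices; entries not stored are implicitly 0.0.
    def sparseLookup(indices: Array[Int], values: Array[Double], i: Int): Double = {
      var lo = 0
      var hi = indices.length - 1
      while (lo <= hi) {
        val mid = (lo + hi) >>> 1
        if (indices(mid) == i) return values(mid)
        else if (indices(mid) < i) lo = mid + 1
        else hi = mid - 1
      }
      0.0
    }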





[GitHub] spark pull request: [SPARK-5228][WebUI] Hide tables for Active Jo...

2015-01-14 Thread JoshRosen
Github user JoshRosen commented on the pull request:

https://github.com/apache/spark/pull/4028#issuecomment-69972337
  
This matches the approach that I used for the job details page, so this 
looks good to me.  I'm going to merge this into `master` (1.3.0).  Thanks!





[GitHub] spark pull request: [SPARK-2309][MLlib] Multinomial Logistic Regre...

2015-01-14 Thread dbtsai
Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/3833#discussion_r22963904
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/classification/LogisticRegression.scala
 ---
@@ -18,30 +18,36 @@
 package org.apache.spark.mllib.classification
 
 import org.apache.spark.annotation.Experimental
-import org.apache.spark.mllib.linalg.Vector
+import org.apache.spark.mllib.linalg.BLAS.dot
+import org.apache.spark.mllib.linalg.{DenseVector, Vector}
 import org.apache.spark.mllib.optimization._
 import org.apache.spark.mllib.regression._
-import org.apache.spark.mllib.util.DataValidators
+import org.apache.spark.mllib.util.{DataValidators, MLUtils}
 import org.apache.spark.rdd.RDD
 
 /**
- * Classification model trained using Logistic Regression.
+ * Classification model trained using Multinomial/Binary Logistic 
Regression.
  *
  * @param weights Weights computed for every feature.
- * @param intercept Intercept computed for this model.
+ * @param intercept Intercept computed for this model. (Only used in 
Binary Logistic Regression.
+ *  In Multinomial Logistic Regression, the intercepts 
will not be a single value,
+ *  so the intercepts will be part of the weights.)
+ * @param nClasses The number of possible outcomes for Multinomial 
Logistic Regression.
+ * The default value is 2 which is Binary Logistic 
Regression.
  */
 class LogisticRegressionModel (
 override val weights: Vector,
-override val intercept: Double)
+override val intercept: Double,
--- End diff --

Addressed, thanks.





[GitHub] spark pull request: [Minor] Fix tiny typo in BlockManager

2015-01-14 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4046#issuecomment-69978060
  
  [Test build #25557 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25557/consoleFull)
 for   PR 4046 at commit 
[`a3e2a2f`](https://github.com/apache/spark/commit/a3e2a2f46d8d853d79b993ce0e22802aa243ae83).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-2309][MLlib] Multinomial Logistic Regre...

2015-01-14 Thread dbtsai
Github user dbtsai commented on a diff in the pull request:

https://github.com/apache/spark/pull/3833#discussion_r22965406
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/classification/LogisticRegression.scala
 ---
@@ -61,20 +67,70 @@ class LogisticRegressionModel (
 
   override protected def predictPoint(dataMatrix: Vector, weightMatrix: 
Vector,
--- End diff --

I thought about having the weights as a matrix, but it would require changes 
in so many places; for example, the gradient object has to change, and the 
underlying `GeneralizedLinearAlgorithm` has to change as well. I'm thinking 
we can have cleaner APIs when we move the code to the `ml` package, since we 
can do it from scratch. 





[GitHub] spark pull request: SPARK-4746 make it easy to skip IntegrationTes...

2015-01-14 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4048#issuecomment-69988559
  
  [Test build #25563 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25563/consoleFull)
 for   PR 4048 at commit 
[`3a85032`](https://github.com/apache/spark/commit/3a8503227d53554155e5766ce12d48039854f163).
 * This patch merges cleanly.





[GitHub] spark pull request: SPARK-4746 make it easy to skip IntegrationTes...

2015-01-14 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4048#issuecomment-69988571
  
  [Test build #25563 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25563/consoleFull)
 for   PR 4048 at commit 
[`3a85032`](https://github.com/apache/spark/commit/3a8503227d53554155e5766ce12d48039854f163).
 * This patch **fails RAT tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: SPARK-5199. Input metrics should show up for I...

2015-01-14 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4050#issuecomment-69994416
  
  [Test build #25567 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25567/consoleFull)
 for   PR 4050 at commit 
[`9962dd0`](https://github.com/apache/spark/commit/9962dd097425442d62778f72911c6320c812f153).
 * This patch merges cleanly.





[GitHub] spark pull request: SPARK-4585. Spark dynamic executor allocation ...

2015-01-14 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4051#issuecomment-69996035
  
  [Test build #25568 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25568/consoleFull)
 for   PR 4051 at commit 
[`9d7b6f9`](https://github.com/apache/spark/commit/9d7b6f98caff5de3db88d372853cccf012f36dc6).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-5231][WebUI] History Server shows wrong...

2015-01-14 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4029#issuecomment-69996068
  
  [Test build #25569 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25569/consoleFull)
 for   PR 4029 at commit 
[`da8bd14`](https://github.com/apache/spark/commit/da8bd1498607be57b7d1e11c2e98fe92f3221bc0).
 * This patch merges cleanly.





[GitHub] spark pull request: SPARK-5199. Input metrics should show up for I...

2015-01-14 Thread ksakellis
Github user ksakellis commented on a diff in the pull request:

https://github.com/apache/spark/pull/4050#discussion_r22971793
  
--- Diff: core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala ---
@@ -219,6 +220,9 @@ class HadoopRDD[K, V](
   val bytesReadCallback = if 
(split.inputSplit.value.isInstanceOf[FileSplit]) {
 SparkHadoopUtil.get.getFSBytesReadOnThreadCallback(
   split.inputSplit.value.asInstanceOf[FileSplit].getPath, jobConf)
+  } else if (split.inputSplit.value.isInstanceOf[CombineFileSplit]) {
+SparkHadoopUtil.get.getFSBytesReadOnThreadCallback(
+  
split.inputSplit.value.asInstanceOf[CombineFileSplit].getPath(0), jobConf)
--- End diff --

Can you push this logic down to SparkHadoopUtil so that we don't duplicate 
it in two places (HadoopRDD and NewHadoopRDD)?





[GitHub] spark pull request: [SPARK-5231][WebUI] History Server shows wrong...

2015-01-14 Thread sarutak
Github user sarutak commented on the pull request:

https://github.com/apache/spark/pull/4029#issuecomment-69996037
  
@JoshRosen Thanks for your advice. I addressed your comments and added a 
test case.
For now, I will take the original approach, and I will also try to address 
this issue for 1.2.x using the approximation approach you mentioned.
What do you think?





[GitHub] spark pull request: [SPARK-5228][WebUI] Hide tables for Active Jo...

2015-01-14 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request:

https://github.com/apache/spark/pull/4028#discussion_r22960975
  
--- Diff: core/src/main/scala/org/apache/spark/ui/jobs/AllJobsPage.scala ---
@@ -121,27 +125,47 @@ private[ui] class AllJobsPage(parent: JobsTab) extends WebUIPage() {
              <strong>Scheduling Mode: </strong>
              {listener.schedulingMode.map(_.toString).getOrElse("Unknown")}
            </li>
-           <li>
-             <a href="#active"><strong>Active Jobs:</strong></a>
-             {activeJobs.size}
-           </li>
-           <li>
-             <a href="#completed"><strong>Completed Jobs:</strong></a>
-             {completedJobs.size}
-           </li>
-           <li>
-             <a href="#failed"><strong>Failed Jobs:</strong></a>
-             {failedJobs.size}
-           </li>
+           {
+             if (shouldShowActiveJobs) {
+               <li>
+                 <a href="#active"><strong>Active Jobs:</strong></a>
+                 {activeJobs.size}
+               </li>
+             }
+           }
+           {
+             if (shouldShowCompletedJobs) {
+               <li>
+                 <a href="#completed"><strong>Completed Jobs:</strong></a>
+                 {completedJobs.size}
+               </li>
+             }
+           }
+           {
+             if (shouldShowFailedJobs) {
+               <li>
+                 <a href="#failed"><strong>Failed Jobs:</strong></a>
+                 {failedJobs.size}
+               </li>
+             }
+           }
          </ul>
        </div>

-      val content = summary ++
-        <h4 id="active">Active Jobs ({activeJobs.size})</h4> ++ activeJobsTable ++
-        <h4 id="completed">Completed Jobs ({completedJobs.size})</h4> ++ completedJobsTable ++
-        <h4 id ="failed">Failed Jobs ({failedJobs.size})</h4> ++ failedJobsTable
-
-      val helpText = "A job is triggered by a action, like count() or saveAsTextFile()." +
+      var content = summary
+      if (shouldShowActiveJobs) {
+        content ++= <h4 id="active">Active Jobs ({activeJobs.size})</h4> ++
+          activeJobsTable
+      }
+      if (shouldShowCompletedJobs) {
+        content ++= <h4 id="completed">Completed Jobs ({completedJobs.size})</h4> ++
+          completedJobsTable
+      }
+      if (shouldShowFailedJobs) {
+        content ++= <h4 id ="failed">Failed Jobs ({failedJobs.size})</h4> ++
+          failedJobsTable
+      }
+      val helpText = "A job is triggered by an action, like count() or saveAsTextFile()." +
--- End diff --

Thanks for catching and fixing this typo.





[GitHub] spark pull request: [SPARK-5228][WebUI] Hide tables for Active Jo...

2015-01-14 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request:

https://github.com/apache/spark/pull/4028#discussion_r22960930
  
--- Diff: core/src/main/scala/org/apache/spark/ui/jobs/AllJobsPage.scala ---
@@ -47,7 +47,7 @@ private[ui] class AllJobsPage(parent: JobsTab) extends 
WebUIPage() {
  val lastStageData = lastStageInfo.flatMap { s =>
 listener.stageIdToData.get((s.stageId, s.attemptId))
   }
-  val isComplete = job.status == JobExecutionStatus.SUCCEEDED
--- End diff --

Hmm, I guess this was unused.  Good catch.





[GitHub] spark pull request: [SPARK-5231][WebUI] History Server shows wrong...

2015-01-14 Thread JoshRosen
Github user JoshRosen commented on a diff in the pull request:

https://github.com/apache/spark/pull/4029#discussion_r22963362
  
--- Diff: 
core/src/main/scala/org/apache/spark/scheduler/SparkListener.scala ---
@@ -58,6 +58,7 @@ case class SparkListenerTaskEnd(
 @DeveloperApi
 case class SparkListenerJobStart(
 jobId: Int,
+time: Option[Long],
--- End diff --

I guess this is an option for backwards-compatibility reasons?  We 
definitely know the time when posting this event to the listener bus, so I 
think the right approach is to have time just be a regular `Long` and pass a 
dummy value (`-1`) when replaying JSON that's missing that field.
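
Sketched out, the suggested shape would be something like this (illustrative; 
writers always supply a real time, and only the JSON reader falls back to the 
dummy):

    val submissionTime: Long =
      Utils.jsonOption(json \ "Submission Time").map(_.extract[Long]).getOrElse(-1L)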





[GitHub] spark pull request: SPARK-4746 make it easy to skip IntegrationTes...

2015-01-14 Thread vanzin
Github user vanzin commented on the pull request:

https://github.com/apache/spark/pull/4048#issuecomment-69988816
  
Hey Imran, haven't looked at the code, but `YarnClusterSuite` could 
probably use this tag too.





[GitHub] spark pull request: SPARK-4746 make it easy to skip IntegrationTes...

2015-01-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/4048#issuecomment-69988574
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25563/
Test FAILed.





[GitHub] spark pull request: [SPARK-2309][MLlib] Multinomial Logistic Regre...

2015-01-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/3833#issuecomment-69992408
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25565/
Test FAILed.





[GitHub] spark pull request: [SPARK-5094][MLlib] Add Python API for Gradien...

2015-01-14 Thread jkbradley
Github user jkbradley commented on the pull request:

https://github.com/apache/spark/pull/3951#issuecomment-69995496
  
Taking a look now and will add comments soon!





[GitHub] spark pull request: [SPARK-2309][MLlib] Multinomial Logistic Regre...

2015-01-14 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3833#issuecomment-69997838
  
  [Test build #25570 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25570/consoleFull)
 for   PR 3833 at commit 
[`9cf9811`](https://github.com/apache/spark/commit/9cf98115c9b8ba76cd4b460e205ba87328c4e471).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-5193][SQL] Tighten up SQLContext API

2015-01-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/4049#issuecomment-70001689
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/25564/
Test PASSed.





[GitHub] spark pull request: [SPARK-2996] Implement userClassPathFirst for ...

2015-01-14 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/3233#discussion_r22975789
  
--- Diff: yarn/src/main/scala/org/apache/spark/deploy/yarn/Client.scala ---
@@ -762,46 +764,37 @@ object Client extends Logging {
   extraClassPath: Option[String] = None): Unit = {
 extraClassPath.foreach(addClasspathEntry(_, env))
 addClasspathEntry(Environment.PWD.$(), env)
-
-// Normally the users app.jar is last in case conflicts with spark jars
 if (sparkConf.getBoolean("spark.yarn.user.classpath.first", false)) {
-  addUserClasspath(args, sparkConf, env)
-  addFileToClasspath(sparkJar(sparkConf), SPARK_JAR, env)
-  populateHadoopClasspath(conf, env)
-} else {
-  addFileToClasspath(sparkJar(sparkConf), SPARK_JAR, env)
-  populateHadoopClasspath(conf, env)
-  addUserClasspath(args, sparkConf, env)
+  getUserClasspath(args, sparkConf).foreach { x =>
+addFileToClasspath(x, null, env)
+  }
 }
-
-// Append all jar files under the working directory to the classpath.
-addClasspathEntry(Environment.PWD.$() + Path.SEPARATOR + "*", env)
--- End diff --

It's removed for two reasons:

- It didn't serve any practical purpose
- It could potentially lead to behavior that diverged from other cluster 
managers

All jars distributed with `--jars` are added to the classpath 
automatically, without the need for this. The directory itself is also added, 
so things like `log4j.properties` uploaded by the user are on the classpath.

The only change this causes is that files and archives (`--files` and 
`--archives`) would also end up in the app's classpath. This is the part that 
diverges from other cluster managers: if you use `--files` to add a jar file 
in standalone mode, the classes in that jar will not show up in the app's 
classpath. In Yarn mode they would, and I think that's wrong.





[GitHub] spark pull request: [SPARK-2996] Implement userClassPathFirst for ...

2015-01-14 Thread vanzin
Github user vanzin commented on the pull request:

https://github.com/apache/spark/pull/3233#issuecomment-70004739
  
Hi @tgravescs , thanks for taking a look. Aside from all the unit tests I 
added, I explained the testing I did, including the code I used, in my very 
first comment at the top. Did you have any specific questions about that?





[GitHub] spark pull request: [SPARK-4803] [streaming] Remove duplicate Regi...

2015-01-14 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3648#issuecomment-70017327
  
  [Test build #25577 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25577/consoleFull)
 for   PR 3648 at commit 
[`868efab`](https://github.com/apache/spark/commit/868efabd2c43a662b8ccfb1651192dfb95f80f06).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-4789] [SPARK-4942] [SPARK-5031] [mllib]...

2015-01-14 Thread jkbradley
Github user jkbradley commented on a diff in the pull request:

https://github.com/apache/spark/pull/3637#discussion_r22981585
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/ml/classification/LogisticRegression.scala
 ---
@@ -80,69 +50,157 @@ class LogisticRegression extends 
Estimator[LogisticRegressionModel] with Logisti
 
   def setRegParam(value: Double): this.type = set(regParam, value)
   def setMaxIter(value: Int): this.type = set(maxIter, value)
-  def setLabelCol(value: String): this.type = set(labelCol, value)
   def setThreshold(value: Double): this.type = set(threshold, value)
-  def setFeaturesCol(value: String): this.type = set(featuresCol, value)
-  def setScoreCol(value: String): this.type = set(scoreCol, value)
-  def setPredictionCol(value: String): this.type = set(predictionCol, 
value)
 
   override def fit(dataset: SchemaRDD, paramMap: ParamMap): 
LogisticRegressionModel = {
+// Check schema
 transformSchema(dataset.schema, paramMap, logging = true)
-import dataset.sqlContext._
+
+// Extract columns from data.  If dataset is persisted, do not persist 
oldDataset.
+val oldDataset = extractLabeledPoints(dataset, paramMap)
 val map = this.paramMap ++ paramMap
-val instances = dataset.select(map(labelCol).attr, 
map(featuresCol).attr)
-  .map { case Row(label: Double, features: Vector) =>
-LabeledPoint(label, features)
-  }.persist(StorageLevel.MEMORY_AND_DISK)
+val handlePersistence = dataset.getStorageLevel == StorageLevel.NONE
+if (handlePersistence) {
+  oldDataset.persist(StorageLevel.MEMORY_AND_DISK)
+}
+
+// Train model
 val lr = new LogisticRegressionWithLBFGS
 lr.optimizer
   .setRegParam(map(regParam))
   .setNumIterations(map(maxIter))
-val lrm = new LogisticRegressionModel(this, map, 
lr.run(instances).weights)
-instances.unpersist()
+val oldModel = lr.run(oldDataset)
+val lrm = new LogisticRegressionModel(this, map, oldModel.weights, 
oldModel.intercept)
+
+if (handlePersistence) {
+  oldDataset.unpersist()
+}
+
 // copy model params
 Params.inheritValues(map, this, lrm)
 lrm
   }
 
-  private[ml] override def transformSchema(schema: StructType, paramMap: 
ParamMap): StructType = {
-validateAndTransformSchema(schema, paramMap, fitting = true)
-  }
+  override protected def featuresDataType: DataType = new VectorUDT
 }
 
+
 /**
  * :: AlphaComponent ::
+ *
  * Model produced by [[LogisticRegression]].
  */
 @AlphaComponent
 class LogisticRegressionModel private[ml] (
 override val parent: LogisticRegression,
 override val fittingParamMap: ParamMap,
-weights: Vector)
-  extends Model[LogisticRegressionModel] with LogisticRegressionParams {
+val weights: Vector,
+val intercept: Double)
+  extends ProbabilisticClassificationModel[Vector, LogisticRegressionModel]
+  with LogisticRegressionParams {
+
+  setThreshold(0.5)
 
   def setThreshold(value: Double): this.type = set(threshold, value)
-  def setFeaturesCol(value: String): this.type = set(featuresCol, value)
-  def setScoreCol(value: String): this.type = set(scoreCol, value)
-  def setPredictionCol(value: String): this.type = set(predictionCol, 
value)
 
-  private[ml] override def transformSchema(schema: StructType, paramMap: 
ParamMap): StructType = {
-validateAndTransformSchema(schema, paramMap, fitting = false)
+  private val margin: Vector => Double = (features) => {
+BLAS.dot(features, weights) + intercept
+  }
+
+  private val score: Vector => Double = (features) => {
+val m = margin(features)
+1.0 / (1.0 + math.exp(-m))
   }
 
   override def transform(dataset: SchemaRDD, paramMap: ParamMap): 
SchemaRDD = {
+// Check schema
 transformSchema(dataset.schema, paramMap, logging = true)
+
 import dataset.sqlContext._
 val map = this.paramMap ++ paramMap
-val score: Vector => Double = (v) => {
-  val margin = BLAS.dot(v, weights)
-  1.0 / (1.0 + math.exp(-margin))
+
+// Output selected columns only.
+// This is a bit complicated since it tries to avoid repeated 
computation.
--- End diff --

Thinking more about this, I think abstracting the key links might be best.  
It will certainly make LogisticRegression much shorter since prediction takes 
up most of the file.



[GitHub] spark pull request: [SPARK-5095] Support capping cores and launch ...

2015-01-14 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4027#issuecomment-7002
  
  [Test build #25580 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25580/consoleFull)
 for   PR 4027 at commit 
[`486d2f1`](https://github.com/apache/spark/commit/486d2f11ca278ed497d712a6adcbc41fa3a9400c).
 * This patch merges cleanly.





[GitHub] spark pull request: [SPARK-4286] Integrate external shuffle servic...

2015-01-14 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3861#issuecomment-70020007
  
  [Test build #25581 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25581/consoleFull)
 for   PR 3861 at commit 
[`99415c3`](https://github.com/apache/spark/commit/99415c3bc9973f2f80faaf7f5742b3bc860bc900).
 * This patch merges cleanly.





[GitHub] spark pull request: [MLLIB] [spark-2352] Implementation of an Arti...

2015-01-14 Thread avulanov
Github user avulanov commented on the pull request:

https://github.com/apache/spark/pull/1290#issuecomment-70021585
  
@loachli @bgreeven We are thinking of changing the names of the 
`ArtificialNeuralNetwork` and `ANNClassifier` objects to `ANNWithLBFGS` and 
`ANNClassifierWithLBFGS` to be in line with the naming convention in MLlib. 
Are there any objections?





[GitHub] spark pull request: SPARK-5199. Input metrics should show up for I...

2015-01-14 Thread ksakellis
Github user ksakellis commented on a diff in the pull request:

https://github.com/apache/spark/pull/4050#discussion_r22985033
  
--- Diff: core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala ---
@@ -219,6 +220,9 @@ class HadoopRDD[K, V](
   val bytesReadCallback = if 
(split.inputSplit.value.isInstanceOf[FileSplit]) {
 SparkHadoopUtil.get.getFSBytesReadOnThreadCallback(
   split.inputSplit.value.asInstanceOf[FileSplit].getPath, jobConf)
+  } else if (split.inputSplit.value.isInstanceOf[CombineFileSplit]) {
+SparkHadoopUtil.get.getFSBytesReadOnThreadCallback(
+  
split.inputSplit.value.asInstanceOf[CombineFileSplit].getPath(0), jobConf)
--- End diff --

Yes, SparkHadoopUtil can check for those classes. It can have a matcher on 
the 4 classes (2 new and 2 old), so the call from HadoopRDD would be something 
like:
SparkHadoopUtil.get.getFSBytesReadOnThreadCallback(split.inputSplit, 
jobConf)
Not a big deal I guess, since in SparkHadoopUtil you'll have four cases, but 
at least that logic is centralized.
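
For illustration, the dispatch could look roughly like this (a sketch; the 
helper name and exact signature are assumptions, not Spark's actual code):

    import org.apache.hadoop.fs.Path
    import org.apache.hadoop.mapred.{FileSplit, InputSplit}
    import org.apache.hadoop.mapred.lib.CombineFileSplit

    // Centralized split-type dispatch; the two mapreduce.* (new-API) split
    // classes would be two more cases here.
    def pathForSplit(split: InputSplit): Option[Path] = split match {
      case fs: FileSplit => Some(fs.getPath)
      case cfs: CombineFileSplit => Some(cfs.getPath(0))
      case _ => None
    }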






[GitHub] spark pull request: [SPARK-4924] Add a library for launching Spark...

2015-01-14 Thread sryza
Github user sryza commented on a diff in the pull request:

https://github.com/apache/spark/pull/3916#discussion_r22975410
  
--- Diff: 
launcher/src/main/java/org/apache/spark/launcher/AbstractLauncher.java ---
@@ -0,0 +1,451 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.launcher;
+
+import java.io.BufferedReader;
+import java.io.File;
+import java.io.FileFilter;
+import java.io.FileInputStream;
+import java.io.InputStreamReader;
+import java.io.IOException;
+import java.util.ArrayList;
+import java.util.Arrays;
+import java.util.Collections;
+import java.util.HashMap;
+import java.util.List;
+import java.util.Map;
+import java.util.Properties;
+import java.util.jar.JarFile;
+import java.util.regex.Pattern;
+
+/**
+ * Basic functionality for launchers.
--- End diff --

This could use a little explanation.  What is a launcher?  When should 
someone consider extending this class?





[GitHub] spark pull request: [SPARK-4924] Add a library for launching Spark...

2015-01-14 Thread sryza
Github user sryza commented on a diff in the pull request:

https://github.com/apache/spark/pull/3916#discussion_r22976011
  
--- Diff: 
launcher/src/main/java/org/apache/spark/launcher/LauncherCommon.java ---
@@ -0,0 +1,250 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.launcher;
+
+import java.io.File;
+import java.util.ArrayList;
+import java.util.List;
+import java.util.Map;
+
+/**
+ * Configuration key definitions for Spark jobs, and some helper methods.
+ */
+public class LauncherCommon {
+
+  /** The Spark master. */
+  public static final String SPARK_MASTER = "spark.master";
+
+  /** Configuration key for the driver memory. */
+  public static final String DRIVER_MEMORY = "spark.driver.memory";
+  /** Configuration key for the driver class path. */
+  public static final String DRIVER_EXTRA_CLASSPATH = "spark.driver.extraClassPath";
+  /** Configuration key for the driver VM options. */
+  public static final String DRIVER_EXTRA_JAVA_OPTIONS = "spark.driver.extraJavaOptions";
+  /** Configuration key for the driver native library path. */
+  public static final String DRIVER_EXTRA_LIBRARY_PATH = "spark.driver.extraLibraryPath";
+
+  /** Configuration key for the executor memory. */
+  public static final String EXECUTOR_MEMORY = "spark.executor.memory";
+  /** Configuration key for the executor class path. */
+  public static final String EXECUTOR_EXTRA_CLASSPATH = "spark.executor.extraClassPath";
+  /** Configuration key for the executor VM options. */
+  public static final String EXECUTOR_EXTRA_JAVA_OPTIONS = "spark.executor.extraJavaOptions";
+  /** Configuration key for the executor native library path. */
+  public static final String EXECUTOR_EXTRA_LIBRARY_PATH = "spark.executor.extraLibraryOptions";
+  /** Configuration key for the number of executor CPU cores. */
+  public static final String EXECUTOR_CORES = "spark.executor.cores";
+
+  /** Returns whether the given string is null or empty. */
+  protected static boolean isEmpty(String s) {
+    return s == null || s.isEmpty();
+  }
+
+  /** Joins a list of strings using the given separator. */
+  protected static String join(String sep, String... elements) {
+    StringBuilder sb = new StringBuilder();
+    for (String e : elements) {
+      if (e != null) {
+        if (sb.length() > 0) {
+          sb.append(sep);
+        }
+        sb.append(e);
+      }
+    }
+    return sb.toString();
+  }
+
+  /** Joins a list of strings using the given separator. */
--- End diff --

Can this be replaced with Guava's `Joiner.on`?  Or are we somehow avoiding 
Guava's inclusion?





[GitHub] spark pull request: [SPARK-4924] Add a library for launching Spark...

2015-01-14 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/3916#discussion_r22976317
  
--- Diff: 
launcher/src/main/java/org/apache/spark/launcher/LauncherCommon.java ---
@@ -0,0 +1,250 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.launcher;
+
+import java.io.File;
+import java.util.ArrayList;
+import java.util.List;
+import java.util.Map;
+
+/**
+ * Configuration key definitions for Spark jobs, and some helper methods.
+ */
+public class LauncherCommon {
+
+  /** The Spark master. */
+  public static final String SPARK_MASTER = "spark.master";
+
+  /** Configuration key for the driver memory. */
+  public static final String DRIVER_MEMORY = "spark.driver.memory";
+  /** Configuration key for the driver class path. */
+  public static final String DRIVER_EXTRA_CLASSPATH = "spark.driver.extraClassPath";
+  /** Configuration key for the driver VM options. */
+  public static final String DRIVER_EXTRA_JAVA_OPTIONS = "spark.driver.extraJavaOptions";
+  /** Configuration key for the driver native library path. */
+  public static final String DRIVER_EXTRA_LIBRARY_PATH = "spark.driver.extraLibraryPath";
+
+  /** Configuration key for the executor memory. */
+  public static final String EXECUTOR_MEMORY = "spark.executor.memory";
+  /** Configuration key for the executor class path. */
+  public static final String EXECUTOR_EXTRA_CLASSPATH = "spark.executor.extraClassPath";
+  /** Configuration key for the executor VM options. */
+  public static final String EXECUTOR_EXTRA_JAVA_OPTIONS = "spark.executor.extraJavaOptions";
+  /** Configuration key for the executor native library path. */
+  public static final String EXECUTOR_EXTRA_LIBRARY_PATH = "spark.executor.extraLibraryOptions";
+  /** Configuration key for the number of executor CPU cores. */
+  public static final String EXECUTOR_CORES = "spark.executor.cores";
+
+  /** Returns whether the given string is null or empty. */
+  protected static boolean isEmpty(String s) {
+    return s == null || s.isEmpty();
+  }
+
+  /** Joins a list of strings using the given separator. */
+  protected static String join(String sep, String... elements) {
+    StringBuilder sb = new StringBuilder();
+    for (String e : elements) {
+      if (e != null) {
+        if (sb.length() > 0) {
+          sb.append(sep);
+        }
+        sb.append(e);
+      }
+    }
+    return sb.toString();
+  }
+
+  /** Joins a list of strings using the given separator. */
--- End diff --

This library should not have any external dependencies.





[GitHub] spark pull request: [SPARK-2309][MLlib] Multinomial Logistic Regre...

2015-01-14 Thread dbtsai
Github user dbtsai commented on the pull request:

https://github.com/apache/spark/pull/3833#issuecomment-70007063
  
Jenkins, please retest.





[GitHub] spark pull request: [Minor] Fix tiny typo in BlockManager

2015-01-14 Thread squito
Github user squito commented on the pull request:

https://github.com/apache/spark/pull/4046#issuecomment-70013728
  
lgtm





[GitHub] spark pull request: SPARK-5217 Spark UI should report pending stag...

2015-01-14 Thread squito
Github user squito commented on the pull request:

https://github.com/apache/spark/pull/4043#issuecomment-70015231
  
lgtm.

I was going to suggest that pending stages should be sorted with the oldest 
submission time first, not reversed ... but I guess we want the completed 
stages sorted with the oldest last, and it probably makes sense to keep those 
tables consistent with each other.
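
For reference, a minimal Scala sketch of the ordering being discussed, using a simplified stage record rather than the real StageInfo:

    // Hypothetical simplified stage record (the real UI code uses StageInfo).
    case class Stage(stageId: Int, submissionTime: Long)

    // Pending stages newest-first, mirroring the completed-stages table,
    // which lists its oldest entries last.
    def sortPending(stages: Seq[Stage]): Seq[Stage] =
      stages.sortBy(-_.submissionTime)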





[GitHub] spark pull request: [SPARK-4707][STREAMING] Reliable Kafka Receive...

2015-01-14 Thread harishreedharan
Github user harishreedharan commented on the pull request:

https://github.com/apache/spark/pull/3655#issuecomment-70015327
  
No, this does prevent data loss: if the store fails multiple times, we shut 
down the receiver completely. The new receiver that then gets started resumes 
from the last committed offset, so we are safe.
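
Roughly, the retry-then-stop behavior described above looks like this (an illustrative sketch, not the PR's actual code):

    // Try the store a bounded number of times; if it keeps failing, stop the
    // receiver. A replacement receiver then resumes from the last committed
    // offset, which is what prevents data loss.
    def storeWithRetries(store: () => Unit, maxRetries: Int)(stop: Throwable => Unit): Unit = {
      var attempts = 0
      var stored = false
      while (!stored && attempts < maxRetries) {
        try {
          store()
          stored = true
        } catch {
          case e: Exception =>
            attempts += 1
            if (attempts >= maxRetries) stop(e)
        }
      }
    }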





[GitHub] spark pull request: [SPARK-4789] [SPARK-4942] [SPARK-5031] [mllib]...

2015-01-14 Thread jkbradley
Github user jkbradley commented on a diff in the pull request:

https://github.com/apache/spark/pull/3637#discussion_r22983383
  
--- Diff: examples/src/main/scala/org/apache/spark/examples/ml/DeveloperApiExample.scala ---
@@ -0,0 +1,195 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.examples.ml
+
+import org.apache.spark.{SparkConf, SparkContext}
+import org.apache.spark.SparkContext._
+import org.apache.spark.ml.classification.{Classifier, ClassifierParams, ClassificationModel}
+import org.apache.spark.ml.param.{Params, IntParam, ParamMap}
+import org.apache.spark.mllib.linalg.{BLAS, Vector, Vectors, VectorUDT}
+import org.apache.spark.mllib.regression.LabeledPoint
+import org.apache.spark.sql.{DataType, SchemaRDD, Row, SQLContext}
+
+/**
+ * A simple example demonstrating how to write your own learning algorithm using Estimator,
+ * Transformer, and other abstractions.
+ * This mimics [[org.apache.spark.ml.classification.LogisticRegression]].
+ * Run with
+ * {{{
+ * bin/run-example ml.DeveloperApiExample
+ * }}}
+ */
+object DeveloperApiExample {
+
+  def main(args: Array[String]) {
+    val conf = new SparkConf().setAppName("DeveloperApiExample")
+    val sc = new SparkContext(conf)
+    val sqlContext = new SQLContext(sc)
+    import sqlContext._
+
+    // Prepare training data.
+    val training = sparkContext.parallelize(Seq(
+      LabeledPoint(1.0, Vectors.dense(0.0, 1.1, 0.1)),
+      LabeledPoint(0.0, Vectors.dense(2.0, 1.0, -1.0)),
+      LabeledPoint(0.0, Vectors.dense(2.0, 1.3, 1.0)),
+      LabeledPoint(1.0, Vectors.dense(0.0, 1.2, -0.5))))
+
+    // Create a LogisticRegression instance.  This instance is an Estimator.
+    val lr = new MyLogisticRegression()
+    // Print out the parameters, documentation, and any default values.
+    println("MyLogisticRegression parameters:\n" + lr.explainParams() + "\n")
+
+    // We may set parameters using setter methods.
+    lr.setMaxIter(10)
+
+    // Learn a LogisticRegression model.  This uses the parameters stored in lr.
+    val model = lr.fit(training)
+
+    // Prepare test data.
+    val test = sparkContext.parallelize(Seq(
+      LabeledPoint(1.0, Vectors.dense(-1.0, 1.5, 1.3)),
+      LabeledPoint(0.0, Vectors.dense(3.0, 2.0, -0.1)),
+      LabeledPoint(1.0, Vectors.dense(0.0, 2.2, -1.5))))
+
+    // Make predictions on test data.
+    val sumPredictions: Double = model.transform(test)
+      .select('features, 'label, 'prediction)
+      .collect()
+      .map { case Row(features: Vector, label: Double, prediction: Double) =>
+        prediction
+      }.sum
+    assert(sumPredictions == 0.0,
+      "MyLogisticRegression predicted something other than 0, even though all weights are 0!")
+  }
+}
+
+/**
+ * Example of defining a parameter trait for a user-defined type of [[Classifier]].
+ *
+ * NOTE: This is private since it is an example.  In practice, you may not want it to be private.
+ */
+private trait MyLogisticRegressionParams extends ClassifierParams {
+
+  /** param for max number of iterations */
+  val maxIter: IntParam = new IntParam(this, "maxIter", "max number of iterations")
+  def getMaxIter: Int = get(maxIter)
+}
+
+/**
+ * Example of defining a type of [[Classifier]].
+ *
+ * NOTE: This is private since it is an example.  In practice, you may not want it to be private.
+ */
+private class MyLogisticRegression
+  extends Classifier[Vector, MyLogisticRegression, MyLogisticRegressionModel]
+  with MyLogisticRegressionParams {
+
+  setMaxIter(100) // Initialize
+
+  def setMaxIter(value: Int): this.type = set(maxIter, value)
+
+  override def fit(dataset: SchemaRDD, paramMap: ParamMap): MyLogisticRegressionModel = {
+    // Check schema (types).  This allows early failure before running the algorithm.
+

[GitHub] spark pull request: [SPARK-1405] [mllib] Latent Dirichlet Allocati...

2015-01-14 Thread jkbradley
Github user jkbradley commented on the pull request:

https://github.com/apache/spark/pull/4047#issuecomment-70024523
  
By the way, I'm running larger-scale tests, and I'll post results once they 
are ready!





[GitHub] spark pull request: [SPARK-5094][MLlib] Add Python API for Gradien...

2015-01-14 Thread jkbradley
Github user jkbradley commented on a diff in the pull request:

https://github.com/apache/spark/pull/3951#discussion_r22972683
  
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/api/python/PythonMLLibAPI.scala ---
@@ -21,6 +21,8 @@ import java.io.OutputStream
 import java.nio.{ByteBuffer, ByteOrder}
 import java.util.{ArrayList => JArrayList, List => JList, Map => JMap}
 
+import org.apache.spark.mllib.tree.loss.Losses
--- End diff --

Organize imports, ordered as: scala/java, outside libraries, spark 
(alphabetized within groups).
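
Concretely, the requested grouping for this file would look roughly like the following (the third-party group is shown with a hypothetical import, since the hunk above has none):

    // Group 1: java/scala imports, alphabetized.
    import java.io.OutputStream
    import java.nio.{ByteBuffer, ByteOrder}
    import java.util.{ArrayList => JArrayList, List => JList, Map => JMap}

    // Group 2: outside (third-party) libraries; hypothetical example only:
    // import breeze.linalg.DenseVector

    // Group 3: spark imports, alphabetized within the group.
    import org.apache.spark.mllib.tree.loss.Losses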





[GitHub] spark pull request: [SPARK-5094][MLlib] Add Python API for Gradien...

2015-01-14 Thread jkbradley
Github user jkbradley commented on the pull request:

https://github.com/apache/spark/pull/3951#issuecomment-69998021
  
@kazk1018  Thanks for the PR!  A few high-level items:
* Will it reduce duplicate code to abstract the TreeEnsembleModel 
concept, as in Scala?  Forests and boosting produce models that are very 
similar.  GradientBoostedTreesModel and RandomForestModel could wrap the 
abstract class.
* Default parameter values: You state default parameter values in the docs 
for trainClassifier/Regressor, but they are not actually set in the method 
declarations.  Could you please fix that?  (A sketch of the pattern follows 
below.)
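
The pattern being asked for, sketched in Scala for brevity (the actual fix belongs in the Python method declarations; the names and values below are illustrative, not the API's real defaults):

    // Declare defaults in the signature itself, so the documented defaults
    // and the effective defaults cannot drift apart.
    def trainClassifier(
        numIterations: Int = 100,
        learningRate: Double = 0.1): Unit = {
      println(s"training with numIterations=$numIterations, learningRate=$learningRate")
    }

    trainClassifier()                    // uses the declared defaults
    trainClassifier(numIterations = 50)  // overrides one default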






[GitHub] spark pull request: [SPARK-4924] Add a library for launching Spark...

2015-01-14 Thread sryza
Github user sryza commented on a diff in the pull request:

https://github.com/apache/spark/pull/3916#discussion_r22975493
  
--- Diff: core/src/main/scala/org/apache/spark/deploy/worker/CommandUtils.scala ---
@@ -19,11 +19,14 @@ package org.apache.spark.deploy.worker
 
 import java.io.{File, FileOutputStream, InputStream, IOException}
 import java.lang.System._
+import java.util.{ArrayList, List => JList, Map => JMap}
--- End diff --

ArrayList seems to be unused
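
Presumably the import can then shrink to just the names that are used:

    import java.util.{List => JList, Map => JMap}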





[GitHub] spark pull request: [SPARK-2996] Implement userClassPathFirst for ...

2015-01-14 Thread vanzin
Github user vanzin commented on a diff in the pull request:

https://github.com/apache/spark/pull/3233#discussion_r22975503
  
--- Diff: yarn/src/test/resources/log4j.properties ---
@@ -16,7 +16,7 @@
 #
 
 # Set everything to be logged to the file target/unit-tests.log
-log4j.rootCategory=INFO, file
+log4j.rootCategory=DEBUG, file
--- End diff --

Yes. It never made much sense to me to have test logs restricted to `INFO`.





[GitHub] spark pull request: [SPARK-2309][MLlib] Multinomial Logistic Regre...

2015-01-14 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/3833#issuecomment-70006013
  
  [Test build #25570 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/25570/consoleFull) for PR 3833 at commit [`9cf9811`](https://github.com/apache/spark/commit/9cf98115c9b8ba76cd4b460e205ba87328c4e471).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.




