[GitHub] spark pull request: [SPARK-9585] add config to enable inputFormat ...

2015-09-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/7918#issuecomment-141364727
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42639/
Test PASSed.





[GitHub] spark pull request: [SPARK-10684] [SQL] StructType.interpretedOrde...

2015-09-17 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/8808#issuecomment-141364769
  
  [Test build #1772 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/1772/console)
 for   PR 8808 at commit 
[`a26512b`](https://github.com/apache/spark/commit/a26512b12339a5f82d7c55c6107a1fe5e50ac43d).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `case class TaskCommitDenied(`
  * `class Interaction(override val uid: String) extends Transformer`
  * `abstract class LocalNode(conf: SQLConf) extends QueryPlan[LocalNode] with Logging `






[GitHub] spark pull request: [SPARK-9585] add config to enable inputFormat ...

2015-09-17 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/7918#issuecomment-141364629
  
  [Test build #42639 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42639/console)
 for   PR 7918 at commit 
[`3c1d41d`](https://github.com/apache/spark/commit/3c1d41d8d8b338b2305281f9ab6b5db927a2706c).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-9585] add config to enable inputFormat ...

2015-09-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/7918#issuecomment-141364725
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-8402][MLLIB] DP Means Clustering

2015-09-17 Thread FlytxtRnD
Github user FlytxtRnD commented on a diff in the pull request:

https://github.com/apache/spark/pull/6880#discussion_r39827805
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/clustering/DpMeans.scala ---
@@ -0,0 +1,247 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.mllib.clustering
+
+import scala.collection.mutable.ArrayBuffer
+
+import org.apache.spark.Logging
+import org.apache.spark.annotation.Experimental
+import org.apache.spark.mllib.linalg.{Vector, Vectors}
+import org.apache.spark.mllib.linalg.BLAS.{axpy, scal}
+import org.apache.spark.mllib.util.MLUtils
+import org.apache.spark.rdd.RDD
+import org.apache.spark.storage.StorageLevel
+
+/**
+ * :: Experimental ::
+ *
+ * The Dirichlet process (DP) is a popular non-parametric Bayesian mixture
+ * model that allows for flexible clustering of data without having to
+ * determine the number of clusters in advance.
+ *
+ * Given a set of data points, this class performs the cluster creation process
+ * based on the DP-means algorithm, iterating until the maximum number of
+ * iterations is reached or the convergence criterion is satisfied. With the current
+ * global set of centers, it locally creates a new cluster centered at `x`
+ * whenever it encounters an uncovered data point `x`. In a similar manner,
+ * a local cluster center is promoted to a global center whenever an uncovered
+ * local cluster center is found. A data point is said to be "covered" by
+ * a cluster `c` if the distance from the point to the cluster center of `c`
+ * is less than a given lambda value.
+ *
+ * The original paper is "MLbase: Distributed Machine Learning Made Easy" by
+ * Xinghao Pan, Evan R. Sparks, Andre Wibisono.
+ *
+ * @param lambda The distance threshold value that controls cluster creation.
+ * @param convergenceTol The threshold value at which convergence is considered
+ *   to have occurred.
+ * @param maxIterations The maximum number of iterations to perform.
+ */
+
+@Experimental
+class DpMeans private (
+    private var lambda: Double,
+    private var convergenceTol: Double,
+    private var maxIterations: Int) extends Serializable with Logging {
+
+  /**
+   * Constructs a default instance. The default parameters are {lambda: 1,
+   * convergenceTol: 0.01, maxIterations: 20}.
+   */
+  def this() = this(1, 0.01, 20)
+
+  /** Return the distance threshold that controls cluster creation. */
+  def getLambda(): Double = lambda
+
+  /** Set the distance threshold that controls cluster creation. Default: 1 */
+  def setLambda(lambda: Double): this.type = {
+    this.lambda = lambda
+    this
+  }
+
+  /** Set the threshold value at which convergence is considered to have occurred. Default: 0.01 */
+  def setConvergenceTol(convergenceTol: Double): this.type = {
+    this.convergenceTol = convergenceTol
+    this
+  }
+
+  /** Return the threshold value at which convergence is considered to have occurred. */
+  def getConvergenceTol: Double = convergenceTol
+
+  /** Set the maximum number of iterations. Default: 20 */
+  def setMaxIterations(maxIterations: Int): this.type = {
+    this.maxIterations = maxIterations
+    this
+  }
+
+  /** Return the maximum number of iterations. */
+  def getMaxIterations: Int = maxIterations
+
+  /**
+   * Perform DP means clustering
+   */
+  def run(data: RDD[Vector]): DpMeansModel = {
+    if (data.getStorageLevel == StorageLevel.NONE) {
+      logWarning("The input data is not directly cached, which may hurt performance if its"
+        + " parent RDDs are also uncached.")
+    }
+
+    // Compute squared norms and cache them.
+    val norms = data.map(Vectors.norm(_, 2.0))
+    norms.persist()
+    val zippedData = data.zip(norms).map {
+      case (v, norm) => new VectorWithNorm(v, norm)
+    }
+
+    // Implementation of DP mean
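
As an illustration of the covering rule described in the scaladoc above, a minimal
standalone sketch (hypothetical `DpMeansSketch`/`fit` names; a sequential, in-memory
toy version, not the PR's RDD-based implementation):

```scala
// Sequential DP-means sketch: a point "uncovered" by every existing center
// (i.e. farther than lambda from all of them) becomes a new center.
object DpMeansSketch {

  def distance(a: Array[Double], b: Array[Double]): Double =
    math.sqrt(a.zip(b).map { case (x, y) => (x - y) * (x - y) }.sum)

  /** One pass over the data, creating a center whenever a point is uncovered. */
  def fit(points: Seq[Array[Double]], lambda: Double): Seq[Array[Double]] =
    points.foldLeft(Vector.empty[Array[Double]]) { (centers, x) =>
      if (centers.isEmpty || centers.map(c => distance(c, x)).min > lambda) centers :+ x
      else centers
    }

  def main(args: Array[String]): Unit = {
    val data = Seq(Array(0.0, 0.0), Array(0.1, 0.0), Array(5.0, 5.0), Array(5.1, 4.9))
    // With lambda = 1.0 the two well-separated groups yield two centers.
    fit(data, lambda = 1.0).foreach(c => println(c.mkString("(", ", ", ")")))
  }
}
```

Per the scaladoc, the distributed version differs mainly in that candidate centers are
first created locally per partition and then promoted to global centers using the same
lambda test.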

[GitHub] spark pull request: [SPARK-10623] [SQL] Fixes ORC predicate push-d...

2015-09-17 Thread zhzhan
Github user zhzhan commented on the pull request:

https://github.com/apache/spark/pull/8799#issuecomment-141355072
  
LGTM Thanks for fixing this. 





[GitHub] spark pull request: [SPARK-3147][MLLib][Streaming] Streaming 2-sam...

2015-09-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/4716#issuecomment-141353561
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-3147][MLLib][Streaming] Streaming 2-sam...

2015-09-17 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4716#issuecomment-141353499
  
  [Test build #42642 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42642/console)
 for   PR 4716 at commit 
[`60b2e57`](https://github.com/apache/spark/commit/60b2e57026febcb68e459983ba3164281a47f636).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-3147][MLLib][Streaming] Streaming 2-sam...

2015-09-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/4716#issuecomment-141353562
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42642/
Test PASSed.





[GitHub] spark pull request: docs/running-on-mesos.md: state default values...

2015-09-17 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/8810#discussion_r39825765
  
--- Diff: docs/running-on-mesos.md ---
@@ -332,21 +332,21 @@ See the [configuration page](configuration.html) for 
information on Spark config
 
 
   spark.mesos.principal
-  Framework principal to authenticate to Mesos
+  (none)
   
 Set the principal with which Spark framework will use to authenticate 
with Mesos.
   
 
 
   spark.mesos.secret
-  Framework secret to authenticate to Mesos
+  (none)
   
 Set the secret with which Spark framework will use to authenticate 
with Mesos.
   
 
 
   spark.mesos.role
-  Role for the Spark framework
+  *
--- End diff --

I've already merged this though so don't worry about it.






[GitHub] spark pull request: docs/running-on-mesos.md: state default values...

2015-09-17 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/8810#discussion_r39825764
  
--- Diff: docs/running-on-mesos.md ---
@@ -332,21 +332,21 @@ See the [configuration page](configuration.html) for 
information on Spark config
 
 
   spark.mesos.principal
-  Framework principal to authenticate to Mesos
+  (none)
   
 Set the principal with which Spark framework will use to authenticate 
with Mesos.
   
 
 
   spark.mesos.secret
-  Framework secret to authenticate to Mesos
+  (none)
   
 Set the secret with which Spark framework will use to authenticate 
with Mesos.
   
 
 
   spark.mesos.role
-  Role for the Spark framework
+  *
--- End diff --

Oh I meant 
```
*
```





[GitHub] spark pull request: docs/running-on-mesos.md: state default values...

2015-09-17 Thread felixb
Github user felixb commented on a diff in the pull request:

https://github.com/apache/spark/pull/8810#discussion_r39825730
  
--- Diff: docs/running-on-mesos.md ---
@@ -332,21 +332,21 @@ See the [configuration page](configuration.html) for 
information on Spark config
 
 
   spark.mesos.principal
-  Framework principal to authenticate to Mesos
+  (none)
   
 Set the principal with which Spark framework will use to authenticate 
with Mesos.
   
 
 
   spark.mesos.secret
-  Framework secret to authenticate to Mesos
+  (none)
   
 Set the secret with which Spark framework will use to authenticate 
with Mesos.
   
 
 
   spark.mesos.role
-  Role for the Spark framework
+  *
--- End diff --

I don't see anything.
Should I set it to `"*"`, `(*)` or ` * ` with blanks on each side?





[GitHub] spark pull request: docs/running-on-mesos.md: state default values...

2015-09-17 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/8810





[GitHub] spark pull request: [SPARK-10272][Pyspark][MLLib] Added @since tag...

2015-09-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8628#issuecomment-141351519
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42643/
Test PASSed.





[GitHub] spark pull request: [SPARK-10272][Pyspark][MLLib] Added @since tag...

2015-09-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8628#issuecomment-141351516
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: docs/running-on-mesos.md: state default values...

2015-09-17 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/8810#issuecomment-141351178
  
Actually I will just merge this and address the comment when I merge.






[GitHub] spark pull request: [SPARK-10272][Pyspark][MLLib] Added @since tag...

2015-09-17 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/8628#issuecomment-141351219
  
  [Test build #42643 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42643/console)
 for   PR 8628 at commit 
[`9f06d04`](https://github.com/apache/spark/commit/9f06d04b272b413cee27ccfa35dd304843b264c9).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-10648] Proposed bug fix when oracle ret...

2015-09-17 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/8780#issuecomment-141351034
  
(I actually don't know if Spark implements this correctly -- we should test 
it)





[GitHub] spark pull request: [SPARK-10648] Proposed bug fix when oracle ret...

2015-09-17 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/8780#issuecomment-141350914
  
Actually, scale can be negative. It just means the number of 0s to the left of the
decimal point.

For example, for the number 123 with precision = 2 and scale = -1, 123 would
become 120.
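
As a small, self-contained illustration of the same point (using `java.math.BigDecimal`;
not code from the PR):

```scala
import java.math.{BigDecimal => JBigDecimal, RoundingMode}

object NegativeScaleExample {
  def main(args: Array[String]): Unit = {
    // Rounding 123 to scale -1 rounds at the tens place: 123 -> 120,
    // leaving 2 significant digits (precision 2) and scale -1.
    val rounded = new JBigDecimal(123).setScale(-1, RoundingMode.HALF_UP)
    println(s"value = ${rounded.toPlainString}, precision = ${rounded.precision}, scale = ${rounded.scale}")
    // prints: value = 120, precision = 2, scale = -1
  }
}
```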






[GitHub] spark pull request: docs/running-on-mesos.md: state default values...

2015-09-17 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/8810#discussion_r39825352
  
--- Diff: docs/running-on-mesos.md ---
@@ -332,21 +332,21 @@ See the [configuration page](configuration.html) for 
information on Spark config
 
 
   spark.mesos.principal
-  Framework principal to authenticate to Mesos
+  (none)
   
 Set the principal with which Spark framework will use to authenticate 
with Mesos.
   
 
 
   spark.mesos.secret
-  Framework secret to authenticate to Mesos
+  (none)
   
 Set the secret with which Spark framework will use to authenticate 
with Mesos.
   
 
 
   spark.mesos.role
-  Role for the Spark framework
+  *
--- End diff --

can you put `<code>` around this?





[GitHub] spark pull request: docs/running-on-mesos.md: state default values...

2015-09-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8810#issuecomment-141349184
  
Can one of the admins verify this patch?





[GitHub] spark pull request: [SPARK-9522][SQL] SparkSubmit process can not ...

2015-09-17 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/7853





[GitHub] spark pull request: [SPARK-10269][Pyspark][MLLib] Add @since annot...

2015-09-17 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/8626#issuecomment-141349076
  
  [Test build #42645 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42645/consoleFull)
 for   PR 8626 at commit 
[`2e81fd3`](https://github.com/apache/spark/commit/2e81fd314b98a460560376161a3d03950b0ed8fc).





[GitHub] spark pull request: docs/running-on-mesos.md: state default values...

2015-09-17 Thread felixb
GitHub user felixb opened a pull request:

https://github.com/apache/spark/pull/8810

docs/running-on-mesos.md: state default values in default column

This PR simply uses the default value column for defaults.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/felixb/spark fix_mesos_doc

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/8810.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #8810


commit 09b4e15f0dfe903ac70c0d6f4a8fcf06dac6d78b
Author: Felix Bechstein 
Date:   2015-09-18T05:25:00Z

docs/running-on-mesos.md: state default values in default column







[GitHub] spark pull request: [SPARK-10471] [CORE] [MESOS] prevent getting o...

2015-09-17 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/8639#issuecomment-141349042
  
  [Test build #42646 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42646/consoleFull)
 for   PR 8639 at commit 
[`58aaa79`](https://github.com/apache/spark/commit/58aaa79095143187175f0292d71b772b90db).





[GitHub] spark pull request: [SPARK-10271][Pyspark][MLLib] Added @since tag...

2015-09-17 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/8627#issuecomment-141349055
  
  [Test build #42644 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42644/consoleFull)
 for   PR 8627 at commit 
[`100ce0f`](https://github.com/apache/spark/commit/100ce0fc36e9d143f6789db2a749afb8902d0676).





[GitHub] spark pull request: [SPARK-10272][Pyspark][MLLib] Added @since tag...

2015-09-17 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/8628#issuecomment-141348863
  
  [Test build #42643 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42643/consoleFull)
 for   PR 8628 at commit 
[`9f06d04`](https://github.com/apache/spark/commit/9f06d04b272b413cee27ccfa35dd304843b264c9).





[GitHub] spark pull request: [SPARK-9522][SQL] SparkSubmit process can not ...

2015-09-17 Thread yhuai
Github user yhuai commented on the pull request:

https://github.com/apache/spark/pull/7853#issuecomment-141348696
  
lgtm. merging to 1.5 branch and master.





[GitHub] spark pull request: [SPARK-10471] [CORE] [MESOS] prevent getting o...

2015-09-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8639#issuecomment-141348570
  
Merged build started.





[GitHub] spark pull request: [SPARK-10471] [CORE] [MESOS] prevent getting o...

2015-09-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8639#issuecomment-141348557
  
 Merged build triggered.





[GitHub] spark pull request: [SPARK-10471] [CORE] [MESOS] prevent getting o...

2015-09-17 Thread felixb
Github user felixb commented on the pull request:

https://github.com/apache/spark/pull/8639#issuecomment-141348584
  
added to table of parameters.





[GitHub] spark pull request: [SPARK-3147][MLLib][Streaming] Streaming 2-sam...

2015-09-17 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/4716#issuecomment-141347889
  
  [Test build #42642 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42642/consoleFull)
 for   PR 4716 at commit 
[`60b2e57`](https://github.com/apache/spark/commit/60b2e57026febcb68e459983ba3164281a47f636).





[GitHub] spark pull request: SPARK-10329 Cost RDD in k-means|| initializati...

2015-09-17 Thread mengxr
Github user mengxr commented on the pull request:

https://github.com/apache/spark/pull/8546#issuecomment-141347838
  
@HuJiayin This basically reverts the behavior back to 1.2. The changes we made in 1.3
were to avoid recomputing distances between old centers and input points during
initialization. That is why we need `newCenters`. If you test the current version with
a large `k`, you will see the performance difference. Based on our discussion offline,
I think there is not much work to do here. The case where the new implementation
introduces overhead is when the dataset is really tall and skinny, but we haven't heard
negative feedback from practical use cases yet. Do you mind closing this PR for now? Thanks!
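
For context, the role of `newCenters` can be sketched as follows (a hypothetical,
single-machine illustration, not Spark's actual `KMeans` code): each point keeps a
cached cost that is updated against only the centers added in the current round, so
distances to previously chosen centers are never recomputed.

```scala
// Toy sketch of cached-cost updates in k-means||-style initialization.
object KMeansParallelCostSketch {
  type Point = Array[Double]

  def squaredDistance(a: Point, b: Point): Double =
    a.zip(b).map { case (x, y) => (x - y) * (x - y) }.sum

  /** Update cached per-point costs using only the newly added centers. */
  def updateCosts(points: Seq[Point], costs: Seq[Double], newCenters: Seq[Point]): Seq[Double] =
    points.zip(costs).map { case (p, oldCost) =>
      if (newCenters.isEmpty) oldCost
      else math.min(oldCost, newCenters.map(c => squaredDistance(p, c)).min)
    }

  def main(args: Array[String]): Unit = {
    val points = Seq(Array(0.0, 0.0), Array(1.0, 1.0), Array(9.0, 9.0))
    var costs: Seq[Double] = Seq.fill(points.size)(Double.PositiveInfinity)
    // Each round only touches the centers chosen in that round.
    costs = updateCosts(points, costs, Seq(Array(0.0, 0.0)))
    costs = updateCosts(points, costs, Seq(Array(9.0, 9.0)))
    println(costs.mkString(", ")) // 0.0, 2.0, 0.0
  }
}
```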





[GitHub] spark pull request: [SPARK-10272][Pyspark][MLLib] Added @since tag...

2015-09-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8628#issuecomment-141347398
  
 Merged build triggered.





[GitHub] spark pull request: [SPARK-10269][Pyspark][MLLib] Add @since annot...

2015-09-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8626#issuecomment-141347427
  
Merged build started.





[GitHub] spark pull request: [SPARK-10269][Pyspark][MLLib] Add @since annot...

2015-09-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8626#issuecomment-141347413
  
 Merged build triggered.





[GitHub] spark pull request: [SPARK-10272][Pyspark][MLLib] Added @since tag...

2015-09-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8628#issuecomment-141347425
  
Merged build started.





[GitHub] spark pull request: [SPARK-10271][Pyspark][MLLib] Added @since tag...

2015-09-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8627#issuecomment-141347428
  
Merged build started.





[GitHub] spark pull request: [SPARK-10577] [PySpark] DataFrame hint for bro...

2015-09-17 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/8801#issuecomment-141347383
  
Jenkins, test this please.






[GitHub] spark pull request: [SPARK-10271][Pyspark][MLLib] Added @since tag...

2015-09-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8627#issuecomment-141347405
  
 Merged build triggered.





[GitHub] spark pull request: [SPARK-10679] [CORE] javax.jdo.JDOFatalUserExc...

2015-09-17 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/8804#discussion_r39824809
  
--- Diff: 
sql/hive/src/main/scala/org/apache/spark/sql/hive/TableReader.scala ---
@@ -295,13 +298,25 @@ private[hive] object HadoopTableReader extends HiveInspectors with Logging {
   def initializeLocalJobConfFunc(path: String, tableDesc: TableDesc)(jobConf: JobConf) {
 FileInputFormat.setInputPaths(jobConf, Seq[Path](new Path(path)): _*)
 if (tableDesc != null) {
-  PlanUtils.configureInputJobPropertiesForStorageHandler(tableDesc)
+  configureJobPropertiesForStorageHandler(tableDesc, jobConf)
   Utilities.copyTableJobPropertiesToConf(tableDesc, jobConf)
 }
 val bufferSize = System.getProperty("spark.buffer.size", "65536")
 jobConf.set("io.file.buffer.size", bufferSize)
   }
 
+  private def configureJobPropertiesForStorageHandler(tableDesc: TableDesc, jobConf: JobConf) {
--- End diff --

can you add a comment explaining what's happening, and why this is done this way?





[GitHub] spark pull request: [SPARK-10659][SQL] Add an option in SQLConf fo...

2015-09-17 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/8809#issuecomment-141347277
  
  [Test build #42641 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42641/consoleFull)
 for   PR 8809 at commit 
[`6911d0f`](https://github.com/apache/spark/commit/6911d0ff9d82475f69ae558cbb9aab1ed588c847).





[GitHub] spark pull request: [SPARK-10684] [SQL] StructType.interpretedOrde...

2015-09-17 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/8808#issuecomment-141347060
  
  [Test build #1772 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/1772/consoleFull)
 for   PR 8808 at commit 
[`a26512b`](https://github.com/apache/spark/commit/a26512b12339a5f82d7c55c6107a1fe5e50ac43d).





[GitHub] spark pull request: [SPARK-10272][Pyspark][MLLib] Added @since tag...

2015-09-17 Thread mengxr
Github user mengxr commented on the pull request:

https://github.com/apache/spark/pull/8628#issuecomment-141347073
  
@yu-iskw Could you help review this PR? Thanks!





[GitHub] spark pull request: [SPARK-10271][Pyspark][MLLib] Added @since tag...

2015-09-17 Thread mengxr
Github user mengxr commented on the pull request:

https://github.com/apache/spark/pull/8627#issuecomment-141347089
  
@yu-iskw Could you help review this PR? Thanks!





[GitHub] spark pull request: [SPARK-10269][Pyspark][MLLib] Add @since annot...

2015-09-17 Thread mengxr
Github user mengxr commented on the pull request:

https://github.com/apache/spark/pull/8626#issuecomment-141347059
  
@yu-iskw Could you help review this PR? Thanks!





[GitHub] spark pull request: [SPARK-10659][SQL] Add an option in SQLConf fo...

2015-09-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8809#issuecomment-141346942
  
 Merged build triggered.





[GitHub] spark pull request: [SPARK-10659][SQL] Add an option in SQLConf fo...

2015-09-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8809#issuecomment-141346996
  
Merged build started.





[GitHub] spark pull request: [SPARK-10272][Pyspark][MLLib] Added @since tag...

2015-09-17 Thread mengxr
Github user mengxr commented on the pull request:

https://github.com/apache/spark/pull/8628#issuecomment-141347019
  
test this please





[GitHub] spark pull request: [SPARK-10269][Pyspark][MLLib] Add @since annot...

2015-09-17 Thread mengxr
Github user mengxr commented on the pull request:

https://github.com/apache/spark/pull/8626#issuecomment-141346952
  
test this please





[GitHub] spark pull request: [SPARK-3147][MLLib][Streaming] Streaming 2-sam...

2015-09-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/4716#issuecomment-141346945
  
 Merged build triggered.





[GitHub] spark pull request: [SPARK-10271][Pyspark][MLLib] Added @since tag...

2015-09-17 Thread mengxr
Github user mengxr commented on the pull request:

https://github.com/apache/spark/pull/8627#issuecomment-141347014
  
test this please





[GitHub] spark pull request: [SPARK-3147][MLLib][Streaming] Streaming 2-sam...

2015-09-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/4716#issuecomment-141346997
  
Merged build started.





[GitHub] spark pull request: [SPARK-3147][MLLib][Streaming] Streaming 2-sam...

2015-09-17 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/4716#discussion_r39824573
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/stat/test/TestResult.scala ---
@@ -115,3 +115,25 @@ class KolmogorovSmirnovTestResult private[stat] (
 "Kolmogorov-Smirnov test summary:\n" + super.toString
   }
 }
+
+/**
+ * :: Experimental ::
+ * Object containing the test results for streaming testing.
+ */
+@Experimental
+@Since("1.6.0")
+private[stat] class StreamingTestResult(
--- End diff --

add @Since to constructor





[GitHub] spark pull request: [SPARK-10659][SQL] Add an option in SQLConf fo...

2015-09-17 Thread viirya
GitHub user viirya opened a pull request:

https://github.com/apache/spark/pull/8809

[SPARK-10659][SQL] Add an option in SQLConf for setting schema nullable in 
datasource

JIRA: https://issues.apache.org/jira/browse/SPARK-10659

If not preserving the REQUIRED (not nullable) flag in the schema is a problem for
users, I think we can add an option (enabled by default) to control this behavior.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/viirya/spark-1 optional_asnullable

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/8809.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #8809


commit 6911d0ff9d82475f69ae558cbb9aab1ed588c847
Author: Liang-Chi Hsieh 
Date:   2015-09-18T05:03:15Z

Add an option in SQLConf for schema asNullable.







[GitHub] spark pull request: [SPARK-3147][MLLib][Streaming] Streaming 2-sam...

2015-09-17 Thread mengxr
Github user mengxr commented on the pull request:

https://github.com/apache/spark/pull/4716#issuecomment-141346526
  
test this please





[GitHub] spark pull request: [SPARK-3147][MLLib][Streaming] Streaming 2-sam...

2015-09-17 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/4716#discussion_r39824556
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/stat/test/StreamingTestMethod.scala 
---
@@ -0,0 +1,165 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.mllib.stat.test
+
+import java.io.Serializable
+
+import scala.language.implicitConversions
+import scala.math.pow
+
+import com.twitter.chill.MeatLocker
+import org.apache.commons.math3.stat.descriptive.StatisticalSummaryValues
+import org.apache.commons.math3.stat.inference.TTest
+
+import org.apache.spark.Logging
+import org.apache.spark.streaming.dstream.DStream
+import org.apache.spark.util.StatCounter
+
+/**
+ * Significance testing methods for [[StreamingTest]]. New 2-sample statistical
+ * significance tests should extend [[StreamingTestMethod]] and introduce a new entry in
+ * [[StreamingTestMethod.TEST_NAME_TO_OBJECT]]
+ */
+private[stat] sealed trait StreamingTestMethod extends Serializable {
+
+  val MethodName: String
+  val NullHypothesis: String
+
+  protected type SummaryPairStream =
+    DStream[(StatCounter, StatCounter)]
+
+  /**
+   * Perform streaming 2-sample statistical significance testing.
+   *
+   * @param sampleSummaries stream pairs of summary statistics for the 2 samples
+   * @return stream of test results
+   */
+  def doTest(sampleSummaries: SummaryPairStream): DStream[StreamingTestResult]
+
+  /**
+   * Implicit adapter to convert between streaming summary statistics type and the
+   * type required by the t-testing libraries.
+   */
+  protected implicit def toApacheCommonsStats(
+      summaryStats: StatCounter): StatisticalSummaryValues = {
+    new StatisticalSummaryValues(
+      summaryStats.mean,
+      summaryStats.variance,
+      summaryStats.count,
+      summaryStats.max,
+      summaryStats.min,
+      summaryStats.mean * summaryStats.count
+    )
+  }
+}
+
+/**
+ * Performs Welch's 2-sample t-test. The null hypothesis is that the two 
data sets have equal mean.
+ * This test does not assume equal variance between the two samples and 
does not assume equal
+ * sample size.
+ *
+ * More information: http://en.wikipedia.org/wiki/Welch%27s_t_test
+ */
+private[stat] object WelchTTest extends StreamingTestMethod with Logging {
+
+  final val MethodName = "Welch's 2-sample T-test"
+  final val NullHypothesis = "Both groups have same mean"
+
+  private final val TTester = MeatLocker(new TTest())
+
+  def doTest(data: SummaryPairStream): DStream[StreamingTestResult] =
+data.map[StreamingTestResult]((test _).tupled)
+
+  private def test(
+  statsA: StatCounter,
+  statsB: StatCounter): StreamingTestResult = {
+def welchDF(sample1: StatisticalSummaryValues, sample2: 
StatisticalSummaryValues): Double = {
+  val s1 = sample1.getVariance
+  val n1 = sample1.getN
+  val s2 = sample2.getVariance
+  val n2 = sample2.getN
+
+  val a = pow(s1, 2) / n1
+  val b = pow(s2, 2) / n2
+
+  pow(a + b, 2) / ((pow(a, 2) / (n1 - 1)) + (pow(b, 2) / (n2 - 1)))
+}
+
+new StreamingTestResult(
+  TTester.get.tTest(statsA, statsB),
+  welchDF(statsA, statsB),
+  TTester.get.t(statsA, statsB),
+  MethodName,
+  NullHypothesis
+)
+  }
+}
+
+/**
+ * Performs Student's 2-sample t-test. The null hypothesis is that the
two data sets have equal
+ * mean. This test assumes equal variance between the two samples and does 
not assume equal sample
+ * size. For unequal variances, Welch's t-test should be used instead.
+ *
+ * More information: http://en.wikipedia.org/wiki/Student%27s_t-test
+ */
+private[stat] object StudentTTest extends StreamingTestMethod with Logging 
{
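
A note on the quantity that `welchDF` in the quoted diff approximates: Welch's t-test uses the Welch–Satterthwaite estimate of the degrees of freedom. Written in terms of the sample variances s_1^2, s_2^2 and sample sizes n_1, n_2 (notation chosen here for illustration, not taken from the patch), it is:

```latex
% Welch–Satterthwaite degrees of freedom for Welch's two-sample t-test.
% s_i^2 denotes the i-th sample variance and n_i the i-th sample size.
\nu \;\approx\;
  \frac{\left( \frac{s_1^2}{n_1} + \frac{s_2^2}{n_2} \right)^{2}}
       {\frac{\left( s_1^2 / n_1 \right)^{2}}{n_1 - 1}
        + \frac{\left( s_2^2 / n_2 \right)^{2}}{n_2 - 1}}
```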
 

[GitHub] spark pull request: [SPARK-3147][MLLib][Streaming] Streaming 2-sam...

2015-09-17 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/4716#discussion_r39824567
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/stat/test/StreamingTestMethod.scala 
---
@@ -0,0 +1,165 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.mllib.stat.test
+
+import java.io.Serializable
+
+import scala.language.implicitConversions
+import scala.math.pow
+
+import com.twitter.chill.MeatLocker
+import org.apache.commons.math3.stat.descriptive.StatisticalSummaryValues
+import org.apache.commons.math3.stat.inference.TTest
+
+import org.apache.spark.Logging
+import org.apache.spark.streaming.dstream.DStream
+import org.apache.spark.util.StatCounter
+
+/**
+ * Significance testing methods for [[StreamingTest]]. New 2-sample 
statistical significance tests
+ * should extend [[StreamingTestMethod]] and introduce a new entry in
+ * [[StreamingTestMethod.TEST_NAME_TO_OBJECT]]
+ */
+private[stat] sealed trait StreamingTestMethod extends Serializable {
+
+  val MethodName: String
+  val NullHypothesis: String
+
+  protected type SummaryPairStream =
+DStream[(StatCounter, StatCounter)]
+
+  /**
+   * Perform streaming 2-sample statistical significance testing.
+   *
+   * @param sampleSummaries stream of pairs of summary statistics for the 2
samples
+   * @return stream of test results
+   */
+  def doTest(sampleSummaries: SummaryPairStream): 
DStream[StreamingTestResult]
+
+  /**
+   * Implicit adapter to convert between streaming summary statistics type 
and the type required by
+   * the t-testing libraries.
+   */
+  protected implicit def toApacheCommonsStats(
+  summaryStats: StatCounter): StatisticalSummaryValues = {
+new StatisticalSummaryValues(
+  summaryStats.mean,
+  summaryStats.variance,
+  summaryStats.count,
+  summaryStats.max,
+  summaryStats.min,
+  summaryStats.mean * summaryStats.count
+)
+  }
+}
+
+/**
+ * Performs Welch's 2-sample t-test. The null hypothesis is that the two 
data sets have equal mean.
+ * This test does not assume equal variance between the two samples and 
does not assume equal
+ * sample size.
+ *
+ * More information: http://en.wikipedia.org/wiki/Welch%27s_t_test
+ */
+private[stat] object WelchTTest extends StreamingTestMethod with Logging {
+
+  final val MethodName = "Welch's 2-sample T-test"
+  final val NullHypothesis = "Both groups have same mean"
+
+  private final val TTester = MeatLocker(new TTest())
+
+  def doTest(data: SummaryPairStream): DStream[StreamingTestResult] =
+data.map[StreamingTestResult]((test _).tupled)
+
+  private def test(
+  statsA: StatCounter,
+  statsB: StatCounter): StreamingTestResult = {
+def welchDF(sample1: StatisticalSummaryValues, sample2: 
StatisticalSummaryValues): Double = {
+  val s1 = sample1.getVariance
+  val n1 = sample1.getN
+  val s2 = sample2.getVariance
+  val n2 = sample2.getN
+
+  val a = pow(s1, 2) / n1
+  val b = pow(s2, 2) / n2
+
+  pow(a + b, 2) / ((pow(a, 2) / (n1 - 1)) + (pow(b, 2) / (n2 - 1)))
+}
+
+new StreamingTestResult(
+  TTester.get.tTest(statsA, statsB),
+  welchDF(statsA, statsB),
+  TTester.get.t(statsA, statsB),
+  MethodName,
+  NullHypothesis
+)
+  }
+}
+
+/**
+ * Performs Student's 2-sample t-test. The null hypothesis is that the
two data sets have equal
+ * mean. This test assumes equal variance between the two samples and does 
not assume equal sample
+ * size. For unequal variances, Welch's t-test should be used instead.
+ *
+ * More information: http://en.wikipedia.org/wiki/Student%27s_t-test
+ */
+private[stat] object StudentTTest extends StreamingTestMethod with Logging 
{
 

[GitHub] spark pull request: [SPARK-3147][MLLib][Streaming] Streaming 2-sam...

2015-09-17 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/4716#discussion_r39824543
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/stat/test/StreamingTestMethod.scala 
---
@@ -0,0 +1,165 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.mllib.stat.test
+
+import java.io.Serializable
+
+import scala.language.implicitConversions
+import scala.math.pow
+
+import com.twitter.chill.MeatLocker
+import org.apache.commons.math3.stat.descriptive.StatisticalSummaryValues
+import org.apache.commons.math3.stat.inference.TTest
+
+import org.apache.spark.Logging
+import org.apache.spark.streaming.dstream.DStream
+import org.apache.spark.util.StatCounter
+
+/**
+ * Significance testing methods for [[StreamingTest]]. New 2-sample 
statistical significance tests
+ * should extend [[StreamingTestMethod]] and introduce a new entry in
+ * [[StreamingTestMethod.TEST_NAME_TO_OBJECT]]
+ */
+private[stat] sealed trait StreamingTestMethod extends Serializable {
+
+  val MethodName: String
+  val NullHypothesis: String
+
+  protected type SummaryPairStream =
+DStream[(StatCounter, StatCounter)]
+
+  /**
+   * Perform streaming 2-sample statistical significance testing.
+   *
+   * @param sampleSummaries stream of pairs of summary statistics for the 2
samples
+   * @return stream of test results
+   */
+  def doTest(sampleSummaries: SummaryPairStream): 
DStream[StreamingTestResult]
+
+  /**
+   * Implicit adapter to convert between streaming summary statistics type 
and the type required by
+   * the t-testing libraries.
+   */
+  protected implicit def toApacheCommonsStats(
+  summaryStats: StatCounter): StatisticalSummaryValues = {
+new StatisticalSummaryValues(
+  summaryStats.mean,
+  summaryStats.variance,
+  summaryStats.count,
+  summaryStats.max,
+  summaryStats.min,
+  summaryStats.mean * summaryStats.count
+)
+  }
+}
+
+/**
+ * Performs Welch's 2-sample t-test. The null hypothesis is that the two 
data sets have equal mean.
+ * This test does not assume equal variance between the two samples and 
does not assume equal
+ * sample size.
+ *
+ * More information: http://en.wikipedia.org/wiki/Welch%27s_t_test
+ */
+private[stat] object WelchTTest extends StreamingTestMethod with Logging {
+
+  final val MethodName = "Welch's 2-sample T-test"
+  final val NullHypothesis = "Both groups have same mean"
+
+  private final val TTester = MeatLocker(new TTest())
--- End diff --

`tTester`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-10682][GraphX] Remove Bagel test suites...

2015-09-17 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/8807


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3147][MLLib][Streaming] Streaming 2-sam...

2015-09-17 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/4716#discussion_r39824515
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/stat/test/StreamingTestMethod.scala 
---
@@ -0,0 +1,165 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.mllib.stat.test
+
+import java.io.Serializable
+
+import scala.language.implicitConversions
+import scala.math.pow
+
+import com.twitter.chill.MeatLocker
+import org.apache.commons.math3.stat.descriptive.StatisticalSummaryValues
+import org.apache.commons.math3.stat.inference.TTest
+
+import org.apache.spark.Logging
+import org.apache.spark.streaming.dstream.DStream
+import org.apache.spark.util.StatCounter
+
+/**
+ * Significance testing methods for [[StreamingTest]]. New 2-sample 
statistical significance tests
+ * should extend [[StreamingTestMethod]] and introduce a new entry in
+ * [[StreamingTestMethod.TEST_NAME_TO_OBJECT]]
+ */
+private[stat] sealed trait StreamingTestMethod extends Serializable {
+
+  val MethodName: String
--- End diff --

`MethodName` ->` methodName`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3147][MLLib][Streaming] Streaming 2-sam...

2015-09-17 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/4716#discussion_r39824516
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/stat/test/StreamingTestMethod.scala 
---
@@ -0,0 +1,165 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.mllib.stat.test
+
+import java.io.Serializable
+
+import scala.language.implicitConversions
+import scala.math.pow
+
+import com.twitter.chill.MeatLocker
+import org.apache.commons.math3.stat.descriptive.StatisticalSummaryValues
+import org.apache.commons.math3.stat.inference.TTest
+
+import org.apache.spark.Logging
+import org.apache.spark.streaming.dstream.DStream
+import org.apache.spark.util.StatCounter
+
+/**
+ * Significance testing methods for [[StreamingTest]]. New 2-sample 
statistical significance tests
+ * should extend [[StreamingTestMethod]] and introduce a new entry in
+ * [[StreamingTestMethod.TEST_NAME_TO_OBJECT]]
+ */
+private[stat] sealed trait StreamingTestMethod extends Serializable {
+
+  val MethodName: String
+  val NullHypothesis: String
--- End diff --

`nullHypothesis`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-9642] [ML] [WIP] LinearRegression shoul...

2015-09-17 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/8631#issuecomment-141345895
  
  [Test build #42640 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42640/consoleFull)
 for   PR 8631 at commit 
[`1f731c2`](https://github.com/apache/spark/commit/1f731c28ad8a59f3bf432435253dc7b0984f46b4).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3147][MLLib][Streaming] Streaming 2-sam...

2015-09-17 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/4716#discussion_r39824449
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/stat/test/StreamingTestMethod.scala 
---
@@ -0,0 +1,165 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.mllib.stat.test
+
+import java.io.Serializable
+
+import scala.language.implicitConversions
+import scala.math.pow
+
+import com.twitter.chill.MeatLocker
+import org.apache.commons.math3.stat.descriptive.StatisticalSummaryValues
+import org.apache.commons.math3.stat.inference.TTest
+
+import org.apache.spark.Logging
+import org.apache.spark.streaming.dstream.DStream
+import org.apache.spark.util.StatCounter
+
+/**
+ * Significance testing methods for [[StreamingTest]]. New 2-sample 
statistical significance tests
+ * should extend [[StreamingTestMethod]] and introduce a new entry in
+ * [[StreamingTestMethod.TEST_NAME_TO_OBJECT]]
+ */
+private[stat] sealed trait StreamingTestMethod extends Serializable {
+
+  val MethodName: String
+  val NullHypothesis: String
+
+  protected type SummaryPairStream =
+DStream[(StatCounter, StatCounter)]
+
+  /**
+   * Perform streaming 2-sample statistical significance testing.
+   *
+   * @param sampleSummaries stream of pairs of summary statistics for the 2
samples
+   * @return stream of test results
+   */
+  def doTest(sampleSummaries: SummaryPairStream): 
DStream[StreamingTestResult]
+
+  /**
+   * Implicit adapter to convert between streaming summary statistics type 
and the type required by
+   * the t-testing libraries.
+   */
+  protected implicit def toApacheCommonsStats(
+  summaryStats: StatCounter): StatisticalSummaryValues = {
+new StatisticalSummaryValues(
+  summaryStats.mean,
+  summaryStats.variance,
+  summaryStats.count,
+  summaryStats.max,
+  summaryStats.min,
+  summaryStats.mean * summaryStats.count
+)
+  }
+}
+
+/**
+ * Performs Welch's 2-sample t-test. The null hypothesis is that the two 
data sets have equal mean.
+ * This test does not assume equal variance between the two samples and 
does not assume equal
+ * sample size.
+ *
+ * More information: http://en.wikipedia.org/wiki/Welch%27s_t_test
+ */
+private[stat] object WelchTTest extends StreamingTestMethod with Logging {
+
+  final val MethodName = "Welch's 2-sample T-test"
--- End diff --

`T-test` -> `t-test`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3147][MLLib][Streaming] Streaming 2-sam...

2015-09-17 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/4716#discussion_r39824394
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/stat/test/StreamingTest.scala ---
@@ -0,0 +1,145 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.mllib.stat.test
+
+import org.apache.spark.Logging
+import org.apache.spark.annotation.{Experimental, Since}
+import org.apache.spark.streaming.dstream.DStream
+import org.apache.spark.util.StatCounter
+
+/**
+ * :: Experimental ::
+ * Performs online 2-sample significance testing for a stream of (Boolean, 
Double) pairs. The
+ * Boolean identifies which sample each observation comes from, and the 
Double is the numeric value
+ * of the observation.
+ *
+ * To address novelty effects, the `peacePeriod` specifies a set number of
initial
+ * [[org.apache.spark.rdd.RDD]] batches of the [[DStream]] to be dropped 
from significance testing.
+ *
+ * The `windowSize` sets the number of batches each significance test is 
to be performed over. The
+ * window is sliding with a stride length of 1 batch. Setting windowSize 
to 0 will perform
+ * cumulative processing, using all batches seen so far.
+ *
+ * Different tests may be used for assessing statistical significance 
depending on assumptions
+ * satisfied by data. For more details, see [[StreamingTestMethod]]. The 
`testMethod` specifies
+ * which test will be used.
+ *
+ * Use a builder pattern to construct a streaming test in an application, 
for example:
+ *   ```
+ *   val model = new OnlineABTest()
+ * .setPeacePeriod(10)
+ * .setWindowSize(0)
+ * .setTestMethod("welch")
+ * .registerStream(DStream)
+ *   ```
+ */
+@Experimental
+@Since("1.6.0")
+class StreamingTest(
+@Since("1.6.0") var peacePeriod: Int = 0,
+@Since("1.6.0") var windowSize: Int = 0,
+@Since("1.6.0") var testMethod: StreamingTestMethod = WelchTTest)
+  extends Logging with Serializable {
+
+  /** Set the number of initial batches to ignore. */
+  @Since("1.6.0")
+  def setPeacePeriod(peacePeriod: Int): this.type = {
+this.peacePeriod = peacePeriod
+this
+  }
+
+  /**
+   * Set the number of batches to compute significance tests over.
+   * A value of 0 will use all batches seen so far.
+   */
+  @Since("1.6.0")
+  def setWindowSize(windowSize: Int): this.type = {
+this.windowSize = windowSize
+this
+  }
+
+  /** Set the statistical method used for significance testing. */
+  @Since("1.6.0")
+  def setTestMethod(method: String): this.type = {
+this.testMethod = StreamingTestMethod.getTestMethodFromName(method)
+this
+  }
+
+  /**
+   * Register a [[DStream]] of values for significance testing.
+   *
+   * @param data stream of (key,value) pairs where the key is the group 
membership (control or
+   * treatment) and the value is the numerical metric to test 
for significance
+   * @return stream of significance testing results
+   */
+  @Since("1.6.0")
+  def registerStream(data: DStream[(Boolean, Double)]): 
DStream[StreamingTestResult] = {
+val dataAfterPeacePeriod = dropPeacePeriod(data)
+val summarizedData = summarizeByKeyAndWindow(dataAfterPeacePeriod)
+val pairedSummaries = pairSummaries(summarizedData)
+val testResults = testMethod.doTest(pairedSummaries)
+
+testResults
+  }
+
+  /** Drop all batches inside the peace period. */
+  private[stat] def dropPeacePeriod(
+  data: DStream[(Boolean, Double)]): DStream[(Boolean, Double)] = {
+data.transform { (rdd, time) =>
+  if (time.milliseconds > data.slideDuration.milliseconds * 
peacePeriod) {
+rdd
+  } else {
+rdd.filter(_ => false) // TODO: Is there a better way to drop a 
RDD from a DStream?
+  }
+}
+  }
+
 

[GitHub] spark pull request: [SPARK-3147][MLLib][Streaming] Streaming 2-sam...

2015-09-17 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/4716#discussion_r39824390
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/stat/test/StreamingTest.scala ---
@@ -0,0 +1,145 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.mllib.stat.test
+
+import org.apache.spark.Logging
+import org.apache.spark.annotation.{Experimental, Since}
+import org.apache.spark.streaming.dstream.DStream
+import org.apache.spark.util.StatCounter
+
+/**
+ * :: Experimental ::
+ * Performs online 2-sample significance testing for a stream of (Boolean, 
Double) pairs. The
+ * Boolean identifies which sample each observation comes from, and the 
Double is the numeric value
+ * of the observation.
+ *
+ * To address novelty effects, the `peacePeriod` specifies a set number of
initial
+ * [[org.apache.spark.rdd.RDD]] batches of the [[DStream]] to be dropped 
from significance testing.
+ *
+ * The `windowSize` sets the number of batches each significance test is 
to be performed over. The
+ * window is sliding with a stride length of 1 batch. Setting windowSize 
to 0 will perform
+ * cumulative processing, using all batches seen so far.
+ *
+ * Different tests may be used for assessing statistical significance 
depending on assumptions
+ * satisfied by data. For more details, see [[StreamingTestMethod]]. The 
`testMethod` specifies
+ * which test will be used.
+ *
+ * Use a builder pattern to construct a streaming test in an application, 
for example:
+ *   ```
+ *   val model = new OnlineABTest()
+ * .setPeacePeriod(10)
+ * .setWindowSize(0)
+ * .setTestMethod("welch")
+ * .registerStream(DStream)
+ *   ```
+ */
+@Experimental
+@Since("1.6.0")
+class StreamingTest(
+@Since("1.6.0") var peacePeriod: Int = 0,
+@Since("1.6.0") var windowSize: Int = 0,
+@Since("1.6.0") var testMethod: StreamingTestMethod = WelchTTest)
+  extends Logging with Serializable {
+
+  /** Set the number of initial batches to ignore. */
+  @Since("1.6.0")
+  def setPeacePeriod(peacePeriod: Int): this.type = {
+this.peacePeriod = peacePeriod
+this
+  }
+
+  /**
+   * Set the number of batches to compute significance tests over.
+   * A value of 0 will use all batches seen so far.
+   */
+  @Since("1.6.0")
+  def setWindowSize(windowSize: Int): this.type = {
+this.windowSize = windowSize
+this
+  }
+
+  /** Set the statistical method used for significance testing. */
+  @Since("1.6.0")
+  def setTestMethod(method: String): this.type = {
+this.testMethod = StreamingTestMethod.getTestMethodFromName(method)
+this
+  }
+
+  /**
+   * Register a [[DStream]] of values for significance testing.
+   *
+   * @param data stream of (key,value) pairs where the key is the group 
membership (control or
+   * treatment) and the value is the numerical metric to test 
for significance
+   * @return stream of significance testing results
+   */
+  @Since("1.6.0")
+  def registerStream(data: DStream[(Boolean, Double)]): 
DStream[StreamingTestResult] = {
+val dataAfterPeacePeriod = dropPeacePeriod(data)
+val summarizedData = summarizeByKeyAndWindow(dataAfterPeacePeriod)
+val pairedSummaries = pairSummaries(summarizedData)
+val testResults = testMethod.doTest(pairedSummaries)
+
+testResults
+  }
+
+  /** Drop all batches inside the peace period. */
+  private[stat] def dropPeacePeriod(
+  data: DStream[(Boolean, Double)]): DStream[(Boolean, Double)] = {
+data.transform { (rdd, time) =>
+  if (time.milliseconds > data.slideDuration.milliseconds * 
peacePeriod) {
+rdd
+  } else {
+rdd.filter(_ => false) // TODO: Is there a better way to drop a 
RDD from a DStream?
+  }
+}
+  }
+
 

[GitHub] spark pull request: [SPARK-3147][MLLib][Streaming] Streaming 2-sam...

2015-09-17 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/4716#discussion_r39824392
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/stat/test/StreamingTest.scala ---
@@ -0,0 +1,145 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.mllib.stat.test
+
+import org.apache.spark.Logging
+import org.apache.spark.annotation.{Experimental, Since}
+import org.apache.spark.streaming.dstream.DStream
+import org.apache.spark.util.StatCounter
+
+/**
+ * :: Experimental ::
+ * Performs online 2-sample significance testing for a stream of (Boolean, 
Double) pairs. The
+ * Boolean identifies which sample each observation comes from, and the 
Double is the numeric value
+ * of the observation.
+ *
+ * To address novelty effects, the `peacePeriod` specifies a set number of
initial
+ * [[org.apache.spark.rdd.RDD]] batches of the [[DStream]] to be dropped 
from significance testing.
+ *
+ * The `windowSize` sets the number of batches each significance test is 
to be performed over. The
+ * window is sliding with a stride length of 1 batch. Setting windowSize 
to 0 will perform
+ * cumulative processing, using all batches seen so far.
+ *
+ * Different tests may be used for assessing statistical significance 
depending on assumptions
+ * satisfied by data. For more details, see [[StreamingTestMethod]]. The 
`testMethod` specifies
+ * which test will be used.
+ *
+ * Use a builder pattern to construct a streaming test in an application, 
for example:
+ *   ```
+ *   val model = new OnlineABTest()
+ * .setPeacePeriod(10)
+ * .setWindowSize(0)
+ * .setTestMethod("welch")
+ * .registerStream(DStream)
+ *   ```
+ */
+@Experimental
+@Since("1.6.0")
+class StreamingTest(
+@Since("1.6.0") var peacePeriod: Int = 0,
+@Since("1.6.0") var windowSize: Int = 0,
+@Since("1.6.0") var testMethod: StreamingTestMethod = WelchTTest)
+  extends Logging with Serializable {
+
+  /** Set the number of initial batches to ignore. */
+  @Since("1.6.0")
+  def setPeacePeriod(peacePeriod: Int): this.type = {
+this.peacePeriod = peacePeriod
+this
+  }
+
+  /**
+   * Set the number of batches to compute significance tests over.
+   * A value of 0 will use all batches seen so far.
+   */
+  @Since("1.6.0")
+  def setWindowSize(windowSize: Int): this.type = {
+this.windowSize = windowSize
+this
+  }
+
+  /** Set the statistical method used for significance testing. */
+  @Since("1.6.0")
+  def setTestMethod(method: String): this.type = {
+this.testMethod = StreamingTestMethod.getTestMethodFromName(method)
+this
+  }
+
+  /**
+   * Register a [[DStream]] of values for significance testing.
+   *
+   * @param data stream of (key,value) pairs where the key is the group 
membership (control or
+   * treatment) and the value is the numerical metric to test 
for significance
+   * @return stream of significance testing results
+   */
+  @Since("1.6.0")
+  def registerStream(data: DStream[(Boolean, Double)]): 
DStream[StreamingTestResult] = {
+val dataAfterPeacePeriod = dropPeacePeriod(data)
+val summarizedData = summarizeByKeyAndWindow(dataAfterPeacePeriod)
+val pairedSummaries = pairSummaries(summarizedData)
+val testResults = testMethod.doTest(pairedSummaries)
+
+testResults
+  }
+
+  /** Drop all batches inside the peace period. */
+  private[stat] def dropPeacePeriod(
+  data: DStream[(Boolean, Double)]): DStream[(Boolean, Double)] = {
+data.transform { (rdd, time) =>
+  if (time.milliseconds > data.slideDuration.milliseconds * 
peacePeriod) {
+rdd
+  } else {
+rdd.filter(_ => false) // TODO: Is there a better way to drop a 
RDD from a DStream?
+  }
+}
+  }
+
 

[GitHub] spark pull request: [SPARK-3147][MLLib][Streaming] Streaming 2-sam...

2015-09-17 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/4716#discussion_r39824320
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/stat/test/StreamingTest.scala ---
@@ -0,0 +1,145 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.mllib.stat.test
+
+import org.apache.spark.Logging
+import org.apache.spark.annotation.{Experimental, Since}
+import org.apache.spark.streaming.dstream.DStream
+import org.apache.spark.util.StatCounter
+
+/**
+ * :: Experimental ::
+ * Performs online 2-sample significance testing for a stream of (Boolean, 
Double) pairs. The
+ * Boolean identifies which sample each observation comes from, and the 
Double is the numeric value
+ * of the observation.
+ *
+ * To address novelty effects, the `peacePeriod` specifies a set number of
initial
+ * [[org.apache.spark.rdd.RDD]] batches of the [[DStream]] to be dropped 
from significance testing.
+ *
+ * The `windowSize` sets the number of batches each significance test is 
to be performed over. The
+ * window is sliding with a stride length of 1 batch. Setting windowSize 
to 0 will perform
+ * cumulative processing, using all batches seen so far.
+ *
+ * Different tests may be used for assessing statistical significance 
depending on assumptions
+ * satisfied by data. For more details, see [[StreamingTestMethod]]. The 
`testMethod` specifies
+ * which test will be used.
+ *
+ * Use a builder pattern to construct a streaming test in an application, 
for example:
+ *   ```
+ *   val model = new OnlineABTest()
+ * .setPeacePeriod(10)
+ * .setWindowSize(0)
+ * .setTestMethod("welch")
+ * .registerStream(DStream)
+ *   ```
+ */
+@Experimental
+@Since("1.6.0")
+class StreamingTest(
+@Since("1.6.0") var peacePeriod: Int = 0,
+@Since("1.6.0") var windowSize: Int = 0,
+@Since("1.6.0") var testMethod: StreamingTestMethod = WelchTTest)
+  extends Logging with Serializable {
+
+  /** Set the number of initial batches to ignore. */
+  @Since("1.6.0")
+  def setPeacePeriod(peacePeriod: Int): this.type = {
+this.peacePeriod = peacePeriod
+this
+  }
+
+  /**
+   * Set the number of batches to compute significance tests over.
+   * A value of 0 will use all batches seen so far.
+   */
+  @Since("1.6.0")
+  def setWindowSize(windowSize: Int): this.type = {
+this.windowSize = windowSize
+this
+  }
+
+  /** Set the statistical method used for significance testing. */
+  @Since("1.6.0")
+  def setTestMethod(method: String): this.type = {
+this.testMethod = StreamingTestMethod.getTestMethodFromName(method)
+this
+  }
+
+  /**
+   * Register a [[DStream]] of values for significance testing.
+   *
+   * @param data stream of (key,value) pairs where the key is the group 
membership (control or
+   * treatment) and the value is the numerical metric to test 
for significance
+   * @return stream of significance testing results
+   */
+  @Since("1.6.0")
+  def registerStream(data: DStream[(Boolean, Double)]): 
DStream[StreamingTestResult] = {
+val dataAfterPeacePeriod = dropPeacePeriod(data)
+val summarizedData = summarizeByKeyAndWindow(dataAfterPeacePeriod)
+val pairedSummaries = pairSummaries(summarizedData)
+val testResults = testMethod.doTest(pairedSummaries)
+
+testResults
+  }
+
+  /** Drop all batches inside the peace period. */
+  private[stat] def dropPeacePeriod(
+  data: DStream[(Boolean, Double)]): DStream[(Boolean, Double)] = {
+data.transform { (rdd, time) =>
+  if (time.milliseconds > data.slideDuration.milliseconds * 
peacePeriod) {
+rdd
+  } else {
+rdd.filter(_ => false) // TODO: Is there a better way to drop a 
RDD from a DStream?
--- End diff --

you only ne
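
The reviewer's comment is cut off in the archive. Independent of what it said, one possible answer to the TODO in the quoted code ("Is there a better way to drop a RDD from a DStream?") is to return an empty RDD from the transform rather than filtering every element out. A sketch under that assumption, reusing the patch's (Boolean, Double) element type:

```scala
import org.apache.spark.streaming.dstream.DStream

// Sketch only: during the peace period, emit an empty RDD instead of a
// filtered copy of the original batch.
def dropPeacePeriod(
    data: DStream[(Boolean, Double)],
    peacePeriod: Int): DStream[(Boolean, Double)] = {
  data.transform { (rdd, time) =>
    if (time.milliseconds > data.slideDuration.milliseconds * peacePeriod) {
      rdd
    } else {
      rdd.context.emptyRDD[(Boolean, Double)]
    }
  }
}
```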

[GitHub] spark pull request: [SPARK-3147][MLLib][Streaming] Streaming 2-sam...

2015-09-17 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/4716#discussion_r39824292
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/stat/test/StreamingTest.scala ---
@@ -0,0 +1,145 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.mllib.stat.test
+
+import org.apache.spark.Logging
+import org.apache.spark.annotation.{Experimental, Since}
+import org.apache.spark.streaming.dstream.DStream
+import org.apache.spark.util.StatCounter
+
+/**
+ * :: Experimental ::
+ * Performs online 2-sample significance testing for a stream of (Boolean, 
Double) pairs. The
+ * Boolean identifies which sample each observation comes from, and the 
Double is the numeric value
+ * of the observation.
+ *
+ * To address novelty effects, the `peacePeriod` specifies a set number of
initial
+ * [[org.apache.spark.rdd.RDD]] batches of the [[DStream]] to be dropped 
from significance testing.
+ *
+ * The `windowSize` sets the number of batches each significance test is 
to be performed over. The
+ * window is sliding with a stride length of 1 batch. Setting windowSize 
to 0 will perform
+ * cumulative processing, using all batches seen so far.
+ *
+ * Different tests may be used for assessing statistical significance 
depending on assumptions
+ * satisfied by data. For more details, see [[StreamingTestMethod]]. The 
`testMethod` specifies
+ * which test will be used.
+ *
+ * Use a builder pattern to construct a streaming test in an application, 
for example:
+ *   ```
+ *   val model = new OnlineABTest()
+ * .setPeacePeriod(10)
+ * .setWindowSize(0)
+ * .setTestMethod("welch")
+ * .registerStream(DStream)
+ *   ```
+ */
+@Experimental
+@Since("1.6.0")
+class StreamingTest(
+@Since("1.6.0") var peacePeriod: Int = 0,
+@Since("1.6.0") var windowSize: Int = 0,
+@Since("1.6.0") var testMethod: StreamingTestMethod = WelchTTest)
+  extends Logging with Serializable {
+
+  /** Set the number of initial batches to ignore. */
+  @Since("1.6.0")
+  def setPeacePeriod(peacePeriod: Int): this.type = {
+this.peacePeriod = peacePeriod
+this
+  }
+
+  /**
+   * Set the number of batches to compute significance tests over.
+   * A value of 0 will use all batches seen so far.
+   */
+  @Since("1.6.0")
+  def setWindowSize(windowSize: Int): this.type = {
+this.windowSize = windowSize
+this
+  }
+
+  /** Set the statistical method used for significance testing. */
+  @Since("1.6.0")
+  def setTestMethod(method: String): this.type = {
+this.testMethod = StreamingTestMethod.getTestMethodFromName(method)
+this
+  }
+
+  /**
+   * Register a [[DStream]] of values for significance testing.
+   *
+   * @param data stream of (key,value) pairs where the key is the group 
membership (control or
+   * treatment) and the value is the numerical metric to test 
for significance
+   * @return stream of significance testing results
+   */
+  @Since("1.6.0")
+  def registerStream(data: DStream[(Boolean, Double)]): 
DStream[StreamingTestResult] = {
+val dataAfterPeacePeriod = dropPeacePeriod(data)
+val summarizedData = summarizeByKeyAndWindow(dataAfterPeacePeriod)
+val pairedSummaries = pairSummaries(summarizedData)
+val testResults = testMethod.doTest(pairedSummaries)
--- End diff --

`val testResults = ` is not necessary


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.
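
Concretely, the suggestion amounts to returning the last expression directly; a sketch against the quoted method, keeping the patch's helper names:

```scala
// Sketch: drop the intermediate `testResults` binding and return the
// result of doTest directly.
def registerStream(data: DStream[(Boolean, Double)]): DStream[StreamingTestResult] = {
  val dataAfterPeacePeriod = dropPeacePeriod(data)
  val summarizedData = summarizeByKeyAndWindow(dataAfterPeacePeriod)
  val pairedSummaries = pairSummaries(summarizedData)
  testMethod.doTest(pairedSummaries)
}
```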

[GitHub] spark pull request: [SPARK-3147][MLLib][Streaming] Streaming 2-sam...

2015-09-17 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/4716#discussion_r39824264
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/stat/test/StreamingTest.scala ---
@@ -0,0 +1,145 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.mllib.stat.test
+
+import org.apache.spark.Logging
+import org.apache.spark.annotation.{Experimental, Since}
+import org.apache.spark.streaming.dstream.DStream
+import org.apache.spark.util.StatCounter
+
+/**
+ * :: Experimental ::
+ * Performs online 2-sample significance testing for a stream of (Boolean, 
Double) pairs. The
+ * Boolean identifies which sample each observation comes from, and the 
Double is the numeric value
+ * of the observation.
+ *
+ * To address novelty effects, the `peacePeriod` specifies a set number of
initial
+ * [[org.apache.spark.rdd.RDD]] batches of the [[DStream]] to be dropped 
from significance testing.
+ *
+ * The `windowSize` sets the number of batches each significance test is 
to be performed over. The
+ * window is sliding with a stride length of 1 batch. Setting windowSize 
to 0 will perform
+ * cumulative processing, using all batches seen so far.
+ *
+ * Different tests may be used for assessing statistical significance 
depending on assumptions
+ * satisfied by data. For more details, see [[StreamingTestMethod]]. The 
`testMethod` specifies
+ * which test will be used.
+ *
+ * Use a builder pattern to construct a streaming test in an application, 
for example:
+ *   ```
+ *   val model = new OnlineABTest()
+ * .setPeacePeriod(10)
+ * .setWindowSize(0)
+ * .setTestMethod("welch")
+ * .registerStream(DStream)
+ *   ```
+ */
+@Experimental
+@Since("1.6.0")
+class StreamingTest(
+@Since("1.6.0") var peacePeriod: Int = 0,
+@Since("1.6.0") var windowSize: Int = 0,
+@Since("1.6.0") var testMethod: StreamingTestMethod = WelchTTest)
+  extends Logging with Serializable {
+
+  /** Set the number of initial batches to ignore. */
+  @Since("1.6.0")
+  def setPeacePeriod(peacePeriod: Int): this.type = {
+this.peacePeriod = peacePeriod
+this
+  }
+
+  /**
+   * Set the number of batches to compute significance tests over.
+   * A value of 0 will use all batches seen so far.
+   */
+  @Since("1.6.0")
+  def setWindowSize(windowSize: Int): this.type = {
+this.windowSize = windowSize
+this
+  }
+
+  /** Set the statistical method used for significance testing. */
--- End diff --

document default value and available methods


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
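
A sketch of the requested documentation; the default comes from the quoted constructor (WelchTTest), and the recognized names are the keys of `StreamingTestMethod.TEST_NAME_TO_OBJECT` — "welch" appears in the quoted example, while the name registered for Student's t-test is assumed here:

```scala
/**
 * Set the statistical method used for significance testing.
 * Recognized names are the keys of StreamingTestMethod.TEST_NAME_TO_OBJECT,
 * e.g. "welch" (Welch's t-test, the default) or the entry registered for
 * Student's t-test.
 */
@Since("1.6.0")
def setTestMethod(method: String): this.type = {
  this.testMethod = StreamingTestMethod.getTestMethodFromName(method)
  this
}
```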



[GitHub] spark pull request: [SPARK-3147][MLLib][Streaming] Streaming 2-sam...

2015-09-17 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/4716#discussion_r39824260
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/stat/test/StreamingTest.scala ---
@@ -0,0 +1,145 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.mllib.stat.test
+
+import org.apache.spark.Logging
+import org.apache.spark.annotation.{Experimental, Since}
+import org.apache.spark.streaming.dstream.DStream
+import org.apache.spark.util.StatCounter
+
+/**
+ * :: Experimental ::
+ * Performs online 2-sample significance testing for a stream of (Boolean, 
Double) pairs. The
+ * Boolean identifies which sample each observation comes from, and the 
Double is the numeric value
+ * of the observation.
+ *
+ * To address novelty effects, the `peacePeriod` specifies a set number of
initial
+ * [[org.apache.spark.rdd.RDD]] batches of the [[DStream]] to be dropped 
from significance testing.
+ *
+ * The `windowSize` sets the number of batches each significance test is 
to be performed over. The
+ * window is sliding with a stride length of 1 batch. Setting windowSize 
to 0 will perform
+ * cumulative processing, using all batches seen so far.
+ *
+ * Different tests may be used for assessing statistical significance 
depending on assumptions
+ * satisfied by data. For more details, see [[StreamingTestMethod]]. The 
`testMethod` specifies
+ * which test will be used.
+ *
+ * Use a builder pattern to construct a streaming test in an application, 
for example:
+ *   ```
+ *   val model = new OnlineABTest()
+ * .setPeacePeriod(10)
+ * .setWindowSize(0)
+ * .setTestMethod("welch")
+ * .registerStream(DStream)
+ *   ```
+ */
+@Experimental
+@Since("1.6.0")
+class StreamingTest(
+@Since("1.6.0") var peacePeriod: Int = 0,
+@Since("1.6.0") var windowSize: Int = 0,
+@Since("1.6.0") var testMethod: StreamingTestMethod = WelchTTest)
+  extends Logging with Serializable {
+
+  /** Set the number of initial batches to ignore. */
+  @Since("1.6.0")
+  def setPeacePeriod(peacePeriod: Int): this.type = {
+this.peacePeriod = peacePeriod
+this
+  }
+
+  /**
+   * Set the number of batches to compute significance tests over.
+   * A value of 0 will use all batches seen so far.
--- End diff --

document default value


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
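
Likewise for the window size, a sketch that spells out the default visible in the quoted constructor:

```scala
/**
 * Set the number of batches to compute significance tests over.
 * Default: 0, meaning each test uses all batches seen so far.
 */
@Since("1.6.0")
def setWindowSize(windowSize: Int): this.type = {
  this.windowSize = windowSize
  this
}
```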



[GitHub] spark pull request: [SPARK-3147][MLLib][Streaming] Streaming 2-sam...

2015-09-17 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/4716#discussion_r39824269
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/stat/test/StreamingTest.scala ---
@@ -0,0 +1,145 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.mllib.stat.test
+
+import org.apache.spark.Logging
+import org.apache.spark.annotation.{Experimental, Since}
+import org.apache.spark.streaming.dstream.DStream
+import org.apache.spark.util.StatCounter
+
+/**
+ * :: Experimental ::
+ * Performs online 2-sample significance testing for a stream of (Boolean, 
Double) pairs. The
+ * Boolean identifies which sample each observation comes from, and the 
Double is the numeric value
+ * of the observation.
+ *
+ * To address novelty effects, the `peacePeriod` specifies a set number of
initial
+ * [[org.apache.spark.rdd.RDD]] batches of the [[DStream]] to be dropped 
from significance testing.
+ *
+ * The `windowSize` sets the number of batches each significance test is 
to be performed over. The
+ * window is sliding with a stride length of 1 batch. Setting windowSize 
to 0 will perform
+ * cumulative processing, using all batches seen so far.
+ *
+ * Different tests may be used for assessing statistical significance 
depending on assumptions
+ * satisfied by data. For more details, see [[StreamingTestMethod]]. The 
`testMethod` specifies
+ * which test will be used.
+ *
+ * Use a builder pattern to construct a streaming test in an application, 
for example:
+ *   ```
+ *   val model = new OnlineABTest()
+ * .setPeacePeriod(10)
+ * .setWindowSize(0)
+ * .setTestMethod("welch")
+ * .registerStream(DStream)
+ *   ```
+ */
+@Experimental
+@Since("1.6.0")
+class StreamingTest(
+@Since("1.6.0") var peacePeriod: Int = 0,
+@Since("1.6.0") var windowSize: Int = 0,
+@Since("1.6.0") var testMethod: StreamingTestMethod = WelchTTest)
+  extends Logging with Serializable {
+
+  /** Set the number of initial batches to ignore. */
+  @Since("1.6.0")
+  def setPeacePeriod(peacePeriod: Int): this.type = {
+this.peacePeriod = peacePeriod
+this
+  }
+
+  /**
+   * Set the number of batches to compute significance tests over.
+   * A value of 0 will use all batches seen so far.
+   */
+  @Since("1.6.0")
+  def setWindowSize(windowSize: Int): this.type = {
+this.windowSize = windowSize
+this
+  }
+
+  /** Set the statistical method used for significance testing. */
+  @Since("1.6.0")
+  def setTestMethod(method: String): this.type = {
+this.testMethod = StreamingTestMethod.getTestMethodFromName(method)
+this
+  }
+
+  /**
+   * Register a [[DStream]] of values for significance testing.
+   *
+   * @param data stream of (key,value) pairs where the key is the group 
membership (control or
--- End diff --

document clearly whether `true` means control or experiment


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
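
A sketch of the requested clarification. The quoted diff does not say which group `true` denotes, so the mapping below (false = control, true = treatment) is purely illustrative:

```scala
/**
 * @param data stream of (key, value) pairs where the key marks group membership
 *             (illustrative convention: false = control, true = treatment) and
 *             the value is the numeric metric tested for significance
 * @return stream of significance testing results
 */
```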



[GitHub] spark pull request: [SPARK-9642] [ML] [WIP] LinearRegression shoul...

2015-09-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8631#issuecomment-141344287
  
 Merged build triggered.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-9642] [ML] [WIP] LinearRegression shoul...

2015-09-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8631#issuecomment-141344297
  
Merged build started.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3147][MLLib][Streaming] Streaming 2-sam...

2015-09-17 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/4716#discussion_r39824239
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/stat/test/StreamingTest.scala ---
@@ -0,0 +1,145 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.mllib.stat.test
+
+import org.apache.spark.Logging
+import org.apache.spark.annotation.{Experimental, Since}
+import org.apache.spark.streaming.dstream.DStream
+import org.apache.spark.util.StatCounter
+
+/**
+ * :: Experimental ::
+ * Performs online 2-sample significance testing for a stream of (Boolean, 
Double) pairs. The
+ * Boolean identifies which sample each observation comes from, and the 
Double is the numeric value
+ * of the observation.
+ *
+ * To address novelty effects, the `peacePeriod` specifies a set number of
initial
+ * [[org.apache.spark.rdd.RDD]] batches of the [[DStream]] to be dropped 
from significance testing.
+ *
+ * The `windowSize` sets the number of batches each significance test is 
to be performed over. The
+ * window is sliding with a stride length of 1 batch. Setting windowSize 
to 0 will perform
+ * cumulative processing, using all batches seen so far.
+ *
+ * Different tests may be used for assessing statistical significance 
depending on assumptions
+ * satisfied by data. For more details, see [[StreamingTestMethod]]. The 
`testMethod` specifies
+ * which test will be used.
+ *
+ * Use a builder pattern to construct a streaming test in an application, 
for example:
+ *   ```
+ *   val model = new OnlineABTest()
--- End diff --

`StreamingTest`
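
(The class in this file is `StreamingTest`, so the Scaladoc example should use that name. A sketch of the corrected example is below; `data` stands in for a `DStream[(Boolean, Double)]` and, apart from the setter names taken from the diff, everything here is illustrative.)

```scala
// Sketch only: same builder calls as the Scaladoc example, with the new class name.
// Assumes `data: DStream[(Boolean, Double)]` was created from a StreamingContext.
val model = new StreamingTest()
  .setPeacePeriod(10)
  .setWindowSize(0)
  .setTestMethod("welch")
model.registerStream(data)
```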


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3147][MLLib][Streaming] Streaming 2-sam...

2015-09-17 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/4716#discussion_r39824244
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/stat/test/StreamingTest.scala ---
@@ -0,0 +1,145 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.mllib.stat.test
+
+import org.apache.spark.Logging
+import org.apache.spark.annotation.{Experimental, Since}
+import org.apache.spark.streaming.dstream.DStream
+import org.apache.spark.util.StatCounter
+
+/**
+ * :: Experimental ::
+ * Performs online 2-sample significance testing for a stream of (Boolean, 
Double) pairs. The
+ * Boolean identifies which sample each observation comes from, and the 
Double is the numeric value
+ * of the observation.
+ *
+ * To address novelty effects, the `peacePeriod` specifies a set number of 
initial
+ * [[org.apache.spark.rdd.RDD]] batches of the [[DStream]] to be dropped 
from significance testing.
+ *
+ * The `windowSize` sets the number of batches each significance test is 
to be performed over. The
+ * window is sliding with a stride length of 1 batch. Setting windowSize 
to 0 will perform
+ * cumulative processing, using all batches seen so far.
+ *
+ * Different tests may be used for assessing statistical significance 
depending on assumptions
+ * satisfied by data. For more details, see [[StreamingTestMethod]]. The 
`testMethod` specifies
+ * which test will be used.
+ *
+ * Use a builder pattern to construct a streaming test in an application, 
for example:
+ *   ```
+ *   val model = new OnlineABTest()
+ * .setPeacePeriod(10)
+ * .setWindowSize(0)
+ * .setTestMethod("welch")
+ * .registerStream(DStream)
+ *   ```
+ */
+@Experimental
+@Since("1.6.0")
+class StreamingTest(
+@Since("1.6.0") var peacePeriod: Int = 0,
--- End diff --

The default values are not Java friendly. Since we already have setters, we 
can make a default constructor with no arguments.
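
A minimal sketch of that shape, keeping the field names and defaults from the diff (the rest is illustrative, not the PR's final code):

```scala
// No-argument constructor: the defaults move into field initializers, and Java
// callers configure the test through setters, e.g. new StreamingTest().setWindowSize(10).
class StreamingTest extends Logging with Serializable {
  var peacePeriod: Int = 0                          // drop no initial batches by default
  var windowSize: Int = 0                           // 0 = cumulative processing
  var testMethod: StreamingTestMethod = WelchTTest  // Welch's t-test by default

  def setPeacePeriod(value: Int): this.type = { peacePeriod = value; this }
  def setWindowSize(value: Int): this.type = { windowSize = value; this }
  // A String overload mapping names like "welch" to methods could be added as well.
  def setTestMethod(value: StreamingTestMethod): this.type = { testMethod = value; this }
}
```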


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3147][MLLib][Streaming] Streaming 2-sam...

2015-09-17 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/4716#discussion_r39824247
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/stat/test/StreamingTest.scala ---
@@ -0,0 +1,145 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.mllib.stat.test
+
+import org.apache.spark.Logging
+import org.apache.spark.annotation.{Experimental, Since}
+import org.apache.spark.streaming.dstream.DStream
+import org.apache.spark.util.StatCounter
+
+/**
+ * :: Experimental ::
+ * Performs online 2-sample significance testing for a stream of (Boolean, 
Double) pairs. The
+ * Boolean identifies which sample each observation comes from, and the 
Double is the numeric value
+ * of the observation.
+ *
+ * To address novelty effects, the `peacePeriod` specifies a set number of 
initial
+ * [[org.apache.spark.rdd.RDD]] batches of the [[DStream]] to be dropped 
from significance testing.
+ *
+ * The `windowSize` sets the number of batches each significance test is 
to be performed over. The
+ * window is sliding with a stride length of 1 batch. Setting windowSize 
to 0 will perform
+ * cumulative processing, using all batches seen so far.
+ *
+ * Different tests may be used for assessing statistical significance 
depending on assumptions
+ * satisfied by data. For more details, see [[StreamingTestMethod]]. The 
`testMethod` specifies
+ * which test will be used.
+ *
+ * Use a builder pattern to construct a streaming test in an application, 
for example:
+ *   ```
+ *   val model = new OnlineABTest()
+ * .setPeacePeriod(10)
+ * .setWindowSize(0)
+ * .setTestMethod("welch")
+ * .registerStream(DStream)
+ *   ```
+ */
+@Experimental
+@Since("1.6.0")
+class StreamingTest(
+@Since("1.6.0") var peacePeriod: Int = 0,
+@Since("1.6.0") var windowSize: Int = 0,
+@Since("1.6.0") var testMethod: StreamingTestMethod = WelchTTest)
+  extends Logging with Serializable {
+
+  /** Set the number of initial batches to ignore. */
--- End diff --

document default value
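
For instance, the setter's Scaladoc could state the default explicitly (the wording is only a suggestion):

```scala
  /** Set the number of initial batches to ignore. Default: 0. */
  @Since("1.6.0")
  def setPeacePeriod(value: Int): this.type = {
    peacePeriod = value
    this
  }
```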


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3147][MLLib][Streaming] Streaming 2-sam...

2015-09-17 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/4716#discussion_r39824241
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/stat/test/StreamingTest.scala ---
@@ -0,0 +1,145 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.mllib.stat.test
+
+import org.apache.spark.Logging
+import org.apache.spark.annotation.{Experimental, Since}
+import org.apache.spark.streaming.dstream.DStream
+import org.apache.spark.util.StatCounter
+
+/**
+ * :: Experimental ::
+ * Performs online 2-sample significance testing for a stream of (Boolean, 
Double) pairs. The
+ * Boolean identifies which sample each observation comes from, and the 
Double is the numeric value
+ * of the observation.
+ *
+ * To address novelty effects, the `peacePeriod` specifies a set number of 
initial
+ * [[org.apache.spark.rdd.RDD]] batches of the [[DStream]] to be dropped 
from significance testing.
+ *
+ * The `windowSize` sets the number of batches each significance test is 
to be performed over. The
+ * window is sliding with a stride length of 1 batch. Setting windowSize 
to 0 will perform
+ * cumulative processing, using all batches seen so far.
+ *
+ * Different tests may be used for assessing statistical significance 
depending on assumptions
+ * satisfied by data. For more details, see [[StreamingTestMethod]]. The 
`testMethod` specifies
+ * which test will be used.
+ *
+ * Use a builder pattern to construct a streaming test in an application, 
for example:
+ *   ```
+ *   val model = new OnlineABTest()
+ * .setPeacePeriod(10)
+ * .setWindowSize(0)
+ * .setTestMethod("welch")
+ * .registerStream(DStream)
+ *   ```
+ */
+@Experimental
+@Since("1.6.0")
+class StreamingTest(
--- End diff --

add since version to constructor as well: `class StreamingTest 
@Since("1.6.0") (`


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-3147][MLLib][Streaming] Streaming 2-sam...

2015-09-17 Thread mengxr
Github user mengxr commented on a diff in the pull request:

https://github.com/apache/spark/pull/4716#discussion_r39824237
  
--- Diff: 
mllib/src/main/scala/org/apache/spark/mllib/stat/test/StreamingTest.scala ---
@@ -0,0 +1,145 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.mllib.stat.test
+
+import org.apache.spark.Logging
+import org.apache.spark.annotation.{Experimental, Since}
+import org.apache.spark.streaming.dstream.DStream
+import org.apache.spark.util.StatCounter
+
+/**
+ * :: Experimental ::
+ * Performs online 2-sample significance testing for a stream of (Boolean, 
Double) pairs. The
+ * Boolean identifies which sample each observation comes from, and the 
Double is the numeric value
+ * of the observation.
+ *
+ * To address novelty effects, the `peacePeriod` specifies a set number of 
initial
+ * [[org.apache.spark.rdd.RDD]] batches of the [[DStream]] to be dropped 
from significance testing.
+ *
+ * The `windowSize` sets the number of batches each significance test is 
to be performed over. The
+ * window is sliding with a stride length of 1 batch. Setting windowSize 
to 0 will perform
+ * cumulative processing, using all batches seen so far.
+ *
+ * Different tests may be used for assessing statistical significance 
depending on assumptions
+ * satisfied by data. For more details, see [[StreamingTestMethod]]. The 
`testMethod` specifies
+ * which test will be used.
+ *
+ * Use a builder pattern to construct a streaming test in an application, 
for example:
+ *   ```
--- End diff --

use `{{{` for example code
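
That is, wrap the example in Scaladoc's `{{{`/`}}}` markers instead of Markdown fences, e.g. (sketch, using the current builder calls):

```scala
/**
 * Use a builder pattern to construct a streaming test in an application, for example:
 * {{{
 *   val model = new StreamingTest()
 *     .setPeacePeriod(10)
 *     .setWindowSize(0)
 *     .setTestMethod("welch")
 *     .registerStream(data)
 * }}}
 */
```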


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-9642] [ML] [WIP] LinearRegression shoul...

2015-09-17 Thread rotationsymmetry
Github user rotationsymmetry commented on the pull request:

https://github.com/apache/spark/pull/8631#issuecomment-141344261
  
@dbtsai Thanks for the comment on indentation. I have fixed it in the patch.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-8518] [ML] Log-linear models for surviv...

2015-09-17 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/8611


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-8518] [ML] Log-linear models for surviv...

2015-09-17 Thread mengxr
Github user mengxr commented on the pull request:

https://github.com/apache/spark/pull/8611#issuecomment-141341794
  
LGTM. Merged into master. Thanks! I created 
https://issues.apache.org/jira/browse/SPARK-10686 for follow-up work.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-9522][SQL] SparkSubmit process can not ...

2015-09-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/7853#issuecomment-141340818
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42632/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-9522][SQL] SparkSubmit process can not ...

2015-09-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/7853#issuecomment-141340817
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-9522][SQL] SparkSubmit process can not ...

2015-09-17 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/7853#issuecomment-141340676
  
  [Test build #42632 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42632/console)
 for   PR 7853 at commit 
[`504aeb3`](https://github.com/apache/spark/commit/504aeb32260fc0a26cccbed17d1c48b49f99e488).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-7936] [SQL] Add configuration for initi...

2015-09-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/6488#issuecomment-141340120
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42635/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-7936] [SQL] Add configuration for initi...

2015-09-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/6488#issuecomment-141340118
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-7936] [SQL] Add configuration for initi...

2015-09-17 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/6488#issuecomment-141339998
  
  [Test build #42635 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42635/console)
 for   PR 6488 at commit 
[`39a9c41`](https://github.com/apache/spark/commit/39a9c4184952c90673a1a9766a72bfc120c23123).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `class Interaction(override val uid: String) extends Transformer`



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-9585] add config to enable inputFormat ...

2015-09-17 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/7918#issuecomment-141339490
  
  [Test build #42639 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42639/consoleFull)
 for   PR 7918 at commit 
[`3c1d41d`](https://github.com/apache/spark/commit/3c1d41d8d8b338b2305281f9ab6b5db927a2706c).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-9585] add config to enable inputFormat ...

2015-09-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/7918#issuecomment-141337659
  
Merged build started.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-9585] add config to enable inputFormat ...

2015-09-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/7918#issuecomment-141337648
  
 Merged build triggered.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-8518] [ML] Log-linear models for surviv...

2015-09-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8611#issuecomment-141337339
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-8518] [ML] Log-linear models for surviv...

2015-09-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/8611#issuecomment-141337341
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42637/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-8518] [ML] Log-linear models for surviv...

2015-09-17 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/8611#issuecomment-141337307
  
  [Test build #42637 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42637/console)
 for   PR 8611 at commit 
[`aa37878`](https://github.com/apache/spark/commit/aa37878c50ef6e7722a615298240ba6e61ea083c).
 * This patch **passes all tests**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
  * `class AFTSurvivalRegression @Since("1.6.0") (@Since("1.6.0") override 
val uid: String)`
  * `  require(censor == 1.0 || censor == 0.0, "censor of class AFTPoint 
must be 1.0 or 0.0")`



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-9585] add config to enable inputFormat ...

2015-09-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/7918#issuecomment-141337220
  
Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-9585] add config to enable inputFormat ...

2015-09-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/7918#issuecomment-141337221
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42638/
Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-9585] add config to enable inputFormat ...

2015-09-17 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/7918#issuecomment-141337218
  
  [Test build #42638 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42638/console)
 for   PR 7918 at commit 
[`70668d7`](https://github.com/apache/spark/commit/70668d7936564dcb25585cd591cfdd7f83958cc3).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-8312] [SQL] Populate statistics info of...

2015-09-17 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/6767#issuecomment-141337142
  
  [Test build #42636 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42636/console)
 for   PR 6767 at commit 
[`6dbedd1`](https://github.com/apache/spark/commit/6dbedd1fd82412f3c6de27a76807519606748aaf).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-8312] [SQL] Populate statistics info of...

2015-09-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/6767#issuecomment-141337164
  
Merged build finished. Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-8312] [SQL] Populate statistics info of...

2015-09-17 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/6767#issuecomment-141337167
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42636/
Test FAILed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org


