[GitHub] spark pull request: [SPARK-9585] add config to enable inputFormat ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/7918#issuecomment-141364727 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42639/ --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-10684] [SQL] StructType.interpretedOrde...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/8808#issuecomment-141364769 [Test build #1772 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/1772/console) for PR 8808 at commit [`a26512b`](https://github.com/apache/spark/commit/a26512b12339a5f82d7c55c6107a1fe5e50ac43d). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `case class TaskCommitDenied(` * `class Interaction(override val uid: String) extends Transformer` * `abstract class LocalNode(conf: SQLConf) extends QueryPlan[LocalNode] with Logging `
[GitHub] spark pull request: [SPARK-9585] add config to enable inputFormat ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/7918#issuecomment-141364629 [Test build #42639 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42639/console) for PR 7918 at commit [`3c1d41d`](https://github.com/apache/spark/commit/3c1d41d8d8b338b2305281f9ab6b5db927a2706c). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-9585] add config to enable inputFormat ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/7918#issuecomment-141364725 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-8402][MLLIB] DP Means Clustering
Github user FlytxtRnD commented on a diff in the pull request: https://github.com/apache/spark/pull/6880#discussion_r39827805 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/DpMeans.scala ---
@@ -0,0 +1,247 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.mllib.clustering
+
+import scala.collection.mutable.ArrayBuffer
+
+import org.apache.spark.Logging
+import org.apache.spark.annotation.Experimental
+import org.apache.spark.mllib.linalg.{Vector, Vectors}
+import org.apache.spark.mllib.linalg.BLAS.{axpy, scal}
+import org.apache.spark.mllib.util.MLUtils
+import org.apache.spark.rdd.RDD
+import org.apache.spark.storage.StorageLevel
+
+/**
+ * :: Experimental ::
+ *
+ * The Dirichlet process (DP) is a popular non-parametric Bayesian mixture
+ * model that allows for flexible clustering of data without having to
+ * determine the number of clusters in advance.
+ *
+ * Given a set of data points, this class performs the cluster creation process
+ * based on the DP-means algorithm, iterating until the maximum number of
+ * iterations is reached or the convergence criterion is satisfied. With the
+ * current global set of centers, it locally creates a new cluster centered at
+ * `x` whenever it encounters an uncovered data point `x`. In a similar manner,
+ * a local cluster center is promoted to a global center whenever an uncovered
+ * local cluster center is found. A data point is said to be "covered" by
+ * a cluster `c` if the distance from the point to the cluster center of `c`
+ * is less than a given lambda value.
+ *
+ * The original paper is "MLbase: Distributed Machine Learning Made Easy" by
+ * Xinghao Pan, Evan R. Sparks, Andre Wibisono
+ *
+ * @param lambda The distance threshold value that controls cluster creation.
+ * @param convergenceTol The threshold value at which convergence is considered to have occurred.
+ * @param maxIterations The maximum number of iterations to perform.
+ */
+@Experimental
+class DpMeans private (
+    private var lambda: Double,
+    private var convergenceTol: Double,
+    private var maxIterations: Int) extends Serializable with Logging {
+
+  /**
+   * Constructs a default instance. The default parameters are {lambda: 1, convergenceTol: 0.01,
+   * maxIterations: 20}.
+   */
+  def this() = this(1, 0.01, 20)
+
+  /** Return the distance threshold that controls cluster creation. Default: 1 */
+  def getLambda(): Double = lambda
+
+  /** Set the distance threshold that controls cluster creation. */
+  def setLambda(lambda: Double): this.type = {
+    this.lambda = lambda
+    this
+  }
+
+  /** Set the threshold value at which convergence is considered to have occurred. Default: 0.01 */
+  def setConvergenceTol(convergenceTol: Double): this.type = {
+    this.convergenceTol = convergenceTol
+    this
+  }
+
+  /** Return the threshold value at which convergence is considered to have occurred. */
+  def getConvergenceTol: Double = convergenceTol
+
+  /** Set the maximum number of iterations. Default: 20 */
+  def setMaxIterations(maxIterations: Int): this.type = {
+    this.maxIterations = maxIterations
+    this
+  }
+
+  /** Return the maximum number of iterations. */
+  def getMaxIterations: Int = maxIterations
+
+  /**
+   * Perform DP-means clustering.
+   */
+  def run(data: RDD[Vector]): DpMeansModel = {
+    if (data.getStorageLevel == StorageLevel.NONE) {
+      logWarning("The input data is not directly cached, which may hurt performance if its"
+        + " parent RDDs are also uncached.")
+    }
+
+    // Compute squared norms and cache them.
+    val norms = data.map(Vectors.norm(_, 2.0))
+    norms.persist()
+    val zippedData = data.zip(norms).map {
+      case (v, norm) => new VectorWithNorm(v, norm)
+    }
+
+    // Implementation of DP mean
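The covering rule described in the DpMeans scaladoc above can be sketched for the single-machine case in a few lines of Python. This is a hypothetical illustration, not the PR's distributed implementation: `lam` plays the role of the scaladoc's lambda threshold, and a point farther than `lam` from every existing center spawns a new cluster at its own location.

```python
import numpy as np

def dp_means(points, lam, max_iterations=20, tol=0.01):
    """Minimal sequential DP-means sketch (illustrative, not Spark's code).

    A point is "covered" if some center lies within distance `lam`;
    uncovered points open a new cluster centered at themselves.
    """
    centers = [points.mean(axis=0)]          # start with one global center
    assignments = np.zeros(len(points), dtype=int)
    for _ in range(max_iterations):
        for i, x in enumerate(points):
            dists = [np.linalg.norm(x - c) for c in centers]
            j = int(np.argmin(dists))
            if dists[j] >= lam:              # uncovered: open a new cluster at x
                centers.append(x.copy())
                j = len(centers) - 1
            assignments[i] = j
        moved = 0.0
        for j in range(len(centers)):        # recompute centers; clusters that
            members = points[assignments == j]  # lost all points keep their position
            if len(members):
                new_c = members.mean(axis=0)
                moved = max(moved, np.linalg.norm(new_c - centers[j]))
                centers[j] = new_c
        if moved < tol:                      # convergence criterion
            break
    return np.array(centers), assignments
```

On two well-separated blobs with a small `lam`, the sketch discovers both clusters without being told `k`, which is the point of the algorithm.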
[GitHub] spark pull request: [SPARK-10623] [SQL] Fixes ORC predicate push-d...
Github user zhzhan commented on the pull request: https://github.com/apache/spark/pull/8799#issuecomment-141355072 LGTM Thanks for fixing this.
[GitHub] spark pull request: [SPARK-3147][MLLib][Streaming] Streaming 2-sam...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4716#issuecomment-141353561 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-3147][MLLib][Streaming] Streaming 2-sam...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4716#issuecomment-141353499 [Test build #42642 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42642/console) for PR 4716 at commit [`60b2e57`](https://github.com/apache/spark/commit/60b2e57026febcb68e459983ba3164281a47f636). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-3147][MLLib][Streaming] Streaming 2-sam...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4716#issuecomment-141353562 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42642/
[GitHub] spark pull request: docs/running-on-mesos.md: state default values...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/8810#discussion_r39825765 --- Diff: docs/running-on-mesos.md --- @@ -332,21 +332,21 @@ See the [configuration page](configuration.html) for information on Spark config spark.mesos.principal - Framework principal to authenticate to Mesos + (none) Set the principal with which Spark framework will use to authenticate with Mesos. spark.mesos.secret - Framework secret to authenticate to Mesos + (none) Set the secret with which Spark framework will use to authenticate with Mesos. spark.mesos.role - Role for the Spark framework + * --- End diff -- I've already merged this though so don't worry about it.
[GitHub] spark pull request: docs/running-on-mesos.md: state default values...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/8810#discussion_r39825764 --- Diff: docs/running-on-mesos.md --- @@ -332,21 +332,21 @@ See the [configuration page](configuration.html) for information on Spark config spark.mesos.principal - Framework principal to authenticate to Mesos + (none) Set the principal with which Spark framework will use to authenticate with Mesos. spark.mesos.secret - Framework secret to authenticate to Mesos + (none) Set the secret with which Spark framework will use to authenticate with Mesos. spark.mesos.role - Role for the Spark framework + * --- End diff -- Oh I meant ``` * ```
[GitHub] spark pull request: docs/running-on-mesos.md: state default values...
Github user felixb commented on a diff in the pull request: https://github.com/apache/spark/pull/8810#discussion_r39825730 --- Diff: docs/running-on-mesos.md --- @@ -332,21 +332,21 @@ See the [configuration page](configuration.html) for information on Spark config spark.mesos.principal - Framework principal to authenticate to Mesos + (none) Set the principal with which Spark framework will use to authenticate with Mesos. spark.mesos.secret - Framework secret to authenticate to Mesos + (none) Set the secret with which Spark framework will use to authenticate with Mesos. spark.mesos.role - Role for the Spark framework + * --- End diff -- I don't see anything. Should I set it to `"*"`, `(*)` or ` * ` with blanks on each side?
[GitHub] spark pull request: docs/running-on-mesos.md: state default values...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/8810
[GitHub] spark pull request: [SPARK-10272][Pyspark][MLLib] Added @since tag...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8628#issuecomment-141351519 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42643/
[GitHub] spark pull request: [SPARK-10272][Pyspark][MLLib] Added @since tag...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8628#issuecomment-141351516 Merged build finished. Test PASSed.
[GitHub] spark pull request: docs/running-on-mesos.md: state default values...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/8810#issuecomment-141351178 Actually I will just merge this and address the comment when I merge.
[GitHub] spark pull request: [SPARK-10272][Pyspark][MLLib] Added @since tag...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/8628#issuecomment-141351219 [Test build #42643 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42643/console) for PR 8628 at commit [`9f06d04`](https://github.com/apache/spark/commit/9f06d04b272b413cee27ccfa35dd304843b264c9). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-10648] Proposed bug fix when oracle ret...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/8780#issuecomment-141351034 (I actually don't know if Spark implements this correctly -- we should test it)
[GitHub] spark pull request: [SPARK-10648] Proposed bug fix when oracle ret...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/8780#issuecomment-141350914 Actually scale can be negative. It just means the number of zeros to the left of the decimal point. For example, for the number 123 with precision = 2 and scale = -1, 123 would become 120.
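rxin's 123 → 120 example can be checked directly with Python's standard-library `decimal` module, which represents negative scale as a positive exponent. This is only an illustration of the precision/scale semantics under discussion, not Spark's `DecimalType` code.

```python
from decimal import Decimal, ROUND_HALF_UP

# scale = -1 means digits are dropped to the LEFT of the decimal point:
# quantizing 123 to exponent +1 rounds away the ones digit.
n = Decimal(123)
rounded = n.quantize(Decimal("1e1"), rounding=ROUND_HALF_UP)

print(int(rounded))                    # 120
print(len(rounded.as_tuple().digits))  # 2  -> precision = 2
print(rounded.as_tuple().exponent)     # 1  -> i.e. scale = -1
```

The exponent in `as_tuple()` is the negation of SQL-style scale, so `exponent == 1` corresponds exactly to rxin's `scale = -1` case.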
[GitHub] spark pull request: docs/running-on-mesos.md: state default values...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/8810#discussion_r39825352 --- Diff: docs/running-on-mesos.md --- @@ -332,21 +332,21 @@ See the [configuration page](configuration.html) for information on Spark config spark.mesos.principal - Framework principal to authenticate to Mesos + (none) Set the principal with which Spark framework will use to authenticate with Mesos. spark.mesos.secret - Framework secret to authenticate to Mesos + (none) Set the secret with which Spark framework will use to authenticate with Mesos. spark.mesos.role - Role for the Spark framework + * --- End diff -- can you put `code` around this?
[GitHub] spark pull request: docs/running-on-mesos.md: state default values...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8810#issuecomment-141349184 Can one of the admins verify this patch?
[GitHub] spark pull request: [SPARK-9522][SQL] SparkSubmit process can not ...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/7853
[GitHub] spark pull request: [SPARK-10269][Pyspark][MLLib] Add @since annot...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/8626#issuecomment-141349076 [Test build #42645 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42645/consoleFull) for PR 8626 at commit [`2e81fd3`](https://github.com/apache/spark/commit/2e81fd314b98a460560376161a3d03950b0ed8fc).
[GitHub] spark pull request: docs/running-on-mesos.md: state default values...
GitHub user felixb opened a pull request: https://github.com/apache/spark/pull/8810 docs/running-on-mesos.md: state default values in default column This PR simply uses the default value column for defaults. You can merge this pull request into a Git repository by running: $ git pull https://github.com/felixb/spark fix_mesos_doc Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/8810.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #8810 commit 09b4e15f0dfe903ac70c0d6f4a8fcf06dac6d78b Author: Felix Bechstein Date: 2015-09-18T05:25:00Z docs/running-on-mesos.md: state default values in default column
[GitHub] spark pull request: [SPARK-10471] [CORE] [MESOS] prevent getting o...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/8639#issuecomment-141349042 [Test build #42646 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42646/consoleFull) for PR 8639 at commit [`58aaa79`](https://github.com/apache/spark/commit/58aaa79095143187175f0292d71b772b90db).
[GitHub] spark pull request: [SPARK-10271][Pyspark][MLLib] Added @since tag...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/8627#issuecomment-141349055 [Test build #42644 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42644/consoleFull) for PR 8627 at commit [`100ce0f`](https://github.com/apache/spark/commit/100ce0fc36e9d143f6789db2a749afb8902d0676).
[GitHub] spark pull request: [SPARK-10272][Pyspark][MLLib] Added @since tag...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/8628#issuecomment-141348863 [Test build #42643 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42643/consoleFull) for PR 8628 at commit [`9f06d04`](https://github.com/apache/spark/commit/9f06d04b272b413cee27ccfa35dd304843b264c9).
[GitHub] spark pull request: [SPARK-9522][SQL] SparkSubmit process can not ...
Github user yhuai commented on the pull request: https://github.com/apache/spark/pull/7853#issuecomment-141348696 lgtm. merging to 1.5 branch and master.
[GitHub] spark pull request: [SPARK-10471] [CORE] [MESOS] prevent getting o...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8639#issuecomment-141348570 Merged build started.
[GitHub] spark pull request: [SPARK-10471] [CORE] [MESOS] prevent getting o...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8639#issuecomment-141348557 Merged build triggered.
[GitHub] spark pull request: [SPARK-10471] [CORE] [MESOS] prevent getting o...
Github user felixb commented on the pull request: https://github.com/apache/spark/pull/8639#issuecomment-141348584 added to table of parameters.
[GitHub] spark pull request: [SPARK-3147][MLLib][Streaming] Streaming 2-sam...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4716#issuecomment-141347889 [Test build #42642 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42642/consoleFull) for PR 4716 at commit [`60b2e57`](https://github.com/apache/spark/commit/60b2e57026febcb68e459983ba3164281a47f636).
[GitHub] spark pull request: SPARK-10329 Cost RDD in k-means|| initializati...
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/8546#issuecomment-141347838 @HuJiayin This basically reverts the behavior to 1.2. The change we made in 1.3 was to avoid recomputing distances between old centers and input points during initialization. That is why we need `newCenters`. If you test the current version with a large `k`, you will see the performance difference. Based on our discussion offline, I think there is not much work to do here. The case where the new implementation introduces overhead is when the dataset is really tall and skinny, but we haven't heard negative feedback from practical use cases yet. Do you mind closing this PR for now? Thanks!
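The optimization mengxr describes — never recomputing distances against old centers during k-means|| initialization — can be sketched in a few lines of standalone Python. This is a hypothetical illustration of the idea, not Spark's implementation: `costs` holds each point's current minimum squared distance, and each round refreshes it against only the newly sampled centers.

```python
import numpy as np

def update_costs(points, costs, new_centers):
    """k-means||-style incremental cost update: each point's min squared
    distance is refreshed against only the centers added this round,
    so distances to previously seen centers are never recomputed."""
    for c in new_centers:
        d2 = np.sum((points - c) ** 2, axis=1)  # squared distance to new center
        costs = np.minimum(costs, d2)           # keep the running minimum
    return costs

# two initialization rounds: start with no centers (infinite cost),
# then fold in one new center per round
points = np.array([[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]])
costs = update_costs(points, np.full(len(points), np.inf), [points[0]])
costs = update_costs(points, costs, [points[2]])
```

Each round costs O(points × new centers) instead of O(points × all centers), which is where the large-`k` performance difference comes from.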
[GitHub] spark pull request: [SPARK-10272][Pyspark][MLLib] Added @since tag...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8628#issuecomment-141347398 Merged build triggered.
[GitHub] spark pull request: [SPARK-10269][Pyspark][MLLib] Add @since annot...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8626#issuecomment-141347427 Merged build started.
[GitHub] spark pull request: [SPARK-10269][Pyspark][MLLib] Add @since annot...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8626#issuecomment-141347413 Merged build triggered.
[GitHub] spark pull request: [SPARK-10272][Pyspark][MLLib] Added @since tag...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8628#issuecomment-141347425 Merged build started.
[GitHub] spark pull request: [SPARK-10271][Pyspark][MLLib] Added @since tag...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8627#issuecomment-141347428 Merged build started.
[GitHub] spark pull request: [SPARK-10577] [PySpark] DataFrame hint for bro...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/8801#issuecomment-141347383 Jenkins, test this please.
[GitHub] spark pull request: [SPARK-10271][Pyspark][MLLib] Added @since tag...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8627#issuecomment-141347405 Merged build triggered.
[GitHub] spark pull request: [SPARK-10679] [CORE] javax.jdo.JDOFatalUserExc...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/8804#discussion_r39824809
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/TableReader.scala ---
@@ -295,13 +298,25 @@ private[hive] object HadoopTableReader extends HiveInspectors with Logging {
   def initializeLocalJobConfFunc(path: String, tableDesc: TableDesc)(jobConf: JobConf) {
     FileInputFormat.setInputPaths(jobConf, Seq[Path](new Path(path)): _*)
     if (tableDesc != null) {
-      PlanUtils.configureInputJobPropertiesForStorageHandler(tableDesc)
+      configureJobPropertiesForStorageHandler(tableDesc, jobConf)
       Utilities.copyTableJobPropertiesToConf(tableDesc, jobConf)
     }
     val bufferSize = System.getProperty("spark.buffer.size", "65536")
     jobConf.set("io.file.buffer.size", bufferSize)
   }
+
+  private def configureJobPropertiesForStorageHandler(tableDesc: TableDesc, jobConf: JobConf) {
--- End diff --
can you add some comment explaining what's happening, and why this is done this way?
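One reading of the diff above: instead of calling Hive's `PlanUtils` helper, which takes only the shared `tableDesc`, the new private helper receives the task-local `jobConf` and applies the storage handler's input properties there. A toy Python sketch of that pattern (the function name and the property key are invented for illustration; this is not Hive's or Spark's actual API):

```python
def configure_job_properties_for_storage_handler(table_props, job_conf):
    """Apply the storage handler's input job properties to this task's own
    job-conf copy, leaving the shared table descriptor untouched.
    'storage.handler.input.properties' is a made-up key for this sketch."""
    handler_props = table_props.get("storage.handler.input.properties", {})
    for key, value in handler_props.items():
        job_conf[key] = value  # job_conf is local to this task, safe to mutate
    return job_conf
```

Mutating only the per-task copy avoids cross-task interference when many tasks configure input formats concurrently.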
[GitHub] spark pull request: [SPARK-10659][SQL] Add an option in SQLConf fo...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/8809#issuecomment-141347277 [Test build #42641 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42641/consoleFull) for PR 8809 at commit [`6911d0f`](https://github.com/apache/spark/commit/6911d0ff9d82475f69ae558cbb9aab1ed588c847).
[GitHub] spark pull request: [SPARK-10684] [SQL] StructType.interpretedOrde...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/8808#issuecomment-141347060 [Test build #1772 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/1772/consoleFull) for PR 8808 at commit [`a26512b`](https://github.com/apache/spark/commit/a26512b12339a5f82d7c55c6107a1fe5e50ac43d).
[GitHub] spark pull request: [SPARK-10272][Pyspark][MLLib] Added @since tag...
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/8628#issuecomment-141347073 @yu-iskw Could you help review this PR? Thanks!
[GitHub] spark pull request: [SPARK-10271][Pyspark][MLLib] Added @since tag...
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/8627#issuecomment-141347089 @yu-iskw Could you help review this PR? Thanks!
[GitHub] spark pull request: [SPARK-10269][Pyspark][MLLib] Add @since annot...
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/8626#issuecomment-141347059 @yu-iskw Could you help review this PR? Thanks!
[GitHub] spark pull request: [SPARK-10659][SQL] Add an option in SQLConf fo...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8809#issuecomment-141346942 Merged build triggered.
[GitHub] spark pull request: [SPARK-10659][SQL] Add an option in SQLConf fo...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8809#issuecomment-141346996 Merged build started.
[GitHub] spark pull request: [SPARK-10272][Pyspark][MLLib] Added @since tag...
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/8628#issuecomment-141347019 test this please
[GitHub] spark pull request: [SPARK-10269][Pyspark][MLLib] Add @since annot...
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/8626#issuecomment-141346952 test this please
[GitHub] spark pull request: [SPARK-3147][MLLib][Streaming] Streaming 2-sam...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4716#issuecomment-141346945 Merged build triggered.
[GitHub] spark pull request: [SPARK-10271][Pyspark][MLLib] Added @since tag...
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/8627#issuecomment-141347014 test this please
[GitHub] spark pull request: [SPARK-3147][MLLib][Streaming] Streaming 2-sam...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4716#issuecomment-141346997 Merged build started.
[GitHub] spark pull request: [SPARK-3147][MLLib][Streaming] Streaming 2-sam...
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/4716#discussion_r39824573
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/stat/test/TestResult.scala ---
@@ -115,3 +115,25 @@ class KolmogorovSmirnovTestResult private[stat] (
     "Kolmogorov-Smirnov test summary:\n" + super.toString
   }
 }
+
+/**
+ * :: Experimental ::
+ * Object containing the test results for streaming testing.
+ */
+@Experimental
+@Since("1.6.0")
+private[stat] class StreamingTestResult(
--- End diff --
add @Since to constructor
[GitHub] spark pull request: [SPARK-10659][SQL] Add an option in SQLConf fo...
GitHub user viirya opened a pull request: https://github.com/apache/spark/pull/8809 [SPARK-10659][SQL] Add an option in SQLConf for setting schema nullable in datasource JIRA: https://issues.apache.org/jira/browse/SPARK-10659 If not preserving the REQUIRED (non-nullable) flag in the schema is a problem for users, I think we can add an option (enabled by default) to control this behavior. You can merge this pull request into a Git repository by running: $ git pull https://github.com/viirya/spark-1 optional_asnullable Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/8809.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #8809 commit 6911d0ff9d82475f69ae558cbb9aab1ed588c847 Author: Liang-Chi Hsieh Date: 2015-09-18T05:03:15Z Add an option in SQLConf for schema asNullable.
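The behavior under discussion can be illustrated outside Spark: today the data source path forces every field nullable, losing the source's REQUIRED flag; the PR proposes an option to skip that coercion. A toy Python model (the field type and the `force_nullable` option name are invented for this sketch, not Spark's actual API):

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Field:
    """Minimal stand-in for a schema field with a nullability flag."""
    name: str
    dtype: str
    nullable: bool

def load_schema(fields, force_nullable=True):
    """When force_nullable is on (mimicking the current behavior), every
    field is marked nullable; when off (the proposed option), the REQUIRED
    flag from the source schema is preserved as-is."""
    if force_nullable:
        return [replace(f, nullable=True) for f in fields]
    return list(fields)
```

With the option off, a downstream writer could still emit a REQUIRED column for `id` instead of silently widening it to optional.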
[GitHub] spark pull request: [SPARK-3147][MLLib][Streaming] Streaming 2-sam...
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/4716#issuecomment-141346526 test this please
[GitHub] spark pull request: [SPARK-3147][MLLib][Streaming] Streaming 2-sam...
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/4716#discussion_r39824556
--- Diff: mllib/src/main/scala/org/apache/spark/mllib/stat/test/StreamingTestMethod.scala ---
@@ -0,0 +1,165 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.mllib.stat.test
+
+import java.io.Serializable
+
+import scala.language.implicitConversions
+import scala.math.pow
+
+import com.twitter.chill.MeatLocker
+import org.apache.commons.math3.stat.descriptive.StatisticalSummaryValues
+import org.apache.commons.math3.stat.inference.TTest
+
+import org.apache.spark.Logging
+import org.apache.spark.streaming.dstream.DStream
+import org.apache.spark.util.StatCounter
+
+/**
+ * Significance testing methods for [[StreamingTest]]. New 2-sample statistical significance tests
+ * should extend [[StreamingTestMethod]] and introduce a new entry in
+ * [[StreamingTestMethod.TEST_NAME_TO_OBJECT]]
+ */
+private[stat] sealed trait StreamingTestMethod extends Serializable {
+
+  val MethodName: String
+  val NullHypothesis: String
+
+  protected type SummaryPairStream =
+    DStream[(StatCounter, StatCounter)]
+
+  /**
+   * Perform streaming 2-sample statistical significance testing.
+   *
+   * @param sampleSummaries stream pairs of summary statistics for the 2 samples
+   * @return stream of test results
+   */
+  def doTest(sampleSummaries: SummaryPairStream): DStream[StreamingTestResult]
+
+  /**
+   * Implicit adapter to convert between streaming summary statistics type and the type required by
+   * the t-testing libraries.
+   */
+  protected implicit def toApacheCommonsStats(
+      summaryStats: StatCounter): StatisticalSummaryValues = {
+    new StatisticalSummaryValues(
+      summaryStats.mean,
+      summaryStats.variance,
+      summaryStats.count,
+      summaryStats.max,
+      summaryStats.min,
+      summaryStats.mean * summaryStats.count
+    )
+  }
+}
+
+/**
+ * Performs Welch's 2-sample t-test. The null hypothesis is that the two data sets have equal mean.
+ * This test does not assume equal variance between the two samples and does not assume equal
+ * sample size.
+ *
+ * More information: http://en.wikipedia.org/wiki/Welch%27s_t_test
+ */
+private[stat] object WelchTTest extends StreamingTestMethod with Logging {
+
+  final val MethodName = "Welch's 2-sample T-test"
+  final val NullHypothesis = "Both groups have same mean"
+
+  private final val TTester = MeatLocker(new TTest())
+
+  def doTest(data: SummaryPairStream): DStream[StreamingTestResult] =
+    data.map[StreamingTestResult]((test _).tupled)
+
+  private def test(
+      statsA: StatCounter,
+      statsB: StatCounter): StreamingTestResult = {
+    def welchDF(sample1: StatisticalSummaryValues, sample2: StatisticalSummaryValues): Double = {
+      val s1 = sample1.getVariance
+      val n1 = sample1.getN
+      val s2 = sample2.getVariance
+      val n2 = sample2.getN
+
+      val a = pow(s1, 2) / n1
+      val b = pow(s2, 2) / n2
+
+      pow(a + b, 2) / ((pow(a, 2) / (n1 - 1)) + (pow(b, 2) / (n2 - 1)))
+    }
+
+    new StreamingTestResult(
+      TTester.get.tTest(statsA, statsB),
+      welchDF(statsA, statsB),
+      TTester.get.t(statsA, statsB),
+      MethodName,
+      NullHypothesis
+    )
+  }
+}
+
+/**
+ * Performs Student's 2-sample t-test. The null hypothesis is that the two data sets have equal
+ * mean. This test assumes equal variance between the two samples and does not assume equal sample
+ * size. For unequal variances, Welch's t-test should be used instead.
+ *
+ * More information: http://en.wikipedia.org/wiki/Student%27s_t-test
+ */
+private[stat] object StudentTTest extends StreamingTestMethod with Logging {
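The `welchDF` helper in the diff computes approximate degrees of freedom for Welch's unequal-variance t-test. For reference, the textbook Welch–Satterthwaite formula (with `var1`, `var2` denoting the sample variances) transcribes to Python as:

```python
def welch_df(var1, n1, var2, n2):
    """Welch-Satterthwaite approximate degrees of freedom for a 2-sample
    t-test with unequal variances. var1, var2 are sample variances;
    n1, n2 are sample sizes."""
    a = var1 / n1
    b = var2 / n2
    return (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))
```

As a sanity check: with equal variances and equal sample sizes the formula reduces to the pooled value n1 + n2 - 2, and when one sample's variance dominates, the result shrinks toward that sample's n - 1.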
[GitHub] spark pull request: [SPARK-3147][MLLib][Streaming] Streaming 2-sam...
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/4716#discussion_r39824567 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/stat/test/StreamingTestMethod.scala ---
[GitHub] spark pull request: [SPARK-3147][MLLib][Streaming] Streaming 2-sam...
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/4716#discussion_r39824543 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/stat/test/StreamingTestMethod.scala --- + private final val TTester = MeatLocker(new TTest()) --- End diff -- `tTester`
[GitHub] spark pull request: [SPARK-10682][GraphX] Remove Bagel test suites...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/8807
[GitHub] spark pull request: [SPARK-3147][MLLib][Streaming] Streaming 2-sam...
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/4716#discussion_r39824515 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/stat/test/StreamingTestMethod.scala --- + val MethodName: String --- End diff -- `MethodName` -> `methodName`
[GitHub] spark pull request: [SPARK-3147][MLLib][Streaming] Streaming 2-sam...
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/4716#discussion_r39824516 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/stat/test/StreamingTestMethod.scala --- [quoted diff omitted; duplicates the StreamingTestMethod.scala excerpt above] + val MethodName: String + val NullHypothesis: String --- End diff -- `nullHypothesis`
[GitHub] spark pull request: [SPARK-9642] [ML] [WIP] LinearRegression shoul...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/8631#issuecomment-141345895 [Test build #42640 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42640/consoleFull) for PR 8631 at commit [`1f731c2`](https://github.com/apache/spark/commit/1f731c28ad8a59f3bf432435253dc7b0984f46b4).
[GitHub] spark pull request: [SPARK-3147][MLLib][Streaming] Streaming 2-sam...
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/4716#discussion_r39824449 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/stat/test/StreamingTestMethod.scala --- [license header, imports, and trait scaladoc omitted; duplicates the StreamingTestMethod.scala excerpt above] +private[stat] sealed trait StreamingTestMethod extends Serializable { + + val MethodName: String + val NullHypothesis: String + + protected type SummaryPairStream = + DStream[(StatCounter, StatCounter)] + + /** + * Perform streaming 2-sample statistical significance testing. + * + * @param sampleSummaries stream pairs of summary statistics for the 2 samples + * @return stream of test results + */ + def doTest(sampleSummaries: SummaryPairStream): DStream[StreamingTestResult] + + /** + * Implicit adapter to convert between streaming summary statistics type and the type required by + * the t-testing libraries. + */ + protected implicit def toApacheCommonsStats( + summaryStats: StatCounter): StatisticalSummaryValues = { + new StatisticalSummaryValues( + summaryStats.mean, + summaryStats.variance, + summaryStats.count, + summaryStats.max, + summaryStats.min, + summaryStats.mean * summaryStats.count + ) + } +} + +/** + * Performs Welch's 2-sample t-test. The null hypothesis is that the two data sets have equal mean. + * This test does not assume equal variance between the two samples and does not assume equal + * sample size. + * + * More information: http://en.wikipedia.org/wiki/Welch%27s_t_test + */ +private[stat] object WelchTTest extends StreamingTestMethod with Logging { + + final val MethodName = "Welch's 2-sample T-test" --- End diff -- `T-test` -> `t-test`
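The Welch's t-test that `WelchTTest` delegates to commons-math3 is computed entirely from the summary fields each `StatCounter` carries (mean, variance, count). A minimal standalone sketch of that statistic, using a hypothetical `Summary` case class in place of Spark's `StatCounter` (names here are illustrative, not the PR's API):

```scala
// Sketch of the statistic behind Welch's 2-sample t-test. `Summary` is a
// hypothetical stand-in for Spark's StatCounter; the PR itself delegates
// this computation to commons-math3's TTest.
case class Summary(mean: Double, variance: Double, count: Long)

// Returns (t statistic, Welch-Satterthwaite degrees of freedom).
def welchT(a: Summary, b: Summary): (Double, Double) = {
  val va = a.variance / a.count // variance of the sample mean, group A
  val vb = b.variance / b.count // variance of the sample mean, group B
  val t = (a.mean - b.mean) / math.sqrt(va + vb)
  // Welch-Satterthwaite approximation; no equal-variance assumption.
  val df = math.pow(va + vb, 2) /
    (math.pow(va, 2) / (a.count - 1) + math.pow(vb, 2) / (b.count - 1))
  (t, df)
}
```

The degrees of freedom are what distinguish Welch's test from the pooled-variance t-test; commons-math3 then turns the pair into a p-value via the t-distribution.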
[GitHub] spark pull request: [SPARK-3147][MLLib][Streaming] Streaming 2-sam...
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/4716#discussion_r39824394 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/stat/test/StreamingTest.scala --- @@ -0,0 +1,145 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.mllib.stat.test + +import org.apache.spark.Logging +import org.apache.spark.annotation.{Experimental, Since} +import org.apache.spark.streaming.dstream.DStream +import org.apache.spark.util.StatCounter + +/** + * :: Experimental :: + * Performs online 2-sample significance testing for a stream of (Boolean, Double) pairs. The + * Boolean identifies which sample each observation comes from, and the Double is the numeric value + * of the observation. + * + * To address novelty affects, the `peacePeriod` specifies a set number of initial + * [[org.apache.spark.rdd.RDD]] batches of the [[DStream]] to be dropped from significance testing. + * + * The `windowSize` sets the number of batches each significance test is to be performed over. The + * window is sliding with a stride length of 1 batch. Setting windowSize to 0 will perform + * cumulative processing, using all batches seen so far. 
+ * + * Different tests may be used for assessing statistical significance depending on assumptions + * satisfied by data. For more details, see [[StreamingTestMethod]]. The `testMethod` specifies + * which test will be used. + * + * Use a builder pattern to construct a streaming test in an application, for example: + * ``` + * val model = new OnlineABTest() + * .setPeacePeriod(10) + * .setWindowSize(0) + * .setTestMethod("welch") + * .registerStream(DStream) + * ``` + */ +@Experimental +@Since("1.6.0") +class StreamingTest( +@Since("1.6.0") var peacePeriod: Int = 0, +@Since("1.6.0") var windowSize: Int = 0, +@Since("1.6.0") var testMethod: StreamingTestMethod = WelchTTest) + extends Logging with Serializable { + + /** Set the number of initial batches to ignore. */ + @Since("1.6.0") + def setPeacePeriod(peacePeriod: Int): this.type = { +this.peacePeriod = peacePeriod +this + } + + /** + * Set the number of batches to compute significance tests over. + * A value of 0 will use all batches seen so far. + */ + @Since("1.6.0") + def setWindowSize(windowSize: Int): this.type = { +this.windowSize = windowSize +this + } + + /** Set the statistical method used for significance testing. */ + @Since("1.6.0") + def setTestMethod(method: String): this.type = { +this.testMethod = StreamingTestMethod.getTestMethodFromName(method) +this + } + + /** + * Register a [[DStream]] of values for significance testing. 
+ * + * @param data stream of (key,value) pairs where the key is the group membership (control or + * treatment) and the value is the numerical metric to test for significance + * @return stream of significance testing results + */ + @Since("1.6.0") + def registerStream(data: DStream[(Boolean, Double)]): DStream[StreamingTestResult] = { +val dataAfterPeacePeriod = dropPeacePeriod(data) +val summarizedData = summarizeByKeyAndWindow(dataAfterPeacePeriod) +val pairedSummaries = pairSummaries(summarizedData) +val testResults = testMethod.doTest(pairedSummaries) + +testResults + } + + /** Drop all batches inside the peace period. */ + private[stat] def dropPeacePeriod( + data: DStream[(Boolean, Double)]): DStream[(Boolean, Double)] = { +data.transform { (rdd, time) => + if (time.milliseconds > data.slideDuration.milliseconds * peacePeriod) { +rdd + } else { +rdd.filter(_ => false) // TODO: Is there a better way to drop a RDD from a DStream? + } +} + } +
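The `dropPeacePeriod` transform quoted above keeps a batch only when its time exceeds `slideDuration * peacePeriod`, which amounts to emptying the first `peacePeriod` batches. The same rule over a plain indexed sequence of batches, as a hedged illustration without Spark (not the PR's code):

```scala
// Peace-period dropping, reduced to indexed batches: batch i (0-based) has
// time (i + 1) * slideDuration, so it survives exactly when i >= peacePeriod.
def dropPeacePeriod[T](batches: Seq[Seq[T]], peacePeriod: Int): Seq[Seq[T]] =
  batches.zipWithIndex.map { case (batch, i) =>
    // emptied rather than removed, mirroring rdd.filter(_ => false)
    if (i >= peacePeriod) batch else Seq.empty[T]
  }
```

Emptying rather than removing slots preserves the one-batch-per-interval cadence, which is why the DStream version filters the RDD instead of dropping it.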
[GitHub] spark pull request: [SPARK-3147][MLLib][Streaming] Streaming 2-sam...
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/4716#discussion_r39824390 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/stat/test/StreamingTest.scala --- [quoted diff omitted; duplicates the StreamingTest.scala excerpt above]
[GitHub] spark pull request: [SPARK-3147][MLLib][Streaming] Streaming 2-sam...
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/4716#discussion_r39824392 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/stat/test/StreamingTest.scala --- [quoted diff omitted; duplicates the StreamingTest.scala excerpt above]
[GitHub] spark pull request: [SPARK-3147][MLLib][Streaming] Streaming 2-sam...
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/4716#discussion_r39824320 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/stat/test/StreamingTest.scala --- [quoted diff omitted; duplicates the StreamingTest.scala excerpt above] + rdd.filter(_ => false) // TODO: Is there a better way to drop a RDD from a DStream? --- End diff -- you only ne
[GitHub] spark pull request: [SPARK-3147][MLLib][Streaming] Streaming 2-sam...
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/4716#discussion_r39824292 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/stat/test/StreamingTest.scala --- [quoted diff omitted; duplicates the StreamingTest.scala excerpt above] + val testResults = testMethod.doTest(pairedSummaries) --- End diff -- `val testResults = ` is not necessary
[GitHub] spark pull request: [SPARK-3147][MLLib][Streaming] Streaming 2-sam...
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/4716#discussion_r39824264 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/stat/test/StreamingTest.scala --- [quoted diff omitted; duplicates the StreamingTest.scala excerpt above] + /** Set the statistical method used for significance testing. */ --- End diff -- document default value and available methods
[GitHub] spark pull request: [SPARK-3147][MLLib][Streaming] Streaming 2-sam...
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/4716#discussion_r39824260

--- Diff: mllib/src/main/scala/org/apache/spark/mllib/stat/test/StreamingTest.scala ---
+  /**
+   * Set the number of batches to compute significance tests over.
+   * A value of 0 will use all batches seen so far.
--- End diff --

document default value
[GitHub] spark pull request: [SPARK-3147][MLLib][Streaming] Streaming 2-sam...
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/4716#discussion_r39824269

--- Diff: mllib/src/main/scala/org/apache/spark/mllib/stat/test/StreamingTest.scala ---
+  /**
+   * Register a [[DStream]] of values for significance testing.
+   *
+   * @param data stream of (key,value) pairs where the key is the group membership (control or
--- End diff --

document clearly whether `true` means control or experiment
[GitHub] spark pull request: [SPARK-9642] [ML] [WIP] LinearRegression shoul...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8631#issuecomment-141344287 Merged build triggered.
[GitHub] spark pull request: [SPARK-9642] [ML] [WIP] LinearRegression shoul...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8631#issuecomment-141344297 Merged build started.
[GitHub] spark pull request: [SPARK-3147][MLLib][Streaming] Streaming 2-sam...
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/4716#discussion_r39824239

--- Diff: mllib/src/main/scala/org/apache/spark/mllib/stat/test/StreamingTest.scala ---
+ * Use a builder pattern to construct a streaming test in an application, for example:
+ * ```
+ * val model = new OnlineABTest()
--- End diff --

`StreamingTest`
[GitHub] spark pull request: [SPARK-3147][MLLib][Streaming] Streaming 2-sam...
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/4716#discussion_r39824244

--- Diff: mllib/src/main/scala/org/apache/spark/mllib/stat/test/StreamingTest.scala ---
+class StreamingTest(
+    @Since("1.6.0") var peacePeriod: Int = 0,
--- End diff --

The default values are not Java friendly. Since we already have setters, we can make a default constructor with no arguments.
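The Java-friendliness point is that Scala default parameter values do not produce a no-argument constructor visible from Java. A minimal, Spark-free sketch of the suggested shape (plain `var`s behind a no-arg constructor, with fluent setters; the class and field names here are illustrative, not Spark's final code):

```scala
// Illustrative sketch only: a no-argument constructor, callable from Java
// as `new StreamingTestSketch()`, with fluent setters replacing Scala
// default parameter values.
class StreamingTestSketch {
  var peacePeriod: Int = 0         // default: drop no initial batches
  var windowSize: Int = 0          // default: 0 = use all batches seen so far
  var testMethod: String = "welch" // default test method

  def setPeacePeriod(p: Int): this.type = { peacePeriod = p; this }
  def setWindowSize(w: Int): this.type = { windowSize = w; this }
  def setTestMethod(m: String): this.type = { testMethod = m; this }
}
```

From Java this allows `new StreamingTestSketch().setPeacePeriod(10).setTestMethod("student")`, which a constructor relying on Scala default arguments does not.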
[GitHub] spark pull request: [SPARK-3147][MLLib][Streaming] Streaming 2-sam...
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/4716#discussion_r39824247

--- Diff: mllib/src/main/scala/org/apache/spark/mllib/stat/test/StreamingTest.scala ---
+  /** Set the number of initial batches to ignore. */
--- End diff --

document default value
[GitHub] spark pull request: [SPARK-3147][MLLib][Streaming] Streaming 2-sam...
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/4716#discussion_r39824241

--- Diff: mllib/src/main/scala/org/apache/spark/mllib/stat/test/StreamingTest.scala ---
+class StreamingTest(
--- End diff --

add since version to constructor as well: `class StreamingTest @Since("1.6.0") (`
[GitHub] spark pull request: [SPARK-3147][MLLib][Streaming] Streaming 2-sam...
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/4716#discussion_r39824237

--- Diff: mllib/src/main/scala/org/apache/spark/mllib/stat/test/StreamingTest.scala ---
+ * Use a builder pattern to construct a streaming test in an application, for example:
+ * ```
--- End diff --

use `{{{` for example code
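Scaladoc delimits example code with `{{{ ... }}}` rather than Markdown backtick fences. A sketch of the class doc rewritten that way (assuming `data` is the application's `DStream[(Boolean, Double)]`, and with the class name corrected per the earlier `StreamingTest` comment):

```scala
/**
 * Use a builder pattern to construct a streaming test in an application, for example:
 * {{{
 * val model = new StreamingTest()
 *   .setPeacePeriod(10)
 *   .setWindowSize(0)
 *   .setTestMethod("welch")
 *   .registerStream(data)
 * }}}
 */
```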
[GitHub] spark pull request: [SPARK-9642] [ML] [WIP] LinearRegression shoul...
Github user rotationsymmetry commented on the pull request: https://github.com/apache/spark/pull/8631#issuecomment-141344261 @dbtsai Thanks for the comment on indentation. I have fixed it in the patch.
[GitHub] spark pull request: [SPARK-8518] [ML] Log-linear models for surviv...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/8611
[GitHub] spark pull request: [SPARK-8518] [ML] Log-linear models for surviv...
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/8611#issuecomment-141341794 LGTM. Merged into master. Thanks! I created https://issues.apache.org/jira/browse/SPARK-10686 for follow-up work.
[GitHub] spark pull request: [SPARK-9522][SQL] SparkSubmit process can not ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/7853#issuecomment-141340818 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42632/
[GitHub] spark pull request: [SPARK-9522][SQL] SparkSubmit process can not ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/7853#issuecomment-141340817 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-9522][SQL] SparkSubmit process can not ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/7853#issuecomment-141340676

[Test build #42632 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42632/console) for PR 7853 at commit [`504aeb3`](https://github.com/apache/spark/commit/504aeb32260fc0a26cccbed17d1c48b49f99e488).

* This patch **passes all tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-7936] [SQL] Add configuration for initi...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6488#issuecomment-141340120 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42635/
[GitHub] spark pull request: [SPARK-7936] [SQL] Add configuration for initi...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6488#issuecomment-141340118 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-7936] [SQL] Add configuration for initi...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/6488#issuecomment-141339998

[Test build #42635 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42635/console) for PR 6488 at commit [`39a9c41`](https://github.com/apache/spark/commit/39a9c4184952c90673a1a9766a72bfc120c23123).

* This patch **passes all tests**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `class Interaction(override val uid: String) extends Transformer`
[GitHub] spark pull request: [SPARK-9585] add config to enable inputFormat ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/7918#issuecomment-141339490 [Test build #42639 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42639/consoleFull) for PR 7918 at commit [`3c1d41d`](https://github.com/apache/spark/commit/3c1d41d8d8b338b2305281f9ab6b5db927a2706c).
[GitHub] spark pull request: [SPARK-9585] add config to enable inputFormat ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/7918#issuecomment-141337659 Merged build started.
[GitHub] spark pull request: [SPARK-9585] add config to enable inputFormat ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/7918#issuecomment-141337648 Merged build triggered.
[GitHub] spark pull request: [SPARK-8518] [ML] Log-linear models for surviv...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8611#issuecomment-141337339 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-8518] [ML] Log-linear models for surviv...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/8611#issuecomment-141337341 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42637/
[GitHub] spark pull request: [SPARK-8518] [ML] Log-linear models for surviv...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/8611#issuecomment-141337307 [Test build #42637 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42637/console) for PR 8611 at commit [`aa37878`](https://github.com/apache/spark/commit/aa37878c50ef6e7722a615298240ba6e61ea083c).
* This patch **passes all tests**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `class AFTSurvivalRegression @Since("1.6.0") (@Since("1.6.0") override val uid: String)`
  * `require(censor == 1.0 || censor == 0.0, "censor of class AFTPoint must be 1.0 or 0.0")`
[GitHub] spark pull request: [SPARK-9585] add config to enable inputFormat ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/7918#issuecomment-141337220 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-9585] add config to enable inputFormat ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/7918#issuecomment-141337221 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42638/
[GitHub] spark pull request: [SPARK-9585] add config to enable inputFormat ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/7918#issuecomment-141337218 [Test build #42638 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42638/console) for PR 7918 at commit [`70668d7`](https://github.com/apache/spark/commit/70668d7936564dcb25585cd591cfdd7f83958cc3).
* This patch **fails Scala style tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-8312] [SQL] Populate statistics info of...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/6767#issuecomment-141337142 [Test build #42636 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42636/console) for PR 6767 at commit [`6dbedd1`](https://github.com/apache/spark/commit/6dbedd1fd82412f3c6de27a76807519606748aaf).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-8312] [SQL] Populate statistics info of...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6767#issuecomment-141337164 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-8312] [SQL] Populate statistics info of...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/6767#issuecomment-141337167 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/42636/