[GitHub] spark issue #16204: [SPARK-18775][SQL] Limit the max number of records writt...
Github user rxin commented on the issue: https://github.com/apache/spark/pull/16204 @hvanhovell don't forget this one! --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #16296: [SPARK-18885][SQL] unify CREATE TABLE syntax for data so...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16296 **[Test build #70398 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70398/testReport)** for PR 16296 at commit [`4049645`](https://github.com/apache/spark/commit/4049645f9a251d6cb8db27f7d2341aab3a1a5596).
[GitHub] spark issue #16233: [SPARK-18801][SQL] Add `View` operator to help resolve a...
Github user jiangxb1987 commented on the issue: https://github.com/apache/spark/pull/16233 I think we all agree that a wrapper is needed to handle the case of nested views; it could be an `AnalysisContext` in `Analyzer`, a `viewContext` in `CatalogTable`, or an operator node such as `View` or `SubqueryAlias`. Perhaps we should ask @hvanhovell to share his opinion on this issue?
[GitHub] spark issue #16232: [SPARK-18800][SQL] Fix UnsafeKVExternalSorter by correct...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/16232 @davies Ok. I got it. I will update the assert.
[GitHub] spark issue #16233: [SPARK-18801][SQL] Add `View` operator to help resolve a...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/16233 I'm wondering whether we really need the wrapper (the `View` operator). Given a table/view identifier, the steps to resolve it are: 1. If the database is specified, get the table/view metadata from that database. 2. If the database is not specified, try to resolve the identifier as a temp view first. 3. If it's not a temp view, get the table/view metadata from the current database. For nested views, it's a different story: the sub-plan-tree of a nested view may have a different "currentDatabase". It is effectively under a different analysis context, and wrapping the sub-plan-tree with a `View` operator can solve this problem, but I have a simpler proposal:
```
def lookupRelation(...) = {
  ...
  if (table.tableType == CatalogTableType.VIEW) {
    val viewContext = table.viewContext
    val viewText = table.viewText
    sparkSession.sessionState.sqlParser.parsePlan(viewText).transform {
      case u @ UnresolvedRelation(tableIdent) if tableIdent.database.isEmpty =>
        u.copy(tableIdent = tableIdent.copy(database = Some(viewContext.currentDatabase)))
    }
    ...
  }
  ...
}
```
[GitHub] spark issue #16232: [SPARK-18800][SQL] Fix UnsafeKVExternalSorter by correct...
Github user davies commented on the issue: https://github.com/apache/spark/pull/16232 @viirya without a repro, I don't think this is the root cause. There could be random corruption that causes the error.
[GitHub] spark issue #12775: [SPARK-14958][Core] Failed task not handled when there's...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/12775 **[Test build #70397 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70397/testReport)** for PR 12775 at commit [`9778cef`](https://github.com/apache/spark/commit/9778cefce3e152d559e53cd4e2f5a113e561f0ff).
[GitHub] spark issue #12775: [SPARK-14958][Core] Failed task not handled when there's...
Github user lirui-intel commented on the issue: https://github.com/apache/spark/pull/12775 Sure. Updated patch to not catch Throwable.
[GitHub] spark issue #12775: [SPARK-14958][Core] Failed task not handled when there's...
Github user kayousterhout commented on the issue: https://github.com/apache/spark/pull/12775 Ok that's fine with me -- @lirui-intel can you make that change?
[GitHub] spark issue #16232: [SPARK-18800][SQL] Fix UnsafeKVExternalSorter by correct...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/16232 @davies Actually this PR is motivated by an error reported on the dev mailing list at http://apache-spark-developers-list.1001551.n3.nabble.com/java-lang-IllegalStateException-There-is-no-space-for-new-record-tc20108.html So if the array size is not enough, don't we need to allocate a big enough array for the sorter, as in the current change? The reporter doesn't have a repro, but I think this place is the only one that can cause this error.
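The kind of fix under discussion — making sure the pointer array handed to the in-memory sorter can actually hold the records already inserted, rather than reusing it blindly — can be sketched roughly as follows. The method name, the growth policy, and the two-long-slots-per-record layout are illustrative assumptions, not the actual `UnsafeKVExternalSorter` code:

```scala
// Hypothetical sketch: reuse an existing pointer array only when it can hold
// all records already inserted; otherwise allocate one that is big enough
// and copy the existing entries over.
// Assumes two long slots per record (record pointer + key prefix) as a
// simplification of the in-memory sorter's layout.
def ensureCapacity(existing: Array[Long], numRecords: Int): Array[Long] = {
  val required = numRecords * 2
  if (existing.length >= required) {
    existing
  } else {
    val grown = new Array[Long](math.max(required, existing.length * 2))
    System.arraycopy(existing, 0, grown, 0, existing.length)
    grown
  }
}
```

The point of the check is that an "is there space for one more record" assertion downstream can only hold if capacity was guaranteed at hand-off time.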
[GitHub] spark issue #16296: [SPARK-18885][SQL] unify CREATE TABLE syntax for data so...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16296 Build finished. Test FAILed.
[GitHub] spark issue #16296: [SPARK-18885][SQL] unify CREATE TABLE syntax for data so...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16296 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70395/ Test FAILed.
[GitHub] spark issue #16296: [SPARK-18885][SQL] unify CREATE TABLE syntax for data so...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16296 **[Test build #70395 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70395/testReport)** for PR 16296 at commit [`631edf7`](https://github.com/apache/spark/commit/631edf75ed83a9e7598b746dc81c46d9a7761e09). * This patch **fails Spark unit tests**. * This patch **does not merge cleanly**. * This patch adds the following public classes _(experimental)_: * `class DetermineHiveSerde(conf: SQLConf) extends Rule[LogicalPlan] `
[GitHub] spark issue #16337: [SPARK-18871][SQL] New test cases for IN/NOT IN subquery
Github user rxin commented on the issue: https://github.com/apache/spark/pull/16337 I actually don't mind having 01, 02, 03, etc., but some higher-level grouping would still be useful.
[GitHub] spark issue #12775: [SPARK-14958][Core] Failed task not handled when there's...
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/12775 @kayousterhout Exactly. The logError is already handled elsewhere (and the throwable is not ignored there).
[GitHub] spark pull request #16330: [SPARK-18817][SPARKR][SQL] change derby log outpu...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/16330#discussion_r93176308

--- Diff: core/src/main/scala/org/apache/spark/deploy/SparkHadoopUtil.scala ---
@@ -104,6 +104,12 @@ class SparkHadoopUtil extends Logging {
     }
     val bufferSize = conf.get("spark.buffer.size", "65536")
     hadoopConf.set("io.file.buffer.size", bufferSize)
+
+    if (conf.contains("spark.sql.default.derby.dir")) {
--- End diff --

Why do we need to introduce this flag?
[GitHub] spark issue #16308: [SPARK-18936][SQL] Infrastructure for session local time...
Github user ueshin commented on the issue: https://github.com/apache/spark/pull/16308 @rxin I see, created.
[GitHub] spark issue #16189: [SPARK-18761][CORE] Introduce "task reaper" to oversee t...
Github user yhuai commented on the issue: https://github.com/apache/spark/pull/16189 @mridulm Sure. Also, please feel free to leave more comments :)
[GitHub] spark issue #16308: [SPARK-18936][SQL] Infrastructure for session local time...
Github user rxin commented on the issue: https://github.com/apache/spark/pull/16308 Can you create a subtask at https://issues.apache.org/jira/browse/SPARK-18350 ?
[GitHub] spark issue #16308: [SPARK-18936][SQL] Infrastructure for session local time...
Github user ueshin commented on the issue: https://github.com/apache/spark/pull/16308 @rxin I'd like to have a follow-up pr related to partition values. I didn't include it in this pr, but I think we need it.
[GitHub] spark issue #16308: [SPARK-18350][SQL] Support session local timezone.
Github user rxin commented on the issue: https://github.com/apache/spark/pull/16308 Thanks, this looks great. A couple of things: 1. Can you change the referenced JIRA to https://issues.apache.org/jira/browse/SPARK-18936 2. We should do a more detailed pass to make sure there isn't any performance issue for the impacted expressions (e.g. don't create a new timezone object or do hash lookups per row).
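The per-row cost flagged here comes from the fact that `java.util.TimeZone.getTimeZone(id)` does a lookup on every call; resolving the zone once per expression instance avoids it. A hedged sketch of that pattern — the class and method names below are illustrative, not Spark's actual expression API:

```scala
import java.util.TimeZone

// Illustrative only: cache the TimeZone resolution in a lazy field so that
// per-row evaluation does no repeated TimeZone.getTimeZone lookups.
class HourOfTimestamp(timeZoneId: String) {
  @transient private lazy val tz: TimeZone = TimeZone.getTimeZone(timeZoneId)

  // micros: microseconds since the epoch (how Spark stores timestamps).
  def eval(micros: Long): Int = {
    val millis = micros / 1000L
    // Shift into local wall-clock time, then take the hour of day.
    val localMillis = millis + tz.getOffset(millis)
    (Math.floorMod(localMillis, 86400000L) / 3600000L).toInt
  }
}
```

For generated code the same idea applies: emit the timezone object as a reference held by the generated class, not a lookup inside the per-row loop.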
[GitHub] spark issue #16308: [SPARK-18350][SQL] Support session local timezone.
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16308 **[Test build #70396 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70396/testReport)** for PR 16308 at commit [`4b6900c`](https://github.com/apache/spark/commit/4b6900cf6d182d87a545d736d320c6229fb8251d).
[GitHub] spark issue #16308: [SPARK-18350][SQL] Support session local timezone.
Github user ueshin commented on the issue: https://github.com/apache/spark/pull/16308 @rxin I updated the description. Is it enough for you?
[GitHub] spark pull request #15721: [SPARK-17772][ML][TEST] Add test functions for ML...
Github user yanboliang commented on a diff in the pull request: https://github.com/apache/spark/pull/15721#discussion_r93174061

--- Diff: mllib/src/test/scala/org/apache/spark/ml/util/MLTestingUtils.scala ---
@@ -224,4 +208,139 @@ object MLTestingUtils extends SparkFunSuite {
     }.toDF()
     (overSampledData, weightedData)
   }
+
+  /**
+   * Generates a linear prediction function where the coefficients are generated randomly.
+   * The function produces a continuous (numClasses = 0) or categorical (numClasses > 0) label.
+   */
+  def getRandomLinearPredictionFunction(
+      numFeatures: Int,
+      numClasses: Int,
+      seed: Long): (Vector => Double) = {
+    val rng = new scala.util.Random(seed)
+    val trueNumClasses = if (numClasses == 0) 1 else numClasses
+    val coefArray = Array.fill(numFeatures * trueNumClasses)(rng.nextDouble - 0.5)
+    (features: Vector) => {
+      if (numClasses == 0) {
+        BLAS.dot(features, new DenseVector(coefArray))
+      } else {
+        val margins = new DenseVector(new Array[Double](numClasses))
+        val coefMat = new DenseMatrix(numClasses, numFeatures, coefArray)
+        BLAS.gemv(1.0, coefMat, features, 1.0, margins)
+        margins.argmax.toDouble
+      }
+    }
+  }
+
+  /**
+   * A helper function to generate synthetic data. Generates random feature values,
+   * both categorical and continuous, according to `categoricalFeaturesInfo`. The label is generated
+   * from a random prediction function, and noise is added to the true label.
+   *
+   * @param numPoints The number of data points to generate.
+   * @param numClasses The number of classes the outcome can take on. 0 for continuous labels.
+   * @param numFeatures The number of features in the data.
+   * @param categoricalFeaturesInfo Map of (featureIndex -> numCategories) for categorical features.
+   * @param seed Random seed.
+   * @param noiseLevel A number in [0.0, 1.0] indicating how much noise to add to the label.
+   * @return Generated sequence of noisy instances.
+   */
+  def generateNoisyData(
+      numPoints: Int,
+      numClasses: Int,
+      numFeatures: Int,
+      categoricalFeaturesInfo: Map[Int, Int],
+      seed: Long,
+      noiseLevel: Double = 0.3): Seq[Instance] = {
+    require(noiseLevel >= 0.0 && noiseLevel <= 1.0, "noiseLevel must be in range [0.0, 1.0]")
+    val rng = new scala.util.Random(seed)
+    val predictionFunc = getRandomLinearPredictionFunction(numFeatures, numClasses, seed)
+    Range(0, numPoints).map { i =>
+      val features = Vectors.dense(Array.tabulate(numFeatures) { j =>
+        val numCategories = categoricalFeaturesInfo.getOrElse(j, 0)
+        if (numCategories > 0) {
+          rng.nextInt(numCategories)
+        } else {
+          rng.nextDouble() - 0.5
+        }
+      })
+      val label = predictionFunc(features)
+      val noisyLabel = if (numClasses > 0) {
+        // with probability equal to noiseLevel, select a random class instead of the true class
+        if (rng.nextDouble < noiseLevel) rng.nextInt(numClasses) else label
+      } else {
+        // add noise to the label proportional to the noise level
+        label + noiseLevel * rng.nextGaussian()
+      }
+      Instance(noisyLabel, 1.0, features)
+    }
+  }
+
+  /**
+   * Helper function for testing sample weights. Tests that oversampling each point is equivalent
+   * to assigning a sample weight proportional to the number of samples for each point.
+   */
+  def testOversamplingVsWeighting[M <: Model[M], E <: Estimator[M]](
+      spark: SparkSession,
+      estimator: E with HasWeightCol with HasLabelCol with HasFeaturesCol,
+      categoricalFeaturesInfo: Map[Int, Int],
+      numPoints: Int,
+      numClasses: Int,
+      numFeatures: Int,
+      modelEquals: (M, M) => Unit,
+      seed: Long): Unit = {
+    import spark.implicits._
+    val df = generateNoisyData(numPoints, numClasses, numFeatures, categoricalFeaturesInfo,
+      seed).toDF()
+    val (overSampledData, weightedData) = genEquivalentOversampledAndWeightedInstances(
+      df, estimator.getLabelCol, estimator.getFeaturesCol, seed)
+    val weightedModel = estimator.set(estimator.weightCol, "weight").fit(weightedData)
+    val overSampledModel = estimator.set(estimator.weightCol, "").fit(overSampledData)
+    modelEquals(weightedModel, overSampledModel)
+  }
+
+  /**
+   * Helper function for testing sample weights. Tests that injecting a large number of outliers
+   * with very small sample weights does not affect fitting. The predictor should learn the
[GitHub] spark pull request #15721: [SPARK-17772][ML][TEST] Add test functions for ML...
Github user yanboliang commented on a diff in the pull request: https://github.com/apache/spark/pull/15721#discussion_r93172081

--- Diff: mllib/src/test/scala/org/apache/spark/ml/util/MLTestingUtils.scala ---
(quotes the same `MLTestingUtils.scala` hunk as the comment above, anchored at the `val df = generateNoisyData(...)` line in `testOversamplingVsWeighting`)
--- End diff --

If we add noise in the native data generators (see my comment above), we should remove this line and pass in the generated dataset (which already includes noise) directly.
[GitHub] spark pull request #15721: [SPARK-17772][ML][TEST] Add test functions for ML...
Github user yanboliang commented on a diff in the pull request: https://github.com/apache/spark/pull/15721#discussion_r93172224

--- Diff: mllib/src/test/scala/org/apache/spark/ml/util/MLTestingUtils.scala ---
(quotes the same `MLTestingUtils.scala` hunk as the comment above)
[GitHub] spark pull request #15721: [SPARK-17772][ML][TEST] Add test functions for ML...
Github user yanboliang commented on a diff in the pull request: https://github.com/apache/spark/pull/15721#discussion_r93171343

--- Diff: mllib/src/test/scala/org/apache/spark/ml/util/MLTestingUtils.scala ---
@@ -224,4 +208,139 @@ object MLTestingUtils extends SparkFunSuite {
     }.toDF()
     (overSampledData, weightedData)
   }
+
+  /**
+   * Generates a linear prediction function where the coefficients are generated randomly.
+   * The function produces a continuous (numClasses = 0) or categorical (numClasses > 0) label.
+   */
+  def getRandomLinearPredictionFunction(
+      numFeatures: Int,
+      numClasses: Int,
+      seed: Long): (Vector => Double) = {
+    val rng = new scala.util.Random(seed)
+    val trueNumClasses = if (numClasses == 0) 1 else numClasses
+    val coefArray = Array.fill(numFeatures * trueNumClasses)(rng.nextDouble - 0.5)
+    (features: Vector) => {
+      if (numClasses == 0) {
+        BLAS.dot(features, new DenseVector(coefArray))
+      } else {
+        val margins = new DenseVector(new Array[Double](numClasses))
+        val coefMat = new DenseMatrix(numClasses, numFeatures, coefArray)
+        BLAS.gemv(1.0, coefMat, features, 1.0, margins)
+        margins.argmax.toDouble
+      }
+    }
+  }
+
+  /**
+   * A helper function to generate synthetic data. Generates random feature values,
+   * both categorical and continuous, according to `categoricalFeaturesInfo`. The label is generated
+   * from a random prediction function, and noise is added to the true label.
+   *
+   * @param numPoints The number of data points to generate.
+   * @param numClasses The number of classes the outcome can take on. 0 for continuous labels.
+   * @param numFeatures The number of features in the data.
+   * @param categoricalFeaturesInfo Map of (featureIndex -> numCategories) for categorical features.
+   * @param seed Random seed.
+   * @param noiseLevel A number in [0.0, 1.0] indicating how much noise to add to the label.
+   * @return Generated sequence of noisy instances.
+   */
+  def generateNoisyData(
--- End diff --

I am a bit worried about whether we should provide this general noisy-data generation function:
* It would be better to generate data following the rules of the specific algorithm; for example, for ```LogisticRegression``` users could provide the coefficients and the mean and variance of the generated features.
* Some generators, such as [```LinearDataGenerator.generateLinearInput```](https://github.com/apache/spark/blob/master/mllib/src/main/scala/org/apache/spark/mllib/util/LinearDataGenerator.scala#L97), already take the noise level into account.

Just like ```LinearDataGenerator.generateLinearInput```, I think we should add an ```eps``` argument to the other generators, such as ```LogisticRegressionSuite.generateLogisticInput```, ```LogisticRegressionSuite.generateMultinomialLogisticInput``` and ```NaiveBayesSuite.generateNaiveBayesInput```, so that they output noisy labels natively. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
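As a rough illustration of the `eps` suggestion above, a generator can take a noise argument and inject Gaussian noise into the label at generation time. The sketch below is self-contained and illustrative only: the signature loosely mirrors the shape of `LinearDataGenerator.generateLinearInput`, but the names are not Spark's actual API.

```scala
import scala.util.Random

// Illustrative stand-in for an MLlib-style data generator; `eps` scales
// the Gaussian noise added to the otherwise deterministic linear label.
object NoisyLinearGen {
  def generateLinearInput(
      intercept: Double,
      weights: Array[Double],
      nPoints: Int,
      seed: Int,
      eps: Double): Seq[(Double, Array[Double])] = {
    val rnd = new Random(seed)
    Seq.fill(nPoints) {
      // Features drawn uniformly from [-0.5, 0.5).
      val x = Array.fill(weights.length)(rnd.nextDouble() - 0.5)
      val trueLabel = intercept + x.zip(weights).map { case (xi, wi) => xi * wi }.sum
      // eps = 0.0 yields noise-free labels; larger eps yields noisier labels.
      (trueLabel + eps * rnd.nextGaussian(), x)
    }
  }
}
```

With `eps = 0.0` the generated labels are exactly linear in the features, which makes it easy to sanity-check a fitted model; the suites mentioned above could expose the same knob.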
[GitHub] spark pull request #15721: [SPARK-17772][ML][TEST] Add test functions for ML...
Github user yanboliang commented on a diff in the pull request: https://github.com/apache/spark/pull/15721#discussion_r93172654

--- Diff: mllib/src/test/scala/org/apache/spark/ml/util/MLTestingUtils.scala ---
@@ -224,4 +208,139 @@ object MLTestingUtils extends SparkFunSuite {
     }.toDF()
     (overSampledData, weightedData)
   }
+
+  /**
+   * Generates a linear prediction function where the coefficients are generated randomly.
+   * The function produces a continuous (numClasses = 0) or categorical (numClasses > 0) label.
+   */
+  def getRandomLinearPredictionFunction(
+      numFeatures: Int,
+      numClasses: Int,
+      seed: Long): (Vector => Double) = {
+    val rng = new scala.util.Random(seed)
+    val trueNumClasses = if (numClasses == 0) 1 else numClasses
+    val coefArray = Array.fill(numFeatures * trueNumClasses)(rng.nextDouble - 0.5)
+    (features: Vector) => {
+      if (numClasses == 0) {
+        BLAS.dot(features, new DenseVector(coefArray))
+      } else {
+        val margins = new DenseVector(new Array[Double](numClasses))
+        val coefMat = new DenseMatrix(numClasses, numFeatures, coefArray)
+        BLAS.gemv(1.0, coefMat, features, 1.0, margins)
+        margins.argmax.toDouble
+      }
+    }
+  }
+
+  /**
+   * A helper function to generate synthetic data. Generates random feature values,
+   * both categorical and continuous, according to `categoricalFeaturesInfo`. The label is generated
+   * from a random prediction function, and noise is added to the true label.
+   *
+   * @param numPoints The number of data points to generate.
+   * @param numClasses The number of classes the outcome can take on. 0 for continuous labels.
+   * @param numFeatures The number of features in the data.
+   * @param categoricalFeaturesInfo Map of (featureIndex -> numCategories) for categorical features.
+   * @param seed Random seed.
+   * @param noiseLevel A number in [0.0, 1.0] indicating how much noise to add to the label.
+   * @return Generated sequence of noisy instances.
+   */
+  def generateNoisyData(
+      numPoints: Int,
+      numClasses: Int,
+      numFeatures: Int,
+      categoricalFeaturesInfo: Map[Int, Int],
+      seed: Long,
+      noiseLevel: Double = 0.3): Seq[Instance] = {
+    require(noiseLevel >= 0.0 && noiseLevel <= 1.0, "noiseLevel must be in range [0.0, 1.0]")
+    val rng = new scala.util.Random(seed)
+    val predictionFunc = getRandomLinearPredictionFunction(numFeatures, numClasses, seed)
+    Range(0, numPoints).map { i =>
+      val features = Vectors.dense(Array.tabulate(numFeatures) { j =>
+        val numCategories = categoricalFeaturesInfo.getOrElse(j, 0)
+        if (numCategories > 0) {
+          rng.nextInt(numCategories)
+        } else {
+          rng.nextDouble() - 0.5
+        }
+      })
+      val label = predictionFunc(features)
+      val noisyLabel = if (numClasses > 0) {
+        // with probability equal to noiseLevel, select a random class instead of the true class
+        if (rng.nextDouble < noiseLevel) rng.nextInt(numClasses) else label
+      } else {
+        // add noise to the label proportional to the noise level
+        label + noiseLevel * rng.nextGaussian()
+      }
+      Instance(noisyLabel, 1.0, features)
+    }
+  }
+
+  /**
+   * Helper function for testing sample weights. Tests that oversampling each point is equivalent
+   * to assigning a sample weight proportional to the number of samples for each point.
+   */
+  def testOversamplingVsWeighting[M <: Model[M], E <: Estimator[M]](
+spark: SparkSession,
+estimator: E with HasWeightCol with HasLabelCol with HasFeaturesCol,
+categoricalFeaturesInfo: Map[Int, Int],
+numPoints: Int,
+numClasses: Int,
+numFeatures: Int,
+modelEquals: (M, M) => Unit,
+seed: Long): Unit = {
+    import spark.implicits._
+    val df = generateNoisyData(numPoints, numClasses, numFeatures, categoricalFeaturesInfo,
+      seed).toDF()
+    val (overSampledData, weightedData) = genEquivalentOversampledAndWeightedInstances(
+      df, estimator.getLabelCol, estimator.getFeaturesCol, seed)
+    val weightedModel = estimator.set(estimator.weightCol, "weight").fit(weightedData)
+    val overSampledModel = estimator.set(estimator.weightCol, "").fit(overSampledData)
+    modelEquals(weightedModel, overSampledModel)
+  }
+
+  /**
+   * Helper function for testing sample weights. Tests that injecting a large number of outliers
+   * with very small sample weights does not affect fitting. The predictor should learn the
[GitHub] spark pull request #15721: [SPARK-17772][ML][TEST] Add test functions for ML...
Github user yanboliang commented on a diff in the pull request: https://github.com/apache/spark/pull/15721#discussion_r93172182

--- Diff: mllib/src/test/scala/org/apache/spark/ml/util/MLTestingUtils.scala ---
@@ -224,4 +208,139 @@ object MLTestingUtils extends SparkFunSuite {
     }.toDF()
     (overSampledData, weightedData)
   }
+
+  /**
+   * Generates a linear prediction function where the coefficients are generated randomly.
+   * The function produces a continuous (numClasses = 0) or categorical (numClasses > 0) label.
+   */
+  def getRandomLinearPredictionFunction(
+      numFeatures: Int,
+      numClasses: Int,
+      seed: Long): (Vector => Double) = {
+    val rng = new scala.util.Random(seed)
+    val trueNumClasses = if (numClasses == 0) 1 else numClasses
+    val coefArray = Array.fill(numFeatures * trueNumClasses)(rng.nextDouble - 0.5)
+    (features: Vector) => {
+      if (numClasses == 0) {
+        BLAS.dot(features, new DenseVector(coefArray))
+      } else {
+        val margins = new DenseVector(new Array[Double](numClasses))
+        val coefMat = new DenseMatrix(numClasses, numFeatures, coefArray)
+        BLAS.gemv(1.0, coefMat, features, 1.0, margins)
+        margins.argmax.toDouble
+      }
+    }
+  }
+
+  /**
+   * A helper function to generate synthetic data. Generates random feature values,
+   * both categorical and continuous, according to `categoricalFeaturesInfo`. The label is generated
+   * from a random prediction function, and noise is added to the true label.
+   *
+   * @param numPoints The number of data points to generate.
+   * @param numClasses The number of classes the outcome can take on. 0 for continuous labels.
+   * @param numFeatures The number of features in the data.
+   * @param categoricalFeaturesInfo Map of (featureIndex -> numCategories) for categorical features.
+   * @param seed Random seed.
+   * @param noiseLevel A number in [0.0, 1.0] indicating how much noise to add to the label.
+   * @return Generated sequence of noisy instances.
+   */
+  def generateNoisyData(
+      numPoints: Int,
+      numClasses: Int,
+      numFeatures: Int,
+      categoricalFeaturesInfo: Map[Int, Int],
+      seed: Long,
+      noiseLevel: Double = 0.3): Seq[Instance] = {
+    require(noiseLevel >= 0.0 && noiseLevel <= 1.0, "noiseLevel must be in range [0.0, 1.0]")
+    val rng = new scala.util.Random(seed)
+    val predictionFunc = getRandomLinearPredictionFunction(numFeatures, numClasses, seed)
+    Range(0, numPoints).map { i =>
+      val features = Vectors.dense(Array.tabulate(numFeatures) { j =>
+        val numCategories = categoricalFeaturesInfo.getOrElse(j, 0)
+        if (numCategories > 0) {
+          rng.nextInt(numCategories)
+        } else {
+          rng.nextDouble() - 0.5
+        }
+      })
+      val label = predictionFunc(features)
+      val noisyLabel = if (numClasses > 0) {
+        // with probability equal to noiseLevel, select a random class instead of the true class
+        if (rng.nextDouble < noiseLevel) rng.nextInt(numClasses) else label
+      } else {
+        // add noise to the label proportional to the noise level
+        label + noiseLevel * rng.nextGaussian()
+      }
+      Instance(noisyLabel, 1.0, features)
+    }
+  }
+
+  /**
+   * Helper function for testing sample weights. Tests that oversampling each point is equivalent
+   * to assigning a sample weight proportional to the number of samples for each point.
+   */
+  def testOversamplingVsWeighting[M <: Model[M], E <: Estimator[M]](
+spark: SparkSession,
--- End diff --

Indent.
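The equivalence that `testOversamplingVsWeighting` above checks can be seen in miniature with plain arithmetic: giving a point an integer weight w is the same as repeating the point w times. A toy illustration (standalone code, not Spark's):

```scala
// Toy illustration of oversampling-vs-weighting: the weighted mean of
// (value, weight) pairs equals the plain mean of the oversampled list.
object WeightingDemo {
  def weightedMean(points: Seq[(Double, Double)]): Double =
    points.map { case (x, w) => x * w }.sum / points.map(_._2).sum

  def mean(xs: Seq[Double]): Double = xs.sum / xs.length

  // Repeat each value `w` times, mimicking oversampling.
  def oversample(points: Seq[(Double, Int)]): Seq[Double] =
    points.flatMap { case (x, w) => Seq.fill(w)(x) }
}
```

For the points `(1.0, 3)` and `(5.0, 1)`, both views give the same mean; estimators that honor a weight column should likewise produce the same model either way, up to the numerical tolerance that `modelEquals` allows.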
[GitHub] spark issue #16232: [SPARK-18800][SQL] Fix UnsafeKVExternalSorter by correct...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/16232 OK. I will update accordingly.
[GitHub] spark issue #16240: [SPARK-16792][SQL] Dataset containing a Case Class with ...
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/16240 The overall strategy LGTM. > I had to alter and add new implicit encoders into SQLImplicits. The new encoders are for the Seq with Product combination (essentially only List) to disambiguate between the Seq and Product encoders. Does Scala have a clear definition for this case? I.e., if we have implicits for both type `A` and type `B`, and are given type `A with B`, which implicit will be picked? The optimization can be done in a follow-up.
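On the implicit-resolution question: when two implicits are both eligible, Scala prefers the one defined in a more derived class or object, which is the usual low-priority-trait trick for breaking exactly this kind of `Seq` vs `Product` tie. A standalone sketch follows; the `Enc` typeclass and all names here are made-up stand-ins, not Spark's `Encoder` API.

```scala
trait Enc[T] { def name: String }

trait LowPriorityEnc {
  // Fallback: any Product (case classes, tuples) gets the product encoder.
  implicit def productEnc[T <: Product]: Enc[T] = new Enc[T] { def name = "product" }
}

object EncDemo extends LowPriorityEnc {
  // Defined in a subclass of LowPriorityEnc, so for a List -- which is
  // both a Seq and a Product -- this implicit is considered more specific
  // and wins, instead of the resolution being ambiguous.
  implicit def seqEnc[T <: Seq[_]]: Enc[T] = new Enc[T] { def name = "seq" }

  def encode[T](x: T)(implicit e: Enc[T]): String = e.name
}
```

With `import EncDemo._`, `encode(List(1, 2, 3))` resolves to the Seq implicit, while a tuple (a `Product` that is not a `Seq`) still falls back to the product implicit.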
[GitHub] spark issue #16232: [SPARK-18800][SQL] Fix UnsafeKVExternalSorter by correct...
Github user davies commented on the issue: https://github.com/apache/spark/pull/16232 That makes sense; we should update the assert. But this still is not a bug, and the other changes are not needed.
[GitHub] spark issue #16337: [SPARK-18871][SQL] New test cases for IN/NOT IN subquery
Github user nsyca commented on the issue: https://github.com/apache/spark/pull/16337 Let me try to summarize the comments around the structure of the test files here:

1. A single file of 200+ test cases is too big. We prefer smaller files with logical groupings.
2. A file name with a serial number is not the way Spark names files.

I'd like to generate more discussion before we come to a conclusion.

- It is possible to group test cases, and what we tried was to group them loosely by naming them in groups in the test file, like TC 01.xx, where 01 is effectively the group number. We can easily change to put one group in each file.
- Sometimes grouping rigidly is not desirable, or impossible. Does a test case of 'EXISTS .. OR NOT IN' go to the 'EXISTS' group, the 'NOT IN' group, or the 'disjunctive subquery' group? Does a test case of 'EXISTS ( .. ) UNION EXISTS ( .. )' go into the same group as 'EXISTS ( .. UNION .. )', or does the first go to the 'UNION' suite and the latter to the 'subquery' suite? Shall we have test cases with one classification go to the "simple" set and the ones with more than one way to classify go to the "complex" set? Over time, people will pile most of them into the "complex" set and it will become bloated, and we will end up with "complex-1", "complex-2", etc.
- Arguably we have a purpose when writing a test case, but sometimes it triggers an unrelated problem. If a test case is intended to test a subquery functionality but ends up revealing a missed opportunity in join reordering, should we move it into the 'join reordering' suite or leave it in the 'subquery' suite?
- With the current one-level flat structure in sql/core/src/test/resources/sql-tests/inputs/, we could end up with thousands of files in the (near) future if a file contains only a handful of test cases. What is a good solution? Should we create a subdirectory named subquery/ and break up the test cases into small files under this directory?

I don't think we have a silver bullet for this kind of problem. Let's brainstorm here. I (or someone else) could moderate the discussion. Eventually we will need to pick one way or the other. And if we need to change it in the future, we pay the price for it.
[GitHub] spark issue #16346: [SPARK-16654][CORE] Add UI coverage for Application Leve...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16346 Merged build finished. Test PASSed.
[GitHub] spark issue #16346: [SPARK-16654][CORE] Add UI coverage for Application Leve...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16346 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70393/ Test PASSed.
[GitHub] spark issue #16346: [SPARK-16654][CORE] Add UI coverage for Application Leve...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16346 **[Test build #70393 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70393/testReport)** for PR 16346 at commit [`20ff7dd`](https://github.com/apache/spark/commit/20ff7dddea72bf8fc9330f464992b19e1bf1c59e).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `case class SparkListenerExecutorBlacklisted(`
  * `case class SparkListenerExecutorUnblacklisted(time: Long, executorId: String)`
  * `case class SparkListenerNodeBlacklisted(`
  * `case class SparkListenerNodeUnblacklisted(time: Long, nodeId: String)`
[GitHub] spark issue #12775: [SPARK-14958][Core] Failed task not handled when there's...
Github user kayousterhout commented on the issue: https://github.com/apache/spark/pull/12775 @mridulm I see -- so you're saying to keep the finally block but remove catching the Throwable? So eliminate the logError, but otherwise the functionality is the same?
[GitHub] spark issue #16296: [SPARK-18885][SQL] unify CREATE TABLE syntax for data so...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16296 **[Test build #70395 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70395/testReport)** for PR 16296 at commit [`631edf7`](https://github.com/apache/spark/commit/631edf75ed83a9e7598b746dc81c46d9a7761e09).
[GitHub] spark pull request #16313: [SPARK-18899][SPARK-18912][SPARK-18913][SQL] refa...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/16313
[GitHub] spark issue #16313: [SPARK-18899][SPARK-18912][SPARK-18913][SQL] refactor th...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/16313 Thanks! Merging to master/2.1.
[GitHub] spark pull request #16348: Branch 2.0.4399
Github user laixiaohang closed the pull request at: https://github.com/apache/spark/pull/16348
[GitHub] spark issue #16348: Branch 2.0.4399
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16348 Can one of the admins verify this patch?
[GitHub] spark pull request #16348: Branch 2.0.4399
GitHub user laixiaohang opened a pull request: https://github.com/apache/spark/pull/16348 Branch 2.0.4399

## What changes were proposed in this pull request?

(Please fill in changes proposed in this fix)

## How was this patch tested?

(Please explain how this patch was tested. E.g. unit tests, integration tests, manual tests) (If this patch involves UI changes, please attach a screenshot; otherwise, remove this)

Please review http://spark.apache.org/contributing.html before opening a pull request.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/laixiaohang/spark branch-2.0.4399

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/16348.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #16348

commit c9c36fa0c7bccefde808bdbc32b04e8555356001
Author: Davies Liu
Date: 2016-09-02T22:10:12Z

[SPARK-17230] [SQL] Should not pass optimized query into QueryExecution in DataFrameWriter

Some analyzer rules have assumptions on logical plans, and the optimizer may break these assumptions; we should not pass an optimized query plan into QueryExecution (it will be analyzed again), otherwise we may hit some weird bugs. For example, we have a rule for decimal calculation that promotes the precision before binary operations, using PromotePrecision as a placeholder to indicate that the rule should not apply twice. But an optimizer rule will remove this placeholder; that breaks the assumption, the rule is applied twice, and a wrong result is produced.

Ideally, we should make all the analyzer rules idempotent, but that may require a lot of effort to double-check them one by one (may be not easy). An easier approach is to never feed an optimized plan into the Analyzer. This PR fixes the case for RunnableCommand: they will be optimized, and during execution the passed `query` will be passed into QueryExecution again. This PR makes these `query` not part of the children, so they will not be optimized and analyzed again. Right now we do not know whether a logical plan is optimized or not; we could introduce a flag for that, and make sure an optimized logical plan will not be analyzed again.

Added regression tests.

Author: Davies Liu

Closes #14797 from davies/fix_writer. (cherry picked from commit ed9c884dcf925500ceb388b06b33bd2c95cd2ada) Signed-off-by: Davies Liu

commit a3930c3b9afa9f7eba2a5c8b8f279ca38e348e9b
Author: Sameer Agarwal
Date: 2016-09-02T22:16:16Z

[SPARK-16334] Reusing same dictionary column for decoding consecutive row groups shouldn't throw an error

This patch fixes a bug in the vectorized parquet reader that's caused by re-using the same dictionary column vector while reading consecutive row groups. Specifically, this issue manifests for a certain distribution of dictionary/plain encoded data while we read/populate the underlying bit packed dictionary data into a column-vector based data structure. Manually tested on datasets provided by the community. Thanks to Chris Perluss and Keith Kraus for their invaluable help in tracking down this issue!

Author: Sameer Agarwal

Closes #14941 from sameeragarwal/parquet-exception-2. (cherry picked from commit a2c9acb0e54b2e38cb8ee6431f1ea0e0b4cd959a) Signed-off-by: Davies Liu

commit b8f65dad7be22231e982aaec3bbd69dbeacc20da
Author: Davies Liu
Date: 2016-09-02T22:40:02Z

Fix build

commit c0ea7707127c92ecb51794b96ea40d7cdb28b168
Author: Davies Liu
Date: 2016-09-02T23:05:37Z

Revert "[SPARK-16334] Reusing same dictionary column for decoding consecutive row groups shouldn't throw an error"

This reverts commit a3930c3b9afa9f7eba2a5c8b8f279ca38e348e9b.

commit 12a2e2a5ab5db12f39a7b591e914d52058e1581b
Author: Junyang Qian
Date: 2016-09-03T04:11:57Z

[SPARKR][MINOR] Fix docs for sparkR.session and count

## What changes were proposed in this pull request?

This PR tries to add some more explanation to `sparkR.session`. It also modifies the doc for `count` so that, when grouped in one doc, the description doesn't confuse users.

## How was this patch tested?

Manual test. ![screen shot 2016-09-02 at 1 21 36 pm](https://cloud.githubusercontent.com/assets/15318264/18217198/409613ac-7110-11e6-8dae-cb0c8df557bf.png)

Author: Junyang Qian

Closes #14942 from junyangq/fixSparkRSessionDoc. (cherry picked from commit d2fde6b72c4aede2e7edb4a7e6653fb1e7b19924)
[GitHub] spark issue #12775: [SPARK-14958][Core] Failed task not handled when there's...
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/12775 @kayousterhout As @lirui-intel mentioned above, there are two parts to this change. One is moving handleFailedTask to finally - that is a correct change. The other is catching Throwable, logging it and ignoring it. This is an incorrect practice. Specifically in this context, since it is within Utils.logUncaughtExceptions - the logging issue is already handled.
[GitHub] spark issue #12775: [SPARK-14958][Core] Failed task not handled when there's...
Github user kayousterhout commented on the issue: https://github.com/apache/spark/pull/12775 @mridulm what's the scenario you're imagining where it's worse to catch the exception? I'm imagining one of two scenarios: (1) There's a recoverable exception, in which case we should properly register the task as failed (otherwise the job will hang) and log the exception (which is what this PR does). (2) There's an irrecoverable exception. My understanding is that this change only impacts the logging in that case (since the relevant thread is going to die anyway).
[GitHub] spark issue #12775: [SPARK-14958][Core] Failed task not handled when there's...
Github user lirui-intel commented on the issue: https://github.com/apache/spark/pull/12775 Hi @kayousterhout and @mridulm, to clarify, I think the error won't disappear if we don't catch it. Because the runnable is wrapped in Utils.logUncaughtExceptions, the error will be logged eventually. But anyway I think we should handle the failed task in a finally block.
[GitHub] spark issue #12775: [SPARK-14958][Core] Failed task not handled when there's...
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/12775 If the intent is only to log, why not register an uncaughtException handler for that purpose instead of catching Throwable and ignoring it after logging?
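The shape being discussed -- do the bookkeeping in `finally`, and leave logging of fatal throwables to a thread-level handler rather than a catch-all -- can be sketched as follows. This is a standalone illustration; names like `failureHandled` and `runTask` are made up here, not Spark's actual code.

```scala
// Sketch: the task body may throw anything; the finally block guarantees
// the scheduler's handleFailedTask-style bookkeeping still runs, while an
// UncaughtExceptionHandler owns the logging concern for fatal throwables.
object TaskRunnerSketch {
  @volatile var failureHandled = false
  @volatile var uncaught: Option[Throwable] = None

  def runTask(body: () => Unit): Thread = {
    val t = new Thread(new Runnable {
      def run(): Unit = {
        try {
          body()
        } finally {
          failureHandled = true // stand-in for handleFailedTask(...)
        }
      }
    })
    t.setUncaughtExceptionHandler(new Thread.UncaughtExceptionHandler {
      def uncaughtException(th: Thread, e: Throwable): Unit =
        uncaught = Some(e) // stand-in for logUncaughtExceptions-style logging
    })
    t
  }
}
```

Even when the body throws, the `finally` bookkeeping runs before the thread dies, and the handler still sees the throwable -- the runnable itself never swallows it.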
[GitHub] spark issue #16313: [SPARK-18899][SPARK-18912][SPARK-18913][SQL] refactor th...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16313 Merged build finished. Test PASSed.
[GitHub] spark issue #16313: [SPARK-18899][SPARK-18912][SPARK-18913][SQL] refactor th...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16313 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70392/ Test PASSed.
[GitHub] spark issue #16313: [SPARK-18899][SPARK-18912][SPARK-18913][SQL] refactor th...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16313 **[Test build #70392 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70392/testReport)** for PR 16313 at commit [`32857e6`](https://github.com/apache/spark/commit/32857e6c5fa89094b84d4ed78469217af8c515c7).

* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark issue #12775: [SPARK-14958][Core] Failed task not handled when there's...
Github user kayousterhout commented on the issue: https://github.com/apache/spark/pull/12775 @mridulm My thought here was that we might as well catch it, since the thread is about to die anyway. The alternative is that we don't catch it, the thread dies (so the error disappears and we never see it), and then the VM is in the same inconsistent state. At least the error message from catching it might provide a useful hint about what happened.
[GitHub] spark issue #16189: [SPARK-18761][CORE] Introduce "task reaper" to oversee t...
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/16189 Sounds good @JoshRosen. In general @yhuai it would have been better to give reviewers some more time to get to ongoing conversations before committing a patch under active review, unless it is a hotfix. Thanks.
[GitHub] spark issue #12775: [SPARK-14958][Core] Failed task not handled when there's...
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/12775 Just saw this - catching Throwable is problematic: it could be any system-related Error too, which might leave the VM in an inconsistent state if not properly handled, like an OOM or a link error. Are we sure ignoring Throwable is the right approach here? It is not just the current thread that might be at risk. If there is a more specific subset which is relevant, it would be more appropriate to catch those.
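The trade-off being debated above can be sketched in a few lines of Scala. This is an illustrative pattern only, not the PR's actual code; `runTask` is a hypothetical stand-in for the task body:

```scala
// Illustrative sketch only: log a fatal error before rethrowing, so the
// failure leaves a trace instead of silently dying with the thread.
object CatchThrowableSketch {
  // Hypothetical task body that hits a fatal error.
  def runTask(): Unit = throw new OutOfMemoryError("simulated")

  def main(args: Array[String]): Unit = {
    try {
      runTask()
    } catch {
      case t: Throwable =>
        // The VM may already be inconsistent after an OOM or LinkageError,
        // but at least the message hints at what happened before the rethrow.
        System.err.println(s"Uncaught throwable in task thread: $t")
        throw t
    }
  }
}
```

This is kayousterhout's point: the catch block does not make the VM any more consistent, it only makes the failure visible; mridulm's point is that handling `Throwable` at all risks masking errors that should terminate the VM.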
[GitHub] spark issue #16344: [SPARK-18929][ML] Add Tweedie distribution in GLM
Github user yanboliang commented on the issue: https://github.com/apache/spark/pull/16344 Jenkins, test this please.
[GitHub] spark pull request #16189: [SPARK-18761][CORE] Introduce "task reaper" to ov...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/16189
[GitHub] spark issue #16189: [SPARK-18761][CORE] Introduce "task reaper" to oversee t...
Github user yhuai commented on the issue: https://github.com/apache/spark/pull/16189 LGTM!
[GitHub] spark issue #16347: [SPARK-18934][SQL] Writing to dynamic partitions does no...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16347 Can one of the admins verify this patch?
[GitHub] spark issue #16189: [SPARK-18761][CORE] Introduce "task reaper" to oversee t...
Github user yhuai commented on the issue: https://github.com/apache/spark/pull/16189 Thank you for those comments. I am merging this to master.
[GitHub] spark pull request #16189: [SPARK-18761][CORE] Introduce "task reaper" to ov...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/16189#discussion_r93162832

--- Diff: core/src/test/scala/org/apache/spark/JobCancellationSuite.scala ---
@@ -209,6 +209,83 @@ class JobCancellationSuite extends SparkFunSuite with Matchers with BeforeAndAft
     assert(jobB.get() === 100)
   }

+  test("task reaper kills JVM if killed tasks keep running for too long") {
+    val conf = new SparkConf()
+      .set("spark.task.reaper.enabled", "true")
+      .set("spark.task.reaper.killTimeout", "5s")
+    sc = new SparkContext("local-cluster[2,1,1024]", "test", conf)
+
+    // Add a listener to release the semaphore once any tasks are launched.
+    val sem = new Semaphore(0)
+    sc.addSparkListener(new SparkListener {
+      override def onTaskStart(taskStart: SparkListenerTaskStart) {
+        sem.release()
+      }
+    })
+
+    // jobA is the one to be cancelled.
+    val jobA = Future {
+      sc.setJobGroup("jobA", "this is a job to be cancelled", interruptOnCancel = true)
+      sc.parallelize(1 to 1, 2).map { i =>
+        while (true) { }
+      }.count()
+    }
+
+    // Block until both tasks of job A have started and cancel job A.
+    sem.acquire(2)
+    // Small delay to ensure tasks actually start executing the task body
+    Thread.sleep(1000)
+
+    sc.clearJobGroup()
+    val jobB = sc.parallelize(1 to 100, 2).countAsync()
+    sc.cancelJobGroup("jobA")
+    val e = intercept[SparkException] { ThreadUtils.awaitResult(jobA, 15.seconds) }.getCause
+    assert(e.getMessage contains "cancel")
+
+    // Once A is cancelled, job B should finish fairly quickly.
+    assert(ThreadUtils.awaitResult(jobB, 60.seconds) === 100)
+  }
+
+  test("task reaper will not kill JVM if spark.task.killTimeout == -1") {
+    val conf = new SparkConf()
+      .set("spark.task.reaper.enabled", "true")
+      .set("spark.task.reaper.killTimeout", "-1")
+      .set("spark.task.reaper.PollingInterval", "1s")
+      .set("spark.deploy.maxExecutorRetries", "1")
--- End diff --

We set it to 1 to make sure that we do not kill the JVM, right? (If we kill the JVM, we will remove the application, because spark.deploy.maxExecutorRetries is 1.)
[GitHub] spark issue #16343: [FLAKY-TEST][DO NOT MERGE] InputStreamsSuite.socket inpu...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16343 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70391/
[GitHub] spark issue #16343: [FLAKY-TEST][DO NOT MERGE] InputStreamsSuite.socket inpu...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16343 Merged build finished. Test FAILed.
[GitHub] spark issue #16343: [FLAKY-TEST][DO NOT MERGE] InputStreamsSuite.socket inpu...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16343 **[Test build #70391 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70391/testReport)** for PR 16343 at commit [`04fa2f7`](https://github.com/apache/spark/commit/04fa2f709d034841a0828bd110e5561198b000ea). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #16341: [SQL] [WIP] Switch internal catalog types to use URI ins...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16341 Merged build finished. Test FAILed.
[GitHub] spark issue #16341: [SQL] [WIP] Switch internal catalog types to use URI ins...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16341 **[Test build #70394 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70394/testReport)** for PR 16341 at commit [`bcdac16`](https://github.com/apache/spark/commit/bcdac1691c46395410eb090cd7e0805ed4d58f14). * This patch **fails to build**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #16341: [SQL] [WIP] Switch internal catalog types to use URI ins...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16341 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70394/
[GitHub] spark pull request #16347: [SPARK-18934][SQL] Writing to dynamic partitions ...
GitHub user junegunn opened a pull request: https://github.com/apache/spark/pull/16347 [SPARK-18934][SQL] Writing to dynamic partitions does not preserve sort order if spills occur

## What changes were proposed in this pull request?

Make the dynamic partition writer perform a stable sort by the partition key, so that the sort order within each partition specified via `sortWithinPartitions` or `SORT BY` is preserved even when a spill occurs.

## How was this patch tested?

Manually tested with the following code snippet and orcdump.

```scala
// FileFormatWriter
sc.parallelize(1 to 1000).toDS.withColumn("part", 'value.mod(2))
  .repartition(1, 'part).sortWithinPartitions("value")
  .write.mode("overwrite").format("orc").partitionBy("part")
  .saveAsTable("test_sort_within")
spark.read.table("test_sort_within").filter('part === 0).show
spark.read.table("test_sort_within").filter('part === 1).show

// SparkHiveDynamicPartitionWriterContainer
// Insert into an existing Hive table with dynamic partitions
// CREATE TABLE TEST_SORT_WITHIN (VALUE INT) PARTITIONED BY (PART INT) STORED AS ORC
spark.conf.set("hive.exec.dynamic.partition.mode", "nonstrict")
sc.parallelize(1 to 1000).toDS.withColumn("part", 'value.mod(2))
  .repartition(1, 'part).sortWithinPartitions("value")
  .write.mode("overwrite").insertInto("test_sort_within_hive")
spark.read.table("test_sort_within_hive").filter('part === 0).show
spark.read.table("test_sort_within_hive").filter('part === 1).show
```

It was not straightforward to come up with a unit test, as the problem is only reproducible when a spill occurs due to memory constraints. I'd appreciate any suggestions or pointers.
You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/junegunn/spark dynamic-partition-writer-stable-sort

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/16347.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #16347

commit bfeccd80ef032cab3525037be3d3e42519619493
Author: Junegunn Choi
Date: 2016-12-19T05:54:42Z

[SPARK-18934][SQL] Writing to dynamic partitions does not preserve sort order if spills occur
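The stability property the patch above relies on can be illustrated in plain Scala, as an analogy rather than the writer's actual code path: a stable sort by the partition key groups rows by partition without disturbing the pre-existing order of values within each partition.

```scala
// Rows as (value, part), already sorted by value, as sortWithinPartitions
// would leave them before the dynamic partition writer runs.
val rows = Seq((1, 0), (2, 1), (3, 0), (4, 1), (5, 0))

// Scala's sortBy is stable, so sorting by the partition key preserves
// the relative order of rows with equal keys.
val grouped = rows.sortBy(_._2)

// Within part == 0, values are still in ascending order: 1, 3, 5.
assert(grouped.filter(_._2 == 0).map(_._1) == Seq(1, 3, 5))
assert(grouped.filter(_._2 == 1).map(_._1) == Seq(2, 4))
```

An unstable sort is free to reorder rows with equal partition keys, which is exactly the symptom reported when a spill forces a re-sort.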
[GitHub] spark issue #16345: [SPARK-17755][Core]Use workerRef to send RegisterWorkerR...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16345 Merged build finished. Test PASSed.
[GitHub] spark issue #16345: [SPARK-17755][Core]Use workerRef to send RegisterWorkerR...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16345 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70390/
[GitHub] spark issue #16345: [SPARK-17755][Core]Use workerRef to send RegisterWorkerR...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16345 **[Test build #70390 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70390/testReport)** for PR 16345 at commit [`b4b5552`](https://github.com/apache/spark/commit/b4b55528edc5e9c92f28cf81ea81e72748790100). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #16341: [SQL] [WIP] Switch internal catalog types to use URI ins...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16341 **[Test build #70394 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70394/testReport)** for PR 16341 at commit [`bcdac16`](https://github.com/apache/spark/commit/bcdac1691c46395410eb090cd7e0805ed4d58f14).
[GitHub] spark issue #12775: [SPARK-14958][Core] Failed task not handled when there's...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/12775 Merged build finished. Test PASSed.
[GitHub] spark issue #12775: [SPARK-14958][Core] Failed task not handled when there's...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/12775 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70388/
[GitHub] spark issue #12775: [SPARK-14958][Core] Failed task not handled when there's...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/12775 **[Test build #70388 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70388/testReport)** for PR 12775 at commit [`699730b`](https://github.com/apache/spark/commit/699730b592e8d913e728e0097e140c710c201dce). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #16346: [SPARK-16654][CORE] Add UI coverage for Application Leve...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16346 **[Test build #70393 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70393/testReport)** for PR 16346 at commit [`20ff7dd`](https://github.com/apache/spark/commit/20ff7dddea72bf8fc9330f464992b19e1bf1c59e).
[GitHub] spark issue #16346: [SPARK-16654][CORE] Add UI coverage for Application Leve...
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/16346 ok to test
[GitHub] spark issue #16346: [SPARK-16654][CORE] Add UI coverage for Application Leve...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16346 Can one of the admins verify this patch?
[GitHub] spark pull request #16346: [SPARK-16654][CORE] Add UI coverage for Applicati...
GitHub user jsoltren opened a pull request: https://github.com/apache/spark/pull/16346 [SPARK-16654][CORE] Add UI coverage for Application Level Blacklisting

Builds on top of work in SPARK-8425 to update Application Level Blacklisting in the scheduler.

## What changes were proposed in this pull request?

Adds a UI to these patches by:
- defining new listener events for blacklisting and unblacklisting, nodes and executors;
- sending said events at the relevant points in BlacklistTracker;
- adding JSON (de)serialization code for these events;
- augmenting the Executors UI page to show which, and how many, executors are blacklisted;
- adding a unit test to make sure events are being fired;
- adding HistoryServerSuite coverage to verify that the SHS reads these events correctly;
- updating the Executor UI to show Blacklisted/Active/Dead as a tri-state in Executors Status.

Updates .rat-excludes to pass tests.

@username squito

## How was this patch tested?

./dev/run-tests
testOnly org.apache.spark.util.JsonProtocolSuite
testOnly org.apache.spark.scheduler.BlacklistTrackerSuite
testOnly org.apache.spark.deploy.history.HistoryServerSuite
https://github.com/jsoltren/jose-utils/blob/master/blacklist/test-blacklist.sh

![blacklist-20161219](https://cloud.githubusercontent.com/assets/1208477/21335321/9eda320a-c623-11e6-8b8c-9c912a73c276.jpg)

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/jsoltren/spark SPARK-16654-submit

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/16346.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #16346

commit 20ff7dddea72bf8fc9330f464992b19e1bf1c59e
Author: José Hiram Soltren <j...@cloudera.com>
Date: 2016-10-14T21:09:44Z

[SPARK-16654][CORE] Add UI coverage for Application Level Blacklisting

Builds on top of work in SPARK-8425 to update Application Level Blacklisting in the scheduler. Adds a UI to these patches by:
- defining new listener events for blacklisting and unblacklisting, nodes and executors;
- sending said events at the relevant points in BlacklistTracker;
- adding JSON (de)serialization code for these events;
- augmenting the Executors UI page to show which, and how many, executors are blacklisted;
- adding a unit test to make sure events are being fired;
- adding HistoryServerSuite coverage to verify that the SHS reads these events correctly;
- updating the Executor UI to show Blacklisted/Active/Dead as a tri-state in Executors Status.

Updates .rat-excludes to pass tests. @username squito
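A consumer of the new listener events described above might look like the sketch below. The event class and callback names are assumptions inferred from the PR description ("listener events for blacklisting and unblacklisting, nodes and executors"), not verified against the final patch:

```scala
import org.apache.spark.scheduler._

// Hypothetical listener reacting to the blacklisting events this PR adds;
// class and field names are assumptions, adjust to the merged API.
class BlacklistLogger extends SparkListener {
  override def onExecutorBlacklisted(e: SparkListenerExecutorBlacklisted): Unit =
    println(s"Executor ${e.executorId} blacklisted (${e.taskFailures} task failures)")

  override def onExecutorUnblacklisted(e: SparkListenerExecutorUnblacklisted): Unit =
    println(s"Executor ${e.executorId} unblacklisted")
}

// Registered like any other listener:
// sc.addSparkListener(new BlacklistLogger)
```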
[GitHub] spark pull request #16325: [SPARK-18703] [SPARK-18675] [SQL] [BACKPORT-2.1] ...
Github user gatorsmile closed the pull request at: https://github.com/apache/spark/pull/16325
[GitHub] spark issue #16325: [SPARK-18703] [SPARK-18675] [SQL] [BACKPORT-2.1] CTAS fo...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/16325 Sure, will do it. Thanks!
[GitHub] spark issue #16326: [SPARK-18915] [SQL] Automatic Table Repair when Creating...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/16326 We really need to improve the documentation, I think.
[GitHub] spark pull request #16326: [SPARK-18915] [SQL] Automatic Table Repair when C...
Github user gatorsmile closed the pull request at: https://github.com/apache/spark/pull/16326
[GitHub] spark issue #16326: [SPARK-18915] [SQL] Automatic Table Repair when Creating...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/16326 Based on the discussion in https://github.com/apache/spark/pull/15983, we do not plan to add automatic table repairing. Let me close it first.
[GitHub] spark issue #16313: [SPARK-18899][SPARK-18912][SPARK-18913][SQL] refactor th...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16313 **[Test build #70392 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70392/testReport)** for PR 16313 at commit [`32857e6`](https://github.com/apache/spark/commit/32857e6c5fa89094b84d4ed78469217af8c515c7).
[GitHub] spark issue #16343: [FLAKY-TEST][DO NOT MERGE] InputStreamsSuite.socket inpu...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16343 **[Test build #70387 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70387/testReport)** for PR 16343 at commit [`92144e4`](https://github.com/apache/spark/commit/92144e428aa1919ed86e989f4015eb6f85186ea2). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #16343: [FLAKY-TEST][DO NOT MERGE] InputStreamsSuite.socket inpu...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16343 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70387/ Test FAILed.
[GitHub] spark issue #16343: [FLAKY-TEST][DO NOT MERGE] InputStreamsSuite.socket inpu...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16343 Merged build finished. Test FAILed.
[GitHub] spark issue #16338: [SPARK-18837][WEBUI] Very long stage descriptions do not...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16338 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70383/ Test PASSed.
[GitHub] spark issue #16232: [SPARK-18800][SQL] Fix UnsafeKVExternalSorter by correct...
Github user viirya commented on the issue: https://github.com/apache/spark/pull/16232 @davies What I just said was not accurate. I didn't mean the values have entries in the array; I meant each key/value pair occupies two entries in it. We iterate over all key/value pairs in the `BytesToBytesMap` and call `UnsafeInMemorySorter.insertRecord`, which inserts a record pointer and a key prefix for each pair: https://github.com/apache/spark/blob/5857b9ac2d9808d9b89a5b29620b5052e2beebf5/core/src/main/java/org/apache/spark/util/collection/unsafe/sort/UnsafeInMemorySorter.java#L241 So the array ends up with `map.numKeys() * 2` entries, because each pair contributes one entry for the record pointer and one for the key prefix.
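To illustrate the counting above, here is a minimal sketch (a hypothetical Python model, not Spark's actual `UnsafeInMemorySorter` class) of why the pointer array holds `map.numKeys() * 2` entries:

```python
# Hypothetical model of UnsafeInMemorySorter's pointer array: each call to
# insert_record appends TWO values -- the record pointer and the key prefix --
# so inserting N key/value pairs fills 2 * N array slots.

class PointerArraySketch:
    def __init__(self):
        self.array = []  # models the long[] backing array

    def insert_record(self, record_pointer, key_prefix):
        # One key/value pair occupies two consecutive entries.
        self.array.append(record_pointer)
        self.array.append(key_prefix)

sorter = PointerArraySketch()
num_keys = 3  # models map.numKeys()
for i in range(num_keys):
    sorter.insert_record(record_pointer=0x1000 + i, key_prefix=i)

assert len(sorter.array) == num_keys * 2  # 6 entries for 3 pairs
```

This is only a toy model of the bookkeeping; the real sorter packs both values into a `long[]` and sorts by key prefix, but the two-entries-per-record invariant is the same.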
[GitHub] spark issue #16338: [SPARK-18837][WEBUI] Very long stage descriptions do not...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16338 Merged build finished. Test PASSed.
[GitHub] spark issue #16338: [SPARK-18837][WEBUI] Very long stage descriptions do not...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16338 **[Test build #70383 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70383/testReport)** for PR 16338 at commit [`c86dc72`](https://github.com/apache/spark/commit/c86dc72f553855843812151ff12e92fa779a5b37). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #16313: [SPARK-18899][SPARK-18912][SPARK-18913][SQL] refactor th...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/16313 retest this please
[GitHub] spark issue #15983: [SPARK-18544] [SQL] Append with df.saveAsTable writes da...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/15983 I see the plan, but the behavior difference will still be affected by the value of `spark.sql.hive.manageFilesourcePartitions`, right? I might need more time to think it over and work out the potential impacts.
[GitHub] spark issue #16344: [SPARK-18929][ML] Add Tweedie distribution in GLM
Github user yanboliang commented on the issue: https://github.com/apache/spark/pull/16344 ok to test
[GitHub] spark issue #16314: [SPARK-18900][FLAKY-TEST] StateStoreSuite.maintenance
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16314 Merged build finished. Test PASSed.
[GitHub] spark issue #16314: [SPARK-18900][FLAKY-TEST] StateStoreSuite.maintenance
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16314 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70384/ Test PASSed.
[GitHub] spark issue #16314: [SPARK-18900][FLAKY-TEST] StateStoreSuite.maintenance
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16314 **[Test build #70384 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/70384/testReport)** for PR 16314 at commit [`6775639`](https://github.com/apache/spark/commit/67756391e11b4ad0ed38fec9cbe99bd7e8b2ce63). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #15996: [SPARK-18567][SQL] Simplify CreateDataSourceTableAsSelec...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/15996 Could we update the PR description and add the test case in `PartitionProviderCompatibilitySuite.scala` to reflect the external behavior changes of CTAS on partitioned data source tables?
[GitHub] spark issue #16344: [SPARK-18929][ML] Add Tweedie distribution in GLM
Github user yanboliang commented on the issue: https://github.com/apache/spark/pull/16344 Jenkins, add to whitelist
[GitHub] spark issue #16342: [SPARK-18927][SS] MemorySink for StructuredStreaming can...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16342 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/70382/ Test PASSed.
[GitHub] spark issue #16342: [SPARK-18927][SS] MemorySink for StructuredStreaming can...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16342 Merged build finished. Test PASSed.