[GitHub] spark pull request: [SPARK-11701] dynamic allocation and speculati...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10951#issuecomment-175885311

**[Test build #50219 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50219/consoleFull)** for PR 10951 at commit [`249fc78`](https://github.com/apache/spark/commit/249fc78fd0fe7b3cbe5430a075ab5f9e281c015c).

* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.

---
If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA.

---
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-13031] [SQL] cleanup codegen and improv...
Github user davies commented on the pull request: https://github.com/apache/spark/pull/10944#issuecomment-175885463

cc @nongli @rxin
[GitHub] spark pull request: [SPARK-11701] dynamic allocation and speculati...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10951#issuecomment-175885501

Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-11701] dynamic allocation and speculati...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10951#issuecomment-175885503

Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/50219/
[GitHub] spark pull request: [SPARK-12230][ML] WeightedLeastSquares.fit() s...
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/10274#discussion_r51059677

--- Diff: mllib/src/test/scala/org/apache/spark/ml/optim/WeightedLeastSquaresSuite.scala ---
@@ -74,6 +89,35 @@ class WeightedLeastSquaresSuite extends SparkFunSuite with MLlibTestSparkContext
     }
   }
 
+  test("WLS against lm when label is constant") {
+    /*
+       R code:
+       # here b is constant
+       df <- as.data.frame(cbind(A, b))
+       for (formula in c(b ~ . -1, b ~ .)) {
+         model <- lm(formula, data=df, weights=w)
+         print(as.vector(coef(model)))
+       }
+
+       [1] -9.221298  3.394343
+       [1] 17  0  0
+     */
+
+    val expected = Seq(
+      Vectors.dense(0.0, -9.221298, 3.394343),
+      Vectors.dense(17.0, 0.0, 0.0))
+
+    var idx = 0
+    for (fitIntercept <- Seq(false, true)) {
+      val wls = new WeightedLeastSquares(
+        fitIntercept, regParam = 0.0, standardizeFeatures = false, standardizeLabel = true)
+        .fit(instancesConstLabel)
--- End diff --

Sorry for getting back to you so late. The difference is because `glmnet` always standardizes the label, even when `standardization == false`; `standardization == false` only turns off standardization of the features. As a result, at least in `glmnet`, the training is not valid when `ystd == 0.0`.
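The expected vectors in this test can be reproduced with ordinary weighted least squares: with a constant label and an intercept column, the unique full-rank solution is intercept = label and zero coefficients. The NumPy sketch below uses made-up data to illustrate that closed form; it is not Spark or glmnet code.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(20, 2))    # hypothetical feature matrix
w = rng.uniform(1, 2, size=20)  # hypothetical observation weights
b = np.full(20, 17.0)           # constant label, as in the test

def wls(X, y, w):
    # Closed-form weighted least squares: beta = (X^T W X)^{-1} X^T W y
    Xw = X * w[:, None]
    return np.linalg.solve(Xw.T @ X, Xw.T @ y)

# With an intercept column the fit is exact: intercept = 17, coefficients = 0.
X1 = np.column_stack([np.ones(len(b)), A])
beta = wls(X1, b, w)
```

Without the intercept column (`wls(A, b, w)`), the solver generally returns nonzero coefficients, which is why the R comparison above reports two different vectors for the two formulas.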
[GitHub] spark pull request: [SPARK-13049] Add First/last with ignore nulls...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10957#issuecomment-175889533

Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/50230/
[GitHub] spark pull request: [SPARK-13049] Add First/last with ignore nulls...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10957#issuecomment-175889530

Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-12895][SPARK-12896] Migrate TaskMetrics...
Github user andrewor14 commented on a diff in the pull request: https://github.com/apache/spark/pull/10835#discussion_r51061122

--- Diff: project/MimaExcludes.scala ---
@@ -145,6 +145,15 @@ object MimaExcludes {
       // SPARK-12510 Refactor ActorReceiver to support Java
       ProblemFilters.exclude[AbstractClassProblem]("org.apache.spark.streaming.receiver.ActorReceiver")
     ) ++ Seq(
+      // SPARK-12895 Implement TaskMetrics using accumulators
+      ProblemFilters.exclude[MissingMethodProblem]("org.apache.spark.TaskContext.internalMetricsToAccumulators"),
+      ProblemFilters.exclude[MissingMethodProblem]("org.apache.spark.TaskContext.collectInternalAccumulators"),
+      ProblemFilters.exclude[MissingMethodProblem]("org.apache.spark.TaskContext.collectAccumulators")
+    ) ++ Seq(
+      // SPARK-12896 Send only accumulator updates to driver, not TaskMetrics
+      ProblemFilters.exclude[IncompatibleMethTypeProblem]("org.apache.spark.Accumulable.this"),
--- End diff --

I just did an audit. See message on main thread for more detail.
[GitHub] spark pull request: [SPARK-12895][SPARK-12896] Migrate TaskMetrics...
Github user andrewor14 commented on the pull request: https://github.com/apache/spark/pull/10835#issuecomment-175893531

I was able to verify that the changes in `Accumulable` and `Accumulator` do not break compatibility.
[GitHub] spark pull request: [SPARK-12895][SPARK-12896] Migrate TaskMetrics...
Github user andrewor14 commented on a diff in the pull request: https://github.com/apache/spark/pull/10835#discussion_r51062164

--- Diff: core/src/main/scala/org/apache/spark/InternalAccumulator.scala ---
@@ -17,42 +17,193 @@
 package org.apache.spark
 
+import org.apache.spark.storage.{BlockId, BlockStatus}
 
-// This is moved to its own file because many more things will be added to it in SPARK-10620.
+
+/**
+ * A collection of fields and methods concerned with internal accumulators that represent
+ * task level metrics.
+ */
 private[spark] object InternalAccumulator {
-  val PEAK_EXECUTION_MEMORY = "peakExecutionMemory"
-  val TEST_ACCUMULATOR = "testAccumulator"
-
-  // For testing only.
-  // This needs to be a def since we don't want to reuse the same accumulator across stages.
-  private def maybeTestAccumulator: Option[Accumulator[Long]] = {
-    if (sys.props.contains("spark.testing")) {
-      Some(new Accumulator(
-        0L, AccumulatorParam.LongAccumulatorParam, Some(TEST_ACCUMULATOR), internal = true))
-    } else {
-      None
+
+  import AccumulatorParam._
+
+  // Prefixes used in names of internal task level metrics
+  val METRICS_PREFIX = "internal.metrics."
+  val SHUFFLE_READ_METRICS_PREFIX = METRICS_PREFIX + "shuffle.read."
+  val SHUFFLE_WRITE_METRICS_PREFIX = METRICS_PREFIX + "shuffle.write."
+  val OUTPUT_METRICS_PREFIX = METRICS_PREFIX + "output."
+  val INPUT_METRICS_PREFIX = METRICS_PREFIX + "input."
+
+  // Names of internal task level metrics
+  val EXECUTOR_DESERIALIZE_TIME = METRICS_PREFIX + "executorDeserializeTime"
+  val EXECUTOR_RUN_TIME = METRICS_PREFIX + "executorRunTime"
+  val RESULT_SIZE = METRICS_PREFIX + "resultSize"
+  val JVM_GC_TIME = METRICS_PREFIX + "jvmGCTime"
+  val RESULT_SERIALIZATION_TIME = METRICS_PREFIX + "resultSerializationTime"
+  val MEMORY_BYTES_SPILLED = METRICS_PREFIX + "memoryBytesSpilled"
+  val DISK_BYTES_SPILLED = METRICS_PREFIX + "diskBytesSpilled"
+  val PEAK_EXECUTION_MEMORY = METRICS_PREFIX + "peakExecutionMemory"
+  val UPDATED_BLOCK_STATUSES = METRICS_PREFIX + "updatedBlockStatuses"
+  val TEST_ACCUM = METRICS_PREFIX + "testAccumulator"
+
+  // scalastyle:off
+
+  // Names of shuffle read metrics
+  object shuffleRead {
+    val REMOTE_BLOCKS_FETCHED = SHUFFLE_READ_METRICS_PREFIX + "remoteBlocksFetched"
+    val LOCAL_BLOCKS_FETCHED = SHUFFLE_READ_METRICS_PREFIX + "localBlocksFetched"
+    val REMOTE_BYTES_READ = SHUFFLE_READ_METRICS_PREFIX + "remoteBytesRead"
+    val LOCAL_BYTES_READ = SHUFFLE_READ_METRICS_PREFIX + "localBytesRead"
+    val FETCH_WAIT_TIME = SHUFFLE_READ_METRICS_PREFIX + "fetchWaitTime"
+    val RECORDS_READ = SHUFFLE_READ_METRICS_PREFIX + "recordsRead"
+  }
+
+  // Names of shuffle write metrics
+  object shuffleWrite {
+    val BYTES_WRITTEN = SHUFFLE_WRITE_METRICS_PREFIX + "bytesWritten"
+    val RECORDS_WRITTEN = SHUFFLE_WRITE_METRICS_PREFIX + "recordsWritten"
+    val WRITE_TIME = SHUFFLE_WRITE_METRICS_PREFIX + "writeTime"
+  }
+
+  // Names of output metrics
+  object output {
+    val WRITE_METHOD = OUTPUT_METRICS_PREFIX + "writeMethod"
+    val BYTES_WRITTEN = OUTPUT_METRICS_PREFIX + "bytesWritten"
+    val RECORDS_WRITTEN = OUTPUT_METRICS_PREFIX + "recordsWritten"
+  }
+
+  // Names of input metrics
+  object input {
+    val READ_METHOD = INPUT_METRICS_PREFIX + "readMethod"
+    val BYTES_READ = INPUT_METRICS_PREFIX + "bytesRead"
+    val RECORDS_READ = INPUT_METRICS_PREFIX + "recordsRead"
+  }
+
+  // scalastyle:on
+
+  /**
+   * Create an internal [[Accumulator]] by name, which must begin with [[METRICS_PREFIX]].
+   */
+  def create(name: String): Accumulator[_] = {
+    assert(name.startsWith(METRICS_PREFIX),
+      s"internal accumulator name must start with '$METRICS_PREFIX': $name")
+    getParam(name) match {
+      case p @ LongAccumulatorParam => newMetric[Long](0L, name, p)
+      case p @ IntAccumulatorParam => newMetric[Int](0, name, p)
+      case p @ StringAccumulatorParam => newMetric[String]("", name, p)
+      case p @ UpdatedBlockStatusesAccumulatorParam =>
+        newMetric[Seq[(BlockId, BlockStatus)]](Seq(), name, p)
+      case p => throw new IllegalArgumentException(
+        s"unsupported accumulator param '${p.getClass.getSimpleName}' for metric '$name'.")
+    }
+  }
+
+  /**
+   * Get the [[AccumulatorParam]] associated with the internal metric name,
+   * which must begin with [[METRICS_PREFIX]].
+   */
+  def getParam(name: String): AccumulatorParam[_] = {
+    assert(name.startsWith(METRICS_PREFIX),
+      s"internal accumulator name must
[GitHub] spark pull request: [SPARK-10620] Minor addendum to #10835
Github user andrewor14 commented on the pull request: https://github.com/apache/spark/pull/10958#issuecomment-175897116

@JoshRosen
[GitHub] spark pull request: [Spark-12732][ML] bug fix in linear regression...
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/10702#discussion_r51064724

--- Diff: mllib/src/test/scala/org/apache/spark/ml/regression/LinearRegressionSuite.scala ---
@@ -558,6 +575,47 @@ class LinearRegressionSuite
     }
   }
 
+  test("linear regression model with constant label") {
+    /*
+       R code:
+       for (formula in c(b.const ~ . -1, b.const ~ .)) {
+         model <- lm(formula, data=df.const.label, weights=w)
+         print(as.vector(coef(model)))
+       }
+       [1] -9.221298  3.394343
+       [1] 17  0  0
+     */
+    val expected = Seq(
+      Vectors.dense(0.0, -9.221298, 3.394343),
+      Vectors.dense(17.0, 0.0, 0.0))
+
+    Seq("auto", "l-bfgs", "normal").foreach { solver =>
+      var idx = 0
+      for (fitIntercept <- Seq(false, true)) {
+        val model = new LinearRegression()
+          .setFitIntercept(fitIntercept)
+          .setWeightCol("weight")
+          .setSolver(solver)
+          .fit(datasetWithWeightConstantLabel)
+        val actual = Vectors.dense(model.intercept, model.coefficients(0), model.coefficients(1))
+        assert(actual ~== expected(idx) absTol 1e-4)
+        idx += 1
+      }
+    }
+  }
+
+  test("regularized linear regression through origin with constant label") {
+    // The problem is ill-defined if fitIntercept=false, regParam is non-zero and \
--- End diff --

Remove the `\` at the end of the line.
[GitHub] spark pull request: [Spark-12732][ML] bug fix in linear regression...
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/10702#discussion_r51064615

--- Diff: mllib/src/test/scala/org/apache/spark/ml/regression/LinearRegressionSuite.scala ---
@@ -558,6 +575,47 @@ class LinearRegressionSuite
     }
   }
 
+  test("linear regression model with constant label") {
+    /*
+       R code:
+       for (formula in c(b.const ~ . -1, b.const ~ .)) {
+         model <- lm(formula, data=df.const.label, weights=w)
+         print(as.vector(coef(model)))
+       }
+       [1] -9.221298  3.394343
+       [1] 17  0  0
+     */
+    val expected = Seq(
+      Vectors.dense(0.0, -9.221298, 3.394343),
+      Vectors.dense(17.0, 0.0, 0.0))
+
+    Seq("auto", "l-bfgs", "normal").foreach { solver =>
+      var idx = 0
+      for (fitIntercept <- Seq(false, true)) {
+        val model = new LinearRegression()
+          .setFitIntercept(fitIntercept)
+          .setWeightCol("weight")
+          .setSolver(solver)
+          .fit(datasetWithWeightConstantLabel)
+        val actual = Vectors.dense(model.intercept, model.coefficients(0), model.coefficients(1))
+        assert(actual ~== expected(idx) absTol 1e-4)
+        idx += 1
+      }
+    }
+  }
+
+  test("regularized linear regression through origin with constant label") {
+    // The problem is ill-defined if fitIntercept=false, regParam is non-zero and \
+    // standardization=true. An exception is thrown in this case.
--- End diff --

When `standardization=false`, the problem is still ill-defined, since GLMNET always standardizes the labels; that's why you see it in the analytical solution. Let's throw an exception when `fitIntercept=false` and `regParam != 0.0`.
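The degeneracy under discussion is easy to see numerically: standardizing a constant label means dividing by a zero standard deviation. A toy sketch, not Spark or glmnet code:

```python
import numpy as np

b = np.full(10, 17.0)  # constant label, as in the test above
y_std = b.std()

# glmnet-style solvers divide the label by its standard deviation before
# applying the penalty; with a constant label that divisor is zero, so the
# standardized objective is undefined and there is no reference solution
# to compare against.
assert y_std == 0.0
```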
[GitHub] spark pull request: [SPARK-13049] Add First/last with ignore nulls...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/10957#issuecomment-175904468

Why might this be a bug fix?
[GitHub] spark pull request: [Spark-12732][ML] bug fix in linear regression...
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/10702#discussion_r51069962

--- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala ---
@@ -219,33 +219,43 @@ class LinearRegression @Since("1.3.0") (@Since("1.3.0") override val uid: String
     }
 
     val yMean = ySummarizer.mean(0)
-    val yStd = math.sqrt(ySummarizer.variance(0))
-
-    // If the yStd is zero, then the intercept is yMean with zero coefficient;
-    // as a result, training is not needed.
-    if (yStd == 0.0) {
-      logWarning(s"The standard deviation of the label is zero, so the coefficients will be " +
-        s"zeros and the intercept will be the mean of the label; as a result, " +
-        s"training is not needed.")
-      if (handlePersistence) instances.unpersist()
-      val coefficients = Vectors.sparse(numFeatures, Seq())
-      val intercept = yMean
-
-      val model = new LinearRegressionModel(uid, coefficients, intercept)
-      // Handle possible missing or invalid prediction columns
-      val (summaryModel, predictionColName) = model.findSummaryModelAndPredictionCol()
-
-      val trainingSummary = new LinearRegressionTrainingSummary(
-        summaryModel.transform(dataset),
-        predictionColName,
-        $(labelCol),
-        model,
-        Array(0D),
-        $(featuresCol),
-        Array(0D))
-      return copyValues(model.setSummary(trainingSummary))
+    val rawYStd = math.sqrt(ySummarizer.variance(0))
+    if (rawYStd == 0.0) {
+      if ($(fitIntercept)) {
+        // If the rawYStd is zero and fitIntercept=true, then the intercept is yMean with
+        // zero coefficient; as a result, training is not needed.
+        logWarning(s"The standard deviation of the label is zero, so the coefficients will be " +
+          s"zeros and the intercept will be the mean of the label; as a result, " +
+          s"training is not needed.")
+        if (handlePersistence) instances.unpersist()
+        val coefficients = Vectors.sparse(numFeatures, Seq())
+        val intercept = yMean
+
+        val model = new LinearRegressionModel(uid, coefficients, intercept)
+        // Handle possible missing or invalid prediction columns
+        val (summaryModel, predictionColName) = model.findSummaryModelAndPredictionCol()
+
+        val trainingSummary = new LinearRegressionTrainingSummary(
+          summaryModel.transform(dataset),
+          predictionColName,
+          $(labelCol),
+          model,
+          Array(0D),
+          $(featuresCol),
+          Array(0D))
+        return copyValues(model.setSummary(trainingSummary))
+      } else {
+        require(!($(regParam) > 0.0 && $(standardization)),
--- End diff --

Remove `&& $(standardization)`.
[GitHub] spark pull request: [SPARK-13047][PYSPARK][ML] Pyspark Params.hasP...
GitHub user sethah opened a pull request:

    https://github.com/apache/spark/pull/10962

[SPARK-13047][PYSPARK][ML] Pyspark Params.hasParam should not throw an error

The Pyspark Params class has a method `hasParam(paramName)` that returns `True` if the class has a parameter by that name, but throws an `AttributeError` otherwise. There is currently no way to get a Boolean indicating whether a class has a given parameter. With Spark 2.0 we could modify the existing behavior of `hasParam` or add an additional method with this functionality.

In Python:

```python
from pyspark.ml.classification import NaiveBayes
nb = NaiveBayes(smoothing=0.5)
print nb.hasParam("smoothing")
print nb.hasParam("notAParam")
```

produces:

> True
> AttributeError: 'NaiveBayes' object has no attribute 'notAParam'

However, in Scala:

```scala
import org.apache.spark.ml.classification.NaiveBayes
val nb = new NaiveBayes()
nb.hasParam("smoothing")
nb.hasParam("notAParam")
```

produces:

> true
> false

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/sethah/spark SPARK-13047

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/10962.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #10962

commit d52b1de1adefedb6938130d0530ea46fdb3f64f7
Author: sethah
Date: 2016-01-27T23:55:04Z

    hasParam returns False instead of throwing an error
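The proposed change can be sketched without pyspark. The `Params` class below is a minimal stand-in for `pyspark.ml.param.Params`, not the real implementation; it shows `hasParam` swallowing the lookup failure and returning `False`, matching the Scala semantics:

```python
class Params:
    """Minimal stand-in for pyspark.ml.param.Params (illustration only)."""

    def __init__(self, **params):
        self._params = dict(params)

    def getParam(self, name):
        # pyspark raises AttributeError for unknown param names
        try:
            return self._params[name]
        except KeyError:
            raise AttributeError("no param %r" % name)

    def hasParam(self, name):
        # Proposed behavior: report False instead of raising, like Scala
        try:
            self.getParam(name)
            return True
        except AttributeError:
            return False

nb = Params(smoothing=0.5)
print(nb.hasParam("smoothing"))   # True
print(nb.hasParam("notAParam"))   # False
```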
[GitHub] spark pull request: [SPARK-12177] [STREAMING] Update KafkaDStreams...
Github user vanzin commented on the pull request: https://github.com/apache/spark/pull/10953#issuecomment-175891692

MiMa is a binary compatibility checker. It's complaining that some changes you made caused the public APIs exposed in the compiled classes to change, meaning existing code compiled against the current Spark version might not run on the next Spark. First I'd look at whether those changes are necessary; if they are, it might be ok to add exclusions because we're being a bit lenient with API breakages in 2.0.
[GitHub] spark pull request: [SPARK-12656] [SQL] Implement Intersect with L...
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/10630#discussion_r51061765

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -388,57 +445,18 @@ class Analyzer(
           .map(_.asInstanceOf[NamedExpression])
         a.copy(aggregateExpressions = expanded)
 
-      // Special handling for cases when self-join introduce duplicate expression ids.
-      case j @ Join(left, right, _, _) if !j.selfJoinResolved =>
-        val conflictingAttributes = left.outputSet.intersect(right.outputSet)
-        logDebug(s"Conflicting attributes ${conflictingAttributes.mkString(",")} in $j")
-
-        right.collect {
-          // Handle base relations that might appear more than once.
-          case oldVersion: MultiInstanceRelation
-              if oldVersion.outputSet.intersect(conflictingAttributes).nonEmpty =>
-            val newVersion = oldVersion.newInstance()
-            (oldVersion, newVersion)
-
-          // Handle projects that create conflicting aliases.
-          case oldVersion @ Project(projectList, _)
-              if findAliases(projectList).intersect(conflictingAttributes).nonEmpty =>
-            (oldVersion, oldVersion.copy(projectList = newAliases(projectList)))
-
-          case oldVersion @ Aggregate(_, aggregateExpressions, _)
-              if findAliases(aggregateExpressions).intersect(conflictingAttributes).nonEmpty =>
-            (oldVersion, oldVersion.copy(aggregateExpressions = newAliases(aggregateExpressions)))
-
-          case oldVersion: Generate
-              if oldVersion.generatedSet.intersect(conflictingAttributes).nonEmpty =>
-            val newOutput = oldVersion.generatorOutput.map(_.newInstance())
-            (oldVersion, oldVersion.copy(generatorOutput = newOutput))
-
-          case oldVersion @ Window(_, windowExpressions, _, _, child)
-              if AttributeSet(windowExpressions.map(_.toAttribute)).intersect(conflictingAttributes)
-                .nonEmpty =>
-            (oldVersion, oldVersion.copy(windowExpressions = newAliases(windowExpressions)))
-        }
-        // Only handle first case, others will be fixed on the next pass.
-        .headOption match {
-          case None =>
-            /*
-             * No result implies that there is a logical plan node that produces new references
-             * that this rule cannot handle. When that is the case, there must be another rule
-             * that resolves these conflicts. Otherwise, the analysis will fail.
-             */
-            j
-          case Some((oldRelation, newRelation)) =>
-            val attributeRewrites = AttributeMap(oldRelation.output.zip(newRelation.output))
-            val newRight = right transformUp {
-              case r if r == oldRelation => newRelation
-            } transformUp {
-              case other => other transformExpressions {
-                case a: Attribute => attributeRewrites.get(a).getOrElse(a)
-              }
-            }
-            j.copy(right = newRight)
-        }
+      // To resolve duplicate expression IDs for all the BinaryNode
+      case b: BinaryNode if !b.duplicateResolved => b match {
+        case j @ Join(left, right, _, _) =>
+          j.copy(right = dedupRight(left, right))
+        case i @ Intersect(left, right) =>
+          i.copy(right = dedupRight(left, right))
+        case e @ Except(left, right) =>
+          e.copy(right = dedupRight(left, right))
+        case cg: CoGroup =>
--- End diff --

For the other operators, can we construct test cases that make them fail without de-duplication? If we can, then we should create a JIRA and fix it in another PR.
[GitHub] spark pull request: [SPARK-13020][SQL][test] fix random generator ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10930#issuecomment-175896474

**[Test build #50205 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50205/consoleFull)** for PR 10930 at commit [`e627f5b`](https://github.com/apache/spark/commit/e627f5b96a21ccc748c75c7fa0a4c4839cdc63c5).

* This patch **fails from timeout after a configured wait of \`250m\`**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-10620] Minor addendum to #10835
Github user andrewor14 commented on the pull request: https://github.com/apache/spark/pull/10958#issuecomment-175896506

@JoshRosen
[GitHub] spark pull request: [HOTFIX] Fix Scala 2.11 compilation
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10955#issuecomment-175903354

Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-10620] [SPARK-13054] Minor addendum to ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10958#issuecomment-175908428

Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/50237/
[GitHub] spark pull request: [SPARK-10620] [SPARK-13054] Minor addendum to ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10958#issuecomment-175908426

Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-13045][SQL] Remove ColumnVector.Struct ...
Github user davies commented on the pull request: https://github.com/apache/spark/pull/10952#issuecomment-175909805

LGTM
[GitHub] spark pull request: [WIP][SPARK-12957][SQL] Initial support for co...
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/10844#discussion_r51068499

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicOperators.scala ---
@@ -88,6 +88,12 @@ case class Generate(
 case class Filter(condition: Expression, child: LogicalPlan) extends UnaryNode {
   override def output: Seq[Attribute] = child.output
+
+  override def constraints: Set[Expression] = {
+    val newConstraint = splitConjunctivePredicates(condition).filter(
+      _.references.subsetOf(outputSet)).toSet
--- End diff --

style nit: we typically avoid breaking in the middle of a function call and instead prefer to break in between calls (always pick the highest syntactic level)

```scala
val newConstraint = splitConjunctivePredicates(condition)
  .filter(_.references.subsetOf(outputSet))
  .toSet
```
[GitHub] spark pull request: [WIP][SPARK-12957][SQL] Initial support for co...
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/10844#discussion_r51068376

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/QueryPlan.scala ---
@@ -17,16 +17,31 @@
 package org.apache.spark.sql.catalyst.plans

-import org.apache.spark.sql.catalyst.expressions.{Attribute, AttributeSet, Expression, VirtualColumn}
+import org.apache.spark.sql.catalyst.expressions._
 import org.apache.spark.sql.catalyst.trees.TreeNode
 import org.apache.spark.sql.types.{DataType, StructType}

-abstract class QueryPlan[PlanType <: TreeNode[PlanType]] extends TreeNode[PlanType] {
+abstract class QueryPlan[PlanType <: TreeNode[PlanType]]
+  extends TreeNode[PlanType] with PredicateHelper {
   self: PlanType =>

   def output: Seq[Attribute]

   /**
+   * Extracts the output property from a given child.
+   */
+  def extractConstraintsFromChild(child: QueryPlan[PlanType]): Set[Expression] = {
--- End diff --

`protected`? Also I'm not sure I get the Scala doc. Maybe `getRelevantConstraints` is a better name? It is taking the constraints and removing those that don't apply anymore because we removed columns, right?
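[Editor's note: a minimal toy sketch of the pruning marmbrus describes — not Catalyst code. A child's constraints survive only if every attribute they reference is still present in the parent's output; the dict-based "expressions" and the function name are hypothetical.]

```python
# Toy model of constraint pruning: a constraint is kept only when all of the
# attributes it references are still part of the operator's output set.

def relevant_constraints(constraints, output_set):
    """Keep constraints whose referenced attributes all remain in output_set."""
    return [c for c in constraints if set(c["references"]) <= set(output_set)]

child_constraints = [
    {"expr": "a > 0", "references": ["a"]},
    {"expr": "b = c", "references": ["b", "c"]},
]

# A Project that keeps only column "a" drops the constraint involving b and c.
kept = relevant_constraints(child_constraints, ["a"])
print([c["expr"] for c in kept])  # ['a > 0']
```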
[GitHub] spark pull request: [Spark-12732][ML] bug fix in linear regression...
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/10702#discussion_r51070090

--- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala ---
@@ -219,33 +219,43 @@ class LinearRegression @Since("1.3.0") (@Since("1.3.0") override val uid: String
     }

     val yMean = ySummarizer.mean(0)
-    val yStd = math.sqrt(ySummarizer.variance(0))
-
-    // If the yStd is zero, then the intercept is yMean with zero coefficient;
-    // as a result, training is not needed.
-    if (yStd == 0.0) {
-      logWarning(s"The standard deviation of the label is zero, so the coefficients will be " +
-        s"zeros and the intercept will be the mean of the label; as a result, " +
-        s"training is not needed.")
-      if (handlePersistence) instances.unpersist()
-      val coefficients = Vectors.sparse(numFeatures, Seq())
-      val intercept = yMean
-
-      val model = new LinearRegressionModel(uid, coefficients, intercept)
-      // Handle possible missing or invalid prediction columns
-      val (summaryModel, predictionColName) = model.findSummaryModelAndPredictionCol()
-
-      val trainingSummary = new LinearRegressionTrainingSummary(
-        summaryModel.transform(dataset),
-        predictionColName,
-        $(labelCol),
-        model,
-        Array(0D),
-        $(featuresCol),
-        Array(0D))
-      return copyValues(model.setSummary(trainingSummary))
+    val rawYStd = math.sqrt(ySummarizer.variance(0))
+    if (rawYStd == 0.0) {
+      if ($(fitIntercept)) {
+        // If the rawYStd is zero and fitIntercept=true, then the intercept is yMean with
+        // zero coefficient; as a result, training is not needed.
+        logWarning(s"The standard deviation of the label is zero, so the coefficients will be " +
+          s"zeros and the intercept will be the mean of the label; as a result, " +
+          s"training is not needed.")
+        if (handlePersistence) instances.unpersist()
+        val coefficients = Vectors.sparse(numFeatures, Seq())
+        val intercept = yMean
+
+        val model = new LinearRegressionModel(uid, coefficients, intercept)
+        // Handle possible missing or invalid prediction columns
+        val (summaryModel, predictionColName) = model.findSummaryModelAndPredictionCol()
+
+        val trainingSummary = new LinearRegressionTrainingSummary(
+          summaryModel.transform(dataset),
+          predictionColName,
+          $(labelCol),
+          model,
+          Array(0D),
+          $(featuresCol),
+          Array(0D))
+        return copyValues(model.setSummary(trainingSummary))
+      } else {
+        require(!($(regParam) > 0.0 && $(standardization)),
--- End diff --

Just `require($(regParam) != 0.0)`
[GitHub] spark pull request: [SPARK-12895][SPARK-12896] Migrate TaskMetrics...
Github user andrewor14 commented on the pull request: https://github.com/apache/spark/pull/10835#issuecomment-175892453

I have compiled a list of breaking `@DeveloperAPI` changes:

ExceptionFailure:
- changed: `apply`, `unapply`, `copy`
- removed: old constructor
- deprecated: `metrics`

InputMetrics:
- removed: old constructor, all case class methods, `updateBytesRead`, `setBytesReadCallback`, `var bytesReadCallback`
- deprecated: `apply`, `unapply`, `incBytesRead`, `incRecordsRead`

OutputMetrics:
- removed: old constructor, all case class methods
- deprecated: `apply`, `unapply`

ShuffleReadMetrics:
- removed: old constructor

ShuffleWriteMetrics:
- removed: old constructor

TaskMetrics:
- changed: `accumulatorUpdates` return type (Map[Long, Any] -> Seq[AccumulableInfo])
- removed: `hostname`
- deprecated: `var updatedBlocks`, set `var outputMetrics`, set `var shuffleWriteMetrics`

AccumulableInfo:
- changed: `update` type (Option[String] -> Option[Any]), `value` type (String -> Option[Any]), `name` type (String -> Option[String])
- removed: `internal`
- deprecated: all existing `apply` methods

SparkListenerTaskEnd:
- changed: `taskMetrics` is now @Nullable

SparkListenerExecutorMetricsUpdate:
- changed: `apply`, `unapply`, `copy`
- removed: old constructor, `taskMetrics`
[GitHub] spark pull request: [SPARK-13031] [SQL] cleanup codegen and improv...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/10944#issuecomment-175892206

Can you paste some generated code? (Actually, I think that's useful for most of the codegen PRs.)
[GitHub] spark pull request: [SPARK-13020][SQL][test] fix random generator ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10930#issuecomment-175892315 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/50200/ Test FAILed.
[GitHub] spark pull request: [SPARK-13020][SQL][test] fix random generator ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10930#issuecomment-175892138

**[Test build #50200 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50200/consoleFull)** for PR 10930 at commit [`2c94ebf`](https://github.com/apache/spark/commit/2c94ebf360512fb6c58c0cf199122f349eafa0cb).

* This patch **fails from timeout after a configured wait of `250m`**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `class CountMinSketchImpl extends CountMinSketch implements Serializable`
  * `class DefaultSource extends HadoopFsRelationProvider with DataSourceRegister`
  * `class ChiSqSelector(JavaEstimator, HasFeaturesCol, HasOutputCol, HasLabelCol):`
  * `class ChiSqSelectorModel(JavaModel):`
  * `public static final class Array extends ArrayData`
  * `public static final class Struct extends InternalRow`
  * `public class ColumnVectorUtils`
  * `public static final class Row extends InternalRow`
[GitHub] spark pull request: [SPARK-13029][ml] fix a logistic regression is...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10940#issuecomment-175898494 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/50229/ Test PASSed.
[GitHub] spark pull request: [SPARK-13029][ml] fix a logistic regression is...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10940#issuecomment-175898492 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-12988][SQL] Can't drop columns that con...
Github user dilipbiswal commented on the pull request: https://github.com/apache/spark/pull/10943#issuecomment-175899786

@cloud-fan Thank you Wenchen for your comments. In my understanding, users need to use back-ticks to quote column names if they want them treated as a column name as opposed to a column path. I tried the following example:

    val df = Seq((1, 2, 3)).toDF("a_b", "a.c", "b.c")
    df.select("a.c")   => fails to resolve
    df.select("`a.c`") => works fine

Is this not how it is supposed to work? Can you please elaborate with a small example? Thanks in advance.
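[Editor's note: a toy illustration of the quoting rule discussed above — this is not Spark's implementation, just a hypothetical resolver showing the two interpretations of a dotted name.]

```python
# An unquoted dotted name is a column *path* (struct field access), while a
# back-tick-quoted name is a single column whose name literally contains a dot.

def parse_column_ref(name):
    """Return the name parts a resolver would look up, per the back-tick rule."""
    if len(name) >= 2 and name.startswith("`") and name.endswith("`"):
        return [name[1:-1]]   # one column; dots inside the quotes are literal
    return name.split(".")    # a path: a column, then nested struct fields

print(parse_column_ref("a.c"))    # ['a', 'c']  -> field "c" of column "a"
print(parse_column_ref("`a.c`"))  # ['a.c']     -> column literally named "a.c"
```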
[GitHub] spark pull request: [Spark-12732][ML] bug fix in linear regression...
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/10702#discussion_r51070105

--- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala --- (same hunk as the previous comment, ending at)
+      } else {
+        require(!($(regParam) > 0.0 && $(standardization)),
--- End diff --

also change the message.
[GitHub] spark pull request: [SPARK-12963] Improve performance of stddev/va...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10960#issuecomment-175915586 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/50239/ Test FAILed.
[GitHub] spark pull request: [SPARK-12963] Improve performance of stddev/va...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10960#issuecomment-175915582 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-13047][PYSPARK][ML] Pyspark Params.hasP...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10962#issuecomment-175920740 **[Test build #50242 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50242/consoleFull)** for PR 10962 at commit [`d52b1de`](https://github.com/apache/spark/commit/d52b1de1adefedb6938130d0530ea46fdb3f64f7).
[GitHub] spark pull request: [Spark-12732][ML] bug fix in linear regression...
Github user iyounus commented on a diff in the pull request: https://github.com/apache/spark/pull/10702#discussion_r51071809

--- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala --- (same hunk as quoted earlier, ending at)
+      } else {
+        require(!($(regParam) > 0.0 && $(standardization)),
--- End diff --

I can change this. But the behaviour of WeightedLeastSquares should also be the same. Should I also make changes to WeightedLeastSquares?
[GitHub] spark pull request: [SPARK-10620] Minor addendum to #10835
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10958#issuecomment-175899434 **[Test build #50234 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50234/consoleFull)** for PR 10958 at commit [`5404254`](https://github.com/apache/spark/commit/540425450ea0e5376d99f6ccb43857b74f34204e).
[GitHub] spark pull request: [Spark-12732][ML] bug fix in linear regression...
Github user dbtsai commented on the pull request: https://github.com/apache/spark/pull/10702#issuecomment-175901752

@iyounus For `standardizeLabel = false/true` with a non-zero `regParam`, let's throw the exception. I explained the mismatch against the analytic normal equation in the other PR. Thanks.
[GitHub] spark pull request: [Spark-12732][ML] bug fix in linear regression...
Github user iyounus commented on a diff in the pull request: https://github.com/apache/spark/pull/10702#discussion_r51071489

--- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala --- (same hunk as quoted earlier, ending at)
+      } else {
+        require(!($(regParam) > 0.0 && $(standardization)),
+          "The standard deviation of the label is zero. " +
+            "Model cannot be regularized with standardization=true")
+        logWarning(s"The standard deviation of the label is zero. " +
+          "Consider setting fitIntercept=true.")
+      }
     }
+    // If y is constant (rawYStd is zero), then y cannot be scaled. In this case,
+    // setting yStd=1.0 ensures that y is not scaled anymore in the l-bfgs algorithm.
+    val yStd = if (rawYStd > 0) rawYStd else 1.0
--- End diff --

It's not clear to me why you would set yStd = abs(yMean) if the label is constant.
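[Editor's note: a plain-Python numeric sketch of the guard being debated above — when the label is constant, its standard deviation is 0, so dividing by it would blow up; substituting 1.0 makes the label scaling a no-op. The function name is hypothetical; only the `rawYStd > 0` formula comes from the diff.]

```python
import math

def label_scale(labels):
    """Sample std of the labels, falling back to 1.0 when the label is constant."""
    mean = sum(labels) / len(labels)
    raw_y_std = math.sqrt(sum((y - mean) ** 2 for y in labels) / (len(labels) - 1))
    # Mirrors `val yStd = if (rawYStd > 0) rawYStd else 1.0` from the diff:
    return raw_y_std if raw_y_std > 0 else 1.0

print(label_scale([3.0, 3.0, 3.0]))  # 1.0 -> constant labels are left unscaled
print(label_scale([1.0, 2.0, 3.0]))  # 1.0 -> here the sample std happens to be 1
```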
[GitHub] spark pull request: [SPARK-12656] [SQL] Implement Intersect with L...
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/10630#discussion_r51059167

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala ---
@@ -388,57 +445,18 @@ class Analyzer(
           .map(_.asInstanceOf[NamedExpression])
         a.copy(aggregateExpressions = expanded)

-      // Special handling for cases when self-join introduce duplicate expression ids.
-      case j @ Join(left, right, _, _) if !j.selfJoinResolved =>
-        val conflictingAttributes = left.outputSet.intersect(right.outputSet)
-        logDebug(s"Conflicting attributes ${conflictingAttributes.mkString(",")} in $j")
-
-        right.collect {
-          // Handle base relations that might appear more than once.
-          case oldVersion: MultiInstanceRelation
-              if oldVersion.outputSet.intersect(conflictingAttributes).nonEmpty =>
-            val newVersion = oldVersion.newInstance()
-            (oldVersion, newVersion)
-
-          // Handle projects that create conflicting aliases.
-          case oldVersion @ Project(projectList, _)
-              if findAliases(projectList).intersect(conflictingAttributes).nonEmpty =>
-            (oldVersion, oldVersion.copy(projectList = newAliases(projectList)))
-
-          case oldVersion @ Aggregate(_, aggregateExpressions, _)
-              if findAliases(aggregateExpressions).intersect(conflictingAttributes).nonEmpty =>
-            (oldVersion, oldVersion.copy(aggregateExpressions = newAliases(aggregateExpressions)))
-
-          case oldVersion: Generate
-              if oldVersion.generatedSet.intersect(conflictingAttributes).nonEmpty =>
-            val newOutput = oldVersion.generatorOutput.map(_.newInstance())
-            (oldVersion, oldVersion.copy(generatorOutput = newOutput))
-
-          case oldVersion @ Window(_, windowExpressions, _, _, child)
-              if AttributeSet(windowExpressions.map(_.toAttribute)).intersect(conflictingAttributes)
-                .nonEmpty =>
-            (oldVersion, oldVersion.copy(windowExpressions = newAliases(windowExpressions)))
-        }
-        // Only handle first case, others will be fixed on the next pass.
-        .headOption match {
-          case None =>
-            /*
-             * No result implies that there is a logical plan node that produces new references
-             * that this rule cannot handle. When that is the case, there must be another rule
-             * that resolves these conflicts. Otherwise, the analysis will fail.
-             */
-            j
-          case Some((oldRelation, newRelation)) =>
-            val attributeRewrites = AttributeMap(oldRelation.output.zip(newRelation.output))
-            val newRight = right transformUp {
-              case r if r == oldRelation => newRelation
-            } transformUp {
-              case other => other transformExpressions {
-                case a: Attribute => attributeRewrites.get(a).getOrElse(a)
-              }
-            }
-            j.copy(right = newRight)
-        }
+      // To resolve duplicate expression IDs for all the BinaryNode
+      case b: BinaryNode if !b.duplicateResolved => b match {
+        case j @ Join(left, right, _, _) =>
+          j.copy(right = dedupRight(left, right))
+        case i @ Intersect(left, right) =>
+          i.copy(right = dedupRight(left, right))
+        case e @ Except(left, right) =>
+          e.copy(right = dedupRight(left, right))
+        case cg: CoGroup =>
--- End diff --

In this case, it should work! Let me know if we should deduplicate the expression IDs for the other operators. Thanks!
[GitHub] spark pull request: [SPARK-10620] Minor addendum to #10835
GitHub user andrewor14 opened a pull request:

https://github.com/apache/spark/pull/10958

[SPARK-10620] Minor addendum to #10835

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/andrewor14/spark task-metrics-to-accums-followups

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/10958.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #10958

commit 8be6c863097d4eef0ac1b03b94165b2e61f1df7d
Author: Andrew Or
Date: 2016-01-27T22:31:09Z

    Fix indentations, visibility, deprecation etc.

commit 9de795b67ed52068472bffcce119989efd4aed43
Author: Andrew Or
Date: 2016-01-27T22:32:47Z

    Merge branch 'master' of github.com:apache/spark into task-metrics-to-accums-followups

    Conflicts:
        core/src/main/scala/org/apache/spark/Accumulable.scala
[GitHub] spark pull request: [SPARK-10620] [SPARK-13054] Minor addendum to ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10958#issuecomment-175908753 **[Test build #50238 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50238/consoleFull)** for PR 10958 at commit [`6e4859d`](https://github.com/apache/spark/commit/6e4859d0aff3dbbd1c59e88101b7112610eb7d3c).
[GitHub] spark pull request: [SPARK-13050] [Build] Scalatest tags fail buil...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10954#issuecomment-175894464 **[Test build #50221 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50221/consoleFull)** for PR 10954 at commit [`6ab0ec9`](https://github.com/apache/spark/commit/6ab0ec9ce3748ce395885bcefeeacc4178e31d3d). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-13020][SQL][test] fix random generator ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10930#issuecomment-175900800 **[Test build #50233 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50233/consoleFull)** for PR 10930 at commit [`e627f5b`](https://github.com/apache/spark/commit/e627f5b96a21ccc748c75c7fa0a4c4839cdc63c5).
[GitHub] spark pull request: [SPARK-10620] Minor addendum to #10835
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10958#issuecomment-175903012 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-10620] Minor addendum to #10835
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10958#issuecomment-175902964 **[Test build #50234 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50234/consoleFull)** for PR 10958 at commit [`5404254`](https://github.com/apache/spark/commit/540425450ea0e5376d99f6ccb43857b74f34204e). * This patch **fails MiMa tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-10620] Minor addendum to #10835
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10958#issuecomment-175903014 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/50234/ Test FAILed.
[GitHub] spark pull request: [SPARK-13049] Add First/last with ignore nulls...
Github user hvanhovell commented on the pull request: https://github.com/apache/spark/pull/10957#issuecomment-175908190 A user is trying to get this working on 1.6 using the DataFrame API. That doesn't work directly because functions.scala lacks the functions implemented in this PR. The indirect approach using ```expr(...)``` doesn't work because ```WindowSpec``` does not support ```UnresolvedFunction```s. I guess this is more of a feature than a bug fix.
[GitHub] spark pull request: [SPARK-12963] Improve performance of stddev/va...
GitHub user davies opened a pull request: https://github.com/apache/spark/pull/10960 [SPARK-12963] Improve performance of stddev/variance As benchmarked and discussed here: https://github.com/apache/spark/pull/10786/files#r50038294, thanks to codegen, a declarative aggregate function can be much faster than an imperative one. This PR is based on #10944 You can merge this pull request into a Git repository by running: $ git pull https://github.com/davies/spark stddev Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/10960.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #10960 commit b4db00675bc3c51ddf8735cace522a5d771cf7e2 Author: Davies Liu Date: 2016-01-27T07:43:40Z cleanup whole stage codegen commit 70a7c7edd1988c7dd69bccc8e563c9943775bd2c Author: Davies Liu Date: 2016-01-27T23:22:33Z improve stddev and variance
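For context on why the declarative form wins: the aggregation state reduces to a few numeric columns (count, mean, second moment) updated and merged by plain arithmetic that a code generator can inline. The sketch below is an illustrative Python version of that update/merge algebra, not Spark code; the function names are made up for the example.

```python
def update(state, x):
    """Fold one value into the (count, mean, m2) state (Welford-style update)."""
    n, mean, m2 = state
    n += 1
    delta = x - mean
    mean += delta / n
    m2 += delta * (x - mean)
    return n, mean, m2

def merge(a, b):
    """Merge two partial states, as partitions would be combined."""
    n1, mean1, m2_1 = a
    n2, mean2, m2_2 = b
    n = n1 + n2
    delta = mean2 - mean1
    mean = mean1 + delta * n2 / n
    m2 = m2_1 + m2_2 + delta * delta * n1 * n2 / n
    return n, mean, m2

def variance(state):
    n, _, m2 = state
    return m2 / (n - 1)  # sample variance

state = (0, 0.0, 0.0)
for x in [1.0, 2.0, 3.0, 4.0]:
    state = update(state, x)
print(variance(state))  # sample variance of 1..4 is 5/3
```

Because `update` and `merge` are pure arithmetic over fixed-width state, they can be expressed as Catalyst expressions and compiled, avoiding the per-row virtual calls an imperative aggregate pays.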
[GitHub] spark pull request: [WIP][SPARK-12957][SQL] Initial support for co...
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/10844#discussion_r51068566

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicOperators.scala ---

@@ -146,6 +172,26 @@ case class Union(children: Seq[LogicalPlan]) extends LogicalPlan {
   val sizeInBytes = children.map(_.statistics.sizeInBytes).sum
   Statistics(sizeInBytes = sizeInBytes)
 }
+
+override def extractConstraintsFromChild(child: QueryPlan[LogicalPlan]): Set[Expression] = {
+  child.constraints.filter(_.references.subsetOf(child.outputSet))
+}
+
+def rewriteConstraints(
+    planA: LogicalPlan,
+    planB: LogicalPlan,
+    constraints: Set[Expression]): Set[Expression] = {
+  require(planA.output.size == planB.output.size)
+  val attributeRewrites = AttributeMap(planB.output.zip(planA.output))
+  constraints.map(_ transform {
+    case a: Attribute => attributeRewrites(a)
+  })
+}
+
+override def constraints: Set[Expression] = {
+  children.map(child => rewriteConstraints(children.head, child,
+    extractConstraintsFromChild(child))).reduce(_ intersect _)
--- End diff --

same style nit
[GitHub] spark pull request: [SPARK-13029][ml] fix a logistic regression is...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10940#issuecomment-175911254 **[Test build #50227 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50227/consoleFull)** for PR 10940 at commit [`914cffc`](https://github.com/apache/spark/commit/914cffc6f0a9e0d847f486916ff89941c55c63ce). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-13045][SQL] Remove ColumnVector.Struct ...
Github user davies commented on the pull request: https://github.com/apache/spark/pull/10952#issuecomment-175911147 Merging this into master, thanks!
[GitHub] spark pull request: [SPARK-10810] [SPARK-10902] [SQL] Improve sess...
Github user Neuw84 commented on the pull request: https://github.com/apache/spark/pull/8909#issuecomment-175916186 @deenar, I found it after many hours of reading the code and the web docs. That said, I think that the HiveSparkContext should implement the same logic as the SparkContext, where you can get the same session programmatically. Thanks by the way.
[GitHub] spark pull request: [Spark-12732][ML] bug fix in linear regression...
Github user dbtsai commented on a diff in the pull request: https://github.com/apache/spark/pull/10702#discussion_r51070506

--- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala ---

@@ -219,33 +219,43 @@ class LinearRegression @Since("1.3.0") (@Since("1.3.0") override val uid: String
 }
 val yMean = ySummarizer.mean(0)
-val yStd = math.sqrt(ySummarizer.variance(0))
-
-// If the yStd is zero, then the intercept is yMean with zero coefficient;
-// as a result, training is not needed.
-if (yStd == 0.0) {
-  logWarning(s"The standard deviation of the label is zero, so the coefficients will be " +
-    s"zeros and the intercept will be the mean of the label; as a result, " +
-    s"training is not needed.")
-  if (handlePersistence) instances.unpersist()
-  val coefficients = Vectors.sparse(numFeatures, Seq())
-  val intercept = yMean
-
-  val model = new LinearRegressionModel(uid, coefficients, intercept)
-  // Handle possible missing or invalid prediction columns
-  val (summaryModel, predictionColName) = model.findSummaryModelAndPredictionCol()
-
-  val trainingSummary = new LinearRegressionTrainingSummary(
-    summaryModel.transform(dataset),
-    predictionColName,
-    $(labelCol),
-    model,
-    Array(0D),
-    $(featuresCol),
-    Array(0D))
-  return copyValues(model.setSummary(trainingSummary))
+val rawYStd = math.sqrt(ySummarizer.variance(0))
+if (rawYStd == 0.0) {
+  if ($(fitIntercept)) {
+    // If the rawYStd is zero and fitIntercept=true, then the intercept is yMean with
+    // zero coefficient; as a result, training is not needed.
+    logWarning(s"The standard deviation of the label is zero, so the coefficients will be " +
+      s"zeros and the intercept will be the mean of the label; as a result, " +
+      s"training is not needed.")
+    if (handlePersistence) instances.unpersist()
+    val coefficients = Vectors.sparse(numFeatures, Seq())
+    val intercept = yMean
+
+    val model = new LinearRegressionModel(uid, coefficients, intercept)
+    // Handle possible missing or invalid prediction columns
+    val (summaryModel, predictionColName) = model.findSummaryModelAndPredictionCol()
+
+    val trainingSummary = new LinearRegressionTrainingSummary(
+      summaryModel.transform(dataset),
+      predictionColName,
+      $(labelCol),
+      model,
+      Array(0D),
+      $(featuresCol),
+      Array(0D))
+    return copyValues(model.setSummary(trainingSummary))
+  } else {
+    require(!($(regParam) > 0.0 && $(standardization)),
+      "The standard deviation of the label is zero. " +
+        "Model cannot be regularized with standardization=true")
+    logWarning(s"The standard deviation of the label is zero. " +
+      "Consider setting fitIntercept=true.")
+  }
 }
+// if y is constant (rawYStd is zero), then y cannot be scaled. In this case
+// setting yStd=1.0 ensures that y is not scaled anymore in l-bfgs algorithm.
+val yStd = if (rawYStd > 0) rawYStd else 1.0
--- End diff --

Actually, in the case of `yMean == 0.0 && yStd == 0.0`, the coefficients will be all zeros as well, even when `fitIntercept == false`. This is a rare case, so we can let model training figure it out. But if you want to handle it explicitly, that's great.
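A toy sketch of the branch under discussion, in illustrative Python rather than the Spark implementation: with a constant label and fitIntercept=true the optimum is known in closed form (zero coefficients, intercept equal to the label mean), so training can be skipped; otherwise the label standard deviation is clamped to 1.0 so label scaling becomes a no-op instead of a division by zero. All names here are made up for the example.

```python
import math

def fit_constant_label(y, fit_intercept, num_features):
    """Return (coefficients, intercept) in the degenerate case, else (None, y_std)."""
    y_mean = sum(y) / len(y)
    raw_y_std = math.sqrt(sum((v - y_mean) ** 2 for v in y) / len(y))
    if raw_y_std == 0.0 and fit_intercept:
        # Closed-form optimum: zero coefficients, intercept = label mean.
        return [0.0] * num_features, y_mean
    # Clamp so the solver never divides by a zero label std.
    y_std = raw_y_std if raw_y_std > 0 else 1.0
    return None, y_std  # fall through to the real optimizer

coeffs, intercept = fit_constant_label([3.0, 3.0, 3.0], True, 2)
print(coeffs, intercept)  # [0.0, 0.0] 3.0
```

This mirrors dbtsai's point: when the label is constant, the least-squares objective is minimized by predicting the mean, so no iterative training is needed.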
[GitHub] spark pull request: [SPARK-13049] Add First/last with ignore nulls...
Github user hvanhovell commented on the pull request: https://github.com/apache/spark/pull/10957#issuecomment-175881244 @yhuai ```expr("last(r, true)")``` would return an ```UnresolvedFunction(UnresolvedAttribute(r), Literal(true))```. The problem is that the ```WindowSpec``` does not recognize ```UnresolvedFunction```s. This is the cleaner fix. We could also add a match to the ```WindowSpec``` function for unresolved functions.
[GitHub] spark pull request: [SPARK-12177] [STREAMING] Update KafkaDStreams...
Github user markgrover commented on the pull request: https://github.com/apache/spark/pull/10953#issuecomment-175887290 OK, I have no idea what MiMa is, but I will take a look, try to run the tests locally, and fix the issues.
[GitHub] spark pull request: [SPARK-13020][SQL][test] fix random generator ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10930#issuecomment-175892312 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-13020][SQL][test] fix random generator ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10930#issuecomment-175896643 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-13020][SQL][test] fix random generator ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10930#issuecomment-175896645 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/50205/ Test FAILed.
[GitHub] spark pull request: [SPARK-13049] Add First/last with ignore nulls...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10957#issuecomment-175896538 **[Test build #50231 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50231/consoleFull)** for PR 10957 at commit [`defcc02`](https://github.com/apache/spark/commit/defcc02a8885e884d5140b11705b764a51753162).
[GitHub] spark pull request: [SPARK-12895][SPARK-12896] Migrate TaskMetrics...
Github user andrewor14 commented on a diff in the pull request: https://github.com/apache/spark/pull/10835#discussion_r51062999

--- Diff: core/src/main/scala/org/apache/spark/Accumulator.scala ---

@@ -75,43 +84,65 @@ private[spark] object Accumulators extends Logging {
 * This global map holds the original accumulator objects that are created on the driver.
 * It keeps weak references to these objects so that accumulators can be garbage-collected
 * once the RDDs and user-code that reference them are cleaned up.
+ * TODO: Don't use a global map; these should be tied to a SparkContext at the very least.
--- End diff --

https://issues.apache.org/jira/browse/SPARK-13051
[GitHub] spark pull request: SPARK-13052 waitingApps metric doesn't show th...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10959#issuecomment-175904859 Can one of the admins verify this patch?
[GitHub] spark pull request: [SPARK-13049] Add First/last with ignore nulls...
Github user hvanhovell commented on the pull request: https://github.com/apache/spark/pull/10957#issuecomment-175881484 retest this please
[GitHub] spark pull request: [SPARK-13031] [SQL] cleanup codegen and improv...
Github user davies commented on the pull request: https://github.com/apache/spark/pull/10944#issuecomment-175895315 Here is the generated code for `sqlContext.range(values).filter("(id & 1) = 1").count()`

```
/* 001 */
/* 002 */ public Object generate(Object[] references) {
/* 003 */   return new GeneratedIterator(references);
/* 004 */ }
/* 005 */
/* 006 */ class GeneratedIterator extends org.apache.spark.sql.execution.BufferedRowIterator {
/* 007 */
/* 008 */   private Object[] references;
/* 009 */   private boolean TungstenAggregate_initAgg0;
/* 010 */   private boolean TungstenAggregate_bufIsNull1;
/* 011 */   private long TungstenAggregate_bufValue2;
/* 012 */   private boolean Range_initRange6;
/* 013 */   private long Range_partitionEnd7;
/* 014 */   private long Range_number8;
/* 015 */   private boolean Range_overflow9;
/* 016 */   private UnsafeRow TungstenAggregate_result29;
/* 017 */   private org.apache.spark.sql.catalyst.expressions.codegen.BufferHolder TungstenAggregate_holder30;
/* 018 */   private org.apache.spark.sql.catalyst.expressions.codegen.UnsafeRowWriter TungstenAggregate_rowWriter31;
/* 019 */
/* 020 */   private void initRange(int idx) {
/* 021 */     java.math.BigInteger index = java.math.BigInteger.valueOf(idx);
/* 022 */     java.math.BigInteger numSlice = java.math.BigInteger.valueOf(1L);
/* 023 */     java.math.BigInteger numElement = java.math.BigInteger.valueOf(209715200L);
/* 024 */     java.math.BigInteger step = java.math.BigInteger.valueOf(1L);
/* 025 */     java.math.BigInteger start = java.math.BigInteger.valueOf(0L);
/* 026 */
/* 027 */     java.math.BigInteger st = index.multiply(numElement).divide(numSlice).multiply(step).add(start);
/* 028 */     if (st.compareTo(java.math.BigInteger.valueOf(Long.MAX_VALUE)) > 0) {
/* 029 */       Range_number8 = Long.MAX_VALUE;
/* 030 */     } else if (st.compareTo(java.math.BigInteger.valueOf(Long.MIN_VALUE)) < 0) {
/* 031 */       Range_number8 = Long.MIN_VALUE;
/* 032 */     } else {
/* 033 */       Range_number8 = st.longValue();
/* 034 */     }
/* 035 */
/* 036 */     java.math.BigInteger end = index.add(java.math.BigInteger.ONE).multiply(numElement).divide(numSlice)
/* 037 */       .multiply(step).add(start);
/* 038 */     if (end.compareTo(java.math.BigInteger.valueOf(Long.MAX_VALUE)) > 0) {
/* 039 */       Range_partitionEnd7 = Long.MAX_VALUE;
/* 040 */     } else if (end.compareTo(java.math.BigInteger.valueOf(Long.MIN_VALUE)) < 0) {
/* 041 */       Range_partitionEnd7 = Long.MIN_VALUE;
/* 042 */     } else {
/* 043 */       Range_partitionEnd7 = end.longValue();
/* 044 */     }
/* 045 */   }
/* 046 */
/* 047 */
/* 048 */   private void TungstenAggregate_doAgg5() {
/* 049 */     // initialize aggregation buffer
/* 050 */     /* 0 */
/* 051 */
/* 052 */     TungstenAggregate_bufIsNull1 = false;
/* 053 */     TungstenAggregate_bufValue2 = 0L;
/* 054 */
/* 055 */
/* 056 */
/* 057 */     // initialize Range
/* 058 */     if (!Range_initRange6) {
/* 059 */       Range_initRange6 = true;
/* 060 */       if (input.hasNext()) {
/* 061 */         initRange(((InternalRow) input.next()).getInt(0));
/* 062 */       } else {
/* 063 */         return;
/* 064 */       }
/* 065 */     }
/* 066 */
/* 067 */     while (!Range_overflow9 && Range_number8 < Range_partitionEnd7) {
/* 068 */       long Range_value10 = Range_number8;
/* 069 */       Range_number8 += 1L;
/* 070 */       if (Range_number8 < Range_value10 ^ 1L < 0) {
/* 071 */         Range_overflow9 = true;
/* 072 */       }
/* 073 */
/* 074 */       /* ((input[0, bigint] & 1) = 1) */
/* 075 */       /* (input[0, bigint] & 1) */
/* 076 */       /* input[0, bigint] */
/* 077 */
/* 078 */       /* 1 */
/* 079 */
/* 080 */       long Filter_value14 = -1L;
/* 081 */       Filter_value14 = Range_value10 & 1L;
/* 082 */       /* 1 */
/* 083 */
/* 084 */       boolean Filter_value12 = false;
/* 085 */       Filter_value12 = Filter_value14 == 1L;
/* 086 */       if (!false && Filter_value12) {
/* 087 */
/* 088 */
/* 089 */
/* 090 */
/* 091 */         // do aggregate and update aggregation buffer
/* 092 */
/* 093 */         /* (input[0, bigint] + 1) */
/* 094 */         /* input[0, bigint] */
/* 095 */
/* 096 */         /* 1 */
/* 097 */
/* 098 */         long TungstenAggregate_value22 = -1L;
/* 099 */         TungstenAggregate_value22 = TungstenAggregate_bufValue2 + 1L;
/* 100 */         TungstenAggregate_bufIsNull1 = false;
/* 101 */         TungstenAggregate_bufValue2 = TungstenAggregate_value22;
/* 102 */
/* 103 */
/* 104 */
/* 105 */
```
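A hand-written Python analogue of what the generated code above does (illustrative only, not Spark code): instead of chaining Range, Filter, and Aggregate iterators with a virtual call per row, whole-stage codegen fuses the pipeline into one tight loop with the filter predicate and the count update inlined.

```python
def fused_count(start, end):
    """Fused equivalent of range(start, end).filter(id & 1 == 1).count()."""
    count = 0
    i = start
    while i < end:          # the Range producer, inlined
        value = i
        i += 1
        if value & 1 == 1:  # the filter("(id & 1) = 1") predicate, inlined
            count += 1      # the count() aggregate buffer update, inlined
    return count

print(fused_count(0, 10))  # 5 odd values in [0, 10)
```

The generated Java follows the same shape: one `while` loop over the range, a branch for the filter, and a single long buffer (`TungstenAggregate_bufValue2`) incremented in place.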
[GitHub] spark pull request: [SPARK-13020][SQL][test] fix random generator ...
Github user cloud-fan commented on the pull request: https://github.com/apache/spark/pull/10930#issuecomment-175895328 retest this please
[GitHub] spark pull request: [SPARK-13029][ml] fix a logistic regression is...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10940#issuecomment-175898182 **[Test build #50229 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50229/consoleFull)** for PR 10940 at commit [`914cffc`](https://github.com/apache/spark/commit/914cffc6f0a9e0d847f486916ff89941c55c63ce). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-6363][BUILD] Make Scala 2.11 the defaul...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10608#issuecomment-175907303 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-6363][BUILD] Make Scala 2.11 the defaul...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10608#issuecomment-175907259 **[Test build #50236 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50236/consoleFull)** for PR 10608 at commit [`18c5223`](https://github.com/apache/spark/commit/18c5223bef0330085f0f577fea49581aa82e2ca1). * This patch **fails MiMa tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-12926][SQL] SQLContext to disallow user...
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/10849#issuecomment-175907368 Please update the title and description (these become the commit message when merging).
[GitHub] spark pull request: [SPARK-6363][BUILD] Make Scala 2.11 the defaul...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10608#issuecomment-175907305 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/50236/ Test FAILed.
[GitHub] spark pull request: [SPARK-13045][SQL] Remove ColumnVector.Struct ...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/10952#issuecomment-175909089 cc @davies
[GitHub] spark pull request: [SPARK-13045][SQL] Remove ColumnVector.Struct ...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/10952
[GitHub] spark pull request: [SPARK-12963] Improve performance of stddev/va...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10960#issuecomment-175914531 **[Test build #50240 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50240/consoleFull)** for PR 10960 at commit [`61edd5e`](https://github.com/apache/spark/commit/61edd5e3a2c030d7387db5283eee5ada13553505).
[GitHub] spark pull request: [SPARK-13043][SQL] Implement remaining catalys...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10961#issuecomment-175917592 **[Test build #50241 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50241/consoleFull)** for PR 10961 at commit [`24ca13c`](https://github.com/apache/spark/commit/24ca13c7f8b2ac5fbc4a9600539bb02d22b56a91).
[GitHub] spark pull request: [SPARK-12988][SQL] Can't drop columns that con...
Github user dilipbiswal commented on the pull request: https://github.com/apache/spark/pull/10943#issuecomment-175917517 @cloud-fan Thank you Wenchen.
[GitHub] spark pull request: [SPARK-11955][SQL] Mark optional fields in mer...
Github user viirya commented on the pull request: https://github.com/apache/spark/pull/9940#issuecomment-175927505 ping @liancheng Please check whether the latest updates look good to you. Thanks.
[GitHub] spark pull request: SPARK-11565 Replace deprecated DigestUtils.sha...
Github user gliptak commented on the pull request: https://github.com/apache/spark/pull/9532#issuecomment-175927249 Github is hiccuping ...
[GitHub] spark pull request: [SPARK-12567][SQL] Add aes_{encrypt,decrypt} U...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10527#issuecomment-175927638 **[Test build #50243 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50243/consoleFull)** for PR 10527 at commit [`9476822`](https://github.com/apache/spark/commit/94768224126acf303e9e8b6d2697388f0fec1d23).
[GitHub] spark pull request: [SPARK-12968][SQL] Implement command to set cu...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/10916#discussion_r51073364
--- Diff: sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/CliSuite.scala ---
@@ -183,7 +183,7 @@ class CliSuite extends SparkFunSuite with BeforeAndAfterAll with Logging {
       "CREATE DATABASE hive_test_db;" -> "OK",
-      "USE hive_test_db;" -> "OK",
+      "USE hive_test_db;" -> "",
--- End diff --
Returning OK would break the Hive compatibility test; I tried that in previous commits.
[GitHub] spark pull request: [SPARK-12995][GraphX] Remove deprecate APIs fr...
Github user maropu commented on the pull request: https://github.com/apache/spark/pull/10918#issuecomment-175927636 @ankurdave @srowen ping
[GitHub] spark pull request: [SPARK-12966][SQL] Support ArrayType(DecimalTy...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10898#issuecomment-175978667 **[Test build #50251 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50251/consoleFull)** for PR 10898 at commit [`52eaebe`](https://github.com/apache/spark/commit/52eaebea0cf2650ee1aff4c0eb2d7dfd706d655b).
[GitHub] spark pull request: [SPARK-12968][SQL] Implement command to set cu...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/10916#discussion_r51082907
--- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/execution/HiveQuerySuite.scala ---
@@ -1262,6 +1263,21 @@ class HiveQuerySuite extends HiveComparisonTest with BeforeAndAfter {
   }
+
+  test("use database") {
+    val currentDatabase = sql("select current_database()").first().getString(0)
+
+    sql("CREATE DATABASE hive_test_db")
+    sql("USE hive_test_db")
+    assert("hive_test_db" == sql("select current_database()").first().getString(0))
--- End diff --
Do we already have database support in `SQLContext`?
[GitHub] spark pull request: [SPARK-13057][SQL] Add benchmark codes and the...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10965#issuecomment-176000161 **[Test build #50254 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50254/consoleFull)** for PR 10965 at commit [`b3bf70c`](https://github.com/apache/spark/commit/b3bf70c810baefaa6fb374d6b8052341b847e0d7).
[GitHub] spark pull request: [SPARK-11780][SQL] Add catalyst type aliases b...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10915#issuecomment-176007544 **[Test build #50256 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50256/consoleFull)** for PR 10915 at commit [`c338300`](https://github.com/apache/spark/commit/c3383003dff0c6c49849dad89da7a3fac906cab5).
[GitHub] spark pull request: [SPARK-12968][SQL] Implement command to set cu...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/10916#discussion_r51073555
--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/client/ClientInterface.scala ---
@@ -109,6 +109,9 @@ private[hive] trait ClientInterface {
   /** Returns the name of the active database. */
   def currentDatabase: String
--- End diff --
Yeah, I don't think we need to address database support for all catalogs in this PR.
[GitHub] spark pull request: [SPARK-12968][SQL] Implement command to set cu...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/10916#discussion_r51073506
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Catalog.scala ---
@@ -46,6 +46,10 @@ trait Catalog {
   def lookupRelation(tableIdent: TableIdentifier, alias: Option[String] = None): LogicalPlan
+
+  def setCurrentDatabase(databaseName: String): Unit = {
+    throw new UnsupportedOperationException
--- End diff --
Not every catalog supports the database concept, so an inheriting catalog can choose whether to implement this.
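The opt-in pattern discussed above (a base type whose default implementation fails, so that only subclasses with the capability override it) can be sketched outside Scala. The class names below are illustrative stand-ins, not Spark's actual classes:

```python
class Catalog:
    """Base catalog: database support is optional, mirroring the trait's failing default."""

    def set_current_database(self, database_name: str) -> None:
        # Catalogs without a database concept inherit this default and fail loudly.
        raise NotImplementedError("this catalog does not support databases")


class DatabaseAwareCatalog(Catalog):
    """A catalog that opts in by overriding the default."""

    def __init__(self) -> None:
        self.current_database = "default"

    def set_current_database(self, database_name: str) -> None:
        self.current_database = database_name
```

Callers that need database support get a clear runtime error on catalogs that never opted in, instead of silently wrong behavior.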
[GitHub] spark pull request: [SPARK-12567][SQL] Add aes_{encrypt,decrypt} U...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10527#issuecomment-175932469 Build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-12567][SQL] Add aes_{encrypt,decrypt} U...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10527#issuecomment-175932477 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/50245/
[GitHub] spark pull request: [SPARK-12792][SPARKR] Refactor RRDD to support...
Github user shivaram commented on the pull request: https://github.com/apache/spark/pull/10947#issuecomment-175946838 cc @davies. Thanks @sunrui for the PR. I'll review this later today.
[GitHub] spark pull request: [SPARK-10521][SQL] Utilize Docker for test DB2...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/9893#issuecomment-175955462 **[Test build #50247 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50247/consoleFull)** for PR 9893 at commit [`ab6d601`](https://github.com/apache/spark/commit/ab6d6016a9edc42bc5fae3eebff63fca518912d8).
* This patch **fails build dependency tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-10521][SQL] Utilize Docker for test DB2...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9893#issuecomment-175955469 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/50247/
[GitHub] spark pull request: [SPARK-10521][SQL] Utilize Docker for test DB2...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9893#issuecomment-175955464 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-10873] Support column sort and search f...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/10648#issuecomment-176012531 @tgravescs @zhuoliu are you guys interested in more UI work? I have some ideas that I never found the time or people to work on; I think they would make the UI a lot more useful.
[GitHub] spark pull request: [SPARK-13031] [SQL] cleanup codegen and improv...
Github user davies commented on the pull request: https://github.com/apache/spark/pull/10944#issuecomment-176013043 @nongli Does this one look good to you? It blocks others.
[GitHub] spark pull request: [SPARK-12689][SQL] Migrate DDL parsing to the ...
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/10723#discussion_r51074963
--- Diff: sql/catalyst/src/main/antlr3/org/apache/spark/sql/catalyst/parser/SparkSqlLexer.g ---
@@ -465,7 +467,7 @@ Identifier
 fragment
 QuotedIdentifier
     :
-    '`' ( '``' | ~('`') )* '`' { setText(getText().substring(1, getText().length() - 1).replaceAll("``", "`")); }
+    '`' ( '``' | ~('`') )* '`' { setText(getText().replaceAll("``", "`")); }
--- End diff --
The above query will get a ParseException: mismatched character '' expecting '`'.
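For reference, the string handling in the original lexer action — strip the enclosing backticks, then collapse each escaped `` pair into a single backtick — can be sketched as a standalone function. This is an illustration of that behavior, not Spark code; the function name is made up:

```python
def unquote_identifier(token: str) -> str:
    """Mirror the original lexer action: drop the enclosing backticks,
    then turn each escaped pair `` into a literal backtick."""
    if len(token) < 2 or not (token.startswith("`") and token.endswith("`")):
        raise ValueError("expected a backtick-quoted identifier")
    # substring(1, length - 1) in the Java action == token[1:-1] here
    return token[1:-1].replace("``", "`")
```

The removed `substring` call is what dropped the surrounding backticks; applying only `replaceAll` leaves them in the identifier text, which is why the change is being questioned in the review.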
[GitHub] spark pull request: [SPARK-10521][SQL] Utilize Docker for test DB2...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/9893#issuecomment-175955168 **[Test build #50247 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/50247/consoleFull)** for PR 9893 at commit [`ab6d601`](https://github.com/apache/spark/commit/ab6d6016a9edc42bc5fae3eebff63fca518912d8).
[GitHub] spark pull request: [SPARK-10873] Support column sort and search f...
Github user zhuoliu commented on the pull request: https://github.com/apache/spark/pull/10648#issuecomment-175966503 Hi @tgravescs, finally fixed the paging stuff in RowsGrouping. :)