[GitHub] spark pull request: upgrade joda-time: 2.9 -> 2.9.2
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11847#issuecomment-198853307 Can one of the admins verify this patch? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: upgrade joda-time: 2.9 -> 2.9.2
GitHub user sullis opened a pull request:

    https://github.com/apache/spark/pull/11847

    upgrade joda-time: 2.9 -> 2.9.2

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/sullis/spark joda-time-2.9.2

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/11847.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #11847

commit 50a67cbfbcdd59dd97551d50ce930e6f4f32550c
Author: Sean Sullivan
Date:   2016-03-20T05:32:24Z

    upgrade joda-time: 2.9 -> 2.9.2
[GitHub] spark pull request: [SPARK-13289][MLLIB] Fix infinite distances be...
Github user flyjy commented on the pull request: https://github.com/apache/spark/pull/11812#issuecomment-198852800 Thanks. I have checked that the problem still exists with only the adaptive learning rate change, so I will fix this bug without changing the existing interface. I think the score should be between 0 and 1, based on the definition of cosine similarity.
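The cosine-similarity definition the comment relies on can be sketched outside Spark as follows (a minimal standalone Scala sketch, not Spark's Word2Vec code; all names are illustrative):

```scala
object CosineSimilaritySketch {
  // Cosine similarity: dot(a, b) / (||a|| * ||b||), defined for non-zero vectors.
  def cosineSimilarity(a: Array[Double], b: Array[Double]): Double = {
    require(a.length == b.length, "vectors must have the same dimension")
    val dot = a.zip(b).map { case (x, y) => x * y }.sum
    val normA = math.sqrt(a.map(x => x * x).sum)
    val normB = math.sqrt(b.map(x => x * x).sum)
    dot / (normA * normB)
  }
}
```

By Cauchy-Schwarz this value always lies in [-1, 1]; it lies in [0, 1] when both vectors have only non-negative components, which is the range the comment expects for the similarity score.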
[GitHub] spark pull request: [SPARK-13764][SQL] Parse modes in JSON data so...
Github user cloud-fan commented on the pull request: https://github.com/apache/spark/pull/11756#issuecomment-197629010 LGTM, cc @davies for another look.
[GitHub] spark pull request: [SPARK-12469][CORE][WIP/RFC] Consistent accumu...
Github user holdenk commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11105#discussion_r56426605

--- Diff: core/src/main/scala/org/apache/spark/Accumulable.scala ---
@@ -146,6 +212,32 @@ class Accumulable[R, T] private (
   def merge(term: R) { value_ = param.addInPlace(value_, term) }

   /**
+   * Merge in pending updates for a consistent accumulator, or merge accumulated values for
+   * regular accumulators. This is only called on the driver when merging task results together.
+   */
+  private[spark] def internalMerge(term: Any) {
+    if (!consistent) {
+      merge(term.asInstanceOf[R])
+    } else {
+      mergePending(term.asInstanceOf[mutable.HashMap[(Int, Int, Int), R]])
+    }
+  }
+
+  /**
+   * Merge another Accumulable's pending updates, checking that each pending update has
+   * not already been processed before updating.
+   */
+  private[spark] def mergePending(term: mutable.HashMap[(Int, Int, Int), R]) = {
+    term.foreach { case ((rddId, shuffleId, splitId), v) =>
+      val splits = processed.getOrElseUpdate((rddId, shuffleId), new mutable.BitSet())
+      if (!splits.contains(splitId)) {
+        splits += splitId
+        value_ = param.addInPlace(value_, v)
+      }
--- End diff --

Sure, we could do that. I'd kept the separate `processed` since I thought the space efficiency of a bitset might be worth it, and it seemed like it might be more confusing to have one val with two different meanings between driver and worker.
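The dedup-on-merge pattern discussed in this diff can be sketched outside Spark as follows (a simplified standalone Scala sketch assuming long-sum accumulators; `ConsistentSum` is an illustrative name, and `processed` plays the same role as in the diff above):

```scala
import scala.collection.mutable

// Simplified model of a consistent accumulator: pending updates are keyed by
// (rddId, shuffleId, splitId), and a BitSet per (rddId, shuffleId) records
// which splits have already been applied, so retried tasks do not double-count.
class ConsistentSum {
  private var value: Long = 0L
  private val processed = mutable.HashMap.empty[(Int, Int), mutable.BitSet]

  def current: Long = value

  def mergePending(pending: mutable.HashMap[(Int, Int, Int), Long]): Unit = {
    pending.foreach { case ((rddId, shuffleId, splitId), v) =>
      val splits = processed.getOrElseUpdate((rddId, shuffleId), new mutable.BitSet())
      if (!splits.contains(splitId)) { // skip splits that were already counted
        splits += splitId
        value += v
      }
    }
  }
}
```

Merging the same pending map twice (as a task retry would) leaves the total unchanged, which is the property the BitSet buys; the BitSet also stays compact because split IDs are small dense integers.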
[GitHub] spark pull request: [SPARK-13988][Core] Make replaying event logs ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11800#issuecomment-198112103 Can one of the admins verify this patch?
[GitHub] spark pull request: [SPARK-13805][SQL] Generate code that get a va...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11636#issuecomment-198082685 **[Test build #53468 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53468/consoleFull)** for PR 11636 at commit [`5efadf3`](https://github.com/apache/spark/commit/5efadf3f159b40a04621832facbccb99cb4b2c5c).
[GitHub] spark pull request: [SPARK-12182][ML] Distributed binning for tree...
Github user jkbradley commented on a diff in the pull request:

    https://github.com/apache/spark/pull/10231#discussion_r56441090

--- Diff: mllib/src/main/scala/org/apache/spark/ml/tree/impl/RandomForest.scala ---
@@ -842,60 +842,59 @@ private[ml] object RandomForest extends Logging {
         1.0
       }
       logDebug("fraction of data used for calculating quantiles = " + fraction)
-      input.sample(withReplacement = false, fraction, new XORShiftRandom(seed).nextInt()).collect()
+      input.sample(withReplacement = false, fraction, new XORShiftRandom(seed).nextInt())
     } else {
-      new Array[LabeledPoint](0)
+      input.sparkContext.emptyRDD[LabeledPoint]
     }
-    val splits = new Array[Array[Split]](numFeatures)
-
-    // Find all splits.
-    // Iterate over all features.
-    var featureIndex = 0
-    while (featureIndex < numFeatures) {
-      if (metadata.isContinuous(featureIndex)) {
-        val featureSamples = sampledInput.map(_.features(featureIndex))
-        val featureSplits = findSplitsForContinuousFeature(featureSamples, metadata, featureIndex)
+    findSplitsBinsBySorting(sampledInput, metadata, continuousFeatures)
+  }

-        val numSplits = featureSplits.length
-        logDebug(s"featureIndex = $featureIndex, numSplits = $numSplits")
-        splits(featureIndex) = new Array[Split](numSplits)
+  private def findSplitsBinsBySorting(
+      input: RDD[LabeledPoint],
+      metadata: DecisionTreeMetadata,
+      continuousFeatures: IndexedSeq[Int]): Array[Array[Split]] = {
+
+    val continuousSplits = {
+      // reduce the parallelism for split computations when there are less
+      // continuous features than input partitions. this prevents tasks from
+      // being spun up that will definitely do no work.
+      val numPartitions = math.min(continuousFeatures.length, input.partitions.length)
+
+      input
+        .flatMap(point => continuousFeatures.map(idx => (idx, point.features(idx))))
+        .groupByKey(numPartitions)
+        .map { case (idx, samples) =>
+          val thresholds = findSplitsForContinuousFeature(samples.toArray, metadata, idx)
+          val splits: Array[Split] = thresholds.map(thresh => new ContinuousSplit(idx, thresh))
+          logDebug(s"featureIndex = $idx, numSplits = ${splits.length}")
+          (idx, splits)
+        }.collectAsMap()
+    }

-        var splitIndex = 0
-        while (splitIndex < numSplits) {
-          val threshold = featureSplits(splitIndex)
-          splits(featureIndex)(splitIndex) = new ContinuousSplit(featureIndex, threshold)
-          splitIndex += 1
-        }
-      } else {
-        // Categorical feature
-        if (metadata.isUnordered(featureIndex)) {
-          val numSplits = metadata.numSplits(featureIndex)
-          val featureArity = metadata.featureArity(featureIndex)
-          // TODO: Use an implicit representation mapping each category to a subset of indices.
-          // I.e., track indices such that we can calculate the set of bins for which
-          // feature value x splits to the left.
-          // Unordered features
-          // 2^(maxFeatureValue - 1) - 1 combinations
-          splits(featureIndex) = new Array[Split](numSplits)
-          var splitIndex = 0
-          while (splitIndex < numSplits) {
-            val categories: List[Double] =
-              extractMultiClassCategories(splitIndex + 1, featureArity)
-            splits(featureIndex)(splitIndex) =
-              new CategoricalSplit(featureIndex, categories.toArray, featureArity)
-            splitIndex += 1
-          }
-        } else {
-          // Ordered features
-          // Bins correspond to feature values, so we do not need to compute splits or bins
-          // beforehand. Splits are constructed as needed during training.
-          splits(featureIndex) = new Array[Split](0)
+    val numFeatures = metadata.numFeatures
+    val splits = Array.tabulate(numFeatures) {
+      case i if metadata.isContinuous(i) =>
+        val split = continuousSplits(i)
+        metadata.setNumSplits(i, split.length)
+        split
+
+      case i if metadata.isCategorical(i) && metadata.isUnordered(i) =>
+        // Unordered features
+        // 2^(maxFeatureValue - 1) - 1 combinations
+        val featureArity = metadata.featureArity(i)
+        Array.tabulate[Split](metadata.numSplits(i)) { splitIndex =>
+          val categories = extractMultiClassCategories(splitIndex + 1, featureArity)
+          new CategoricalSplit(i, categories.toArray, featureArity)
+        }
-      }
-
[GitHub] spark pull request: [SPARK-13958]Executor OOM due to unbounded gro...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11794#issuecomment-198535076 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/53550/ Test PASSed.
[GitHub] spark pull request: [SPARK-13761] [ML] Deprecate validateParams
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11620#discussion_r56513463

--- Diff: mllib/src/main/scala/org/apache/spark/ml/param/params.scala ---
@@ -549,7 +548,9 @@ trait Params extends Identifiable with Serializable {
    * Parameter value checks which do not depend on other parameters are handled by
    * [[Param.validate()]]. This method does not handle input/output column parameters;
    * those are checked during schema validation.
+   * @deprecated Will be removed in 2.1.0. All the checks should be merged into transformSchema
    */
+  @deprecated("Will be removed in 2.1.0. Checks should be merged into transformSchema.", "2.0.0")
--- End diff --

It looks like this now causes a number of deprecation warnings in the Spark code, which we're trying to get rid of. Can most of the remaining usages be transformed to not use this method?
[GitHub] spark pull request: [SPARK-13764][SQL] Parse modes in JSON data so...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11756#issuecomment-198214354 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-13974][SQL] sub-query names do not need...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11783#issuecomment-197913617 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/53430/ Test FAILed.
[GitHub] spark pull request: [SPARK-13805][SQL] Generate code that get a va...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11636#issuecomment-198084216 **[Test build #53468 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53468/consoleFull)** for PR 11636 at commit [`5efadf3`](https://github.com/apache/spark/commit/5efadf3f159b40a04621832facbccb99cb4b2c5c).
 * This patch **fails to build**.
 * This patch merges cleanly.
 * This patch adds the following public classes _(experimental)_:
 * `case class InputReference(ordinal: Int, dataType: DataType, nullable: Boolean, isColumn: Boolean)`
[GitHub] spark pull request: [SPARK-13852][YARN]handle the InterruptedExcep...
Github user tgravescs commented on the pull request: https://github.com/apache/spark/pull/11692#issuecomment-197342124 So inside of Hadoop, in the getApplicationReport call, it was RetryInvocationHandler that was doing a sleep and got an InterruptedException. That ended up throwing a java.lang.reflect.UndeclaredThrowableException up to monitorApplication, which is why it was handled by the NonFatal catch. I need to look at it a bit closer.
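The wrapping described above, where a reflective proxy hides an InterruptedException inside an UndeclaredThrowableException, can be unwrapped by walking the cause chain. This is an illustrative Scala sketch of the general pattern, not the fix adopted in the PR:

```scala
import java.lang.reflect.UndeclaredThrowableException

object InterruptUnwrapSketch {
  // Walk the cause chain and report whether an InterruptedException is hiding
  // inside an UndeclaredThrowableException (or any other wrapper exception).
  def isInterrupted(t: Throwable): Boolean = t match {
    case null                              => false
    case _: InterruptedException           => true
    case ute: UndeclaredThrowableException => isInterrupted(ute.getCause)
    case other                             => isInterrupted(other.getCause)
  }
}
```

A caller that wants to treat interruption specially (e.g. to stop monitoring cleanly instead of logging a NonFatal error) can test the caught throwable with this helper before falling back to generic handling.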
[GitHub] spark pull request: [SPARK-13017][Docs] Replace example code in ml...
Github user keypointt commented on the pull request: https://github.com/apache/spark/pull/11142#issuecomment-197375803 cc @mengxr
[GitHub] spark pull request: [SPARK-13982][SparkR] Fixed features column he...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11793#issuecomment-198066744 **[Test build #53464 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53464/consoleFull)** for PR 11793 at commit [`48061de`](https://github.com/apache/spark/commit/48061de21addf5b021f874fa93cddb2882959042).
[GitHub] spark pull request: [SPARK-13808][test-maven] Don't build assembly...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/11701#issuecomment-197535055 Since all of this code is going to be changed heavily / removed after your final patch, I'm going to go ahead and just leave the Maven test path unchanged so that we can get this merged.
[GitHub] spark pull request: [SPARK-13997][SQL] Use Hadoop 2.0 default valu...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/11806#issuecomment-198207204 The efficiency of compression algorithms usually goes down as the frame (block) size goes down.
[GitHub] spark pull request: SPARK-13034[ML]:PySpark ml.classification supp...
Github user wangmiao1981 commented on the pull request: https://github.com/apache/spark/pull/11582#issuecomment-198064456 Closing this one, as it has been merged with #11707.
[GitHub] spark pull request: [SPARK-13904][Scheduler]Add support for plugga...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11723#issuecomment-198256400 **[Test build #53525 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53525/consoleFull)** for PR 11723 at commit [`ae808d7`](https://github.com/apache/spark/commit/ae808d73e022077dba6ad999627589eed4730270).
[GitHub] spark pull request: [SPARK-13950] [SQL] generate code for sort mer...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11771#issuecomment-197592210 **[Test build #53371 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53371/consoleFull)** for PR 11771 at commit [`99df29a`](https://github.com/apache/spark/commit/99df29a8c9b0bc7df7aef1a37b7c5c7bff7a1ff1).
 * This patch **fails to build**.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-14007] [SQL] Manage the memory used by ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11826#issuecomment-198481175 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/53547/ Test FAILed.
[GitHub] spark pull request: [SPARK-13981][SQL] Defer evaluating variables ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11792#issuecomment-198090295 **[Test build #53463 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53463/consoleFull)** for PR 11792 at commit [`29e408d`](https://github.com/apache/spark/commit/29e408d61f3557b1c5df343d039ef74d1c6e9ab3).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-13826][SQL] Revises Dataset ScalaDoc
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11769#issuecomment-197553484 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/53340/ Test PASSed.
[GitHub] spark pull request: [SPARK-13957] [SQL] Support Group By Ordinal i...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11846#issuecomment-198846521 **[Test build #53624 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53624/consoleFull)** for PR 11846 at commit [`79a537a`](https://github.com/apache/spark/commit/79a537aecdd788a80948aa22f61cca4901e8d0ee).
[GitHub] spark pull request: [SPARK-13914] [Scheduler] Add functionality to...
Github user paragpc closed the pull request at: https://github.com/apache/spark/pull/11736
[GitHub] spark pull request: [SPARK-13922][SQL] Filter rows with null attri...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11749#issuecomment-197556324 **[Test build #53354 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53354/consoleFull)** for PR 11749 at commit [`0688cf8`](https://github.com/apache/spark/commit/0688cf84958552132aaa8ada960b9c4880b437e6).
[GitHub] spark pull request: [SPARK-13976][SQL] do not remove sub-queries a...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11786#issuecomment-197955452 **[Test build #53435 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53435/consoleFull)** for PR 11786 at commit [`ee5a437`](https://github.com/apache/spark/commit/ee5a43739b895708c054558c62263a6a59aa4b1e).
[GitHub] spark pull request: [Minor][DOC] Fix nits in JavaStreamingTestExam...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11821#issuecomment-198317555 **[Test build #53532 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53532/consoleFull)** for PR 11821 at commit [`09ad928`](https://github.com/apache/spark/commit/09ad928e3f5efde847eae324b648bfed227c0f34).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark pull request: SPARK-13991 - Extend the enforcer plugin Maven...
Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11803#discussion_r56646795

--- Diff: pom.xml ---
@@ -1733,7 +1733,7 @@
-            ${maven.version}
+            [3.3,)
--- End diff --

Actually, that's what the existing specification already means: https://maven.apache.org/enforcer/enforcer-rules/versionRanges.html
[GitHub] spark pull request: [SPARK-13923] [SQL] Implement SessionCatalog
Github user yhuai commented on a diff in the pull request:

    https://github.com/apache/spark/pull/11750#discussion_r56422492

--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/interface.scala ---
@@ -211,8 +214,7 @@ case class CatalogTablePartition(
  * future once we have a better understanding of how we want to handle skewed columns.
  */
 case class CatalogTable(
-    specifiedDatabase: Option[String],
-    name: String,
+    name: TableIdentifier,
--- End diff --

Maybe we can rename this `name` later.
[GitHub] spark pull request: [SPARK-13808][test-maven] Don't build assembly...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11701#issuecomment-197467017 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-13908][SQL] Add a LocalLimit for Collec...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11817#issuecomment-198414800 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/53539/ Test PASSed.
[GitHub] spark pull request: [SPARK-13826][SQL] Addendum: update documentat...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11814#issuecomment-198239537 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/53509/ Test PASSed.
[GitHub] spark pull request: [SPARK-13937][PySpark][ML] Change JavaWrapper ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11767#issuecomment-197490633 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/53334/ Test PASSed.
[GitHub] spark pull request: [SPARK-13430][PySpark][ML] Python API for trai...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11621#issuecomment-197492294 **[Test build #53335 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53335/consoleFull)** for PR 11621 at commit [`460881c`](https://github.com/apache/spark/commit/460881cffcb9b6bce35b822e4a325352074d). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-13629] [ML] Add binary toggle Param to ...
Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/11536#issuecomment-198505962 @hhbyyh Thanks for the PR! LGTM, I agree with the use of minTF
[GitHub] spark pull request: [SPARK-13926] Automatically use Kryo serialize...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11755#issuecomment-197560970 **[Test build #53353 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53353/consoleFull)** for PR 11755 at commit [`45b0c0b`](https://github.com/apache/spark/commit/45b0c0be3791e518f0a8783951ad9b9e53196a1e). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * ` class DecisionTreeClassificationModelWriter(instance: DecisionTreeClassificationModel)` * ` class DecisionTreeRegressionModelWriter(instance: DecisionTreeRegressionModel)` * ` case class SplitData(` * ` case class NodeData(` * `class Estimator(Params):` * `class Transformer(Params):` * `class Model(Transformer):` * `class LogisticRegressionModel(JavaModel, MLWritable, MLReadable):` * `class NaiveBayesModel(JavaModel, MLWritable, MLReadable):` * `class PipelineMLWriter(JavaMLWriter, JavaWrapper):` * `class PipelineMLReader(JavaMLReader):` * `class PipelineModelMLWriter(JavaMLWriter, JavaWrapper):` * `class PipelineModelMLReader(JavaMLReader):` * ` case class SQLTable(`
[GitHub] spark pull request: [SPARK-13449] Naive Bayes wrapper in SparkR
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11486#issuecomment-198845476 **[Test build #53622 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53622/consoleFull)** for PR 11486 at commit [`b4ee1aa`](https://github.com/apache/spark/commit/b4ee1aab70008919ba17cf02c8470f1a75c23ef8). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-13449] Naive Bayes wrapper in SparkR
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11486#issuecomment-198845488 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/53622/ Test PASSed.
[GitHub] spark pull request: [SPARK-13449] Naive Bayes wrapper in SparkR
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11486#issuecomment-198845487 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-13826][SQL] Addendum: update documentat...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/11814
[GitHub] spark pull request: [SPARK-13993][PySpark] Add pyspark Rformula/Rf...
Github user yinxusen commented on the pull request: https://github.com/apache/spark/pull/11807#issuecomment-198147707 @jkbradley This is a follow-up for https://github.com/apache/spark/pull/9884
[GitHub] spark pull request: [SPARK-13742][Core] Add non-iterator interface...
Github user holdenk commented on a diff in the pull request: https://github.com/apache/spark/pull/11578#discussion_r56611345 --- Diff: core/src/main/scala/org/apache/spark/util/random/RandomSampler.scala --- @@ -155,6 +171,28 @@ class BernoulliSampler[T: ClassTag](fraction: Double) extends RandomSampler[T, T override def setSeed(seed: Long): Unit = rng.setSeed(seed) + private val gapSampling: GapSampling = if (fraction > 0.0 && fraction < 1.0) { --- End diff -- Would this (and the subsequent one) maybe be simpler as a lazy val, so we can get rid of the if/else/null thing? It seems like right now we sometimes construct this even when it isn't used.
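The lazy-val suggestion above can be sketched in isolation. This is a hypothetical simplification, not the actual `RandomSampler` code: `GapSampling` is reduced to a stub, and a construction counter is added purely to make the difference in initialization timing observable.

```scala
object LazyInitDemo {
  var constructed = 0

  // Stand-in stub for Spark's GapSampling; counts constructions so the
  // eager-vs-lazy difference can be observed.
  class GapSampling(val fraction: Double) { constructed += 1 }

  // Current (eager) shape: built at construction time even when never used,
  // with null as the out-of-range sentinel.
  class EagerSampler(fraction: Double) {
    val gapSampling: GapSampling =
      if (fraction > 0.0 && fraction < 1.0) new GapSampling(fraction) else null
  }

  // Suggested (lazy) shape: built only on first access, so no null is needed.
  class LazySampler(fraction: Double) {
    lazy val gapSampling: GapSampling = new GapSampling(fraction)
  }
}
```

With the lazy variant, a sampler whose gap-sampling path is never exercised pays no construction cost, and callers never see a null.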
[GitHub] spark pull request: [SPARK-13839][SQL] Defer input evaluation and ...
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/11676#discussion_r56455098 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/basicOperators.scala --- @@ -80,12 +81,21 @@ case class Filter(condition: Expression, child: SparkPlan) // Split out all the IsNotNulls from condition. private val (notNullPreds, otherPreds) = splitConjunctivePredicates(condition).partition { case IsNotNull(a) if child.output.contains(a) => true +case IsNotNull(a) => + a match { +case Casts(a) if child.output.contains(a) => true --- End diff -- We should not add these corner cases here, they should be handled by constraints.
[GitHub] spark pull request: [SPARK-13320] [SQL] Support Star in CreateStru...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11208#issuecomment-197606557 **[Test build #53376 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53376/consoleFull)** for PR 11208 at commit [`e060dea`](https://github.com/apache/spark/commit/e060deaaf09d122966f090bf3b86895636418664).
[GitHub] spark pull request: [SPARK-13839][SQL] Defer input evaluation and ...
Github user viirya commented on the pull request: https://github.com/apache/spark/pull/11676#issuecomment-197698865 cc @davies Can you please review this?
[GitHub] spark pull request: [SPARK-13889][YARN][Branch-1.6]Fix the calcula...
GitHub user carsonwang opened a pull request: https://github.com/apache/spark/pull/11813 [SPARK-13889][YARN][Branch-1.6]Fix the calculation of the max number of executor failure ## What changes were proposed in this pull request? Backport #11713 to 1.6. The max number of executor failures before failing the application defaults to twice the maximum number of executors if dynamic allocation is enabled. The default value for "spark.dynamicAllocation.maxExecutors" is Int.MaxValue. The calculated default max number of executor failures should therefore be Int.MaxValue instead of only 3. ## How was this patch tested? It tests whether the value is greater than Int.MaxValue / 2 to avoid the overflow when it is multiplied by 2. You can merge this pull request into a Git repository by running: $ git pull https://github.com/carsonwang/spark branch-1.6-ExecutorFailNum Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/11813.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #11813 commit 17d8bc1f13c3b29e22ecbec6a9f08491e5970368 Author: Carson Wang Date: 2016-03-18T05:15:40Z Fix the calculation of the max number of executor failure
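The overflow guard described in the PR summary can be sketched as follows. The method name and the floor of 3 follow the description above, but this is an illustrative stand-in, not the actual Spark code:

```scala
object MaxFailures {
  // Illustrative sketch of the guard described in SPARK-13889.
  def maxExecutorFailures(maxExecutors: Int): Int = {
    // 2 * Int.MaxValue overflows to a negative Int, so cap before doubling.
    val doubled =
      if (maxExecutors > Int.MaxValue / 2) Int.MaxValue else maxExecutors * 2
    // Keep the old floor of 3 allowed failures for small executor counts.
    math.max(doubled, 3)
  }
}
```

With `spark.dynamicAllocation.maxExecutors` at its `Int.MaxValue` default, the guarded version yields `Int.MaxValue` rather than a value corrupted by signed overflow.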
[GitHub] spark pull request: [SPARK-13977] [SQL] Brings back Shuffled hash ...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/11788#discussion_r56617782 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/joins/ShuffledHashJoin.scala --- @@ -0,0 +1,95 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.execution.joins + +import org.apache.spark.rdd.RDD +import org.apache.spark.sql.catalyst.InternalRow +import org.apache.spark.sql.catalyst.expressions.{Expression, JoinedRow} +import org.apache.spark.sql.catalyst.plans._ +import org.apache.spark.sql.catalyst.plans.physical._ +import org.apache.spark.sql.execution.{BinaryNode, SparkPlan} +import org.apache.spark.sql.execution.metric.SQLMetrics + +/** + * Performs an inner hash join of two child relations by first shuffling the data using the join --- End diff -- inner?
[GitHub] spark pull request: [SPARK-14004][SQL] NamedExpressions should hav...
Github user liancheng commented on the pull request: https://github.com/apache/spark/pull/11822#issuecomment-198318534 Personally, I had once been quite confused by the fact that `NamedExpression.qualifiers` is a `Seq[String]` and thought that attributes can be qualified with multiple qualifiers like `db.table.column`. That's why the current version of `AttributeReference.sql` joins all qualifiers using `.` rather than picking the first one. I believe it's safe and good to enforce the at-most-one-qualifier constraint at type level unless there do exist valid cases where using multiple qualifiers makes sense but haven't been implemented in Spark SQL yet. cc @marmbrus @rxin @yhuai
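The type-level fix being proposed can be sketched with a hypothetical, heavily simplified attribute class (not Catalyst's real `AttributeReference`): replacing `Seq[String]` with `Option[String]` makes more than one qualifier unrepresentable by construction.

```scala
// Hypothetical sketch: with Option[String], an attribute can carry at most
// one qualifier, so the "join all qualifiers with ." ambiguity disappears.
case class AttributeRef(name: String, qualifier: Option[String]) {
  // SQL rendering uses the single qualifier when present: `t.c1` vs `c1`.
  def sql: String = qualifier.map(q => s"$q.$name").getOrElse(name)
}
```

A caller can no longer produce a `db.table.column`-style multi-qualifier value by accident; the compiler enforces the constraint.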
[GitHub] spark pull request: [SPARK-13989][SQL] Remove non-vectorized/unsaf...
Github user sameeragarwal commented on the pull request: https://github.com/apache/spark/pull/11799#issuecomment-198152113 Thanks, all comments addressed!
[GitHub] spark pull request: [SPARK-13997][SQL] Use Hadoop 2.0 default valu...
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/11806#issuecomment-198314857 It does mean each record is compressed separately. Maybe that makes sense for huge records, or somehow facilitates processing pieces of a block (since the whole block has to be uncompressed to use any of it). However Tom's book says block compression should be preferred. I don't know why it's not the default. Also summoning @steveloughran
[GitHub] spark pull request: [SPARK-13928] Move org.apache.spark.Logging in...
Github user cloud-fan commented on the pull request: https://github.com/apache/spark/pull/11764#issuecomment-197641888 The log says: `java.lang.RuntimeException: spark-core: Binary compatibility check failed!`, but no reason is provided... cc @JoshRosen
[GitHub] spark pull request: [SPARK-13320] [SQL] Support Star in CreateStru...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11208#issuecomment-197634039 **[Test build #53376 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53376/consoleFull)** for PR 11208 at commit [`e060dea`](https://github.com/apache/spark/commit/e060deaaf09d122966f090bf3b86895636418664). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-13826][SQL] Revises Dataset ScalaDoc
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11769#issuecomment-197763593 **[Test build #53407 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53407/consoleFull)** for PR 11769 at commit [`6062f49`](https://github.com/apache/spark/commit/6062f49ba6d123e731d6103beeaa2b0441257253).
[GitHub] spark pull request: [SPARK-12789] [SQL] Support Order By Ordinal i...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/11815#discussion_r56620898 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/planning/patterns.scala --- @@ -202,3 +203,14 @@ object Unions { } } } + +/** + * Extractor for retrieving Int value. + */ +object IntegerIndex { + def unapply(a: Any): Option[Int] = a match { +case Literal(a: Int, IntegerType) => Some(a) +case UnaryMinus(IntegerLiteral(v)) => Some(-v) --- End diff -- ah ic so this is used to detect errors. i'd add some comment here explaining why we are having this.
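The extractor in the diff can be illustrated standalone. The expression nodes below are simplified stand-ins for Catalyst's `Literal` and `UnaryMinus`, so this is a sketch of the pattern, not the real code:

```scala
// Simplified stand-ins for Catalyst expression nodes.
sealed trait Expr
case class IntLit(v: Int) extends Expr
case class Neg(child: Expr) extends Expr

object IntegerIndex {
  // Matches both a plain integer literal and a negated one, so that a
  // negative ordinal like "order by -1" can be recognized and reported.
  def unapply(e: Expr): Option[Int] = e match {
    case IntLit(v)      => Some(v)
    case Neg(IntLit(v)) => Some(-v)
    case _              => None
  }
}
```

This is why the `UnaryMinus` case exists: without it, `-1` would silently fail to match and could not be flagged as an invalid ordinal, which is the error-detection point raised in the review comment.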
[GitHub] spark pull request: [SPARK-13957] [SQL] Support Group By Ordinal i...
Github user gatorsmile closed the pull request at: https://github.com/apache/spark/pull/11846
[GitHub] spark pull request: [SPARK-13957] [SQL] Support Group By Ordinal i...
GitHub user gatorsmile reopened a pull request: https://github.com/apache/spark/pull/11846 [SPARK-13957] [SQL] Support Group By Ordinal in SQL What changes were proposed in this pull request? This PR is to support group by position in SQL. For example, when users input the following query ```SQL select c1 as a, c2, c3, sum(*) from tbl group by 1, 3, c4 ``` The ordinals are recognized as the positions in the select list. Thus, `Analyzer` converts it to ```SQL select c1, c2, c3, sum(*) from tbl group by c1, c3, c4 ``` This is controlled by the config option `spark.sql.groupByOrdinal`. - When true, the ordinal numbers in group by clauses are treated as the position in the select list. - When false, the ordinal numbers are ignored. - Only convert integer literals (not foldable expressions). If found foldable expressions, ignore them. - When the positions specified in the group by clauses correspond to the aggregate functions in select list, output an exception message. Note: This PR is taken from https://github.com/apache/spark/pull/10731. When merging this PR, please give the credit to @zhichao-li Also cc all the people who are involved in the previous discussion: @rxin @cloud-fan @marmbrus @yhuai @hvanhovell @adrian-wang @chenghao-intel @tejasapatil How was this patch tested? Added a few test cases for both positive and negative test cases. 
You can merge this pull request into a Git repository by running: $ git pull https://github.com/gatorsmile/spark groupByOrdinal Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/11846.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #11846 commit 95f25a6eb688a2cf3e3efa6ec7b7715884b1fa7b Author: gatorsmile Date: 2016-03-20T04:00:32Z group by ordinals commit a9273761d4dfc3c7a95d570884bfbcc420a119e9 Author: gatorsmile Date: 2016-03-20T04:08:37Z Merge remote-tracking branch 'upstream/master' into groupByOrdinal commit b10d076a71d863255a901861f5ca571816d8fca7 Author: gatorsmile Date: 2016-03-20T04:11:34Z fix messages.
[GitHub] spark pull request: [SPARK-13957] [SQL] Support Group By Ordinal i...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11846#issuecomment-198845243 **[Test build #53623 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53623/consoleFull)** for PR 11846 at commit [`b10d076`](https://github.com/apache/spark/commit/b10d076a71d863255a901861f5ca571816d8fca7).
[GitHub] spark pull request: [SPARK-14012][SQL] Extract VectorizedColumnRea...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11834#issuecomment-198571979 **[Test build #53578 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53578/consoleFull)** for PR 11834 at commit [`3685480`](https://github.com/apache/spark/commit/3685480cdf018d80adc4289e34d2eba458ef7cb9).
[GitHub] spark pull request: [SPARK-13923] [SQL] Implement SessionCatalog
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11750#issuecomment-197620375 Merged build finished. Test PASSed.
[GitHub] spark pull request: [Minor][DOC] Add JavaStreamingTestExample
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11776#issuecomment-197732446 **[Test build #53399 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53399/consoleFull)** for PR 11776 at commit [`ff56ff5`](https://github.com/apache/spark/commit/ff56ff56d46db9ee64924c44fb18c03c0ff91e4d). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-12379][ML][MLLIB] Copy GBT implementati...
Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/10607#issuecomment-198033902 Thanks for doing this migration. I checked the PR and it LGTM. Your tests look good to me. The tests all seem fairly close, except for a couple of outliers, but even those seem within a standard deviation or so (the 2nd value in spark-perf results). Thanks for running them! Also @MLnick > As part of those tickets, I think we can clean up this ML impl and interfaces if required (e.g. we could look at removing the `private[ml]` train method in favour of one in MLlib that converts RDDs to DataFrame and calls ML, we can make more stuff private where possible, etc). But I think it'll be a lot easier to clean things up once everything is in ML. If the ML implementation uses RDDs underneath, it will be nice to call directly into that implementation from spark.mllib in order to avoid serialization overhead.
[GitHub] spark pull request: [SPARK-12469][CORE][WIP/RFC] Consistent accumu...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11105#issuecomment-197716738 **[Test build #53389 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53389/consoleFull)** for PR 11105 at commit [`8ddaf7c`](https://github.com/apache/spark/commit/8ddaf7c7c96e5bbb1cd2f11844db847bc52fe77f). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-12789] [SQL] Support Order By Ordinal i...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11815#issuecomment-198245857 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-13602][CORE] Add shutdown hook to Drive...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11746#issuecomment-198142673 **[Test build #53475 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53475/consoleFull)** for PR 11746 at commit [`86eb800`](https://github.com/apache/spark/commit/86eb800f2ee6a6bfa0671c1c17192bd4ab934ff0). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-12719][HOTFIX] Fix compilation against ...
Github user yhuai commented on the pull request: https://github.com/apache/spark/pull/11787#issuecomment-197977494 Thanks. I am merging this to master.
[GitHub] spark pull request: [MINOR][SQL][BUILD] Remove duplicated lines
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/11773#discussion_r56431109
--- Diff: sql/core/src/test/scala/org/apache/spark/sql/JoinSuite.scala ---
@@ -49,7 +49,6 @@ class JoinSuite extends QueryTest with SharedSQLContext {
       case j: BroadcastHashJoin => j
       case j: CartesianProduct => j
       case j: BroadcastNestedLoopJoin => j
-      case j: BroadcastHashJoin => j
--- End diff --
See line 49.
[GitHub] spark pull request: [SPARK-13957] [SQL] Support Group By Ordinal i...
GitHub user gatorsmile opened a pull request: https://github.com/apache/spark/pull/11846 [SPARK-13957] [SQL] Support Group By Ordinal in SQL

What changes were proposed in this pull request?

This PR is to support group by position in SQL. For example, when users input the following query

```SQL
select c1 as a, c2, c3, sum(*) from tbl group by 1, 3, c4
```

the ordinals are recognized as positions in the select list. Thus, `Analyzer` converts it to

```SQL
select c1, c2, c3, sum(*) from tbl group by c1, c3, c4
```

This is controlled by the config option `spark.sql.groupByOrdinal`.
- When true, the ordinal numbers in group by clauses are treated as positions in the select list.
- When false, the ordinal numbers are ignored.
- Only integer literals are converted, not foldable expressions; foldable expressions are ignored.
- When a position specified in the group by clause corresponds to an aggregate function in the select list, an exception message is output.

Note: This PR is taken from https://github.com/apache/spark/pull/10731. When merging this PR, please give the credit to @zhichao-li

Also cc all the people who are involved in the previous discussion: @rxin @cloud-fan @marmbrus @yhuai @hvanhovell @adrian-wang @chenghao-intel @tejasapatil

How was this patch tested?

Added a few test cases, both positive and negative.
You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/gatorsmile/spark groupByOrdinal

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/11846.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

This closes #11846

commit 95f25a6eb688a2cf3e3efa6ec7b7715884b1fa7b
Author: gatorsmile
Date: 2016-03-20T04:00:32Z

    group by ordinals

commit a9273761d4dfc3c7a95d570884bfbcc420a119e9
Author: gatorsmile
Date: 2016-03-20T04:08:37Z

    Merge remote-tracking branch 'upstream/master' into groupByOrdinal

commit b10d076a71d863255a901861f5ca571816d8fca7
Author: gatorsmile
Date: 2016-03-20T04:11:34Z

    fix messages.
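The rewrite this PR describes — replacing integer ordinals in a GROUP BY list with the matching select-list expressions — can be sketched outside Spark. The following standalone Python illustration is hypothetical (function name and string-based expressions are assumptions, not Spark's Analyzer code), but it shows the substitution rule, including passing non-literal expressions through unchanged:

```python
def resolve_group_by_ordinals(select_list, group_by_exprs, group_by_ordinal=True):
    """Replace integer ordinals in a GROUP BY list with the corresponding
    select-list expressions, mimicking the Analyzer rewrite this PR adds
    (gated in Spark by spark.sql.groupByOrdinal)."""
    if not group_by_ordinal:
        # When the option is off, ordinal numbers are left as-is.
        return list(group_by_exprs)
    resolved = []
    for expr in group_by_exprs:
        # Only bare integer literals are treated as ordinals; column names
        # and other expressions pass through unchanged.
        if isinstance(expr, int):
            if not 1 <= expr <= len(select_list):
                raise ValueError("GROUP BY position %d is not in select list" % expr)
            resolved.append(select_list[expr - 1])
        else:
            resolved.append(expr)
    return resolved

# 'select c1 as a, c2, c3 ... group by 1, 3, c4' becomes 'group by c1, c3, c4'
print(resolve_group_by_ordinals(["c1", "c2", "c3"], [1, 3, "c4"]))  # ['c1', 'c3', 'c4']
```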
[GitHub] spark pull request: [SPARK-12182][ML] Distributed binning for tree...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10231#issuecomment-198504543 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/53553/ Test FAILed.
[GitHub] spark pull request: [SPARK-13928] Move org.apache.spark.Logging in...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11764#issuecomment-197700493 **[Test build #53392 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53392/consoleFull)** for PR 11764 at commit [`e875d82`](https://github.com/apache/spark/commit/e875d823d24139235e88031775354d28a6061997).
[GitHub] spark pull request: [SPARK-13958]Executor OOM due to unbounded gro...
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/11794#discussion_r56697868
--- Diff: core/src/main/java/org/apache/spark/shuffle/sort/ShuffleExternalSorter.java ---
@@ -320,7 +320,15 @@ private void growPointerArrayIfNecessary() throws IOException {
     assert(inMemSorter != null);
     if (!inMemSorter.hasSpaceForAnotherRecord()) {
       long used = inMemSorter.getMemoryUsage();
-      LongArray array = allocateArray(used / 8 * 2);
+      LongArray array;
+      try {
+        // could trigger spilling
+        array = allocateArray(used / 8 * 2);
+      } catch (OutOfMemoryError e) {
+        // should have trigger spilling
+        assert(inMemSorter.hasSpaceForAnotherRecord());
--- End diff --
I see, then use an `if`?
[GitHub] spark pull request: [Spark-13034] PySpark ml.classification suppor...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11707#issuecomment-197553310 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-13068][PYSPARK][ML] Type conversion for...
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/11663#discussion_r56406664
--- Diff: python/pyspark/ml/param/__init__.py ---
@@ -275,23 +382,9 @@ def _set(self, **kwargs):
         """
         for param, value in kwargs.items():
             p = getattr(self, param)
-            if p.expectedType is None or type(value) == p.expectedType or value is None:
-                self._paramMap[getattr(self, param)] = value
-            else:
-                try:
-                    # Try and do "safe" conversions that don't lose information
-                    if p.expectedType == float:
-                        self._paramMap[getattr(self, param)] = float(value)
-                    # Python 3 unified long & int
-                    elif p.expectedType == int and type(value).__name__ == 'long':
-                        self._paramMap[getattr(self, param)] = value
-                    else:
-                        raise Exception(
-                            "Provided type {0} incompatible with type {1} for param {2}"
-                            .format(type(value), p.expectedType, p))
-                except ValueError:
-                    raise Exception(("Failed to convert {0} to type {1} for param {2}"
-                                     .format(type(value), p.expectedType, p)))
+            if value is not None:
+                value = p.typeConverter(value)
+            self._paramMap[getattr(self, param)] = value
--- End diff --
Reuse the value `p`.
[GitHub] spark pull request: [MINOR][SQL][BUILD] Remove duplicated lines
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/11773#discussion_r56431084
--- Diff: project/MimaExcludes.scala ---
@@ -299,13 +299,11 @@ object MimaExcludes {
       // [SPARK-13244][SQL] Migrates DataFrame to Dataset
       ProblemFilters.exclude[IncompatibleMethTypeProblem]("org.apache.spark.sql.DataFrameHolder.apply"),
       ProblemFilters.exclude[IncompatibleResultTypeProblem]("org.apache.spark.sql.DataFrameHolder.toDF"),
-      ProblemFilters.exclude[IncompatibleResultTypeProblem]("org.apache.spark.sql.DataFrameHolder.toDF"),
       ProblemFilters.exclude[IncompatibleMethTypeProblem]("org.apache.spark.sql.DataFrameHolder.copy"),
       ProblemFilters.exclude[IncompatibleResultTypeProblem]("org.apache.spark.sql.DataFrameHolder.copy$default$1"),
       ProblemFilters.exclude[IncompatibleResultTypeProblem]("org.apache.spark.sql.DataFrameHolder.df$1"),
       ProblemFilters.exclude[IncompatibleMethTypeProblem]("org.apache.spark.sql.DataFrameHolder.this"),
       ProblemFilters.exclude[IncompatibleResultTypeProblem]("org.apache.spark.sql.SQLContext.tables"),
-      ProblemFilters.exclude[IncompatibleResultTypeProblem]("org.apache.spark.sql.SQLContext.tables"),
--- End diff --
See line 307.
[GitHub] spark pull request: [SPARK-13928] Move org.apache.spark.Logging in...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11764#issuecomment-197407360 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/53322/ Test FAILed.
[GitHub] spark pull request: [WIP][SPARK-13809][SQL] State store for stream...
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/11645#discussion_r56386741
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/StateStore.scala ---
@@ -0,0 +1,200 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution.streaming.state
+
+import java.util.{Timer, TimerTask}
+
+import scala.collection.mutable
+import scala.util.control.NonFatal
+
+import org.apache.spark.{Logging, SparkEnv}
+import org.apache.spark.sql.catalyst.InternalRow
+
+/** Unique identifier for a [[StateStore]] */
+case class StateStoreId(operatorId: Long, partitionId: Int)
+
+/**
+ * Base trait for a versioned key-value store used for streaming aggregations
+ */
+trait StateStore {
+
+  /** Unique identifier of the store */
+  def id: StateStoreId
+
+  /** Version of the data in this store before committing updates. */
+  def version: Long
+
+  /**
+   * Update the value of a key using the value generated by the update function.
+   * This can be called only after prepareForUpdates() has been called in the same thread.
+   */
+  def update(key: InternalRow, updateFunc: Option[InternalRow] => InternalRow): Unit
+
+  /**
+   * Remove keys that match the following condition.
+   * This can be called only after prepareForUpdates() has been called in the current thread.
+   */
+  def remove(condition: InternalRow => Boolean): Unit
+
+  /**
+   * Commit all the updates that have been made to the store.
+   * This can be called only after prepareForUpdates() has been called in the current thread.
+   */
+  def commit(): Long
+
+  /** Cancel all the updates that have been made to the store. */
+  def cancel(): Unit
+
+  /**
+   * Iterator of store data after a set of updates have been committed.
+   * This can be called only after commitUpdates() has been called in the current thread.
+   */
+  def iterator(): Iterator[InternalRow]
+
+  /**
+   * Iterator of the updates that have been committed.
+   * This can be called only after commitUpdates() has been called in the current thread.
+   */
+  def updates(): Iterator[StoreUpdate]
+
+  /**
+   * Whether all updates have been committed
+   */
+  def hasCommitted: Boolean
+}
+
+
+trait StateStoreProvider {
+
+  /** Get the store with the existing version. */
+  def getStore(version: Long): StateStore
+
+  /** Optional method for providers to allow for background management */
+  def manage(): Unit = { }
+}
+
+sealed trait StoreUpdate
+case class ValueAdded(key: InternalRow, value: InternalRow) extends StoreUpdate
+case class ValueUpdated(key: InternalRow, value: InternalRow) extends StoreUpdate
+case class KeyRemoved(key: InternalRow) extends StoreUpdate
+
+
+/**
+ * Companion object to [[StateStore]] that provides helper methods to create and retrieve stores
+ * by their unique ids.
+ */
+private[state] object StateStore extends Logging {
+
+  private val MANAGEMENT_TASK_INTERVAL_SECS = 60
+
+  private val loadedProviders = new mutable.HashMap[StateStoreId, StateStoreProvider]()
+  private val managementTimer = new Timer("StateStore Timer", true)
+  @volatile private var managementTask: TimerTask = null
+
+  /** Get or create a store associated with the id. */
+  def get(storeId: StateStoreId, directory: String, version: Long): StateStore = {
+    require(version >= 0)
+    val storeProvider = loadedProviders.synchronized {
+      startIfNeeded()
+      val provider = loadedProviders.getOrElseUpdate(
+        storeId, new HDFSBackedStateStoreProvider(storeId, directory))
+      reportActiveInstance(storeId)
+      provider
+    }
+    storeProvider.getStore(version)
+  }
+
+  def remove(storeId: StateStoreId): Unit =
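The versioned update/remove/commit/iterator contract in the trait above can be illustrated with a minimal in-memory Python sketch. This is purely illustrative and all names are assumptions; the provider under review is HDFS-backed and operates on InternalRow, not plain dicts:

```python
class InMemoryStateStore:
    """Minimal in-memory sketch of a versioned key-value store: updates are
    staged against a base version, then commit() produces version + 1."""
    def __init__(self, version=0, data=None):
        self.version = version
        self._committed = dict(data or {})
        self._staged = dict(self._committed)
        self.has_committed = False

    def update(self, key, update_func):
        # update_func receives the old value (or None) and returns the new one.
        self._staged[key] = update_func(self._staged.get(key))

    def remove(self, condition):
        # Drop all staged keys matching the condition.
        self._staged = {k: v for k, v in self._staged.items() if not condition(k)}

    def commit(self):
        # Make the staged updates durable and advance the version.
        self._committed = dict(self._staged)
        self.version += 1
        self.has_committed = True
        return self.version

    def iterator(self):
        # Store contents after the last commit.
        return iter(sorted(self._committed.items()))
```

A run-through of the contract: stage a counter increment, commit to get version 1, then remove the key and commit again.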
[GitHub] spark pull request: [SPARK-13805][SQL] Generate code that get a va...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11636#issuecomment-198598006 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-13068][PYSPARK][ML] Type conversion for...
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/11663#discussion_r56406609
--- Diff: python/pyspark/ml/param/__init__.py ---
@@ -32,13 +35,17 @@ class Param(object):
     .. versionadded:: 1.3.0
     """
-    def __init__(self, parent, name, doc, expectedType=None):
+    def __init__(self, parent, name, doc, expectedType=None, typeConverter=None):
         if not isinstance(parent, Identifiable):
             raise TypeError("Parent must be an Identifiable but got type %s." % type(parent))
         self.parent = parent.uid
         self.name = str(name)
         self.doc = str(doc)
         self.expectedType = expectedType
+        if expectedType is not None:
+            warnings.warn("expectedType is deprecated and will be removed in 2.1.0, " +
+                          "use typeConverter instead.")
--- End diff --
"use typeConverter instead, as a keyword argument" Also, I'd put this same message in the docstring too.
[GitHub] spark pull request: [SPARK-13958]Executor OOM due to unbounded gro...
Github user sitalkedia commented on a diff in the pull request: https://github.com/apache/spark/pull/11794#discussion_r56696662
--- Diff: core/src/main/java/org/apache/spark/shuffle/sort/ShuffleExternalSorter.java ---
@@ -320,7 +320,15 @@ private void growPointerArrayIfNecessary() throws IOException {
     assert(inMemSorter != null);
     if (!inMemSorter.hasSpaceForAnotherRecord()) {
       long used = inMemSorter.getMemoryUsage();
-      LongArray array = allocateArray(used / 8 * 2);
+      LongArray array;
+      try {
+        // could trigger spilling
+        array = allocateArray(used / 8 * 2);
+      } catch (OutOfMemoryError e) {
+        // should have trigger spilling
+        assert(inMemSorter.hasSpaceForAnotherRecord());
--- End diff --
Hmm, I tried changing it to `require`, but the compiler does not seem to like it. Maybe because it's a Java file and can't import Scala methods?
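The pattern being debated in this thread — attempt a larger allocation and, on OutOfMemoryError, rely on the spill that the allocation attempt itself triggered — can be modeled with a toy sketch. Everything here (class names, the memory model) is hypothetical and only mimics the shape of the Java code under review:

```python
class ToySorter:
    """Toy model of the grow-or-spill pattern: growing the pointer array may
    fail under memory pressure, in which case the failed allocation is
    expected to have forced a spill that freed in-memory records."""
    def __init__(self, capacity, memory_budget):
        self.records = []
        self.capacity = capacity
        self.memory_budget = memory_budget

    def has_space_for_another_record(self):
        return len(self.records) < self.capacity

    def allocate_array(self, size):
        if size > self.memory_budget:
            # Allocation pressure triggers a spill before the failure surfaces.
            self.spill()
            raise MemoryError("not enough memory for array of size %d" % size)
        return [None] * size

    def spill(self):
        self.records.clear()  # pretend records were written out to disk

    def grow_if_necessary(self):
        if not self.has_space_for_another_record():
            try:
                new_array = self.allocate_array(self.capacity * 2)
                self.capacity = len(new_array)
            except MemoryError:
                # The failed allocation should have triggered a spill, so
                # there must be room again (the assert-vs-if debate above).
                if not self.has_space_for_another_record():
                    raise
```

With a budget too small to double the array, the grow attempt spills instead of crashing; with a generous budget, the capacity doubles normally.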
[GitHub] spark pull request: [SPARK-11891] Model export/import for RFormula...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/9884
[GitHub] spark pull request: [SPARK-13921] Store serialized blocks as multi...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11748#issuecomment-198007459 **[Test build #53442 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53442/consoleFull)** for PR 11748 at commit [`3fc0b66`](https://github.com/apache/spark/commit/3fc0b66981aa2d45be129986f0dc5bd595e08b22). * This patch **fails to build**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-13449] Naive Bayes wrapper in SparkR
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11486#issuecomment-198841429 **[Test build #53622 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53622/consoleFull)** for PR 11486 at commit [`b4ee1aa`](https://github.com/apache/spark/commit/b4ee1aab70008919ba17cf02c8470f1a75c23ef8).
[GitHub] spark pull request: [SPARK-14000][SQL] case class with a tuple fie...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11816#issuecomment-198283547 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-13068][PYSPARK][ML] Type conversion for...
Github user sethah commented on a diff in the pull request: https://github.com/apache/spark/pull/11663#discussion_r56548536
--- Diff: python/pyspark/ml/param/__init__.py ---
@@ -65,6 +72,106 @@ def __eq__(self, other):
         return False
+
+class TypeConverters(object):
+    """
+    .. note:: DeveloperApi
+
+    Factory methods for common type conversion functions for `Param.typeConverter`.
+
+    .. versionadded:: 2.0.0
+    """
+
+    @staticmethod
+    def _is_numeric(value):
+        vtype = type(value)
+        return vtype == int or vtype == float or vtype == np.float64 \
+            or vtype == np.int64 or vtype.__name__ == 'long'
+
+    @staticmethod
+    def _can_convert_to_list(value):
+        vtype = type(value)
+        return vtype == list or vtype == np.ndarray or isinstance(value, Vector)
+
+    @staticmethod
+    def identity(value):
+        """
+        Dummy converter that just returns value.
+        """
+        return value
+
+    @staticmethod
+    def convertToList(value):
+        """
+        Convert a value to a list, if possible.
+        """
+        if type(value) == list:
+            return value
+        elif type(value) == np.ndarray:
+            return list(value)
+        elif isinstance(value, Vector):
+            return value.toArray()
+        else:
+            raise TypeError("Could not convert %s to list" % value)
+
+    @staticmethod
+    def convertToListFloat(value):
+        """
+        Convert a value to list of floats, if possible.
+        """
+        if TypeConverters._can_convert_to_list(value) and \
+                all(map(lambda v: TypeConverters._is_numeric(v), value)):
+            value = TypeConverters.convertToList(value)
+            return list(map(lambda v: float(v), value))
+        else:
+            raise TypeError("Could not convert %s to list of floats" % value)
+
+    @staticmethod
+    def convertToListInt(value):
+        """
+        Convert a value to list of ints, if possible.
+        """
+        if TypeConverters._can_convert_to_list(value) and \
+                all(map(lambda v: TypeConverters._is_numeric(v), value)):
+            value = TypeConverters.convertToList(value)
+            return list(map(lambda v: int(v), value))
+        else:
+            raise TypeError("Could not convert %s to list of ints" % value)
+
+    @staticmethod
+    def convertToVector(value):
+        """
+        Convert a value to a MLlib Vector, if possible.
+        """
+        if isinstance(value, Vector):
+            return value
+        elif TypeConverters._can_convert_to_list(value) and \
+                all(map(lambda v: TypeConverters._is_numeric(v), value)):
+            value = DenseVector(value)
+        else:
+            raise TypeError("Could not convert %s to vector" % value)
+        return value
+
+    @staticmethod
+    def convertToFloat(value):
+        """
+        Convert a value to a float, if possible.
+        """
+        if TypeConverters._is_numeric(value):
+            return float(value)
+        else:
+            raise TypeError("Could not convert %s to float" % value)
+
+    @staticmethod
+    def convertToInt(value):
+        """
+        Convert a value to an int, if possible.
+        """
+        if TypeConverters._is_numeric(value):
--- End diff --
Done.
[GitHub] spark pull request: [SPARK-13977] [SQL] Brings back Shuffled hash ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11788#issuecomment-198070544 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/53451/ Test PASSed.
[GitHub] spark pull request: [SPARK-13808][test-maven] Don't build assembly...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11701#issuecomment-197513303 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/53336/ Test FAILed.
[GitHub] spark pull request: [SPARK-11888] [ML] Decision tree persistence i...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11581#issuecomment-197552020 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/53346/ Test PASSed.
[GitHub] spark pull request: [SPARK-13997][SQL] Use Hadoop 2.0 default valu...
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/11806#issuecomment-198364410 @tomwitte Sorry for adding more comments, but does that mean the default value in Hadoop 1.x is BLOCK?
[GitHub] spark pull request: [SPARK-13942][CORE][DOCS] Remove Shark-related...
Github user andrewor14 commented on the pull request: https://github.com/apache/spark/pull/11770#issuecomment-197561885 Removing Shark docs part looks OK. The slightly controversial bit is making `SparkEnv` private. People might depend on that. @rxin
[GitHub] spark pull request: [SPARK-13777] [ML] Remove constant features fr...
Github user iyounus commented on the pull request: https://github.com/apache/spark/pull/11610#issuecomment-197468720 One problem with the eigen-decomposition method is that, for a rank-deficient matrix, some of the eigenvalues can be extremely small (instead of being exactly zero), and their contribution to the inverse can become very large. I'll try out both methods (DGELSD and eigen decomposition) and see how they behave in this case.
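The blow-up described above is easy to see with a small rank-deficient Gram matrix. Below is a minimal numpy sketch — purely illustrative, not Spark code; the matrix, tolerance, and names are assumptions — that thresholds small eigenvalues the same way DGELSD's RCOND cutoff treats small singular values:

```python
import numpy as np

# A Gram matrix X^T X that is rank deficient because one feature column is
# constant (a duplicate of the intercept column), so one eigenvalue is
# (numerically) zero.
X = np.array([[1.0, 2.0, 1.0],
              [1.0, 3.0, 1.0],
              [1.0, 4.0, 1.0],
              [1.0, 5.0, 1.0]])
A = X.T @ X

w, V = np.linalg.eigh(A)  # eigen decomposition of the symmetric Gram matrix

# A naive inverse takes 1/w for every eigenvalue; when an eigenvalue that
# should be 0 comes out as, say, 1e-15, its reciprocal dominates the result.
# Thresholding small eigenvalues relative to the largest one (the idea
# behind DGELSD's RCOND parameter) gives a well-behaved pseudo-inverse.
tol = w.max() * len(w) * np.finfo(float).eps
w_inv = np.array([1.0 / v if v > tol else 0.0 for v in w])
pinv = V @ np.diag(w_inv) @ V.T

# Agrees with numpy's SVD-based pseudo-inverse.
print(np.allclose(pinv, np.linalg.pinv(A)))
```

Without the cutoff, the reciprocal of the noise-level eigenvalue dominates every entry of the reconstructed inverse, which is exactly the failure mode described for rank-deficient inputs.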
[GitHub] spark pull request: [SPARK-13826][SQL] Revises Dataset ScalaDoc
Github user liancheng commented on the pull request: https://github.com/apache/spark/pull/11769#issuecomment-197512611 cc @rxin @marmbrus @yhuai
[GitHub] spark pull request: [SPARK-14011][CORE][SQL] Enable `LineLength` J...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11831#issuecomment-198549526 **[Test build #53566 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53566/consoleFull)** for PR 11831 at commit [`2923ef0`](https://github.com/apache/spark/commit/2923ef095369376be03a868c2bf2375294dab6d1).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-13816][Graphx] Add parameter checks for...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/11655#issuecomment-197485725 Thanks - merging in master.
[GitHub] spark pull request: [SPARK-12789]Support order by index and group ...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/10731#issuecomment-197698035 Also I'd say "by position", not "by index", since index usually refers to something else in databases.
[GitHub] spark pull request: [SPARK-13903][SQL] Modify output nullability w...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11722#issuecomment-197278897 **[Test build #53297 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53297/consoleFull)** for PR 11722 at commit [`c7d54a0`](https://github.com/apache/spark/commit/c7d54a0fb78c826903c0db8f1b1ac7b0d54bb303).
* This patch **fails from timeout after a configured wait of `250m`**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-13629] [ML] Add binary toggle Param to ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11536#issuecomment-197782539 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-13928] Move org.apache.spark.Logging in...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11764#issuecomment-197639904 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/53383/ Test FAILed.
[GitHub] spark pull request: [SPARK-13430][PySpark][ML] Python API for trai...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11621#issuecomment-198026652 **[Test build #53449 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/53449/consoleFull)** for PR 11621 at commit [`d7e17ab`](https://github.com/apache/spark/commit/d7e17ab6ab7219394d08b205e892f383f7ca1641).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-13294] [PROJECT INFRA] Remove MiMa's de...
Github user nchammas commented on the pull request: https://github.com/apache/spark/pull/11178#issuecomment-197623653 For some reason, this PR breaks the following invocation:
```
./dev/make-distribution.sh -T 1C -Phadoop-2.6
```
The problem appears to be with this line
```sh
SCALA_VERSION=$("$MVN" help:evaluate -Dexpression=scala.binary.version $@ 2>/dev/null\
    | grep -v "INFO"\
    | tail -n 1)
```
which outputs this when run:
```
+ SCALA_VERSION='[ERROR] Re-run Maven using the -X switch to enable full debug logging.'
```
Removing the `-T 1C` fixes it, for some reason. Any ideas why this PR is interfering with the additional flags passed to Maven?
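The failure mode above — a stray `[ERROR]` line surviving the `grep -v "INFO"` filter and landing in `SCALA_VERSION` — can be reproduced without Maven. In this sketch, the `mvn_output` function and the tightened grep are illustrative assumptions, not the fix that ultimately landed: it matches the version-shaped line instead of taking the last non-INFO line.

```sh
# Simulated Maven output: extra warnings or errors (e.g. from flags like
# -T 1C) can appear after the evaluated property, so "last non-INFO line"
# picks up junk.
mvn_output() {
  printf '[INFO] Scanning for projects...\n'
  printf '2.11\n'
  printf '[ERROR] Re-run Maven using the -X switch to enable full debug logging.\n'
}

# Fragile extraction from the script: last line that is not an INFO line.
fragile=$(mvn_output | grep -v "INFO" | tail -n 1)

# Sturdier sketch: keep only lines that look like a bare version number
# before taking the last one.
robust=$(mvn_output | grep -E '^[0-9]+\.[0-9]+$' | tail -n 1)

echo "fragile: $fragile"
echo "robust:  $robust"
```

Quoting `"$@"` in the real script would also be worth checking, since an unquoted `$@` re-splits any flag values passed through to Maven.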