[GitHub] spark pull request: [SPARK-11815] [ML] [PySpark] PySpark DecisionT...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/9807#issuecomment-163688796
**[Test build #47511 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47511/consoleFull)** for PR 9807 at commit [`9dd8870`](https://github.com/apache/spark/commit/9dd88706a1401598f6a818958ee9f10ea73dea57).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
---
If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA.
---
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [SPARK-11815] [ML] [PySpark] PySpark DecisionT...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9807#issuecomment-163688975
Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/47511/
[GitHub] spark pull request: [SPARK-12063][SQL] Use number in group by clau...
Github user dereksabryfb commented on the pull request: https://github.com/apache/spark/pull/10052#issuecomment-163699828
Added a case for sort
[GitHub] spark pull request: [SPARK-12198] [SparkR] SparkR support read.par...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10191#issuecomment-163699843
Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-12155] [SPARK-12253] Fix executor OOM i...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10240#issuecomment-163704082
**[Test build #47528 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47528/consoleFull)** for PR 10240 at commit [`d8be669`](https://github.com/apache/spark/commit/d8be66911d2abf3da46a25a54a7d80fd1eeebdfa).
[GitHub] spark pull request: [Streaming][Doc][Minor] Update the description...
Github user zsxwing commented on the pull request: https://github.com/apache/spark/pull/10246#issuecomment-163706432
LGTM
[GitHub] spark pull request: [SPARK-12235][SPARKR] Enhance mutate() to supp...
Github user shivaram commented on the pull request: https://github.com/apache/spark/pull/10220#issuecomment-163706535
@felixcheung Could you see if this satisfies the requirements in https://issues.apache.org/jira/browse/SPARK-10346? The only other thing we had in mind was to match the signature of `mutate` in dplyr.
[GitHub] spark pull request: [SPARK-7286] [SQL] Deprecate !== in favour of ...
Github user jodersky commented on the pull request: https://github.com/apache/spark/pull/9925#issuecomment-163708919
I agree that it's not pretty; however, the only other fix I see is to remove `$` for columns instead.
[GitHub] spark pull request: [DOCS][ML][SPARK-11964] Add in Pipeline Import...
Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/10179#issuecomment-163712455
@anabranch Hm, I may not have been clear enough. The save/load functionality seems general and important enough that it should go under the "Main concepts in Pipelines" section; I would put a subsection with a small paragraph (without code) at the end of the "Main concepts in Pipelines" section, just before the "Code example" section. I would then modify the first code example, "Example: Estimator, Transformer, and Param", to include saving and loading the pipeline. Thanks!
[GitHub] spark pull request: [SPARK-12212][ML][DOC] Clarifies the differenc...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/10234#discussion_r47264940
--- Diff: docs/ml-classification-regression.md ---
@@ -27,10 +27,10 @@ displayTitle: Classification and regression in spark.ml
 * This will become a table of contents (this text will be scraped).
 {:toc}

-In MLlib, we implement popular linear methods such as logistic
+In `spark.ml`, we implement popular linear methods such as logistic
--- End diff --
I see the purpose now. It was the old MLlib text, but a lot of it still applies. The distinction is removed.
[GitHub] spark pull request: [SPARK-12212][ML][DOC] Clarifies the differenc...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10234#issuecomment-163714372
Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/47530/
[GitHub] spark pull request: [SPARK-12228] [SQL] Try to run execution hive'...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10204#issuecomment-163714745
**[Test build #47527 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47527/consoleFull)** for PR 10204 at commit [`c5294a9`](https://github.com/apache/spark/commit/c5294a91a52124fa45cb32bd5799d6f1c0374fd0).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-12155] [SPARK-12253] Fix executor OOM i...
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/10240#discussion_r47267569
--- Diff: core/src/main/scala/org/apache/spark/memory/ExecutionMemoryPool.scala ---
@@ -91,23 +108,34 @@ private[memory] class ExecutionMemoryPool(
     val numActiveTasks = memoryForTask.keys.size
     val curMem = memoryForTask(taskAttemptId)
-    // How much we can grant this task; don't let it grow to more than 1 / numActiveTasks;
-    // don't let it be negative
-    val maxToGrant =
-      math.min(numBytes, math.max(0, (poolSize / numActiveTasks) - curMem))
+    // In every iteration of this loop, we should first try to reclaim any borrowed execution
+    // space from storage. This is necessary because of the potential race condition where new
+    // storage blocks may steal the free execution memory that this task was waiting for.
+    maybeGrowPool(numBytes - memoryFree)
+
+    // Maximum size the pool would have after potentially growing the pool.
+    // This is used to compute the upper bound of how much memory each task can occupy. This
+    // must take into account potential free memory as well as the amount this pool currently
+    // occupies. Otherwise, we may run into SPARK-12155 where, in unified memory management,
+    // we did not take into account space that could have been freed by evicting cached blocks.
+    val maxPoolSize = computeMaxPoolSize()
+    val maxMemoryPerTask = maxPoolSize / numActiveTasks
+    val minMemoryPerTask = poolSize / (2 * numActiveTasks)
+
+    // How much we can grant this task; keep its share within 0 <= X <= 1 / numActiveTasks
+    val maxToGrant = math.min(numBytes, math.max(0, maxMemoryPerTask - curMem))
     // Only give it as much memory as is free, which might be none if it reached 1 / numTasks
     val toGrant = math.min(maxToGrant, memoryFree)
-    if (curMem < poolSize / (2 * numActiveTasks)) {
+    if (curMem < minMemoryPerTask) {
--- End diff --
The current code is hard to understand; I can prove that it is equivalent to mine.
[GitHub] spark pull request: [SPARK-12250] [SQL] Allow users to define a UD...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10236#issuecomment-163716622
Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-10647][MESOS] Fix zookeeper dir with me...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10057#issuecomment-163689616
**[Test build #47497 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47497/consoleFull)** for PR 10057 at commit [`b8fc74c`](https://github.com/apache/spark/commit/b8fc74c4f2d0e648b439ba722230e8e865ccca76).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-11563] [core] [repl] Use RpcEnv to tran...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/9923#issuecomment-163690659
**[Test build #47525 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47525/consoleFull)** for PR 9923 at commit [`08a74e5`](https://github.com/apache/spark/commit/08a74e5606a4df2317040ff270e5a9bfd7f6efd2).
[GitHub] spark pull request: [SPARK-12228] [SQL] Try to run execution hive'...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10204#issuecomment-163690621
**[Test build #47527 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47527/consoleFull)** for PR 10204 at commit [`c5294a9`](https://github.com/apache/spark/commit/c5294a91a52124fa45cb32bd5799d6f1c0374fd0).
[GitHub] spark pull request: [SPARK-12198] [SparkR] SparkR support read.par...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10191#issuecomment-163699540
**[Test build #47518 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47518/consoleFull)** for PR 10191 at commit [`9e0fd63`](https://github.com/apache/spark/commit/9e0fd637c97ea269398db7469499aa4d7e3dda45).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-9003] [MLlib] Add mapActive{Pairs,Value...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/7357#issuecomment-163701403
Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/47502/
[GitHub] spark pull request: [SPARK-12155] [SPARK-12253] Fix executor OOM i...
Github user andrewor14 commented on the pull request: https://github.com/apache/spark/pull/10240#issuecomment-163701442
ok, retest this please
[GitHub] spark pull request: [SPARK-12155] [SPARK-12253] Fix executor OOM i...
Github user andrewor14 commented on the pull request: https://github.com/apache/spark/pull/10240#issuecomment-163701557
Last commit actually passed tests last night.
[GitHub] spark pull request: [SPARK-12220][Core]Make Utils.fetchFile suppor...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10208#issuecomment-163705815
**[Test build #47529 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47529/consoleFull)** for PR 10208 at commit [`2c31643`](https://github.com/apache/spark/commit/2c3164386040b5051e0332652cff9d2052b90cdb).
[GitHub] spark pull request: [SPARK-12012][SQL] Backports PR #10004 to bran...
Github user yhuai commented on the pull request: https://github.com/apache/spark/pull/10250#issuecomment-163708398
I have merged it. Let's close this PR.
[GitHub] spark pull request: [SPARK-9372] [SQL] For joins, insert IS NOT NU...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10209#issuecomment-163710755
**[Test build #47531 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47531/consoleFull)** for PR 10209 at commit [`fb562fb`](https://github.com/apache/spark/commit/fb562fb67a761276456b14a81513f3fc69a6ead8).
[GitHub] spark pull request: [SPARK-9372] [SQL] For joins, insert IS NOT NU...
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/10209#discussion_r47267922
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicOperators.scala ---
@@ -122,11 +122,22 @@ case class Except(left: LogicalPlan, right: LogicalPlan) extends SetOperation(le
   override def output: Seq[Attribute] = left.output
 }

+object Join {
+  def apply(
+      left: LogicalPlan,
+      right: LogicalPlan,
+      joinType: JoinType,
+      condition: Option[Expression]): Join = {
+    Join(left, right, joinType, condition, None)
+  }
+}
+
 case class Join(
     left: LogicalPlan,
     right: LogicalPlan,
     joinType: JoinType,
-    condition: Option[Expression]) extends BinaryNode {
+    condition: Option[Expression],
+    generatedExpressions: Option[EquivalentExpressions]) extends BinaryNode {
--- End diff --
This is semi-public API, because I think some advanced projects do dig into catalyst, and we've never changed the signature of something as basic as `Join` before. Could we do this instead by fixing nullability propagation and only inserting the filter if the attribute is `nullable`?
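The optimization under discussion, inserting IS NOT NULL filters on equi-join keys, is semantics-preserving because SQL equality is null-rejecting: a null key can never satisfy the join condition, so null-keyed rows can be dropped before the join. A minimal Python sketch (hypothetical helper names, not Spark code) illustrating this:

```python
def equi_join(left, right, key):
    # Naive nested-loop equi-join. SQL equality is null-rejecting:
    # NULL = anything is not true, so null keys never produce a match.
    return [
        (l, r)
        for l in left
        for r in right
        if l[key] is not None and l[key] == r[key]
    ]

def join_with_null_filter(left, right, key):
    # The optimization: filter null-keyed rows out *before* joining.
    # The result is unchanged, but the filter can now be pushed below
    # the join (e.g. into a scan) to skip useless rows early.
    left_nn = [l for l in left if l[key] is not None]
    right_nn = [r for r in right if r[key] is not None]
    return equi_join(left_nn, right_nn, key)

left = [{"id": 1}, {"id": None}, {"id": 2}]
right = [{"id": 1}, {"id": None}]
# Both strategies produce the same single matching pair.
assert equi_join(left, right, "id") == join_with_null_filter(left, right, "id")
```

marmbrus's point is that if nullability propagation were fixed, the filter would only need to be inserted when the key attribute is actually `nullable`, avoiding the signature change to `Join`.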
[GitHub] spark pull request: [SPARK-12227][SQL] Support drop multiple colum...
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/10218#issuecomment-163718935
I'm not sure this is worth the complexity. I think most users will only ever drop by name (since dropping a complex expression doesn't really make sense), and in that case constructing a column is strictly more typing.
[GitHub] spark pull request: [SPARK-12212][ML][DOC] Clarifies the differenc...
Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/10234#issuecomment-163719076
In addition to moving ml-intro back to ml-guide, it'd be nice if the sidebar had links back to the main spark.ml and spark.mllib pages. That could be done in a separate JIRA/PR, if you prefer.
[GitHub] spark pull request: [SPARK-9372] [SQL] For joins, insert IS NOT NU...
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/10209#discussion_r47268378
--- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala ---
@@ -99,6 +99,13 @@ abstract class LogicalPlan extends QueryPlan[LogicalPlan] with Logging {
    */
   lazy val resolved: Boolean = expressions.forall(_.resolved) && childrenResolved

+  /**
+   * Returns true if the two plans are semantically equal. This should ignore state generated
+   * during planning to help the planning process.
+   * TODO: implement this as a pass that canonicalizes the plan tree instead?
+   */
+  def semanticEquals(other: LogicalPlan): Boolean = this == other
--- End diff --
Oh, this is a new semantic equals. How is this different than `sameResult`? Maybe we should unify the naming between Expression and LogicalPlan for this concept.
[GitHub] spark pull request: [SPARK-12063][SQL] Use number in group by clau...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10052#issuecomment-163722755
Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/47534/
[GitHub] spark pull request: [SPARK-12213] [SQL] use multiple partitions fo...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10228#issuecomment-163726187
Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-12063][SQL] Use number in group by clau...
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/10052#issuecomment-163728332
I'd try the following locally: `build/sbt scalastyle test:scalastyle catalyst/test sql/test`. Each of those commands can also be run separately, and you can prefix one with `~` to rerun it whenever something changes and iterate more quickly: `build/sbt ~scalastyle`.
[GitHub] spark pull request: [SPARK-12063][SQL] Use number in group by clau...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10052#issuecomment-163729292
**[Test build #47538 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47538/consoleFull)** for PR 10052 at commit [`bd453d5`](https://github.com/apache/spark/commit/bd453d5f6744aa8fdd03b5ee2ecd44b471165eb4).
[GitHub] spark pull request: [SPARK-12155] [SPARK-12253] Fix executor OOM i...
Github user andrewor14 commented on a diff in the pull request: https://github.com/apache/spark/pull/10240#discussion_r47275498
--- Diff: core/src/main/scala/org/apache/spark/memory/ExecutionMemoryPool.scala ---
@@ -91,23 +108,34 @@ private[memory] class ExecutionMemoryPool(
     val numActiveTasks = memoryForTask.keys.size
     val curMem = memoryForTask(taskAttemptId)
-    // How much we can grant this task; don't let it grow to more than 1 / numActiveTasks;
-    // don't let it be negative
-    val maxToGrant =
-      math.min(numBytes, math.max(0, (poolSize / numActiveTasks) - curMem))
+    // In every iteration of this loop, we should first try to reclaim any borrowed execution
+    // space from storage. This is necessary because of the potential race condition where new
+    // storage blocks may steal the free execution memory that this task was waiting for.
+    maybeGrowPool(numBytes - memoryFree)
+
+    // Maximum size the pool would have after potentially growing the pool.
+    // This is used to compute the upper bound of how much memory each task can occupy. This
+    // must take into account potential free memory as well as the amount this pool currently
+    // occupies. Otherwise, we may run into SPARK-12155 where, in unified memory management,
+    // we did not take into account space that could have been freed by evicting cached blocks.
+    val maxPoolSize = computeMaxPoolSize()
+    val maxMemoryPerTask = maxPoolSize / numActiveTasks
+    val minMemoryPerTask = poolSize / (2 * numActiveTasks)
+
+    // How much we can grant this task; keep its share within 0 <= X <= 1 / numActiveTasks
+    val maxToGrant = math.min(numBytes, math.max(0, maxMemoryPerTask - curMem))
     // Only give it as much memory as is free, which might be none if it reached 1 / numTasks
     val toGrant = math.min(maxToGrant, memoryFree)
-    if (curMem < poolSize / (2 * numActiveTasks)) {
+    if (curMem < minMemoryPerTask) {
--- End diff --
I was able to prove this myself. I summarized my thoughts in this gist: https://gist.github.com/andrewor14/aea58796dd25d2ec9f20 That said, I would still prefer to do this separately since this PR is already passing tests. :)
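The grant logic being reviewed can be modeled outside Spark. The following Python sketch (a hypothetical simplification of the Scala in the diff: it omits pool growth and block eviction, using a fixed pool size for both bounds) shows how each task's grant is clamped between a guaranteed minimum share of 1/(2N) and a fair maximum share of 1/N:

```python
def compute_grant(num_bytes, cur_mem, pool_size, memory_free, num_active_tasks):
    """Simplified model of ExecutionMemoryPool's per-task grant computation.

    With N active tasks, each task may occupy at most 1/N of the pool and
    is guaranteed up to 1/(2N) before it is forced to wait.
    """
    max_memory_per_task = pool_size // num_active_tasks
    min_memory_per_task = pool_size // (2 * num_active_tasks)

    # Never grant more than requested, never push the task above its
    # 1/N cap, and never let the grant go negative.
    max_to_grant = min(num_bytes, max(0, max_memory_per_task - cur_mem))
    # Only hand out memory that is actually free right now.
    to_grant = min(max_to_grant, memory_free)

    # If the request is not fully satisfied and the task would still sit
    # below its guaranteed 1/(2N) share, the real code blocks and retries.
    would_block = to_grant < num_bytes and cur_mem + to_grant < min_memory_per_task
    return to_grant, would_block

# Pool of 1000 bytes, 2 active tasks: each capped at 500, guaranteed 250.
print(compute_grant(num_bytes=600, cur_mem=0, pool_size=1000,
                    memory_free=1000, num_active_tasks=2))
# → (500, False): the 600-byte request is capped at the 1/N fair share.
```

The point of the equivalence both reviewers mention is that clamping the grant to `max_memory_per_task - cur_mem` up front yields the same blocking behavior as the earlier formulation that compared `cur_mem` against `poolSize / (2 * numActiveTasks)` inline.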
[GitHub] spark pull request: [SPARK-12212][ML][DOC] Clarifies the differenc...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10234#issuecomment-163731886 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/47537/ Test PASSed.
[GitHub] spark pull request: [SPARK-12213] [SQL] use multiple partitions fo...
Github user nongli commented on a diff in the pull request: https://github.com/apache/spark/pull/10228#discussion_r47276008 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/AggregationIterator.scala ---

```diff
@@ -165,155 +134,52 @@ abstract class AggregationIterator(
   // Initializing functions used to process a row.
   protected val processRow: (MutableRow, InternalRow) => Unit = {
-    val rowToBeProcessed = new JoinedRow
-    val aggregationBufferSchema = allAggregateFunctions.flatMap(_.aggBufferAttributes)
-    aggregationMode match {
-      // Partial-only
-      case (Some(Partial), None) =>
-        val updateExpressions = nonCompleteAggregateFunctions.flatMap {
-          case ae: DeclarativeAggregate => ae.updateExpressions
-          case agg: AggregateFunction => Seq.fill(agg.aggBufferAttributes.length)(NoOp)
-        }
-        val expressionAggUpdateProjection =
-          newMutableProjection(updateExpressions, aggregationBufferSchema ++ valueAttributes)()
-
-        (currentBuffer: MutableRow, row: InternalRow) => {
-          expressionAggUpdateProjection.target(currentBuffer)
-          // Process all expression-based aggregate functions.
-          expressionAggUpdateProjection(rowToBeProcessed(currentBuffer, row))
-          // Process all imperative aggregate functions.
-          var i = 0
-          while (i < nonCompleteImperativeAggregateFunctions.length) {
-            nonCompleteImperativeAggregateFunctions(i).update(currentBuffer, row)
-            i += 1
-          }
-        }
-
-      // PartialMerge-only or Final-only
-      case (Some(PartialMerge), None) | (Some(Final), None) =>
-        val inputAggregationBufferSchema = if (initialInputBufferOffset == 0) {
-          // If initialInputBufferOffset, the input value does not contain
-          // grouping keys.
-          // This part is pretty hacky.
-          allAggregateFunctions.flatMap(_.inputAggBufferAttributes).toSeq
-        } else {
-          groupingKeyAttributes ++ allAggregateFunctions.flatMap(_.inputAggBufferAttributes)
-        }
-        // val inputAggregationBufferSchema =
-        //   groupingKeyAttributes ++
-        //     allAggregateFunctions.flatMap(_.cloneBufferAttributes)
-        val mergeExpressions = nonCompleteAggregateFunctions.flatMap {
-          case ae: DeclarativeAggregate => ae.mergeExpressions
-          case agg: AggregateFunction => Seq.fill(agg.aggBufferAttributes.length)(NoOp)
-        }
-        // This projection is used to merge buffer values for all expression-based aggregates.
-        val expressionAggMergeProjection =
-          newMutableProjection(
-            mergeExpressions,
-            aggregationBufferSchema ++ inputAggregationBufferSchema)()
-
-        (currentBuffer: MutableRow, row: InternalRow) => {
-          // Process all expression-based aggregate functions.
-          expressionAggMergeProjection.target(currentBuffer)(rowToBeProcessed(currentBuffer, row))
-          // Process all imperative aggregate functions.
-          var i = 0
-          while (i < nonCompleteImperativeAggregateFunctions.length) {
-            nonCompleteImperativeAggregateFunctions(i).merge(currentBuffer, row)
-            i += 1
-          }
-        }
-
-      // Final-Complete
-      case (Some(Final), Some(Complete)) =>
-        val completeAggregateFunctions: Array[AggregateFunction] =
-          allAggregateFunctions.takeRight(completeAggregateExpressions.length)
-        // All imperative aggregate functions with mode Complete.
-        val completeImperativeAggregateFunctions: Array[ImperativeAggregate] =
-          completeAggregateFunctions.collect { case func: ImperativeAggregate => func }
-
-        // The first initialInputBufferOffset values of the input aggregation buffer is
-        // for grouping expressions and distinct columns.
-        val groupingAttributesAndDistinctColumns = valueAttributes.take(initialInputBufferOffset)
-
-        val completeOffsetExpressions =
-          Seq.fill(completeAggregateFunctions.map(_.aggBufferAttributes.length).sum)(NoOp)
-        // We do not touch buffer values of aggregate functions with the Final mode.
-        val finalOffsetExpressions =
-          Seq.fill(nonCompleteAggregateFunctions.map(_.aggBufferAttributes.length).sum)(NoOp)
-
-        val mergeInputSchema =
-          aggregationBufferSchema ++
-            groupingAttributesAndDistinctColumns ++
-            nonCompleteAggregateFunctions.flatMap(_.inputAggBufferAttributes)
-        val mergeExpressions =
-          nonCompleteAggregateFunctions.flatMap {
-
```
[GitHub] spark pull request: [SPARK-12063][SQL] Use number in group by clau...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10052#issuecomment-163734167 **[Test build #47538 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47538/consoleFull)** for PR 10052 at commit [`bd453d5`](https://github.com/apache/spark/commit/bd453d5f6744aa8fdd03b5ee2ecd44b471165eb4).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-12063][SQL] Use number in group by clau...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10052#issuecomment-163734214 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-12220][Core]Make Utils.fetchFile suppor...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10208#issuecomment-163733865 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-12220][Core]Make Utils.fetchFile suppor...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10208#issuecomment-163733716 **[Test build #47529 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47529/consoleFull)** for PR 10208 at commit [`2c31643`](https://github.com/apache/spark/commit/2c3164386040b5051e0332652cff9d2052b90cdb).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-12212][ML][DOC] Clarifies the differenc...
Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/10234#issuecomment-163737620 That's the only remaining issue I found. I checked against the Spark 1.5 doc links as well.
[GitHub] spark pull request: [SPARK-11563] [core] [repl] Use RpcEnv to tran...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9923#issuecomment-163716405 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-12063][SQL] Use number in group by clau...
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/10052#issuecomment-163718318 ok to test
[GitHub] spark pull request: [SPARK-12256] [SQL] Code refactoring: naming b...
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/10243#issuecomment-163720242 +1 to @rxin concerns on wrapping. A good rule of thumb is to always break at the highest syntactic level (not in the middle of some construct like a list of arguments). Otherwise you break things up that actually belong together and create an artificial separation.

```scala
// No
def getPath: Expression = path.getOrElse(BoundReference(0,
  inferDataType(typeToken)._1, nullable = true))

// Yes
def getPath: Expression =
  path.getOrElse(BoundReference(0, inferDataType(typeToken)._1, nullable = true))
```
[GitHub] spark pull request: [SPARK-12248][CORE] Adds limits per cpu for me...
Github user drcrallen commented on the pull request: https://github.com/apache/spark/pull/10232#issuecomment-163724877 Doesn't affect heap memory properly, closing until fixed
[GitHub] spark pull request: [SPARK-12212][ML][DOC] Clarifies the differenc...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10234#issuecomment-163726273 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-12213] [SQL] use multiple partitions fo...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10228#issuecomment-163726125 **[Test build #47533 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47533/consoleFull)** for PR 10228 at commit [`3f60962`](https://github.com/apache/spark/commit/3f60962c2fd2f8f140714d0010dd0bb424b034b0).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-12257][SQL] Non partitioned insert into...
Github user dilipbiswal commented on a diff in the pull request: https://github.com/apache/spark/pull/10254#discussion_r47274373 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/InsertIntoHiveTable.scala ---

```diff
@@ -155,6 +155,11 @@ case class InsertIntoHiveTable(
     val partitionColumns = fileSinkConf.getTableInfo.getProperties.getProperty("partition_columns")
     val partitionColumnNames = Option(partitionColumns).map(_.split("/")).orNull

+    // Validate that partition values are specified for partition columns.
+    if (partitionColumnNames != null && partitionColumnNames.size > 0 && partitionSpec.size == 0) {
+      throw new SparkException(ErrorMsg.NEED_PARTITION_ERROR.getMsg)
```

--- End diff -- @marmbrus Thanks. Actually, right after the code block I changed, there are a few places where we raise SparkException, so I thought there might be a reason for it and followed suit. :-) I will change all those places as well.
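The check in the diff above can be sketched in isolation. This is an illustrative stand-in only: the exception type and message here are placeholders, not Spark's actual `SparkException`/`ErrorMsg.NEED_PARTITION_ERROR` machinery, and which exception type the reviewer ultimately preferred is not shown in this thread:

```scala
// Standalone sketch: inserting into a partitioned table without any
// partition spec should fail fast, before any data is written.
object PartitionSpecCheck {
  def validate(partitionColumnNames: Seq[String], partitionSpec: Map[String, String]): Unit = {
    if (partitionColumnNames.nonEmpty && partitionSpec.isEmpty) {
      // Placeholder exception; the real code throws a Spark-specific type.
      throw new IllegalArgumentException(
        "need to specify partition columns because the destination table is partitioned")
    }
  }
}
```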
[GitHub] spark pull request: [SPARK-12182][ML] Distributed binning for tree...
Github user jodersky commented on a diff in the pull request: https://github.com/apache/spark/pull/10231#discussion_r47274795 --- Diff: mllib/src/main/scala/org/apache/spark/ml/tree/impl/RandomForest.scala ---

```diff
@@ -842,60 +842,63 @@ private[ml] object RandomForest extends Logging {
         1.0
       }
       logDebug("fraction of data used for calculating quantiles = " + fraction)
-      input.sample(withReplacement = false, fraction, new XORShiftRandom(seed).nextInt()).collect()
+      input.sample(withReplacement = false, fraction, new XORShiftRandom(seed).nextInt())
     } else {
-      new Array[LabeledPoint](0)
+      input.sparkContext.emptyRDD[LabeledPoint]
     }

-    val splits = new Array[Array[Split]](numFeatures)
-
-    // Find all splits.
-    // Iterate over all features.
-    var featureIndex = 0
-    while (featureIndex < numFeatures) {
-      if (metadata.isContinuous(featureIndex)) {
-        val featureSamples = sampledInput.map(_.features(featureIndex))
-        val featureSplits = findSplitsForContinuousFeature(featureSamples, metadata, featureIndex)
+    findSplitsBinsBySorting(sampledInput, metadata, continuousFeatures)
+  }

-        val numSplits = featureSplits.length
-        logDebug(s"featureIndex = $featureIndex, numSplits = $numSplits")
-        splits(featureIndex) = new Array[Split](numSplits)
+  private def findSplitsBinsBySorting(
+      input: RDD[LabeledPoint],
+      metadata: DecisionTreeMetadata,
+      continuousFeatures: IndexedSeq[Int]): Array[Array[Split]] = {
+
+    val continuousSplits = {
+      // reduce the parallelism for split computations when there are less
+      // continuous features than input partitions. this prevents tasks from
+      // being spun up that will definitely do no work.
+      val numPartitions = math.min(continuousFeatures.length, input.partitions.length)
+
+      input
+        .flatMap(point => continuousFeatures.map(idx => (idx, point.features(idx))))
+        .groupByKey(numPartitions)
+        .map { case (idx, samples) =>
+          val thresholds = findSplitsForContinuousFeature(samples.toArray, metadata, idx)
+          val splits: Array[Split] = thresholds.map(thresh => new ContinuousSplit(idx, thresh))
+          logDebug(s"featureIndex = $idx, numSplits = ${splits.length}")
+          (idx, splits)
+        }.collectAsMap()
+    }

-        var splitIndex = 0
-        while (splitIndex < numSplits) {
-          val threshold = featureSplits(splitIndex)
-          splits(featureIndex)(splitIndex) = new ContinuousSplit(featureIndex, threshold)
-          splitIndex += 1
-        }
-      } else {
-        // Categorical feature
-        if (metadata.isUnordered(featureIndex)) {
-          val numSplits = metadata.numSplits(featureIndex)
-          val featureArity = metadata.featureArity(featureIndex)
-          // TODO: Use an implicit representation mapping each category to a subset of indices.
-          // I.e., track indices such that we can calculate the set of bins for which
-          // feature value x splits to the left.
-          // Unordered features
-          // 2^(maxFeatureValue - 1) - 1 combinations
-          splits(featureIndex) = new Array[Split](numSplits)
-          var splitIndex = 0
-          while (splitIndex < numSplits) {
-            val categories: List[Double] =
-              extractMultiClassCategories(splitIndex + 1, featureArity)
-            splits(featureIndex)(splitIndex) =
-              new CategoricalSplit(featureIndex, categories.toArray, featureArity)
-            splitIndex += 1
-          }
-        } else {
-          // Ordered features
-          // Bins correspond to feature values, so we do not need to compute splits or bins
-          // beforehand. Splits are constructed as needed during training.
-          splits(featureIndex) = new Array[Split](0)
+    val numFeatures = metadata.numFeatures
+    val splits = Range(0, numFeatures).map {
+      case i if metadata.isContinuous(i) =>
+        val split = continuousSplits(i)
+        metadata.setNumSplits(i, split.length)
+        split
+
+      case i if metadata.isCategorical(i) && metadata.isUnordered(i) =>
+        // Unordered features
+        // 2^(maxFeatureValue - 1) - 1 combinations
+        val featureArity = metadata.featureArity(i)
+        val split: IndexedSeq[Split] = Range(0, metadata.numSplits(i)).map { splitIndex =>
```

--- End diff -- You could use an `Array.tabulate` here. Something like

```scala
Array.tabulate[Split](numSplits(i)) { splitIndex =>
  ...
}
```
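For reference, `Array.tabulate` builds an array by applying a function to each index, which is what the suggestion above leans on. A standalone sketch follows; Spark's `Split`/`CategoricalSplit` types are replaced with plain lists, and the bit-decoding stand-in for `extractMultiClassCategories` is an assumption about its behavior, not the real implementation:

```scala
// Standalone demo of the Array.tabulate idiom suggested above, replacing
// a manual while-loop over splitIndex.
object TabulateDemo {
  // Stand-in for extractMultiClassCategories(splitIndex + 1, featureArity):
  // interpret the set bits of (splitIndex + 1) as the categories that go left.
  def categories(splitIndex: Int, featureArity: Int): List[Double] =
    (0 until featureArity)
      .filter(bit => (((splitIndex + 1) >> bit) & 1) == 1)
      .map(_.toDouble)
      .toList

  // One entry per split, built directly from its index.
  def allSplits(numSplits: Int, featureArity: Int): Array[List[Double]] =
    Array.tabulate(numSplits)(splitIndex => categories(splitIndex, featureArity))
}
```

With `featureArity = 3` there are 2^(3-1) - 1 = 3 unordered splits, each identified by the bit pattern of its index.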
[GitHub] spark pull request: [SPARK-12212][ML][DOC] Clarifies the differenc...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10234#issuecomment-163731702 **[Test build #47537 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47537/consoleFull)** for PR 10234 at commit [`8432ac9`](https://github.com/apache/spark/commit/8432ac947a2ee469dbc4082a4fa702da82f44ebe).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `* more functionality for random forests: estimates of feature importance, as well as the predicted probability of each class (a.k.a. class conditional probabilities) for classification.`
[GitHub] spark pull request: [SPARK-12212][ML][DOC] Clarifies the differenc...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10234#issuecomment-163731884 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-12155] [SPARK-12253] Fix executor OOM i...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10240#issuecomment-163731260 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-12213] [SQL] use multiple partitions fo...
Github user nongli commented on a diff in the pull request: https://github.com/apache/spark/pull/10228#discussion_r47276459 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/AggregationIterator.scala ---

```diff
@@ -49,41 +47,20 @@ abstract class AggregationIterator(
   // Initializing functions.
   ///

-  // An Seq of all AggregateExpressions.
-  // It is important that all AggregateExpressions with the mode Partial, PartialMerge or Final
-  // are at the beginning of the allAggregateExpressions.
-  protected val allAggregateExpressions =
-    nonCompleteAggregateExpressions ++ completeAggregateExpressions
-  require(
-    allAggregateExpressions.map(_.mode).distinct.length <= 2,
-    s"$allAggregateExpressions are not supported becuase they have more than 2 distinct modes.")
-
-  /**
-   * The distinct modes of AggregateExpressions. Right now, we can handle the following mode:
```

--- End diff -- Can you add a similar comment for the new version? Which combinations are valid now?
[GitHub] spark pull request: [SPARK-12063][SQL] Use number in group by clau...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10052#issuecomment-163734217 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/47538/ Test FAILed.
[GitHub] spark pull request: [SPARK-12250] [SQL] Allow users to define a UD...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/10236
[GitHub] spark pull request: [SPARK-12212][ML][DOC] Clarifies the differenc...
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/10234#discussion_r47278589 --- Diff: docs/ml-survival-regression.md ---

```diff
@@ -1,7 +1,7 @@
 ---
 layout: global
-title: Survival Regression - ML
-displayTitle: ML - Survival Regression
+title: Survival Regression - spark.ml
+displayTitle: Survival Regression - spark.ml
```

--- End diff -- This doc should now be a redirect to the ml-classification-regression.html#survival-regression section. Also, it looks like some of the math renders incorrectly, but let's fix that in a follow-up.
[GitHub] spark pull request: [SPARK-12155] [SPARK-12253] Fix executor OOM i...
Github user andrewor14 commented on a diff in the pull request: https://github.com/apache/spark/pull/10240#discussion_r47268739 --- Diff: core/src/main/scala/org/apache/spark/memory/ExecutionMemoryPool.scala ---

```diff
@@ -91,23 +108,34 @@ private[memory] class ExecutionMemoryPool(
       val numActiveTasks = memoryForTask.keys.size
       val curMem = memoryForTask(taskAttemptId)

-      // How much we can grant this task; don't let it grow to more than 1 / numActiveTasks;
-      // don't let it be negative
-      val maxToGrant =
-        math.min(numBytes, math.max(0, (poolSize / numActiveTasks) - curMem))
+      // In every iteration of this loop, we should first try to reclaim any borrowed execution
+      // space from storage. This is necessary because of the potential race condition where new
+      // storage blocks may steal the free execution memory that this task was waiting for.
+      maybeGrowPool(numBytes - memoryFree)
+
+      // Maximum size the pool would have after potentially growing the pool.
+      // This is used to compute the upper bound of how much memory each task can occupy. This
+      // must take into account potential free memory as well as the amount this pool currently
+      // occupies. Otherwise, we may run into SPARK-12155 where, in unified memory management,
+      // we did not take into account space that could have been freed by evicting cached blocks.
+      val maxPoolSize = computeMaxPoolSize()
+      val maxMemoryPerTask = maxPoolSize / numActiveTasks
+      val minMemoryPerTask = poolSize / (2 * numActiveTasks)
+
+      // How much we can grant this task; keep its share within 0 <= X <= 1 / numActiveTasks
+      val maxToGrant = math.min(numBytes, math.max(0, maxMemoryPerTask - curMem))
       // Only give it as much memory as is free, which might be none if it reached 1 / numTasks
       val toGrant = math.min(maxToGrant, memoryFree)

-      if (curMem < poolSize / (2 * numActiveTasks)) {
+      if (curMem < minMemoryPerTask) {
```

--- End diff -- yeah, I agree, though it's something we can always fix separately so we don't block the release.
Let's defer the judgment to @JoshRosen.
[GitHub] spark pull request: [SPARK-9372] [SQL] For joins, insert IS NOT NU...
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/10209#discussion_r47268048 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/plans/PlanTest.scala ---

```diff
@@ -43,7 +43,7 @@ abstract class PlanTest extends SparkFunSuite {
   protected def comparePlans(plan1: LogicalPlan, plan2: LogicalPlan) {
     val normalized1 = normalizeExprIds(plan1)
     val normalized2 = normalizeExprIds(plan2)
-    if (normalized1 != normalized2) {
+    if (!normalized1.semanticEquals(normalized2)) {
```

--- End diff -- Existing: do we need this hacky normalization logic above anymore? I don't think `semanticEquals` existed when I wrote this.
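The distinction in this hunk, structural equality versus ID-insensitive equality, can be illustrated without Catalyst. A toy sketch follows (the real `semanticEquals` on Catalyst expressions does considerably more than comparing names):

```scala
// Toy model of comparePlans: structural equality fails when auto-generated
// expression IDs differ, so plan tests either normalize the IDs first or
// use an ID-insensitive ("semantic") comparison.
case class Attr(name: String, exprId: Long) {
  def semanticEquals(other: Attr): Boolean = name == other.name
}

object PlanCompare {
  // The normalizeExprIds-style approach: rewrite every ID to a canonical value.
  def normalize(a: Attr): Attr = a.copy(exprId = 0L)
}
```

Either mechanism makes two otherwise-identical plans compare equal, which is why the comment asks whether keeping both is still necessary.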
[GitHub] spark pull request: [SPARK-12213] [SQL] use multiple partitions fo...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10228#issuecomment-163720246 **[Test build #47533 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47533/consoleFull)** for PR 10228 at commit [`3f60962`](https://github.com/apache/spark/commit/3f60962c2fd2f8f140714d0010dd0bb424b034b0).
[GitHub] spark pull request: [SPARK-11131] [core] Fix race in worker regist...
Github user zsxwing commented on the pull request: https://github.com/apache/spark/pull/9138#issuecomment-163719606 @vanzin just found an issue about this change. Now if the master receives `RegisterWorker`, it won't use the `workerRef` to send the reply. So there is no connection from `Master` to the server in `Worker`. If the `Worker` is killed now, `Master` only observes some client is lost, but the address is just a client address in Worker and won't match the Worker address. So `Master` cannot remove this dead `Worker` at once. However, this Worker will be removed in 60 seconds because of no heartbeat. See the log here: https://www.mail-archive.com/dev@spark.apache.org/msg12332.html
[GitHub] spark pull request: [SPARK-7727] [SQL] Avoid inner classes in Rule...
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/10174#discussion_r47270183 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/DefaultOptimizerExtendableSuite.scala ---

```diff
@@ -0,0 +1,45 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.catalyst
+
+import org.apache.spark.SparkFunSuite
+import org.apache.spark.sql.catalyst.optimizer.{DefaultOptimizer, Optimizer}
+import org.apache.spark.sql.catalyst.plans.logical.LogicalPlan
+import org.apache.spark.sql.catalyst.rules.Batch
+
+/**
+ * This is a test for SPARK-7727 if the Default Optimizer is kept being extendable
+ */
+class DefaultOptimizerExtendableSuite extends SparkFunSuite{
+
+  /**
+   * This class represents a dummy extended optimizer that takes the rules of the
+   * DefaultOptimizer and adds custom ones.
+   */
+  class ExtendedOptimizer extends Optimizer{
```

--- End diff -- Nit: space before `{`
[GitHub] spark pull request: [SPARK-12063][SQL] Use number in group by clau...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10052#issuecomment-163722742 **[Test build #47534 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47534/consoleFull)** for PR 10052 at commit [`8a5a4f6`](https://github.com/apache/spark/commit/8a5a4f63fc66792e924f4f3355df357815aae13b). * This patch **fails Scala style tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-12063][SQL] Use number in group by clau...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10052#issuecomment-163722112 **[Test build #47534 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47534/consoleFull)** for PR 10052 at commit [`8a5a4f6`](https://github.com/apache/spark/commit/8a5a4f63fc66792e924f4f3355df357815aae13b).
[GitHub] spark pull request: [SPARK-12212][ML][DOC] Clarifies the differenc...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10234#issuecomment-163722357 **[Test build #47535 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47535/consoleFull)** for PR 10234 at commit [`c75a5ca`](https://github.com/apache/spark/commit/c75a5ca68cec48574cddeb9f1cb8695b8d44e9ea).
[GitHub] spark pull request: [SPARK-12212][ML][DOC] Clarifies the differenc...
Github user thunterdb commented on the pull request: https://github.com/apache/spark/pull/10234#issuecomment-163723975 @jkbradley done: https://cloud.githubusercontent.com/assets/7594753/11725710/e9949f04-9f2f-11e5-8ba5-7f955e8b41fa.png
[GitHub] spark pull request: [SPARK-11923][ML] Python API for ml.feature.Ch...
Github user holdenk commented on a diff in the pull request: https://github.com/apache/spark/pull/10186#discussion_r47273371

--- Diff: python/pyspark/ml/feature.py ---

@@ -2093,6 +2093,95 @@ class RFormulaModel(JavaModel):
     """
 
+@inherit_doc
+class ChiSqSelector(JavaEstimator, HasFeaturesCol, HasOutputCol, HasLabelCol):
+    """
+    .. note:: Experimental
+
+    Chi-Squared feature selection, which selects categorical features to use for predicting a
+    categorical label.
+
+    >>> from pyspark.mllib.linalg import Vectors
+    >>> df = sqlContext.createDataFrame(
+    ...     [(Vectors.dense([0.0, 0.0, 18.0, 1.0]), 1.0),
+    ...      (Vectors.dense([0.0, 1.0, 12.0, 0.0]), 0.0),
+    ...      (Vectors.dense([1.0, 0.0, 15.0, 0.1]), 0.0)],
+    ...     ["features", "label"])
+    >>> selector = ChiSqSelector(numTopFeatures=1, outputCol="selectedFeatures")
+    >>> model = selector.fit(df)
+    >>> model.transform(df).collect()[0].selectedFeatures
+    DenseVector([1.0])
+    >>> model.transform(df).collect()[1].selectedFeatures
+    DenseVector([0.0])
+    >>> model.transform(df).collect()[2].selectedFeatures
+    DenseVector([0.1])
+
+    .. versionadded:: 1.6.0
+    """
+
+    # a placeholder to make it appear in the generated doc
+    numTopFeatures = \
+        Param(Params._dummy(), "numTopFeatures",
+              "Number of features that selector will select, ordered by statistics value " +
+              "descending. If the number of features is < numTopFeatures, then this will select " +
+              "all features.")
+
+    @keyword_only
+    def __init__(self, numTopFeatures=50, featuresCol="features", outputCol=None, labelCol="label"):
+        """
+        __init__(self, numTopFeatures=50, featuresCol="features", outputCol=None, labelCol="label")
+        """
+        super(ChiSqSelector, self).__init__()
+        self._java_obj = self._new_java_obj("org.apache.spark.ml.feature.ChiSqSelector", self.uid)
+        self.numTopFeatures = \
+            Param(self, "numTopFeatures",
+                  "Number of features that selector will select, ordered by statistics value " +
+                  "descending. If the number of features is < numTopFeatures, then this will " +
+                  "select all features.")
+        kwargs = self.__init__._input_kwargs
+        self.setParams(**kwargs)
+
+    @keyword_only
+    @since("1.6.0")
+    def setParams(self, numTopFeatures=50, featuresCol="features", outputCol=None,
+                  labelCol="labels"):
+        """
+        setParams(self, numTopFeatures=50, featuresCol="features", outputCol=None,\
+                  labelCol="labels")
+        Sets params for this ChiSqSelector.
+        """
+        kwargs = self.setParams._input_kwargs
+        return self._set(**kwargs)
+
+    @since("1.6.0")
+    def setNumTopFeatures(self, value):
+        """
+        Sets the value of :py:attr:`numTopFeatures`.
+        """
+        self._paramMap[self.numTopFeatures] = value
+        return self
+
+    @since("1.6.0")
+    def getNumTopFeatures(self):
+        """
+        Gets the value of numTopFeatures or its default value.
+        """
+        return self.getOrDefault(self.numTopFeatures)
+
+    def _create_model(self, java_model):
+        return ChiSqSelectorModel(java_model)
+
+
+class ChiSqSelectorModel(JavaModel):

--- End diff --

This model is loadable and saveable in Java; I don't see us doing this elsewhere in ml/ yet (although we do it in mllib/), but do we maybe want to use the JavaLoader & JavaSaveable base classes?
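The mixin pattern holdenk suggests, where save/load live once in base classes and each model merely inherits them, can be sketched in plain Python. All names below are illustrative; the real `JavaSaveable`/`JavaLoader` mixins in `pyspark.mllib.util` delegate to JVM objects and take a SparkContext and a path.

```python
# Hypothetical sketch of the save/load mixin pattern: serialization logic is
# written once in base classes, and each model only inherits it. Names are
# illustrative; the real pyspark.mllib.util mixins delegate to the JVM.
import json


class Saveable:
    def save(self, store, path):
        # Serialize this model's attributes into a shared store keyed by path.
        store[path] = json.dumps(self.__dict__)


class Loader:
    @classmethod
    def load(cls, store, path):
        # Rebuild an instance without calling __init__, then restore state.
        model = cls.__new__(cls)
        model.__dict__.update(json.loads(store[path]))
        return model


class SelectorModel(Saveable, Loader):
    def __init__(self, selected_indices):
        self.selected_indices = selected_indices


store = {}
SelectorModel([2]).save(store, "/tmp/model")   # path is illustrative
restored = SelectorModel.load(store, "/tmp/model")
assert restored.selected_indices == [2]
```

The benefit is the one holdenk points at: a new model class such as `ChiSqSelectorModel` would get persistence by inheritance instead of repeating it per class.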
[GitHub] spark pull request: [SPARK-12063][SQL] Use number in group by clau...
Github user dereksabryfb commented on the pull request: https://github.com/apache/spark/pull/10052#issuecomment-163727440 Apologies, I haven't been able to run `./dev/run-tests`; it fails with the following exception: http://pastebin.com/L0p0sjtJ. So I wasn't able to catch the style issues locally, and I'm not sure whether there's anything else the build doesn't flag.
[GitHub] spark pull request: [SPARK-12212][ML][DOC] Clarifies the differenc...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10234#issuecomment-163727576 **[Test build #47537 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47537/consoleFull)** for PR 10234 at commit [`8432ac9`](https://github.com/apache/spark/commit/8432ac947a2ee469dbc4082a4fa702da82f44ebe).
[GitHub] spark pull request: [SPARK-12155] [SPARK-12253] Fix executor OOM i...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10240#issuecomment-163731070 **[Test build #47528 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47528/consoleFull)** for PR 10240 at commit [`d8be669`](https://github.com/apache/spark/commit/d8be66911d2abf3da46a25a54a7d80fd1eeebdfa). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-12155] [SPARK-12253] Fix executor OOM i...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10240#issuecomment-163731262 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/47528/ Test PASSed.
[GitHub] spark pull request: [SPARK-9372] [SQL] For joins, insert IS NOT NU...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10209#issuecomment-163732061 **[Test build #47531 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47531/consoleFull)** for PR 10209 at commit [`fb562fb`](https://github.com/apache/spark/commit/fb562fb67a761276456b14a81513f3fc69a6ead8). * This patch **fails PySpark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-9372] [SQL] For joins, insert IS NOT NU...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10209#issuecomment-163732157 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/47531/ Test FAILed.
[GitHub] spark pull request: [SPARK-12220][Core]Make Utils.fetchFile suppor...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10208#issuecomment-163733868 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/47529/ Test PASSed.
[GitHub] spark pull request: [SPARK-12228] [SQL] Try to run execution hive'...
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/10204#issuecomment-163734579 LGTM
[GitHub] spark pull request: [SPARK-12228] [SQL] Try to run execution hive'...
Github user yhuai commented on the pull request: https://github.com/apache/spark/pull/10204#issuecomment-163734657 Thanks! Merging to master.
[GitHub] spark pull request: [SPARK-12250] [SQL] Allow users to define a UD...
Github user yhuai commented on the pull request: https://github.com/apache/spark/pull/10236#issuecomment-163734799 The only change is to remove that `require`. I am merging it to master and branch 1.6.
[GitHub] spark pull request: [SPARK-12250] [SQL] Allow users to define a UD...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10236#issuecomment-163716624 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/47526/ Test PASSed.
[GitHub] spark pull request: [SPARK-12257][SQL] Non partitioned insert into...
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/10254#discussion_r47269588

--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/InsertIntoHiveTable.scala ---

@@ -155,6 +155,11 @@ case class InsertIntoHiveTable(
     val partitionColumns = fileSinkConf.getTableInfo.getProperties.getProperty("partition_columns")
     val partitionColumnNames = Option(partitionColumns).map(_.split("/")).orNull
+    // Validate that partition values are specified for partition columns.
+    if (partitionColumnNames != null && partitionColumnNames.size > 0 && partitionSpec.size == 0) {
+      throw new SparkException(ErrorMsg.NEED_PARTITION_ERROR.getMsg)

--- End diff --

`AnalysisException` for anything that is thrown due to an invalid query.
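The point of marmbrus's review comment is that invalid queries should fail at analysis time with an `AnalysisException`, not at execution time with a generic `SparkException`. A hypothetical Python analogue of that validation (illustrative names only, mirroring the Scala check in the diff) looks like:

```python
# Hypothetical Python analogue of the check in the diff above, raising an
# analysis-time error (as marmbrus suggests) rather than a runtime one.
# Names are illustrative, not Spark's actual API.
class AnalysisException(Exception):
    """Raised for invalid queries, before any execution starts."""


def validate_partition_spec(partition_column_names, partition_spec):
    # A partitioned destination table requires a partition spec on insert.
    if partition_column_names and not partition_spec:
        raise AnalysisException(
            "need to specify partition columns because the destination "
            "table is partitioned")


validate_partition_spec([], {})                        # unpartitioned: fine
validate_partition_spec(["ds"], {"ds": "2015-12-10"})  # spec given: fine
try:
    validate_partition_spec(["ds"], {})
except AnalysisException:
    pass  # invalid insert rejected before execution
```

Failing during analysis gives the user an actionable error immediately instead of a failed job partway through execution.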
[GitHub] spark pull request: [SPARK-12063][SQL] Use number in group by clau...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10052#issuecomment-163722753 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-12213] [SQL] use multiple partitions fo...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10228#issuecomment-163724932 **[Test build #47536 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47536/consoleFull)** for PR 10228 at commit [`a9eae30`](https://github.com/apache/spark/commit/a9eae303166d6c3ba1f80a22265482b9f4d0a525).
[GitHub] spark pull request: [SPARK-12248][CORE] Adds limits per cpu for me...
Github user drcrallen closed the pull request at: https://github.com/apache/spark/pull/10232
[GitHub] spark pull request: [SPARK-12212][ML][DOC] Clarifies the differenc...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10234#issuecomment-163726109 **[Test build #47535 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47535/consoleFull)** for PR 10234 at commit [`c75a5ca`](https://github.com/apache/spark/commit/c75a5ca68cec48574cddeb9f1cb8695b8d44e9ea). * This patch passes all tests. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `* more functionality for random forests: estimates of feature importance, as well as the predicted probability of each class (a.k.a. class conditional probabilities) for classification.`
[GitHub] spark pull request: [SPARK-12213] [SQL] use multiple partitions fo...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10228#issuecomment-163726189 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/47533/ Test FAILed.
[GitHub] spark pull request: [SPARK-12212][ML][DOC] Clarifies the differenc...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10234#issuecomment-163726277 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/47535/ Test PASSed.
[GitHub] spark pull request: [SPARK-12149] [Web UI] Executor UI improvement...
Github user holdenk commented on a diff in the pull request: https://github.com/apache/spark/pull/10154#discussion_r47274898

--- Diff: core/src/main/scala/org/apache/spark/ui/exec/ExecutorsPage.scala ---

@@ -33,11 +33,13 @@ private[ui] case class ExecutorSummaryInfo(
     rddBlocks: Int,
     memoryUsed: Long,
     diskUsed: Long,
+    totalCores: Int,

--- End diff --

So the comment for this case class says it isn't used anymore - do we really need to update it?
[GitHub] spark pull request: [SPARK-9372] [SQL] For joins, insert IS NOT NU...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10209#issuecomment-163732156 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-12235][SPARKR] Enhance mutate() to supp...
Github user felixcheung commented on the pull request: https://github.com/apache/spark/pull/10220#issuecomment-163732757 Sure, I'll check. We were discussing a bit in SPARK-12235
[GitHub] spark pull request: [SPARK-12228] [SQL] Try to run execution hive'...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/10204
[GitHub] spark pull request: [SPARK-2750][WEB UI] Add https support to the ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10238#issuecomment-163737331 **[Test build #47532 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47532/consoleFull)** for PR 10238 at commit [`f6f1dab`](https://github.com/apache/spark/commit/f6f1dab2eede5147c2387efa4d02d92f6c7a5388). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-2750][WEB UI] Add https support to the ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10238#issuecomment-163737414 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/47532/ Test FAILed.
[GitHub] spark pull request: [SPARK-2750][WEB UI] Add https support to the ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10238#issuecomment-163737413 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-12155] [SPARK-12253] Fix executor OOM i...
Github user andrewor14 commented on the pull request: https://github.com/apache/spark/pull/10240#issuecomment-163738746 @davies please look at the final changes.
[GitHub] spark pull request: [SPARK-12155] [SPARK-12253] Fix executor OOM i...
Github user andrewor14 commented on the pull request: https://github.com/apache/spark/pull/10240#issuecomment-163738626 retest this please
[GitHub] spark pull request: [SPARK-12258] [SQL] Hive Timestamp UDF is bind...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10249#issuecomment-163684431 Merged build finished. Test PASSed.
[GitHub] spark pull request: Make pyspark shell pythonstartup work under py...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/10255#issuecomment-163684045 Merged build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-11925] [ML] [PySpark] Add PySpark missi...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9908#issuecomment-163690187 Merged build finished. Test PASSed.
[GitHub] spark pull request: [SPARK-9695] [ML] Add random seed Param to ML ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/9158#issuecomment-163692088 [Test build #47506 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47506/console) for PR 9158 at commit [`9822a26`](https://github.com/apache/spark/commit/9822a26e0941a575387df03216e81d63f584eb57). * This patch **fails PySpark unit tests**. * This patch **does not merge cleanly**. * This patch adds the following public classes _(experimental)_: * `class Pipeline(override val uid: String) extends Estimator[PipelineModel] with HasSeed `
[GitHub] spark pull request: [SPARK-12250] [SQL] Allow users to define a UD...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/10236#issuecomment-163691816 **[Test build #47526 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/47526/consoleFull)** for PR 10236 at commit [`e303d4c`](https://github.com/apache/spark/commit/e303d4ca88e1209d0eaf17a367deb52ee18f8717).
[GitHub] spark pull request: [SPARK-9695] [ML] Add random seed Param to ML ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9158#issuecomment-163692307 Build finished. Test FAILed.
[GitHub] spark pull request: [SPARK-9695] [ML] Add random seed Param to ML ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9158#issuecomment-163692308 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/47506/ Test FAILed.
[GitHub] spark pull request: [SPARK-11978] [ML] Move dataset_example.py to ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/9957#issuecomment-163693631 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/47514/ Test FAILed.