[GitHub] spark pull request: [SPARK-3273]The spark version in the welcome m...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/2175#issuecomment-53676807 Jenkins, retest this please. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [HOTFIX] Wait for EOF only for the PySpark she...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/2170#issuecomment-53677017 Okay thanks - I'll merge this.
[GitHub] spark pull request: SPARK-2636: no where to get job identifier whi...
GitHub user lirui-intel opened a pull request:

https://github.com/apache/spark/pull/2176

SPARK-2636: no where to get job identifier while submit spark job through spark API

This PR adds the async actions to the Java API. Users can call these async actions to get the FutureAction and use JobWaiter (for SimpleFutureAction) to retrieve the job id.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/lirui-intel/spark SPARK-2636

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/2176.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #2176

commit 6e2b87b4bb414f69753cc5208099baff8d54f002
Author: lirui <rui...@intel.com>
Date: 2014-08-27T05:02:01Z
SPARK-2636: add java API for async actions

commit eb1ee798a41f89bc996bc1661968b7cf75c48d6e
Author: lirui <rui...@intel.com>
Date: 2014-08-27T06:51:56Z
SPARK-2636: change some parameters in SimpleFutureAction to member field

commit d09f73216005bef498f4086d40860a8bdd587940
Author: lirui <rui...@intel.com>
Date: 2014-08-27T06:57:11Z
SPARK-2636: fix build

commit 1b25abc86516f1e2cbbab9bdb03dffa920bb8658
Author: lirui <rui...@intel.com>
Date: 2014-08-27T07:16:38Z
SPARK-2636: expose some fields in JobWaiter

commit fbf574443ffe63f8d449fd639093016cb064283d
Author: lirui <rui...@intel.com>
Date: 2014-08-27T14:35:02Z
SPARK-2636: add more async actions for java api
[GitHub] spark pull request: [HOTFIX] Wait for EOF only for the PySpark she...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/2170
[GitHub] spark pull request: [HOTFIX][SQL] Remove cleaning of UDFs
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/2174
[GitHub] spark pull request: [SPARK-2947] DAGScheduler resubmit the stage i...
Github user witgo commented on the pull request: https://github.com/apache/spark/pull/1877#issuecomment-53677259 [SPARK-3224](https://issues.apache.org/jira/browse/SPARK-3224) is the same problem. This PR adds some boundary checks and removes some redundant code.
[GitHub] spark pull request: SPARK-2636: no where to get job identifier whi...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2176#issuecomment-53677401 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19377/consoleFull) for PR 2176 at commit [`fbf5744`](https://github.com/apache/spark/commit/fbf574443ffe63f8d449fd639093016cb064283d). * This patch merges cleanly.
[GitHub] spark pull request: SPARK-2636: no where to get job identifier whi...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/2176#issuecomment-53677618 @lirui-intel - JobWaiter is an internal API that was never designed to be public. I would not expose it simply because you need the job id. There are lots of ways to get the job id ... e.g. you can add an interface to the future to get the job id. Also, this doesn't really make sense: even with your change, JobWaiter is still private[spark], and only SimpleFutureAction contains a public field to it ...
[GitHub] spark pull request: [SPARK-3273]The spark version in the welcome m...
Github user ScrapCodes commented on a diff in the pull request:

https://github.com/apache/spark/pull/2175#discussion_r16822707

--- Diff: repl/src/main/scala/org/apache/spark/repl/SparkILoopInit.scala ---
@@ -26,7 +26,7 @@ trait SparkILoopInit {
          __
         / __/__  ___ _____/ /__
        _\ \/ _ \/ _ `/ __/ '_/
  -    /___/ .__/\_,_/_/ /_/\_\   version 1.0.0-SNAPSHOT
  +    /___/ .__/\_,_/_/ /_/\_\   version 1.1.0-SNAPSHOT
--- End diff --

Can we get this from SparkContext?
[GitHub] spark pull request: [SPARK-3273]The spark version in the welcome m...
Github user witgo commented on a diff in the pull request:

https://github.com/apache/spark/pull/2175#discussion_r16822912

--- Diff: repl/src/main/scala/org/apache/spark/repl/SparkILoopInit.scala ---
@@ -26,7 +26,7 @@ trait SparkILoopInit {
          __
         / __/__  ___ _____/ /__
        _\ \/ _ \/ _ `/ __/ '_/
  -    /___/ .__/\_,_/_/ /_/\_\   version 1.0.0-SNAPSHOT
  +    /___/ .__/\_,_/_/ /_/\_\   version 1.1.0-SNAPSHOT
--- End diff --

This is a good idea.
[GitHub] spark pull request: [SPARK-2947] DAGScheduler resubmit the stage i...
Github user kayousterhout commented on a diff in the pull request:

https://github.com/apache/spark/pull/1877#discussion_r16823367

--- Diff: core/src/test/scala/org/apache/spark/scheduler/DAGSchedulerSuite.scala ---
@@ -472,6 +472,44 @@ class DAGSchedulerSuite extends TestKit(ActorSystem("DAGSchedulerSuite")) with F
     assert(sparkListener.failedStages.size == 1)
   }

+  test("run trivial shuffle with repeated fetch failure") {
--- End diff --

Can you change this and/or the name for the test at line 438? They are currently almost identical, such that it's unclear what the point of each test is.
[GitHub] spark pull request: SPARK-2636: no where to get job identifier whi...
Github user lirui-intel commented on the pull request: https://github.com/apache/spark/pull/2176#issuecomment-53679584 Hi @rxin, thanks for the review! I can add an interface to SimpleFutureAction to get the job id if we shouldn't expose JobWaiter to users. Hive on Spark currently only uses foreach to submit the job; we can change it to foreachAsync, which just returns a SimpleFutureAction. I agree this doesn't seem to be a perfect way to solve the problem, and we still have ComplexFutureAction to worry about. So please let me know if you have a better idea in mind, or if you think we shouldn't expose the job id at all.
[GitHub] spark pull request: SPARK-2636: no where to get job identifier whi...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/2176#issuecomment-53679729 If foreachAsync is the only one you need right now, why don't you just add foreachAsync (and remove the rest), and add jobId to SimpleFutureAction?
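The direction suggested here - add `foreachAsync` and surface the job id on the returned future itself rather than through the internal `JobWaiter` - can be sketched with plain Scala futures. Everything below (the simplified `SimpleFutureAction`, the standalone `foreachAsync`, the explicit `jobId` parameter) is a toy stand-in for illustration, not Spark's actual implementation:

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.duration._
import scala.concurrent.ExecutionContext.Implicits.global

// Toy stand-in: a future over the job's result that also carries the id
// of the job it tracks, so callers never need to see JobWaiter.
class SimpleFutureAction[T](val jobId: Int, result: Future[T]) {
  def get(timeout: Duration): T = Await.result(result, timeout)
}

// Hypothetical foreachAsync: applies the side-effecting function to every
// element asynchronously and returns the future action with its job id.
def foreachAsync[T](data: Seq[T], jobId: Int)(f: T => Unit): SimpleFutureAction[Unit] =
  new SimpleFutureAction(jobId, Future { data.foreach(f) })

val counter = new java.util.concurrent.atomic.AtomicInteger(0)
val action = foreachAsync(Seq(1, 2, 3), jobId = 42)(_ => counter.incrementAndGet())
action.get(10.seconds)
// The caller reads action.jobId directly, without touching any internal API.
```

In this shape the job id is part of the action's public surface, which is the narrow exposure rxin is asking for.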
[GitHub] spark pull request: [SPARK-2947] DAGScheduler resubmit the stage i...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1877#issuecomment-53679879 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19375/consoleFull) for PR 1877 at commit [`c4b0f91`](https://github.com/apache/spark/commit/c4b0f91d63aaacc2d62455ae01fcea307a4db6e8).
 * This patch **passes** unit tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark pull request: SPARK-2636: no where to get job identifier whi...
Github user lirui-intel commented on the pull request: https://github.com/apache/spark/pull/2176#issuecomment-53680289 I thought these async actions were missing from the Java API, so I added all of them from AsyncRDDActions. But sure, let me just add foreachAsync.
[GitHub] spark pull request: SPARK-2636: no where to get job identifier whi...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2176#issuecomment-53680380 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19377/consoleFull) for PR 2176 at commit [`fbf5744`](https://github.com/apache/spark/commit/fbf574443ffe63f8d449fd639093016cb064283d).
 * This patch **fails** unit tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] spark pull request: SPARK-2636: no where to get job identifier whi...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/2176#issuecomment-53680555 Yea let's not add all of them since they are highly experimental. I'm not even sure if those are the APIs we want to commit to in the long run.
[GitHub] spark pull request: [SQL] Fixed 2 comment typos in SQLConf
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/2172
[GitHub] spark pull request: [SPARK-3230][SQL] Fix udfs that return structs
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/2133
[GitHub] spark pull request: [SPARK-3181][MLLIB]: Add Robust Regression Alg...
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/2096#issuecomment-53682676 Jenkins, test this please.
[GitHub] spark pull request: SPARK-2636: no where to get job identifier whi...
Github user lirui-intel commented on the pull request: https://github.com/apache/spark/pull/2176#issuecomment-53682639 @rxin I've updated the patch. Yes, I see these APIs are experimental. We can make Hive use it as a workaround and change it when we have a better solution.
[GitHub] spark pull request: [SPARK-3181][MLLIB]: Add Robust Regression Alg...
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/2096#issuecomment-53682689 Jenkins, add to whitelist.
[GitHub] spark pull request: [SPARK-3181][MLLIB]: Add Robust Regression Alg...
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/2096#issuecomment-53682783 @fjiang6 Could you try LBFGS instead of SGD?
[GitHub] spark pull request: [SPARK-3251][MLLIB]: Clarify learning interfac...
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/2137#issuecomment-53682909 Jenkins, add to whitelist.
[GitHub] spark pull request: Added Sql to mima checks
Github user ScrapCodes closed the pull request at: https://github.com/apache/spark/pull/1342
[GitHub] spark pull request: [SPARK-3251][MLLIB]: Clarify learning interfac...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2137#issuecomment-53683388 QA tests have started for PR 2137. This patch merges cleanly. View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19382/consoleFull
[GitHub] spark pull request: [SPARK-3279] Remove useless field variable in ...
GitHub user sarutak opened a pull request:

https://github.com/apache/spark/pull/2177

[SPARK-3279] Remove useless field variable in ApplicationMaster

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/sarutak/spark SPARK-3279

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/2177.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #2177

commit 113601ff0b7c15db49acc26b9b646f6036644cc1
Author: Kousuke Saruta <saru...@oss.nttdata.co.jp>
Date: 2014-08-28T07:44:15Z
Removed useless field variable from ApplicationMaster
[GitHub] spark pull request: [SPARK-1477]: Add the lifecycle interface
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/991#issuecomment-53683774 @witgo I'm going to take a look at this later for 1.2. I think it's a good idea to have a Service abstraction for services that we can start/stop. The current API is slightly more complicated than necessary, but it is headed in a good direction.
[GitHub] spark pull request: [SPARK-3251][MLLIB]: Clarify learning interfac...
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/2137#issuecomment-53684027 The assumption is usually unrealistic. For logistic regression, it is common to have predictions like 0.9 or 0.01, and they cannot be interpreted as probabilities without calibration; logistic regression is not responsible for that. I created a JIRA for isotonic regression, which can be used for calibration: https://issues.apache.org/jira/browse/SPARK-3278 For the method names, my suggestion would be: add `classify`, which outputs classes using a threshold, and keep `predict`, which outputs the raw predictions. Do not distinguish `predictScore` and `predictProb`.
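The naming scheme proposed above (`predict` returns raw scores, `classify` thresholds them into labels) can be illustrated with a toy logistic model. `ToyLogisticModel` and everything in it are hypothetical illustrations, not MLlib's API:

```scala
// Toy sketch: `predict` returns the raw score (a logistic sigmoid of a dot
// product), while `classify` applies a threshold to turn scores into labels.
class ToyLogisticModel(weights: Array[Double], threshold: Double = 0.5) {
  def predict(features: Array[Double]): Double = {
    val margin = weights.zip(features).map { case (w, x) => w * x }.sum
    1.0 / (1.0 + math.exp(-margin))  // raw score in (0, 1), not a calibrated probability
  }
  def classify(features: Array[Double]): Int =
    if (predict(features) >= threshold) 1 else 0
}

val model = new ToyLogisticModel(Array(2.0, -1.0))
val score = model.predict(Array(1.0, 0.5))   // sigmoid(1.5), roughly 0.82
val label = model.classify(Array(1.0, 0.5))  // score >= 0.5, so label 1
```

The split keeps the two questions separate: `predict` answers "how strong is the evidence?" and `classify` answers "which class?", with the caveat from the comment that the raw score needs calibration before it can be read as a probability.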
[GitHub] spark pull request: [SPARK-3279] Remove useless field variable in ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2177#issuecomment-53684178 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19383/consoleFull) for PR 2177 at commit [`2955edc`](https://github.com/apache/spark/commit/2955edc255432f9432a5e832f20651ab24f6c63f). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-3280] Made sort-based shuffle the defau...
GitHub user rxin opened a pull request:

https://github.com/apache/spark/pull/2178

[SPARK-3280] Made sort-based shuffle the default implementation

Sort-based shuffle has lower memory usage and seems to outperform hash-based in almost all of our testing.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/rxin/spark sort-shuffle

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/2178.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #2178

commit bee001e28d65c4fd8326b58c08bffcb91e27842b
Author: Reynold Xin <r...@apache.org>
Date: 2014-08-28T07:57:38Z
[SPARK-3280] Made sort-based shuffle the default implementation

commit 1445ef24b680ad746fd1660bb1ec1ff36ccce634
Author: Reynold Xin <r...@apache.org>
Date: 2014-08-28T07:58:34Z
Fixed a comment typo.
[GitHub] spark pull request: [SPARK-3280] Made sort-based shuffle the defau...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/2178#issuecomment-53684899 Hopefully I caught all the cases.
[GitHub] spark pull request: [SPARK-2288] Hide ShuffleBlockManager behind S...
Github user colorant commented on a diff in the pull request:

https://github.com/apache/spark/pull/1241#discussion_r16825748

--- Diff: core/src/main/scala/org/apache/spark/shuffle/FileShuffleBlockManager.scala ---
@@ -181,17 +171,30 @@ class ShuffleBlockManager(blockManager: BlockManager,

   /**
    * Returns the physical file segment in which the given BlockId is located.
-   * This function should only be called if shuffle file consolidation is enabled, as it is
-   * an error condition if we don't find the expected block.
    */
   def getBlockLocation(id: ShuffleBlockId): FileSegment = {
-    // Search all file groups associated with this shuffle.
-    val shuffleState = shuffleStates(id.shuffleId)
-    for (fileGroup <- shuffleState.allFileGroups) {
-      val segment = fileGroup.getFileSegmentFor(id.mapId, id.reduceId)
-      if (segment.isDefined) { return segment.get }
+    if (consolidateShuffleFiles) {
+      // Search all file groups associated with this shuffle.
+      val shuffleState = shuffleStates(id.shuffleId)
+      val iter = shuffleState.allFileGroups.iterator
+      while (iter.hasNext) {
+        val segment = iter.next.getFileSegmentFor(id.mapId, id.reduceId)
+        if (segment.isDefined) { return segment.get }
+      }
+      throw new IllegalStateException("Failed to find shuffle block: " + id)
+    } else {
+      val file = blockManager.diskBlockManager.getFile(id)
--- End diff --

You mean diskBlockManager.getFile? Yes, it knows nothing about consolidation.
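The two code paths under discussion - searching consolidated file groups for a matching segment versus asking the disk manager for the block's own file - can be sketched with simplified stand-in types. None of the classes below are Spark's real ones; they only mirror the branch structure of the diff:

```scala
// Toy segment: which file holds the block, and where within it.
case class FileSegment(file: String, offset: Long, length: Long)

// A consolidated file group may hold segments for many (map, reduce) pairs.
class FileGroup(segments: Map[(Int, Int), FileSegment]) {
  def getFileSegmentFor(mapId: Int, reduceId: Int): Option[FileSegment] =
    segments.get((mapId, reduceId))
}

def getBlockLocation(
    mapId: Int,
    reduceId: Int,
    consolidate: Boolean,
    fileGroups: Seq[FileGroup],
    diskLookup: (Int, Int) => String): FileSegment = {
  if (consolidate) {
    // Consolidation on: search every file group for this block's segment;
    // not finding it is an error.
    fileGroups.flatMap(_.getFileSegmentFor(mapId, reduceId)).headOption
      .getOrElse(throw new IllegalStateException(
        s"Failed to find shuffle block: ($mapId, $reduceId)"))
  } else {
    // Consolidation off: the disk lookup maps each block to its own file
    // and knows nothing about consolidation (length elided in this toy).
    FileSegment(diskLookup(mapId, reduceId), 0L, 0L)
  }
}

val group = new FileGroup(Map((1, 2) -> FileSegment("shuffle_0_1_2.data", 0L, 128L)))
val segment = getBlockLocation(1, 2, consolidate = true, Seq(group), (_, _) => "unused")
```

The point colorant confirms is visible in the else branch: the per-block file lookup is oblivious to consolidation, which is why the consolidation check must live in `getBlockLocation` itself.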
[GitHub] spark pull request: [SPARK-1912] Lazily initialize buffers for loc...
GitHub user rxin opened a pull request:

https://github.com/apache/spark/pull/2179

[SPARK-1912] Lazily initialize buffers for local shuffle blocks.

This is a simplified fix for SPARK-1912.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/rxin/spark SPARK-1912

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/2179.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #2179

commit 66679d2245b15dd6516d392a404dc81843995a55
Author: Reynold Xin <r...@apache.org>
Date: 2014-08-28T08:02:24Z
[SPARK-1912] Lazily initialize buffers for local shuffle blocks. This is a simplified fix for SPARK-1912.
[GitHub] spark pull request: [SPARK-1912] Lazily initialize buffers for loc...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/2179#issuecomment-53685306 Note that this was previously fixed by @cloud-fan in #860. cc @cloud-fan @ash211
[GitHub] spark pull request: [WIP][SPARK-2816][SQL] Type-safe SQL Queries
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1759#issuecomment-53685952 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19387/consoleFull) for PR 1759 at commit [`500e746`](https://github.com/apache/spark/commit/500e746014b1a6c0406df7013a2febeecd858648).
* This patch merges cleanly.
[GitHub] spark pull request: [WIP][SPARK-2816][SQL] Type-safe SQL Queries
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1759#issuecomment-53685965 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19387/consoleFull) for PR 1759 at commit [`500e746`](https://github.com/apache/spark/commit/500e746014b1a6c0406df7013a2febeecd858648).
* This patch **fails** unit tests.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `trait ScalaReflection`
  * `case class Schema(dataType: DataType, nullable: Boolean)`
  * `class Macros[C <: Context](val c: C) extends ScalaReflection`
  * `trait InterpolatedItem`
  * `case class InterpolatedUDF(index: Int, expr: c.Expr[Any], returnType: DataType)`
  * `case class InterpolatedTable(index: Int, expr: c.Expr[Any], schema: StructType)`
  * `case class RecSchema(name: String, index: Int, cType: DataType, tpe: Type)`
  * `case class ImplSchema(name: String, tpe: Type, impl: Tree)`
  * `trait TypedSQL`
  * `implicit class SQLInterpolation(val strCtx: StringContext)`
[GitHub] spark pull request: [SPARK-2288] Hide ShuffleBlockManager behind S...
Github user colorant commented on the pull request: https://github.com/apache/spark/pull/1241#issuecomment-53686699 Jenkins, test this please
[GitHub] spark pull request: Dt predict
GitHub user chouqin opened a pull request: https://github.com/apache/spark/pull/2180 Dt predict In the current implementation, the prediction for a node is calculated along with the information gain stats for each possible split, even though the value to predict for a node is determined regardless of which split is chosen. To save computation, we can calculate the prediction first and then calculate the information gain stats for each split. This is also necessary if we want to support a minimum-instances-per-node parameter ([SPARK-2207](https://issues.apache.org/jira/browse/SPARK-2207)): when no split satisfies the minimum-instances requirement, we don't use the information gain of any split, but there should still be a way to get the prediction value. This PR also removes the unused function `nodeIndexToLevel`. CC: @mengxr @manishamde @jkbradley, do you think this is really necessary? You can merge this pull request into a Git repository by running: $ git pull https://github.com/chouqin/spark dt-predict Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/2180.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2180 commit 0552c7e798f5d62b74511372c0d38e08e50e6bac Author: qiping.lqp qiping@alibaba-inc.com Date: 2014-08-28T08:03:55Z separate calculation of predict of node from calculation of info gain of splits commit c205eb8775a8dabfd567501972e2c9732c2fe80a Author: qiping.lqp qiping@alibaba-inc.com Date: 2014-08-28T08:05:20Z commit Predict.scala commit d92b3d47666e1c907222605b873172ef4a2c770c Author: qiping.lqp qiping@alibaba-inc.com Date: 2014-08-28T08:19:59Z fix decision tree suite
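The restructuring chouqin describes, computing a node's prediction from its labels first and independently of any split's information gain, can be sketched like this. This is a toy stand-in with hypothetical names, not the actual MLlib decision-tree code:

```scala
// Hypothetical sketch: a node's prediction depends only on the labels that
// reach it, while split gain is a separate, optional computation.
object PredictFirstSketch {
  // Majority-vote prediction from the labels at a node.
  def predict(labels: Seq[Double]): Double =
    labels.groupBy(identity).maxBy(_._2.size)._1

  // Gini impurity of a label set.
  private def impurity(labels: Seq[Double]): Double = {
    val n = labels.size.toDouble
    1.0 - labels.groupBy(identity).values.map(g => math.pow(g.size / n, 2)).sum
  }

  // Information gain of one candidate split; None when it violates a
  // minimum-instances-per-node constraint (the SPARK-2207 scenario).
  // Even then, `predict` above still yields a value for the node.
  def gain(left: Seq[Double], right: Seq[Double], minPerNode: Int): Option[Double] = {
    if (left.size < minPerNode || right.size < minPerNode) None
    else Some(impurity(left ++ right) -
      (left.size * impurity(left) + right.size * impurity(right)) /
        (left.size + right.size).toDouble)
  }
}
```

The point of the separation: when every candidate split returns `None`, the node is still usable as a leaf because its prediction was computed up front.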
[GitHub] spark pull request: Dt predict
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2180#issuecomment-53687434 Can one of the admins verify this patch?
[GitHub] spark pull request: [SPARK-3281] Remove Netty specific code in Blo...
GitHub user rxin opened a pull request: https://github.com/apache/spark/pull/2181 [SPARK-3281] Remove Netty specific code in BlockManager. Netty functionality will be added back in subsequent PRs by using the BlockTransferService interface. You can merge this pull request into a Git repository by running: $ git pull https://github.com/rxin/spark SPARK-3281 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/2181.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2181 commit ff6d1e1a92fda65ee51916ca01bdcd3449b51364 Author: Reynold Xin r...@apache.org Date: 2014-08-28T08:38:26Z [SPARK-3281] Remove Netty specific code in BlockManager. Netty functionality will be added back in subsequent PRs by using the BlockTransferService interface.
[GitHub] spark pull request: [SPARK-3279] Remove useless field variable in ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2177#issuecomment-53688268 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19383/consoleFull) for PR 2177 at commit [`2955edc`](https://github.com/apache/spark/commit/2955edc255432f9432a5e832f20651ab24f6c63f).
* This patch **fails** unit tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: Dt predict
Github user ScrapCodes commented on a diff in the pull request: https://github.com/apache/spark/pull/2180#discussion_r16827232 --- Diff: mllib/src/test/scala/org/apache/spark/mllib/tree/DecisionTreeSuite.scala --- @@ -47,9 +47,9 @@ class DecisionTreeSuite extends FunSuite with LocalSparkContext { } def validateRegressor( - model: DecisionTreeModel, - input: Seq[LabeledPoint], - requiredMSE: Double) { + model: DecisionTreeModel, --- End diff -- same here.
[GitHub] spark pull request: Dt predict
Github user ScrapCodes commented on a diff in the pull request: https://github.com/apache/spark/pull/2180#discussion_r16827247 --- Diff: mllib/src/test/scala/org/apache/spark/mllib/tree/DecisionTreeSuite.scala --- @@ -885,7 +887,7 @@ object DecisionTreeSuite { } def generateCategoricalDataPointsForMulticlassForOrderedFeatures(): -Array[LabeledPoint] = { + Array[LabeledPoint] = { --- End diff -- Same here.
[GitHub] spark pull request: Dt predict
Github user ScrapCodes commented on the pull request: https://github.com/apache/spark/pull/2180#issuecomment-53688602 I cannot say anything about the usefulness of the patch, but we follow the Spark style guide across our code base: https://cwiki.apache.org/confluence/display/SPARK/Spark+Code+Style+Guide
[GitHub] spark pull request: [SPARK-3251][MLLIB]: Clarify learning interfac...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2137#issuecomment-53688701 QA results for PR 2137:
* This patch FAILED unit tests.
* This patch merges cleanly.
* This patch adds no public classes.
For more information see test output: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19382/consoleFull
[GitHub] spark pull request: [SPARK-3198] [SQL] Remove the TreeNode.id
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2155#issuecomment-53688875 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19390/consoleFull) for PR 2155 at commit [`5873415`](https://github.com/apache/spark/commit/5873415ec1e0cd670adf144144eaf8060e412503).
* This patch merges cleanly.
[GitHub] spark pull request: [SPARK-3281] Remove Netty specific code in Blo...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2181#issuecomment-53688872 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19389/consoleFull) for PR 2181 at commit [`ff6d1e1`](https://github.com/apache/spark/commit/ff6d1e1a92fda65ee51916ca01bdcd3449b51364).
* This patch merges cleanly.
[GitHub] spark pull request: [SPARK-3198] [SQL] Remove the TreeNode.id
Github user chenghao-intel commented on the pull request: https://github.com/apache/spark/pull/2155#issuecomment-53688906 Thank you @marmbrus, you're right. I've updated the code by adding a new class called `TreeNodeRef`, a wrapper that simply re-implements the `equals` and `hashCode` methods for `TreeNode`; it replaces the `id` wherever it was used as a key in a HashSet/HashMap.
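The idea of such a wrapper, identity-based `equals`/`hashCode` so structurally equal nodes can still key a HashSet/HashMap as distinct entries, can be sketched as follows. This is a toy illustration with hypothetical names; the real `TreeNodeRef` lives in Catalyst and differs in detail:

```scala
// Hypothetical sketch of an identity-based wrapper for hash-map keys.
object TreeNodeRefSketch {
  case class Node(name: String) // stand-in for a Catalyst TreeNode

  // Compares by reference identity, not by structural (case-class) equality,
  // so two equal-looking nodes remain distinct keys.
  class Ref(val obj: Node) {
    override def equals(o: Any): Boolean = o match {
      case r: Ref => r.obj eq obj
      case _      => false
    }
    override def hashCode: Int = System.identityHashCode(obj)
  }
}
```

This removes the need for a per-node `id` field: identity itself distinguishes otherwise-equal subtrees.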
[GitHub] spark pull request: [SPARK-2917] [SQL] Avoid table creation in log...
Github user chenghao-intel commented on the pull request: https://github.com/apache/spark/pull/1846#issuecomment-53689679 @marmbrus @yhuai Can you review this for me?
[GitHub] spark pull request: SPARK-3265 Allow using custom ipython executab...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/2167#issuecomment-53689831 Shouldn't we update the documentation to include this?
[GitHub] spark pull request: Use user defined $SPARK_HOME in spark-submit i...
Github user liancheng commented on the pull request: https://github.com/apache/spark/pull/1969#issuecomment-53689928 Actually you can just set `spark.home` in `spark-defaults.conf` for this use case.
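For reference, the suggestion amounts to one line in `conf/spark-defaults.conf` (the path below is an example, not a value from this thread):

```properties
# conf/spark-defaults.conf -- key and value separated by whitespace
spark.home    /opt/spark
```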
[GitHub] spark pull request: SPARK-2813: [SQL] Implement SQRT() directly in...
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/1750#discussion_r16827967 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveQl.scala --- @@ -935,6 +936,7 @@ private[hive] object HiveQl { case Token(DIV(), left :: right :: Nil) => Cast(Divide(nodeToExpr(left), nodeToExpr(right)), LongType) case Token("%", left :: right :: Nil) => Remainder(nodeToExpr(left), nodeToExpr(right)) + case Token(TOK_FUNCTION, Token(SQRT(), Nil) :: arg :: Nil) => Sqrt(nodeToExpr(arg)) --- End diff -- Good to know, thanks :)
[GitHub] spark pull request: [SPARK-1912] Lazily initialize buffers for loc...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/2179#issuecomment-53690190 Jenkins, test this please.
[GitHub] spark pull request: [SPARK-3280] Made sort-based shuffle the defau...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/2178#issuecomment-53690186 Jenkins, test this please.
[GitHub] spark pull request: [SPARK-3269][SQL] Decreases initial buffer siz...
Github user liancheng commented on the pull request: https://github.com/apache/spark/pull/2171#issuecomment-53690286 Jenkins, test this please.
[GitHub] spark pull request: [SPARK-1912] Lazily initialize buffers for loc...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2179#issuecomment-53690416 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19391/consoleFull) for PR 2179 at commit [`66679d2`](https://github.com/apache/spark/commit/66679d2245b15dd6516d392a404dc81843995a55).
* This patch merges cleanly.
[GitHub] spark pull request: [SPARK-3280] Made sort-based shuffle the defau...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2178#issuecomment-53690413 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19392/consoleFull) for PR 2178 at commit [`1445ef2`](https://github.com/apache/spark/commit/1445ef24b680ad746fd1660bb1ec1ff36ccce634).
* This patch merges cleanly.
[GitHub] spark pull request: [SPARK-3173][SQL] Timestamp support in the par...
Github user byF commented on the pull request: https://github.com/apache/spark/pull/2084#issuecomment-53690368 @SparkQA says the test fails: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19336/consoleFull I was running the test from IntelliJ, where it passed.
[GitHub] spark pull request: [SPARK-3173][SQL] Timestamp support in the par...
Github user byF commented on a diff in the pull request: https://github.com/apache/spark/pull/2084#discussion_r16828221 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/HiveTypeCoercion.scala ---
@@ -218,11 +218,18 @@ trait HiveTypeCoercion {
       case a: BinaryArithmetic if a.right.dataType == StringType =>
         a.makeCopy(Array(a.left, Cast(a.right, DoubleType)))
+      case p: BinaryPredicate if p.left.dataType == TimestampType &&
+          p.right.dataType == StringType =>
+        p.makeCopy(Array(p.left, Cast(p.right, TimestampType)))
       case p: BinaryPredicate if p.left.dataType == StringType && p.right.dataType != StringType =>
         p.makeCopy(Array(Cast(p.left, DoubleType), p.right))
       case p: BinaryPredicate if p.left.dataType != StringType && p.right.dataType == StringType =>
         p.makeCopy(Array(p.left, Cast(p.right, DoubleType)))
+      case i @ In(a, b) if a.dataType == TimestampType && b.forall(_.dataType == StringType) =>
--- End diff --
That's what the `forall` condition does, doesn't it? Only if the left expression is a Timestamp and all the right-hand elements are Strings are the strings promoted to timestamps.
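The guarded rewrite being discussed, promote the string list in `timestampCol IN (...)` only when every element on the right is a string, can be sketched outside Catalyst with toy expression types (hypothetical names, not the real Catalyst classes):

```scala
// Hypothetical sketch of the forall-guarded coercion rule.
object CoercionSketch {
  sealed trait DataType
  case object TimestampType extends DataType
  case object StringType    extends DataType

  case class Expr(dataType: DataType)      // stand-in for a Catalyst expression
  case class In(value: Expr, list: Seq[Expr])

  // Promote only if the left side is a timestamp AND every right-side
  // element is a string (the `forall` guard); otherwise leave it untouched.
  def coerce(i: In): In = i match {
    case In(a, b) if a.dataType == TimestampType && b.forall(_.dataType == StringType) =>
      In(a, b.map(_ => Expr(TimestampType)))  // stands in for Cast(_, TimestampType)
    case other => other
  }
}
```

A mixed list (say, a string and a timestamp on the right) fails the `forall` guard and is passed through unchanged, which is exactly the behavior byF is defending.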
[GitHub] spark pull request: [SPARK-3269][SQL] Decreases initial buffer siz...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2171#issuecomment-53690908 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19393/consoleFull) for PR 2171 at commit [`5e1623b`](https://github.com/apache/spark/commit/5e1623b5b4af127e9cc240327b23f428a104c36d).
* This patch merges cleanly.
[GitHub] spark pull request: [SPARK-3173][SQL] Timestamp support in the par...
Github user byF commented on a diff in the pull request: https://github.com/apache/spark/pull/2084#discussion_r16828700 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/HiveTypeCoercion.scala --- @@ -218,11 +218,18 @@ trait HiveTypeCoercion { case a: BinaryArithmetic if a.right.dataType == StringType => a.makeCopy(Array(a.left, Cast(a.right, DoubleType))) + case p: BinaryPredicate if p.left.dataType == TimestampType --- End diff -- Fixed in 47b27b4
[GitHub] spark pull request: [SPARK-3173][SQL] Timestamp support in the par...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2084#issuecomment-53693340 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19394/consoleFull) for PR 2084 at commit [`47b27b4`](https://github.com/apache/spark/commit/47b27b427c27018d92b7a2fdb68c397dbc7015c0).
* This patch merges cleanly.
[GitHub] spark pull request: [SPARK-1912] Lazily initialize buffers for loc...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2179#issuecomment-53697805 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19391/consoleFull) for PR 2179 at commit [`66679d2`](https://github.com/apache/spark/commit/66679d2245b15dd6516d392a404dc81843995a55).
* This patch **fails** unit tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [SPARK-3280] Made sort-based shuffle the defau...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2178#issuecomment-53698044 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19392/consoleFull) for PR 2178 at commit [`1445ef2`](https://github.com/apache/spark/commit/1445ef24b680ad746fd1660bb1ec1ff36ccce634).
* This patch **fails** unit tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] spark pull request: [WIP][SPARK-1405][MLLIB]Collapsed Gibbs sampli...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1983#issuecomment-53698415 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19395/consoleFull) for PR 1983 at commit [`46cf160`](https://github.com/apache/spark/commit/46cf160a02c1f020dcfbfeb65be15e08fc1d2852).
* This patch merges cleanly.
[GitHub] spark pull request: [WIP][SPARK-1405][MLLIB]Collapsed Gibbs sampli...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1983#issuecomment-53699785 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19396/consoleFull) for PR 1983 at commit [`9bb1931`](https://github.com/apache/spark/commit/9bb193158b562d2cd7be33cfbed37370adc37c6e).
* This patch merges cleanly.
[GitHub] spark pull request: [WIP][SPARK-1405][MLLIB]Collapsed Gibbs sampli...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1983#issuecomment-53699939 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19396/consoleFull) for PR 1983 at commit [`9bb1931`](https://github.com/apache/spark/commit/9bb193158b562d2cd7be33cfbed37370adc37c6e).
* This patch **fails** unit tests.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
  * `case class Document(docId: Int, content: Iterable[Int], var topics: Iterable[Int] = null,`
  * `class TopicCounters(val topicCounts: BV[Double],`
[GitHub] spark pull request: [WIP][SPARK-1405][MLLIB]Collapsed Gibbs sampli...
Github user witgo commented on the pull request: https://github.com/apache/spark/pull/1983#issuecomment-53700053 @mengxr This patch removes the `accumulable` operation, repairs formula errors in the `dropOneDistSampler` method, and includes some performance optimizations. I don't yet have mature ideas about how to store the model.
[GitHub] spark pull request: [SPARK-3198] [SQL] Remove the TreeNode.id
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2155#issuecomment-53700780 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19390/consoleFull) for PR 2155 at commit [`5873415`](https://github.com/apache/spark/commit/5873415ec1e0cd670adf144144eaf8060e412503). * This patch **passes** unit tests. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * ` implicit class TreeNodeRef(val obj: TreeNode[_]) `
[GitHub] spark pull request: [SPARK-3269][SQL] Decreases initial buffer siz...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2171#issuecomment-53701899 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19393/consoleFull) for PR 2171 at commit [`5e1623b`](https://github.com/apache/spark/commit/5e1623b5b4af127e9cc240327b23f428a104c36d). * This patch **fails** unit tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-3251][MLLIB]: Clarify learning interfac...
Github user BigCrunsh commented on the pull request: https://github.com/apache/spark/pull/2137#issuecomment-53702702 @mengxr, might it be that you are mistaking logistic regression for Naive Bayes? Logistic regression typically predicts well-calibrated probabilities, see e.g. [1]; it might only be problematic if the data can be separated perfectly. The learning algorithm returns (is responsible for) a model that maximizes the likelihood of the data under the model assumption; in classification, the returned probability measures how likely it is that a certain label is generated by the learned model for a given example. Adding an isotonic regression is a good idea anyway. I think we should definitely distinguish between the output of the linear model (the score) and the calibrated value (the probability); which one is needed depends on the task. Furthermore, having a function that changes the type of output depending on the model is misleading. E.g., one would expect that a score function always returns an arbitrary real value and that the calibrated version returns a value between zero and one. Sklearn [2], for example, makes this distinction too: ``decision_function`` for scores, ``predict`` for class labels, ``predict_proba`` for probability estimates. However, it is not obvious what ``predict`` returns (@mengxr: what do you mean by raw predictions?). My suggestion would be: - ``classify`` or ``predictClass`` for the class; - ``score`` or ``decisionValue`` or ``predictScore`` for the outcome of the linear model; - ``probabilityEstimate`` or ``predictProbability`` for an estimate of the class probability. Perhaps ``predict`` could return the class for classification and the regression value for regression tasks (or just be maintained as a deprecated version). [1] Niculescu-Mizil, Alexandru, and Rich Caruana. Predicting good probabilities with supervised learning. Proceedings of the 22nd International Conference on Machine Learning. ACM, 2005.
[2] http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html
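A minimal sketch of the split proposed above, for a linear binary classifier. The trait and method names (`predictScore`, `predictProbability`, `predictClass`) follow the suggestion in this comment and are hypothetical — they are not the actual MLlib API:

```scala
// Hypothetical interface following the naming proposed in this thread;
// not the actual MLlib API.
trait BinaryScoringModel {
  def weights: Array[Double]
  def intercept: Double

  /** Raw outcome of the linear model, w . x + b: an arbitrary real value. */
  def predictScore(features: Array[Double]): Double =
    weights.zip(features).map { case (w, x) => w * x }.sum + intercept

  /** Calibrated estimate in [0, 1]; for logistic regression this is the
    * sigmoid of the score. */
  def predictProbability(features: Array[Double]): Double =
    1.0 / (1.0 + math.exp(-predictScore(features)))

  /** Hard class decision, obtained by thresholding the probability. */
  def predictClass(features: Array[Double]): Int =
    if (predictProbability(features) >= 0.5) 1 else 0
}
```

This mirrors the sklearn `decision_function` / `predict_proba` / `predict` distinction cited in [2]: each method has one fixed return type, instead of one `predict` whose meaning changes with the model.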
[GitHub] spark pull request: [SPARK-3279] Remove useless field variable in ...
Github user sarutak commented on the pull request: https://github.com/apache/spark/pull/2177#issuecomment-53702994 Jenkins, retest this please.
[GitHub] spark pull request: [SPARK-3173][SQL] Timestamp support in the par...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2084#issuecomment-53703006 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19394/consoleFull) for PR 2084 at commit [`47b27b4`](https://github.com/apache/spark/commit/47b27b427c27018d92b7a2fdb68c397dbc7015c0). * This patch **fails** unit tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-3279] Remove useless field variable in ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2177#issuecomment-53703196 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19397/consoleFull) for PR 2177 at commit [`2955edc`](https://github.com/apache/spark/commit/2955edc255432f9432a5e832f20651ab24f6c63f). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-3281] Remove Netty specific code in Blo...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2181#issuecomment-53704032 **Tests timed out** after a configured wait of `120m`.
[GitHub] spark pull request: [WIP][SPARK-1405][MLLIB]Collapsed Gibbs sampli...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1983#issuecomment-53704442 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19398/consoleFull) for PR 1983 at commit [`c575afd`](https://github.com/apache/spark/commit/c575afdf8ff231f7fb54ce18db4e45b653854d9a). * This patch merges cleanly.
[GitHub] spark pull request: [WIP][SPARK-1405][MLLIB]Collapsed Gibbs sampli...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1983#issuecomment-53704512 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19395/consoleFull) for PR 1983 at commit [`46cf160`](https://github.com/apache/spark/commit/46cf160a02c1f020dcfbfeb65be15e08fc1d2852). * This patch **passes** unit tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-3000][CORE] drop old blocks to disk in ...
Github user ScrapCodes commented on a diff in the pull request: https://github.com/apache/spark/pull/2134#discussion_r16832936 --- Diff: core/src/main/scala/org/apache/spark/storage/MemoryStore.scala --- @@ -200,81 +248,118 @@ private[spark] class MemoryStore(blockManager: BlockManager, maxMemory: Long) * checking whether the memory restrictions for unrolling blocks are still satisfied, * stopping immediately if not. This check is a safeguard against the scenario in which * there is not enough free memory to accommodate the entirety of a single block. + * + * When there is not enough memory for unrolling blocks, old blocks will be dropped from + * memory. The dropping is done in parallel to fully utilize the disk throughput + * when there are multiple disks. And before dropping, each thread will mark the old blocks + * that can be dropped. * * This method returns either an array with the contents of the entire block or an iterator * containing the values of the block (if the array would have exceeded available memory). */ + def unrollSafely( - blockId: BlockId, - values: Iterator[Any], - droppedBlocks: ArrayBuffer[(BlockId, BlockStatus)]) -: Either[Array[Any], Iterator[Any]] = { +blockId: BlockId, --- End diff -- incorrect indentation.
[GitHub] spark pull request: [SPARK-3000][CORE] drop old blocks to disk in ...
Github user ScrapCodes commented on a diff in the pull request: https://github.com/apache/spark/pull/2134#discussion_r16832952 --- Diff: core/src/main/scala/org/apache/spark/storage/MemoryStore.scala --- @@ -291,54 +376,71 @@ private[spark] class MemoryStore(blockManager: BlockManager, maxMemory: Long) * an Array if deserialized is true or a ByteBuffer otherwise. Its (possibly estimated) size * must also be passed by the caller. * - * Synchronize on `accountingLock` to ensure that all the put requests and its associated block - * dropping is done by only one thread at a time. Otherwise while one thread is dropping - blocks to free memory for one block, another thread may use up the freed space for - another block. - * + * In order to drop old blocks in parallel, we will first mark the blocks that can be dropped + * when there is not enough memory. + * * Return whether put was successful, along with the blocks dropped in the process. */ - private def tryToPut( - blockId: BlockId, - value: Any, - size: Long, - deserialized: Boolean): ResultWithDroppedBlocks = { -/* TODO: Its possible to optimize the locking by locking entries only when selecting blocks - * to be dropped. Once the to-be-dropped blocks have been selected, and lock on entries has - * been released, it must be ensured that those to-be-dropped blocks are not double counted - * for freeing up more space for another block that needs to be put. Only then the actually - * dropping of blocks (and writing to disk if necessary) can proceed in parallel. */ + private def tryToPut( +blockId: BlockId, --- End diff -- same here.
[GitHub] spark pull request: [SPARK-3272][MLLib]Calculate prediction for no...
Github user chouqin commented on a diff in the pull request: https://github.com/apache/spark/pull/2180#discussion_r16832992 --- Diff: mllib/src/test/scala/org/apache/spark/mllib/tree/DecisionTreeSuite.scala --- @@ -34,9 +34,9 @@ import org.apache.spark.mllib.regression.LabeledPoint class DecisionTreeSuite extends FunSuite with LocalSparkContext { def validateClassifier( - model: DecisionTreeModel, - input: Seq[LabeledPoint], - requiredAccuracy: Double) { + model: DecisionTreeModel, + input: Seq[LabeledPoint], --- End diff -- Thanks, this is my fault; I will fix the style soon.
[GitHub] spark pull request: [SPARK-3272][MLLib]Calculate prediction for no...
Github user chouqin commented on the pull request: https://github.com/apache/spark/pull/2180#issuecomment-53706696 @ScrapCodes thanks for your comments; I have just changed the indentation to meet the Spark style guide.
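For reference, the declaration style the reviewer is asking for: per the Spark style guide, when a parameter list does not fit on one line, each parameter goes on its own line indented four spaces past the start of `def`, with the body at the usual indent. The snippet below illustrates this with the method name from the diff above, but with simplified parameter types and a stub body so it is self-contained:

```scala
object IndentationStyle {
  // Spark style: four-space indent for wrapped parameters, so they are
  // visually distinct from the two-space-indented method body.
  // Types and body here are simplified stand-ins, not the real suite code.
  def validateClassifier(
      modelName: String,
      inputSize: Int,
      requiredAccuracy: Double): String = {
    s"$modelName: $inputSize examples, required accuracy $requiredAccuracy"
  }

  def main(args: Array[String]): Unit =
    println(validateClassifier("DecisionTreeModel", 1000, 0.9))
}
```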
[GitHub] spark pull request: [SPARK-3279] Remove useless field variable in ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2177#issuecomment-53707200 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19397/consoleFull) for PR 2177 at commit [`2955edc`](https://github.com/apache/spark/commit/2955edc255432f9432a5e832f20651ab24f6c63f). * This patch **fails** unit tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: Using values.sum is easier to understand than ...
GitHub user watermen opened a pull request: https://github.com/apache/spark/pull/2182 Using values.sum is easier to understand than using values.foldLeft(0)(_ + _) def sum[B >: A](implicit num: Numeric[B]): B = foldLeft(num.zero)(num.plus) Since values.sum is easier to understand than values.foldLeft(0)(_ + _), we should use values.sum instead of values.foldLeft(0)(_ + _). You can merge this pull request into a Git repository by running: $ git pull https://github.com/watermen/spark bug-fix3 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/2182.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2182 commit 57e704c5ee0423fbdc9302895d4decad1b4164ef Author: Yadong Qi qiyadong2...@gmail.com Date: 2014-08-28T11:16:38Z Update StatefulNetworkWordCount.scala commit 714bda5983dd08f5a5a5a850b09ad50955c65a33 Author: Yadong Qi qiyadong2...@gmail.com Date: 2014-08-28T11:17:28Z Update BasicOperationsSuite.scala commit 17be9fb734d442c86723080e3df03007e002c364 Author: Yadong Qi qiyadong2...@gmail.com Date: 2014-08-28T11:23:41Z Update CheckpointSuite.scala
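The two spellings are equivalent for any collection of a `Numeric` element type, since the standard library's `sum` is itself defined as a fold, as the signature quoted in the PR description shows. A quick sketch of the equivalence (the sample values here are illustrative, not from the PR):

```scala
object SumVsFold {
  def main(args: Array[String]): Unit = {
    val values = Seq(1, 2, 3, 4, 5)

    // The verbose form the PR removes from the examples and tests:
    val byFold = values.foldLeft(0)(_ + _)

    // The clearer form it switches to; for a Numeric element type the
    // standard library defines sum as foldLeft(num.zero)(num.plus),
    // so the two always agree.
    val bySum = values.sum

    assert(byFold == bySum)
    println(s"foldLeft: $byFold, sum: $bySum")
  }
}
```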
[GitHub] spark pull request: Using values.sum is easier to understand than ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2182#issuecomment-53707923 Can one of the admins verify this patch?
[GitHub] spark pull request: Using values.sum is easier to understand than ...
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/2182#issuecomment-53708729 +1 -- you should open a JIRA though. Although there's reluctance to do cross-cutting code polish PRs, this looks targeted, restricted to example/test code, and is also something that I've wanted to zap for a while.
[GitHub] spark pull request: [WIP][SPARK-1405][MLLIB]Collapsed Gibbs sampli...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1983#issuecomment-53709038 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19398/consoleFull) for PR 1983 at commit [`c575afd`](https://github.com/apache/spark/commit/c575afdf8ff231f7fb54ce18db4e45b653854d9a). * This patch **passes** unit tests. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `case class Document(docId: Int, content: Iterable[Int], var topics: Iterable[Int] = null,` * `class TopicCounters(val topicCounts: BV[Double],`
[GitHub] spark pull request: [SPARK-3251][MLLIB]: Clarify learning interfac...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2137#issuecomment-53711128 QA tests have started for PR 2137. This patch merges cleanly. View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19399/consoleFull
[GitHub] spark pull request: [SPARK-3279] Remove useless field variable in ...
Github user sarutak commented on the pull request: https://github.com/apache/spark/pull/2177#issuecomment-53713170 Jenkins, retest this please.
[GitHub] spark pull request: [SPARK-3279] Remove useless field variable in ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2177#issuecomment-53713444 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19400/consoleFull) for PR 2177 at commit [`2955edc`](https://github.com/apache/spark/commit/2955edc255432f9432a5e832f20651ab24f6c63f). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-3187] [yarn] Cleanup allocator code.
Github user tgravescs commented on the pull request: https://github.com/apache/spark/pull/2169#issuecomment-53716263 @vanzin is this purely moving things around again or does it also subsume https://github.com/apache/spark/pull/655/?
[GitHub] spark pull request: SPARK-3069 [DOCS] Build instructions in README...
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/2014#issuecomment-53717258 @nchammas @pwendell Is the net conclusion that `README.md` should use Maven if anything? I'd be happy to move the wiki into `CONTRIBUTING.md` but then I can't remove the wiki page and it ends up duplicated again. Maybe it's fine as is and the important change is getting the file in place to trigger the prompt on the PR screen. If so then I think this is still ready for review/merge as you all see fit.
[GitHub] spark pull request: [SPARK-3251][MLLIB]: Clarify learning interfac...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2137#issuecomment-53718901 QA results for PR 2137: - This patch FAILED unit tests. - This patch merges cleanly. - This patch adds no public classes. For more information see test output: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19399/consoleFull
[GitHub] spark pull request: [SPARK-3285] [examples] Using values.sum is ea...
Github user mattf commented on the pull request: https://github.com/apache/spark/pull/2182#issuecomment-53719188 +1 nice catch, the simpler the examples the easier they'll be to consume by their intended audience: folks who aren't experts yet
[GitHub] spark pull request: [SPARK-3279] Remove useless field variable in ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2177#issuecomment-53719526 [QA tests have finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19400/consoleFull) for PR 2177 at commit [`2955edc`](https://github.com/apache/spark/commit/2955edc255432f9432a5e832f20651ab24f6c63f). * This patch **passes** unit tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-3280] Made sort-based shuffle the defau...
Github user mattf commented on the pull request: https://github.com/apache/spark/pull/2178#issuecomment-53719576 is the testing captured somewhere so this change can be evaluated in the future, maybe against other strategies?
[GitHub] spark pull request: [SPARK-3273]The spark version in the welcome m...
Github user mattf commented on a diff in the pull request: https://github.com/apache/spark/pull/2175#discussion_r16838727 --- Diff: repl/src/main/scala/org/apache/spark/repl/SparkILoopInit.scala --- @@ -26,9 +28,9 @@ trait SparkILoopInit { __ / __/__ ___ _/ /__ _\ \/ _ \/ _ `/ __/ '_/ - /___/ .__/\_,_/_/ /_/\_\ version 1.0.0-SNAPSHOT + /___/ .__/\_,_/_/ /_/\_\ version %s /_/ -) +.format(SparkContext.SPARK_VERSION)) --- End diff -- glad to see this change!
[GitHub] spark pull request: [SPARK-3264] Allow users to set executor Spark...
Github user mattf commented on the pull request: https://github.com/apache/spark/pull/2166#issuecomment-53720190 lgtm, nice idea. I've been using RPM-installed Spark, which provides a single version and location on all nodes; this will make for a clear path to running multiple versions of Spark on a Mesos cluster.
[GitHub] spark pull request: [SPARK-3263][GraphX] Fix changes made to Graph...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2168#issuecomment-53721328 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19401/consoleFull) for PR 2168 at commit [`dfbb6dd`](https://github.com/apache/spark/commit/dfbb6dd86256ba176a304d103c9b0b9fe00d302b). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-3263][GraphX] Fix changes made to Graph...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2168#issuecomment-53722074 [QA tests have started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/19402/consoleFull) for PR 2168 at commit [`1c8fc44`](https://github.com/apache/spark/commit/1c8fc4407cabc8c592bde4b696bd09f5b3766da5). * This patch merges cleanly.