[GitHub] spark pull request: [FIX][DOC] Fix broken links in ml-guide.md
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/3601 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark pull request: [FIX][DOC] Fix broken links in ml-guide.md
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/3601#issuecomment-65623870 Merged into master and branch-1.2.
[GitHub] spark pull request: SPARK-4743 - Use SparkEnv.serializer instead o...
GitHub user IvanVergiliev opened a pull request: https://github.com/apache/spark/pull/3605 SPARK-4743 - Use SparkEnv.serializer instead of closureSerializer in aggregateByKey and foldByKey You can merge this pull request into a Git repository by running: $ git pull https://github.com/IvanVergiliev/spark change-serializer Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/3605.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3605 commit a49b7cf7b507d7dbd3aba587eeb99125ce3e8203 Author: Ivan Vergiliev i...@leanplum.com Date: 2014-12-04T12:08:12Z Use serializer instead of closureSerializer in aggregate/foldByKey.
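For context on the change above: aggregateByKey and foldByKey ship a serialized copy of the zero value and deserialize it per key so each key starts from a fresh, independent copy; the PR's point is which serializer performs that round trip (SparkEnv's data serializer rather than the closure serializer). A minimal, self-contained sketch of the copy-by-serialization pattern, using plain Java serialization as a stand-in for whatever serializer is configured:

```scala
import java.io._

// Serialize a value once; each deserialization yields an independent copy.
def serialize[T <: java.io.Serializable](value: T): Array[Byte] = {
  val bytes = new ByteArrayOutputStream()
  val out = new ObjectOutputStream(bytes)
  out.writeObject(value)
  out.close()
  bytes.toByteArray
}

def freshCopy[T <: java.io.Serializable](blob: Array[Byte]): T = {
  val in = new ObjectInputStream(new ByteArrayInputStream(blob))
  in.readObject().asInstanceOf[T]
}

// The "zero value" is serialized once up front...
val zeroBlob = serialize(new java.util.ArrayList[Int]())
// ...and deserialized per key, so mutating one copy cannot leak into another.
val copy1 = freshCopy[java.util.ArrayList[Int]](zeroBlob)
val copy2 = freshCopy[java.util.ArrayList[Int]](zeroBlob)
copy1.add(1)
```

In Spark the serializer used here is pluggable (e.g. Kryo), which is exactly why routing through the data serializer rather than the closure serializer matters.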
[GitHub] spark pull request: SPARK-4743 - Use SparkEnv.serializer instead o...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3605#issuecomment-65624217 Can one of the admins verify this patch?
[GitHub] spark pull request: [FIX][DOC] Fix broken links in ml-guide.md
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3601#issuecomment-65625554 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24138/ Test PASSed.
[GitHub] spark pull request: [FIX][DOC] Fix broken links in ml-guide.md
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3601#issuecomment-65625544 [Test build #24138 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24138/consoleFull) for PR 3601 at commit [`c559768`](https://github.com/apache/spark/commit/c559768a78cbfab84038b6d2489b923f24ed79a7). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-3586][streaming]Support nested director...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/2765#issuecomment-65619648 [Test build #24137 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24137/consoleFull) for PR 2765 at commit [`cba8a2e`](https://github.com/apache/spark/commit/cba8a2e6cf11741867561ce1c0d7d2eda66033c6). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-3586][streaming]Support nested director...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2765#issuecomment-65619652 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24137/ Test FAILed.
[GitHub] spark pull request: [SPARK-4494] IDFModel.transform() add support ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3603#issuecomment-65619901 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24136/ Test PASSed.
[GitHub] spark pull request: [SPARK-4735]Spark SQL UDF doesn't support 0 ar...
GitHub user potix2 opened a pull request: https://github.com/apache/spark/pull/3604 [SPARK-4735]Spark SQL UDF doesn't support 0 arguments I fixed the udf bug. https://issues.apache.org/jira/browse/SPARK-4735 You can merge this pull request into a Git repository by running: $ git pull https://github.com/potix2/spark bugfix-4735 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/3604.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3604 commit 025537a1ec966fa34330fbbc1ab29c2d3d9943cf Author: Katsunori Kanda ka...@amoad.com Date: 2014-12-04T11:52:06Z Add UdfRegistration.registerFunction() for Function0
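For readers unfamiliar with the arity terminology in the commit message: a 0-argument UDF is simply a Scala Function0, and the fix adds a registerFunction overload for that arity. A toy registry sketch of what registering and invoking a zero-argument function looks like (illustrative only; this is not Spark's UdfRegistration API):

```scala
// Hypothetical registry keyed by UDF name, holding zero-argument functions.
val udfs = scala.collection.mutable.Map[String, () => Any]()

def registerFunction0(name: String, f: () => Any): Unit = udfs(name) = f

// A Function0 takes no parameters; the SPARK-4735 bug was that this
// arity had no registration path at all.
registerFunction0("pi", () => 3.14159)
val result = udfs("pi")()
```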
[GitHub] spark pull request: [SPARK-4494] IDFModel.transform() add support ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3603#issuecomment-65619897 [Test build #24136 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24136/consoleFull) for PR 3603 at commit [`d25e49b`](https://github.com/apache/spark/commit/d25e49b01ad5e160366b5e4512ff0826f3cf2740). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-4735]Spark SQL UDF doesn't support 0 ar...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3604#issuecomment-65621779 Can one of the admins verify this patch?
[GitHub] spark pull request: [SPARK-4744] [SQL] Short circuit evaluation fo...
GitHub user chenghao-intel opened a pull request: https://github.com/apache/spark/pull/3606 [SPARK-4744] [SQL] Short circuit evaluation for AND OR in CodeGen You can merge this pull request into a Git repository by running: $ git pull https://github.com/chenghao-intel/spark codegen_short_circuit Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/3606.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3606 commit f466303f872bcfd5e056f626f41108b748011680 Author: Cheng Hao hao.ch...@intel.com Date: 2014-12-04T12:47:11Z short circuit for AND OR
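A sketch of the semantics the PR title refers to, in plain Scala rather than Spark's actual generated code: with short-circuit AND, the right child is only evaluated when the left child is true, which skips work in generated predicates. The counter below just makes the skipped evaluation observable:

```scala
// Count how often the right-hand predicate is actually evaluated.
var rightEvals = 0
def left(x: Int): Boolean = x > 0
def right(x: Int): Boolean = { rightEvals += 1; x % 2 == 0 }

// Eager form: both children are always evaluated before combining.
def eagerAnd(x: Int): Boolean = { val l = left(x); val r = right(x); l && r }

// Short-circuit form: the right child is skipped when the left is false.
def shortAnd(x: Int): Boolean = left(x) && right(x)

shortAnd(-1) // left is false, so right is never called
```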
[GitHub] spark pull request: [SPARK-4744] [SQL] Short circuit evaluation fo...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3606#issuecomment-65628448 [Test build #24139 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24139/consoleFull) for PR 3606 at commit [`f466303`](https://github.com/apache/spark/commit/f466303f872bcfd5e056f626f41108b748011680). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-3928][SQL] Support wildcard matches on ...
Github user tkyaw commented on the pull request: https://github.com/apache/spark/pull/3407#issuecomment-65630704 Made the following changes as suggested: (1) Changed the PR title and commit message. (2) Updated the parquetFile doc message. (3) Added a test case.
[GitHub] spark pull request: [SPARK-3928][SQL] Support wildcard matches on ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3407#issuecomment-65630826 [Test build #24140 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24140/consoleFull) for PR 3407 at commit [`ceded32`](https://github.com/apache/spark/commit/ceded32aa2a487af41678807e56f32448af38096). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-4697][YARN]System properties should ove...
Github user tgravescs commented on the pull request: https://github.com/apache/spark/pull/3557#issuecomment-65632939 Did you test it to see if they still work?
[GitHub] spark pull request: Spark 3883: SSL support for HttpServer and Akk...
Github user jacek-lewandowski commented on the pull request: https://github.com/apache/spark/pull/3571#issuecomment-65633106 @pwendell is it possible to access the log file somehow? I don't know how to replicate the problems - what operating system is Jenkins running on?
[GitHub] spark pull request: [SPARK-1953]yarn client mode Application Maste...
GitHub user WangTaoTheTonic opened a pull request: https://github.com/apache/spark/pull/3607 [SPARK-1953]yarn client mode Application Master memory size is same as driver memory size

Ways to set the Application Master's memory in yarn-client mode:
1. `--am-memory MEM` in SparkSubmit args
2. `spark.yarn.appMaster.memory` in SparkConf or System Properties
3. `SPARK_YARN_AM_MEMORY` in the system env
4. default value 512m

Note: this argument is only available in yarn-client mode; in yarn-cluster mode the AM memory is set to the driver memory size.

You can merge this pull request into a Git repository by running: $ git pull https://github.com/WangTaoTheTonic/spark SPARK4181 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/3607.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3607 commit 0566bb89318a848ee6d2f551430d9fd135a22c7d Author: WangTaoTheTonic barneystin...@aliyun.com Date: 2014-12-04T13:42:47Z yarn client mode Application Master memory size is same as driver memory size
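The resolution order described above can be sketched with the same Option.orElse chain the patch adds to SparkSubmitArguments; the function below is illustrative rather than Spark code, with the property and env names taken from the PR description:

```scala
// Resolve the AM memory in the PR's stated order:
// CLI flag > conf property > env variable > 512m default.
def resolveAmMemory(
    cliArg: Option[String],
    sparkProps: Map[String, String],
    env: Map[String, String]): String = {
  cliArg
    .orElse(sparkProps.get("spark.yarn.appMaster.memory"))
    .orElse(env.get("SPARK_YARN_AM_MEMORY"))
    .getOrElse("512m")
}
```

Each `orElse` only consults the next source when every earlier one returned None, which is what gives the chain its precedence semantics.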
[GitHub] spark pull request: [SPARK-1953]yarn client mode Application Maste...
Github user WangTaoTheTonic commented on the pull request: https://github.com/apache/spark/pull/3607#issuecomment-65635264 @tgravescs
[GitHub] spark pull request: [SPARK-4744] [SQL] Short circuit evaluation fo...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3606#issuecomment-65635600 [Test build #24139 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24139/consoleFull) for PR 3606 at commit [`f466303`](https://github.com/apache/spark/commit/f466303f872bcfd5e056f626f41108b748011680). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-4744] [SQL] Short circuit evaluation fo...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3606#issuecomment-65635609 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24139/ Test PASSed.
[GitHub] spark pull request: [SPARK-1953]yarn client mode Application Maste...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3607#issuecomment-65635774 [Test build #24141 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24141/consoleFull) for PR 3607 at commit [`0566bb8`](https://github.com/apache/spark/commit/0566bb89318a848ee6d2f551430d9fd135a22c7d). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-3586][streaming]Support nested director...
Github user wangxiaojing commented on the pull request: https://github.com/apache/spark/pull/2765#issuecomment-65637021 @liancheng
[GitHub] spark pull request: [SPARK-3586][streaming]Support nested director...
Github user wangxiaojing commented on the pull request: https://github.com/apache/spark/pull/2765#issuecomment-65637093 @liancheng
[GitHub] spark pull request: [SPARK-2188] Support sbt/sbt for Windows
Github user tsudukim commented on the pull request: https://github.com/apache/spark/pull/3591#issuecomment-65638788 @pwendell Thank you for your comment. I quite agree that Windows scripts like .cmd or .bat are very high-cost to maintain, but this time I used PowerShell, which is a scripting language, unlike .cmd or .bat. You can see the script inside. The Linux version and the PowerShell version have the same structure (functions, variables, ...), so I think it's easier to read and modify. And yes, I use Windows for daily Spark development. Between sbt and maven, sbt is much better for trial-and-error development, as you know. The reason I want sbt is the same as the reason we use sbt rather than maven for development on Linux. I also use maven as a final check, but sbt is more useful for continuous development. About cygwin: I'm not using cygwin. A cygwin environment is so polluted by cygwin functions and variables that the behavior of Windows becomes strange. That's critical for enterprise systems.
[GitHub] spark pull request: [SPARK-3928][SQL] Support wildcard matches on ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3407#issuecomment-65639449 [Test build #24140 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24140/consoleFull) for PR 3407 at commit [`ceded32`](https://github.com/apache/spark/commit/ceded32aa2a487af41678807e56f32448af38096). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-3928][SQL] Support wildcard matches on ...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3407#issuecomment-65639456 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24140/ Test PASSed.
[GitHub] spark pull request: [SPARK-4697][YARN]System properties should ove...
Github user WangTaoTheTonic commented on the pull request: https://github.com/apache/spark/pull/3557#issuecomment-65644328 @tgravescs OK, if you mean whether the app name could still be shown correctly, I will test it.
[GitHub] spark pull request: [SPARK-4461][YARN] pass extra java options to ...
Github user tgravescs commented on a diff in the pull request: https://github.com/apache/spark/pull/3409#discussion_r21309157

--- Diff: yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ClientBase.scala ---
@@ -360,6 +360,10 @@ private[spark] trait ClientBase extends Logging {
      }
    }

+    // include yarn am specific java options
+    sparkConf.getOption("spark.yarn.am.extraJavaOptions")
+      .foreach(opts => javaOpts += opts)
--- End diff --

So currently this affects both cluster and client mode, since driver.extraJavaOptions applies in cluster mode. I think we should make this only apply in client mode. Otherwise we should define precedence between it and driver.extraJavaOptions in driver mode, or potentially error if they aren't set. It seems most straightforward to only have it apply in client mode, but I'm open to thoughts - @vanzin @andrewor14. Also we should run this through the same check as SparkConf does for spark.executor.extraJavaOptions, to make sure no spark configs or -Xmx is set.
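The validation the reviewer asks for (mirroring the existing spark.executor.extraJavaOptions check) could look roughly like this; the function name and exact checks are assumptions for illustration, not Spark's actual code:

```scala
// Hypothetical validator: reject java option strings that try to set
// spark configs (-Dspark...) or the max heap (-Xmx), which must instead
// go through proper configs. Returns the reason on rejection.
def validateJavaOpts(opts: String): Either[String, String] = {
  if (opts.contains("-Dspark"))
    Left("spark configs must be set with --conf, not java options")
  else if (opts.contains("-Xmx"))
    Left("use the memory config, not -Xmx, to set the heap size")
  else
    Right(opts)
}
```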
[GitHub] spark pull request: SPARK-3779. yarn spark.yarn.applicationMaster....
Github user tgravescs commented on a diff in the pull request: https://github.com/apache/spark/pull/3471#discussion_r21309273

--- Diff: docs/running-on-yarn.md ---
@@ -22,10 +22,12 @@ Most of the configs are the same for Spark on YARN as for other deployment modes
 <table class="table">
 <tr><th>Property Name</th><th>Default</th><th>Meaning</th></tr>
 <tr>
-  <td><code>spark.yarn.applicationMaster.waitTries</code></td>
-  <td>10</td>
+  <td><code>spark.yarn.applicationMaster.waitTime</code></td>
--- End diff --

Can we rename it to spark.yarn.am.waitTime? (to be consistent with PR 3409 and possibly others)
[GitHub] spark pull request: SPARK-3779. yarn spark.yarn.applicationMaster....
Github user tgravescs commented on a diff in the pull request: https://github.com/apache/spark/pull/3471#discussion_r21309453

--- Diff: yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala ---
@@ -329,8 +329,10 @@ private[spark] class ApplicationMaster(args: ApplicationMasterArguments,
     sparkContextRef.synchronized {
       var count = 0
--- End diff --

count isn't needed anymore
[GitHub] spark pull request: SPARK-3779. yarn spark.yarn.applicationMaster....
Github user tgravescs commented on a diff in the pull request: https://github.com/apache/spark/pull/3471#discussion_r21309571

--- Diff: yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala ---
@@ -353,13 +355,13 @@ private[spark] class ApplicationMaster(args: ApplicationMasterArguments,
     val hostport = args.userArgs(0)
     val (driverHost, driverPort) = Utils.parseHostPort(hostport)

-    // spark driver should already be up since it launched us, but we don't want to
+    // Spark driver should already be up since it launched us, but we don't want to
     // wait forever, so wait 100 seconds max to match the cluster mode setting.
-    // Leave this config unpublished for now. SPARK-3779 to investigating changing
-    // this config to be time based.
-    val numTries = sparkConf.getInt("spark.yarn.applicationMaster.waitTries", 1000)
+    val waitTime = 100
+    val totalWaitTime = sparkConf.getInt("spark.yarn.applicationMaster.waitTime", 10)
+    val deadline = System.currentTimeMillis + totalWaitTime

-    while (!driverUp && !finished && count < numTries) {
       try {
         count = count + 1
--- End diff --

count isn't needed anymore
[GitHub] spark pull request: SPARK-3779. yarn spark.yarn.applicationMaster....
Github user tgravescs commented on a diff in the pull request: https://github.com/apache/spark/pull/3471#discussion_r21309688

--- Diff: yarn/common/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala ---
@@ -353,13 +355,13 @@ private[spark] class ApplicationMaster(args: ApplicationMasterArguments,
     val hostport = args.userArgs(0)
     val (driverHost, driverPort) = Utils.parseHostPort(hostport)

-    // spark driver should already be up since it launched us, but we don't want to
+    // Spark driver should already be up since it launched us, but we don't want to
     // wait forever, so wait 100 seconds max to match the cluster mode setting.
-    // Leave this config unpublished for now. SPARK-3779 to investigating changing
-    // this config to be time based.
-    val numTries = sparkConf.getInt("spark.yarn.applicationMaster.waitTries", 1000)
+    val waitTime = 100
+    val totalWaitTime = sparkConf.getInt("spark.yarn.applicationMaster.waitTime", 10)
+    val deadline = System.currentTimeMillis + totalWaitTime

-    while (!driverUp && !finished && count < numTries) {
+    while (!driverUp && !finished && System.currentTimeMillis < deadline + waitTime) {
--- End diff --

why are we adding waitTime to deadline here?
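The diff above is moving from a retry-count loop to a time-based wait, which is what the review comments push toward. A standalone sketch of the deadline-based form (this simulates the driver check with a callback and is not Spark's actual code; SPARK-3779's real version also has to pick sane units and defaults):

```scala
// Poll until the condition holds or the deadline passes.
// Returns whether the condition ever became true.
def waitForDriver(totalWaitTimeMs: Long, pollMs: Long)(driverUp: () => Boolean): Boolean = {
  val deadline = System.currentTimeMillis + totalWaitTimeMs
  var up = driverUp()
  while (!up && System.currentTimeMillis < deadline) {
    Thread.sleep(pollMs)
    up = driverUp()
  }
  up
}
```

Computing the deadline once up front is what makes the total wait independent of how long each poll takes, which is the advantage over counting tries.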
[GitHub] spark pull request: [SPARK-1953]yarn client mode Application Maste...
Github user tgravescs commented on a diff in the pull request: https://github.com/apache/spark/pull/3607#discussion_r21309773
--- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala ---
@@ -107,6 +108,10 @@ private[spark] class SparkSubmitArguments(args: Seq[String], env: Map[String, St
       .orElse(sparkProperties.get("spark.driver.memory"))
       .orElse(env.get("SPARK_DRIVER_MEMORY"))
       .orNull
+    amMemory = Option(amMemory)
+      .orElse(sparkProperties.get("spark.yarn.appMaster.memory"))
--- End diff --
can we call this spark.yarn.am.memory
[GitHub] spark pull request: [SPARK-1953]yarn client mode Application Maste...
Github user tgravescs commented on a diff in the pull request: https://github.com/apache/spark/pull/3607#discussion_r21309762
--- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala ---
@@ -107,6 +108,10 @@ private[spark] class SparkSubmitArguments(args: Seq[String], env: Map[String, St
       .orElse(sparkProperties.get("spark.driver.memory"))
       .orElse(env.get("SPARK_DRIVER_MEMORY"))
       .orNull
+    amMemory = Option(amMemory)
+      .orElse(sparkProperties.get("spark.yarn.appMaster.memory"))
+      .orElse(env.get("SPARK_YARN_AM_MEMORY"))
--- End diff --
env variables are only for backwards compatibility; we shouldn't add them for new configs, so can you please remove it.
[GitHub] spark pull request: [SPARK-1953]yarn client mode Application Maste...
Github user WangTaoTheTonic commented on a diff in the pull request: https://github.com/apache/spark/pull/3607#discussion_r21310482
--- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala ---
@@ -107,6 +108,10 @@ private[spark] class SparkSubmitArguments(args: Seq[String], env: Map[String, St
       .orElse(sparkProperties.get("spark.driver.memory"))
       .orElse(env.get("SPARK_DRIVER_MEMORY"))
       .orNull
+    amMemory = Option(amMemory)
+      .orElse(sparkProperties.get("spark.yarn.appMaster.memory"))
--- End diff --
Ok, I used appMaster because we already have an item called spark.yarn.appMasterEnv.*, but spark.yarn.am.memory looks simpler.
[GitHub] spark pull request: [SPARK-1953]yarn client mode Application Maste...
Github user WangTaoTheTonic commented on a diff in the pull request: https://github.com/apache/spark/pull/3607#discussion_r21310507
--- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala ---
@@ -107,6 +108,10 @@ private[spark] class SparkSubmitArguments(args: Seq[String], env: Map[String, St
       .orElse(sparkProperties.get("spark.driver.memory"))
       .orElse(env.get("SPARK_DRIVER_MEMORY"))
       .orNull
+    amMemory = Option(amMemory)
+      .orElse(sparkProperties.get("spark.yarn.appMaster.memory"))
+      .orElse(env.get("SPARK_YARN_AM_MEMORY"))
--- End diff --
Got it.
[GitHub] spark pull request: [SPARK-1953]yarn client mode Application Maste...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3607#issuecomment-65648737 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24141/ Test PASSed.
[GitHub] spark pull request: [SPARK-1953]yarn client mode Application Maste...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3607#issuecomment-65648726 [Test build #24141 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24141/consoleFull) for PR 3607 at commit [`0566bb8`](https://github.com/apache/spark/commit/0566bb89318a848ee6d2f551430d9fd135a22c7d). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-1953]yarn client mode Application Maste...
Github user tgravescs commented on a diff in the pull request: https://github.com/apache/spark/pull/3607#discussion_r21310970
--- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala ---
@@ -107,6 +108,10 @@ private[spark] class SparkSubmitArguments(args: Seq[String], env: Map[String, St
       .orElse(sparkProperties.get("spark.driver.memory"))
       .orElse(env.get("SPARK_DRIVER_MEMORY"))
       .orNull
+    amMemory = Option(amMemory)
+      .orElse(sparkProperties.get("spark.yarn.appMaster.memory"))
--- End diff --
Yeah, there are various PRs up right now with AM-related configs; trying to be consistent and use .am.
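The precedence chain being reviewed is just `Option#orElse` composition: an explicit CLI value wins, then a Spark property, then (for legacy configs only) an environment variable. A minimal standalone sketch, with illustrative names rather than the actual SparkSubmitArguments fields:

```scala
// Resolve a setting the way the diff does: CLI value, then spark
// property, then env var fallback; null when nothing is set.
def resolve(cliValue: String,
            sparkProperties: Map[String, String],
            key: String,
            envValue: Option[String]): String =
  Option(cliValue)
    .orElse(sparkProperties.get(key))
    .orElse(envValue)
    .orNull
```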
[GitHub] spark pull request: [SPARK-1953]yarn client mode Application Maste...
Github user tgravescs commented on a diff in the pull request: https://github.com/apache/spark/pull/3607#discussion_r21311166
--- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala ---
@@ -279,6 +285,10 @@ private[spark] class SparkSubmitArguments(args: Seq[String], env: Map[String, St
         driverExtraLibraryPath = value
         parse(tail)
+      case ("--am-memory") :: value :: tail =>
--- End diff --
I am a little bit on the fence here about having this config in spark-submit. I'm not sure if it will cause more confusion since it only applies to client mode. I'm wondering if perhaps we just add the config for now. @vanzin @andrewor14 thoughts on that since you both commented on the am.extraJavaOptions pr
[GitHub] spark pull request: [SPARK-4730][YARN] Warn against deprecated YAR...
Github user tgravescs commented on a diff in the pull request: https://github.com/apache/spark/pull/3590#discussion_r21311665
--- Diff: yarn/common/src/main/scala/org/apache/spark/scheduler/cluster/YarnClientSchedulerBackend.scala ---
@@ -78,11 +79,25 @@ private[spark] class YarnClientSchedulerBackend(
       ("--queue", "SPARK_YARN_QUEUE", "spark.yarn.queue"),
       ("--name", "SPARK_YARN_APP_NAME", "spark.app.name")
     )
+    // Warn against the following deprecated environment variables: env var -> suggestion
+    val deprecatedEnvVars = Map(
+      "SPARK_MASTER_MEMORY" -> "SPARK_DRIVER_MEMORY or --driver-memory through spark-submit",
+      "SPARK_WORKER_INSTANCES" -> "SPARK_WORKER_INSTANCES or --num-executors through spark-submit",
+      "SPARK_WORKER_MEMORY" -> "SPARK_EXECUTOR_MEMORY or --executor-memory through spark-submit",
+      "SPARK_WORKER_CORES" -> "SPARK_EXECUTOR_CORES or --executor-cores through spark-submit")
--- End diff --
aren't essentially all of the env variables deprecated? I know we have warnings on some throughout the code but haven't checked for all of them.
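A warning map like the one in this diff could be driven by a small helper; the sketch below uses assumed names and returns the messages instead of logging them (the real patch would go through the YARN backend's logger, and `env` stands in for `sys.env`):

```scala
// Map each legacy environment variable to its suggested replacement
// and produce one deprecation notice per variable that is actually set.
val deprecatedEnvVars = Map(
  "SPARK_MASTER_MEMORY" -> "SPARK_DRIVER_MEMORY or --driver-memory through spark-submit",
  "SPARK_WORKER_MEMORY" -> "SPARK_EXECUTOR_MEMORY or --executor-memory through spark-submit",
  "SPARK_WORKER_CORES"  -> "SPARK_EXECUTOR_CORES or --executor-cores through spark-submit")

def deprecationWarnings(env: Map[String, String]): Seq[String] =
  deprecatedEnvVars.collect {
    case (envVar, suggestion) if env.contains(envVar) =>
      s"NOTE: $envVar is deprecated. Use $suggestion instead."
  }.toSeq
```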
[GitHub] spark pull request: [SPARK-4461][YARN] pass extra java options to ...
Github user WangTaoTheTonic commented on the pull request: https://github.com/apache/spark/pull/3409#issuecomment-65651760 Agree with @tgravescs, adding spark.yarn.am.extraClassPath and spark.yarn.am.extraLibraryPath together would be better. @zhzhan You can also check https://issues.apache.org/jira/browse/SPARK-4181.
[GitHub] spark pull request: [SPARK-4685] Include all spark.ml and spark.ml...
Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/3598#issuecomment-65664523 LGTM in retrospect
[GitHub] spark pull request: SPARK-3779. yarn spark.yarn.applicationMaster....
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/3471#issuecomment-65675297 Thanks for the feedback, Tom. Updated the patch to reflect your and Wang Tao's comments. I left out adding MS to the config name because it's inconsistent with all Spark's existing configs. I agree that it would have been better to start out including the units in config names, but I think it'll be confusing to have different conventions for different configs here.
[GitHub] spark pull request: SPARK-3779. yarn spark.yarn.applicationMaster....
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3471#issuecomment-65675709 [Test build #24142 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24142/consoleFull) for PR 3471 at commit [`ce6dff2`](https://github.com/apache/spark/commit/ce6dff2be37e1ab40925f2b60182565386245438). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-4683][SQL] Add a beeline.cmd to run on ...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/3599#issuecomment-65677001 Thanks Cheng, I'll pull this in.
[GitHub] spark pull request: [SPARK-4683][SQL] Add a beeline.cmd to run on ...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/3599
[GitHub] spark pull request: [STREAMING] Add redis pub/sub streaming suppor...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/2348#issuecomment-65678290 Can one of the admins verify this patch?
[GitHub] spark pull request: [SPARK-1953]yarn client mode Application Maste...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/3607#discussion_r21324538
--- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala ---
@@ -279,6 +285,10 @@ private[spark] class SparkSubmitArguments(args: Seq[String], env: Map[String, St
         driverExtraLibraryPath = value
         parse(tail)
+      case ("--am-memory") :: value :: tail =>
--- End diff --
I'd prefer not to add this to SparkSubmit. I've never seen someone have to fiddle with that value, so my guess is that this is such an uncommon need that those who want to use it wouldn't be bothered by the more verbose --conf approach. Also, should probably add a memory overhead config too.
[GitHub] spark pull request: [SPARK-2199] [mllib] topic modeling
Github user jkbradley commented on the pull request: https://github.com/apache/spark/pull/1269#issuecomment-65682412 @akopich Thanks for the responses! Follow-ups:
(1) Users implementing their own regularizers
You're right that this would be nice to have. If we include it, then perhaps we can spend more time to have a clean API for regularizers which can be re-used in other algorithms which try to optimize an objective. (E.g., ideally, Dirichlet regularization would be implemented such that any algorithm which needed a Dirichlet regularizer (or prior) could reuse your code.) Here are some thoughts on that:
* All of the regularizers here operate on each Matrix element individually. If that will be the case for all useful regularizers, then the regularizer API could operate per-element (to be simpler), and the code using the regularizers could iterate over all elements as needed.
* The regularizer API should use general terminology, such as:
  * penalty(param: Double): Double
  * gradient(param: Double): Double
Alternatively, we could use this development path to avoid having to decide on the API right now:
* In this PR, keep the regularizer types public, but make all of their internals private[mllib]. This way, the API choices are kept private for now.
* In a later PR, the regularizer API can be refined and made public so that users can implement their own pluggable types.
(2) Regular and Robust in the same class
You should be able to extract the functionality you need. E.g., if the newIteration method knows that it has a DocumentParameters instance, it can call getNewTheta(). If the actual type of the instance is RobustDocumentParameters, then RobustDocumentParameters.getNewTheta() will be called. But I would recommend having both classes implement getNewTheta() with the same visibility (private/public), and also to use the "override" keyword in RobustDocumentParameters. You should be able to abstract any other needed functionality similarly.
(3) PLSA and RobustPLSA code duplication
Looking more closely, I think you're right about it being hard to abstract further. (I'll let you know if I have ideas.)
(4) Float vs. Double
True, it may be worth the trouble to use Float to save on memory and communication. I don't know enough about PLSA to know how important numerical precision is in general. Your approach sounds reasonable then. One alternative would be to use Breeze matrices (but not in public APIs!), but I'd only suggest that if it will simplify or shorten code.
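The per-element API proposed in point (1) can be made concrete with a small trait. This is a sketch of the proposal, not spark.mllib code; the trait name and the L2 instance are illustrative:

```scala
// Per-element regularizer API: penalty and gradient take one matrix
// element at a time, so any optimizer that iterates over elements
// (a Dirichlet prior, an L2 penalty, ...) can reuse implementations.
trait ElementwiseRegularizer {
  def penalty(param: Double): Double
  def gradient(param: Double): Double
}

// L2 penalty as the simplest concrete instance: lambda/2 * x^2.
class L2Regularizer(lambda: Double) extends ElementwiseRegularizer {
  override def penalty(param: Double): Double = 0.5 * lambda * param * param
  override def gradient(param: Double): Double = lambda * param
}
```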
[GitHub] spark pull request: Spark Core - [SPARK-3620] - Refactor of SparkS...
Github user vanzin commented on the pull request: https://github.com/apache/spark/pull/2516#issuecomment-65682500 @pwendell I was less interested in the refactoring part than in formalizing the precedence for the options in a more obvious manner in the code. Right now that's a little confusing. But yeah, this patch is rather large, and a lot has changed since it was last updated...
[GitHub] spark pull request: Spark 3883: SSL support for HttpServer and Akk...
Github user vanzin commented on the pull request: https://github.com/apache/spark/pull/3571#issuecomment-65682860 @jacek-lewandowski from a quick look at the diff, it seems you didn't change anything w.r.t. the configuration. In master, there's no need to add a new config file nor all the different ways of loading it - all daemons should be loading spark-defaults.conf and so you could just use SparkConf for everything like I suggested in the old PR. Did you have a chance to look at that?
[GitHub] spark pull request: [SPARK-4397] Move object RDD to the front of R...
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/3580#issuecomment-65682925 Good catch on the return types. Would be great if we can make ScalaStyle complain about those.
[GitHub] spark pull request: [SPARK-4697][YARN]System properties should ove...
Github user vanzin commented on a diff in the pull request: https://github.com/apache/spark/pull/3557#discussion_r21328217
--- Diff: yarn/common/src/main/scala/org/apache/spark/scheduler/cluster/YarnClientSchedulerBackend.scala ---
@@ -79,10 +79,10 @@ private[spark] class YarnClientSchedulerBackend(
       ("--name", "SPARK_YARN_APP_NAME", "spark.app.name")
     )
     optionTuples.foreach { case (optionName, envVar, sparkProp) =>
-      if (System.getenv(envVar) != null) {
-        extraArgs += (optionName, System.getenv(envVar))
-      } else if (sc.getConf.contains(sparkProp)) {
+      if (sc.getConf.contains(sparkProp)) {
         extraArgs += (optionName, sc.getConf.get(sparkProp))
+      } else if (System.getenv(envVar) != null) {
+        extraArgs += (optionName, System.getenv(envVar))
--- End diff --
The method you're modifying (`getExtraClientArguments`) is the one that defines the `--name` argument for `ClientArguments`. And you're inverting the priority here, so that `spark.app.name` > `SPARK_YARN_APP_NAME`. So basically, since `spark.app.name` is mandatory, `SPARK_YARN_APP_NAME` becomes useless. But please test it; make sure both work, both in client and cluster mode. Something might have changed since those fixes went in, although I kinda doubt it.
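The inverted priority vanzin describes (Spark property consulted first, environment variable only as a fallback) can be isolated in a small pure function for testing. Names are illustrative; `conf` and `env` stand in for `sc.getConf` and `System.getenv`, and the real code appends to a mutable `extraArgs` buffer instead of returning a sequence:

```scala
// For each (cli option, env var, spark property) tuple, emit the option
// with the property value when the property is set, otherwise fall back
// to the environment variable, otherwise emit nothing.
def extraArgsFor(optionTuples: Seq[(String, String, String)],
                 conf: Map[String, String],
                 env: Map[String, String]): Seq[String] =
  optionTuples.flatMap { case (optionName, envVar, sparkProp) =>
    conf.get(sparkProp).orElse(env.get(envVar)).toSeq.flatMap(v => Seq(optionName, v))
  }
```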
[GitHub] spark pull request: [SPARK-2188] Support sbt/sbt for Windows
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/3591#issuecomment-65687621 Our `sbt` shell scripts are from the [sbt-launcher-package](https://github.com/sbt/sbt-launcher-package) project. Do you think we should try to submit this change upstream first?
[GitHub] spark pull request: SPARK-3779. yarn spark.yarn.applicationMaster....
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3471#issuecomment-65687913 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24142/ Test PASSed.
[GitHub] spark pull request: SPARK-3779. yarn spark.yarn.applicationMaster....
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3471#issuecomment-65687902 [Test build #24142 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24142/consoleFull) for PR 3471 at commit [`ce6dff2`](https://github.com/apache/spark/commit/ce6dff2be37e1ab40925f2b60182565386245438). * This patch **passes all tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-4253]Ignore spark.driver.host in yarn-c...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/3112
[GitHub] spark pull request: [SPARK-4253]Ignore spark.driver.host in yarn-c...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/3112#issuecomment-65693674 Thanks for testing this out! I also tested it with my own integration test and it now passes, so this looks good to me. I'm going to merge this into `master` and `branch-1.2`. I'll edit the commit message to reflect the bug description from JIRA.
[GitHub] spark pull request: [SPARK-4253]Ignore spark.driver.host in yarn-c...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/3112#issuecomment-65694771 I've also backported this to `branch-1.1`.
[GitHub] spark pull request: [HOTFIX] Fixing two issues with the release sc...
GitHub user pwendell opened a pull request: https://github.com/apache/spark/pull/3608 [HOTFIX] Fixing two issues with the release script. 1. The version replacement was still producing some false changes. 2. Uploads to the staging repo specifically. You can merge this pull request into a Git repository by running: $ git pull https://github.com/pwendell/spark release-script Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/3608.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3608 commit 3c63294a3109b13ad570a7a6056eedae0558f029 Author: Patrick Wendell pwend...@gmail.com Date: 2014-11-28T22:10:13Z Fixing two issues with the release script: 1. The version replacement was still producing some false changes. 2. Uploads to the staging repo specifically.
[GitHub] spark pull request: [HOTFIX] Fixing two issues with the release sc...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/3608
[GitHub] spark pull request: [HOTFIX] Fixing two issues with the release sc...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3608#issuecomment-65698083 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24143/ Test FAILed.
[GitHub] spark pull request: [SPARK-3431] [WIP] Parallelize test execution
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3564#issuecomment-65700600 [Test build #24145 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24145/consoleFull) for PR 3564 at commit [`ef705a4`](https://github.com/apache/spark/commit/ef705a4536dbdc3c46e0a05a18098624b5d6be5c). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-3431] [WIP] Parallelize test execution
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3564#issuecomment-65701539 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24144/ Test FAILed.
[GitHub] spark pull request: [SPARK-3431] [WIP] Parallelize test execution
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3564#issuecomment-65703187 [Test build #24145 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24145/consoleFull) for PR 3564 at commit [`ef705a4`](https://github.com/apache/spark/commit/ef705a4536dbdc3c46e0a05a18098624b5d6be5c). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-3431] [WIP] Parallelize test execution
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3564#issuecomment-65703195 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24145/ Test FAILed.
[GitHub] spark pull request: [SPARK-3431] [WIP] Parallelize test execution
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3564#issuecomment-65704277 [Test build #24146 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24146/consoleFull) for PR 3564 at commit [`bf1d46f`](https://github.com/apache/spark/commit/bf1d46f1bb5ac343a2e47b3e9880d2f35b013603). * This patch merges cleanly.
[GitHub] spark pull request: Update DecisionTree.scala
GitHub user emtl97 opened a pull request: https://github.com/apache/spark/pull/3609 Update DecisionTree.scala Hello, hope you are well. We've been using DecisionTree at Samsung and hope to help in some small way. I was interested in setting the seed for the sampling (i.e. at line 988); we're in the process of creating tests for our code, and being able to set the seed is helpful. To that end, the sample method here depends on a PartitionwiseSampledRDD, and I think the `compute` method there uses a different seed from the one that can be passed to the constructor of PartitionwiseSampledRDD; it uses `split.seed` (below). Hope we can discuss more! Thank you. Best wishes, Ed

```scala
override def compute(splitIn: Partition, context: TaskContext): Iterator[U] = {
  val split = splitIn.asInstanceOf[PartitionwiseSampledRDDPartition]
  val thisSampler = sampler.clone
  thisSampler.setSeed(split.seed)
  thisSampler.sample(firstParent[T].iterator(split.prev, context))
}
```

You can merge this pull request into a Git repository by running: $ git pull https://github.com/emtl97/spark patch-1 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/3609.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3609 commit b2f6926cc5d4e5020b6811f6952101d6882877e1 Author: Ed sigm...@yahoo.com Date: 2014-12-04T21:11:41Z Update DecisionTree.scala
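The seeding behavior under discussion can be modeled outside Spark. Below is a minimal Python sketch (hypothetical names; the real `PartitionwiseSampledRDD` derives a per-partition `split.seed`, not shown here) of why deriving each partition's seed from a single user-settable base seed makes sampling reproducible:

```python
import random

def sample_partitions(partitions, fraction, base_seed):
    """Sample each partition with its own RNG, seeding it from a
    user-supplied base seed plus the partition index (a hypothetical
    per-partition seeding scheme, for illustration only)."""
    sampled = []
    for idx, part in enumerate(partitions):
        rng = random.Random(base_seed + idx)  # deterministic per-partition seed
        sampled.append([x for x in part if rng.random() < fraction])
    return sampled

parts = [list(range(10)), list(range(10, 20))]
a = sample_partitions(parts, 0.5, base_seed=42)
b = sample_partitions(parts, 0.5, base_seed=42)
assert a == b  # same base seed => identical samples on every run
```

With a fixed base seed, tests can assert on exact sample contents; without one, each run draws a different sample and such tests become flaky.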
[GitHub] spark pull request: Update DecisionTree.scala
Github user emtl97 closed the pull request at: https://github.com/apache/spark/pull/3609
[GitHub] spark pull request: [SPARK-3431] [WIP] Parallelize test execution
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3564#issuecomment-65706465 [Test build #24146 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24146/consoleFull) for PR 3564 at commit [`bf1d46f`](https://github.com/apache/spark/commit/bf1d46f1bb5ac343a2e47b3e9880d2f35b013603). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: [SPARK-3431] [WIP] Parallelize test execution
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3564#issuecomment-65706473 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24146/ Test FAILed.
[GitHub] spark pull request: [SPARK-3431] [WIP] Parallelize test execution
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3564#issuecomment-65707887 [Test build #24147 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24147/consoleFull) for PR 3564 at commit [`ab127b7`](https://github.com/apache/spark/commit/ab127b798dbfa9399833d546e627f9651b060918). * This patch merges cleanly.
[GitHub] spark pull request: [SPARK-4349] Checking if parallel collection p...
Github user mccheah closed the pull request at: https://github.com/apache/spark/pull/3275
[GitHub] spark pull request: [SPARK-4349] Checking if parallel collection p...
Github user mccheah commented on the pull request: https://github.com/apache/spark/pull/3275#issuecomment-65709183 We want a more generic fix than this. I'll push something new which will be completely different, addressing the issue further down in the stack.
[GitHub] spark pull request: [SPARK-3431] [WIP] Parallelize test execution
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3564#issuecomment-65710986 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/24147/ Test FAILed.
[GitHub] spark pull request: [SPARK-3431] [WIP] Parallelize test execution
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/3564#issuecomment-65710977 [Test build #24147 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/24147/consoleFull) for PR 3564 at commit [`ab127b7`](https://github.com/apache/spark/commit/ab127b798dbfa9399833d546e627f9651b060918). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark pull request: SPARK-4749: Allow initializing KMeans clusters...
GitHub user nxwhite-str opened a pull request: https://github.com/apache/spark/pull/3610 SPARK-4749: Allow initializing KMeans clusters using a seed This implements the functionality for SPARK-4749 and provides unit tests in Scala and PySpark. You can merge this pull request into a Git repository by running: $ git pull https://github.com/nxwhite-str/spark master Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/3610.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #3610 commit 35c188463798729b65ca74549984cb765ac1e9c9 Author: nate.crosswhite nate.crosswh...@stresearch.com Date: 2014-12-04T19:12:29Z Add kmeans initial seed to pyspark API commit 616d11187128ca5bb1ecce1bfe3ca2df16529f61 Author: nate.crosswhite nate.crosswh...@stresearch.com Date: 2014-12-04T19:13:12Z Merge remote-tracking branch 'upstream/master' commit 5d087b40e14db51b1eeb44e462e04d5e718338be Author: nate.crosswhite nate.crosswh...@stresearch.com Date: 2014-12-04T21:25:49Z Adding KMeans train with seed and Scala unit test commit 9156a5782c254bbf765954fffcee1ca34d5d0b7f Author: nate.crosswhite nate.crosswh...@stresearch.com Date: 2014-12-04T21:28:32Z Merge remote-tracking branch 'upstream/master'
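What a user-supplied seed buys here can be sketched without Spark: with the same seed, the initial cluster centers come out identical on every run, which is what makes the accompanying unit tests deterministic. A minimal Python illustration (plain seeded random choice for clarity; MLlib's actual default initializer is k-means||, which is not reproduced here):

```python
import random

def init_centers(points, k, seed):
    """Pick k initial cluster centers using a seeded RNG.
    A conceptual sketch of seeded initialization, not MLlib's algorithm."""
    rng = random.Random(seed)
    return rng.sample(points, k)

pts = [(float(i), float(i % 3)) for i in range(12)]
c1 = init_centers(pts, 3, seed=7)
c2 = init_centers(pts, 3, seed=7)
assert c1 == c2  # same seed => same initial centers, so tests can be exact
```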
[GitHub] spark pull request: SPARK-4749: Allow initializing KMeans clusters...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/3610#issuecomment-65712520 Can one of the admins verify this patch?
[GitHub] spark pull request: [SPARK-4745] Fix get_existing_cluster() functi...
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/3596#discussion_r21341088 --- Diff: ec2/spark_ec2.py --- @@ -504,9 +504,9 @@ def get_existing_cluster(conn, opts, cluster_name, die_on_error=True): active = [i for i in res.instances if is_active(i)] for inst in active: group_names = [g.name for g in inst.groups] -if group_names == [cluster_name + "-master"]: +if cluster_name + "-master" in group_names: --- End diff -- Minor nit, but I think adding parentheses here would make the operator precedence clearer. I'm going to do this myself while merging.
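The behavior difference between the two checks, and the purely cosmetic parentheses, can be seen in plain Python. `+` binds tighter than `in`, so the parentheses do not change evaluation; they only make the grouping obvious to readers:

```python
# Hypothetical sample data: an instance that belongs to two security groups.
group_names = ["my-cluster-master", "my-cluster-slaves"]
cluster_name = "my-cluster"

# Old check: equality against a single-element list. It fails whenever the
# instance is in more than one group, even if the master group is present.
old = group_names == [cluster_name + "-master"]

# New check: membership. Parentheses added per the review comment;
# "+" already binds tighter than "in", so they are for readability only.
new = (cluster_name + "-master") in group_names

assert old is False
assert new is True
```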
[GitHub] spark pull request: [SPARK-4745] Fix get_existing_cluster() functi...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/3596#issuecomment-65713681 LGTM. I tested this out myself and it works, so I'm going to merge this into `master`, `branch-1.2` and `branch-1.1`.
[GitHub] spark pull request: [SPARK-4745] Fix get_existing_cluster() functi...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/3596
[GitHub] spark pull request: Add a Note on jsonFile having separate JSON ob...
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/3517#discussion_r21341182 --- Diff: docs/sql-programming-guide.md --- @@ -621,7 +621,7 @@ val sqlContext = new org.apache.spark.sql.SQLContext(sc) // A JSON dataset is pointed to by path. // The path can be either a single text file or a directory storing text files. -val path = "examples/src/main/resources/people.json" +val path = "examples/src/main/resources/people.txt" --- End diff -- We need to move the file too and update the other places that reference it: ``` examples/src/main/java/org/apache/spark/examples/sql/JavaSparkSQL.java: String path = "examples/src/main/resources/people.json"; examples/src/main/python/sql.py: path = os.path.join(os.environ['SPARK_HOME'], "examples/src/main/resources/people.json") ```
[GitHub] spark pull request: Add a Note on jsonFile having separate JSON ob...
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/3517#issuecomment-65714075 LGTM once my comment is addressed. Thanks!
[GitHub] spark pull request: Add a Note on jsonFile having separate JSON ob...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/3517#issuecomment-65714207 One thought: will the changed example file name / location be confusing for people reading documentation versions that don't match their Spark version?
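For context on the note this thread is adding: `jsonFile` reads line-delimited JSON, where each line is a separate, self-contained JSON object, rather than one pretty-printed document spanning multiple lines. That format can be sketched in plain Python with the stdlib (no Spark needed; the sample records echo the guide's people dataset):

```python
import json

# One self-contained JSON object per line -- the shape jsonFile expects.
raw = '{"name": "Michael"}\n{"name": "Andy", "age": 30}\n{"name": "Justin", "age": 19}'
records = [json.loads(line) for line in raw.splitlines()]
assert len(records) == 3
assert records[1]["age"] == 30
```

A single multi-line object such as a pretty-printed `{"name": "Michael"}` spread over three lines would fail this line-by-line parse, which is exactly the pitfall the documentation note warns about.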
[GitHub] spark pull request: [SPARK-4618][SQL] Make foreign DDL commands op...
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/3470#discussion_r21341852 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/sources/interfaces.scala --- @@ -37,7 +37,7 @@ import org.apache.spark.sql.catalyst.expressions.{Expression, Attribute} @DeveloperApi trait RelationProvider { /** Returns a new base relation with the given parameters. */ - def createRelation(sqlContext: SQLContext, parameters: Map[String, String]): BaseRelation + def createRelation(sqlContext: SQLContext, parameters: CaseInsensitiveMap): BaseRelation --- End diff -- `spark.sql.caseSensitive` is about identifiers (i.e., attributes and table names). I'd say this is more analogous to keyword case insensitivity. I don't know any database that doesn't treat `SELECT` and `select` the same, so I'm not sure that should be configurable. You can still pass your `CaseInsensitiveMap` in and it will have the desired effect. Just don't change the function signature.
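The idea of a `CaseInsensitiveMap` for option keys can be sketched in a few lines of Python (an illustration of the concept under discussion, not Spark's implementation): keys are lower-cased on insertion and on lookup, so callers can pass the map where a plain map is expected without changing the signature.

```python
class CaseInsensitiveMap(dict):
    """Minimal sketch of a case-insensitive option map:
    lower-case keys on the way in and on every lookup."""

    def __init__(self, data=None):
        super().__init__()
        for k, v in (data or {}).items():
            self[k] = v

    def __setitem__(self, key, value):
        super().__setitem__(key.lower(), value)

    def __getitem__(self, key):
        return super().__getitem__(key.lower())

    def __contains__(self, key):
        return super().__contains__(key.lower())

# Hypothetical DDL options: "Path", "PATH", and "path" all hit the same entry.
opts = CaseInsensitiveMap({"Path": "/data/people.json"})
assert opts["PATH"] == "/data/people.json"
assert "path" in opts
```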
[GitHub] spark pull request: [SPARK-4459] Change groupBy type parameter fro...
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/3327#discussion_r21341919 --- Diff: core/src/test/java/org/apache/spark/JavaAPISuite.java ---

```java
  @Test
  public void groupByOnPairRDD() {
    JavaRDD<Integer> rdd = sc.parallelize(Arrays.asList(1, 1, 2, 3, 5, 8, 13));
    Function<scala.Tuple2<Integer, Integer>, Boolean> areOdd =
      new Function<scala.Tuple2<Integer, Integer>, Boolean>() {
        @Override
        public Boolean call(scala.Tuple2<Integer, Integer> x) {
          return x._1 % 2 == 0 && x._2 % 2 == 0;
        }
      };
    JavaPairRDD<Integer, Integer> pairrdd = rdd.zip(rdd);
    JavaPairRDD<Boolean, Iterable<scala.Tuple2<Integer, Integer>>> oddsAndEvens = pairrdd.groupBy(areOdd);
    Assert.assertEquals(2, oddsAndEvens.count());
    Assert.assertEquals(2, Iterables.size(oddsAndEvens.lookup(true).get(0)));   // Evens
    Assert.assertEquals(5, Iterables.size(oddsAndEvens.lookup(false).get(0)));  // Odds

    oddsAndEvens = pairrdd.groupBy(areOdd, 1);
    Assert.assertEquals(2, oddsAndEvens.count());
    Assert.assertEquals(2, Iterables.size(oddsAndEvens.lookup(true).get(0)));   // Evens
    Assert.assertEquals(5, Iterables.size(oddsAndEvens.lookup(false).get(0)));  // Odds
  }

  @Test
  public void keyByOnPairRDD() {
    JavaRDD<Integer> rdd = sc.parallelize(Arrays.asList(1, 1, 2, 3, 5, 8, 13));
    Function<scala.Tuple2<Integer, Integer>, String> areOdd =
      new Function<scala.Tuple2<Integer, Integer>, String>() {
        @Override
        public String call(scala.Tuple2<Integer, Integer> x) {
          return "" + (x._1 + "" + x._2);
        }
      };
```

--- End diff -- The spacing here is messy. Also, `"" + x` is messy; just do `x.toString()` instead if you want to convert an object into a string.
[GitHub] spark pull request: [SPARK-4459] Change groupBy type parameter fro...
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/3327#discussion_r21342049 --- Diff: core/src/test/java/org/apache/spark/JavaAPISuite.java ---

```java
  @Test
  public void keyByOnPairRDD() {
    JavaRDD<Integer> rdd = sc.parallelize(Arrays.asList(1, 1, 2, 3, 5, 8, 13));
    Function<scala.Tuple2<Integer, Integer>, String> areOdd =
      new Function<scala.Tuple2<Integer, Integer>, String>() {
```

--- End diff -- Also, why is this named `areOdd`? That's not what this function is doing.
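The semantics the new tests exercise — zipping an RDD with itself and grouping the resulting pairs by a boolean function — can be modeled in plain Python (hypothetical helper names; no Spark required), which also confirms the expected 2/5 split asserted above:

```python
from collections import defaultdict

def group_by(items, f):
    """Bucket each item under f(item) -- a plain-Python model of
    the groupBy semantics the JavaAPISuite test exercises."""
    buckets = defaultdict(list)
    for item in items:
        buckets[f(item)].append(item)
    return dict(buckets)

rdd = [1, 1, 2, 3, 5, 8, 13]
pair_rdd = list(zip(rdd, rdd))  # like rdd.zip(rdd)

# True when both components are even (so the name "areOdd" is indeed backwards).
both_even = lambda p: p[0] % 2 == 0 and p[1] % 2 == 0

grouped = group_by(pair_rdd, both_even)
assert len(grouped[True]) == 2   # (2, 2) and (8, 8)
assert len(grouped[False]) == 5  # the five odd pairs
```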
[GitHub] spark pull request: [SPARK-4668] Fix some documentation typos.
Github user ryan-williams commented on the pull request: https://github.com/apache/spark/pull/3523#issuecomment-65716929 added some more documentation typo fixes and added parameter names to a few unmarked booleans, for clarity
[GitHub] spark pull request: [SPARK-4459] Change groupBy type parameter fro...
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/3327#discussion_r21342647 --- Diff: core/src/test/java/org/apache/spark/JavaAPISuite.java ---

```java
    Function<scala.Tuple2<Integer, Integer>, Boolean> areOdd =
      new Function<scala.Tuple2<Integer, Integer>, Boolean>() {
```

--- End diff -- Also, you could just write `Tuple2` instead of `scala.Tuple2`.
[GitHub] spark pull request: [SPARK-4459] Change groupBy type parameter fro...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/3327#issuecomment-65719085 I downloaded this and tested it out; everything looks fine, modulo these formatting issues. I've fixed the style issues myself and am going to merge this into `master`, `branch-1.2`, and `branch-1.1`. Thanks for fixing this! (I ran the MiMa tests locally and this passes)
[GitHub] spark pull request: [SPARK-4459] Change groupBy type parameter fro...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/3327
[GitHub] spark pull request: [SPARK-4459] Change groupBy type parameter fro...
Github user alokito commented on the pull request: https://github.com/apache/spark/pull/3327#issuecomment-65719809 Thanks for fixing the style issues, I meant to but it's been a tough week.
[GitHub] spark pull request: [SPARK-4652][DOCS] Add docs about spark-git-re...
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/3513#discussion_r21344431

--- Diff: docs/ec2-scripts.md ---

    @@ -85,6 +85,11 @@ another.
         specified version of Spark. The `version` can be a version number
         (e.g. 0.7.3) or a specific git hash. By default, a recent version
         will be used.
    +-   `--spark-git-repo=repository url` enables you to run your
    +    development version on EC2 cluster. You need to set
    +    `--spark-version` as git commit hash such as 317e114 not
    +    original release version number. By default, this repository is
    +    set [apache mirror](https://github.com/apache/spark).

--- End diff --

Apache needs to be capitalized here. I'd also swap the order of these sentences so that the default for `--spark-git-repo` appears first, followed by the sentence describing how `--spark-version` needs to be set when using this option.
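For context, the option pairing the diff documents can be sketched as the following hypothetical spark-ec2 invocation. The cluster name, key name, and key path are made up for illustration; the git hash 317e114 is the example used in the diff itself:

```shell
# Launch a cluster built from a custom repository at a specific commit.
# --spark-version must be a git commit hash (not a release number)
# when --spark-git-repo is given. Key names and cluster name are
# placeholders, not from the original document.
./spark-ec2 -k my-key -i ~/.ssh/my-key.pem \
  --spark-git-repo=https://github.com/apache/spark \
  --spark-version=317e114 \
  launch my-dev-cluster
```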
[GitHub] spark pull request: [SPARK-4652][DOCS] Add docs about spark-git-re...
Github user JoshRosen commented on the pull request: https://github.com/apache/spark/pull/3513#issuecomment-65721200 This looks good to me. There's one minor sentence-ordering and capitalization issue that I'd like to fix, but I'll do it myself on merge. I'm going to merge this into `master`, `branch-1.2`, and `branch-1.1`. Thanks!
[GitHub] spark pull request: [SPARK-4652][DOCS] Add docs about spark-git-re...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/3513
[GitHub] spark pull request: [SPARK-4005][CORE] handle message replies in r...
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/2853#discussion_r21345364

--- Diff: core/src/main/scala/org/apache/spark/storage/BlockManagerMasterActor.scala ---

    @@ -351,23 +350,23 @@ class BlockManagerMasterActor(val isLocal: Boolean, conf: SparkConf, listenerBus
           storageLevel: StorageLevel,
           memSize: Long,
           diskSize: Long,
    -      tachyonSize: Long) {
    +      tachyonSize: Long): Boolean = {
    +    var updated = true
         if (!blockManagerInfo.contains(blockManagerId)) {
           if (blockManagerId.isDriver && !isLocal) {
             // We intentionally do not register the master (except in local mode),
             // so we should not indicate failure.
    -        sender ! true
    +        // do nothing here, updated == true.

--- End diff --

Why not just `return true` here, and `return false` in the other branch so that we can eliminate the mutable `updated` variable?
[GitHub] spark pull request: [SPARK-4005][CORE] handle message replies in r...
Github user JoshRosen commented on a diff in the pull request: https://github.com/apache/spark/pull/2853#discussion_r21345387

--- Diff: core/src/main/scala/org/apache/spark/storage/BlockManagerMasterActor.scala ---

    @@ -391,7 +390,7 @@ class BlockManagerMasterActor(val isLocal: Boolean, conf: SparkConf, listenerBus
           if (locations.size == 0) {
             blockLocations.remove(blockId)
           }
    -      sender ! true
    +      updated

--- End diff --

Similarly, why not just `return true` here?
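Taken together, the two review comments suggest replacing the mutable flag with direct returns from each branch. A minimal, self-contained Scala sketch of the before/after is below; the names and control flow are illustrative only, not the actual `BlockManagerMasterActor` code:

```scala
// Hypothetical standalone sketch of the suggested refactor: eliminate a
// mutable `updated` flag by returning directly from each branch.
object UpdateSketch {
  // Stand-in for blockManagerInfo; empty here, so `contains` is always false.
  private val known = scala.collection.mutable.Set[String]()

  // Before: track the result in a var and fall through to it at the end.
  def updateWithFlag(id: String, isDriver: Boolean, isLocal: Boolean): Boolean = {
    var updated = true
    if (!known.contains(id)) {
      if (isDriver && !isLocal) {
        // do nothing here, updated == true
      } else {
        updated = false
      }
    }
    updated
  }

  // After: return from each branch directly, no mutable state.
  def updateWithReturns(id: String, isDriver: Boolean, isLocal: Boolean): Boolean = {
    if (!known.contains(id)) {
      if (isDriver && !isLocal) return true
      return false
    }
    true
  }
}
```

Both versions compute the same result; the second simply makes each branch's outcome explicit at the point where it is decided, which is what the review is asking for.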