Re: [VOTE] Release Apache Spark 1.2.0 (RC1)
Hi Krishna,

Thanks for providing the notebook! I tried it and found that the problem is with PySpark's zip. I created a JIRA to track the issue: https://issues.apache.org/jira/browse/SPARK-4841

-Xiangrui

On Thu, Dec 11, 2014 at 1:55 PM, Krishna Sankar ksanka...@gmail.com wrote:

K-Means iPython notebook data attached. It is the zip that gives the error; while one of the RDDs is from the prediction, most probably there is no problem with the K-Means itself. Lines 34, 35 and 36 are essentially the same, but only 36 works with 1.2.0. Interestingly, lines 34, 35 and 36 all work with 1.1.1 (checked just now).

The plot thickens! In 1.1.1, freq_cluster_map.take(5) prints normally for 34 and 35, but in exponential form for 36. So there is some difference even in 1.1.1.

#34, #35
[(array([28143, 0, 174, 1, 0, 0, 7000]), 1),
 (array([19244, 0, 215, 2, 0, 0, 6968]), 1),
 (array([41354, 0, 4123, 4, 0, 0, 7034]), 1),
 (array([14776, 0, 500, 1, 0, 0, 6952]), 1),
 (array([97752, 0, 43300, 26, 2077, 4, 6935]), 0)]

#36
[(array([ 2.8143e+04, 0.e+00, 1.7400e+02, 1.e+00, 0.e+00, 0.e+00, 7.e+03]), 1),
 (array([ 1.9244e+04, 0.e+00, 2.1500e+02, 2.e+00, 0.e+00, 0.e+00, 6.9680e+03]), 1),
 (array([ 4.1354e+04, 0.e+00, 4.1230e+03, 4.e+00, 0.e+00, 0.e+00, 7.0340e+03]), 1),
 (array([ 1.4776e+04, 0.e+00, 5.e+02, 1.e+00, 0.e+00, 0.e+00, 6.9520e+03]), 1),
 (array([ 9.7752e+04, 0.e+00, 4.3300e+04, 2.6000e+01, 2.0770e+03, 4.e+00, 6.9350e+03]), 0)]

I had overwritten the naive Bayes example. Will chase the older versions down.

Cheers
k/

On Wed, Dec 3, 2014 at 4:19 PM, Xiangrui Meng men...@gmail.com wrote:

Krishna, could you send me some code snippets for the issues you saw in naive Bayes and k-means?

-Xiangrui

On Sun, Nov 30, 2014 at 6:49 AM, Krishna Sankar ksanka...@gmail.com wrote:

+1

1. Compiled on OSX 10.10 (Yosemite): mvn -Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.0 -DskipTests clean package, 16:46 min (slightly slower connection)
2. Tested pyspark, MLlib - running as well as comparing results with 1.1.x
2.1. statistics OK
2.2. Linear/Ridge/Lasso Regression OK
     Slight difference in the print method (vs. 1.1.x) of the model object - with a label and more details. This is good.
2.3. Decision Tree, Naive Bayes OK
     Changes in print(model) - now print(model.toDebugString()) - OK
     Some changes in NaiveBayes. Different from my 1.1.x code - had to flatten list structures, and zip required the same number of elements in each partition. After code changes ran fine.
2.4. KMeans OK
     zip occasionally fails with error
     localhost): org.apache.spark.SparkException: Can only zip RDDs with same number of elements in each partition
     Has https://issues.apache.org/jira/browse/SPARK-2251 reappeared?
     Made it work by doing a different transformation, i.e. reusing an original rdd.
2.5. rdd operations OK
     State of the Union Texts - MapReduce, Filter, sortByKey (word count)
2.6. recommendation OK
2.7. Good work! In 1.x.x, had a map distinct over the movielens medium dataset which never worked. Works fine in 1.2.0!
3. Scala MLlib - subset of examples as in #2 above, with Scala
3.1. statistics OK
3.2. Linear Regression OK
3.3. Decision Tree OK
3.4. KMeans OK

Cheers
k/

P.S: Plan to add RF and .ml mechanics to this bank

On Fri, Nov 28, 2014 at 9:16 PM, Patrick Wendell pwend...@gmail.com wrote:

Please vote on releasing the following candidate as Apache Spark version 1.2.0!

The tag to be voted on is v1.2.0-rc1 (commit 1056e9ec1):
https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=1056e9ec13203d0c51564265e94d77a054498fdb

The release files, including signatures, digests, etc. can be found at:
http://people.apache.org/~pwendell/spark-1.2.0-rc1/

Release artifacts are signed with the following key:
https://people.apache.org/keys/committer/pwendell.asc

The staging repository for this release can be found at:
https://repository.apache.org/content/repositories/orgapachespark-1048/

The documentation corresponding to this release can be found at:
http://people.apache.org/~pwendell/spark-1.2.0-rc1-docs/

Please vote on releasing this package as Apache Spark 1.2.0!

The vote is open until Tuesday, December 02, at 05:15 UTC and passes if a majority of at least 3 +1 PMC votes are cast.

[ ] +1 Release this package as Apache Spark 1.2.0
[ ] -1 Do not release this package because ...

To learn more about Apache Spark, please see http://spark.apache.org/

== What justifies a -1 vote
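For readers following the zip discussion in this message, here is a minimal sketch of why a partition-wise zip can raise "Can only zip RDDs with same number of elements in each partition". It is a toy model in plain Python, not Spark code: partitions are modelled as nested lists, and `zip_partitions`, `pair_with_prediction` and `predict` are hypothetical names introduced only for illustration. The second helper is a guess at the kind of workaround Krishna describes ("reusing an original rdd"), i.e. deriving both halves of each pair from one dataset instead of zipping two.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

A = TypeVar("A")
B = TypeVar("B")


def zip_partitions(left: Sequence[Sequence[A]],
                   right: Sequence[Sequence[B]]) -> List[List[Tuple[A, B]]]:
    """Pair two partitioned datasets partition by partition (toy model)."""
    if len(left) != len(right):
        raise ValueError("datasets must have the same number of partitions")
    zipped = []
    for index, (lpart, rpart) in enumerate(zip(left, right)):
        # The total element count may match while a single partition does not.
        if len(lpart) != len(rpart):
            raise ValueError(
                f"partition {index}: {len(lpart)} vs {len(rpart)} elements"
            )
        zipped.append(list(zip(lpart, rpart)))
    return zipped


def pair_with_prediction(parts: Sequence[Sequence[A]],
                         predict: Callable[[A], B]) -> List[List[Tuple[A, B]]]:
    """Assumed workaround: build each pair from one record, so no zip is needed."""
    return [[(record, predict(record)) for record in part] for part in parts]
```

Both inputs to `zip_partitions` can hold the same total number of elements and still fail, because the check is made per partition; `pair_with_prediction` sidesteps the alignment question entirely.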
[RESULT] [VOTE] Release Apache Spark 1.2.0 (RC1)
This vote is closed in favor of RC2.

On Fri, Dec 5, 2014 at 2:02 PM, Patrick Wendell pwend...@gmail.com wrote:

Hey All,

Thanks all for the continued testing! The issue I mentioned earlier, SPARK-4498, was fixed earlier this week (hat tip to Mark Hamstra, who contributed to the fix). In the interim, a few smaller blocker-level issues with Spark SQL were found and fixed (SPARK-4753, SPARK-4552, SPARK-4761). There is currently an outstanding issue (SPARK-4740[1]) in Spark core that needs to be fixed.

I want to thank in particular Shopify and Intel China, who have identified and helped test blocker issues with the release. This type of workload testing around releases is really helpful for us.

Once things stabilize I will cut RC2. I think we're pretty close with this one.

- Patrick

On Wed, Dec 3, 2014 at 5:38 PM, Takeshi Yamamuro linguin@gmail.com wrote:

+1 (non-binding)
Checked on CentOS 6.5, compiled from the source. Ran various examples in stand-alone master and three slaves, and browsed the web UI.

On Sat, Nov 29, 2014 at 2:16 PM, Patrick Wendell pwend...@gmail.com wrote:

Please vote on releasing the following candidate as Apache Spark version 1.2.0!

The tag to be voted on is v1.2.0-rc1 (commit 1056e9ec1):
https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=1056e9ec13203d0c51564265e94d77a054498fdb

The release files, including signatures, digests, etc. can be found at:
http://people.apache.org/~pwendell/spark-1.2.0-rc1/

Release artifacts are signed with the following key:
https://people.apache.org/keys/committer/pwendell.asc

The staging repository for this release can be found at:
https://repository.apache.org/content/repositories/orgapachespark-1048/

The documentation corresponding to this release can be found at:
http://people.apache.org/~pwendell/spark-1.2.0-rc1-docs/

Please vote on releasing this package as Apache Spark 1.2.0!

The vote is open until Tuesday, December 02, at 05:15 UTC and passes if a majority of at least 3 +1 PMC votes are cast.

[ ] +1 Release this package as Apache Spark 1.2.0
[ ] -1 Do not release this package because ...

To learn more about Apache Spark, please see http://spark.apache.org/

== What justifies a -1 vote for this release? ==
This vote is happening very late into the QA period compared with previous votes, so -1 votes should only occur for significant regressions from 1.0.2. Bugs already present in 1.1.X, minor regressions, or bugs related to new features will not block this release.

== What default changes should I be aware of? ==
1. The default value of spark.shuffle.blockTransferService has been changed to netty -- Old behavior can be restored by switching to nio
2. The default value of spark.shuffle.manager has been changed to sort. -- Old behavior can be restored by setting spark.shuffle.manager to hash.

== Other notes ==
Because this vote is occurring over a weekend, I will likely extend the vote if this RC survives until the end of the vote period.

- Patrick

- To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org
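As a rough illustration of the two default changes listed in the vote announcement, the sketch below collects the "restore old behavior" values named there (nio and hash) and renders them as `--conf key=value` arguments for spark-submit. The helper name `to_conf_args` and the constant `LEGACY_SHUFFLE_SETTINGS` are hypothetical; only the two property names and their old values come from the announcement.

```python
from typing import Dict, List

# Old values as described in the vote announcement; the new defaults are
# netty (block transfer service) and sort (shuffle manager).
LEGACY_SHUFFLE_SETTINGS: Dict[str, str] = {
    "spark.shuffle.blockTransferService": "nio",
    "spark.shuffle.manager": "hash",
}


def to_conf_args(settings: Dict[str, str]) -> List[str]:
    """Render settings as a flat list of --conf key=value arguments."""
    args: List[str] = []
    for key in sorted(settings):  # sorted for a stable, reproducible order
        args.extend(["--conf", f"{key}={settings[key]}"])
    return args
```

The resulting list can be spliced into a spark-submit command line when comparing behavior between the old and new defaults while testing a release candidate.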
Re: [VOTE] Release Apache Spark 1.2.0 (RC1)
Hey All, Thanks all for the continued testing! The issue I mentioned earlier SPARK-4498 was fixed earlier this week (hat tip to Mark Hamstra who contributed to fix). In the interim a few smaller blocker-level issues with Spark SQL were found and fixed (SPARK-4753, SPARK-4552, SPARK-4761). There is currently an outstanding issue (SPARK-4740[1]) in Spark core that needs to be fixed. I want to thank in particular Shopify and Intel China who have identified and helped test blocker issues with the release. This type of workload testing around releases is really helpful for us. Once things stabilize I will cut RC2. I think we're pretty close with this one. - Patrick
Re: [VOTE] Release Apache Spark 1.2.0 (RC1)
+1 (non-binding) Checked on CentOS 6.5, compiled from the source. Ran various examples in stand-alone master and three slaves, and browsed the web UI.
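The vote rule quoted in this thread ("passes if a majority of at least 3 +1 PMC votes are cast") and the binding / non-binding distinction can be sketched as a small tally. This is only one reading of the rule, stated as an assumption: only binding (PMC) votes count, at least three binding +1 votes are needed, and binding +1 votes must outnumber binding -1 votes. The `Vote` class, the `vote_passes` function, and the treatment of fractional votes such as +0.9 as neither +1 nor -1 are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Vote:
    voter: str
    value: float   # e.g. +1, 0, -1, or a fractional value such as +0.9
    binding: bool  # True for PMC members, False for "(non-binding)" votes


def vote_passes(votes: Iterable[Vote], minimum_binding_plus_ones: int = 3) -> bool:
    """Assumed rule: >= 3 binding +1 votes, and more binding +1 than -1."""
    binding = [vote for vote in votes if vote.binding]
    plus_ones = sum(1 for vote in binding if vote.value >= 1)
    minus_ones = sum(1 for vote in binding if vote.value <= -1)
    return plus_ones >= minimum_binding_plus_ones and plus_ones > minus_ones
```

Under this reading, any number of non-binding +1 votes is useful testing signal but does not change the outcome on its own.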
Re: [VOTE] Release Apache Spark 1.2.0 (RC1)
Will do. Am on the road - will annotate an iPython notebook with what works what didn't work ... Cheers k/ On Wed, Dec 3, 2014 at 4:19 PM, Xiangrui Meng men...@gmail.com wrote: Krishna, could you send me some code snippets for the issues you saw in naive Bayes and k-means? -Xiangrui
Re: [VOTE] Release Apache Spark 1.2.0 (RC1)
+1 (non-binding)

Verified on OSX 10.10.2, built from source:
spark-shell / spark-submit jobs
ran various simple Spark / Scala queries
ran various SparkSQL queries (including HiveContext)
ran ThriftServer service and connected via beeline
ran SparkSVD

On Mon Dec 01 2014 at 11:09:26 PM Patrick Wendell pwend...@gmail.com wrote:

Hey All,

Just an update. Josh, Andrew, and others are working to reproduce SPARK-4498 and fix it. Other than that issue, no serious regressions have been reported so far. If we are able to get a fix in for that soon, we'll likely cut another RC with the patch.

Continued testing of RC1 is definitely appreciated! I'll leave this vote open to allow folks to continue posting comments. It's fine to still give +1 from your own testing, i.e. you can assume at this point SPARK-4498 will be fixed before releasing.

- Patrick

On Mon, Dec 1, 2014 at 3:30 PM, Matei Zaharia matei.zaha...@gmail.com wrote:

+0.9 from me. Tested it on Mac and Windows (someone has to do it) and while things work, I noticed a few recent scripts don't have Windows equivalents, namely https://issues.apache.org/jira/browse/SPARK-4683 and https://issues.apache.org/jira/browse/SPARK-4684. The first one at least would be good to fix if we do another RC. Not blocking the release but useful to fix in docs is https://issues.apache.org/jira/browse/SPARK-4685.

Matei

On Dec 1, 2014, at 11:18 AM, Josh Rosen rosenvi...@gmail.com wrote:

Hi everyone,

There's an open bug report related to Spark standalone which could be a potential release-blocker (pending investigation / a bug fix): https://issues.apache.org/jira/browse/SPARK-4498. This issue seems non-deterministic and only affects long-running Spark standalone deployments, so it may be hard to reproduce. I'm going to work on a patch to add additional logging in order to help with debugging.

I just wanted to give an early heads-up about this issue and to get more eyes on it in case anyone else has run into it or wants to help with debugging.

- Josh
Re: [VOTE] Release Apache Spark 1.2.0 (RC1)
+1 (non-binding) Installed version pre-built for Hadoop on a private HPC ran PySpark shell w/ iPython loaded data using custom Hadoop input formats ran MLlib routines in PySpark ran custom workflows in PySpark browsed the web UI Noticeable improvements in stability and performance during large shuffles (as well as the elimination of frequent but unpredictable “FileNotFound / too many open files” errors). We initially hit errors during large collects that ran fine in 1.1, but setting the new spark.driver.maxResultSize to 0 preserved the old behavior. Definitely worth highlighting this setting in the release notes, as the new default may be too small for some users and workloads. - Jeremy - jeremyfreeman.net @thefreemanlab
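Jeremy's note about spark.driver.maxResultSize can be illustrated with a simplified model of such a limit: the driver adds up the sizes of the results it is about to collect and refuses once the total passes the configured cap, with 0 meaning no cap (which is why setting it to 0 restored the 1.1 behavior for him). The function name `exceeds_result_limit` and the byte values in the usage note are made up for illustration; this is not Spark's implementation.

```python
from typing import Iterable


def exceeds_result_limit(result_sizes_bytes: Iterable[int],
                         max_result_size_bytes: int) -> bool:
    """Return True if the collected results would exceed the configured cap.

    A cap of 0 is treated as "unlimited", mirroring the behavior described
    in the message above.
    """
    if max_result_size_bytes < 0:
        raise ValueError("limit must be zero (unlimited) or positive")
    if max_result_size_bytes == 0:
        return False
    return sum(result_sizes_bytes) > max_result_size_bytes
```

For example, two partition results of 600 bytes each exceed a 1000-byte cap, but pass when the cap is 0; a large collect that worked before the limit existed behaves the same way at real-world sizes.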
Re: [VOTE] Release Apache Spark 1.2.0 (RC1)
+1. I also tested on Windows just in case, with jars referring other jars and python files referring other python files. Path resolution still works.
Re: [VOTE] Release Apache Spark 1.2.0 (RC1)
+1 tested on yarn. Tom
Re: [VOTE] Release Apache Spark 1.2.0 (RC1)
+1 (non-binding). Built from source; fired up a spark-shell against a YARN cluster; ran some jobs using parallelize; ran some jobs that read files; clicked around the web UI. On Sun, Nov 30, 2014 at 1:10 AM, GuoQiang Li wi...@qq.com wrote: +1 (non-binding)
Re: [VOTE] Release Apache Spark 1.2.0 (RC1)
Hi everyone, There’s an open bug report related to Spark standalone which could be a potential release-blocker (pending investigation / a bug fix): https://issues.apache.org/jira/browse/SPARK-4498. This issue seems non-deterministic and only affects long-running Spark standalone deployments, so it may be hard to reproduce. I’m going to work on a patch to add additional logging in order to help with debugging. I just wanted to give an early heads-up about this issue and to get more eyes on it in case anyone else has run into it or wants to help with debugging. - Josh
Re: [VOTE] Release Apache Spark 1.2.0 (RC1)
+0.9 from me. Tested it on Mac and Windows (someone has to do it) and while things work, I noticed a few recent scripts don't have Windows equivalents, namely https://issues.apache.org/jira/browse/SPARK-4683 and https://issues.apache.org/jira/browse/SPARK-4684. The first one at least would be good to fix if we do another RC. Not blocking the release but useful to fix in docs is https://issues.apache.org/jira/browse/SPARK-4685. Matei On Dec 1, 2014, at 11:18 AM, Josh Rosen rosenvi...@gmail.com wrote: Hi everyone, There’s an open bug report related to Spark standalone which could be a potential release-blocker (pending investigation / a bug fix): https://issues.apache.org/jira/browse/SPARK-4498. This issue seems non-deterministic and only affects long-running Spark standalone deployments, so it may be hard to reproduce. I’m going to work on a patch to add additional logging in order to help with debugging. I just wanted to give an early heads-up about this issue and to get more eyes on it in case anyone else has run into it or wants to help with debugging. - Josh
Re: [VOTE] Release Apache Spark 1.2.0 (RC1)
Hey All, Just an update. Josh, Andrew, and others are working to reproduce SPARK-4498 and fix it. Other than that issue, no serious regressions have been reported so far. If we are able to get a fix in for that soon, we'll likely cut another RC with the patch. Continued testing of RC1 is definitely appreciated! I'll leave this vote open to allow folks to continue posting comments. It's fine to still give +1 from your own testing... i.e. you can assume at this point SPARK-4498 will be fixed before releasing. - Patrick On Mon, Dec 1, 2014 at 3:30 PM, Matei Zaharia matei.zaha...@gmail.com wrote: +0.9 from me. Tested it on Mac and Windows (someone has to do it) and while things work, I noticed a few recent scripts don't have Windows equivalents, namely https://issues.apache.org/jira/browse/SPARK-4683 and https://issues.apache.org/jira/browse/SPARK-4684. The first one at least would be good to fix if we do another RC. Not blocking the release but useful to fix in docs is https://issues.apache.org/jira/browse/SPARK-4685. Matei On Dec 1, 2014, at 11:18 AM, Josh Rosen rosenvi...@gmail.com wrote: Hi everyone, There's an open bug report related to Spark standalone which could be a potential release-blocker (pending investigation / a bug fix): https://issues.apache.org/jira/browse/SPARK-4498. This issue seems non-deterministic and only affects long-running Spark standalone deployments, so it may be hard to reproduce. I'm going to work on a patch to add additional logging in order to help with debugging. I just wanted to give an early heads-up about this issue and to get more eyes on it in case anyone else has run into it or wants to help with debugging. - Josh
Re: [VOTE] Release Apache Spark 1.2.0 (RC1)
+1 (non-binding)
Re: [VOTE] Release Apache Spark 1.2.0 (RC1)
+1 1. Compiled binaries 2. All tests pass 3. Ran Python and Scala examples for Spark and MLlib on local and master + 4 workers
Re: [VOTE] Release Apache Spark 1.2.0 (RC1)
Thanks for pointing this out, Matei. I don't think a minor typo like this is a big deal. Hopefully it's clear to everyone this is the 1.2.0 release vote, as indicated by the subject and all of the artifacts. On Sat, Nov 29, 2014 at 1:26 AM, Matei Zaharia matei.zaha...@gmail.com wrote: Hey Patrick, unfortunately you got some of the text here wrong, saying 1.1.0 instead of 1.2.0. Not sure it will matter since there can well be another RC after testing, but we should be careful. Matei
Re: [VOTE] Release Apache Spark 1.2.0 (RC1)
+1
1. Compiled binaries
2. All tests pass
Regards, Vaquar khan

On 30 Nov 2014 04:21, Krishna Sankar ksanka...@gmail.com wrote:
+1
1. Compiled OSX 10.10 (Yosemite) mvn -Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.0 -DskipTests clean package 16:46 min (slightly slower connection)
2. Tested pyspark, MLlib - running as well as comparing results with 1.1.x
2.1. statistics OK
2.2. Linear/Ridge/Lasso Regression OK. Slight difference in the print method (vs. 1.1.x) of the model object - with a label and more details. This is good.
2.3. Decision Tree, Naive Bayes OK. Changes in print(model) - now print(model.toDebugString()) - OK. Some changes in NaiveBayes. Different from my 1.1.x code - had to flatten list structures, zip required same number in partitions. After code changes ran fine.
2.4. KMeans OK. zip occasionally fails with error localhost): org.apache.spark.SparkException: Can only zip RDDs with same number of elements in each partition. Has https://issues.apache.org/jira/browse/SPARK-2251 reappeared? Made it work by doing a different transformation, i.e. reusing an original RDD.
2.5. rdd operations OK. State of the Union Texts - MapReduce, Filter, sortByKey (word count)
2.6. recommendation OK
2.7. Good work! In 1.x.x, had a map distinct over the movielens medium dataset which never worked. Works fine in 1.2.0!
3. Scala MLlib - subset of examples as in #2 above, with Scala
3.1. statistics OK
3.2. Linear Regression OK
3.3. Decision Tree OK
3.4. KMeans OK
Cheers k/
P.S: Plan to add RF and .ml mechanics to this bank
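For readers unfamiliar with the zip error quoted in Krishna's report, here is a minimal plain-Python sketch of the constraint behind it. It is not PySpark code: partitions are modelled as nested lists, and the helper name zip_partitions is hypothetical. It only illustrates the documented requirement that zipped RDDs have the same number of partitions and the same number of elements in each partition.

```python
from typing import List, Sequence, Tuple, TypeVar

A = TypeVar("A")
B = TypeVar("B")


def zip_partitions(
    left: Sequence[Sequence[A]], right: Sequence[Sequence[B]]
) -> List[List[Tuple[A, B]]]:
    """Pair elements partition by partition (hypothetical stand-in for RDD.zip)."""
    if len(left) != len(right):
        raise ValueError("Can only zip collections with the same number of partitions")
    zipped = []
    for index, (lpart, rpart) in enumerate(zip(left, right)):
        # Each pair of partitions must line up element for element.
        if len(lpart) != len(rpart):
            raise ValueError(
                f"partition {index}: {len(lpart)} vs {len(rpart)} elements"
            )
        zipped.append(list(zip(lpart, rpart)))
    return zipped


# Illustrative data only: two partitions of features and their cluster ids.
features = [[10, 20], [30, 40, 50]]
clusters = [[1, 1], [0, 1, 0]]
pairs = zip_partitions(features, clusters)
```

Both inputs above hold five elements in total, but the call would fail if the same five elements were split differently across partitions on one side. That is consistent with the reported workaround of deriving both sides from the same original RDD, although the sketch says nothing about the actual cause tracked in the JIRA.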
[VOTE] Release Apache Spark 1.2.0 (RC1)
Please vote on releasing the following candidate as Apache Spark version 1.2.0!

The tag to be voted on is v1.2.0-rc1 (commit 1056e9ec1):
https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=1056e9ec13203d0c51564265e94d77a054498fdb

The release files, including signatures, digests, etc. can be found at:
http://people.apache.org/~pwendell/spark-1.2.0-rc1/

Release artifacts are signed with the following key:
https://people.apache.org/keys/committer/pwendell.asc

The staging repository for this release can be found at:
https://repository.apache.org/content/repositories/orgapachespark-1048/

The documentation corresponding to this release can be found at:
http://people.apache.org/~pwendell/spark-1.2.0-rc1-docs/

Please vote on releasing this package as Apache Spark 1.2.0!

The vote is open until Tuesday, December 02, at 05:15 UTC and passes if a majority of at least 3 +1 PMC votes are cast.

[ ] +1 Release this package as Apache Spark 1.1.0
[ ] -1 Do not release this package because ...

To learn more about Apache Spark, please see
http://spark.apache.org/

== What justifies a -1 vote for this release? ==
This vote is happening very late into the QA period compared with previous votes, so -1 votes should only occur for significant regressions from 1.0.2. Bugs already present in 1.1.X, minor regressions, or bugs related to new features will not block this release.

== What default changes should I be aware of? ==
1. The default value of spark.shuffle.blockTransferService has been changed to netty
-- Old behavior can be restored by switching to nio

2. The default value of spark.shuffle.manager has been changed to sort.
-- Old behavior can be restored by setting spark.shuffle.manager to hash.

== Other notes ==
Because this vote is occurring over a weekend, I will likely extend the vote if this RC survives until the end of the vote period.

- Patrick
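As a rough illustration of the two default changes listed in the announcement, the following sketch renders the settings that would restore the old behavior, both as spark-defaults.conf lines and as spark-submit --conf arguments. The property names and values come from the email; the helper names (as_spark_defaults, as_submit_args) are made up for this example, and whether reverting is advisable for a given workload is not something the thread addresses.

```python
# Settings named in the vote email; these values restore the pre-1.2.0 defaults.
LEGACY_SHUFFLE_SETTINGS = {
    "spark.shuffle.blockTransferService": "nio",  # 1.2.0 default: netty
    "spark.shuffle.manager": "hash",              # 1.2.0 default: sort
}


def as_spark_defaults(settings):
    """Render settings as whitespace-separated spark-defaults.conf lines."""
    return "\n".join(f"{key} {value}" for key, value in sorted(settings.items()))


def as_submit_args(settings):
    """Render settings as --conf key=value arguments for spark-submit."""
    args = []
    for key, value in sorted(settings.items()):
        args.extend(["--conf", f"{key}={value}"])
    return args


print(as_spark_defaults(LEGACY_SHUFFLE_SETTINGS))
print(" ".join(as_submit_args(LEGACY_SHUFFLE_SETTINGS)))
```

Either form can be used when comparing 1.2.0 against 1.1.x behavior during RC testing, since it isolates the effect of the new shuffle defaults from other changes.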
Re: [VOTE] Release Apache Spark 1.2.0 (RC1)
Krishna, docs don't block the RC voting because they can be updated in parallel with release candidates, up to the point a release is made. On Fri, Nov 28, 2014 at 9:55 PM, Krishna Sankar ksanka...@gmail.com wrote: Looks like the documentation hasn't caught up with the new features. On the machine learning side, for example org.apache.spark.ml, RandomForest, gbtree and so forth. Is a refresh of the documentation planned? Am happy to see these capabilities, but these would need good explanations as well, especially the new thinking around the ml ... pipelines, transformations et al. IMHO, the documentation is a -1. Will check out the compilation, MLlib et al. Cheers k/
Re: [VOTE] Release Apache Spark 1.2.0 (RC1)
Hey Patrick, unfortunately you got some of the text here wrong, saying 1.1.0 instead of 1.2.0. Not sure it will matter since there can well be another RC after testing, but we should be careful. Matei