Re: [RESULT] [VOTE] Release Apache Spark 1.2.0 (RC2)
Update: An Apache infrastructure issue prevented me from pushing this last night. The issue was resolved today and I should be able to push the final release artifacts tonight.

On Tue, Dec 16, 2014 at 9:20 PM, Patrick Wendell pwend...@gmail.com wrote:

This vote has PASSED with 12 +1 votes (8 binding) and no 0 or -1 votes:

+1:
Matei Zaharia*
Madhu Siddalingaiah
Reynold Xin*
Sandy Ryza
Josh Rosen*
Mark Hamstra*
Denny Lee
Tom Graves*
GuoQiang Li
Nick Pentreath*
Sean McNamara*
Patrick Wendell*

0:

-1:

I'll finalize and package this release in the next 48 hours. Thanks to everyone who contributed.

To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org
Fwd: [VOTE] Release Apache Spark 1.2.0 (RC2)
Forgot Reply To All ;o(

-- Forwarded message --
From: Krishna Sankar ksanka...@gmail.com
Date: Wed, Dec 10, 2014 at 9:16 PM
Subject: Re: [VOTE] Release Apache Spark 1.2.0 (RC2)
To: Matei Zaharia matei.zaha...@gmail.com

+1. Works the same as RC1.

1. Compiled on OS X 10.10 (Yosemite): mvn -Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.0 -DskipTests clean package -- 13:07 min
2. Tested pyspark and MLlib -- ran them as well as compared results with 1.1.x.
2.1. Statistics OK.
2.2. Linear/Ridge/Lasso Regression OK. Slight difference in the print method (vs. 1.1.x) of the model object -- with a label and more details. This is good.
2.3. Decision Tree, Naive Bayes OK. Changes in print(model) -- now print(model.toDebugString()) -- OK. Some changes in NaiveBayes, different from my 1.1.x code: had to flatten list structures, and zip required the same number of elements in each partition. After code changes it ran fine.
2.4. KMeans OK. Center and scale OK. zip occasionally fails with the error (localhost): org.apache.spark.SparkException: Can only zip RDDs with same number of elements in each partition. Has https://issues.apache.org/jira/browse/SPARK-2251 reappeared? Made it work by doing a different transformation, i.e. reusing an original RDD. (Xiangrui, I will send you the iPython Notebook and the dataset by a separate e-mail.)
2.5. RDD operations OK. State of the Union texts -- MapReduce, filter, sortByKey (word count).
2.6. Recommendation OK.
2.7. Good work! In 1.x.x, I had a map/distinct over the MovieLens medium dataset which never worked. Works fine in 1.2.0!
3. Scala MLlib -- a subset of the examples in #2 above, in Scala.
3.1. Statistics OK.
3.2. Linear Regression OK.
3.3. Decision Tree OK.
3.4. KMeans OK.

Cheers
k/

On Wed, Dec 10, 2014 at 3:05 PM, Matei Zaharia matei.zaha...@gmail.com wrote:

+1 Tested on Mac OS X.

Matei

On Dec 10, 2014, at 1:08 PM, Patrick Wendell pwend...@gmail.com wrote:

Please vote on releasing the following candidate as Apache Spark version 1.2.0!
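The zip failure Krishna hit follows from RDD.zip's contract: elements are paired partition-by-partition, so both RDDs must have the same number of partitions and the same element count in each partition, which is why reusing the original RDD as the common parent works. A minimal plain-Python model of those semantics (no Spark required; `zip_partitioned` is a hypothetical helper, not a Spark API):

```python
def zip_partitioned(a, b):
    """Mimic RDD.zip over lists-of-partitions (each partition is a list)."""
    if len(a) != len(b):
        raise ValueError("Can only zip RDDs with the same number of partitions")
    out = []
    for pa, pb in zip(a, b):
        if len(pa) != len(pb):
            raise ValueError("Can only zip RDDs with same number of "
                             "elements in each partition")
        out.append(list(zip(pa, pb)))
    return out

# Safe: both sides derive from the same parent, so partitioning lines up
# (the "reuse an original RDD" workaround described above).
parent = [[1.0, 2.0], [3.0, 4.0]]                 # two partitions
scaled = [[x * 10 for x in p] for p in parent]    # per-partition transform
assert zip_partitioned(parent, scaled)[0] == [(1.0, 10.0), (2.0, 20.0)]

# Unsafe: independently built data may be laid out differently.
other = [[1.0, 2.0, 3.0], [4.0]]                  # same values, new layout
try:
    zip_partitioned(parent, other)
except ValueError as e:
    print(e)
```

In real pyspark the safe pattern corresponds to zipping an RDD with a map over that same RDD, rather than with an RDD rebuilt from scratch.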
The tag to be voted on is v1.2.0-rc2 (commit a428c446e2):
https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=a428c446e23e628b746e0626cc02b7b3cadf588e

The release files, including signatures, digests, etc. can be found at:
http://people.apache.org/~pwendell/spark-1.2.0-rc2/

Release artifacts are signed with the following key:
https://people.apache.org/keys/committer/pwendell.asc

The staging repository for this release can be found at:
https://repository.apache.org/content/repositories/orgapachespark-1055/

The documentation corresponding to this release can be found at:
http://people.apache.org/~pwendell/spark-1.2.0-rc2-docs/

Please vote on releasing this package as Apache Spark 1.2.0! The vote is open until Saturday, December 13, at 21:00 UTC and passes if a majority of at least 3 +1 PMC votes are cast.

[ ] +1 Release this package as Apache Spark 1.2.0
[ ] -1 Do not release this package because ...

To learn more about Apache Spark, please see http://spark.apache.org/

== What justifies a -1 vote for this release? ==
This vote is happening relatively late into the QA period, so -1 votes should only occur for significant regressions from 1.0.2. Bugs already present in 1.1.X, minor regressions, or bugs related to new features will not block this release.

== What default changes should I be aware of? ==
1. The default value of spark.shuffle.blockTransferService has been changed to netty -- old behavior can be restored by switching to nio.
2. The default value of spark.shuffle.manager has been changed to sort -- old behavior can be restored by setting spark.shuffle.manager to hash.
== How does this differ from RC1 ==
This has fixes for a handful of issues identified. Some of the notable fixes are:

[Core]
SPARK-4498: Standalone Master can fail to recognize completed/failed applications

[SQL]
SPARK-4552: Query for empty parquet table in spark sql hive get IllegalArgumentException
SPARK-4753: Parquet2 does not prune based on OR filters on partition columns
SPARK-4761: With JDBC server, set Kryo as default serializer and disable reference tracking
SPARK-4785: When called with arguments referring column fields, PMOD throws NPE

- Patrick
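For users who depend on the 1.1.x shuffle behavior, both default changes listed in the announcement can be reverted together. A sketch of the corresponding conf/spark-defaults.conf entries (property names and values taken from the announcement):

```
spark.shuffle.manager                 hash
spark.shuffle.blockTransferService    nio
```

The same pair can equally be set per-job with spark-submit --conf flags instead of in the defaults file.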
[RESULT] [VOTE] Release Apache Spark 1.2.0 (RC2)
This vote has PASSED with 12 +1 votes (8 binding) and no 0 or -1 votes:

+1:
Matei Zaharia*
Madhu Siddalingaiah
Reynold Xin*
Sandy Ryza
Josh Rosen*
Mark Hamstra*
Denny Lee
Tom Graves*
GuoQiang Li
Nick Pentreath*
Sean McNamara*
Patrick Wendell*

0:

-1:

I'll finalize and package this release in the next 48 hours. Thanks to everyone who contributed.
Re: [VOTE] Release Apache Spark 1.2.0 (RC2)
I'm closing this vote now, will send results in a new thread.

On Sat, Dec 13, 2014 at 12:47 PM, Sean McNamara sean.mcnam...@webtrends.com wrote:

+1 Tested on OS X and deployed+tested our apps via YARN into our staging cluster.

Sean

On Dec 11, 2014, at 10:40 AM, Reynold Xin r...@databricks.com wrote:

+1 Tested on OS X.

On Wednesday, December 10, 2014, Patrick Wendell pwend...@gmail.com wrote:

Please vote on releasing the following candidate as Apache Spark version 1.2.0! [...]
Re: [VOTE] Release Apache Spark 1.2.0 (RC2)
+1. Built and tested on YARN on a Hadoop 2.x cluster.

Tom

On Saturday, December 13, 2014 12:48 AM, Denny Lee denny.g@gmail.com wrote:

+1 Tested on OSX: Scala 2.10.3, SparkSQL with Hive 0.12 / Hadoop 2.5, Thrift Server, MLlib SVD.

On Fri Dec 12 2014 at 8:57:16 PM Mark Hamstra m...@clearstorydata.com wrote:

+1

On Fri, Dec 12, 2014 at 8:00 PM, Josh Rosen rosenvi...@gmail.com wrote:

+1. Tested using spark-perf and the Spark EC2 scripts. I didn't notice any performance regressions that could not be attributed to changes of default configurations. [...]

On December 11, 2014 at 9:52:39 AM, Sandy Ryza (sandy.r...@cloudera.com) wrote:

+1 (non-binding). Tested on Ubuntu against YARN.

On Thu, Dec 11, 2014 at 9:38 AM, Reynold Xin r...@databricks.com wrote:

+1 Tested on OS X.

On Wednesday, December 10, 2014, Patrick Wendell pwend...@gmail.com wrote:

Please vote on releasing the following candidate as Apache Spark version 1.2.0! [...]
Re: [VOTE] Release Apache Spark 1.2.0 (RC2)
+1 (non-binding). Tested on CentOS 6.4.

-- Original --
From: Patrick Wendell pwend...@gmail.com
Date: Thu, Dec 11, 2014 05:08 AM
To: dev@spark.apache.org
Subject: [VOTE] Release Apache Spark 1.2.0 (RC2)

Please vote on releasing the following candidate as Apache Spark version 1.2.0! [...]
Re: [VOTE] Release Apache Spark 1.2.0 (RC2)
+1

-- Sent from Mailbox

On Sat, Dec 13, 2014 at 3:12 PM, GuoQiang Li wi...@qq.com wrote:

+1 (non-binding). Tested on CentOS 6.4.

-- Original --
From: Patrick Wendell pwend...@gmail.com
Date: Thu, Dec 11, 2014 05:08 AM
To: dev@spark.apache.org
Subject: [VOTE] Release Apache Spark 1.2.0 (RC2)

Please vote on releasing the following candidate as Apache Spark version 1.2.0! [...]
Re: [VOTE] Release Apache Spark 1.2.0 (RC2)
I am building and testing using sbt. Trying to run the tests, I get a lot of errors like "Job aborted due to stage failure: Master removed our application: FAILED" did not contain "cancelled", and "Job aborted due to stage failure: Master removed our application: FAILED" did not contain "killed" (JobCancellationSuite.scala:236). I have never experienced this before, so it is concerning. I was able to run all the Python examples for Spark and MLlib successfully.
Re: [VOTE] Release Apache Spark 1.2.0 (RC2)
+1 Tested on OS X and deployed+tested our apps via YARN into our staging cluster.

Sean

On Dec 11, 2014, at 10:40 AM, Reynold Xin r...@databricks.com wrote:

+1 Tested on OS X.

On Wednesday, December 10, 2014, Patrick Wendell pwend...@gmail.com wrote:

Please vote on releasing the following candidate as Apache Spark version 1.2.0! [...]
Re: [VOTE] Release Apache Spark 1.2.0 (RC2)
+1. Tested using spark-perf and the Spark EC2 scripts. I didn't notice any performance regressions that could not be attributed to changes of default configurations.

To be more specific: when running Spark 1.2.0 with the Spark 1.1.0 settings of spark.shuffle.manager=hash and spark.shuffle.blockTransferService=nio, there was no performance regression and, in fact, there were significant performance improvements for some workloads.

In Spark 1.2.0, the new default settings are spark.shuffle.manager=sort and spark.shuffle.blockTransferService=netty. With these new settings, I noticed a performance regression in the scala-sort-by-key-int spark-perf test. However, Spark 1.1.0 and 1.1.1 exhibit a similar performance regression for that same test when run with spark.shuffle.manager=sort, so this regression seems explainable by the change of defaults.

Besides this, most of the other tests ran at the same speeds or faster with the new 1.2.0 defaults. Also, keep in mind that this is a somewhat artificial micro-benchmark; I have heard anecdotal reports from many users that their real workloads have run faster with 1.2.0. Based on these results, I'm comfortable giving a +1 on 1.2.0 RC2.

- Josh

On December 11, 2014 at 9:52:39 AM, Sandy Ryza (sandy.r...@cloudera.com) wrote:

+1 (non-binding). Tested on Ubuntu against YARN.

On Thu, Dec 11, 2014 at 9:38 AM, Reynold Xin r...@databricks.com wrote:

+1 Tested on OS X.

On Wednesday, December 10, 2014, Patrick Wendell pwend...@gmail.com wrote:

Please vote on releasing the following candidate as Apache Spark version 1.2.0! [...]
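Josh's comparison amounts to running the same workload twice, once with the 1.1.0-era shuffle settings and once with the new 1.2.0 defaults. Sketched as submit-time flags (the application class and jar names are hypothetical placeholders):

```shell
# 1.1.0-style shuffle settings on a 1.2.0 build:
spark-submit \
  --conf spark.shuffle.manager=hash \
  --conf spark.shuffle.blockTransferService=nio \
  --class com.example.Bench bench.jar

# New 1.2.0 defaults (sort + netty): simply omit the flags.
spark-submit --class com.example.Bench bench.jar
```

Any timing difference between the two runs then isolates the effect of the changed defaults from other 1.2.0 changes.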
Re: [VOTE] Release Apache Spark 1.2.0 (RC2)
+1

On Fri, Dec 12, 2014 at 8:00 PM, Josh Rosen rosenvi...@gmail.com wrote:

+1. Tested using spark-perf and the Spark EC2 scripts. I didn't notice any performance regressions that could not be attributed to changes of default configurations. [...]
Re: [VOTE] Release Apache Spark 1.2.0 (RC2)
+1 Tested on OSX Tested Scala 2.10.3, SparkSQL with Hive 0.12 / Hadoop 2.5, Thrift Server, MLLib SVD On Fri Dec 12 2014 at 8:57:16 PM Mark Hamstra m...@clearstorydata.com wrote: +1 On Fri, Dec 12, 2014 at 8:00 PM, Josh Rosen rosenvi...@gmail.com wrote: +1. Tested using spark-perf and the Spark EC2 scripts. I didn’t notice any performance regressions that could not be attributed to changes of default configurations. To be more specific, when running Spark 1.2.0 with the Spark 1.1.0 settings of spark.shuffle.manager=hash and spark.shuffle.blockTransferService=nio, there was no performance regression and, in fact, there were significant performance improvements for some workloads. In Spark 1.2.0, the new default settings are spark.shuffle.manager=sort and spark.shuffle.blockTransferService=netty. With these new settings, I noticed a performance regression in the scala-sort-by-key-int spark-perf test. However, Spark 1.1.0 and 1.1.1 exhibit a similar performance regression for that same test when run with spark.shuffle.manager=sort, so this regression seems explainable by the change of defaults. Besides this, most of the other tests ran at the same speeds or faster with the new 1.2.0 defaults. Also, keep in mind that this is a somewhat artificial micro benchmark; I have heard anecdotal reports from many users that their real workloads have run faster with 1.2.0. Based on these results, I’m comfortable giving a +1 on 1.2.0 RC2. - Josh On December 11, 2014 at 9:52:39 AM, Sandy Ryza (sandy.r...@cloudera.com) wrote: +1 (non-binding). Tested on Ubuntu against YARN. On Thu, Dec 11, 2014 at 9:38 AM, Reynold Xin r...@databricks.com wrote: +1 Tested on OS X. On Wednesday, December 10, 2014, Patrick Wendell pwend...@gmail.com wrote: Please vote on releasing the following candidate as Apache Spark version 1.2.0! 
The tag to be voted on is v1.2.0-rc2 (commit a428c446e2): https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h= a428c446e23e628b746e0626cc02b7b3cadf588e The release files, including signatures, digests, etc. can be found at: http://people.apache.org/~pwendell/spark-1.2.0-rc2/ Release artifacts are signed with the following key: https://people.apache.org/keys/committer/pwendell.asc The staging repository for this release can be found at: https://repository.apache.org/content/repositories/orgapachespark-1055/ The documentation corresponding to this release can be found at: http://people.apache.org/~pwendell/spark-1.2.0-rc2-docs/ Please vote on releasing this package as Apache Spark 1.2.0! The vote is open until Saturday, December 13, at 21:00 UTC and passes if a majority of at least 3 +1 PMC votes are cast. [ ] +1 Release this package as Apache Spark 1.2.0 [ ] -1 Do not release this package because ... To learn more about Apache Spark, please see http://spark.apache.org/ == What justifies a -1 vote for this release? == This vote is happening relatively late into the QA period, so -1 votes should only occur for significant regressions from 1.0.2. Bugs already present in 1.1.X, minor regressions, or bugs related to new features will not block this release. == What default changes should I be aware of? == 1. The default value of spark.shuffle.blockTransferService has been changed to netty -- Old behavior can be restored by switching to nio 2. The default value of spark.shuffle.manager has been changed to sort. -- Old behavior can be restored by setting spark.shuffle.manager to hash. 
== How does this differ from RC1 ==
This has fixes for a handful of issues identified - some of the notable fixes are:

[Core]
SPARK-4498: Standalone Master can fail to recognize completed/failed applications

[SQL]
SPARK-4552: Query for empty parquet table in spark sql hive get IllegalArgumentException
SPARK-4753: Parquet2 does not prune based on OR filters on partition columns
SPARK-4761: With JDBC server, set Kryo as default serializer and disable reference tracking
SPARK-4785: When called with arguments referring column fields, PMOD throws NPE

- Patrick

To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org
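The two default changes called out above can be flipped back at submit time, which is essentially how the A/B comparison in Josh's report works. A rough sketch; the benchmark class and jar names below are placeholders, not part of the release:

```shell
# Spark 1.2.0 defaults (from the notes above):
#   spark.shuffle.manager=sort
#   spark.shuffle.blockTransferService=netty
# To reproduce the 1.1.x behavior for a side-by-side comparison,
# resubmit the same job with the old values:
spark-submit \
  --class org.example.ShuffleBench \
  --conf spark.shuffle.manager=hash \
  --conf spark.shuffle.blockTransferService=nio \
  shuffle-bench.jar
```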
Re: [VOTE] Release Apache Spark 1.2.0 (RC2)
+1 (non-binding) Built and tested on Windows 7:

cd apache-spark
git fetch
git checkout v1.2.0-rc2
sbt assembly
[warn] ...
[success] Total time: 720 s, completed Dec 11, 2014 8:57:36 AM

dir assembly\target\scala-2.10\spark-assembly-1.2.0-hadoop1.0.4.jar
110,361,054 spark-assembly-1.2.0-hadoop1.0.4.jar

Ran some of my 1.2 code successfully. Reviewed some docs; they look good. spark-shell.cmd works as expected.

Env details:
sbtconfig.txt: -Xmx1024M -XX:MaxPermSize=256m -XX:ReservedCodeCacheSize=128m
sbt --version: sbt launcher version 0.13.1

--
Madhu
https://www.linkedin.com/in/msiddalingaiah
Re: [VOTE] Release Apache Spark 1.2.0 (RC2)
Signatures and checksums are OK. License and notice still look fine. The plain-vanilla source release compiles with Maven 3.2.1 and passes tests on OS X 10.10 + Java 8.

On Wed, Dec 10, 2014 at 9:08 PM, Patrick Wendell pwend...@gmail.com wrote:

Please vote on releasing the following candidate as Apache Spark version 1.2.0! ...
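The signature and checksum verification reported above can be scripted. A sketch: the gpg lines and the Spark filenames are illustrative (they assume the artifact and its .asc/.sha512 files were downloaded from the staging URL), and the digest step is demonstrated on a stand-in file so it can run anywhere:

```shell
# Signature check (illustrative filenames; requires the downloaded artifacts):
#   gpg --import pwendell.asc
#   gpg --verify spark-1.2.0.tgz.asc spark-1.2.0.tgz
# Digest check: sha512sum -c compares a recorded digest against the file.
# Demonstrated here on a stand-in file:
printf 'release bytes\n' > artifact.tgz
sha512sum artifact.tgz > artifact.tgz.sha512
sha512sum -c artifact.tgz.sha512    # prints "artifact.tgz: OK" on success
```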
Re: [VOTE] Release Apache Spark 1.2.0 (RC2)
+1 Tested on OS X.

On Wednesday, December 10, 2014, Patrick Wendell pwend...@gmail.com wrote:

Please vote on releasing the following candidate as Apache Spark version 1.2.0! ...
Re: [VOTE] Release Apache Spark 1.2.0 (RC2)
+1 (non-binding). Tested on Ubuntu against YARN.

On Thu, Dec 11, 2014 at 9:38 AM, Reynold Xin r...@databricks.com wrote:

+1 Tested on OS X.

On Wednesday, December 10, 2014, Patrick Wendell pwend...@gmail.com wrote:

Please vote on releasing the following candidate as Apache Spark version 1.2.0! ...
[VOTE] Release Apache Spark 1.2.0 (RC2)
Please vote on releasing the following candidate as Apache Spark version 1.2.0!

The tag to be voted on is v1.2.0-rc2 (commit a428c446e2):
https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=a428c446e23e628b746e0626cc02b7b3cadf588e

The release files, including signatures, digests, etc. can be found at:
http://people.apache.org/~pwendell/spark-1.2.0-rc2/

Release artifacts are signed with the following key:
https://people.apache.org/keys/committer/pwendell.asc

The staging repository for this release can be found at:
https://repository.apache.org/content/repositories/orgapachespark-1055/

The documentation corresponding to this release can be found at:
http://people.apache.org/~pwendell/spark-1.2.0-rc2-docs/

Please vote on releasing this package as Apache Spark 1.2.0! The vote is open until Saturday, December 13, at 21:00 UTC and passes if a majority of at least 3 +1 PMC votes are cast.

[ ] +1 Release this package as Apache Spark 1.2.0
[ ] -1 Do not release this package because ...

To learn more about Apache Spark, please see http://spark.apache.org/

== What justifies a -1 vote for this release? ==
This vote is happening relatively late into the QA period, so -1 votes should only occur for significant regressions from 1.0.2. Bugs already present in 1.1.X, minor regressions, or bugs related to new features will not block this release.

== What default changes should I be aware of? ==
1. The default value of spark.shuffle.blockTransferService has been changed to netty -- Old behavior can be restored by switching to nio.
2. The default value of spark.shuffle.manager has been changed to sort -- Old behavior can be restored by setting spark.shuffle.manager to hash.
== How does this differ from RC1 ==
This has fixes for a handful of issues identified - some of the notable fixes are:

[Core]
SPARK-4498: Standalone Master can fail to recognize completed/failed applications

[SQL]
SPARK-4552: Query for empty parquet table in spark sql hive get IllegalArgumentException
SPARK-4753: Parquet2 does not prune based on OR filters on partition columns
SPARK-4761: With JDBC server, set Kryo as default serializer and disable reference tracking
SPARK-4785: When called with arguments referring column fields, PMOD throws NPE

- Patrick

To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org
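For reviewers verifying the candidate locally, confirming that the release tag resolves to the advertised commit can be sketched as follows. The throwaway repository in the second half only demonstrates the git mechanism; it stands in for a real clone of the Spark repository:

```shell
# Against a real clone of the Spark repo, the check is simply:
#   git rev-parse 'v1.2.0-rc2^{commit}'   # should print a428c446e23e628b746e0626cc02b7b3cadf588e
# The mechanism, demonstrated on a throwaway repository:
git init -q demo-repo
cd demo-repo
git -c user.email=voter@example.org -c user.name=voter \
    commit -q --allow-empty -m 'release candidate'
git tag v1.2.0-rc2                    # tag the current commit
git rev-parse 'v1.2.0-rc2^{commit}'   # prints the commit the tag points at
```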
Re: [VOTE] Release Apache Spark 1.2.0 (RC2)
+1 Tested on Mac OS X.

Matei

On Dec 10, 2014, at 1:08 PM, Patrick Wendell pwend...@gmail.com wrote:

Please vote on releasing the following candidate as Apache Spark version 1.2.0! ...