Re: [VOTE] Release Apache Spark 1.4.0 (RC3)
Hi everyone, I think there's a blocker on PySpark: the when function in Python seems to be broken, but the Scala API seems fine. Here's a snippet demonstrating that with Spark 1.4.0 RC3:

In [1]: df = sqlCtx.createDataFrame([(1, 1), (2, 2), (1, 2), (1, 2)], ["key", "value"])

In [2]: from pyspark.sql import functions as F

In [8]: df.select(df.key, F.when(df.key > 1, 0).when(df.key == 0, 2).otherwise(1)).show()
+---+---------------------------------+
|key|CASE WHEN (key = 0) THEN 2 ELSE 1|
+---+---------------------------------+
|  1|                                1|
|  2|                                1|
|  1|                                1|
|  1|                                1|
+---+---------------------------------+

Note that the first when clause has been silently dropped from the generated expression. In Scala I get the expected expression and behaviour:

scala> val df = sqlContext.createDataFrame(List((1, 1), (2, 2), (1, 2), (1, 2))).toDF("key", "value")
scala> import org.apache.spark.sql.functions._
scala> df.select(df("key"), when(df("key") > 1, 0).when(df("key") === 2, 2).otherwise(1)).show()
+---+-------------------------------------------------------+
|key|CASE WHEN (key > 1) THEN 0 WHEN (key = 2) THEN 2 ELSE 1|
+---+-------------------------------------------------------+
|  1|                                                      1|
|  2|                                                      0|
|  1|                                                      1|
|  1|                                                      1|
+---+-------------------------------------------------------+

I've opened a JIRA (https://issues.apache.org/jira/browse/SPARK-8038) and fixed it here: https://github.com/apache/spark/pull/6580

Regards, Olivier.

On Tue, Jun 2, 2015 at 07:34, Bobby Chowdary bobby.chowdar...@gmail.com wrote: Hi Patrick, Thanks for clarifying. No issues with functionality. +1 (non-binding) Thanks, Bobby

On Mon, Jun 1, 2015 at 9:41 PM, Patrick Wendell pwend...@gmail.com wrote: Hey Bobby, Those are generic warnings that the Hadoop libraries throw. If you are using MapR-FS they shouldn't matter, since you are using the MapR client and not the default Hadoop client. Do you have any issues with functionality, or was it just seeing the warnings that was the concern? Thanks for helping test! - Patrick

On Mon, Jun 1, 2015 at 5:18 PM, Bobby Chowdary bobby.chowdar...@gmail.com wrote: Hive Context works on RC3 for MapR after adding spark.sql.hive.metastore.sharedPrefixes as suggested in SPARK-7819.
However, there still seem to be some other issues with native libraries; I get the warning below:

WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

I tried even after adding SPARK_LIBRARY_PATH and --driver-library-path, with no luck. Built on Mac OS X and running on CentOS 7 with JDK 1.6 and JDK 1.8 (tried both):

make-distribution.sh --tgz --skip-java-test -Phive -Phive-0.13.1 -Pmapr4 -Pnetlib-lgpl -Phive-thriftserver

C

On Mon, Jun 1, 2015 at 3:05 PM, Sean Owen so...@cloudera.com wrote: I get a bunch of failures in VersionSuite with build/test params -Pyarn -Phive -Phadoop-2.6:

- success sanity check *** FAILED ***
  java.lang.RuntimeException: [download failed: org.jboss.netty#netty;3.2.2.Final!netty.jar(bundle), download failed: commons-net#commons-net;3.1!commons-net.jar]
  at org.apache.spark.deploy.SparkSubmitUtils$.resolveMavenCoordinates(SparkSubmit.scala:978)
...

but maybe I missed the memo about how to build for Hive? Do I still need another Hive profile? Other tests, signatures, etc. look good.

On Sat, May 30, 2015 at 12:40 AM, Patrick Wendell pwend...@gmail.com wrote: Please vote on releasing the following candidate as Apache Spark version 1.4.0! The tag to be voted on is v1.4.0-rc3 (commit dd109a8): https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=dd109a8746ec07c7c83995890fc2c0cd7a693730 The release files, including signatures, digests, etc.
can be found at: http://people.apache.org/~pwendell/spark-releases/spark-1.4.0-rc3-bin/ Release artifacts are signed with the following key: https://people.apache.org/keys/committer/pwendell.asc The staging repository for this release can be found at: [published as version: 1.4.0] https://repository.apache.org/content/repositories/orgapachespark-1109/ [published as version: 1.4.0-rc3] https://repository.apache.org/content/repositories/orgapachespark-1110/ The documentation corresponding to this release can be found at: http://people.apache.org/~pwendell/spark-releases/spark-1.4.0-rc3-docs/ Please vote on releasing this package as Apache Spark 1.4.0! The vote is open until Tuesday, June 02, at 00:32 UTC and passes if a majority of at least 3 +1 PMC votes are cast. [ ] +1 Release this package as Apache Spark 1.4.0 [ ] -1 Do not release this package because ... To learn more about Apache Spark, please see http://spark.apache.org/ == What has changed since RC1 == Below is a list of bug fixes that went into this RC: http://s.apache.org/vN == How can I help test this release? == If you are a Spark user, you can help us test this release by taking a Spark 1.3 workload
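The chained when/otherwise semantics at issue in Olivier's report can be sketched in plain Python (a hedged illustration of SQL CASE WHEN evaluation order, not Spark code; the helper name is made up for this sketch):

```python
# Plain-Python sketch of SQL CASE WHEN semantics: branches are tried in
# order, the first matching predicate wins, and the otherwise value is the
# fallback. This mirrors what the Scala output in the report shows.
def case_when(value, branches, otherwise):
    for predicate, result in branches:
        if predicate(value):
            return result
    return otherwise

keys = [1, 2, 1, 1]
expected = [case_when(k, [(lambda v: v > 1, 0), (lambda v: v == 2, 2)], 1)
            for k in keys]
print(expected)  # [1, 0, 1, 1]
```

This matches the Scala column in the report; the broken RC3 PySpark build instead dropped the first branch entirely, which is why every Python row came back as the CASE WHEN (key = 0) result.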
[RESULT] [VOTE] Release Apache Spark 1.4.0 (RC3)
This vote is cancelled in favor of RC4. Thanks everyone for the thorough testing of this RC. We are really close, but there were a few blockers found. I've cut a new RC incorporating fixes for those issues. The following patches were merged during the RC3 testing period:

(blockers)
4940630 [SPARK-8020] [SQL] Spark SQL conf in spark-defaults.conf make metadataHive get constructed too early
6b0f615 [SPARK-8038] [SQL] [PYSPARK] fix Column.when() and otherwise()
78a6723 [SPARK-7978] [SQL] [PYSPARK] DecimalType should not be singleton

(other fixes)
9d6475b [SPARK-6917] [SQL] DecimalType is not read back when non-native type exists
97d4cd0 [SPARK-8049] [MLLIB] drop tmp col from OneVsRest output
cbaf595 [SPARK-8014] [SQL] Avoid premature metadata discovery when writing a HadoopFsRelation with a save mode other than Append
fa292dc [SPARK-8015] [FLUME] Remove Guava dependency from flume-sink.
f71a09d [SPARK-8037] [SQL] Ignores files whose name starts with dot in HadoopFsRelation
292ee1a [SPARK-8021] [SQL] [PYSPARK] make Python read/write API consistent with Scala
87941ff [SPARK-8023] [SQL] Add deterministic attribute to Expression to avoid collapsing nondeterministic projects.
e6d5895 [SPARK-7965] [SPARK-7972] [SQL] Handle expressions containing multiple window expressions and make parser match window frames in case insensitive way
8ac2376 [SPARK-8026] [SQL] Add Column.alias to Scala/Java DataFrame API
efc0e05 [SPARK-7982] [SQL] DataFrame.stat.crosstab should use 0 instead of null for pairs that don't appear
cbfb682a [SPARK-8028] [SPARKR] Use addJar instead of setJars in SparkR
a7c8b00 [SPARK-7958] [STREAMING] Handled exception in StreamingContext.start() to prevent leaking of actors
a76c2e1 [SPARK-7899] [PYSPARK] Fix Python 3 pyspark/sql/types module conflict
f1d4e7e [SPARK-7227] [SPARKR] Support fillna / dropna in R DataFrame.
01f38f7 [SPARK-7979] Enforce structural type checker.
2c45009 [SPARK-7459] [MLLIB] ElementwiseProduct Java example
8938a74 [SPARK-7962] [MESOS] Fix master url parsing in rest submission client.
1513cff [SPARK-7957] Preserve partitioning when using randomSplit
9a88be1 [SPARK-6013] [ML] Add more Python ML examples for spark.ml
2bd4460 [SPARK-7954] [SPARKR] Create SparkContext in sparkRSQL init

On Fri, May 29, 2015 at 4:40 PM, Patrick Wendell pwend...@gmail.com wrote: Please vote on releasing the following candidate as Apache Spark version 1.4.0! The tag to be voted on is v1.4.0-rc3 (commit dd109a8): https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=dd109a8746ec07c7c83995890fc2c0cd7a693730 The release files, including signatures, digests, etc. can be found at: http://people.apache.org/~pwendell/spark-releases/spark-1.4.0-rc3-bin/ Release artifacts are signed with the following key: https://people.apache.org/keys/committer/pwendell.asc The staging repository for this release can be found at: [published as version: 1.4.0] https://repository.apache.org/content/repositories/orgapachespark-1109/ [published as version: 1.4.0-rc3] https://repository.apache.org/content/repositories/orgapachespark-1110/ The documentation corresponding to this release can be found at: http://people.apache.org/~pwendell/spark-releases/spark-1.4.0-rc3-docs/ Please vote on releasing this package as Apache Spark 1.4.0! The vote is open until Tuesday, June 02, at 00:32 UTC and passes if a majority of at least 3 +1 PMC votes are cast. [ ] +1 Release this package as Apache Spark 1.4.0 [ ] -1 Do not release this package because ... To learn more about Apache Spark, please see http://spark.apache.org/ == What has changed since RC1 == Below is a list of bug fixes that went into this RC: http://s.apache.org/vN == How can I help test this release? == If you are a Spark user, you can help us test this release by taking a Spark 1.3 workload and running on this release candidate, then reporting any regressions.
== What justifies a -1 vote for this release? == This vote is happening towards the end of the 1.4 QA period, so -1 votes should only occur for significant regressions from 1.3.1. Bugs already present in 1.3.X, minor regressions, or bugs related to new features will not block this release. - To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org For additional commands, e-mail: dev-h...@spark.apache.org
Re: [VOTE] Release Apache Spark 1.4.0 (RC3)
Still have a problem using HiveContext from sbt. Here's an example of the dependencies:

val sparkVersion = "1.4.0-rc3"

lazy val root = Project(id = "spark-hive", base = file("."),
  settings = Project.defaultSettings ++ Seq(
    name := "spark-1.4-hive",
    scalaVersion := "2.10.5",
    scalaBinaryVersion := "2.10",
    resolvers += "Spark RC" at "https://repository.apache.org/content/repositories/orgapachespark-1110/",
    libraryDependencies ++= Seq(
      "org.apache.spark" %% "spark-core" % sparkVersion,
      "org.apache.spark" %% "spark-mllib" % sparkVersion,
      "org.apache.spark" %% "spark-hive" % sparkVersion,
      "org.apache.spark" %% "spark-sql" % sparkVersion
    )
  ))

Launching sbt console with it and running:

val conf = new SparkConf().setMaster("local[4]").setAppName("test")
val sc = new SparkContext(conf)
val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)
val data = sc.parallelize(1 to 1)
import sqlContext.implicits._

scala> data.toDF
java.lang.IllegalArgumentException: Unable to locate hive jars to connect to metastore using classloader scala.tools.nsc.interpreter.IMain$TranslatingClassLoader.
Please set spark.sql.hive.metastore.jars.
  at org.apache.spark.sql.hive.HiveContext.metadataHive$lzycompute(HiveContext.scala:206)
  at org.apache.spark.sql.hive.HiveContext.metadataHive(HiveContext.scala:175)
  at org.apache.spark.sql.hive.HiveContext$anon$2.init(HiveContext.scala:367)
  at org.apache.spark.sql.hive.HiveContext.catalog$lzycompute(HiveContext.scala:367)
  at org.apache.spark.sql.hive.HiveContext.catalog(HiveContext.scala:366)
  at org.apache.spark.sql.hive.HiveContext$anon$1.init(HiveContext.scala:379)
  at org.apache.spark.sql.hive.HiveContext.analyzer$lzycompute(HiveContext.scala:379)
  at org.apache.spark.sql.hive.HiveContext.analyzer(HiveContext.scala:378)
  at org.apache.spark.sql.SQLContext$QueryExecution.assertAnalyzed(SQLContext.scala:901)
  at org.apache.spark.sql.DataFrame.init(DataFrame.scala:134)
  at org.apache.spark.sql.DataFrame$.apply(DataFrame.scala:51)
  at org.apache.spark.sql.SQLContext.createDataFrame(SQLContext.scala:474)
  at org.apache.spark.sql.SQLContext.createDataFrame(SQLContext.scala:456)
  at org.apache.spark.sql.SQLContext$implicits$.intRddToDataFrameHolder(SQLContext.scala:345)

Thanks,
Peter Rudenko

On 2015-06-01 05:04, Guoqiang Li wrote: +1 (non-binding)

------ Original ------
From: Sandy Ryza sandy.r...@cloudera.com
Date: Mon, Jun 1, 2015 07:34 AM
To: Krishna Sankar ksanka...@gmail.com
Cc: Patrick Wendell pwend...@gmail.com; dev@spark.apache.org
Subject: Re: [VOTE] Release Apache Spark 1.4.0 (RC3)

+1 (non-binding) Launched against a pseudo-distributed YARN cluster running Hadoop 2.6.0 and ran some jobs. -Sandy

On Sat, May 30, 2015 at 3:44 PM, Krishna Sankar ksanka...@gmail.com wrote: +1 (non-binding, of course)

1. Compiled OSX 10.10 (Yosemite) OK. Total time: 17:07 min
   mvn clean package -Pyarn -Dyarn.version=2.6.0 -Phadoop-2.4 -Dhadoop.version=2.6.0 -DskipTests
2. Tested pyspark, MLlib - running as well as comparing results with 1.3.1
2.1. statistics (min, max, mean, Pearson, Spearman) OK
2.2.
Linear/Ridge/Lasso Regression OK
2.3. Decision Tree, Naive Bayes OK
2.4. KMeans OK; Center and Scale OK
2.5. RDD operations OK; State of the Union Texts - MapReduce, Filter, sortByKey (word count)
2.6. Recommendation (MovieLens medium dataset, ~1 M ratings) OK; model evaluation/optimization (rank, numIter, lambda) with itertools OK
3. Scala - MLlib
3.1. statistics (min, max, mean, Pearson, Spearman) OK
3.2. LinearRegressionWithSGD OK
3.3. Decision Tree OK
3.4. KMeans OK
3.5. Recommendation (MovieLens medium dataset, ~1 M ratings) OK
3.6. saveAsParquetFile OK
3.7. Read and verify the 4.3 save (above) - sqlContext.parquetFile, registerTempTable, sql OK
3.8. result = sqlContext.sql("SELECT OrderDetails.OrderID, ShipCountry, UnitPrice, Qty, Discount FROM Orders INNER JOIN OrderDetails ON Orders.OrderID = OrderDetails.OrderID") OK
4.0. Spark SQL from Python OK
4.1. result = sqlContext.sql("SELECT * from people WHERE State = 'WA'") OK

Cheers
k/

On Fri, May 29, 2015 at 4:40 PM, Patrick Wendell pwend...@gmail.com wrote: Please vote on releasing the following candidate as Apache Spark version 1.4.0! The tag to be voted on is v1.4.0-rc3 (commit dd109a8): https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=dd109a8746ec07c7c83995890fc2c0cd7a693730 The release files, including signatures, digests, etc. can be found at: http://people.apache.org/~pwendell/spark-releases/spark-1.4.0-rc3-bin/ Release artifacts are signed with the following key: https://people.apache.org/keys/committer
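Peter's build above pulls the RC artifacts from the staging repository through sbt's %% operator, which appends the Scala binary version to the artifact name before resolution. As a hedged illustration of where sbt then looks for the jar (standard Maven repository layout; the helper function is made up for this sketch, not an sbt API):

```python
# Sketch of how an sbt "%%" dependency maps to a jar path in a Maven-layout
# repository: groupId dots become path segments, and the Scala binary
# version is appended to the artifact name.
def staged_jar_url(repo, group, name, scala_binary, version):
    artifact = f"{name}_{scala_binary}"
    return "/".join([repo.rstrip("/"), group.replace(".", "/"),
                     artifact, version, f"{artifact}-{version}.jar"])

url = staged_jar_url(
    "https://repository.apache.org/content/repositories/orgapachespark-1110/",
    "org.apache.spark", "spark-hive", "2.10", "1.4.0-rc3")
print(url)
# .../org/apache/spark/spark-hive_2.10/1.4.0-rc3/spark-hive_2.10-1.4.0-rc3.jar
```

If an older 1.4 snapshot is cached locally under the same coordinates, sbt may never fetch this staged jar, which is one way the stale-build symptom discussed in this thread can arise.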
Re: [VOTE] Release Apache Spark 1.4.0 (RC3)
Hi Peter, based on your error message, it seems you were not using RC3. For the error thrown at HiveContext's line 206, we changed the message to this one https://github.com/apache/spark/blob/v1.4.0-rc3/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveContext.scala#L205-207 just before RC3; basically, we no longer print out the class loader name. Can you check whether an older version of the 1.4 branch got used? Have you published an RC3 to your local maven repo? Can you clean your local repo cache and try again?

Thanks,
Yin

On Mon, Jun 1, 2015 at 10:45 AM, Peter Rudenko petro.rude...@gmail.com wrote: Still have a problem using HiveContext from sbt. [...]
Re: [VOTE] Release Apache Spark 1.4.0 (RC3)
+1 (binding)

Tested the standalone cluster mode REST submission gateway - submit / status / kill
Tested simple applications on YARN client / cluster modes, with and without --jars
Tested Python applications on YARN client / cluster modes, with and without --py-files*
Tested dynamic allocation on YARN client / cluster modes

All good.

*Filed SPARK-8017: not a blocker, because Python in YARN cluster mode is a new feature.

2015-06-01 11:10 GMT-07:00 Yin Huai yh...@databricks.com: Hi Peter, based on your error message, it seems you were not using RC3. [...]
Re: [VOTE] Release Apache Spark 1.4.0 (RC3)
On 2015-06-01 05:04, Guoqiang Li wrote: +1 (non-binding) [...]
Re: [VOTE] Release Apache Spark 1.4.0 (RC3)
[...]

Thanks,
Peter Rudenko

On 2015-06-01 05:04, Guoqiang Li wrote: +1 (non-binding) [...]
Re: [VOTE] Release Apache Spark 1.4.0 (RC3)
I get a bunch of failures in VersionsSuite with build/test params -Pyarn -Phive -Phadoop-2.6:

- success sanity check *** FAILED ***
  java.lang.RuntimeException: [download failed: org.jboss.netty#netty;3.2.2.Final!netty.jar(bundle), download failed: commons-net#commons-net;3.1!commons-net.jar]
  at org.apache.spark.deploy.SparkSubmitUtils$.resolveMavenCoordinates(SparkSubmit.scala:978)

... but maybe I missed the memo about how to build for Hive? Do I still need another Hive profile? Other tests, signatures, etc. look good.

On Sat, May 30, 2015 at 12:40 AM, Patrick Wendell pwend...@gmail.com wrote:
> Please vote on releasing the following candidate as Apache Spark version 1.4.0! ...

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org
Re: [VOTE] Release Apache Spark 1.4.0 (RC3)
Hive Context works on RC3 for MapR after adding spark.sql.hive.metastore.sharedPrefixes as suggested in SPARK-7819 (https://issues.apache.org/jira/browse/SPARK-7819). However, there still seem to be some other issues with native libraries; I get the warning below:

WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable

I tried setting SPARK_LIBRARY_PATH and passing --driver-library-path, with no luck. Built on Mac OS X and running on CentOS 7 with JDK 1.6 and JDK 1.8 (tried both):

make-distribution.sh --tgz --skip-java-test -Phive -Phive-0.13.1 -Pmapr4 -Pnetlib-lgpl -Phive-thriftserver

C

On Mon, Jun 1, 2015 at 3:05 PM, Sean Owen so...@cloudera.com wrote:
> I get a bunch of failures in VersionsSuite with build/test params -Pyarn -Phive -Phadoop-2.6 ...
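For reference, the SPARK-7819 workaround goes in conf/spark-defaults.conf. A minimal sketch is below; the exact class-name prefixes are deployment-specific (the MapR entries here are illustrative of the kind of prefixes the JIRA discusses, not a verbatim copy of it), so check the linked issue for the list that matches your metastore client:

```
# spark-defaults.conf -- sketch of the SPARK-7819 workaround.
# Classes matching these prefixes are loaded by Spark's own classloader
# instead of the isolated Hive-metastore classloader; the MapR-related
# prefixes below are assumptions for a MapRFS deployment.
spark.sql.hive.metastore.sharedPrefixes  com.mysql.jdbc,org.postgresql,com.microsoft.sqlserver,oracle.jdbc,com.mapr.fs.shim.LibraryLoader,com.mapr.security.JNISecurity,com.mapr.fs.jni
```

The setting only takes effect when the isolated metastore classloader is in use (i.e. when spark.sql.hive.metastore.jars is not "builtin"-compatible with your Hive version).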
Re: [VOTE] Release Apache Spark 1.4.0 (RC3)
Hey Bobby,

Those are generic warnings that the Hadoop libraries throw. If you are using MapRFS they shouldn't matter, since you are using the MapR client and not the default Hadoop client. Do you have any issues with functionality, or was it just seeing the warnings that was the concern?

Thanks for helping test!

- Patrick

On Mon, Jun 1, 2015 at 5:18 PM, Bobby Chowdary bobby.chowdar...@gmail.com wrote:
> Hive Context works on RC3 for MapR after adding spark.sql.hive.metastore.sharedPrefixes as suggested in SPARK-7819 ...
Re: [VOTE] Release Apache Spark 1.4.0 (RC3)
Hi Patrick,

Thanks for clarifying. No issues with functionality.

+1 (non-binding)

Thanks
Bobby

On Mon, Jun 1, 2015 at 9:41 PM, Patrick Wendell pwend...@gmail.com wrote:
> Hey Bobby, Those are generic warnings that the Hadoop libraries throw ...
Re: [VOTE] Release Apache Spark 1.4.0 (RC3)
+1 (non-binding)

Launched against a pseudo-distributed YARN cluster running Hadoop 2.6.0 and ran some jobs.

-Sandy

On Sat, May 30, 2015 at 3:44 PM, Krishna Sankar ksanka...@gmail.com wrote:
> +1 (non-binding, of course) 1. Compiled OSX 10.10 (Yosemite) OK ...
Re: [VOTE] Release Apache Spark 1.4.0 (RC3)
+1 (non-binding, of course)

1. Compiled OS X 10.10 (Yosemite) OK. Total time: 17:07 min
   mvn clean package -Pyarn -Dyarn.version=2.6.0 -Phadoop-2.4 -Dhadoop.version=2.6.0 -DskipTests
2. Tested pyspark, MLlib - running as well as comparing results with 1.3.1
   2.1. statistics (min, max, mean, Pearson, Spearman) OK
   2.2. Linear/Ridge/Lasso Regression OK
   2.3. Decision Tree, Naive Bayes OK
   2.4. KMeans OK; Center And Scale OK
   2.5. RDD operations OK; State of the Union Texts - MapReduce, Filter, sortByKey (word count)
   2.6. Recommendation (MovieLens medium dataset, ~1M ratings) OK; model evaluation/optimization (rank, numIter, lambda) with itertools OK
3. Scala - MLlib
   3.1. statistics (min, max, mean, Pearson, Spearman) OK
   3.2. LinearRegressionWithSGD OK
   3.3. Decision Tree OK
   3.4. KMeans OK
   3.5. Recommendation (MovieLens medium dataset, ~1M ratings) OK
   3.6. saveAsParquetFile OK
   3.7. Read and verify the 3.6 save (above) - sqlContext.parquetFile, registerTempTable, sql OK
   3.8. result = sqlContext.sql("SELECT OrderDetails.OrderID, ShipCountry, UnitPrice, Qty, Discount FROM Orders INNER JOIN OrderDetails ON Orders.OrderID = OrderDetails.OrderID") OK
4.0. Spark SQL from Python OK
   4.1. result = sqlContext.sql("SELECT * from people WHERE State = 'WA'") OK

Cheers
k/

On Fri, May 29, 2015 at 4:40 PM, Patrick Wendell pwend...@gmail.com wrote:
> Please vote on releasing the following candidate as Apache Spark version 1.4.0! ...
[VOTE] Release Apache Spark 1.4.0 (RC3)
Please vote on releasing the following candidate as Apache Spark version 1.4.0!

The tag to be voted on is v1.4.0-rc3 (commit dd109a8):
https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=dd109a8746ec07c7c83995890fc2c0cd7a693730

The release files, including signatures, digests, etc. can be found at:
http://people.apache.org/~pwendell/spark-releases/spark-1.4.0-rc3-bin/

Release artifacts are signed with the following key:
https://people.apache.org/keys/committer/pwendell.asc

The staging repository for this release can be found at:
[published as version: 1.4.0] https://repository.apache.org/content/repositories/orgapachespark-1109/
[published as version: 1.4.0-rc3] https://repository.apache.org/content/repositories/orgapachespark-1110/

The documentation corresponding to this release can be found at:
http://people.apache.org/~pwendell/spark-releases/spark-1.4.0-rc3-docs/

Please vote on releasing this package as Apache Spark 1.4.0! The vote is open until Tuesday, June 02, at 00:32 UTC and passes if a majority of at least 3 +1 PMC votes are cast.

[ ] +1 Release this package as Apache Spark 1.4.0
[ ] -1 Do not release this package because ...

To learn more about Apache Spark, please see http://spark.apache.org/

== What has changed since RC1 ==
Below is a list of bug fixes that went into this RC: http://s.apache.org/vN

== How can I help test this release? ==
If you are a Spark user, you can help us test this release by taking a Spark 1.3 workload and running on this release candidate, then reporting any regressions.

== What justifies a -1 vote for this release? ==
This vote is happening towards the end of the 1.4 QA period, so -1 votes should only occur for significant regressions from 1.3.1. Bugs already present in 1.3.X, minor regressions, or bugs related to new features will not block this release.

---------------------------------------------------------------------
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org
Re: [VOTE] Release Apache Spark 1.4.0 (RC3)
Mike,

The broken Configuration link can be fixed if you add a missing dash '-' on the first line in docs/configuration.md and run 'jekyll build'. https://github.com/apache/spark/pull/6513

On Fri, May 29, 2015 at 6:38 PM, Mike Ringenburg mik...@cray.com wrote:
> The Configuration link on the docs appears to be broken. Mike

On May 29, 2015, at 4:41 PM, Patrick Wendell pwend...@gmail.com wrote:
> Please vote on releasing the following candidate as Apache Spark version 1.4.0! ...
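The docs fix described above is a one-character change to the Jekyll front-matter delimiter, which must be exactly three dashes for Jekyll to process the page. A sketch of what the diff presumably looks like (reconstructed from the description; see the linked PR for the actual change):

```
--- a/docs/configuration.md
+++ b/docs/configuration.md
@@ -1,4 +1,4 @@
---
+---
 layout: global
 displayTitle: Spark Configuration
```

With the delimiter restored, 'jekyll build' from the docs/ directory regenerates the page and the Configuration link resolves again. (The layout/displayTitle lines shown are assumptions about the file's front matter, not quoted from the PR.)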
Re: [VOTE] Release Apache Spark 1.4.0 (RC3)
The Configuration link on the docs appears to be broken.

Mike

On May 29, 2015, at 4:41 PM, Patrick Wendell pwend...@gmail.com wrote:
> Please vote on releasing the following candidate as Apache Spark version 1.4.0! ...