Re: [VOTE] Release Apache Spark 1.4.0 (RC3)

2015-06-02 Thread Olivier Girardot
Hi everyone,
I think there's a blocker on PySpark: the `when` function in Python seems
to be broken, while the Scala API seems fine.
Here's a snippet demonstrating that with Spark 1.4.0 RC3:

In [1]: df = sqlCtx.createDataFrame([(1, 1), (2, 2), (1, 2), (1, 2)], ["key", "value"])

In [2]: from pyspark.sql import functions as F

In [8]: df.select(df.key, F.when(df.key > 1, 0).when(df.key == 0, 2).otherwise(1)).show()
+---+---------------------------------+
|key|CASE WHEN (key = 0) THEN 2 ELSE 1|
+---+---------------------------------+
|  1|                                1|
|  2|                                1|
|  1|                                1|
|  1|                                1|
+---+---------------------------------+

Whereas in Scala I get the expected expression and behaviour:

scala> val df = sqlContext.createDataFrame(List((1, 1), (2, 2), (1, 2), (1, 2))).toDF("key", "value")

scala> import org.apache.spark.sql.functions._

scala> df.select(df("key"), when(df("key") > 1, 0).when(df("key") === 2, 2).otherwise(1)).show()


+---+-------------------------------------------------------+
|key|CASE WHEN (key > 1) THEN 0 WHEN (key = 2) THEN 2 ELSE 1|
+---+-------------------------------------------------------+
|  1|                                                      1|
|  2|                                                      0|
|  1|                                                      1|
|  1|                                                      1|
+---+-------------------------------------------------------+

I've opened a JIRA (https://issues.apache.org/jira/browse/SPARK-8038) and
fixed it here: https://github.com/apache/spark/pull/6580
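For context, a chained when/otherwise expression should evaluate like a SQL CASE with first-match-wins semantics. Here is a minimal pure-Python sketch of that evaluation order (an illustration of the intended semantics, not the Spark implementation):

```python
def case_when(branches, default):
    # SQL CASE semantics: the first branch whose predicate
    # matches decides the result; otherwise the default applies.
    def evaluate(value):
        for predicate, result in branches:
            if predicate(value):
                return result
        return default
    return evaluate

# Mirrors when(key > 1, 0).when(key == 2, 2).otherwise(1)
expr = case_when([(lambda k: k > 1, 0),
                  (lambda k: k == 2, 2)], default=1)
print([expr(k) for k in (1, 2, 1, 1)])  # -> [1, 0, 1, 1]
```

Under these semantics the expected output matches the Scala result above; the RC3 Python output (all 1s, with the first when clause missing from the generated column name) shows the earlier clause being dropped.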

Regards,

Olivier.

On Tue, Jun 2, 2015 at 07:34, Bobby Chowdary bobby.chowdar...@gmail.com
wrote:

 Hi Patrick,
   Thanks for clarifying. No issues with functionality.
 +1 (non-binding)

 Thanks
 Bobby

 On Mon, Jun 1, 2015 at 9:41 PM, Patrick Wendell pwend...@gmail.com
 wrote:

 Hey Bobby,

 Those are generic warnings that the hadoop libraries throw. If you are
 using MapRFS they shouldn't matter since you are using the MapR client
 and not the default hadoop client.

 Do you have any issues with functionality... or was it just seeing the
 warnings that was the concern?

 Thanks for helping test!

 - Patrick

 On Mon, Jun 1, 2015 at 5:18 PM, Bobby Chowdary
 bobby.chowdar...@gmail.com wrote:
  Hive Context works on RC3 for MapR after adding
  spark.sql.hive.metastore.sharedPrefixes as suggested in SPARK-7819.
  However, there still seems to be some other issue with native libraries; I get the below
  warning:
  WARN NativeCodeLoader: Unable to load native-hadoop library for your
  platform... using builtin-java classes where applicable. I tried
  SPARK_LIBRARY_PATH and --driver-library-path, with no luck.
 
  Built on Mac OS X and running on CentOS 7 with JDK 1.6 and JDK 1.8 (tried both):
 
   make-distribution.sh --tgz --skip-java-test -Phive -Phive-0.13.1 -Pmapr4
  -Pnetlib-lgpl -Phive-thriftserver
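For reference, the SPARK-7819 workaround mentioned above is a spark-defaults.conf entry along these lines (the prefix values shown are illustrative assumptions; the exact list depends on the metastore JDBC driver and, for MapR, the client classes in use):

```
# spark-defaults.conf -- illustrative; the listed prefixes are assumptions
spark.sql.hive.metastore.sharedPrefixes  com.mysql.jdbc,org.postgresql,com.mapr
```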
 
  On Mon, Jun 1, 2015 at 3:05 PM, Sean Owen so...@cloudera.com wrote:
 
  I get a bunch of failures in VersionSuite with build/test params
  -Pyarn -Phive -Phadoop-2.6:
 
  - success sanity check *** FAILED ***
java.lang.RuntimeException: [download failed:
  org.jboss.netty#netty;3.2.2.Final!netty.jar(bundle), download failed:
  commons-net#commons-net;3.1!commons-net.jar]
at
 
 org.apache.spark.deploy.SparkSubmitUtils$.resolveMavenCoordinates(SparkSubmit.scala:978)
 
  ... but maybe I missed the memo about how to build for Hive? Do I
  still need another Hive profile?
 
  Other tests, signatures, etc look good.
 
  On Sat, May 30, 2015 at 12:40 AM, Patrick Wendell pwend...@gmail.com
  wrote:
   Please vote on releasing the following candidate as Apache Spark
 version
   1.4.0!
  
   The tag to be voted on is v1.4.0-rc3 (commit dd109a8):
  
  
 https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=dd109a8746ec07c7c83995890fc2c0cd7a693730
  
   The release files, including signatures, digests, etc. can be found
 at:
  
 http://people.apache.org/~pwendell/spark-releases/spark-1.4.0-rc3-bin/
  
   Release artifacts are signed with the following key:
   https://people.apache.org/keys/committer/pwendell.asc
  
   The staging repository for this release can be found at:
   [published as version: 1.4.0]
  
 https://repository.apache.org/content/repositories/orgapachespark-1109/
   [published as version: 1.4.0-rc3]
  
 https://repository.apache.org/content/repositories/orgapachespark-1110/
  
   The documentation corresponding to this release can be found at:
  
 http://people.apache.org/~pwendell/spark-releases/spark-1.4.0-rc3-docs/
  
   Please vote on releasing this package as Apache Spark 1.4.0!
  
   The vote is open until Tuesday, June 02, at 00:32 UTC and passes
   if a majority of at least 3 +1 PMC votes are cast.
  
   [ ] +1 Release this package as Apache Spark 1.4.0
   [ ] -1 Do not release this package because ...
  
   To learn more about Apache Spark, please see
   http://spark.apache.org/
  
   == What has changed since RC1 ==
   Below is a list of bug fixes that went into this RC:
   http://s.apache.org/vN
  
   == How can I help test this release? ==
   If you are a Spark user, you can help us test this release by
   taking a Spark 1.3 workload and running on this release candidate,
   then reporting any regressions.

[RESULT] [VOTE] Release Apache Spark 1.4.0 (RC3)

2015-06-02 Thread Patrick Wendell
This vote is cancelled in favor of RC4.

Thanks everyone for the thorough testing of this RC. We are really
close, but a few blockers were found. I've cut a new RC to
incorporate the fixes.

The following patches were merged during the RC3 testing period:

(blockers)
4940630 [SPARK-8020] [SQL] Spark SQL conf in spark-defaults.conf make
metadataHive get constructed too early
6b0f615 [SPARK-8038] [SQL] [PYSPARK] fix Column.when() and otherwise()
78a6723 [SPARK-7978] [SQL] [PYSPARK] DecimalType should not be singleton

(other fixes)
9d6475b [SPARK-6917] [SQL] DecimalType is not read back when
non-native type exists
97d4cd0 [SPARK-8049] [MLLIB] drop tmp col from OneVsRest output
cbaf595 [SPARK-8014] [SQL] Avoid premature metadata discovery when
writing a HadoopFsRelation with a save mode other than Append
fa292dc [SPARK-8015] [FLUME] Remove Guava dependency from flume-sink.
f71a09d [SPARK-8037] [SQL] Ignores files whose name starts with dot in
HadoopFsRelation
292ee1a [SPARK-8021] [SQL] [PYSPARK] make Python read/write API
consistent with Scala
87941ff [SPARK-8023][SQL] Add deterministic attribute to Expression
to avoid collapsing nondeterministic projects.
e6d5895 [SPARK-7965] [SPARK-7972] [SQL] Handle expressions containing
multiple window expressions and make parser match window frames in
case insensitive way
8ac2376 [SPARK-8026][SQL] Add Column.alias to Scala/Java DataFrame API
efc0e05 [SPARK-7982][SQL] DataFrame.stat.crosstab should use 0 instead
of null for pairs that don't appear
cbfb682a [SPARK-8028] [SPARKR] Use addJar instead of setJars in SparkR
a7c8b00 [SPARK-7958] [STREAMING] Handled exception in
StreamingContext.start() to prevent leaking of actors
a76c2e1 [SPARK-7899] [PYSPARK] Fix Python 3 pyspark/sql/types module conflict
f1d4e7e [SPARK-7227] [SPARKR] Support fillna / dropna in R DataFrame.
01f38f7 [SPARK-7979] Enforce structural type checker.
2c45009 [SPARK-7459] [MLLIB] ElementwiseProduct Java example
8938a74 [SPARK-7962] [MESOS] Fix master url parsing in rest submission client.
1513cff [SPARK-7957] Preserve partitioning when using randomSplit
9a88be1 [SPARK-6013] [ML] Add more Python ML examples for spark.ml
2bd4460 [SPARK-7954] [SPARKR] Create SparkContext in sparkRSQL init

On Fri, May 29, 2015 at 4:40 PM, Patrick Wendell pwend...@gmail.com wrote:
 Please vote on releasing the following candidate as Apache Spark version 
 1.4.0!

 The tag to be voted on is v1.4.0-rc3 (commit dd109a8):
 https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=dd109a8746ec07c7c83995890fc2c0cd7a693730

 The release files, including signatures, digests, etc. can be found at:
 http://people.apache.org/~pwendell/spark-releases/spark-1.4.0-rc3-bin/

 Release artifacts are signed with the following key:
 https://people.apache.org/keys/committer/pwendell.asc

 The staging repository for this release can be found at:
 [published as version: 1.4.0]
 https://repository.apache.org/content/repositories/orgapachespark-1109/
 [published as version: 1.4.0-rc3]
 https://repository.apache.org/content/repositories/orgapachespark-1110/

 The documentation corresponding to this release can be found at:
 http://people.apache.org/~pwendell/spark-releases/spark-1.4.0-rc3-docs/

 Please vote on releasing this package as Apache Spark 1.4.0!

 The vote is open until Tuesday, June 02, at 00:32 UTC and passes
 if a majority of at least 3 +1 PMC votes are cast.

 [ ] +1 Release this package as Apache Spark 1.4.0
 [ ] -1 Do not release this package because ...

 To learn more about Apache Spark, please see
 http://spark.apache.org/

 == What has changed since RC1 ==
 Below is a list of bug fixes that went into this RC:
 http://s.apache.org/vN

 == How can I help test this release? ==
 If you are a Spark user, you can help us test this release by
 taking a Spark 1.3 workload and running on this release candidate,
 then reporting any regressions.

 == What justifies a -1 vote for this release? ==
 This vote is happening towards the end of the 1.4 QA period,
 so -1 votes should only occur for significant regressions from 1.3.1.
 Bugs already present in 1.3.X, minor regressions, or bugs related
 to new features will not block this release.

-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



Re: [VOTE] Release Apache Spark 1.4.0 (RC3)

2015-06-01 Thread Peter Rudenko
I still have a problem using HiveContext from sbt. Here’s an example of
the dependencies:


val sparkVersion = "1.4.0-rc3"

lazy val root = Project(id = "spark-hive", base = file("."),
  settings = Project.defaultSettings ++ Seq(
    name := "spark-1.4-hive",
    scalaVersion := "2.10.5",
    scalaBinaryVersion := "2.10",
    resolvers += "Spark RC" at "https://repository.apache.org/content/repositories/orgapachespark-1110/",
    libraryDependencies ++= Seq(
      "org.apache.spark" %% "spark-core" % sparkVersion,
      "org.apache.spark" %% "spark-mllib" % sparkVersion,
      "org.apache.spark" %% "spark-hive" % sparkVersion,
      "org.apache.spark" %% "spark-sql" % sparkVersion
    )
  ))


Launching sbt console with it and running:

val conf = new SparkConf().setMaster("local[4]").setAppName("test")
val sc = new SparkContext(conf)
val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)
val data = sc.parallelize(1 to 1)
import sqlContext.implicits._

scala> data.toDF
java.lang.IllegalArgumentException: Unable to locate hive jars to
connect to metastore using classloader
scala.tools.nsc.interpreter.IMain$TranslatingClassLoader. Please set
spark.sql.hive.metastore.jars
  at org.apache.spark.sql.hive.HiveContext.metadataHive$lzycompute(HiveContext.scala:206)
  at org.apache.spark.sql.hive.HiveContext.metadataHive(HiveContext.scala:175)
  at org.apache.spark.sql.hive.HiveContext$$anon$2.<init>(HiveContext.scala:367)
  at org.apache.spark.sql.hive.HiveContext.catalog$lzycompute(HiveContext.scala:367)
  at org.apache.spark.sql.hive.HiveContext.catalog(HiveContext.scala:366)
  at org.apache.spark.sql.hive.HiveContext$$anon$1.<init>(HiveContext.scala:379)
  at org.apache.spark.sql.hive.HiveContext.analyzer$lzycompute(HiveContext.scala:379)
  at org.apache.spark.sql.hive.HiveContext.analyzer(HiveContext.scala:378)
  at org.apache.spark.sql.SQLContext$QueryExecution.assertAnalyzed(SQLContext.scala:901)
  at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:134)
  at org.apache.spark.sql.DataFrame$.apply(DataFrame.scala:51)
  at org.apache.spark.sql.SQLContext.createDataFrame(SQLContext.scala:474)
  at org.apache.spark.sql.SQLContext.createDataFrame(SQLContext.scala:456)
  at org.apache.spark.sql.SQLContext$implicits$.intRddToDataFrameHolder(SQLContext.scala:345)


Thanks,
Peter Rudenko

On 2015-06-01 05:04, Guoqiang Li wrote:


+1 (non-binding)


-- Original --
From: Sandy Ryza sandy.r...@cloudera.com
Date: Mon, Jun 1, 2015 07:34 AM
To: Krishna Sankar ksanka...@gmail.com
Cc: Patrick Wendell pwend...@gmail.com; dev@spark.apache.org
Subject: Re: [VOTE] Release Apache Spark 1.4.0 (RC3)

+1 (non-binding)

Launched against a pseudo-distributed YARN cluster running Hadoop 
2.6.0 and ran some jobs.


-Sandy

On Sat, May 30, 2015 at 3:44 PM, Krishna Sankar ksanka...@gmail.com
wrote:


+1 (non-binding, of course)

1. Compiled OSX 10.10 (Yosemite) OK Total time: 17:07 min
 mvn clean package -Pyarn -Dyarn.version=2.6.0 -Phadoop-2.4
-Dhadoop.version=2.6.0 -DskipTests
2. Tested pyspark, MLlib - running as well as comparing results with
1.3.1
2.1. statistics (min,max,mean,Pearson,Spearman) OK
2.2. Linear/Ridge/Lasso Regression OK
2.3. Decision Tree, Naive Bayes OK
2.4. KMeans OK
   Center And Scale OK
2.5. RDD operations OK
  State of the Union Texts - MapReduce, Filter,sortByKey (word
count)
2.6. Recommendation (Movielens medium dataset ~1 M ratings) OK
   Model evaluation/optimization (rank, numIter, lambda) with
itertools OK
3. Scala - MLlib
3.1. statistics (min,max,mean,Pearson,Spearman) OK
3.2. LinearRegressionWithSGD OK
3.3. Decision Tree OK
3.4. KMeans OK
3.5. Recommendation (Movielens medium dataset ~1 M ratings) OK
3.6. saveAsParquetFile OK
3.7. Read and verify the 4.3 save(above) - sqlContext.parquetFile,
registerTempTable, sql OK
3.8. result = sqlContext.sql("SELECT
OrderDetails.OrderID,ShipCountry,UnitPrice,Qty,Discount FROM
Orders INNER JOIN OrderDetails ON Orders.OrderID =
OrderDetails.OrderID") OK
4.0. Spark SQL from Python OK
4.1. result = sqlContext.sql("SELECT * from people WHERE State =
'WA'") OK

Cheers
k/

On Fri, May 29, 2015 at 4:40 PM, Patrick Wendell
pwend...@gmail.com wrote:

Please vote on releasing the following candidate as Apache
Spark version 1.4.0!

The tag to be voted on is v1.4.0-rc3 (commit dd109a8):

https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=dd109a8746ec07c7c83995890fc2c0cd7a693730

The release files, including signatures, digests, etc. can be
found at:
http://people.apache.org/~pwendell/spark-releases/spark-1.4.0-rc3-bin/


Release artifacts are signed with the following key:
https://people.apache.org/keys/committer

Re: [VOTE] Release Apache Spark 1.4.0 (RC3)

2015-06-01 Thread Yin Huai
Hi Peter,

Based on your error message, it seems you were not using RC3. For the
error thrown at HiveContext's line 206, we changed the message to this
one
https://github.com/apache/spark/blob/v1.4.0-rc3/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveContext.scala#L205-207
just before RC3. Basically, we no longer print the class loader name. Can you
check whether an older version of the 1.4 branch got used? Have you published an RC3
to your local Maven repo? Can you clean your local repo cache and try again?

Thanks,

Yin

On Mon, Jun 1, 2015 at 10:45 AM, Peter Rudenko petro.rude...@gmail.com
wrote:

  Still have problem using HiveContext from sbt. Here’s an example of
 dependencies:

  val sparkVersion = "1.4.0-rc3"

  lazy val root = Project(id = "spark-hive", base = file("."),
    settings = Project.defaultSettings ++ Seq(
      name := "spark-1.4-hive",
      scalaVersion := "2.10.5",
      scalaBinaryVersion := "2.10",
      resolvers += "Spark RC" at "https://repository.apache.org/content/repositories/orgapachespark-1110/",
      libraryDependencies ++= Seq(
        "org.apache.spark" %% "spark-core" % sparkVersion,
        "org.apache.spark" %% "spark-mllib" % sparkVersion,
        "org.apache.spark" %% "spark-hive" % sparkVersion,
        "org.apache.spark" %% "spark-sql" % sparkVersion
      )
    ))

 Launching sbt console with it and running:

 val conf = new SparkConf().setMaster("local[4]").setAppName("test")
 val sc = new SparkContext(conf)
 val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)
 val data = sc.parallelize(1 to 1)
 import sqlContext.implicits._

 scala> data.toDF
 java.lang.IllegalArgumentException: Unable to locate hive jars to connect to
 metastore using classloader
 scala.tools.nsc.interpreter.IMain$TranslatingClassLoader. Please set
 spark.sql.hive.metastore.jars
   at org.apache.spark.sql.hive.HiveContext.metadataHive$lzycompute(HiveContext.scala:206)
   at org.apache.spark.sql.hive.HiveContext.metadataHive(HiveContext.scala:175)
   at org.apache.spark.sql.hive.HiveContext$$anon$2.<init>(HiveContext.scala:367)
   at org.apache.spark.sql.hive.HiveContext.catalog$lzycompute(HiveContext.scala:367)
   at org.apache.spark.sql.hive.HiveContext.catalog(HiveContext.scala:366)
   at org.apache.spark.sql.hive.HiveContext$$anon$1.<init>(HiveContext.scala:379)
   at org.apache.spark.sql.hive.HiveContext.analyzer$lzycompute(HiveContext.scala:379)
   at org.apache.spark.sql.hive.HiveContext.analyzer(HiveContext.scala:378)
   at org.apache.spark.sql.SQLContext$QueryExecution.assertAnalyzed(SQLContext.scala:901)
   at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:134)
   at org.apache.spark.sql.DataFrame$.apply(DataFrame.scala:51)
   at org.apache.spark.sql.SQLContext.createDataFrame(SQLContext.scala:474)
   at org.apache.spark.sql.SQLContext.createDataFrame(SQLContext.scala:456)
   at org.apache.spark.sql.SQLContext$implicits$.intRddToDataFrameHolder(SQLContext.scala:345)

 Thanks,
 Peter Rudenko

 On 2015-06-01 05:04, Guoqiang Li wrote:

   +1 (non-binding)


 -- Original --
 From: Sandy Ryza sandy.r...@cloudera.com
 Date: Mon, Jun 1, 2015 07:34 AM
 To: Krishna Sankar ksanka...@gmail.com
 Cc: Patrick Wendell pwend...@gmail.com; dev@spark.apache.org
 Subject: Re: [VOTE] Release Apache Spark 1.4.0 (RC3)

  +1 (non-binding)

  Launched against a pseudo-distributed YARN cluster running Hadoop 2.6.0
 and ran some jobs.

  -Sandy

 On Sat, May 30, 2015 at 3:44 PM, Krishna Sankar ksanka...@gmail.com
 wrote:

  +1 (non-binding, of course)

  1. Compiled OSX 10.10 (Yosemite) OK Total time: 17:07 min
  mvn clean package -Pyarn -Dyarn.version=2.6.0 -Phadoop-2.4
 -Dhadoop.version=2.6.0 -DskipTests
 2. Tested pyspark, MLlib - running as well as comparing results with 1.3.1
 2.1. statistics (min,max,mean,Pearson,Spearman) OK
 2.2. Linear/Ridge/Lasso Regression OK
 2.3. Decision Tree, Naive Bayes OK
 2.4. KMeans OK
Center And Scale OK
 2.5. RDD operations OK
   State of the Union Texts - MapReduce, Filter,sortByKey (word count)
 2.6. Recommendation (Movielens medium dataset ~1 M ratings) OK
Model evaluation/optimization (rank, numIter, lambda) with
 itertools OK
 3. Scala - MLlib
 3.1. statistics (min,max,mean,Pearson,Spearman) OK
 3.2. LinearRegressionWithSGD OK
 3.3. Decision Tree OK
 3.4. KMeans OK
 3.5. Recommendation (Movielens medium dataset ~1 M ratings) OK
 3.6. saveAsParquetFile OK
 3.7. Read and verify the 4.3 save(above) - sqlContext.parquetFile,
 registerTempTable, sql OK
 3.8. result = sqlContext.sql(SELECT
 OrderDetails.OrderID,ShipCountry,UnitPrice,Qty,Discount FROM Orders INNER
 JOIN OrderDetails ON Orders.OrderID = OrderDetails.OrderID

Re: [VOTE] Release Apache Spark 1.4.0 (RC3)

2015-06-01 Thread Andrew Or
+1 (binding)

Tested the standalone cluster mode REST submission gateway - submit /
status / kill
Tested simple applications on YARN client / cluster modes with and without
--jars
Tested python applications on YARN client / cluster modes with and without
--py-files*
Tested dynamic allocation on YARN client / cluster modes

All good.

*Filed SPARK-8017: not a blocker because python in YARN cluster mode is a
new feature



Re: [VOTE] Release Apache Spark 1.4.0 (RC3)

2015-06-01 Thread Peter Rudenko


Re: [VOTE] Release Apache Spark 1.4.0 (RC3)

2015-06-01 Thread Michael Armbrust

Re: [VOTE] Release Apache Spark 1.4.0 (RC3)

2015-06-01 Thread Sean Owen
I get a bunch of failures in VersionsSuite with build/test params
-Pyarn -Phive -Phadoop-2.6:

- success sanity check *** FAILED ***
  java.lang.RuntimeException: [download failed:
org.jboss.netty#netty;3.2.2.Final!netty.jar(bundle), download failed:
commons-net#commons-net;3.1!commons-net.jar]
  at 
org.apache.spark.deploy.SparkSubmitUtils$.resolveMavenCoordinates(SparkSubmit.scala:978)

... but maybe I missed the memo about how to build for Hive? Do I
still need another Hive profile?

Other tests, signatures, etc look good.
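
For anyone wanting to reproduce just this failure, a targeted re-run with
the same profiles should look roughly like the following (a sketch: the
suite's package path and the sql/hive module location are assumptions, and
the flags may need adjusting for your environment):

```
# Hypothetical targeted re-run of the failing suite -- same profiles as above
build/mvn -Pyarn -Phive -Phadoop-2.6 -pl sql/hive \
  -DwildcardSuites=org.apache.spark.sql.hive.client.VersionsSuite test
```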




Re: [VOTE] Release Apache Spark 1.4.0 (RC3)

2015-06-01 Thread Bobby Chowdary
HiveContext works on RC3 for MapR after adding
spark.sql.hive.metastore.sharedPrefixes as suggested in SPARK-7819
(https://issues.apache.org/jira/browse/SPARK-7819). However, there still
seem to be some other issues with native libraries; I get the warning
"WARN NativeCodeLoader: Unable to load native-hadoop library for your
platform... using builtin-java classes where applicable" even after
setting SPARK_LIBRARYPATH and --driver-library-path, with no luck.

Built on Mac OS X and run on CentOS 7 with JDK 1.6 and JDK 1.8 (tried both):

 make-distribution.sh --tgz --skip-java-test -Phive -Phive-0.13.1 -Pmapr4
-Pnetlib-lgpl -Phive-thriftserver



Re: [VOTE] Release Apache Spark 1.4.0 (RC3)

2015-06-01 Thread Patrick Wendell
Hey Bobby,

Those are generic warnings that the Hadoop libraries throw. If you are
using MapRFS they shouldn't matter, since you are using the MapR client
and not the default Hadoop client.

Do you have any issues with functionality... or was it just seeing the
warnings that was the concern?

Thanks for helping test!

- Patrick




Re: [VOTE] Release Apache Spark 1.4.0 (RC3)

2015-06-01 Thread Bobby Chowdary
Hi Patrick,
  Thanks for clarifying. No issues with functionality.
+1 (non-binding)

Thanks
Bobby




Re: [VOTE] Release Apache Spark 1.4.0 (RC3)

2015-05-31 Thread Sandy Ryza
+1 (non-binding)

Launched against a pseudo-distributed YARN cluster running Hadoop 2.6.0 and
ran some jobs.

-Sandy





Re: [VOTE] Release Apache Spark 1.4.0 (RC3)

2015-05-30 Thread Krishna Sankar
+1 (non-binding, of course)

1. Compiled on OS X 10.10 (Yosemite) OK; total time: 17:07 min
 mvn clean package -Pyarn -Dyarn.version=2.6.0 -Phadoop-2.4
-Dhadoop.version=2.6.0 -DskipTests
2. Tested pyspark, MLlib - running as well as comparing results with 1.3.1
2.1. statistics (min, max, mean, Pearson, Spearman) OK
2.2. Linear/Ridge/Lasso Regression OK
2.3. Decision Tree, Naive Bayes OK
2.4. KMeans OK
   Center And Scale OK
2.5. RDD operations OK
  State of the Union Texts - MapReduce, Filter,sortByKey (word count)
2.6. Recommendation (Movielens medium dataset ~1 M ratings) OK
   Model evaluation/optimization (rank, numIter, lambda) with itertools
OK
3. Scala - MLlib
3.1. statistics (min,max,mean,Pearson,Spearman) OK
3.2. LinearRegressionWithSGD OK
3.3. Decision Tree OK
3.4. KMeans OK
3.5. Recommendation (Movielens medium dataset ~1 M ratings) OK
3.6. saveAsParquetFile OK
3.7. Read and verify the 3.6 save (above) - sqlContext.parquetFile,
registerTempTable, sql OK
3.8. result = sqlContext.sql("SELECT
OrderDetails.OrderID,ShipCountry,UnitPrice,Qty,Discount FROM Orders INNER
JOIN OrderDetails ON Orders.OrderID = OrderDetails.OrderID") OK
4.0. Spark SQL from Python OK
4.1. result = sqlContext.sql("SELECT * from people WHERE State = 'WA'") OK
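
(As an aside, the Pearson/Spearman figures in 2.1/3.1 are easy to
cross-check outside Spark. The sketch below is plain Python with made-up
sample data -- not MLlib's implementation -- and it skips tie handling in
the Spearman ranks.)

```python
from math import sqrt

def pearson(xs, ys):
    # Sample Pearson correlation: covariance over the product of std devs.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def spearman(xs, ys):
    # Spearman is Pearson applied to the ranks of the data
    # (no tie correction -- fine for distinct values).
    def ranks(vs):
        order = sorted(range(len(vs)), key=vs.__getitem__)
        r = [0] * len(vs)
        for rank, idx in enumerate(order, start=1):
            r[idx] = rank
        return r
    return pearson(ranks(xs), ranks(ys))

xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]
print(min(xs), max(xs), sum(xs) / len(xs))              # 1.0 4.0 2.5
print(round(pearson(xs, ys), 6), round(spearman(xs, ys), 6))  # 1.0 1.0
```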

Cheers
k/





[VOTE] Release Apache Spark 1.4.0 (RC3)

2015-05-29 Thread Patrick Wendell
Please vote on releasing the following candidate as Apache Spark version 1.4.0!

The tag to be voted on is v1.4.0-rc3 (commit dd109a8):
https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=dd109a8746ec07c7c83995890fc2c0cd7a693730

The release files, including signatures, digests, etc. can be found at:
http://people.apache.org/~pwendell/spark-releases/spark-1.4.0-rc3-bin/

Release artifacts are signed with the following key:
https://people.apache.org/keys/committer/pwendell.asc

The staging repository for this release can be found at:
[published as version: 1.4.0]
https://repository.apache.org/content/repositories/orgapachespark-1109/
[published as version: 1.4.0-rc3]
https://repository.apache.org/content/repositories/orgapachespark-1110/

The documentation corresponding to this release can be found at:
http://people.apache.org/~pwendell/spark-releases/spark-1.4.0-rc3-docs/

Please vote on releasing this package as Apache Spark 1.4.0!

The vote is open until Tuesday, June 02, at 00:32 UTC and passes
if a majority of at least 3 +1 PMC votes are cast.

[ ] +1 Release this package as Apache Spark 1.4.0
[ ] -1 Do not release this package because ...

To learn more about Apache Spark, please see
http://spark.apache.org/

== What has changed since RC1 ==
Below is a list of bug fixes that went into this RC:
http://s.apache.org/vN

== How can I help test this release? ==
If you are a Spark user, you can help us test this release by
taking a Spark 1.3 workload and running on this release candidate,
then reporting any regressions.

== What justifies a -1 vote for this release? ==
This vote is happening towards the end of the 1.4 QA period,
so -1 votes should only occur for significant regressions from 1.3.1.
Bugs already present in 1.3.X, minor regressions, or bugs related
to new features will not block this release.

-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



Re: [VOTE] Release Apache Spark 1.4.0 (RC3)

2015-05-29 Thread Taka Shinagawa
Mike,

The broken Configuration link can be fixed if you add a missing dash '-' on
the first line in docs/configuration.md and run 'jekyll build'.

https://github.com/apache/spark/pull/6513
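
(For context: the first line of docs/configuration.md opens the page's
YAML front matter, and Jekyll only treats the block as front matter when
the delimiter is exactly three dashes. The fixed header presumably looks
something like the sketch below -- the exact keys are assumptions, so see
the PR for the real hunk.)

```
---
layout: global
title: Spark Configuration
---
```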





Re: [VOTE] Release Apache Spark 1.4.0 (RC3)

2015-05-29 Thread Mike Ringenburg
The Configuration link on the docs appears to be broken.

Mike


