Re: [VOTE] Release Apache Spark 1.4.0 (RC1)

2015-05-29 Thread Peter Rudenko
Hi Yin, I’m using the spark-hive dependency, and the tests for my app work on 
Spark 1.3.1.
It seems to be something with Hive & sbt. Running the next 
statement from spark-shell works, but from the sbt console in RC3 I get the following error:



scala> val sqlContext = new org.apache.spark.sql.hive.HiveContext(sc)
15/05/29 16:31:06 WARN ObjectStore: Version information not found in 
metastore. hive.metastore.schema.verification is not enabled so 
recording the schema version 0.13.1aa
sqlContext: org.apache.spark.sql.hive.HiveContext = 
org.apache.spark.sql.hive.HiveContext@177ac9f4


scala> val data = sqlContext.read.parquet("caches/-1525448137")
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for 
further details.
java.lang.IllegalArgumentException: Unable to locate hive jars to 
connect to metastore using classloader 
scala.tools.nsc.interpreter.IMain$TranslatingClassLoader. Please set 
spark.sql.hive.metastore.jars
at org.apache.spark.sql.hive.HiveContext.metadataHive$lzycompute(HiveContext.scala:206)
at org.apache.spark.sql.hive.HiveContext.metadataHive(HiveContext.scala:175)
at org.apache.spark.sql.hive.HiveContext$$anon$2.<init>(HiveContext.scala:367)
at org.apache.spark.sql.hive.HiveContext.catalog$lzycompute(HiveContext.scala:367)
at org.apache.spark.sql.hive.HiveContext.catalog(HiveContext.scala:366)
at org.apache.spark.sql.hive.HiveContext$$anon$1.<init>(HiveContext.scala:379)
at org.apache.spark.sql.hive.HiveContext.analyzer$lzycompute(HiveContext.scala:379)
at org.apache.spark.sql.hive.HiveContext.analyzer(HiveContext.scala:378)
at org.apache.spark.sql.SQLContext$QueryExecution.assertAnalyzed(SQLContext.scala:901)
at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:134)
at org.apache.spark.sql.DataFrame$.apply(DataFrame.scala:51)
at org.apache.spark.sql.SQLContext.baseRelationToDataFrame(SQLContext.scala:419)
at org.apache.spark.sql.DataFrameReader.parquet(DataFrameReader.scala:264)
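[Editor's note: the exception message points at the spark.sql.hive.metastore.jars setting. A possible workaround, sketched here with placeholder paths and version rather than values from this thread, is to tell Spark explicitly where the Hive client jars live instead of letting it search the interpreter's classloader:]

```scala
// Hypothetical workaround sketch (Spark 1.4 settings): point the HiveContext
// at an explicit Hive jar location instead of resolving them from the REPL's
// classloader. Path and version below are placeholders, not from this thread.
import org.apache.spark.SparkConf

val conf = new SparkConf()
  .set("spark.sql.hive.metastore.version", "0.13.1")           // metastore version to connect to
  .set("spark.sql.hive.metastore.jars", "/path/to/hive/lib/*") // explicit jars, or "builtin"
```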



Thanks,
Peter Rudenko

On 2015-05-29 07:08, Yin Huai wrote:


Justin,

If you are creating multiple HiveContexts in tests, you need to assign 
a temporary metastore location for every HiveContext (like what we do 
here 
https://github.com/apache/spark/blob/master/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveContext.scala#L527-L543). 
Otherwise, they all try to connect to the metastore in the current dir 
(look at metastore_db).
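[Editor's note: a minimal sketch of that idea, my own illustration rather than the linked code — give each HiveContext a fresh Derby metastore and warehouse directory so concurrent or consecutive contexts don't contend for ./metastore_db:]

```scala
// Sketch only: per-test temporary metastore/warehouse dirs, modeled loosely on
// the linked HiveContext code. Assumes a live SparkContext `sc` (Spark 1.4 API).
import java.nio.file.Files

val metastorePath = Files.createTempDirectory("metastore").toString
val warehousePath = Files.createTempDirectory("warehouse").toString

val hc = new org.apache.spark.sql.hive.HiveContext(sc)
// Point Derby at a unique database directory instead of ./metastore_db
hc.setConf("javax.jdo.option.ConnectionURL",
  s"jdbc:derby:;databaseName=$metastorePath;create=true")
hc.setConf("hive.metastore.warehouse.dir", warehousePath)
```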


Peter,

Do you also have the same use case as Justin (creating multiple 
HiveContexts in tests)? Can you explain what you meant by "all tests"? 
I am probably missing some context here.


Thanks,

Yin


On Thu, May 28, 2015 at 11:28 AM, Peter Rudenko 
petro.rude...@gmail.com wrote:


Also have the same issue - all tests fail because of HiveContext /
derby lock.

|Cause: javax.jdo.JDOFatalDataStoreException: Unable to open a test
connection to the given database. JDBC url =
jdbc:derby:;databaseName=metastore_db;create=true, username = APP.
Terminating connection pool (set lazyInit to true if you expect to
start your database after your app). Original Exception: --
[info] java.sql.SQLException: Failed to start database
'metastore_db' with class loader
org.apache.spark.sql.hive.client.IsolatedClientLoader$$anon$1@8066e0e,
see the next exception for details. |

Also, is there a build for hadoop2.6? I don’t see it here:
http://people.apache.org/~pwendell/spark-releases/spark-1.4.0-rc2-bin/

Thanks,
Peter Rudenko

On 2015-05-22 22:56, Justin Uang wrote:


I'm working on one of the Palantir teams using Spark, and here is
our feedback:

We have encountered three issues when upgrading to spark 1.4.0.
I'm not sure they qualify as a -1, as they come from using
non-public APIs and multiple spark contexts for the purposes of
testing, but I do want to bring them up for awareness =)

 1. Our UDT was serializing to a StringType, but now strings are
represented internally as UTF8String, so we had to change our
UDT to use UTF8String.apply() and UTF8String.toString() to
convert back to String.
 2. createDataFrame when using UDTs used to accept things in the
serialized catalyst form. Now, they're supposed to be in the
UDT java class form (I think this change would've affected us
in 1.3.1 already, since we were in 1.3.0)
 3. derby database lifecycle management issue with HiveContext.
We have been using a SparkContextResource JUnit Rule that we
wrote, and it sets up then tears down a SparkContext and
HiveContext between unit test runs within the same process
(possibly the same thread as well). Multiple contexts are not
being used at once. It 

Re: [VOTE] Release Apache Spark 1.4.0 (RC1)

2015-05-28 Thread Peter Rudenko
Also have the same issue - all tests fail because of HiveContext / derby 
lock.


|Cause: javax.jdo.JDOFatalDataStoreException: Unable to open a test 
connection to the given database. JDBC url = 
jdbc:derby:;databaseName=metastore_db;create=true, username = APP. 
Terminating connection pool (set lazyInit to true if you expect to start 
your database after your app). Original Exception: -- [info] 
java.sql.SQLException: Failed to start database 'metastore_db' with 
class loader 
org.apache.spark.sql.hive.client.IsolatedClientLoader$$anon$1@8066e0e, 
see the next exception for details. |


Also, is there a build for hadoop2.6? I don’t see it here: 
http://people.apache.org/~pwendell/spark-releases/spark-1.4.0-rc2-bin/


Thanks,
Peter Rudenko

On 2015-05-22 22:56, Justin Uang wrote:

I'm working on one of the Palantir teams using Spark, and here is our 
feedback:


We have encountered three issues when upgrading to spark 1.4.0. I'm 
not sure they qualify as a -1, as they come from using non-public APIs 
and multiple spark contexts for the purposes of testing, but I do want 
to bring them up for awareness =)


 1. Our UDT was serializing to a StringType, but now strings are
represented internally as UTF8String, so we had to change our UDT
to use UTF8String.apply() and UTF8String.toString() to convert
back to String.
 2. createDataFrame when using UDTs used to accept things in the
serialized catalyst form. Now, they're supposed to be in the UDT
java class form (I think this change would've affected us in 1.3.1
already, since we were in 1.3.0)
 3. derby database lifecycle management issue with HiveContext. We
have been using a SparkContextResource JUnit Rule that we wrote,
and it sets up then tears down a SparkContext and HiveContext
between unit test runs within the same process (possibly the same
thread as well). Multiple contexts are not being used at once. It
used to work in 1.3.0, but now when we try to create the
HiveContext for the second unit test, then it complains with the
following exception. I have a feeling it might have something to
do with the Hive object being thread local, and us not explicitly
closing the HiveContext and everything it holds. The full stack
trace is here: https://gist.github.com/justinuang/0403d49cdeedf91727cd

Caused by: java.sql.SQLException: Failed to start database 'metastore_db' with 
class loader 
org.apache.spark.sql.hive.client.IsolatedClientLoader$$anon$1@5dea2446, see the 
next exception for details.
at 
org.apache.derby.impl.jdbc.SQLExceptionFactory.getSQLException(Unknown Source)
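[Editor's note: for item 1 in the list above, the kind of change described might look like this sketch. `Point`/`serialize`/`deserialize` are hypothetical names for illustration, not the team's actual UDT; the import path is Spark 1.4's:]

```scala
// Illustrative sketch of the UDT change in item 1: go through UTF8String
// instead of plain String when talking to Catalyst. Names are hypothetical.
import org.apache.spark.sql.types.UTF8String

class Point(val label: String)

// Before 1.4 a UDT could hand Catalyst a plain String; now it must wrap it:
def serialize(p: Point): UTF8String = UTF8String(p.label)

// ...and unwrap it on the way back out:
def deserialize(datum: Any): Point = datum match {
  case s: UTF8String => new Point(s.toString)
}
```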

On Wed, May 20, 2015 at 10:35 AM Imran Rashid iras...@cloudera.com 
wrote:


-1

discovered I accidentally removed master & worker json endpoints,
will restore
https://issues.apache.org/jira/browse/SPARK-7760

On Tue, May 19, 2015 at 11:10 AM, Patrick Wendell
pwend...@gmail.com wrote:

Please vote on releasing the following candidate as Apache
Spark version 1.4.0!

The tag to be voted on is v1.4.0-rc1 (commit 777a081):

https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=777a08166f1fb144146ba32581d4632c3466541e

The release files, including signatures, digests, etc. can be
found at:
http://people.apache.org/~pwendell/spark-1.4.0-rc1/

Release artifacts are signed with the following key:
https://people.apache.org/keys/committer/pwendell.asc

The staging repository for this release can be found at:
https://repository.apache.org/content/repositories/orgapachespark-1092/

The documentation corresponding to this release can be found at:
http://people.apache.org/~pwendell/spark-1.4.0-rc1-docs/

Please vote on releasing this package as Apache Spark 1.4.0!

The vote is open until Friday, May 22, at 17:03 UTC and passes
if a majority of at least 3 +1 PMC votes are cast.

[ ] +1 Release this package as Apache Spark 1.4.0
[ ] -1 Do not release this package because ...

To learn more about Apache Spark, please see
http://spark.apache.org/

== How can I help test this release? ==
If you are a Spark user, you can help us test this release by
taking a Spark 1.3 workload and running on this release candidate,
then reporting any regressions.

== What justifies a -1 vote for this release? ==
This vote is happening towards the end of the 1.4 QA period,
so -1 votes should only occur for significant regressions from
1.3.1.
Bugs already present in 1.3.X, minor regressions, or bugs related
to new features will not block 

Re: [VOTE] Release Apache Spark 1.4.0 (RC1)

2015-05-28 Thread Yin Huai
Justin,

If you are creating multiple HiveContexts in tests, you need to assign a
temporary metastore location for every HiveContext (like what we do here
https://github.com/apache/spark/blob/master/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveContext.scala#L527-L543).
Otherwise, they all try to connect to the metastore in the current dir
(look at metastore_db).

Peter,

Do you also have the same use case as Justin (creating multiple
HiveContexts in tests)? Can you explain what you meant by "all tests"? I am
probably missing some context here.

Thanks,

Yin


On Thu, May 28, 2015 at 11:28 AM, Peter Rudenko petro.rude...@gmail.com
wrote:

  Also have the same issue - all tests fail because of HiveContext / derby
 lock.

 Cause: javax.jdo.JDOFatalDataStoreException: Unable to open a test connection 
 to the given database. JDBC url = 
 jdbc:derby:;databaseName=metastore_db;create=true, username = APP. 
 Terminating connection pool (set lazyInit to true if you expect to start your 
 database after your app). Original Exception: --
 [info] java.sql.SQLException: Failed to start database 'metastore_db' with 
 class loader 
 org.apache.spark.sql.hive.client.IsolatedClientLoader$$anon$1@8066e0e, see 
 the next exception for details.

 Also, is there a build for hadoop2.6? I don’t see it here:
 http://people.apache.org/~pwendell/spark-releases/spark-1.4.0-rc2-bin/

 Thanks,
 Peter Rudenko

 On 2015-05-22 22:56, Justin Uang wrote:

   I'm working on one of the Palantir teams using Spark, and here is our
 feedback:

  We have encountered three issues when upgrading to spark 1.4.0. I'm not
 sure they qualify as a -1, as they come from using non-public APIs and
 multiple spark contexts for the purposes of testing, but I do want to bring
 them up for awareness =)

1. Our UDT was serializing to a StringType, but now strings are
represented internally as UTF8String, so we had to change our UDT to use
UTF8String.apply() and UTF8String.toString() to convert back to String.
2. createDataFrame when using UDTs used to accept things in the
serialized catalyst form. Now, they're supposed to be in the UDT java class
form (I think this change would've affected us in 1.3.1 already, since we
were in 1.3.0)
3. derby database lifecycle management issue with HiveContext. We have
been using a SparkContextResource JUnit Rule that we wrote, and it sets up
then tears down a SparkContext and HiveContext between unit test runs
within the same process (possibly the same thread as well). Multiple
contexts are not being used at once. It used to work in 1.3.0, but now when
we try to create the HiveContext for the second unit test, then it
complains with the following exception. I have a feeling it might have
something to do with the Hive object being thread local, and us not
explicitly closing the HiveContext and everything it holds. The full stack
trace is here:
https://gist.github.com/justinuang/0403d49cdeedf91727cd

  Caused by: java.sql.SQLException: Failed to start database 'metastore_db' 
 with class loader 
 org.apache.spark.sql.hive.client.IsolatedClientLoader$$anon$1@5dea2446, see 
 the next exception for details.
   at 
 org.apache.derby.impl.jdbc.SQLExceptionFactory.getSQLException(Unknown Source)


 On Wed, May 20, 2015 at 10:35 AM Imran Rashid iras...@cloudera.com
 wrote:

 -1

 discovered I accidentally removed master & worker json endpoints, will
 restore
  https://issues.apache.org/jira/browse/SPARK-7760

 On Tue, May 19, 2015 at 11:10 AM, Patrick Wendell pwend...@gmail.com
 wrote:

 Please vote on releasing the following candidate as Apache Spark version
 1.4.0!

 The tag to be voted on is v1.4.0-rc1 (commit 777a081):

 https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=777a08166f1fb144146ba32581d4632c3466541e

 The release files, including signatures, digests, etc. can be found at:
 http://people.apache.org/~pwendell/spark-1.4.0-rc1/

 Release artifacts are signed with the following key:
 https://people.apache.org/keys/committer/pwendell.asc

 The staging repository for this release can be found at:
 https://repository.apache.org/content/repositories/orgapachespark-1092/

 The documentation corresponding to this release can be found at:
 http://people.apache.org/~pwendell/spark-1.4.0-rc1-docs/

 Please vote on releasing this package as Apache Spark 1.4.0!

 The vote is open until Friday, May 22, at 17:03 UTC and passes
 if a majority of at least 3 +1 PMC votes are cast.

 [ ] +1 Release this package as Apache Spark 1.4.0
 [ ] -1 Do not release this package because ...

 To learn more about Apache Spark, please see
 http://spark.apache.org/

 == How can I help test this release? ==
 If you are a Spark user, you can help us test this release by
 taking a Spark 1.3 workload and 

Re: [VOTE] Release Apache Spark 1.4.0 (RC1)

2015-05-24 Thread Patrick Wendell
Hey jameszhouyi,

Since SPARK-7119 is not a regression from earlier versions, we won't
hold the release for it. However, please comment on the JIRA if it is
affecting you... it will help us prioritize the bug.

- Patrick

On Fri, May 22, 2015 at 8:41 PM, jameszhouyi yiaz...@gmail.com wrote:
 We came across a Spark SQL issue
 (https://issues.apache.org/jira/browse/SPARK-7119) that causes queries to fail.
 I'm not sure whether this warrants a -1 on RC1.



 --
 View this message in context: 
 http://apache-spark-developers-list.1001551.n3.nabble.com/VOTE-Release-Apache-Spark-1-4-0-RC1-tp12321p12403.html
 Sent from the Apache Spark Developers List mailing list archive at Nabble.com.

 -
 To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
 For additional commands, e-mail: dev-h...@spark.apache.org





Re: [VOTE] Release Apache Spark 1.4.0 (RC1)

2015-05-22 Thread Justin Uang
I'm working on one of the Palantir teams using Spark, and here is our
feedback:

We have encountered three issues when upgrading to spark 1.4.0. I'm not
sure they qualify as a -1, as they come from using non-public APIs and
multiple spark contexts for the purposes of testing, but I do want to bring
them up for awareness =)

   1. Our UDT was serializing to a StringType, but now strings are
   represented internally as UTF8String, so we had to change our UDT to use
   UTF8String.apply() and UTF8String.toString() to convert back to String.
   2. createDataFrame when using UDTs used to accept things in the
   serialized catalyst form. Now, they're supposed to be in the UDT java class
   form (I think this change would've affected us in 1.3.1 already, since we
   were in 1.3.0)
   3. derby database lifecycle management issue with HiveContext. We have
   been using a SparkContextResource JUnit Rule that we wrote, and it sets up
   then tears down a SparkContext and HiveContext between unit test runs
   within the same process (possibly the same thread as well). Multiple
   contexts are not being used at once. It used to work in 1.3.0, but now when
   we try to create the HiveContext for the second unit test, then it
   complains with the following exception. I have a feeling it might have
   something to do with the Hive object being thread local, and us not
   explicitly closing the HiveContext and everything it holds. The full stack
   trace is here: https://gist.github.com/justinuang/0403d49cdeedf91727cd

Caused by: java.sql.SQLException: Failed to start database
'metastore_db' with class loader
org.apache.spark.sql.hive.client.IsolatedClientLoader$$anon$1@5dea2446,
see the next exception for details.
at 
org.apache.derby.impl.jdbc.SQLExceptionFactory.getSQLException(Unknown
Source)


On Wed, May 20, 2015 at 10:35 AM Imran Rashid iras...@cloudera.com wrote:

 -1

 discovered I accidentally removed master & worker json endpoints, will
 restore
 https://issues.apache.org/jira/browse/SPARK-7760

 On Tue, May 19, 2015 at 11:10 AM, Patrick Wendell pwend...@gmail.com
 wrote:

 Please vote on releasing the following candidate as Apache Spark version
 1.4.0!

 The tag to be voted on is v1.4.0-rc1 (commit 777a081):

 https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=777a08166f1fb144146ba32581d4632c3466541e

 The release files, including signatures, digests, etc. can be found at:
 http://people.apache.org/~pwendell/spark-1.4.0-rc1/

 Release artifacts are signed with the following key:
 https://people.apache.org/keys/committer/pwendell.asc

 The staging repository for this release can be found at:
 https://repository.apache.org/content/repositories/orgapachespark-1092/

 The documentation corresponding to this release can be found at:
 http://people.apache.org/~pwendell/spark-1.4.0-rc1-docs/

 Please vote on releasing this package as Apache Spark 1.4.0!

 The vote is open until Friday, May 22, at 17:03 UTC and passes
 if a majority of at least 3 +1 PMC votes are cast.

 [ ] +1 Release this package as Apache Spark 1.4.0
 [ ] -1 Do not release this package because ...

 To learn more about Apache Spark, please see
 http://spark.apache.org/

 == How can I help test this release? ==
 If you are a Spark user, you can help us test this release by
 taking a Spark 1.3 workload and running on this release candidate,
 then reporting any regressions.

 == What justifies a -1 vote for this release? ==
 This vote is happening towards the end of the 1.4 QA period,
 so -1 votes should only occur for significant regressions from 1.3.1.
 Bugs already present in 1.3.X, minor regressions, or bugs related
 to new features will not block this release.






Re: [VOTE] Release Apache Spark 1.4.0 (RC1)

2015-05-22 Thread Shivaram Venkataraman
Thanks for catching this. I'll check with Patrick to see why the R API docs
are not getting included.

On Fri, May 22, 2015 at 2:44 PM, Andrew Psaltis psaltis.and...@gmail.com
wrote:

 All,
 Should all the docs work from
 http://people.apache.org/~pwendell/spark-1.4.0-rc1-docs/ ? If so the R
 API docs 404.


 On Tue, May 19, 2015 at 11:10 AM, Patrick Wendell pwend...@gmail.com
 wrote:

 Please vote on releasing the following candidate as Apache Spark
 version 1.4.0!

 The tag to be voted on is v1.4.0-rc1 (commit 777a081):

 https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=777a08166f1fb144146ba32581d4632c3466541e

 The release files, including signatures, digests, etc. can be found at:
 http://people.apache.org/~pwendell/spark-1.4.0-rc1/

 Release artifacts are signed with the following key:
 https://people.apache.org/keys/committer/pwendell.asc

 The staging repository for this release can be found at:
 https://repository.apache.org/content/repositories/orgapachespark-1092/

 The documentation corresponding to this release can be found at:
 http://people.apache.org/~pwendell/spark-1.4.0-rc1-docs/

 Please vote on releasing this package as Apache Spark 1.4.0!

 The vote is open until Friday, May 22, at 17:03 UTC and passes
 if a majority of at least 3 +1 PMC votes are cast.

 [ ] +1 Release this package as Apache Spark 1.4.0
 [ ] -1 Do not release this package because ...

 To learn more about Apache Spark, please see
 http://spark.apache.org/

 == How can I help test this release? ==
 If you are a Spark user, you can help us test this release by
 taking a Spark 1.3 workload and running on this release candidate,
 then reporting any regressions.

 == What justifies a -1 vote for this release? ==
 This vote is happening towards the end of the 1.4 QA period,
 so -1 votes should only occur for significant regressions from 1.3.1.
 Bugs already present in 1.3.X, minor regressions, or bugs related
 to new features will not block this release.








Re: [VOTE] Release Apache Spark 1.4.0 (RC1)

2015-05-22 Thread Andrew Psaltis
All,
Should all the docs work from
http://people.apache.org/~pwendell/spark-1.4.0-rc1-docs/ ? If so the R API
docs 404.


 On Tue, May 19, 2015 at 11:10 AM, Patrick Wendell pwend...@gmail.com
 wrote:

 Please vote on releasing the following candidate as Apache Spark
 version 1.4.0!

 The tag to be voted on is v1.4.0-rc1 (commit 777a081):

 https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=777a08166f1fb144146ba32581d4632c3466541e

 The release files, including signatures, digests, etc. can be found at:
 http://people.apache.org/~pwendell/spark-1.4.0-rc1/

 Release artifacts are signed with the following key:
 https://people.apache.org/keys/committer/pwendell.asc

 The staging repository for this release can be found at:
 https://repository.apache.org/content/repositories/orgapachespark-1092/

 The documentation corresponding to this release can be found at:
 http://people.apache.org/~pwendell/spark-1.4.0-rc1-docs/

 Please vote on releasing this package as Apache Spark 1.4.0!

 The vote is open until Friday, May 22, at 17:03 UTC and passes
 if a majority of at least 3 +1 PMC votes are cast.

 [ ] +1 Release this package as Apache Spark 1.4.0
 [ ] -1 Do not release this package because ...

 To learn more about Apache Spark, please see
 http://spark.apache.org/

 == How can I help test this release? ==
 If you are a Spark user, you can help us test this release by
 taking a Spark 1.3 workload and running on this release candidate,
 then reporting any regressions.

 == What justifies a -1 vote for this release? ==
 This vote is happening towards the end of the 1.4 QA period,
 so -1 votes should only occur for significant regressions from 1.3.1.
 Bugs already present in 1.3.X, minor regressions, or bugs related
 to new features will not block this release.







Re: [VOTE] Release Apache Spark 1.4.0 (RC1)

2015-05-22 Thread Michael Armbrust
Thanks for the feedback.  As you stated, UDTs are explicitly not a public
API, as we knew we were going to be making breaking changes to them.  We
hope to stabilize / open them up in future releases.  Regarding the Hive
issue, have you tried using TestHive instead?  This is what we use for
testing, and it takes care of creating temporary directories for all
storage.  It also has a reset() function that you can call in between
tests.  If this doesn't work for you, maybe open a JIRA and we can discuss
more there.
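[Editor's note: a sketch of that suggestion, illustrative only — it assumes the spark-hive test artifact (which provides TestHive) is on the test classpath:]

```scala
// Sketch: use TestHive in unit tests instead of hand-managed HiveContexts.
// TestHive is a singleton HiveContext that creates its own temporary
// metastore and warehouse directories, so tests don't fight over ./metastore_db.
// Requires Spark's spark-hive test jar on the test classpath.
import org.apache.spark.sql.hive.test.TestHive

val df = TestHive.sql("SELECT 1 AS one")
df.show()

// Call between tests to return to a clean metastore/warehouse state:
TestHive.reset()
```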

On Fri, May 22, 2015 at 12:56 PM, Justin Uang justin.u...@gmail.com wrote:

 I'm working on one of the Palantir teams using Spark, and here is our
 feedback:

 We have encountered three issues when upgrading to spark 1.4.0. I'm not
 sure they qualify as a -1, as they come from using non-public APIs and
 multiple spark contexts for the purposes of testing, but I do want to bring
 them up for awareness =)

1. Our UDT was serializing to a StringType, but now strings are
represented internally as UTF8String, so we had to change our UDT to use
UTF8String.apply() and UTF8String.toString() to convert back to String.
2. createDataFrame when using UDTs used to accept things in the
serialized catalyst form. Now, they're supposed to be in the UDT java class
form (I think this change would've affected us in 1.3.1 already, since we
were in 1.3.0)
3. derby database lifecycle management issue with HiveContext. We have
been using a SparkContextResource JUnit Rule that we wrote, and it sets up
then tears down a SparkContext and HiveContext between unit test runs
within the same process (possibly the same thread as well). Multiple
contexts are not being used at once. It used to work in 1.3.0, but now when
we try to create the HiveContext for the second unit test, then it
complains with the following exception. I have a feeling it might have
something to do with the Hive object being thread local, and us not
explicitly closing the HiveContext and everything it holds. The full stack
trace is here: https://gist.github.com/justinuang/0403d49cdeedf91727cd

 Caused by: java.sql.SQLException: Failed to start database 'metastore_db' 
 with class loader 
 org.apache.spark.sql.hive.client.IsolatedClientLoader$$anon$1@5dea2446, see 
 the next exception for details.
   at 
 org.apache.derby.impl.jdbc.SQLExceptionFactory.getSQLException(Unknown Source)


 On Wed, May 20, 2015 at 10:35 AM Imran Rashid iras...@cloudera.com
 wrote:

 -1

 discovered I accidentally removed master & worker json endpoints, will
 restore
 https://issues.apache.org/jira/browse/SPARK-7760

 On Tue, May 19, 2015 at 11:10 AM, Patrick Wendell pwend...@gmail.com
 wrote:

 Please vote on releasing the following candidate as Apache Spark version
 1.4.0!

 The tag to be voted on is v1.4.0-rc1 (commit 777a081):

 https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=777a08166f1fb144146ba32581d4632c3466541e

 The release files, including signatures, digests, etc. can be found at:
 http://people.apache.org/~pwendell/spark-1.4.0-rc1/

 Release artifacts are signed with the following key:
 https://people.apache.org/keys/committer/pwendell.asc

 The staging repository for this release can be found at:
 https://repository.apache.org/content/repositories/orgapachespark-1092/

 The documentation corresponding to this release can be found at:
 http://people.apache.org/~pwendell/spark-1.4.0-rc1-docs/

 Please vote on releasing this package as Apache Spark 1.4.0!

 The vote is open until Friday, May 22, at 17:03 UTC and passes
 if a majority of at least 3 +1 PMC votes are cast.

 [ ] +1 Release this package as Apache Spark 1.4.0
 [ ] -1 Do not release this package because ...

 To learn more about Apache Spark, please see
 http://spark.apache.org/

 == How can I help test this release? ==
 If you are a Spark user, you can help us test this release by
 taking a Spark 1.3 workload and running on this release candidate,
 then reporting any regressions.

 == What justifies a -1 vote for this release? ==
 This vote is happening towards the end of the 1.4 QA period,
 so -1 votes should only occur for significant regressions from 1.3.1.
 Bugs already present in 1.3.X, minor regressions, or bugs related
 to new features will not block this release.






Re: [VOTE] Release Apache Spark 1.4.0 (RC1)

2015-05-22 Thread Patrick Wendell
Thanks Andrew, the doc issue should be fixed in RC2 (if not, please
chime in!). R was missing in the build environment.

- Patrick

On Fri, May 22, 2015 at 3:33 PM, Shivaram Venkataraman
shiva...@eecs.berkeley.edu wrote:
 Thanks for catching this. I'll check with Patrick to see why the R API docs
 are not getting included.

 On Fri, May 22, 2015 at 2:44 PM, Andrew Psaltis psaltis.and...@gmail.com
 wrote:

 All,
 Should all the docs work from
 http://people.apache.org/~pwendell/spark-1.4.0-rc1-docs/ ? If so the R API
 docs 404.


 On Tue, May 19, 2015 at 11:10 AM, Patrick Wendell pwend...@gmail.com
 wrote:

 Please vote on releasing the following candidate as Apache Spark
 version 1.4.0!

 The tag to be voted on is v1.4.0-rc1 (commit 777a081):

 https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=777a08166f1fb144146ba32581d4632c3466541e

 The release files, including signatures, digests, etc. can be found
 at:
 http://people.apache.org/~pwendell/spark-1.4.0-rc1/

 Release artifacts are signed with the following key:
 https://people.apache.org/keys/committer/pwendell.asc

 The staging repository for this release can be found at:

 https://repository.apache.org/content/repositories/orgapachespark-1092/

 The documentation corresponding to this release can be found at:
 http://people.apache.org/~pwendell/spark-1.4.0-rc1-docs/

 Please vote on releasing this package as Apache Spark 1.4.0!

 The vote is open until Friday, May 22, at 17:03 UTC and passes
 if a majority of at least 3 +1 PMC votes are cast.

 [ ] +1 Release this package as Apache Spark 1.4.0
 [ ] -1 Do not release this package because ...

 To learn more about Apache Spark, please see
 http://spark.apache.org/

 == How can I help test this release? ==
 If you are a Spark user, you can help us test this release by
 taking a Spark 1.3 workload and running on this release candidate,
 then reporting any regressions.

 == What justifies a -1 vote for this release? ==
 This vote is happening towards the end of the 1.4 QA period,
 so -1 votes should only occur for significant regressions from 1.3.1.
 Bugs already present in 1.3.X, minor regressions, or bugs related
 to new features will not block this release.










Re: [VOTE] Release Apache Spark 1.4.0 (RC1)

2015-05-22 Thread jameszhouyi
We came across a Spark SQL issue
(https://issues.apache.org/jira/browse/SPARK-7119) that causes queries to fail.
I'm not sure whether this warrants a -1 on RC1.



--
View this message in context: 
http://apache-spark-developers-list.1001551.n3.nabble.com/VOTE-Release-Apache-Spark-1-4-0-RC1-tp12321p12403.html
Sent from the Apache Spark Developers List mailing list archive at Nabble.com.




Re: [VOTE] Release Apache Spark 1.4.0 (RC1)

2015-05-20 Thread Sean Owen
Signature, hashes, LICENSE/NOTICE, and source tarball look OK. I built
for Hadoop 2.6 (-Pyarn -Phive -Phadoop-2.6) on Ubuntu from source and
tests pass. The release looks OK except that I'd like to resolve the
Blockers before giving a +1.
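[Editor's note: for anyone reproducing this, the build invocation described above is along these lines — a sketch only; the `-DskipTests` flag and goals are assumptions beyond the profiles Sean listed:]

```shell
# Sketch of the source build described above: Hadoop 2.6 profile with YARN and Hive.
# -DskipTests speeds up the package step; run the test suites separately afterwards.
build/mvn -Pyarn -Phive -Phadoop-2.6 -DskipTests clean package
```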

I'm seeing some test failures, and wanted to cross-check with others.
They're all in Hive. Some I think are due to Java 8 differences and
are just test issues; they expect an exact output from a query plan
and some HashSet ordering differences make it trivially different. If
so, I've seen this in the past and we could ignore it for now, but
would be good to get a second set of eyes. The trace is big so it's at
the end.

When rerunning with Java 7 I get a different error due to Hive version support:

- success sanity check *** FAILED ***
  java.lang.RuntimeException: [download failed:
org.jboss.netty#netty;3.2.2.Final!netty.jar(bundle), download failed:
commons-net#commons-net;3.1!commons-net.jar]
  at 
org.apache.spark.deploy.SparkSubmitUtils$.resolveMavenCoordinates(SparkSubmit.scala:972)
  at 
org.apache.spark.sql.hive.client.IsolatedClientLoader$$anonfun$3.apply(IsolatedClientLoader.scala:62)
  ...



Hive / possible Java 8 test issue:

- windowing.q -- 20. testSTATs *** FAILED ***
  Results do not match for windowing.q -- 20. testSTATs:
  == Parsed Logical Plan ==
  'WithWindowDefinition Map(w1 -> WindowSpecDefinition ROWS BETWEEN 2
PRECEDING AND 2 FOLLOWING)
   'Project ['p_mfgr,'p_name,'p_size,UnresolvedWindowExpression
WindowSpecReference(w1)
   UnresolvedWindowFunction stddev
UnresolvedAttribute [p_retailprice]
   AS sdev#159481,UnresolvedWindowExpression WindowSpecReference(w1)
   UnresolvedWindowFunction stddev_pop
UnresolvedAttribute [p_retailprice]
   AS sdev_pop#159482,UnresolvedWindowExpression WindowSpecReference(w1)
   UnresolvedWindowFunction collect_set
UnresolvedAttribute [p_size]
   AS uniq_size#159483,UnresolvedWindowExpression WindowSpecReference(w1)
   UnresolvedWindowFunction variance
UnresolvedAttribute [p_retailprice]
   AS var#159484,UnresolvedWindowExpression WindowSpecReference(w1)
   UnresolvedWindowFunction corr
UnresolvedAttribute [p_size]
UnresolvedAttribute [p_retailprice]
   AS cor#159485,UnresolvedWindowExpression WindowSpecReference(w1)
   UnresolvedWindowFunction covar_pop
UnresolvedAttribute [p_size]
UnresolvedAttribute [p_retailprice]
   AS covarp#159486]
'UnresolvedRelation [part], None

  == Analyzed Logical Plan ==
  p_mfgr: string, p_name: string, p_size: int, sdev: double, sdev_pop:
double, uniq_size: array<int>, var: double, cor: double, covarp:
double
  Project 
[p_mfgr#159489,p_name#159488,p_size#159492,sdev#159481,sdev_pop#159482,uniq_size#159483,var#159484,cor#159485,covarp#159486]
   Window [p_mfgr#159489,p_name#159488,p_size#159492,p_retailprice#159494],
[HiveWindowFunction#org.apache.hadoop.hive.ql.udf.generic.GenericUDAFStd(p_retailprice#159494)
WindowSpecDefinition ROWS BETWEEN 2 PRECEDING AND 2 FOLLOWING AS
sdev#159481,HiveWindowFunction#org.apache.hadoop.hive.ql.udf.generic.GenericUDAFStd(p_retailprice#159494)
WindowSpecDefinition ROWS BETWEEN 2 PRECEDING AND 2 FOLLOWING AS
sdev_pop#159482,HiveWindowFunction#org.apache.hadoop.hive.ql.udf.generic.GenericUDAFCollectSet(p_size#159492)
WindowSpecDefinition ROWS BETWEEN 2 PRECEDING AND 2 FOLLOWING AS
uniq_size#159483,HiveWindowFunction#org.apache.hadoop.hive.ql.udf.generic.GenericUDAFVariance(p_retailprice#159494)
WindowSpecDefinition ROWS BETWEEN 2 PRECEDING AND 2 FOLLOWING AS
var#159484,HiveWindowFunction#org.apache.hadoop.hive.ql.udf.generic.GenericUDAFCorrelation(p_size#159492,p_retailprice#159494)
WindowSpecDefinition ROWS BETWEEN 2 PRECEDING AND 2 FOLLOWING AS
cor#159485,HiveWindowFunction#org.apache.hadoop.hive.ql.udf.generic.GenericUDAFCovariance(p_size#159492,p_retailprice#159494)
WindowSpecDefinition ROWS BETWEEN 2 PRECEDING AND 2 FOLLOWING AS
covarp#159486], WindowSpecDefinition ROWS BETWEEN 2 PRECEDING AND 2
FOLLOWING
Project [p_mfgr#159489,p_name#159488,p_size#159492,p_retailprice#159494]
 MetastoreRelation default, part, None

  == Optimized Logical Plan ==
  Project 
[p_mfgr#159489,p_name#159488,p_size#159492,sdev#159481,sdev_pop#159482,uniq_size#159483,var#159484,cor#159485,covarp#159486]
   Window [p_mfgr#159489,p_name#159488,p_size#159492,p_retailprice#159494],
[HiveWindowFunction#org.apache.hadoop.hive.ql.udf.generic.GenericUDAFStd(p_retailprice#159494)
WindowSpecDefinition ROWS BETWEEN 2 PRECEDING AND 2 FOLLOWING AS
sdev#159481,HiveWindowFunction#org.apache.hadoop.hive.ql.udf.generic.GenericUDAFStd(p_retailprice#159494)
WindowSpecDefinition ROWS BETWEEN 2 PRECEDING AND 2 FOLLOWING AS
sdev_pop#159482,HiveWindowFunction#org.apache.hadoop.hive.ql.udf.generic.GenericUDAFCollectSet(p_size#159492)
WindowSpecDefinition ROWS BETWEEN 2 PRECEDING AND 2 FOLLOWING AS
uniq_size#159483,HiveWindowFunction#org.apache.hadoop.hive.ql.udf.generic.GenericUDAFVariance(p_retailprice#159494)
WindowSpecDefinition ROWS BETWEEN 2 

Re: [VOTE] Release Apache Spark 1.4.0 (RC1)

2015-05-20 Thread Imran Rashid
-1

discovered I accidentally removed the master & worker JSON endpoints; will
restore:
https://issues.apache.org/jira/browse/SPARK-7760

On Tue, May 19, 2015 at 11:10 AM, Patrick Wendell pwend...@gmail.com
wrote:

 Please vote on releasing the following candidate as Apache Spark version
 1.4.0!

 The tag to be voted on is v1.4.0-rc1 (commit 777a081):

 https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=777a08166f1fb144146ba32581d4632c3466541e

 The release files, including signatures, digests, etc. can be found at:
 http://people.apache.org/~pwendell/spark-1.4.0-rc1/

 Release artifacts are signed with the following key:
 https://people.apache.org/keys/committer/pwendell.asc

 The staging repository for this release can be found at:
 https://repository.apache.org/content/repositories/orgapachespark-1092/

 The documentation corresponding to this release can be found at:
 http://people.apache.org/~pwendell/spark-1.4.0-rc1-docs/

 Please vote on releasing this package as Apache Spark 1.4.0!

 The vote is open until Friday, May 22, at 17:03 UTC and passes
 if a majority of at least 3 +1 PMC votes are cast.

 [ ] +1 Release this package as Apache Spark 1.4.0
 [ ] -1 Do not release this package because ...

 To learn more about Apache Spark, please see
 http://spark.apache.org/

 == How can I help test this release? ==
 If you are a Spark user, you can help us test this release by
 taking a Spark 1.3 workload and running on this release candidate,
 then reporting any regressions.

 == What justifies a -1 vote for this release? ==
 This vote is happening towards the end of the 1.4 QA period,
 so -1 votes should only occur for significant regressions from 1.3.1.
 Bugs already present in 1.3.X, minor regressions, or bugs related
 to new features will not block this release.
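The pass rule quoted above ("passes if a majority of at least 3 +1 PMC votes are cast") can be read as a simple predicate. A hedged sketch of that convention as stated in the vote call, not an official ASF tool:

```python
def vote_passes(pmc_votes):
    """pmc_votes: iterable of +1 / -1 integers cast by PMC members.

    Per the rule quoted above, the vote passes with a majority of the
    votes cast and at least three +1 votes."""
    plus = sum(1 for v in pmc_votes if v == 1)
    minus = sum(1 for v in pmc_votes if v == -1)
    return plus >= 3 and plus > minus

# Three +1s and no -1s: passes.
assert vote_passes([1, 1, 1])
# Only two +1s: fails regardless of the margin.
assert not vote_passes([1, 1])
```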

 -
 To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
 For additional commands, e-mail: dev-h...@spark.apache.org




Re: [VOTE] Release Apache Spark 1.4.0 (RC1)

2015-05-19 Thread Krishna Sankar
Quick tests from my side - looks OK. The results are the same as or very
similar to 1.3.1. Will add dataframes et al in future tests.

+1 (non-binding, of course)

1. Compiled OSX 10.10 (Yosemite) OK Total time: 17:42 min
 mvn clean package -Pyarn -Dyarn.version=2.6.0 -Phadoop-2.4
-Dhadoop.version=2.6.0 -Phive -DskipTests
2. Tested pyspark, MLlib - running as well as comparing results with 1.3.1
2.1. statistics (min,max,mean,Pearson,Spearman) OK
2.2. Linear/Ridge/Lasso Regression OK
2.3. Decision Tree, Naive Bayes OK
2.4. KMeans OK
   Center And Scale OK
2.5. RDD operations OK
  State of the Union Texts - MapReduce, Filter,sortByKey (word count)
2.6. Recommendation (Movielens medium dataset ~1 M ratings) OK
   Model evaluation/optimization (rank, numIter, lambda) with itertools
OK
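For reference, the kind of sanity check behind item 2.1 can be mirrored in plain Python (a standalone sketch; an actual check would compare MLlib's statistics output against values computed like these):

```python
import statistics

# Toy columns standing in for a DataFrame/RDD column pair.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 6.0, 8.0, 10.0]  # exactly linear in xs

col_min, col_max, col_mean = min(xs), max(xs), statistics.mean(xs)

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    ma, mb = statistics.mean(a), statistics.mean(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / (var_a * var_b) ** 0.5

r = pearson(xs, ys)  # a perfect linear relationship gives r == 1.0
```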

Cheers
k/

On Tue, May 19, 2015 at 9:10 AM, Patrick Wendell pwend...@gmail.com wrote:

 Please vote on releasing the following candidate as Apache Spark version
 1.4.0!





Re: [VOTE] Release Apache Spark 1.4.0 (RC1)

2015-05-19 Thread Patrick Wendell
HI all,

I've created another release repository where the release is
identified with the version 1.4.0-rc1:

https://repository.apache.org/content/repositories/orgapachespark-1093/

On Tue, May 19, 2015 at 5:36 PM, Krishna Sankar ksanka...@gmail.com wrote:
 Quick tests from my side - looks OK. The results are same or very similar to
 1.3.1. Will add dataframes et al in future tests.


 On Tue, May 19, 2015 at 9:10 AM, Patrick Wendell pwend...@gmail.com wrote:

 Please vote on releasing the following candidate as Apache Spark version
 1.4.0!




Re: [VOTE] Release Apache Spark 1.4.0 (RC1)

2015-05-19 Thread Patrick Wendell
Punya,

Let me see if I can publish these under rc1 as well. In the future
this will all be automated, but currently it's a somewhat manual task.

- Patrick

On Tue, May 19, 2015 at 9:32 AM, Punyashloka Biswal
punya.bis...@gmail.com wrote:
 When publishing future RCs to the staging repository, would it be possible
 to use a version number that includes the rc1 designation? In the current
 setup, when I run a build against the artifacts at
 https://repository.apache.org/content/repositories/orgapachespark-1092/org/apache/spark/spark-core_2.10/1.4.0/,
 my local Maven cache will get polluted with things that claim to be 1.4.0
 but aren't. It would be preferable for the version number to be 1.4.0-rc1
 instead.
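If a local cache does get polluted this way, the stale artifacts can be cleared by removing the corresponding version directory under ~/.m2/repository. A minimal sketch (the paths and names below are illustrative):

```python
import pathlib
import shutil

def purge_cached_version(repo_root, group_path, artifact, version):
    """Delete one artifact version from a local Maven repository layout.

    Returns True if something was removed. All names here are illustrative;
    adapt repo_root/group_path to the actual polluted cache."""
    target = pathlib.Path(repo_root) / group_path / artifact / version
    if target.is_dir():
        shutil.rmtree(target)
        return True
    return False

# Example (illustrative): clear a 1.4.0 entry that actually came from staging.
# purge_cached_version(pathlib.Path.home() / ".m2" / "repository",
#                      "org/apache/spark", "spark-core_2.10", "1.4.0")
```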

 Thanks!
 Punya


 On Tue, May 19, 2015 at 12:20 PM Sean Owen so...@cloudera.com wrote:

 Before I vote, I wanted to point out there are still 9 Blockers for 1.4.0.
 I'd like to use this status to really mean "must happen before the release."
 Many of these may be already fixed, or aren't really blockers -- they can
 just be updated accordingly.

 I bet at least one will require further work if it's really meant for 1.4,
 so all this means is there is likely to be another RC. We should still kick
 the tires on RC1.

 (I also assume we should be extra conservative about what is merged into
 1.4 at this point.)


 SPARK-6784 SQL Clean up all the inbound/outbound conversions for DateType
 Adrian Wang

 SPARK-6811 SparkR Building binary R packages for SparkR Shivaram
 Venkataraman

 SPARK-6941 SQL Provide a better error message to explain that tables
 created from RDDs are immutable
 SPARK-7158 SQL collect and take return different results
 SPARK-7478 SQL Add a SQLContext.getOrCreate to maintain a singleton
 instance of SQLContext Tathagata Das

 SPARK-7616 SQL Overwriting a partitioned parquet table corrupt data Cheng
 Lian

 SPARK-7654 SQL DataFrameReader and DataFrameWriter for input/output API
 Reynold Xin

 SPARK-7662 SQL Exception of multi-attribute generator analysis in
 projection

 SPARK-7713 SQL Use shared broadcast hadoop conf for partitioned table
 scan. Yin Huai


 On Tue, May 19, 2015 at 5:10 PM, Patrick Wendell pwend...@gmail.com
 wrote:

 Please vote on releasing the following candidate as Apache Spark version
 1.4.0!




Re: [VOTE] Release Apache Spark 1.4.0 (RC1)

2015-05-19 Thread Punyashloka Biswal
Thanks! I realize that manipulating the published version in the pom is a
bit inconvenient, but it's really useful to have clear version identifiers
when we're juggling different versions and testing them out. For example,
this will come in handy when we compare 1.4.0-rc1 and 1.4.0-rc2 in a couple
of weeks :)

Punya

On Tue, May 19, 2015 at 12:39 PM Patrick Wendell pwend...@gmail.com wrote:

 Punya,

 Let me see if I can publish these under rc1 as well. In the future
 this will all be automated but current it's a somewhat manual task.

 - Patrick

 



Re: [VOTE] Release Apache Spark 1.4.0 (RC1)

2015-05-19 Thread Patrick Wendell
A couple of other process things:

1. Please *keep voting* (+1/-1) on this thread even if we find some
issues, until we cut RC2. This lets us pipeline the QA.
2. The SQL team owes a JIRA clean-up (forthcoming shortly)... there
are still a few Blockers that aren't really blockers.


On Tue, May 19, 2015 at 9:10 AM, Patrick Wendell pwend...@gmail.com wrote:
 Please vote on releasing the following candidate as Apache Spark version 
 1.4.0!




Re: [VOTE] Release Apache Spark 1.4.0 (RC1)

2015-05-19 Thread Sean Owen
Before I vote, I wanted to point out there are still 9 Blockers for 1.4.0.
I'd like to use this status to really mean "must happen before the
release." Many of these may be already fixed, or aren't really blockers --
they can just be updated accordingly.

I bet at least one will require further work if it's really meant for 1.4,
so all this means is there is likely to be another RC. We should still kick
the tires on RC1.

(I also assume we should be extra conservative about what is merged into
1.4 at this point.)


SPARK-6784 SQL Clean up all the inbound/outbound conversions for
DateType Adrian
Wang

SPARK-6811 SparkR Building binary R packages for SparkR Shivaram
Venkataraman

SPARK-6941 SQL Provide a better error message to explain that tables
created from RDDs are immutable
SPARK-7158 SQL collect and take return different results
SPARK-7478 SQL Add a SQLContext.getOrCreate to maintain a singleton
instance of SQLContext Tathagata Das

SPARK-7616 SQL Overwriting a partitioned parquet table corrupt data Cheng
Lian

SPARK-7654 SQL DataFrameReader and DataFrameWriter for input/output API Reynold
Xin

SPARK-7662 SQL Exception of multi-attribute generator analysis in projection

SPARK-7713 SQL Use shared broadcast hadoop conf for partitioned table scan. Yin
Huai
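For SPARK-7478, the requested getOrCreate behavior is the classic thread-safe lazy singleton. A hedged sketch of the general pattern in Python (illustrative only, not the eventual Spark API):

```python
import threading

class Context:
    """Stand-in for an expensive, process-wide context (e.g. a SQL context)."""
    def __init__(self, name):
        self.name = name

_lock = threading.Lock()
_instance = None

def get_or_create(name="default"):
    """Return the single shared Context, creating it on the first call.

    Double-checked locking keeps the common path cheap while staying
    thread-safe; later calls ignore their arguments, as a singleton must."""
    global _instance
    if _instance is None:
        with _lock:
            if _instance is None:
                _instance = Context(name)
    return _instance

a = get_or_create("first")
b = get_or_create("second")   # returns the existing instance
assert a is b and a.name == "first"
```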


On Tue, May 19, 2015 at 5:10 PM, Patrick Wendell pwend...@gmail.com wrote:

 Please vote on releasing the following candidate as Apache Spark version
 1.4.0!
