Re: [VOTE] Release Apache Spark 2.0.0 (RC5)

2016-07-26 Thread Stephen Hellberg
Yeah, I thought the vote was closed... but I couldn't think of a better
thread to remark upon!
That's a useful comment on Derby's role - thanks.  We'd just attempted a
build-and-test run with the Derby level revised to the current 10.12.1.1 and
hadn't observed any issues... a PR will be forthcoming.
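
For anyone who wants to check which Derby version a given Spark build actually
bundles, a minimal sketch in Python (the jars/ location is an assumption based
on the 2.0.0 binary layout, not something spelled out in this thread):

    import glob, os, re

    # Assumption: SPARK_HOME points at an unpacked binary distribution whose
    # dependency jars live under jars/ (the Spark 2.0.x layout).
    spark_home = os.environ.get("SPARK_HOME", ".")
    for jar in sorted(glob.glob(os.path.join(spark_home, "jars", "derby-*.jar"))):
        match = re.search(r"derby-(.+)\.jar$", os.path.basename(jar))
        version = match.group(1) if match else "unknown"
        # Per this thread, anything below 10.12.1.1 is affected by CVE-2015-1832.
        print("%s -> Derby %s" % (jar, version))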






Re: [VOTE] Release Apache Spark 2.0.0 (RC5)

2016-07-26 Thread Sean Owen
The release vote has already closed and passed. Derby is only used in
tests AFAIK, so I don't think this is even critical, let alone a
blocker. Updating is fine, though; open a PR.

On Tue, Jul 26, 2016 at 3:37 PM, Stephen Hellberg  wrote:
>  -1   Sorry, I've just noted that the RC5 proposal includes shipping Derby @
> 10.11.1.1, which is vulnerable to CVE-2015-1832.
> It would be ideal if we could instead ship 10.12.1.1 real soon.
>
>
>




Re: [VOTE] Release Apache Spark 2.0.0 (RC5)

2016-07-26 Thread Stephen Hellberg
 -1   Sorry, I've just noted that the RC5 proposal includes shipping Derby @
10.11.1.1, which is vulnerable to CVE-2015-1832.
It would be ideal if we could instead ship 10.12.1.1 real soon.






Re: [VOTE] Release Apache Spark 2.0.0 (RC5)

2016-07-25 Thread Luciano Resende
When are we planning to push the release Maven artifacts? We are waiting
for this in order to push an official Apache Bahir release supporting Spark
2.0.

On Sat, Jul 23, 2016 at 7:05 AM, Reynold Xin  wrote:

> The vote has passed with the following +1 votes and no -1 votes. I will
> work on packaging the new release next week.
>
>
> +1
>
> Reynold Xin*
> Sean Owen*
> Shivaram Venkataraman*
> Jonathan Kelly
> Joseph E. Gonzalez*
> Krishna Sankar
> Dongjoon Hyun
> Ricardo Almeida
> Joseph Bradley*
> Matei Zaharia*
> Luciano Resende
> Holden Karau
> Michael Armbrust*
> Felix Cheung
> Suresh Thalamati
> Kousuke Saruta
> Xiao Li
>
>
> * binding votes
>
>
> On July 19, 2016 at 7:35:19 PM, Reynold Xin (r...@databricks.com) wrote:
>
> Please vote on releasing the following candidate as Apache Spark version
> 2.0.0. The vote is open until Friday, July 22, 2016 at 20:00 PDT and passes
> if a majority of at least 3 +1 PMC votes are cast.
>
> [ ] +1 Release this package as Apache Spark 2.0.0
> [ ] -1 Do not release this package because ...
>
>
> The tag to be voted on is v2.0.0-rc5
> (13650fc58e1fcf2cf2a26ba11c819185ae1acc1f).
>
> This release candidate resolves ~2500 issues:
> https://s.apache.org/spark-2.0.0-jira
>
> The release files, including signatures, digests, etc. can be found at:
> http://people.apache.org/~pwendell/spark-releases/spark-2.0.0-rc5-bin/
>
> Release artifacts are signed with the following key:
> https://people.apache.org/keys/committer/pwendell.asc
>
> The staging repository for this release can be found at:
> https://repository.apache.org/content/repositories/orgapachespark-1195/
>
> The documentation corresponding to this release can be found at:
> http://people.apache.org/~pwendell/spark-releases/spark-2.0.0-rc5-docs/
>
>
> =
> How can I help test this release?
> =
> If you are a Spark user, you can help us test this release by taking an
> existing Spark workload and running on this release candidate, then
> reporting any regressions from 1.x.
>
> ==
> What justifies a -1 vote for this release?
> ==
> Critical bugs impacting major functionalities.
>
> Bugs already present in 1.x, missing features, or bugs related to new
> features will not necessarily block this release. Note that historically
> Spark documentation has been published on the website separately from the
> main release so we do not need to block the release due to documentation
> errors either.
>
>


-- 
Luciano Resende
http://twitter.com/lresende1975
http://lresende.blogspot.com/


Re: [VOTE] Release Apache Spark 2.0.0 (RC5)

2016-07-23 Thread Ewan Leith
Ok cool, I didn't vote as I've done no real testing myself, and I think the
window had already closed anyway.

I'm happy to wait for 2.0.1 for our systems.

Thanks,
Ewan

On 23 Jul 2016 07:07, Reynold Xin  wrote:
Ewan, not sure if you wanted to explicitly -1, so I didn’t include you in that.

I will document this as a known issue in the release notes. We have other bugs 
that we have fixed since RC5, and we can fix those together in 2.0.1.


On July 22, 2016 at 10:24:32 PM, Ewan Leith 
(ewan.le...@realitymine.com) wrote:

I think this new issue in JIRA blocks the release unfortunately?

https://issues.apache.org/jira/browse/SPARK-16664 - Persist call on data frames 
with more than 200 columns is wiping out the data

Otherwise there'll need to be 2.0.1 pretty much right after?

Thanks,
Ewan

On 23 Jul 2016 03:46, Xiao Li  wrote:
+1

2016-07-22 19:32 GMT-07:00 Kousuke Saruta :

+1 (non-binding)

Tested on my cluster with three slave nodes.


On 2016/07/23 10:25, Suresh Thalamati wrote:
+1 (non-binding)

Tested data source api , and jdbc data sources.


On Jul 19, 2016, at 7:35 PM, Reynold Xin  wrote:

Please vote on releasing the following candidate as Apache Spark version 2.0.0. 
The vote is open until Friday, July 22, 2016 at 20:00 PDT and passes if a 
majority of at least 3 +1 PMC votes are cast.

[ ] +1 Release this package as Apache Spark 2.0.0
[ ] -1 Do not release this package because ...


The tag to be voted on is v2.0.0-rc5 (13650fc58e1fcf2cf2a26ba11c819185ae1acc1f).

This release candidate resolves ~2500 issues: 
https://s.apache.org/spark-2.0.0-jira

The release files, including signatures, digests, etc. can be found at:
http://people.apache.org/~pwendell/spark-releases/spark-2.0.0-rc5-bin/

Release artifacts are signed with the following key:
https://people.apache.org/keys/committer/pwendell.asc

The staging repository for this release can be found at:
https://repository.apache.org/content/repositories/orgapachespark-1195/

The documentation corresponding to this release can be found at:
http://people.apache.org/~pwendell/spark-releases/spark-2.0.0-rc5-docs/


=
How can I help test this release?
=
If you are a Spark user, you can help us test this release by taking an 
existing Spark workload and running on this release candidate, then reporting 
any regressions from 1.x.

==
What justifies a -1 vote for this release?
==
Critical bugs impacting major functionalities.

Bugs already present in 1.x, missing features, or bugs related to new features 
will not necessarily block this release. Note that historically Spark 
documentation has been published on the website separately from the main 
release so we do not need to block the release due to documentation errors 
either.








Re: [VOTE] Release Apache Spark 2.0.0 (RC5)

2016-07-23 Thread Reynold Xin
Ewan, not sure if you wanted to explicitly -1, so I didn’t include you in
that.

I will document this as a known issue in the release notes. We have other
bugs that we have fixed since RC5, and we can fix those together in 2.0.1.

On July 22, 2016 at 10:24:32 PM, Ewan Leith (ewan.le...@realitymine.com)
wrote:

I think this new issue in JIRA blocks the release unfortunately?

https://issues.apache.org/jira/browse/SPARK-16664 - Persist call on data
frames with more than 200 columns is wiping out the data

Otherwise there'll need to be 2.0.1 pretty much right after?

Thanks,
Ewan

On 23 Jul 2016 03:46, Xiao Li  wrote:

+1

2016-07-22 19:32 GMT-07:00 Kousuke Saruta :

+1 (non-binding)

Tested on my cluster with three slave nodes.

On 2016/07/23 10:25, Suresh Thalamati wrote:

+1 (non-binding)

Tested data source api , and jdbc data sources.


On Jul 19, 2016, at 7:35 PM, Reynold Xin  wrote:

Please vote on releasing the following candidate as Apache Spark version
2.0.0. The vote is open until Friday, July 22, 2016 at 20:00 PDT and passes
if a majority of at least 3 +1 PMC votes are cast.

[ ] +1 Release this package as Apache Spark 2.0.0
[ ] -1 Do not release this package because ...


The tag to be voted on is v2.0.0-rc5
(13650fc58e1fcf2cf2a26ba11c819185ae1acc1f).

This release candidate resolves ~2500 issues:
https://s.apache.org/spark-2.0.0-jira

The release files, including signatures, digests, etc. can be found at:
http://people.apache.org/~pwendell/spark-releases/spark-2.0.0-rc5-bin/

Release artifacts are signed with the following key:
https://people.apache.org/keys/committer/pwendell.asc

The staging repository for this release can be found at:
https://repository.apache.org/content/repositories/orgapachespark-1195/

The documentation corresponding to this release can be found at:
http://people.apache.org/~pwendell/spark-releases/spark-2.0.0-rc5-docs/


=
How can I help test this release?
=
If you are a Spark user, you can help us test this release by taking an
existing Spark workload and running on this release candidate, then
reporting any regressions from 1.x.

==
What justifies a -1 vote for this release?
==
Critical bugs impacting major functionalities.

Bugs already present in 1.x, missing features, or bugs related to new
features will not necessarily block this release. Note that historically
Spark documentation has been published on the website separately from the
main release so we do not need to block the release due to documentation
errors either.


Re: [VOTE] Release Apache Spark 2.0.0 (RC5)

2016-07-23 Thread Reynold Xin
The vote has passed with the following +1 votes and no -1 votes. I will
work on packaging the new release next week.


+1

Reynold Xin*
Sean Owen*
Shivaram Venkataraman*
Jonathan Kelly
Joseph E. Gonzalez*
Krishna Sankar
Dongjoon Hyun
Ricardo Almeida
Joseph Bradley*
Matei Zaharia*
Luciano Resende
Holden Karau
Michael Armbrust*
Felix Cheung
Suresh Thalamati
Kousuke Saruta
Xiao Li


* binding votes


On July 19, 2016 at 7:35:19 PM, Reynold Xin (r...@databricks.com) wrote:

Please vote on releasing the following candidate as Apache Spark version
2.0.0. The vote is open until Friday, July 22, 2016 at 20:00 PDT and passes
if a majority of at least 3 +1 PMC votes are cast.

[ ] +1 Release this package as Apache Spark 2.0.0
[ ] -1 Do not release this package because ...


The tag to be voted on is v2.0.0-rc5
(13650fc58e1fcf2cf2a26ba11c819185ae1acc1f).

This release candidate resolves ~2500 issues:
https://s.apache.org/spark-2.0.0-jira

The release files, including signatures, digests, etc. can be found at:
http://people.apache.org/~pwendell/spark-releases/spark-2.0.0-rc5-bin/

Release artifacts are signed with the following key:
https://people.apache.org/keys/committer/pwendell.asc

The staging repository for this release can be found at:
https://repository.apache.org/content/repositories/orgapachespark-1195/

The documentation corresponding to this release can be found at:
http://people.apache.org/~pwendell/spark-releases/spark-2.0.0-rc5-docs/


=
How can I help test this release?
=
If you are a Spark user, you can help us test this release by taking an
existing Spark workload and running on this release candidate, then
reporting any regressions from 1.x.

==
What justifies a -1 vote for this release?
==
Critical bugs impacting major functionalities.

Bugs already present in 1.x, missing features, or bugs related to new
features will not necessarily block this release. Note that historically
Spark documentation has been published on the website separately from the
main release so we do not need to block the release due to documentation
errors either.


Re: [VOTE] Release Apache Spark 2.0.0 (RC5)

2016-07-22 Thread Ewan Leith
I think this new issue in JIRA blocks the release unfortunately?

https://issues.apache.org/jira/browse/SPARK-16664 - Persist call on data frames 
with more than 200 columns is wiping out the data

Otherwise there'll need to be 2.0.1 pretty much right after?

Thanks,
Ewan
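
For reference, a minimal sketch of the kind of reproduction SPARK-16664
describes - persisting a DataFrame with more than 200 columns - with the
column count and data invented for illustration and pyspark 2.0 assumed:

    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .master("local[2]")
             .appName("spark-16664-check")
             .getOrCreate())

    num_cols = 250  # anything above 200 columns, per the JIRA title
    columns = ["c%d" % i for i in range(num_cols)]
    rows = [tuple(range(num_cols)), tuple(range(num_cols, 2 * num_cols))]
    df = spark.createDataFrame(rows, columns)

    df.persist()
    df.count()   # materialise the cache
    df.show(2)   # per the report, cached rows could come back wiped/nulled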

On 23 Jul 2016 03:46, Xiao Li  wrote:
+1

2016-07-22 19:32 GMT-07:00 Kousuke Saruta :

+1 (non-binding)

Tested on my cluster with three slave nodes.


On 2016/07/23 10:25, Suresh Thalamati wrote:
+1 (non-binding)

Tested data source api , and jdbc data sources.


On Jul 19, 2016, at 7:35 PM, Reynold Xin  wrote:

Please vote on releasing the following candidate as Apache Spark version 2.0.0. 
The vote is open until Friday, July 22, 2016 at 20:00 PDT and passes if a 
majority of at least 3 +1 PMC votes are cast.

[ ] +1 Release this package as Apache Spark 2.0.0
[ ] -1 Do not release this package because ...


The tag to be voted on is v2.0.0-rc5 (13650fc58e1fcf2cf2a26ba11c819185ae1acc1f).

This release candidate resolves ~2500 issues: 
https://s.apache.org/spark-2.0.0-jira

The release files, including signatures, digests, etc. can be found at:
http://people.apache.org/~pwendell/spark-releases/spark-2.0.0-rc5-bin/

Release artifacts are signed with the following key:
https://people.apache.org/keys/committer/pwendell.asc

The staging repository for this release can be found at:
https://repository.apache.org/content/repositories/orgapachespark-1195/

The documentation corresponding to this release can be found at:
http://people.apache.org/~pwendell/spark-releases/spark-2.0.0-rc5-docs/


=
How can I help test this release?
=
If you are a Spark user, you can help us test this release by taking an 
existing Spark workload and running on this release candidate, then reporting 
any regressions from 1.x.

==
What justifies a -1 vote for this release?
==
Critical bugs impacting major functionalities.

Bugs already present in 1.x, missing features, or bugs related to new features 
will not necessarily block this release. Note that historically Spark 
documentation has been published on the website separately from the main 
release so we do not need to block the release due to documentation errors 
either.







Re: [VOTE] Release Apache Spark 2.0.0 (RC5)

2016-07-22 Thread Xiao Li
+1

2016-07-22 19:32 GMT-07:00 Kousuke Saruta :

> +1 (non-binding)
>
> Tested on my cluster with three slave nodes.
>
> On 2016/07/23 10:25, Suresh Thalamati wrote:
>
> +1 (non-binding)
>
> Tested data source api , and jdbc data sources.
>
>
> On Jul 19, 2016, at 7:35 PM, Reynold Xin  wrote:
>
> Please vote on releasing the following candidate as Apache Spark version
> 2.0.0. The vote is open until Friday, July 22, 2016 at 20:00 PDT and passes
> if a majority of at least 3 +1 PMC votes are cast.
>
> [ ] +1 Release this package as Apache Spark 2.0.0
> [ ] -1 Do not release this package because ...
>
>
> The tag to be voted on is v2.0.0-rc5
> (13650fc58e1fcf2cf2a26ba11c819185ae1acc1f).
>
> This release candidate resolves ~2500 issues:
> https://s.apache.org/spark-2.0.0-jira
>
> The release files, including signatures, digests, etc. can be found at:
> http://people.apache.org/~pwendell/spark-releases/spark-2.0.0-rc5-bin/
>
> Release artifacts are signed with the following key:
> https://people.apache.org/keys/committer/pwendell.asc
>
> The staging repository for this release can be found at:
> https://repository.apache.org/content/repositories/orgapachespark-1195/
>
> The documentation corresponding to this release can be found at:
> http://people.apache.org/~pwendell/spark-releases/spark-2.0.0-rc5-docs/
>
>
> =
> How can I help test this release?
> =
> If you are a Spark user, you can help us test this release by taking an
> existing Spark workload and running on this release candidate, then
> reporting any regressions from 1.x.
>
> ==
> What justifies a -1 vote for this release?
> ==
> Critical bugs impacting major functionalities.
>
> Bugs already present in 1.x, missing features, or bugs related to new
> features will not necessarily block this release. Note that historically
> Spark documentation has been published on the website separately from the
> main release so we do not need to block the release due to documentation
> errors either.
>
>
>
>


Re: [VOTE] Release Apache Spark 2.0.0 (RC5)

2016-07-22 Thread Kousuke Saruta

+1 (non-binding)

Tested on my cluster with three slave nodes.

On 2016/07/23 10:25, Suresh Thalamati wrote:

+1 (non-binding)

Tested data source api , and jdbc data sources.


On Jul 19, 2016, at 7:35 PM, Reynold Xin  wrote:


Please vote on releasing the following candidate as Apache Spark 
version 2.0.0. The vote is open until Friday, July 22, 2016 at 20:00 
PDT and passes if a majority of at least 3 +1 PMC votes are cast.


[ ] +1 Release this package as Apache Spark 2.0.0
[ ] -1 Do not release this package because ...


The tag to be voted on is v2.0.0-rc5 
(13650fc58e1fcf2cf2a26ba11c819185ae1acc1f).


This release candidate resolves ~2500 issues: 
https://s.apache.org/spark-2.0.0-jira


The release files, including signatures, digests, etc. can be found at:
http://people.apache.org/~pwendell/spark-releases/spark-2.0.0-rc5-bin/ 



Release artifacts are signed with the following key:
https://people.apache.org/keys/committer/pwendell.asc

The staging repository for this release can be found at:
https://repository.apache.org/content/repositories/orgapachespark-1195/

The documentation corresponding to this release can be found at:
http://people.apache.org/~pwendell/spark-releases/spark-2.0.0-rc5-docs/ 




=
How can I help test this release?
=
If you are a Spark user, you can help us test this release by taking 
an existing Spark workload and running on this release candidate, 
then reporting any regressions from 1.x.


==
What justifies a -1 vote for this release?
==
Critical bugs impacting major functionalities.

Bugs already present in 1.x, missing features, or bugs related to new 
features will not necessarily block this release. Note that 
historically Spark documentation has been published on the website 
separately from the main release so we do not need to block the 
release due to documentation errors either.








Re: [VOTE] Release Apache Spark 2.0.0 (RC5)

2016-07-22 Thread Suresh Thalamati
+1 (non-binding)

Tested data source API and JDBC data sources.


> On Jul 19, 2016, at 7:35 PM, Reynold Xin  wrote:
> 
> Please vote on releasing the following candidate as Apache Spark version 
> 2.0.0. The vote is open until Friday, July 22, 2016 at 20:00 PDT and passes 
> if a majority of at least 3 +1 PMC votes are cast.
> 
> [ ] +1 Release this package as Apache Spark 2.0.0
> [ ] -1 Do not release this package because ...
> 
> 
> The tag to be voted on is v2.0.0-rc5 
> (13650fc58e1fcf2cf2a26ba11c819185ae1acc1f).
> 
> This release candidate resolves ~2500 issues: 
> https://s.apache.org/spark-2.0.0-jira 
> 
> The release files, including signatures, digests, etc. can be found at:
> http://people.apache.org/~pwendell/spark-releases/spark-2.0.0-rc5-bin/ 
> 
> 
> Release artifacts are signed with the following key:
> https://people.apache.org/keys/committer/pwendell.asc 
> 
> 
> The staging repository for this release can be found at:
> https://repository.apache.org/content/repositories/orgapachespark-1195/ 
> 
> 
> The documentation corresponding to this release can be found at:
> http://people.apache.org/~pwendell/spark-releases/spark-2.0.0-rc5-docs/ 
> 
> 
> 
> =
> How can I help test this release?
> =
> If you are a Spark user, you can help us test this release by taking an 
> existing Spark workload and running on this release candidate, then reporting 
> any regressions from 1.x.
> 
> ==
> What justifies a -1 vote for this release?
> ==
> Critical bugs impacting major functionalities.
> 
> Bugs already present in 1.x, missing features, or bugs related to new 
> features will not necessarily block this release. Note that historically 
> Spark documentation has been published on the website separately from the 
> main release so we do not need to block the release due to documentation 
> errors either.
> 



Re: [VOTE] Release Apache Spark 2.0.0 (RC5)

2016-07-22 Thread Felix Cheung
+1

Tested on Ubuntu, ran a bunch of SparkR tests, found a broken link in doc but 
not a blocker.


_
From: Michael Armbrust <mich...@databricks.com>
Sent: Friday, July 22, 2016 3:18 PM
Subject: Re: [VOTE] Release Apache Spark 2.0.0 (RC5)
To: <dev@spark.apache.org>
Cc: Reynold Xin <r...@databricks.com>


+1

On Fri, Jul 22, 2016 at 2:42 PM, Holden Karau <hol...@pigscanfly.ca> wrote:
+1 (non-binding)

Built locally on Ubuntu 14.04, basic pyspark sanity checking & tested with a 
simple structured streaming project (spark-structured-streaming-ml) & 
spark-testing-base & high-performance-spark-examples (minor changes required 
from preview version but seem intentional & jetty conflicts with out of date 
testing library - but not a Spark problem).

On Fri, Jul 22, 2016 at 12:45 PM, Luciano Resende <luckbr1...@gmail.com> wrote:
+ 1 (non-binding)

Found a minor issue when trying to run some of the docker tests, but nothing 
blocking the release. Will create a JIRA for that.

On Tue, Jul 19, 2016 at 7:35 PM, Reynold Xin <r...@databricks.com> wrote:
Please vote on releasing the following candidate as Apache Spark version 2.0.0. 
The vote is open until Friday, July 22, 2016 at 20:00 PDT and passes if a 
majority of at least 3 +1 PMC votes are cast.

[ ] +1 Release this package as Apache Spark 2.0.0
[ ] -1 Do not release this package because ...


The tag to be voted on is v2.0.0-rc5 (13650fc58e1fcf2cf2a26ba11c819185ae1acc1f).

This release candidate resolves ~2500 issues: 
https://s.apache.org/spark-2.0.0-jira

The release files, including signatures, digests, etc. can be found at:
http://people.apache.org/~pwendell/spark-releases/spark-2.0.0-rc5-bin/

Release artifacts are signed with the following key:
https://people.apache.org/keys/committer/pwendell.asc

The staging repository for this release can be found at:
https://repository.apache.org/content/repositories/orgapachespark-1195/

The documentation corresponding to this release can be found at:
http://people.apache.org/~pwendell/spark-releases/spark-2.0.0-rc5-docs/


=
How can I help test this release?
=
If you are a Spark user, you can help us test this release by taking an 
existing Spark workload and running on this release candidate, then reporting 
any regressions from 1.x.

==
What justifies a -1 vote for this release?
==
Critical bugs impacting major functionalities.

Bugs already present in 1.x, missing features, or bugs related to new features 
will not necessarily block this release. Note that historically Spark 
documentation has been published on the website separately from the main 
release so we do not need to block the release due to documentation errors 
either.




--
Luciano Resende
http://twitter.com/lresende1975
http://lresende.blogspot.com/



--
Cell : 425-233-8271
Twitter: https://twitter.com/holdenkarau





Re: [VOTE] Release Apache Spark 2.0.0 (RC5)

2016-07-22 Thread Michael Armbrust
+1

On Fri, Jul 22, 2016 at 2:42 PM, Holden Karau  wrote:

> +1 (non-binding)
>
> Built locally on Ubuntu 14.04, basic pyspark sanity checking & tested with
> a simple structured streaming project (spark-structured-streaming-ml) &
> spark-testing-base & high-performance-spark-examples (minor changes
> required from preview version but seem intentional & jetty conflicts with
> out of date testing library - but not a Spark problem).
>
> On Fri, Jul 22, 2016 at 12:45 PM, Luciano Resende 
> wrote:
>
>> + 1 (non-binding)
>>
>> Found a minor issue when trying to run some of the docker tests, but
>> nothing blocking the release. Will create a JIRA for that.
>>
>> On Tue, Jul 19, 2016 at 7:35 PM, Reynold Xin  wrote:
>>
>>> Please vote on releasing the following candidate as Apache Spark version
>>> 2.0.0. The vote is open until Friday, July 22, 2016 at 20:00 PDT and passes
>>> if a majority of at least 3 +1 PMC votes are cast.
>>>
>>> [ ] +1 Release this package as Apache Spark 2.0.0
>>> [ ] -1 Do not release this package because ...
>>>
>>>
>>> The tag to be voted on is v2.0.0-rc5
>>> (13650fc58e1fcf2cf2a26ba11c819185ae1acc1f).
>>>
>>> This release candidate resolves ~2500 issues:
>>> https://s.apache.org/spark-2.0.0-jira
>>>
>>> The release files, including signatures, digests, etc. can be found at:
>>> http://people.apache.org/~pwendell/spark-releases/spark-2.0.0-rc5-bin/
>>>
>>> Release artifacts are signed with the following key:
>>> https://people.apache.org/keys/committer/pwendell.asc
>>>
>>> The staging repository for this release can be found at:
>>> https://repository.apache.org/content/repositories/orgapachespark-1195/
>>>
>>> The documentation corresponding to this release can be found at:
>>> http://people.apache.org/~pwendell/spark-releases/spark-2.0.0-rc5-docs/
>>>
>>>
>>> =
>>> How can I help test this release?
>>> =
>>> If you are a Spark user, you can help us test this release by taking an
>>> existing Spark workload and running on this release candidate, then
>>> reporting any regressions from 1.x.
>>>
>>> ==
>>> What justifies a -1 vote for this release?
>>> ==
>>> Critical bugs impacting major functionalities.
>>>
>>> Bugs already present in 1.x, missing features, or bugs related to new
>>> features will not necessarily block this release. Note that historically
>>> Spark documentation has been published on the website separately from the
>>> main release so we do not need to block the release due to documentation
>>> errors either.
>>>
>>>
>>
>>
>> --
>> Luciano Resende
>> http://twitter.com/lresende1975
>> http://lresende.blogspot.com/
>>
>
>
>
> --
> Cell : 425-233-8271
> Twitter: https://twitter.com/holdenkarau
>


Re: [VOTE] Release Apache Spark 2.0.0 (RC5)

2016-07-22 Thread Holden Karau
+1 (non-binding)

Built locally on Ubuntu 14.04, basic pyspark sanity checking & tested with
a simple structured streaming project (spark-structured-streaming-ml) &
spark-testing-base & high-performance-spark-examples (minor changes
required from the preview version but they seem intentional, & jetty conflicts
with an out-of-date testing library - but not a Spark problem).
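
As an illustration only (not Holden's actual checks, which aren't spelled out
here), a basic pyspark sanity check against a local 2.0.0 build might look
like:

    from pyspark.sql import SparkSession

    spark = (SparkSession.builder
             .master("local[2]")
             .appName("rc5-sanity")
             .getOrCreate())

    print(spark.version)  # should report 2.0.0 for this RC

    # A trivial end-to-end round trip: DataFrame -> temp view -> SQL.
    df = spark.range(1000).withColumnRenamed("id", "n")
    df.createOrReplaceTempView("numbers")
    assert spark.sql("SELECT COUNT(*) AS c FROM numbers").first()["c"] == 1000

    spark.stop()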

On Fri, Jul 22, 2016 at 12:45 PM, Luciano Resende 
wrote:

> + 1 (non-binding)
>
> Found a minor issue when trying to run some of the docker tests, but
> nothing blocking the release. Will create a JIRA for that.
>
> On Tue, Jul 19, 2016 at 7:35 PM, Reynold Xin  wrote:
>
>> Please vote on releasing the following candidate as Apache Spark version
>> 2.0.0. The vote is open until Friday, July 22, 2016 at 20:00 PDT and passes
>> if a majority of at least 3 +1 PMC votes are cast.
>>
>> [ ] +1 Release this package as Apache Spark 2.0.0
>> [ ] -1 Do not release this package because ...
>>
>>
>> The tag to be voted on is v2.0.0-rc5
>> (13650fc58e1fcf2cf2a26ba11c819185ae1acc1f).
>>
>> This release candidate resolves ~2500 issues:
>> https://s.apache.org/spark-2.0.0-jira
>>
>> The release files, including signatures, digests, etc. can be found at:
>> http://people.apache.org/~pwendell/spark-releases/spark-2.0.0-rc5-bin/
>>
>> Release artifacts are signed with the following key:
>> https://people.apache.org/keys/committer/pwendell.asc
>>
>> The staging repository for this release can be found at:
>> https://repository.apache.org/content/repositories/orgapachespark-1195/
>>
>> The documentation corresponding to this release can be found at:
>> http://people.apache.org/~pwendell/spark-releases/spark-2.0.0-rc5-docs/
>>
>>
>> =
>> How can I help test this release?
>> =
>> If you are a Spark user, you can help us test this release by taking an
>> existing Spark workload and running on this release candidate, then
>> reporting any regressions from 1.x.
>>
>> ==
>> What justifies a -1 vote for this release?
>> ==
>> Critical bugs impacting major functionalities.
>>
>> Bugs already present in 1.x, missing features, or bugs related to new
>> features will not necessarily block this release. Note that historically
>> Spark documentation has been published on the website separately from the
>> main release so we do not need to block the release due to documentation
>> errors either.
>>
>>
>
>
> --
> Luciano Resende
> http://twitter.com/lresende1975
> http://lresende.blogspot.com/
>



-- 
Cell : 425-233-8271
Twitter: https://twitter.com/holdenkarau


Re: [VOTE] Release Apache Spark 2.0.0 (RC5)

2016-07-22 Thread Luciano Resende
+ 1 (non-binding)

Found a minor issue when trying to run some of the docker tests, but
nothing blocking the release. Will create a JIRA for that.

On Tue, Jul 19, 2016 at 7:35 PM, Reynold Xin  wrote:

> Please vote on releasing the following candidate as Apache Spark version
> 2.0.0. The vote is open until Friday, July 22, 2016 at 20:00 PDT and passes
> if a majority of at least 3 +1 PMC votes are cast.
>
> [ ] +1 Release this package as Apache Spark 2.0.0
> [ ] -1 Do not release this package because ...
>
>
> The tag to be voted on is v2.0.0-rc5
> (13650fc58e1fcf2cf2a26ba11c819185ae1acc1f).
>
> This release candidate resolves ~2500 issues:
> https://s.apache.org/spark-2.0.0-jira
>
> The release files, including signatures, digests, etc. can be found at:
> http://people.apache.org/~pwendell/spark-releases/spark-2.0.0-rc5-bin/
>
> Release artifacts are signed with the following key:
> https://people.apache.org/keys/committer/pwendell.asc
>
> The staging repository for this release can be found at:
> https://repository.apache.org/content/repositories/orgapachespark-1195/
>
> The documentation corresponding to this release can be found at:
> http://people.apache.org/~pwendell/spark-releases/spark-2.0.0-rc5-docs/
>
>
> =
> How can I help test this release?
> =
> If you are a Spark user, you can help us test this release by taking an
> existing Spark workload and running on this release candidate, then
> reporting any regressions from 1.x.
>
> ==
> What justifies a -1 vote for this release?
> ==
> Critical bugs impacting major functionalities.
>
> Bugs already present in 1.x, missing features, or bugs related to new
> features will not necessarily block this release. Note that historically
> Spark documentation has been published on the website separately from the
> main release so we do not need to block the release due to documentation
> errors either.
>
>


-- 
Luciano Resende
http://twitter.com/lresende1975
http://lresende.blogspot.com/


Re: [VOTE] Release Apache Spark 2.0.0 (RC5)

2016-07-22 Thread Matei Zaharia
+1

Tested on Mac.

Matei

> On Jul 22, 2016, at 11:18 AM, Joseph Bradley  wrote:
> 
> +1
> 
> Mainly tested ML/Graph/R.  Perf tests from Tim Hunter showed minor speedups 
> from 1.6 for common ML algorithms.
> 
> On Thu, Jul 21, 2016 at 9:41 AM, Ricardo Almeida  wrote:
> +1 (non binding)
> 
> Tested PySpark Core, DataFrame/SQL, MLlib and Streaming on a standalone 
> cluster
> 
> On 21 July 2016 at 05:24, Reynold Xin  wrote:
> +1
> 
> 
> On Wednesday, July 20, 2016, Krishna Sankar  wrote:
> +1 (non-binding, of course)
> 
> 1. Compiled OS X 10.11.5 (El Capitan) OK Total time: 24:07 min
>  mvn clean package -Pyarn -Phadoop-2.7 -DskipTests
> 2. Tested pyspark, mllib (iPython 4.0)
> 2.0 Spark version is 2.0.0 
> 2.1. statistics (min,max,mean,Pearson,Spearman) OK
> 2.2. Linear/Ridge/Lasso Regression OK 
> 2.3. Classification : Decision Tree, Naive Bayes OK
> 2.4. Clustering : KMeans OK
>Center And Scale OK
> 2.5. RDD operations OK
>   State of the Union Texts - MapReduce, Filter,sortByKey (word count)
> 2.6. Recommendation (Movielens medium dataset ~1 M ratings) OK
>Model evaluation/optimization (rank, numIter, lambda) with itertools OK
> 3. Scala - MLlib
> 3.1. statistics (min,max,mean,Pearson,Spearman) OK
> 3.2. LinearRegressionWithSGD OK
> 3.3. Decision Tree OK
> 3.4. KMeans OK
> 3.5. Recommendation (Movielens medium dataset ~1 M ratings) OK
> 3.6. saveAsParquetFile OK
> 3.7. Read and verify the 3.6 save(above) - sqlContext.parquetFile, 
> registerTempTable, sql OK
> 3.8. result = sqlContext.sql("SELECT 
> OrderDetails.OrderID,ShipCountry,UnitPrice,Qty,Discount FROM Orders INNER 
> JOIN OrderDetails ON Orders.OrderID = OrderDetails.OrderID") OK
> 4.0. Spark SQL from Python OK
> 4.1. result = sqlContext.sql("SELECT * from people WHERE State = 'WA'") OK
> 5.0. Packages
> 5.1. com.databricks.spark.csv - read/write OK (--packages 
> com.databricks:spark-csv_2.10:1.4.0)
> 6.0. DataFrames 
> 6.1. cast,dtypes OK
> 6.2. groupBy,avg,crosstab,corr,isNull,na.drop OK
> 6.3. All joins,sql,set operations,udf OK
> [Dataframe Operations very fast from 11 secs to 3 secs, to 1.8 secs, to 1.5 
> secs! Good work !!!]
> 7.0. GraphX/Scala
> 7.1. Create Graph (small and bigger dataset) OK
> 7.2. Structure APIs - OK
> 7.3. Social Network/Community APIs - OK
> 7.4. Algorithms : PageRank of 2 datasets, aggregateMessages() - OK
> 
> Cheers
> 
> 
> On Tue, Jul 19, 2016 at 7:35 PM, Reynold Xin  wrote:
> Please vote on releasing the following candidate as Apache Spark version 
> 2.0.0. The vote is open until Friday, July 22, 2016 at 20:00 PDT and passes 
> if a majority of at least 3 +1 PMC votes are cast.
> 
> [ ] +1 Release this package as Apache Spark 2.0.0
> [ ] -1 Do not release this package because ...
> 
> 
> The tag to be voted on is v2.0.0-rc5 
> (13650fc58e1fcf2cf2a26ba11c819185ae1acc1f).
> 
> This release candidate resolves ~2500 issues: 
> https://s.apache.org/spark-2.0.0-jira 
> 
> The release files, including signatures, digests, etc. can be found at:
> http://people.apache.org/~pwendell/spark-releases/spark-2.0.0-rc5-bin/ 
> 
> 
> Release artifacts are signed with the following key:
> https://people.apache.org/keys/committer/pwendell.asc 
> 
> 
> The staging repository for this release can be found at:
> https://repository.apache.org/content/repositories/orgapachespark-1195/ 
> 
> 
> The documentation corresponding to this release can be found at:
> http://people.apache.org/~pwendell/spark-releases/spark-2.0.0-rc5-docs/ 
> 
> 
> 
> =
> How can I help test this release?
> =
> If you are a Spark user, you can help us test this release by taking an 
> existing Spark workload and running on this release candidate, then reporting 
> any regressions from 1.x.
> 
> ==
> What justifies a -1 vote for this release?
> ==
> Critical bugs impacting major functionalities.
> 
> Bugs already present in 1.x, missing features, or bugs related to new 
> features will not necessarily block this release. Note that historically 
> Spark documentation has been published on the website separately from the 
> main release so we do not need to block the release due to documentation 
> errors either.
> 
> 
> 
> 



Re: [VOTE] Release Apache Spark 2.0.0 (RC5)

2016-07-22 Thread Joseph Bradley
+1

Mainly tested ML/Graph/R.  Perf tests from Tim Hunter showed minor speedups
from 1.6 for common ML algorithms.

On Thu, Jul 21, 2016 at 9:41 AM, Ricardo Almeida <
ricardo.alme...@actnowib.com> wrote:

> +1 (non binding)
>
> Tested PySpark Core, DataFrame/SQL, MLlib and Streaming on a standalone
> cluster
>
> On 21 July 2016 at 05:24, Reynold Xin  wrote:
>
>> +1
>>
>>
>> On Wednesday, July 20, 2016, Krishna Sankar  wrote:
>>
>>> +1 (non-binding, of course)
>>>
>>> 1. Compiled OS X 10.11.5 (El Capitan) OK Total time: 24:07 min
>>>  mvn clean package -Pyarn -Phadoop-2.7 -DskipTests
>>> 2. Tested pyspark, mllib (iPython 4.0)
>>> 2.0 Spark version is 2.0.0
>>> 2.1. statistics (min,max,mean,Pearson,Spearman) OK
>>> 2.2. Linear/Ridge/Lasso Regression OK
>>> 2.3. Classification : Decision Tree, Naive Bayes OK
>>> 2.4. Clustering : KMeans OK
>>>Center And Scale OK
>>> 2.5. RDD operations OK
>>>   State of the Union Texts - MapReduce, Filter,sortByKey (word count)
>>> 2.6. Recommendation (Movielens medium dataset ~1 M ratings) OK
>>>Model evaluation/optimization (rank, numIter, lambda) with
>>> itertools OK
>>> 3. Scala - MLlib
>>> 3.1. statistics (min,max,mean,Pearson,Spearman) OK
>>> 3.2. LinearRegressionWithSGD OK
>>> 3.3. Decision Tree OK
>>> 3.4. KMeans OK
>>> 3.5. Recommendation (Movielens medium dataset ~1 M ratings) OK
>>> 3.6. saveAsParquetFile OK
>>> 3.7. Read and verify the 3.6 save(above) - sqlContext.parquetFile,
>>> registerTempTable, sql OK
>>> 3.8. result = sqlContext.sql("SELECT
>>> OrderDetails.OrderID,ShipCountry,UnitPrice,Qty,Discount FROM Orders INNER
>>> JOIN OrderDetails ON Orders.OrderID = OrderDetails.OrderID") OK
>>> 4.0. Spark SQL from Python OK
>>> 4.1. result = sqlContext.sql("SELECT * from people WHERE State = 'WA'")
>>> OK
>>> 5.0. Packages
>>> 5.1. com.databricks.spark.csv - read/write OK (--packages
>>> com.databricks:spark-csv_2.10:1.4.0)
>>> 6.0. DataFrames
>>> 6.1. cast,dtypes OK
>>> 6.2. groupBy,avg,crosstab,corr,isNull,na.drop OK
>>> 6.3. All joins,sql,set operations,udf OK
>>> [Dataframe Operations very fast from 11 secs to 3 secs, to 1.8 secs, to
>>> 1.5 secs! Good work !!!]
>>> 7.0. GraphX/Scala
>>> 7.1. Create Graph (small and bigger dataset) OK
>>> 7.2. Structure APIs - OK
>>> 7.3. Social Network/Community APIs - OK
>>> 7.4. Algorithms : PageRank of 2 datasets, aggregateMessages() - OK
>>>
>>> Cheers
>>> 
>>>
>>> On Tue, Jul 19, 2016 at 7:35 PM, Reynold Xin 
>>> wrote:
>>>
 Please vote on releasing the following candidate as Apache Spark
 version 2.0.0. The vote is open until Friday, July 22, 2016 at 20:00 PDT
 and passes if a majority of at least 3 +1 PMC votes are cast.

 [ ] +1 Release this package as Apache Spark 2.0.0
 [ ] -1 Do not release this package because ...


 The tag to be voted on is v2.0.0-rc5
 (13650fc58e1fcf2cf2a26ba11c819185ae1acc1f).

 This release candidate resolves ~2500 issues:
 https://s.apache.org/spark-2.0.0-jira

 The release files, including signatures, digests, etc. can be found at:
 http://people.apache.org/~pwendell/spark-releases/spark-2.0.0-rc5-bin/

 Release artifacts are signed with the following key:
 https://people.apache.org/keys/committer/pwendell.asc

 The staging repository for this release can be found at:
 https://repository.apache.org/content/repositories/orgapachespark-1195/

 The documentation corresponding to this release can be found at:
 http://people.apache.org/~pwendell/spark-releases/spark-2.0.0-rc5-docs/


 =
 How can I help test this release?
 =
 If you are a Spark user, you can help us test this release by taking an
 existing Spark workload and running on this release candidate, then
 reporting any regressions from 1.x.

 ==
 What justifies a -1 vote for this release?
 ==
 Critical bugs impacting major functionalities.

 Bugs already present in 1.x, missing features, or bugs related to new
 features will not necessarily block this release. Note that historically
 Spark documentation has been published on the website separately from the
 main release so we do not need to block the release due to documentation
 errors either.


>>>
>


Re: [VOTE] Release Apache Spark 2.0.0 (RC5)

2016-07-20 Thread Reynold Xin
+1

On Wednesday, July 20, 2016, Krishna Sankar  wrote:

> +1 (non-binding, of course)
>
> 1. Compiled OS X 10.11.5 (El Capitan) OK Total time: 24:07 min
>  mvn clean package -Pyarn -Phadoop-2.7 -DskipTests
> 2. Tested pyspark, mllib (iPython 4.0)
> 2.0 Spark version is 2.0.0
> 2.1. statistics (min,max,mean,Pearson,Spearman) OK
> 2.2. Linear/Ridge/Lasso Regression OK
> 2.3. Classification : Decision Tree, Naive Bayes OK
> 2.4. Clustering : KMeans OK
>Center And Scale OK
> 2.5. RDD operations OK
>   State of the Union Texts - MapReduce, Filter,sortByKey (word count)
> 2.6. Recommendation (Movielens medium dataset ~1 M ratings) OK
>Model evaluation/optimization (rank, numIter, lambda) with
> itertools OK
> 3. Scala - MLlib
> 3.1. statistics (min,max,mean,Pearson,Spearman) OK
> 3.2. LinearRegressionWithSGD OK
> 3.3. Decision Tree OK
> 3.4. KMeans OK
> 3.5. Recommendation (Movielens medium dataset ~1 M ratings) OK
> 3.6. saveAsParquetFile OK
> 3.7. Read and verify the 3.6 save(above) - sqlContext.parquetFile,
> registerTempTable, sql OK
> 3.8. result = sqlContext.sql("SELECT
> OrderDetails.OrderID,ShipCountry,UnitPrice,Qty,Discount FROM Orders INNER
> JOIN OrderDetails ON Orders.OrderID = OrderDetails.OrderID") OK
> 4.0. Spark SQL from Python OK
> 4.1. result = sqlContext.sql("SELECT * from people WHERE State = 'WA'") OK
> 5.0. Packages
> 5.1. com.databricks.spark.csv - read/write OK (--packages
> com.databricks:spark-csv_2.10:1.4.0)
> 6.0. DataFrames
> 6.1. cast,dtypes OK
> 6.2. groupBy,avg,crosstab,corr,isNull,na.drop OK
> 6.3. All joins,sql,set operations,udf OK
> [Dataframe Operations very fast from 11 secs to 3 secs, to 1.8 secs, to
> 1.5 secs! Good work !!!]
> 7.0. GraphX/Scala
> 7.1. Create Graph (small and bigger dataset) OK
> 7.2. Structure APIs - OK
> 7.3. Social Network/Community APIs - OK
> 7.4. Algorithms : PageRank of 2 datasets, aggregateMessages() - OK
>
> Cheers
> 
>
> On Tue, Jul 19, 2016 at 7:35 PM, Reynold Xin  > wrote:
>
>> Please vote on releasing the following candidate as Apache Spark version
>> 2.0.0. The vote is open until Friday, July 22, 2016 at 20:00 PDT and passes
>> if a majority of at least 3 +1 PMC votes are cast.
>>
>> [ ] +1 Release this package as Apache Spark 2.0.0
>> [ ] -1 Do not release this package because ...
>>
>>
>> The tag to be voted on is v2.0.0-rc5
>> (13650fc58e1fcf2cf2a26ba11c819185ae1acc1f).
>>
>> This release candidate resolves ~2500 issues:
>> https://s.apache.org/spark-2.0.0-jira
>>
>> The release files, including signatures, digests, etc. can be found at:
>> http://people.apache.org/~pwendell/spark-releases/spark-2.0.0-rc5-bin/
>>
>> Release artifacts are signed with the following key:
>> https://people.apache.org/keys/committer/pwendell.asc
>>
>> The staging repository for this release can be found at:
>> https://repository.apache.org/content/repositories/orgapachespark-1195/
>>
>> The documentation corresponding to this release can be found at:
>> http://people.apache.org/~pwendell/spark-releases/spark-2.0.0-rc5-docs/
>>
>>
>> =
>> How can I help test this release?
>> =
>> If you are a Spark user, you can help us test this release by taking an
>> existing Spark workload and running on this release candidate, then
>> reporting any regressions from 1.x.
>>
>> ==
>> What justifies a -1 vote for this release?
>> ==
>> Critical bugs impacting major functionalities.
>>
>> Bugs already present in 1.x, missing features, or bugs related to new
>> features will not necessarily block this release. Note that historically
>> Spark documentation has been published on the website separately from the
>> main release so we do not need to block the release due to documentation
>> errors either.
>>
>>
>


Re: [VOTE] Release Apache Spark 2.0.0 (RC5)

2016-07-20 Thread Krishna Sankar
+1 (non-binding, of course)

1. Compiled OS X 10.11.5 (El Capitan) OK Total time: 24:07 min
 mvn clean package -Pyarn -Phadoop-2.7 -DskipTests
2. Tested pyspark, mllib (iPython 4.0)
2.0 Spark version is 2.0.0
2.1. statistics (min,max,mean,Pearson,Spearman) OK
2.2. Linear/Ridge/Lasso Regression OK
2.3. Classification : Decision Tree, Naive Bayes OK
2.4. Clustering : KMeans OK
   Center And Scale OK
2.5. RDD operations OK
  State of the Union Texts - MapReduce, Filter,sortByKey (word count)
2.6. Recommendation (Movielens medium dataset ~1 M ratings) OK
   Model evaluation/optimization (rank, numIter, lambda) with itertools
OK
3. Scala - MLlib
3.1. statistics (min,max,mean,Pearson,Spearman) OK
3.2. LinearRegressionWithSGD OK
3.3. Decision Tree OK
3.4. KMeans OK
3.5. Recommendation (Movielens medium dataset ~1 M ratings) OK
3.6. saveAsParquetFile OK
3.7. Read and verify the 3.6 save(above) - sqlContext.parquetFile,
registerTempTable, sql OK
3.8. result = sqlContext.sql("SELECT
OrderDetails.OrderID,ShipCountry,UnitPrice,Qty,Discount FROM Orders INNER
JOIN OrderDetails ON Orders.OrderID = OrderDetails.OrderID") OK
4.0. Spark SQL from Python OK
4.1. result = sqlContext.sql("SELECT * from people WHERE State = 'WA'") OK
5.0. Packages
5.1. com.databricks.spark.csv - read/write OK (--packages
com.databricks:spark-csv_2.10:1.4.0)
6.0. DataFrames
6.1. cast,dtypes OK
6.2. groupBy,avg,crosstab,corr,isNull,na.drop OK
6.3. All joins,sql,set operations,udf OK
[Dataframe Operations very fast from 11 secs to 3 secs, to 1.8 secs, to 1.5
secs! Good work !!!]
7.0. GraphX/Scala
7.1. Create Graph (small and bigger dataset) OK
7.2. Structure APIs - OK
7.3. Social Network/Community APIs - OK
7.4. Algorithms : PageRank of 2 datasets, aggregateMessages() - OK
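
(A small, self-contained sketch of the style of checks in items 4.x and 6.x
above, using the 2.0 SparkSession entry point; the 'people' data is invented
for illustration, and the original runs used sqlContext rather than this
exact code.)

    from pyspark.sql import Row, SparkSession

    spark = (SparkSession.builder
             .master("local[2]")
             .appName("rc5-sql-checks")
             .getOrCreate())

    people = spark.createDataFrame([
        Row(name="Alice", State="WA", age=34),
        Row(name="Bob",   State="CA", age=45),
        Row(name="Carol", State="WA", age=29),
    ])
    people.createOrReplaceTempView("people")

    # 4.1-style check: plain SQL from Python.
    spark.sql("SELECT * FROM people WHERE State = 'WA'").show()

    # 6.2-style checks: groupBy/avg and crosstab.
    people.groupBy("State").avg("age").show()
    people.crosstab("State", "name").show()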

Cheers


On Tue, Jul 19, 2016 at 7:35 PM, Reynold Xin  wrote:

> Please vote on releasing the following candidate as Apache Spark version
> 2.0.0. The vote is open until Friday, July 22, 2016 at 20:00 PDT and passes
> if a majority of at least 3 +1 PMC votes are cast.
>
> [ ] +1 Release this package as Apache Spark 2.0.0
> [ ] -1 Do not release this package because ...
>
>
> The tag to be voted on is v2.0.0-rc5
> (13650fc58e1fcf2cf2a26ba11c819185ae1acc1f).
>
> This release candidate resolves ~2500 issues:
> https://s.apache.org/spark-2.0.0-jira
>
> The release files, including signatures, digests, etc. can be found at:
> http://people.apache.org/~pwendell/spark-releases/spark-2.0.0-rc5-bin/
>
> Release artifacts are signed with the following key:
> https://people.apache.org/keys/committer/pwendell.asc
>
> The staging repository for this release can be found at:
> https://repository.apache.org/content/repositories/orgapachespark-1195/
>
> The documentation corresponding to this release can be found at:
> http://people.apache.org/~pwendell/spark-releases/spark-2.0.0-rc5-docs/
>
>
> =
> How can I help test this release?
> =
> If you are a Spark user, you can help us test this release by taking an
> existing Spark workload and running on this release candidate, then
> reporting any regressions from 1.x.
>
> ==
> What justifies a -1 vote for this release?
> ==
> Critical bugs impacting major functionalities.
>
> Bugs already present in 1.x, missing features, or bugs related to new
> features will not necessarily block this release. Note that historically
> Spark documentation has been published on the website separately from the
> main release so we do not need to block the release due to documentation
> errors either.
>
>


Re: [VOTE] Release Apache Spark 2.0.0 (RC5)

2016-07-20 Thread Joseph Gonzalez
+1

Sent from my iPad




Re: [VOTE] Release Apache Spark 2.0.0 (RC5)

2016-07-20 Thread Michael Allman
I've run some tests with some real and some synthetic Parquet data with nested
columns, with and without the Hive metastore, on our Spark 1.5, 1.6 and 2.0
versions. I haven't seen any unexpected performance surprises, except that
Spark 2.0 now does schema inference across all files in a partitioned Parquet
metastore table. Granted, you aren't using a metastore table, but maybe Spark
does that for partitioned non-metastore tables as well.

Michael
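
One way to take schema inference out of the picture when timing a Parquet read
(just a sketch of a workaround idea, not something established in this thread)
is to hand the reader an explicit schema so it doesn't have to sample the
files; the path and fields below are placeholders:

    from pyspark.sql import SparkSession
    from pyspark.sql.types import LongType, StringType, StructField, StructType

    spark = SparkSession.builder.getOrCreate()

    # Placeholder schema and path - substitute the real table layout.
    schema = StructType([
        StructField("id", LongType(), nullable=False),
        StructField("payload", StringType(), nullable=True),
    ])

    # With an explicit schema, the reader can skip inferring/merging schemas
    # from the individual Parquet files.
    df = spark.read.schema(schema).parquet("/path/to/partitioned/table")
    df.explain()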

> On Jul 20, 2016, at 2:16 PM, Maciej Bryński  wrote:
> 
> @Michael,
> I answered in Jira and could repeat here.
> I think that my problem is unrelated to Hive, because I'm using read.parquet 
> method.
> I also attached some VisualVM snapshots to SPARK-16321 (I think I should 
> merge both issues)
> And code profiling suggest bottleneck when reading parquet file.
> 
> I wonder if there are any other benchmarks related to parquet performance.
> 
> Regards,
> -- 
> Maciek Bryński





Re: [VOTE] Release Apache Spark 2.0.0 (RC5)

2016-07-20 Thread Maciej Bryński
@Michael,
I answered in JIRA and can repeat here.
I think my problem is unrelated to Hive, because I'm using the
read.parquet method.
I also attached some VisualVM snapshots to SPARK-16321 (I think I should
merge both issues), and code profiling suggests a bottleneck when reading the
Parquet files.

I wonder if there are any other benchmarks related to Parquet performance.

Regards,
-- 
Maciek Bryński
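
For anyone wanting to compare the raw read path across versions, a crude
timing sketch (the path is a placeholder and this is illustrative only, not
the benchmark described above):

    import time

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.getOrCreate()

    path = "/path/to/benchmark.parquet"  # placeholder

    start = time.time()
    count = spark.read.parquet(path).count()  # forces a full scan
    elapsed = time.time() - start
    print("read.parquet + count: %d rows in %.2f s" % (count, elapsed))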


Re: [VOTE] Release Apache Spark 2.0.0 (RC5)

2016-07-20 Thread Marcin Tustin
I refer to Maciej Bryński's (mac...@brynski.pl) emails of 29 and 30 June
2016 to this list. He said that his benchmarking suggested that Spark 2.0
was slower than 1.6.

I'm wondering if that was ever investigated, and if so, whether the speed is
back up or not.

On Wed, Jul 20, 2016 at 12:18 PM, Michael Allman 
wrote:

> Marcin,
>
> I'm not sure what you're referring to. Can you be more specific?
>
> Cheers,
>
> Michael
>
> On Jul 20, 2016, at 9:10 AM, Marcin Tustin  wrote:
>
> Whatever happened with the query regarding benchmarks? Is that resolved?
>
> On Tue, Jul 19, 2016 at 10:35 PM, Reynold Xin  wrote:
>
>> Please vote on releasing the following candidate as Apache Spark version
>> 2.0.0. The vote is open until Friday, July 22, 2016 at 20:00 PDT and passes
>> if a majority of at least 3 +1 PMC votes are cast.
>>
>> [ ] +1 Release this package as Apache Spark 2.0.0
>> [ ] -1 Do not release this package because ...
>>
>>
>> The tag to be voted on is v2.0.0-rc5
>> (13650fc58e1fcf2cf2a26ba11c819185ae1acc1f).
>>
>> This release candidate resolves ~2500 issues:
>> https://s.apache.org/spark-2.0.0-jira
>>
>> The release files, including signatures, digests, etc. can be found at:
>> http://people.apache.org/~pwendell/spark-releases/spark-2.0.0-rc5-bin/
>>
>> Release artifacts are signed with the following key:
>> https://people.apache.org/keys/committer/pwendell.asc
>>
>> The staging repository for this release can be found at:
>> https://repository.apache.org/content/repositories/orgapachespark-1195/
>>
>> The documentation corresponding to this release can be found at:
>> http://people.apache.org/~pwendell/spark-releases/spark-2.0.0-rc5-docs/
>>
>>
>> =
>> How can I help test this release?
>> =
>> If you are a Spark user, you can help us test this release by taking an
>> existing Spark workload and running on this release candidate, then
>> reporting any regressions from 1.x.
>>
>> ==
>> What justifies a -1 vote for this release?
>> ==
>> Critical bugs impacting major functionalities.
>>
>> Bugs already present in 1.x, missing features, or bugs related to new
>> features will not necessarily block this release. Note that historically
>> Spark documentation has been published on the website separately from the
>> main release so we do not need to block the release due to documentation
>> errors either.
>>
>>
>
> Want to work at Handy? Check out our culture deck and open roles.
> Latest news at Handy: Handy just raised $50m, led by Fidelity.
>
>
>

-- 
Want to work at Handy? Check out our culture deck and open roles.
Latest news at Handy: Handy just raised $50m, led by Fidelity.



Re: [VOTE] Release Apache Spark 2.0.0 (RC5)

2016-07-20 Thread Michael Allman
Marcin,

I'm not sure what you're referring to. Can you be more specific?

Cheers,

Michael

> On Jul 20, 2016, at 9:10 AM, Marcin Tustin  wrote:
> 
> Whatever happened with the query regarding benchmarks? Is that resolved?
> 
> On Tue, Jul 19, 2016 at 10:35 PM, Reynold Xin  wrote:
> Please vote on releasing the following candidate as Apache Spark version 
> 2.0.0. The vote is open until Friday, July 22, 2016 at 20:00 PDT and passes 
> if a majority of at least 3 +1 PMC votes are cast.
> 
> [ ] +1 Release this package as Apache Spark 2.0.0
> [ ] -1 Do not release this package because ...
> 
> 
> The tag to be voted on is v2.0.0-rc5 
> (13650fc58e1fcf2cf2a26ba11c819185ae1acc1f).
> 
> This release candidate resolves ~2500 issues: 
> https://s.apache.org/spark-2.0.0-jira 
> 
> The release files, including signatures, digests, etc. can be found at:
> http://people.apache.org/~pwendell/spark-releases/spark-2.0.0-rc5-bin/ 
> 
> 
> Release artifacts are signed with the following key:
> https://people.apache.org/keys/committer/pwendell.asc 
> 
> 
> The staging repository for this release can be found at:
> https://repository.apache.org/content/repositories/orgapachespark-1195/ 
> 
> 
> The documentation corresponding to this release can be found at:
> http://people.apache.org/~pwendell/spark-releases/spark-2.0.0-rc5-docs/ 
> 
> 
> 
> =
> How can I help test this release?
> =
> If you are a Spark user, you can help us test this release by taking an 
> existing Spark workload and running on this release candidate, then reporting 
> any regressions from 1.x.
> 
> ==
> What justifies a -1 vote for this release?
> ==
> Critical bugs impacting major functionalities.
> 
> Bugs already present in 1.x, missing features, or bugs related to new 
> features will not necessarily block this release. Note that historically 
> Spark documentation has been published on the website separately from the 
> main release so we do not need to block the release due to documentation 
> errors either.
> 
> 
> 
> Want to work at Handy? Check out our culture deck and open roles.
> Latest news at Handy: Handy just raised $50m, led by Fidelity.
> 
> 



Re: [VOTE] Release Apache Spark 2.0.0 (RC5)

2016-07-20 Thread Marcin Tustin
Whatever happened with the query regarding benchmarks? Is that resolved?

On Tue, Jul 19, 2016 at 10:35 PM, Reynold Xin  wrote:

> Please vote on releasing the following candidate as Apache Spark version
> 2.0.0. The vote is open until Friday, July 22, 2016 at 20:00 PDT and passes
> if a majority of at least 3 +1 PMC votes are cast.
>
> [ ] +1 Release this package as Apache Spark 2.0.0
> [ ] -1 Do not release this package because ...
>
>
> The tag to be voted on is v2.0.0-rc5
> (13650fc58e1fcf2cf2a26ba11c819185ae1acc1f).
>
> This release candidate resolves ~2500 issues:
> https://s.apache.org/spark-2.0.0-jira
>
> The release files, including signatures, digests, etc. can be found at:
> http://people.apache.org/~pwendell/spark-releases/spark-2.0.0-rc5-bin/
>
> Release artifacts are signed with the following key:
> https://people.apache.org/keys/committer/pwendell.asc
>
> The staging repository for this release can be found at:
> https://repository.apache.org/content/repositories/orgapachespark-1195/
>
> The documentation corresponding to this release can be found at:
> http://people.apache.org/~pwendell/spark-releases/spark-2.0.0-rc5-docs/
>
>
> =
> How can I help test this release?
> =
> If you are a Spark user, you can help us test this release by taking an
> existing Spark workload and running on this release candidate, then
> reporting any regressions from 1.x.
>
> ==
> What justifies a -1 vote for this release?
> ==
> Critical bugs impacting major functionalities.
>
> Bugs already present in 1.x, missing features, or bugs related to new
> features will not necessarily block this release. Note that historically
> Spark documentation has been published on the website separately from the
> main release so we do not need to block the release due to documentation
> errors either.
>
>

-- 
Want to work at Handy? Check out our culture deck and open roles.
Latest news at Handy: Handy just raised $50m, led by Fidelity.



Re: [VOTE] Release Apache Spark 2.0.0 (RC5)

2016-07-20 Thread Shivaram Venkataraman
+1

SHA and MD5 sums match for all binaries. Docs look fine this time
around. Built and ran `dev/run-tests` with Java 7 on a Linux machine.

No blocker bugs on JIRA, and the only critical bug with target 2.0.0
is SPARK-16633, which doesn't look like a release blocker. I also
checked issues marked as Critical affecting version 2.0.0,
and the only other ones that seem applicable are SPARK-15703 and
SPARK-16334. Neither of them looks like a blocker to me.

Thanks
Shivaram
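
For anyone else verifying the binaries, a small sketch of the checksum side of
that check (the archive name is a placeholder; compare the output against the
published .md5/.sha files for whichever artifact you download, using whatever
digest flavour the .sha file actually contains):

    import hashlib

    archive = "spark-2.0.0-bin-hadoop2.7.tgz"  # placeholder filename

    def digest(path, algo):
        h = hashlib.new(algo)
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(1 << 20), b""):
                h.update(chunk)
        return h.hexdigest()

    print("MD5:     " + digest(archive, "md5"))
    print("SHA-512: " + digest(archive, "sha512"))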


On Tue, Jul 19, 2016 at 7:35 PM, Reynold Xin  wrote:
> Please vote on releasing the following candidate as Apache Spark version
> 2.0.0. The vote is open until Friday, July 22, 2016 at 20:00 PDT and passes
> if a majority of at least 3 +1 PMC votes are cast.
>
> [ ] +1 Release this package as Apache Spark 2.0.0
> [ ] -1 Do not release this package because ...
>
>
> The tag to be voted on is v2.0.0-rc5
> (13650fc58e1fcf2cf2a26ba11c819185ae1acc1f).
>
> This release candidate resolves ~2500 issues:
> https://s.apache.org/spark-2.0.0-jira
>
> The release files, including signatures, digests, etc. can be found at:
> http://people.apache.org/~pwendell/spark-releases/spark-2.0.0-rc5-bin/
>
> Release artifacts are signed with the following key:
> https://people.apache.org/keys/committer/pwendell.asc
>
> The staging repository for this release can be found at:
> https://repository.apache.org/content/repositories/orgapachespark-1195/
>
> The documentation corresponding to this release can be found at:
> http://people.apache.org/~pwendell/spark-releases/spark-2.0.0-rc5-docs/
>
>
> =
> How can I help test this release?
> =
> If you are a Spark user, you can help us test this release by taking an
> existing Spark workload and running on this release candidate, then
> reporting any regressions from 1.x.
>
> ==
> What justifies a -1 vote for this release?
> ==
> Critical bugs impacting major functionalities.
>
> Bugs already present in 1.x, missing features, or bugs related to new
> features will not necessarily block this release. Note that historically
> Spark documentation has been published on the website separately from the
> main release so we do not need to block the release due to documentation
> errors either.
>
