Re: [VOTE] Spark 2.3.0 (RC3)

2018-02-15 Thread Sameer Agarwal
In addition to the issues mentioned above, Wenchen and Xiao have flagged
two other regressions (https://issues.apache.org/jira/browse/SPARK-23316
and https://issues.apache.org/jira/browse/SPARK-23388) that were merged
after RC3 was cut.

Due to these, this vote fails. I'll follow up with an RC4 in a day (this
will probably also give us enough time to resolve
https://issues.apache.org/jira/browse/SPARK-23381 and
https://issues.apache.org/jira/browse/SPARK-23410).


On 15 February 2018 at 17:22, mrkm4ntr  wrote:

> I agree that this is not a blocker against RC3.  It was not appropriate as a
> vote for RC3.
> There is no problem if it is in time for release 2.3.0.
>
>
>
>
> --
> Sent from: http://apache-spark-developers-list.1001551.n3.nabble.com/
>
> -
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>
>


Re: [VOTE] Spark 2.3.0 (RC3)

2018-02-15 Thread mrkm4ntr
I agree that this is not a blocker against RC3.  It was not appropriate as a
vote for RC3.
There is no problem if it is in time for release 2.3.0.







Re: [VOTE] Spark 2.3.0 (RC3)

2018-02-15 Thread Ryan Blue
I agree that SPARK-23413 should be considered a blocker. It isn't
unreasonable to run a history server that is used for several versions of
Spark.

On Thu, Feb 15, 2018 at 7:49 AM, Sean Owen  wrote:

> SPARK-23381 is probably not a blocker IMHO; it's a nice-to-have to make
> some returned values match an external implementation, for code that hasn't
> been published yet.
>
> However I think it's OK to add to the 2.3.0 release if there's going to be
> another RC.
>
>
> On Wed, Feb 14, 2018 at 10:49 PM Holden Karau 
> wrote:
>
>> So it's currently tagged as minor and under consideration for 2.4.0. Do
>> you think this priority is incorrect? This doesn't seem like a regression
>> or a correctness issue so normally we wouldn't hold the release. Of course
>> you're free to vote how you choose, just providing some additional context
>> around how we tend to do releases.
>>
>>
>> On Feb 14, 2018 11:03 PM, "mrkm4ntr"  wrote:
>>
>> I'm -1 because of this issue.
>> I want to fix the hashing implementation in FeatureHasher before
>> FeatureHasher is released in 2.3.0.
>>
>> https://issues.apache.org/jira/browse/SPARK-23381
>> https://github.com/apache/spark/pull/20568
>>
>> I will fix it soon.
>>
>>
>>
>>
>>
>>


-- 
Ryan Blue
Software Engineer
Netflix


Re: [VOTE] Spark 2.3.0 (RC3)

2018-02-15 Thread Sean Owen
SPARK-23381 is probably not a blocker IMHO; it's a nice-to-have to make
some returned values match an external implementation, for code that hasn't
been published yet.

However I think it's OK to add to the 2.3.0 release if there's going to be
another RC.

On Wed, Feb 14, 2018 at 10:49 PM Holden Karau 
wrote:

> So it's currently tagged as minor and under consideration for 2.4.0. Do
> you think this priority is incorrect? This doesn't seem like a regression
> or a correctness issue so normally we wouldn't hold the release. Of course
> you're free to vote how you choose, just providing some additional context
> around how we tend to do releases.
>
>
> On Feb 14, 2018 11:03 PM, "mrkm4ntr"  wrote:
>
> I'm -1 because of this issue.
> I want to fix the hashing implementation in FeatureHasher before
> FeatureHasher is released in 2.3.0.
>
> https://issues.apache.org/jira/browse/SPARK-23381
> https://github.com/apache/spark/pull/20568
>
> I will fix it soon.
>
>
>
>
>
>


Re: [VOTE] Spark 2.3.0 (RC3)

2018-02-15 Thread Marcelo Vanzin
Since it seems there are other issues to fix, I raised SPARK-23413 to
blocker status to avoid having to change the disk format of history
data in a minor release.

On Wed, Feb 14, 2018 at 11:06 PM, Nick Pentreath
 wrote:
> -1 for me as we elevated https://issues.apache.org/jira/browse/SPARK-23377
> to a Blocker. It should be fixed before release.
>
> On Thu, 15 Feb 2018 at 07:25 Holden Karau  wrote:
>>
>> If this is a blocker in your view then the vote thread is an important
>> place to mention it. I'm not super sure all of the places these methods are
>> used so I'll defer to srowen and folks, but for the ML related implications
>> in the past we've allowed people to set the hashing function when we've
>> introduced changes.
>>
>> On Feb 15, 2018 2:08 PM, "mrkm4ntr"  wrote:
>>>
>>> I was advised to post here in the discussion at GitHub. I do not know what
>>> to do about the problem of the discussion being dispersed across two places.
>>>
>>>
>>>
>>>
>



-- 
Marcelo




Re: [VOTE] Spark 2.3.0 (RC3)

2018-02-14 Thread Nick Pentreath
-1 for me as we elevated https://issues.apache.org/jira/browse/SPARK-23377 to
a Blocker. It should be fixed before release.

On Thu, 15 Feb 2018 at 07:25 Holden Karau  wrote:

> If this is a blocker in your view then the vote thread is an important
> place to mention it. I'm not super sure all of the places these methods are
> used so I'll defer to srowen and folks, but for the ML related implications
> in the past we've allowed people to set the hashing function when we've
> introduced changes.
>
> On Feb 15, 2018 2:08 PM, "mrkm4ntr"  wrote:
>
>> I was advised to post here in the discussion at GitHub. I do not know what
>> to do about the problem of the discussion being dispersed across two places.
>>
>>
>>
>>
>>


Re: [VOTE] Spark 2.3.0 (RC3)

2018-02-14 Thread Holden Karau
If this is a blocker in your view then the vote thread is an important
place to mention it. I'm not super sure all of the places these methods are
used so I'll defer to srowen and folks, but for the ML related implications
in the past we've allowed people to set the hashing function when we've
introduced changes.

On Feb 15, 2018 2:08 PM, "mrkm4ntr"  wrote:

> I was advised to post here in the discussion at GitHub. I do not know what
> to do about the problem of the discussion being dispersed across two places.
>
>
>
>
>


Re: [VOTE] Spark 2.3.0 (RC3)

2018-02-14 Thread mrkm4ntr
I was advised to post here in the discussion at GitHub. I do not know what to
do about the problem of the discussion being dispersed across two places.






Re: [VOTE] Spark 2.3.0 (RC3)

2018-02-14 Thread Holden Karau
So it's currently tagged as minor and under consideration for 2.4.0. Do you
think this priority is incorrect? This doesn't seem like a regression or a
correctness issue so normally we wouldn't hold the release. Of course you're
free to vote how you choose, just providing some additional context around
how we tend to do releases.

On Feb 14, 2018 11:03 PM, "mrkm4ntr"  wrote:

I'm -1 because of this issue.
I want to fix the hashing implementation in FeatureHasher before
FeatureHasher is released in 2.3.0.

https://issues.apache.org/jira/browse/SPARK-23381
https://github.com/apache/spark/pull/20568

I will fix it soon.





Re: [VOTE] Spark 2.3.0 (RC3)

2018-02-14 Thread mrkm4ntr
I'm -1 because of this issue.
I want to fix the hashing implementation in FeatureHasher before
FeatureHasher is released in 2.3.0.

https://issues.apache.org/jira/browse/SPARK-23381
https://github.com/apache/spark/pull/20568

I will fix it soon.






Re: [VOTE] Spark 2.3.0 (RC3)

2018-02-13 Thread Sameer Agarwal
The issue with SPARK-23292 is that we currently run the python tests
related to pandas and pyarrow with python 3 (which is already installed on
all amplab jenkins machines). Since the code path is fully tested, we
decided to not mark it as a blocker; I've reworded the title to better
indicate that.

On 13 February 2018 at 08:16, Sean Owen  wrote:

> +1 from me. Again, licenses and sigs look fine. I built the source
> distribution with "-Phive -Phadoop-2.7 -Pyarn -Pkubernetes" and all tests
> passed.
>
> Remaining issues for 2.3.0, none of which are a Blocker:
>
> SPARK-22797 Add multiple column support to PySpark Bucketizer
> SPARK-23083 Adding Kubernetes as an option to https://spark.apache.org/
> SPARK-23292 python tests related to pandas are skipped
> SPARK-23309 Spark 2.3 cached query performance 20-30% worse then spark 2.2
> SPARK-23316 AnalysisException after max iteration reached for IN query
>
> ... though the pandas tests issue is "Critical".
>
> (SPARK-23083 is an update to the main site that should happen as the
> artifacts are released, so it's OK.)
>
> On Tue, Feb 13, 2018 at 12:30 AM Sameer Agarwal 
> wrote:
>
>> Now that all known blockers have once again been resolved, please vote on
>> releasing the following candidate as Apache Spark version 2.3.0. The vote
>> is open until Friday February 16, 2018 at 8:00:00 am UTC and passes if a
>> majority of at least 3 PMC +1 votes are cast.
>>
>>
>> [ ] +1 Release this package as Apache Spark 2.3.0
>>
>> [ ] -1 Do not release this package because ...
>>
>>
>> To learn more about Apache Spark, please see https://spark.apache.org/
>>
>> The tag to be voted on is v2.3.0-rc3:
>> https://github.com/apache/spark/tree/v2.3.0-rc3
>> (89f6fcbafcfb0a7aeb897fba6036cb085bd35121)
>>
>> List of JIRA tickets resolved in this release can be found here:
>> https://issues.apache.org/jira/projects/SPARK/versions/12339551
>>
>> The release files, including signatures, digests, etc. can be found at:
>> https://dist.apache.org/repos/dist/dev/spark/v2.3.0-rc3-bin/
>>
>> Release artifacts are signed with the following key:
>> https://dist.apache.org/repos/dist/dev/spark/KEYS
>>
>> The staging repository for this release can be found at:
>> https://repository.apache.org/content/repositories/orgapachespark-1264/
>>
>> The documentation corresponding to this release can be found at:
>> https://dist.apache.org/repos/dist/dev/spark/v2.3.0-rc3-docs/_site/index.html
>>
>>
>> FAQ
>>
>> ===
>> What are the unresolved issues targeted for 2.3.0?
>> ===
>>
>> Please see https://s.apache.org/oXKi. At the time of writing, there are
>> currently no known release blockers.
>>
>> =
>> How can I help test this release?
>> =
>>
>> If you are a Spark user, you can help us test this release by taking an
>> existing Spark workload and running on this release candidate, then
>> reporting any regressions.
>>
>> If you're working in PySpark you can set up a virtual env and install the
>> current RC and see if anything important breaks. In Java/Scala you can
>> add the staging repository to your project's resolvers and test with the RC
>> (make sure to clean up the artifact cache before/after so you don't end up
>> building with an out-of-date RC going forward).
>>
>> ===
>> What should happen to JIRA tickets still targeting 2.3.0?
>> ===
>>
>> Committers should look at those and triage. Extremely important bug
>> fixes, documentation, and API tweaks that impact compatibility should be
>> worked on immediately. Everything else please retarget to 2.3.1 or 2.4.0 as
>> appropriate.
>>
>> ===
>> Why is my bug not fixed?
>> ===
>>
>> In order to make timely releases, we will typically not hold the release
>> unless the bug in question is a regression from 2.2.0. That being said, if
>> there is something which is a regression from 2.2.0 and has not been
>> correctly targeted please ping me or a committer to help target the issue
>> (you can see the open issues listed as impacting Spark 2.3.0 at
>> https://s.apache.org/WmoI).
>>
>>
>> Regards,
>> Sameer
>>
>


-- 
Sameer Agarwal
Computer Science | UC Berkeley
http://cs.berkeley.edu/~sameerag


Re: [VOTE] Spark 2.3.0 (RC3)

2018-02-13 Thread Sean Owen
+1 from me. Again, licenses and sigs look fine. I built the source
distribution with "-Phive -Phadoop-2.7 -Pyarn -Pkubernetes" and all tests
passed.

Remaining issues for 2.3.0, none of which are a Blocker:

SPARK-22797 Add multiple column support to PySpark Bucketizer
SPARK-23083 Adding Kubernetes as an option to https://spark.apache.org/
SPARK-23292 python tests related to pandas are skipped
SPARK-23309 Spark 2.3 cached query performance 20-30% worse then spark 2.2
SPARK-23316 AnalysisException after max iteration reached for IN query

... though the pandas tests issue is "Critical".

(SPARK-23083 is an update to the main site that should happen as the
artifacts are released, so it's OK.)

On Tue, Feb 13, 2018 at 12:30 AM Sameer Agarwal  wrote:

> Now that all known blockers have once again been resolved, please vote on
> releasing the following candidate as Apache Spark version 2.3.0. The vote
> is open until Friday February 16, 2018 at 8:00:00 am UTC and passes if a
> majority of at least 3 PMC +1 votes are cast.
>
>
> [ ] +1 Release this package as Apache Spark 2.3.0
>
> [ ] -1 Do not release this package because ...
>
>
> To learn more about Apache Spark, please see https://spark.apache.org/
>
> The tag to be voted on is v2.3.0-rc3:
> https://github.com/apache/spark/tree/v2.3.0-rc3
> (89f6fcbafcfb0a7aeb897fba6036cb085bd35121)
>
> List of JIRA tickets resolved in this release can be found here:
> https://issues.apache.org/jira/projects/SPARK/versions/12339551
>
> The release files, including signatures, digests, etc. can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v2.3.0-rc3-bin/
>
> Release artifacts are signed with the following key:
> https://dist.apache.org/repos/dist/dev/spark/KEYS
>
> The staging repository for this release can be found at:
> https://repository.apache.org/content/repositories/orgapachespark-1264/
>
> The documentation corresponding to this release can be found at:
>
> https://dist.apache.org/repos/dist/dev/spark/v2.3.0-rc3-docs/_site/index.html
>
>
> FAQ
>
> ===
> What are the unresolved issues targeted for 2.3.0?
> ===
>
> Please see https://s.apache.org/oXKi. At the time of writing, there are
> currently no known release blockers.
>
> =
> How can I help test this release?
> =
>
> If you are a Spark user, you can help us test this release by taking an
> existing Spark workload and running on this release candidate, then
> reporting any regressions.
>
> If you're working in PySpark you can set up a virtual env and install the
> current RC and see if anything important breaks. In Java/Scala you can
> add the staging repository to your project's resolvers and test with the RC
> (make sure to clean up the artifact cache before/after so you don't end up
> building with an out-of-date RC going forward).
>
> ===
> What should happen to JIRA tickets still targeting 2.3.0?
> ===
>
> Committers should look at those and triage. Extremely important bug fixes,
> documentation, and API tweaks that impact compatibility should be worked on
> immediately. Everything else please retarget to 2.3.1 or 2.4.0 as
> appropriate.
>
> ===
> Why is my bug not fixed?
> ===
>
> In order to make timely releases, we will typically not hold the release
> unless the bug in question is a regression from 2.2.0. That being said, if
> there is something which is a regression from 2.2.0 and has not been
> correctly targeted please ping me or a committer to help target the issue
> (you can see the open issues listed as impacting Spark 2.3.0 at
> https://s.apache.org/WmoI).
>
>
> Regards,
> Sameer
>


Re: [VOTE] Spark 2.3.0 (RC3)

2018-02-12 Thread Sameer Agarwal
I'll start the vote with a +1.

As of today, all known release blockers and QA tasks have been resolved,
and the jenkins builds are healthy:
https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/

On 12 February 2018 at 22:30, Sameer Agarwal  wrote:

> Now that all known blockers have once again been resolved, please vote on
> releasing the following candidate as Apache Spark version 2.3.0. The vote
> is open until Friday February 16, 2018 at 8:00:00 am UTC and passes if a
> majority of at least 3 PMC +1 votes are cast.
>
>
> [ ] +1 Release this package as Apache Spark 2.3.0
>
> [ ] -1 Do not release this package because ...
>
>
> To learn more about Apache Spark, please see https://spark.apache.org/
>
> The tag to be voted on is v2.3.0-rc3:
> https://github.com/apache/spark/tree/v2.3.0-rc3
> (89f6fcbafcfb0a7aeb897fba6036cb085bd35121)
>
> List of JIRA tickets resolved in this release can be found here:
> https://issues.apache.org/jira/projects/SPARK/versions/12339551
>
> The release files, including signatures, digests, etc. can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v2.3.0-rc3-bin/
>
> Release artifacts are signed with the following key:
> https://dist.apache.org/repos/dist/dev/spark/KEYS
>
> The staging repository for this release can be found at:
> https://repository.apache.org/content/repositories/orgapachespark-1264/
>
> The documentation corresponding to this release can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v2.3.0-rc3-docs/_site/index.html
>
>
> FAQ
>
> ===
> What are the unresolved issues targeted for 2.3.0?
> ===
>
> Please see https://s.apache.org/oXKi. At the time of writing, there are
> currently no known release blockers.
>
> =
> How can I help test this release?
> =
>
> If you are a Spark user, you can help us test this release by taking an
> existing Spark workload and running on this release candidate, then
> reporting any regressions.
>
> If you're working in PySpark you can set up a virtual env and install the
> current RC and see if anything important breaks. In Java/Scala you can
> add the staging repository to your project's resolvers and test with the RC
> (make sure to clean up the artifact cache before/after so you don't end up
> building with an out-of-date RC going forward).
>
> ===
> What should happen to JIRA tickets still targeting 2.3.0?
> ===
>
> Committers should look at those and triage. Extremely important bug fixes,
> documentation, and API tweaks that impact compatibility should be worked on
> immediately. Everything else please retarget to 2.3.1 or 2.4.0 as
> appropriate.
>
> ===
> Why is my bug not fixed?
> ===
>
> In order to make timely releases, we will typically not hold the release
> unless the bug in question is a regression from 2.2.0. That being said, if
> there is something which is a regression from 2.2.0 and has not been
> correctly targeted please ping me or a committer to help target the issue
> (you can see the open issues listed as impacting Spark 2.3.0 at
> https://s.apache.org/WmoI).
>
>
> Regards,
> Sameer
>


[VOTE] Spark 2.3.0 (RC3)

2018-02-12 Thread Sameer Agarwal
Now that all known blockers have once again been resolved, please vote on
releasing the following candidate as Apache Spark version 2.3.0. The vote
is open until Friday February 16, 2018 at 8:00:00 am UTC and passes if a
majority of at least 3 PMC +1 votes are cast.


[ ] +1 Release this package as Apache Spark 2.3.0

[ ] -1 Do not release this package because ...
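
For anyone tallying along, the pass condition stated above can be sketched
roughly as follows. This is only an illustration, not an official tool: it
assumes the rule reads as "at least three binding (PMC) +1 votes, and more
binding +1 than binding -1 votes". The Vote class, the vote_passes function,
and the voter names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Vote:
    voter: str     # hypothetical voter name
    value: int     # +1, -1, or 0 for an abstention
    binding: bool  # True when cast by a PMC member (assumed to be known)


def vote_passes(votes, min_binding_plus_ones=3):
    """Return True when the binding votes meet the assumed pass rule."""
    binding = [v for v in votes if v.binding]
    plus = sum(1 for v in binding if v.value > 0)
    minus = sum(1 for v in binding if v.value < 0)
    # Assumed rule: enough binding +1 votes, and they outnumber binding -1 votes.
    return plus >= min_binding_plus_ones and plus > minus


if __name__ == "__main__":
    ballots = [
        Vote("pmc-a", +1, True),
        Vote("pmc-b", +1, True),
        Vote("pmc-c", +1, True),
        Vote("contributor-d", -1, False),  # non-binding, not counted
    ]
    print(vote_passes(ballots))
```

Non-binding votes are left out of the count here, although in practice they
can still surface issues that lead to a new RC.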


To learn more about Apache Spark, please see https://spark.apache.org/

The tag to be voted on is v2.3.0-rc3:
https://github.com/apache/spark/tree/v2.3.0-rc3
(89f6fcbafcfb0a7aeb897fba6036cb085bd35121)

List of JIRA tickets resolved in this release can be found here:
https://issues.apache.org/jira/projects/SPARK/versions/12339551

The release files, including signatures, digests, etc. can be found at:
https://dist.apache.org/repos/dist/dev/spark/v2.3.0-rc3-bin/

Release artifacts are signed with the following key:
https://dist.apache.org/repos/dist/dev/spark/KEYS
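
A minimal sketch of checking a downloaded artifact against its published
digest, assuming the digest is SHA-512 and that you paste the expected value
as a hex string. The function names are hypothetical, and this does not
replace verifying the signature against the KEYS file with GPG.

```python
import hashlib
from pathlib import Path


def sha512_hex(path, chunk_size=1 << 20):
    """Compute the SHA-512 digest of a file, reading it in chunks."""
    digest = hashlib.sha512()
    with Path(path).open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_digest(path, expected_hex):
    """Compare a file's digest with an expected hex string.

    Whitespace and letter case in the expected value are ignored, since
    published digests are sometimes wrapped or upper-cased (an assumption
    about the input, not a statement about the exact file format).
    """
    cleaned = "".join(expected_hex.split()).lower()
    return sha512_hex(path) == cleaned
```

Usage would look like matches_digest("artifact.tgz", "AB12 CD34 ..."), where
both the local file name and the digest text are placeholders.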

The staging repository for this release can be found at:
https://repository.apache.org/content/repositories/orgapachespark-1264/

The documentation corresponding to this release can be found at:
https://dist.apache.org/repos/dist/dev/spark/v2.3.0-rc3-docs/_site/index.html


FAQ

===
What are the unresolved issues targeted for 2.3.0?
===

Please see https://s.apache.org/oXKi. At the time of writing, there are
currently no known release blockers.

=
How can I help test this release?
=

If you are a Spark user, you can help us test this release by taking an
existing Spark workload and running on this release candidate, then
reporting any regressions.

If you're working in PySpark you can set up a virtual env and install the
current RC and see if anything important breaks. In Java/Scala you can
add the staging repository to your project's resolvers and test with the RC
(make sure to clean up the artifact cache before/after so you don't end up
building with an out-of-date RC going forward).
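
One way to turn "running an existing workload and reporting regressions" into
something concrete is to time the same jobs on 2.2.0 and on the RC and compare
the numbers. The sketch below is only an illustration: the find_regressions
function, the 10% tolerance, and the workload names are all hypothetical, and
the timings are assumed to be wall-clock seconds you collected yourself.

```python
def find_regressions(baseline, candidate, tolerance=0.10):
    """Return {workload: relative slowdown} for workloads that got slower.

    baseline and candidate map workload names to run times in seconds.
    A workload is reported when the candidate is slower than the baseline
    by more than the given tolerance (0.10 means 10%).
    """
    regressions = {}
    for name, base_seconds in baseline.items():
        if name not in candidate or base_seconds <= 0:
            continue  # nothing to compare against
        slowdown = candidate[name] / base_seconds - 1.0
        if slowdown > tolerance:
            regressions[name] = slowdown
    return regressions


if __name__ == "__main__":
    spark_2_2 = {"cached_query": 10.0, "etl_job": 5.0}  # hypothetical timings
    spark_2_3_rc = {"cached_query": 12.5, "etl_job": 5.1}
    for name, slowdown in find_regressions(spark_2_2, spark_2_3_rc).items():
        print(f"{name}: {slowdown:.0%} slower on the RC")
```

Correctness regressions (different results rather than different timings)
would need a separate comparison of outputs, which this sketch does not cover.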

===
What should happen to JIRA tickets still targeting 2.3.0?
===

Committers should look at those and triage. Extremely important bug fixes,
documentation, and API tweaks that impact compatibility should be worked on
immediately. Everything else please retarget to 2.3.1 or 2.4.0 as
appropriate.

===
Why is my bug not fixed?
===

In order to make timely releases, we will typically not hold the release
unless the bug in question is a regression from 2.2.0. That being said, if
there is something which is a regression from 2.2.0 and has not been
correctly targeted please ping me or a committer to help target the issue
(you can see the open issues listed as impacting Spark 2.3.0 at
https://s.apache.org/WmoI).


Regards,
Sameer