Why Dataset.hint uses logicalPlan (= analyzed not planWithBarrier)?

2018-01-25 Thread Jacek Laskowski
Hi,

I've just noticed that every time Dataset.hint is used, it triggers
execution of logical commands, their unions, and hint resolution (among
the other things that the analyzer does).

Why?

Why does hint trigger hint resolution (through QueryExecution.analyzed)? [1]

And moreover, why not use planWithBarrier instead? [2] It looks like an
oversight, doesn't it?

[1]
https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala#L1219

[2]
https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala#L195
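
For readers without the source open, here is a paraphrased sketch of the code
the two links point at (condensed from Dataset.scala at the time; treat the
exact member names and bodies as approximate, not an exact copy):

```scala
// Sketch of Dataset.hint [1]: the child of the UnresolvedHint node is
// `logicalPlan`, i.e. the *analyzed* plan.
def hint(name: String, parameters: Any*): Dataset[T] = withTypedPlan {
  UnresolvedHint(name, parameters, logicalPlan)
}

// Sketch of the two plan members [2]: `logicalPlan` runs the analyzer
// (eagerly executing commands along the way), while `planWithBarrier`
// wraps the result in an AnalysisBarrier so that later transformations
// don't re-analyze the already-analyzed subtree.
lazy val logicalPlan: LogicalPlan = queryExecution.analyzed // simplified
lazy val planWithBarrier: LogicalPlan = AnalysisBarrier(logicalPlan)
```

Using planWithBarrier as the hint's child would, as the question suggests,
avoid re-running analysis on the subtree when the hinted Dataset is executed.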

Regards,
Jacek Laskowski

https://about.me/JacekLaskowski
Mastering Spark SQL https://bit.ly/mastering-spark-sql
Spark Structured Streaming https://bit.ly/spark-structured-streaming
Mastering Kafka Streams https://bit.ly/mastering-kafka-streams
Follow me at https://twitter.com/jaceklaskowski


Re: [VOTE] Spark 2.3.0 (RC2)

2018-01-25 Thread Joseph Torres
SPARK-23221 fixes an issue specific
to KafkaContinuousSourceStressForDontFailOnDataLossSuite; I don't think it
could cause other suites to deadlock.

Do note that the previous hang issues we saw caused by SPARK-23055 were
correctly marked as failures.

On Thu, Jan 25, 2018 at 3:40 PM, Shixiong(Ryan) Zhu  wrote:

> + Jose
>
> On Thu, Jan 25, 2018 at 2:18 PM, Dongjoon Hyun 
> wrote:
>
>> SPARK-23221 is one of the reasons for Kafka-test-suite deadlock issue.
>>
>> For the hang issues, it seems not to be marked as a failure correctly in
>> Apache Spark Jenkins history.
>>
>>
>> On Thu, Jan 25, 2018 at 1:03 PM, Marcelo Vanzin 
>> wrote:
>>
>>> On Thu, Jan 25, 2018 at 12:29 PM, Sean Owen  wrote:
>>> > I am still seeing these tests fail or hang:
>>> >
>>> > - subscribing topic by name from earliest offsets (failOnDataLoss:
>>> false)
>>> > - subscribing topic by name from earliest offsets (failOnDataLoss:
>>> true)
>>>
>>> This is something that we are seeing internally on a different version
>>> Spark, and we're currently investigating with our Kafka people. Not
>>> sure it's the same issue (we have a newer version of Kafka libraries),
>>> but this is just another way of saying that I don't think those hangs
>>> are new in 2.3, at least.
>>>
>>> --
>>> Marcelo
>>>
>>> -
>>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>>>
>>>
>>
>


Re: [VOTE] Spark 2.3.0 (RC2)

2018-01-25 Thread Shixiong(Ryan) Zhu
+ Jose

On Thu, Jan 25, 2018 at 2:18 PM, Dongjoon Hyun 
wrote:

> SPARK-23221 is one of the reasons for Kafka-test-suite deadlock issue.
>
> For the hang issues, it seems not to be marked as a failure correctly in
> Apache Spark Jenkins history.
>
>
> On Thu, Jan 25, 2018 at 1:03 PM, Marcelo Vanzin 
> wrote:
>
>> On Thu, Jan 25, 2018 at 12:29 PM, Sean Owen  wrote:
>> > I am still seeing these tests fail or hang:
>> >
>> > - subscribing topic by name from earliest offsets (failOnDataLoss:
>> false)
>> > - subscribing topic by name from earliest offsets (failOnDataLoss: true)
>>
>> This is something that we are seeing internally on a different version
>> Spark, and we're currently investigating with our Kafka people. Not
>> sure it's the same issue (we have a newer version of Kafka libraries),
>> but this is just another way of saying that I don't think those hangs
>> are new in 2.3, at least.
>>
>> --
>> Marcelo
>>
>>
>>
>


Re: [VOTE] Spark 2.3.0 (RC2)

2018-01-25 Thread Dongjoon Hyun
SPARK-23221 is one of the reasons for the Kafka test suite deadlock issue.

As for the hang issues, they don't seem to be marked as failures correctly
in the Apache Spark Jenkins history.


On Thu, Jan 25, 2018 at 1:03 PM, Marcelo Vanzin  wrote:

> On Thu, Jan 25, 2018 at 12:29 PM, Sean Owen  wrote:
> > I am still seeing these tests fail or hang:
> >
> > - subscribing topic by name from earliest offsets (failOnDataLoss: false)
> > - subscribing topic by name from earliest offsets (failOnDataLoss: true)
>
> This is something that we are seeing internally on a different version
> Spark, and we're currently investigating with our Kafka people. Not
> sure it's the same issue (we have a newer version of Kafka libraries),
> but this is just another way of saying that I don't think those hangs
> are new in 2.3, at least.
>
> --
> Marcelo
>
>
>


Re: [VOTE] Spark 2.3.0 (RC2)

2018-01-25 Thread Marcelo Vanzin
On Thu, Jan 25, 2018 at 12:29 PM, Sean Owen  wrote:
> I am still seeing these tests fail or hang:
>
> - subscribing topic by name from earliest offsets (failOnDataLoss: false)
> - subscribing topic by name from earliest offsets (failOnDataLoss: true)

This is something that we are seeing internally on a different version
of Spark, and we're currently investigating with our Kafka people. I'm not
sure it's the same issue (we have a newer version of the Kafka libraries),
but this is just another way of saying that I don't think those hangs
are new in 2.3, at least.

-- 
Marcelo




Re: [VOTE] Spark 2.3.0 (RC2)

2018-01-25 Thread Sameer Agarwal
> Most tests pass on RC2, except I'm still seeing the timeout caused by
> https://issues.apache.org/jira/browse/SPARK-23055 ; the tests never
> finish. I followed the thread a bit further and wasn't clear whether it was
> subsequently re-fixed for 2.3.0 or not. It says it's resolved along with
> https://issues.apache.org/jira/browse/SPARK-22908 for 2.3.0 though I am
> still seeing these tests fail or hang:
>
> - subscribing topic by name from earliest offsets (failOnDataLoss: false)
> - subscribing topic by name from earliest offsets (failOnDataLoss: true)
>

Sean, while some of these tests were timing out on RC1, we're not aware of
any known issues in RC2. Both the maven
(https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-branch-2.3-test-maven-hadoop-2.6/146/testReport/org.apache.spark.sql.kafka010/history/)
and sbt
(https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Test%20(Dashboard)/job/spark-branch-2.3-test-sbt-hadoop-2.6/123/testReport/org.apache.spark.sql.kafka010/history/)
historical builds on Jenkins for org.apache.spark.sql.kafka010 look fairly
healthy. If you're still seeing timeouts in RC2, can you create a JIRA with
any applicable build/env info?



> On Tue, Jan 23, 2018 at 9:01 AM Sean Owen  wrote:
>
>> I'm not seeing that same problem on OS X and /usr/bin/tar. I tried
>> unpacking it with 'xvzf' and also unzipping it first, and it untarred
>> without warnings in either case.
>>
>> I am encountering errors while running the tests, different ones each
>> time, so am still figuring out whether there is a real problem or just
>> flaky tests.
>>
>> These issues look like blockers, as they are inherently meant to be
>> completed before the 2.3 release. They are mostly not done. I suppose I'd
>> -1 on behalf of those who say this needs to be done first, though we can
>> keep testing.
>>
>> SPARK-23105 Spark MLlib, GraphX 2.3 QA umbrella
>> SPARK-23114 Spark R 2.3 QA umbrella
>>
>> Here are the remaining items targeted for 2.3:
>>
>> SPARK-15689 Data source API v2
>> SPARK-20928 SPIP: Continuous Processing Mode for Structured Streaming
>> SPARK-21646 Add new type coercion rules to compatible with Hive
>> SPARK-22386 Data Source V2 improvements
>> SPARK-22731 Add a test for ROWID type to OracleIntegrationSuite
>> SPARK-22735 Add VectorSizeHint to ML features documentation
>> SPARK-22739 Additional Expression Support for Objects
>> SPARK-22809 pyspark is sensitive to imports with dots
>> SPARK-22820 Spark 2.3 SQL API audit
>>
>>
>> On Mon, Jan 22, 2018 at 7:09 PM Marcelo Vanzin 
>> wrote:
>>
>>> +0
>>>
>>> Signatures check out. Code compiles, although I see the errors in [1]
>>> when untarring the source archive; perhaps we should add "use GNU tar"
>>> to the RM checklist?
>>>
>>> Also ran our internal tests and they seem happy.
>>>
>>> My concern is the list of open bugs targeted at 2.3.0 (ignoring the
>>> documentation ones). It is not long, but it seems some of those need
>>> to be looked at. It would be nice for the committers who are involved
>>> in those bugs to take a look.
>>>
>>> [1] https://superuser.com/questions/318809/linux-os-x-tar-incompatibility-tarballs-created-on-os-x-give-errors-when-unt
>>>
>>>
>>> On Mon, Jan 22, 2018 at 1:36 PM, Sameer Agarwal 
>>> wrote:
>>> > Please vote on releasing the following candidate as Apache Spark
>>> version
>>> > 2.3.0. The vote is open until Friday January 26, 2018 at 8:00:00 am
>>> UTC and
>>> > passes if a majority of at least 3 PMC +1 votes are cast.
>>> >
>>> >
>>> > [ ] +1 Release this package as Apache Spark 2.3.0
>>> >
>>> > [ ] -1 Do not release this package because ...
>>> >
>>> >
>>> > To learn more about Apache Spark, please see https://spark.apache.org/
>>> >
>>> > The tag to be voted on is v2.3.0-rc2:
>>> > https://github.com/apache/spark/tree/v2.3.0-rc2
>>> > (489ecb0ef23e5d9b705e5e5bae4fa3d871bdac91)
>>> >
>>> > List of JIRA tickets resolved in this release can be found here:
>>> > https://issues.apache.org/jira/projects/SPARK/versions/12339551
>>> >
>>> > The release files, including signatures, digests, etc. can be found at:
>>> > https://dist.apache.org/repos/dist/dev/spark/v2.3.0-rc2-bin/
>>> >
>>> > Release artifacts are signed with the following key:
>>> > https://dist.apache.org/repos/dist/dev/spark/KEYS
>>> >
>>> > The staging repository for this release can be found at:
>>> > https://repository.apache.org/content/repositories/orgapachespark-1262/
>>> >
>>> > The documentation corresponding to this release can be found at:
>>> > https://dist.apache.org/repos/dist/dev/spark/v2.3.0-rc2-docs/_site/index.html
>>> >
>>> >
>>> > FAQ
>>> >
>>> > ===
>>> > What are the unresolved issues targeted for 2.3.0?
>>> > ===
>>> >
>>> > Please see https://s.apache.org/oXKi. At the time of writing, there
>>> are
>>> > currently no known release blockers.
>>> >

Re: [VOTE] Spark 2.3.0 (RC2)

2018-01-25 Thread Sameer Agarwal
I'm a -1 too. In addition to SPARK-23207, we've recently merged two codegen
fixes (SPARK-23208 and SPARK-21717) that address a major code-splitting bug
and performance regressions, respectively.

Regarding QA tasks, I think it goes without saying that all QA
pre-requisites are by definition "release blockers" and an RC will not pass
until all of them are resolved. Traditionally, for every major Spark
release, we've seen that serious QA only starts once an RC is cut, but if
the community feels otherwise, I'm happy to hold off the next RC until all
these QA JIRAs are resolved. Otherwise, I'll follow up with an RC3 once
SPARK-23207 and SPARK-23209 are resolved.

On 25 January 2018 at 10:17, Nick Pentreath 
wrote:

> I think this has come up before (and Sean mentions it above), but the
> sub-items on:
>
> SPARK-23105 Spark MLlib, GraphX 2.3 QA umbrella
>
> are actually marked as Blockers, but are not targeted to 2.3.0. I think
> they should be, and I'm not comfortable with those not being resolved
> before voting positively on the release.
>
> So I'm -1 too for that reason.
>
> I think most of those review items are close to done, and there is also
> https://issues.apache.org/jira/browse/SPARK-22799 that I think should be
> in for 2.3 (to avoid a behavior change later between 2.3.0 and 2.3.1,
> especially since we'll have another RC now it seems).
>
>
> On Thu, 25 Jan 2018 at 19:28 Marcelo Vanzin  wrote:
>
>> Sorry, have to change my vote again. Hive guys ran into SPARK-23209
>> and that's a regression we need to fix. I'll post a patch soon. So -1
>> (although others have already -1'ed).
>>
>> On Wed, Jan 24, 2018 at 11:42 AM, Marcelo Vanzin 
>> wrote:
>> > Given that the bugs I was worried about have been dealt with, I'm
>> > upgrading to +1.
>> >
>> > On Mon, Jan 22, 2018 at 5:09 PM, Marcelo Vanzin 
>> wrote:
>> >> +0
>> >>
>> >> Signatures check out. Code compiles, although I see the errors in [1]
>> >> when untarring the source archive; perhaps we should add "use GNU tar"
>> >> to the RM checklist?
>> >>
>> >> Also ran our internal tests and they seem happy.
>> >>
>> >> My concern is the list of open bugs targeted at 2.3.0 (ignoring the
>> >> documentation ones). It is not long, but it seems some of those need
>> >> to be looked at. It would be nice for the committers who are involved
>> >> in those bugs to take a look.
>> >>
>> >> [1] https://superuser.com/questions/318809/linux-os-x-tar-incompatibility-tarballs-created-on-os-x-give-errors-when-unt
>> >>
>> >>
>> >> On Mon, Jan 22, 2018 at 1:36 PM, Sameer Agarwal 
>> wrote:
>> >>> Please vote on releasing the following candidate as Apache Spark
>> version
>> >>> 2.3.0. The vote is open until Friday January 26, 2018 at 8:00:00 am
>> UTC and
>> >>> passes if a majority of at least 3 PMC +1 votes are cast.
>> >>>
>> >>>
>> >>> [ ] +1 Release this package as Apache Spark 2.3.0
>> >>>
>> >>> [ ] -1 Do not release this package because ...
>> >>>
>> >>>
>> >>> To learn more about Apache Spark, please see
>> https://spark.apache.org/
>> >>>
>> >>> The tag to be voted on is v2.3.0-rc2:
>> >>> https://github.com/apache/spark/tree/v2.3.0-rc2
>> >>> (489ecb0ef23e5d9b705e5e5bae4fa3d871bdac91)
>> >>>
>> >>> List of JIRA tickets resolved in this release can be found here:
>> >>> https://issues.apache.org/jira/projects/SPARK/versions/12339551
>> >>>
>> >>> The release files, including signatures, digests, etc. can be found
>> at:
>> >>> https://dist.apache.org/repos/dist/dev/spark/v2.3.0-rc2-bin/
>> >>>
>> >>> Release artifacts are signed with the following key:
>> >>> https://dist.apache.org/repos/dist/dev/spark/KEYS
>> >>>
>> >>> The staging repository for this release can be found at:
>> >>> https://repository.apache.org/content/repositories/orgapachespark-1262/
>> >>>
>> >>> The documentation corresponding to this release can be found at:
>> >>> https://dist.apache.org/repos/dist/dev/spark/v2.3.0-rc2-docs/_site/index.html
>> >>>
>> >>>
>> >>> FAQ
>> >>>
>> >>> ===
>> >>> What are the unresolved issues targeted for 2.3.0?
>> >>> ===
>> >>>
>> >>> Please see https://s.apache.org/oXKi. At the time of writing, there
>> are
>> >>> currently no known release blockers.
>> >>>
>> >>> =
>> >>> How can I help test this release?
>> >>> =
>> >>>
>> >>> If you are a Spark user, you can help us test this release by taking
>> an
>> >>> existing Spark workload and running on this release candidate, then
>> >>> reporting any regressions.
>> >>>
>> >>> If you're working in PySpark you can set 

Re: [VOTE] Spark 2.3.0 (RC2)

2018-01-25 Thread Sean Owen
Most tests pass on RC2, except I'm still seeing the timeout caused by
https://issues.apache.org/jira/browse/SPARK-23055; the tests never finish.
I followed the thread a bit further and wasn't clear whether it was
subsequently re-fixed for 2.3.0 or not. It says it's resolved along with
https://issues.apache.org/jira/browse/SPARK-22908 for 2.3.0, though I am
still seeing these tests fail or hang:

- subscribing topic by name from earliest offsets (failOnDataLoss: false)
- subscribing topic by name from earliest offsets (failOnDataLoss: true)

Will check out the next RC.

On Tue, Jan 23, 2018 at 9:01 AM Sean Owen  wrote:

> I'm not seeing that same problem on OS X and /usr/bin/tar. I tried
> unpacking it with 'xvzf' and also unzipping it first, and it untarred
> without warnings in either case.
>
> I am encountering errors while running the tests, different ones each
> time, so am still figuring out whether there is a real problem or just
> flaky tests.
>
> These issues look like blockers, as they are inherently meant to be
> completed before the 2.3 release. They are mostly not done. I suppose I'd
> -1 on behalf of those who say this needs to be done first, though we can
> keep testing.
>
> SPARK-23105 Spark MLlib, GraphX 2.3 QA umbrella
> SPARK-23114 Spark R 2.3 QA umbrella
>
> Here are the remaining items targeted for 2.3:
>
> SPARK-15689 Data source API v2
> SPARK-20928 SPIP: Continuous Processing Mode for Structured Streaming
> SPARK-21646 Add new type coercion rules to compatible with Hive
> SPARK-22386 Data Source V2 improvements
> SPARK-22731 Add a test for ROWID type to OracleIntegrationSuite
> SPARK-22735 Add VectorSizeHint to ML features documentation
> SPARK-22739 Additional Expression Support for Objects
> SPARK-22809 pyspark is sensitive to imports with dots
> SPARK-22820 Spark 2.3 SQL API audit
>
>
> On Mon, Jan 22, 2018 at 7:09 PM Marcelo Vanzin 
> wrote:
>
>> +0
>>
>> Signatures check out. Code compiles, although I see the errors in [1]
>> when untarring the source archive; perhaps we should add "use GNU tar"
>> to the RM checklist?
>>
>> Also ran our internal tests and they seem happy.
>>
>> My concern is the list of open bugs targeted at 2.3.0 (ignoring the
>> documentation ones). It is not long, but it seems some of those need
>> to be looked at. It would be nice for the committers who are involved
>> in those bugs to take a look.
>>
>> [1]
>> https://superuser.com/questions/318809/linux-os-x-tar-incompatibility-tarballs-created-on-os-x-give-errors-when-unt
>>
>>
>> On Mon, Jan 22, 2018 at 1:36 PM, Sameer Agarwal 
>> wrote:
>> > Please vote on releasing the following candidate as Apache Spark version
>> > 2.3.0. The vote is open until Friday January 26, 2018 at 8:00:00 am UTC
>> and
>> > passes if a majority of at least 3 PMC +1 votes are cast.
>> >
>> >
>> > [ ] +1 Release this package as Apache Spark 2.3.0
>> >
>> > [ ] -1 Do not release this package because ...
>> >
>> >
>> > To learn more about Apache Spark, please see https://spark.apache.org/
>> >
>> > The tag to be voted on is v2.3.0-rc2:
>> > https://github.com/apache/spark/tree/v2.3.0-rc2
>> > (489ecb0ef23e5d9b705e5e5bae4fa3d871bdac91)
>> >
>> > List of JIRA tickets resolved in this release can be found here:
>> > https://issues.apache.org/jira/projects/SPARK/versions/12339551
>> >
>> > The release files, including signatures, digests, etc. can be found at:
>> > https://dist.apache.org/repos/dist/dev/spark/v2.3.0-rc2-bin/
>> >
>> > Release artifacts are signed with the following key:
>> > https://dist.apache.org/repos/dist/dev/spark/KEYS
>> >
>> > The staging repository for this release can be found at:
>> > https://repository.apache.org/content/repositories/orgapachespark-1262/
>> >
>> > The documentation corresponding to this release can be found at:
>> >
>> https://dist.apache.org/repos/dist/dev/spark/v2.3.0-rc2-docs/_site/index.html
>> >
>> >
>> > FAQ
>> >
>> > ===
>> > What are the unresolved issues targeted for 2.3.0?
>> > ===
>> >
>> > Please see https://s.apache.org/oXKi. At the time of writing, there are
>> > currently no known release blockers.
>> >
>> > =
>> > How can I help test this release?
>> > =
>> >
>> > If you are a Spark user, you can help us test this release by taking an
>> > existing Spark workload and running on this release candidate, then
>> > reporting any regressions.
>> >
>> >>> If you're working in PySpark you can set up a virtual env and install the
>> >>> current RC and see if anything important breaks; in Java/Scala you can
>> >>> add the staging repository to your project's resolvers and test with the
>> >>> RC (make sure to clean up the artifact cache before/after so you don't
>> >>> end up building with an out-of-date RC going forward).
>> >
>> > ===
>> > What should happen to 

Re: [VOTE] Spark 2.3.0 (RC2)

2018-01-25 Thread Nick Pentreath
I think this has come up before (and Sean mentions it above), but the
sub-items on:

SPARK-23105 Spark MLlib, GraphX 2.3 QA umbrella

are actually marked as Blockers, but are not targeted to 2.3.0. I think
they should be, and I'm not comfortable with those not being resolved
before voting positively on the release.

So I'm -1 too for that reason.

I think most of those review items are close to done, and there is also
https://issues.apache.org/jira/browse/SPARK-22799 that I think should be in
for 2.3 (to avoid a behavior change later between 2.3.0 and 2.3.1,
especially since we'll have another RC now it seems).


On Thu, 25 Jan 2018 at 19:28 Marcelo Vanzin  wrote:

> Sorry, have to change my vote again. Hive guys ran into SPARK-23209
> and that's a regression we need to fix. I'll post a patch soon. So -1
> (although others have already -1'ed).
>
> On Wed, Jan 24, 2018 at 11:42 AM, Marcelo Vanzin 
> wrote:
> > Given that the bugs I was worried about have been dealt with, I'm
> > upgrading to +1.
> >
> > On Mon, Jan 22, 2018 at 5:09 PM, Marcelo Vanzin 
> wrote:
> >> +0
> >>
> >> Signatures check out. Code compiles, although I see the errors in [1]
> >> when untarring the source archive; perhaps we should add "use GNU tar"
> >> to the RM checklist?
> >>
> >> Also ran our internal tests and they seem happy.
> >>
> >> My concern is the list of open bugs targeted at 2.3.0 (ignoring the
> >> documentation ones). It is not long, but it seems some of those need
> >> to be looked at. It would be nice for the committers who are involved
> >> in those bugs to take a look.
> >>
> >> [1]
> https://superuser.com/questions/318809/linux-os-x-tar-incompatibility-tarballs-created-on-os-x-give-errors-when-unt
> >>
> >>
> >> On Mon, Jan 22, 2018 at 1:36 PM, Sameer Agarwal 
> wrote:
> >>> Please vote on releasing the following candidate as Apache Spark
> version
> >>> 2.3.0. The vote is open until Friday January 26, 2018 at 8:00:00 am
> UTC and
> >>> passes if a majority of at least 3 PMC +1 votes are cast.
> >>>
> >>>
> >>> [ ] +1 Release this package as Apache Spark 2.3.0
> >>>
> >>> [ ] -1 Do not release this package because ...
> >>>
> >>>
> >>> To learn more about Apache Spark, please see https://spark.apache.org/
> >>>
> >>> The tag to be voted on is v2.3.0-rc2:
> >>> https://github.com/apache/spark/tree/v2.3.0-rc2
> >>> (489ecb0ef23e5d9b705e5e5bae4fa3d871bdac91)
> >>>
> >>> List of JIRA tickets resolved in this release can be found here:
> >>> https://issues.apache.org/jira/projects/SPARK/versions/12339551
> >>>
> >>> The release files, including signatures, digests, etc. can be found at:
> >>> https://dist.apache.org/repos/dist/dev/spark/v2.3.0-rc2-bin/
> >>>
> >>> Release artifacts are signed with the following key:
> >>> https://dist.apache.org/repos/dist/dev/spark/KEYS
> >>>
> >>> The staging repository for this release can be found at:
> >>>
> https://repository.apache.org/content/repositories/orgapachespark-1262/
> >>>
> >>> The documentation corresponding to this release can be found at:
> >>>
> https://dist.apache.org/repos/dist/dev/spark/v2.3.0-rc2-docs/_site/index.html
> >>>
> >>>
> >>> FAQ
> >>>
> >>> ===
> >>> What are the unresolved issues targeted for 2.3.0?
> >>> ===
> >>>
> >>> Please see https://s.apache.org/oXKi. At the time of writing, there
> are
> >>> currently no known release blockers.
> >>>
> >>> =
> >>> How can I help test this release?
> >>> =
> >>>
> >>> If you are a Spark user, you can help us test this release by taking an
> >>> existing Spark workload and running on this release candidate, then
> >>> reporting any regressions.
> >>>
> >>> If you're working in PySpark you can set up a virtual env and install the
> >>> current RC and see if anything important breaks; in Java/Scala you can
> >>> add the staging repository to your project's resolvers and test with the
> >>> RC (make sure to clean up the artifact cache before/after so you don't
> >>> end up building with an out-of-date RC going forward).
> >>>
> >>> ===
> >>> What should happen to JIRA tickets still targeting 2.3.0?
> >>> ===
> >>>
> >>> Committers should look at those and triage. Extremely important bug
> fixes,
> >>> documentation, and API tweaks that impact compatibility should be
> worked on
> >>> immediately. Everything else please retarget to 2.3.1 or 2.3.0 as
> >>> appropriate.
> >>>
> >>> ===
> >>> Why is my bug not fixed?
> >>> ===
> >>>
> >>> In order to make timely releases, we will typically not hold the
> release
> >>> unless the bug in question is a regression from 2.2.0. That being
> said, if
> >>> there is something which is a regression from 2.2.0 and has not been
> >>> correctly targeted 

Re: [VOTE] Spark 2.3.0 (RC2)

2018-01-25 Thread Marcelo Vanzin
Sorry, have to change my vote again. Hive guys ran into SPARK-23209
and that's a regression we need to fix. I'll post a patch soon. So -1
(although others have already -1'ed).

On Wed, Jan 24, 2018 at 11:42 AM, Marcelo Vanzin  wrote:
> Given that the bugs I was worried about have been dealt with, I'm
> upgrading to +1.
>
> On Mon, Jan 22, 2018 at 5:09 PM, Marcelo Vanzin  wrote:
>> +0
>>
>> Signatures check out. Code compiles, although I see the errors in [1]
>> when untarring the source archive; perhaps we should add "use GNU tar"
>> to the RM checklist?
>>
>> Also ran our internal tests and they seem happy.
>>
>> My concern is the list of open bugs targeted at 2.3.0 (ignoring the
>> documentation ones). It is not long, but it seems some of those need
>> to be looked at. It would be nice for the committers who are involved
>> in those bugs to take a look.
>>
>> [1] 
>> https://superuser.com/questions/318809/linux-os-x-tar-incompatibility-tarballs-created-on-os-x-give-errors-when-unt
>>
>>
>> On Mon, Jan 22, 2018 at 1:36 PM, Sameer Agarwal  wrote:
>>> Please vote on releasing the following candidate as Apache Spark version
>>> 2.3.0. The vote is open until Friday January 26, 2018 at 8:00:00 am UTC and
>>> passes if a majority of at least 3 PMC +1 votes are cast.
>>>
>>>
>>> [ ] +1 Release this package as Apache Spark 2.3.0
>>>
>>> [ ] -1 Do not release this package because ...
>>>
>>>
>>> To learn more about Apache Spark, please see https://spark.apache.org/
>>>
>>> The tag to be voted on is v2.3.0-rc2:
>>> https://github.com/apache/spark/tree/v2.3.0-rc2
>>> (489ecb0ef23e5d9b705e5e5bae4fa3d871bdac91)
>>>
>>> List of JIRA tickets resolved in this release can be found here:
>>> https://issues.apache.org/jira/projects/SPARK/versions/12339551
>>>
>>> The release files, including signatures, digests, etc. can be found at:
>>> https://dist.apache.org/repos/dist/dev/spark/v2.3.0-rc2-bin/
>>>
>>> Release artifacts are signed with the following key:
>>> https://dist.apache.org/repos/dist/dev/spark/KEYS
>>>
>>> The staging repository for this release can be found at:
>>> https://repository.apache.org/content/repositories/orgapachespark-1262/
>>>
>>> The documentation corresponding to this release can be found at:
>>> https://dist.apache.org/repos/dist/dev/spark/v2.3.0-rc2-docs/_site/index.html
>>>
>>>
>>> FAQ
>>>
>>> ===
>>> What are the unresolved issues targeted for 2.3.0?
>>> ===
>>>
>>> Please see https://s.apache.org/oXKi. At the time of writing, there are
>>> currently no known release blockers.
>>>
>>> =
>>> How can I help test this release?
>>> =
>>>
>>> If you are a Spark user, you can help us test this release by taking an
>>> existing Spark workload and running on this release candidate, then
>>> reporting any regressions.
>>>
>>> If you're working in PySpark you can set up a virtual env and install the
>>> current RC and see if anything important breaks; in Java/Scala you can
>>> add the staging repository to your project's resolvers and test with the
>>> RC (make sure to clean up the artifact cache before/after so you don't
>>> end up building with an out-of-date RC going forward).
>>>
>>> ===
>>> What should happen to JIRA tickets still targeting 2.3.0?
>>> ===
>>>
>>> Committers should look at those and triage. Extremely important bug fixes,
>>> documentation, and API tweaks that impact compatibility should be worked on
>>> immediately. Everything else please retarget to 2.3.1 or 2.3.0 as
>>> appropriate.
>>>
>>> ===
>>> Why is my bug not fixed?
>>> ===
>>>
>>> In order to make timely releases, we will typically not hold the release
>>> unless the bug in question is a regression from 2.2.0. That being said, if
>>> there is something which is a regression from 2.2.0 and has not been
>>> correctly targeted please ping me or a committer to help target the issue
>>> (you can see the open issues listed as impacting Spark 2.3.0 at
>>> https://s.apache.org/WmoI).
>>>
>>>
>>> Regards,
>>> Sameer
>>
>>
>>
>> --
>> Marcelo
>
>
>
> --
> Marcelo



-- 
Marcelo




Re: [VOTE] Spark 2.3.0 (RC2)

2018-01-25 Thread 蒋星博
I'm sorry to post a -1 on this, but there is a non-trivial correctness
issue that I believe we should fix in 2.3.

TL;DR of the issue: a certain pattern of shuffle+repartition in a query
may produce wrong results if some downstream stages fail and trigger a retry
of the repartition. The reason for this bug is that the current
implementation of `repartition()` doesn't generate deterministic output.
The JIRA task: https://issues.apache.org/jira/browse/SPARK-23207

This is NOT a regression, but since it's a non-trivial correctness issue,
we'd better ship the patch along with 2.3.
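
To make the failure mode concrete, here is a small, self-contained Scala
sketch (not Spark's actual code; rows and partition counts are illustrative)
of why round-robin-style repartitioning is order-sensitive:

```scala
// Round-robin assignment: the partition a row lands in depends on its
// *position* in the input sequence, not on its value.
def roundRobin[A](rows: Seq[A], numPartitions: Int): Map[Int, Seq[A]] =
  rows.zipWithIndex
    .groupBy { case (_, idx) => idx % numPartitions }
    .map { case (p, xs) => p -> xs.map(_._1) }

val firstAttempt = Seq("a", "b", "c", "d", "e", "f") // original shuffle output order
val retryAttempt = Seq("b", "a", "c", "d", "f", "e") // same rows, different order after a retry

val p1 = roundRobin(firstAttempt, 2)
val p2 = roundRobin(retryAttempt, 2)

// Same input rows, yet the contents of partition 0 differ across attempts:
// p1(0) == Seq("a", "c", "e") whereas p2(0) == Seq("b", "c", "f").
// If only partition 0 is recomputed after a failure while partition 1's
// first-attempt output (b, d, f) is kept, rows "a" and "e" are lost and
// rows "b" and "f" appear twice — a silent wrong result.
```

This is the nondeterminism SPARK-23207 describes: correctness requires that
a retried repartition assigns every row to the same partition as the first
attempt did.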

2018-01-24 11:42 GMT-08:00 Marcelo Vanzin :

> Given that the bugs I was worried about have been dealt with, I'm
> upgrading to +1.
>
> On Mon, Jan 22, 2018 at 5:09 PM, Marcelo Vanzin 
> wrote:
> > +0
> >
> > Signatures check out. Code compiles, although I see the errors in [1]
> > when untarring the source archive; perhaps we should add "use GNU tar"
> > to the RM checklist?
> >
> > Also ran our internal tests and they seem happy.
> >
> > My concern is the list of open bugs targeted at 2.3.0 (ignoring the
> > documentation ones). It is not long, but it seems some of those need
> > to be looked at. It would be nice for the committers who are involved
> > in those bugs to take a look.
> >
> > [1] https://superuser.com/questions/318809/linux-os-x-tar-incompatibility-tarballs-created-on-os-x-give-errors-when-unt
> >
> >
> > On Mon, Jan 22, 2018 at 1:36 PM, Sameer Agarwal 
> wrote:
> >> Please vote on releasing the following candidate as Apache Spark version
> >> 2.3.0. The vote is open until Friday January 26, 2018 at 8:00:00 am UTC
> and
> >> passes if a majority of at least 3 PMC +1 votes are cast.
> >>
> >>
> >> [ ] +1 Release this package as Apache Spark 2.3.0
> >>
> >> [ ] -1 Do not release this package because ...
> >>
> >>
> >> To learn more about Apache Spark, please see https://spark.apache.org/
> >>
> >> The tag to be voted on is v2.3.0-rc2:
> >> https://github.com/apache/spark/tree/v2.3.0-rc2
> >> (489ecb0ef23e5d9b705e5e5bae4fa3d871bdac91)
> >>
> >> List of JIRA tickets resolved in this release can be found here:
> >> https://issues.apache.org/jira/projects/SPARK/versions/12339551
> >>
> >> The release files, including signatures, digests, etc. can be found at:
> >> https://dist.apache.org/repos/dist/dev/spark/v2.3.0-rc2-bin/
> >>
> >> Release artifacts are signed with the following key:
> >> https://dist.apache.org/repos/dist/dev/spark/KEYS
> >>
> >> The staging repository for this release can be found at:
> >> https://repository.apache.org/content/repositories/orgapachespark-1262/
> >>
> >> The documentation corresponding to this release can be found at:
> >> https://dist.apache.org/repos/dist/dev/spark/v2.3.0-rc2-docs/_site/index.html
> >>
> >>
> >> FAQ
> >>
> >> ===
> >> What are the unresolved issues targeted for 2.3.0?
> >> ===
> >>
> >> Please see https://s.apache.org/oXKi. At the time of writing, there are
> >> currently no known release blockers.
> >>
> >> =
> >> How can I help test this release?
> >> =
> >>
> >> If you are a Spark user, you can help us test this release by taking an
> >> existing Spark workload and running on this release candidate, then
> >> reporting any regressions.
> >>
> >> If you're working in PySpark you can set up a virtual env and install the
> >> current RC and see if anything important breaks; in Java/Scala you can
> >> add the staging repository to your project's resolvers and test with the
> >> RC (make sure to clean up the artifact cache before/after so you don't
> >> end up building with an out-of-date RC going forward).
> >>
> >> ===
> >> What should happen to JIRA tickets still targeting 2.3.0?
> >> ===
> >>
> >> Committers should look at those and triage. Extremely important bug
> fixes,
> >> documentation, and API tweaks that impact compatibility should be
> worked on
> >> immediately. Everything else please retarget to 2.3.1 or 2.3.0 as
> >> appropriate.
> >>
> >> ===
> >> Why is my bug not fixed?
> >> ===
> >>
> >> In order to make timely releases, we will typically not hold the release
> >> unless the bug in question is a regression from 2.2.0. That being said,
> if
> >> there is something which is a regression from 2.2.0 and has not been
> >> correctly targeted please ping me or a committer to help target the
> issue
> >> (you can see the open issues listed as impacting Spark 2.3.0 at
> >> https://s.apache.org/WmoI).
> >>
> >>
> >> Regards,
> >> Sameer
> >
> >
> >
> > --
> > Marcelo
>
>
>
> --
> Marcelo
>
>
>