Re: [VOTE] SPARK 2.4.0 (RC5)

2018-10-31 Thread Dongjoon Hyun
+1

Cheers,
Dongjoon.


Re: [VOTE] SPARK 2.4.0 (RC5)

2018-10-31 Thread Felix Cheung
+1
Checked R doc and all R API changes



From: Denny Lee 
Sent: Wednesday, October 31, 2018 9:13 PM
To: Chitral Verma
Cc: Wenchen Fan; dev@spark.apache.org
Subject: Re: [VOTE] SPARK 2.4.0 (RC5)

+1

On Wed, Oct 31, 2018 at 12:54 PM Chitral Verma
<chitralve...@gmail.com> wrote:
+1

On Wed, 31 Oct 2018 at 11:56, Reynold Xin
<r...@databricks.com> wrote:
+1

Look forward to the release!



On Mon, Oct 29, 2018 at 3:22 AM Wenchen Fan
<cloud0...@gmail.com> wrote:
Please vote on releasing the following candidate as Apache Spark version 2.4.0.

The vote is open until November 1 PST and passes if a majority of +1 PMC votes
are cast, with a minimum of 3 +1 votes.

[ ] +1 Release this package as Apache Spark 2.4.0
[ ] -1 Do not release this package because ...

To learn more about Apache Spark, please see http://spark.apache.org/

The tag to be voted on is v2.4.0-rc5 (commit 
0a4c03f7d084f1d2aa48673b99f3b9496893ce8d):
https://github.com/apache/spark/tree/v2.4.0-rc5

The release files, including signatures, digests, etc. can be found at:
https://dist.apache.org/repos/dist/dev/spark/v2.4.0-rc5-bin/

Signatures used for Spark RCs can be found in this file:
https://dist.apache.org/repos/dist/dev/spark/KEYS

The staging repository for this release can be found at:
https://repository.apache.org/content/repositories/orgapachespark-1291

The documentation corresponding to this release can be found at:
https://dist.apache.org/repos/dist/dev/spark/v2.4.0-rc5-docs/

The list of bug fixes going into 2.4.0 can be found at the following URL:
https://issues.apache.org/jira/projects/SPARK/versions/12342385

FAQ

=
How can I help test this release?
=

If you are a Spark user, you can help us test this release by taking
an existing Spark workload and running on this release candidate, then
reporting any regressions.

If you're working in PySpark, you can set up a virtual env, install
the current RC, and see if anything important breaks. In Java/Scala,
you can add the staging repository to your project's resolvers and test
with the RC (make sure to clean up the artifact cache before/after so
you don't end up building with an out-of-date RC going forward).
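As a sketch of the Java/Scala route: with sbt, adding the staging repository might look like the following build.sbt fragment. The resolver name and the choice of module are illustrative; the URL is the staging repository listed above.

```scala
// build.sbt fragment -- sketch for testing the RC from the staging repository.
// The resolver name is arbitrary; only the URL matters.
resolvers += "Apache Spark 2.4.0 RC5 Staging" at
  "https://repository.apache.org/content/repositories/orgapachespark-1291/"

// Pick whichever Spark modules your project already uses, e.g.:
libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.4.0"
```

After testing, remember to clear the cached 2.4.0 artifacts (typically under ~/.ivy2/cache/org.apache.spark for sbt) so a later build doesn't silently reuse the RC once the final release is out.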

===
What should happen to JIRA tickets still targeting 2.4.0?
===

The current list of open tickets targeted at 2.4.0 can be found at:
https://issues.apache.org/jira/projects/SPARK and search for "Target Version/s" 
= 2.4.0

Committers should look at those and triage. Extremely important bug
fixes, documentation, and API tweaks that impact compatibility should
be worked on immediately. Everything else please retarget to an
appropriate release.

==
But my bug isn't fixed?
==

In order to make timely releases, we will typically not hold the
release unless the bug in question is a regression from the previous
release. That being said, if there is a regression that has not been
correctly targeted, please ping me or a committer to help target the
issue.


Re: [VOTE] SPARK 2.4.0 (RC5)

2018-10-31 Thread Denny Lee
+1



Re: [VOTE] SPARK 2.4.0 (RC5)

2018-10-31 Thread Reynold Xin
+1

Look forward to the release!





Re: [VOTE] SPARK 2.4.0 (RC5)

2018-10-31 Thread Marcelo Vanzin
+1



-- 
Marcelo

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Re: DataSourceV2 hangouts sync

2018-10-31 Thread Arun Mahadevan
Thanks for bringing up the custom metrics API in the list; it's something
that needs to be addressed.

A couple more items worth considering:

1. Possibility to unify the batch, micro-batch, and continuous sources
(similar to SPARK-25000).
Right now there is significant code duplication even between the
micro-batch and continuous sources.
We could attempt to redesign such that a single implementation could
potentially work across modes (by implementing the relevant APIs).
2. Better framework support for end-to-end exactly-once semantics in
streaming (maybe framework-level support for two-phase commit).

Thanks,
Arun


On Tue, 30 Oct 2018 at 19:24, Wenchen Fan  wrote:

> Hi all,
>
> I spent some time thinking about the roadmap, and came up with an initial
> list:
> SPARK-25390: data source V2 API refactoring
> SPARK-24252: add catalog support
> SPARK-25531: new write APIs for data source v2
> SPARK-25190: better operator pushdown API
> Streaming rate control API
> Custom metrics API
> Migrate existing data sources
> Move data source v2 and built-in implementations to individual modules.
>
>
> Let's have more discussion over the hangout.
>
> Thanks,
> Wenchen
>
> On Tue, Oct 30, 2018 at 4:32 AM Ryan Blue 
> wrote:
>
>> Everyone,
>>
>> There are now 25 guests invited, which is a lot of people to actively
>> participate in a sync like this.
>>
>> For those of you who probably won't actively participate, I've added a
>> live stream. If you don't plan to talk, please use the live stream instead
>> of the meet/hangout so that we don't end up with so many people that we
>> can't actually get the discussion going. Here's a link to the stream:
>>
>> https://stream.meet.google.com/stream/6be59d80-04c7-44dc-9042-4f3b597fc8ba
>>
>> Thanks!
>>
>> rb
>>
>> On Thu, Oct 25, 2018 at 1:09 PM Ryan Blue  wrote:
>>
>>> Hi everyone,
>>>
>>> There's been some great discussion for DataSourceV2 in the last few
>>> months, but it has been difficult to resolve some of the discussions and I
>>> don't think that we have a very clear roadmap for getting the work done.
>>>
>>> To coordinate better as a community, I'd like to start a regular sync-up
>>> over google hangouts. We use this in the Parquet community to have more
>>> effective community discussions about thorny technical issues and to get
>>> aligned on an overall roadmap. It is really helpful in that community and I
>>> think it would help us get DSv2 done more quickly.
>>>
>>> Here's how it works: people join the hangout, we go around the list to
>>> gather topics, have about an hour-long discussion, and then send a summary
>>> of the discussion to the dev list for anyone that couldn't participate.
>>> That way we can move topics along, but we keep the broader community in the
>>> loop as well for further discussion on the mailing list.
>>>
>>> I'll volunteer to set up the sync and send invites to anyone that wants
>>> to attend. If you're interested, please reply with the email address you'd
>>> like to put on the invite list (if there's a way to do this without
>>> specific invites, let me know). Also for the first sync, please note what
>>> times would work for you so we can try to account for people in different
>>> time zones.
>>>
>>> For the first one, I was thinking some day next week (time TBD by those
>>> interested) and starting off with a general roadmap discussion before
>>> diving into specific technical topics.
>>>
>>> Thanks,
>>>
>>> rb
>>>
>>> --
>>> Ryan Blue
>>> Software Engineer
>>> Netflix
>>>
>>
>>
>> --
>> Ryan Blue
>> Software Engineer
>> Netflix
>>
>


Re: python lint is broken on master branch

2018-10-31 Thread Wenchen Fan
Cool! Thanks Shane!



Re: python lint is broken on master branch

2018-10-31 Thread shane knapp
flake8 was at 3.6.0 on amp-jenkins-staging-worker-01, so i downgraded to
3.5.0 and we're green:
https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Compile/job/spark-master-lint/9082/console

checked the rest of the ubuntu workers and they were fine.



-- 
Shane Knapp
UC Berkeley EECS Research / RISELab Staff Technical Lead
https://rise.cs.berkeley.edu


Re: python lint is broken on master branch

2018-10-31 Thread shane knapp
yeah, that's what it is.  thought i'd fixed that.  looking now.


-- 
Shane Knapp
UC Berkeley EECS Research / RISELab Staff Technical Lead
https://rise.cs.berkeley.edu


Re: python lint is broken on master branch

2018-10-31 Thread Sean Owen
Maybe a pycodestyle or flake8 version issue?



python lint is broken on master branch

2018-10-31 Thread Wenchen Fan
The Jenkins job spark-master-lint keeps failing. The error message is
flake8.exceptions.FailedToLoadPlugin: Flake8 failed to load plugin
"pycodestyle.break_after_binary_operator" due to 'module' object has no
attribute 'break_after_binary_operator'.
flake8 checks failed.

As an example please see
https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Compile/job/spark-master-lint/9080/console

Any ideas?
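For anyone hitting this later: the failure happens when flake8 imports the pycodestyle module and looks up a check function by name, and the installed pycodestyle doesn't have it -- a symptom of a flake8/pycodestyle version mismatch. A minimal, generic sketch of that lookup (the module and attribute names below are stdlib stand-ins; the real failing lookup was `pycodestyle.break_after_binary_operator`):

```python
import importlib


def plugin_attr_available(module_name: str, attr: str) -> bool:
    """Mimic flake8's plugin loading: import a module and look up a
    check function on it by name.  Returns False when either step fails,
    which is the condition behind the FailedToLoadPlugin error."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return False
    return hasattr(module, attr)


# Stdlib stand-ins to show both outcomes; on the broken Jenkins worker
# the equivalent call was
# plugin_attr_available("pycodestyle", "break_after_binary_operator")
# and it returned False.
print(plugin_attr_available("os", "getcwd"))         # True
print(plugin_attr_available("os", "no_such_check"))  # False
```

Pinning flake8 and pycodestyle to versions that were released together (as Shane did by downgrading to flake8 3.5.0) makes the lookup succeed again.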