Re: [VOTE] SPARK 2.4.0 (RC5)
+1 Cheers, Dongjoon.
Re: [VOTE] SPARK 2.4.0 (RC5)
+1 — Checked R doc and all R API changes.

From: Denny Lee
Sent: Wednesday, October 31, 2018 9:13 PM
To: Chitral Verma
Cc: Wenchen Fan; dev@spark.apache.org
Subject: Re: [VOTE] SPARK 2.4.0 (RC5)

+1

On Wed, Oct 31, 2018 at 12:54 PM Chitral Verma <chitralve...@gmail.com> wrote:

+1

On Wed, 31 Oct 2018 at 11:56, Reynold Xin <r...@databricks.com> wrote:

+1 Look forward to the release!

On Mon, Oct 29, 2018 at 3:22 AM Wenchen Fan <cloud0...@gmail.com> wrote:

Please vote on releasing the following candidate as Apache Spark version 2.4.0.

The vote is open until November 1 PST and passes if a majority of +1 PMC votes are cast, with a minimum of 3 +1 votes.

[ ] +1 Release this package as Apache Spark 2.4.0
[ ] -1 Do not release this package because ...

To learn more about Apache Spark, please see http://spark.apache.org/

The tag to be voted on is v2.4.0-rc5 (commit 0a4c03f7d084f1d2aa48673b99f3b9496893ce8d):
https://github.com/apache/spark/tree/v2.4.0-rc5

The release files, including signatures, digests, etc. can be found at:
https://dist.apache.org/repos/dist/dev/spark/v2.4.0-rc5-bin/

Signatures used for Spark RCs can be found in this file:
https://dist.apache.org/repos/dist/dev/spark/KEYS

The staging repository for this release can be found at:
https://repository.apache.org/content/repositories/orgapachespark-1291

The documentation corresponding to this release can be found at:
https://dist.apache.org/repos/dist/dev/spark/v2.4.0-rc5-docs/

The list of bug fixes going into 2.4.0 can be found at the following URL:
https://issues.apache.org/jira/projects/SPARK/versions/12342385

FAQ

=========================
How can I help test this release?
=========================

If you are a Spark user, you can help us test this release by taking an existing Spark workload, running it on this release candidate, and reporting any regressions.

If you're working in PySpark, you can set up a virtual env, install the current RC, and see if anything important breaks. In Java/Scala, you can add the staging repository to your project's resolvers and test with the RC (make sure to clean up the artifact cache before/after so you don't end up building with an out-of-date RC going forward).

===========================================
What should happen to JIRA tickets still targeting 2.4.0?
===========================================

The current list of open tickets targeted at 2.4.0 can be found at:
https://issues.apache.org/jira/projects/SPARK and search for "Target Version/s" = 2.4.0

Committers should look at those and triage. Extremely important bug fixes, documentation, and API tweaks that impact compatibility should be worked on immediately. Everything else, please retarget to an appropriate release.

==================
But my bug isn't fixed?
==================

In order to make timely releases, we will typically not hold the release unless the bug in question is a regression from the previous release. That being said, if there is something which is a regression that has not been correctly targeted, please ping me or a committer to help target the issue.
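One hedged way to follow the PySpark testing instructions above, as a shell sketch. The pyspark tarball name under v2.4.0-rc5-bin/ is an assumption (check the directory listing for the real artifact names), and the network-dependent steps are left as comments:

```shell
# Sketch: try the RC in a throwaway virtualenv so it can't pollute your
# day-to-day Python environment.
python3 -m venv /tmp/spark-rc-venv
. /tmp/spark-rc-venv/bin/activate

# Download + install the RC's pyspark tarball (network steps; the exact
# filename is an assumption -- confirm it against the v2.4.0-rc5-bin/ listing):
#   pip install https://dist.apache.org/repos/dist/dev/spark/v2.4.0-rc5-bin/pyspark-2.4.0.tar.gz
# Then run your existing workload against it and compare with 2.3.x:
#   python my_workload.py

# For Java/Scala, add the staging repo to your build's resolvers instead,
# e.g. in sbt:
#   resolvers += "spark-rc" at "https://repository.apache.org/content/repositories/orgapachespark-1291"

python -c 'import sys; print(sys.prefix)'   # confirms the venv is active
deactivate
rm -rf /tmp/spark-rc-venv
```

The isolated venv also makes the cleanup advice above trivial: deleting the directory removes every RC artifact you installed.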
Re: [VOTE] SPARK 2.4.0 (RC5)
+1

On Wed, Oct 31, 2018 at 12:54 PM Chitral Verma wrote:
> +1
>
> On Wed, 31 Oct 2018 at 11:56, Reynold Xin wrote:
>> +1 Look forward to the release!
>>
>> On Mon, Oct 29, 2018 at 3:22 AM Wenchen Fan wrote:
>>> Please vote on releasing the following candidate as Apache Spark version 2.4.0. (full announcement quoted above; trimmed)
Re: [VOTE] SPARK 2.4.0 (RC5)
+1 Look forward to the release!

On Mon, Oct 29, 2018 at 3:22 AM Wenchen Fan wrote:
> Please vote on releasing the following candidate as Apache Spark version 2.4.0. (full announcement quoted above; trimmed)
Re: [VOTE] SPARK 2.4.0 (RC5)
+1

On Mon, Oct 29, 2018 at 3:22 AM Wenchen Fan wrote:
> Please vote on releasing the following candidate as Apache Spark version 2.4.0. (full announcement quoted above; trimmed)

--
Marcelo

---------------------------------------------------------------------
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
Re: DataSourceV2 hangouts sync
Thanks for bringing up the custom metrics API in the list; it's something that needs to be addressed. A couple more items worth considering:

1. Possibility to unify the batch, micro-batch, and continuous sources (similar to SPARK-25000). Right now there is significant code duplication even between the micro-batch and continuous sources. Attempt to redesign such that a single implementation could potentially work across modes (by implementing the relevant APIs).
2. Better framework support for end-to-end exactly-once semantics in streaming (maybe framework-level support for 2PC).

Thanks,
Arun

On Tue, 30 Oct 2018 at 19:24, Wenchen Fan wrote:
> Hi all,
>
> I spent some time thinking about the roadmap, and came up with an initial list:
> SPARK-25390: data source V2 API refactoring
> SPARK-24252: add catalog support
> SPARK-25531: new write APIs for data source v2
> SPARK-25190: better operator pushdown API
> Streaming rate control API
> Custom metrics API
> Migrate existing data sources
> Move data source v2 and built-in implementations to individual modules.
>
> Let's have more discussion over the hangout.
>
> Thanks,
> Wenchen
>
> On Tue, Oct 30, 2018 at 4:32 AM Ryan Blue wrote:
>> Everyone,
>>
>> There are now 25 guests invited, which is a lot of people to actively participate in a sync like this.
>>
>> For those of you who probably won't actively participate, I've added a live stream. If you don't plan to talk, please use the live stream instead of the meet/hangout so that we don't end up with so many people that we can't actually get the discussion going. Here's a link to the stream:
>>
>> https://stream.meet.google.com/stream/6be59d80-04c7-44dc-9042-4f3b597fc8ba
>>
>> Thanks!
>> rb
>>
>> On Thu, Oct 25, 2018 at 1:09 PM Ryan Blue wrote:
>>
>>> Hi everyone,
>>>
>>> There's been some great discussion for DataSourceV2 in the last few months, but it has been difficult to resolve some of the discussions and I don't think that we have a very clear roadmap for getting the work done.
>>>
>>> To coordinate better as a community, I'd like to start a regular sync-up over google hangouts. We use this in the Parquet community to have more effective community discussions about thorny technical issues and to get aligned on an overall roadmap. It is really helpful in that community and I think it would help us get DSv2 done more quickly.
>>>
>>> Here's how it works: people join the hangout, we go around the list to gather topics, have about an hour-long discussion, and then send a summary of the discussion to the dev list for anyone that couldn't participate. That way we can move topics along, but we keep the broader community in the loop as well for further discussion on the mailing list.
>>>
>>> I'll volunteer to set up the sync and send invites to anyone that wants to attend. If you're interested, please reply with the email address you'd like to put on the invite list (if there's a way to do this without specific invites, let me know). Also for the first sync, please note what times would work for you so we can try to account for people in different time zones.
>>>
>>> For the first one, I was thinking some day next week (time TBD by those interested) and starting off with a general roadmap discussion before diving into specific technical topics.
>>>
>>> Thanks,
>>>
>>> rb
>>>
>>> --
>>> Ryan Blue
>>> Software Engineer
>>> Netflix
>>
>> --
>> Ryan Blue
>> Software Engineer
>> Netflix
Re: python lint is broken on master branch
Cool! Thanks Shane!

On Wed, Oct 31, 2018 at 11:24 PM shane knapp wrote:
> flake8 was at 3.6.0 on amp-jenkins-staging-worker-01, so i downgraded to 3.5.0 and we're green:
>
> https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Compile/job/spark-master-lint/9082/console
>
> checked the rest of the ubuntu workers and they were fine.
> (earlier messages in the thread quoted below; trimmed)
Re: python lint is broken on master branch
flake8 was at 3.6.0 on amp-jenkins-staging-worker-01, so i downgraded to 3.5.0 and we're green:

https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Compile/job/spark-master-lint/9082/console

checked the rest of the ubuntu workers and they were fine.

On Wed, Oct 31, 2018 at 7:59 AM shane knapp wrote:
> yeah, that's what it is. thought i'd fixed that. looking now.
> (earlier messages in the thread quoted below; trimmed)

--
Shane Knapp
UC Berkeley EECS Research / RISELab Staff Technical Lead
https://rise.cs.berkeley.edu
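The fix above pins flake8 at 3.5.0 on the workers. A minimal sketch of a pre-flight guard a lint wrapper script could run before invoking flake8, failing fast instead of dying mid-run with FailedToLoadPlugin; the specific version ranges checked are assumptions based only on this thread:

```shell
# Hypothetical pre-flight check for the lint job.
# In a real script you'd obtain the version with:
#   ver="$(flake8 --version | awk '{print $1}')"
ver="3.5.0"

case "$ver" in
  3.5.*) echo "ok: flake8 $ver is the known-good pin" ;;
  3.6.*) echo "error: flake8 $ver breaks pycodestyle plugin loading" >&2; exit 1 ;;
  *)     echo "warning: flake8 $ver is untested for this lint job" >&2 ;;
esac
```

A guard like this would have turned the cryptic plugin-loading traceback into an immediate, self-explanatory failure when a worker drifted to 3.6.0.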
Re: python lint is broken on master branch
yeah, that's what it is. thought i'd fixed that. looking now.

On Wed, Oct 31, 2018 at 6:09 AM Sean Owen wrote:
> Maybe a pycodestyle or flake8 version issue?
> (original report quoted below; trimmed)

--
Shane Knapp
UC Berkeley EECS Research / RISELab Staff Technical Lead
https://rise.cs.berkeley.edu
Re: python lint is broken on master branch
Maybe a pycodestyle or flake8 version issue?

On Wed, Oct 31, 2018 at 7:43 AM Wenchen Fan wrote:
> (original report quoted below; trimmed)
python lint is broken on master branch
The Jenkins job spark-master-lint keeps failing. The error message is:

flake8.exceptions.FailedToLoadPlugin: Flake8 failed to load plugin "pycodestyle.break_after_binary_operator" due to 'module' object has no attribute 'break_after_binary_operator'.
flake8 checks failed.

As an example, please see https://amplab.cs.berkeley.edu/jenkins/view/Spark%20QA%20Compile/job/spark-master-lint/9080/console

Any ideas?