Re: [VOTE] Release Spark 3.0.1 (RC3)

2020-08-31 Thread Xiao Li
-1 due to a regression introduced by a fix in 3.0.1.

See https://github.com/apache/spark/pull/29602

Xiao

On Mon, Aug 31, 2020 at 9:26 AM Tom Graves 
wrote:

> +1
>
> Tom
>
> On Friday, August 28, 2020, 09:02:31 AM CDT, 郑瑞峰 
> wrote:
>
>
> Please vote on releasing the following candidate as Apache Spark version
> 3.0.1.
>
> The vote is open until Sep 2nd at 9AM PST and passes if a majority +1 PMC
> votes are cast, with a minimum of 3 +1 votes.
>
> [ ] +1 Release this package as Apache Spark 3.0.1
> [ ] -1 Do not release this package because ...
>
> To learn more about Apache Spark, please see http://spark.apache.org/
>
> There are currently no issues targeting 3.0.1 (try project = SPARK AND
> "Target Version/s" = "3.0.1" AND status in (Open, Reopened, "In Progress"))
>
> The tag to be voted on is v3.0.1-rc3 (commit
> dc04bf53fe821b7a07f817966c6c173f3b3788c6):
> https://github.com/apache/spark/tree/v3.0.1-rc3
>
> The release files, including signatures, digests, etc. can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v3.0.1-rc3-bin/
>
> Signatures used for Spark RCs can be found in this file:
> https://dist.apache.org/repos/dist/dev/spark/KEYS
>
> The staging repository for this release can be found at:
> https://repository.apache.org/content/repositories/orgapachespark-1357/
>
> The documentation corresponding to this release can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v3.0.1-rc3-docs/
>
> The list of bug fixes going into 3.0.1 can be found at the following URL:
> https://s.apache.org/q9g2d
>
> This release is using the release script of the tag v3.0.1-rc3.
>
> FAQ
>
>
> =
> How can I help test this release?
> =
>
> If you are a Spark user, you can help us test this release by taking
> an existing Spark workload and running it on this release candidate, then
> reporting any regressions.
>
> If you're working in PySpark, you can set up a virtual env, install
> the current RC, and see if anything important breaks. In Java/Scala,
> you can add the staging repository to your project's resolvers and test
> with the RC (make sure to clean up the artifact cache before/after so
> you don't end up building with an out-of-date RC going forward).
>
> ===
> What should happen to JIRA tickets still targeting 3.0.1?
> ===
>
> The current list of open tickets targeted at 3.0.1 can be found at:
> https://issues.apache.org/jira/projects/SPARK and search for "Target
> Version/s" = 3.0.1
>
> Committers should look at those and triage. Extremely important bug
> fixes, documentation, and API tweaks that impact compatibility should
> be worked on immediately. Everything else please retarget to an
> appropriate release.
>
> ==
> But my bug isn't fixed?
> ==
>
> In order to make timely releases, we will typically not hold the
> release unless the bug in question is a regression from the previous
> release. That being said, if there is something which is a regression
> that has not been correctly targeted please ping me or a committer to
> help target the issue.
>
>
>




Re: [VOTE] Release Spark 3.0.1 (RC3)

2020-08-31 Thread Tom Graves
+1

Tom

On Friday, August 28, 2020, 09:02:31 AM CDT, 郑瑞峰 wrote:

Please vote on releasing the following candidate as Apache Spark version 3.0.1.

The vote is open until Sep 2nd at 9AM PST and passes if a majority +1 PMC votes
are cast, with a minimum of 3 +1 votes.

[ ] +1 Release this package as Apache Spark 3.0.1
[ ] -1 Do not release this package because ...

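The passing rule above can be sketched as a small tally. This is only an illustration, assuming "majority" means more binding (PMC) +1 votes than binding -1 votes; the `Vote` type and `vote_passes` function are hypothetical names, not part of any Apache tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Vote:
    voter: str      # placeholder identifier for whoever cast the vote
    value: int      # +1 or -1
    binding: bool   # True when the vote is cast by a PMC member


def vote_passes(votes, minimum_plus_ones=3):
    """Return True if binding +1 votes reach the minimum and outnumber binding -1 votes."""
    plus = sum(1 for v in votes if v.binding and v.value == 1)
    minus = sum(1 for v in votes if v.binding and v.value == -1)
    return plus >= minimum_plus_ones and plus > minus
```

Non-binding votes are ignored by the tally in this sketch, although they still carry useful testing feedback for the release manager.
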
To learn more about Apache Spark, please see http://spark.apache.org/

There are currently no issues targeting 3.0.1 (try project = SPARK AND
"Target Version/s" = "3.0.1" AND status in (Open, Reopened, "In Progress"))

The tag to be voted on is v3.0.1-rc3 (commit
dc04bf53fe821b7a07f817966c6c173f3b3788c6):
https://github.com/apache/spark/tree/v3.0.1-rc3

The release files, including signatures, digests, etc. can be found at:
https://dist.apache.org/repos/dist/dev/spark/v3.0.1-rc3-bin/

Signatures used for Spark RCs can be found in this file:
https://dist.apache.org/repos/dist/dev/spark/KEYS

The staging repository for this release can be found at:
https://repository.apache.org/content/repositories/orgapachespark-1357/

The documentation corresponding to this release can be found at:
https://dist.apache.org/repos/dist/dev/spark/v3.0.1-rc3-docs/

The list of bug fixes going into 3.0.1 can be found at the following URL:
https://s.apache.org/q9g2d

This release is using the release script of the tag v3.0.1-rc3.

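Since the release files above are published together with digests, one way to check a downloaded artifact is to recompute its SHA-512 locally. The following is a minimal sketch, assuming the artifact has already been downloaded and the expected hex digest has been copied out of the matching digest file by hand; the function names are hypothetical, and this does not replace verifying the GPG signature against the KEYS file.

```python
import hashlib
from pathlib import Path


def sha512_hex(path, chunk_size=1 << 20):
    """Return the SHA-512 hex digest of the file at `path`, reading it in chunks."""
    digest = hashlib.sha512()
    with Path(path).open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def normalize_digest(text):
    """Drop whitespace and lowercase, so grouped or uppercase hex compares equal."""
    return "".join(text.split()).lower()


def matches_digest(path, expected_hex):
    """Compare the file's SHA-512 with an expected hex string (hex characters only)."""
    return sha512_hex(path) == normalize_digest(expected_hex)
```

Reading in chunks keeps memory use flat for large binary distributions; `expected_hex` is assumed to contain only the hex characters, with any file-name prefix already removed.
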
FAQ

=
How can I help test this release?
=

If you are a Spark user, you can help us test this release by taking
an existing Spark workload and running it on this release candidate, then
reporting any regressions.

If you're working in PySpark, you can set up a virtual env, install
the current RC, and see if anything important breaks. In Java/Scala,
you can add the staging repository to your project's resolvers and test
with the RC (make sure to clean up the artifact cache before/after so
you don't end up building with an out-of-date RC going forward).

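The PySpark route described above can be sketched with the standard-library `venv` module. This is only an illustration under stated assumptions: `package_source` stands for wherever the RC's PySpark package lives (a downloaded archive path or a URL), which is not spelled out in this thread, and the helper names are hypothetical.

```python
import os
import subprocess
import venv
from pathlib import Path


def venv_python(venv_dir):
    """Return the path of the Python interpreter inside the virtual env."""
    if os.name == "nt":
        return Path(venv_dir) / "Scripts" / "python.exe"
    return Path(venv_dir) / "bin" / "python"


def install_command(venv_dir, package_source):
    """Build the pip command that installs the RC package into the env."""
    return [str(venv_python(venv_dir)), "-m", "pip", "install", package_source]


def create_rc_env(venv_dir, package_source):
    """Create a fresh env (clearing any old one) and install the RC into it."""
    venv.EnvBuilder(with_pip=True, clear=True).create(venv_dir)
    subprocess.run(install_command(venv_dir, package_source), check=True)
```

Creating the environment with `clear=True` avoids carrying over packages from an earlier RC, in the same spirit as cleaning the artifact cache on the Java/Scala side.
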
===
What should happen to JIRA tickets still targeting 3.0.1?
===

The current list of open tickets targeted at 3.0.1 can be found at:
https://issues.apache.org/jira/projects/SPARK and search for "Target
Version/s" = 3.0.1

Committers should look at those and triage. Extremely important bug
fixes, documentation, and API tweaks that impact compatibility should
be worked on immediately. Everything else please retarget to an
appropriate release.

==
But my bug isn't fixed?
==

In order to make timely releases, we will typically not hold the
release unless the bug in question is a regression from the previous
release. That being said, if there is something which is a regression
that has not been correctly targeted please ping me or a committer to
help target the issue.

  

Re: What's the root cause of not supporting multiple aggregations in structured streaming?

2020-08-31 Thread Etienne Chauchot

Hi all,

I'm also very interested in this feature, but the PR has been open since
January 2019 and has not been updated. It raised a design discussion around
watermarks, and a design doc was written
(https://docs.google.com/document/d/1IAH9UQJPUiUCLd7H6dazRK2k1szDX38SnM6GVNZYvUo/edit#heading=h.npkueh4bbkz1).
We also commented on this design, but the subject still seems to have
gone stale.


Is there any interest in the community in delivering this feature, or is
it considered worthless? If the latter, can you explain why?


Best

Etienne

On 22/05/2019 03:38, 张万新 wrote:

Thanks, I'll check it out.

On Tue, May 21, 2019 at 01:31, Arun Mahadevan <ar...@apache.org> wrote:


Here's the proposal for supporting it in "append" mode:
https://github.com/apache/spark/pull/23576. You could see if it
addresses your requirement and post your feedback in the PR.
For "update" mode, it's going to be much harder to support this
without first adding support for "retractions"; otherwise we would
end up with wrong results.

- Arun


On Mon, 20 May 2019 at 01:34, Gabor Somogyi
<gabor.g.somo...@gmail.com> wrote:

There is a PR for this, but it is not yet merged.

On Mon, May 20, 2019 at 10:13 AM 张万新 <kevinzwx1...@gmail.com> wrote:

Hi there,

I'd like to know the root reason why multiple aggregations on a
streaming DataFrame are not allowed, since it's a very useful
feature and Flink has supported it for a long time.

Thanks.



is it possible to apply AQE rules only on some of nodes?

2020-08-31 Thread CodingCat
Hi, Spark devs

I am wondering whether it is possible to apply AQE to only part of the
physical plan, e.g. to coalesce partitions only on a particular
ShuffleQueryStageExec.

I didn't find a straightforward way to achieve this. Is there a way to
work around the current limitation?

Thanks!

Nan