Re: Apache Spark 3.2 Expectation

2021-07-01 Thread Gengliang Wang
Hi all, I just cut branch-3.2 on GitHub and created version 3.3.0 on Jira. When merging PRs on the master branch before 3.2.0 RC, please help cherry-pick bug fixes and ongoing major features mentioned in this thread to branch-3.2, thanks! On Fri, Jul 2, 2021 at 2:31 AM Dongjoon Hyun wrote:

Re: Apache Spark 3.2 Expectation

2021-07-01 Thread Dongjoon Hyun
Thank you, Gengliang! On Wed, Jun 30, 2021 at 10:56 PM Gengliang Wang wrote: > Hi all, > > Just as a gentle reminder, I will do the branch cut tomorrow. Please > focus on finalizing the works to land in Spark 3.2.0. > After the branch cut, we can still merge the ongoing major features >

Re: Apache Spark 3.2 Expectation

2021-06-30 Thread Gengliang Wang
Hi all, Just as a gentle reminder, I will do the branch cut tomorrow. Please focus on finalizing the work to land in Spark 3.2.0. After the branch cut, we can still merge the ongoing major features mentioned in this thread. There should be no other new features in branch-3.2. Thanks! On Thu,

Re: Apache Spark 3.2 Expectation

2021-06-17 Thread Hyukjin Kwon
*GA -> QA On Thu, 17 Jun 2021, 15:16 Hyukjin Kwon, wrote: > I think we would make sure treating these items in the list as exceptions > from the code freeze, and discourage to push new APIs and features though. > > GA period ideally we should focus on bug fixes and polishing. > > It would be

Re: Apache Spark 3.2 Expectation

2021-06-17 Thread Hyukjin Kwon
I think we should make sure to treat the items in this list as exceptions to the code freeze, while discouraging pushing new APIs and features. In the GA period, ideally we should focus on bug fixes and polishing. It would be great if we can speed up on these items in the list too. On Thu, 17 Jun

Re: Apache Spark 3.2 Expectation

2021-06-17 Thread Gengliang Wang
Thanks for the suggestions from Dongjoon, Liang-Chi, Min, and Xiao! Now we make it clear that it's a soft cut and we can still merge important code changes to branch-3.2 before RC. Let's keep the branch cut date as July 1st. On Thu, Jun 17, 2021 at 1:41 PM Dongjoon Hyun wrote: > > First, I think

Re: Apache Spark 3.2 Expectation

2021-06-16 Thread Dongjoon Hyun
> First, I think you are saying "branch-3.2"; To Xiao. Yes, it was a typo for "branch-3.2". > We do strongly prefer to cut the release for Spark 3.2.0 including all the patches under SPARK-30602. > This way, we can backport the other performance/operability enhancement tickets under

Re: Apache Spark 3.2 Expectation

2021-06-16 Thread Xiao Li
> > To Liang-Chi, I'm -1 for postponing the branch cut because this is a soft > cut and the committers still are able to commit to `branch-3.3` according > to their decisions. First, I think you are saying "branch-3.2"; Second, the "soft cut" means no "code freeze", although we cut the branch. To

Re: Apache Spark 3.2 Expectation

2021-06-16 Thread Min Shen
Hi Gengliang, Thanks for volunteering as the release manager for Spark 3.2.0. Regarding the ongoing work of push-based shuffle in SPARK-30602, we are close to having all the patches merged to master to enable push-based shuffle. Currently, there are 2 PRs under SPARK-30602 that are under active

Re: Apache Spark 3.2 Expectation

2021-06-16 Thread Liang-Chi Hsieh
Thanks Dongjoon. I've talked with Dongjoon offline to learn more about this. As it is a soft cut date, there is no reason to postpone it. It sounds good then to keep the original branch cut date. Thank you. Dongjoon Hyun-2 wrote > Thank you for volunteering, Gengliang. > > Apache Spark 3.2.0 is the

Re: Apache Spark 3.2 Expectation

2021-06-16 Thread Dongjoon Hyun
Thank you for volunteering, Gengliang. Apache Spark 3.2.0 is the first version enabling AQE by default. I'm also watching some on-going improvements on that. https://issues.apache.org/jira/browse/SPARK-33828 (SQL Adaptive Query Execution QA) To Liang-Chi, I'm -1 for postponing the branch

Re: Apache Spark 3.2 Expectation

2021-06-16 Thread Liang-Chi Hsieh
First, thanks for volunteering as the release manager of Spark 3.2.0, Gengliang! And yes, for the two important Structured Streaming features, RocksDB StateStore and session window, we're working on them and expect to have them in the new release. So I propose to postpone the branch cut date.

Re: Apache Spark 3.2 Expectation

2021-06-16 Thread Gengliang Wang
Thanks, Hyukjin. The expected target branch cut date of Spark 3.2 is *July 1st* on https://spark.apache.org/versioning-policy.html. However, I notice that there are still multiple important projects in progress now: [Core] - SPIP: Support push-based shuffle to improve shuffle efficiency

Re: Apache Spark 3.2 Expectation

2021-06-15 Thread Hyukjin Kwon
+1, thanks. On Tue, 15 Jun 2021, 16:17 Gengliang Wang, wrote: > Hi, > > As the expected release date is close, I would like to volunteer as the > release manager for Apache Spark 3.2.0. > > Thanks, > Gengliang > > On Mon, Apr 12, 2021 at 1:59 PM Wenchen Fan wrote: > >> An update: we found a

Re: Apache Spark 3.2 Expectation

2021-06-15 Thread Gengliang Wang
Hi, As the expected release date is close, I would like to volunteer as the release manager for Apache Spark 3.2.0. Thanks, Gengliang On Mon, Apr 12, 2021 at 1:59 PM Wenchen Fan wrote: > An update: we found a mistake that we picked the Spark 3.2 release date > based on the scheduled release

Re: Apache Spark 3.2 Expectation

2021-04-11 Thread Wenchen Fan
An update: we found a mistake: we picked the Spark 3.2 release date based on the scheduled release date of 3.1. However, 3.1 was delayed and released on March 2. In order to have a full 6 months of development for 3.2, the target release date for 3.2 should be September 2. I'm updating the
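The date arithmetic above can be checked with a minimal sketch. The helper `add_months` and the variable names are hypothetical, introduced only for illustration; the year 2021 is taken from the thread dates, and the six-month window and March 2 release date come from the message above.

```python
from datetime import date
import calendar


def add_months(start: date, months: int) -> date:
    """Return the date `months` calendar months after `start`.

    The day of month is clamped to the last day of the target month,
    so Aug 31 + 6 months gives Feb 28 (or Feb 29 in a leap year).
    """
    index = start.month - 1 + months           # zero-based month count
    year = start.year + index // 12
    month = index % 12 + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(start.day, last_day))


# Assumption: 3.1 shipped on March 2, 2021, as stated in the message above.
spark_3_1_release = date(2021, 3, 2)
target_3_2_release = add_months(spark_3_1_release, 6)
print(target_3_2_release.isoformat())
```

Under these assumptions the result is September 2, 2021, which matches the corrected target in the message.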

Re: Apache Spark 3.2 Expectation

2021-03-11 Thread Dongjoon Hyun
Thank you, Xiao, Wenchen and Hyukjin. Bests, Dongjoon. On Thu, Mar 11, 2021 at 2:15 AM Hyukjin Kwon wrote: > Just for an update, I will send a discussion email about my idea late this > week or early next week. > > On Thu, Mar 11, 2021 at 7:00 PM, Wenchen Fan wrote: > >> There are many projects

Re: Apache Spark 3.2 Expectation

2021-03-11 Thread Hyukjin Kwon
Just for an update, I will send a discussion email about my idea late this week or early next week. On Thu, Mar 11, 2021 at 7:00 PM, Wenchen Fan wrote: > There are many projects going on right now, such as new DS v2 APIs, ANSI > interval types, join improvement, disaggregated shuffle, etc. I don't >

Re: Apache Spark 3.2 Expectation

2021-03-11 Thread Wenchen Fan
There are many projects going on right now, such as new DS v2 APIs, ANSI interval types, join improvement, disaggregated shuffle, etc. I don't think it's realistic to do the branch cut in April. I'm +1 to release 3.2 around July, but it doesn't mean we have to cut the branch 3 months earlier. We

Re: Apache Spark 3.2 Expectation

2021-03-10 Thread Xiao Li
Below are some nice-to-have features we can work on in Spark 3.2: Lateral Join support, interval data type, timestamp without time zone, un-nesting arbitrary queries, the returned metrics of DSV2, and error message standardization. Spark 3.2 will

Re: Apache Spark 3.2 Expectation

2021-03-10 Thread Dongjoon Hyun
Hi, Xiao. This thread started 13 days ago. Since you asked the community about major features or timelines at that time, could you share your roadmap or expectations if you have something in mind? > Thank you, Dongjoon, for initiating this discussion. Let us keep it open. It might take 1-2

Re: Apache Spark 3.2 Expectation

2021-03-03 Thread Dongjoon Hyun
Hi, John. This thread aims to share your expectations and goals (and maybe work progress) for Apache Spark 3.2 because we are making this together. :) Bests, Dongjoon. On Wed, Mar 3, 2021 at 1:59 PM John Zhuge wrote: > Hi Dongjoon, > > Is it possible to get ViewCatalog in? The community

Re: Apache Spark 3.2 Expectation

2021-03-03 Thread John Zhuge
Hi Dongjoon, Is it possible to get ViewCatalog in? The community already had fairly detailed discussions. Thanks, John On Thu, Feb 25, 2021 at 8:57 AM Dongjoon Hyun wrote: > Hi, All. > > Since we have been preparing Apache Spark 3.2.0 in master branch since > December 2020, March seems to be

Re: Apache Spark 3.2 Expectation

2021-03-03 Thread Chang Chen
+1 for Data Source V2 Aggregate push down. On Sat, Feb 27, 2021 at 4:20 AM, huaxin gao wrote: > Thanks Dongjoon and Xiao for the discussion. I would like to add Data > Source V2 Aggregate push down to the list. I am currently working on > JDBC Data Source V2 Aggregate push down, but the common code can be used

Re: Apache Spark 3.2 Expectation

2021-02-28 Thread bo yang
+1 for better support for disaggregated shuffle (push-based shuffle is a great example; there are also the Facebook shuffle service and the Uber remote shuffle service). There

Re: Apache Spark 3.2 Expectation

2021-02-28 Thread Takeshi Yamamuro
Thanks, Dongjoon, for the discussion. I would like to add Gengliang's work: SPARK-34246 New type coercion syntax rules in ANSI mode I think it is worth describing it in the next release note, too. Bests, Takeshi On Sat, Feb 27, 2021 at 11:41 AM Yi Wu wrote: > +1 to continue the incompleted

Re: Apache Spark 3.2 Expectation

2021-02-26 Thread Yi Wu
+1 to continue the incomplete push-based shuffle. -- Yi On Fri, Feb 26, 2021 at 1:26 AM Mridul Muralidharan wrote: > > > Nit: Java 17 -> should be available by Sept 2021 :-) > Adoption would also depend on some of our nontrivial dependencies > supporting it - it might be a stretch to get it

Re: Apache Spark 3.2 Expectation

2021-02-26 Thread Cheng Su
ideally want to finish the feature in 3.2. For most of the features here, we already developed them internally and rolled them out to production. Thanks, Cheng Su From: Dongjoon Hyun Date: Friday, February 26, 2021 at 4:06 PM To: Hyukjin Kwon Cc: huaxin gao, Xiao Li, dev Subject: Re: Apache Spark 3.2

Re: Apache Spark 3.2 Expectation

2021-02-26 Thread Dongjoon Hyun
Sure, thank you, Hyukjin. Bests, Dongjoon. On Fri, Feb 26, 2021 at 4:01 PM Hyukjin Kwon wrote: > I have an idea that I'll send an email to discuss next week or the week > after. I did not have enough bandwidth to drive both together at > the same time. I would appreciate it if we have

Re: Apache Spark 3.2 Expectation

2021-02-26 Thread Hyukjin Kwon
I have an idea that I'll send an email to discuss next week or the week after. I did not have enough bandwidth to drive both together at the same time. I would appreciate it if we have some more time for 3.2. In addition, it would also be great if we follow the schedule and catch potential

Re: Apache Spark 3.2 Expectation

2021-02-26 Thread Dongjoon Hyun
Thank you for sharing your plan, Huaxin! Bests, Dongjoon. On Fri, Feb 26, 2021 at 12:20 PM huaxin gao wrote: > Thanks Dongjoon and Xiao for the discussion. I would like to add Data > Source V2 Aggregate push down to the list. I am currently working on > JDBC Data Source V2 Aggregate push

Re: Apache Spark 3.2 Expectation

2021-02-26 Thread Dongjoon Hyun
On Fri, Feb 26, 2021 at 11:13 AM Xiao Li wrote: > Do we have enough features in the current master branch? > Hi, Xiao. Is this a question to Sean's previous comment, `There is already some good stuff in 3.2 and will be a good minor release in 5-6 months.`? On Thu, Feb 25, 2021 at 9:33 AM Sean

Re: Apache Spark 3.2 Expectation

2021-02-26 Thread huaxin gao
Thanks Dongjoon and Xiao for the discussion. I would like to add Data Source V2 Aggregate push down to the list. I am currently working on JDBC Data Source V2 Aggregate push down, but the common code can be used for the file based V2 Data Source as well. For example, MAX and MIN can be pushed down

Re: Apache Spark 3.2 Expectation

2021-02-26 Thread Xiao Li
Thank you, Dongjoon, for initiating this discussion. Let us keep it open. It might take 1-2 weeks to collect from the community all the features we plan to build and ship in 3.2 since we just finished the 3.1 voting. > 3. +100 for Apache Spark 3.2.0 in July 2021. Maybe, we need `branch-cut` > in

Re: Apache Spark 3.2 Expectation

2021-02-26 Thread Dongjoon Hyun
Thank you, Mridul and Sean. 1. Yes, `2017` was a typo. Java 17 is scheduled for September 2021. And, of course, it's a nice-to-have status. :) 2. `Push based shuffle and disaggregated shuffle`. Definitely. Thanks for sharing. 3. +100 for Apache Spark 3.2.0 in July 2021. Maybe, we need `branch-cut`

Re: Apache Spark 3.2 Expectation

2021-02-25 Thread Sean Owen
I'd roughly expect 3.2 in, say, July of this year, given the usual cadence. No reason it couldn't be a little sooner or later. There is already some good stuff in 3.2 and will be a good minor release in 5-6 months. On Thu, Feb 25, 2021 at 10:57 AM Dongjoon Hyun wrote: > Hi, All. > > Since we

Re: Apache Spark 3.2 Expectation

2021-02-25 Thread Mridul Muralidharan
Nit: Java 17 -> should be available by Sept 2021 :-) Adoption would also depend on some of our nontrivial dependencies supporting it - it might be a stretch to get it in for Apache Spark 3.2? Features: Push based shuffle and disaggregated shuffle should also be in 3.2. Regards, Mridul On

Apache Spark 3.2 Expectation

2021-02-25 Thread Dongjoon Hyun
Hi, All. Since we have been preparing Apache Spark 3.2.0 in master branch since December 2020, March seems to be a good time to share our thoughts and aspirations on Apache Spark 3.2. According to the progress on Apache Spark 3.1 release, Apache Spark 3.2 seems to be the last minor release of