Re: Apache Spark 3.3 Release

2022-03-15 Thread Xiao Li
Please do not get me wrong. If we don't cut a branch, we are allowing all patches to land in Apache Spark 3.3. That is totally fine. After we cut the branch, we should avoid merging feature work. In the next three days, let us collect the actively developed PRs that we want to make an exception

Re: Apache Spark 3.3 Release

2022-03-15 Thread Dongjoon Hyun
That is not totally fine, Xiao. It sounds like you are asking for a change of plan without a proper reason. Even if we cut the branch today according to our plan, you can still collect the PRs and make a list of exceptions. I'm not blocking what you want to do. Please let the community start to ramp

Re: Apache Spark 3.3 Release

2022-03-15 Thread Xiao Li
I think I finally got your point. What you want to keep unchanged is the branch cut date of Spark 3.3. Today? or this Friday? This is not a big deal. My major concern is whether we should keep merging the feature work or the dependency upgrade after the branch cut. To make our release time more

Re: Apache Spark 3.3 Release

2022-03-15 Thread Dongjoon Hyun
Yes, I agree with your whitelist approach for backporting. :) Thank you for summarizing. Thanks, Dongjoon. On Tue, Mar 15, 2022 at 4:20 PM Xiao Li wrote: > I think I finally got your point. What you want to keep unchanged is the > branch cut date of Spark 3.3. Today? or this Friday?

Re: Apache Spark 3.3 Release

2022-03-15 Thread Yikun Jiang
> To make our release time more predictable, let us collect the PRs and wait three more days before the branch cut?
For SPIP: Support Customized Kubernetes Schedulers:
#35819 [SPARK-38524][SPARK-38553][K8S] Bump Volcano to v1.5.1
Three more days are

Re: Apache Spark 3.3 Release

2022-03-15 Thread Xiao Li
Let me clarify my above suggestion. Maybe we can wait 3 more days to collect the list of actively developed PRs that we want to merge to 3.3 after the branch cut? Please do not rush to merge the PRs that are not fully reviewed. We can cut the branch this Friday and continue merging the PRs that

Re: Apache Spark 3.3 Release

2022-03-15 Thread Dongjoon Hyun
The following was tested and merged a few minutes ago. So, we can remove it from the list. #35819 [SPARK-38524][SPARK-38553][K8S] Bump Volcano to v1.5.1 Thanks, Dongjoon. On Tue, Mar 15, 2022 at 9:48 AM Xiao Li wrote: > Let me clarify my above

Re: Apache Spark 3.3 Release

2022-03-15 Thread Chao Sun
Cool, thanks for clarifying!
On Tue, Mar 15, 2022 at 10:11 AM Xiao Li wrote:
>> For the following list:
>> #35789 [SPARK-32268][SQL] Row-level Runtime Filtering
>> #34659 [SPARK-34863][SQL] Support complex types for Parquet vectorized reader
>> #35848 [SPARK-38548][SQL] New SQL function:

Re: Apache Spark 3.3 Release

2022-03-15 Thread Xiao Li
> For the following list:
> #35789 [SPARK-32268][SQL] Row-level Runtime Filtering
> #34659 [SPARK-34863][SQL] Support complex types for Parquet vectorized reader
> #35848 [SPARK-38548][SQL] New SQL function: try_sum
> Do you mean we should include them, or exclude them from 3.3?
If possible,

Re: Apache Spark 3.3 Release

2022-03-15 Thread Holden Karau
May I suggest we push out one week (22nd) just to give everyone a bit of breathing space? Rushed software development more often results in bugs. On Tue, Mar 15, 2022 at 6:23 AM Yikun Jiang wrote: > > To make our release time more predictable, let us collect the PRs and > wait three more days

Re: Apache Spark 3.3 Release

2022-03-15 Thread Chao Sun
Hi Xiao,
For the following list:
#35789 [SPARK-32268][SQL] Row-level Runtime Filtering
#34659 [SPARK-34863][SQL] Support complex types for Parquet vectorized reader
#35848 [SPARK-38548][SQL] New SQL function: try_sum
Do you mean we should include them, or exclude them from 3.3?
Thanks, Chao

Re: Apache Spark 3.3 Release

2022-03-15 Thread Dongjoon Hyun
Hi, Max, Chao, Xiao, Holden and all. I have a different idea. Given the situation and small patch list, I don't think we need to postpone the branch cut for those patches. It's easier to cut a branch-3.3 and allow backporting. As of today, we already have an obvious Apache Spark 3.4 patch in

Re: Apache Spark 3.3 Release

2022-03-15 Thread Xiao Li
Cutting the branch is simple, but we need to avoid backporting feature work that has not been well discussed. Not all the members are actively following the dev list. I think we should wait 3 more days to collect the PR list before cutting the branch. BTW, there are very few 3.4-only

Re: Data correctness issue with Repartition + FetchFailure

2022-03-15 Thread Jason Xu
Hi Wenchen, thanks for the insight. Agreed, the previous fix for repartition works for deterministic data. With non-deterministic data, I didn't find an API to pass DeterministicLevel to the underlying RDD. Do you plan to continue working on integration with SQL operators? If not, I'm available to take a

Re: Apache Spark 3.3 Release

2022-03-15 Thread Dongjoon Hyun
Xiao. You are working against what you are saying. If you don't cut a branch, it means you are allowing all patches to land in Apache Spark 3.3. No? > we need to avoid backporting feature work that has not been well discussed. On Tue, Mar 15, 2022 at 12:12 PM Xiao Li wrote: > Cutting the