Re: Apache Spark 3.3 Release

2022-03-14 Thread Xiao Li
To make our release time more predictable, shall we collect the PRs and wait three more days before the branch cut? Please list all the actively developed feature work we plan to release with Spark 3.3. We should avoid merging any new feature work that is not being discussed in this email thread.

Re: Apache Spark 3.3 Release

2022-03-14 Thread Chao Sun
I mainly mean:
- [SPARK-35801] Row-level operations in Data Source V2
- [SPARK-37166] Storage Partitioned Join
for which the PRs:
- https://github.com/apache/spark/pull/35395
- https://github.com/apache/spark/pull/35657
are actively being reviewed. It seems there are ongoing PRs for other

Re: Apache Spark 3.3 Release

2022-03-14 Thread Holden Karau
On Mon, Mar 14, 2022 at 11:53 PM Xiao Li wrote:
> Could you please list which features we want to finish before the branch
> cut? How long will they take?
>
> Xiao
>
> Chao Sun wrote on Mon, Mar 14, 2022 at 13:30:
>
>> Hi Max,
>>
>> As there is still some ongoing work for the above listed SPIPs, can we
>>

Re: Apache Spark 3.3 Release

2022-03-14 Thread Xiao Li
Could you please list which features we want to finish before the branch cut? How long will they take?
Xiao
Chao Sun wrote on Mon, Mar 14, 2022 at 13:30:
> Hi Max,
>
> As there is still some ongoing work for the above listed SPIPs, can we
> still merge them after the branch cut?
>
> Thanks,
> Chao
>
> On

Re: Data correctness issue with Repartition + FetchFailure

2022-03-14 Thread Wenchen Fan
We fixed the repartition correctness bug before by sorting the data before doing round-robin partitioning. But the issue is that we need to propagate the isDeterministic property through SQL operators.
On Tue, Mar 15, 2022 at 1:50 AM Jason Xu wrote:
> Hi Reynold, do you suggest removing
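A minimal sketch of the idea described above, assuming a toy in-memory model rather than Spark's actual implementation: round-robin assignment depends only on the order in which rows arrive, so a recomputed input that yields the same rows in a different order sends rows to different partitions, while sorting first removes that dependence. The function names `round_robin` and `sorted_round_robin` are hypothetical and exist only for illustration; the sketch does not model the isDeterministic propagation mentioned in the message.

```python
def round_robin(rows, num_partitions):
    """Assign each row to a partition purely by its position in the input."""
    partitions = [[] for _ in range(num_partitions)]
    for i, row in enumerate(rows):
        partitions[i % num_partitions].append(row)
    return partitions


def sorted_round_robin(rows, num_partitions):
    """Sort first, so the assignment no longer depends on arrival order."""
    return round_robin(sorted(rows), num_partitions)


# Same set of rows, two arrival orders (e.g. an original attempt and a
# recomputation after a fetch failure). This is an illustrative assumption.
first_attempt = list(range(10))
retry_attempt = first_attempt[::-1]

# Plain round-robin: the two attempts produce different partitions.
unstable = round_robin(first_attempt, 3) != round_robin(retry_attempt, 3)

# Sorted round-robin: both attempts produce identical partitions.
stable = sorted_round_robin(first_attempt, 3) == sorted_round_robin(retry_attempt, 3)
```

In this toy model the sort only helps when both attempts contain the same set of rows; it says nothing about upstream operators that may produce different rows on retry.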

Re: Apache Spark 3.3 Release

2022-03-14 Thread Chao Sun
Hi Max,
As there is still some ongoing work for the above listed SPIPs, can we still merge them after the branch cut?
Thanks,
Chao
On Mon, Mar 14, 2022 at 6:12 AM Maxim Gekk wrote:
> Hi All,
>
> Since there are no actual blockers for Spark 3.3.0 or significant
> objections, I am going to

Re: Data correctness issue with Repartition + FetchFailure

2022-03-14 Thread Jason Xu
Hi Reynold, do you suggest removing RoundRobinPartitioning in the repartition(numPartitions: Int) API implementation? If that's the direction we're considering, before we have a new implementation, should we suggest users avoid using the repartition(numPartitions: Int) API?
On Sat, Mar 12, 2022 at

Re: Apache Spark 3.3 Release

2022-03-14 Thread Maxim Gekk
Hi All,
Since there are no actual blockers for Spark 3.3.0 or significant objections, I am going to cut branch-3.3 after 15th March at 00:00 PST. Please let us know if you have any concerns about that.
Best regards,
Max Gekk
On Thu, Mar 3, 2022 at 9:44 PM Maxim Gekk wrote:
> Hello All,
>
>