Re: Apache Spark 3.4.2 (?)
Thank you all. Here is an update.

Thanks to your help, all open blocker issues (including the correctness issues) are resolved. However, I'm still waiting for an additional PR that takes an alternative approach to the previously resolved JIRAs:

https://github.com/apache/spark/pull/43760 (for Apache Spark 4.0.0, 3.5.2, and 3.4.2)

Although the above PR is still under review and needs revisions, I hope we can start the 3.4.2 RC1 vote early this week.

Bests,
Dongjoon.

On 2023/11/10 08:41:57 Kent Yao wrote:
> +1
Re: Apache Spark 3.4.2 (?)
+1

Maxim Gekk wrote on Thu, Nov 9, 2023 at 18:18:
> +1
Re: Apache Spark 3.4.2 (?)
+1

On Wed, Nov 8, 2023 at 5:29 AM kazuyuki tanimura wrote:
> +1
>
> Kazu
Re: Apache Spark 3.4.2 (?)
+1

Kazu

> On Nov 7, 2023, at 5:23 PM, L. C. Hsieh wrote:
>
> +1
Re: Apache Spark 3.4.2 (?)
+1

On Tue, Nov 7, 2023 at 4:56 PM Dongjoon Hyun wrote:
>
> Thank you all!
>
> Dongjoon
Re: Apache Spark 3.4.2 (?)
Thank you all!

Dongjoon

On Mon, Nov 6, 2023 at 6:03 PM Holden Karau wrote:
> +1
Re: Apache Spark 3.4.2 (?)
+1

On Mon, Nov 6, 2023 at 4:30 PM yangjie01 wrote:
> +1
Re: Apache Spark 3.4.2 (?)
+1

From: Yuming Wang
Date: Tuesday, November 7, 2023 07:00
To: Santosh Pingale
Cc: Dongjoon Hyun, dev
Subject: Re: Apache Spark 3.4.2 (?)

+1
Re: Apache Spark 3.4.2 (?)
+1

On Tue, Nov 7, 2023 at 3:55 AM Santosh Pingale wrote:
> Makes sense given the nature of those commits.
Re: Apache Spark 3.4.2 (?)
Makes sense given the nature of those commits.

On Mon, Nov 6, 2023, 7:52 PM Dongjoon Hyun wrote:
> Hi, All.
>
> The Apache Spark 3.4.1 tag was created on June 19th, and `branch-3.4` has 103
> commits since then, including important security and correctness patches like
> SPARK-44251, SPARK-44805, and SPARK-44940.
>
> https://github.com/apache/spark/releases/tag/v3.4.1
>
> $ git log --oneline v3.4.1..HEAD | wc -l
> 103
>
> SPARK-44251 Potential for incorrect results or NPE when full outer USING join has null key value
> SPARK-44805 Data lost after union using spark.sql.parquet.enableNestedColumnVectorizedReader=true
> SPARK-44940 Improve performance of JSON parsing when "spark.sql.json.enablePartialResults" is enabled
>
> Currently, I'm checking the following open correctness issues. I'd like to
> propose releasing Apache Spark 3.4.2 after resolving them, and I volunteer as
> the release manager for Apache Spark 3.4.2. If there are no additional
> blockers, the first tentative RC1 vote date is November 13th (Monday). If it
> takes some time to resolve the open correctness issues, we can start the
> vote after the Thanksgiving holiday.
>
> SPARK-44512 dataset.sort.select.write.partitionBy sorts wrong column
> SPARK-45282 Join loses records for cached datasets
>
> WDYT?
>
> Dongjoon.