Re: Apache Spark 3.2.2 Release?
+1

Thanks for driving this, Dongjoon!

Regards,
Mridul

On Thu, Jul 7, 2022 at 12:36 AM Gengliang Wang wrote:

> +1.
> Thank you, Dongjoon.
>
> On Wed, Jul 6, 2022 at 10:21 PM Wenchen Fan wrote:
>
>> +1
>>
>> On Thu, Jul 7, 2022 at 10:41 AM Xinrong Meng wrote:
>>
>>> +1
>>>
>>> Thanks!
>>>
>>> Xinrong Meng
>>> Software Engineer
>>> Databricks
>>>
>>> On Wed, Jul 6, 2022 at 7:25 PM Xiao Li wrote:
>>>
>>>> +1
>>>>
>>>> Xiao
>>>>
>>>> On Wed, Jul 6, 2022 at 19:16, Cheng Su wrote:
>>>>
>>>>> +1 (non-binding)
>>>>>
>>>>> Thanks,
>>>>> Cheng Su
>>>>>
>>>>> On Wed, Jul 6, 2022 at 6:01 PM Yuming Wang wrote:
>>>>>
>>>>>> +1
>>>>>>
>>>>>> On Thu, Jul 7, 2022 at 5:53 AM Maxim Gekk wrote:
>>>>>>
>>>>>>> +1
>>>>>>>
>>>>>>> On Thu, Jul 7, 2022 at 12:26 AM John Zhuge wrote:
>>>>>>>
>>>>>>>> +1 Thanks for the effort!
>>>>>>>>
>>>>>>>> On Wed, Jul 6, 2022 at 2:23 PM Bjørn Jørgensen <bjornjorgen...@gmail.com> wrote:
>>>>>>>>
>>>>>>>>> +1
>>>>>>>>>
>>>>>>>>> On Wed, Jul 6, 2022 at 23:05, Hyukjin Kwon wrote:
>>>>>>>>>
>>>>>>>>>> Yeah +1
>>>>>>>>>>
>>>>>>>>>> On Thu, Jul 7, 2022 at 5:40 AM Dongjoon Hyun <dongjoon.h...@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi, All.
>>>>>>>>>>>
>>>>>>>>>>> Since Apache Spark 3.2.1 tag creation (Jan 19), 197 new patches, including 11 correctness patches, have arrived at branch-3.2.
>>>>>>>>>>>
>>>>>>>>>>> Shall we make a new release, Apache Spark 3.2.2, as the third release in the 3.2 line? I'd like to volunteer as the release manager for Apache Spark 3.2.2. I'm thinking about starting the first RC next week.
>>>>>>>>>>>
>>>>>>>>>>> $ git log --oneline v3.2.1..HEAD | wc -l
>>>>>>>>>>> 197
>>>>>>>>>>>
>>>>>>>>>>> # Correctness issues
>>>>>>>>>>>
>>>>>>>>>>> SPARK-38075 Hive script transform with order by and limit will return fake rows
>>>>>>>>>>> SPARK-38204 All state operators are at a risk of inconsistency between state partitioning and operator partitioning
>>>>>>>>>>> SPARK-38309 SHS has incorrect percentiles for shuffle read bytes and shuffle total blocks metrics
>>>>>>>>>>> SPARK-38320 (flat)MapGroupsWithState can timeout groups which just received inputs in the same microbatch
>>>>>>>>>>> SPARK-38614 After Spark update, df.show() shows incorrect F.percent_rank results
>>>>>>>>>>> SPARK-38655 OffsetWindowFunctionFrameBase cannot find the offset row whose input is not null
>>>>>>>>>>> SPARK-38684 Stream-stream outer join has a possible correctness issue due to weakly read consistent on outer iterators
>>>>>>>>>>> SPARK-39061 Incorrect results or NPE when using Inline function against an array of dynamically created structs
>>>>>>>>>>> SPARK-39107 Silent change in regexp_replace's handling of empty strings
>>>>>>>>>>> SPARK-39259 Timestamps returned by now() and equivalent functions are not consistent in subqueries
>>>>>>>>>>> SPARK-39293 The accumulator of ArrayAggregate should copy the intermediate result if string, struct, array, or map
>>>>>>>>>>>
>>>>>>>>>>> Best,
>>>>>>>>>>> Dongjoon.
>>>>>>>>>>>
>>>>>>>>>>> ---------------------------------------------------------------------
>>>>>>>>>>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>>>>>>>>
>>>>>>>> --
>>>>>>>> John Zhuge
Re: Apache Spark 3.2.2 Release?
+1.
Thank you, Dongjoon.
Re: Apache Spark 3.2.2 Release?
+1
Re: Apache Spark 3.2.2 Release?
+1

Thanks!

Xinrong Meng
Software Engineer
Databricks
Re: Apache Spark 3.2.2 Release?
+1

Xiao
Re: Apache Spark 3.2.2 Release?
+1 (non-binding)

Thanks,
Cheng Su
Re: Apache Spark 3.2.2 Release?
+1
Re: Apache Spark 3.2.2 Release?
+1
Re: Apache Spark 3.2.2 Release?
+1 Thanks for the effort!

--
John Zhuge
Re: Apache Spark 3.2.2 Release?
+1
Re: Apache Spark 3.2.2 Release?
Yeah +1
Apache Spark 3.2.2 Release?
Hi, All.

Since Apache Spark 3.2.1 tag creation (Jan 19), 197 new patches, including 11 correctness patches, have arrived at branch-3.2.

Shall we make a new release, Apache Spark 3.2.2, as the third release in the 3.2 line? I'd like to volunteer as the release manager for Apache Spark 3.2.2. I'm thinking about starting the first RC next week.

$ git log --oneline v3.2.1..HEAD | wc -l
197

# Correctness issues

SPARK-38075 Hive script transform with order by and limit will return fake rows
SPARK-38204 All state operators are at a risk of inconsistency between state partitioning and operator partitioning
SPARK-38309 SHS has incorrect percentiles for shuffle read bytes and shuffle total blocks metrics
SPARK-38320 (flat)MapGroupsWithState can timeout groups which just received inputs in the same microbatch
SPARK-38614 After Spark update, df.show() shows incorrect F.percent_rank results
SPARK-38655 OffsetWindowFunctionFrameBase cannot find the offset row whose input is not null
SPARK-38684 Stream-stream outer join has a possible correctness issue due to weakly read consistent on outer iterators
SPARK-39061 Incorrect results or NPE when using Inline function against an array of dynamically created structs
SPARK-39107 Silent change in regexp_replace's handling of empty strings
SPARK-39259 Timestamps returned by now() and equivalent functions are not consistent in subqueries
SPARK-39293 The accumulator of ArrayAggregate should copy the intermediate result if string, struct, array, or map

Best,
Dongjoon.
[DISCUSS] Deprecate Trigger.Once and promote Trigger.AvailableNow
Hi dev,

I would like to hear opinions on deprecating Trigger.Once and promoting Trigger.AvailableNow as its replacement [1] in Structured Streaming. (This does not mean we remove Trigger.Once now or in the near future. That probably requires another discussion at some point.)

Rationale:

The expected behavior of Trigger.Once is to read all data that has become available since the last trigger and process it. This holds true when the last run was terminated gracefully, but there are cases where streaming queries are not terminated gracefully. The last run may have written the offset for a new batch before termination, in which case a new run of Trigger.Once only processes the data that was included in that latest unfinished batch and does not process new data. The behavior is not deterministic from the users' point of view, as end users cannot know whether the last run wrote the offset unless they look into the query's checkpoint themselves.

While Trigger.AvailableNow was introduced to solve the scalability issue of Trigger.Once, it also ensures that the query tries to process all data available at the time it is triggered, which consistently matches the expected behavior of Trigger.Once.

Another issue with Trigger.Once is that it does not trigger a no-data batch immediately. When the watermark is calculated in batch N, it takes effect in batch N + 1. If the query is scheduled to run once per day, you only see the output from the new watermark in the next day's run. Trigger.AvailableNow, by contrast, also handles the no-data batch before the query terminates.

Please review and let us know if you have any feedback or concerns on the proposal.

Thanks!
Jungtaek Lim

1. https://issues.apache.org/jira/browse/SPARK-36533
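
To make the unfinished-batch scenario described in the rationale concrete, here is a toy model in plain Python. It is only a sketch and not Spark code: the Checkpoint class, its planned_end and committed fields, and the run_once / run_available_now functions are hypothetical names invented for illustration, and the real checkpoint layout is more involved. The only assumption carried over from the message is that a run may record the end offset of a batch before dying, and that the next Trigger.Once run then replays just that batch.

```python
from dataclasses import dataclass, field


@dataclass
class Checkpoint:
    """Toy stand-in for a streaming checkpoint (not Spark's real format)."""
    planned_end: list = field(default_factory=list)  # end offset recorded per batch
    committed: int = 0                               # number of batches fully committed


def run_once(source, cp):
    """Toy Trigger.Once: execute exactly one batch, then stop."""
    if len(cp.planned_end) > cp.committed:
        # An earlier run recorded this batch's end offset but never committed it,
        # so the batch is replayed with its old end offset; newer data is ignored.
        end = cp.planned_end[cp.committed]
    else:
        # No pending batch: plan a new one covering everything available now.
        end = len(source)
        cp.planned_end.append(end)
    start = cp.planned_end[cp.committed - 1] if cp.committed else 0
    cp.committed += 1
    return source[start:end]


def run_available_now(source, cp):
    """Toy Trigger.AvailableNow: run batches until all data seen at start is processed."""
    target = len(source)  # snapshot of what is available when the run is triggered
    processed = []
    while True:
        processed.extend(run_once(source, cp))
        if cp.planned_end[cp.committed - 1] >= target:
            return processed


# Records "a".."c" existed when the previous run recorded end offset 3 and then died;
# "d" and "e" arrived afterwards.
source = ["a", "b", "c", "d", "e"]

once_result = run_once(source, Checkpoint(planned_end=[3], committed=0))
now_result = run_available_now(source, Checkpoint(planned_end=[3], committed=0))

print(once_result)  # only the replayed batch
print(now_result)   # the replayed batch plus the newer records
```

In this model the Trigger.Once-style run returns only the three records of the replayed batch, while the AvailableNow-style run first finishes that batch and then continues until the two newer records are processed as well. In PySpark itself the two modes are selected with DataStreamWriter.trigger(once=True) and DataStreamWriter.trigger(availableNow=True).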