Re: Apache Spark 3.2.2 Release?

2022-07-06 Thread Mridul Muralidharan
+1

Thanks for driving this, Dongjoon!

Regards,
Mridul

On Thu, Jul 7, 2022 at 12:36 AM Gengliang Wang  wrote:

> +1.
> Thank you, Dongjoon.


Re: Apache Spark 3.2.2 Release?

2022-07-06 Thread Gengliang Wang
+1.
Thank you, Dongjoon.

On Wed, Jul 6, 2022 at 10:21 PM Wenchen Fan  wrote:

> +1


Re: Apache Spark 3.2.2 Release?

2022-07-06 Thread Wenchen Fan
+1

On Thu, Jul 7, 2022 at 10:41 AM Xinrong Meng
 wrote:

> +1
>
> Thanks!


Re: Apache Spark 3.2.2 Release?

2022-07-06 Thread Xinrong Meng
+1

Thanks!


Xinrong Meng

Software Engineer

Databricks


On Wed, Jul 6, 2022 at 7:25 PM Xiao Li  wrote:

> +1
>
> Xiao



Re: Apache Spark 3.2.2 Release?

2022-07-06 Thread Xiao Li
+1

Xiao

On Wed, Jul 6, 2022 at 19:16, Cheng Su wrote:

> +1 (non-binding)
>
> Thanks,
> Cheng Su


Re: Apache Spark 3.2.2 Release?

2022-07-06 Thread Cheng Su
+1 (non-binding)

Thanks,
Cheng Su

On Wed, Jul 6, 2022 at 6:01 PM Yuming Wang  wrote:

> +1


Re: Apache Spark 3.2.2 Release?

2022-07-06 Thread Yuming Wang
+1

On Thu, Jul 7, 2022 at 5:53 AM Maxim Gekk 
wrote:

> +1


Re: Apache Spark 3.2.2 Release?

2022-07-06 Thread Maxim Gekk
+1

On Thu, Jul 7, 2022 at 12:26 AM John Zhuge  wrote:

> +1  Thanks for the effort!


Re: Apache Spark 3.2.2 Release?

2022-07-06 Thread John Zhuge
+1  Thanks for the effort!

On Wed, Jul 6, 2022 at 2:23 PM Bjørn Jørgensen 
wrote:

> +1

--
John Zhuge


Re: Apache Spark 3.2.2 Release?

2022-07-06 Thread Bjørn Jørgensen
+1

On Wed, Jul 6, 2022 at 23:05, Hyukjin Kwon wrote:

> Yeah +1


Re: Apache Spark 3.2.2 Release?

2022-07-06 Thread Hyukjin Kwon
Yeah +1

On Thu, Jul 7, 2022 at 5:40 AM Dongjoon Hyun 
wrote:



Apache Spark 3.2.2 Release?

2022-07-06 Thread Dongjoon Hyun
Hi, All.

Since the Apache Spark 3.2.1 tag was created (Jan 19), 197 new patches,
including 11 correctness patches, have landed in branch-3.2.

Shall we make a new release, Apache Spark 3.2.2, as the third release
in the 3.2 line? I'd like to volunteer as the release manager for Apache
Spark 3.2.2. I'm thinking about starting the first RC next week.

$ git log --oneline v3.2.1..HEAD | wc -l
 197
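A small Python sketch of the same count, which can also check which of the correctness JIRAs listed below actually appear in the log. It assumes `git` is on the PATH, that it runs inside a Spark checkout, and that commit subjects follow Spark's usual `[SPARK-NNNNN][COMPONENT] title` convention; the helper names `jiras_in_log` and `read_log` are hypothetical.

```python
import re
import subprocess

# Matches JIRA keys such as "SPARK-38075" anywhere in a commit subject.
JIRA_KEY = re.compile(r"\bSPARK-\d+\b")


def jiras_in_log(log_lines):
    """Return the set of SPARK-NNNNN keys mentioned in `git log --oneline` lines."""
    keys = set()
    for line in log_lines:
        keys.update(JIRA_KEY.findall(line))
    return keys


def read_log(rev_range="v3.2.1..HEAD"):
    """Run `git log --oneline <rev_range>` in the current repo and return its lines."""
    out = subprocess.run(
        ["git", "log", "--oneline", rev_range],
        capture_output=True, text=True, check=True,
    ).stdout
    return out.splitlines()


# Hypothetical usage inside a Spark checkout on branch-3.2:
# lines = read_log()
# print(len(lines))  # one entry per commit, like `wc -l`
# wanted = {"SPARK-38075", "SPARK-38204"}
# print(sorted(wanted - jiras_in_log(lines)))  # listed fixes not found in the log
```

Parsing the one-line log rather than querying JIRA keeps the check local to the branch being released.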

# Correctness issues

SPARK-38075 Hive script transform with order by and limit will return fake rows
SPARK-38204 All state operators are at a risk of inconsistency between state partitioning and operator partitioning
SPARK-38309 SHS has incorrect percentiles for shuffle read bytes and shuffle total blocks metrics
SPARK-38320 (flat)MapGroupsWithState can timeout groups which just received inputs in the same microbatch
SPARK-38614 After Spark update, df.show() shows incorrect F.percent_rank results
SPARK-38655 OffsetWindowFunctionFrameBase cannot find the offset row whose input is not null
SPARK-38684 Stream-stream outer join has a possible correctness issue due to weakly read consistent on outer iterators
SPARK-39061 Incorrect results or NPE when using Inline function against an array of dynamically created structs
SPARK-39107 Silent change in regexp_replace's handling of empty strings
SPARK-39259 Timestamps returned by now() and equivalent functions are not consistent in subqueries
SPARK-39293 The accumulator of ArrayAggregate should copy the intermediate result if string, struct, array, or map
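For reference when verifying SPARK-38614: Spark's `percent_rank` is (rank - 1) / (n - 1) within a window partition, where rank is the `rank()` value (ties share a rank) and n is the number of rows in the partition; a single-row partition yields 0. The plain-Python sketch below computes the expected values for one ascending-ordered partition so they can be compared against `df.show()` output. It is a hypothetical reference model, not Spark's implementation.

```python
def percent_rank(values):
    """Reference percent_rank for one window partition ordered ascending by value.

    Returns (value, percent_rank) pairs in sorted order; ties share a rank, as with rank().
    """
    ordered = sorted(values)
    n = len(ordered)
    first_pos = {}
    for i, v in enumerate(ordered):
        # 0-based position of the first row with this value, i.e. rank() - 1
        first_pos.setdefault(v, i)
    if n <= 1:
        return [(v, 0.0) for v in ordered]  # a single-row partition yields 0
    return [(v, first_pos[v] / (n - 1)) for v in ordered]


print(percent_rank([40, 10, 20, 20]))
# [(10, 0.0), (20, 0.3333333333333333), (20, 0.3333333333333333), (40, 1.0)]
```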

Best,
Dongjoon.

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



[DISCUSS] Deprecate Trigger.Once and promote Trigger.AvailableNow

2022-07-06 Thread Jungtaek Lim
Hi dev,

I would like to hear opinions on deprecating Trigger.Once and promoting
Trigger.AvailableNow as its replacement [1] in Structured Streaming.
(This doesn't mean we would remove Trigger.Once now or in the near future;
that would probably require a separate discussion.)

Rationale:

The expected behavior of Trigger.Once is to read all data that became
available since the last trigger and process it. This holds true when the
last run terminated gracefully, but there are cases where a streaming query
does not terminate gracefully. The last run may have written the offset for
a new batch before terminating; a new run of Trigger.Once then processes
only the data in that unfinished batch and does not process any newer data.

From the users' point of view this behavior is nondeterministic: end users
cannot tell whether the last run wrote the offset without inspecting the
query's checkpoint themselves.

Trigger.AvailableNow was introduced to solve the scalability issue of
Trigger.Once, which processes all available data in a single batch, by
allowing the work to run in multiple batches. It also ensures that all data
available at the moment the query is triggered gets processed, so it
consistently delivers the behavior users expect from Trigger.Once.
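The failure mode described above can be illustrated with a toy model of the checkpoint's offset log and commit log. This is a deliberately simplified sketch, not Spark's implementation: offsets are plain integers, each batch covers a (start, end] range, new data is handled in one batch rather than split by rate limits, and the names `Checkpoint`, `run_once` and `run_available_now` are hypothetical. The one behavior taken from the text is that a restarted query first replays a batch whose offset was written but never committed.

```python
from dataclasses import dataclass, field


@dataclass
class Checkpoint:
    """Toy checkpoint: offset_log[i] is the end offset planned for batch i."""
    offset_log: list = field(default_factory=list)
    commit_log: set = field(default_factory=set)  # ids of finished batches

    def last_committed_end(self):
        return max((self.offset_log[b] for b in self.commit_log), default=0)


def _run_batch(cp, source_latest):
    """Run one micro-batch and return the (start, end] range it processed."""
    last = len(cp.offset_log) - 1
    if last >= 0 and last not in cp.commit_log:
        # A batch was planned (offset written) but never committed:
        # replay exactly that range, ignoring any newer data.
        batch_id, end = last, cp.offset_log[last]
        start = cp.offset_log[last - 1] if last > 0 else 0
    else:
        # Plan a new batch up to the latest available offset; write it first.
        batch_id, end = last + 1, source_latest
        start = cp.offset_log[last] if last >= 0 else 0
        cp.offset_log.append(end)
    cp.commit_log.add(batch_id)  # processing succeeded, record the commit
    return (start, end)


def run_once(cp, source_latest):
    """Toy Trigger.Once: execute a single batch, then stop."""
    return [_run_batch(cp, source_latest)]


def run_available_now(cp, source_latest):
    """Toy Trigger.AvailableNow: keep running batches until the offset
    available at trigger time (source_latest) has been committed."""
    processed = []
    while cp.last_committed_end() < source_latest:
        processed.append(_run_batch(cp, source_latest))
    return processed


# Previous run crashed after writing the offset for (0, 100] but before committing;
# the source has since grown to offset 150.
print(run_once(Checkpoint([100], set()), 150))           # [(0, 100)]
print(run_available_now(Checkpoint([100], set()), 150))  # [(0, 100), (100, 150)]
```

In the crash scenario the Once-style run stops after replaying the stale range, while the AvailableNow-style run continues until everything present at trigger time is committed.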

Another issue with Trigger.Once is that it does not run a no-data batch
immediately. A watermark calculated in batch N only takes effect in batch
N + 1, so if the query is scheduled to run once per day, the output driven
by the new watermark only shows up in the next day's run.
Trigger.AvailableNow, in contrast, also runs the no-data batch before the
query terminates.
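The one-batch lag of the watermark can be sketched the same way. In this toy model (hypothetical names, not Spark's API), each batch uses the watermark computed by the previous batch and then advances it to max(event time) - delay; a window is emitted once its end is at or below the watermark in effect, which is a simplification of Spark's actual eviction rule, and late-data dropping is not modeled. The `trailing_no_data_batch` flag stands in for the extra no-data batch that Trigger.AvailableNow runs before terminating.

```python
def run_batches(batches, delay, window_size, trailing_no_data_batch):
    """Toy append-mode windowed aggregation; returns the window ends emitted per batch.

    batches: list of lists of integer event times, one inner list per micro-batch.
    """
    watermark = float("-inf")       # watermark in effect for the current batch
    max_event_time = float("-inf")
    open_windows = set()            # end times of windows still held in state
    emitted = []
    if trailing_no_data_batch:
        batches = batches + [[]]    # extra batch with no input, run only to apply the watermark
    for events in batches:
        for t in events:
            open_windows.add((t // window_size + 1) * window_size)
        # Finalize windows whose end is at or below the watermark in effect.
        done = sorted(w for w in open_windows if w <= watermark)
        open_windows.difference_update(done)
        emitted.append(done)
        # The watermark computed here only takes effect in the next batch.
        if events:
            max_event_time = max(max_event_time, max(events))
        watermark = max_event_time - delay
    return emitted


# One day's input arrives as a single batch; the window [0, 10) can close
# once the watermark reaches 22 - 5 = 17.
print(run_batches([[1, 22]], delay=5, window_size=10, trailing_no_data_batch=False))  # [[]]
print(run_batches([[1, 22]], delay=5, window_size=10, trailing_no_data_batch=True))   # [[], [10]]
```

Without the trailing batch, the closed window would only be emitted by the next scheduled run.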

Please review and let us know if you have any feedback or concerns about the
proposal.

Thanks!
Jungtaek Lim

1. https://issues.apache.org/jira/browse/SPARK-36533