Re: Missing data in spark output

2022-10-18 Thread Emil Ejbyfeldt

Hi,

We have observed similar behavior in older versions of Spark, but we are
currently using 3.3.0 and have not seen such issues.


Which version of Spark and Hadoop are you using?

On 18/10/2022 19:48, Sandeep Vinayak wrote:

Hello Everyone,

We have recently been observing intermittent data loss in Spark jobs that
write output to GCS (Google Cloud Storage). Whenever rows are missing, they
are accompanied by duplicate rows. Re-running the job produces no duplicate
or missing rows. Since this is hard to debug, we are first trying to
understand the potential theoretical root cause. Could this be a
GCS-specific issue, where GCS might not be handling consistency well? Any
tips would be super helpful.
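Because the symptom is missing rows paired with duplicates, one way to characterize a bad run is to diff the row keys the job should have written against the keys actually present in its output. The sketch below assumes each row carries a unique key (for example a hypothetical `id` column) and that both key lists have already been collected, say from the job's input and from a re-read of its output. It is plain Python and only describes the symptom; it does not identify a root cause.

```python
from __future__ import annotations

from collections import Counter
from typing import Hashable, Iterable


def diff_output_keys(
    expected: Iterable[Hashable], actual: Iterable[Hashable]
) -> tuple[list, dict]:
    """Compare expected row keys with the keys found in a job's output.

    Returns (missing, duplicated): keys that never appear in the output,
    and keys that appear more than once, mapped to their occurrence count.
    """
    counts = Counter(actual)
    # Keys the job should have produced but that are absent from the output.
    missing = sorted(set(expected).difference(counts))
    # Keys written more than once.
    duplicated = {key: n for key, n in counts.items() if n > 1}
    return missing, duplicated


missing, duplicated = diff_output_keys([1, 2, 3, 4], [1, 2, 2, 4])
print(missing, duplicated)  # [3] {2: 2}
```

Running the same comparison on a bad run and on its clean re-run shows whether the affected keys are the same each time.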


Thanks,



-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Re: Welcome Yikun Jiang as a Spark committer

2022-10-18 Thread Rui Wang
Well deserved! Congrats!


-Rui

On Mon, Oct 10, 2022 at 9:07 AM Xinrong Meng 
wrote:

> Congratulations, Yikun! Well deserved.
>
> On Sun, Oct 9, 2022 at 9:36 PM John Zhuge  wrote:
>
>> Congratulations, Yikun!
>>
>> On Sun, Oct 9, 2022 at 8:52 PM Senthil Kumar  wrote:
>>
>>> Congratulations Yikun
>>>
>>> On Mon, 10 Oct 2022, 09:11 Xiao Li,  wrote:
>>>
 Congratulations, Yikun!

 Xiao

 On Sun, Oct 9, 2022 at 19:34, Yikun Jiang wrote:

> Thank you all!
>
> Regards,
> Yikun
>
>
> On Mon, Oct 10, 2022 at 3:18 AM Chao Sun  wrote:
>
>> Congratulations Yikun!
>>
>> On Sun, Oct 9, 2022 at 11:14 AM vaquar khan 
>> wrote:
>>
>>> Congratulations.
>>>
>>> Regards,
>>> Vaquar khan
>>>
>>> On Sun, Oct 9, 2022, 6:46 AM 叶先进  wrote:
>>>
 Congrats

 On Oct 9, 2022, at 16:44, XiDuo You  wrote:

 Congratulations, Yikun !

 On Sun, Oct 9, 2022 at 15:59, Maxim Gekk wrote:

> Keep up the great work, Yikun!
>
> On Sun, Oct 9, 2022 at 10:52 AM Gengliang Wang 
> wrote:
>
>> Congratulations, Yikun!
>>
>> On Sun, Oct 9, 2022 at 12:33 AM 416161...@qq.com <
>> ruife...@foxmail.com> wrote:
>>
>>> Congrats, Yikun!
>>>
>>> --
>>> Ruifeng Zheng
>>> ruife...@foxmail.com
>>>
>>> 
>>>
>>>
>>>
>>> -- Original --
>>> *From:* "Martin Grigorov" ;
>>> *Date:* Sun, Oct 9, 2022 05:01 AM
>>> *To:* "Hyukjin Kwon";
>>> *Cc:* "dev";"Yikun Jiang"<
>>> yikunk...@gmail.com>;
>>> *Subject:* Re: Welcome Yikun Jiang as a Spark committer
>>>
>>> Congratulations, Yikun!
>>>
>>> On Sat, Oct 8, 2022 at 7:41 AM Hyukjin Kwon 
>>> wrote:
>>>
 Hi all,

 The Spark PMC recently added Yikun Jiang as a committer on the
 project.
 Yikun is the major contributor to the infrastructure and GitHub
 Actions in Apache Spark, as well as to Kubernetes and PySpark.
 He has put a lot of effort into stabilizing and optimizing the
 builds so we all can work together in Apache Spark more
 efficiently and effectively. He is also driving the SPIP for an
 official Docker image for Apache Spark, for users and
 developers alike.
 Please join me in welcoming Yikun!


 --
>> John Zhuge
>>
>


Re: [DISCUSS] Flip the default value of Kafka offset fetching config (spark.sql.streaming.kafka.useDeprecatedOffsetFetching)

2022-10-18 Thread Jungtaek Lim
No further comments so far, so I'm going to submit a PR. Thanks again for the
feedback!

On Mon, Oct 17, 2022 at 9:30 AM Jungtaek Lim 
wrote:

> Thanks Gabor and Dongjoon for supporting this!
>
> Bumping this to reach more eyes. If there are no further comments in a couple
> of days, I'll consider it a lazy consensus and submit a PR for this.
>
> On Sat, Oct 15, 2022 at 3:32 AM Dongjoon Hyun 
> wrote:
>
>> +1
>>
>> I agree with Jungtaek and Gabor about switching the default value of
>> the configuration, along with a note in the migration guide.
>>
>> Dongjoon
>>
>> On Thu, Oct 13, 2022 at 12:46 AM Gabor Somogyi 
>> wrote:
>>
>>> Hi Jungtaek,
>>>
>>> Good to hear that the new approach is working fine. +1 from my side.
>>>
>>> BR,
>>> G
>>>
>>>
>>> On Thu, Oct 13, 2022 at 4:12 AM Jungtaek Lim <
>>> kabhwan.opensou...@gmail.com> wrote:
>>>
 Hi all,

 I would like to propose flipping the default value of the Kafka offset
 fetching config. The context is as follows:

 Before Spark 3.1, there was only one approach to fetching offsets, using
 consumer.poll(0). This has been pointed out as a root cause of hangs, since
 there is no timeout for the metadata fetch.

 In Spark 3.1, we addressed this by introducing a new approach to
 fetching offsets, via SPARK-32032. Since the new approach leverages
 AdminClient and a consumer group is no longer needed for fetching
 offsets, the required security ACLs are loosened.

 Reference:
 https://spark.apache.org/docs/latest/structured-streaming-kafka-integration.html#offset-fetching
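For reference, the setting under discussion is the one named in the thread subject, `spark.sql.streaming.kafka.useDeprecatedOffsetFetching`: `true` keeps the `consumer.poll(0)` path and `false` selects the AdminClient-based path. A minimal sketch, assuming jobs are launched with `spark-submit`, that pins the mode explicitly so a change of default does not silently alter behavior; the helper name is hypothetical.

```python
from __future__ import annotations

# Config key from the thread subject.
OFFSET_FETCHING_KEY = "spark.sql.streaming.kafka.useDeprecatedOffsetFetching"


def offset_fetching_conf_args(use_deprecated: bool) -> list[str]:
    """Return spark-submit arguments that pin the Kafka offset fetching mode.

    use_deprecated=True  -> old consumer.poll(0)-based fetching
    use_deprecated=False -> AdminClient-based fetching (SPARK-32032)
    """
    value = "true" if use_deprecated else "false"
    return ["--conf", f"{OFFSET_FETCHING_KEY}={value}"]


print(offset_fetching_conf_args(use_deprecated=False))
```

The same key/value pair can also be passed to `SparkSession.builder.config(...)` in PySpark. Pinning `true` may matter for deployments whose Kafka ACLs were granted for the consumer-group-based path, which is the security concern raised in the proposal.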

 There was some concern about the behavioral change to the security model,
 hence we couldn't make the new approach the default.

 Since then, we have observed various Kafka connector issues caused by
 the old offset fetching (e.g. hangs, issues with rebalancing of the
 consumer group, etc.), and we fixed many of them by simply
 flipping the config.

 Based on this, I consider the current default value "incorrect". The
 security-related behavioral change would inevitably be introduced (users can
 set a topic-based ACL rule), but most people will benefit. IMHO this is
 something we can deal with via a release/migration note.

 I would like to hear your thoughts on this.

 Thanks,
 Jungtaek Lim (HeartSaVioR)

>>>


Re: [VOTE] Release Spark 3.3.1 (RC4)

2022-10-18 Thread Cheng Pan
+1 (non-binding)

- Passed Apache Kyuubi (Incubating) integration tests [1]
- Ran some jobs on our internal K8s cluster

[1] https://github.com/apache/incubator-kyuubi/pull/3507

Thanks,
Cheng Pan

On Wed, Oct 19, 2022 at 9:13 AM Yikun Jiang  wrote:
>
> +1. Tests also passed with the spark-docker workflow (download the RC4 tgz,
> extract it, build the image, run the K8s ITs).
>
> [1] https://github.com/Yikun/spark-docker/pull/9
>
> Regards,
> Yikun
>
> On Wed, Oct 19, 2022 at 8:59 AM Wenchen Fan  wrote:
>>
>> +1
>>
>> On Wed, Oct 19, 2022 at 4:59 AM Chao Sun  wrote:
>>>
>>> +1. Thanks Yuming!
>>>
>>> Chao
>>>
>>> On Tue, Oct 18, 2022 at 1:18 PM Thomas graves  wrote:
>>> >
>>> > +1. Ran internal test suite.
>>> >
>>> > Tom
>>> >
>>> > On Sun, Oct 16, 2022 at 9:14 PM Yuming Wang  wrote:
>>> > >
>>> > > Please vote on releasing the following candidate as Apache Spark 
>>> > > version 3.3.1.
>>> > >
>>> > > The vote is open until 11:59pm Pacific time on October 21st and passes if
>>> > > a majority of the PMC votes cast are +1, with a minimum of three +1 votes.
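The passing rule can be read as a small tally over binding (PMC) votes. A sketch, assuming "majority" means more binding +1 votes than -1 votes, which is how ASF majority approval is usually described; non-binding votes, such as the "+1 (non-binding)" in this thread, are assumed to be filtered out by the caller.

```python
from __future__ import annotations


def release_vote_passes(binding_votes: list[int], minimum_plus_ones: int = 3) -> bool:
    """Check binding votes (+1, 0 or -1) against the rule in the vote email.

    Passes when there are at least `minimum_plus_ones` +1 votes and the
    +1 votes outnumber the -1 votes. Non-binding votes must be excluded
    before calling this.
    """
    plus = sum(1 for v in binding_votes if v == 1)
    minus = sum(1 for v in binding_votes if v == -1)
    return plus >= minimum_plus_ones and plus > minus


print(release_vote_passes([1, 1, 1, 0]))  # True
print(release_vote_passes([1, 1, -1]))    # False: only two +1 votes
```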
>>> > >
>>> > > [ ] +1 Release this package as Apache Spark 3.3.1
>>> > > [ ] -1 Do not release this package because ...
>>> > >
>>> > > To learn more about Apache Spark, please see https://spark.apache.org
>>> > >
>>> > > The tag to be voted on is v3.3.1-rc4 (commit 
>>> > > fbbcf9434ac070dd4ced4fb9efe32899c6db12a9):
>>> > > https://github.com/apache/spark/tree/v3.3.1-rc4
>>> > >
>>> > > The release files, including signatures, digests, etc. can be found at:
>>> > > https://dist.apache.org/repos/dist/dev/spark/v3.3.1-rc4-bin
>>> > >
>>> > > Signatures used for Spark RCs can be found in this file:
>>> > > https://dist.apache.org/repos/dist/dev/spark/KEYS
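Before testing an RC, it is worth checking that a downloaded artifact matches its published digest. A minimal sketch, assuming each release file has a sibling `.sha512` file in either `sha512sum` style (`HEX  filename`) or `gpg --print-md` style (`filename: HEX HEX ...` in uppercase groups); which format the RC directory actually uses is an assumption to confirm. This checks integrity only. Authenticity still requires verifying the `.asc` signature against the KEYS file, e.g. with `gpg --import KEYS` followed by `gpg --verify`.

```python
import hashlib
import re
from pathlib import Path


def sha512_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream a (possibly large) file through SHA-512 and return its hex digest."""
    digest = hashlib.sha512()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def expected_sha512(digest_text: str) -> str:
    """Pull the 128-hex-digit digest out of a .sha512 file's contents."""
    tokens = digest_text.split()
    if not tokens:
        raise ValueError("empty digest file")
    if tokens[0].endswith(":"):
        # gpg --print-md style: "name: AAAAAAAA BBBBBBBB ..." (may wrap lines)
        hex_digits = "".join(tokens[1:])
    else:
        # sha512sum style: "hexdigest  name"
        hex_digits = tokens[0]
    hex_digits = hex_digits.lower()
    if not re.fullmatch(r"[0-9a-f]{128}", hex_digits):
        raise ValueError("unrecognized .sha512 format")
    return hex_digits


def verify_sha512(artifact: Path, digest_file: Path) -> bool:
    """True if the artifact's SHA-512 matches the published digest."""
    return sha512_of(artifact) == expected_sha512(digest_file.read_text())
```

For example, `verify_sha512(Path("spark-3.3.1-bin-hadoop3.tgz"), Path("spark-3.3.1-bin-hadoop3.tgz.sha512"))`, where the file names are illustrative.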
>>> > >
>>> > > The staging repository for this release can be found at:
>>> > > https://repository.apache.org/content/repositories/orgapachespark-1430
>>> > >
>>> > > The documentation corresponding to this release can be found at:
>>> > > https://dist.apache.org/repos/dist/dev/spark/v3.3.1-rc4-docs
>>> > >
>>> > > The list of bug fixes going into 3.3.1 can be found at the following 
>>> > > URL:
>>> > > https://s.apache.org/ttgz6
>>> > >
>>> > > This release is using the release script of the tag v3.3.1-rc4.
>>> > >
>>> > >
>>> > > FAQ
>>> > >
>>> > > ==
>>> > > What happened to v3.3.1-rc3?
>>> > > ==
>>> > > A performance regression (SPARK-40703) was found after tagging
>>> > > v3.3.1-rc3, and the Iceberg community hopes Spark 3.3.1 will fix it,
>>> > > so we skipped the vote on v3.3.1-rc3.
>>> > >
>>> > > =
>>> > > How can I help test this release?
>>> > > =
>>> > > If you are a Spark user, you can help us test this release by taking
>>> > > an existing Spark workload and running on this release candidate, then
>>> > > reporting any regressions.
>>> > >
>>> > > If you're working in PySpark, you can set up a virtual env, install
>>> > > the current RC, and see if anything important breaks. In Java/Scala,
>>> > > you can add the staging repository to your project's resolvers and test
>>> > > with the RC (make sure to clean up the artifact cache before/after so
>>> > > you don't end up building with an out-of-date RC going forward).
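For the PySpark route, here is a sketch of the virtual-env setup using only the standard library. It assumes the RC's `-bin` directory contains a PySpark source distribution named `pyspark-3.3.1.tar.gz` (check the directory listing first) and a POSIX venv layout (`bin/`; on Windows the interpreter lives under `Scripts/`). The helper names are hypothetical.

```python
from __future__ import annotations

import subprocess
import venv
from pathlib import Path

RC_BIN_URL = "https://dist.apache.org/repos/dist/dev/spark/v3.3.1-rc4-bin"
# Assumed artifact name; confirm it against the directory listing.
PYSPARK_SDIST = "pyspark-3.3.1.tar.gz"


def rc_install_command(env_dir: Path) -> list[str]:
    """pip command that installs the RC's PySpark into the given venv."""
    python = env_dir / "bin" / "python"  # POSIX layout; Windows uses Scripts/
    return [str(python), "-m", "pip", "install", f"{RC_BIN_URL}/{PYSPARK_SDIST}"]


def setup_rc_env(env_dir: Path) -> None:
    """Create a fresh venv (with pip, clearing any old one) and install the RC."""
    venv.EnvBuilder(with_pip=True, clear=True).create(env_dir)
    subprocess.run(rc_install_command(env_dir), check=True)
```

Calling `setup_rc_env(Path("pyspark-rc4-env"))` and then running an existing workload with that venv's interpreter covers the "see if anything important breaks" step. Deleting the venv afterwards avoids picking up a stale RC later.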
>>> > >
>>> > > ===
>>> > > What should happen to JIRA tickets still targeting 3.3.1?
>>> > > ===
>>> > > The current list of open tickets targeted at 3.3.1 can be found at:
>>> > > https://issues.apache.org/jira/projects/SPARK and search for "Target 
>>> > > Version/s" = 3.3.1
>>> > >
>>> > > Committers should look at those and triage. Extremely important bug
>>> > > fixes, documentation, and API tweaks that impact compatibility should
>>> > > be worked on immediately. Please retarget everything else to an
>>> > > appropriate release.
>>> > >
>>> > > ==
>>> > > But my bug isn't fixed?
>>> > > ==
>>> > > In order to make timely releases, we will typically not hold the
>>> > > release unless the bug in question is a regression from the previous
>>> > > release. That being said, if there is something which is a regression
>>> > > that has not been correctly targeted, please ping me or a committer to
>>> > > help target the issue.
>>> > >
>>> > >



Re: Apache Spark 3.2.3 Release?

2022-10-18 Thread Yang,Jie(INF)
+1

From: vaquar khan
Date: Wednesday, October 19, 2022 10:08
To: "416161...@qq.com"
Cc: Yuming Wang, kazuyuki tanimura, Gengliang Wang, huaxin gao,
Dongjoon Hyun, Sean Owen, Chao Sun, dev
Subject: Re: Apache Spark 3.2.3 Release?

+1

On Tue, Oct 18, 2022, 8:58 PM 416161...@qq.com <ruife...@foxmail.com> wrote:
+1



Ruifeng Zheng
ruife...@foxmail.com




-- Original --
From: "Yuming Wang" <wgy...@gmail.com>;
Date: Wed, Oct 19, 2022 09:35 AM
To: "kazuyuki tanimura";
Cc: "Gengliang Wang" <ltn...@gmail.com>; "huaxin gao" <huaxin.ga...@gmail.com>;
"Dongjoon Hyun" <dongjoon.h...@gmail.com>; "Sean Owen" <sro...@gmail.com>;
"Chao Sun" <sunc...@apache.org>; "dev" <dev@spark.apache.org>;
Subject: Re: Apache Spark 3.2.3 Release?

+1

On Wed, Oct 19, 2022 at 4:17 AM kazuyuki tanimura  
wrote:
+1 Thanks Chao!


Kazu

On Oct 18, 2022, at 11:48 AM, Gengliang Wang <ltn...@gmail.com> wrote:

+1. Thanks Chao!

On Tue, Oct 18, 2022 at 11:45 AM huaxin gao <huaxin.ga...@gmail.com> wrote:
+1 Thanks Chao!

Huaxin

On Tue, Oct 18, 2022 at 11:29 AM Dongjoon Hyun <dongjoon.h...@gmail.com> wrote:
+1

Thank you for volunteering, Chao!

Dongjoon.


On Tue, Oct 18, 2022 at 9:55 AM Sean Owen <sro...@gmail.com> wrote:
OK by me, if someone is willing to drive it.

On Tue, Oct 18, 2022 at 11:47 AM Chao Sun <sunc...@apache.org> wrote:
Hi All,

It's been more than 3 months since 3.2.2 (tagged on Jul 11) was
released. There are now 66 patches accumulated in branch-3.2, including
2 correctness issues.

Is it a good time to start a new release? If there's no objection, I'd
like to volunteer as the release manager for the 3.2.3 release, and
start preparing the first RC next week.

# Correctness issues

SPARK-39833: Filtered Parquet data frame count() and show() produce
inconsistent results when spark.sql.parquet.filterPushdown is true
SPARK-40002: Limit improperly pushed down through window using ntile function
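For users who stay on an unpatched 3.2.x release in the meantime, the SPARK-39833 title ties the inconsistent results to `spark.sql.parquet.filterPushdown` being `true`. That suggests, but does not guarantee, that disabling pushdown sidesteps this particular issue, at the cost of reading more data. A hypothetical helper that renders such overrides as `spark-submit --conf` arguments:

```python
from __future__ import annotations


def spark_conf_args(conf: dict[str, str]) -> list[str]:
    """Render {key: value} Spark settings as spark-submit --conf arguments."""
    args: list[str] = []
    for key, value in conf.items():
        args += ["--conf", f"{key}={value}"]
    return args


# Hypothetical mitigation inferred from the SPARK-39833 title; verify it
# against the JIRA before relying on it.
PARQUET_PUSHDOWN_OFF = {"spark.sql.parquet.filterPushdown": "false"}

print(spark_conf_args(PARQUET_PUSHDOWN_OFF))
```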

Best,
Chao




Re: Apache Spark 3.2.3 Release?

2022-10-18 Thread vaquar khan
+1



Re: Apache Spark 3.2.3 Release?

2022-10-18 Thread 416161...@qq.com
+1

Ruifeng Zheng
ruife...@foxmail.com


Re: Apache Spark 3.2.3 Release?

2022-10-18 Thread Yuming Wang
+1



Re: [VOTE] Release Spark 3.3.1 (RC4)

2022-10-18 Thread Yikun Jiang
+1. Tests also passed with the spark-docker workflow (download the RC4 tgz,
extract it, build the image, run the K8s ITs).

[1] https://github.com/Yikun/spark-docker/pull/9

Regards,
Yikun



Re: [VOTE] Release Spark 3.3.1 (RC4)

2022-10-18 Thread Wenchen Fan
+1



Re: [VOTE] Release Spark 3.3.1 (RC4)

2022-10-18 Thread Chao Sun
+1. Thanks Yuming!

Chao




Re: [VOTE] Release Spark 3.3.1 (RC4)

2022-10-18 Thread Thomas graves
+1. Ran internal test suite.

Tom




Re: Apache Spark 3.2.3 Release?

2022-10-18 Thread kazuyuki tanimura
+1 Thanks Chao!


Kazu




Re: Apache Spark 3.2.3 Release?

2022-10-18 Thread Gengliang Wang
+1. Thanks Chao!



Re: [VOTE] Release Spark 3.3.1 (RC4)

2022-10-18 Thread Gengliang Wang
+1 from me, same as last time.

On Tue, Oct 18, 2022 at 11:45 AM L. C. Hsieh  wrote:

> +1
>
> Thanks Yuming!
>
> On Tue, Oct 18, 2022 at 11:28 AM Dongjoon Hyun 
> wrote:
> >
> > +1
> >
> > Thank you, Yuming and all!
> >
> > Dongjoon.
> >
> >
> > On Tue, Oct 18, 2022 at 9:22 AM Yang,Jie(INF) 
> wrote:
> >>
> >> Tested with Maven on Java 17 + Scala 2.13 and the tests passed. +1 from me.
> >>
> >>
> >>
> >> From: Sean Owen
> >> Date: Monday, October 17, 2022 21:34
> >> To: Yuming Wang
> >> Cc: dev
> >> Subject: Re: [VOTE] Release Spark 3.3.1 (RC4)
> >>
> >>
> >>
> >> +1 from me, same as last time
> >>
> >>
> >>
> >> On Sun, Oct 16, 2022 at 9:14 PM Yuming Wang  wrote:
> >>
> >> Please vote on releasing the following candidate as Apache Spark
> >> version 3.3.1.
> >>
> >> The vote is open until 11:59pm Pacific time October 21st and passes if a
> >> majority of +1 PMC votes are cast, with a minimum of 3 +1 votes.
> >>
> >> [ ] +1 Release this package as Apache Spark 3.3.1
> >> [ ] -1 Do not release this package because ...
> >>
> >> To learn more about Apache Spark, please see https://spark.apache.org
> >>
> >> The tag to be voted on is v3.3.1-rc4 (commit
> >> fbbcf9434ac070dd4ced4fb9efe32899c6db12a9):
> >> https://github.com/apache/spark/tree/v3.3.1-rc4
> >>
> >> The release files, including signatures, digests, etc. can be found at:
> >> https://dist.apache.org/repos/dist/dev/spark/v3.3.1-rc4-bin
> >>
> >> Signatures used for Spark RCs can be found in this file:
> >> https://dist.apache.org/repos/dist/dev/spark/KEYS
> >>
> >> The staging repository for this release can be found at:
> >> https://repository.apache.org/content/repositories/orgapachespark-1430
> >>
> >> The documentation corresponding to this release can be found at:
> >> https://dist.apache.org/repos/dist/dev/spark/v3.3.1-rc4-docs
> >>
> >> The list of bug fixes going into 3.3.1 can be found at the following URL:
> >> https://s.apache.org/ttgz6
> >>
> >> This release is using the release script of the tag v3.3.1-rc4.
> >>
> >>
> >> FAQ
> >>
> >> ==
> >> What happened to v3.3.1-rc3?
> >> ==
> >> A performance regression (SPARK-40703) was found after tagging
> >> v3.3.1-rc3, and the Iceberg community hoped Spark 3.3.1 would include
> >> its fix, so we skipped the vote on v3.3.1-rc3.
> >>
> >> =
> >> How can I help test this release?
> >> =
> >> If you are a Spark user, you can help us test this release by taking
> >> an existing Spark workload, running it on this release candidate, and
> >> reporting any regressions.
> >>
> >> If you're working in PySpark, you can set up a virtual env, install
> >> the current RC, and see if anything important breaks. In Java/Scala,
> >> you can add the staging repository to your project's resolvers and test
> >> with the RC (make sure to clean up the artifact cache before/after so
> >> you don't end up building with an out-of-date RC going forward).
> >>
> >> ===
> >> What should happen to JIRA tickets still targeting 3.3.1?
> >> ===
> >> The current list of open tickets targeted at 3.3.1 can be found by
> >> going to https://issues.apache.org/jira/projects/SPARK and searching
> >> for "Target Version/s" = 3.3.1
> >>
> >> Committers should look at those and triage. Extremely important bug
> >> fixes, documentation, and API tweaks that impact compatibility should
> >> be worked on immediately. Everything else should be retargeted to an
> >> appropriate release.
> >>
> >> ==
> >> But my bug isn't fixed?
> >> ==
> >> In order to make timely releases, we will typically not hold the
> >> release unless the bug in question is a regression from the previous
> >> release. That being said, if there is a regression that has not been
> >> correctly targeted, please ping me or a committer to help target the
> >> issue.
> >>
> >>
>
>
>


Re: [VOTE] Release Spark 3.3.1 (RC4)

2022-10-18 Thread L. C. Hsieh
+1

Thanks Yuming!

On Tue, Oct 18, 2022 at 11:28 AM Dongjoon Hyun  wrote:
>
> +1
>
> Thank you, Yuming and all!
>
> Dongjoon.
>
>
> On Tue, Oct 18, 2022 at 9:22 AM Yang,Jie(INF)  wrote:
>>
>> Tested with Maven on Java 17 + Scala 2.13 and the tests passed. +1 from me.
>>
>>
>>
>> From: Sean Owen
>> Date: Monday, October 17, 2022 21:34
>> To: Yuming Wang
>> Cc: dev
>> Subject: Re: [VOTE] Release Spark 3.3.1 (RC4)
>>
>>
>>
>> +1 from me, same as last time
>>
>>
>>
>> On Sun, Oct 16, 2022 at 9:14 PM Yuming Wang  wrote:
>>
>> Please vote on releasing the following candidate as Apache Spark version 
>> 3.3.1.
>>
>> The vote is open until 11:59pm Pacific time October 21st and passes if a
>> majority of +1 PMC votes are cast, with a minimum of 3 +1 votes.
>>
>> [ ] +1 Release this package as Apache Spark 3.3.1
>> [ ] -1 Do not release this package because ...
>>
>> To learn more about Apache Spark, please see https://spark.apache.org
>>
>> The tag to be voted on is v3.3.1-rc4 (commit 
>> fbbcf9434ac070dd4ced4fb9efe32899c6db12a9):
>> https://github.com/apache/spark/tree/v3.3.1-rc4
>>
>> The release files, including signatures, digests, etc. can be found at:
>> https://dist.apache.org/repos/dist/dev/spark/v3.3.1-rc4-bin
>>
>> Signatures used for Spark RCs can be found in this file:
>> https://dist.apache.org/repos/dist/dev/spark/KEYS
>>
>> The staging repository for this release can be found at:
>> https://repository.apache.org/content/repositories/orgapachespark-1430
>>
>> The documentation corresponding to this release can be found at:
>> https://dist.apache.org/repos/dist/dev/spark/v3.3.1-rc4-docs
>>
>> The list of bug fixes going into 3.3.1 can be found at the following URL:
>> https://s.apache.org/ttgz6
>>
>> This release is using the release script of the tag v3.3.1-rc4.
>>
>>
>> FAQ
>>
>> ==
>> What happened to v3.3.1-rc3?
>> ==
>> A performance regression (SPARK-40703) was found after tagging v3.3.1-rc3,
>> and the Iceberg community hoped Spark 3.3.1 would include its fix, so we
>> skipped the vote on v3.3.1-rc3.
>>
>> =
>> How can I help test this release?
>> =
>> If you are a Spark user, you can help us test this release by taking
>> an existing Spark workload, running it on this release candidate, and
>> reporting any regressions.
>>
>> If you're working in PySpark, you can set up a virtual env, install
>> the current RC, and see if anything important breaks. In Java/Scala,
>> you can add the staging repository to your project's resolvers and test
>> with the RC (make sure to clean up the artifact cache before/after so
>> you don't end up building with an out-of-date RC going forward).
>>
>> ===
>> What should happen to JIRA tickets still targeting 3.3.1?
>> ===
>> The current list of open tickets targeted at 3.3.1 can be found by
>> going to https://issues.apache.org/jira/projects/SPARK and searching
>> for "Target Version/s" = 3.3.1
>>
>> Committers should look at those and triage. Extremely important bug
>> fixes, documentation, and API tweaks that impact compatibility should
>> be worked on immediately. Everything else should be retargeted to an
>> appropriate release.
>>
>> ==
>> But my bug isn't fixed?
>> ==
>> In order to make timely releases, we will typically not hold the
>> release unless the bug in question is a regression from the previous
>> release. That being said, if there is a regression that has not been
>> correctly targeted, please ping me or a committer to help target the
>> issue.
>>
>>




Re: Apache Spark 3.2.3 Release?

2022-10-18 Thread huaxin gao
+1 Thanks Chao!

Huaxin

On Tue, Oct 18, 2022 at 11:29 AM Dongjoon Hyun 
wrote:

> +1
>
> Thank you for volunteering, Chao!
>
> Dongjoon.
>
>
> On Tue, Oct 18, 2022 at 9:55 AM Sean Owen  wrote:
>
>> OK by me, if someone is willing to drive it.
>>
>> On Tue, Oct 18, 2022 at 11:47 AM Chao Sun  wrote:
>>
>>> Hi All,
>>>
>>> It's been more than 3 months since 3.2.2 (tagged on Jul 11) was
>>> released. There are now 66 patches accumulated in branch-3.2, including
>>> 2 correctness issues.
>>>
>>> Is it a good time to start a new release? If there's no objection, I'd
>>> like to volunteer as the release manager for the 3.2.3 release, and
>>> start preparing the first RC next week.
>>>
>>> # Correctness issues
>>>
>>> SPARK-39833: Filtered parquet data frame count() and show() produce
>>> inconsistent results when spark.sql.parquet.filterPushdown is true
>>> SPARK-40002: Limit improperly pushed down through window using ntile
>>> function
>>>
>>> Best,
>>> Chao
>>>
>>>
>>>


Re: Apache Spark 3.2.3 Release?

2022-10-18 Thread L. C. Hsieh
+1

Thanks Chao!

On Tue, Oct 18, 2022 at 11:30 AM Dongjoon Hyun  wrote:
>
> +1
>
> Thank you for volunteering, Chao!
>
> Dongjoon.
>
>
> On Tue, Oct 18, 2022 at 9:55 AM Sean Owen  wrote:
>>
>> OK by me, if someone is willing to drive it.
>>
>> On Tue, Oct 18, 2022 at 11:47 AM Chao Sun  wrote:
>>>
>>> Hi All,
>>>
>>> It's been more than 3 months since 3.2.2 (tagged on Jul 11) was
>>> released. There are now 66 patches accumulated in branch-3.2, including
>>> 2 correctness issues.
>>>
>>> Is it a good time to start a new release? If there's no objection, I'd
>>> like to volunteer as the release manager for the 3.2.3 release, and
>>> start preparing the first RC next week.
>>>
>>> # Correctness issues
>>>
>>> SPARK-39833: Filtered parquet data frame count() and show() produce
>>> inconsistent results when spark.sql.parquet.filterPushdown is true
>>> SPARK-40002: Limit improperly pushed down through window using ntile
>>> function
>>>
>>> Best,
>>> Chao
>>>
>>>




Re: Apache Spark 3.2.3 Release?

2022-10-18 Thread Dongjoon Hyun
+1

Thank you for volunteering, Chao!

Dongjoon.


On Tue, Oct 18, 2022 at 9:55 AM Sean Owen  wrote:

> OK by me, if someone is willing to drive it.
>
> On Tue, Oct 18, 2022 at 11:47 AM Chao Sun  wrote:
>
>> Hi All,
>>
>> It's been more than 3 months since 3.2.2 (tagged on Jul 11) was
>> released. There are now 66 patches accumulated in branch-3.2, including
>> 2 correctness issues.
>>
>> Is it a good time to start a new release? If there's no objection, I'd
>> like to volunteer as the release manager for the 3.2.3 release, and
>> start preparing the first RC next week.
>>
>> # Correctness issues
>>
>> SPARK-39833: Filtered parquet data frame count() and show() produce
>> inconsistent results when spark.sql.parquet.filterPushdown is true
>> SPARK-40002: Limit improperly pushed down through window using ntile
>> function
>>
>> Best,
>> Chao
>>
>>
>>


Re: [VOTE] Release Spark 3.3.1 (RC4)

2022-10-18 Thread Dongjoon Hyun
+1

Thank you, Yuming and all!

Dongjoon.


On Tue, Oct 18, 2022 at 9:22 AM Yang,Jie(INF)  wrote:

> Tested with Maven on Java 17 + Scala 2.13 and the tests passed. +1 from me.
>
>
>
> From: Sean Owen
> Date: Monday, October 17, 2022 21:34
> To: Yuming Wang
> Cc: dev
> Subject: Re: [VOTE] Release Spark 3.3.1 (RC4)
>
>
>
> +1 from me, same as last time
>
>
>
> On Sun, Oct 16, 2022 at 9:14 PM Yuming Wang  wrote:
>
> Please vote on releasing the following candidate as Apache Spark version 
> 3.3.1.
>
> The vote is open until 11:59pm Pacific time October 21st and passes if a
> majority of +1 PMC votes are cast, with a minimum of 3 +1 votes.
>
> [ ] +1 Release this package as Apache Spark 3.3.1
> [ ] -1 Do not release this package because ...
>
> To learn more about Apache Spark, please see https://spark.apache.org
>
> The tag to be voted on is v3.3.1-rc4 (commit
> fbbcf9434ac070dd4ced4fb9efe32899c6db12a9):
> https://github.com/apache/spark/tree/v3.3.1-rc4
>
> The release files, including signatures, digests, etc. can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v3.3.1-rc4-bin
>
> Signatures used for Spark RCs can be found in this file:
> https://dist.apache.org/repos/dist/dev/spark/KEYS
>
> The staging repository for this release can be found at:
> https://repository.apache.org/content/repositories/orgapachespark-1430
>
> The documentation corresponding to this release can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v3.3.1-rc4-docs
>
> The list of bug fixes going into 3.3.1 can be found at the following URL:
> https://s.apache.org/ttgz6
>
> This release is using the release script of the tag v3.3.1-rc4.
>
>
> FAQ
>
> ==
> What happened to v3.3.1-rc3?
> ==
> A performance regression (SPARK-40703) was found after tagging v3.3.1-rc3,
> and the Iceberg community hoped Spark 3.3.1 would include its fix, so we
> skipped the vote on v3.3.1-rc3.
>
> =
> How can I help test this release?
> =
> If you are a Spark user, you can help us test this release by taking
> an existing Spark workload, running it on this release candidate, and
> reporting any regressions.
>
> If you're working in PySpark, you can set up a virtual env, install
> the current RC, and see if anything important breaks. In Java/Scala,
> you can add the staging repository to your project's resolvers and test
> with the RC (make sure to clean up the artifact cache before/after so
> you don't end up building with an out-of-date RC going forward).
>
> ===
> What should happen to JIRA tickets still targeting 3.3.1?
> ===
> The current list of open tickets targeted at 3.3.1 can be found by
> going to https://issues.apache.org/jira/projects/SPARK and searching
> for "Target Version/s" = 3.3.1
>
> Committers should look at those and triage. Extremely important bug
> fixes, documentation, and API tweaks that impact compatibility should
> be worked on immediately. Everything else should be retargeted to an
> appropriate release.
>
> ==
> But my bug isn't fixed?
> ==
> In order to make timely releases, we will typically not hold the
> release unless the bug in question is a regression from the previous
> release. That being said, if there is something which is a regression
> that has not been correctly targeted please ping me or a committer to
> help target the issue.
>
>
>
>


Missing data in spark output

2022-10-18 Thread Sandeep Vinayak
Hello Everyone,

We have recently been observing intermittent data loss in Spark jobs that
write output to GCS (Google Cloud Storage). When rows are missing, they are
accompanied by duplicate rows, and a re-run of the same job produces no
duplicate or missing rows. Since this is hard to debug, we are first trying
to understand the potential theoretical root causes. Could this be a
GCS-specific issue, for example GCS not handling consistency well? Any tips
would be super helpful.

Thanks,
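Before chasing a root cause, it may help to pin down the symptom precisely: which keys went missing and which were written more than once. The helper below is a minimal sketch, assuming each output row carries a unique key (for example a primary-key column) and that the expected and written keys can be collected into memory. The name `diff_row_keys` is hypothetical; it is not part of Spark or the GCS connector.

```python
from collections import Counter
from typing import Dict, Hashable, Iterable, Set, Tuple


def diff_row_keys(
    expected: Iterable[Hashable], written: Iterable[Hashable]
) -> Tuple[Set[Hashable], Dict[Hashable, int]]:
    """Hypothetical helper: compare the keys a job should have written with
    the keys actually found in its output.

    Returns (missing, duplicated):
      missing    - keys expected but absent from the output
      duplicated - keys written more times than expected, mapped to the
                   number of times they were actually written
    """
    expected_counts = Counter(expected)
    written_counts = Counter(written)
    missing = {k for k in expected_counts if k not in written_counts}
    duplicated = {
        k: n for k, n in written_counts.items() if n > expected_counts.get(k, 0)
    }
    return missing, duplicated


if __name__ == "__main__":
    # Row 3 was lost and row 2 was written twice: the pattern described above.
    print(diff_row_keys([1, 2, 3, 4], [1, 2, 2, 4]))  # ({3}, {2: 2})
```

In a real job the two key lists would come from the input and from re-reading the written output; for large outputs the same comparison is better expressed as a join inside Spark than as an in-memory collect.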


Re: Apache Spark 3.2.3 Release?

2022-10-18 Thread Sean Owen
OK by me, if someone is willing to drive it.

On Tue, Oct 18, 2022 at 11:47 AM Chao Sun  wrote:

> Hi All,
>
> It's been more than 3 months since 3.2.2 (tagged on Jul 11) was
> released. There are now 66 patches accumulated in branch-3.2, including
> 2 correctness issues.
>
> Is it a good time to start a new release? If there's no objection, I'd
> like to volunteer as the release manager for the 3.2.3 release, and
> start preparing the first RC next week.
>
> # Correctness issues
>
> SPARK-39833: Filtered parquet data frame count() and show() produce
> inconsistent results when spark.sql.parquet.filterPushdown is true
> SPARK-40002: Limit improperly pushed down through window using ntile
> function
>
> Best,
> Chao
>
>
>


Apache Spark 3.2.3 Release?

2022-10-18 Thread Chao Sun
Hi All,

It's been more than 3 months since 3.2.2 (tagged on Jul 11) was
released. There are now 66 patches accumulated in branch-3.2, including
2 correctness issues.

Is it a good time to start a new release? If there's no objection, I'd
like to volunteer as the release manager for the 3.2.3 release, and
start preparing the first RC next week.

# Correctness issues

SPARK-39833: Filtered parquet data frame count() and show() produce
inconsistent results when spark.sql.parquet.filterPushdown is true
SPARK-40002: Limit improperly pushed down through window using ntile function

Best,
Chao




Re: [VOTE] Release Spark 3.3.1 (RC4)

2022-10-18 Thread Yang,Jie(INF)
Tested with Maven on Java 17 + Scala 2.13 and the tests passed. +1 from me.

From: Sean Owen
Date: Monday, October 17, 2022 21:34
To: Yuming Wang
Cc: dev
Subject: Re: [VOTE] Release Spark 3.3.1 (RC4)

+1 from me, same as last time

On Sun, Oct 16, 2022 at 9:14 PM Yuming Wang wrote:

Please vote on releasing the following candidate as Apache Spark version 3.3.1.

The vote is open until 11:59pm Pacific time October 21st and passes if a
majority of +1 PMC votes are cast, with a minimum of 3 +1 votes.

[ ] +1 Release this package as Apache Spark 3.3.1
[ ] -1 Do not release this package because ...
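The pass condition above can be made concrete with a small sketch. This is a hypothetical helper, not part of any Apache tooling, and it assumes the usual ASF reading of the rule for release votes: only binding (PMC) votes are counted, each vote is +1, 0 or -1, and the vote passes when there are at least three binding +1 votes and more binding +1 than -1 votes.

```python
from collections import Counter
from typing import Iterable


def release_vote_passes(binding_votes: Iterable[int], minimum_plus_ones: int = 3) -> bool:
    """Hypothetical helper: decide whether a release vote passes.

    Assumes `binding_votes` holds only PMC (binding) votes, each +1, 0 or -1;
    non-binding votes from other community members are not passed in.
    """
    tally = Counter(binding_votes)
    plus_ones = tally[1]
    minus_ones = tally[-1]
    # Needs the minimum number of +1s AND more +1s than -1s (0 votes are neutral).
    return plus_ones >= minimum_plus_ones and plus_ones > minus_ones


if __name__ == "__main__":
    print(release_vote_passes([1, 1, 1]))               # True: three binding +1s
    print(release_vote_passes([1, 1, 0]))               # False: only two binding +1s
    print(release_vote_passes([1, 1, 1, -1, -1, -1]))   # False: +1s do not outnumber -1s
```

Counting with `Counter` keeps abstentions (0) out of both totals, which matches the idea that a 0 vote neither helps nor blocks a release.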

To learn more about Apache Spark, please see 
https://spark.apache.org

The tag to be voted on is v3.3.1-rc4 (commit 
fbbcf9434ac070dd4ced4fb9efe32899c6db12a9):
https://github.com/apache/spark/tree/v3.3.1-rc4

The release files, including signatures, digests, etc. can be found at:
https://dist.apache.org/repos/dist/dev/spark/v3.3.1-rc4-bin

Signatures used for Spark RCs can be found in this file:
https://dist.apache.org/repos/dist/dev/spark/KEYS

The staging repository for this release can be found at:
https://repository.apache.org/content/repositories/orgapachespark-1430

The documentation corresponding to this release can be found at:
https://dist.apache.org/repos/dist/dev/spark/v3.3.1-rc4-docs

The list of bug fixes going into 3.3.1 can be found at the following URL:
https://s.apache.org/ttgz6

This release is using the release script of the tag v3.3.1-rc4.


FAQ

==
What happened to v3.3.1-rc3?
==
A performance regression (SPARK-40703) was found after tagging v3.3.1-rc3, and
the Iceberg community hoped Spark 3.3.1 would include its fix, so we skipped
the vote on v3.3.1-rc3.

=
How can I help test this release?
=
If you are a Spark user, you can help us test this release by taking
an existing Spark workload, running it on this release candidate, and
reporting any regressions.

If you're working in PySpark, you can set up a virtual env, install
the current RC, and see if anything important breaks. In Java/Scala,
you can add the staging repository to your project's resolvers and test
with the RC (make sure to clean up the artifact cache before/after so
you don't end up building with an out-of-date RC going forward).

===
What should happen to JIRA tickets still targeting 3.3.1?
===
The current list of open tickets targeted at 3.3.1 can be found by
going to https://issues.apache.org/jira/projects/SPARK and searching
for "Target Version/s" = 3.3.1

Committers should look at those and triage. Extremely important bug
fixes, documentation, and API tweaks that impact compatibility should
be worked on immediately. Everything else should be retargeted to an
appropriate release.

==
But my bug isn't fixed?
==
In order to make timely releases, we will typically not hold the
release unless the bug in question is a regression from the previous
release. That being said, if there is a regression that has not been
correctly targeted, please ping me or a committer to help target the
issue.