Re: [ANNOUNCE] Apache Spark 3.5.1 released

2024-02-29 Thread Peter Toth
Congratulations and thanks Jungtaek for driving this!

Xinrong Meng  wrote (on Fri, Mar 1, 2024, 5:24):

> Congratulations!
>
> Thanks,
> Xinrong
>
> On Thu, Feb 29, 2024 at 11:16 AM Dongjoon Hyun 
> wrote:
>
>> Congratulations!
>>
>> Bests,
>> Dongjoon.
>>
>> On Wed, Feb 28, 2024 at 11:43 AM beliefer  wrote:
>>
>>> Congratulations!
>>>
>>>
>>>
>>> At 2024-02-28 17:43:25, "Jungtaek Lim" 
>>> wrote:
>>>
>>> Hi everyone,
>>>
>>> We are happy to announce the availability of Spark 3.5.1!
>>>
>>> Spark 3.5.1 is a maintenance release containing stability fixes. This
>>> release is based on the branch-3.5 maintenance branch of Spark. We
>>> strongly
>>> recommend all 3.5 users to upgrade to this stable release.
>>>
>>> To download Spark 3.5.1, head over to the download page:
>>> https://spark.apache.org/downloads.html
>>>
>>> To view the release notes:
>>> https://spark.apache.org/releases/spark-release-3-5-1.html
>>>
>>> We would like to acknowledge all community members for contributing to
>>> this
>>> release. This release would not have been possible without you.
>>>
>>> Jungtaek Lim
>>>
>>> ps. Yikun is helping us through releasing the official docker image for
>>> Spark 3.5.1 (Thanks Yikun!) It may take some time to be generally available.
>>>
>>>


Re: [ANNOUNCE] Apache Spark 3.5.1 released

2024-02-29 Thread Jungtaek Lim
Thanks for reporting - this is odd - the dropdown did not exist in other
recent releases.

https://spark.apache.org/docs/3.5.0/api/python/index.html
https://spark.apache.org/docs/3.4.2/api/python/index.html
https://spark.apache.org/docs/3.3.4/api/python/index.html

It looks like the dropdown feature was introduced recently but only partially
completed. The dropdown itself was added, but the step for bumping the version
was never documented. The contributor proposed a way to update the version
"automatically", but that PR wasn't merged. As a result, we have neither an
instruction for bumping the version manually nor an automatic bump.

* PR for addition of dropdown: https://github.com/apache/spark/pull/42428
* PR for automatically bumping version:
https://github.com/apache/spark/pull/42881

We will probably need to add a step to the release process instructions to
update the version. (I don't have a good idea for automatic bumping.)
I'll look into it. Please expect some delay due to the holiday weekend
in South Korea.
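
For illustration only, here is a rough sketch of what a manual or scripted bump
could look like, assuming the dropdown reads its entries from a static versions
JSON file; the file name and fields below are assumptions, not the actual layout
touched by the PRs above:

import json

def bump_docs_version(path: str, new_version: str) -> None:
    # Prepend the newly released version to a hypothetical switcher file.
    with open(path) as f:
        entries = json.load(f)
    if all(e.get("version") != new_version for e in entries):
        entries.insert(0, {
            "name": new_version,
            "version": new_version,
            "url": f"https://spark.apache.org/docs/{new_version}/api/python/",
        })
    with open(path, "w") as f:
        json.dump(entries, f, indent=2)

bump_docs_version("versions.json", "3.5.1")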

Thanks again.
Jungtaek Lim (HeartSaVioR)


On Fri, Mar 1, 2024 at 2:14 PM Dongjoon Hyun 
wrote:

> BTW, Jungtaek.
>
> The PySpark documentation seems to show the wrong branch; at the moment, it shows `master`.
>
> https://spark.apache.org/docs/3.5.1/api/python/index.html
>
> PySpark Overview
> 
>
>Date: Feb 24, 2024 Version: master
>
> [image: Screenshot 2024-02-29 at 21.12.24.png]
>
>
> Could you do the follow-up, please?
>
> Thank you in advance.
>
> Dongjoon.
>
>
> On Thu, Feb 29, 2024 at 2:48 PM John Zhuge  wrote:
>
>> Excellent work, congratulations!
>>
>> On Wed, Feb 28, 2024 at 10:12 PM Dongjoon Hyun 
>> wrote:
>>
>>> Congratulations!
>>>
>>> Bests,
>>> Dongjoon.
>>>
>>> On Wed, Feb 28, 2024 at 11:43 AM beliefer  wrote:
>>>
 Congratulations!



 At 2024-02-28 17:43:25, "Jungtaek Lim" 
 wrote:

 Hi everyone,

 We are happy to announce the availability of Spark 3.5.1!

 Spark 3.5.1 is a maintenance release containing stability fixes. This
 release is based on the branch-3.5 maintenance branch of Spark. We
 strongly
 recommend all 3.5 users to upgrade to this stable release.

 To download Spark 3.5.1, head over to the download page:
 https://spark.apache.org/downloads.html

 To view the release notes:
 https://spark.apache.org/releases/spark-release-3-5-1.html

 We would like to acknowledge all community members for contributing to
 this
 release. This release would not have been possible without you.

 Jungtaek Lim

 ps. Yikun is helping us through releasing the official docker image for
 Spark 3.5.1 (Thanks Yikun!) It may take some time to be generally 
 available.


>>
>> --
>> John Zhuge
>>
>


Re: [ANNOUNCE] Apache Spark 3.5.1 released

2024-02-29 Thread Dongjoon Hyun
BTW, Jungtaek.

The PySpark documentation seems to show the wrong branch; at the moment, it shows `master`.

https://spark.apache.org/docs/3.5.1/api/python/index.html

PySpark Overview


   Date: Feb 24, 2024 Version: master

[image: Screenshot 2024-02-29 at 21.12.24.png]


Could you do the follow-up, please?

Thank you in advance.

Dongjoon.


On Thu, Feb 29, 2024 at 2:48 PM John Zhuge  wrote:

> Excellent work, congratulations!
>
> On Wed, Feb 28, 2024 at 10:12 PM Dongjoon Hyun 
> wrote:
>
>> Congratulations!
>>
>> Bests,
>> Dongjoon.
>>
>> On Wed, Feb 28, 2024 at 11:43 AM beliefer  wrote:
>>
>>> Congratulations!
>>>
>>>
>>>
>>> At 2024-02-28 17:43:25, "Jungtaek Lim" 
>>> wrote:
>>>
>>> Hi everyone,
>>>
>>> We are happy to announce the availability of Spark 3.5.1!
>>>
>>> Spark 3.5.1 is a maintenance release containing stability fixes. This
>>> release is based on the branch-3.5 maintenance branch of Spark. We
>>> strongly
>>> recommend all 3.5 users to upgrade to this stable release.
>>>
>>> To download Spark 3.5.1, head over to the download page:
>>> https://spark.apache.org/downloads.html
>>>
>>> To view the release notes:
>>> https://spark.apache.org/releases/spark-release-3-5-1.html
>>>
>>> We would like to acknowledge all community members for contributing to
>>> this
>>> release. This release would not have been possible without you.
>>>
>>> Jungtaek Lim
>>>
>>> ps. Yikun is helping us through releasing the official docker image for
>>> Spark 3.5.1 (Thanks Yikun!) It may take some time to be generally available.
>>>
>>>
>
> --
> John Zhuge
>


[DISCUSS] SPIP: Structured Spark Logging

2024-02-29 Thread Gengliang Wang
Hi All,

I propose to enhance our logging system by transitioning to structured
logs. This initiative is designed to tackle the challenges of analyzing
distributed logs from drivers, workers, and executors by allowing them to
be queried using a fixed schema. The goal is to improve the informativeness
and accessibility of logs, making it significantly easier to diagnose
issues.

Key benefits include:

   - Clarity and queryability of distributed log files.
   - Continued support for log4j, allowing users to switch back to
   traditional text logging if preferred.

The improvement will simplify debugging and enhance productivity without
disrupting existing logging practices. The implementation is estimated to
take around 3 months.
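
For a rough sense of the kind of workflow this would enable, here is a minimal
PySpark sketch that reads JSON-lines logs and queries them with a fixed schema;
the path and field names are illustrative assumptions, not the schema proposed
in the SPIP:

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("log-triage").getOrCreate()

# Assumes each log record was emitted as one JSON object per line.
logs = spark.read.json("hdfs:///logs/myapp/*.json")

# Example query: which executors and loggers produce the most ERROR records?
(logs
    .where(F.col("level") == "ERROR")
    .groupBy("executor_id", "logger")
    .count()
    .orderBy(F.desc("count"))
    .show(truncate=False))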

*SPIP*:
https://docs.google.com/document/d/1rATVGmFLNVLmtxSpWrEceYm7d-ocgu8ofhryVs4g3XU/edit?usp=sharing
*JIRA*: SPARK-47240 

Your comments and feedback would be greatly appreciated.


Re: When Spark job shows FetchFailedException it creates few duplicate data and we see few data also missing , please explain why

2024-02-29 Thread Dongjoon Hyun
Please use the URL as the full string, including the '()' part.

Or you can search directly in ASF Jira with the 'Spark' project and three
labels: 'Correctness', 'correctness', and 'data-loss'.
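
(For reference, the query behind that filter, decoded from the URL above, is:

  project = SPARK AND fixVersion in (3.2.1, 3.2.2, 3.2.3, 3.2.4)
  AND labels in (Correctness, correctness, data-loss)

Pasting that into the Jira issue search should give the same list.)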

Dongjoon

On Thu, Feb 29, 2024 at 11:54 Prem Sahoo  wrote:

> Hello Dongjoon,
> Thanks for emailing me.
> Could you please share a list of fixes  as the link provided by you is
> not working.
>
> On Thu, Feb 29, 2024 at 11:27 AM Dongjoon Hyun 
> wrote:
>
>> Hi,
>>
>> If you are observing correctness issues, you may hit some old (and fixed)
>> correctness issues.
>>
>> For example, from Apache Spark 3.2.1 to 3.2.4, we fixed 31 correctness
>> issues.
>>
>>
>> https://issues.apache.org/jira/issues/?filter=12345390=project%20%3D%20SPARK%20AND%20fixVersion%20in%20(3.2.1%2C%203.2.2%2C%203.2.3%2C%203.2.4)%20AND%20labels%20in%20(Correctness%2C%20correctness%2C%20data-loss)
>>
>> There are more fixes in 3.3 and 3.4 and 3.5, too.
>>
>> Please use the latest version, Apache Spark 3.5.1, because Apache Spark
>> 3.2 and 3.3 are in the End-Of-Support status of the community.
>>
>> It would be help if you can report any correctness issues with Apache
>> Spark 3.5.1.
>>
>> Thanks,
>> Dongjoon.
>>
>> On 2024/02/29 15:04:41 Prem Sahoo wrote:
>> > When Spark job shows FetchFailedException it creates few duplicate data
>> and
>> > we see few data also missing , please explain why. We have scenario when
>> > spark job complains FetchFailedException as one of the data node got
>> > rebooted middle of job running .
>> >
>> > Now due to this we have few duplicate data and few missing data . Why
>> spark
>> > is not handling this scenario correctly ? kind of we shouldn't miss any
>> > data and we shouldn't create duplicate data .
>> >
>> >
>> >
>> > I am using spark3.2.0 version.
>> >
>>
>> -
>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>>
>>


Re: [ANNOUNCE] Apache Spark 3.5.1 released

2024-02-29 Thread John Zhuge
Excellent work, congratulations!

On Wed, Feb 28, 2024 at 10:12 PM Dongjoon Hyun 
wrote:

> Congratulations!
>
> Bests,
> Dongjoon.
>
> On Wed, Feb 28, 2024 at 11:43 AM beliefer  wrote:
>
>> Congratulations!
>>
>>
>>
>> At 2024-02-28 17:43:25, "Jungtaek Lim" 
>> wrote:
>>
>> Hi everyone,
>>
>> We are happy to announce the availability of Spark 3.5.1!
>>
>> Spark 3.5.1 is a maintenance release containing stability fixes. This
>> release is based on the branch-3.5 maintenance branch of Spark. We
>> strongly
>> recommend all 3.5 users to upgrade to this stable release.
>>
>> To download Spark 3.5.1, head over to the download page:
>> https://spark.apache.org/downloads.html
>>
>> To view the release notes:
>> https://spark.apache.org/releases/spark-release-3-5-1.html
>>
>> We would like to acknowledge all community members for contributing to
>> this
>> release. This release would not have been possible without you.
>>
>> Jungtaek Lim
>>
>> ps. Yikun is helping us through releasing the official docker image for
>> Spark 3.5.1 (Thanks Yikun!) It may take some time to be generally available.
>>
>>

-- 
John Zhuge


Re: [ANNOUNCE] Apache Spark 3.5.1 released

2024-02-29 Thread Prem Sahoo
Congratulations Sent from my iPhoneOn Feb 29, 2024, at 4:54 PM, Xinrong Meng  wrote:Congratulations!Thanks,XinrongOn Thu, Feb 29, 2024 at 11:16 AM Dongjoon Hyun  wrote:Congratulations!Bests,Dongjoon.On Wed, Feb 28, 2024 at 11:43 AM beliefer  wrote:Congratulations!At 2024-02-28 17:43:25, "Jungtaek Lim"  wrote:Hi everyone,We are happy to announce the availability of Spark 3.5.1!Spark 3.5.1 is a maintenance release containing stability fixes. Thisrelease is based on the branch-3.5 maintenance branch of Spark. We stronglyrecommend all 3.5 users to upgrade to this stable release.To download Spark 3.5.1, head over to the download page:https://spark.apache.org/downloads.htmlTo view the release notes:https://spark.apache.org/releases/spark-release-3-5-1.htmlWe would like to acknowledge all community members for contributing to thisrelease. This release would not have been possible without you.Jungtaek Limps. Yikun is helping us through releasing the official docker image for Spark 3.5.1 (Thanks Yikun!) It may take some time to be generally available.




Re: When Spark job shows FetchFailedException it creates few duplicate data and we see few data also missing , please explain why

2024-02-29 Thread Prem Sahoo
Hello Dongjoon,
Thanks for emailing me.
Could you please share a list of fixes, as the link you provided is
not working?

On Thu, Feb 29, 2024 at 11:27 AM Dongjoon Hyun  wrote:

> Hi,
>
> If you are observing correctness issues, you may hit some old (and fixed)
> correctness issues.
>
> For example, from Apache Spark 3.2.1 to 3.2.4, we fixed 31 correctness
> issues.
>
>
> https://issues.apache.org/jira/issues/?filter=12345390=project%20%3D%20SPARK%20AND%20fixVersion%20in%20(3.2.1%2C%203.2.2%2C%203.2.3%2C%203.2.4)%20AND%20labels%20in%20(Correctness%2C%20correctness%2C%20data-loss)
>
> There are more fixes in 3.3 and 3.4 and 3.5, too.
>
> Please use the latest version, Apache Spark 3.5.1, because Apache Spark
> 3.2 and 3.3 are in the End-Of-Support status of the community.
>
> It would be help if you can report any correctness issues with Apache
> Spark 3.5.1.
>
> Thanks,
> Dongjoon.
>
> On 2024/02/29 15:04:41 Prem Sahoo wrote:
> > When Spark job shows FetchFailedException it creates few duplicate data
> and
> > we see few data also missing , please explain why. We have scenario when
> > spark job complains FetchFailedException as one of the data node got
> > rebooted middle of job running .
> >
> > Now due to this we have few duplicate data and few missing data . Why
> spark
> > is not handling this scenario correctly ? kind of we shouldn't miss any
> > data and we shouldn't create duplicate data .
> >
> >
> >
> > I am using spark3.2.0 version.
> >
>
> -
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>
>


Re: [ANNOUNCE] Apache Spark 3.5.1 released

2024-02-29 Thread Xinrong Meng
Congratulations!

Thanks,
Xinrong

On Thu, Feb 29, 2024 at 11:16 AM Dongjoon Hyun 
wrote:

> Congratulations!
>
> Bests,
> Dongjoon.
>
> On Wed, Feb 28, 2024 at 11:43 AM beliefer  wrote:
>
>> Congratulations!
>>
>>
>>
>> At 2024-02-28 17:43:25, "Jungtaek Lim" 
>> wrote:
>>
>> Hi everyone,
>>
>> We are happy to announce the availability of Spark 3.5.1!
>>
>> Spark 3.5.1 is a maintenance release containing stability fixes. This
>> release is based on the branch-3.5 maintenance branch of Spark. We
>> strongly
>> recommend all 3.5 users to upgrade to this stable release.
>>
>> To download Spark 3.5.1, head over to the download page:
>> https://spark.apache.org/downloads.html
>>
>> To view the release notes:
>> https://spark.apache.org/releases/spark-release-3-5-1.html
>>
>> We would like to acknowledge all community members for contributing to
>> this
>> release. This release would not have been possible without you.
>>
>> Jungtaek Lim
>>
>> ps. Yikun is helping us through releasing the official docker image for
>> Spark 3.5.1 (Thanks Yikun!) It may take some time to be generally available.
>>
>>


Re: When Spark job shows FetchFailedException it creates few duplicate data and we see few data also missing , please explain why

2024-02-29 Thread Dongjoon Hyun
Hi,

If you are observing correctness issues, you may be hitting some old (and
already fixed) correctness issues.

For example, from Apache Spark 3.2.1 to 3.2.4, we fixed 31 correctness issues.

https://issues.apache.org/jira/issues/?filter=12345390=project%20%3D%20SPARK%20AND%20fixVersion%20in%20(3.2.1%2C%203.2.2%2C%203.2.3%2C%203.2.4)%20AND%20labels%20in%20(Correctness%2C%20correctness%2C%20data-loss)

There are more fixes in 3.3 and 3.4 and 3.5, too.

Please use the latest version, Apache Spark 3.5.1, because Apache Spark 3.2 and
3.3 have reached end-of-support status in the community.

It would be helpful if you could report any correctness issues with Apache Spark
3.5.1.

Thanks,
Dongjoon.

On 2024/02/29 15:04:41 Prem Sahoo wrote:
> When Spark job shows FetchFailedException it creates few duplicate data and
> we see few data also missing , please explain why. We have scenario when
> spark job complains FetchFailedException as one of the data node got
> rebooted middle of job running .
> 
> Now due to this we have few duplicate data and few missing data . Why spark
> is not handling this scenario correctly ? kind of we shouldn't miss any
> data and we shouldn't create duplicate data .
> 
> 
> 
> I am using spark3.2.0 version.
> 

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



When Spark job shows FetchFailedException it creates few duplicate data and we see few data also missing , please explain why

2024-02-29 Thread Prem Sahoo
When a Spark job hits a FetchFailedException, it produces some duplicate data
and we also see some data missing; please explain why. We have a scenario where
the Spark job reports a FetchFailedException because one of the data nodes got
rebooted in the middle of the job.

Now, because of this, we have some duplicate data and some missing data. Why is
Spark not handling this scenario correctly? We shouldn't lose any data, and we
shouldn't create duplicate data.

I am using Spark version 3.2.0.