Re: [VOTE] Release Apache Spark 3.5.1 (RC2)

2024-02-15 Thread Sean Owen
Is anyone else seeing this Spark Connect test failure? Then again, I have some
weird issue with this environment that always fails one or two tests that
nobody else can replicate.

- Test observe *** FAILED ***
  == FAIL: Plans do not match ===
  !CollectMetrics my_metric, [min(id#0) AS min_val#0, max(id#0) AS max_val#0, sum(id#0) AS sum(id)#0L], 0
   CollectMetrics my_metric, [min(id#0) AS min_val#0, max(id#0) AS max_val#0, sum(id#0) AS sum(id)#0L], 44
    +- LocalRelation , [id#0, name#0]
    +- LocalRelation , [id#0, name#0]
  (PlanTest.scala:179)

On Thu, Feb 15, 2024 at 1:34 PM Jungtaek Lim 
wrote:

> DISCLAIMER: The RC for Apache Spark 3.5.1 starts with RC2, as I belatedly
> found a doc generation issue after tagging RC1.
>
> Please vote on releasing the following candidate as Apache Spark version
> 3.5.1.
>
> The vote is open until February 18th 9AM (PST) and passes if a majority of
> +1 PMC votes are cast, with a minimum of 3 +1 votes.
>
> [ ] +1 Release this package as Apache Spark 3.5.1
> [ ] -1 Do not release this package because ...
>
> To learn more about Apache Spark, please see https://spark.apache.org/
>
> The tag to be voted on is v3.5.1-rc2 (commit
> fd86f85e181fc2dc0f50a096855acf83a6cc5d9c):
> https://github.com/apache/spark/tree/v3.5.1-rc2
>
> The release files, including signatures, digests, etc. can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v3.5.1-rc2-bin/
>
> Signatures used for Spark RCs can be found in this file:
> https://dist.apache.org/repos/dist/dev/spark/KEYS
>
> The staging repository for this release can be found at:
> https://repository.apache.org/content/repositories/orgapachespark-1452/
>
> The documentation corresponding to this release can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v3.5.1-rc2-docs/
>
> The list of bug fixes going into 3.5.1 can be found at the following URL:
> https://issues.apache.org/jira/projects/SPARK/versions/12353495
>


Re: Re: [DISCUSS] Release Spark 3.5.1?

2024-02-15 Thread Jungtaek Lim
UPDATE: The vote thread is up now.
https://lists.apache.org/thread/f28h0brncmkoyv5mtsqtxx38hx309c2j


On Tue, Feb 6, 2024 at 11:30 AM Jungtaek Lim 
wrote:

> Thanks all for the positive feedback! Will figure out time to go through
> the RC process. Stay tuned!
>
> On Mon, Feb 5, 2024 at 7:46 AM Gengliang Wang  wrote:
>
>> +1
>>
>> On Sun, Feb 4, 2024 at 1:57 PM Hussein Awala  wrote:
>>
>>> +1
>>>
>>> On Sun, Feb 4, 2024 at 10:13 PM John Zhuge  wrote:
>>>
 +1

 John Zhuge


 On Sun, Feb 4, 2024 at 11:23 AM Santosh Pingale
  wrote:

> +1
>
> On Sun, Feb 4, 2024, 8:18 PM Xiao Li 
> wrote:
>
>> +1
>>
>> On Sun, Feb 4, 2024 at 6:07 AM beliefer  wrote:
>>
>>> +1
>>>
>>>
>>>
>>> On 2024-02-04 15:26:13, "Dongjoon Hyun" wrote:
>>>
>>> +1
>>>
>>> On Sat, Feb 3, 2024 at 9:18 PM yangjie01 
>>> wrote:
>>>
 +1

 On 2024/2/4 13:13, Kent Yao <y...@apache.org> wrote:


 +1


 Jungtaek Lim <kabhwan.opensou...@gmail.com> wrote on Sat, Feb 3, 2024, 21:14:
 >
 > Hi dev,
 >
 > Looks like there are a huge number of commits pushed to
 branch-3.5 after 3.5.0 was released: 200+ commits.
 >
 > $ git log --oneline v3.5.0..HEAD | wc -l
 > 202
 >
 > Also, there are 180 JIRA tickets with 3.5.1 as the fix version, and 10
 resolved issues are marked as either blocker (including correctness issues)
 or critical, which justifies the release.
 > https://issues.apache.org/jira/projects/SPARK/versions/12353495
 >
 > What do you think about releasing 3.5.1 with the current head of
 branch-3.5? I'm happy to volunteer as the release manager.
 >
 > Thanks,
 > Jungtaek Lim (HeartSaVioR)



 -
 To unsubscribe e-mail: dev-unsubscr...@spark.apache.org










Re: Heads-up: Update on Spark 3.5.1 RC

2024-02-15 Thread Jungtaek Lim
UPDATE: Now the vote thread is up for RC2.
https://lists.apache.org/thread/f28h0brncmkoyv5mtsqtxx38hx309c2j

On Wed, Feb 14, 2024 at 2:59 AM Dongjoon Hyun 
wrote:

> Thank you for the update, Jungtaek.
>
> Dongjoon.
>
> On Tue, Feb 13, 2024 at 7:29 AM Jungtaek Lim 
> wrote:
>
>> Hi,
>>
>> Just a heads-up, since I didn't give an update for a week after the last
>> update on the discussion thread.
>>
>> I've been following the automated release process and have encountered
>> several issues; I'll likely file JIRA tickets and follow up with PRs.
>>
>> The issues I've found so far are 1) a Python library version issue in the
>> release Docker image, and 2) a doc build failure in PySpark ML for Spark
>> Connect. I'm holding off on submitting fixes until I see a dry run succeed.
>>
>> Btw, I optimistically ran the process without a dry run since GA had
>> passed (my bad), and the tag for RC1 was created before I saw the issues.
>> I'll likely need to start with RC2 after things are sorted out and the
>> necessary fixes have landed in branch-3.5.
>>
>> Thanks,
>> Jungtaek Lim (HeartSaVioR)
>>
>>


[VOTE] Release Apache Spark 3.5.1 (RC2)

2024-02-15 Thread Jungtaek Lim
DISCLAIMER: The RC for Apache Spark 3.5.1 starts with RC2, as I belatedly
found a doc generation issue after tagging RC1.

Please vote on releasing the following candidate as Apache Spark version
3.5.1.

The vote is open until February 18th 9AM (PST) and passes if a majority of
+1 PMC votes are cast, with a minimum of 3 +1 votes.

[ ] +1 Release this package as Apache Spark 3.5.1
[ ] -1 Do not release this package because ...

To learn more about Apache Spark, please see https://spark.apache.org/

The tag to be voted on is v3.5.1-rc2 (commit
fd86f85e181fc2dc0f50a096855acf83a6cc5d9c):
https://github.com/apache/spark/tree/v3.5.1-rc2

The release files, including signatures, digests, etc. can be found at:
https://dist.apache.org/repos/dist/dev/spark/v3.5.1-rc2-bin/

Signatures used for Spark RCs can be found in this file:
https://dist.apache.org/repos/dist/dev/spark/KEYS
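As a sketch of the verification step (the artifact name below is illustrative and a stand-in file is created locally; the real .tgz and .sha512 files live under the v3.5.1-rc2-bin/ URL above, and signature checks additionally need the KEYS file imported into gpg):

```shell
# Hedged sketch of the digest check run against RC artifacts.
# A stand-in file replaces the real download here so the mechanics
# can be shown offline; substitute the actual .tgz from the RC bin URL.
echo "stand-in for the real artifact" > spark-3.5.1-bin-hadoop3.tgz
sha512sum spark-3.5.1-bin-hadoop3.tgz > spark-3.5.1-bin-hadoop3.tgz.sha512
# Re-check the digest; prints "spark-3.5.1-bin-hadoop3.tgz: OK" on success.
sha512sum -c spark-3.5.1-bin-hadoop3.tgz.sha512
```

For the real artifacts, `gpg --import KEYS` followed by `gpg --verify <artifact>.asc <artifact>` covers the signature side.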

The staging repository for this release can be found at:
https://repository.apache.org/content/repositories/orgapachespark-1452/

The documentation corresponding to this release can be found at:
https://dist.apache.org/repos/dist/dev/spark/v3.5.1-rc2-docs/

The list of bug fixes going into 3.5.1 can be found at the following URL:
https://issues.apache.org/jira/projects/SPARK/versions/12353495

FAQ

=
How can I help test this release?
=

If you are a Spark user, you can help us test this release by taking
an existing Spark workload and running on this release candidate, then
reporting any regressions.

If you're working in PySpark, you can set up a virtual env and install
the current RC via "pip install
https://dist.apache.org/repos/dist/dev/spark/v3.5.1-rc2-bin/pyspark-3.5.1.tar.gz"
and see if anything important breaks.
In Java/Scala, you can add the staging repository to your project's
resolvers and test with the RC (make sure to clean up the artifact cache
before and after so you don't end up building with an out-of-date RC
going forward).
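As a minimal sketch for sbt users (the spark-sql dependency line is illustrative; substitute whatever Spark artifacts your project actually uses):

```shell
# Append the RC2 staging repository and an illustrative dependency to build.sbt.
# The resolver URL is the orgapachespark-1452 staging repo from this email;
# the spark-sql line is just an example of a module to test against the RC.
cat >> build.sbt <<'EOF'
resolvers += "Spark 3.5.1 RC2 staging" at
  "https://repository.apache.org/content/repositories/orgapachespark-1452/"
libraryDependencies += "org.apache.spark" %% "spark-sql" % "3.5.1"
EOF
```

Clearing the resolution cache (e.g. `~/.ivy2/cache/org.apache.spark`, or the Coursier cache if your build uses it) before and after is what keeps a stale RC from leaking into later builds.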

===
What should happen to JIRA tickets still targeting 3.5.1?
===

The current list of open tickets targeted at 3.5.1 can be found at:
https://issues.apache.org/jira/projects/SPARK and search for "Target
Version/s" = 3.5.1

Committers should look at those and triage. Extremely important bug
fixes, documentation, and API tweaks that impact compatibility should
be worked on immediately. Everything else please retarget to an
appropriate release.

==
But my bug isn't fixed?
==

In order to make timely releases, we will typically not hold the
release unless the bug in question is a regression from the previous
release. That being said, if there is something which is a regression
that has not been correctly targeted please ping me or a committer to
help target the issue.


Re: Introducing Comet, a plugin to accelerate Spark execution via DataFusion and Arrow

2024-02-15 Thread Mich Talebzadeh
Hi, I gather from the replies that the plugin is not yet available in the
expected form, although I am aware of the shell script.

Also have you got some benchmark results from your tests that you can
possibly share?

Thanks,

Mich Talebzadeh,
Dad | Technologist | Solutions Architect | Engineer
London
United Kingdom


view my Linkedin profile

https://en.everybodywiki.com/Mich_Talebzadeh



*Disclaimer:* The information provided is correct to the best of my
knowledge, sourced from both personal expertise and other resources, but of
course cannot be guaranteed. It is essential to note that, as with any
advice, one verified and tested result holds more weight than a thousand
expert opinions.


On Thu, 15 Feb 2024 at 01:18, Chao Sun  wrote:

> Hi Praveen,
>
> We will add a "Getting Started" section in the README soon, but basically
> the comet-spark-shell script in the repo should provide a basic tool to
> build Comet and launch a Spark shell with it.
>
> Note that we haven't open sourced several features yet including shuffle
> support, which the aggregate operation depends on. Please stay tuned!
>
> Chao
>
>
> On Wed, Feb 14, 2024 at 2:44 PM praveen sinha 
> wrote:
>
>> Hi Chao,
>>
>> Is there any example app/gist/repo which can help me use this plugin. I
>> wanted to try out some realtime aggregate performance on top of parquet and
>> spark dataframes.
>>
>> Thanks and Regards
>> Praveen
>>
>>
>> On Wed, Feb 14, 2024 at 9:20 AM Chao Sun  wrote:
>>
>>> > Out of interest, what are the differences in approach between this
>>> and Gluten?
>>>
>>> Overall they are similar, although Gluten supports multiple backends
>>> including Velox and Clickhouse. One major difference is (obviously)
>>> Comet is based on DataFusion and Arrow, and written in Rust, while
>>> Gluten is mostly C++.
>>> I haven't looked very deep into Gluten yet, but there could be other
>>> differences such as how strictly the engine follows Spark's semantics,
>>> table format support (Iceberg, Delta, etc), fallback mechanism
>>> (coarse-grained fallback on stage level or more fine-grained fallback
>>> within stages), UDF support (Comet hasn't started on this yet),
>>> shuffle support, memory management, etc.
>>>
>>> Both engines are backed by very strong and vibrant open source
>>> communities (Velox, Clickhouse, Arrow & DataFusion), so it's very
>>> exciting to see how the projects will grow in the future.
>>>
>>> Best,
>>> Chao
>>>
>>> On Tue, Feb 13, 2024 at 10:06 PM John Zhuge  wrote:
>>> >
>>> > Congratulations! Excellent work!
>>> >
>>> > On Tue, Feb 13, 2024 at 8:04 PM Yufei Gu  wrote:
>>> >>
>>> >> Absolutely thrilled to see the project going open-source! Huge
>>> congrats to Chao and the entire team on this milestone!
>>> >>
>>> >> Yufei
>>> >>
>>> >>
>>> >> On Tue, Feb 13, 2024 at 12:43 PM Chao Sun  wrote:
>>> >>>
>>> >>> Hi all,
>>> >>>
>>> >>> We are very happy to announce that Project Comet, a plugin to
>>> >>> accelerate Spark query execution via leveraging DataFusion and Arrow,
>>> >>> has now been open sourced under the Apache Arrow umbrella. Please
>>> >>> check the project repo
>>> >>> https://github.com/apache/arrow-datafusion-comet for more details if
>>> >>> you are interested. We'd love to collaborate with people from the
>>> open
>>> >>> source community who share similar goals.
>>> >>>
>>> >>> Thanks,
>>> >>> Chao
>>> >>>
>>> >>>
>>> >
>>> >
>>> > --
>>> > John Zhuge
>>>
>>> -
>>> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
>>>
>>>