Re: [DISCUSS] SPIP: Spark Connect - A client and server interface for Apache Spark.

2022-06-03 Thread Koert Kuipers
How would Scala UDFs be supported in this?

On Fri, Jun 3, 2022 at 1:52 PM Martin Grund
 wrote:


-- 
CONFIDENTIALITY NOTICE: This electronic communication and any files 
transmitted with it are confidential, privileged and intended solely for 
the use of the individual or entity to whom they are addressed. If you are 
not the intended recipient, you are hereby notified that any disclosure, 
copying, distribution (electronic or otherwise) or forwarding of, or the 
taking of any action in reliance on the contents of this transmission is 
strictly prohibited. Please notify the sender immediately by e-mail if you 
have received this email by mistake and delete this email from your system.


Is it necessary to print this email? If you care about the environment 
like we do, please refrain from printing emails. It helps to keep the 
environment forested and litter-free.


Re: [VOTE] Release Spark 3.3.0 (RC4)

2022-06-03 Thread L. C. Hsieh
It's fixed at https://github.com/apache/spark/pull/36762.

On Fri, Jun 3, 2022 at 2:20 PM Sean Owen  wrote:
>
> Ah yeah, I think it's this change from 15 hrs ago. That needs to be .toSeq:
>
> https://github.com/apache/spark/commit/4a0f0ff6c22b85cb0fc1eef842da8dbe4c90543a#diff-01813c3e2e933ed573e4a93750107f004a86e587330cba5e91b5052fa6ade2a5R146

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Re: [VOTE] Release Spark 3.3.0 (RC4)

2022-06-03 Thread Sean Owen
Ah yeah, I think it's this change from 15 hrs ago. That needs to be .toSeq:

https://github.com/apache/spark/commit/4a0f0ff6c22b85cb0fc1eef842da8dbe4c90543a#diff-01813c3e2e933ed573e4a93750107f004a86e587330cba5e91b5052fa6ade2a5R146

On Fri, Jun 3, 2022 at 4:13 PM Sean Owen  wrote:

> In Scala 2.13, I'm getting errors like this:
>
>  analyzer should replace current_timestamp with literals *** FAILED ***
>   java.lang.ClassCastException: class scala.collection.mutable.ArrayBuffer
> cannot be cast to class scala.collection.immutable.Seq
> (scala.collection.mutable.ArrayBuffer and scala.collection.immutable.Seq
> are in unnamed module of loader 'app')
>   at
> org.apache.spark.sql.catalyst.optimizer.ComputeCurrentTimeSuite.literals(ComputeCurrentTimeSuite.scala:146)
> ...
> - analyzer should replace current_date with literals *** FAILED ***
>   java.lang.ClassCastException: class scala.collection.mutable.ArrayBuffer
> cannot be cast to class scala.collection.immutable.Seq
> (scala.collection.mutable.ArrayBuffer and scala.collection.immutable.Seq
> are in unnamed module of loader 'app')
> ...
>
> I haven't investigated yet, just flagging in case anyone knows more about
> it immediately.


Re: [VOTE] Release Spark 3.3.0 (RC4)

2022-06-03 Thread Sean Owen
In Scala 2.13, I'm getting errors like this:

 analyzer should replace current_timestamp with literals *** FAILED ***
  java.lang.ClassCastException: class scala.collection.mutable.ArrayBuffer
cannot be cast to class scala.collection.immutable.Seq
(scala.collection.mutable.ArrayBuffer and scala.collection.immutable.Seq
are in unnamed module of loader 'app')
  at
org.apache.spark.sql.catalyst.optimizer.ComputeCurrentTimeSuite.literals(ComputeCurrentTimeSuite.scala:146)
...
- analyzer should replace current_date with literals *** FAILED ***
  java.lang.ClassCastException: class scala.collection.mutable.ArrayBuffer
cannot be cast to class scala.collection.immutable.Seq
(scala.collection.mutable.ArrayBuffer and scala.collection.immutable.Seq
are in unnamed module of loader 'app')
...

I haven't investigated yet, just flagging in case anyone knows more about
it immediately.


On Fri, Jun 3, 2022 at 9:54 AM Maxim Gekk 
wrote:



Re: The draft of the Spark 3.3.0 release notes

2022-06-03 Thread Bjørn Jørgensen
- Support lambda `column` parameter of `DataFrame.rename` (SPARK-38763)


This worked before 18 Jan 2022; see the JIRA ticket for more information.
I think we can remove this one from the list.
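For reference, the feature mirrors pandas' own `DataFrame.rename`, where the `columns` argument accepts either a dict-like mapping or a callable applied to each column label (plain pandas shown here as an illustration; the `pyspark.pandas` API discussed in SPARK-38763 follows the same shape):

```python
import pandas as pd

# `columns` may be a dict-like mapping or a function applied to each label;
# the callable form is the one discussed in SPARK-38763.
df = pd.DataFrame({"A": [1, 2], "B": [3, 4]})
renamed = df.rename(columns=str.lower)
print(list(renamed.columns))  # -> ['a', 'b']
```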

On Fri, Jun 3, 2022 at 8:42 PM Dongjoon Hyun wrote:


-- 
Bjørn Jørgensen
Vestre Aspehaug 4, 6010 Ålesund
Norge

+47 480 94 297


Re: The draft of the Spark 3.3.0 release notes

2022-06-03 Thread Dongjoon Hyun
You are right.

After SPARK-36837, we tried to ship Apache Spark 3.3.0 with Apache Kafka
3.1.1 via the following PR.

https://github.com/apache/spark/pull/36135
[WIP][SPARK-38850][BUILD] Upgrade Kafka to 3.1.1

However, the final decision was to revert it from `branch-3.3` and move
directly to Apache Kafka 3.2.0 in the `master` branch. We need to remove it
from the 3.3.0 release notes.


On Fri, Jun 3, 2022 at 9:54 AM Koert Kuipers  wrote:

> I thought SPARK-36837 didn't make it in? I see it in the notes.


[DISCUSS] SPIP: Spark Connect - A client and server interface for Apache Spark.

2022-06-03 Thread Martin Grund
Hi Everyone,

We would like to start a discussion on the "Spark Connect" proposal. Please
find the links below:

*JIRA* - https://issues.apache.org/jira/browse/SPARK-39375
*SPIP Document* -
https://docs.google.com/document/d/1Mnl6jmGszixLW4KcJU5j9IgpG9-UabS0dcM6PM2XGDc/edit#heading=h.wmsrrfealhrj

*Excerpt from the document: *

We propose to extend Apache Spark by building on the DataFrame API and the
underlying unresolved logical plans. The DataFrame API is widely used and
makes it very easy to iteratively express complex logic. We will introduce
Spark Connect, a remote option of the DataFrame API that separates the
client from the Spark server. With Spark Connect, Spark will become
decoupled, allowing for built-in remote connectivity: The decoupled client
SDK can be used to run interactive data exploration and connect to the
server for DataFrame operations.
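The "unresolved logical plan" idea the excerpt builds on can be shown in miniature. This is a hypothetical sketch, not the proposed wire protocol: the client composes a plan as plain data, and a server-side `execute` is the only place where the plan is resolved against real tables (a Python lambda stands in for what would be a serializable expression):

```python
# Toy illustration of the idea behind Spark Connect (hypothetical; NOT the
# actual protocol): the client describes a query as an unresolved logical
# plan made of plain data, and only the "server" resolves and executes it.

def scan(table):
    # Client side: merely names a table; nothing is resolved yet.
    return {"op": "scan", "table": table}

def filter_(child, predicate):
    return {"op": "filter", "child": child, "predicate": predicate}

def project(child, columns):
    return {"op": "project", "child": child, "columns": columns}

def execute(plan, catalog):
    # Server side: recursively resolve and run the plan against real data.
    if plan["op"] == "scan":
        return catalog[plan["table"]]
    if plan["op"] == "filter":
        return [r for r in execute(plan["child"], catalog) if plan["predicate"](r)]
    if plan["op"] == "project":
        return [{c: r[c] for c in plan["columns"]}
                for r in execute(plan["child"], catalog)]
    raise ValueError(f"unknown op: {plan['op']}")

catalog = {"people": [{"name": "ann", "age": 41}, {"name": "bo", "age": 17}]}
plan = project(filter_(scan("people"), lambda r: r["age"] >= 18), ["name"])
print(execute(plan, catalog))  # -> [{'name': 'ann'}]
```

Because the client only manipulates such plan descriptions, it never needs the Spark runtime on its side of the connection, which is what makes the thin, (almost) versionless client possible.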

Spark Connect will benefit Spark developers in different ways: The
decoupled architecture will result in improved stability, as clients are
separated from the driver. From the Spark Connect client perspective, Spark
will be (almost) versionless, and thus enable seamless upgradability, as
server APIs can evolve without affecting the client API. The decoupled
client-server architecture can be leveraged to build close integrations
with local developer tooling. Finally, separating the client process from
the Spark server process will improve Spark’s overall security posture by
avoiding the tight coupling of the client inside the Spark runtime
environment.

Spark Connect will strengthen Spark’s position as the modern unified engine
for large-scale data analytics and expand applicability to use cases and
developers we could not reach with the current setup: Spark will become
ubiquitously usable as the DataFrame API can be used with (almost) any
programming language.

We would like to start a discussion on the document and any feedback is
welcome!

Thanks a lot in advance,
Martin


Re: The draft of the Spark 3.3.0 release notes

2022-06-03 Thread Koert Kuipers
I thought SPARK-36837 didn't make it in? I see it in the notes.

On Fri, Jun 3, 2022 at 4:31 AM Maxim Gekk 
wrote:




Re: The draft of the Spark 3.3.0 release notes

2022-06-03 Thread Adam Binford
It's more of a missing feature than a bug, but I guess it's all subjective.

Adam

On Fri, Jun 3, 2022 at 10:24 AM Maxim Gekk 
wrote:

> Hi Adam,
>
> The release notes focus on new features and user-facing improvements
> mostly. SPARK-37618  seems
> like a bug fix, that's why I didn't put it in the doc.
>
> Maxim Gekk
>
> Software Engineer
>
> Databricks, Inc.
>

-- 
Adam Binford


[VOTE] Release Spark 3.3.0 (RC4)

2022-06-03 Thread Maxim Gekk
Please vote on releasing the following candidate as
Apache Spark version 3.3.0.

The vote is open until 11:59pm Pacific time June 7th and passes if a
majority of +1 PMC votes are cast, with a minimum of 3 +1 votes.

[ ] +1 Release this package as Apache Spark 3.3.0
[ ] -1 Do not release this package because ...

To learn more about Apache Spark, please see http://spark.apache.org/

The tag to be voted on is v3.3.0-rc4 (commit
4e3599bc11a1cb0ea9fc819e7f752d2228e54baf):
https://github.com/apache/spark/tree/v3.3.0-rc4

The release files, including signatures, digests, etc. can be found at:
https://dist.apache.org/repos/dist/dev/spark/v3.3.0-rc4-bin/

Signatures used for Spark RCs can be found in this file:
https://dist.apache.org/repos/dist/dev/spark/KEYS

The staging repository for this release can be found at:
https://repository.apache.org/content/repositories/orgapachespark-1405

The documentation corresponding to this release can be found at:
https://dist.apache.org/repos/dist/dev/spark/v3.3.0-rc4-docs/

The list of bug fixes going into 3.3.0 can be found at the following URL:
https://issues.apache.org/jira/projects/SPARK/versions/12350369

This release uses the release script from the tag v3.3.0-rc4.


FAQ

=
How can I help test this release?
=
If you are a Spark user, you can help us test this release by taking
an existing Spark workload, running it on this release candidate, and
reporting any regressions.

If you're working in PySpark, you can set up a virtual env, install
the current RC, and see if anything important breaks. In Java/Scala,
you can add the staging repository to your project's resolvers and test
with the RC (make sure to clean up the artifact cache before/after so
you don't end up building with an out-of-date RC going forward).

===
What should happen to JIRA tickets still targeting 3.3.0?
===
The current list of open tickets targeted at 3.3.0 can be found at:
https://issues.apache.org/jira/projects/SPARK and search for "Target
Version/s" = 3.3.0

Committers should look at those and triage. Extremely important bug
fixes, documentation, and API tweaks that impact compatibility should
be worked on immediately. Everything else please retarget to an
appropriate release.

==
But my bug isn't fixed?
==
In order to make timely releases, we will typically not hold the
release unless the bug in question is a regression from the previous
release. That said, if there is a regression that has not been
correctly targeted, please ping me or a committer to help target the
issue.

Maxim Gekk

Software Engineer

Databricks, Inc.


Re: The draft of the Spark 3.3.0 release notes

2022-06-03 Thread Maxim Gekk
Hi Adam,

The release notes focus mostly on new features and user-facing
improvements. SPARK-37618 seems like a bug fix, which is why I didn't put
it in the doc.

Maxim Gekk

Software Engineer

Databricks, Inc.


On Fri, Jun 3, 2022 at 2:20 PM Adam Binford  wrote:

> I don't think I see https://issues.apache.org/jira/browse/SPARK-37618
> which got merged post branch cut.
>
> Adam


Re: The draft of the Spark 3.3.0 release notes

2022-06-03 Thread Adam Binford
I don't think I see https://issues.apache.org/jira/browse/SPARK-37618 which
got merged post branch cut.

Adam

On Fri, Jun 3, 2022 at 4:25 AM Maxim Gekk 
wrote:



-- 
Adam Binford


The draft of the Spark 3.3.0 release notes

2022-06-03 Thread Maxim Gekk
Hi All,

I am preparing the release notes of Spark 3.3.0. Here is a draft document:
https://docs.google.com/document/d/1gGySrLGvIK8bajKdGjTI_mDqk0-YPvHmPN64YjoWfOQ/edit?usp=sharing

Please take a look and let me know if I missed any major changes or
anything important.

Maxim Gekk

Software Engineer

Databricks, Inc.