Re: [VOTE] Release Spark 3.3.0 (RC2)

2022-05-18 Thread Maxim Gekk
Hi Kent,

> Shall we backport the fix from the master to 3.3 too?

Yes, we shall.

Maxim Gekk

Software Engineer

Databricks, Inc.


On Thu, May 19, 2022 at 6:44 AM Kent Yao wrote:

> Hi,
>
> I verified the simple case below with the binary release, and it looks
> like a bug to me.
>
> bin/spark-sql -e "select date '2018-11-17' > 1"
>
> Error in query: Invalid call to toAttribute on unresolved object;
> 'Project [unresolvedalias((2018-11-17 > 1), None)]
> +- OneRowRelation
>
> Both the 3.2 releases and the master branch behave correctly, reporting a
> proper 'due to data type mismatch' error.
>
> Shall we backport the fix from the master to 3.3 too?
>
> Bests
>
> Kent Yao
>
>
> On Wed, May 18, 2022 at 19:04, Yuming Wang wrote:
> >
> > -1. There is a regression: https://github.com/apache/spark/pull/36595
> >
> > On Wed, May 18, 2022 at 4:11 PM Martin Grigorov wrote:
> >>
> >> Hi,
> >>
> >> [X] +1 Release this package as Apache Spark 3.3.0
> >>
> >> Tested:
> >> - make local distribution from sources (with ./dev/make-distribution.sh
> >> --tgz --name with-volcano -Pkubernetes,volcano,hadoop-3)
> >> - create a Docker image (with JDK 11)
> >> - run Pi example on
> >> -- local
> >> -- Kubernetes with default scheduler
> >> -- Kubernetes with Volcano scheduler
> >>
> >> On both x86_64 and aarch64!
> >>
> >> Regards,
> >> Martin
> >>
> >>
> >> On Mon, May 16, 2022 at 3:44 PM Maxim Gekk wrote:
> >>>
> >>> Please vote on releasing the following candidate as Apache Spark
> >>> version 3.3.0.
> >>>
> >>> The vote is open until 11:59pm Pacific time May 19th and passes if a
> >>> majority of +1 PMC votes are cast, with a minimum of 3 +1 votes.
> >>>
> >>> [ ] +1 Release this package as Apache Spark 3.3.0
> >>> [ ] -1 Do not release this package because ...
> >>>
> >>> To learn more about Apache Spark, please see http://spark.apache.org/
> >>>
> >>> The tag to be voted on is v3.3.0-rc2 (commit
> >>> c8c657b922ac8fd8dcf9553113e11a80079db059):
> >>> https://github.com/apache/spark/tree/v3.3.0-rc2
> >>>
> >>> The release files, including signatures, digests, etc. can be found at:
> >>> https://dist.apache.org/repos/dist/dev/spark/v3.3.0-rc2-bin/
> >>>
> >>> Signatures used for Spark RCs can be found in this file:
> >>> https://dist.apache.org/repos/dist/dev/spark/KEYS
> >>>
> >>> The staging repository for this release can be found at:
> >>> https://repository.apache.org/content/repositories/orgapachespark-1403
> >>>
> >>> The documentation corresponding to this release can be found at:
> >>> https://dist.apache.org/repos/dist/dev/spark/v3.3.0-rc2-docs/
> >>>
> >>> The list of bug fixes going into 3.3.0 can be found at the following
> >>> URL:
> >>> https://issues.apache.org/jira/projects/SPARK/versions/12350369
> >>>
> >>> This release is using the release script of the tag v3.3.0-rc2.
> >>>
> >>>
> >>> FAQ
> >>>
> >>> =
> >>> How can I help test this release?
> >>> =
> >>> If you are a Spark user, you can help us test this release by taking
> >>> an existing Spark workload and running on this release candidate, then
> >>> reporting any regressions.
> >>>
> >>> If you're working in PySpark, you can set up a virtual env, install
> >>> the current RC, and see if anything important breaks. In Java/Scala,
> >>> you can add the staging repository to your project's resolvers and test
> >>> with the RC (make sure to clean up the artifact cache before/after so
> >>> you don't end up building with an out-of-date RC going forward).
> >>>
> >>> ===
> >>> What should happen to JIRA tickets still targeting 3.3.0?
> >>> ===
> >>> The current list of open tickets targeted at 3.3.0 can be found at:
> >>> https://issues.apache.org/jira/projects/SPARK and search for "Target
> >>> Version/s" = 3.3.0
> >>>
> >>> Committers should look at those and triage. Extremely important bug
> >>> fixes, documentation, and API tweaks that impact compatibility should
> >>> be worked on immediately. Everything else please retarget to an
> >>> appropriate release.
> >>>
> >>> ==
> >>> But my bug isn't fixed?
> >>> ==
> >>> In order to make timely releases, we will typically not hold the
> >>> release unless the bug in question is a regression from the previous
> >>> release. That being said, if there is something which is a regression
> >>> that has not been correctly targeted please ping me or a committer to
> >>> help target the issue.
> >>>
> >>> Maxim Gekk
> >>>
> >>> Software Engineer
> >>>
> >>> Databricks, Inc.
>
> -
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>
>


Re: A scene with unstable Spark performance

2022-05-18 Thread Chang Chen
This is a case where resources are fixed within the same SparkContext, but
SQLs have different priorities.

Some SQLs are only allowed to execute when there are spare resources; once a
high-priority SQL comes in, the task sets of those low-priority SQLs are
either killed or stalled.

If we set the high-priority pool's minShare to a relatively high value, e.g.
50% or 60% of the total cores, does that make sense?
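
(For concreteness, a minimal sketch of that setup using Spark's built-in fair
scheduler; the pool names, weights, and minShare values are illustrative
assumptions, not a tested configuration. As far as I know, the fair scheduler
only reorders queued tasks; it does not preempt tasks that are already
running.)

// conf/fairscheduler.xml, referenced via spark.scheduler.allocation.file,
// with spark.scheduler.mode=FAIR set at startup:
//   <allocations>
//     <pool name="high">
//       <schedulingMode>FAIR</schedulingMode>
//       <weight>10</weight>
//       <minShare>48</minShare>  <!-- e.g. ~50% of a 96-core cluster -->
//     </pool>
//     <pool name="low">
//       <schedulingMode>FAIR</schedulingMode>
//       <weight>1</weight>
//       <minShare>0</minShare>
//     </pool>
//   </allocations>

// Pools are chosen per thread, so each SQL can be routed before submission:
val sc = spark.sparkContext
sc.setLocalProperty("spark.scheduler.pool", "high")  // high-priority SQL
spark.sql("SELECT ...").collect()
sc.setLocalProperty("spark.scheduler.pool", "low")   // opportunistic SQL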


On Wed, May 18, 2022 at 13:28, Sungwoo Park wrote:

> The problem you describe is the motivation for developing Spark on MR3.
> From the blog article (
> https://www.datamonad.com/post/2021-08-18-spark-mr3/):
>
> *The main motivation for developing Spark on MR3 is to allow multiple
> Spark applications to share compute resources such as Yarn containers or
> Kubernetes Pods.*
>
> The problem is due to an architectural limitation of Spark, and I guess
> fixing the problem would require a heavy rewrite of Spark core. When we
> developed Spark on MR3, we were not aware of any attempt being made
> elsewhere (in academia and industry) to address this limitation.
>
> A potential workaround might be to implement a custom Spark application
> that manages the submission of two groups of Spark jobs and controls their
> execution (similarly to Spark Thrift Server). Not sure if this approach
> would fix your problem, though.
>
> If you are interested, see the webpage of Spark on MR3:
> https://mr3docs.datamonad.com/docs/spark/
>
> We have released Spark 3.0.1 on MR3, and Spark 3.2.1 on MR3 is under
> development. For Spark 3.0.1 on MR3, no change is made to Spark and MR3 is
> used as an add-on. The main application of MR3 is Hive on MR3, but Spark on
> MR3 is equally ready for production.
>
> Thank you,
>
> --- Sungwoo
>


Re: Behaviour of Append & Overwrite modes when table is not present when using df.write in Spark 3

2022-05-18 Thread Sourabh Badhya
Requesting some suggestions on this.

Thanks in advance,
Sourabh Badhya

On Mon, May 9, 2022 at 5:16 PM Sourabh Badhya wrote:

> Hi team,
>
> I would like to know the behaviour of the Append & Overwrite modes when the
> table is not present, and whether automatic table creation is
> supported or unsupported when df.write is used in Spark 3 and the underlying
> custom datasource implements SupportsCatalogOptions.
>
> As per my knowledge, in the current implementation on master, df.write in
> Append and Overwrite mode tries to load the table and look up its schema.
> Hence the table is expected to be created beforehand and will not be
> created automatically. Attaching the code link below for reference.
>
> Code link -
> https://github.com/apache/spark/blob/b065c945fe27dd5869b39bfeaad8e2b23a8835b5/sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala#L287
>
> This is slightly different from the behaviour we observed in Spark 2 -
> https://lists.apache.org/thread/y468ngqhfhxhv0fygvwvy8r3g4sw9v7n
>
> Please confirm if I am correct and if this is the intended behaviour in
> Spark 3.
>
> Thanks and regards,
> Sourabh Badhya
>
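
(For illustration, a hedged sketch of one workaround for the behaviour
described above: pre-create the table through the DataFrameWriterV2 API before
appending; the catalog, table, and format names are hypothetical.)

// Spark 3, Scala; df is an existing DataFrame.
// createOrReplace() derives the table from df's schema, so the subsequent
// append() finds a table to load instead of failing.
df.writeTo("my_catalog.my_db.my_table").using("my_format").createOrReplace()
df.writeTo("my_catalog.my_db.my_table").append()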


Re: Introducing "Pandas API on Spark" component in JIRA, and use "PS" PR title component

2022-05-18 Thread Dongjoon Hyun
+1

Thank you for the suggestion, Hyukjin.

Dongjoon.

On Wed, May 18, 2022 at 11:08 AM Bjørn Jørgensen wrote:

> +1
> But can we have the PR title and the PR label be the same, PS?
>
> On Wed, May 18, 2022 at 18:57, Xinrong Meng wrote:
>
>> Great!
>>
>> It saves us from always specifying "Pandas API on Spark" in PR titles.
>>
>> Thanks!
>>
>>
>> Xinrong Meng
>>
>> Software Engineer
>>
>> Databricks
>>
>>
>> On Tue, May 17, 2022 at 1:08 AM Maciej wrote:
>>
>>> Sounds good!
>>>
>>> +1
>>>
>>> On 5/17/22 06:08, Yikun Jiang wrote:
>>> > It's a pretty good idea, +1.
>>> >
>>> > To be clear in Github:
>>> >
> >>> > - For each PR title: [SPARK-XXX][PYTHON][PS] The Pandas on Spark PR
> >>> > title (*still keep [PYTHON]*, with [PS] newly added)
> >>> >
> >>> > - For the PR label: newly added `PANDAS API ON SPARK`; still keep
> >>> > `PYTHON`, `CORE`
> >>> > (*still keep `PYTHON`, `CORE`*, with `PANDAS API ON SPARK` newly added)
> >>> > https://github.com/apache/spark/pull/36574
>>> >
>>> > Right?
>>> >
>>> > Regards,
>>> > Yikun
>>> >
>>> >
> >>> > On Tue, May 17, 2022 at 11:26 AM Hyukjin Kwon wrote:
>>> >
>>> > Hi all,
>>> >
> >>> > What about introducing a component in JIRA, "Pandas API on Spark",
> >>> > and using "PS" (pandas-on-Spark) in PR titles? We already use "ps" in
> >>> > many places, e.g. when we do: import pyspark.pandas as ps.
> >>> > This is similar to "Structured Streaming" in JIRA and "SS" in PR
> >>> > titles.
>>> >
>>> > I think it'd be easier to track the changes here with that.
> >>> > Currently it's a bit difficult to distinguish them from pure PySpark
> >>> > changes.
>>> >
>>>
>>>
>>> --
>>> Best regards,
>>> Maciej Szymkiewicz
>>>
>>> Web: https://zero323.net
>>> PGP: A30CEF0C31A501EC
>>>
>>
>
> --
> Bjørn Jørgensen
> Vestre Aspehaug 4, 6010 Ålesund
> Norge
>
> +47 480 94 297
>


Re: [VOTE] Release Spark 3.3.0 (RC2)

2022-05-18 Thread Kent Yao
Hi,

I verified the simple case below with the binary release, and it looks
like a bug to me.

bin/spark-sql -e "select date '2018-11-17' > 1"

Error in query: Invalid call to toAttribute on unresolved object;
'Project [unresolvedalias((2018-11-17 > 1), None)]
+- OneRowRelation

Both the 3.2 releases and the master branch behave correctly, reporting a
proper 'due to data type mismatch' error.

Shall we backport the fix from the master to 3.3 too?

Bests

Kent Yao


On Wed, May 18, 2022 at 19:04, Yuming Wang wrote:
>
> -1. There is a regression: https://github.com/apache/spark/pull/36595
>
> On Wed, May 18, 2022 at 4:11 PM Martin Grigorov wrote:
>>
>> Hi,
>>
>> [X] +1 Release this package as Apache Spark 3.3.0
>>
>> Tested:
>> - make local distribution from sources (with ./dev/make-distribution.sh 
>> --tgz --name with-volcano -Pkubernetes,volcano,hadoop-3)
>> - create a Docker image (with JDK 11)
>> - run Pi example on
>> -- local
>> -- Kubernetes with default scheduler
>> -- Kubernetes with Volcano scheduler
>>
>> On both x86_64 and aarch64!
>>
>> Regards,
>> Martin
>>
>>
>> On Mon, May 16, 2022 at 3:44 PM Maxim Gekk wrote:
>>>
>>> Please vote on releasing the following candidate as Apache Spark version 
>>> 3.3.0.
>>>
>>> The vote is open until 11:59pm Pacific time May 19th and passes if a 
>>> majority of +1 PMC votes are cast, with a minimum of 3 +1 votes.
>>>
>>> [ ] +1 Release this package as Apache Spark 3.3.0
>>> [ ] -1 Do not release this package because ...
>>>
>>> To learn more about Apache Spark, please see http://spark.apache.org/
>>>
>>> The tag to be voted on is v3.3.0-rc2 (commit 
>>> c8c657b922ac8fd8dcf9553113e11a80079db059):
>>> https://github.com/apache/spark/tree/v3.3.0-rc2
>>>
>>> The release files, including signatures, digests, etc. can be found at:
>>> https://dist.apache.org/repos/dist/dev/spark/v3.3.0-rc2-bin/
>>>
>>> Signatures used for Spark RCs can be found in this file:
>>> https://dist.apache.org/repos/dist/dev/spark/KEYS
>>>
>>> The staging repository for this release can be found at:
>>> https://repository.apache.org/content/repositories/orgapachespark-1403
>>>
>>> The documentation corresponding to this release can be found at:
>>> https://dist.apache.org/repos/dist/dev/spark/v3.3.0-rc2-docs/
>>>
>>> The list of bug fixes going into 3.3.0 can be found at the following URL:
>>> https://issues.apache.org/jira/projects/SPARK/versions/12350369
>>>
>>> This release is using the release script of the tag v3.3.0-rc2.
>>>
>>>
>>> FAQ
>>>
>>> =
>>> How can I help test this release?
>>> =
>>> If you are a Spark user, you can help us test this release by taking
>>> an existing Spark workload and running on this release candidate, then
>>> reporting any regressions.
>>>
>>> If you're working in PySpark, you can set up a virtual env, install
>>> the current RC, and see if anything important breaks. In Java/Scala,
>>> you can add the staging repository to your project's resolvers and test
>>> with the RC (make sure to clean up the artifact cache before/after so
>>> you don't end up building with an out-of-date RC going forward).
>>>
>>> ===
>>> What should happen to JIRA tickets still targeting 3.3.0?
>>> ===
>>> The current list of open tickets targeted at 3.3.0 can be found at:
>>> https://issues.apache.org/jira/projects/SPARK and search for "Target 
>>> Version/s" = 3.3.0
>>>
>>> Committers should look at those and triage. Extremely important bug
>>> fixes, documentation, and API tweaks that impact compatibility should
>>> be worked on immediately. Everything else please retarget to an
>>> appropriate release.
>>>
>>> ==
>>> But my bug isn't fixed?
>>> ==
>>> In order to make timely releases, we will typically not hold the
>>> release unless the bug in question is a regression from the previous
>>> release. That being said, if there is something which is a regression
>>> that has not been correctly targeted please ping me or a committer to
>>> help target the issue.
>>>
>>> Maxim Gekk
>>>
>>> Software Engineer
>>>
>>> Databricks, Inc.

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Re: Introducing "Pandas API on Spark" component in JIRA, and use "PS" PR title component

2022-05-18 Thread Bjørn Jørgensen
+1
But can we have the PR title and the PR label be the same, PS?

On Wed, May 18, 2022 at 18:57, Xinrong Meng wrote:

> Great!
>
> It saves us from always specifying "Pandas API on Spark" in PR titles.
>
> Thanks!
>
>
> Xinrong Meng
>
> Software Engineer
>
> Databricks
>
>
> On Tue, May 17, 2022 at 1:08 AM Maciej wrote:
>
>> Sounds good!
>>
>> +1
>>
>> On 5/17/22 06:08, Yikun Jiang wrote:
>> > It's a pretty good idea, +1.
>> >
>> > To be clear in Github:
>> >
> >> > - For each PR title: [SPARK-XXX][PYTHON][PS] The Pandas on Spark PR
> >> > title (*still keep [PYTHON]*, with [PS] newly added)
> >> >
> >> > - For the PR label: newly added `PANDAS API ON SPARK`; still keep
> >> > `PYTHON`, `CORE`
> >> > (*still keep `PYTHON`, `CORE`*, with `PANDAS API ON SPARK` newly added)
> >> > https://github.com/apache/spark/pull/36574
>> >
>> > Right?
>> >
>> > Regards,
>> > Yikun
>> >
>> >
>> > On Tue, May 17, 2022 at 11:26 AM Hyukjin Kwon wrote:
>> >
>> > Hi all,
>> >
>> > What about introducing a component in JIRA, "Pandas API on Spark",
>> > and using "PS" (pandas-on-Spark) in PR titles? We already use "ps" in
>> > many places, e.g. when we do: import pyspark.pandas as ps.
>> > This is similar to "Structured Streaming" in JIRA and "SS" in PR
>> > titles.
>> >
>> > I think it'd be easier to track the changes here with that.
>> > Currently it's a bit difficult to distinguish them from pure PySpark
>> > changes.
>> >
>>
>>
>> --
>> Best regards,
>> Maciej Szymkiewicz
>>
>> Web: https://zero323.net
>> PGP: A30CEF0C31A501EC
>>
>

-- 
Bjørn Jørgensen
Vestre Aspehaug 4, 6010 Ålesund
Norge

+47 480 94 297


Re: Introducing "Pandas API on Spark" component in JIRA, and use "PS" PR title component

2022-05-18 Thread Xinrong Meng
Great!

It saves us from always specifying "Pandas API on Spark" in PR titles.

Thanks!


Xinrong Meng

Software Engineer

Databricks


On Tue, May 17, 2022 at 1:08 AM Maciej wrote:

> Sounds good!
>
> +1
>
> On 5/17/22 06:08, Yikun Jiang wrote:
> > It's a pretty good idea, +1.
> >
> > To be clear in Github:
> >
> > - For each PR title: [SPARK-XXX][PYTHON][PS] The Pandas on Spark PR title
> > (*still keep [PYTHON]*, with [PS] newly added)
> >
> > - For the PR label: newly added `PANDAS API ON SPARK`; still keep
> > `PYTHON`, `CORE`
> > (*still keep `PYTHON`, `CORE`*, with `PANDAS API ON SPARK` newly added)
> > https://github.com/apache/spark/pull/36574
> >
> > Right?
> >
> > Regards,
> > Yikun
> >
> >
> > On Tue, May 17, 2022 at 11:26 AM Hyukjin Kwon wrote:
> >
> > Hi all,
> >
> > What about introducing a component in JIRA, "Pandas API on Spark",
> > and using "PS" (pandas-on-Spark) in PR titles? We already use "ps" in
> > many places, e.g. when we do: import pyspark.pandas as ps.
> > This is similar to "Structured Streaming" in JIRA and "SS" in PR
> > titles.
> >
> > I think it'd be easier to track the changes here with that.
> > Currently it's a bit difficult to distinguish them from pure PySpark
> > changes.
> >
>
>
> --
> Best regards,
> Maciej Szymkiewicz
>
> Web: https://zero323.net
> PGP: A30CEF0C31A501EC
>


Re: Unable to create view due to up cast error when migrating from Hive to Spark

2022-05-18 Thread Wenchen Fan
A view is essentially a SQL query. It's fragile to share views between
Spark and Hive because different systems have different SQL dialects. They
may interpret the view SQL query differently and introduce unexpected
behaviors.

In this case, Spark returns a decimal type for gender * 0.3 - 0.1 while Hive
returns double. The view schema was determined by Hive at creation time, and
it does not match what the view's SQL query produces when Spark reads the
view. We need to re-create this view using Spark. In fact, I think we need to
do the same for every Hive view we want to use from Spark.
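
(A hedged sketch of that re-creation, reusing the view definition from the
thread below; casting TT to DOUBLE is an assumption chosen to match the type
Hive reports.)

// Run once from Spark so the view schema is determined by Spark itself:
spark.sql("""
  CREATE OR REPLACE VIEW test_db.my_view AS
  SELECT
    CAST(CASE WHEN age > 12 THEN gender * 0.3 - 0.1 END AS DOUBLE) AS TT,
    gender, age, careers, education
  FROM test_db.my_table
""")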

On Wed, May 18, 2022 at 7:03 PM beliefer wrote:

> During the migration from Hive to Spark, there was a problem with the SQL
> used to create views in Hive: SQL that legally creates a view in Hive
> raises an error when executed in Spark SQL.
>
> The SQL is as follows:
>
> CREATE VIEW test_db.my_view AS
> select
> case
> when age > 12 then gender * 0.3 - 0.1
> end AS TT,
> gender,
> age,
> careers,
> education
> from
> test_db.my_table;
>
> The error message is as follows:
>
> Cannot up cast TT from decimal(13, 1) to double.
> The type path of the target object is:
>
> You can either add an explicit cast to the input data or choose a higher
> precision type of the field in the target object
>
> *How should we solve this problem?*
>
>
>
>


Re: [VOTE] Release Spark 3.3.0 (RC2)

2022-05-18 Thread Yuming Wang
-1. There is a regression: https://github.com/apache/spark/pull/36595

On Wed, May 18, 2022 at 4:11 PM Martin Grigorov wrote:

> Hi,
>
> [X] +1 Release this package as Apache Spark 3.3.0
>
> Tested:
> - make local distribution from sources (with ./dev/make-distribution.sh
> --tgz --name with-volcano -Pkubernetes,volcano,hadoop-3)
> - create a Docker image (with JDK 11)
> - run Pi example on
> -- local
> -- Kubernetes with default scheduler
> -- Kubernetes with Volcano scheduler
>
> On both x86_64 and aarch64!
>
> Regards,
> Martin
>
>
> On Mon, May 16, 2022 at 3:44 PM Maxim Gekk wrote:
>
>> Please vote on releasing the following candidate as
>> Apache Spark version 3.3.0.
>>
>> The vote is open until 11:59pm Pacific time May 19th and passes if a
>> majority of +1 PMC votes are cast, with a minimum of 3 +1 votes.
>>
>> [ ] +1 Release this package as Apache Spark 3.3.0
>> [ ] -1 Do not release this package because ...
>>
>> To learn more about Apache Spark, please see http://spark.apache.org/
>>
>> The tag to be voted on is v3.3.0-rc2 (commit
>> c8c657b922ac8fd8dcf9553113e11a80079db059):
>> https://github.com/apache/spark/tree/v3.3.0-rc2
>>
>> The release files, including signatures, digests, etc. can be found at:
>> https://dist.apache.org/repos/dist/dev/spark/v3.3.0-rc2-bin/
>>
>> Signatures used for Spark RCs can be found in this file:
>> https://dist.apache.org/repos/dist/dev/spark/KEYS
>>
>> The staging repository for this release can be found at:
>> https://repository.apache.org/content/repositories/orgapachespark-1403
>>
>> The documentation corresponding to this release can be found at:
>> https://dist.apache.org/repos/dist/dev/spark/v3.3.0-rc2-docs/
>>
>> The list of bug fixes going into 3.3.0 can be found at the following URL:
>> https://issues.apache.org/jira/projects/SPARK/versions/12350369
>>
>> This release is using the release script of the tag v3.3.0-rc2.
>>
>>
>> FAQ
>>
>> =
>> How can I help test this release?
>> =
>> If you are a Spark user, you can help us test this release by taking
>> an existing Spark workload and running on this release candidate, then
>> reporting any regressions.
>>
>> If you're working in PySpark, you can set up a virtual env, install
>> the current RC, and see if anything important breaks. In Java/Scala,
>> you can add the staging repository to your project's resolvers and test
>> with the RC (make sure to clean up the artifact cache before/after so
>> you don't end up building with an out-of-date RC going forward).
>>
>> ===
>> What should happen to JIRA tickets still targeting 3.3.0?
>> ===
>> The current list of open tickets targeted at 3.3.0 can be found at:
>> https://issues.apache.org/jira/projects/SPARK and search for "Target
>> Version/s" = 3.3.0
>>
>> Committers should look at those and triage. Extremely important bug
>> fixes, documentation, and API tweaks that impact compatibility should
>> be worked on immediately. Everything else please retarget to an
>> appropriate release.
>>
>> ==
>> But my bug isn't fixed?
>> ==
>> In order to make timely releases, we will typically not hold the
>> release unless the bug in question is a regression from the previous
>> release. That being said, if there is something which is a regression
>> that has not been correctly targeted please ping me or a committer to
>> help target the issue.
>>
>> Maxim Gekk
>>
>> Software Engineer
>>
>> Databricks, Inc.
>>
>


Unable to create view due to up cast error when migrating from Hive to Spark

2022-05-18 Thread beliefer
During the migration from Hive to Spark, there was a problem with the SQL used
to create views in Hive: SQL that legally creates a view in Hive raises an
error when executed in Spark SQL.

The SQL is as follows:

CREATE VIEW test_db.my_view AS
select
case
when age > 12 then gender * 0.3 - 0.1
end AS TT,
gender,
age,
careers,
education
from
test_db.my_table;

The error message is as follows:

Cannot up cast TT from decimal(13, 1) to double.
The type path of the target object is:

You can either add an explicit cast to the input data or choose a higher
precision type of the field in the target object



How should we solve this problem?

Re: [VOTE] Release Spark 3.3.0 (RC2)

2022-05-18 Thread Martin Grigorov
Hi,

[X] +1 Release this package as Apache Spark 3.3.0

Tested:
- make local distribution from sources (with ./dev/make-distribution.sh
--tgz --name with-volcano -Pkubernetes,volcano,hadoop-3)
- create a Docker image (with JDK 11)
- run Pi example on
-- local
-- Kubernetes with default scheduler
-- Kubernetes with Volcano scheduler

On both x86_64 and aarch64!

Regards,
Martin


On Mon, May 16, 2022 at 3:44 PM Maxim Gekk wrote:

> Please vote on releasing the following candidate as
> Apache Spark version 3.3.0.
>
> The vote is open until 11:59pm Pacific time May 19th and passes if a
> majority of +1 PMC votes are cast, with a minimum of 3 +1 votes.
>
> [ ] +1 Release this package as Apache Spark 3.3.0
> [ ] -1 Do not release this package because ...
>
> To learn more about Apache Spark, please see http://spark.apache.org/
>
> The tag to be voted on is v3.3.0-rc2 (commit
> c8c657b922ac8fd8dcf9553113e11a80079db059):
> https://github.com/apache/spark/tree/v3.3.0-rc2
>
> The release files, including signatures, digests, etc. can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v3.3.0-rc2-bin/
>
> Signatures used for Spark RCs can be found in this file:
> https://dist.apache.org/repos/dist/dev/spark/KEYS
>
> The staging repository for this release can be found at:
> https://repository.apache.org/content/repositories/orgapachespark-1403
>
> The documentation corresponding to this release can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v3.3.0-rc2-docs/
>
> The list of bug fixes going into 3.3.0 can be found at the following URL:
> https://issues.apache.org/jira/projects/SPARK/versions/12350369
>
> This release is using the release script of the tag v3.3.0-rc2.
>
>
> FAQ
>
> =
> How can I help test this release?
> =
> If you are a Spark user, you can help us test this release by taking
> an existing Spark workload and running on this release candidate, then
> reporting any regressions.
>
> If you're working in PySpark, you can set up a virtual env, install
> the current RC, and see if anything important breaks. In Java/Scala,
> you can add the staging repository to your project's resolvers and test
> with the RC (make sure to clean up the artifact cache before/after so
> you don't end up building with an out-of-date RC going forward).
>
> ===
> What should happen to JIRA tickets still targeting 3.3.0?
> ===
> The current list of open tickets targeted at 3.3.0 can be found at:
> https://issues.apache.org/jira/projects/SPARK and search for "Target
> Version/s" = 3.3.0
>
> Committers should look at those and triage. Extremely important bug
> fixes, documentation, and API tweaks that impact compatibility should
> be worked on immediately. Everything else please retarget to an
> appropriate release.
>
> ==
> But my bug isn't fixed?
> ==
> In order to make timely releases, we will typically not hold the
> release unless the bug in question is a regression from the previous
> release. That being said, if there is something which is a regression
> that has not been correctly targeted please ping me or a committer to
> help target the issue.
>
> Maxim Gekk
>
> Software Engineer
>
> Databricks, Inc.
>
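
(Re the "add the staging repository to your project's resolvers" step in the
FAQ above, a minimal build.sbt sketch; the staging URL is the one from this RC
thread, and spark-sql is just an example artifact to resolve against.)

// build.sbt
resolvers += "Apache Spark 3.3.0 RC2 staging" at
  "https://repository.apache.org/content/repositories/orgapachespark-1403/"
libraryDependencies += "org.apache.spark" %% "spark-sql" % "3.3.0"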