Re: [VOTE] Release Spark 3.2.2 (RC1)

2022-07-14 Thread sarutak

I noticed that `TimestampNTZ` is intentionally hidden from the doc.
https://github.com/apache/spark/pull/35313#issuecomment-1185194701

So, it's better to remove the notes about TimestampNTZ from the doc.
But I don't think this issue is a blocker, so +1 on this RC.

Kousuke





Re: [VOTE] Release Spark 3.2.2 (RC1)

2022-07-14 Thread Gengliang Wang
Hi Bruce,

FYI we had further discussions on
https://github.com/apache/spark/pull/35313#issuecomment-1185195455.
Thanks for pointing that out, but this document issue should not be a
blocker of the release.

+1 on the RC.

Gengliang

On Thu, Jul 14, 2022 at 10:22 PM sarutak  wrote:

> Hi Dongjoon and Bruce,
>
> SPARK-36724 is about SessionWindow, while SPARK-38017 and PR #35313 are
> about TimeWindow, and TimeWindow already supports TimestampNTZ in
> v3.2.1.
>
>
> https://github.com/apache/spark/blob/4f25b3f71238a00508a356591553f2dfa89f8290/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/TimeWindow.scala#L99
>
> So, I think that change is still valid.
>
> Kousuke
>
> > Thank you so much, Bruce.
> >
> > After SPARK-36724 landed in Spark 3.3.0, SPARK-38017 seems to have
> > landed on branch-3.2 by mistake here.
> >
> > https://github.com/apache/spark/pull/35313
> >
> > I believe I can remove those four places after uploading the docs to
> > our website.
> >
> > Dongjoon.
> >
> > On Thu, Jul 14, 2022 at 2:16 PM Bruce Robbins 
> > wrote:
> >
> >> A small thing. The function API doc (here [1]) claims that the
> >> window function accepts a timeColumn of TimestampType or
> >> TimestampNTZType. The update to the API doc was made after v3.2.1.
> >>
> >> As far as I can tell, 3.2.2 doesn't support TimestampNTZType.
> >>
> >> On Mon, Jul 11, 2022 at 2:58 PM Dongjoon Hyun
> >>  wrote:
> >>
> >>> Please vote on releasing the following candidate as Apache Spark
> >>> version 3.2.2.
> >>>
> >>> The vote is open until July 15th 1AM (PST) and passes if a
> >>> majority +1 PMC votes are cast, with a minimum of 3 +1 votes.
> >>>
> >>> [ ] +1 Release this package as Apache Spark 3.2.2
> >>> [ ] -1 Do not release this package because ...
> >>>
> >>> To learn more about Apache Spark, please see
> >>> https://spark.apache.org/
> >>>
> >>> The tag to be voted on is v3.2.2-rc1 (commit
> >>> 78a5825fe266c0884d2dd18cbca9625fa258d7f7):
> >>> https://github.com/apache/spark/tree/v3.2.2-rc1
> >>>
> >>> The release files, including signatures, digests, etc. can be
> >>> found at:
> >>> https://dist.apache.org/repos/dist/dev/spark/v3.2.2-rc1-bin/
> >>>
> >>> Signatures used for Spark RCs can be found in this file:
> >>> https://dist.apache.org/repos/dist/dev/spark/KEYS
> >>>
> >>> The staging repository for this release can be found at:
> >>> https://repository.apache.org/content/repositories/orgapachespark-1409/
> >>>
> >>> The documentation corresponding to this release can be found at:
> >>> https://dist.apache.org/repos/dist/dev/spark/v3.2.2-rc1-docs/
> >>>
> >>> The list of bug fixes going into 3.2.2 can be found at the
> >>> following URL:
> >>> https://issues.apache.org/jira/projects/SPARK/versions/12351232
> >>>
> >>> This release is using the release script of the tag v3.2.2-rc1.
> >>>
> >>> FAQ
> >>>
> >>> =
> >>> How can I help test this release?
> >>> =
> >>>
> >>> If you are a Spark user, you can help us test this release by taking
> >>> an existing Spark workload, running it on this release candidate, and
> >>> then reporting any regressions.
> >>>
> >>> If you're working in PySpark, you can set up a virtual env, install
> >>> the current RC, and see if anything important breaks; in Java/Scala,
> >>> you can add the staging repository to your project's resolvers and
> >>> test with the RC (make sure to clean up the artifact cache
> >>> before/after so you don't end up building with an out-of-date RC
> >>> going forward).
> >>>
> >>> ===
> >>> What should happen to JIRA tickets still targeting 3.2.2?
> >>> ===
> >>>
> >>> The current list of open tickets targeted at 3.2.2 can be found
> >>> at:
> >>> https://issues.apache.org/jira/projects/SPARK and search for
> >>> "Target Version/s" = 3.2.2
> >>>
> >>> Committers should look at those and triage. Extremely important
> >>> bug
> >>> fixes, documentation, and API tweaks that impact compatibility
> >>> should
> >>> be worked on immediately. Everything else please retarget to an
> >>> appropriate release.
> >>>
> >>> ==
> >>> But my bug isn't fixed?
> >>> ==
> >>>
> >>> In order to make timely releases, we will typically not hold the
> >>> release unless the bug in question is a regression from the
> >>> previous
> >>> release. That being said, if there is something which is a
> >>> regression
> >>> that has not been correctly targeted please ping me or a committer
> >>> to
> >>> help target the issue.
> >>>
> >>> Dongjoon
> >
> >
> > Links:
> > --
> > [1]
> > https://dist.apache.org/repos/dist/dev/spark/v3.2.2-rc1-docs/_site/api/scala/org/apache/spark/sql/functions$.html
>
>
>


Re: [VOTE] Release Spark 3.2.2 (RC1)

2022-07-14 Thread sarutak

Hi Dongjoon and Bruce,

SPARK-36724 is about SessionWindow, while SPARK-38017 and PR #35313 are 
about TimeWindow, and TimeWindow already supports TimestampNTZ in 
v3.2.1.


https://github.com/apache/spark/blob/4f25b3f71238a00508a356591553f2dfa89f8290/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/TimeWindow.scala#L99

So, I think that change is still valid.

Kousuke


Thank you so much, Bruce.

After SPARK-36724 landed in Spark 3.3.0, SPARK-38017 seems to have
landed on branch-3.2 by mistake here.

https://github.com/apache/spark/pull/35313

I believe I can remove those four places after uploading the docs to
our website.

Dongjoon.

On Thu, Jul 14, 2022 at 2:16 PM Bruce Robbins 
wrote:


A small thing. The function API doc (here [1]) claims that the
window function accepts a timeColumn of TimestampType or
TimestampNTZType. The update to the API doc was made after v3.2.1.

As far as I can tell, 3.2.2 doesn't support TimestampNTZType.

On Mon, Jul 11, 2022 at 2:58 PM Dongjoon Hyun
 wrote:


Please vote on releasing the following candidate as Apache Spark
version 3.2.2.

The vote is open until July 15th 1AM (PST) and passes if a
majority +1 PMC votes are cast, with a minimum of 3 +1 votes.

[ ] +1 Release this package as Apache Spark 3.2.2
[ ] -1 Do not release this package because ...

To learn more about Apache Spark, please see
https://spark.apache.org/

The tag to be voted on is v3.2.2-rc1 (commit
78a5825fe266c0884d2dd18cbca9625fa258d7f7):
https://github.com/apache/spark/tree/v3.2.2-rc1

The release files, including signatures, digests, etc. can be
found at:
https://dist.apache.org/repos/dist/dev/spark/v3.2.2-rc1-bin/

Signatures used for Spark RCs can be found in this file:
https://dist.apache.org/repos/dist/dev/spark/KEYS

The staging repository for this release can be found at:
https://repository.apache.org/content/repositories/orgapachespark-1409/

The documentation corresponding to this release can be found at:
https://dist.apache.org/repos/dist/dev/spark/v3.2.2-rc1-docs/

The list of bug fixes going into 3.2.2 can be found at the
following URL:
https://issues.apache.org/jira/projects/SPARK/versions/12351232

This release is using the release script of the tag v3.2.2-rc1.

FAQ

=
How can I help test this release?
=

If you are a Spark user, you can help us test this release by taking
an existing Spark workload, running it on this release candidate, and
then reporting any regressions.

If you're working in PySpark, you can set up a virtual env, install
the current RC, and see if anything important breaks; in Java/Scala,
you can add the staging repository to your project's resolvers and
test with the RC (make sure to clean up the artifact cache
before/after so you don't end up building with an out-of-date RC
going forward).
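
For Scala projects, a minimal sbt sketch of that setup (an illustration,
not part of the release template; the staging URL is the one listed
above):

    // build.sbt: point a resolver at the RC staging repository and
    // depend on the RC artifacts as if they were the final release.
    resolvers += "Apache Spark 3.2.2 RC1 staging" at
      "https://repository.apache.org/content/repositories/orgapachespark-1409/"
    libraryDependencies += "org.apache.spark" %% "spark-sql" % "3.2.2"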

===
What should happen to JIRA tickets still targeting 3.2.2?
===

The current list of open tickets targeted at 3.2.2 can be found
at:
https://issues.apache.org/jira/projects/SPARK and search for
"Target Version/s" = 3.2.2

Committers should look at those and triage. Extremely important
bug
fixes, documentation, and API tweaks that impact compatibility
should
be worked on immediately. Everything else please retarget to an
appropriate release.

==
But my bug isn't fixed?
==

In order to make timely releases, we will typically not hold the
release unless the bug in question is a regression from the
previous
release. That being said, if there is something which is a
regression
that has not been correctly targeted please ping me or a committer
to
help target the issue.

Dongjoon



Links:
--
[1] 
https://dist.apache.org/repos/dist/dev/spark/v3.2.2-rc1-docs/_site/api/scala/org/apache/spark/sql/functions$.html





Re: [VOTE] Release Spark 3.2.2 (RC1)

2022-07-14 Thread Dongjoon Hyun
Thank you so much, Bruce.

After SPARK-36724 landed in Spark 3.3.0, SPARK-38017 seems to have
landed on branch-3.2 by mistake here.

https://github.com/apache/spark/pull/35313

I believe I can remove those four places after uploading the docs to our
website.

Dongjoon.


On Thu, Jul 14, 2022 at 2:16 PM Bruce Robbins 
wrote:

> A small thing. The function API doc (here:
> https://dist.apache.org/repos/dist/dev/spark/v3.2.2-rc1-docs/_site/api/scala/org/apache/spark/sql/functions$.html)
> claims that the window function accepts a timeColumn of TimestampType or
> TimestampNTZType. The update to the API doc was made after v3.2.1.
>
> As far as I can tell, 3.2.2 doesn't support TimestampNTZType.
>
>
> On Mon, Jul 11, 2022 at 2:58 PM Dongjoon Hyun 
> wrote:
>
>> Please vote on releasing the following candidate as Apache Spark version
>> 3.2.2.
>>
>> The vote is open until July 15th 1AM (PST) and passes if a majority +1
>> PMC votes are cast, with a minimum of 3 +1 votes.
>>
>> [ ] +1 Release this package as Apache Spark 3.2.2
>> [ ] -1 Do not release this package because ...
>>
>> To learn more about Apache Spark, please see https://spark.apache.org/
>>
>> The tag to be voted on is v3.2.2-rc1 (commit
>> 78a5825fe266c0884d2dd18cbca9625fa258d7f7):
>> https://github.com/apache/spark/tree/v3.2.2-rc1
>>
>> The release files, including signatures, digests, etc. can be found at:
>> https://dist.apache.org/repos/dist/dev/spark/v3.2.2-rc1-bin/
>>
>> Signatures used for Spark RCs can be found in this file:
>> https://dist.apache.org/repos/dist/dev/spark/KEYS
>>
>> The staging repository for this release can be found at:
>> https://repository.apache.org/content/repositories/orgapachespark-1409/
>>
>> The documentation corresponding to this release can be found at:
>> https://dist.apache.org/repos/dist/dev/spark/v3.2.2-rc1-docs/
>>
>> The list of bug fixes going into 3.2.2 can be found at the following URL:
>> https://issues.apache.org/jira/projects/SPARK/versions/12351232
>>
>> This release is using the release script of the tag v3.2.2-rc1.
>>
>> FAQ
>>
>> =
>> How can I help test this release?
>> =
>>
>> If you are a Spark user, you can help us test this release by taking
>> an existing Spark workload, running it on this release candidate, and
>> then reporting any regressions.
>>
>> If you're working in PySpark, you can set up a virtual env, install
>> the current RC, and see if anything important breaks; in Java/Scala,
>> you can add the staging repository to your project's resolvers and
>> test with the RC (make sure to clean up the artifact cache before/after
>> so you don't end up building with an out-of-date RC going forward).
>>
>> ===
>> What should happen to JIRA tickets still targeting 3.2.2?
>> ===
>>
>> The current list of open tickets targeted at 3.2.2 can be found at:
>> https://issues.apache.org/jira/projects/SPARK and search for "Target
>> Version/s" = 3.2.2
>>
>> Committers should look at those and triage. Extremely important bug
>> fixes, documentation, and API tweaks that impact compatibility should
>> be worked on immediately. Everything else please retarget to an
>> appropriate release.
>>
>> ==
>> But my bug isn't fixed?
>> ==
>>
>> In order to make timely releases, we will typically not hold the
>> release unless the bug in question is a regression from the previous
>> release. That being said, if there is something which is a regression
>> that has not been correctly targeted please ping me or a committer to
>> help target the issue.
>>
>>
>> Dongjoon
>>
>


Re: [VOTE] Release Spark 3.2.2 (RC1)

2022-07-14 Thread Bruce Robbins
A small thing. The function API doc (here:
https://dist.apache.org/repos/dist/dev/spark/v3.2.2-rc1-docs/_site/api/scala/org/apache/spark/sql/functions$.html)
claims that the window function accepts a timeColumn of TimestampType or
TimestampNTZType. The update to the API doc was made after v3.2.1.

As far as I can tell, 3.2.2 doesn't support TimestampNTZType.
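
To make the claim concrete, here is a minimal sketch of the call in
question (an illustration, not taken from the doc), assuming a DataFrame
`df` with an event-time column `ts`:

    import org.apache.spark.sql.functions.{col, window}
    // Tumbling 5-minute windows keyed on the time column. The doc under
    // discussion says `ts` may be TimestampType or TimestampNTZType; as
    // noted above, only TimestampType appears to work on 3.2.2.
    val counts = df.groupBy(window(col("ts"), "5 minutes")).count()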


On Mon, Jul 11, 2022 at 2:58 PM Dongjoon Hyun 
wrote:

> Please vote on releasing the following candidate as Apache Spark version
> 3.2.2.
>
> The vote is open until July 15th 1AM (PST) and passes if a majority +1 PMC
> votes are cast, with a minimum of 3 +1 votes.
>
> [ ] +1 Release this package as Apache Spark 3.2.2
> [ ] -1 Do not release this package because ...
>
> To learn more about Apache Spark, please see https://spark.apache.org/
>
> The tag to be voted on is v3.2.2-rc1 (commit
> 78a5825fe266c0884d2dd18cbca9625fa258d7f7):
> https://github.com/apache/spark/tree/v3.2.2-rc1
>
> The release files, including signatures, digests, etc. can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v3.2.2-rc1-bin/
>
> Signatures used for Spark RCs can be found in this file:
> https://dist.apache.org/repos/dist/dev/spark/KEYS
>
> The staging repository for this release can be found at:
> https://repository.apache.org/content/repositories/orgapachespark-1409/
>
> The documentation corresponding to this release can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v3.2.2-rc1-docs/
>
> The list of bug fixes going into 3.2.2 can be found at the following URL:
> https://issues.apache.org/jira/projects/SPARK/versions/12351232
>
> This release is using the release script of the tag v3.2.2-rc1.
>
> FAQ
>
> =
> How can I help test this release?
> =
>
> If you are a Spark user, you can help us test this release by taking
> an existing Spark workload, running it on this release candidate, and
> then reporting any regressions.
>
> If you're working in PySpark, you can set up a virtual env, install
> the current RC, and see if anything important breaks; in Java/Scala,
> you can add the staging repository to your project's resolvers and
> test with the RC (make sure to clean up the artifact cache before/after
> so you don't end up building with an out-of-date RC going forward).
>
> ===
> What should happen to JIRA tickets still targeting 3.2.2?
> ===
>
> The current list of open tickets targeted at 3.2.2 can be found at:
> https://issues.apache.org/jira/projects/SPARK and search for "Target
> Version/s" = 3.2.2
>
> Committers should look at those and triage. Extremely important bug
> fixes, documentation, and API tweaks that impact compatibility should
> be worked on immediately. Everything else please retarget to an
> appropriate release.
>
> ==
> But my bug isn't fixed?
> ==
>
> In order to make timely releases, we will typically not hold the
> release unless the bug in question is a regression from the previous
> release. That being said, if there is something which is a regression
> that has not been correctly targeted please ping me or a committer to
> help target the issue.
>
>
> Dongjoon
>


Re: [DISCUSS][Catalog API] Deprecate 4 Catalog API that takes two parameters which are (dbName, tableName/functionName)

2022-07-14 Thread Rui Wang
There were some extra discussions that happened at
https://github.com/apache/spark/pull/37105.

As of now we agreed to have a "soft deprecation":
1. document the limitations of the four APIs and suggest alternatives in
the API doc, and
2. not use the @deprecated annotation.
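
For illustration, a minimal Scala sketch of the two calling styles, using
tableExists as the example (catalog/db/table names are made up; `spark`
is a SparkSession):

    // Two-parameter form (one of the four APIs): resolves only against
    // the session catalog, so it cannot address tables in v2 catalogs.
    val oldStyle = spark.catalog.tableExists("my_db", "my_table")

    // Single-parameter form: with the 3-layer-namespace support described
    // below, it accepts a multi-part identifier, so the analyzer can
    // resolve "catalog.namespace.table" across catalogs.
    val newStyle = spark.catalog.tableExists("my_catalog.my_db.my_table")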

Please let us know if you don't agree.


-Rui

On Fri, Jul 8, 2022 at 11:18 AM Rui Wang  wrote:

> Yes. The current goal is a purely educational deprecation.
>
> So given the proposal:
> 1. existing users or users who do not care about catalog names in table
> identifiers can still use all the API that maintain their past behavior.
> 2. new users who intend to use table identifiers with catalog names
> are warned by the annotation (and perhaps additional comments on the API
> surface) that these 4 APIs will not serve their use case.
>
> I believe this proposal is conservative: it does not intend to cause
> trouble for existing users, force user migration, delete APIs, or hurt
> supportability. If there is anything I can do to make this goal clearer,
> I will do it.
>
> Ultimately, the 4 APIs in this thread have the problem that they are not
> compatible with the 3-layer namespace, unlike the other APIs that support
> it. For people who want to include catalog names, the problem will
> remain, and we probably have to do something about it.
>
> -Rui
>
> On Fri, Jul 8, 2022 at 7:24 AM Wenchen Fan  wrote:
>
>> It's better to keep all APIs working. But in this case, I really have no
>> idea how to make these 4 APIs reasonable. For example, tableExists(dbName:
>> String, tableName: String) currently checks if table "dbName.tableName"
>> exists in the Hive metastore, and does not work with v2 catalogs at all.
>> It's not only a "not needed" API, but also a confusing API. We need a
>> mechanism to move users away from confusing APIs.
>>
>> I agree that we should not abuse deprecation. I think a general principle
>> for using deprecation is that you intend to remove the API eventually,
>> which is exactly the case here. We should remove these 4 APIs when most
>> users have moved away.
>>
>> Thanks,
>> Wenchen
>>
>> On Fri, Jul 8, 2022 at 2:49 PM Dongjoon Hyun 
>> wrote:
>>
>>> Thank you for starting the official discussion, Rui.
>>>
>>> 'Unneeded API' doesn't sound like a good frame for this discussion
>>> because it completely ignores existing users and code.
>>> Technically, the above-mentioned reasons look irrelevant to any
>>> specific existing bugs or future maintenance cost savings. Instead, the
>>> deprecation already imposes costs on the community (your PR, the future
>>> migration guide, and the communication with the customers like Q)
>>> and on the users for the actual migration to the new APIs and
>>> validation. Given that, for now, the goal of this proposal looks purely
>>> educational: advertising the new APIs to Apache Spark 3.4+ users.
>>>
>>> Can we be more conservative about deprecation in Apache Spark and allow
>>> users to use both APIs freely, without any concern about uncertain
>>> future supportability? I simply want to avoid the situation where this
>>> purely educational deprecation itself becomes an 'unneeded deprecation'
>>> in the community.
>>>
>>> Dongjoon.
>>>
>>> On Thu, Jul 7, 2022 at 2:26 PM Rui Wang  wrote:
>>> >
>>> > I want to highlight this in case I missed it in the original email:
>>> >
>>> > The 4 APIs will not be deleted. They will just be marked with
>>> > deprecation annotations, and we encourage users to use the alternatives.
>>> >
>>> >
>>> > -Rui
>>> >
>>> > On Thu, Jul 7, 2022 at 2:23 PM Rui Wang  wrote:
>>> >>
>>> >> Hi Community,
>>> >>
>>> >> Proposal:
>>> >> I want to discuss a proposal to deprecate the following Catalog APIs:
>>> >> def listColumns(dbName: String, tableName: String): Dataset[Column]
>>> >> def getTable(dbName: String, tableName: String): Table
>>> >> def getFunction(dbName: String, functionName: String): Function
>>> >> def tableExists(dbName: String, tableName: String): Boolean
>>> >>
>>> >>
>>> >> Context:
>>> >> We have been adding support for table identifiers with catalog
>>> >> names (aka 3-layer namespace) to the Catalog API in
>>> >> https://issues.apache.org/jira/browse/SPARK-39235.
>>> >> The basic idea is, if an API accepts:
>>> >> 1. only tableName: String, we allow it to accept "a.b.c", which the
>>> >> analyzer resolves with a as the catalog name, b as the namespace
>>> >> name, and c as the table name.
>>> >> 2. only dbName: String, we allow it to accept "a.b", which the
>>> >> analyzer resolves with a as the catalog name and b as the namespace
>>> >> name.
>>> >> Meanwhile, we still maintain backwards compatibility for such APIs,
>>> >> to make sure past behavior remains the same. E.g. if you only use
>>> >> tableName, it is still resolved by the session catalog.
>>> >>
>>> >> With this effort ongoing, the above 4 APIs are not fully compatible
>>> >> with the 3-layer namespace.
>>> >>
>>> >> Take tableExists(dbName: String, tableName: String) as an example:
>>> >> given that it takes two parameters but leaves no 

Looking for Review for SPARK-39091

2022-07-14 Thread Neil Gupta
Hello,

I am not sure if this is the best place to ask, but I submitted a PR for
SPARK-39091 some time ago and was wondering if anyone with write access
has the bandwidth to review my work.

This is the PR: https://github.com/apache/spark/pull/36441

Thanks,
Neil


Re: [VOTE] Release Spark 3.2.2 (RC1)

2022-07-14 Thread Chao Sun
+1 (non-binding)

On Thu, Jul 14, 2022 at 12:40 AM Wenchen Fan  wrote:
>
> +1
>
> On Wed, Jul 13, 2022 at 7:29 PM Yikun Jiang  wrote:
>>
>> +1 (non-binding)
>>
>> Checked out the tag, built from source on Linux aarch64, and ran some
>> basic tests.
>>
>>
>> Regards,
>> Yikun
>>
>>
>> On Wed, Jul 13, 2022 at 5:54 AM Mridul Muralidharan  wrote:
>>>
>>>
>>> +1
>>>
>>> Signatures, digests, etc check out fine.
>>> Checked out tag and build/tested with "-Pyarn -Pmesos -Pkubernetes"
>>>
>>> As always, the test "SPARK-33084: Add jar support Ivy URI in SQL" in 
>>> sql.SQLQuerySuite fails in my env; but other than that, the rest looks good.
>>>
>>> Regards,
>>> Mridul
>>>
>>>
>>> On Tue, Jul 12, 2022 at 3:17 AM Maxim Gekk 
>>>  wrote:

 +1

 On Tue, Jul 12, 2022 at 11:05 AM Yang,Jie(INF)  wrote:
>
> +1 (non-binding)
>
>
>
> Yang Jie
>
>
>
>
>
> From: Dongjoon Hyun 
> Date: Tuesday, July 12, 2022, 16:03
> To: dev 
> Cc: Cheng Su , "Yang,Jie(INF)" , Sean Owen 
> Subject: Re: [VOTE] Release Spark 3.2.2 (RC1)
>
>
>
> +1
>
>
>
> Dongjoon.
>
>
>
> On Mon, Jul 11, 2022 at 11:34 PM Cheng Su  wrote:
>
> +1 (non-binding). Built from source, and ran some scala unit tests on M1 
> mac, with OpenJDK 8 and Scala 2.12.
>
>
>
> Thanks,
>
> Cheng Su
>
>
>
> On Mon, Jul 11, 2022 at 10:31 PM Yang,Jie(INF)  
> wrote:
>
> Does this happen when running all UTs? I ran this suite several times 
> alone using OpenJDK(zulu) 8u322-b06 on my Mac, but no similar error 
> occurred
>
>
>
> From: Sean Owen 
> Date: Tuesday, July 12, 2022, 10:45
> To: Dongjoon Hyun 
> Cc: dev 
> Subject: Re: [VOTE] Release Spark 3.2.2 (RC1)
>
>
>
> Is anyone seeing this error? I'm on OpenJDK 8 on a Mac:
>
>
>
> #
> # A fatal error has been detected by the Java Runtime Environment:
> #
> #  SIGSEGV (0xb) at pc=0x000101ca8ace, pid=11962, 
> tid=0x1603
> #
> # JRE version: OpenJDK Runtime Environment (8.0_322) (build 
> 1.8.0_322-bre_2022_02_28_15_01-b00)
> # Java VM: OpenJDK 64-Bit Server VM (25.322-b00 mixed mode bsd-amd64 
> compressed oops)
> # Problematic frame:
> # V  [libjvm.dylib+0x549ace]
> #
> # Failed to write core dump. Core dumps have been disabled. To enable 
> core dumping, try "ulimit -c unlimited" before starting Java again
> #
> # An error report file with more information is saved as:
> # /private/tmp/spark-3.2.2/sql/core/hs_err_pid11962.log
> ColumnVectorSuite:
> - boolean
> - byte
> Compiled method (nm)  885897 75403 n 0   
> sun.misc.Unsafe::putShort (native)
>  total in heap  [0x000102fdaa10,0x000102fdad48] = 824
>  relocation [0x000102fdab38,0x000102fdab78] = 64
>  main code  [0x000102fdab80,0x000102fdad48] = 456
> Compiled method (nm)  885897 75403 n 0   
> sun.misc.Unsafe::putShort (native)
>  total in heap  [0x000102fdaa10,0x000102fdad48] = 824
>  relocation [0x000102fdab38,0x000102fdab78] = 64
>  main code  [0x000102fdab80,0x000102fdad48] = 456
>
>
>
> On Mon, Jul 11, 2022 at 4:58 PM Dongjoon Hyun  
> wrote:
>
> Please vote on releasing the following candidate as Apache Spark version 
> 3.2.2.
>
> The vote is open until July 15th 1AM (PST) and passes if a majority +1 
> PMC votes are cast, with a minimum of 3 +1 votes.
>
> [ ] +1 Release this package as Apache Spark 3.2.2
> [ ] -1 Do not release this package because ...
>
> To learn more about Apache Spark, please see https://spark.apache.org/
>
> The tag to be voted on is v3.2.2-rc1 (commit 
> 78a5825fe266c0884d2dd18cbca9625fa258d7f7):
> https://github.com/apache/spark/tree/v3.2.2-rc1
>
> The release files, including signatures, digests, etc. can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v3.2.2-rc1-bin/
>
> Signatures used for Spark RCs can be found in this file:
> https://dist.apache.org/repos/dist/dev/spark/KEYS
>
> The staging repository for this release can be found at:
> https://repository.apache.org/content/repositories/orgapachespark-1409/
>
> The documentation corresponding to this release can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v3.2.2-rc1-docs/
>
> The list of bug fixes going into 3.2.2 can be found at the following URL:
> https://issues.apache.org/jira/projects/SPARK/versions/12351232
>
> This release is using the release script of the tag v3.2.2-rc1.
>
> FAQ
>
> =
> How can I help test this release?
> =
>
> If you are a Spark user, you can help us test this release by 

Spark-39755 Review/comment

2022-07-14 Thread Pralabh Kumar
Hi Dev community

Please review/comment on:

https://issues.apache.org/jira/browse/SPARK-39755

Regards
Pralabh kumar


Re: [SPARK-39515] Improve scheduled jobs in GitHub Actions

2022-07-14 Thread Yikun Jiang
With help from the community, the switch to cache-based jobs is now
complete!

* About the ghcr images:

You might notice that two images are generated in apache ghcr:

- Image cache: spark/apache-spark-github-action-image-cache: this is the
cache built from each branch's dev/infra/Dockerfile.

- CI image: apache-spark-ci-image: this is for scheduled jobs. It builds
an image just-in-time from the cache and then uses it to run the CI jobs.

- Distributed (user) CI image: such as yikun/apache-spark-ci-image: this
is for PR-triggered jobs, again built just-in-time from the cache and
used to execute the CI job(s) in the user's GitHub Actions space.

* About the job:

For Lint/PySpark/SparkR jobs, "Base image build" will do a just-in-time
build and generate a ci-image for each PR, and jobs use the image as the
job container image.

* About how to change the infra deps:

Currently, the CI image is just like a static image unless you change the
Dockerfile.

- If you want to change the version of a dependency of Lint/PySpark/SparkR
jobs, you could change the dev/infra/Dockerfile just like
https://github.com/apache/spark/pull/37175.

- If you want to trigger a full refresh, you can just change the
FULL_REFRESH_DATE in the Dockerfile.

FYI, I have also updated the doc at
https://docs.google.com/document/d/1_uiId-U1DODYyYZejAZeyz2OAjxcnA-xfwjynDF6vd0
to help you understand.


Through this work, I can really feel the effort behind the previous
maintenance! A simple version bump of a dependency may lead to a lot of
investigation! Thanks to HyukjinKwon, Dongjoon, and the whole community
for always keeping the infra deps up to date!

Feel free to ping me if you have any other concerns or ideas!

Regards,
Yikun


On Mon, Jun 27, 2022 at 12:05 AM Yikun Jiang  wrote:

> > There's one last task, to simply cache the Docker image
> > (https://issues.apache.org/jira/browse/SPARK-39522).
> I will have to be less active for this week and next week because of the
> Spark Summit. Would appreciate if somebody
> finds some time to take a stab.
>
> I did some investigations on the Spark container jobs
> (pyspark/sparkr/lint) using the cache, and drafted a doc to help you
> understand #36980:
>
> https://docs.google.com/document/d/1_uiId-U1DODYyYZejAZeyz2OAjxcnA-xfwjynDF6vd0
>
>
> > About a quick hallway meetup, I will be there after Holden’s talk at
> least to say hello to her :-).
>
> Some topics I was interested in, related to the CI build:
> - K8s integration tests on GA:
> - Helping various OS and architecture/hardware (x86/arm64, GPU)
> integration support: what can we do to help improve this?
> Please feel free to ping me if necessary. It's a bit of a pity that I
> won't have the opportunity to be there; I hope you guys have a fabulous
> meetup at the summit!
>
> Regards,
> Yikun
>
>
> On Fri, Jun 24, 2022 at 11:15 AM Dongjoon Hyun 
> wrote:
>
>> Yep, I'll be there too. Thank you for the adjustment. See you soon. :)
>>
>> Dongjoon.
>>
>> On Thu, Jun 23, 2022 at 4:59 PM Hyukjin Kwon  wrote:
>>
>>> Alright, I'll be there after Holden's talk Thursday
>>> https://databricks.com/dataaisummit/session/tools-assisted-apache-spark-version-migrations-21-32
>>> w/ Dongjoon (since he manages OSS Jenkins too).
>>> Let's have a quickie chat :-).
>>>
>>> On Thu, 23 Jun 2022 at 06:16, Hyukjin Kwon  wrote:
>>>
 Oops, I was confused about the time and distance in the US. I won't
 make it either.
 Let me find another time slot that works for more ppl.

 On Thu, 23 Jun 2022 at 00:19, Dongjoon Hyun 
 wrote:

> Thank you, Hyukjin! :)
>
> BTW, unfortunately, it seems that I cannot join that quick meeting.
> I have another schedule at South Bay around 7PM and need to leave San
> Francisco at least 5PM.
>
> Dongjoon.
>
>
> On Wed, Jun 22, 2022 at 3:39 AM Hyukjin Kwon 
> wrote:
>
>> (cc @Yikun Jiang  @Gengliang Wang
>>  @Maxim Gekk
>>  @Yang,Jie(INF)  FYI)
>>
>> On Wed, 22 Jun 2022 at 19:34, Hyukjin Kwon 
>> wrote:
>>
>>> Couple of updates:
>>>
>>>-
>>>
>>>All builds passed now with all combinations we defined in the
>>>GitHub Actions (e.g., branch-3.2, branch-3.3, JDK 11,
>>>JDK 17 and Scala 2.13), see
>>>https://github.com/apache/spark/actions cc @Tom Graves
>>> @Dongjoon Hyun 
>>> FYI
>>>-
>>>
>>>except one test that is failing due to OOM. That’s being
>>>fixed at 

Re: [VOTE] Release Spark 3.2.2 (RC1)

2022-07-14 Thread Wenchen Fan
+1

On Wed, Jul 13, 2022 at 7:29 PM Yikun Jiang  wrote:

> +1 (non-binding)
>
> Checked out the tag, built from source on Linux aarch64, and ran some
> basic tests.
>
>
> Regards,
> Yikun
>
>
> On Wed, Jul 13, 2022 at 5:54 AM Mridul Muralidharan 
> wrote:
>
>>
>> +1
>>
>> Signatures, digests, etc check out fine.
>> Checked out tag and build/tested with "-Pyarn -Pmesos -Pkubernetes"
>>
>> As always, the test "SPARK-33084: Add jar support Ivy URI in SQL" in
>> sql.SQLQuerySuite fails in my env; but other than that, the rest looks good.
>>
>> Regards,
>> Mridul
>>
>>
>> On Tue, Jul 12, 2022 at 3:17 AM Maxim Gekk
>>  wrote:
>>
>>> +1
>>>
>>> On Tue, Jul 12, 2022 at 11:05 AM Yang,Jie(INF) 
>>> wrote:
>>>
 +1 (non-binding)



 Yang Jie





 From: Dongjoon Hyun 
 Date: Tuesday, July 12, 2022, 16:03
 To: dev 
 Cc: Cheng Su , "Yang,Jie(INF)" , Sean Owen 
 Subject: Re: [VOTE] Release Spark 3.2.2 (RC1)



 +1



 Dongjoon.



 On Mon, Jul 11, 2022 at 11:34 PM Cheng Su  wrote:

 +1 (non-binding). Built from source, and ran some scala unit tests on
 M1 mac, with OpenJDK 8 and Scala 2.12.



 Thanks,

 Cheng Su



 On Mon, Jul 11, 2022 at 10:31 PM Yang,Jie(INF) 
 wrote:

 Does this happen when running all UTs? I ran this suite several times
 alone using OpenJDK(zulu) 8u322-b06 on my Mac, but no similar error
 occurred



 From: Sean Owen 
 Date: Tuesday, July 12, 2022, 10:45
 To: Dongjoon Hyun 
 Cc: dev 
 Subject: Re: [VOTE] Release Spark 3.2.2 (RC1)



 Is anyone seeing this error? I'm on OpenJDK 8 on a Mac:



 #
 # A fatal error has been detected by the Java Runtime Environment:
 #
 #  SIGSEGV (0xb) at pc=0x000101ca8ace, pid=11962,
 tid=0x1603
 #
 # JRE version: OpenJDK Runtime Environment (8.0_322) (build
 1.8.0_322-bre_2022_02_28_15_01-b00)
 # Java VM: OpenJDK 64-Bit Server VM (25.322-b00 mixed mode bsd-amd64
 compressed oops)
 # Problematic frame:
 # V  [libjvm.dylib+0x549ace]
 #
 # Failed to write core dump. Core dumps have been disabled. To enable
 core dumping, try "ulimit -c unlimited" before starting Java again
 #
 # An error report file with more information is saved as:
 # /private/tmp/spark-3.2.2/sql/core/hs_err_pid11962.log
 ColumnVectorSuite:
 - boolean
 - byte
 Compiled method (nm)  885897 75403 n 0
 sun.misc.Unsafe::putShort (native)
  total in heap  [0x000102fdaa10,0x000102fdad48] = 824
  relocation [0x000102fdab38,0x000102fdab78] = 64
  main code  [0x000102fdab80,0x000102fdad48] = 456
 Compiled method (nm)  885897 75403 n 0
 sun.misc.Unsafe::putShort (native)
  total in heap  [0x000102fdaa10,0x000102fdad48] = 824
  relocation [0x000102fdab38,0x000102fdab78] = 64
  main code  [0x000102fdab80,0x000102fdad48] = 456



 On Mon, Jul 11, 2022 at 4:58 PM Dongjoon Hyun 
 wrote:

 Please vote on releasing the following candidate as Apache Spark
 version 3.2.2.

 The vote is open until July 15th 1AM (PST) and passes if a majority +1
 PMC votes are cast, with a minimum of 3 +1 votes.

 [ ] +1 Release this package as Apache Spark 3.2.2
 [ ] -1 Do not release this package because ...

 To learn more about Apache Spark, please see https://spark.apache.org/
 

 The tag to be voted on is v3.2.2-rc1 (commit
 78a5825fe266c0884d2dd18cbca9625fa258d7f7):
 https://github.com/apache/spark/tree/v3.2.2-rc1
 

 The release files, including signatures, digests, etc. can be found at:
 https://dist.apache.org/repos/dist/dev/spark/v3.2.2-rc1-bin/
 

 Signatures used for Spark RCs can be found in this file:
 https://dist.apache.org/repos/dist/dev/spark/KEYS
 

 The staging repository for this release can be found at:
 https://repository.apache.org/content/repositories/orgapachespark-1409/
 

 The documentation corresponding to this release can be found at:
 

How to set platform-level defaults for array-like configs?

2022-07-14 Thread Shardul Mahadik
Hi Spark devs,

Spark contains a bunch of array-like configs (comma-separated lists). Some
examples include `spark.sql.extensions`,
`spark.sql.queryExecutionListeners`, `spark.jars.repositories`,
`spark.extraListeners`, `spark.driver.extraClassPath` and so on (there are
a dozen or so more). As owners of the Spark platform in our organization,
we would like to set platform-level defaults, e.g. custom SQL extensions and
listeners, and we use some of the above-mentioned properties to do so. At
the same time, we have power users writing their own listeners, setting the
same Spark confs and thus unintentionally overriding our platform defaults.
This leads to a loss of functionality within our platform.
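
A minimal sketch of the collision (the class names are made up):

    // Platform default, set by admins in spark-defaults.conf:
    //   spark.sql.extensions=com.example.platform.PlatformExtensions
    import org.apache.spark.sql.SparkSession

    // A power user sets the same conf in their app. The value replaces,
    // rather than appends to, the platform default:
    val spark = SparkSession.builder()
      .config("spark.sql.extensions", "com.example.user.UserExtensions")
      .getOrCreate()
    // Result: only UserExtensions is loaded; the platform one is lost.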

Previously, Spark has introduced "default" confs for a few of these
array-like configs, e.g. `spark.plugins.defaultList` for `spark.plugins`,
`spark.driver.defaultJavaOptions` for `spark.driver.extraJavaOptions`.
These properties are meant to be set only by cluster admins, thus allowing
separation between platform defaults and user configs. However, as discussed
in https://github.com/apache/spark/pull/34856, these configs are still
client-side and can still be overridden; the approach also does not scale,
since we cannot introduce one new "default" config for every array-like
config.
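
For contrast, a rough sketch of how such a "default" companion conf is
meant to combine with its user-facing counterpart (an illustration of the
merge semantics, not Spark's actual code):

    // Admin defaults survive because the two lists are concatenated
    // instead of one value overwriting the other.
    def effectivePlugins(conf: Map[String, String]): Seq[String] = {
      val defaults = conf.getOrElse("spark.plugins.defaultList", "")
      val user = conf.getOrElse("spark.plugins", "")
      (defaults.split(",") ++ user.split(","))
        .map(_.trim).filter(_.nonEmpty).distinct.toSeq
    }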

I wanted to know if others have experienced this issue and what systems
have been implemented to tackle it. Are there any existing solutions,
either client-side or server-side (e.g. at a job submission server)? Even
though we cannot easily enforce this client-side, the simplicity of a
client-side solution may make it more appealing.

Thanks,
Shardul