Re: [VOTE] SPIP: Stored Procedures API for Catalogs

2024-05-11 Thread huaxin gao
+1

On Sat, May 11, 2024 at 4:35 PM L. C. Hsieh  wrote:

> +1
>
> On Sat, May 11, 2024 at 3:11 PM Chao Sun  wrote:
> >
> > +1
> >
> > On Sat, May 11, 2024 at 2:10 PM L. C. Hsieh  wrote:
> >>
> >> Hi all,
> >>
> >> I’d like to start a vote for SPIP: Stored Procedures API for Catalogs.
> >>
> >> Please also refer to:
> >>
> >>- Discussion thread:
> >> https://lists.apache.org/thread/7r04pz544c9qs3gc8q2nyj3fpzfnv8oo
> >>- JIRA ticket: https://issues.apache.org/jira/browse/SPARK-44167
> >>- SPIP doc:
> https://docs.google.com/document/d/1rDcggNl9YNcBECsfgPcoOecHXYZOu29QYFrloo2lPBg/
> >>
> >>
> >> Please vote on the SPIP for the next 72 hours:
> >>
> >> [ ] +1: Accept the proposal as an official SPIP
> >> [ ] +0
> >> [ ] -1: I don’t think this is a good idea because …
> >>
> >>
> >> Thank you!
> >>
> >> Liang-Chi Hsieh
> >>
> >> -
> >> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
> >>
>
> -
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>
>


Re: [VOTE] SPIP: Stored Procedures API for Catalogs

2024-05-11 Thread L. C. Hsieh
+1

On Sat, May 11, 2024 at 3:11 PM Chao Sun  wrote:
>
> +1
>
> On Sat, May 11, 2024 at 2:10 PM L. C. Hsieh  wrote:
>>
>> Hi all,
>>
>> I’d like to start a vote for SPIP: Stored Procedures API for Catalogs.
>>
>> Please also refer to:
>>
>>- Discussion thread:
>> https://lists.apache.org/thread/7r04pz544c9qs3gc8q2nyj3fpzfnv8oo
>>- JIRA ticket: https://issues.apache.org/jira/browse/SPARK-44167
>>- SPIP doc: 
>> https://docs.google.com/document/d/1rDcggNl9YNcBECsfgPcoOecHXYZOu29QYFrloo2lPBg/
>>
>>
>> Please vote on the SPIP for the next 72 hours:
>>
>> [ ] +1: Accept the proposal as an official SPIP
>> [ ] +0
>> [ ] -1: I don’t think this is a good idea because …
>>
>>
>> Thank you!
>>
>> Liang-Chi Hsieh
>>
>> -
>> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>>

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Re: [VOTE] SPIP: Stored Procedures API for Catalogs

2024-05-11 Thread Chao Sun
+1

On Sat, May 11, 2024 at 2:10 PM L. C. Hsieh  wrote:

> Hi all,
>
> I’d like to start a vote for SPIP: Stored Procedures API for Catalogs.
>
> Please also refer to:
>
>- Discussion thread:
> https://lists.apache.org/thread/7r04pz544c9qs3gc8q2nyj3fpzfnv8oo
>- JIRA ticket: https://issues.apache.org/jira/browse/SPARK-44167
>- SPIP doc:
> https://docs.google.com/document/d/1rDcggNl9YNcBECsfgPcoOecHXYZOu29QYFrloo2lPBg/
>
>
> Please vote on the SPIP for the next 72 hours:
>
> [ ] +1: Accept the proposal as an official SPIP
> [ ] +0
> [ ] -1: I don’t think this is a good idea because …
>
>
> Thank you!
>
> Liang-Chi Hsieh
>
> -
> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
>
>


[VOTE] SPIP: Stored Procedures API for Catalogs

2024-05-11 Thread L. C. Hsieh
Hi all,

I’d like to start a vote for SPIP: Stored Procedures API for Catalogs.

Please also refer to:

   - Discussion thread:
https://lists.apache.org/thread/7r04pz544c9qs3gc8q2nyj3fpzfnv8oo
   - JIRA ticket: https://issues.apache.org/jira/browse/SPARK-44167
   - SPIP doc: 
https://docs.google.com/document/d/1rDcggNl9YNcBECsfgPcoOecHXYZOu29QYFrloo2lPBg/


Please vote on the SPIP for the next 72 hours:

[ ] +1: Accept the proposal as an official SPIP
[ ] +0
[ ] -1: I don’t think this is a good idea because …


Thank you!

Liang-Chi Hsieh

-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Re: [DISCUSS] SPIP: Stored Procedures API for Catalogs

2024-05-11 Thread Mich Talebzadeh
Thanks

In the context of a stored procedures API for Catalogs, this approach
deviates from the traditional definition of stored procedures in an RDBMS
for two key reasons:

   - Compilation vs. interpretation: traditional stored procedures are
   typically pre-compiled into an executable form (an execution plan or
   similar) for faster execution. This approach instead loads and interprets
   the procedure code on demand, much as scripts are run in languages like
   Python.
   - Schema changes and invalidation: in an RDBMS, changes to the underlying
   tables can invalidate compiled procedures because they might reference
   columns that no longer exist or have incompatible data types. This
   approach aims to avoid invalidation by potentially adapting to minor
   schema changes at execution time.

So, while it keeps the concept of pre-defined procedures stored within the
database and accessible through the Catalog API, this approach behaves more
like dynamic scripts than like traditional compiled stored procedures.
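
As a purely hypothetical illustration of that "dynamic script" behaviour, the
sketch below shows what invoking such a procedure from Spark could look like.
The catalog name, procedure name, parameter, and the CALL syntax itself are
illustrative assumptions (modelled on the Trino-style approach referenced in
this thread), not confirmed Spark syntax:

import org.apache.spark.sql.SparkSession

object CallProcedureSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("call-procedure-sketch").getOrCreate()

    // The procedure is looked up through the catalog and interpreted at call
    // time, like a script: there is no pre-compiled object that a later DDL
    // change on 'db.events' could invalidate.
    spark.sql("CALL my_catalog.system.compact_table(table => 'db.events')")

    spark.stop()
  }
}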

HTH

Mich Talebzadeh,
Technologist | Architect | Data Engineer | Generative AI | FinCrime
London
United Kingdom

https://en.everybodywiki.com/Mich_Talebzadeh

Disclaimer: The information provided is correct to the best of my knowledge
but of course cannot be guaranteed. It is essential to note that, as with
any advice, quote "one test result is worth one-thousand expert opinions
(Werner Von Braun)".



On Sat, 11 May 2024 at 19:25, Anton Okolnychyi 
wrote:

> Mich, I don't think the invalidation will be necessary in our case as
> there is no plan to preprocess or compile the procedures into executable
> objects. They will be loaded and executed on demand via the Catalog API.
>
> On Fri, 10 May 2024 at 10:37, Mich Talebzadeh
> wrote:
>
>> Hi,
>>
>> If the underlying table changes (DDL), then, if I recall correctly from
>> RDBMSs like Oracle, the stored procedure will be invalidated because it is
>> a compiled object. How is this going to be handled? Does it follow the same
>> mechanism?
>>
>> Thanks
>>
>> Mich Talebzadeh,
>> Technologist | Architect | Data Engineer  | Generative AI | FinCrime
>> London
>> United Kingdom
>>
>>
>>view my Linkedin profile
>> 
>>
>>
>>  https://en.everybodywiki.com/Mich_Talebzadeh
>>
>>
>>
>> *Disclaimer:* The information provided is correct to the best of my
>> knowledge but of course cannot be guaranteed . It is essential to note
>> that, as with any advice, quote "one test result is worth one-thousand
>> expert opinions (Werner Von Braun)".
>>
>>
>> On Sat, 20 Apr 2024 at 02:34, Anton Okolnychyi 
>> wrote:
>>
>>> Hi folks,
>>>
>>> I'd like to start a discussion on SPARK-44167 that aims to enable
>>> catalogs to expose custom routines as stored procedures. I believe this
>>> functionality will enhance Spark’s ability to interact with external
>>> connectors and allow users to perform more operations in plain SQL.
>>>
>>> SPIP [1] contains proposed API changes and parser extensions. Any
>>> feedback is more than welcome!
>>>
>>> Unlike the initial proposal for stored procedures with Python [2], this
>>> one focuses on exposing pre-defined stored procedures via the catalog API.
>>> This approach is inspired by a similar functionality in Trino and avoids
>>> the challenges of supporting user-defined routines discussed earlier [3].
>>>
>>> Liang-Chi was kind enough to shepherd this effort. Thanks!
>>>
>>> - Anton
>>>
>>> [1] -
>>> https://docs.google.com/document/d/1rDcggNl9YNcBECsfgPcoOecHXYZOu29QYFrloo2lPBg/
>>> [2] -
>>> https://docs.google.com/document/d/1ce2EZrf2BxHu7TjfGn4TgToK3TBYYzRkmsIVcfmkNzE/
>>> [3] - https://lists.apache.org/thread/lkjm9r7rx7358xxn2z8yof4wdknpzg3l
>>>
>>>
>>>
>>>


Re: [DISCUSS] SPIP: Stored Procedures API for Catalogs

2024-05-11 Thread Anton Okolnychyi
Mich, I don't think the invalidation will be necessary in our case as there
is no plan to preprocess or compile the procedures into executable objects.
They will be loaded and executed on demand via the Catalog API.
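
As a rough, self-contained illustration of that on-demand model, here is a
hypothetical sketch of a catalog that exposes procedures. All interface and
method names below are assumptions made for the example, not the SPIP's actual
API; the authoritative design is in the SPIP doc linked in this thread.

// Hypothetical sketch only: names and shapes are assumptions, not the
// proposed API (see the SPIP doc for the actual design).
case class Identifier(namespace: Seq[String], name: String)

// A procedure is resolved and executed at call time; nothing is pre-compiled.
trait Procedure {
  def name: String
  def execute(args: Map[String, Any]): Seq[String]
}

// A catalog that can expose pre-defined procedures in addition to tables.
trait ProcedureCatalog {
  def loadProcedure(ident: Identifier): Procedure
}

// Example connector-side catalog registering one built-in procedure.
class DemoCatalog extends ProcedureCatalog {
  private val procedures: Map[String, Procedure] = Map(
    "refresh_table" -> new Procedure {
      val name = "refresh_table"
      def execute(args: Map[String, Any]): Seq[String] =
        Seq(s"refreshed ${args.getOrElse("table", "<unknown>")}")
    }
  )

  override def loadProcedure(ident: Identifier): Procedure =
    procedures.getOrElse(
      ident.name,
      throw new NoSuchElementException(s"No such procedure: ${ident.name}"))
}

Because resolution happens through loadProcedure at execution time, there is
no compiled artifact to invalidate when table schemas change, which is the
point made above.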

On Fri, 10 May 2024 at 10:37, Mich Talebzadeh
wrote:

> Hi,
>
> If the underlying table changes (DDL), then, if I recall correctly from
> RDBMSs like Oracle, the stored procedure will be invalidated because it is
> a compiled object. How is this going to be handled? Does it follow the same
> mechanism?
>
> Thanks
>
> Mich Talebzadeh,
> Technologist | Architect | Data Engineer  | Generative AI | FinCrime
> London
> United Kingdom
>
>
>view my Linkedin profile
> 
>
>
>  https://en.everybodywiki.com/Mich_Talebzadeh
>
>
>
> *Disclaimer:* The information provided is correct to the best of my
> knowledge but of course cannot be guaranteed . It is essential to note
> that, as with any advice, quote "one test result is worth one-thousand
> expert opinions (Werner Von Braun)".
>
>
> On Sat, 20 Apr 2024 at 02:34, Anton Okolnychyi 
> wrote:
>
>> Hi folks,
>>
>> I'd like to start a discussion on SPARK-44167 that aims to enable
>> catalogs to expose custom routines as stored procedures. I believe this
>> functionality will enhance Spark’s ability to interact with external
>> connectors and allow users to perform more operations in plain SQL.
>>
>> SPIP [1] contains proposed API changes and parser extensions. Any
>> feedback is more than welcome!
>>
>> Unlike the initial proposal for stored procedures with Python [2], this
>> one focuses on exposing pre-defined stored procedures via the catalog API.
>> This approach is inspired by a similar functionality in Trino and avoids
>> the challenges of supporting user-defined routines discussed earlier [3].
>>
>> Liang-Chi was kind enough to shepherd this effort. Thanks!
>>
>> - Anton
>>
>> [1] -
>> https://docs.google.com/document/d/1rDcggNl9YNcBECsfgPcoOecHXYZOu29QYFrloo2lPBg/
>> [2] -
>> https://docs.google.com/document/d/1ce2EZrf2BxHu7TjfGn4TgToK3TBYYzRkmsIVcfmkNzE/
>> [3] - https://lists.apache.org/thread/lkjm9r7rx7358xxn2z8yof4wdknpzg3l
>>
>>
>>
>>


Re: [VOTE] SPARK 4.0.0-preview1 (RC1)

2024-05-11 Thread Cheng Pan
-1 (non-binding)

A small question: the tag is an orphan commit, but I suppose it should belong
to the master branch.

It seems the YARN integration is broken due to the javax => jakarta namespace
migration. I filed SPARK-48238 and left some comments on
https://github.com/apache/spark/pull/45154

Caused by: java.lang.IllegalStateException: class org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter is not a jakarta.servlet.Filter
    at org.sparkproject.jetty.servlet.FilterHolder.doStart(FilterHolder.java:99) ~[spark-core_2.13-4.0.0-preview1.jar:4.0.0-preview1]
    at org.sparkproject.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:93) ~[spark-core_2.13-4.0.0-preview1.jar:4.0.0-preview1]
    at org.sparkproject.jetty.servlet.ServletHandler.lambda$initialize$2(ServletHandler.java:724) ~[spark-core_2.13-4.0.0-preview1.jar:4.0.0-preview1]
    at java.base/java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1625) ~[?:?]
    at java.base/java.util.stream.Streams$ConcatSpliterator.forEachRemaining(Streams.java:734) ~[?:?]
    at java.base/java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:762) ~[?:?]
    at org.sparkproject.jetty.servlet.ServletHandler.initialize(ServletHandler.java:749) ~[spark-core_2.13-4.0.0-preview1.jar:4.0.0-preview1]
    ... 38 more

Thanks,
Cheng Pan


> On May 11, 2024, at 13:55, Wenchen Fan  wrote:
> 
> Please vote on releasing the following candidate as Apache Spark version 
> 4.0.0-preview1.
> 
> The vote is open until May 16 PST and passes if a majority of +1 PMC votes
> are cast, with a minimum of 3 +1 votes.
> 
> [ ] +1 Release this package as Apache Spark 4.0.0-preview1
> [ ] -1 Do not release this package because ...
> 
> To learn more about Apache Spark, please see http://spark.apache.org/
> 
> The tag to be voted on is v4.0.0-preview1-rc1 (commit 
> 7dcf77c739c3854260464d732dbfb9a0f54706e7):
> https://github.com/apache/spark/tree/v4.0.0-preview1-rc1
> 
> The release files, including signatures, digests, etc. can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v4.0.0-preview1-rc1-bin/
> 
> Signatures used for Spark RCs can be found in this file:
> https://dist.apache.org/repos/dist/dev/spark/KEYS
> 
> The staging repository for this release can be found at:
> https://repository.apache.org/content/repositories/orgapachespark-1454/
> 
> The documentation corresponding to this release can be found at:
> https://dist.apache.org/repos/dist/dev/spark/v4.0.0-preview1-rc1-docs/
> 
> The list of bug fixes going into 4.0.0 can be found at the following URL:
> https://issues.apache.org/jira/projects/SPARK/versions/12353359
> 
> FAQ
> 
> =
> How can I help test this release?
> =
> 
> If you are a Spark user, you can help us test this release by taking
> an existing Spark workload and running it on this release candidate, then
> reporting any regressions.
> 
> If you're working in PySpark, you can set up a virtual env, install
> the current RC, and see if anything important breaks. In Java/Scala,
> you can add the staging repository to your project's resolvers and test
> with the RC (make sure to clean up the artifact cache before/after so
> you don't end up building with an out-of-date RC going forward).
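
For the Java/Scala path described above, a minimal sbt sketch follows; the
staging URL and the 4.0.0-preview1 version string are taken from this RC
thread, while the Scala version and module choice are only examples:

// build.sbt — sketch for resolving the RC1 staging artifacts; clear the local
// ivy/coursier caches before and after so a stale RC does not linger.
ThisBuild / scalaVersion := "2.13.14"  // example; the RC artifacts are built for Scala 2.13

resolvers += "apache-spark-rc1-staging" at
  "https://repository.apache.org/content/repositories/orgapachespark-1454/"

libraryDependencies += "org.apache.spark" %% "spark-core" % "4.0.0-preview1"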


-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org