[VOTE][RESULT] SPIP: Stored Procedures API for Catalogs

2024-05-15 Thread L. C. Hsieh
The vote passes with 13 +1s (8 binding +1s) and one +0.

(* = binding)
+1:
Chao Sun (*)
Liang-Chi Hsieh (*)
Huaxin Gao (*)
Bo Yang
Dongjoon Hyun (*)
Kent Yao
Wenchen Fan (*)
Ryan Blue
Anton Okolnychyi
Zhou Jiang
Gengliang Wang (*)
Xiao Li (*)
Hyukjin Kwon (*)

+0:
Mich Talebzadeh


-1: None

Thanks all.




Re: [VOTE] SPIP: Stored Procedures API for Catalogs

2024-05-15 Thread L. C. Hsieh
Hi all,

Thanks, all, for participating and for your support! The vote has passed.
I'll send out the result in a separate thread.

On Wed, May 15, 2024 at 4:44 PM Hyukjin Kwon  wrote:
>
> +1
>
> On Tue, 14 May 2024 at 16:39, Wenchen Fan  wrote:
>>
>> +1
>>
>> On Tue, May 14, 2024 at 8:19 AM Zhou Jiang  wrote:
>>>
>>> +1 (non-binding)
>>>
>>> On Sat, May 11, 2024 at 2:10 PM L. C. Hsieh  wrote:

 Hi all,

 I’d like to start a vote for SPIP: Stored Procedures API for Catalogs.

 Please also refer to:

- Discussion thread:
 https://lists.apache.org/thread/7r04pz544c9qs3gc8q2nyj3fpzfnv8oo
- JIRA ticket: https://issues.apache.org/jira/browse/SPARK-44167
- SPIP doc: 
 https://docs.google.com/document/d/1rDcggNl9YNcBECsfgPcoOecHXYZOu29QYFrloo2lPBg/


 Please vote on the SPIP for the next 72 hours:

 [ ] +1: Accept the proposal as an official SPIP
 [ ] +0
 [ ] -1: I don’t think this is a good idea because …


 Thank you!

 Liang-Chi Hsieh


>>>
>>>
>>> --
>>> Zhou JIANG
>>>




Re: [VOTE] SPIP: Stored Procedures API for Catalogs

2024-05-15 Thread Hyukjin Kwon
+1

On Tue, 14 May 2024 at 16:39, Wenchen Fan  wrote:

> +1
>
> On Tue, May 14, 2024 at 8:19 AM Zhou Jiang  wrote:
>
>> +1 (non-binding)
>>
>> On Sat, May 11, 2024 at 2:10 PM L. C. Hsieh  wrote:
>>
>>> Hi all,
>>>
>>> I’d like to start a vote for SPIP: Stored Procedures API for Catalogs.
>>>
>>> Please also refer to:
>>>
>>>- Discussion thread:
>>> https://lists.apache.org/thread/7r04pz544c9qs3gc8q2nyj3fpzfnv8oo
>>>- JIRA ticket: https://issues.apache.org/jira/browse/SPARK-44167
>>>- SPIP doc:
>>> https://docs.google.com/document/d/1rDcggNl9YNcBECsfgPcoOecHXYZOu29QYFrloo2lPBg/
>>>
>>>
>>> Please vote on the SPIP for the next 72 hours:
>>>
>>> [ ] +1: Accept the proposal as an official SPIP
>>> [ ] +0
>>> [ ] -1: I don’t think this is a good idea because …
>>>
>>>
>>> Thank you!
>>>
>>> Liang-Chi Hsieh
>>>
>>>
>>>
>>
>> --
>> *Zhou JIANG*
>>
>>


Re: [DISCUSS] clarify the definition of behavior changes

2024-05-15 Thread Wenchen Fan
Thanks all for the feedback here! Let me put up a new version, which
clarifies the definition of "users":

Behavior changes mean user-visible functional changes in a new release via
public APIs. The "user" here is not only the user who writes queries and/or
develops Spark plugins, but also the user who deploys and/or manages Spark
clusters. New features, and even bug fixes that eliminate NPEs or correct
query results, are behavior changes. Things like performance improvements,
code refactoring, and changes to unreleased APIs/features are not. All
behavior changes should be called out in the PR description. We need to
write an item in the migration guide (and probably add a legacy config) for
those that may break users when upgrading:

   - Bug fixes that change query results. Users may need to do a backfill to
   correct the existing data and must know about these correctness fixes.
   - Bug fixes that change query schema. Users may need to update the
   schema of the tables in their data pipelines and must know about these
   changes.
   - Remove configs
   - Rename error class/condition
   - Any non-additive change to the public Python/SQL/Scala/Java/R APIs
   (including developer APIs): renaming a function, removing parameters, adding
   parameters, renaming parameters, changing parameter default values, etc. These
   changes should be avoided in general, or done in a binary-compatible
   way, like deprecating and adding a new function instead of renaming (see
   the sketch after this list).
   - Any non-additive change to the way Spark should be deployed and
   managed.
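
As a quick illustration of the binary-compatible route mentioned in the API
item above, here is a minimal Scala sketch (the trait and method names are
made up for the example, not taken from any real Spark API):

    // Hypothetical developer API: suppose we want to rename `readTable` to
    // `loadTable`. Instead of renaming in place (a non-additive, breaking
    // change), add the new method and keep the old one as a deprecated
    // forwarder for at least one release.
    trait CatalogClient {
      def loadTable(name: String): Seq[String]

      @deprecated("Use loadTable instead", "4.0.0")
      def readTable(name: String): Seq[String] = loadTable(name)
    }

Existing callers keep compiling (and linking) against readTable, get a
deprecation warning, and can migrate before the old method is removed.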

The list above is not supposed to be comprehensive. Anyone can raise a
concern when reviewing a PR and ask the PR author to add a migration guide
entry if they believe the change is risky and may break users.

On Thu, May 2, 2024 at 10:25 PM Will Raschkowski 
wrote:

> To add some user perspective, I wanted to share our experience from
> automatically upgrading tens of thousands of jobs from Spark 2 to 3 at
> Palantir:
>
>
>
> We didn't mind "loud" changes that threw exceptions. We have some infra to
> try running jobs with Spark 3 and fall back to Spark 2 if there's an exception.
> E.g., the datetime parsing and rebasing migration in Spark 3 was great:
> Spark threw a helpful exception but never silently changed results.
> Similarly, for things listed in the migration guide as silent changes
> (e.g., add_months's handling of last-day-of-month), we wrote custom check
> rules to throw unless users acknowledged the change through config.
>
>
>
> Silent changes *not* in the migration guide were really bad for us:
> Trusting the migration guide to be exhaustive, we automatically upgraded
> jobs which then “succeeded” but wrote incorrect results. For example, some
> expression increased timestamp precision in Spark 3; a query implicitly
> relied on the reduced precision, and then produced bad results on upgrade.
> It’s a silly query but a note in the migration guide would have helped.
>
>
>
> To summarize: the migration guide was invaluable, we appreciated every
> entry, and we'd appreciate Wenchen's stricter definition of "behavior
> changes" (especially for silent ones).
>
>
>
> *From: *Nimrod Ofek 
> *Date: *Thursday, 2 May 2024 at 11:57
> *To: *Wenchen Fan 
> *Cc: *Erik Krogen , Spark dev list <
> dev@spark.apache.org>
> *Subject: *Re: [DISCUSS] clarify the definition of behavior changes
>
>
>
> Hi Erik and Wenchen,
>
>
>
> I think a good practice with public APIs, and with internal APIs that have a
> big impact and a lot of usage, is to ease in changes: provide defaults for new
> parameters that keep the former behaviour, keep a method with the previous
> signature under a deprecation notice, and delete that deprecated method only in
> the next release. That way the actual break happens one release later, after
> all libraries have had the chance to align with the API and upgrades can be
> done while already using the new version.
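
A minimal Scala sketch of that approach (the trait, method, and parameter
names below are hypothetical, just to make the idea concrete):

    trait Writer {
      // Old signature, kept for one more release with a deprecation notice;
      // it forwards to the new method, passing the value that preserves the
      // former behaviour.
      @deprecated("Use write(path, overwrite) instead", "4.0.0")
      def write(path: String): Unit = write(path, false)

      // New signature with the added parameter.
      def write(path: String, overwrite: Boolean): Unit
    }

The deprecated overload can then be deleted in the following release, once
downstream libraries have had a release cycle to move over.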
>
>
>
> Another thing is that we should probably examine which private APIs are
> used externally, to provide a better experience and proper public APIs
> to meet those needs (for instance, applicative metrics and some way of
> creating custom behaviour columns).
>
>
>
> Thanks,
>
> Nimrod
>
>
>
> On Thu, May 2, 2024 at 03:51, Wenchen Fan wrote:
>
> Hi Erik,
>
>
>
> Thanks for sharing your thoughts! Note: developer APIs are also public
> APIs (such as Data Source V2 API, Spark Listener API, etc.), so breaking
> changes should be avoided as much as we can and new APIs should be
> mentioned in the release notes. Breaking binary compatibility is also a
> "functional change" and should be treated as a behavior change.
>
>
>
> BTW, AFAIK some downstream libraries use private APIs such as Catalyst
> Expression and LogicalPlan. It's too much work to track all the changes to
> private APIs 

Re: [VOTE] SPARK 4.0.0-preview1 (RC1)

2024-05-15 Thread Wenchen Fan
RC1 failed because of this issue. I'll cut RC2 after we downgrade Jetty to
9.x.

On Sat, May 11, 2024 at 3:37 PM Cheng Pan  wrote:

> -1 (non-binding)
>
> A small question: the tag is an orphan, but I suppose it should belong to
> the master branch.
>
> Seems YARN integration is broken due to the javax => jakarta namespace
> migration. I filed SPARK-48238 and left some comments on
> https://github.com/apache/spark/pull/45154
>
> Caused by: java.lang.IllegalStateException: class
> org.apache.hadoop.yarn.server.webproxy.amfilter.AmIpFilter is not a
> jakarta.servlet.Filter
> at
> org.sparkproject.jetty.servlet.FilterHolder.doStart(FilterHolder.java:99)
> ~[spark-core_2.13-4.0.0-preview1.jar:4.0.0-preview1]
> at
> org.sparkproject.jetty.util.component.AbstractLifeCycle.start(AbstractLifeCycle.java:93)
> ~[spark-core_2.13-4.0.0-preview1.jar:4.0.0-preview1]
> at
> org.sparkproject.jetty.servlet.ServletHandler.lambda$initialize$2(ServletHandler.java:724)
> ~[spark-core_2.13-4.0.0-preview1.jar:4.0.0-preview1]
> at
> java.base/java.util.ArrayList$ArrayListSpliterator.forEachRemaining(ArrayList.java:1625)
> ~[?:?]
> at
> java.base/java.util.stream.Streams$ConcatSpliterator.forEachRemaining(Streams.java:734)
> ~[?:?]
> at
> java.base/java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:762)
> ~[?:?]
> at
> org.sparkproject.jetty.servlet.ServletHandler.initialize(ServletHandler.java:749)
> ~[spark-core_2.13-4.0.0-preview1.jar:4.0.0-preview1]
> ... 38 more
>
> Thanks,
> Cheng Pan
>
>
> > On May 11, 2024, at 13:55, Wenchen Fan  wrote:
> >
> > Please vote on releasing the following candidate as Apache Spark version
> 4.0.0-preview1.
> >
> > The vote is open until May 16 PST and passes if a majority of +1 PMC votes
> are cast, with
> > a minimum of 3 +1 votes.
> >
> > [ ] +1 Release this package as Apache Spark 4.0.0-preview1
> > [ ] -1 Do not release this package because ...
> >
> > To learn more about Apache Spark, please see http://spark.apache.org/
> >
> > The tag to be voted on is v4.0.0-preview1-rc1 (commit
> 7dcf77c739c3854260464d732dbfb9a0f54706e7):
> > https://github.com/apache/spark/tree/v4.0.0-preview1-rc1
> >
> > The release files, including signatures, digests, etc. can be found at:
> > https://dist.apache.org/repos/dist/dev/spark/v4.0.0-preview1-rc1-bin/
> >
> > Signatures used for Spark RCs can be found in this file:
> > https://dist.apache.org/repos/dist/dev/spark/KEYS
> >
> > The staging repository for this release can be found at:
> > https://repository.apache.org/content/repositories/orgapachespark-1454/
> >
> > The documentation corresponding to this release can be found at:
> > https://dist.apache.org/repos/dist/dev/spark/v4.0.0-preview1-rc1-docs/
> >
> > The list of bug fixes going into 4.0.0 can be found at the following URL:
> > https://issues.apache.org/jira/projects/SPARK/versions/12353359
> >
> > FAQ
> >
> > =
> > How can I help test this release?
> > =
> >
> > If you are a Spark user, you can help us test this release by taking
> > an existing Spark workload and running it on this release candidate, then
> > reporting any regressions.
> >
> > If you're working in PySpark you can set up a virtual env and install
> > the current RC and see if anything important breaks. In Java/Scala,
> > you can add the staging repository to your project's resolvers and test
> > with the RC (make sure to clean up the artifact cache before/after so
> > you don't end up building with an out of date RC going forward).
>
>
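
For the Java/Scala path described in the quoted instructions above, a minimal
build.sbt sketch that points a test project at this RC's staging repository
(the spark-sql module and the exact Scala 2.13 patch version are just example
choices, not prescribed by the RC):

    // The preview1 artifacts are built for Scala 2.13 (see the jar names in
    // the stack trace above), so the project must use a 2.13 Scala version.
    scalaVersion := "2.13.13"

    // Staging repository from the vote email.
    resolvers +=
      "Apache Spark RC staging" at "https://repository.apache.org/content/repositories/orgapachespark-1454/"

    // Example Spark module to compile and test against.
    libraryDependencies += "org.apache.spark" %% "spark-sql" % "4.0.0-preview1"

After testing, remember to clear the RC artifacts from the local ivy/coursier
cache, as noted above, so later builds don't pick up the stale RC.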