Re: [VOTE] Release Spark 4.0.0-preview2 (RC1)

2024-09-18 Thread Wenchen Fan
+1 On Wed, Sep 18, 2024 at 1:21 AM John Zhuge wrote: > +1 non-binding > > John Zhuge > > > On Mon, Sep 16, 2024 at 11:07 PM Xinrong Meng wrote: > >> +1 >> >> Thank you @Dongjoon Hyun ! >> >> On Tue, Sep 17, 2024 at 11:31 AM huaxin gao >> wrote: >> >>> +1 >>> >>> On Mon, Sep 16, 2024 at 6:20 P

Re: [VOTE] Document and Feature Preview via GitHub Pages

2024-09-11 Thread Wenchen Fan
+1 On Wed, Sep 11, 2024 at 5:15 PM Martin Grund wrote: > +1 > > On Wed, Sep 11, 2024 at 9:39 AM Kent Yao wrote: > >> Hi all, >> >> Following the discussion[1], I'd like to start the vote for 'Document and >> Feature Preview via GitHub Pages' >> >> >> Please vote for the next 72 hours:(excluding

Re: Apache Spark 4.0.0-preview2 (?)

2024-09-08 Thread Wenchen Fan
+1, thanks Dongjoon! On Mon, Sep 9, 2024 at 9:44 AM Xinrong Meng wrote: > +1 > > Thank you @Dongjoon Hyun ! > > On Sat, Sep 7, 2024 at 8:05 PM Hyukjin Kwon wrote: > >> +1 >> >> On Sat, Sep 7, 2024 at 9:04 AM huaxin gao wrote: >> >>> +1 >>> >>> On Fri, Sep 6, 2024 at 1:12 PM L. C. Hsieh wrote

Re: [DISCUSS] release Spark 3.5.3?

2024-09-01 Thread Wenchen Fan
+1 >> >> On Fri, Aug 30, 2024 at 2:34 AM Yuming Wang wrote: >> >>> +1, Could we include two additional issues: >>> https://issues.apache.org/jira/browse/SPARK-49472 >>> https://issues.apache.org/jira/browse/SPARK-49349 >>> >>> On Wed, Aug 28, 2024 at

[DISCUSS] release Spark 3.5.3?

2024-08-28 Thread Wenchen Fan
Hi all, It's unfortunate that we missed merging a fix of a correctness bug in Spark 3.5: https://github.com/apache/spark/pull/43938. I just re-submitted it: https://github.com/apache/spark/pull/47905 In addition to this correctness bug fix, around 40 fixes have been merged to branch 3.5 after 3.5

Re: [DISCUSS] [Spark SQL] A single-pass resolution approach for the Catalyst Analyzer

2024-08-26 Thread Wenchen Fan
+1. The analyzer rule order issue has bitten me multiple times and it's very hard to make your analyzer rule bug-free if it interacts with other rules. On Wed, Aug 21, 2024 at 2:49 AM Reynold Xin wrote: > +1 on this too > > When I implemented "group by all", I introduced at least two subtle bugs

Re: [DISCUSS] Deprecating SparkR

2024-08-13 Thread Wenchen Fan
+1 On Tue, Aug 13, 2024 at 10:50 PM L. C. Hsieh wrote: > +1 > > On Tue, Aug 13, 2024 at 2:54 AM Dongjoon Hyun > wrote: > > > > +1 > > > > Dongjoon > > > > On Mon, Aug 12, 2024 at 17:52 Holden Karau > wrote: > >> > >> +1 > >> > >> Are the sparklyr folks on this list? > >> > >> Twitter: https://

Re: Welcome new Apache Spark committers

2024-08-13 Thread Wenchen Fan
Congratulations! On Tue, Aug 13, 2024 at 6:06 PM Peter Toth wrote: > Congratulations! > > On Tue, Aug 13, 2024 at 6:15 AM Gengliang Wang wrote: >> Congratulations, everyone! >> >> On Mon, Aug 12, 2024 at 7:10 PM Denny Lee wrote: >> >>> Congrats Allison, Martin, and Haejoon! >>> >>> O

Re: Welcoming a new PMC member

2024-08-13 Thread Wenchen Fan
Congratulations! On Tue, Aug 13, 2024 at 4:13 PM Ruifeng Zheng wrote: > Congratulations! > > On Tue, Aug 13, 2024 at 3:59 PM Martin Grund > wrote: > >> Congratulations! >> >> On Tue, Aug 13, 2024 at 9:37 AM Peter Toth wrote: >> >>> Congratulations! >>> >>> Mridul Muralidharan wrote (on

Re: [VOTE] Archive Spark Documentations in Apache Archives

2024-08-12 Thread Wenchen Fan
+1 On Tue, Aug 13, 2024 at 1:57 AM Holden Karau wrote: > +1 > > > On Mon, Aug 12, 2024 at 10:17 AM Dongjoon Hyun > wrote: > >> +1 for the proposals >> - enhancing the release process to put the docs to `release` directory in >> order to archive. >> - uploading old releases via SVN manually to a

Re: [VOTE] Release Spark 3.5.2 (RC5)

2024-08-09 Thread Wenchen Fan
+1 On Fri, Aug 9, 2024 at 6:04 PM Peter Toth wrote: > +1 > > On Thu, Aug 8, 2024 at 9:19 PM huaxin gao wrote: >> +1 >> >> On Thu, Aug 8, 2024 at 11:41 AM L. C. Hsieh wrote: >> >>> Then, >>> >>> +1 again >>> >>> On Thu, Aug 8, 2024 at 11:38 AM Dongjoon Hyun >>> wrote: >>> > >>> > +

Re: Spark website repo size hits the storage limit of GitHub-hosted runners

2024-08-08 Thread Wenchen Fan
It makes sense to me to only keep the doc files for the latest maintenance release. i.e. remove the docs for 3.5.0 and only keep 3.5.1. On Thu, Aug 8, 2024 at 8:06 PM Kent Yao wrote: > Hi dev, > > The current size of the spark-website repository is approximately 16GB, > exceeding the storage lim

Re: [VOTE] Release Spark 3.5.2 (RC4)

2024-07-29 Thread Wenchen Fan
+1 On Sat, Jul 27, 2024 at 10:03 AM Dongjoon Hyun wrote: > +1 > > Thank you, Kent. > > Dongjoon. > > On Fri, Jul 26, 2024 at 6:37 AM Kent Yao wrote: > >> Hi dev, >> >> Please vote on releasing the following candidate as Apache Spark version >> 3.5.2. >> >> The vote is open until Jul 29, 14:00:0

Re: [External Mail] [VOTE] Release Spark 3.5.2 (RC2)

2024-07-25 Thread Wenchen Fan
I'm changing my vote to -1 as we found a regression that breaks Delta Lake's generated column feature. The fix was merged just now: https://github.com/apache/spark/pull/47483 Can we cut a new RC? On Thu, Jul 25, 2024 at 3:13 PM Mridul Muralidharan wrote: > > +1 > > Signatures, digests, etc chec

Re: [External Mail] [VOTE] Release Spark 3.5.2 (RC2)

2024-07-23 Thread Wenchen Fan
+1 On Wed, Jul 24, 2024 at 10:51 AM Kent Yao wrote: > +1(non-binding), I have checked: > > - Download links are OK > - Signatures, Checksums, and the KEYS file are OK > - LICENSE and NOTICE are present > - No unexpected binary files in source releases > - Successfully built from source > > Thank

Re: [VOTE] Differentiate Spark without Spark Connect from Spark Connect

2024-07-22 Thread Wenchen Fan
+1 On Tue, Jul 23, 2024 at 8:40 AM Xinrong Meng wrote: > +1 > > Thank you @Hyukjin Kwon ! > > On Mon, Jul 22, 2024 at 5:20 PM Gengliang Wang wrote: > >> +1 >> >> On Mon, Jul 22, 2024 at 5:19 PM Hyukjin Kwon >> wrote: >> >>> Starting with my own +1. >>> >>> On Tue, 23 Jul 2024 at 09:12, Hyukji

Re: [DISCUSS] Differentiate Spark without Spark Connect from Spark Connect

2024-07-21 Thread Wenchen Fan
Classic SGTM. On Mon, Jul 22, 2024 at 1:12 PM Jungtaek Lim wrote: > I'd propose not to change the name of "Spark Connect" - the name > represents the characteristic of the mode (separation of layer for client > and server). Trying to remove the part of "Connect" would just make > confusion. > >

Re: [VOTE] Release Spark 3.5.2 (RC1)

2024-07-18 Thread Wenchen Fan
> The vote is open until Jul 18 Is it a typo? It's July 18 today. On Thu, Jul 18, 2024 at 6:30 PM Kent Yao wrote: > Hi dev, > > Please vote on releasing the following candidate as Apache Spark version > 3.5.2. > > The vote is open until Jul 18, 11 AM UTC, and passes if a majority +1 > PMC votes

Re: [VOTE] Allow GitHub Actions runs for contributors' PRs without approvals in apache/spark-connect-go

2024-07-08 Thread Wenchen Fan
+1 On Tue, Jul 9, 2024 at 10:47 AM Reynold Xin wrote: > +1 > > On Mon, Jul 8, 2024 at 7:44 PM haydn wrote: > >> +1 >> >> On Mon, Jul 8, 2024 at 7:41 PM haydn wrote: >> >>> +1 >>> >>> On Mon, Jul 8, 2024 at 19:41 Takuya UESHIN >>> wrote: >>> +1 On Mon, Jul 8, 2024 at 6:05 PM Yua

Re: [VOTE] Move Spark Connect server to builtin package (Client API layer stays external)

2024-07-03 Thread Wenchen Fan
+1 On Thu, Jul 4, 2024 at 10:41 AM Gengliang Wang wrote: > +1 > > On Wed, Jul 3, 2024 at 4:48 PM Reynold Xin > wrote: > >> +1 >> >> On Wed, Jul 3, 2024 at 4:45 PM L. C. Hsieh wrote: >> >>> +1 >>> >>> On Wed, Jul 3, 2024 at 3:54 PM Dongjoon Hyun >>> wrote: >>> > >>> > +1 >>> > >>> > Dongjoon >

Re: 4.0.0-preview1 test report: running on Yarn

2024-06-17 Thread Wenchen Fan
Thanks for sharing! Yea Spark 4.0 is built using Java 17. On Tue, Jun 18, 2024 at 5:07 AM George Magiros wrote: > I successfully submitted and ran org.apache.spark.examples.SparkPi on Yarn > using 4.0.0-preview1. However I got it to work only after fixing an issue > with the Yarn nodemanagers (

Re: [DISCUSS] Versionless Spark Programming Guide Proposal

2024-06-11 Thread Wenchen Fan
ebsite. > The downside is that backport merge conflicts *will *force developers to > backport changes themselves. While I do not want to sign up for that work, > is this something people are more comfortable with? > > Neil > > > On Tue, Jun 11, 2024 at 8:47 AM Wenchen Fan w

Re: [DISCUSS] Versionless Spark Programming Guide Proposal

2024-06-11 Thread Wenchen Fan
Wed, Jun 5, 2024 at 3:22 PM Neil Ramaswamy >>>> wrote: >>>> >>>>> Thanks all for the responses. Let me try to address everything. >>>>> >>>>> > the programming guides are also different between versions since >>>>>

Re: [DISCUSS] Versionless Spark Programming Guide Proposal

2024-06-05 Thread Wenchen Fan
I agree with the idea of a versionless programming guide. But one thing we need to make sure of is we give clear messages for things that are only available in a new version. My proposal is: 1. keep the old versions' programming guide unchanged. For example, people can still access https:

[ANNOUNCE] Announcing Apache Spark 4.0.0-preview1

2024-06-03 Thread Wenchen Fan
Hi all, To enable wide-scale community testing of the upcoming Spark 4.0 release, the Apache Spark community has posted a preview release of Spark 4.0. This preview is not a stable release in terms of either API or functionality, but it is meant to give the community early access to try the code t

Re: [VOTE] SPARK 4.0.0-preview1 (RC3)

2024-06-02 Thread Wenchen Fan
The vote passes with 6 +1s (4 binding +1s). (* = binding) +1: Wenchen Fan (*), Kent Yao, Cheng Pan, Xiao Li (*), Gengliang Wang (*), Tathagata Das (*). Thanks all! On Fri, May 31, 2024 at 6:07 PM Tathagata Das wrote: > +1 > - Tested RC3 with Delta Lake. All our Scala and Python tests pass.
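The tally above can be checked with a short Python sketch. The pass rule coded here (at least 3 binding +1 votes, and more binding +1 than binding -1 votes) is one reading of the "majority +1 PMC votes ... minimum of 3 +1 votes" wording used in the vote emails; the `Vote` class and `tally` function are hypothetical names, not part of any Apache tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Vote:
    voter: str
    value: int     # +1 or -1
    binding: bool  # True when the voter is a PMC member


def tally(votes):
    """Count votes and apply the assumed pass rule (see lead-in)."""
    plus = [v for v in votes if v.value > 0]
    minus = [v for v in votes if v.value < 0]
    binding_plus = sum(1 for v in plus if v.binding)
    binding_minus = sum(1 for v in minus if v.binding)
    # Assumption: only binding votes decide the outcome.
    passed = binding_plus >= 3 and binding_plus > binding_minus
    return {
        "plus": len(plus),
        "binding_plus": binding_plus,
        "binding_minus": binding_minus,
        "passed": passed,
    }


# Votes exactly as listed in the RC3 result message.
RC3_VOTES = [
    Vote("Wenchen Fan", 1, True),
    Vote("Kent Yao", 1, False),
    Vote("Cheng Pan", 1, False),
    Vote("Xiao Li", 1, True),
    Vote("Gengliang Wang", 1, True),
    Vote("Tathagata Das", 1, True),
]

result = tally(RC3_VOTES)
print(result)
```

Non-binding votes are still counted in the total, which is why the message reports both numbers.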

Re: [DISCUSS] clarify the definition of behavior changes

2024-05-28 Thread Wenchen Fan
Hi all, I've created a PR to put the behavior change guideline on the Spark website: https://github.com/apache/spark-website/pull/518 . Please leave comments if you have any, thanks! On Wed, May 15, 2024 at 1:41 AM Wenchen Fan wrote: > Thanks all for the feedback here! Let me put

Re: [VOTE] SPARK 4.0.0-preview1 (RC3)

2024-05-28 Thread Wenchen Fan
one correction: "The tag to be voted on is v4.0.0-preview1-rc2 (commit 7cfe5a6e44e8d7079ae29ad3e2cee7231cd3dc66)" should be "The tag to be voted on is v4.0.0-preview1-rc3 (commit 7a7a8bc4bab591ac8b98b2630b38c57adf619b82):" On Tue, May 28, 2024 at 11:48 AM Wenchen Fan wrot

[VOTE] SPARK 4.0.0-preview1 (RC3)

2024-05-28 Thread Wenchen Fan
Please vote on releasing the following candidate as Apache Spark version 4.0.0-preview1. The vote is open until May 31 PST and passes if a majority +1 PMC votes are cast, with a minimum of 3 +1 votes. [ ] +1 Release this package as Apache Spark 4.0.0-preview1 [ ] -1 Do not release this package be

Re: [VOTE] SPARK 4.0.0-preview1 (RC2)

2024-05-28 Thread Wenchen Fan
ude this bug fix > > > https://github.com/apache/spark/commit/6cd1ccc56321dfa52672cd25f4cfdf2bbc86b3ea > . > > The bug can lead to the unrecoverable job failure. > > > > Thanks, > > Yi > > > > On Tue, May 28, 2024 at 3:45 PM Wenchen Fan wrote: > > > > > Please vote on

[VOTE] SPARK 4.0.0-preview1 (RC2)

2024-05-28 Thread Wenchen Fan
Please vote on releasing the following candidate as Apache Spark version 4.0.0-preview1. The vote is open until May 31 PST and passes if a majority +1 PMC votes are cast, with a minimum of 3 +1 votes. [ ] +1 Release this package as Apache Spark 4.0.0-preview1 [ ] -1 Do not release this package be

Re: [DISCUSS] clarify the definition of behavior changes

2024-05-15 Thread Wenchen Fan
t a note in the migration guide would have helped. > > > > To summarize: the migration guide was invaluable, we appreciated every > entry, and we'd appreciate Wenchen's stricter definition of "behavior > changes" (especially for silent ones). > > > >

Re: [VOTE] SPARK 4.0.0-preview1 (RC1)

2024-05-15 Thread Wenchen Fan
.sparkproject.jetty.servlet.ServletHandler.initialize(ServletHandler.java:749) > ~[spark-core_2.13-4.0.0-preview1.jar:4.0.0-preview1] > ... 38 more > > Thanks, > Cheng Pan > > > > On May 11, 2024, at 13:55, Wenchen Fan wrote: > > > > Please vote on releas

Re: [VOTE] SPIP: Stored Procedures API for Catalogs

2024-05-13 Thread Wenchen Fan
+1 On Tue, May 14, 2024 at 8:19 AM Zhou Jiang wrote: > +1 (non-binding) > > On Sat, May 11, 2024 at 2:10 PM L. C. Hsieh wrote: > >> Hi all, >> >> I’d like to start a vote for SPIP: Stored Procedures API for Catalogs. >> >> Please also refer to: >> >>- Discussion thread: >> https://lists.apa

Re: [DISCUSS] Spark - How to improve our release processes

2024-05-13 Thread Wenchen Fan
again to fix this problem, but it needs to be in > collaboration with a committer since I cannot fully test the release > scripts. (This testing gap is what doomed my last attempt at fixing this > problem.) > > Nick > > > On May 13, 2024, at 12:18 AM, Wenchen Fan wrote:

Re: [VOTE] SPIP: Stored Procedures API for Catalogs

2024-05-12 Thread Wenchen Fan
+1 On Mon, May 13, 2024 at 10:30 AM Kent Yao wrote: > +1 > > On Mon, May 13, 2024 at 8:39 AM Dongjoon Hyun wrote: > > > > +1 > > > > On Sun, May 12, 2024 at 3:50 PM huaxin gao > wrote: > >> > >> +1 > >> > >> On Sat, May 11, 2024 at 4:35 PM L. C. Hsieh wrote: > >>> > >>> +1 > >>> > >>> On Sat, May 11, 2024 at

Re: [DISCUSS] Spark - How to improve our release processes

2024-05-12 Thread Wenchen Fan
After finishing the 4.0.0-preview1 RC1, I have more experience with this topic now. In fact, the main job of the release process: building packages and documents, is tested in Github Action jobs. However, the way we test them is different from what we do in the release scripts. 1. the execution e

[VOTE] SPARK 4.0.0-preview1 (RC1)

2024-05-10 Thread Wenchen Fan
Please vote on releasing the following candidate as Apache Spark version 4.0.0-preview1. The vote is open until May 16 PST and passes if a majority +1 PMC votes are cast, with a minimum of 3 +1 votes. [ ] +1 Release this package as Apache Spark 4.0.0-preview1 [ ] -1 Do not release this package be

Re: [DISCUSS] SPIP: Stored Procedures API for Catalogs

2024-05-09 Thread Wenchen Fan
Thanks for leading this project! Let's move forward. On Fri, May 10, 2024 at 10:31 AM L. C. Hsieh wrote: > Thanks Anton. Thank you, Wenchen, Dongjoon, Ryan, Serge, Allison and > others if I miss those who are participating in the discussion. > > I suppose we have reached a consensus or close to

Re: [DISCUSS] Spark 4.0.0 release

2024-05-09 Thread Wenchen Fan
ncreased to 650MB > > Dongjoon. > > > > On Thu, May 9, 2024 at 8:12 AM Wenchen Fan wrote: > >> I've created a ticket: https://issues.apache.org/jira/browse/INFRA-25776 >> >> On Thu, May 9, 2024 at 11:06 PM Dongjoon Hyun >> wrote: >> >>>

Re: [DISCUSS] Spark 4.0.0 release

2024-05-09 Thread Wenchen Fan
: > >> Could you file an INFRA JIRA issue with the error message and context >> first, Wenchen? >> >> As you know, if we see something, we had better file a JIRA issue because >> it could be not only an Apache Spark project issue but also all ASF project >> issue

Re: [DISCUSS] Spark - How to improve our release processes

2024-05-09 Thread Wenchen Fan
Thanks for starting the discussion! To add a bit more color, we should at least add a test job to make sure the release script can produce the packages correctly. Today it's kind of being manually tested by the release manager each time, which slows down the release process. It's better if we can a

Re: [DISCUSS] Spark 4.0.0 release

2024-05-09 Thread Wenchen Fan
>> Nimrod >>>>> >>>>> >>>>> On Wed, May 8, 2024 at 00:54, Holden Karau < >>>>> holden.ka...@gmail.com> wrote: >>>>> >>>>>> Indeed. We could conceivably build the release in CI/CD

Re: [DISCUSS] Spark 4.0.0 release

2024-05-07 Thread Wenchen Fan
yangjie01 wrote: > +1 > > > > *From**: *Jungtaek Lim > *Date**: *Thursday, May 2, 2024 10:21 > *To**: *Holden Karau > *Cc**: *Chao Sun , Xiao Li , > Tathagata Das , Wenchen Fan < > cloud0...@gmail.com>, Cheng Pan , Nicholas Chammas < > nicholas.cham...@gmai

Re: ASF board report draft for May

2024-05-06 Thread Wenchen Fan
The preview release also needs a vote. I'll try my best to cut the RC on Monday, but the actual release may take some time. Hopefully, we can get it out this week but if the vote fails, it will take longer as we need more RCs. On Mon, May 6, 2024 at 7:22 AM Dongjoon Hyun wrote: > +1 for Holden's

Re: [DISCUSS] clarify the definition of behavior changes

2024-05-01 Thread Wenchen Fan
d users: There are some users that use spark as a service from a >> provider >> 2. Providers/Operators: There are some users that provide spark as a >> service for their internal(on-prem setup with yarn/k8s)/external(Something >> like EMR) customers >> 3. ? >> >

Re: [DISCUSS] clarify the definition of behavior changes

2024-05-01 Thread Wenchen Fan
modate the second group of users. > > On 1 May 2024, at 06:08, Wenchen Fan wrote: > > Hi all, > > It's exciting to see innovations keep happening in the Spark community and > Spark keeps evolving itself. To make these innovations available to more > users, it's importa

Re: [DISCUSS] Spark 4.0.0 release

2024-05-01 Thread Wenchen Fan
we make a Preview release, > the faster we can start getting feedback for fixing things for a great > Spark 4.0 final release. > > So I urge the community to produce a Spark 4.0 Preview soon even if > certain features targeting the Delta 4.0 release are still incomplete. > > Tha

[DISCUSS] clarify the definition of behavior changes

2024-04-30 Thread Wenchen Fan
Hi all, It's exciting to see innovations keep happening in the Spark community and Spark keeps evolving itself. To make these innovations available to more users, it's important to help users upgrade to newer Spark versions easily. We've done a good job on it: the PR template requires the author t

Re: Potential Impact of Hive Upgrades on Spark Tables

2024-04-30 Thread Wenchen Fan
Yes, Spark has a shim layer to support all Hive versions. It shouldn't be an issue as many users create native Spark data source tables already today, by explicitly putting the `USING` clause in the CREATE TABLE statement. On Wed, May 1, 2024 at 12:56 AM Mich Talebzadeh wrote: > @Wen

Re: [VOTE] SPARK-46122: Set spark.sql.legacy.createHiveTableByDefault to false

2024-04-29 Thread Wenchen Fan
alse, we have a more reasonable default behavior: creating Parquet tables (or whatever is specified by `spark.sql.sources.default`). On Tue, Apr 30, 2024 at 10:45 AM Wenchen Fan wrote: > @Mich Talebzadeh there seems to be a > misunderstanding here. The Spark native data source table is

Re: [VOTE] SPARK-46122: Set spark.sql.legacy.createHiveTableByDefault to false

2024-04-29 Thread Wenchen Fan
> On Mon, Apr 29, 2024 at 2:08 AM Mich Talebzadeh > wrote: > >> >> Hi @Wenchen Fan >> >> Thanks for your response. I believe we have not had enough time to >> "DISCUSS" this matter. >> >> Currently in order to make Spark take advantage of Hive,

Re: [VOTE] SPARK-46122: Set spark.sql.legacy.createHiveTableByDefault to false

2024-04-28 Thread Wenchen Fan
@Mich Talebzadeh thanks for sharing your concern! Note: creating Spark native data source tables is usually Hive compatible as well, unless we use features that Hive does not support (TIMESTAMP NTZ, ANSI INTERVAL, etc.). I think it's a better default to create Spark native table in this case, ins

Re: [DISCUSS] SPARK-46122: Set spark.sql.legacy.createHiveTableByDefault to false

2024-04-25 Thread Wenchen Fan
> that, as with any advice, quote "one test result is worth one-thousand > expert opinions (Werner <https://en.wikipedia.org/wiki/Wernher_von_Braun>Von > Braun <https://en.wikipedia.org/wiki/Wernher_von_Braun>)". > > > On Thu, 25 Apr 2024 at 11:17, Wenchen Fan wrote

Re: [DISCUSS] SPARK-46122: Set spark.sql.legacy.createHiveTableByDefault to false

2024-04-25 Thread Wenchen Fan
+1 On Thu, Apr 25, 2024 at 2:46 PM Kent Yao wrote: > +1 > > Nit: the umbrella ticket is SPARK-44111, not SPARK-4. > > Thanks, > Kent Yao > > On Thu, Apr 25, 2024 at 2:39 PM Dongjoon Hyun wrote: > > > > Hi, All. > > > > It's great to see community activities to polish 4.0.0 more and more. > > Thank you all.

Re: [DISCUSS] Spark 4.0.0 release

2024-04-17 Thread Wenchen Fan
> > (cc. Anish to add more context on the plan for transformWithState) > > > > On Sat, Apr 13, 2024 at 3:15 AM Wenchen Fan wrote: > > Hi all, > > > > It's close to the previously proposed 4.0.0 release date (June 2024), > and I think it's

Re: [VOTE] Release Spark 3.4.3 (RC2)

2024-04-15 Thread Wenchen Fan
+1 On Mon, Apr 15, 2024 at 12:31 PM Dongjoon Hyun wrote: > I'll start with my +1. > > - Checked checksum and signature > - Checked Scala/Java/R/Python/SQL Document's Spark version > - Checked published Maven artifacts > - All CIs passed. > > Thanks, > Dongjoon. > > On 2024/04/15 04:22:26 Dongjoo

Re: [VOTE] SPARK-44444: Use ANSI SQL mode by default

2024-04-14 Thread Wenchen Fan
+1 On Sun, Apr 14, 2024 at 6:28 AM Dongjoon Hyun wrote: > I'll start from my +1. > > Dongjoon. > > On 2024/04/13 22:22:05 Dongjoon Hyun wrote: > > Please vote on SPARK-44444 to use ANSI SQL mode by default. > > The technical scope is defined in the following PR which is > > one line of code chan

Re: [DISCUSS] SPARK-44444: Use ANSI SQL mode by default

2024-04-12 Thread Wenchen Fan
+1, the existing "NULL on error" behavior is terrible for data quality. I have one concern about error reporting with DataFrame APIs. Query execution is lazy and where the error happens can be far away from where the dataframe/column was created. We are improving it (PR
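The data-quality concern with "NULL on error" can be illustrated with a small Python model. This is only a sketch of the two behaviors being discussed, not Spark's cast implementation; `cast_to_int` and its `ansi_enabled` flag are hypothetical names, and Python's `int()` parsing rules differ in detail from Spark's.

```python
def cast_to_int(text, ansi_enabled):
    """Model a string-to-integer cast under the two modes.

    ansi_enabled=False: invalid input silently becomes None (NULL on error).
    ansi_enabled=True:  invalid input raises, so bad data is noticed early.
    """
    try:
        return int(text)
    except ValueError:
        if ansi_enabled:
            raise ValueError(f"cannot cast {text!r} to INT") from None
        return None


rows = ["10", "20", "oops"]

# Legacy mode: the bad row turns into None and can slip through unnoticed.
legacy = [cast_to_int(r, ansi_enabled=False) for r in rows]
print(legacy)

# ANSI mode: the same row stops the computation with an explicit error.
try:
    [cast_to_int(r, ansi_enabled=True) for r in rows]
except ValueError as err:
    print("ANSI mode error:", err)
```

In the legacy case a later aggregate over the column would simply skip the missing value, which is how bad input ends up hidden in results.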

[DISCUSS] Spark 4.0.0 release

2024-04-12 Thread Wenchen Fan
- STRING collation support - Spark k8s operator versioning Please help to add more items to this list that are missed here. I would like to volunteer as the release manager for Apache Spark 4.0.0 if there is no objection. Thank you all for the great work that fills Spark 4.0! Wenchen Fan

Re: SPIP: Enhancing the Flexibility of Spark's Physical Plan to Enable Execution on Various Native Engines

2024-04-10 Thread Wenchen Fan
It's good to reduce duplication between different native accelerators of Spark, and AFAIK there is already a project trying to solve it: https://substrait.io/ I'm not sure why we need to do this inside Spark, instead of doing the unification for a wider scope (for all engines, not only Spark). O

Re: [VOTE] SPIP: Structured Logging Framework for Apache Spark

2024-03-11 Thread Wenchen Fan
+1 On Mon, Mar 11, 2024 at 5:26 PM Hyukjin Kwon wrote: > +1 > > On Mon, 11 Mar 2024 at 18:11, yangjie01 > wrote: > >> +1 >> >> >> >> Jie Yang >> >> >> >> *From**: *Haejoon Lee >> *Date**: *Monday, March 11, 2024 17:09 >> *To**: *Gengliang Wang >> *Cc**: *dev >> *Subject**: *Re: [VOTE] SPIP: Structured Logg

Re: [VOTE] Release Apache Spark 3.5.1 (RC2)

2024-02-19 Thread Wenchen Fan
+1, thanks for making the release! On Sat, Feb 17, 2024 at 3:54 AM Sean Owen wrote: > Yeah let's get that fix in, but it seems to be a minor test only issue so > should not block release. > > On Fri, Feb 16, 2024, 9:30 AM yangjie01 wrote: > >> Very sorry. When I was fixing `SPARK-45242 ( >> htt

Re: [VOTE] SPIP: Structured Streaming - Arbitrary State API v2

2024-01-10 Thread Wenchen Fan
+1 On Thu, Jan 11, 2024 at 9:32 AM L. C. Hsieh wrote: > +1 > > On Wed, Jan 10, 2024 at 9:06 AM Bhuwan Sahni > wrote: > >> +1. This is a good addition. >> >> >> *Bhuwan Sahni* >> Staff Software Engineer >> >> bhuwan.sa...@databricks.com >> 500 108th Ave. NE >> Bellevu

Re: [DISCUSS] SPIP: Testing Framework for Spark UI Javascript files

2023-11-21 Thread Wenchen Fan
+1, very useful! On Wed, Nov 22, 2023 at 10:29 AM Dongjoon Hyun wrote: > Thank you for proposing a new UI test framework for Apache Spark 4.0. > > It looks very useful. > > Thanks, > Dongjoon. > > > On Tue, Nov 21, 2023 at 1:51 AM Kent Yao wrote: > >> Hi Spark Dev, >> >> This is a call to discu

Re: [VOTE] SPIP: State Data Source - Reader

2023-10-23 Thread Wenchen Fan
+1 On Mon, Oct 23, 2023 at 4:03 PM Jungtaek Lim wrote: > Starting with my +1 (non-binding). Thanks! > > On Mon, Oct 23, 2023 at 1:23 PM Jungtaek Lim > wrote: > >> Hi all, >> >> I'd like to start the vote for SPIP: State Data Source - Reader. >> >> The high level summary of the SPIP is that we p

Re: Welcome to Our New Apache Spark Committer and PMCs

2023-10-03 Thread Wenchen Fan
Congrats! On Wed, Oct 4, 2023 at 8:25 AM Hyukjin Kwon wrote: > Woohoo! > > On Tue, 3 Oct 2023 at 22:47, Hussein Awala wrote: > >> Congrats to all of you! >> >> On Tue 3 Oct 2023 at 08:15, Rui Wang wrote: >> >>> Congratulations! Well deserved! >>> >>> -Rui >>> >>> >>> On Mon, Oct 2, 2023 at 10:

Re: [VOTE] Release Apache Spark 3.5.0 (RC5)

2023-09-11 Thread Wenchen Fan
+1 On Tue, Sep 12, 2023 at 9:00 AM Yuanjian Li wrote: > +1 (non-binding) > > On Mon, Sep 11, 2023 at 9:36 AM Yuanjian Li wrote: > >> @Peter Toth I've looked into the details of this >> issue, and it appears that it's neither a regression in version 3.5.0 nor a >> correctness issue. It's a bug related to a new

Re: [VOTE] Release Apache Spark 3.5.0 (RC3)

2023-08-31 Thread Wenchen Fan
Sorry for the last-minute bug report, but we found a regression in 3.5: the SQL INSERT command without a column list fills missing columns with NULL while Spark 3.4 does not allow it. According to the SQL standard, this shouldn't be allowed, so it is a regression in 3.5. The fix has been merged but

Re: Spark writing API

2023-08-17 Thread Wenchen Fan
: > Hello Wenchen, > > On Wed, Aug 16, 2023 at 23:33 Wenchen Fan wrote: > >> > is there a way to hint to the downstream users on the number of rows >> expected to write? >> >> It will be very hard to do. Spark pipelines the execution (within shuffle &g

Re: Spark writing API

2023-08-16 Thread Wenchen Fan
> is there a way to hint to the downstream users on the number of rows expected to write? It will be very hard to do. Spark pipelines the execution (within shuffle boundaries) and we can't predict the number of final output rows. On Mon, Aug 7, 2023 at 8:27 PM Steve Loughran wrote: > > > On Thu

Re: What else could be removed in Spark 4?

2023-08-07 Thread Wenchen Fan
I think the principle is we should remove things that block us from supporting new things like Java 21, or come with a significant maintenance cost. If there is no benefit to removing deprecated APIs (just to keep the codebase clean?), I'd prefer to leave them there and not bother. On Tue, Aug 8,

Welcome two new Apache Spark committers

2023-08-06 Thread Wenchen Fan
Hi all, The Spark PMC recently voted to add two new committers. Please join me in welcoming them to their new role! - Peter Toth (Spark SQL) - Xiduo You (Spark SQL) They consistently make contributions to the project and clearly showed their expertise. We are very excited to have them join as co

Re: [DISCUSS] SPIP: Python Data Source API

2023-06-20 Thread Wenchen Fan
In an ideal world, every data source you want to connect to already has a Spark data source implementation (either v1 or v2), then this Python API is useless. But I feel it's common that people want to do quick data exploration, and the target data system is not popular enough to have an existing S

Re: [Feature Request] create *permanent* Spark View from DataFrame via PySpark

2023-06-09 Thread Wenchen Fan
DataFrame view stores the logical plan, while SQL view stores SQL text. I don't think we can support this feature until we have a reliable way to materialize logical plans. On Sun, Jun 4, 2023 at 10:31 PM Mich Talebzadeh wrote: > Try sending it to dev@spark.apache.org (and join that group) > > Y

Re: Apache Spark 3.4.1 Release?

2023-06-09 Thread Wenchen Fan
+1 On Fri, Jun 9, 2023 at 8:52 PM Xinrong Meng wrote: > +1. Thank you Dongjoon! > > Thanks, > > Xinrong Meng > > On Fri, Jun 9, 2023 at 5:22 AM Mridul Muralidharan wrote: > >> >> +1, thanks Dongjoon ! >> >> Regards, >> Mridul >> >> On Thu, Jun 8, 2023 at 7:16 PM Jia Fan >> wrote: >> >>> +1 >>> >>> ___

Re: [VOTE] Release Apache Spark 3.4.0 (RC7)

2023-04-10 Thread Wenchen Fan
+1 On Tue, Apr 11, 2023 at 9:57 AM Yuming Wang wrote: > +1. > > On Tue, Apr 11, 2023 at 9:14 AM Yikun Jiang wrote: > >> +1 (non-binding) >> >> Also ran the docker image related test (signatures/standalone/k8s) with >> rc7: https://github.com/apache/spark-docker/pull/32 >> >> Regards, >> Yikun >

Re: [VOTE] Release Apache Spark 3.2.4 (RC1)

2023-04-10 Thread Wenchen Fan
+1 On Tue, Apr 11, 2023 at 10:09 AM Hyukjin Kwon wrote: > +1 > > On Tue, 11 Apr 2023 at 11:04, Ruifeng Zheng wrote: > >> +1 (non-binding) >> >> Thank you for driving this release! >> >> -- >> Ruifeng Zheng >> ruife...@foxmail.com >> >>

Re: [VOTE] Release Apache Spark 3.4.0 (RC5)

2023-04-03 Thread Wenchen Fan
Sorry for the last-minute change, but we found two wrong behaviors and want to fix them before the release: https://github.com/apache/spark/pull/40641 We missed a corner case when the input index for `array_insert` is 0. It should fail as 0 is an invalid index. https://github.com/apache/spark/pul
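The `array_insert` corner case can be sketched in plain Python. This is a hypothetical model of the rule stated above (positions are 1-based, so 0 must be rejected), not Spark's implementation; negative and out-of-range positions are deliberately left unmodelled because the message does not describe them.

```python
def array_insert(items, pos, value):
    """Insert value at a 1-based position, rejecting the invalid index 0."""
    if pos == 0:
        # The corner case from the message: 0 is not a valid 1-based index.
        raise ValueError("array_insert: index 0 is invalid; positions start at 1")
    if pos < 0 or pos > len(items) + 1:
        # Assumption: these cases are out of scope for this sketch.
        raise NotImplementedError("negative or out-of-range positions are not modelled")
    result = list(items)           # leave the input untouched
    result.insert(pos - 1, value)  # convert the 1-based position to a 0-based offset
    return result


print(array_insert([1, 2, 3], 1, 9))  # insert at the front
print(array_insert([1, 2, 3], 4, 9))  # insert right after the last element
```

Raising on 0 instead of treating it as "the front" keeps an off-by-one mistake in the caller from producing a silently wrong array.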

Re: Time for release v3.3.2

2023-01-31 Thread Wenchen Fan
+1, thanks! On Tue, Jan 31, 2023 at 3:17 PM Maxim Gekk wrote: > +1 > > On Tue, Jan 31, 2023 at 10:12 AM John Zhuge wrote: > >> +1 Thanks Liang-Chi for driving the release! >> >> On Mon, Jan 30, 2023 at 10:26 PM Yuming Wang wrote: >> >>> +1 >>> >>> On Tue, Jan 31, 2023 at 12:18 PM yangjie01 wr

Re: [VOTE][SPIP] Asynchronous Offset Management in Structured Streaming

2022-12-01 Thread Wenchen Fan
+1 On Thu, Dec 1, 2022 at 12:31 PM Shixiong Zhu wrote: > +1 > > > On Wed, Nov 30, 2022 at 8:04 PM Hyukjin Kwon wrote: > >> +1 >> >> On Thu, 1 Dec 2022 at 12:39, Mridul Muralidharan >> wrote: >> >>> >>> +1 >>> >>> Regards, >>> Mridul >>> >>> On Wed, Nov 30, 2022 at 8:55 PM Xingbo Jiang >>> wro

Re: [DISCUSSION] SPIP: Asynchronous Offset Management in Structured Streaming

2022-11-30 Thread Wenchen Fan
+1 to improve the widely used micro-batch mode first. On Thu, Dec 1, 2022 at 8:49 AM Hyukjin Kwon wrote: > +1 > > On Thu, 1 Dec 2022 at 08:10, Shixiong Zhu wrote: > >> +1 >> >> This is exciting. I agree with Jerry that this SPIP and continuous >> processing are orthogonal. This SPIP itself woul

Re: [ANNOUNCE] Apache Spark 3.2.3 released

2022-11-30 Thread Wenchen Fan
Thanks, Chao! On Wed, Nov 30, 2022 at 1:33 AM Chao Sun wrote: > We are happy to announce the availability of Apache Spark 3.2.3! > > Spark 3.2.3 is a maintenance release containing stability fixes. This > release is based on the branch-3.2 maintenance branch of Spark. We strongly > recommend all

Re: [VOTE][SPIP] Better Spark UI scalability and Driver stability for large applications

2022-11-16 Thread Wenchen Fan
+1, I'm looking forward to it! On Thu, Nov 17, 2022 at 9:44 AM Ye Zhou wrote: > +1 (non-binding) > Thanks for proposing this improvement to SHS, it resolves the main > performance issue within SHS. > > On Wed, Nov 16, 2022 at 1:15 PM Jungtaek Lim > wrote: > >> +1 >> >> Nice to see the chance fo

Re: [VOTE] Release Spark 3.2.3 (RC1)

2022-11-16 Thread Wenchen Fan
+1 On Thu, Nov 17, 2022 at 10:20 AM Yang,Jie(INF) wrote: > +1,non-binding > > > > The test combination of Java 11 + Scala 2.12 and Java 11 + Scala 2.13 has > passed. > > > > Yang Jie > > > > *From**: *Chris Nauroth > *Date**: *Thursday, November 17, 2022 04:27 > *To**: *Yuming Wang > *Cc**: *"Yang,Jie(INF)"

Re: [DISCUSS] SPIP: Better Spark UI scalability and Driver stability for large applications

2022-11-15 Thread Wenchen Fan
This looks great! UI stability/scalability has been a pain point for a long time. On Sat, Nov 12, 2022 at 5:24 AM Gengliang Wang wrote: > Hi Everyone, > > I want to discuss the "Better Spark UI scalability and Driver stability > for large applications" proposal. Please find the links below: > >

Re: [VOTE] Release Spark 3.3.1 (RC4)

2022-10-18 Thread Wenchen Fan
+1 On Wed, Oct 19, 2022 at 4:59 AM Chao Sun wrote: > +1. Thanks Yuming! > > Chao > > On Tue, Oct 18, 2022 at 1:18 PM Thomas graves wrote: > > > > +1. Ran internal test suite. > > > > Tom > > > > On Sun, Oct 16, 2022 at 9:14 PM Yuming Wang wrote: > > > > > > Please vote on releasing the followi

Re: [DISCUSS] SPIP: Support Docker Official Image for Spark

2022-09-19 Thread Wenchen Fan
+1 On Mon, Sep 19, 2022 at 2:59 PM Yang,Jie(INF) wrote: > +1 (non-binding) > > > > Yang Jie > -- > *From:* Yikun Jiang > *Sent:* September 19, 2022 14:23:14 > *To:* Denny Lee > *Cc:* bo zhaobo; Yuming Wang; Kent Yao; Gengliang Wang; Hyukjin Kwon; > dev; zrf > *Subject:* Re: [DISC

Re: Non-deterministic function duplicated in final Spark plan

2022-08-01 Thread Wenchen Fan
This is a hard one. Spark duplicates the join child plan if it's a self-join because Spark does not support diamond-shaped query plans. It seems the only option here is to write the output of the join child plan to a Parquet table (or use a shuffle) and read it back. On Mon, Aug 1, 2022 at 4:46 PM Enrico Minack
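
The effect described in this reply can be illustrated with a toy model. The sketch below is plain Python, not Spark code: `child_plan`, `self_join_duplicated`, and `self_join_materialized` are hypothetical names invented for illustration, and a seeded random generator stands in for a non-deterministic function. It only shows why evaluating the same child twice yields mismatched values on the two sides of a self-join, and why evaluating it once and reusing the rows (analogous to writing the result out and reading it back) avoids that.

```python
import random


def child_plan(rng):
    """Toy stand-in for a join child: tags each id with a non-deterministic value."""
    return [(i, rng.random()) for i in range(5)]


def self_join_duplicated(seed):
    # Each side of the self-join re-evaluates the child, so the
    # non-deterministic column is computed twice, independently.
    rng = random.Random(seed)
    left = child_plan(rng)
    right = child_plan(rng)
    return [(l, r) for l in left for r in right if l[0] == r[0]]


def self_join_materialized(seed):
    # Evaluate the child once (analogous to writing it out and reading
    # it back), then use the same rows on both sides of the join.
    rng = random.Random(seed)
    rows = child_plan(rng)
    return [(l, r) for l in rows for r in rows if l[0] == r[0]]


if __name__ == "__main__":
    for left_row, right_row in self_join_duplicated(0):
        print("duplicated  ", left_row, right_row)
    for left_row, right_row in self_join_materialized(0):
        print("materialized", left_row, right_row)
```

In the duplicated variant the two sides carry different values for the same id; in the materialized variant they agree, at the cost of storing the intermediate rows.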

Re: [VOTE] Release Spark 3.2.2 (RC1)

2022-07-14 Thread Wenchen Fan
+1 On Wed, Jul 13, 2022 at 7:29 PM Yikun Jiang wrote: > +1 (non-binding) > > Checked out tag and built from source on Linux aarch64 and ran some basic > test. > > > Regards, > Yikun > > > On Wed, Jul 13, 2022 at 5:54 AM Mridul Muralidharan > wrote: > >> >> +1 >> >> Signatures, digests, etc chec

Re: [DISCUSS][Catalog API] Deprecate 4 Catalog API that takes two parameters which are (dbName, tableName/functionName)

2022-07-08 Thread Wenchen Fan
It's better to keep all APIs working. But in this case, I really have no idea how to make these 4 APIs reasonable. For example, tableExists(dbName: String, tableName: String) currently checks if table "dbName.tableName" exists in the Hive metastore, and does not work with v2 catalogs at all. It's n

Re: Apache Spark 3.2.2 Release?

2022-07-06 Thread Wenchen Fan
+1 On Thu, Jul 7, 2022 at 10:41 AM Xinrong Meng wrote: > +1 > > Thanks! > > > Xinrong Meng > > Software Engineer > > Databricks > > > On Wed, Jul 6, 2022 at 7:25 PM Xiao Li wrote: > >> +1 >> >> Xiao >> >> On Wed, Jul 6, 2022 at 7:16 PM Cheng Su wrote: >> >>> +1 (non-binding) >>> >>> Thanks, >>> Cheng Su >>> >

Re: [VOTE][SPIP] Spark Connect

2022-06-14 Thread Wenchen Fan
+1 On Tue, Jun 14, 2022 at 9:38 AM Ruifeng Zheng wrote: > +1 > > > -- Original Message -- > *From:* "huaxin gao" ; > *Sent:* Tuesday, June 14, 2022, 8:47 AM > *To:* "L. C. Hsieh"; > *Cc:* "Spark dev list"; > *Subject:* Re: [VOTE][SPIP] Spark Connect > > +1 > > On Mon, Jun 13, 2022 at 5:42

Re: [VOTE] Release Spark 3.3.0 (RC6)

2022-06-13 Thread Wenchen Fan
+1, tests are all green and there are no more blocker issues AFAIK. On Fri, Jun 10, 2022 at 12:27 PM Maxim Gekk wrote: > Please vote on releasing the following candidate as > Apache Spark version 3.3.0. > > The vote is open until 11:59pm Pacific time June 14th and passes if a > majority +1 PMC v

Re: [VOTE] Release Spark 3.3.0 (RC2)

2022-05-19 Thread Wenchen Fan
I think it should have been fixed by https://github.com/apache/spark/commit/0fdb6757946e2a0991256a3b73c0c09d6e764eed . Maybe the fix is not complete... On Thu, May 19, 2022 at 2:16 PM Kent Yao wrote: > Thanks, Maxim. > > Leave my -1 for this release candidate. > > Unfortunately, I don't know w

Re: Unable to create view due to up cast error when migrating from Hive to Spark

2022-05-18 Thread Wenchen Fan
A view is essentially a SQL query. It's fragile to share views between Spark and Hive because different systems have different SQL dialects. They may interpret the view SQL query differently and introduce unexpected behaviors. In this case, Spark returns decimal type for gender * 0.3 - 0.1 but Hiv

Re: SIGMOD System Award for Apache Spark

2022-05-13 Thread Wenchen Fan
Great! Congratulations to everyone! On Fri, May 13, 2022 at 10:38 AM Gengliang Wang wrote: > Congratulations to the whole spark community! > > On Fri, May 13, 2022 at 10:14 AM Jungtaek Lim < > kabhwan.opensou...@gmail.com> wrote: > >> Congrats Spark community! >> >> On Fri, May 13, 2022 at 10:40

Re: [VOTE] Release Spark 3.3.0 (RC1)

2022-05-10 Thread Wenchen Fan
I'd like to see an RC2 as well. A correctness bug of sorts was fixed after RC1 was cut: https://github.com/apache/spark/pull/36468 Users may hit this bug much more frequently if they enable ANSI mode. It's not a regression, so I'd vote -0. On Wed, May 11, 2022 at 5:24 AM Thomas graves wrote: >

Re: PR builder not working now

2022-04-19 Thread Wenchen Fan
Thank you, Hyukjin! On Wed, Apr 20, 2022 at 7:48 AM Dongjoon Hyun wrote: > It's great! Thank you. :) > > On Tue, Apr 19, 2022 at 4:42 PM Hyukjin Kwon wrote: > >> It's fixed now. >> >> On Tue, 19 Apr 2022 at 08:33, Hyukjin Kwon wrote: >> >>> It's still persistent. I will send an email to GitHub

Re: bazel and external/

2022-03-21 Thread Wenchen Fan
How about renaming it to `connectors` if docker is the only exception and will be moved out? On Sat, Mar 19, 2022 at 6:18 PM Alkis Evlogimenos wrote: > It looks like renaming the directory and moving components can be separate > steps. If there is consensus that connectors will move out, should
