Re: [VOTE] Document and Feature Preview via GitHub Pages

2024-09-11 Thread Ruifeng Zheng
+1 On Thu, Sep 12, 2024 at 6:49 AM Hyukjin Kwon wrote: > +1 > > On Thu, Sep 12, 2024 at 5:00 AM Gengliang Wang wrote: > >> +1 >> >> On Wed, Sep 11, 2024 at 6:30 AM Wenchen Fan wrote: >> >>> +1 >>> >>> On Wed, Sep 11, 2024 at 5:15 PM Martin Grund >>> wrote: >>> +1 On Wed, Sep 11

Re: [VOTE] Release Apache Spark 3.5.3 (RC3)

2024-09-11 Thread Ruifeng Zheng
+1 On Thu, Sep 12, 2024 at 2:36 AM L. C. Hsieh wrote: > +1 > > Thanks. > > On Wed, Sep 11, 2024 at 10:41 AM Dongjoon Hyun > wrote: > > > > +1 > > > > Dongjoon > > > > On 2024/09/11 13:51:23 Herman van Hovell wrote: > > > +1 > > > > > > On Wed, Sep 11, 2024 at 3:30 AM Kent Yao wrote: > > > > >

Re: Welcome new Apache Spark committers

2024-08-13 Thread Ruifeng Zheng
Congratulations, everyone! On Tue, Aug 13, 2024 at 12:14 PM Gengliang Wang wrote: > Congratulations, everyone! > > On Mon, Aug 12, 2024 at 7:10 PM Denny Lee wrote: > >> Congrats Allison, Martin, and Haejoon! >> >> On Tue, Aug 13, 2024 at 9:59 AM Jungtaek Lim < >> kabhwan.opensou...@gmail.com> w

Re: Welcoming a new PMC member

2024-08-13 Thread Ruifeng Zheng
Congratulations! On Tue, Aug 13, 2024 at 3:59 PM Martin Grund wrote: > Congratulations! > > On Tue, Aug 13, 2024 at 9:37 AM Peter Toth wrote: > >> Congratulations! >> >> Mridul Muralidharan wrote on Tue, Aug 13, 2024 at 8:46: >> >>> >>> Congratulations Kent! >>> >>> Regards, >>> M

Re: [DISCUSS] Deprecating SparkR

2024-08-12 Thread Ruifeng Zheng
+1 On Tue, Aug 13, 2024 at 1:08 PM Holden Karau wrote: > +1 > > Are the sparklyr folks on this list? > > Twitter: https://twitter.com/holdenkarau > Books (Learning Spark, High Performance Spark, etc.): > https://amzn.to/2MaRAG9 > YouTube Live Streams: https://www.youtu

Re: Spark website repo size hits the storage limit of GitHub-hosted runners

2024-08-08 Thread Ruifeng Zheng
Hi Kent, I remember that we have some scripts to free up disk space in the Spark repo; maybe we can reuse them for spark-website. On Fri, Aug 9, 2024 at 9:57 AM Sean Owen wrote: > I don't think that's the issue - it's the size of what is cloned into a > container during the GitHub actions runs. Doesn't matter

Re: [External Email] Re: [VOTE] Differentiate Spark without Spark Connect from Spark Connect

2024-07-22 Thread Ruifeng Zheng
+1 On Tue, Jul 23, 2024 at 11:15 AM yangjie01 wrote: > +1 > > On 2024/7/23 11:11, Kent Yao <y...@apache.org> wrote: > > > +1 > > > On 2024/07/23 02:04:17 Herman van Hovell wrote: > > +1 > > > > On Mon, Jul 22, 2024 at 8:56 PM Wenchen Fan > wrote: > > > > > +1 > >

Re: [DISCUSS] Differentiate Spark without Spark Connect from Spark Connect

2024-07-20 Thread Ruifeng Zheng
+1 for 'Classic' On Sun, Jul 21, 2024 at 8:03 AM Xiao Li wrote: > Classic is much better than Legacy. : ) > > On Thu, Jul 18, 2024 at 16:58, Hyukjin Kwon wrote: > >> Hi all, >> >> I noticed that we need to standardize our terminology before moving >> forward. For instance, when documenting, 'Spark without Spa

Re: [DISCUSS] Why do we remove RDD usage and RDD-backed code?

2024-07-12 Thread Ruifeng Zheng
;> >>>>> This PR doesn't look proper to me in two ways. >>>>> - SparkSession is heavier than SparkContext >>>>> - According to the following PR description, the background is also >>>>> hidden in the community. >>>>> >>>>> > # Why are the changes needed? >>>>> > In databricks runtime, RDD read / write API has some issue for >>>>> certain storage types >>>>> > that requires the account key, but Dataframe read / write API >>>>> works. >>>>> >>>>> In addition, we don't know if this PR fixes the mentioned unknown >>>>> storage's issue or not because it's not testable in the community test >>>>> coverage. >>>>> >>>>> I'm wondering if the Apache Spark community aims to move away from the >>>>> RDD usage in favor of `Spark Connect`. Isn't it too early because `Spark >>>>> Connect` is not even GA in the community? >>>>> >>>>> Dongjoon. >>>>> >>>> -- Ruifeng Zheng E-mail: zrfli...@gmail.com

Re: [DISCUSS] Allow GitHub Actions runs for contributors' PRs without approvals in apache/spark-connect-go

2024-07-08 Thread Ruifeng Zheng
+1 On Sat, Jul 6, 2024 at 4:45 AM bo yang wrote: > +1 This is a great suggestion, thanks Hyukjin! > > > On Thu, Jul 4, 2024 at 4:11 AM Hyukjin Kwon wrote: > >> Alright! let me start the vote! >> >> On Thu, 4 Jul 2024 at 16:31, Mich Talebzadeh >> wrote: >> >>> A good point agreed. >>> >>> Mich

Re: [FYI] SPARK-47993: Drop Python 3.8

2024-04-25 Thread Ruifeng Zheng
+1 On Fri, Apr 26, 2024 at 10:26 AM Xinrong Meng wrote: > +1 > > On Thu, Apr 25, 2024 at 2:08 PM Holden Karau > wrote: > >> +1 >> >> Twitter: https://twitter.com/holdenkarau >> Books (Learning Spark, High Performance Spark, etc.): >> https://amzn.to/2MaRAG9 >> YouTube

Re: [PySpark]: DataFrameWriterV2.overwrite fails with spark connect

2024-04-11 Thread Ruifeng Zheng
r PR regarding this error? > If not, create one. > > Thanks, > Toki Takahashi > > --------- > To unsubscribe e-mail: dev-unsubscr...@spark.apache.org > > -- Ruifeng Zheng E-mail: zrfli...@gmail.com

Re: [VOTE] SPIP: Pure Python Package in PyPI (Spark Connect)

2024-03-31 Thread Ruifeng Zheng
xt 72 hours: >> >> [ ] +1: Accept the proposal as an official SPIP >> [ ] +0 >> [ ] -1: I don’t think this is a good idea because … >> >> Thanks. >> > -- Ruifeng Zheng E-mail: zrfli...@gmail.com

Re: [VOTE] SPIP: Structured Logging Framework for Apache Spark

2024-03-12 Thread Ruifeng Zheng
+1 On Wed, Mar 13, 2024 at 4:32 AM John Zhuge wrote: > +1 (non-binding) > > On Tue, Mar 12, 2024 at 8:45 AM L. C. Hsieh wrote: > >> +1 >> >> >> On Tue, Mar 12, 2024 at 8:20 AM Chao Sun wrote: >> >>> +1 >>> >>> On Tue, Mar 12, 2024 at 8:03 AM Xiao Li >>> wrote: >>> +1 On Tue, Ma

Re: [VOTE] SPIP: Testing Framework for Spark UI Javascript files

2023-11-26 Thread Ruifeng Zheng
rg/thread/5rqrho4ldgmqlc173y2229pfll5sgkff >> [2] >> https://docs.google.com/document/d/1hWl5Q2CNNOjN5Ubyoa28XmpJtDyD9BtGtiEG2TT94rg/edit?usp=sharing >> >> - >> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org >> >> >> > -- Ruifeng Zheng E-mail: zrfli...@gmail.com

Re: [VOTE] SPIP: An Official Kubernetes Operator for Apache Spark

2023-11-15 Thread Ruifeng Zheng
>> > >> Please also refer to: >>>>>>> > >>- Discussion thread: >>>>>>> https://lists.apache.org/thread/wdy7jfhf7m8jy74p6s0npjfd15ym5rxz >>>>>>> > >>- JIRA ticket: >>>>>>> https://issues.apache.org/jira/browse/SPARK-45923 >>>>>>> > >>- SPIP doc: >>>>>>> https://docs.google.com/document/d/1f5mm9VpSKeWC72Y9IiKN2jbBn32rHxjWKUfLRaGEcLE >>>>>>> > >> Please vote on the SPIP for the next 72 hours: >>>>>>> > >> [ ] +1: Accept the proposal as an official SPIP >>>>>>> > >> [ ] +0 >>>>>>> > >> [ ] -1: I don’t think this is a good idea because … >>>>>>> > >> Thank you! >>>>>>> > >> Liang-Chi Hsieh >>>>>>> - >>>>>>> > >> To unsubscribe e-mail: dev-unsubscr...@spark.apache.org >>>>>>> > > -- >>>>>>> > > Zhou, Ye 周晔 >>>>>>> -- Ruifeng Zheng E-mail: zrfli...@gmail.com

Re: [VOTE] Updating documentation hosted for EOL and maintenance releases

2023-09-25 Thread Ruifeng Zheng
+1 On Tue, Sep 26, 2023 at 12:51 PM Hyukjin Kwon wrote: > Hi all, > > I would like to start the vote for updating documentation hosted for EOL > and maintenance releases to improve the usability here, and in order for > end users to read the proper and correct documentation. > > For discussion t

Re: [ANNOUNCE] Apache Spark 3.5.0 released

2023-09-17 Thread Ruifeng Zheng
view the release notes: >> https://spark.apache.org/releases/spark-release-3-5-0.html >> >> We would like to acknowledge all community members for contributing to >> this >> release. This release would not have been possible without you. >> >> Best, >> Yuanjian >> >> -- Ruifeng Zheng E-mail: zrfli...@gmail.com

Re: [VOTE] Release Apache Spark 3.5.0 (RC5)

2023-09-11 Thread Ruifeng Zheng
+1 On Tue, Sep 12, 2023 at 7:24 AM Hyukjin Kwon wrote: > +1 > > On Tue, Sep 12, 2023 at 7:05 AM Xiao Li wrote: > >> +1 >> >> Xiao >> >> On Mon, Sep 11, 2023 at 10:53, Yuanjian Li wrote: >> >>> @Peter Toth I've looked into the details of this >>> issue, and it appears that it's neither a regression in versio

Re: Welcome two new Apache Spark committers

2023-08-06 Thread Ruifeng Zheng
Congratulations, Peter and Xiduo! On Mon, Aug 7, 2023 at 10:13 AM Xiao Li wrote: > Congratulations, Peter and Xiduo! > > > > On Sun, Aug 6, 2023 at 19:08, Debasish Das wrote: > >> Congratulations Peter and Xiduo. >> >> On Sun, Aug 6, 2023, 7:05 PM Wenchen Fan wrote: >> >>> Hi all, >>> >>> The Spark PMC recen

Re: LLM script for error message improvement

2023-08-02 Thread Ruifeng Zheng
+1 from my side; I'm fine with having it as a helper script. On Thu, Aug 3, 2023 at 10:53 AM Hyukjin Kwon wrote: > I think adding that dev tool script to improve the error message is fine. > > On Thu, 3 Aug 2023 at 10:24, Haejoon Lee > wrote: > >> Dear contributors, I hope you are doing well! >> >>

Re: Time for Spark 3.3.3 release?

2023-07-31 Thread Ruifeng Zheng
+1, thank you Yuming On Tue, Aug 1, 2023 at 10:40 AM Yuming Wang wrote: > Thank you. I will prepare 3.3.3-rc1 soon. > > On Sun, Jul 30, 2023 at 12:15 AM Dongjoon Hyun > wrote: > >> +1 >> >> Thank you for volunteering, Yuming. >> >> Dongjoon >> >> >> On Fri, Jul 28, 2023 at 11:35 AM Yuming Wang

Re: Spark Docker Official Image is now available

2023-07-19 Thread Ruifeng Zheng
for Spark: >> https://lists.apache.org/thread/l1793y5224n8bqkp3s6ltgkykso4htb3 >> [2] [VOTE] SPIP: Support Docker Official Image for Spark: >> https://lists.apache.org/thread/ro6olodm1jzdffwjx4oc7ol7oh6kshbl >> [3] https://github.com/docker-library/official-images/pull/130

Re: [VOTE][SPIP] Python Data Source API

2023-07-09 Thread Ruifeng Zheng
+1 On Mon, Jul 10, 2023 at 8:20 AM Jungtaek Lim wrote: > +1 > > On Sat, Jul 8, 2023 at 4:13 AM Reynold Xin > wrote: > >> +1! >> >> >> On Fri, Jul 7 2023 at 11:58 AM, Holden Karau >> wrote: >> >>> +1 >>> >>> On Fri, Jul 7, 2023 at 9:55 AM huaxin gao >>> wrote: >>> +1 On Fri, Jul

Re: [VOTE][SPIP] PySpark Test Framework

2023-06-21 Thread Ruifeng Zheng
+1 On Thu, Jun 22, 2023 at 1:11 PM Dongjoon Hyun wrote: > +1 > > Dongjoon > > On Wed, Jun 21, 2023 at 8:56 PM Hyukjin Kwon wrote: > >> +1 >> >> On Thu, 22 Jun 2023 at 02:20, Jacek Laskowski wrote: >> >>> +0 >>> >>> Regards, >>> Jacek Laskowski >>> >>> "The Internals Of" Online Books

Re: [VOTE] Release Spark 3.4.1 (RC1)

2023-06-21 Thread Ruifeng Zheng
+1 On Wed, Jun 21, 2023 at 2:26 PM huaxin gao wrote: > +1 > > On Tue, Jun 20, 2023 at 11:21 PM Hyukjin Kwon > wrote: > >> +1 >> >> On Wed, 21 Jun 2023 at 14:23, yangjie01 wrote: >> >>> +1 >>> >>> >>> On 2023/6/21 13:20, L. C. Hsieh <vii...@gmail.com> wrote: >>> >>> >>> +1 >>> >>> >>> On Tue, J

Re: [DISCUSS] SPIP: Add PySpark Test Framework

2023-06-14 Thread Ruifeng Zheng
+1 from my side. Sounds good; it will help both users and contributors improve test coverage. On Wed, Jun 14, 2023 at 8:27 AM Hyukjin Kwon wrote: > Yeah, I have been thinking about this too, and Holden did some work here > that this SPIP will reuse. I support this. > > On Wed, 14

Re: Apache Spark 3.4.1 Release?

2023-06-09 Thread Ruifeng Zheng
+1 Thank you Dongjoon! On Fri, Jun 9, 2023 at 11:54 PM Xiao Li wrote: > +1 > > On Fri, Jun 9, 2023 at 08:30 Wenchen Fan wrote: > >> +1 >> >> On Fri, Jun 9, 2023 at 8:52 PM Xinrong Meng wrote: >> >>> +1. Thank you Dongjoon! >>> >>> Thanks, >>> >>> Xinrong Meng >>> >>> Mridul Muralidharan wrote on 202

Re: [VOTE] Release Apache Spark 3.2.4 (RC1)

2023-04-10 Thread Ruifeng Zheng
+1 (non-binding) Thank you for driving this release! Ruifeng Zheng ruife...@foxmail.com -- Original -- From: "Yuming

Re: [VOTE] Release Apache Spark 3.4.0 (RC7)

2023-04-10 Thread Ruifeng Zheng
+1 (non-binding) Ruifeng Zheng ruife...@foxmail.com -- Original -- From: "Ken

Re: Apache Spark 3.2.2 Release?

2022-07-07 Thread Ruifeng Zheng
+1, thank you Dongjoon! Ruifeng Zheng ruife...@foxmail.com -- Original -- From: "Yikun Jiang"

Re: [VOTE] Release Spark 3.3.0 (RC6)

2022-06-13 Thread Ruifeng Zheng
+1 (non-binding) Maxim, thank you for driving this release! Thanks, Ruifeng -- Original -- From: "Chao Sun"

Re: [VOTE][SPIP] Spark Connect

2022-06-13 Thread Ruifeng Zheng
+1 -- Original -- From: "huaxin gao"

Re: Introducing "Pandas API on Spark" component in JIRA, and use "PS" PR title component

2022-05-16 Thread Ruifeng Zheng
+1, I think it is a good idea. -- Original -- From: "Hyukjin Kwon"

Re: [VOTE] Spark 3.1.3 RC4

2022-02-14 Thread Ruifeng Zheng
+1 (non-binding) I checked the release script issue Dongjoon mentioned: curl -s https://dist.apache.org/repos/dist/dev/spark/v3.1.3-rc4-bin/spark-3.1.3-bin-hadoop2.7.tgz | tar tz | grep hadoop-common prints spark-3.1.3-bin-hadoop2.7/jars/hadoop-common-2.7.4 -- Original
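The shell pipeline above streams the RC binary tarball, lists its entries and greps for the bundled hadoop-common jar. The sketch below is a rough standard-library Python equivalent built on the tarfile module. It assumes the tarball bytes are already in memory. The helper names find_members and make_demo_tarball and the demo file names are hypothetical and are not part of the Spark release tooling.

```python
import io
import tarfile
from typing import List


def find_members(tar_gz: bytes, needle: str) -> List[str]:
    """Return the member names of a gzipped tarball that contain `needle`.

    Roughly what `tar tz | grep needle` does, but on bytes already in memory.
    """
    with tarfile.open(fileobj=io.BytesIO(tar_gz), mode="r:gz") as tar:
        return [name for name in tar.getnames() if needle in name]


def make_demo_tarball(names: List[str]) -> bytes:
    """Build a tiny gzipped tarball of empty files (for the demo only)."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for name in names:
            info = tarfile.TarInfo(name=name)  # size defaults to 0
            tar.addfile(info, io.BytesIO(b""))
    return buf.getvalue()


if __name__ == "__main__":
    # Hypothetical file names; a real check would download the RC tarball
    # (e.g. urllib.request.urlopen(url).read()) instead of building one.
    demo = make_demo_tarball([
        "demo/jars/hadoop-common-demo.jar",
        "demo/jars/other-lib.jar",
    ])
    print(find_members(demo, "hadoop-common"))
    # ['demo/jars/hadoop-common-demo.jar']
```

For a large release artifact, the streaming curl | tar form avoids holding the whole file in memory, which this sketch does not attempt.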

Re: [ANNOUNCE] Apache Spark 3.2.1 released

2022-01-28 Thread Ruifeng Zheng
It's great! Congrats and thanks, Huaxin! -- Original -- From: "huaxin gao"

Re: [VOTE] Release Spark 3.2.1 (RC2)

2022-01-24 Thread Ruifeng Zheng
+1 (non-binding) -- Original -- From: "Kent Yao" http://spark.apa

Re: [VOTE] Release Spark 3.2.1 (RC1)

2022-01-11 Thread Ruifeng Zheng
+1 (non-binding) Thanks, Ruifeng Zheng -- Original -- From: "Che

Re: [Core][Suggestion] sortWithinPartitions and aggregateWithinPartitions for RDD

2018-01-31 Thread Ruifeng Zheng
Do you mean in-memory processing? It works fine if all partitions are small, but when a partition doesn't fit in memory, it will cause an OOM. From: Reynold Xin Date: Thursday, February 1, 2018, 3:14 PM To: Ruifeng Zheng Cc: Subject: Re: [Core][Suggestion] sortWithinPartitions and aggregateWithinPartitions
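The concern in this reply is that sorting an entire partition in memory fails once the partition is larger than the available memory. A common remedy is an external merge sort: sort bounded-size runs, then lazily k-way merge them. The plain-Python sketch below shows that idea for a single partition. The function name sort_in_runs and the run_size parameter are hypothetical. To keep it simple, the sorted runs stay in memory. A real implementation would spill each run to disk.

```python
import heapq
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def sort_in_runs(rows: Iterable[T], run_size: int = 1000) -> Iterator[T]:
    """External-merge-sort sketch for the rows of one partition.

    At most `run_size` rows are sorted at a time. The sorted runs are kept
    in memory here for simplicity; a real implementation would spill each
    run to disk so that memory use stays bounded.
    """
    if run_size <= 0:
        raise ValueError("run_size must be positive")
    it = iter(rows)
    runs: List[List[T]] = []
    while True:
        run = sorted(islice(it, run_size))  # sort one bounded chunk
        if not run:
            break
        runs.append(run)
    # heapq.merge lazily k-way merges the already-sorted runs.
    return heapq.merge(*runs)


print(list(sort_in_runs([5, 3, 9, 1, 7, 2], run_size=2)))
# [1, 2, 3, 5, 7, 9]
```

Only run_size rows are sorted at a time, and heapq.merge yields the output lazily. With real disk spilling, peak memory would be about one run plus one buffered row per run, not the whole partition.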

[Core][Suggestion] sortWithinPartitions and aggregateWithinPartitions for RDD

2018-01-31 Thread Ruifeng Zheng
Hi all: 1. The Dataset API supports the “sortWithinPartitions” operation, but the RDD API has no counterpart (I know there is “repartitionAndSortWithinPartitions”, but I don’t want to repartition the RDD), so I have to convert the RDD to a Dataset to use this function. Would it make sense to add a “s
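The requested semantics are simple to state: each partition is sorted on its own, and no rows move between partitions. By contrast, repartitionAndSortWithinPartitions shuffles first. The Spark-free model below treats partitions as plain Python lists. The name sort_within_partitions is used only for illustration and is not an existing RDD method. In PySpark, one workaround is rdd.mapPartitions(lambda it: sorted(it), preservesPartitioning=True). Like this sketch, it sorts each partition fully in memory.

```python
from typing import Callable, List, Optional, TypeVar

T = TypeVar("T")


def sort_within_partitions(
    partitions: List[List[T]],
    key: Optional[Callable[[T], object]] = None,
) -> List[List[T]]:
    """Sort the rows of each partition independently.

    The number of partitions is unchanged and no row moves to another
    partition (no shuffle), unlike repartitionAndSortWithinPartitions.
    The input lists are not modified.
    """
    return [sorted(part, key=key) for part in partitions]


parts = [[3, 1, 2], [9, 7], []]
print(sort_within_partitions(parts))
# [[1, 2, 3], [7, 9], []]
```

The output is not globally sorted: each inner list is ordered, but the partition boundaries are exactly the ones that came in.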