Re: Apache Spark 3.4.2 (?)

2023-11-06 Thread Holden Karau
+1 On Mon, Nov 6, 2023 at 4:30 PM yangjie01 wrote: > +1 > > > > *From:* Yuming Wang > *Date:* Tuesday, November 7, 2023 07:00 > *To:* Santosh Pingale > *Cc:* Dongjoon Hyun , dev > > *Subject:* Re: Apache Spark 3.4.2 (?) > > > > +1 > > > > On Tue, Nov 7, 2023 at 3:55 AM Santosh Pingale > wrote: > >

Re: Apache Spark 3.4.2 (?)

2023-11-06 Thread yangjie01
+1 From: Yuming Wang Date: Tuesday, November 7, 2023 07:00 To: Santosh Pingale Cc: Dongjoon Hyun , dev Subject: Re: Apache Spark 3.4.2 (?) +1 On Tue, Nov 7, 2023 at 3:55 AM Santosh Pingale wrote: Makes sense given the nature of those commits. On Mon, Nov 6, 2023, 7:52 PM Dongjoon Hyun

Re: Apache Spark 3.4.2 (?)

2023-11-06 Thread Yuming Wang
+1 On Tue, Nov 7, 2023 at 3:55 AM Santosh Pingale wrote: > Makes sense given the nature of those commits. > > On Mon, Nov 6, 2023, 7:52 PM Dongjoon Hyun > wrote: > >> Hi, All. >> >> Apache Spark 3.4.1 tag was created on Jun 19th and `branch-3.4` has 103 >> commits including important security

Re: Apache Spark 3.4.2 (?)

2023-11-06 Thread Santosh Pingale
Makes sense given the nature of those commits. On Mon, Nov 6, 2023, 7:52 PM Dongjoon Hyun wrote: > Hi, All. > > Apache Spark 3.4.1 tag was created on Jun 19th and `branch-3.4` has 103 > commits including important security and correctness patches like > SPARK-44251, SPARK-44805, and

Re: On adding applyInArrow to groupBy and cogroup

2023-11-06 Thread Hyukjin Kwon
Sounds good, I'll review the PR. On Fri, 3 Nov 2023 at 14:08, Abdeali Kothari wrote: > Seeing more support for arrow based functions would be great. > Gives more control to application developers. And so pandas just becomes 1 > of the available options. > > On Fri, 3 Nov 2023, 21:23 Luca

Re: ASF board report draft for Nov 2023

2023-11-06 Thread Dongjoon Hyun
Thank you, Matei. It would be great if we can include upcoming plans briefly. - Apache Spark 3.4.2 (https://lists.apache.org/thread/35o2169l5r05k2mknqjy9mztq3ty1btr) - Apache Spark 3.3.4 EOL (December 16th) Dongjoon. On 2023/11/06 05:32:11 Matei Zaharia wrote: > It’s time to send our

Apache Spark 3.4.2 (?)

2023-11-06 Thread Dongjoon Hyun
Hi, All. Apache Spark 3.4.1 tag was created on Jun 19th and `branch-3.4` has 103 commits including important security and correctness patches like SPARK-44251, SPARK-44805, and SPARK-44940. https://github.com/apache/spark/releases/tag/v3.4.1 $ git log --oneline v3.4.1..HEAD | wc -l

ASF board report draft for Nov 2023

2023-11-05 Thread Matei Zaharia
It’s time to send our project’s quarterly report to the ASF board on Wednesday November 8th. Here’s what I wrote as a draft; let me know any suggested changes. = Issues for the board: - None Project status: - We released Apache Spark 3.5 on September 15, a feature

Re: [DISCUSS] SPIP: ShuffleManager short name registration via SparkPlugin

2023-11-05 Thread Alessandro Bellina
Thanks for the comments Reynold. This is an ease of use change, and it is not absolutely required (as other ease of use changes are not required either). That said, do we not want to invest in making Spark easier to configure for the average user, or even the user that is trying out Spark? Here

Re: [DISCUSS] SPIP: ShuffleManager short name registration via SparkPlugin

2023-11-04 Thread Reynold Xin
Why do we need this? The reason data source APIs need it is because it will be used by very unsophisticated end users and used all the time (for each connection / query). Shuffle is something you set up once, presumably by fairly sophisticated admins / engineers. On Sat, Nov 04, 2023 at 2:42

[DISCUSS] SPIP: ShuffleManager short name registration via SparkPlugin

2023-11-04 Thread Alessandro Bellina
Hello devs, I would like to start discussion on the SPIP "ShuffleManager short name registration via SparkPlugin" The idea behind this change is to allow a driver plugin (spark.plugins) to export ShuffleManagers via short names, along with sensible default configurations. Users can then use this

Re: On adding applyInArrow to groupBy and cogroup

2023-11-03 Thread Abdeali Kothari
Seeing more support for arrow based functions would be great. Gives more control to application developers. And so pandas just becomes 1 of the available options. On Fri, 3 Nov 2023, 21:23 Luca Canali, wrote: > Hi Enrico, > > > > +1 on supporting Arrow on par with Pandas. Besides the frameworks

RE: On adding applyInArrow to groupBy and cogroup

2023-11-03 Thread Luca Canali
Hi Enrico, +1 on supporting Arrow on par with Pandas. Besides the frameworks and libraries that you mentioned I add awkward array, a library used in High Energy Physics (for those interested more details on how we tested awkward array with Spark from back when mapInArrow was introduced can be

unsubscribe

2023-11-03 Thread Stefan Hagedorn

Re: Spark 3.2.1 parquet read error

2023-10-30 Thread Mich Talebzadeh
Hi, The error message when reading Parquet data in Spark 3.2.1 is due to a schema mismatch between the Parquet file and the Spark schema. The Parquet file contains INT32 data for the ss_sold_time_sk column, while Spark schema expects it to be BIGINT. This schema mismatch is causing the error.

Spark 3.2.1 parquet read error

2023-10-30 Thread Suryansh Agnihotri
Hello spark-dev, I loaded TPC-DS data in Parquet format using Spark *3.0.2*, and while reading it from Spark *3.2.1* my query fails with the error below. Later I set spark.sql.parquet.enableVectorizedReader=false, but it resulted in a different error. I am also providing output of

Re: On adding applyInArrow to groupBy and cogroup

2023-10-28 Thread Adam Binford
I'm definitely +1 to include this. - It seems like an odd feature parity gap to have a map function but no group apply function. - There's currently no way to use large arrow types with applyInPandas, which can lead to errors hitting the 2 GiB max string/binary array size. I have a PR to Arrow

r-project.org is down

2023-10-27 Thread Khalid Mammadov
Hi devs Just heads up, *r-project.org is down* and may affect builds if infra image cache needs rebuild. Not sure who needs to fix this Cheers Khalid

On adding applyInArrow to groupBy and cogroup

2023-10-26 Thread Enrico Minack
Hi devs, PySpark allows transforming a `DataFrame` via both the Pandas *and* the Arrow API: df.mapInArrow(map_arrow, schema="...") df.mapInPandas(map_pandas, schema="...") For `df.groupBy(...)` and `df.groupBy(...).cogroup(...)`, there is *only* a Pandas interface, no Arrow interface:

Unsubscribe

2023-10-26 Thread 杨军
Unsubscribe

Unsubscribe

2023-10-26 Thread Kiran Kumar Dusi
Unsubscribe

[VOTE][RESULT] SPIP: State Data Source - Reader

2023-10-25 Thread Jungtaek Lim
The vote passes with 9 +1s (4 binding +1s). Thanks to all who reviewed the SPIP doc and voted! (* = binding) +1: - Jungtaek Lim - Wenchen Fan (*) - Anish Shrigondekar - L. C. Hsieh (*) - Jia Fan - Bartosz Konieczny - Yuanjian Li (*) - Shixiong Zhu (*) - Yuepeng Pan +0: None -1: None
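As a quick sanity check of the tally in the result message above, here is a minimal sketch that recounts the +1 list. The list contents are copied from the message; the `tally` helper and the `PLUS_ONE_VOTES` name are hypothetical and exist only for this illustration.

```python
# Voters exactly as listed in the result message; "(*)" marks a binding vote.
PLUS_ONE_VOTES = [
    "Jungtaek Lim",
    "Wenchen Fan (*)",
    "Anish Shrigondekar",
    "L. C. Hsieh (*)",
    "Jia Fan",
    "Bartosz Konieczny",
    "Yuanjian Li (*)",
    "Shixiong Zhu (*)",
    "Yuepeng Pan",
]


def tally(votes):
    """Return (total, binding) counts for a list of voter entries."""
    binding = sum(1 for entry in votes if entry.rstrip().endswith("(*)"))
    return len(votes), binding


total, binding = tally(PLUS_ONE_VOTES)
print(f"{total} +1s ({binding} binding +1s)")  # prints "9 +1s (4 binding +1s)"
```

The recount agrees with the stated result of 9 +1s, 4 of them binding.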

Re: [VOTE] SPIP: State Data Source - Reader

2023-10-25 Thread Jungtaek Lim
Thanks all for participating! The vote passed. I'll send out the result to a separate thread. On Thu, Oct 26, 2023 at 10:52 AM Yuepeng Pan wrote: > +1 (non-binding) > > Regards, > Roc > > At 2023-10-23 12:23:52, "Jungtaek Lim" > wrote: > > Hi all, > > I'd like to start the vote for SPIP: State

Re:[VOTE] SPIP: State Data Source - Reader

2023-10-25 Thread Yuepeng Pan
+1 (non-binding) Regards, Roc At 2023-10-23 12:23:52, "Jungtaek Lim" wrote: Hi all, I'd like to start the vote for SPIP: State Data Source - Reader. The high level summary of the SPIP is that we propose a new data source which enables a read ability for state store in the checkpoint,

Re: [VOTE] SPIP: State Data Source - Reader

2023-10-25 Thread Shixiong Zhu
+1 Best Regards, Shixiong Zhu On Wed, Oct 25, 2023 at 4:20 PM Yuanjian Li wrote: > +1 > > On Wed, Oct 25, 2023 at 01:06, Jungtaek Lim wrote: > >> Friendly reminder: the VOTE thread got 2 binding votes and needs 1 more >> binding vote to pass. >> >> On Wed, Oct 25, 2023 at 1:21 AM Bartosz Konieczny < >>

Re: [VOTE] SPIP: State Data Source - Reader

2023-10-25 Thread Yuanjian Li
+1 On Wed, Oct 25, 2023 at 01:06, Jungtaek Lim wrote: > Friendly reminder: the VOTE thread got 2 binding votes and needs 1 more > binding vote to pass. > > On Wed, Oct 25, 2023 at 1:21 AM Bartosz Konieczny > wrote: > >> +1 >> >> On Tuesday, October 24, 2023, Jia Fan wrote: >> >>> +1 >>> >>> L. C. Hsieh

Re: [VOTE] SPIP: State Data Source - Reader

2023-10-24 Thread Jungtaek Lim
Friendly reminder: the VOTE thread got 2 binding votes and needs 1 more binding vote to pass. On Wed, Oct 25, 2023 at 1:21 AM Bartosz Konieczny wrote: > +1 > > On Tuesday, October 24, 2023, Jia Fan wrote: > >> +1 >> >> On Tue, Oct 24, 2023 at 13:23, L. C. Hsieh wrote: >> >>> +1 >>> >>> On Mon, Oct 23, 2023

Re: [VOTE] SPIP: State Data Source - Reader

2023-10-24 Thread Bartosz Konieczny
+1 On Tuesday, October 24, 2023, Jia Fan wrote: > +1 > > On Tue, Oct 24, 2023 at 13:23, L. C. Hsieh wrote: > >> +1 >> >> On Mon, Oct 23, 2023 at 6:31 PM Anish Shrigondekar >> wrote: >> > >> > +1 (non-binding) >> > >> > Thanks, >> > Anish >> > >> > On Mon, Oct 23, 2023 at 5:01 PM Wenchen Fan >> wrote: >>

Re: [VOTE] SPIP: State Data Source - Reader

2023-10-24 Thread Jia Fan
+1 On Tue, Oct 24, 2023 at 13:23, L. C. Hsieh wrote: > +1 > > On Mon, Oct 23, 2023 at 6:31 PM Anish Shrigondekar > wrote: > > > > +1 (non-binding) > > > > Thanks, > > Anish > > > > On Mon, Oct 23, 2023 at 5:01 PM Wenchen Fan wrote: > >> > >> +1 > >> > >> On Mon, Oct 23, 2023 at 4:03 PM Jungtaek Lim < >

Re: [VOTE] SPIP: State Data Source - Reader

2023-10-23 Thread L. C. Hsieh
+1 On Mon, Oct 23, 2023 at 6:31 PM Anish Shrigondekar wrote: > > +1 (non-binding) > > Thanks, > Anish > > On Mon, Oct 23, 2023 at 5:01 PM Wenchen Fan wrote: >> >> +1 >> >> On Mon, Oct 23, 2023 at 4:03 PM Jungtaek Lim >> wrote: >>> >>> Starting with my +1 (non-binding). Thanks! >>> >>> On Mon,

Re: [VOTE] SPIP: State Data Source - Reader

2023-10-23 Thread Anish Shrigondekar
+1 (non-binding) Thanks, Anish On Mon, Oct 23, 2023 at 5:01 PM Wenchen Fan wrote: > +1 > > On Mon, Oct 23, 2023 at 4:03 PM Jungtaek Lim > wrote: > >> Starting with my +1 (non-binding). Thanks! >> >> On Mon, Oct 23, 2023 at 1:23 PM Jungtaek Lim < >> kabhwan.opensou...@gmail.com> wrote: >> >>>

Re: [DISCUSS] SPIP: State Data Source - Reader

2023-10-23 Thread laglangyue
+1 Sent from my iPhone -- Original -- From: Jungtaek Lim https://lists.apache.org/thread/7ohctj1gmqbhds56bntf4s2zst5qpll1 (committer+ can login to reply) or search with "[VOTE] SPIP: State Data Source - Reader" in your inbox. Every vote would be really appreciated! On

Re: [VOTE] SPIP: State Data Source - Reader

2023-10-23 Thread Wenchen Fan
+1 On Mon, Oct 23, 2023 at 4:03 PM Jungtaek Lim wrote: > Starting with my +1 (non-binding). Thanks! > > On Mon, Oct 23, 2023 at 1:23 PM Jungtaek Lim > wrote: > >> Hi all, >> >> I'd like to start the vote for SPIP: State Data Source - Reader. >> >> The high level summary of the SPIP is that we

Re: [DISCUSS] SPIP: State Data Source - Reader

2023-10-23 Thread Jungtaek Lim
FYI: VOTE thread is open, please check the link https://lists.apache.org/thread/7ohctj1gmqbhds56bntf4s2zst5qpll1 (committer+ can login to reply) or search with "[VOTE] SPIP: State Data Source - Reader" in your inbox. Every vote would be really appreciated! On Mon, Oct 23, 2023 at 1:06 PM Jungtaek

Re: [VOTE] SPIP: State Data Source - Reader

2023-10-22 Thread Jungtaek Lim
Starting with my +1 (non-binding). Thanks! On Mon, Oct 23, 2023 at 1:23 PM Jungtaek Lim wrote: > Hi all, > > I'd like to start the vote for SPIP: State Data Source - Reader. > > The high level summary of the SPIP is that we propose a new data source > which enables a read ability for state

[VOTE] SPIP: State Data Source - Reader

2023-10-22 Thread Jungtaek Lim
Hi all, I'd like to start the vote for SPIP: State Data Source - Reader. The high level summary of the SPIP is that we propose a new data source which enables a read ability for state store in the checkpoint, via batch query. This would enable two major use cases 1) constructing tests with

Re: [DISCUSS] SPIP: State Data Source - Reader

2023-10-22 Thread Jungtaek Lim
I don't see major comments as of now. Given that the thread was initiated more than 10 days ago and I see multiple supporters, I'm going to initiate a VOTE thread. Please participate in the VOTE thread as well. Thanks! On Thu, Oct 19, 2023 at 11:39 AM Jungtaek Lim wrote: > Also, I want to

SPARK-24156: Kafka messages left behind in Spark Structured Streaming

2023-10-19 Thread Phillip Henry
Hi, folks, A few years ago, I asked about SSS not processing the final batch left on a Kafka topic when using groupBy, OutputMode.Append and withWatermark. At the time, Jungtaek Lim kindly pointed out (27/7/20) that this was expected behaviour, that (if I have this correct) a message needs to

unsubscribe

2023-10-18 Thread ankur

Re: [DISCUSS] SPIP: State Data Source - Reader

2023-10-18 Thread Jungtaek Lim
Also, I want to replicate the comment Liang-Chi put into SPIP doc, as it is a rather general and usual question for every new addition of data source. Hence I want to sort it out for everyone. As I know, the author implemented a third-party tool for query state store > as a data source long time

Re: [DISCUSS] SPIP: State Data Source - Reader

2023-10-18 Thread Jungtaek Lim
Thanks Raghu for your support! Btw, I'd like to replicate the support from JIRA ticket itself, I see support from Chaoqin and Praveen. Thanks both! On Thu, Oct 19, 2023 at 5:56 AM Raghu Angadi wrote: > +1 overall and a big +1 to keeping offline state-rebalancing as a primary > use case. > >

Re: [DISCUSS] SPIP: State Data Source - Reader

2023-10-18 Thread Raghu Angadi
+1 overall and a big +1 to keeping offline state-rebalancing as a primary use case. Raghu. On Mon, Oct 16, 2023 at 11:25 AM Bartosz Konieczny wrote: > Thank you, Jungtaek, for your answers! It's clear now. > > +1 for me. It seems like a prerequisite for further ops-related > improvements for

unsubscribe

2023-10-18 Thread Duy Pham
unsubscribe

Re: [DISCUSS] SPIP: State Data Source - Reader

2023-10-18 Thread Jungtaek Lim
Thanks Yuanjian for your support! I've left a comment but to replicate here - I agree with your point. It's really not easy for a new feature to be stable from the initial version and we might want to decide on breaking backward compatibility for (semantic) bug fixes/improvements. Maybe we could

Re: [DISCUSS] SPIP: State Data Source - Reader

2023-10-18 Thread Yuanjian Li
+1, I have no issues with the practicality and value of this feature itself. I've left some comments concerning ongoing maintenance and compatibility-related matters, which we can continue to discuss. On Tue, Oct 17, 2023 at 05:23, Jungtaek Lim wrote: > Thanks Bartosz and Anish for your support! > > I'll

unsubscribe

2023-10-17 Thread chu Dragon
unsubscribe

Re: [DISCUSS] SPIP: State Data Source - Reader

2023-10-16 Thread Jungtaek Lim
Thanks Bartosz and Anish for your support! I'll wait for a couple more days to see whether we can hear more voices on this. We could probably look for initiating a VOTE thread if there is no objection. On Tue, Oct 17, 2023 at 5:48 AM Anish Shrigondekar < anish.shrigonde...@databricks.com> wrote:

Re: [DISCUSS] SPIP: State Data Source - Reader

2023-10-16 Thread Anish Shrigondekar
Hi Jungtaek, Thanks for putting this together. +1 from me and looks good overall. Posted some minor comments/questions to the doc. Thanks, Anish On Mon, Oct 16, 2023 at 11:25 AM Bartosz Konieczny wrote: > Thank you, Jungtaek, for your answers! It's clear now. > > +1 for me. It seems like a

Re: [DISCUSS] SPIP: State Data Source - Reader

2023-10-16 Thread Bartosz Konieczny
Thank you, Jungtaek, for your answers! It's clear now. +1 for me. It seems like a prerequisite for further ops-related improvements for the state store management. I mean especially here the state rebalancing that could rely on this read+write state store API. I don't mean here the dynamic state

Re: [DISCUSS] SPIP: State Data Source - Reader

2023-10-16 Thread Jungtaek Lim
bump for better reach On Thu, Oct 12, 2023 at 4:26 PM Jungtaek Lim wrote: > Sorry, please use this link instead for SPIP doc: > https://docs.google.com/document/d/1_iVf_CIu2RZd3yWWF6KoRNlBiz5NbSIK0yThqG0EvPY/edit?usp=sharing > > > On Thu, Oct 12, 2023 at 3:58 PM Jungtaek Lim > wrote: > >> Hi

unsubscribe

2023-10-12 Thread Duy Pham

Re: [DISCUSS] SPIP: State Data Source - Reader

2023-10-12 Thread Jungtaek Lim
Sorry, please use this link instead for SPIP doc: https://docs.google.com/document/d/1_iVf_CIu2RZd3yWWF6KoRNlBiz5NbSIK0yThqG0EvPY/edit?usp=sharing On Thu, Oct 12, 2023 at 3:58 PM Jungtaek Lim wrote: > Hi dev, > > I'd like to start a discussion on "State Data Source - Reader". > > This proposal

[DISCUSS] SPIP: State Data Source - Reader

2023-10-12 Thread Jungtaek Lim
Hi dev, I'd like to start a discussion on "State Data Source - Reader". This proposal aims to introduce a new data source "statestore" which enables reading the state rows from existing checkpoint via offline (batch) query. This will enable users to 1) create unit tests against stateful query

Re: Watermark on late data only

2023-10-10 Thread Raghu Angadi
I like some way to expose watermarks to the user. It does affect the processing of the records, so it is relevant for the users. `current_watermark()` is a good option. The implementation of this might be engine specific. But it is a very relevant concept for authors of streaming pipelines.

Re: Watermark on late data only

2023-10-10 Thread Jungtaek Lim
slight correction/clarification: We now take the "previous" watermark to determine the late record, because they are valid inputs for non-first stateful operators; dropping records based on the same criteria would drop valid records from previous (upstream) stateful operators. Please look back

Re: Watermark on late data only

2023-10-10 Thread Jungtaek Lim
We wouldn't like to expose the internal mechanism to the public. As you are a very detail-oriented engineer tracking major changes, you might notice that we "changed" the definition of late record while fixing late records. Previously the late record was defined as a record having event time

Re: Watermark on late data only

2023-10-10 Thread Bartosz Konieczny
Thank you for the clarification, Jungtaek. Indeed, it doesn't sound like a highly demanded feature from the end users; I haven't seen that a lot on StackOverflow or mailing lists. I was just curious about the reasons. Using the arbitrary stateful processing could be indeed a workaround! But IMHO

Delimited identifiers.

2023-10-10 Thread Virgil Artimon Palanciuc
Apologies if this has been discussed before, I searched but couldn’t find it. What is the rationale behind picking backticks for identifier delimiters in spark? In the [SQL 92 spec]( https://www.contrib.andrew.cmu.edu/~shadow/sql/sql1992.txt), the delimited identifier is unambiguously defined

Re: Watermark on late data only

2023-10-09 Thread Jungtaek Lim
Technically speaking, "late data" represents the data which cannot be processed due to the fact the engine threw out the state associated with the data already. That said, the only reason watermark does exist for streaming is to handle stateful operators. From the engine's point of view, there is

Re: Welcome to Our New Apache Spark Committer and PMCs

2023-10-09 Thread Xinrong Meng
Congratulations! On Mon, Oct 9, 2023 at 5:06 AM Kent Yao wrote: > Congrats! > > Kent > > > On Saturday, October 7, 2023, John Zhuge wrote: > >> Congratulations! >> >> On Fri, Oct 6, 2023 at 6:41 PM Yi Wu >> wrote: >> >>> Congrats! >>> >>> On Sat, Oct 7, 2023 at 9:24 AM XiDuo You wrote: >>>

Re: Welcome to Our New Apache Spark Committer and PMCs

2023-10-09 Thread Kent Yao
Congrats! Kent On Saturday, October 7, 2023, John Zhuge wrote: > Congratulations! > > On Fri, Oct 6, 2023 at 6:41 PM Yi Wu wrote: > >> Congrats! >> >> On Sat, Oct 7, 2023 at 9:24 AM XiDuo You wrote: >> >>> Congratulations! >>> >>> On Fri, Oct 6, 2023 at 00:26, Prashant Sharma wrote: >>> > >>> > Congratulations >>> >

Watermark on late data only

2023-10-08 Thread Bartosz Konieczny
Hi, I've been analyzing the watermark propagation added in 3.5.0 recently and had to return to the basics of watermarks. One question is still unanswered in my head. Why are the watermarks reserved to stateful queries? Can't they apply to filtering late data out only? The reason is only

Re: Welcome to Our New Apache Spark Committer and PMCs

2023-10-06 Thread John Zhuge
Congratulations! On Fri, Oct 6, 2023 at 6:41 PM Yi Wu wrote: > Congrats! > > On Sat, Oct 7, 2023 at 9:24 AM XiDuo You wrote: > >> Congratulations! >> >> On Fri, Oct 6, 2023 at 00:26, Prashant Sharma wrote: >> > >> > Congratulations >> > >> > On Wed, 4 Oct, 2023, 8:52 pm huaxin gao, >> wrote: >> >> >> >>

Re: Welcome to Our New Apache Spark Committer and PMCs

2023-10-06 Thread Yi Wu
Congrats! On Sat, Oct 7, 2023 at 9:24 AM XiDuo You wrote: > Congratulations! > > On Fri, Oct 6, 2023 at 00:26, Prashant Sharma wrote: > > > > Congratulations > > > > On Wed, 4 Oct, 2023, 8:52 pm huaxin gao, wrote: > >> > >> Congratulations! > >> > >> On Wed, Oct 4, 2023 at 7:39 AM Chao Sun wrote: > >>>

Re: Welcome to Our New Apache Spark Committer and PMCs

2023-10-06 Thread XiDuo You
Congratulations! On Fri, Oct 6, 2023 at 00:26, Prashant Sharma wrote: > > Congratulations > > On Wed, 4 Oct, 2023, 8:52 pm huaxin gao, wrote: >> >> Congratulations! >> >> On Wed, Oct 4, 2023 at 7:39 AM Chao Sun wrote: >>> >>> Congratulations! >>> >>> On Wed, Oct 4, 2023 at 5:11 AM Jungtaek Lim >>>

Re: Welcome to Our New Apache Spark Committer and PMCs

2023-10-05 Thread Prashant Sharma
Congratulations! On Wed, 4 Oct, 2023, 8:52 pm huaxin gao, wrote: > Congratulations! > > On Wed, Oct 4, 2023 at 7:39 AM Chao Sun wrote: > >> Congratulations! >> >> On Wed, Oct 4, 2023 at 5:11 AM Jungtaek Lim >> wrote: >> >>> Congrats! >>> >>> On Wed, Oct 4, 2023 at 5:04 PM, yangjie01 wrote: >>>

Re: Welcome to Our New Apache Spark Committer and PMCs

2023-10-04 Thread huaxin gao
Congratulations! On Wed, Oct 4, 2023 at 7:39 AM Chao Sun wrote: > Congratulations! > > On Wed, Oct 4, 2023 at 5:11 AM Jungtaek Lim > wrote: > >> Congrats! >> >> On Wed, Oct 4, 2023 at 5:04 PM, yangjie01 wrote: >> >>> Congratulations! >>> >>> >>> >>> Jie Yang >>> >>> >>> >>> *From:* Dongjoon Hyun

Re: Welcome to Our New Apache Spark Committer and PMCs

2023-10-04 Thread Chao Sun
Congratulations! On Wed, Oct 4, 2023 at 5:11 AM Jungtaek Lim wrote: > Congrats! > > On Wed, Oct 4, 2023 at 5:04 PM, yangjie01 wrote: > >> Congratulations! >> >> >> >> Jie Yang >> >> >> >> *From:* Dongjoon Hyun >> *Date:* Wednesday, October 4, 2023 13:04 >> *To:* Hyukjin Kwon >> *Cc:* Hussein Awala , Rui

Re: Welcome to Our New Apache Spark Committer and PMCs

2023-10-04 Thread Jungtaek Lim
Congrats! On Wed, Oct 4, 2023 at 5:04 PM, yangjie01 wrote: > Congratulations! > > > > Jie Yang > > > > *From:* Dongjoon Hyun > *Date:* Wednesday, October 4, 2023 13:04 > *To:* Hyukjin Kwon > *Cc:* Hussein Awala , Rui Wang , > Gengliang Wang , Xiao Li , " > dev@spark.apache.org" > *Subject:* Re: Welcome to

Re: Welcome to Our New Apache Spark Committer and PMCs

2023-10-04 Thread yangjie01
Congratulations! Jie Yang From: Dongjoon Hyun Date: Wednesday, October 4, 2023 13:04 To: Hyukjin Kwon Cc: Hussein Awala , Rui Wang , Gengliang Wang , Xiao Li , "dev@spark.apache.org" Subject: Re: Welcome to Our New Apache Spark Committer and PMCs Congratulations! Dongjoon. On Tue, Oct 3, 2023 at 5:25 PM

Re: Welcome to Our New Apache Spark Committer and PMCs

2023-10-03 Thread Dongjoon Hyun
Congratulations! Dongjoon. On Tue, Oct 3, 2023 at 5:25 PM Hyukjin Kwon wrote: > Woohoo! > > On Tue, 3 Oct 2023 at 22:47, Hussein Awala wrote: > >> Congrats to all of you! >> >> On Tue 3 Oct 2023 at 08:15, Rui Wang wrote: >> >>> Congratulations! Well deserved! >>> >>> -Rui >>> >>> >>> On Mon,

Re: Welcome to Our New Apache Spark Committer and PMCs

2023-10-03 Thread Wenchen Fan
Congrats! On Wed, Oct 4, 2023 at 8:25 AM Hyukjin Kwon wrote: > Woohoo! > > On Tue, 3 Oct 2023 at 22:47, Hussein Awala wrote: > >> Congrats to all of you! >> >> On Tue 3 Oct 2023 at 08:15, Rui Wang wrote: >> >>> Congratulations! Well deserved! >>> >>> -Rui >>> >>> >>> On Mon, Oct 2, 2023 at

Re: Welcome to Our New Apache Spark Committer and PMCs

2023-10-03 Thread Hyukjin Kwon
Woohoo! On Tue, 3 Oct 2023 at 22:47, Hussein Awala wrote: > Congrats to all of you! > > On Tue 3 Oct 2023 at 08:15, Rui Wang wrote: > >> Congratulations! Well deserved! >> >> -Rui >> >> >> On Mon, Oct 2, 2023 at 10:32 PM Gengliang Wang wrote: >> >>> Congratulations to all! Well deserved! >>>

Re: Welcome to Our New Apache Spark Committer and PMCs

2023-10-03 Thread Mridul Muralidharan
Congratulations ! Looking forward to more exciting contributions :-) Regards, Mridul On Tue, Oct 3, 2023 at 2:51 AM Hussein Awala wrote: > Congrats to all of you! > > On Tue 3 Oct 2023 at 08:15, Rui Wang wrote: > >> Congratulations! Well deserved! >> >> -Rui >> >> >> On Mon, Oct 2, 2023 at

Re: Welcome to Our New Apache Spark Committer and PMCs

2023-10-03 Thread Hussein Awala
Congrats to all of you! On Tue 3 Oct 2023 at 08:15, Rui Wang wrote: > Congratulations! Well deserved! > > -Rui > > > On Mon, Oct 2, 2023 at 10:32 PM Gengliang Wang wrote: > >> Congratulations to all! Well deserved! >> >> On Mon, Oct 2, 2023 at 10:16 PM Xiao Li wrote: >> >>> Hi all, >>> >>>

Re: Welcome to Our New Apache Spark Committer and PMCs

2023-10-03 Thread Rui Wang
Congratulations! Well deserved! -Rui On Mon, Oct 2, 2023 at 10:32 PM Gengliang Wang wrote: > Congratulations to all! Well deserved! > > On Mon, Oct 2, 2023 at 10:16 PM Xiao Li wrote: > >> Hi all, >> >> The Spark PMC is delighted to announce that we have voted to add one new >> committer and

Re: Welcome to Our New Apache Spark Committer and PMCs

2023-10-02 Thread Gengliang Wang
Congratulations to all! Well deserved! On Mon, Oct 2, 2023 at 10:16 PM Xiao Li wrote: > Hi all, > > The Spark PMC is delighted to announce that we have voted to add one new > committer and two new PMC members. These individuals have consistently > contributed to the project and have clearly

Welcome to Our New Apache Spark Committer and PMCs

2023-10-02 Thread Xiao Li
Hi all, The Spark PMC is delighted to announce that we have voted to add one new committer and two new PMC members. These individuals have consistently contributed to the project and have clearly demonstrated their expertise. New Committer: - Jiaan Geng (focusing on Spark Connect and Spark SQL)

[RESULT] Updating documentation hosted for EOL and maintenance releases

2023-09-29 Thread Hyukjin Kwon
The vote passes with 9 +1s (6 binding +1s). (* = binding) +1: - Hyukjin Kwon * - Ruifeng Zheng * - Jiaan Geng - Yikun Jiang * - Herman van Hovell * - Michel Miotto Barbosa - Maciej Szymkiewicz * - Denny Lee - Yuanjian Li *

unsubscribe

2023-09-26 Thread praveen rao joginapally

Re: [ANNOUNCE] Apache Spark 3.5.0 released

2023-09-26 Thread Hyukjin Kwon
Awesome! On Wed, 27 Sept 2023 at 11:02, Hussein Awala wrote: > I installed the package, tested it with kubernetes master from Jupyter, > and tested it with Spark Connect server, all looks good. > > On Tue, Sep 26, 2023 at 10:45 PM Yuanjian Li > wrote: > >> FYI, we received the handling from

Re: Migrating the Junit framework used in Apache Spark 4.0 from 4.x to 5.x

2023-09-26 Thread Mridul Muralidharan
+1 for moving to a newer version. Thanks for driving this Jie Yang ! Regards, Mridul On Mon, Sep 25, 2023 at 10:15 AM 杨杰 wrote: > Hi all, > > In SPARK-44170 (apache/spark#43074 [1]), I’m trying to migrate the Junit > test framework used in Spark 4.0 from Junit4 to Junit5. > > > Although this

Re: [ANNOUNCE] Apache Spark 3.5.0 released

2023-09-26 Thread Hussein Awala
I installed the package, tested it with kubernetes master from Jupyter, and tested it with Spark Connect server, all looks good. On Tue, Sep 26, 2023 at 10:45 PM Yuanjian Li wrote: > FYI, we received the handling from Pypi > org yesterday, and the >

Re: [ANNOUNCE] Apache Spark 3.5.0 released

2023-09-26 Thread Yuanjian Li
FYI, we received the handling from Pypi org yesterday, and the upload of version 3.5.0 has just been completed. Please assist in verifying it. Thank you! On Sun, Sep 17, 2023 at 23:28, Ruifeng Zheng wrote: > Thanks Yuanjian for driving this release,

Re: [VOTE] Updating documentation hosted for EOL and maintenance releases

2023-09-26 Thread Yuanjian Li
+1 On Tue, Sep 26, 2023 at 12:07, Denny Lee wrote: > +1 > > On Tue, Sep 26, 2023 at 10:52 Maciej wrote: > >> +1 >> >> Best regards, >> Maciej Szymkiewicz >> >> Web: https://zero323.net >> PGP: A30CEF0C31A501EC >> >> On 9/26/23 17:12, Michel Miotto Barbosa wrote: >> >> +1 >> >> A disposição | At your disposal

Re: [VOTE] Updating documentation hosted for EOL and maintenance releases

2023-09-26 Thread Denny Lee
+1 On Tue, Sep 26, 2023 at 10:52 Maciej wrote: > +1 > > Best regards, > Maciej Szymkiewicz > > Web: https://zero323.net > PGP: A30CEF0C31A501EC > > On 9/26/23 17:12, Michel Miotto Barbosa wrote: > > +1 > > A disposição | At your disposal > > Michel Miotto Barbosa >

Re: [VOTE] Updating documentation hosted for EOL and maintenance releases

2023-09-26 Thread Maciej
+1 Best regards, Maciej Szymkiewicz Web:https://zero323.net PGP: A30CEF0C31A501EC On 9/26/23 17:12, Michel Miotto Barbosa wrote: +1 A disposição | At your disposal Michel Miotto Barbosa https://www.linkedin.com/in/michelmiottobarbosa/ mmiottobarb...@gmail.com +55 11 984 342 347 On Tue,

Re: [VOTE] Updating documentation hosted for EOL and maintenance releases

2023-09-26 Thread Michel Miotto Barbosa
+1 A disposição | At your disposal Michel Miotto Barbosa https://www.linkedin.com/in/michelmiottobarbosa/ mmiottobarb...@gmail.com +55 11 984 342 347 On Tue, Sep 26, 2023 at 11:44 AM Herman van Hovell wrote: > +1 > > On Tue, Sep 26, 2023 at 10:39 AM yangjie01 > wrote: > >> +1 >> >> >> >>

Re: [VOTE] Updating documentation hosted for EOL and maintenance releases

2023-09-26 Thread Herman van Hovell
+1 On Tue, Sep 26, 2023 at 10:39 AM yangjie01 wrote: > +1 > > > > *From:* Yikun Jiang > *Date:* Tuesday, September 26, 2023 18:06 > *To:* dev > *Cc:* Hyukjin Kwon , Ruifeng Zheng < > ruife...@apache.org> > *Subject:* Re: [VOTE] Updating documentation hosted for EOL and maintenance > releases > > > >

Re: [VOTE] Updating documentation hosted for EOL and maintenance releases

2023-09-26 Thread yangjie01
+1 From: Yikun Jiang Date: Tuesday, September 26, 2023 18:06 To: dev Cc: Hyukjin Kwon , Ruifeng Zheng Subject: Re: [VOTE] Updating documentation hosted for EOL and maintenance releases +1, I believe it is a wise choice to update the EOL policy of the document based on the real demands of community users.

Re: [VOTE] Updating documentation hosted for EOL and maintenance releases

2023-09-26 Thread Yikun Jiang
+1, I believe it is a wise choice to update the EOL policy of the document based on the real demands of community users. Regards, Yikun On Tue, Sep 26, 2023 at 1:06 PM Ruifeng Zheng wrote: > +1 > > On Tue, Sep 26, 2023 at 12:51 PM Hyukjin Kwon > wrote: > >> Hi all, >> >> I would like to

Re:Re: [VOTE] Updating documentation hosted for EOL and maintenance releases

2023-09-26 Thread beliefer
+1 At 2023-09-26 13:03:56, "Ruifeng Zheng" wrote: +1 On Tue, Sep 26, 2023 at 12:51 PM Hyukjin Kwon wrote: Hi all, I would like to start the vote for updating documentation hosted for EOL and maintenance releases to improve the usability here, and in order for end users to read the

Re: [VOTE] Updating documentation hosted for EOL and maintenance releases

2023-09-25 Thread Ruifeng Zheng
+1 On Tue, Sep 26, 2023 at 12:51 PM Hyukjin Kwon wrote: > Hi all, > > I would like to start the vote for updating documentation hosted for EOL > and maintenance releases to improve the usability here, and in order for > end users to read the proper and correct documentation. > > For discussion

[VOTE] Updating documentation hosted for EOL and maintenance releases

2023-09-25 Thread Hyukjin Kwon
Hi all, I would like to start the vote for updating documentation hosted for EOL and maintenance releases to improve the usability here, and in order for end users to read the proper and correct documentation. For discussion thread, please refer to

Migrating the Junit framework used in Apache Spark 4.0 from 4.x to 5.x

2023-09-25 Thread 杨杰
Hi all, In SPARK-44170 (apache/spark#43074 [1]), I’m trying to migrate the Junit test framework used in Spark 4.0 from Junit4 to Junit5. Although this involves a fair amount of code modifications, given that Junit 4 is still developed based on Java 6 source code and it hasn't released a new

unsubscribe

2023-09-24 Thread Wei Hong
unsubscribe

Re: Are DataFrame rows ordered without an explicit ordering clause?

2023-09-24 Thread Mich Talebzadeh
LOL, Hindsight is a very good thing and often one learns these through experience. Once told off because strict ordering was not maintained, then the lesson will never be forgotten! HTH Mich Talebzadeh, Distinguished Technologist, Solutions Architect & Engineer London United Kingdom view

Re: Are DataFrame rows ordered without an explicit ordering clause?

2023-09-23 Thread Steve Loughran
Now, if you are ruthless it'd make sense to randomise the order of results if someone left out the order by, to stop complacency. like that time sun changed the ordering that methods were returned in a Class.listMethods() call and everyone's junit test cases failed if they'd assumed that ordering

Re:Are DataFrame rows ordered without an explicit ordering clause?

2023-09-23 Thread beliefer
AFAIK, the order is arbitrary whether it's SQL without a specified ORDER BY clause or a DataFrame without sort. The behavior is consistent between them. At 2023-09-18 23:47:40, "Nicholas Chammas" wrote: I’ve always considered DataFrames to be logically equivalent to SQL tables or queries. In

[DISCUSS] Porting back SPARK-45178 to 3.5/3.4 version lines

2023-09-20 Thread Jungtaek Lim
Hi devs, I'd like to get some inputs for dealing with the possible correctness issue we figured. The JIRA ticket is SPARK-45178 and I described the issue and solution I proposed. Context: Source might behave incorrectly leading to correctness
