Re: [DISCUSS] [Spark SQL] A single-pass resolution approach for the Catalyst Analyzer

2024-08-26 Thread Wenchen Fan
uced at least two subtle bugs > that many reviewers weren't able to catch and those two bugs would not have > been possible to introduce if we had a single pass analyzer. Single pass > can make the whole framework more robust. > > > > > > > On Tue, Aug 20, 2024 a

Re: [DISCUSS] [Spark SQL] A single-pass resolution approach for the Catalyst Analyzer

2024-08-20 Thread Reynold Xin
+1 on this too When I implemented "group by all", I introduced at least two subtle bugs that many reviewers weren't able to catch and those two bugs would not have been possible to introduce if we had a single pass analyzer. Single pass can make the whole framework more robust.

[DISCUSS] [Spark SQL] A single-pass resolution approach for the Catalyst Analyzer

2024-08-20 Thread Xiao Li
This sounds like a good idea! The Analyzer is complex. The changes in the new Analyzer should not affect the existing one. The users could add the QO rules and rely on the existing structures and patterns of the logical plan trees generated by the current one. The new Analyzer needs to generate

Re: [External Mail] Re: Welcoming a new PMC member

2024-08-14 Thread yangjie01
Congratulations! From: Matei Zaharia Date: Wednesday, August 14, 2024, 06:03 To: Wenchen Fan Cc: Ruifeng Zheng, Martin Grund, Peter Toth, dev Subject: [External Mail] Re: Welcoming a new PMC member Congrats and welcome Kent! On Aug 13, 2024, at 7:27 AM, Wenchen Fan wrote: Congratulations! On Tue, Aug 13, 2024

Re: [Spark SQL] A single-pass resolution approach for the Catalyst Analyzer

2024-08-14 Thread Vladimir Golubev
2: Support main datasources, ...). Running both analyzers in mixed mode may lead to unexpected logical plan problems, because that would introduce a completely different chain of transformations On Wed, Aug 14, 2024 at 3:58 PM Herman van Hovell wrote: > +1(000) on this! > > This should

Re: [Spark SQL] A single-pass resolution approach for the Catalyst Analyzer

2024-08-14 Thread Herman van Hovell
+1(000) on this! This should massively reduce allocations done in the analyzer, and it is much more efficient. I also can't count the times that I had to increase the number of iterations. This sounds like a no-brainer to me. I do have two questions: - How do we ensure that we

Re: Welcoming a new PMC member

2024-08-14 Thread Reynold Xin
>>>>> On Mon, Aug 12, 2024 at 8:46 PM Dongjoon Hyun < > dongjoon.h...@gmail.com <mailto:dongjoon.h...@gmail.com>> wrote: > > >>>>>> Congratulations, Kent. > > >>>>>> > > >>>>>> Dongjoon. > &

Re: Welcoming a new PMC member

2024-08-14 Thread Kent Yao
>> Congratulations Kent ! > >>>>> > >>>>> Regards, > >>>>> Mridul > >>>>> > >>>>> On Mon, Aug 12, 2024 at 8:46 PM Dongjoon Hyun >>>>> <mailto:dongjoon.h...@gmail.com>> wrote: > >>>

Re: Welcoming a new PMC member

2024-08-13 Thread Matei Zaharia
>>>> <mailto:dongjoon.h...@gmail.com>> wrote: >>>>>> Congratulations, Kent. >>>>>> >>>>>> Dongjoon. >>>>>> >>>>>> On Mon, Aug 12, 2024 at 5:22 PM Xiao Li >>>>> <mailto:gatorsm...@gmail.com>> wrote: >>>>>>> Congratulations ! >>>>>>> >>>>>>> On Mon, Aug 12, 2024 at 17:20, Hyukjin Kwon <mailto:gurwls...@apache.org> wrote: >>>>>>>> Hi all, >>>>>>>> >>>>>>>> The Spark PMC recently voted to add a new PMC member, Kent Yao. Join >>>>>>>> me in welcoming him to his new role! >>>>>>>>

Re: Welcoming a new PMC member

2024-08-13 Thread Wenchen Fan
n, Aug 12, 2024 at 8:46 PM Dongjoon Hyun >>>> wrote: >>>> >>>>> Congratulations, Kent. >>>>> >>>>> Dongjoon. >>>>> >>>>> On Mon, Aug 12, 2024 at 5:22 PM Xiao Li wrote: >>>>> >>>>>> Congratulations ! >>>>>> >>>>>> On Mon, Aug 12, 2024 at 17:20, Hyukjin Kwon wrote: >>>>>> >>>>>>> Hi all, >>>>>>> >>>>>>> The Spark PMC recently voted to add a new PMC member, Kent Yao. Join >>>>>>> me in welcoming him to his new role! >>>>>>> >>>>>>>

Re: Welcoming a new PMC member

2024-08-13 Thread Ruifeng Zheng
> >>>> On Mon, Aug 12, 2024 at 5:22 PM Xiao Li wrote: >>>> >>>>> Congratulations ! >>>>> >>>>> On Mon, Aug 12, 2024 at 17:20, Hyukjin Kwon wrote: >>>>> >>>>>> Hi all, >>>>>> >>>>>> The Spark PMC recently voted to add a new PMC member, Kent Yao. Join >>>>>> me in welcoming him to his new role! >>>>>> >>>>>>

Re: Welcoming a new PMC member

2024-08-13 Thread Martin Grund
> On Mon, Aug 12, 2024 at 8:46 PM Dongjoon Hyun >> wrote: >> >>> Congratulations, Kent. >>> >>> Dongjoon. >>> >>> On Mon, Aug 12, 2024 at 5:22 PM Xiao Li wrote: >>> >>>> Congratulations ! >>>> >>>> Hyukjin Kwon

Re: Welcoming a new PMC member

2024-08-13 Thread Peter Toth
> >> On Mon, Aug 12, 2024 at 5:22 PM Xiao Li wrote: >> >>> Congratulations ! >>> >>> On Mon, Aug 12, 2024 at 17:20, Hyukjin Kwon wrote: >>> >>>> Hi all, >>>> >>>> The Spark PMC recently voted to add a new PMC member, Kent Yao. Join me >>>> in welcoming him to his new role! >>>> >>>>

Re: Welcoming a new PMC member

2024-08-12 Thread Gengliang Wang
>>> Congratulations ! >>> >>> Hyukjin Kwon 于2024年8月12日周一 17:20写道: >>> >>>> Hi all, >>>> >>>> The Spark PMC recently voted to add a new PMC member, Kent Yao. Join me >>>> in welcoming him to his new role! >>>> >>>>

Re: Welcoming a new PMC member

2024-08-12 Thread Denny Lee
Congrats, Kent! On Tue, Aug 13, 2024 at 9:06 AM Dongjoon Hyun wrote: > Congratulations, Kent. > > Dongjoon. > > On Mon, Aug 12, 2024 at 5:22 PM Xiao Li wrote: > >> Congratulations ! >> >> On Mon, Aug 12, 2024 at 17:20, Hyukjin Kwon wrote: >> >>> Hi all, >

Re: Welcoming a new PMC member

2024-08-12 Thread huaxin gao
>> >>> Congratulations ! >>> >>> On Mon, Aug 12, 2024 at 17:20, Hyukjin Kwon wrote: >>> >>>> Hi all, >>>> >>>> The Spark PMC recently voted to add a new PMC member, Kent Yao. Join me >>>> in welcoming him to his new role! >>>> >>>>

Re: Welcoming a new PMC member

2024-08-12 Thread Mridul Muralidharan
>>> Hi all, >>> >>> The Spark PMC recently voted to add a new PMC member, Kent Yao. Join me >>> in welcoming him to his new role! >>> >>>

Re: Welcoming a new PMC member

2024-08-12 Thread Jungtaek Lim
Congrats, Kent! On Tue, Aug 13, 2024 at 10:06 AM Dongjoon Hyun wrote: > Congratulations, Kent. > > Dongjoon. > > On Mon, Aug 12, 2024 at 5:22 PM Xiao Li wrote: > >> Congratulations ! >> >> On Mon, Aug 12, 2024 at 17:20, Hyukjin Kwon wrote: >> >>> Hi all, >

Re: Welcoming a new PMC member

2024-08-12 Thread XiDuo You
Congratulations! On Tue, Aug 13, 2024 at 08:28, Yuming Wang wrote: > > Congratulations! > > On Mon, Aug 12, 2024 at 5:20 PM Hyukjin Kwon wrote: >> >> Hi all, >> >> The Spark PMC recently voted to add a new PMC member, Kent Yao. Join me

Re: Welcoming a new PMC member

2024-08-12 Thread Yuming Wang
Congratulations! On Mon, Aug 12, 2024 at 5:20 PM Hyukjin Kwon wrote: > Hi all, > > The Spark PMC recently voted to add a new PMC member, Kent Yao. Join me in > welcoming him to his new role! > >

Re: Welcoming a new PMC member

2024-08-12 Thread Dongjoon Hyun
Congratulations, Kent. Dongjoon. On Mon, Aug 12, 2024 at 5:22 PM Xiao Li wrote: > Congratulations ! > > On Mon, Aug 12, 2024 at 17:20, Hyukjin Kwon wrote: > >> Hi all, >> >> The Spark PMC recently voted to add a new PMC member, Kent Yao. Join me >> in welcoming him to his new role! >> >>

Re: Welcoming a new PMC member

2024-08-12 Thread Xiao Li
Congratulations ! On Mon, Aug 12, 2024 at 17:20, Hyukjin Kwon wrote: > Hi all, > > The Spark PMC recently voted to add a new PMC member, Kent Yao. Join me in > welcoming him to his new role! > >

Welcoming a new PMC member

2024-08-12 Thread Hyukjin Kwon
Hi all, The Spark PMC recently voted to add a new PMC member, Kent Yao. Join me in welcoming him to his new role!

[Spark SQL] A single-pass resolution approach for the Catalyst Analyzer

2024-08-09 Thread Vladimir Golubev
unobvious, so it’s hard to introduce changes without having the full knowledge. By modifying one rule, the whole chain of transformations can change in an unobvious way. Since we can hit the maximum number of iterations, there’s no guarantee that the plan is going to be resolved. And from a
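To make the contrast in this thread concrete, here is a minimal toy sketch in Python. It is not Catalyst code: the "plan" is just a dictionary of names that are either resolved (an integer) or still refer to another name, and the functions `resolve_one_level`, `run_to_fixed_point`, and `resolve_single_pass` are hypothetical names invented for this illustration. It only shows the general shape of the concern, namely that a fixed-point loop with an iteration cap can stop before everything is resolved, while a single traversal that resolves dependencies before dependents cannot.

```python
def resolve_one_level(plan):
    """One 'rule' pass: resolve references whose target was already resolved before this pass."""
    return {
        name: plan[value] if isinstance(value, str) and isinstance(plan[value], int) else value
        for name, value in plan.items()
    }


def run_to_fixed_point(plan, rules, max_iterations):
    """Re-apply all rules until the plan stops changing or the iteration budget runs out."""
    for iteration in range(1, max_iterations + 1):
        new_plan = plan
        for rule in rules:
            new_plan = rule(new_plan)
        if new_plan == plan:
            return plan, iteration, True  # converged
        plan = new_plan
    return plan, max_iterations, False  # budget exhausted, plan may be unresolved


def resolve_single_pass(plan):
    """Resolve every name exactly once, visiting each dependency before its dependent."""
    resolved = {}

    def resolve(name):
        if name not in resolved:
            value = plan[name]
            resolved[name] = resolve(value) if isinstance(value, str) else value
        return resolved[name]

    for name in plan:
        resolve(name)
    return resolved


# A chain of references: d -> c -> b -> a -> 1
toy_plan = {"a": 1, "b": "a", "c": "b", "d": "c"}

full_plan, full_iterations, full_ok = run_to_fixed_point(toy_plan, [resolve_one_level], 10)
capped_plan, capped_iterations, capped_ok = run_to_fixed_point(toy_plan, [resolve_one_level], 2)
single_pass_plan = resolve_single_pass(toy_plan)
```

In this toy, the fixed-point loop needs one pass per level of the chain plus a final pass to notice that nothing changed, so a cap of 2 leaves `d` unresolved, whereas the single traversal resolves everything regardless of chain depth.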

Re: caching a dataframe in Spark takes lot of time

2024-05-08 Thread Prem Sahoo
tps://en.wikipedia.org/wiki/Wernher_von_Braun>Von > Braun <https://en.wikipedia.org/wiki/Wernher_von_Braun>)". > > > On Wed, 8 May 2024 at 13:41, Prem Sahoo wrote: > >> Could any one help me here ? >> Sent from my iPhone >> >> > On May 7, 2024

Re: caching a dataframe in Spark takes lot of time

2024-05-08 Thread Mich Talebzadeh
e > > > On May 7, 2024, at 4:30 PM, Prem Sahoo wrote: > > > >  > > Hello Folks, > > in Spark I have read a file and done some transformation and finally > writing to hdfs. > > > > Now I am interested in writing the same dataframe to MapRFS but for this

Re: caching a dataframe in Spark takes lot of time

2024-05-08 Thread Prem Sahoo
Could any one help me here ? Sent from my iPhone > On May 7, 2024, at 4:30 PM, Prem Sahoo wrote: > >  > Hello Folks, > in Spark I have read a file and done some transformation and finally writing > to hdfs. > > Now I am interested in writing the same dataframe to MapR

caching a dataframe in Spark takes lot of time

2024-05-07 Thread Prem Sahoo
Hello Folks, in Spark I have read a file, applied some transformations, and finally written the result to HDFS. Now I am interested in writing the same dataframe to MapRFS, but for this Spark will execute the full DAG again (recompute all the previous steps, i.e. the read plus all transformations). I don't

Re: Introducing Apache Gluten(incubating), a middle layer to offload Spark to native engine

2024-04-10 Thread Holden Karau
On Wed, Apr 10, 2024 at 9:54 PM Binwei Yang wrote: > > Gluten already supports the Velox and ClickHouse backends. > DataFusion support has also been proposed, but no one has worked on it yet. > > Gluten isn't a POC. It's under active development, but some companies > al

Re: Introducing Apache Gluten(incubating), a middle layer to offload Spark to native engine

2024-04-10 Thread Binwei Yang
Gluten already supports the Velox and ClickHouse backends. DataFusion support has also been proposed, but no one has worked on it yet. Gluten isn't a POC. It's under active development, but some companies already use it. On 2024/04/11 03:32:01 Dongjoon Hyun wrote: > I'm

Re: Introducing Apache Gluten(incubating), a middle layer to offload Spark to native engine

2024-04-10 Thread Dongjoon Hyun
I'm interested in your claim. Could you elaborate or provide some evidence for your claim, *a door for all native libraries*, Binwei? For example, is there any POC for that claim? Maybe, did I miss something in that SPIP? Dongjoon. On Wed, Apr 10, 2024 at 8:19 PM Binwei Yang wrote: >

Re: Introducing Apache Gluten(incubating), a middle layer to offload Spark to native engine

2024-04-10 Thread Binwei Yang
The SPIP is not about the current Gluten; it opens a door for all native libraries and accelerators to be supported. On 2024/04/11 00:27:43 Weiting Chen wrote: > Yes, the 1st Apache release(v1.2.0) for Gluten will be in September. > For Spark version support, currently Gluten v1.1.1 supports Spark 3.2 a

Re: Introducing Apache Gluten(incubating), a middle layer to offload Spark to native engine

2024-04-10 Thread Weiting Chen
project is still under active development now, and doesn't have a > stable release. > > https://github.com/apache/incubator-gluten/releases/tag/v1.1.1 > > In the Apache Spark community, Apache Spark 3.2 and 3.3 is the end of > support. > And, 3.4 will have 3.4.3 next

Re: Introducing Apache Gluten(incubating), a middle layer to offload Spark to native engine

2024-04-09 Thread Dongjoon Hyun
Thank you for sharing, Weiting. Do you think you can share the future milestone of Apache Gluten? I'm wondering when the first stable release will come and how we can coordinate across the ASF communities. > This project is still under active development now, and doesn't have a s

Introducing Apache Gluten(incubating), a middle layer to offload Spark to native engine

2024-04-08 Thread WeitingChen
Hi all, We are excited to introduce a new Apache incubating project called Gluten. Gluten serves as a middleware layer designed to offload Spark to native engines like Velox or ClickHouse. For more detailed information, please visit the project repository at https://github.com/apache/incubator

Re: A proposal for creating a Knowledge Sharing Hub for Apache Spark Community

2024-03-23 Thread Jay Han
> Some of you may be aware that Databricks community Home | Databricks >>> have just launched a knowledge sharing hub. I thought it would be a >>> good idea for the Apache Spark user group to have the same, especially >>> for repeat questions on Spark core, Spark SQL, Spa

Re: A proposal for creating a Knowledge Sharing Hub for Apache Spark Community

2024-03-19 Thread Mich Talebzadeh
I concur. Whilst Databricks' (a commercial entity) Knowledge Sharing Hub can be a useful resource for sharing knowledge and engaging with their respective community, ASF likely prioritizes platforms and channels that align more closely with its principles of open source, and vendor neutr

Re: A proposal for creating a Knowledge Sharing Hub for Apache Spark Community

2024-03-19 Thread Steve Loughran
ASF will be unhappy about this. and stack overflow exists. otherwise: apache Confluent and linkedIn exist; LI is the option I'd point at On Mon, 18 Mar 2024 at 10:59, Mich Talebzadeh wrote: > Some of you may be aware that Databricks community Home | Databricks > have just launched

Re: A proposal for creating a Knowledge Sharing Hub for Apache Spark Community

2024-03-19 Thread Mich Talebzadeh
n entertain this idea. They seem to have a well defined structure for hosting topics. Let me know your thoughts Thanks <https://community.databricks.com/t5/knowledge-sharing-hub/bd-p/Knowledge-Sharing-Hub> Mich Talebzadeh, Dad | Technologist | Solutions Architect | Engineer London United Kin

Re: A proposal for creating a Knowledge Sharing Hub for Apache Spark Community

2024-03-18 Thread Varun Shah
+1 Great initiative. QQ: Stack Overflow has a similar feature called "Collectives", but I am not sure of the expenses to create one for Apache Spark. With SO being used (at least before ChatGPT became quite the norm for searching questions), it already has a lot of questions asked an

Re: A proposal for creating a Knowledge Sharing Hub for Apache Spark Community

2024-03-18 Thread Deepak Sharma
>> >> >> >> >> >> >> *From: *ashok34...@yahoo.com.INVALID >> *Date: *Monday, March 18, 2024 at 6:36 AM >> *To: *user @spark , Spark dev list < >> dev@spark.apache.org>, Mich Talebzadeh >> *Cc: *Matei Zaharia >> *Subject: *R

Re: A proposal for creating a Knowledge Sharing Hub for Apache Spark Community

2024-03-18 Thread Hyukjin Kwon
org/wiki/Wernher_von_Braun>)". > > > On Mon, 18 Mar 2024 at 16:23, Parsian, Mahmoud > wrote: > >> Good idea. Will be useful >> >> >> >> +1 >> >> >> >> >> >> >> >> *From: *ashok34...@yahoo.com.INVALID

Re: A proposal for creating a Knowledge Sharing Hub for Apache Spark Community

2024-03-18 Thread Mich Talebzadeh
OK thanks for the update. What does officially blessed signify here? Can we have and run it as a sister site? The reason this comes to my mind is that the interested parties should have easy access to this site (from ISUG Spark sites) as a reference repository. I guess the advice would be that

Re: A proposal for creating a Knowledge Sharing Hub for Apache Spark Community

2024-03-18 Thread Reynold Xin
;> >>> >>> >>> +1 >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> >>> *From:* ashok34668@ yahoo. com. INVAL

Re: A proposal for creating a Knowledge Sharing Hub for Apache Spark Community

2024-03-18 Thread Mich Talebzadeh
ars 2024 kl. 17:26 skrev Parsian, Mahmoud > : > >> Good idea. Will be useful >> >> >> >> +1 >> >> >> >> >> >> >> >> *From: *ashok34...@yahoo.com.INVALID >> *Date: *Monday, March 18, 2024 at 6:36 AM >> *To: *

Re: A proposal for creating a Knowledge Sharing Hub for Apache Spark Community

2024-03-18 Thread Bjørn Jørgensen
y, March 18, 2024 at 6:36 AM > *To: *user @spark , Spark dev list < > dev@spark.apache.org>, Mich Talebzadeh > *Cc: *Matei Zaharia > *Subject: *Re: A proposal for creating a Knowledge Sharing Hub for Apache > Spark Community > > External message, be mindful when clicking l

Re: A proposal for creating a Knowledge Sharing Hub for Apache Spark Community

2024-03-18 Thread Mich Talebzadeh
> dev@spark.apache.org>, Mich Talebzadeh > *Cc: *Matei Zaharia > *Subject: *Re: A proposal for creating a Knowledge Sharing Hub for Apache > Spark Community > > External message, be mindful when clicking links or attachments > > > > Good idea. Will be useful >

A proposal for creating a Knowledge Sharing Hub for Apache Spark Community

2024-03-18 Thread Mich Talebzadeh
Some of you may be aware that the Databricks community has just launched a knowledge sharing hub. I thought it would be a good idea for the Apache Spark user group to have the same, especially for repeat questions on Spark core, Spark SQL, Spark Structured Streaming, Spark MLlib and

Re: Introducing Comet, a plugin to accelerate Spark execution via DataFusion and Arrow

2024-02-19 Thread Mich Talebzadeh
> shuffle and better memory management have been introduced, we plan to > publish the benchmark results (at least TPC-H) in the repo. > > > Compared to standard Spark, what kind of performance gains can be > expected with Comet? > > Currently, users could benefit from Comet in a few a

Re: Introducing Comet, a plugin to accelerate Spark execution via DataFusion and Arrow

2024-02-19 Thread Chao Sun
epo. > Compared to standard Spark, what kind of performance gains can be expected with Comet? Currently, users could benefit from Comet in a few areas: - Parquet read: a few improvements have been made against reading from S3 in particular, so users can expect better scan performance in this sc

Re: Introducing Comet, a plugin to accelerate Spark execution via DataFusion and Arrow

2024-02-16 Thread Mich Talebzadeh
Hi Chao, As a cool feature - Compared to standard Spark, what kind of performance gains can be expected with Comet? - Can one use Comet on k8s in conjunction with something like a Volcano addon? HTH Mich Talebzadeh, Dad | Technologist | Solutions Architect | Engineer London

Re: Introducing Comet, a plugin to accelerate Spark execution via DataFusion and Arrow

2024-02-15 Thread Mich Talebzadeh
ources but of course cannot be guaranteed . It is essential to note that, as with any advice, one verified and tested result holds more weight than a thousand expert opinions. On Thu, 15 Feb 2024 at 01:18, Chao Sun wrote: > Hi Praveen, > > We will add a "Getting Started" sectio

Re: Introducing Comet, a plugin to accelerate Spark execution via DataFusion and Arrow

2024-02-14 Thread Chao Sun
Hi Praveen, We will add a "Getting Started" section in the README soon, but basically comet-spark-shell <https://github.com/apache/arrow-datafusion-comet/blob/main/bin/comet-spark-shell> in the repo should provide a basic tool to build Comet and launch a Spark shell with it. Note

Re: Introducing Comet, a plugin to accelerate Spark execution via DataFusion and Arrow

2024-02-14 Thread Liu(Laswift) Cao
wrote: > >> > >> Absolutely thrilled to see the project going open-source! Huge congrats > to Chao and the entire team on this milestone! > >> > >> Yufei > >> > >> > >> On Tue, Feb 13, 2024 at 12:43 PM Chao Sun wrote: > >>>

Re: Introducing Comet, a plugin to accelerate Spark execution via DataFusion and Arrow

2024-02-14 Thread Chao Sun
team on this milestone! >> >> Yufei >> >> >> On Tue, Feb 13, 2024 at 12:43 PM Chao Sun wrote: >>> >>> Hi all, >>> >>> We are very happy to announce that Project Comet, a plugin to >>> accelerate Spark query execution via leve

Re: Introducing Comet, a plugin to accelerate Spark execution via DataFusion and Arrow

2024-02-13 Thread John Zhuge
>> Hi all, >> >> We are very happy to announce that Project Comet, a plugin to >> accelerate Spark query execution via leveraging DataFusion and Arrow, >> has now been open sourced under the Apache Arrow umbrella. Please >> check the project repo >> ht

Re: Introducing Comet, a plugin to accelerate Spark execution via DataFusion and Arrow

2024-02-13 Thread Yufei Gu
Absolutely thrilled to see the project going open-source! Huge congrats to Chao and the entire team on this milestone! Yufei On Tue, Feb 13, 2024 at 12:43 PM Chao Sun wrote: > Hi all, > > We are very happy to announce that Project Comet, a plugin to > accelerate Spark query e

Re: How do you debug a code-generated aggregate?

2024-02-13 Thread Mich Talebzadeh
Sure thanks for clarification. I gather what you are alluding to is -- in a distributed environment, when one does operations that involve shuffling or repartitioning of data, the order in which this data is processed across partitions is not guaranteed. So when repartitioning a dataframe, the

Re: How do you debug a code-generated aggregate?

2024-02-13 Thread Jack Goodson
Apologies if it wasn't clear, I was meaning the difficulty of debugging, not floating point precision :) On Wed, Feb 14, 2024 at 2:03 AM Mich Talebzadeh wrote: > Hi Jack, > > " most SQL engines suffer from the same issue... "" > > Sure. This behavior is

Re: Introducing Comet, a plugin to accelerate Spark execution via DataFusion and Arrow

2024-02-13 Thread Holden Karau
This looks really cool :) Out of interest what are the differences in the approach between this and Gluten? On Tue, Feb 13, 2024 at 12:42 PM Chao Sun wrote: > Hi all, > > We are very happy to announce that Project Comet, a plugin to > accelerate Spark query execution via leveragin

Introducing Comet, a plugin to accelerate Spark execution via DataFusion and Arrow

2024-02-13 Thread Chao Sun
Hi all, We are very happy to announce that Project Comet, a plugin to accelerate Spark query execution via leveraging DataFusion and Arrow, has now been open sourced under the Apache Arrow umbrella. Please check the project repo https://github.com/apache/arrow-datafusion-comet for more details if

Re: How do you debug a code-generated aggregate?

2024-02-13 Thread Mich Talebzadeh
Hi Jack, "most SQL engines suffer from the same issue..." Sure. This behavior is not a bug, but rather a consequence of the limitations of floating-point precision. The numbers involved in the example (see JIRA ticket [SPARK-47024] Sum of floats/doubles may be incorre
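As a small illustration of the floating-point point made in this thread, the following plain-Python sketch (no Spark involved) sums the same four values under two different groupings, standing in for two different partition layouts. The values, the two layouts, and the helper names `naive_sum` and `partitioned_sum` are assumptions chosen for the example and are not taken from SPARK-47024. An explicit loop is used instead of the built-in `sum`, because recent Python versions apply a more accurate summation algorithm to floats.

```python
import math


def naive_sum(values):
    """Left-to-right floating-point accumulation, with no error compensation."""
    total = 0.0
    for value in values:
        total += value
    return total


def partitioned_sum(partitions):
    """Sum each partition on its own, then sum the partial results."""
    return naive_sum(naive_sum(partition) for partition in partitions)


values = [1e16, 1.0, -1e16, 1.0]

# Same values, grouped differently (think: two different repartitionings).
layout_a = [[1e16, 1.0], [-1e16, 1.0]]
layout_b = [[1e16, -1e16], [1.0, 1.0]]

result_a = partitioned_sum(layout_a)  # each 1.0 is absorbed by the large value next to it
result_b = partitioned_sum(layout_b)  # the large values cancel first, so both 1.0s survive
exact = math.fsum(values)             # correctly rounded reference sum
```

Here `result_a` is 0.0 while `result_b` and `exact` are 2.0: near 1e16 adjacent doubles are 2 apart, so adding 1.0 to a value of that magnitude is rounded away. Which grouping a distributed engine ends up with depends on how rows land in partitions, which is why the result can vary between runs.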

Re: How do you debug a code-generated aggregate?

2024-02-12 Thread Jack Goodson
I may be ignorant of other debugging methods in Spark but the best success I've had is using smaller datasets (if runs take a long time) and adding intermediate output steps. This is quite different from application development in non-distributed systems where a debugger is trivial to attach

Re: How do you debug a code-generated aggregate?

2024-02-12 Thread Nicholas Chammas
OK, I figured it out. The details are in SPARK-47024 <https://issues.apache.org/jira/browse/SPARK-47024> for anyone who’s interested. It turned out to be a floating point arithmetic “bug”. The main reason I was able to figure it out was because I’ve been investigating another, unrelated

Re: How do you debug a code-generated aggregate?

2024-02-12 Thread Herman van Hovell
sum("id")).show()+---+|sum(id)|+---+| > >>> 6|+---+ > > I’m trying to understand how this works because I’m investigating a bug in > this kind of aggregate. > > I see that doProduceWithoutKeys > <https://github.com/apache/spark/blob/d02fbba

How do you debug a code-generated aggregate?

2024-02-11 Thread Nicholas Chammas
Consider this example:

>>> from pyspark.sql.functions import sum
>>> spark.range(4).repartition(2).select(sum("id")).show()
+---+
|sum(id)|
+---+
| 6|
+---+

I’m trying to understand how this works because I’m investigating a bug in this ki

Re: [EXTERNAL] Re: Add user as a contributor

2023-06-14 Thread Aman Raj
Thanks Hyukjin. Will do so. Thanks, Aman. From: Hyukjin Kwon Sent: Thursday, June 15, 2023 9:39 AM To: Aman Raj Cc: dev@spark.apache.org Subject: [EXTERNAL] Re: Add user as a contributor

Re: Add user as a contributor

2023-06-14 Thread Hyukjin Kwon
You can open a PR first. When that's merged, the ticket will be assigned to you along with contributor access. On Thu, Jun 15, 2023 at 1:07 PM Aman Raj wrote: > Hi team, > > Can someone please help giving contributor access to amanraj2520 username. > I have raised a Spark Ticket :

Add user as a contributor

2023-06-14 Thread Aman Raj
Hi team, Can someone please help grant contributor access to the username amanraj2520? I have raised a Spark ticket: https://issues.apache.org/jira/browse/SPARK-44058. I am not able to assign it to myself. Thanks, Aman.

A Message from the Board to PMC members

2023-03-29 Thread Rich Bowen
Dear Apache Project Management Committee (PMC) members, The Board wants to take just a moment of your time to communicate a few things that seem to have been forgotten by a number of PMC members, across the Foundation, over the past few years. Please note that this is being sent to all projects

Re: Welcome Yikun Jiang as a Spark committer

2022-10-18 Thread Rui Wang
rk, Yikun! >>>>>>>>> >>>>>>>>> On Sun, Oct 9, 2022 at 10:52 AM Gengliang Wang >>>>>>>>> wrote: >>>>>>>>> >>>>>>>>>> Congratulations, Yikun! >

Re: Welcome Yikun Jiang as a Spark committer

2022-10-10 Thread Xinrong Meng
>>>> >>>>>>>>> On Sun, Oct 9, 2022 at 12:33 AM 416161...@qq.com < >>>>>>>>> ruife...@foxmail.com> wrote: >>>>>>>>> >>>>>>>>>> Congrats, Yikun! >>>>>>>>>>

Re: Welcome Yikun Jiang as a Spark committer

2022-10-09 Thread John Zhuge
>>>>> >>>>>>>> On Sun, Oct 9, 2022 at 12:33 AM 416161...@qq.com < >>>>>>>> ruife...@foxmail.com> wrote: >>>>>>>> >>>>>>>>> Congrats, Yikun! >>>>>>>>> >>>>>

Re: Welcome Yikun Jiang as a Spark committer

2022-10-09 Thread Senthil Kumar
>>>>>> -- >>>>>>>> Ruifeng Zheng >>>>>>>> ruife...@foxmail.com >>>>>>>> >>>>>>>> <https://wx.mail.qq.com/home/index?t=readmail_businesscard_midpage&nocheck=t

Re: Welcome Yikun Jiang as a Spark committer

2022-10-09 Thread Xiao Li
>>>> >>>>>>> Congrats, Yikun! >>>>>>> >>>>>>> -- >>>>>>> Ruifeng Zheng >>>>>>> ruife...@foxmail.com >>>>>>> >>>>>>> <https://wx.mail.qq.com/home/index?t=readmail_businessc

Re: Welcome Yikun Jiang as a Spark committer

2022-10-09 Thread Yikun Jiang
readmail_businesscard_midpage&nocheck=true&name=Ruifeng+Zheng&icon=https%3A%2F%2Fres.mail.qq.com%2Fzh_CN%2Fhtmledition%2Fimages%2Frss%2Fmale.gif%3Frand%3D1617349242&mail=ruifengz%40foxmail.com&code=> >>>>>> >>>>>> >>>>>> >

Re: Welcome Yikun Jiang as a Spark committer

2022-10-09 Thread Chao Sun
>>>>> >>>>> <https://wx.mail.qq.com/home/index?t=readmail_businesscard_midpage&nocheck=true&name=Ruifeng+Zheng&icon=https%3A%2F%2Fres.mail.qq.com%2Fzh_CN%2Fhtmledition%2Fimages%2Frss%2Fmale.gif%3Frand%3D1617349242&mail=ruifengz%40foxmail.com&code=> >>>>> >

Re: Welcome Yikun Jiang as a Spark committer

2022-10-09 Thread vaquar khan
.gif%3Frand%3D1617349242&mail=ruifengz%40foxmail.com&code=> >>>> >>>> >>>> >>>> -- Original -- >>>> *From:* "Martin Grigorov" ; >>>> *Date:* Sun, Oct 9, 2022 05:01 AM >>

Re: Welcome Yikun Jiang as a Spark committer

2022-10-09 Thread 叶先进
m: "Martin Grigorov" mailto:mgrigo...@apache.org>>; > Date: Sun, Oct 9, 2022 05:01 AM > To: "Hyukjin Kwon"mailto:gurwls...@gmail.com>>; > Cc: "dev"mailto:dev@spark.apache.org>>;"Yikun > Jiang"mailto:yikunk...@gmail.com>>; >

Re: Welcome Yikun Jiang as a Spark committer

2022-10-09 Thread XiDuo You
F%2Fres.mail.qq.com%2Fzh_CN%2Fhtmledition%2Fimages%2Frss%2Fmale.gif%3Frand%3D1617349242&mail=ruifengz%40foxmail.com&code=> >>> >>> >>> >>> -- Original -- >>> *From:* "Martin Grigorov" ; >>> *D

Re: Welcome Yikun Jiang as a Spark committer

2022-10-09 Thread Maxim Gekk
=> >> >> >> >> -- Original -- >> *From:* "Martin Grigorov" ; >> *Date:* Sun, Oct 9, 2022 05:01 AM >> *To:* "Hyukjin Kwon"; >> *Cc:* "dev";"Yikun Jiang"; >> *Subject:* R

Re: Welcome Yikun Jiang as a Spark committer

2022-10-09 Thread Gengliang Wang
un, Oct 9, 2022 05:01 AM > *To:* "Hyukjin Kwon"; > *Cc:* "dev";"Yikun Jiang"; > *Subject:* Re: Welcome Yikun Jiang as a Spark committer > > Congratulations, Yikun! > > On Sat, Oct 8, 2022 at 7:41 AM Hyukjin Kwon wrote: > >> Hi all, >> >>

Re: Welcome Yikun Jiang as a Spark committer

2022-10-09 Thread 416161...@qq.com
Congrats, Yikun! Ruifeng Zheng ruife...@foxmail.com   -- Original -- From: "Martin Grigorov"

Re: Welcome Yikun Jiang as a Spark committer

2022-10-08 Thread Martin Grigorov
Congratulations, Yikun! On Sat, Oct 8, 2022 at 7:41 AM Hyukjin Kwon wrote: > Hi all, > > The Spark PMC recently added Yikun Jiang as a committer on the project. > Yikun is the major contributor of the infrastructure and GitHub Actions in > Apache Spark as well as Kubernetes an

Re: Welcome Yikun Jiang as a Spark committer

2022-10-08 Thread Цвигун Евгений
Welcome Yikun! Stable infra is super important. Cheers Evgenii On Sat, Oct 8, 2022 at 07:40, Hyukjin Kwon wrote: > Hi all, > > The Spark PMC recently added Yikun Jiang as a committer on the project. > Yikun is the major contributor of the infrastructure and GitHub Actions in > Apache S

Re: Welcome Yikun Jiang as a Spark committer

2022-10-08 Thread Qian SUN
Congratulations! On Sat, Oct 8, 2022 at 12:40, Hyukjin Kwon wrote: > Hi all, > > The Spark PMC recently added Yikun Jiang as a committer on the project. > Yikun is the major contributor of the infrastructure and GitHub Actions in > Apache Spark as well as Kubernetes and PySpark. > He has p

Re: Welcome Yikun Jiang as a Spark committer

2022-10-07 Thread Jungtaek Lim
lidharan >> *Sent:* October 8, 2022 14:16:02 >> *To:* Yuming Wang >> *Cc:* Hyukjin Kwon; dev; Yikun Jiang >> *Subject:* Re: Welcome Yikun Jiang as a Spark committer >> >> >> Congratulations ! >> >> Regards, >> Mridul >> >> On Sat, Oct 8,

Re: Welcome Yikun Jiang as a Spark committer

2022-10-07 Thread huaxin gao
n Jiang > *Subject:* Re: Welcome Yikun Jiang as a Spark committer > > > Congratulations ! > > Regards, > Mridul > > On Sat, Oct 8, 2022 at 12:19 AM Yuming Wang wrote: > >> Congratulations Yikun! >> >> On Sat, Oct 8, 2022 at 12:40 PM Hyukjin Kwon wrote: >>

Re: Welcome Yikun Jiang as a Spark committer

2022-10-07 Thread Yang,Jie(INF)
Congratulations Yikun! Regards, Yang Jie From: Mridul Muralidharan Sent: October 8, 2022 14:16:02 To: Yuming Wang Cc: Hyukjin Kwon; dev; Yikun Jiang Subject: Re: Welcome Yikun Jiang as a Spark committer Congratulations ! Regards, Mridul On Sat, Oct 8, 2022 at 12:19 AM

Re: Welcome Yikun Jiang as a Spark committer

2022-10-07 Thread Mridul Muralidharan
Congratulations ! Regards, Mridul On Sat, Oct 8, 2022 at 12:19 AM Yuming Wang wrote: > Congratulations Yikun! > > On Sat, Oct 8, 2022 at 12:40 PM Hyukjin Kwon wrote: > >> Hi all, >> >> The Spark PMC recently added Yikun Jiang as a committer on the project. >>

Re: Welcome Yikun Jiang as a Spark committer

2022-10-07 Thread Yuming Wang
Congratulations Yikun! On Sat, Oct 8, 2022 at 12:40 PM Hyukjin Kwon wrote: > Hi all, > > The Spark PMC recently added Yikun Jiang as a committer on the project. > Yikun is the major contributor of the infrastructure and GitHub Actions in > Apache Spark as well as Kubernetes an

Welcome Yikun Jiang as a Spark committer

2022-10-07 Thread Hyukjin Kwon
Hi all, The Spark PMC recently added Yikun Jiang as a committer on the project. Yikun is the major contributor of the infrastructure and GitHub Actions in Apache Spark as well as Kubernetes and PySpark. He has put a lot of effort into stabilizing and optimizing the builds so we all can work

Re: Creating a new component "Connect" in JIRA

2022-09-16 Thread Dongjoon Hyun
Thank you for sharing that information. +1 for the proposed way. Dongjoon. On Fri, Sep 16, 2022 at 5:07 AM Hyukjin Kwon wrote: > Hi all, > > I created a new component called "Connect" temporarily for the Spark > Connect project, > see https://issues.apache.org/jira/b

Creating a new component "Connect" in JIRA

2022-09-16 Thread Hyukjin Kwon
Hi all, I created a new component called "Connect" temporarily for the Spark Connect project, see https://issues.apache.org/jira/browse/SPARK-39375 because a lot of changes will be made in an isolated location, and the concept itself is pretty isolated as a separate component In addi

Re: [DISCUSS] [Spark SQL, PySpark] Combining StructTypes into a new StructType

2022-08-23 Thread Alexandros Biratsis
Hi Maciej, Sorry for the late reply. I believe you are right. Merging nested StructTypes can be tricky. As a matter of fact, it will require complex logic and most likely some conventions to cover all the edge cases. What about just exposing the existing merge <https://github.com/apa

Re: [DISCUSS] [Spark SQL, PySpark] Combining StructTypes into a new StructType

2022-08-14 Thread Maciej
I have mixed feelings about this proposal. Merging or diffing schemas is a common operation, but specific requirements differ from case to case, especially when complex nested data is used. Even if we put ordering of the fields aside, data types equality semantics (StructField in particular

Re: [DISCUSS] [Spark SQL, PySpark] Combining StructTypes into a new StructType

2022-08-14 Thread Alexandros Biratsis
Hello Rui and Tim, Indeed this sounds like a good and quite useful idea. To make it more formal, the list of fields of a StructType could be treated as a Scala/Python set by providing (inheriting?) the common set functionality, i.e. add, remove, concat, intersect, diff etc. The set-like functionality could be

Re: Welcome Xinrong Meng as a Spark committer

2022-08-11 Thread Xinrong Meng
ll, >>> >>> The Spark PMC recently added Xinrong Meng as a committer on the project. >>> Xinrong is the major contributor of PySpark especially Pandas API on Spark. >>> She has guided a lot of new contributors enthusiastically. Please join me >>> in welco

Re: Welcome Xinrong Meng as a Spark committer

2022-08-10 Thread Peter Toth
Congratulations! On Wed, Aug 10, 2022 at 12:21, Bjørn Jørgensen wrote: > Congratulations :) > > On Tue, Aug 9, 2022 at 10:13, Hyukjin Kwon wrote: > >> Hi all, >> >> The Spark PMC recently added Xinrong Meng as a committer on the project. >> Xinrong

Re: Welcome Xinrong Meng as a Spark committer

2022-08-10 Thread Bjørn Jørgensen
Congratulations :) On Tue, Aug 9, 2022 at 10:13, Hyukjin Kwon wrote: > Hi all, > > The Spark PMC recently added Xinrong Meng as a committer on the project. > Xinrong is the major contributor of PySpark especially Pandas API on Spark. > She has guided a lot of new contributors e
