Re: Allowing Unicode Whitespace in Lexer

2024-03-27 Thread serge rielau . com
Going once, going twice… last call for objections. On Mar 23, 2024 at 5:29 PM -0700, serge rielau.com wrote: Hello, I have a PR https://github.com/apache/spark/pull/45620 ready to go that will extend the definition of whitespace (what separates tokens) from the small set of ASCII characters

Re: Allowing Unicode Whitespace in Lexer

2024-03-27 Thread serge rielau . com
Yeah, I heard about that. This IMHO is a bit more worrying, and we do not have the "excuse" that it is transparent. Also, which of these would be STRING and which IDENTIFIER? On Mar 25, 2024 at 1:06 PM -0700, Alex Cruise wrote: While we're at it, maybe consider allowing "smart quotes" too :)

Community Over Code NA 2024 Travel Assistance Applications now open!

2024-03-27 Thread Gavin McDonald
Hello to all users, contributors and Committers! [You are receiving this email as a subscriber to one or more ASF project dev or user mailing lists; it is not being sent to you directly. It is important that we reach all of our users and contributors/committers so that they may get a chance

Re: Vote on Dynamic resource allocation for structured streaming [SPARK-24815]

2024-03-26 Thread Jungtaek Lim
Sounds good. One thing I'd like to clarify before shepherding this SPIP is the process itself. Getting enough traction from PMC members to pass the SPIP vote is a separate issue. Even a vote from a committer is not counted. (I don't have a binding vote.) I only see one PMC member (Thomas Graves, not

Re: Vote on Dynamic resource allocation for structured streaming [SPARK-24815]

2024-03-26 Thread Pavan Kotikalapudi
Sounds good. Thanks again for your help in guiding the effort from discussion/review through the voting phase in the Spark dev community. Thank you, Pavan On Tue, Mar 26, 2024 at 4:20 AM Mich Talebzadeh wrote: > Hi Pavan, > > Thanks for instigating this proposal. Looks like the proposal is

Re: Vote on Dynamic resource allocation for structured streaming [SPARK-24815]

2024-03-26 Thread Mich Talebzadeh
Hi Pavan, Thanks for instigating this proposal. It looks like the proposal is ready and has enough votes to be implemented. Having a shepherd will make it more fruitful. I will leave it to @Jungtaek Lim's capable hands to drive it forward. I will be there to help if needed. Cheers Mich

Re: Vote on Dynamic resource allocation for structured streaming [SPARK-24815]

2024-03-26 Thread Pavan Kotikalapudi
Hi Bhuwan, Glad to hear back from you! I very much appreciate your help reviewing the design doc/PR and endorsing this proposal. Thank you so much @Jungtaek Lim, @Mich Talebzadeh for graciously agreeing to mentor/shepherd this effort. Regarding the Twilio copyright in the NOTICE-binary file:

Re: Vote on Dynamic resource allocation for structured streaming [SPARK-24815]

2024-03-26 Thread Jungtaek Lim
I'm happy to, but it looks like I need to check one more thing about the license, according to the WIP PR. @Pavan Kotikalapudi I see you've added the copyright of Twilio in the NOTICE-binary file, which makes me wonder if Twilio had filed a CCLA to the

Re: Improved Structured Streaming Documentation Proof-of-Concept

2024-03-25 Thread Neil Ramaswamy
I'm glad you think it's generally a good idea! I will mention, though, that with these better docs I've almost finished, I'm hoping that Structured Streaming no longer stays a specialist topic that requires "trench warfare." With good pedagogy, I think that it's very approachable. The Knowledge

Re: Improved Structured Streaming Documentation Proof-of-Concept

2024-03-25 Thread Mich Talebzadeh
Hi, Your intended work on improving the Structured Streaming documentation is great! Clear and well-organized instructions are important for everyone using Spark, beginners and experts alike. Having said that, Spark Structured Streaming, much like other specialist Spark topics (say, k8s) or

Re: Allowing Unicode Whitespace in Lexer

2024-03-25 Thread Alex Cruise
While we're at it, maybe consider allowing "smart quotes" too :) -0xe1a On Sat, Mar 23, 2024 at 5:29 PM serge rielau.com wrote: > Hello, > > I have a PR https://github.com/apache/spark/pull/45620 ready to go that > will extend the definition of whitespace (what separates tokens) from the

Improved Structured Streaming Documentation Proof-of-Concept

2024-03-25 Thread Neil Ramaswamy
Hi all, I recently started an effort to improve the Structured Streaming documentation. I thought that the current documentation, while very comprehensive, could be improved in terms of organization, clarity, and presence of examples. You can view the repo here

Re: Vote on Dynamic resource allocation for structured streaming [SPARK-24815]

2024-03-25 Thread Bhuwan Sahni
Hi Pavan, I looked at the PR, and the changes look simple and contained. It would be useful to add dynamic resource allocation to Spark Structured Streaming. Jungtaek, would you be able to shepherd this change? On Tue, Mar 19, 2024 at 10:38 AM Bhuwan Sahni wrote: > Thanks a lot for creating

Re: [DISCUSS] MySQL version support policy

2024-03-25 Thread Cheng Pan
Thanks for Dongjoon's reply and questions. > A. Adding a new Apache Spark community policy (contract) to guarantee MySQL > LTS Versions Support. Yes, at least the latest MySQL LTS version. To reduce the maintenance effort on the Spark side, I think we can run CI against only the latest LTS version

Re: [DISCUSS] MySQL version support policy

2024-03-25 Thread Dongjoon Hyun
Hi, Cheng. Thank you for the suggestion. Your suggestion seems to have at least two themes. A. Adding a new Apache Spark community policy (contract) to guarantee MySQL LTS Versions Support. B. Dropping support for non-LTS versions (MySQL 8.3/8.2/8.1). And it raises three questions for me.

[DISCUSS] MySQL version support policy

2024-03-24 Thread Cheng Pan
Hi, Spark community, I noticed that the Spark JDBC connector's MySQL dialect is now tested against 8.3.0 [1], a non-LTS version. MySQL changed its version policy recently [2], which is now very similar to the Java version policy. In short, 5.5, 5.6, 5.7 and 8.0 are the LTS versions; 8.1, 8.2, 8.3
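
Cheng's suggestion in this thread (run CI only against the latest MySQL LTS release) can be sketched as a small filter. This is a minimal illustration, not Spark's test code: the LTS lines are the ones listed in the message above, the helper names are hypothetical, and every version string except 8.3.0 is an illustrative placeholder, not a tag Spark actually uses.

```python
# LTS release lines as listed in the thread; 8.1, 8.2 and 8.3 are described there as non-LTS.
LTS_LINES = {(5, 5), (5, 6), (5, 7), (8, 0)}


def parse(version):
    """Turn '8.3.0' into (8, 3, 0) so versions compare numerically, not as strings."""
    return tuple(int(part) for part in version.split("."))


def is_lts(version):
    """True if the major.minor line of the version is in LTS_LINES."""
    return parse(version)[:2] in LTS_LINES


def ci_target(candidates):
    """Return the newest candidate on an LTS line, or None if there is none."""
    lts = [v for v in candidates if is_lts(v)]
    return max(lts, key=parse) if lts else None


# Hypothetical candidate tags; only 8.3.0 is mentioned in the thread.
print(ci_target(["8.3.0", "8.2.0", "8.0.36", "5.7.44"]))  # 8.0.36
```

Comparing parsed tuples rather than raw strings avoids the classic mistake where "8.10" would sort before "8.9".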

Allowing Unicode Whitespace in Lexer

2024-03-23 Thread serge rielau . com
Hello, I have a PR https://github.com/apache/spark/pull/45620 ready to go that will extend the definition of whitespace (what separates tokens) from the small set of ASCII characters (space, tab, linefeed) to those defined in Unicode. While this is a small and safe change, it is one where we
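
To make the proposed change concrete, here is a minimal Python sketch contrasting a splitter that recognizes only the ASCII separators named in the message with one that accepts any Unicode whitespace. It is not Spark's ANTLR lexer: it assumes Python's `str.isspace()` notion of whitespace (which includes U+00A0 NO-BREAK SPACE and U+3000 IDEOGRAPHIC SPACE), and the exact character set PR 45620 accepts may differ. The function names are hypothetical.

```python
import re

# The small ASCII set named in the message: space, tab, linefeed.
ASCII_SEPARATORS = re.compile(r"[ \t\n]+")


def split_ascii(text):
    """Split only on runs of space, tab, or linefeed."""
    return [tok for tok in ASCII_SEPARATORS.split(text) if tok]


def split_unicode(text):
    """Split on runs of any character for which str.isspace() is true."""
    return text.split()


sql = "SELECT\u00a0a\u3000FROM t"  # contains a no-break space and an ideographic space
print(split_ascii(sql))    # two tokens: the non-ASCII spaces do not separate anything
print(split_unicode(sql))  # ['SELECT', 'a', 'FROM', 't']
```

Under the ASCII-only rule, text pasted from a word processor or a CJK input method can silently glue keywords and identifiers together, which is the kind of failure the proposal targets.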

Re: A proposal for creating a Knowledge Sharing Hub for Apache Spark Community

2024-03-23 Thread Jay Han
+1. It sounds awesome! On Thu, Mar 21, 2024 at 14:16, Kiran Kumar Dusi wrote: > +1 > > On Thu, 21 Mar 2024 at 7:46 AM, Farshid Ashouri <farsheed.asho...@gmail.com> wrote: >> +1 >> >> On Mon, 18 Mar 2024, 11:00 Mich Talebzadeh wrote: >>> Some of you may be aware that Databricks community Home |

Unsubscribe

2024-03-22 Thread Dusty Williams
Unsubscribe

unsubscribe

2024-03-22 Thread madhan kumar

Re: [DISCUSSION] SPIP: An Official Kubernetes Operator for Apache Spark

2024-03-20 Thread Vakaris Baškirov
Hi! Just wanted to inquire about the status of the official operator. We are looking forward to contributing and later on switching to a Spark Operator and we would prefer it to be the official one. Thanks, Vakaris On Thu, Nov 30, 2023 at 7:09 AM Shiqi Sun wrote: > Hi Zhou, > > Thanks for the

Re: A proposal for creating a Knowledge Sharing Hub for Apache Spark Community

2024-03-19 Thread Mich Talebzadeh
I concur. Whilst Databricks' (a commercial entity) Knowledge Sharing Hub can be a useful resource for sharing knowledge and engaging with their respective community, the ASF likely prioritizes platforms and channels that align more closely with its principles of open source and vendor neutrality.

Re: A proposal for creating a Knowledge Sharing Hub for Apache Spark Community

2024-03-19 Thread Steve Loughran
The ASF will be unhappy about this, and Stack Overflow exists. Otherwise, Apache Confluent and LinkedIn exist; LI is the option I'd point at. On Mon, 18 Mar 2024 at 10:59, Mich Talebzadeh wrote: > Some of you may be aware that Databricks community Home | Databricks > have just launched a knowledge

Re: Vote on Dynamic resource allocation for structured streaming [SPARK-24815]

2024-03-19 Thread Bhuwan Sahni
Thanks a lot for creating the risk table, Pavan. My apologies: I was tied up with high-priority items for the last couple of weeks and could not respond. I will review the PR by the end of tomorrow and get back to you. Appreciate your patience. Thanks Bhuwan Sahni On Sun, Mar 17, 2024 at 4:42 PM Pavan

Re: A proposal for creating a Knowledge Sharing Hub for Apache Spark Community

2024-03-19 Thread Mich Talebzadeh
One option that comes to mind is that, given the cyclic nature of these types of proposals in these two forums, we should be able to use Databricks' existing knowledge sharing hub (Knowledge Sharing Hub - Databricks).

Re: A proposal for creating a Knowledge Sharing Hub for Apache Spark Community

2024-03-19 Thread Varun Shah
+1 Great initiative. QQ: Stack Overflow has a similar feature called "Collectives", but I am not sure of the expense of creating one for Apache Spark. With SO being widely used (at least before ChatGPT became the norm for searching questions), it already has a lot of questions asked and answered

Re: A proposal for creating a Knowledge Sharing Hub for Apache Spark Community

2024-03-18 Thread Deepak Sharma
+1. I can contribute to it as well. On Tue, 19 Mar 2024 at 9:19 AM, Code Tutelage wrote: > +1 > > Thanks for proposing > > On Mon, Mar 18, 2024 at 9:25 AM Parsian, Mahmoud wrote: >> Good idea. Will be useful >> >> +1 >> >> From: ashok34...@yahoo.com.INVALID >>

Re: A proposal for creating a Knowledge Sharing Hub for Apache Spark Community

2024-03-18 Thread Hyukjin Kwon
One very good example is the SparkR releases in the Conda channel (https://github.com/conda-forge/r-sparkr-feedstock). This is run entirely by the community, unofficially. On Tue, 19 Mar 2024 at 09:54, Mich Talebzadeh wrote: > +1 for me

Re: A proposal for creating a Knowledge Sharing Hub for Apache Spark Community

2024-03-18 Thread Mich Talebzadeh
OK, thanks for the update. What does "officially blessed" signify here? Can we have and run it as a sister site? The reason this comes to mind is that interested parties should have easy access to this site (from ISUG Spark sites) as a reference repository. I guess the advice would be that

Re: A proposal for creating a Knowledge Sharing Hub for Apache Spark Community

2024-03-18 Thread Reynold Xin
One of the problems in the past when something like this was brought up was that the ASF couldn't have officially blessed venues beyond the already approved ones. So that's something to look into. Now, of course, you are welcome to run unofficial, unblessed things as long as they follow trademark

Re: A proposal for creating a Knowledge Sharing Hub for Apache Spark Community

2024-03-18 Thread Mich Talebzadeh
Well, as long as it works. Please all check this link from Databricks and let us know your thoughts. Will something similar work for us? Of course, Databricks has much deeper pockets than our ASF community. Will it require moderation on our side to block spam and nutcases? Knowledge Sharing Hub

Re: A proposal for creating a Knowledge Sharing Hub for Apache Spark Community

2024-03-18 Thread Bjørn Jørgensen
Something like this: Spark community · GitHub. On Mon, 18 Mar 2024 at 17:26, Parsian, Mahmoud wrote: > Good idea. Will be useful > > +1 > > From: ashok34...@yahoo.com.INVALID > Date: Monday, March 18, 2024 at 6:36 AM > To: user @spark,

Re: [Spark-Core] Improving Reliability of spark when Executors OOM

2024-03-18 Thread Mridul Muralidharan
Hi Ashish, This is something we are still actively working on internally, but it is unfortunately not yet in a state to share widely. Regards, Mridul On Mon, Mar 11, 2024 at 6:23 PM Ashish Singh wrote: > Hi Kalyan, > > Is this something you are still interested in pursuing? There are some

Re: A proposal for creating a Knowledge Sharing Hub for Apache Spark Community

2024-03-18 Thread Mich Talebzadeh
+1 for me

A proposal for creating a Knowledge Sharing Hub for Apache Spark Community

2024-03-18 Thread Mich Talebzadeh
Some of you may be aware that Databricks community Home | Databricks has just launched a knowledge sharing hub. I thought it would be a good idea for the Apache Spark user group to have the same, especially for repeat questions on Spark core, Spark SQL, Spark Structured Streaming, Spark MLlib and

Re: Vote on Dynamic resource allocation for structured streaming [SPARK-24815]

2024-03-17 Thread Pavan Kotikalapudi
Hi Bhuwan, I hope the team got a chance to review the draft PR; I'm looking for some comments on whether the plan looks alright. I have updated the document with the risks (also mentioned

[VOTE][RESULT] SPIP: Structured Logging Framework for Apache Spark

2024-03-13 Thread Gengliang Wang
The vote passes with 24 +1s (13 binding +1s). Thanks to all who reviewed the SPIP doc and voted! (* = binding) +1: - Haejoon Lee - Jie Yang - Hyukjin Kwon (*) - Wenchen Fan (*) - Mich Talebzadeh - Kent Yao - Denny Lee - Mridul Muralidharan (*) - Huaxin Gao (*) - Dongjoon Hyun (*) - Xinrong Meng

Re: [VOTE] SPIP: Structured Logging Framework for Apache Spark

2024-03-13 Thread Gengliang Wang
Thanks all for participating! The vote passed. I'll send out the result in a separate thread. On Wed, Mar 13, 2024 at 9:43 AM bo yang wrote: > +1 > > On Wed, Mar 13, 2024 at 7:19 AM Tom Graves wrote: >> Similar to others, I will be interested in working out APIs and details >> but overall

Re: [VOTE] SPIP: Structured Logging Framework for Apache Spark

2024-03-13 Thread bo yang
+1 On Wed, Mar 13, 2024 at 7:19 AM Tom Graves wrote: > Similar to others, I will be interested in working out APIs and details > but overall in favor of it. > > +1 > > Tom Graves > On Monday, March 11, 2024 at 11:25:38 AM CDT, Mridul Muralidharan <mri...@gmail.com> wrote: > > I am

Re: [VOTE] SPIP: Structured Logging Framework for Apache Spark

2024-03-13 Thread Tom Graves
Similar to others, I will be interested in working out APIs and details, but overall I'm in favor of it. +1 Tom Graves On Monday, March 11, 2024 at 11:25:38 AM CDT, Mridul Muralidharan wrote: I am supportive of the proposal - this is a step in the right direction! Additional metadata

Re: [VOTE] SPIP: Structured Logging Framework for Apache Spark

2024-03-12 Thread Ruifeng Zheng
+1 On Wed, Mar 13, 2024 at 4:32 AM John Zhuge wrote: > +1 (non-binding) > > On Tue, Mar 12, 2024 at 8:45 AM L. C. Hsieh wrote: > >> +1 >> >> >> On Tue, Mar 12, 2024 at 8:20 AM Chao Sun wrote: >> >>> +1 >>> >>> On Tue, Mar 12, 2024 at 8:03 AM Xiao Li >>> wrote: >>> +1 On Tue,

Re: [VOTE] SPIP: Structured Logging Framework for Apache Spark

2024-03-12 Thread John Zhuge
+1 (non-binding) On Tue, Mar 12, 2024 at 8:45 AM L. C. Hsieh wrote: > +1 > > > On Tue, Mar 12, 2024 at 8:20 AM Chao Sun wrote: > >> +1 >> >> On Tue, Mar 12, 2024 at 8:03 AM Xiao Li >> wrote: >> >>> +1 >>> >>> On Tue, Mar 12, 2024 at 6:09 AM Holden Karau >>> wrote: >>> +1

Re: [VOTE] SPIP: Structured Logging Framework for Apache Spark

2024-03-12 Thread Robyn Nameth
+1 On Mon, Mar 11, 2024 at 3:46 AM Gengliang Wang wrote: > Hi all, > > I'd like to start the vote for SPIP: Structured Logging Framework for > Apache Spark > > References: > > - JIRA ticket > - SPIP doc > >

Re: Enhanced Console Sink for Structured Streaming

2024-03-12 Thread Neil Ramaswamy
For advanced users, it's certainly an option to look at the streaming query progress and use the state store reader to look at your state. However, the goal of this Enhanced Console Sink is to improve the experience for *new* users, i.e. it should work mostly out of the box. Let's move discussion

Re: [VOTE] SPIP: Structured Logging Framework for Apache Spark

2024-03-12 Thread L. C. Hsieh
+1 On Tue, Mar 12, 2024 at 8:20 AM Chao Sun wrote: > +1 > > On Tue, Mar 12, 2024 at 8:03 AM Xiao Li wrote: >> +1 >> >> On Tue, Mar 12, 2024 at 6:09 AM Holden Karau wrote: >>> +1

Re: [VOTE] SPIP: Structured Logging Framework for Apache Spark

2024-03-12 Thread Chao Sun
+1 On Tue, Mar 12, 2024 at 8:03 AM Xiao Li wrote: > +1 > > On Tue, Mar 12, 2024 at 6:09 AM Holden Karau wrote: >> +1

Re: [VOTE] SPIP: Structured Logging Framework for Apache Spark

2024-03-12 Thread Xiao Li
+1 On Tue, Mar 12, 2024 at 6:09 AM Holden Karau wrote: > +1 > > On Mon,

Re: Enhanced Console Sink for Structured Streaming

2024-03-12 Thread Mich Talebzadeh
OK, I have just been working on a Databricks engineering question raised by a user, "Monitoring structure streaming in external sink". In practice, there is an option to use

Re: [VOTE] SPIP: Structured Logging Framework for Apache Spark

2024-03-12 Thread Holden Karau
+1 On Mon, Mar 11, 2024 at 7:44 PM Reynold Xin wrote: > +1 > > On Mon, Mar 11 2024

Re: [VOTE] SPIP: Structured Logging Framework for Apache Spark

2024-03-11 Thread Reynold Xin
+1 On Mon, Mar 11 2024 at 7:38 PM, Jungtaek Lim <kabhwan.opensou...@gmail.com> wrote: > +1 (non-binding), thanks Gengliang! > > On Mon, Mar 11, 2024 at 5:46 PM Gengliang Wang <ltn...@gmail.com> wrote: >> Hi all, >> >> I'd like to start the vote for SPIP: Structured Logging

Re: [VOTE] SPIP: Structured Logging Framework for Apache Spark

2024-03-11 Thread Jungtaek Lim
+1 (non-binding), thanks Gengliang! On Mon, Mar 11, 2024 at 5:46 PM Gengliang Wang wrote: > Hi all, > > I'd like to start the vote for SPIP: Structured Logging Framework for > Apache Spark > > References: > >- JIRA ticket >- SPIP doc >

Re: [VOTE] SPIP: Structured Logging Framework for Apache Spark

2024-03-11 Thread Scott
+1 On Mon, Mar 11, 2024 at 4:11 AM yangjie01 wrote: > +1 > > Jie Yang > > From: Haejoon Lee > Date: Monday, March 11, 2024 17:09 > To: Gengliang Wang > Cc: dev > Subject: Re: [VOTE] SPIP: Structured Logging Framework for Apache Spark > > +1 > > On Mon, Mar 11, 2024 at

Re: [Spark-Core] Improving Reliability of spark when Executors OOM

2024-03-11 Thread Ashish Singh
Hi Kalyan, Is this something you are still interested in pursuing? There are some open discussion threads on the doc you shared. @Mridul Muralidharan What state are your efforts on this in? Is it something your team is actively pursuing/building, or mostly planning right now? Asking

Re: [VOTE] SPIP: Structured Logging Framework for Apache Spark

2024-03-11 Thread Xinrong Meng
+1 Thanks @Gengliang Wang ! On Mon, Mar 11, 2024 at 1:09 PM Gengliang Wang wrote: > Hi Steve, > > thanks for the suggestion in this email thread and the SPIP doc! I will > read the Audit Log and seek your feedback through PR reviews during the > implementation process. > > > So worrying about

Re: [VOTE] SPIP: Structured Logging Framework for Apache Spark

2024-03-11 Thread Dongjoon Hyun
Yes, I have a similar opinion to Mridul's. +1 Thank you, Gengliang. Dongjoon. On Mon, Mar 11, 2024 at 1:34 PM Mridul Muralidharan wrote: > I am supportive of the proposal - this is a step in the right direction! > Additional metadata (explicit and inferred) for log records, and

Re: [VOTE] SPIP: Structured Logging Framework for Apache Spark

2024-03-11 Thread Gengliang Wang
Hi Steve, thanks for the suggestion in this email thread and the SPIP doc! I will read the Audit Log and seek your feedback through PR reviews during the implementation process. > So worrying about how to pass and manage that at the thread level matters. We can have a specific logger for

Re: [VOTE] SPIP: Structured Logging Framework for Apache Spark

2024-03-11 Thread huaxin gao
+1 On Mon, Mar 11, 2024 at 7:02 AM Wenchen Fan wrote: > +1 > > On Mon, Mar 11, 2024 at 5:26 PM Hyukjin Kwon wrote: >> +1 >> >> On Mon, 11 Mar 2024 at 18:11, yangjie01 wrote: >>> +1 >>> >>> Jie Yang >>> >>> From: Haejoon Lee >>> Date: Monday, March 11, 2024 17:09 >>>

Re: [VOTE] SPIP: Structured Logging Framework for Apache Spark

2024-03-11 Thread Mridul Muralidharan
I am supportive of the proposal - this is a step in the right direction! Additional metadata (explicit and inferred) for log records, and exposing it for indexing, is extremely useful. The specifics of the API still need some work IMO and do not need to be this disruptive, but I consider

Re: [VOTE] SPIP: Structured Logging Framework for Apache Spark

2024-03-11 Thread Denny Lee
+1 (non-binding) On Sun, Mar 10, 2024 at 23:36 Gengliang Wang wrote: > Hi all, > > I'd like to start the vote for SPIP: Structured Logging Framework for > Apache Spark > > References: > >- JIRA ticket >- SPIP doc > >

Re: [VOTE] SPIP: Structured Logging Framework for Apache Spark

2024-03-11 Thread Steve Loughran
I consider the context info more important than just logging; at the Hadoop level we use it to attach things like task/job IDs, Kerberos principals, etc. to all store requests. https://hadoop.apache.org/docs/stable/hadoop-aws/tools/hadoop-aws/auditing.html So worrying about how to pass and manage that at
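
Steve's point about carrying context (job and task IDs, principals) at the thread level can be illustrated with a small structured-logging sketch. This is generic Python using the standard `logging`, `json` and `contextvars` modules, not the SPIP's API (Spark itself logs through Log4j on the JVM); `task_context`, `JsonFormatter`, `make_logger` and the field names are hypothetical.

```python
import contextvars
import io
import json
import logging

# Hypothetical per-task context (e.g. job and task IDs), scoped to the current thread/task.
task_context = contextvars.ContextVar("task_context", default={})


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object, merged with the current context."""

    def format(self, record):
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "msg": record.getMessage(),
        }
        payload.update(task_context.get())  # read-only: the default dict is never mutated
        return json.dumps(payload, sort_keys=True)


def make_logger(stream):
    logger = logging.getLogger("structured-demo")
    logger.setLevel(logging.INFO)
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger.handlers = [handler]  # replace any handlers from earlier calls
    logger.propagate = False     # keep records out of the root logger
    return logger


# Usage: set context for the duration of a "task", then restore the previous value.
buf = io.StringIO()
log = make_logger(buf)
token = task_context.set({"job_id": "job-7", "task_id": "task-42"})
log.info("fetched %d blocks", 3)
task_context.reset(token)
log.info("outside any task")
```

Keeping the context in a `ContextVar` rather than passing it as an argument is the design choice Steve alludes to: every log call made while the task runs picks it up, and `reset` guarantees it does not leak into unrelated work.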

Re: [VOTE] SPIP: Structured Logging Framework for Apache Spark

2024-03-11 Thread Kent Yao
+1 (non-binding) Kent Yao On Mon, Mar 11, 2024 at 17:26, Hyukjin Kwon wrote: > +1 > > On Mon, 11 Mar 2024 at 18:11, yangjie01 wrote: >> +1 >> >> Jie Yang >> >> From: Haejoon Lee >> Date: Monday, March 11, 2024 17:09 >> To: Gengliang Wang >> Cc: dev >> Subject: Re: [VOTE] SPIP: Structured Logging

Re: [VOTE] SPIP: Structured Logging Framework for Apache Spark

2024-03-11 Thread Mich Talebzadeh
+1

Re: [VOTE] SPIP: Structured Logging Framework for Apache Spark

2024-03-11 Thread Wenchen Fan
+1 On Mon, Mar 11, 2024 at 5:26 PM Hyukjin Kwon wrote: > +1 > > On Mon, 11 Mar 2024 at 18:11, yangjie01 wrote: >> +1 >> >> Jie Yang >> >> From: Haejoon Lee >> Date: Monday, March 11, 2024 17:09 >> To: Gengliang Wang >> Cc: dev >> Subject: Re: [VOTE] SPIP: Structured

Re: [VOTE] SPIP: Structured Logging Framework for Apache Spark

2024-03-11 Thread Hyukjin Kwon
+1 On Mon, 11 Mar 2024 at 18:11, yangjie01 wrote: > +1 > > Jie Yang > > From: Haejoon Lee > Date: Monday, March 11, 2024 17:09 > To: Gengliang Wang > Cc: dev > Subject: Re: [VOTE] SPIP: Structured Logging Framework for Apache Spark > > +1 > > On Mon, Mar 11, 2024 at

Re: [VOTE] SPIP: Structured Logging Framework for Apache Spark

2024-03-11 Thread yangjie01
+1 Jie Yang From: Haejoon Lee Date: Monday, March 11, 2024 17:09 To: Gengliang Wang Cc: dev Subject: Re: [VOTE] SPIP: Structured Logging Framework for Apache Spark +1 On Mon, Mar 11, 2024 at 10:36 AM Gengliang Wang <ltn...@gmail.com> wrote: Hi all, I'd like to start the vote for SPIP: Structured

Re: [VOTE] SPIP: Structured Logging Framework for Apache Spark

2024-03-11 Thread Haejoon Lee
+1 On Mon, Mar 11, 2024 at 10:36 AM Gengliang Wang wrote: > Hi all, > > I'd like to start the vote for SPIP: Structured Logging Framework for > Apache Spark > > References: > >- JIRA ticket >- SPIP doc > >

[VOTE] SPIP: Structured Logging Framework for Apache Spark

2024-03-10 Thread Gengliang Wang
Hi all, I'd like to start the vote for SPIP: Structured Logging Framework for Apache Spark References: - JIRA ticket - SPIP doc -

Re: [DISCUSS] SPIP: Structured Spark Logging

2024-03-10 Thread Gengliang Wang
Thanks everyone for the valuable feedback! Given the generally positive feedback received, I plan to move forward by initiating the voting thread. I encourage you to participate in the upcoming thread. Warm regards, Gengliang On Sat, Mar 9, 2024 at 12:55 PM Mich Talebzadeh wrote: > Splendid.

Re: [DISCUSS] SPIP: Structured Spark Logging

2024-03-09 Thread Mich Talebzadeh
Splendid. Thanks, Gengliang

Re: [DISCUSS] SPIP: Structured Spark Logging

2024-03-09 Thread Gengliang Wang
Hi Mich, Thanks for your suggestions. I agree that we should avoid confusion with Spark Structured Streaming. So, I'll go with "Structured Logging Framework for Apache Spark". This keeps the standard term "Structured Logging" and distinguishes it from "Structured Streaming" clearly. Thanks for

SPARK-44951, Improve Spark Dynamic Allocation

2024-03-08 Thread Mich Talebzadeh
Hi all, On this ticket, "Improve Spark Dynamic Allocation", I see no movement since it was opened back in August 2023. I may be wrong, of course.

Re: [ANNOUNCE] Apache Spark 3.5.1 released

2024-03-05 Thread Pan,Bingkun
Okay, let me double-check it carefully. Thank you very much for your help! From: Jungtaek Lim Sent: March 5, 2024 21:56:41 To: Pan,Bingkun Cc: Dongjoon Hyun; dev; user Subject: Re: [ANNOUNCE] Apache Spark 3.5.1 released Yeah, the approach seems OK to me - please double

Re: When Spark job shows FetchFailedException it creates few duplicate data and we see few data also missing , please explain why

2024-03-05 Thread Mich Talebzadeh
Hi Jason, I read your notes and the code simulating the problem at https://issues.apache.org/jira/browse/SPARK-38388, the specific repartition issue that this code aims to demonstrate. The code below is from the above Jira link: import scala.sys.process._ import

Re: [ANNOUNCE] Apache Spark 3.5.1 released

2024-03-05 Thread Jungtaek Lim
Yeah, the approach seems OK to me - please double-check that the doc generation in the Spark repo won't fail after the move of the js file. Other than that, it would probably just be a matter of updating the release process. On Tue, Mar 5, 2024 at 7:24 PM Pan,Bingkun wrote: > Okay, I see. > >

Re: [ANNOUNCE] Apache Spark 3.5.1 released

2024-03-05 Thread Pan,Bingkun
Okay, I see. Perhaps we can resolve this confusion by sharing the same `version.json` file across all versions in the Spark website repo, so that each version of the documentation displays the same data in the dropdown menu. From: Jungtaek Lim Sent: March 5, 2024

Re: [ANNOUNCE] Apache Spark 3.5.1 released

2024-03-05 Thread Jungtaek Lim
Let me be more specific. We have two active release lines, 3.4.x and 3.5.x. We just released Spark 3.5.1, with a dropdown listing 3.5.1 and 3.4.2, given that the latest 3.4.x version is 3.4.2. After a month, we release Spark 3.4.3. In the dropdown of Spark 3.4.3, there will be 3.5.1 and

Re: [ANNOUNCE] Apache Spark 3.5.1 released

2024-03-05 Thread Pan,Bingkun
Based on my understanding, we should not update versions that have already been released, such as in the situation you mentioned: `But what about the dropdown of version D? Should we add E in the dropdown?` We only need to record the latest `version.json` file that has already been published at the

Re: [ANNOUNCE] Apache Spark 3.5.1 released

2024-03-05 Thread Jungtaek Lim
But this does not answer my question about updating the dropdown for the doc of "already released versions", right? Let's say we just released version D, and the dropdown has version A, B, C. We have another release tomorrow as version E, and it's probably easy to add A, B, C, D in the dropdown
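
The dropdown question can be modelled in a few lines. The sketch contrasts docs that each embed a copy of the version list taken at build time (so D's page never learns about E, as described above) with docs that all read one shared `version.json` hosted on the website, as Bingkun suggests in this thread. The real file's format is not described in the thread; it is modelled here as a plain list of version labels, and the function names are hypothetical.

```python
def publish_pinned(release_order):
    """Each release's docs embed a copy of the version list as it was at build time."""
    known, dropdowns = [], {}
    for version in release_order:
        dropdowns[version] = list(known)  # frozen copy; later releases never touch it
        known.append(version)
    return dropdowns


def publish_shared(release_order):
    """Every release's docs read one shared list, updated in place on each release."""
    shared, dropdowns = [], {}
    for version in release_order:
        shared.append(version)
        dropdowns[version] = shared  # all pages reference the same list object
    return dropdowns


releases = ["A", "B", "C", "D", "E"]
pinned = publish_pinned(releases)
shared = publish_shared(releases)
print(pinned["D"])  # ['A', 'B', 'C']: D's page never learns about E
print(shared["D"])  # ['A', 'B', 'C', 'D', 'E']
```

Whether already-published docs should change after release is the open question in the thread; the shared model gives up that immutability in exchange for a dropdown that is always current.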

Re: [ANNOUNCE] Apache Spark 3.5.1 released

2024-03-05 Thread Pan,Bingkun
According to my understanding, the original intention of this feature is that when a user has entered the pyspark document, if he finds that the version he is currently in is not the version he wants, he can easily jump to the version he wants by clicking on the drop-down box. Additionally, in

Re: [ANNOUNCE] Apache Spark 3.5.1 released

2024-03-04 Thread Yang Jie
hmm... I guess this is meant to cc @Bingkun Pan ? On 2024/03/05 02:16:12 Hyukjin Kwon wrote: > Is this related to https://github.com/apache/spark/pull/42428? > > cc @Yang,Jie(INF) > > On Mon, 4 Mar 2024 at 22:21, Jungtaek Lim > wrote: > > > Shall we revisit this functionality? The API doc

Re: [ANNOUNCE] Apache Spark 3.5.1 released

2024-03-04 Thread yangjie01
That sounds like a great suggestion. From: Jungtaek Lim Date: Tuesday, March 5, 2024 10:46 To: Hyukjin Kwon Cc: yangjie01, Dongjoon Hyun, dev, user Subject: Re: [ANNOUNCE] Apache Spark 3.5.1 released Yes, it's relevant to that PR. I wonder whether, if we want to expose a version switcher, it should be in

Re: [ANNOUNCE] Apache Spark 3.5.1 released

2024-03-04 Thread Jungtaek Lim
Yes, it's relevant to that PR. I wonder whether, if we want to expose a version switcher, it should be in the versionless doc (spark-website) rather than in the doc pinned to a specific version. On Tue, Mar 5, 2024 at 11:18 AM Hyukjin Kwon wrote: > Is this related to

Re: [ANNOUNCE] Apache Spark 3.5.1 released

2024-03-04 Thread Hyukjin Kwon
Is this related to https://github.com/apache/spark/pull/42428? cc @Yang,Jie(INF) On Mon, 4 Mar 2024 at 22:21, Jungtaek Lim wrote: > Shall we revisit this functionality? The API doc is built with individual > versions, and for each individual version we depend on other released > versions.

Re: When Spark job shows FetchFailedException it creates few duplicate data and we see few data also missing , please explain why

2024-03-04 Thread Prem Sahoo
Thanks, Jason, for the detailed information and the bug associated with it. Hopefully someone can provide more information about this pressing issue. On Mon, Mar 4, 2024 at 1:26 PM Jason Xu wrote: > Hi Prem, > > From the symptom of shuffle fetch failure and few duplicate data and few > missing data, I think

Re: When Spark job shows FetchFailedException it creates few duplicate data and we see few data also missing , please explain why

2024-03-04 Thread Jason Xu
Hi Prem, From the symptom of shuffle fetch failure, with some duplicate data and some missing data, I think you might have run into this correctness bug: https://issues.apache.org/jira/browse/SPARK-38388. Node/shuffle failure is hard to avoid; I wonder if you have non-deterministic logic and calling
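
A toy model shows how a retried shuffle map task with non-deterministic input order can yield both duplicates and missing rows, matching the symptom described here. It is a simplified simulation under stated assumptions, not Spark's scheduler: rows are assigned round-robin by their position in the input, one reducer has already consumed the first attempt's output, and only the reducer whose fetch failed reads from the retried attempt. The function names are hypothetical.

```python
def round_robin(rows, num_partitions):
    """Assign each row to a partition based on its position in the input."""
    parts = [[] for _ in range(num_partitions)]
    for i, row in enumerate(rows):
        parts[i % num_partitions].append(row)
    return parts


def simulate_partial_retry(first_order, retry_order, num_partitions, failed_reducer):
    """Reducers that already fetched keep attempt 1's output; the failed one reads attempt 2."""
    attempt1 = round_robin(first_order, num_partitions)
    attempt2 = round_robin(retry_order, num_partitions)
    result = []
    for r in range(num_partitions):
        source = attempt2 if r == failed_reducer else attempt1
        result.extend(source[r])
    return result


rows = [1, 2, 3, 4, 5, 6]
# The retried map task sees the same rows in a different order.
shuffled = simulate_partial_retry(rows, [2, 1, 4, 3, 6, 5], 2, failed_reducer=1)
print(sorted(shuffled))  # [1, 1, 3, 3, 5, 5]: 1, 3, 5 duplicated; 2, 4, 6 missing
```

If the retry sees the same input order, the simulated output is exactly the original rows, which is why deterministic partitioning logic avoids the problem in this model.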

Re: When Spark job shows FetchFailedException it creates few duplicate data and we see few data also missing , please explain why

2024-03-04 Thread Prem Sahoo
super :( On Mon, Mar 4, 2024 at 6:19 AM Mich Talebzadeh wrote: > "... in a nutshell, if a FetchFailedException occurs due to a data node reboot, it can create duplicate/missing data, so this is more of a hardware (environment) issue than a Spark issue." > > As an overall conclusion your

Re: When Spark job shows FetchFailedException it creates few duplicate data and we see few data also missing , please explain why

2024-03-04 Thread Mich Talebzadeh
"... in a nutshell, if a FetchFailedException occurs due to a data node reboot, it can create duplicate/missing data, so this is more of a hardware (environment) issue than a Spark issue." As an overall conclusion, your point is correct, but again the answer is not binary. Spark core relies on

Re: When Spark job shows FetchFailedException it creates few duplicate data and we see few data also missing , please explain why

2024-03-03 Thread Prem Sahoo
Thanks Mich. In a nutshell, if a FetchFailedException occurs due to a data node reboot, it can create duplicate/missing data, so this is more of a hardware (environment) issue than a Spark issue. On Sat, Mar 2, 2024 at 7:45 AM Mich Talebzadeh wrote: > Hi, > > It seems to me that there are

Re: [ANNOUNCE] Apache Spark 3.5.1 released

2024-03-03 Thread Jungtaek Lim
Shall we revisit this functionality? The API doc is built with individual versions, and for each individual version we depend on other released versions. This does not seem right to me. Also, the functionality is only in the PySpark API doc, which does not seem consistent either. I don't

Re: [DISCUSS] SPIP: Structured Spark Logging

2024-03-02 Thread Mich Talebzadeh
Hi Gengliang, Thanks for taking the initiative to improve the Spark logging system. Transitioning to structured logs seems like a worthwhile way to enhance the ability to analyze and troubleshoot Spark jobs and, hopefully, to ease future integration with cloud logging systems. While "Structured Spark

Re: [DISCUSS] SPIP: Structured Spark Logging

2024-03-02 Thread Mridul Muralidharan
Hi Gengliang, Thanks for sharing this! I added a few queries to the proposal doc, and we can continue discussing there, but overall I am in favor of this. Regards, Mridul On Fri, Mar 1, 2024 at 1:35 AM Gengliang Wang wrote: > Hi All, > > I propose to enhance our logging system by

Re: When Spark job shows FetchFailedException it creates few duplicate data and we see few data also missing , please explain why

2024-03-02 Thread Mich Talebzadeh
Hi, It seems to me that there are issues related to the points below: * I think when a task fails midway and a retry task starts and completes, it may create duplicates, as the failed task has some data + the retry task has full data. But my question is why Spark keeps the delta data, or according to you if
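The hypothesis quoted above (a failed attempt leaves partial rows behind and the retry then writes the full partition) can be sketched as a hypothetical toy model in Python. It assumes a sink that receives rows directly from each task attempt; `run_task`, `naive_write` and `staged_write` are illustrative names, not Spark APIs. The sketch contrasts appending straight to shared output with staging each attempt's rows and keeping only the attempt that finishes, which is broadly the idea behind attempt-scoped output committers.

```python
def run_task(partition, fail_after=None):
    """Yield the partition's rows; raise after `fail_after` rows to mimic a lost executor."""
    for i, row in enumerate(partition):
        if fail_after is not None and i == fail_after:
            raise RuntimeError("executor lost")
        yield row


ATTEMPTS = (2, None)  # attempt 1 dies after 2 rows, attempt 2 succeeds


def naive_write(partition):
    """Each attempt appends straight to the shared output."""
    sink = []
    for fail_after in ATTEMPTS:
        try:
            for row in run_task(partition, fail_after):
                sink.append(row)
        except RuntimeError:
            continue  # the failed attempt's rows are already in the sink
        break
    return sink


def staged_write(partition):
    """Each attempt writes to its own staging list; only a finished attempt is committed."""
    sink = []
    for fail_after in ATTEMPTS:
        staging = []
        try:
            for row in run_task(partition, fail_after):
                staging.append(row)
        except RuntimeError:
            continue  # discard the partial output of the failed attempt
        sink.extend(staging)  # commit the successful attempt
        break
    return sink


partition = ["a", "b", "c", "d"]
print(naive_write(partition))   # ['a', 'b', 'a', 'b', 'c', 'd']
print(staged_write(partition))  # ['a', 'b', 'c', 'd']
```

Whether this explains duplicates in a given job likely depends on the sink: output written through a commit protocol is isolated per attempt, while side effects performed directly from tasks (for example, writes to an external system) generally are not.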

Re: When Spark job shows FetchFailedException it creates few duplicate data and we see few data also missing , please explain why

2024-03-01 Thread Prem Sahoo
Hello Mich, thanks for your reply. As an engineer I can chip in. You may have partial execution and retries, meaning that when Spark encounters a *FetchFailedException*, it may retry fetching the data from the unavailable node (the one being rebooted) a few times before marking it permanently

Re: When Spark job shows FetchFailedException it creates few duplicate data and we see few data also missing , please explain why

2024-03-01 Thread Mich Talebzadeh
Hi, Your point -> "When Spark job shows FetchFailedException it creates few duplicate data and we see few data also missing, please explain why. We have scenario when spark job complains *FetchFailedException as one of the data node got rebooted middle of job running.*" As an engineer I

Re: Vote on Dynamic resource allocation for structured streaming [SPARK-24815]

2024-03-01 Thread Mich Talebzadeh
Hi Bhuwan et al, Thank you for passing on the Databricks Structured Streaming team's review of the SPIP document. FYI, I work closely with Pavan and other members to help deliver this piece of work. We appreciate your insights, especially regarding the cost savings potential from the PoC. Pavan

Vote on Dynamic resource allocation for structured streaming [SPARK-24815]

2024-03-01 Thread Nivedita VY
+1 Nivi

Re: Vote on Dynamic resource allocation for structured streaming [SPARK-24815]

2024-03-01 Thread Pavan Kotikalapudi
Thanks Bhuwan and the rest of the Databricks team for the reviews. I appreciate them; they were very helpful in evaluating a few options that were overlooked earlier (especially mixed Spark apps running on notebooks). Regarding the use cases, it could handle multiple streaming queries

RE: Vote on Dynamic resource allocation for structured streaming [SPARK-24815]

2024-03-01 Thread Nivedita VY
+1 Nivi

Re: Vote on Dynamic resource allocation for structured streaming [SPARK-24815]

2024-03-01 Thread Bhuwan Sahni
Hi Pavan, I am from the Databricks Structured Streaming team, and we did a review of the SPIP internally. I wanted to pass on the points discussed in the meeting. Thanks for putting together the SPIP document. It's useful to have dynamic resource allocation for Streaming queries, and it's

Re: When Spark job shows FetchFailedException it creates few duplicate data and we see few data also missing , please explain why

2024-03-01 Thread Prem Sahoo
Hello All, in the list of JIRAs I didn't find anything related to FetchFailedException. As mentioned above: "When Spark job shows FetchFailedException it creates few duplicate data and we see few data also missing , please explain why. We have a scenario when spark job complains