Re: Re: [DISCUSS] Release Spark 3.5.1?

2024-02-04 Thread John Zhuge
+1 John Zhuge On Sun, Feb 4, 2024 at 11:23 AM Santosh Pingale wrote: > +1 > > On Sun, Feb 4, 2024, 8:18 PM Xiao Li > wrote: > >> +1 >> >> On Sun, Feb 4, 2024 at 6:07 AM beliefer wrote: >> >>> +1 >>> >>> >>> >>> On 2024-02-04 15:26:13, "Dongjoon Hyun" wrote: >>> >>> +1 >>> >>> On Sat, Feb 3,

Re: Re: [DISCUSS] Release Spark 3.5.1?

2024-02-04 Thread Santosh Pingale
+1 On Sun, Feb 4, 2024, 8:18 PM Xiao Li wrote: > +1 > > On Sun, Feb 4, 2024 at 6:07 AM beliefer wrote: > >> +1 >> >> >> >> On 2024-02-04 15:26:13, "Dongjoon Hyun" wrote: >> >> +1 >> >> On Sat, Feb 3, 2024 at 9:18 PM yangjie01 >> wrote: >>> +1 >>> >>> On 2024/2/4 13:13, "Kent

Re: Re: [DISCUSS] Release Spark 3.5.1?

2024-02-04 Thread Xiao Li
+1 On Sun, Feb 4, 2024 at 6:07 AM beliefer wrote: > +1 > > > > On 2024-02-04 15:26:13, "Dongjoon Hyun" wrote: > > +1 > > On Sat, Feb 3, 2024 at 9:18 PM yangjie01 > wrote: > >> +1 >> >> On 2024/2/4 13:13, "Kent Yao" wrote: >> >> >> +1 >> >> >> Jungtaek Lim >

Re:Re: [DISCUSS] Release Spark 3.5.1?

2024-02-04 Thread beliefer
+1 On 2024-02-04 15:26:13, "Dongjoon Hyun" wrote: +1 On Sat, Feb 3, 2024 at 9:18 PM yangjie01 wrote: +1 On 2024/2/4 13:13, "Kent Yao" wrote: +1 On Sat, Feb 3, 2024 at 21:14, Jungtaek Lim wrote: > > Hi dev, > > it looks like there are a huge

Re: [DISCUSS] Release Spark 3.5.1?

2024-02-03 Thread Dongjoon Hyun
+1 On Sat, Feb 3, 2024 at 9:18 PM yangjie01 wrote: > +1 > > On 2024/2/4 13:13, "Kent Yao" wrote: > > > +1 > > > On Sat, Feb 3, 2024 at 21:14, Jungtaek Lim wrote: > > > > Hi dev, > > > > it looks like there are a huge number of commits being pushed to branch-3.5

Re: [DISCUSS] Release Spark 3.5.1?

2024-02-03 Thread yangjie01
+1 On 2024/2/4 13:13, "Kent Yao" wrote: +1 On Sat, Feb 3, 2024 at 21:14, Jungtaek Lim wrote: > > Hi dev, > > it looks like there are a huge number of commits being pushed to branch-3.5 > after 3.5.0 was released, 200+ commits. > > $ git log --oneline

Re: [DISCUSS] Release Spark 3.5.1?

2024-02-03 Thread Kent Yao
+1 On Sat, Feb 3, 2024 at 21:14, Jungtaek Lim wrote: > > Hi dev, > > it looks like there are a huge number of commits being pushed to branch-3.5 > after 3.5.0 was released, 200+ commits. > > $ git log --oneline v3.5.0..HEAD | wc -l > 202 > > Also, there are 180 JIRA tickets with 3.5.1 as the fix version, and

Re: Enhanced Console Sink for Structured Streaming

2024-02-03 Thread Neil Ramaswamy
Re: verbosity: yes, it will be more verbose. A config I was planning to implement was a default-on console sink option, verboseMode, that you can set to be off if you just want sink data. I don't think that introduces additional complexity, as the last point suggests. (And also, nobody should be

[DISCUSS] Release Spark 3.5.1?

2024-02-03 Thread Jungtaek Lim
Hi dev, it looks like a huge number of commits, 200+, have been pushed to branch-3.5 since 3.5.0 was released:
$ git log --oneline v3.5.0..HEAD | wc -l
202
Also, there are 180 JIRA tickets with 3.5.1 as the fix version, and 10 resolved issues are either marked as blocker (even
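The count in Jungtaek's message comes from piping `git log --oneline` into `wc -l`. A minimal Python sketch of the same check, assuming `git` is on the PATH and the working directory is a Spark checkout with the `v3.5.0` tag fetched, can let git do the counting with `git rev-list --count`. The function name `count_commits` is hypothetical.

```python
import subprocess


def count_commits(rev_range: str, repo_dir: str = ".") -> int:
    """Count the commits in rev_range, e.g. "v3.5.0..HEAD".

    Equivalent to `git log --oneline <rev_range> | wc -l`, but git does the
    counting itself via `git rev-list --count`.
    """
    result = subprocess.run(
        ["git", "rev-list", "--count", rev_range],
        cwd=repo_dir,
        check=True,           # raise CalledProcessError if the range is invalid
        capture_output=True,
        text=True,
    )
    return int(result.stdout.strip())
```

Run on branch-3.5, `count_commits("v3.5.0..HEAD")` should agree with the shell pipeline (202 at the time of the message).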

Re: Enhanced Console Sink for Structured Streaming

2024-02-03 Thread Mich Talebzadeh
Hi, as I understand it, the proposal you mentioned suggests adding event-time and state store metadata to the console sink to better highlight the semantics of the Structured Streaming engine. While I agree this enhancement can provide valuable insights into the engine's behavior, especially for

Community over Code EU 2024 Travel Assistance Applications now open!

2024-02-03 Thread Gavin McDonald
Hello to all users, contributors and Committers! The Travel Assistance Committee (TAC) are pleased to announce that travel assistance applications for Community over Code EU 2024 are now open! We will be supporting Community over Code EU, Bratislava, Slovakia, June 3rd - 5th, 2024. TAC exists

Enhanced Console Sink for Structured Streaming

2024-02-02 Thread Neil Ramaswamy
Hi all, I'd like to propose the idea of enhancing Structured Streaming's console sink to print event-time metrics and state store data, in addition to the sink's rows. I've noticed beginners often struggle to understand how watermarks, operator state, and output rows are all intertwined. By

Re: Spark 3.5.1

2024-01-31 Thread Jungtaek Lim
Hi, I agree it's time to release 3.5.1. Ten resolved issues are marked as either blocker (some are even correctness issues) or critical, which justifies the release. I had been trying to find the time to take this on, but had no luck. I'll give it another try this week (it needs some time as I'm

Extracting Input and Output Partitions in Spark

2024-01-30 Thread Aditya Sohoni
Hello Spark Devs! We are from Uber's Spark team. Our ETL jobs use Spark to read from and write to Hive datasets stored in HDFS. The freshness of the partition written to depends on the freshness of the data in the input partition(s). We monitor this freshness score, so that partitions in our

Spark 3.5.1

2024-01-30 Thread Santosh Pingale
Hey there, the Spark 3.5 branch has accumulated 199 commits, with quite a few bug fixes related to correctness. Are there any plans to release 3.5.1? Kind regards, Santosh

Re: [QUESTION] Legal dependency with Oracle JDBC driver

2024-01-30 Thread Mich Talebzadeh
Hi Alex, Well, that is just Justin's opinion vis-à-vis his matter. It is different from mine. Bottom line, you can always refer to Oracle or a copyright expert on this matter and see what they suggest. HTH Mich Talebzadeh, Dad | Technologist | Solutions Architect | Engineer London United Kingdom

unsubscribe

2024-01-29 Thread Gang Feng

Re: [QUESTION] Legal dependency with Oracle JDBC driver

2024-01-29 Thread Alex Porcelli
Hi Mich, Thank you for the prompt response. It looks like Justin Mclean has a slightly different perspective on Oracle's license, as you can see in [3]. On Mon, Jan 29, 2024 at 4:17 PM Mich Talebzadeh wrote: > Hi, > > This is not an official response and should not be taken as an > official

Re: [QUESTION] Legal dependency with Oracle JDBC driver

2024-01-29 Thread Mich Talebzadeh
Hi, this is not an official response and should not be taken as an official view. It is my own opinion. Looking at reference [1], I can see a host of other JDBC vendors' drivers included, such as IBM DB2 and MSSQL. With regard to link [2], it has already been closed (for 3+ years) and it is assumed

[QUESTION] Legal dependency with Oracle JDBC driver

2024-01-29 Thread Alex Porcelli
Hi Spark Devs, I'm reaching out to understand how you managed to include the Oracle JDBC driver as one of your dependencies [1]. According to legal tickets [2][3], this is considered a Category X dependency and is not allowed. (I'm part of the Apache KIE podling, and we are struggling with such a

Re: [Spark-Core] Improving Reliability of spark when Executors OOM

2024-01-26 Thread kalyan
Hi all, Sorry for the delay in getting the first draft of (my first) SPIP out. https://docs.google.com/document/d/1hxEPUirf3eYwNfMOmUHpuI5dIt_HJErCdo7_yr9htQc/edit?pli=1 Let me know what you think. Regards kalyan. On Sat, Jan 20, 2024 at 8:19 AM Ashish Singh wrote: > Hey all, > > Thanks for

Re: [EXTERNAL] Re: Spark Kafka Rack Aware Consumer

2024-01-26 Thread Raghu Angadi
Overall, the proposal to make this an option for the Kafka source SGTM. You can address the doc review comments and send a PR (in parallel or after the review). Note that executors currently cache the client connection to Kafka and reuse the connection and buffered records for the next micro-batch. Your proposal

Re: [EXTERNAL] Re: Spark Kafka Rack Aware Consumer

2024-01-26 Thread Schwager, Randall
Granted. Thanks for bearing with me. I’ve also opened up permissions to allow anyone with the link to edit the document. Thank you! From: Mich Talebzadeh Date: Friday, January 26, 2024 at 09:19 To: "Schwager, Randall" Cc: "dev@spark.apache.org" Subject: Re: [EXTERNAL] Re: Spark Kafka Rack

Re: [EXTERNAL] Re: Spark Kafka Rack Aware Consumer

2024-01-26 Thread Mich Talebzadeh
OK, I made a request to access this document. Thanks, Mich Talebzadeh

Re: [EXTERNAL] Re: Spark Kafka Rack Aware Consumer

2024-01-26 Thread Schwager, Randall
Hi Mich, Thanks for responding. In the JIRA issue, the design doc you’re referring to describes the prior work. This is the design doc for the proposed change: https://docs.google.com/document/d/1RoEk_mt8AUh9sTQZ1NfzIuuYKf1zx6BP1K3IlJ2b8iM/edit#heading=h.pbt6pdb2jt5c I’ll re-word the

Re: Spark Kafka Rack Aware Consumer

2024-01-26 Thread Mich Talebzadeh
Your design doc, "Structured Streaming Kafka Source - Design Doc" (Google Docs), seems to have been around since 2016. Reading the comments, it was decided not to progress with it. What has changed

Re: Spark Kafka Rack Aware Consumer

2024-01-25 Thread Schwager, Randall
Bump. Am I asking these questions in the wrong place? Or should I forego design input and just write the PR? From: "Schwager, Randall" Date: Monday, January 22, 2024 at 17:02 To: "dev@spark.apache.org" Subject: Re: Spark Kafka Rack Aware Consumer Hello Spark Devs! After doing some detective

Re: Spark Kafka Rack Aware Consumer

2024-01-22 Thread Schwager, Randall
Hello Spark Devs! After doing some detective work, I’d like to revisit this idea in earnest. My understanding now is that setting `client.rack` dynamically on the executor will do nothing. This is because the driver assigns Kafka partitions to executors. I’ve summarized a design to enable rack

Re: Removing Kinesis in Spark 4

2024-01-20 Thread Nicholas Chammas
Oh, that’s a very interesting dashboard. I was familiar with the Matomo snippet but never looked up where exactly those metrics were going. I see that the Kinesis docs do indeed have around 650 views in the past month, but for Kafka I see 11K and 1.3K views for the Structured Streaming and

Re: Removing Kinesis in Spark 4

2024-01-20 Thread Sean Owen
I'm not aware of much usage, but that doesn't mean a lot. FWIW, in the past month or so, the Kinesis docs page got about 700 views, compared to about 1400 for Kafka

Removing Kinesis in Spark 4

2024-01-20 Thread Nicholas Chammas
From the dev thread: What else could be removed in Spark 4? > On Aug 17, 2023, at 1:44 AM, Yang Jie wrote: > > I would like to know how we should handle the two Kinesis-related modules in > Spark 4.0. They have a very low

Re: Dynamic resource allocation for structured streaming [SPARK-24815]

2024-01-20 Thread Pavan Kotikalapudi
Here is the link to the voting thread https://lists.apache.org/thread/rlwqrw6ddxdkbvkp78kpd0zgvglgbbp8. Thank you, Pavan On Wed, Jan 17, 2024 at 7:15 PM Pavan Kotikalapudi wrote: > Thanks for the +1, I will propose voting in a new thread now. > > - Pavan > > On Wed, Jan 17, 2024 at 5:28 PM

Re: [Spark-Core] Improving Reliability of spark when Executors OOM

2024-01-19 Thread Ashish Singh
Hey all, Thanks for this discussion, the timing of this couldn't be better! At Pinterest, we recently started to look into reducing OOM failures while also reducing memory consumption of spark applications. We considered the following options. 1. Changing core count on executor to change memory

Re: Vote on Dynamic resource allocation for structured streaming [SPARK-24815]

2024-01-19 Thread Mich Talebzadeh
Everyone's vote matters, whether they are PMC members or not. There is no monopoly here. HTH, Mich Talebzadeh

Re: Vote on Dynamic resource allocation for structured streaming [SPARK-24815]

2024-01-19 Thread Pavan Kotikalapudi
+1, if my vote counts. Do only Spark PMC votes count? Thanks, Pavan On Thu, Jan 18, 2024 at 3:19 AM Adam Hobbs wrote: > +1 > -- > *From:* Pavan Kotikalapudi > *Sent:* Thursday, January 18, 2024 4:19:32 AM > *To:* Spark dev list > *Subject:* Re: Vote on Dynamic

Re: Vote on Dynamic resource allocation for structured streaming [SPARK-24815]

2024-01-17 Thread Adam Hobbs
+1 From: Pavan Kotikalapudi Sent: Thursday, January 18, 2024 4:19:32 AM To: Spark dev list Subject: Re: Vote on Dynamic resource allocation for structured streaming [SPARK-24815]

Re: Vote on Dynamic resource allocation for structured streaming [SPARK-24815]

2024-01-17 Thread Pavan Kotikalapudi
Thanks for proposing and voting for the feature, Mich. Adding some references to the thread: - Jira ticket - SPARK-24815 - Design Doc

Vote on Dynamic resource allocation for structured streaming [SPARK-24815]

2024-01-17 Thread Mich Talebzadeh
+1 for me (non-binding) *Disclaimer:* Use it at your own risk. Any and all responsibility for any loss, damage or destruction of data or any other property which may arise from relying on this email's technical content is explicitly disclaimed. The author will in no case be liable for any

Re: [Spark-Core] Improving Reliability of spark when Executors OOM

2024-01-17 Thread Mridul Muralidharan
Hi, We are internally exploring adding support for dynamically changing the resource profile of a stage based on runtime characteristics. This includes failures due to OOM and the like, slowness due to excessive GC, resource wastage due to excessive overprovisioning, etc. Essentially handles

Re: [Spark-Core] Improving Reliability of spark when Executors OOM

2024-01-17 Thread Tom Graves
It is interesting. I think there are definitely some discussion points around this. Reliability vs. performance is always a trade-off: it's great that it doesn't fail, but if it no longer meets someone's SLA, that could be just as bad if it's hard to figure out why. I think if something like this

Re: Dynamic resource allocation for structured streaming [SPARK-24815]

2024-01-17 Thread Pavan Kotikalapudi
Thanks for the +1; I will start the vote in a new thread now. - Pavan On Wed, Jan 17, 2024 at 5:28 PM Mich Talebzadeh wrote: > I think we have discussed this enough, and I consider it a useful > feature. I propose a vote on it. > > +1 for me > > Mich Talebzadeh, > Dad | Technologist |

Re: Dynamic resource allocation for structured streaming [SPARK-24815]

2024-01-17 Thread Mich Talebzadeh
I think we have discussed this enough, and I consider it a useful feature. I propose a vote on it. +1 for me. Mich Talebzadeh

Re: [Spark-Core] Improving Reliability of spark when Executors OOM

2024-01-16 Thread Holden Karau
Oh, interesting solution. A co-worker was suggesting something similar using resource profiles to increase memory, but your approach avoids a lot of complexity. I like it (and we could extend it to support resource profile growth too). I think an SPIP sounds like a great next step. On Tue,

[Spark-Core] Improving Reliability of spark when Executors OOM

2024-01-16 Thread kalyan
Hello all, at Uber we recently did some work on improving the reliability of Spark applications in scenarios where fatter executors run out of memory, leading to application failure. Fatter executors are those that run more than one task concurrently at a given time. This
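To make "fatter executors" concrete: under Spark's unified memory model, tasks running concurrently on one executor share a single execution memory pool, and with N active tasks each task can acquire at least 1/(2N) of the pool before it has to spill, and at most 1/N. The back-of-the-envelope sketch below assumes the defaults of 300 MB reserved memory and `spark.memory.fraction` = 0.6, on-heap memory only, and an empty storage region (so execution may use the whole pool); it ignores memory overhead and off-heap memory. The helper name is hypothetical.

```python
RESERVED_MB = 300              # Spark's default reserved system memory
DEFAULT_MEMORY_FRACTION = 0.6  # default spark.memory.fraction


def per_task_execution_memory_mb(executor_heap_mb: float,
                                 concurrent_tasks: int,
                                 memory_fraction: float = DEFAULT_MEMORY_FRACTION):
    """Return the (lower, upper) execution memory in MB one task can count on.

    Assumes the unified pool is (heap - reserved) * memory_fraction, that storage
    holds none of it, and that each of N active tasks is guaranteed 1/(2N) of
    the pool and capped at 1/N.
    """
    if concurrent_tasks < 1:
        raise ValueError("concurrent_tasks must be >= 1")
    pool_mb = (executor_heap_mb - RESERVED_MB) * memory_fraction
    return pool_mb / (2 * concurrent_tasks), pool_mb / concurrent_tasks


if __name__ == "__main__":
    # Same 8 GiB heap: one task at a time vs. a "fat" executor running 4 tasks.
    for n in (1, 4):
        lo, hi = per_task_execution_memory_mb(8192, n)
        print(f"{n} task(s): {lo:.1f} MB .. {hi:.1f} MB per task")
```

Holding the heap fixed, going from one to four concurrent tasks (roughly `spark.executor.cores / spark.task.cpus`) cuts each task's ceiling by 4x, which is one reason fatter executors are more exposed to OOMs, and why changing the executor core count changes memory per task.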

Re: Dynamic resource allocation for structured streaming [SPARK-24815]

2024-01-16 Thread Adam Hobbs
Hi, this is my first time using the dev mailing list, so I hope this is the correct way to do it. I would like to lend my support to this proposal and offer my experiences as a consumer of Spark, and specifically Spark Structured Streaming (SSS). I am more of a cloud infrastructure devops

[VOTE][RESULT] SPIP: Structured Streaming - Arbitrary State API v2

2024-01-11 Thread Jungtaek Lim
The vote passes with 12 +1s (3 binding +1s). Thanks to all who reviewed the SPIP doc and voted! (* = binding) +1: - Jungtaek Lim - Anish Shrigondekar - Mich Talebzadeh - Raghu Angadi - 刘唯 - Shixiong Zhu (*) - Bartosz Konieczny - Praveen Gattu - Burak Yavuz - Bhuwan Sahni - L. C. Hsieh (*) -

Re: [VOTE] SPIP: Structured Streaming - Arbitrary State API v2

2024-01-11 Thread Jungtaek Lim
Thanks all for participating! The vote passed. I'll send out the result in a separate thread. On Thu, Jan 11, 2024 at 10:37 PM Wenchen Fan wrote: > +1 > > On Thu, Jan 11, 2024 at 9:32 AM L. C. Hsieh wrote: > >> +1 >> >> On Wed, Jan 10, 2024 at 9:06 AM Bhuwan Sahni >> wrote: >> >>> +1. This is

Re: [VOTE] SPIP: Structured Streaming - Arbitrary State API v2

2024-01-10 Thread Wenchen Fan
+1 On Thu, Jan 11, 2024 at 9:32 AM L. C. Hsieh wrote: > +1 > > On Wed, Jan 10, 2024 at 9:06 AM Bhuwan Sahni > wrote: > >> +1. This is a good addition. >> >> >> *Bhuwan Sahni* >> Staff Software Engineer >> >> bhuwan.sa...@databricks.com >> 500 108th Ave. NE >>

Spark Kafka Rack Aware Consumer

2024-01-10 Thread Schwager, Randall
Hello Spark Devs! Has there been discussion around adding the ability to dynamically set the ‘client.rack’ Kafka parameter at the executor? The Kafka SQL connector code on master doesn’t seem to support this feature. One can easily set the ‘client.rack’ parameter at the driver, but that just

Install Ruby 3 to build the docs

2024-01-10 Thread Nicholas Chammas
Just a quick heads up that, while Ruby 2.7 will continue to work, you should plan to install Ruby 3 in the near future in order to build the docs. (I recommend using rbenv to manage multiple Ruby versions.) Ruby 2 reached EOL in March 2023

Re: [VOTE] SPIP: Structured Streaming - Arbitrary State API v2

2024-01-10 Thread L. C. Hsieh
+1 On Wed, Jan 10, 2024 at 9:06 AM Bhuwan Sahni wrote: > +1. This is a good addition. > > > *Bhuwan Sahni* > > > On Wed, Jan 10, 2024 at 9:00 AM Burak Yavuz

Re: [VOTE] SPIP: Structured Streaming - Arbitrary State API v2

2024-01-10 Thread Bhuwan Sahni
+1. This is a good addition. *Bhuwan Sahni* On Wed, Jan 10, 2024 at 9:00 AM Burak Yavuz wrote: > +1. Excited to see more stateful workloads with Structured Streaming! > >

Re: [VOTE] SPIP: Structured Streaming - Arbitrary State API v2

2024-01-10 Thread Burak Yavuz
+1. Excited to see more stateful workloads with Structured Streaming! Best, Burak On Wed, Jan 10, 2024 at 8:21 AM Praveen Gattu wrote: > +1. This gives Structured Streaming a good solution for customers wanting > to build stateful stream processing applications. > > On Wed, Jan 10, 2024 at

Re: [VOTE] SPIP: Structured Streaming - Arbitrary State API v2

2024-01-10 Thread Praveen Gattu
+1. This gives Structured Streaming a good solution for customers wanting to build stateful stream processing applications. On Wed, Jan 10, 2024 at 7:30 AM Bartosz Konieczny wrote: > +1 :) > > On Wed, Jan 10, 2024 at 9:57 AM Shixiong Zhu wrote: > >> +1 (binding) >> >> Best Regards, >>

Re: [VOTE] SPIP: Structured Streaming - Arbitrary State API v2

2024-01-10 Thread Bartosz Konieczny
+1 :) On Wed, Jan 10, 2024 at 9:57 AM Shixiong Zhu wrote: > +1 (binding) > > Best Regards, > Shixiong Zhu > > > On Tue, Jan 9, 2024 at 6:47 PM 刘唯 wrote: > >> This is a good addition! +1 >> >> On Tue, Jan 9, 2024 at 13:17, Raghu Angadi wrote: >>> +1. This is a major improvement to the state API. >>> >>>

Re: [VOTE] SPIP: Structured Streaming - Arbitrary State API v2

2024-01-09 Thread Shixiong Zhu
+1 (binding) Best Regards, Shixiong Zhu On Tue, Jan 9, 2024 at 6:47 PM 刘唯 wrote: > This is a good addition! +1 > > On Tue, Jan 9, 2024 at 13:17, Raghu Angadi wrote: > >> +1. This is a major improvement to the state API. >> >> Raghu. >> >> On Tue, Jan 9, 2024 at 1:42 AM Mich Talebzadeh >> wrote: >> >>> +1

Re: [DISCUSS] SPIP: Structured Streaming - Arbitrary State API v2

2024-01-09 Thread Jungtaek Lim
Friendly reminder: the VOTE thread is now live! https://lists.apache.org/thread/16ryx828bwoth31hobknxnjfxjxj07mf Votes made here are not counted, so please make sure you vote in the VOTE thread. Thanks! On Tue, Jan 9, 2024 at 9:33 AM Jungtaek Lim wrote: > Thanks everyone for the feedback! > >

Re: Spark Structured Streaming and Flask REST API for Real-Time Data Ingestion and Analytics.

2024-01-09 Thread Mich Talebzadeh
Hi Ashok, thanks for pointing out the Databricks article "Scalable Spark Structured Streaming for REST API Destinations" (Databricks Blog). I browsed it, and it is basically similar to many of us involved

Re: [VOTE] SPIP: Structured Streaming - Arbitrary State API v2

2024-01-09 Thread 刘唯
This is a good addition! +1 On Tue, Jan 9, 2024 at 13:17, Raghu Angadi wrote: > +1. This is a major improvement to the state API. > > Raghu. > > On Tue, Jan 9, 2024 at 1:42 AM Mich Talebzadeh > wrote: > >> +1 for me as well >> >> >> Mich Talebzadeh, >> Dad | Technologist | Solutions Architect | Engineer >>

Re: [VOTE] SPIP: Structured Streaming - Arbitrary State API v2

2024-01-09 Thread Raghu Angadi
+1. This is a major improvement to the state API. Raghu. On Tue, Jan 9, 2024 at 1:42 AM Mich Talebzadeh wrote: > +1 for me as well > > > Mich Talebzadeh, > Dad | Technologist | Solutions Architect | Engineer > London > United Kingdom > > >view my Linkedin profile >

RE: Re: [VOTE] SPIP: Structured Streaming - Arbitrary State API v2

2024-01-09 Thread 刘唯
+1 This is a good addition! On 2024/01/09 03:23:35 Anish Shrigondekar wrote: > Thanks Jungtaek for creating the Vote thread. > > +1 (non-binding) from my side too. > > Thanks, > Anish > > On Tue, Jan 9, 2024 at 6:09 AM Jungtaek Lim > wrote: > > > Starting with my +1 (non-binding). Thanks! > > >

Re: AutoReply: Re: [VOTE] SPIP: Structured Streaming - Arbitrary State API v2

2024-01-09 Thread Mich Talebzadeh
Hi, please stop sending this acknowledgement email. It is spamming the forum unnecessarily! Thanks, Mich Talebzadeh

Re: [VOTE] SPIP: Structured Streaming - Arbitrary State API v2

2024-01-09 Thread Mich Talebzadeh
+1 for me as well. Mich Talebzadeh

Re: [VOTE] SPIP: Structured Streaming - Arbitrary State API v2

2024-01-08 Thread Anish Shrigondekar
Thanks Jungtaek for creating the Vote thread. +1 (non-binding) from my side too. Thanks, Anish On Tue, Jan 9, 2024 at 6:09 AM Jungtaek Lim wrote: > Starting with my +1 (non-binding). Thanks! > > On Tue, Jan 9, 2024 at 9:37 AM Jungtaek Lim > wrote: > >> Hi all, >> >> I'd like to start the

Re: [VOTE] SPIP: Structured Streaming - Arbitrary State API v2

2024-01-08 Thread Jungtaek Lim
Starting with my +1 (non-binding). Thanks! On Tue, Jan 9, 2024 at 9:37 AM Jungtaek Lim wrote: > Hi all, > > I'd like to start the vote for SPIP: Structured Streaming - Arbitrary > State API v2. > > References: > >- JIRA ticket >- SPIP

[VOTE] SPIP: Structured Streaming - Arbitrary State API v2

2024-01-08 Thread Jungtaek Lim
Hi all, I'd like to start the vote for SPIP: Structured Streaming - Arbitrary State API v2. References: - JIRA ticket - SPIP doc -

Re: [DISCUSS] SPIP: Structured Streaming - Arbitrary State API v2

2024-01-08 Thread Jungtaek Lim
Thanks everyone for the feedback! Given that we have received positive feedback without major concerns, I will start the vote thread soon. Please vote in that thread as well. Thanks again! On Tue, Jan 9, 2024 at 7:44 AM Bhuwan Sahni wrote: > +1 on the newer APIs. I believe these APIs provide

Re: Spark Structured Streaming and Flask REST API for Real-Time Data Ingestion and Analytics.

2024-01-08 Thread Mich Talebzadeh
Please also note that Flask's built-in development server is intended for development and small-scale applications; it may not handle concurrent requests efficiently in a production environment. In production, one can use Gunicorn (Green Unicorn), which is a WSGI (
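As a minimal sketch of the Gunicorn setup mentioned here, a `gunicorn.conf.py` file (Gunicorn configuration files are plain Python) could look like the following. The port, thread count and timeout are illustrative assumptions, and the `(2 x cores) + 1` worker count is the starting point suggested in Gunicorn's own documentation, not something from the original message.

```python
# gunicorn.conf.py: illustrative settings for serving a Flask app in production.
import multiprocessing

# Listen on all interfaces; port 8000 is an arbitrary choice.
bind = "0.0.0.0:8000"

# Gunicorn's docs suggest (2 x num_cores) + 1 worker processes as a starting point.
workers = multiprocessing.cpu_count() * 2 + 1

# Threads per worker; a value above 1 makes Gunicorn use the gthread worker class.
threads = 2

# Seconds a worker may stay silent before it is killed and restarted.
timeout = 30
```

It would be started with something like `gunicorn -c gunicorn.conf.py app:app`, where `app:app` assumes a hypothetical module `app.py` exposing a Flask instance named `app`. Multiple worker processes, each with a few threads, let the service handle concurrent REST requests rather than relying on the development server.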

Re: [DISCUSS] SPIP: Structured Streaming - Arbitrary State API v2

2024-01-08 Thread Bhuwan Sahni
+1 on the newer APIs. I believe these APIs provide a much more powerful mechanism for the user to perform arbitrary state management in Structured Streaming queries. Thanks, Bhuwan Sahni On Mon, Jan 8, 2024 at 10:07 AM L. C. Hsieh wrote: > +1 > > I left some comments in the SPIP doc and got replies

Spark Structured Streaming and Flask REST API for Real-Time Data Ingestion and Analytics.

2024-01-08 Thread Mich Talebzadeh
Thought it might be useful to share my idea with fellow forum members. During the breaks, I worked on the *seamless integration of Spark Structured Streaming with Flask REST API for real-time data ingestion and analytics*. The use case revolves around a scenario where data is generated through

Re: [DISCUSS] SPIP: Structured Streaming - Arbitrary State API v2

2024-01-08 Thread L. C. Hsieh
+1 I left some comments in the SPIP doc and got replies quickly. The new API looks good and more comprehensive. I think it will help Spark Structured Streaming to be more useful in more complicated streaming use cases. On Fri, Jan 5, 2024 at 8:15 PM Burak Yavuz wrote: > > I'm also a +1 on the

Re: Regression? - UIUtils::formatBatchTime - [SPARK-46611][CORE] Remove ThreadLocal by replace SimpleDateFormat with DateTimeFormatter

2024-01-08 Thread Sean Owen
Agreed, that looks wrong. From the code, it seems that "timezone" is only used for testing, though apparently no test caught this. I'll submit a PR to patch it in any event: https://github.com/apache/spark/pull/44619 On Mon, Jan 8, 2024 at 1:33 AM Janda Martin wrote: > I think that >

Regression? - UIUtils::formatBatchTime - [SPARK-46611][CORE] Remove ThreadLocal by replace SimpleDateFormat with DateTimeFormatter

2024-01-07 Thread Janda Martin
I think that "[SPARK-46611][CORE] Remove ThreadLocal by replace SimpleDateFormat with DateTimeFormatter" introduced a regression in UIUtils::formatBatchTime when a timezone is defined. According to the JavaDoc, DateTimeFormatter is thread-safe and immutable, so the method DateTimeFormatter::withZone

Re: [DISCUSS] SPIP: Structured Streaming - Arbitrary State API v2

2024-01-05 Thread Burak Yavuz
I'm also a +1 on the newer APIs. We had a lot of learnings from using flatMapGroupsWithState and I believe that we can make the APIs a lot easier to use. On Wed, Nov 29, 2023 at 6:43 PM Anish Shrigondekar wrote: > Hi dev, > > Addressed the comments that Jungtaek had on the doc. Bumping the

Re: Dynamic resource allocation for structured streaming [SPARK-24815]

2024-01-05 Thread Mich Talebzadeh
Hi Pavan, Thanks for your answers. Given these responses, it seems you have already taken a comprehensive approach to addressing the challenges associated with dynamic scaling in Spark Structured Streaming. IMO, it would also be beneficial to engage with other members, or gather

Re: [DISCUSS] SPIP: Structured Streaming - Arbitrary State API v2

2024-01-05 Thread Shixiong Zhu
+1. Looking forward to seeing how the new API brings in new streaming use cases! Best Regards, Shixiong Zhu On Wed, Nov 29, 2023 at 6:42 PM Anish Shrigondekar wrote: > Hi dev, > > Addressed the comments that Jungtaek had on the doc. Bumping the thread > once again to see if other folks have

Re: Dynamic resource allocation for structured streaming [SPARK-24815]

2024-01-05 Thread Pavan Kotikalapudi
Hi Mich, as always, thanks for looking closely at the design; I really appreciate your input on this ticket. I would love to improve this further and cover more edge cases, if any. I can answer the concerns you have below. I believe I have covered some of them in the proposal; if at all I missed out

Re: unsubscribe

2024-01-04 Thread yxj1141

unsubscribe

2024-01-03 Thread Chenyang Tang
unsubscribe

Re: Dynamic resource allocation for structured streaming [SPARK-24815]

2024-01-02 Thread Mich Talebzadeh
Hi Pavan, Thanks for putting this request forward. I am generally supportive of it. In a nutshell, I believe this proposal holds significant promise for optimizing resource utilization and enhancing performance in Spark Structured Streaming. Having said that, there are potential

Re: Dynamic resource allocation for structured streaming [SPARK-24815]

2024-01-01 Thread Pavan Kotikalapudi
Hi PMC members, bumping this idea one last time to see if there is approval to take it forward. Here is an initial implementation draft PR https://github.com/apache/spark/pull/42352 and design doc:

Re: When and how does Spark use metastore statistics?

2023-12-26 Thread Bjørn Jørgensen
Tell me more about spark.sql.cbo.strategy. On Tue, 12 Dec 2023 at 00:25, Nicholas Chammas wrote: > Where exactly are you getting this information from? > > As far as I can tell, spark.sql.cbo.enabled has defaulted to false since > it was introduced 7 years ago >

Re: Contribute to Spark Open source

2023-12-25 Thread Colin Williams
Hello, Did you see https://spark.apache.org/contributing.html ? On Mon, Dec 25, 2023 at 5:13 AM Sudharshan V wrote: > > Hi All, > > I am new to Open source and have been using spark scala in my organisation > for the past couple of years. > I would like to contribute to spark open source. > I

Contribute to Spark Open source

2023-12-25 Thread Sudharshan V
Hi all, I am new to open source and have been using Spark with Scala in my organisation for the past couple of years. I would like to contribute to Spark. I am not exactly sure how or where to start. Any help would be greatly appreciated. Is there any documentation per se on how to

the life cycle shuffle Dependency

2023-12-24 Thread yang chen
Hi, I'm learning Spark and wondering when shuffle data gets deleted. I found the ContextCleaner class, which cleans shuffle data when the shuffle dependency is GC-ed. Based on the source code, the shuffle dependency is GC-ed only when the active job finishes, but I'm not sure. Could you explain the life cycle of
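The mechanism described here, cleanup triggered by garbage collection of the dependency object, can be illustrated with a toy Python analogy. Spark's ContextCleaner does this on the JVM with weak references; the sketch below uses Python's `weakref.finalize` instead, and the class and method names are hypothetical stand-ins, not Spark code. It shows the key property: cleanup runs only once nothing holds a strong reference to the dependency, so anything still referencing it (for example an RDD whose lineage includes it) delays deletion of the shuffle data.

```python
import gc
import weakref


class ShuffleDependency:
    """Hypothetical stand-in for Spark's ShuffleDependency (not Spark code)."""

    def __init__(self, shuffle_id: int):
        self.shuffle_id = shuffle_id


class ToyContextCleaner:
    """Toy analogue of ContextCleaner: records shuffle ids whose dependency was GC-ed."""

    def __init__(self):
        self.cleaned_shuffle_ids = []

    def register_shuffle_for_cleanup(self, dep: ShuffleDependency) -> None:
        # Capture only the id: the callback must not hold a strong reference
        # to `dep`, or the object could never be collected.
        shuffle_id = dep.shuffle_id
        weakref.finalize(dep, self._do_cleanup_shuffle, shuffle_id)

    def _do_cleanup_shuffle(self, shuffle_id: int) -> None:
        # In Spark this step would remove the shuffle's files and metadata;
        # here we only record that cleanup happened.
        self.cleaned_shuffle_ids.append(shuffle_id)


if __name__ == "__main__":
    cleaner = ToyContextCleaner()
    dep = ShuffleDependency(shuffle_id=0)
    lineage = [dep]                 # something (e.g. an RDD) still references the dependency
    cleaner.register_shuffle_for_cleanup(dep)
    del dep
    gc.collect()
    print(cleaner.cleaned_shuffle_ids)  # nothing cleaned yet: lineage still holds it
    lineage.clear()                     # drop the last strong reference
    gc.collect()
    print(cleaner.cleaned_shuffle_ids)  # cleanup has now run for shuffle 0
```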

Re: Validate spark sql

2023-12-24 Thread Nicholas Chammas
This is a user-list question, not a dev-list question. Moving this conversation to the user list and BCC-ing the dev list. Also, this statement > We are not validating against table or column existence. is not correct. When you call spark.sql(…), Spark will look up the table references and

Re: Validate spark sql

2023-12-24 Thread Mich Talebzadeh
Yes, you can validate the syntax of your PySpark SQL queries without connecting to an actual dataset or running the queries on a cluster. PySpark provides a method for syntax validation without executing the query. Something like below:

Validate spark sql

2023-12-23 Thread ram manickam
Hello, is there a way to validate PySpark SQL for syntax errors only? I cannot connect to the actual data set to perform this validation. Any help would be appreciated. Thanks, Ram

Meet our keynote speakers and register to Community Over Code EU!

2023-12-22 Thread Ryan Skraba
[Note: You're receiving this email because you are subscribed to one or more project dev@ mailing lists at the Apache Software Foundation.] * Merge with the ASF EUniverse! The registration for

Unsubscribe

2023-12-21 Thread yxj1141
Unsubscribe

Re: ShuffleManager and Speculative Execution

2023-12-21 Thread Mich Talebzadeh
Interesting point. As I understand it, the key point is that the ShuffleManager ensures that only one map output file is processed by the reduce task, even when multiple attempts succeed. So it is not a random selection process. At the reduce stage, only one copy of the map output needs to be read by the

ShuffleManager and Speculative Execution

2023-12-21 Thread Enrico Minack
Hi Spark devs, I have a question around ShuffleManager: With speculative execution, one map output file is being created multiple times (by multiple task attempts). If both attempts succeed, which is to be read by the reduce task in the next stage? Is any map output as good as any other?

the life cycle shuffle Dependency

2023-12-17 Thread yang chen
Hi, I'm learning Spark and wondering when shuffle data gets deleted. I found the ContextCleaner class, which cleans shuffle data when the shuffle dependency is GC-ed. Based on the source code, the shuffle dependency is GC-ed only when the active job finishes, but I'm not sure. Could you explain the life cycle of

Guidance for filling out "Affects Version" on Jira

2023-12-17 Thread Nicholas Chammas
The Contributing guide only mentions what to fill in for “Affects Version” for bugs. How about for improvements? This question once caused some problems when I set “Affects Version” to the last released version, and that was interpreted as a request

[ANNOUNCE] Apache Spark 3.3.4 released

2023-12-16 Thread Dongjoon Hyun
We are happy to announce the availability of Apache Spark 3.3.4! Spark 3.3.4 is the last maintenance release based on the branch-3.3 maintenance branch of Spark. It contains many fixes, including in the security and correctness domains. We strongly recommend that all 3.3 users upgrade to this or higher

[VOTE][RESULT] Release Spark 3.3.4 (RC1)

2023-12-15 Thread Dongjoon Hyun
The vote passes with 6 +1s (3 binding +1s). Thanks to all who helped with the release! (* = binding) +1: - Dongjoon Hyun * - Yuming Wang * - Kent Yao - Liang-Chi Hsieh * - Yang Jie - Malcolm Decuire +0: None -1: None

Re: [VOTE] Release Spark 3.3.4 (RC1)

2023-12-15 Thread Dongjoon Hyun
Thank you all. This vote passed. Let me conclude. Dongjoon On 2023/12/11 23:58:28 Malcolm Decuire wrote: > +1 > > On Mon, Dec 11, 2023 at 6:21 PM Yang Jie wrote: > > > +1 > > > > On 2023/12/11 03:03:39 "L. C. Hsieh" wrote: > > > +1 > > > > > > On Sun, Dec 10, 2023 at 6:15 PM Kent Yao wrote:

Re: Spark 3.5.0 and issue SPARK-45593 (SPARK-45201)

2023-12-14 Thread Steven B Jones
A follow-up to my note yesterday. Issue SPARK-45201 has similar externals to SPARK-45593 and is written to cover target release 3.5.0. Remarkably, the issue only affects self-created distributions, and not the one(s) provided by Spark development itself. I'll let you read

Spark 3.5.0 and issue SPARK-45593

2023-12-13 Thread Steven B Jones
Hello, I maintain a version of Apache Spark that runs on z/OS. I'm porting Spark 3.5.0 to our platform, and having the problem described by https://issues.apache.org/jira/projects/SPARK/issues/SPARK-45593 in
