Re: [DISCUSSION] SPIP: An Official Kubernetes Operator for Apache Spark

2023-11-10 Thread Cheng Pan
> Not really - this is not designed to be a replacement for the current 
> approach.

That's what I assumed too. But my question is: as a user, how do I write a 
spark-submit command that submits a Spark app through this operator?

Thanks,
Cheng Pan


> On Nov 11, 2023, at 03:21, Zhou Jiang  wrote:
> 
> Not really - this is not designed to be a replacement for the current 
> approach. Kubernetes operator fits in the scenario for automation and 
> application lifecycle management at scale. Users can choose between 
> spark-submit and operator approach based on their specific needs and 
> requirements.
> 
> On Thu, Nov 9, 2023 at 9:16 PM Cheng Pan  wrote:
> Thanks for this impressive proposal. I have a basic question: how does 
> spark-submit work with this operator? Or does it require us to use 
> `kubectl apply -f spark-job.yaml` (or a K8s client, programmatically) to 
> submit a Spark app?
> 
> Thanks,
> Cheng Pan
> 
> 
> > On Nov 10, 2023, at 04:05, Zhou Jiang  wrote:
> > 
> > Hi Spark community,
> > I'm reaching out to initiate a conversation about the possibility of 
> > developing a Java-based Kubernetes operator for Apache Spark. Following the 
> > operator pattern 
> > (https://kubernetes.io/docs/concepts/extend-kubernetes/operator/), Spark 
> > users may manage applications and related components seamlessly using 
> > native tools like kubectl. The primary goal is to simplify the Spark user 
> > experience on Kubernetes by minimizing the learning curve and operational 
> > complexity, thereby enabling users to focus on Spark application 
> > development.
> > Although there are several open-source Spark on Kubernetes operators 
> > available, none of them are officially integrated into the Apache Spark 
> > project. As a result, these operators may lack active support and 
> > development for new features. Within this proposal, our aim is to introduce 
> > a Java-based Spark operator as an integral component of the Apache Spark 
> > project. This solution has been employed internally at Apple for multiple 
> > years, operating millions of executors in real production environments. The 
> > use of Java in this solution is intended to accommodate a wider user and 
> > contributor audience, especially those who are familiar with Scala.
> > Ideally, this operator should have its dedicated repository, similar to 
> > Spark Connect Golang or Spark Docker, allowing it to maintain a loose 
> > connection with the Spark release cycle. This model is also followed by the 
> > Apache Flink Kubernetes operator.
> > We believe that this project holds the potential to evolve into a thriving 
> > community project over the long run. A comparison can be drawn with the 
> > Flink Kubernetes Operator: Apple open-sourced its internal Flink Kubernetes 
> > operator, making it a part of the Apache Flink project 
> > (https://github.com/apache/flink-kubernetes-operator). It has since gained 
> > wide industry adoption and contributions from the community. In a mere 
> > year, the Flink operator has garnered more than 600 stars and has attracted 
> > contributions from over 80 contributors. This showcases the level of 
> > community interest and collaborative momentum that can be achieved in 
> > similar scenarios.
> > More details can be found in the SPIP doc, "Spark Kubernetes Operator":
> > https://docs.google.com/document/d/1f5mm9VpSKeWC72Y9IiKN2jbBn32rHxjWKUfLRaGEcLE
> > Thanks,
> > -- 
> > Zhou JIANG
> > 
> 
> 
> 
> -- 
> Zhou JIANG
> 


-
To unsubscribe e-mail: dev-unsubscr...@spark.apache.org



Re: [DISCUSSION] SPIP: An Official Kubernetes Operator for Apache Spark

2023-11-10 Thread kazuyuki tanimura
+1

Kazu

> On Nov 10, 2023, at 10:05 AM, Khalid Mammadov wrote:
> 
> +1
> 
> On Fri, 10 Nov 2023, 15:23 Peter Toth wrote:
>> +1
>> 
>> On Fri, Nov 10, 2023, 14:09 Bjørn Jørgensen wrote:
>>> +1
>>> 
>>> On Fri, Nov 10, 2023 at 08:39, Nan Zhu wrote:
>>>> Just curious: what happened to Google's Spark operator?
>>>> 
>>>> On Thu, Nov 9, 2023 at 19:12 Ilan Filonenko wrote:
>>>>> +1
>>>>> 
>>>>> On Thu, Nov 9, 2023 at 7:43 PM Ryan Blue wrote:
>>>>>> +1
>>>>>> 
>>>>>> On Thu, Nov 9, 2023 at 4:23 PM Hussein Awala wrote:
>>>>>>> +1 for creating an official Kubernetes operator for Apache Spark
>>>>>>> 
>>>>>>> On Fri, Nov 10, 2023 at 12:38 AM huaxin gao wrote:
>>>>>>>> +1
>>>>>>>> 
>>>>>>>> On Thu, Nov 9, 2023 at 3:14 PM DB Tsai wrote:
>>>>>>>>> +1
>>>>>>>>> 
>>>>>>>>> To be completely transparent, I am employed in the same department as 
>>>>>>>>> Zhou at Apple.
>>>>>>>>> 
>>>>>>>>> I support this proposal, provided that we witness community adoption 
>>>>>>>>> following the release of the Flink Kubernetes operator, streamlining 
>>>>>>>>> Flink deployment on Kubernetes.
>>>>>>>>> 
>>>>>>>>> A well-maintained official Spark Kubernetes operator is essential for 
>>>>>>>>> our Spark community as well.
>>>>>>>>> 
>>>>>>>>> DB Tsai | https://www.dbtsai.com/ | PGP 42E5B25A8F7A82C1
>>>>>>>>> 

Re: [DISCUSSION] SPIP: An Official Kubernetes Operator for Apache Spark

2023-11-10 Thread Khalid Mammadov
+1


Re: [DISCUSSION] SPIP: An Official Kubernetes Operator for Apache Spark

2023-11-10 Thread Mich Talebzadeh
Hi,

Looks like a good idea, but before committing myself, I have a number of
design questions, having looked at the SPIP itself:


   1. Would the name "Standard add-on Kubernetes operator to Spark"
   describe it better?
   2. We are still struggling to improve Spark driver start-up time.
   What would be the footprint of this add-on on the driver start-up time?
   3. In a commercial setting, will there be a static image for this
   besides the base image that is maintained in the so-called container
   registry (ECR, GCR, etc.)? It takes time to upload these images. Will this
   be a static image (Dockerfile)? An alternative would be for the user to
   create this Dockerfile through a set of scripts.


These are the things that come to mind.

HTH


Mich Talebzadeh,
Distinguished Technologist, Solutions Architect & Engineer
London
United Kingdom



 https://en.everybodywiki.com/Mich_Talebzadeh



*Disclaimer:* Use it at your own risk. Any and all responsibility for any
loss, damage or destruction of data or any other property which may arise
from relying on this email's technical content is explicitly disclaimed.
The author will in no case be liable for any monetary damages arising from
such loss, damage or destruction.





Re: [DISCUSSION] SPIP: An Official Kubernetes Operator for Apache Spark

2023-11-10 Thread Peter Toth
+1


Re: [DISCUSSION] SPIP: An Official Kubernetes Operator for Apache Spark

2023-11-10 Thread Bjørn Jørgensen
+1


Re: Apache Spark 3.4.2 (?)

2023-11-10 Thread Kent Yao
+1

On Thu, Nov 9, 2023 at 18:18, Maxim Gekk wrote:
>
> +1
>
> On Wed, Nov 8, 2023 at 5:29 AM kazuyuki tanimura wrote:
>>
>> +1
>>
>> Kazu
>>
>> On Nov 7, 2023, at 5:23 PM, L. C. Hsieh  wrote:
>>
>> +1
>>
>> On Tue, Nov 7, 2023 at 4:56 PM Dongjoon Hyun  wrote:
>>
>>
>> Thank you all!
>>
>> Dongjoon
>>
>> On Mon, Nov 6, 2023 at 6:03 PM Holden Karau  wrote:
>>
>>
>> +1
>>
>> On Mon, Nov 6, 2023 at 4:30 PM yangjie01  wrote:
>>
>>
>> +1
>>
>>
>>
>> From: Yuming Wang
>> Date: Tuesday, November 7, 2023, 07:00
>> To: Santosh Pingale
>> Cc: Dongjoon Hyun, dev
>> Subject: Re: Apache Spark 3.4.2 (?)
>>
>>
>>
>> +1
>>
>>
>>
>> On Tue, Nov 7, 2023 at 3:55 AM Santosh Pingale wrote:
>>
>> Makes sense given the nature of those commits.
>>
>>
>>
>> On Mon, Nov 6, 2023, 7:52 PM Dongjoon Hyun  wrote:
>>
>> Hi, All.
>>
>> The Apache Spark 3.4.1 tag was created on Jun 19th, and `branch-3.4` has 103 
>> commits since then, including important security and correctness patches like 
>> SPARK-44251, SPARK-44805, and SPARK-44940.
>>
>>     https://github.com/apache/spark/releases/tag/v3.4.1
>>
>>     $ git log --oneline v3.4.1..HEAD | wc -l
>>     103
>>
>>     SPARK-44251 Potential for incorrect results or NPE when full outer USING join has null key value
>>     SPARK-44805 Data lost after union using spark.sql.parquet.enableNestedColumnVectorizedReader=true
>>     SPARK-44940 Improve performance of JSON parsing when "spark.sql.json.enablePartialResults" is enabled
>>
>> Currently, I'm checking the following open correctness issues. I'd like to 
>> propose releasing Apache Spark 3.4.2 after resolving them, and I volunteer as 
>> the release manager for Apache Spark 3.4.2. If there are no additional 
>> blockers, the first tentative RC1 vote date is November 13th (Monday). If it 
>> takes some time to resolve the open correctness issues, we can start the 
>> vote after the Thanksgiving holiday.
>>
>>     SPARK-44512 dataset.sort.select.write.partitionBy sorts wrong column
>>     SPARK-45282 Join loses records for cached datasets
>>
>> WDYT?
>>
>> Dongjoon.
