Re: Apache Spark 3.4.2 (?)

2023-11-12 Thread Dongjoon Hyun
Thank you all.

Here is an update.

Thanks to your help, all open blocker issues (including correctness issues) are 
resolved.

However, I'm still waiting for this additional alternative approach PR for the 
previously resolved JIRAs.

https://github.com/apache/spark/pull/43760 (for Apache Spark 4.0.0, 3.5.2, 
3.4.2).

Although the above PR is still under review and needs revision, I hope we
can start the 3.4.2 RC1 vote early this week.

Bests,
Dongjoon.

On 2023/11/10 08:41:57 Kent Yao wrote:
> +1
> 
> Maxim Gekk  wrote on Thu, Nov 9, 2023 at 18:18:
> >
> > +1
> >
> > On Wed, Nov 8, 2023 at 5:29 AM kazuyuki tanimura 
> >  wrote:
> >>
> >> +1
> >>
> >> Kazu
> >>
> >> On Nov 7, 2023, at 5:23 PM, L. C. Hsieh  wrote:
> >>
> >> +1
> >>
> >> On Tue, Nov 7, 2023 at 4:56 PM Dongjoon Hyun  
> >> wrote:
> >>
> >>
> >> Thank you all!
> >>
> >> Dongjoon
> >>
> >> On Mon, Nov 6, 2023 at 6:03 PM Holden Karau  wrote:
> >>
> >>
> >> +1
> >>
> >> On Mon, Nov 6, 2023 at 4:30 PM yangjie01  
> >> wrote:
> >>
> >>
> >> +1
> >>
> >>
> >>
> >> From: Yuming Wang 
> >> Date: Tuesday, November 7, 2023, 07:00
> >> To: Santosh Pingale 
> >> Cc: Dongjoon Hyun , dev 
> >> Subject: Re: Apache Spark 3.4.2 (?)
> >>
> >>
> >>
> >> +1
> >>
> >>
> >>
> >> On Tue, Nov 7, 2023 at 3:55 AM Santosh Pingale 
> >>  wrote:
> >>
> >> Makes sense given the nature of those commits.
> >>
> >>
> >>
> >> On Mon, Nov 6, 2023, 7:52 PM Dongjoon Hyun  wrote:
> >>
> >> Hi, All.
> >>
> >> The Apache Spark 3.4.1 tag was created on Jun 19th, and `branch-3.4` has 103 
> >> commits, including important security and correctness patches like 
> >> SPARK-44251, SPARK-44805, and SPARK-44940.
> >>
> >>https://github.com/apache/spark/releases/tag/v3.4.1
> >>
> >>$ git log --oneline v3.4.1..HEAD | wc -l
> >>103
> >>
> >>SPARK-44251 Potential for incorrect results or NPE when full outer 
> >> USING join has null key value
> >>SPARK-44805 Data lost after union using 
> >> spark.sql.parquet.enableNestedColumnVectorizedReader=true
> >>SPARK-44940 Improve performance of JSON parsing when 
> >> "spark.sql.json.enablePartialResults" is enabled
> >>
> >> Currently, I'm checking the following open correctness issues. I'd like to 
> >> propose releasing Apache Spark 3.4.2 after resolving them, and I volunteer 
> >> as the release manager for Apache Spark 3.4.2. If there are no additional 
> >> blockers, the first tentative RC1 vote date is November 13th (Monday). If 
> >> it takes some time to resolve the open correctness issues, we can start 
> >> the vote after the Thanksgiving holiday.
> >>
> >>SPARK-44512 dataset.sort.select.write.partitionBy sorts wrong column
> >>SPARK-45282 Join loses records for cached datasets
> >>
> >> WDYT?
> >>
> >> Dongjoon.


De-serialization by Java encoder: does Spark 3.4.x no longer support fields having an accessor but no setter? (Encoder fails on many "NoSuchElementException: None.get" since 3.4.x [SPARK-45311])

2023-11-12 Thread Marc Le Bihan

Hello,

I am writing to check whether what I am encountering is a bug or the 
behavior expected from Spark 3.4.x onward.


I've noticed that, since 3.4.x, analysis quickly fails with a 
"NoSuchElementException: None.get" in JavaBeanEncoder deserialization if a 
candidate field has an accessor method, getSomething() or isSomething(), 
but no associated setter.
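
For illustration, here is a minimal sketch of the shape that triggers it. 
The Event class and every name in it are hypothetical, and it assumes a 
bean-style class compiled from Scala is introspected the same way as a 
plain Java bean:

    import org.apache.spark.sql.{Encoders, SparkSession}

    // Hypothetical bean: getLabel has no matching setLabel, which is the
    // shape reported to trip the JavaBeanEncoder during analysis.
    class Event {
      private var id: Int = 0
      def getId: Int = id
      def setId(v: Int): Unit = { id = v }

      // read-only accessor: no setter is defined for it
      def getLabel: String = s"event-$id"
    }

    object Repro {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder().master("local[*]").getOrCreate()
        // expected on 3.4.x, per this report: NoSuchElementException: None.get
        val ds = spark.range(3).selectExpr("cast(id as int) as id")
          .as(Encoders.bean(classOf[Event]))
        ds.show()
      }
    }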


The "/NoSuchElementException: None.get/" comes from the statement *
f.writeMethod.get -> setter*
that finds no setter for the getter, and fails on /None/.

case JavaBeanEncoder(tag, fields) =>
  val setters = fields.map { f =>
    val newTypePath = walkedTypePath.recordField(
      f.enc.clsTag.runtimeClass.getName,
      f.name)
    val setter = expressionWithNullSafety(
      deserializerFor(
        f.enc,
        addToPath(path, f.name, f.enc.dataType, newTypePath),
        newTypePath),
      nullable = f.nullable,
      newTypePath)
    f.writeMethod.get -> setter
  }


Spark versions 3.3.x and below allowed an accessor to have no setter 
linked to it. I've found no indication in the migration guide that a new 
rule now enforces writing a setter for each existing accessor.


A workaround is to rename these accessors to the new style now in favor 
with Java records, where getSomething() or isSomething() accessors are 
renamed something().

Spark then doesn't detect these accessors and won't stumble over them.
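
Using the hypothetical Event class from the sketch above, the rename 
amounts to the following (illustrative only; it relies on the bean 
Introspector no longer seeing a read-only property, which matches my 
observation but isn't documented):

    // Record-style accessor: the bean Introspector no longer reports a
    // read-only "label" property, so the encoder simply ignores it.
    def label: String = s"event-$id"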


If this is the expected new behavior, would it be possible to handle the 
detection of a missing setter more gracefully?

A "NoSuchElementException: None.get" message stopping the analysis gives 
no clue: what failed? where? An error message like "no setter associated 
with the accessor {} for field {} in class {}" would be useful for the 
developer, perhaps also mentioning the workaround I suggest above.


Regards,

Marc Le Bihan


Encoder fails on many "NoSuchElementException: None.get" since 3.4.x, 
searching for an encoder for a generic type, and since 3.5.x isn't "an 
expression encoder":
https://issues.apache.org/jira/browse/SPARK-45311


Re: Dynamic resource allocation for structured streaming [SPARK-24815]

2023-11-12 Thread Pavan Kotikalapudi
Here is an initial implementation draft PR:
https://github.com/apache/spark/pull/42352
and the design doc:
https://docs.google.com/document/d/1_YmfCsQQb9XhRdKh0ijbc-j8JKGtGBxYsk_30NVSTWo/edit?usp=sharing
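
As a rough sketch of the kind of signal this proposal acts on - how long 
each micro-batch runs relative to its trigger interval - the listener 
below is illustrative only; the class, the thresholds, and the interval 
are made up for this email and are not the PR's actual implementation:

    import org.apache.spark.sql.streaming.StreamingQueryListener
    import org.apache.spark.sql.streaming.StreamingQueryListener._

    // Illustrative only: derive a utilization signal from micro-batch
    // execution time vs. the trigger interval.
    class ScalingSignalListener(triggerIntervalMs: Long)
        extends StreamingQueryListener {
      override def onQueryStarted(event: QueryStartedEvent): Unit = ()
      override def onQueryTerminated(event: QueryTerminatedEvent): Unit = ()

      override def onQueryProgress(event: QueryProgressEvent): Unit = {
        val busyMs = Option(event.progress.durationMs.get("triggerExecution"))
          .map(_.longValue).getOrElse(0L)
        val utilization = busyMs.toDouble / triggerIntervalMs
        if (utilization > 0.9)
          println(s"batch ${event.progress.batchId}: busy, a scaling " +
            "policy could request more executors")
        else if (utilization < 0.5)
          println(s"batch ${event.progress.batchId}: idle, a scaling " +
            "policy could release executors")
      }
    }

    // spark.streams.addListener(new ScalingSignalListener(triggerIntervalMs = 10000L))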


Re: Dynamic resource allocation for structured streaming [SPARK-24815]

2023-11-12 Thread Pavan Kotikalapudi
Hi Dev community,

Just bumping to see if there are more reviews to evaluate this idea of
adding auto-scaling to structured streaming.

Thanks again,

Pavan

On Wed, Aug 23, 2023 at 2:49 PM Pavan Kotikalapudi 
wrote:

> Thanks for the review Mich.
>
> I have updated Q4 with information as concise as possible and left the
> detailed explanation to the Appendix.
>
> Here is the updated answer to Q4
> 
>
> Thank you,
>
> Pavan
>
> On Wed, Aug 23, 2023 at 2:46 AM Mich Talebzadeh 
> wrote:
>
>> Hi Pavan,
>>
>> I started reading your SPIP but have difficulty understanding it in
>> detail.
>>
>> Specifically under Q4, "What is new in your approach and why do you
>> think it will be successful?", I believe it would be better to remove the
>> plots and focus on "what this proposed solution is going to add to the
>> current play". At this stage a concise briefing would be appreciated and
>> the specific plots should be left to the Appendix.
>>
>> HTH
>>
>>
>> Mich Talebzadeh,
>> Distinguished Technologist, Solutions Architect & Engineer
>> London
>> United Kingdom
>>
>>
>>view my Linkedin profile
>> 
>>
>>
>>  https://en.everybodywiki.com/Mich_Talebzadeh
>> 
>>
>>
>>
>> *Disclaimer:* Use it at your own risk. Any and all responsibility for
>> any loss, damage or destruction of data or any other property which may
>> arise from relying on this email's technical content is explicitly
>> disclaimed. The author will in no case be liable for any monetary damages
>> arising from such loss, damage or destruction.
>>
>>
>>
>>
>> On Sun, 20 Aug 2023 at 07:40, Pavan Kotikalapudi <
>> pkotikalap...@twilio.com> wrote:
>>
> >>> IMO, ML might be good for the cluster scheduler, but for the core DRA
> >>> algorithm of SSS I believe we should start with some primitives of
> >>> Structured Streaming. I would love to get some reviews on the doc and
> >>> opinions on the feasibility of the solution.
> >>>
> >>> We have seen quite some savings using this solution in our team, and
> >>> would like to hear whether the dev community is looking for/interested
> >>> in DRA for structured streaming.
>>>
>>> On Mon, Aug 14, 2023 at 9:12 AM Mich Talebzadeh <
>>> mich.talebza...@gmail.com> wrote:
>>>
 Thank you for your comments.

 My vision of integrating machine learning (ML) into Spark Structured
 Streaming (SSS) for capacity planning and performance optimization seems
 promising. By leveraging ML techniques, I believe that we can
 potentially create predictive models that enhance the efficiency and
 resource allocation of the data processing pipelines. Here are some
 potential benefits and considerations for adding ML to SSS for capacity
 planning. However, I stand to be corrected.

   1. *Predictive Capacity Planning:* ML models can analyze historical
      data (that we discussed already), workloads, and trends to predict
      future resource needs accurately. This enables proactive scaling
      and allocation of resources, ensuring optimal performance during
      high-demand periods, such as times of high trades.

   2. *Real-time Decision Making:* ML can be used to make real-time
      decisions on resource allocation (software and cluster) based on
      current data and conditions, allowing for dynamic adjustments to
      meet the processing demands.

   3. *Complex Data Analysis:* In a heterogeneous setup involving
      multiple databases, ML can analyze various factors like data read
      and write times from different databases, data volumes, and data
      distribution patterns to optimize the overall data processing flow.

   4. *Anomaly Detection:* ML models can identify unusual patterns or
      performance deviations, alerting us to potential issues before they
      impact the system.

   5. *Integration with Monitoring:* ML models can work alongside
      monitoring tools, gathering real-time data on various performance
      metrics, and using this data for making intelligent decisions on
      capacity and resource allocation.

 However, there are some important considerations to keep in mind:

   1. *Model Training:* ML models require training and validation using
      relevant data. Our DS colleagues need to define appropriate
      features, select the right ML algorithms, and fine-tune the model
      parameters to achi

Re: [DISCUSSION] SPIP: An Official Kubernetes Operator for Apache Spark

2023-11-12 Thread Holden Karau
To be clear: I am generally supportive of the idea (+1), but I have some
follow-up questions:

- Have we taken the time to learn from the other operators? Do we have a
compatible CRD/API or not (and if so, why)?
- The API seems to assume that everything is packaged in the container in
advance, but I imagine that might not be the case for many folks who have
Java or Python packages published to cloud storage that they want to use.
- What's our plan for testing the potential version explosion? Not tying
ourselves to operator version -> Spark version makes a lot of sense, but
how do we reasonably assure ourselves that the cross product of operator
version, Kube version, and Spark version all function? Do we have CI
resources for this?
- Is there a current (non-open-source) operator that folks from Apple are
using and planning to open source, or is this a fresh "from the ground up"
operator proposal?
- One of the key reasons for this is listed as "An out-of-the-box
automation solution that scales effectively", but I don't see any
discussion of the target scale or plans to achieve it.



On Thu, Nov 9, 2023 at 9:02 PM Zhou Jiang  wrote:

> Hi Spark community,
>
> I'm reaching out to initiate a conversation about the possibility of
> developing a Java-based Kubernetes operator for Apache Spark. Following the
> operator pattern (
> https://kubernetes.io/docs/concepts/extend-kubernetes/operator/), Spark
> users may manage applications and related components seamlessly using
> native tools like kubectl. The primary goal is to simplify the Spark user
> experience on Kubernetes, minimizing the learning curve and operational
> complexities and therefore enable users to focus on the Spark application
> development.
>
> Although there are several open-source Spark on Kubernetes operators
> available, none of them are officially integrated into the Apache Spark
> project. As a result, these operators may lack active support and
> development for new features. Within this proposal, our aim is to introduce
> a Java-based Spark operator as an integral component of the Apache Spark
> project. This solution has been employed internally at Apple for multiple
> years, operating millions of executors in real production environments. The
> use of Java in this solution is intended to accommodate a wider user and
> contributor audience, especially those who are not familiar with Scala.
>
> Ideally, this operator should have its dedicated repository, similar to
> Spark Connect Golang or Spark Docker, allowing it to maintain a loose
> connection with the Spark release cycle. This model is also followed by the
> Apache Flink Kubernetes operator.
>
> We believe that this project holds the potential to evolve into a thriving
> community project over the long run. A comparison can be drawn with the
> Flink Kubernetes Operator: Apple has open-sourced internal Flink Kubernetes
> operator, making it a part of the Apache Flink project (
> https://github.com/apache/flink-kubernetes-operator). This move has
> gained wide industry adoption and contributions from the community. In a
> mere year, the Flink operator has garnered more than 600 stars and has
> attracted contributions from over 80 contributors. This showcases the level
> of community interest and collaborative momentum that can be achieved in
> similar scenarios.
>
> More details can be found in the SPIP doc: Spark Kubernetes Operator
> https://docs.google.com/document/d/1f5mm9VpSKeWC72Y9IiKN2jbBn32rHxjWKUfLRaGEcLE
>
> Thanks,
> --
> *Zhou JIANG*
>
>

-- 
Twitter: https://twitter.com/holdenkarau
Books (Learning Spark, High Performance Spark, etc.):
https://amzn.to/2MaRAG9  
YouTube Live Streams: https://www.youtube.com/user/holdenkarau


Re: [DISCUSSION] SPIP: An Official Kubernetes Operator for Apache Spark

2023-11-12 Thread Zhou Jiang
resending cc dev for record - sorry forgot to reply all earlier :)

For 1 - I'm leaning more towards 'official', as this aims to provide Spark
users a community-recommended way to automate and manage Spark deployments
on k8s. It does not mean the current / other options would become
non-standard, from my point of view.

For 2/3 - as the operator starts driver pods in the same way as
spark-submit, I would not expect start-up time to be significantly reduced
by using the operator. However, there are indeed some optimizations we can
do in practice. For example, with the operator we can enable users to
separate the application packaging from Spark: use an init container to
load the Spark binary, and apply the application jar / packages on top of
that in a different container. The benefit is that the application image
or package would be relatively lean and therefore take less time to upload
to a registry or to download onto nodes. Spark images could be relatively
static (e.g. use the official docker images) and hence can be cached on
nodes. There are more technical details that can be discussed in the
upcoming design doc if we agree to proceed with the operator proposal.

On Fri, Nov 10, 2023 at 8:11 AM Mich Talebzadeh 
wrote:

> Hi,
>
> Looks like a good idea, but before committing myself I have a number of
> design questions, having looked at the SPIP itself:
>
>
>    1. Would the name "Standard add-on Kubernetes operator for Spark"
>    describe it better?
>    2. We are still struggling with improving Spark driver start-up time.
>    What would be the footprint of this add-on on the driver start-up time?
>    3. In a commercial world, will there be a static image for this
>    besides the base image maintained in the so-called container registry
>    (ECR, GCR, etc.)? It takes time to upload these images. Will this be a
>    static image (Dockerfile)? An alternative would be for this Dockerfile
>    to be created by the user through a set of scripts?
>
> These are the things that come into my mind.
>
> HTH
>
>
> Mich Talebzadeh,
> Distinguished Technologist, Solutions Architect & Engineer
> London
> United Kingdom
>
>
>view my Linkedin profile
> 
>
>
>  https://en.everybodywiki.com/Mich_Talebzadeh
>
>
>
> *Disclaimer:* Use it at your own risk. Any and all responsibility for any
> loss, damage or destruction of data or any other property which may arise
> from relying on this email's technical content is explicitly disclaimed.
> The author will in no case be liable for any monetary damages arising from
> such loss, damage or destruction.
>
>
>
>
> On Fri, 10 Nov 2023 at 14:19, Bjørn Jørgensen 
> wrote:
>
>> +1
>>
>> On Fri, Nov 10, 2023 at 08:39, Nan Zhu  wrote:
>>
>>> just curious what happened on google’s spark operator?
>>>
>>> On Thu, Nov 9, 2023 at 19:12 Ilan Filonenko  wrote:
>>>
 +1

 On Thu, Nov 9, 2023 at 7:43 PM Ryan Blue  wrote:

> +1
>
> On Thu, Nov 9, 2023 at 4:23 PM Hussein Awala  wrote:
>
>> +1 for creating an official Kubernetes operator for Apache Spark
>>
>> On Fri, Nov 10, 2023 at 12:38 AM huaxin gao 
>> wrote:
>>
>>> +1
>>>
>>
>>> On Thu, Nov 9, 2023 at 3:14 PM DB Tsai  wrote:
>>>
 +1

 To be completely transparent, I am employed in the same department
 as Zhou at Apple.

 I support this proposal, provided that we witness community
 adoption following the release of the Flink Kubernetes operator,
 streamlining Flink deployment on Kubernetes.

 A well-maintained official Spark Kubernetes operator is essential
 for our Spark community as well.

 DB Tsai | https://www.dbtsai.com/ | PGP 42E5B25A8F7A82C1

 On Nov 9, 2023, at 12:05 PM, Zhou Jiang 
 wrote:

Re: [DISCUSSION] SPIP: An Official Kubernetes Operator for Apache Spark

2023-11-12 Thread Zhou Jiang
I'd say that's actually the other way round. A user may either:
1. Use spark-submit - this works with or without the operator. Or,
2. Deploy the operator and create the Spark applications with kubectl /
clients, so that the operator does spark-submit for you.
We may also continue this discussion in the proposal doc.

On Fri, Nov 10, 2023 at 8:57 PM Cheng Pan  wrote:

> > Not really - this is not designed to be a replacement for the current
> approach.
>
> That's what I assumed too. But my question is, as a user, how to write a
> spark-submit command to submit a Spark app to leverage this operator?
>
> Thanks,
> Cheng Pan
>
>
> > On Nov 11, 2023, at 03:21, Zhou Jiang  wrote:
> >
> > Not really - this is not designed to be a replacement for the current
> approach. Kubernetes operator fits in the scenario for automation and
> application lifecycle management at scale. Users can choose between
> spark-submit and operator approach based on their specific needs and
> requirements.
> >
> > On Thu, Nov 9, 2023 at 9:16 PM Cheng Pan  wrote:
> > Thanks for this impressive proposal. I have a basic question: how does
> > spark-submit work with this operator? Or does it enforce that we must use
> > `kubectl apply -f spark-job.yaml` (or a K8s client programmatically) to
> > submit the Spark app?
> >
> > Thanks,
> > Cheng Pan
> >
> >
> > > On Nov 10, 2023, at 04:05, Zhou Jiang  wrote:
> > >
> > >
> >
> >
> >
> > --
> > Zhou JIANG
> >
>
>

-- 
*Zhou JIANG*