Re: [k8s] Spark operator (the Java one)

2019-10-19 Thread Erik Erlandson
> This applies regardless of whether the operators are maintained as part of
> Spark core, given the maturity of Kubernetes features around CRD support
> and webhooks. The GCP Spark operator supports a lot of additional
> pod/container configs using a webhook, and this approach seems pretty
> successful so far.
>

Agreed (in fact the existence of at least two independent operator projects
testifies to this). I do believe this has implications for how feature
requests for spark-on-k8s get fielded here upstream. There is a non-zero
amount of cognitive load involved in recommending that a feature request
be deferred to an independent operator project. Going forward, will that
complicate the upstream story for spark-submit, the history server, and the
shuffle service on a kube backend?


Re: [k8s] Spark operator (the Java one)

2019-10-16 Thread Erik Erlandson
Folks have (correctly) pointed out that an operator does not need to be
coupled to the Apache Spark project. However, I believe there are some
strategic community benefits to supporting a Spark operator that should be
weighed against the costs of maintaining one.

*) The Kubernetes ecosystem is evolving toward adopting operators as the de
facto standard for deploying and manipulating software resources on a kube
cluster. Supporting an out-of-the-box operator will increase the
attractiveness of Spark for users and stakeholders in the Kubernetes
ecosystem and maximize future uptake; it will continue to keep the barrier
to entry low for Spark on Kubernetes.

*) An operator provides a unified and idiomatic kube front-end not just for
spark job submissions, but also for standalone spark clusters in the cloud,
the spark history server, and eventually the modernized shuffle service,
once that is completed.

*) It represents an additional channel for exposing kube-specific features
that might otherwise need to be plumbed through spark-submit or the k8s
backend.

Cheers,
Erik



Re: [k8s] Spark operator (the Java one)

2019-10-16 Thread Yinan Li
Hi Erik,

I agree with what you said about the community benefits of supporting
operators for Spark. However, that doesn't necessarily mean the operators
need to, or should, be part of Spark core.

> *) The Kubernetes ecosystem is evolving toward adopting operators as the
de facto standard for deploying and manipulating software resources on a
kube cluster. Supporting an out-of-the-box operator will increase the
attractiveness of Spark for users and stakeholders in the Kubernetes
ecosystem and maximize future uptake; it will continue to keep the barrier
to entry low for Spark on Kubernetes.

Making an operator part of Spark core is not a prerequisite for building
strong community support for it. There are lots of other third-party
projects around Spark that have strong community support behind them.
Kubernetes itself is moving towards a model in which more and more
functionality is built outside the core through CRDs, for good reasons;
there are a lot of benefits to this model. We can still build a strong
community around the existing operators for Spark while using separate
build/test setups, release schedules, and enhancement planning.

> *) It represents an additional channel for exposing kube-specific
features that might otherwise need to be plumbed through spark-submit or
the k8s backend.

This applies regardless of whether the operators are maintained as part of
Spark core, given the maturity of Kubernetes features around CRD support
and webhooks. The GCP Spark operator supports a lot of additional
pod/container configs using a webhook, and this approach seems pretty
successful so far.
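
For illustration, here is a minimal, self-contained Java sketch of what
such a mutating webhook looks like: an HTTP handler that receives an
AdmissionReview from the API server and answers with a base64-encoded
JSONPatch. It uses only the JDK's built-in HttpServer plus Jackson; the
endpoint path, the injected label, and the rest of the wiring are
hypothetical stand-ins, not the GCP operator's actual implementation, and a
real webhook must also serve TLS and be registered via a
MutatingWebhookConfiguration.

import com.fasterxml.jackson.databind.JsonNode;
import com.fasterxml.jackson.databind.ObjectMapper;
import com.sun.net.httpserver.HttpServer;

import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Sketch of a mutating admission webhook: the API server POSTs an
// AdmissionReview for each pod being created; we answer with a
// base64-encoded JSONPatch that the API server applies to the pod.
public class SparkPodWebhookSketch {

    private static final ObjectMapper MAPPER = new ObjectMapper();

    public static void main(String[] args) throws Exception {
        HttpServer server = HttpServer.create(new InetSocketAddress(8080), 0);
        server.createContext("/mutate", exchange -> {
            JsonNode review = MAPPER.readTree(exchange.getRequestBody());
            String uid = review.at("/request/uid").asText();

            // Hypothetical mutation: add one label to the pod's metadata.
            String patch = "[{\"op\":\"add\",\"path\":\"/metadata/labels/managed-by\","
                    + "\"value\":\"spark-operator\"}]";
            String patchB64 = Base64.getEncoder()
                    .encodeToString(patch.getBytes(StandardCharsets.UTF_8));

            String response = "{\"apiVersion\":\"admission.k8s.io/v1\","
                    + "\"kind\":\"AdmissionReview\","
                    + "\"response\":{\"uid\":\"" + uid + "\",\"allowed\":true,"
                    + "\"patchType\":\"JSONPatch\",\"patch\":\"" + patchB64 + "\"}}";

            byte[] body = response.getBytes(StandardCharsets.UTF_8);
            exchange.getResponseHeaders().set("Content-Type", "application/json");
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream os = exchange.getResponseBody()) {
                os.write(body);
            }
        });
        server.start();  // a real webhook must serve HTTPS; plain HTTP here for brevity
    }
}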



Re: [k8s] Spark operator (the Java one)

2019-10-10 Thread Yinan Li
+1. This and the GCP Spark Operator, although very useful for k8s users,
are not something needed by all Spark users, nor even by all Spark on k8s
users.




Re: [k8s] Spark operator (the Java one)

2019-10-10 Thread Stavros Kontopoulos
Hi all,

I also left a comment on the PR with more details. I don't see why the Java
operator should be maintained by the Spark project. This is an interesting
project that could thrive on its own as an external operator project.

Best,
Stavros



Re: [k8s] Spark operator (the Java one)

2019-10-10 Thread Sean Owen
I'd have the same question on the PR - why does this need to be in the
Apache Spark project vs. where it is now? Yes, it's not a Spark package
per se, but it seems like this is a tool for K8S to use Spark rather
than a core Spark tool.

Yes, of course all the packages, licenses, etc. have to be overhauled,
but doesn't that kind of underscore that this is a dump of a third-party
tool that works fine on its own?




[k8s] Spark operator (the Java one)

2019-10-10 Thread Jiri Kremser
Hello,

Spark Operator is a tool that can deploy, scale, and help with monitoring
Spark clusters on Kubernetes. It follows the operator pattern [1]
introduced by CoreOS: it watches for changes in custom resources
representing the desired state of the clusters and takes the steps needed
to achieve that state in Kubernetes using the K8s client. It’s written in
Java, and its dependencies overlap with Spark’s (logging, k8s client,
apache-commons-*, fasterxml-jackson, etc.). The operator also contains
metadata that allows it to be deployed smoothly via operatorhub.io [2]. For
basic info, check the README on the project page, including the gif :)
Another feature unique to this operator is its (optional) ability to be
compiled to a native image with the GraalVM compiler, so that it starts
fast and has a very low memory footprint.
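
To make the operator pattern concrete, here is a minimal, self-contained
Java sketch of the reconcile loop it implies: compare the desired state
declared in a custom resource against the actual state of the cluster and
act on the difference. The SparkCluster type, the in-memory "cluster", and
the hand-fed event queue are hypothetical stand-ins; the real operator does
this against the Kubernetes API using CRD watches.

import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;

// Self-contained sketch of the operator pattern's core loop: watch events
// carry the desired state declared in a custom resource; reconcile()
// compares it to the actual state and acts on the difference.
public class ReconcileLoopSketch {

    // Hypothetical stand-in for the SparkCluster custom resource spec.
    static final class SparkCluster {
        final String name;
        final int workers;
        SparkCluster(String name, int workers) { this.name = name; this.workers = workers; }
    }

    // Stand-in for the live cluster: current worker count per Spark cluster.
    static final Map<String, Integer> actualWorkers = new ConcurrentHashMap<>();

    // Stand-in for the watch: really these events come from the API server.
    static final BlockingQueue<SparkCluster> events = new LinkedBlockingQueue<>();

    // One reconcile step: compare desired vs. actual state, close the gap.
    static void reconcile(SparkCluster desired) {
        int actual = actualWorkers.getOrDefault(desired.name, 0);
        if (actual != desired.workers) {
            // Here the real operator would create or delete worker pods.
            System.out.printf("reconciling %s: %d -> %d workers%n",
                    desired.name, actual, desired.workers);
            actualWorkers.put(desired.name, desired.workers);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        events.add(new SparkCluster("my-spark", 3));  // custom resource created
        events.add(new SparkCluster("my-spark", 5));  // spec edited: scale up
        while (!events.isEmpty()) {
            reconcile(events.take());
        }
    }
}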

We would like to contribute this project to Spark’s code base. It can’t be
distributed as a Spark package, because it’s not a library that can be used
from a Spark environment. So if you are interested, a directory under
resource-managers/kubernetes/spark-operator/ could be a suitable
destination.

The current repository is radanalyticsio/spark-operator [3] on GitHub, and
it also contains a test suite [4] that verifies that the operator works
well on K8s (using minikube) and also on OpenShift. I am not sure how to
transfer those tests, in case you are interested in them as well.

I’ve already opened the PR [5], but it got closed, so I am opening the
discussion here first. The PR contained old package names from our
organisation, radanalytics.io, but we are willing to change them to
anything more aligned with the existing Spark conventions; the same holds
for the license headers in all the source files.

jk


[1]: https://kubernetes.io/docs/concepts/extend-kubernetes/operator/

[2]: https://operatorhub.io/operator/radanalytics-spark

[3]: https://github.com/radanalyticsio/spark-operator

[4]: https://travis-ci.org/radanalyticsio/spark-operator

[5]: https://github.com/apache/spark/pull/26075