For Kubernetes yes -- we use AWS EKS. We manage the Spark cluster ourselves
and it is deployed via Helm chart.

On Wed, Feb 2, 2022 at 8:32 AM Mich Talebzadeh <mich.talebza...@gmail.com>
wrote:

> Is this part of cloud service offerings like Google GKE?
>
>
> On Wed, 2 Feb 2022 at 13:17, Han Lin <h...@twistbioscience.com> wrote:
>
>> Hello,
>>
>> Before I ask my questions, let me say what I am trying to do and briefly
>> describe my setup so far. I am building an API service that serves an ML
>> model built with Spark ML. I have Spark deployed on Kubernetes in
>> standalone mode (the default Spark cluster manager) with two workers,
>> each running as a pod. The API app itself is written in Python with
>> FastAPI and uses PySpark to start a Spark application on the cluster.
>> The Spark driver is created on app startup and lives in the same pod as
>> the app, so the Spark application is expected to run for as long as the
>> API pod is alive, which is indefinitely. I understand that this is not
>> the typical pattern: usually a Spark application is a workload that is
>> expected to complete at some point and return a result.
>> When load on the API increases, I would like the Spark cluster to scale
>> up automatically by increasing the number of worker replicas, and the
>> Spark application to scale up by using those new workers. For horizontal
>> pod scaling in Kubernetes we use KEDA
>> <https://keda.sh/>, and in this case I will probably use Prometheus
>> metrics exposed by the Spark master as the scaling trigger. This leads
>> to my questions.
>>
>> 1. I initially thought the number of pending jobs (jobs waiting for
>> available executors) in the cluster would be a good scaling trigger. But
>> after some digging, I found that "pending" is not one of the job
>> statuses Spark reports; Spark doesn't seem to have an explicit concept
>> of a job queue. Is there any metric that gives some indication of the
>> number of jobs waiting to be run?
>> 2. Are there other Spark metrics that would make sense as a scaling
>> trigger?
>> 3. Does this setup of perpetually running Spark applications even make
>> sense in the first place?
>>
>> Thanks very much in advance for any advice!
>>
>
