Structured Streaming on Kubernetes Performance

2018-12-14 Thread Kalvin Chau
Hi,

We've recently started testing Spark on Kubernetes, and have found some odd
performance decreases. In particular, it's almost an order of magnitude
slower pulling data from Kafka than it is in our Mesos cluster.

We've tested a few set-ups:

Baseline: Spark 2.3.0 on Mesos host networking (~5 million records processed
per batch, ~12-15s a batch)

Spark 2.3.0 on k8s 1.10 EKS with the AWS CNI plugin
Spark 2.3.0 on k8s 1.10 EKS with the Calico CNI plugin
Spark 2.4.0 on k8s 1.10.x with the Cilium CNI plugin (~5 million, 80-100s a
batch)

All of them show the same performance decrease, though we don't have good
numbers for the EKS clusters (those tests were run by a different group).
On the 2.4.0/Cilium cluster I run, I've seen a roughly 8x performance
decrease compared to the equivalent job in our Mesos cluster.

Our production Mesos job runs with 20 executors with 2 cores each, 6g mem.
When we set up the equivalent job on Kubernetes, I configured it the same
way and removed CPU limits so that it could burst if it needed more CPU.
But none of that got it to the performance level of our Mesos set-up.
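
For reference, the executor sizing described above maps onto standard Spark
properties roughly as follows. This is a minimal sketch, not the job's actual
submit command: the helper names `executor_conf` and `to_submit_args` are
hypothetical, and setting `spark.kubernetes.executor.request.cores` (available
from Spark 2.4) while leaving `spark.kubernetes.executor.limit.cores` unset is
one assumed way of "removing CPU limits".

```python
def executor_conf(instances, cores, memory, cpu_limit=None):
    """Build Spark properties for a fixed-size executor pool on Kubernetes.

    Hypothetical helper for illustration; the property names are standard
    Spark settings, the values come from the message above.
    """
    conf = {
        "spark.executor.instances": str(instances),
        "spark.executor.cores": str(cores),  # task slots per executor
        "spark.executor.memory": memory,
        # CPU request for each executor pod (Spark 2.4+).
        "spark.kubernetes.executor.request.cores": str(cores),
    }
    if cpu_limit is not None:
        # Hard CPU cap per executor pod; leave it out to let the pod burst.
        conf["spark.kubernetes.executor.limit.cores"] = str(cpu_limit)
    return conf


def to_submit_args(conf):
    """Render properties as spark-submit --conf arguments."""
    args = []
    for key in sorted(conf):
        args += ["--conf", f"{key}={conf[key]}"]
    return args


conf = executor_conf(instances=20, cores=2, memory="6g")
print(" ".join(to_submit_args(conf)))
```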

The Mesos cluster does not use any CNI, so it's all host-based networking,
but I wouldn't expect an overlay network to slow down our jobs by an order
of magnitude.

I was able to compare by running a simple consumer that just reads from
Kafka and measures how fast it can go, and found that running in the overlay
network was about 8% slower than running in the host network stack. So the
numbers don't add up to an order-of-magnitude slowdown.
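
The gap between these two measurements can be made concrete with the figures
quoted in this message. The sketch below only does arithmetic on those
numbers; it assumes, purely for illustration, that the ~8% overlay overhead
measured with the simple consumer would apply uniformly to batch time.

```python
RECORDS_PER_BATCH = 5_000_000  # ~5 million records per batch
MESOS_BATCH_S = (12, 15)       # Mesos, host networking
K8S_BATCH_S = (80, 100)        # Spark 2.4.0 on k8s with Cilium
OVERLAY_OVERHEAD = 0.08        # simple consumer: ~8% slower in the overlay


def throughput(records, seconds):
    """Records processed per second for one batch."""
    return records / seconds


def slowdown_range(baseline, observed):
    """Best- and worst-case ratio of observed to baseline batch time."""
    return observed[0] / baseline[1], observed[1] / baseline[0]


mesos_rps = [throughput(RECORDS_PER_BATCH, s) for s in MESOS_BATCH_S]
k8s_rps = [throughput(RECORDS_PER_BATCH, s) for s in K8S_BATCH_S]
low, high = slowdown_range(MESOS_BATCH_S, K8S_BATCH_S)
# Batch time expected if the overlay overhead were the only difference.
expected = [s * (1 + OVERLAY_OVERHEAD) for s in MESOS_BATCH_S]

print(f"Mesos: {mesos_rps[1]:,.0f} to {mesos_rps[0]:,.0f} records/s")
print(f"k8s:   {k8s_rps[1]:,.0f} to {k8s_rps[0]:,.0f} records/s")
print(f"Observed slowdown: {low:.1f}x to {high:.1f}x")
print(f"Expected from 8% overhead: {expected[0]:.1f}s to {expected[1]:.1f}s")
```

With these inputs the observed slowdown works out to roughly 5x to 8x, while
an 8% overhead alone would predict batches of about 13 to 16 seconds.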

Does anyone else have experience running production streaming workloads on
Kubernetes, and have you run into performance issues? Are there any settings
I could be missing?

I know most kubernetes clusters are set-up wildly differently but any tips
or insights on where to look would be greatly appreciated.

Thanks,
Kalvin


Re: Structured Streaming on Kubernetes

2018-08-21 Thread puneetloya
Thanks for putting together such comprehensive observations about Spark on
Kubernetes. Spark's Mesos deployment has a property called
spark.mesos.extra.cores. Its documentation says:

*Set the extra number of cores for an executor to advertise. This does not
result in more cores allocated. It instead means that an executor will
"pretend" it has more cores, so that the driver will send it more tasks. Use
this to increase parallelism. This setting is only used for Mesos
coarse-grained mode.*

Can something like this be used to increase parallelism? Are there other,
better ways to increase parallelism on Kubernetes?
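
One Kubernetes-side setting that bears on this question is
`spark.kubernetes.executor.request.cores` (Spark 2.4+). Per the Spark
configuration documentation it sets only the executor pod's CPU request,
while `spark.executor.cores` continues to determine how many tasks an
executor runs concurrently. The sketch below shows how the two could be
combined to advertise more task slots than the CPU requested. The helper name
and the example values are hypothetical, and whether oversubscribing CPU this
way helps a given job is an assumption to verify by measurement.

```python
def oversubscribed_executor_conf(task_slots, cpu_request):
    """Advertise `task_slots` concurrent tasks per executor while asking
    Kubernetes for only `cpu_request` CPUs per executor pod.

    Hypothetical helper; values are illustrative, not a recommendation.
    """
    if task_slots < 1:
        raise ValueError("task_slots must be at least 1")
    return {
        # Number of tasks the driver schedules concurrently per executor.
        "spark.executor.cores": str(task_slots),
        # CPU request for the executor pod; does not change task parallelism.
        "spark.kubernetes.executor.request.cores": str(cpu_request),
    }


conf = oversubscribed_executor_conf(task_slots=4, cpu_request=2)
for key, value in conf.items():
    print(f"{key}={value}")
```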


Re: Structured Streaming on Kubernetes

2018-04-16 Thread Krishna Kalyan
Thank you so much TD, Matt, Anirudh and Oz,
Really appreciate this.

On Fri, Apr 13, 2018 at 9:54 PM, Oz Ben-Ami <ozzi...@gmail.com> wrote:

> I can confirm that Structured Streaming works on Kubernetes, though we're
> not quite on production with that yet. Issues we're looking at are:
> - Submission through spark-submit works, but is a bit clunky with a
> kubernetes-centered workflow. Spark Operator
> <https://github.com/GoogleCloudPlatform/spark-on-k8s-operator> is
> promising, but still in alpha (eg, we ran into this
> <https://github.com/kubernetes/kubernetes/issues/56018>). Even better
> would be something that runs the driver as a Deployment / StatefulSet, so
> that long-running streaming jobs can be restarted automatically
> - Dynamic allocation: works with the spark-on-k8s fork, but not with plain
> Spark 2.3, due to reliance on shuffle service which hasn't been merged yet.
> Ideal implementation would be able to connect to a PersistentVolume
> independently of a node, but that's a bit more complicated
> - Checkpointing: We checkpoint to a separate HDFS (Dataproc) cluster,
> which works well for us both on the old Spark Streaming and Structured
> Streaming. We've successfully experimented with HDFS on Kubernetes
> <https://github.com/apache-spark-on-k8s/kubernetes-HDFS/tree/master>, but
> again not in production
> - UI: Unfortunately Structured Streaming does not yet have a comprehensive
> UI like the old Spark Streaming, but it does show the basic information
> (jobs, stages, queries, executors), and other information is generally
> available in the logs and metrics
> - Monitoring / Logging: this is a strength of Kubernetes, in that it's all
> centralized by the cluster. We use Splunk, but it would also be possible to
> hook up <https://github.com/dhatim/dropwizard-prometheus> Spark's Dropwizard
> Metrics library to Prometheus, and read logs with fluentd or Stackdriver.
> - Side note: Kafka support in Spark and Structured Streaming is very good,
> but as of Spark 2.3 there are still a couple of missing features, notably
> transparent avro support (UDFs are needed) and taking advantage of
> transactional processing (introduced to Kafka last year) for better
> exactly-once guarantees


Re: Structured Streaming on Kubernetes

2018-04-13 Thread Anirudh Ramanathan
+ozzieba who was experimenting with streaming workloads recently. +1 to
what Matt said. Checkpointing and driver recovery is future work.
Structured streaming is important, and it would be good to get some
production experiences here and try and target improving the feature's
support on K8s for 2.4/3.0.


-- 
Anirudh Ramanathan


Re: Structured Streaming on Kubernetes

2018-04-13 Thread Matt Cheah
We don’t provide any Kubernetes-specific mechanisms for streaming, such as 
checkpointing to persistent volumes. But as long as streaming doesn’t require 
persisting to the executor’s local disk, streaming ought to work out of the 
box. E.g. you can checkpoint to HDFS, but not to the pod’s local directories.
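
A minimal sketch of the distinction drawn above, assuming the checkpoint
directory is passed through Structured Streaming's `checkpointLocation`
writer option. The helper name, the set of schemes treated as pod-local, and
the example HDFS URI are hypothetical; the check is only a guard against
accidentally pointing a job at pod-local storage.

```python
from urllib.parse import urlparse

# Schemes assumed to resolve to the pod's own filesystem (illustrative).
POD_LOCAL_SCHEMES = {"", "file"}


def checkpoint_options(location):
    """Return writer options for a durable checkpoint location.

    Raises ValueError when the location has no scheme or uses file://,
    since such a path would live on the pod's local disk.
    """
    scheme = urlparse(location).scheme
    if scheme in POD_LOCAL_SCHEMES:
        raise ValueError(f"checkpoint location is pod-local: {location!r}")
    return {"checkpointLocation": location}


# Hypothetical HDFS URI; substitute the cluster's real namenode and path.
options = checkpoint_options("hdfs://namenode:8020/checkpoints/my-query")
print(options)
```

The resulting dictionary could then be handed to a PySpark stream writer, for
example via `writeStream.options(**options)`.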

However, I’m unaware of any specific use of streaming with the Spark on 
Kubernetes integration right now. Would be curious to get feedback on the 
failover behavior right now.

-Matt Cheah


Re: Structured Streaming on Kubernetes

2018-04-13 Thread Tathagata Das
Structured streaming is stable in production! At Databricks, we and our
customers collectively process hundreds of billions of records per day
using SS. However, we are not using Kubernetes :)

Though I don't think it will matter too much as long as kubes are correctly
provisioned+configured and you are checkpointing to HDFS (for
fault-tolerance guarantees).

TD


Structured Streaming on Kubernetes

2018-04-13 Thread Krishna Kalyan
Hello All,
We were evaluating Spark Structured Streaming on Kubernetes (running on
GCP). It would be awesome if the Spark community could share their
experience around this. I would like to know more about your production
experience and the monitoring tools you are using.

Since spark on kubernetes is a relatively new addition to spark, I was
wondering if structured streaming is stable in production. We were also
evaluating Apache Beam with Flink.

Regards,
Krishna