Hi,

We're running several Spark Streaming + Kafka + Cassandra jobs on Mesos. I'm currently working on tuning and validating scalability, and I'm looking for a way to configure the number of coarse-grained task executors for a job.
For example: I'm consuming 2 Kafka topics with 12 partitions, and I have 4 Kafka consumers per topic. I set max-cores to 16 (2x4 cores for the Kafka consumers + 8 for Spark processing), yet I sometimes get 3 executors and sometimes 4. Ideally, I'd like to control that number and always get 4 executors, to spread the network load as widely as possible.

How is the number of executors for a Spark Streaming job currently decided? My hunch is that executors are created from Mesos offers until the requested (cpu, mem) resources have been fulfilled, so the count will dynamically depend on the cluster load at any point in time.

Any thoughts?

-kr, Gerard.
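P.S. In case it helps, here's a minimal sketch of the driver configuration I mean (assuming "max-cores" above is the spark.cores.max property; the master URL, app name, and batch interval below are placeholders, not our actual values):

  import org.apache.spark.SparkConf
  import org.apache.spark.streaming.{Seconds, StreamingContext}

  // Coarse-grained Mesos mode with a hard cap on total cores:
  // 2 topics x 4 consumers = 8 cores for the Kafka receivers, plus 8 for processing.
  val conf = new SparkConf()
    .setMaster("mesos://zk://mesos-master:2181/mesos") // placeholder master URL
    .setAppName("streaming-kafka-cassandra")           // placeholder app name
    .set("spark.mesos.coarse", "true")                 // coarse-grained executors
    .set("spark.cores.max", "16")                      // total cores across all executors

  val ssc = new StreamingContext(conf, Seconds(10))    // placeholder batch interval

Nothing in there pins the executor count itself, which is exactly the gap I'm asking about.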