Re: Tuning Spark Streaming jobs

2014-12-23 Thread Timothy Chen
Hi Gerard, SPARK-4286 is the ticket I am working on; besides supporting the shuffle service, it also supports the executor scaling callbacks (kill/request total) for coarse-grained mode. I created SPARK-4940 to discuss the distribution problem further, so let's continue our discussion there.

Tuning Spark Streaming jobs

2014-12-22 Thread Gerard Maas
Hi, After facing issues with the performance of some of our Spark Streaming jobs, we invested considerable effort in figuring out the factors that affect the performance characteristics of a Streaming job. We defined an empirical model that helps us reason about Streaming jobs and applied it to tune

Re: Tuning Spark Streaming jobs

2014-12-22 Thread Timothy Chen
Hi Gerard, Really nice guide! I'm particularly interested in the Mesos scheduling side, to distribute cores more evenly across the cluster. Are you using coarse-grained mode or fine-grained mode? I'm making changes to the Spark Mesos scheduler and I think we can propose a best way to

Re: Tuning Spark Streaming jobs

2014-12-22 Thread Gerard Maas
Hi Tim, That would be awesome. We have seen some really disparate Mesos allocations for our Spark Streaming jobs (like (7,4,1) over 3 executors for 4 Kafka consumers instead of the ideal (3,3,3,3)). For network-dependent consumers, achieving an even deployment would provide a reliable and