Hi All,

I am working toward running some of our Spark Streaming jobs on a cluster.
However, I have not seen documentation on best practices for this.  Here
and there I have found some lore, though:

1. Keeping task latency low is paramount.  The Spark standalone master has
lower task latency than Mesos, but "local" mode is the lowest of all.

2. It is possible to configure range partitioning so that ranges of keys
for incoming events are sent to the same node for processing.  This allows
Spark Streaming to perform parallel computation using multiple nodes.
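To make #2 concrete, here is a minimal sketch of the kind of thing I have in
mind (names like "events" and the socket source are hypothetical; the idea is
to repartition each micro-batch with a RangePartitioner inside transform()):

```scala
import org.apache.spark.{RangePartitioner, SparkConf}
import org.apache.spark.streaming.{Seconds, StreamingContext}

object RangePartitionedStream {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("range-partition-sketch")
    val ssc  = new StreamingContext(conf, Seconds(1))

    // Hypothetical source of "key,value" lines; any DStream of pairs works.
    val events = ssc.socketTextStream("localhost", 9999)
      .map { line =>
        val Array(k, v) = line.split(",", 2)
        (k, v)
      }

    // Repartition each batch by key range.  RangePartitioner samples the
    // RDD it is given to choose its split points.
    val partitioned = events.transform { rdd =>
      if (rdd.isEmpty()) rdd
      else rdd.partitionBy(new RangePartitioner(4, rdd))
    }

    partitioned.foreachRDD(rdd => rdd.foreach(println))

    ssc.start()
    ssc.awaitTermination()
  }
}
```

One caveat I can see with this sketch: because the partitioner is rebuilt by
sampling every batch, the range boundaries (and hence the key-to-node
mapping) can drift between batches; a custom Partitioner with fixed
boundaries might be needed if a stable mapping is the goal.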

Here's what I need:  What is the best way to configure a Spark Streaming
job to use range partitioning, as in #2 above?  I need the details:  what
has to be changed in the job's source code, whether to use the "spark"
(standalone) master, etc.

Thanks in advance,
Craig Vanderborgh
