I don't have specific solutions for you, but the general things to try are:

- Decrease task size by broadcasting any non-trivial objects (first sketch below).
- Increase the duration of each task by making the jobs less fine-grained (second sketch below).
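
For the first point, here's a rough sketch of what I mean. The
`featureWeights` array is just a placeholder for whatever large object
your closures are currently capturing:

    import org.apache.spark.SparkContext
    import org.apache.spark.mllib.linalg.Vectors
    import org.apache.spark.mllib.regression.LabeledPoint
    import org.apache.spark.rdd.RDD

    // Hypothetical helper: broadcasting ships the big object to each
    // executor once, instead of serializing it into every task closure.
    def scaleFeatures(sc: SparkContext, data: RDD[LabeledPoint],
                      featureWeights: Array[Double]): RDD[LabeledPoint] = {
      val bcWeights = sc.broadcast(featureWeights)
      data.map { p =>
        val w = bcWeights.value  // one executor-local copy, not per-task
        val scaled = p.features.toArray.zip(w).map { case (x, s) => x * s }
        LabeledPoint(p.label, Vectors.dense(scaled))
      }
    }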

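For the second point, something along these lines; the partition count
of 200 is just a guess you'd want to tune (roughly the number of cores a
single training job can actually keep busy), and `data` stands in for
your input RDD:

    import org.apache.spark.mllib.classification.SVMWithSGD
    import org.apache.spark.mllib.regression.LabeledPoint
    import org.apache.spark.rdd.RDD

    // Fewer partitions means fewer tasks per GradientDescent aggregate,
    // trading some parallelism for much less per-task scheduling overhead.
    val coarser: RDD[LabeledPoint] = data.coalesce(200).cache()
    val model = SVMWithSGD.train(coarser, 100)  // 100 iterations, assumed
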
How many tasks are you sending? In the past I've seen something like 25
seconds of overhead for ~10k total medium-sized tasks.


On Thu, Jun 26, 2014 at 12:06 PM, Kyle Ellrott <kellr...@soe.ucsc.edu>
wrote:

> I'm working to set up a calculation that involves calling
> mllib's SVMWithSGD.train several thousand times on different permutations
> of the data. I'm trying to run the separate jobs using a threadpool to
> dispatch the different requests to a Spark context connected to a Mesos
> cluster, using coarse-grained scheduling and a max of 2000 cores, on
> Spark 1.0. Total utilization of the system is terrible. Most of the
> 'aggregate at GradientDescent.scala:178' stages (where mllib spends most
> of its time) take about 3 seconds but have ~25 seconds of scheduler delay.
> What kind of things can I do to improve this?
>
> Kyle
>
