I don't have specific solutions for you, but the general things to try are:
- Decrease task size by broadcasting any non-trivial objects your closures reference (a sketch follows after the quoted message below).
- Increase the duration of tasks by making them less fine-grained (also sketched below).

How many tasks are you sending? In the past I've seen something like 25 seconds for ~10k medium-sized tasks in total.

On Thu, Jun 26, 2014 at 12:06 PM, Kyle Ellrott <kellr...@soe.ucsc.edu> wrote:
> I'm working to set up a calculation that involves calling mllib's
> SVMWithSGD.train several thousand times on different permutations of the
> data. I'm trying to run the separate jobs using a thread pool to dispatch
> the different requests to a SparkContext connected to a Mesos cluster,
> using coarse-grained scheduling and a max of 2000 cores, on Spark 1.0.
> Total utilization of the system is terrible. Most of the 'aggregate at
> GradientDescent.scala:178' stages (where mllib spends most of its time)
> take about 3 seconds, but have ~25 seconds of scheduler delay time.
> What kind of things can I do to improve this?
>
> Kyle
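A minimal sketch of the first suggestion, assuming the Scala API on Spark 1.0: broadcast any sizable object a closure captures so it ships to each executor once instead of being serialized into every task. `lookupTable` here is a hypothetical stand-in for whatever non-trivial object your closures actually reference.

import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD

object BroadcastSketch {
  def run(sc: SparkContext, data: RDD[Double]): RDD[Double] = {
    // Hypothetical non-trivial object; without broadcast it would be
    // serialized into every one of the ~10k tasks.
    val lookupTable: Map[Int, Double] =
      (0 until 100000).map(i => i -> i * 0.5).toMap

    // Ship it to each executor once; tasks carry only a small handle.
    val tableBc = sc.broadcast(lookupTable)

    data.map { x =>
      // Dereference the broadcast value on the executor side.
      x + tableBc.value.getOrElse(x.toInt, 0.0)
    }
  }
}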
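And a sketch of the second suggestion: pack many units of work into each task so per-task scheduler overhead is amortized. `scorePermutation` is a placeholder for the real per-permutation work; note that in your case the per-permutation step is an SVMWithSGD.train call, which launches its own jobs and can't run inside a task, so the same batching idea would instead apply to how your thread pool groups the requests it submits.

import org.apache.spark.SparkContext

object CoarseTasksSketch {
  // Placeholder for the real per-permutation computation.
  def scorePermutation(p: Seq[Int]): Double = p.sum.toDouble

  def run(sc: SparkContext): Array[Double] = {
    val permutations: Seq[Seq[Int]] =
      (0 until 10000).map(i => Seq(i, i + 1, i + 2))

    // ~100 permutations per task: 10k tiny tasks become ~100 coarser ones.
    val batches = permutations.grouped(100).toSeq
    sc.parallelize(batches, batches.size)
      .flatMap(batch => batch.map(scorePermutation))
      .collect()
  }
}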