Here is a good post on Linux performance monitoring tools. Look for the cheat sheet image toward the bottom of the post: http://www.joyent.com/blog/linux-performance-analysis-and-tools-brendan-gregg-s-talk-at-scale-11x
On Thu, Jan 9, 2014 at 3:23 PM, Matei Zaharia <[email protected]> wrote:

> Typically you want 2-3 partitions per CPU core to get good load balancing.
> How big is the data you’re transferring in this case? And have you looked
> at the machines to see whether they’re spending lots of time on IO, CPU,
> etc.? (Use top or dstat on each machine for this.) For large datasets with
> larger numbers of tasks, one option we added in 0.8.1 that helps a lot is
> consolidating shuffle files (see
> http://spark.incubator.apache.org/releases/spark-release-0-8-1.html).
> However, another common problem is just serialization taking a lot of time,
> which you’ll notice if the application is CPU-heavy, and which you can fix
> using Kryo.
>
> Matei
>
> On Jan 9, 2014, at 2:11 PM, Yann Luppo <[email protected]> wrote:
>
> Thank you guys, that was really helpful in identifying the slow step,
> which in our case is the leftOuterJoin.
> I'm checking with our admins to see if we have some sort of distributed
> system monitoring in place, which I'm sure we do.
>
> Now, just out of curiosity, what would be the rule of thumb or general
> guideline for the number of partitions and the number of reducers?
> Should it be some kind of factor of the number of cores available? Of
> nodes available? Should the number of partitions match the number of
> reducers, or at least be some multiple of it, for better performance?
>
> Thanks,
> Yann
>
> From: Evan Sparks <[email protected]>
> Reply-To: "[email protected]" <[email protected]>
> Date: Wednesday, January 8, 2014 5:28 PM
> To: "[email protected]" <[email protected]>
> Cc: "[email protected]" <[email protected]>
> Subject: Re: performance
>
> On this note, the Ganglia web front end that runs on the master
> (assuming you're launching with the EC2 scripts) is great for this.
>
> Also, a common technique for diagnosing "which step is slow" is to run a
> .cache() and a .count() on the RDD after each step. This forces the RDD to
> be materialized, which subverts the lazy evaluation that sometimes makes
> such diagnosis hard.
>
> - Evan
>
> On Jan 8, 2014, at 2:57 PM, Andrew Ash <[email protected]> wrote:
>
> My first thought on hearing that you're calling collect is that bringing
> all the data back to the driver is intensive on the network. Try checking
> the basic system stats on each machine to get a sense of what's being
> heavily used:
>
> disk IO
> CPU
> network
>
> Any kind of distributed system monitoring framework should be able to
> handle these sorts of things.
>
> Cheers!
> Andrew
>
>
> On Wed, Jan 8, 2014 at 1:49 PM, Yann Luppo <[email protected]> wrote:
>
>> Hi,
>>
>> I have what I hope is a simple question. What's a typical approach to
>> diagnosing performance issues on a Spark cluster?
>> We've already followed all the pertinent parts of the following document:
>> http://spark.incubator.apache.org/docs/latest/tuning.html
>> But we still seem to have issues. More specifically, we have a
>> leftOuterJoin followed by a flatMap and then a collect that run a bit long.
>>
>> How would I go about determining the bottleneck operation(s)?
>> Is our leftOuterJoin taking a long time?
>> Is the function we send to the flatMap not optimized?
>>
>> Thanks,
>> Yann
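For anyone who wants to try Matei's suggestions, here is a minimal sketch of the relevant settings (Kryo serialization plus the shuffle file consolidation added in 0.8.1). This uses the SparkConf API from Spark 0.9+; on 0.8.x the same properties can be set with System.setProperty before creating the SparkContext. The app name is just a placeholder.

import org.apache.spark.{SparkConf, SparkContext}

// Sketch only: property names as documented for Spark 0.8.1/0.9.
val conf = new SparkConf()
  .setAppName("perf-tuning-example")  // placeholder name
  .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
  .set("spark.shuffle.consolidateFiles", "true")  // shuffle consolidation, added in 0.8.1

val sc = new SparkContext(conf)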
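And a sketch of the cache/count technique Evan describes, combined with an explicit partition count on the join (the 2-3 partitions per core rule of thumb Matei mentions). The RDDs, the flatMap body, and the partition count of 48 are made up purely for illustration; on a real job you would time each count() against the stages shown in the web UI or driver logs.

// Two small made-up pair RDDs keyed by user id.
val users  = sc.parallelize(Seq(("u1", "alice"), ("u2", "bob")))
val clicks = sc.parallelize(Seq(("u1", 3), ("u1", 5), ("u3", 1)))

// Roughly 2-3 partitions per core, e.g. 16 cores total -> 32-48 partitions.
val numPartitions = 48

val joined = users.leftOuterJoin(clicks, numPartitions)
joined.cache()
joined.count()     // forces the join to materialize; time this step

val expanded = joined.flatMap { case (id, (name, clickOpt)) =>
  clickOpt.map(c => (id, name, c))  // stand-in for the real flatMap logic
}
expanded.cache()
expanded.count()   // if this step dominates, the flatMap function is the bottleneck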

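On Andrew's point about collect(): if the driver only needs a peek at the results, pulling a small sample (or writing the full result out from the executors) avoids shipping the entire dataset over the network. Another rough sketch, reusing the made-up RDD from above; the output path is a placeholder.

// Instead of expanded.collect(), which ships every record to the driver:
val sample = expanded.take(10)   // small peek for inspection
sample.foreach(println)

// Or keep the result distributed and write it out from the executors.
expanded.saveAsTextFile("hdfs:///tmp/perf-example-output")  // placeholder path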