Here is a good post on Linux performance monitoring tools. Look for the
cheat sheet image towards the bottom of the post.
http://www.joyent.com/blog/linux-performance-analysis-and-tools-brendan-gregg-s-talk-at-scale-11x


On Thu, Jan 9, 2014 at 3:23 PM, Matei Zaharia <[email protected]> wrote:

> Typically you want 2-3 partitions per CPU core to get good load balancing.
> How big is the data you’re transferring in this case? And have you looked
> at the machines to see whether they’re spending lots of time on IO, CPU,
> etc? (Use top or dstat on each machine for this). For large datasets with
> larger numbers of tasks, one option we added in 0.8.1 that helps a lot is
> consolidating shuffle files (see
> http://spark.incubator.apache.org/releases/spark-release-0-8-1.html).
> However, another common problem is just serialization taking a lot of time,
> which you’ll notice if the application is CPU-heavy, and which you can fix
> using Kryo.
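>
> To make those last two points concrete, here is a rough sketch of the
> relevant settings (0.8.x-style configuration via Java system properties set
> before the SparkContext is created; newer releases use SparkConf, and the
> master URL and app name below are only placeholders):
>
>     import org.apache.spark.SparkContext
>
>     // Use Kryo instead of Java serialization for shuffled and cached data.
>     // (To register your own classes, also point spark.kryo.registrator at
>     // a KryoRegistrator implementation.)
>     System.setProperty("spark.serializer",
>       "org.apache.spark.serializer.KryoSerializer")
>
>     // Consolidate shuffle files (option added in 0.8.1).
>     System.setProperty("spark.shuffle.consolidateFiles", "true")
>
>     val sc = new SparkContext("spark://master:7077", "my-app")
>     // Rule of thumb from above: with e.g. 4 workers x 16 cores, aim for
>     // roughly 128-192 partitions in wide operations such as joins.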
>
> Matei
>
> On Jan 9, 2014, at 2:11 PM, Yann Luppo <[email protected]> wrote:
>
>  Thank you guys, that was really helpful in identifying the slow step,
> which in our case is the leftOuterJoin.
> I'm checking with our admins to see if we have some sort of distributed
> system monitoring in place, which I'm sure we do.
>
>  Now just out of curiosity, what would be the rule of thumb or general
> guideline for the number of partitions and the number of reducers?
> Should it be some kind of factor of the number of cores available? Of
> nodes available? Should the number of partitions match the number of
> reducers or at least be some multiple of it for better performance?
>
>  Thanks,
> Yann
>
>   From: Evan Sparks <[email protected]>
> Reply-To: "[email protected]" <
> [email protected]>
> Date: Wednesday, January 8, 2014 5:28 PM
> To: "[email protected]" <[email protected]>
> Cc: "[email protected]" <[email protected]>
> Subject: Re: performance
>
>   On this note - the Ganglia web front end that runs on the master
> (assuming you're launching with the EC2 scripts) is great for this.
>
>  Also, a common technique for diagnosing "which step is slow" is to call
> '.cache()' and then '.count()' on the RDD after each step. This forces the
> RDD to be materialized, working around the lazy evaluation that otherwise
> makes such diagnosis hard.
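>
> Concretely, the pattern looks something like this in the spark-shell (the
> RDD names and the flatMap body are made up; substitute your own data and
> function):
>
>     // Cache + count after each step so the join and the flatMap are each
>     // materialized, and therefore timed, on their own.
>     val left  = sc.parallelize(Seq(1 -> "a", 2 -> "b"))
>     val right = sc.parallelize(Seq(1 -> "x"))
>
>     val joined = left.leftOuterJoin(right)
>     joined.cache()
>     joined.count()   // forces just the join
>
>     val mapped = joined.flatMap { case (k, _) => Seq(k.toString) }
>     mapped.cache()
>     mapped.count()   // forces just the flatMap step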
>
>  - Evan
>
> On Jan 8, 2014, at 2:57 PM, Andrew Ash <[email protected]> wrote:
>
>   My first thought on hearing that you're calling collect is that taking
> all the data back to the driver is intensive on the network. Try checking
> the basic system metrics on each machine to get a sense of what's being
> heavily used:
>
>  disk IO
> CPU
> network
>
>  Any kind of distributed system monitoring framework should be able to
> handle these sorts of things.
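>
> One quick check along these lines: compare a count() against the collect(),
> e.g. in the spark-shell (the "result" RDD below is just a stand-in for
> whatever you are actually collecting):
>
>     val result = sc.parallelize(1 to 1000000).map(_.toString)
>
>     result.count()     // computes everything, ships back only a single number
>     result.collect()   // computes everything and ships every record to the driver
>
> If the count() comes back quickly but the collect() does not, the extra time
> is going into moving results to the driver rather than into computing them.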
>
>  Cheers!
> Andrew
>
>
> On Wed, Jan 8, 2014 at 1:49 PM, Yann Luppo <[email protected]> wrote:
>
>>  Hi,
>>
>>  I have what I hope is a simple question. What's a typical approach to
>> diagnosing performance issues on a Spark cluster?
>> We've followed all the pertinent parts of the following document already:
>> http://spark.incubator.apache.org/docs/latest/tuning.html
>> But we still seem to have issues. More specifically, we have a
>> leftOuterJoin followed by a flatMap and then a collect, which together run a bit long.
>>
>>  How would I go about determining the bottleneck operation(s)?
>> Is our leftOuterJoin taking a long time?
>> Is the function we pass to the flatMap not optimized?
>>
>>  Thanks,
>> Yann
>>
>
>
>
