Also, take a look at the driver logs -- if there is overhead before the
first task is launched, the driver logs would likely reveal this.
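
To make that check concrete, here is a rough sketch of the arithmetic (also covering the launch-time reasoning quoted below). It is not a Spark API: `TaskTiming`, `summarize_stage` and `tolerance_ms` are hypothetical names, and it assumes you have already converted each task's launch time into milliseconds relative to when the stage was submitted (e.g. by hand from the UI or the driver logs).

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class TaskTiming:
    launch_ms: int    # launch time, in ms after the stage was submitted (assumed input)
    duration_ms: int  # value from the Duration column


def summarize_stage(stage_total_ms: int, tasks: List[TaskTiming],
                    tolerance_ms: int = 50) -> Dict[str, object]:
    """Split a stage's wall-clock time into the pieces discussed in this thread."""
    if not tasks:
        raise ValueError("need at least one task")
    ordered = sorted(tasks, key=lambda t: t.launch_ms)
    pairs = list(zip(ordered, ordered[1:]))

    # Time that passed before any task was launched.
    pre_launch_ms = ordered[0].launch_ms
    # Largest pause between two consecutive launches.
    max_launch_gap_ms = max((b.launch_ms - a.launch_ms for a, b in pairs), default=0)
    # Serial pattern: each task launches only once the previous one has finished
    # (within tolerance_ms), as would happen with a single executor slot.
    serial = bool(pairs) and all(
        b.launch_ms + tolerance_ms >= a.launch_ms + a.duration_ms for a, b in pairs
    )
    # Time left over after the last task finished.
    last_finish_ms = max(t.launch_ms + t.duration_ms for t in ordered)
    tail_ms = max(0, stage_total_ms - last_finish_ms)

    return {
        "pre_launch_ms": pre_launch_ms,
        "max_launch_gap_ms": max_launch_gap_ms,
        "serial": serial,
        "tail_ms": tail_ms,
    }
```

A large `pre_launch_ms` with a small `max_launch_gap_ms` points at overhead before the first task; `serial` being true points at the tasks running one after another. The numbers you feed in are up to you; the sketch only does the bookkeeping.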


On Tue, Apr 8, 2014 at 9:21 AM, Aaron Davidson <ilike...@gmail.com> wrote:

> Off the top of my head, the most likely cause would be driver GC issues.
> You can diagnose this by enabling GC printing at the driver and you can fix
> this by increasing the amount of memory your driver program has (see
> http://spark.apache.org/docs/0.9.0/tuning.html#garbage-collection-tuning).
>
> The "launch time" statistic would also be useful -- if all tasks are
> launched at around the same time and complete within 300ms, yet the total
> time is 10s, that strongly suggests that the overhead is coming before the
> first task is launched. Similarly, it would be useful to know if there was
> a large gap between launch times, or if it appeared that the launch times
> were serial with respect to the durations. If for some reason Spark started
> using only one executor, say, each task would take the same duration but
> would be executed one after another.
>
>
> On Tue, Apr 8, 2014 at 8:11 AM, Yana Kadiyska <yana.kadiy...@gmail.com> wrote:
>
>> Hi Spark users, I'm very much hoping someone can help me out.
>>
>> I have a strict performance requirement on a particular query. One of
>> the stages shows great variance in duration -- from 300ms to 10sec.
>>
>> The stage is mapPartitionsWithIndex at Operator.scala:210 (running Spark
>> 0.8)
>>
>> I have run the job quite a few times -- the details within the stage
>> do not account for the overall duration shown for the stage. What
>> could be taking up time that's not showing within the stage breakdown
>> UI? I'm thinking that reading the data in is reflected in the Duration
>> column before, so caching should not be the reason (I'm not caching
>> explicitly)?
>>
>> The details within the stage always show roughly the following (for both
>> the 10-second and the 600 ms query -- very little variation, nothing
>> over 500 ms, and the Shuffle Write sizes are pretty comparable):
>>
>>       Status   Locality Level  Executor / Launch Time  Duration  GC Time  Shuffle Write
>> 1864  SUCCESS  NODE_LOCAL      #######                 301 ms    8 ms     111.0 B
>> 1863  SUCCESS  NODE_LOCAL      #######                 273 ms             102.0 B
>> 1862  SUCCESS  NODE_LOCAL      #######                 245 ms             111.0 B
>> 1861  SUCCESS  NODE_LOCAL      #######                 326 ms    4 ms     102.0 B
>> 1860  SUCCESS  NODE_LOCAL      #######                 217 ms    6 ms     102.0 B
>> 1859  SUCCESS  NODE_LOCAL      #######                 277 ms             111.0 B
>> 1858  SUCCESS  NODE_LOCAL      #######                 262 ms             108.0 B
>> 1857  SUCCESS  NODE_LOCAL      #######                 217 ms    14 ms    112.0 B
>> 1856  SUCCESS  NODE_LOCAL      #######                 208 ms             109.0 B
>> 1855  SUCCESS  NODE_LOCAL      #######                 242 ms             74.0 B
>> 1854  SUCCESS  NODE_LOCAL      #######                 218 ms    3 ms     58.0 B
>> 1853  SUCCESS  NODE_LOCAL      #######                 254 ms    12 ms    102.0 B
>> 1852  SUCCESS  NODE_LOCAL      #######                 274 ms    8 ms     77.0 B
>>
>
>
