How many cores did the executor have? It could be that the executor had only
one core, in which case it ran all 50 tasks serially: 50 tasks x 15 ms
≈ 0.75 s, which matches the ~1 s task time you see.
Could you take a look at the task details on the stage page to see when the
tasks were launched, and whether that explains the 5 seconds?
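If you want more parallelism per executor, you can request more cores when you
submit to YARN. A minimal sketch, assuming Spark on YARN; the executor and core
counts below are illustrative, not taken from your setup:

    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    // Illustrative config: spark.executor.instances and spark.executor.cores
    // are the standard Spark-on-YARN settings for executor count and
    // per-executor task parallelism.
    val conf = new SparkConf()
      .setAppName("LogonAnalysis")            // hypothetical name, after LogonAnalysis.scala in your stage
      .set("spark.executor.instances", "2")   // hypothetical executor count
      .set("spark.executor.cores", "4")       // 4 cores => up to 4 concurrent tasks per executor

    val ssc = new StreamingContext(conf, Seconds(1)) // 1 s batch interval is an assumption

The same can be set on the command line with --num-executors and
--executor-cores on spark-submit. With one core the 50 tasks run back to back;
with 4 cores per executor the stage's wall-clock time should drop to roughly a
quarter of that.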

On Thu, Jul 9, 2015 at 12:21 AM, Michel Hubert <mich...@vsnsystemen.nl>
wrote:

> Hi,
>
> I’ve developed a POC Spark Streaming application, but it seems to perform
> better on my development machine than on our cluster. I submit it to YARN
> on our Cloudera cluster.
>
> But my first question is more specific:
>
> In the application UI (:4040), the streaming section shows that the batch
> processing took 6 seconds. When I then look at the stages, I indeed see a
> stage with a duration of 5 s.
>
> For example:
>
> Stage Id:      1678
> Description:   map at LogonAnalysis.scala:215
> Submitted:     2015/07/09 09:17:00
> Duration:      5 s
> Tasks:         50/50 succeeded
> Shuffle Write: 173.5 KB
>
> But when I look into the details of stage 1678, it tells me the duration
> was 14 ms, and the aggregated metrics by executor show 1.0 s as the task
> time.
>
> What is responsible for the gap between 14 ms, 1.0 s, and 5 s?
>
> *Details for Stage 1678*
>
> - Total task time across all tasks: 0.8 s
> - Shuffle write: 173.5 KB / 2031
>
> *Summary Metrics for 50 Completed Tasks*
>
> Metric                       | Min         | 25th %ile   | Median      | 75th %ile   | Max
> Duration                     | 14 ms       | 14 ms       | 15 ms       | 15 ms       | 24 ms
> GC Time                      | 0 ms        | 0 ms        | 0 ms        | 0 ms        | 0 ms
> Shuffle Write Size / Records | 2.6 KB / 28 | 3.1 KB / 35 | 3.5 KB / 42 | 3.9 KB / 46 | 4.4 KB / 53
>
> *Aggregated Metrics by Executor*
>
> Executor ID | Address    | Task Time | Total Tasks | Failed Tasks | Succeeded Tasks | Shuffle Write Size / Records
> 2           | xxxx:44231 | 1.0 s     | 50          | 0            | 50              | 173.5 KB / 2031