Is your driver running on the same machine as the master?

On 29 Apr 2015 03:59, "Anshul Singhle" <ans...@betaglide.com> wrote:
> Hi,
>
> I'm running short Spark jobs on RDDs cached in memory. I'm also using a
> long-running job context. I want to be able to complete my jobs (on the
> cached RDD) in under 1 sec.
> I'm getting the following job times with about 15 GB of data distributed
> across 6 nodes. Each executor has about 20 GB of memory available. My
> context has about 26 cores in total.
>
> If number of partitions < number of cores:
>
> Some jobs run in 3s, others take about 6s. The time difference can be
> explained by GC time.
>
> If number of partitions = number of cores:
>
> All jobs run in 4s. The initial tasks of each stage on every executor take
> about 1s.
>
> If number of partitions > number of cores:
>
> Jobs take more time. The initial tasks of each stage on every executor
> take about 1s. The other tasks run in 45-50 ms each. However, since the
> initial tasks again take about 1s each, the total time in this case is
> about 6s, which is more than in the previous case.
>
> Clearly the limiting factor here is the initial set of tasks. In every
> case, these tasks take 1s to run, no matter the number of partitions. Hence
> the best results are obtained with partitions = cores, because in that case
> every core gets 1 task, which takes 1s to run.
> In this case, I get 0 GC time. The only explanation is "scheduling delay",
> which is about 0.2 - 0.3 seconds. I looked at my task size and result size,
> and they have no bearing on this delay. Also, I'm not getting the task size
> warnings in the logs.
>
> From what I can understand, the first time a task runs on a core, it takes
> 1s to run. Is this normal?
>
> Is it possible to get sub-second latencies?
>
> Can something be done about the scheduler delay?
>
> What other things can I look at to reduce this time?
>
> Regards,
>
> Anshul
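
To make the wave arithmetic in the quoted message concrete, here is a rough back-of-the-envelope sketch in Python. It is only a model under stated assumptions, not anything Spark computes: it assumes the first wave of tasks in a stage costs about 1 s and every later wave about 50 ms (both figures taken from the numbers quoted above), and it ignores GC and scheduler delay entirely. The function name `estimate_stage_seconds` and its parameters are made up for illustration.

```python
import math


def estimate_stage_seconds(partitions, cores, first_task_s=1.0, later_task_s=0.05):
    """Rough per-stage wall time: one slow first wave, then fast later waves.

    Assumption (from the quoted numbers, not from Spark itself): the first
    task on each core takes ~1 s, later tasks take ~50 ms.
    """
    if partitions <= 0 or cores <= 0:
        raise ValueError("partitions and cores must be positive")
    # Tasks run in waves of at most `cores` tasks at a time.
    waves = math.ceil(partitions / cores)
    return first_task_s + (waves - 1) * later_task_s


if __name__ == "__main__":
    # 26 cores, as in the quoted setup; partition counts are illustrative.
    for partitions in (13, 26, 52, 104):
        estimate = estimate_stage_seconds(partitions, cores=26)
        print(f"{partitions} partitions: ~{estimate:.2f} s per stage")
```

Under this model the roughly 1 s first wave dominates every stage regardless of partition count, which matches the observation above that the initial tasks are the limiting factor. It does not by itself account for the full gap between the 4 s and 6 s job times reported, so other costs are presumably involved as well.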