Perhaps this is a bit simplistic, but what if you took your idea and
also kept a running estimate of the amount of load added by each job?
Start out assuming each job adds 1 unit of load, and then measure:
"Okay, I started 4 jobs last time, and the load went up by 8, so I
estimate each job causes 2 units of load." Then when you sample the
difference and the current load is, say, 12 with 16 procs, you'll only
add 2 jobs, and the load doesn't go over the max.
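
In rough Python, that bookkeeping might look something like this (only
a sketch: os.getloadavg(), the variable names, and the start_jobs()
hook are made up for illustration; the 10-second interval is just the
one parallel already uses to recompute the load):

    import os
    import time

    max_load = 16          # e.g. one unit of load per processor on a 16-proc box
    load_per_job = 1.0     # initial guess: each job adds 1 unit of load
    interval = 10          # same 10-second interval parallel uses for --load

    prev_load = os.getloadavg()[0]   # 1-minute load average
    started_last_round = 0

    while True:
        time.sleep(interval)
        load = os.getloadavg()[0]
        if started_last_round > 0:
            # measure how much load the jobs started last round actually added
            observed = (load - prev_load) / started_last_round
            # moving average so one noisy sample does not swing the estimate
            load_per_job = 0.7 * load_per_job + 0.3 * max(observed, 0.1)
        # only start as many jobs as the remaining headroom allows,
        # e.g. (16 - 12) / 2.0 -> start 2 jobs
        started_last_round = max(int((max_load - load) / load_per_job), 0)
        # start_jobs(started_last_round)   # hypothetical hook to spawn the jobs
        prev_load = load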
I'm not sure exactly how to calculate it, but a first stab might be
load_per_job = current_load / job_slots, and then
job_slots += (desired_load - current_load) / load_per_job. Really you
probably want a moving average. But something like that could let you
learn how your jobs affect the system.

-John

On Thu, Mar 15, 2012 at 8:32 PM, Ole Tange <[email protected]> wrote:
> Thomas got me thinking.
>
> One of the problems with --load is that it only limits how many jobs
> are started. So you may start way too many. This will give you a load
> of 100:
>
> seq 100 | nice parallel -j0 --load 2.00 burnP6
>
> and that is most likely not what you want.
>
> While some programs run multiple threads (and thus can each give a
> load > 1), that is the exception. So in general I think we can assume
> one job will give a load of at most 1.
>
> Currently the load is only computed every 10 seconds. So we could
> recompute every 10 seconds:
>
> number_of_concurrent_jobs = max_load - current_load +
> number_of_concurrent_jobs
>
> If the job immediately takes 100% CPU time (like burnP6), then the
> number of processes will grow every 10 seconds by the difference
> between the current load and the max load. As the load lags behind,
> this may cause us to spawn too many processes, giving a load > max
> load. But when the jobs finish, the load will over time drop back to
> the max load.
>
> If the job never takes 100% CPU time (like host), then the number of
> processes will grow every 10 seconds by the difference between the
> current load and the max load.
>
> If the job takes 100% CPU time after some initialization (like
> blast), then the number of processes will grow every 10 seconds by
> the difference between the current load and the max load. The current
> load will start out small, so this may cause us to spawn too many
> processes, giving a load > max load.
>
> If the job takes >100% CPU time after some initialization (like
> multithreaded blast), then the number of processes will grow every 10
> seconds by the difference between the current load and the max load.
> The current load will start out small, so this may cause us to spawn
> too many processes, giving a load > max load.
>
> I believe this would be better than the current behaviour, but I am
> very open to even better ideas.
>
>
> /Ole
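
For reference, the 10-second recompute rule quoted above boils down to
roughly this (again only a Python sketch; os.getloadavg() and the fixed
interval are assumptions for illustration):

    import os
    import time

    max_load = 16      # target load, e.g. one unit per processor
    job_slots = 0      # "number_of_concurrent_jobs" in the formula above

    while True:
        current_load = os.getloadavg()[0]
        # number_of_concurrent_jobs = max_load - current_load + number_of_concurrent_jobs
        job_slots = max(job_slots + int(round(max_load - current_load)), 0)
        # parallel would then allow up to job_slots jobs to run at once
        time.sleep(10)

Because the load lags, job_slots can keep growing while the load is
still climbing, which is exactly the "load > max load" overshoot
described in the cases above.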
