Perhaps this is a bit simplistic, but what if you took your idea and
also kept a running estimate of the amount of load added by each job?
Start out assuming each job adds 1 unit of load, and then measure:
"Okay, I started 4 jobs last time, and the load went up by 8, so I
estimate each job causes 2 units of load." Then when you sample the
difference and the current load is, say, 12 with 16 procs, you'll only
add 2 jobs, and the load doesn't go over the max.
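
In rough Python, that bookkeeping might look something like this (only
a sketch: os.getloadavg(), the variable names, and the start_jobs()
hook are made up for illustration; the 10-second interval is just the
one parallel already uses to recompute the load):

    import os
    import time

    max_load = 16          # e.g. one unit of load per processor on a 16-proc box
    load_per_job = 1.0     # initial guess: each job adds 1 unit of load
    interval = 10          # same 10-second interval parallel uses for --load

    prev_load = os.getloadavg()[0]   # 1-minute load average
    started_last_round = 0

    while True:
        time.sleep(interval)
        load = os.getloadavg()[0]
        if started_last_round > 0:
            # measure how much load the jobs started last round actually added
            observed = (load - prev_load) / started_last_round
            # moving average so one noisy sample does not swing the estimate
            load_per_job = 0.7 * load_per_job + 0.3 * max(observed, 0.1)
        # only start as many jobs as the remaining headroom allows,
        # e.g. (16 - 12) / 2.0 -> start 2 jobs
        started_last_round = max(int((max_load - load) / load_per_job), 0)
        # start_jobs(started_last_round)   # hypothetical hook to spawn the jobs
        prev_load = load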
I'm not sure exactly how to calculate it, but a first stab might be
load_per_job = current_load / job_slots, and then
job_slots += (desired_load - current_load) / load_per_job. Really you
probably want a moving average. But something like that could let you
learn how your jobs affect the system.

-John

On Thu, Mar 15, 2012 at 8:32 PM, Ole Tange <[email protected]> wrote:
> Thomas got me thinking.
>
> One of the problems with --load is that it only limits how many jobs
> are started. So you may start way too many. This will give you a load
> of 100:
>
> seq 100 | nice parallel -j0 --load 2.00 burnP6
>
> and that is most likely not what you want.
>
> While some programs run multiple threads (and thus can each give a
> load > 1), that is the exception. So in general I think we can assume
> one job will give a load of at most 1.
>
> Currently the load is only computed every 10 seconds. So we could
> recompute every 10 seconds:
>
> number_of_concurrent_jobs = max_load - current_load +
> number_of_concurrent_jobs
>
> If the job immediately takes 100% CPU time (like burnP6), then the
> number of processes will grow every 10 seconds by the difference
> between the current load and the max load. As the load lags behind,
> this may cause us to spawn too many processes, giving a load > max
> load. But when the jobs finish, the load will over time drop back to
> the max load.
>
> If the job never takes 100% CPU time (like host), then the number of
> processes will grow every 10 seconds by the difference between the
> current load and the max load.
>
> If the job takes 100% CPU time after some initialization (like
> blast), then the number of processes will grow every 10 seconds by
> the difference between the current load and the max load. The current
> load will start out small, so this may cause us to spawn too many
> processes, giving a load > max load.
>
> If the job takes >100% CPU time after some initialization (like
> multithreaded blast), then the number of processes will grow every 10
> seconds by the difference between the current load and the max load.
> The current load will start out small, so this may cause us to spawn
> too many processes, giving a load > max load.
>
> I believe this would be better than the current behaviour, but I am
> very open to even better ideas.
>
>
> /Ole
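
For reference, the 10-second recompute rule quoted above boils down to
roughly this (again only a Python sketch; os.getloadavg() and the fixed
interval are assumptions for illustration):

    import os
    import time

    max_load = 16      # target load, e.g. one unit per processor
    job_slots = 0      # "number_of_concurrent_jobs" in the formula above

    while True:
        current_load = os.getloadavg()[0]
        # number_of_concurrent_jobs = max_load - current_load + number_of_concurrent_jobs
        job_slots = max(job_slots + int(round(max_load - current_load)), 0)
        # parallel would then allow up to job_slots jobs to run at once
        time.sleep(10)

Because the load lags, job_slots can keep growing while the load is
still climbing, which is exactly the "load > max load" overshoot
described in the cases above.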
