I like that idea, since ps gives an instantaneous count instead of an average. I don't know much about process states, but some brief playing around reveals some issues:
1. It's not accurate. Running `pbzip2 bigfile.txt` while watching `ps -u $USER -o s,pcpu,args` often shows state S (sleeping) even when using 1600% CPU.

2. It doesn't account for multithreaded programs. Running pbzip2 at 1600% CPU shows only one R (running). Fortunately, ps -L seems to fix this, while also helping #1.

3. If I have more than one disk or network adapter, I can usefully have more than one process in the 'D' (I/O) state. This seems tough to get right automatically; perhaps a separate "--ioload" option is easiest.

If the machine is swapping, the user can fix that with --noswap. But swapping is really running out of memory, which is probably best addressed with an orthogonal "--memload" or "--min-free-mem" option.

Having something like

    parallel --load 100% --ioload 4 --min-free-mem 2G

would be awesome: only start new jobs if there are < $num_cpus threads in the R state, < 4 in the D state, and > 2GB free memory. That way I can have lots of processes with highly variable workloads all doing their own thing and not stepping on each other.

(I can see both --memload and --min-free-mem being useful, the first for "I need to reserve 4G system RAM for other stuff," the second for "Each job needs 2G RAM." The second is probably more common.)

On Thu, Mar 22, 2012 at 12:20 PM, Ole Tange <[email protected]> wrote:
> On Thu, Mar 22, 2012 at 4:48 PM, Jay Hacker <[email protected]> wrote:
>
>> Perhaps this is a bit simplistic, but what if you took your idea and
>> also kept a running estimate of the amount of load added by each job?
>> Start out assuming each job adds 1 unit of load, and then measure:
>> "Okay, I started 4 jobs last time, and the load went up by 8, so I
>> estimate each job causes 2 units of load." Then when you sample the
>> difference and current load is say 12, with 16 procs, you'll only add
>> 2 jobs, and the load doesn't go over the max.
>
> That would only work on dedicated single user systems.
>
> My servers are (ab)used by 3-5 people at the same time.
>
> But I am warming up to the idea of ignoring load and instead just look
> at 'ps -A -o s'.
>
> 1: If number of 'R' == number of cpus: Do not start another.
> 2: If number of 'D' amongst (grand)children >= 1: Do not start another.
> 3: Else start a job more.
>
> CPU limited tasks will be limited by rule 1.
> Disk and NFS I/O limited tasks will be limited by rule 2.
> Net I/O will not be limited.
>
> I have not tested what will happen if the machine is swapping.
>
>
> /Ole
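
For concreteness, here is a rough Python sketch of the gating rule discussed in this thread: start a new job only if fewer than $num_cpus threads are in R, fewer than some limit are in D, and more than a minimum amount of memory is free. Everything in it is an assumption for illustration: the function names are made up, it counts threads system-wide via `ps -e -L -o s=` (not just (grand)children), it reads MemFree from /proc/meminfo (Linux only), and it says nothing about how GNU parallel does or would implement this. --ioload and --min-free-mem are still only proposed options.

```python
import subprocess
from collections import Counter


def count_states(ps_output):
    """Count the leading state letter (R, S, D, ...) on each non-empty line."""
    return Counter(
        line.strip()[0] for line in ps_output.splitlines() if line.strip()
    )


def may_start_job(states, num_cpus, max_io, free_bytes, min_free_bytes):
    """Apply the proposed rule: < num_cpus in R, < max_io in D, enough free RAM."""
    return (
        states["R"] < num_cpus          # CPU-bound limit (like --load 100%)
        and states["D"] < max_io        # hypothetical --ioload limit
        and free_bytes > min_free_bytes  # hypothetical --min-free-mem limit
    )


def sample_system():
    """Sample per-thread states and free memory (assumes Linux with procps ps).

    Note that the ps process itself is usually counted as one R.
    """
    out = subprocess.run(
        ["ps", "-e", "-L", "-o", "s="],  # -L: one line per thread, s=: state only
        capture_output=True, text=True, check=True,
    ).stdout
    with open("/proc/meminfo") as f:
        fields = dict(line.split(":", 1) for line in f)
    free_bytes = int(fields["MemFree"].split()[0]) * 1024  # value is in kB
    return count_states(out), free_bytes
```

With 16 CPUs, a D limit of 4, and a 2 GiB floor, the check would be `states, free = sample_system()` followed by `may_start_job(states, 16, 4, free, 2 * 2**30)`. Keeping the decision in a pure function makes it easy to try other thresholds against canned ps output.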
