> >>> This patch changes the default setting to 1.5 *
> >>> (number_of_cpus_in_system) instead, which I find better matches
> >>> modern behaviour.
> >>
> >> A larger number is sensible in this position.
> >>
> >> I would propose 8. I don't agree with a calculation like that; the
> >> amount of work a system can do should not be calculated like that.
> >
> > I think 8 is way too high. Isn't the point of batch to run things
> > when the machine is mostly idle?
>
> The problem is (and we've had this discussion several times before,
> at least in misc@) that the system load doesn't really tell us that.
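For reference, the proposed default boils down to a check along these
lines. This is only a minimal sketch using getloadavg(3) and the
hw.ncpu sysctl, not the actual diff; the 1.5 factor is simply the value
under discussion:

    #include <sys/types.h>
    #include <sys/sysctl.h>
    #include <err.h>
    #include <stdio.h>
    #include <stdlib.h>

    int
    main(void)
    {
        int mib[2] = { CTL_HW, HW_NCPU };
        int ncpu;
        size_t len = sizeof(ncpu);
        double la;

        if (sysctl(mib, 2, &ncpu, &len, NULL, 0) == -1)
            err(1, "sysctl hw.ncpu");
        if (getloadavg(&la, 1) == -1)
            err(1, "getloadavg");

        /* proposed default: defer batch work above 1.5 * ncpu */
        double max = 1.5 * ncpu;
        printf("load %.2f, cutoff %.2f: %s\n", la, max,
            la < max ? "run batch jobs" : "defer batch jobs");
        return 0;
    }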
What's the proper way to calculate the amount of work a system can do,
so that a CPU idle-time threshold can then be derived from it? Doesn't
that also depend on the type of workload being run, and imply the
ability to manage how that workload is distributed?

> It *may* be the case that the system is under lots of work, but it
> may also be the case that there are many processes just blocking
> waiting for some resource and that the system is essentially idling.
>
> My particular problem, and the reason I suggested this patch in the
> first place, is that I often see loads of 20-30-50 or even way more,
> without there even being a problem. The machine is very responsive,
> and everything works great - there are just a lot of processes
> running or waiting for an opportunity to run.

That's not the general case on single- or dual-CPU systems (or anything
with fewer CPUs than whatever higher number you pick), nor when running
a few CPU-intensive processes rather than many mostly-waiting ones. In
those cases it is also easier to know what is actually happening on the
system. Selecting the off-peak period automatically, where you have no
direct control, requires more insight than load-average numbers provide
(a suggestion only). Or take a different approach to running the tasks,
e.g. service-oriented nodes assisting general worker ones. Better to
use a statistical, per-machine approach (counters), factoring in
processing capability and duty/saturation cycles (a human assessment).
Or simply follow the users' circadian cycle and not care much, since
machines just work while people rest, with potential overlap between
multiple machines serving the same role/task.

> Since the system load essentially is a decaying average of the number
> of runnable or running processes, it is not in any way connected to
> actual processor workload as in instructions executed, just to the
> fact that there is much *potentially* going on in the system.

Obviously, this explains why the load average is not 'the' proper way
to quantify processor busyness; such a method gains little adequacy
without a tuning knob, and then only after the other factors have been
assessed. The number of CPUs correlates with capacity but is not solely
deterministic, and imagine the mess made by twisting a knob without
understanding what it does (hence sane limits, sane defaults).

> That's also why I suggested to base the default on a value relative
> to the number of cores - it made sense from my practical point of
> view. But I understand where Theo's coming from on this.

Please suggest an (improved?) method of estimating off-peak processor
periods that reduces the load-average guesswork, or simply a practical
approach to the problem of finding the off-peak threshold, without
pushing changes for edge cases.
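As one possible starting point, here is a minimal sketch of the kind of
direct measurement alluded to above: sample OpenBSD's kern.cp_time
counters twice and compute the fraction of time actually spent idle,
rather than inferring it from the load average. The 10-second window
and the 50% cutoff are illustrative assumptions, not a recommendation:

    #include <sys/types.h>
    #include <sys/sysctl.h>
    #include <sys/sched.h>      /* CPUSTATES, CP_IDLE */
    #include <err.h>
    #include <stdio.h>
    #include <unistd.h>

    static void
    cptime(long st[CPUSTATES])
    {
        int mib[2] = { CTL_KERN, KERN_CPTIME };
        size_t len = CPUSTATES * sizeof(long);

        if (sysctl(mib, 2, st, &len, NULL, 0) == -1)
            err(1, "sysctl kern.cp_time");
    }

    int
    main(void)
    {
        long a[CPUSTATES], b[CPUSTATES], total = 0;
        int i;

        cptime(a);
        sleep(10);              /* illustrative sampling window */
        cptime(b);

        /* sum the deltas across all CPU states for this window */
        for (i = 0; i < CPUSTATES; i++)
            total += b[i] - a[i];

        double idle = total ?
            (double)(b[CP_IDLE] - a[CP_IDLE]) / total : 1.0;
        printf("idle %.0f%%: %s\n", idle * 100,
            idle > 0.5 ? "off-peak, run deferred work"
                       : "busy, hold off");
        return 0;
    }

This still knows nothing about the workload type or its distribution,
which is the harder part of the question above, but at least it counts
cycles spent idle rather than processes waiting to run.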