Anne & Lynn Wheeler <[EMAIL PROTECTED]> writes:
> on the other hand, the same could be said of my sleight-of-hand change
> to the testing and resetting code. however, I could demonstrate that
> my change actually corresponded to well provable theoretical
> principles and had well describable and predictable behavior under all
> workloads and configurations.
re:
http://www.garlic.com/~lynn/2006b.html#14 Expanded Storage
http://www.garlic.com/~lynn/2006b.html#15 {SPAM?} Expanded Storage
http://www.garlic.com/~lynn/2006b.html#16 {SPAM?} Expanded Storage

ok, and for even more drift. one of the things done for the resource manager was an automated benchmarking process
http://www.garlic.com/~lynn/subtopic.html#bench

over the years, there had been lots of work done on workload and configuration profiling (leading into the evolution of things like capacity planning). one of these that saw a lot of exposure was the performance predictor, an analytical model available to SEs and salesmen on HONE
http://www.garlic.com/~lynn/subtopic.html#hone

based on lots of customer and internal datacenter activity, in some cases spanning nearly a decade, an initial set of 1000 benchmarks was defined for calibrating the resource manager ... selecting a wide variety of workload profiles and configuration profiles. these were specified and run by the automated benchmarking process.

in parallel, a highly modified version of the performance predictor was developed. the modified performance predictor would take all workload, configuration, and benchmark results done to date; select a new workload/configuration combination; predict the benchmark results; and dynamically specify the workload/configuration profile to the automated benchmark process. after the benchmark was run, the results would be fed back to the model and checked against the predictions. then it would select another workload/configuration combination and repeat the process. this was done for an additional 1000 benchmarks ... each time validating that the actual operation (cpu usage, paging rate, distribution of cpu across different tasks, etc.) corresponded to the predictions. (a rough sketch of this feedback loop appears at the end of this post.)

the full 2000 automated benchmarks took three months elapsed time to run. however, at the end, we were relatively confident that the resource manager (cpu, dispatching, scheduling, paging, i/o, etc.) operated consistently and predictably, with respect to theory as well as the developed analytical models, across an extremely wide range of workloads and configurations.

as a side issue: some of the benchmarks we started with, before the final 2000 were run, were extremely pathological and extreme (i.e. numbers of users, total virtual pages, etc. that were ten to twenty times larger than anybody had ever run before). this put extreme stress on the operating system and initially resulted in lots of system failures. as a result, before starting the final resource manager phase, i redesigned and rewrote the internal serialization mechanism, and then went thru the whole kernel fixing up all sorts of things to use the new synchronization and serialization process. when i was done, all cases of zombie/hung users had been eliminated, as well as all cases of system failures because of synchronization/serialization bugs. this code was then incorporated into (and shipped as part of) the resource manager.

unfortunately, over the years, various rewrites and fixes corrupted the purity of this serialization/synchronization rework ... and you started to again see hung/zombie users as well as some serialization/synchronization failures.

misc. collected past posts on debugging, zombie/hung users, etc.
http://www.garlic.com/~lynn/subtopic.html#dumprx
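
for illustration only, here is a minimal sketch of the select/predict/run/compare calibration loop described above. everything in it is a hypothetical stand-in: the function names, the toy linear model, and the 10% tolerance are assumptions for the example, not the actual performance predictor, benchmark driver, or acceptance criteria. the point is just the control flow: the model picks the next workload/configuration combination, predicts the result, the benchmark runs, and the measurement is fed back and checked against the prediction.

import random

TOLERANCE = 0.10  # assumed acceptance band; not the real criteria

def select_next(history):
    # the real model chose combinations to widen coverage of the
    # workload/configuration space; here we just randomize
    return {"users": random.randint(10, 400),
            "virtual_pages": random.randint(1000, 200000)}

def predict(point):
    # stand-in analytical model: crude linear estimates of cpu
    # utilization and paging rate
    return {"cpu_util": min(1.0, point["users"] / 400),
            "page_rate": point["virtual_pages"] / 2000}

def run_benchmark(point):
    # stand-in for the automated benchmark run; real measurements
    # would come from the running system, here prediction plus noise
    noise = random.uniform(0.9, 1.1)
    return {k: v * noise for k, v in predict(point).items()}

def calibrate(n_runs):
    history = []
    for _ in range(n_runs):
        point = select_next(history)     # model picks next combination
        expected = predict(point)        # predict before running
        measured = run_benchmark(point)  # run the automated benchmark
        ok = all(abs(measured[k] - expected[k])
                 <= TOLERANCE * max(expected[k], 1e-9)
                 for k in expected)      # feed result back, check prediction
        history.append((point, expected, measured, ok))
    return history

if __name__ == "__main__":
    results = calibrate(1000)
    print(sum(ok for *_, ok in results), "of", len(results),
          "within tolerance")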

--
Anne & Lynn Wheeler | http://www.garlic.com/~lynn/