Anne & Lynn Wheeler <[EMAIL PROTECTED]> writes:
> on the other hand, the same could be said of my sleight-of-hand change
> to the testing and resetting code. however, I could demonstrate that
> my change actually corresponded to well provable theoretical
> principles and had well describable and predictable behavior under all
> workloads and configurations.

re:
http://www.garlic.com/~lynn/2006b.html#14 Expanded Storage
http://www.garlic.com/~lynn/2006b.html#15 {SPAM?} Expanded Storage
http://www.garlic.com/~lynn/2006b.html#16 {SPAM?} Expanded Storage

ok, and for even more drift.

one of the things done for the resource manager was the development of
an automated benchmarking process.
http://www.garlic.com/~lynn/subtopic.html#bench

over the years, there had been lots of work done on workload and
configuration profiling (leading into the evolution of things like
capacity planning). one of these that saw a lot of exposure was the
performance predictor analytical model available to SEs and salesmen
on HONE
http://www.garlic.com/~lynn/subtopic.html#hone

based on lots of customer and internal datacenter activity, in some
cases spanning nearly a decade ... an initial set of 1000 benchmarks
was defined for calibrating the resource manager ... selecting a wide
variety of workload profiles and configuration profiles. these were
specified and run by the automated benchmarking process.
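
purely as a modern-day illustration (all names and numbers below are
made up for the example, not from the actual calibration set), the
workload/configuration profile space amounts to a cross-product like:

from itertools import product

workloads = [
    {"name": "interactive", "users": 80, "think_secs": 10, "wss_pages": 40},
    {"name": "batch",       "users": 10, "think_secs": 0,  "wss_pages": 300},
    {"name": "mixed",       "users": 40, "think_secs": 5,  "wss_pages": 120},
]
configurations = [
    {"real_pages": 256,  "drum_slots": 1000, "cpus": 1},
    {"real_pages": 1024, "drum_slots": 4000, "cpus": 2},
]

def profile_set():
    # cross workloads with configurations to enumerate benchmark profiles
    for w, c in product(workloads, configurations):
        yield {**w, **c}

for p in profile_set():
    print(p)   # each profile would go to the automated benchmark driver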

in parallel, a highly modified version of the performance predictor
was developed. the modified predictor would take all workload,
configuration and benchmark results done to date. the model would
then select a new workload/configuration combination, predict the
benchmark results, and dynamically specify the workload/configuration
profile to the automated benchmark process. after the benchmark was
run, the results would be fed back to the model and checked against
the predictions. then it would select another workload/configuration
and repeat the process. this was done for an additional 1000
benchmarks ... each time validating that the actual operation (cpu
usage, paging rate, distribution of cpu across different tasks, etc)
corresponded to the predictions.
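
in today's terms, the loop amounts to something like the following
sketch, where select_next, predict and run_benchmark are hypothetical
stand-ins for the modified predictor's selection step, its forecast,
and the automated benchmark driver:

def calibrate(select_next, predict, run_benchmark, rounds=1000, tol=0.10):
    history = []                                # (profile, predicted, actual)
    for _ in range(rounds):
        profile = select_next(history)          # model picks next workload/config
        predicted = predict(profile, history)   # model forecasts the results
        actual = run_benchmark(profile)         # automated benchmark runs it
        history.append((profile, predicted, actual))
        # check each predicted metric (cpu usage, paging rate, ...) against
        # the measured value, within a relative tolerance
        for metric, p in predicted.items():
            a = actual[metric]
            if abs(a - p) > tol * max(abs(p), 1e-9):
                print("deviation on %s: predicted %s, actual %s" % (metric, p, a))
    return history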

the full 2000 automated benchmarks took three months elapsed time to
run.  however, at the end, we were relatively confident that the
resource manager (cpu, dispatching, scheduling, paging, i/o, etc) was
operating consistently and predictably with respect to theory as well
as the developed analytical models across an extremely wide range of
workloads and configurations.

as a side issue, one of the things that we started with in the
automated benchmarks ... before the final 2000 were run ... were some
extremely pathological and extreme benchmarks (i.e. number of users,
total virtual pages, etc, that were ten to twenty times more than
anybody had ever run before). this put extreme stress on the
operating system and initially resulted in lots of system failures.
as a result, before starting the final resource manager phase ... i
redesigned and rewrote the internal serialization mechanism ... and
then went thru the whole kernel fixing up all sorts of things to use
the new synchronization and serialization process. when i was done,
all cases of zombie/hung users had been eliminated, as well as all
cases of system failures because of synchronization/serialization
bugs. this code was then incorporated into (and shipped as part of)
the resource manager.
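
the actual rework was of course inside the vm kernel; purely for
flavor, the discipline that eliminates orphaned waiters can be
sketched in modern terms like this (all names hypothetical):

import threading
from contextlib import contextmanager

class Event:
    # one waiter/poster rendezvous object; a task blocks on exactly
    # one of these, and the releasing path must always post it
    def __init__(self):
        self._cond = threading.Condition()
        self._posted = False

    def wait(self):
        with self._cond:
            while not self._posted:
                self._cond.wait()

    def post(self):
        with self._cond:
            self._posted = True
            self._cond.notify_all()

@contextmanager
def always_posts(event):
    # guarantee the event is posted on every exit path, normal or
    # error, so a waiter can never be orphaned (the hung/zombie-user
    # symptom)
    try:
        yield
    finally:
        event.post()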

unfortunately, over the years, various rewrites and fixes corrupted
the purity of this serialization/synchronization rework ... and you
started to again see hung/zombie users as well as some
serialization/synchronization failures.

misc. collected past posts on debugging, zombie/hung users, etc
http://www.garlic.com/~lynn/subtopic.html#dumprx

--
Anne & Lynn Wheeler | http://www.garlic.com/~lynn/
