John Summerfield wrote:
> I don't know about that; virtual machines and virtual networks are good
> to start with. For starters, they're cheap. When it virtually works is
> time to see if it really works.

I agree... people have been emulating parallelism on single machines for
as long as there has been research into parallel computing! Not only does
this serve as a way of testing that ideas/designs work, but you can even
get useful performance data out of such experiments.

Let's say you want to know how performance scales as the cluster grows
from 1 to 32 nodes. To establish a "fixed" baseline, you run all your
experiments with 32 virtual machines and load them down with a similar
workload; i.e. when you measure your cluster of 2, have the other 30
machines each run a cluster of 1 doing the same work. At the end of
this you should have pretty good performance data from which you can
make general statements about how your cluster software scales... the
fact that the data is relative to a pretty slow virtual machine which
has performance of ((real machine - scheduling overhead for 32 VMs) /
32 - VM overhead) is not really important.
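
To make the arithmetic above concrete, here is a minimal Python sketch.
The function names and every number in it are made up for illustration
(arbitrary units); nothing here was measured on a real setup.

```python
def per_vm_performance(real_machine, sched_overhead, vm_overhead, n_vms=32):
    """Rough per-VM capacity: (real - scheduling overhead) / n_vms - VM overhead."""
    return (real_machine - sched_overhead) / n_vms - vm_overhead


def relative_speedup(throughput_by_size):
    """Normalise each measured throughput to the 1-node result."""
    base = throughput_by_size[1]
    return {n: t / base for n, t in sorted(throughput_by_size.items())}


# Hypothetical host: 1000 units of capacity, 40 lost to scheduling 32 VMs,
# 2 lost inside each VM.  (1000 - 40) / 32 - 2 = 28.0
vm_perf = per_vm_performance(real_machine=1000.0, sched_overhead=40.0,
                             vm_overhead=2.0)

# Hypothetical throughput measured with all 32 VMs loaded,
# keyed by the size of the cluster under test.
measured = {1: 10.0, 2: 19.0, 4: 36.0}
speedup = relative_speedup(measured)

print(vm_perf)
print(speedup)
```

Assuming every run sits on the same loaded set of 32 VMs, the per-VM
figure is the same constant in each measurement, so it drops out when
you divide by the 1-node result. That is why the slow baseline should
not matter much for statements about scaling.
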
Will this tell you everything you want to know about the behavior of
a real cluster? Probably not, but you'll likely learn a lot of things
that will help you do a better job when you finally do build a real
cluster. In essence this is how real-world engineering is always
done... before you build the real thing you build some scale models
to test your assumptions. Because if you don't do that, mistakes are
awfully expensive.
--jurgen