OK, here is some of what we have learned about what people want on these
clusters. This is drawn from experience with users of bproc and
conventional clusters.

They were used to
ssh node cmd

and if cmd was a shell script,
ssh node script

Now with bproc (bpsh), we learned that asking people to do this instead:
./script
and to change the commands in the script from:
command
to
bpsh node-list command

in essence, to turn the script inside out, was a big hurdle for many
folks, and they did not like it, *even if it was only one line to
change*, and *even if it gave them a 1000-fold or greater performance
improvement*. I am not making this up.
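To make that concrete, here is a rough sketch of the change; the node
range and the command are made up for illustration:

the original script, run on a node via ssh:
#!/bin/sh
./run-sim input.dat

the bpsh version, run from the front end:
#!/bin/sh
bpsh 0-127 ./run-sim input.dat

One line changed, but now the script runs on the front end and pushes
the command out to nodes 0-127, instead of the whole script being run
on a node the way it would be with ssh.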

People want to ssh in and have a full system, with command history and
all that jazz. This has other implications.

And I hate to say it, but people here at SNL who run clusters for a
living have found xcpu hard to set up and use. Performance is still
disappointing and really lags bproc by quite a bit.

Setup difficulty was also true of bproc -- it had a kernel footprint,
keeping it all working was pretty awful, and it could not handle even
minor heterogeneity; e.g., a Geode and a P4 were not usable as one
bproc cluster.

No matter what, xcpu2 has to be as easy to set up and use as ssh, and
"different even if better" translates to "harder" for most people.

Anyway, I am still tired from travel, but I hope this is not too incoherent.

ron
