Hi,
What communication layer is used? How do I choose it?
Open MPI uses the fastest interconnect available. You can select the network
explicitly with MCA parameters passed to mpirun (e.g. --mca btl tcp,self
restricts it to TCP); see
http://www.open-mpi.org/faq/?category=tuning#mca-def
What is the behavior in case a node dies or becomes unreachable?
Your run will be aborted. However, there is checkpoint/restart support
for Linux: http://www.open-mpi.org/faq/?category=ft
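The checkpoint/restart support linked above works at the system level. A different, purely application-level option is for the program to save its own state periodically and, after the aborted run is restarted, resume from the last saved state. The C code below is a minimal sketch of that swapped-in technique, assuming a simple loop whose state fits in a struct. The names checkpoint_t, save_checkpoint, load_checkpoint and run_job are hypothetical. In an MPI program each rank would typically write its own file, for example one whose name includes the rank.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical job state: how far the loop got and the partial result. */
typedef struct {
    long iteration;
    double sum;
} checkpoint_t;

/* Write the state to "<path>.tmp", then rename it over <path>. On POSIX systems
 * rename() replaces the old file atomically, so a crash during the write leaves
 * the previous checkpoint intact. Raw struct bytes are only readable on the
 * same platform. Returns 0 on success, -1 on error. */
int save_checkpoint(const char *path, const checkpoint_t *cp) {
    char tmp[4096];
    int len = snprintf(tmp, sizeof tmp, "%s.tmp", path);
    if (len < 0 || (size_t)len >= sizeof tmp) return -1;
    FILE *f = fopen(tmp, "wb");
    if (!f) return -1;
    size_t n = fwrite(cp, sizeof *cp, 1, f);
    if (fclose(f) != 0 || n != 1) {
        remove(tmp);
        return -1;
    }
    return rename(tmp, path) == 0 ? 0 : -1;
}

/* Load the state from <path>. Returns 1 if a checkpoint was read, 0 if the file
 * could not be opened (fresh start, state zeroed), -1 if it is truncated. */
int load_checkpoint(const char *path, checkpoint_t *cp) {
    cp->iteration = 0;
    cp->sum = 0.0;
    FILE *f = fopen(path, "rb");
    if (!f) return 0;
    size_t n = fread(cp, sizeof *cp, 1, f);
    fclose(f);
    return n == 1 ? 1 : -1;
}

/* Toy job: sum of 0 .. total-1, checkpointed every `every` iterations.
 * To simulate the run being aborted, at most `max_steps` iterations are done
 * per call (pass -1 for no limit). Returns 1 when finished (result in *result),
 * 0 if "aborted", -1 on I/O error. */
int run_job(const char *path, long total, long every, long max_steps, double *result) {
    assert(total >= 0 && every > 0);
    checkpoint_t cp;
    if (load_checkpoint(path, &cp) < 0) return -1; /* resume where we left off */
    long steps = 0;
    while (cp.iteration < total) {
        if (steps == max_steps) return 0; /* simulated abort: unsaved work is lost */
        cp.sum += (double)cp.iteration;
        cp.iteration++;
        steps++;
        if (cp.iteration % every == 0 && save_checkpoint(path, &cp) != 0) return -1;
    }
    *result = cp.sum;
    return 1;
}
```

If the first call is cut off after 25 iterations with every = 10, the file holds the state after iteration 20, so a second call redoes only iterations 20 to 24 instead of starting over. The checkpoint interval trades I/O cost against how much work is lost on an abort.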
What makes any given machine become a node available for tasks?
You list it in a host file, or a batch system tells Open MPI which nodes to use.
Is there some sort of load balancing?
No, you have to distribute the work among the processes yourself.
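For independent work items of roughly equal cost, a simple way to do that yourself is a static block decomposition: each process computes which contiguous slice of the items it owns. A minimal sketch in C, assuming n items and nprocs processes; block_range is a hypothetical helper, not part of MPI.

```c
#include <assert.h>

/* Static block decomposition: split `n` work items as evenly as possible over
 * `nprocs` processes. The first n % nprocs ranks get one extra item, so no two
 * ranks differ by more than one item. Rank `rank` owns items [*begin, *end). */
void block_range(long n, int nprocs, int rank, long *begin, long *end) {
    assert(n >= 0 && nprocs > 0 && rank >= 0 && rank < nprocs);
    long base = n / nprocs;  /* items every rank gets */
    long extra = n % nprocs; /* leftover items, one each for the lowest ranks */
    *begin = rank * base + (rank < extra ? rank : extra);
    *end = *begin + base + (rank < extra ? 1 : 0);
}
```

In an MPI program, nprocs and rank would come from MPI_Comm_size and MPI_Comm_rank on MPI_COMM_WORLD, and each rank would then process items begin to end-1. If item costs vary a lot, a dynamic master/worker scheme, where idle processes ask a coordinator for more work, usually balances better.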
Is there a monitoring tool that would give me indications of the
status and health of the nodes?
That is outside the scope of MPI; cluster monitoring tools such as Nagios or Ganglia can do that.
How does the "MPI enabled" code get transferred to the nodes? If I
understand things correctly, I would have to write a separate command-line
exe that takes care of the tasks, and this would be the exe that
gets sent over to the nodes.
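Usually you use a shared file system (e.g. NFS), so every node sees the
same executable. You don't need a separate exe: mpirun starts copies of
the same MPI program on every node, and each copy decides what to do from
its rank.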
I'm quite sure all these are trivial questions for those with more
experience, but I'm having a hard time finding resources that would
answer those.
Read an introduction to programming with MPI and another one on Beowulf
clusters (batch systems, monitoring, shared file systems). This should
give you enough information on the topic. If you don't mind spending
more money on software, you can also take a look at Microsoft's HPC Server.
Nico