Hi,

What communication layer is used? How do I choose it?

The fastest one available. You can choose the network with MCA parameters passed to mpirun; see
http://www.open-mpi.org/faq/?category=tuning#mca-def
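As a rough illustration, here is how such a command line could be put together if you wanted to restrict a run to the TCP transport. The program name `./my_app` and the process count are placeholders I made up, and which BTL components exist depends on your Open MPI build, so treat this as a sketch rather than a recipe.

```python
def mpirun_command(program, nprocs, btl="tcp,self"):
    """Build an mpirun argument list that sets the 'btl' MCA parameter.

    'program' and 'nprocs' are placeholders; 'tcp,self' assumes the TCP
    and self (loopback) components are available in your installation.
    """
    return ["mpirun", "--mca", "btl", btl, "-np", str(nprocs), program]


cmd = mpirun_command("./my_app", 4)
print(" ".join(cmd))  # the command you would type in a shell
```

Leaving out the `--mca btl ...` part lets Open MPI make the choice on its own.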

What is the behavior in case a node dies or becomes unreachable?

Your run will be aborted. However, there is checkpoint/restart support for Linux; see http://www.open-mpi.org/faq/?category=ft

What makes any given machine become a node available for tasks?

You list it in a host file, or a batch system tells Open MPI which nodes it may use.
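For the host file case, a minimal sketch: each line names a machine and, optionally, how many processes it may run via `slots=`. The node names `node01` and `node02` are invented for the example. You would save the text to a file and hand it to mpirun with `--hostfile`.

```python
def hostfile_text(nodes):
    """Render a {hostname: slots} mapping as Open MPI host file lines."""
    return "".join(f"{host} slots={slots}\n" for host, slots in nodes.items())


# Hypothetical cluster: two machines with 4 and 2 usable slots.
text = hostfile_text({"node01": 4, "node02": 2})
print(text, end="")
```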

Is there some sort of load balancing?

No, you have to do that yourself.
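Doing it yourself usually means your program decides which process works on what. A minimal sketch of a static split follows; `rank` and `size` stand in for the values an MPI program gets from its communicator, and the function name is my own. It runs without any MPI library.

```python
def block_range(n_items, rank, size):
    """Return the half-open range [start, stop) of items that 'rank' handles.

    Items are spread as evenly as possible: the first (n_items % size)
    ranks each take one extra item.
    """
    base, extra = divmod(n_items, size)
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return start, stop


# Show how 10 work items would be divided among 4 processes.
for rank in range(4):
    print(rank, block_range(10, rank, 4))
```

If the cost per item varies a lot, a master/worker scheme that hands out items on demand balances better, but you have to write that by hand as well.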

Is there a monitoring tool that would give me indications of the status and health of the nodes?

This has nothing to do with MPI. Nagios or Ganglia can do that.

How does the "MPI enabled" code get transferred to the nodes? If I understand things correctly, I would have to write a separate command line exe that takes care of the tasks and this would be the exe that gets sent over to the node.

Usually you use a shared file system, so every node sees the same executable at the same path.

I'm quite sure all these are trivial questions for those with more experience, but I'm having a hard time finding resources that would answer those.

Read an introduction on programming with MPI and another one on Beowulf clusters (batch systems, monitoring, shared file systems). This should give you enough information on the topic. If you don't mind spending more money on software, you can also take a look at Microsoft's HPC Server.

Nico
