Hi,

I've finally managed to fix the problems I had! Apart from the MPI problem I reported previously, the same thing happened when submitting single (non-MPI) jobs: again, only the client node executed the jobs. This happened because the file "/var/spool/pbs/server_priv/nodes" didn't include the server's hostname and because pbs_mom wasn't set to run at startup on the server. After adding the server's hostname to the file (or adding it with `qmgr`) and starting pbs_mom, regular job submission seemed to work fine.
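As a rough sketch of the nodes-file fix, the check-and-append step could look like the following. The function name ensure_node_listed is made up for illustration and is not part of PBS or OSCAR; the only thing taken from my setup is the path of the nodes file.

```shell
#!/bin/sh
# Sketch only: make sure a host is listed in a PBS nodes file.
# ensure_node_listed is a hypothetical helper, not a PBS or OSCAR command.
ensure_node_listed() {
    nodes_file=$1
    host=$2
    # The host name is the first whitespace-separated field of each line;
    # attributes such as np=2 may follow it.
    if awk -v h="$host" '$1 == h { found = 1 } END { exit !found }' "$nodes_file" 2>/dev/null; then
        echo "$host already listed"
    else
        # If the file is non-empty and lacks a final newline, add one first
        # so the new host does not get glued onto the previous line.
        if [ -s "$nodes_file" ] && [ -n "$(tail -c 1 "$nodes_file")" ]; then
            echo >> "$nodes_file"
        fi
        echo "$host" >> "$nodes_file"
        echo "$host added"
    fi
}

# Intended use on the server, as root (path as in the message above):
#   ensure_node_listed /var/spool/pbs/server_priv/nodes "$(hostname)"
```

As far as I know, pbs_server reads this file at startup, so it probably has to be restarted after the file is edited by hand.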
I hoped that fixing this problem would also fix the MPI one, but it didn't. I kept looking and managed to get a little further. It looks like several things were missing when I installed OSCAR. I found out that the MPI libraries ("lam-libs.x86_64") weren't installed on the client node (is this normal? I thought the OSCAR image had all the needed libraries) and that the LD_LIBRARY_PATH environment variable wasn't set. I installed the MPI libraries and set the variable, and MPI worked, BUT only on the client node!

Now the problem had to be with LAM/MPI. I looked through the site and found out that $PBS_NODEFILE must include ALL computation nodes. I added the server hostname to a new file (I cannot change the original one, since it is dynamically generated), but still no good. In the end the problem was that the LAM/MPI version that comes with Fedora doesn't support the "ssi boot tm" option, so I just had to change "ssi boot" to "rsh". One then just has to boot LAM/MPI with the command: "lamboot -ssi boot rsh -v node.file".

In summary:

/var/spool/pbs/server_priv/nodes - must include all execution nodes
pbs_mom - run at startup on every execution host (server included)
lam-libs.x86_64 - install on every host
$LD_LIBRARY_PATH - set to include the MPI libraries
$PBS_NODEFILE file - must include the hostnames of all execution hosts
PBS script - use "lamboot -ssi boot rsh -v node.file" instead of the command shown in the samples

I hope someone fixes these issues in the next OSCAR release.

FG

PS - some of these fixes may be inaccurate, but this is how I managed to get the OSCAR cluster working. I would appreciate it if someone from the OSCAR development team could check them.
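For reference, the hostfile step described above could be sketched roughly like this. The function name build_lam_hostfile and the host name head-node are placeholders invented for the example; only the lamboot command line is the one I actually used.

```shell
#!/bin/sh
# Sketch only: build a LAM hostfile from the PBS node list plus one extra host.
# build_lam_hostfile is a hypothetical helper, not part of LAM/MPI or PBS.
build_lam_hostfile() {
    pbs_nodefile=$1   # normally "$PBS_NODEFILE" inside a job
    extra_host=$2     # the server's hostname
    out_file=$3       # e.g. node.file
    # The PBS node list can name the same host several times; skip blank
    # lines and keep each host once, in order of first appearance.
    { cat "$pbs_nodefile"; echo "$extra_host"; } |
        awk 'NF && !seen[$1]++ { print $1 }' > "$out_file"
}

# Inside a PBS job script this might be used as follows
# (head-node and my_mpi_program are placeholders):
#   build_lam_hostfile "$PBS_NODEFILE" head-node node.file
#   lamboot -ssi boot rsh -v node.file
#   mpirun C ./my_mpi_program
#   lamhalt
```

The original $PBS_NODEFILE is left untouched; the combined list goes to a separate file, which is what lamboot is then pointed at.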