A few other points that should be mentioned:

- The same problem occurs for the PVM, LAM/MPI, and MPICH tests because they all compile source code out on the compute nodes.

- Compiling on local disk doesn't exhibit the same problem. For example, if you copy cpi.c to /tmp and compile it there, it takes less than a second (as it should). Compiling the same cpi.c with the exact same command in an NFS-mounted directory (i.e., on one of the OSCAR nodes that has /home mounted from the OSCAR head node) takes several minutes. This suggests that the problem is NFS-related.

- LAM uses a wrapper compiler "mpicc" to compile cpi.c ("mpicc cpi.c -o lam-cpi"). This simply adds several compiler/linker flags and then invokes gcc, so the actual command is really "gcc ...". For completeness: if you bypass mpicc and run the underlying gcc command directly, the timing results are the same. So I don't think this is a problem with LAM/MPI (or PVM or MPICH). It really seems to be an NFS issue.

- Going on the theory that this was an NFS problem, I tried copying the lam-cpi executable from /tmp to the NFS directory (~oscartst) a few times and didn't observe any noticeable delay. lam-cpi is not a large file (IIRC, it was under 1MB?). So it really seems to be some kind of bad interaction between the linker and NFS. Perhaps something to do with file locking...?

- After the application is compiled, it runs just fine. I tested specifically with LAM/MPI, but I think the issue is the same for PVM and MPICH as well. That is, I can lamboot/mpirun/lamhalt within a few seconds. So it does not look like a network latency / message passing issue (since the test apps for PVM, LAM/MPI, and MPICH are all using the network for message passing).


So this raises a few questions:

- How does one tune NFS? We played with a few different parameters in /etc/exports on the head node and /etc/fstab on the client nodes, but nothing seemed to help. Is this a file locking issue somehow?

- Are there any known problems with RHAS NFS? (DongInn noted the specific version in his original mail)


On Dec 21, 2004, at 9:52 AM, DongInn Kim wrote:

Thanks Michael,

One thing I forgot to tell you: this problem (the long time needed to compile C code, for LAM or anything else) does not happen
on the head node.
So gcc might be trying to optimize for ia64 or something else on the client nodes, but it does not seem to do so on the head node.


Could you please try testing this on your machines?


[EMAIL PROTECTED] lam]# cat hello.c
#include <stdio.h>
int main() {
printf("Hello, world\n");
return 0;
}
[EMAIL PROTECTED] lam]# time gcc hello.c -o hello
real 0m5.635s
user 0m0.060s
sys 0m0.020s
[EMAIL PROTECTED] lam]# ssh oscar_server
Warning: Permanently added 'oscar_server' (RSA) to the list of known hosts.
Last login: Mon Dec 20 16:44:30 2004 from loudsl01.4.0.6.117.iglou.com
[EMAIL PROTECTED] root]# cat hello.c
#include <stdio.h>
int main() {
printf("Hello, world\n");
return 0;
}
[EMAIL PROTECTED] root]# time gcc hello.c -o hello
real 0m0.120s
user 0m0.060s
sys 0m0.040s
[EMAIL PROTECTED] root]#




Regards,

DongInn.

Michael Edwards wrote:

Did you check whether it is trying to optimize for ia64 or something
odd, since I think that is what RHEL is mainly used on? Or is it just
a matter of compiling on a 64-bit machine, which makes it produce
32/64-bit code?

I suppose I should figure these kinds of things out, since I have some
Opterons sitting here waiting for me to get some setup time.


On Mon, 20 Dec 2004 22:58:09 -0500, DongInn Kim <[EMAIL PROTECTED]> wrote:


Hi all,

I have a problem testing MPICH and LAM/MPI on RHEL AS 3 (IA32, not IA64).

This is my cluster environment for testing OSCAR 4.0b6r2801.

- OS : RHEL (Red Hat Enterprise Linux) AS (Advanced Server) release 3
(Taroon Update 3).
- Kernel : 2.4.21-20.EL
- NFS : nfs-utils-1.0.6-31EL
- MySQL server : rebuilt from the mysql src rpm
ftp://ftp.redhat.com/pub/redhat/linux/enterprise/3/en/os/i386/SRPMS/
- /etc/exports on my head node
/home 192.168.0.6/255.255.255.0(async,rw,no_root_squash)
- /home of /etc/fstab on the client node
nfs_oscar:/home /home nfs rw 0 0
- rpcinfo
[EMAIL PROTECTED] root]# rpcinfo -p | grep nfs
100003 2 udp 2049 nfs
100003 3 udp 2049 nfs
100003 2 tcp 2049 nfs
100003 3 tcp 2049 nfs
[EMAIL PROTECTED] root]#


Two tests (MPICH and LAM/MPI) fail at step 8, "Test Cluster Setup".
As far as I can tell, the immediate cause is that they exceed the
time limit (30 seconds).


However, there is no reason for MPICH and LAM to take this long to
compile, so I checked how long it takes to compile the LAM code on the
client nodes.
Surprisingly, it takes more than a minute to compile cpi, which is a C
program of about 100 lines.
I checked whether the same thing happens with other ordinary C programs,
and it does.
So I think the problem may be NFS.


Could anyone who has RHEL AS 3 (IA32) please test the new SVN
version of OSCAR, or the oscar-4.0br2801 build posted by Thomas,
and check whether you see the same problem, even if your environment
is not exactly the same as the one described above?
http://www.csm.ornl.gov/~naughton/oscar/testing/


As far as I have tested, RH9 and FC2 have never had this problem. Several
people have confirmed that this problem does not occur on RH9 and FC2.
Two people I know of run RHEL AS 3 on IA64 (not IA32), and they don't
have this problem either.

I would appreciate it if you could test this and share any ideas.

Regards,

DongInn.

-------------------------------------------------------
SF email is sponsored by - The IT Product Guide
Read honest & candid reviews on hundreds of IT Products from real users.
Discover which products truly live up to the hype. Start reading now.
http://productguide.itmanagersjournal.com/
_______________________________________________
Oscar-devel mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/oscar-devel







-- {+} Jeff Squyres {+} [EMAIL PROTECTED] {+} http://www.lam-mpi.org/


