Don,

Good to know the information on the other node.

Both the kernel and the openib software were compiled separately on each machine.  The corresponding logs from 'jatoba' are attached below.  None of the directories are shared.  I compile the "cpi.c" program on each machine, but the directory structure is the same, i.e. the "cpi" executable is under /home/ib/test/mpi/cpi/cpi on each machine.

This is where the problem comes from.

The differences between these two nodes cause the same mvapich source code to be configured differently, which is enough to cause incompatibilities at run time. The root cause may be either the Linux installations (i.e. 32-bit versus 64-bit mode on EM64T) or differences in the libraries you installed. You can run diff on the two config-mine.log files you have. Among the various differences between them, one is particularly important: the different sizes of int, pointers, and long, as shown in the following portion.

++++++++++++++++++++
137,141c142,145
< checking for size of void *... unavailable
< checking for pointers greater than 32 bits... no
< checking for size of int... unavailable
< checking for int large enough for pointers... yes
< checking for size of void *... unavailable
---
> checking for size of void *... 8
> checking for pointers greater than 32 bits... yes
> checking for size of int... 4
> checking for int large enough for pointers... no
++++++++++++++++++++++

With this in mind, I am curious: have you been able to run any MPI implementation across these two nodes? Or mvapich with mpirun_rsh instead of mpirun_mpd? It would not be surprising if the answer is no.

The size differences above lead to differences in the layout of many data structures. That is why you are not able to run either mvapich-gen2 or mvapich2-gen2. In a broader context, these two nodes can be taken as a sample case of a heterogeneous configuration. We have plans to work out solutions for this kind of heterogeneity in mvapich/mvapich2, but it may take some more time before they are ready.

That leaves the question of how to get these two nodes to run mvapich. I would suggest that you first unify the system installation on the two nodes, then compile the OpenIB/gen2 kernel and userspace on one node and distribute the result to the other(s). Do the same for building, installing, and running mvapich/mvapich2.

Please keep us updated on how this gets solved in the end.

Thanks,
Weikuan
--
Weikuan Yu, Computer Science, OSU
http://www.cse.ohio-state.edu/~yuw

_______________________________________________
openib-general mailing list
[email protected]
http://openib.org/mailman/listinfo/openib-general
