Weikuan,

Wow! Thanks for the analysis! I knew there were some differences in the controller boards and PCI bus layout between the two machines, but I never would have guessed that something as basic as pointer sizes was set up differently! I will have to dig into the history of these two machines a bit. I know that "koa" at least had both Red Hat and SUSE distributions installed at one time or another, but I am not sure about "jatoba".

You are also correct that I could not get any version of MPI to run between the two machines.

Thanks, again!

  -Don Albert-


Weikuan Yu <[EMAIL PROTECTED]> wrote on 03/22/2006 07:57:11 PM:

> Don,
>
> Good to know the information on the other node.
>
> > Both the kernel and the openib software were compiled separately on
> > each machine.  The corresponding logs from 'jatoba' are attached
> > below.  None of the directories are shared.  For compiling the "cpi.c"
> > program, I compile it on each machine, but the directory structure is
> > the same:  i.e. the "cpi" executable is under
> > /home/ib/test/mpi/cpi/cpi on each machine.
>
> This is where the problem came from!
>
> The differences between these two nodes are causing the same mvapich
> source code to be configured differently, which is enough to cause the
> incompatibilities at run time. The cause may be either the Linux
> installations (i.e. 32-bit versus 64-bit mode on EM64T) or differences
> in the libraries you installed. You can take a diff of the two
> config-mine.log files you have. Among the various differences between
> them, one that is particularly important is the different sizes of
> int, pointers and long, as shown by the following portion.
>
> ++++++++++++++++++++
> 137,141c142,145
> < checking for size of void *... unavailable
> < checking for pointers greater than 32 bits... no
> < checking for size of int... unavailable
> < checking for int large enough for pointers... yes
> < checking for size of void *... unavailable
> ---
>  > checking for size of void *... 8
>  > checking for pointers greater than 32 bits... yes
>  > checking for size of int... 4
>  > checking for int large enough for pointers... no
> ++++++++++++++++++++++
>
> Taking this into consideration, and just out of curiosity: have you
> been able to run any MPI implementation across these two nodes? Or
> mvapich with mpirun_rsh instead of mpirun_mpd? It wouldn't be
> surprising if the answer is no.
>
> The size differences above lead to differences in many of the
> structures. That is why you are not able to run either mvapich-gen2 or
> mvapich2-gen2. In a somewhat larger context, these two nodes can be
> taken as a sample case of heterogeneous configurations. We have plans
> to work out solutions for this kind of heterogeneity in
> mvapich/mvapich2. It may take some more time to get ready.
>
> So that leaves the question of how to get these two nodes to run
> mvapich. I would suggest you first unify the system installation on
> these two nodes, then compile the OpenIB/gen2 kernel and userspace on
> one node and distribute them to the other(s). The same goes for
> building, installing and running mvapich/mvapich2.
>
> Please keep us updated about how this gets solved in the end.
>
> Thanks,
> Weikuan
> --
> Weikuan Yu, Computer Science, OSU
> http://www.cse.ohio-state.edu/~yuw
>
_______________________________________________
openib-general mailing list
[email protected]
http://openib.org/mailman/listinfo/openib-general
