Hi, Don,
Good to know that you are able to run mvapich with mpirun_rsh. We can
now focus on the MPD problem. We have never attempted to run the MPD_RING
option as the root user. Just curious: were you able to run mvapich2-gen2
with MPD_RING? The two share more or less the same code. Could you try the
following possibilities and send us all the relevant log files?
a) rpm -e lam (a quick check for leftover MPI packages is sketched below,
after item (c)).
The reason for this is that I noticed LAM showing up in your config.log
earlier. It may help configure if you remove the other MPI packages that
are on your path.
b) Try mvapich-gen2 with mpd_ring, either as root or as a regular user.
Please configure, build, and install on one node, then propagate the
installation to the other node and see if it runs (a rough sketch of this
also follows below). We can look into separate per-node builds later on.
BTW, make sure you do `make install' at the end of the configure/build.
c) If possible, could you also try mvapich2-gen2 with mpd_ring, since the
mpd_ring-related code is similar there? That may help locate the problem.
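
For (a), one quick way to see which MPI packages are installed before
removing LAM; the grep patterns below are only examples, so match them
against whatever rpm -qa actually reports on your nodes:

    # list MPI-related rpms that might conflict (patterns are examples)
    rpm -qa | grep -i -E 'lam|mpich|openmpi'
    # remove the LAM package
    rpm -e lam
    # confirm no stale MPI wrappers remain earlier in the PATH
    which mpicc mpirun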
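
For (b), a rough sketch of the build-and-propagate sequence we have in
mind, assuming the same install prefix on both nodes; the prefix, the
hostname, and the way the build script is invoked are placeholders:

    # on node1: configure and build mvapich-gen2 via its build script
    cd mvapich-gen2
    ./make.mvapich.gen2        # edit paths/options in the script first
    make install               # do not skip this step
    # copy the finished installation to the second node (paths are examples)
    rsync -a /usr/local/mvapich/ node2:/usr/local/mvapich/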
Thanks,
Weikuan
On Apr 6, 2006, at 8:02 PM, [EMAIL PROTECTED] wrote:
Weikuan
I previously reported that I was having problems running any MPI jobs
between a pair of EM64T machines running RHEL4 Update 3 with the OpenIB
modules (kernel version 2.6.9-34.ELsmp) and the "mvapich-gen2" code from
the OpenIB svn tree. I was having two problems:
1. When I tried to run from user mode, I would get segmentation
faults
2. When I ran from root, the jobs would fail with the following
message: "cpi: pmgr_client_mpd.c:254: mpd_exchange_info: Assertion
`len_remote == len_local' failed. ".
The first problem turned out to be a memory-limit issue; I had to
increase the maximum locked-in-memory address space (memlock) in the
user limits.
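
One common way to raise that limit, shown only as a sketch since the
exact values and mechanism depend on the distribution and site policy, is
an entry in /etc/security/limits.conf:

    # /etc/security/limits.conf -- raise the locked-memory limit for all users
    # (example values; a specific KB limit can be used instead of unlimited)
    *    soft    memlock    unlimited
    *    hard    memlock    unlimited

    # verify from a fresh login shell
    ulimit -l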
The second problem seemed to be more related to process management
than to MPI itself. I remembered that when I modified the
"make.mvapich.gen2" build script, there was a parameter for MPD:
    # Whether to use an optimized queue pair exchange scheme. This is not
    # checked for a setting in the script. It must be set here explicitly.
    # Supported: "-DUSE_MPD_RING", "-DUSE_MPD_BASIC" and "" (to disable)
    HAVE_MPD_RING=""
Because I wanted to use MPD to launch jobs, I set
HAVE_MPD_RING="-DUSE_MPD_RING" in the build script.
I went back and set the parameter to HAVE_MPD_RING="" to disable it and
rebuilt, which meant that MPD was not installed. Using "mpirun_rsh", I am
now able to run the MPI jobs, including "cpi", "mping", and other
benchmark tests.
There seems to be a problem with "USE_MPD_RING". Have you seen this
before? Should I try with "USE_MPD_BASIC" instead?
-Don Albert-
--
Weikuan Yu, Computer Science, OSU
http://www.cse.ohio-state.edu/~yuw
_______________________________________________
openib-general mailing list
[email protected]
http://openib.org/mailman/listinfo/openib-general
To unsubscribe, please visit http://openib.org/mailman/listinfo/openib-general