Mark,
I think the problem is likely due to the networking differences
between the nodes. Check out these two FAQ entries:
http://www.open-mpi.org/faq/?category=tcp#tcp-multi-network
http://www.open-mpi.org/faq/?category=tcp#tcp-selection
Specifically, I think you should try using a pair of these
Using today's SVN (1.3a1r17234), built against the installed OFED 1.2.5.4,
it works!
Regards,
Mostyn
On Thu, 24 Jan 2008, Mostyn Lewis wrote:
Hello,
I have a very simple MPI program hanging in MPI_Reduce using the openmpi-1.2.4-1
as supplied with OFED 1.2.5.4 (running this too).
I don't think so; we are using the HDF5 serial I/O module, our hosts
have just 1 Gb Ethernet, and our OSS has gigabit also. But again, our
Lustre setup is brand-new with only a few users, so it's effectively idle.
We also see the same behavior on NFS v3 backed by OnStor bobcats.
Brock Palen
Center
Brock,
The only thing that came to mind was that possibly on the second dump,
the I/O was substantial enough to cause an overload of the OSS's (I/O
servers) resulting in a process or task hang? Can you tell if your
Lustre environment is getting overwhelmed when the Open MPI / FLASH
combinatio
I started a new run with some changes,
Shortening the run won't work well; it takes 3 days just to get
through the AMR.
Brock Palen
Center for Advanced Computing
bro...@umich.edu
(734)936-1985
On Jan 25, 2008, at 3:01 PM, Daniel Pfenniger wrote:
Hi,
Brock Palen wrote:
Is anyone using fla
Hi,
Brock Palen wrote:
Is anyone using flash with Open MPI? We are here, but whenever it
tries to write its second checkpoint file it segfaults once it gets
to 2.2 GB, always in the same location.
Debugging is a pain as it takes 3 days to get to that point. Just
wondering if anyone else h
On Thu, Jan 24, 2008 at 10:09:51PM -0500, Barry Rountree wrote:
> On Thu, Jan 24, 2008 at 04:03:49PM -0500, Tim Mattox wrote:
> > Hello Barry,
> > I am guessing you are trying to use a threaded build of Open MPI...
> >
> > Unfortunately, the threading support in Open MPI 1.2.x is not only not well
Yes that is true.
The underlying filesystems are either NFSv3 provided by an OnStor
Bobcat, or a simple Lustre cluster.
All systems are 64-bit x86_64. We create files larger than 2 GB all the
time.
Brock Palen
Center for Advanced Computing
bro...@umich.edu
(734)936-1985
On Jan 25, 2008, at
I'm guessing he means the ASC FLASH code which simulates star explosions...
Brock?
Jeff F. Pummill
University of Arkansas
Doug Reeder wrote:
Brock,
Do you mean flash memory, like a USB memory stick? What kind of file
system is on the memory? Is there some filesystem limit you are
bump
Brock,
Do you mean flash memory, like a USB memory stick? What kind of file
system is on the memory? Is there some filesystem limit you are
bumping into?
Doug Reeder
On Jan 25, 2008, at 8:38 AM, Brock Palen wrote:
Is anyone using flash with Open MPI? We are here, but whenever it
tries to
Is anyone using flash with Open MPI? We are here, but whenever it
tries to write its second checkpoint file it segfaults once it gets
to 2.2 GB, always in the same location.
Debugging is a pain as it takes 3 days to get to that point. Just
wondering if anyone else has seen this same behavio
I will be out of the office starting 01/25/2008 and will not return until
02/25/2008.
I will be on vacation starting Dec 19 and will be back on Jan 3. I will
respond to your message when I return.
Hi all,
I have encountered a problem with the routine MPI_ACCUMULATE when summing
MPI_REAL8 values into a large memory window at a large offset.
My program, running on a single processor (x86_64 architecture), crashes with
an error message like:
[node14:1623