Re: [OMPI users] odd network behavior

2008-01-25 Thread Tim Mattox
Mark, I think the problem is likely due to the networking differences between the nodes. Check out these two FAQ entries: http://www.open-mpi.org/faq/?category=tcp#tcp-multi-network http://www.open-mpi.org/faq/?category=tcp#tcp-selection Specifically, I think you should try using a pair of these

Re: [OMPI users] openmpi-1.2.4-1/OFED 1.2.5.4 ConnectX MPI_Reduce hang

2008-01-25 Thread Mostyn Lewis
Using today's SVN (1.3a1r17234) and building in the context of the installed OFED 1.2.5.4, it works! Regards, Mostyn On Thu, 24 Jan 2008, Mostyn Lewis wrote: Hello, I have a very simple MPI program hanging in MPI_Reduce using the openmpi-1.2.4-1 as supplied with OFED 1.2.5.4 (running this too).

Re: [OMPI users] flash2.5 with openmpi

2008-01-25 Thread Brock Palen
I don't think so. We are using the HDF5 serial I/O module, our hosts have just 1 Gb Ethernet, and our OSS has gigabit also. But again, our Lustre setup is brand-new with only a few users, so it's effectively idle. We also see the same behavior on NFS v3 backed by OnStor bobcats. Brock Palen Center

Re: [OMPI users] flash2.5 with openmpi

2008-01-25 Thread Jeff Pummill
Brock, The only thing that came to mind was that possibly on the second dump, the I/O was substantial enough to cause an overload of the OSS's (I/O servers) resulting in a process or task hang? Can you tell if your Lustre environment is getting overwhelmed when the Open MPI / FLASH combinatio

Re: [OMPI users] flash2.5 with openmpi

2008-01-25 Thread Brock Palen
I started a new run with some changes. Shortening the run won't work well; it takes 3 days just to get through the AMR. Brock Palen Center for Advanced Computing bro...@umich.edu (734)936-1985 On Jan 25, 2008, at 3:01 PM, Daniel Pfenniger wrote: Hi, Brock Palen wrote: Is anyone using fla

Re: [OMPI users] flash2.5 with openmpi

2008-01-25 Thread Daniel Pfenniger
Hi, Brock Palen wrote: Is anyone using flash with openMPI? We are here, but whenever it tries to write its second checkpoint file, it segfaults once it gets to 2.2GB, always in the same location. Debugging is a pain as it takes 3 days to get to that point. Just wondering if anyone else h

Re: [OMPI users] Occasional mpirun hang on completion

2008-01-25 Thread Barry Rountree
On Thu, Jan 24, 2008 at 10:09:51PM -0500, Barry Rountree wrote: > On Thu, Jan 24, 2008 at 04:03:49PM -0500, Tim Mattox wrote: > > Hello Barry, > > I am guessing you are trying to use a threaded build of Open MPI... > > > > Unfortunately, the threading support in Open MPI 1.2.x is not only not well

Re: [OMPI users] flash2.5 with openmpi

2008-01-25 Thread Brock Palen
Yes, that is true. The underlying filesystems are either NFSv3 provided by an OnStor bobcat, or a simple Lustre cluster. All systems are 64-bit x86_64. We create files larger than 2GB all the time. Brock Palen Center for Advanced Computing bro...@umich.edu (734)936-1985 On Jan 25, 2008, at

Re: [OMPI users] flash2.5 with openmpi

2008-01-25 Thread Jeff Pummill
I'm guessing he means the ASC FLASH code, which simulates star explosions... Brock? Jeff F. Pummill University of Arkansas // Doug Reeder wrote: Brock, Do you mean flash memory, like a USB memory stick? What kind of file system is on the memory? Is there some filesystem limit you are bump

Re: [OMPI users] flash2.5 with openmpi

2008-01-25 Thread Doug Reeder
Brock, Do you mean flash memory, like a USB memory stick? What kind of file system is on the memory? Is there some filesystem limit you are bumping into? Doug Reeder On Jan 25, 2008, at 8:38 AM, Brock Palen wrote: Is anyone using flash with openMPI? We are here, but whenever it tries to

[OMPI users] flash2.5 with openmpi

2008-01-25 Thread Brock Palen
Is anyone using flash with openMPI? We are here, but whenever it tries to write its second checkpoint file, it segfaults once it gets to 2.2GB, always in the same location. Debugging is a pain as it takes 3 days to get to that point. Just wondering if anyone else has seen this same behavio

[OMPI users] Swamy Kandadai is out of the office.

2008-01-25 Thread Swamy Kandadai
I will be out of the office starting 01/25/2008 and will not return until 02/25/2008. I will be on vacation starting Dec 19 and will be back on Jan 3. I will respond to your message when I return.

[OMPI users] bug in MPI_ACCUMULATE for window offsets > 2**31 - 1 bytes? openmpi v1.2.5

2008-01-25 Thread Stefan Knecht
Hi all, I encounter a problem with the routine MPI_ACCUMULATE when trying to sum up MPI_REAL8's on a large memory window with a large offset. My program, running on a single processor (x86_64 architecture), crashes with an error message like: [node14:1623