Re: [OMPI users] EXTERNAL: Re: MPI_GET beyond 2 GB displacement

2010-07-07 Thread Jed Brown
On Wed, 07 Jul 2010 17:34:44 -0600, "Price, Brian M (N-KCI)" wrote: > Jed, > > The IBM P5 I'm working on is big endian. Sorry, that didn't register. The displ argument is MPI_Aint which is 8 bytes (at least on LP64, probably also on LLP64), so your use of kind=8 for

Re: [OMPI users] MPI_GET beyond 2 GB displacement

2010-07-07 Thread Changsheng Jiang
Does this mean we have to split the MPI_Get into many 2GB parts? I have an MPI program which first serializes an object and sends it to another process. The char array after serialization is just below 2GB now, but the data is growing. One method is to build a large type with MPI_Type_vector, align the char
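
One way to picture the splitting asked about above is a helper that cuts a large byte range into pieces that each fit in a 32-bit signed count. This is only a sketch under assumptions: the name split_transfer and the 2**31 - 1 limit are illustrative, and the MPI_Get calls themselves are left out.

```python
from typing import Iterator, Tuple

# Largest value a 32-bit signed integer can hold (assumed per-call limit).
INT32_MAX = 2**31 - 1


def split_transfer(total_bytes: int, max_chunk: int = INT32_MAX) -> Iterator[Tuple[int, int]]:
    """Yield (offset, length) pairs covering total_bytes in chunks of at most max_chunk.

    Hypothetical helper for illustration; not part of any MPI library.
    """
    if total_bytes < 0:
        raise ValueError("total_bytes must be non-negative")
    if max_chunk <= 0:
        raise ValueError("max_chunk must be positive")
    offset = 0
    while offset < total_bytes:
        length = min(max_chunk, total_bytes - offset)
        yield offset, length
        offset += length


if __name__ == "__main__":
    # A 5 GiB buffer needs three pieces under this limit.
    for offset, length in split_transfer(5 * 2**30):
        print(offset, length)
```

Each (offset, length) pair would correspond to one transfer; the offset still has to be carried in a 64-bit variable, since it exceeds 2**31 after the first piece.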

Re: [OMPI users] Open MPI error MPI_ERR_TRUNCATE: message truncated

2010-07-07 Thread David Zhang
This error typically occurs when the received message is bigger than the specified buffer size. You need to narrow your code down to the offending receive call to see if this is indeed the case. On Wed, Jul 7, 2010 at 8:42 AM, Jack Bryan wrote: > Dear All: > > I need to

[OMPI users] configure options

2010-07-07 Thread Zhigang Wei
Dear all, How can I decide on the configure options? I am greatly confused. I am using my school's high performance computer, but the Open MPI there is version 1.3.2, which is old, so I want to build a newer one. I am new to Open MPI. I have built Open MPI and it doesn't work; I built and installed it to my

Re: [OMPI users] EXTERNAL: Re: MPI_GET beyond 2 GB displacement

2010-07-07 Thread Price, Brian M (N-KCI)
Jed, The IBM P5 I'm working on is big endian. The test program I'm using is written in Fortran 90 (as stated in my question). I imagine this is indeed a library issue, but I still don't understand what I've done wrong here. Can anyone tell me how I should be building my OpenMPI libraries and

Re: [OMPI users] EXTERNAL: Re: MPI_GET beyond 2 GB displacement

2010-07-07 Thread Jed Brown
On Wed, 07 Jul 2010 15:51:41 -0600, "Price, Brian M (N-KCI)" wrote: > Jeff, > > I understand what you've said about 32-bit signed INTs, but in my program, > the displacement variable that I use for the MPI_GET call is a 64-bit INT > (KIND = 8). The MPI Fortran

Re: [OMPI users] EXTERNAL: Re: MPI_GET beyond 2 GB displacement

2010-07-07 Thread Price, Brian M (N-KCI)
Jeff, I understand what you've said about 32-bit signed INTs, but in my program, the displacement variable that I use for the MPI_GET call is a 64-bit INT (KIND = 8). In fact, the only thing in my program that isn't a 64-bit INT is the array that I'm trying to transfer values from. I would

Re: [OMPI users] trouble using openmpi under slurm

2010-07-07 Thread Jeff Squyres
+1. FWIW, Open MPI works pretty well with SLURM; I use it back here at Cisco for all my testing. That one particular option you're testing doesn't seem to work, but all in all, the integration works fairly well. On Jul 7, 2010, at 3:27 PM, Ralph Castain wrote: > You'll get passionate

Re: [OMPI users] trouble using openmpi under slurm

2010-07-07 Thread Ralph Castain
You'll get passionate advocates from all the various resource managers - there really isn't a right/wrong answer. Torque is more widely used, but any of them will do. None are perfect, IMHO. On Jul 7, 2010, at 1:16 PM, Douglas Guptill wrote: > On Wed, Jul 07, 2010 at 12:37:54PM -0600, Ralph

Re: [OMPI users] trouble using openmpi under slurm

2010-07-07 Thread Douglas Guptill
On Wed, Jul 07, 2010 at 12:37:54PM -0600, Ralph Castain wrote: > No, afraid not. Things work pretty well, but there are places > where things just don't mesh. Sub-node allocation in particular is > an issue as it implies binding, and slurm and ompi have conflicting > methods. > > It all can get

[OMPI users] Adding libraries to wrapper compiler at run-time

2010-07-07 Thread Jeremiah Willcock
The Open MPI FAQ shows how to add libraries to the Open MPI wrapper compilers when building them (using configure flags), but I would like to add flags for a specific run of the wrapper compiler. Setting OMPI_LIBS overrides the necessary MPI libraries, and it does not appear that there is an

Re: [OMPI users] trouble using openmpi under slurm

2010-07-07 Thread Jeff Squyres
On Jul 7, 2010, at 2:37 PM, Ralph Castain wrote: > No, afraid not. Things work pretty well, but there are places where things > just don't mesh. Sub-node allocation in particular is an issue as it implies > binding, and slurm and ompi have conflicting methods. > > It all can get worked out,

Re: [OMPI users] trouble using openmpi under slurm

2010-07-07 Thread Ralph Castain
No, afraid not. Things work pretty well, but there are places where things just don't mesh. Sub-node allocation in particular is an issue as it implies binding, and slurm and ompi have conflicting methods. It all can get worked out, but we have limited time and nobody cares enough to put in

Re: [OMPI users] trouble using openmpi under slurm

2010-07-07 Thread David Roundy
Alas, I'm sorry to hear that! I had hoped (assumed?) that the slurm team would be hand-in-glove with the OMPI team in making sure the interface between the two is smooth. :( David On Wed, Jul 7, 2010 at 11:09 AM, Ralph Castain wrote: > Ah, if only it were that simple. Slurm

Re: [OMPI users] trouble using openmpi under slurm

2010-07-07 Thread Ralph Castain
Ah, if only it were that simple. Slurm is a very difficult beast to interface with, and I have yet to find a single, reliable marker across the various slurm releases to detect options we cannot support. On Jul 7, 2010, at 11:59 AM, David Roundy wrote: > On Wed, Jul 7, 2010 at 10:26 AM, Ralph

Re: [OMPI users] trouble using openmpi under slurm

2010-07-07 Thread David Roundy
On Wed, Jul 7, 2010 at 10:26 AM, Ralph Castain wrote: > I'm afraid the bottom line is that OMPI simply doesn't support core-level > allocations. I tried it on a slurm machine available to me, using our devel > trunk as well as 1.4, with the same results. > > Not sure why you

Re: [OMPI users] MPI_GET beyond 2 GB displacement

2010-07-07 Thread Jeff Squyres
Sorry for the delay in replying. :-( It's because for a 32 bit signed int, at 2GB, the value turns negative. On Jun 29, 2010, at 1:46 PM, Price, Brian M (N-KCI) wrote: > OpenMPI version: 1.3.3 > > Platform: IBM P5 > > Built OpenMPI 64-bit (i.e., CFLAGS=-q64, CXXFLAGS=-q64, -FFLAGS=-q64,
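
The wraparound described above can be checked with a small sketch. It uses Python's ctypes only to reinterpret a value as a 32-bit signed integer; the helper name as_int32 is made up for illustration and is not part of Open MPI.

```python
import ctypes


def as_int32(value: int) -> int:
    """Reinterpret the low 32 bits of value as a signed 32-bit integer."""
    return ctypes.c_int32(value).value


TWO_GB = 2**31  # 2 GB expressed in bytes

if __name__ == "__main__":
    print(as_int32(TWO_GB - 1))  # 2147483647, the last displacement that still fits
    print(as_int32(TWO_GB))      # -2147483648, the value has turned negative
```

Any displacement of 2 GB or more that passes through a 32-bit signed variable somewhere along the way would be seen as a negative number in this manner.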

Re: [OMPI users] trouble using openmpi under slurm

2010-07-07 Thread Ralph Castain
I'm afraid the bottom line is that OMPI simply doesn't support core-level allocations. I tried it on a slurm machine available to me, using our devel trunk as well as 1.4, with the same results. Not sure why you are trying to run that way, but I'm afraid you can't do it with OMPI. On Jul 6,

Re: [OMPI users] Dynamic processes connection and segfault on MPI_Comm_accept

2010-07-07 Thread Ralph Castain
I would guess the #files limit of 1024. However, if it behaves the same way when spread across multiple machines, I would suspect it is somewhere in your program itself. Given that the segfault is in your process, can you use gdb to look at the core file and see where and why it fails? On Jul

Re: [OMPI users] MPI_Init failing in singleton

2010-07-07 Thread Ralph Castain
On Jul 7, 2010, at 10:12 AM, Grzegorz Maj wrote: > The problem was that orted couldn't find ssh or rsh on that machine. > I've added my installation to PATH and it now works. > So one question: I will definitely not use MPI_Comm_spawn or any > related stuff. Do I need this ssh? If not, is there

Re: [OMPI users] perhaps an openmpi bug, how best to identify?

2010-07-07 Thread Jeff Squyres
On Jul 7, 2010, at 12:50 PM, Olivier Marsden wrote: > Hi Jeff, thanks for the response. > As soon as I can afford to reboot my workstation, > like tomorrow, I will test as you suggest whether the computer > actually hangs or just slows down. For exhaustive kernel logging, > I replaced the

Re: [OMPI users] perhaps an openmpi bug, how best to identify?

2010-07-07 Thread Olivier Marsden
Hi Jeff, thanks for the response. As soon as I can afford to reboot my workstation, like tomorrow, I will test as you suggest whether the computer actually hangs or just slows down. For exhaustive kernel logging, I replaced the following line kern.* -/var/log/kern.log with kern.*

[OMPI users] Question on checkpoint overhead in Open MPI

2010-07-07 Thread Nguyen Toan
Hello everyone, I have a question concerning the checkpoint overhead in Open MPI, that is, the difference in application runtime with and without checkpointing. I observe that when the data size and the number of processes increase, the runtime of BLCR is very small compared

Re: [OMPI users] Dynamic processes connection and segfault on MPI_Comm_accept

2010-07-07 Thread Grzegorz Maj
2010/7/7 Ralph Castain : > > On Jul 6, 2010, at 8:48 AM, Grzegorz Maj wrote: > >> Hi Ralph, >> sorry for the late response, but I couldn't find free time to play >> with this. Finally I've applied the patch you prepared. I've launched >> my processes in the way you've described

Re: [OMPI users] MPI_Init failing in singleton

2010-07-07 Thread Grzegorz Maj
The problem was that orted couldn't find ssh or rsh on that machine. I've added my installation to PATH and it now works. So one question: I will definitely not use MPI_Comm_spawn or any related stuff. Do I need this ssh? If not, is there any way to tell orted that it shouldn't be looking for ssh

[OMPI users] Open MPI error MPI_ERR_TRUNCATE: message truncated

2010-07-07 Thread Jack Bryan
Dear All: I need to transfer some messages from workers to the master node on an MPI cluster with Open MPI. The number of messages is fixed. When I increase the number of worker nodes, I get an error: -- terminate called after throwing an instance of

Re: [OMPI users] perhaps an openmpi bug, how best to identify?

2010-07-07 Thread Jeff Squyres
On Jul 7, 2010, at 10:20 AM, Olivier Marsden wrote: > The (7 process) code runs correctly on my workstation using mpich2 (latest > stable version) & ifort 11.1, using intel-mpi & ifort 11.1, but > randomly hangs the > computer (vanilla ubuntu 9.10 kernel v. 2.6.31 ) to the point where only > a

[OMPI users] perhaps an openmpi bug, how best to identify?

2010-07-07 Thread Olivier Marsden
Hello, I am developing a fortran mpi code and currently testing it on my workstation, so in a shared memory environment. The (7 process) code runs correctly on my workstation using mpich2 (latest stable version) & ifort 11.1, using intel-mpi & ifort 11.1, but randomly hangs the computer

Re: [OMPI users] Dynamic algorithms problem

2010-07-07 Thread Jeff Squyres
I do believe that this is a bug. I *think* that the included patch will fix it for you, but George is on vacation until tomorrow (and I don't know how long it'll take him to slog through his backlog :-( ). Can you try the following patch and see if it fixes it for you? Index:

Re: [OMPI users] Dynamic processes connection and segfault on MPI_Comm_accept

2010-07-07 Thread Ralph Castain
On Jul 6, 2010, at 8:48 AM, Grzegorz Maj wrote: > Hi Ralph, > sorry for the late response, but I couldn't find free time to play > with this. Finally I've applied the patch you prepared. I've launched > my processes in the way you've described and I think it's working as > you expected. None of

Re: [OMPI users] MPI_Init failing in singleton

2010-07-07 Thread Ralph Castain
Check your PATH and LD_LIBRARY_PATH: it looks like you are picking up some stale binary for orted and/or stale libraries (perhaps getting the default OMPI instead of 1.4.2) on the machine where it fails. On Jul 7, 2010, at 7:44 AM, Grzegorz Maj wrote: > Hi, > I was trying to run some MPI

[OMPI users] MPI_Init failing in singleton

2010-07-07 Thread Grzegorz Maj
Hi, I was trying to run some MPI processes as singletons. On some of the machines they crash on MPI_Init. I use exactly the same binaries of my application and the same installation of openmpi 1.4.2 on two machines and it works on one of them and fails on the other one. This is the command and

Re: [OMPI users] Sending an objects vector via MPI C++

2010-07-07 Thread Jeff Squyres
You might want to look at the Boost.mpi project. They wrote some nice C++ wrappers around MPI to handle things like STL vectors, etc. On Jul 7, 2010, at 5:07 AM, Saygin Arkan wrote: > Hello, > > I'm a newbie on MPI, just playing around with the things. > I've searched through the internet

Re: [OMPI users] OpenMPI Hangs, No Error

2010-07-07 Thread Jeff Squyres
On Jul 6, 2010, at 6:36 PM, Reuti wrote: > But just for curiosity: at one point Open MPI chooses the ports. At > that point it might be possible to start two SSH tunnels per > slave node to have both directions and the daemons have to contact > then "localhost" on a specific port

[OMPI users] Sending an objects vector via MPI C++

2010-07-07 Thread Saygin Arkan
Hello, I'm a newbie on MPI, just playing around with the things. I've searched through the internet but couldn't find an appropriate code example for my problem. I'm making comparisons, correlations on my cluster, and gaining the results like this: vector results; In every node, they calculate