Re: [OMPI users] MPI_Init

2012-08-28 Thread Ralph Castain

On Aug 28, 2012, at 6:47 PM, Tony Raymond  wrote:

> Hi Ralph,
> 
> Thanks for taking care of this so quickly!
> 
> Does this mean that MPI_Init will leave the SIGCHLD handler alone?

Yes

> Is it fine to set the handler as I did in the current version of Open MPI?

Yes - no harm done either way, but we shouldn't be messing with the handler 
(and didn't realize we were).

> 
> Thanks,
> Tony
> 
> From: users-boun...@open-mpi.org [users-boun...@open-mpi.org] On Behalf Of 
> Ralph Castain [r...@open-mpi.org]
> Sent: Tuesday, August 28, 2012 2:40 PM
> To: Open MPI Users
> Subject: Re: [OMPI users] MPI_Init
> 
> Okay, I fixed this on our trunk - I'll post it for transfer to the 1.7 and 
> 1.6 series in their next releases.
> 
> Thanks!
> 
> On Aug 28, 2012, at 2:27 PM, Ralph Castain  wrote:
> 
>> Oh crud - yes we do. Checking on it...
>> 
>> On Aug 28, 2012, at 2:23 PM, Ralph Castain  wrote:
>> 
>>> Glancing at the code, I don't see anywhere that we trap SIGCHLD outside of 
>>> mpirun and the orte daemons - certainly not inside an MPI app. What version 
>>> of OMPI are you using?
>>> 
>>> On Aug 28, 2012, at 2:06 PM, Tony Raymond  wrote:
>>> 
 Hi,
 
 I have an application that uses openMPI and creates some child processes 
 using fork(). I've been trying to catch SIGCHLD in order to check the exit 
 status of these processes so that the program will exit if a child errors 
 out.
 
 I've found out that if I set the SIGCHLD handler before calling MPI_Init, 
 MPI_Init sets the SIGCHLD handler so that my application appears to ignore 
 SIGCHLD, but if I set my handler after MPI_Init, the application handles 
 SIGCHLD appropriately.
 
 I'm wondering if there are any problems that could come up by changing the 
 SIGCHLD handler, and why MPI_Init modifies the SIGCHLD handler in the 
 first place.
 
 Thanks,
 Tony

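For anyone finding this thread later: a minimal sketch of the pattern settled 
on above -- install the SIGCHLD handler only after MPI_Init returns. The 
handler only sets a flag (most MPI calls are not async-signal-safe), and the 
flag name, the fork/sleep scaffolding, and the abort-on-failure policy are 
illustrative assumptions rather than code from this thread:

#include <mpi.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <iostream>

static volatile sig_atomic_t childFailed = 0;

/* Reap every exited child without blocking; remember any nonzero exit. */
static void onSigchld(int)
{
    int status;
    while (waitpid(-1, &status, WNOHANG) > 0) {
        if (WIFEXITED(status) && WEXITSTATUS(status) != 0)
            childFailed = 1;
    }
}

int main(int argc, char **args)
{
    MPI_Init(&argc, &args);      /* install the handler only after this */

    struct sigaction sa;
    sa.sa_handler = onSigchld;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART | SA_NOCLDSTOP;
    sigaction(SIGCHLD, &sa, NULL);

    if (fork() == 0)             /* child that errors out immediately */
        _exit(1);
    sleep(1);                    /* crude wait so SIGCHLD arrives */

    if (childFailed) {
        std::cerr << "a child errored out; aborting" << std::endl;
        MPI_Abort(MPI_COMM_WORLD, 1);
    }

    MPI_Finalize();
    return 0;
}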



Re: [OMPI users] MPI_Init

2012-08-28 Thread Tony Raymond
Hi Ralph,

Thanks for taking care of this so quickly!

Does this mean that MPI_Init will leave the SIGCHLD handler alone? Is it fine to 
set the handler as I did in the current version of Open MPI?

Thanks,
Tony

From: users-boun...@open-mpi.org [users-boun...@open-mpi.org] On Behalf Of 
Ralph Castain [r...@open-mpi.org]
Sent: Tuesday, August 28, 2012 2:40 PM
To: Open MPI Users
Subject: Re: [OMPI users] MPI_Init

Okay, I fixed this on our trunk - I'll post it for transfer to the 1.7 and 1.6 
series in their next releases.

Thanks!

On Aug 28, 2012, at 2:27 PM, Ralph Castain  wrote:

> Oh crud - yes we do. Checking on it...
>
> On Aug 28, 2012, at 2:23 PM, Ralph Castain  wrote:
>
>> Glancing at the code, I don't see anywhere that we trap SIGCHLD outside of 
>> mpirun and the orte daemons - certainly not inside an MPI app. What version 
>> of OMPI are you using?
>>
>> On Aug 28, 2012, at 2:06 PM, Tony Raymond  wrote:
>>
>>> Hi,
>>>
>>> I have an application that uses openMPI and creates some child processes 
>>> using fork(). I've been trying to catch SIGCHLD in order to check the exit 
>>> status of these processes so that the program will exit if a child errors 
>>> out.
>>>
>>> I've found out that if I set the SIGCHLD handler before calling MPI_Init, 
>>> MPI_Init sets the SIGCHLD handler so that my application appears to ignore 
>>> SIGCHLD, but if I set my handler after MPI_Init, the application handles 
>>> SIGCHLD appropriately.
>>>
>>> I'm wondering if there are any problems that could come up by changing the 
>>> SIGCHLD handler, and why MPI_Init modifies the SIGCHLD handler in the first 
>>> place.
>>>
>>> Thanks,
>>> Tony



Re: [OMPI users] MPI::Intracomm::Spawn and cluster configuration

2012-08-28 Thread Brian Budge
Thanks!

On Tue, Aug 28, 2012 at 4:57 PM, Ralph Castain  wrote:
> Yeah, I'm seeing the hang as well when running across multiple machines. Let 
> me dig a little and get this fixed.
>
> Thanks
> Ralph
>
> On Aug 28, 2012, at 4:51 PM, Brian Budge  wrote:
>
>> Hmmm, I went to the build directories of openmpi for my two machines,
>> went into the orte/test/mpi directory and made the executables on both
>> machines.  I set the hostsfile in the env variable on the "master"
>> machine.
>>
>> Here's the output:
>>
>> OMPI_MCA_orte_default_hostfile=/home/budgeb/p4/pseb/external/install/openmpi-1.6.1/orte/test/mpi/hostsfile
>> ./simple_spawn
>> Parent [pid 97504] starting up!
>> 0 completed MPI_Init
>> Parent [pid 97504] about to spawn!
>> Parent [pid 97507] starting up!
>> Parent [pid 97508] starting up!
>> Parent [pid 30626] starting up!
>> ^C
>> zsh: interrupt  OMPI_MCA_orte_default_hostfile= ./simple_spawn
>>
>> I had to ^C to kill the hung process.
>>
>> When I run using mpirun:
>>
>> OMPI_MCA_orte_default_hostfile=/home/budgeb/p4/pseb/external/install/openmpi-1.6.1/orte/test/mpi/hostsfile
>> mpirun -np 1 ./simple_spawn
>> Parent [pid 97511] starting up!
>> 0 completed MPI_Init
>> Parent [pid 97511] about to spawn!
>> Parent [pid 97513] starting up!
>> Parent [pid 30762] starting up!
>> Parent [pid 30764] starting up!
>> Parent done with spawn
>> Parent sending message to child
>> 1 completed MPI_Init
>> Hello from the child 1 of 3 on host budgeb-sandybridge pid 97513
>> 0 completed MPI_Init
>> Hello from the child 0 of 3 on host budgeb-interlagos pid 30762
>> 2 completed MPI_Init
>> Hello from the child 2 of 3 on host budgeb-interlagos pid 30764
>> Child 1 disconnected
>> Child 0 received msg: 38
>> Child 0 disconnected
>> Parent disconnected
>> Child 2 disconnected
>> 97511: exiting
>> 97513: exiting
>> 30762: exiting
>> 30764: exiting
>>
>> As you can see, I'm using Open MPI v1.6.1.  I just did a fresh
>> install on both machines using the default configure options.
>>
>> Thanks for all your help.
>>
>>  Brian
>>
>> On Tue, Aug 28, 2012 at 4:39 PM, Ralph Castain  wrote:
>>> Looks to me like it didn't find your executable - could be a question of 
>>> where it exists relative to where you are running. If you look in your OMPI 
>>> source tree at the orte/test/mpi directory, you'll see an example program 
>>> "simple_spawn.c" there. Just "make simple_spawn" and execute that with your 
>>> default hostfile set - does it work okay?
>>>
>>> It works fine for me, hence the question.
>>>
>>> Also, what OMPI version are you using?
>>>
>>> On Aug 28, 2012, at 4:25 PM, Brian Budge  wrote:
>>>
 I see.  Okay.  So, I just tried removing the check for universe size,
 and set the universe size to 2.  Here's my output:

 LD_LIBRARY_PATH=/home/budgeb/p4/pseb/external/lib.dev:/usr/local/lib
 OMPI_MCA_orte_default_hostfile=`pwd`/hostsfile ./master_exe
 [budgeb-interlagos:29965] [[4156,0],0] ORTE_ERROR_LOG: Fatal in file
 base/plm_base_receive.c at line 253
 [budgeb-interlagos:29963] [[4156,1],0] ORTE_ERROR_LOG: The specified
 application failed to start in file dpm_orte.c at line 785

 The corresponding run with mpirun still works.

 Thanks,
 Brian

 On Tue, Aug 28, 2012 at 2:46 PM, Ralph Castain  wrote:
> I see the issue - it's here:
>
>> MPI_Attr_get(MPI_COMM_WORLD, MPI_UNIVERSE_SIZE, &puniverseSize, &flag);
>>
>> if(!flag) {
>> std::cerr << "no universe size" << std::endl;
>> return -1;
>> }
>> universeSize = *puniverseSize;
>> if(universeSize == 1) {
>> std::cerr << "cannot start slaves... not enough nodes" << std::endl;
>> }
>
> The universe size is set to 1 on a singleton because the attribute gets 
> set at the beginning of time - we haven't any way to go back and change 
> it. The sequence of events explains why. The singleton starts up and sets 
> its attributes, including universe_size. It also spins off an orte daemon 
> to act as its own private "mpirun" in case you call comm_spawn. At this 
> point, however, no hostfile has been read - the singleton is just an MPI 
> proc doing its own thing, and the orte daemon is just sitting there on 
> "stand-by".
>
> When your app calls comm_spawn, then the orte daemon gets called to 
> launch the new procs. At that time, it (not the original singleton!) 
> reads the hostfile to find out how many nodes are around, and then does 
> the launch.
>
> You are trying to check the number of nodes from within the singleton, 
> which won't work - it has no way of discovering that info.
>
>
>
>
> On Aug 28, 2012, at 2:38 PM, Brian Budge  wrote:
>
>>> echo hostsfile
>> localhost
>> budgeb-sandybridge
>>
>> Thanks,
>> Brian
>>
>> On 

Re: [OMPI users] MPI::Intracomm::Spawn and cluster configuration

2012-08-28 Thread Ralph Castain
Yeah, I'm seeing the hang as well when running across multiple machines. Let me 
dig a little and get this fixed.

Thanks
Ralph

On Aug 28, 2012, at 4:51 PM, Brian Budge  wrote:

> Hmmm, I went to the build directories of openmpi for my two machines,
> went into the orte/test/mpi directory and made the executables on both
> machines.  I set the hostsfile in the env variable on the "master"
> machine.
> 
> Here's the output:
> 
> OMPI_MCA_orte_default_hostfile=/home/budgeb/p4/pseb/external/install/openmpi-1.6.1/orte/test/mpi/hostsfile
> ./simple_spawn
> Parent [pid 97504] starting up!
> 0 completed MPI_Init
> Parent [pid 97504] about to spawn!
> Parent [pid 97507] starting up!
> Parent [pid 97508] starting up!
> Parent [pid 30626] starting up!
> ^C
> zsh: interrupt  OMPI_MCA_orte_default_hostfile= ./simple_spawn
> 
> I had to ^C to kill the hung process.
> 
> When I run using mpirun:
> 
> OMPI_MCA_orte_default_hostfile=/home/budgeb/p4/pseb/external/install/openmpi-1.6.1/orte/test/mpi/hostsfile
> mpirun -np 1 ./simple_spawn
> Parent [pid 97511] starting up!
> 0 completed MPI_Init
> Parent [pid 97511] about to spawn!
> Parent [pid 97513] starting up!
> Parent [pid 30762] starting up!
> Parent [pid 30764] starting up!
> Parent done with spawn
> Parent sending message to child
> 1 completed MPI_Init
> Hello from the child 1 of 3 on host budgeb-sandybridge pid 97513
> 0 completed MPI_Init
> Hello from the child 0 of 3 on host budgeb-interlagos pid 30762
> 2 completed MPI_Init
> Hello from the child 2 of 3 on host budgeb-interlagos pid 30764
> Child 1 disconnected
> Child 0 received msg: 38
> Child 0 disconnected
> Parent disconnected
> Child 2 disconnected
> 97511: exiting
> 97513: exiting
> 30762: exiting
> 30764: exiting
> 
> As you can see, I'm using Open MPI v1.6.1.  I just did a fresh
> install on both machines using the default configure options.
> 
> Thanks for all your help.
> 
>  Brian
> 
> On Tue, Aug 28, 2012 at 4:39 PM, Ralph Castain  wrote:
>> Looks to me like it didn't find your executable - could be a question of 
>> where it exists relative to where you are running. If you look in your OMPI 
>> source tree at the orte/test/mpi directory, you'll see an example program 
>> "simple_spawn.c" there. Just "make simple_spawn" and execute that with your 
>> default hostfile set - does it work okay?
>> 
>> It works fine for me, hence the question.
>> 
>> Also, what OMPI version are you using?
>> 
>> On Aug 28, 2012, at 4:25 PM, Brian Budge  wrote:
>> 
>>> I see.  Okay.  So, I just tried removing the check for universe size,
>>> and set the universe size to 2.  Here's my output:
>>> 
>>> LD_LIBRARY_PATH=/home/budgeb/p4/pseb/external/lib.dev:/usr/local/lib
>>> OMPI_MCA_orte_default_hostfile=`pwd`/hostsfile ./master_exe
>>> [budgeb-interlagos:29965] [[4156,0],0] ORTE_ERROR_LOG: Fatal in file
>>> base/plm_base_receive.c at line 253
>>> [budgeb-interlagos:29963] [[4156,1],0] ORTE_ERROR_LOG: The specified
>>> application failed to start in file dpm_orte.c at line 785
>>> 
>>> The corresponding run with mpirun still works.
>>> 
>>> Thanks,
>>> Brian
>>> 
>>> On Tue, Aug 28, 2012 at 2:46 PM, Ralph Castain  wrote:
 I see the issue - it's here:
 
> MPI_Attr_get(MPI_COMM_WORLD, MPI_UNIVERSE_SIZE, &puniverseSize, &flag);
> 
> if(!flag) {
> std::cerr << "no universe size" << std::endl;
> return -1;
> }
> universeSize = *puniverseSize;
> if(universeSize == 1) {
> std::cerr << "cannot start slaves... not enough nodes" << std::endl;
> }
 
 The universe size is set to 1 on a singleton because the attribute gets 
 set at the beginning of time - we haven't any way to go back and change 
 it. The sequence of events explains why. The singleton starts up and sets 
 its attributes, including universe_size. It also spins off an orte daemon 
 to act as its own private "mpirun" in case you call comm_spawn. At this 
 point, however, no hostfile has been read - the singleton is just an MPI 
 proc doing its own thing, and the orte daemon is just sitting there on 
 "stand-by".
 
 When your app calls comm_spawn, then the orte daemon gets called to launch 
 the new procs. At that time, it (not the original singleton!) reads the 
 hostfile to find out how many nodes are around, and then does the launch.
 
 You are trying to check the number of nodes from within the singleton, 
 which won't work - it has no way of discovering that info.
 
 
 
 
 On Aug 28, 2012, at 2:38 PM, Brian Budge  wrote:
 
>> echo hostsfile
> localhost
> budgeb-sandybridge
> 
> Thanks,
> Brian
> 
> On Tue, Aug 28, 2012 at 2:36 PM, Ralph Castain  wrote:
>> Hmmm...what is in your "hostsfile"?
>> 
>> On Aug 28, 2012, at 2:33 PM, Brian Budge  

Re: [OMPI users] MPI::Intracomm::Spawn and cluster configuration

2012-08-28 Thread Brian Budge
Hmmm, I went to the build directories of openmpi for my two machines,
went into the orte/test/mpi directory and made the executables on both
machines.  I set the hostsfile in the env variable on the "master"
machine.

Here's the output:

OMPI_MCA_orte_default_hostfile=/home/budgeb/p4/pseb/external/install/openmpi-1.6.1/orte/test/mpi/hostsfile
./simple_spawn
Parent [pid 97504] starting up!
0 completed MPI_Init
Parent [pid 97504] about to spawn!
Parent [pid 97507] starting up!
Parent [pid 97508] starting up!
Parent [pid 30626] starting up!
^C
zsh: interrupt  OMPI_MCA_orte_default_hostfile= ./simple_spawn

I had to ^C to kill the hung process.

When I run using mpirun:

OMPI_MCA_orte_default_hostfile=/home/budgeb/p4/pseb/external/install/openmpi-1.6.1/orte/test/mpi/hostsfile
mpirun -np 1 ./simple_spawn
Parent [pid 97511] starting up!
0 completed MPI_Init
Parent [pid 97511] about to spawn!
Parent [pid 97513] starting up!
Parent [pid 30762] starting up!
Parent [pid 30764] starting up!
Parent done with spawn
Parent sending message to child
1 completed MPI_Init
Hello from the child 1 of 3 on host budgeb-sandybridge pid 97513
0 completed MPI_Init
Hello from the child 0 of 3 on host budgeb-interlagos pid 30762
2 completed MPI_Init
Hello from the child 2 of 3 on host budgeb-interlagos pid 30764
Child 1 disconnected
Child 0 received msg: 38
Child 0 disconnected
Parent disconnected
Child 2 disconnected
97511: exiting
97513: exiting
30762: exiting
30764: exiting

As you can see, I'm using Open MPI v1.6.1.  I just did a fresh
install on both machines using the default configure options.

Thanks for all your help.

  Brian

On Tue, Aug 28, 2012 at 4:39 PM, Ralph Castain  wrote:
> Looks to me like it didn't find your executable - could be a question of 
> where it exists relative to where you are running. If you look in your OMPI 
> source tree at the orte/test/mpi directory, you'll see an example program 
> "simple_spawn.c" there. Just "make simple_spawn" and execute that with your 
> default hostfile set - does it work okay?
>
> It works fine for me, hence the question.
>
> Also, what OMPI version are you using?
>
> On Aug 28, 2012, at 4:25 PM, Brian Budge  wrote:
>
>> I see.  Okay.  So, I just tried removing the check for universe size,
>> and set the universe size to 2.  Here's my output:
>>
>> LD_LIBRARY_PATH=/home/budgeb/p4/pseb/external/lib.dev:/usr/local/lib
>> OMPI_MCA_orte_default_hostfile=`pwd`/hostsfile ./master_exe
>> [budgeb-interlagos:29965] [[4156,0],0] ORTE_ERROR_LOG: Fatal in file
>> base/plm_base_receive.c at line 253
>> [budgeb-interlagos:29963] [[4156,1],0] ORTE_ERROR_LOG: The specified
>> application failed to start in file dpm_orte.c at line 785
>>
>> The corresponding run with mpirun still works.
>>
>> Thanks,
>>  Brian
>>
>> On Tue, Aug 28, 2012 at 2:46 PM, Ralph Castain  wrote:
>>> I see the issue - it's here:
>>>
  MPI_Attr_get(MPI_COMM_WORLD, MPI_UNIVERSE_SIZE, &puniverseSize, &flag);

  if(!flag) {
  std::cerr << "no universe size" << std::endl;
  return -1;
  }
  universeSize = *puniverseSize;
  if(universeSize == 1) {
  std::cerr << "cannot start slaves... not enough nodes" << std::endl;
  }
>>>
>>> The universe size is set to 1 on a singleton because the attribute gets set 
>>> at the beginning of time - we haven't any way to go back and change it. The 
>>> sequence of events explains why. The singleton starts up and sets its 
>>> attributes, including universe_size. It also spins off an orte daemon to 
>>> act as its own private "mpirun" in case you call comm_spawn. At this point, 
>>> however, no hostfile has been read - the singleton is just an MPI proc 
>>> doing its own thing, and the orte daemon is just sitting there on 
>>> "stand-by".
>>>
>>> When your app calls comm_spawn, then the orte daemon gets called to launch 
>>> the new procs. At that time, it (not the original singleton!) reads the 
>>> hostfile to find out how many nodes are around, and then does the launch.
>>>
>>> You are trying to check the number of nodes from within the singleton, 
>>> which won't work - it has no way of discovering that info.
>>>
>>>
>>>
>>>
>>> On Aug 28, 2012, at 2:38 PM, Brian Budge  wrote:
>>>
> echo hostsfile
 localhost
 budgeb-sandybridge

 Thanks,
 Brian

 On Tue, Aug 28, 2012 at 2:36 PM, Ralph Castain  wrote:
> Hmmm...what is in your "hostsfile"?
>
> On Aug 28, 2012, at 2:33 PM, Brian Budge  wrote:
>
>> Hi Ralph -
>>
>> Thanks for confirming this is possible.  I'm trying this and currently
>> failing.  Perhaps there's something I'm missing in the code to make
>> this work.  Here are the two instantiations and their outputs:
>>
>>> LD_LIBRARY_PATH=/home/budgeb/p4/pseb/external/lib.dev:/usr/local/lib 
>>> OMPI_MCA_orte_default_hostfile=`pwd`/hostsfile 

Re: [OMPI users] MPI::Intracomm::Spawn and cluster configuration

2012-08-28 Thread Ralph Castain
Looks to me like it didn't find your executable - could be a question of where 
it exists relative to where you are running. If you look in your OMPI source 
tree at the orte/test/mpi directory, you'll see an example program 
"simple_spawn.c" there. Just "make simple_spawn" and execute that with your 
default hostfile set - does it work okay?

It works fine for me, hence the question.

Also, what OMPI version are you using?

On Aug 28, 2012, at 4:25 PM, Brian Budge  wrote:

> I see.  Okay.  So, I just tried removing the check for universe size,
> and set the universe size to 2.  Here's my output:
> 
> LD_LIBRARY_PATH=/home/budgeb/p4/pseb/external/lib.dev:/usr/local/lib
> OMPI_MCA_orte_default_hostfile=`pwd`/hostsfile ./master_exe
> [budgeb-interlagos:29965] [[4156,0],0] ORTE_ERROR_LOG: Fatal in file
> base/plm_base_receive.c at line 253
> [budgeb-interlagos:29963] [[4156,1],0] ORTE_ERROR_LOG: The specified
> application failed to start in file dpm_orte.c at line 785
> 
> The corresponding run with mpirun still works.
> 
> Thanks,
>  Brian
> 
> On Tue, Aug 28, 2012 at 2:46 PM, Ralph Castain  wrote:
>> I see the issue - it's here:
>> 
>>>  MPI_Attr_get(MPI_COMM_WORLD, MPI_UNIVERSE_SIZE, &puniverseSize, &flag);
>>> 
>>>  if(!flag) {
>>>  std::cerr << "no universe size" << std::endl;
>>>  return -1;
>>>  }
>>>  universeSize = *puniverseSize;
>>>  if(universeSize == 1) {
>>>  std::cerr << "cannot start slaves... not enough nodes" << std::endl;
>>>  }
>> 
>> The universe size is set to 1 on a singleton because the attribute gets set 
>> at the beginning of time - we haven't any way to go back and change it. The 
>> sequence of events explains why. The singleton starts up and sets its 
>> attributes, including universe_size. It also spins off an orte daemon to act 
>> as its own private "mpirun" in case you call comm_spawn. At this point, 
>> however, no hostfile has been read - the singleton is just an MPI proc doing 
>> its own thing, and the orte daemon is just sitting there on "stand-by".
>> 
>> When your app calls comm_spawn, then the orte daemon gets called to launch 
>> the new procs. At that time, it (not the original singleton!) reads the 
>> hostfile to find out how many nodes are around, and then does the launch.
>> 
>> You are trying to check the number of nodes from within the singleton, which 
>> won't work - it has no way of discovering that info.
>> 
>> 
>> 
>> 
>> On Aug 28, 2012, at 2:38 PM, Brian Budge  wrote:
>> 
 echo hostsfile
>>> localhost
>>> budgeb-sandybridge
>>> 
>>> Thanks,
>>> Brian
>>> 
>>> On Tue, Aug 28, 2012 at 2:36 PM, Ralph Castain  wrote:
 Hmmm...what is in your "hostsfile"?
 
 On Aug 28, 2012, at 2:33 PM, Brian Budge  wrote:
 
> Hi Ralph -
> 
> Thanks for confirming this is possible.  I'm trying this and currently
> failing.  Perhaps there's something I'm missing in the code to make
> this work.  Here are the two instantiations and their outputs:
> 
>> LD_LIBRARY_PATH=/home/budgeb/p4/pseb/external/lib.dev:/usr/local/lib 
>> OMPI_MCA_orte_default_hostfile=`pwd`/hostsfile ./master_exe
> cannot start slaves... not enough nodes
> 
>> LD_LIBRARY_PATH=/home/budgeb/p4/pseb/external/lib.dev:/usr/local/lib 
>> OMPI_MCA_orte_default_hostfile=`pwd`/hostsfile mpirun -n 1 ./master_exe
> master spawned 1 slaves...
> slave responding...
> 
> 
> The code:
> 
> //master.cpp
> #include <mpi.h>
> #include <boost/filesystem.hpp>
> #include <iostream>
> 
> int main(int argc, char **args) {
>  int worldSize, universeSize, *puniverseSize, flag;
> 
>  MPI_Comm everyone; //intercomm
>  boost::filesystem::path curPath =
> boost::filesystem::absolute(boost::filesystem::current_path());
> 
>  std::string toRun = (curPath / "slave_exe").string();
> 
>  int ret = MPI_Init(&argc, &args);
> 
>  if(ret != MPI_SUCCESS) {
>  std::cerr << "failed init" << std::endl;
>  return -1;
>  }
> 
>  MPI_Comm_size(MPI_COMM_WORLD, &worldSize);
> 
>  if(worldSize != 1) {
>  std::cerr << "too many masters" << std::endl;
>  }
> 
>  MPI_Attr_get(MPI_COMM_WORLD, MPI_UNIVERSE_SIZE, &puniverseSize, &flag);
> 
>  if(!flag) {
>  std::cerr << "no universe size" << std::endl;
>  return -1;
>  }
>  universeSize = *puniverseSize;
>  if(universeSize == 1) {
>  std::cerr << "cannot start slaves... not enough nodes" << std::endl;
>  }
> 
> 
>  char *buf = (char*)alloca(toRun.size() + 1);
>  memcpy(buf, toRun.c_str(), toRun.size());
>  buf[toRun.size()] = '\0';
> 
>  MPI_Comm_spawn(buf, MPI_ARGV_NULL, universeSize-1, MPI_INFO_NULL,
> 0, MPI_COMM_SELF, &everyone,
> MPI_ERRCODES_IGNORE);
> 
>  std::cerr << "master spawned " << universeSize-1 << " slaves..."
> << std::endl;
> 
>  MPI_Finalize();

Re: [OMPI users] MPI::Intracomm::Spawn and cluster configuration

2012-08-28 Thread Brian Budge
I see.  Okay.  So, I just tried removing the check for universe size,
and set the universe size to 2.  Here's my output:

LD_LIBRARY_PATH=/home/budgeb/p4/pseb/external/lib.dev:/usr/local/lib
OMPI_MCA_orte_default_hostfile=`pwd`/hostsfile ./master_exe
[budgeb-interlagos:29965] [[4156,0],0] ORTE_ERROR_LOG: Fatal in file
base/plm_base_receive.c at line 253
[budgeb-interlagos:29963] [[4156,1],0] ORTE_ERROR_LOG: The specified
application failed to start in file dpm_orte.c at line 785

The corresponding run with mpirun still works.

Thanks,
  Brian

On Tue, Aug 28, 2012 at 2:46 PM, Ralph Castain  wrote:
> I see the issue - it's here:
>
>>   MPI_Attr_get(MPI_COMM_WORLD, MPI_UNIVERSE_SIZE, &puniverseSize, &flag);
>>
>>   if(!flag) {
>>   std::cerr << "no universe size" << std::endl;
>>   return -1;
>>   }
>>   universeSize = *puniverseSize;
>>   if(universeSize == 1) {
>>   std::cerr << "cannot start slaves... not enough nodes" << std::endl;
>>   }
>
> The universe size is set to 1 on a singleton because the attribute gets set 
> at the beginning of time - we haven't any way to go back and change it. The 
> sequence of events explains why. The singleton starts up and sets its 
> attributes, including universe_size. It also spins off an orte daemon to act 
> as its own private "mpirun" in case you call comm_spawn. At this point, 
> however, no hostfile has been read - the singleton is just an MPI proc doing 
> its own thing, and the orte daemon is just sitting there on "stand-by".
>
> When your app calls comm_spawn, then the orte daemon gets called to launch 
> the new procs. At that time, it (not the original singleton!) reads the 
> hostfile to find out how many nodes are around, and then does the launch.
>
> You are trying to check the number of nodes from within the singleton, which 
> won't work - it has no way of discovering that info.
>
>
>
>
> On Aug 28, 2012, at 2:38 PM, Brian Budge  wrote:
>
>>> echo hostsfile
>> localhost
>> budgeb-sandybridge
>>
>> Thanks,
>>  Brian
>>
>> On Tue, Aug 28, 2012 at 2:36 PM, Ralph Castain  wrote:
>>> Hmmm...what is in your "hostsfile"?
>>>
>>> On Aug 28, 2012, at 2:33 PM, Brian Budge  wrote:
>>>
 Hi Ralph -

 Thanks for confirming this is possible.  I'm trying this and currently
 failing.  Perhaps there's something I'm missing in the code to make
 this work.  Here are the two instantiations and their outputs:

> LD_LIBRARY_PATH=/home/budgeb/p4/pseb/external/lib.dev:/usr/local/lib 
> OMPI_MCA_orte_default_hostfile=`pwd`/hostsfile ./master_exe
 cannot start slaves... not enough nodes

> LD_LIBRARY_PATH=/home/budgeb/p4/pseb/external/lib.dev:/usr/local/lib 
> OMPI_MCA_orte_default_hostfile=`pwd`/hostsfile mpirun -n 1 ./master_exe
 master spawned 1 slaves...
 slave responding...


 The code:

 //master.cpp
 #include <mpi.h>
 #include <boost/filesystem.hpp>
 #include <iostream>

 int main(int argc, char **args) {
   int worldSize, universeSize, *puniverseSize, flag;

   MPI_Comm everyone; //intercomm
   boost::filesystem::path curPath =
 boost::filesystem::absolute(boost::filesystem::current_path());

   std::string toRun = (curPath / "slave_exe").string();

   int ret = MPI_Init(&argc, &args);

   if(ret != MPI_SUCCESS) {
   std::cerr << "failed init" << std::endl;
   return -1;
   }

   MPI_Comm_size(MPI_COMM_WORLD, &worldSize);

   if(worldSize != 1) {
   std::cerr << "too many masters" << std::endl;
   }

   MPI_Attr_get(MPI_COMM_WORLD, MPI_UNIVERSE_SIZE, &puniverseSize, &flag);

   if(!flag) {
   std::cerr << "no universe size" << std::endl;
   return -1;
   }
   universeSize = *puniverseSize;
   if(universeSize == 1) {
   std::cerr << "cannot start slaves... not enough nodes" << std::endl;
   }


   char *buf = (char*)alloca(toRun.size() + 1);
   memcpy(buf, toRun.c_str(), toRun.size());
   buf[toRun.size()] = '\0';

   MPI_Comm_spawn(buf, MPI_ARGV_NULL, universeSize-1, MPI_INFO_NULL,
 0, MPI_COMM_SELF, &everyone,
  MPI_ERRCODES_IGNORE);

   std::cerr << "master spawned " << universeSize-1 << " slaves..."
 << std::endl;

   MPI_Finalize();

  return 0;
 }


 //slave.cpp
 #include <mpi.h>
 #include <iostream>

 int main(int argc, char **args) {
   int size;
   MPI_Comm parent;
   MPI_Init(&argc, &args);

   MPI_Comm_get_parent(&parent);

   if(parent == MPI_COMM_NULL) {
   std::cerr << "slave has no parent" << std::endl;
   }
   MPI_Comm_remote_size(parent, &size);
   if(size != 1) {
   std::cerr << "parent size is " << size << std::endl;
   }

   std::cerr << "slave responding..." << std::endl;

   MPI_Finalize();

   return 0;
 }


 Any ideas?  Thanks for any help.

 Brian


Re: [OMPI users] Fwd: lwkmpi

2012-08-28 Thread Reuti
There is only one file where "return { ... };" is used.

--disable-vt

seems to fix it.

-- Reuti

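In other words, the build goes through once the VampirTrace contrib package 
(whose C++ tools trigger the iomanip error below) is left out. A sketch of the 
invocation, assuming the same Intel compilers as in the original report (the 
install prefix is an illustrative placeholder):

./configure CC=icc CXX=icpc --disable-vt --prefix=/opt/openmpi-1.6.1
make all install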

On 28.08.2012 at 14:56, Tim Prince wrote:

> On 8/28/2012 5:11 AM, 清风 wrote:
>> 
>> 
>> 
>> -- Original Message --
>> *From:* "295187383"<295187...@qq.com>;
>> *Sent:* Tuesday, August 28, 2012, 4:13 PM
>> *To:* "users";
>> *Subject:* lwkmpi
>> 
>> Hi everybody,
>>I'm trying to compile Open MPI with the Intel compiler 11.1.07 on Ubuntu.
>>I have compiled Open MPI many times and could always track down the problem. 
>> But the error that I'm getting now gives me no clue where to even search for 
>> the problem.
>>It seems configure succeeded. But when I try "make all", it always 
>> shows the errors below:
>> 
>> 
>> 
>>make[7]: Entering directory 
>> `/mnt/Software/openmpi-1.6.1/ompi/contrib/vt/vt/tools/opari/tool'
>> /opt/intel/Compiler/11.1/072/bin/ia32/icpc -DHAVE_CONFIG_H -I. -I../../..   
>> -DINSIDE_OPENMPI -I/home/lwk/桌面/mnt/Software/openmpi- 
>> 1.6.1/opal/mca/hwloc/hwloc132/hwloc /include   -I/usr/include/infiniband 
>> -I/usr/include/infiniband  -DOPARI_VT -O3 -DNDEBUG -finline-functions 
>> -pthread -MT opari-ompragma_c.o -MD -MP -MF .deps/opari-ompragma_c.Tpo -c -o 
>> opari-ompragma_c.o `test -f 'ompragma_c.cc' || echo './'`ompragma_c.cc
>> /usr/include/c++/4.5/iomanip(64): error: expected an expression
>>{ return { __mask }; }
>> ^
>> 
> 
> Looks like your icpc is too old to work with your g++.  If you want to build 
> with C++ support, you'll need better matching versions of icpc and g++.  icpc 
> support for g++ 4.7 is expected to be released within the next month; icpc 12.1 
> should be fine with g++ 4.5 and 4.6.
> 
> -- 
> Tim Prince
> 




Re: [OMPI users] MPI::Intracomm::Spawn and cluster configuration

2012-08-28 Thread Ralph Castain
I see the issue - it's here:

>   MPI_Attr_get(MPI_COMM_WORLD, MPI_UNIVERSE_SIZE, &puniverseSize, &flag);
> 
>   if(!flag) {
>   std::cerr << "no universe size" << std::endl;
>   return -1;
>   }
>   universeSize = *puniverseSize;
>   if(universeSize == 1) {
>   std::cerr << "cannot start slaves... not enough nodes" << std::endl;
>   }

The universe size is set to 1 on a singleton because the attribute gets set at 
the beginning of time - we haven't any way to go back and change it. The 
sequence of events explains why. The singleton starts up and sets its 
attributes, including universe_size. It also spins off an orte daemon to act as 
its own private "mpirun" in case you call comm_spawn. At this point, however, 
no hostfile has been read - the singleton is just an MPI proc doing its own 
thing, and the orte daemon is just sitting there on "stand-by".

When your app calls comm_spawn, then the orte daemon gets called to launch the 
new procs. At that time, it (not the original singleton!) reads the hostfile to 
find out how many nodes are around, and then does the launch.

You are trying to check the number of nodes from within the singleton, which 
won't work - it has no way of discovering that info.

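A minimal sketch of what that implies for the master program: take the number 
of slaves to spawn from outside rather than from MPI_UNIVERSE_SIZE, so the 
same binary also works when started as a singleton. The NUM_SLAVES variable is 
an illustrative convention, not something Open MPI defines:

#include <mpi.h>
#include <cstdlib>
#include <iostream>

int main(int argc, char **args)
{
    MPI_Init(&argc, &args);

    /* Assumed convention: the caller exports NUM_SLAVES=<count>. */
    const char *env = std::getenv("NUM_SLAVES");
    int nSlaves = env ? std::atoi(env) : 1;

    MPI_Comm everyone;           /* intercomm to the spawned slaves */
    char cmd[] = "./slave_exe";
    MPI_Comm_spawn(cmd, MPI_ARGV_NULL, nSlaves, MPI_INFO_NULL,
                   0, MPI_COMM_SELF, &everyone, MPI_ERRCODES_IGNORE);

    std::cerr << "master spawned " << nSlaves << " slaves..." << std::endl;

    MPI_Finalize();
    return 0;
}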



On Aug 28, 2012, at 2:38 PM, Brian Budge  wrote:

>> echo hostsfile
> localhost
> budgeb-sandybridge
> 
> Thanks,
>  Brian
> 
> On Tue, Aug 28, 2012 at 2:36 PM, Ralph Castain  wrote:
>> Hmmm...what is in your "hostsfile"?
>> 
>> On Aug 28, 2012, at 2:33 PM, Brian Budge  wrote:
>> 
>>> Hi Ralph -
>>> 
>>> Thanks for confirming this is possible.  I'm trying this and currently
>>> failing.  Perhaps there's something I'm missing in the code to make
>>> this work.  Here are the two instantiations and their outputs:
>>> 
 LD_LIBRARY_PATH=/home/budgeb/p4/pseb/external/lib.dev:/usr/local/lib 
 OMPI_MCA_orte_default_hostfile=`pwd`/hostsfile ./master_exe
>>> cannot start slaves... not enough nodes
>>> 
 LD_LIBRARY_PATH=/home/budgeb/p4/pseb/external/lib.dev:/usr/local/lib 
 OMPI_MCA_orte_default_hostfile=`pwd`/hostsfile mpirun -n 1 ./master_exe
>>> master spawned 1 slaves...
>>> slave responding...
>>> 
>>> 
>>> The code:
>>> 
>>> //master.cpp
>>> #include <mpi.h>
>>> #include <boost/filesystem.hpp>
>>> #include <iostream>
>>> 
>>> int main(int argc, char **args) {
>>>   int worldSize, universeSize, *puniverseSize, flag;
>>> 
>>>   MPI_Comm everyone; //intercomm
>>>   boost::filesystem::path curPath =
>>> boost::filesystem::absolute(boost::filesystem::current_path());
>>> 
>>>   std::string toRun = (curPath / "slave_exe").string();
>>> 
>>>   int ret = MPI_Init(&argc, &args);
>>> 
>>>   if(ret != MPI_SUCCESS) {
>>>   std::cerr << "failed init" << std::endl;
>>>   return -1;
>>>   }
>>> 
>>>   MPI_Comm_size(MPI_COMM_WORLD, &worldSize);
>>> 
>>>   if(worldSize != 1) {
>>>   std::cerr << "too many masters" << std::endl;
>>>   }
>>> 
>>>   MPI_Attr_get(MPI_COMM_WORLD, MPI_UNIVERSE_SIZE, &puniverseSize, &flag);
>>> 
>>>   if(!flag) {
>>>   std::cerr << "no universe size" << std::endl;
>>>   return -1;
>>>   }
>>>   universeSize = *puniverseSize;
>>>   if(universeSize == 1) {
>>>   std::cerr << "cannot start slaves... not enough nodes" << std::endl;
>>>   }
>>> 
>>> 
>>>   char *buf = (char*)alloca(toRun.size() + 1);
>>>   memcpy(buf, toRun.c_str(), toRun.size());
>>>   buf[toRun.size()] = '\0';
>>> 
>>>   MPI_Comm_spawn(buf, MPI_ARGV_NULL, universeSize-1, MPI_INFO_NULL,
>>> 0, MPI_COMM_SELF, &everyone,
>>>  MPI_ERRCODES_IGNORE);
>>> 
>>>   std::cerr << "master spawned " << universeSize-1 << " slaves..."
>>> << std::endl;
>>> 
>>>   MPI_Finalize();
>>> 
>>>  return 0;
>>> }
>>> 
>>> 
>>> //slave.cpp
>>> #include <mpi.h>
>>> #include <iostream>
>>> 
>>> int main(int argc, char **args) {
>>>   int size;
>>>   MPI_Comm parent;
>>>   MPI_Init(&argc, &args);
>>> 
>>>   MPI_Comm_get_parent(&parent);
>>> 
>>>   if(parent == MPI_COMM_NULL) {
>>>   std::cerr << "slave has no parent" << std::endl;
>>>   }
>>>   MPI_Comm_remote_size(parent, &size);
>>>   if(size != 1) {
>>>   std::cerr << "parent size is " << size << std::endl;
>>>   }
>>> 
>>>   std::cerr << "slave responding..." << std::endl;
>>> 
>>>   MPI_Finalize();
>>> 
>>>   return 0;
>>> }
>>> 
>>> 
>>> Any ideas?  Thanks for any help.
>>> 
>>> Brian
>>> 
>>> On Wed, Aug 22, 2012 at 9:03 AM, Ralph Castain  wrote:
 It really is just that simple :-)
 
 On Aug 22, 2012, at 8:56 AM, Brian Budge  wrote:
 
> Okay.  Is there a tutorial or FAQ for setting everything up?  Or is it
> really just that simple?  I don't need to run a copy of the orte
> server somewhere?
> 
> if my current ip is 192.168.0.1,
> 
> 0 > echo 192.168.0.11 > /tmp/hostfile
> 1 > echo 192.168.0.12 >> /tmp/hostfile
> 2 > export OMPI_MCA_orte_default_hostfile=/tmp/hostfile
> 3 > ./mySpawningExe
> 
> At this point, mySpawningExe will be the master, running on
> 192.168.0.1, and I can have spawned, for example, childExe on

Re: [OMPI users] MPI_Init

2012-08-28 Thread Ralph Castain
Okay, I fixed this on our trunk - I'll post it for transfer to the 1.7 and 1.6 
series in their next releases.

Thanks!

On Aug 28, 2012, at 2:27 PM, Ralph Castain  wrote:

> Oh crud - yes we do. Checking on it...
> 
> On Aug 28, 2012, at 2:23 PM, Ralph Castain  wrote:
> 
>> Glancing at the code, I don't see anywhere that we trap SIGCHLD outside of 
>> mpirun and the orte daemons - certainly not inside an MPI app. What version 
>> of OMPI are you using?
>> 
>> On Aug 28, 2012, at 2:06 PM, Tony Raymond  wrote:
>> 
>>> Hi,
>>> 
>>> I have an application that uses openMPI and creates some child processes 
>>> using fork(). I've been trying to catch SIGCHLD in order to check the exit 
>>> status of these processes so that the program will exit if a child errors 
>>> out. 
>>> 
>>> I've found out that if I set the SIGCHLD handler before calling MPI_Init, 
>>> MPI_Init sets the SIGCHLD handler so that my application appears to ignore 
>>> SIGCHLD, but if I set my handler after MPI_Init, the application handles 
>>> SIGCHLD appropriately. 
>>> 
>>> I'm wondering if there are any problems that could come up by changing the 
>>> SIGCHLD handler, and why MPI_Init modifies the SIGCHLD handler in the first 
>>> place.
>>> 
>>> Thanks,
>>> Tony




Re: [OMPI users] MPI::Intracomm::Spawn and cluster configuration

2012-08-28 Thread Brian Budge
>echo hostsfile
localhost
budgeb-sandybridge

Thanks,
  Brian

On Tue, Aug 28, 2012 at 2:36 PM, Ralph Castain  wrote:
> Hmmm...what is in your "hostsfile"?
>
> On Aug 28, 2012, at 2:33 PM, Brian Budge  wrote:
>
>> Hi Ralph -
>>
>> Thanks for confirming this is possible.  I'm trying this and currently
>> failing.  Perhaps there's something I'm missing in the code to make
>> this work.  Here are the two instantiations and their outputs:
>>
>>> LD_LIBRARY_PATH=/home/budgeb/p4/pseb/external/lib.dev:/usr/local/lib 
>>> OMPI_MCA_orte_default_hostfile=`pwd`/hostsfile ./master_exe
>> cannot start slaves... not enough nodes
>>
>>> LD_LIBRARY_PATH=/home/budgeb/p4/pseb/external/lib.dev:/usr/local/lib 
>>> OMPI_MCA_orte_default_hostfile=`pwd`/hostsfile mpirun -n 1 ./master_exe
>> master spawned 1 slaves...
>> slave responding...
>>
>>
>> The code:
>>
>> //master.cpp
>> #include <mpi.h>
>> #include <boost/filesystem.hpp>
>> #include <iostream>
>>
>> int main(int argc, char **args) {
>>int worldSize, universeSize, *puniverseSize, flag;
>>
>>MPI_Comm everyone; //intercomm
>>boost::filesystem::path curPath =
>> boost::filesystem::absolute(boost::filesystem::current_path());
>>
>>std::string toRun = (curPath / "slave_exe").string();
>>
>>int ret = MPI_Init(&argc, &args);
>>
>>if(ret != MPI_SUCCESS) {
>>std::cerr << "failed init" << std::endl;
>>return -1;
>>}
>>
>>MPI_Comm_size(MPI_COMM_WORLD, &worldSize);
>>
>>if(worldSize != 1) {
>>std::cerr << "too many masters" << std::endl;
>>}
>>
>>MPI_Attr_get(MPI_COMM_WORLD, MPI_UNIVERSE_SIZE, &puniverseSize, &flag);
>>
>>if(!flag) {
>>std::cerr << "no universe size" << std::endl;
>>return -1;
>>}
>>universeSize = *puniverseSize;
>>if(universeSize == 1) {
>>std::cerr << "cannot start slaves... not enough nodes" << std::endl;
>>}
>>
>>
>>char *buf = (char*)alloca(toRun.size() + 1);
>>memcpy(buf, toRun.c_str(), toRun.size());
>>buf[toRun.size()] = '\0';
>>
>>MPI_Comm_spawn(buf, MPI_ARGV_NULL, universeSize-1, MPI_INFO_NULL,
>> 0, MPI_COMM_SELF, &everyone,
>>   MPI_ERRCODES_IGNORE);
>>
>>std::cerr << "master spawned " << universeSize-1 << " slaves..."
>> << std::endl;
>>
>>MPI_Finalize();
>>
>>   return 0;
>> }
>>
>>
>> //slave.cpp
>> #include <mpi.h>
>> #include <iostream>
>>
>> int main(int argc, char **args) {
>>int size;
>>MPI_Comm parent;
>>MPI_Init(&argc, &args);
>>
>>MPI_Comm_get_parent(&parent);
>>
>>if(parent == MPI_COMM_NULL) {
>>std::cerr << "slave has no parent" << std::endl;
>>}
>>MPI_Comm_remote_size(parent, &size);
>>if(size != 1) {
>>std::cerr << "parent size is " << size << std::endl;
>>}
>>
>>std::cerr << "slave responding..." << std::endl;
>>
>>MPI_Finalize();
>>
>>return 0;
>> }
>>
>>
>> Any ideas?  Thanks for any help.
>>
>>  Brian
>>
>> On Wed, Aug 22, 2012 at 9:03 AM, Ralph Castain  wrote:
>>> It really is just that simple :-)
>>>
>>> On Aug 22, 2012, at 8:56 AM, Brian Budge  wrote:
>>>
 Okay.  Is there a tutorial or FAQ for setting everything up?  Or is it
 really just that simple?  I don't need to run a copy of the orte
 server somewhere?

 if my current ip is 192.168.0.1,

 0 > echo 192.168.0.11 > /tmp/hostfile
 1 > echo 192.168.0.12 >> /tmp/hostfile
 2 > export OMPI_MCA_orte_default_hostfile=/tmp/hostfile
 3 > ./mySpawningExe

 At this point, mySpawningExe will be the master, running on
 192.168.0.1, and I can have spawned, for example, childExe on
 192.168.0.11 and 192.168.0.12?  Or childExe1 on 192.168.0.11 and
 childExe2 on 192.168.0.12?

 Thanks for the help.

 Brian

 On Wed, Aug 22, 2012 at 7:15 AM, Ralph Castain  wrote:
> Sure, that's still true on all 1.3 or above releases. All you need to do 
> is set the hostfile envar so we pick it up:
>
> OMPI_MCA_orte_default_hostfile=<path to hostfile>
>
>
> On Aug 21, 2012, at 7:23 PM, Brian Budge  wrote:
>
>> Hi.  I know this is an old thread, but I'm curious if there are any
>> tutorials describing how to set this up?  Is this still available on
>> newer open mpi versions?
>>
>> Thanks,
>> Brian
>>
>> On Fri, Jan 4, 2008 at 7:57 AM, Ralph Castain  wrote:
>>> Hi Elena
>>>
>>> I'm copying this to the user list just to correct a mis-statement on my 
>>> part
>>> in an earlier message that went there. I had stated that a singleton 
>>> could
>>> comm_spawn onto other nodes listed in a hostfile by setting an 
>>> environmental
>>> variable that pointed us to the hostfile.
>>>
>>> This is incorrect in the 1.2 code series. That series does not allow
>>> singletons to read a hostfile at all. Hence, any comm_spawn done by a
>>> singleton can only launch child processes on the singleton's local host.
>>>
>>> This 

Re: [OMPI users] MPI::Intracomm::Spawn and cluster configuration

2012-08-28 Thread Ralph Castain
Hmmm...what is in your "hostsfile"?

On Aug 28, 2012, at 2:33 PM, Brian Budge  wrote:

> Hi Ralph -
> 
> Thanks for confirming this is possible.  I'm trying this and currently
> failing.  Perhaps there's something I'm missing in the code to make
> this work.  Here are the two instantiations and their outputs:
> 
>> LD_LIBRARY_PATH=/home/budgeb/p4/pseb/external/lib.dev:/usr/local/lib 
>> OMPI_MCA_orte_default_hostfile=`pwd`/hostsfile ./master_exe
> cannot start slaves... not enough nodes
> 
>> LD_LIBRARY_PATH=/home/budgeb/p4/pseb/external/lib.dev:/usr/local/lib 
>> OMPI_MCA_orte_default_hostfile=`pwd`/hostsfile mpirun -n 1 ./master_exe
> master spawned 1 slaves...
> slave responding...
> 
> 
> The code:
> 
> //master.cpp
> #include <mpi.h>
> #include <boost/filesystem.hpp>
> #include <iostream>
> 
> int main(int argc, char **args) {
>int worldSize, universeSize, *puniverseSize, flag;
> 
>MPI_Comm everyone; //intercomm
>boost::filesystem::path curPath =
> boost::filesystem::absolute(boost::filesystem::current_path());
> 
>std::string toRun = (curPath / "slave_exe").string();
> 
>int ret = MPI_Init(&argc, &args);
> 
>if(ret != MPI_SUCCESS) {
>std::cerr << "failed init" << std::endl;
>return -1;
>}
> 
>MPI_Comm_size(MPI_COMM_WORLD, &worldSize);
> 
>if(worldSize != 1) {
>std::cerr << "too many masters" << std::endl;
>}
> 
>MPI_Attr_get(MPI_COMM_WORLD, MPI_UNIVERSE_SIZE, &puniverseSize, &flag);
> 
>if(!flag) {
>std::cerr << "no universe size" << std::endl;
>return -1;
>}
>universeSize = *puniverseSize;
>if(universeSize == 1) {
>std::cerr << "cannot start slaves... not enough nodes" << std::endl;
>}
> 
> 
>char *buf = (char*)alloca(toRun.size() + 1);
>memcpy(buf, toRun.c_str(), toRun.size());
>buf[toRun.size()] = '\0';
> 
>MPI_Comm_spawn(buf, MPI_ARGV_NULL, universeSize-1, MPI_INFO_NULL,
> 0, MPI_COMM_SELF, &everyone,
>   MPI_ERRCODES_IGNORE);
> 
>std::cerr << "master spawned " << universeSize-1 << " slaves..."
> << std::endl;
> 
>MPI_Finalize();
> 
>   return 0;
> }
> 
> 
> //slave.cpp
> #include <mpi.h>
> #include <iostream>
> 
> int main(int argc, char **args) {
>int size;
>MPI_Comm parent;
>MPI_Init(&argc, &args);
> 
>MPI_Comm_get_parent(&parent);
> 
>if(parent == MPI_COMM_NULL) {
>std::cerr << "slave has no parent" << std::endl;
>}
>MPI_Comm_remote_size(parent, &size);
>if(size != 1) {
>std::cerr << "parent size is " << size << std::endl;
>}
> 
>std::cerr << "slave responding..." << std::endl;
> 
>MPI_Finalize();
> 
>return 0;
> }
> 
> 
> Any ideas?  Thanks for any help.
> 
>  Brian
> 
> On Wed, Aug 22, 2012 at 9:03 AM, Ralph Castain  wrote:
>> It really is just that simple :-)
>> 
>> On Aug 22, 2012, at 8:56 AM, Brian Budge  wrote:
>> 
>>> Okay.  Is there a tutorial or FAQ for setting everything up?  Or is it
>>> really just that simple?  I don't need to run a copy of the orte
>>> server somewhere?
>>> 
>>> if my current ip is 192.168.0.1,
>>> 
>>> 0 > echo 192.168.0.11 > /tmp/hostfile
>>> 1 > echo 192.168.0.12 >> /tmp/hostfile
>>> 2 > export OMPI_MCA_orte_default_hostfile=/tmp/hostfile
>>> 3 > ./mySpawningExe
>>> 
>>> At this point, mySpawningExe will be the master, running on
>>> 192.168.0.1, and I can have spawned, for example, childExe on
>>> 192.168.0.11 and 192.168.0.12?  Or childExe1 on 192.168.0.11 and
>>> childExe2 on 192.168.0.12?
>>> 
>>> Thanks for the help.
>>> 
>>> Brian
>>> 
>>> On Wed, Aug 22, 2012 at 7:15 AM, Ralph Castain  wrote:
 Sure, that's still true on all 1.3 or above releases. All you need to do 
 is set the hostfile envar so we pick it up:
 
 OMPI_MCA_orte_default_hostfile=<path to hostfile>
 
 
 On Aug 21, 2012, at 7:23 PM, Brian Budge  wrote:
 
> Hi.  I know this is an old thread, but I'm curious if there are any
> tutorials describing how to set this up?  Is this still available on
> newer open mpi versions?
> 
> Thanks,
> Brian
> 
> On Fri, Jan 4, 2008 at 7:57 AM, Ralph Castain  wrote:
>> Hi Elena
>> 
>> I'm copying this to the user list just to correct a mis-statement on my 
>> part
>> in an earlier message that went there. I had stated that a singleton 
>> could
>> comm_spawn onto other nodes listed in a hostfile by setting an 
>> environmental
>> variable that pointed us to the hostfile.
>> 
>> This is incorrect in the 1.2 code series. That series does not allow
>> singletons to read a hostfile at all. Hence, any comm_spawn done by a
>> singleton can only launch child processes on the singleton's local host.
>> 
>> This situation has been corrected for the upcoming 1.3 code series. For 
>> the
>> 1.2 series, though, you will have to do it via an mpirun command line.
>> 
>> Sorry for the confusion - I sometimes have too many code families to keep
>> 

Re: [OMPI users] MPI::Intracomm::Spawn and cluster configuration

2012-08-28 Thread Brian Budge
Hi Ralph -

Thanks for confirming this is possible.  I'm trying this and currently
failing.  Perhaps there's something I'm missing in the code to make
this work.  Here are the two instantiations and their outputs:

> LD_LIBRARY_PATH=/home/budgeb/p4/pseb/external/lib.dev:/usr/local/lib 
> OMPI_MCA_orte_default_hostfile=`pwd`/hostsfile ./master_exe
cannot start slaves... not enough nodes

> LD_LIBRARY_PATH=/home/budgeb/p4/pseb/external/lib.dev:/usr/local/lib 
> OMPI_MCA_orte_default_hostfile=`pwd`/hostsfile mpirun -n 1 ./master_exe
master spawned 1 slaves...
slave responding...


The code:

//master.cpp
#include <mpi.h>
#include <boost/filesystem.hpp>
#include <iostream>

int main(int argc, char **args) {
int worldSize, universeSize, *puniverseSize, flag;

MPI_Comm everyone; //intercomm
boost::filesystem::path curPath =
boost::filesystem::absolute(boost::filesystem::current_path());

std::string toRun = (curPath / "slave_exe").string();

int ret = MPI_Init(&argc, &args);

if(ret != MPI_SUCCESS) {
std::cerr << "failed init" << std::endl;
return -1;
}

MPI_Comm_size(MPI_COMM_WORLD, &worldSize);

if(worldSize != 1) {
std::cerr << "too many masters" << std::endl;
}

MPI_Attr_get(MPI_COMM_WORLD, MPI_UNIVERSE_SIZE, &puniverseSize, &flag);

if(!flag) {
std::cerr << "no universe size" << std::endl;
return -1;
}
universeSize = *puniverseSize;
if(universeSize == 1) {
std::cerr << "cannot start slaves... not enough nodes" << std::endl;
}


char *buf = (char*)alloca(toRun.size() + 1);
memcpy(buf, toRun.c_str(), toRun.size());
buf[toRun.size()] = '\0';

MPI_Comm_spawn(buf, MPI_ARGV_NULL, universeSize-1, MPI_INFO_NULL,
0, MPI_COMM_SELF, &everyone,
   MPI_ERRCODES_IGNORE);

std::cerr << "master spawned " << universeSize-1 << " slaves..."
<< std::endl;

MPI_Finalize();

   return 0;
}


//slave.cpp
#include <mpi.h>
#include <iostream>

int main(int argc, char **args) {
int size;
MPI_Comm parent;
MPI_Init(&argc, &args);

MPI_Comm_get_parent(&parent);

if(parent == MPI_COMM_NULL) {
std::cerr << "slave has no parent" << std::endl;
}
MPI_Comm_remote_size(parent, &size);
if(size != 1) {
std::cerr << "parent size is " << size << std::endl;
}

std::cerr << "slave responding..." << std::endl;

MPI_Finalize();

return 0;
}


Any ideas?  Thanks for any help.

  Brian

On Wed, Aug 22, 2012 at 9:03 AM, Ralph Castain  wrote:
> It really is just that simple :-)
>
> On Aug 22, 2012, at 8:56 AM, Brian Budge  wrote:
>
>> Okay.  Is there a tutorial or FAQ for setting everything up?  Or is it
>> really just that simple?  I don't need to run a copy of the orte
>> server somewhere?
>>
>> if my current ip is 192.168.0.1,
>>
>> 0 > echo 192.168.0.11 > /tmp/hostfile
>> 1 > echo 192.168.0.12 >> /tmp/hostfile
>> 2 > export OMPI_MCA_orte_default_hostfile=/tmp/hostfile
>> 3 > ./mySpawningExe
>>
>> At this point, mySpawningExe will be the master, running on
>> 192.168.0.1, and I can have spawned, for example, childExe on
>> 192.168.0.11 and 192.168.0.12?  Or childExe1 on 192.168.0.11 and
>> childExe2 on 192.168.0.12?
>>
>> Thanks for the help.
>>
>>  Brian
>>
>> On Wed, Aug 22, 2012 at 7:15 AM, Ralph Castain  wrote:
>>> Sure, that's still true on all 1.3 or above releases. All you need to do is 
>>> set the hostfile envar so we pick it up:
>>>
>>> OMPI_MCA_orte_default_hostfile=<path to hostfile>
>>>
>>>
>>> On Aug 21, 2012, at 7:23 PM, Brian Budge  wrote:
>>>
 Hi.  I know this is an old thread, but I'm curious if there are any
 tutorials describing how to set this up?  Is this still available on
 newer open mpi versions?

 Thanks,
 Brian

 On Fri, Jan 4, 2008 at 7:57 AM, Ralph Castain  wrote:
> Hi Elena
>
> I'm copying this to the user list just to correct a mis-statement on my 
> part
> in an earlier message that went there. I had stated that a singleton could
> comm_spawn onto other nodes listed in a hostfile by setting an 
> environmental
> variable that pointed us to the hostfile.
>
> This is incorrect in the 1.2 code series. That series does not allow
> singletons to read a hostfile at all. Hence, any comm_spawn done by a
> singleton can only launch child processes on the singleton's local host.
>
> This situation has been corrected for the upcoming 1.3 code series. For 
> the
> 1.2 series, though, you will have to do it via an mpirun command line.
>
> Sorry for the confusion - I sometimes have too many code families to keep
> straight in this old mind!
>
> Ralph
>
>
> On 1/4/08 5:10 AM, "Elena Zhebel"  wrote:
>
>> Hello Ralph,
>>
>> Thank you very much for the explanations.
>> But I still do not get it running...
>>
>> For the case
>> mpirun -n 1 -hostfile my_hostfile -host my_master_host my_master.exe

Re: [OMPI users] MPI_Init

2012-08-28 Thread Ralph Castain
Oh crud - yes we do. Checking on it...

On Aug 28, 2012, at 2:23 PM, Ralph Castain  wrote:

> Glancing at the code, I don't see anywhere that we trap SIGCHLD outside of 
> mpirun and the orte daemons - certainly not inside an MPI app. What version 
> of OMPI are you using?
> 
> On Aug 28, 2012, at 2:06 PM, Tony Raymond  wrote:
> 
>> Hi,
>> 
>> I have an application that uses openMPI and creates some child processes 
>> using fork(). I've been trying to catch SIGCHLD in order to check the exit 
>> status of these processes so that the program will exit if a child errors 
>> out. 
>> 
>> I've found out that if I set the SIGCHLD handler before calling MPI_Init, 
>> MPI_Init sets the SIGCHLD handler so that my application appears to ignore 
>> SIGCHLD, but if I set my handler after MPI_Init, the application handles 
>> SIGCHLD appropriately. 
>> 
>> I'm wondering if there are any problems that could come up by changing the 
>> SIGCHLD handler, and why MPI_Init modifies the SIGCHLD handler in the first 
>> place.
>> 
>> Thanks,
>> Tony




Re: [OMPI users] MPI_Init

2012-08-28 Thread Ralph Castain
Glancing at the code, I don't see anywhere that we trap SIGCHLD outside of 
mpirun and the orte daemons - certainly not inside an MPI app. What version of 
OMPI are you using?

On Aug 28, 2012, at 2:06 PM, Tony Raymond  wrote:

> Hi,
> 
> I have an application that uses openMPI and creates some child processes 
> using fork(). I've been trying to catch SIGCHLD in order to check the exit 
> status of these processes so that the program will exit if a child errors 
> out. 
> 
> I've found out that if I set the SIGCHLD handler before calling MPI_Init, 
> MPI_Init sets the SIGCHLD handler so that my application appears to ignore 
> SIGCHLD, but if I set my handler after MPI_Init, the application handles 
> SIGCHLD appropriately. 
> 
> I'm wondering if there are any problems that could come up by changing the 
> SIGCHLD handler, and why MPI_Init modifies the SIGCHLD handler in the first 
> place.
> 
> Thanks,
> Tony




[OMPI users] MPI_Init

2012-08-28 Thread Tony Raymond
Hi,

I have an application that uses openMPI and creates some child processes using 
fork(). I've been trying to catch SIGCHLD in order to check the exit status of 
these processes so that the program will exit if a child errors out. 

I've found out that if I set the SIGCHLD handler before calling MPI_Init, 
MPI_Init sets the SIGCHLD handler so that my application appears to ignore 
SIGCHLD, but if I set my handler after MPI_Init, the application handles 
SIGCHLD appropriately. 

I'm wondering if there are any problems that could come up by changing the 
SIGCHLD handler, and why MPI_Init modifies the SIGCHLD handler in the first 
place.

Thanks,
Tony


Re: [hwloc-users] lstopo and GPus

2012-08-28 Thread Gabriele Fatigati
Hi,

thanks for the reply.

How can the cuda branch help me? The lstopo output of that branch is the same as
that of the trunk.

Another question: the GPU IDs are the same (10de:06d2). How is that possible?

Thanks.

2012/8/28 Samuel Thibault 

> Brice Goglin, on Tue 28 Aug 2012 14:43:53 +0200, wrote:
> > > $ lstopo
> > >   Socket #0
> > >   Socket #1
> > > PCI...
> > > (connected to socket #1)
> > >
> > > vs
> > >
> > > $ lstopo
> > >   Socket #0
> > >   Socket #1
> > >   PCI...
> > > (connected to both sockets)
> >
> > Fortunately, this won't occur in most cases (including Gabriele's
> > machines) because there's a NUMAnode object above each socket.
>
> Oops, I actually meant NUMAnode above
>
> > Both the socket and the PCI bus are drawn inside the NUMA box, so
> > things appear OK in graphics too.
>
> Indeed, if the PCI bus was connected to one NUMAnode/socket only, it
> would be drawn inside, which is not the case.
>
> > Gabriele, assuming you have a dual Xeon X56xx Westmere machine, there
> > are plenty of such platforms where the GPU is indeed connected to both
> > sockets. Or it could be a buggy BIOS.
>
> Agreed.
>
> Samuel



-- 
Ing. Gabriele Fatigati

HPC specialist

SuperComputing Applications and Innovation Department

Via Magnanelli 6/3, Casalecchio di Reno (BO) Italy

www.cineca.it    Tel: +39 051 6171722

g.fatigati [AT] cineca.it


Re: [OMPI users] deprecated MCA parameter

2012-08-28 Thread Jeff Squyres
Ralph and I talked about this -- it seems like we should extend the help 
message.  If there is no replacement for the param, it should say that.  If 
there is a replacement, it should be listed.

We'll take this as a feature enhancement.

On Aug 28, 2012, at 9:23 AM, jody wrote:

> Thanks Ralph
> 
> I renamed the parameter in my script,
> and now there are no more ugly messages :)
> 
> Jody
> 
> On Tue, Aug 28, 2012 at 3:17 PM, Ralph Castain  wrote:
>> Ah, I see - yeah, the parameter technically is being renamed to 
>> "orte_rsh_agent" to avoid having users need to know the internal topology of 
>> the code base (i.e., that it is in the plm framework and the rsh component). 
>> It will always be there, though - only the name is changing to protect the 
>> innocent. :-)
>> 
>> 
>> On Aug 28, 2012, at 6:07 AM, jody  wrote:
>> 
>>> Hi Ralph
>>> 
>>> I get one of these messages
>>> --
>>> A deprecated MCA parameter value was specified in the environment or
>>> on the command line.  Deprecated MCA parameters should be avoided;
>>> they may disappear in future releases.
>>> 
>>> Deprecated parameter: plm_rsh_agent
>>> --
>>> for every process that starts...
>>> 
>>> My openmpi version is 1.6 (gentoo package sys-cluster/openmpi-1.6-r1)
>>> 
>>> jody
>>> 
>>> On Tue, Aug 28, 2012 at 2:38 PM, Ralph Castain  wrote:
 Guess I'm confused - what is the issue here? The param still exists:
 
MCA plm: parameter "plm_rsh_agent" (current value: <ssh : rsh>, data source: default value, synonyms:
 pls_rsh_agent, orte_rsh_agent)
 The command used to launch executables on remote 
 nodes (typically either "ssh" or "rsh")
 
 I am unaware of any plans to deprecate it. Is there a problem with it?
 
 On Aug 28, 2012, at 2:24 AM, jody  wrote:
 
> Hi
> 
> In order to open a xterm for each of my processes i use the MCA
> parameter 'plm_rsh_agent'
> like this:
> mpirun -np 5 -hostfile allhosts -mca plm_base_verbose 1 -mca
> plm_rsh_agent "ssh -Y"  --leave-session-attached xterm  -hold -e
> ./MPIProg
> 
> Without the option ' -mca plm_rsh_agent "ssh -Y"' i can't open windows
> from the remote:
> 
> jody@boss /mnt/data1/neander $  mpirun -np 5 -hostfile allhosts
> -mca plm_base_verbose 1   --leave-session-attached xterm -hold -e
> ./MPIStruct
> xterm: Xt error: Can't open display:
> xterm: DISPLAY is not set
> xterm: Xt error: Can't open display:
> xterm: DISPLAY is not set
> xterm: Xt error: Can't open display:
> xterm: DISPLAY is not set
> xterm: Xt error: Can't open display:
> xterm: DISPLAY is not set
> xterm: Xt error: Can't open display:
> xterm: DISPLAY is not set
> --
> mpirun noticed that the job aborted, but has no info as to the process
> that caused that situation.
> --
> 
> Is there some replacement for this parameter,
> or how else can I get MPI to use "ssh -Y" for its connections?
> 
> Thank You
> jody
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
 
 
 ___
 users mailing list
 us...@open-mpi.org
 http://www.open-mpi.org/mailman/listinfo.cgi/users
>>> ___
>>> users mailing list
>>> us...@open-mpi.org
>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>> 
>> 
>> ___
>> users mailing list
>> us...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
> 
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/




Re: [OMPI users] deprecated MCA parameter

2012-08-28 Thread jody
Thanks Ralph

I renamed the parameter in my script,
and now there are no more ugly messages :)

Jody

On Tue, Aug 28, 2012 at 3:17 PM, Ralph Castain  wrote:
> Ah, I see - yeah, the parameter technically is being renamed to 
> "orte_rsh_agent" to avoid having users need to know the internal topology of 
> the code base (i.e., that it is in the plm framework and the rsh component). 
> It will always be there, though - only the name is changing to protect the 
> innocent. :-)
>
>
> On Aug 28, 2012, at 6:07 AM, jody  wrote:
>
>> Hi Ralph
>>
>> I get one of these messages
>> --
>> A deprecated MCA parameter value was specified in the environment or
>> on the command line.  Deprecated MCA parameters should be avoided;
>> they may disappear in future releases.
>>
>>  Deprecated parameter: plm_rsh_agent
>> --
>> for every process that starts...
>>
>> My openmpi version is 1.6 (gentoo package sys-cluster/openmpi-1.6-r1)
>>
>> jody
>>
>> On Tue, Aug 28, 2012 at 2:38 PM, Ralph Castain  wrote:
>>> Guess I'm confused - what is the issue here? The param still exists:
>>>
>>> MCA plm: parameter "plm_rsh_agent" (current value: <ssh : rsh>, data source: default value, synonyms:
>>>  pls_rsh_agent, orte_rsh_agent)
>>>  The command used to launch executables on remote 
>>> nodes (typically either "ssh" or "rsh")
>>>
>>> I am unaware of any plans to deprecate it. Is there a problem with it?
>>>
>>> On Aug 28, 2012, at 2:24 AM, jody  wrote:
>>>
 Hi

 In order to open an xterm for each of my processes I use the MCA
 parameter 'plm_rsh_agent'
 like this:
 mpirun -np 5 -hostfile allhosts -mca plm_base_verbose 1 -mca
 plm_rsh_agent "ssh -Y"  --leave-session-attached xterm  -hold -e
 ./MPIProg

 Without the option ' -mca plm_rsh_agent "ssh -Y"' I can't open windows
 from the remote:

 jody@boss /mnt/data1/neander $  mpirun -np 5 -hostfile allhosts
 -mca plm_base_verbose 1   --leave-session-attached xterm -hold -e
 ./MPIStruct
 xterm: Xt error: Can't open display:
 xterm: DISPLAY is not set
 xterm: Xt error: Can't open display:
 xterm: DISPLAY is not set
 xterm: Xt error: Can't open display:
 xterm: DISPLAY is not set
 xterm: Xt error: Can't open display:
 xterm: DISPLAY is not set
 xterm: Xt error: Can't open display:
 xterm: DISPLAY is not set
 --
 mpirun noticed that the job aborted, but has no info as to the process
 that caused that situation.
 --

 Is there some replacement for this parameter,
 or how else can I get MPI to use "ssh -Y" for its connections?

 Thank You
 jody
 ___
 users mailing list
 us...@open-mpi.org
 http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>
>>>
>>> ___
>>> users mailing list
>>> us...@open-mpi.org
>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>> ___
>> users mailing list
>> us...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>
>
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users



Re: [OMPI users] deprecated MCA parameter

2012-08-28 Thread Ralph Castain
Ah, I see - yeah, the parameter technically is being renamed to 
"orte_rsh_agent" to avoid having users need to know the internal topology of 
the code base (i.e., that it is in the plm framework and the rsh component). It 
will always be there, though - only the name is changing to protect the 
innocent. :-)
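
So the renamed form of jody's command, with everything else unchanged, would be:

  mpirun -np 5 -hostfile allhosts -mca plm_base_verbose 1 \
      -mca orte_rsh_agent "ssh -Y" --leave-session-attached \
      xterm -hold -e ./MPIProg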


On Aug 28, 2012, at 6:07 AM, jody  wrote:

> Hi Ralph
> 
> I get one of these messages
> --
> A deprecated MCA parameter value was specified in the environment or
> on the command line.  Deprecated MCA parameters should be avoided;
> they may disappear in future releases.
> 
>  Deprecated parameter: plm_rsh_agent
> --
> for every process that starts...
> 
> My openmpi version is 1.6 (gentoo package sys-cluster/openmpi-1.6-r1)
> 
> jody
> 
> On Tue, Aug 28, 2012 at 2:38 PM, Ralph Castain  wrote:
>> Guess I'm confused - what is the issue here? The param still exists:
>> 
>> MCA plm: parameter "plm_rsh_agent" (current value: <ssh : rsh>, data source: default value, synonyms:
>>  pls_rsh_agent, orte_rsh_agent)
>>  The command used to launch executables on remote 
>> nodes (typically either "ssh" or "rsh")
>> 
>> I am unaware of any plans to deprecate it. Is there a problem with it?
>> 
>> On Aug 28, 2012, at 2:24 AM, jody  wrote:
>> 
>>> Hi
>>> 
>>> In order to open an xterm for each of my processes I use the MCA
>>> parameter 'plm_rsh_agent'
>>> like this:
>>> mpirun -np 5 -hostfile allhosts -mca plm_base_verbose 1 -mca
>>> plm_rsh_agent "ssh -Y"  --leave-session-attached xterm  -hold -e
>>> ./MPIProg
>>> 
>>> Without the option ' -mca plm_rsh_agent "ssh -Y"' I can't open windows
>>> from the remote:
>>> 
>>> jody@boss /mnt/data1/neander $  mpirun -np 5 -hostfile allhosts
>>> -mca plm_base_verbose 1   --leave-session-attached xterm -hold -e
>>> ./MPIStruct
>>> xterm: Xt error: Can't open display:
>>> xterm: DISPLAY is not set
>>> xterm: Xt error: Can't open display:
>>> xterm: DISPLAY is not set
>>> xterm: Xt error: Can't open display:
>>> xterm: DISPLAY is not set
>>> xterm: Xt error: Can't open display:
>>> xterm: DISPLAY is not set
>>> xterm: Xt error: Can't open display:
>>> xterm: DISPLAY is not set
>>> --
>>> mpirun noticed that the job aborted, but has no info as to the process
>>> that caused that situation.
>>> --
>>> 
>>> Is there some replacement for this parameter,
>>> or how else can I get MPI to use "ssh -Y" for its connections?
>>> 
>>> Thank You
>>> jody
>>> ___
>>> users mailing list
>>> us...@open-mpi.org
>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>> 
>> 
>> ___
>> users mailing list
>> us...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users




Re: [OMPI users] Fwd: lwkmpi

2012-08-28 Thread Tim Prince

On 8/28/2012 5:11 AM, 清风 wrote:




------ Original Message ------
*From:* "295187383"<295187...@qq.com>;
*Sent:* Tuesday, August 28, 2012, 4:13 PM
*To:* "users";
*Subject:* lwkmpi

Hi everybody,
I'm trying to compile Open MPI with the Intel compiler 11.1.07 on Ubuntu.
I have compiled Open MPI many times and could always track down the problem. 
But the error that I'm getting now gives me no clue where to even 
search for the problem.
The configure step seems to have succeeded, but when I try "make all" it 
always shows the errors below:




make[7]: Entering directory 
`/mnt/Software/openmpi-1.6.1/ompi/contrib/vt/vt/tools/opari/tool'
/opt/intel/Compiler/11.1/072/bin/ia32/icpc -DHAVE_CONFIG_H -I. 
-I../../..   -DINSIDE_OPENMPI 
-I/home/lwk/desktop/mnt/Software/openmpi-1.6.1/opal/mca/hwloc/hwloc132/hwloc/include   
-I/usr/include/infiniband -I/usr/include/infiniband  -DOPARI_VT -O3 
-DNDEBUG -finline-functions -pthread -MT opari-ompragma_c.o -MD -MP 
-MF .deps/opari-ompragma_c.Tpo -c -o opari-ompragma_c.o `test -f 
'ompragma_c.cc' || echo './'`ompragma_c.cc

/usr/include/c++/4.5/iomanip(64): error: expected an expression
{ return { __mask }; }
 ^



Looks like your icpc is too old to work with your g++.  If you want to 
build with C++ support, you'll need better matching versions of icpc and 
g++.  icpc support for g++ 4.7 is expected to be released within the next 
month; icpc 12.1 should be fine with g++ 4.5 and 4.6.
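
A quick way to check the pairing, and one possible workaround (assuming the 
failure stays confined to the VampirTrace contrib, which is what the log 
shows), is:

  # compare the two compilers' versions first
  icpc --version
  g++ --version

  # if they mismatch, rebuild without the VT contrib package
  # (a workaround only; it skips VampirTrace rather than fixing icpc)
  ./configure CC=icc CXX=icpc --enable-contrib-no-build=vt ...
  make all install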


--
Tim Prince



Re: [hwloc-users] lstopo and GPus

2012-08-28 Thread Samuel Thibault
Brice Goglin, on Tue 28 Aug 2012 14:43:53 +0200, wrote:
> > $ lstopo
> >   Socket #0
> >   Socket #1
> > PCI...
> > (connected to socket #1)
> >
> > vs
> >
> > $ lstopo
> >   Socket #0
> >   Socket #1
> >   PCI...
> > (connected to both sockets)
> 
> Fortunately, this won't occur in most cases (including Gabriele's
> machines) because there's a NUMAnode object above each socket.

Oops, I actually meant NUMAnode above

> Both the socket and the PCI bus are drawn inside the NUMA box, so
> things appear OK in graphics too.

Indeed, if the PCI bus was connected to one NUMAnode/socket only, it
would be drawn inside, which is not the case.

> Gabriele, assuming you have a dual Xeon X56xx Westmere machine, there
> are plenty of such platforms where the GPU is indeed connected to both
> sockets. Or it could be a buggy BIOS.

Agreed.

Samuel


Re: [hwloc-users] lstopo and GPus

2012-08-28 Thread Brice Goglin
On 28/08/2012 14:23, Samuel Thibault wrote:
> Gabriele Fatigati, on Tue 28 Aug 2012 14:19:44 +0200, wrote:
>> I'm using hwloc 1.5. I would like to see how the GPUs are connected to the
>> processor sockets using the lstopo command. 
> About the connection to the socket, there is indeed no real graphical
> difference between "connected to socket #1" and "connected to all
> sockets". You can use the text output for that:
>
> $ lstopo
>   Socket #0
>   Socket #1
> PCI...
> (connected to socket #1)
>
> vs
>
> $ lstopo
>   Socket #0
>   Socket #1
>   PCI...
> (connected to both sockets)

Fortunately, this won't occur in most cases (including Gabriele's
machines) because there's a NUMAnode object above each socket. Both the
socket and the PCI bus are drawn inside the NUMA box, so things appear
OK in graphics too.

I've never seen the problem on a real machine, but a fake topology with
a PCI bus attached to a socket that is not strictly equal to the above
NUMA node is indeed wrongly displayed.


Gabriele, assuming you have a dual Xeon X56xx Westmere machine, there
are plenty of such platforms where the GPU is indeed connected to both
sockets. Or it could be a buggy BIOS.

Brice



Re: [OMPI users] deprecated MCA parameter

2012-08-28 Thread Ralph Castain
Guess I'm confused - what is the issue here? The param still exists:

 MCA plm: parameter "plm_rsh_agent" (current value: <ssh : rsh>, data source: default value, synonyms:
  pls_rsh_agent, orte_rsh_agent)
  The command used to launch executables on remote 
nodes (typically either "ssh" or "rsh")

I am unaware of any plans to deprecate it. Is there a problem with it?

On Aug 28, 2012, at 2:24 AM, jody  wrote:

> Hi
> 
> In order to open an xterm for each of my processes I use the MCA
> parameter 'plm_rsh_agent'
> like this:
>  mpirun -np 5 -hostfile allhosts -mca plm_base_verbose 1 -mca
> plm_rsh_agent "ssh -Y"  --leave-session-attached xterm  -hold -e
> ./MPIProg
> 
> Without the option ' -mca plm_rsh_agent "ssh -Y"' I can't open windows
> from the remote:
> 
> jody@boss /mnt/data1/neander $  mpirun -np 5 -hostfile allhosts
> -mca plm_base_verbose 1   --leave-session-attached xterm -hold -e
> ./MPIStruct
> xterm: Xt error: Can't open display:
> xterm: DISPLAY is not set
> xterm: Xt error: Can't open display:
> xterm: DISPLAY is not set
> xterm: Xt error: Can't open display:
> xterm: DISPLAY is not set
> xterm: Xt error: Can't open display:
> xterm: DISPLAY is not set
> xterm: Xt error: Can't open display:
> xterm: DISPLAY is not set
> --
> mpirun noticed that the job aborted, but has no info as to the process
> that caused that situation.
> --
> 
> Is there some replacement for this parameter,
> or how else can I get MPI to use "ssh -Y" for its connections?
> 
> Thank You
>  jody
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users




Re: [hwloc-users] lstopo and GPus

2012-08-28 Thread Samuel Thibault
Gabriele Fatigati, on Tue 28 Aug 2012 14:19:44 +0200, wrote:
> I'm using hwloc 1.5. I would like to see how the GPUs are connected to the
> processor sockets using the lstopo command. 

About the connection to the socket, there is indeed no real graphical
difference between "connected to socket #1" and "connected to all
sockets". You can use the text output for that:

$ lstopo
  Socket #0
  Socket #1
PCI...
(connected to socket #1)

vs

$ lstopo
  Socket #0
  Socket #1
  PCI...
(connected to both sockets)
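
A few lstopo switches also help inspect the I/O attachment (hwloc 1.5, 
assuming a build with PCI support):

  lstopo -v           # verbose text output, includes PCI vendor/device details
  lstopo --no-io      # hide PCI and other I/O objects
  lstopo --whole-io   # show every I/O object, even those filtered by default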

Samuel


[hwloc-users] lstopo and GPus

2012-08-28 Thread Gabriele Fatigati
Dear hwloc user,

I'm using hwloc 1.5. I would like to see how the GPUs are connected to the
processor sockets using the lstopo command.

I attach the figure. The system has two GPUs, but I don't understand how to
find that information from the PCI boxes.

Thanks in advance.
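
If the graphical output stays ambiguous, the same locality information is 
available programmatically. A minimal sketch against the hwloc 1.5 C API 
(assuming hwloc was built with PCI support; the file name is just an example):

/* pci_locality.c - print each PCI device and the object it is attached to.
 * Build: cc pci_locality.c -o pci_locality $(pkg-config --cflags --libs hwloc) */
#include <stdio.h>
#include <hwloc.h>

int main(void)
{
    hwloc_topology_t topo;
    hwloc_obj_t pci = NULL;
    char cpus[256];

    hwloc_topology_init(&topo);
    /* ask hwloc to add I/O objects (PCI devices) to the tree */
    hwloc_topology_set_flags(topo, HWLOC_TOPOLOGY_FLAG_IO_DEVICES);
    hwloc_topology_load(topo);

    while ((pci = hwloc_get_next_pcidev(topo, pci)) != NULL) {
        /* the first non-I/O ancestor (NUMA node, socket, or machine)
         * carries the locality (cpuset) of the device */
        hwloc_obj_t anc = hwloc_get_non_io_ancestor_obj(topo, pci);
        hwloc_bitmap_snprintf(cpus, sizeof(cpus), anc->cpuset);
        printf("%04x:%02x:%02x.%x (%s) is attached to %s (cpuset %s)\n",
               pci->attr->pcidev.domain, pci->attr->pcidev.bus,
               pci->attr->pcidev.dev, pci->attr->pcidev.func,
               pci->name ? pci->name : "unknown device",
               hwloc_obj_type_string(anc->type), cpus);
    }
    hwloc_topology_destroy(topo);
    return 0;
}

A GPU attached to a single socket reports that socket's cpuset; one attached 
above both sockets reports the whole machine's cpuset.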



-- 
Ing. Gabriele Fatigati

HPC specialist

SuperComputing Applications and Innovation Department

Via Magnanelli 6/3, Casalecchio di Reno (BO) Italy

www.cineca.it    Tel: +39 051 6171722

g.fatigati [AT] cineca.it


Re: [OMPI users] problem with installing open mpi with intel compiler 11.1.07 on ubuntu

2012-08-28 Thread Jeff Squyres (jsquyres)
Try using the 1.6.2 nightly snapshot tarball and see if that fixes your 
problem. 

I'm not near a computer to give you the specific link - go to openmpi.org, 
find the nightly snapshots, and download one from the v1.6 series. 

Sent from my phone. No type good. 

On Aug 28, 2012, at 6:59 AM, "清风" <295187...@qq.com> wrote:

> 
> Hi everybody, 
>   I'm trying to compile Open MPI with the Intel compiler 11.1.07 on Ubuntu. 
>   I have compiled Open MPI many times and could always track down the problem. But the 
> error that I'm getting now gives me no clue where to even search for the 
> problem. 
>   The configure step seems to have succeeded, but when I try "make all" it always 
> shows the errors below:
> 
> 
> 
>   make[7]: Entering directory `/mnt/Software/openmpi-1.6.1/ompi/contrib/vt/vt/tools/opari/tool'
> /opt/intel/Compiler/11.1/072/bin/ia32/icpc -DHAVE_CONFIG_H -I. -I../../..   
> -DINSIDE_OPENMPI   
> -I/home/lwk/desktop/mnt/Software/openmpi-1.6.1/opal/mca/hwloc/hwloc132/hwloc/include   
> -I/usr/include/infiniband -I/usr/include/infiniband  -DOPARI_VT 
> -O3 -DNDEBUG -finline-functions -pthread -MT opari-ompragma_c.o -MD -MP -MF 
> .deps/opari-ompragma_c.Tpo -c -o opari-ompragma_c.o `test -f 'ompragma_c.cc' 
> || echo './'`ompragma_c.cc
> /usr/include/c++/4.5/iomanip(64): error: expected an expression
>   { return { __mask }; }
>^
> 
> /usr/include/c++/4.5/iomanip(94): error: expected an expression
>   { return { __mask }; }
>^
> 
> /usr/include/c++/4.5/iomanip(125): error: expected an expression
>   { return { __base }; }
>^
> 
> /usr/include/c++/4.5/iomanip(193): error: expected an expression
>   { return { __n }; }
>^
> 
> /usr/include/c++/4.5/iomanip(223): error: expected an expression
>   { return { __n }; }
>^
> 
> /usr/include/c++/4.5/iomanip(163): error: expected an expression
> { return { __c }; }
>  ^
> 
> compilation aborted for ompragma_c.cc (code 2)
> make[7]: *** [opari-ompragma_c.o] Error 2
> make[7]: Leaving directory 
> `/mnt/Software/openmpi-1.6.1/ompi/contrib/vt/vt/tools/opari/tool'
> make[6]: *** [all-recursive] Error 1
> make[6]: Leaving directory 
> `/mnt/Software/openmpi-1.6.1/ompi/contrib/vt/vt/tools/opari'
> make[5]: *** [all-recursive] Error 1
> make[5]: Leaving directory 
> `/mnt/Software/openmpi-1.6.1/ompi/contrib/vt/vt/tools'
> make[4]: *** [all-recursive] Error 1
> make[4]: Leaving directory 
> `/mnt/Software/openmpi-1.6.1/ompi/contrib/vt/vt'
> make[3]: *** [all] Error 2
> make[3]: Leaving directory 
> `/mnt/Software/openmpi-1.6.1/ompi/contrib/vt/vt'
> make[2]: *** [all-recursive] Error 1
> make[2]: Leaving directory 
> `/mnt/Software/openmpi-1.6.1/ompi/contrib/vt'
> make[1]: *** [all-recursive] Error 1
> make[1]: Leaving directory `/mnt/Software/openmpi-1.6.1/ompi'
> make: *** [all-recursive] Error 1
> 
> 
> 
> 
>   My files "configure.log" and "make.out" are in the attachment lwkmpi.zip.
>  with best regards
>Liang Wenke
> 
> 
> 
> 
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users




[OMPI users] problem with installing open mpi with intel compiler 11.1.07 on ubuntu

2012-08-28 Thread 清风
Hi everybody, 
I'm trying to compile Open MPI with the Intel compiler 11.1.07 on Ubuntu. 
I have compiled Open MPI many times and could always track down the problem. But the 
error that I'm getting now gives me no clue where to even search for the 
problem. 
The configure step seems to have succeeded, but when I try "make all" it always 
shows the errors below:



make[7]: Entering directory `/mnt/Software/openmpi-1.6.1/ompi/contrib/vt/vt/tools/opari/tool'
/opt/intel/Compiler/11.1/072/bin/ia32/icpc  -DHAVE_CONFIG_H -I. -I../../..   
-DINSIDE_OPENMPI
-I/home/lwk/desktop/mnt/Software/openmpi-1.6.1/opal/mca/hwloc/hwloc132/hwloc/include   
-I/usr/include/infiniband -I/usr/include/infiniband   -DOPARI_VT -O3 
-DNDEBUG -finline-functions -pthread -MT  opari-ompragma_c.o -MD -MP -MF 
.deps/opari-ompragma_c.Tpo -c -o  opari-ompragma_c.o `test -f 'ompragma_c.cc' 
|| echo './'`ompragma_c.cc
/usr/include/c++/4.5/iomanip(64): error: expected an expression
{ return { __mask }; }
 ^

/usr/include/c++/4.5/iomanip(94): error: expected an expression
{ return { __mask }; }
 ^

/usr/include/c++/4.5/iomanip(125): error: expected an expression
{ return { __base }; }
 ^

/usr/include/c++/4.5/iomanip(193): error: expected an expression
{ return { __n }; }
 ^

/usr/include/c++/4.5/iomanip(223): error: expected an expression
{ return { __n }; }
 ^

/usr/include/c++/4.5/iomanip(163): error: expected an expression
  { return { __c }; }
   ^

compilation aborted for ompragma_c.cc (code 2)
make[7]: *** [opari-ompragma_c.o] Error 2
make[7]: Leaving directory 
`/mnt/Software/openmpi-1.6.1/ompi/contrib/vt/vt/tools/opari/tool'
make[6]: *** [all-recursive] Error 1
make[6]: Leaving directory 
`/mnt/Software/openmpi-1.6.1/ompi/contrib/vt/vt/tools/opari'
make[5]: *** [all-recursive] Error 1
make[5]: Leaving directory 
`/mnt/Software/openmpi-1.6.1/ompi/contrib/vt/vt/tools'
make[4]: *** [all-recursive] Error 1
make[4]: Leaving directory 
`/mnt/Software/openmpi-1.6.1/ompi/contrib/vt/vt'
make[3]: *** [all] Error 2
make[3]: Leaving directory 
`/mnt/Software/openmpi-1.6.1/ompi/contrib/vt/vt'
make[2]: *** [all-recursive] Error 1
make[2]: Leaving directory `/mnt/Software/openmpi-1.6.1/ompi/contrib/vt'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory `/mnt/Software/openmpi-1.6.1/ompi'
make: *** [all-recursive] Error 1




My files "configure.log" and "make.out" are in the attachment lwkmpi.zip.
   with best regards
 Liang Wenke





[OMPI users] deprecated MCA parameter

2012-08-28 Thread jody
Hi

In order to open an xterm for each of my processes I use the MCA
parameter 'plm_rsh_agent'
like this:
  mpirun -np 5 -hostfile allhosts -mca plm_base_verbose 1 -mca
plm_rsh_agent "ssh -Y"  --leave-session-attached xterm  -hold -e
./MPIProg

Without the option ' -mca plm_rsh_agent "ssh -Y"' I can't open windows
from the remote:

jody@boss /mnt/data1/neander $  mpirun -np 5 -hostfile allhosts
-mca plm_base_verbose 1   --leave-session-attached xterm -hold -e
./MPIStruct
xterm: Xt error: Can't open display:
xterm: DISPLAY is not set
xterm: Xt error: Can't open display:
xterm: DISPLAY is not set
xterm: Xt error: Can't open display:
xterm: DISPLAY is not set
xterm: Xt error: Can't open display:
xterm: DISPLAY is not set
xterm: Xt error: Can't open display:
xterm: DISPLAY is not set
--
mpirun noticed that the job aborted, but has no info as to the process
that caused that situation.
--

Is there some replacement for this parameter,
or how else can I get MPI to use "ssh -Y" for its connections?

Thank You
  jody


[OMPI users] Fwd: lwkmpi

2012-08-28 Thread 清风
------ Original Message ------
From: "295187383"<295187...@qq.com>;
Sent: Tuesday, August 28, 2012, 4:13 PM
To: "users"; 

Subject: lwkmpi



Hi everybody, 
I'm trying to compile Open MPI with the Intel compiler 11.1.07 on Ubuntu. 
I have compiled Open MPI many times and could always track down the problem. But the 
error that I'm getting now gives me no clue where to even search for the 
problem. 
The configure step seems to have succeeded, but when I try "make all" it always 
shows the errors below:



make[7]: Entering directory 
`/mnt/Software/openmpi-1.6.1/ompi/contrib/vt/vt/tools/opari/tool'
/opt/intel/Compiler/11.1/072/bin/ia32/icpc  -DHAVE_CONFIG_H -I. -I../../..   
-DINSIDE_OPENMPI
-I/home/lwk/desktop/mnt/Software/openmpi-1.6.1/opal/mca/hwloc/hwloc132/hwloc/include   
-I/usr/include/infiniband -I/usr/include/infiniband   -DOPARI_VT -O3 
-DNDEBUG -finline-functions -pthread -MT  opari-ompragma_c.o -MD -MP -MF 
.deps/opari-ompragma_c.Tpo -c -o  opari-ompragma_c.o `test -f 'ompragma_c.cc' 
|| echo './'`ompragma_c.cc
/usr/include/c++/4.5/iomanip(64): error: expected an expression
{ return { __mask }; }
 ^

/usr/include/c++/4.5/iomanip(94): error: expected an expression
{ return { __mask }; }
 ^

/usr/include/c++/4.5/iomanip(125): error: expected an expression
{ return { __base }; }
 ^

/usr/include/c++/4.5/iomanip(193): error: expected an expression
{ return { __n }; }
 ^

/usr/include/c++/4.5/iomanip(223): error: expected an expression
{ return { __n }; }
 ^

/usr/include/c++/4.5/iomanip(163): error: expected an expression
  { return { __c }; }
   ^

compilation aborted for ompragma_c.cc (code 2)
make[7]: *** [opari-ompragma_c.o] Error 2
make[7]: Leaving directory 
`/mnt/Software/openmpi-1.6.1/ompi/contrib/vt/vt/tools/opari/tool'
make[6]: *** [all-recursive] Error 1
make[6]: Leaving directory 
`/mnt/Software/openmpi-1.6.1/ompi/contrib/vt/vt/tools/opari'
make[5]: *** [all-recursive] Error 1
make[5]: Leaving directory `/mnt/Software/openmpi-1.6.1/ompi/contrib/vt/vt/tools'
make[4]: *** [all-recursive] Error 1
make[4]: Leaving directory `/mnt/Software/openmpi-1.6.1/ompi/contrib/vt/vt'
make[3]: *** [all] Error 2
make[3]: Leaving directory `/mnt/Software/openmpi-1.6.1/ompi/contrib/vt/vt'
make[2]: *** [all-recursive] Error 1
make[2]: Leaving directory `/mnt/Software/openmpi-1.6.1/ompi/contrib/vt'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory `/mnt/Software/openmpi-1.6.1/ompi'
make: *** [all-recursive] Error 1




My files "configure.log" and "make.out" are in the attachment lwkmpi.zip.
   with best regards
 Liang Wenke





Re: [OMPI users] error compiling openmpi-1.6.1 on Windows 7

2012-08-28 Thread Shiqing Fan

Hi Siegmar,

It seems that the runtime environment is messed up with the different 
versions of Open MPI. I suggest you completely remove all the 
installations and install 1.6.1 again (just build the installation 
project again). It should work without any problem under Cygwin too.


Shiqing

On 2012-08-27 4:02 PM, Siegmar Gross wrote:

Hi,

thank you very much for your reply. I compiled and installed
openmpi-1.6.1. Unfortunately I cannot compile programs because
"mpicc" uses wrong path names. I have set an environment for
openmpi-1.6.1 as you can see from the following output.

D:\...prog\mpi\small_prog>set | c:\cygwin\bin\grep openmpi
LIB=C:\Program Files (x86)\Microsoft Visual Studio 10.0\VC\lib\amd64;
   C:\Program Files (x86)\Microsoft Visual Studio 10.0\VC\atlmfc\lib\amd64;
   C:\Program Files (x86)\Microsoft SDKs\Windows\v7.0A\Lib\x64;
   C:\Program Files\openmpi-1.6.1\lib

LIBPATH=C:\Program Files (x86)\Microsoft Visual Studio 10.0\VC\lib\amd64;
   C:\Program Files (x86)\Microsoft Visual Studio 10.0\VC\atlmfc\lib\amd64;
   C:\Program Files\openmpi-1.6.1\lib

Path=C:\Program Files (x86)\Microsoft Visual Studio 10.0\VC\bin\amd64;
   C:\Program Files (x86)\Microsoft Visual Studio 10.0\VC\vcpackages;
   C:\Program Files (x86)\Microsoft Visual Studio 10.0\Common7\IDE\;
   C:\Program Files (x86)\Microsoft Visual Studio 10.0\Common7\Tools\;
   C:\Program Files (x86)\Microsoft SDKs\Windows\v7.0A\Bin\NETFX 4.0 Tools\x64;
   C:\Program Files (x86)\Microsoft SDKs\Windows\v7.0A\Bin\x64;
   C:\Program Files (x86)\Microsoft SDKs\Windows\v7.0A\Bin\;
   C:\Windows\System32;
   C:\Windows;
   C:\Windows\System32\Wbem;
   C:\Program Files\openmpi-1.6.1\bin;
   C:\cmd;.



I get the following error when I try to compile my program
because of "/LIBPATH:C:\Program Files (x86)\openmpi-1.6/lib".

D:\...\prog\mpi\small_prog>mpicc init_finalize.c
Microsoft (R) C/C++ Optimizing Compiler Version 16.00.40219.01 for x64
Copyright (C) Microsoft Corporation. All rights reserved.
init_finalize.c
Microsoft (R) Incremental Linker Version 10.00.40219.01
Copyright (C) Microsoft Corporation.  All rights reserved.
/out:init_finalize.exe
"/LIBPATH:C:\Program Files (x86)\openmpi-1.6/lib"
libmpi.lib
libopen-pal.lib
libopen-rte.lib
advapi32.lib
Ws2_32.lib
shlwapi.lib
init_finalize.obj
init_finalize.obj : error LNK2019: unresolved external symbol 
"__imp_MPI_Finalize" referenced in function "main".
init_finalize.obj : error LNK2019: unresolved external symbol 
"__imp_MPI_Init" referenced in function "main".
init_finalize.exe : fatal error LNK1120: 2 unresolved externals.



When I start in a new command shell without my MPI environment,
I get the following outputs for "mpicc -show". The first one
is OK, but the other two are wrong because they point to 32-bit
libraries instead of 64-bit ones. Why do both versions point
to openmpi-1.6? I downloaded and installed the precompiled
32- and 64-bit version 1.6 from open-mpi.org.

C:\Program Files>openmpi-1.5.1\bin\mpicc -show
cl.exe /I"C:/Program Files/openmpi-1.5.1/include" /TC /D "OMPI_IMPORTS" /D 
"OPAL_IMPORTS"
/D "ORTE_IMPORTS" /link /LIBPATH:"C:/Program Files/openmpi-1.5.1/lib" 
libmpi.lib libopen-p
al.lib libopen-rte.lib advapi32.lib Ws2_32.lib shlwapi.lib

C:\Program Files>openmpi-1.6\bin\mpicc -show
cl.exe /I"C:\Program Files (x86)\openmpi-1.6/include" /TC /DOMPI_IMPORTS 
/DOPAL_IMPORTS /D
ORTE_IMPORTS /link /LIBPATH:"C:\Program Files (x86)\openmpi-1.6/lib" libmpi.lib 
libopen-pa
l.lib libopen-rte.lib advapi32.lib Ws2_32.lib shlwapi.lib

C:\Program Files>openmpi-1.6.1\bin\mpicc -show
cl.exe /I"C:\Program Files (x86)\openmpi-1.6/include" /TC /DOMPI_IMPORTS 
/DOPAL_IMPORTS /D
ORTE_IMPORTS /link /LIBPATH:"C:\Program Files (x86)\openmpi-1.6/lib" libmpi.lib 
libopen-pa
l.lib libopen-rte.lib advapi32.lib Ws2_32.lib shlwapi.lib
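
A note in passing: on Linux and Cygwin installs, the flags that "mpicc -show" 
echoes are read from a wrapper data file under share/openmpi/ 
(mpicc-wrapper-data.txt). Assuming the Windows build ships the same mechanism, 
the stale "openmpi-1.6" prefix should be visible there, so it is one more 
place to check after a clean reinstall. An illustrative excerpt (field names 
taken from a Linux install; exact contents vary by build):

  # share/openmpi/mpicc-wrapper-data.txt (illustrative excerpt)
  compiler=cl.exe
  preprocessor_flags=/DOMPI_IMPORTS /DOPAL_IMPORTS /DORTE_IMPORTS
  libs=libmpi.lib libopen-pal.lib libopen-rte.lib
  includedir=${includedir}
  libdir=${libdir}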


Do you have any idea what I have done wrong? Thank you very
much for any help in advance.


Kind regards

Siegmar



I didn't have this problem when building the binary release.

But solving the problem is very easy. You can just open
\openmpi-1.6.1\ompi\mca\osc\rdma\osc_rdma_data_move.c, go
to line 1099, and change "void*" to "void**". This will get rid of the error.

For the warnings, they are just some redefinitions that cannot
be avoided; they are totally harmless.


Regards,
Shiqing
   




On 2012-08-27 1:02 PM, Siegmar Gross wrote:

Hi,

I tried to compile openmpi-1.6.1 with CMake-2.8.3 and Visual Studio
2010 on Windows 7. All service packs and patches from Microsoft are
installed.

I changed the following options:

CMAKE_BUILD_TYPE: "Debug" modified to "Release"
CMAKE_INSTALL_PREFIX: modified to "c:/Program Files (x86)/openmpi-1.6.1"
OMPI_ENABLE_THREAD_MULTIPLE: "no" changed to "yes"
OMPI_RELEASE_BUILD: "no" changed to "yes"
OPAL_ENABLE_HETEROGENEOUS_SUPPORT: "no" changed to "yes"
OPAL_ENABLE_IPV6:  "yes" changed to "no"
OPAL_ENABLE_MULTI_THREADS: "no" changed to "yes"

I also selected "Release" in "Visual Studio". 

Re: [OMPI users] Application with mxm hangs on startup

2012-08-28 Thread 清风
Dear Prof. Aleksey:
My system is a 32-bit Ubuntu system. Can the MXM version that you pointed me 
to be used on it?
 
   Best regards,
   Liang Wenke





------ Original Message ------
From:  "Aleksey Senin";
Date:  Tue, Aug 28, 2012 04:19 PM
To:  "pavel.mezentsev"; 
Cc:  "users"; 
Subject:  [OMPI users] Application with mxm hangs on startup



Please, download MXM version
http://mellanox.com/downloads/hpc/mxm/v1.1/mxm_1.1.1328.tar

This version checked with OMPI-1.6.2 
(http://svn.open-mpi.org/svn/ompi/branches/v1.6).

In the case of any failure, could you enclose the output?

Regards,
Aleksey.
___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users

Re: [OMPI users] Application with mxm hangs on startup

2012-08-28 Thread 清风
Dear Prof. Aleksey:
   Thank you very much.
   Some failure output files, such as "config.log" and "make.out", are in the 
attachment 'lwkmpi.zip'.




------ Original Message ------
From:  "Aleksey Senin";
Date:  Tue, Aug 28, 2012 04:19 PM
To:  "pavel.mezentsev"; 
Cc:  "users"; 
Subject:  [OMPI users] Application with mxm hangs on startup



Please, download MXM version
http://mellanox.com/downloads/hpc/mxm/v1.1/mxm_1.1.1328.tar

This version checked with OMPI-1.6.2 
(http://svn.open-mpi.org/svn/ompi/branches/v1.6).

In the case of any failure, could you enclose the output?

Regards,
Aleksey.
___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users





[OMPI users] Application with mxm hangs on startup

2012-08-28 Thread Aleksey Senin

Please, download MXM version
http://mellanox.com/downloads/hpc/mxm/v1.1/mxm_1.1.1328.tar

This version was checked with OMPI-1.6.2 
(http://svn.open-mpi.org/svn/ompi/branches/v1.6).
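
For reference, building and selecting mxm looks roughly like this (the 
install path is only an example; mxm is an MTL, so it is selected through 
the cm PML):

  ./configure --with-mxm=/opt/mellanox/mxm ...
  make all install

  # run with the mxm MTL explicitly selected
  mpirun -np 2 -mca pml cm -mca mtl mxm ./your_app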


In the case of any failure, could you enclose the output?

Regards,
Aleksey.


[OMPI users] lwkmpi

2012-08-28 Thread 清风
Hi everybody, 
I'm trying to compile Open MPI with the Intel compiler 11.1.07 on Ubuntu. 
I have compiled Open MPI many times and could always track down the problem. But the 
error that I'm getting now gives me no clue where to even search for the 
problem. 
The configure step seems to have succeeded, but when I try "make all" it always 
shows the errors below:



make[7]: Entering directory 
`/mnt/Software/openmpi-1.6.1/ompi/contrib/vt/vt/tools/opari/tool'
/opt/intel/Compiler/11.1/072/bin/ia32/icpc  -DHAVE_CONFIG_H -I. -I../../..   
-DINSIDE_OPENMPI
-I/home/lwk/desktop/mnt/Software/openmpi-1.6.1/opal/mca/hwloc/hwloc132/hwloc/include   
-I/usr/include/infiniband -I/usr/include/infiniband   -DOPARI_VT -O3 
-DNDEBUG -finline-functions -pthread -MT  opari-ompragma_c.o -MD -MP -MF 
.deps/opari-ompragma_c.Tpo -c -o  opari-ompragma_c.o `test -f 'ompragma_c.cc' 
|| echo './'`ompragma_c.cc
/usr/include/c++/4.5/iomanip(64): error: expected an expression
{ return { __mask }; }
 ^

/usr/include/c++/4.5/iomanip(94): error: expected an expression
{ return { __mask }; }
 ^

/usr/include/c++/4.5/iomanip(125): error: expected an expression
{ return { __base }; }
 ^

/usr/include/c++/4.5/iomanip(193): error: expected an expression
{ return { __n }; }
 ^

/usr/include/c++/4.5/iomanip(223): error: expected an expression
{ return { __n }; }
 ^

/usr/include/c++/4.5/iomanip(163): error: expected an expression
  { return { __c }; }
   ^

compilation aborted for ompragma_c.cc (code 2)
make[7]: *** [opari-ompragma_c.o] Error 2
make[7]: Leaving directory 
`/mnt/Software/openmpi-1.6.1/ompi/contrib/vt/vt/tools/opari/tool'
make[6]: *** [all-recursive] Error 1
make[6]: Leaving directory 
`/mnt/Software/openmpi-1.6.1/ompi/contrib/vt/vt/tools/opari'
make[5]: *** [all-recursive] Error 1
make[5]: Leaving directory `/mnt/Software/openmpi-1.6.1/ompi/contrib/vt/vt/tools'
make[4]: *** [all-recursive] Error 1
make[4]: Leaving directory `/mnt/Software/openmpi-1.6.1/ompi/contrib/vt/vt'
make[3]: *** [all] Error 2
make[3]: Leaving directory `/mnt/Software/openmpi-1.6.1/ompi/contrib/vt/vt'
make[2]: *** [all-recursive] Error 1
make[2]: Leaving directory `/mnt/Software/openmpi-1.6.1/ompi/contrib/vt'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory `/mnt/Software/openmpi-1.6.1/ompi'
make: *** [all-recursive] Error 1




My files "configure.log" and "make.out" are in the attachment lwkmpi.zip.
   with best regards
 Liang Wenke





Re: [OMPI users] Infiniband performance Problem and stalling

2012-08-28 Thread Paul Kapinos

Randolph,
after reading this:

On 08/28/12 04:26, Randolph Pullen wrote:

- On occasions it seems to stall indefinitely, waiting on a single receive.


... I would make a blind guess: are you aware of the IB card parameters for 
registered memory?

http://www.open-mpi.org/faq/?category=openfabrics#ib-low-reg-mem

"Waiting forever" for a single operation is one of symptoms of the problem 
especially in 1.5.3.
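
(On mlx4-based HCAs, the two module parameters from that FAQ entry can be 
checked directly; the values below are examples only:)

  cat /sys/module/mlx4_core/parameters/log_num_mtt
  cat /sys/module/mlx4_core/parameters/log_mtts_per_seg
  # registerable memory ~= 2^log_num_mtt * 2^log_mtts_per_seg * page_size
  # raise it via a module option and reload the driver, e.g.:
  #   options mlx4_core log_num_mtt=24 log_mtts_per_seg=1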



best,
Paul

P.S. The lower performance with 'big' chunks is a known phenomenon, cf.
http://www.scl.ameslab.gov/netpipe/
(image at the bottom of the page). But a chunk size of 64k is fairly small.
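
(Assuming NetPIPE's MPI build is installed as NPmpi, a pairwise run over the 
two nodes in question is a quick way to see where the bandwidth curve drops; 
nodeA/nodeB are placeholders:)

  mpirun -np 2 -host nodeA,nodeB NPmpi -o netpipe.out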




--
Dipl.-Inform. Paul Kapinos   -   High Performance Computing,
RWTH Aachen University, Center for Computing and Communication
Seffenter Weg 23,  D 52074  Aachen (Germany)
Tel: +49 241/80-24915


