Re: [OMPI users] ompi-restart failed && ompi-migrate

2012-04-11 Thread kidd
Hello!
I checked my OS (Ubuntu 11); prelink is not installed. Are there other possible reasons
why ompi-restart fails?
  Thanks.



 From: Josh Hursey 
 To: Open MPI Users  
 Sent: 2012/4/11 (Wed) 8:36 PM
 Subject: Re: [OMPI users] ompi-restart failed && ompi-migrate
 
The 1.5 series does not support process migration, so there is no
ompi-migrate option there. This was only contributed to the trunk (1.7
series). However, changes to the runtime environment over the past few
months have broken this functionality. It is currently unclear when
this will be repaired. We hope to have it fixed and functional again
before the first release of the 1.7 series.

As far as your problem with ompi-restart goes, have you checked the prelink
option on all of your nodes, per:
  https://upc-bugs.lbl.gov/blcr/doc/html/FAQ.html#prelink

-- Josh

On Tue, Apr 10, 2012 at 11:14 PM, kidd  wrote:
> Hello!
> I have run into some problems.
> This is my environment:
>    BLCR 0.8.4, Open MPI 1.5.5, OS Ubuntu 11.04
>    I have 2 nodes: cuda05 (master; it hosts the NFS file system) and cuda07
> (slave; it mounts the master).
>
>    I have also set in ~/.openmpi/mca-params.conf:
>  crs_base_snapshot_dir=/root/kidd_openMPI/Tmp
>  snapc_base_global_snapshot_dir=/root/kidd_openMPI/checkpoints
>
>   my configure line:
> ./configure --prefix=/root/kidd_openMPI --with-ft=cr --enable-ft-thread
>  --with-blcr=/usr/local/BLCR  --with-blcr-libdir=/usr/local/BLCR/lib
> --enable-mpirun-prefix-by-default
>  --enable-static --enable-shared  --enable-opal-multi-threads;
>
> problem 1: ompi-restart on multiple nodes
>   command 01: mpirun -hostfile  Hosts -am ft-enable-cr  -x  LD_LIBRARY_PATH
> -np 2  ./TEST
>   command 02: ompi-restart  ompi_global_snapshot_2892.ckpt
>   -> I can checkpoint 2 processes across multiple nodes, but when I restart,
> they only restart on the master node.
>
>      command 03: ompi-restart  -hostfile Hosts
> ompi_global_snapshot_2892.ckpt
>     -> Error message below. I have made sure BLCR is OK.
> 
>
> --
>     root@cuda05:~/kidd_openMPI/checkpoints# ompi-restart -hostfile Hosts
> ompi_global_snapshot_2892.ckpt/
>
> --
>    Error: BLCR was not able to restart the process because exec failed.
>     Check the installation of BLCR on all of the machines in your
>    system. The following information may be of help:
>  Return Code : -1
>  BLCR Restart Command : cr_restart
>  Restart Command Line : cr_restart
> /root/kidd_openMPI/checkpoints/ompi_global_snapshot_2892.ckpt/0/opal_snapshot_1.ckpt/ompi_blcr_context.2704
> --
> --
> Error: Unable to obtain the proper restart command to restart from the
>    checkpoint file (opal_snapshot_1.ckpt). Returned -1.
>    Check the installation of the blcr checkpoint/restart service
>    on all of the machines in your system.
> 
>  problem 2: I cannot find ompi-migrate. How do I use ompi-migrate?
>
>   Please help me , thanks .
>
>
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users



-- 
Joshua Hursey
Postdoctoral Research Associate
Oak Ridge National Laboratory
http://users.nccs.gov/~jjhursey

___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users

[OMPI users] MPI_Send, MPI_Recv problem on Mac and Linux

2012-04-11 Thread Peter Sels
Dear openMPI users,

I think this should be an easy question for anyone with more experience
than an Open MPI hello-world program...

I wrote some Open MPI code where the master sends a length and then a
buffer of that length as 2 subsequent MPI messages.
The slave receives these messages and answers back in a similar manner.

Sometimes this goes ok, sometimes not.
Messages of 28 chars or shorter do fine.
Messages of 29 or longer are usually problematic.
This length can be controlled with
#define DUMMY_MSG_LENGTH (40)


On Mac I sometimes see a mention of "slave 32767", where there
should only be a slave 1.
Probably a buffer overrun or so, but I cannot see where.

On linux I get:  Segmentation fault (11)

Increasing the length gives more problems...

How can I get this code stable?
What am I doing wrong?
Is there a maximum length to MPI messages?
For sending a string, do I use MPI_CHARACTER or MPI_BYTE or ...?

How come I cannot assert that my messages end in '\0' when received?
And how come I also get a segmentation fault when I print them?

Can I send two subsequent messages using MPI_Send, or do I have to do
the first as MPI_Isend and then do a MPI_Wait before the next
MPI_Send?...

Why do I not find code online for receiving the length first and then
allocating a buffer of this size and then receiving the next message?
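
For reference, here is a minimal sketch of the length-then-buffer exchange asked
about above. It is only an illustration, not the attached code: it assumes
MPI_CHAR for the text payload, hypothetical LEN_TAG/MSG_TAG values, rank 0 as
sender and rank 1 as receiver.

#include <mpi.h>
#include <iostream>
#include <string>
#include <vector>

static const int LEN_TAG = 1;   // hypothetical tag values for this sketch
static const int MSG_TAG = 2;

int main(int argc, char* argv[]) {
  MPI_Init(&argc, &argv);
  int rank;
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);

  if (rank == 0) {
    std::string msg(40, 'W');               // any length, up to INT_MAX elements
    unsigned long len = msg.size();
    // Two plain blocking sends are enough: messages between the same pair of
    // ranks on the same communicator are received in the order they were sent.
    MPI_Send(&len, 1, MPI_UNSIGNED_LONG, 1, LEN_TAG, MPI_COMM_WORLD);
    MPI_Send(&msg[0], (int)len, MPI_CHAR, 1, MSG_TAG, MPI_COMM_WORLD);
  } else if (rank == 1) {
    unsigned long len = 0;
    MPI_Status status;
    MPI_Recv(&len, 1, MPI_UNSIGNED_LONG, 0, LEN_TAG, MPI_COMM_WORLD, &status);
    std::vector<char> buf(len + 1, '\0');   // allocate once the length is known
    // Receive into the buffer itself, not into the address of a pointer.
    MPI_Recv(&buf[0], (int)len, MPI_CHAR, 0, MSG_TAG, MPI_COMM_WORLD, &status);
    std::cout << "rank 1 received: " << &buf[0] << std::endl;
  }

  MPI_Finalize();
  return 0;
}

In C/C++ code MPI_CHAR (or MPI_BYTE) is the type to pass; MPI_CHARACTER is the
Fortran character type. The count argument is an int, so a single message can
carry up to INT_MAX elements of the given datatype.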

All code, build, run scripts and logs are attached.

It would help me a lot if you could answer my questions or debug the code.

thanks a lot!

Pete
#include <mpi.h>

#include <cstdio>
#include <iostream>
#include <sstream>
#include <string>
#include <cassert>

#define DUMMY_MSG_LENGTH (40)
// >28 almost never works, 
// <=28 mostly works, sometimes not either

#define LENGTH_TAG 1
#define WORK_TAG 2
#define RESULT_TAG 3
#define DIE_TAG 4

using namespace std;

// From: http://beige.ucs.indiana.edu/I590/node85.html
void mpiErrorLog(int rank, int error_code) {
  if (error_code != MPI_SUCCESS) {

char error_string[BUFSIZ];
int length_of_error_string;

MPI_Error_string(error_code, error_string, &length_of_error_string);
cerr << "MPI: rank=" << rank << ", errorStr=" << error_string << endl;
//send_error = TRUE;
  }
}

int main(int argc, char* argv[]) {
  typedef unsigned long int unit_of_length_t;
  typedef unsigned char unit_of_work_t;
  typedef unsigned char unit_of_result_t;
  
  int numprocs, rank, namelen;
  char processor_name[MPI_MAX_PROCESSOR_NAME];
  
  MPI_Init(&argc, &argv);
  
  MPI_Comm_size(MPI_COMM_WORLD, &numprocs);
  cerr << "MPI: numprocs = " << numprocs << endl;
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  cerr << "MPI: rank = " << rank << endl;
  MPI_Get_processor_name(processor_name, &namelen);
  cerr << "MPI: processor_name = " << processor_name << endl;
  
  MPI_Status status;
  
  // Send work to the Slaves //
  
  //MPI_Errhandler_set(MPI_COMM_WORLD, MPI_ERRORS_RETURN);
  int errorCode;
  
  stringstream ss;
  string s0(DUMMY_MSG_LENGTH, 'W');
  cerr << "work msg = '" << s0 << "'" << endl; 
  ss << s0;
  string s = ss.str();
  
  if (rank!=0) {

MPI_Status status;
int errorCode;

while (true) {
  
  // Receive work from the master //
  unit_of_length_t workLength;
  cerr << "MPI: slave " << rank << " ready to receive workLength from master" 
  << endl;
  errorCode = MPI_Recv(&workLength, 1, MPI_UNSIGNED_LONG, 0, MPI_ANY_TAG,
   MPI_COMM_WORLD, &status);
  mpiErrorLog(rank, errorCode);
  
  assert((status.MPI_TAG == LENGTH_TAG) || (status.MPI_TAG == DIE_TAG));
  if (status.MPI_TAG == DIE_TAG) {
cerr << "MPI: slave " << rank << " received dieTag from master, "
<< "errorCode = " << errorCode << endl;
MPI_Finalize();
return 0; // ok
  }
  assert(status.MPI_TAG == LENGTH_TAG);
  cerr << "MPI: slave " << rank << " received workLength = " 
  << workLength << " from master, errorCode = " << errorCode << endl;
  
  unit_of_work_t * work = new unit_of_work_t[workLength+1];
  cerr << "work = " << (void*)work << endl;
  assert(work != 0);
  
  cerr << "MPI: slave " << rank << " ready to receive work from master" 
  << endl;
  MPI_Recv(work, workLength+1, MPI_BYTE, 0, WORK_TAG,
   MPI_COMM_WORLD, &status); 
  cerr << "MPI: slave " << rank << " received work from master, "
  << "errorCode = " << errorCode << endl;
  mpiErrorLog(rank, errorCode);
  //**//assert(work[workLength] == '\0');
  //**//cerr << ">>>MPI: work = " << work << endl;
  //**//printf("MPI: work = %s", work);


  assert(status.MPI_TAG == WORK_TAG);
  
  
  stringstream ss1;
  string s0(DUMMY_MSG_LENGTH, 'R');
  cerr << "result msg = '" << s0 << "'" << endl; 
  ss1 << s0;
  
  // Send result to the master //
  
  unit_of_length_t resultLength = ss1.str().length();
  
  unit_of_result_t * result = new unit_of_result_t[resultLength+1];
  result[resultLength] = '\0';
  cerr << "result = " << (void*)result << endl;
  

Re: [OMPI users] ompi-restart failed && ompi-migrate

2012-04-11 Thread Josh Hursey
The 1.5 series does not support process migration, so there is no
ompi-migrate option there. This was only contributed to the trunk (1.7
series). However, changes to the runtime environment over the past few
months have broken this functionality. It is currently unclear when
this will be repaired. We hope to have it fixed and functional again
before the first release of the 1.7 series.

As far as your problem with ompi-restart goes, have you checked the prelink
option on all of your nodes, per:
  https://upc-bugs.lbl.gov/blcr/doc/html/FAQ.html#prelink

-- Josh

On Tue, Apr 10, 2012 at 11:14 PM, kidd  wrote:
> Hello!
> I have run into some problems.
> This is my environment:
>    BLCR 0.8.4, Open MPI 1.5.5, OS Ubuntu 11.04
>    I have 2 nodes: cuda05 (master; it hosts the NFS file system) and cuda07
> (slave; it mounts the master).
>
>    I have also set in ~/.openmpi/mca-params.conf:
>  crs_base_snapshot_dir=/root/kidd_openMPI/Tmp
>  snapc_base_global_snapshot_dir=/root/kidd_openMPI/checkpoints
>
>   my configure line:
> ./configure --prefix=/root/kidd_openMPI --with-ft=cr --enable-ft-thread
>  --with-blcr=/usr/local/BLCR  --with-blcr-libdir=/usr/local/BLCR/lib
> --enable-mpirun-prefix-by-default
>  --enable-static --enable-shared  --enable-opal-multi-threads;
>
> problem 1: ompi-restart on multiple nodes
>   command 01: mpirun -hostfile  Hosts -am ft-enable-cr  -x  LD_LIBRARY_PATH
> -np 2  ./TEST
>   command 02: ompi-restart  ompi_global_snapshot_2892.ckpt
>   -> I can checkpoint 2 processes across multiple nodes, but when I restart,
> they only restart on the master node.
>
>      command 03: ompi-restart  -hostfile Hosts
> ompi_global_snapshot_2892.ckpt
>     -> Error message below. I have made sure BLCR is OK.
> 
>
> --
>     root@cuda05:~/kidd_openMPI/checkpoints# ompi-restart -hostfile Hosts
> ompi_global_snapshot_2892.ckpt/
>
> --
>    Error: BLCR was not able to restart the process because exec failed.
>     Check the installation of BLCR on all of the machines in your
>    system. The following information may be of help:
>  Return Code : -1
>  BLCR Restart Command : cr_restart
>  Restart Command Line : cr_restart
> /root/kidd_openMPI/checkpoints/ompi_global_snapshot_2892.ckpt/0/opal_snapshot_1.ckpt/ompi_blcr_context.2704
> --
> --
> Error: Unable to obtain the proper restart command to restart from the
>    checkpoint file (opal_snapshot_1.ckpt). Returned -1.
>    Check the installation of the blcr checkpoint/restart service
>    on all of the machines in your system.
> 
>  problem 2: I cannot find ompi-migrate. How do I use ompi-migrate?
>
>   Please help me , thanks .
>
>
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users



-- 
Joshua Hursey
Postdoctoral Research Associate
Oak Ridge National Laboratory
http://users.nccs.gov/~jjhursey



Re: [OMPI users] sge tight integration leads to bad allocation

2012-04-11 Thread Ralph Castain

On Apr 11, 2012, at 6:20 AM, Reuti wrote:

> Am 11.04.2012 um 04:26 schrieb Ralph Castain:
> 
>> Hi Reuti
>> 
>> Can you replicate this problem on your machine? Can you try it with 1.5?
> 
> No. It's also working fine in 1.5.5 in some tests. I even forced an uneven 
> distribution by limiting the slots setting for some machines in the queue 
> configuration.

Thanks - that confirms what I've been able to test. It sounds like it is 
something in Eloi's setup, but I can't fathom what it would be - the 
allocations all look acceptable.

I'm stumped. :-(


> 
> -- Reuti
> 
> 
>> Afraid I don't have a way to replicate it, and as I said, wouldn't fix it 
>> for the 1.4 series anyway. I'm not seeing this problem elsewhere, but I 
>> don't generally get an allocation that varies across nodes.
>> 
>> Ralph
>> 
>> On Apr 10, 2012, at 11:57 AM, Reuti wrote:
>> 
>>> Am 10.04.2012 um 16:55 schrieb Eloi Gaudry:
>>> 
 Hi Ralf,
 
 I haven't tried any of the 1.5 series yet (we have chosen not to use the 
 feature releases) but if this is mandatory for you to work on this topic, 
 I will.
 
 This might be of interest to Reuti and you : it seems that we cannot 
 reproduce the problem anymore if we don't provide the "-np N" option on 
 the orterun command line. Of course, we need to launch a few other runs to 
 be really sure because the allocation error was not always observable. 
 Actually, I recently understood (from Reuti) that the tight integration 
 mode would supply all the necessary bits to the launcher, and thus I removed 
 the '-np N' that was around... Could it be that using '-np N' together with 
 the sge tight integration mode is pathological?
>>> 
>>> Yes, it should work without problem to specify -np. As it didn't hit me in 
>>> my tests (normally I don't specify -np), I would really be interested in 
>>> the underlying cause.
>>> 
>>> Especially as the example in Open MPI's FAQ lists -np to start with 
 GridEngine integration, it should have hit other users too.
>>> 
>>> -- Reuti
>>> 
>>> 
 Regards,
 Eloi
 
 
 -Original Message-
 From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On 
 Behalf Of Ralph Castain
 Sent: mardi 10 avril 2012 16:43
 To: Open MPI Users
 Subject: Re: [OMPI users] sge tight integration leads to bad allocation
 
 Could well be a bug in OMPI - I can take a look, though it may be awhile 
 before I get to it. Have you tried one of the 1.5 series releases?
 
 On Apr 10, 2012, at 3:42 AM, Eloi Gaudry wrote:
 
> Thx. This is the allocation which is also confirmed by the Open MPI 
> output.
> [eg: ] exactly, but not the one used afterwards by openmpi
> 
> - The application was compiled with the same version of Open MPI?
> [eg: ] yes, version 1.4.4 for all
> 
> - Does the application start something on its own besides the tasks 
> granted by mpiexec/orterun?
> [eg: ] no
> 
> You want 12 ranks in total, and for barney.fft and carl.fft "-mca 
> orte_ess_num_procs 3" is also passed to the qrsh_starter. In total I 
> count only 10 ranks in this example (4+4+2) - do you observe the 
> same?
> [eg: ] I don't know why the -mca orte_ess_num_procs 3 is added here...
> In the "Map generated by mapping policy" output in my last email, I see 
> that 4 processes were started on each node (barney, carl and charlie), 
> but yes, in the ps -elf output, two of them are missing for one node 
> (barney)... sorry about that, a bad copy/paste. Here is the actual output 
> for this node:
> 2048 ?Sl 3:33 /opt/sge/bin/lx-amd64/sge_execd
> 27502 ?Sl 0:00  \_ sge_shepherd-1416 -bg
> 27503 ?Ss 0:00  \_ /opt/sge/utilbin/lx-amd64/qrsh_starter 
> /opt/sge/default/spool/barney/active_jobs/1416.1/1.barney
> 27510 ?S  0:00  \_ bash -c  
> PATH=/opt/openmpi-1.4.4/bin:$PATH ; export PATH ; 
> LD_LIBRARY_PATH=/opt/openmpi-1.4.4/lib:$LD_LIBRARY_PATH ; export 
> LD_LIBRARY_PATH ;  /opt/openmpi-1.4.4/bin/orted -mca ess env -mca 
> orte_ess_jobid 3800367104 -mca orte_ess_vpid 1 -mca orte_ess_num_procs 3 
> --hnp-uri "3800367104.0;tcp://192.168.0.20:57233" --mca 
> pls_gridengine_verbose 1 --mca ras_gridengine_show_jobid 1 --mca 
> ras_gridengine_verbose 1
> 27511 ?S  0:00  \_ /opt/openmpi-1.4.4/bin/orted 
> -mca ess env -mca orte_ess_jobid 3800367104 -mca orte_ess_vpid 1 -mca 
> orte_ess_num_procs 3 --hnp-uri 3800367104.0;tcp://192.168.0.20:57233 
> --mca pls_gridengine_verbose 1 --mca ras_gridengine_show_jobid 1 --mca 
> ras_gridengine_verbose 1
> 27512 ?Rl12:54  \_ 
> /opt/fft/actran_product/Actran_13.0.b.57333/bin/actranpy_mp 
> --apl=/opt/fft/actran_product/Actran_13.0.b.57333 -e radiation -m 1 

Re: [OMPI users] sge tight integration leads to bad allocation

2012-04-11 Thread Reuti
Am 11.04.2012 um 04:26 schrieb Ralph Castain:

> Hi Reuti
> 
> Can you replicate this problem on your machine? Can you try it with 1.5?

No. It's also working fine in 1.5.5 in some tests. I even forced an uneven 
distribution by limiting the slots setting for some machines in the queue 
configuration.

-- Reuti


> Afraid I don't have a way to replicate it, and as I said, wouldn't fix it for 
> the 1.4 series anyway. I'm not seeing this problem elsewhere, but I don't 
> generally get an allocation that varies across nodes.
> 
> Ralph
> 
> On Apr 10, 2012, at 11:57 AM, Reuti wrote:
> 
>> Am 10.04.2012 um 16:55 schrieb Eloi Gaudry:
>> 
>>> Hi Ralf,
>>> 
>>> I haven't tried any of the 1.5 series yet (we have chosen not to use the 
>>> feature releases) but if this is mandatory for you to work on this topic, 
>>> I will.
>>> 
>>> This might be of interest to Reuti and you : it seems that we cannot 
>>> reproduce the problem anymore if we don't provide the "-np N" option on the 
>>> orterun command line. Of course, we need to launch a few other runs to be 
>>> really sure because the allocation error was not always observable. 
>>> Actually, I recently understood (from Reuti) that the tight integration 
>>> mode would supply all the necessary bits to the launcher, and thus I removed 
>>> the '-np N' that was around... Could it be that using '-np N' together with 
>>> the sge tight integration mode is pathological?
>> 
>> Yes, it should work without problem to specify -np. As it didn't hit me in 
>> my tests (normally I don't specify -np), I would really be interested in the 
>> underlying cause.
>> 
>> Especially as the example in Open MPI's FAQ lists -np to start with 
>> GridEngine integration, it should have hit other users too.
>> 
>> -- Reuti
>> 
>> 
>>> Regards,
>>> Eloi
>>> 
>>> 
>>> -Original Message-
>>> From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On 
>>> Behalf Of Ralph Castain
>>> Sent: mardi 10 avril 2012 16:43
>>> To: Open MPI Users
>>> Subject: Re: [OMPI users] sge tight integration leads to bad allocation
>>> 
>>> Could well be a bug in OMPI - I can take a look, though it may be awhile 
>>> before I get to it. Have you tried one of the 1.5 series releases?
>>> 
>>> On Apr 10, 2012, at 3:42 AM, Eloi Gaudry wrote:
>>> 
 Thx. This is the allocation which is also confirmed by the Open MPI output.
 [eg: ] exactly, but not the one used afterwards by openmpi
 
 - The application was compiled with the same version of Open MPI?
 [eg: ] yes, version 1.4.4 for all
 
 - Does the application start something on its own besides the tasks 
 granted by mpiexec/orterun?
 [eg: ] no
 
 You want 12 ranks in total, and for barney.fft and carl.fft "-mca 
 orte_ess_num_procs 3" is also passed to the qrsh_starter. In total I 
 count only 10 ranks in this example (4+4+2) - do you observe the 
 same?
 [eg: ] I don't know why the -mca orte_ess_num_procs 3 is added here...
 In the "Map generated by mapping policy" output in my last email, I see 
 that 4 processes were started on each node (barney, carl and charlie), but 
 yes, in the ps -elf output, two of them are missing for one node 
 (barney)... sorry about that, a bad copy/paste. Here is the actual output 
 for this node:
 2048 ?Sl 3:33 /opt/sge/bin/lx-amd64/sge_execd
 27502 ?Sl 0:00  \_ sge_shepherd-1416 -bg
 27503 ?Ss 0:00  \_ /opt/sge/utilbin/lx-amd64/qrsh_starter 
 /opt/sge/default/spool/barney/active_jobs/1416.1/1.barney
 27510 ?S  0:00  \_ bash -c  
 PATH=/opt/openmpi-1.4.4/bin:$PATH ; export PATH ; 
 LD_LIBRARY_PATH=/opt/openmpi-1.4.4/lib:$LD_LIBRARY_PATH ; export 
 LD_LIBRARY_PATH ;  /opt/openmpi-1.4.4/bin/orted -mca ess env -mca 
 orte_ess_jobid 3800367104 -mca orte_ess_vpid 1 -mca orte_ess_num_procs 3 
 --hnp-uri "3800367104.0;tcp://192.168.0.20:57233" --mca 
 pls_gridengine_verbose 1 --mca ras_gridengine_show_jobid 1 --mca 
 ras_gridengine_verbose 1
 27511 ?S  0:00  \_ /opt/openmpi-1.4.4/bin/orted 
 -mca ess env -mca orte_ess_jobid 3800367104 -mca orte_ess_vpid 1 -mca 
 orte_ess_num_procs 3 --hnp-uri 3800367104.0;tcp://192.168.0.20:57233 --mca 
 pls_gridengine_verbose 1 --mca ras_gridengine_show_jobid 1 --mca 
 ras_gridengine_verbose 1
 27512 ?Rl12:54  \_ 
 /opt/fft/actran_product/Actran_13.0.b.57333/bin/actranpy_mp 
 --apl=/opt/fft/actran_product/Actran_13.0.b.57333 -e radiation -m 1 
 --parallel=frequency --scratch=/scratch/cluster/1416 
 --inputfile=/home/jj/Projects/Toyota/REFERENCE_JPC/semi_green_PML_06/semi_green_coarse.edat
 27513 ?Rl12:54  \_ 
 /opt/fft/actran_product/Actran_13.0.b.57333/bin/actranpy_mp 
 --apl=/opt/fft/actran_product/Actran_13.0.b.57333 -e radiation -m 1 
 

Re: [OMPI users] wrong core binding by openmpi-1.5.5

2012-04-11 Thread Ralph Castain
Ouch - finally figured out what happened. Jeff and I did indeed address this 
problem a few weeks ago. There were some changes required in a couple of places 
to make it all work, so we did the work in a Mercurial branch Jeff set up.

Unfortunately, I think he got distracted by the MPI Forum shortly thereafter, 
and then got engulfed by other things. The work appears complete, but I can't 
find a record of it actually being committed to the 1.5 branch. Could be he 
intended it for 1.6.

I'll have to bug him when he gets back next week and see what happened, and his 
plans. Sorry for the mixup.
Ralph

On Apr 11, 2012, at 3:15 AM, Brice Goglin wrote:

> Here's a better patch. Still only compile tested :)
> Brice
> 
> 
> On 11/04/2012 10:36, Brice Goglin wrote:
>> 
>> A quick look at the code seems to confirm my feeling. get/set_module()
>> callbacks manipulate arrays of logical indexes, and they do not convert
>> them back to physical indexes before binding.
>> 
>> Here's a quick patch that may help. Only compile tested...
>> 
>> Brice
>> 
>> 
>> 
>> On 11/04/2012 09:49, Brice Goglin wrote:
>>> On 11/04/2012 09:06, tmish...@jcity.maeda.co.jp wrote:
 Hi, Brice.
 
 I installed the latest hwloc-1.4.1.
 Here is the output of lstopo -p.
 
 [root@node03 bin]# ./lstopo -p
 Machine (126GB)
   Socket P#0 (32GB)
 NUMANode P#0 (16GB) + L3 (5118KB)
   L2 (512KB) + L1 (64KB) + Core P#0 + PU P#0
   L2 (512KB) + L1 (64KB) + Core P#1 + PU P#4
   L2 (512KB) + L1 (64KB) + Core P#2 + PU P#8
   L2 (512KB) + L1 (64KB) + Core P#3 + PU P#12
>>> Ok then the cpuset of this numanode is .
>>> 
> [node03.cluster:21706] [[55518,0],0] odls:default:fork binding child
> [[55518,1],0] to cpus 
>>> So openmpi 1.5.4 is correct.
>>> 
> [node03.cluster:04706] [[40566,0],0] odls:default:fork binding child
> [[40566,1],0] to cpus 000f
>>> And openmpi 1.5.5 is indeed wrong.
>>> 
>>> Random guess: 000f is the bitmask made of hwloc *logical* indexes. hwloc
>>> cpusets (used for binding) are internally made of hwloc *physical*
>>> indexes ( here).
>>> 
>>> Jeff, Ralph:
>>> How does OMPI 1.5.5 build hwloc cpusets for binding? Are you doing
>>> bitmap operations on hwloc object cpusets?
>>> If yes, I don't know what's going wrong here.
>>> If no, are you building hwloc cpusets manually by setting individual
>>> bits from object indexes? If yes, you must use *physical* indexes to do so.
>>> 
>>> Brice
>>> 
>>> ___
>>> users mailing list
>>> us...@open-mpi.org
>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>> 
>> ___
>> users mailing list
>> us...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
> 
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users



Re: [OMPI users] wrong core binding by openmpi-1.5.5

2012-04-11 Thread Ralph Castain
Interesting. Jeff and I had discussed that very problem not that long ago, and 
I could swear he fixed it - but I don't see the CMR for that code. He's on 
vacation this week, so I'll wait for his return to look at it.

Thanks!
Ralph

On Apr 11, 2012, at 2:36 AM, Brice Goglin wrote:

> A quick look at the code seems to confirm my feeling. get/set_module()
> callbacks manipulate arrays of logical indexes, and they do not convert
> them back to physical indexes before binding.
> 
> Here's a quick patch that may help. Only compile tested...
> 
> Brice
> 
> 
> 
> On 11/04/2012 09:49, Brice Goglin wrote:
>> On 11/04/2012 09:06, tmish...@jcity.maeda.co.jp wrote:
>>> Hi, Brice.
>>> 
>>> I installed the latest hwloc-1.4.1.
>>> Here is the output of lstopo -p.
>>> 
>>> [root@node03 bin]# ./lstopo -p
>>> Machine (126GB)
>>>  Socket P#0 (32GB)
>>>NUMANode P#0 (16GB) + L3 (5118KB)
>>>  L2 (512KB) + L1 (64KB) + Core P#0 + PU P#0
>>>  L2 (512KB) + L1 (64KB) + Core P#1 + PU P#4
>>>  L2 (512KB) + L1 (64KB) + Core P#2 + PU P#8
>>>  L2 (512KB) + L1 (64KB) + Core P#3 + PU P#12
>> Ok then the cpuset of this numanode is .
>> 
 [node03.cluster:21706] [[55518,0],0] odls:default:fork binding child
 [[55518,1],0] to cpus 
>> So openmpi 1.5.4 is correct.
>> 
 [node03.cluster:04706] [[40566,0],0] odls:default:fork binding child
 [[40566,1],0] to cpus 000f
>> And openmpi 1.5.5 is indeed wrong.
>> 
>> Random guess: 000f is the bitmask made of hwloc *logical* indexes. hwloc
>> cpusets (used for binding) are internally made of hwloc *physical*
>> indexes ( here).
>> 
>> Jeff, Ralph:
>> How does OMPI 1.5.5 build hwloc cpusets for binding? Are you doing
>> bitmap operations on hwloc object cpusets?
>> If yes, I don't know what's going wrong here.
>> If no, are you building hwloc cpusets manually by setting individual
>> bits from object indexes? If yes, you must use *physical* indexes to do so.
>> 
>> Brice
>> 
>> ___
>> users mailing list
>> us...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
> 
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users




Re: [OMPI users] wrong core binding by openmpi-1.5.5

2012-04-11 Thread Brice Goglin
Here's a better patch. Still only compile tested :)
Brice


On 11/04/2012 10:36, Brice Goglin wrote:
> A quick look at the code seems to confirm my feeling. get/set_module()
> callbacks manipulate arrays of logical indexes, and they do not convert
> them back to physical indexes before binding.
>
> Here's a quick patch that may help. Only compile tested...
>
> Brice
>
>
>
> On 11/04/2012 09:49, Brice Goglin wrote:
>> On 11/04/2012 09:06, tmish...@jcity.maeda.co.jp wrote:
>>> Hi, Brice.
>>>
>>> I installed the latest hwloc-1.4.1.
>>> Here is the output of lstopo -p.
>>>
>>> [root@node03 bin]# ./lstopo -p
>>> Machine (126GB)
>>>   Socket P#0 (32GB)
>>> NUMANode P#0 (16GB) + L3 (5118KB)
>>>   L2 (512KB) + L1 (64KB) + Core P#0 + PU P#0
>>>   L2 (512KB) + L1 (64KB) + Core P#1 + PU P#4
>>>   L2 (512KB) + L1 (64KB) + Core P#2 + PU P#8
>>>   L2 (512KB) + L1 (64KB) + Core P#3 + PU P#12
>> Ok then the cpuset of this numanode is .
>>
 [node03.cluster:21706] [[55518,0],0] odls:default:fork binding child
 [[55518,1],0] to cpus 
>> So openmpi 1.5.4 is correct.
>>
 [node03.cluster:04706] [[40566,0],0] odls:default:fork binding child
 [[40566,1],0] to cpus 000f
>> And openmpi 1.5.5 is indeed wrong.
>>
>> Random guess: 000f is the bitmask made of hwloc *logical* indexes. hwloc
>> cpusets (used for binding) are internally made of hwloc *physical*
>> indexes ( here).
>>
>> Jeff, Ralph:
>> How does OMPI 1.5.5 build hwloc cpusets for binding? Are you doing
>> bitmap operations on hwloc object cpusets?
>> If yes, I don't know what's going wrong here.
>> If no, are you building hwloc cpusets manually by setting individual
>> bits from object indexes? If yes, you must use *physical* indexes to do so.
>>
>> Brice
>>
>> ___
>> users mailing list
>> us...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>
>
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users

--- opal/mca/paffinity/hwloc/paffinity_hwloc_module.c.old	2012-04-11 10:19:36.766710073 +0200
+++ opal/mca/paffinity/hwloc/paffinity_hwloc_module.c	2012-04-11 11:13:52.930438083 +0200
@@ -164,9 +164,10 @@

 static int module_set(opal_paffinity_base_cpu_set_t mask)
 {
-int i, ret = OPAL_SUCCESS;
+int ret = OPAL_SUCCESS;
 hwloc_bitmap_t set;
 hwloc_topology_t *t;
+hwloc_obj_t pu;

 /* bozo check */
 if (NULL == opal_hwloc_topology) {
@@ -178,10 +179,11 @@
 if (NULL == set) {
 return OPAL_ERR_OUT_OF_RESOURCE;
 }
-hwloc_bitmap_zero(set);
-for (i = 0; ((unsigned int) i) < OPAL_PAFFINITY_BITMASK_CPU_MAX; ++i) {
-if (OPAL_PAFFINITY_CPU_ISSET(i, mask)) {
-hwloc_bitmap_set(set, i);
+for (pu = hwloc_get_obj_by_type(*t, HWLOC_OBJ_PU, 0);
+ pu && pu->logical_index < OPAL_PAFFINITY_BITMASK_CPU_MAX;
+ pu = pu->next_cousin) {
+if (OPAL_PAFFINITY_CPU_ISSET(pu->logical_index, mask)) {
+hwloc_bitmap_set(set, pu->os_index);
 }
 }

@@ -196,9 +198,10 @@

 static int module_get(opal_paffinity_base_cpu_set_t *mask)
 {
-int i, ret = OPAL_SUCCESS;
+int ret = OPAL_SUCCESS;
 hwloc_bitmap_t set;
 hwloc_topology_t *t;
+hwloc_obj_t pu;

 /* bozo check */
 if (NULL == opal_hwloc_topology) {
@@ -218,9 +221,11 @@
 ret = OPAL_ERR_IN_ERRNO;
 } else {
 OPAL_PAFFINITY_CPU_ZERO(*mask);
-for (i = 0; ((unsigned int) i) < 8 * sizeof(*mask); i++) {
-if (hwloc_bitmap_isset(set, i)) {
-OPAL_PAFFINITY_CPU_SET(i, *mask);
+for (pu = hwloc_get_obj_by_type(*t, HWLOC_OBJ_PU, 0);
+ pu && pu->logical_index < 8 * sizeof(*mask);
+ pu = pu->next_cousin) {
+if (hwloc_bitmap_isset(set, pu->os_index)) {
+OPAL_PAFFINITY_CPU_SET(pu->logical_index, *mask);
 }
 }
 }


Re: [OMPI users] wrong core binding by openmpi-1.5.5

2012-04-11 Thread Brice Goglin
A quick look at the code seems to confirm my feeling. get/set_module()
callbacks manipulate arrays of logical indexes, and they do not convert
them back to physical indexes before binding.

Here's a quick patch that may help. Only compile tested...

Brice



On 11/04/2012 09:49, Brice Goglin wrote:
> On 11/04/2012 09:06, tmish...@jcity.maeda.co.jp wrote:
>> Hi, Brice.
>>
>> I installed the latest hwloc-1.4.1.
>> Here is the output of lstopo -p.
>>
>> [root@node03 bin]# ./lstopo -p
>> Machine (126GB)
>>   Socket P#0 (32GB)
>> NUMANode P#0 (16GB) + L3 (5118KB)
>>   L2 (512KB) + L1 (64KB) + Core P#0 + PU P#0
>>   L2 (512KB) + L1 (64KB) + Core P#1 + PU P#4
>>   L2 (512KB) + L1 (64KB) + Core P#2 + PU P#8
>>   L2 (512KB) + L1 (64KB) + Core P#3 + PU P#12
> Ok then the cpuset of this numanode is .
>
>>> [node03.cluster:21706] [[55518,0],0] odls:default:fork binding child
>>> [[55518,1],0] to cpus 
> So openmpi 1.5.4 is correct.
>
>>> [node03.cluster:04706] [[40566,0],0] odls:default:fork binding child
>>> [[40566,1],0] to cpus 000f
> And openmpi 1.5.5 is indeed wrong.
>
> Random guess: 000f is the bitmask made of hwloc *logical* indexes. hwloc
> cpusets (used for binding) are internally made of hwloc *physical*
> indexes ( here).
>
> Jeff, Ralph:
> How does OMPI 1.5.5 build hwloc cpusets for binding? Are you doing
> bitmap operations on hwloc object cpusets?
> If yes, I don't know what's going wrong here.
> If no, are you building hwloc cpusets manually by setting individual
> bits from object indexes? If yes, you must use *physical* indexes to do so.
>
> Brice
>
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users

--- opal/mca/paffinity/hwloc/paffinity_hwloc_module.c.old	2012-04-11 10:19:36.766710073 +0200
+++ opal/mca/paffinity/hwloc/paffinity_hwloc_module.c	2012-04-11 10:32:07.398696734 +0200
@@ -167,6 +167,7 @@
 int i, ret = OPAL_SUCCESS;
 hwloc_bitmap_t set;
 hwloc_topology_t *t;
+hwloc_obj_t pu;

 /* bozo check */
 if (NULL == opal_hwloc_topology) {
@@ -178,10 +179,11 @@
 if (NULL == set) {
 return OPAL_ERR_OUT_OF_RESOURCE;
 }
-hwloc_bitmap_zero(set);
-for (i = 0; ((unsigned int) i) < OPAL_PAFFINITY_BITMASK_CPU_MAX; ++i) {
+for (i = 0, pu = hwloc_get_obj_by_type(*t, HWLOC_OBJ_PU, 0);
+	 ((unsigned int) i) < OPAL_PAFFINITY_BITMASK_CPU_MAX;
+	 ++i, pu = pu->next_cousin) {
 if (OPAL_PAFFINITY_CPU_ISSET(i, mask)) {
-hwloc_bitmap_set(set, i);
+hwloc_bitmap_set(set, pu->os_index);
 }
 }

@@ -199,6 +201,7 @@
 int i, ret = OPAL_SUCCESS;
 hwloc_bitmap_t set;
 hwloc_topology_t *t;
+hwloc_obj_t pu;

 /* bozo check */
 if (NULL == opal_hwloc_topology) {
@@ -218,8 +221,10 @@
 ret = OPAL_ERR_IN_ERRNO;
 } else {
 OPAL_PAFFINITY_CPU_ZERO(*mask);
-for (i = 0; ((unsigned int) i) < 8 * sizeof(*mask); i++) {
-if (hwloc_bitmap_isset(set, i)) {
+for (i = 0, pu = hwloc_get_obj_by_type(*t, HWLOC_OBJ_PU, 0);
+	 ((unsigned int) i) < 8 * sizeof(*mask);
+	 i++, pu = pu->next_cousin) {
+if (hwloc_bitmap_isset(set, pu->os_index)) {
 OPAL_PAFFINITY_CPU_SET(i, *mask);
 }
 }


Re: [OMPI users] wrong core binding by openmpi-1.5.5

2012-04-11 Thread Brice Goglin
On 11/04/2012 09:06, tmish...@jcity.maeda.co.jp wrote:
> Hi, Brice.
>
> I installed the latest hwloc-1.4.1.
> Here is the output of lstopo -p.
>
> [root@node03 bin]# ./lstopo -p
> Machine (126GB)
>   Socket P#0 (32GB)
> NUMANode P#0 (16GB) + L3 (5118KB)
>   L2 (512KB) + L1 (64KB) + Core P#0 + PU P#0
>   L2 (512KB) + L1 (64KB) + Core P#1 + PU P#4
>   L2 (512KB) + L1 (64KB) + Core P#2 + PU P#8
>   L2 (512KB) + L1 (64KB) + Core P#3 + PU P#12

Ok then the cpuset of this numanode is .

>> [node03.cluster:21706] [[55518,0],0] odls:default:fork binding child
>> [[55518,1],0] to cpus 

So openmpi 1.5.4 is correct.

>> [node03.cluster:04706] [[40566,0],0] odls:default:fork binding child
>> [[40566,1],0] to cpus 000f
And openmpi 1.5.5 is indeed wrong.

Random guess: 000f is the bitmask made of hwloc *logical* indexes. hwloc
cpusets (used for binding) are internally made of hwloc *physical*
indexes ( here).

Jeff, Ralph:
How does OMPI 1.5.5 build hwloc cpusets for binding? Are you doing
bitmap operations on hwloc object cpusets?
If yes, I don't know what's going wrong here.
If no, are you building hwloc cpusets manually by setting individual
bits from object indexes? If yes, you must use *physical* indexes to do so.
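
For illustration only, here is a minimal sketch, against the public hwloc API
(not the OMPI paffinity code), of how a mask of hwloc *logical* PU indexes can
be translated into a cpuset of *physical* (os_index) bits before binding. The
input mask 0xf is a hypothetical example.

#include <hwloc.h>
#include <stdio.h>
#include <stdlib.h>

/* Sketch: convert a mask of hwloc logical PU indexes (e.g. 0xf for the
 * first four PUs) into a cpuset of physical (os) indexes for binding. */
int main(void)
{
    hwloc_topology_t topo;
    hwloc_topology_init(&topo);
    hwloc_topology_load(topo);

    unsigned long logical_mask = 0xf;          /* hypothetical input mask */
    hwloc_bitmap_t set = hwloc_bitmap_alloc();
    unsigned i;

    for (i = 0; i < 8 * sizeof(logical_mask); i++) {
        if (logical_mask & (1UL << i)) {
            /* the 3rd argument of hwloc_get_obj_by_type() is a logical index */
            hwloc_obj_t pu = hwloc_get_obj_by_type(topo, HWLOC_OBJ_PU, i);
            if (pu != NULL)
                hwloc_bitmap_set(set, pu->os_index);  /* physical index */
        }
    }

    char *s;
    hwloc_bitmap_asprintf(&s, set);
    printf("binding cpuset: %s\n", s);  /* 0x00001111 on the node03 topology above */
    free(s);

    hwloc_set_cpubind(topo, set, 0);    /* bind the whole current process */

    hwloc_bitmap_free(set);
    hwloc_topology_destroy(topo);
    return 0;
}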

Brice



Re: [OMPI users] wrong core binding by openmpi-1.5.5

2012-04-11 Thread tmishima
Hi, Brice.

I installed the latest hwloc-1.4.1.
Here is the output of lstopo -p.

[root@node03 bin]# ./lstopo -p
Machine (126GB)
  Socket P#0 (32GB)
NUMANode P#0 (16GB) + L3 (5118KB)
  L2 (512KB) + L1 (64KB) + Core P#0 + PU P#0
  L2 (512KB) + L1 (64KB) + Core P#1 + PU P#4
  L2 (512KB) + L1 (64KB) + Core P#2 + PU P#8
  L2 (512KB) + L1 (64KB) + Core P#3 + PU P#12
NUMANode P#1 (16GB) + L3 (5118KB)
  L2 (512KB) + L1 (64KB) + Core P#0 + PU P#16
  L2 (512KB) + L1 (64KB) + Core P#1 + PU P#20
  L2 (512KB) + L1 (64KB) + Core P#2 + PU P#24
  L2 (512KB) + L1 (64KB) + Core P#3 + PU P#28
  Socket P#3 (32GB)
NUMANode P#6 (16GB) + L3 (5118KB)
  L2 (512KB) + L1 (64KB) + Core P#0 + PU P#1
  L2 (512KB) + L1 (64KB) + Core P#1 + PU P#5
  L2 (512KB) + L1 (64KB) + Core P#2 + PU P#9
  L2 (512KB) + L1 (64KB) + Core P#3 + PU P#13
NUMANode P#7 (16GB) + L3 (5118KB)
  L2 (512KB) + L1 (64KB) + Core P#0 + PU P#17
  L2 (512KB) + L1 (64KB) + Core P#1 + PU P#21
  L2 (512KB) + L1 (64KB) + Core P#2 + PU P#25
  L2 (512KB) + L1 (64KB) + Core P#3 + PU P#29
  Socket P#2 (32GB)
NUMANode P#4 (16GB) + L3 (5118KB)
  L2 (512KB) + L1 (64KB) + Core P#0 + PU P#2
  L2 (512KB) + L1 (64KB) + Core P#1 + PU P#6
  L2 (512KB) + L1 (64KB) + Core P#2 + PU P#10
  L2 (512KB) + L1 (64KB) + Core P#3 + PU P#14
NUMANode P#5 (16GB) + L3 (5118KB)
  L2 (512KB) + L1 (64KB) + Core P#0 + PU P#18
  L2 (512KB) + L1 (64KB) + Core P#1 + PU P#22
  L2 (512KB) + L1 (64KB) + Core P#2 + PU P#26
  L2 (512KB) + L1 (64KB) + Core P#3 + PU P#30
  Socket P#1 (32GB)
NUMANode P#2 (16GB) + L3 (5118KB)
  L2 (512KB) + L1 (64KB) + Core P#0 + PU P#3
  L2 (512KB) + L1 (64KB) + Core P#1 + PU P#7
  L2 (512KB) + L1 (64KB) + Core P#2 + PU P#11
  L2 (512KB) + L1 (64KB) + Core P#3 + PU P#15
NUMANode P#3 (16GB) + L3 (5118KB)
  L2 (512KB) + L1 (64KB) + Core P#0 + PU P#19
  L2 (512KB) + L1 (64KB) + Core P#1 + PU P#23
  L2 (512KB) + L1 (64KB) + Core P#2 + PU P#27
  L2 (512KB) + L1 (64KB) + Core P#3 + PU P#31
  HostBridge P#0
PCIBridge
  PCI 14e4:1639
Net "eth0"
  PCI 14e4:1639
Net "eth1"
PCIBridge
  PCI 14e4:1639
Net "eth2"
  PCI 14e4:1639
Net "eth3"
PCIBridge
  PCIBridge
PCIBridge
  PCI 1000:0072
Block "sdb"
Block "sda"
PCI 1002:4390
  Block "sr0"
PCIBridge
  PCI 102b:0532
  HostBridge P#1
PCIBridge
  PCI 15b3:6274
Net "ib0"
OpenFabrics "mthca0"

Tetsuya Mishima

> Can you send the output of lstopo -p ? (you'll have to install hwloc)
> Brice
>
>
> tmish...@jcity.maeda.co.jp a écrit :
>
>
> Hi,
>
> I updated openmpi from version 1.5.4 to 1.5.5.
> Then the execution speed of my application became much slower than
> before,
> due to wrong core bindings. As far as I have checked, it seems that
> openmpi-1.5.4
> gives correct core bindings on my Magny-Cours based machine.
>
> 1) my script is as follows:
> export OMP_NUM_THREADS=4
> mpirun -machinefile pbs_hosts \
> -np 8 \
> -x OMP_NUM_THREADS \
> -bind-to-core \
> -cpus-per-proc ${OMP_NUM_THREADS} \
> -report-bindings \
> ./Solver
>
> 2)binding reports are as follows:
> openmpi-1.5.4:
> [node03.cluster:21706] [[55518,0],0] odls:default:fork binding child
> [[55518,1],3] to cpus 
> [node03.cluster:21706] [[55518,0],0] odls:default:fork binding child
> [[55518,1],4] to cpus 
> [node03.cluster:21706] [[55518,0],0]
> odls:default:fork binding child
> [[55518,1],5] to cpus 
> [node03.cluster:21706] [[55518,0],0] odls:default:fork binding child
> [[55518,1],6] to cpus 
> [node03.cluster:21706] [[55518,0],0] odls:default:fork binding child
> [[55518,1],7] to cpus 
> [node03.cluster:21706] [[55518,0],0] odls:default:fork binding child
> [[55518,1],0] to cpus 
> [node03.cluster:21706] [[55518,0],0] odls:default:fork binding child
> [[55518,1],1] to cpus 
> [node03.cluster:21706] [[55518,0],0] odls:default:fork binding child
> [[55518,1],2] to cpus 
> openmpi-1.5.5:
> [node03.cluster:04706] [[40566,0],0] odls:default:fork binding child
> [[40566,1],3] to cpus f000
> [node03.cluster:04706] [[40566,0],0] odls:default:fork binding child
> [[40566,1],4] to cpus 000f
> [node03.cluster:04706] [[40566,0],0] odls:default:fork binding child
> [[40566,1],5] to cpus 00f0
> [node03.cluster:04706] [[40566,0],0]
> odls:default:fork binding child
> [[40566,1],6] to cpus 0f00
> [node03.cluster:04706] [[40566,0],0] odls:default:fork binding child
> [[40566,1],7] to cpus f000
> [node03.cluster:04706] [[40566,0],0] odls:default:fork binding child
> [[40566,1],0] to cpus 000f
> [node03.cluster:04706] [[40566,0],0] odls:default:fork binding child
> [[40566,1],1] to cpus 00f0
> [node03.cluster:04706] [[40566,0],0] odls:default:fork binding child
> [[40566,1],2] to cpus 0f00
>
> 3)node03 has 32 cores with 4 

Re: [OMPI users] wrong core binding by openmpi-1.5.5

2012-04-11 Thread Brice Goglin
Can you send the output of lstopo -p ? (you'll have to install hwloc)
Brice


tmish...@jcity.maeda.co.jp a écrit :


Hi,

I updated openmpi from version 1.5.4 to 1.5.5.
Then the execution speed of my application became much slower than
before,
due to wrong core bindings. As far as I have checked, it seems that
openmpi-1.5.4
gives correct core bindings on my Magny-Cours based machine.

1) my script is as follows:
export OMP_NUM_THREADS=4
mpirun -machinefile pbs_hosts \
-np 8 \
-x OMP_NUM_THREADS \
-bind-to-core \
-cpus-per-proc ${OMP_NUM_THREADS} \
-report-bindings \
./Solver

2)binding reports are as follows:
openmpi-1.5.4:
[node03.cluster:21706] [[55518,0],0] odls:default:fork binding child
[[55518,1],3] to cpus 
[node03.cluster:21706] [[55518,0],0] odls:default:fork binding child
[[55518,1],4] to cpus 
[node03.cluster:21706] [[55518,0],0] odls:default:fork binding child
[[55518,1],5] to cpus 
[node03.cluster:21706] [[55518,0],0] odls:default:fork binding child
[[55518,1],6] to cpus 
[node03.cluster:21706] [[55518,0],0] odls:default:fork binding child
[[55518,1],7] to cpus 
[node03.cluster:21706] [[55518,0],0] odls:default:fork binding child
[[55518,1],0] to cpus 
[node03.cluster:21706] [[55518,0],0] odls:default:fork binding child
[[55518,1],1] to cpus 
[node03.cluster:21706] [[55518,0],0] odls:default:fork binding child
[[55518,1],2] to cpus 
openmpi-1.5.5:
[node03.cluster:04706] [[40566,0],0] odls:default:fork binding child
[[40566,1],3] to cpus f000
[node03.cluster:04706] [[40566,0],0] odls:default:fork binding child
[[40566,1],4] to cpus 000f
[node03.cluster:04706] [[40566,0],0] odls:default:fork binding child
[[40566,1],5] to cpus 00f0
[node03.cluster:04706] [[40566,0],0] odls:default:fork binding child
[[40566,1],6] to cpus 0f00
[node03.cluster:04706] [[40566,0],0] odls:default:fork binding child
[[40566,1],7] to cpus f000
[node03.cluster:04706] [[40566,0],0] odls:default:fork binding child
[[40566,1],0] to cpus 000f
[node03.cluster:04706] [[40566,0],0] odls:default:fork binding child
[[40566,1],1] to cpus 00f0
[node03.cluster:04706] [[40566,0],0] odls:default:fork binding child
[[40566,1],2] to cpus 0f00

3) node03 has 32 cores with 4 Magny-Cours CPUs (8 cores per CPU).

Regards,
Tetsuya Mishima

_

users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users



[OMPI users] wrong core binding by openmpi-1.5.5

2012-04-11 Thread tmishima

Hi,

I updated openmpi from version 1.5.4 to 1.5.5.
Then the execution speed of my application became much slower than
before,
due to wrong core bindings. As far as I have checked, it seems that
openmpi-1.5.4
gives correct core bindings on my Magny-Cours based machine.

1) my script is as follows:
export OMP_NUM_THREADS=4
mpirun -machinefile pbs_hosts \
   -np 8 \
   -x OMP_NUM_THREADS \
   -bind-to-core \
   -cpus-per-proc ${OMP_NUM_THREADS} \
   -report-bindings \
   ./Solver

2)binding reports are as follows:
openmpi-1.5.4:
[node03.cluster:21706] [[55518,0],0] odls:default:fork binding child
[[55518,1],3] to cpus 
[node03.cluster:21706] [[55518,0],0] odls:default:fork binding child
[[55518,1],4] to cpus 
[node03.cluster:21706] [[55518,0],0] odls:default:fork binding child
[[55518,1],5] to cpus 
[node03.cluster:21706] [[55518,0],0] odls:default:fork binding child
[[55518,1],6] to cpus 
[node03.cluster:21706] [[55518,0],0] odls:default:fork binding child
[[55518,1],7] to cpus 
[node03.cluster:21706] [[55518,0],0] odls:default:fork binding child
[[55518,1],0] to cpus 
[node03.cluster:21706] [[55518,0],0] odls:default:fork binding child
[[55518,1],1] to cpus 
[node03.cluster:21706] [[55518,0],0] odls:default:fork binding child
[[55518,1],2] to cpus 
openmpi-1.5.5:
[node03.cluster:04706] [[40566,0],0] odls:default:fork binding child
[[40566,1],3] to cpus f000
[node03.cluster:04706] [[40566,0],0] odls:default:fork binding child
[[40566,1],4] to cpus 000f
[node03.cluster:04706] [[40566,0],0] odls:default:fork binding child
[[40566,1],5] to cpus 00f0
[node03.cluster:04706] [[40566,0],0] odls:default:fork binding child
[[40566,1],6] to cpus 0f00
[node03.cluster:04706] [[40566,0],0] odls:default:fork binding child
[[40566,1],7] to cpus f000
[node03.cluster:04706] [[40566,0],0] odls:default:fork binding child
[[40566,1],0] to cpus 000f
[node03.cluster:04706] [[40566,0],0] odls:default:fork binding child
[[40566,1],1] to cpus 00f0
[node03.cluster:04706] [[40566,0],0] odls:default:fork binding child
[[40566,1],2] to cpus 0f00

3) node03 has 32 cores with 4 Magny-Cours CPUs (8 cores per CPU).

Regards,
Tetsuya Mishima



[OMPI users] ompi-restart failed && ompi-migrate

2012-04-11 Thread kidd
Hello!
I have run into some problems.
This is my environment:
   BLCR 0.8.4, Open MPI 1.5.5, OS Ubuntu 11.04
   I have 2 nodes: cuda05 (master; it hosts the NFS file system) and cuda07 (slave;
it mounts the master).

   I have also set in ~/.openmpi/mca-params.conf:
 crs_base_snapshot_dir=/root/kidd_openMPI/Tmp
 snapc_base_global_snapshot_dir=/root/kidd_openMPI/checkpoints

  my configure line:
./configure --prefix=/root/kidd_openMPI --with-ft=cr --enable-ft-thread  
 --with-blcr=/usr/local/BLCR  --with-blcr-libdir=/usr/local/BLCR/lib 
--enable-mpirun-prefix-by-default 
 --enable-static --enable-shared  --enable-opal-multi-threads;

problem 1: ompi-restart on multiple nodes
  command 01: mpirun -hostfile Hosts -am ft-enable-cr -x LD_LIBRARY_PATH
-np 2 ./TEST
  command 02: ompi-restart ompi_global_snapshot_2892.ckpt
  -> I can checkpoint 2 processes across multiple nodes, but when I restart,
they only restart on the master node.

     command 03: ompi-restart -hostfile Hosts ompi_global_snapshot_2892.ckpt
    -> Error message below. I have made sure BLCR is OK.

  
 --
    root@cuda05:~/kidd_openMPI/checkpoints# ompi-restart -hostfile Hosts 
ompi_global_snapshot_2892.ckpt/
   --
   Error: BLCR was not able to restart the process because exec failed.
    Check the installation of BLCR on all of the machines in your
   system. The following information may be of help:
 Return Code : -1
 BLCR Restart Command : cr_restart
 Restart Command Line : cr_restart
/root/kidd_openMPI/checkpoints/ompi_global_snapshot_2892.ckpt/0/opal_snapshot_1.ckpt/ompi_blcr_context.2704
--
--
Error: Unable to obtain the proper restart command to restart from the
   checkpoint file (opal_snapshot_1.ckpt). Returned -1.
   Check the installation of the blcr checkpoint/restart service
   on all of the machines in your system.

 problem 2: I cannot find ompi-migrate. How do I use ompi-migrate?
  Please help me, thanks.