Re: [OMPI users] A problem with 'mpiexec -launch-agent'

2010-06-14 Thread Ralph Castain
It isn't our intention either - still looking at this to see what is going on.


On Jun 14, 2010, at 6:24 PM, Terry Frankcombe wrote:

> On Tue, 2010-06-15 at 00:13 +0200, Reuti wrote:
>> Hi,
>> 
>> On 13.06.2010 at 09:02, Zhang Linbo wrote:
>> 
>>> Hi,
>>> 
>>> I'm new to OpenMPI and have encountered a problem with mpiexec.
>>> 
>>> Since I need to set up the execution environment for OpenMPI programs
>>> on the execution nodes, I use the following command line to launch an
>>> OMPI program:
>>> 
>>>  mpiexec -launch-agent /some_path/myscript 
>>> 
>>> The problem is: the above command works fine if I invoke 'mpiexec'
>>> without an absolute path just like above (assuming the PATH variable
>>> is properly set), but if I prepend an absolute path to 'mpiexec',  
>>> e.g.:
>>> 
>>>  /OMPI_dir/bin/mpiexec -launch-agent /some_path/myscript 
>> 
>> Using an absolute path is equivalent to using the --prefix option to
>> `mpiexec`. Both obviously lead to the erroneous behavior you
>> encountered.
> 
> Hi folks
> 
> Speaking as no more than an uneducated user, having the behaviour change
> depending on invoking by an absolute path or invoking by some
> unspecified (potentially shell-dependent) path magic seems like a bad
> idea.
> 
> As a long-time *nix user, this just rubs me the wrong way.
> 
> Ciao
> Terry
> 
> 
> -- 
> Dr. Terry Frankcombe
> Research School of Chemistry, Australian National University
> Ph: (+61) 0417 163 509  Skype: terry.frankcombe
> 
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users




Re: [OMPI users] A problem with 'mpiexec -launch-agent'

2010-06-14 Thread Terry Frankcombe
On Tue, 2010-06-15 at 00:13 +0200, Reuti wrote:
> Hi,
> 
> On 13.06.2010 at 09:02, Zhang Linbo wrote:
> 
> > Hi,
> >
> > I'm new to OpenMPI and have encountered a problem with mpiexec.
> >
> > Since I need to set up the execution environment for OpenMPI programs
> > on the execution nodes, I use the following command line to launch an
> > OMPI program:
> >
> >   mpiexec -launch-agent /some_path/myscript 
> >
> > The problem is: the above command works fine if I invoke 'mpiexec'
> > without an absolute path just like above (assuming the PATH variable
> > is properly set), but if I prepend an absolute path to 'mpiexec',  
> > e.g.:
> >
> >   /OMPI_dir/bin/mpiexec -launch-agent /some_path/myscript 
> 
> Using an absolute path is equivalent to using the --prefix option to
> `mpiexec`. Both obviously lead to the erroneous behavior you
> encountered.

Hi folks

Speaking as no more than an uneducated user, having the behaviour change
depending on invoking by an absolute path or invoking by some
unspecified (potentially shell-dependent) path magic seems like a bad
idea.

As a long-time *nix user, this just rubs me the wrong way.

Ciao
Terry


-- 
Dr. Terry Frankcombe
Research School of Chemistry, Australian National University
Ph: (+61) 0417 163 509  Skype: terry.frankcombe



Re: [OMPI users] A problem with 'mpiexec -launch-agent'

2010-06-14 Thread Reuti

On 15.06.2010 at 00:26, Ralph Castain wrote:

> Jeff and I are taking a look at the logic in that code now - I know
> we thought we understood it back when we wrote it, but somehow it
> just doesn't look right any more...


To avoid confusion: I meant "orted_prefix" below which holds the name  
of the launch-agent and can't be used with this demonstration fix.


Sorry for the typo.

-- Reuti




> On Jun 14, 2010, at 4:13 PM, Reuti wrote:
>
>> Hi,
>>
>> On 13.06.2010 at 09:02, Zhang Linbo wrote:
>>
>>> Hi,
>>>
>>> I'm new to OpenMPI and have encountered a problem with mpiexec.
>>>
>>> Since I need to set up the execution environment for OpenMPI programs
>>> on the execution nodes, I use the following command line to launch an
>>> OMPI program:
>>>
>>>  mpiexec -launch-agent /some_path/myscript
>>>
>>> The problem is: the above command works fine if I invoke 'mpiexec'
>>> without an absolute path just like above (assuming the PATH variable
>>> is properly set), but if I prepend an absolute path to 'mpiexec',
>>> e.g.:
>>>
>>>  /OMPI_dir/bin/mpiexec -launch-agent /some_path/myscript
>>
>> Using an absolute path is equivalent to using the --prefix option to
>> `mpiexec`. Both obviously lead to the erroneous behavior you
>> encountered.
>>
>>> then I get the following error message:
>>>
>>> bash: -c: line 0: syntax error near unexpected token `('
>>> bash: -c: line 0: ` PATH=/OMPI_dir/bin:$PATH ; export PATH ;
>>> LD_LIBRARY_PATH=/OMPI_dir/lib:$LD_LIBRARY_PATH ; export LD_LIBRARY_PATH ;
>>> /some_path/myscript /OMPI_dir/bin/(null) --daemonize -mca ess env -mca
>>> orte_ess_jobid 1978662912 -mca orte_ess_vpid 1 -mca orte_ess_num_procs 2
>>> --hnp-uri "1978662912.0;tcp://180.0.14.12:54844;tcp://190.0.14.12:54844"'
>>
>> The reason seems to be that, when a prefix is given, the assembled command
>> line contains superfluous elements. I tried to work around this with a new
>> case in "orte/mca/plm/rsh/plm_rsh_module.c":
>>
>>     if (orted_prefix != NULL) {
>>         asprintf (_cmd,
>>                   "%s%s%s PATH=%s/%s:$PATH ; export PATH ; "
>>                   "LD_LIBRARY_PATH=%s/%s:$LD_LIBRARY_PATH ; "
>>                   "export LD_LIBRARY_PATH ; "
>>                   "%s",
>>                   (opal_prefix != NULL ? "OPAL_PREFIX=" : ""),
>>                   (opal_prefix != NULL ? opal_prefix : ""),
>>                   (opal_prefix != NULL ? " ; export OPAL_PREFIX;" : ""),
>>                   prefix_dir, bin_base,
>>                   prefix_dir, lib_base,
>>                   orted_prefix );
>>     }
>>     else {
>>         asprintf (_cmd,
>>                   "%s%s%s PATH=%s/%s:$PATH ; export PATH ; "
>>                   "LD_LIBRARY_PATH=%s/%s:$LD_LIBRARY_PATH ; "
>>                   "export LD_LIBRARY_PATH ; "
>>                   "%s %s/%s/%s",
>>                   (opal_prefix != NULL ? "OPAL_PREFIX=" : ""),
>>                   (opal_prefix != NULL ? opal_prefix : ""),
>>                   (opal_prefix != NULL ? " ; export OPAL_PREFIX;" : ""),
>>                   prefix_dir, bin_base,
>>                   prefix_dir, lib_base,
>>                   (orted_prefix != NULL ? orted_prefix : ""),
>>                   prefix_dir, bin_base,
>>                   orted_cmd);
>>     }
>>
>> For simplicity, the name of the agent is stored in "opal_prefix", AFAICS.
>>
>> This is of course not a clean solution (as "opal_prefix" can't be used any
>> more), but more a proof of concept, as only sh-like shells are handled.
>> Surely there are better ways to solve it. Anyway, it's a bug and should be
>> filed.
>>
>> -- Reuti
>>
>>> I'd like to know what causes the above problem and how I should deal with it.
>>> I want to use the absolute pathname of mpiexec to avoid possible interference
>>> with other MPI installations. Thanks in advance.
>>>
>>> LB






Re: [OMPI users] A problem with 'mpiexec -launch-agent'

2010-06-14 Thread Ralph Castain
Jeff and I are taking a look at the logic in that code now - I know we thought 
we understood it back when we wrote it, but somehow it just doesn't look right 
any more...


On Jun 14, 2010, at 4:13 PM, Reuti wrote:

> Hi,
> 
> On 13.06.2010 at 09:02, Zhang Linbo wrote:
> 
>> Hi,
>> 
>> I'm new to OpenMPI and have encountered a problem with mpiexec.
>> 
>> Since I need to set up the execution environment for OpenMPI programs
>> on the execution nodes, I use the following command line to launch an
>> OMPI program:
>> 
>>  mpiexec -launch-agent /some_path/myscript 
>> 
>> The problem is: the above command works fine if I invoke 'mpiexec'
>> without an absolute path just like above (assuming the PATH variable
>> is properly set), but if I prepend an absolute path to 'mpiexec', e.g.:
>> 
>>  /OMPI_dir/bin/mpiexec -launch-agent /some_path/myscript 
> 
> Using an absolute path is equivalent to using the --prefix option to `mpiexec`.
> Both obviously lead to the erroneous behavior you encountered.
> 
> 
>> then I get the following error message:
>> 
>> bash: -c: line 0: syntax error near unexpected token `('
>> bash: -c: line 0: ` PATH=/OMPI_dir/bin:$PATH ; export PATH ; 
>> LD_LIBRARY_PATH=/OMPI_dir/lib:$LD_LIBRARY_PATH ; export LD_LIBRARY_PATH ; 
>> /some_path/myscript /OMPI_dir/bin/(null) --daemonize -mca ess env -mca 
>> orte_ess_jobid 1978662912 -mca orte_ess_vpid 1 -mca orte_ess_num_procs 2 
>> --hnp-uri "1978662912.0;tcp://180.0.14.12:54844;tcp://190.0.14.12:54844"'
> 
> The reason seems to be that, when a prefix is given, the assembled command
> line contains superfluous elements. I tried to work around this with a new
> case in "orte/mca/plm/rsh/plm_rsh_module.c":
> 
>     if (orted_prefix != NULL) {
>         asprintf (_cmd,
>                   "%s%s%s PATH=%s/%s:$PATH ; export PATH ; "
>                   "LD_LIBRARY_PATH=%s/%s:$LD_LIBRARY_PATH ; "
>                   "export LD_LIBRARY_PATH ; "
>                   "%s",
>                   (opal_prefix != NULL ? "OPAL_PREFIX=" : ""),
>                   (opal_prefix != NULL ? opal_prefix : ""),
>                   (opal_prefix != NULL ? " ; export OPAL_PREFIX;" : ""),
>                   prefix_dir, bin_base,
>                   prefix_dir, lib_base,
>                   orted_prefix );
>     }
>     else {
>         asprintf (_cmd,
>                   "%s%s%s PATH=%s/%s:$PATH ; export PATH ; "
>                   "LD_LIBRARY_PATH=%s/%s:$LD_LIBRARY_PATH ; "
>                   "export LD_LIBRARY_PATH ; "
>                   "%s %s/%s/%s",
>                   (opal_prefix != NULL ? "OPAL_PREFIX=" : ""),
>                   (opal_prefix != NULL ? opal_prefix : ""),
>                   (opal_prefix != NULL ? " ; export OPAL_PREFIX;" : ""),
>                   prefix_dir, bin_base,
>                   prefix_dir, lib_base,
>                   (orted_prefix != NULL ? orted_prefix : ""),
>                   prefix_dir, bin_base,
>                   orted_cmd);
>     }
> 
> For simplicity, the name of the agent is stored in "opal_prefix", AFAICS.
> 
> This is of course not a clean solution (as "opal_prefix" can't be used any
> more), but more a proof of concept, as only sh-like shells are handled. Surely
> there are better ways to solve it. Anyway, it's a bug and should be filed.
> 
> -- Reuti
> 
> 
>> I'd like to know what causes the above problem and how I should deal with it.
>> I want to use the absolute pathname of mpiexec to avoid possible interference
>> with other MPI installations. Thanks in advance.
>> 
>> LB
>> 
>> 




Re: [OMPI users] A problem with 'mpiexec -launch-agent'

2010-06-14 Thread Reuti

Hi,

On 13.06.2010 at 09:02, Zhang Linbo wrote:

> Hi,
>
> I'm new to OpenMPI and have encountered a problem with mpiexec.
>
> Since I need to set up the execution environment for OpenMPI programs
> on the execution nodes, I use the following command line to launch an
> OMPI program:
>
>   mpiexec -launch-agent /some_path/myscript
>
> The problem is: the above command works fine if I invoke 'mpiexec'
> without an absolute path just like above (assuming the PATH variable
> is properly set), but if I prepend an absolute path to 'mpiexec',
> e.g.:
>
>   /OMPI_dir/bin/mpiexec -launch-agent /some_path/myscript

Using an absolute path is equivalent to using the --prefix option to
`mpiexec`. Both obviously lead to the erroneous behavior you
encountered.
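
To illustrate the equivalence, here is a minimal sketch of how a launcher
could derive an installation prefix from an absolute argv[0]. It is not taken
from the Open MPI sources: the function name prefix_from_argv0 and the
assumption that the binary lives in a "bin"-style directory directly under
the prefix are made up for illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not Open MPI code: when the launcher is invoked by
 * absolute path (e.g. "/OMPI_dir/bin/mpiexec"), return a malloc'ed copy of
 * the implied prefix ("/OMPI_dir"). Returns NULL when argv0 is not an
 * absolute path or does not have a prefix/bindir/program shape. */
static char *prefix_from_argv0(const char *argv0)
{
    assert(argv0 != NULL);
    if (argv0[0] != '/') {
        return NULL;              /* found via PATH: no prefix implied */
    }

    size_t len = strlen(argv0);
    char *copy = malloc(len + 1);
    if (copy == NULL) {
        return NULL;
    }
    memcpy(copy, argv0, len + 1);

    /* Drop the program name: "/OMPI_dir/bin/mpiexec" -> "/OMPI_dir/bin" */
    char *slash = strrchr(copy, '/');
    *slash = '\0';

    /* Drop the bin directory: "/OMPI_dir/bin" -> "/OMPI_dir" */
    slash = strrchr(copy, '/');
    if (slash == NULL || slash == copy) {
        free(copy);               /* e.g. "/mpiexec" or "/bin/mpiexec" */
        return NULL;
    }
    *slash = '\0';
    return copy;
}
```

In this sketch, "/OMPI_dir/bin/mpiexec" yields "/OMPI_dir", while a plain
"mpiexec" yields NULL, so only the absolute-path invocation would take the
prefix code path.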




> then I get the following error message:
>
> bash: -c: line 0: syntax error near unexpected token `('
> bash: -c: line 0: ` PATH=/OMPI_dir/bin:$PATH ; export PATH ;
> LD_LIBRARY_PATH=/OMPI_dir/lib:$LD_LIBRARY_PATH ; export LD_LIBRARY_PATH ;
> /some_path/myscript /OMPI_dir/bin/(null) --daemonize -mca ess env -mca
> orte_ess_jobid 1978662912 -mca orte_ess_vpid 1 -mca orte_ess_num_procs 2
> --hnp-uri "1978662912.0;tcp://180.0.14.12:54844;tcp://190.0.14.12:54844"'

The reason seems to be that, when a prefix is given, the assembled command
line contains superfluous elements. I tried to work around this with a new
case in "orte/mca/plm/rsh/plm_rsh_module.c":

    if (orted_prefix != NULL) {
        asprintf (_cmd,
                  "%s%s%s PATH=%s/%s:$PATH ; export PATH ; "
                  "LD_LIBRARY_PATH=%s/%s:$LD_LIBRARY_PATH ; "
                  "export LD_LIBRARY_PATH ; "
                  "%s",
                  (opal_prefix != NULL ? "OPAL_PREFIX=" : ""),
                  (opal_prefix != NULL ? opal_prefix : ""),
                  (opal_prefix != NULL ? " ; export OPAL_PREFIX;" : ""),
                  prefix_dir, bin_base,
                  prefix_dir, lib_base,
                  orted_prefix );
    }
    else {
        asprintf (_cmd,
                  "%s%s%s PATH=%s/%s:$PATH ; export PATH ; "
                  "LD_LIBRARY_PATH=%s/%s:$LD_LIBRARY_PATH ; "
                  "export LD_LIBRARY_PATH ; "
                  "%s %s/%s/%s",
                  (opal_prefix != NULL ? "OPAL_PREFIX=" : ""),
                  (opal_prefix != NULL ? opal_prefix : ""),
                  (opal_prefix != NULL ? " ; export OPAL_PREFIX;" : ""),
                  prefix_dir, bin_base,
                  prefix_dir, lib_base,
                  (orted_prefix != NULL ? orted_prefix : ""),
                  prefix_dir, bin_base,
                  orted_cmd);
    }

For simplicity, the name of the agent is stored in "opal_prefix", AFAICS.

This is of course not a clean solution (as "opal_prefix" can't be used
any more), but more a proof of concept, as only sh-like shells are
handled. Surely there are better ways to solve it. Anyway, it's a bug
and should be filed.
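
The failure mode can be reproduced outside Open MPI. The sketch below is not
the plm_rsh_module.c code; build_daemon_cmd and its parameter names are
hypothetical. It assumes, as the "(null)" in the error output suggests, that
the daemon command is NULL when a custom launch agent is given. glibc's
printf family prints "(null)" for a NULL %s argument (formally this is
undefined behaviour), and the unquoted "(" is what bash then rejects. Note
that the sketch guards on the NULL component rather than on the presence of
the agent, so it is an illustration and not the same logic as the patch
above.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper, not the Open MPI source: build only the trailing
 * "agent + daemon" part of the remote command line into buf.
 *   agent_prefix : launch agent, e.g. "/some_path/myscript", or NULL
 *   daemon_cmd   : daemon name, e.g. "orted", or NULL (assumed to be the
 *                  case when the agent is a custom script)
 * Returns the snprintf result. */
static int build_daemon_cmd(char *buf, size_t len,
                            const char *agent_prefix,
                            const char *prefix_dir,
                            const char *bin_base,
                            const char *daemon_cmd)
{
    assert(buf != NULL && prefix_dir != NULL && bin_base != NULL);

    if (daemon_cmd == NULL) {
        /* Never pass NULL to %s: emit the agent alone instead of
         * "<agent> <prefix>/<bin>/(null)". */
        return snprintf(buf, len, "%s",
                        agent_prefix != NULL ? agent_prefix : "");
    }
    if (agent_prefix != NULL) {
        return snprintf(buf, len, "%s %s/%s/%s",
                        agent_prefix, prefix_dir, bin_base, daemon_cmd);
    }
    return snprintf(buf, len, "%s/%s/%s", prefix_dir, bin_base, daemon_cmd);
}
```

Called with agent_prefix "/some_path/myscript", prefix_dir "/OMPI_dir",
bin_base "bin" and a NULL daemon_cmd, this produces just
"/some_path/myscript", with no "(null)" path component for the shell to
choke on.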


-- Reuti


> I'd like to know what causes the above problem and how I should deal
> with it.
> I want to use the absolute pathname of mpiexec to avoid possible
> interference with other MPI installations. Thanks in advance.
>
> LB






[OMPI users] How to checkpoint atomic function in OpenMPI

2010-06-14 Thread Nguyen Toan
Hi all,
I have a MPI program as follows:
---
#include <mpi.h>

int main(int argc, char **argv){
   int i;
   MPI_Init(&argc, &argv);
   ..
   for (i=0; i<1; i++) {
      my_atomic_func();
   }
   ...
   MPI_Finalize();
   return 0;
}
---


Most of the program's runtime is spent in the loop, and each call to
my_atomic_func() takes a fairly long time.
I want my_atomic_func() to run atomically, but a checkpoint (taken by
running the ompi-checkpoint command) may arrive in the middle of a
my_atomic_func() call, and ompi-restart may then fail to restart
correctly.

So my questions are:
+ At checkpoint time (when ompi-checkpoint is executed), is there a way to
make OpenMPI wait until my_atomic_func() has finished?
+ How does ompi-checkpoint checkpoint MPI threads?

Regards,
Nguyen Toan


Re: [OMPI users] ompi-restart failed

2010-06-14 Thread Nguyen Toan
Hi all,
I finally figured out the answer. I just added the parameter "-machinefile
host" to the "ompi-restart" command and it restarted correctly. So is
OpenMPI unable to restart a multi-threaded application on 1 node?

Nguyen Toan

On Tue, Jun 8, 2010 at 12:07 AM, Nguyen Toan wrote:

> Sorry, I just want to add 2 more things:
> + I tried configuring with and without --enable-ft-thread, but nothing changed
> + I also applied the following patch for OpenMPI and reinstalled, but I got the
> same error:
>
> https://svn.open-mpi.org/trac/ompi/raw-attachment/ticket/2139/v1.4-preload-part1.diff
>
> Can somebody help? Thank you very much.
>
> Nguyen Toan
>
>
> On Mon, Jun 7, 2010 at 11:51 PM, Nguyen Toan wrote:
>
>> Hello everyone,
>>
>> I'm using OpenMPI 1.4.2 with BLCR 0.8.2 to test checkpointing on 2 nodes
>> but it failed to restart (Segmentation fault).
>> Here are the details concerning my problem:
>>
>> + OS: Centos 5.4
>> + OpenMPI configure:
>> ./configure --with-ft=cr --enable-ft-thread --enable-mpi-threads \
>> --with-blcr=/home/nguyen/opt/blcr \
>> --with-blcr-libdir=/home/nguyen/opt/blcr/lib \
>> --prefix=/home/nguyen/opt/openmpi \
>> --enable-mpirun-prefix-by-default
>> + mpirun -am ft-enable-cr -machinefile host ./test
>>
>> I checkpointed the test program using "ompi-checkpoint -v -s PID" and the
>> checkpoint file was created successfully. However it failed to restart using
>> ompi-restart:
>> "mpirun noticed that process rank 0 with PID 21242 on node rc014.local
>> exited on signal 11 (Segmentation fault)"
>> Did I miss something in the installation of OpenMPI?
>>
>> Regards,
>> Nguyen Toan
>>
>
>