Re: [OMPI users] Openmpi SGE and BLACS

2012-01-15 Thread Conn ORourke


Found the problem. I had accidentally linked to BLACS built with mpich, not 
openmpi.
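
For anyone who hits the same thing: if the executable is dynamically linked, a 
quick sanity check is to see which MPI library it actually resolves against 
(program path as in the submission script further down the thread):

    ldd ~/bin/program | grep -i mpi

The libmpi that turns up should live under the same Open MPI installation that 
mpirun comes from.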

Cheers, 

Conn




Re: [OMPI users] Openmpi SGE and BLACS

2012-01-14 Thread Conn ORourke
Dear Terry, 


Thanks for the reply, and sorry for the delay in getting back to you. Here is 
the relevant part of the gdb output:

Program terminated with signal 11, Segmentation fault.
#0  0x2b63ba7f9291 in PMPI_Comm_size () at ./pcomm_size.c:46
46        if ( ompi_comm_invalid (comm)) {
(gdb) where
#0  0x2b63ba7f9291 in PMPI_Comm_size () at ./pcomm_size.c:46
#1  0x0062cb6c in blacs_pinfo_ () at ./blacs_pinfo_.c:29
Backtrace stopped: previous frame inner to this frame (corrupt stack?)
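
(For reference, the trace above comes from loading the core file into gdb and 
asking for the stack, along the lines of:

    gdb ~/bin/program core
    (gdb) where

with the binary and core file paths adjusted as appropriate.)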

Do you think the problem is being caused by SGE feeding the wrong number of 
processors to BLACS in some way?
As I mentioned previously, I am requesting more processors than any single job 
uses, since I run several jobs on the reserved set.

Thanks for your time & help.

Conn






Re: [OMPI users] Openmpi SGE and BLACS

2012-01-13 Thread TERRY DONTJE
Do you have a stack trace showing where exactly things are seg faulting in 
blacs_pinfo?


--td



-- 
Terry D. Dontje | Principal Software Engineer
Developer Tools Engineering | +1.781.442.2631
Oracle - Performance Technologies
95 Network Drive, Burlington, MA 01803
Email terry.don...@oracle.com





[OMPI users] Openmpi SGE and BLACS

2012-01-13 Thread Conn ORourke
Dear Openmpi Users,

I am reserving several processors with SGE, on which I want to run a number of 
openmpi jobs, all of which individually (and combined) use fewer than the 
reserved number of processors. The code I am running uses BLACS, and when 
blacs_pinfo is called I get a seg fault. If the code doesn't call blacs_pinfo, 
it runs fine when submitted in this manner. blacs_pinfo simply returns the 
number of available processors, so I suspect this is an issue with SGE and 
openmpi when the requested processor count differs from the one given to 
mpirun.


Can anyone explain why this would happen with openmpi jobs using BLACS on 
SGE, and perhaps suggest a way around it?


Many thanks

Conn


example submission script:
#!/bin/bash -f -l
#$ -V 
#$ -N test 
#$ -S /bin/bash
#$ -cwd
#$ -l vf=1800M
#$ -pe ib-ompi 12 
#$ -q infiniband.q


    BIN=~/bin/program

    # stage the input files for each run into the job's scratch area
    for i in XPOL YPOL ZPOL; do
       mkdir -p ${TMPDIR}/4ZP/$i;
       cp ./4ZP/$i/* ${TMPDIR}/4ZP/$i;
    done

    cd ${TMPDIR}/4ZP/XPOL;
    mpirun -np 4 -machinefile ${TMPDIR}/machines $BIN > output &
    cd ${TMPDIR}/4ZP/YPOL;
    mpirun -np 4 -machinefile ${TMPDIR}/machines $BIN > output &
    cd ${TMPDIR}/4ZP/ZPOL;
    mpirun -np 4 -machinefile ${TMPDIR}/machines $BIN > output;

    # copy the results back from scratch
    for i in XPOL YPOL ZPOL; do
       cp ${TMPDIR}/4ZP/$i/* ${HOME}/4ZP/$i;
    done
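
One caveat with the script as written: the XPOL and YPOL runs are 
backgrounded, so the copy-back loop can start while those two jobs are still 
writing output. If the intent is to copy results only after all three runs 
have finished, a minimal fix is to background the last run as well and wait on 
them all:

    cd ${TMPDIR}/4ZP/ZPOL;
    mpirun -np 4 -machinefile ${TMPDIR}/machines $BIN > output &
    wait    # block until every backgrounded mpirun job has exited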


blacs_pinfo.c:
#include "Bdef.h"

#if (INTFACE == C_CALL)
void Cblacs_pinfo(int *mypnum, int *nprocs)
#else
F_VOID_FUNC blacs_pinfo_(int *mypnum, int *nprocs)
#endif
{
   int ierr;
   extern int BI_Iam, BI_Np;

/*
 * If this is our first call, will need to set up some stuff
 */
   if (BI_F77_MPI_COMM_WORLD == NULL)
   {
/*
 *   The BLACS always call f77's mpi_init.  If the user is using C, he should
 *    explicitly call MPI_Init . . .
 */
      MPI_Initialized(nprocs);
#ifdef MainInF77
      if (!(*nprocs)) bi_f77_init_();
#else
      if (!(*nprocs))
         BI_BlacsErr(-1, -1, __FILE__,
            "Users with C main programs must explicitly call MPI_Init");
#endif
      BI_F77_MPI_COMM_WORLD = (int *) malloc(sizeof(int));
#ifdef UseF77Mpi
      BI_F77_MPI_CONSTANTS = (int *) malloc(23*sizeof(int));
      ierr = 1;
      bi_f77_get_constants_(BI_F77_MPI_COMM_WORLD, &ierr, BI_F77_MPI_CONSTANTS);
#else
      ierr = 0;
      bi_f77_get_constants_(BI_F77_MPI_COMM_WORLD, &ierr, nprocs);
#endif
      BI_MPI_Comm_size(BI_MPI_COMM_WORLD, &BI_Np, ierr);
      BI_MPI_Comm_rank(BI_MPI_COMM_WORLD, &BI_Iam, ierr);
   }
   *mypnum = BI_Iam;
   *nprocs = BI_Np;
}
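
For a standard netlib BLACS build, the MPI it was compiled against is recorded 
in the Bmake.inc used to build the library, so a quick check (variable names 
as in the stock Bmake templates) is:

    grep -E 'MPIdir|MPIINCdir|MPILIBdir|MPILIB' Bmake.inc

If those point at an mpich tree while the application is linked and launched 
with Open MPI, blacs_pinfo can segfault exactly as above, because the 
communicator handle it hands to MPI_Comm_size is not a valid handle for the 
MPI library that is actually loaded.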