Re: [OMPI users] Processor/core selection/affinity for large shared memory systems

2008-12-04 Thread Terry Frankcombe
Isn't it up to the OS scheduler what gets run where?


> We have an 8-way, 32-core AMD processor machine (single system image)
> and are at present running OpenMPI 1.2.8.  Jobs are launched locally on
> the machine itself.  As far as I can see, there doesn't seem to be any
> way to tell OpenMPI to launch the MPI processes on adjacent cores.
> Presumably such functionality is technically possible via PLPA.  Is
> there in fact a way to specify such a thing with 1.2.8, and if not, will
> 1.3 support these kinds of arguments?
>
> Thank you.
> --
>   V. Ram
>   v_r_...@fastmail.fm
>
> --
> http://www.fastmail.fm - Or how I learned to stop worrying and
>   love email again
>




[OMPI users] Issue with Profiling Fortran code

2008-12-04 Thread Nick Wright

Hi

I am trying to use the PMPI interface with OPENMPI to profile a fortran 
program.


I have tried with 1.2.8 and 1.3rc1 with --enable-mpi-profile switched on.

The problem seems to be that if one, e.g., intercepts the call to
mpi_comm_rank_ (the Fortran hook) and then calls pmpi_comm_rank_, this
then calls MPI_Comm_rank (the C hook), not PMPI_Comm_rank as it should.


So if one wants to create a library that can profile C and Fortran
codes at the same time, one ends up intercepting the MPI call twice,
which is not desirable and not what should happen (and indeed doesn't
happen in other MPI implementations).


A simple example to illustrate this is below. If somebody knows of a
fix to avoid this issue, that would be great!


Thanks

Nick.

pmpi_test.c: mpicc pmpi_test.c -c

#include <stdio.h>
#include "mpi.h"

/* forward declaration of the Fortran-mangled profiling entry point */
void pmpi_comm_rank_(MPI_Comm *comm, int *rank, int *info);

void mpi_comm_rank_(MPI_Comm *comm, int *rank, int *info) {
  printf("mpi_comm_rank call successfully intercepted\n");
  pmpi_comm_rank_(comm, rank, info);
}

int MPI_Comm_rank(MPI_Comm comm, int *rank) {
  printf("MPI_Comm_rank call successfully intercepted\n");
  return PMPI_Comm_rank(comm, rank);
}

hello_mpi.f: mpif77 hello_mpi.f pmpi_test.o

      program hello
      implicit none
      include 'mpif.h'
      integer ierr
      integer myid, nprocs
      character*24 fdate, host
      call MPI_Init( ierr )
      myid = 0
      call mpi_comm_rank(MPI_COMM_WORLD, myid, ierr)
      call mpi_comm_size(MPI_COMM_WORLD, nprocs, ierr)
      call getenv('HOST', host)
      write (*,*) 'Hello World from proc', myid, ' out of', nprocs, host
      call mpi_finalize(ierr)
      end
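
One workaround I can think of, sketched below under the assumption
that the double interception itself can't easily be avoided, is to
have the Fortran wrapper set a flag so the C wrapper can tell a
genuine C call from one that arrived via the Fortran layer (the
"via_fortran" helper is my own name, single-threaded case only):

/* workaround sketch: guard against double interception
   (assumes single-threaded MPI usage) */
#include <stdio.h>
#include "mpi.h"

void pmpi_comm_rank_(MPI_Comm *comm, int *rank, int *info);

static int via_fortran = 0;

void mpi_comm_rank_(MPI_Comm *comm, int *rank, int *info) {
  printf("profiling Fortran mpi_comm_rank\n");
  via_fortran = 1;
  pmpi_comm_rank_(comm, rank, info);
  via_fortran = 0;
}

int MPI_Comm_rank(MPI_Comm comm, int *rank) {
  if (!via_fortran)              /* count only genuine C calls */
    printf("profiling C MPI_Comm_rank\n");
  return PMPI_Comm_rank(comm, rank);
}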





Re: [OMPI users] Name Mangling

2008-12-04 Thread Jeff Squyres
In general, Open MPI just uses whatever name mangling scheme the  
compiler uses.  Hence, if you compile your app and Open MPI with the  
same compiler, it should just work.  That being said, if your CLM app  
is supplying its own name mangling scheme flags to the PGI compiler  
(i.e., shifting it away from its default scheme), then yes, OMPI won't  
match it.  You can pass the same flags to OMPI's build process if you  
want; then they should match.
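
A quick way to check whether the schemes match is to compare what the
compiler emitted in your objects against what the MPI library exports,
for example:

  nm areaMod.o | grep -i mpi_reduce
  nm -D /path/to/openmpi/lib/libmpi_f77.so | grep -i mpi_reduce

(libmpi_f77 is the Fortran bindings library that Open MPI 1.2
installs; adjust the path for your installation.)  If one side shows
mpi_reduce_ and the other mpi_reduce__ or mpi_reduce, the mangling
schemes differ.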


We do provide weak symbols on platforms that support them, so it's a  
little odd that your app apparently isn't seeing them.
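
For anyone unfamiliar, the weak-symbol mechanism looks roughly like
the toy fragment below (an illustration, not our actual source); a
profiling library can then define its own strong MPI_Demo and still
reach the implementation through PMPI_Demo:

  /* toy illustration of weak-symbol aliasing (GCC/ELF platforms) */
  int PMPI_Demo(void) { return 42; }   /* the real implementation */
  #pragma weak MPI_Demo = PMPI_Demo    /* MPI_Demo: weak alias    */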



On Dec 4, 2008, at 2:33 PM, Elvedin Trnjanin wrote:

I'm using OpenMPI 1.2.5 and the PGI 7.1.5 compiler suite to get CLM
3.5 working correctly. When compiling for OpenMPI, I encounter the
following snippet of errors:


areaMod.o(.text+0x98a0): In function `areamod_map_checkmap_':
: undefined reference to `mpi_reduce_'
areaMod.o(.text+0x9b6c): In function `areamod_map_checkmap_':
: undefined reference to `mpi_reduce_'
areaMod.o(.text+0x9c39): In function `areamod_map_checkmap_':
: undefined reference to `mpi_reduce_'
areaMod.o(.text+0x9ea2): more undefined references to `mpi_reduce_'


When compiling for MPICH2, it works just fine. I assume fixing this is
going to lead to recompiling OpenMPI, so I am wondering which PGI
name-mangling options to pass to either the OpenMPI build or the CLM
build to get the names to match?


Thanks,
Elvedin




--
Jeff Squyres
Cisco Systems



Re: [OMPI users] Processor/core selection/affinity for large shared memory systems

2008-12-04 Thread V. Ram
Ralph H. Castain wrote:

> I confess to confusion. OpenMPI will by default map your processes in
> a round-robin fashion based on process slot. If you are in a resource
> managed environment (e.g., TM or SLURM), then the slots correspond to
> cores. If you are in an unmanaged environment, then your hostfile
> needs to specify a single hostname, and the slots=x number should
> reflect the total number of cores on your machine.

> If you then set mpi_paffinity_alone=1, OMPI will bind each rank to its
> associated core.

> Is that not what you are trying to do?
> Ralph
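
For concreteness, I take the suggestion to mean something like the
following, where the hostfile name and the binary are placeholders of
my own:

  $ cat myhostfile
  bignode slots=32
  $ mpirun --hostfile myhostfile -np 16 -mca mpi_paffinity_alone 1 ./my_app

That much I can do; my question is about which cores those 16
processes land on.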

I probably didn't explain myself well.  In this case, the system is not
running a resource manager like SLURM.  It is running Linux.  If I run
numactl --hardware, then I get:

available: 8 nodes (0-7)
node 0 cpus: 0 1 2 3
node 1 cpus: 4 5 6 7
node 2 cpus: 8 9 10 11
node 3 cpus: 12 13 14 15
node 4 cpus: 16 17 18 19
node 5 cpus: 20 21 22 23
node 6 cpus: 24 25 26 27
node 7 cpus: 28 29 30 31

where I've elided the memory-related output as well as the node
distances.  Just to reiterate, each node here is an AMD processor, part
of the 8-way system; there is no IP networking going on.

What I'd like is that, if I start a job with mpirun -np 16
<executable>, these 16 MPI processes get allocated on contiguous
"cpus" in numactl parlance, e.g. cpus 0-15, or 12-27, etc.

As it stands, if I check the cpus allocated to the aforementioned -np
16 job, I see various cores active across multiple sockets, but I
don't see whole sockets (all 4 cores) active at a time for this job.
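(I check this with, for example, "taskset -cp <pid>" on each MPI
process; taskset is the standard util-linux affinity tool, and <pid>
is a placeholder for the process ID.)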

Does this make more sense?
-- 
  V. Ram
  v_r_...@fastmail.fm

-- 
http://www.fastmail.fm - A no graphics, no pop-ups email service



[OMPI users] Processor/core selection/affinity for large shared memory systems

2008-12-04 Thread V. Ram
We have an 8-way, 32-core AMD processor machine (single system image)
and are at present running OpenMPI 1.2.8.  Jobs are launched locally on
the machine itself.  As far as I can see, there doesn't seem to be any
way to tell OpenMPI to launch the MPI processes on adjacent cores.
Presumably such functionality is technically possible via PLPA.  Is
there in fact a way to specify such a thing with 1.2.8, and if not, will
1.3 support these kinds of arguments?

Thank you.
-- 
  V. Ram
  v_r_...@fastmail.fm

-- 
http://www.fastmail.fm - Or how I learned to stop worrying and
  love email again



[OMPI users] Name Mangling

2008-12-04 Thread Elvedin Trnjanin
I'm using OpenMPI 1.2.5 and the PGI 7.1.5 compiler suite to get CLM
3.5 working correctly. When compiling for OpenMPI, I encounter the
following snippet of errors:


areaMod.o(.text+0x98a0): In function `areamod_map_checkmap_':
: undefined reference to `mpi_reduce_'
areaMod.o(.text+0x9b6c): In function `areamod_map_checkmap_':
: undefined reference to `mpi_reduce_'
areaMod.o(.text+0x9c39): In function `areamod_map_checkmap_':
: undefined reference to `mpi_reduce_'
areaMod.o(.text+0x9ea2): more undefined references to `mpi_reduce_'


When compiling for MPICH2, it works just fine. I assume fixing this is
going to lead to recompiling OpenMPI, so I am wondering which PGI
name-mangling options to pass to either the OpenMPI build or the CLM
build to get the names to match?

Thanks,
Elvedin



Re: [OMPI users] Checkpointing fails with BLCR 0.8.0b2

2008-12-04 Thread Josh Hursey

Matthias,

Thank you for the heads up. I'll work on a fix that uses the  
cr_request_checkpoint() interface instead of cr_request_file() when  
appropriate.
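
For reference, a rough sketch of what the replacement might look like
is below; the struct field names are from memory of BLCR's libcr.h, so
treat this as an untested outline rather than the actual patch:

#include <errno.h>
#include <fcntl.h>
#include <libcr.h>

/* Sketch: checkpoint the current process to 'path' using the newer
 * cr_request_checkpoint() interface instead of cr_request_file().
 * Error handling is minimal; check the fields against libcr.h. */
static int request_checkpoint_to(const char *path)
{
    cr_checkpoint_args_t   args;
    cr_checkpoint_handle_t handle;

    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;

    cr_initialize_checkpoint_args_t(&args);
    args.cr_fd    = fd;             /* context file to write         */
    args.cr_scope = CR_SCOPE_PROC;  /* checkpoint this whole process */

    if (cr_request_checkpoint(&args, &handle) < 0)
        return -1;

    /* block until the checkpoint request completes */
    while (cr_poll_checkpoint(&handle, NULL) < 0 && errno == EINTR)
        ;
    return 0;
}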


I filed a ticket about it, in case you are interested in tracking the
progress of this bug:

  https://svn.open-mpi.org/trac/ompi/ticket/1691

Cheers,
Josh

On Dec 4, 2008, at 4:47 AM, Matthias Hovestadt wrote:


Hi!

Berkeley recently released a new version of their BLCR. They had
already marked the function cr_request_file as deprecated in BLCR
0.7.3. Now they have removed the deprecated functions from the libcr
API.

Since the checkpointing support of OMPI uses cr_request_file, all
checkpointing operations fail with BLCR 0.8.0b2, making a downgrade
to BLCR 0.7.3 necessary.


Best,
Matthias




[OMPI users] Checkpointing fails with BLCR 0.8.0b2

2008-12-04 Thread Matthias Hovestadt

Hi!

Berkeley recently released a new version of their BLCR. They had
already marked the function cr_request_file as deprecated in BLCR
0.7.3. Now they have removed the deprecated functions from the libcr
API.

Since the checkpointing support of OMPI uses cr_request_file, all
checkpointing operations fail with BLCR 0.8.0b2, making a downgrade
to BLCR 0.7.3 necessary.


Best,
Matthias


[OMPI users] MCA parameter

2008-12-04 Thread Yasmine Yacoub
Good morning,

After installing pwscf and running an example, I got a warning message
related to an MCA parameter. To fix it, I have tried all the steps
indicated in the FAQ link, but it doesn't work. Please, which command
do I have to use, and from which directory should I run it? Perhaps I
didn't use the right one.

Yours sincerely,