Re: [OMPI users] How to run different versions of application in the same run?

2010-10-13 Thread Terry Frankcombe
See here:
http://www.open-mpi.org/faq/?category=running#mpmd-run
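
That FAQ entry describes MPMD launches, where each app context gets its own
executable (and, if you like, its own hosts).  A hypothetical invocation for
your case might look something like

  mpirun -np 1 -host node01 /path/to/faulty_app : -np 15 /path/to/clean_app

with the paths, host name and process counts obviously made up.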



On Tue, 2010-10-12 at 22:21 -0400, Bowen Zhou wrote:
> Greetings,
> 
> I'm doing software fault injection in a parallel application to evaluate 
> the effect of hardware failures to the execution. My question is how to 
> execute the faulty version of the application on one node and the 
> fault-free version on all other nodes in the same run?
> 
> I understand that mpirun or mpiexec would require a globally accessible 
> path to the same executable mounted with NFS or some other file system. 
> So is there any way to specify different pathnames in different nodes?
> 
> Many thanks,
> 
> Bowen Zhou



Re: [OMPI users] a question about [MPI]IO on systems without network filesystem

2010-09-29 Thread Terry Frankcombe
Hi Paul

I think you should clarify whether you mean you want your application to
send all its data back to a particular rank, which then does all the IO (in
which case the answer is that any MPI implementation can do this... it's a
matter of how you code the app), or whether you want the application to know
nothing about it and have the system somehow intercept all IO and make
it magically appear at a particular node (much harder).
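
A minimal sketch of the first option (names and sizes made up): every rank
sends its chunk to rank 0, which is the only process that touches the disk.

  program funnel_io
     use mpi
     implicit none
     integer, parameter :: n = 1000
     integer :: ierr, rank, nprocs, i
     integer :: status(MPI_STATUS_SIZE)
     double precision :: mydata(n), tmp(n)

     call MPI_Init(ierr)
     call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
     call MPI_Comm_size(MPI_COMM_WORLD, nprocs, ierr)
     mydata = dble(rank)                ! stand-in for each rank's real data

     if (rank == 0) then
        open(10, file='all_data.dat', form='unformatted')
        write(10) mydata                ! rank 0 writes its own chunk...
        do i = 1, nprocs - 1            ! ...then everyone else's, in rank order
           call MPI_Recv(tmp, n, MPI_DOUBLE_PRECISION, i, 0, &
                         MPI_COMM_WORLD, status, ierr)
           write(10) tmp
        end do
        close(10)
     else
        call MPI_Send(mydata, n, MPI_DOUBLE_PRECISION, 0, 0, &
                      MPI_COMM_WORLD, ierr)
     end if
     call MPI_Finalize(ierr)
  end program funnel_io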


On Wed, 2010-09-29 at 11:34 +0200, Paul Kapinos wrote:
> Dear OpenMPI developer,
> 
> We have a question about the possibility to use MPI IO (and possible 
> regular I/O) on clusters which does *not* have a common filesystem 
> (network filesystem) on all nodes.
> 
> A common filesystem is mainly NOT a hard precondition to use OpenMPI:
> http://www.open-mpi.org/faq/?category=running#do-i-need-a-common-filesystem
> 
> 
> Say, we have a (diskless? equipped with very small disks?) cluster, on 
> which only one node have access to a filesystem.
> 
> Is it possible to configure/run OpenMPI in a such way, that only _one_ 
> process (e.g. master) performs real disk I/O, and other processes sends 
> the data to the master which works as an agent?
> 
> Of course this would impacts the performance, because all data must be 
> send over network, and the master may became a bottleneck. But is such 
> scenario - IO of all processes bundled to one  process - practicable at all?
> 
> 
> Best wishes
> Paul
> 
> 
> 



Re: [OMPI users] MPI_Reduce performance

2010-09-09 Thread Terry Frankcombe
On Thu, 2010-09-09 at 01:24 -0600, Ralph Castain wrote:
> As people have said, these time values are to be expected. All they
> reflect is the time difference spent in reduce waiting for the slowest
> process to catch up to everyone else. The barrier removes that factor
> by forcing all processes to start from the same place.
> 
> 
> No mystery here - just a reflection of the fact that your processes
> arrive at the MPI_Reduce calls at different times.


Yes, however, it seems Gabriele is saying the total execution time
*drops* by ~500 s when the barrier is put *in*.  (Is that the right way
around, Gabriele?)

That's harder to explain as a sync issue.
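
One way to untangle the two effects (just a sketch, not from the original
thread) is to time the barrier and the reduction separately:

  program time_reduce
     use mpi
     implicit none
     integer, parameter :: n = 1000000
     integer :: ierr, rank
     double precision :: x(n), xsum(n), t0, t1, t2

     call MPI_Init(ierr)
     call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
     x = dble(rank)

     t0 = MPI_Wtime()
     call MPI_Barrier(MPI_COMM_WORLD, ierr)      ! absorbs any load imbalance
     t1 = MPI_Wtime()
     call MPI_Reduce(x, xsum, n, MPI_DOUBLE_PRECISION, MPI_SUM, 0, &
                     MPI_COMM_WORLD, ierr)
     t2 = MPI_Wtime()

     ! t1-t0 is roughly the time spent waiting for the slowest rank;
     ! t2-t1 is the reduction itself
     if (rank == 0) print *, 'wait:', t1 - t0, ' reduce:', t2 - t1
     call MPI_Finalize(ierr)
  end program time_reduce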



> On Sep 9, 2010, at 1:14 AM, Gabriele Fatigati wrote:
> 
> > More in depth,
> > 
> > 
> > total execution time without Barrier is about 1 sec.
> > 
> > 
> > Total execution time with Barrier+Reduce is 9453, with 128 procs.
> > 
> > 2010/9/9 Terry Frankcombe <te...@chem.gu.se>
> > Gabriele,
> > 
> > Can you clarify... those timings are what is reported for
> > the reduction
> > call specifically, not the total execution time?
> > 
> > If so, then the difference is, to a first approximation, the
> > time you
> > spend sitting idly by doing absolutely nothing waiting at
> > the barrier.
> > 
> > Ciao
> > Terry
> > 
> > 
> > --
> > Dr. Terry Frankcombe
> > Research School of Chemistry, Australian National University
> > Ph: (+61) 0417 163 509    Skype: terry.frankcombe
> > 
> > 
> > 
> > 
> > 
> > 
> > 
> > -- 
> > Ing. Gabriele Fatigati
> > 
> > Parallel programmer
> > 
> > CINECA Systems & Tecnologies Department
> > 
> > Supercomputing Group
> > 
> > Via Magnanelli 6/3, Casalecchio di Reno (BO) Italy
> > 
> > www.cineca.it    Tel: +39 051 6171722
> > 
> > g.fatigati [AT] cineca.it   
> > 



Re: [OMPI users] MPI_Reduce performance

2010-09-08 Thread Terry Frankcombe
Gabriele,

Can you clarify... those timings are what is reported for the reduction
call specifically, not the total execution time?

If so, then the difference is, to a first approximation, the time you
spend sitting idly by doing absolutely nothing waiting at the barrier.

Ciao
Terry


-- 
Dr. Terry Frankcombe
Research School of Chemistry, Australian National University
Ph: (+61) 0417 163 509    Skype: terry.frankcombe



Re: [OMPI users] OpenMPI Run-Time "Freedom" Question

2010-08-12 Thread Terry Frankcombe
On Thu, 2010-08-12 at 20:04 -0500, Michael E. Thomadakis wrote:
> The basic motive in this hypothetical situation is to build the MPI 
> application ONCE and then swap run-time libs as newer compilers come out 

Not building your application with the compiler you want to use sounds
like a very bad idea to me.  There's more to a compiler than its
runtime libraries.




Re: [OMPI users] MPI_Bcast issue

2010-08-11 Thread Terry Frankcombe
On Tue, 2010-08-10 at 19:09 -0700, Randolph Pullen wrote:
> Jeff thanks for the clarification,
> What I am trying to do is run N concurrent copies of a 1 to N data
> movement program to affect an N to N solution.

I'm no MPI guru, nor do I completely understand what you are doing, but
isn't this an allgather (or possibly an alltoall)?
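
For what it's worth, a minimal allgather sketch (fixed-size blocks assumed;
everything else made up):

  program all_to_all_blocks
     use mpi
     implicit none
     integer, parameter :: blk = 4
     integer :: ierr, rank, nprocs
     double precision :: mine(blk)
     double precision, allocatable :: everyone(:)

     call MPI_Init(ierr)
     call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
     call MPI_Comm_size(MPI_COMM_WORLD, nprocs, ierr)
     allocate(everyone(blk*nprocs))
     mine = dble(rank)
     ! after this, every rank holds every other rank's block, in rank order
     call MPI_Allgather(mine, blk, MPI_DOUBLE_PRECISION, &
                        everyone, blk, MPI_DOUBLE_PRECISION, &
                        MPI_COMM_WORLD, ierr)
     call MPI_Finalize(ierr)
  end program all_to_all_blocks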





Re: [OMPI users] Memory allocation error when linking with MPI libraries

2010-08-08 Thread Terry Frankcombe
You're trying to do a 6GB allocate.  Can your underlying system handle
that?  If you compile without the wrapper, does it work?

I see your executable is using the OMPI memory stuff.  IIRC there are
switches to turn that off.


On Fri, 2010-08-06 at 15:05 +0200, Nicolas Deladerriere wrote:
> Hello,
> 
> I'm having a sigsegv error when using a simple program compiled and
> linked with openmpi.
> I have reproduced the problem using really simple fortran code. It
> actually does not even use MPI, but just links with the mpi shared
> libraries. (The problem does not appear when I do not link with the mpi
> libraries.)
>% cat allocate.F90
>program test
>implicit none
>integer, dimension(:), allocatable :: z
>integer(kind=8) :: l
> 
>write(*,*) "l ?"
>read(*,*) l
>
>ALLOCATE(z(l))
>z(1) = 111
>z(l) = 222
>DEALLOCATE(z)
> 
>end program test
> 
> I am using openmpi 1.4.2 and gfortran for my tests. Here is the
> compilation :
> 
>% ./openmpi-1.4.2/build/bin/mpif90 --showme -g -o testallocate
> allocate.F90
>gfortran -g -o testallocate allocate.F90
> -I/s0/scr1/TOMOT_19311_HAL_/openmpi-1.4.2/build/include -pthread
> -I/s0/scr1/TOMOT_19311_HAL_/openmpi-1.4.2/build/lib
> -L/s0/scr1/TOMOT_19311_HAL_/openmpi-1.4.2/build/lib -lmpi_f90
> -lmpi_f77 -lmpi -lopen-rte -lopen-pal -ldl -Wl,--export-dynamic -lnsl
> -lutil -lm -ldl -pthread
> 
> When I run that test with different lengths, I sometimes get a
> "Segmentation fault" error. Here are two examples using two specific
> values, but the error happens for many other values of length (I did not
> manage to find which values of length give that error)
> 
>%  ./testallocate
> l ?
>16
>Segmentation fault
>% ./testallocate
> l ?
>20
> 
> I used debugger with re-compiled version of openmpi using debug flag.
> I got the folowing error in function sYSMALLOc
> 
>Program received signal SIGSEGV, Segmentation fault.
>0x2b70b3b3 in sYSMALLOc (nb=640016, av=0x2b930200)
> at malloc.c:3239
>3239set_head(remainder, remainder_size | PREV_INUSE);
>Current language:  auto; currently c
>(gdb) bt
>#0  0x2b70b3b3 in sYSMALLOc (nb=640016,
> av=0x2b930200) at malloc.c:3239
>#1  0x2b70d0db in opal_memory_ptmalloc2_int_malloc
> (av=0x2b930200, bytes=64) at malloc.c:4322
>#2  0x2b70b773 in opal_memory_ptmalloc2_malloc
> (bytes=64) at malloc.c:3435
>#3  0x2b70a665 in opal_memory_ptmalloc2_malloc_hook
> (sz=64, caller=0x2bf8534d) at hooks.c:667
>#4  0x2bf8534d in _gfortran_internal_free ()
> from /usr/lib64/libgfortran.so.1
>#5  0x00400bcc in MAIN__ () at allocate.F90:11
>#6  0x00400c4e in main ()
>(gdb) display
>(gdb) list
>    3234        if ((unsigned long)(size) >= (unsigned long)(nb + MINSIZE)) {
>    3235          remainder_size = size - nb;
>    3236          remainder = chunk_at_offset(p, nb);
>    3237          av->top = remainder;
>    3238          set_head(p, nb | PREV_INUSE | (av != &main_arena ? NON_MAIN_ARENA : 0));
>    3239          set_head(remainder, remainder_size | PREV_INUSE);
>    3240          check_malloced_chunk(av, p, nb);
>    3241          return chunk2mem(p);
>    3242        }
>    3243
> 
> 
> I also did the same test in C and I got the same problem. 
> 
> Does someone has any idea that could help me understand what's going
> on ?
> 
> Regards
> Nicolas
> 



Re: [OMPI users] Accessing to the send buffer

2010-08-02 Thread Terry Frankcombe
On Mon, 2010-08-02 at 11:36 -0400, Alberto Canestrelli wrote:
> Thanks,
> ok, that is not my problem: I never read data from the posted receive
> before the corresponding WAIT. Now the last question is: what could
> happen if I am reading the data from the posted send? I do it plenty of
> times! Possible consequences? Can you guarantee me that this approach is
> safe?

Well, it seems from what you've posted that the standard says you should
not assume it's safe.  Don't you want to be standard-compliant?


> 
> On 02/08/2010 11.29, Alberto Canestrelli wrote:
> > In the posted irecv case if you are reading from the posted receive
> > buffer the problem is you may get one of three values:
> >
> > 1. pre irecv value
> > 2. value received from the irecv in progress
> > 3. possibly garbage if you are unlucky enough to access memory that is
> > at the same time being updated.
> >
> > --td
> > Alberto Canestrelli wrote:
> >>  Thanks,
> >>  it was late in the night yesterday and I highlighted STORES but I
> >>  meant to highlight LOADS! I know that
> >>  stores are not allowed when you are doing non-blocking send-recv. But
> >>  I was surprised by the LOADS case. I always do some loads of the data
> >>  between all my ISEND-IRECVs and my WAITs. Could you please confirm
> >>  that OMPI can handle the LOAD case? And if it cannot handle it, what
> >>  could be the consequence? What could happen in the worst case
> >>  when there is a data race when reading data?
> >>  thanks
> >>  alberto
> >>
> >>  On 02/08/2010 9.32, Alberto Canestrelli wrote:
> >> > I believe it is definitely a no-no to STORE (write) into a send buffer
> >> > while a send is posted. I know there have been debate in the forum to
> >> > relax LOADS (reads) from a send buffer. I think OMPI can handle the
> >> > latter case (LOADS). On the posted receive side you open yourself up
> >> > for some race conditions and overwrites if you do STORES or LOADS 
> >> from a
> >> > posted receive buffer.
> >> >
> >> > --td
> >> >
> >> > Alberto Canestrelli wrote:
> >> >> Hi,
> >> >> I have a problem with a fortran code that I have parallelized with
> >> >> MPI. I state in advance that I read the whole ebook "Mit Press - 
> >> Mpi -
> >> >> The Complete Reference, Volume 1" and I took different MPI 
> >> classes, so
> >> >> I have a discrete MPI knowledge. I was able to solve by myself all 
> >> the
> >> >> errors I bumped into but now I am not able to find the bug of my code
> >> >> that provides erroneous results. Without entering in the details 
> >> of my
> >> >> code, I think that the cause of the problem could be reletad to the
> >> >> following aspect highlighted in the above ebook (in the follow I copy
> >> >> and paste from the e-book):
> >> >>
> >> >> A nonblocking post-send call indicates that the system may start
> >> >> copying data
> >> >> out of the send buffer. The sender must not access any part of the
> >> >> send buffer
> >> >> (neither for loads nor for STORES) after a nonblocking send operation
> >> >> is posted until
> >> >> the complete send returns.
> >> >> A nonblocking post-receive indicates that the system may start 
> >> writing
> >> >> data into
> >> >> the receive buffer. The receiver must not access any part of the
> >> >> receive buffer after
> >> >> a nonblocking receive operation is posted, until the complete-receive
> >> >> returns.
> >> >> Rationale. We prohibit read accesses to a send buffer while it is
> >> >> being used, even
> >> >> though the send operation is not supposed to alter the content of 
> >> this
> >> >> buffer. This
> >> >> may seem more stringent than necessary, but the additional 
> >> restriction
> >> >> causes little
> >> >> loss of functionality and allows better performance on some systems-
> >> >> consider
> >> >> the case where data transfer is done by a DMA engine that is not
> >> >> cache-coherent
> >> >> with the main processor.End of rationale.
> >> >>
> >> >> I use plenty of nonblocking post-send in my code. Is it really true
> >> >> that the sender must not access any part of the send buffer not even
> >> >> for STORES? Or was it a MPI 1.0 issue?
> >> >> Thanks.
> >> >> alberto
> >
> 
> -- 
> **
> Ing. Alberto Canestrelli
> Università degli Studi di Padova,
> Dipartimento di Ingegneria Idraulica, Marittima,
> Ambientale e Geotecnica,
> via Loredan 20, 35131 PADOVA (ITALY)
> phone: +39 0498275438
> fax:  +39 0498275446
> mail:  canestre...@idra.unipd.it
> 
> ***
> 



Re: [OMPI users] MPI_Allreduce on local machine

2010-07-28 Thread Terry Frankcombe
On Tue, 2010-07-27 at 16:19 -0400, Gus Correa wrote:
> Hi Hugo, David, Jeff, Terry, Anton, list
> 
> I suppose maybe we're guessing that somehow on Hugo's iMac 
> MPI_DOUBLE_PRECISION may not have as many bytes as dp = kind(1.d0),
> hence the segmentation fault on MPI_Allreduce.
> 
> Question:
> 
> Is there a simple way to check the number of bytes associated to each
> MPI basic type of OpenMPI on a specific machine (or machine+compiler)?
> 
> Something that would come out easily, say, from ompi_info?

bit_size() will give you the integer size.  For reals, digits() will
give you a hint, but the Fortran real data model is designed to not tie
you to a particular representation (my interpretation), so there's no
language feature to give a simple word size.
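
MPI itself can also report the byte count directly via MPI_Type_size, which
is probably the simplest check (a sketch):

  program check_sizes
     use mpi
     implicit none
     integer, parameter :: dp = kind(1.0d0)
     integer :: ierr, nbytes
     real(dp) :: x

     call MPI_Init(ierr)
     call MPI_Type_size(MPI_DOUBLE_PRECISION, nbytes, ierr)
     print *, 'MPI_DOUBLE_PRECISION occupies', nbytes, 'bytes'
     print *, 'digits(x) =', digits(x), ' (53 for an IEEE double)'
     call MPI_Finalize(ierr)
  end program check_sizes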




Re: [OMPI users] MPI_Allreduce on local machine

2010-07-27 Thread Terry Frankcombe
On Tue, 2010-07-27 at 08:11 -0400, Jeff Squyres wrote:
> On Jul 26, 2010, at 11:06 PM, Hugo Gagnon wrote:
> 
> >   8 integer, parameter :: dp = kind(1.d0)
> >   9 real(kind=dp) :: inside(5), outside(5)
> 
> I'm not a fortran expert -- is kind(1.d0) really double precision?  According 
> to http://gcc.gnu.org/onlinedocs/gcc-3.4.6/g77/Kind-Notation.html, kind(2) is 
> double precision (but that's for a different compiler, and I don't quite grok 
> the ".d0" notation).
> 

Urgh!  Thank heavens gcc have moved away from that stupid idea.

kind=8 is normally double precision (and is with gfortran).  kind(1.0d0)
is always double precision.

The d (as opposed to e) means DP.
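
A two-line illustration (the values in the comment are what gfortran
reports; other compilers may use different kind numbers):

  program kinds
     implicit none
     integer, parameter :: dp = kind(1.0d0)   ! portable: always double precision
     print *, kind(1.0e0), kind(1.0d0), dp    ! gfortran prints 4, 8, 8
  end program kinds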



Re: [OMPI users] A problem with 'mpiexec -launch-agent'

2010-06-14 Thread Terry Frankcombe
On Tue, 2010-06-15 at 00:13 +0200, Reuti wrote:
> Hi,
> 
> Am 13.06.2010 um 09:02 schrieb Zhang Linbo:
> 
> > Hi,
> >
> > I'm new to OpenMPI and have encountered a problem with mpiexec.
> >
> > Since I need to set up the execution environment for OpenMPI programs
> > on the execution nodes, I use the following command line to launch an
> > OMPI program:
> >
> >   mpiexec -launch-agent /some_path/myscript 
> >
> > The problem is: the above command works fine if I invoke 'mpiexec'
> > without an absolute path just like above (assuming the PATH variable
> > is properly set), but if I prepend an absolute path to 'mpiexec',  
> > e.g.:
> >
> >   /OMPI_dir/bin/mpiexec -launch-agent /some_path/myscript 
> 
> using an absolute path is equivalent to use the --prefix option to  
> `mpiexec`. Both ways lead obviously to the erroneous behavior you  
> encounter.

Hi folks

Speaking as no more than an uneducated user, having the behaviour change
depending on invoking by an absolute path or invoking by some
unspecified (potentially shell-dependent) path magic seems like a bad
idea.

As a long-time *nix user, this just rubs me the wrong way.

Ciao
Terry


-- 
Dr. Terry Frankcombe
Research School of Chemistry, Australian National University
Ph: (+61) 0417 163 509    Skype: terry.frankcombe



Re: [OMPI users] How do I run OpenMPI safely on a Nehalem standalone machine?

2010-05-10 Thread Terry Frankcombe

> I don't know what Jeff meant by that, but we haven't seen a feasible way 
> of disabling HT without rebooting and using the BIOS options.

According to this page:
http://dag.wieers.com/blog/is-hyper-threading-enabled-on-a-linux-system
in RHEL5/CentOS-5 it's easy to switch it on and off on the fly.  No
wizardry required.

(Disclaimer:  I neither run RHEL nor bugger about with HT.)




Re: [OMPI users] Fortran derived types

2010-05-06 Thread Terry Frankcombe
Hi Derek

On Wed, 2010-05-05 at 13:05 -0400, Cole, Derek E wrote:
> In general, even in your serial fortran code, you're already taking a
> performance hit using a derived type.

Do you have any numbers to back that up?

Ciao
Terry




Re: [OMPI users] Hide Abort output

2010-04-06 Thread Terry Frankcombe
> Jeff -
>
> I started a discussion of MPI_Quit on the MPI Forum reflector.  I raised
> the question because I do not think using MPI_Abort is appropriate.
>
> The situation is when  a single task decides the parallel program has
> arrived at the desired answer and therefore whatever the other tasks are
> currently doing has become irrelevant.  The other tasks do not know that
> the answer has been found by one of them so they cannot just call
> MPI_Finalize.
>
> Do we need a clean and portable way for the task that detects that the
> answer has been found and written out to do a single handed termination of
> the parallel job?

I'm not Jeff.  But isn't that MPI_Abort with an appropriate errorcode
argument, provided we can get it to shut up?
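
i.e. something along these lines (a sketch; the convergence test is a
stand-in):

  program stop_when_done
     use mpi
     implicit none
     integer :: ierr, rank
     logical :: answer_found

     call MPI_Init(ierr)
     call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
     answer_found = (rank == 2)        ! stand-in for the real convergence test
     if (answer_found) then
        ! this one rank has written the result; pull the whole job down
        call MPI_Abort(MPI_COMM_WORLD, 0, ierr)
     end if
     call MPI_Finalize(ierr)
  end program stop_when_done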




Re: [OMPI users] Help om Openmpi

2010-04-06 Thread Terry Frankcombe


>1. Run the following command on the client
>   * -> ssh-keygen -t dsa
>2. File id_dsa and id_dsa.pub will be created inside $HOME/.ssh
>3. Copy id_dsa.pub to the server's .ssh directory
>   * -> scp $HOME/.ssh/id_dsa.pub user@server:/home/user/.ssh
>4. Change to /root/.ssh and create file authorized_keys containing
> id_dsa content
>   * -> cd /home/user/.ssh
>   * -> cat id_dsa >> authorized_keys
>5. You can try ssh to the server from the client and no password
> will be needed
>   * -> ssh user@server

That prescription is a little messed up.  You need to create id_dsa and
id_dsa.pub on the client, as above.

But it is the client's id_dsa.pub that needs to go
into /home/user/.ssh/authorized_keys on the server, which seems to be
not what the above recipe does.
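
That is, on the client (assuming the key pair generated in step 1 and the
default locations):

  scp $HOME/.ssh/id_dsa.pub user@server:
  ssh user@server 'mkdir -p ~/.ssh && cat ~/id_dsa.pub >> ~/.ssh/authorized_keys'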

If that doesn't help, try adding -v or even -v -v to the ssh command to
see what the connection is trying to do w.r.t. your keys.






Re: [OMPI users] Help om Openmpi

2010-04-06 Thread Terry Frankcombe
On Tue, 2010-04-06 at 02:33 -0500, Trent Creekmore wrote:
> SSH means SECURE Shell. That being said, it would not be very secure
> without a password, now would it?

I think you need to read about public key authentication.  It is secure.




Re: [OMPI users] Problem with private variables in modules

2010-03-10 Thread Terry Frankcombe
Hi Justin

I think you are confusing OpenMP and OpenMPI.

You sound like you're using OpenMP.  This mailing list is for OpenMPI, a
specific implementation of MPI.  OpenMP and MPI, while having some
overlapping aims, are completely separate.

I suggest you post your query to an OpenMP mailing list.

Ciao
Terry


On Wed, 2010-03-10 at 13:56 -0500, Justin Watson wrote:
> Hello everyone,
> 
>  
> 
> I have  come across a situation where I am trying to
> make private variables that passed to subroutines using modules.  Here
> is the situation, The main program calls two different routines.
> These routines are functionally different but utilize the same
> variable names for some global data which are contained in a module
> (this was done to make the passing of the data easy to various levels
> of subroutines it is not needed outside the subroutine chain).  I am
> using workshare constructs to run each of these routines on its own
> thread.  I would  like to make the data in the module private to that
> thread.  When I set the variable to private it still behaves as if it
> were shared.  If I pass the variable to the routines via an argument
> list everything is fine (this will cause me to re-write a bunch of
> code).  The question is … shouldn’t this work within the context of a
> module as well?  
> 
>  
> 
> I have been getting different result using different
> compilers.  I have tried Lahey and Intel and they both show signs of
> not handling this properly.  I have attach a small test problem that
> mimics what I am doing in the large code.
> 
>  
> 
> Justin K. Watson
> Email: jkw...@arl.psu.edu
> 
> Research Assistant
> Phone: (814) 863-6754
> 
> Computational Methods Development Department
> Fax: (814) 865-3287
> 
>  
> 
>  
> 
> Applied Research Laboratory
> 
> The Pennsylvania State University
> 
> P.O. Box 30
> 
> State College, Pa 16804-0030
> 
>  
> 
> 



Re: [OMPI users] newbe question

2010-03-09 Thread Terry Frankcombe
It sounds like, with the fault tolerance features specifically mentioned
by Vasiliy, MPI in its current form may not be the simplest choice.


On Tue, 2010-03-09 at 18:55 -0700, Ralph Castain wrote:
> Running an orted directly won't work - it is intended solely to be launched 
> when running a job with "mpirun".
> 
> Your application doesn't immediately sound like it -needs- MPI, though you 
> could always use it anyway. The MPI messaging system is fast, but it isn't 
> clear if your application will necessarily benefit from that speed. It 
> depends upon how much communication is going on vs computation and idle time.
> 
> If you are more familiar with the non-MPI methods, I would personally do it 
> that way unless I found a need for MPI - for example, a place where MPI 
> collectives such as MPI_Allgather would be helpful.
> 
> 
> On Mar 9, 2010, at 12:10 PM, Vasiliy G Tolstov wrote:
> 
> > Hello.
> > Some time ago I started to study MPI (openmpi). 
> > I need to write an application (client/server) that runs on 50 servers in
> > parallel. Each application can communicate with the others by tcp/ip (send
> > commands, do some parallel computations).
> > 
> > Master - controls all clients - slaves (sends control commands, restarts
> > clients if needed). If the master machine with the server application dies, some
> > other server needs to receive the master role and control the other slaves.
> > 
> > Can I do these things with openmpi? Or do I need to write a standard tcp/ip
> > client/server application?
> > 
> > I'm try to read some search results in google like this -
> > http://docs.sun.com/source/819-7480-11/ExecutingPrograms.htmlaopenmpi%
> > 20orted%20persistent%20daemon
> > but orted return error:
> > 
> > orted --daemonize
> > [mobile:24107] [[INVALID],INVALID] ORTE_ERROR_LOG: Not found in file
> > runtime/orte_init.c at line 125
> > --
> > It looks like orte_init failed for some reason; your parallel process is
> > likely to abort.  There are many reasons that a parallel process can
> > fail during orte_init; some of which are due to configuration or
> > environment problems.  This failure appears to be an internal failure;
> > here's some additional information (which may only be relevant to an
> > Open MPI developer):
> > 
> >  orte_ess_base_select failed
> >  --> Returned value Not found (-13) instead of ORTE_SUCCESS
> > 
> > 
> > Thank You. Sorry for my poor english.
> > 
> > 
> > -- 
> > Vasiliy G Tolstov 
> > Selfip.Ru
> > 



Re: [OMPI users] running external program on same processor (Fortran)

2010-03-03 Thread Terry Frankcombe
On Wed, 2010-03-03 at 12:57 -0500, Prentice Bisbal wrote:
> Reuti wrote:
> > Are you speaking of the same?
> 
> Good point, Reuti. I was thinking of a cluster scheduler like SGE or
> Torque.


Yeah, I meant the scheduler in the CPU time slice sense.

http://en.wikipedia.org/wiki/Scheduling_(computing)
vs.
http://en.wikipedia.org/wiki/Job_scheduler




> > Am 03.03.2010 um 17:32 schrieb Prentice Bisbal:
> > 
> >> Terry Frankcombe wrote:
> >>> Surely this is the problem of the scheduler that your system uses,
> > 
> > This I would also state.
> > 
> > 
> >>> rather than MPI?
> > 
> > Scheduler in the Linux kernel?
> > 
> > 
> >> That's not true. The scheduler only assigns the initial processes to
> >> nodes
> > 
> > Scheduler in MPI?
> > 
> > 
> >> and starts them. It can kill the processes it starts if they use
> >> too much memory or run too long, but doesn't prevent them from spawning
> >> more processes, and once spawned,
> > 
> > When the processes are bound to one and the same core, these addititonal
> > processes won't intefere with other jobs' processes on the same node
> > which run on the other cores.
> > 
> > -- Reuti
> > 
> > 
> >> unless they are spawned through the
> >> scheduler, it has no control over them.
> >>>
> >>>
> >>> On Wed, 2010-03-03 at 00:48 +, abc def wrote:
> >>>> Hello,
> >>>>
> >>>> I wonder if someone can help.
> >>>>
> >>>> The situation is that I have an MPI-parallel fortran program. I run it
> >>>> and it's distributed on N cores, and each of these processes must call
> >>>> an external program.
> >>>>
> >>>> This external program is also an MPI program, however I want to run it
> >>>> in serial, on the core that is calling it, as if it were part of the
> >>>> fortran program. The fortran program waits until the external program
> >>>> has completed, and then continues.
> >>>>
> >>>> The problem is that this external program seems to run on any core,
> >>>> and not necessarily the (now idle) core that called it. This slows
> >>>> things down a lot as you get one core doing multiple tasks.
> >>>>
> >>>> Can anyone tell me how I can call the program and ensure it runs only
> >>>> on the core that's calling it? Note that there are several cores per
> >>>> node. I can ID the node by running the hostname command (I don't know
> >>>> a way to do this for individual cores).
> >>>>
> >>>> Thanks!
> >>>>
> >>>> 
> >>>>
> >>>> Extra information that might be helpful:
> >>>>
> >>>> If I simply run the external program from the command line (ie, type
> >>>> "/path/myprogram.ex "), it runs fine. If I run it within the
> >>>> fortran program by calling it via
> >>>>
> >>>> CALL SYSTEM("/path/myprogram.ex")
> >>>>
> >>>> it doesn't run at all (doesn't even start) and everything crashes. I
> >>>> don't know why this is.
> >>>>
> >>>> If I call it using mpiexec:
> >>>>
> >>>> CALL SYSTEM("mpiexec -n 1 /path/myprogram.ex")
> >>>>
> >>>> then it does work, but I get the problem that it can go on any core.
> >>>>
> >>>
> >>
> >> -- 
> >> Prentice Bisbal
> >> Linux Software Support Specialist/System Administrator
> >> School of Natural Sciences
> >> Institute for Advanced Study
> >> Princeton, NJ
> > 
> 



Re: [OMPI users] MPI_Init() and MPI_Init_thread()

2010-03-03 Thread Terry Frankcombe
I can't speak for the developers.  However, I think it's to do with the
library internals.



From here: http://www.mpi-forum.org/docs/mpi-20-html/node165.htm

"Advice to implementors. 

"If provided is not MPI_THREAD_SINGLE then the MPI library should not
invoke C/ C++/Fortran library calls that are not thread safe, e.g., in
an environment where malloc is not thread safe, then malloc should not
be used by the MPI library. 

"Some implementors may want to use different MPI libraries for different
levels of thread support. They can do so using dynamic linking and
selecting which library will be linked when MPI_INIT_THREAD is invoked.
If this is not possible, then optimizations for lower levels of thread
support will occur only when the level of thread support required is
specified at link time. ( End of advice to implementors.)"
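
In code the request-and-check is small (a sketch, not from the original
post):

  program init_funneled
     use mpi
     implicit none
     integer :: ierr, provided

     ! ask for FUNNELED (only the main thread makes MPI calls) and see what we got
     call MPI_Init_thread(MPI_THREAD_FUNNELED, provided, ierr)
     if (provided < MPI_THREAD_FUNNELED) then
        print *, 'MPI library only provides thread level', provided
     end if
     ! ... OpenMP parallel regions can follow; only the master thread calls MPI ...
     call MPI_Finalize(ierr)
  end program init_funneled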



On Wed, 2010-03-03 at 16:33 +0900, Yuanyuan ZHANG wrote:
> Hi all,
> 
> I am a beginner of MPI and a little confused with
> MPI_Init_thread() function:
> 
> If we use MPI_Init() or MPI_Init_thread(MPI_THREAD_SINGLE, provided)
> to initialize MPI environment, when we use OpenMP
> PARALLEL directive each process is forked to multiple
> threads and when an MPI function is called, one thread
> is used to execute the call. It seems that this
> has same effect when we use MPI_Init_Thread(MPI_THREAD_FUNNELED,
> provided). So what's the difference between MPI_Init() and
> MPI_Init_thread(MPI_THREAD_FUNNELED, provided)?
> 
> Thanks in advance,
> 
> Yuanyuan
> 
> 



Re: [OMPI users] running external program on same processor (Fortran)

2010-03-02 Thread Terry Frankcombe
Surely this is the problem of the scheduler that your system uses,
rather than MPI?


On Wed, 2010-03-03 at 00:48 +, abc def wrote:
> Hello,
> 
> I wonder if someone can help.
> 
> The situation is that I have an MPI-parallel fortran program. I run it
> and it's distributed on N cores, and each of these processes must call
> an external program.
> 
> This external program is also an MPI program, however I want to run it
> in serial, on the core that is calling it, as if it were part of the
> fortran program. The fortran program waits until the external program
> has completed, and then continues.
> 
> The problem is that this external program seems to run on any core,
> and not necessarily the (now idle) core that called it. This slows
> things down a lot as you get one core doing multiple tasks.
> 
> Can anyone tell me how I can call the program and ensure it runs only
> on the core that's calling it? Note that there are several cores per
> node. I can ID the node by running the hostname command (I don't know
> a way to do this for individual cores).
> 
> Thanks!
> 
> 
> 
> Extra information that might be helpful:
> 
> If I simply run the external program from the command line (ie, type
> "/path/myprogram.ex "), it runs fine. If I run it within the
> fortran program by calling it via
> 
> CALL SYSTEM("/path/myprogram.ex")
> 
> it doesn't run at all (doesn't even start) and everything crashes. I
> don't know why this is.
> 
> If I call it using mpiexec:
> 
> CALL SYSTEM("mpiexec -n 1 /path/myprogram.ex")
> 
> then it does work, but I get the problem that it can go on any core. 
> 



Re: [OMPI users] problems on parallel writing

2010-02-24 Thread Terry Frankcombe
On Wed, 2010-02-24 at 13:40 -0500, w k wrote:
> Hi Jordy,
> 
> I don't think this part caused the problem. For fortran, it doesn't
> matter if the pointer is NULL as long as the count requested from the
> processor is 0. Actually I tested the code and it passed this part
> without problem. I believe it aborted at MPI_FILE_SET_VIEW part.
> 

For the record:  A pointer is not NULL unless you've nullified it.
IIRC, the Fortran standard says that any non-assigning reference to an
unassigned, unnullified pointer is undefined (or maybe illegal... check
the standard).
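
A small illustration (sketch):

  program pointer_status
     implicit none
     real, pointer :: undef(:)         ! undefined: even ASSOCIATED(undef) is not legal yet
     real, pointer :: p(:) => null()   ! explicitly nullified, so safe to test
     nullify(undef)                    ! now undef is well-defined (and disassociated)
     print *, associated(undef), associated(p)   ! F F
  end program pointer_status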




Re: [OMPI users] [btl_tcp_frag.c:216:mca_btl_tcp_frag_recv] mca_btl_tcp_frag_recv: readv failed: Connection reset by peer (104)

2010-02-23 Thread Terry Frankcombe
Vasp can be temperamental.  For example, I have a largish system (384
atoms) for which Vasp hangs if I request more than 120 MD steps at a
time.  I am not convinced that this is OMPI's problem.  However, your
case looks much more diagnosable than my silent spinning hang.

On Tue, 2010-02-23 at 16:00 -0500, Thomas Sadowski wrote:
> Hello all,
> 
> 
> I am currently attempting to use OpenMPI as my MPI for my VASP
> calculations. VASP is an ab initio DFT code. Anyhow, I was able to
> compile and build OpenMPI v. 1.4.1 (i thought) correctly using the
> following command:
> 
> ./configure --prefix=/home/tes98002 F77=ifort FC=ifort
> --with-tm=/usr/local
> 
> 
> Note that I am compiling OpenMPI for use with Torque/PBS which was
> compiled using Intel v 10 Fortran compilers and gcc for C\C++. After
> building OpenMPI, I successfully used it to compile VASP using Intel
> MKL v. 10.2. I am running OpenMPI in heterogeneous cluster
> configuration, and I used an NFS mount so that all the compute nodes
> could access the executable. Our hardware configuration is as follows:
> 
> 7 nodes: 2 single-core AMD Opteron processors, 2GB of RAM (henceforth
> called old nodes)
> 4 nodes: 2 duo-core AMD Opteron processors, 2GB of RAM (henceforth
> called new nodes)
> 
> We are currently running SUSE v. 8.x. Now we have problems when we
> attempt to run VASP on multiple nodes. A small system (~10 atoms) runs
> perfectly well with Torque and OpenMPI in all instances: running using
> single old node, a single new node, or across multiple old and new
> nodes. Larger systems (>24 atoms) are able to run to completion if
> they are kept within a single old or new node. However, if I try to
> run a job on multiple old or new nodes I receive a segfault. In
> particular the error is as follows:
> 
> 
> [node12][[7759,1],2][btl_tcp_frag.c:216:mca_btl_tcp_frag_recv]
> mca_btl_tcp_frag_recv: readv failed: Connection reset by peer
> (104)[node12][[7759,1],1][btl_tcp_frag.c:216:mca_btl_tcp_frag_recv]
> [node12][[7759,1],3][btl_tcp_frag.c:216:mca_btl_tcp_frag_recv]
> [node12][[7759,1],0][btl_tcp_frag.c:216:mca_btl_tcp_frag_recv]
> mca_btl_tcp_frag_recv: readv failed: Connection reset by peer (104)
> mca_btl_tcp_frag_recv: readv failed: Connection reset by peer (104)
> [node12][[7759,1],1][btl_tcp_frag.c:216:mca_btl_tcp_frag_recv]
> mca_btl_tcp_frag_recv: readv failed: Connection reset by peer (104)
> mca_btl_tcp_frag_recv: readv failed: Connection reset by peer (104)
> [node12][[7759,1],3][btl_tcp_frag.c:216:mca_btl_tcp_frag_recv]
> mca_btl_tcp_frag_recv: readv failed: Connection reset by peer (104)
> [node12][[7759,1],0][btl_tcp_frag.c:216:mca_btl_tcp_frag_recv]
> mca_btl_tcp_frag_recv: readv failed: Connection reset by peer (104)
> [node12][[7759,1],2][btl_tcp_frag.c:216:mca_btl_tcp_frag_recv]
> mca_btl_tcp_frag_recv: readv failed: Connection reset by peer (104)
> --
> mpirun noticed that process rank 6 with PID 11985 on node node11
> exited on signal 11 (Segmentation fault).
> --
> forrtl: error (78): process killed (SIGTERM)
> forrtl: error (78): process killed (SIGTERM)
> forrtl: error (78): process killed (SIGTERM)
> forrtl: error (78): process killed (SIGTERM)
> 
> 
> 
> It seems to me that this is a memory issue, however I may be mistaken.
> I have searched the archive and have as yet seen an adequate treatment
> of the problem. I have also tried other versions of OpenMPI. Does
> anyone have any insight into our issues
> 
> 
> -Tom



Re: [OMPI users] Bad Infiniband latency with subounce

2010-02-15 Thread Terry Frankcombe
On Mon, 2010-02-15 at 20:18 -0700, Ralph Castain wrote:
> Did you run it with -mca mpi_paffinity_alone 1? Given this is 1.4.1, you can 
> set the bindings to -bind-to-socket or -bind-to-core. Either will give you 
> improved performance.
> 
> IIRC, MVAPICH defaults to -bind-to-socket. OMPI defaults to no binding.


Is this sensible?  Won't most users want processes bound?  OMPI's
supposed to "do the right thing" out of the box, right?





Re: [OMPI users] Test OpenMPI on a cluster

2010-01-31 Thread Terry Frankcombe
It seems your OpenMPI installation is not PBS-aware.

Either reinstall OpenMPI configured for PBS (and then you don't even
need -np 10), or, as Constantinos says, find the PBS nodefile and pass
that to mpirun.
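
For the second option, something along these lines should do it (the
executable path is the one from your script; -machinefile is one way to hand
the PBS node list to mpirun):

  mpirun -np 10 -machinefile $PBS_NODEFILE /home/tim/courses/MPI/examples/ex1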


On Sat, 2010-01-30 at 18:45 -0800, Tim wrote:
> Hi,  
>   
> I am learning MPI on a cluster. Here is one simple example. I expect the 
> output would show response from different nodes, but they all respond from 
> the same node node062. I just wonder why and how I can actually get report 
> from different nodes to show MPI actually distributes processes to different 
> nodes? Thanks and regards!
>   
> ex1.c  
>   
> /* test of MPI */
> #include "mpi.h"
> #include <stdio.h>
> #include <string.h>
> 
> int main(int argc, char **argv)
> {
>   char idstr[2232]; char buff[22128];
>   char processor_name[MPI_MAX_PROCESSOR_NAME];
>   int numprocs; int myid; int i; int namelen;
>   MPI_Status stat;
> 
>   MPI_Init(&argc, &argv);
>   MPI_Comm_size(MPI_COMM_WORLD, &numprocs);
>   MPI_Comm_rank(MPI_COMM_WORLD, &myid);
>   MPI_Get_processor_name(processor_name, &namelen);
> 
>   if(myid == 0)
>   {
>     printf("WE have %d processors\n", numprocs);
>     for(i=1;i<numprocs;i++)
>     {
>       sprintf(buff, "Hello %d", i);
>       MPI_Send(buff, 128, MPI_CHAR, i, 0, MPI_COMM_WORLD);
>     }
>     for(i=1;i<numprocs;i++)
>     {
>       MPI_Recv(buff, 128, MPI_CHAR, i, 0, MPI_COMM_WORLD, &stat);
>       printf("%s\n", buff);
>     }
>   }
>   else
>   {
>     MPI_Recv(buff, 128, MPI_CHAR, 0, 0, MPI_COMM_WORLD, &stat);
>     sprintf(idstr, " Processor %d at node %s ", myid, processor_name);
>     strcat(buff, idstr);
>     strcat(buff, "reporting for duty\n");
>     MPI_Send(buff, 128, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
>   }
>   MPI_Finalize();
> 
> }
>   
> ex1.pbs  
>   
> #!/bin/sh  
> #  
> #This is an example script example.sh  
> #  
> #These commands set up the Grid Environment for your job:  
> #PBS -N ex1  
> #PBS -l nodes=10:ppn=1,walltime=1:10:00  
> #PBS -q dque  
>   
> # export OMP_NUM_THREADS=4  
>   
>  mpirun -np 10 /home/tim/courses/MPI/examples/ex1  
>   
> compile and run:
> 
> [tim@user1 examples]$ mpicc ./ex1.c -o ex1   
> [tim@user1 examples]$ qsub ex1.pbs  
> 35540.mgt  
> [tim@user1 examples]$ nano ex1.o35540  
>   
> Begin PBS Prologue Sat Jan 30 21:28:03 EST 2010 1264904883  
> Job ID: 35540.mgt  
> Username:   tim  
> Group:  Brown  
> Nodes:  node062 node063 node169 node170 node171 node172 node174 
> node175  
> node176 node177  
> End PBS Prologue Sat Jan 30 21:28:03 EST 2010 1264904883  
>   
> WE have 10 processors  
> Hello 1 Processor 1 at node node062 reporting for duty  
>   
> Hello 2 Processor 2 at node node062 reporting for duty  
>   
> Hello 3 Processor 3 at node node062 reporting for duty  
>   
> Hello 4 Processor 4 at node node062 reporting for duty  
>   
> Hello 5 Processor 5 at node node062 reporting for duty  
>   
> Hello 6 Processor 6 at node node062 reporting for duty  
>   
> Hello 7 Processor 7 at node node062 reporting for duty  
>   
> Hello 8 Processor 8 at node node062 reporting for duty  
>   
> Hello 9 Processor 9 at node node062 reporting for duty  
>   
>   
> Begin PBS Epilogue Sat Jan 30 21:28:11 EST 2010 1264904891  
> Job ID: 35540.mgt  
> Username:   tim  
> Group:  Brown  
> Job Name:   ex1  
> Session:15533  
> Limits: neednodes=10:ppn=1,nodes=10:ppn=1,walltime=01:10:00  
> Resources:  cput=00:00:00,mem=420kb,vmem=8216kb,walltime=00:00:03  
> Queue:  dque  
> Account:  
> Nodes:  node062 node063 node169 node170 node171 node172 node174 node175 
> node176  
> node177  
> Killing leftovers...  
>   
> End PBS Epilogue Sat Jan 30 21:28:11 EST 2010 1264904891  
> 
> 
> 
> 
>   



Re: [OMPI users] speed up this problem by MPI

2010-01-29 Thread Terry Frankcombe
In rank 0's main, broadcast feature to all processes.
In f, calculate a slice of array based on rank, then either send/recv
back to rank 0 or maybe gather.
Only rank 0 does everything else.  (The other ranks must call f after
recv'ing feature.)
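
A minimal sketch of that pattern (sizes and the computation itself are made
up, and the coefficient count is assumed to divide evenly among the ranks):

  program slice_loop
     use mpi
     implicit none
     integer, parameter :: nfeat = 1000, ncoeff = 96
     integer :: ierr, rank, nprocs, nloc, i, i0
     double precision :: feature(nfeat)
     double precision, allocatable :: local(:), array(:)

     call MPI_Init(ierr)
     call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
     call MPI_Comm_size(MPI_COMM_WORLD, nprocs, ierr)

     if (rank == 0) feature = 1.0d0              ! rank 0 computes/reads feature
     call MPI_Bcast(feature, nfeat, MPI_DOUBLE_PRECISION, 0, MPI_COMM_WORLD, ierr)

     nloc = ncoeff / nprocs                      ! this rank's slice of the loop
     i0   = rank * nloc
     allocate(local(nloc), array(ncoeff))        ! only rank 0's array gets filled
     do i = 1, nloc
        local(i) = sum(feature) * dble(i0 + i)   ! stand-in for the expensive call
     end do

     call MPI_Gather(local, nloc, MPI_DOUBLE_PRECISION, array, nloc, &
                     MPI_DOUBLE_PRECISION, 0, MPI_COMM_WORLD, ierr)
     ! rank 0 now has the full array and does everything else
     call MPI_Finalize(ierr)
  end program slice_loop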


On Thu, 2010-01-28 at 21:23 -0800, Tim wrote:
> Sorry, complicated_computation() and f() are simplified too much. They do 
> take more inputs. 
> 
> Among the inputs to complicated_computation(), some is passed from the main() 
> to f() by address since it is a big array, some is passed by value, some are 
> created inside f() before the call to complicated_computation(). 
> 
> so actually (although not exactly) the code is like:
> 
>  int main(int argc, char ** argv)   
>  {   
>   int size;
>   double *feature = new double[1000];
>  // compute values of elements of "feature"
>  // some operations  
>  f(size, feature);   
>  // some operations  
>  delete [] feature;   
>  return 0;   
>  }   
>
>  void f(int size, double *feature)   
>  {   
>  vector<double> coeff;  
>  // read from a file into elements of coeff
>  MyClass myobj;
>  double * array =  new double [coeff.size()];   
>  for (int i = 0; i < coeff.size(); i++) // need to speed up by MPI.
>  {
>  array[i] = myobj.complicated_computation(size, coeff[i], feature); // 
> time consuming   
>  }   
>  // some operations using all elements in array 
>  delete [] array;
>  }
> 
> --- On Thu, 1/28/10, Eugene Loh  wrote:
> 
> > From: Eugene Loh 
> > Subject: Re: [OMPI users] speed up this problem by MPI
> > To: "Open MPI Users" 
> > Date: Thursday, January 28, 2010, 11:40 PM
> > Tim wrote:
> > 
> > > Thanks Eugene!
> > > 
> > > My case, after simplified, is to speed up the
> > time-consuming computation in the loop below by assigning
> > iterations to several nodes in a cluster by MPI. Each
> > iteration of the loop computes each element of an array. The
> > computation of each element is independent of others in the
> > array.
> > >    int main(int argc, char ** argv)
> > >    {
> > >      // some operations
> > >      f(size);
> > >      // some operations
> > >      return 0;
> > >    }
> > > 
> > >    void f(int size)
> > >    {
> > >      // some operations
> > >      int i;
> > >      double * array = new double [size];
> > >      for (i = 0; i < size; i++) // need to speed up by MPI.
> > >      {
> > >        array[i] = complicated_computation(); // time consuming
> > 
> > What are the inputs to complicated_computation()?  Does each process
> > know what the inputs are?  Or, do they need to come from the master
> > process?  Are there many inputs?
> > 
> > >      }
> > >      // some operations using all elements in array
> > >      delete [] array;
> > >    }
> > >  



Re: [OMPI users] speed up this problem by MPI

2010-01-28 Thread Terry Frankcombe
On Thu, 2010-01-28 at 17:05 -0800, Tim wrote:
> Also I only need the loop that computes every element of the array to
> be parallelized. Someone said that the parallel part begins with
> MPI_Init and ends with MPI_Finalize, and one can do any serial
> computations before and/or after these calls. But I have written some
> MPI programs, and found that the parallel part is not restricted to
> between MPI_Init and MPI_Finalize, but is instead the whole program. If
> the rest of the code has to be wrapped for the process with ID 0, I
> have little idea how to apply that to my case, since the rest
> would be the parts before and after the loop in the function and
> the whole of main().

I think you're being polluted by your OpenMP experience!  ;-)

Unlike in OpenMP, there is no concept of "parallel region" when using
MPI.  MPI allows you to pass data between processes.  That's all.  It's
up to you to write your code in such a way that the data is used allow
parallel computation.

Often MPI_Init and MPI_Finalize are amongst the first and last things
done in a parallel code, respectively.  They effectively say "set up
stuff so I can pass messages effectively" and "clean that up".  Each
process runs from start to finish "independently".

As an aside, using MPI is much more invasive than OpenMP.  Parallelising
an existing serial code can be hard with MPI.  But if you start from
scratch you usually end up with a better code with MPI than with OpenMP
(e.g. MPI makes you think about data locality, whereas you can ignore
all the bad things bad locality does and still have a working code with
OpenMP.)




Re: [OMPI users] How to start MPI_Spawn child processes early?

2010-01-27 Thread Terry Frankcombe
My question is why?  If you are willing to reserve a chunk of your
machine for yet-to-exist tasks, why not just create them all at mpirun
time and slice and dice your communicators as appropriate?
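
A sketch of the slice-and-dice version (the roles are made up):

  program split_roles
     use mpi
     implicit none
     integer :: ierr, rank, color, subcomm, subrank

     call MPI_Init(ierr)
     call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
     color = merge(0, 1, rank == 0)     ! rank 0 plays "parent", the rest "children"
     call MPI_Comm_split(MPI_COMM_WORLD, color, rank, subcomm, ierr)
     call MPI_Comm_rank(subcomm, subrank, ierr)
     ! the children talk amongst themselves on subcomm and to the parent on
     ! MPI_COMM_WORLD, with no spawning cost once the job is up
     call MPI_Finalize(ierr)
  end program split_roles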


On Thu, 2010-01-28 at 09:24 +1100, Jaison Paul wrote:
> Hi, I am just reposting my early query once again. If anyone one can 
> give some hint, that would be great.
> 
> Thanks, Jaison
> ANU
> 
> Jaison Paul wrote:
> > Hi All,
> >
> > I am trying to use MPI for scientific High Performance (hpc) 
> > applications. I use MPI_Spawn to create child processes. Is there a 
> > way to start child processes early than the parent process, using 
> > MPI_Spawn?
> >
> > I want this because, my experiments showed that the time to spawn the 
> > children by parent is too long for HPC apps which slows down the whole 
> > process. If the children are ready when parent application process 
> > seeks for them, that initial delay can be avoided. Is there a way to 
> > do that?
> >
> > Thanks in advance,
> >
> > Jaison
> > Australian National University



Re: [OMPI users] MPI Processes and Auto Vectorization

2009-12-02 Thread Terry Frankcombe
On Tue, 2009-12-01 at 05:47 -0800, Tim Prince wrote:
> amjad ali wrote:
> > Hi,
> > thanks T.Prince,
> > 
> > Your saying:
> > "I'll just mention that we are well into the era of 3 levels of 
> > programming parallelization:  vectorization, threaded parallel (e.g. 
> > OpenMP), and process parallel (e.g. MPI)."  is a really great new 
> > learning for me. Now I can perceive better.
> > 
> > 
> > Can you please explain a bit about:
> > 
> > " This application gains significant benefit from cache blocking, so 
> > vectorization has more opportunity to gain than for applications which 
> > have less memory locality."
> > 
> > So now should I conclude from your reply that if we have single core 
> > processor in a PC, even than we can get benefit of Auto-Vectorization? 
> > And we do not need free cores for getting benefit of auto-vectorization?
> > 
> > Thank you very much.
> Yes, we were using auto-vectorization from before the beginnings of MPI 
> back in the days of single core CPUs; in fact, it would often show a 
> greater gain than it did on later multi-core CPUs.
> The reason for greater effectiveness of auto-vectorization with cache 
> blocking and possibly with single core CPUs would be less saturation of 
> memory bus.

Just for the record, there's a huge difference between "back in the days
of single core CPUs" and "before the beginnings of MPI".  They're
separated by a decade or two.

Vectorisation (automatic or otherwise) is useful on pipeline
architectures.  Pipeline architectures do go back a long way, at least
to the 80s.  They do predate MPI I think, but not parallel programming
and message passing in general.  Multi-core chips are
Johnny-come-latelys.





Re: [OMPI users] Open MPI Query

2009-11-24 Thread Terry Frankcombe
On Tue, 2009-11-24 at 15:55 +0530, Vivek Satpute wrote:
> Hi,
> 
> I am new to Open MPI (which is part of the OFED-1.4 package). I have a few
> basic queries about
> Open MPI:
> 
> 1) I am using openmpi-1.2.8 (it is part of OFED-1.4). It has two
> examples: i) hello_c ii) ring_c
> Do those examples work on multiple machines, or are they meant for a
> single node (i.e. localhost)?

MPI programs are designed to work across multiple machines.  That's
largely the point of MPI.  They also work well as multiple processes on
a multiple-CPU machine, or combinations of the two.

> 2) Do MPI_Send() and MPI_Recv() calls send a message from a process on
> one machine to a
> process on another machine? If yes, then how can I achieve this?

Take a look at what the example codes are doing.  Read man mpirun.  Wait
for someone here to point you to an MPI primer or tute.

> 3) Are the MPI APIs implemented on top of Infiniband? Do the MPI
> APIs use Infiniband
> hardware and its modules for sending and receiving data?

If you've working infiniband, and your MPI implementation (eg. OpenMPI)
is configured to use it, then yes.  As your MPI is part of OFED, this is
eminently likely.




Re: [OMPI users] Distribute app over open mpi

2009-11-07 Thread Terry Frankcombe
On Fri, 2009-11-06 at 08:10 -0800, Arnaud Westenberg wrote:
> Hi all,
> 
> Sorry for the newbie question, but I'm having a hard time finding the answer, 
> as I'm not even familiar with the terminology...
> 
> I've setup a small cluster on Ubuntu (hardy) and everything is working great, 
> including slurm etc. If I run the well known 'Pi' program I get the proper 
> results returned from all the nodes.
> 
> However, I'm looking for a way such that I wouldn't need to install the 
> application on each node, nor on the shared nfs. Currently I get the obvious 
> error that the app is not found on the nodes on which it isn't installed.
> 
> The idea is that the master node would thus distribute the required (parts of 
> the) program to the slave nodes so they can perform the assigned work.
> 
> Reason is that I want to run an FEA package on a much larger (redhat) cluster 
> we currently use for CDF calculations. I really don't want to mess up the 
> cluster as we bought it already configured and compiling new versions of the 
> FEA package on it turns out to be a missing library nightmare.

I don't understand the question.  Do you have a binary that works on
your new cluster or not?  I just don't see how recompiling the code fits
with the rest of the question.  If you have an OpenMPI-linked binary for
your FEA, simply copy it out to your nodes, then run it.  There are many
ways to do this: which is best depends on many factors.  Probably scp is
your friend if you don't have a common filesystem.




Re: [OMPI users] mpirun example program fail on multiple nodes- unable to launch specified application on client node

2009-11-05 Thread Terry Frankcombe
For small ad hoc COWs I'd vote for sshfs too.  It may well be as slow as
a dog, but it actually has some security, unlike NFS, and is a doddle to
make work with no superuser access on the server, unlike NFS.


On Thu, 2009-11-05 at 17:53 -0500, Jeff Squyres wrote:
> On Nov 5, 2009, at 5:34 PM, Douglas Guptill wrote:
> 
> > I am currently using sshfs to mount both OpenMPI and my application on
> > the "other" computers/nodes.  The advantage to this is that I have
> > only one copy of OpenMPI and my application.  There may be a
> > performance penalty, but I haven't seen it yet.
> >
> 
> 
> For a small number of nodes (where small <=32 or sometimes even <=64),  
> I find that simple NFS works just fine.  If your apps aren't IO  
> intensive, that can greatly simplify installation and deployment of  
> both Open MPI and your MPI applications IMNSHO.
> 
> But -- every app is different.  :-)  YMMV.
> 



Re: [OMPI users] How to create multi-thread parallel program using thread-safe send and recv?

2009-09-22 Thread Terry Frankcombe
If you want all threads to communicate via MPI, and you're initially
launching multiple parents, I don't really see the advantage of using
threads at all.  Why not launch 12 MPI processes?

On Tue, 2009-09-22 at 10:32 -0700, Eugene Loh wrote:
> guosong wrote: 
> > Thanks for responding. I used a linux cluster. I think I would like
> > to create a model that is multithreaded and each thread can make MPI
> > calls. I attached test code as follow. It has two pthreads and there
> > are MPI calls in both of those two threads. In the main function,
> > there are also MPI calls. Should I use a full multithreading?
> I guess so.  It seems like the created threads are expected to make
> independent/concurrent message-passing calls.  Do read the link I
> sent.  You need to convert from MPI_Init to MPI_Init_thread(), asking
> for a full-multithreaded model and checking that you got it.  Also
> note in main() that the MPI_Isend() calls should be matched with
> MPI_Wait() or similar calls.  I guess the parent thread will sit in
> such calls while the child threads do their own message passing.  Good
> luck.



Re: [OMPI users] SVD with mpi

2009-09-09 Thread Terry Frankcombe

Take a look at http://www.netlib.org/scalapack/

Ciao
Terry


On Tue, 2009-09-08 at 13:55 +0200, Attila Börcs wrote:
> Hi Everyone, 
> 
> I'd like to achieve singular value decomposition with mpi. I heard
> about the Lanczos algorithm and some different kinds of algorithms for svd,
> but I need some help on this theme. Does anybody know of some usable code
> or a tutorial about parallel svd?
> 
> Best Regards, 
> 
> Attila



Re: [OMPI users] Installation problems

2009-07-28 Thread Terry Frankcombe

>   I suspect that my not changing my .bash_profile is indeed the
> problem; unfortunately I do not know where yum placed OpenMPI's lib and bin
> directories. I tried looking in /usr/local/ but did not find an
> openmpi directory; I suspect this has something to do with my Fedora
> distribution as opposed to the Debian below. Could anybody tell me
> where I could expect to find the directory to place in my profile on
> a Fedora or Red Hat system when installed through yum? Much
> appreciated.


Do this:

whereis libmpi.so







Re: [OMPI users] Send variable size of matrices

2009-07-21 Thread Terry Frankcombe
Which language bindings?

For Fortran, consider pack or reshape.  (I *think* whether array
sections are bundled off into temporary, contiguous storage is
implementation-dependent.)

Isn't it easier to broadcast the size first?
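
For a square matrix that would look something like this (a sketch with
made-up sizes; Fortran bindings):

  program send_matrix
     use mpi
     implicit none
     integer :: ierr, rank, n
     double precision, allocatable :: a(:,:)

     call MPI_Init(ierr)
     call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
     if (rank == 0) then
        n = 500                         ! only the lead process knows the size
        allocate(a(n,n))
        a = 1.0d0
     end if
     call MPI_Bcast(n, 1, MPI_INTEGER, 0, MPI_COMM_WORLD, ierr)   ! size first...
     if (rank /= 0) allocate(a(n,n))
     call MPI_Bcast(a, n*n, MPI_DOUBLE_PRECISION, 0, &            ! ...then the
                    MPI_COMM_WORLD, ierr)                         ! contiguous matrix
     call MPI_Finalize(ierr)
  end program send_matrix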


On Tue, 2009-07-21 at 11:53 +0530, Prasadcse Perera wrote:
> Hi all,
> I'm writing an application which requires sending variable-sized
> sub-matrices to a set of processes from a lead process which holds the
> original matrix.  Here, the matrices are square matrices and the
> receiving process doesn't know the size of the matrix it will receive. With
> MPI_Bcast, I have seen that we can broadcast a whole matrix. Is there
> a similar way to do this with a derived data type for the matrices, with
> which we can send a matrix without looping over the blocks?
> 
> Thanks,
> Prasad.
> -- 
> http://www.codeproject.com/script/Articles/MemberArticles.aspx?amid=3489381



Re: [OMPI users] MPI and C++ (Boost)

2009-07-07 Thread Terry Frankcombe
On Mon, 2009-07-06 at 23:09 -0400, John Phillips wrote:
> Luis Vitorio Cargnini wrote:
> > 
> > Your suggestion is a great and interesting idea. I only have the fear to 
> > get used to the Boost and could not get rid of Boost anymore, because 
> > one thing is sure the abstraction added by Boost is impressive, it turn 
> > the things much less painful like MPI to be implemented using C++, also 
> > the serialization inside Boost::MPI already made by Boost to use MPI is 
> > astonishing attractive, and of course the possibility to add new types 
> > like classes to be able to send objects through MPI_Send of Boost, this 
> > is certainly attractive, but again I do not want to get dependent of a 
> > library as I said, this is my major concern.
> > .
> 
>I'm having problems understanding your base argument here. It seems 
> to be that you are afraid that Boost.MPI will make your prototype 
> program so much better and easier to write that you won't want to remove 
> it. Wouldn't this be exactly the reason why keeping it would be good?
> 
>I like and use Boost.MPI. I voted for inclusion during the review in 
> the Boost developer community. However, what you should do in your 
> program is use those tools that produce the right trade off between the 
> best performance, easiest to develop correctly, and most maintainable 
> program you can. If that means using Boost.MPI, then remember that 
> questions about it are answered at the Boost Users mailing list. If your 
> decision is that that does not include Boost.MPI then you will have some 
> other challenges to face but experience shows that you can still produce 
> a very high quality program.
> 
>Choose as you see fit, just be sure to understand your own reasons. 
> (Whether any of the rest of us on this list understand them or not.)

I understand Luis' position completely.  He wants an MPI program, not a
program that's written in some other environment, no matter how
attractive that may be.  It's like the difference between writing a
numerical program in standard-conforming Fortran and writing it in the
latest flavour of the month interpreted language calling highly
optimised libraries behind the scenes.

IF boost is attached to MPI 3 (or whatever), AND it becomes part of the
mainstream MPI implementations, THEN you can have the discussion again.

Ciao
Terry

-- 
Dr. Terry Frankcombe
Research School of Chemistry, Australian National University
Ph: (+61) 0417 163 509Skype: terry.frankcombe



Re: [OMPI users] What flags for configure for a single machine installation ?

2009-06-08 Thread Terry Frankcombe
On Fri, 2009-06-05 at 14:36 +0200, DEVEL Michel wrote:
> Terry Frankcombe a écrit :
> > Is there any compelling reason you're not using the wrappers
> > mpif77/mpif90?
> >
> >   
> In fact, this is for the same reason that I also try to use static linking:
> I have been using two middle-size clusters as a normal user without root
> privilege.
> Hence I cannot update the compiler/libraries... packages.
> The binaries are installed via apt-get (Debian distro) and almost never
> updated by the administrator.
> Hence, they are not optimized for our hardware.
> The default gfortran is still  4.1.2 and ifort is  10.0 20070426 (hence
> also mpif90).
> I have made tests with more recent compilers and libraries installed
> under my account and the system compilers indeed produce much slower codes.
> I could go on this way and use LD_LIBRARY_PATH to point to my private
> versions of the compilers/library but I have problems with the fact that
> the SGE batch system uses an openmpi environment with an old version of
> openmpi coherent with the compilers and glibc versions...
> Furthermore, I would like to have the same computing environment on my
> machine and on the cluster.
> 
> Maybe there is a clever way to deal with my problems than going for
> static link, but I have already wasted quite some time trying other
> solutions unsuccessfully. However, I would evidently appreciate if
> someone could point me one! ;-)

I'm no SGE expert.  But don't you have a PE available that simply
allocates nodes and calls your script?  Then you can specify in your
script any mpirun you want, and it all should still work.
Alternatively, can't you shut down the SGE-called mpirun as the first
thing you do, then continue on calling your own mpirun?  All this
depends on exactly how your machine and SGE is set up.

I don't see how statically linking your app avoids this issue at all, if
you're still calling the wrong mpirun.




Re: [OMPI users] What flags for configure for a single machine installation ?

2009-06-04 Thread Terry Frankcombe
Is there any compelling reason you're not using the wrappers
mpif77/mpif90?


On Thu, 2009-06-04 at 18:01 +0200, DEVEL Michel wrote:
> Dear all,
> 
> I still have problems with installing and using openmpi:
> 
> 1°) In fact I just want to install openmpi on my machine (single i7 920)
> to be able to develop parallel codes (using eclipse/photran/PTP) that I
> will execute on a cluster later (using SGE batch queue system).
> I therefore wonder what kind of configure flags I could put to have a
> basic single-machine installation ?
> 
> 2°) For GCC, "./configure --prefix=/usr/local --with-sge
> --enable-static" worked but when I try to statically link a test code by
> gfortran -m64 -O3 -fPIC -fopenmp -fbounds-check -pthread --static 
> testmpirun.f  -o bin/testmpirun_gfortran_static -I/usr/local/include
> -L/usr/local/lib -lmpi_f90 -lmpi_f77 -lmpi -lopen-rte -lopen-pal -ldl
> -lnsl -lutil -lm -ldl
> It fails because the link step does not find Infiniband routines (ibv_*).
> If I use dynamical link, it works but asks me for a password when I try
> to do
> "/usr/bin/local/mpirun -np 4 bin/testmpirun_gfortran_static" though I
> have an a priori valid .rhosts file...
> 
> 3°) for the intel compiler suite case
> "./configure --prefix=/opt/intel/Compiler/11.0/074 --with-sge
> --enable-static CC='icc' CFLAGS=' -xHOST -ip -O3 -C' LDFLAGS='-xHOST -ip
> -O3 -C -static-intel' AR='ar' F77='ifort' FC='ifort' FFLAGS=' -xHOST -ip
> -O3 -C' FCFLAGS=' -xHOST -ip -O3 -C' CXX='icpc' CXXFLAGS=' -xHOST -ip
> -O3 -C'"
> worked but I have the same problem with missing ibv_ * routines if I try
> a static link
> "ifort -Bdynamic -fast -C -openmp -check noarg_temp_created 
> testmpirun.f  -o bin/testmpirun_ifort_dynamic
> -I/opt/intel/Compiler/11.0/074/include
> -L/opt/intel/Compiler/11.0/074/lib -lmpi_f90 -lmpi_f77 -lmpi -lopen-rte
> -lopen-pal -ldl -lnsl -lutil -lm -ldl"
> 
> (Remark: If I add "-static" to LDFLAGS in configure, it  stops during
> the making of opal_wrapper).
> 
> If I use dynamic link, I get the executable but then
> /opt/intel/Compiler/11.0/074/bin/mpirun -np 4
> ../../bin/testmpirun_ifort_dynamic
> gives
> --
> mpirun noticed that process rank 0 with PID 16664 on node mn2s-devel
> exited on signal 11 (Segmentation fault).
> --
> 2 total processes killed (some possibly by mpirun during cleanup)
> 
> 
> 
> 
> 
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users



Re: [OMPI users] LSF launch with OpenMPI

2009-05-05 Thread Terry Frankcombe
On Tue, 2009-05-05 at 12:10 +0200, Matthieu Brucher wrote:
> Hello,
> 
> I have two questions, in fact.
> 
> The first is what the support of LSF by OpenMPI means. When mpirun is
> executed, it is an LSF job that is actually ran? Or what does it
> imply? I've tried to search on the openmpi website as well as on the
> internet, but I couldn't find a clear answer/use case.

Hi Matthieu

I think it's fair to say that if "batch system XYZ" is supported, then
in a job script submitted to that batch system you can issue an mpirun
command without manually specifying numbers of processes, hostnames,
launch protocols, etc.  They're all picked up using the mechanisms of
the batch system.

If LSF has any peculiarities, someone will point them out, I'm sure.

Configuring for LSF I can't help you with.

Ciao




Re: [OMPI users] 100% CPU doing nothing!?

2009-04-22 Thread Terry Frankcombe
On Tue, 2009-04-21 at 19:19 -0700, Ross Boylan wrote:
> I'm using Rmpi (a pretty thin wrapper around MPI for R) on Debian Lenny
> (amd64).  My set up has a central calculator and a bunch of slaves to
> wich work is distributed.
> 
> The slaves wait like this:
> mpi.send(as.double(0), doubleType, root, requestCode, comm=comm)
> request <- request+1
> cases <- mpi.recv(cases, integerType, root, mpi.any.tag(),
> comm=comm)
> 
> I.e., they do a simple send and then a receive.
> 
> It's possible there's no one to talk to, so it could be stuck at
> mpi.send or mpi.recv.
> 
> Are either of those operations that should chew up CPU?  At this point,
> I'm just trying to figure out where to look for the source of the
> problem.


The short response is:  why do you not want it to use the whole CPU
while waiting?

There's some discussion starting here:
<http://www.open-mpi.org/community/lists/users/2008/04/5457.php>

If you really want to make your application run slower, you can yield
the CPU when not doing much:
<http://www.open-mpi.org/faq/?category=running#force-aggressive-degraded>


Ciao
Terry

-- 
Dr. Terry Frankcombe
Research School of Chemistry, Australian National University
Ph: (+61) 0417 163 509Skype: terry.frankcombe



Re: [OMPI users] An mpirun question

2009-04-16 Thread Terry Frankcombe
Hi Min Zhu

You need to read about hostfiles and bynode/byslot scheduling.  See
here:
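
In the meantime, one way to get exactly the split you describe (a sketch
using the hostnames and counts from your mail; untested, and the FAQ covers
other options) is a hostfile that caps the slots on each server:

# file: myhosts
cfd1 slots=8
cfd2 slots=6

mpirun -np 14 --hostfile myhosts ./wrf.exe

With the default byslot scheduling the first 8 ranks land on cfd1 and the
remaining 6 on cfd2.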


Ciao


On Thu, 2009-04-16 at 10:43 +0100, Min Zhu wrote:
> Dear all,
> 
>  
> 
> I wonder if you could help me with this question.
> 
> I have got 3 Linux servers with 8 processors on each server. If I want
> to run a job using mpirun command
> 
> and specify the number of processors to be used on each server. Is
> there any way to do this? At the moment,
> 
> I find that I can only issue such command "mpirun -np 14 -host
> cfd1,cfd2 ./wrf.exe" which means the mpirun
> 
> Will run the job using 7 processors on each server cfd1 and cfd2. Can
> I specify say using 8 processors on cfd1 and
> 
> 6 processors on cfd2? I ask this question because I found that the
> different combination of processors on those
> 
> Servers can influence the computation time dramatically. Thank you
> very much in advance,
> 
>  
> 
> Cheers,
> 
>  
> 
> Min Zhu
> 
> 
> 
> CONFIDENTIALITY NOTICE: This e-mail, including any attachments,
> contains information that may be confidential, and is protected by
> copyright. It is directed to the intended recipient(s) only. If you
> have received this e-mail in error please e-mail the sender by
> replying to this message, and then delete the e-mail. Unauthorised
> disclosure, publication, copying or use of this e-mail is prohibited.
> Any communication of a personal nature in this e-mail is not made by
> or on behalf of any RES group company. E-mails sent or received may be
> monitored to ensure compliance with the law, regulation and/or our
> policies.
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users



Re: [OMPI users] ssh MPi and program tests

2009-04-07 Thread Terry Frankcombe
On Tue, 2009-04-07 at 11:39 +0200, Francesco Pietra wrote:
> Hi Gus:
> I should have set clear at the beginning that on the Zyxel router
> (connected to Internet by dynamic IP afforded by the provider)  there
> are three computers. Their host names:
> 
> deb32 (desktop debian i386)
> 
> deb64 (multisocket debian amd 64 lenny)
> 
> tya64 (multisocket debian amd 64 lenny)
> 
> The three are ssh passwordless interconnected from the same user
> (myself). I never established connections as root user because I have
> direct access to all tree computers. So, if I slogin as user,
> passwordless connection is established. If I try to slogin as root
> user, it says that the authenticity of the host to which I intended to
> connect can't be established, RSA key fingerprint .. Connect?
> 
> Moreover, I appended to the pub keys known to deb64 those that deb64
> had sent to either deb32 or tya64. Whereby, when I command:
> 
> With certain programs (conceived for batch run), the execution on
> deb64 is launched from deb32.
> 
> ssh 192.168.#.## date (where the numbers stand for hostname)
> 
> 
> I copied /examples to my deb64 home, chown to me, compiled as user and
> run as user "connectivity".  (I have not compiled in the openmpi
> directory as this belongs to the root user, while ssh has been adjusted for me
> as user.)
> 
> Running as user in my home
> 
> /usr/local/bin/mpirun -deb64 -1 connectivity_c 2>&1 | tee n=1.connectivity.out
> 
> it asked to add the host (itself) to the list of known hosts (on
> repeating the command, that was no longer asked). The unabridged output:

The easiest setup is for the executable to be accessible on all nodes,
either copied or on a shared filesystem.  Is that the case here?

(I haven't read the whole thread, so apologies if this has already been
covered.)




Re: [OMPI users] libnuma under ompi 1.3

2009-03-04 Thread Terry Frankcombe

Thanks to everyone who contributed.

I no longer think this is Open MPI's problem.  This system is just
stupid.  Everything's 64 bit (which various probes with file confirm).

There's no icc, so I can't test with that.  gcc finds libnuma without
-L.  (Though a simple gcc -lnuma -Wl,-t reports that libnuma is found
through the rather convoluted
path /usr/lib64/gcc-lib/x86_64-suse-linux/3.3.4/../../../../lib64/libnuma.so.)

ifort -lnuma can't find libnuma.so, but then ifort -L/usr/lib64 -lnuma
can't find it either!  While everything else points to some mix up with
linking search paths, that last result confuses me greatly.  (Unless
there's some subtlety with libnuma.so being a link to libnuma.so.1.)

I can compile my app by replicating mpif90's --showme output directly on
the command line, with -lnuma replaced explicitly
with /usr/lib64/libnuma.so.  Then, even though I've told ifort -static,
ldd gives the three lines:

libnuma.so.1 => /usr/lib64/libnuma.so.1 (0x2b3f58a3c000)
libc.so.6 => /lib64/tls/libc.so.6 (0x2b3f58b42000)
/lib/ld64.so.1 => /lib/ld64.so.1 (0x2b3f58925000)

While I don't understand what's going on here, I now have a working
binary.  It's the only app I use on this machine, so I'm no longer
concerned.  All other machines on which I use Open MPI work as expected
out of the box.  My workaround here is sufficient.

Once more, thanks for the suggestions.  I think this machine is just
pathological.

Ciao
Terry




Re: [OMPI users] Is this an OpenMPI bug?

2009-02-21 Thread Terry Frankcombe
When you say "a real variable", you mean default real, no crazy implicit
typing or anything?

I think if x is real(8) you'd see what you say you see.
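For the record, here is what the matching call looks like for an 8-byte x
(my sketch, not your code).  Two things matter: the datatype handed to
MPI_Bcast has to match the declared kind of x, and the Fortran binding takes
the communicator before the error argument:

program bcast_kind
  implicit none
  include 'mpif.h'
  integer, parameter :: np = 100
  integer :: ierr, rank
  real(8) :: x(np)             ! an 8-byte real array...

  call MPI_Init(ierr)
  call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierr)
  if (rank == 0) x = 1.0d0

  ! ...must be broadcast as MPI_DOUBLE_PRECISION (or MPI_REAL8), not
  ! MPI_REAL, and MPI_COMM_WORLD must be passed explicitly.
  call MPI_Bcast(x, np, MPI_DOUBLE_PRECISION, 0, MPI_COMM_WORLD, ierr)

  call MPI_Finalize(ierr)
end program bcast_kind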


On Fri, 2009-02-20 at 18:54 -0500, -Gim wrote:
> I am trying to use the mpi_bcast function in fortran.  I am using
> open-mpi-v-1.2.7
> 
> Say x is a real variable of size 100. np =100  I try to bcast this to
> all the processors. 
> 
> I use call mpi_bcast(x,np,mpi_real,0,ierr) 
> 
> When I do this and try to print the value from the resultant
> processor, exactly half the values gets broadcast.  In this case, I
> get 50 correct values in the resultant processor and rest are junk.
> Same happened when i tried with np=20.. Exactly 10 values gets
> populated and rest are junk.!!
> 
> ps: I am running this in a single processor. ( Just testing purposes )
> I run this with "mpirun -np 4  "
> 
> Cheerio,
> Gim
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users



Re: [OMPI users] using ompi-server on a single node

2009-01-05 Thread Terry Frankcombe
But why doesn't tcp work on loopback?


On Mon, 2009-01-05 at 07:25 -0700, Ralph Castain wrote:
> It is currently a known limitation - shared memory currently only  
> works between procs from the same job. There is an enhancement coming  
> that will remove this restriction, but it won't be out for some time.
> 
> Ralph
> 
> On Jan 5, 2009, at 1:06 AM, Thomas Ropars wrote:
> 
> > Hi,
> >
> > I've tried to use ompi-server to connect 2 processes belonging to
> > different jobs but running on the same computer. It works when the
> > computer has a network interface up. But if the only active network
> > interface is the local loop, it doesn't work.
> >
> > According to what I understood reading the code, it is because no btl
> > component can be used in this case. "tcp" is not used because usually
> > it is the "sm" component that is used for processes on the same host.
> > But in that case it doesn't work because "sm" is supposed to work only
> > for processes of the same job.
> >
> > I know that this use-case is not very frequent  :)
> > But Is there a solution to make it work ? or is it a known  
> > limitation ?
> >
> > Regards
> >
> > Thomas
> >
> > ___
> > users mailing list
> > us...@open-mpi.org
> > http://www.open-mpi.org/mailman/listinfo.cgi/users
> 
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users



Re: [OMPI users] openMPI, transfer data from multiple sources to one destination

2008-12-28 Thread Terry Frankcombe

It sounds like you may like to read about MPI_ANY_SOURCE as the source for
MPI_Recv.  Using MPI_Probe, with MPI_ANY_SOURCE, may also be a solution.
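
A rough sketch of the receiving side (the buffer size, the data type and the
"no more packages" tag below are placeholders I've made up -- adapt to taste):

subroutine sink(nsources)
  implicit none
  include 'mpif.h'
  integer, intent(in) :: nsources
  integer :: ierr, ndone, count, src
  integer :: status(MPI_STATUS_SIZE)
  double precision :: buf(100000)     ! assumed big enough for any package

  ndone = 0
  do while (ndone < nsources)
     ! Block until *some* source has a package ready, whoever it is.
     call MPI_Probe(MPI_ANY_SOURCE, MPI_ANY_TAG, MPI_COMM_WORLD, status, ierr)
     src = status(MPI_SOURCE)
     call MPI_Get_count(status, MPI_DOUBLE_PRECISION, count, ierr)

     ! Now receive exactly that package from exactly that source.
     call MPI_Recv(buf, count, MPI_DOUBLE_PRECISION, src, status(MPI_TAG), &
                   MPI_COMM_WORLD, status, ierr)

     if (status(MPI_TAG) == 99) then  ! tag 99 = "this source is finished" (made up)
        ndone = ndone + 1
     else
        ! ... process buf(1:count) ...
     end if
  end do
end subroutine sink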

> HI,
>
> I need to transfer data from multiple sources to one destination.
> The requirement is:
>
> (1) The sources and destination nodes may work asynchronously.
>
> (2) Each source node generates data package in their own paces.
> And, there may be many packages to send. Whenever, a data package
> is generated , it should be sent to the desination node at once.
> And then, the source node continue to work on generating the next
> package.
>
> (3) There is only one destination node , which must receive all data
> package generated from the source nodes.
> Because the source and destination nodes may work asynchronously,
> the destination node should not wait for a specific source node until
> the source node sends out its data.
>
> The destination node should be able to receive data package
> from anyone source node whenever the data package is available in a
> source node.
>
> My question is :
>
> What MPI function should be used to implement the protocol above ?
>
> If I use MPI_Send/Recv, they are blocking function. The destination
> node have to wait for one node until its data is available.
>
> The communication overhead is too high.
>
> If I use MPI_Bsend, the destination node has to use MPI_Recv to ,
> a Blocking receive for a message .
>
> This can make the destination node wait for only one source node and
> actually other source nodes may have data avaiable.
>
>
> Any help or comment is appreciated !!!
>
> thanks
>
> Dec. 28 2008
>
>
> _
> It's the same Hotmail®. If by 'same' you mean up to 70% faster.
> http://windowslive.com/online/hotmail?ocid=TXT_TAGLM_WL_hotmail_acq_broad1_122008___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users




Re: [OMPI users] pgi and gcc runtime compatability

2008-12-07 Thread Terry Frankcombe
Many of today's compilers for Linux (pgi, intel, etc.) are designed to
be link-compatible with gcc.  That must extend to calling conventions
(mangling schemes and argument passing, etc.)

If it's static link-compatible, surely this applies to dynamic (runtime)
linking (right?)

Is there stuff going on internal to OMPI that requires tighter
integration between app and library than standard function calls tying
together?  How invasive is the memory management stuff?



On Sun, 2008-12-07 at 22:06 -0500, Brock Palen wrote:
> I did something today that I was happy worked,  but I want to know if  
> anyone has had problem with it.
> 
> At runtime. (not compiling)  would a OpenMPI built with pgi  work to  
> run a code that was compiled with the same version but gcc built  
> OpenMPI ?  I tested a few apps today after I accidentally did this  
> and found it worked.  They were all C/C++ apps  (namd and gromacs)   
> but what about fortran apps?   Should we expect problems if someone  
> does this?
> 
> I am not going to encourage this, but it is more if needed.
> 
> 
> Brock Palen
> www.umich.edu/~brockp
> Center for Advanced Computing
> bro...@umich.edu
> (734)936-1985
> 
> 
> 
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users



Re: [OMPI users] Processor/core selection/affinity for large shared memory systems

2008-12-04 Thread Terry Frankcombe
Isn't it up to the OS scheduler what gets run where?


> We have an 8-way, 32-core AMD processor machine (single system image)
> and are at present running OpenMPI 1.2.8 .  Jobs are launched locally on
> the machine itself.  As far as I can see, there doesn't seem to be any
> way to tell OpenMPI to launch the MPI processes on adjacent cores.
> Presumably such functionality is technically possible via PLPA.  Is
> there in fact a way to specify such a thing with 1.2.8, and if not, will
> 1.3 support these kinds arguments?
>
> Thank you.
> --
>   V. Ram
>   v_r_...@fastmail.fm
>
> --
> http://www.fastmail.fm - Or how I learned to stop worrying and
>   love email again
>
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
>




Re: [OMPI users] Problems installing in Cygwin - Problem with GCC 3.4.4

2008-11-04 Thread Terry Frankcombe
> *** Fortran 90/95 compiler
> checking whether we are using the GNU Fortran compiler... yes
> checking whether g95 accepts -g... yes
> checking if Fortran compiler works... yes
> checking whether g95 and g95 compilers are compatible... no
> configure: WARNING: *** Fortran 77 and Fortran 90 compilers are not
> link compatible
> configure: WARNING: *** Disabling MPI Fortran 90/95 bindings


OK, for that one I think you need to dig into config.log and see exactly
what's failing and why.

I can't speak for the developers, but it seems slightly concerning that
configure thinks it's using "the GNU Fortran compiler".  I feel sure the
GNU people would object to g95 being called that.



Re: [OMPI users] Problems installing in Cygwin - Problem with GCC 3.4.4

2008-11-03 Thread Terry Frankcombe
> On Nov 3, 2008, at 3:36 PM, Gustavo Seabra wrote:
>
>>> For your fortran issue, the Fortran 90 interface needs the Fortran 77
>>> interface.  So you need to supply an F77 as well (the output from
>>> configure
>>> should indicate that the F90 interface was disabled because the F77
>>> interface was disabled).
>>
>> Is that what you mean (see below)?
>
> Ah yes -- that's another reason the f90 interface could be disabled:
> if configure detects that the f77 and f90 compilers are not link-
> compatible.
>
>> I thought the g95 compiler could
>> deal with F77 as well as F95... If so, could I just pass F77='g95'?
>
> That would probably work (F77=g95).  I don't know the g95 compiler at
> all, so I don't know if it also accepts Fortran-77-style codes.  But
> if it does, then you're set.  Otherwise, specify a different F77
> compiler that is link compatible with g95 and you should be good.

Fortran 90 is a superset of the archaic, hamstrung, "I'm too old to learn
how to program in a useful manner and I still use punched cards" Fortran
77.  All Fortran 90 compilers are Fortran 77 compilers, by definition. 
Fortran 95 has a few (~5) deleted features and a few minor added features.
 I've never heard of a Fortran 95 compiler that wasn't a Fortran 90
compiler, and thus a Fortran 77 compiler.

Take g77 and throw it away.  While it's not particularly buggy, it hasn't
been maintained for years and should be out-performed by a more modern
compiler such as g95 or gfortran.



Re: [OMPI users] MPI + Mixed language coding(Fortran90 + C++)

2008-11-02 Thread Terry Frankcombe
On Sat, 2008-11-01 at 07:52 -0400, Jeff Squyres wrote:
> On Oct 31, 2008, at 3:07 PM, Rajesh Ramaya wrote:
> 
> >   Thank you very much for the immediate reply. I am able to  
> > successfully
> > access the data from the common block but the values are zero. In my
> > algorithm I even update a common block but the update made by the  
> > shared
> > library is not taken in to account by the executable.
> 
> Can you reduce this to a small example that you can share, perchance?

Is it all the MPI processes getting zeros, or only a subset?  Does what
you see change if you run your app without mpirun, or with mpirun -n 1?




Re: [OMPI users] MPI_SUM and MPI_REAL16 with MPI_ALLREDUCE in fortran90

2008-10-28 Thread Terry Frankcombe
I assume you've confirmed that point to point communication works
happily with quad prec on your machine?  How about one-way reductions?


On Tue, 2008-10-28 at 08:47 +, Julien Devriendt wrote:
> Thanks for your suggestions.
> I tried them all (declaring my variables as REAL*16 or REAL(16)) to no 
> avail. I still get the wrong answer with my call to MPI_ALLREDUCE.
> 
> > I think the KINDs are compiler dependent.  For Sun Studio Fortran, REAL*16 
> > and REAL(16) are the same thing.  For Intel, maybe it's different.  I don't 
> > know.  Try running this program:
> >
> > double precision xDP
> > real(16) x16
> > real*16 xSTAR16
> > write(6,*) kind(xDP), kind(x16), kind(xSTAR16), kind(1.0_16)
> > end
> >
> > and checking if the output matches your expectations.
> >
> > Jeff Squyres wrote:
> >
> >> I dabble in Fortran but am not an expert -- is REAL(kind=16) the same  as 
> >> REAL*16?  MPI_REAL16 should be a 16 byte REAL; I'm not 100% sure  that 
> >> REAL(kind=16) is the same thing...?
> >> 
> >> On Oct 23, 2008, at 7:37 AM, Julien Devriendt wrote:
> >> 
> >>> Hi,
> >>> 
> >>> I'm trying to do an MPI_ALLREDUCE with quadruple precision real and
> >>> MPI_SUM and open mpi does not give me the correct answer (vartemp
> >>> is equal to vartored instead of 2*vartored). Switching to double 
> >>> precision
> >>> real works fine.
> >>> My version of openmpi is 1.2.7 and it has been compiled with ifort  v10.1
> >>> and icc/icpc at installation
> >>> 
> >>> Here's the simple f90 code which fails:
> >>> 
> >>> program test_quad
> >>> 
> >>>implicit none
> >>> 
> >>>include "mpif.h"
> >>> 
> >>>real(kind=16) :: vartored(8),vartemp(8)
> >>>integer   :: nn,nslaves,my_index
> >>>integer   :: mpierror
> >>> 
> >>>call MPI_INIT(mpierror)
> >>>call MPI_COMM_SIZE(MPI_COMM_WORLD,nslaves,mpierror)
> >>>call MPI_COMM_RANK(MPI_COMM_WORLD,my_index,mpierror)
> >>> 
> >>>nn   = 8
> >>>vartored = 1.0_16
> >>>vartemp  = 0.0_16
> >>>print*,"P1 ",my_index,vartored
> >>>call  MPI_ALLREDUCE 
> >>> (vartored,vartemp,nn,MPI_REAL16,MPI_SUM,MPI_COMM_WORLD,mpierror)
> >>>print*,"P2 ",my_index,vartemp
> >>> 
> >>>stop
> >>> 
> >>> end program test_quad
> >>> 
> >>> Any idea why this happens?
> >> 
> > ___
> > users mailing list
> > us...@open-mpi.org
> > http://www.open-mpi.org/mailman/listinfo.cgi/users
> >
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users



Re: [OMPI users] Can OpenMPI support multiple compilers?

2008-10-19 Thread Terry Frankcombe
It happily supports multiple compilers on the same system, but not in
the way you mean.  You need another installation of OMPI (in,
say, /usr/lib64/mpi/intel) for icc/ifort.

Select by path manipulation.

On Mon, 2008-10-20 at 08:19 +0800, Wen Hao Wang wrote:
> Hi all:
> 
> I have openmpi 1.2.5 installed on SLES10 SP2. These packages should be
> compiled with gcc compilers. Now I have installed Intel C++ and
> Fortran compilers on my cluster. Can openmpi use Intel compilers
> without recompiling?
> 
> I tried to use environment variable to indicate Intel compiler, but it
> seems the mpi commands still wanted to use gcc ones.
> LS21-08:/opt/intel/fce/10.1.018/bin # mpif77 --showme
> gfortran -I/usr/lib64/mpi/gcc/openmpi/include -pthread
> -L/usr/lib64/mpi/gcc/openmpi/lib64 -lmpi_f77 -lmpi -lopen-rte
> -lopen-pal -ldl -Wl,--export-dynamic -lnsl -lutil -lm -ldl
> LS21-08:/opt/intel/fce/10.1.018/bin # export
> F77=/opt/intel/fce/10.1.018/bin/ifort
> LS21-08:/opt/intel/fce/10.1.018/bin # rpm -e
> gcc-fortran-4.1.2_20070115-0.21
> LS21-08:/opt/intel/fce/10.1.018/bin # mpif77 /LTC/matmul-for-intel.f
> --
> The Open MPI wrapper compiler was unable to find the specified
> compiler
> gfortran in your PATH.
> 
> Note that this compiler was either specified at configure time or in
> one of several possible environment variables.
> 
> --
> 
> Is it possible to change openmpi's underlying compiler? Thus I can use
> multiple compilers on one machine.
> 
> Thanks in advance!
> 
> Steven Wang
> Email: wangw...@cn.ibm.com
> 
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users



Re: [OMPI users] Debian MPI -- mpirun missing

2008-10-17 Thread Terry Frankcombe
Er, shouldn't this be in the Debian support list?  A correctly installed
OpenMPI will give you mpirun.  If their openmpi-bin package doesn't,
then surely it's broken?  (Or is there a straight openmpi package?)



On Sat, 2008-10-18 at 00:16 +0900, Raymond Wan wrote:
> Hi all,
> 
> I'm very new to MPI and am trying to install it on to a Debian Etch 
> system.  I did have mpich installed and I believe that is causing me 
> problems.  I completely uninstalled it and then ran:
> 
> update-alternatives --remove-all mpicc
> 
> Then, I installed the following packages:
> 
> libibverbs1 openmpi-bin openmpi-common openmpi-libs0 openmpi-dbg openmpi-dev
> 
> And it now says:
> 
>  >> update-alternatives --display mpicc
> mpicc - status is auto.
>  link currently points to /usr/bin/mpicc.openmpi
> /usr/bin/mpicc.openmpi - priority 40
>  slave mpif90: /usr/bin/mpif90.openmpi
>  slave mpiCC: /usr/bin/mpic++.openmpi
>  slave mpic++: /usr/bin/mpic++.openmpi
>  slave mpif77: /usr/bin/mpif77.openmpi
>  slave mpicxx: /usr/bin/mpic++.openmpi
> Current `best' version is /usr/bin/mpicc.openmpi.
> 
> which seems ok to me...  So, I tried to compile something (I had sample 
> code from a book I purchased a while back, but for mpich), however, I 
> can run the program as-is, but I think I should be running it with 
> mpirun -- the FAQ suggests there is one?  But, there is no mpirun 
> anywhere.  It's not in /usr/bin.  I updated the filename database 
> (updatedb) and tried a "locate mpirun", and I get only one hit:
> 
> /usr/include/openmpi/ompi/runtime/mpiruntime.h
> 
> Is there a package that I neglected to install?  I did an "aptitude 
> search openmpi" and installed everything listed...  :-)  Or perhaps I 
> haven't removed all trace of mpich?
> 
> Thank you in advance!
> 
> Ray
> 
> 
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users



Re: [OMPI users] Performance: MPICH2 vs OpenMPI

2008-10-09 Thread Terry Frankcombe
>I'm rusty on my GCC, too, though - does it default to an O2
> level, or does it default to no optimizations?

Default gcc is indeed no optimisation.  gcc seems to like making users
type really long complicated command lines even more than OpenMPI does.

(Yes yes, I know!  Don't tell me!)




Re: [OMPI users] qsub - mpirun problem

2008-09-29 Thread Terry Frankcombe

On Mon, 2008-09-29 at 17:30 -0500, Zhiliang Hu wrote:
> >As you blank out some addresses: have the nodes and the headnode one  
> >or two network cards installed? All the names like node001 et al. are  
> >known on neach node by the correct address? I.e. 172.16.100.1 = node001?
> >
> >-- Reuti
> 
> There should be no problem in this regard -- the set up is by a 
> commercial company. I can ssh from any node to any node (passwdless).
> 
> Zhiliang

Your faith in commercial enterprises is touching.  Unfortunately, it's
at odds with my experience, on two continents.

Like Reuti said, if you paid someone to set up a cluster to run parallel
jobs and it won't run parallel jobs, then yell at them loud and long.

I'll also reiterate that this sounds like a PBS problem rather than
(yet) an OpenMPI problem.  It seems you left the PBS discussion
prematurely.




Re: [OMPI users] how to install openmpi with a specific gcc

2008-09-26 Thread Terry Frankcombe
http://www.open-mpi.org/faq/?category=building#build-compilers

But this won't help build your app with the dusty, crumbling 2.95.
You're telling OpenMPI (and all of OpenMPI, including mpicc) to use a
specific compiler.

You need to fix your code so it's portable.



On Thu, 2008-09-25 at 20:56 -0700, Shafagh Jafer wrote:
> Hi!
> on my system the default gcc is 2.95.3. For openmpi i have installed
> gcc-3.4.6 but i kept the default gcc to stay gcc-2.95.3. Now, how can
> I configure/install openmpi with gcc-3.4.6?what options should i give
> when configuring it so that it doesnt go and pick upt the dafault
> 2.95.3 ???
> 
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users



Re: [OMPI users] which gcc to compile openmpi with?

2008-09-24 Thread Terry Frankcombe
Both of those are really ancient.  Fortran in particular will not work
happily with those.  Why don't you install something from the current
epoch?  I run happily with gcc 4.3.2.

On Wed, 2008-09-24 at 08:36 -0700, Shafagh Jafer wrote:
> which gcc is prefered to compile openmpi with?? gcc-2.95.3 or
> gcc-3.2.3 ???
> 
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users



Re: [OMPI users] Configure and Build ok, but mpi module not recognized?

2008-09-23 Thread Terry Frankcombe
Just to jump in on the side of Fortran here:  The statement ordering
rules are indeed sensible.  You need to have your implicit typing set
before you start declaring stuff (so include must come after implicit).
You need to have all your modules used before setting your implicit
typing (as modules often define new types, which must be known before
you start declaring implicit rules).

On Tue, 2008-09-23 at 13:34 -0400, Gus Correa wrote:
> Hi Brian and list
> 
> Terry Frankcombe is right on the spot on that recommendation to you.
> 
> Just to support Terry's suggestion, here is what  "Fortran 95/2003 
> Explained", by Michael Metcalf,
> John Reid, and Malcolm Cohen, Oxford Univ. Press, 2004, pp. 144, Section 
> 7.10, says about it:
> 
> "Any 'use' statements must precede other specification statements in a 
> scoping unit."
> 
> Fortran 90/95/2003 scoping rules are a pain,
> written on a legalistic-forensic style, but this one at least is easy to 
> remember! :)
> 
> BTW, "scope" corresponds to the location and accessibility of a variable 
> or another Fortran entity
> on a programming unit.
> I.e., to be accessed by a programming unit, the entity must be "visible" 
> there,
> even if it is declared somewhere else.
> "Scope" is not related to the program unit source form (i.e. not to free 
> vs. fixed form),
> which may have been part of the confusion on our conversation.
> 
> Cheers,
> Gus Correa
> 
> -- 
> -
> Gustavo J. Ponce Correa, PhD - Email: g...@ldeo.columbia.edu
> Lamont-Doherty Earth Observatory - Columbia University
> P.O. Box 1000 [61 Route 9W] - Palisades, NY, 10964-8000 - USA
> -----
> 
> 
> Brian Harker wrote:
> 
> >Ahhh, now that makes sense.  Never included, always used.  Thanks!
> >
> >On Mon, Sep 22, 2008 at 8:55 PM, Terry Frankcombe <te...@chem.gu.se> wrote:
> >  
> >
> >>Remember what include does:  it essentially dumps mpif.h into the
> >>source.  So to be proper F90 you need:
> >>
> >>PROGRAM main
> >>USE local_module
> >>IMPLICIT NONE
> >>INCLUDE 'mpif.h'
> >>...
> >>
> >>
> >>On Mon, 2008-09-22 at 20:17 -0600, Brian Harker wrote:
> >>
> >>
> >>>Well, I'm stumped then...my top-level program is the only one that
> >>>uses MPI interfaces.  I USE other f90 module files, but none of them
> >>>are dependent on MPI functions.  For example here's the first few
> >>>lines of code where things act up:
> >>>
> >>>PROGRAM main
> >>>INCLUDE 'mpif.h'  (this line used to be "USE mpi"...this is
> >>>correct replacement, right?)
> >>>USE local_module
> >>>IMPLICIT NONE
> >>>...
> >>>STOP
> >>>END PROGRAM main
> >>>
> >>>with local_module.f90:
> >>>
> >>>MODULE local_module
> >>>CONTAINS
> >>>
> >>>SUBROUTINE local_subroutine
> >>>IMPLICIT NONE
> >>>...
> >>>RETURN
> >>>END SUBROUTINE local_subroutine
> >>>
> >>>END MODULE local_module
> >>>
> >>>and mpif90 gives me grief about local_module being out of scope within
> >>>main.  To the best of my knowledge, I have never used fixed-form or
> >>>had free-vs-fixed form compiler issues with ifort.  Thanks!
> >>>
> >>>On Mon, Sep 22, 2008 at 7:56 PM, Gus Correa <g...@ldeo.columbia.edu> wrote:
> >>>  
> >>>
> >>>>Hi Brian and list
> >>>>
> >>>>On my code I have
> >>>>
> >>>> include 'mpif.h'
> >>>>
> >>>>with single quotes around the file name.
> >>>>I use single quotes, but double quotes are also possible according to the
> >>>>F90 standard.
> >>>>If you start at column 7 and end at column 72,
> >>>>you avoid any problems with free vs. fixed Fortran form (which may happen 
> >>>>if
> >>>>one decides
> >>>>to mess the compiler options on Makefiles, for instance).
> >>>>This is PDP, paranoid defensive programming.
> >>>>I type the Fortran commands in lowercase (include) but this may not really
> >>>>make any difference
> >>>>(although there are compiler options to transliterate upper to lower, 
> >>>&

Re: [OMPI users] Configure and Build ok, but mpi module not recognized?

2008-09-22 Thread Terry Frankcombe
Remember what include does:  it essentially dumps mpif.h into the
source.  So to be proper F90 you need:

PROGRAM main
USE local_module
IMPLICIT NONE
INCLUDE 'mpif.h'
...


On Mon, 2008-09-22 at 20:17 -0600, Brian Harker wrote:
> Well, I'm stumped then...my top-level program is the only one that
> uses MPI interfaces.  I USE other f90 module files, but none of them
> are dependent on MPI functions.  For example here's the first few
> lines of code where things act up:
> 
> PROGRAM main
> INCLUDE 'mpif.h'  (this line used to be "USE mpi"...this is
> correct replacement, right?)
> USE local_module
> IMPLICIT NONE
> ...
> STOP
> END PROGRAM main
> 
> with local_module.f90:
> 
> MODULE local_module
> CONTAINS
> 
> SUBROUTINE local_subroutine
> IMPLICIT NONE
> ...
> RETURN
> END SUBROUTINE local_subroutine
> 
> END MODULE local_module
> 
> and mpif90 gives me grief about local_module being out of scope within
> main.  To the best of my knowledge, I have never used fixed-form or
> had free-vs-fixed form compiler issues with ifort.  Thanks!
> 
> On Mon, Sep 22, 2008 at 7:56 PM, Gus Correa  wrote:
> > Hi Brian and list
> >
> > On my code I have
> >
> >  include 'mpif.h'
> >
> > with single quotes around the file name.
> > I use single quotes, but double quotes are also possible according to the
> > F90 standard.
> > If you start at column 7 and end at column 72,
> > you avoid any problems with free vs. fixed Fortran form (which may happen if
> > one decides
> > to mess the compiler options on Makefiles, for instance).
> > This is PDP, paranoid defensive programming.
> > I type the Fortran commands in lowercase (include) but this may not really
> > make any difference
> > (although there are compiler options to transliterate upper to lower, be/not
> > be case sensitive, etc)
> >
> > However, since you asked, I changed the "include 'mpif.h'" location to
> > column 1, column 6,
> > etc, and I could compile the code with OpenMPI mpif90 without any problems.
> > Hence, I presume mpif90 uses the free form as default.
> > So, I wonder if you still have some residual mix of gfortran and ifort
> > there,
> > or if there is some funny configuration on your ifort.cfg file choosing the
> > fixed form as a default.
> >
> > In any case, going out of scope probably has nothing to do with fixed vs
> > free form.
> > I only have one "use mpi" statement, no other modules "used" in the little
> > hello_f90.
> > It may well be that if you try other "use"  statements, particularly if the
> > respective modules
> > involve other MPI modules or MPI function interfaces, things may break and
> > be out of scope.
> > I presume you are not talking of hello_f90 anymore, but of a larger code.
> > The large code examples I have here use other F90 modules (not mpi.mod),
> > but they don't rely on MPI interfaces.
> >
> > Glad to know that the main problem is gone (I read the more recent
> > messages).
> > I was reluctant to tell you to do a "make distclean", and start fresh,
> > configure again, make again,
> > because you said you had built the OpenMPI library more than once.
> > Now I think that was exactly what I should have told you to do.
> > Everybody builds these things many times to get them right.
> > Then a few more times to make it efficient, to optimize, etc.
> > Can you imagine how many times Rutherford set up that gold foil experiment
> > until he got it right?  :)
> > After it is done, the past errors become trivial, and all the rest is
> > reduced just to stamp collecting.  :)
> > Configure is always complex, and dumping its output to a log file is
> > worth the effort, to check out for sticky problems like this, which often
> > happen.
> > (Likewise for make and make install.)
> >
> > I hope this helps,
> > Gus Correa
> >
> > --
> > -
> > Gustavo J. Ponce Correa, PhD - Email: g...@ldeo.columbia.edu
> > Lamont-Doherty Earth Observatory - Columbia University
> > P.O. Box 1000 [61 Route 9W] - Palisades, NY, 10964-8000 - USA
> > -
> >
> >
> > Brian Harker wrote:
> >
> >> Hi Gus-
> >>
> >> Thanks for the idea.  One question: how do you position INCLUDE
> >> statements in a fortran program, because if I just straight substitute
> >> ' INCLUDE "mpif.h" ' for ' USE mpi ', I get a lot of crap telling me
> >> my other USE statements are not positioned correctly within the scope
> >> and nothing compiles.  I have tried various positions, but I seem to
> >> be suffering from a lot of BS.  Am I overlooking something very
> >> obvious?
> >>
> >> On Mon, Sep 22, 2008 at 5:42 PM, Gus Correa  wrote:
> >>
> >>>
> >>> Hi Brian and list
> >>>
> >>> I seldom used the "use mpi" syntax before.
> >>> I have a lot of code here written in Fortran 90,
> >>> but and mpif.h is included instead "use mpi".
> >>> The MPI function calls are the same in Fortran 77 and 

Re: [OMPI users] what is inside mpicc/mpic++

2008-09-18 Thread Terry Frankcombe
In OMPI these are binaries, not scripts.  Not human readable.


[tjf@rscpc28 NH2+]$ ll /usr/local/bin/mpif90
lrwxrwxrwx 1 root root 12 2008-03-05 14:39 /usr/local/bin/mpif90 -> 
opal_wrapper*
[tjf@rscpc28 NH2+]$ file /usr/local/bin/opal_wrapper
/usr/local/bin/opal_wrapper: ELF 32-bit LSB executable, Intel 80386, version 1 
(SYSV), for GNU/Linux 2.6.8, dynamically linked (uses shared libs), not stripped




On Wed, 2008-09-17 at 22:31 -0700, Shafagh Jafer wrote:
> I am trying to figure out a problem that I am stuck on :-( Could
> anyone please tell me what their mpicc/mpic++ looks like? Is there anything
> readable inside these files? Because mine look corrupted and are filled with
> unreadable characters.
> Please let me know.
> 
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users



Re: [OMPI users] Heap profiling with OpenMPI

2008-08-07 Thread Terry Frankcombe
I can't speak for Jeff, but my understanding of the changes for 1.3
should allow you to switch off the memory manager when running your
checks.

It seems to me an obvious interim solution would be to have two versions
of OpenMPI installed, one with and one without the memory manager.  Use
one for debugging, and (if desired) the pinning-capable version for
production.


On Thu, 2008-08-07 at 09:20 +0200, Jan Ploski wrote:
> users-boun...@open-mpi.org schrieb am 08/06/2008 07:44:03 PM:
> 
> > On Aug 6, 2008, at 12:37 PM, Jan Ploski wrote:
> > 
> > >> I'm using the latest of Open MPI compiled with debug turned on, and 
> > >> valgrind 3.3.0. From your trace it looks like there is a conflict 
> > >> between two memory managers. I'm not having the same problem as I 
> > >> disable the Open MPI memory manager on my builds (configure option 
> > >> --without-memory-manager).
> > >
> > > Thanks for the tip! I confirm that the problem goes away after 
> > > rebuilding --without-memory-manager.
> > >
> > > As I also have the same problem in another cluster, I'd like to know 
> > > what side effects using this configuration option might have before 
> > > suggesting it as a solution to that cluster's admin. I didn't find 
> > > an explanation of what it does in the FAQ (beyond a recommendation 
> > > to use it for static builds). Could you please explain this option, 
> > > especially why one might want to *not* use it?
> > 
> > This is on my to-do list (add this to the FAQ); sorry it isn't done yet.
> > 
> > Here's a recent post where I've explained it a bit more:
> > 
> >  http://www.open-mpi.org/community/lists/users/2008/07/6161.php
> > 
> > Let me know if you'd like to know more.
> 
> Jeff,
> 
> Thanks for this explanation. According to what you wrote, 
> --without-memory-manager can make my and other applications run 
> significantly slower. While I can find out just how much for my app, I 
> hardly can do it for other (unknown) users. So it would be nice if my heap 
> profiling problem could be resolved in another way in the future. Is the 
> planned mpi_leave_pinned change in v1.3 going to correct it?
> 
> Regards,
> Jan Ploski
> 
> --
> Dipl.-Inform. (FH) Jan Ploski
> OFFIS
> FuE Bereich Energie | R Division Energy
> Escherweg 2  - 26121 Oldenburg - Germany
> Phone/Fax: +49 441 9722 - 184 / 202
> E-Mail: jan.plo...@offis.de
> URL: http://www.offis.de
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users



Re: [OMPI users] getting fortran90 to compile

2008-07-13 Thread Terry Frankcombe

On Sun, 2008-07-13 at 10:58 -0500, zach wrote:
> I installed openmpi like
> ./configure --prefix= FC=/usr/bin/gfortran-4.2
> make all install

If gfortran is in your path (which is usually the best way to have it
set up) then
./configure --prefix=
should work.  (It does for me.)

Is there a particular reason you are trying to point to a particular
version of gfortran?  Is that gfortran different to the rest of your
system gcc?  If so, that's generally a bad idea.

Ciao
Terry




Re: [OMPI users] Proper way to throw an error to all nodes?

2008-06-03 Thread Terry Frankcombe
Calling MPI_Finalize in a single process won't ever do what you want.
You need to get all the processes to call MPI_Finalize for the end to be
graceful.

What you need to do is have some sort of special message to tell
everyone to die.  In my codes I have a rather dynamic master-slave model
with flags being broadcast by the master process to tell the slaves what
to expect next, so it's easy for me to send out an "it's all over,
please kill yourself" message.  For a more rigid communication pattern
you could embed the die message in the data: something like if the first
element of the received data is negative, then that's the sign things
have gone south and everyone should stop what they're doing and
MPI_Finalize.  The details depend on the details of your code.
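
As a sketch of what that looks like (the flag values, and having the master
do the error check, are just for illustration; "input_ok" stands in for
whatever test the real code does):

subroutine main_loop(rank, input_ok)
  implicit none
  include 'mpif.h'
  integer, intent(in) :: rank
  logical, intent(in) :: input_ok     ! e.g. "the input file exists" (placeholder)
  integer :: ierr, flag
  integer, parameter :: DO_WORK = 1, SHUT_DOWN = -1

  do
     if (rank == 0) then
        flag = DO_WORK
        if (.not. input_ok) flag = SHUT_DOWN   ! bad filename, etc.
     end if

     ! Every rank learns the master's decision before doing anything else.
     call MPI_Bcast(flag, 1, MPI_INTEGER, 0, MPI_COMM_WORLD, ierr)
     if (flag == SHUT_DOWN) exit

     ! ... otherwise do one unit of work, including its normal communication ...
  end do

  ! Everyone falls out of the loop together, so MPI_Finalize completes.
  call MPI_Finalize(ierr)
end subroutine main_loop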

Presumably you could also set something up using tags and message
polling.

Hope this helps.


On Tue, 2008-06-03 at 19:57 +0900, 8mj6tc...@sneakemail.com wrote:
> So I'm working on this program which has many ways it might possibly die
> at runtime, but one of them that happens frequently is the user types a
> wrong (non-existant) filename on the command prompt. As it is now, the
> node looking for the file notices the file doesn't exist and tries to
> terminate the program. It tries to call MPI_Finalize(), but the other
> nodes are all waiting for a message from the node doing the file
> reading, so MPI_Finalize waits forever until the user realizes the job
> isn't doing anything and terminates it manually.
> 
> So, my question is: what's the "correct" graceful way to handle
> situations like this? Is there some MPI function which can basically
> throw an exception to all other nodes telling them bail out now? Or is
> correct behaviour just to have the node that spotted the error die
> quietly and wait for the others to notice?
> 
> Thanks for any suggestions!
-- 
Dr. Terry Frankcombe
Research School of Chemistry, Australian National University
Ph: (+61) 0417 163 509Skype: terry.frankcombe



Re: [OMPI users] Memory manager

2008-05-21 Thread Terry Frankcombe
Hi Brian

I ran your experiment.  Changing the MMAP threshold made no difference
to the memory footprint (>8GB/process out of the box, an order of
magnitude smaller with --with-memory-manager=none).

What does that tell us?

Ciao
Terry



On Tue, 2008-05-20 at 06:51 -0600, Brian Barrett wrote:
> Terry -
> 
> Would you be willing to do an experiment with the memory allocator?   
> There are two values we change to try to make IB run faster (at the  
> cost of corner cases you're hitting).  I'm not sure one is strictly  
> necessary, and I'm concerned that it's the one causing problems.  If  
> you don't mind recompiling again, would you change line 64 in opal/mca/ 
> memory/ptmalloc2/malloc.c from:
> 
> #define DEFAULT_MMAP_THRESHOLD (2*1024*1024)
> 
> to:
> 
> #define DEFAULT_MMAP_THRESHOLD (128*1024)
> 
> And then recompile with the memory manager, obviously.  That will make  
> the mmap / sbrk cross-over point the same as the default allocator in  
> Linux.  There's still one other tweak we do, but I'm almost 100%  
> positive it's the threshold causing problems.
> 
> 
> Brian
> 
> 
> On May 19, 2008, at 8:17 PM, Terry Frankcombe wrote:
> 
> > To tell you all what no one wanted to tell me, yes, it does seem to be
> > the memory manager.  Compiling everything with
> > --with-memory-manager=none returns the vmem use to the more reasonable
> > ~100MB per process (down from >8GB).
> >
> > I take it this may affect my peak bandwidth over infiniband.  What's  
> > the
> > general feeling about how bad this is?
> >
> >
> > On Tue, 2008-05-13 at 13:12 +1000, Terry Frankcombe wrote:
> >> Hi folks
> >>
> >> I'm trying to run an MPI app on an infiniband cluster with OpenMPI
> >> 1.2.6.
> >>
> >> When run on a single node, this app is grabbing large chunks of  
> >> memory
> >> (total per process ~8.5GB, including strace showing a single 4GB  
> >> grab)
> >> but not using it.  The resident memory use is ~40MB per process.   
> >> When
> >> this app is compiled in serial mode (with conditionals to remove  
> >> the MPI
> >> calls) the memory use is more like what you'd expect, 40MB res and
> >> ~100MB vmem.
> >>
> >> Now I didn't write it so I'm not sure what extra stuff the MPI  
> >> version
> >> does, and we haven't tracked down the large memory grabs.
> >>
> >> Could it be that this vmem is being grabbed by the OpenMPI memory
> >> manager rather than directly by the app?
> >>
> >> Ciao
> >> Terry
> >>
> >>
> >
> > ___
> > users mailing list
> > us...@open-mpi.org
> > http://www.open-mpi.org/mailman/listinfo.cgi/users
> >
> 



Re: [OMPI users] Memory manager

2008-05-19 Thread Terry Frankcombe
To tell you all what no one wanted to tell me, yes, it does seem to be
the memory manager.  Compiling everything with
--with-memory-manager=none returns the vmem use to the more reasonable
~100MB per process (down from >8GB).

I take it this may affect my peak bandwidth over infiniband.  What's the
general feeling about how bad this is?


On Tue, 2008-05-13 at 13:12 +1000, Terry Frankcombe wrote:
> Hi folks
> 
> I'm trying to run an MPI app on an infiniband cluster with OpenMPI
> 1.2.6.
> 
> When run on a single node, this app is grabbing large chunks of memory
> (total per process ~8.5GB, including strace showing a single 4GB grab)
> but not using it.  The resident memory use is ~40MB per process.  When
> this app is compiled in serial mode (with conditionals to remove the MPI
> calls) the memory use is more like what you'd expect, 40MB res and
> ~100MB vmem.
> 
> Now I didn't write it so I'm not sure what extra stuff the MPI version
> does, and we haven't tracked down the large memory grabs.
> 
> Could it be that this vmem is being grabbed by the OpenMPI memory
> manager rather than directly by the app?
> 
> Ciao
> Terry
> 
> 



[OMPI users] Memory manager

2008-05-13 Thread Terry Frankcombe
Hi folks

I'm trying to run an MPI app on an infiniband cluster with OpenMPI
1.2.6.

When run on a single node, this app is grabbing large chunks of memory
(total per process ~8.5GB, including strace showing a single 4GB grab)
but not using it.  The resident memory use is ~40MB per process.  When
this app is compiled in serial mode (with conditionals to remove the MPI
calls) the memory use is more like what you'd expect, 40MB res and
~100MB vmem.

Now I didn't write it so I'm not sure what extra stuff the MPI version
does, and we haven't tracked down the large memory grabs.

Could it be that this vmem is being grabbed by the OpenMPI memory
manager rather than directly by the app?

Ciao
Terry


-- 
Dr. Terry Frankcombe
Research School of Chemistry, Australian National University
Ph: (+61) 0417 163 509Skype: terry.frankcombe



Re: [OMPI users] mpif77 report Old-style type declaration

2008-05-08 Thread Terry Frankcombe
It's a function of your underlying compiler.  Try:

mpif77 --showme

to see how it's being invoked.


On Thu, 2008-05-08 at 16:55 +0800, Wen Hao Wang wrote:
> Hi all:
> 
> I am using openmpi to compile Fortran program on RHEL5.2 x86 machine.
> Open MPI 1.2.5-2 is installed. mpif77 gave following error.
> 
> [root@xblade08 dtyp]# mpif77 -c -o freal16_f.o freal16_f.f
> In file freal16_f.f:67
> 
> real*16 real16
> 1
> Error: Old-style type declaration REAL*16 not supported at (1)
> [root@xblade08 dtyp]# vi freal16_f.f
> 67 real*16 real16
> [root@xblade08 dtyp]# which mpif77
> /usr/bin/mpif77
> [root@xblade08 dtyp]# file /usr/bin/mpif77
> /usr/bin/mpif77: symbolic link to `/etc/alternatives/mpif77'
> [root@xblade08 dtyp]# file /etc/alternatives/mpif77
> /etc/alternatives/mpif77: symbolic link to
> `/usr/bin/opal_wrapper-1.2.5-gcc-32'
> [root@xblade08 dtyp]# rpm -qf /usr/bin/opal_wrapper-1.2.5-gcc-32
> openmpi-devel-1.2.5-2.el5
> [root@xblade08 dtyp]# mpif77 -v
> Using built-in specs.
> Target: i386-redhat-linux
> Configured with: ../configure --prefix=/usr --mandir=/usr/share/man
> --infodir=/usr/share/info --enable-shared --enable-threads=posix
> --enable-checking=release --with-system-zlib --enable-__cxa_atexit
> --disable-libunwind-exceptions --enable-libgcj-multifile
> --enable-languages=c,c++,objc,obj-c++,java,fortran,ada
> --enable-java-awt=gtk --disable-dssi --enable-plugin
> --with-java-home=/usr/lib/jvm/java-1.4.2-gcj-1.4.2.0/jre
> --with-cpu=generic --host=i386-redhat-linux
> Thread model: posix
> gcc version 4.1.2 20071124 (Red Hat 4.1.2-41)
> 
> 
> Since mpif77 is a compiler wrapper, how can I get the detailed
> compiler, environment variables and arguments used to compile
> freal16_f.f? I have another file named freal8_f.f, with the line
> "real*8 real8". But mpif77 can compile it smoothly.
> 
> Thanks!
> 
> Wen Hao Wang
> Email: wangw...@cn.ibm.com
> 
> 
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users



Re: [OMPI users] openMPI + Ubuntu 7.10 puzzling

2008-04-21 Thread Terry Frankcombe

>> vrotival@vrotival-laptop:~/Work/workbench$ ompi_info --all
>> ompi_info: Symbol `mca_allocator_base_components' has different size in
>> shared object, consider re-linking
>
> Usually, this "has different size" means that libraries have changed
> since you've compiled your program.
>
>> updates in Ubuntu (although I do not want to incriminate it at first)
>
> Perhaps you got the Ubuntu Open MPI package. You should make sure that
>
>- ompi_info shows all the right paths
>- no openmpi-* package is installed
>
> If in doubt, rip out every installed "Open MPI" and recompile. (just to
> make sure you don't have a mix of different versions)


I'd also suggest having a close look at your synaptic logs to see what has
actually been upgraded.




Re: [OMPI users] mpicc -showme:compile output (possible bug report)

2008-04-17 Thread Terry Frankcombe
Given that discussion, might I suggest an (untested) workaround would be
to --prefix OpenMPI into a non-standard location?



On Wed, 2008-04-16 at 13:03 -0400, Jeff Squyres wrote:
> On Apr 16, 2008, at 9:38 AM, Crni Gorac wrote:
> >> mpicc (and friends) typically do not output -I only for "special"
> >> directories, such as /usr/include, because adding -I/usr/include may
> >> subvert the compiler's normal include directory search order.
> >
> > On my machine, "mpicc -showme:compile" outputs  "-pthread" only.
> 
> This would seem to indicate that OMPI is specifically choosing not to  
> display the -I/whatever flag, most likely because it would have been - 
> I/usr/include (or similar).
> 
> > I guess CMake is modeling MPI recognizing after MPICH, and that "-I"
> > flags appears in "mpicc -showme:compile" output there; tried to check
> > that, but latest MPICH (1.0.7) won't even compile on my machine...  In
> > any case, I reported the issue to the CMake bug tracker too.
> 
> 
> Ok.  If OMPI is installed with a prefix of /usr, I don't anticipate us  
> changing this behavior -- this exception is specifically implemented  
> to not subvert the normal compiler include directory search order.   
> Note, too, that the same issue will occur with -L in the --showme:link  
> line -- we don't display -L/usr/lib for the same reasons as described  
> above.
> 



Re: [OMPI users] An error occurred in MPI_comm_size

2008-04-07 Thread Terry Frankcombe

On Mon, 2008-04-07 at 09:08 -0700, yacob sen wrote:
> 
> 
> Dear All,
> 
> I have just installed openmpi/mpich on ubuntu 7.10 on my Linux machine,
> which has a dual processor.

That's a little bit like saying you've installed Ubuntu/Red Hat on your
machine.

OpenMPI and MPICH are different implementations of the same standard.  You don't
need to/probably shouldn't install both.  Having to copy mpif.h locally
sounds like MPICH.  You certainly don't want to do this with OpenMPI.
You need to work out exactly which mpif90 you're invoking.  If it's
OpenMPI, then continue here, the OpenMPI mailing list.  If it's MPICH,
try the MPICH mailing lists.  Either way, what's happening will
certainly be clearer if you uninstall one of OpenMPI/MPICH.
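
A quick way to see which wrapper you are actually invoking (a sketch; note
that -showme is Open MPI's flag, while MPICH's wrappers use -show):

which mpif90
mpif90 -showme     # prints the underlying command line if this is Open MPI's wrapper
mpif90 -show       # the MPICH equivalent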


> I compiled my fortran program as follows:
> 
> mpif90  add.f90 -o add_n
> 
> I was, however, forced to copy the "mpif.h" header into my working directory
> where I run my program, and I also inserted an additional line into
> the file "/etc/openmpi/openmpi-mca-params.conf", namely:
> btl=^openib.
> 
> 
> I have then run the program as:
> 
> mpirun -np 2 ./add_n  (here I use 2 processes, as my laptop has
> two processors)
> 
> What I got is the following error message :
> 
> [geosl063:13781] *** An error occurred in MPI_comm_size
> [geosl063:13780] *** An error occurred in MPI_comm_size
> [geosl063:13780] *** on communicator MPI_COMM_WORLD
> [geosl063:13780] *** MPI_ERR_COMM: invalid communicator
> [geosl063:13780] *** MPI_ERRORS_ARE_FATAL (goodbye)
> [geosl063:13781] *** on communicator MPI_COMM_WORLD
> [geosl063:13781] *** MPI_ERR_COMM: invalid communicator
> 
> 
> I used MPI calls to program my Fortran code. The program has been
> running on a Linux cluster. The point here is to develop my parallel
> program on my Linux laptop before I go and run it on the Linux cluster.
> 
> Any comments? I appreciate any comments.
> 
> Thank you so much
> 
> 
> Yacob
> 
> 
> 
> 



Re: [OMPI users] setup of a basic system on RHEL or Fedora

2008-04-03 Thread Terry Frankcombe


> One thing about running programs is that the binaries need to be in the 
> same absolute path on all systems. This means if you're running the 
> program from /home/me on system1, the program you're running must also 
> be in /home/me on all the other systems. OpenMPI will not transfer those 
> binaries for you. An easy way for this is have an NFS mount for your MPI 
> programs that all of the systems can access and run from there. The 
> system specs make no difference as long as you're not going to switch to 
> a high speed interconnect soon.

Someone can correct me if I'm wrong, but I do believe that the
executables don't need to be in the same place on every node, but they
do need to be on every node somewhere in that node's PATH.  Certainly,
consistent NFS-mounted filesystems are one of the easiest ways to achieve
this.
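
One way to sanity-check a cluster (a sketch; my_prog is just a placeholder
for your executable) is to let mpirun itself report what each node sees:

mpirun -np 4 --bynode sh -c 'hostname; which my_prog'
# every node should print a path to my_prog; alternatively, give mpirun an
# absolute path to the executable and sidestep PATH altogether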





Re: [OMPI users] Spawn problem

2008-03-31 Thread Terry Frankcombe
My C++ is a little rusty.  Is that returned intercommunicator going
where you think it is?  If you unroll the loop does the same badness
happen?


On Mon, 2008-03-31 at 02:41 -0300, Joao Vicente Lima wrote:
> Hi,
> sorry to bring this up again ... but I hope to use spawn in ompi someday :-D
> 
> The execution of spawn in this way works fine:
> MPI_Comm_spawn ("./spawn1", MPI_ARGV_NULL, 2, MPI_INFO_NULL, 0,
> MPI_COMM_SELF, , MPI_ERRCODES_IGNORE);
> 
> but if this code goes into a for loop I get a problem:
> for (i= 0; i < 2; i++)
> {
>   MPI_Comm_spawn ("./spawn1", MPI_ARGV_NULL, 1,
>   MPI_INFO_NULL, 0, MPI_COMM_SELF, [i], MPI_ERRCODES_IGNORE);
> }
> 
> and the error is:
> spawning ...
> child!
> child!
> [localhost:03892] *** Process received signal ***
> [localhost:03892] Signal: Segmentation fault (11)
> [localhost:03892] Signal code: Address not mapped (1)
> [localhost:03892] Failing at address: 0xc8
> [localhost:03892] [ 0] /lib/libpthread.so.0 [0x2ac71ca8bed0]
> [localhost:03892] [ 1]
> /usr/local/mpi/ompi-svn/lib/libmpi.so.0(ompi_dpm_base_dyn_finalize+0xa3)
> [0x2ac71ba7448c]
> [localhost:03892] [ 2] /usr/local/mpi/ompi-svn/lib/libmpi.so.0 
> [0x2ac71b9decdf]
> [localhost:03892] [ 3] /usr/local/mpi/ompi-svn/lib/libmpi.so.0 
> [0x2ac71ba04765]
> [localhost:03892] [ 4]
> /usr/local/mpi/ompi-svn/lib/libmpi.so.0(PMPI_Finalize+0x71)
> [0x2ac71ba365c9]
> [localhost:03892] [ 5] ./spawn1(main+0xaa) [0x400ac2]
> [localhost:03892] [ 6] /lib/libc.so.6(__libc_start_main+0xf4) [0x2ac71ccb7b74]
> [localhost:03892] [ 7] ./spawn1 [0x400989]
> [localhost:03892] *** End of error message ***
> --
> mpirun noticed that process rank 0 with PID 3892 on node localhost
> exited on signal 11 (Segmentation fault).
> --
> 
> the attachments contain the ompi_info, config.log and program.
> 
> thanks for some check,
> Joao.



Re: [OMPI users] More on AlltoAll

2008-03-20 Thread Terry Frankcombe
If the data distribution was sufficiently predictable and long-lived
through the life of the application, could one not define new
communicators to clean up the calls?


> After reading the previous discussion on AllReduce and AlltoAll, I
> thought I would ask my question. I have a case where I have data
> unevenly distributed among the processes (unevenly means that the
> processes have differing amounts of data) that I need to globally
> redistribute, resulting in a different uneven distribution. Writing the
> code to do the redistribution using AlltoAll is straightforward.
>
> The problem though is that there are often special cases where each
> process only needs to exchange data with it neighbors. So the question
> is that when two processors don't have data to exchange, is the OpenMPI
> AlltoAll is written in such a way so that they don't do any
> communication? Will the AlltoAll be as efficient (or at least nearly as
> efficient) as direct send/recv among neighbors?
>   Thanks!
> Dave
>




Re: [OMPI users] ScaLapack and BLACS on Leopard

2008-03-06 Thread Terry Frankcombe

> Intel ifort is 7.7x faster than Linux g95 on MacPro 3.0 GHz
> Intel ifort is 2.9x faster than Linux g95 on Dual Opteron 1.4 GHz
> Intel ifort is 1.8x faster than Linux g95 on SGI Altix 350 dual Itanium2 1.4 GHz
> OS X g95 is 2.7x faster than Linux g95 on a MacPro 2.66 GHz (same hardware exactly)

That ordering makes little sense to me.  The Intel compilers should be
the most effective on Itanium, where a lot of functionality has been
moved out of hardware and foisted onto the compiler!

Have you done these tests with a recent gfortran?  (Certainly the gcc
people would want to know about it as these must be missed
optimisations.  But only for the gfortran case.)



Re: [OMPI users] OpenMPI on intel core 2 duo machine for parallel computation.

2008-02-28 Thread Terry Frankcombe

On Thu, 2008-02-28 at 16:32 -0600, Chembeti, Ramesh (S) wrote:
> Dear All,
> 
> I am a graduate student working on molecular dynamics simulation. My
> professor/adviser is planning to buy Linux-based clusters. But before that he
> wanted me to parallelize a serial molecular dynamics code and
> test it on an Intel Core 2 Duo machine with Fedora 8 on it. I have parallelised
> my code in Fortran 77 using MPI. I have installed OpenMPI, and I compile the
> code using mpif77 -g -o code code.f and run it using
> mpirun -np 2 ./code. I have a couple of questions to ask you:


You have actually parallelised it, right?  As in built parallelisation
with MPI_ calls?





Re: [OMPI users] process placement with toruqe and OpenMP

2008-01-29 Thread Terry Frankcombe
> Ok so I ask the mpirun masters how would you do the following:
>
> I submit a job with torque (we use --with-tm) like the following:
>
> nodes=4:ppn=2
>
> My desired outcome is to have 1 mpi process per 2 cpus and use
> threaded blas (or my own OpenMP take your pick)

For the above, -np 4 --bynode should work.
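
For example (a sketch; the executable name and thread count are made up, and
-x simply exports the named environment variable to the launched processes):

export OMP_NUM_THREADS=2
mpirun -np 4 --bynode -x OMP_NUM_THREADS ./hybrid_code
# one MPI process per node, each free to spin up 2 OpenMP threads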



>
> Our cluster has some 4 core machines thus the above job sometimes
> ends up looking like
>
> nodes=1:ppn=4+nodes=2:ppn=2
>
> The mpirun -bynode command will work in the case where I get 4 nodes with
> only 2 cpus free.  But if any machine other than the first machine is
> my node with 4 cores free given to me by moab, I would end up
> starting an extra process on the first node, where mpirun thinks
> another cpu is free, but that cpu is really to be used by OpenMP, and
> the last process should be placed on the node that has 4 cpus free.
>
> I hope that wasn't too confusing.  It's about how I launch hybrid jobs and
> make sure the processes started by mpirun go where I want when my nodes
> have different core counts, and I am running via torque so using -H
> won't%



Re: [OMPI users] Excessive Use of CPU System Resources with OpenMPI 1.2.4 using TCP only ..

2008-01-23 Thread Terry Frankcombe
On Tue, 2008-01-22 at 20:19 +0100, Pignot Geoffroy wrote:
> You could try the following MCA setting in your mpirun command
> --mca mpi_yield_when_idle 1

Yes, but to repeat what was said above, it is first essential that you
read:

and the related






Re: [OMPI users] Excessive Use of CPU System Resources with OpenMPI 1.2.4 using TCP only ..

2008-01-22 Thread Terry Frankcombe
Well, I have noticed that when a process is waiting for communication
from another process the reported CPU usage remains around 100%.  Is
that what you mean?  I haven't explored whether these processes give way
to other active processes under the linux scheduler, nor whether I
should expect anything different.  This has been with 1.2.3 and 1.2.4.
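
If the busy-polling itself is the concern, the MCA parameter suggested
elsewhere in this thread can be tried (a sketch; ./ring stands in for the
canonical ring test program):

mpirun --mca mpi_yield_when_idle 1 -np 2 ./ring
# the waiting process yields the CPU instead of spinning, at some cost in latency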



On Tue, 2008-01-22 at 16:48 +1100, Graham Jenkins wrote:
> We've observed an excessive use of CPU system resources with OpenMPI
> 1.2.4 using TCP connections only on our SL5 x86_64 Cluster. Typically,
> for a simple Canonical Ring Program, we're seeing between 30 and 70%
> system usage.
> 
> Has anybody else noticed this sort of behaviour?
> And does anybody have some suggestions for resolving the issue?
> 
> Present values we have are:
> --
> ompi_info --param btl tcp |grep MCA
>  MCA btl: parameter "btl_base_debug" (current value: "0")
>  MCA btl: parameter "btl" (current value: )
>  MCA btl: parameter "btl_base_verbose" (current value: "0")
>  MCA btl: parameter "btl_tcp_if_include" (current value: "eth0")
>  MCA btl: parameter "btl_tcp_if_exclude" (current value: "lo")
>  MCA btl: parameter "btl_tcp_free_list_num" (current value: "8")
>  MCA btl: parameter "btl_tcp_free_list_max" (current value: "-1")
>  MCA btl: parameter "btl_tcp_free_list_inc" (current value: "32")
>  MCA btl: parameter "btl_tcp_sndbuf" (current value: "131072")
>  MCA btl: parameter "btl_tcp_rcvbuf" (current value: "131072")
>  MCA btl: parameter "btl_tcp_endpoint_cache" (current value: "30720")
>  MCA btl: parameter "btl_tcp_exclusivity" (current value: "0")
>  MCA btl: parameter "btl_tcp_eager_limit" (current value: "65536")
>  MCA btl: parameter "btl_tcp_min_send_size" (current value: "65536")
>  MCA btl: parameter "btl_tcp_max_send_size" (current value: "131072")
>  MCA btl: parameter "btl_tcp_min_rdma_size" (current value: "131072")
>  MCA btl: parameter "btl_tcp_max_rdma_size" (current value: "2147483647")
>  MCA btl: parameter "btl_tcp_flags" (current value: "122")
>  MCA btl: parameter "btl_tcp_priority" (current value: "0")
>  MCA btl: parameter "btl_base_warn_component_unused" (current value: "1")
> 



[OMPI users] Memory manager

2007-11-20 Thread Terry Frankcombe
Hi folks

I posted this to the devel list the other day, but it raised no
responses.  Maybe people will have more to say here.


Questions:  How much does using the MPI wrappers influence the memory
management at runtime?  What has changed in this regard from 1.2.3 to
1.2.4?


The reason I ask is that I have an f90 code that does very strange
things.  The structure of the code is not all that straightforward, with
a "tree" of modules usually allocating their own storage (all with save
applied globally within the module).  Compiling with OpenMPI 1.2.4
coupled to a gcc 4.3.0 prerelease and running as a single process (with
no explicit mpirun), the elements of one particular array seem to revert
to previous values between where they are set and a later part of the
code.  (I'll refer to this as The Bug, and having the matrix elements
stay as set as "expected behaviour".)

The most obvious explanation would be a coding error.  However,
compiling and running this with OpenMPI 1.2.3 gives me the expected
behaviour!  As does compiling and running with a different MPI
implementation and compiler set.  Replacing the prerelease gcc 4.3.0
with the released 4.2.2 makes no change.

The Bug is unstable.  Removing calls to various routines in used modules
(that otherwise do not affect the results) returns to expected behaviour
at runtime.  Removing a call to MPI_Recv that is never called returns to
expected behaviour.

Because of this I can't reduce the problem to a small testcase, and so
have not included any code at this stage.

If I run the code with mpirun -np 1 the problem goes away.  So one could
presumably simply say "always run it with mpirun."  But if this is
required, why does OpenMPI not detect it?  And why the difference
between 1.2.3 and 1.2.4?

Does anyone care to comment?

Ciao
Terry


-- 
Dr Terry Frankcombe
Physical Chemistry, Department of Chemistry
Göteborgs Universitet
SE-412 96 Göteborg Sweden
Ph: +46 76 224 0887   Skype: terry.frankcombe
<te...@chem.gu.se>



Re: [OMPI users] error -- libtool unsupported hardcode properties

2007-06-20 Thread Terry Frankcombe
Isn't this another case of trying to use two different Fortran compilers
at the same time?
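
If so, configuring with a matched pair is the usual cure; roughly (untested,
reusing the flags from the report below):

export F77=pathf90 FC=pathf90
./configure --prefix=$PREFIX --enable-debug --enable-mpi-f77 --enable-mpi-f90 --with-openib=/usr
# or use gfortran for both F77 and FC if pathf90 is not actually needed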


On Tue, 2007-06-19 at 20:04 -0400, Jeff Squyres wrote:
> I have not seen this before -- did you look in the libtool  
> documentation?  ("See the libtool documentation for more information.")
> 
> On Jun 19, 2007, at 6:46 PM, Andrew Friedley wrote:
> 
> > I'm trying to build Open MPI v1.2.2 with gcc/g++/g77 3.4.4 and  
> > pathf90 v2.4 on a linux system, and see this error when compiling  
> > ompi_info:
> >
> > /bin/sh ../../../libtool --tag=CXX --mode=link g++ -g -O2 -finline-functions -pthread -export-dynamic -o ompi_info components.o ompi_info.o output.o param.o version.o ../../../ompi/libmpi.la -lnsl -lutil -lm
> > libtool: link: unsupported hardcode properties
> > libtool: link: See the libtool documentation for more information.
> > libtool: link: Fatal configuration error.
> > make[2]: *** [ompi_info] Error 1
> > make[2]: Leaving directory `/g/g21/afriedle/work/ompibuild/openmpi-1.2.2/ompi/tools/ompi_info'
> > make[1]: *** [all-recursive] Error 1
> >
> > Google didn't turn anything up.  Strange thing is, gcc 3.4.5 works  
> > just fine on this system.  I'm using this to build:
> >
> > export CC=gcc
> > export CXX=g++
> > export F77=g77
> > export FC=pathf90
> > export CFLAGS="-g -O2"
> > export CXXFLAGS="-g -O2"
> > export FFLAGS="-fno-second-underscore -g -O2"
> > export FCFLAGS="-fno-second-underscore -g -O2"
> > export PREFIX=$ROOT/gnudbg
> >
> > ./configure --prefix=$PREFIX --enable-debug --enable-mpi-f77 --enable-mpi-f90 --with-openib=/usr
> >
> > I've attached the config.log.. any ideas?
> >
> > Andrew
> > 
> > 
> 
> 



Re: [OMPI users] f90 support not built with gfortran?

2007-06-12 Thread Terry Frankcombe
On Mon, 2007-06-11 at 12:10 -0500, Jeff Pummill wrote:
> Greetings all,
> 
> I downloaded and configured v1.2.2 this morning on an Opteron cluster
> using the following configure directives...
> 
> ./configure --prefix=/share/apps CC=gcc CXX=g++ F77=g77 FC=gfortran
> CFLAGS=-m64 CXXFLAGS=-m64 FFLAGS=-m64 FCFLAGS=-m64 


What does config.log say?  (Look for 'Fortran 90'.)  config.log should
be your first port of call when trying to debug build problems in any
"configure"-d project.





Re: [OMPI users] Library Definitions

2007-06-11 Thread Terry Frankcombe

Hi Victor

I'd suggest 3 seconds of CPU time is far, far too small a problem to do
scaling tests with.  Even with only 2 CPUs, I wouldn't go below 100
times that.
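
For instance, with the NPB binaries discussed below, simply moving up a
problem class (a sketch; the class B executables have to be built first)
gives run times long enough to say something meaningful:

mpirun -np 1 cg.B.1
mpirun -np 2 cg.B.2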


On Mon, 2007-06-11 at 01:10 -0700, victor marian wrote:
> Hi Jeff
> 
> I ran the NAS Parallel Benchmark and this is what it gives me:
> -bash%/export/home/vmarian/fortran/benchmarks/NPB3.2/NPB3.2-MPI/bin$
> mpirun -np 1 cg.A.1
> --
> [0,1,0]: uDAPL on host SERVSOLARIS was unable to find
> any NICs.
> Another transport will be used instead, although this
> may result in
> lower performance.
> --
>  NAS Parallel Benchmarks 3.2 -- CG Benchmark
> 
>  Size:  14000
>  Iterations:15
>  Number of active processes: 1
>  Number of nonzeroes per row:   11
>  Eigenvalue shift: .200E+02
>  Benchmark completed
>  VERIFICATION SUCCESSFUL
>  Zeta is  0.171302350540E+02
>  Error is 0.512264003323E-13
> 
> 
>  CG Benchmark Completed.
>  Class   =A
>  Size=14000
>  Iterations  =   15
>  Time in seconds = 3.02
>  Total processes =1
>  Compiled procs  =1
>  Mop/s total =   495.93
>  Mop/s/process   =   495.93
>  Operation type  =   floating point
>  Verification=   SUCCESSFUL
>  Version =  3.2
>  Compile date=  11 Jun 2007
> 
> 
> -bash%/export/home/vmarian/fortran/benchmarks/NPB3.2/NPB3.2-MPI/bin$
> mpirun -np 2 cg.A.2
> --
> [0,1,0]: uDAPL on host SERVSOLARIS was unable to find
> any NICs.
> Another transport will be used instead, although this
> may result in
> lower performance.
> --
> --
> [0,1,1]: uDAPL on host SERVSOLARIS was unable to find
> any NICs.
> Another transport will be used instead, although this
> may result in
> lower performance.
> --
> 
> 
>  NAS Parallel Benchmarks 3.2 -- CG Benchmark
> 
>  Size:  14000
>  Iterations:15
>  Number of active processes: 2
>  Number of nonzeroes per row:   11
>  Eigenvalue shift: .200E+02
> 
>  Benchmark completed
>  VERIFICATION SUCCESSFUL
>  Zeta is  0.171302350540E+02
>  Error is 0.522633719989E-13
> 
> 
>  CG Benchmark Completed.
>  Class   =A
>  Size=14000
>  Iterations  =   15
>  Time in seconds = 2.47
>  Total processes =2
>  Compiled procs  =2
>  Mop/s total =   606.32
>  Mop/s/process   =   303.16
>  Operation type  =   floating point
>  Verification=   SUCCESSFUL
>  Version =  3.2
>  Compile date=  11 Jun 2007
> 
> 
> You can notice that the scaling is not as good
> as yours. Maybe I am having communications problems
> between processors.
>    You can also notice that I am faster on one process
> compared to your processor.
> 
>Victor
> 
> 
> 
> 
> 
> --- Jeff Pummill  wrote:
> 
> > Perfect! Thanks Jeff!
> > 
> > The NAS Parallel Benchmark on a dual core AMD
> > machine now returns this...
> > [jpummil@localhost bin]$ mpirun -np 1 cg.A.1
> > NAS Parallel Benchmarks 3.2 -- CG Benchmark
> > CG Benchmark Completed.
> >  Class   =A
> >  Size=14000
> >  Iterations  =   15
> >  Time in seconds = 4.75
> >  Total processes =1
> >  Compiled procs  =1
> >  Mop/s total =   315.32
> > 
> > ...and...
> > 
> > [jpummil@localhost bin]$ mpirun -np 2 cg.A.2
> > NAS Parallel Benchmarks 3.2 -- CG Benchmark
> >  CG Benchmark Completed.
> >  Class   =A
> >  Size=14000
> >  Iterations  =   15
> >  Time in seconds = 2.48
> >  Total processes =2
> >  Compiled procs  =2
> >  Mop/s total =   604.46
> > 
> > Not quite linear, but one must account for all of
> > the OS traffic that 
> > one core or the other must deal with.
> > 
> > 
> > Jeff F. Pummill
> > Senior Linux Cluster Administrator
> > University of Arkansas
> > Fayetteville, Arkansas 72701
> > (479) 575 - 4590
> > http://hpc.uark.edu
> > 
> > "A supercomputer is a device for turning
> > 

Re: [OMPI users] Newbie question. Please help.

2007-05-10 Thread Terry Frankcombe

I have previously been running parallel VASP happily with an old,
prerelease version of OpenMPI:


[terry@nocona Vasp.4.6-OpenMPI]$
head /home/terry/Install_trees/OpenMPI-1.0rc6/config.log
This file contains any messages produced by compilers while
running configure, to aid debugging if configure makes a mistake.

It was created by Open MPI configure 1.0rc6, which was
generated by GNU Autoconf 2.59.  Invocation command line was

  $ ./configure --enable-static --disable-shared
--prefix=/home/terry/bin/Local --enable-picky --disable-heterogeneous
--without-libnuma --without-slurm --without-tm F77=ifort



In my VASP makefile:

FC=/home/terry/bin/Local/bin/mpif90

OFLAG= -O3 -xP -tpp7

CPP = $(CPP_) -DMPI  -DHOST=\"LinuxIFC\" -DIFC -Dkind8 -DNGZhalf
-DCACHE_SIZE=12000 -DPGF90 -Davoidalloc -DMPI_BLOCK=500 -DRPROMU_DGEMV
-DRACCMU_DGEMV

FFLAGS =  -FR -lowercase -assume byterecl

As far as I can see (it was a long time ago!) I didn't use BLACS or
SCALAPACK libraries.  I used ATLAS.



Maybe this will help.


-- 
Dr Terry Frankcombe
Physical Chemistry, Department of Chemistry
Göteborgs Universitet
SE-412 96 Göteborg Sweden
Ph: +46 76 224 0887   Skype: terry.frankcombe
<te...@chem.gu.se>