Re: [OMPI users] What is the best choice of pml and btl for intranode communication

2023-03-06 Thread Edgar Gabriel via users
[AMD Official Use Only - General]

UCX will disqualify itself unless it finds a CUDA, ROCm, or InfiniBand network to
use. To allow UCX to run on a regular shared-memory job without GPUs or IB, you
have to set the UCX_TLS environment variable explicitly to allow UCX to use shared
memory, e.g.:

mpirun -x UCX_TLS=shm,self,ib --mca pml ucx ...

(I think you can also set UCX_TLS=all but am not entirely sure)
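A minimal sketch of the two ways to pass that setting for a single-node, shared-memory-only run (the transport list follows the message above; the UCX_TLS=all variant is the unverified suggestion from the same message, and ucx_info -d will show which transports are actually available on a given system):

# export once in the environment instead of passing -x to mpirun
export UCX_TLS=shm,self
mpirun --mca pml ucx -np 32 ./perf

# or, untested, let UCX choose from all available transports
mpirun -x UCX_TLS=all --mca pml ucx -np 32 ./perf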

Thanks
Edgar


From: users  On Behalf Of George Bosilca via 
users
Sent: Monday, March 6, 2023 8:56 AM
To: Open MPI Users 
Cc: George Bosilca 
Subject: Re: [OMPI users] What is the best choice of pml and btl for intranode 
communication

The ucx PML should work just fine even in a single-node scenario. As Jeff indicated,
you need to move the MCA param `--mca pml ucx` before your command.

  George.


On Mon, Mar 6, 2023 at 9:48 AM Jeff Squyres (jsquyres) via users
<users@lists.open-mpi.org> wrote:
If this run was on a single node, then UCX probably disabled itself since it 
wouldn't be using InfiniBand or RoCE to communicate between peers.

Also, I'm not sure your command line was correct:


perf_benchmark $ mpirun -np 32 --map-by core --bind-to core ./perf  --mca pml 
ucx

You probably need to list all of mpirun's CLI options before you list the
./perf executable.  In its left-to-right traversal, once mpirun hits a CLI
option it does not recognize (e.g., "./perf"), it assumes that it is the user's
executable name, and does not process the CLI options to the right of that.
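In other words, keeping the same 32-rank example, the command would need to look something like:

mpirun --mca pml ucx -np 32 --map-by core --bind-to core ./perf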

Hence, the output you show must have forced the UCX PML another way -- perhaps 
you set an environment variable or something?


From: users <users-boun...@lists.open-mpi.org> on behalf of Chandran, Arun via users <users@lists.open-mpi.org>
Sent: Monday, March 6, 2023 3:33 AM
To: Open MPI Users <users@lists.open-mpi.org>
Cc: Chandran, Arun <arun.chand...@amd.com>
Subject: Re: [OMPI users] What is the best choice of pml and btl for intranode 
communication


[Public]



Hi Gilles,



Thanks very much for the information.



I was looking for the best pml + btl combination for a standalone intra-node run
with a high task count (>= 192) and no HPC-class networking installed.



Just now realized that I can't use pml/ucx for such cases, as it is unable to find
IB and fails.



perf_benchmark $ mpirun -np 32 --map-by core --bind-to core ./perf  --mca pml 
ucx

--

No components were able to be opened in the pml framework.



This typically means that either no components of this type were

installed, or none of the installed components can be loaded.

Sometimes this means that shared libraries required by these

components are unable to be found/loaded.



  Host:  lib-ssp-04

  Framework: pml

--

[lib-ssp-04:753542] PML ucx cannot be selected

[lib-ssp-04:753531] PML ucx cannot be selected

[lib-ssp-04:753541] PML ucx cannot be selected

[lib-ssp-04:753539] PML ucx cannot be selected

[lib-ssp-04:753545] PML ucx cannot be selected

[lib-ssp-04:753547] PML ucx cannot be selected

[lib-ssp-04:753572] PML ucx cannot be selected

[lib-ssp-04:753538] PML ucx cannot be selected

[lib-ssp-04:753530] PML ucx cannot be selected

[lib-ssp-04:753537] PML ucx cannot be selected

[lib-ssp-04:753546] PML ucx cannot be selected

[lib-ssp-04:753544] PML ucx cannot be selected

[lib-ssp-04:753570] PML ucx cannot be selected

[lib-ssp-04:753567] PML ucx cannot be selected

[lib-ssp-04:753534] PML ucx cannot be selected

[lib-ssp-04:753592] PML ucx cannot be selected

[lib-ssp-04:753529] PML ucx cannot be selected





That means my only choice is pml/ob1 + btl/vader.



--Arun



From: users <users-boun...@lists.open-mpi.org> On Behalf Of Gilles Gouaillardet via users
Sent: Monday, March 6, 2023 12:56 PM
To: Open MPI Users <users@lists.open-mpi.org>
Cc: Gilles Gouaillardet <gilles.gouaillar...@gmail.com>
Subject: Re: [OMPI users] What is the best choice of pml and btl for intranode 
communication






Arun,



First Open MPI selects a pml for **all** the MPI tasks (for example, pml/ucx or 
pml/ob1)



Then, if pml/ob1 ends up being selected, a btl component (e.g. btl/uct, 
btl/vader) is used for each pair of MPI tasks

(tasks on the same node will use btl/vader, tasks on different nodes will use 
btl/uct)



Note that if UCX is available, pml/ucx takes the highest priority, so no btl is 
involved

(in your case, it means intra-node communications will be handled by UCX and
not btl/vader).

You can force ob1 and try different combinations of btl with

mpirun --mca pml ob1 --mca btl self,<btl1>,<btl2> ...
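As a concrete instance of that template for the single-node case discussed in this thread (a sketch using the btl names mentioned above; ompi_info | grep btl lists the btl components actually built into an installation):

mpirun --mca pml ob1 --mca btl self,vader -np 32 --map-by core --bind-to core ./perf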



I expect pml/ucx is faster than pml/ob1 with btl/uct for inter node 
communications.



I have not 

Re: [OMPI users] GPU direct in OMPIO?

2022-12-05 Thread Edgar Gabriel via users
There was work done in ompio in that direction, but the code wasn’t actually 
committed into the main repository. It probably exists somewhere in a branch 
somewhere. If you are interested, please ping me directly and I can put you in 
contact with the person that wrote the code and to clarify the precise status.

Thanks
Edgar

From: users  On Behalf Of Jim Edwards via 
users
Sent: Monday, December 5, 2022 8:16 AM
To: Open MPI Users 
Cc: Jim Edwards 
Subject: [OMPI users] GPU direct in OMPIO?

Greetings,

Does the OMPIO library support GPU-Direct IO?  NVIDIA seems to suggest that it 
does,
 but I can't find details or examples.

--
Jim Edwards
CESM Software Engineer
National Center for Atmospheric Research
Boulder, CO


Re: [OMPI users] CephFS and striping_factor

2022-11-29 Thread Edgar Gabriel via users
[AMD Official Use Only - General]

I can also offer to help if there are any question regarding the ompio code, 
but I do not have the bandwidth/resources to do that myself, and more 
importantly, I do not have a platform to test the new component.
Edgar

From: users  On Behalf Of Jeff Squyres 
(jsquyres) via users
Sent: Tuesday, November 29, 2022 9:16 AM
To: users@lists.open-mpi.org
Cc: Jeff Squyres (jsquyres) 
Subject: Re: [OMPI users] CephFS and striping_factor

More specifically, Gilles created a skeleton "ceph" component in this draft 
pull request: https://github.com/open-mpi/ompi/pull/11122

If anyone has any cycles to work on it and develop it beyond the skeleton that 
is currently there, that would be great!

--
Jeff Squyres
jsquy...@cisco.com

From: users <users-boun...@lists.open-mpi.org> on behalf of Gilles Gouaillardet via users <users@lists.open-mpi.org>
Sent: Monday, November 28, 2022 9:48 PM
To: users@lists.open-mpi.org <users@lists.open-mpi.org>
Cc: Gilles Gouaillardet <gil...@rist.or.jp>
Subject: Re: [OMPI users] CephFS and striping_factor

Hi Eric,


Currently, Open MPI does not provide specific support for CephFS.

MPI-IO is either implemented by ROMIO (imported from MPICH, it does not
support CephFS today)

or the "native" ompio component (that also does not support CephFS today).


A proof of concept for CephFS in ompio might not be a huge amount of work for
someone motivated:

That could be as simple as (so to speak, since things are generally not
easy) creating a new fs/ceph component

(e.g. in ompi/mca/fs/ceph) and implementing the "file_open" callback that
uses the ceph API.

I think the fs/lustre component can be used as an inspiration.


I cannot commit to do this, but if you are willing to take a crack at
it, I can create such a component

so you can go directly to implementing the callback without spending too
much time on some Open MPI internals

(e.g. component creation).
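To make the file_open idea a bit more concrete, here is a rough, untested standalone sketch of the libcephfs call such a callback would revolve around (ceph_open_layout, from the header cited below). The path, the 4 MiB stripe unit/object size, and the stripe_count value standing in for an MPI "striping_factor" hint are all placeholder assumptions, and a real fs/ceph component would of course live inside the ompio fs framework rather than stand alone:

#include <stdio.h>
#include <fcntl.h>
#include <cephfs/libcephfs.h>

int main(void)
{
    struct ceph_mount_info *cmount;
    int fd;
    int stripe_count = 8;                      /* hypothetical striping_factor */

    /* Connect using the default client id and config search path. */
    if (ceph_create(&cmount, NULL) != 0)
        return 1;
    ceph_conf_read_file(cmount, NULL);         /* e.g. /etc/ceph/ceph.conf */
    if (ceph_mount(cmount, "/") != 0)
        return 1;

    /* Create the file with an explicit layout. */
    fd = ceph_open_layout(cmount, "/testfile", O_CREAT | O_WRONLY, 0644,
                          4 * 1024 * 1024,     /* stripe_unit  */
                          stripe_count,        /* stripe_count */
                          4 * 1024 * 1024,     /* object_size  */
                          NULL);               /* default data pool */
    if (fd >= 0)
        ceph_close(cmount, fd);
    else
        fprintf(stderr, "ceph_open_layout failed\n");

    ceph_unmount(cmount);
    ceph_release(cmount);
    return 0;
}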



Cheers,


Gilles


On 11/29/2022 6:55 AM, Eric Chamberland via users wrote:
> Hi,
>
> I would like to know if OpenMPI is supporting file creation with
> "striping_factor" for CephFS?
>
> According to CephFS library, I *think* it would be possible to do it
> at file creation with "ceph_open_layout".
>
> https://github.com/ceph/ceph/blob/main/src/include/cephfs/libcephfs.h
>
> Is it a possible future enhancement?
>
> Thanks,
>
> Eric
>


Re: [OMPI users] MPI I/O, Romio vs Ompio on GPFS

2022-06-14 Thread Edgar Gabriel via users
Hi,

There are a few things that you could test to see whether they make difference.


  1.  Try to modify the number of aggregators used in collective I/O (assuming 
that the code uses collective I/O). You could try e.g. to set it to the number 
of nodes used (the algorithm determining the number of aggregators 
automatically is sometimes overly aggressive). E.g.



mpirun --mca io_ompio_num_aggregators 16 -np 256 ./executable_name



(assuming here that you run 256 processes distributed on 16 nodes). Based on
our tests from a while back, GPFS was not super sensitive to this, but you never
know; it's worth a try.



  2.  If your data is large and mostly contiguous, you could try to disable
data sieving for write operations, e.g.



mpirun --mca fbtl_posix_write_datasieving 0 -np 256 ./…

Let me know if these make a difference. There are quite a few info
objects that the gpfs fs component understands and that could potentially be
used to tune the performance, but I do not have experience with them; they are
based on code contributed by HLRS a couple of years ago. You can still have
a look at them and see whether some of them would make sense (source location:
ompi/ompi/mca/fs/gpfs/fs_gpfs_file_set_info.c).
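For reference, hints like these are normally passed through an MPI_Info object at file-open time. A minimal sketch, where the key name "some_gpfs_hint" and its value are purely placeholders (the real key names would have to be taken from fs_gpfs_file_set_info.c):

#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_File fh;
    MPI_Info info;

    MPI_Init(&argc, &argv);

    /* Build an info object carrying the hint(s) for the fs component. */
    MPI_Info_create(&info);
    MPI_Info_set(info, "some_gpfs_hint", "value");   /* placeholder key/value */

    MPI_File_open(MPI_COMM_WORLD, "output.dat",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, info, &fh);

    /* ... collective writes would go here ... */

    MPI_File_close(&fh);
    MPI_Info_free(&info);
    MPI_Finalize();
    return 0;
}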

Thanks
Edgar


From: users  On Behalf Of Eric Chamberland 
via users
Sent: Saturday, June 11, 2022 9:28 PM
To: Open MPI Users 
Cc: Eric Chamberland ; Ramses van Zon 
; Vivien Clauzon ; 
dave.mar...@giref.ulaval.ca; Thomas Briffard 
Subject: Re: [OMPI users] MPI I/O, Romio vs Ompio on GPFS


Hi,

just almost found what I wanted with "--mca io_base_verbose 100"

Now I am looking at performance on GPFS and I must say Open MPI 4.1.2 performs
very poorly when it comes time to write.

I am launching 512 processes, which read+compute (ghost components of a mesh), and
then later write a 79 GB file.

Here are the timings (all in seconds):



IO module ;  reading+ghost computing ; writing

ompio   ;   24.9   ; 2040+ (job got killed before completion)

romio321 ;  20.8; 15.6



I have run the job many times with the ompio module (the default) and with romio,
and the timings are always similar to those given.

I also activated maximum debug output with " --mca mca_base_verbose 
stdout,level:9  --mca mpi_show_mca_params all --mca io_base_verbose 100" and 
got a few lines but nothing relevant to debug:

Sat Jun 11 20:08:28 2022:chrono::ecritMaillageMPI::debut VmSize: 
6530408 VmRSS: 5599604 VmPeak: 7706396 VmData: 5734408 VmHWM: 5699324 
Sat Jun 11 20:08:28 2022:[nia0073.scinet.local:236683] io:base:delete: 
deleting file: resultat01_-2.mail
Sat Jun 11 20:08:28 2022:[nia0073.scinet.local:236683] io:base:delete: 
Checking all available modules
Sat Jun 11 20:08:28 2022:[nia0073.scinet.local:236683] io:base:delete: 
component available: ompio, priority: 30
Sat Jun 11 20:08:28 2022:[nia0073.scinet.local:236683] io:base:delete: 
component available: romio321, priority: 10
Sat Jun 11 20:08:28 2022:[nia0073.scinet.local:236683] io:base:delete: 
Selected io component ompio
Sat Jun 11 20:08:28 2022:[nia0073.scinet.local:236683] 
io:base:file_select: new file: resultat01_-2.mail
Sat Jun 11 20:08:28 2022:[nia0073.scinet.local:236683] 
io:base:file_select: Checking all available modules
Sat Jun 11 20:08:28 2022:[nia0073.scinet.local:236683] 
io:base:file_select: component available: ompio, priority: 30
Sat Jun 11 20:08:28 2022:[nia0073.scinet.local:236683] 
io:base:file_select: component available: romio321, priority: 10
Sat Jun 11 20:08:28 2022:[nia0073.scinet.local:236683] 
io:base:file_select: Selected io module ompio

What else can I do to dig into this?

Are there parameters ompio is aware of with GPFS?

Thanks,

Eric

--

Eric Chamberland, ing., M. Ing

Professionnel de recherche

GIREF/Université Laval

(418) 656-2131 poste 41 22 42
On 2022-06-10 16:23, Eric Chamberland via users wrote:
Hi,

I want to try romio with OpenMPI 4.1.2 because I am observing a big performance 
difference with IntelMPI on GPFS.

I want to see, at *runtime*, all parameters (default values, names) used by MPI 
(at least for the "io" framework).

I would like to have all the same output as "ompi_info --all" gives me...

I have tried this:

mpiexec --mca io romio321  --mca mca_verbose 1  --mca mpi_show_mca_params 1 
--mca io_base_verbose 1 ...

But I cannot see anything about io coming out...

With "ompi_info" I do...

Is it possible?

Thanks,

Eric


--

Eric Chamberland, ing., M. Ing

Professionnel de recherche

GIREF/Université Laval

(418) 656-2131 poste 41 22 42
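As a side note on the runtime-parameter question quoted above: the output of ompi_info --all can be narrowed to the io framework, which should come close to what was asked for (the exact --level needed to expose every parameter may vary between releases):

ompi_info --param io all --level 9
ompi_info --param io ompio --level 9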


Re: [OMPI users] mpi send/recv pair hangin

2018-04-05 Thread Edgar Gabriel
Is the file I/O that you mentioned using MPI I/O? If yes, what
file system are you writing to?


Edgar


On 4/5/2018 10:15 AM, Noam Bernstein wrote:

On Apr 5, 2018, at 11:03 AM, Reuti  wrote:

Hi,


Am 05.04.2018 um 16:16 schrieb Noam Bernstein :

Hi all - I have a code that uses MPI (vasp), and it’s hanging in a strange way. 
 Basically, there’s a Cartesian communicator, 4x16 (64 processes total), and 
despite the fact that the communication pattern is rather regular, one 
particular send/recv pair hangs consistently.  Basically, across each row of 4, 
task 0 receives from 1,2,3, and tasks 1,2,3 send to 0.  On most of the 16 such 
sets all those send/recv pairs complete.  However, on 2 of them, it hangs (both 
the send and recv).  I have stack traces (with gdb -p on the running processes) 
from what I believe are corresponding send/recv pairs.



This is with OpenMPI 3.0.1 (same for 3.0.0, haven’t checked older versions), 
Intel compilers (17.2.174). It seems to be independent of which nodes, always 
happens on this pair of calls and happens after the code has been running for a 
while, and the same code for the other 14 sets of 4 work fine, suggesting that 
it’s an MPI issue, rather than an obvious bug in this code or a hardware 
problem.  Does anyone have any ideas, either about possible causes or how to 
debug things further?

Do you use scaLAPACK, and which type of BLAS/LAPACK? I used Intel MKL with the 
Intel compilers for VASP and found, that using in addition a self-compiled 
scaLAPACK is working fine in combination with Open MPI. Using Intel scaLAPACK 
and Intel MPI is also working fine. What I never got working was the 
combination Intel scaLAPACK and Open MPI – at one point one process got a 
message from a wrong rank IIRC. I tried both: the Intel supplied Open MPI 
version of scaLAPACK and also compiling the necessary interface on my own for 
Open MPI in $MKLROOT/interfaces/mklmpi with identical results.

MKL BLAS/LAPACK, with my own self-compiled scalapack, but in this run I set 
LSCALAPCK=.FALSE. I suppose I could try compiling without it just to test.  In 
any case, this is when it’s writing out the wavefunctions, which I would assume 
be unrelated to scalapack operations (unless they’re corrupting some low level 
MPI thing, I guess).


Noam

___
users mailing list
users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users



Re: [OMPI users] OMPI users] Installation of openmpi-1.10.7 fails

2018-01-23 Thread Edgar Gabriel

I ran all my tests with gcc 6.4

Thanks

Edgar


On 1/23/2018 7:40 AM, Vahid Askarpour wrote:

Gilles,

I have not tried compiling the latest openmpi with GCC. I am waiting 
to see how the intel version turns out before attempting GCC.


Cheers,

Vahid

On Jan 23, 2018, at 9:33 AM, Gilles Gouaillardet <gilles.gouaillar...@gmail.com> wrote:


Vahid,

There used to be a bug in the IOF part, but I am pretty sure this has 
already been fixed.


Does the issue also occur with GNU compilers ?
There used to be an issue with Intel Fortran runtime (short 
read/write were silently ignored) and that was also fixed some time ago.


Cheers,

Gilles

Vahid Askarpour <vh261...@dal.ca> wrote:
This would work for Quantum Espresso input. I am waiting to see what 
happens to EPW. I don’t think EPW accepts the -i argument. I will 
report back once the EPW job is done.


Cheers,

Vahid

On Jan 22, 2018, at 6:05 PM, Edgar Gabriel <egabr...@central.uh.edu> wrote:


well, my final comment on this topic, as somebody suggested earlier 
in this email chain, if you provide the input with the -i argument 
instead of piping from standard input, things seem to work as far as 
I can see (disclaimer: I do not know what the final outcome should 
be. I just see that the application does not complain about the 'end 
of file while reading crystal k points'). So maybe that is the most 
simple solution.


Thanks

Edgar


On 1/22/2018 1:17 PM, Edgar Gabriel wrote:


after some further investigation, I am fairly confident that this 
is not an MPI I/O problem.


The input file input_tmp.in is generated in this sequence of 
instructions (which is in Modules/open_close_input_file.f90)


---

   IF ( TRIM(input_file_) /= ' ' ) THEn
  !
  ! copy file to be opened into input_file
  !
  input_file = input_file_
  !
   ELSE
  !
  ! if no file specified then copy from standard input
  !
  input_file="input_tmp.in"
  OPEN(UNIT = stdtmp, FILE=trim(input_file), FORM='formatted', &
   STATUS='unknown', IOSTAT = ierr )
  IF ( ierr > 0 ) GO TO 30
  !
  dummy=' '
  WRITE(stdout, '(5x,a)') "Waiting for input..."
  DO WHILE ( TRIM(dummy) .NE. "MAGICALME" )
     READ (stdin,fmt='(A512)',END=20) dummy
     WRITE (stdtmp,'(A)') trim(dummy)
  END DO
  !
20   CLOSE ( UNIT=stdtmp, STATUS='keep' )



Basically, if no input file has been provided, the input file is 
generated by reading from standard input. Since the application is 
being launched e.g. with


mpirun -np 64 ../bin/pw.x -npool 64 < nscf.in > nscf.out

the data comes from nscf.in. I simply do not know enough about IO
forwarding to be able to tell why we do not see the entire file,
but one interesting detail is that if I run it in the debugger, the 
input_tmp.in is created correctly. However, if I run it using 
mpirun as shown above, the file is cropped incorrectly, which leads 
to the error message mentioned in this email chain.


Anyway, I would probably need some help here from somebody who 
knows the runtime better than me on what could go wrong at this point.


Thanks

Edgar




On 1/19/2018 1:22 PM, Vahid Askarpour wrote:

Concerning the following error

     from pw_readschemafile : error #         1
     xml data file not found

The nscf run uses files generated by the scf.in run. So I first 
run scf.in and when it finishes, I run nscf.in. If you have done 
this and still get the above error, then this could be another 
bug. It does not happen for me with intel14/openmpi-1.8.8.


Thanks for the update,

Vahid

On Jan 19, 2018, at 3:08 PM, Edgar Gabriel <egabr...@central.uh.edu> wrote:


ok, here is what found out so far, will have to stop for now 
however for today:


 1. I can in fact reproduce your bug on my systems.

 2. I can confirm that the problem occurs both with romio314 and 
ompio. I *think* the issue is that the input_tmp.in file is 
incomplete. In both cases (ompio and romio) the end of the file 
looks as follows (and its exactly the same for both libraries):


gabriel@crill-002:/tmp/gabriel/qe-6.2.1/QE_input_files> tail -10 
input_tmp.in

  0.6667  0.5000 0.8333  5.787037e-04
  0.6667  0.5000 0.9167  5.787037e-04
  0.6667  0.5833 0.  5.787037e-04
  0.6667  0.5833 0.0833  5.787037e-04
  0.6667  0.5833 0.1667  5.787037e-04
  0.6667  0.5833 0.2500  5.787037e-04
  0.6667  0.5833 0.  5.787037e-04
  0.6667  0.5833 0.4167  5.787037e-04
  0.6667  0.5833 0.5000  5.787037e-04
  0.6667  0.5833 0.5833  5

which is what I *think* causes the problem.

 3. I tried to find where input_tmp.in is generated, but haven't 
completely identified the location. However, I could 

Re: [OMPI users] Installation of openmpi-1.10.7 fails

2018-01-22 Thread Edgar Gabriel
well, my final comment on this topic, as somebody suggested earlier in 
this email chain, if you provide the input with the -i argument instead 
of piping from standard input, things seem to work as far as I can see 
(disclaimer: I do not know what the final outcome should be. I just see 
that the application does not complain about the 'end of file while 
reading crystal k points'). So maybe that is the most simple solution.


Thanks

Edgar


On 1/22/2018 1:17 PM, Edgar Gabriel wrote:


after some further investigation, I am fairly confident that this is 
not an MPI I/O problem.


The input file input_tmp.in is generated in this sequence of 
instructions (which is in Modules/open_close_input_file.f90)


---

   IF ( TRIM(input_file_) /= ' ' ) THEn
  !
  ! copy file to be opened into input_file
  !
  input_file = input_file_
  !
   ELSE
  !
  ! if no file specified then copy from standard input
  !
  input_file="input_tmp.in"
  OPEN(UNIT = stdtmp, FILE=trim(input_file), FORM='formatted', &
   STATUS='unknown', IOSTAT = ierr )
  IF ( ierr > 0 ) GO TO 30
  !
  dummy=' '
  WRITE(stdout, '(5x,a)') "Waiting for input..."
  DO WHILE ( TRIM(dummy) .NE. "MAGICALME" )
     READ (stdin,fmt='(A512)',END=20) dummy
     WRITE (stdtmp,'(A)') trim(dummy)
  END DO
  !
20   CLOSE ( UNIT=stdtmp, STATUS='keep' )



Basically, if no input file has been provided, the input file is 
generated by reading from standard input. Since the application is 
being launched e.g. with


mpirun -np 64 ../bin/pw.x -npool 64 < nscf.in > nscf.out

the data comes from nscf.in. I simply do not know enough about IO
forwarding to be able to tell why we do not see the entire file, but
one interesting detail is that if I run it in the debugger, the 
input_tmp.in is created correctly. However, if I run it using mpirun 
as shown above, the file is cropped incorrectly, which leads to the 
error message mentioned in this email chain.


Anyway, I would probably need some help here from somebody who knows 
the runtime better than me on what could go wrong at this point.


Thanks

Edgar




On 1/19/2018 1:22 PM, Vahid Askarpour wrote:

Concerning the following error

     from pw_readschemafile : error #         1
     xml data file not found

The nscf run uses files generated by the scf.in run. So I first run 
scf.in and when it finishes, I run nscf.in. If you have done this and 
still get the above error, then this could be another bug. It does 
not happen for me with intel14/openmpi-1.8.8.


Thanks for the update,

Vahid

On Jan 19, 2018, at 3:08 PM, Edgar Gabriel <egabr...@central.uh.edu> wrote:


ok, here is what found out so far, will have to stop for now however 
for today:


 1. I can in fact reproduce your bug on my systems.

 2. I can confirm that the problem occurs both with romio314 and 
ompio. I *think* the issue is that the input_tmp.in file is 
incomplete. In both cases (ompio and romio) the end of the file 
looks as follows (and its exactly the same for both libraries):


gabriel@crill-002:/tmp/gabriel/qe-6.2.1/QE_input_files> tail -10 
input_tmp.in

  0.6667  0.5000  0.8333  5.787037e-04
  0.6667  0.5000  0.9167  5.787037e-04
  0.6667  0.5833  0.  5.787037e-04
  0.6667  0.5833  0.0833  5.787037e-04
  0.6667  0.5833  0.1667  5.787037e-04
  0.6667  0.5833  0.2500  5.787037e-04
  0.6667  0.5833  0.  5.787037e-04
  0.6667  0.5833  0.4167  5.787037e-04
  0.6667  0.5833  0.5000  5.787037e-04
  0.6667  0.5833  0.5833  5

which is what I *think* causes the problem.

 3. I tried to find where input_tmp.in is generated, but haven't 
completely identified the location. However, I could not find MPI 
file_write(_all) operations anywhere in the code, although there are 
some MPI_file_read(_all) operations.


 4. I can confirm that the behavior with Open MPI 1.8.x is 
different. input_tmp.in looks more complete (at least it doesn't end 
in the middle of the line). The simulation does still not finish for 
me, but the bug reported is slightly different, I might just be 
missing a file or something



 from pw_readschemafile : error # 1
 xml data file not found

Since I think input_tmp.in is generated from data that is provided 
in nscf.in, it might very well be something in the 
MPI_File_read(_all) operation that causes the issue, but since both 
ompio and romio are affected, there is good chance that something 
outside of the control of io components is causing the trouble 
(maybe a datatype issue that has changed from 1.8.x series to 3.0.x).


 5. Last but not least, I also wanted to mention that I ran all 
parallel tests that I found in the testsuite  
(run-tests-cp-parallel, run-tests-pw-parallel, 
run-tests-ph-parallel, run-tests-epw-paral

Re: [OMPI users] Installation of openmpi-1.10.7 fails

2018-01-22 Thread Edgar Gabriel
after some further investigation, I am fairly confident that this is not 
an MPI I/O problem.


The input file input_tmp.in is generated in this sequence of 
instructions (which is in Modules/open_close_input_file.f90)


---

  IF ( TRIM(input_file_) /= ' ' ) THEn
 !
 ! copy file to be opened into input_file
 !
 input_file = input_file_
 !
  ELSE
 !
 ! if no file specified then copy from standard input
 !
 input_file="input_tmp.in"
 OPEN(UNIT = stdtmp, FILE=trim(input_file), FORM='formatted', &
  STATUS='unknown', IOSTAT = ierr )
 IF ( ierr > 0 ) GO TO 30
 !
 dummy=' '
 WRITE(stdout, '(5x,a)') "Waiting for input..."
 DO WHILE ( TRIM(dummy) .NE. "MAGICALME" )
    READ (stdin,fmt='(A512)',END=20) dummy
    WRITE (stdtmp,'(A)') trim(dummy)
 END DO
 !
20   CLOSE ( UNIT=stdtmp, STATUS='keep' )



Basically, if no input file has been provided, the input file is 
generated by reading from standard input. Since the application is being 
launched e.g. with


mpirun -np 64 ../bin/pw.x -npool 64 < nscf.in > nscf.out

the data comes from nscf.in. I simply do not know enough about IO
forwarding to be able to tell why we do not see the entire file, but one
interesting detail is that if I run it in the debugger, the input_tmp.in 
is created correctly. However, if I run it using mpirun as shown above, 
the file is cropped incorrectly, which leads to the error message 
mentioned in this email chain.


Anyway, I would probably need some help here from somebody who knows the 
runtime better than me on what could go wrong at this point.


Thanks

Edgar




On 1/19/2018 1:22 PM, Vahid Askarpour wrote:

Concerning the following error

     from pw_readschemafile : error #         1
     xml data file not found

The nscf run uses files generated by the scf.in run. So I first run 
scf.in and when it finishes, I run nscf.in. If you have done this and 
still get the above error, then this could be another bug. It does not 
happen for me with intel14/openmpi-1.8.8.


Thanks for the update,

Vahid

On Jan 19, 2018, at 3:08 PM, Edgar Gabriel <egabr...@central.uh.edu> wrote:


ok, here is what found out so far, will have to stop for now however 
for today:


 1. I can in fact reproduce your bug on my systems.

 2. I can confirm that the problem occurs both with romio314 and 
ompio. I *think* the issue is that the input_tmp.in file is 
incomplete. In both cases (ompio and romio) the end of the file looks 
as follows (and its exactly the same for both libraries):


gabriel@crill-002:/tmp/gabriel/qe-6.2.1/QE_input_files> tail -10 
input_tmp.in

  0.6667  0.5000  0.8333  5.787037e-04
  0.6667  0.5000  0.9167  5.787037e-04
  0.6667  0.5833  0.  5.787037e-04
  0.6667  0.5833  0.0833  5.787037e-04
  0.6667  0.5833  0.1667  5.787037e-04
  0.6667  0.5833  0.2500  5.787037e-04
  0.6667  0.5833  0.  5.787037e-04
  0.6667  0.5833  0.4167  5.787037e-04
  0.6667  0.5833  0.5000  5.787037e-04
  0.6667  0.5833  0.5833  5

which is what I *think* causes the problem.

 3. I tried to find where input_tmp.in is generated, but haven't 
completely identified the location. However, I could not find MPI 
file_write(_all) operations anywhere in the code, although there are 
some MPI_file_read(_all) operations.


 4. I can confirm that the behavior with Open MPI 1.8.x is different. 
input_tmp.in looks more complete (at least it doesn't end in the 
middle of the line). The simulation does still not finish for me, but 
the bug reported is slightly different, I might just be missing a 
file or something



 from pw_readschemafile : error # 1
 xml data file not found

Since I think input_tmp.in is generated from data that is provided in 
nscf.in, it might very well be something in the MPI_File_read(_all) 
operation that causes the issue, but since both ompio and romio are 
affected, there is good chance that something outside of the control 
of io components is causing the trouble (maybe a datatype issue that 
has changed from 1.8.x series to 3.0.x).


 5. Last but not least, I also wanted to mention that I ran all 
parallel tests that I found in the testsuite  (run-tests-cp-parallel, 
run-tests-pw-parallel, run-tests-ph-parallel, run-tests-epw-parallel 
), and they all passed with ompio (and romio314 although I only ran a 
subset of the tests with romio314).


Thanks

Edgar

-




On 01/19/2018 11:44 AM, Vahid Askarpour wrote:

Hi Edgar,

Just to let you know that the nscf run with --mca io ompio crashed 
like the other two runs.


Thank you,

Vahid

On Jan 19, 2018, at 12:46 PM, Edgar Gabriel <egabr...@central.uh.edu> wrote:


ok, thank you for the information. Two short questions and 
requests. I have qe-6.2.1

Re: [OMPI users] Installation of openmpi-1.10.7 fails

2018-01-19 Thread Edgar Gabriel
this is most likely a different issue. The bug in the original case also
appears on a local file system/disk; it doesn't have to be NFS.


That being said, I would urge you to submit a new issue (or a new email
thread); I would be more than happy to look into your problem as well, and more
importantly, we submitted a number of patches into the 3.0.x branch specifically
for NFS.


Thanks
Edgar

On 1/19/2018 2:42 PM, Stephen Guzik wrote:

Not sure if this is related and I have not had time to investigate it
much or reduce but I am also having issues with 3.0.x.  There's a couple
of layers of cgns and hdf5 but I am seeing:

mpirun --mca io romio314 --mca btl self,vader,openib...
-- works perfectly

mpirun --mca btl self,vader,openib...
cgio_open_file:H5Dwrite:write to node data failed

The files system in NFS and an openmpi-v3.0.x-201711220306-2399e85 build.

Stephen

Stephen Guzik, Ph.D.
Assistant Professor, Department of Mechanical Engineering
Colorado State University

On 01/18/2018 04:17 PM, Jeff Squyres (jsquyres) wrote:

On Jan 18, 2018, at 5:53 PM, Vahid Askarpour  wrote:

My openmpi3.0.x run (called nscf run) was reading data from a routine Quantum 
Espresso input file edited by hand. The preliminary run (called scf run) was 
done with openmpi3.0.x on a similar input file also edited by hand.

Gotcha.

Well, that's a little disappointing.

It would be good to understand why it is crashing -- is the app doing something 
that is accidentally not standard?  Is there a bug in (soon to be released) 
Open MPI 3.0.1?  ...?


___
users mailing list
users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users


___
users mailing list
users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users

Re: [OMPI users] Installation of openmpi-1.10.7 fails

2018-01-19 Thread Edgar Gabriel
ok, here is what found out so far, will have to stop for now however for 
today:


 1. I can in fact reproduce your bug on my systems.

 2. I can confirm that the problem occurs both with romio314 and ompio. 
I *think* the issue is that the input_tmp.in file is incomplete. In both 
cases (ompio and romio) the end of the file looks as follows (and its 
exactly the same for both libraries):


gabriel@crill-002:/tmp/gabriel/qe-6.2.1/QE_input_files> tail -10 
input_tmp.in

  0.6667  0.5000  0.8333  5.787037e-04
  0.6667  0.5000  0.9167  5.787037e-04
  0.6667  0.5833  0.  5.787037e-04
  0.6667  0.5833  0.0833  5.787037e-04
  0.6667  0.5833  0.1667  5.787037e-04
  0.6667  0.5833  0.2500  5.787037e-04
  0.6667  0.5833  0.  5.787037e-04
  0.6667  0.5833  0.4167  5.787037e-04
  0.6667  0.5833  0.5000  5.787037e-04
  0.6667  0.5833  0.5833  5

which is what I *think* causes the problem.

 3. I tried to find where input_tmp.in is generated, but haven't 
completely identified the location. However, I could not find MPI 
file_write(_all) operations anywhere in the code, although there are 
some MPI_file_read(_all) operations.


 4. I can confirm that the behavior with Open MPI 1.8.x is different. 
input_tmp.in looks more complete (at least it doesn't end in the middle 
of the line). The simulation does still not finish for me, but the bug 
reported is slightly different, I might just be missing a file or something



 from pw_readschemafile : error # 1
 xml data file not found

Since I think input_tmp.in is generated from data that is provided in 
nscf.in, it might very well be something in the MPI_File_read(_all) 
operation that causes the issue, but since both ompio and romio are 
affected, there is good chance that something outside of the control of 
io components is causing the trouble (maybe a datatype issue that has 
changed from 1.8.x series to 3.0.x).


 5. Last but not least, I also wanted to mention that I ran all 
parallel tests that I found in the testsuite (run-tests-cp-parallel, 
run-tests-pw-parallel, run-tests-ph-parallel, run-tests-epw-parallel ), 
and they all passed with ompio (and romio314 although I only ran a 
subset of the tests with romio314).


Thanks

Edgar

-




On 01/19/2018 11:44 AM, Vahid Askarpour wrote:

Hi Edgar,

Just to let you know that the nscf run with --mca io ompio crashed 
like the other two runs.


Thank you,

Vahid

On Jan 19, 2018, at 12:46 PM, Edgar Gabriel <egabr...@central.uh.edu> wrote:


ok, thank you for the information. Two short questions and requests. 
I have qe-6.2.1 compiled and running on my system (although it is 
with gcc-6.4 instead of the intel compiler), and I am currently 
running the parallel test suite. So far, all the tests passed, 
although it is still running.


My question is now, would it be possible for you to give me access to 
exactly the same data set that you are using?  You could upload to a 
webpage or similar and just send me the link.


The second question/request, could you rerun your tests one more 
time, this time forcing using ompio? e.g. --mca io ompio


Thanks

Edgar


On 1/19/2018 10:32 AM, Vahid Askarpour wrote:
To run EPW, the command for running the preliminary nscf run is 
(http://epw.org.uk/Documentation/B-dopedDiamond):


~/bin/openmpi-v3.0/bin/mpiexec -np 64 
/home/vaskarpo/bin/qe-6.0_intel14_soc/bin/pw.x -npool 64 < nscf.in > 
nscf.out


So I submitted it with the following command:

~/bin/openmpi-v3.0/bin/mpiexec --mca io romio314 -np 64 
/home/vaskarpo/bin/qe-6.0_intel14_soc/bin/pw.x -npool 64 < nscf.in > 
nscf.out


And it crashed like the first time.

It is interesting that the preliminary scf run works fine. The scf 
run requires Quantum Espresso to generate the k points automatically 
as shown below:


K_POINTS (automatic)
12 12 12 0 0 0

The nscf run which crashes includes a list of k points (1728 in this 
case) as seen below:


K_POINTS (crystal)
1728
  0.  0.  0.  5.787037e-04
  0.  0.  0.0833  5.787037e-04
  0.  0.  0.1667  5.787037e-04
  0.  0.  0.2500  5.787037e-04
  0.  0.  0.  5.787037e-04
  0.  0.  0.4167  5.787037e-04
  0.  0.  0.5000  5.787037e-04
  0.  0.  0.5833  5.787037e-04
  0.  0.  0.6667  5.787037e-04
  0.  0.  0.7500  5.787037e-04
…….
…….

To build openmpi (either 1.10.7 or 3.0.x), I loaded the fortran 
compiler module, configured with only the “--prefix="  and then 
“make all install”. I did not enable or disable any other options.


Cheers,

Vahid


On Jan 19, 2018, at 10:23 AM, Edgar Gabriel <egabr...@central.uh.edu> wrote:


thanks, that is 

Re: [OMPI users] Installation of openmpi-1.10.7 fails

2018-01-19 Thread Edgar Gabriel
ok, thank you for the information. Two short questions and requests. I 
have qe-6.2.1 compiled and running on my system (although it is with 
gcc-6.4 instead of the intel compiler), and I am currently running the 
parallel test suite. So far, all the tests passed, although it is still 
running.


My question is now, would it be possible for you to give me access to 
exactly the same data set that you are using?  You could upload to a 
webpage or similar and just send me the link.


The second question/request, could you rerun your tests one more time, 
this time forcing using ompio? e.g. --mca io ompio


Thanks

Edgar


On 1/19/2018 10:32 AM, Vahid Askarpour wrote:
To run EPW, the command for running the preliminary nscf run is 
(http://epw.org.uk/Documentation/B-dopedDiamond):


~/bin/openmpi-v3.0/bin/mpiexec -np 64 
/home/vaskarpo/bin/qe-6.0_intel14_soc/bin/pw.x -npool 64 < nscf.in > 
nscf.out


So I submitted it with the following command:

~/bin/openmpi-v3.0/bin/mpiexec --mca io romio314 -np 64 
/home/vaskarpo/bin/qe-6.0_intel14_soc/bin/pw.x -npool 64 < nscf.in > 
nscf.out


And it crashed like the first time.

It is interesting that the preliminary scf run works fine. The scf run 
requires Quantum Espresso to generate the k points automatically as 
shown below:


K_POINTS (automatic)
12 12 12 0 0 0

The nscf run which crashes includes a list of k points (1728 in this 
case) as seen below:


K_POINTS (crystal)
1728
  0.  0.  0.  5.787037e-04
  0.  0.  0.0833  5.787037e-04
  0.  0.  0.1667  5.787037e-04
  0.  0.  0.2500  5.787037e-04
  0.  0.  0.  5.787037e-04
  0.  0.  0.4167  5.787037e-04
  0.  0.  0.5000  5.787037e-04
  0.  0.  0.5833  5.787037e-04
  0.  0.  0.6667  5.787037e-04
  0.  0.  0.7500  5.787037e-04
…….
…….

To build openmpi (either 1.10.7 or 3.0.x), I loaded the fortran 
compiler module, configured with only the “--prefix="  and then “make 
all install”. I did not enable or disable any other options.


Cheers,

Vahid


On Jan 19, 2018, at 10:23 AM, Edgar Gabriel <egabr...@central.uh.edu> wrote:


thanks, that is interesting. Since /scratch is a lustre file system, 
Open MPI should actually utilize romio314 for that anyway, not ompio. 
What I have seen however happen on at least one occasions is that 
ompio was still used since ( I suspect) romio314 didn't pick up 
correctly the configuration options. It is a little bit of a mess 
from that perspective that we have to pass the romio arguments with 
different flag/options than for ompio, e.g.


--with-lustre=/path/to/lustre/ 
--with-io-romio-flags="--with-file-system=ufs+nfs+lustre 
--with-lustre=/path/to/lustre"


ompio should pick up the lustre options correctly if lustre 
headers/libraries are found at the default location, even if the user 
did not pass the --with-lustre option. I am not entirely sure what 
happens in romio if the user did not pass the 
--with-file-system=ufs+nfs+lustre but the lustre headers/libraries 
are found at the default location, i.e. whether the lustre adio 
component is still compiled or not.


Anyway, let's wait for the outcome of your run forcing the romio314
component, and I will still try to reproduce your problem on
my system.


Thanks
Edgar

On 1/19/2018 7:15 AM, Vahid Askarpour wrote:

Gilles,

I have submitted that job with --mca io romio314. If it finishes, I will let 
you know. It is sitting in Conte’s queue at Purdue.

As to Edgar’s question about the file system, here is the output of df -Th:

vaskarpo@conte-fe00:~ $ df -Th
Filesystem   TypeSize  Used Avail Use% Mounted on
/dev/sda1ext4435G   16G  398G   4% /
tmpfstmpfs16G  1.4M   16G   1% /dev/shm
persistent-nfs.rcac.purdue.edu:/persistent/home
  nfs  80T   64T   17T  80% /home
persistent-nfs.rcac.purdue.edu:/persistent/apps
  nfs 8.0T  4.0T  4.1T  49% /apps
mds-d01-ib.rcac.purdue.edu@o2ib1:mds-d02-ib.rcac.purdue.edu@o2ib1:/lustreD
  lustre  1.4P  994T  347T  75% /scratch/conte
depotint-nfs.rcac.purdue.edu:/depot
  nfs 4.5P  3.0P  1.6P  66% /depot
172.18.84.186:/persistent/fsadmin
  nfs 200G  130G   71G  65% /usr/rmt_share/fsadmin

The code is compiled in my $HOME and is run on the scratch.

Cheers,

Vahid


On Jan 18, 2018, at 10:14 PM, Gilles Gouaillardet <gilles.gouaillar...@gmail.com> wrote:

Vahid,

i the v1.10 series, the default MPI-IO component was ROMIO based, and
in the v3 series, it is now ompio.
You can force the latest Open MPI to use the ROMI

Re: [OMPI users] Installation of openmpi-1.10.7 fails

2018-01-19 Thread Edgar Gabriel
thanks, that is interesting. Since /scratch is a lustre file system, 
Open MPI should actually utilize romio314 for that anyway, not ompio. 
What I have seen however happen on at least one occasions is that ompio 
was still used since ( I suspect) romio314 didn't pick up correctly the 
configuration options. It is a little bit of a mess from that 
perspective that we have to pass the romio arguments with different 
flag/options than for ompio, e.g.


--with-lustre=/path/to/lustre/ 
--with-io-romio-flags="--with-file-system=ufs+nfs+lustre 
--with-lustre=/path/to/lustre"


ompio should pick up the lustre options correctly if lustre 
headers/libraries are found at the default location, even if the user 
did not pass the --with-lustre option. I am not entirely sure what 
happens in romio if the user did not pass the 
--with-file-system=ufs+nfs+lustre but the lustre headers/libraries are 
found at the default location, i.e. whether the lustre adio component is 
still compiled or not.


Anyway, let's wait for the outcome of your run forcing the romio314
component, and I will still try to reproduce your problem on my
system.


Thanks
Edgar

On 1/19/2018 7:15 AM, Vahid Askarpour wrote:

Gilles,

I have submitted that job with --mca io romio314. If it finishes, I will let 
you know. It is sitting in Conte’s queue at Purdue.

As to Edgar’s question about the file system, here is the output of df -Th:

vaskarpo@conte-fe00:~ $ df -Th
Filesystem   TypeSize  Used Avail Use% Mounted on
/dev/sda1ext4435G   16G  398G   4% /
tmpfstmpfs16G  1.4M   16G   1% /dev/shm
persistent-nfs.rcac.purdue.edu:/persistent/home
  nfs  80T   64T   17T  80% /home
persistent-nfs.rcac.purdue.edu:/persistent/apps
  nfs 8.0T  4.0T  4.1T  49% /apps
mds-d01-ib.rcac.purdue.edu@o2ib1:mds-d02-ib.rcac.purdue.edu@o2ib1:/lustreD
  lustre  1.4P  994T  347T  75% /scratch/conte
depotint-nfs.rcac.purdue.edu:/depot
  nfs 4.5P  3.0P  1.6P  66% /depot
172.18.84.186:/persistent/fsadmin
  nfs 200G  130G   71G  65% /usr/rmt_share/fsadmin

The code is compiled in my $HOME and is run on the scratch.

Cheers,

Vahid


On Jan 18, 2018, at 10:14 PM, Gilles Gouaillardet <gilles.gouaillar...@gmail.com> wrote:

Vahid,

i the v1.10 series, the default MPI-IO component was ROMIO based, and
in the v3 series, it is now ompio.
You can force the latest Open MPI to use the ROMIO based component with
mpirun --mca io romio314 ...

That being said, your description (e.g. a hand edited file) suggests
that I/O is not performed with MPI-IO,
which makes me very puzzled on why the latest Open MPI is crashing.

Cheers,

Gilles

On Fri, Jan 19, 2018 at 10:55 AM, Edgar Gabriel<egabr...@central.uh.edu>  wrote:

I will try to reproduce this problem with 3.0.x, but it might take me a
couple of days to get to it.

Since it seemed to have worked with 2.0.x (except for the running out file
handles problem), there is the suspicion that one of the fixes that we
introduced since then is the problem.

What file system did you run it on? NFS?

Thanks

Edgar


On 1/18/2018 5:17 PM, Jeff Squyres (jsquyres) wrote:

On Jan 18, 2018, at 5:53 PM, Vahid Askarpour<vh261...@dal.ca>  wrote:

My openmpi3.0.x run (called nscf run) was reading data from a routine
Quantum Espresso input file edited by hand. The preliminary run (called scf
run) was done with openmpi3.0.x on a similar input file also edited by hand.

Gotcha.

Well, that's a little disappointing.

It would be good to understand why it is crashing -- is the app doing
something that is accidentally not standard?  Is there a bug in (soon to be
released) Open MPI 3.0.1?  ...?


___
users mailing list
users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users

Re: [OMPI users] Installation of openmpi-1.10.7 fails

2018-01-18 Thread Edgar Gabriel

I will try to reproduce this problem with 3.0.x, but it might take me a
couple of days to get to it.

Since it seemed to have worked with 2.0.x (except for the running out file 
handles problem), there is the suspicion that one of the fixes that we 
introduced since then is the problem.

What file system did you run it on? NFS?

Thanks

Edgar


On 1/18/2018 5:17 PM, Jeff Squyres (jsquyres) wrote:

On Jan 18, 2018, at 5:53 PM, Vahid Askarpour  wrote:

My openmpi3.0.x run (called nscf run) was reading data from a routine Quantum 
Espresso input file edited by hand. The preliminary run (called scf run) was 
done with openmpi3.0.x on a similar input file also edited by hand.

Gotcha.

Well, that's a little disappointing.

It would be good to understand why it is crashing -- is the app doing something 
that is accidentally not standard?  Is there a bug in (soon to be released) 
Open MPI 3.0.1?  ...?



___
users mailing list
users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users


Re: [OMPI users] Incorrect file size from MPI_File_write (_all) when using MPI derived types for both view filetype and write datatype

2017-11-06 Thread Edgar Gabriel
ok, I found the problem: a book-keeping error when a read/write
operation spans multiple internal cycles. Will commit a fix shortly to
master, and will try to get a PR into the 3.0.x and the upcoming 3.1.x
series; not sure however which precise release the fix will make it in.


Thanks for the bug report!

Edgar


On 11/6/2017 8:25 AM, Edgar Gabriel wrote:

I'll have a look at  it. I can confirm that I can replicate the problem,
and I do not see an obvious mistake in your code for 1 process
scenarios. Will keep you posted.

Thanks

Edgar


On 11/6/2017 7:52 AM, Christopher Brady wrote:

I have been working with a Fortran code that has to write a large array to disk excluding 
an outer strip of guard cells using MPI-IO. This uses two MPI types, one representing an 
array the size of the main array without its guard cells that is passed to 
MPI_File_set_view as the filetype, and another that represents the subsection of the main 
array not including the guard cells that is used as the datatype in MPI_File_write (same 
result with MPI_File_write_all). Both subarrays are created using 
MPI_Type_create_subarray. When the file size (per core) reaches a value of 512MB the 
final output size diverges from the expected one and is always smaller than expected. It 
does not reach a hard bound, but is always smaller than expected. I have replicated this 
behaviour on machines using Open-MPI 2.1.2 and 3.0.0, and am attaching a simple test code 
(in both C and "use mpi" Fortran) that replicates the behaviour on a single 
core (the test codes only work on a single core, but I have demonstrated the same problem 
on multiple cores with our main code). While I've replicated this behaviour on several 
machines (every machine that I've tried it on), I'm also attaching the ompi_info output 
and config.log files for a machine that demonstrates the problem.

If anyone can tell me if I've made a mistake, if this is a known bug that I've 
missed in the archives (very sorry if so), or even if this is a previously 
unknown bug I'd be very grateful.

Many Thanks
Chris Brady
Senior Research Software Engineer
University of Warwick
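Since the attached test codes are not included in the archive, here is a rough single-rank sketch of the pattern described above: a subarray filetype for MPI_File_set_view plus a subarray datatype for the write, excluding a strip of guard cells. The array sizes and the two-cell guard width are made-up values, far below the 512 MB point where the reported divergence starts:

#include <mpi.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    const int ng = 2;                          /* assumed guard-cell width    */
    int nx = 64, ny = 64;                      /* assumed interior size       */
    int full[2]   = {ny + 2*ng, nx + 2*ng};    /* allocated size incl. guards */
    int sub[2]    = {ny, nx};                  /* interior size               */
    int mstart[2] = {ng, ng};                  /* interior offset in memory   */
    int fstart[2] = {0, 0};                    /* whole interior on one rank  */

    double *data = calloc((size_t)full[0] * full[1], sizeof(double));

    /* Datatype selecting the interior of the guard-celled array in memory. */
    MPI_Datatype memtype;
    MPI_Type_create_subarray(2, full, sub, mstart, MPI_ORDER_C, MPI_DOUBLE, &memtype);
    MPI_Type_commit(&memtype);

    /* Filetype describing this rank's portion of the file (here: all of it). */
    MPI_Datatype filetype;
    MPI_Type_create_subarray(2, sub, sub, fstart, MPI_ORDER_C, MPI_DOUBLE, &filetype);
    MPI_Type_commit(&filetype);

    MPI_File fh;
    MPI_File_open(MPI_COMM_SELF, "interior.dat",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);
    MPI_File_set_view(fh, 0, MPI_DOUBLE, filetype, "native", MPI_INFO_NULL);
    MPI_File_write_all(fh, data, 1, memtype, MPI_STATUS_IGNORE);
    MPI_File_close(&fh);

    MPI_Type_free(&memtype);
    MPI_Type_free(&filetype);
    free(data);
    MPI_Finalize();
    return 0;
}

With this layout the file should end up exactly nx*ny*8 bytes; the reported bug is that the observed size falls short once the per-core data volume reaches about 512 MB.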


___
users mailing list
users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users





Re: [OMPI users] Incorrect file size from MPI_File_write (_all) when using MPI derived types for both view filetype and write datatype

2017-11-06 Thread Edgar Gabriel
yes, the result is correct in that case, we must be overrunning a 
counter or something similar in the ompio code.


Edgar


On 11/6/2017 8:40 AM, Gilles Gouaillardet wrote:

Chris,

Can you try to

mpirun --mca io romio314 ...

And see if it helps ?

Cheers,

Gilles

Christopher Brady <c.s.br...@warwick.ac.uk> wrote:

I have been working with a Fortran code that has to write a large array to disk excluding 
an outer strip of guard cells using MPI-IO. This uses two MPI types, one representing an 
array the size of the main array without its guard cells that is passed to 
MPI_File_set_view as the filetype, and another that represents the subsection of the main 
array not including the guard cells that is used as the datatype in MPI_File_write (same 
result with MPI_File_write_all). Both subarrays are created using 
MPI_Type_create_subarray. When the file size (per core) reaches a value of 512MB the 
final output size diverges from the expected one and is always smaller than expected. It 
does not reach a hard bound, but is always smaller than expected. I have replicated this 
behaviour on machines using Open-MPI 2.1.2 and 3.0.0, and am attaching a simple test code 
(in both C and "use mpi" Fortran) that replicates the behaviour on a single 
core (the test codes only work on a single core, but I have demonstrated
the same problem on multiple cores with our main code). While I've
replicated this behaviour on several machines (every machine that I've tried it 
on), I'm also attaching the ompi_info output and config.log files for a machine 
that demonstrates the problem.

If anyone can tell me if I've made a mistake, if this is a known bug that I've 
missed in the archives (very sorry if so), or even if this is a previously 
unknown bug I'd be very grateful.

Many Thanks
Chris Brady
Senior Research Software Engineer
University of Warwick


___
users mailing list
users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users


--
Edgar Gabriel
Associate Professor
Parallel Software Technologies Lab  http://pstl.cs.uh.edu
Department of Computer Science  University of Houston
Philip G. Hoffman Hall, Room 228Houston, TX-77204, USA
Tel: +1 (713) 743-3857  Fax: +1 (713) 743-3335
--


___
users mailing list
users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users


Re: [OMPI users] Incorrect file size from MPI_File_write (_all) when using MPI derived types for both view filetype and write datatype

2017-11-06 Thread Edgar Gabriel
I'll have a look at  it. I can confirm that I can replicate the problem, 
and I do not see an obvious mistake in your code for 1 process 
scenarios. Will keep you posted.


Thanks

Edgar


On 11/6/2017 7:52 AM, Christopher Brady wrote:

I have been working with a Fortran code that has to write a large array to disk excluding 
an outer strip of guard cells using MPI-IO. This uses two MPI types, one representing an 
array the size of the main array without its guard cells that is passed to 
MPI_File_set_view as the filetype, and another that represents the subsection of the main 
array not including the guard cells that is used as the datatype in MPI_File_write (same 
result with MPI_File_write_all). Both subarrays are created using 
MPI_Type_create_subarray. When the file size (per core) reaches a value of 512MB the 
final output size diverges from the expected one and is always smaller than expected. It 
does not reach a hard bound, but is always smaller than expected. I have replicated this 
behaviour on machines using Open-MPI 2.1.2 and 3.0.0, and am attaching a simple test code 
(in both C and "use mpi" Fortran) that replicates the behaviour on a single 
core (the test codes only work on a single core, but I have demonstrated the same problem 
on multiple cores with our main code). While I've replicated this behaviour on several 
machines (every machine that I've tried it on), I'm also attaching the ompi_info output 
and config.log files for a machine that demonstrates the problem.

If anyone can tell me if I've made a mistake, if this is a known bug that I've 
missed in the archives (very sorry if so), or even if this is a previously 
unknown bug I'd be very grateful.

Many Thanks
Chris Brady
Senior Research Software Engineer
University of Warwick



___
users mailing list
users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users


Re: [OMPI users] Missing data with MPI I/O and NFS

2017-10-12 Thread Edgar Gabriel
try for now to switch to the romio314 component with OpenMPI. There is 
an issue with NFS and OMPIO that I am aware of and working on, that 
might trigger this behavior (although it should actually work for 
collective I/O even in that case).


try to set something like

mpirun --mca io romio314 ...

Thanks

Edgar


On 10/12/2017 8:26 PM, Stephen Guzik wrote:

Hi,

I'm having trouble with parallel I/O to a file system mounted with NFS
over an infiniband network.  In my test code, I'm simply writing 1 byte
per process to the same file.  When using two nodes, some bytes are not
written (zero bits in the unwritten bytes).  Usually at least some data
from each node is written---it appears to be all data from one node and
partial from the other.

This used to work fine but broke when the cluster was upgraded from
Debian 8 to Debian 9.  I suspect an issue with NFS and not with
OpenMPI.  However, if anyone can suggest a work-around or ways to get
more information, I would appreciate it.  In the sole case where the
file system is exported with 'sync' and mounted with 'hard,intr', I get
the error:
[node1:14823] mca_sharedfp_individual_file_open: Error during datafile
file open
MPI_ERR_FILE: invalid file
[node2:14593] (same)

--

Some additional info:
- tested versions 1.8.8, 2.1.1, and 3.0.0 self-compiled and packaged and
vendor-supplied versions.  All have same behavior.
- all write methods (individual or collective) fail similarly.
- exporting the file system to two workstations across ethernet and
running the job across the two workstations seems to work fine.
- on a single node, everything works as expected in all cases.  In the
case described above where I get an error, the error is only observed
with processes on two nodes.
- code follows.

Thanks,
Stephen Guzik

--

#include <mpi.h>

#include <iostream>

int main(int argc, const char* argv[])
{
   MPI_File fh;
   MPI_Status status;

   int mpierr;
   char mpistr[MPI_MAX_ERROR_STRING];
   int mpilen;
   int numProc;
   int procID;
   MPI_Init(&argc, const_cast<char***>(&argv));
   MPI_Comm_size(MPI_COMM_WORLD, &numProc);
   MPI_Comm_rank(MPI_COMM_WORLD, &procID);

   const int filesize = numProc;
   const int bufsize = filesize/numProc;
   char *buf = new char[bufsize];
   buf[0] = (char)(48 + procID);
   int numChars = bufsize/sizeof(char);

   mpierr = MPI_File_open(MPI_COMM_WORLD, "dataio",
  MPI_MODE_CREATE | MPI_MODE_WRONLY,
MPI_INFO_NULL, &fh);
   if (mpierr != MPI_SUCCESS)
 {
   MPI_Error_string(mpierr, mpistr, &mpilen);
   std::cout << "Error: " << mpistr << std::endl;
 }
   mpierr = MPI_File_write_at_all(fh, (MPI_Offset)(procID*bufsize), buf,
  numChars, MPI_CHAR, &status);
   if (mpierr != MPI_SUCCESS)
 {
   MPI_Error_string(mpierr, mpistr, &mpilen);
   std::cout << "Error: " << mpistr << std::endl;
 }
   mpierr = MPI_File_close(&fh);
   if (mpierr != MPI_SUCCESS)
 {
   MPI_Error_string(mpierr, mpistr, &mpilen);
   std::cout << "Error: " << mpistr << std::endl;
 }

   delete[] buf;
   MPI_Finalize();
   return 0;
}

___
users mailing list
users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users




Re: [OMPI users] Problem with MPI_FILE_WRITE_AT

2017-09-15 Thread Edgar Gabriel
thank you for the report and the code, I will look into this. What file 
system is that occurring on?


Until I find the problem, note that you could switch back to the
previous parallel I/O implementation (romio) by providing that as a 
parameter to your mpirun command, e.g.


mpirun --mca io romio314 -np 

Thanks

Edgar


On 9/15/2017 4:39 PM, McGrattan, Kevin B. Dr. (Fed) wrote:


I am using MPI_FILE_WRITE_AT to print out the timings of subroutines 
in a big Fortran code. I have noticed since upgrading to Open MPI 
2.1.1 that sometimes the file to be written is corrupted. Each MPI 
process is supposed to write out a character string that is 159 
characters in length, plus a line feed. Sometimes, I see


^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@^@

Instead of the character string. I cannot reproduce the problem 
consistently. That is, sometimes the file is fine, and sometimes the 
records are corrupted randomly. The subroutine is included below. I 
added some MPI_BARRIERs hoping that this would prevent the file from 
being closed too early, but that did not help.


SUBROUTINE DUMP_TIMERS

INTEGER, PARAMETER :: LINE_LENGTH=159
CHARACTER, PARAMETER :: LF=ACHAR(10)
CHARACTER(LEN=LINE_LENGTH+1) :: LINE,HEAD
INTEGER :: ERROR,RECORD,FH

CALL MPI_BARRIER(MPI_COMM_WORLD, IERR)

FN_CPU = 'file_cpu.csv'

CALL MPI_TYPE_CONTIGUOUS(LINE_LENGTH+1,MPI_CHARACTER,RECORD,ERROR)
CALL MPI_TYPE_COMMIT(RECORD,ERROR)
CALL MPI_FILE_OPEN(MPI_COMM_WORLD,FN_CPU,MPI_MODE_WRONLY+MPI_MODE_CREATE,MPI_INFO_NULL,FH,ERROR)
CALL MPI_FILE_SET_VIEW(FH,0_MPI_OFFSET_KIND,RECORD,RECORD,'NATIVE',MPI_INFO_NULL,ERROR)

! T_USED(1) is the time spent in the main routine; i.e. the time not spent in some other routine

T_USED(1) = SECOND() - T_USED(1) - SUM(T_USED(2:N_TIMERS))

WRITE(LINE,'(I5,14(",",ES10.3))') MYID,(T_USED(I),I=1,N_TIMERS),SUM(T_USED(1:N_TIMERS))

LINE(LINE_LENGTH+1:LINE_LENGTH+1) = LF

IF (MYID==0) THEN
   HEAD(1:LINE_LENGTH+1) = ' '
   WRITE(HEAD,'(A)') 'Rank,MAIN,DIVG,MASS,VELO,PRES,WALL,DUMP,PART,RADI,FIRE,COMM,EVAC,HVAC,Total T_USED (s)'
   HEAD(LINE_LENGTH+1:LINE_LENGTH+1) = LF
   CALL MPI_FILE_WRITE_AT(FH,INT(0,MPI_OFFSET_KIND),HEAD,1,RECORD,MPI_STATUS_IGNORE,ERROR)
ENDIF

CALL MPI_FILE_WRITE_AT(FH,INT(MYID+1,MPI_OFFSET_KIND),LINE,1,RECORD,MPI_STATUS_IGNORE,ERROR)

CALL MPI_BARRIER(MPI_COMM_WORLD, IERR)

CALL MPI_FILE_CLOSE(FH,ERROR)
CALL MPI_TYPE_FREE(RECORD,ERROR)

END SUBROUTINE DUMP_TIMERS



___
users mailing list
users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users

Re: [OMPI users] datarep=NULL argument in MPI_File_set_view

2017-05-11 Thread Edgar Gabriel
I would argue that if the standard does not mention NULL as a valid 
argument, we should probably remove it from the man pages. I can not 
recall having seen a code using that feature to be honest.


Thanks
Edgar


On 5/11/2017 8:46 PM, Bert Wesarg via users wrote:

Hi,

the MPI_File_set_view.3 manual from Open MPI 2.1.0 tells me:

To obtain the default value (or "native"), pass NULL.

Though, that results in SIGSEGV:

#0  strlen () at ../sysdeps/x86_64/strlen.S:106
#1  0x774fb41e in __GI___strdup (s=0x0) at strdup.c:41
#2  0x7fffde917d3c in mca_io_ompio_set_view_internal (fh=0x73e910,
disp=0, etype=0x602200 , filetype=0x602200 ,
datarep=0x0, info=0x0) at
../../../../../ompi/mca/io/ompio/io_ompio_file_set_view.c:108
#3  0x7fffde918318 in mca_io_ompio_file_set_view (fp=0x746610,
disp=0, etype=0x602200 , filetype=0x602200 ,
datarep=0x0, info=0x0) at
../../../../../ompi/mca/io/ompio/io_ompio_file_set_view.c:239
#4  0x77b1a44d in PMPI_File_set_view (fh=0x746610, disp=0,
etype=0x602200 , filetype=0x602200 ,
datarep=0x0, info=0x0) at pfile_set_view.c:73
#5  0x00400fea in main (argc=2, argv=0x7fffbfe8) at
mpi_io_test.c:58

Note, that the MPI 3.1 document does not mention NULL as a valid value
for datarep
in 13.5 at all.

Best,
Bert

___
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users


___
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users


Re: [OMPI users] MPI I/O gives undefined behavior if the amount of bytes described by a filetype reaches 2^32

2017-04-28 Thread Edgar Gabriel
short update on this: master does not finish with either OMPIO or ROMIO. 
It admittedly segfaults earlier with OMPIO than with ROMIO, but with one 
little tweak I can make them both fail at the same spot. There is 
clearly memory corruption going on for the larger cases; I will try to 
further narrow it down.



On 4/28/2017 8:46 AM, Edgar Gabriel wrote:

actually, reading through the email in more detail, I doubt
that it is OMPIO.

"I deem an output wrong if it doesn't follow from the parameters or if
the program crashes on execution.
The only difference between OpenMPI and Intel MPI, according to my
tests, is in the different behavior on error: OpenMPI will mostly write
wrong data but won't crash, whereas Intel MPI mostly crashes."

I will still look into that though.

Thanks

Edgar


On 4/28/2017 8:26 AM, Edgar Gabriel wrote:

Thank you for the detailed analysis,  I will have a look into that. It
would be really important to know which version of Open MPI triggers
this problem?

Christoph, I doubt that it is

https://github.com/open-mpi/ompi/issues/2399

due to the fact that the test uses collective I/O, which internally breaks the 
operations down into cycles (typically 32 MB), so that issue should not be 
triggered. If it is OMPIO, I would suspect it has more to do with 
how we treat/analyze the fileview. We did have test cases 
exceeding 100GB overall file size that worked correctly, but I am not sure 
whether we ever exceed a 4GB 'portion' of a file view per rank; I will look into that.

Thanks
Edgar


On 4/28/2017 6:49 AM, gil...@rist.or.jp wrote:

Before v1.10, the default is ROMIO, and you can force OMPIO with
mpirun --mca io ompio ...

   From v2, the default is OMPIO (unless you are running on lustre iirc),
and you can force ROMIO with
mpirun --mca io ^ompio ...

maybe that can help for the time being

Cheers,

Gilles

- Original Message -

Hello,

Which MPI Version are you using?
This looks for me like it triggers https://github.com/open-mpi/ompi/issues/2399
You can check if you are running into this problem by playing around

with the mca_io_ompio_cycle_buffer_size parameter.

Best
Christoph Niethammer

--

Christoph Niethammer
High Performance Computing Center Stuttgart (HLRS)
Nobelstrasse 19
70569 Stuttgart

Tel: ++49(0)711-685-87203
email: nietham...@hlrs.de
http://www.hlrs.de/people/niethammer



- Original Message -
From: "Nils Moschuering" <ni...@ipp.mpg.de>
To: "Open MPI Users" <users@lists.open-mpi.org>
Sent: Friday, April 28, 2017 12:51:50 PM
Subject: [OMPI users] MPI I/O gives undefined behavior if the amount

of bytes described by a filetype reaches 2^32

Dear OpenMPI Mailing List,

I have a problem with MPI I/O running on more than 1 rank using very

large filetypes. In order to reproduce the problem please take advantage
of the attached program "mpi_io_test.c". After compilation it should be
run on 2 nodes.

The program will do the following for a variety of different

parameters:

1. Create an elementary datatype (commonly referred to as etype in the
MPI Standard) of a specific size given by the parameter bsize (in
multiples of bytes). This datatype is called blk_filetype.

2. Create a complex filetype, which is different for each rank. This
filetype divides the file into a number of blocks given by parameter
nr_blocks of size bsize. Each rank only gets access to a subarray
containing

nr_blocks_per_rank = nr_blocks / size
blocks (where size is the number of participating ranks). The

respective subarray of each rank starts at

rank * nr_blocks_per_rank
This guarantees that the regions of the different ranks don't overlap.
The resulting datatype is called full_filetype .
3. Allocate enough memory on each rank, in order to be able to write a

whole block.

4. Fill the allocated memory with the rank number to be able to check

the resulting file for correctness.

5. Open a file named fname and set the view using the previously

generated blk_filetype and full_filetype .

6. Write one block on each rank, using the collective routine.
7. Clean up.

The above will be repeated for different values of bsize and nr_blocks.
Please note that there is no overflow of the basic datatype int that is used.
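
Since the attached mpi_io_test.c is not part of this archive, here is a rough
sketch of the steps above (variable names, the subarray construction and the
concrete sizes are assumptions, not the original code); with bsize =
500*1024*1024 and nr_blocks = 16 the per-rank view describes 8000 MiB, i.e.
more than 2^32 bytes:

#include <stdlib.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, size;
    int bsize     = 500*1024*1024;   /* bytes per block        */
    int nr_blocks = 16;              /* total number of blocks */

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int nr_blocks_per_rank = nr_blocks / size;

    /* 1. elementary block type of bsize bytes */
    MPI_Datatype blk_filetype, full_filetype;
    MPI_Type_contiguous(bsize, MPI_BYTE, &blk_filetype);
    MPI_Type_commit(&blk_filetype);

    /* 2. per-rank subarray of nr_blocks_per_rank blocks, starting at
       rank * nr_blocks_per_rank */
    int sizes    = nr_blocks;
    int subsizes = nr_blocks_per_rank;
    int starts   = rank * nr_blocks_per_rank;
    MPI_Type_create_subarray(1, &sizes, &subsizes, &starts,
                             MPI_ORDER_C, blk_filetype, &full_filetype);
    MPI_Type_commit(&full_filetype);

    /* 3.+4. one block of memory, filled with the rank number */
    char *buf = malloc(bsize);
    for (int i = 0; i < bsize; i++) buf[i] = (char)rank;

    /* 5. open the file and set the view */
    MPI_File fh;
    MPI_File_open(MPI_COMM_WORLD, "fname", MPI_MODE_CREATE | MPI_MODE_WRONLY,
                  MPI_INFO_NULL, &fh);
    MPI_File_set_view(fh, 0, blk_filetype, full_filetype, "native",
                      MPI_INFO_NULL);

    /* 6. collective write of one block per rank */
    MPI_File_write_all(fh, buf, 1, blk_filetype, MPI_STATUS_IGNORE);

    /* 7. clean up */
    MPI_File_close(&fh);
    MPI_Type_free(&full_filetype);
    MPI_Type_free(&blk_filetype);
    free(buf);
    MPI_Finalize();
    return 0;
}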

The output is verified using
hexdump fname
which performs a hexdump of the file. This tool collects consecutive

equal lines in a file into one output line. The resulting output of a
call to hexdump is given by a structure comparable to the following

 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 |.

...|

*
1f40 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |.

...|

*
3e80 02 02 02 02 02 02 02 02 02 02 02 02 02 02 02 02 |.

...|

*
5dc0
This example is to be read in the following manner:
-From byte  to 1f40 (which is equal to 500 Mib) the file

contains the value 01 in each byte.

-From byte 1f40 to 3e80 (which is equal 

Re: [OMPI users] MPI I/O gives undefined behavior if the amount of bytes described by a filetype reaches 2^32

2017-04-28 Thread Edgar Gabriel
actually, reading through the email in more detail, I doubt 
that it is OMPIO.


"I deem an output wrong if it doesn't follow from the parameters or if 
the program crashes on execution.
The only difference between OpenMPI and Intel MPI, according to my 
tests, is in the different behavior on error: OpenMPI will mostly write 
wrong data but won't crash, whereas Intel MPI mostly crashes."


I will still look into that though.

Thanks

Edgar


On 4/28/2017 8:26 AM, Edgar Gabriel wrote:

Thank you for the detailed analysis,  I will have a look into that. It
would be really important to know which version of Open MPI triggers
this problem?

Christoph, I doubt that it is

https://github.com/open-mpi/ompi/issues/2399

due to the fact that the test uses collective I/O, which internally breaks the 
operations down into cycles (typically 32 MB), so that issue should not be 
triggered. If it is OMPIO, I would suspect it has more to do with 
how we treat/analyze the fileview. We did have test cases 
exceeding 100GB overall file size that worked correctly, but I am not sure 
whether we ever exceed a 4GB 'portion' of a file view per rank; I will look into that.

Thanks
Edgar


On 4/28/2017 6:49 AM, gil...@rist.or.jp wrote:

Before v1.10, the default is ROMIO, and you can force OMPIO with
mpirun --mca io ompio ...

  From v2, the default is OMPIO (unless you are running on lustre iirc),
and you can force ROMIO with
mpirun --mca io ^ompio ...

maybe that can help for the time being

Cheers,

Gilles

- Original Message -

Hello,

Which MPI Version are you using?
This looks for me like it triggers https://github.com/open-mpi/ompi/issues/2399
You can check if you are running into this problem by playing around

with the mca_io_ompio_cycle_buffer_size parameter.

Best
Christoph Niethammer

--

Christoph Niethammer
High Performance Computing Center Stuttgart (HLRS)
Nobelstrasse 19
70569 Stuttgart

Tel: ++49(0)711-685-87203
email: nietham...@hlrs.de
http://www.hlrs.de/people/niethammer



- Original Message -
From: "Nils Moschuering" <ni...@ipp.mpg.de>
To: "Open MPI Users" <users@lists.open-mpi.org>
Sent: Friday, April 28, 2017 12:51:50 PM
Subject: [OMPI users] MPI I/O gives undefined behavior if the amount

of bytes described by a filetype reaches 2^32

Dear OpenMPI Mailing List,

I have a problem with MPI I/O running on more than 1 rank using very

large filetypes. In order to reproduce the problem please take advantage
of the attached program "mpi_io_test.c". After compilation it should be
run on 2 nodes.

The program will do the following for a variety of different

parameters:

1. Create an elementary datatype (commonly referred to as etype in the
MPI Standard) of a specific size given by the parameter bsize (in
multiples of bytes). This datatype is called blk_filetype.

2. Create a complex filetype, which is different for each rank. This
filetype divides the file into a number of blocks given by parameter
nr_blocks of size bsize. Each rank only gets access to a subarray
containing

nr_blocks_per_rank = nr_blocks / size
blocks (where size is the number of participating ranks). The

respective subarray of each rank starts at

rank * nr_blocks_per_rank
This guarantees that the regions of the different ranks don't overlap.
The resulting datatype is called full_filetype .
3. Allocate enough memory on each rank, in order to be able to write a

whole block.

4. Fill the allocated memory with the rank number to be able to check

the resulting file for correctness.

5. Open a file named fname and set the view using the previously

generated blk_filetype and full_filetype .

6. Write one block on each rank, using the collective routine.
7. Clean up.

The above will be repeated for different values of bsize and nr_blocks.
Please note that there is no overflow of the basic datatype int that is used.

The output is verified using
hexdump fname
which performs a hexdump of the file. This tool collects consecutive

equal lines in a file into one output line. The resulting output of a
call to hexdump is given by a structure comparable to the following

 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 |.

...|

*
1f40 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |.

...|

*
3e80 02 02 02 02 02 02 02 02 02 02 02 02 02 02 02 02 |.

...|

*
5dc0
This example is to be read in the following manner:
-From byte  to 1f40 (which is equal to 500 Mib) the file

contains the value 01 in each byte.

-From byte 1f40 to 3e80 (which is equal to 1000 Mib) the file

contains the value 00 in each byte.

-From byte 3e80 to 5dc0 (which is equal to 1500 Mib) the file

contains the value 02 in each byte.

-The file ends here.
This is the correct output of the above outlined program with

parameters

bsize=500*1023*1024
nr_blocks=4
running on 2 ranks. The attached file contains a lot of tes

Re: [OMPI users] MPI I/O gives undefined behavior if the amount of bytes described by a filetype reaches 2^32

2017-04-28 Thread Edgar Gabriel
he source.
The final conclusions, I derive from the tests, are the following:

1. If the filetype used in the view is set in a way that it describes

an amount of bytes equaling or exceeding 2^32 = 4Gib the code produces
wrong output. For values slightly smaller (the second example with fname
="test_8_blocks" uses a total filetype size of 4000 MiB which is smaller
than 4Gib) the code works as expected.

2. The act of actually writing the described regions is not important.

When the filetype describes an area >= 4Gib but only writes to regions
much smaller than that, the code still produces undefined behavior (
please refer to the 6th example with fname="test_too_large_blocks" in
order to see an example).

3. It doesn't matter if the block size or the amount of blocks pushes

the filetype over the 4 Gib (refer to the 5th and 6th example, with
filenames "test_16_blocks" and "test_too_large_blocks" respectively).

4. If the binary is launched using only one rank, the output is always

as expected (refer to the 3rd and 4th example, with filenames "test_too_
large_blocks_single" and "test_too_large_blocks_single_even_larger",
respectively).

There are, of course, many other things one could test.
It seems that the implementations use 32bit integer variables to

compute the byte composition inside the filetype. Since the filetype is
defined using two 32bit integer variables, this can easily lead to
integer overflows if the user supplies large values. It seems that no
implementation expects this problem and therefore they do not act
gracefully on its occurrence.
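
The suspected failure mode, in miniature (illustrative values only, not code
from the attached test):

#include <mpi.h>

static void overflow_example(void)
{
    int bsize     = 500*1024*1024;                 /* bytes per block */
    int nr_blocks = 16;
    long long bad  = bsize * nr_blocks;            /* 32-bit multiply wraps before the widening */
    long long good = (MPI_Aint)bsize * nr_blocks;  /* widen first: 8000 MiB stays representable */
    (void)bad; (void)good;
}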

I looked at [ https://software.intel.com/en-us/node/528914 | ILP64 ]

Support, but it only adapts the function parameters and not the
internally used variables and it is also not available for C.

I looked at [ 
https://www.gnu.org/software/libc/manual/html_node/Program-Error-Signals.html#Program%20Error%20Signals

  | integer

   overflow ] (FPE_INTOVF_TRAP) trapping, which could help to

verify the source of the problem, but it doesn't seem to be possible for
C. Intel does [ 
https://software.intel.com/en-us/forums/intel-c-compiler/topic/306156
  | not ] offer any built-in integer overflow trapping.

There are ways to circumvent this problem for most cases. It is only

unavoidable if the logic of a program contains complex, non-repeating
data structures with sizes of over (or equal) 4GiB. Even then, one could
split up the filetype and use a different displacement in two distinct
write calls.

Still, this problem violates the standard as it produces undefined

behavior even when using the API in a consistent way. The implementation
should at least provide a warning for the user, but should ideally use
larger datatypes in the filetype computations. When a user stumbles on
this problem, he will have a hard time to debug it.

Thank you very much for reading everything ;)

Kind Regards,

Nils

___
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users
___
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users



___
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users


--
Edgar Gabriel
Associate Professor
Parallel Software Technologies Lab  http://pstl.cs.uh.edu
Department of Computer Science  University of Houston
Philip G. Hoffman Hall, Room 524Houston, TX-77204, USA
Tel: +1 (713) 743-3857  Fax: +1 (713) 743-3335
--

___
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users


Re: [OMPI users] OpenMPI 2.1.0 build error: yes/lib: No such file or director

2017-04-26 Thread Edgar Gabriel
Can you try to just skip the --with-lustre option ? The option really is 
there to provide an alternative path, if the lustre libraries are not 
installed in the default directories ( e.g. 
--with-lustre=/opt/lustre/).  There is obviously a bug that the 
system did not recognize the missing argument. However, if the lustre 
libraries and headers are installed in the default location (i.e. 
/usr/), the configure logic will pick it up and compile it even if you 
do not provide the --with-lustre argument.
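
In other words (prefix and version paths below are only placeholders):

# Lustre in the default location (/usr): no argument needed, support is
# detected and built automatically
./configure --prefix=/opt/openmpi-2.1.0 ...

# Lustre installed somewhere non-standard: pass the installation prefix
./configure --prefix=/opt/openmpi-2.1.0 --with-lustre=/opt/lustre/2.9 ...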


Thanks

Edgar


On 4/26/2017 4:18 PM, Prentice Bisbal wrote:

I'm getting the following error when I build OpenMPI 2.1.0 with GCC 5.4.0:

/bin/sh ../../../../libtool  --tag=CC   --mode=link gcc  -O3 -DNDEBUG
-finline-functions -fno-strict-aliasing -pthread -module -avoid-version
-Lyes/lib  -o libmca_fs_lustre.la  fs_lustre.lo fs_lustre_component.lo
fs_lustre_file_open.lo fs_lustre_file_close.lo fs_lustre_file_delete.lo
fs_lustre_file_sync.lo fs_lustre_file_set_size.lo
fs_lustre_file_get_size.lo -llustreapi  -lrt -lm -lutil
../../../../libtool: line 7489: cd: yes/lib: No such file or directory
libtool:   error: cannot determine absolute directory name of 'yes/lib'
make[2]: *** [libmca_fs_lustre.la] Error 1
make[2]: Leaving directory `/local/pbisbal/openmpi-2.1.0/ompi/mca/fs/lustre'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory `/local/pbisbal/openmpi-2.1.0/ompi'
make: *** [all-recursive] Error 1

Obviously, the problem is this argument to libtool in the above command:

-Lyes/lib

I've worked around this by going into ompi/mca/fs/lustre, running that
same libtool command but changing "-Lyes/lib" to "-L/lib", and then
resuming my build from the top level. I figured I'd report this error
here, to see if this is a problem caused by me, or a bug in the configure
script.

When I do 'make check', I get another error caused by the same bad argument:

/bin/sh ../../libtool  --tag=CC   --mode=link gcc  -O3 -DNDEBUG
-finline-functions -fno-strict-aliasing -pthread
-L/usr/pppl/slurm/15.08.8/lib -Lyes/lib-Wl,-rpath
-Wl,/usr/pppl/slurm/15.08.8/lib -Wl,-rpath -Wl,yes/lib -Wl,-rpath
-Wl,/usr/pppl/gcc/5.4-pkgs/openmpi-2.1.0/lib -Wl,--enable-new-dtags  -o
external32 external32.o ../../ompi/libmpi.la ../../opal/libopen-pal.la
-lrt -lm -lutil
../../libtool: line 7489: cd: yes/lib: No such file or directory
libtool:   error: cannot determine absolute directory name of 'yes/lib'
make[3]: *** [external32] Error 1
make[3]: Leaving directory `/local/pbisbal/openmpi-2.1.0/test/datatype'
make[2]: *** [check-am] Error 2
make[2]: Leaving directory `/local/pbisbal/openmpi-2.1.0/test/datatype'
make[1]: *** [check-recursive] Error 1
make[1]: Leaving directory `/local/pbisbal/openmpi-2.1.0/test'
make: *** [check-recursive] Error 1

For reference, here is my configure command:

./configure \
--prefix=/usr/pppl/gcc/5.4-pkgs/openmpi-2.1.0 \
--disable-silent-rules \
--enable-mpi-fortran \
--enable-mpi-cxx \
--enable-shared \
--enable-static \
--enable-mpi-thread-multiple \
--with-cuda=/usr/pppl/cuda/cudatoolkit/6.5.14 \
--with-pmix \
--with-verbs \
--with-hwloc \
--with-pmi=/usr/pppl/slurm/15.08.8 \
--with-slurm \
--with-lustre \
--with-psm \
CC=gcc \
CXX=g++ \
FC=gfortran \
2>&1 | tee configure.log



___
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users


Re: [OMPI users] Lustre support uses deprecated include.

2017-03-13 Thread Edgar Gabriel
thank you for the report, it is on my to do list. I will try to get the 
configure logic to recognize which file to use later; this should 
hopefully be done for the 2.0.3 and 2.1.1 series.


Thanks

Edgar


On 3/13/2017 8:55 AM, Åke Sandgren wrote:

Hi!

The lustre support in ompi/mca/fs/lustre/fs_lustre.h is using a
deprecated include.

#include 

is deprecated in newer lustre versions (at least from 2.8) and

#include 

should be used instead.



___
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users


Re: [OMPI users] sharedfp/lockedfile collision between multiple program instances

2017-03-03 Thread Edgar Gabriel
done.  I added 2.0.3 as the milestone, since I am not sure what the 
timeline for 2.1.0 is. I will try to get the fix in over the weekend, 
and we can go from there.


Thanks

Edgar


On 3/3/2017 9:00 AM, Howard Pritchard wrote:

Hi Edgar

Please open an issue too so we can track the fix.

Howard


Edgar Gabriel <egabr...@central.uh.edu> wrote on Fri, 3 Mar 2017 at 07:45:


Nicolas,

thank you for the bug report, I can confirm the behavior. I will
work on
a patch and will try to get that into the next release, should
hopefully
not be too complicated.

Thanks

Edgar


On 3/3/2017 7:36 AM, Nicolas Joly wrote:
> Hi,
>
> We just got hit by a problem with sharedfp/lockedfile component
under
> v2.0.1 (should be identical with v2.0.2). We had 2 instances of
an MPI
> program running concurrently on the same input file and using
> MPI_File_read_shared() function ...
>
> If the shared file pointer is maintained with the lockedfile
> component, a "XXX.lockedfile" is created near to the data
> file. Unfortunately, this fixed name will collide with multiple
tools
> instances ;)
>
> Running 2 instances of the following command line (source code
> attached) on the same machine will show the problematic behaviour.
>
> mpirun -n 1 --mca sharedfp lockedfile ./shrread -v input.dat
>
> Confirmed with lsof(8) output :
>
> njoly@tars [~]> lsof input.dat.lockedfile
> COMMAND  PID  USER   FD   TYPE DEVICE SIZE/OFF NODE NAME
> shrread 5876 njoly   21w   REG   0,308 13510798885996031
input.dat.lockedfile
> shrread 5884 njoly   21w   REG   0,308 13510798885996031
input.dat.lockedfile
>
> Thanks in advance.
>

___
users mailing list
users@lists.open-mpi.org <mailto:users@lists.open-mpi.org>
https://rfd.newmexicoconsortium.org/mailman/listinfo/users



___
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users

Re: [OMPI users] sharedfp/lockedfile collision between multiple program instances

2017-03-03 Thread Edgar Gabriel

Nicolas,

thank you for the bug report, I can confirm the behavior. I will work on 
a patch and will try to get that into the next release, should hopefully 
not be too complicated.


Thanks

Edgar


On 3/3/2017 7:36 AM, Nicolas Joly wrote:

Hi,

We just got hit by a problem with sharedfp/lockedfile component under
v2.0.1 (should be identical with v2.0.2). We had 2 instances of an MPI
program running concurrently on the same input file and using
MPI_File_read_shared() function ...

If the shared file pointer is maintained with the lockedfile
component, a "XXX.lockedfile" is created near to the data
file. Unfortunately, this fixed name will collide with multiple tools
instances ;)

Running 2 instances of the following command line (source code
attached) on the same machine will show the problematic behaviour.

mpirun -n 1 --mca sharedfp lockedfile ./shrread -v input.dat

Confirmed with lsof(8) output :

njoly@tars [~]> lsof input.dat.lockedfile
COMMAND  PID  USER   FD   TYPE DEVICE SIZE/OFF  NODE NAME
shrread 5876 njoly   21w   REG   0,308 13510798885996031 
input.dat.lockedfile
shrread 5884 njoly   21w   REG   0,308 13510798885996031 
input.dat.lockedfile
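
The reader itself essentially boils down to a loop over
MPI_File_read_shared(); a rough sketch only (the real shrread source is
attached to the original mail, so names and buffer handling here are
assumptions):

#include <mpi.h>

int main(int argc, char **argv)
{
    char buf[4096];
    int count;
    MPI_File fh;
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_File_open(MPI_COMM_WORLD, "input.dat", MPI_MODE_RDONLY,
                  MPI_INFO_NULL, &fh);
    /* every process advances the shared file pointer; with the lockedfile
       component this creates "input.dat.lockedfile" next to the data file */
    for (;;) {
        MPI_File_read_shared(fh, buf, (int)sizeof(buf), MPI_BYTE, &status);
        MPI_Get_count(&status, MPI_BYTE, &count);
        if (count == 0) break;
        /* ... process the chunk ... */
    }
    MPI_File_close(&fh);
    MPI_Finalize();
    return 0;
}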

Thanks in advance.



___
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users


Re: [OMPI users] MPI_File_write_shared() and MPI_MODE_APPEND issue ?

2017-01-23 Thread Edgar Gabriel
just wanted to give a brief update on this. The problem was in fact that 
we did not correctly move the shared file pointer to the end of the file 
when a file is opened in append mode. (The individual file pointer did 
the right thing, however). The patch itself is not overly complicated; I 
filed a PR towards master, and will create PRs for the 2.0 and 2.1 
releases later as well. I am not sure however whether it will make it in 
time for the 2.0.2 release, it might be too late for that.


Thanks for the bug report!

Edgar


On 1/18/2017 9:36 AM, Nicolas Joly wrote:

Hi,

We have a tool where all workers will use MPI_File_write_shared() on a
file that was opened with MPI_MODE_APPEND, mostly because rank 0 will
have written some format specific header data.

We recently upgraded our openmpi version from v1.10.4 to v2.0.1. And
at that time we noticed a behaviour change ... ompio do not show the
same result as romio with the attached code.

njoly@tars-submit0 [tmp/mpiio]> mpirun --version
mpirun (Open MPI) 2.0.1
[...]
njoly@tars-submit0 [tmp/mpiio]> mpirun -n 1 --mca io romio314 ./openappend
njoly@tars-submit0 [tmp/mpiio]> echo $?
0
njoly@tars-submit0 [tmp/mpiio]> cat openappend.test
Header line
Data line
njoly@tars-submit0 [tmp/mpiio]> mpirun -n 1 --mca io ompio ./openappend
njoly@tars-submit0 [tmp/mpiio]> echo $?
0
njoly@tars-submit0 [tmp/mpiio]> cat openappend.test
Data line
e

With ompio, it seems that, for some reason, the shared file pointer
was reset/initialised(?) to zero ... leading to an unexpected write
position for the "Data line" buffer.
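
A rough reconstruction of what the attached openappend test presumably does
(the exact code is not in the archive, so the details are assumptions): a
header is written first, then the file is re-opened in append mode and every
rank appends a record through the shared file pointer.

#include <string.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    const char *header = "Header line\n";
    const char *data   = "Data line\n";
    MPI_File fh;
    int rank;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* step 1: rank 0 writes the header */
    MPI_File_open(MPI_COMM_WORLD, "openappend.test",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);
    if (rank == 0)
        MPI_File_write(fh, header, (int)strlen(header), MPI_CHAR,
                       MPI_STATUS_IGNORE);
    MPI_File_close(&fh);

    /* step 2: everybody appends via the shared file pointer; with the bug,
       ompio left that pointer at 0 instead of at the end of the file */
    MPI_File_open(MPI_COMM_WORLD, "openappend.test",
                  MPI_MODE_WRONLY | MPI_MODE_APPEND, MPI_INFO_NULL, &fh);
    MPI_File_write_shared(fh, data, (int)strlen(data), MPI_CHAR,
                          MPI_STATUS_IGNORE);
    MPI_File_close(&fh);

    MPI_Finalize();
    return 0;
}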

Thanks in advance.
Regards.


___
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users


Re: [OMPI users] MPI_File_write_shared() and MPI_MODE_APPEND issue ?

2017-01-18 Thread Edgar Gabriel
I will look into this, I have a suspicion on what might be wrong. Give 
me a day or three.


Thanks

Edgar


On 1/18/2017 9:36 AM, Nicolas Joly wrote:

Hi,

We have a tool where all workers will use MPI_File_write_shared() on a
file that was opened with MPI_MODE_APPEND, mostly because rank 0 will
have written some format specific header data.

We recently upgraded our openmpi version from v1.10.4 to v2.0.1. And
at that time we noticed a behaviour change ... ompio do not show the
same result as romio with the attached code.

njoly@tars-submit0 [tmp/mpiio]> mpirun --version
mpirun (Open MPI) 2.0.1
[...]
njoly@tars-submit0 [tmp/mpiio]> mpirun -n 1 --mca io romio314 ./openappend
njoly@tars-submit0 [tmp/mpiio]> echo $?
0
njoly@tars-submit0 [tmp/mpiio]> cat openappend.test
Header line
Data line
njoly@tars-submit0 [tmp/mpiio]> mpirun -n 1 --mca io ompio ./openappend
njoly@tars-submit0 [tmp/mpiio]> echo $?
0
njoly@tars-submit0 [tmp/mpiio]> cat openappend.test
Data line
e

With ompio, it seems that, for some reason, the shared file pointer
was reset/initialised(?) to zero ... leading to an unexpected write
position for the "Data line" buffer.

Thanks in advance.
Regards.



___
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users


Re: [OMPI users] Regression in MPI_File_close?!

2016-06-02 Thread Edgar Gabriel

Gilles,

I think the semantics of MPI_File_close does not necessarily mandate 
that there has to be an MPI_Barrier based on that text snippet. However, 
I think what the Barrier does in this scenario is 'hide' a consequence 
of an implementation aspect. So the MPI standard might not mandate a 
Barrier synchronization, but the actual implementation does.


(I have, btw, not yet had the time to think through the different 
mechanisms that ompio offers for shared file pointers, and whether all 
of them truly require a Barrier for the same reason). Hope to get to 
that soon.


THanks

Edgar


On 5/31/2016 9:33 PM, Gilles Gouaillardet wrote:


Edgar,


this is the bug reported at 
http://www.open-mpi.org/community/lists/users/2016/05/29333.php



now i am having some second thoughts about it ...

per the MPI_File_close man page :

"MPI_File_close  first  synchronizes  file  state, then closes the 
file associated with fh.


MPI_File_close is a collective routine. The user is responsible for 
ensuring that all outstanding requests associated with fh have 
completed before calling MPI_File_close."



does this implies MPI_File_close() internally performs a MPI_Barrier() ?

or am i over-interpreting the man page ?


My point is if all tasks but one call MPI_File_close() *before* the 
other one calls MPI_File_write_at(), there is really nothing to flush, 
and though MPI_File_close() is a collective routine (just like 
MPI_Bcast() ) that does not necessarily mean it has a MPI_Barrier() 
semantic.



Cheers,


Gilles


On 5/31/2016 11:18 PM, Edgar Gabriel wrote:


just for my understanding, which bug in ompio are you referring to? I am 
only aware of a single (pretty minor) pending issue in the 2.x series


Thanks

Edgar


On 5/31/2016 1:28 AM, Gilles Gouaillardet wrote:


Thanks for the report.

the romio included in the v1.10 series is a bit old and did not 
include the fix,


i made PR #1206 for that 
http://www.open-mpi.org/community/lists/users/2016/05/29333.php


feel free to manually apply the patch available at 
https://github.com/open-mpi/ompi-release/commit/a0ea9fb6cbe4cf71567c9fc7fd8f4be384617ad4.diff



note that the issue is already fixed in romio of the v2.x series and 
master.


that being said, the default io module here is ompio, and it is 
currently buggy, so if you are using these series, you need to


mpirun --mca io romio314 ...

for the time being


Cheers,


Gilles


On 5/31/2016 2:27 PM, Cihan Altinay wrote:

Hello list,

I recently upgraded my distribution-supplied OpenMPI packages 
(debian) from 1.6.5 to 1.10.2 and the attached test is no longer 
guaranteed to produce the expected output.

In plain English what the test is doing is:
1) open a file in parallel (all on the same local ext3/4 filesystem),
2) use MPI_File_write_at() or MPI_File_write_shared() to write to it,
3) close the file using MPI_File_close(),
4) then check the file size (either by stat(), or by fseek+ftell)
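
(A minimal sketch of those four steps, reconstructed here because the
attached test itself is not part of the archive; the details are
assumptions:)

#include <stdio.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, size;
    char byte;
    struct stat st;
    MPI_File fh;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    byte = (char)('0' + rank);

    /* 1) open in parallel on the local file system */
    MPI_File_open(MPI_COMM_WORLD, "close_test.dat",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);
    /* 2) write one byte per rank */
    MPI_File_write_at(fh, (MPI_Offset)rank, &byte, 1, MPI_CHAR,
                      MPI_STATUS_IGNORE);
    /* 3) collective close */
    MPI_File_close(&fh);
    /* 4) check the size; with the regression this can be smaller than 'size' */
    if (rank == 0) {
        stat("close_test.dat", &st);
        printf("expected %d bytes, stat() reports %lld\n",
               size, (long long)st.st_size);
    }
    MPI_Finalize();
    return 0;
}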

My reading of the standard is that MPI_File_close() is a collective 
operation so I should reliably get the correct file size in step 4. 
However, while this worked with version 1.6.5 and with Intel MPI 
this is no longer the case with the current OpenMPI version.
I was able to confirm the same behaviour on a fresh Ubuntu 16.0.4 
install in a VM.

The more ranks I use the more likely I get a wrong file size.

Is there anything I'm missing or is this a regression?

Thanks,
Cihan



___
users mailing list
us...@open-mpi.org
Subscription:https://www.open-mpi.org/mailman/listinfo.cgi/users
Link to this 
post:http://www.open-mpi.org/community/lists/users/2016/05/29333.php




--
Edgar Gabriel
Associate Professor
Parallel Software Technologies Labhttp://pstl.cs.uh.edu
Department of Computer Science  University of Houston
Philip G. Hoffman Hall, Room 524Houston, TX-77204, USA
Tel: +1 (713) 743-3857  Fax: +1 (713) 743-3335
--


___
users mailing list
us...@open-mpi.org
Subscription:https://www.open-mpi.org/mailman/listinfo.cgi/users
Link to this 
post:http://www.open-mpi.org/community/lists/users/2016/05/29335.php




--
Edgar Gabriel
Associate Professor
Parallel Software Technologies Lab  http://pstl.cs.uh.edu
Department of Computer Science  University of Houston
Philip G. Hoffman Hall, Room 524Houston, TX-77204, USA
Tel: +1 (713) 743-3857  Fax: +1 (713) 743-3335
--



Re: [OMPI users] Regression in MPI_File_close?!

2016-05-31 Thread Edgar Gabriel
just for my understanding, which bug in ompio are you referring to? I am 
only aware of a single (pretty minor) pending issue in the 2.x series


Thanks

Edgar


On 5/31/2016 1:28 AM, Gilles Gouaillardet wrote:


Thanks for the report.

the romio included in the v1.10 series is a bit old and did not 
include the fix,


i made PR #1206 for that 
http://www.open-mpi.org/community/lists/users/2016/05/29333.php


feel free to manually apply the patch available at 
https://github.com/open-mpi/ompi-release/commit/a0ea9fb6cbe4cf71567c9fc7fd8f4be384617ad4.diff



note that the issue is already fixed in romio of the v2.x series and 
master.


that being said, the default io module here is ompio, and it is 
currently buggy, so if you are using these series, you need to


mpirun --mca io romio314 ...

for the time being


Cheers,


Gilles


On 5/31/2016 2:27 PM, Cihan Altinay wrote:

Hello list,

I recently upgraded my distribution-supplied OpenMPI packages 
(debian) from 1.6.5 to 1.10.2 and the attached test is no longer 
guaranteed to produce the expected output.

In plain English what the test is doing is:
1) open a file in parallel (all on the same local ext3/4 filesystem),
2) use MPI_File_write_at() or MPI_File_write_shared() to write to it,
3) close the file using MPI_File_close(),
4) then check the file size (either by stat(), or by fseek+ftell)

My reading of the standard is that MPI_File_close() is a collective 
operation so I should reliably get the correct file size in step 4. 
However, while this worked with version 1.6.5 and with Intel MPI this 
is no longer the case with the current OpenMPI version.
I was able to confirm the same behaviour on a fresh Ubuntu 16.0.4 
install in a VM.

The more ranks I use the more likely I get a wrong file size.

Is there anything I'm missing or is this a regression?

Thanks,
Cihan



___
users mailing list
us...@open-mpi.org
Subscription:https://www.open-mpi.org/mailman/listinfo.cgi/users
Link to this 
post:http://www.open-mpi.org/community/lists/users/2016/05/29333.php




--
Edgar Gabriel
Associate Professor
Parallel Software Technologies Lab  http://pstl.cs.uh.edu
Department of Computer Science  University of Houston
Philip G. Hoffman Hall, Room 524Houston, TX-77204, USA
Tel: +1 (713) 743-3857  Fax: +1 (713) 743-3335
--



Re: [OMPI users] Wrong values when reading file with MPI IO

2016-04-07 Thread Edgar Gabriel

What version of Open MPI did you execute your test with?
Edgar

On 4/7/2016 1:54 PM, david.froger...@mailoo.org wrote:

Hello,

Here is a simple `C` program reading a file in parallel with `MPI IO`:

  #include <stdio.h>
  #include <stdlib.h>

  #include "mpi.h"

  #define N 10

  main( int argc, char **argv )
  {
      int rank, size;
      MPI_Init(&argc, &argv);
      MPI_Comm_rank( MPI_COMM_WORLD, &rank );
      MPI_Comm_size( MPI_COMM_WORLD, &size );

      int i0 = N *  rank    / size;
      int i1 = N * (rank+1) / size;
      printf("rank: %d, i0: %d, i1: %d\n", rank, i0, i1);

      int i;
      double* data = malloc( (i1-i0)*sizeof(double) );
      for (i = 0 ; i < i1-i0 ; i++)
          data[i] = 123.;

      MPI_File f;
      MPI_File_open(MPI_COMM_WORLD, "data.bin", MPI_MODE_RDONLY,
                    MPI_INFO_NULL, &f);

      MPI_File_set_view(f, i0, MPI_DOUBLE, MPI_DOUBLE, "native",
                        MPI_INFO_NULL);

      MPI_Status status;
      MPI_File_read(f, data, i1-i0, MPI_DOUBLE, &status);

      int count;
      MPI_Get_count(&status, MPI_DOUBLE, &count);
      printf("rank %d, %d value read\n", rank, count);

      for (i = 0 ; i < i1-i0 ; i++) {
          printf("rank: %d index: %d value: %.2f\n", rank, i,
                 data[i]);
      }

      MPI_File_close(&f);

      MPI_Finalize();

      free(data);

      return 0;
  }

With one processus:

  ./read_mpi_io

Values read are correct:

  rank: 0, i0: 0, i1: 10
  rank 0, 10 value read
  rank: 0 index: 0 value: 0.00
  rank: 0 index: 1 value: 1.00
  rank: 0 index: 2 value: 2.00
  rank: 0 index: 3 value: 3.00
  rank: 0 index: 4 value: 4.00
  rank: 0 index: 5 value: 5.00
  rank: 0 index: 6 value: 6.00
  rank: 0 index: 7 value: 7.00
  rank: 0 index: 8 value: 8.00
  rank: 0 index: 9 value: 9.00

But with two processus:

  mpirun -n 2 ./read_mpi_io

I get wrong values (zeros):

  rank: 0, i0: 0, i1: 5
  rank: 1, i0: 5, i1: 10
  rank 0, 5 value read
  rank: 0 index: 0 value: 0.00
  rank 1, 5 value read
  rank: 1 index: 0 value: 0.00
  rank: 0 index: 1 value: 1.00
  rank: 0 index: 2 value: 2.00
  rank: 1 index: 1 value: 0.00
  rank: 1 index: 2 value: 0.00
  rank: 1 index: 3 value: 0.00
  rank: 1 index: 4 value: 0.00
  rank: 0 index: 3 value: 3.00
  rank: 0 index: 4 value: 4.00


What's wrong in my C code?

Thanks,
David
___
users mailing list
us...@open-mpi.org
Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
Link to this post: 
http://www.open-mpi.org/community/lists/users/2016/04/28903.php


Re: [OMPI users] terrible infiniband performance for

2016-03-23 Thread Edgar Gabriel
iirc issue--how should I check?
>> Are there any flags I should be using for infiniband? Is this a
>> problem with latency?
>>
>> Ron
>>
>>
>> ---
>> Ron Cohen
>> recoh...@gmail.com
>> skypename: ronaldcohen
>> twitter: @recohen3
>>
>>
>> On Wed, Mar 23, 2016 at 8:13 AM, Gilles Gouaillardet
>> <gilles.gouaillar...@gmail.com> wrote:
>> > Ronald,
>> >
>> > did you try to build openmpi with a previous gcc release ?
>> > if yes, what about the performance ?
>> >
>> > did you build openmpi from a tarball or from git ?
>> > if from git and without VPATH, then you need to
>> > configure with --disable-debug
>> >
>> > iirc, one issue was identified previously
>> > (gcc optimization that prevents the memory wrapper from
behaving as
>> > expected) and I am not sure the fix landed in v1.10
branch nor master
>> > ...
>> >
>> > thanks for the info about gcc 6.0.0
>> > now this is supported on a free compiler
>> > (cray and intel already support that, but they are commercial
>> > compilers),
>> > I will resume my work on supporting this
>> >
>> > Cheers,
>> >
>> > Gilles
>> >
>> > On Wednesday, March 23, 2016, Ronald Cohen
<recoh...@gmail.com> wrote:
>> >>
>> >> I get 100 GFLOPS for 16 cores on one node, but 1 GFLOP
running 8 cores
>> >> on two nodes. It seems that quad-infiniband should do
better than
>> >> this. I built openmpi-1.10.2g with gcc version 6.0.0
20160317 . Any
>> >> ideas of what to do to get usable performance? Thank you!
>> >>
>> >> bstatus
>> >> Infiniband device 'mlx4_0' port 1 status:
>> >> default gid:
 fe80::::0002:c903:00ec:9301
>> >> base lid:0x1
>> >> sm lid:  0x1
>> >> state:   4: ACTIVE
>> >> phys state:  5: LinkUp
>> >> rate:56 Gb/sec (4X FDR)
>> >> link_layer:  InfiniBand
>> >>
>> >> Ron
>> >> --
>> >>
>> >> Professor Dr. Ronald Cohen
>> >> Ludwig Maximilians Universität
>> >> Theresienstrasse 41 Room 207
>> >> Department für Geo- und Umweltwissenschaften
>> >> München
>> >> 80333
>> >> Deutschland
>> >>
>> >>
>> >> ronald.co...@min.uni-muenchen.de
>> >> skype: ronaldcohen
>> >> +49 (0) 89 74567980
>> >> ---
>> >> Ronald Cohen
>> >> Geophysical Laboratory
>> >> Carnegie Institution
>> >> 5251 Broad Branch Rd., N.W.
>> >> Washington, D.C. 20015
>> >> rco...@carnegiescience.edu
>> >> office: 202-478-8937
>> >> skype: ronaldcohen
>> >> https://twitter.com/recohen3
>> >> https://www.linkedin.com/profile/view?id=163327727
>> >>
>> >>
>> >> ---
>> >> Ron Cohen
>> >> recoh...@gmail.com
>> >> skypename: ronaldcohen
>> >> twitter: @recohen3
>> >> ___
>> >> users mailing list
>> >> us...@open-mpi.org
>> >> Subscription:
http://www.open-mpi.org/mailman/listinfo.cgi/users
>> >> Link to this post:
>> >>
http://www.open-mpi.org/community/lists/users/2016/03/28791.php
>> >
>> >
>> > ___
    >> > users mailing list
>> > us...@open-mpi.org
>> > Subscription:
http://www.open-mpi.org/mailman/listinfo.cgi/users
>> > Link to this post:
>&

Re: [OMPI users] Error with MPI_Register_datarep

2016-03-16 Thread Edgar Gabriel

On 3/16/2016 7:06 AM, Éric Chamberland wrote:

Le 16-03-14 15:07, Rob Latham a écrit :

On mpich's discussion list the point was made that libraries like HDF5
and (Parallel-)NetCDF provide not only the sort of platform
portability Eric desires, but also provide a self-describing file format.

==rob


But I do not agree with that.

If MPI can provide me a simple solution like user datarep, why in the
world would I bind my code to another library?

Instead of re-coding all my I/O in my code, I would prefer to contribute
to MPI I/O implementations out there...  :)

So, the never answered question: how big is that task?


Just speaking for OMPIO: there is a simple solution which would 
basically perform the necessary conversion of the user buffer as a first 
step. This implementation would be fairly straightforward, but would 
require a temporary buffer that is basically of the same size (or 
larger, depending on the format) as your input buffer, which would be a 
problem for many application scenarios.


The problem with trying to perform the conversion at a later step is 
that all the buffers are treated as byte sequences internally, so the 
notion of data types is lost at one point in time. This is especially 
important for collective I/O, since the aggregation step might in some 
extreme situations even break up a datatype to be written in different 
cycles (or by different aggregators) internally.


That being said, I admit that I haven't spent too much time thinking 
about solutions to this problem. If there is interest, I would be 
happy to work on it - and happy to accept help :-)
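
For reference, the user-facing API in question, in its simplest form (a
sketch only: the two MPI_CONVERSION_FN_NULL slots are where real read/write
conversion callbacks would go, and the file name is a placeholder):

#include <mpi.h>

static int my_extent_fn(MPI_Datatype dtype, MPI_Aint *file_extent,
                        void *extra_state)
{
    MPI_Aint lb;
    /* same extent in the file as in memory */
    return MPI_Type_get_extent(dtype, &lb, file_extent);
}

int main(int argc, char **argv)
{
    MPI_File fh;
    MPI_Init(&argc, &argv);
    MPI_Register_datarep("my_rep", MPI_CONVERSION_FN_NULL,
                         MPI_CONVERSION_FN_NULL, my_extent_fn, NULL);
    MPI_File_open(MPI_COMM_WORLD, "out.dat",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);
    MPI_File_set_view(fh, 0, MPI_DOUBLE, MPI_DOUBLE, "my_rep", MPI_INFO_NULL);
    /* ... writes through fh would now go through the registered datarep ... */
    MPI_File_close(&fh);
    MPI_Finalize();
    return 0;
}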


Edgar


Also, in 2012, I can state that having looked at HDF5, there were no
functions that used collective MPI I/O for *randomly distributed*
data...  Collective I/O was available only for "structured" data. So I
coded it all directly into native MPI calls... and it works like a charm!

Thanks,

Eric

___
users mailing list
us...@open-mpi.org
Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
Link to this post: 
http://www.open-mpi.org/community/lists/users/2016/03/28711.php


Re: [OMPI users] error openmpi check hdf5

2016-02-10 Thread Edgar Gabriel
yes and no :-) That particular function was fixed, but there are a few 
others, especially in the sharedfp framework, that would cause similar 
problems if compiled without RTLD_GLOBAL.


But more importantly, I can confirm that ompio in the 1.8 and 1.10 
series does *not* pass the HDF5 tests and should not be used for that 
(it passes on master and the 2.x series).  ROMIO is the default in 1.7, 
1.8 and 1.10 and should be used therefore.


Thanks
Edgar

On 2/10/2016 6:32 AM, Gilles Gouaillardet wrote:

Edgar,

It seems this the consequence of an abstraction violation
(fcoll/two_phases directly calls the io/ompio component) and that was 
fixed in master.

Can you confirm that ?

Delphine,

The problem should disappear if you use romio instead of ompio

Cheers,

Gilles

On Wednesday, February 10, 2016, Edgar Gabriel <egabr...@central.uh.edu> wrote:


which version of Open MPI is this?
Thanks
Edgar

On 2/10/2016 4:13 AM, Delphine Ramalingom wrote:

Hello,

I try to compile a parallel version of hdf5.
I have error messages when I check with openmpi.

Support on HDF5 told me that the errors seem related to the new
ompio implementation inside
open MPI for MPI-I/O. He suggests that I talk to the OMPI mailing
list to resolve  these errors.

For information, my version of openmpi is : gcc (GCC) 4.8.2
mpicc --showme
gcc -I/programs/Compilateurs2/usr/include -pthread -Wl,-rpath
-Wl,/programs/Compilateurs2/usr/lib -Wl,--enable-new-dtags
-L/programs/Compilateurs2/usr/lib -lmpi

Errors are :

.../hdf5-1.8.16_gnu/testpar/.libs/t_mpi: symbol lookup error:f
/programs/Compilateurs2/usr/lib/openmpi/mca_fcoll_two_phase.so:
undefined symbol: ompi_io_ompio_decode_datatype
.../hdf5-1.8.16_gnu/testpar/.libs/t_mpi: symbol lookup error:
/programs/Compilateurs2/usr/lib/openmpi/mca_fcoll_two_phase.so:
undefined symbol: ompi_io_ompio_decode_datatype
---
Primary job  terminated normally, but 1 process returned
a non-zero exit code.. Per user-direction, the job has been aborted.
---
.../hdf5-1.8.16_gnu/testpar/.libs/t_mpi: symbol lookup error:
/programs/Compilateurs2/usr/lib/openmpi/mca_fcoll_two_phase.so:
undefined symbol: ompi_io_ompio_set_aggregator_props
.../hdf5-1.8.16_gnu/testpar/.libs/t_mpi: symbol lookup error:
/programs/Compilateurs2/usr/lib/openmpi/mca_fcoll_two_phase.so:
undefined symbol: ompi_io_ompio_set_aggregator_props
.../hdf5-1.8.16_gnu/testpar/.libs/t_mpi: symbol lookup error:
/programs/Compilateurs2/usr/lib/openmpi/mca_fcoll_two_phase.so:
undefined symbol: ompi_io_ompio_set_aggregator_props


Thanks in advance for your help.

Regards
Delphine

-- 
<http://www.univ-reunion.fr> *Delphine Ramalingom Barbary |

Ingénieure en Calcul Scientifique *
Direction des Usages du Numérique (DUN)
Centre de Développement du Calcul Scientifique
TEL : 02 62 93 84 87- FAX : 02 62 93 81 06


-- 
Edgar Gabriel

Associate Professor
Parallel Software Technologies Labhttp://pstl.cs.uh.edu
Department of Computer Science  University of Houston
Philip G. Hoffman Hall, Room 524Houston, TX-77204, USA
Tel: +1 (713) 743-3857  Fax: +1 (713) 743-3335
    --



--
Edgar Gabriel
Associate Professor
Parallel Software Technologies Lab  http://pstl.cs.uh.edu
Department of Computer Science  University of Houston
Philip G. Hoffman Hall, Room 524Houston, TX-77204, USA
Tel: +1 (713) 743-3857  Fax: +1 (713) 743-3335
--



Re: [OMPI users] error openmpi check hdf5

2016-02-10 Thread Edgar Gabriel

which version of Open MPI is this?
Thanks
Edgar

On 2/10/2016 4:13 AM, Delphine Ramalingom wrote:

Hello,

I try to compile a parallel version of hdf5.
I have error messages when I check with openmpi.

Support on HDF5 told me that the errors seem related to the new ompio 
implementation inside
open MPI for MPI-I/O. He suggests that I talk to the OMPI mailing list 
to resolve  these errors.


For information, my version of openmpi is : gcc (GCC) 4.8.2
mpicc --showme
gcc -I/programs/Compilateurs2/usr/include -pthread -Wl,-rpath 
-Wl,/programs/Compilateurs2/usr/lib -Wl,--enable-new-dtags 
-L/programs/Compilateurs2/usr/lib -lmpi


Errors are :

.../hdf5-1.8.16_gnu/testpar/.libs/t_mpi: symbol lookup error: 
/programs/Compilateurs2/usr/lib/openmpi/mca_fcoll_two_phase.so: 
undefined symbol: ompi_io_ompio_decode_datatype
.../hdf5-1.8.16_gnu/testpar/.libs/t_mpi: symbol lookup error: 
/programs/Compilateurs2/usr/lib/openmpi/mca_fcoll_two_phase.so: 
undefined symbol: ompi_io_ompio_decode_datatype

---
Primary job  terminated normally, but 1 process returned
a non-zero exit code.. Per user-direction, the job has been aborted.
---
.../hdf5-1.8.16_gnu/testpar/.libs/t_mpi: symbol lookup error: 
/programs/Compilateurs2/usr/lib/openmpi/mca_fcoll_two_phase.so: 
undefined symbol: ompi_io_ompio_set_aggregator_props
.../hdf5-1.8.16_gnu/testpar/.libs/t_mpi: symbol lookup error: 
/programs/Compilateurs2/usr/lib/openmpi/mca_fcoll_two_phase.so: 
undefined symbol: ompi_io_ompio_set_aggregator_props
.../hdf5-1.8.16_gnu/testpar/.libs/t_mpi: symbol lookup error: 
/programs/Compilateurs2/usr/lib/openmpi/mca_fcoll_two_phase.so: 
undefined symbol: ompi_io_ompio_set_aggregator_props



Thanks in advance for your help.

Regards
Delphine

--
<http://www.univ-reunion.fr> *Delphine Ramalingom Barbary | Ingénieure 
en Calcul Scientifique *

Direction des Usages du Numérique (DUN)
Centre de Développement du Calcul Scientifique
TEL : 02 62 93 84 87- FAX : 02 62 93 81 06


--
Edgar Gabriel
Associate Professor
Parallel Software Technologies Lab  http://pstl.cs.uh.edu
Department of Computer Science  University of Houston
Philip G. Hoffman Hall, Room 524Houston, TX-77204, USA
Tel: +1 (713) 743-3857  Fax: +1 (713) 743-3335
--



Re: [OMPI users] Error building openmpi-v2.x-dev-1020-ge2a53b3 on Solaris

2016-01-26 Thread Edgar Gabriel
you are probably right, the code in io_ompio was copied from fs_lustre 
(and was there for a long time), but if the solaris system does not 
support Lustre, it would not have shown up. The generic ufs component 
actually does not have that sequence. I will prepare a patch, just not 
sure how to test it since I do not have access to a solaris system.
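
A sketch of the kind of check being discussed (untested on Solaris, and the
statvfs/f_basetype route is an assumption): on Linux the file system type is
the magic number in statfs.f_type, whereas Solaris reports it as a string.

#if defined(__sun)
#include <string.h>
#include <sys/statvfs.h>
static int fs_is_lustre(const char *path)
{
    struct statvfs buf;
    if (statvfs(path, &buf) != 0) return 0;
    return strncmp(buf.f_basetype, "lustre", sizeof(buf.f_basetype)) == 0;
}
#else
#include <sys/vfs.h>
#ifndef LL_SUPER_MAGIC
#define LL_SUPER_MAGIC 0x0BD00BD0   /* Lustre magic, as checked in io_ompio/fs_lustre */
#endif
static int fs_is_lustre(const char *path)
{
    struct statfs buf;
    if (statfs(path, &buf) != 0) return 0;
    return buf.f_type == LL_SUPER_MAGIC;
}
#endif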


Thanks
Edgar

On 1/26/2016 8:39 AM, Gilles Gouaillardet wrote:
Paul Hargrove builds all rc versions on various platforms that do 
include solaris.

the faulty lines were committed about 10 days ago
(use romio instead of ompio with lustre) and are not fs specific.
I can only guess several filesytems are not available on solaris, so 
using a Linux statfs never caused any issue.


I was just pointing to opal/util/proc.c as an example of how statfs 
can be replaced on Solaris.
that being said, that part could be refactored so it can be simply 
used by ompio.


Cheers,

Gilles

On Tuesday, January 26, 2016, Edgar Gabriel <egabr...@central.uh.edu 
<mailto:egabr...@central.uh.edu>> wrote:


I can look into that, but just as a note, that code is now for
roughly 5 years in master in *all* fs components, so its not
necessarily new (it just shows how often we compile with solaris).
Based on what I see in the opal/util/path.c the function
opal_path_nfs does something very similar, but I would have to
extend on that, since I need to know *what* file system it is, not
just *whether* it is one of the known file systems. It is however a
change affecting io/ompio and all fs components.

Edgar

On 01/26/2016 06:27 AM, Gilles Gouaillardet wrote:

Thanks Siegmar,

recent updates cannot work on Solaris.

Edgar,

you can have a look at opal/util/path.c,
statfs "oddities" are handled here

Cheers,

Gilles

On Tuesday, January 26, 2016, Siegmar Gross <siegmar.gr...@informatik.hs-fulda.de> wrote:

Hi,

yesterday I tried to build openmpi-v2.x-dev-1020-ge2a53b3 on my
machines (Solaris 10 Sparc, Solaris 10 x86_64, and openSUSE Linux
12.1 x86_64) with gcc-5.1.0 and Sun C 5.13. I was successful on
my Linux machine, but I got the following errors on both Solaris
platforms.



tyr openmpi-v2.x-dev-1020-ge2a53b3-SunOS.sparc.64_cc 33 tail
-15 log.make.SunOS.sparc.64_cc
  CCLD mca_fs_ufs.la
make[2]: Leaving directory

`/export2/src/openmpi-2.0.0/openmpi-v2.x-dev-1020-ge2a53b3-SunOS.sparc.64_cc/ompi/mca/fs/ufs'
Making all in mca/io/ompio
make[2]: Entering directory

`/export2/src/openmpi-2.0.0/openmpi-v2.x-dev-1020-ge2a53b3-SunOS.sparc.64_cc/ompi/mca/io/ompio'
  CC   io_ompio.lo
  CC   io_ompio_component.lo

"../../../../../openmpi-v2.x-dev-1020-ge2a53b3/ompi/mca/io/ompio/io_ompio_component.c",
line 285: prototype mismatch: 2 args passed, 4 expected

"../../../../../openmpi-v2.x-dev-1020-ge2a53b3/ompi/mca/io/ompio/io_ompio_component.c",
line 290: prototype mismatch: 2 args passed, 4 expected

"../../../../../openmpi-v2.x-dev-1020-ge2a53b3/ompi/mca/io/ompio/io_ompio_component.c",
line 296: undefined struct/union member: f_type
cc: acomp failed for

../../../../../openmpi-v2.x-dev-1020-ge2a53b3/ompi/mca/io/ompio/io_ompio_component.c
make[2]: *** [io_ompio_component.lo] Error 1
make[2]: Leaving directory

`/export2/src/openmpi-2.0.0/openmpi-v2.x-dev-1020-ge2a53b3-SunOS.sparc.64_cc/ompi/mca/io/ompio'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory

`/export2/src/openmpi-2.0.0/openmpi-v2.x-dev-1020-ge2a53b3-SunOS.sparc.64_cc/ompi'
make: *** [all-recursive] Error 1
tyr openmpi-v2.x-dev-1020-ge2a53b3-SunOS.sparc.64_cc 34





tyr openmpi-v2.x-dev-1020-ge2a53b3-SunOS.sparc.64_gcc 37 tail
-29 log.make.SunOS.sparc.64_gcc
  CCLD mca_fs_ufs.la
make[2]: Leaving directory

`/export2/src/openmpi-2.0.0/openmpi-v2.x-dev-1020-ge2a53b3-SunOS.sparc.64_gcc/ompi/mca/fs/ufs'
Making all in mca/io/ompio
make[2]: Entering directory

`/export2/src/openmpi-2.0.0/openmpi-v2.x-dev-1020-ge2a53b3-SunOS.sparc.64_gcc/ompi/mca/io/ompio'
  CC   io_ompio.lo
  CC   io_ompio_component.lo

../../../../../openmpi-v2.x-dev-1020-ge2a53b3/ompi/mca/io/ompio/io_ompio_component.c:
In function 'file_query':

../../../../../openmpi-v2.x-dev-1020-ge2a53b3/ompi/mca/io/ompio/io_ompio_component.c:285:23:
error: too few arguments to function 'statfs'
 err = statfs (file->f_filename, );
   ^
In file include

Re: [OMPI users] Error building openmpi-v2.x-dev-1020-ge2a53b3 on Solaris

2016-01-26 Thread Edgar Gabriel
I can look into that, but just as a note, that code has now been in master in 
*all* fs components for roughly 5 years, so it is not necessarily new (it 
just shows how often we compile with solaris). Based on what I see in 
the opal/util/path.c the function opal_path_nfs does something very 
similar, but I would have to extend on that, since I need to know *what* 
file system it is, not just *whether* it is one of the known file 
systems. It is however a change affecting io/ompio and all fs components.


Edgar

On 01/26/2016 06:27 AM, Gilles Gouaillardet wrote:

Thanks Siegmar,

recent updates cannot work on Solaris.

Edgar,

you can have a look at opal/util/path.c,
statfs "oddities" are handled here

Cheers,

Gilles

On Tuesday, January 26, 2016, Siegmar Gross 
> wrote:


Hi,

yesterday I tried to build openmpi-v2.x-dev-1020-ge2a53b3 on my
machines (Solaris 10 Sparc, Solaris 10 x86_64, and openSUSE Linux
12.1 x86_64) with gcc-5.1.0 and Sun C 5.13. I was successful on
my Linux machine, but I got the following errors on both Solaris
platforms.



tyr openmpi-v2.x-dev-1020-ge2a53b3-SunOS.sparc.64_cc 33 tail -15
log.make.SunOS.sparc.64_cc
  CCLD mca_fs_ufs.la 
make[2]: Leaving directory

`/export2/src/openmpi-2.0.0/openmpi-v2.x-dev-1020-ge2a53b3-SunOS.sparc.64_cc/ompi/mca/fs/ufs'
Making all in mca/io/ompio
make[2]: Entering directory

`/export2/src/openmpi-2.0.0/openmpi-v2.x-dev-1020-ge2a53b3-SunOS.sparc.64_cc/ompi/mca/io/ompio'
  CC   io_ompio.lo
  CC   io_ompio_component.lo

"../../../../../openmpi-v2.x-dev-1020-ge2a53b3/ompi/mca/io/ompio/io_ompio_component.c",
line 285: prototype mismatch: 2 args passed, 4 expected

"../../../../../openmpi-v2.x-dev-1020-ge2a53b3/ompi/mca/io/ompio/io_ompio_component.c",
line 290: prototype mismatch: 2 args passed, 4 expected

"../../../../../openmpi-v2.x-dev-1020-ge2a53b3/ompi/mca/io/ompio/io_ompio_component.c",
line 296: undefined struct/union member: f_type
cc: acomp failed for

../../../../../openmpi-v2.x-dev-1020-ge2a53b3/ompi/mca/io/ompio/io_ompio_component.c
make[2]: *** [io_ompio_component.lo] Error 1
make[2]: Leaving directory

`/export2/src/openmpi-2.0.0/openmpi-v2.x-dev-1020-ge2a53b3-SunOS.sparc.64_cc/ompi/mca/io/ompio'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory

`/export2/src/openmpi-2.0.0/openmpi-v2.x-dev-1020-ge2a53b3-SunOS.sparc.64_cc/ompi'
make: *** [all-recursive] Error 1
tyr openmpi-v2.x-dev-1020-ge2a53b3-SunOS.sparc.64_cc 34





tyr openmpi-v2.x-dev-1020-ge2a53b3-SunOS.sparc.64_gcc 37 tail -29
log.make.SunOS.sparc.64_gcc
  CCLD mca_fs_ufs.la 
make[2]: Leaving directory

`/export2/src/openmpi-2.0.0/openmpi-v2.x-dev-1020-ge2a53b3-SunOS.sparc.64_gcc/ompi/mca/fs/ufs'
Making all in mca/io/ompio
make[2]: Entering directory

`/export2/src/openmpi-2.0.0/openmpi-v2.x-dev-1020-ge2a53b3-SunOS.sparc.64_gcc/ompi/mca/io/ompio'
  CC   io_ompio.lo
  CC   io_ompio_component.lo

../../../../../openmpi-v2.x-dev-1020-ge2a53b3/ompi/mca/io/ompio/io_ompio_component.c:
In function 'file_query':

../../../../../openmpi-v2.x-dev-1020-ge2a53b3/ompi/mca/io/ompio/io_ompio_component.c:285:23:
error: too few arguments to function 'statfs'
 err = statfs (file->f_filename, &fsbuf);
   ^
In file included from

../../../../../openmpi-v2.x-dev-1020-ge2a53b3/ompi/mca/io/ompio/io_ompio_component.c:36:0:
/usr/include/sys/statfs.h:53:5: note: declared here
 int statfs(const char *, struct statfs *, int, int);
 ^

../../../../../openmpi-v2.x-dev-1020-ge2a53b3/ompi/mca/io/ompio/io_ompio_component.c:290:23:
error: too few arguments to function 'statfs'
 err = statfs (dir, &fsbuf);
   ^
In file included from

../../../../../openmpi-v2.x-dev-1020-ge2a53b3/ompi/mca/io/ompio/io_ompio_component.c:36:0:
/usr/include/sys/statfs.h:53:5: note: declared here
 int statfs(const char *, struct statfs *, int, int);
 ^

../../../../../openmpi-v2.x-dev-1020-ge2a53b3/ompi/mca/io/ompio/io_ompio_component.c:296:22:
error: 'struct statfs' has no member named 'f_type'
 if (fsbuf.f_type == LL_SUPER_MAGIC) {
  ^
make[2]: *** [io_ompio_component.lo] Error 1
make[2]: Leaving directory

`/export2/src/openmpi-2.0.0/openmpi-v2.x-dev-1020-ge2a53b3-SunOS.sparc.64_gcc/ompi/mca/io/ompio'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory

`/export2/src/openmpi-2.0.0/openmpi-v2.x-dev-1020-ge2a53b3-SunOS.sparc.64_gcc/ompi'
make: *** [all-recursive] Error 1
tyr openmpi-v2.x-dev-1020-ge2a53b3-SunOS.sparc.64_gcc 38


I would be grateful if somebody can fix the problem. Thank 

Re: [OMPI users] OMPIO correctnes issues

2015-12-09 Thread Edgar Gabriel
OK, I can confirm that once I update the file_get_position function to 
what we have in master and the 2.x series, your test passes with ompio 
in the 1.10 series as well. I am happy to provide a patch for testing, 
and to submit a PR. I am, however, worried because we know that ompio in the 
1.10 series is significantly out of sync with master, so there is 
potential for other, similar issues.


It would, however, be interesting to see whether your code works 
correctly with ompio in the 2.x release (or master), and I would be 
happy to provide any support necessary for testing (including the offer 
that I can run the tests if you provide me with the source code).
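
For readers without Paul's Fortran reproducer (main.f90), below is a minimal C
sketch of the same kind of check; it assumes a pre-existing file such as out.txt
and only mirrors the pattern, not the original test.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_File   fh;
    MPI_Offset offset, size;

    MPI_Init(&argc, &argv);

    /* Open an existing file for appending, as the reproducer does. */
    MPI_File_open(MPI_COMM_WORLD, "out.txt",
                  MPI_MODE_WRONLY | MPI_MODE_APPEND, MPI_INFO_NULL, &fh);

    /* With MPI_MODE_APPEND the initial position must be the end of the
       file, so offset and size should agree; the ompio output quoted
       below reports an offset of 0 instead. */
    MPI_File_get_position(fh, &offset);
    MPI_File_get_size(fh, &size);
    printf("fileOffset, fileSize %lld %lld\n",
           (long long)offset, (long long)size);

    MPI_File_close(&fh);
    MPI_Finalize();
    return 0;
}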


Thanks
Edgar

On 12/9/2015 9:30 AM, Edgar Gabriel wrote:

what does the mount command return?

On 12/9/2015 9:27 AM, Paul Kapinos wrote:

Dear Edgar,


On 12/09/15 16:16, Edgar Gabriel wrote:

I tested your code in master and v1.10 ( on my local machine), and I get for
both version of ompio exactly the same (correct) output that you had with romio.

I've tested it at local hard disk..

pk224850@cluster:/tmp/pk224850/cluster_15384/TMP[529]$ df -h .
Filesystem  Size  Used Avail Use% Mounted on
/dev/sda3   1.1T   16G  1.1T   2% /w0

pk224850@cluster:/tmp/pk224850/cluster_15384/TMP[530]$ echo hell-o > out.txt;
./a.out
fileOffset, fileSize 7 7
fileOffset, fileSize2323
ierr0
MPI_MODE_WRONLY,  MPI_MODE_APPEND4 128

pk224850@cluster:/tmp/pk224850/cluster_15384/TMP[531]$ export OMPI_MCA_io=ompio

pk224850@cluster:/tmp/pk224850/cluster_15384/TMP[532]$ echo hell-o > out.txt;
./a.out
fileOffset, fileSize 0 7
fileOffset, fileSize 016
ierr0
MPI_MODE_WRONLY,  MPI_MODE_APPEND4 128



However, I also noticed that in the ompio version that is in the v1.10 branch,
the MPI_File_get_size function is not implemented on lustre.

Yes we have Lustre in the cluster.
I believe that was one of 'another' issues mentioned, yes some users tend to use
Lustre as HPC file system =)





Thanks
Edgar

On 12/9/2015 8:06 AM, Edgar Gabriel wrote:

I will look at your test case and see what is going on in ompio. That
being said, the vast number of fixes and improvements that went into
ompio over the last two years were not back ported to the 1.8 (and thus
1.10) series, since it would have required changes to the interfaces of
the frameworks involved (and thus would have violated one of rules of
Open MPI release series) . Anyway, if there is a simple fix for your
test case for the 1.10 series, I am happy to provide a patch. It might
take me a day or two however.

Edgar

On 12/9/2015 6:24 AM, Paul Kapinos wrote:

Sorry, forgot to mention: 1.10.1


 Open MPI: 1.10.1
   Open MPI repo revision: v1.10.0-178-gb80f802
Open MPI release date: Nov 03, 2015
 Open RTE: 1.10.1
   Open RTE repo revision: v1.10.0-178-gb80f802
Open RTE release date: Nov 03, 2015
 OPAL: 1.10.1
   OPAL repo revision: v1.10.0-178-gb80f802
OPAL release date: Nov 03, 2015
  MPI API: 3.0.0
 Ident string: 1.10.1


On 12/09/15 11:26, Gilles Gouaillardet wrote:

Paul,

which OpenMPI version are you using ?

thanks for providing a simple reproducer, that will make things much easier
from
now.
(and at first glance, that might not be a very tricky bug)

Cheers,

Gilles

On Wednesday, December 9, 2015, Paul Kapinos <kapi...@itc.rwth-aachen.de
<mailto:kapi...@itc.rwth-aachen.de>> wrote:

Dear Open MPI developers,
did OMPIO (1) reached 'usable-stable' state?

As we reported in (2) we had some trouble in building Open MPI with
ROMIO,
which fact was hidden by OMPIO implementation stepping into the MPI_IO
breach. The fact 'ROMIO isn't AVBL' was detected after users complained
'MPI_IO don't work as expected with version XYZ of OpenMPI' and further
investigations.

Take a look at the attached example. It deliver different result in
case of
using ROMIO and OMPIO even with 1 MPI rank on local hard disk, cf. (3).
We've seen more examples of divergent behaviour but this one is quite
handy.

Is that a bug in OMPIO or did we miss something?

Best
Paul Kapinos


1) http://www.open-mpi.org/faq/?category=ompio

2) http://www.open-mpi.org/community/lists/devel/2015/12/18405.php

3) (ROMIO is default; on local hard drive at node 'cluster')
$ ompi_info  | grep  romio
   MCA io: romio (MCA v2.0.0, API v2.0.0, Component
v1.10.1)
$ ompi_info  | grep  ompio
   MCA io: ompio (MCA v2.0.0, API v2.0.0, Component
v1.10.1)
$ mpif90 main.f90

  

Re: [OMPI users] OMPIO correctnes issues

2015-12-09 Thread Edgar Gabriel
OK, forget it, I found the issue. I totally forgot that in the 1.10 
series I have to manually force ompio (it is the default on master and 
2.x). It fails now for me as well with v1.10; I will let you know what I find.


Thanks
Edgar

On 12/9/2015 9:30 AM, Edgar Gabriel wrote:

what does the mount command return?

On 12/9/2015 9:27 AM, Paul Kapinos wrote:

Dear Edgar,


On 12/09/15 16:16, Edgar Gabriel wrote:

I tested your code in master and v1.10 ( on my local machine), and I get for
both version of ompio exactly the same (correct) output that you had with romio.

I've tested it at local hard disk..

pk224850@cluster:/tmp/pk224850/cluster_15384/TMP[529]$ df -h .
Filesystem  Size  Used Avail Use% Mounted on
/dev/sda3   1.1T   16G  1.1T   2% /w0

pk224850@cluster:/tmp/pk224850/cluster_15384/TMP[530]$ echo hell-o > out.txt;
./a.out
fileOffset, fileSize 7 7
fileOffset, fileSize2323
ierr0
MPI_MODE_WRONLY,  MPI_MODE_APPEND4 128

pk224850@cluster:/tmp/pk224850/cluster_15384/TMP[531]$ export OMPI_MCA_io=ompio

pk224850@cluster:/tmp/pk224850/cluster_15384/TMP[532]$ echo hell-o > out.txt;
./a.out
fileOffset, fileSize 0 7
fileOffset, fileSize 016
ierr0
MPI_MODE_WRONLY,  MPI_MODE_APPEND4 128



However, I also noticed that in the ompio version that is in the v1.10 branch,
the MPI_File_get_size function is not implemented on lustre.

Yes we have Lustre in the cluster.
I believe that was one of 'another' issues mentioned, yes some users tend to use
Lustre as HPC file system =)





Thanks
Edgar

On 12/9/2015 8:06 AM, Edgar Gabriel wrote:

I will look at your test case and see what is going on in ompio. That
being said, the vast number of fixes and improvements that went into
ompio over the last two years were not back ported to the 1.8 (and thus
1.10) series, since it would have required changes to the interfaces of
the frameworks involved (and thus would have violated one of rules of
Open MPI release series) . Anyway, if there is a simple fix for your
test case for the 1.10 series, I am happy to provide a patch. It might
take me a day or two however.

Edgar

On 12/9/2015 6:24 AM, Paul Kapinos wrote:

Sorry, forgot to mention: 1.10.1


 Open MPI: 1.10.1
   Open MPI repo revision: v1.10.0-178-gb80f802
Open MPI release date: Nov 03, 2015
 Open RTE: 1.10.1
   Open RTE repo revision: v1.10.0-178-gb80f802
Open RTE release date: Nov 03, 2015
 OPAL: 1.10.1
   OPAL repo revision: v1.10.0-178-gb80f802
OPAL release date: Nov 03, 2015
  MPI API: 3.0.0
 Ident string: 1.10.1


On 12/09/15 11:26, Gilles Gouaillardet wrote:

Paul,

which OpenMPI version are you using ?

thanks for providing a simple reproducer, that will make things much easier
from
now.
(and at first glance, that might not be a very tricky bug)

Cheers,

Gilles

On Wednesday, December 9, 2015, Paul Kapinos <kapi...@itc.rwth-aachen.de
<mailto:kapi...@itc.rwth-aachen.de>> wrote:

Dear Open MPI developers,
did OMPIO (1) reached 'usable-stable' state?

As we reported in (2) we had some trouble in building Open MPI with
ROMIO,
which fact was hidden by OMPIO implementation stepping into the MPI_IO
breach. The fact 'ROMIO isn't AVBL' was detected after users complained
'MPI_IO don't work as expected with version XYZ of OpenMPI' and further
investigations.

Take a look at the attached example. It deliver different result in
case of
using ROMIO and OMPIO even with 1 MPI rank on local hard disk, cf. (3).
We've seen more examples of divergent behaviour but this one is quite
handy.

Is that a bug in OMPIO or did we miss something?

Best
Paul Kapinos


1) http://www.open-mpi.org/faq/?category=ompio

2) http://www.open-mpi.org/community/lists/devel/2015/12/18405.php

3) (ROMIO is default; on local hard drive at node 'cluster')
$ ompi_info  | grep  romio
   MCA io: romio (MCA v2.0.0, API v2.0.0, Component
v1.10.1)
$ ompi_info  | grep  ompio
   MCA io: ompio (MCA v2.0.0, API v2.0.0, Component
v1.10.1)
$ mpif90 main.f90

$ echo hello1234 > out.txt; $MPIEXEC -np 1 -H cluster  ./a.out;
  fileOffset, fileSize1010
  fileOffset, fileSize2626
  ierr0
  MPI_MODE_WRONLY,  MPI_MODE_APPEND4 128

$ export OMPI_MCA_io=ompio
$ echo hello1234 > out.txt; $MPIEXEC -np 1 -H cluster  ./a.out;
  

Re: [OMPI users] OMPIO correctnes issues

2015-12-09 Thread Edgar Gabriel

what does the mount command return?

On 12/9/2015 9:27 AM, Paul Kapinos wrote:

Dear Edgar,


On 12/09/15 16:16, Edgar Gabriel wrote:

I tested your code in master and v1.10 ( on my local machine), and I get for
both version of ompio exactly the same (correct) output that you had with romio.

I've tested it at local hard disk..

pk224850@cluster:/tmp/pk224850/cluster_15384/TMP[529]$ df -h .
Filesystem  Size  Used Avail Use% Mounted on
/dev/sda3   1.1T   16G  1.1T   2% /w0

pk224850@cluster:/tmp/pk224850/cluster_15384/TMP[530]$ echo hell-o > out.txt;
./a.out
   fileOffset, fileSize 7 7
   fileOffset, fileSize2323
   ierr0
   MPI_MODE_WRONLY,  MPI_MODE_APPEND4 128

pk224850@cluster:/tmp/pk224850/cluster_15384/TMP[531]$ export OMPI_MCA_io=ompio

pk224850@cluster:/tmp/pk224850/cluster_15384/TMP[532]$ echo hell-o > out.txt;
./a.out
   fileOffset, fileSize 0 7
   fileOffset, fileSize 016
   ierr0
   MPI_MODE_WRONLY,  MPI_MODE_APPEND4 128



However, I also noticed that in the ompio version that is in the v1.10 branch,
the MPI_File_get_size function is not implemented on lustre.

Yes we have Lustre in the cluster.
I believe that was one of 'another' issues mentioned, yes some users tend to use
Lustre as HPC file system =)





Thanks
Edgar

On 12/9/2015 8:06 AM, Edgar Gabriel wrote:

I will look at your test case and see what is going on in ompio. That
being said, the vast number of fixes and improvements that went into
ompio over the last two years were not back ported to the 1.8 (and thus
1.10) series, since it would have required changes to the interfaces of
the frameworks involved (and thus would have violated one of rules of
Open MPI release series) . Anyway, if there is a simple fix for your
test case for the 1.10 series, I am happy to provide a patch. It might
take me a day or two however.

Edgar

On 12/9/2015 6:24 AM, Paul Kapinos wrote:

Sorry, forgot to mention: 1.10.1


Open MPI: 1.10.1
  Open MPI repo revision: v1.10.0-178-gb80f802
   Open MPI release date: Nov 03, 2015
Open RTE: 1.10.1
  Open RTE repo revision: v1.10.0-178-gb80f802
   Open RTE release date: Nov 03, 2015
OPAL: 1.10.1
  OPAL repo revision: v1.10.0-178-gb80f802
   OPAL release date: Nov 03, 2015
 MPI API: 3.0.0
Ident string: 1.10.1


On 12/09/15 11:26, Gilles Gouaillardet wrote:

Paul,

which OpenMPI version are you using ?

thanks for providing a simple reproducer, that will make things much easier
from
now.
(and at first glance, that might not be a very tricky bug)

Cheers,

Gilles

On Wednesday, December 9, 2015, Paul Kapinos <kapi...@itc.rwth-aachen.de
<mailto:kapi...@itc.rwth-aachen.de>> wrote:

   Dear Open MPI developers,
   did OMPIO (1) reached 'usable-stable' state?

   As we reported in (2) we had some trouble in building Open MPI with
ROMIO,
   which fact was hidden by OMPIO implementation stepping into the MPI_IO
   breach. The fact 'ROMIO isn't AVBL' was detected after users complained
   'MPI_IO don't work as expected with version XYZ of OpenMPI' and further
   investigations.

   Take a look at the attached example. It deliver different result in
case of
   using ROMIO and OMPIO even with 1 MPI rank on local hard disk, cf. (3).
   We've seen more examples of divergent behaviour but this one is quite
handy.

   Is that a bug in OMPIO or did we miss something?

   Best
   Paul Kapinos


   1) http://www.open-mpi.org/faq/?category=ompio

   2) http://www.open-mpi.org/community/lists/devel/2015/12/18405.php

   3) (ROMIO is default; on local hard drive at node 'cluster')
   $ ompi_info  | grep  romio
  MCA io: romio (MCA v2.0.0, API v2.0.0, Component
v1.10.1)
   $ ompi_info  | grep  ompio
  MCA io: ompio (MCA v2.0.0, API v2.0.0, Component
v1.10.1)
   $ mpif90 main.f90

   $ echo hello1234 > out.txt; $MPIEXEC -np 1 -H cluster  ./a.out;
 fileOffset, fileSize1010
 fileOffset, fileSize2626
 ierr0
 MPI_MODE_WRONLY,  MPI_MODE_APPEND4 128

   $ export OMPI_MCA_io=ompio
   $ echo hello1234 > out.txt; $MPIEXEC -np 1 -H cluster  ./a.out;
 fileOffset, fileSize 010
 fileOffset, fileSize 016
 ierr0
 MPI_MODE_WRONLY,  MPI_MODE_APPEND4 128


   --
   Dipl.-Inform. Paul Kapinos   -   High Performance Computing,
   RWTH Aach

Re: [OMPI users] OMPIO correctnes issues

2015-12-09 Thread Edgar Gabriel

Paul,

I tested your code in master and v1.10 (on my local machine), and I get 
for both versions of ompio exactly the same (correct) output that you had 
with romio. However, I also noticed that in the ompio version that is in 
the v1.10 branch, the MPI_File_get_size function is not implemented on 
Lustre. Did you run your test by any chance on a Lustre file system?


Thanks
Edgar

On 12/9/2015 8:06 AM, Edgar Gabriel wrote:

I will look at your test case and see what is going on in ompio. That
being said, the vast number of fixes and improvements that went into
ompio over the last two years were not back ported to the 1.8 (and thus
1.10) series, since it would have required changes to the interfaces of
the frameworks involved (and thus would have violated one of rules of
Open MPI release series) . Anyway, if there is a simple fix for your
test case for the 1.10 series, I am happy to provide a patch. It might
take me a day or two however.

Edgar

On 12/9/2015 6:24 AM, Paul Kapinos wrote:

Sorry, forgot to mention: 1.10.1


   Open MPI: 1.10.1
 Open MPI repo revision: v1.10.0-178-gb80f802
  Open MPI release date: Nov 03, 2015
   Open RTE: 1.10.1
 Open RTE repo revision: v1.10.0-178-gb80f802
  Open RTE release date: Nov 03, 2015
   OPAL: 1.10.1
 OPAL repo revision: v1.10.0-178-gb80f802
  OPAL release date: Nov 03, 2015
MPI API: 3.0.0
   Ident string: 1.10.1


On 12/09/15 11:26, Gilles Gouaillardet wrote:

Paul,

which OpenMPI version are you using ?

thanks for providing a simple reproducer, that will make things much easier from
now.
(and at first glance, that might not be a very tricky bug)

Cheers,

Gilles

On Wednesday, December 9, 2015, Paul Kapinos <kapi...@itc.rwth-aachen.de
<mailto:kapi...@itc.rwth-aachen.de>> wrote:

  Dear Open MPI developers,
  did OMPIO (1) reached 'usable-stable' state?

  As we reported in (2) we had some trouble in building Open MPI with ROMIO,
  which fact was hidden by OMPIO implementation stepping into the MPI_IO
  breach. The fact 'ROMIO isn't AVBL' was detected after users complained
  'MPI_IO don't work as expected with version XYZ of OpenMPI' and further
  investigations.

  Take a look at the attached example. It deliver different result in case 
of
  using ROMIO and OMPIO even with 1 MPI rank on local hard disk, cf. (3).
  We've seen more examples of divergent behaviour but this one is quite 
handy.

  Is that a bug in OMPIO or did we miss something?

  Best
  Paul Kapinos


  1) http://www.open-mpi.org/faq/?category=ompio

  2) http://www.open-mpi.org/community/lists/devel/2015/12/18405.php

  3) (ROMIO is default; on local hard drive at node 'cluster')
  $ ompi_info  | grep  romio
 MCA io: romio (MCA v2.0.0, API v2.0.0, Component 
v1.10.1)
  $ ompi_info  | grep  ompio
 MCA io: ompio (MCA v2.0.0, API v2.0.0, Component 
v1.10.1)
  $ mpif90 main.f90

  $ echo hello1234 > out.txt; $MPIEXEC -np 1 -H cluster  ./a.out;
fileOffset, fileSize1010
fileOffset, fileSize2626
ierr0
MPI_MODE_WRONLY,  MPI_MODE_APPEND4 128

  $ export OMPI_MCA_io=ompio
  $ echo hello1234 > out.txt; $MPIEXEC -np 1 -H cluster  ./a.out;
fileOffset, fileSize 010
fileOffset, fileSize 016
ierr0
MPI_MODE_WRONLY,  MPI_MODE_APPEND4 128


  --
  Dipl.-Inform. Paul Kapinos   -   High Performance Computing,
  RWTH Aachen University, IT Center
  Seffenter Weg 23,  D 52074  Aachen (Germany)
  Tel: +49 241/80-24915



___
users mailing list
us...@open-mpi.org
Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
Link to this post: 
http://www.open-mpi.org/community/lists/users/2015/12/28145.php



--
Edgar Gabriel
Associate Professor
Parallel Software Technologies Lab  http://pstl.cs.uh.edu
Department of Computer Science  University of Houston
Philip G. Hoffman Hall, Room 524Houston, TX-77204, USA
Tel: +1 (713) 743-3857  Fax: +1 (713) 743-3335
--



Re: [OMPI users] OMPIO correctnes issues

2015-12-09 Thread Edgar Gabriel
I will look at your test case and see what is going on in ompio. That 
being said, the vast number of fixes and improvements that went into 
ompio over the last two years were not backported to the 1.8 (and thus 
1.10) series, since it would have required changes to the interfaces of 
the frameworks involved (and thus would have violated one of the rules of 
the Open MPI release series). Anyway, if there is a simple fix for your 
test case for the 1.10 series, I am happy to provide a patch. It might 
take me a day or two, however.


Edgar

On 12/9/2015 6:24 AM, Paul Kapinos wrote:

Sorry, forgot to mention: 1.10.1


  Open MPI: 1.10.1
Open MPI repo revision: v1.10.0-178-gb80f802
 Open MPI release date: Nov 03, 2015
  Open RTE: 1.10.1
Open RTE repo revision: v1.10.0-178-gb80f802
 Open RTE release date: Nov 03, 2015
  OPAL: 1.10.1
OPAL repo revision: v1.10.0-178-gb80f802
 OPAL release date: Nov 03, 2015
   MPI API: 3.0.0
  Ident string: 1.10.1


On 12/09/15 11:26, Gilles Gouaillardet wrote:

Paul,

which OpenMPI version are you using ?

thanks for providing a simple reproducer, that will make things much easier from
now.
(and at first glance, that might not be a very tricky bug)

Cheers,

Gilles

On Wednesday, December 9, 2015, Paul Kapinos <kapi...@itc.rwth-aachen.de
<mailto:kapi...@itc.rwth-aachen.de>> wrote:

 Dear Open MPI developers,
 did OMPIO (1) reached 'usable-stable' state?

 As we reported in (2) we had some trouble in building Open MPI with ROMIO,
 which fact was hidden by OMPIO implementation stepping into the MPI_IO
 breach. The fact 'ROMIO isn't AVBL' was detected after users complained
 'MPI_IO don't work as expected with version XYZ of OpenMPI' and further
 investigations.

 Take a look at the attached example. It deliver different result in case of
 using ROMIO and OMPIO even with 1 MPI rank on local hard disk, cf. (3).
 We've seen more examples of divergent behaviour but this one is quite 
handy.

 Is that a bug in OMPIO or did we miss something?

 Best
 Paul Kapinos


 1) http://www.open-mpi.org/faq/?category=ompio

 2) http://www.open-mpi.org/community/lists/devel/2015/12/18405.php

 3) (ROMIO is default; on local hard drive at node 'cluster')
 $ ompi_info  | grep  romio
MCA io: romio (MCA v2.0.0, API v2.0.0, Component 
v1.10.1)
 $ ompi_info  | grep  ompio
MCA io: ompio (MCA v2.0.0, API v2.0.0, Component 
v1.10.1)
 $ mpif90 main.f90

 $ echo hello1234 > out.txt; $MPIEXEC -np 1 -H cluster  ./a.out;
   fileOffset, fileSize1010
   fileOffset, fileSize2626
   ierr0
   MPI_MODE_WRONLY,  MPI_MODE_APPEND4 128

 $ export OMPI_MCA_io=ompio
 $ echo hello1234 > out.txt; $MPIEXEC -np 1 -H cluster  ./a.out;
   fileOffset, fileSize 010
   fileOffset, fileSize 016
   ierr0
   MPI_MODE_WRONLY,  MPI_MODE_APPEND4 128


 --
 Dipl.-Inform. Paul Kapinos   -   High Performance Computing,
 RWTH Aachen University, IT Center
 Seffenter Weg 23,  D 52074  Aachen (Germany)
 Tel: +49 241/80-24915



___
users mailing list
us...@open-mpi.org
Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
Link to this post: 
http://www.open-mpi.org/community/lists/users/2015/12/28145.php





--
Edgar Gabriel
Associate Professor
Parallel Software Technologies Lab  http://pstl.cs.uh.edu
Department of Computer Science  University of Houston
Philip G. Hoffman Hall, Room 524Houston, TX-77204, USA
Tel: +1 (713) 743-3857  Fax: +1 (713) 743-3335
--



Re: [OMPI users] OpenMPI 1.8.4 - Java Library - allToAllv()

2015-04-08 Thread Edgar Gabriel
users mailing list
us...@open-mpi.org <mailto:us...@open-mpi.org>
Subscription:
http://www.open-mpi.org/mailman/listinfo.cgi/users
Link to this post:
http://www.open-mpi.org/community/lists/users/2015/04/26622.php



___
users mailing list
us...@open-mpi.org <mailto:us...@open-mpi.org>
Subscription:
http://www.open-mpi.org/mailman/listinfo.cgi/users
Link to this post:
http://www.open-mpi.org/community/lists/users/2015/04/26623.php



___
users mailing list
us...@open-mpi.org <mailto:us...@open-mpi.org>
Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
Link to this post:
http://www.open-mpi.org/community/lists/users/2015/04/26631.php



___
users mailing list
us...@open-mpi.org <mailto:us...@open-mpi.org>
Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
Link to this post:
http://www.open-mpi.org/community/lists/users/2015/04/26634.php


___
users mailing list
us...@open-mpi.org <mailto:us...@open-mpi.org>
Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
Link to this post:
http://www.open-mpi.org/community/lists/users/2015/04/26637.php






___
users mailing list
us...@open-mpi.org
Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
Link to this post: 
http://www.open-mpi.org/community/lists/users/2015/04/26648.php



--
Edgar Gabriel
Associate Professor
Parallel Software Technologies Lab  http://pstl.cs.uh.edu
Department of Computer Science  University of Houston
Philip G. Hoffman Hall, Room 524Houston, TX-77204, USA
Tel: +1 (713) 743-3857  Fax: +1 (713) 743-3335
--


Re: [OMPI users] MPIIO and OrangeFS

2015-02-25 Thread Edgar Gabriel

Two separate comments.

1. I do not know the precise status of the PVFS2 support in the 1.8 series 
of Open MPI for ROMIO; I haven't tested it in a while. On master, I know 
that there is a compilation problem with PVFS2 and ROMIO in Open MPI, and 
I am about to submit a report/question to ROMIO about that.


2. For OMPIO, we use PVFS2 as our main development platform. However, we 
have honestly not tried to use PVFS2 without the file system being 
mounted (i.e. we do rely on the kernel component to some extent). Yes, 
internally we use the library interfaces of PVFS2, but we use the file 
system information to determine the type of the file system, and my 
guess is that if that information is not available, the pvfs2 fs (and 
fbtl, for that matter) components disable themselves, and that's the 
error that you see. I can look into how to make that scenario work in 
OMPIO, but it's definitely not in the 1.8 series.
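
As a side note for anyone reproducing this, below is a minimal sketch of the kind
of return-code checking discussed in this thread. The "pvfs2:" filename prefix is
the ROMIO convention mentioned further down for bypassing the kernel mount; the
path after the prefix is made up for the example.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_File fh;
    char     msg[MPI_MAX_ERROR_STRING];
    int      rc, len, eclass;

    MPI_Init(&argc, &argv);

    /* File operations return error codes by default; set the handler
       explicitly here for clarity. */
    MPI_File_set_errhandler(MPI_FILE_NULL, MPI_ERRORS_RETURN);

    /* "pvfs2:" tells ROMIO to talk to the servers directly. */
    rc = MPI_File_open(MPI_COMM_WORLD, "pvfs2:/pvfs_mount_point/testfile",
                       MPI_MODE_CREATE | MPI_MODE_WRONLY,
                       MPI_INFO_NULL, &fh);
    if (rc != MPI_SUCCESS) {
        MPI_Error_class(rc, &eclass);
        MPI_Error_string(rc, msg, &len);
        fprintf(stderr, "MPI_File_open failed (error class %d): %s\n",
                eclass, msg);
        MPI_Abort(MPI_COMM_WORLD, 1);
    } else {
        MPI_File_close(&fh);
    }

    MPI_Finalize();
    return 0;
}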


Thanks
Edgar

On 2/25/2015 2:01 AM, vithanousek wrote:

Thanks for your repaly!

I checked my configuration parametrs and it seem, that everything is correct:
./configure --prefix=/opt/modules/openmpi-1.8.4 --with-sge --with-psm 
--with-pvfs2=/opt/orangefs 
--with-io-romio-flags='--with-file-system=pvfs2+ufs+nfs 
--with-pvfs2=/opt/orangefs'

I have added error chceking code to my app, and I was getting multiple errors, 
like en MPI_ERR_AMODE, MPI_ERR_UNKNOWN, MPI_ERR_NO_SUCH_FILE,MPI_ERR_IO. 
(depend on permisions of mount point of pvfs2, and --mca io romio/ompio --mca 
fs pvfs2)

But it seems that error is in sourcecode of my application, because I cant find 
any more complex documentation about using ROMIO and OMPIO.
I found here https://surfsara.nl/systems/lisa/software/pvfs2, that I should use as filename 
"pvfs2:/pvfs_mount_point/name_of_file" instead of 
"/pvfs_mount_point/name_of_file". This is working with ROMIO.

Do you know how to use OMPIO without mounting pvfs2? if I tryed the same filename format 
as in ROMIO I got "MPI_ERR_FILE: invalid file".
If I use normal filename format ("/mountpoint/filename") and force use of pvfs2 
by using  --mca io ompio --mca fs pvfs2, then my app fails with
mca_fs_base_file_select() failed (and backtrace).

At OrangeFS documentation (http://docs.orangefs.com/v_2_8_8/index.htm) is 
chapter about using ROMIO, and it says, that i shoud compile apps with -lpvfs2. 
I have tryed it, but nothing change (ROMIO works with special filename format, 
OMPIO doesnt work)

Thanks for your help. If you point me to some usefull documentation, I will be 
happy.
Hanousek Vít


-- Original message --
From: Rob Latham
To: us...@open-mpi.org, vithanou...@seznam.cz
Date: 24. 2. 2015 22:10:08
Subject: Re: [OMPI users] MPIIO and OrangeFS

On 02/24/2015 02:00 PM, vithanousek wrote:

Hello,

Im not sure if I have my OrangeFS (2.8.8) and OpenMPI (1.8.4) set up corectly. 
One short questin?

Is it needed to have OrangeFS  mounted  through kernel module, if I want use 
MPIIO?


nope!


My simple MPIIO hello world program doesnt work, If i havent mounted OrangeFS. 
When I mount OrangeFS, it works. So I'm not sure if OMPIO (or ROMIO) is using 
pvfs2 servers directly or if it is using kernel module.

Sorry for stupid question, but I didnt find any documentation about it.


http://www.pvfs.org/cvs/pvfs-2-8-branch-docs/doc/pvfs2-quickstart/pvfs2-quickstart.php#sec:romio

It sounds like you have not configured your MPI implementation with
PVFS2 support (OrangeFS is a re-branding of PVFS2, but as far as MPI-IO
is concerned, they are the same).

OpenMPI passes flags to romio like this at configure time:

   --with-io-romio-flags="--with-file-system=pvfs2+ufs+nfs"

I'm not sure how OMPIO takes flags.

If pvfs2-ping and pvfs2-cp and pvfs2-ls work, then you can bypass the
kernel.

also, please check return codes:

http://stackoverflow.com/questions/22859269/what-do-mpi-io-error-codes-mean/26373193#26373193

==rob



Thanks for replays
Hanousek Vít
___
users mailing list
us...@open-mpi.org
Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
Link to this post: 
http://www.open-mpi.org/community/lists/users/2015/02/26382.php





--
Edgar Gabriel
Associate Professor
Parallel Software Technologies Lab  http://pstl.cs.uh.edu
Department of Computer Science  University of Houston
Philip G. Hoffman Hall, Room 524Houston, TX-77204, USA
Tel: +1 (713) 743-3857  Fax: +1 (713) 743-3335


Re: [OMPI users] Complexity of MPI_Comm_split and MPI_Comm_Create?

2015-01-20 Thread Edgar Gabriel
Here are the communication operations occurring in the best case 
scenario in Open MPI right now:


Comm_create:
  - Communicator ID allocation: 2 Allreduce operations per round of
negotiations
  - 1 Allreduce operation for 'activating' the communicator

Comm_split:
  - 1 Allgather operation for collecting all color keys
  - Communicator ID allocation: 2 Allreduce operations per round of
negotiations
  - 1 Allreduce operation for 'activating' the communicator

As the description above suggests, you might need more than one round 
for the communicator ID allocation, depending on the history of the 
application and which IDs have already been used.


The details of how the operations are implemented can vary. We could 
assume, however, a binary tree for the reduce and the broadcast portion of 
the Allreduce operation, each being O(log P). For Allgather we could assume a 
combination of a linear gather (O(P)) and a binary tree broadcast (O(log 
P)).


So as of today, Comm_split is more expensive than Comm_create.
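
For completeness, a small sketch of the contiguous-range split that Option A in
the quoted question below describes; the block size of 4 is just an illustrative
assumption.

#include <mpi.h>

int main(int argc, char **argv)
{
    int      rank, color;
    MPI_Comm block_comm;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Contiguous blocks of 4 ranks each (block size chosen arbitrarily). */
    color = rank / 4;

    /* Cost: one Allgather of the color/key pairs plus the communicator-ID
       negotiation and activation Allreduces described above. */
    MPI_Comm_split(MPI_COMM_WORLD, color, rank, &block_comm);

    MPI_Comm_free(&block_comm);
    MPI_Finalize();
    return 0;
}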

Thanks
Edgar

On 1/19/2015 4:13 PM, Jonathan Eckstein wrote:

Dear Open MPIers:

I have been using MPI for many years, most recently Open MPI.  But I
have just encountered the first situation in which it will be helpful to
create communicators (for an unstructured sparse matrix algorithm).

I have identified two ways I could create the communicators I need.
Where P denotes the number of MPI processors, Option A is:
   1.  Exchange of messages between processors of adjacent rank
   [O(1) message rounds (one up, one down)]
   2.  One scan operation
   [O(log P) message rounds]
   3.  One or two calls to MPI_COMM_SPLIT
   [Unknown complexity]
Option B is:
   1.  Three scan operations (one in reverse direction)
   [O(log P) message rounds + time to make reverse communicator]
   2.  Each processor calls MPI_GROUP_RANGE_INCL and MPI_COMM_CREATE
   at most twice
   [Unknown complexity]

All the groups/communicators I am creating are stride-1 ranges of
contiguous processors from MPI_COMM_WORLD.  Some of them could overlap
by one processor, hence the possible need to call MPI_COMM_SPLIT or
MPI_COMM_CREATE twice per processor.

Option A looks easier to code, but I wonder whether it will scale as
well, because I am not sure about the complexity of MPI_COMM_SPLIT. What
are the parallel message complexities of MPI_COMM_SPLIT and
MPI_COMM_CREATE?  I poked around the web but could not find much on this
topic.

For option B, I will need to make a communicator that has the same
processes as MPI_COMM_WORLD, but in reverse order.  This looks like it
can be done easily with MPI_GROUP_RANGE_INCL with a stride of -1, but
again I am not sure how much communication is required to set up the
communicator -- I would guess O(log P) rounds of messages.

Any advice or explanation you can offer would be much appreciated.

Professor Jonathan Eckstein
Rutgers University


___
users mailing list
us...@open-mpi.org
Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
Link to this post:
http://www.open-mpi.org/community/lists/users/2015/01/26216.php


Re: [OMPI users] MPI-I/O issues

2014-08-13 Thread Edgar Gabriel
Sorry for my silence on that, I was out for a couple of days. I have a 
patch that made ompio work for Mohamad's test; he is testing it now on 
his own/additional test cases, and I'll commit it (and file a CMR) as 
soon as we have feedback on that.
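
For context, below is a stripped-down sketch of the kind of zero-size hindexed
file view that triggers the crashes quoted further down; it is not Mohamad's
actual hindexed_io test, just an approximation of the pattern.

#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_File     fh;
    MPI_Datatype filetype;
    MPI_Status   status;
    int          blocklen = 0;
    MPI_Aint     disp     = 0;

    MPI_Init(&argc, &argv);

    /* A datatype describing zero bytes of the file for this rank. */
    MPI_Type_create_hindexed(1, &blocklen, &disp, MPI_BYTE, &filetype);
    MPI_Type_commit(&filetype);

    MPI_File_open(MPI_COMM_WORLD, "mpi_test_file",
                  MPI_MODE_CREATE | MPI_MODE_RDWR, MPI_INFO_NULL, &fh);
    MPI_File_set_view(fh, 0, MPI_BYTE, filetype, "native", MPI_INFO_NULL);

    /* Collective write in which this rank contributes no data. */
    MPI_File_write_at_all(fh, 0, NULL, 0, MPI_BYTE, &status);

    MPI_File_close(&fh);
    MPI_Type_free(&filetype);
    MPI_Finalize();
    return 0;
}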


Thanks
Edgar
On 8/12/2014 10:30 AM, Jeff Squyres (jsquyres) wrote:

I filed https://svn.open-mpi.org/trac/ompi/ticket/4856 to apply these ROMIO 
patches.

Probably won't happen until 1.8.3.

On Aug 6, 2014, at 2:54 PM, Rob Latham <r...@mcs.anl.gov> wrote:




On 08/06/2014 11:50 AM, Mohamad Chaarawi wrote:


To replicate, run the program with 2 or more procs:

mpirun -np 2 ./hindexed_io mpi_test_file

[jam:15566] *** Process received signal ***
[jam:15566] Signal: Segmentation fault (11)
[jam:15566] Signal code: Address not mapped (1)
[jam:15566] Failing at address: (nil)
[jam:15566] [ 0] [0xfcd440]
[jam:15566] [ 1]
/scr/chaarawi/install/ompi/lib/libmpi.so.1(ADIOI_Flatten_datatype+0x17a)[0xc80f2a]


I bet OpenMPI needs to pick up a few patches for this fault:

- http://git.mpich.org/mpich.git/commit/50f3d5806
- http://git.mpich.org/mpich.git/commit/97114ec5b
- http://git.mpich.org/mpich.git/commit/90e15e9b0
- http://git.mpich.org/mpich.git/commit/76a079c7c


... and two more patches that are sitting in my tree waiting review.


==rob





[jam:15566] [ 2]
/scr/chaarawi/install/ompi/lib/libmpi.so.1(ADIO_Set_view+0x1c1)[0xc72a6d]
[jam:15566] [ 3]
/scr/chaarawi/install/ompi/lib/libmpi.so.1(mca_io_romio_dist_MPI_File_set_view+0x69b)[0xc8d11b]

[jam:15566] [ 4]
/scr/chaarawi/install/ompi/lib/libmpi.so.1(mca_io_romio_file_set_view+0x7c)[0xc4f7c5]

[jam:15566] [ 5]
/scr/chaarawi/install/ompi/lib/libmpi.so.1(PMPI_File_set_view+0x1e6)[0xb32f7e]

[jam:15566] [ 6] ./hindexed_io[0x8048aa6]
[jam:15566] [ 7] /lib/libc.so.6(__libc_start_main+0xdc)[0x7d5ebc]
[jam:15566] [ 8] ./hindexed_io[0x80487e1]
[jam:15566] *** End of error message ***

If I use --mca io ompio with 2 or more procs, the program segfaults in
write_at_all (regardless of what routine is used to construct a 0 sized
datatype):

[jam:15687] *** Process received signal ***
[jam:15687] Signal: Floating point exception (8)
[jam:15687] Signal code: Integer divide-by-zero (1)
[jam:15687] Failing at address: 0x3e29b7
[jam:15687] [ 0] [0xe56440]
[jam:15687] [ 1]
/scr/chaarawi/install/ompi/lib/libmpi.so.1(ompi_io_ompio_set_explicit_offset+0x9d)[0x3513bc]

[jam:15687] [ 2]
/scr/chaarawi/install/ompi/lib/libmpi.so.1(ompio_io_ompio_file_write_at_all+0x3e)[0x35869a]

[jam:15687] [ 3]
/scr/chaarawi/install/ompi/lib/libmpi.so.1(mca_io_ompio_file_write_at_all+0x66)[0x358650]

[jam:15687] [ 4]
/scr/chaarawi/install/ompi/lib/libmpi.so.1(MPI_File_write_at_all+0x1b3)[0x1f46f3]

[jam:15687] [ 5] ./hindexed_io[0x8048b07]
[jam:15687] [ 6] /lib/libc.so.6(__libc_start_main+0xdc)[0x7d5ebc]
[jam:15687] [ 7] ./hindexed_io[0x80487e1]
[jam:15687] *** End of error message ***

If I use mpich 3.1.2 , I don't see those issues.

Thanks,
Mohamad


___
users mailing list
us...@open-mpi.org
Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
Link to this post: 
http://www.open-mpi.org/community/lists/users/2014/08/24931.php



--
Rob Latham
Mathematics and Computer Science Division
Argonne National Lab, IL USA
___
users mailing list
us...@open-mpi.org
Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
Link to this post: 
http://www.open-mpi.org/community/lists/users/2014/08/24934.php





--
Edgar Gabriel
Associate Professor
Parallel Software Technologies Lab  http://pstl.cs.uh.edu
Department of Computer Science  University of Houston
Philip G. Hoffman Hall, Room 524Houston, TX-77204, USA
Tel: +1 (713) 743-3857  Fax: +1 (713) 743-3335


Re: [OMPI users] bug in MPI_File_set_view?

2014-05-15 Thread Edgar Gabriel
2] [ 7] 
> /usr/local/lib/openmpi/mca_io_romio.so(mca_io_romio_dist_MPI_File_close+0xe8)[0x7f58315c9dd8]
> [oriol-VirtualBox:13972] [ 8] 
> /usr/local/lib/libmpi.so.1(+0x3a2c6)[0x7f584583c2c6]
> [oriol-VirtualBox:13972] [ 9] 
> /usr/local/lib/libmpi.so.1(ompi_file_close+0x41)[0x7f584583c811]
> [oriol-VirtualBox:13972] [10] 
> /usr/local/lib/libmpi.so.1(PMPI_File_close+0x78)[0x7f5845878118]
> [oriol-VirtualBox:13972] [11] ./binary[0x42099e]
> [oriol-VirtualBox:13972] [12] ./binary[0x48ed86]
> [oriol-VirtualBox:13972] [13] ./binary[0x40e49f]
> [oriol-VirtualBox:13972] [14] 
> /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5)[0x7f5844a2ede5]
> [oriol-VirtualBox:13972] [15] ./binary[0x40d679]
> [oriol-VirtualBox:13972] *** End of error message ***
> --
> mpirun noticed that process rank 2 with PID 13969 on node oriol-VirtualBox 
> exited on signal 6 (Aborted).
> --
> 
> 

-- 
Edgar Gabriel
Associate Professor
Parallel Software Technologies Lab  http://pstl.cs.uh.edu
Department of Computer Science  University of Houston
Philip G. Hoffman Hall, Room 524Houston, TX-77204, USA
Tel: +1 (713) 743-3857  Fax: +1 (713) 743-3335



signature.asc
Description: OpenPGP digital signature


Re: [OMPI users] bug in MPI_File_set_view?

2014-05-14 Thread Edgar Gabriel
We also fixed a similar bug in OMPIO roughly one year back, so I would
hope that it should work with OMPIO as well.

Thanks
Edgar

On 5/14/2014 9:17 AM, Ralph Castain wrote:
> You might give it a try with 1.8.1 or the nightly snapshot from 1.8.2 - we 
> updated ROMIO since the 1.6 series, and whatever fix is required may be in 
> the newer version
> 
> 
> On May 14, 2014, at 6:52 AM, CANELA-XANDRI Oriol 
> <oriol.canela-xan...@roslin.ed.ac.uk> wrote:
> 
>> Hello,
>>
>> I am using MPI IO for writing/reading  a block cyclic distribution matrix 
>> into a file.
>>
>> It works fine except when there is some MPI threads with no data (i.e. when 
>> the matrix is small enough, or the block size is big enough that some 
>> processes in the grid do not have any matrix block). In this case, I receive 
>> an error when calling MPI_File_set_view saying that the data cannot be 
>> freed. I tried with 1.3 and 1.6 versions. When I try with MPICH it works 
>> without errors. Could this be a bug?
>>
>> My function is (where nBlockRows/nBlockCols define the size of the blocks, 
>> nGlobRows/nGlobCols define the global size of the matrix, 
>> nProcRows/nProcCols define the dimensions of the process grid, and fname is 
>> the name of the file.):
>>
>> void Matrix::writeMatrixMPI(std::string fname) {
>>  int dims[] = {this->nGlobRows, this->nGlobCols};
>>  int dargs[] = {this->nBlockRows, this->nBlockCols};
>>  int distribs[] = {MPI_DISTRIBUTE_CYCLIC, MPI_DISTRIBUTE_CYCLIC};
>>  int dim[] = {communicator->nProcRows, communicator->nProcCols};
>>  char nat[] = "native";
>>  int rc;
>>  MPI_Datatype dcarray;
>>  MPI_File cFile;
>>  MPI_Status status;
>>
>>  MPI_Type_create_darray(communicator->mpiNumTasks, communicator->mpiRank, 2, 
>> dims, distribs, dargs, dim, MPI_ORDER_FORTRAN, MPI_DOUBLE, &dcarray);
>>  MPI_Type_commit(&dcarray);
>>
>>  std::vector<char> fn(fname.begin(), fname.end());
>>  fn.push_back('\0');
>>  rc = MPI_File_open(MPI_COMM_WORLD, &fn[0], MPI_MODE_CREATE | 
>> MPI_MODE_WRONLY, MPI_INFO_NULL, &cFile);
>>  if(rc){
>>std::stringstream ss;
>>ss << "Error: Failed to open file: " << rc;
>>misc.error(ss.str(), 0);
>>  }
>>  else
>>  {
>>MPI_File_set_view(cFile, 0, MPI_DOUBLE, dcarray, nat, MPI_INFO_NULL);
>>MPI_File_write_all(cFile, this->m, this->nRows*this->nCols, MPI_DOUBLE, 
>> &status);
>>  }
>>  MPI_File_close(&cFile);
>>  MPI_Type_free(&dcarray);
>> }
>>
>> Best regards,
>>
>> Oriol
>>
>> -- 
>> The University of Edinburgh is a charitable body, registered in
>> Scotland, with registration number SC005336.
>>
>> ___
>> users mailing list
>> us...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
> 
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
> 

-- 
Edgar Gabriel
Associate Professor
Parallel Software Technologies Lab  http://pstl.cs.uh.edu
Department of Computer Science  University of Houston
Philip G. Hoffman Hall, Room 524Houston, TX-77204, USA
Tel: +1 (713) 743-3857  Fax: +1 (713) 743-3335



signature.asc
Description: OpenPGP digital signature


Re: [OMPI users] OpenMPI-ROMIO-OrangeFS

2014-03-25 Thread Edgar Gabriel
Yes, the patch has been submitted to the 1.6 branch for review; I am not sure
what the precise status of it is. The problems found are more or less
independent of the PVFS2 version.

Thanks
Edgar

On 3/25/2014 7:32 AM, Dave Love wrote:
> Edgar Gabriel <gabr...@cs.uh.edu> writes:
> 
>> I am still looking into the PVFS2 with ROMIO problem with the 1.6
>> series, where (as I mentioned yesterday) the problem I am having right
>> now is that the data is wrong. Not sure what causes it, but since I have
>> teach this afternoon again, it might be friday until I can digg into that.
> 
> Was there any progress with this?  Otherwise, what version of PVFS2 is
> known to work with OMPI 1.6?  Thanks.
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
> 

-- 
Edgar Gabriel
Associate Professor
Parallel Software Technologies Lab  http://pstl.cs.uh.edu
Department of Computer Science  University of Houston
Philip G. Hoffman Hall, Room 524Houston, TX-77204, USA
Tel: +1 (713) 743-3857  Fax: +1 (713) 743-3335



signature.asc
Description: OpenPGP digital signature


Re: [OMPI users] OpenMPI-ROMIO-OrangeFS

2014-02-27 Thread Edgar Gabriel
On 2/27/2014 9:44 AM, Dave Love wrote:
> Edgar Gabriel <gabr...@cs.uh.edu> writes:
> 
>> so we had ROMIO working with PVFS2 (not OrangeFS, which is however
>> registered as PVFS2 internally). We have one cluster which uses
>> OrangeFS, on that machine however we used OMPIO, not ROMIO.
> 
> [What's OMPIO, and should we want it?]

OMPIO is the 'native' implementation of MPI I/O in Open MPI; it is, however,
only available from the 1.7 series onwards.

I am still looking into the PVFS2 with ROMIO problem with the 1.6
series, where (as I mentioned yesterday) the problem I am having right
now is that the data is wrong. Not sure what causes it, but since I have to
teach this afternoon again, it might be Friday until I can dig into that.

Thanks
Edgar

> 
> This is another vote for working 1.6.5 romio with orangefs -- or pvfs2
> if the re-branded version won't work.  It seems particularly attractive
> for use on a system that was mistakenly bought with just NFS, especially
> if it can be spun up in user space practically.
> 
> I got OMPI built for 1.6.5 with the two obvious patches from the repo,
> but it sounds as though that's not good enough.
> 
> 1.6 is required for compatibility with RHEL6, by the way.
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
> 

-- 
Edgar Gabriel
Associate Professor
Parallel Software Technologies Lab  http://pstl.cs.uh.edu
Department of Computer Science  University of Houston
Philip G. Hoffman Hall, Room 524Houston, TX-77204, USA
Tel: +1 (713) 743-3857  Fax: +1 (713) 743-3335



signature.asc
Description: OpenPGP digital signature


Re: [OMPI users] OrangeFS ROMIO support

2014-02-26 Thread Edgar Gabriel
That was my fault; I did not follow up at the time, and probably got
sidetracked by something. Anyway, I suspect that you actually have the
patch, otherwise the current Open MPI trunk and the 1.7 release series
would not have the patch after the last ROMIO update - at least I did
not reapply it, and I am not sure whether Nathan did.

Thanks
Edgar

On 2/26/2014 4:52 PM, Latham, Robert J. wrote:
> On Tue, 2014-02-25 at 07:26 -0600, Edgar Gabriel wrote:
>> this was/is a bug in ROMIO, in which they assume a datatype is an int. I
>> fixed it originally in a previous version of Open MPI on the trunk, but
>> it did not get ported upstream, so we might have to do the same fix again.
>>
> 
> Sorry about that.  I'm going through OpenMPI SVN now to see what other
> gems I may have dropped.
> 
> ==rob
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
> 

-- 
Edgar Gabriel
Associate Professor
Parallel Software Technologies Lab  http://pstl.cs.uh.edu
Department of Computer Science  University of Houston
Philip G. Hoffman Hall, Room 524Houston, TX-77204, USA
Tel: +1 (713) 743-3857  Fax: +1 (713) 743-3335



signature.asc
Description: OpenPGP digital signature


Re: [OMPI users] OpenMPI-ROMIO-OrangeFS

2014-02-26 Thread Edgar Gabriel
OK, then this must be a difference between OrangeFS and PVFS2. It turns
out that trunk and 1.7 actually do have the patch, but the 1.6 series does
not have it. The actual commit was done in

https://svn.open-mpi.org/trac/ompi/changeset/24768

and based on the line numbers, I think it should apply cleanly.
Using that patch, 1.6 compiles and executes without complaining about
PVFS2 support, but running my testsuite brings up a ton of data errors.

I am still digging a bit into that, but that's where I am right now.

Thanks
Edgar




On 2/26/2014 3:17 PM, vithanousek wrote:
> At first Thank you very much for your time.
> 
> "--with-file-system=pvfs2+ufs+nfs" didnt help.
> 
> But if find (by google) some part of orangefs test. I dont know what is
> this exactly doing, but when I edited source code of OpenMPI like doing
> this line, all seems that it is working now. (changing
> ADIOI_PVFS2_IReadContig and ADIOI_PVFS2_IWriteContig to NULL in file
> ad_pvfs2.c)
> 
> http://www.orangefs.org/trac/orangefs/browser/branches/OFSTest-dev/OFSTest/OFSTestNode.py?rev=10645#L1328
> 
> Other tests I will do tomorrow.
> 
> Thanks
> Hanousek Vít
> 
> 
> 
> -- Original message --
> From: Edgar Gabriel <gabr...@cs.uh.edu>
> To: us...@open-mpi.org
> Date: 26. 2. 2014 21:08:07
> Subject: Re: [OMPI users] OpenMPI-ROMIO-OrangeFS
> 
> 
> not sure whether its the problem or not, but usually have an additional
> flag set :
> 
> --with-io-romio-flags="--with-file-system=pvfs2+ufs+nfs
> --with-pvfs2=/opt/pvfs-2.8.2"
> 
> compilation is a bit slow for me today...
> 
> Edgar
> 
> 
> On 2/26/2014 2:05 PM, vithanousek wrote:
> > Now I compiled by doing this:
> > OrangeFS (original, withou editing):
> >
> > ./configure --prefix=/usr/local/orangefs
> > --with-kernel=/usr/src/kernels/2.6.32-431.5.1.el6.x86_64
> > --with-openib=/usr --without-bmi-tcp --enable-shared
> > make
> > make kmod
> > make install
> > make kmod_install
> >
> > Without error.
> > OpenMPI (with edited switch to ifs):
> >
> > ./configure --prefix=/usr/local/openmpi_1.6.5_romio
> > --with-io-romio-flags='--with-pvfs2=/usr/local/orangefs'
> > make
> > make install
> >
> > Without error.
> > parallel FS mount work. But I still cant use MPIIO.
> > I compiled simple MPIIO program and run it by this:
> >
> > mpicc -o mpiio mpiio.c
> > mpirun -np 1 -host node18 mpiio
> > [node18:02334] mca: base: component_find: unable to open
> > /usr/local/openmpi_1.6.5_romio/lib/openmpi/mca_io_romio:
> > /usr/local/openmpi_1.6.5_romio/lib/openmpi/mca_io_romio.so: undefined
> > symbol: ADIOI_PVFS2_IReadContig (ignored)
> >
> > And no file is created.
> > I tried compile it with:
> > mpicc -o mpiio mpiio.c -lpvfs2 -L/usr/local/orangefs/lib
> >
> > but i got the same results, have You any idea?
> >
> > Thank for reply
> > Hanousek Vít
> >
> >
> >
> >
> >
> > -- Original message --
> > From: vithanousek <vithanou...@seznam.cz>
> > To: Open MPI Users <us...@open-mpi.org>
> > Date: 26. 2. 2014 20:30:17
> > Subject: Re: [OMPI users] OpenMPI-ROMIO-OrangeFS
> >
> >
> > Thanks for your Time,
> >
> > I'm little bit confused, what is diferent between pvfs2 and
> > orangefs. I was thinking, that only project changes name.
> >
> > I get hint from OrangeFS maillist, to compile OrangeFs with
> > --enable-shared. This produce a some shared library (.so) in
> > /usr/local/orangefs/lib and I can compile OpenMPI 1.6.5 now (with
> > fixed "switch =>ifs" in ROMIO).
> >
> > I will test if it is working in next hour (some configuration steps
> > is needed).
> >
> > Thanks.
> > Hanousek Vít
> >
> > -- Original message --
> > From: Edgar Gabriel <gabr...@cs.uh.edu>
> > To: Open MPI Users <us...@open-mpi.org>
> > Date: 26. 2. 2014 20:18:03
> > Subject: Re: [OMPI users] OpenMPI-ROMIO-OrangeFS
> >
> >
> > so we had ROMIO working with PVFS2 (not OrangeFS, which is however
> > registered as PVFS2 internally). We have one cluster which uses
> > OrangeFS, on that machine however we used OMPIO, not ROMIO. I am
> > curren

Re: [OMPI users] OpenMPI-ROMIO-OrangeFS

2014-02-26 Thread Edgar Gabriel
Not sure whether it's the problem or not, but I usually have an additional
flag set:

 --with-io-romio-flags="--with-file-system=pvfs2+ufs+nfs
--with-pvfs2=/opt/pvfs-2.8.2"

compilation is a bit slow for me today...

Edgar


On 2/26/2014 2:05 PM, vithanousek wrote:
> Now I compiled by doing this:
> OrangeFS (original, withou editing):
> 
> ./configure --prefix=/usr/local/orangefs
> --with-kernel=/usr/src/kernels/2.6.32-431.5.1.el6.x86_64
> --with-openib=/usr --without-bmi-tcp --enable-shared
> make
> make kmod
> make install
> make kmod_install
> 
> Without error.
> OpenMPI (with edited switch to ifs):
> 
> ./configure --prefix=/usr/local/openmpi_1.6.5_romio
> --with-io-romio-flags='--with-pvfs2=/usr/local/orangefs'
> make
> make install
> 
> Without error.
> parallel FS mount work. But I still cant use MPIIO.
> I compiled simple MPIIO program and run it by this:
> 
> mpicc -o mpiio mpiio.c
> mpirun -np 1 -host node18 mpiio
> [node18:02334] mca: base: component_find: unable to open
> /usr/local/openmpi_1.6.5_romio/lib/openmpi/mca_io_romio:
> /usr/local/openmpi_1.6.5_romio/lib/openmpi/mca_io_romio.so: undefined
> symbol: ADIOI_PVFS2_IReadContig (ignored)
> 
> And no file is created.
> I tried compile it with:
> mpicc -o mpiio mpiio.c -lpvfs2 -L/usr/local/orangefs/lib
> 
> but i got the same results, have You any idea?
> 
> Thank for reply
> Hanousek Vít
> 
> 
> 
> 
> 
> -- Original message --
> From: vithanousek <vithanou...@seznam.cz>
> To: Open MPI Users <us...@open-mpi.org>
> Date: 26. 2. 2014 20:30:17
> Subject: Re: [OMPI users] OpenMPI-ROMIO-OrangeFS
> 
> 
> Thanks for your Time,
> 
> I'm little bit confused, what is diferent between pvfs2 and
> orangefs. I was thinking, that only project changes name.
> 
> I get hint from OrangeFS maillist, to compile OrangeFs with
> --enable-shared. This produce a some shared library (.so) in
> /usr/local/orangefs/lib and I can compile OpenMPI 1.6.5 now (with
> fixed "switch =>ifs" in ROMIO).
> 
> I will test if it is working in next hour (some configuration steps
> is needed).
> 
> Thanks.
> Hanousek Vít
> 
> -- Original message --
> From: Edgar Gabriel <gabr...@cs.uh.edu>
> To: Open MPI Users <us...@open-mpi.org>
> Date: 26. 2. 2014 20:18:03
> Subject: Re: [OMPI users] OpenMPI-ROMIO-OrangeFS
> 
> 
> so we had ROMIO working with PVFS2 (not OrangeFS, which is however
> registered as PVFS2 internally). We have one cluster which uses
> OrangeFS, on that machine however we used OMPIO, not ROMIO. I am
> currently compiling the 1.6 version of Open MPI to see whether I can
> reproduce your problem.
> 
> Thanks
> Edgar
> 
> On 2/26/2014 12:23 PM, vithanousek wrote:
> > Thanks for reply,
> >
> > Is it possible that the patch solvs all this problems, not
> only "switch
> > => ifs" problem?
> > I realy dont know, wher the problem is now (OpenMPI, ROMIO,
> OrangeFS).
> >
> > Thanks
> > Hanousek Vít
> >
> > -- Original message --
> > From: Ralph Castain <r...@open-mpi.org>
> > To: Open MPI Users <us...@open-mpi.org>
> > Date: 26. 2. 2014 19:16:36
> > Subject: Re: [OMPI users] OpenMPI-ROMIO-OrangeFS
> >
> >
> > Edgar hasn't had a chance to find the necessary patch - he was on
> > travel, returning soon.
> >
> >
> > On Feb 26, 2014, at 9:27 AM, vithanousek
> <vithanou...@seznam.cz> wrote:
> >
> > > Hello,
> > >
> > > I have still problems with compiling OpenMPI 1.6.5 with OrangeFS
> > 2.8.7 support.
> > >
> > > I compiled OrangeFS by this:
> > >
> > > ./configure --prefix=/usr/local/orangefs2
> > --with-kernel=/usr/src/kernels/2.6.32-431.5.1.el6.x86_64
> > --with-openib=/usr --without-bmi-tcp
> > > make -j 32
> > > make -j 32 kmod
> > > make install
> > > make kmod_install
> > >
> > > this works.
> > > than I tried to compile OpenMPI (with fixed convert_named
> function
> > in ad_pvfs2_io_dtype.c) by this:
> > >
> &g

Re: [OMPI users] OpenMPI-ROMIO-OrangeFS

2014-02-26 Thread Edgar Gabriel
so we had ROMIO working with PVFS2 (not OrangeFS, which is however
registered as PVFS2 internally). We have one cluster which uses
OrangeFS, on that machine however we used OMPIO, not ROMIO. I am
currently compiling the 1.6 version of Open MPI to see whether I can
reproduce your problem.

Thanks
Edgar

On 2/26/2014 12:23 PM, vithanousek wrote:
> Thanks for reply,
> 
> Is it possible that the patch solvs all this problems, not only "switch
> => ifs" problem?
> I realy dont know, wher the problem is now (OpenMPI, ROMIO, OrangeFS).
> 
> Thanks
> Hanousek Vít
> 
> -- Original message --
> From: Ralph Castain <r...@open-mpi.org>
> To: Open MPI Users <us...@open-mpi.org>
> Date: 26. 2. 2014 19:16:36
> Subject: Re: [OMPI users] OpenMPI-ROMIO-OrangeFS
> 
> 
> Edgar hasn't had a chance to find the necessary patch - he was on
> travel, returning soon.
> 
> 
> On Feb 26, 2014, at 9:27 AM, vithanousek <vithanou...@seznam.cz> wrote:
> 
> > Hello,
> >
> > I have still problems with compiling OpenMPI 1.6.5 with OrangeFS
> 2.8.7 support.
> >
> > I compiled OrangeFS by this:
> >
> > ./configure --prefix=/usr/local/orangefs2
> --with-kernel=/usr/src/kernels/2.6.32-431.5.1.el6.x86_64
> --with-openib=/usr --without-bmi-tcp
> > make -j 32
> > make -j 32 kmod
> > make install
> > make kmod_install
> >
> > this works.
> > than I tried to compile OpenMPI (with fixed convert_named function
> in ad_pvfs2_io_dtype.c) by this:
> >
> > ./configure --prefix=/usr/local/openmpi_1.6.5_romio2
> --with-io-romio-flags='--with-pvfs2=/usr/local/orangefs2'
> > (...)
> > make -j32
> > (...)
> > CCLD mca_io_romio.la
> > /usr/bin/ld: /usr/local/orangefs2/lib/libpvfs2.a(errno-mapping.o):
> relocation R_X86_64_32S against `PINT_errno_mapping' can not be used
> when making a shared object; recompile with -fPIC
> > /usr/local/orangefs2/lib/libpvfs2.a: could not read symbols: Bad value
> > collect2: ld returned 1 exit status
> > make[3]: *** [mca_io_romio.la] Error 1
> >
> > So I tried recompile OrangeFS by this:
> >
> > export CFLAGS="-fPIC"
> > ./configure --prefix=/usr/local/orangefs2
> --with-kernel=/usr/src/kernels/2.6.32-431.5.1.el6.x86_64
> --with-openib=/usr --without-bmi-tcp
> > make -j 32
> > make -j 32 kmod
> > make install
> > make kmod_install
> >
> > (there was errors with current->fsuid => current->cred->fsuid, in
> multiple files. I hardcoded this in files, bad idea I know )
> > Then compilation of OpenMPI works.
> >
> > ./configure --prefix=/usr/local/openmpi_1.6.5_romio2
> --with-io-romio-flags='--with-pvfs2=/usr/local/orangefs2'
> > make -j32
> > make install
> >
> > but when i created simple program which is using MPIIO, it failed
> when i run it:
> >
> > mpirun -np 1 -host node18 mpiio
> > [node18:01696] mca: base: component_find: unable to open
> /usr/local/openmpi_1.6.5_romio/lib/openmpi/mca_io_romio:
> /usr/local/openmpi_1.6.5_romio/lib/openmpi/mca_io_romio.so:
> undefined symbol: ADIOI_PVFS2_IReadContig (ignored)
> >
> > Because I got message form OrangeFS mailing list about -fPIC
> errors, i tryed to recompile OrangeFS withou this flag and compile
> OpenMPI (static linked) by this:
> >
> > ./congure --prefix=/usr/local/openmpi_1.6.5_romio2
> --with-io-romio-flags='--with-pvfs2=/usr/local/orangefs2'
> --enable-static --disable-shared
> > (...)
> > make -j 32
> > (...)
> > CCLD otfmerge-mpi
> >
> 
> /root/openmpi-1.6.5/ompi/contrib/vt/vt/../../../.libs/libmpi.a(ad_pvfs2.o):(.data+0x60):
> undefined reference to `ADIOI_PVFS2_IReadContig'
> >
> 
> /root/openmpi-1.6.5/ompi/contrib/vt/vt/../../../.libs/libmpi.a(ad_pvfs2.o):(.data+0x68):
> undefined reference to `ADIOI_PVFS2_IWriteContig'
> > collect2: ld returned 1 exit status
> > make[10]: *** [otfmerge-mpi] Error 1
> > (...)
> >
> > Now I realy dont know, what is wrong.
> > Is there Anybody ho has OpenMPI working with OrangeFS?
> >
> > Thanks for replies
> > HanousekVít
> > ___
> > users mailing list
> > us...@open-mpi.org
> > http://www.open-mp

Re: [OMPI users] OrangeFS ROMIO support

2014-02-25 Thread Edgar Gabriel
Give me a day (I am currently out of town) to dig up the patch again and
reapply it to the trunk; we can then CMR it to 1.6 and 1.7. I did not
follow up at that time, since nobody else seemed to care. I guess the
number of users using the combination of Open MPI + ROMIO + PVFS2 is
pretty low.

Thanks
Edgar

On 2/25/2014 7:34 AM, Jeff Squyres (jsquyres) wrote:
> Edgar --
> 
> Is there a fix that we should CMR to the v1.6 branch?
> 
> 
> On Feb 25, 2014, at 8:26 AM, Edgar Gabriel <gabr...@cs.uh.edu> wrote:
> 
>> this was/is a bug in ROMIO, in which they assume a datatype is an int. I
>> fixed it originally in a previous version of Open MPI on the trunk, but
>> it did not get ported upstream, so we might have to do the same fix again.
>>
>> Thanks
>> Edgar
>>
>> On 2/25/2014 7:15 AM, vithanousek wrote:
>>> Hello,
>>>
>>> At fisrt, please, excuse my poor level of english.
>>>
>>> I'm little bit confused by versions of OpenMPI and ROMIO, because i met
>>> siliar bugs reported in multiple versions. Im buliding version 1.6.5
>>> (current stable).
>>>
>>> I compiled OpenMPI 1.6.5 with included ROMIO by doing this:
>>>
>>> ./configure --prefix=/usr/local/openmpi_1.6.5_romio \
>>> --with-io-romio-flags='--with-pvfs2=/usr/local/orangefs'
>>> make -j 32
>>>
>>> and I got error message:
>>> (...)
>>> ad_pvfs2_io_dtype.c: In function 'convert_named':
>>> ad_pvfs2_io_dtype.c:581: error: switch quantity not an integer
>>> ad_pvfs2_io_dtype.c:583: error: pointers are not permitted as case values
>>> (...)
>>>
>>> I rewrote the "switch" construct as multiple "if" constructs. This
>>> solves the compile problem.
>>> But I can't use my own edited OpenMPI source code on our cluster.
>>> Is this a bug in the source code that will be fixed, or am I doing something wrong?
>>>
>>> Thanks for reply
>>> Hanousek Vít
>>>
>>>
>>> ___
>>> users mailing list
>>> us...@open-mpi.org
>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>
>>
>> -- 
>> Edgar Gabriel
>> Associate Professor
>> Parallel Software Technologies Lab  http://pstl.cs.uh.edu
>> Department of Computer Science  University of Houston
>> Philip G. Hoffman Hall, Room 524Houston, TX-77204, USA
>> Tel: +1 (713) 743-3857  Fax: +1 (713) 743-3335
>>
>> ___
>> users mailing list
>> us...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
> 
> 

-- 
Edgar Gabriel
Associate Professor
Parallel Software Technologies Lab  http://pstl.cs.uh.edu
Department of Computer Science  University of Houston
Philip G. Hoffman Hall, Room 524Houston, TX-77204, USA
Tel: +1 (713) 743-3857  Fax: +1 (713) 743-3335





Re: [OMPI users] OrangeFS ROMIO support

2014-02-25 Thread Edgar Gabriel
this was/is a bug in ROMIO, in which they assume a datatype is an int. I
fixed it originally in a previous version of Open MPI on the trunk, but
it did not get ported upstream, so we might have to do the same fix again.
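
To illustrate (a minimal sketch, not the actual ad_pvfs2_io_dtype.c source):
in MPICH-derived libraries MPI_Datatype is an int, so a switch over named
datatypes compiles, while in Open MPI the handles are pointers to structs,
which produces exactly the two compiler errors quoted below. Comparing the
handles with if/else works with both implementations:

#include <mpi.h>

/* minimal sketch only -- not the ROMIO code itself */
static int named_dtype_size(MPI_Datatype dtype)
{
    /* switch (dtype) { case MPI_INT: ... }  <-- "switch quantity not an
     * integer" / "pointers are not permitted as case values" in Open MPI */
    if (dtype == MPI_INT)         return (int) sizeof(int);
    else if (dtype == MPI_CHAR)   return (int) sizeof(char);
    else if (dtype == MPI_DOUBLE) return (int) sizeof(double);
    else                          return -1;  /* named type not handled here */
}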

Thanks
Edgar

On 2/25/2014 7:15 AM, vithanousek wrote:
> Hello,
> 
> At first, please excuse my poor level of English.
> 
> I'm a little bit confused by the versions of OpenMPI and ROMIO, because I
> have met similar bugs reported in multiple versions. I'm building version
> 1.6.5 (the current stable).
> 
> I compiled OpenMPI 1.6.5 with included ROMIO by doing this:
> 
> ./configure --prefix=/usr/local/openmpi_1.6.5_romio \
> --with-io-romio-flags='--with-pvfs2=/usr/local/orangefs'
> make -j 32
> 
> and I got error message:
> (...)
> ad_pvfs2_io_dtype.c: In function 'convert_named':
> ad_pvfs2_io_dtype.c:581: error: switch quantity not an integer
> ad_pvfs2_io_dtype.c:583: error: pointers are not permitted as case values
> (...)
> 
> I rewrote the "switch" construct as multiple "if" constructs. This
> solves the compile problem.
> But I can't use my own edited OpenMPI source code on our cluster.
> Is this a bug in the source code that will be fixed, or am I doing something wrong?
> 
> Thanks for reply
> Hanousek Vít
> 
> 
> _______
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
> 

-- 
Edgar Gabriel
Associate Professor
Parallel Software Technologies Lab  http://pstl.cs.uh.edu
Department of Computer Science  University of Houston
Philip G. Hoffman Hall, Room 524Houston, TX-77204, USA
Tel: +1 (713) 743-3857  Fax: +1 (713) 743-3335





Re: [OMPI users] opening a file with MPI-IO

2013-05-17 Thread Edgar Gabriel
wow, let's try again in English :-)

can you maybe detail more precisely what scenario you are particularly
worried about? I would think that the return code of the operation
reliably indicates whether opening the file was successful or not (i.e.
MPI_SUCCESS vs. anything else).
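
As a minimal sketch in C (the helper name try_open is my own): this relies
on the fact that, unlike for communicators, the default error handler for
file operations is MPI_ERRORS_RETURN, and it sets that handler explicitly
on MPI_FILE_NULL so the assumption is visible:

#include <mpi.h>
#include <stdio.h>

int try_open(const char *path, MPI_File *fh)
{
    char msg[MPI_MAX_ERROR_STRING];
    int  rc, len;

    MPI_File_set_errhandler(MPI_FILE_NULL, MPI_ERRORS_RETURN);
    rc = MPI_File_open(MPI_COMM_WORLD, path, MPI_MODE_RDONLY,
                       MPI_INFO_NULL, fh);
    if (rc != MPI_SUCCESS) {
        MPI_Error_string(rc, msg, &len);
        fprintf(stderr, "MPI_File_open(%s) failed: %s\n", path, msg);
    }
    return rc;   /* the caller decides how to recover */
}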

Edgar

On 5/17/2013 7:55 AM, Edgar Gabriel wrote:
> can you maybe detail more precisely what scenario you are particularly
> worried about? I would think that the return code of the operation
> should reliably indicate whether opening the file was successful or not (i.e.
> MPI_SUCCESS vs. anything else).
> 
> Edgar
> 
> On 5/17/2013 4:00 AM, Peter van Hoof wrote:
>> Dear users,
>>
>> I have been banging my head against the wall for some time to find a
>> reliable and portable way to determine if a call to MPI::File::Open()
>> was successful or not.
>>
>> Let me give some background information first. We develop an open-source
>> astrophysical modeling code called Cloudy. This is used by many
>> scientists on a variety of platforms. We obviously have no control over
>> the MPI version that is installed on that platform, it may not even be
>> open-MPI. So what we need is a method that is supported by all MPI distros.
>>
>> Our code is written in C++, so we use the C++ version of the MPI and
>> MPI-IO libraries.
>>
>> Any help would be greatly appreciated.
>>
>>
>> Cheers,
>>
>> Peter.
>>
> 
> 
> 
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
> 





Re: [OMPI users] opening a file with MPI-IO

2013-05-17 Thread Edgar Gabriel
can you maybe detail more precisely what scenario you are particularly
worried about? I would think that the return code of the operation
should reliably indicate whether opening the file was successful or not (i.e.
MPI_SUCCESS vs. anything else).

Edgar

On 5/17/2013 4:00 AM, Peter van Hoof wrote:
> Dear users,
> 
> I have been banging my head against the wall for some time to find a
> reliable and portable way to determine if a call to MPI::File::Open()
> was successful or not.
> 
> Let me give some background information first. We develop an open-source
> astrophysical modeling code called Cloudy. This is used by many
> scientists on a variety of platforms. We obviously have no control over
> the MPI version that is installed on that platform, it may not even be
> open-MPI. So what we need is a method that is supported by all MPI distros.
> 
> Our code is written in C++, so we use the C++ version of the MPI and
> MPI-IO libraries.
> 
> Any help would be greatly appreciated.
> 
> 
> Cheers,
> 
> Peter.
> 





Re: [OMPI users] problem with groups and communicators in openmpi-1.6.4rc2

2013-01-19 Thread Edgar Gabriel
roup_w_ntasks;
>>  for (i = 0; i < group_w_ntasks; ++i)
>>  {
>>  num_columns[i] = tmp;   /* number of columns*/
>>  }
>>  for (i = 0; i < (Q % group_w_ntasks); ++i)  /* adjust size  */
>>  {
>>  num_columns[i]++;
>>  }
>>  for (i = 0; i < group_w_ntasks; ++i)
>>  {
>>  /* nothing to do because "column_t" contains already all
>>   * elements of a column, i.e., the "size" is equal to the
>>   * number of columns in the block
>>   */
>>  sr_counts[i] = num_columns[i];  /* "size" of column-block   */
>>  }
>>  sr_disps[0] = 0;/* start of i-th column-block   
>> */
>>  for (i = 1; i < group_w_ntasks; ++i)
>>  {
>>  sr_disps[i] = sr_disps[i - 1] + sr_counts[i - 1];
>>  }
>>}
>>/* inform all processes about their column block sizes*/
>>MPI_Bcast (num_columns, group_w_ntasks, MPI_INT, 0, COMM_WORKER);
>>/* allocate memory for a column block and define a new derived
>> * data type for the column block. This data type is possibly
>> * different for different processes if the number of processes
>> * isn't a factor of the row size of the original matrix. Don't
>> * forget to resize the extent of the new data type in such a
>> * way that the extent of the whole column looks like just one
>> * element so that the next column starts in col_block[0][i]
>> * in MPI_Scatterv/MPI_Gatherv.
>> */
>>col_block = (double **) malloc (P * num_columns[group_w_mytid] *
>>  sizeof (double));
>>TestEqualsNULL (col_block);
>>MPI_Type_vector (P, 1, num_columns[group_w_mytid], MPI_DOUBLE,
>>   &tmp_column_t);
>>MPI_Type_create_resized (tmp_column_t, 0, sizeof (double),
>>   &col_block_t);
>>MPI_Type_commit (&col_block_t);
>>MPI_Type_free (&tmp_column_t);
>>/* send column block i of "matrix" to process i   */
>>MPI_Scatterv (matrix, sr_counts, sr_disps, column_t,
>>col_block, num_columns[group_w_mytid],
>>col_block_t, 0, COMM_WORKER);
>>/* Modify column elements. The compiler doesn't know the structure
>> * of the column block matrix so that you have to do the index
>> * calculations for mat[i][j] yourself. In C a matrix is stored
>> * row-by-row so that the i-th row starts at location "i * q" if
>> * the matrix has "q" columns. Therefore the address of mat[i][j]
>> * can be expressed as "(double *) mat + i * q + j" and mat[i][j]
>> * itself as "*((double *) mat + i * q + j)".
>> */
>>for (i = 0; i < P; ++i)
>>{
>>  for (j = 0; j < num_columns[group_w_mytid]; ++j)
>>  {
>>  if ((group_w_mytid % 2) == 0)
>>  {
>>/* col_block[i][j] *= col_block[i][j] */
>>
>>*((double *) col_block + i * num_columns[group_w_mytid] + j) *=
>>*((double *) col_block + i * num_columns[group_w_mytid] + j);
>>  }
>>  else
>>  {
>>/* col_block[i][j] *= FACTOR  */
>>
>>*((double *) col_block + i * num_columns[group_w_mytid] + j) *=
>>  FACTOR;
>>  }
>>  }
>>}
>>/* receive column-block i of "matrix" from process i  */
>>MPI_Gatherv (col_block, num_columns[group_w_mytid], col_block_t,
>>   matrix, sr_counts, sr_disps, column_t,
>>   0, COMM_WORKER);
>>if (group_w_mytid == 0)
>>{
>>  printf ("\n\nresult matrix:\n"
>>"  elements are sqared in columns:\n  ");
>>  tmp  = 0;
>>  tmp1 = 0;
>>  for (i = 0; i < group_w_ntasks; ++i)
>>  {
>>  tmp1 = tmp1 + num_columns[i];
>>  if ((i % 2) == 0)
>>  {
>>for (j = tmp; j < tmp1; ++j)
>>{
>>  printf ("%4d", j);
>>}
>>  }
>>  tmp = tmp1;
>>  }
>>  printf ("\n  elements are multiplied with %d in columns:\n  ",
>>FACTOR);
>>  tmp  = 0;
>>  tmp1 = 0;
>>  for (i = 0; i < group_w_ntasks; ++i)
>>  {
>>  tmp1 = tmp1 + num_columns[i];
>>  if ((i % 2) != 0)
>>  {
>>for (j = tmp; j < tmp1; ++j)
>>{
>>  printf ("%4d", j);
>>}
>>  }
>>  tmp = tmp1;
>>  }
>>  printf ("\n\n\n");
>>  print_matrix (P, Q, (double **) matrix);
>>  free (sr_counts);
>>  free (sr_disps);
>>}
>>free (num_columns);
>>free (col_block);
>>MPI_Type_free (&column_t);
>>MPI_Type_free (&col_block_t);
>>MPI_Comm_free (&COMM_WORKER);
>>  }
>>
>>
>>  /* =
>>   * ==  This is the group "group_other". ==
>>   * =
>>   */
>>  MPI_Group_rank (group_other, &group_o_mytid);
>>  if (group_o_mytid != MPI_UNDEFINED)
>>  {
>>/* Nothing to do (only to demonstrate how to divide work for
>> * different groups).
>> */
>>MPI_Comm_size (COMM_OTHER, &group_o_ntasks);
>>if (group_o_mytid == 0)
>>{
>>  if (group_o_ntasks == 1)
>>  {
>>  printf ("\nGroup \"group_other\" contains %d process "
>>  "which has\n"
>>  "nothing to do.\n\n", group_o_ntasks);
>>  }
>>  else
>>  {
>>  printf ("\nGroup \"group_other\" contains %d processes "
>>  "which have\n"
>>  "nothing to do.\n\n", group_o_ntasks);
>>  }
>>}
>>MPI_Comm_free (&COMM_OTHER);
>>  }
>>
>>
>>  /* =
>>   * ==  all groups will reach this point ==
>>   * =
>>   */
>>  MPI_Group_free (&group_worker);
>>  MPI_Group_free (&group_other);
>>  MPI_Finalize ();
>>  return EXIT_SUCCESS;
>> }
>>
>>
>> /* Print the values of an arbitrary 2D-matrix of "double" values. The
>> * compiler doesn't know the structure of the matrix so that you have
>> * to do the index calculations for mat[i][j] yourself. In C a matrix
>> * is stored row-by-row so that the i-th row starts at location "i * q"
>> * if the matrix has "q" columns. Therefore the address of mat[i][j]
>> * can be expressed as "(double *) mat + i * q + j" and mat[i][j]
>> * itself as "*((double *) mat + i * q + j)".
>> *
>> * input parameters:  p   number of rows
>> *q   number of columns
>> *mat 2D-matrix of "double" values
>> * output parameters: none
>> * return value:  none
>> * side effects:  none
>> *
>> */
>> void print_matrix (int p, int q, double **mat)
>> {
>>  int i, j;   /* loop variables   */
>>
>>  for (i = 0; i < p; ++i)
>>  {
>>for (j = 0; j < q; ++j)
>>{
>>  printf ("%6g", *((double *) mat + i * q + j));
>>}
>>printf ("\n");
>>  }
>>  printf ("\n");
>> }
>> ___
>> users mailing list
>> us...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
> 
> 
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
> 

-- 
Edgar Gabriel
Associate Professor
Parallel Software Technologies Lab  http://pstl.cs.uh.edu
Department of Computer Science  University of Houston
Philip G. Hoffman Hall, Room 524Houston, TX-77204, USA
Tel: +1 (713) 743-3857  Fax: +1 (713) 743-3335





Re: [OMPI users] MPI-IO puzzlement

2012-05-10 Thread Edgar Gabriel
what file system is this on?

On 5/10/2012 12:37 PM, Ricardo Reis wrote:
> 
> what is the communicator that you used to open the file? I am wondering
> whether it differs from the communicator used in MPI_Barrier, and some
> processes do not enter the Barrier at all...
> 
> Thanks
> Edgar
> 
> 
> world, I only use one comm on this code.
> 
> world = MPI_COMM_WORLD
> 
> 
> 
> 
>CALL MPI_file_open(world, TRIM(flname), &
>  amode, MPI_INFO_NULL, fh, ierr)
> 
> 
> I have run this code in other machines, with other configs and got no
> problem. I'm running with all debug flags turned on but it just hangs
> there without any feedback to me. I wonder if there is something I could
> do to have some feedback.
> 
> 
> 
> 
> 
>  Ricardo Reis
> 
>  'Non Serviam'
> 
>  PhD/MSc Mechanical Engineering | Lic. Aerospace Engineering
> 
>  Computational Fluid Dynamics, High Performance Computing, Turbulence
>  http://www.lasef.ist.utl.pt
> 
>  Cultural Instigator @ Rádio Zero
>  http://www.radiozero.pt
> 
>  http://www.flickr.com/photos/rreis/
> 
>  contacts:  gtalk: kyriu...@gmail.com  skype: kyriusan
> 
>  Institutional Address:
> 
>  Ricardo J.N. dos Reis
>  IDMEC, Instituto Superior Técnico, Technical University of Lisbon
>  Av. Rovisco Pais
>  1049-001 Lisboa
>  Portugal
> 
>   - email sent with alpine 2.00 -
> 
> 
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users





Re: [OMPI users] MPI-IO puzzlement

2012-05-10 Thread Edgar Gabriel
what is the communicator that you used to open the file? I am wondering
whether it differs from the communicator used in MPI_Barrier, and some
processes do not enter the Barrier at all...
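
(For reference, the rule at play, as a generic sketch rather than your
code: a collective MPI-IO call such as MPI_File_write_at_all has to be
entered by every process of the communicator that was passed to
MPI_File_open, even by ranks that have nothing to write; if some ranks
never reach the call, the others hang in it.)

#include <mpi.h>

void write_block(MPI_Comm comm, const char *path,
                 const double *buf, int count, MPI_Offset offset)
{
    MPI_File fh;

    MPI_File_open(comm, path, MPI_MODE_CREATE | MPI_MODE_WRONLY,
                  MPI_INFO_NULL, &fh);

    /* every rank of "comm" must reach this call, even with count == 0 */
    MPI_File_write_at_all(fh, offset, buf, count, MPI_DOUBLE,
                          MPI_STATUS_IGNORE);

    MPI_File_close(&fh);
}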

Thanks
Edgar

On 5/10/2012 12:22 PM, Ricardo Reis wrote:
> 
>  Hi all
> 
>  I'm trying to run my code in a cluster here with infiniband. It is in
> Fortran 95/2003 and uses MPI-IO for output. I'm using openmpi 1.5.5. It
> has been running fine, but for a particular configuration, using all
> of the cluster cores (128, divided in 4 boxes with 4 Octo-core Opterons
> each), it hangs while calling MPI-IO.
> 
>  So what I am asking is help in debugging this. This is the relevant
> part of the code
> 
> CALL MPI_File_set_view(fh, disp, etype, filetype, &
>  TRIM(datarep),  MPI_INFO_NULL, ierr)
> 
> IF(DEBGON)THEN
>IF(master)THEN
>   WRITE(logfl,'(/,"DBG: WriteMPI_IO going to write file.")')
>   FLUSH(logfl)
>ENDIF
>CALL MPI_Barrier(world, ierr)
> ENDIF
> 
> CALL MPI_file_write_at_all(fh, offset, arr, dim, &
>  etype, status, ierr)
> 
> 
> 
>  And it hangs just after the flush, so apparently in the
> MPI_write_at_all call.
> 
>  Any ideas of what to do or where to look are welcomed.
> 
>  best,
> 
> 
>  Ricardo Reis
> 
>  'Non Serviam'
> 
>  PhD/MSc Mechanical Engineering | Lic. Aerospace Engineering
> 
>  Computational Fluid Dynamics, High Performance Computing, Turbulence
>  http://www.lasef.ist.utl.pt
> 
>  Cultural Instigator @ Rádio Zero
>  http://www.radiozero.pt
> 
>  http://www.flickr.com/photos/rreis/
> 
>  contacts:  gtalk: kyriu...@gmail.com  skype: kyriusan
> 
>  Institutional Address:
> 
>  Ricardo J.N. dos Reis
>  IDMEC, Instituto Superior Técnico, Technical University of Lisbon
>  Av. Rovisco Pais
>  1049-001 Lisboa
>  Portugal
> 
>   - email sent with alpine 2.00 -
> 
> 
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users






Re: [OMPI users] Problem with MPI_Barrier (Inter-communicator)

2012-04-05 Thread Edgar Gabriel
so just to confirm, I ran our test suite for inter-communicator
collective operations and communicator duplication, and everything still
works. Specifically, comm_dup on an intercommunicator is not
fundamentally broken; it worked in my tests.

Having your code, so I can see what it precisely does, would help me
hunt the problem down, since I am otherwise not able to reproduce the
problem.

Also, which version of Open MPI did you use?

Thanks
Edgar

On 4/4/2012 3:09 PM, Thatyene Louise Alves de Souza Ramos wrote:
> Hi Edgar, thank you for the response.
> 
> Unfortunately, I've tried with and without this option. In both the
> result was the same... =(
> 
> On Wed, Apr 4, 2012 at 5:04 PM, Edgar Gabriel <gabr...@cs.uh.edu
> <mailto:gabr...@cs.uh.edu>> wrote:
> 
> did you try to start the program with the --mca coll ^inter switch that
> I mentioned? Collective dup for intercommunicators should work; it's
> probably again the bcast over a communicator of size 1 that is causing
> the hang, and you could avoid it with the flag that I mentioned above.
> 
> Also, if you could attach your test code, that would help in hunting
> things down.
> 
> Thanks
> Edgar
> 
> On 4/4/2012 2:18 PM, Thatyene Louise Alves de Souza Ramos wrote:
> > Hi there.
> >
> > I've made some tests related to the problem reported by Rodrigo. And I
> > think (I'd rather be wrong) that collective calls like Create and Dup
> > do not work with inter-communicators. I've tried this in the client
> > group:
> >
> > MPI::Intercomm tmp_inter_comm;
> >
> > tmp_inter_comm = server_comm.Create (server_comm.Get_group().Excl(1,
> > &rank));
> >
> > if(server_comm.Get_rank() != rank)
> > server_comm = tmp_inter_comm.Dup();
> > else
> > server_comm = MPI::COMM_NULL;
> >
> > The server_comm is the original inter communicator with the server
> group.
> >
> > I've noticed that the program hangs in the Dup call. It seems that the
> > tmp_inter_comm created without one process still has this process,
> > because the other processes are waiting for it to call the Dup too.
> >
> > What do you think?
> >
> > On Wed, Mar 28, 2012 at 6:03 PM, Edgar Gabriel <gabr...@cs.uh.edu
> <mailto:gabr...@cs.uh.edu>
> > <mailto:gabr...@cs.uh.edu <mailto:gabr...@cs.uh.edu>>> wrote:
> >
> > it just uses a different algorithm which avoids the bcast on a
> > communicator of 1 (which is causing the problem here).
> >
> > Thanks
> > Edgar
> >
> > On 3/28/2012 12:08 PM, Rodrigo Oliveira wrote:
> > > Hi Edgar,
> > >
> > > I tested the execution of my code using the option -mca coll
> ^inter as
> > > you suggested and the program worked fine, even when I use 1
> server
> > > instance.
> > >
> > > What is the modification caused by this parameter? I did not
> find an
> > > explanation about the utilization of the module coll inter.
> > >
> > > Thanks a lot for your attention and for the solution.
> > >
> > > Best regards,
> > >
> > > Rodrigo Oliveira
> > >
> > > On Tue, Mar 27, 2012 at 1:10 PM, Rodrigo Oliveira
> > > <rsilva.olive...@gmail.com
> <mailto:rsilva.olive...@gmail.com> <mailto:rsilva.olive...@gmail.com
> <mailto:rsilva.olive...@gmail.com>>
> > <mailto:rsilva.olive...@gmail.com
> <mailto:rsilva.olive...@gmail.com>
> > <mailto:rsilva.olive...@gmail.com
> <mailto:rsilva.olive...@gmail.com>>>> wrote:
> > >
> > >
> > > Hi Edgar.
>     > >
> > > Thanks for the response. I just did not understand why
> the Barrier
> > > works before I remove one of the client processes.
> > >
> > > I tryed it with 1 server and 3 clients and it worked
> properly.
> > After
> > > I removed 1 of the clients, it stops working. So, the
> removal is
> > > affecting the functionality of Barrier, I guess.
> > >
> > > Anyone has an idea?
> > >
> > >
>

Re: [OMPI users] Problem with MPI_Barrier (Inter-communicator)

2012-04-05 Thread Edgar Gabriel
can you please send me your testcode?
Thanks
Edgar

On 4/4/2012 3:09 PM, Thatyene Louise Alves de Souza Ramos wrote:
> Hi Edgar, thank you for the response.
> 
> Unfortunately, I've tried with and without this option. In both the
> result was the same... =(
> 
> On Wed, Apr 4, 2012 at 5:04 PM, Edgar Gabriel <gabr...@cs.uh.edu
> <mailto:gabr...@cs.uh.edu>> wrote:
> 
> did you try to start the program with the --mca coll ^inter switch that
> I mentioned? Collective dup for intercommunicators should work; it's
> probably again the bcast over a communicator of size 1 that is causing
> the hang, and you could avoid it with the flag that I mentioned above.
> 
> Also, if you could attach your test code, that would help in hunting
> things down.
> 
> Thanks
> Edgar
> 
> On 4/4/2012 2:18 PM, Thatyene Louise Alves de Souza Ramos wrote:
> > Hi there.
> >
> > I've made some tests related to the problem reported by Rodrigo. And I
> > think (I'd rather be wrong) that collective calls like Create and Dup
> > do not work with inter-communicators. I've tried this in the client
> > group:
> >
> > MPI::Intercomm tmp_inter_comm;
> >
> > tmp_inter_comm = server_comm.Create (server_comm.Get_group().Excl(1,
> > &rank));
> >
> > if(server_comm.Get_rank() != rank)
> > server_comm = tmp_inter_comm.Dup();
> > else
> > server_comm = MPI::COMM_NULL;
> >
> > The server_comm is the original inter communicator with the server
> group.
> >
> > I've noticed that the program hangs in the Dup call. It seems that the
> > tmp_inter_comm created without one process still has this process,
> > because the other processes are waiting for it to call the Dup too.
> >
> > What do you think?
> >
> > On Wed, Mar 28, 2012 at 6:03 PM, Edgar Gabriel <gabr...@cs.uh.edu
> <mailto:gabr...@cs.uh.edu>
> > <mailto:gabr...@cs.uh.edu <mailto:gabr...@cs.uh.edu>>> wrote:
> >
> > it just uses a different algorithm which avoids the bcast on a
> > communicator of 1 (which is causing the problem here).
> >
> > Thanks
> > Edgar
> >
> > On 3/28/2012 12:08 PM, Rodrigo Oliveira wrote:
> > > Hi Edgar,
> > >
> > > I tested the execution of my code using the option -mca coll
> ^inter as
> > > you suggested and the program worked fine, even when I use 1
> server
> > > instance.
> > >
> > > What is the modification caused by this parameter? I did not
> find an
> > > explanation about the utilization of the module coll inter.
> > >
> > > Thanks a lot for your attention and for the solution.
> > >
> > > Best regards,
> > >
> > > Rodrigo Oliveira
> > >
> > > On Tue, Mar 27, 2012 at 1:10 PM, Rodrigo Oliveira
> > > <rsilva.olive...@gmail.com
> <mailto:rsilva.olive...@gmail.com> <mailto:rsilva.olive...@gmail.com
> <mailto:rsilva.olive...@gmail.com>>
> > <mailto:rsilva.olive...@gmail.com
> <mailto:rsilva.olive...@gmail.com>
> > <mailto:rsilva.olive...@gmail.com
> <mailto:rsilva.olive...@gmail.com>>>> wrote:
> > >
> > >
> > > Hi Edgar.
>     > >
> > > Thanks for the response. I just did not understand why
> the Barrier
> > > works before I remove one of the client processes.
> > >
> > > I tryed it with 1 server and 3 clients and it worked
> properly.
> > After
> > > I removed 1 of the clients, it stops working. So, the
> removal is
> > > affecting the functionality of Barrier, I guess.
> > >
> > > Anyone has an idea?
> > >
> > >
> > > On Mon, Mar 26, 2012 at 12:34 PM, Edgar Gabriel
> > <gabr...@cs.uh.edu <mailto:gabr...@cs.uh.edu>
> <mailto:gabr...@cs.uh.edu <mailto:gabr...@cs.uh.edu>>
> > > <mailto:gabr...@cs.uh.edu <mailto:gabr...@cs.uh.edu>
> <mailto:gabr...@cs.uh.edu <mailto:gabr...@cs.uh.edu>>&g

Re: [OMPI users] Problem with MPI_Barrier (Inter-communicator)

2012-04-04 Thread Edgar Gabriel
did you try to start the program with the --mca coll ^inter switch that
I mentioned? Collective dup for intercommunicators should work; it's
probably again the bcast over a communicator of size 1 that is causing
the hang, and you could avoid it with the flag that I mentioned above.

Also, if you could attach your test code, that would help in hunting
things down.

Thanks
Edgar

On 4/4/2012 2:18 PM, Thatyene Louise Alves de Souza Ramos wrote:
> Hi there.
> 
> I've made some tests related to the problem reported by Rodrigo. And I
> think (I'd rather be wrong) that collective calls like Create and Dup
> do not work with inter-communicators. I've tried this in the client group:
> 
> MPI::Intercomm tmp_inter_comm;
>
> tmp_inter_comm = server_comm.Create (server_comm.Get_group().Excl(1,
> &rank));
>
> if(server_comm.Get_rank() != rank)
> server_comm = tmp_inter_comm.Dup();
> else
> server_comm = MPI::COMM_NULL;
>
> The server_comm is the original inter communicator with the server group.
> 
> I've noticed that the program hangs in the Dup call. It seems that the
> tmp_inter_comm created without one process still has this process,
> because the other processes are waiting for it to call the Dup too.
> 
> What do you think?
> 
> On Wed, Mar 28, 2012 at 6:03 PM, Edgar Gabriel <gabr...@cs.uh.edu
> <mailto:gabr...@cs.uh.edu>> wrote:
> 
> it just uses a different algorithm which avoids the bcast on a
> communicator of 1 (which is causing the problem here).
> 
> Thanks
> Edgar
> 
> On 3/28/2012 12:08 PM, Rodrigo Oliveira wrote:
> > Hi Edgar,
> >
> > I tested the execution of my code using the option -mca coll ^inter as
> > you suggested and the program worked fine, even when I use 1 server
> > instance.
> >
> > What is the modification caused by this parameter? I did not find an
> > explanation about the utilization of the module coll inter.
> >
> > Thanks a lot for your attention and for the solution.
> >
> > Best regards,
> >
> > Rodrigo Oliveira
> >
> > On Tue, Mar 27, 2012 at 1:10 PM, Rodrigo Oliveira
> > <rsilva.olive...@gmail.com <mailto:rsilva.olive...@gmail.com>
> <mailto:rsilva.olive...@gmail.com
> <mailto:rsilva.olive...@gmail.com>>> wrote:
> >
> >
> > Hi Edgar.
> >
> > Thanks for the response. I just did not understand why the Barrier
> > works before I remove one of the client processes.
> >
> > I tried it with 1 server and 3 clients and it worked properly.
> After
> > I removed 1 of the clients, it stops working. So, the removal is
> > affecting the functionality of Barrier, I guess.
> >
> > Anyone has an idea?
> >
> >
> > On Mon, Mar 26, 2012 at 12:34 PM, Edgar Gabriel
> <gabr...@cs.uh.edu <mailto:gabr...@cs.uh.edu>
> > <mailto:gabr...@cs.uh.edu <mailto:gabr...@cs.uh.edu>>> wrote:
> >
> > I do not recall on what the agreement was on how to treat
> the size=1
> >
> >
> >
> >
> >
> > ___
> > users mailing list
> > us...@open-mpi.org <mailto:us...@open-mpi.org>
> > http://www.open-mpi.org/mailman/listinfo.cgi/users
> 
> 
> ___
> users mailing list
> us...@open-mpi.org <mailto:us...@open-mpi.org>
> http://www.open-mpi.org/mailman/listinfo.cgi/users
> 
> 
> 
> 
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users

-- 
Edgar Gabriel
Associate Professor
Parallel Software Technologies Lab  http://pstl.cs.uh.edu
Department of Computer Science  University of Houston
Philip G. Hoffman Hall, Room 524Houston, TX-77204, USA
Tel: +1 (713) 743-3857  Fax: +1 (713) 743-3335





Re: [OMPI users] Problem with MPI_Barrier (Inter-communicator)

2012-03-28 Thread Edgar Gabriel
it just uses a different algorithm which avoids the bcast on a
communicator of 1 (which is causing the problem here).

Thanks
Edgar

On 3/28/2012 12:08 PM, Rodrigo Oliveira wrote:
> Hi Edgar,
> 
> I tested the execution of my code using the option -mca coll ^inter as
> you suggested and the program worked fine, even when I use 1 server
> instance.
> 
> What is the modification caused by this parameter? I did not find an
> explanation about the utilization of the module coll inter.
> 
> Thanks a lot for your attention and for the solution.
> 
> Best regards,
> 
> Rodrigo Oliveira
> 
> On Tue, Mar 27, 2012 at 1:10 PM, Rodrigo Oliveira
> <rsilva.olive...@gmail.com <mailto:rsilva.olive...@gmail.com>> wrote:
> 
> 
> Hi Edgar.
> 
> Thanks for the response. I just did not understand why the Barrier
> works before I remove one of the client processes.
> 
> I tried it with 1 server and 3 clients and it worked properly. After
> I removed 1 of the clients, it stops working. So, the removal is
> affecting the functionality of Barrier, I guess.
> 
> Anyone has an idea?
> 
> 
> On Mon, Mar 26, 2012 at 12:34 PM, Edgar Gabriel <gabr...@cs.uh.edu
> <mailto:gabr...@cs.uh.edu>> wrote:
> 
> I do not recall on what the agreement was on how to treat the size=1
> 
> 
> 
> 
> 
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users





Re: [OMPI users] Problem with MPI_Barrier (Inter-communicator)

2012-03-26 Thread Edgar Gabriel
yes and no. So first, here is a quick fix for you: if you start the
server using

mpirun -np 2 -mca coll ^inter ./server

your test code finishes (with one minor modification to your code,
namely that the process being excluded on the client side needs a
condition to leave the while loop as well).

That being said, here is what the problem seems to be when using the
inter communicator module. The inter-comm barrier is handled initially
by the basic module, and is implemented by calling an allreduce
operation. The inter-communicator allreduce by default uses the
implementation in the inter module: a sequence of an intra-reduce on the
local communicator, a point-to-point exchange of the results of the two
local groups by the local root processes (rank zero in the local groups
of the intercomm), and a broadcast of the results on the local group.
And it is this very last step where we are hanging.
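
In rough code, that sequence looks like the following sketch (illustrative
only, not the actual Open MPI inter module; it handles MPI_INT/MPI_SUM only
and takes the local intra-communicator as an explicit argument, which the
coll module has available internally):

#include <mpi.h>
#include <stdlib.h>

int sketch_inter_allreduce_int(int *sbuf, int *rbuf, int count,
                               MPI_Comm localcomm, MPI_Comm intercomm)
{
    int lrank, rc;
    int *partial = (int *) malloc((size_t) count * sizeof(int));

    /* (1) intra-reduce onto the local root (rank 0 of the local group) */
    MPI_Reduce(sbuf, partial, count, MPI_INT, MPI_SUM, 0, localcomm);

    MPI_Comm_rank(localcomm, &lrank);
    if (lrank == 0) {
        /* (2) the two local roots swap their partial results; on an
         * inter-communicator, rank 0 here addresses rank 0 of the
         * remote group */
        MPI_Sendrecv(partial, count, MPI_INT, 0, 0,
                     rbuf,    count, MPI_INT, 0, 0,
                     intercomm, MPI_STATUS_IGNORE);
    }

    /* (3) broadcast the remote group's result within the local group;
     * this is the step reported to hang when the local group has size 1 */
    rc = MPI_Bcast(rbuf, count, MPI_INT, 0, localcomm);

    free(partial);
    return rc;
}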

So bottom line, the intra-communicator broadcast for a communicator size
of 1 is hanging, as far as I can see independent of whether we use tuned
or basic.

I do not recall what the agreement was on how to treat the size=1
scenarios in coll. Looking at the routine in tuned (e.g.
ompi_coll_tuned_bcast_intra_generic), there is a statement which clearly
indicates that it should not be used for 1 proc:

assert(size>1)

but I do not recall in which module, or what the agreement was on how
that was supposed to be treated correctly. I am also not sure why the
bcast on 1 proc works on the server side but does not on the client
side. That's where I stand right now in the analysis.


Thanks
Edgar

On 3/26/2012 8:39 AM, Rodrigo Oliveira wrote:
> Hi Edgar, 
> 
> Did you take a look at my code? Any idea about what is happening? I did
> a lot of tests and it does not work.
> 
> Thanks
> 
> On Tue, Mar 20, 2012 at 3:43 PM, Rodrigo Oliveira
> <rsilva.olive...@gmail.com <mailto:rsilva.olive...@gmail.com>> wrote:
> 
> The command I use to compile and run is:
> 
> mpic++ server.cc -o server && mpic++ client.cc -o client && mpirun
> -np 1 ./server
> 
> Rodrigo
> 
> 
> On Tue, Mar 20, 2012 at 3:40 PM, Rodrigo Oliveira
> <rsilva.olive...@gmail.com <mailto:rsilva.olive...@gmail.com>> wrote:
> 
> Hi Edgar.
> 
> Thanks for the response. The simplified code is attached:
> server, client and a .h containing some constants. I put some
> "prints" to show the behavior.
> 
> Regards
> 
> Rodrigo
> 
> 
> On Tue, Mar 20, 2012 at 11:47 AM, Edgar Gabriel
> <gabr...@cs.uh.edu <mailto:gabr...@cs.uh.edu>> wrote:
> 
> do you have by any chance the actual or a small reproducer?
> It might be
> much easier to hunt the problem down...
> 
> Thanks
> Edgar
> 
> On 3/19/2012 8:12 PM, Rodrigo Oliveira wrote:
> > Hi there.
> >
> > I am facing a very strange problem when using MPI_Barrier
> over an
> > inter-communicator after some operations I describe below:
> >
> > 1) I start a server calling mpirun.
> > 2) The server spawns 2 copies of a client using
> MPI_Comm_spawn, creating
> > an inter-communicator between the two groups. The server
> group with 1
> > process (lets name it as A) and the client group with 2
> processes (group B).
> > 3) After that, I need to detach one of the processes (rank
> 0) in group B
> > from the inter-communicator AB. To do that I do the
> following steps:
> >
> > Server side:
> > .
> > tmp_inter_comm = client_comm.Create (
> client_comm.Get_group ( ) );
> > client_comm.Free ( );
> > client_comm = tmp_inter_comm;
> > .
> > client_comm.Barrier();
> > .
> >
> > Client side:
> > 
> > rank = 0;
> > tmp_inter_comm = server_comm.Create (
> server_comm.Get_group (
> > ).Excl ( 1,  ) );
> > server_comm.Free ( );
> > server_comm = tmp_inter_comm;
> > .
> > if (server_comm != MPI::COMM_NULL)
> > server_comm.Barrier();
> >
> >
> > The problem: eve

Re: [OMPI users] Problem with MPI_Barrier (Inter-communicator)

2012-03-20 Thread Edgar Gabriel
do you by any chance have the actual code or a small reproducer? It
might be much easier to hunt the problem down...

Thanks
Edgar

On 3/19/2012 8:12 PM, Rodrigo Oliveira wrote:
> Hi there.
> 
> I am facing a very strange problem when using MPI_Barrier over an
> inter-communicator after some operations I describe below:
> 
> 1) I start a server calling mpirun.
> 2) The server spawns 2 copies of a client using MPI_Comm_spawn, creating
> an inter-communicator between the two groups. The server group with 1
> process (lets name it as A) and the client group with 2 processes (group B).
> 3) After that, I need to detach one of the processes (rank 0) in group B
> from the inter-communicator AB. To do that I do the following steps:
> 
> Server side:
> .
> tmp_inter_comm = client_comm.Create ( client_comm.Get_group ( ) );
> client_comm.Free ( );
> client_comm = tmp_inter_comm;
> .
> client_comm.Barrier();
> .
> 
> Client side:
> 
> rank = 0;
> tmp_inter_comm = server_comm.Create ( server_comm.Get_group (
> ).Excl ( 1, &rank ) );
> server_comm.Free ( );
> server_comm = tmp_inter_comm;
> .
> if (server_comm != MPI::COMM_NULL)
> server_comm.Barrier();
> 
> 
> The problem: everything works fine until the call to Barrier. At that
> point, the server exits the barrier, but the client in group B does
> not. Observe that we have only one process inside B, because I used Excl
> to remove one process from the original group.
> 
> p.s.: This occurs in the version 1.5.4 and the C++ API.
> 
> I am very concerned about this problem because this solution plays a
> very important role in my master thesis.
> 
> Is this an ompi problem or am I doing something wrong?
> 
> Thanks in advance
> 
> Rodrigo Oliveira
> 
> 
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users





Re: [OMPI users] Want to find LogGP parameters. Please help

2011-10-27 Thread Edgar Gabriel
you can have a look at the Netgauge tool by Torsten Hoefler; it can
report the LogGP parameters.

http://unixer.de/research/netgauge/
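
(If a rough first-order number is enough before running a full LogGP fit, a
simple ping-pong like the sketch below estimates the one-way latency L for
small messages; Netgauge measures L, o and G far more carefully. Run it
with exactly two processes.)

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    const int reps = 1000;
    char byte = 0;
    int rank, i;
    double t0, t1;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    MPI_Barrier(MPI_COMM_WORLD);
    t0 = MPI_Wtime();
    for (i = 0; i < reps; ++i) {
        if (rank == 0) {
            MPI_Send(&byte, 1, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(&byte, 1, MPI_CHAR, 1, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
        } else if (rank == 1) {
            MPI_Recv(&byte, 1, MPI_CHAR, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            MPI_Send(&byte, 1, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
        }
    }
    t1 = MPI_Wtime();

    if (rank == 0)   /* half the average round-trip time, in microseconds */
        printf("estimated one-way latency: %g us\n",
               (t1 - t0) / (2.0 * reps) * 1e6);

    MPI_Finalize();
    return 0;
}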

Thanks
Edgar

On 10/26/2011 11:48 AM, Mudassar Majeed wrote:
> Dear MPI people,
> I want to use the LogGP model with MPI to
> find out how much time a message of K bytes will take. For this, I need to
> find the latency L, overhead o and gap G. Can somebody tell me how I can
> measure these three parameters of the underlying network, and how often
> I should measure these parameters so that the prediction of the time for
> sending a message of K bytes remains accurate?
> 
> regards,
> Mudassar
> 





Re: [OMPI users] Problem with MPI_Intercomm_create

2011-06-07 Thread Edgar Gabriel


On 6/7/2011 10:23 AM, George Bosilca wrote:
> 
> On Jun 7, 2011, at 11:00 , Edgar Gabriel wrote:
> 
>> George,
>> 
>> I did not look over all the details of your test, but it looks to
>> me like you are violating one of the requirements of
>> intercomm_create namely the request that the two groups have to be
>> disjoint. In your case the parent process(es) are part of both
>> local intra-communicators, isn't it?
> 
> The two groups of the two local communicators are disjoints. One
> contains A,B while the other only C. The bridge communicator contains
> A,C.
> 
> I'm confident my example is supposed to work. At least for Open MPI
> the error is under the hood, as the resulting inter-communicator is
> valid but contains NULL endpoints for the remote process.

I'll come back to that later; I am not yet convinced that your code is
correct :-) Your local groups might be disjoint, but I am worried about
the ranks of the remote leader in your example. They cannot be 0 from
both groups' perspectives.

> 
> Regarding the fact that the two leader should be separate processes,
> you will not find any wording about this in the current version of
> the standard. In the 1.1 there were two opposite sentences about this
> one stating that the two groups can be disjoint, while the other
> claiming that the two leaders can be the same process. After
> discussion, the agreement was that the two groups have to be
> disjoint, and the standard has been amended to match the agreement.


I realized that this is a non-issue. If the two local groups are
disjoint, there is no way that the two local leaders are the same process.

Thanks
Edgar

> 
> george.
> 
> 
>> 
>> I just have MPI-1.1. at hand right now, but here is what it says: 
>> 
>> 
>> Overlap of local and remote groups that are bound into an 
>> inter-communicator is prohibited. If there is overlap, then the
>> program is erroneous and is likely to deadlock.
>> 
>>  so bottom line is that the two local intra-communicators that
>> are being used have to be disjoint, and the bridgecomm needs to be
>> a communicator where at least one process of each of the two
>> disjoint groups need to be able to talk to each other.
>> Interestingly I did not find a sentence whether it is allowed to be
>> the same process, or whether the two local leaders need to be
>> separate processes...
>> 
>> 
>> Thanks Edgar
>> 
>> 
>> On 6/7/2011 12:57 AM, George Bosilca wrote:
>>> Frederic,
>>> 
>>> Attached you will find an example that is supposed to work. The
>>> main difference with your code is on T3, T4 where you have
>>> inversed the local and remote comm. As depicted on the picture
>>> attached below, during the 3th step you will create the intercomm
>>> between ab and c (no overlap) using ac as a bridge communicator
>>> (here the two roots, a and c, can exchange messages).
>>> 
>>> Based on the MPI 2.2 standard, especially on the paragraph in
>>> PS:, the attached code should have been working. Unfortunately, I
>>> couldn't run it successfully neither with Open MPI trunk nor
>>> MPICH2 1.4rc1.
>>> 
>>> george.
>>> 
>>> PS: Here is what the MPI standard states about the
>>> MPI_Intercomm_create:
>>>> The function MPI_INTERCOMM_CREATE can be used to create an
>>>> inter-communicator from two existing intra-communicators, in
>>>> the following situation: At least one selected member from each
>>>> group (the “group leader”) has the ability to communicate with
>>>> the selected member from the other group; that is, a “peer”
>>>> communicator exists to which both leaders belong, and each
>>>> leader knows the rank of the other leader in this peer
>>>> communicator. Furthermore, members of each group know the rank
>>>> of their leader.
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> 
>>> On Jun 1, 2011, at 05:00 , Frédéric Feyel wrote:
>>> 
>>>> Hello,
>>>> 
>>>> I have a problem using MPI_Intercomm_create.
>>>> 
>>>> I 5 tasks, let's say T0, T1, T2, T3, T4 resulting from two
>>>> spawn operations by T0.
>>>> 
>>>> So I have two intra-communicator :
>>>> 
>>>> intra0 contains : T0, T1, T2 intra1 contains : T0, T3, T4
>>>> 
>>>> my goal is to make a collective loop to build a single
>>>> intra-communicato

Re: [OMPI users] Problem with MPI_Intercomm_create

2011-06-07 Thread Edgar Gabriel
George,

I did not look over all the details of your test, but it looks to me
like you are violating one of the requirements of intercomm_create,
namely the requirement that the two groups have to be disjoint. In your
case the parent process(es) are part of both local intra-communicators,
aren't they?

I just have MPI-1.1. at hand right now, but here is what it says:


Overlap of local and remote groups that are bound into an
inter-communicator is prohibited. If there is overlap, then the program
is erroneous and is likely to deadlock.


so the bottom line is that the two local intra-communicators that are
being used have to be disjoint, and the bridge comm needs to be a
communicator in which at least one process of each of the two disjoint
groups is able to talk to the other. Interestingly, I did not find a
sentence on whether it is allowed to be the same process, or whether the
two local leaders need to be separate processes...
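
A minimal sketch of a setup that satisfies these rules (my own example, not
the code discussed in this thread): split MPI_COMM_WORLD into two disjoint
halves and use MPI_COMM_WORLD itself as the peer/bridge communicator, so
each local leader can name the other leader by its rank in the bridge.
Assumes at least two processes:

#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_Comm local, inter;
    int wrank, wsize, color, remote_leader;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &wrank);
    MPI_Comm_size(MPI_COMM_WORLD, &wsize);

    /* two disjoint groups: lower half and upper half of MPI_COMM_WORLD */
    color = (wrank < wsize / 2) ? 0 : 1;
    MPI_Comm_split(MPI_COMM_WORLD, color, wrank, &local);

    /* local leader is rank 0 of each half; the remote leader is named by
     * its rank in the bridge communicator (here MPI_COMM_WORLD) */
    remote_leader = (color == 0) ? wsize / 2 : 0;
    MPI_Intercomm_create(local, 0, MPI_COMM_WORLD, remote_leader,
                         42 /* tag */, &inter);

    MPI_Barrier(inter);   /* a collective across the two disjoint groups */

    MPI_Comm_free(&inter);
    MPI_Comm_free(&local);
    MPI_Finalize();
    return 0;
}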


Thanks
Edgar


On 6/7/2011 12:57 AM, George Bosilca wrote:
> Frederic,
> 
> Attached you will find an example that is supposed to work. The main 
> difference with your code is on T3, T4 where you have inversed the local and 
> remote comm. As depicted on the picture attached below, during the 3th step 
> you will create the intercomm between ab and c (no overlap) using ac as a 
> bridge communicator (here the two roots, a and c, can exchange messages).
> 
> Based on the MPI 2.2 standard, especially on the paragraph in PS:, the 
> attached code should have been working. Unfortunately, I couldn't run it 
> successfully neither with Open MPI trunk nor MPICH2 1.4rc1. 
> 
>  george.
> 
> PS: Here is what the MPI standard states about the MPI_Intercomm_create:
>> The function MPI_INTERCOMM_CREATE can be used to create an 
>> inter-communicator from two existing intra-communicators, in the following 
>> situation: At least one selected member from each group (the “group leader”) 
>> has the ability to communicate with the selected member from the other 
>> group; that is, a “peer” communicator exists to which both leaders belong, 
>> and each leader knows the rank of the other leader in this peer 
>> communicator. Furthermore, members of each group know the rank of their 
>> leader.
> 
> 
> 
> 
> 
> 
> 
> 
> 
> On Jun 1, 2011, at 05:00 , Frédéric Feyel wrote:
> 
>> Hello,
>>
>> I have a problem using MPI_Intercomm_create.
>>
>> I have 5 tasks, let's say T0, T1, T2, T3, T4, resulting from two spawn
>> operations by T0.
>>
>> So I have two intra-communicator :
>>
>> intra0 contains : T0, T1, T2
>> intra1 contains : T0, T3, T4
>>
>> my goal is to make a collective loop to build a single intra-communicator
>> containing T0, T1, T2, T3, T4
>>
>> I tried to do it using MPI_Intercomm_create and MPI_Intercom_merge calls,
>> but without success (I always get MPI internal errors).
>>
>> What I am doing :
>>
>> on T0 :
>> ***
>>
>> MPI_Intercom_create(intra0,0,intra1,0,1,_com)
>>
>> on T1 and T2 :
>> **
>>
>> MPI_Intercom_create(intra0,0,MPI_COMM_WORLD,0,1,_com)
>>
>> on T3 and T4 :
>> **
>>
>> MPI_Intercom_create(intra1,0,MPI_COMM_WORLD,0,1,_com)
>>
>>
>> I'm certainly missing something. Could anybody help me to solve this
>> problem ?
>>
>> Best regards,
>>
>> Frédéric.
>>
>> PS : of course I did an extensive web search without finding anything
>> usefull on my problem.
>>
>> ___
>> users mailing list
>> us...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
> 
> 
> 
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users

-- 
Edgar Gabriel
Assistant Professor
Parallel Software Technologies Lab  http://pstl.cs.uh.edu
Department of Computer Science  University of Houston
Philip G. Hoffman Hall, Room 524Houston, TX-77204, USA
Tel: +1 (713) 743-3857  Fax: +1 (713) 743-3335





Re: [OMPI users] MPI_Bcast issue

2010-08-09 Thread Edgar Gabriel
On 8/8/2010 8:13 PM, Randolph Pullen wrote:
> Thanks, although “An intercommunicator cannot be used for collective
> communication.”, i.e., bcast calls.

yes it can. MPI-1 did not allow for collective operations on
intercommunicators, but the MPI-2 specification did introduce that notion.
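
To illustrate the intercommunicator form (a sketch with assumed names, not
the poster's code): the root argument works differently than on an
intra-communicator; the sending process passes MPI_ROOT, the other members
of its group pass MPI_PROC_NULL, and every process of the remote group
passes the root's rank within the sending group:

#include <mpi.h>

/* broadcast "count" ints from rank 0 of the sending group to every
 * process of the other group of an inter-communicator */
void bcast_across_groups(MPI_Comm intercomm, int in_sender_group,
                         int *payload, int count)
{
    int lrank;
    MPI_Comm_rank(intercomm, &lrank);   /* rank within the local group */

    if (in_sender_group) {
        int root = (lrank == 0) ? MPI_ROOT : MPI_PROC_NULL;
        MPI_Bcast(payload, count, MPI_INT, root, intercomm);
    } else {
        /* the receiving group names the root's rank in the remote group */
        MPI_Bcast(payload, count, MPI_INT, 0, intercomm);
    }
}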

Thanks
Edgar

> I can see how the MPI_Group_xx
> calls can be used to produce a useful group and then communicator;  -
> thanks again but this is really the side issue to my main question
> about MPI_Bcast.
> 
> I seem to have duplicate concurrent processes interfering with each
> other.  This would appear to be a breach of the MPI safety dictum, i.e.
> MPI_COMM_WORLD is supposed to only include the processes started by a
> single mpirun command and isolate these processes from other similar
> groups of processes safely.
> 
> So, it would appear to be a bug.  If so this has significant
> implications for environments such as mine, where it may often occur
> that the same program is run by different users simultaneously.
> 
> It is really this issue that it concerning me, I can rewrite the code
> but if it can crash when 2 copies run at the same time, I have a much
> bigger problem.
> 
> My suspicion is that a within the MPI_Bcast handshaking, a
> syncronising broadcast call may be colliding across the environments.
> My only evidence is an otherwise working program waits on broadcast
> reception forever when two or more copies are run at [exactly] the
> same time.
> 
> Has anyone else seen similar behavior in concurrently running
> programs that perform lots of broadcasts perhaps?
> 
> Randolph
> 
> 
> --- On Sun, 8/8/10, David Zhang  wrote:
> 
> From: David Zhang  Subject: Re: [OMPI users]
> MPI_Bcast issue To: "Open MPI Users"  Received:
> Sunday, 8 August, 2010, 12:34 PM
> 
> In particular, intercommunicators
> 
> On 8/7/10, Aurélien Bouteiller  wrote:
>> You should consider reading about communicators in MPI.
>> 
>> Aurelien -- Aurelien Bouteiller, Ph.D. Innovative Computing
>> Laboratory, The University of Tennessee.
>> 
>> Sent from my iPad
>> 
>> Le Aug 7, 2010 à 1:05, Randolph Pullen
>>  a écrit :
>> 
>>> I seem to be having a problem with MPI_Bcast. My massive I/O
>>> intensive data movement program must broadcast from n to n nodes.
>>> My problem starts because I require 2 processes per node, a
>>> sender and a receiver and I have implemented these using MPI
>>> processes rather than tackle the complexities of threads on MPI.
>>> 
>>> Consequently, broadcast and calls like alltoall are not
>>> completely helpful.  The dataset is huge and each node must end
>>> up with a complete copy built by the large number of contributing
>>> broadcasts from the sending nodes.  Network efficiency and run
>>> time are paramount.
>>> 
>>> As I don’t want to needlessly broadcast all this data to the
>>> sending nodes and I have a perfectly good MPI program that
>>> distributes globally from a single node (1 to N), I took the
>>> unusual decision to start N copies of this program by spawning
>>> the MPI system from the PVM system in an effort to get my N to N
>>> concurrent transfers.
>>> 
>>> It seems that the broadcasts running on concurrent MPI
>>> environments collide and cause all but the first process to hang
>>> waiting for their broadcasts.  This theory seems to be confirmed
>>> by introducing a sleep of n-1 seconds before the first MPI_Bcast
>>> call on each node, which results in the code working perfectly.
>>> (total run time 55 seconds, 3 nodes, standard TCP stack)
>>> 
>>> My guess is that unlike PVM, OpenMPI implements broadcasts with
>>> broadcasts rather than multicasts.  Can someone confirm this?  Is
>>> this a bug?
>>> 
>>> Is there any multicast or N to N broadcast where sender processes
>>> can avoid participating when they don’t need to?
>>> 
>>> Thanks in advance Randolph
>>> 
>>> 
>>> 
>>> ___ users mailing
>>> list us...@open-mpi.org 
>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>> 
> 
> 
> 
> ___ users mailing list 
> us...@open-mpi.org 
> http://www.open-mpi.org/mailman/listinfo.cgi/users


Re: [OMPI users] Dynamic processes connection and segfault on MPI_Comm_accept

2010-07-28 Thread Edgar Gabriel
hm, this actually looks correct. The question now basically is why the
intermediate handshake by the processes with rank 0 on the
inter-communicator is not finishing.
I am wondering whether this could be related to a problem reported in
another thread (Processes stuck after MPI_Waitall() in 1.4.1)?

http://www.open-mpi.org/community/lists/users/2010/07/13720.php




On 7/28/2010 4:01 AM, Grzegorz Maj wrote:
> I've attached gdb to the client which has just connected to the grid.
> Its bt is almost exactly the same as the server's one:
> #0  0x428066d7 in sched_yield () from /lib/libc.so.6
> #1  0x00933cbf in opal_progress () at ../../opal/runtime/opal_progress.c:220
> #2  0x00d460b8 in opal_condition_wait (c=0xdc3160, m=0xdc31a0) at
> ../../opal/threads/condition.h:99
> #3  0x00d463cc in ompi_request_default_wait_all (count=2,
> requests=0xff8a36d0, statuses=0x0) at
> ../../ompi/request/req_wait.c:262
> #4  0x00a1431f in mca_coll_inter_allgatherv_inter (sbuf=0xff8a3794,
> scount=1, sdtype=0x8049400, rbuf=0xff8a3750, rcounts=0x80948e0,
> disps=0x8093938, rdtype=0x8049400, comm=0x8094fb8, module=0x80954a0)
> at ../../../../../ompi/mca/coll/inter/coll_inter_allgatherv.c:127
> #5  0x00d3198f in ompi_comm_determine_first (intercomm=0x8094fb8,
> high=1) at ../../ompi/communicator/comm.c:1199
> #6  0x00d75833 in PMPI_Intercomm_merge (intercomm=0x8094fb8, high=1,
> newcomm=0xff8a4c00) at pintercomm_merge.c:84
> #7  0x08048a16 in main (argc=892352312, argv=0x32323038) at client.c:28
> 
> I've tried both scenarios described: when hangs a client connecting
> from machines B and C. In both cases bt looks the same.
> How does it look like?
> Shall I repost that using a different subject as Ralph suggested?
> 
> Regards,
> Grzegorz
> 
> 
> 
> 2010/7/27 Edgar Gabriel <gabr...@cs.uh.edu>:
>> based on your output shown here, there is absolutely nothing wrong
>> (yet). Both processes are in the same function and do what they are
>> supposed to do.
>>
>> However, I am fairly sure that the client process bt that you show is
>> already part of current_intracomm. Could you try to create a bt of the
>> process that is not yet part of current_intracomm (If I understand your
>> code correctly, the intercommunicator is n-1 configuration, with each
>> client process being part of n after the intercomm_merge). It would be
>> interesting to see where that process is...
>>
>> Thanks
>> Edgar
>>
>> On 7/27/2010 1:42 PM, Ralph Castain wrote:
>>> This slides outside of my purview - I would suggest you post this question 
>>> with a different subject line specifically mentioning failure of 
>>> intercomm_merge to work so it attracts the attention of those with 
>>> knowledge of that area.
>>>
>>>
>>> On Jul 27, 2010, at 9:30 AM, Grzegorz Maj wrote:
>>>
>>>> So now I have a new question.
>>>> When I run my server and a lot of clients on the same machine,
>>>> everything looks fine.
>>>>
>>>> But when I try to run the clients on several machines the most
>>>> frequent scenario is:
>>>> * server is stared on machine A
>>>> * X (= 1, 4, 10, ..) clients are started on machine B and they connect
>>>> successfully
>>>> * the first client starting on machine C connects successfully to the
>>>> server, but the whole grid hangs on MPI_Comm_merge (all the processes
>>>> from intercommunicator get there).
>>>>
>>>> As I said it's the most frequent scenario. Sometimes I can connect the
>>>> clients from several machines. Sometimes it hangs (always on
>>>> MPI_Comm_merge) when connecting the clients from machine B.
>>>> The interesting thing is, that if before MPI_Comm_merge I send a dummy
>>>> message on the intercommunicator from process rank 0 in one group to
>>>> process rank 0 in the other one, it will not hang on MPI_Comm_merge.
>>>>
>>>> I've tried both versions with and without the first patch (ompi-server
>>>> as orted) but it doesn't change the behavior.
>>>>
>>>> I've attached gdb to my server, this is bt:
>>>> #0  0xe410 in __kernel_vsyscall ()
>>>> #1  0x00637afc in sched_yield () from /lib/libc.so.6
>>>> #2  0xf7e8ce31 in opal_progress () at 
>>>> ../../opal/runtime/opal_progress.c:220
>>>> #3  0xf7f60ad4 in opal_condition_wait (c=0xf7fd7dc0, m=0xf7fd7e00) at
>>>> ../../opal/threads/condition.h:99
>>>> #4  0xf7f60dee in ompi_request_default_wait_all (count=2,
>>>> requests=0xff8d7754, statu

Re: [OMPI users] Dynamic processes connection and segfault on MPI_Comm_accept

2010-07-27 Thread Edgar Gabriel
0x86aa7f8,
>> module=0x868b700)
>>at ../../../../../ompi/mca/coll/tuned/coll_tuned_decision_fixed.c:256
>> #12 0xf7c73269 in mca_coll_sync_bcast (buff=0xff822d20, count=1,
>> datatype=0x868bd00, root=0, comm=0x86aa7f8, module=0x86aaa28) at
>> ../../../../../ompi/mca/coll/sync/coll_sync_bcast.c:44
>> #13 0xf7c80381 in mca_coll_inter_allgatherv_inter (sbuf=0xff822d64,
>> scount=0, sdtype=0x8049400, rbuf=0xff822d20, rcounts=0x868a188,
>> disps=0x868abb8, rdtype=0x8049400, comm=0x86aa300,
>>module=0x86aae18) at
>> ../../../../../ompi/mca/coll/inter/coll_inter_allgatherv.c:134
>> #14 0xf7e9398f in ompi_comm_determine_first (intercomm=0x86aa300,
>> high=0) at ../../ompi/communicator/comm.c:1199
>> #15 0xf7ed7833 in PMPI_Intercomm_merge (intercomm=0x86aa300, high=0,
>> newcomm=0xff8241d0) at pintercomm_merge.c:84
>> #16 0x08048afd in main (argc=943274038, argv=0x33393133) at client.c:47
>>
>>
>>
>> What do you think may cause the problem?
>>
>>
>> 2010/7/26 Ralph Castain <r...@open-mpi.org>:
>>> No problem at all - glad it works!
>>>
>>> On Jul 26, 2010, at 7:58 AM, Grzegorz Maj wrote:
>>>
>>>> Hi,
>>>> I'm very sorry, but the problem was on my side. My installation
>>>> process was not always taking the newest sources of openmpi. In this
>>>> case it hasn't installed the version with the latest patch. Now I
>>>> think everything works fine - I could run over 130 processes with no
>>>> problems.
>>>> I'm sorry again that I've wasted your time. And thank you for the patch.
>>>>
>>>> 2010/7/21 Ralph Castain <r...@open-mpi.org>:
>>>>> We're having some problem replicating this once my patches are applied. 
>>>>> Can you send us your configure cmd? Just the output from "head 
>>>>> config.log" will do for now.
>>>>>
>>>>> Thanks!
>>>>>
>>>>> On Jul 20, 2010, at 9:09 AM, Grzegorz Maj wrote:
>>>>>
>>>>>> My start script looks almost exactly the same as the one published by
>>>>>> Edgar, ie. the processes are starting one by one with no delay.
>>>>>>
>>>>>> 2010/7/20 Ralph Castain <r...@open-mpi.org>:
>>>>>>> Grzegorz: something occurred to me. When you start all these processes, 
>>>>>>> how are you staggering their wireup? Are they flooding us, or are you 
>>>>>>> time-shifting them a little?
>>>>>>>
>>>>>>>
>>>>>>> On Jul 19, 2010, at 10:32 AM, Edgar Gabriel wrote:
>>>>>>>
>>>>>>>> Hm, so I am not sure how to approach this. First of all, the test case
>>>>>>>> works for me. I used up to 80 clients, and for both optimized and
>>>>>>>> non-optimized compilation. I ran the tests with trunk (not with 1.4
>>>>>>>> series, but the communicator code is identical in both cases). Clearly,
>>>>>>>> the patch from Ralph is necessary to make it work.
>>>>>>>>
>>>>>>>> Additionally, I went through the communicator creation code for dynamic
>>>>>>>> communicators trying to find spots that could create problems. The only
>>>>>>>> place that I found the number 64 appear is the fortran-to-c mapping
>>>>>>>> arrays (e.g. for communicators), where the initial size of the table is
>>>>>>>> 64. I looked twice over the pointer-array code to see whether we could
>>>>>>>> have a problem their (since it is a key-piece of the cid allocation 
>>>>>>>> code
>>>>>>>> for communicators), but I am fairly confident that it is correct.
>>>>>>>>
>>>>>>>> Note, that we have other (non-dynamic tests), were comm_set is called
>>>>>>>> 100,000 times, and the code per se does not seem to have a problem due
>>>>>>>> to being called too often. So I am not sure what else to look at.
>>>>>>>>
>>>>>>>> Edgar
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> On 7/13/2010 8:42 PM, Ralph Castain wrote:
>>>>>>>>> As far as I can tell, it appears the problem is somewhere in our 
>>>>>>>>> communicator setup. The people knowle

Re: [OMPI users] Dynamic processes connection and segfault on MPI_Comm_accept

2010-07-19 Thread Edgar Gabriel
Hm, so I am not sure how to approach this. First of all, the test case
works for me. I used up to 80 clients, and for both optimized and
non-optimized compilation. I ran the tests with trunk (not with 1.4
series, but the communicator code is identical in both cases). Clearly,
the patch from Ralph is necessary to make it work.

Additionally, I went through the communicator creation code for dynamic
communicators trying to find spots that could create problems. The only
place where I found the number 64 appearing is the fortran-to-c mapping
arrays (e.g. for communicators), where the initial size of the table is
64. I looked twice over the pointer-array code to see whether we could
have a problem there (since it is a key piece of the cid allocation code
for communicators), but I am fairly confident that it is correct.
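
(For reference, the pointer-array pattern in question is a grow-on-demand
table along the lines of the sketch below; this is illustrative only, not
the actual Open MPI code. The hard-wired 64-entry jobid array discussed in
this thread lacked exactly this kind of bounds check.)

#include <stdlib.h>

typedef struct {
    void   **entries;
    size_t   size;   /* allocated slots */
    size_t   used;   /* slots in use    */
} grow_table_t;

static int table_add(grow_table_t *t, void *item)
{
    if (t->used == t->size) {              /* bounds check before writing */
        size_t  nsize = (t->size == 0) ? 64 : t->size * 2;
        void  **tmp   = realloc(t->entries, nsize * sizeof(void *));
        if (tmp == NULL)
            return -1;                     /* out of memory; keep old table */
        t->entries = tmp;
        t->size    = nsize;
    }
    t->entries[t->used++] = item;
    return (int) (t->used - 1);            /* index of the new entry */
}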

Note that we have other (non-dynamic) tests where comm_set is called
100,000 times, and the code per se does not seem to have a problem due
to being called too often. So I am not sure what else to look at.

Edgar



On 7/13/2010 8:42 PM, Ralph Castain wrote:
> As far as I can tell, it appears the problem is somewhere in our communicator 
> setup. The people knowledgeable on that area are going to look into it later 
> this week.
> 
> I'm creating a ticket to track the problem and will copy you on it.
> 
> 
> On Jul 13, 2010, at 6:57 AM, Ralph Castain wrote:
> 
>>
>> On Jul 13, 2010, at 3:36 AM, Grzegorz Maj wrote:
>>
>>> Bad news..
>>> I've tried the latest patch with and without the prior one, but it
>>> hasn't changed anything. I've also tried using the old code but with
>>> the OMPI_DPM_BASE_MAXJOBIDS constant changed to 80, but it also didn't
>>> help.
>>> While looking through the sources of openmpi-1.4.2 I couldn't find any
>>> call of the function ompi_dpm_base_mark_dyncomm.
>>
>> It isn't directly called - it shows in ompi_comm_set as 
>> ompi_dpm.mark_dyncomm. You were definitely overrunning that array, but I 
>> guess something else is also being hit. Have to look further...
>>
>>
>>>
>>>
>>> 2010/7/12 Ralph Castain :
 Just so you don't have to wait for 1.4.3 release, here is the patch 
 (doesn't include the prior patch).




 On Jul 12, 2010, at 12:13 PM, Grzegorz Maj wrote:

> 2010/7/12 Ralph Castain :
>> Dug around a bit and found the problem!!
>>
>> I have no idea who did this or why, but somebody set a limit of 64 
>> separate jobids in the dynamic init called by ompi_comm_set, which 
>> builds the intercommunicator. Unfortunately, they hard-wired the array 
>> size, but never checked that size before adding to it.
>>
>> So after 64 calls to connect_accept, you are overwriting other areas of 
>> the code. As you found, hitting 66 causes it to segfault.
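
For illustration, a minimal sketch of the failure pattern described above
(hypothetical names; this is not the actual Open MPI code): a hard-wired
table of 64 entries filled without a bounds check overruns whatever sits
next to it in memory on the 65th insertion, while a checked variant would
reject the request or grow the table instead.

#define MAX_DYN_JOBIDS 64

static int dyn_jobids[MAX_DYN_JOBIDS];
static int num_dyn_jobids = 0;

/* unchecked: writes past the end of the table after 64 calls */
static int add_dyn_jobid_unsafe(int jobid)
{
    dyn_jobids[num_dyn_jobids++] = jobid;
    return 0;
}

/* checked: the missing size test; a real fix could also grow the table */
static int add_dyn_jobid_checked(int jobid)
{
    if (num_dyn_jobids >= MAX_DYN_JOBIDS) {
        return -1;
    }
    dyn_jobids[num_dyn_jobids++] = jobid;
    return 0;
}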
>>
>> I'll fix this on the developer's trunk (I'll also add that original 
>> patch to it). Rather than my searching this thread in detail, can you 
>> remind me what version you are using so I can patch it too?
>
> I'm using 1.4.2
> Thanks a lot and I'm looking forward to the patch.
>
>>
>> Thanks for your patience with this!
>> Ralph
>>
>>
>> On Jul 12, 2010, at 7:20 AM, Grzegorz Maj wrote:
>>
>>> 1024 is not the problem: changing it to 2048 hasn't changed anything.
>>> Following your advice I've run my process using gdb. Unfortunately I
>>> didn't get anything more than:
>>>
>>> Program received signal SIGSEGV, Segmentation fault.
>>> [Switching to Thread 0xf7e4c6c0 (LWP 20246)]
>>> 0xf7f39905 in ompi_comm_set () from /home/gmaj/openmpi/lib/libmpi.so.0
>>>
>>> (gdb) bt
>>> #0  0xf7f39905 in ompi_comm_set () from 
>>> /home/gmaj/openmpi/lib/libmpi.so.0
>>> #1  0xf7e3ba95 in connect_accept () from
>>> /home/gmaj/openmpi/lib/openmpi/mca_dpm_orte.so
>>> #2  0xf7f62013 in PMPI_Comm_connect () from 
>>> /home/gmaj/openmpi/lib/libmpi.so.0
>>> #3  0x080489ed in main (argc=825832753, argv=0x34393638) at client.c:43
>>>
>>> What's more: when I added a breakpoint on ompi_comm_set in the 66th
>>> process and stepped a couple of instructions, one of the other
>>> processes crashed (as usually, on ompi_comm_set) earlier than the 66th did.
>>>
>>> Finally I decided to recompile openmpi using the -g flag for gcc. In this
>>> case the 66-process issue is gone! I was running my applications
>>> exactly the same way as previously (even without recompilation) and
>>> I've successfully run over 130 processes.
>>> When switching back to the openmpi compilation without -g it again 
>>> segfaults.
>>>
>>> Any ideas? I'm really confused.
>>>
>>>
>>>
>>> 2010/7/7 Ralph Castain :
 I would guess the #files limit of 1024. However, if it behaves the 
 same way when spread across multiple machines, I would suspect it is 
 

Re: [OMPI users] [openib] segfault when using openib btl

2010-07-15 Thread Edgar Gabriel
On 7/15/2010 10:18 AM, Eloi Gaudry wrote:
> hi edgar,
> 
> thanks for the tips, I'm gonna try this option as well. the segmentation 
> fault i'm observing always happened during a collective communication 
> indeed...
> it basically switches all collective communication to basic mode, right?
> 
> sorry for my ignorance, but what's a NCA ? 

sorry, I meant to type HCA (InfiniBand networking card)

Thanks
Edgar

> 
> thanks,
> éloi
> 
> On Thursday 15 July 2010 16:20:54 Edgar Gabriel wrote:
>> you could try first to use the algorithms in the basic module, e.g.
>>
>> mpirun -np x --mca coll basic ./mytest
>>
>> and see whether this makes a difference. I used to observe sometimes a
>> (similar ?) problem in the openib btl triggered from the tuned
>> collective component, in cases where the ofed libraries were installed
>> but no NCA was found on a node. It used to work however with the basic
>> component.
>>
>> Thanks
>> Edgar
>>
>> On 7/15/2010 3:08 AM, Eloi Gaudry wrote:
>>> hi Rolf,
>>>
>>> unfortunately, i couldn't get rid of that annoying segmentation fault
>>> when selecting another bcast algorithm. i'm now going to replace
>>> MPI_Bcast with a naive implementation (using MPI_Send and MPI_Recv) and
>>> see if that helps.
>>>
>>> regards,
>>> éloi
>>>
>>> On Wednesday 14 July 2010 10:59:53 Eloi Gaudry wrote:
>>>> Hi Rolf,
>>>>
>>>> thanks for your input. You're right, I was missing the
>>>> coll_tuned_use_dynamic_rules option.
>>>>
>>>> I'll check if the segmentation fault disappears when using the basic
>>>> linear bcast algorithm with the proper command line you provided.
>>>>
>>>> Regards,
>>>> Eloi
>>>>
>>>> On Tuesday 13 July 2010 20:39:59 Rolf vandeVaart wrote:
>>>>> Hi Eloi:
>>>>> To select the different bcast algorithms, you need to add an extra mca
>>>>> parameter that tells the library to use dynamic selection.
>>>>> --mca coll_tuned_use_dynamic_rules 1
>>>>>
>>>>> One way to make sure you are typing this in correctly is to use it with
>>>>> ompi_info.  Do the following:
>>>>> ompi_info -mca coll_tuned_use_dynamic_rules 1 --param coll
>>>>>
>>>>> You should see lots of output with all the different algorithms that
>>>>> can be selected for the various collectives.
>>>>> Therefore, you need this:
>>>>>
>>>>> --mca coll_tuned_use_dynamic_rules 1 --mca coll_tuned_bcast_algorithm 1
>>>>>
>>>>> Rolf
>>>>>
>>>>> On 07/13/10 11:28, Eloi Gaudry wrote:
>>>>>> Hi,
>>>>>>
>>>>>> I've found that "--mca coll_tuned_bcast_algorithm 1" allowed me to switch
>>>>>> to the basic linear algorithm. Anyway, whatever the algorithm used, the
>>>>>> segmentation fault remains.
>>>>>>
>>>>>> Could anyone give some advice on ways to diagnose the issue I'm
>>>>>> facing?
>>>>>>
>>>>>> Regards,
>>>>>> Eloi
>>>>>>
>>>>>> On Monday 12 July 2010 10:53:58 Eloi Gaudry wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> I'm focusing on the MPI_Bcast routine that seems to randomly segfault
>>>>>>> when using the openib btl. I'd like to know if there is any way to
>>>>>>> make OpenMPI switch to a different algorithm than the default one
>>>>>>> being selected for MPI_Bcast.
>>>>>>>
>>>>>>> Thanks for your help,
>>>>>>> Eloi
>>>>>>>
>>>>>>> On Friday 02 July 2010 11:06:52 Eloi Gaudry wrote:
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> I'm observing a random segmentation fault during an internode
>>>>>>>> parallel computation involving the openib btl and OpenMPI-1.4.2 (the
>>>>>>>> same issue can be observed with OpenMPI-1.3.3).
>>>>>>>>
>>>>>>>>mpirun (Open MPI) 1.4.2
>>>>>>>>Report bugs to http://www.open-mpi.org/community/help/
>>>>>>>>[pbn08:02624] *** Process received signal ***
>>>>>>>>[pbn08:02624] Signal: Seg

Re: [OMPI users] [openib] segfault when using openib btl

2010-07-15 Thread Edgar Gabriel
you could try first to use the algorithms in the basic module, e.g.

mpirun -np x --mca coll basic ./mytest

and see whether this makes a difference. I used to observe sometimes a
(similar ?) problem in the openib btl triggered from the tuned
collective component, in cases where the ofed libraries were installed
but no NCA was found on a node. It used to work however with the basic
component.

Thanks
Edgar


On 7/15/2010 3:08 AM, Eloi Gaudry wrote:
> hi Rolf,
> 
> unfortunately, i couldn't get rid of that annoying segmentation fault when 
> selecting another bcast algorithm.
> i'm now going to replace MPI_Bcast with a naive implementation (using 
> MPI_Send and MPI_Recv) and see if that helps.
> 
> regards,
> éloi
> 
> 
> On Wednesday 14 July 2010 10:59:53 Eloi Gaudry wrote:
>> Hi Rolf,
>>
>> thanks for your input. You're right, I was missing the
>> coll_tuned_use_dynamic_rules option.
>>
>> I'll check if the segmentation fault disappears when using the basic
>> linear bcast algorithm with the proper command line you provided.
>>
>> Regards,
>> Eloi
>>
>> On Tuesday 13 July 2010 20:39:59 Rolf vandeVaart wrote:
>>> Hi Eloi:
>>> To select the different bcast algorithms, you need to add an extra mca
>>> parameter that tells the library to use dynamic selection.
>>> --mca coll_tuned_use_dynamic_rules 1
>>>
>>> One way to make sure you are typing this in correctly is to use it with
>>> ompi_info.  Do the following:
>>> ompi_info -mca coll_tuned_use_dynamic_rules 1 --param coll
>>>
>>> You should see lots of output with all the different algorithms that can
>>> be selected for the various collectives.
>>> Therefore, you need this:
>>>
>>> --mca coll_tuned_use_dynamic_rules 1 --mca coll_tuned_bcast_algorithm 1
>>>
>>> Rolf
>>>
>>> On 07/13/10 11:28, Eloi Gaudry wrote:
 Hi,

 I've found that "--mca coll_tuned_bcast_algorithm 1" allowed me to switch
 to the basic linear algorithm. Anyway, whatever the algorithm used, the
 segmentation fault remains.

 Could anyone give some advice on ways to diagnose the issue I'm
 facing?

 Regards,
 Eloi

 On Monday 12 July 2010 10:53:58 Eloi Gaudry wrote:
> Hi,
>
> I'm focusing on the MPI_Bcast routine that seems to randomly segfault
> when using the openib btl. I'd like to know if there is any way to
> make OpenMPI switch to a different algorithm than the default one
> being selected for MPI_Bcast.
>
> Thanks for your help,
> Eloi
>
> On Friday 02 July 2010 11:06:52 Eloi Gaudry wrote:
>> Hi,
>>
>> I'm observing a random segmentation fault during an internode
>> parallel computation involving the openib btl and OpenMPI-1.4.2 (the
>> same issue can be observed with OpenMPI-1.3.3).
>>
>>mpirun (Open MPI) 1.4.2
>>Report bugs to http://www.open-mpi.org/community/help/
>>[pbn08:02624] *** Process received signal ***
>>[pbn08:02624] Signal: Segmentation fault (11)
>>[pbn08:02624] Signal code: Address not mapped (1)
>>[pbn08:02624] Failing at address: (nil)
>>[pbn08:02624] [ 0] /lib64/libpthread.so.0 [0x349540e4c0]
>>[pbn08:02624] *** End of error message ***
>>sh: line 1:  2624 Segmentation fault
>>
>> \/share\/hpc3\/actran_suite\/Actran_11\.0\.rc2\.41872\/RedHatEL\-5\/x
>> 86 _6 4\ /bin\/actranpy_mp
>> '--apl=/share/hpc3/actran_suite/Actran_11.0.rc2.41872/RedHatEL-5/x86_
>> 64 /A c tran_11.0.rc2.41872'
>> '--inputfile=/work/st25652/LSF_130073_0_47696_0/Case1_3Dreal_m4_n2.da
>> t' '--scratch=/scratch/st25652/LSF_130073_0_47696_0/scratch'
>> '--mem=3200' '--threads=1' '--errorlevel=FATAL' '--t_max=0.1'
>> '--parallel=domain'
>>
>> If I choose not to use the openib btl (by using --mca btl self,sm,tcp
>> on the command line, for instance), I don't encounter any problem and
>> the parallel computation runs flawlessly.
>>
>> I would like to get some help to be able:
>> - to diagnose the issue I'm facing with the openib btl
>> - understand why this issue is observed only when using the openib
>> btl and not when using self,sm,tcp
>>
>> Any help would be very much appreciated.
>>
>> The outputs of ompi_info and the configure scripts of OpenMPI are
>> enclosed to this email, and some information on the infiniband
>> drivers as well.
>>
>> Here is the command line used when launching a parallel computation
>>
>> using infiniband:
>>path_to_openmpi/bin/mpirun -np $NPROCESS --hostfile host.list
>>--mca
>>
>> btl openib,sm,self,tcp  --display-map --verbose --version --mca
>> mpi_warn_on_fork 0 --mca btl_openib_want_fork_support 0 [...]
>>
>> and the command line used if not using infiniband:
>>path_to_openmpi/bin/mpirun -np $NPROCESS --hostfile host.list
>>--mca
>>
>> btl self,sm,tcp  --display-map --verbose --version --mca
>> 

Re: [OMPI users] Allgather in inter-communicator bug,

2010-05-20 Thread Edgar Gabriel
thanks for pointing the problem out. I checked the code; the problem
is in the MPI layer itself. The following check prevents us from doing
anything:


e.g. ompi/mpi/c/allgather.c

    if ((MPI_IN_PLACE != sendbuf && 0 == sendcount) ||
        (0 == recvcount)) {
        return MPI_SUCCESS;
    }



so the problem is not in the modules/algorithms but in the API layer,
which did not account for intercommunicators correctly. I'll try to
fix it.
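
A sketch of how such a shortcut can be made intercommunicator-aware (this
mirrors the fragment above, but it is not the committed fix, and the
is_intercomm flag merely stands in for Open MPI's internal
inter-communicator check):

#include <mpi.h>

/* returns 1 if the early return is safe, 0 if the collective must run */
static int allgather_can_return_early(const void *sendbuf, int sendcount,
                                      int recvcount, int is_intercomm)
{
    if (is_intercomm) {
        /* a zero count on one side is legal here and must not skip the call */
        return 0;
    }
    return (MPI_IN_PLACE != sendbuf && 0 == sendcount) || (0 == recvcount);
}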

Thanks
edgar

On 05/20/2010 10:48 AM, Battalgazi YILDIRIM wrote:
> Hi,
> 
> you are right, I should have provided C++ and Fortran example, so I am
> doing now
> 
> 
> Here is "cplusplus.cpp"
> 
> #include <mpi.h>
> #include <iostream>
> using namespace std;
> int main()
> {
> MPI::Init();
> char command[] = "./a.out";
> MPI::Info info;
> MPI::Intercomm child = MPI::COMM_WORLD.Spawn(command, NULL, 8,info, 0);
> int a[8]={0,0,0,0,0,0,0,0};
> int dummy;
> child.Allgather(&dummy, 0, MPI::INT, a, 1, MPI::INT);
> child.Disconnect();
> cout << "a[";
> for ( int i = 0; i < 7; i++ )
> cout << a[i] << ",";
> cout << a[7] << "]" << endl;
> 
> MPI::Finalize();
> }
> 
> 
> Here is again "fortran.f90"
> 
> program main
>  use mpi
>  implicit none
>  integer :: parent, rank, val, dummy, ierr
>  call MPI_Init(ierr)
>  call MPI_Comm_get_parent(parent, ierr)
>  call MPI_Comm_rank(parent, rank, ierr)
>  val = rank + 1
>  call MPI_Allgather(val,   1, MPI_INTEGER, &
> dummy, 0, MPI_INTEGER, &
> parent, ierr)
>  call MPI_Comm_disconnect(parent, ierr)
>  call MPI_Finalize(ierr)
> end program main
> 
> here is how you build and run
> 
> -bash-3.2$ mpif90 fortran.f90
> -bash-3.2$ mpiCC -o parent cplusplus.cpp
> -bash-3.2$ ./parent
> a[0,0,0,0,0,0,0,0]
> 
> 
> 
> If I use mpich2,
> -bash-3.2$ mpif90 fortran.f90
> -bash-3.2$ mpiCC -o parent cplusplus.cpp
> -bash-3.2$ ./parent
> a[1,2,3,4,5,6,7,8]
> 
> I hope that you can repeat this problem to see problem with OPENMPI,
> 
> Thanks,
> 
> 
> On Thu, May 20, 2010 at 10:09 AM, Jeff Squyres  > wrote:
> 
> Can you send us an all-C or all-Fortran example that shows the problem?
> 
> We don't have easy access to test through the python bindings.
>  ...ok, I admit it, it's laziness on my part.  :-)  But having a
> pure Open MPI test app would also remove some possible variables and
> possible sources of error.
> 
> 
> On May 20, 2010, at 9:43 AM, Battalgazi YILDIRIM wrote:
> 
> > Hi Jody,
> >
> > I think that it is correct, you can  test this example in your
> desktop,
> >
> > thanks,
> >
> > On Thu, May 20, 2010 at 3:18 AM, jody  > wrote:
> > Hi
> > I am really no python expert, but it looks to me as if you were
> > gathering arrays filled with zeroes:
> >  a = array('i', [0]) * n
> >
> > Shouldn't this line be
> >  a = array('i', [r])*n
> > where r is the rank of the process?
> >
> > Jody
> >
> >
> > On Thu, May 20, 2010 at 12:00 AM, Battalgazi YILDIRIM
> > > wrote:
> > > Hi,
> > >
> > >
> > > I am trying to use intercommunicator ::Allgather between two
> child process.
> > > I have fortran and Python code,
> > > I am using mpi4py for python. It seems that ::Allgather is not
> working
> > > properly in my desktop.
> > >
> > >  I have contacted first mpi4py developers (Lisandro Dalcin), he
> simplified
> > > my problem and provided two example files (python.py and
> fortran.f90,
> > > please see below).
> > >
> > > We tried with different MPI vendors, the following example
> worked correclty(
> > > it means the final print out should be array('i', [1, 2, 3, 4,
> 5, 6, 7, 8])
> > > )
> > >
> > > However, it is not giving correct answer in my two desktop
> (Redhat and
> > > ubuntu) both
> > > using OPENMPI
> > >
> > > Could yo look at this problem please?
> > >
> > > If you want to follow our discussion before you, you can go to
> following
> > > link:
> > >
> 
> http://groups.google.com/group/mpi4py/browse_thread/thread/c17c660ae56ff97e
> > >
> > > yildirim@memosa:~/python_intercomm$ more python.py
> > > from mpi4py import MPI
> > > from array import array
> > > import os
> > >
> > > progr = os.path.abspath('a.out')
> > > child = MPI.COMM_WORLD.Spawn(progr,[], 8)
> > > n = child.remote_size
> > > a = array('i', [0]) * n
> > > child.Allgather([None, MPI.INT], [a, MPI.INT])
> > > child.Disconnect()
> > > print a
> > >
> > > yildirim@memosa:~/python_intercomm$ more fortran.f90
> > > program main
> > >  use mpi
> > >  implicit none
> > >  integer :: parent, rank, val, 

Re: [OMPI users] Problems Using PVFS2 with OpenMPI

2010-01-13 Thread Edgar Gabriel
I don't know whether it's relevant for this problem or not, but a couple 
of weeks ago we also found that we had to apply the following patch 
to compile ROMIO with OpenMPI over pvfs2. There is an additional header, 
pvfs2-compat.h, included in the ROMIO version of MPICH, but it is somehow 
missing in the OpenMPI version:


ompi/mca/io/romio/romio/adio/ad_pvfs2/ad_pvfs2.h
--- a/ompi/mca/io/romio/romio/adio/ad_pvfs2/ad_pvfs2.h  Thu Sep 03
11:55:51 2009 -0500
+++ b/ompi/mca/io/romio/romio/adio/ad_pvfs2/ad_pvfs2.h  Mon Sep 21
10:16:27 2009 -0500
@@ -11,6 +11,10 @@
 #include "adio.h"
 #ifdef HAVE_PVFS2_H
 #include "pvfs2.h"
+#endif
+
+#ifdef PVFS2_VERSION_MAJOR
+#include "pvfs2-compat.h"
 #endif


Thanks
Edgar


Rob Latham wrote:

On Tue, Jan 12, 2010 at 02:15:54PM -0800, Evan Smyth wrote:

OpenMPI 1.4 (had same issue with 1.3.3) is configured with
./configure --prefix=/work/rd/evan/archives/openmpi/openmpi/1.4/enable_pvfs \
--enable-mpi-threads --with-io-romio-flags="--with-filesystems=pvfs2+ufs+nfs"



PVFS 2.8.1 is configured to install in the default location (/usr/local) with
./configure --with-mpi=/work/rd/evan/archives/openmpi/openmpi/1.4/enable_pvfs


In addition to Jeff's request for the build logs, do you have
'pvfs2-config' in your path?   
 

I build and install these (in this order) and setup my PVFS2 space using
instructions at pvfs.org. I am able to use this space using the
/usr/local/bin/pvfs2-ls types of commands. I am simply running a 2-server
config (2 data servers and the same 2 hosts are metadata servers). As I say,
manually, this all seems fine (even when I'm not root). It may be
relevant that I am *not* using the kernel interface for PVFS2 as I
am just trying to get a
better understanding of how this works.


That's a good piece of information.  I run in that configuration
often, so we should be able to make this work.


It is perhaps relevant that I have not had to explicitly tell
OpenMPI where I installed PVFS. I have told PVFS where I installed
OpenMPI, though. This does seem slightly odd but there does not
appear to be a way of telling OpenMPI this information. Perhaps it
is not needed.


PVFS needs an MPI library only to build MPI-based testcases.  The
servers, client libraries, and utilities do not use MPI.


In any event, I then build my test program against this OpenMPI and
in that program I have the following call sequence (i is 0 and where
mntPoint is the path to my pvfs2 mount point -- I also tried
prefixing a "pvfs2:" in the front of this as I read somewhere that
that was optional).


In this case, since you do not have the PVFS file system mounted, the
'pvfs2:' prefix is mandatory.  Otherwise, the MPI-IO library will try
to look for a directory that does not exist.
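
As a small sketch (the path below is hypothetical), the prefix is simply
part of the filename handed to MPI_File_open:

#include <mpi.h>

int open_on_pvfs2(MPI_Comm comm, MPI_File *fh)
{
    /* "pvfs2:" tells the MPI-IO layer to use its PVFS2 driver directly,
       without going through a kernel mount point */
    return MPI_File_open(comm, "pvfs2:/pvfs2-volume/testfile",
                         MPI_MODE_CREATE | MPI_MODE_WRONLY,
                         MPI_INFO_NULL, fh);
}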


Which will only execute on one of my ranks (the way I'm running it).
No matter what I try, the MPI_File_open call fails with an
MPI_ERR_ACCESS error code.  This suggests a permission problem but I
am able to manually cp and rm from the pvfs2 space without problem
so I am not at all clear on what the permission problem is. My
access flags look fine to me (the MPI_MODE_UNIQUE_OPEN flag makes no
difference in this case as I'm only opening a single file anyway).
If I write this file to shared NFS storage, all is "fine"
(obviously, I do not consider that a permanent solution, though).



Does anyone have any idea why this is not working? Alternately or in
addition, does anyone have step-by-step instructions for how to
build and set up PVFS2 with OpenMPI as well as an example program
because this is the first time I've attempted this so I may well be
doing something wrong.


It sounds like you're on the right track.  I should update the PVFS
quickstart for the OpenMPI specifics.  In addition to pvfs2-ping and
pvfs2-ls, make sure you can pvfs2-cp files to and from your volume.
If those 3 utilities work, then your OpenMPI installation should work
as well.  


==rob



--
Edgar Gabriel
Assistant Professor
Parallel Software Technologies Lab  http://pstl.cs.uh.edu
Department of Computer Science  University of Houston
Philip G. Hoffman Hall, Room 524Houston, TX-77204, USA
Tel: +1 (713) 743-3857  Fax: +1 (713) 743-3335


Re: [OMPI users] Deadlock in MPI_File_write_all on Infiniband

2009-10-12 Thread Edgar Gabriel
I am wondering whether this is really due to the usage of 
File_write_all. We had a bug in the 1.3 series (which will be 
fixed in 1.3.4) where we lost message segments and thus had a deadlock 
in Comm_dup if there was communication occurring *right after* the 
Comm_dup. File_open executes a comm_dup internally.


If you replace write_all by write, you are avoiding the communication. 
If you replace ib by tcp, your entire timing is different and you might 
accidentally not see the deadlock...


Just my $0.02 ...

Thanks
Edgar

Dorian Krause wrote:

Dear list,

the attached program deadlocks in MPI_File_write_all when run with 16 
processes on two 8 core nodes of an Infiniband cluster. It runs fine when I


a) use tcp
or
b) replace MPI_File_write_all by MPI_File_write

I'm using openmpi V. 1.3.2 (but I checked that the problem also 
occurs with version 1.3.3). The OFED version is 1.4 (installed via 
Rocks). The Operating system is CentOS 5.2


I compile with gcc-4.1.2. The openmpi configure flags are

  ../../configure --prefix=/share/apps/openmpi/1.3.2/gcc-4.1.2/ 
--with-io-romio-flags=--with-file-system=nfs+ufs+pvfs2 
--with-wrapper-ldflags=-L/share/apps/pvfs2/lib 
CPPFLAGS=-I/share/apps/pvfs2/include/ LDFLAGS=-L/share/apps/pvfs2/lib 
LIBS=-lpvfs2 -lpthread


The user home directories are mounted via nfs.

Is it a problem with the user code, the system or with openmpi?

Thanks,
Dorian




___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users


--
Edgar Gabriel
Assistant Professor
Parallel Software Technologies Lab  http://pstl.cs.uh.edu
Department of Computer Science  University of Houston
Philip G. Hoffman Hall, Room 524Houston, TX-77204, USA
Tel: +1 (713) 743-3857  Fax: +1 (713) 743-3335


Re: [OMPI users] Automated tuning tool

2009-08-07 Thread Edgar Gabriel

Gus Correa wrote:

Terry Frankcombe wrote:

There's been quite some discussion here lately about the effect of OMPI
tuning parameters, particularly w.r.t. collectives.

Is there some tool to probe performance on any given network/collection
of nodes to aid optimisation of these parameters?

(I'm thinking something along the philosophy of ATLAS.)


___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users



Hi Terry

We are also looking for this holy grail.

So far I found this 2008 reference to a certain
"Open Tool for Parameter Optimization (OTPO)":

http://www.springerlink.com/content/h5162153l184r7p0/

OTPO defines itself as this:

"OTPO systematically tests large numbers of combinations of Open MPI’s 
run-time tunable parameters for common communication patterns and 
performance metrics to determine the “best” set for a given platform."


you can checkout the OTPO code at

http://svn.open-mpi.org/svn/otpo/trunk/

It supports as of now netpipe and skampi collectives for tuning. It is 
far from perfect, but it is a starting point. If there are any issues,

please let us know...

Thanks
Edgar



However, I couldn't find any reference to the actual code or scripts,
and whether it is available, tested, free, downloadable, etc.

At this point I am doing these performance
tests in a laborious and inefficient manual way,
when I have the time to do it.

As some of the aforementioned article authors
are list subscribers (and OpenMPI developers),
maybe they can shed some light about OTPO, tuned collective 
optimization, OpenMPI runtime parameter optimization, etc.


IMHO, this topic deserves at least a FAQ.

Developers, Jeff:  Any suggestions?  :)

Many thanks,
Gus Correa
-
Gustavo Correa
Lamont-Doherty Earth Observatory - Columbia University
Palisades, NY, 10964-8000 - USA
-

___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users


--
Edgar Gabriel
Assistant Professor
Parallel Software Technologies Lab  http://pstl.cs.uh.edu
Department of Computer Science  University of Houston
Philip G. Hoffman Hall, Room 524Houston, TX-77204, USA
Tel: +1 (713) 743-3857  Fax: +1 (713) 743-3335


Re: [OMPI users] strange IMB runs

2009-07-31 Thread Edgar Gabriel

Michael Di Domenico wrote:

mpi_leave_pinned didn't help; still at ~145MB/sec.
btl_sm_eager_limit from 4096 to 8192 pushes me up to ~212MB/sec, but
pushing it past that doesn't change it anymore.

Are there any intelligent programs that can go through and test all
the different permutations of tunables for openmpi?  Outside of me
just writing an ugly looping script...


actually there is,

http://svn.open-mpi.org/svn/otpo/trunk/

this tool has been used to tune openib parameters, and I would guess that 
it could be used without any modification to also run netpipe over sm...


Thanks
Edgar


On Wed, Jul 29, 2009 at 1:55 PM, Dorian Krause<doriankra...@web.de> wrote:

Hi,

--mca mpi_leave_pinned 1

might help. Take a look at the FAQ for various tuning parameters.


Michael Di Domenico wrote:

I'm not sure I understand what's actually happened here.  I'm running
IMB on an HP superdome, just comparing the PingPong benchmark

HP-MPI v2.3
Max ~ 700-800MB/sec

OpenMPI v1.3
-mca btl self,sm - Max ~ 125-150MB/sec
-mca btl self,tcp - Max ~ 500-550MB/sec

Is this behavior expected?  Are there any tunables to get the OpenMPI
sockets up near HP-MPI?
___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users



___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users


___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users


--
Edgar Gabriel
Assistant Professor
Parallel Software Technologies Lab  http://pstl.cs.uh.edu
Department of Computer Science  University of Houston
Philip G. Hoffman Hall, Room 524Houston, TX-77204, USA
Tel: +1 (713) 743-3857  Fax: +1 (713) 743-3335


Re: [OMPI users] FW: hanging after many comm create/destroy's

2009-05-20 Thread Edgar Gabriel
I am 99.99% sure that this bug has been fixed in the current trunk and 
will be available in the upcoming 1.3.3 release...


Thanks
Edgar

Lippert, Ross wrote:
 


The attached program hangs after printing "Iteration 65524".
It does not appear to me that it should.  Removal of the barrier call or
changing the barrier call to use MPI_COMM_WORLD does get rid of the
hang, so I believe this program is a minimal representation of a bug.

I have attached the output of ompi_info --all as well.  I do not have
access to the config.log.

The command to compile was

mpicc mpibug.c

The command to run was 


orterun --np 8 --mca btl tcp,self -- ./a.out


-r




___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users


--
Edgar Gabriel
Assistant Professor
Parallel Software Technologies Lab  http://pstl.cs.uh.edu
Department of Computer Science  University of Houston
Philip G. Hoffman Hall, Room 524Houston, TX-77204, USA
Tel: +1 (713) 743-3857  Fax: +1 (713) 743-3335


Re: [OMPI users] strange bug

2009-05-12 Thread Edgar Gabriel
hm, so I am out of ideas. I created multiple variants of test-programs 
which did what you basically described, and they all passed and did not 
generate problems. I compiled the MUMPS library and ran the tests that 
they have in the examples directory, and they all worked.


Additionally, I checked in the source code of Open MPI. In comm_dup 
there is only a single location where we raise the error MPI_ERR_INTERN 
(which was reported in your email). I am fairly positive, that this can 
not occur, else we would segfault prior to that (it is a stupid check, 
don't ask). Furthermore, the code segment that has been modified does 
not raise anywhere MPI_ERR_INTERN. Of course, it could be a secondary 
effect and be created somewhere else (PML_ADD or collective module 
selection) and comm_dup just passes the error code up.


One way or the other, I need more hints on what the code does. Any 
chance of getting a smaller code fragment which replicates the problem? 
It could use the MUMPS library, I am fine with that since I just 
compiled and installed it with the current ompi trunk...


Thanks
Edgar

Edgar Gabriel wrote:
I would say the probability is large that it is due to the recent 'fix'. 
 I will try to create a testcase similar to what you suggested. Could 
you give us maybe some hints on which functionality of MUMPS you are 
using, or even share the code/ a code fragment?


Thanks
Edgar

Jeff Squyres wrote:

Hey Edgar --

Could this have anything to do with your recent fixes?

On May 12, 2009, at 8:30 AM, Anton Starikov wrote:


hostfile from torque PBS_NODEFILE (OMPI is compiled with torque
support)

It happens with or without rankfile.
Started with
mpirun -np 16 ./somecode

mca parameters:

btl = self,sm,openib
mpi_maffinity_alone = 1
rmaps_base_no_oversubscribe = 1 (rmaps_base_no_oversubscribe = 0
doesn't change it)

I tested with both: "btl=self,sm" on 16-core nodes and
"btl=self,sm,openib" on 8x dual-core nodes; the result is the same.

It looks like it always occurs exactly at the same point in the
execution, not at the beginning; it is not the first MPI_Comm_dup in the
code.

I can't say too much about particular piece of the code, where it is
happening, because it is in the 3rd-party library (MUMPS).  When error
occurs, MPI_Comm_dup in every task deals with single-task communicator
(MPI_Comm_split of initial MPI_Comm_world for 16 processes into 16
groups, 1 process per group). And I can guess that before this error,
MPI_Comm_dup is called something like 100 times by the same piece
of code on the same communicators without any problems.

I can say that it used to work correctly with all previous versions of
openmpi we used (1.2.8-1.3.2 and some earlier versions). It also works
correctly on other platforms/MPI implementations.

All environmental variables (PATH, LD_LIBRARY_PATH) are correct.
I recompiled code and 3rd-party libraries with this version of OMPI.











--
Edgar Gabriel
Assistant Professor
Parallel Software Technologies Lab  http://pstl.cs.uh.edu
Department of Computer Science  University of Houston
Philip G. Hoffman Hall, Room 524Houston, TX-77204, USA
Tel: +1 (713) 743-3857  Fax: +1 (713) 743-3335


Re: [OMPI users] strange bug

2009-05-12 Thread Edgar Gabriel
I would say the probability is large that it is due to the recent 'fix'. 
 I will try to create a testcase similar to what you suggested. Could 
you give us maybe some hints on which functionality of MUMPS you are 
using, or even share the code/ a code fragment?


Thanks
Edgar

Jeff Squyres wrote:

Hey Edgar --

Could this have anything to do with your recent fixes?

On May 12, 2009, at 8:30 AM, Anton Starikov wrote:


hostfile from torque PBS_NODEFILE (OMPI is compiled with torque
support)

It happens with or without rankfile.
Started with
mpirun -np 16 ./somecode

mca parameters:

btl = self,sm,openib
mpi_maffinity_alone = 1
rmaps_base_no_oversubscribe = 1 (rmaps_base_no_oversubscribe = 0
doesn't change it)

I tested with both: "btl=self,sm" on 16-core nodes and
"btl=self,sm,openib" on 8x dual-core nodes; the result is the same.

It looks like it always occurs exactly at the same point in the
execution, not at the beginning; it is not the first MPI_Comm_dup in the
code.

I can't say too much about particular piece of the code, where it is
happening, because it is in the 3rd-party library (MUMPS).  When error
occurs, MPI_Comm_dup in every task deals with single-task communicator
(MPI_Comm_split of initial MPI_Comm_world for 16 processes into 16
groups, 1 process per group). And I can guess that before this error,
MPI_Comm_dup is called something like 100 times by the same piece
of code on the same communicators without any problems.

I can say that it used to work correctly with all previous versions of
openmpi we used (1.2.8-1.3.2 and some earlier versions). It also works
correctly on other platforms/MPI implementations.

All environmental variables (PATH, LD_LIBRARY_PATH) are correct.
I recompiled code and 3rd-party libraries with this version of OMPI.









--
Edgar Gabriel
Assistant Professor
Parallel Software Technologies Lab  http://pstl.cs.uh.edu
Department of Computer Science  University of Houston
Philip G. Hoffman Hall, Room 524Houston, TX-77204, USA
Tel: +1 (713) 743-3857  Fax: +1 (713) 743-3335


Re: [OMPI users] Issue with Profiling Fortran code

2008-12-05 Thread Edgar Gabriel

George,

I hope you are aware that *many* tools and applications actually profile 
the fortran MPI layer by intercepting the C function calls. This allows 
them to not have to deal with f2c translation of MPI objects and not 
worry about the name mangling issue. Would there be a way to have both 
options, e.g. as a configure flag? The current commit basically breaks 
all of these applications...


Thanks
Edgar

George Bosilca wrote:

Nick,

Thanks for noticing this. It's unbelievable that nobody noticed that 
over the last 5 years. Anyway, I think we have a one line fix for this 
problem. I'll test it asap, and then push it in the 1.3.


  Thanks,
george.

On Dec 5, 2008, at 10:14 , Nick Wright wrote:


Hi Antony

That will work, yes, but it's not portable to other MPIs that do 
implement the profiling layer correctly, unfortunately.


I guess we will just need to detect that we are using openmpi when our 
tool is configured and add some macros to deal with that accordingly. 
Is there an easy way to do this built into openmpi?
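
One compile-time possibility (an assumption about the headers, not an
official detection interface): Open MPI's mpi.h defines OPEN_MPI and the
OMPI_*_VERSION macros, which other implementations should not, so a
profiling tool can pick its interception strategy with the preprocessor:

#include <mpi.h>

/* hypothetical tool-side switch */
#if defined(OPEN_MPI)
#  define PROFILER_SKIP_FORTRAN_WRAPPERS 1   /* avoid the double interception described above */
#else
#  define PROFILER_SKIP_FORTRAN_WRAPPERS 0
#endif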


Thanks

Nick.

Anthony Chan wrote:

Hope I didn't misunderstand your question.  If you implement
your profiling library in C where you do your real instrumentation,
you don't need to implement the fortran layer, you can simply link
with Fortran to C MPI wrapper library -lmpi_f77. i.e.
/bin/mpif77 -o foo foo.f -L/lib -lmpi_f77 -lYourProfClib
where libYourProfClib.a is your profiling tool written in C. If you 
don't want to intercept the MPI call twice for a fortran program,

you need to implement the fortran layer.  In that case, I would think you
can just call the C version of PMPI_xxx directly from your fortran layer, 
e.g.

void mpi_comm_rank_(MPI_Comm *comm, int *rank, int *info) {
   printf("mpi_comm_rank call successfully intercepted\n");
   *info = PMPI_Comm_rank(*comm, rank);
}
A.Chan
- "Nick Wright" <nwri...@sdsc.edu> wrote:

Hi

I am trying to use the PMPI interface with OPENMPI to profile a
fortran program.

I have tried with 1.28 and 1.3rc1 with --enable-mpi-profile switched
on.

The problem seems to be that if one e.g. intercepts the call to 
mpi_comm_rank_ (the fortran hook) and then calls pmpi_comm_rank_, this then


calls MPI_Comm_rank (the C hook) not PMPI_Comm_rank as it should.

So if one wants to create a library that can profile C and Fortran
codes at the same time, one ends up intercepting the mpi call twice, 
which is


not desirable and not what should happen (and indeed doesn't happen in

other MPI implementations).

A simple example to illustrate is below. If somebody knows of a fix to

avoid this issue that would be great !

Thanks

Nick.

pmpi_test.c: mpicc pmpi_test.c -c

#include <stdio.h>
#include "mpi.h"
void mpi_comm_rank_(MPI_Comm *comm, int *rank, int *info) {
  printf("mpi_comm_rank call successfully intercepted\n");
  pmpi_comm_rank_(comm,rank,info);
}
int MPI_Comm_rank(MPI_Comm comm, int *rank) {
  printf("MPI_comm_rank call successfully intercepted\n");
  return PMPI_Comm_rank(comm, rank);
}

hello_mpi.f: mpif77 hello_mpi.f pmpi_test.o

  program hello
   implicit none
   include 'mpif.h'
   integer ierr
   integer myid,nprocs
   character*24 fdate,host
   call MPI_Init( ierr )
  myid=0
  call mpi_comm_rank(MPI_COMM_WORLD, myid, ierr )
  call mpi_comm_size(MPI_COMM_WORLD , nprocs, ierr )
  call getenv('HOST',host)
  write (*,*) 'Hello World from proc',myid,' out of',nprocs,host
  call mpi_finalize(ierr)
  end



___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users

___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users

___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users


___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users


--
Edgar Gabriel
Assistant Professor
Parallel Software Technologies Lab  http://pstl.cs.uh.edu
Department of Computer Science  University of Houston
Philip G. Hoffman Hall, Room 524Houston, TX-77204, USA
Tel: +1 (713) 743-3857  Fax: +1 (713) 743-3335


Re: [OMPI users] Hybrid program

2008-12-02 Thread Edgar Gabriel
it's on OpenSuSE 11 with kernel 2.6.25.11. I don't know the libnuma 
library version, but I suspect that it's fairly new.


I will try to investigate that a little more in the next days. I do 
think that they use sched_setaffinity() under the hood (because in 
one of my failed attempts, when I passed in the wrong argument, I 
actually got the same error message that I got earlier with 
sched_setaffinity), but they must do something additional underneath.


Anyway, I just wanted to report the result, and that there is obviously 
a difference, even if I can't explain it in detail right now.


Thanks
Edgar

Jeff Squyres wrote:

On Dec 2, 2008, at 11:27 AM, Edgar Gabriel wrote:

so I ran a couple of tests today and I cannot confirm your statement. 
I wrote a simple test code where a process first sets an 
affinity mask and then spawns a number of threads. The threads modify 
the affinity mask and every thread (including the master thread) 
prints out its affinity mask at the end.


With sched_getaffinity() and sched_setaffinity() it was indeed such 
that the master thread had the same affinity mask as the thread that 
it spawned. This means, that the modification of the affinity mask by 
a new thread in fact did affect the master thread.


Executing the same code sequence using the libnuma calls, however, the 
master thread was not affected by the new affinity mask of the 
children. So clearly, libnuma must be doing something differently.


What distro/version of Linux are you using, and what version of libnuma?

Libnuma v2.0.x very definitely is just a wrapper around the syscall for 
sched_setaffinity().  I downloaded it from:


ftp://oss.sgi.com/www/projects/libnuma/download



--
Edgar Gabriel
Assistant Professor
Parallel Software Technologies Lab  http://pstl.cs.uh.edu
Department of Computer Science  University of Houston
Philip G. Hoffman Hall, Room 524Houston, TX-77204, USA
Tel: +1 (713) 743-3857  Fax: +1 (713) 743-3335


Re: [OMPI users] Hybrid program

2008-12-02 Thread Edgar Gabriel

Jeff,

so I ran a couple of tests today and I cannot confirm your statement. I 
wrote a simple test code where a process first sets an affinity 
mask and then spawns a number of threads. The threads modify the 
affinity mask and every thread (including the master thread) prints out 
its affinity mask at the end.


With sched_getaffinity() and sched_setaffinity() it was indeed such that 
the master thread had the same affinity mask as the thread that it 
spawned. This means, that the modification of the affinity mask by a new 
thread in fact did affect the master thread.


Executing the same code sequence using the libnuma calls, however, the 
master thread was not affected by the new affinity mask of the 
children. So clearly, libnuma must be doing something differently.


The catch, however, is that while coding the example using libnuma, I 
realized that libnuma only allows you to assign a thread to a socket, not 
to a cpu/core, i.e. you do not have control over which cpu on the socket 
your threads are running on, only which socket.
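
A minimal sketch of the experiment described above (Linux-specific,
error handling omitted): the master binds itself to CPU 0, a worker
thread then installs a different mask, and both print what they end up
with afterwards.

#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <stdio.h>

static void *worker(void *arg)
{
    cpu_set_t mask;
    CPU_ZERO(&mask);
    CPU_SET(1, &mask);                         /* worker asks for CPU 1 only */
    sched_setaffinity(0, sizeof(mask), &mask);
    sched_getaffinity(0, sizeof(mask), &mask);
    printf("worker: CPU0=%d CPU1=%d\n",
           CPU_ISSET(0, &mask), CPU_ISSET(1, &mask));
    return arg;
}

int main(void)
{
    cpu_set_t mask;
    pthread_t tid;

    CPU_ZERO(&mask);
    CPU_SET(0, &mask);                         /* master binds itself to CPU 0 */
    sched_setaffinity(0, sizeof(mask), &mask);

    pthread_create(&tid, NULL, worker, NULL);
    pthread_join(tid, NULL);

    /* print the master's mask after the worker ran, to see whether it changed */
    sched_getaffinity(0, sizeof(mask), &mask);
    printf("master: CPU0=%d CPU1=%d\n",
           CPU_ISSET(0, &mask), CPU_ISSET(1, &mask));
    return 0;
}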


Thanks
Edgar

Jeff Squyres wrote:

On Nov 20, 2008, at 9:43 AM, Ralph Castain wrote:


Interesting - learn something new every day! :-)


Sorry; I was out for the holiday last week, but a clarification: 
libnuma's man page says that numa_run_on_node*() binds a "thread", but 
it really should say "process".  I looked at the code, and they're 
simply implementing a wrapper around sched_setaffinity(), which is a 
per-process binding.  Not a per-thread binding.



On Nov 20, 2008, at 7:34 AM, Edgar Gabriel wrote:

if you look at recent versions of libnuma, there are two functions 
called numa_run_on_node() and numa_run_on_node_mask(), which allow 
thread-based assignments to CPUs




--
Edgar Gabriel
Assistant Professor
Parallel Software Technologies Lab  http://pstl.cs.uh.edu
Department of Computer Science  University of Houston
Philip G. Hoffman Hall, Room 524Houston, TX-77204, USA
Tel: +1 (713) 743-3857  Fax: +1 (713) 743-3335


Re: [OMPI users] Hybrid program

2008-11-20 Thread Edgar Gabriel
I don't think that they conflict with our paffinity module and setting. 
My understanding is that if you set a new affinity mask, it simply 
overwrites the previous setting. So in the worst case it voids the 
setting made by Open MPI, but I don't think that it should cause 
'problems'. Admittedly, I haven't tried the library and the function 
calls yet, I just learned relatively recently about them...


Thanks
Edgar

Ralph Castain wrote:

Interesting - learn something new every day! :-)

How does this interact with OMPI's paffinity/maffinity assignments? With 
the rank/slot mapping and binding system?


Should users -not- set paffinity if they include these numa calls in 
their code?


Can we detect any potential conflict in OMPI and avoid setting 
paffinity_alone? Reason I ask: many systems set paffinity_alone in the 
default mca param file because they always assign dedicated nodes to 
users. While users can be told to be sure to turn it "off" when using 
these calls, it seems inevitable that they will forget - and complaints 
will appear.


Thanks
Ralph



On Nov 20, 2008, at 7:34 AM, Edgar Gabriel wrote:

if you look at recent versions of libnuma, there are two functions 
called numa_run_on_node() and numa_run_on_node_mask(), which allow 
thread-based assignments to CPUs


Thanks
Edgar

Gabriele Fatigati wrote:

Is there a way to assign one thread to one core? Also from code, not
necessary with OpenMPI option.
Thanks.
2008/11/19 Stephen Wornom <stephen.wor...@sophia.inria.fr>:

Gabriele Fatigati wrote:

Ok,
but in Ompi 1.3 how can i enable it?

This may not be relevant, but I could not get a hybrid mpi+OpenMP 
code to

work correctly.
Would my problem be related to Gabriele's and perhaps fixed in 
openmpi 1.3?

Stephen

2008/11/18 Ralph Castain <r...@lanl.gov>:

I am afraid it is only available in 1.3 - we didn't backport it to 
the

1.2
series


On Nov 18, 2008, at 10:06 AM, Gabriele Fatigati wrote:



Hi,
how can i set "slot mapping" as you told me? With TASK GEOMETRY? 
Or is

a new 1.3 OpenMPI feature?

Thanks.

2008/11/18 Ralph Castain <r...@lanl.gov>:

Unfortunately, paffinity doesn't know anything about assigning 
threads

to
cores. This is actually a behavior of Linux, which only allows
paffinity
to
be set at the process level. So, when you set paffinity on a 
process,

you
bind all threads of that process to the specified core(s). You 
cannot

specify that a thread be given a specific core.

In this case, your two threads/process are sharing the same core 
and

thus
contending for it. As you'd expect in that situation, one thread 
gets

the
vast majority of the attention, while the other thread is mostly 
idle.


If you can upgrade to the beta 1.3 release, try using the slot 
mapping

to
assign multiple cores to each process. This will ensure that the
threads
for
that process have exclusive access to those cores, but will not 
bind a
particular thread to one core - the threads can "move around" 
across

the
specified set of cores. Your threads will then be able to run 
without

interfering with each other.

Ralph


On Nov 18, 2008, at 9:18 AM, Gabriele Fatigati wrote:



Dear OpenMPI developers,
i have a strange problem with mixed program MPI+OPENMP over 
OpenMPI
1.2.6. I'm using PJL TASK  GEOMETRY in LSF Scheduler, setting 2 
MPI

process every compute node, and 2 OMP threads per process. Using
paffinity and maffinity, i've noted that over every node, i have 2
thread that works 100%, and 2 threads doesn't works, or works very
few.

If i disable paffinity and maffinity, 4 threads works well, 
without

load imbalance.
I don't understand this issue: paffinity and maffinity should map
every thread over a specific core, optimizing the cache flow, 
but i

have this without settings there!

Can i use paffinity and maffinity in mixed MPI+OpenMP program? 
Or it

works only over MPI thread?

Thanks in advance.


--
Ing. Gabriele Fatigati

CINECA Systems & Tecnologies Department

Supercomputing  Group

Via Magnanelli 6/3, Casalecchio di Reno (BO) Italy

www.cineca.itTel:   +39 051 6171722

g.fatig...@cineca.it
___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users


___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users




--
Ing. Gabriele Fatigati

CINECA Systems & Tecnologies Department

Supercomputing  Group

Via Magnanelli 6/3, Casalecchio di Reno (BO) Italy

www.cineca.itTel:   +39 051 6171722

g.fatig...@cineca.it
___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users


___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users









--
stephen.wor...@sophia.inria.fr
2004 rou

Re: [OMPI users] Hybrid program

2008-11-20 Thread Edgar Gabriel
I would guess that you can, if the library is installed, and as far as I 
know it is part of most recent Linux distributions...


Thanks
Edgar

Gabriele Fatigati wrote:

Thanks Edgar,
but can I use these libraries also on non-NUMA machines?

2008/11/20 Edgar Gabriel <gabr...@cs.uh.edu>:

if you look at recent versions of libnuma, there are two functions called
numa_run_on_node() and numa_run_on_node_mask(), which allow thread-based
assignments to CPUs

Thanks
Edgar

Gabriele Fatigati wrote:

Is there a way to assign one thread to one core? Also from code, not
necessary with OpenMPI option.

Thanks.

2008/11/19 Stephen Wornom <stephen.wor...@sophia.inria.fr>:

Gabriele Fatigati wrote:

Ok,
but in Ompi 1.3 how can i enable it?


This may not be relevant, but I could not get a hybrid mpi+OpenMP code to
work correctly.
Would my problem be related to Gabriele's and perhaps fixed in openmpi
1.3?
Stephen

2008/11/18 Ralph Castain <r...@lanl.gov>:


I am afraid it is only available in 1.3 - we didn't backport it to the
1.2
series


On Nov 18, 2008, at 10:06 AM, Gabriele Fatigati wrote:



Hi,
how can i set "slot mapping" as you told me? With TASK GEOMETRY? Or is
a new 1.3 OpenMPI feature?

Thanks.

2008/11/18 Ralph Castain <r...@lanl.gov>:


Unfortunately, paffinity doesn't know anything about assigning
threads
to
cores. This is actually a behavior of Linux, which only allows
paffinity
to
be set at the process level. So, when you set paffinity on a process,
you
bind all threads of that process to the specified core(s). You cannot
specify that a thread be given a specific core.

In this case, your two threads/process are sharing the same core and
thus
contending for it. As you'd expect in that situation, one thread gets
the
vast majority of the attention, while the other thread is mostly
idle.

If you can upgrade to the beta 1.3 release, try using the slot
mapping
to
assign multiple cores to each process. This will ensure that the
threads
for
that process have exclusive access to those cores, but will not bind
a
particular thread to one core - the threads can "move around" across
the
specified set of cores. Your threads will then be able to run without
interfering with each other.

Ralph


On Nov 18, 2008, at 9:18 AM, Gabriele Fatigati wrote:



Dear OpenMPI developers,
i have a strange problem with mixed program MPI+OPENMP over OpenMPI
1.2.6. I'm using PJL TASK  GEOMETRY in LSF Scheduler, setting 2 MPI
process every compute node, and 2 OMP threads per process. Using
paffinity and maffinity, i've noted that over every node, i have 2
thread that works 100%, and 2 threads doesn't works, or works very
few.

If i disable paffinity and maffinity, 4 threads works well, without
load imbalance.
I don't understand this issue: paffinity and maffinity should map
every thread over a specific core, optimizing the cache flow, but i
have this without settings there!

Can i use paffinity and maffinity in mixed MPI+OpenMP program? Or it
works only over MPI thread?

Thanks in advance.


--
Ing. Gabriele Fatigati

CINECA Systems & Tecnologies Department

Supercomputing  Group

Via Magnanelli 6/3, Casalecchio di Reno (BO) Italy

www.cineca.itTel:   +39 051 6171722

g.fatig...@cineca.it
___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users


___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users




--
Ing. Gabriele Fatigati

CINECA Systems & Tecnologies Department

Supercomputing  Group

Via Magnanelli 6/3, Casalecchio di Reno (BO) Italy

www.cineca.itTel:   +39 051 6171722

g.fatig...@cineca.it
___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users


___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users







--
stephen.wor...@sophia.inria.fr
2004 route des lucioles - BP93
Sophia Antipolis
06902 CEDEX

Tel: 04 92 38 50 54
Fax: 04 97 15 53 51


___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users





___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users








--
Edgar Gabriel
Assistant Professor
Parallel Software Technologies Lab  http://pstl.cs.uh.edu
Department of Computer Science  University of Houston
Philip G. Hoffman Hall, Room 524Houston, TX-77204, USA
Tel: +1 (713) 743-3857  Fax: +1 (713) 743-3335


Re: [OMPI users] Hybrid program

2008-11-20 Thread Edgar Gabriel
if you look at recent versions of libnuma, there are two functions 
called numa_run_on_node() and numa_run_on_node_mask(), which allow 
thread-based assignments to CPUs


Thanks
Edgar

Gabriele Fatigati wrote:

Is there a way to assign one thread to one core? Also from code, not
necessary with OpenMPI option.

Thanks.

2008/11/19 Stephen Wornom :

Gabriele Fatigati wrote:

Ok,
but in Ompi 1.3 how can i enable it?


This may not be relevant, but I could not get a hybrid mpi+OpenMP code to
work correctly.
Would my problem be related to Gabriele's and perhaps fixed in openmpi 1.3?
Stephen

2008/11/18 Ralph Castain :


I am afraid it is only available in 1.3 - we didn't backport it to the
1.2
series


On Nov 18, 2008, at 10:06 AM, Gabriele Fatigati wrote:



Hi,
how can i set "slot mapping" as you told me? With TASK GEOMETRY? Or is
a new 1.3 OpenMPI feature?

Thanks.

2008/11/18 Ralph Castain :


Unfortunately, paffinity doesn't know anything about assigning threads
to
cores. This is actually a behavior of Linux, which only allows
paffinity
to
be set at the process level. So, when you set paffinity on a process,
you
bind all threads of that process to the specified core(s). You cannot
specify that a thread be given a specific core.

In this case, your two threads/process are sharing the same core and
thus
contending for it. As you'd expect in that situation, one thread gets
the
vast majority of the attention, while the other thread is mostly idle.

If you can upgrade to the beta 1.3 release, try using the slot mapping
to
assign multiple cores to each process. This will ensure that the
threads
for
that process have exclusive access to those cores, but will not bind a
particular thread to one core - the threads can "move around" across
the
specified set of cores. Your threads will then be able to run without
interfering with each other.

Ralph


On Nov 18, 2008, at 9:18 AM, Gabriele Fatigati wrote:



Dear OpenMPI developers,
i have a strange problem with mixed program MPI+OPENMP over OpenMPI
1.2.6. I'm using PJL TASK  GEOMETRY in LSF Scheduler, setting 2 MPI
process every compute node, and 2 OMP threads per process. Using
paffinity and maffinity, i've noted that over every node, i have 2
thread that works 100%, and 2 threads doesn't works, or works very
few.

If i disable paffinity and maffinity, 4 threads works well, without
load imbalance.
I don't understand this issue: paffinity and maffinity should map
every thread over a specific core, optimizing the cache flow, but i
have this without settings there!

Can i use paffinity and maffinity in mixed MPI+OpenMP program? Or it
works only over MPI thread?

Thanks in advance.


--
Ing. Gabriele Fatigati

CINECA Systems & Tecnologies Department

Supercomputing  Group

Via Magnanelli 6/3, Casalecchio di Reno (BO) Italy

www.cineca.itTel:   +39 051 6171722

g.fatig...@cineca.it
___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users


___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users




--
Ing. Gabriele Fatigati

CINECA Systems & Tecnologies Department

Supercomputing  Group

Via Magnanelli 6/3, Casalecchio di Reno (BO) Italy

www.cineca.itTel:   +39 051 6171722

g.fatig...@cineca.it
___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users


___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users









--
stephen.wor...@sophia.inria.fr
2004 route des lucioles - BP93
Sophia Antipolis
06902 CEDEX

Tel: 04 92 38 50 54
Fax: 04 97 15 53 51


___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users







Re: [OMPI users] runtime warnings with MPI_File_write_ordered

2008-07-18 Thread Edgar Gabriel
here is a patch that we use on our development version to silence that 
warning; you have to apply it to


ompi/ompi/mca/io/romio/romio/mpi-io/io_romio_close.c

I would not like to commit that to the repository since I cannot 
tell whether it causes problems in some other settings/scenarios/file 
systems. However, it fixed the problems for us when experimenting with 
shared file pointers (e.g. MPI_File_write_ordered) and did not create 
any issues so far.


Apply that patch at your own risk :-)

Thanks
Edgar


Brian Austin wrote:

Hi,

Sorry about my previous message, it was sent before I'd finished 
composing it.


Whenever I use MPI_File_write_ordered(), all but one process send the 
following message to stderr.

ADIOI_GEN_DELETE (line 22): **io No such file or directory

I have read
http://www.open-mpi.org/community/lists/users/2008/01/4936.php
which suggests that this message appears because my program is trying 
to delete a file that does not exist, but my program does not 
explicitly delete any files. I've included a test program to 
demonstrate the message.


Is there anything I can do to avoid or suppress this message?
The message I referred to before says that I could "ignore errors from 
MPI_File_delete".  How do I do that?


Thanks,
Brian

int
main( int argc, char *argv[]){

  char buff[2] = "a";
  MPI_File fh;
  MPI_Status status;

  MPI_Init( &argc, &argv );

  MPI_File_open( MPI_COMM_WORLD, "foo.txt",
 MPI_MODE_CREATE | MPI_MODE_WRONLY,
 MPI_INFO_NULL, &fh );

  MPI_File_write_ordered( fh, buff, 1, MPI_BYTE, &status );

  MPI_File_close( &fh );

  MPI_Finalize();

  return 0;
}//main



___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users


42d41
< 
66,73c65
< 	int rank;
< 	MPI_Comm_rank ( (fh)->comm, &rank );
< 	if ( rank == 0 ) {
< 		ADIO_Close((fh)->shared_fp_fd, &error_code);
< 	}
< 	else {
< 		error_code = MPI_SUCCESS;
< 	}
---
> 	ADIO_Close((fh)->shared_fp_fd, &error_code);


Re: [OMPI users] Problem with MPI_Scatter() on inter-communicator...

2008-04-10 Thread Edgar Gabriel

done...

Jeff Squyres wrote:

Edgar --

Can you file a CMR for v1.2?

On Apr 10, 2008, at 8:10 AM, Edgar Gabriel wrote:
thanks for reporting the bug, it is fixed on the trunk. The problem was
this time not in the algorithm, but in the checking of the
preconditions. If recvcount was zero and the rank not equal to the rank
of the root, then we did not even start the scatter, assuming that there
was nothing to do. For inter-communicators the check however has to be
extended to accept recvcount=0 for root=MPI_ROOT. The fix is in the
trunk in rev. 18123.
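
A sketch of the extended precondition (not the literal rev. 18123 change;
the is_intercomm flag stands in for the internal inter-communicator
check):

#include <mpi.h>

/* returns 1 if this rank may legally skip the scatter, 0 otherwise */
static int scatter_can_skip(int recvcount, int rank, int root, int is_intercomm)
{
    if (is_intercomm) {
        /* recvcount == 0 is legal at root == MPI_ROOT and must not
           short-circuit the call; only uninvolved ranks may skip */
        return MPI_PROC_NULL == root;
    }
    return 0 == recvcount && rank != root;     /* intracommunicator shortcut */
}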

Thanks
Edgar

Edgar Gabriel wrote:

I don't think that anybody answered your email so far; I'll have a
look at it on Thursday...

Thanks
Edgar

Audet, Martin wrote:

Hi,

I don't know if it is my sample code or if it is a problem with
MPI_Scatter() on an inter-communicator (maybe similar to the problem
we found with MPI_Allgather() on an inter-communicator a few weeks
ago), but a simple program I wrote freezes during the second
iteration of a loop doing an MPI_Scatter() over an
inter-communicator.


For example if I compile as follows:

 mpicc -Wall scatter_bug.c -o scatter_bug

I get no error or warning. Then if I start it with np=2 as follows:

   mpiexec -n 2 ./scatter_bug

it prints:

  beginning Scatter i_root_group=0
  ending Scatter i_root_group=0
  beginning Scatter i_root_group=1

and then hangs...

Note also that if I change the for loop to execute only the  
MPI_Scatter() of the second iteration (e.g. replacing  
"i_root_group=0;" by "i_root_group=1;"), it prints:


   beginning Scatter i_root_group=1

and then hangs...

The problem therefore seems to be related to the second iteration itself.


Please note that this program runs fine with mpich2 1.0.7rc2
(ch3:sock device) for many different numbers of processes (np), whether
the executable is run with or without valgrind.


The OpenMPI version I use is 1.2.6rc3 and was configured as follows:

  ./configure --prefix=/usr/local/openmpi-1.2.6rc3 --disable-mpi-f77
--disable-mpi-f90 --disable-mpi-cxx --disable-cxx-exceptions
--with-io-romio-flags=--with-file-system=ufs+nfs


Note also that all processes (when using OpenMPI or mpich2) were
started on the same machine.


Also, if you look at the source code, you will notice that some
arguments to MPI_Scatter() are NULL or 0. This may look strange
and problematic when using a normal intra-communicator. However,
according to the book "MPI - The complete reference" vol 2 about
MPI-2, for MPI_Scatter() with an inter-communicator:


 "The sendbuf, sendcount and sendtype arguments are significant  
only at the root process. The recvbuf, recvcount, and recvtype  
arguments are significant only at the processes of the leaf group."


If anyone else can have a look at this program and try it, it would
be helpful.


Thanks,

Martin


#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

int main(int argc, char **argv)
{
  int ret_code = 0;
  int comm_size, comm_rank;

  MPI_Init(&argc, &argv);

  MPI_Comm_size(MPI_COMM_WORLD, &comm_size);
  MPI_Comm_rank(MPI_COMM_WORLD, &comm_rank);

  if (comm_size > 1) {
 MPI_Comm subcomm, intercomm;
 const int group_id = comm_rank % 2;
 int i_root_group;

 /* split process in two groups:  even and odd comm_ranks. */
 MPI_Comm_split(MPI_COMM_WORLD, group_id, 0, &subcomm);

 /* The remote leader comm_rank for even and odd groups are  
respectively: 1 and 0 */
 MPI_Intercomm_create(subcomm, 0, MPI_COMM_WORLD, 1-group_id, 0, &intercomm);


 /* for i_root_group==0 process with comm_rank==0 scatter data  
to all process with odd  comm_rank */
 /* for i_root_group==1 process with comm_rank==1 scatter data  
to all process with even comm_rank */

 for (i_root_group=0; i_root_group < 2; i_root_group++) {
if (comm_rank == 0) {
   printf("beginning Scatter i_root_group=%d 
\n",i_root_group);

}
if (group_id == i_root_group) {
   const int  is_root  = (comm_rank == i_root_group);
   int   *send_buf = NULL;
   if (is_root) {
  const int nbr_other = (comm_size+i_root_group)/2;
  int   ii;
  send_buf = malloc(nbr_other*sizeof(*send_buf));
  for (ii=0; ii < nbr_other; ii++) {
  send_buf[ii] = ii;
  }
   }
   MPI_Scatter(send_buf, 1, MPI_INT,
   NULL, 0, MPI_INT, (is_root ? MPI_ROOT :  
MPI_PROC_NULL), intercomm);


   if (is_root) {
  free(send_buf);
   }
}
else {
   int an_int;
   MPI_Scatter(NULL, 0, MPI_INT,
   &an_int, 1, MPI_INT, 0, intercomm);
}
if (comm_rank == 0) {
   printf("ending Scatter i_root_group=%d\n",i_root_group);
}
 }

 MPI_Comm_free(&intercomm);
 MPI_Comm_free(&subcomm);
  }
  else {
 fprintf(stderr, "%s: error this program must be started np >  

Re: [OMPI users] Problem with MPI_Scatter() on inter-communicator...

2008-04-10 Thread Edgar Gabriel
thanks for reporting the bug, it is fixed on the trunk. The problem was
this time not in the algorithm, but in the checking of the
preconditions. If recvcount was zero and the rank not equal to the rank
of the root, then we did not even start the scatter, assuming that there
was nothing to do. For inter-communicators, however, the check has to be
extended to accept recvcount=0 for root=MPI_ROOT. The fix is in the
trunk in rev. 18123.
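
(A hedged sketch of the shape of that precondition check; this is
illustrative C only, written from the description above, not the actual
Open MPI r18123 code, and the helper name is made up.)

#include <mpi.h>

/* Returns 1 if the scatter can return immediately without communicating.
 * The key point of the fix described above: on an inter-communicator the
 * root passes root == MPI_ROOT and its recvcount is not significant, so
 * recvcount == 0 must not trigger the shortcut for it. */
static int scatter_has_nothing_to_do(int recvcount, int root, int my_rank)
{
    if (0 != recvcount) {
        return 0;              /* there is data to receive */
    }
    if (MPI_ROOT == root) {
        return 0;              /* inter-communicator root must still scatter */
    }
    if (MPI_PROC_NULL == root) {
        return 1;              /* non-participating process of the root group */
    }
    return (my_rank != root);  /* intra-communicator non-root with recvcount 0 */
}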


Thanks
Edgar

Edgar Gabriel wrote:
I don't think that anybody has answered your email so far; I'll have a 
look at it on Thursday...


Thanks
Edgar

Audet, Martin wrote:

Hi,

I don't know if it is my sample code or if it is a problem with MPI_Scatter() 
on an inter-communicator (maybe similar to the problem we found with 
MPI_Allgather() on an inter-communicator a few weeks ago), but a simple program I 
wrote freezes during the second iteration of a loop doing an MPI_Scatter() over 
an inter-communicator.

For example if I compile as follows:

  mpicc -Wall scatter_bug.c -o scatter_bug

I get no error or warning. Then if I start it with np=2 as follows:

mpiexec -n 2 ./scatter_bug

it prints:

   beginning Scatter i_root_group=0
   ending Scatter i_root_group=0
   beginning Scatter i_root_group=1

and then hangs...

Note also that if I change the for loop to execute only the MPI_Scatter() of the second iteration 
(e.g. replacing "i_root_group=0;" by "i_root_group=1;"), it prints:

beginning Scatter i_root_group=1

and then hangs...

The problem therefore seems to be related to the second iteration itself.

Please note that this program runs fine with mpich2 1.0.7rc2 (ch3:sock device) 
for many different numbers of processes (np), whether the executable is run 
with or without valgrind.

The OpenMPI version I use is 1.2.6rc3 and was configured as follows:

   ./configure --prefix=/usr/local/openmpi-1.2.6rc3 --disable-mpi-f77 
--disable-mpi-f90 --disable-mpi-cxx --disable-cxx-exceptions 
--with-io-romio-flags=--with-file-system=ufs+nfs

Note also that all processes (when using OpenMPI or mpich2) were started on the 
same machine.

Also, if you look at the source code, you will notice that some arguments to MPI_Scatter() are 
NULL or 0. This may look strange and problematic when using a normal intra-communicator. 
However, according to the book "MPI - The complete reference" vol 2 about MPI-2, 
for MPI_Scatter() with an inter-communicator:

  "The sendbuf, sendcount and sendtype arguments are significant only at the root 
process. The recvbuf, recvcount, and recvtype arguments are significant only at the 
processes of the leaf group."

If anyone else can have a look at this program and try it, it would be helpful.

Thanks,

Martin


#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

int main(int argc, char **argv)
{
   int ret_code = 0;
   int comm_size, comm_rank;

   MPI_Init(&argc, &argv);

   MPI_Comm_size(MPI_COMM_WORLD, &comm_size);
   MPI_Comm_rank(MPI_COMM_WORLD, &comm_rank);

   if (comm_size > 1) {
  MPI_Comm subcomm, intercomm;
  const int group_id = comm_rank % 2;
  int i_root_group;

  /* split process in two groups:  even and odd comm_ranks. */
  MPI_Comm_split(MPI_COMM_WORLD, group_id, 0, &subcomm);

  /* The remote leader comm_rank for even and odd groups are respectively: 
1 and 0 */
  MPI_Intercomm_create(subcomm, 0, MPI_COMM_WORLD, 1-group_id, 0, &intercomm);

  /* for i_root_group==0 process with comm_rank==0 scatter data to all 
process with odd  comm_rank */
  /* for i_root_group==1 process with comm_rank==1 scatter data to all 
process with even comm_rank */
  for (i_root_group=0; i_root_group < 2; i_root_group++) {
 if (comm_rank == 0) {
printf("beginning Scatter i_root_group=%d\n",i_root_group);
 }
 if (group_id == i_root_group) {
const int  is_root  = (comm_rank == i_root_group);
int   *send_buf = NULL;
if (is_root) {
   const int nbr_other = (comm_size+i_root_group)/2;
   int   ii;
   send_buf = malloc(nbr_other*sizeof(*send_buf));
   for (ii=0; ii < nbr_other; ii++) {
   send_buf[ii] = ii;
   }
}
MPI_Scatter(send_buf, 1, MPI_INT,
NULL, 0, MPI_INT, (is_root ? MPI_ROOT : 
MPI_PROC_NULL), intercomm);

if (is_root) {
   free(send_buf);
}
 }
 else {
int an_int;
MPI_Scatter(NULL, 0, MPI_INT,
&an_int, 1, MPI_INT, 0, intercomm);
 }
 if (comm_rank == 0) {
printf("ending Scatter i_root_group=%d\n",i_root_group);
 }
  }

  MPI_Comm_free(&intercomm);
  MPI_Comm_free(&subcomm);
   }
   else {
  fprintf(stderr, "%s: error this program must be started np > 1\n", 
argv[0]);
  ret_code = 1;
   }

   MPI_Finalize();

   return ret_code;
}


Re: [OMPI users] Problem with MPI_Scatter() on inter-communicator...

2008-04-08 Thread Edgar Gabriel
I don't think that anybody has answered your email so far; I'll have a 
look at it on Thursday...


Thanks
Edgar

Audet, Martin wrote:

Hi,

I don't know if it is my sample code or if it is a problem with MPI_Scatter() 
on an inter-communicator (maybe similar to the problem we found with 
MPI_Allgather() on an inter-communicator a few weeks ago), but a simple program I 
wrote freezes during the second iteration of a loop doing an MPI_Scatter() over 
an inter-communicator.

For example if I compile as follows:

  mpicc -Wall scatter_bug.c -o scatter_bug

I get no error or warning. Then if I start it with np=2 as follows:

mpiexec -n 2 ./scatter_bug

it prints:

   beginning Scatter i_root_group=0
   ending Scatter i_root_group=0
   beginning Scatter i_root_group=1

and then hangs...

Note also that if I change the for loop to execute only the MPI_Scatter() of the second iteration 
(e.g. replacing "i_root_group=0;" by "i_root_group=1;"), it prints:

beginning Scatter i_root_group=1

and then hangs...

The problem therefore seems to be related to the second iteration itself.

Please note that this program runs fine with mpich2 1.0.7rc2 (ch3:sock device) 
for many different numbers of processes (np), whether the executable is run 
with or without valgrind.

The OpenMPI version I use is 1.2.6rc3 and was configured as follows:

   ./configure --prefix=/usr/local/openmpi-1.2.6rc3 --disable-mpi-f77 
--disable-mpi-f90 --disable-mpi-cxx --disable-cxx-exceptions 
--with-io-romio-flags=--with-file-system=ufs+nfs

Note also that all processes (when using OpenMPI or mpich2) were started on the 
same machine.

Also, if you look at the source code, you will notice that some arguments to MPI_Scatter() are 
NULL or 0. This may look strange and problematic when using a normal intra-communicator. 
However, according to the book "MPI - The complete reference" vol 2 about MPI-2, 
for MPI_Scatter() with an inter-communicator:

  "The sendbuf, sendcount and sendtype arguments are significant only at the root 
process. The recvbuf, recvcount, and recvtype arguments are significant only at the 
processes of the leaf group."

If anyone else can have a look at this program and try it, it would be helpful.

Thanks,

Martin


#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

int main(int argc, char **argv)
{
   int ret_code = 0;
   int comm_size, comm_rank;

   MPI_Init(&argc, &argv);

   MPI_Comm_size(MPI_COMM_WORLD, &comm_size);
   MPI_Comm_rank(MPI_COMM_WORLD, &comm_rank);

   if (comm_size > 1) {
  MPI_Comm subcomm, intercomm;
  const int group_id = comm_rank % 2;
  int i_root_group;

  /* split process in two groups:  even and odd comm_ranks. */
  MPI_Comm_split(MPI_COMM_WORLD, group_id, 0, &subcomm);

  /* The remote leader comm_rank for even and odd groups are respectively: 
1 and 0 */
  MPI_Intercomm_create(subcomm, 0, MPI_COMM_WORLD, 1-group_id, 0, &intercomm);

  /* for i_root_group==0 process with comm_rank==0 scatter data to all 
process with odd  comm_rank */
  /* for i_root_group==1 process with comm_rank==1 scatter data to all 
process with even comm_rank */
  for (i_root_group=0; i_root_group < 2; i_root_group++) {
 if (comm_rank == 0) {
printf("beginning Scatter i_root_group=%d\n",i_root_group);
 }
 if (group_id == i_root_group) {
const int  is_root  = (comm_rank == i_root_group);
int   *send_buf = NULL;
if (is_root) {
   const int nbr_other = (comm_size+i_root_group)/2;
   int   ii;
   send_buf = malloc(nbr_other*sizeof(*send_buf));
   for (ii=0; ii < nbr_other; ii++) {
   send_buf[ii] = ii;
   }
}
MPI_Scatter(send_buf, 1, MPI_INT,
NULL, 0, MPI_INT, (is_root ? MPI_ROOT : 
MPI_PROC_NULL), intercomm);

if (is_root) {
   free(send_buf);
}
 }
 else {
int an_int;
MPI_Scatter(NULL, 0, MPI_INT,
&an_int, 1, MPI_INT, 0, intercomm);
 }
 if (comm_rank == 0) {
printf("ending Scatter i_root_group=%d\n",i_root_group);
 }
  }

  MPI_Comm_free(&intercomm);
  MPI_Comm_free(&subcomm);
   }
   else {
  fprintf(stderr, "%s: error this program must be started np > 1\n", 
argv[0]);
  ret_code = 1;
   }

   MPI_Finalize();

   return ret_code;
}

___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users


--
Edgar Gabriel
Assistant Professor
Parallel Software Technologies Lab  http://pstl.cs.uh.edu
Department of Computer Science  University of Houston
Philip G. Hoffman Hall, Room 524Houston, TX-77204, USA
Tel: +1 (713) 743-3857  Fax: +1 (713) 743-3335


Re: [OMPI users] RE : RE : MPI_Comm_connect() fails

2008-03-17 Thread Edgar Gabriel
I wouldn't apply George's patch. George's patch in allgather did not 
look correct to me. He was mixing local and remote counts in a wrong way...
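
(For context, a short sketch of the local/remote distinction on an
inter-communicator; illustrative only, not the coll_basic source, and the
helper name is made up.)

#include <stdlib.h>
#include <mpi.h>

/* On an inter-communicator, each process receives recvcount elements from
 * every member of the *remote* group, so receive-side sizing uses the
 * remote group size, while send-side bookkeeping uses the local group
 * size. Mixing the two up is exactly the kind of bug discussed here. */
static int *alloc_inter_allgather_recvbuf(MPI_Comm intercomm, int recvcount)
{
    int remote_size;
    MPI_Comm_remote_size(intercomm, &remote_size);
    return malloc((size_t) remote_size * recvcount * sizeof(int));
}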


Thanks
Edgar

Audet, Martin wrote:

Edgar,

I merged the changes you did from -r17848:17849 in the trunk to OpenMPI version 
1.2.6rc2 with George's patch and my small examples now work.

Martin

From: users-boun...@open-mpi.org [users-boun...@open-mpi.org] on behalf of 
Edgar Gabriel [gabr...@cs.uh.edu]
Sent: 17 March 2008 15:59
To: Open MPI Users
Subject: Re: [OMPI users] RE : MPI_Comm_connect() fails

already working on it, together with a move_request
Thanks
Edgar

Jeff Squyres wrote:

Edgar --

Can you make a patch for the 1.2 series?

On Mar 17, 2008, at 3:45 PM, Edgar Gabriel wrote:


Martin,

I found the problem in the inter-allgather, and fixed it in patch
17849.
The same test, however, using MPI_Intercomm_create (just to simplify my
life compared to Connect/Accept) with 2 vs. 4 processes in the two
groups passes for me -- and did fail with the previous version.
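
(A hedged sketch of that kind of test, reconstructed from the description
above rather than taken from the actual test suite; run it with at least
3 processes, e.g. "mpiexec -n 6 ./a.out", so the two groups have 2 and 4
members.)

#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Comm local_comm, inter_comm;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (size < 3) {
        if (rank == 0) fprintf(stderr, "run with at least 3 processes\n");
        MPI_Finalize();
        return 1;
    }

    /* The first two ranks form one group, the remaining ranks the other. */
    int color = (rank < 2) ? 0 : 1;
    MPI_Comm_split(MPI_COMM_WORLD, color, 0, &local_comm);

    /* Remote leaders (in MPI_COMM_WORLD) are rank 2 for group 0 and
     * rank 0 for group 1. */
    MPI_Intercomm_create(local_comm, 0, MPI_COMM_WORLD,
                         (color == 0) ? 2 : 0, 0, &inter_comm);

    int remote_size;
    MPI_Comm_remote_size(inter_comm, &remote_size);

    /* Each process contributes its world rank; every process receives one
     * value from each process of the remote group. */
    int *recvbuf = malloc(remote_size * sizeof(int));
    MPI_Allgather(&rank, 1, MPI_INT, recvbuf, 1, MPI_INT, inter_comm);

    printf("rank %d received %d values from the remote group\n",
           rank, remote_size);

    free(recvbuf);
    MPI_Comm_free(&inter_comm);
    MPI_Comm_free(&local_comm);
    MPI_Finalize();
    return 0;
}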


Thanks
Edgar


Audet, Martin wrote:

Hi Jeff,

As I said in my last message (see below), the patch (or at least
the patch I got) doesn't fix the problem for me. Whether I apply it
over OpenMPI 1.2.5 or 1.2.6rc2, I still get the same problem:

 The client aborts with a truncation error message while the server
freezes when, for example, the server is started on 3 processes and the
client on 2 processes.

Feel free to try yourself the two small client and server programs
I posted in my first message.

Thanks,

Martin


Subject: [OMPI users] RE : users Digest, Vol 841, Issue 3
From: Audet, Martin (Martin.Audet_at_[hidden])
Date: 2008-03-13 17:04:25

Hi Georges,

Thanks for your patch, but I'm not sure I got it correctly. The
patch I got modifies a few arguments passed to isend()/irecv()/recv()
in coll_basic_allgather.c. Here is the patch I applied:

Index: ompi/mca/coll/basic/coll_basic_allgather.c
===
--- ompi/mca/coll/basic/coll_basic_allgather.c (revision 17814)
+++ ompi/mca/coll/basic/coll_basic_allgather.c (working copy)
@@ -149,7 +149,7 @@
}

/* Do a send-recv between the two root procs. to avoid
deadlock */
- err = MCA_PML_CALL(isend(sbuf, scount, sdtype, 0,
+ err = MCA_PML_CALL(isend(sbuf, scount, sdtype, root,
 MCA_COLL_BASE_TAG_ALLGATHER,
 MCA_PML_BASE_SEND_STANDARD,
 comm, [rsize]));
@@ -157,7 +157,7 @@
return err;
}

- err = MCA_PML_CALL(irecv(rbuf, rcount, rdtype, 0,
+ err = MCA_PML_CALL(irecv(rbuf, rcount, rdtype, root,
 MCA_COLL_BASE_TAG_ALLGATHER, comm,
 [0]));
if (OMPI_SUCCESS != err) {
@@ -186,14 +186,14 @@
return err;
}

- err = MCA_PML_CALL(isend(rbuf, rsize * rcount, rdtype, 0,
+ err = MCA_PML_CALL(isend(rbuf, rsize * scount, sdtype, root,
 MCA_COLL_BASE_TAG_ALLGATHER,
 MCA_PML_BASE_SEND_STANDARD, comm,
));
if (OMPI_SUCCESS != err) {
goto exit;
}

- err = MCA_PML_CALL(recv(tmpbuf, size * scount, sdtype, 0,
+ err = MCA_PML_CALL(recv(tmpbuf, size * rcount, rdtype, root,
MCA_COLL_BASE_TAG_ALLGATHER, comm,
MPI_STATUS_IGNORE));
if (OMPI_SUCCESS != err) {

However, with this patch I still have the problem. Suppose I start
the server with three processes and the client with two; the client
prints:

[audet_at_linux15 dyn_connect]$ mpiexec --universe univ1 -n 2 ./
aclient '0.2.0:2000'
intercomm_flag = 1
intercomm_remote_size = 3
rem_rank_tbl[3] = { 0 1 2}
[linux15:26114] *** An error occurred in MPI_Allgather
[linux15:26114] *** on communicator
[linux15:26114] *** MPI_ERR_TRUNCATE: message truncated
[linux15:26114] *** MPI_ERRORS_ARE_FATAL (goodbye)
mpiexec noticed that job rank 0 with PID 26113 on node linux15
exited on signal 15 (Terminated).
[audet_at_linux15 dyn_connect]$

and aborts. The server on the other side simply hangs (as before).

Regards,

Martin

-Original Message-
From: users-boun...@open-mpi.org [mailto:users-bounces@open-
mpi.org] On Behalf Of Jeff Squyres
Sent: March 14, 2008 19:45
To: Open MPI Users
Subject: Re: [OMPI users] RE : MPI_Comm_connect() fails

Yes, please let us know if this fixes it.  We're working on a 1.2.6
release; we can definitely put this fix in there if it's correct.

Thanks!


On Mar 13, 2008, at 4:07 PM, George Bosilca wrote:


I dug into the sources and I think you correctly pinpointed the bug.
It seems we have a mismatch between the local and remote sizes in
the inter-communicator allgather in the 1.2 series (which explains
the message truncation error when the local and remote groups have a
different number of processes). Attached to this email you can

Re: [OMPI users] RE : MPI_Comm_connect() fails

2008-03-17 Thread Edgar Gabriel

already working on it, together with a move_request
Thanks
Edgar

Jeff Squyres wrote:

Edgar --

Can you make a patch for the 1.2 series?

On Mar 17, 2008, at 3:45 PM, Edgar Gabriel wrote:


Martin,

I found the problem in the inter-allgather, and fixed it in patch  
17849.

The same test, however, using MPI_Intercomm_create (just to simplify my
life compared to Connect/Accept) with 2 vs. 4 processes in the two
groups passes for me -- and did fail with the previous version.


Thanks
Edgar


Audet, Martin wrote:

Hi Jeff,

As I said in my last message (see below), the patch (or at least
the patch I got) doesn't fix the problem for me. Whether I apply it
over OpenMPI 1.2.5 or 1.2.6rc2, I still get the same problem:


 The client aborts with a truncation error message while the server
freezes when, for example, the server is started on 3 processes and the
client on 2 processes.


Feel free to try yourself the two small client and server programs  
I posted in my first message.


Thanks,

Martin


Subject: [OMPI users] RE : users Digest, Vol 841, Issue 3
From: Audet, Martin (Martin.Audet_at_[hidden])
Date: 2008-03-13 17:04:25

Hi Georges,

Thanks for your patch, but I'm not sure I got it correctly. The
patch I got modifies a few arguments passed to isend()/irecv()/recv()
in coll_basic_allgather.c. Here is the patch I applied:


Index: ompi/mca/coll/basic/coll_basic_allgather.c
===
--- ompi/mca/coll/basic/coll_basic_allgather.c (revision 17814)
+++ ompi/mca/coll/basic/coll_basic_allgather.c (working copy)
@@ -149,7 +149,7 @@
}

/* Do a send-recv between the two root procs. to avoid  
deadlock */

- err = MCA_PML_CALL(isend(sbuf, scount, sdtype, 0,
+ err = MCA_PML_CALL(isend(sbuf, scount, sdtype, root,
 MCA_COLL_BASE_TAG_ALLGATHER,
 MCA_PML_BASE_SEND_STANDARD,
 comm, [rsize]));
@@ -157,7 +157,7 @@
return err;
}

- err = MCA_PML_CALL(irecv(rbuf, rcount, rdtype, 0,
+ err = MCA_PML_CALL(irecv(rbuf, rcount, rdtype, root,
 MCA_COLL_BASE_TAG_ALLGATHER, comm,
 [0]));
if (OMPI_SUCCESS != err) {
@@ -186,14 +186,14 @@
return err;
}

- err = MCA_PML_CALL(isend(rbuf, rsize * rcount, rdtype, 0,
+ err = MCA_PML_CALL(isend(rbuf, rsize * scount, sdtype, root,
 MCA_COLL_BASE_TAG_ALLGATHER,
 MCA_PML_BASE_SEND_STANDARD, comm,  
));

if (OMPI_SUCCESS != err) {
goto exit;
}

- err = MCA_PML_CALL(recv(tmpbuf, size * scount, sdtype, 0,
+ err = MCA_PML_CALL(recv(tmpbuf, size * rcount, rdtype, root,
MCA_COLL_BASE_TAG_ALLGATHER, comm,
MPI_STATUS_IGNORE));
if (OMPI_SUCCESS != err) {

However, with this patch I still have the problem. Suppose I start
the server with three processes and the client with two; the client
prints:


[audet_at_linux15 dyn_connect]$ mpiexec --universe univ1 -n 2 ./ 
aclient '0.2.0:2000'

intercomm_flag = 1
intercomm_remote_size = 3
rem_rank_tbl[3] = { 0 1 2}
[linux15:26114] *** An error occurred in MPI_Allgather
[linux15:26114] *** on communicator
[linux15:26114] *** MPI_ERR_TRUNCATE: message truncated
[linux15:26114] *** MPI_ERRORS_ARE_FATAL (goodbye)
mpiexec noticed that job rank 0 with PID 26113 on node linux15  
exited on signal 15 (Terminated).

[audet_at_linux15 dyn_connect]$

and aborts. The server on the other side simply hangs (as before).

Regards,

Martin

-Original Message-
From: users-boun...@open-mpi.org [mailto:users-bounces@open- 
mpi.org] On Behalf Of Jeff Squyres

Sent: March 14, 2008 19:45
To: Open MPI Users
Subject: Re: [OMPI users] RE : MPI_Comm_connect() fails

Yes, please let us know if this fixes it.  We're working on a 1.2.6
release; we can definitely put this fix in there if it's correct.

Thanks!


On Mar 13, 2008, at 4:07 PM, George Bosilca wrote:


I dug into the sources and I think you correctly pinpointed the bug.
It seems we have a mismatch between the local and remote sizes in
the inter-communicator allgather in the 1.2 series (which explains
the message truncation error when the local and remote groups have a
different number of processes). Attached to this email you can find
a patch that [hopefully] solves this problem. If you can, please test
it and let me know if this solves your problem.

Thanks,
  george.




On Mar 13, 2008, at 1:11 PM, Audet, Martin wrote:


Hi,

After re-checking the MPI standard (www.mpi-forum.org and MPI - The
Complete Reference), I'm more and more convinced that my small
example programs establishing an intercommunicator with
MPI_Comm_connect()/MPI_Comm_accept() over an MPI port and
exchanging data over it with MPI_Allgather() are correct. Especially
calling MPI_Allgather() with recvcount=1 (its third

--
Edgar Gabriel
Assistant Professor
Parallel Software Technologies Lab  http://pstl.cs.uh.edu
Department of Computer Science  University of Houston
Philip G. Hoffman Hall, Room 524Houston, TX-77204, USA
Tel: +1 (713) 743-3857  Fax: +1 (713) 743-3335


Re: [OMPI users] openMPI on 64 bit SUSE 10.2 OS

2008-02-12 Thread Edgar Gabriel
I doubt that this has anything to do with the platform. We have been 
running Open MPI successfully on a 64-bit architecture with SuSE 10.2 for 
quite a while. However, your configure log indicates that parpack could 
not be found, so you might have to change CFLAGS and LDFLAGS so that your 
configure script can find the corresponding library.


Hsieh, Pei-Ying (MED US) wrote:

configure: error: The MPI version needs parpack. Disabling MPI.
peiying@saturn:~/elmer/elmer-5.4.0/fem-5.4.0>   


Thanks
Edgar


Re: [OMPI users] Question about MPI_Waitany

2008-01-30 Thread Edgar Gabriel
I think you are mixing up two different things here: a NULL pointer is 
invalid, and thus Open MPI has to raise an error. If a request is 
MPI_REQUEST_NULL, that's perfectly legal according to the standard. 
However, MPI_REQUEST_NULL is not a NULL pointer; it's a well-defined value.
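
(For example, a minimal sketch of the distinction; illustrative code, not
taken from the original poster's program.)

#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_Request reqs[2] = { MPI_REQUEST_NULL, MPI_REQUEST_NULL };
    int index;
    MPI_Status status;

    MPI_Init(&argc, &argv);

    /* Legal: the array is properly allocated and every entry is the
     * well-defined handle MPI_REQUEST_NULL, so MPI_Waitany returns
     * immediately with index == MPI_UNDEFINED and an empty status. */
    MPI_Waitany(2, reqs, &index, &status);

    /* Illegal: passing a NULL pointer as the requests argument (with a
     * non-zero count), which Open MPI flags as MPI_ERR_REQUEST:
     * MPI_Waitany(2, NULL, &index, &status); */

    MPI_Finalize();
    return 0;
}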


Francisco Jesús Martínez Serrano wrote:

Hello Users,

the man page for MPI_Waitany states that

"The array_of_requests list *may contain null* or inactive handles. If
the list contains no active handles (list has length  zero  or all
entries are null or inactive), then the call returns immediately with
index = MPI_UNDEFINED, and an empty status."

I've been having problems with Open MPI and a code that runs fine with
LAM; I have managed to trace it to a call to MPI_Waitany with some
requests set to null (but properly allocated).

The current trunk code for ompi/mpi/c/waitany.c states:

int MPI_Waitany(int count, MPI_Request *requests, int *index,
MPI_Status *status)
{

OPAL_CR_TEST_CHECKPOINT_READY();

if ( MPI_PARAM_CHECK ) {
int i, rc = MPI_SUCCESS;
OMPI_ERR_INIT_FINALIZE(FUNC_NAME);
if ((NULL == requests) && (0 != count)) {
rc = MPI_ERR_REQUEST;
} else {
for (i = 0; i < count; i++) {
if (NULL == requests[i]) {
rc = MPI_ERR_REQUEST;
break;
}
}
}
if ((NULL == index) || (0 > count)) {
rc = MPI_ERR_ARG;
}
OMPI_ERRHANDLER_CHECK(rc, MPI_COMM_WORLD, rc, FUNC_NAME);
}
(...)


From what I understand in this code, if any of the requests is NULL
then an MPI_ERR_REQUEST error will be issued.

Is this a limitation of Open MPI (i.e. further processing of this
query will result in an error if a request is null), or a simple bug?
Of course, I could be mistaken... :-)

Cheers!
___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users


Re: [OMPI users] Simple MPI_Comm_spawn program hangs

2007-12-02 Thread Edgar Gabriel
MPI_Comm_spawn is tested nightly by our test suites, so it should 
definitely work...


Thanks
Edgar

Prakash Velayutham wrote:

Thanks Edgar. I did not know that. Really?

Anyway, are you sure an MPI job will work as a spawned process
instead of "hostname"?


Thanks,
Prakash


On Dec 1, 2007, at 5:56 PM, Edgar Gabriel wrote:

MPI_Comm_spawn has to build an intercommunicator with the child process
that it spawns. Thus, you can not spawn a non-MPI job such as
/bin/hostname, since the parent process waits for some messages from the
child process(es) in order to set up the intercommunicator.
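
(For example, a minimal MPI child program that could be spawned instead of
/bin/hostname; a sketch only, with the file name slave.c chosen just for
illustration.)

/* slave.c -- minimal program suitable as an MPI_Comm_spawn target.
 * Calling MPI_Init is what lets the parent finish building the
 * intercommunicator; a non-MPI program like /bin/hostname never does
 * this, so the parent blocks. */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_Comm parent;

    MPI_Init(&argc, &argv);
    MPI_Comm_get_parent(&parent);

    if (parent == MPI_COMM_NULL) {
        printf("not started via MPI_Comm_spawn\n");
    } else {
        printf("spawned child is up\n");
    }

    MPI_Finalize();
    return 0;
}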

Thanks
Edgar

Prakash Velayutham wrote:

Hello,

Open MPI 1.2.4

I am trying to run a simple C program.

##

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "mpi.h"

int
main(int argc, char **argv)
{

int tag = 0;
int my_rank;
int num_proc;
char    message_0[] = "hello slave, i'm your master";
char    message_1[50];
char    master_data[] = "slaves to work";
int array_of_errcodes[10];
int num;
MPI_Status  status;
MPI_Comm    inter_comm;
MPI_Info    info;
int arr[1];
int rc1;

MPI_Init(&argc, &argv);
MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
MPI_Comm_size(MPI_COMM_WORLD, &num_proc);

printf("MASTER : spawning a slave ... \n");
rc1 = MPI_Comm_spawn("/bin/hostname", MPI_ARGV_NULL, 1,
MPI_INFO_NULL, 0, MPI_COMM_WORLD, &inter_comm, arr);

MPI_Finalize();
exit(0);
}

##


This program hangs as below:

prakash@bmi-xeon1-01:~/thesis/CS/Samples> ./master1
MASTER : spawning a slave ...
bmi-xeon1-01

Any ideas  why?

Thanks,
Prakash
___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users

--
Edgar Gabriel
Assistant Professor
Parallel Software Technologies Lab  http://pstl.cs.uh.edu
Department of Computer Science  University of Houston
Philip G. Hoffman Hall, Room 524Houston, TX-77204, USA
Tel: +1 (713) 743-3857  Fax: +1 (713) 743-3335
___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users




--
Edgar Gabriel
Assistant Professor
Parallel Software Technologies Lab  http://pstl.cs.uh.edu
Department of Computer Science  University of Houston
Philip G. Hoffman Hall, Room 524Houston, TX-77204, USA
Tel: +1 (713) 743-3857  Fax: +1 (713) 743-3335

