Re: [OMPI users] New to (Open)MPI

2016-09-01 Thread Gilles Gouaillardet
Hi, FCoE is for storage, Ethernet is for the network. I assume you can ssh into your nodes, which means you have TCP/IP up and running. I do not know the details of the Cisco hardware, but you might be able to use usnic (native btl or via libfabric) instead of the plain TCP/IP netw

Re: [OMPI users] job aborts "readv failed: Connection reset by peer"

2016-09-02 Thread Gilles Gouaillardet
Did you run ulimit -c unlimited before invoking mpirun? If your application can be run with only one task, you can try to run it under gdb. You will hopefully be able to see where the illegal instruction occurs. Since you are running on AMD processors, you have to make sure you are not using an

Re: [OMPI users] job aborts "readv failed: Connection reset by peer"

2016-09-02 Thread Gilles Gouaillardet
You need to run the ulimit command before mpirun and on the same node. If it still does not work, then you can use a wrapper: instead of mpirun a.out you would do mpirun a.sh, where a.sh is a script that runs ulimit -c unlimited and then exec a.out. The core is created in the current directory. Cheers, Gilles On Saturda
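For illustration, a minimal sketch of such a wrapper, assuming the application binary is ./a.out as in the thread:

    #!/bin/sh
    # a.sh - lift the core size limit, then replace this shell with the real
    # application so the core file is written in the task's working directory
    ulimit -c unlimited
    exec ./a.out "$@"

It would then be launched with something like mpirun -np 4 ./a.sh (after chmod +x a.sh).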

Re: [OMPI users] Unable to mpirun from within torque

2016-09-07 Thread Gilles Gouaillardet
Hi, Which version of Open MPI are you running? I noted that though you are asking for three nodes and one task per node, you have been allocated only two nodes. I do not know if this is related to this issue. Note if you use the machinefile, a00551 has two slots (since it appears twice in the machi

Re: [OMPI users] Unable to mpirun from within torque

2016-09-07 Thread Gilles Gouaillardet
a00551.science.domain >[a00551.science.domain:04104] [[53688,0],0] plm:base:receive update proc >state command from [[53688,0],1] >[a00551.science.domain:04104] [[53688,0],0] plm:base:receive got >update_proc_state for job [53688,1] >[1,1]:a00551.science.domain >[a00551.science.domai

Re: [OMPI users] Unable to mpirun from within torque

2016-09-07 Thread Gilles Gouaillardet
>> [a00551.science.domain:04104] [[53688,0],0] complete_setup on job [53688,1] >> [a00551.science.domain:04104] [[53688,0],0] plm:base:receive update proc >> state command from [[53688,0],1] >> [a00551.science.domain:04104] [[53688,0],0] plm:base

Re: [OMPI users] Unable to mpirun from within torque

2016-09-07 Thread Gilles Gouaillardet
>tm...*sigh* > >Thanks again! > >Oswin > >On 2016-09-07 16:21, Gilles Gouaillardet wrote: >> Note the torque library will only show up if you configure'd with >> --disable-dlopen. Otherwise, you can ldd >> /.../lib/openmpi/mca_plm_tm.so >> >> C

Re: [OMPI users] Unable to mpirun from within torque

2016-09-07 Thread Gilles Gouaillardet
up spawning an orted on its own node. You might try ensuring that your machinefile is using the exact same name as provided in your allocation On Sep 7, 2016, at 7:06 AM, Gilles Gouaillardet wrote: Thanks for the logs. From what I see now, it looks like a00551 is running both mpirun and

Re: [OMPI users] Unable to mpirun from within torque

2016-09-07 Thread Gilles Gouaillardet
6-09-07 14:41, Gilles Gouaillardet wrote: Hi, Which version of Open MPI are you running? I noted that though you are asking for three nodes and one task per node, you have been allocated only two nodes. I do not know if this is related to this issue. Note if you use the machinefile, a00551 has two sl

Re: [OMPI users] Unable to mpirun from within torque

2016-09-08 Thread Gilles Gouaillardet
n options to do so. We give you lots and lots of knobs for just that reason. On Sep 7, 2016, at 10:53 PM, Gilles Gouaillardet wrote: Ralph, there might be an issue within Open MPI. On the cluster I used, hostname returns the FQDN, and $PBS_NODEFILE uses the FQDN too. My $PBS_NODEFILE ha

Re: [OMPI users] Unable to mpirun from within torque

2016-09-08 Thread Gilles Gouaillardet
a range of apps. Lesson to be learned: never, ever muddle around with a system-generated file. If you want to modify where things go, then use one or more of the mpirun options to do so. We give you lots and lots of knobs for just that reason. On Sep 7, 2016, at 10:53 PM, Gilles Gouaillardet

Re: [OMPI users] Unable to mpirun from within torque

2016-09-08 Thread Gilles Gouaillardet
component tm On 2016-09-08 10:18, Gilles Gouaillardet wrote: Ralph, I am not sure I am reading you correctly, so let me clarify. I did not hack $PBS_NODEFILE for fun or profit, I was simply trying to reproduce an issue I could not reproduce otherwise. /* my job submitted with -l nodes=3

Re: [OMPI users] Unable to mpirun from within torque

2016-09-08 Thread Gilles Gouaillardet
:receive got >update_proc_state for job [34937,1] >[a00551.science.domain:18889] [[34937,0],0] plm:base:receive got >update_proc_state for vpid 2 state NORMALLY TERMINATED exit_code 0 >[a00551.science.domain:18889] [[34937,0],0] plm:base:receive done >processing commands >[a00551

Re: [OMPI users] Unable to mpirun from within torque

2016-09-08 Thread Gilles Gouaillardet
mand from [[34937,0],2] >[a00551.science.domain:18889] [[34937,0],0] plm:base:receive got >update_proc_state for job [34937,1] >[a00551.science.domain:18889] [[34937,0],0] plm:base:receive got >update_proc_state for vpid 2 state NORMALLY TERMINATED exit_code 0 >[a00551.science.domai

Re: [OMPI users] Problem with specifying wrapper compiler mpifort

2016-09-12 Thread Gilles Gouaillardet
_PATH) Cheers, Gilles On 9/12/2016 3:59 PM, Mahmood Naderan wrote: Hi, Following the suggestion by Gilles Gouaillardet (https://mail-archive.com/users@lists.open-mpi.org/msg29688.html), I ran a configure command for a program like this: # ../Src/configure FC=/export/apps/siesta/openmpi-1.8.

Re: [OMPI users] Problem with specifying wrapper compiler mpifort

2016-09-12 Thread Gilles Gouaillardet
That sounds good to me! Just to make it crystal clear ... assuming you configure'd your Open MPI 1.8.8 with --enable-mpirun-prefix-by-default (and if you did not, I do encourage you to do so), then all you need is to remove /opt/openmpi/lib from your LD_LIBRARY_PATH (e.g. you do *not* ha

Re: [OMPI users] Problem with specifying wrapper compiler mpifort

2016-09-12 Thread Gilles Gouaillardet
Basically, it means libs will be linked with -Wl,-rpath,/export/apps/siesta/openmpi-1.8.8/lib, so if you run a.out with an empty $LD_LIBRARY_PATH, then it will look for the MPI libraries in /export/apps/siesta/openmpi-1.8.8/lib Cheers, Gilles On 9/12/2016 4:50 PM, Mahmood Naderan wrote:

Re: [OMPI users] Problem with specifying wrapper compiler mpifort

2016-09-12 Thread Gilles Gouaillardet
Mahmood, I was suggesting you (re)configure (I assume you did it) the Open MPI 1.8.8 installed in /export/apps/siesta/openmpi-1.8.8 with --enable-mpirun-prefix-by-default Cheers, Gilles On 9/12/2016 4:51 PM, Mahmood Naderan wrote: > --enable-mpirun-prefix-by-default What is that? Does t

Re: [OMPI users] Problem with specifying wrapper compiler mpifort

2016-09-12 Thread Gilles Gouaillardet
Mahmood, you need to manually remove /opt/openmpi/lib from your LD_LIBRARY_PATH (or have your sysadmin do it if this is somehow done automatically). The point of configuring with --enable-mpirun-prefix-by-default is that you do *not* need to add /export/apps/siesta/openmpi-1.8.8/lib to your LD_LIBRA
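As an illustration only (the exact entries depend on how the site sets up the environment), the cleanup amounts to re-exporting the variable without the stale entry:

    # inspect the current value
    echo $LD_LIBRARY_PATH
    # suppose it currently reads /opt/openmpi/lib:/export/apps/other/lib ;
    # re-export it without the /opt/openmpi/lib component
    export LD_LIBRARY_PATH=/export/apps/other/lib
    # with --enable-mpirun-prefix-by-default, no entry is needed for
    # /export/apps/siesta/openmpi-1.8.8/lib: the built-in rpath handles it

Here /export/apps/other/lib is a placeholder for whatever else the variable contains.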

Re: [OMPI users] MPI libraries

2016-09-12 Thread Gilles Gouaillardet
Mahmood, mpi_siesta is a siesta library, not an Open MPI library. FWIW, you might want to try again from scratch with MPI_INTERFACE=libmpi_f90.a DEFS_MPI=-DMPI in your arch.make. I do not think libmpi_f90.a is related to an Open MPI library. If you need some more support, please refer to the sies

Re: [OMPI users] job distribution issue

2016-09-14 Thread Gilles Gouaillardet
That typically occurs if nwchem is linked with MPICH and you are using Open MPI mpirun. At first, I recommend you double check your environment and run ldd nwchem so the very same Open MPI is used by everyone. Cheers, Gilles On Wednesday, September 14, 2016, abhisek Mondal wrote: > Hi, > I'm on a s

Re: [OMPI users] static linking MPI libraries with applications

2016-09-14 Thread Gilles Gouaillardet
Mahmood, try to prepend /export/apps/siesta/openmpi-1.8.8/lib to your $LD_LIBRARY_PATH. Note this is not required when Open MPI is configure'd with --enable-mpirun-prefix-by-default Cheers, Gilles On Wednesday, September 14, 2016, Mahmood Naderan wrote: > Hi, > Here is the problem with stat

Re: [OMPI users] static linking MPI libraries with applications

2016-09-14 Thread Gilles Gouaillardet
he library paths to LD_LIBRARY_PATH, but I want > to statically put the required libraries in the binary. > > > > Regards, > Mahmood > > > > On Wed, Sep 14, 2016 at 4:44 PM, Gilles Gouaillardet < > gilles.gouaillar...@gmail.com > > wrote: >

Re: [OMPI users] static linking MPI libraries with applications

2016-09-14 Thread Gilles Gouaillardet
Mahmood, I meant --disable-dlopen. I am not aware of a --disable-dl-dlopen option, which does not mean it does not exist. Cheers, Gilles On Wednesday, September 14, 2016, Mahmood Naderan wrote: > Do you mean --disable-dl-dlopen? The last lines of configure are > > +++ Configuring MCA framework

Re: [OMPI users] Still "illegal instruction"

2016-09-15 Thread Gilles Gouaillardet
Mahmood, You can gdb --pid=core.5383 And then bt And then disas And "scroll" until the current instruction. IIRC, there is a star at the beginning of this line. You can also try show maps Or info maps (I cannot remember the syntax...) Btw, did you compile lapack and friends by yourself ? Mahmood N

Re: [OMPI users] Still "illegal instruction"

2016-09-15 Thread Gilles Gouaillardet
--core=... is the right syntax, sorry about that. No need to recompile with -g, the binary is good enough here. Then you need to run disas in gdb to disassemble the instruction at 0x08da76e. And then, still in gdb, info maps or show maps to find out the library this instruction is coming from. OpenBLAS i
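A sketch of the gdb session being described, using the binary and core file names from the thread (older gdb versions may not support every command, as noted later in the thread):

    gdb ./a.out --core=core.5383
    (gdb) bt                    # backtrace of the crashed task
    (gdb) disas                 # disassemble around the faulting instruction
    (gdb) info proc mappings    # map the faulting address back to a library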

Re: [OMPI users] Still "illegal instruction"

2016-09-15 Thread Gilles Gouaillardet
Ok, you can try this under gdb: info proc mapping, info registers, x /100x $rip, x /100x $eip. I remember you are running on AMD CPUs; that is why Intel-only instructions must be avoided. Cheers, Gilles On Thursday, September 15, 2016, Mahmood Naderan wrote: > disas command fails. > > Prog

Re: [OMPI users] Still "illegal instruction"

2016-09-15 Thread Gilles Gouaillardet
if gcc is installed on your compute node, you can run echo | gcc -v -E - 2>&1 | grep cc1 and look for the -march=xxx parameter /* you might want to compare that with your front end */ And/or you can run grep family /proc/cpuinfo on your compute node Then man gcc on your front end node From my gc
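The two checks above, as they might be run on a compute node (output format varies with the gcc and kernel versions installed there):

    # show the cc1 command line gcc actually uses; look for a -march=xxx parameter
    echo | gcc -v -E - 2>&1 | grep cc1
    # identify the CPU family to pick a matching -march value
    grep family /proc/cpuinfo | sort -u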

Re: [OMPI users] Still "illegal instruction"

2016-09-15 Thread Gilles Gouaillardet
Mahmood, -march=bdver1 should be ok on your nodes. From the gcc command line, I was expecting -march=xxx, but it is missing (your gcc might be a bit older for that). Note you have to recompile all your libs (openblas and friends) with -march=bdver1. I guess your gdb is also a bit too old to suppor

Re: [OMPI users] Still "illegal instruction"

2016-09-15 Thread Gilles Gouaillardet
Mahmood, note you have to compile the source file that contains the snippet with '-g -O0', and link with '-g -O0'. Also, there was a typo in the gdb command: please read "frame 1" instead of "frame #1" Cheers, Gilles On Fri, Sep 16, 2016 at 12:53 P

Re: [OMPI users] Compilation without NVML support

2016-09-20 Thread Gilles Gouaillardet
Fred, Can you try to configure with --disable-nvml ? This is an option that should be passed to the embedded hwloc. Cheers, Gilles On Tuesday, September 20, 2016, Fred Mioux wrote: > Hello, > > > > I compile OpenMPI on a machine which supports CUDA and NVML, but I don’t > want to include this in m

Re: [OMPI users] Compilation without NVML support

2016-09-20 Thread Gilles Gouaillardet
Brice, Another option is to add a --with-hwloc-flags configure option to Open MPI, and pass the value to the embedded hwloc configure. We already do that for ROMIO (--with-io-romio-flags) Cheers, Gilles On Tuesday, September 20, 2016, Brice Goglin wrote: > Hello > Assuming this NVML detection is

Re: [OMPI users] Compilation without NVML support

2016-09-21 Thread Gilles Gouaillardet
Cheers, Gilles Fred Mioux wrote: >Thank you for your answer. > >I already tried this but it doesn't work. > > >Regards > >Fred Mioux > > >2016-09-20 16:50 GMT+02:00 Gilles Gouaillardet : > >Fred, > > >Can you try to configure with > >--d

Re: [OMPI users] Compilation without NVML support

2016-09-21 Thread Gilles Gouaillardet
That was a kind of brainstorming. This is not implemented and cannot work; see my previous message. Cheers, Gilles Fred Mioux wrote: >Thank you for your answer, I will try it. > > >Regards, > >Fred Mioux > > >2016-09-20 17:02 GMT+02:00 Gilles Gouaillardet : > >

Re: [OMPI users] Strange errors when running mpirun

2016-09-22 Thread Gilles Gouaillardet
Justin, I do not see this error on my laptop. Which version of OS X are you running? Can you try TMPDIR=/tmp mpirun -n 1 Cheers, Gilles On Thu, Sep 22, 2016 at 7:21 PM, Nathan Hjelm wrote: > FWIW it works fine for me on my MacBook Pro running 10.12 with Open MPI 2.0.1 > installed through

Re: [OMPI users] Strange errors when running mpirun

2016-09-22 Thread Gilles Gouaillardet
ustin On Thu, Sep 22, 2016 at 7:33 AM, Gilles Gouaillardet wrote: Justin, i do not see this error on my laptop which version of OS X are you running ? can you try to TMPDIR=/tmp mpirun -n 1 Cheers, Gilles On Thu, Sep 22, 2016 at 7:21 PM, Nathan Hjelm wrote: FWIW it works fine for me

Re: [OMPI users] Openmpi 1.10.x, mpirun and Slurm 15.08 problem

2016-09-23 Thread Gilles Gouaillardet
Marcin, You can also try to exclude the public subnet(s) (e.g. 1.2.3.0/24) and the loopback interface instead of em4, which does not exist on the compute nodes. Or you can include only the private subnet(s) that are common to the frontend and compute nodes. Cheers, Gilles On Saturday, September 24, 20
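A sketch of the two approaches; the subnets below are placeholders, and the parameter names (btl_tcp_if_exclude / btl_tcp_if_include) are the usual TCP BTL interface-selection knobs rather than anything quoted from this thread:

    # exclude the public subnet and the loopback network
    mpirun --mca btl_tcp_if_exclude 1.2.3.0/24,127.0.0.1/8 -np 4 ./a.out
    # or include only the private subnet shared by frontend and compute nodes
    mpirun --mca btl_tcp_if_include 10.0.0.0/24 -np 4 ./a.out

The include and exclude parameters are mutually exclusive: set one or the other, not both.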

Re: [OMPI users] Problem running an MPI program through the PBS manager

2016-09-26 Thread Gilles Gouaillardet
Mahmood, The node is defined in the PBS config; however, it is not part of the allocation (e.g. job), so it cannot be used, and hence the error message. In your PBS script, you do not need the -np or -host parameters to your mpirun command. Open MPI mpirun will automatically detect it is launched from
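A minimal sketch of such a Torque/PBS script (the resource request and binary name are placeholders); since mpirun reads the allocation itself, no -np or -host arguments appear:

    #!/bin/bash
    #PBS -l nodes=2:ppn=4
    cd $PBS_O_WORKDIR
    # Open MPI detects the Torque/PBS allocation and starts one task per
    # allocated slot on the allocated nodes
    mpirun ./a.out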

Re: [OMPI users] openmpi-2.0.1 build error: multiple definition of `pmi_opcaddy_t_class'

2016-09-27 Thread Gilles Gouaillardet
Hi, I can see this error happening if you configure with --disable-dlopen --with-pmi. In opal/mca/pmix/s?/pmix_s?.c, you can try to add the static keyword before OBJ_CLASS_INSTANCE(pmi_opcaddy_t, ...) Or you can update the files to use unique class names (probably safer...) Cheers, Gilles On Wed

Re: [OMPI users] openmpi-2.0.1 build error: multiple definition of `pmi_opcaddy_t_class'

2016-09-27 Thread Gilles Gouaillardet
tter: @PenguinHPC/ On Tue, Sep 27, 2016 at 11:51 AM, Gilles Gouaillardet mailto:gilles.gouaillar...@gmail.com>> wrote: Hi, I can see this error happening if you configure with --disable-dlopen --with-pmi In opal/mca/pmix/s?/pmix_s?.c, you can try to

Re: [OMPI users] MPI_Comm_spawn

2016-09-29 Thread Gilles Gouaillardet
Hi, I do not expect spawn can work with direct launch (e.g. srun). Do you have PSM (e.g. InfiniPath) hardware? That could be linked to the failure. Can you please try mpirun --mca pml ob1 --mca btl tcp,sm,self -np 1 --hostfile my_hosts ./manager 1 and see if it helps? Note if you have the poss

Re: [OMPI users] MPI_Comm_spawn

2016-09-29 Thread Gilles Gouaillardet
gt; > On Sep 29, 2016, at 6:06 AM, Gilles Gouaillardet < > gilles.gouaillar...@gmail.com > > wrote: > > Hi, > > I do not expect spawn can work with direct launch (e.g. srun) > > Do you have PSM (e.g. Infinipath) hardware ? That could be linked to the > failure

Re: [OMPI users] openmpi 2.1 large messages

2016-09-29 Thread Gilles Gouaillardet
Rick, can you please provide some more information: - Open MPI version - interconnect used - number of tasks / number of nodes - does the hang occur in the first MPI_Bcast of 8000 bytes? Note there is a known issue if you MPI_Bcast with different but matching signatures (e.g. some tas

Re: [OMPI users] what was the rationale behind rank mapping by socket?

2016-09-29 Thread Gilles Gouaillardet
David, I guess you would have expected the default mapping/binding scheme to be core instead of socket. IIRC, we decided *not* to bind to cores by default because it is "safer": if you simply OMP_NUM_THREADS=8 mpirun -np 2 a.out, then a default mapping/binding scheme by core means the OpenMP t

Re: [OMPI users] what was the rationale behind rank mapping by socket?

2016-09-29 Thread Gilles Gouaillardet
manner of problems with binding when using cpusets/cgroups. -- bennet On Thu, Sep 29, 2016 at 9:52 PM, Gilles Gouaillardet wrote: David, i guess you would have expected the default mapping/binding scheme is core instead of sockets iirc, we decided *not* to bind to cores by default bec

Re: [OMPI users] Strange errors when running mpirun

2016-09-30 Thread Gilles Gouaillardet
/open-mpi/ompi/pull/2135.patch, or by using the latest open-mpi from homebrew. On Fri, Sep 23, 2016 at 11:15 AM, Gilles Gouaillardet wrote: > Justin, > > > the root cause could be the length of $TMPDIR that might cause some path > being truncated. > > you can check that by

Re: [OMPI users] openmpi 2.1 large messages

2016-09-30 Thread Gilles Gouaillardet
; dest = 0; > > source = 0; > > std::cout << "Task " << rank << " > sending." << std::endl; > > MPI_Bcast(inmsg,bufsize, > MPI_BYTE,rank,MPI_COMM_WORL

Re: [OMPI users] Viewing the output of the program

2016-10-01 Thread Gilles Gouaillardet
Mahmood, IIRC, a related bug was fixed in v2.0.0. Can you please update to 2.0.1 and try again? Cheers, Gilles On Saturday, October 1, 2016, Mahmood Naderan wrote: > Hi, > Here is the bizarre behavior of the system and hope that someone can > clarify is this related to OMPI or not. > > When I

Re: [OMPI users] problems with client server scenario using MPI_Comm_connect

2016-10-04 Thread Gilles Gouaillardet
Rick, I do not think ompi_server is required here. Can you please post a trimmed version of your client and server, and your two mpirun command lines? You also need to make sure all ranks have the same root parameter when invoking MPI_Comm_accept and MPI_Comm_connect. Cheers, Gilles "Marlborou

Re: [OMPI users] OS X + Xcode 8 : dyld: Symbol not found: _clock_gettime

2016-10-04 Thread Gilles Gouaillardet
Christophe, If I read between the lines, you had Open MPI running just fine, then you upgraded Xcode and that broke Open MPI. Am I right so far? Did you build Open MPI by yourself, or did you get binaries from somewhere (such as brew)? In the first case, you need to rebuild Open MPI. (You have

Re: [OMPI users] problems with client server scenario using MPI_Comm_connect

2016-10-04 Thread Gilles Gouaillardet
> if(mpi_error) > > { > > ... > > } > > std::cout << "Established port name is " << port_name << > std::endl; > > s

Re: [OMPI users] problems with client server scenario using MPI_Comm_connect

2016-10-04 Thread Gilles Gouaillardet
les; > > The abort occurs somewhere between 30 and 60 seconds. Is > there some configuration setting that could influence this? > > > > Rick > > > > *From:* users [mailto:users-boun...@lists.open-mpi.org > ] *On > Behalf Of *Gilles Gouaillardet > *Sent:* Tuesd

Re: [OMPI users] Question on using Github to see bugs fixed in past versions

2016-10-04 Thread Gilles Gouaillardet
Edwin, changes are summarized in the NEWS file. We used to have two GitHub repositories, and they were "merged" recently. With GitHub, you can list the closed PRs for a given milestone: https://github.com/open-mpi/ompi-release/milestones?state=closed then you can click on a milestone, and list

Re: [OMPI users] MPI + system() call + Matlab MEX crashes

2016-10-05 Thread Gilles Gouaillardet
Juraj, if I understand correctly, the "master" task calls MPI_Init(), and then fork&exec's matlab. In some cases (lack of hardware support), fork cannot even work, but let's assume it is fine for now. Then, if I read between the lines, matlab calls a mexFunction that calls MPI_Init(). As far as I a

Re: [OMPI users] Data corruption in v2.0.1 using MPI_IN_PLACE with MPI_Ireduce

2016-10-06 Thread Gilles Gouaillardet
Andre, this issue has already been reported and fixed (the fix will be available in v2.0.2) Meanwhile, you can manually apply the fix available at https://github.com/open-mpi/ompi/commit/78f74e58d09a3958043a0c70861b26664def3bb3.patch Cheers, Gilles On 10/7/2016 8:44 AM, Andre Kessler w

Re: [OMPI users] Crash during MPI_Finalize

2016-10-10 Thread Gilles Gouaillardet
George, I tried to mimic this with the latest v1.10, and failed to reproduce the error. At first, I recommend you try the latest v1.10 (1.10.4) or even 2.0.1. An unusable stack trace can sometimes be caused by unloaded modules, so if the issue persists, you might want to try rebuilding Op

Re: [OMPI users] MPI Behaviour Question

2016-10-11 Thread Gilles Gouaillardet
Mark, My understanding is that shell meta expansion occurs once on the first node, so from an Open MPI point of view, you really invoke mpirun echo node0. I suspect mpirun echo 'Hello from $(hostname)' is what you want to do. I do not know about mpirun echo 'Hello from $HOSTNAME'; $HOSTNAME might be
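A sketch of the difference, assuming a two-task run; the first form is expanded once by the local shell, while the second defers expansion to a shell on each node:

    # expanded on the submitting node: every rank prints the same hostname
    mpirun -np 2 echo "Hello from $(hostname)"
    # wrapped in a remote shell: each rank expands $(hostname) itself
    mpirun -np 2 sh -c 'echo "Hello from $(hostname)"'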

Re: [OMPI users] Using Open MPI with multiple versions of GCC and G++

2016-10-11 Thread Gilles Gouaillardet
FWIW, mpicxx does two things: 1) use the C++ compiler (e.g. g++) 2) if Open MPI was configured with (deprecated) C++ bindings (e.g. --enable-mpi-cxx), then link with the Open MPI C++ library that contains the bindings. IIRC, Open MPI v1.10 does build C++ bindings by default, but v2.0 does n

Re: [OMPI users] Openmpi 2.0.1 build --with-psm2 failed on CentOS 7.2

2016-10-11 Thread Gilles Gouaillardet
Limin, It seems the libpsm2 provided by CentOS 7 is a bit too old: all symbols are prefixed with psm_, and Open MPI expects them to be prefixed with psm2_. I am afraid your only option is to manually install the latest libpsm2 and then configure again with your psm2 install dir. Cheers, Gilles

Re: [OMPI users] mpirun works with cmd line call, but not with app context file arg

2016-10-16 Thread Gilles Gouaillardet
Out of curiosity, why do you specify both --hostfile and -H ? Do you observe the same behavior without --hostfile ~/.mpihosts ? Also, do you have at least 4 cores on both A.lan and B.lan ? Cheers, Gilles On Sunday, October 16, 2016, MM wrote: > Hi, > > openmpi 1.10.3 > > this call: > > mpirun

Re: [OMPI users] communications groups

2016-10-17 Thread Gilles Gouaillardet
Rick, In my understanding, sensorgroup is a group with only task 1. Consequently, sensorComm is - similar to MPI_COMM_SELF on task 1 - MPI_COMM_NULL on other tasks, and hence the barrier fails. I suggest you double check sensorgroup is never MPI_GROUP_EMPTY and add a test not to call MPI_Barrier on

Re: [OMPI users] communications groups

2016-10-17 Thread Gilles Gouaillardet
Rick, I re-read the MPI standard and was unable to figure out if sensorgroup is MPI_GROUP_EMPTY or a group with task 1 on tasks other than task 1 (A group that does not contain the current task makes little sense to me, but I do not see any reason why this group has to be MPI_GROUP_EMPTY) Regardless

Re: [OMPI users] communications groups

2016-10-17 Thread Gilles Gouaillardet
dispatcher will use the 2 comm groups to coordinate activity. I tried > adding the dispatcher to the sensorList comm group, but I get an error > saying “invalid task”. > > > > Rick > > > > *From:* users [mailto:users-boun...@lists.open-mpi.org > ] *On > Behalf

Re: [OMPI users] Problem with double shared library

2016-10-17 Thread Gilles Gouaillardet
Sean, if I understand correctly, you built a libtransport_mpi.so library that depends on Open MPI, and your main program dlopens libtransport_mpi.so. In this case, and at least for the time being, you need to use RTLD_GLOBAL in your dlopen flags. Cheers, Gilles On 10/18/2016 4:53 AM,

Re: [OMPI users] openmpi-1.10.3 cross compile configure options for arm-openwrt-linux-muslgnueabi on x86_64-linux-gnu

2016-10-18 Thread Gilles Gouaillardet
Hi, can you please give the patch below a try ? Cheers, Gilles diff --git a/ompi/tools/wrappers/ompi_wrapper_script.in b/ompi/tools/wrappers/ompi_wrapper_script.in index d87649f..b66fec3 100644 --- a/ompi/tools/wrappers/ompi_wrapper_script.in +++ b/ompi/tools/wrappers/ompi_wrapper_script.in

Re: [OMPI users] openmpi-1.10.3 cross compile configure options for arm-openwrt-linux-muslgnueabi on x86_64-linux-gnu

2016-10-18 Thread Gilles Gouaillardet
vtfilter* > >lrwxrwxrwx 1 nmahesh nmahesh       8 Oct 18 15:34 vtfiltergen -> vtfilter* > >lrwxrwxrwx 1 nmahesh nmahesh      12 Oct 18 15:34 vtfiltergen-mpi -> >vtfilter-mpi* > >-rwxr-xr-x 1 nmahesh nmahesh 3100359 Oct 18 15:34 vtfilter-mpi* > >lrwxrwxrwx 1

Re: [OMPI users] openmpi-1.10.3 cross compile configure options for arm-openwrt-linux-muslgnueabi on x86_64-linux-gnu

2016-10-18 Thread Gilles Gouaillardet
You can try configure --host=arm... CC=gcc_cross_compiler CXX=g++_cross_compiler On Tuesday, October 18, 2016, Mahesh Nanavalla < mahesh.nanavalla...@gmail.com> wrote: > Hi all, > > How to cross compile openmpi for arm on x86_64 pc. > > Kindly provide configure options for above... > > Th

Re: [OMPI users] communications groups

2016-10-18 Thread Gilles Gouaillardet
or > must be in the same group. Is my understanding incorrect? > > > > Thanks > > Rick > > > > From: users [mailto:users-boun...@lists.open-mpi.org] On Behalf Of Gilles > Gouaillardet > Sent: Monday, October 17, 2016 10:38 AM > > > To: Open MPI Users

Re: [OMPI users] openmpi-1.10.3 cross compile configure options for arm-openwrt-linux-muslgnueabi on x86_64-linux-gnu

2016-10-19 Thread Gilles Gouaillardet
; > --host=arm-openwrt-linux-muslgnueabi >>> > > --enable-script-wrapper-compilers >>> > > --disable-mpi-fortran >>> > > --enable-shared >>> > > --disable-dlopen >>> > > >>> > > it's configured ,make &

Re: [OMPI users] openmpi-1.10.3 cross compile configure options for arm-openwrt-linux-muslgnueabi on x86_64-linux-gnu

2016-10-19 Thread Gilles Gouaillardet
--disable-vt while configuring the openmpi-1.10.3, > > the size of the installation directory was reduced from 70MB to 9MB. > > will it affect anything? > > > On Wed, Oct 19, 2016 at 4:06 PM, Gilles Gouaillardet < > gilles.gouaillar...@gmail.com > > wrote: > >> vt

Re: [OMPI users] communicator with MPI_Reduce

2016-10-21 Thread Gilles Gouaillardet
Rick, if you are using an intercommunicator, please refer to the man page (a group has to use MPI_ROOT as the root argument) Cheers, Gilles On 10/21/2016 3:36 PM, George Bosilca wrote: Rick, There are a few requirements to use any communicator in a collective operation (including MPI_R

Re: [OMPI users] Help - Client / server - app hangs in connect/accept by the second or next client that wants to connect to server

2016-10-21 Thread Gilles Gouaillardet
to ask - are there any new solutions or investigations in > this problem? > > Cheers, > > Matus Dobrotka > > 2016-07-19 15:23 GMT+02:00 Gilles Gouaillardet < > gilles.gouaillar...@gmail.com > >: > >> my bad for the confusion, >> >> I misr

Re: [OMPI users] Small typos in MPI_Info_get_nkeys/MPI_Info_get_nthkey man pages

2016-10-23 Thread Gilles Gouaillardet
Thanks Nicolas, master has been patched, and I will make the PR for the release branches. Cheers, Gilles On 10/23/2016 8:28 PM, Nicolas Joly wrote: Hi, Just noticed a typo in the MPI_Info_get_nkeys/MPI_Info_get_nthkey man pages. The last cross-reference is 'MPI_Info_get_valueln' where it shou

Re: [OMPI users] Problem building OpenMPI with CUDA 8.0

2016-10-23 Thread Gilles Gouaillardet
Justin, IIRC, NVML is only used by hwloc (e.g. not by CUDA) and there is no real benefit in having that. As a workaround, you can export enable_nvml=no and then configure && make install Cheers, Gilles On 10/20/2016 12:49 AM, Jeff Squyres (jsquyres) wrote: Justin -- Fair point. Can y
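A sketch of that workaround; the install prefix and CUDA path are placeholders. Exporting enable_nvml=no before configure makes the embedded hwloc skip its NVML support:

    export enable_nvml=no
    ./configure --prefix=/opt/openmpi --with-cuda=/usr/local/cuda
    make install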

Re: [OMPI users] Problem building OpenMPI with CUDA 8.0

2016-10-24 Thread Gilles Gouaillardet
. Could be shell variable such as HWLOC_DISABLE_NVML=yes for all our major configured dependencies. Brice On 24/10/2016 02:12, Gilles Gouaillardet wrote: Justin, iirc, NVML is only used by hwloc (e.g. not by CUDA) and there is no real benefit for having that. as a workaround, you can

Re: [OMPI users] prevent vampirTrace from linking to CUDA

2016-10-25 Thread Gilles Gouaillardet
Jack, It looks like a bug in the vt configury. If you do not need vt, then you can configure with --disable-vt (FWIW, vt has been removed from Open MPI 2.0). If you need vt, you might be lucky with export with_cuda=no configure ... Cheers, Gilles Jack Stalnaker wrote: >How do I prevent the build f

Re: [OMPI users] Fortran and MPI-3 shared memory

2016-10-27 Thread Gilles Gouaillardet
From the MPI 3.1 standard (page 338) Rationale. The C bindings of MPI_ALLOC_MEM and MPI_FREE_MEM are similar to the bindings for the malloc and free C library calls: a call to MPI_Alloc_mem(..., &base) should be paired with a call to MPI_Free_mem(base) (one less level of indirection). Both

Re: [OMPI users] Fortran and MPI-3 shared memory

2016-10-27 Thread Gilles Gouaillardet
Jeff, Out of curiosity, did you compile the Fortran test program with -O0 ? Cheers, Gilles Tom Rosmond wrote: >Jeff, > >Thanks for looking at this.  I know it isn't specific to Open-MPI, but it is a >frustrating issue vis-a-vis MPI and Fortran.  There are many very large MPI >applications ar

Re: [OMPI users] redeclared identifier for openmpi-v2.0.1-130-gb3a367d witj Sun C on Linux

2016-10-27 Thread Gilles Gouaillardet
Siegmar, The fix is in the pipe. Meanwhile, you can download it at https://github.com/open-mpi/ompi/pull/2295.patch Cheers, Gilles Siegmar Gross wrote: >Hi, > >I tried to install openmpi-v2.0.1-130-gb3a367d on my "SUSE Linux >Enterprise Server 12.1 (x86_64)" with Sun C 5.14 beta. Unfortunatel

Re: [OMPI users] Fortran and MPI-3 shared memory

2016-10-27 Thread Gilles Gouaillardet
Tom, regardless of the (lack of) memory model in Fortran, there is an error in testmpi3.f90: shar_mem is declared as an integer, and hence is not in the shared memory. I attached my version of testmpi3.f90, which behaves just like the C version, at least when compiled with -g -O0 and with Ope

Re: [OMPI users] OpenMPI + InfiniBand

2016-10-28 Thread Gilles Gouaillardet
Sergei, is there any reason why you configure with --with-verbs-libdir=/usr/lib ? As far as I understand, --with-verbs should be enough, and neither /usr/lib nor /usr/local/lib should ever be used in the configure command line (and btw, are you running on a 32-bit system ? should the 64-bit libs be in /
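A sketch of the suggested configure invocation, dropping the explicit libdir so the default (64-bit) verbs libraries are picked up; the prefix is a placeholder:

    ./configure --prefix=/opt/openmpi --with-verbs
    make && make install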

Re: [OMPI users] Reducing libmpi.so size....

2016-11-01 Thread Gilles Gouaillardet
Did you strip the libraries already ? The script will show the list of frameworks and components used by MPI helloworld. From that, you can deduce a list of components that are not required, exclude them via the configure command line, and rebuild a trimmed Open MPI. Note this is pretty pa

Re: [OMPI users] error on dlopen

2016-11-03 Thread Gilles Gouaillardet
Mahmood, did you build Open MPI as a static-only library ? I guess the -ldl position is wrong. Your link command line should be mpifort -O3 -o xCbtest blacstest.o btprim.o tools.o Cbt.o ../../libscalapack.a -ldl You can manually run mpifort -O3 -o xCbtest --showme blacstest.o btprim.o tools.

Re: [OMPI users] error on dlopen

2016-11-04 Thread Gilles Gouaillardet
til I don't see what you said after "-lopen-pal". Is that OK? Regards, Mahmood On Fri, Nov 4, 2016 at 10:23 AM, Gilles Gouaillardet mailto:gil...@rist.or.jp>> wrote: Mahmood, did you build Open MPI as a static only library ? i guess the -ldl position is

Re: [OMPI users] error on dlopen

2016-11-04 Thread Gilles Gouaillardet
On Fri, Nov 4, 2016 at 11:02 AM, Gilles Gouaillardet mailto:gil...@rist.or.jp>> wrote: Yes, that is a problem :-( you might want to reconfigure with --enable-static --disable-shared --disable-dlopen and see if it helps or you can simply manuall edit /opt/o

Re: [OMPI users] error on dlopen

2016-11-04 Thread Gilles Gouaillardet
You might have to remove -ldl from the scalapack makefile. If it still does not work, can you please post the mpirun --showme ... output ? Cheers, Gilles On Friday, November 4, 2016, Mahmood Naderan wrote: > Hi Gilles, > I noticed that /opt/openmpi-2.0.1/share/openmpi/mpifort-wrapper-data.txt > is

Re: [OMPI users] mpirun --map-by-node

2016-11-04 Thread Gilles Gouaillardet
As long as you run 3 MPI tasks, both options will produce the same mapping. If you want to run up to 12 tasks, then --map-by node is the way to go. Mahesh Nanavalla wrote: >s... > > >Thanks for responding me. > >i have solved that as below by limiting slots in hostfile > > >root@OpenWrt:~#
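A sketch of both invocations with 12 tasks over 3 nodes (the hostfile name is a placeholder):

    # default mapping: fill the slots of each node in turn
    mpirun -np 12 -hostfile my_hosts ./a.out
    # --map-by node: round-robin ranks across the nodes instead
    mpirun -np 12 -hostfile my_hosts --map-by node ./a.out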

Re: [OMPI users] False positives and even failure with Open MPI and memchecker

2016-11-05 Thread Gilles Gouaillardet
Hi, note your printf line is missing. If you printf l_prev, then the valgrind error occurs in all variants. At first glance, it looks like a false positive, and I will investigate it. Cheers, Gilles On Sat, Nov 5, 2016 at 7:59 PM, Yvan Fournier wrote: > Hello, > > I have observed what seems to

Re: [OMPI users] False positives and even failure with Open MPI and memchecker

2016-11-05 Thread Gilles Gouaillardet
les On Sat, Nov 5, 2016 at 9:48 PM, Gilles Gouaillardet wrote: > Hi, > > note your printf line is missing. > if you printf l_prev, then the valgrind error occurs in all variants > > at first glance, it looks like a false positive, and i will investigate it > > > Cheers, &

Re: [OMPI users] False positives and even failure with Open MPI and memchecker

2016-11-05 Thread Gilles Gouaillardet
so it seems we took some shortcuts in pml/ob1. The attached patch (for the v1.10 branch) should fix this issue. Cheers, Gilles On Sat, Nov 5, 2016 at 10:08 PM, Gilles Gouaillardet wrote: > that really looks like a bug > > if you rewrite your program with > > MPI_Sendrecv

Re: [OMPI users] Having problem with the Master/Slave program...

2016-11-07 Thread Gilles Gouaillardet
Hi, master() never MPI_Bcast(dojob=2, c...), hence the hang Cheers, Gilles On Tuesday, November 8, 2016, Baris Kececi via users < users@lists.open-mpi.org> wrote: > Hi friends, > I'm trying to write a simple parallel master/slave program. > Here in my program the task of master is to distribut

Re: [OMPI users] Valgrind errors related to MPI_Win_allocate_shared

2016-11-15 Thread Gilles Gouaillardet
Joseph, thanks for the report, this is a real memory leak. I fixed it in master, and the fix is now being reviewed. Meanwhile, you can manually apply the patch available at https://github.com/open-mpi/ompi/pull/2418.patch Cheers, Gilles On Tue, Nov 15, 2016 at 1:52 AM, Joseph Schuchart wrote:

Re: [OMPI users] symbol lookup error for a "hello world" fortran script

2016-11-15 Thread Gilles Gouaillardet
Julien, first, make sure you are using the Open MPI wrapper: which mpifort should point to /usr/lib/openmpi/bin if I understand correctly. Then make sure you exported your LD_LIBRARY_PATH *after* you prepended the path to the Open MPI lib in your .bashrc. You can either LD_LIBRARY_PATH=/usr/lib/openmpi/lib:$LD
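A sketch of the ordering being described for ~/.bashrc, using the /usr/lib/openmpi paths from this thread:

    # prepend the Open MPI directories first ...
    PATH=/usr/lib/openmpi/bin:$PATH
    LD_LIBRARY_PATH=/usr/lib/openmpi/lib:$LD_LIBRARY_PATH
    # ... then export, so the exported values already include them
    export PATH LD_LIBRARY_PATH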

Re: [OMPI users] symbol lookup error for a "hello world" fortran script

2016-11-15 Thread Gilles Gouaillardet
002b5a23666000) > libnuma.so.1 => /usr/lib/x86_64-linux-gnu/libnuma.so.1 > (0x2b5a238a) > libltdl.so.7 => /usr/lib/x86_64-linux-gnu/libltdl.so.7 > (0x2b5a23aac000) > > > Thanks. > > > > Julien > > > > 2016-11-15 23:52 GMT-05:00 Gille

Re: [OMPI users] MPI_ABORT was invoked on rank 0 in communicator compute with errorcode 59

2016-11-16 Thread Gilles Gouaillardet
Hi, With ddt, you can do offline debugging just to get where the program crashes: ddt -n 8 --offline a.out ... You might also want to try the reverse connect feature. Cheers, Gilles "Beheshti, Mohammadali" wrote: >Hi Gus, > >Thank you very much for your prompt response. The myjob.sh script is as

Re: [OMPI users] openmpi-2.0.1

2016-11-17 Thread Gilles Gouaillardet
Hi, at first, you might want to remove paths related to the Intel MPI runtime (e.g. /opt/intel/composer_xe_2013_sp1.2.144/mpirt/lib/intel64) from your environment. If you are using bash, double check you export LD_LIBRARY_PATH (otherwise echo $LD_LIBRARY_PATH and the linker see different things). Then

Re: [OMPI users] Error bash: /usr/mpi/gcc/openmpi-1.8.8/bin/orted: No such file or directory

2016-11-20 Thread Gilles Gouaillardet
Sebastian, The error message is pretty self-explanatory: /usr/mpi/gcc/openmpi-1.8.8/bin/orted is missing on your compute nodes. It seems you are using /usr/mpi/gcc/openmpi-1.8.8/bin/mpirun on your frontend node (e.g. the node on which mpirun is invoked) but Open MPI was not updated on some nodes l

Re: [OMPI users] valgrind invalid read

2016-11-21 Thread Gilles Gouaillardet
Yann, this is a bug that was previously reported, and the fix is pending review. Meanwhile, you can manually apply the patch available at https://github.com/open-mpi/ompi/pull/2418 Cheers, Gilles On 11/18/2016 9:34 PM, Yann Jobic wrote: Hi, I'm using valgrind 3.12 with openmpi 2.

Re: [OMPI users] Valgrind errors related to MPI_Win_allocate_shared

2016-11-21 Thread Gilles Gouaillardet
properly free'd at the end. Any chance this can be filtered using the suppression file? Cheers, Joseph On 11/15/2016 04:39 PM, Gilles Gouaillardet wrote: Joseph, thanks for the report, this is a real memory leak. i fixed it in master, and the fix is now being reviewed. meanwhile, yo

Re: [OMPI users] ScaLapack tester fails with 2.0.1, works with 1.10.4; Intel Omni-Path

2016-11-22 Thread Gilles Gouaillardet
Christoph, out of curiosity, could you try to mpirun --mca coll ^tuned ... and see if it helps ? Cheers, Gilles On Tue, Nov 22, 2016 at 7:21 PM, Christof Koehler wrote: > Hello again, > > I tried to replicate the situation on the workstation at my desk, > running ubuntu 14.04 (gcc 4.8.4) and
