[OMPI users] Univa Grid Engine and OpenMPI 1.8.7

2020-01-12 Thread Lane, William via users
I'm having problems w/an old openMPI test program which I re-compiled using OpenMPI 1.8.7 for CentOS 6.3 running Univa Grid Engine 8.6.4. 1. Are the special PE requirements for Son of Grid Engine needed for Univa Grid Engine 8.6.4 (in particular qsort_args and/or control_slaves both being pre
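For reference, the loose/tight-integration parallel environment being discussed usually looks roughly like the sketch below (field values follow the Open MPI SGE FAQ and Son of Grid Engine conventions; they are assumptions, not verified Univa 8.6.4 settings, and whether Univa accepts or needs qsort_args is exactly the open question in this thread). The PE can be inspected and edited with qconf:

    # hypothetical PE named "orte"; adjust slots to your cluster
    qconf -sp orte
    pe_name            orte
    slots              9999
    user_lists         NONE
    xuser_lists        NONE
    start_proc_args    /bin/true
    stop_proc_args     /bin/true
    allocation_rule    $fill_up
    control_slaves     TRUE
    job_is_first_task  FALSE
    urgency_slots      min
    accounting_summary FALSE
    qsort_args         NONE    # Son of Grid Engine 8.1.1+ extension; may not exist in Univa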

Re: [OMPI users] Strange OpenMPI errors showing up in Caffe rc5 build

2017-05-06 Thread Lane, William
The strange thing is OpenMPI isn't mentioned anywhere as being a dependency for Caffe! I haven't read anything that suggests OpenMPI is supported in Caffe either. This is why I figure it must be a dependency of Caffe (of which there are 15) that relies on OpenMPI. I tried setting the compiler

[OMPI users] Strange OpenMPI errors on building Caffe 1.0

2017-05-04 Thread Lane, William
I know this could possibly be off-topic, but the errors are OpenMPI errors and if anyone could shed light on the nature of these errors I figure it would be this group: CXX/LD -o .build_release/tools/upgrade_solver_proto_text.bin g++ .build_release/tools/upgrade_solver_proto_text.o -o .build_re

[OMPI users] Strange OpenMPI errors showing up in Caffe rc5 build

2017-05-04 Thread Lane, William
I know this could possibly be off-topic, but the errors are OpenMPI errors and if anyone could shed light on the nature of these errors I figure it would be this group: CXX/LD -o .build_release/tools/upgrade_solver_proto_text.bin g++ .build_release/tools/upgrade_solver_proto_text.o -o .build_r

[OMPI users] Test of new OpenMPI email list

2016-07-27 Thread Lane, William
Just a test.

Re: [OMPI users] OpenMPI 1.10.1 *ix hard/soft open files limits >= 4096 still required?

2016-03-18 Thread Lane, William
kend for some reason. It gets used by both vader and sm, so no help there. I’m afraid I’ll have to defer to Nathan from here as he is more familiar with it than I. On Mar 17, 2016, at 4:55 PM, Lane, William <william.l...@cshs.org> wrote: I ran OpenMPI using the "-mca btl ^vader"

Re: [OMPI users] OpenMPI 1.10.1 *ix hard/soft open files limits >= 4096 still required?

2016-03-17 Thread Lane, William
lib64/libc.so.6(__libc_start_main+0xfd)[0x2b1b0e298cdd] [csclprd3-6-12:30667] [14] /hpc/home/lanew/mpi/openmpi/a_1_10_1.out[0x400999] [csclprd3-6-12:30667] *** End of error message *** -Bill L. From: users [users-boun...@open-mpi.org] on behalf of Lane, William [willia

Re: [OMPI users] OpenMPI 1.10.1 *ix hard/soft open files limits >= 4096 still required?

2016-03-17 Thread Lane, William
ory BTL to segfault. Try turning vader off and see if that helps - I’m not sure what you are using, but maybe “-mca btl ^vader” will suffice Nathan - any other suggestions? On Mar 17, 2016, at 4:40 PM, Lane, William <william.l...@cshs.org> wrote: I remember years ago, OpenMPI

[OMPI users] OpenMPI 1.10.1 *ix hard/soft open files limits >= 4096 still required?

2016-03-17 Thread Lane, William
I remember years ago, OpenMPI (version 1.3.3) required the hard/soft open files limits be >= 4096 in order to function when large numbers of slots were requested (with 1.3.3 this was at roughly 85 slots). Is this requirement still present for OpenMPI versions 1.10.1 and greater? I'm having some is
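As a quick sanity check (a generic sketch, not taken from the thread), the per-process open-files limits can be inspected and raised roughly as follows; the 4096 figure mirrors the old 1.3.x guidance mentioned above:

    ulimit -n     # current soft limit on open files
    ulimit -Hn    # current hard limit
    # raise them system-wide (example values) in /etc/security/limits.conf:
    #   *  soft  nofile  4096
    #   *  hard  nofile  8192
    # daemons started at boot (e.g. sge_execd) must be restarted to pick up new limits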

[OMPI users] running OpenMPI jobs (either 1.10.1 or 1.8.7) on SoGE more problems

2016-03-16 Thread Lane, William
I'm getting an error message early on: [csclprd3-0-11:17355] [[36373,0],17] plm:rsh: using "/opt/sge/bin/lx-amd64/qrsh -inherit -nostdin -V -verbose" for launching unable to write to file /tmp/285019.1.verylong.q/qrsh_error: No space left on device [csclprd3-6-10:18352] [[36373,0],21] plm:rsh: usi
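The "No space left on device" message points at the per-job scratch directory under /tmp; a minimal check on the affected node (assumed commands, not from the thread) would be:

    df -h /tmp                           # is the filesystem actually full?
    df -i /tmp                           # or out of inodes?
    qconf -sq verylong.q | grep tmpdir   # where SGE places the per-job TMPDIR for this queue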

Re: [OMPI users] Son of Grid Engine, Parallel Environments and OpenMPI 1.8.7

2015-09-10 Thread Lane, William
"Lane, William" writes: > Our issues with OpenMPI 1.8.7 and Son-of-Gridengine turned out to be > down to using the wrong Parallel Environment. Having a PE with > control_slaves set to TRUE and start

Re: [OMPI users] Son of Grid Engine, Parallel Environments and OpenMPI 1.8.7

2015-09-04 Thread Lane, William
checked and ompi transparently falls back to --hetero-nodes if needed; bottom line, on a heterogeneous cluster, it is required or safer to use the --hetero-nodes option. Cheers, Gilles On Wednesday, August 12, 2015, Dave Love <d.l...@liverpool.ac.uk> wrote: "Lane, William"

Re: [OMPI users] Son of Grid Engine, Parallel Environments and OpenMPI 1.8.7

2015-08-11 Thread Lane, William
___ From: users [users-boun...@open-mpi.org] on behalf of Dave Love [d.l...@liverpool.ac.uk] Sent: Tuesday, August 11, 2015 9:34 AM To: Open MPI Users Subject: Re: [OMPI users] Son of Grid Engine, Parallel Environments and OpenMPI 1.8.7 "Lane, William" writes: > I

Re: [OMPI users] Son of Grid Engine, Parallel Environments and OpenMPI 1.8.7

2015-08-10 Thread Lane, William
ng and what the daemons think is wrong. On Wed, Aug 5, 2015 at 3:17 PM, Lane, William <william.l...@cshs.org> wrote: Actually, we're still having problems submitting OpenMPI 1.8.7 jobs to the cluster thru SGE (which we need to do in order to track usage stats on the cluster). I

Re: [OMPI users] Son of Grid Engine, Parallel Environments and OpenMPI 1.8.7

2015-08-05 Thread Lane, William
there for qsort, but I haven't checked it against SGE. Let us know if you hit a problem and we'll try to figure it out. Glad to hear your cluster is working - nice to have such challenges to shake the cobwebs out :-) On Wed, Aug 5, 2015 at 12:43 PM, Lane, William <william.l...

[OMPI users] Son of Grid Engine, Parallel Environments and OpenMPI 1.8.7

2015-08-05 Thread Lane, William
I read @ https://www.open-mpi.org/faq/?category=sge that for OpenMPI Parallel Environments there's a special consideration for Son of Grid Engine: '"qsort_args" is necessary with the Son of Grid Engine distribution, version 8.1.1 and later, and probably only applicable to it. For very

[OMPI users] What Red Hat Enterprise/CentOS NUMA libraries are recommended/required for OpenMPI?

2015-08-05 Thread Lane, William
I'm running OpenMPI 1.8.7 tests on a mixed bag cluster of various systems under CentOS 6.3, I've been intermittently getting warnings about not having the proper NUMA libraries installed. Which NUMA libraries should be installed for CentOS 6.3 and OpenMPI 1.8.7? Here's what I currently have instal
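On CentOS 6.x the intermittent warning about missing NUMA support usually goes away once the numactl/hwloc userspace libraries are present; a hedged example (package names as found in the stock CentOS 6 repositories, and exact names may vary by distro):

    yum install numactl numactl-devel hwloc
    # then re-run the job; Open MPI's hwloc-based binding should find libnuma at runtime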

[OMPI users] SGE problems w/OpenMPI 1.8.7

2015-07-30 Thread Lane, William
I'm running a mixed cluster of Blades (HS21 and HS22 chassis), x3550-M3 and X3550-M4 systems, some of which support hyperthreading, while others don't (specifically the HS21 blades) all on CentOS 6.3 w/SGE. I have no problems running my simple OpenMPI 1.8.7 test code outside of SGE (with or with
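For comparison, a minimal SGE submission script for a tight-integration run looks roughly like the sketch below (the PE name "orte", the binary name and the install path are assumptions, not taken from the thread):

    #!/bin/bash
    #$ -N ompi_test
    #$ -pe orte 64        # request 64 slots from the MPI parallel environment
    #$ -cwd -j y
    # under tight integration mpirun takes the slot count and host list from SGE itself
    /hpc/apps/mpi/openmpi/1.8.7/bin/mpirun -np $NSLOTS ./mpi_test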

Re: [OMPI users] OpenMPI 1.8.6, CentOS 6.3, too many slots = crash

2015-07-24 Thread Lane, William
ckages are installed on compute nodes? Thank you for all your help w/this Ralph, now I can move on and get the Linpack benchmark running. -Bill L. From: users [users-boun...@open-mpi.org] on behalf of Lane, William [william.l...@cshs.org] Sent: Thursday, July 16, 20

[OMPI users] NUMA: Non-local memory access and performance effects on OpenMPI

2015-07-24 Thread Lane, William
I'm just curious, if we run an OpenMPI job and it makes use of non-local memory (i.e. memory tied to another socket) what kind of effects are seen on performance? How would you go about testing the above? I can't think of any command line parameter that would allow one to split an OpenMPI proces
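One crude way to measure the cost of remote memory access outside of MPI (a sketch using numactl on an assumed two-socket node; "./stream" is a placeholder for any memory-bandwidth benchmark) is to pin a process's CPUs to one NUMA node and its memory to the other:

    numactl --hardware                             # list NUMA nodes and their distances
    numactl --cpunodebind=0 --membind=0 ./stream   # local allocation
    numactl --cpunodebind=0 --membind=1 ./stream   # forced remote allocation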

Re: [OMPI users] OpenMPI 1.8.6, CentOS 6.3, too many slots = crash

2015-07-16 Thread Lane, William
users-boun...@open-mpi.org] on behalf of Ralph Castain [r...@open-mpi.org] Sent: Wednesday, July 15, 2015 10:08 PM To: Open MPI Users Subject: Re: [OMPI users] OpenMPI 1.8.6, CentOS 6.3, too many slots = crash Stable 1.8.7 has been released - please let me know if the problem is resolved. On

Re: [OMPI users] OpenMPI 1.8.6, CentOS 6.3, too many slots = crash

2015-07-16 Thread Lane, William
: Open MPI Users Subject: Re: [OMPI users] OpenMPI 1.8.6, CentOS 6.3, too many slots = crash Can you give it a try? I’m skeptical, but it might work. The rc is out on the web site: http://www.open-mpi.org/software/ompi/v1.8/ On Jul 14, 2015, at 11:17 AM, Lane, William <william.l...@cshs.o

Re: [OMPI users] OpenMPI 1.8.6, CentOS 6.3, too many slots = crash

2015-07-14 Thread Lane, William
ut at least one problem. We'll just have to see what happens and attack it next. Ralph On Tue, Jul 7, 2015 at 8:07 PM, Lane, William <william.l...@cshs.org> wrote: I'm sorry I haven't been able to get the lstopo information for all the nodes, but I had to get t

Re: [OMPI users] OpenMPI 1.8.6, CentOS 6.3, too many slots = crash

2015-07-07 Thread Lane, William
's > only one PU per core, then hyper threading is disabled. > > >> On Jun 29, 2015, at 4:42 PM, Lane, William wrote: >> >> Would the output of dmidecode -t processor and/or lstopo tell me conclusively >> if hyperthreading is enabled or not? Hyperthreading

Re: [OMPI users] OpenMPI 1.8.6, CentOS 6.3, too many slots = crash

2015-06-29 Thread Lane, William
its OS, are you ? Cheers, Gilles mca_btl_sm_add_procs( int mca_btl_sm_add_procs( On Wednesday, June 24, 2015, Lane, William <william.l...@cshs.org> wrote: Gilles, All the blades only have two core Xeons (without hyperthreading) populating both their sockets. All the x3550 nodes

Re: [OMPI users] OpenMPI 1.8.6, CentOS 6.3, too many slots = crash

2015-06-26 Thread Lane, William
e the issue :-( can you try to reproduce the issue with the smallest hostfile, and then run lstopo on all the nodes ? btw, you are not mixing 32 bits and 64 bits OS, are you ? Cheers, Gilles mca_btl_sm_add_procs( int mca_btl_sm_add_procs( On Wednesday, June 24, 2015, Lane, Willia

Re: [OMPI users] OpenMPI 1.8.6, CentOS 6.3, too many slots = crash

2015-06-24 Thread Lane, William
If there is a problem with placement, that is where it would exist. On Tue, Jun 23, 2015 at 5:12 PM, Lane, William <william.l...@cshs.org> wrote: Ralph, There is something funny going on

Re: [OMPI users] OpenMPI 1.8.6, CentOS 6.3, too many slots = crash

2015-06-23 Thread Lane, William
heers, Gilles On Friday, June 19, 2015, Lane, William <william.l...@cshs.org> wrote: Ralph, I created a hostfile that just has the names of the hosts while specifying no slot information whatsoever (e.g. csclprd3-0-0) and received the following errors: mpirun -np 132 -report-binding

Re: [OMPI users] OpenMPI 1.8.6, CentOS 6.3, too many slots = crash

2015-06-23 Thread Lane, William
Ralph, There is something funny going on, the traces from the runs w/the debug build aren't showing any differences from what I got earlier. However, I did do a run w/the --bind-to core switch and was surprised to see that hyperthreading cores were sometimes being used. Here's the traces that I ha

Re: [OMPI users] OpenMPI 1.8.6, CentOS 6.3, too many slots = crash

2015-06-23 Thread Lane, William
e the issue ? do all the nodes have the same configuration ? if yes, what happens without --hetero-nodes ? Cheers, Gilles On Friday, June 19, 2015, Lane, William wrote: Ralph, I created a hostfile that just has the names of the hosts while specifying no slot information whatsoever (e.g. cs

Re: [OMPI users] OpenMPI 1.8.6, CentOS 6.3, too many slots = crash

2015-06-19 Thread Lane, William
l? I also suspect that you would have no problems if you -bind-to none - does that in fact work? On Jun 18, 2015, at 4:54 PM, Lane, William <william.l...@cshs.org> wrote: I'm having a strange problem w/OpenMPI 1.8.6. If I run my OpenMPI test code (compiled against OpenMPI 1.8.

[OMPI users] OpenMPI 1.8.6, CentOS 6.3, too many slots = crash

2015-06-18 Thread Lane, William
I'm having a strange problem w/OpenMPI 1.8.6. If I run my OpenMPI test code (compiled against OpenMPI 1.8.6 libraries) on < 131 slots I get no issues. Anything over 131 errors out: mpirun -np 132 -report-bindings --prefix /hpc/apps/mpi/openmpi/1.8.6/ --hostfile hostfile-single --mca btl_tcp_if_in
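For context, the hostfile referenced here is just a plain list of node names with optional slot counts; a hypothetical fragment (node names follow the pattern in the logs above, slot counts and the binary name are made up) and the corresponding invocation:

    # hostfile-single
    csclprd3-0-0 slots=8
    csclprd3-0-1 slots=8
    csclprd3-6-10 slots=16

    mpirun -np 132 --report-bindings --hostfile hostfile-single ./a_1_8_6.out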

[OMPI users] OpenMPI stable 1.8.6 release date?

2015-06-15 Thread Lane, William
Is there any fixed date at which time OpenMPI stable 1.8.6 will be released? I know the pre-release version is available but I'd rather wait for the stable version. Thank you, -Bill Lane

[OMPI users] Problems running linpack benchmark on old Sunfire opteron nodes

2015-05-23 Thread Lane, William
I've compiled the linpack benchmark using openMPI 1.8.5 libraries and include files on CentOS 6.4. I've tested the binary on the one Intel node (some sort of 4-core Xeon) and it runs, but when I try to run it on any of the old Sunfire opteron compute nodes it appears to hang (although top indicate

Re: [OMPI users] OpenMPI 1.8.2 problems on CentOS 6.3

2015-04-28 Thread Lane, William
he behavior from this particular cmd line. Looks like we are binding-to-core by default, even if you specify use-hwthread-cpus. I’ll fix that one - still don’t understand the segfaults. Bill: can you shed some light on those? On Apr 9, 2015, at 8:28 PM, Lane, William <william.l...@cshs.

Re: [OMPI users] OpenMPI 1.8.2 problems on CentOS 6.3

2015-04-09 Thread Lane, William
On Apr 8, 2015, at 10:55 AM, Lane, William <william.l...@cshs.org> wrote: Ralph, I added one of the newer LGA2011 nodes to my hostfile and re-ran the benchmark successfully and saw some strange results WRT the binding directives. Why are hyperthreading cores being used on the LGA

Re: [OMPI users] OpenMPI 1.8.2 problems on CentOS 6.3

2015-04-08 Thread Lane, William
On Apr 8, 2015, at 9:29 AM, Lane, William <william.l...@cshs.org> wrote: Ralph, Thanks for YOUR help, I never would've managed to get the LAPACK benchmark running on more than one node in our cluster without your help. Ralph, i

Re: [OMPI users] OpenMPI 1.8.2 problems on CentOS 6.3

2015-04-08 Thread Lane, William
ybe just NUMA nodes that also have hyperthreading capabilities? Bill L. From: users [users-boun...@open-mpi.org] on behalf of Lane, William [william.l...@cshs.org] Sent: Wednesday, April 08, 2015 9:29 AM To: Open MPI Users Subject: Re: [OMPI users] OpenMPI 1.8.2

Re: [OMPI users] OpenMPI 1.8.2 problems on CentOS 6.3

2015-04-08 Thread Lane, William
at 1:17 PM, Lane, William <william.l...@cshs.org> wrote: Ralph, I've finally had some luck using the following: $MPI_DIR/bin/mpirun -np $NSLOTS --report-bindings --hostfile hostfile-single --mca btl_tcp_if_include eth0 --hetero-nodes --use-hwthread-cpus --prefix $MPI_DI

Re: [OMPI users] OpenMPI 1.8.2 problems on CentOS 6.3

2015-04-07 Thread Lane, William
lso help. On Apr 6, 2015, at 12:24 PM, Lane, William <william.l...@cshs.org> wrote: Ralph, For the following two different commandline invocations of the LAPACK benchmark $MPI_DIR/bin/mpirun -np $NSLOTS --report-bindings --hostfile hostfile-no_slots --mca btl_tcp_if_include eth0 --h

Re: [OMPI users] OpenMPI 1.8.2 problems on CentOS 6.3

2015-04-06 Thread Lane, William
Sent: Sunday, April 05, 2015 6:09 PM To: Open MPI Users Subject: Re: [OMPI users] OpenMPI 1.8.2 problems on CentOS 6.3 On Apr 5, 2015, at 5:58 PM, Lane, William <william.l...@cshs.org> wrote: I think some of the Intel Blade systems in the cluster are dual core, but don't suppor

Re: [OMPI users] OpenMPI 1.8.2 problems on CentOS 6.3

2015-04-05 Thread Lane, William
't require a swap region be mounted - I didn't see anything in your original message indicating that OMPI had actually crashed, but just wasn't launching due to the above issue. Were you actually seeing crashes as well? On Wed, Apr 1, 2015 at 8:31 AM, Lane, William <william.l

Re: [OMPI users] OpenMPI 1.8.2 problems on CentOS 6.3

2015-04-01 Thread Lane, William
upport one proc/core - i.e., we didn't at least 128 cores in the sum of the nodes you told us about. I take it you were expecting that there were that many or more? Ralph On Wed, Apr 1, 2015 at 12:54 AM, Lane, William <william.l...@cshs.org> wrote: I'm having problems ru

[OMPI users] OpenMPI 1.8.2 problems on CentOS 6.3

2015-04-01 Thread Lane, William
I'm having problems running OpenMPI jobs (using a hostfile) on an HPC cluster running ROCKS on CentOS 6.3. I'm running OpenMPI outside of Sun Grid Engine (i.e. it is not submitted as a job to SGE). The program being run is a LAPACK benchmark. The commandline parameter I'm using to run the jobs is:
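Pulling together the flags that recur later in this thread, the working invocation was of roughly this shape (reconstructed from the quoted fragments above and below; $MPI_DIR, $NSLOTS and the hostfile name come from those quotes, while the benchmark binary name is a placeholder):

    $MPI_DIR/bin/mpirun -np $NSLOTS --report-bindings \
        --hostfile hostfile-single \
        --mca btl_tcp_if_include eth0 \
        --hetero-nodes --use-hwthread-cpus \
        --prefix $MPI_DIR ./xhpl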

Re: [OMPI users] Mpirun 1.5.4 problems when request > 28 slots (updated findings)

2014-09-02 Thread Lane, William
@open-mpi.org] on behalf of Ralph Castain [r...@open-mpi.org] Sent: Tuesday, September 02, 2014 11:03 AM To: Open MPI Users Subject: Re: [OMPI users] Mpirun 1.5.4 problems when request > 28 slots (updated findings) On Sep 2, 2014, at 10:48 AM, Lane, William wrote: > Ralph, > > T

Re: [OMPI users] Mpirun 1.5.4 problems when request > 28 slots (updated findings)

2014-09-02 Thread Lane, William
e between the "to" and the "none". You'll take a performance hit, but it should at least run. On Aug 29, 2014, at 11:29 PM, Lane, William wrote: > The --bind-to-none switch didn't help, I'm still getting the same errors. > > The only NUMA pa

Re: [OMPI users] Mpirun 1.5.4 problems when request > 28 slots (updated findings)

2014-08-30 Thread Lane, William
uest > 28 slots (updated findings) Am 28.08.2014 um 10:09 schrieb Lane, William: > I have some updates on these issues and some test results as well. > > We upgraded OpenMPI to the latest version 1.8.2, but when submitting jobs via > the SGE orte parallel environment received &g

Re: [OMPI users] Mpirun 1.5.4 problems when request > 28 slots (updated findings)

2014-08-28 Thread Lane, William
users-boun...@open-mpi.org] on behalf of Jeff Squyres (jsquyres) [jsquy...@cisco.com] Sent: Friday, August 08, 2014 5:25 AM To: Open MPI User's List Subject: Re: [OMPI users] Mpirun 1.5.4 problems when request > 28 slots On Aug 8, 2014, at 1:24 AM, Lane, William wrote: > Using the

Re: [OMPI users] Mpirun 1.5.4 problems when request > 28 slots

2014-08-08 Thread Lane, William
hich may be a core or a hyperthread). So if you're running in a bind-to-core situation, if it's a "before OMPI supported HT properly" version, then you'll bind 2 MPI processes to a single core, and that will likely be pretty terrible for overall performance. Does that help

Re: [OMPI users] Mpirun 1.5.4 problems when request > 28 slots

2014-07-22 Thread Lane, William
ject: Re: [OMPI users] Mpirun 1.5.4 problems when request > 28 slots Hmmm...that's not a "bug", but just a packaging issue with the way CentOS distributed some variants of OMPI that requires you install/update things in a specific order. On Jul 20, 2014, at 11:34 PM, Lane, Will

Re: [OMPI users] Mpirun 1.5.4 problems when request > 28 slots

2014-07-21 Thread Lane, William
28 slots I'm unaware of any CentOS-OMPI bug, and I've been using CentOS throughout the 6.x series running OMPI 1.6.x and above. I can't speak to the older versions of CentOS and/or the older versions of OMPI. On Jul 19, 2014, at 8:14 PM, Lane, William <william.l...@cshs

Re: [OMPI users] Mpirun 1.5.4 problems when request > 28 slots

2014-07-19 Thread Lane, William
OMPI users] Mpirun 1.5.4 problems when request > 28 slots Not for this test case size. You should be just fine with the default values. If I understand you correctly, you've run this app at scale before on another cluster without problem? On Jul 19, 2014, at 1:34 PM, Lane, William

Re: [OMPI users] Mpirun 1.5.4 problems when request > 28 slots

2014-07-19 Thread Lane, William
r app, which is what I'd expect given your description * try using gdb (or pick your debugger) to look at the corefile and see where it is failing I'd also suggest updating OMPI to the 1.6.5 or 1.8.1 versions, but I doubt that's the issue behind this problem. On Jul 19, 201
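The suggestion to inspect the corefile amounts to something like the following (a generic sketch; the binary and core file names are placeholders, and core dumps must be enabled before reproducing the crash):

    ulimit -c unlimited        # allow core dumps before re-running
    mpirun -np 32 ./mpi_test   # reproduce the SIGSEGV
    gdb ./mpi_test core.802    # open the core left by the failing rank
    (gdb) bt                   # print the backtrace to see where it died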

[OMPI users] Mpirun 1.5.4 problems when request > 28 slots

2014-07-19 Thread Lane, William
I'm getting consistent errors of the form: "mpirun noticed that process rank 3 with PID 802 on node csclprd3-0-8 exited on signal 11 (Segmentation fault)." whenever I request more than 28 slots. These errors even occur when I run mpirun locally on a compute node that has 32 slots (8 cores, 16 wi

[OMPI users] unsubscribe

2011-09-19 Thread Lane, William
please unsubscribe me from this maillist. Thank you, -Bill Lane From: users-boun...@open-mpi.org [users-boun...@open-mpi.org] on behalf of Ole Nielsen [ole.moller.niel...@gmail.com] Sent: Monday, September 19, 2011 1:39 AM To: us...@open-mpi.org Subject: Re: [OMP

Re: [OMPI users] Can run OpenMPI testcode on 86 or fewer slots in cluster, but nothing more than that

2011-07-27 Thread Lane, William
Thank you for your help Ralph and Reuti, The problem turned out to be that the number of file descriptors was insufficient. The reason given by a sys admin was that since SGE isn't a user it wasn't initially using the new upper bound on the number of file descriptors. -Bill Lane

Re: [OMPI users] Can run OpenMPI testcode on 86 or fewer slots in cluster, but nothing more than that

2011-07-26 Thread Lane, William
descriptors on that node, assuming your application does a complete wireup across all procs. Updating to 1.4.3 would be a good idea as it is more stable, but it may not resolve this problem if the issue is one of the above. HTH Ralph On Jul 25, 2011, at 11:23 PM, Lane, William wrote: > Please h

[OMPI users] Can run OpenMPI testcode on 86 or fewer slots in cluster, but nothing more than that

2011-07-26 Thread Lane, William
Please help me resolve the following problems with a 306-node Rocks cluster using SGE. Please note I can run the job successfully on <87 slots, but not any more than that. We're running SGE and I'm submitting my jobs via the SGE CLI utility qsub and the following lines from a script: mpirun -n $