Here is what I see in my 1.8.5 build lib directory:

lrwxrwxrwx. 1 rhc      15 Apr 28 07:51 libmpi.so -> libmpi.so.1.6.0*
lrwxrwxrwx. 1 rhc      15 Apr 28 07:51 libmpi.so.1 -> libmpi.so.1.6.0*
-rwxr-xr-x. 1 rhc 1015923 Apr 28 07:51 libmpi.so.1.6.0*
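A quick way to confirm what the benchmark binary actually resolves before rebuilding anything is a sketch like the following (paths are taken from the messages quoted below and may need adjusting for the development box; the final link mirrors the listing above and is only appropriate if the install really does contain libmpi.so.1.6.0):

    # Which libmpi soname does xhpl want, and is it being found?
    ldd /hpc/apps/benchmarks/hpl/xhpl | grep libmpi

    # Make sure the runtime linker searches the intended install first
    export LD_LIBRARY_PATH=/apps/mpi/openmpi/1.8.5-dev/lib:$LD_LIBRARY_PATH

    # If that directory has libmpi.so.1.6.0 but the libmpi.so.1 link is
    # missing, restore the link shown in the listing above
    cd /apps/mpi/openmpi/1.8.5-dev/lib && ln -s libmpi.so.1.6.0 libmpi.so.1

If the install only provides libmpi.so.0, the binary was linked against a different Open MPI release, and pointing LD_LIBRARY_PATH at the install it was built with (or relinking xhpl) is the safer route.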
So it should just be a link > On Apr 28, 2015, at 10:30 AM, Lane, William <william.l...@cshs.org> wrote: > > Ralph, > > I copied the LAPACK benchmark binaries (xhpl being the binary) over to a > development system (which > is running the same version of CentOS) but I'm getting some errors trying to > run the OpenMPI LAPACK benchmark > on OpenMPI 1.8.5: > > xhpl: error while loading shared libraries: libmpi.so.1: cannot open shared > object file: No such file or directory > xhpl: error while loading shared libraries: libmpi.so.1: cannot open shared > object file: No such file or directory > xhpl: error while loading shared libraries: libmpi.so.1: cannot open shared > object file: No such file or directory > xhpl: error while loading shared libraries: libmpi.so.1: cannot open shared > object file: No such file or directory > > When I look at the 1.8.5 install directory I find the following shared object > library but no libmpi.so.1 > > /apps/mpi/openmpi/1.8.5-dev/lib/libmpi.so > /apps/mpi/openmpi/1.8.5-dev/lib/libmpi.so.0 > > Is it necessary to re-compile the OpenMPI LAPACK benchmark to run OpenMPI > 1.8.5 > as opposed to 1.8.2? > > -Bill L. > > From: users [users-boun...@open-mpi.org] on behalf of Ralph Castain > [r...@open-mpi.org] > Sent: Friday, April 10, 2015 5:28 PM > To: Open MPI Users > Subject: Re: [OMPI users] OpenMPI 1.8.2 problems on CentOS 6.3 > > This will be in the next nightly 1.8.5 tarball. > > Bill: can you test it to see if we’ve fixed the problem? > > Thanks > Ralph > > >> On Apr 10, 2015, at 2:15 PM, Ralph Castain <r...@open-mpi.org >> <mailto:r...@open-mpi.org>> wrote: >> >> Okay, I at least now understand the behavior from this particular cmd line. >> Looks like we are binding-to-core by default, even if you specify >> use-hwthread-cpus. I’ll fix that one - still don’t understand the segfaults. >> >> Bill: can you shed some light on those? >> >> >>> On Apr 9, 2015, at 8:28 PM, Lane, William <william.l...@cshs.org >>> <mailto:william.l...@cshs.org>> wrote: >>> >>> Ralph, >>> >>> In looking at the /proc/cpuinfo textfile it looks like hyperthreading >>> is enabled (in that it indicates 16 siblings for each of the 8 cores of the >>> two LGA2011 CPU's). I don't have access to the BIOS on this system though >>> so I'll have to check w/someone else. >>> >>> I have done more testing and found that at 104 slots requested OpenMPI >>> won't run the LAPACK benchmark. All the LGA2011 nodes exhibit the same >>> strange binding behavior (maybe because hyperthreading is turned on for >>> these nodes, but no the LGA 1366 nodes?). Below is all the relevant >>> information to >>> that run: >>> >>> II. >>> a. $MPI_DIR/bin/mpirun -np $NSLOTS --report-bindings --hostfile >>> hostfile-single --mca btl_tcp_if_include eth0 --hetero-nodes >>> --use-hwthread-cpus --prefix $MPI_DIR $BENCH_DIR/$APP_DIR/$APP_BIN >>> >>> where NSLOTS=104 >>> >>> b. >>> [lanew@csclprd3s1 hpl]$ . /hpc/apps/benchmarks/runhpl4.job >>> [csclprd3-6-1:27586] MCW rank 3 bound to socket 1[core 3[hwt 0]]: [./.][./B] >>> [csclprd3-6-1:27586] MCW rank 0 bound to socket 0[core 0[hwt 0]]: [B/.][./.] >>> [csclprd3-6-1:27586] MCW rank 1 bound to socket 1[core 2[hwt 0]]: [./.][B/.] >>> [csclprd3-6-1:27586] MCW rank 2 bound to socket 0[core 1[hwt 0]]: [./B][./.] >>> [csclprd3-0-2:04454] MCW rank 27 bound to socket 0[core 1[hwt 0]]: >>> [./B/./././.] >>> [csclprd3-0-2:04454] MCW rank 28 bound to socket 0[core 2[hwt 0]]: >>> [././B/././.] >>> [csclprd3-0-2:04454] MCW rank 29 bound to socket 0[core 3[hwt 0]]: >>> [./././B/./.] 
>>> [csclprd3-0-2:04454] MCW rank 30 bound to socket 0[core 4[hwt 0]]: >>> [././././B/.] >>> [csclprd3-0-2:04454] MCW rank 31 bound to socket 0[core 5[hwt 0]]: >>> [./././././B] >>> [csclprd3-0-2:04454] MCW rank 26 bound to socket 0[core 0[hwt 0]]: >>> [B/././././.] >>> [csclprd3-0-0:21129] MCW rank 8 bound to socket 0[core 0[hwt 0]]: >>> [B/././././.][./././././.] >>> [csclprd3-0-0:21129] MCW rank 9 bound to socket 1[core 6[hwt 0]]: >>> [./././././.][B/././././.] >>> [csclprd3-0-0:21129] MCW rank 10 bound to socket 0[core 1[hwt 0]]: >>> [./B/./././.][./././././.] >>> [csclprd3-0-0:21129] MCW rank 11 bound to socket 1[core 7[hwt 0]]: >>> [./././././.][./B/./././.] >>> [csclprd3-0-0:21129] MCW rank 12 bound to socket 0[core 2[hwt 0]]: >>> [././B/././.][./././././.] >>> [csclprd3-0-0:21129] MCW rank 13 bound to socket 1[core 8[hwt 0]]: >>> [./././././.][././B/././.] >>> [csclprd3-0-0:21129] MCW rank 14 bound to socket 0[core 3[hwt 0]]: >>> [./././B/./.][./././././.] >>> [csclprd3-0-0:21129] MCW rank 15 bound to socket 1[core 9[hwt 0]]: >>> [./././././.][./././B/./.] >>> [csclprd3-0-0:21129] MCW rank 16 bound to socket 0[core 4[hwt 0]]: >>> [././././B/.][./././././.] >>> [csclprd3-0-0:21129] MCW rank 17 bound to socket 1[core 10[hwt 0]]: >>> [./././././.][././././B/.] >>> [csclprd3-0-0:21129] MCW rank 18 bound to socket 0[core 5[hwt 0]]: >>> [./././././B][./././././.] >>> [csclprd3-0-0:21129] MCW rank 19 bound to socket 1[core 11[hwt 0]]: >>> [./././././.][./././././B] >>> [csclprd3-0-1:12882] MCW rank 22 bound to socket 0[core 2[hwt 0]]: >>> [././B/././.] >>> [csclprd3-0-1:12882] MCW rank 23 bound to socket 0[core 3[hwt 0]]: >>> [./././B/./.] >>> [csclprd3-0-1:12882] MCW rank 24 bound to socket 0[core 4[hwt 0]]: >>> [././././B/.] >>> [csclprd3-0-1:12882] MCW rank 25 bound to socket 0[core 5[hwt 0]]: >>> [./././././B] >>> [csclprd3-0-1:12882] MCW rank 20 bound to socket 0[core 0[hwt 0]]: >>> [B/././././.] >>> [csclprd3-0-1:12882] MCW rank 21 bound to socket 0[core 1[hwt 0]]: >>> [./B/./././.] >>> [csclprd3-0-9:27268] MCW rank 92 bound to socket 0[core 2[hwt 0-1]]: >>> [../../BB/../../../../..][../../../../../../../..] >>> [csclprd3-0-9:27268] MCW rank 93 bound to socket 1[core 10[hwt 0-1]]: >>> [../../../../../../../..][../../BB/../../../../..] >>> [csclprd3-0-9:27268] MCW rank 94 bound to socket 0[core 3[hwt 0-1]]: >>> [../../../BB/../../../..][../../../../../../../..] >>> [csclprd3-0-9:27268] MCW rank 95 bound to socket 1[core 11[hwt 0-1]]: >>> [../../../../../../../..][../../../BB/../../../..] >>> [csclprd3-0-9:27268] MCW rank 96 bound to socket 0[core 4[hwt 0-1]]: >>> [../../../../BB/../../..][../../../../../../../..] >>> [csclprd3-0-9:27268] MCW rank 97 bound to socket 1[core 12[hwt 0-1]]: >>> [../../../../../../../..][../../../../BB/../../..] >>> [csclprd3-0-8:22880] MCW rank 84 bound to socket 0[core 6[hwt 0-1]]: >>> [../../../../../../BB/..][../../../../../../../..] >>> [csclprd3-0-8:22880] MCW rank 85 bound to socket 1[core 14[hwt 0-1]]: >>> [../../../../../../../..][../../../../../../BB/..] >>> [csclprd3-0-8:22880] MCW rank 86 bound to socket 0[core 7[hwt 0-1]]: >>> [../../../../../../../BB][../../../../../../../..] >>> [csclprd3-0-8:22880] MCW rank 87 bound to socket 1[core 15[hwt 0-1]]: >>> [../../../../../../../..][../../../../../../../BB] >>> [csclprd3-0-8:22880] MCW rank 72 bound to socket 0[core 0[hwt 0-1]]: >>> [BB/../../../../../../..][../../../../../../../..] 
>>> [csclprd3-0-8:22880] MCW rank 73 bound to socket 1[core 8[hwt 0-1]]: >>> [../../../../../../../..][BB/../../../../../../..] >>> [csclprd3-0-8:22880] MCW rank 74 bound to socket 0[core 1[hwt 0-1]]: >>> [../BB/../../../../../..][../../../../../../../..] >>> [csclprd3-0-8:22880] MCW rank 75 bound to socket 1[core 9[hwt 0-1]]: >>> [../../../../../../../..][../BB/../../../../../..] >>> [csclprd3-0-8:22880] MCW rank 76 bound to socket 0[core 2[hwt 0-1]]: >>> [../../BB/../../../../..][../../../../../../../..] >>> [csclprd3-0-8:22880] MCW rank 77 bound to socket 1[core 10[hwt 0-1]]: >>> [../../../../../../../..][../../BB/../../../../..] >>> [csclprd3-0-8:22880] MCW rank 78 bound to socket 0[core 3[hwt 0-1]]: >>> [../../../BB/../../../..][../../../../../../../..] >>> [csclprd3-6-5:18000] MCW rank 7 bound to socket 1[core 3[hwt 0]]: [./.][./B] >>> [csclprd3-6-5:18000] MCW rank 4 bound to socket 0[core 0[hwt 0]]: [B/.][./.] >>> [csclprd3-6-5:18000] MCW rank 5 bound to socket 1[core 2[hwt 0]]: [./.][B/.] >>> [csclprd3-6-5:18000] MCW rank 6 bound to socket 0[core 1[hwt 0]]: [./B][./.] >>> [csclprd3-0-7:08058] MCW rank 60 bound to socket 0[core 2[hwt 0-1]]: >>> [../../BB/../../../../..][../../../../../../../..] >>> [csclprd3-0-7:08058] MCW rank 61 bound to socket 1[core 10[hwt 0-1]]: >>> [../../../../../../../..][../../BB/../../../../..] >>> [csclprd3-0-7:08058] MCW rank 62 bound to socket 0[core 3[hwt 0-1]]: >>> [../../../BB/../../../..][../../../../../../../..] >>> [csclprd3-0-7:08058] MCW rank 63 bound to socket 1[core 11[hwt 0-1]]: >>> [../../../../../../../..][../../../BB/../../../..] >>> [csclprd3-0-7:08058] MCW rank 64 bound to socket 0[core 4[hwt 0-1]]: >>> [../../../../BB/../../..][../../../../../../../..] >>> [csclprd3-0-7:08058] MCW rank 65 bound to socket 1[core 12[hwt 0-1]]: >>> [../../../../../../../..][../../../../BB/../../..] >>> [csclprd3-0-7:08058] MCW rank 66 bound to socket 0[core 5[hwt 0-1]]: >>> [../../../../../BB/../..][../../../../../../../..] >>> [csclprd3-0-7:08058] MCW rank 67 bound to socket 1[core 13[hwt 0-1]]: >>> [../../../../../../../..][../../../../../BB/../..] >>> [csclprd3-0-7:08058] MCW rank 68 bound to socket 0[core 6[hwt 0-1]]: >>> [../../../../../../BB/..][../../../../../../../..] >>> [csclprd3-0-7:08058] MCW rank 69 bound to socket 1[core 14[hwt 0-1]]: >>> [../../../../../../../..][../../../../../../BB/..] >>> [csclprd3-0-7:08058] MCW rank 70 bound to socket 0[core 7[hwt 0-1]]: >>> [../../../../../../../BB][../../../../../../../..] >>> [csclprd3-0-7:08058] MCW rank 71 bound to socket 1[core 15[hwt 0-1]]: >>> [../../../../../../../..][../../../../../../../BB] >>> [csclprd3-0-7:08058] MCW rank 56 bound to socket 0[core 0[hwt 0-1]]: >>> [BB/../../../../../../..][../../../../../../../..] >>> [csclprd3-0-7:08058] MCW rank 57 bound to socket 1[core 8[hwt 0-1]]: >>> [../../../../../../../..][BB/../../../../../../..] >>> [csclprd3-0-7:08058] MCW rank 58 bound to socket 0[core 1[hwt 0-1]]: >>> [../BB/../../../../../..][../../../../../../../..] >>> [csclprd3-0-7:08058] MCW rank 59 bound to socket 1[core 9[hwt 0-1]]: >>> [../../../../../../../..][../BB/../../../../../..] >>> [csclprd3-0-5:15446] MCW rank 44 bound to socket 0[core 0[hwt 0]]: >>> [B/././././.] >>> [csclprd3-0-5:15446] MCW rank 45 bound to socket 0[core 1[hwt 0]]: >>> [./B/./././.] >>> [csclprd3-0-5:15446] MCW rank 46 bound to socket 0[core 2[hwt 0]]: >>> [././B/././.] >>> [csclprd3-0-5:15446] MCW rank 47 bound to socket 0[core 3[hwt 0]]: >>> [./././B/./.] 
>>> [csclprd3-0-5:15446] MCW rank 48 bound to socket 0[core 4[hwt 0]]: >>> [././././B/.] >>> [csclprd3-0-5:15446] MCW rank 49 bound to socket 0[core 5[hwt 0]]: >>> [./././././B] >>> [csclprd3-0-9:27268] MCW rank 98 bound to socket 0[core 5[hwt 0-1]]: >>> [../../../../../BB/../..][../../../../../../../..] >>> [csclprd3-0-9:27268] MCW rank 99 bound to socket 1[core 13[hwt 0-1]]: >>> [../../../../../../../..][../../../../../BB/../..] >>> [csclprd3-0-9:27268] MCW rank 100 bound to socket 0[core 6[hwt 0-1]]: >>> [../../../../../../BB/..][../../../../../../../..] >>> [csclprd3-0-9:27268] MCW rank 101 bound to socket 1[core 14[hwt 0-1]]: >>> [../../../../../../../..][../../../../../../BB/..] >>> [csclprd3-0-9:27268] MCW rank 102 bound to socket 0[core 7[hwt 0-1]]: >>> [../../../../../../../BB][../../../../../../../..] >>> [csclprd3-0-9:27268] MCW rank 103 bound to socket 1[core 15[hwt 0-1]]: >>> [../../../../../../../..][../../../../../../../BB] >>> [csclprd3-0-9:27268] MCW rank 88 bound to socket 0[core 0[hwt 0-1]]: >>> [BB/../../../../../../..][../../../../../../../..] >>> [csclprd3-0-9:27268] MCW rank 89 bound to socket 1[core 8[hwt 0-1]]: >>> [../../../../../../../..][BB/../../../../../../..] >>> [csclprd3-0-9:27268] MCW rank 90 bound to socket 0[core 1[hwt 0-1]]: >>> [../BB/../../../../../..][../../../../../../../..] >>> [csclprd3-0-9:27268] MCW rank 91 bound to socket 1[core 9[hwt 0-1]]: >>> [../../../../../../../..][../BB/../../../../../..] >>> [csclprd3-0-6:28854] MCW rank 51 bound to socket 0[core 1[hwt 0]]: >>> [./B/./././.] >>> [csclprd3-0-6:28854] MCW rank 52 bound to socket 0[core 2[hwt 0]]: >>> [././B/././.] >>> [csclprd3-0-6:28854] MCW rank 53 bound to socket 0[core 3[hwt 0]]: >>> [./././B/./.] >>> [csclprd3-0-6:28854] MCW rank 54 bound to socket 0[core 4[hwt 0]]: >>> [././././B/.] >>> [csclprd3-0-6:28854] MCW rank 55 bound to socket 0[core 5[hwt 0]]: >>> [./././././B] >>> [csclprd3-0-6:28854] MCW rank 50 bound to socket 0[core 0[hwt 0]]: >>> [B/././././.] >>> [csclprd3-0-4:28072] MCW rank 38 bound to socket 0[core 0[hwt 0]]: >>> [B/././././.] >>> [csclprd3-0-4:28072] MCW rank 39 bound to socket 0[core 1[hwt 0]]: >>> [./B/./././.] >>> [csclprd3-0-4:28072] MCW rank 40 bound to socket 0[core 2[hwt 0]]: >>> [././B/././.] >>> [csclprd3-0-4:28072] MCW rank 41 bound to socket 0[core 3[hwt 0]]: >>> [./././B/./.] >>> [csclprd3-0-4:28072] MCW rank 42 bound to socket 0[core 4[hwt 0]]: >>> [././././B/.] >>> [csclprd3-0-4:28072] MCW rank 43 bound to socket 0[core 5[hwt 0]]: >>> [./././././B] >>> [csclprd3-0-8:22880] MCW rank 79 bound to socket 1[core 11[hwt 0-1]]: >>> [../../../../../../../..][../../../BB/../../../..] >>> [csclprd3-0-8:22880] MCW rank 80 bound to socket 0[core 4[hwt 0-1]]: >>> [../../../../BB/../../..][../../../../../../../..] >>> [csclprd3-0-8:22880] MCW rank 81 bound to socket 1[core 12[hwt 0-1]]: >>> [../../../../../../../..][../../../../BB/../../..] >>> [csclprd3-0-8:22880] MCW rank 82 bound to socket 0[core 5[hwt 0-1]]: >>> [../../../../../BB/../..][../../../../../../../..] >>> [csclprd3-0-8:22880] MCW rank 83 bound to socket 1[core 13[hwt 0-1]]: >>> [../../../../../../../..][../../../../../BB/../..] >>> [csclprd3-0-3:11837] MCW rank 33 bound to socket 0[core 1[hwt 0]]: >>> [./B/./././.] >>> [csclprd3-0-3:11837] MCW rank 34 bound to socket 0[core 2[hwt 0]]: >>> [././B/././.] >>> [csclprd3-0-3:11837] MCW rank 35 bound to socket 0[core 3[hwt 0]]: >>> [./././B/./.] >>> [csclprd3-0-3:11837] MCW rank 36 bound to socket 0[core 4[hwt 0]]: >>> [././././B/.] 
>>> [csclprd3-0-3:11837] MCW rank 37 bound to socket 0[core 5[hwt 0]]: >>> [./././././B] >>> [csclprd3-0-3:11837] MCW rank 32 bound to socket 0[core 0[hwt 0]]: >>> [B/././././.] >>> [csclprd3-0-9:27275] *** Process received signal *** >>> [csclprd3-0-9:27275] Signal: Bus error (7) >>> [csclprd3-0-9:27275] Signal code: Non-existant physical address (2) >>> [csclprd3-0-9:27275] Failing at address: 0x7f1215253000 >>> [csclprd3-0-9:27276] *** Process received signal *** >>> [csclprd3-0-9:27276] Signal: Bus error (7) >>> [csclprd3-0-9:27276] Signal code: Non-existant physical address (2) >>> [csclprd3-0-9:27276] Failing at address: 0x7f563e874380 >>> [csclprd3-0-9:27277] *** Process received signal *** >>> [csclprd3-0-9:27277] Signal: Bus error (7) >>> [csclprd3-0-9:27277] Signal code: Non-existant physical address (2) >>> [csclprd3-0-9:27277] Failing at address: 0x7fbbec79a300 >>> [csclprd3-0-9:27278] *** Process received signal *** >>> [csclprd3-0-9:27278] Signal: Bus error (7) >>> [csclprd3-0-9:27278] Signal code: Non-existant physical address (2) >>> [csclprd3-0-9:27278] Failing at address: 0x7fbadf816080 >>> [csclprd3-0-9:27279] *** Process received signal *** >>> [csclprd3-0-9:27279] Signal: Bus error (7) >>> [csclprd3-0-9:27279] Signal code: Non-existant physical address (2) >>> [csclprd3-0-9:27279] Failing at address: 0x7fab5dfa0100 >>> [csclprd3-0-9:27280] *** Process received signal *** >>> [csclprd3-0-9:27280] Signal: Bus error (7) >>> [csclprd3-0-9:27280] Signal code: Non-existant physical address (2) >>> [csclprd3-0-9:27280] Failing at address: 0x7f0bb4034500 >>> [csclprd3-0-9:27281] *** Process received signal *** >>> [csclprd3-0-9:27281] Signal: Bus error (7) >>> [csclprd3-0-9:27281] Signal code: Non-existant physical address (2) >>> [csclprd3-0-9:27281] Failing at address: 0x7f49bb544f80 >>> [csclprd3-0-9:27282] *** Process received signal *** >>> [csclprd3-0-9:27282] Signal: Bus error (7) >>> [csclprd3-0-9:27282] Signal code: Non-existant physical address (2) >>> [csclprd3-0-9:27282] Failing at address: 0x7fe647f61f00 >>> [csclprd3-0-9:27283] *** Process received signal *** >>> [csclprd3-0-9:27283] Signal: Bus error (7) >>> [csclprd3-0-9:27283] Signal code: Non-existant physical address (2) >>> [csclprd3-0-9:27283] Failing at address: 0x7f79a9d25580 >>> [csclprd3-0-9:27272] *** Process received signal *** >>> [csclprd3-0-9:27272] Signal: Bus error (7) >>> [csclprd3-0-9:27272] Signal code: Non-existant physical address (2) >>> [csclprd3-0-9:27272] Failing at address: 0x7f64568adf80 >>> [csclprd3-0-9:27269] *** Process received signal *** >>> [csclprd3-0-9:27269] Signal: Bus error (7) >>> [csclprd3-0-9:27269] Signal code: Non-existant physical address (2) >>> [csclprd3-0-9:27269] Failing at address: 0x7f5e2a17e580 >>> [csclprd3-0-9:27270] *** Process received signal *** >>> [csclprd3-0-9:27270] Signal: Bus error (7) >>> [csclprd3-0-9:27270] Signal code: Non-existant physical address (2) >>> [csclprd3-0-9:27270] Failing at address: 0x7fda95421400 >>> [csclprd3-0-9:27271] *** Process received signal *** >>> [csclprd3-0-9:27271] Signal: Bus error (7) >>> [csclprd3-0-9:27271] Signal code: Non-existant physical address (2) >>> [csclprd3-0-9:27271] Failing at address: 0x7f873e76c100 >>> [csclprd3-0-9:27271] [ 0] [csclprd3-0-9:27273] *** Process received signal >>> *** >>> [csclprd3-0-9:27273] Signal: Bus error (7) >>> [csclprd3-0-9:27273] Signal code: Non-existant physical address (2) >>> [csclprd3-0-9:27273] Failing at address: 0x7f5dc6e99e80 >>> [csclprd3-0-9:27274] *** Process 
received signal *** >>> [csclprd3-0-9:27274] Signal: Bus error (7) >>> [csclprd3-0-9:27274] Signal code: Non-existant physical address (2) >>> [csclprd3-0-9:27274] Failing at address: 0x7f83afce2280 >>> [csclprd3-0-9:27274] [ 0] [csclprd3-0-9:27269] [ 0] >>> /lib64/libc.so.6(+0x32920)[0x7f5e39b82920] >>> [csclprd3-0-9:27269] [ 1] >>> /hpc/apps/mpi/openmpi/1.8.2/lib/openmpi/mca_btl_sm.so(+0x511a)[0x7f5e2f5f111a] >>> [csclprd3-0-9:27269] [ 2] >>> /hpc/apps/mpi/openmpi/1.8.2/lib/libmpi.so.1(ompi_free_list_grow+0x239)[0x7f5e3a6960c9] >>> [csclprd3-0-9:27269] [ 3] >>> /hpc/apps/mpi/openmpi/1.8.2/lib/libmpi.so.1(ompi_free_list_resize_mt+0x40)[0x7f5e3a696200] >>> [csclprd3-0-9:27269] [ 4] >>> /hpc/apps/mpi/openmpi/1.8.2/lib/openmpi/mca_bml_r2.so(+0x138e)[0x7f5e2f9f838e] >>> [csclprd3-0-9:27269] [ 5] >>> /hpc/apps/mpi/openmpi/1.8.2/lib/openmpi/mca_pml_ob1.so(mca_pml_ob1_add_procs+0xd5)[0x7f5e2eb204e5] >>> [csclprd3-0-9:27269] [ 6] >>> /hpc/apps/mpi/openmpi/1.8.2/lib/libmpi.so.1(ompi_mpi_init+0x8d6)[0x7f5e3a6b0e26] >>> [csclprd3-0-9:27269] [ 7] >>> /hpc/apps/mpi/openmpi/1.8.2/lib/libmpi.so.1(MPI_Init+0x170)[0x7f5e3a6cfe10] >>> [csclprd3-0-9:27269] [ 8] /hpc/apps/benchmarks/hpl/xhpl[0x401571] >>> [csclprd3-0-9:27269] [ 9] >>> /lib64/libc.so.6(__libc_start_main+0xfd)[0x7f5e39b6ecdd] >>> [csclprd3-0-9:27269] [10] /hpc/apps/benchmarks/hpl/xhpl[0x401439] >>> [csclprd3-0-9:27269] *** End of error message *** >>> [csclprd3-0-9:27270] [ 0] /lib64/libc.so.6(+0x32920)[0x7fdaa8d89920] >>> [csclprd3-0-9:27270] [ 1] >>> /hpc/apps/mpi/openmpi/1.8.2/lib/openmpi/mca_btl_sm.so(+0x511a)[0x7fdaa29d711a] >>> [csclprd3-0-9:27270] [ 2] >>> /hpc/apps/mpi/openmpi/1.8.2/lib/libmpi.so.1(ompi_free_list_grow+0x239)[0x7fdaa989d0c9] >>> [csclprd3-0-9:27270] [ 3] >>> /hpc/apps/mpi/openmpi/1.8.2/lib/libmpi.so.1(ompi_free_list_resize_mt+0x40)[0x7fdaa989d200] >>> [csclprd3-0-9:27270] [ 4] >>> /hpc/apps/mpi/openmpi/1.8.2/lib/openmpi/mca_bml_r2.so(+0x138e)[0x7fdaa2dde38e] >>> [csclprd3-0-9:27270] [ 5] >>> /hpc/apps/mpi/openmpi/1.8.2/lib/openmpi/mca_pml_ob1.so(mca_pml_ob1_add_procs+0xd5)[0x7fdaa1f064e5] >>> [csclprd3-0-9:27270] [ 6] >>> /hpc/apps/mpi/openmpi/1.8.2/lib/libmpi.so.1(ompi_mpi_init+0x8d6)[0x7fdaa98b7e26] >>> [csclprd3-0-9:27270] [ 7] >>> /hpc/apps/mpi/openmpi/1.8.2/lib/libmpi.so.1(MPI_Init+0x170)[0x7fdaa98d6e10] >>> [csclprd3-0-9:27270] [ 8] /hpc/apps/benchmarks/hpl/xhpl[0x401571] >>> [csclprd3-0-9:27270] [ 9] >>> /lib64/libc.so.6(__libc_start_main+0xfd)[0x7fdaa8d75cdd] >>> [csclprd3-0-9:27270] [10] /hpc/apps/benchmarks/hpl/xhpl[0x401439] >>> [csclprd3-0-9:27270] *** End of error message *** >>> /lib64/libc.so.6(+0x32920)[0x7f875211a920] >>> [csclprd3-0-9:27271] [ 1] >>> /hpc/apps/mpi/openmpi/1.8.2/lib/openmpi/mca_btl_sm.so(+0x511a)[0x7f8747dfe11a] >>> [csclprd3-0-9:27271] [ 2] >>> /hpc/apps/mpi/openmpi/1.8.2/lib/libmpi.so.1(ompi_free_list_grow+0x239)[0x7f8752c2e0c9] >>> [csclprd3-0-9:27271] [ 3] >>> /hpc/apps/mpi/openmpi/1.8.2/lib/libmpi.so.1(ompi_free_list_resize_mt+0x40)[0x7f8752c2e200] >>> [csclprd3-0-9:27271] [ 4] >>> /hpc/apps/mpi/openmpi/1.8.2/lib/openmpi/mca_bml_r2.so(+0x138e)[0x7f874c20538e] >>> [csclprd3-0-9:27271] [ 5] >>> /hpc/apps/mpi/openmpi/1.8.2/lib/openmpi/mca_pml_ob1.so(mca_pml_ob1_add_procs+0xd5)[0x7f874732d4e5] >>> [csclprd3-0-9:27271] [ 6] >>> /hpc/apps/mpi/openmpi/1.8.2/lib/libmpi.so.1(ompi_mpi_init+0x8d6)[0x7f8752c48e26] >>> [csclprd3-0-9:27271] [ 7] >>> /hpc/apps/mpi/openmpi/1.8.2/lib/libmpi.so.1(MPI_Init+0x170)[0x7f8752c67e10] >>> [csclprd3-0-9:27271] [ 8] /hpc/apps/benchmarks/hpl/xhpl[0x401571] 
>>> [csclprd3-0-9:27271] [ 9] >>> /lib64/libc.so.6(__libc_start_main+0xfd)[0x7f8752106cdd] >>> [csclprd3-0-9:27271] [10] /hpc/apps/benchmarks/hpl/xhpl[0x401439] >>> [csclprd3-0-9:27271] *** End of error message *** >>> [csclprd3-0-9:27273] [ 0] /lib64/libc.so.6(+0x32920)[0x7f5ddaa3c920] >>> [csclprd3-0-9:27273] [ 1] >>> /hpc/apps/mpi/openmpi/1.8.2/lib/openmpi/mca_btl_sm.so(+0x511a)[0x7f5dd47ae11a] >>> [csclprd3-0-9:27273] [ 2] >>> /hpc/apps/mpi/openmpi/1.8.2/lib/libmpi.so.1(ompi_free_list_grow+0x239)[0x7f5ddb5500c9] >>> [csclprd3-0-9:27273] [ 3] >>> /hpc/apps/mpi/openmpi/1.8.2/lib/libmpi.so.1(ompi_free_list_resize_mt+0x40)[0x7f5ddb550200] >>> [csclprd3-0-9:27273] [ 4] >>> /hpc/apps/mpi/openmpi/1.8.2/lib/openmpi/mca_bml_r2.so(+0x138e)[0x7f5dd4bb538e] >>> [csclprd3-0-9:27273] [ 5] >>> /hpc/apps/mpi/openmpi/1.8.2/lib/openmpi/mca_pml_ob1.so(mca_pml_ob1_add_procs+0xd5)[0x7f5dcfbe54e5] >>> [csclprd3-0-9:27273] [ 6] >>> /hpc/apps/mpi/openmpi/1.8.2/lib/libmpi.so.1(ompi_mpi_init+0x8d6)[0x7f5ddb56ae26] >>> [csclprd3-0-9:27273] [ 7] >>> /hpc/apps/mpi/openmpi/1.8.2/lib/libmpi.so.1(MPI_Init+0x170)[0x7f5ddb589e10] >>> [csclprd3-0-9:27273] [ 8] /hpc/apps/benchmarks/hpl/xhpl[0x401571] >>> [csclprd3-0-9:27273] [ 9] >>> /lib64/libc.so.6(__libc_start_main+0xfd)[0x7f5ddaa28cdd] >>> [csclprd3-0-9:27273] [10] /hpc/apps/benchmarks/hpl/xhpl[0x401439] >>> [csclprd3-0-9:27273] *** End of error message *** >>> [csclprd3-0-9:27275] [ 0] /lib64/libc.so.6(+0x32920)[0x7f1228ede920] >>> [csclprd3-0-9:27275] [ 1] >>> /hpc/apps/mpi/openmpi/1.8.2/lib/openmpi/mca_btl_sm.so(+0x511a)[0x7f1222bac11a] >>> [csclprd3-0-9:27275] [ 2] >>> /hpc/apps/mpi/openmpi/1.8.2/lib/libmpi.so.1(ompi_free_list_grow+0x239)[0x7f12299f20c9] >>> [csclprd3-0-9:27275] [ 3] >>> /hpc/apps/mpi/openmpi/1.8.2/lib/libmpi.so.1(ompi_free_list_resize_mt+0x40)[0x7f12299f2200] >>> [csclprd3-0-9:27275] [ 4] >>> /hpc/apps/mpi/openmpi/1.8.2/lib/openmpi/mca_bml_r2.so(+0x138e)[0x7f1222fb338e] >>> [csclprd3-0-9:27275] [ 5] >>> /hpc/apps/mpi/openmpi/1.8.2/lib/openmpi/mca_pml_ob1.so(mca_pml_ob1_add_procs+0xd5)[0x7f12220db4e5] >>> [csclprd3-0-9:27275] [ 6] >>> /hpc/apps/mpi/openmpi/1.8.2/lib/libmpi.so.1(ompi_mpi_init+0x8d6)[0x7f1229a0ce26] >>> [csclprd3-0-9:27275] [ 7] >>> /hpc/apps/mpi/openmpi/1.8.2/lib/libmpi.so.1(MPI_Init+0x170)[0x7f1229a2be10] >>> [csclprd3-0-9:27275] [ 8] /hpc/apps/benchmarks/hpl/xhpl[0x401571] >>> [csclprd3-0-9:27275] [ 9] >>> /lib64/libc.so.6(__libc_start_main+0xfd)[0x7f1228ecacdd] >>> [csclprd3-0-9:27275] [10] /hpc/apps/benchmarks/hpl/xhpl[0x401439] >>> [csclprd3-0-9:27275] *** End of error message *** >>> [csclprd3-0-9:27276] [ 0] /lib64/libc.so.6(+0x32920)[0x7f565218a920] >>> [csclprd3-0-9:27276] [ 1] >>> /hpc/apps/mpi/openmpi/1.8.2/lib/openmpi/mca_btl_sm.so(+0x511a)[0x7f5647dfe11a] >>> [csclprd3-0-9:27276] [ 2] >>> /hpc/apps/mpi/openmpi/1.8.2/lib/libmpi.so.1(ompi_free_list_grow+0x239)[0x7f5652c9e0c9] >>> [csclprd3-0-9:27276] [ 3] >>> /hpc/apps/mpi/openmpi/1.8.2/lib/libmpi.so.1(ompi_free_list_resize_mt+0x40)[0x7f5652c9e200] >>> [csclprd3-0-9:27276] [ 4] >>> /hpc/apps/mpi/openmpi/1.8.2/lib/openmpi/mca_bml_r2.so(+0x138e)[0x7f564c2f738e] >>> [csclprd3-0-9:27276] [ 5] >>> /hpc/apps/mpi/openmpi/1.8.2/lib/openmpi/mca_pml_ob1.so(mca_pml_ob1_add_procs+0xd5)[0x7f564732d4e5] >>> [csclprd3-0-9:27276] [ 6] >>> /hpc/apps/mpi/openmpi/1.8.2/lib/libmpi.so.1(ompi_mpi_init+0x8d6)[0x7f5652cb8e26] >>> [csclprd3-0-9:27276] [ 7] >>> /hpc/apps/mpi/openmpi/1.8.2/lib/libmpi.so.1(MPI_Init+0x170)[0x7f5652cd7e10] >>> [csclprd3-0-9:27276] [ 8] 
/hpc/apps/benchmarks/hpl/xhpl[0x401571] >>> [csclprd3-0-9:27276] [ 9] >>> /lib64/libc.so.6(__libc_start_main+0xfd)[0x7f5652176cdd] >>> [csclprd3-0-9:27276] [10] /hpc/apps/benchmarks/hpl/xhpl[0x401439] >>> [csclprd3-0-9:27276] *** End of error message *** >>> [csclprd3-0-9:27277] [ 0] /lib64/libc.so.6(+0x32920)[0x7fbc00059920] >>> [csclprd3-0-9:27277] [ 1] >>> /hpc/apps/mpi/openmpi/1.8.2/lib/openmpi/mca_btl_sm.so(+0x511a)[0x7fbbf9de811a] >>> [csclprd3-0-9:27277] [ 2] >>> /hpc/apps/mpi/openmpi/1.8.2/lib/libmpi.so.1(ompi_free_list_grow+0x239)[0x7fbc00b6d0c9] >>> [csclprd3-0-9:27277] [ 3] >>> /hpc/apps/mpi/openmpi/1.8.2/lib/libmpi.so.1(ompi_free_list_resize_mt+0x40)[0x7fbc00b6d200] >>> [csclprd3-0-9:27277] [ 4] >>> /hpc/apps/mpi/openmpi/1.8.2/lib/openmpi/mca_bml_r2.so(+0x138e)[0x7fbbfa1ef38e] >>> [csclprd3-0-9:27277] [ 5] >>> /hpc/apps/mpi/openmpi/1.8.2/lib/openmpi/mca_pml_ob1.so(mca_pml_ob1_add_procs+0xd5)[0x7fbbf93174e5] >>> [csclprd3-0-9:27277] [ 6] >>> /hpc/apps/mpi/openmpi/1.8.2/lib/libmpi.so.1(ompi_mpi_init+0x8d6)[0x7fbc00b87e26] >>> [csclprd3-0-9:27277] [ 7] >>> /hpc/apps/mpi/openmpi/1.8.2/lib/libmpi.so.1(MPI_Init+0x170)[0x7fbc00ba6e10] >>> [csclprd3-0-9:27277] [ 8] /hpc/apps/benchmarks/hpl/xhpl[0x401571] >>> [csclprd3-0-9:27277] [ 9] >>> /lib64/libc.so.6(__libc_start_main+0xfd)[0x7fbc00045cdd] >>> [csclprd3-0-9:27277] [10] /hpc/apps/benchmarks/hpl/xhpl[0x401439] >>> [csclprd3-0-9:27277] *** End of error message *** >>> [csclprd3-0-9:27278] [ 0] /lib64/libc.so.6(+0x32920)[0x7fbaf33a1920] >>> [csclprd3-0-9:27278] [ 1] >>> /hpc/apps/mpi/openmpi/1.8.2/lib/openmpi/mca_btl_sm.so(+0x511a)[0x7fbaed0a211a] >>> [csclprd3-0-9:27278] [ 2] >>> /hpc/apps/mpi/openmpi/1.8.2/lib/libmpi.so.1(ompi_free_list_grow+0x239)[0x7fbaf3eb50c9] >>> [csclprd3-0-9:27278] [ 3] >>> /hpc/apps/mpi/openmpi/1.8.2/lib/libmpi.so.1(ompi_free_list_resize_mt+0x40)[0x7fbaf3eb5200] >>> [csclprd3-0-9:27278] [ 4] >>> /hpc/apps/mpi/openmpi/1.8.2/lib/openmpi/mca_bml_r2.so(+0x138e)[0x7fbaed4a938e] >>> [csclprd3-0-9:27278] [ 5] >>> /hpc/apps/mpi/openmpi/1.8.2/lib/openmpi/mca_pml_ob1.so(mca_pml_ob1_add_procs+0xd5)[0x7fbaec5d14e5] >>> [csclprd3-0-9:27278] [ 6] >>> /hpc/apps/mpi/openmpi/1.8.2/lib/libmpi.so.1(ompi_mpi_init+0x8d6)[0x7fbaf3ecfe26] >>> [csclprd3-0-9:27278] [ 7] >>> /hpc/apps/mpi/openmpi/1.8.2/lib/libmpi.so.1(MPI_Init+0x170)[0x7fbaf3eeee10] >>> [csclprd3-0-9:27278] [ 8] /hpc/apps/benchmarks/hpl/xhpl[0x401571] >>> [csclprd3-0-9:27278] [ 9] >>> /lib64/libc.so.6(__libc_start_main+0xfd)[0x7fbaf338dcdd] >>> [csclprd3-0-9:27278] [10] /hpc/apps/benchmarks/hpl/xhpl[0x401439] >>> [csclprd3-0-9:27278] *** End of error message *** >>> [csclprd3-0-9:27279] [ 0] /lib64/libc.so.6(+0x32920)[0x7fab71930920] >>> [csclprd3-0-9:27279] [ 1] >>> /hpc/apps/mpi/openmpi/1.8.2/lib/openmpi/mca_btl_sm.so(+0x511a)[0x7fab675f111a] >>> [csclprd3-0-9:27279] [ 2] >>> /hpc/apps/mpi/openmpi/1.8.2/lib/libmpi.so.1(ompi_free_list_grow+0x239)[0x7fab724440c9] >>> [csclprd3-0-9:27279] [ 3] >>> /hpc/apps/mpi/openmpi/1.8.2/lib/libmpi.so.1(ompi_free_list_resize_mt+0x40)[0x7fab72444200] >>> [csclprd3-0-9:27279] [ 4] >>> /hpc/apps/mpi/openmpi/1.8.2/lib/openmpi/mca_bml_r2.so(+0x138e)[0x7fab679f838e] >>> [csclprd3-0-9:27279] [ 5] >>> /hpc/apps/mpi/openmpi/1.8.2/lib/openmpi/mca_pml_ob1.so(mca_pml_ob1_add_procs+0xd5)[0x7fab66b204e5] >>> [csclprd3-0-9:27279] [ 6] >>> /hpc/apps/mpi/openmpi/1.8.2/lib/libmpi.so.1(ompi_mpi_init+0x8d6)[0x7fab7245ee26] >>> [csclprd3-0-9:27279] [ 7] >>> /hpc/apps/mpi/openmpi/1.8.2/lib/libmpi.so.1(MPI_Init+0x170)[0x7fab7247de10] >>> 
[csclprd3-0-9:27279] [ 8] /hpc/apps/benchmarks/hpl/xhpl[0x401571] >>> [csclprd3-0-9:27279] [ 9] >>> /lib64/libc.so.6(__libc_start_main+0xfd)[0x7fab7191ccdd] >>> [csclprd3-0-9:27279] [10] /hpc/apps/benchmarks/hpl/xhpl[0x401439] >>> [csclprd3-0-9:27279] *** End of error message *** >>> [csclprd3-0-9:27280] [ 0] /lib64/libc.so.6(+0x32920)[0x7f0bc7a18920] >>> [csclprd3-0-9:27280] [ 1] >>> /hpc/apps/mpi/openmpi/1.8.2/lib/openmpi/mca_btl_sm.so(+0x511a)[0x7f0bc163b11a] >>> [csclprd3-0-9:27280] [ 2] >>> /hpc/apps/mpi/openmpi/1.8.2/lib/libmpi.so.1(ompi_free_list_grow+0x239)[0x7f0bc852c0c9] >>> [csclprd3-0-9:27280] [ 3] >>> /hpc/apps/mpi/openmpi/1.8.2/lib/libmpi.so.1(ompi_free_list_resize_mt+0x40)[0x7f0bc852c200] >>> [csclprd3-0-9:27280] [ 4] >>> /hpc/apps/mpi/openmpi/1.8.2/lib/openmpi/mca_bml_r2.so(+0x138e)[0x7f0bc1a4238e] >>> [csclprd3-0-9:27280] [ 5] >>> /hpc/apps/mpi/openmpi/1.8.2/lib/openmpi/mca_pml_ob1.so(mca_pml_ob1_add_procs+0xd5)[0x7f0bc0b6a4e5] >>> [csclprd3-0-9:27280] [ 6] >>> /hpc/apps/mpi/openmpi/1.8.2/lib/libmpi.so.1(ompi_mpi_init+0x8d6)[0x7f0bc8546e26] >>> [csclprd3-0-9:27280] [ 7] >>> /hpc/apps/mpi/openmpi/1.8.2/lib/libmpi.so.1(MPI_Init+0x170)[0x7f0bc8565e10] >>> [csclprd3-0-9:27280] [ 8] /hpc/apps/benchmarks/hpl/xhpl[0x401571] >>> [csclprd3-0-9:27280] [ 9] >>> /lib64/libc.so.6(__libc_start_main+0xfd)[0x7f0bc7a04cdd] >>> [csclprd3-0-9:27280] [10] /hpc/apps/benchmarks/hpl/xhpl[0x401439] >>> [csclprd3-0-9:27280] *** End of error message *** >>> [csclprd3-0-9:27281] [ 0] /lib64/libc.so.6(+0x32920)[0x7f49cf009920] >>> [csclprd3-0-9:27281] [ 1] >>> /hpc/apps/mpi/openmpi/1.8.2/lib/openmpi/mca_btl_sm.so(+0x511a)[0x7f49c8d0911a] >>> [csclprd3-0-9:27281] [ 2] >>> /hpc/apps/mpi/openmpi/1.8.2/lib/libmpi.so.1(ompi_free_list_grow+0x239)[0x7f49cfb1d0c9] >>> [csclprd3-0-9:27281] [ 3] >>> /hpc/apps/mpi/openmpi/1.8.2/lib/libmpi.so.1(ompi_free_list_resize_mt+0x40)[0x7f49cfb1d200] >>> [csclprd3-0-9:27281] [ 4] >>> /hpc/apps/mpi/openmpi/1.8.2/lib/openmpi/mca_bml_r2.so(+0x138e)[0x7f49c911038e] >>> [csclprd3-0-9:27281] [ 5] >>> /hpc/apps/mpi/openmpi/1.8.2/lib/openmpi/mca_pml_ob1.so(mca_pml_ob1_add_procs+0xd5)[0x7f49c82384e5] >>> [csclprd3-0-9:27281] [ 6] >>> /hpc/apps/mpi/openmpi/1.8.2/lib/libmpi.so.1(ompi_mpi_init+0x8d6)[0x7f49cfb37e26] >>> [csclprd3-0-9:27281] [ 7] >>> /hpc/apps/mpi/openmpi/1.8.2/lib/libmpi.so.1(MPI_Init+0x170)[0x7f49cfb56e10] >>> [csclprd3-0-9:27281] [ 8] /hpc/apps/benchmarks/hpl/xhpl[0x401571] >>> [csclprd3-0-9:27281] [ 9] >>> /lib64/libc.so.6(__libc_start_main+0xfd)[0x7f49ceff5cdd] >>> [csclprd3-0-9:27281] [10] /hpc/apps/benchmarks/hpl/xhpl[0x401439] >>> [csclprd3-0-9:27281] *** End of error message *** >>> [csclprd3-0-9:27282] [ 0] /lib64/libc.so.6(+0x32920)[0x7fe65bb89920] >>> [csclprd3-0-9:27282] [ 1] >>> /hpc/apps/mpi/openmpi/1.8.2/lib/openmpi/mca_btl_sm.so(+0x511a)[0x7fe6557b711a] >>> [csclprd3-0-9:27282] [ 2] >>> /hpc/apps/mpi/openmpi/1.8.2/lib/libmpi.so.1(ompi_free_list_grow+0x239)[0x7fe65c69d0c9] >>> [csclprd3-0-9:27282] [ 3] >>> /hpc/apps/mpi/openmpi/1.8.2/lib/libmpi.so.1(ompi_free_list_resize_mt+0x40)[0x7fe65c69d200] >>> [csclprd3-0-9:27282] [ 4] >>> /hpc/apps/mpi/openmpi/1.8.2/lib/openmpi/mca_bml_r2.so(+0x138e)[0x7fe655bbe38e] >>> [csclprd3-0-9:27282] [ 5] >>> /hpc/apps/mpi/openmpi/1.8.2/lib/openmpi/mca_pml_ob1.so(mca_pml_ob1_add_procs+0xd5)[0x7fe654ce64e5] >>> [csclprd3-0-9:27282] [ 6] >>> /hpc/apps/mpi/openmpi/1.8.2/lib/libmpi.so.1(ompi_mpi_init+0x8d6)[0x7fe65c6b7e26] >>> [csclprd3-0-9:27282] [ 7] >>> 
/hpc/apps/mpi/openmpi/1.8.2/lib/libmpi.so.1(MPI_Init+0x170)[0x7fe65c6d6e10] >>> [csclprd3-0-9:27282] [ 8] /hpc/apps/benchmarks/hpl/xhpl[0x401571] >>> [csclprd3-0-9:27282] [ 9] >>> /lib64/libc.so.6(__libc_start_main+0xfd)[0x7fe65bb75cdd] >>> [csclprd3-0-9:27282] [10] /hpc/apps/benchmarks/hpl/xhpl[0x401439] >>> [csclprd3-0-9:27282] *** End of error message *** >>> [csclprd3-0-9:27283] [ 0] /lib64/libc.so.6(+0x32920)[0x7f79bd430920] >>> [csclprd3-0-9:27283] [ 1] >>> /hpc/apps/mpi/openmpi/1.8.2/lib/openmpi/mca_btl_sm.so(+0x511a)[0x7f79b31e911a] >>> [csclprd3-0-9:27283] [ 2] >>> /hpc/apps/mpi/openmpi/1.8.2/lib/libmpi.so.1(ompi_free_list_grow+0x239)[0x7f79bdf440c9] >>> [csclprd3-0-9:27283] [ 3] >>> /hpc/apps/mpi/openmpi/1.8.2/lib/libmpi.so.1(ompi_free_list_resize_mt+0x40)[0x7f79bdf44200] >>> [csclprd3-0-9:27283] [ 4] >>> /hpc/apps/mpi/openmpi/1.8.2/lib/openmpi/mca_bml_r2.so(+0x138e)[0x7f79b35f038e] >>> [csclprd3-0-9:27283] [ 5] >>> /hpc/apps/mpi/openmpi/1.8.2/lib/openmpi/mca_pml_ob1.so(mca_pml_ob1_add_procs+0xd5)[0x7f79b27184e5] >>> [csclprd3-0-9:27283] [ 6] >>> /hpc/apps/mpi/openmpi/1.8.2/lib/libmpi.so.1(ompi_mpi_init+0x8d6)[0x7f79bdf5ee26] >>> [csclprd3-0-9:27283] [ 7] >>> /hpc/apps/mpi/openmpi/1.8.2/lib/libmpi.so.1(MPI_Init+0x170)[0x7f79bdf7de10] >>> [csclprd3-0-9:27283] [ 8] /hpc/apps/benchmarks/hpl/xhpl[0x401571] >>> [csclprd3-0-9:27283] [ 9] >>> /lib64/libc.so.6(__libc_start_main+0xfd)[0x7f79bd41ccdd] >>> [csclprd3-0-9:27283] [10] /hpc/apps/benchmarks/hpl/xhpl[0x401439] >>> [csclprd3-0-9:27283] *** End of error message *** >>> /lib64/libc.so.6(+0x32920)[0x7f83c367f920] >>> [csclprd3-0-9:27274] [ 1] >>> /hpc/apps/mpi/openmpi/1.8.2/lib/openmpi/mca_btl_sm.so(+0x511a)[0x7f83bd2f211a] >>> [csclprd3-0-9:27274] [ 2] >>> /hpc/apps/mpi/openmpi/1.8.2/lib/libmpi.so.1(ompi_free_list_grow+0x239)[0x7f83c41930c9] >>> [csclprd3-0-9:27274] [ 3] >>> /hpc/apps/mpi/openmpi/1.8.2/lib/libmpi.so.1(ompi_free_list_resize_mt+0x40)[0x7f83c4193200] >>> [csclprd3-0-9:27274] [ 4] >>> /hpc/apps/mpi/openmpi/1.8.2/lib/openmpi/mca_bml_r2.so(+0x138e)[0x7f83bd6f938e] >>> [csclprd3-0-9:27274] [ 5] >>> /hpc/apps/mpi/openmpi/1.8.2/lib/openmpi/mca_pml_ob1.so(mca_pml_ob1_add_procs+0xd5)[0x7f83bc8214e5] >>> [csclprd3-0-9:27274] [ 6] >>> /hpc/apps/mpi/openmpi/1.8.2/lib/libmpi.so.1(ompi_mpi_init+0x8d6)[0x7f83c41ade26] >>> [csclprd3-0-9:27274] [ 7] >>> /hpc/apps/mpi/openmpi/1.8.2/lib/libmpi.so.1(MPI_Init+0x170)[0x7f83c41cce10] >>> [csclprd3-0-9:27274] [ 8] /hpc/apps/benchmarks/hpl/xhpl[0x401571] >>> [csclprd3-0-9:27274] [ 9] >>> /lib64/libc.so.6(__libc_start_main+0xfd)[0x7f83c366bcdd] >>> [csclprd3-0-9:27274] [10] /hpc/apps/benchmarks/hpl/xhpl[0x401439] >>> [csclprd3-0-9:27274] *** End of error message *** >>> [csclprd3-0-9:27284] *** Process received signal *** >>> [csclprd3-0-9:27284] Signal: Bus error (7) >>> [csclprd3-0-9:27284] Signal code: Non-existant physical address (2) >>> [csclprd3-0-9:27284] Failing at address: 0x7f7273480e80 >>> [csclprd3-0-9:27284] [ 0] /lib64/libc.so.6(+0x32920)[0x7f7287160920] >>> [csclprd3-0-9:27284] [ 1] >>> /hpc/apps/mpi/openmpi/1.8.2/lib/openmpi/mca_btl_sm.so(+0x511a)[0x7f7280ebf11a] >>> [csclprd3-0-9:27284] [ 2] >>> /hpc/apps/mpi/openmpi/1.8.2/lib/libmpi.so.1(ompi_free_list_grow+0x239)[0x7f7287c740c9] >>> [csclprd3-0-9:27284] [ 3] >>> /hpc/apps/mpi/openmpi/1.8.2/lib/libmpi.so.1(ompi_free_list_resize_mt+0x40)[0x7f7287c74200] >>> [csclprd3-0-9:27284] [ 4] >>> /hpc/apps/mpi/openmpi/1.8.2/lib/openmpi/mca_bml_r2.so(+0x138e)[0x7f72812c638e] >>> [csclprd3-0-9:27284] [ 5] >>> 
/hpc/apps/mpi/openmpi/1.8.2/lib/openmpi/mca_pml_ob1.so(mca_pml_ob1_add_procs+0xd5)[0x7f72803ee4e5] >>> [csclprd3-0-9:27284] [ 6] >>> /hpc/apps/mpi/openmpi/1.8.2/lib/libmpi.so.1(ompi_mpi_init+0x8d6)[0x7f7287c8ee26] >>> [csclprd3-0-9:27284] [ 7] >>> /hpc/apps/mpi/openmpi/1.8.2/lib/libmpi.so.1(MPI_Init+0x170)[0x7f7287cade10] >>> [csclprd3-0-9:27284] [ 8] /hpc/apps/benchmarks/hpl/xhpl[0x401571] >>> [csclprd3-0-9:27284] [ 9] >>> /lib64/libc.so.6(__libc_start_main+0xfd)[0x7f728714ccdd] >>> [csclprd3-0-9:27284] [10] /hpc/apps/benchmarks/hpl/xhpl[0x401439] >>> [csclprd3-0-9:27284] *** End of error message *** >>> [csclprd3-0-9:27272] [ 0] /lib64/libc.so.6(+0x32920)[0x7f646a415920] >>> [csclprd3-0-9:27272] [ 1] >>> /hpc/apps/mpi/openmpi/1.8.2/lib/openmpi/mca_btl_sm.so(+0x511a)[0x7f64641b911a] >>> [csclprd3-0-9:27272] [ 2] >>> /hpc/apps/mpi/openmpi/1.8.2/lib/libmpi.so.1(ompi_free_list_grow+0x239)[0x7f646af290c9] >>> [csclprd3-0-9:27272] [ 3] >>> /hpc/apps/mpi/openmpi/1.8.2/lib/libmpi.so.1(ompi_free_list_resize_mt+0x40)[0x7f646af29200] >>> [csclprd3-0-9:27272] [ 4] >>> /hpc/apps/mpi/openmpi/1.8.2/lib/openmpi/mca_bml_r2.so(+0x138e)[0x7f64645c038e] >>> [csclprd3-0-9:27272] [ 5] >>> /hpc/apps/mpi/openmpi/1.8.2/lib/openmpi/mca_pml_ob1.so(mca_pml_ob1_add_procs+0xd5)[0x7f645f5344e5] >>> [csclprd3-0-9:27272] [ 6] >>> /hpc/apps/mpi/openmpi/1.8.2/lib/libmpi.so.1(ompi_mpi_init+0x8d6)[0x7f646af43e26] >>> [csclprd3-0-9:27272] [ 7] >>> /hpc/apps/mpi/openmpi/1.8.2/lib/libmpi.so.1(MPI_Init+0x170)[0x7f646af62e10] >>> [csclprd3-0-9:27272] [ 8] /hpc/apps/benchmarks/hpl/xhpl[0x401571] >>> [csclprd3-0-9:27272] [ 9] >>> /lib64/libc.so.6(__libc_start_main+0xfd)[0x7f646a401cdd] >>> [csclprd3-0-9:27272] [10] /hpc/apps/benchmarks/hpl/xhpl[0x401439] >>> [csclprd3-0-9:27272] *** End of error message *** >>> -------------------------------------------------------------------------- >>> mpirun noticed that process rank 88 with PID 27269 on node csclprd3-0-9 >>> exited on signal 7 (Bus error). >>> -------------------------------------------------------------------------- >>> 16 total processes killed (some possibly by mpirun during cleanup) >>> >>> c. hostfile >>> csclprd3-6-1 slots=4 max-slots=4 >>> csclprd3-6-5 slots=4 max-slots=4 >>> csclprd3-0-0 slots=12 max-slots=24 >>> csclprd3-0-1 slots=6 max-slots=12 >>> csclprd3-0-2 slots=6 max-slots=12 >>> csclprd3-0-3 slots=6 max-slots=12 >>> csclprd3-0-4 slots=6 max-slots=12 >>> csclprd3-0-5 slots=6 max-slots=12 >>> csclprd3-0-6 slots=6 max-slots=12 >>> #total number of successfully tested non-hyperthreaded computes slots at >>> this point is 56 >>> csclprd3-0-7 slots=16 max-slots=32 >>> #total number of successfully tested slots at this point is 72 >>> csclprd3-0-8 slots=16 max-slots=32 >>> #total number of successfully tested slots at this point is 88 >>> csclprd3-0-9 slots=16 max-slots=32 >>> #total number of slots at this point is 104 >>> #csclprd3-0-10 slots=16 max-slots=32 >>> #csclprd3-0-11 slots=16 max-slots=32 >>> #total number of slots at this point is 136 >>> >>> >>> From: users [users-boun...@open-mpi.org >>> <mailto:users-boun...@open-mpi.org>] on behalf of Ralph Castain >>> [r...@open-mpi.org <mailto:r...@open-mpi.org>] >>> Sent: Wednesday, April 08, 2015 11:31 AM >>> To: Open MPI Users >>> Subject: Re: [OMPI users] OpenMPI 1.8.2 problems on CentOS 6.3 >>> >>> Just for clarity: does the BIOS on the LGA2011 system have HT enabled? 
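Where BIOS access isn't available, hyperthreading can also be confirmed from the OS itself; a small sketch using stock CentOS tools (nothing here is specific to this cluster):

    # "siblings" = hardware threads per package, "cpu cores" = physical cores;
    # siblings greater than cpu cores means hyperthreading is active
    grep -E 'siblings|cpu cores' /proc/cpuinfo | sort | uniq -c

    # lscpu gives the same summary per core/socket
    lscpu | grep -E 'Thread\(s\) per core|Core\(s\) per socket|Socket\(s\)'

    # If hwloc's lstopo is installed, it shows the topology Open MPI will see
    lstopo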
>>> >>>> On Apr 8, 2015, at 10:55 AM, Lane, William <william.l...@cshs.org >>>> <mailto:william.l...@cshs.org>> wrote: >>>> >>>> Ralph, >>>> >>>> I added one of the newer LGA2011 nodes to my hostfile and >>>> re-ran the benchmark successfully and saw some strange results WRT the >>>> binding directives. Why are hyperthreading cores being used >>>> on the LGA2011 system but not any of other systems which >>>> are mostly hyperthreaded Westmeres)? Isn't the --use-hwthread-cpus >>>> switch supposed to prevent OpenMPI from using hyperthreaded >>>> cores? >>>> >>>> OpenMPI LAPACK invocation: >>>> >>>> $MPI_DIR/bin/mpirun -np $NSLOTS --report-bindings --hostfile >>>> hostfile-single --mca btl_tcp_if_include eth0 --hetero-nodes >>>> --use-hwthread-cpus --prefix $MPI_DIR $BENCH_DIR/$APP_DIR/$APP_BIN >>>> >>>> Where NSLOTS=72 >>>> >>>> hostfile: >>>> csclprd3-6-1 slots=4 max-slots=4 >>>> csclprd3-6-5 slots=4 max-slots=4 >>>> csclprd3-0-0 slots=12 max-slots=24 >>>> csclprd3-0-1 slots=6 max-slots=12 >>>> csclprd3-0-2 slots=6 max-slots=12 >>>> csclprd3-0-3 slots=6 max-slots=12 >>>> csclprd3-0-4 slots=6 max-slots=12 >>>> csclprd3-0-5 slots=6 max-slots=12 >>>> csclprd3-0-6 slots=6 max-slots=12 >>>> #total number of successfully tested non-hyperthreaded computes slots at >>>> this point is 56 >>>> csclprd3-0-7 slots=16 max-slots=32 >>>> >>>> LGA1366 Westmere w/two Intel Xeon X5675 6-core/12-hyperthread CPU's >>>> >>>> [csclprd3-0-0:11848] MCW rank 11 bound to socket 1[core 7[hwt 0]]: >>>> [./././././.][./B/./././.] >>>> [csclprd3-0-0:11848] MCW rank 12 bound to socket 0[core 2[hwt 0]]: >>>> [././B/././.][./././././.] >>>> [csclprd3-0-0:11848] MCW rank 13 bound to socket 1[core 8[hwt 0]]: >>>> [./././././.][././B/././.] >>>> [csclprd3-0-0:11848] MCW rank 14 bound to socket 0[core 3[hwt 0]]: >>>> [./././B/./.][./././././.] >>>> [csclprd3-0-0:11848] MCW rank 15 bound to socket 1[core 9[hwt 0]]: >>>> [./././././.][./././B/./.] >>>> [csclprd3-0-0:11848] MCW rank 16 bound to socket 0[core 4[hwt 0]]: >>>> [././././B/.][./././././.] >>>> [csclprd3-0-0:11848] MCW rank 17 bound to socket 1[core 10[hwt 0]]: >>>> [./././././.][././././B/.] >>>> [csclprd3-0-0:11848] MCW rank 18 bound to socket 0[core 5[hwt 0]]: >>>> [./././././B][./././././.] >>>> [csclprd3-0-0:11848] MCW rank 19 bound to socket 1[core 11[hwt 0]]: >>>> [./././././.][./././././B] >>>> [csclprd3-0-0:11848] MCW rank 8 bound to socket 0[core 0[hwt 0]]: >>>> [B/././././.][./././././.] >>>> [csclprd3-0-0:11848] MCW rank 9 bound to socket 1[core 6[hwt 0]]: >>>> [./././././.][B/././././.] >>>> [csclprd3-0-0:11848] MCW rank 10 bound to socket 0[core 1[hwt 0]]: >>>> [./B/./././.][./././././.] >>>> >>>> but for the LGA2011 system w/two 8-core/16-hyperthread CPU's >>>> >>>> [csclprd3-0-7:30876] MCW rank 60 bound to socket 0[core 2[hwt 0-1]]: >>>> [../../BB/../../../../..][../../../../../../../..] >>>> [csclprd3-0-7:30876] MCW rank 61 bound to socket 1[core 10[hwt 0-1]]: >>>> [../../../../../../../..][../../BB/../../../../..] >>>> [csclprd3-0-7:30876] MCW rank 62 bound to socket 0[core 3[hwt 0-1]]: >>>> [../../../BB/../../../..][../../../../../../../..] >>>> [csclprd3-0-7:30876] MCW rank 63 bound to socket 1[core 11[hwt 0-1]]: >>>> [../../../../../../../..][../../../BB/../../../..] >>>> [csclprd3-0-7:30876] MCW rank 64 bound to socket 0[core 4[hwt 0-1]]: >>>> [../../../../BB/../../..][../../../../../../../..] >>>> [csclprd3-0-7:30876] MCW rank 65 bound to socket 1[core 12[hwt 0-1]]: >>>> [../../../../../../../..][../../../../BB/../../..] 
>>>> [csclprd3-0-7:30876] MCW rank 66 bound to socket 0[core 5[hwt 0-1]]: >>>> [../../../../../BB/../..][../../../../../../../..] >>>> [csclprd3-0-7:30876] MCW rank 67 bound to socket 1[core 13[hwt 0-1]]: >>>> [../../../../../../../..][../../../../../BB/../..] >>>> [csclprd3-0-7:30876] MCW rank 68 bound to socket 0[core 6[hwt 0-1]]: >>>> [../../../../../../BB/..][../../../../../../../..] >>>> [csclprd3-0-7:30876] MCW rank 69 bound to socket 1[core 14[hwt 0-1]]: >>>> [../../../../../../../..][../../../../../../BB/..] >>>> [csclprd3-0-7:30876] MCW rank 70 bound to socket 0[core 7[hwt 0-1]]: >>>> [../../../../../../../BB][../../../../../../../..] >>>> [csclprd3-0-7:30876] MCW rank 71 bound to socket 1[core 15[hwt 0-1]]: >>>> [../../../../../../../..][../../../../../../../BB] >>>> [csclprd3-0-7:30876] MCW rank 56 bound to socket 0[core 0[hwt 0-1]]: >>>> [BB/../../../../../../..][../../../../../../../..] >>>> [csclprd3-0-7:30876] MCW rank 57 bound to socket 1[core 8[hwt 0-1]]: >>>> [../../../../../../../..][BB/../../../../../../..] >>>> [csclprd3-0-7:30876] MCW rank 58 bound to socket 0[core 1[hwt 0-1]]: >>>> [../BB/../../../../../..][../../../../../../../..] >>>> [csclprd3-0-7:30876] MCW rank 59 bound to socket 1[core 9[hwt 0-1]]: >>>> [../../../../../../../..][../BB/../../../../../..] >>>> >>>> >>>> >>>> >>>> From: users [users-boun...@open-mpi.org >>>> <mailto:users-boun...@open-mpi.org>] on behalf of Ralph Castain >>>> [r...@open-mpi.org <mailto:r...@open-mpi.org>] >>>> Sent: Wednesday, April 08, 2015 10:26 AM >>>> To: Open MPI Users >>>> Subject: Re: [OMPI users] OpenMPI 1.8.2 problems on CentOS 6.3 >>>> >>>> >>>>> On Apr 8, 2015, at 9:29 AM, Lane, William <william.l...@cshs.org >>>>> <mailto:william.l...@cshs.org>> wrote: >>>>> >>>>> Ralph, >>>>> >>>>> Thanks for YOUR help, I never >>>>> would've managed to get the LAPACK >>>>> benchmark running on more than one >>>>> node in our cluster without your help. >>>>> >>>>> Ralph, is hyperthreading more of a curse >>>>> than an advantage for HPC applications? >>>> >>>> Wow, you’ll get a lot of argument over that issue! From what I can see, it >>>> is very application dependent. Some apps appear to benefit, while others >>>> can even suffer from it. >>>> >>>> I think we should support a mix of nodes in this usage, so I’ll try to >>>> come up with a way to do so. >>>> >>>>> >>>>> I'm going to go through all the OpenMPI >>>>> articles on hyperthreading and NUMA to >>>>> see if that will shed any light on these >>>>> issues. >>>>> >>>>> -Bill L. >>>>> >>>>> >>>>> From: users [users-boun...@open-mpi.org >>>>> <mailto:users-boun...@open-mpi.org>] on behalf of Ralph Castain >>>>> [r...@open-mpi.org <mailto:r...@open-mpi.org>] >>>>> Sent: Tuesday, April 07, 2015 7:32 PM >>>>> To: Open MPI Users >>>>> Subject: Re: [OMPI users] OpenMPI 1.8.2 problems on CentOS 6.3 >>>>> >>>>> I’m not sure our man pages are good enough to answer your question, but >>>>> here is the URL >>>>> >>>>> http://www.open-mpi.org/doc/v1.8/ <http://www.open-mpi.org/doc/v1.8/> >>>>> >>>>> I’m a tad tied up right now, but I’ll try to address this prior to 1.8.5 >>>>> release. Thanks for all that debug effort! Helps a bunch. 
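Besides the online copy linked above, the man pages are normally installed under the Open MPI prefix itself, so a sketch like the following should work on the cluster (assuming this build installed them; $MPI_DIR is the /hpc/apps/mpi/openmpi/1.8.2 prefix used throughout the thread):

    export MANPATH=$MPI_DIR/share/man:$MANPATH
    man mpirun      # covers --bind-to, --use-hwthread-cpus, --hetero-nodes, etc.
    man ompi_info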
>>>>> >>>>>> On Apr 7, 2015, at 1:17 PM, Lane, William <william.l...@cshs.org >>>>>> <mailto:william.l...@cshs.org>> wrote: >>>>>> >>>>>> Ralph, >>>>>> >>>>>> I've finally had some luck using the following: >>>>>> $MPI_DIR/bin/mpirun -np $NSLOTS --report-bindings --hostfile >>>>>> hostfile-single --mca btl_tcp_if_include eth0 --hetero-nodes >>>>>> --use-hwthread-cpus --prefix $MPI_DIR $BENCH_DIR/$APP_DIR/$APP_BIN >>>>>> >>>>>> Where $NSLOTS was 56 and my hostfile hostfile-single is: >>>>>> >>>>>> csclprd3-0-0 slots=12 max-slots=24 >>>>>> csclprd3-0-1 slots=6 max-slots=12 >>>>>> csclprd3-0-2 slots=6 max-slots=12 >>>>>> csclprd3-0-3 slots=6 max-slots=12 >>>>>> csclprd3-0-4 slots=6 max-slots=12 >>>>>> csclprd3-0-5 slots=6 max-slots=12 >>>>>> csclprd3-0-6 slots=6 max-slots=12 >>>>>> csclprd3-6-1 slots=4 max-slots=4 >>>>>> csclprd3-6-5 slots=4 max-slots=4 >>>>>> >>>>>> The max-slots differs from slots on some nodes >>>>>> because I include the hyperthreaded cores in >>>>>> the max-slots, the last two nodes have CPU's that >>>>>> don't support hyperthreading at all. >>>>>> >>>>>> Does --use-hwthread-cpus prevent slots from >>>>>> being assigned to hyperthreading cores? >>>>>> >>>>>> For some reason the manpage for OpenMPI 1.8.2 >>>>>> isn't installed on our CentOS 6.3 systems is there a >>>>>> URL I can I find a copy of the manpages for OpenMPI 1.8.2? >>>>>> >>>>>> Thanks for your help, >>>>>> >>>>>> -Bill Lane >>>>>> >>>>>> From: users [users-boun...@open-mpi.org >>>>>> <mailto:users-boun...@open-mpi.org>] on behalf of Ralph Castain >>>>>> [r...@open-mpi.org <mailto:r...@open-mpi.org>] >>>>>> Sent: Monday, April 06, 2015 1:39 PM >>>>>> To: Open MPI Users >>>>>> Subject: Re: [OMPI users] OpenMPI 1.8.2 problems on CentOS 6.3 >>>>>> >>>>>> Hmmm…well, that shouldn’t be the issue. To check, try running it with >>>>>> “bind-to none”. If you can get a backtrace telling us where it is >>>>>> crashing, that would also help. >>>>>> >>>>>> >>>>>>> On Apr 6, 2015, at 12:24 PM, Lane, William <william.l...@cshs.org >>>>>>> <mailto:william.l...@cshs.org>> wrote: >>>>>>> >>>>>>> Ralph, >>>>>>> >>>>>>> For the following two different commandline invocations of the LAPACK >>>>>>> benchmark >>>>>>> >>>>>>> $MPI_DIR/bin/mpirun -np $NSLOTS --report-bindings --hostfile >>>>>>> hostfile-no_slots --mca btl_tcp_if_include eth0 --hetero-nodes >>>>>>> --use-hwthread-cpus --bind-to hwthread --prefix $MPI_DIR >>>>>>> $BENCH_DIR/$APP_DIR/$APP_BIN >>>>>>> >>>>>>> $MPI_DIR/bin/mpirun -np $NSLOTS --report-bindings --hostfile >>>>>>> hostfile-no_slots --mca btl_tcp_if_include eth0 --hetero-nodes >>>>>>> --bind-to-core --prefix $MPI_DIR $BENCH_DIR/$APP_DIR/$APP_BIN >>>>>>> >>>>>>> I'm receiving the same kinds of OpenMPI error messages (but for >>>>>>> different nodes in the ring): >>>>>>> >>>>>>> [csclprd3-0-16:25940] *** Process received signal *** >>>>>>> [csclprd3-0-16:25940] Signal: Bus error (7) >>>>>>> [csclprd3-0-16:25940] Signal code: Non-existant physical >>>>>>> address (2) >>>>>>> [csclprd3-0-16:25940] Failing at address: 0x7f8b1b5a2600 >>>>>>> >>>>>>> >>>>>>> -------------------------------------------------------------------------- >>>>>>> mpirun noticed that process rank 82 with PID 25936 on node >>>>>>> csclprd3-0-16 exited on signal 7 (Bus error). 
>>>>>>> >>>>>>> -------------------------------------------------------------------------- >>>>>>> 16 total processes killed (some possibly by mpirun during >>>>>>> cleanup) >>>>>>> >>>>>>> It seems to occur on systems that have more than one, physical CPU >>>>>>> installed. Could >>>>>>> this be due to a lack of the correct NUMA libraries being installed? >>>>>>> >>>>>>> -Bill L. >>>>>>> >>>>>>> From: users [users-boun...@open-mpi.org >>>>>>> <mailto:users-boun...@open-mpi.org>] on behalf of Ralph Castain >>>>>>> [r...@open-mpi.org <mailto:r...@open-mpi.org>] >>>>>>> Sent: Sunday, April 05, 2015 6:09 PM >>>>>>> To: Open MPI Users >>>>>>> Subject: Re: [OMPI users] OpenMPI 1.8.2 problems on CentOS 6.3 >>>>>>> >>>>>>> >>>>>>>> On Apr 5, 2015, at 5:58 PM, Lane, William <william.l...@cshs.org >>>>>>>> <mailto:william.l...@cshs.org>> wrote: >>>>>>>> >>>>>>>> I think some of the Intel Blade systems in the cluster are >>>>>>>> dual core, but don't support hyperthreading. Maybe it >>>>>>>> would be better to exclude hyperthreading altogether >>>>>>>> from submitted OpenMPI jobs? >>>>>>> >>>>>>> Yes - or you can add "--hetero-nodes -use-hwthread-cpus --bind-to >>>>>>> hwthread" to the cmd line. This tells mpirun that the nodes aren't all >>>>>>> the same, and so it has to look at each node's topology instead of >>>>>>> taking the first node as the template for everything. The second tells >>>>>>> it to use the HTs as independent cpus where they are supported. >>>>>>> >>>>>>> I'm not entirely sure the suggestion will work - if we hit a place >>>>>>> where HT isn't supported, we may balk at being asked to bind to HTs. I >>>>>>> can probably make a change that supports this kind of hetero >>>>>>> arrangement (perhaps something like bind-to pu) - might make it into >>>>>>> 1.8.5 (we are just starting the release process on it now). >>>>>>> >>>>>>>> >>>>>>>> OpenMPI doesn't crash, but it doesn't run the LAPACK >>>>>>>> benchmark either. >>>>>>>> >>>>>>>> Thanks again Ralph. >>>>>>>> >>>>>>>> Bill L. >>>>>>>> >>>>>>>> From: users [users-boun...@open-mpi.org >>>>>>>> <mailto:users-boun...@open-mpi.org>] on behalf of Ralph Castain >>>>>>>> [r...@open-mpi.org <mailto:r...@open-mpi.org>] >>>>>>>> Sent: Wednesday, April 01, 2015 8:40 AM >>>>>>>> To: Open MPI Users >>>>>>>> Subject: Re: [OMPI users] OpenMPI 1.8.2 problems on CentOS 6.3 >>>>>>>> >>>>>>>> Bingo - you said the magic word. This is a terminology issue. When we >>>>>>>> say "core", we mean the old definition of "core", not "hyperthreads". >>>>>>>> If you want to use HTs as your base processing unit and bind to them, >>>>>>>> then you need to specify --bind-to hwthread. That warning should then >>>>>>>> go away. >>>>>>>> >>>>>>>> We don't require a swap region be mounted - I didn't see anything in >>>>>>>> your original message indicating that OMPI had actually crashed, but >>>>>>>> just wasn't launching due to the above issue. Were you actually seeing >>>>>>>> crashes as well? 
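For reference, the binding variants discussed in this exchange, written out as complete command lines. This is only a sketch using the same shell variables that appear elsewhere in the thread, with the --mca btl_tcp_if_include and --prefix options omitted for brevity:

    # Bind each rank to a physical core (the old meaning of "core")
    $MPI_DIR/bin/mpirun -np $NSLOTS --hostfile hostfile --hetero-nodes \
        --bind-to core --report-bindings $BENCH_DIR/$APP_DIR/$APP_BIN

    # Treat each hyperthread as an independent cpu and bind to it
    $MPI_DIR/bin/mpirun -np $NSLOTS --hostfile hostfile --hetero-nodes \
        --use-hwthread-cpus --bind-to hwthread --report-bindings \
        $BENCH_DIR/$APP_DIR/$APP_BIN

    # Disable binding entirely, to separate binding problems from the bus errors
    $MPI_DIR/bin/mpirun -np $NSLOTS --hostfile hostfile --hetero-nodes \
        --bind-to none --report-bindings $BENCH_DIR/$APP_DIR/$APP_BIN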
>>>>>>>> On Wed, Apr 1, 2015 at 8:31 AM, Lane, William <william.l...@cshs.org
>>>>>>>> <mailto:william.l...@cshs.org>> wrote:
>>>>>>>> Ralph,
>>>>>>>>
>>>>>>>> Here's the associated hostfile:
>>>>>>>>
>>>>>>>> #openMPI hostfile for csclprd3
>>>>>>>> #max slots prevents oversubscribing csclprd3-0-9
>>>>>>>> csclprd3-0-0 slots=12 max-slots=12
>>>>>>>> csclprd3-0-1 slots=6 max-slots=6
>>>>>>>> csclprd3-0-2 slots=6 max-slots=6
>>>>>>>> csclprd3-0-3 slots=6 max-slots=6
>>>>>>>> csclprd3-0-4 slots=6 max-slots=6
>>>>>>>> csclprd3-0-5 slots=6 max-slots=6
>>>>>>>> csclprd3-0-6 slots=6 max-slots=6
>>>>>>>> csclprd3-0-7 slots=32 max-slots=32
>>>>>>>> csclprd3-0-8 slots=32 max-slots=32
>>>>>>>> csclprd3-0-9 slots=32 max-slots=32
>>>>>>>> csclprd3-0-10 slots=32 max-slots=32
>>>>>>>> csclprd3-0-11 slots=32 max-slots=32
>>>>>>>> csclprd3-0-12 slots=12 max-slots=12
>>>>>>>> csclprd3-0-13 slots=24 max-slots=24
>>>>>>>> csclprd3-0-14 slots=16 max-slots=16
>>>>>>>> csclprd3-0-15 slots=16 max-slots=16
>>>>>>>> csclprd3-0-16 slots=24 max-slots=24
>>>>>>>> csclprd3-0-17 slots=24 max-slots=24
>>>>>>>> csclprd3-6-1 slots=4 max-slots=4
>>>>>>>> csclprd3-6-5 slots=4 max-slots=4
>>>>>>>>
>>>>>>>> The number of slots also includes hyperthreading cores.
>>>>>>>>
>>>>>>>> One more question: would not having swap partitions defined on all the
>>>>>>>> nodes in the ring cause OpenMPI to crash? Because no swap partitions
>>>>>>>> are defined for any of the above systems.
>>>>>>>>
>>>>>>>> -Bill L.
>>>>>>>>
>>>>>>>> From: users [users-boun...@open-mpi.org
>>>>>>>> <mailto:users-boun...@open-mpi.org>] on behalf of Ralph Castain
>>>>>>>> [r...@open-mpi.org <mailto:r...@open-mpi.org>]
>>>>>>>> Sent: Wednesday, April 01, 2015 5:04 AM
>>>>>>>> To: Open MPI Users
>>>>>>>> Subject: Re: [OMPI users] OpenMPI 1.8.2 problems on CentOS 6.3
>>>>>>>>
>>>>>>>> The warning about binding to memory is due to not having numactl-devel
>>>>>>>> installed on the system. The job would still run, but we are warning
>>>>>>>> you that we cannot bind memory to the same domain as the core where we
>>>>>>>> bind the process. Can cause poor performance, but not fatal. I forget
>>>>>>>> the name of the param, but you can tell us to "shut up" :-)
>>>>>>>>
>>>>>>>> The other warning/error indicates that we aren't seeing enough cores
>>>>>>>> on the allocation you gave us via the hostfile to support one proc/core
>>>>>>>> - i.e., we didn't see at least 128 cores in the sum of the nodes you
>>>>>>>> told us about. I take it you were expecting that there were that many
>>>>>>>> or more?
>>>>>>>>
>>>>>>>> Ralph
>>>>>>>>
>>>>>>>> On Wed, Apr 1, 2015 at 12:54 AM, Lane, William <william.l...@cshs.org
>>>>>>>> <mailto:william.l...@cshs.org>> wrote:
>>>>>>>> I'm having problems running OpenMPI jobs (using a hostfile) on an HPC
>>>>>>>> cluster running ROCKS on CentOS 6.3. I'm running OpenMPI outside of
>>>>>>>> Sun Grid Engine (i.e. it is not submitted as a job to SGE). The
>>>>>>>> program being run is a LAPACK benchmark. The command line I'm using to
>>>>>>>> run the jobs is:
>>>>>>>>
>>>>>>>> $MPI_DIR/bin/mpirun -np $NSLOTS -bind-to-core -report-bindings
>>>>>>>> --hostfile hostfile --mca btl_tcp_if_include eth0 --prefix $MPI_DIR
>>>>>>>> $BENCH_DIR/$APP_DIR/$APP_BIN
>>>>>>>>
>>>>>>>> Where MPI_DIR=/hpc/apps/mpi/openmpi/1.8.2/
>>>>>>>> NSLOTS=128
>>>>>>>>
>>>>>>>> I'm getting errors of the following form, and OpenMPI never runs the
>>>>>>>> LAPACK benchmark:
>>>>>>>>
>>>>>>>> --------------------------------------------------------------------------
>>>>>>>> WARNING: a request was made to bind a process. While the system
>>>>>>>> supports binding the process itself, at least one node does NOT
>>>>>>>> support binding memory to the process location.
>>>>>>>>
>>>>>>>>   Node: csclprd3-0-11
>>>>>>>>
>>>>>>>> This usually is due to not having the required NUMA support installed
>>>>>>>> on the node. In some Linux distributions, the required support is
>>>>>>>> contained in the libnumactl and libnumactl-devel packages.
>>>>>>>> This is a warning only; your job will continue, though performance
>>>>>>>> may be degraded.
>>>>>>>> --------------------------------------------------------------------------
>>>>>>>>
>>>>>>>> --------------------------------------------------------------------------
>>>>>>>> A request was made to bind to that would result in binding more
>>>>>>>> processes than cpus on a resource:
>>>>>>>>
>>>>>>>>   Bind to:     CORE
>>>>>>>>   Node:        csclprd3-0-11
>>>>>>>>   #processes:  2
>>>>>>>>   #cpus:       1
>>>>>>>>
>>>>>>>> You can override this protection by adding the "overload-allowed"
>>>>>>>> option to your binding directive.
>>>>>>>> --------------------------------------------------------------------------
>>>>>>>>
>>>>>>>> The only installed numa packages are:
>>>>>>>> numactl.x86_64    2.0.7-3.el6    @centos6.3-x86_64-0/$
>>>>>>>>
>>>>>>>> When I search for the available NUMA packages I find:
>>>>>>>>
>>>>>>>> yum search numa | less
>>>>>>>>
>>>>>>>> Loaded plugins: fastestmirror
>>>>>>>> Loading mirror speeds from cached hostfile
>>>>>>>> ============================== N/S Matched: numa ===============================
>>>>>>>> numactl-devel.i686 : Development package for building Applications that use numa
>>>>>>>> numactl-devel.x86_64 : Development package for building Applications that use numa
>>>>>>>> numad.x86_64 : NUMA user daemon
>>>>>>>> numactl.i686 : Library for tuning for Non Uniform Memory Access machines
>>>>>>>> numactl.x86_64 : Library for tuning for Non Uniform Memory Access machines
>>>>>>>>
>>>>>>>> Do I need to install additional and/or different NUMA packages in
>>>>>>>> order to get OpenMPI to work on this cluster?
>>>>>>>>
>>>>>>>> -Bill Lane
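Given the explanation above and the yum search output, the likely next step is to install the 64-bit NUMA library and headers on each compute node; a sketch only, with package names taken from the search output above (run as root):

    yum install numactl.x86_64 numactl-devel.x86_64

Note that if Open MPI's bundled hwloc was configured on a node without those headers, the memory-binding warning may only disappear after Open MPI itself is rebuilt and reinstalled against them.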
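
On Ralph's point that the hostfile did not appear to supply 128 cores: the slots= values above sum to 332, but as Bill notes those counts include hyperthreads, and what matters for core binding is the physical-core topology mpirun discovers on each node (the overload error above reports #processes: 2 against #cpus: 1 on csclprd3-0-11, even though its hostfile entry claims 32 slots). An illustrative way to total the slots in a hostfile of the format shown above:

    # Sum the slots= entries and compare against NSLOTS (128 here).
    # Assumes the "host slots=N max-slots=N" layout used above.
    awk '!/^#/ && NF { for (i = 1; i <= NF; i++)
                         if ($i ~ /^slots=/) { split($i, a, "="); total += a[2] } }
         END { print total, "slots" }' hostfile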
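
The overload error also carries its own workaround: adding the "overload-allowed" qualifier to the binding directive. With the 1.8-series option spelling that would look roughly like the command below (using the long --bind-to form rather than the older -bind-to-core shorthand; verify the exact syntax against mpirun --help on the installed build):

    # Sketch only: permit more ranks than physical cores per node, as the
    # error text suggests. The alternative is to lower NSLOTS to the number
    # of physical cores the nodes actually provide.
    $MPI_DIR/bin/mpirun -np $NSLOTS --bind-to core:overload-allowed \
        --report-bindings --hostfile hostfile \
        --mca btl_tcp_if_include eth0 --prefix $MPI_DIR \
        $BENCH_DIR/$APP_DIR/$APP_BIN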
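
As for the parameter Ralph could not recall for silencing the memory-binding warning: it lives in the hwloc MCA framework, and ompi_info can list the exact name on a given build. The name used below, hwloc_base_mem_bind_failure_action, is an assumption from memory and should be confirmed against the ompi_info output first:

    # List the hwloc memory-binding parameters this build actually knows about.
    $MPI_DIR/bin/ompi_info --all | grep -i mem_bind

    # Assumed parameter name (verify above): suppress the non-fatal
    # "cannot bind memory" warning.
    $MPI_DIR/bin/mpirun --mca hwloc_base_mem_bind_failure_action silent \
        -np $NSLOTS --report-bindings --hostfile hostfile \
        --mca btl_tcp_if_include eth0 --prefix $MPI_DIR \
        $BENCH_DIR/$APP_DIR/$APP_BIN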