Hooray!

On Dec 19, 2013, at 10:14 PM, tmish...@jcity.maeda.co.jp wrote:
> Hi Ralph,
>
> Thank you for your fix. It works for me.
>
> Tetsuya Mishima
>
>> Actually, it looks like it would happen with hetero-nodes set - only
>> required that at least two nodes have the same architecture. So you might
>> want to give the trunk a shot as it may well now be fixed.
>>
>> On Dec 19, 2013, at 8:35 AM, Ralph Castain <r...@open-mpi.org> wrote:
>>
>>> Hmmm...not having any luck tracking this down yet. If anything, based
>>> on what I saw in the code, I would have expected it to fail when
>>> hetero-nodes was false, not the other way around.
>>>
>>> I'll keep poking around - just wanted to provide an update.
>>>
>>> On Dec 19, 2013, at 12:54 AM, tmish...@jcity.maeda.co.jp wrote:
>>>
>>>> Hi Ralph, sorry for the crossed post.
>>>>
>>>> Your advice about -hetero-nodes in the other thread gave me a hint.
>>>>
>>>> I had already put "orte_hetero_nodes = 1" in my mca-params.conf, because
>>>> you told me a month ago that my environment would need this option.
>>>>
>>>> Removing this line from mca-params.conf makes it work.
>>>> In other words, you can replicate it by adding -hetero-nodes as
>>>> shown below.
>>>>
>>>> qsub: job 8364.manage.cluster completed
>>>> [mishima@manage mpi]$ qsub -I -l nodes=2:ppn=8
>>>> qsub: waiting for job 8365.manage.cluster to start
>>>> qsub: job 8365.manage.cluster ready
>>>>
>>>> [mishima@node11 ~]$ ompi_info --all | grep orte_hetero_nodes
>>>>               MCA orte: parameter "orte_hetero_nodes" (current value:
>>>>                         "false", data source: default, level: 9 dev/all,
>>>>                         type: bool)
>>>> [mishima@node11 ~]$ cd ~/Desktop/openmpi-1.7/demos/
>>>> [mishima@node11 demos]$ mpirun -np 4 -cpus-per-proc 4 -report-bindings myprog
>>>> [node11.cluster:27895] MCW rank 0 bound to socket 0[core 0[hwt 0]], socket 0[core 1[hwt 0]], socket 0[core 2[hwt 0]], socket 0[core 3[hwt 0]]: [B/B/B/B][./././.]
>>>> [node11.cluster:27895] MCW rank 1 bound to socket 1[core 4[hwt 0]], socket 1[core 5[hwt 0]], socket 1[core 6[hwt 0]], socket 1[core 7[hwt 0]]: [./././.][B/B/B/B]
>>>> [node12.cluster:24891] MCW rank 3 bound to socket 1[core 4[hwt 0]], socket 1[core 5[hwt 0]], socket 1[core 6[hwt 0]], socket 1[core 7[hwt 0]]: [./././.][B/B/B/B]
>>>> [node12.cluster:24891] MCW rank 2 bound to socket 0[core 0[hwt 0]], socket 0[core 1[hwt 0]], socket 0[core 2[hwt 0]], socket 0[core 3[hwt 0]]: [B/B/B/B][./././.]
>>>> Hello world from process 0 of 4
>>>> Hello world from process 1 of 4
>>>> Hello world from process 2 of 4
>>>> Hello world from process 3 of 4
>>>> [mishima@node11 demos]$ mpirun -np 4 -cpus-per-proc 4 -report-bindings -hetero-nodes myprog
>>>> --------------------------------------------------------------------------
>>>> A request was made to bind to that would result in binding more
>>>> processes than cpus on a resource:
>>>>
>>>>    Bind to:     CORE
>>>>    Node:        node12
>>>>    #processes:  2
>>>>    #cpus:       1
>>>>
>>>> You can override this protection by adding the "overload-allowed"
>>>> option to your binding directive.
>>>> --------------------------------------------------------------------------
>>>>
>>>> As far as I checked, data->num_bound seems to go bad in bind_downwards
>>>> when I put "-hetero-nodes". I hope you can clear up the problem.
>>>>
>>>> Regards,
>>>> Tetsuya Mishima
>>>>
>>>>> Yes, it's very strange. But I don't think there's any chance that
>>>>> I have < 8 actual cores on the node. I guess that you can replicate
>>>>> it with SLURM, so please try it again.
>>>>>
>>>>> I changed to use node10 and node11, and then I got the warning against
>>>>> node11.
>>>>>
>>>>> Furthermore, just as additional information for you, I tried adding
>>>>> "-bind-to core:overload-allowed", and then it worked as shown below.
>>>>> But I think node11 is never overloaded because it has 8 cores.
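[The fix described above is a one-line change to the MCA parameter file. As a sketch, assuming the usual per-user location $HOME/.openmpi/mca-params.conf; the parameter name comes from the ompi_info output quoted above:]

```
# $HOME/.openmpi/mca-params.conf
# Commenting this out restores the default (false) and avoided the
# bind_downwards overload error described in this thread; passing
# -hetero-nodes on the mpirun command line re-enables the same behavior.
# orte_hetero_nodes = 1
```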
>>>>> qsub: job 8342.manage.cluster completed
>>>>> [mishima@manage ~]$ qsub -I -l nodes=node10:ppn=8+node11:ppn=8
>>>>> qsub: waiting for job 8343.manage.cluster to start
>>>>> qsub: job 8343.manage.cluster ready
>>>>>
>>>>> [mishima@node10 ~]$ cd ~/Desktop/openmpi-1.7/demos/
>>>>> [mishima@node10 demos]$ cat $PBS_NODEFILE
>>>>> node10
>>>>> node10
>>>>> node10
>>>>> node10
>>>>> node10
>>>>> node10
>>>>> node10
>>>>> node10
>>>>> node11
>>>>> node11
>>>>> node11
>>>>> node11
>>>>> node11
>>>>> node11
>>>>> node11
>>>>> node11
>>>>> [mishima@node10 demos]$ mpirun -np 4 -cpus-per-proc 4 -report-bindings myprog
>>>>> --------------------------------------------------------------------------
>>>>> A request was made to bind to that would result in binding more
>>>>> processes than cpus on a resource:
>>>>>
>>>>>    Bind to:     CORE
>>>>>    Node:        node11
>>>>>    #processes:  2
>>>>>    #cpus:       1
>>>>>
>>>>> You can override this protection by adding the "overload-allowed"
>>>>> option to your binding directive.
>>>>> --------------------------------------------------------------------------
>>>>> [mishima@node10 demos]$ mpirun -np 4 -cpus-per-proc 4 -report-bindings -bind-to core:overload-allowed myprog
>>>>> [node10.cluster:27020] MCW rank 0 bound to socket 0[core 0[hwt 0]], socket 0[core 1[hwt 0]], socket 0[core 2[hwt 0]], socket 0[core 3[hwt 0]]: [B/B/B/B][./././.]
>>>>> [node10.cluster:27020] MCW rank 1 bound to socket 1[core 4[hwt 0]], socket 1[core 5[hwt 0]], socket 1[core 6[hwt 0]], socket 1[core 7[hwt 0]]: [./././.][B/B/B/B]
>>>>> [node11.cluster:26597] MCW rank 3 bound to socket 1[core 4[hwt 0]], socket 1[core 5[hwt 0]], socket 1[core 6[hwt 0]], socket 1[core 7[hwt 0]]: [./././.][B/B/B/B]
>>>>> [node11.cluster:26597] MCW rank 2 bound to socket 0[core 0[hwt 0]], socket 0[core 1[hwt 0]], socket 0[core 2[hwt 0]], socket 0[core 3[hwt 0]]: [B/B/B/B][./././.]
>>>>> Hello world from process 1 of 4
>>>>> Hello world from process 0 of 4
>>>>> Hello world from process 3 of 4
>>>>> Hello world from process 2 of 4
>>>>>
>>>>> Regards,
>>>>> Tetsuya Mishima
>>>>>
>>>>>> Very strange - I can't seem to replicate it. Is there any chance that
>>>>>> you have < 8 actual cores on node12?
>>>>>>
>>>>>> On Dec 18, 2013, at 4:53 PM, tmish...@jcity.maeda.co.jp wrote:
>>>>>>
>>>>>>> Hi Ralph, sorry for confusing you.
>>>>>>>
>>>>>>> At that time, I cut and pasted the part of "cat $PBS_NODEFILE".
>>>>>>> I guess I didn't paste the last line by mistake.
>>>>>>>
>>>>>>> I retried the test, and the output below is exactly what I got.
>>>>>>>
>>>>>>> [mishima@manage ~]$ qsub -I -l nodes=node11:ppn=8+node12:ppn=8
>>>>>>> qsub: waiting for job 8338.manage.cluster to start
>>>>>>> qsub: job 8338.manage.cluster ready
>>>>>>>
>>>>>>> [mishima@node11 ~]$ cat $PBS_NODEFILE
>>>>>>> node11
>>>>>>> node11
>>>>>>> node11
>>>>>>> node11
>>>>>>> node11
>>>>>>> node11
>>>>>>> node11
>>>>>>> node11
>>>>>>> node12
>>>>>>> node12
>>>>>>> node12
>>>>>>> node12
>>>>>>> node12
>>>>>>> node12
>>>>>>> node12
>>>>>>> node12
>>>>>>> [mishima@node11 ~]$ mpirun -np 4 -cpus-per-proc 4 -report-bindings myprog
>>>>>>> --------------------------------------------------------------------------
>>>>>>> A request was made to bind to that would result in binding more
>>>>>>> processes than cpus on a resource:
>>>>>>>
>>>>>>>    Bind to:     CORE
>>>>>>>    Node:        node12
>>>>>>>    #processes:  2
>>>>>>>    #cpus:       1
>>>>>>>
>>>>>>> You can override this protection by adding the "overload-allowed"
>>>>>>> option to your binding directive.
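[The overload warning quoted above is plain slot accounting: with -np 4 and -cpus-per-proc 4, each node must supply 4 cpus per local proc, and Ralph's later reply attributes the failure to an allocation that gave node12 only 7 slots. A quick sketch of that check, with numbers taken from the thread; this is not Open MPI's actual code:]

```shell
# Slot-accounting sketch: 2 procs land on node12, each asking for 4 cpus,
# but the failing $PBS_NODEFILE listed node12 only 7 times.
procs=2
cpus_per_proc=4
slots=7
needed=$((procs * cpus_per_proc))
if [ "$needed" -gt "$slots" ]; then
  echo "overload: need $needed cpus, have $slots slots"   # -> need 8, have 7
else
  echo "fits"
fi
```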
>>>>>>> --------------------------------------------------------------------------
>>>>>>>
>>>>>>> Regards,
>>>>>>>
>>>>>>> Tetsuya Mishima
>>>>>>>
>>>>>>>> I removed the debug in #2 - thanks for reporting it.
>>>>>>>>
>>>>>>>> For #1, it actually looks to me like this is correct. If you look at
>>>>>>>> your allocation, there are only 7 slots being allocated on node12, yet
>>>>>>>> you have asked for 8 cpus to be assigned (2 procs with 4 cpus/proc).
>>>>>>>> So the warning is in fact correct.
>>>>>>>>
>>>>>>>> On Dec 18, 2013, at 4:04 PM, tmish...@jcity.maeda.co.jp wrote:
>>>>>>>>
>>>>>>>>> Hi Ralph, I found that openmpi-1.7.4rc1 was already uploaded, so I'd
>>>>>>>>> like to report 3 issues, mainly regarding -cpus-per-proc.
>>>>>>>>>
>>>>>>>>> 1) When I use 2 nodes (node11 and node12), each of which has 8 cores
>>>>>>>>> (= 2 sockets x 4 cores/socket), it starts to produce the error again
>>>>>>>>> as shown below. At least openmpi-1.7.4a1r29646 did work well.
>>>>>>>>>
>>>>>>>>> [mishima@manage ~]$ qsub -I -l nodes=2:ppn=8
>>>>>>>>> qsub: waiting for job 8336.manage.cluster to start
>>>>>>>>> qsub: job 8336.manage.cluster ready
>>>>>>>>>
>>>>>>>>> [mishima@node11 ~]$ cd ~/Desktop/openmpi-1.7/demos/
>>>>>>>>> [mishima@node11 demos]$ cat $PBS_NODEFILE
>>>>>>>>> node11
>>>>>>>>> node11
>>>>>>>>> node11
>>>>>>>>> node11
>>>>>>>>> node11
>>>>>>>>> node11
>>>>>>>>> node11
>>>>>>>>> node11
>>>>>>>>> node12
>>>>>>>>> node12
>>>>>>>>> node12
>>>>>>>>> node12
>>>>>>>>> node12
>>>>>>>>> node12
>>>>>>>>> node12
>>>>>>>>> [mishima@node11 demos]$ mpirun -np 4 -cpus-per-proc 4 -report-bindings myprog
>>>>>>>>> --------------------------------------------------------------------------
>>>>>>>>> A request was made to bind to that would result in binding more
>>>>>>>>> processes than cpus on a resource:
>>>>>>>>>
>>>>>>>>>    Bind to:     CORE
>>>>>>>>>    Node:        node12
>>>>>>>>>    #processes:  2
>>>>>>>>>    #cpus:       1
>>>>>>>>>
>>>>>>>>> You can override this protection by adding the "overload-allowed"
>>>>>>>>> option to your binding directive.
>>>>>>>>> --------------------------------------------------------------------------
>>>>>>>>>
>>>>>>>>> Of course it works well using only one node.
>>>>>>>>>
>>>>>>>>> [mishima@node11 demos]$ mpirun -np 2 -cpus-per-proc 4 -report-bindings myprog
>>>>>>>>> [node11.cluster:26238] MCW rank 0 bound to socket 0[core 0[hwt 0]], socket 0[core 1[hwt 0]], socket 0[core 2[hwt 0]], socket 0[core 3[hwt 0]]: [B/B/B/B][./././.]
>>>>>>>>> [node11.cluster:26238] MCW rank 1 bound to socket 1[core 4[hwt 0]], socket 1[core 5[hwt 0]], socket 1[core 6[hwt 0]], socket 1[core 7[hwt 0]]: [./././.][B/B/B/B]
>>>>>>>>> Hello world from process 1 of 2
>>>>>>>>> Hello world from process 0 of 2
>>>>>>>>>
>>>>>>>>> 2) Adding "-bind-to numa", it works, but the message "bind:upward
>>>>>>>>> target NUMANode type NUMANode" appears. As far as I remember, I didn't
>>>>>>>>> see this kind of message before.
>>>>>>>>>
>>>>>>>>> [mishima@node11 demos]$ mpirun -np 4 -cpus-per-proc 4 -report-bindings -bind-to numa myprog
>>>>>>>>> [node11.cluster:26260] [[8844,0],0] bind:upward target NUMANode type NUMANode
>>>>>>>>> [node11.cluster:26260] [[8844,0],0] bind:upward target NUMANode type NUMANode
>>>>>>>>> [node11.cluster:26260] [[8844,0],0] bind:upward target NUMANode type NUMANode
>>>>>>>>> [node11.cluster:26260] [[8844,0],0] bind:upward target NUMANode type NUMANode
>>>>>>>>> [node11.cluster:26260] MCW rank 0 bound to socket 0[core 0[hwt 0]], socket 0[core 1[hwt 0]], socket 0[core 2[hwt 0]], socket 0[core 3[hwt 0]]: [B/B/B/B][./././.]
>>>>>>>>> [node11.cluster:26260] MCW rank 1 bound to socket 1[core 4[hwt 0]], socket 1[core 5[hwt 0]], socket 1[core 6[hwt 0]], socket 1[core 7[hwt 0]]: [./././.][B/B/B/B]
>>>>>>>>> [node12.cluster:23607] MCW rank 3 bound to socket 1[core 4[hwt 0]], socket 1[core 5[hwt 0]], socket 1[core 6[hwt 0]], socket 1[core 7[hwt 0]]: [./././.][B/B/B/B]
>>>>>>>>> [node12.cluster:23607] MCW rank 2 bound to socket 0[core 0[hwt 0]], socket 0[core 1[hwt 0]], socket 0[core 2[hwt 0]], socket 0[core 3[hwt 0]]: [B/B/B/B][./././.]
>>>>>>>>> Hello world from process 1 of 4
>>>>>>>>> Hello world from process 0 of 4
>>>>>>>>> Hello world from process 3 of 4
>>>>>>>>> Hello world from process 2 of 4
>>>>>>>>>
>>>>>>>>> 3) I use the PGI compiler. It cannot accept the compiler switch
>>>>>>>>> "-Wno-variadic-macros", which is included in the configure script:
>>>>>>>>>
>>>>>>>>>    btl_usnic_CFLAGS="-Wno-variadic-macros"
>>>>>>>>>
>>>>>>>>> I removed this switch, and then I could continue to build 1.7.4rc1.
>>>>>>>>>
>>>>>>>>> Regards,
>>>>>>>>> Tetsuya Mishima
>>>>>>>>>
>>>>>>>>>> Hmmm...okay, I understand the scenario. It must be something in the
>>>>>>>>>> algo when it only has one node, so it shouldn't be too hard to track
>>>>>>>>>> down.
>>>>>>>>>>
>>>>>>>>>> I'm off on travel for a few days, but will return to this when I get
>>>>>>>>>> back.
>>>>>>>>>>
>>>>>>>>>> Sorry for the delay - I will try to look at this while I'm gone, but
>>>>>>>>>> can't promise anything :-(
>>>>>>>>>>
>>>>>>>>>> On Dec 10, 2013, at 6:58 PM, tmish...@jcity.maeda.co.jp wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi Ralph, sorry for the confusion.
>>>>>>>>>>>
>>>>>>>>>>> We usually log on to "manage", which is our control node.
>>>>>>>>>>> From manage, we submit a job or enter a remote node such as
>>>>>>>>>>> node03 via Torque interactive mode (qsub -I).
>>>>>>>>>>>
>>>>>>>>>>> At that time, instead of using Torque, I just did rsh to node03
>>>>>>>>>>> from manage and ran myprog on the node. I hope you can understand
>>>>>>>>>>> what I did.
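[The workaround for issue 3 above can be scripted. A sketch, assuming the flag appears verbatim in the generated configure script of the 1.7.4rc1 tarball; it is demonstrated here on a copy of the offending line rather than the real file, and the /tmp path is purely illustrative:]

```shell
# Strip the GCC-only warning flag that pgcc rejects. In the real source
# tree this would target the top-level "configure" script.
echo 'btl_usnic_CFLAGS="-Wno-variadic-macros"' > /tmp/configure.frag
sed -i.bak 's/-Wno-variadic-macros//g' /tmp/configure.frag
cat /tmp/configure.frag   # -> btl_usnic_CFLAGS=""
```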
>>>>>>>>>>> >>>>>>>>>>> Now, I retried with "-host node03", which still causes the >>>> problem: >>>>>>>>>>> (I comfirmed local run on manage caused the same problem too) >>>>>>>>>>> >>>>>>>>>>> [mishima@manage ~]$ rsh node03 >>>>>>>>>>> Last login: Wed Dec 11 11:38:57 from manage >>>>>>>>>>> [mishima@node03 ~]$ cd ~/Desktop/openmpi-1.7/demos/ >>>>>>>>>>> [mishima@node03 demos]$ >>>>>>>>>>> [mishima@node03 demos]$ mpirun -np 8 -host node03 >>>> -report-bindings >>>>>>>>>>> -cpus-per-proc 4 -map-by socket myprog >>>>>>>>>>> >>>>>>>>> >>>>>>> >>>>> >>>> > -------------------------------------------------------------------------- >>>>>>>>>>> A request was made to bind to that would result in binding more >>>>>>>>>>> processes than cpus on a resource: >>>>>>>>>>> >>>>>>>>>>> Bind to: CORE >>>>>>>>>>> Node: node03 >>>>>>>>>>> #processes: 2 >>>>>>>>>>> #cpus: 1 >>>>>>>>>>> >>>>>>>>>>> You can override this protection by adding the > "overload-allowed" >>>>>>>>>>> option to your binding directive. >>>>>>>>>>> >>>>>>>>> >>>>>>> >>>>> >>>> > -------------------------------------------------------------------------- >>>>>>>>>>> >>>>>>>>>>> It' strange, but I have to report that "-map-by socket:span" >>>> worked >>>>>>>>> well. >>>>>>>>>>> >>>>>>>>>>> [mishima@node03 demos]$ mpirun -np 8 -host node03 >>>> -report-bindings >>>>>>>>>>> -cpus-per-proc 4 -map-by socket:span myprog >>>>>>>>>>> [node03.cluster:11871] MCW rank 2 bound to socket 1[core 8[hwt >>>> 0]], >>>>>>>>> socket >>>>>>>>>>> 1[core 9[hwt 0]], socket 1[core 10[hwt 0]], s >>>>>>>>>>> ocket 1[core 11[hwt 0]]: >>>>>>>>>>> >>>>> [./././././././.][B/B/B/B/./././.][./././././././.][./././././././.] >>>>>>>>>>> [node03.cluster:11871] MCW rank 3 bound to socket 1[core 12[hwt >>>>> 0]], >>>>>>>>> socket >>>>>>>>>>> 1[core 13[hwt 0]], socket 1[core 14[hwt 0]], >>>>>>>>>>> socket 1[core 15[hwt 0]]: >>>>>>>>>>> >>>>> [./././././././.][././././B/B/B/B][./././././././.][./././././././.] 
>>>>>>>>>>> [node03.cluster:11871] MCW rank 4 bound to socket 2[core 16[hwt >>>>> 0]], >>>>>>>>> socket>>>>>>>>> 2[core 17[hwt 0]], socket 2[core 18[hwt 0]], >>>>>>>>>>> socket 2[core 19[hwt 0]]: >>>>>>>>>>> >>>>> [./././././././.][./././././././.][B/B/B/B/./././.][./././././././.] >>>>>>>>>>> [node03.cluster:11871] MCW rank 5 bound to socket 2[core 20[hwt >>>>> 0]], >>>>>>>>> socket >>>>>>>>>>> 2[core 21[hwt 0]], socket 2[core 22[hwt 0]], >>>>>>>>>>> socket 2[core 23[hwt 0]]: >>>>>>>>>>> >>>>> [./././././././.][./././././././.][././././B/B/B/B][./././././././.] >>>>>>>>>>> [node03.cluster:11871] MCW rank 6 bound to socket 3[core 24[hwt >>>>> 0]], >>>>>>>>> socket >>>>>>>>>>> 3[core 25[hwt 0]], socket 3[core 26[hwt 0]], >>>>>>>>>>> socket 3[core 27[hwt 0]]: >>>>>>>>>>> >>>>> [./././././././.][./././././././.][./././././././.][B/B/B/B/./././.] >>>>>>>>>>> [node03.cluster:11871] MCW rank 7 bound to socket 3[core 28[hwt >>>>> 0]], >>>>>>>>> socket >>>>>>>>>>> 3[core 29[hwt 0]], socket 3[core 30[hwt 0]], >>>>>>>>>>> socket 3[core 31[hwt 0]]: >>>>>>>>>>> >>>>> [./././././././.][./././././././.][./././././././.][././././B/B/B/B] >>>>>>>>>>> [node03.cluster:11871] MCW rank 0 bound to socket 0[core 0[hwt >>>> 0]], >>>>>>>>> socket >>>>>>>>>>> 0[core 1[hwt 0]], socket 0[core 2[hwt 0]], so >>>>>>>>>>> cket 0[core 3[hwt 0]]: >>>>>>>>>>> >>>>> [B/B/B/B/./././.][./././././././.][./././././././.][./././././././.] >>>>>>>>>>> [node03.cluster:11871] MCW rank 1 bound to socket 0[core 4[hwt >>>> 0]], >>>>>>>>> socket >>>>>>>>>>> 0[core 5[hwt 0]], socket 0[core 6[hwt 0]], so >>>>>>>>>>> cket 0[core 7[hwt 0]]: >>>>>>>>>>> >>>>> [././././B/B/B/B][./././././././.][./././././././.][./././././././.] 
>>>>>>>>>>> Hello world from process 2 of 8 >>>>>>>>>>> Hello world from process 6 of 8 >>>>>>>>>>> Hello world from process 3 of 8 >>>>>>>>>>> Hello world from process 7 of 8 >>>>>>>>>>> Hello world from process 1 of 8 >>>>>>>>>>> Hello world from process 5 of 8 >>>>>>>>>>> Hello world from process 0 of 8 >>>>>>>>>>> Hello world from process 4 of 8 >>>>>>>>>>> >>>>>>>>>>> Regards, >>>>>>>>>>> Tetsuya Mishima >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>> On Dec 10, 2013, at 6:05 PM, tmish...@jcity.maeda.co.jp wrote: >>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> Hi Ralph, >>>>>>>>>>>>> >>>>>>>>>>>>> I tried again with -cpus-per-proc 2 as shown below. >>>>>>>>>>>>> Here, I found that "-map-by socket:span" worked well. >>>>>>>>>>>>> >>>>>>>>>>>>> [mishima@node03 demos]$ mpirun -np 8 -report-bindings >>>>>>> -cpus-per-proc >>>>>>>>> 2 >>>>>>>>>>>>> -map-by socket:span myprog >>>>>>>>>>>>> [node03.cluster:10879] MCW rank 2 bound to socket 1[core 8 > [hwt >>>>> 0]], >>>>>>>>>>> socket >>>>>>>>>>>>> 1[core 9[hwt 0]]: [./././././././.][B/B/././. >>>>>>>>>>>>> /././.][./././././././.][./././././././.] >>>>>>>>>>>>> [node03.cluster:10879] MCW rank 3 bound to socket 1[core 10 > [hwt >>>>>>> 0]], >>>>>>>>>>> socket >>>>>>>>>>>>> 1[core 11[hwt 0]]: [./././././././.][././B/B >>>>>>>>>>>>> /./././.][./././././././.][./././././././.] >>>>>>>>>>>>> [node03.cluster:10879] MCW rank 4 bound to socket 2[core 16 > [hwt >>>>>>> 0]], >>>>>>>>>>> socket >>>>>>>>>>>>> 2[core 17[hwt 0]]: [./././././././.][./././. >>>>>>>>>>>>> /./././.][B/B/./././././.][./././././././.] >>>>>>>>>>>>> [node03.cluster:10879] MCW rank 5 bound to socket 2[core 18 > [hwt >>>>>>> 0]], >>>>>>>>>>> socket >>>>>>>>>>>>> 2[core 19[hwt 0]]: [./././././././.][./././. >>>>>>>>>>>>> /./././.][././B/B/./././.][./././././././.] >>>>>>>>>>>>> [node03.cluster:10879] MCW rank 6 bound to socket 3[core 24 > [hwt >>>>>>> 0]], >>>>>>>>>>> socket >>>>>>>>>>>>> 3[core 25[hwt 0]]: [./././././././.][./././. 
>>>>>>>>>>>>> /./././.][./././././././.][B/B/./././././.] >>>>>>>>>>>>> [node03.cluster:10879] MCW rank 7 bound to socket 3[core 26 > [hwt >>>>>>> 0]], >>>>>>>>>>> socket >>>>>>>>>>>>> 3[core 27[hwt 0]]: [./././././././.][./././. >>>>>>>>>>>>> /./././.][./././././././.][././B/B/./././.] >>>>>>>>>>>>> [node03.cluster:10879] MCW rank 0 bound to socket 0[core 0 > [hwt >>>>> 0]], >>>>>>>>>>> socket >>>>>>>>>>>>> 0[core 1[hwt 0]]: [B/B/./././././.][././././. >>>>>>>>>>>>> /././.][./././././././.][./././././././.] >>>>>>>>>>>>> [node03.cluster:10879] MCW rank 1 bound to socket 0[core 2 > [hwt >>>>> 0]], >>>>>>>>>>> socket >>>>>>>>>>>>> 0[core 3[hwt 0]]: [././B/B/./././.][././././. >>>>>>>>>>>>> /././.][./././././././.][./././././././.] >>>>>>>>>>>>> Hello world from process 1 of 8 >>>>>>>>>>>>> Hello world from process 0 of 8 >>>>>>>>>>>>> Hello world from process 4 of 8 >>>>>>>>>>>>> Hello world from process 2 of 8 >>>>>>>>>>>>> Hello world from process 7 of 8 >>>>>>>>>>>>> Hello world from process 6 of 8 >>>>>>>>>>>>> Hello world from process 5 of 8> >>>>>>> Hello world from >>>> process 3 of 8 >>>>>>>>>>>>> [mishima@node03 demos]$ mpirun -np 8 -report-bindings >>>>>>> -cpus-per-proc >>>>>>>>> 2 >>>>>>>>>>>>> -map-by socket myprog >>>>>>>>>>>>> [node03.cluster:10921] MCW rank 2 bound to socket 0[core 4 > [hwt >>>>> 0]], >>>>>>>>>>> socket >>>>>>>>>>>>> 0[core 5[hwt 0]]: [././././B/B/./.][././././. >>>>>>>>>>>>> /././.][./././././././.][./././././././.] >>>>>>>>>>>>> [node03.cluster:10921] MCW rank 3 bound to socket 0[core 6 > [hwt >>>>> 0]], >>>>>>>>>>> socket >>>>>>>>>>>>> 0[core 7[hwt 0]]: [././././././B/B][././././. >>>>>>>>>>>>> /././.][./././././././.][./././././././.] >>>>>>>>>>>>> [node03.cluster:10921] MCW rank 4 bound to socket 1[core 8 > [hwt >>>>> 0]], >>>>>>>>>>> socket >>>>>>>>>>>>> 1[core 9[hwt 0]]: [./././././././.][B/B/././. >>>>>>>>>>>>> /././.][./././././././.][./././././././.] 
>>>>>>>>>>>>> [node03.cluster:10921] MCW rank 5 bound to socket 1[core 10 > [hwt >>>>>>> 0]], >>>>>>>>>>> socket >>>>>>>>>>>>> 1[core 11[hwt 0]]: [./././././././.][././B/B >>>>>>>>>>>>> /./././.][./././././././.][./././././././.] >>>>>>>>>>>>> [node03.cluster:10921] MCW rank 6 bound to socket 1[core 12 > [hwt >>>>>>> 0]], >>>>>>>>>>> socket >>>>>>>>>>>>> 1[core 13[hwt 0]]: [./././././././.][./././. >>>>>>>>>>>>> /B/B/./.][./././././././.][./././././././.] >>>>>>>>>>>>> [node03.cluster:10921] MCW rank 7 bound to socket 1[core 14 > [hwt >>>>>>> 0]], >>>>>>>>>>> socket >>>>>>>>>>>>> 1[core 15[hwt 0]]: [./././././././.][./././. >>>>>>>>>>>>> /././B/B][./././././././.][./././././././.] >>>>>>>>>>>>> [node03.cluster:10921] MCW rank 0 bound to socket 0[core 0 > [hwt >>>>> 0]], >>>>>>>>>>> socket >>>>>>>>>>>>> 0[core 1[hwt 0]]: [B/B/./././././.][././././. >>>>>>>>>>>>> /././.][./././././././.][./././././././.] >>>>>>>>>>>>> [node03.cluster:10921] MCW rank 1 bound to socket 0[core 2 > [hwt >>>>> 0]], >>>>>>>>>>> socket >>>>>>>>>>>>> 0[core 3[hwt 0]]: [././B/B/./././.][././././. >>>>>>>>>>>>> /././.][./././././././.][./././././././.] >>>>>>>>>>>>> Hello world from process 5 of 8 >>>>>>>>>>>>> Hello world from process 1 of 8 >>>>>>>>>>>>> Hello world from process 6 of 8 >>>>>>>>>>>>> Hello world from process 4 of 8 >>>>>>>>>>>>> Hello world from process 2 of 8 >>>>>>>>>>>>> Hello world from process 0 of 8 >>>>>>>>>>>>> Hello world from process 7 of 8 >>>>>>>>>>>>> Hello world from process 3 of 8 >>>>>>>>>>>>> >>>>>>>>>>>>> "-np 8" and "-cpus-per-proc 4" just filled all sockets. >>>>>>>>>>>>> In this case, I guess "-map-by socket:span" and "-map-by >>>> socket" >>>>>>> has >>>>>>>>>>> same >>>>>>>>>>>>> meaning. >>>>>>>>>>>>> Therefore, there's no problem about that. Sorry for > distubing. >>>>>>>>>>>> >>>>>>>>>>>> No problem - glad you could clear that up :-) >>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> By the way, through this test, I found another problem. 
>>>>>>>>>>>>> Without torque manager and just using rsh, it causes the same >>>>> error >>>>>>>>>>> like >>>>>>>>>>>>> below: >>>>>>>>>>>>> >>>>>>>>>>>>> [mishima@manage openmpi-1.7]$ rsh node03 >>>>>>>>>>>>> Last login: Wed Dec 11 09:42:02 from manage >>>>>>>>>>>>> [mishima@node03 ~]$ cd ~/Desktop/openmpi-1.7/demos/ >>>>>>>>>>>>> [mishima@node03 demos]$ mpirun -np 8 -report-bindings >>>>>>> -cpus-per-proc >>>>>>>>> 4 >>>>>>>>>>>>> -map-by socket myprog >>>>>>>>>>>> >>>>>>>>>>>> I don't understand the difference here - you are simply > starting >>>>> it >>>>>>>>> from>>>>> a different node? It looks like everything is expected > to >>>>> run local >>>>>>> to >>>>>>>>>>> mpirun, yes? So there is no rsh actually involved here. >>>>>>>>>>>> Are you still running in an allocation? >>>>>>>>>>>> >>>>>>>>>>>> If you run this with "-host node03" on the cmd line, do you > see >>>>> the >>>>>>>>> same >>>>>>>>>>> problem? >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>> >>>>>>> >>>>> >>>> > -------------------------------------------------------------------------- >>>>>>>>>>>>> A request was made to bind to that would result in binding > more >>>>>>>>>>>>> processes than cpus on a resource: >>>>>>>>>>>>> >>>>>>>>>>>>> Bind to: CORE >>>>>>>>>>>>> Node: node03 >>>>>>>>>>>>> #processes: 2 >>>>>>>>>>>>> #cpus: 1 >>>>>>>>>>>>> >>>>>>>>>>>>> You can override this protection by adding the >>>> "overload-allowed" >>>>>>>>>>>>> option to your binding directive. 
>>>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>> >>>>>>> >>>>> >>>> > -------------------------------------------------------------------------- >>>>>>>>>>>>> [mishima@node03 demos]$ >>>>>>>>>>>>> [mishima@node03 demos]$ mpirun -np 8 -report-bindings >>>>>>> -cpus-per-proc >>>>>>>>> 4 >>>>>>>>>>>>> myprog >>>>>>>>>>>>> [node03.cluster:11036] MCW rank 2 bound to socket 1[core 8 > [hwt >>>>> 0]], >>>>>>>>>>> socket >>>>>>>>>>>>> 1[core 9[hwt 0]], socket 1[core 10[hwt 0]], s >>>>>>>>>>>>> ocket 1[core 11[hwt 0]]: >>>>>>>>>>>>> >>>>>>> > [./././././././.][B/B/B/B/./././.][./././././././.][./././././././.] >>>>>>>>>>>>> [node03.cluster:11036] MCW rank 3 bound to socket 1[core 12 > [hwt >>>>>>> 0]], >>>>>>>>>>> socket >>>>>>>>>>>>> 1[core 13[hwt 0]], socket 1[core 14[hwt 0]], >>>>>>>>>>>>> socket 1[core 15[hwt 0]]: >>>>>>>>>>>>> >>>>>>> > [./././././././.][././././B/B/B/B][./././././././.][./././././././.] >>>>>>>>>>>>> [node03.cluster:11036] MCW rank 4 bound to socket 2[core 16 > [hwt >>>>>>> 0]], >>>>>>>>>>> socket >>>>>>>>>>>>> 2[core 17[hwt 0]], socket 2[core 18[hwt 0]], >>>>>>>>>>>>> socket 2[core 19[hwt 0]]: >>>>>>>>>>>>> >>>>>>> > [./././././././.][./././././././.][B/B/B/B/./././.][./././././././.] >>>>>>>>>>>>> [node03.cluster:11036] MCW rank 5 bound to socket 2[core 20 > [hwt >>>>>>> 0]], >>>>>>>>>>> socket >>>>>>>>>>>>> 2[core 21[hwt 0]], socket 2[core 22[hwt 0]], >>>>>>>>>>>>> socket 2[core 23[hwt 0]]: >>>>>>>>>>>>> >>>>>>> > [./././././././.][./././././././.][././././B/B/B/B][./././././././.] >>>>>>>>>>>>> [node03.cluster:11036] MCW rank 6 bound to socket 3[core 24 > [hwt >>>>>>> 0]], >>>>>>>>>>> socket >>>>>>>>>>>>> 3[core 25[hwt 0]], socket 3[core 26[hwt 0]], >>>>>>>>>>>>> socket 3[core 27[hwt 0]]:>>>>> >>>>>>> > [./././././././.][./././././././.][./././././././.][B/B/B/B/./././.] 
>>>>>>>>>>>>> [node03.cluster:11036] MCW rank 7 bound to socket 3[core 28 > [hwt >>>>>>> 0]], >>>>>>>>>>> socket >>>>>>>>>>>>> 3[core 29[hwt 0]], socket 3[core 30[hwt 0]], >>>>>>>>>>>>> socket 3[core 31[hwt 0]]: >>>>>>>>>>>>> >>>>>>> > [./././././././.][./././././././.][./././././././.][././././B/B/B/B] >>>>>>>>>>>>> [node03.cluster:11036] MCW rank 0 bound to socket 0[core 0 > [hwt >>>>> 0]], >>>>>>>>>>> socket >>>>>>>>>>>>> 0[core 1[hwt 0]], socket 0[core 2[hwt 0]], so >>>>>>>>>>>>> cket 0[core 3[hwt 0]]: >>>>>>>>>>>>> >>>>>>> > [B/B/B/B/./././.][./././././././.][./././././././.][./././././././.] >>>>>>>>>>>>> [node03.cluster:11036] MCW rank 1 bound to socket 0[core 4 > [hwt >>>>> 0]], >>>>>>>>>>> socket >>>>>>>>>>>>> 0[core 5[hwt 0]], socket 0[core 6[hwt 0]], so >>>>>>>>>>>>> cket 0[core 7[hwt 0]]: >>>>>>>>>>>>> >>>>>>> > [././././B/B/B/B][./././././././.][./././././././.][./././././././.] >>>>>>>>>>>>> Hello world from process 4 of 8 >>>>>>>>>>>>> Hello world from process 2 of 8 >>>>>>>>>>>>> Hello world from process 6 of 8 >>>>>>>>>>>>> Hello world from process 5 of 8 >>>>>>>>>>>>> Hello world from process 3 of 8 >>>>>>>>>>>>> Hello world from process 7 of 8 >>>>>>>>>>>>> Hello world from process 0 of 8 >>>>>>>>>>>>> Hello world from process 1 of 8 >>>>>>>>>>>>> >>>>>>>>>>>>> Regards, >>>>>>>>>>>>> Tetsuya Mishima >>>>>>>>>>>>> >>>>>>>>>>>>>> Hmmm...that's strange. I only have 2 sockets on my system, > but >>>>> let >>>>>>>>> me >>>>>>>>>>>>> poke around a bit and see what might be happening. >>>>>>>>>>>>>> >>>>>>>>>>>>>> On Dec 10, 2013, at 4:47 PM, tmish...@jcity.maeda.co.jp > wrote: >>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Hi Ralph, >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Thanks. I didn't know the meaning of "socket:span". >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> But it still causes the problem, which seems socket:span >>>>> doesn't >>>>>>>>>>> work. 
>>>>>>>>>>>>>>> >>>>>>>>>>>>>>> [mishima@manage demos]$ qsub -I -l nodes=node03:ppn=32 >>>>>>>>>>>>>>> qsub: waiting for job 8265.manage.cluster to start >>>>>>>>>>>>>>> qsub: job 8265.manage.cluster ready >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> [mishima@node03 ~]$ cd ~/Desktop/openmpi-1.7/demos/ >>>>>>>>>>>>>>> [mishima@node03 demos]$ mpirun -np 8 -report-bindings >>>>>>>>> -cpus-per-proc >>>>>>>>>>> 4 >>>>>>>>>>>>>>> -map-by socket:span myprog >>>>>>>>>>>>>>> [node03.cluster:10262] MCW rank 2 bound to socket 1[core 8 >>>> [hwt >>>>>>> 0]], >>>>>>>>>>>>> socket >>>>>>>>>>>>>>> 1[core 9[hwt 0]], socket 1[core 10[hwt 0]], s >>>>>>>>>>>>>>> ocket 1[core 11[hwt 0]]: >>>>>>>>>>>>>>> >>>>>>>>> >>>> [./././././././.][B/B/B/B/./././.][./././././././.][./././././././.] >>>>>>>>>>>>>>> [node03.cluster:10262] MCW rank 3 bound to socket 1[core 12 >>>> [hwt >>>>>>>>> 0]], >>>>>>>>>>>>> socket >>>>>>>>>>>>>>> 1[core 13[hwt 0]], socket 1[core 14[hwt 0]], >>>>>>>>>>>>>>> socket 1[core 15[hwt 0]]: >>>>>>>>>>>>>>> >>>>>>>>> >>>> [./././././././.][././././B/B/B/B][./././././././.][./././././././.] >>>>>>>>>>>>>>> [node03.cluster:10262] MCW rank 4 bound to socket 2[core 16 >>>> [hwt >>>>>>>>> 0]], >>>>>>>>>>>>> socket >>>>>>>>>>>>>>> 2[core 17[hwt 0]], socket 2[core 18[hwt 0]], >>>>>>>>>>>>>>> socket 2[core 19[hwt 0]]: >>>>>>>>>>>>>>> >>>>>>>>> >>>> [./././././././.][./././././././.][B/B/B/B/./././.][./././././././.] >>>>>>>>>>>>>>> [node03.cluster:10262] MCW rank 5 bound to socket 2[core 20 >>>> [hwt >>>>>>>>> 0]], >>>>>>>>>>>>> socket >>>>>>>>>>>>>>> 2[core 21[hwt 0]], socket 2[core 22[hwt 0]], >>>>>>>>>>>>>>> socket 2[core 23[hwt 0]]: >>>>>>>>>>>>>>> >>>>>>>>> >>>> [./././././././.][./././././././.][././././B/B/B/B][./././././././.] 
>>>>>>>>>>>>>>> [node03.cluster:10262] MCW rank 6 bound to socket 3[core 24 >>>> [hwt >>>>>>>>> 0]], >>>>>>>>>>>>> socket >>>>>>>>>>>>>>> 3[core 25[hwt 0]], socket 3[core 26[hwt 0]], >>>>>>>>>>>>>>> socket 3[core 27[hwt 0]]: >>>>>>>>>>>>>>> >>>>>>>>> >>>> [./././././././.][./././././././.][./././././././.][B/B/B/B/./././.] >>>>>>>>>>>>>>> [node03.cluster:10262] MCW rank 7 bound to socket 3[core 28 >>>> [hwt >>>>>>>>> 0]], >>>>>>>>>>>>> socket >>>>>>>>>>>>>>> 3[core 29[hwt 0]], socket 3[core 30[hwt 0]], >>>>>>>>>>>>>>> socket 3[core 31[hwt 0]]: >>>>>>>>>>>>>>> >>>>>>>>> >>>> [./././././././.][./././././././.][./././././././.][././././B/B/B/B] >>>>>>>>>>>>>>> [node03.cluster:10262] MCW rank 0 bound to socket 0[core 0 >>>> [hwt >>>>>>> 0]], >>>>>>>>>>>>> socket >>>>>>>>>>>>>>> 0[core 1[hwt 0]], socket 0[core 2[hwt 0]], so >>>>>>>>>>>>>>> cket 0[core 3[hwt 0]]: >>>>>>>>>>>>>>> >>>>>>>>> >>>> [B/B/B/B/./././.][./././././././.][./././././././.][./././././././.] >>>>>>>>>>>>>>> [node03.cluster:10262] MCW rank 1 bound to socket 0[core 4 >>>> [hwt >>>>>>> 0]], >>>>>>>>>>>>> socket >>>>>>>>>>>>>>> 0[core 5[hwt 0]], socket 0[core 6[hwt 0]], so >>>>>>>>>>>>>>> cket 0[core 7[hwt 0]]: >>>>>>>>>>>>>>> >>>>>>>>> >>>> [././././B/B/B/B][./././././././.][./././././././.][./././././././.] >>>>>>>>>>>>>>> Hello world from process 0 of 8>>>>>>>>>>>>> Hello world > from process 3 of 8 >>>>>>>>>>>>>>> Hello world from process 1 of 8 >>>>>>>>>>>>>>> Hello world from process 4 of 8 >>>>>>>>>>>>>>> Hello world from process 6 of 8 >>>>>>>>>>>>>>> Hello world from process 5 of 8 >>>>>>>>>>>>>>> Hello world from process 2 of 8 >>>>>>>>>>>>>>> Hello world from process 7 of 8 >>>>>>>>>>>>>>> >>>>>>>>>>>>>>> Regards, >>>>>>>>>>>>>>> Tetsuya Mishima >>>>>>>>>>>>>>> >>>>>>>>>>>>>>>> No, that is actually correct. We map a socket until full, >>>> then >>>>>>>>> move >>>>>>>>>>> to >>>>>>>>>>>>>>> the next. 
>>>>>>> What you want is --map-by socket:span
>>>>>>>
>>>>>>> On Dec 10, 2013, at 3:42 PM, tmishi...@jcity.maeda.co.jp wrote:
>>>>>>>
>>>>>>>> Hi Ralph,
>>>>>>>>
>>>>>>>> I had time to try your patch yesterday using openmpi-1.7.4a1r29646.
>>>>>>>> It stopped the error, but unfortunately "mapping by socket" itself didn't work well, as shown below:
>>>>>>>>
>>>>>>>> [mishima@manage demos]$ qsub -I -l nodes=1:ppn=32
>>>>>>>> qsub: waiting for job 8260.manage.cluster to start
>>>>>>>> qsub: job 8260.manage.cluster ready
>>>>>>>>
>>>>>>>> [mishima@node04 ~]$ cd ~/Desktop/openmpi-1.7/demos/
>>>>>>>> [mishima@node04 demos]$ mpirun -np 8 -report-bindings -cpus-per-proc 4 -map-by socket myprog
>>>>>>>> [node04.cluster:27489] MCW rank 2 bound to socket 1[core 8[hwt 0]], socket 1[core 9[hwt 0]], socket 1[core 10[hwt 0]], socket 1[core 11[hwt 0]]: [./././././././.][B/B/B/B/./././.][./././././././.][./././././././.]
>>>>>>>> [node04.cluster:27489] MCW rank 3 bound to socket 1[core 12[hwt 0]], socket 1[core 13[hwt 0]], socket 1[core 14[hwt 0]], socket 1[core 15[hwt 0]]: [./././././././.][././././B/B/B/B][./././././././.][./././././././.]
>>>>>>>> [node04.cluster:27489] MCW rank 4 bound to socket 2[core 16[hwt 0]], socket 2[core 17[hwt 0]], socket 2[core 18[hwt 0]], socket 2[core 19[hwt 0]]: [./././././././.][./././././././.][B/B/B/B/./././.][./././././././.]
>>>>>>>> [node04.cluster:27489] MCW rank 5 bound to socket 2[core 20[hwt 0]], socket 2[core 21[hwt 0]], socket 2[core 22[hwt 0]], socket 2[core 23[hwt 0]]: [./././././././.][./././././././.][././././B/B/B/B][./././././././.]
>>>>>>>> [node04.cluster:27489] MCW rank 6 bound to socket 3[core 24[hwt 0]], socket 3[core 25[hwt 0]], socket 3[core 26[hwt 0]], socket 3[core 27[hwt 0]]: [./././././././.][./././././././.][./././././././.][B/B/B/B/./././.]
>>>>>>>> [node04.cluster:27489] MCW rank 7 bound to socket 3[core 28[hwt 0]], socket 3[core 29[hwt 0]], socket 3[core 30[hwt 0]], socket 3[core 31[hwt 0]]: [./././././././.][./././././././.][./././././././.][././././B/B/B/B]
>>>>>>>> [node04.cluster:27489] MCW rank 0 bound to socket 0[core 0[hwt 0]], socket 0[core 1[hwt 0]], socket 0[core 2[hwt 0]], socket 0[core 3[hwt 0]]: [B/B/B/B/./././.][./././././././.][./././././././.][./././././././.]
>>>>>>>> [node04.cluster:27489] MCW rank 1 bound to socket 0[core 4[hwt 0]], socket 0[core 5[hwt 0]], socket 0[core 6[hwt 0]], socket 0[core 7[hwt 0]]: [././././B/B/B/B][./././././././.][./././././././.][./././././././.]
>>>>>>>> Hello world from process 2 of 8
>>>>>>>> Hello world from process 1 of 8
>>>>>>>> Hello world from process 3 of 8
>>>>>>>> Hello world from process 0 of 8
>>>>>>>> Hello world from process 6 of 8
>>>>>>>> Hello world from process 5 of 8
>>>>>>>> Hello world from process 4 of 8
>>>>>>>> Hello world from process 7 of 8
>>>>>>>>
>>>>>>>> I think this should be like this:
>>>>>>>>
>>>>>>>> rank 00 [B/B/B/B/./././.][./././././././.][./././././././.][./././././././.]
>>>>>>>> rank 01 [./././././././.][B/B/B/B/./././.][./././././././.][./././././././.]
>>>>>>>> rank 02 [./././././././.][./././././././.][B/B/B/B/./././.][./././././././.]
>>>>>>>> ...
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Tetsuya Mishima
>>>>>>>>
>>>>>>>>> I fixed this under the trunk (it was an issue regardless of RM) and have scheduled it for 1.7.4.
>>>>>>>>>
>>>>>>>>> Thanks!
>>>>>>>>> Ralph
>>>>>>>>>
>>>>>>>>> On Nov 25, 2013, at 4:22 PM, tmish...@jcity.maeda.co.jp wrote:
>>>>>>>>>
>>>>>>>>>> Hi Ralph,
>>>>>>>>>>
>>>>>>>>>> Thank you very much for your quick response.
>>>>>>>>>>
>>>>>>>>>> I'm afraid to say that I found one more issue...
>>>>>>>>>>
>>>>>>>>>> It's not so serious.
>>>>>>>>>> Please check it when you have a lot of time.
>>>>>>>>>>
>>>>>>>>>> The problem is cpus-per-proc with the -map-by option under the Torque manager.
>>>>>>>>>> It doesn't work as shown below. I guess you can get the same behaviour under the Slurm manager.
>>>>>>>>>>
>>>>>>>>>> Of course, if I remove the -map-by option, it works quite well.
>>>>>>>>>>
>>>>>>>>>> [mishima@manage testbed2]$ qsub -I -l nodes=1:ppn=32
>>>>>>>>>> qsub: waiting for job 8116.manage.cluster to start
>>>>>>>>>> qsub: job 8116.manage.cluster ready
>>>>>>>>>>
>>>>>>>>>> [mishima@node03 ~]$ cd ~/Ducom/testbed2
>>>>>>>>>> [mishima@node03 testbed2]$ mpirun -np 8 -report-bindings -cpus-per-proc 4 -map-by socket mPre
>>>>>>>>>> --------------------------------------------------------------------------
>>>>>>>>>> A request was made to bind to that would result in binding more
>>>>>>>>>> processes than cpus on a resource:
>>>>>>>>>>
>>>>>>>>>>    Bind to:     CORE
>>>>>>>>>>    Node:        node03
>>>>>>>>>>    #processes:  2
>>>>>>>>>>    #cpus:       1
>>>>>>>>>>
>>>>>>>>>> You can override this protection by adding the "overload-allowed"
>>>>>>>>>> option to your binding directive.
>>>>>>>>>> --------------------------------------------------------------------------
>>>>>>>>>>
>>>>>>>>>> [mishima@node03 testbed2]$ mpirun -np 8 -report-bindings -cpus-per-proc 4 mPre
>>>>>>>>>> [node03.cluster:18128] MCW rank 2 bound to socket 1[core 8[hwt 0]], socket 1[core 9[hwt 0]], socket 1[core 10[hwt 0]], socket 1[core 11[hwt 0]]: [./././././././.][B/B/B/B/./././.][./././././././.][./././././././.]
>>>>>>>>>> [node03.cluster:18128] MCW rank 3 bound to socket 1[core 12[hwt 0]], socket 1[core 13[hwt 0]], socket 1[core 14[hwt 0]], socket 1[core 15[hwt 0]]: [./././././././.][././././B/B/B/B][./././././././.][./././././././.]
>>>>>>>>>> [node03.cluster:18128] MCW rank 4 bound to socket 2[core 16[hwt 0]], socket 2[core 17[hwt 0]], socket 2[core 18[hwt 0]], socket 2[core 19[hwt 0]]: [./././././././.][./././././././.][B/B/B/B/./././.][./././././././.]
>>>>>>>>>> [node03.cluster:18128] MCW rank 5 bound to socket 2[core 20[hwt 0]], socket 2[core 21[hwt 0]], socket 2[core 22[hwt 0]], socket 2[core 23[hwt 0]]: [./././././././.][./././././././.][././././B/B/B/B][./././././././.]
>>>>>>>>>> [node03.cluster:18128] MCW rank 6 bound to socket 3[core 24[hwt 0]], socket 3[core 25[hwt 0]], socket 3[core 26[hwt 0]], socket 3[core 27[hwt 0]]: [./././././././.][./././././././.][./././././././.][B/B/B/B/./././.]
>>>>>>>>>> [node03.cluster:18128] MCW rank 7 bound to socket 3[core 28[hwt 0]], socket 3[core 29[hwt 0]], socket 3[core 30[hwt 0]], socket 3[core 31[hwt 0]]: [./././././././.][./././././././.][./././././././.][././././B/B/B/B]
>>>>>>>>>> [node03.cluster:18128] MCW rank 0 bound to socket 0[core 0[hwt 0]], socket 0[core 1[hwt 0]], socket 0[core 2[hwt 0]], socket 0[core 3[hwt 0]]: [B/B/B/B/./././.][./././././././.][./././././././.][./././././././.]
>>>>>>>>>> [node03.cluster:18128] MCW rank 1 bound to socket 0[core 4[hwt 0]], socket 0[core 5[hwt 0]], socket 0[core 6[hwt 0]], socket 0[core 7[hwt 0]]: [././././B/B/B/B][./././././././.][./././././././.][./././././././.]
>>>>>>>>>>
>>>>>>>>>> Regards,
>>>>>>>>>> Tetsuya Mishima
>>>>>>>>>>
>>>>>>>>>>> Fixed and scheduled to move to 1.7.4. Thanks again!
>>>>>>>>>>>
>>>>>>>>>>> On Nov 17, 2013, at 6:11 PM, Ralph Castain <r...@open-mpi.org> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> Thanks!
>>>>>>>>>>>> That's precisely where I was going to look when I had time :-)
>>>>>>>>>>>>
>>>>>>>>>>>> I'll update tomorrow.
>>>>>>>>>>>> Ralph
>>>>>>>>>>>>
>>>>>>>>>>>> On Sun, Nov 17, 2013 at 7:01 PM, <tmish...@jcity.maeda.co.jp> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> Hi Ralph,
>>>>>>>>>>>>>
>>>>>>>>>>>>> This is the continuous story of "Segmentation fault in oob_tcp.c of openmpi-1.7.4a1r29646".
>>>>>>>>>>>>>
>>>>>>>>>>>>> I found the cause.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Firstly, I noticed that your hostfile works and mine does not.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Your host file:
>>>>>>>>>>>>> cat hosts
>>>>>>>>>>>>> bend001 slots=12
>>>>>>>>>>>>>
>>>>>>>>>>>>> My host file:
>>>>>>>>>>>>> cat hosts
>>>>>>>>>>>>> node08
>>>>>>>>>>>>> node08
>>>>>>>>>>>>> ...(total 8 lines)
>>>>>>>>>>>>>
>>>>>>>>>>>>> I modified my script file to add "slots=1" to each line of my hostfile just before launching mpirun. Then it worked.
>>>>>>>>>>>>>
>>>>>>>>>>>>> My host file (modified):
>>>>>>>>>>>>> cat hosts
>>>>>>>>>>>>> node08 slots=1
>>>>>>>>>>>>> node08 slots=1
>>>>>>>>>>>>> ...(total 8 lines)
>>>>>>>>>>>>>
>>>>>>>>>>>>> Secondly, I confirmed that there's a slight difference between orte/util/hostfile/hostfile.c of 1.7.3 and that of 1.7.4a1r29646.
>>>>>>>>>>>>> $ diff hostfile.c.org ../../../../openmpi-1.7.3/orte/util/hostfile/hostfile.c
>>>>>>>>>>>>> 394,401c394,399
>>>>>>>>>>>>> <         if (got_count) {
>>>>>>>>>>>>> <             node->slots_given = true;
>>>>>>>>>>>>> <         } else if (got_max) {
>>>>>>>>>>>>> <             node->slots = node->slots_max;
>>>>>>>>>>>>> <             node->slots_given = true;
>>>>>>>>>>>>> <         } else {
>>>>>>>>>>>>> <             /* should be set by obj_new, but just to be clear */
>>>>>>>>>>>>> <             node->slots_given = false;
>>>>>>>>>>>>> ---
>>>>>>>>>>>>> >         if (!got_count) {
>>>>>>>>>>>>> >             if (got_max) {
>>>>>>>>>>>>> >                 node->slots = node->slots_max;
>>>>>>>>>>>>> >             } else {
>>>>>>>>>>>>> >                 ++node->slots;
>>>>>>>>>>>>> >             }
>>>>>>>>>>>>> ....
>>>>>>>>>>>>>
>>>>>>>>>>>>> Finally, I added line 402 below just as a tentative trial. Then it worked.
>>>>>>>>>>>>>
>>>>>>>>>>>>> cat -n orte/util/hostfile/hostfile.c:
>>>>>>>>>>>>> ...
>>>>>>>>>>>>>    394          if (got_count) {
>>>>>>>>>>>>>    395              node->slots_given = true;
>>>>>>>>>>>>>    396          } else if (got_max) {
>>>>>>>>>>>>>    397              node->slots = node->slots_max;
>>>>>>>>>>>>>    398              node->slots_given = true;
>>>>>>>>>>>>>    399          } else {
>>>>>>>>>>>>>    400              /* should be set by obj_new, but just to be clear */
>>>>>>>>>>>>>    401              node->slots_given = false;
>>>>>>>>>>>>>    402              ++node->slots;  /* added by tmishima */
>>>>>>>>>>>>>    403          }
>>>>>>>>>>>>> ...
>>>>>>>>>>>>>
>>>>>>>>>>>>> Please fix the problem properly, because this is just based on my random guess. It's related to the treatment of a hostfile where slots information is not given.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Regards,
>>>>>>>>>>>>> Tetsuya Mishima
>>>>>>>>>>>>>
>>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>>> users mailing list
>>>>>>>>>>>>> us...@open-mpi.org
>>>>>>>>>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users