Re: [OMPI devel] sm BTL performance of the openmpi-2.0.0

2016-08-09 Thread tmishima
Finally it worked, thanks!

[mishima@manage OMB-3.1.1-openmpi2.0.0]$ ompi_info --param btl openib
--level 5 | grep openib_flags
  MCA btl openib: parameter "btl_openib_flags" (current value: "65847",
                  data source: default, level: 5 tuner/detail, type: unsigned_int)
[mishima@manage OMB-3.1.1-openmpi2.0.0]$ mpirun -np 2 -report-bindings
osu_bw
[manage.cluster:14439] MCW rank 0 bound to socket 0[core 0[hwt 0]], socket 0[core 1[hwt 0]], socket 0[core 2[hwt 0]], socket 0[core 3[hwt 0]], socket 0[core 4[hwt 0]], socket 0[core 5[hwt 0]]: [B/B/B/B/B/B][./././././.]
[manage.cluster:14439] MCW rank 1 bound to socket 0[core 0[hwt 0]], socket 0[core 1[hwt 0]], socket 0[core 2[hwt 0]], socket 0[core 3[hwt 0]], socket 0[core 4[hwt 0]], socket 0[core 5[hwt 0]]: [B/B/B/B/B/B][./././././.]
# OSU MPI Bandwidth Test v3.1.1
# Size       Bandwidth (MB/s)
1                      1.72
2                      3.52
4                      7.01
8                     14.11
16                    28.17
32                    55.90
64                    99.83
128                  159.13
256                  272.98
512                  476.35
1024                 911.49
2048                1319.96
4096                1767.78
8192                2169.53
16384               2507.96
32768               2957.28
65536               3206.90
131072              3610.33
262144              3985.18
524288              4379.47
1048576             4560.90
2097152             4661.44
4194304             4631.21


Tetsuya Mishima

On 2016/08/10 11:57:29, "devel" wrote in "Re: [OMPI devel] sm BTL performance of the openmpi-2.0.0":
> Ack, the segv is due to a typo from transcribing the patch. Fixed. Please
try the following patch and let me know if it fixes the issues.
>
>
https://github.com/hjelmn/ompi/commit/4079eec9749e47dddc6acc9c0847b3091601919f.patch

>
> -Nathan
>
> > On Aug 8, 2016, at 9:48 PM, tmish...@jcity.maeda.co.jp wrote:
> >
> > The latest patch also causes a segfault...
> >
> > By the way, I found a typo as below: mca_pml_ob1.use_all_rdma in the last
> > line should be &mca_pml_ob1.use_all_rdma:
> >
> > +    mca_pml_ob1.use_all_rdma = false;
> > +    (void) mca_base_component_var_register (&mca_pml_ob1_component.pmlm_version, "use_all_rdma",
> > +                                            "Use all available RDMA btls for the RDMA and RDMA pipeline protocols "
> > +                                            "(default: false)", MCA_BASE_VAR_TYPE_BOOL, NULL, 0, 0,
> > +                                            OPAL_INFO_LVL_5, MCA_BASE_VAR_SCOPE_GROUP, mca_pml_ob1.use_all_rdma);
> > +
> >
> > Here is the OSU_BW and gdb output:
> >
> > # OSU MPI Bandwidth Test v3.1.1
> > # Size       Bandwidth (MB/s)
> > 1                      2.19
> > 2                      4.43
> > 4                      8.98
> > 8                     18.07
> > 16                    35.58
> > 32                    70.62
> > 64                   108.88
> > 128                  172.97
> > 256                  305.73
> > 512                  536.48
> > 1024                 957.57
> > 2048                1587.21
> > 4096                1638.81
> > 8192                2165.14
> > 16384               2482.43
> > 32768               2866.33
> > 65536               3655.33
> > 131072              4208.40
> > 262144              4596.12
> > 524288              4769.27
> > 1048576             4900.00
> > [manage:16596] *** Process received signal ***
> > [manage:16596] Signal: Segmentation fault (11)
> > [manage:16596] Signal code: Address not mapped (1)
> > [manage:16596] Failing at address: 0x8
> > ...
> > Core was generated by `osu_bw'.
> > Program terminated with signal 11, Segmentation fault.
> > #0  0x0031d9008806 in ?? () from /lib64/libgcc_s.so.1
> > (gdb) where
> > #0  0x0031d9008806 in ?? () from /lib64/libgcc_s.so.1
> > #1  0x0031d9008934 in _Unwind_Backtrace ()
from /lib64/libgcc_s.so.1
> > #2  0x0037ab8e5ee8 in backtrace () from /lib64/libc.so.6
> > #3  0x2b5060c14345 in opal_backtrace_print ()
> > at ./backtrace_execinfo.c:47
> > #4  0x2b5060c11180 in show_stackframe () at ./stacktrace.c:331
> > #5  <signal handler called>
> > #6  mca_pml_ob1_recv_request_schedule_once ()
at ./pml_ob1_recvreq.c:983
> > #7  0x2aaab461c71a in mca_pml_ob1_recv_request_progress_rndv ()
> >
> >
from /home/mishima/opt/mpi/openmpi-2.0.0-pgi16.5/lib/openmpi/mca_pml_ob1.so
> > #8  0x2aaab46198e5 in mca_pml_ob1_recv_frag_match ()
> > at ./pml_ob1_recvfrag.c:715
> > #9  0x2aaab4618e46 in mca_pml_ob1_recv_frag_callback_rndv ()
> > at ./pml_ob1_recvfrag.c:267
> > #10 0x2aaab37958d3 in mca_btl_vader_poll_handle_frag ()
> > at ./btl_vader_component.c:589
> > #11 0x2aaab3795b9a in mca_btl_vader_component_progress ()
> > at ./btl_vader_component.c:231
> > #12 0x2b5060bd16fc in opal_progress () at
runtime/opal_progress.c:224
> > #13 

Re: [OMPI devel] sm BTL performance of the openmpi-2.0.0

2016-08-08 Thread tmishima
I understood. Thanks.

Tetsuya Mishima

On 2016/08/09 11:33:15, "devel" wrote in "Re: [OMPI devel] sm BTL performance of the openmpi-2.0.0":
> I will add a control to have the new behavior of using all available RDMA
> btls or just the eager ones for the RDMA protocol. The flags will remain as
> they are. And, yes, for 2.0.0 you can set the btl
> flags if you do not intend to use MPI RMA.
>
> New patch:
>
>
https://github.com/hjelmn/ompi/commit/43267012e58d78e3fc713b98c6fb9f782de977c7.patch

>
> -Nathan
>
> > On Aug 8, 2016, at 8:16 PM, tmish...@jcity.maeda.co.jp wrote:
> >
> > Then, my understanding is that you will restore the default value of
> > btl_openib_flags to the previous one (= 310) and add a new MCA parameter to
> > control HCA inclusion for such a situation. The workaround so far for
> > openmpi-2.0.0 is setting those flags manually. Right?
> >
> > Tetsuya Mishima
> >
> > On 2016/08/09 9:56:29, "devel" wrote in "Re: [OMPI devel] sm BTL performance of the openmpi-2.0.0":
> >> Hmm, not good. So we have a situation where it is sometimes better to
> > include the HCA when it is the only rdma btl. Will have a new version
up in
> > a bit that adds an MCA parameter to control the
> >> behavior. The default will be the same as 1.10.x.
> >>
> >> -Nathan
> >>
> >>> On Aug 8, 2016, at 4:51 PM, tmish...@jcity.maeda.co.jp wrote:
> >>>
> >>> Hi, unfortunately it doesn't work well. The previous one was much
> >>> better ...
> >>>
> >>> [mishima@manage OMB-3.1.1-openmpi2.0.0]$ mpirun -np 2
-report-bindings
> >>> osu_bw
> >>> [manage.cluster:25107] MCW rank 0 bound to socket 0[core 0[hwt 0]],
> > socket
> >>> 0[core 1[hwt 0]], socket 0[core 2[hwt 0]], so
> >>> cket 0[core 3[hwt 0]], socket 0[core 4[hwt 0]], socket 0[core 5[hwt
> > 0]]:
> >>> [B/B/B/B/B/B][./././././.]
> >>> [manage.cluster:25107] MCW rank 1 bound to socket 0[core 0[hwt 0]],
> > socket
> >>> 0[core 1[hwt 0]], socket 0[core 2[hwt 0]], so
> >>> cket 0[core 3[hwt 0]], socket 0[core 4[hwt 0]], socket 0[core 5[hwt
> > 0]]:
> >>> [B/B/B/B/B/B][./././././.]
> >>> # OSU MPI Bandwidth Test v3.1.1
> >>> # Size       Bandwidth (MB/s)
> >>> 1                      2.22
> >>> 2                      4.53
> >>> 4                      9.11
> >>> 8                     18.02
> >>> 16                    35.44
> >>> 32                    70.84
> >>> 64                   113.71
> >>> 128                  176.74
> >>> 256                  311.07
> >>> 512                  529.03
> >>> 1024                 907.83
> >>> 2048                1597.66
> >>> 4096                 330.14
> >>> 8192                 516.49
> >>> 16384                780.31
> >>> 32768               1038.43
> >>> 65536               1186.36
> >>> 131072              1268.87
> >>> 262144              1222.24
> >>> 524288              1232.30
> >>> 1048576             1244.62
> >>> 2097152             1260.25
> >>> 4194304             1263.47
> >>>
> >>> Tetsuya
> >>>
> >>>
> >>> On 2016/08/09 2:42:24, "devel" wrote in "Re: [OMPI devel] sm BTL performance of the openmpi-2.0.0":
>  Ok, there was a problem with the selection logic when only one rdma
> >>> capable btl is available. I changed the logic to always use the RDMA
> > btl
> >>> over pipelined send/recv. This works better for me on a
>  Intel Omnipath system. Let me know if this works for you.
> 
> 
> >>>
> >
https://github.com/hjelmn/ompi/commit/dddb865b5337213fd73d0e226b02e2f049cfab47.patch

> >
> >>>
> 
>  -Nathan
> 
>  On Aug 07, 2016, at 10:00 PM, tmish...@jcity.maeda.co.jp wrote:
> 
>  Hi, here is the gdb output for additional information:
> 
>  (It might be inexact, because I built openmpi-2.0.0 without debug
> > option)
> 
>  Core was generated by `osu_bw'.
>  Program terminated with signal 11, Segmentation fault.
>  #0 0x0031d9008806 in ?? () from /lib64/libgcc_s.so.1
>  (gdb) where
>  #0 0x0031d9008806 in ?? () from /lib64/libgcc_s.so.1
>  #1 0x0031d9008934 in _Unwind_Backtrace ()
> > from /lib64/libgcc_s.so.1
>  #2 0x0037ab8e5ee8 in backtrace () from /lib64/libc.so.6
>  #3 0x2ad882bd4345 in opal_backtrace_print ()
>  at ./backtrace_execinfo.c:47
>  #4 0x2ad882bd1180 in show_stackframe () at ./stacktrace.c:331
>  #5 
>  #6 mca_pml_ob1_recv_request_schedule_once ()
> > at ./pml_ob1_recvreq.c:983
>  #7 0x2aaab412f47a in mca_pml_ob1_recv_request_progress_rndv ()
> 
> 
> >>>
> >
from /home/mishima/opt/mpi/openmpi-2.0.0-pgi16.5/lib/openmpi/mca_pml_ob1.so
>  #8 0x2aaab412c645 in mca_pml_ob1_recv_frag_match ()
>  at ./pml_ob1_recvfrag.c:715
>  #9 0x2aaab412bba6 in mca_pml_ob1_recv_frag_callback_rndv ()
>  at ./pml_ob1_recvfrag.c:267
>  #10 0x2f2748d3 in mca_btl_vader_poll_handle_frag ()
>  at ./btl_vader_component.c:589
>  #11 0x2f274b9a in mca_btl_vader_component_progress ()
>  at 

Re: [OMPI devel] sm BTL performance of the openmpi-2.0.0

2016-08-08 Thread tmishima
Then, my understanding is that you will restore the default value of
btl_openib_flags to the previous one (= 310) and add a new MCA parameter to
control HCA inclusion for such a situation. The workaround so far for
openmpi-2.0.0 is setting those flags manually. Right?
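
For reference, that manual workaround would look roughly like this (a sketch;
310 is the previous default mentioned above, please verify it against what a
1.10.x install reports via ompi_info):

  # per run:
  mpirun -np 2 --mca btl_openib_flags 310 osu_bw
  # or persistently, for every run:
  echo "btl_openib_flags = 310" >> $HOME/.openmpi/mca-params.conf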

Tetsuya Mishima

On 2016/08/09 9:56:29, "devel" wrote in "Re: [OMPI devel] sm BTL performance of the openmpi-2.0.0":
> Hmm, not good. So we have a situation where it is sometimes better to
include the HCA when it is the only rdma btl. Will have a new version up in
a bit that adds an MCA parameter to control the
> behavior. The default will be the same as 1.10.x.
>
> -Nathan
>
> > On Aug 8, 2016, at 4:51 PM, tmish...@jcity.maeda.co.jp wrote:
> >
> > Hi, unfortunately it doesn't work well. The previous one was much
> > better ...
> >
> > [mishima@manage OMB-3.1.1-openmpi2.0.0]$ mpirun -np 2 -report-bindings
> > osu_bw
> > [manage.cluster:25107] MCW rank 0 bound to socket 0[core 0[hwt 0]],
socket
> > 0[core 1[hwt 0]], socket 0[core 2[hwt 0]], so
> > cket 0[core 3[hwt 0]], socket 0[core 4[hwt 0]], socket 0[core 5[hwt
0]]:
> > [B/B/B/B/B/B][./././././.]
> > [manage.cluster:25107] MCW rank 1 bound to socket 0[core 0[hwt 0]],
socket
> > 0[core 1[hwt 0]], socket 0[core 2[hwt 0]], so
> > cket 0[core 3[hwt 0]], socket 0[core 4[hwt 0]], socket 0[core 5[hwt
0]]:
> > [B/B/B/B/B/B][./././././.]
> > # OSU MPI Bandwidth Test v3.1.1
> > # Size       Bandwidth (MB/s)
> > 1                      2.22
> > 2                      4.53
> > 4                      9.11
> > 8                     18.02
> > 16                    35.44
> > 32                    70.84
> > 64                   113.71
> > 128                  176.74
> > 256                  311.07
> > 512                  529.03
> > 1024                 907.83
> > 2048                1597.66
> > 4096                 330.14
> > 8192                 516.49
> > 16384                780.31
> > 32768               1038.43
> > 65536               1186.36
> > 131072              1268.87
> > 262144              1222.24
> > 524288              1232.30
> > 1048576             1244.62
> > 2097152             1260.25
> > 4194304             1263.47
> >
> > Tetsuya
> >
> >
> > On 2016/08/09 2:42:24, "devel" wrote in "Re: [OMPI devel] sm BTL performance of the openmpi-2.0.0":
> >> Ok, there was a problem with the selection logic when only one rdma
> > capable btl is available. I changed the logic to always use the RDMA
btl
> > over pipelined send/recv. This works better for me on a
> >> Intel Omnipath system. Let me know if this works for you.
> >>
> >>
> >
https://github.com/hjelmn/ompi/commit/dddb865b5337213fd73d0e226b02e2f049cfab47.patch

> >
> >>
> >> -Nathan
> >>
> >> On Aug 07, 2016, at 10:00 PM, tmish...@jcity.maeda.co.jp wrote:
> >>
> >> Hi, here is the gdb output for additional information:
> >>
> >> (It might be inexact, because I built openmpi-2.0.0 without debug
option)
> >>
> >> Core was generated by `osu_bw'.
> >> Program terminated with signal 11, Segmentation fault.
> >> #0 0x0031d9008806 in ?? () from /lib64/libgcc_s.so.1
> >> (gdb) where
> >> #0 0x0031d9008806 in ?? () from /lib64/libgcc_s.so.1
> >> #1 0x0031d9008934 in _Unwind_Backtrace ()
from /lib64/libgcc_s.so.1
> >> #2 0x0037ab8e5ee8 in backtrace () from /lib64/libc.so.6
> >> #3 0x2ad882bd4345 in opal_backtrace_print ()
> >> at ./backtrace_execinfo.c:47
> >> #4 0x2ad882bd1180 in show_stackframe () at ./stacktrace.c:331
> >> #5  <signal handler called>
> >> #6 mca_pml_ob1_recv_request_schedule_once ()
at ./pml_ob1_recvreq.c:983
> >> #7 0x2aaab412f47a in mca_pml_ob1_recv_request_progress_rndv ()
> >>
> >>
> >
from /home/mishima/opt/mpi/openmpi-2.0.0-pgi16.5/lib/openmpi/mca_pml_ob1.so
> >> #8 0x2aaab412c645 in mca_pml_ob1_recv_frag_match ()
> >> at ./pml_ob1_recvfrag.c:715
> >> #9 0x2aaab412bba6 in mca_pml_ob1_recv_frag_callback_rndv ()
> >> at ./pml_ob1_recvfrag.c:267
> >> #10 0x2f2748d3 in mca_btl_vader_poll_handle_frag ()
> >> at ./btl_vader_component.c:589
> >> #11 0x2f274b9a in mca_btl_vader_component_progress ()
> >> at ./btl_vader_component.c:231
> >> #12 0x2ad882b916fc in opal_progress () at
runtime/opal_progress.c:224
> >> #13 0x2ad8820a9aa5 in ompi_request_default_wait_all () at
> >> request/req_wait.c:77
> >> #14 0x2ad8820f10dd in PMPI_Waitall () at ./pwaitall.c:76
> >> #15 0x00401108 in main () at ./osu_bw.c:144
> >>
> >> Tetsuya
> >>
> >>
> >> On 2016/08/08 12:34:57, "devel" wrote in "Re: [OMPI devel] sm BTL performance of the openmpi-2.0.0":
> >> Hi, it caused segfault as below:
> >> [manage.cluster:25436] MCW rank 0 bound to socket 0[core 0[hwt
0]],socket
> >> 0[core 1[hwt 0]], socket 0[core 2[hwt 0]], socket 0[core 3[hwt 0]],
> > socket 0[core 4[hwt 0]], socket 0[core 5[hwt
> > 0]]:[B/B/B/B/B/B][./././././.][manage.cluster:25436] MCW rank 1 bound
to
> > socket 0[core
> >> 0[hwt 0]],socket
> >> 0[core 1[hwt 0]], socket 0[core 2[hwt 

Re: [OMPI devel] sm BTL performance of the openmpi-2.0.0

2016-07-26 Thread tmishima
Hi Gilles,

I confirmed the vader is used when I don't specify any BTL as you pointed
out!

Regards,
Tetsuya Mishima

[mishima@manage OMB-3.1.1-openmpi2.0.0]$ mpirun -np 2 --mca
btl_base_verbose 10 -bind-to core -report-bindings osu_bw
[manage.cluster:20006] MCW rank 0 bound to socket 0[core 0[hwt 0]]:
[B/././././.][./././././.]
[manage.cluster:20006] MCW rank 1 bound to socket 0[core 1[hwt 0]]:
[./B/./././.][./././././.]
[manage.cluster:20011] mca: base: components_register: registering
framework btl components
[manage.cluster:20011] mca: base: components_register: found loaded
component self
[manage.cluster:20011] mca: base: components_register: component self
register function successful
[manage.cluster:20011] mca: base: components_register: found loaded
component vader
[manage.cluster:20011] mca: base: components_register: component vader
register function successful
[manage.cluster:20011] mca: base: components_register: found loaded
component tcp
[manage.cluster:20011] mca: base: components_register: component tcp
register function successful
[manage.cluster:20011] mca: base: components_register: found loaded
component sm
[manage.cluster:20011] mca: base: components_register: component sm
register function successful
[manage.cluster:20011] mca: base: components_register: found loaded
component openib
[manage.cluster:20011] mca: base: components_register: component openib
register function successful
[manage.cluster:20011] mca: base: components_open: opening btl components
[manage.cluster:20011] mca: base: components_open: found loaded component
self
[manage.cluster:20011] mca: base: components_open: component self open
function successful
[manage.cluster:20011] mca: base: components_open: found loaded component
vader
[manage.cluster:20011] mca: base: components_open: component vader open
function successful
[manage.cluster:20011] mca: base: components_open: found loaded component
tcp
[manage.cluster:20011] mca: base: components_open: component tcp open
function successful
[manage.cluster:20011] mca: base: components_open: found loaded component
sm
[manage.cluster:20011] mca: base: components_open: component sm open
function successful
[manage.cluster:20011] mca: base: components_open: found loaded component
openib
[manage.cluster:20011] mca: base: components_open: component openib open
function successful
[manage.cluster:20011] select: initializing btl component self
[manage.cluster:20011] select: init of component self returned success
[manage.cluster:20011] select: initializing btl component vader
[manage.cluster:20011] select: init of component vader returned success
[manage.cluster:20011] select: initializing btl component tcp
[manage.cluster:20011] select: init of component tcp returned success
[manage.cluster:20011] select: initializing btl component sm
[manage.cluster:20011] select: init of component sm returned success
[manage.cluster:20011] select: initializing btl component openib
[manage.cluster:20011] Checking distance from this process to device=mthca0
[manage.cluster:20011] hwloc_distances->nbobjs=2
[manage.cluster:20011] hwloc_distances->latency[0]=1.00
[manage.cluster:20011] hwloc_distances->latency[1]=1.60
[manage.cluster:20011] hwloc_distances->latency[2]=1.60
[manage.cluster:20011] hwloc_distances->latency[3]=1.00
[manage.cluster:20011] ibv_obj->type set to NULL
[manage.cluster:20011] Process is bound: distance to device is 0.00
[manage.cluster:20012] mca: base: components_register: registering
framework btl components
[manage.cluster:20012] mca: base: components_register: found loaded
component self
[manage.cluster:20012] mca: base: components_register: component self
register function successful
[manage.cluster:20012] mca: base: components_register: found loaded
component vader
[manage.cluster:20012] mca: base: components_register: component vader
register function successful
[manage.cluster:20012] mca: base: components_register: found loaded
component tcp
[manage.cluster:20012] mca: base: components_register: component tcp
register function successful
[manage.cluster:20012] mca: base: components_register: found loaded
component sm
[manage.cluster:20012] mca: base: components_register: component sm
register function successful
[manage.cluster:20012] mca: base: components_register: found loaded
component openib
[manage.cluster:20012] mca: base: components_register: component openib
register function successful
[manage.cluster:20012] mca: base: components_open: opening btl components
[manage.cluster:20012] mca: base: components_open: found loaded component
self
[manage.cluster:20012] mca: base: components_open: component self open
function successful
[manage.cluster:20012] mca: base: components_open: found loaded component
vader
[manage.cluster:20012] mca: base: components_open: component vader open
function successful
[manage.cluster:20012] mca: base: components_open: found loaded component
tcp
[manage.cluster:20012] mca: base: components_open: component tcp 

Re: [OMPI devel] sm BTL performance of the openmpi-2.0.0

2016-07-26 Thread tmishima
Hi,

Thanks. I will try it and report later.
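
For anyone following along, applying the commits from that pull request to a
source tree goes roughly like this (a sketch; GitHub's ".patch" URL suffix and
the paths here are illustrative):

  cd openmpi-2.0.0
  curl -LO https://github.com/open-mpi/ompi-release/pull/1250.patch
  patch -p1 < 1250.patch
  make && make install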

Tetsuya Mishima


On 2016/07/27 9:20:28, "devel" wrote in "Re: [OMPI devel] sm BTL performance of the openmpi-2.0.0":
> sm is deprecated in 2.0.0 and will likely be removed in favor of vader in
2.1.0.
>
> This issue is probably this known issue:
https://github.com/open-mpi/ompi-release/pull/1250
>
> Please apply those commits and see if it fixes the issue for you.
>
> -Nathan
>
> > On Jul 26, 2016, at 6:17 PM, tmish...@jcity.maeda.co.jp wrote:
> >
> > Hi Gilles,
> >
> > Thanks. I ran again with --mca pml ob1 but I've got the same results as
> > below:
> >
> > [mishima@manage OMB-3.1.1-openmpi2.0.0]$ mpirun -np 2 -mca pml ob1
-bind-to
> > core -report-bindings osu_bw
> > [manage.cluster:18142] MCW rank 0 bound to socket 0[core 0[hwt 0]]:
> > [B/././././.][./././././.]
> > [manage.cluster:18142] MCW rank 1 bound to socket 0[core 1[hwt 0]]:
> > [./B/./././.][./././././.]
> > # OSU MPI Bandwidth Test v3.1.1
> > # Size       Bandwidth (MB/s)
> > 1                      1.48
> > 2                      3.07
> > 4                      6.26
> > 8                     12.53
> > 16                    24.33
> > 32                    49.03
> > 64                    83.46
> > 128                  132.60
> > 256                  234.96
> > 512                  420.86
> > 1024                 842.37
> > 2048                1231.65
> > 4096                 264.67
> > 8192                 472.16
> > 16384                740.42
> > 32768               1030.39
> > 65536               1191.16
> > 131072              1269.45
> > 262144              1238.33
> > 524288              1247.97
> > 1048576             1257.96
> > 2097152             1274.74
> > 4194304             1280.94
> > [mishima@manage OMB-3.1.1-openmpi2.0.0]$ mpirun -np 2 -mca pml ob1 -mca
btl
> > self,sm -bind-to core -report-bindings osu_b
> > w
> > [manage.cluster:18204] MCW rank 0 bound to socket 0[core 0[hwt 0]]:
> > [B/././././.][./././././.]
> > [manage.cluster:18204] MCW rank 1 bound to socket 0[core 1[hwt 0]]:
> > [./B/./././.][./././././.]
> > # OSU MPI Bandwidth Test v3.1.1
> > # Size       Bandwidth (MB/s)
> > 1                      0.52
> > 2                      1.05
> > 4                      2.08
> > 8                      4.18
> > 16                     8.21
> > 32                    16.65
> > 64                    32.60
> > 128                   66.70
> > 256                  132.45
> > 512                  269.27
> > 1024                 504.63
> > 2048                 819.76
> > 4096                 874.54
> > 8192                1447.11
> > 16384               2263.28
> > 32768               3236.85
> > 65536               3567.34
> > 131072              3555.17
> > 262144              3455.76
> > 524288              3441.80
> > 1048576             3505.30
> > 2097152             3534.01
> > 4194304             3546.94
> > [mishima@manage OMB-3.1.1-openmpi2.0.0]$ mpirun -np 2 -mca pml ob1 -mca
btl
> > self,sm,openib -bind-to core -report-binding
> > s osu_bw
> > [manage.cluster:18218] MCW rank 0 bound to socket 0[core 0[hwt 0]]:
> > [B/././././.][./././././.]
> > [manage.cluster:18218] MCW rank 1 bound to socket 0[core 1[hwt 0]]:
> > [./B/./././.][./././././.]
> > # OSU MPI Bandwidth Test v3.1.1
> > # Size       Bandwidth (MB/s)
> > 1                      0.51
> > 2                      1.03
> > 4                      2.05
> > 8                      4.07
> > 16                     8.14
> > 32                    16.32
> > 64                    32.98
> > 128                   63.70
> > 256                  126.66
> > 512                  252.61
> > 1024                 480.22
> > 2048                 810.54
> > 4096                 290.61
> > 8192                 512.49
> > 16384                764.60
> > 32768               1036.81
> > 65536               1182.81
> > 131072              1264.48
> > 262144              1235.82
> > 524288              1246.70
> > 1048576             1254.66
> > 2097152             1274.64
> > 4194304             1280.65
> > [mishima@manage OMB-3.1.1-openmpi2.0.0]$ mpirun -np 2 -mca pml ob1 -mca
btl
> > self,openib -bind-to core -report-bindings o
> > su_bw
> > [manage.cluster:18276] MCW rank 0 bound to socket 0[core 0[hwt 0]]:
> > [B/././././.][./././././.]
> > [manage.cluster:18276] MCW rank 1 bound to socket 0[core 1[hwt 0]]:
> > [./B/./././.][./././././.]
> > # OSU MPI Bandwidth Test v3.1.1
> > # Size       Bandwidth (MB/s)
> > 1                      0.54
> > 2                      1.08
> > 4                      2.18
> > 8                      4.33
> > 16                     8.69
> > 32                    17.39
> > 64                    34.34
> > 128                   66.28
> > 256                  130.36
> > 512

Re: [OMPI devel] sm BTL performance of the openmpi-2.0.0

2016-07-26 Thread tmishima
Hi Gilles,

Thanks. I ran again with --mca pml ob1 but I've got the same results as
below:

[mishima@manage OMB-3.1.1-openmpi2.0.0]$ mpirun -np 2 -mca pml ob1 -bind-to
core -report-bindings osu_bw
[manage.cluster:18142] MCW rank 0 bound to socket 0[core 0[hwt 0]]:
[B/././././.][./././././.]
[manage.cluster:18142] MCW rank 1 bound to socket 0[core 1[hwt 0]]:
[./B/./././.][./././././.]
# OSU MPI Bandwidth Test v3.1.1
# Size       Bandwidth (MB/s)
1                      1.48
2                      3.07
4                      6.26
8                     12.53
16                    24.33
32                    49.03
64                    83.46
128                  132.60
256                  234.96
512                  420.86
1024                 842.37
2048                1231.65
4096                 264.67
8192                 472.16
16384                740.42
32768               1030.39
65536               1191.16
131072              1269.45
262144              1238.33
524288              1247.97
1048576             1257.96
2097152             1274.74
4194304             1280.94
[mishima@manage OMB-3.1.1-openmpi2.0.0]$ mpirun -np 2 -mca pml ob1 -mca btl self,sm -bind-to core -report-bindings osu_bw
[manage.cluster:18204] MCW rank 0 bound to socket 0[core 0[hwt 0]]:
[B/././././.][./././././.]
[manage.cluster:18204] MCW rank 1 bound to socket 0[core 1[hwt 0]]:
[./B/./././.][./././././.]
# OSU MPI Bandwidth Test v3.1.1
# Size       Bandwidth (MB/s)
1                      0.52
2                      1.05
4                      2.08
8                      4.18
16                     8.21
32                    16.65
64                    32.60
128                   66.70
256                  132.45
512                  269.27
1024                 504.63
2048                 819.76
4096                 874.54
8192                1447.11
16384               2263.28
32768               3236.85
65536               3567.34
131072              3555.17
262144              3455.76
524288              3441.80
1048576             3505.30
2097152             3534.01
4194304             3546.94
[mishima@manage OMB-3.1.1-openmpi2.0.0]$ mpirun -np 2 -mca pml ob1 -mca btl self,sm,openib -bind-to core -report-bindings osu_bw
[manage.cluster:18218] MCW rank 0 bound to socket 0[core 0[hwt 0]]:
[B/././././.][./././././.]
[manage.cluster:18218] MCW rank 1 bound to socket 0[core 1[hwt 0]]:
[./B/./././.][./././././.]
# OSU MPI Bandwidth Test v3.1.1
# Size       Bandwidth (MB/s)
1                      0.51
2                      1.03
4                      2.05
8                      4.07
16                     8.14
32                    16.32
64                    32.98
128                   63.70
256                  126.66
512                  252.61
1024                 480.22
2048                 810.54
4096                 290.61
8192                 512.49
16384                764.60
32768               1036.81
65536               1182.81
131072              1264.48
262144              1235.82
524288              1246.70
1048576             1254.66
2097152             1274.64
4194304             1280.65
[mishima@manage OMB-3.1.1-openmpi2.0.0]$ mpirun -np 2 -mca pml ob1 -mca btl self,openib -bind-to core -report-bindings osu_bw
[manage.cluster:18276] MCW rank 0 bound to socket 0[core 0[hwt 0]]:
[B/././././.][./././././.]
[manage.cluster:18276] MCW rank 1 bound to socket 0[core 1[hwt 0]]:
[./B/./././.][./././././.]
# OSU MPI Bandwidth Test v3.1.1
# Size       Bandwidth (MB/s)
1                      0.54
2                      1.08
4                      2.18
8                      4.33
16                     8.69
32                    17.39
64                    34.34
128                   66.28
256                  130.36
512                  241.81
1024                 429.86
2048                 553.44
4096                 707.14
8192                 879.60
16384                763.02
32768               1042.89
65536               1185.45
131072              1267.56
262144              1227.41
524288              1244.61
1048576             1255.66
2097152             1273.55
4194304             1281.05


On 2016/07/27 9:02:49, "devel" wrote in "Re: [OMPI devel] sm BTL performance of the openmpi-2.0.0":
> Hi,
>
>
> can you please run again with
>
> --mca pml ob1
>
>
> if Open MPI was built with mxm support, pml/cm and mtl/mxm are used
> instead of pml/ob1 and btl/openib
>
>
> Cheers,
>
>
> Gilles
>
>
> On 7/27/2016 8:56 AM, tmish...@jcity.maeda.co.jp wrote:
> > Hi folks,
> >
> > I saw a performance degradation of openmpi-2.0.0 when I ran our
application
> > on a node (12cores). So I did 4 tests using osu_bw as below:
> >
> > 1: mpirun –np 2 osu_bw   

[OMPI devel] sm BTL performance of the openmpi-2.0.0

2016-07-26 Thread tmishima

Hi folks,

I saw a performance degradation of openmpi-2.0.0 when I ran our application
on a node (12 cores). So I did 4 tests using osu_bw as below:

1: mpirun -np 2 osu_bw                           bad  (30% of test 2)
2: mpirun -np 2 -mca btl self,sm osu_bw          good (same as openmpi-1.10.3)
3: mpirun -np 2 -mca btl self,sm,openib osu_bw   bad  (30% of test 2)
4: mpirun -np 2 -mca btl self,openib osu_bw      bad  (30% of test 2)

I guess the openib BTL was used in tests 1 and 3, because these results are
almost the same as test 4. I believe that the sm BTL should be used even in
tests 1 and 3, because its priority is higher than openib's. Unfortunately, at
the moment, I couldn't figure out the root cause, so I would appreciate it if
someone could take care of it.
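
One way to see which BTL should win for on-node traffic is to compare the
exclusivity each component registers (a sketch; "exclusivity" is the usual
parameter name, and the reported values depend on the build):

  ompi_info --param btl sm --level 9     | grep -i exclusivity
  ompi_info --param btl vader --level 9  | grep -i exclusivity
  ompi_info --param btl openib --level 9 | grep -i exclusivity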

Regards,
Tetsuya Mishima

P.S. Here I attached these test results.

[mishima@manage OMB-3.1.1-openmpi2.0.0]$ mpirun -np 2 -bind-to core -report-bindings osu_bw
[manage.cluster:13389] MCW rank 0 bound to socket 0[core 0[hwt 0]]: [B/././././.][./././././.]
[manage.cluster:13389] MCW rank 1 bound to socket 0[core 1[hwt 0]]: [./B/./././.][./././././.]
# OSU MPI Bandwidth Test v3.1.1
# Size       Bandwidth (MB/s)
1                      1.49
2                      3.04
4                      6.13
8                     12.23
16                    25.01
32                    49.96
64                    87.07
128                  138.87
256                  245.97
512                  423.30
1024                 865.85
2048                1279.63
4096                 264.79
8192                 473.92
16384                739.27
32768               1030.49
65536               1190.21
131072              1270.77
262144              1238.74
524288              1245.97
1048576             1260.09
2097152             1274.53
4194304             1285.07
[mishima@manage OMB-3.1.1-openmpi2.0.0]$ mpirun -np 2 -mca btl self,sm -bind-to core -report-bindings osu_bw
[manage.cluster:13448] MCW rank 0 bound to socket 0[core 0[hwt 0]]: [B/././././.][./././././.]
[manage.cluster:13448] MCW rank 1 bound to socket 0[core 1[hwt 0]]: [./B/./././.][./././././.]
# OSU MPI Bandwidth Test v3.1.1
# Size       Bandwidth (MB/s)
1                      0.51
2                      1.01
4                      2.03
8                      4.08
16                     7.92
32                    16.16
64                    32.53
128                   64.30
256                  128.19
512                  256.48
1024                 468.62
2048                 785.29
4096                 854.78
8192                1404.51
16384               2249.20
32768               3136.40
65536               3495.84
131072              3436.69
262144              3392.11
524288              3400.07
1048576             3460.60
2097152             3488.09
4194304             3498.45
[mishima@manage OMB-3.1.1-openmpi2.0.0]$ mpirun -np 2 -mca btl self,sm,openib -bind-to core -report-bindings osu_bw
[manage.cluster:13462] MCW rank 0 bound to socket 0[core 0[hwt 0]]: [B/././././.][./././././.]
[manage.cluster:13462] MCW rank 1 bound to socket 0[core 1[hwt 0]]: [./B/./././.][./././././.]
# OSU MPI Bandwidth Test v3.1.1
# Size       Bandwidth (MB/s)
1                      0.54
2                      1.09
4                      2.18
8                      4.37
16                     8.75
32                    17.37
64                    34.67
128                   66.66
256                  132.55
512                  261.52
1024                 489.51
2048                 818.38
4096                 290.48
8192                 511.64
16384                765.24
32768               1043.28
65536               1180.48
131072              1261.41
262144              1232.86
524288              1245.70
1048576             1245.69
2097152             1268.67
4194304             1281.33
[mishima@manage OMB-3.1.1-openmpi2.0.0]$ mpirun -np 2 -mca btl self,openib -bind-to core -report-bindings osu_bw
[manage.cluster:13521] MCW rank 0 bound to socket 0[core 0[hwt 0]]: [B/././././.][./././././.]
[manage.cluster:13521] MCW rank 1 bound to socket 0[core 1[hwt 0]]: [./B/./././.][./././././.]
# OSU MPI Bandwidth Test v3.1.1
# Size       Bandwidth (MB/s)
1                      0.54
2                      1.08
4                      2.16
8                      4.34
16                     8.64
32                    17.25
64                    34.30
128                   66.13
256                  129.99
512                  242.26
1024                 429.24
2048                 556.00
4096                 706.80
8192                 874.35
16384                762.60
32768               1039.61
65536

Re: [OMPI devel] v2.0.0rc4 is released

2016-07-07 Thread tmishima
Hi Gilles san, thank you for your quick comment. I fully understand the
meaning of the warning. Regarding the question you raise, I'm afraid that
I'm not sure which solution is better ...
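
For anyone hitting the same message: the check flags "common" (type C) symbols
in the flex-generated objects, and can be reproduced by hand roughly like this
(the object path is illustrative for a build tree):

  nm opal/util/show_help_lex.o | grep ' C '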

Regards,
Tetsuya Mishima

On 2016/07/07 14:13:02, "devel" wrote in "Re: [OMPI devel] v2.0.0rc4 is released":
> This is a warning that can be safely ignored.
>
>
> That being said, this can be seen as a false positive (unless we fix
> flex or its generated output).
>
> Also, and generally speaking, this kind of warning is for developers
> only (e.g. end users can do nothing about it).
>
>
> That raises the question : what could/should we do ?
>
> - master filters out these false positives, should we backport this to
> v2.x ?
>
> - should we simply not check for common symbols when building from a
> tarball ?
>
>
> Cheers,
>
>
> Gilles
>
> On 7/7/2016 2:03 PM, tmish...@jcity.maeda.co.jp wrote:
> > Hi Jeff, sorry for a very short report. I saw the warning below
> > at the end of installation of openmpi-2.0.0rc4. Is this okay?
> >
> > $ make install
> > ...
> > make  install-exec-hook
> > make[3]: Entering directory
> > `/home/mishima/mis/openmpi/openmpi-pgi16.5/openmpi-2.0.0rc4'
> > WARNING!  Common symbols found:
> >show_help_lex.o: 0004 C opal_show_help_yyleng
> >show_help_lex.o: 0008 C opal_show_help_yytext
> > hostfile_lex.o: 0004 C orte_util_hostfile_leng
> > hostfile_lex.o: 0008 C orte_util_hostfile_text
> >  rmaps_rank_file_lex.o: 0004 C
orte_rmaps_rank_file_leng
> >  rmaps_rank_file_lex.o: 0008 C
orte_rmaps_rank_file_text
> > make[3]: [install-exec-hook] Error 1 (ignored)
> > make[3]: Leaving directory
> > `/home/mishima/mis/openmpi/openmpi-pgi16.5/openmpi-2.0.0rc4'
> > make[2]: Nothing to be done for `install-data-am'.
> > make[2]: Leaving directory
> > `/home/mishima/mis/openmpi/openmpi-pgi16.5/openmpi-2.0.0rc4'
> > make[1]: Leaving directory
> > `/home/mishima/mis/openmpi/openmpi-pgi16.5/openmpi-2.0.0rc4'
> >
> > Regards,
> > Tetsuya Mishima
> >
> > On 2016/07/07 2:40:25, "devel" wrote in "[OMPI devel] v2.0.0rc4 is released":
> >> While crossing our fingers and doing mystical rain dances, we're
hoping
> > that 2.0.0rc4 is the last rc before v2.0.0 (final) is released.  Please
> > test!
> >> https://www.open-mpi.org/software/ompi/v2.x/
> >>
> >> Changes since rc3 (the list may look long, but most are quite small
> > corner cases):
> >> - Lots of threading fixes
> >> - More fixes for the new memory patcher system
> >> - Updates to NEWS and README
> >> - Fixed some hcoll bugs
> >> - Updates for external PMIx support
> >> - PMIx direct launching fixes
> >> - libudev fixes
> >> - compatibility fixes with ibv_exp_*
> >> - 32 bit compatibility fixes
> >> - fix some powerpc issues
> >> - various OMPIO / libnbc fixes from Lisandro Dalcin
> >> - fix some Solaris configury patching
> >> - fix PSM/PSM2 active state detection
> >> - disable PSM/PSM2 signal hijacking by default
> >> - datatype fixes
> >> - portals4 fixes
> >> - change ofi MTL to only use a limited set of OFI providers by default
> >> - fix OSHMEM init error check
> >>
> >> --
> >> Jeff Squyres
> >> jsquy...@cisco.com
> >> For corporate legal information go to:
> > http://www.cisco.com/web/about/doing_business/legal/cri/
> >> ___
> >> devel mailing list
> >> de...@open-mpi.org
> >> Subscription: https://www.open-mpi.org/mailman/listinfo.cgi/devel
> >> Link to this post:
> > http://www.open-mpi.org/community/lists/devel/2016/07/19153.php
> > ___
> > devel mailing list
> > de...@open-mpi.org
> > Subscription: https://www.open-mpi.org/mailman/listinfo.cgi/devel
> > Link to this post:
http://www.open-mpi.org/community/lists/devel/2016/07/19158.php
>
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: https://www.open-mpi.org/mailman/listinfo.cgi/develLink to
this post: http://www.open-mpi.org/community/lists/devel/2016/07/19159.php

Re: [OMPI devel] v2.0.0rc4 is released

2016-07-07 Thread tmishima
Hi Jeff, sorry for a very short report. I saw the warning below
at the end of installation of openmpi-2.0.0rc4. Is this okay?

$ make install
...
make  install-exec-hook
make[3]: Entering directory
`/home/mishima/mis/openmpi/openmpi-pgi16.5/openmpi-2.0.0rc4'
WARNING!  Common symbols found:
  show_help_lex.o: 0004 C opal_show_help_yyleng
  show_help_lex.o: 0008 C opal_show_help_yytext
   hostfile_lex.o: 0004 C orte_util_hostfile_leng
   hostfile_lex.o: 0008 C orte_util_hostfile_text
rmaps_rank_file_lex.o: 0004 C orte_rmaps_rank_file_leng
rmaps_rank_file_lex.o: 0008 C orte_rmaps_rank_file_text
make[3]: [install-exec-hook] Error 1 (ignored)
make[3]: Leaving directory
`/home/mishima/mis/openmpi/openmpi-pgi16.5/openmpi-2.0.0rc4'
make[2]: Nothing to be done for `install-data-am'.
make[2]: Leaving directory
`/home/mishima/mis/openmpi/openmpi-pgi16.5/openmpi-2.0.0rc4'
make[1]: Leaving directory
`/home/mishima/mis/openmpi/openmpi-pgi16.5/openmpi-2.0.0rc4'

Regards,
Tetsuya Mishima

On 2016/07/07 2:40:25, "devel" wrote in "[OMPI devel] v2.0.0rc4 is released":
> While crossing our fingers and doing mystical rain dances, we're hoping
that 2.0.0rc4 is the last rc before v2.0.0 (final) is released.  Please
test!
>
> https://www.open-mpi.org/software/ompi/v2.x/
>
> Changes since rc3 (the list may look long, but most are quite small
corner cases):
>
> - Lots of threading fixes
> - More fixes for the new memory patcher system
> - Updates to NEWS and README
> - Fixed some hcoll bugs
> - Updates for external PMIx support
> - PMIx direct launching fixes
> - libudev fixes
> - compatibility fixes with ibv_exp_*
> - 32 bit compatibility fixes
> - fix some powerpc issues
> - various OMPIO / libnbc fixes from Lisandro Dalcin
> - fix some Solaris configury patching
> - fix PSM/PSM2 active state detection
> - disable PSM/PSM2 signal hijacking by default
> - datatype fixes
> - portals4 fixes
> - change ofi MTL to only use a limited set of OFI providers by default
> - fix OSHMEM init error check
>
> --
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/
>
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: https://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post:
http://www.open-mpi.org/community/lists/devel/2016/07/19153.php

Re: [OMPI devel] binding output error

2015-04-20 Thread tmishima
Hi Devendar,

As far as I know, the report-bindings option shows the logical
cpu order. On the other hand, you are talking about the physical one,
I guess.
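
A quick way to see both numberings side by side is the standard hwloc tool
(a sketch; exact output depends on the machine):

  lstopo-no-graphics -l   # logical indexes, which -report-bindings follows
  lstopo-no-graphics -p   # physical/OS indexes, which /proc/cpuinfo shows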

Regards,
Tetsuya Mishima

On 2015/04/21 9:04:37, "devel" wrote in "Re: [OMPI devel] binding output error":
> HT is not enabled. All nodes are the same topo. This is reproducible even on a
> single node.
>
>
>
> I ran osu_latency to see whether it really is mapped to the other socket or not
> with -map-by socket. It looks like the mapping is correct as per the latency
> test.
>
>
>
> $mpirun -np 2 -report-bindings -map-by
socket  
/hpc/local/benchmarks/hpc-stack-icc/install/ompi-mellanox-v1.8/tests/osu-micro-benchmarks-4.4.1/osu_latency

>
> [clx-orion-001:10084] MCW rank 0 bound to socket 0[core 0[hwt 0]]:
[B/././././././././././././.][./././././././././././././.]
>
> [clx-orion-001:10084] MCW rank 1 bound to socket 1[core 14[hwt 0]]:
[./././././././././././././.][B/././././././././././././.]
>
> # OSU MPI Latency Test v4.4.1
>
> # Size  Latency (us)
>
> 0   0.50
>
> 1   0.50
>
> 2   0.50
>
> 4   0.49
>
>
>
>
>
> $mpirun -np 2 -report-bindings -cpu-set
1,7 
/hpc/local/benchmarks/hpc-stack-icc/install/ompi-mellanox-v1.8/tests/osu-micro-benchmarks-4.4.1/osu_latency

>
> [clx-orion-001:10155] MCW rank 0 bound to socket 0[core 1[hwt 0]]:
[./B/./././././././././././.][./././././././././././././.]
>
> [clx-orion-001:10155] MCW rank 1 bound to socket 0[core 7[hwt 0]]:
[./././././././B/./././././.][./././././././././././././.]
>
> # OSU MPI Latency Test v4.4.1
>
> # Size  Latency (us)
>
> 0   0.23
>
> 1   0.24
>
> 2   0.23
>
> 4   0.22
>
> 8   0.23
>
>
>
> Both hwloc and /proc/cpuinfo indicate the following cpu numbering
>
> socket 0 cpus: 0 1 2 3 4 5 6 14 15 16 17 18 19 20
>
> socket 1 cpus: 7 8 9 10 11 12 13 21 22 23 24 25 26 27
>
>
>
> $hwloc-info -f
>
> Machine (256GB)
>
>   NUMANode L#0 (P#0 128GB) + Socket L#0 + L3 L#0 (35MB)
>
>     L2 L#0 (256KB) + L1 L#0 (32KB) + Core L#0 + PU L#0 (P#0)
>
>     L2 L#1 (256KB) + L1 L#1 (32KB) + Core L#1 + PU L#1 (P#1)
>
>     L2 L#2 (256KB) + L1 L#2 (32KB) + Core L#2 + PU L#2 (P#2)
>
>     L2 L#3 (256KB) + L1 L#3 (32KB) + Core L#3 + PU L#3 (P#3)
>
>     L2 L#4 (256KB) + L1 L#4 (32KB) + Core L#4 + PU L#4 (P#4)
>
>     L2 L#5 (256KB) + L1 L#5 (32KB) + Core L#5 + PU L#5 (P#5)
>
>     L2 L#6 (256KB) + L1 L#6 (32KB) + Core L#6 + PU L#6 (P#6)
>
>     L2 L#7 (256KB) + L1 L#7 (32KB) + Core L#7 + PU L#7 (P#14)
>
>     L2 L#8 (256KB) + L1 L#8 (32KB) + Core L#8 + PU L#8 (P#15)
>
>     L2 L#9 (256KB) + L1 L#9 (32KB) + Core L#9 + PU L#9 (P#16)
>
>     L2 L#10 (256KB) + L1 L#10 (32KB) + Core L#10 + PU L#10 (P#17)
>
>     L2 L#11 (256KB) + L1 L#11 (32KB) + Core L#11 + PU L#11 (P#18)
>
>     L2 L#12 (256KB) + L1 L#12 (32KB) + Core L#12 + PU L#12 (P#19)
>
>     L2 L#13 (256KB) + L1 L#13 (32KB) + Core L#13 + PU L#13 (P#20)
>
>   NUMANode L#1 (P#1 128GB) + Socket L#1 + L3 L#1 (35MB)
>
>     L2 L#14 (256KB) + L1 L#14 (32KB) + Core L#14 + PU L#14 (P#7)
>
>     L2 L#15 (256KB) + L1 L#15 (32KB) + Core L#15 + PU L#15 (P#8)
>
>     L2 L#16 (256KB) + L1 L#16 (32KB) + Core L#16 + PU L#16 (P#9)
>
>     L2 L#17 (256KB) + L1 L#17 (32KB) + Core L#17 + PU L#17 (P#10)
>
>     L2 L#18 (256KB) + L1 L#18 (32KB) + Core L#18 + PU L#18 (P#11)
>
>     L2 L#19 (256KB) + L1 L#19 (32KB) + Core L#19 + PU L#19 (P#12)
>
>     L2 L#20 (256KB) + L1 L#20 (32KB) + Core L#20 + PU L#20 (P#13)
>
>     L2 L#21 (256KB) + L1 L#21 (32KB) + Core L#21 + PU L#21 (P#21)
>
>     L2 L#22 (256KB) + L1 L#22 (32KB) + Core L#22 + PU L#22 (P#22)
>
>     L2 L#23 (256KB) + L1 L#23 (32KB) + Core L#23 + PU L#23 (P#23)
>
>     L2 L#24 (256KB) + L1 L#24 (32KB) + Core L#24 + PU L#24 (P#24)
>
>     L2 L#25 (256KB) + L1 L#25 (32KB) + Core L#25 + PU L#25 (P#25)
>
>     L2 L#26 (256KB) + L1 L#26 (32KB) + Core L#26 + PU L#26 (P#26)
>
>     L2 L#27 (256KB) + L1 L#27 (32KB) + Core L#27 + PU L#27 (P#27)
>
>
>
>
>
> So, does -report-bindings show one more level of logical CPU numbering?
>
>
>
>
>
> -Devendar
>
>
>
>
>
> From:devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Ralph Castain
> Sent: Monday, April 20, 2015 3:52 PM
> To: Open MPI Developers
> Subject: Re: [OMPI devel] binding output error
>
>
>
> Also, was this with HT's enabled? I'm wondering if the print code is
incorrectly computing the core because it isn't correctly accounting for HT
cpus.
>
>
>
>
>
> On Mon, Apr 20, 2015 at 3:49 PM, Jeff Squyres (jsquyres)
 wrote:
>
> Ralph's the authority on this one, but just to be sure: are all nodes the
same topology? E.g., does adding "--hetero-nodes" to the mpirun command
line fix the problem?
>
>
>
> > On Apr 20, 2015, at 9:29 AM, Elena Elkina 
wrote:
> >
> > Hi guys,
> >
> > I faced with an issue on our cluster related to mapping & binding
policies on 1.8.5.
> >
> > The matter is that 

Re: [OMPI devel] oshmem-openmpi-1.8.2 causes compile error with -i8(64bit fortran integer) configuration

2014-09-16 Thread tmishima
Gilles,

Your patch looks good to me and I think this issue should be fixed
in the upcoming openmpi-1.8.3. Could you commit it to the trunk and
create a CMR for it?

Tetsuya

> Mishima-san,
>
> the root cause is macro expansion does not always occur as one would
> have expected ...
>
> could you please give a try to the attached patch ?
>
> it compiles (at least with gcc) and i made zero tests so far 
>
> Cheers,
>
> Gilles
>
> On 2014/09/01 10:44, tmish...@jcity.maeda.co.jp wrote:
> > Hi folks,
> >
> > I tried to build openmpi-1.8.2 with PGI fortran and -i8(64bit fortran
int)
> > option
> > as shown below:
> >
> > ./configure \
> > --prefix=/home/mishima/opt/mpi/openmpi-1.8.2-pgi14.7_int64 \
> > --enable-abi-breaking-fortran-status-i8-fix \
> > --with-tm \
> > --with-verbs \
> > --disable-ipv6 \
> > CC=pgcc CFLAGS="-tp k8-64e -fast" \
> > CXX=pgCC CXXFLAGS="-tp k8-64e -fast" \
> > F77=pgfortran FFLAGS="-i8 -tp k8-64e -fast" \
> > FC=pgfortran FCFLAGS="-i8 -tp k8-64e -fast"
> >
> > Then I saw this compile error in making oshmem at the last stage:
> >
> > if test ! -r pshmem_real8_swap_f.c ; then \
> > pname=`echo pshmem_real8_swap_f.c | cut -b '2-'` ; \
> > ln -s ../../../../oshmem/shmem/fortran/$pname
> > pshmem_real8_swap_f.c ; \
> > fi
> >   CC   pshmem_real8_swap_f.lo
> > if test ! -r pshmem_int4_cswap_f.c ; then \
> > pname=`echo pshmem_int4_cswap_f.c | cut -b '2-'` ; \
> > ln -s ../../../../oshmem/shmem/fortran/$pname
> > pshmem_int4_cswap_f.c ; \
> > fi
> >   CC   pshmem_int4_cswap_f.lo
> > PGC-S-0058-Illegal lvalue (pshmem_int4_cswap_f.c: 39)
> > PGC/x86-64 Linux 14.7-0: compilation completed with severe errors
> > make[3]: *** [pshmem_int4_cswap_f.lo] Error 1
> > make[3]: Leaving directory
> >
`/home/mishima/mis/openmpi/openmpi-pgi14.7/int64/openmpi-1.8.2/oshmem/shmem/fortran/profile'

> > make[2]: *** [all-recursive] Error 1
> > make[2]: Leaving directory
> >
`/home/mishima/mis/openmpi/openmpi-pgi14.7/int64/openmpi-1.8.2/oshmem/shmem/fortran'

> > make[1]: *** [all-recursive] Error 1
> > make[1]: Leaving directory
> > `/home/mishima/mis/openmpi/openmpi-pgi14.7/int64/openmpi-1.8.2/oshmem'
> > make: *** [all-recursive] Error 1
> >
> > I confirmed that it worked if I added configure option of
--disable-oshmem.
> > So, I hope that oshmem experts would fix this problem.
> >
> > (additional note)
> > I switched to use gnu compiler and checked with this configuration,
then
> > I got the same error:
> >
> > ./configure \
> > --prefix=/home/mishima/opt/mpi/openmpi-1.8.2-gnu_int64 \
> > --enable-abi-breaking-fortran-status-i8-fix \
> > --disable-ipv6 \
> > F77=gfortran \
> > FC=gfortran \
> > CC=gcc \
> > CXX=g++ \
> > FFLAGS="-m64 -fdefault-integer-8" \
> > FCFLAGS="-m64 -fdefault-integer-8" \
> > CFLAGS=-m64 \
> > CXXFLAGS=-m64
> >
> > make
> > 
> > if test ! -r pshmem_int4_cswap_f.c ; then \
> > pname=`echo pshmem_int4_cswap_f.c | cut -b '2-'` ; \
> > ln -s ../../../../oshmem/shmem/fortran/$pname
> > pshmem_int4_cswap_f.c ; \
> > fi
> >   CC   pshmem_int4_cswap_f.lo
> > pshmem_int4_cswap_f.c: In function 'shmem_int4_cswap_f':
> > pshmem_int4_cswap_f.c:39: error: invalid lvalue in unary '&'
> > make[3]: *** [pshmem_int4_cswap_f.lo] Error 1
> >
> > Regards
> > Tetsuya Mishima
> >
> > ___
> > devel mailing list
> > de...@open-mpi.org
> > Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> > Link to this post:
http://www.open-mpi.org/community/lists/devel/2014/08/15764.php
>
>  - oshmem.i8.patch___
> devel mailing list
> de...@open-mpi.org
> Subscription:
http://www.open-mpi.org/mailman/listinfo.cgi/develSearchable archives:
http://www.open-mpi.org/community/lists/devel/2014/09/index.php



Re: [OMPI devel] oshmem-openmpi-1.8.2 causes compile error with -i8(64bit fortran integer) configuration

2014-09-01 Thread tmishima
Gilles,

Thank you for your fix. I successfully compiled it with PGI, although
I could not check it by executing an actual test run.
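
If someone wants to do that runtime check, a minimal sketch (the source file
name is illustrative; any small OpenSHMEM program should do):

  oshcc hello_shmem.c -o hello_shmem
  oshrun -np 2 ./hello_shmem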

Tetsuya

> Mishima-san,
>
> the root cause is macro expansion does not always occur as one would
> have expected ...
>
> could you please give a try to the attached patch ?
>
> it compiles (at least with gcc) and i made zero tests so far 
>
> Cheers,
>
> Gilles
>
> On 2014/09/01 10:44, tmish...@jcity.maeda.co.jp wrote:
> > Hi folks,
> >
> > I tried to build openmpi-1.8.2 with PGI fortran and -i8(64bit fortran
int)
> > option
> > as shown below:
> >
> > ./configure \
> > --prefix=/home/mishima/opt/mpi/openmpi-1.8.2-pgi14.7_int64 \
> > --enable-abi-breaking-fortran-status-i8-fix \
> > --with-tm \
> > --with-verbs \
> > --disable-ipv6 \
> > CC=pgcc CFLAGS="-tp k8-64e -fast" \
> > CXX=pgCC CXXFLAGS="-tp k8-64e -fast" \
> > F77=pgfortran FFLAGS="-i8 -tp k8-64e -fast" \
> > FC=pgfortran FCFLAGS="-i8 -tp k8-64e -fast"
> >
> > Then I saw this compile error in making oshmem at the last stage:
> >
> > if test ! -r pshmem_real8_swap_f.c ; then \
> > pname=`echo pshmem_real8_swap_f.c | cut -b '2-'` ; \
> > ln -s ../../../../oshmem/shmem/fortran/$pname
> > pshmem_real8_swap_f.c ; \
> > fi
> >   CC   pshmem_real8_swap_f.lo
> > if test ! -r pshmem_int4_cswap_f.c ; then \
> > pname=`echo pshmem_int4_cswap_f.c | cut -b '2-'` ; \
> > ln -s ../../../../oshmem/shmem/fortran/$pname
> > pshmem_int4_cswap_f.c ; \
> > fi
> >   CC   pshmem_int4_cswap_f.lo
> > PGC-S-0058-Illegal lvalue (pshmem_int4_cswap_f.c: 39)
> > PGC/x86-64 Linux 14.7-0: compilation completed with severe errors
> > make[3]: *** [pshmem_int4_cswap_f.lo] Error 1
> > make[3]: Leaving directory
> >
`/home/mishima/mis/openmpi/openmpi-pgi14.7/int64/openmpi-1.8.2/oshmem/shmem/fortran/profile'

> > make[2]: *** [all-recursive] Error 1
> > make[2]: Leaving directory
> >
`/home/mishima/mis/openmpi/openmpi-pgi14.7/int64/openmpi-1.8.2/oshmem/shmem/fortran'

> > make[1]: *** [all-recursive] Error 1
> > make[1]: Leaving directory
> > `/home/mishima/mis/openmpi/openmpi-pgi14.7/int64/openmpi-1.8.2/oshmem'
> > make: *** [all-recursive] Error 1
> >
> > I confirmed that it worked if I added configure option of
--disable-oshmem.
> > So, I hope that oshmem experts would fix this problem.
> >
> > (additional note)
> > I switched to use gnu compiler and checked with this configuration,
then
> > I got the same error:
> >
> > ./configure \
> > --prefix=/home/mishima/opt/mpi/openmpi-1.8.2-gnu_int64 \
> > --enable-abi-breaking-fortran-status-i8-fix \
> > --disable-ipv6 \
> > F77=gfortran \
> > FC=gfortran \
> > CC=gcc \
> > CXX=g++ \
> > FFLAGS="-m64 -fdefault-integer-8" \
> > FCFLAGS="-m64 -fdefault-integer-8" \
> > CFLAGS=-m64 \
> > CXXFLAGS=-m64
> >
> > make
> > 
> > if test ! -r pshmem_int4_cswap_f.c ; then \
> > pname=`echo pshmem_int4_cswap_f.c | cut -b '2-'` ; \
> > ln -s ../../../../oshmem/shmem/fortran/$pname
> > pshmem_int4_cswap_f.c ; \
> > fi
> >   CC   pshmem_int4_cswap_f.lo
> > pshmem_int4_cswap_f.c: In function 'shmem_int4_cswap_f':
> > pshmem_int4_cswap_f.c:39: error: invalid lvalue in unary '&'
> > make[3]: *** [pshmem_int4_cswap_f.lo] Error 1
> >
> > Regards
> > Tetsuya Mishima
> >
> > ___
> > devel mailing list
> > de...@open-mpi.org
> > Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> > Link to this post:
http://www.open-mpi.org/community/lists/devel/2014/08/15764.php
>
>  - oshmem.i8.patch___
> devel mailing list
> de...@open-mpi.org
> Subscription:
http://www.open-mpi.org/mailman/listinfo.cgi/develSearchable archives:
http://www.open-mpi.org/community/lists/devel/2014/09/index.php



[OMPI devel] oshmem-openmpi-1.8.2 causes compile error with -i8(64bit fortran integer) configuration

2014-08-31 Thread tmishima

Hi folks,

I tried to build openmpi-1.8.2 with PGI fortran and -i8(64bit fortran int)
option
as shown below:

./configure \
--prefix=/home/mishima/opt/mpi/openmpi-1.8.2-pgi14.7_int64 \
--enable-abi-breaking-fortran-status-i8-fix \
--with-tm \
--with-verbs \
--disable-ipv6 \
CC=pgcc CFLAGS="-tp k8-64e -fast" \
CXX=pgCC CXXFLAGS="-tp k8-64e -fast" \
F77=pgfortran FFLAGS="-i8 -tp k8-64e -fast" \
FC=pgfortran FCFLAGS="-i8 -tp k8-64e -fast"

Then I saw this compile error in making oshmem at the last stage:

if test ! -r pshmem_real8_swap_f.c ; then \
pname=`echo pshmem_real8_swap_f.c | cut -b '2-'` ; \
ln -s ../../../../oshmem/shmem/fortran/$pname
pshmem_real8_swap_f.c ; \
fi
  CC   pshmem_real8_swap_f.lo
if test ! -r pshmem_int4_cswap_f.c ; then \
pname=`echo pshmem_int4_cswap_f.c | cut -b '2-'` ; \
ln -s ../../../../oshmem/shmem/fortran/$pname
pshmem_int4_cswap_f.c ; \
fi
  CC   pshmem_int4_cswap_f.lo
PGC-S-0058-Illegal lvalue (pshmem_int4_cswap_f.c: 39)
PGC/x86-64 Linux 14.7-0: compilation completed with severe errors
make[3]: *** [pshmem_int4_cswap_f.lo] Error 1
make[3]: Leaving directory
`/home/mishima/mis/openmpi/openmpi-pgi14.7/int64/openmpi-1.8.2/oshmem/shmem/fortran/profile'
make[2]: *** [all-recursive] Error 1
make[2]: Leaving directory
`/home/mishima/mis/openmpi/openmpi-pgi14.7/int64/openmpi-1.8.2/oshmem/shmem/fortran'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory
`/home/mishima/mis/openmpi/openmpi-pgi14.7/int64/openmpi-1.8.2/oshmem'
make: *** [all-recursive] Error 1

I confirmed that it worked if I added configure option of --disable-oshmem.
So, I hope that oshmem experts would fix this problem.

(additional note)
I switched to use gnu compiler and checked with this configuration, then
I got the same error:

./configure \
--prefix=/home/mishima/opt/mpi/openmpi-1.8.2-gnu_int64 \
--enable-abi-breaking-fortran-status-i8-fix \
--disable-ipv6 \
F77=gfortran \
FC=gfortran \
CC=gcc \
CXX=g++ \
FFLAGS="-m64 -fdefault-integer-8" \
FCFLAGS="-m64 -fdefault-integer-8" \
CFLAGS=-m64 \
CXXFLAGS=-m64

make

if test ! -r pshmem_int4_cswap_f.c ; then \
pname=`echo pshmem_int4_cswap_f.c | cut -b '2-'` ; \
ln -s ../../../../oshmem/shmem/fortran/$pname
pshmem_int4_cswap_f.c ; \
fi
  CC   pshmem_int4_cswap_f.lo
pshmem_int4_cswap_f.c: In function 'shmem_int4_cswap_f':
pshmem_int4_cswap_f.c:39: error: invalid lvalue in unary '&'
make[3]: *** [pshmem_int4_cswap_f.lo] Error 1

Regards
Tetsuya Mishima



Re: [OMPI devel] openmpi-1.8.2rc2 and f08 interface built with PGI-14.7 causes link error

2014-08-02 Thread tmishima


Hi Ralph,

I confirmed that the openib issue was really fixed by r32395
and hope you'll be able to release the final version soon.

Tetsuya

> Kewl - the openib issue has been fixed in the nightly tarball. I'm
waiting for review of a couple of pending CMRs, then we'll release a quick
rc4 and move to release the final version
>
>
> On Aug 1, 2014, at 9:55 PM, tmish...@jcity.maeda.co.jp wrote:
>
> >
> >
> > I confirmed openmpi-1.8.2rc3 with PGI-14.7 worked fine for me
> > except for the openib issue reported by Mike Dubman.
> >
> > Tetsuya Mishima
> >
> >> Sorry, finally got through all this ompi email and see this problem
was
> > fixed.
> >>
> >> -Original Message-
> >> From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Pritchard
> > Jr., Howard
> >> Sent: Friday, August 01, 2014 8:59 AM
> >> To: Open MPI Developers
> >> Subject: Re: [OMPI devel] openmpi-1.8.2rc2 and f08 interface built
with
> > PGI-14.7 causes link error
> >>
> >> Hi Jeff,
> >>
> >> Finally got info yesterday about where the newer PGI compilers are
hiding
> > out at LANL.
> >> I'll check this out today.
> >>
> >> Howard
> >>
> >>
> >> -Original Message-
> >> From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Jeff
Squyres
> > (jsquyres)
> >> Sent: Tuesday, July 29, 2014 5:24 PM
> >> To: Open MPI Developers List
> >> Subject: Re: [OMPI devel] openmpi-1.8.2rc2 and f08 interface built
with
> > PGI-14.7 causes link error
> >>
> >> Tetsuya --
> >>
> >> I am unable to test with the PGI compiler -- I don't have a license.
I
> > was hoping that LANL would be able to test today, but I don't think
they
> > got to it.
> >>
> >> Can you send more details?
> >>
> >> E.g., can you send the all the stuff listed on
> > http://www.open-mpi.org/community/help/ for 1.8 and 1.8.2rc2 for the
14.7
> > compiler?
> >>
> >> I'm *guessing* that we've done something new in the changes since 1.8
> > that PGI doesn't support, and we need to disable that something
(hopefully
> > while not needing to disable the entire mpi_f08
> >> bindings...).
> >>
> >>
> >>
> >> On Jul 28, 2014, at 11:43 PM, tmish...@jcity.maeda.co.jp wrote:
> >>
> >>>
> >>> Hi folks,
> >>>
> >>> I tried to build openmpi-1.8.2rc2 with PGI-14.7 and execute a sample
> >>> program. Then, it causes linking error:
> >>>
> >>> [mishima@manage work]$ cat test.f
> >>> program hello_world
> >>> use mpi_f08
> >>> implicit none
> >>>
> >>> type(MPI_Comm) :: comm
> >>> integer :: myid, npes, ierror
> >>> integer :: name_length
> >>> character(len=MPI_MAX_PROCESSOR_NAME) :: processor_name
> >>>
> >>> call mpi_init(ierror)
> >>> comm = MPI_COMM_WORLD
> >>> call MPI_Comm_rank(comm, myid, ierror)
> >>> call MPI_Comm_size(comm, npes, ierror)
> >>> call MPI_Get_processor_name(processor_name, name_length, ierror)
> >>> write (*,'(A,X,I4,X,A,X,I4,X,A,X,A)')
> >>>+"Process", myid, "of", npes, "is on", trim(processor_name)
> >>> call MPI_Finalize(ierror)
> >>>
> >>> end program hello_world
> >>>
> >>> [mishima@manage work]$ mpif90 test.f -o test.ex
> >>> /tmp/pgfortran65ZcUeoncoqT.o: In function `.C1_283':
> >>> test.f:(.data+0x6c): undefined reference to
> > `mpi_f08_interfaces_callbacks_'
> >>> test.f:(.data+0x74): undefined reference to `mpi_f08_interfaces_'
> >>> test.f:(.data+0x7c): undefined reference to `pmpi_f08_interfaces_'
> >>> test.f:(.data+0x84): undefined reference to `mpi_f08_sizeof_'
> >>>
> >>> So, I did some more tests with previous version of PGI and
> >>> openmpi-1.8. The results are summarized as follows:
> >>>
> >>>                    PGI13.10                        PGI14.7
> >>> openmpi-1.8        OK                              OK
> >>> openmpi-1.8.2rc2   configure sets use_f08_mpi:no   link error
> >>>
> >>> Regards,
> >>> Tetsuya Mishima
> >>>
> >>> ___
> >>> devel mailing list
> >>> de...@open-mpi.org
> >>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> >>> Link to this post:
> >>> http://www.open-mpi.org/community/lists/devel/2014/07/15303.php
> >>
> >>
> >> --
> >> Jeff Squyres
> >> jsquy...@cisco.com
> >> For corporate legal information go to:
> > http://www.cisco.com/web/about/doing_business/legal/cri/
> >>
> >> ___
> >> devel mailing list
> >> de...@open-mpi.org
> >> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> >> Link to this post:
> > http://www.open-mpi.org/community/lists/devel/2014/07/15335.php
> >> ___
> >> devel mailing list
> >> de...@open-mpi.org
> >> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> >> Link to this post:
> > http://www.open-mpi.org/community/lists/devel/2014/08/15452.php
> >> ___
> >> devel mailing list
> >> de...@open-mpi.org
> >> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> >> Link to this post:
> > 

Re: [OMPI devel] openmpi-1.8.2rc2 and f08 interface built with PGI-14.7 causes link error

2014-08-02 Thread tmishima


I confirmed openmpi-1.8.2rc3 with PGI-14.7 worked fine for me
except for the openib issue reported by Mike Dubman.

Tetsuya Mishima

> Sorry, finally got through all this ompi email and see this problem was
fixed.
>
> -Original Message-
> From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Pritchard
Jr., Howard
> Sent: Friday, August 01, 2014 8:59 AM
> To: Open MPI Developers
> Subject: Re: [OMPI devel] openmpi-1.8.2rc2 and f08 interface built with
PGI-14.7 causes link error
>
> Hi Jeff,
>
> Finally got info yesterday about where the newer PGI compilers are hiding
out at LANL.
> I'll check this out today.
>
> Howard
>
>
> -Original Message-
> From: devel [mailto:devel-boun...@open-mpi.org] On Behalf Of Jeff Squyres
(jsquyres)
> Sent: Tuesday, July 29, 2014 5:24 PM
> To: Open MPI Developers List
> Subject: Re: [OMPI devel] openmpi-1.8.2rc2 and f08 interface built with
PGI-14.7 causes link error
>
> Tetsuya --
>
> I am unable to test with the PGI compiler -- I don't have a license.  I
was hoping that LANL would be able to test today, but I don't think they
got to it.
>
> Can you send more details?
>
> E.g., can you send the all the stuff listed on
http://www.open-mpi.org/community/help/ for 1.8 and 1.8.2rc2 for the 14.7
compiler?
>
> I'm *guessing* that we've done something new in the changes since 1.8
that PGI doesn't support, and we need to disable that something (hopefully
while not needing to disable the entire mpi_f08
> bindings...).
>
>
>
> On Jul 28, 2014, at 11:43 PM, tmish...@jcity.maeda.co.jp wrote:
>
> >
> > Hi folks,
> >
> > I tried to build openmpi-1.8.2rc2 with PGI-14.7 and execute a sample
> > program. Then, it causes linking error:
> >
> > [mishima@manage work]$ cat test.f
> >  program hello_world
> >  use mpi_f08
> >  implicit none
> >
> >  type(MPI_Comm) :: comm
> >  integer :: myid, npes, ierror
> >  integer :: name_length
> >  character(len=MPI_MAX_PROCESSOR_NAME) :: processor_name
> >
> >  call mpi_init(ierror)
> >  comm = MPI_COMM_WORLD
> >  call MPI_Comm_rank(comm, myid, ierror)
> >  call MPI_Comm_size(comm, npes, ierror)
> >  call MPI_Get_processor_name(processor_name, name_length, ierror)
> >  write (*,'(A,X,I4,X,A,X,I4,X,A,X,A)')
> > +"Process", myid, "of", npes, "is on", trim(processor_name)
> >  call MPI_Finalize(ierror)
> >
> >  end program hello_world
> >
> > [mishima@manage work]$ mpif90 test.f -o test.ex
> > /tmp/pgfortran65ZcUeoncoqT.o: In function `.C1_283':
> > test.f:(.data+0x6c): undefined reference to
`mpi_f08_interfaces_callbacks_'
> > test.f:(.data+0x74): undefined reference to `mpi_f08_interfaces_'
> > test.f:(.data+0x7c): undefined reference to `pmpi_f08_interfaces_'
> > test.f:(.data+0x84): undefined reference to `mpi_f08_sizeof_'
> >
> > So, I did some more tests with previous version of PGI and
> > openmpi-1.8. The results are summarized as follows:
> >
> >  PGI13.10   PGI14.7
> > openmpi-1.8   OK OK
> > openmpi-1.8.2rc2  configure sets use_f08_mpi:no  link error
> >
> > Regards,
> > Tetsuya Mishima
> >
> > ___
> > devel mailing list
> > de...@open-mpi.org
> > Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> > Link to this post:
> > http://www.open-mpi.org/community/lists/devel/2014/07/15303.php
>
>
> --
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/
>
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post:
http://www.open-mpi.org/community/lists/devel/2014/07/15335.php
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post:
http://www.open-mpi.org/community/lists/devel/2014/08/15452.php
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post:
http://www.open-mpi.org/community/lists/devel/2014/08/15455.php



Re: [OMPI devel] openmpi-1.8.2rc2 and f08 interface built with PGI-14.7 causes link error

2014-07-31 Thread tmishima


Hi Paul,

Thank you for your investigation. I'm sure it's very
close to fixing the problem, although I can't do that
myself. So I must owe you something...

Please try Awamori, which is Okinawa's sake and very
good on such a hot day.

Tetsuya

> On Wed, Jul 30, 2014 at 8:53 PM, Paul Hargrove wrote:
> [...]
> I have a clear answer to *what* is different (below) and am next looking
into the why/how now.
> It seems that 1.8.1 has included all dependencies into libmpi_usempif08
while 1.8.2rc2 does not.
>  [...]
>
> The difference appears to stem from the following difference in
ompi/mpi/fortran/use-mpi-f08/Makefile.am:
>
> 1.8.1:
> libmpi_usempif08_la_LIBADD = \
>         $(module_sentinel_file) \
>         $(OMPI_MPIEXT_USEMPIF08_LIBS) \
>         $(top_builddir)/ompi/libmpi.la
>
> 1.8.2rc2:
> libmpi_usempif08_la_LIBADD = \
>         $(OMPI_MPIEXT_USEMPIF08_LIBS) \
>         $(top_builddir)/ompi/libmpi.la
> libmpi_usempif08_la_DEPENDENCIES = $(module_sentinel_file)
>
> Where in both cases one has:
>
> module_sentinel_file = \
>         libforce_usempif08_internal_modules_to_be_built.la
>
> which contains all of the symbols which my previous testing found had
"disappeared" from libmpi_usempif08.so between 1.8.1 and 1.8.2rc2.
>
> I don't have recent enough autotools to attempt the change the
Makefile.am, but instead restored the removed item from
libmpi_usempif08_la_LIBADD directly in Makefile.in.  However, rather than
fixing
> the problem, that resulted in multiple definitions of a bunch of _eq and
_ne functions (e.g. mpi_f08_types_ompi_request_op_ne_).  So, I am uncertain
how to proceed.
>
> Use svn blame points at a "bulk" CMR of many fortran related changes,
including one related to the eq/ne operators.  So, I am turning over this
investigation to Jeff and/or Ralph to figure out what
> actually is required to fix this without loss of whatever benefits were
in that CMR.  I am still available to test the proposed fixes.  Happy
hunting...
>
> Somebody owes me a virtual beer (or nihonshu) ;-)
> -Paul
>
>
> --
>
> Paul H. Hargrove                          phhargr...@lbl.gov
> Future Technologies Group
> Computer and Data Sciences Department     Tel: +1-510-495-2352
> Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900
___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/develLink to
this post: http://www.open-mpi.org/community/lists/devel/2014/07/15387.php



Re: [OMPI devel] openmpi-1.8.2rc2 and f08 interface built with PGI-14.7 causes link error

2014-07-30 Thread tmishima


Paul and Jeff,

I additionally installed PGI14.4 and checked the behavior.
Then I confirmed that both versions produce the same results.

PGI14.7:
[mishima@manage work]$ mpif90 test.f -o test.ex --showme
pgfortran test.f -o test.ex
-I/home/mishima/opt/mpi/openmpi-1.8.2rc2-pgi14.7/include
-I/home/mishima/opt/mpi/openmpi-1.8
.2rc2-pgi14.7/lib -Wl,-rpath
-Wl,/home/mishima/opt/mpi/openmpi-1.8.2rc2-pgi14.7/lib
-L/home/mishima/opt/mpi/openmpi-1.8.
2rc2-pgi14.7/lib -lmpi_usempif08 -lmpi_usempi_ignore_tkr -lmpi_mpifh -lmpi
[mishima@manage work]$ mpif90 test.f -o test.ex
/tmp/pgfortranD-vdxk_lnPL3.o: In function `.C1_283':
test.f:(.data+0x6c): undefined reference to `mpi_f08_interfaces_callbacks_'
test.f:(.data+0x74): undefined reference to `mpi_f08_interfaces_'
test.f:(.data+0x7c): undefined reference to `pmpi_f08_interfaces_'
test.f:(.data+0x84): undefined reference to `mpi_f08_sizeof_'

PGI14.4:
[mishima@manage work]$ mpif90 test.f -o test.ex --showme
pgfortran test.f -o test.ex
-I/home/mishima/opt/mpi/openmpi-1.8.2rc2-pgi14.4/include
-I/home/mishima/opt/mpi/openmpi-1.8
.2rc2-pgi14.4/lib -Wl,-rpath
-Wl,/home/mishima/opt/mpi/openmpi-1.8.2rc2-pgi14.4/lib
-L/home/mishima/opt/mpi/openmpi-1.8.
2rc2-pgi14.4/lib -lmpi_usempif08 -lmpi_usempi_ignore_tkr -lmpi_mpifh -lmpi
[mishima@manage work]$ mpif90 test.f -o test.ex
/tmp/pgfortranm9sdKiZYkrMy.o: In function `.C1_283':
test.f:(.data+0x6c): undefined reference to `mpi_f08_interfaces_callbacks_'
test.f:(.data+0x74): undefined reference to `mpi_f08_interfaces_'
test.f:(.data+0x7c): undefined reference to `pmpi_f08_interfaces_'
test.f:(.data+0x84): undefined reference to `mpi_f08_sizeof_'

As I reported before, the mpi_f08*.mod files are created in $prefix/lib.

[mishima@manage openmpi-1.8.2rc2-pgi14.7]$ ll lib/mpi_f08*
-rwxr-xr-x 1 mishima mishima327 Jul 30 12:27 lib/mpi_f08_ext.mod
-rwxr-xr-x 1 mishima mishima  11716 Jul 30 12:27
lib/mpi_f08_interfaces_callbacks.mod
-rwxr-xr-x 1 mishima mishima 374813 Jul 30 12:27 lib/mpi_f08_interfaces.mod
-rwxr-xr-x 1 mishima mishima 715615 Jul 30 12:27 lib/mpi_f08.mod
-rwxr-xr-x 1 mishima mishima  14730 Jul 30 12:27 lib/mpi_f08_sizeof.mod
-rwxr-xr-x 1 mishima mishima  77141 Jul 30 12:27 lib/mpi_f08_types.mod


The strange thing is that openmpi-1.8 with PGI14.7 works fine.
What's the difference between openmpi-1.8 and openmpi-1.8.2rc2?

[mishima@manage work]$ mpif90 test.f -o test.ex --showme
pgfortran test.f -o test.ex
-I/home/mishima/opt/mpi/openmpi-1.8-pgi14.7/include
-I/home/mishima/opt/mpi/openmpi-1.8-pgi1
4.7/lib -Wl,-rpath -Wl,/home/mishima/opt/mpi/openmpi-1.8-pgi14.7/lib
-L/home/mishima/opt/mpi/openmpi-1.8-pgi14.7/lib -lm
pi_usempif08 -lmpi_usempi_ignore_tkr -lmpi_mpifh -lmpi
[mishima@manage work]$ mpif90 test.f -o test.ex
[mishima@manage work]$

Tetsuya

> On Jul 28, 2014, at 11:43 PM, tmish...@jcity.maeda.co.jp wrote:
>
> > [mishima@manage work]$ mpif90 test.f -o test.ex
> > /tmp/pgfortran65ZcUeoncoqT.o: In function `.C1_283':
> > test.f:(.data+0x6c): undefined reference to
`mpi_f08_interfaces_callbacks_'
> > test.f:(.data+0x74): undefined reference to `mpi_f08_interfaces_'
> > test.f:(.data+0x7c): undefined reference to `pmpi_f08_interfaces_'
> > test.f:(.data+0x84): undefined reference to `mpi_f08_sizeof_'
>
> Just to go back to the original post here: can you send the results of
>
> mpifort test.f -o test.ex --showme
>
> I'd like to see what fortran libraries are being linked in.  Here's what
I get when I compile OMPI with the Intel suite:
>
> -
> $ mpifort hello_usempif08.f90 -o hello --showme
> ifort hello_usempif08.f90 -o hello -I/home/jsquyres/bogus/include
-I/home/jsquyres/bogus/lib -Wl,-rpath -Wl,/home/jsquyres/bogus/lib
-Wl,--enable-new-dtags -L/home/jsquyres/bogus/lib -lmpi_usempif08
> -lmpi_usempi_ignore_tkr -lmpi_mpifh -lmpi
> 
>
> I note that with the Intel compiler, the Fortran module files are created
in the lib directory (i.e., $prefix/lib), which is -L'ed on the link line.
Does the PGI compiler require something
> different?  Does the PGI 14 compiler make an additional library for
modules that we need to link in?
>
> We didn't use CONTAINS, and it supposedly works fine with the mpi module
(right, guys?), so I'm not sure would the same scheme wouldn't work for the
mpi_f08 module...?
>
> --
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/
>
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post:
http://www.open-mpi.org/community/lists/devel/2014/07/15377.php



Re: [OMPI devel] openmpi-1.8.2rc2 and f08 interface built with PGI-14.7 causes link error

2014-07-30 Thread tmishima


Hi Paul, thank you for your comment.

I don't think my mpi_f08.mod is an older one, because the time stamp
matches the time when I rebuilt everything today.

[mishima@manage openmpi-1.8.2rc2-pgi14.7]$ ll lib/mpi*
-rwxr-xr-x 1 mishima mishima315 Jul 30 12:27 lib/mpi_ext.mod
-rwxr-xr-x 1 mishima mishima327 Jul 30 12:27 lib/mpi_f08_ext.mod
-rwxr-xr-x 1 mishima mishima  11716 Jul 30 12:27
lib/mpi_f08_interfaces_callbacks.mod
-rwxr-xr-x 1 mishima mishima 374813 Jul 30 12:27 lib/mpi_f08_interfaces.mod
-rwxr-xr-x 1 mishima mishima 715615 Jul 30 12:27 lib/mpi_f08.mod
-rwxr-xr-x 1 mishima mishima  14730 Jul 30 12:27 lib/mpi_f08_sizeof.mod
-rwxr-xr-x 1 mishima mishima  77141 Jul 30 12:27 lib/mpi_f08_types.mod
-rwxr-xr-x 1 mishima mishima 878339 Jul 30 12:27 lib/mpi.mod

Regards,
Tetsuya

> On Tue, Jul 29, 2014 at 6:38 PM, Paul Hargrove wrote:
>
> On Tue, Jul 29, 2014 at 6:33 PM, Paul Hargrove wrote:
> I am trying again with an explicit --enable-mpi-fortran=usempi at
configure time to see what happens.
>
> Of course that should have said --enable-mpi-fortran=usempif08
>
> I've switched to using PG13.6 for my testing.
> I find that even when I pass that flag I see that use_mpi_f08 is NOT
enabled:
>
> checking Fortran compiler ignore TKR syntax... not cached; checking
variants
> checking for Fortran compiler support of TYPE(*), DIMENSION(*)... no
> checking for Fortran compiler support of !DEC$ ATTRIBUTES NO_ARG_CHECK...
no
> checking for Fortran compiler support of !$PRAGMA IGNORE_TKR... no
> checking for Fortran compiler support of !DIR$ IGNORE_TKR... yes
> checking Fortran compiler ignore TKR syntax... 1:real, dimension(*):!DIR$
IGNORE_TKR
> checking if Fortran compiler supports ISO_C_BINDING... yes
> checking if building Fortran 'use mpi' bindings... yes
> checking if Fortran compiler supports SUBROUTINE BIND(C)... yes
> checking if Fortran compiler supports TYPE, BIND(C)... yes
> checking if Fortran compiler supports TYPE(type), BIND(C, NAME="name")...
yes
> checking if Fortran compiler supports PROCEDURE... no
> checking if building Fortran 'use mpi_f08' bindings... no
>
> Contrast that to openmpi-1.8.1 and the same compiler:
>
> checking Fortran compiler ignore TKR syntax... not cached; checking
variants
> checking for Fortran compiler support of TYPE(*), DIMENSION(*)... no
> checking for Fortran compiler support of !DEC$ ATTRIBUTES NO_ARG_CHECK...
no
> checking for Fortran compiler support of !$PRAGMA IGNORE_TKR... no
> checking for Fortran compiler support of !DIR$ IGNORE_TKR... yes
> checking Fortran compiler ignore TKR syntax... 1:real, dimension(*):!DIR$
IGNORE_TKR
> checking if building Fortran 'use mpi' bindings... yes
> checking if Fortran compiler supports ISO_C_BINDING... yes
> checking if Fortran compiler supports SUBROUTINE BIND(C)... yes
> checking if Fortran compiler supports TYPE, BIND(C)... yes
> checking if Fortran compiler supports TYPE(type), BIND(C, NAME="name")...
yes
> checking if Fortran compiler supports optional arguments... yes
> checking if Fortran compiler supports PRIVATE... yes
> checking if Fortran compiler supports PROTECTED... yes
> checking if Fortran compiler supports ABSTRACT... yes
> checking if Fortran compiler supports ASYNCHRONOUS... yes
> checking if Fortran compiler supports PROCEDURE... no
> checking size of Fortran type(test_mpi_handle)... 4
> checking Fortran compiler F08 assumed rank syntax... not cached; checking
> checking for Fortran compiler support of TYPE(*), DIMENSION(..)... no
> checking Fortran compiler F08 assumed rank syntax... no
> checking which mpi_f08 implementation to build... "good" compiler, no
array subsections
> checking if building Fortran 'use mpi_f08' bindings... yes
>
> So, somewhere between 1.8.1 and 1.8.2rc2 something has happened in the
configure logic to disqualify the pgf90 compiler.
>
> I also surprised to see 1.8.2rc2 performing *fewer* tests of FC then
1.8.1 did (unless they moved elsewhere?).
>
> In the end I cannot reproduce the originally reported problem for the
simple reason that I instead see:
>
> {hargrove@hopper04
openmpi-1.8.2rc2-linux-x86_64-pgi-14.4}$ ./INST/bin/mpif90 ../test.f
> PGF90-F-0004-Unable to open MODULE file mpi_f08.mod (../test.f: 2)
> PGF90/x86-64 Linux 14.4-0: compilation aborted
>
>
> Tetsuya Mishima,
>
> Is it possible that your installation of 1.8.2rc2 was to the same prefix
as an older build?
> It that is the case, you may have the mpi_f08.mod from the older build
even though no f08 support is in the new build.
>
>
> -Paul
>
>
> --
>
> Paul H. Hargrove                          phhargr...@lbl.gov
> Future Technologies Group
> Computer and Data Sciences Department     Tel: +1-510-495-2352
> Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900
___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/develLink to
this post: 

Re: [OMPI devel] openmpi-1.8.2rc2 and f08 interface built with PGI-14.7 causes link error

2014-07-30 Thread tmishima


This is another one.

(See attached file: openmpi-1.8.2rc2-pgi14.7.tar.gz)

Tetsuya

> Tetsuya --
>
> I am unable to test with the PGI compiler -- I don't have a license.  I
was hoping that LANL would be able to test today, but I don't think they
got to it.
>
> Can you send more details?
>
> E.g., can you send the all the stuff listed on
http://www.open-mpi.org/community/help/ for 1.8 and 1.8.2rc2 for the 14.7
compiler?
>
> I'm *guessing* that we've done something new in the changes since 1.8
that PGI doesn't support, and we need to disable that something (hopefully
while not needing to disable the entire mpi_f08
> bindings...).
>
>
>
> On Jul 28, 2014, at 11:43 PM, tmish...@jcity.maeda.co.jp wrote:
>
> >
> > Hi folks,
> >
> > I tried to build openmpi-1.8.2rc2 with PGI-14.7 and execute a sample
> > program. Then, it causes linking error:
> >
> > [mishima@manage work]$ cat test.f
> >  program hello_world
> >  use mpi_f08
> >  implicit none
> >
> >  type(MPI_Comm) :: comm
> >  integer :: myid, npes, ierror
> >  integer :: name_length
> >  character(len=MPI_MAX_PROCESSOR_NAME) :: processor_name
> >
> >  call mpi_init(ierror)
> >  comm = MPI_COMM_WORLD
> >  call MPI_Comm_rank(comm, myid, ierror)
> >  call MPI_Comm_size(comm, npes, ierror)
> >  call MPI_Get_processor_name(processor_name, name_length, ierror)
> >  write (*,'(A,X,I4,X,A,X,I4,X,A,X,A)')
> > +"Process", myid, "of", npes, "is on", trim(processor_name)
> >  call MPI_Finalize(ierror)
> >
> >  end program hello_world
> >
> > [mishima@manage work]$ mpif90 test.f -o test.ex
> > /tmp/pgfortran65ZcUeoncoqT.o: In function `.C1_283':
> > test.f:(.data+0x6c): undefined reference to
`mpi_f08_interfaces_callbacks_'
> > test.f:(.data+0x74): undefined reference to `mpi_f08_interfaces_'
> > test.f:(.data+0x7c): undefined reference to `pmpi_f08_interfaces_'
> > test.f:(.data+0x84): undefined reference to `mpi_f08_sizeof_'
> >
> > So, I did some more tests with previous version of PGI and
> > openmpi-1.8. The results are summarized as follows:
> >
> >  PGI13.10   PGI14.7
> > openmpi-1.8   OK OK
> > openmpi-1.8.2rc2  configure sets use_f08_mpi:no  link error
> >
> > Regards,
> > Tetsuya Mishima
> >
> > ___
> > devel mailing list
> > de...@open-mpi.org
> > Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> > Link to this post:
http://www.open-mpi.org/community/lists/devel/2014/07/15303.php
>
>
> --
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/
>
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post:
http://www.open-mpi.org/community/lists/devel/2014/07/15335.php

openmpi-1.8.2rc2-pgi14.7.tar.gz
Description: Binary data


Re: [OMPI devel] openmpi-1.8.2rc2 and f08 interface built with PGI-14.7 causes link error

2014-07-30 Thread tmishima


Hi Jeff,

Sorry for the poor information and the late reply. Today I attended a very,
very long meeting ...

Anyway, I attached the compile output and configure log.
(due to the file size limitation, I'm sending them in two parts)

I hope you can find the problem.

(See attached file: openmpi-1.8-pgi14.7.tar.gz)

Regards,
Tetsuya

> Tetsuya --
>
> I am unable to test with the PGI compiler -- I don't have a license.  I
was hoping that LANL would be able to test today, but I don't think they
got to it.
>
> Can you send more details?
>
> E.g., can you send the all the stuff listed on
http://www.open-mpi.org/community/help/ for 1.8 and 1.8.2rc2 for the 14.7
compiler?
>
> I'm *guessing* that we've done something new in the changes since 1.8
that PGI doesn't support, and we need to disable that something (hopefully
while not needing to disable the entire mpi_f08
> bindings...).
>
>
>
> On Jul 28, 2014, at 11:43 PM, tmish...@jcity.maeda.co.jp wrote:
>
> >
> > Hi folks,
> >
> > I tried to build openmpi-1.8.2rc2 with PGI-14.7 and execute a sample
> > program. Then, it causes linking error:
> >
> > [mishima@manage work]$ cat test.f
> >  program hello_world
> >  use mpi_f08
> >  implicit none
> >
> >  type(MPI_Comm) :: comm
> >  integer :: myid, npes, ierror
> >  integer :: name_length
> >  character(len=MPI_MAX_PROCESSOR_NAME) :: processor_name
> >
> >  call mpi_init(ierror)
> >  comm = MPI_COMM_WORLD
> >  call MPI_Comm_rank(comm, myid, ierror)
> >  call MPI_Comm_size(comm, npes, ierror)
> >  call MPI_Get_processor_name(processor_name, name_length, ierror)
> >  write (*,'(A,X,I4,X,A,X,I4,X,A,X,A)')
> > +"Process", myid, "of", npes, "is on", trim(processor_name)
> >  call MPI_Finalize(ierror)
> >
> >  end program hello_world
> >
> > [mishima@manage work]$ mpif90 test.f -o test.ex
> > /tmp/pgfortran65ZcUeoncoqT.o: In function `.C1_283':
> > test.f:(.data+0x6c): undefined reference to
`mpi_f08_interfaces_callbacks_'
> > test.f:(.data+0x74): undefined reference to `mpi_f08_interfaces_'
> > test.f:(.data+0x7c): undefined reference to `pmpi_f08_interfaces_'
> > test.f:(.data+0x84): undefined reference to `mpi_f08_sizeof_'
> >
> > So, I did some more tests with previous version of PGI and
> > openmpi-1.8. The results are summarized as follows:
> >
> >  PGI13.10   PGI14.7
> > openmpi-1.8   OK OK
> > openmpi-1.8.2rc2  configure sets use_f08_mpi:no  link error
> >
> > Regards,
> > Tetsuya Mishima
> >
> > ___
> > devel mailing list
> > de...@open-mpi.org
> > Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> > Link to this post:
http://www.open-mpi.org/community/lists/devel/2014/07/15303.php
>
>
> --
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/
>
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post:
http://www.open-mpi.org/community/lists/devel/2014/07/15335.php

openmpi-1.8-pgi14.7.tar.gz
Description: Binary data


Re: [OMPI devel] openmpi-1.8.2rc2 and f08 interface built with PGI-14.7 causes link error

2014-07-30 Thread tmishima


Sorry for the poor information. I attached the compile output and
configure log. I hope you can find the problem.

(See attached file: openmpi-pgi14.7.tar.gz)

Regards,
Tetsuya Mishima

> Tetsuya --
>
> I am unable to test with the PGI compiler -- I don't have a license.  I
was hoping that LANL would be able to test today, but I don't think they
got to it.
>
> Can you send more details?
>
> E.g., can you send the all the stuff listed on
http://www.open-mpi.org/community/help/ for 1.8 and 1.8.2rc2 for the 14.7
compiler?
>
> I'm *guessing* that we've done something new in the changes since 1.8
that PGI doesn't support, and we need to disable that something (hopefully
while not needing to disable the entire mpi_f08
> bindings...).
>
>
>
> On Jul 28, 2014, at 11:43 PM, tmish...@jcity.maeda.co.jp wrote:
>
> >
> > Hi folks,
> >
> > I tried to build openmpi-1.8.2rc2 with PGI-14.7 and execute a sample
> > program. Then, it causes linking error:
> >
> > [mishima@manage work]$ cat test.f
> >  program hello_world
> >  use mpi_f08
> >  implicit none
> >
> >  type(MPI_Comm) :: comm
> >  integer :: myid, npes, ierror
> >  integer :: name_length
> >  character(len=MPI_MAX_PROCESSOR_NAME) :: processor_name
> >
> >  call mpi_init(ierror)
> >  comm = MPI_COMM_WORLD
> >  call MPI_Comm_rank(comm, myid, ierror)
> >  call MPI_Comm_size(comm, npes, ierror)
> >  call MPI_Get_processor_name(processor_name, name_length, ierror)
> >  write (*,'(A,X,I4,X,A,X,I4,X,A,X,A)')
> > +"Process", myid, "of", npes, "is on", trim(processor_name)
> >  call MPI_Finalize(ierror)
> >
> >  end program hello_world
> >
> > [mishima@manage work]$ mpif90 test.f -o test.ex
> > /tmp/pgfortran65ZcUeoncoqT.o: In function `.C1_283':
> > test.f:(.data+0x6c): undefined reference to
`mpi_f08_interfaces_callbacks_'
> > test.f:(.data+0x74): undefined reference to `mpi_f08_interfaces_'
> > test.f:(.data+0x7c): undefined reference to `pmpi_f08_interfaces_'
> > test.f:(.data+0x84): undefined reference to `mpi_f08_sizeof_'
> >
> > So, I did some more tests with previous version of PGI and
> > openmpi-1.8. The results are summarized as follows:
> >
> >  PGI13.10   PGI14.7
> > openmpi-1.8   OK OK
> > openmpi-1.8.2rc2  configure sets use_f08_mpi:no  link error
> >
> > Regards,
> > Tetsuya Mishima
> >
> > ___
> > devel mailing list
> > de...@open-mpi.org
> > Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> > Link to this post:
http://www.open-mpi.org/community/lists/devel/2014/07/15303.php
>
>
> --
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/
>
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post:
http://www.open-mpi.org/community/lists/devel/2014/07/15335.php

openmpi-pgi14.7.tar.gz
Description: Binary data


[OMPI devel] openmpi-1.8.2rc2 and f08 interface built with PGI-14.7 causes link error

2014-07-29 Thread tmishima

Hi folks,

I tried to build openmpi-1.8.2rc2 with PGI-14.7 and run a sample
program. It causes a linking error:

[mishima@manage work]$ cat test.f
  program hello_world
  use mpi_f08
  implicit none

  type(MPI_Comm) :: comm
  integer :: myid, npes, ierror
  integer :: name_length
  character(len=MPI_MAX_PROCESSOR_NAME) :: processor_name

  call mpi_init(ierror)
  comm = MPI_COMM_WORLD
  call MPI_Comm_rank(comm, myid, ierror)
  call MPI_Comm_size(comm, npes, ierror)
  call MPI_Get_processor_name(processor_name, name_length, ierror)
  write (*,'(A,X,I4,X,A,X,I4,X,A,X,A)')
 +"Process", myid, "of", npes, "is on", trim(processor_name)
  call MPI_Finalize(ierror)

  end program hello_world

[mishima@manage work]$ mpif90 test.f -o test.ex
/tmp/pgfortran65ZcUeoncoqT.o: In function `.C1_283':
test.f:(.data+0x6c): undefined reference to `mpi_f08_interfaces_callbacks_'
test.f:(.data+0x74): undefined reference to `mpi_f08_interfaces_'
test.f:(.data+0x7c): undefined reference to `pmpi_f08_interfaces_'
test.f:(.data+0x84): undefined reference to `mpi_f08_sizeof_'
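
As a quick check (just a sketch; the library path below is from my local
install), the symbols the linker complains about can be looked for
directly in the installed library:

[mishima@manage work]$ nm -D /home/mishima/opt/mpi/openmpi-1.8.2rc2-pgi14.7/lib/libmpi_usempif08.so | grep -i mpi_f08_interfaces

If nothing is printed, the module symbols are missing from
libmpi_usempif08.so itself.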

So, I did some more tests with previous versions of PGI and
openmpi-1.8. The results are summarized as follows:

                  PGI13.10                          PGI14.7
openmpi-1.8       OK                                OK
openmpi-1.8.2rc2  configure sets use_f08_mpi: no    link error

Regards,
Tetsuya Mishima



Re: [OMPI devel] trunk hangs when I specify a particular binding by rankfile

2014-06-20 Thread tmishima


Hi Ralph,

By the way, something is wrong with your latest rmaps_rank_file.c.
I've got the error below. I'm trying to find the problem, but you
could probably find it more quickly...

[mishima@manage trial]$ cat rankfile
rank 0=node05 slot=0-1
rank 1=node05 slot=3-4
rank 2=node05 slot=6-7
[mishima@manage trial]$ mpirun -np 3 -rf rankfile -report-bindings
demos/myprog
--
Error, invalid syntax in the rankfile (rankfile)
syntax must be the fallowing
rank i=host_i slot=string
Examples of proper syntax include:
rank 1=host1 slot=1:0,1
rank 0=host2 slot=0:*
rank 2=host4 slot=1-2
rank 3=host3 slot=0:1;1:0-2
--
[manage.cluster:24456] [[20979,0],0] ORTE_ERROR_LOG: Bad parameter in file
rmaps_rank_file.c at line 483
[manage.cluster:24456] [[20979,0],0] ORTE_ERROR_LOG: Bad parameter in file
rmaps_rank_file.c at line 149
[manage.cluster:24456] [[20979,0],0] ORTE_ERROR_LOG: Bad parameter in file
base/rmaps_base_map_job.c at line 287

Regards,
Tetsuya Mishima

> My guess is that the coll/ml component may have problems with binding a
single process across multiple cores like that - it might be that we'll
have to have it check for that condition and disqualify
> itself. It is a particularly bad binding pattern, though, as shared
memory gets completely messed up when you split that way.
>
>
> On Jun 19, 2014, at 3:57 PM, tmish...@jcity.maeda.co.jp wrote:
>
> >
> > Hi folks,
> >
> > Recently I have been seeing a hang with trunk when I specify a
> > particular binding by use of rankfile or "-map-by slot".
> >
> > This can be reproduced by the rankfile which allocates a process
> > beyond socket boundary. For example, on the node05 which has 2 socket
> > with 4 core, the rank 1 is allocated through socket 0 and 1 as shown
> > below. Then it hangs in the middle of communication.
> >
> > [mishima@manage trial]$ cat rankfile1
> > rank 0=node05 slot=0-1
> > rank 1=node05 slot=3-4
> > rank 2=node05 slot=6-7
> >
> > [mishima@manage trial]$ mpirun -rf rankfile1 -report-bindings
demos/myprog
> > [node05.cluster:02342] MCW rank 0 bound to socket 0[core 0[hwt 0]],
socket
> > 0[core 1[hwt 0]]: [B/B/./.][./././.]
> > [node05.cluster:02342] MCW rank 1 bound to socket 0[core 3[hwt 0]],
socket
> > 1[core 4[hwt 0]]: [./././B][B/././.]
> > [node05.cluster:02342] MCW rank 2 bound to socket 1[core 6[hwt 0]],
socket
> > 1[core 7[hwt 0]]: [./././.][././B/B]
> > Hello world from process 2 of 3
> > Hello world from process 1 of 3
> > << hang here! >>
> >
> > If I disable coll_ml or use 1.8 series, it works, which means it
> > might be affected by coll_ml component, I guess. But, unfortunately,
> > I have no idea to fix this problem. So, please somebody could resolve
> > the issue.
> >
> > [mishima@manage trial]$ mpirun -rf rankfile1 -report-bindings -mca
> > coll_ml_priority 0 demos/myprog
> > [node05.cluster:02382] MCW rank 0 bound to socket 0[core 0[hwt 0]],
socket
> > 0[core 1[hwt 0]]: [B/B/./.][./././.]
> > [node05.cluster:02382] MCW rank 1 bound to socket 0[core 3[hwt 0]],
socket
> > 1[core 4[hwt 0]]: [./././B][B/././.]
> > [node05.cluster:02382] MCW rank 2 bound to socket 1[core 6[hwt 0]],
socket
> > 1[core 7[hwt 0]]: [./././.][././B/B]
> > Hello world from process 2 of 3
> > Hello world from process 0 of 3
> > Hello world from process 1 of 3
> >
> > In addtition, when I use the host with 12 cores, "-map-by slot" causes
the
> > same problem.
> > [mishima@manage trial]$ mpirun -np 3 -map-by slot:pe=4 -report-bindings
> > demos/myprog
> > [manage.cluster:02557] MCW rank 0 bound to socket 0[core 0[hwt 0]],
socket
> > 0[core 1[hwt 0]], socket 0[core 2[hwt 0]], so
> > cket 0[core 3[hwt 0]]: [B/B/B/B/./.][./././././.]
> > [manage.cluster:02557] MCW rank 1 bound to socket 0[core 4[hwt 0]],
socket
> > 0[core 5[hwt 0]], socket 1[core 6[hwt 0]], so
> > cket 1[core 7[hwt 0]]: [././././B/B][B/B/./././.]
> > [manage.cluster:02557] MCW rank 2 bound to socket 1[core 8[hwt 0]],
socket
> > 1[core 9[hwt 0]], socket 1[core 10[hwt 0]], s
> > ocket 1[core 11[hwt 0]]: [./././././.][././B/B/B/B]
> > Hello world from process 1 of 3
> > Hello world from process 2 of 3
> > << hang here! >>
> >
> > Regards,
> > Tetsuya Mishima
> >
> > ___
> > devel mailing list
> > de...@open-mpi.org
> > Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> > Link to this post:
http://www.open-mpi.org/community/lists/devel/2014/06/15030.php
>
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post:
http://www.open-mpi.org/community/lists/devel/2014/06/15032.php



Re: [OMPI devel] trunk hangs when I specify a particular binding by rankfile

2014-06-20 Thread tmishima


I'm not sure, but I guess it's related to Gilles's ticket.
It's quite a bad binding pattern, as Ralph pointed out, so
checking for that condition and disqualifying coll/ml could
be a practical solution as well.

Tetsuya

> It is related, but it means that coll/ml has a higher degree of
sensitivity to the binding pattern than what you reported (which was that
coll/ml doesn't work with unbound processes). What we are now
> seeing is that coll/ml also doesn't work when processes are bound across
sockets.
>
> Which means that Nathan's revised tests are going to have to cover a lot
more corner cases. Our locality flags don't currently include
"bound-to-multiple-sockets", and I'm not sure how he is going to
> easily resolve that case.
>
>
> On Jun 19, 2014, at 8:02 PM, Gilles Gouaillardet
 wrote:
>
> > Ralph and Tetsuya,
> >
> > is this related to the hang i reported at
> > http://www.open-mpi.org/community/lists/devel/2014/06/14975.php ?
> >
> > Nathan already replied he is working on a fix.
> >
> > Cheers,
> >
> > Gilles
> >
> >
> > On 2014/06/20 11:54, Ralph Castain wrote:
> >> My guess is that the coll/ml component may have problems with binding
a single process across multiple cores like that - it might be that we'll
have to have it check for that condition and
> disqualify itself. It is a particularly bad binding pattern, though, as
shared memory gets completely messed up when you split that way.
> >>
> >
> > ___
> > devel mailing list
> > de...@open-mpi.org
> > Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> > Link to this post:
http://www.open-mpi.org/community/lists/devel/2014/06/15033.php
>
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post:
http://www.open-mpi.org/community/lists/devel/2014/06/15034.php



[OMPI devel] trunk hangs when I specify a particular binding by rankfile

2014-06-19 Thread tmishima

Hi folks,

Recently I have been seeing a hang with trunk when I specify a
particular binding via a rankfile or "-map-by slot".

This can be reproduced with a rankfile that allocates a process
across a socket boundary. For example, on node05, which has 2 sockets
with 4 cores each, rank 1 is allocated across sockets 0 and 1 as shown
below. Then it hangs in the middle of communication.

[mishima@manage trial]$ cat rankfile1
rank 0=node05 slot=0-1
rank 1=node05 slot=3-4
rank 2=node05 slot=6-7

[mishima@manage trial]$ mpirun -rf rankfile1 -report-bindings demos/myprog
[node05.cluster:02342] MCW rank 0 bound to socket 0[core 0[hwt 0]], socket
0[core 1[hwt 0]]: [B/B/./.][./././.]
[node05.cluster:02342] MCW rank 1 bound to socket 0[core 3[hwt 0]], socket
1[core 4[hwt 0]]: [./././B][B/././.]
[node05.cluster:02342] MCW rank 2 bound to socket 1[core 6[hwt 0]], socket
1[core 7[hwt 0]]: [./././.][././B/B]
Hello world from process 2 of 3
Hello world from process 1 of 3
<< hang here! >>

If I disable coll_ml or use the 1.8 series, it works, which suggests
the coll_ml component is involved. Unfortunately, I have no idea how
to fix this problem, so I hope somebody can resolve the issue.

[mishima@manage trial]$ mpirun -rf rankfile1 -report-bindings -mca
coll_ml_priority 0 demos/myprog
[node05.cluster:02382] MCW rank 0 bound to socket 0[core 0[hwt 0]], socket
0[core 1[hwt 0]]: [B/B/./.][./././.]
[node05.cluster:02382] MCW rank 1 bound to socket 0[core 3[hwt 0]], socket
1[core 4[hwt 0]]: [./././B][B/././.]
[node05.cluster:02382] MCW rank 2 bound to socket 1[core 6[hwt 0]], socket
1[core 7[hwt 0]]: [./././.][././B/B]
Hello world from process 2 of 3
Hello world from process 0 of 3
Hello world from process 1 of 3
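
For reference, a rankfile that avoids the cross-socket binding on this
2-socket, 4-core-per-socket node would look like the following (just a
sketch, the file name is arbitrary; I have not re-tested with it):

[mishima@manage trial]$ cat rankfile2
rank 0=node05 slot=0-1
rank 1=node05 slot=2-3
rank 2=node05 slot=6-7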

In addition, when I use the host with 12 cores, "-map-by slot" causes the
same problem.
[mishima@manage trial]$ mpirun -np 3 -map-by slot:pe=4 -report-bindings
demos/myprog
[manage.cluster:02557] MCW rank 0 bound to socket 0[core 0[hwt 0]], socket
0[core 1[hwt 0]], socket 0[core 2[hwt 0]], so
cket 0[core 3[hwt 0]]: [B/B/B/B/./.][./././././.]
[manage.cluster:02557] MCW rank 1 bound to socket 0[core 4[hwt 0]], socket
0[core 5[hwt 0]], socket 1[core 6[hwt 0]], so
cket 1[core 7[hwt 0]]: [././././B/B][B/B/./././.]
[manage.cluster:02557] MCW rank 2 bound to socket 1[core 8[hwt 0]], socket
1[core 9[hwt 0]], socket 1[core 10[hwt 0]], s
ocket 1[core 11[hwt 0]]: [./././././.][././B/B/B/B]
Hello world from process 1 of 3
Hello world from process 2 of 3
<< hang here! >>

Regards,
Tetsuya Mishima



Re: [OMPI devel] openmpi-1.8 - hangup using more than 4 nodes under managed state by Torque

2014-04-01 Thread tmishima


Thanks Ralph.

Tetsuya

> I tracked it down - not Torque specific, but impacts all managed
environments. Will fix
>
>
> On Apr 1, 2014, at 2:23 AM, tmish...@jcity.maeda.co.jp wrote:
>
> >
> > Hi Ralph,
> >
> > I saw another hangup with openmpi-1.8 when I used more than 4 nodes
> > (having 8 cores each) under managed state by Torque. Although I'm not
> > sure you can reproduce it with SLURM, at leaset with Torque it can be
> > reproduced in this way:
> >
> > [mishima@manage ~]$ qsub -I -l nodes=4:ppn=8
> > qsub: waiting for job 8726.manage.cluster to start
> > qsub: job 8726.manage.cluster ready
> >
> > [mishima@node09 ~]$ mpirun -np 65 ~/mis/openmpi/demos/myprog
> >
--
> > There are not enough slots available in the system to satisfy the 65
slots
> > that were requested by the application:
> >  /home/mishima/mis/openmpi/demos/myprog
> >
> > Either request fewer slots for your application, or make more slots
> > available
> > for use.
> >
--
> > <<< HANG HERE!! >>>
> > Abort is in progress...hit ctrl-c again within 5 seconds to forcibly
> > terminate
> >
> > I found this behavior when I happened to input wrong procs. With less
than
> > 4
> > nodes or rsh - namely unmanaged state, it works. I'm afraid to say I
have
> > no
> > idea how to resolve it. I hope you could fix the problem.
> >
> > Tetsuya
> >
> > ___
> > devel mailing list
> > de...@open-mpi.org
> > Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> > Searchable archives:
http://www.open-mpi.org/community/lists/devel/2014/04/index.php
>
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post:
http://www.open-mpi.org/community/lists/devel/2014/04/14438.php



Re: [OMPI devel] system call failed during shared memory initialization with openmpi-1.8a1r31254

2014-03-30 Thread tmishima


Hi Jeff,

it worked for me with openmpi-1.8rc1.

Tetsuya

> Ralph applied a bunch of CMRs to the v1.8 branch after the nightly
tarball was made last night.
>
> I just created a new nightly tarball that includes all of those CMRs:
1.8a1r31269.  It should have the fix for this error included in it.
>
>
> On Mar 28, 2014, at 6:50 AM,  wrote:
>
> >
> >
> > Thanks Jeff. It seems to be really the latest one - ticket #4474.
> >
> >> On Mar 28, 2014, at 5:45 AM,  wrote:
> >>
> >>>
> >
--
> >>> A system call failed during shared memory initialization that should
> >>> not have.  It is likely that your MPI job will now either abort or
> >>> experience performance degradation.
> >>>
> >>> Local host:  node03.cluster
> >>> System call: unlink
> >>>
> >
(2) /tmp/openmpi-sessions-mishima@node03_0/17579/1/vader_segment.node03.0
> >>> Error:   No such file or directory (errno 2)
> >>>
> >
--
> >>
> >>
> >> This error was just fixed last night.
> >>
> >> --
> >> Jeff Squyres
> >> jsquy...@cisco.com
> >> For corporate legal information go to:
> > http://www.cisco.com/web/about/doing_business/legal/cri/
> >>
> >> ___
> >> devel mailing list
> >> de...@open-mpi.org
> >> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> >> Link to this post:
> > http://www.open-mpi.org/community/lists/devel/2014/03/14416.php
> >
> > ___
> > devel mailing list
> > de...@open-mpi.org
> > Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> > Link to this post:
http://www.open-mpi.org/community/lists/devel/2014/03/14417.php
>
>
> --
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/
>
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post:
http://www.open-mpi.org/community/lists/devel/2014/03/14419.php



Re: [OMPI devel] system call failed during shared memory initialization with openmpi-1.8a1r31254

2014-03-28 Thread tmishima


Thanks Jeff. But I'm already offline today ...
I cannot confirm it until Monday morning, sorry.

Tetsuya

> Ralph applied a bunch of CMRs to the v1.8 branch after the nightly
tarball was made last night.
>
> I just created a new nightly tarball that includes all of those CMRs:
1.8a1r31269.  It should have the fix for this error included in it.
>
>
> On Mar 28, 2014, at 6:50 AM,  wrote:
>
> >
> >
> > Thanks Jeff. It seems to be really the latest one - ticket #4474.
> >
> >> On Mar 28, 2014, at 5:45 AM,  wrote:
> >>
> >>>
> >
--
> >>> A system call failed during shared memory initialization that should
> >>> not have.  It is likely that your MPI job will now either abort or
> >>> experience performance degradation.
> >>>
> >>> Local host:  node03.cluster
> >>> System call: unlink
> >>>
> >
(2) /tmp/openmpi-sessions-mishima@node03_0/17579/1/vader_segment.node03.0
> >>> Error:   No such file or directory (errno 2)
> >>>
> >
--
> >>
> >>
> >> This error was just fixed last night.
> >>
> >> --
> >> Jeff Squyres
> >> jsquy...@cisco.com
> >> For corporate legal information go to:
> > http://www.cisco.com/web/about/doing_business/legal/cri/
> >>
> >> ___
> >> devel mailing list
> >> de...@open-mpi.org
> >> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> >> Link to this post:
> > http://www.open-mpi.org/community/lists/devel/2014/03/14416.php
> >
> > ___
> > devel mailing list
> > de...@open-mpi.org
> > Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> > Link to this post:
http://www.open-mpi.org/community/lists/devel/2014/03/14417.php
>
>
> --
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/
>
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post:
http://www.open-mpi.org/community/lists/devel/2014/03/14419.php



Re: [OMPI devel] system call failed during shared memory initialization with openmpi-1.8a1r31254

2014-03-28 Thread tmishima


Thanks Jeff. It really seems to be the latest one - ticket #4474.

> On Mar 28, 2014, at 5:45 AM,  wrote:
>
> >
--
> > A system call failed during shared memory initialization that should
> > not have.  It is likely that your MPI job will now either abort or
> > experience performance degradation.
> >
> >  Local host:  node03.cluster
> >  System call: unlink
> >
(2) /tmp/openmpi-sessions-mishima@node03_0/17579/1/vader_segment.node03.0
> >  Error:   No such file or directory (errno 2)
> >
--
>
>
> This error was just fixed last night.
>
> --
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/
>
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post:
http://www.open-mpi.org/community/lists/devel/2014/03/14416.php



[OMPI devel] system call failed during shared memory initialization with openmpi-1.8a1r31254

2014-03-28 Thread tmishima

Hi all,

I saw this error as shown below with openmpi-1.8a1r31254.
I've never seen it before with openmpi-1.7.5.

The message implies it's related to vader, and I can stop it by
excluding vader from the btl list with -mca btl ^vader.
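
(For reference, that means adding the exclusion to the run shown below,
e.g. something like:

mpirun -np 16 -host node03,node04 -map-by numa:pe=4 -bind-to core \
       -mca btl ^vader ./demos/myprog

with the other options unchanged.)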

Could someone fix this problem?

Tetsuya

[mishima@manage openmpi]$ mpirun -np 16 -host node03,node04 -map-by
numa:pe=4 -display-map -report-bindings -bind-to cor
e ./demos/myprog
 Data for JOB [17579,1] offset 0

    JOB MAP   

 Data for node: node03  Num slots: 1Max slots: 0Num procs: 8
Process OMPI jobid: [17579,1] App: 0 Process rank: 0
Process OMPI jobid: [17579,1] App: 0 Process rank: 1
Process OMPI jobid: [17579,1] App: 0 Process rank: 2
Process OMPI jobid: [17579,1] App: 0 Process rank: 3
Process OMPI jobid: [17579,1] App: 0 Process rank: 4
Process OMPI jobid: [17579,1] App: 0 Process rank: 5
Process OMPI jobid: [17579,1] App: 0 Process rank: 6
Process OMPI jobid: [17579,1] App: 0 Process rank: 7

 Data for node: node04  Num slots: 1Max slots: 0Num procs: 8
Process OMPI jobid: [17579,1] App: 0 Process rank: 8
Process OMPI jobid: [17579,1] App: 0 Process rank: 9
Process OMPI jobid: [17579,1] App: 0 Process rank: 10
Process OMPI jobid: [17579,1] App: 0 Process rank: 11
Process OMPI jobid: [17579,1] App: 0 Process rank: 12
Process OMPI jobid: [17579,1] App: 0 Process rank: 13
Process OMPI jobid: [17579,1] App: 0 Process rank: 14
Process OMPI jobid: [17579,1] App: 0 Process rank: 15

 =
[node03.cluster:23025] MCW rank 4 bound to socket 2[core 16[hwt 0]], socket
2[core 17[hwt 0]], socket 2[core 18[hwt 0]],
 socket 2[core 19[hwt 0]]:
[./././././././.][./././././././.][B/B/B/B/./././.][./././././././.]
[node03.cluster:23025] MCW rank 5 bound to socket 2[core 20[hwt 0]], socket
2[core 21[hwt 0]], socket 2[core 22[hwt 0]],
 socket 2[core 23[hwt 0]]:
[./././././././.][./././././././.][././././B/B/B/B][./././././././.]
[node03.cluster:23025] MCW rank 6 bound to socket 3[core 24[hwt 0]], socket
3[core 25[hwt 0]], socket 3[core 26[hwt 0]],
 socket 3[core 27[hwt 0]]:
[./././././././.][./././././././.][./././././././.][B/B/B/B/./././.]
[node03.cluster:23025] MCW rank 7 bound to socket 3[core 28[hwt 0]], socket
3[core 29[hwt 0]], socket 3[core 30[hwt 0]],
 socket 3[core 31[hwt 0]]:
[./././././././.][./././././././.][./././././././.][././././B/B/B/B]
[node03.cluster:23025] MCW rank 0 bound to socket 0[core 0[hwt 0]], socket
0[core 1[hwt 0]], socket 0[core 2[hwt 0]], so
cket 0[core 3[hwt 0]]:
[B/B/B/B/./././.][./././././././.][./././././././.][./././././././.]
[node04.cluster:29332] MCW rank 10 bound to socket 1[core 8[hwt 0]], socket
1[core 9[hwt 0]], socket 1[core 10[hwt 0]],
socket 1[core 11[hwt 0]]:
[./././././././.][B/B/B/B/./././.][./././././././.][./././././././.]
[node04.cluster:29332] MCW rank 11 bound to socket 1[core 12[hwt 0]],
socket 1[core 13[hwt 0]], socket 1[core 14[hwt 0]]
, socket 1[core 15[hwt 0]]:
[./././././././.][././././B/B/B/B][./././././././.][./././././././.]
[node04.cluster:29332] MCW rank 12 bound to socket 2[core 16[hwt 0]],
socket 2[core 17[hwt 0]], socket 2[core 18[hwt 0]]
, socket 2[core 19[hwt 0]]:
[./././././././.][./././././././.][B/B/B/B/./././.][./././././././.]
[node04.cluster:29332] MCW rank 13 bound to socket 2[core 20[hwt 0]],
socket 2[core 21[hwt 0]], socket 2[core 22[hwt 0]]
, socket 2[core 23[hwt 0]]:
[./././././././.][./././././././.][././././B/B/B/B][./././././././.]
[node04.cluster:29332] MCW rank 14 bound to socket 3[core 24[hwt 0]],
socket 3[core 25[hwt 0]], socket 3[core 26[hwt 0]]
, socket 3[core 27[hwt 0]]:
[./././././././.][./././././././.][./././././././.][B/B/B/B/./././.]
[node04.cluster:29332] MCW rank 15 bound to socket 3[core 28[hwt 0]],
socket 3[core 29[hwt 0]], socket 3[core 30[hwt 0]]
, socket 3[core 31[hwt 0]]:
[./././././././.][./././././././.][./././././././.][././././B/B/B/B]
[node04.cluster:29332] MCW rank 8 bound to socket 0[core 0[hwt 0]], socket
0[core 1[hwt 0]], socket 0[core 2[hwt 0]], so
cket 0[core 3[hwt 0]]:
[B/B/B/B/./././.][./././././././.][./././././././.][./././././././.]
[node04.cluster:29332] MCW rank 9 bound to socket 0[core 4[hwt 0]], socket
0[core 5[hwt 0]], socket 0[core 6[hwt 0]], so
cket 0[core 7[hwt 0]]:
[././././B/B/B/B][./././././././.][./././././././.][./././././././.]
[node03.cluster:23025] MCW rank 1 bound to socket 0[core 4[hwt 0]], socket
0[core 5[hwt 0]], socket 0[core 6[hwt 0]], so
cket 0[core 7[hwt 0]]:
[././././B/B/B/B][./././././././.][./././././././.][./././././././.]
[node03.cluster:23025] MCW rank 2 bound to socket 1[core 8[hwt 0]], socket
1[core 9[hwt 0]], socket 1[core 10[hwt 0]], s
ocket 1[core 11[hwt 0]]:
[./././././././.][B/B/B/B/./././.][./././././././.][./././././././.]
[node03.cluster:23025] MCW rank 3 bound to socket 1[core 12[hwt 0]], 

Re: [OMPI devel] cleanup of rr_byobj

2014-03-27 Thread tmishima


I added two improvements. Please replace the previous patch file
with the attached one, and take a look this weekend.

1. Add a pre-check for ORTE_ERR_NOT_FOUND so that the subsequent retry
with byslot works correctly. Otherwise, the retry could fail, because
some fields such as node->procs and node->slots_inuse have already
been updated.

2. Improve the detection of oversubscription when node->slots is not a
multiple of cpus_per_rank. For example, using node05 and node06
with slots = 8 and setting cpus_per_rank = 3, np = 5 should be
oversubscribed, although np x cpus_per_rank (3x5=15) is less than
num_slots (=16). I fixed the code to detect this oversubscription.
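
Spelled out (this is just how I read the example): with cpus_per_rank = 3,
each node with slots = 8 can hold only floor(8 / 3) = 2 whole ranks, so
node05 and node06 together hold 2 x 2 = 4 ranks < np = 5, i.e.
oversubscribed, even though np x cpus_per_rank = 15 <= num_slots = 16.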

Tetsuya

(See attached file: patch.byobj2)

> Hi Tetsuya
>
> Let me take a look when I get home this weekend - I'm giving an ORTE
tutorial to a group of new developers this week and my time is very
limited.
>
> Thanks
> Ralph
>
>
>
> On Tue, Mar 25, 2014 at 5:37 PM,  wrote:
>
> Hi Ralph, I moved on to the development list.
>
> I'm not sure why add_one flag is used in the rr_byobj.
> Here, if oversubscribed, proc is mapped to each object
> one by one. So, I think the add_one is not necesarry.
>
> Instead, when the user doesn't permit oversubscription,
> the second pass should be skipped.
>
> I made the logic a bit clear based upon this idea and
> removed some outputs to synchronize it with the 1.7 branch.
>
> Please take a look at attached patch file.
>
> Tetsuya
>
> (See attached file: patch.byobj)
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post:
http://www.open-mpi.org/community/lists/devel/2014/03/14393.php___

> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/develLink to
this post: http://www.open-mpi.org/community/lists/devel/2014/03/14394.php

patch.byobj2
Description: Binary data


Re: [OMPI devel] cleanup of rr_byobj

2014-03-26 Thread tmishima


no problem - it's a minor cleanup.

Tetsuya

> Hi Tetsuya
>
> Let me take a look when I get home this weekend - I'm giving an ORTE
tutorial to a group of new developers this week and my time is very
limited.
>
> Thanks
> Ralph
>
>
>
> On Tue, Mar 25, 2014 at 5:37 PM,  wrote:
>
> Hi Ralph, I moved on to the development list.
>
> I'm not sure why add_one flag is used in the rr_byobj.
> Here, if oversubscribed, proc is mapped to each object
> one by one. So, I think the add_one is not necesarry.
>
> Instead, when the user doesn't permit oversubscription,
> the second pass should be skipped.
>
> I made the logic a bit clear based upon this idea and
> removed some outputs to synchronize it with the 1.7 branch.
>
> Please take a look at attached patch file.
>
> Tetsuya
>
> (See attached file: patch.byobj)
> ___
> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/devel
> Link to this post:
http://www.open-mpi.org/community/lists/devel/2014/03/14393.php___

> devel mailing list
> de...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/develLink to
this post: http://www.open-mpi.org/community/lists/devel/2014/03/14394.php



[OMPI devel] cleanup of rr_byobj

2014-03-25 Thread tmishima

Hi Ralph, I moved on to the development list.

I'm not sure why the add_one flag is used in rr_byobj.
Here, if oversubscribed, each proc is mapped to the objects
one by one, so I think add_one is not necessary.

Instead, when the user doesn't permit oversubscription,
the second pass should be skipped.
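
In pseudocode, the intended flow is simply (this is not the actual code,
just the idea):

  first pass : map procs to objects while free slots remain
  second pass: only if the user permits oversubscription, keep mapping
               the remaining procs to objects one by one; otherwise
               stop and report the error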

I made the logic a bit clearer based on this idea and
removed some outputs to synchronize it with the 1.7 branch.

Please take a look at attached patch file.

Tetsuya

(See attached file: patch.byobj)

patch.byobj
Description: Binary data