When I ran the OSU tests, I got numbers out of all of them except
osu_latency_mt (which was expected, as I didn't compile Open MPI
with multi-threaded support).
A good way to tell whether the problem is in Open MPI or in your custom OFED
stack would be to use some other transport, such as tcp instead of openib,
and rerun these one-sided communication tests.
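
As a rough sketch of that check, the script below prints the mpirun command for each of the one-sided tests named in the report, with tcp in place of openib in the BTL list. It assumes the same hosts, prefix, and benchmark path as in the report quoted below, and that the tcp BTL is available in your Open MPI build; the osu_cmd helper is a made-up name used only for illustration.

```shell
#!/bin/sh
# Print the mpirun command for one benchmark with a given BTL list.
# Hosts, prefix, and paths are copied from the original report (assumptions
# about your setup); osu_cmd is a hypothetical helper, not part of Open MPI.
osu_cmd() {
    btl_list="$1"
    bench="$2"
    printf '%s\n' "mpirun --prefix /usr/local/ -np 2 --mca btl ${btl_list} -H 192.168.0.175,192.168.0.174 --mca orte_base_help_aggregate 0 /root/ramu/ofed_pkgs/osu_benchmarks-3.1.1/${bench}"
}

# The one-sided tests from the report, routed over TCP instead of openib.
for bench in osu_put_bibw osu_put_bw osu_get_bw; do
    osu_cmd tcp,self,sm "$bench"
done
```

Pipe the output to sh to actually run the tests. If they complete over tcp, the hang most likely sits in the openib path or the custom OFED stack; if they hang over tcp as well, Open MPI's one-sided code is the more likely culprit.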
On Wed, Feb 29, 2012 at 10:04 AM, Barrett, Brian W <bwba...@sandia.gov> wrote:

> I'm pretty sure that they are correct.  Our one-sided implementation is
> buggier than I'd like (indeed, I'm in the process of rewriting most of it
> as part of Open MPI's support for MPI-3's revised RDMA), so it's likely
> that the bugs are in Open MPI's one-sided support.  Can you try a more
> recent release (something from the 1.5 tree) and see if the problem
> persists?
>
> Thanks,
>
> Brian
>
> On 2/29/12 10:56 AM, "Jeffrey Squyres" <jsquy...@cisco.com> wrote:
>
> >FWIW, I'm immediately suspicious of *any* MPI application that uses the
> >MPI one-sided operations (i.e., MPI_PUT and MPI_GET).  It looks like
> >these two OSU benchmarks are using those operations.
> >
> >Is it known that these two benchmarks are correct?
> >
> >
> >
> >On Feb 29, 2012, at 11:33 AM, Venkateswara Rao Dokku wrote:
> >
> >> Sorry, I forgot to introduce the system. Ours is a customized OFED
> >> stack implemented to work on specific hardware. We tested the stack
> >> with qperf and the Intel MPI Benchmarks (IMB-3.2.2), and they ran fine.
> >> We want to run the osu_benchmarks-3.1.1 suite on our OFED stack.
> >>
> >> On Wed, Feb 29, 2012 at 9:57 PM, Venkateswara Rao Dokku <dvrao....@gmail.com> wrote:
> >> Hi,
> >> I tried running the osu_benchmarks-3.1.1 suite with openmpi-1.4.3.
> >> I could run 10 of the 14 benchmark tests in the suite (all except
> >> osu_put_bibw, osu_put_bw, osu_get_bw, and osu_latency_mt); the
> >> remaining tests hang at some message size. The output is shown below.
> >>
> >> [root@test2 ~]# mpirun --prefix /usr/local/ -np 2 --mca btl openib,self,sm -H 192.168.0.175,192.168.0.174 --mca orte_base_help_aggregate 0 /root/ramu/ofed_pkgs/osu_benchmarks-3.1.1/osu_put_bibw
> >> failed to create doorbell file /dev/plx2_char_dev
> >>
> >>--------------------------------------------------------------------------
> >> WARNING: No preset parameters were found for the device that Open MPI
> >> detected:
> >>
> >>   Local host:            test1
> >>   Device name:           plx2_0
> >>   Device vendor ID:      0x10b5
> >>   Device vendor part ID: 4277
> >>
> >> Default device parameters will be used, which may result in lower
> >> performance.  You can edit any of the files specified by the
> >> btl_openib_device_param_files MCA parameter to set values for your
> >> device.
> >>
> >> NOTE: You can turn off this warning by setting the MCA parameter
> >>       btl_openib_warn_no_device_params_found to 0.
> >>
> >>--------------------------------------------------------------------------
> >> failed to create doorbell file /dev/plx2_char_dev
> >>
> >>--------------------------------------------------------------------------
> >> WARNING: No preset parameters were found for the device that Open MPI
> >> detected:
> >>
> >>   Local host:            test2
> >>   Device name:           plx2_0
> >>   Device vendor ID:      0x10b5
> >>   Device vendor part ID: 4277
> >>
> >> Default device parameters will be used, which may result in lower
> >> performance.  You can edit any of the files specified by the
> >> btl_openib_device_param_files MCA parameter to set values for your
> >> device.
> >>
> >> NOTE: You can turn off this warning by setting the MCA parameter
> >>       btl_openib_warn_no_device_params_found to 0.
> >>
> >>--------------------------------------------------------------------------
> >> alloc_srq max: 512 wqe_shift: 5
> >> alloc_srq max: 512 wqe_shift: 5
> >> alloc_srq max: 512 wqe_shift: 5
> >> alloc_srq max: 512 wqe_shift: 5
> >> alloc_srq max: 512 wqe_shift: 5
> >> alloc_srq max: 512 wqe_shift: 5
> >> # OSU One Sided MPI_Put Bi-directional Bandwidth Test v3.1.1
> >> # Size     Bi-Bandwidth (MB/s)
> >> plx2_create_qp line: 415
> >> plx2_create_qp line: 415
> >> plx2_create_qp line: 415
> >> plx2_create_qp line: 415
> >> 1                         0.00
> >> 2                         0.00
> >> 4                         0.01
> >> 8                         0.03
> >> 16                        0.07
> >> 32                        0.15
> >> 64                        0.11
> >> 128                       0.21
> >> 256                       0.43
> >> 512                       0.88
> >> 1024                      2.10
> >> 2048                      4.21
> >> 4096                      8.10
> >> 8192                     16.19
> >> 16384                     8.46
> >> 32768                    20.34
> >> 65536                    39.85
> >> 131072                   84.22
> >> 262144                  142.23
> >> 524288                  234.83
> >> mpirun: killing job...
> >>
> >>
> >>--------------------------------------------------------------------------
> >> mpirun noticed that process rank 0 with PID 7305 on node test2 exited on signal 0 (Unknown signal 0).
> >>
> >>--------------------------------------------------------------------------
> >> 2 total processes killed (some possibly by mpirun during cleanup)
> >> mpirun: clean termination accomplished
> >>
> >> [root@test2 ~]# mpirun --prefix /usr/local/ -np 2 --mca btl openib,self,sm -H 192.168.0.175,192.168.0.174 --mca orte_base_help_aggregate 0 /root/ramu/ofed_pkgs/osu_benchmarks-3.1.1/osu_put_bw
> >> failed to create doorbell file /dev/plx2_char_dev
> >>
> >>--------------------------------------------------------------------------
> >> WARNING: No preset parameters were found for the device that Open MPI
> >> detected:
> >>
> >>   Local host:            test1
> >>   Device name:           plx2_0
> >>   Device vendor ID:      0x10b5
> >>   Device vendor part ID: 4277
> >>
> >> Default device parameters will be used, which may result in lower
> >> performance.  You can edit any of the files specified by the
> >> btl_openib_device_param_files MCA parameter to set values for your
> >> device.
> >>
> >> NOTE: You can turn off this warning by setting the MCA parameter
> >>       btl_openib_warn_no_device_params_found to 0.
> >>
> >>--------------------------------------------------------------------------
> >> failed to create doorbell file /dev/plx2_char_dev
> >>
> >>--------------------------------------------------------------------------
> >> WARNING: No preset parameters were found for the device that Open MPI
> >> detected:
> >>
> >>   Local host:            test2
> >>   Device name:           plx2_0
> >>   Device vendor ID:      0x10b5
> >>   Device vendor part ID: 4277
> >>
> >> Default device parameters will be used, which may result in lower
> >> performance.  You can edit any of the files specified by the
> >> btl_openib_device_param_files MCA parameter to set values for your
> >> device.
> >>
> >> NOTE: You can turn off this warning by setting the MCA parameter
> >>       btl_openib_warn_no_device_params_found to 0.
> >>
> >>--------------------------------------------------------------------------
> >> alloc_srq max: 512 wqe_shift: 5
> >> alloc_srq max: 512 wqe_shift: 5
> >> alloc_srq max: 512 wqe_shift: 5
> >> alloc_srq max: 512 wqe_shift: 5
> >> alloc_srq max: 512 wqe_shift: 5
> >> alloc_srq max: 512 wqe_shift: 5
> >> # OSU One Sided MPI_Put Bandwidth Test v3.1.1
> >> # Size        Bandwidth (MB/s)
> >> plx2_create_qp line: 415
> >> plx2_create_qp line: 415
> >> plx2_create_qp line: 415
> >> plx2_create_qp line: 415
> >> 1                         0.02
> >> 2                         0.05
> >> 4                         0.10
> >> 8                         0.19
> >> 16                        0.39
> >> 32                        0.77
> >> 64                        1.53
> >> 128                       2.57
> >> 256                       4.16
> >> 512                       8.30
> >> 1024                     16.62
> >> 2048                     33.22
> >> 4096                     66.51
> >> 8192                     42.45
> >> 16384                    11.99
> >> 32768                    18.20
> >> 65536                    76.04
> >> 131072                   98.64
> >> 262144                  407.66
> >> 524288                  489.84
> >> mpirun: killing job...
> >>
> >>
> >>--------------------------------------------------------------------------
> >> mpirun noticed that process rank 0 with PID 7314 on node test2 exited on signal 0 (Unknown signal 0).
> >>
> >>--------------------------------------------------------------------------
> >> 2 total processes killed (some possibly by mpirun during cleanup)
> >> mpirun: clean termination accomplished
> >>
> >> I even checked the logs, but I couldn't see any errors.
> >> Could you suggest a way to overcome or debug this issue?
> >>
> >> Thanks for the kind reply.
> >>
> >>
> >> --
> >> Thanks & Regards,
> >> D.Venkateswara Rao,
> >> Software Engineer,One Convergence Devices Pvt Ltd.,
> >> Jubille Hills,Hyderabad.
> >>
> >>
> >> _______________________________________________
> >> users mailing list
> >> us...@open-mpi.org
> >> http://www.open-mpi.org/mailman/listinfo.cgi/users
> >
> >
> >--
> >Jeff Squyres
> >jsquy...@cisco.com
> >For corporate legal information go to:
> >http://www.cisco.com/web/about/doing_business/legal/cri/
> >
>
>
> --
>  Brian W. Barrett
>  Dept. 1423: Scalable System Software
>  Sandia National Laboratories
>