Hi Jeff, thanks for your reply. I understood your previous points about RDMA. As before with SKaMPI, I tried running with mpi_leave_pinned = 1, as you suggested. But in this case too, the execution time is very similar to the previous case.
Does this mean that SKaMPI reallocates the buffer every time? For example, with the "MPI_Bcast-length" test over 128 processes, the collective is repeated about 28 times, with the buffer size increased at each step by an internal formula, up to a final buffer size of 2097152 K.

Since there is no advantage with leave_pinned = 1, this suggests that SKaMPI does not allocate a 2097152 K buffer up front, but instead allocates a small buffer and reallocates it at each step with a larger size. Is that possible? If not, what causes the similar performance?

Another question: is the RDMA pipeline protocol for long messages enabled by default in Open MPI 1.2.6?
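To make the reallocation hypothesis concrete, here is a minimal C sketch of the two allocation patterns (my own illustration, not SKaMPI's actual code; the doubling formula and the 2 MB cap are made up). mpi_leave_pinned can only help in the first pattern, where the buffer address is stable and the registration can be cached:

#include <stdlib.h>
#include <mpi.h>

#define MAX_BYTES (2 * 1024 * 1024)   /* illustrative cap; SKaMPI's real sizes differ */

/* Pattern 1: one buffer, reused at every step.  The address never
   changes, so with mpi_leave_pinned = 1 the memory is registered
   once and every MPI_Bcast reuses that registration. */
static void bcast_reusing_buffer(MPI_Comm comm)
{
    char *buf = malloc(MAX_BYTES);
    for (size_t len = 1; len <= MAX_BYTES; len *= 2)
        MPI_Bcast(buf, (int)len, MPI_CHAR, 0, comm);
    free(buf);
}

/* Pattern 2: a fresh buffer at every step.  The address can change
   each time, so every MPI_Bcast may pay a new memory registration,
   and mpi_leave_pinned buys nothing. */
static void bcast_reallocating_buffer(MPI_Comm comm)
{
    for (size_t len = 1; len <= MAX_BYTES; len *= 2) {
        char *buf = malloc(len);
        MPI_Bcast(buf, (int)len, MPI_CHAR, 0, comm);
        free(buf);
    }
}

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    bcast_reusing_buffer(MPI_COMM_WORLD);
    bcast_reallocating_buffer(MPI_COMM_WORLD);
    MPI_Finalize();
    return 0;
}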
2008/6/6 Jeff Squyres <jsquy...@cisco.com>:

> Note that "eager" RDMA is only used for short messages -- it's not
> really relevant to whether the same user buffers are re-used or not
> (the mpi_leave_pinned parameter for long messages is only useful if
> long buffers are re-used). See this FAQ item:
>
>     http://www.open-mpi.org/faq/?category=openfabrics#ib-small-message-rdma
>
> For benchmarks (like SKaMPI) that re-use long buffers, you might want
> to use the mpi_leave_pinned MCA parameter:
>
>     http://www.open-mpi.org/faq/?category=openfabrics#large-message-leave-pinned
>     http://www.open-mpi.org/faq/?category=tuning#running-perf-numbers
>
> On Jun 5, 2008, at 9:47 AM, Gabriele Fatigati wrote:
>
>> Hi,
>> I'm testing the SKaMPI benchmark on an IBM Blade system over
>> InfiniBand. The current version of Open MPI is 1.2.6.
>> I tried to disable RDMA by setting btl_openib_use_eager_rdma = 0,
>> but I noticed that there is little difference in MPI collective
>> execution times between RDMA enabled and disabled. Before the
>> tests, I expected that with RDMA off the execution times would be
>> longer.
>>
>> So I suppose that SKaMPI continuously reallocates buffers, which
>> prevents the benefits of the RDMA protocol. Indeed, if the initial
>> buffer address changes every time, many memory-page registrations
>> are needed, and performance decays as a result.
>>
>> I used the RDMA pipeline protocol. This protocol should make no
>> assumptions about the application's reuse of source and target
>> buffers. But is that always true?
>> The network parameters are listed below.
>>
>> MCA btl: parameter "btl_openib_mpool" (current value: "rdma")
>> MCA btl: parameter "btl_openib_ib_max_rdma_dst_ops" (current value: "4")
>> MCA btl: parameter "btl_openib_use_eager_rdma" (current value: "1")
>> MCA btl: parameter "btl_openib_eager_rdma_threshold" (current value: "16")
>> MCA btl: parameter "btl_openib_max_eager_rdma" (current value: "16")
>> MCA btl: parameter "btl_openib_eager_rdma_num" (current value: "16")
>> MCA btl: parameter "btl_openib_min_rdma_size" (current value: "1048576")
>> MCA btl: parameter "btl_openib_max_rdma_size" (current value: "1048576")
>
> --
> Jeff Squyres
> Cisco Systems

--
Gabriele Fatigati

CINECA Systems & Technologies Department
Supercomputing Group
Via Magnanelli 6/3, Casalecchio di Reno (BO) Italy
www.cineca.it    Tel: +39 051 6171722
g.fatig...@cineca.it
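P.S. This is the kind of quick check I have in mind to verify the reallocation hypothesis (a minimal sketch of my own, not SKaMPI code; sizes are illustrative): print the buffer address at every size step. If the address changes between steps, the registration cache cannot be hit.

#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Allocate a fresh buffer at every step, as I suspect SKaMPI does,
       and print its address: differing addresses mean each MPI_Bcast
       can force a new memory registration on the openib BTL. */
    for (size_t len = 1024; len <= 2 * 1024 * 1024; len *= 2) {
        char *buf = malloc(len);
        MPI_Bcast(buf, (int)len, MPI_CHAR, 0, MPI_COMM_WORLD);
        if (rank == 0)
            printf("len = %8zu  buf = %p\n", len, (void *)buf);
        free(buf);
    }

    MPI_Finalize();
    return 0;
}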