Re: [OMPI devel] subcommunicator OpenMPI issues on K

2017-11-07 Thread Kawashima, Takahiro
> > As other people said, Fujitsu MPI used in K is based on old > > Open MPI (v1.6.3 with bug fixes). > > I guess the obvious question is will the vanilla Open-MPI work on K? Unfortunately no. Support for Tofu and the Fujitsu resource manager is not included in Open MPI. Takahiro Kawashima, MPI

Re: [OMPI devel] subcommunicator OpenMPI issues on K

2017-11-07 Thread Gilles Gouaillardet
Vanilla Open MPI has no support for the Tofu interconnect, nor for the Fujitsu batch manager. Also, iirc, TCP communication between two compute nodes is not always possible on the K computer. So the simple answer is no. Cheers, Gilles On 11/8/2017 10:37 AM, Christopher Samuel wrote: On

Re: [OMPI devel] subcommunicator OpenMPI issues on K

2017-11-07 Thread Christopher Samuel
On 08/11/17 12:30, Kawashima, Takahiro wrote: > As other people said, Fujitsu MPI used in K is based on old > Open MPI (v1.6.3 with bug fixes). I guess the obvious question is will the vanilla Open-MPI work on K? -- Christopher Samuel Senior Systems Administrator Melbourne

Re: [OMPI devel] subcommunicator OpenMPI issues on K

2017-11-07 Thread Kawashima, Takahiro
Samuel, I am a developer of Fujitsu MPI. Thanks for using the K computer. For official support, please consult with the helpdesk of K, as Gilles said. The helpdesk may have information based on past inquiries. If not, the inquiry will be forwarded to our team. As other people said, Fujitsu MPI

Re: [OMPI devel] subcommunicator OpenMPI issues on K

2017-11-07 Thread Gilles Gouaillardet
Folks, fwiw, I wrote a simple test program that mimics Open MPI qsort usage, and a given qsort invocation on 1 million keys always takes less than one second on K. So qsort() is unlikely to be blamed here. Cheers, Gilles On Wed, Nov 8, 2017 at 3:24 AM, George Bosilca
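
For readers who want to repeat that kind of check, here is a minimal sketch in C. It is not the test program mentioned above, and it is not Open MPI code: the `key_rank_t` record, `cmp_key_rank`, and `time_qsort_pairs` are hypothetical names chosen for illustration, and the random key distribution is an assumption. It only times a single libc `qsort()` call over n (key, rank) pairs.

```c
#include <assert.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical record: one (key, rank) pair per participating process. */
typedef struct {
    int key;
    int rank;
} key_rank_t;

/* Order by key first, then by rank to break ties. */
int cmp_key_rank(const void *a, const void *b)
{
    const key_rank_t *x = a;
    const key_rank_t *y = b;
    if (x->key != y->key)
        return (x->key > y->key) - (x->key < y->key);
    return (x->rank > y->rank) - (x->rank < y->rank);
}

/*
 * Fill n pairs with pseudo-random keys, sort them with qsort(), check the
 * result is ordered, and return the CPU time spent in qsort() in seconds.
 * Returns -1.0 if the buffer cannot be allocated.
 */
double time_qsort_pairs(size_t n, unsigned seed)
{
    key_rank_t *v = malloc(n * sizeof *v);
    if (v == NULL)
        return -1.0;

    srand(seed);
    for (size_t i = 0; i < n; i++) {
        v[i].key = rand();
        v[i].rank = (int)i;
    }

    clock_t t0 = clock();
    qsort(v, n, sizeof *v, cmp_key_rank);
    clock_t t1 = clock();

    /* Sanity check: every adjacent pair must be in order. */
    for (size_t i = 1; i < n; i++)
        assert(cmp_key_rank(&v[i - 1], &v[i]) <= 0);

    free(v);
    return (double)(t1 - t0) / CLOCKS_PER_SEC;
}
```

Calling `time_qsort_pairs(1000000, 42u)` from a small `main()` and printing the result gives a rough per-call figure for one million keys on whatever machine it runs on; timings will differ from system to system.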

Re: [OMPI devel] subcommunicator OpenMPI issues on K

2017-11-07 Thread Edgar Gabriel
My guess would be that both aspects (sorting + CID allocation) could be a problem. A long time back there was an effort to convert the sequence of allgather + qsort into a distributed sort (based on a paper by Moody et al. where he demonstrated the benefits of this approach). We didn't get

Re: [OMPI devel] subcommunicator OpenMPI issues on K

2017-11-07 Thread Samuel Williams
Although splitting into sqrt(p) teams of sqrt(p) is common for 2D SpMV (or 3D for LU/SparseLU), splitting by coarse grained parallelism (e.g. multiple concurrent solves on multiple RHS) is also possible. Here, the teams are
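
As a rough illustration of the sqrt(p)-by-sqrt(p) team layout mentioned above, the sketch below computes a color and key for each rank. The helper names `team_side` and `team_color_key` are hypothetical, and the row-major mapping of ranks to teams is an assumption, not necessarily what the benchmark does.

```c
#include <assert.h>

/* Return q such that q * q == p, or 0 if p is not a perfect square. */
int team_side(int p)
{
    int q = 0;
    while ((long long)(q + 1) * (q + 1) <= (long long)p)
        q++;
    return (q > 0 && q * q == p) ? q : 0;
}

/*
 * Assumed row-major layout: ranks 0..q-1 form team 0, q..2q-1 form team 1,
 * and so on. The color selects the team; the key orders ranks inside it.
 */
void team_color_key(int rank, int q, int *color, int *key)
{
    assert(q > 0);
    *color = rank / q;
    *key = rank % q;
}
```

In an MPI program the two values would then be passed to `MPI_Comm_split(comm, color, key, &team_comm)`, so each of the q teams ends up with q members ordered by key.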

Re: [OMPI devel] subcommunicator OpenMPI issues on K

2017-11-07 Thread George Bosilca
Samuel, You are right, we use qsort to sort the keys, but the qsort only applies to participants with the same color. So the complexity of the qsort might reach bottom only when most of the processes participate with the same color. What I think is the OMPI problem in this area is the selection

Re: [OMPI devel] subcommunicator OpenMPI issues on K

2017-11-07 Thread Samuel Williams
I'll ask my collaborators if they've submitted a ticket. (They have the accounts; built the code; ran the code; observed the issues.) I believe the issue on MPICH was a qsort issue and not an Allreduce issue. When this is coupled with the fact that it looked like qsort is called in

Re: [OMPI devel] subcommunicator OpenMPI issues on K

2017-11-07 Thread Gilles Gouaillardet
Samuel, The default MPI library on the K computer is Fujitsu MPI, and yes, it is based on Open MPI. /* fwiw, an alternative is RIKEN MPI, and it is MPICH based */ From a support perspective, this should be reported to the HPCI helpdesk http://www.hpci-office.jp/pages/e_support As far as I

[OMPI devel] subcommunicator OpenMPI issues on K

2017-11-07 Thread Samuel Williams
Some of my collaborators have had issues with one of my benchmarks at high concurrency (82K MPI procs) on the K machine in Japan. I believe K uses OpenMPI and the issue has been tracked to time in MPI_Comm_dup/Comm_split increasing quadratically with process concurrency. At 82K processes,