Re: [OMPI users] [Open MPI Announce] Open MPI 4.0.0 Released
Dear Howard,

just want to report back that using XPMEM with UCX got me a working SHMEM again. Thanks.

Best,
Bert

On Wed, Nov 14, 2018 at 12:09 PM Howard Pritchard wrote:
>
> Hi Bert,
>
> If you'd prefer to return to the land of convenience and don't need to mix MPI
> and OpenSHMEM, then you may want to try the path I outlined in the email
> archived at the following link
>
> https://www.mail-archive.com/users@lists.open-mpi.org/msg32274.html
>
> Howard
>
> [...]
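[Note: a quick way to confirm that the rebuilt UCX actually picked up XPMEM
is to ask ucx_info for its transport list, and to force OSHMEM's UCX SPML
explicitly when running. A sketch, assuming UCX's and Open MPI's bin
directories are on PATH; the grep pattern and the explicit "--mca spml ucx"
selection are illustrative, not taken from the thread:

  $ ucx_info -d | grep -i xpmem          # an xpmem transport should be listed
  $ oshrun --mca spml ucx -np 2 ./shmem_hello_world-4.0.0

If nothing matches, UCX was most likely configured without XPMEM support and
needs to be rebuilt against it, as described later in the thread.]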
Re: [OMPI users] [Open MPI Announce] Open MPI 4.0.0 Released
I really need to update that wording. It has been a while and the code seems
to have stabilized. It's quite safe to use and supports some of the latest
kernel versions.

-Nathan

> On Nov 13, 2018, at 11:06 PM, Bert Wesarg via users wrote:
>
> the first words from the README aren't very pleasant to read:
>
> This is an experimental version of XPMEM [...] *Keep in mind there may be
> bugs and this version may cause kernel panics, code crashes, eat your cat,
> etc.*
>
> Installing this on my laptop, where I just want to develop with SHMEM,
> it would be a pity to lose work just because of that.
>
> [...]
Re: [OMPI users] [Open MPI Announce] Open MPI 4.0.0 Released
Hi Bert,

If you'd prefer to return to the land of convenience and don't need to mix MPI
and OpenSHMEM, then you may want to try the path I outlined in the email
archived at the following link

https://www.mail-archive.com/users@lists.open-mpi.org/msg32274.html

Howard

On Tue., Nov. 13, 2018 at 23:10, Bert Wesarg via users
<users@lists.open-mpi.org> wrote:
> Dear Takahiro,
>
> the first words from the README aren't very pleasant to read:
>
> This is an experimental version of XPMEM [...] *Keep in mind there may be
> bugs and this version may cause kernel panics, code crashes, eat your cat,
> etc.*
>
> Installing this on my laptop, where I just want to develop with SHMEM,
> it would be a pity to lose work just because of that.
>
> Best,
> Bert
>
> [...]
Re: [OMPI users] [Open MPI Announce] Open MPI 4.0.0 Released
Dear Takahiro,

On Wed, Nov 14, 2018 at 5:38 AM Kawashima, Takahiro wrote:
>
> XPMEM moved to GitLab.
>
> https://gitlab.com/hjelmn/xpmem

the first words from the README aren't very pleasant to read:

This is an experimental version of XPMEM based on a version provided by
Cray and uploaded to https://code.google.com/p/xpmem. This version supports
any kernel 3.12 and newer. *Keep in mind there may be bugs and this version
may cause kernel panics, code crashes, eat your cat, etc.*

Installing this on my laptop, where I just want to develop with SHMEM,
it would be a pity to lose work just because of that.

Best,
Bert

> Thanks,
> Takahiro Kawashima,
> Fujitsu
>
> [...]
Re: [OMPI users] [Open MPI Announce] Open MPI 4.0.0 Released
Howard,

On Wed, Nov 14, 2018 at 5:26 AM Howard Pritchard wrote:
>
> Hello Bert,
>
> What OS are you running on your notebook?

Ubuntu 18.04

> If you are running Linux, and you have root access to your system, then
> you should be able to resolve the Open SHMEM support issue by installing
> the XPMEM device driver on your system, and rebuilding UCX so it picks
> up XPMEM support.
>
> The source code is on GitHub:
>
> https://github.com/hjelmn/xpmem
>
> Some instructions on how to build the xpmem device driver are at
>
> https://github.com/hjelmn/xpmem/wiki/Installing-XPMEM
>
> You will need to install the kernel source and symbols rpms on your
> system before building the xpmem device driver.

I will try that. I already tried KNEM, which also did not work. Though
that's definitely leaving the country of convenience. For a development
machine where performance doesn't matter, it's a huge step back for
Open MPI, I think. I will report back if that works. Thanks.

Best,
Bert

> Hope this helps,
>
> Howard
>
> [...]
Re: [OMPI users] [Open MPI Announce] Open MPI 4.0.0 Released
XPMEM moved to GitLab.

https://gitlab.com/hjelmn/xpmem

Thanks,
Takahiro Kawashima,
Fujitsu

> Hello Bert,
>
> What OS are you running on your notebook?
>
> If you are running Linux, and you have root access to your system, then
> you should be able to resolve the Open SHMEM support issue by installing
> the XPMEM device driver on your system, and rebuilding UCX so it picks
> up XPMEM support.
>
> The source code is on GitHub:
>
> https://github.com/hjelmn/xpmem
>
> [...]
Re: [OMPI users] [Open MPI Announce] Open MPI 4.0.0 Released
Hello Bert,

What OS are you running on your notebook?

If you are running Linux, and you have root access to your system, then
you should be able to resolve the Open SHMEM support issue by installing
the XPMEM device driver on your system, and rebuilding UCX so it picks
up XPMEM support.

The source code is on GitHub:

https://github.com/hjelmn/xpmem

Some instructions on how to build the xpmem device driver are at

https://github.com/hjelmn/xpmem/wiki/Installing-XPMEM

You will need to install the kernel source and symbols rpms on your
system before building the xpmem device driver.

Hope this helps,

Howard

On Tue., Nov. 13, 2018 at 15:00, Bert Wesarg via users
<users@lists.open-mpi.org> wrote:
> Hi,
>
> [...]
>
> so what is the most convenient way to get SHMEM working on a single
> shared memory node (a.k.a. a notebook)? I just realized that I haven't
> had a working SHMEM since Open MPI 3.0. But building with UCX does not
> help either.
>
> [...]
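[Note: the wiki page linked above is the authoritative reference. As a rough
sketch of the whole chain on a generic Linux machine -- the prefixes, the
module path, and the chmod step are illustrative assumptions, not taken from
the thread:

  # build and install the XPMEM kernel module and user-space library
  $ git clone https://github.com/hjelmn/xpmem && cd xpmem
  $ ./autogen.sh && ./configure --prefix=/opt/xpmem
  $ make && sudo make install
  $ sudo insmod kernel/xpmem.ko      # location of the built module may differ
  $ sudo chmod 666 /dev/xpmem        # let non-root processes open the device

  # rebuild UCX so its configure step detects XPMEM
  $ cd ../ucx-1.4.0
  $ ./configure --prefix=/opt/ucx --with-xpmem=/opt/xpmem
  $ make && make install

  # rebuild Open MPI against that UCX
  $ cd ../openmpi-4.0.0
  $ ./configure --prefix=/opt/ompi --with-ucx=/opt/ucx
  $ make && make install

Building the kernel module requires the headers/source for the running
kernel, as Howard notes above.]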
[OMPI users] [Open MPI Announce] Open MPI 4.0.0 Released
Hi,

On Mon, Nov 12, 2018 at 10:49 PM Pritchard Jr., Howard via announce wrote:
>
> The Open MPI Team, representing a consortium of research, academic, and
> industry partners, is pleased to announce the release of Open MPI version
> 4.0.0.
>
> v4.0.0 is the start of a new release series for Open MPI. Starting with
> this release, the OpenIB BTL supports only iWARP and RoCE by default.
> Starting with this release, UCX is the preferred transport protocol
> for InfiniBand interconnects. The embedded PMIx runtime has been updated
> to 3.0.2. The embedded ROMIO has been updated to 3.2.1. This release is
> ABI compatible with the 3.x release streams. There have been numerous
> other bug fixes and performance improvements.
>
> Note that starting with Open MPI v4.0.0, prototypes for several
> MPI-1 symbols that were deleted in the MPI-3.0 specification
> (which was published in 2012) are no longer available by default in
> mpi.h. See the README for further details.
>
> Version 4.0.0 can be downloaded from the main Open MPI web site:
>
> https://www.open-mpi.org/software/ompi/v4.0/
>
>
> 4.0.0 -- September, 2018
>
> - OSHMEM updated to the OpenSHMEM 1.4 API.
> - Do not build OpenSHMEM layer when there are no SPMLs available.
>   Currently, this means the OpenSHMEM layer will only build if
>   a MXM or UCX library is found.

so what is the most convenient way to get SHMEM working on a single
shared memory node (a.k.a. a notebook)? I just realized that I haven't had
a working SHMEM since Open MPI 3.0. But building with UCX does not help
either. I tried with UCX 1.4, but Open MPI SHMEM still does not work:

$ oshcc -o shmem_hello_world-4.0.0 openmpi-4.0.0/examples/hello_oshmem_c.c
$ oshrun -np 2 ./shmem_hello_world-4.0.0
[1542109710.217344] [tudtug:27715:0]  select.c:406  UCX ERROR
no remote registered memory access transport to tudtug:27716:
self/self - Destination is unreachable, tcp/enp0s31f6 - no put short,
tcp/wlp61s0 - no put short, mm/sysv - Destination is unreachable,
mm/posix - Destination is unreachable, cma/cma - no put short
[1542109710.217344] [tudtug:27716:0]  select.c:406  UCX ERROR
no remote registered memory access transport to tudtug:27715:
self/self - Destination is unreachable, tcp/enp0s31f6 - no put short,
tcp/wlp61s0 - no put short, mm/sysv - Destination is unreachable,
mm/posix - Destination is unreachable, cma/cma - no put short
[tudtug:27715] ../../../../../oshmem/mca/spml/ucx/spml_ucx.c:266
Error: ucp_ep_create(proc=1/2) failed: Destination is unreachable
[tudtug:27715] ../../../../../oshmem/mca/spml/ucx/spml_ucx.c:305
Error: add procs FAILED rc=-2
[tudtug:27716] ../../../../../oshmem/mca/spml/ucx/spml_ucx.c:266
Error: ucp_ep_create(proc=1/2) failed: Destination is unreachable
[tudtug:27716] ../../../../../oshmem/mca/spml/ucx/spml_ucx.c:305
Error: add procs FAILED rc=-2
--------------------------------------------------------------------------
It looks like SHMEM_INIT failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during SHMEM_INIT; some of which are due to configuration or
environment problems. This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open SHMEM developer):

  SPML add procs failed
  --> Returned "Out of resource" (-2) instead of "Success" (0)
--------------------------------------------------------------------------
[tudtug:27715] Error: pshmem_init.c:80 - _shmem_init() SHMEM failed to
initialize - aborting
[tudtug:27716] Error: pshmem_init.c:80 - _shmem_init() SHMEM failed to
initialize - aborting
--------------------------------------------------------------------------
SHMEM_ABORT was invoked on rank 0 (pid 27715, host=tudtug) with
errorcode -1.
--------------------------------------------------------------------------
A SHMEM process is aborting at a time when it cannot guarantee that all
of its peer processes in the job will be killed properly. You should
double check that everything has shut down cleanly.

  Local host: tudtug
  PID:        27715
--------------------------------------------------------------------------
Primary job terminated normally, but 1 process returned a non-zero exit
code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
oshrun detected that one or more processes exited with non-zero status,
thus causing the job to be terminated. The first process to do so was:

  Process name: [[2212,1],1]
  Exit code:    255
--------------------------------------------------------------------------
[tudtug:27710] 1 more process has sent help message
help-shmem-runtime.txt / shmem_init:startup:in
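[Note: for readers without the Open MPI source tree at hand, here is a
minimal OpenSHMEM hello world in the spirit of the bundled
hello_oshmem_c.c -- a sketch, not the bundled file itself:

  #include <stdio.h>
  #include <shmem.h>

  int main(void)
  {
      /* start the OpenSHMEM runtime */
      shmem_init();

      int me   = shmem_my_pe();   /* this PE's rank */
      int npes = shmem_n_pes();   /* total number of PEs */

      printf("Hello from PE %d of %d\n", me, npes);

      /* shut the runtime down cleanly */
      shmem_finalize();
      return 0;
  }

Compile and run it exactly as in the transcript above:

  $ oshcc -o hello hello.c
  $ oshrun -np 2 ./hello
]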