Hey guys,

I have installed OpenMPI v3.1.4 with UCX, gdrcopy and CUDA from source. When I load the module, here is the list of dependencies: [image: image.png]
To sanity-check this new build, I use the CUDA-aware OpenMPI example <https://github.com/NVIDIA-developer-blog/code-samples/tree/master/posts/cuda-aware-mpi-example> on GitHub. I originally opened ticket #21 <https://github.com/NVIDIA-developer-blog/code-samples/issues/21> there, but was advised to contact the OpenMPI user list.

When I run jacobi_cuda_normal_mpi with 4 processes, I get the output, followed by a lot of similar warning messages at the bottom. The warnings look like this (full stdout/stderr attached below):

[1557488926.059031] [r24g38:186803:0] memtype_cache.c:137 UCX WARN destroying inuse address:0x2b11a8000000

Should I be concerned about these UCX warnings? Is there a way I could inspect this further and fix the issue, so that stderr stays clean?

Kind regards,
Ehsan Moravveji
Topology size: 2 x 2
Local domain size (current node): 4096 x 4096
Global domain size (all nodes): 8192 x 8192
Starting Jacobi run with 4 processes using "Tesla P100-SXM2-16GB" GPUs (ECC enabled: 4 / 4):
Iteration: 0 - Residue: 0.250000
Iteration: 100 - Residue: 0.002397
Iteration: 200 - Residue: 0.001204
Iteration: 300 - Residue: 0.000804
Iteration: 400 - Residue: 0.000603
Iteration: 500 - Residue: 0.000483
Iteration: 600 - Residue: 0.000403
Iteration: 700 - Residue: 0.000345
Iteration: 800 - Residue: 0.000302
Iteration: 900 - Residue: 0.000269
Stopped after 1000 iterations with residue 0.000242
Total Jacobi run time: 1.2674 sec.
Average per-process communication time: 0.1054 sec.
Measured lattice updates: 52.92 GLU/s (total), 13.23 GLU/s (per process)
Measured FLOPS: 264.62 GFLOPS (total), 66.15 GFLOPS (per process)
Measured device bandwidth: 3.39 TB/s (total), 846.78 GB/s (per process)
[1557488926.059031] [r24g38:186803:0] memtype_cache.c:137 UCX WARN destroying inuse address:0x2b11a8000000
[1557488926.059057] [r24g38:186803:0] memtype_cache.c:137 UCX WARN destroying inuse address:0x2b11b0200000
[1557488926.059062] [r24g38:186803:0] memtype_cache.c:137 UCX WARN destroying inuse address:0x2b11b0208000
[1557488926.059030] [r24g38:186801:0] memtype_cache.c:137 UCX WARN destroying inuse address:0x2af544000000
[1557488926.059045] [r24g38:186801:0] memtype_cache.c:137 UCX WARN destroying inuse address:0x2af54c200000
[1557488926.059049] [r24g38:186801:0] memtype_cache.c:137 UCX WARN destroying inuse address:0x2af54c208000
[1557488926.059054] [r24g38:186801:0] memtype_cache.c:137 UCX WARN destroying inuse address:0x2af54c210000
[1557488926.059058] [r24g38:186801:0] memtype_cache.c:137 UCX WARN destroying inuse address:0x2af54c218000
[1557488926.059062] [r24g38:186801:0] memtype_cache.c:137 UCX WARN destroying inuse address:0x2af54c220000
[1557488926.059066] [r24g38:186801:0] memtype_cache.c:137 UCX WARN destroying inuse address:0x2af54e000000
[1557488926.059066] [r24g38:186803:0] memtype_cache.c:137 UCX WARN destroying inuse address:0x2b11b0210000
[1557488926.059071] [r24g38:186803:0] memtype_cache.c:137 UCX WARN destroying inuse address:0x2b11b0218000
[1557488926.059075] [r24g38:186803:0] memtype_cache.c:137 UCX WARN destroying inuse address:0x2b11b0220000
[1557488926.059079] [r24g38:186803:0] memtype_cache.c:137 UCX WARN destroying inuse address:0x2b11b2000000
[1557488926.059150] [r24g38:186802:0] memtype_cache.c:137 UCX WARN destroying inuse address:0x2b99a4000000
[1557488926.059177] [r24g38:186802:0] memtype_cache.c:137 UCX WARN destroying inuse address:0x2b99ac200000
[1557488926.059183] [r24g38:186802:0] memtype_cache.c:137 UCX WARN destroying inuse address:0x2b99ac208000
[1557488926.059188] [r24g38:186802:0] memtype_cache.c:137 UCX WARN destroying inuse address:0x2b99ac210000
[1557488926.059226] [r24g38:186802:0] memtype_cache.c:137 UCX WARN destroying inuse address:0x2b99ac218000
[1557488926.059231] [r24g38:186802:0] memtype_cache.c:137 UCX WARN destroying inuse address:0x2b99ac220000
[1557488926.059236] [r24g38:186802:0] memtype_cache.c:137 UCX WARN destroying inuse address:0x2b99ae000000
[1557488926.059150] [r24g38:186804:0] memtype_cache.c:137 UCX WARN destroying inuse address:0x2b4bf8000000
[1557488926.059177] [r24g38:186804:0] memtype_cache.c:137 UCX WARN destroying inuse address:0x2b4c00200000
[1557488926.059182] [r24g38:186804:0] memtype_cache.c:137 UCX WARN destroying inuse address:0x2b4c00208000
[1557488926.059187] [r24g38:186804:0] memtype_cache.c:137 UCX WARN destroying inuse address:0x2b4c00210000
[1557488926.059192] [r24g38:186804:0] memtype_cache.c:137 UCX WARN destroying inuse address:0x2b4c00218000
[1557488926.059226] [r24g38:186804:0] memtype_cache.c:137 UCX WARN destroying inuse address:0x2b4c00220000
[1557488926.059231] [r24g38:186804:0] memtype_cache.c:137 UCX WARN destroying inuse address:0x2b4c02000000
[1557488926.914026] [r24g38:186802:0] memtype_cache.c:137 UCX WARN destroying inuse address:0x2b99a4000000
[1557488926.914056] [r24g38:186802:0] memtype_cache.c:137 UCX WARN destroying inuse address:0x2b99ac200000
[1557488926.914068] [r24g38:186802:0] memtype_cache.c:137 UCX WARN destroying inuse address:0x2b99ac208000
[1557488926.914079] [r24g38:186802:0] memtype_cache.c:137 UCX WARN destroying inuse address:0x2b99ac210000
[1557488926.914089] [r24g38:186802:0] memtype_cache.c:137 UCX WARN destroying inuse address:0x2b99ac218000
[1557488926.914100] [r24g38:186802:0] memtype_cache.c:137 UCX WARN destroying inuse address:0x2b99ac220000
[1557488926.914110] [r24g38:186802:0] memtype_cache.c:137 UCX WARN destroying inuse address:0x2b99ae000000
[1557488926.914149] [r24g38:186804:0] memtype_cache.c:137 UCX WARN destroying inuse address:0x2b4bf8000000
[1557488926.914177] [r24g38:186804:0] memtype_cache.c:137 UCX WARN destroying inuse address:0x2b4c00200000
[1557488926.914227] [r24g38:186804:0] memtype_cache.c:137 UCX WARN destroying inuse address:0x2b4c00208000
[1557488926.914240] [r24g38:186804:0] memtype_cache.c:137 UCX WARN destroying inuse address:0x2b4c00210000
[1557488926.914251] [r24g38:186804:0] memtype_cache.c:137 UCX WARN destroying inuse address:0x2b4c00218000
[1557488926.914261] [r24g38:186804:0] memtype_cache.c:137 UCX WARN destroying inuse address:0x2b4c00220000
[1557488926.914271] [r24g38:186804:0] memtype_cache.c:137 UCX WARN destroying inuse address:0x2b4c02000000
[1557488926.996036] [r24g38:186803:0] memtype_cache.c:137 UCX WARN destroying inuse address:0x2b11a8000000
[1557488926.996063] [r24g38:186803:0] memtype_cache.c:137 UCX WARN destroying inuse address:0x2b11b0200000
[1557488926.996097] [r24g38:186803:0] memtype_cache.c:137 UCX WARN destroying inuse address:0x2b11b0208000
[1557488926.996108] [r24g38:186803:0] memtype_cache.c:137 UCX WARN destroying inuse address:0x2b11b0210000
[1557488926.996118] [r24g38:186803:0] memtype_cache.c:137 UCX WARN destroying inuse address:0x2b11b0218000
[1557488926.996127] [r24g38:186803:0] memtype_cache.c:137 UCX WARN destroying inuse address:0x2b11b0220000
[1557488926.996137] [r24g38:186803:0] memtype_cache.c:137 UCX WARN destroying inuse address:0x2b11b2000000
[1557488927.077068] [r24g38:186801:0] memtype_cache.c:137 UCX WARN destroying inuse address:0x2af544000000
[1557488927.077095] [r24g38:186801:0] memtype_cache.c:137 UCX WARN destroying inuse address:0x2af54c200000
[1557488927.077107] [r24g38:186801:0] memtype_cache.c:137 UCX WARN destroying inuse address:0x2af54c208000
[1557488927.077117] [r24g38:186801:0] memtype_cache.c:137 UCX WARN destroying inuse address:0x2af54c210000
[1557488927.077154] [r24g38:186801:0] memtype_cache.c:137 UCX WARN destroying inuse address:0x2af54c218000
[1557488927.077165] [r24g38:186801:0] memtype_cache.c:137 UCX WARN destroying inuse address:0x2af54c220000
[1557488927.077175] [r24g38:186801:0] memtype_cache.c:137 UCX WARN destroying inuse address:0x2af54e000000
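
As a first step toward inspecting the warnings, it may help to tally them per process and per address rather than reading them line by line. The following is only a rough sketch: the regular expression is an assumption based on the line format seen in the output above, and `summarize`, `sample`, `counts` and `per_pid` are hypothetical names, not part of UCX or OpenMPI. It says nothing about the cause of the warnings; it only shows whether the same addresses are reported repeatedly.

```python
import re
from collections import Counter

# Assumed line format, taken from the stderr above:
# [<timestamp>] [<host>:<pid>:<tid>] memtype_cache.c:<line> UCX WARN destroying inuse address:<hex>
WARN_RE = re.compile(
    r"\[(?P<ts>\d+\.\d+)\]\s+\[(?P<host>[^:\]]+):(?P<pid>\d+):\d+\]\s+"
    r"memtype_cache\.c:\d+\s+UCX\s+WARN\s+destroying\s+inuse\s+"
    r"address:(?P<addr>0x[0-9a-fA-F]+)"
)


def summarize(text):
    """Count how many times each (pid, address) pair is reported."""
    counts = Counter()
    # finditer tolerates wrapped lines, since \s+ also matches newlines.
    for match in WARN_RE.finditer(text):
        counts[(int(match.group("pid")), match.group("addr"))] += 1
    return counts


# Three warning lines copied from the output above, used as sample input.
sample = """
[1557488926.059031] [r24g38:186803:0] memtype_cache.c:137 UCX WARN destroying inuse address:0x2b11a8000000
[1557488926.059030] [r24g38:186801:0] memtype_cache.c:137 UCX WARN destroying inuse address:0x2af544000000
[1557488926.996036] [r24g38:186803:0] memtype_cache.c:137 UCX WARN destroying inuse address:0x2b11a8000000
"""

counts = summarize(sample)

# Total number of warnings per process ID.
per_pid = Counter()
for (pid, _addr), n in counts.items():
    per_pid[pid] += n

for (pid, addr), n in sorted(counts.items()):
    print(f"pid {pid}  {addr}  reported {n}x")
```

To use it on a real run, one could redirect stderr to a file and pass the file contents to `summarize` instead of `sample`.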
_______________________________________________ users mailing list users@lists.open-mpi.org https://lists.open-mpi.org/mailman/listinfo/users