On 14 June 2022 at 09:38, Serguei Sokol wrote:
| Hi,
|
| Probably, this issue would be better posted here
| https://github.com/openucx/ucx/issues
Seconded. I don't have much to add: these MPI examples are ten years old,
have worked all the time as far as we know, and haven't changed. Maybe some
base assumption in the libraries is different.

And if I may, I am not sure how much you are gaining here. You are fitting
each time series individually with R. That won't be fast. You will gain
something -- but maybe not _that_ much -- by spreading MPI around it. If speed
is of the essence, maybe you need an equivalent of auto.arima in compiled code.

As a debugging measure, I'd also try to see if the issue goes away when you
call less demanding R code: maybe auto.arima leads to multithreaded code
inside each call, which may upset the R stack called from MPI?

Also, rinside_mpi_sample4.cpp is (per the single commit in its history) a
contributed example. Maybe its authors can help.

Sorry, no smoking gun or immediate help.

Dirk

| Best,
| Serguei.
|
| On 14/06/2022 at 07:24, Maddegedara Lalith wrote:
| > Hello,
| >
| > I want to use RInside in my C++ based MPI application to do time series
| > forecasting using the auto.arima library of R. The RInside instance in
| > each MPI rank is expected to do an independent calculation (e.g. time
| > series forecast).
| >
| > With one MPI rank, it always completes without producing any error.
| > However, with more than 1 mpi ranks, it produces the following error.
| > Depending on the run, different numbers of mpi ranks produce the same
| > error. On rare occasions, all the ranks successfully complete the
| > execution. Further, I found that even your example
| > "rinside_mpi_sample4.cpp" produces the same error.
| >
| > I am using the Intel MPI library (version 2021.1). I tried
| > compiling with icpc and g++. Both produced the same error.
| > Could you please help me to solve this problem.
| >
| > With best regards
| > Lal
| >
| > [ibis:14878:0:14992] ud_ep.c:565 Assertion `ep->dest_ep_id ==
| > UCT_UD_EP_NULL_ID || ep->dest_ep_id == ctl->conn_rep.src_ep_id' failed
| >
| > ==== backtrace (tid: 14994) ====
| > 0 0x000000000004d455 ucs_debug_print_backtrace() ???:0
| > 1 0x0000000000042b5f uct_ud_ep_process_rx() ???:0
| > 2 0x00000000000471cd uct_ud_mlx5_ep_t_delete() ???:0
| > 3 0x000000000003ebdf uct_ud_iface_release_desc() ???:0
| > 4 0x0000000000040436 ucs_cpu_get_memcpy_bw() ???:0
| > 5 0x000000000004050b ucs_cpu_get_memcpy_bw() ???:0
| > 6 0x0000000000041343 ucs_async_dispatch_handlers() ???:0
| > 7 0x0000000000041488 ucs_async_dispatch_timerq() ???:0
| > 8 0x0000000000043c34 ucs_async_pipe_drain() ???:0
| > 9 0x0000000000007ea5 start_thread() pthread_create.c:0
| > 10 0x00000000000fe96d __clone() ???:0
| > =================================
| >
| > _______________________________________________
| > Rcpp-devel mailing list
| > Rcpp-devel@lists.r-forge.r-project.org
| > https://lists.r-forge.r-project.org/cgi-bin/mailman/listinfo/rcpp-devel

--
dirk.eddelbuettel.com | @eddelbuettel | e...@debian.org
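
One way to act on the "multithreaded code inside each call" hunch from the
reply above is to ask the usual threading runtimes to stay single-threaded
before R starts in each rank, and to see whether the assertion still fires.
The sketch below is only that, a sketch: limit_threads_to_one() is a
hypothetical helper (not part of RInside, Rcpp or MPI), and it is an
assumption that auto.arima or the BLAS underneath it honours any of these
environment variables on a given system.

```cpp
#include <cstdlib>   // ::setenv (POSIX), std::getenv
#include <string>
#include <vector>

// Hypothetical helper, not part of RInside or MPI: request one thread from
// OpenMP, OpenBLAS and MKL. It has to run before R (and any BLAS/OpenMP
// runtime) is initialised, because these runtimes typically read the
// variables once at start-up.
inline bool limit_threads_to_one() {
    const std::vector<std::string> vars = {
        "OMP_NUM_THREADS", "OPENBLAS_NUM_THREADS", "MKL_NUM_THREADS"};
    bool ok = true;
    for (const auto& name : vars) {
        // Third argument 1 means: overwrite a value that is already set.
        if (::setenv(name.c_str(), "1", 1) != 0) {
            ok = false;
        }
    }
    return ok;
}

// Assumed call order in an MPI + RInside program (not exercised here):
//   limit_threads_to_one();
//   MPI_Init(&argc, &argv);
//   RInside R(argc, argv);
//   R.parseEvalQ("x <- mean(rnorm(100))");  // trivial R code, no auto.arima
```

If the error persists with trivial R code and one thread per rank, that would
point away from R and towards the MPI/UCX layer, which is where the backtrace
sits anyway.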