Re: [OMPI devel] IBCM error
Fixed in https://svn.open-mpi.org/trac/ompi/changeset/18897

Are there any other known IBCM issues?

Regards,
Pasha

Jeff Squyres wrote:
> I think you said opposite things: Lenny's command line did not specifically ask for ibcm, but it was used anyway. Lenny -- did you explicitly request it somewhere else (e.g., env var or MCA param file)?
>
> I suspect that you did not; I suspect (without looking at the code again) that ibcm tried to select itself and failed on the ibcm_listen() call, so it fell back to oob. This might have to be another workaround in OMPI, perhaps something like this:
>
>     if (ibcm_listen() fails)
>         if (ibcm explicitly requested)
>             print_warning()
>         fail to use ibcm
>
> Has this been filed as a bug at openfabrics.org? I don't think that I filed it when Brad and I were testing on RoadRunner -- it would probably be good if someone filed it.
>
> On Jul 13, 2008, at 8:56 AM, Lenny Verkhovsky wrote:
>
>> Pasha is right, I didn't disable it.
>>
>> On 7/13/08, Pavel Shamis (Pasha) wrote:
>>> Jeff Squyres wrote:
>>>> Brad and I did some scale testing of IBCM and saw this error sometimes. It seemed to happen with higher frequency when you increased the number of processes on a single node.
>>>>
>>>> I talked to Sean Hefty about it, but we never figured out a definitive cause or solution. My best guess is that there is something wonky about multiple processes simultaneously interacting with the IBCM kernel driver from userspace; but I don't know jack about kernel stuff, so that's a total SWAG.
>>>>
>>>> Thanks for reminding me of this issue; I admit that I had forgotten about it. :-( Pasha -- should IBCM not be the default?
>>>
>>> It is not the default. I guess Lenny configured it explicitly, didn't he?
>>>
>>> Pasha.
>>>
>>>> On Jul 13, 2008, at 7:08 AM, Lenny Verkhovsky wrote:
>>>>> Hi,
>>>>>
>>>>> I am getting this error sometimes.
>>>>>
>>>>> /home/USERS/lenny/OMPI_COMP_PATH/bin/mpirun -np 100 -hostfile /home/USERS/lenny/TESTS/COMPILERS/hostfile /home/USERS/lenny/TESTS/COMPILERS/hello
>>>>> [witch24][[32428,1],96][../../../../../ompi/mca/btl/openib/connect/btl_openib_connect_ibcm.c:769:ibcm_component_query] failed to ib_cm_listen 10 times: rc=-1, errno=22
>>>>> Hello world! I'm 0 of 100 on witch2
>>>>>
>>>>> Best Regards
>>>>> Lenny.

___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel
Re: [OMPI devel] IBCM error
I think you said opposite things: Lenny's command line did not specifically ask for ibcm, but it was used anyway. Lenny -- did you explicitly request it somewhere else (e.g., env var or MCA param file)?

I suspect that you did not; I suspect (without looking at the code again) that ibcm tried to select itself and failed on the ibcm_listen() call, so it fell back to oob. This might have to be another workaround in OMPI, perhaps something like this:

    if (ibcm_listen() fails)
        if (ibcm explicitly requested)
            print_warning()
        fail to use ibcm

Has this been filed as a bug at openfabrics.org? I don't think that I filed it when Brad and I were testing on RoadRunner -- it would probably be good if someone filed it.

On Jul 13, 2008, at 8:56 AM, Lenny Verkhovsky wrote:

> Pasha is right, I didn't disable it.
>
> On 7/13/08, Pavel Shamis (Pasha) wrote:
>> Jeff Squyres wrote:
>>> Brad and I did some scale testing of IBCM and saw this error sometimes. It seemed to happen with higher frequency when you increased the number of processes on a single node.
>>>
>>> I talked to Sean Hefty about it, but we never figured out a definitive cause or solution. My best guess is that there is something wonky about multiple processes simultaneously interacting with the IBCM kernel driver from userspace; but I don't know jack about kernel stuff, so that's a total SWAG.
>>>
>>> Thanks for reminding me of this issue; I admit that I had forgotten about it. :-( Pasha -- should IBCM not be the default?
>>
>> It is not the default. I guess Lenny configured it explicitly, didn't he?
>>
>> Pasha.
>>
>>> On Jul 13, 2008, at 7:08 AM, Lenny Verkhovsky wrote:
>>>> Hi,
>>>>
>>>> I am getting this error sometimes.
>>>>
>>>> /home/USERS/lenny/OMPI_COMP_PATH/bin/mpirun -np 100 -hostfile /home/USERS/lenny/TESTS/COMPILERS/hostfile /home/USERS/lenny/TESTS/COMPILERS/hello
>>>> [witch24][[32428,1],96][../../../../../ompi/mca/btl/openib/connect/btl_openib_connect_ibcm.c:769:ibcm_component_query] failed to ib_cm_listen 10 times: rc=-1, errno=22
>>>> Hello world! I'm 0 of 100 on witch2
>>>>
>>>> Best Regards
>>>> Lenny.

--
Jeff Squyres
Cisco Systems
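As a rough illustration of the workaround sketched in pseudocode in the message above, the selection logic might look something like the following in C. This is only a sketch under stated assumptions: every name here (select_cpc, listen_fn_t, CPC_IBCM, CPC_OOB, LISTEN_MAX_TRIES) is hypothetical and is not an Open MPI symbol, and the retry count of 10 is simply taken from the log line quoted in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical names throughout; none of these are Open MPI symbols. */
typedef enum { CPC_IBCM, CPC_OOB } cpc_choice_t;

/* Stand-in for the real listen call: returns 0 on success, -1 on failure. */
typedef int (*listen_fn_t)(void);

/* Assumption: 10 attempts, matching "failed to ib_cm_listen 10 times". */
#define LISTEN_MAX_TRIES 10

/*
 * Try to bring up ibcm. If listening never succeeds, warn only when the
 * user explicitly asked for ibcm, and in either case decline ibcm so the
 * caller falls back to oob.
 */
static cpc_choice_t select_cpc(listen_fn_t try_listen, bool ibcm_requested,
                               int *warnings)
{
    assert(try_listen != NULL);

    for (int i = 0; i < LISTEN_MAX_TRIES; ++i) {
        if (try_listen() == 0) {
            return CPC_IBCM;
        }
    }

    if (ibcm_requested) {
        fprintf(stderr,
                "warning: ibcm was requested but listen failed; using oob\n");
        if (warnings != NULL) {
            ++*warnings;
        }
    }
    return CPC_OOB;
}

/* Trivial stand-ins used to exercise both paths. */
static int listen_always_fails(void) { return -1; }
static int listen_succeeds(void) { return 0; }
```

With these stand-ins, select_cpc(listen_always_fails, false, NULL) falls back to oob silently, while passing true for ibcm_requested prints the warning first. Keeping the warning conditional avoids noise for users who never asked for ibcm.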
Re: [OMPI devel] IBCM error
Pasha is right, I didn't disable it.

On 7/13/08, Pavel Shamis (Pasha) wrote:
> Jeff Squyres wrote:
>> Brad and I did some scale testing of IBCM and saw this error sometimes. It seemed to happen with higher frequency when you increased the number of processes on a single node.
>>
>> I talked to Sean Hefty about it, but we never figured out a definitive cause or solution. My best guess is that there is something wonky about multiple processes simultaneously interacting with the IBCM kernel driver from userspace; but I don't know jack about kernel stuff, so that's a total SWAG.
>>
>> Thanks for reminding me of this issue; I admit that I had forgotten about it. :-( Pasha -- should IBCM not be the default?
>
> It is not the default. I guess Lenny configured it explicitly, didn't he?
>
> Pasha.
>
>> On Jul 13, 2008, at 7:08 AM, Lenny Verkhovsky wrote:
>>> Hi,
>>>
>>> I am getting this error sometimes.
>>>
>>> /home/USERS/lenny/OMPI_COMP_PATH/bin/mpirun -np 100 -hostfile /home/USERS/lenny/TESTS/COMPILERS/hostfile /home/USERS/lenny/TESTS/COMPILERS/hello
>>> [witch24][[32428,1],96][../../../../../ompi/mca/btl/openib/connect/btl_openib_connect_ibcm.c:769:ibcm_component_query] failed to ib_cm_listen 10 times: rc=-1, errno=22
>>> Hello world! I'm 0 of 100 on witch2
>>>
>>> Best Regards
>>> Lenny.
Re: [OMPI devel] IBCM error
Jeff Squyres wrote:
> Brad and I did some scale testing of IBCM and saw this error sometimes. It seemed to happen with higher frequency when you increased the number of processes on a single node.
>
> I talked to Sean Hefty about it, but we never figured out a definitive cause or solution. My best guess is that there is something wonky about multiple processes simultaneously interacting with the IBCM kernel driver from userspace; but I don't know jack about kernel stuff, so that's a total SWAG.
>
> Thanks for reminding me of this issue; I admit that I had forgotten about it. :-( Pasha -- should IBCM not be the default?

It is not the default. I guess Lenny configured it explicitly, didn't he?

Pasha.

> On Jul 13, 2008, at 7:08 AM, Lenny Verkhovsky wrote:
>> Hi,
>>
>> I am getting this error sometimes.
>>
>> /home/USERS/lenny/OMPI_COMP_PATH/bin/mpirun -np 100 -hostfile /home/USERS/lenny/TESTS/COMPILERS/hostfile /home/USERS/lenny/TESTS/COMPILERS/hello
>> [witch24][[32428,1],96][../../../../../ompi/mca/btl/openib/connect/btl_openib_connect_ibcm.c:769:ibcm_component_query] failed to ib_cm_listen 10 times: rc=-1, errno=22
>> Hello world! I'm 0 of 100 on witch2
>>
>> Best Regards
>> Lenny.
Re: [OMPI devel] IBCM error
Brad and I did some scale testing of IBCM and saw this error sometimes. It seemed to happen with higher frequency when you increased the number of processes on a single node.

I talked to Sean Hefty about it, but we never figured out a definitive cause or solution. My best guess is that there is something wonky about multiple processes simultaneously interacting with the IBCM kernel driver from userspace; but I don't know jack about kernel stuff, so that's a total SWAG.

Thanks for reminding me of this issue; I admit that I had forgotten about it. :-( Pasha -- should IBCM not be the default?

On Jul 13, 2008, at 7:08 AM, Lenny Verkhovsky wrote:

> Hi,
>
> I am getting this error sometimes.
>
> /home/USERS/lenny/OMPI_COMP_PATH/bin/mpirun -np 100 -hostfile /home/USERS/lenny/TESTS/COMPILERS/hostfile /home/USERS/lenny/TESTS/COMPILERS/hello
> [witch24][[32428,1],96][../../../../../ompi/mca/btl/openib/connect/btl_openib_connect_ibcm.c:769:ibcm_component_query] failed to ib_cm_listen 10 times: rc=-1, errno=22
> Hello world! I'm 0 of 100 on witch2
>
> Best Regards
> Lenny.

--
Jeff Squyres
Cisco Systems
[OMPI devel] IBCM error
Hi,

I am getting this error sometimes.

/home/USERS/lenny/OMPI_COMP_PATH/bin/mpirun -np 100 -hostfile /home/USERS/lenny/TESTS/COMPILERS/hostfile /home/USERS/lenny/TESTS/COMPILERS/hello
[witch24][[32428,1],96][../../../../../ompi/mca/btl/openib/connect/btl_openib_connect_ibcm.c:769:ibcm_component_query] failed to ib_cm_listen 10 times: rc=-1, errno=22
Hello world! I'm 0 of 100 on witch2

Best Regards
Lenny.
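For readers decoding the log line above: on Linux, errno 22 is EINVAL, which glibc describes as "Invalid argument". The following is a minimal sketch of a helper that adds that description to a report of the same shape. The function name format_listen_error is hypothetical and is not how Open MPI produces its message.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: writes a report shaped like the log line in this
 * thread, with the textual meaning of errno appended. Returns the value
 * of snprintf (the number of characters that the full message needs).
 */
static int format_listen_error(char *buf, size_t len, int tries, int rc,
                               int err)
{
    assert(buf != NULL);
    return snprintf(buf, len,
                    "failed to listen %d times: rc=%d, errno=%d (%s)",
                    tries, rc, err, strerror(err));
}
```

Called with tries=10, rc=-1 and err=EINVAL, this produces a message that carries both the numeric code and its description, which may make reports like the one above easier to triage.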