Hi, I compiled Open MPI 1.6 on a 64-bit Solaris UltraSPARC machine. Compilation and installation worked without a problem. However, whenever I try to run an application with mpirun, I get the following error:
[hostname:14798] [[50433,0],0] rmcast:init: setsockopt() failed on MULTICAST_IF
	for multicast network xxx.xxx.xxx.xxx interface xxx.xxx.xxx.xxx
	Error: Invalid argument (22)
[hostname:14798] [[50433,0],0] ORTE_ERROR_LOG: Error in file ../../../../../openmpi-1.6/orte/mca/rmcast/udp/rmcast_udp.c at line 825
[hostname:14798] [[50433,0],0] ORTE_ERROR_LOG: Error in file ../../../../../openmpi-1.6/orte/mca/rmcast/udp/rmcast_udp.c at line 744
[hostname:14798] [[50433,0],0] ORTE_ERROR_LOG: Error in file ../../../../../openmpi-1.6/orte/mca/rmcast/udp/rmcast_udp.c at line 193
[hostname:14798] [[50433,0],0] ORTE_ERROR_LOG: Error in file ../../../../openmpi-1.6/orte/mca/rmcast/base/rmcast_base_select.c at line 56
[hostname:14798] [[50433,0],0] ORTE_ERROR_LOG: Error in file ../../../../../openmpi-1.6/orte/mca/ess/hnp/ess_hnp_module.c at line 233
--------------------------------------------------------------------------
It looks like orte_init failed for some reason; your parallel process is likely to abort. There are many reasons that a parallel process can fail during orte_init; some of which are due to configuration or environment problems.
This failure appears to be an internal failure; here's some additional information (which may only be relevant to an Open MPI developer):

  orte_rmcast_base_select failed
  --> Returned value Error (-1) instead of ORTE_SUCCESS

After some digging, I found that the following patch seems to fix the problem (at least the application now seems to run correctly):

--- a/orte/mca/rmcast/udp/rmcast_udp.c	Tue Apr 3 16:30:29 2012
+++ b/orte/mca/rmcast/udp/rmcast_udp.c	Mon Jul 30 15:12:02 2012
@@ -936,9 +936,16 @@
             }
         } else {
             /* on the xmit side, need to set the interface */
+            void const *addrptr;
             memset(&inaddr, 0, sizeof(inaddr));
             inaddr.sin_addr.s_addr = htonl(chan->interface);
+#ifdef __sun
+            addrlen = sizeof(inaddr.sin_addr);
+            addrptr = (void *)&inaddr.sin_addr;
+#else
             addrlen = sizeof(struct sockaddr_in);
+            addrptr = (void *)&inaddr;
+#endif
 
             OPAL_OUTPUT_VERBOSE((2, orte_rmcast_base.rmcast_output,
                                  "setup:socket:xmit interface %03d.%03d.%03d.%03d",
@@ -945,7 +952,7 @@
                                  OPAL_IF_FORMAT_ADDR(chan->interface)));
 
             if ((setsockopt(target_sd, IPPROTO_IP, IP_MULTICAST_IF,
-                            (void *)&inaddr, addrlen)) < 0) {
+                            addrptr, addrlen)) < 0) {
                 opal_output(0, "%s rmcast:init: setsockopt() failed on MULTICAST_IF\n"
                             "\tfor multicast network %03d.%03d.%03d.%03d interface %03d.%03d.%03d.%03d\n\tError: %s (%d)",
                             ORTE_NAME_PRINT(ORTE_PROC_MY_NAME),

Can anybody confirm that the patch is correct, and in particular that the '__sun' part is the right thing to do?

Thanks,
Daniel
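For comparison, here is a minimal standalone sketch (not the Open MPI code) of selecting the multicast transmit interface by passing a bare struct in_addr with its exact size, which is the argument type the Solaris ip(7P) and Linux ip(7) man pages document for IP_MULTICAST_IF (Linux additionally accepts struct ip_mreqn). If that holds on every platform Open MPI supports, the '__sun' branch of the patch might not need an #ifdef at all; that is an assumption to check, not a confirmed fact. The helper names set_multicast_if, get_multicast_if and open_multicast_sender are hypothetical, and the interface address is assumed to be in host byte order, like chan->interface in the patch.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <errno.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: select the outgoing IPv4 multicast interface on `sd`.
 * `iface` is the interface address in host byte order (assumption).
 * Returns 0 on success, -1 with errno set by setsockopt() on failure. */
static int set_multicast_if(int sd, uint32_t iface)
{
    struct in_addr addr;

    memset(&addr, 0, sizeof(addr));
    addr.s_addr = htonl(iface);

    /* Pass a struct in_addr and sizeof(struct in_addr), not a
     * struct sockaddr_in; the reported Solaris failure was EINVAL
     * when the larger sockaddr_in was passed. */
    return setsockopt(sd, IPPROTO_IP, IP_MULTICAST_IF,
                      (const void *)&addr, (socklen_t)sizeof(addr));
}

/* Hypothetical helper: read back the currently selected multicast
 * interface into *iface (host byte order). Returns 0 or -1. */
static int get_multicast_if(int sd, uint32_t *iface)
{
    struct in_addr addr;
    socklen_t len = (socklen_t)sizeof(addr);

    assert(iface != NULL);
    if (getsockopt(sd, IPPROTO_IP, IP_MULTICAST_IF, (void *)&addr, &len) < 0) {
        return -1;
    }
    *iface = ntohl(addr.s_addr);
    return 0;
}

/* Hypothetical helper: create a UDP socket for multicast transmission
 * on `iface`. Returns the descriptor, or -1 with errno preserved. */
static int open_multicast_sender(uint32_t iface)
{
    int sd = socket(AF_INET, SOCK_DGRAM, 0);

    if (sd < 0) {
        return -1;
    }
    if (set_multicast_if(sd, iface) < 0) {
        int saved = errno; /* close() may overwrite errno */
        close(sd);
        errno = saved;
        return -1;
    }
    return sd;
}
```

Calling open_multicast_sender(INADDR_ANY) leaves the interface choice to the kernel; passing a specific local address such as the one in chan->interface pins outgoing multicast to that interface, and get_multicast_if() reads the selection back as a struct in_addr.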