Re: [OMPI users] internal error with mpiJava in openmpi-1.9a1r27380
Like I said, I haven't tried any of that, so I have no idea if/how it would work. I don't have access to any hetero system and we don't see it very often at all, so it is quite possible the hetero support really isn't there. I'll look at some of the Java-specific issues later.

On Thu, Oct 11, 2012 at 12:51 AM, Siegmar Gross <siegmar.gr...@informatik.hs-fulda.de> wrote:

> Hi,
>
> > I haven't tried heterogeneous apps on the Java code yet - could well not
> > work. At the least, I would expect you need to compile your Java app
> > against the corresponding OMPI install on each architecture, and ensure the
> > right one gets run on each node. Even though it's a Java app, the classes
> > need to get linked against the proper OMPI code for that node.
> >
> > As for Linux-only operation: it works fine for me. Did you remember to (a)
> > build mpiexec on those linux machines (as opposed to using the Solaris
> > version), and (b) recompile your app against that OMPI installation?
>
> I didn't know that the classfiles are different, but it doesn't change
> anything if I create different classfiles. I use a small shell script
> to create all necessary files on all machines.
>
> tyr java 118 make_classfiles
> === rs0 ===
> ...
> mpijavac -d /home/fd1026/SunOS/sparc/mpi_classfiles MsgSendRecvMain.java
> mpijavac -d /home/fd1026/SunOS/sparc/mpi_classfiles ColumnSendRecvMain.java
> mpijavac -d /home/fd1026/SunOS/sparc/mpi_classfiles ColumnScatterMain.java
> mpijavac -d /home/fd1026/SunOS/sparc/mpi_classfiles EnvironVarMain.java
> === sunpc1 ===
> ...
> mpijavac -d /home/fd1026/SunOS/x86_64/mpi_classfiles MsgSendRecvMain.java
> mpijavac -d /home/fd1026/SunOS/x86_64/mpi_classfiles ColumnSendRecvMain.java
> mpijavac -d /home/fd1026/SunOS/x86_64/mpi_classfiles ColumnScatterMain.java
> mpijavac -d /home/fd1026/SunOS/x86_64/mpi_classfiles EnvironVarMain.java
> === linpc1 ===
> ...
> mpijavac -d /home/fd1026/Linux/x86_64/mpi_classfiles MsgSendRecvMain.java
> mpijavac -d /home/fd1026/Linux/x86_64/mpi_classfiles ColumnSendRecvMain.java
> mpijavac -d /home/fd1026/Linux/x86_64/mpi_classfiles ColumnScatterMain.java
> mpijavac -d /home/fd1026/Linux/x86_64/mpi_classfiles EnvironVarMain.java
>
> Every machine should now find its classfiles.
>
> tyr java 119 mpiexec -host sunpc0,linpc0,rs0 java EnvironVarMain
>
> Operating system: SunOS
> Processor architecture: x86_64
> CLASSPATH: ...:.:/home/fd1026/SunOS/x86_64/mpi_classfiles
>
> Operating system: Linux
> Processor architecture: x86_64
> CLASSPATH: ...:.:/home/fd1026/Linux/x86_64/mpi_classfiles
>
> Operating system: SunOS
> Processor architecture: sparc
> CLASSPATH: ...:.:/home/fd1026/SunOS/sparc/mpi_classfiles
>
> tyr java 120 mpiexec -host sunpc0,linpc0,rs0 java MsgSendRecvMain
> --
> It looks like opal_init failed for some reason; your parallel process is
> likely to abort. There are many reasons that a parallel process can
> fail during opal_init; some of which are due to configuration or
> environment problems. This failure appears to be an internal failure;
> here's some additional information (which may only be relevant to an
> Open MPI developer):
>
>   mca_base_open failed
>   --> Returned value -2 instead of OPAL_SUCCESS
> --
> ...
>
> tyr java 121 mpiexec -host sunpc0,rs0 java MsgSendRecvMain
> [rs0.informatik.hs-fulda.de:13671] *** An error occurred in MPI_Comm_dup
> [rs0.informatik.hs-fulda.de:13671] *** reported by process [1077346305,1]
> [rs0.informatik.hs-fulda.de:13671] *** on communicator MPI_COMM_WORLD
> [rs0.informatik.hs-fulda.de:13671] *** MPI_ERR_INTERN: internal error
> [rs0.informatik.hs-fulda.de:13671] *** MPI_ERRORS_ARE_FATAL (processes in this
> communicator will now abort,
> [rs0.informatik.hs-fulda.de:13671] *** and potentially your MPI job)
>
> I get an error even when I log in on a Linux machine before I run
> the command.
>
> linpc0 fd1026 99 mpiexec -host linpc0,linpc1 java MsgSendRecvMain
> --
> It looks like opal_init failed for some reason; your parallel process is
> likely to abort. There are many reasons that a parallel process can
> fail during opal_init; some of which are due to configuration or
> environment problems. This failure appears to be an internal failure;
> here's some additional information (which may only be relevant to an
> Open MPI developer):
>
>   mca_base_open failed
>   --> Returned value -2 instead of OPAL_SUCCESS
> --
> ...
> *** An error occurred in MPI_Init
> *** on a NULL communicator
> *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
> *** and potentially your MPI job)
> [linpc1:3004] Local abort before MPI_I
Re: [OMPI users] debugs for jobs not starting
I'm afraid I'm confused - I don't understand what is and isn't working. What "next process" isn't starting?

On Thu, Oct 11, 2012 at 9:41 AM, Michael Di Domenico wrote:

> adding some additional info
>
> did an strace on an orted process where xhpl failed to start, i did
> this after the mpirun execution, so i probably missed some output, but
> it keeps scrolling
>
> poll([{fd=4, events=POLLIN}, {fd=7, events=POLLIN}, {fd=8, events=POLLIN},
>       {fd=10, events=POLLIN}, {fd=12, events=POLLIN}, {fd=13, events=POLLIN},
>       {fd=14, events=POLLIN}, {fd=15, events=POLLIN},
>       {fd=16, events=POLLIN}], 9, 1000) = 0 (Timeout)
>
> i didn't see anything useful in /proc under those file descriptors,
> but perhaps i missed something i don't know to look for
>
> On Thu, Oct 11, 2012 at 12:06 PM, Michael Di Domenico wrote:
> > to add a little more detail, it looks like xhpl is not actually
> > starting on all nodes when i kick off the mpirun
> >
> > each time i cancel and restart the job, the nodes that do not start
> > change, so i can't call it a bad node
> >
> > if i disable infiniband with --mca btl self,sm,tcp, on occasion i can
> > get xhpl to actually run, but it's not consistent
> >
> > i'm going to check my ethernet network and make sure there are no
> > problems there (could this be an OOB error with mpirun?); on the nodes
> > that fail to start xhpl, i do see the orte process, but nothing in the
> > logs about why it failed to launch xhpl
> >
> > On Thu, Oct 11, 2012 at 11:49 AM, Michael Di Domenico wrote:
> >> I'm trying to diagnose an MPI job (in this case xhpl) that fails to
> >> start when the rank count gets fairly high, into the thousands.
> >>
> >> My symptom is that the job fires up via slurm, and I can see all the
> >> xhpl processes on the nodes, but it never kicks over to the next
> >> process.
> >>
> >> My question is: what debugs should I turn on to tell me what the
> >> system might be waiting on?
> >>
> >> I've checked a bunch of things, but I'm probably overlooking something
> >> trivial (which is par for me).
> >>
> >> I'm using Open MPI 1.6.1 and Slurm 2.4.2 on CentOS 6.3, with
> >> Infiniband/PSM
>
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
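[Editor's note] For launch problems like this, Open MPI's launch framework can be asked to narrate what it is doing on each node. A possible starting point, using the standard MCA verbosity knobs of the 1.6 series (the rank count, binary name, and verbosity levels below are illustrative, not from the thread):

```sh
# Keep the daemons attached to mpirun's terminal and raise launcher verbosity.
# plm = process lifecycle management (remote launch), odls = local app spawner.
mpirun --debug-daemons --leave-session-attached \
       --mca plm_base_verbose 5 \
       --mca odls_base_verbose 5 \
       -np 4096 ./xhpl
```

With `--debug-daemons` the orteds do not daemonize, so any error they print while trying to fork xhpl should appear on mpirun's stderr instead of being lost.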
Re: [OMPI users] [1.6.2] Compilation Error (at vtfilter) with Intel Compiler
Hi,

The error I get I couldn't find in the mails from your link. But I also
didn't set CXX, F77 and FC. I'll try that tomorrow and we'll see if it
changes anything. I find the error I get weird, because a file is not
found, which I guess should not occur when switching compilers ...

On Thu, Oct 11, 2012 at 01:09:28PM -0400, Gus Correa wrote:
> Hi Christian
>
> Would your problem be similar to the one reported two days ago on
> this thread? [It also failed to compile the VampirTrace tools;
> it also didn't have the Intel C++ compiler specified to configure.]
>
> http://www.open-mpi.org/community/lists/users/2012/10/20449.php
>
> Have you tried to specify the Intel C++ compiler
> to the configure script?
>
> ./configure CC=icc CXX=icpc ... etc, etc ...
>
> I hope this helps,
> Gus Correa
>
> On 10/11/2012 11:00 AM, Christian Krause wrote:
> > Hi,
> >
> > I tried to compile the current OpenMPI 1.6.2 with the Intel Compiler
> >
> > # icc --version
> > icc (ICC) 12.0.4 20110427
> >
> > The error I get is the following (I changed directly into the vtfilter
> > directory where the error occurs, to reduce output for this mail):
> >
> > # cd ompi/contrib/vt/vt/tools/vtfilter/
> > # make
> > Making all in .
> > make[1]: Entering directory
> > `/gpfs0/global/local/src/xxx-mpi/openmpi-1.6.2/ompi/contrib/vt/vt/tools/vtfilter'
> >   CXX    vtfilter-vt_filter.o
> > cc1plus: error: vtfilter-vt_filter.d: No such file or directory
> > make[1]: *** [vtfilter-vt_filter.o] Error 1
> > make[1]: Leaving directory
> > `/gpfs0/global/local/src/xxx-mpi/openmpi-1.6.2/ompi/contrib/vt/vt/tools/vtfilter'
> > make: *** [all-recursive] Error 1
> >
> > configure options from config.log:
> >
> > ./configure CC=icc --prefix=/usr/local/openmpi/1.6.2_intel_12.0.4
> > --with-sge --with-hwloc=/usr/local/hwloc/1.5_intel_12.0.4
> > --with-openib-libdir=/usr/lib64 --with-udapl-libdir=/usr/lib64
> >
> > I have already built hwloc and pciutils locally using icc. Also I
> > recently compiled OpenMPI 1.6.2 with gcc 4.7.1, with hwloc and pciutils
> > too, which worked without problems (configure basically the same, i.e.
> > not setting CC and using a different hwloc). That's why I'm assuming the
> > error is somehow icc's fault ... I'm new to this mailing list and I
> > already received some mails concerning the Intel Compiler, so I figure
> > there may be others who've experienced this problem?
> >
> > Thanks for any help in advance.
> >
> > Regards
> > Christian

--
Beste Grüße / Best Regards
Christian Krause aka wookietreiber
---
EGAL WIE DICHT DU BIST, GOETHE WAR DICHTER.
Re: [OMPI users] [1.6.2] Compilation Error (at vtfilter) with Intel Compiler
Hi Christian

Would your problem be similar to the one reported two days ago on
this thread? [It also failed to compile the VampirTrace tools;
it also didn't have the Intel C++ compiler specified to configure.]

http://www.open-mpi.org/community/lists/users/2012/10/20449.php

Have you tried to specify the Intel C++ compiler
to the configure script?

./configure CC=icc CXX=icpc ... etc, etc ...

I hope this helps,
Gus Correa

On 10/11/2012 11:00 AM, Christian Krause wrote:
> Hi,
>
> I tried to compile the current OpenMPI 1.6.2 with the Intel Compiler
>
> # icc --version
> icc (ICC) 12.0.4 20110427
>
> The error I get is the following (I changed directly into the vtfilter
> directory where the error occurs, to reduce output for this mail):
>
> # cd ompi/contrib/vt/vt/tools/vtfilter/
> # make
> Making all in .
> make[1]: Entering directory
> `/gpfs0/global/local/src/xxx-mpi/openmpi-1.6.2/ompi/contrib/vt/vt/tools/vtfilter'
>   CXX    vtfilter-vt_filter.o
> cc1plus: error: vtfilter-vt_filter.d: No such file or directory
> make[1]: *** [vtfilter-vt_filter.o] Error 1
> make[1]: Leaving directory
> `/gpfs0/global/local/src/xxx-mpi/openmpi-1.6.2/ompi/contrib/vt/vt/tools/vtfilter'
> make: *** [all-recursive] Error 1
>
> configure options from config.log:
>
> ./configure CC=icc --prefix=/usr/local/openmpi/1.6.2_intel_12.0.4
> --with-sge --with-hwloc=/usr/local/hwloc/1.5_intel_12.0.4
> --with-openib-libdir=/usr/lib64 --with-udapl-libdir=/usr/lib64
>
> I have already built hwloc and pciutils locally using icc. Also I
> recently compiled OpenMPI 1.6.2 with gcc 4.7.1, with hwloc and pciutils
> too, which worked without problems (configure basically the same, i.e.
> not setting CC and using a different hwloc). That's why I'm assuming the
> error is somehow icc's fault ... I'm new to this mailing list and I
> already received some mails concerning the Intel Compiler, so I figure
> there may be others who've experienced this problem?
>
> Thanks for any help in advance.
>
> Regards
> Christian
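[Editor's note] Putting Gus's suggestion together with the configure line quoted from config.log, a full invocation might look like the following. This is a sketch, not from the thread: the F77/FC=ifort settings assume the Intel Fortran compiler is installed and wanted, and should be dropped otherwise; all paths are copied from the original report.

```sh
# Hypothetical all-Intel configure, combining the advice in this thread
# with the options Christian already used:
./configure CC=icc CXX=icpc F77=ifort FC=ifort \
    --prefix=/usr/local/openmpi/1.6.2_intel_12.0.4 \
    --with-sge --with-hwloc=/usr/local/hwloc/1.5_intel_12.0.4 \
    --with-openib-libdir=/usr/lib64 --with-udapl-libdir=/usr/lib64
```

Without CXX=icpc, configure probes the C++ compiler flags with g++, which can leave the VampirTrace build (vtfilter is C++) with dependency-file options that icpc/cc1plus handles differently - consistent with the missing `vtfilter-vt_filter.d` error.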
Re: [OMPI users] debugs for jobs not starting
adding some additional info

did an strace on an orted process where xhpl failed to start, i did
this after the mpirun execution, so i probably missed some output, but
it keeps scrolling

poll([{fd=4, events=POLLIN}, {fd=7, events=POLLIN}, {fd=8, events=POLLIN},
      {fd=10, events=POLLIN}, {fd=12, events=POLLIN}, {fd=13, events=POLLIN},
      {fd=14, events=POLLIN}, {fd=15, events=POLLIN},
      {fd=16, events=POLLIN}], 9, 1000) = 0 (Timeout)

i didn't see anything useful in /proc under those file descriptors,
but perhaps i missed something i don't know to look for

On Thu, Oct 11, 2012 at 12:06 PM, Michael Di Domenico wrote:
> to add a little more detail, it looks like xhpl is not actually
> starting on all nodes when i kick off the mpirun
>
> each time i cancel and restart the job, the nodes that do not start
> change, so i can't call it a bad node
>
> if i disable infiniband with --mca btl self,sm,tcp, on occasion i can
> get xhpl to actually run, but it's not consistent
>
> i'm going to check my ethernet network and make sure there are no
> problems there (could this be an OOB error with mpirun?); on the nodes
> that fail to start xhpl, i do see the orte process, but nothing in the
> logs about why it failed to launch xhpl
>
> On Thu, Oct 11, 2012 at 11:49 AM, Michael Di Domenico wrote:
>> I'm trying to diagnose an MPI job (in this case xhpl) that fails to
>> start when the rank count gets fairly high, into the thousands.
>>
>> My symptom is that the job fires up via slurm, and I can see all the
>> xhpl processes on the nodes, but it never kicks over to the next
>> process.
>>
>> My question is: what debugs should I turn on to tell me what the
>> system might be waiting on?
>>
>> I've checked a bunch of things, but I'm probably overlooking something
>> trivial (which is par for me).
>>
>> I'm using Open MPI 1.6.1 and Slurm 2.4.2 on CentOS 6.3, with
>> Infiniband/PSM
Re: [OMPI users] debugs for jobs not starting
to add a little more detail: it looks like xhpl is not actually
starting on all nodes when i kick off the mpirun

each time i cancel and restart the job, the nodes that do not start
change, so i can't call it a bad node

if i disable infiniband with --mca btl self,sm,tcp, on occasion i can
get xhpl to actually run, but it's not consistent

i'm going to check my ethernet network and make sure there are no
problems there (could this be an OOB error with mpirun?); on the nodes
that fail to start xhpl, i do see the orte process, but nothing in the
logs about why it failed to launch xhpl

On Thu, Oct 11, 2012 at 11:49 AM, Michael Di Domenico wrote:
> I'm trying to diagnose an MPI job (in this case xhpl) that fails to
> start when the rank count gets fairly high, into the thousands.
>
> My symptom is that the job fires up via slurm, and I can see all the
> xhpl processes on the nodes, but it never kicks over to the next
> process.
>
> My question is: what debugs should I turn on to tell me what the
> system might be waiting on?
>
> I've checked a bunch of things, but I'm probably overlooking something
> trivial (which is par for me).
>
> I'm using Open MPI 1.6.1 and Slurm 2.4.2 on CentOS 6.3, with
> Infiniband/PSM
[OMPI users] debugs for jobs not starting
I'm trying to diagnose an MPI job (in this case xhpl) that fails to
start when the rank count gets fairly high, into the thousands.

My symptom is that the job fires up via slurm, and I can see all the
xhpl processes on the nodes, but it never kicks over to the next process.

My question is: what debugs should I turn on to tell me what the
system might be waiting on?

I've checked a bunch of things, but I'm probably overlooking something
trivial (which is par for me).

I'm using Open MPI 1.6.1 and Slurm 2.4.2 on CentOS 6.3, with Infiniband/PSM.
[OMPI users] [1.6.2] Compilation Error (at vtfilter) with Intel Compiler
Hi,

I tried to compile the current OpenMPI 1.6.2 with the Intel Compiler

# icc --version
icc (ICC) 12.0.4 20110427

The error I get is the following (I changed directly into the vtfilter
directory where the error occurs, to reduce output for this mail):

# cd ompi/contrib/vt/vt/tools/vtfilter/
# make
Making all in .
make[1]: Entering directory
`/gpfs0/global/local/src/xxx-mpi/openmpi-1.6.2/ompi/contrib/vt/vt/tools/vtfilter'
  CXX    vtfilter-vt_filter.o
cc1plus: error: vtfilter-vt_filter.d: No such file or directory
make[1]: *** [vtfilter-vt_filter.o] Error 1
make[1]: Leaving directory
`/gpfs0/global/local/src/xxx-mpi/openmpi-1.6.2/ompi/contrib/vt/vt/tools/vtfilter'
make: *** [all-recursive] Error 1

configure options from config.log:

./configure CC=icc --prefix=/usr/local/openmpi/1.6.2_intel_12.0.4
--with-sge --with-hwloc=/usr/local/hwloc/1.5_intel_12.0.4
--with-openib-libdir=/usr/lib64 --with-udapl-libdir=/usr/lib64

I have already built hwloc and pciutils locally using icc. Also I recently
compiled OpenMPI 1.6.2 with gcc 4.7.1, with hwloc and pciutils too, which
worked without problems (configure basically the same, i.e. not setting CC
and using a different hwloc). That's why I'm assuming the error is somehow
icc's fault ... I'm new to this mailing list and I already received some
mails concerning the Intel Compiler, so I figure there may be others who've
experienced this problem?

Thanks for any help in advance.

Regards
Christian
Re: [OMPI users] windows + threads
Just to follow up my earlier post, checking out master and building that gives me the same lock-up in ompi_info:

> ompi_info.exe!opal_atomic_lifo_push(opal_atomic_lifo_t * lifo, opal_list_item_t * item) Line 73  C
  ompi_info.exe!ompi_free_list_grow(ompi_free_list_t * flist, unsigned __int64 num_elements) Line 257  C
  ompi_info.exe!ompi_rb_tree_init(ompi_rb_tree_t * tree, int (void *, void *) * comp) Line 77  C
  ompi_info.exe!mca_mpool_base_tree_init() Line 88  C
  ompi_info.exe!mca_mpool_base_open() Line 86  C
  ompi_info.exe!ompi_info_register_components(opal_pointer_array_t * mca_types, opal_pointer_array_t * component_map) Line 264  C
  ompi_info.exe!main(int argc, char * * argv) Line 158  C
  ompi_info.exe!__tmainCRTStartup() Line 536  C
  ompi_info.exe!mainCRTStartup() Line 377  C
  kernel32.dll!07feac87167e()  Unknown
  ntdll.dll!07feae4cc3f1()  Unknown

at the line below with the * at the start. Well, actually I guess it's sitting in a spin lock. Should I continue playing, or is master unstable?

Thanks
JB

/* Add one element to the LIFO. We will return the last head of the list
 * to allow the upper level to detect if this element is the first one in the
 * list (if the list was empty before this operation).
 */
static inline opal_list_item_t* opal_atomic_lifo_push( opal_atomic_lifo_t* lifo,
                                                       opal_list_item_t* item )
{
#if OPAL_ENABLE_MULTI_THREADS
    do {
*       item->opal_list_next = lifo->opal_lifo_head;
        opal_atomic_wmb();
        if( opal_atomic_cmpset_ptr( &(lifo->opal_lifo_head),
                                    (void*)item->opal_list_next,
                                    item ) ) {
            opal_atomic_cmpset_32((volatile int32_t*)&item->item_free, 1, 0);
            return (opal_list_item_t*)item->opal_list_next;
        }
        /* DO some kind of pause to release the bus */
    } while( 1 );
#else
    item->opal_list_next = lifo->opal_lifo_head;
    lifo->opal_lifo_head = item;
    return (opal_list_item_t*)item->opal_list_next;
#endif  /* OPAL_ENABLE_MULTI_THREADS */
}
[OMPI users] question to scattering an object in openmpi-1.9a1r27380
Hi,

I have built openmpi-1.9a1r27380 with Java support and try some small
programs. When I try to scatter an object, I get a ClassCastException.
I use the following object.

public class MyData implements java.io.Serializable
{
  static final long serialVersionUID = -5243516570672186644L;

  private int    age;
  private String name;
  private double salary;

  public MyData ()
  {
    age    = 0;
    name   = "";
    salary = 0.0;
  }

  public void setAge (int newAge)
  {
    age = newAge;
  }
  ...
}

I use the following main program.

import mpi.*;

public class ObjectScatterMain
{
  public static void main (String args[]) throws MPIException
  {
    int    mytid;               /* my task id              */
    MyData dataItem, objBuffer;
    String processor_name;      /* name of local machine   */

    MPI.Init (args);
    processor_name = MPI.Get_processor_name ();
    mytid = MPI.COMM_WORLD.Rank ();
    dataItem  = new MyData ();
    objBuffer = new MyData ();
    if (mytid == 0)
    {
      /* initialize data item */
      dataItem.setAge (35);
      dataItem.setName ("Smith");
      dataItem.setSalary (2545.75);
    }
    MPI.COMM_WORLD.Scatter (dataItem, 0, 1, MPI.OBJECT,
                            objBuffer, 0, 1, MPI.OBJECT, 0);
    /* Each process prints its received data item. The outputs
     * can intermingle on the screen so that you must use
     * "-output-filename" in Open MPI.
     */
    System.out.printf ("\nProcess %d running on %s.\n" +
                       "  Age:    %d\n" +
                       "  Name:   %s\n" +
                       "  Salary: %10.2f\n",
                       mytid, processor_name,
                       objBuffer.getAge (), objBuffer.getName (),
                       objBuffer.getSalary ());
    MPI.Finalize();
  }
}

I get the following error when I compile and run the program.

tyr java 218 mpijavac ObjectScatterMain.java
tyr java 219 mpiexec java ObjectScatterMain
Exception in thread "main" java.lang.ClassCastException: MyData cannot be
cast to [Ljava.lang.Object;
        at mpi.Intracomm.copyBuffer(Intracomm.java:119)
        at mpi.Intracomm.Scatter(Intracomm.java:389)
        at ObjectScatterMain.main(ObjectScatterMain.java:45)
--
mpiexec has exited due to process rank 0 with PID 25898 on ...

Does anybody have an idea why I get a ClassCastException, or how I must
define an object which I can use in a scatter operation?

Thank you very much for any help in advance.

Kind regards

Siegmar
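[Editor's note] The exception text itself points at the likely cause: mpi.Intracomm.copyBuffer casts the buffer to Object[] ([Ljava.lang.Object;), and a single MyData instance is not an array, so the cast fails. A plausible fix - an assumption based on the trace, not confirmed in the thread - is to pass arrays of the object type as the send and receive buffers, e.g. `new MyData[numProcs]`. The cast behaviour can be reproduced without MPI at all:

```java
public class CastDemo {
    public static void main(String[] args) {
        Object single = new java.util.Date();          // a single object
        Object array  = new java.util.Date[] { null }; // an array of objects

        System.out.println(array instanceof Object[]);  // an array IS an Object[]
        System.out.println(single instanceof Object[]); // a lone object is NOT

        try {
            Object[] o = (Object[]) single; // the same cast copyBuffer performs
            System.out.println("cast ok");
        } catch (ClassCastException e) {
            System.out.println("ClassCastException"); // what Scatter reported
        }
    }
}
```

So changing `dataItem`/`objBuffer` to `MyData[]` buffers (with rank 0 filling all elements of the send array) should at least get past copyBuffer.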
Re: [OMPI users] internal error with mpiJava in openmpi-1.9a1r27380
Hi,

> I haven't tried heterogeneous apps on the Java code yet - could well not
> work. At the least, I would expect you need to compile your Java app
> against the corresponding OMPI install on each architecture, and ensure the
> right one gets run on each node. Even though it's a Java app, the classes
> need to get linked against the proper OMPI code for that node.
>
> As for Linux-only operation: it works fine for me. Did you remember to (a)
> build mpiexec on those linux machines (as opposed to using the Solaris
> version), and (b) recompile your app against that OMPI installation?

I didn't know that the classfiles are different, but it doesn't change
anything if I create different classfiles. I use a small shell script
to create all necessary files on all machines.

tyr java 118 make_classfiles
=== rs0 ===
...
mpijavac -d /home/fd1026/SunOS/sparc/mpi_classfiles MsgSendRecvMain.java
mpijavac -d /home/fd1026/SunOS/sparc/mpi_classfiles ColumnSendRecvMain.java
mpijavac -d /home/fd1026/SunOS/sparc/mpi_classfiles ColumnScatterMain.java
mpijavac -d /home/fd1026/SunOS/sparc/mpi_classfiles EnvironVarMain.java
=== sunpc1 ===
...
mpijavac -d /home/fd1026/SunOS/x86_64/mpi_classfiles MsgSendRecvMain.java
mpijavac -d /home/fd1026/SunOS/x86_64/mpi_classfiles ColumnSendRecvMain.java
mpijavac -d /home/fd1026/SunOS/x86_64/mpi_classfiles ColumnScatterMain.java
mpijavac -d /home/fd1026/SunOS/x86_64/mpi_classfiles EnvironVarMain.java
=== linpc1 ===
...
mpijavac -d /home/fd1026/Linux/x86_64/mpi_classfiles MsgSendRecvMain.java
mpijavac -d /home/fd1026/Linux/x86_64/mpi_classfiles ColumnSendRecvMain.java
mpijavac -d /home/fd1026/Linux/x86_64/mpi_classfiles ColumnScatterMain.java
mpijavac -d /home/fd1026/Linux/x86_64/mpi_classfiles EnvironVarMain.java

Every machine should now find its classfiles.

tyr java 119 mpiexec -host sunpc0,linpc0,rs0 java EnvironVarMain

Operating system: SunOS
Processor architecture: x86_64
CLASSPATH: ...:.:/home/fd1026/SunOS/x86_64/mpi_classfiles

Operating system: Linux
Processor architecture: x86_64
CLASSPATH: ...:.:/home/fd1026/Linux/x86_64/mpi_classfiles

Operating system: SunOS
Processor architecture: sparc
CLASSPATH: ...:.:/home/fd1026/SunOS/sparc/mpi_classfiles

tyr java 120 mpiexec -host sunpc0,linpc0,rs0 java MsgSendRecvMain
--
It looks like opal_init failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during opal_init; some of which are due to configuration or
environment problems. This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

  mca_base_open failed
  --> Returned value -2 instead of OPAL_SUCCESS
--
...

tyr java 121 mpiexec -host sunpc0,rs0 java MsgSendRecvMain
[rs0.informatik.hs-fulda.de:13671] *** An error occurred in MPI_Comm_dup
[rs0.informatik.hs-fulda.de:13671] *** reported by process [1077346305,1]
[rs0.informatik.hs-fulda.de:13671] *** on communicator MPI_COMM_WORLD
[rs0.informatik.hs-fulda.de:13671] *** MPI_ERR_INTERN: internal error
[rs0.informatik.hs-fulda.de:13671] *** MPI_ERRORS_ARE_FATAL (processes in this
communicator will now abort,
[rs0.informatik.hs-fulda.de:13671] *** and potentially your MPI job)

I get an error even when I log in on a Linux machine before I run
the command.

linpc0 fd1026 99 mpiexec -host linpc0,linpc1 java MsgSendRecvMain
--
It looks like opal_init failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during opal_init; some of which are due to configuration or
environment problems. This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

  mca_base_open failed
  --> Returned value -2 instead of OPAL_SUCCESS
--
...
*** An error occurred in MPI_Init
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
*** and potentially your MPI job)
[linpc1:3004] Local abort before MPI_INIT completed successfully; not able
to aggregate error messages, and not able to guarantee that all other
processes were killed!
...

linpc0 fd1026 99 mpijavac -showme
/usr/local/jdk1.7.0_07-64/bin/javac -cp ...
:.:/home/fd1026/Linux/x86_64/mpi_classfiles:/usr/local/openmpi-1.9_64_cc/lib64/mpi.jar

By the way, I have the same classfiles for all architectures. Are you sure
that they should be different? I don't find any absolute path names in the
files when I use "strings".

tyr java 133 diff ~/SunOS/sparc/mpi_classfiles/MsgSendRecvMain.class \
  ~/SunO
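[Editor's note] Siegmar's layout keys the per-host classfile directory off the operating system and CPU architecture. A small sketch of how each host could derive its own CLASSPATH entry at login (a hypothetical helper mirroring the $HOME/<OS>/<arch>/mpi_classfiles layout shown above; it is not part of the original make_classfiles script):

```shell
# Derive the per-host classfile directory from uname, matching the
# $HOME/<OS>/<arch>/mpi_classfiles layout used in this thread.
OS=$(uname -s)      # e.g. SunOS or Linux
ARCH=$(uname -m)    # e.g. sparc or x86_64
CLASSDIR="$HOME/$OS/$ARCH/mpi_classfiles"
CLASSPATH="${CLASSPATH:+$CLASSPATH:}$CLASSDIR"
export CLASSPATH
echo "$CLASSDIR"
```

Sourced from each shell's startup file, this gives every node the architecture-matched directory without hard-coding hostnames.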
[OMPI users] windows + threads
Hi list,

I searched the archives but didn't turn anything up... I have a new machine
on which I've installed Windows 8 x64 + MSVC 2012 (MSVC 11), and have
compiled openmpi from the git svn clone (on branch origin/v1.6) using these
settings:

cmake -DOMPI_ENABLE_THREAD_MULTIPLE=true -DOPAL_ENABLE_MULTI_THREADS=true
  -DOMPI_WANT_CXX_BINDINGS=false -DCMAKE_C_FLAGS:STRING=/MP
  -DCMAKE_CXX_FLAGS:STRING=/MP -DCMAKE_INSTALL_PREFIX="%MPI_DIR%"
  D:\Code\ompi-svn-mirror -DCMAKE_GENERATOR="Visual Studio 11 Win64"

The compilation succeeds, but when I run my app, I see that THREADS_MULTIPLE
is not set. So I tried running ompi_info, and I see that it outputs the
following (at the bottom of this post) but locks up. The stack trace when it
locks up is as follows:

libmpid.dll!opal_atomic_cmpset_ptr(volatile void * addr, void * oldval, void * newval) Line 198  C++
libmpid.dll!opal_atomic_lifo_push(opal_atomic_lifo_t * lifo, opal_list_item_t * item) Line 77  C++
libmpid.dll!ompi_free_list_grow(ompi_free_list_t * flist, unsigned __int64 num_elements) Line 237  C++
libmpid.dll!ompi_rb_tree_init(ompi_rb_tree_t * tree, int (void *, void *) * comp) Line 77  C++
libmpid.dll!mca_mpool_base_tree_init() Line 88  C++
libmpid.dll!mca_mpool_base_open() Line 86  C++
ompi_info.exe!ompi_info_open_components() Line 515  C++
ompi_info.exe!main(int argc, char * * argv) Line 285  C
ompi_info.exe!__tmainCRTStartup() Line 536  C
ompi_info.exe!mainCRTStartup() Line 377  C
kernel32.dll!07feac87167e()  Unknown
ntdll.dll!07feae4cc3f1()  Unknown

My question is: has anyone tested MSVC 2012 and openmpi, and can they
recommend a source version I can use to compile and enable threads? If this
combination of compilers etc. is not yet supported, how can I help fix it?
The fact that ompi_info reports "Thread support: no" indicates to me that
either the cmake config is failing, or I've messed up with options. I tried
the v1.7 branch, but the cmake support appears flaky.
I'm willing to either fix the 1.7 cmake or the 1.6 thread lock, if necessary,
but I don't want to waste my time if it isn't going to work within a
reasonable amount of debugging. I welcome any advice on how to get this
compiling and working, and offer cmake-related help if you need it to work
on this platform.

NB. I think I said my program runs, but actually, with threads enabled it
bombs out during

  MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);

- it runs without threads, but I need them.

Thanks
JB

output of ompi_info:

                 Package: Open MPI biddisco@CRUSCA Distribution
                Open MPI: 1.6.3a1-1
   Open MPI SVN revision: -1
   Open MPI release date: Unreleased developer copy
                Open RTE: 1.6.3a1-1
   Open RTE SVN revision: -1
   Open RTE release date: Unreleased developer copy
                    OPAL: 1.6.3a1-1
       OPAL SVN revision: -1
       OPAL release date: Unreleased developer copy
                 MPI API: 2.1
            Ident string: 1.6.3a1
                  Prefix: D:\build\openmpi\Debug/..
 Configured architecture: Windows-6.2 64 bit
          Configure host: CRUSCA
           Configured by: biddisco
           Configured on: 07:52 11/10/2012
          Configure host: CRUSCA
                Built by: biddisco
                Built on: 07:52 11/10/2012
              Built host: CRUSCA
              C bindings: yes
            C++ bindings: yes
      Fortran77 bindings: no
      Fortran90 bindings: no
 Fortran90 bindings size: na
              C compiler: cl
     C compiler absolute: C:/Program Files (x86)/Microsoft Visual Studio
                          11.0/VC/bin/x86_amd64/cl.exe
  C compiler family name: MICROSOFT
      C compiler version: 1700
            C++ compiler: cl
   C++ compiler absolute: C:/Program Files (x86)/Microsoft Visual Studio
                          11.0/VC/bin/x86_amd64/cl.exe
      Fortran77 compiler: none
  Fortran77 compiler abs: none
      Fortran90 compiler: none
  Fortran90 compiler abs: none
             C profiling: yes
           C++ profiling: yes
     Fortran77 profiling: no
     Fortran90 profiling: no
          C++ exceptions: no
          Thread support: no
           Sparse Groups: no
  Internal debug support: no
  MPI interface warnings: no
     MPI parameter check: never
Memory profiling support: no
Memory debugging support: no
         libltdl support: no
   Heterogeneous support: no
 mpirun default --prefix: yes
         MPI I/O support: yes
       MPI_WTIME support: gettimeofday
     Symbol vis. support: yes
   Host topology support: no
          MPI extensions: none
   FT Checkpoint support: yes (checkpoint thread: no)
     VampirTrace support: no
  MPI_MAX_PROCESSOR_NAME: 256
    MPI_MAX_ERROR_STRING: 256
     MPI_MAX_OBJECT_NAME: 64
        MPI_MAX_INFO_KEY: 36
        MPI_MAX_INFO_VAL: 256
       MPI_MAX_PORT_NAME: 1024
  MPI_MAX_DATAREP_STRING: 128