Re: [OMPI users] internal error with mpiJava in openmpi-1.9a1r27380

2012-10-11 Thread Ralph Castain
Like I said, I haven't tried any of that, so I have no idea if/how it would
work. I don't have access to any hetero system and we don't see it very
often at all, so it is quite possible the hetero support really isn't there.

I'll look at some of the Java-specific issues later.


On Thu, Oct 11, 2012 at 12:51 AM, Siegmar Gross <
siegmar.gr...@informatik.hs-fulda.de> wrote:

> Hi,
>
> > I haven't tried heterogeneous apps on the Java code yet - could well not
> > work. At the least, I would expect you need to compile your Java app
> > against the corresponding OMPI install on each architecture, and ensure
> the
> > right one gets run on each node. Even though it's a Java app, the classes
> > need to get linked against the proper OMPI code for that node.
> >
> > As for Linux-only operation: it works fine for me. Did you remember to
> (a)
> > build mpiexec on those linux machines (as opposed to using the Solaris
> > version), and (b) recompile your app against that OMPI installation?
>
> I didn't know that the classfiles are different, but it doesn't change
> anything if I create different classfiles. I use a small shell script
> to create all the necessary files on all machines.
>
>
> tyr java 118 make_classfiles
> === rs0 ===
> ...
> mpijavac -d /home/fd1026/SunOS/sparc/mpi_classfiles MsgSendRecvMain.java
> mpijavac -d /home/fd1026/SunOS/sparc/mpi_classfiles ColumnSendRecvMain.java
> mpijavac -d /home/fd1026/SunOS/sparc/mpi_classfiles ColumnScatterMain.java
> mpijavac -d /home/fd1026/SunOS/sparc/mpi_classfiles EnvironVarMain.java
> === sunpc1 ===
> ...
> mpijavac -d /home/fd1026/SunOS/x86_64/mpi_classfiles MsgSendRecvMain.java
> mpijavac -d /home/fd1026/SunOS/x86_64/mpi_classfiles ColumnSendRecvMain.java
> mpijavac -d /home/fd1026/SunOS/x86_64/mpi_classfiles ColumnScatterMain.java
> mpijavac -d /home/fd1026/SunOS/x86_64/mpi_classfiles EnvironVarMain.java
> === linpc1 ===
> ...
> mpijavac -d /home/fd1026/Linux/x86_64/mpi_classfiles MsgSendRecvMain.java
> mpijavac -d /home/fd1026/Linux/x86_64/mpi_classfiles ColumnSendRecvMain.java
> mpijavac -d /home/fd1026/Linux/x86_64/mpi_classfiles ColumnScatterMain.java
> mpijavac -d /home/fd1026/Linux/x86_64/mpi_classfiles EnvironVarMain.java
>
>
> Every machine should now find its classfiles.
>
> tyr java 119 mpiexec -host sunpc0,linpc0,rs0 java EnvironVarMain
>
> Operating system: SunOS   Processor architecture: x86_64
>   CLASSPATH: ...:.:/home/fd1026/SunOS/x86_64/mpi_classfiles
>
> Operating system: Linux   Processor architecture: x86_64
>   CLASSPATH: ...:.:/home/fd1026/Linux/x86_64/mpi_classfiles
>
> Operating system: SunOS   Processor architecture: sparc
>   CLASSPATH: ...:.:/home/fd1026/SunOS/sparc/mpi_classfiles
>
>
>
> tyr java 120 mpiexec -host sunpc0,linpc0,rs0 java MsgSendRecvMain
> --
> It looks like opal_init failed for some reason; your parallel process is
> likely to abort.  There are many reasons that a parallel process can
> fail during opal_init; some of which are due to configuration or
> environment problems.  This failure appears to be an internal failure;
> here's some additional information (which may only be relevant to an
> Open MPI developer):
>
>   mca_base_open failed
>   --> Returned value -2 instead of OPAL_SUCCESS
> --
> ...
>
>
>
> tyr java 121 mpiexec -host sunpc0,rs0 java MsgSendRecvMain
> [rs0.informatik.hs-fulda.de:13671] *** An error occurred in MPI_Comm_dup
> [rs0.informatik.hs-fulda.de:13671] *** reported by process [1077346305,1]
> [rs0.informatik.hs-fulda.de:13671] *** on communicator MPI_COMM_WORLD
> [rs0.informatik.hs-fulda.de:13671] *** MPI_ERR_INTERN: internal error
> [rs0.informatik.hs-fulda.de:13671] *** MPI_ERRORS_ARE_FATAL (processes in this
> communicator will now abort,
> [rs0.informatik.hs-fulda.de:13671] *** and potentially your MPI job)
>
>
>
> I even get an error when I log in on a Linux machine before I
> run the command.
>
> linpc0 fd1026 99 mpiexec -host linpc0,linpc1 java MsgSendRecvMain
> --
> It looks like opal_init failed for some reason; your parallel process is
> likely to abort.  There are many reasons that a parallel process can
> fail during opal_init; some of which are due to configuration or
> environment problems.  This failure appears to be an internal failure;
> here's some additional information (which may only be relevant to an
> Open MPI developer):
>
>   mca_base_open failed
>   --> Returned value -2 instead of OPAL_SUCCESS
> --
> ...
> *** An error occurred in MPI_Init
> *** on a NULL communicator
> *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
> *** and potentially your MPI job)
> [linpc1:3004] Local abort before MPI_INIT completed successfully; not able to
> aggregate error messages, and not able to guarantee that all other processes
> were killed!
> ...

Re: [OMPI users] debugs for jobs not starting

2012-10-11 Thread Ralph Castain
I'm afraid I'm confused - I don't understand what is and isn't working.
What "next process" isn't starting?


On Thu, Oct 11, 2012 at 9:41 AM, Michael Di Domenico  wrote:

> adding some additional info
>
> did an strace on an orted process where xhpl failed to start, i did
> this after the mpirun execution, so i probably missed some output, but
> it keeps scrolling
>
> poll([{fd=4, events=POLLIN},{fd=7, events=POLLIN},{fd=8,
> events=POLLIN},{fd=10, events=POLLIN},{fd=12, events=POLLIN},{fd=13,
> events=POLLIN},{fd=14, events=POLLIN},{fd=15, events=POLLIN},{fd=16,
> events=POLLIN}], 9, 1000) = 0 (Timeout)
>
> i didn't see anything useful in /proc under those file descriptors,
> but perhaps i missed something i don't know to look for
>
> On Thu, Oct 11, 2012 at 12:06 PM, Michael Di Domenico
>  wrote:
> > to add a little more detail, it looks like xhpl is not actually
> > starting on all nodes when i kick off the mpirun
> >
> > each time i cancel and restart the job, the nodes that do not start
> > change, so i can't call it a bad node
> >
> > if i disable infiniband with --mca btl self,sm,tcp on occasion i can
> > get xhpl to actually run, but it's not consistent
> >
> > i'm going to check my ethernet network and make sure there's no
> > problems there (could this be an OOB error with mpirun?), on the nodes
> > that fail to start xhpl, i do see the orte process, but nothing in the
> > logs about why it failed to launch xhpl
> >
> >
> >
> > On Thu, Oct 11, 2012 at 11:49 AM, Michael Di Domenico
> >  wrote:
> >> I'm trying to diagnose an MPI job (in this case xhpl), that fails to
> >> start when the rank count gets fairly high into the thousands.
> >>
> >> My symptom is the job fires up via slurm, and I can see all the xhpl
> >> processes on the nodes, but it never kicks over to the next process.
> >>
> >> My question is, what debugs should I turn on to tell me what the
> >> system might be waiting on?
> >>
> >> I've checked a bunch of things, but I'm probably overlooking something
> >> trivial (which is par for me).
> >>
> >> I'm using Open MPI 1.6.1, Slurm 2.4.2 on CentOS 6.3, with
> Infiniband/PSM
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
>


Re: [OMPI users] [1.6.2] Compilation Error (at vtfilter) with Intel Compiler

2012-10-11 Thread wookietreiber
Hi,

I couldn't find the error I get among the mails in your link. But I also
didn't set CXX, F77, and FC; I'll try that tomorrow and we'll see if it
changes anything.

I find the error odd because a file is not found, which I wouldn't
expect to depend on the choice of compiler ...


On Thu, Oct 11, 2012 at 01:09:28PM -0400, Gus Correa wrote:
> Hi Christian
> 
> Would your problem be similar to the one reported two days ago on
> this thread? [It also failed to compile vampir trace tools,
> it also didn't have the Intel C++ compiler specified to configure.]
> 
> http://www.open-mpi.org/community/lists/users/2012/10/20449.php
> 
> Have you tried to specify the Intel C++ compiler
> to the configure script?
> 
> ./configure CC=icc CXX=icpc  ... etc, etc ...
> 
> I hope this helps,
> Gus Correa
> 
> 
> 
> On 10/11/2012 11:00 AM, Christian Krause wrote:
> >Hi,
> >
> >I tried to compile the current OpenMPI 1.6.2 with the Intel Compiler
> >
> ># icc --version
> >icc (ICC) 12.0.4 20110427
> >
> >
> >The error I get is the following (I changed directly in the vtfilter
> >directory where the error occurs to reduce output for this mail):
> >
> ># cd ompi/contrib/vt/vt/tools/vtfilter/
> ># make
> >Making all in .
> >make[1]: Entering directory
> >`/gpfs0/global/local/src/xxx-mpi/openmpi-1.6.2/ompi/contrib/vt/vt/tools/vtfilter'
> >   CXX    vtfilter-vt_filter.o
> >cc1plus: error: vtfilter-vt_filter.d: No such file or directory
> >make[1]: *** [vtfilter-vt_filter.o] Error 1
> >make[1]: Leaving directory
> >`/gpfs0/global/local/src/xxx-mpi/openmpi-1.6.2/ompi/contrib/vt/vt/tools/vtfilter'
> >make: *** [all-recursive] Error 1
> >
> >
> >configure options from config.log:
> >
> >./configure CC=icc --prefix=/usr/local/openmpi/1.6.2_intel_12.0.4
> >--with-sge --with-hwloc=/usr/local/hwloc/1.5_intel_12.0.4
> >--with-openib-libdir=/usr/lib64 --with-udapl-libdir=/usr/lib64
> >
> >
> >I have already built hwloc and pciutils locally using icc. I also
> >recently compiled OpenMPI 1.6.2 with gcc 4.7.1, with hwloc and pciutils
> >as well, and that worked without problems (configure was basically the
> >same, i.e. not setting CC and using a different hwloc). That's why I'm
> >assuming the error is somehow icc's fault ... I'm new to this mailing
> >list and have already received some mails concerning the Intel Compiler,
> >so I figure there may be others who have experienced this problem?
> >
> >
> >Thanks for any help in advance.
> >
> >Regards
> >Christian

-- 

Beste Grüße / Best Regards
Christian Krause aka wookietreiber

---

EGAL WIE DICHT DU BIST, GOETHE WAR DICHTER.


Re: [OMPI users] [1.6.2] Compilation Error (at vtfilter) with Intel Compiler

2012-10-11 Thread Gus Correa

Hi Christian

Would your problem be similar to the one reported two days ago on
this thread? [It also failed to compile vampir trace tools,
it also didn't have the Intel C++ compiler specified to configure.]

http://www.open-mpi.org/community/lists/users/2012/10/20449.php

Have you tried to specify the Intel C++ compiler
to the configure script?

./configure CC=icc CXX=icpc  ... etc, etc ...

I hope this helps,
Gus Correa



On 10/11/2012 11:00 AM, Christian Krause wrote:

Hi,

I tried to compile the current OpenMPI 1.6.2 with the Intel Compiler

# icc --version
icc (ICC) 12.0.4 20110427


The error I get is the following (I changed directly in the vtfilter
directory where the error occurs to reduce output for this mail):

# cd ompi/contrib/vt/vt/tools/vtfilter/
# make
Making all in .
make[1]: Entering directory
`/gpfs0/global/local/src/xxx-mpi/openmpi-1.6.2/ompi/contrib/vt/vt/tools/vtfilter'
   CXX    vtfilter-vt_filter.o
cc1plus: error: vtfilter-vt_filter.d: No such file or directory
make[1]: *** [vtfilter-vt_filter.o] Error 1
make[1]: Leaving directory
`/gpfs0/global/local/src/xxx-mpi/openmpi-1.6.2/ompi/contrib/vt/vt/tools/vtfilter'
make: *** [all-recursive] Error 1


configure options from config.log:

./configure CC=icc --prefix=/usr/local/openmpi/1.6.2_intel_12.0.4
--with-sge --with-hwloc=/usr/local/hwloc/1.5_intel_12.0.4
--with-openib-libdir=/usr/lib64 --with-udapl-libdir=/usr/lib64


I have already built hwloc and pciutils locally using icc. I also
recently compiled OpenMPI 1.6.2 with gcc 4.7.1, with hwloc and pciutils
as well, and that worked without problems (configure was basically the
same, i.e. not setting CC and using a different hwloc). That's why I'm
assuming the error is somehow icc's fault ... I'm new to this mailing
list and have already received some mails concerning the Intel Compiler,
so I figure there may be others who have experienced this problem?


Thanks for any help in advance.

Regards
Christian




Re: [OMPI users] debugs for jobs not starting

2012-10-11 Thread Michael Di Domenico
adding some additional info

did an strace on an orted process where xhpl failed to start, i did
this after the mpirun execution, so i probably missed some output, but
it keeps scrolling

poll([{fd=4, events=POLLIN},{fd=7, events=POLLIN},{fd=8,
events=POLLIN},{fd=10, events=POLLIN},{fd=12, events=POLLIN},{fd=13,
events=POLLIN},{fd=14, events=POLLIN},{fd=15, events=POLLIN},{fd=16,
events=POLLIN}], 9, 1000) = 0 (Timeout)

i didn't see anything useful in /proc under those file descriptors,
but perhaps i missed something i don't know to look for
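For what it's worth, a poll() return of 0 after the full timeout simply means none of the nine descriptors became ready: the daemon's event loop is idle and re-arming once per second. The same idle-timeout behaviour can be sketched in plain Java NIO (illustrative only; orted's actual loop is C and nothing here touches Open MPI):

```java
import java.nio.channels.Selector;

class IdlePollSketch {
    public static void main(String[] args) throws Exception {
        // Equivalent in spirit to poll(fds, 9, 1000): block up to one
        // second waiting for readiness events.
        try (Selector selector = Selector.open()) {
            int ready = selector.select(1000);  // nothing registered -> times out
            // 0 means "timeout, no descriptor ready" - the same benign
            // result the strace output shows orted getting once a second.
            System.out.println(ready);
        }
    }
}
```

So the repeated poll/timeout lines alone don't indicate a fault; the question is why nothing ever arrives on those descriptors.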

On Thu, Oct 11, 2012 at 12:06 PM, Michael Di Domenico
 wrote:
> to add a little more detail, it looks like xhpl is not actually
> starting on all nodes when i kick off the mpirun
>
> each time i cancel and restart the job, the nodes that do not start
> change, so i can't call it a bad node
>
> if i disable infiniband with --mca btl self,sm,tcp on occasion i can
> get xhpl to actually run, but it's not consistent
>
> i'm going to check my ethernet network and make sure there's no
> problems there (could this be an OOB error with mpirun?), on the nodes
> that fail to start xhpl, i do see the orte process, but nothing in the
> logs about why it failed to launch xhpl
>
>
>
> On Thu, Oct 11, 2012 at 11:49 AM, Michael Di Domenico
>  wrote:
>> I'm trying to diagnose an MPI job (in this case xhpl), that fails to
>> start when the rank count gets fairly high into the thousands.
>>
> >> My symptom is the job fires up via slurm, and I can see all the xhpl
>> processes on the nodes, but it never kicks over to the next process.
>>
>> My question is, what debugs should I turn on to tell me what the
>> system might be waiting on?
>>
>> I've checked a bunch of things, but I'm probably overlooking something
>> trivial (which is par for me).
>>
> >> I'm using Open MPI 1.6.1, Slurm 2.4.2 on CentOS 6.3, with Infiniband/PSM


Re: [OMPI users] debugs for jobs not starting

2012-10-11 Thread Michael Di Domenico
to add a little more detail, it looks like xhpl is not actually
starting on all nodes when i kick off the mpirun

each time i cancel and restart the job, the nodes that do not start
change, so i can't call it a bad node

if i disable infiniband with --mca btl self,sm,tcp on occasion i can
get xhpl to actually run, but it's not consistent

i'm going to check my ethernet network and make sure there's no
problems there (could this be an OOB error with mpirun?), on the nodes
that fail to start xhpl, i do see the orte process, but nothing in the
logs about why it failed to launch xhpl



On Thu, Oct 11, 2012 at 11:49 AM, Michael Di Domenico
 wrote:
> I'm trying to diagnose an MPI job (in this case xhpl), that fails to
> start when the rank count gets fairly high into the thousands.
>
> My symptom is the job fires up via slurm, and I can see all the xhpl
> processes on the nodes, but it never kicks over to the next process.
>
> My question is, what debugs should I turn on to tell me what the
> system might be waiting on?
>
> I've checked a bunch of things, but I'm probably overlooking something
> trivial (which is par for me).
>
> I'm using Open MPI 1.6.1, Slurm 2.4.2 on CentOS 6.3, with Infiniband/PSM


[OMPI users] debugs for jobs not starting

2012-10-11 Thread Michael Di Domenico
I'm trying to diagnose an MPI job (in this case xhpl), that fails to
start when the rank count gets fairly high into the thousands.

My symptom is the job fires up via slurm, and I can see all the xhpl
processes on the nodes, but it never kicks over to the next process.

My question is, what debugs should I turn on to tell me what the
system might be waiting on?

I've checked a bunch of things, but I'm probably overlooking something
trivial (which is par for me).

I'm using Open MPI 1.6.1, Slurm 2.4.2 on CentOS 6.3, with Infiniband/PSM


[OMPI users] [1.6.2] Compilation Error (at vtfilter) with Intel Compiler

2012-10-11 Thread Christian Krause
Hi,

I tried to compile the current OpenMPI 1.6.2 with the Intel Compiler

# icc --version
icc (ICC) 12.0.4 20110427


The error I get is the following (I changed directly in the vtfilter
directory where the error occurs to reduce output for this mail):

# cd ompi/contrib/vt/vt/tools/vtfilter/
# make
Making all in .
make[1]: Entering directory
`/gpfs0/global/local/src/xxx-mpi/openmpi-1.6.2/ompi/contrib/vt/vt/tools/vtfilter'
  CXX    vtfilter-vt_filter.o
cc1plus: error: vtfilter-vt_filter.d: No such file or directory
make[1]: *** [vtfilter-vt_filter.o] Error 1
make[1]: Leaving directory
`/gpfs0/global/local/src/xxx-mpi/openmpi-1.6.2/ompi/contrib/vt/vt/tools/vtfilter'
make: *** [all-recursive] Error 1


configure options from config.log:

./configure CC=icc --prefix=/usr/local/openmpi/1.6.2_intel_12.0.4
--with-sge --with-hwloc=/usr/local/hwloc/1.5_intel_12.0.4
--with-openib-libdir=/usr/lib64 --with-udapl-libdir=/usr/lib64


I have already built hwloc and pciutils locally using icc. I also
recently compiled OpenMPI 1.6.2 with gcc 4.7.1, with hwloc and pciutils
as well, and that worked without problems (configure was basically the
same, i.e. not setting CC and using a different hwloc). That's why I'm
assuming the error is somehow icc's fault ... I'm new to this mailing
list and have already received some mails concerning the Intel Compiler,
so I figure there may be others who have experienced this problem?


Thanks for any help in advance.

Regards
Christian


Re: [OMPI users] windows + threads

2012-10-11 Thread Biddiscombe, John A.
Just to follow up my earlier post: checking out master and building that gives
me the same lock-up in ompi_info

  ompi_info.exe!opal_atomic_lifo_push(opal_atomic_lifo_t * lifo, opal_list_item_t * item) Line 73   C
  ompi_info.exe!ompi_free_list_grow(ompi_free_list_t * flist, unsigned __int64 num_elements) Line 257   C
  ompi_info.exe!ompi_rb_tree_init(ompi_rb_tree_t * tree, int (void *, void *) * comp) Line 77   C
  ompi_info.exe!mca_mpool_base_tree_init() Line 88   C
  ompi_info.exe!mca_mpool_base_open() Line 86   C
  ompi_info.exe!ompi_info_register_components(opal_pointer_array_t * mca_types, opal_pointer_array_t * component_map) Line 264   C
  ompi_info.exe!main(int argc, char * * argv) Line 158   C
  ompi_info.exe!__tmainCRTStartup() Line 536   C
  ompi_info.exe!mainCRTStartup() Line 377   C
  kernel32.dll!07feac87167e()   Unknown
  ntdll.dll!07feae4cc3f1()   Unknown

at the line below with the * at the start. Well actually I guess it's sitting 
in a spin lock. Should I continue playing or is master unstable?

Thanks
JB

/* Add one element to the LIFO. We will return the last head of the list
* to allow the upper level to detect if this element is the first one in the
* list (if the list was empty before this operation).
*/
static inline opal_list_item_t* opal_atomic_lifo_push( opal_atomic_lifo_t* lifo,
   opal_list_item_t* item )
{
#if OPAL_ENABLE_MULTI_THREADS
do {
*   item->opal_list_next = lifo->opal_lifo_head;
opal_atomic_wmb();
if( opal_atomic_cmpset_ptr( &(lifo->opal_lifo_head),
(void*)item->opal_list_next,
item ) ) {
opal_atomic_cmpset_32((volatile int32_t*)&item->item_free, 1, 0);
return (opal_list_item_t*)item->opal_list_next;
}
/* DO some kind of pause to release the bus */
} while( 1 );
#else
item->opal_list_next = lifo->opal_lifo_head;
lifo->opal_lifo_head = item;
return (opal_list_item_t*)item->opal_list_next;
#endif  /* OPAL_ENABLE_MULTI_THREADS */
}
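The retry loop above is a standard lock-free LIFO push: link the new item to the current head, then publish it with a compare-and-swap, retrying on contention. The same pattern can be sketched in plain Java with AtomicReference (an illustration of the algorithm only, not of the OMPI code or its Windows atomics):

```java
import java.util.concurrent.atomic.AtomicReference;

// Minimal lock-free LIFO mirroring the CAS retry loop in opal_atomic_lifo_push.
class CasLifo {
    private final AtomicReference<Node> head = new AtomicReference<>(null);

    // Push an item and return the previous head (null means the list was
    // empty), just as opal_atomic_lifo_push returns the old opal_lifo_head.
    Node push(Node item) {
        while (true) {
            Node oldHead = head.get();
            item.next = oldHead;                   // link before publishing
            if (head.compareAndSet(oldHead, item)) {
                return oldHead;                    // CAS succeeded: item is the new head
            }
            // CAS failed: another thread updated head; retry with the fresh value.
        }
    }

    public static void main(String[] args) {
        CasLifo lifo = new CasLifo();
        Node first = new Node(1);
        Node second = new Node(2);
        System.out.println(lifo.push(first) == null);   // list was empty
        System.out.println(lifo.push(second) == first); // previous head returned
    }
}

class Node {
    final int value;
    volatile Node next;
    Node(int value) { this.value = value; }
}
```

Note the loop only terminates when the CAS succeeds; if the CAS primitive itself is broken on a platform (as suspected here on Windows/MSVC), the loop spins forever, which matches the hang observed in ompi_free_list_grow.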



[OMPI users] question to scattering an object in openmpi-1.9a1r27380

2012-10-11 Thread Siegmar Gross
Hi,

I have built openmpi-1.9a1r27380 with Java support and try some small
programs. When I try to scatter an object, I get a ClassCastException.
I use the following object.

public class MyData implements java.io.Serializable
{
  static final long serialVersionUID = -5243516570672186644L;

  private int    age;
  private String name;
  private double salary;

  public MyData ()
  {
age    = 0;
name   = "";
salary = 0.0;
  }

  public void setAge (int newAge)
  {
age = newAge;
  }
...
}


I use the following main program.

import mpi.*;

public class ObjectScatterMain
{
  public static void main (String args[]) throws MPIException
  {
    int    mytid;   /* my task id   */
MyData dataItem, objBuffer;
String processor_name;  /* name of local machine*/

MPI.Init (args);
processor_name = MPI.Get_processor_name ();
mytid  = MPI.COMM_WORLD.Rank ();
dataItem   = new MyData ();
objBuffer  = new MyData ();
if (mytid == 0)
{
  /* initialize data item   */
  dataItem.setAge (35);
  dataItem.setName ("Smith");
  dataItem.setSalary (2545.75);
}
MPI.COMM_WORLD.Scatter (dataItem, 0, 1, MPI.OBJECT,
objBuffer, 0, 1, MPI.OBJECT, 0);
/* Each process prints its received data item. The outputs
 * can intermingle on the screen so that you must use
 * "-output-filename" in Open MPI.
 */
System.out.printf ("\nProcess %d running on %s.\n" +
   "  Age:  %d\n" +
   "  Name: %s\n" +
   "  Salary: %10.2f\n",
   mytid, processor_name,
   objBuffer.getAge (),
   objBuffer.getName (),
   objBuffer.getSalary ());
MPI.Finalize();
  }
}


I get the following error, when I compile and run the program.

tyr java 218 mpijavac ObjectScatterMain.java
tyr java 219 mpiexec java ObjectScatterMain
Exception in thread "main" java.lang.ClassCastException:
  MyData cannot be cast to [Ljava.lang.Object;
at mpi.Intracomm.copyBuffer(Intracomm.java:119)
at mpi.Intracomm.Scatter(Intracomm.java:389)
at ObjectScatterMain.main(ObjectScatterMain.java:45)
--
mpiexec has exited due to process rank 0 with PID 25898 on
...


Has anybody an idea why I get a ClassCastException or how I must define
an object, which I can use in a scatter operation? Thank you very much
for any help in advance.
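The stack trace points at Intracomm.copyBuffer casting the send buffer to Object[]. A bare MyData can never satisfy that cast, while a one-element MyData[] can; and since MPI.OBJECT transfers go through Java serialization, the element type must also serialize cleanly. A plain-JDK sketch of both points (this assumes the array-buffer convention suggested by the trace; mpi.jar is not used here):

```java
import java.io.*;

class ScatterBufferSketch {
    public static void main(String[] args) throws Exception {
        MyData item = new MyData();
        item.setAge(35);

        // copyBuffer casts the buffer to Object[]; a bare MyData is not one:
        Object bare = item;
        System.out.println(bare instanceof Object[]);   // the failing cast

        // A one-element array *is* an Object[], which satisfies the cast:
        Object buf = new MyData[] { item };
        System.out.println(buf instanceof Object[]);

        // MPI.OBJECT moves elements via Java serialization; check a round trip:
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        ObjectOutputStream oos = new ObjectOutputStream(bos);
        oos.writeObject(item);
        oos.flush();
        MyData copy = (MyData) new ObjectInputStream(
                new ByteArrayInputStream(bos.toByteArray())).readObject();
        System.out.println(copy.getAge());
    }
}

// Stand-in for the MyData class from the post (fields trimmed).
class MyData implements Serializable {
    static final long serialVersionUID = -5243516570672186644L;
    private int age;
    void setAge(int newAge) { age = newAge; }
    int getAge() { return age; }
}
```

If the bindings do expect arrays, passing `new MyData[] { dataItem }` and `new MyData[1]` as the send and receive buffers would be the thing to try.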


Kind regards

Siegmar



Re: [OMPI users] internal error with mpiJava in openmpi-1.9a1r27380

2012-10-11 Thread Siegmar Gross
Hi,

> I haven't tried heterogeneous apps on the Java code yet - could well not
> work. At the least, I would expect you need to compile your Java app
> against the corresponding OMPI install on each architecture, and ensure the
> right one gets run on each node. Even though it's a Java app, the classes
> need to get linked against the proper OMPI code for that node.
> 
> As for Linux-only operation: it works fine for me. Did you remember to (a)
> build mpiexec on those linux machines (as opposed to using the Solaris
> version), and (b) recompile your app against that OMPI installation?

I didn't know that the classfiles are different, but it doesn't change
anything if I create different classfiles. I use a small shell script
to create all the necessary files on all machines.


tyr java 118 make_classfiles
=== rs0 ===
...
mpijavac -d /home/fd1026/SunOS/sparc/mpi_classfiles MsgSendRecvMain.java
mpijavac -d /home/fd1026/SunOS/sparc/mpi_classfiles ColumnSendRecvMain.java
mpijavac -d /home/fd1026/SunOS/sparc/mpi_classfiles ColumnScatterMain.java
mpijavac -d /home/fd1026/SunOS/sparc/mpi_classfiles EnvironVarMain.java
=== sunpc1 ===
...
mpijavac -d /home/fd1026/SunOS/x86_64/mpi_classfiles MsgSendRecvMain.java
mpijavac -d /home/fd1026/SunOS/x86_64/mpi_classfiles ColumnSendRecvMain.java
mpijavac -d /home/fd1026/SunOS/x86_64/mpi_classfiles ColumnScatterMain.java
mpijavac -d /home/fd1026/SunOS/x86_64/mpi_classfiles EnvironVarMain.java
=== linpc1 ===
...
mpijavac -d /home/fd1026/Linux/x86_64/mpi_classfiles MsgSendRecvMain.java
mpijavac -d /home/fd1026/Linux/x86_64/mpi_classfiles ColumnSendRecvMain.java
mpijavac -d /home/fd1026/Linux/x86_64/mpi_classfiles ColumnScatterMain.java
mpijavac -d /home/fd1026/Linux/x86_64/mpi_classfiles EnvironVarMain.java


Every machine should now find its classfiles.

tyr java 119 mpiexec -host sunpc0,linpc0,rs0 java EnvironVarMain

Operating system: SunOS   Processor architecture: x86_64
  CLASSPATH: ...:.:/home/fd1026/SunOS/x86_64/mpi_classfiles

Operating system: Linux   Processor architecture: x86_64
  CLASSPATH: ...:.:/home/fd1026/Linux/x86_64/mpi_classfiles

Operating system: SunOS   Processor architecture: sparc
  CLASSPATH: ...:.:/home/fd1026/SunOS/sparc/mpi_classfiles



tyr java 120 mpiexec -host sunpc0,linpc0,rs0 java MsgSendRecvMain
--
It looks like opal_init failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during opal_init; some of which are due to configuration or
environment problems.  This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

  mca_base_open failed
  --> Returned value -2 instead of OPAL_SUCCESS
--
...



tyr java 121 mpiexec -host sunpc0,rs0 java MsgSendRecvMain
[rs0.informatik.hs-fulda.de:13671] *** An error occurred in MPI_Comm_dup
[rs0.informatik.hs-fulda.de:13671] *** reported by process [1077346305,1]
[rs0.informatik.hs-fulda.de:13671] *** on communicator MPI_COMM_WORLD
[rs0.informatik.hs-fulda.de:13671] *** MPI_ERR_INTERN: internal error
[rs0.informatik.hs-fulda.de:13671] *** MPI_ERRORS_ARE_FATAL (processes in this 
communicator will now abort,
[rs0.informatik.hs-fulda.de:13671] *** and potentially your MPI job)



I even get an error when I log in on a Linux machine before I
run the command.

linpc0 fd1026 99 mpiexec -host linpc0,linpc1 java MsgSendRecvMain
--
It looks like opal_init failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during opal_init; some of which are due to configuration or
environment problems.  This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

  mca_base_open failed
  --> Returned value -2 instead of OPAL_SUCCESS
--
...
*** An error occurred in MPI_Init
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
*** and potentially your MPI job)
[linpc1:3004] Local abort before MPI_INIT completed successfully; not able to 
aggregate error messages, and not able to guarantee that all other processes 
were killed!
...


linpc0 fd1026 99 mpijavac -showme
/usr/local/jdk1.7.0_07-64/bin/javac -cp ... 
:.:/home/fd1026/Linux/x86_64/mpi_classfiles:/usr/local/openmpi-1.9_64_cc/lib64/
mpi.jar


By the way, I have the same classfiles for all architectures. Are you
sure that they should be different? I don't find any absolute path names
in the files when I use "strings".

tyr java 133 diff ~/SunOS/sparc/mpi_classfiles/MsgSendRecvMain.class \
  

[OMPI users] windows + threads

2012-10-11 Thread Biddiscombe, John A.
Hi list,

I searched the archives, but didn't turn anything up...

I have a new machine on which I've installed Windows 8 x64 + MSVC 2012 (MSVC 11)
and have compiled openmpi from the git svn clone (on branch origin/v1.6) using
these settings ...
cmake -DOMPI_ENABLE_THREAD_MULTIPLE=true -DOPAL_ENABLE_MULTI_THREADS=true 
-DOMPI_WANT_CXX_BINDINGS=false -DCMAKE_C_FLAGS:STRING=/MP 
-DCMAKE_CXX_FLAGS:STRING=/MP -DCMAKE_INSTALL_PREFIX="%MPI_DIR%" 
D:\Code\ompi-svn-mirror -DCMAKE_GENERATOR="Visual Studio 11 Win64"

The compilation succeeds, but when I run my app, I see that THREADS_MULTIPLE is 
not set. So I tried running ompi_info and I see that it outputs the following 
(at bottom of post), but locks up.
The stack trace when it locks up is as follows...

libmpid.dll!opal_atomic_cmpset_ptr(volatile void * addr, void * oldval, 
void * newval) Line 198   C++
libmpid.dll!opal_atomic_lifo_push(opal_atomic_lifo_t * lifo, 
opal_list_item_t * item) Line 77 C++
libmpid.dll!ompi_free_list_grow(ompi_free_list_t * flist, unsigned __int64 
num_elements) Line 237 C++
libmpid.dll!ompi_rb_tree_init(ompi_rb_tree_t * tree, int (void *, void *) * 
comp) Line 77   C++
libmpid.dll!mca_mpool_base_tree_init() Line 88  C++
libmpid.dll!mca_mpool_base_open() Line 86  C++
ompi_info.exe!ompi_info_open_components() Line 515   C++
ompi_info.exe!main(int argc, char * * argv) Line 285 C
ompi_info.exe!__tmainCRTStartup() Line 536 C
ompi_info.exe!mainCRTStartup() Line 377C
kernel32.dll!07feac87167e() Unknown
ntdll.dll!07feae4cc3f1()Unknown

My question is : has anyone tested msvc 12 and openmpi and can they recommend a 
source version I can use to compile and enable threads. If this combination of 
compilers etc is not yet supported, how can I help fix this. The fact that 
ompi_info reports "Thread support: no" indicates to me that either the cmake 
config is failing, or I've messed up with options. I tried the v1.7 branch, but 
the cmake support appears flaky. I'm willing to either fix the 1.7 cmake or the 
1.6 thread lock, if necessary, but I don't want to waste my time if it isn't 
going to work within a reasonable amount of debugging. I welcome any advice on 
how to get this compiling and working and offer cmake related help if you need 
it to work on this platform.
NB. I think I said my program runs, but actually, with threads enabled it bombs
out during MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided); it
runs without threads, but I need them.

Thanks

JB

output of ompi_info

 Package: Open MPI biddisco@CRUSCA Distribution
Open MPI: 1.6.3a1-1
   Open MPI SVN revision: -1
   Open MPI release date: Unreleased developer copy
Open RTE: 1.6.3a1-1
   Open RTE SVN revision: -1
   Open RTE release date: Unreleased developer copy
OPAL: 1.6.3a1-1
   OPAL SVN revision: -1
   OPAL release date: Unreleased developer copy
 MPI API: 2.1
Ident string: 1.6.3a1
  Prefix: D:\build\openmpi\Debug/..
Configured architecture: Windows-6.2 64 bit
  Configure host: CRUSCA
   Configured by: biddisco
   Configured on: 07:52 11/10/2012
  Configure host: CRUSCA
Built by: biddisco
Built on: 07:52 11/10/2012
  Built host: CRUSCA
  C bindings: yes
C++ bindings: yes
  Fortran77 bindings: no
  Fortran90 bindings: no
Fortran90 bindings size: na
  C compiler: cl
 C compiler absolute: C:/Program Files (x86)/Microsoft Visual Studio
  11.0/VC/bin/x86_amd64/cl.exe
  C compiler family name: MICROSOFT
  C compiler version: 1700
C++ compiler: cl
  C++ compiler absolute: C:/Program Files (x86)/Microsoft Visual Studio
  11.0/VC/bin/x86_amd64/cl.exe
  Fortran77 compiler: none
  Fortran77 compiler abs: none
  Fortran90 compiler: none
  Fortran90 compiler abs: none
 C profiling: yes
   C++ profiling: yes
 Fortran77 profiling: no
 Fortran90 profiling: no
  C++ exceptions: no
  Thread support: no
   Sparse Groups: no
  Internal debug support: no
  MPI interface warnings: no
 MPI parameter check: never
Memory profiling support: no
Memory debugging support: no
 libltdl support: no
   Heterogeneous support: no
mpirun default --prefix: yes
 MPI I/O support: yes
   MPI_WTIME support: gettimeofday
 Symbol vis. support: yes
   Host topology support: no
  MPI extensions: none
   FT Checkpoint support: yes (checkpoint thread: no)
 VampirTrace support: no
  MPI_MAX_PROCESSOR_NAME: 256
MPI_MAX_ERROR_STRING: 256
 MPI_MAX_OBJECT_NAME: 64
MPI_MAX_INFO_KEY: 36
MPI_MAX_INFO_VAL: 256
   MPI_MAX_PORT_NAME: 1024
  MPI_MAX_DATAREP_STRING: 128