Re: [OMPI users] setsockopt() fails with EINVAL on solaris

2012-07-30 Thread TERRY DONTJE
Do you know what r# of 1.6 you were trying to compile?  Is this via the 
tarball or svn?


thanks,

--td

On 7/30/2012 9:41 AM, Daniel Junglas wrote:

Hi,

I compiled OpenMPI 1.6 on a 64bit Solaris ultrasparc machine.
Compilation and installation worked without a problem. However,
when trying to run an application with mpirun I always faced
this error:

[hostname:14798] [[50433,0],0] rmcast:init: setsockopt() failed on
MULTICAST_IF
 for multicast network xxx.xxx.xxx.xxx interface xxx.xxx.xxx.xxx
 Error: Invalid argument (22)
[hostname:14798] [[50433,0],0] ORTE_ERROR_LOG: Error in file
../../../../../openmpi-1.6/orte/mca/rmcast/udp/rmcast_udp.c at line 825
[hostname:14798] [[50433,0],0] ORTE_ERROR_LOG: Error in file
../../../../../openmpi-1.6/orte/mca/rmcast/udp/rmcast_udp.c at line 744
[hostname:14798] [[50433,0],0] ORTE_ERROR_LOG: Error in file
../../../../../openmpi-1.6/orte/mca/rmcast/udp/rmcast_udp.c at line 193
[hostname:14798] [[50433,0],0] ORTE_ERROR_LOG: Error in file
../../../../openmpi-1.6/orte/mca/rmcast/base/rmcast_base_select.c at line
56
[hostname:14798] [[50433,0],0] ORTE_ERROR_LOG: Error in file
../../../../../openmpi-1.6/orte/mca/ess/hnp/ess_hnp_module.c at line 233
--
It looks like orte_init failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during orte_init; some of which are due to configuration or
environment problems.  This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

   orte_rmcast_base_select failed
   -->  Returned value Error (-1) instead of ORTE_SUCCESS


After some digging I found that the following patch seems to fix the
problem (at least the application seems to run correctly now):
--- a/orte/mca/rmcast/udp/rmcast_udp.c  Tue Apr  3 16:30:29 2012
+++ b/orte/mca/rmcast/udp/rmcast_udp.c  Mon Jul 30 15:12:02 2012
@@ -936,9 +936,16 @@
         }
     } else {
         /* on the xmit side, need to set the interface */
+        void const *addrptr;
         memset(&inaddr, 0, sizeof(inaddr));
         inaddr.sin_addr.s_addr = htonl(chan->interface);
+#ifdef __sun
+        addrlen = sizeof(inaddr.sin_addr);
+        addrptr = (void *)&inaddr.sin_addr;
+#else
         addrlen = sizeof(struct sockaddr_in);
+        addrptr = (void *)&inaddr;
+#endif
 
         OPAL_OUTPUT_VERBOSE((2, orte_rmcast_base.rmcast_output,
                              "setup:socket:xmit interface %03d.%03d.%03d.%03d",
@@ -945,7 +952,7 @@
                              OPAL_IF_FORMAT_ADDR(chan->interface)));
 
         if ((setsockopt(target_sd, IPPROTO_IP, IP_MULTICAST_IF,
-                        (void *)&inaddr, addrlen)) < 0) {
+                        addrptr, addrlen)) < 0) {
             opal_output(0, "%s rmcast:init: setsockopt() failed on MULTICAST_IF\n"
                         "\tfor multicast network %03d.%03d.%03d.%03d interface %03d.%03d.%03d.%03d\n\tError: %s (%d)",
                         ORTE_NAME_PRINT(ORTE_PROC_MY_NAME),
Can anybody confirm that the patch is good/correct? In particular
that the '__sun' part is the right thing to do?
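
For context, here is a standalone sketch of the portability point the patch
works around (my own illustration, not the Open MPI source): the classic
sockets form of IP_MULTICAST_IF takes a struct in_addr naming the outgoing
interface, whereas passing a whole struct sockaddr_in (and its size) is
tolerated on some platforms but rejected with EINVAL on Solaris.

#include <stdio.h>
#include <string.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* set the outgoing interface for IPv4 multicast on an already-open UDP socket */
int set_mcast_interface(int sd, const char *interface_ip)
{
    struct in_addr iface;

    memset(&iface, 0, sizeof(iface));
    if (inet_pton(AF_INET, interface_ip, &iface) != 1)
        return -1;

    /* portable form: pass a struct in_addr, not a struct sockaddr_in */
    if (setsockopt(sd, IPPROTO_IP, IP_MULTICAST_IF, &iface, sizeof(iface)) < 0) {
        perror("setsockopt(IP_MULTICAST_IF)");
        return -1;
    }
    return 0;
}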

Thanks,

Daniel


___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users


--
Terry D. Dontje | Principal Software Engineer
Developer Tools Engineering | +1.781.442.2631
Oracle *- Performance Technologies*
95 Network Drive, Burlington, MA 01803
Email terry.don...@oracle.com 





Re: [OMPI users] sndlib problem by mpicc compiler

2012-07-30 Thread TERRY DONTJE
Please show me how you are compiling the program under gcc and mpicc.  
Plus do a "mpicc --showme".


--td

On 7/30/2012 8:33 AM, Paweł Jaromin wrote:

This situation is also strange for me; I spent two days trying to find the bug :(.

Unfortunately I am not a professional C/C++ programmer, but I have to
make this program. Please have a look at the picture at the link below;
maybe it will make things clearer.

http://vipjg.nazwa.pl/sndfile_error.png









2012/7/30 TERRY DONTJE<terry.don...@oracle.com>:

On 7/30/2012 6:11 AM, Paweł Jaromin wrote:

Hello

Thanks for the fast answer, but the problem looks a little different.

Of course, I use this code only on the master node (rank 0), because only
this node has access to the file.

As you can see, I use an "if" clause to check sndFile for NULL:

if (sndFile == NULL)

and it returns a non-NULL value, so the code can continue.
I found the problem while checking the array:


    long numFrames = sf_readf_float(sndFile, snd_buffor, sfinfo.frames);

    // Check correct number of samples loaded
    if (numFrames != sfinfo.frames) {
        fprintf(stderr, "Did not read enough frames for source\n");
        sf_close(sndFile);
        free(snd_buffor);
        MPI_Finalize();
        return 1;
    }

So after that I went to the debugger to check the variables (I use Eclipse PTP
and the sdm environment); after initialization the variable "sndFile" has
"no value", not "NULL". Unfortunately sndFile still has the same
value to the end of the program :(.

What do you mean by sndFile has "no value"?  There isn't a special "no
value" value for a variable unless you are debugging code that somehow had
the variable optimized out at the particular line you are interested in.

Declarations:
FILE *outfile = NULL;
SF_INFO sfinfo;
SNDFILE *sndFile = NULL;

What is very interesting is that "sfinfo" from the same library works perfectly.
At the end of this story, I modified the program to not use MPI, then
compiled it with gcc (not mpicc), and it works fine (in the debugger sndFile
has a proper value).

So it seems you believe mpicc is doing something wrong, when mpicc is just
a wrapper around a compiler.  Maybe doing a "mpicc --showme" will show you
what compiler and options mpicc is passing to the underlying compiler.  This
should give you an idea of the difference between your gcc and mpicc
compilation.  I would suspect either mpicc is using a compiler significantly
different from gcc, or that mpicc might be passing some optimization
parameter that is messing up the code execution (just a guess).


I hope it is clear now.

Not really.

--td



2012/7/30 TERRY DONTJE<terry.don...@oracle.com>:

I am not sure I am understanding the problem correctly, so let me describe it
back to you with a couple of clarifications.

So your program using sf_open compiles successfully when using gcc and
mpicc.  However, when you run the executable compiled using mpicc, sndFile is
null?

If the above is right, can you tell us how you ran the code?
Will the code run OK if run with "mpirun -np 1" on the same machine where you
normally run the gcc-compiled code?
When the mpicc-compiled code's sf_open call returns NULL, what does the
subsequent sf_strerror report?
My wild guess is that when you run the mpicc-compiled code, one of the processes
is on a node that doesn't have access to the file passed to sf_open.

--td

On 7/28/2012 1:08 PM, Paweł Jaromin wrote:

Hello all

Because I am trying to make a program for parallel processing of sound files,
I use the libsndfile library to load and write wav files. The situation is
strange: when I compile the program with gcc it is good (no parallel), but
if I compile it with mpicc there is a problem with the sndFile variable.

// Open sound file
SF_INFO sndInfo;
SNDFILE *sndFile = sf_open(argv[1], SFM_READ, &sndInfo);
if (sndFile == NULL) {
    fprintf(stderr, "Error reading source file '%s': %s\n", argv[1],
            sf_strerror(sndFile));
    return 1;
}

This code runs without an error, but the variable is "No value".

Maybe someone can help me?


--
Terry D. Dontje | Principal Software Engineer
Developer Tools Engineering | +1.781.442.2631
Oracle - Performance Technologies
95 Network Drive, Burlington, MA 01803
Email terry.don...@oracle.com




___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users



--
Terry D. Dontje | Principal Software Engineer
Developer Tools Engineering | +1.781.442.2631
Oracle - Performance Technologies
95 Network Drive, Burlington, MA 01803
Email terry.don...@oracle.com




___
users mailing list
us...@open-mpi.org
http://w

Re: [OMPI users] sndlib problem by mpicc compiler

2012-07-30 Thread TERRY DONTJE

On 7/30/2012 6:11 AM, Paweł Jaromin wrote:

Hello

Thanks for the fast answer, but the problem looks a little different.

Of course, I use this code only on the master node (rank 0), because only
this node has access to the file.

As you can see, I use an "if" clause to check sndFile for NULL:

if (sndFile == NULL)

and it returns a non-NULL value, so the code can continue.
I found the problem while checking the array:


    long numFrames = sf_readf_float(sndFile, snd_buffor, sfinfo.frames);

    // Check correct number of samples loaded
    if (numFrames != sfinfo.frames) {
        fprintf(stderr, "Did not read enough frames for source\n");
        sf_close(sndFile);
        free(snd_buffor);
        MPI_Finalize();
        return 1;
    }

So after that I went to the debugger to check the variables (I use Eclipse PTP
and the sdm environment); after initialization the variable "sndFile" has
"no value", not "NULL". Unfortunately sndFile still has the same
value to the end of the program :(.
What do you mean by sndFile has "no value"?  There isn't a special "no
value" value for a variable unless you are debugging code that somehow had
the variable optimized out at the particular line you are interested in.

Declarations:
FILE *outfile = NULL;
SF_INFO sfinfo;
SNDFILE *sndFile = NULL;

What is very interesting is that "sfinfo" from the same library works perfectly.
At the end of this story, I modified the program to not use MPI, then
compiled it with gcc (not mpicc), and it works fine (in the debugger sndFile
has a proper value).
So it seems you believe mpicc is doing something wrong, when mpicc is just
a wrapper around a compiler.  Maybe doing a "mpicc --showme" will show you
what compiler and options mpicc is passing to the underlying compiler.  This
should give you an idea of the difference between your gcc and mpicc
compilation.  I would suspect either mpicc is using a compiler significantly
different from gcc, or that mpicc might be passing some optimization
parameter that is messing up the code execution (just a guess).


I hope it is clear now.

Not really.

--td



2012/7/30 TERRY DONTJE<terry.don...@oracle.com>:

I am not sure I am understanding the problem correctly, so let me describe it
back to you with a couple of clarifications.

So your program using sf_open compiles successfully when using gcc and
mpicc.  However, when you run the executable compiled using mpicc, sndFile is
null?

If the above is right, can you tell us how you ran the code?
Will the code run OK if run with "mpirun -np 1" on the same machine where you
normally run the gcc-compiled code?
When the mpicc-compiled code's sf_open call returns NULL, what does the
subsequent sf_strerror report?
My wild guess is that when you run the mpicc-compiled code, one of the processes
is on a node that doesn't have access to the file passed to sf_open.

--td

On 7/28/2012 1:08 PM, Paweł Jaromin wrote:

Hello all

Because I am trying to make a program for parallel processing of sound files,
I use the libsndfile library to load and write wav files. The situation is
strange: when I compile the program with gcc it is good (no parallel), but
if I compile it with mpicc there is a problem with the sndFile variable.

// Open sound file
SF_INFO sndInfo;
SNDFILE *sndFile = sf_open(argv[1], SFM_READ, &sndInfo);
if (sndFile == NULL) {
    fprintf(stderr, "Error reading source file '%s': %s\n", argv[1],
            sf_strerror(sndFile));
    return 1;
}

This code runs without an error, but the variable is "No value".

Maybe someone can help me?


--
Terry D. Dontje | Principal Software Engineer
Developer Tools Engineering | +1.781.442.2631
Oracle - Performance Technologies
95 Network Drive, Burlington, MA 01803
Email terry.don...@oracle.com




___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users





--
Terry D. Dontje | Principal Software Engineer
Developer Tools Engineering | +1.781.442.2631
Oracle *- Performance Technologies*
95 Network Drive, Burlington, MA 01803
Email terry.don...@oracle.com <mailto:terry.don...@oracle.com>





Re: [OMPI users] sndlib problem by mpicc compiler

2012-07-30 Thread TERRY DONTJE
I am not sure I am understanding the problem correctly, so let me
describe it back to you with a couple of clarifications.

So your program using sf_open compiles successfully when using gcc and
mpicc.  However, when you run the executable compiled using mpicc,
sndFile is null?

If the above is right, can you tell us how you ran the code?
Will the code run OK if run with "mpirun -np 1" on the same machine where
you normally run the gcc-compiled code?
When the mpicc-compiled code's sf_open call returns NULL, what does the
subsequent sf_strerror report?
My wild guess is that when you run the mpicc-compiled code, one of the
processes is on a node that doesn't have access to the file passed to
sf_open.


--td
On 7/28/2012 1:08 PM, Paweł Jaromin wrote:

Hello all

Because I am trying to make a program for parallel processing of sound files,
I use the libsndfile library to load and write wav files. The situation is
strange: when I compile the program with gcc it is good (no parallel), but
if I compile it with mpicc there is a problem with the sndFile variable.

// Open sound file
SF_INFO sndInfo;
SNDFILE *sndFile = sf_open(argv[1], SFM_READ, &sndInfo);
if (sndFile == NULL) {
    fprintf(stderr, "Error reading source file '%s': %s\n", argv[1],
            sf_strerror(sndFile));
    return 1;
}

This code runs without an error, but the variable is "No value".

Maybe someone can help me?



--
Terry D. Dontje | Principal Software Engineer
Developer Tools Engineering | +1.781.442.2631
Oracle *- Performance Technologies*
95 Network Drive, Burlington, MA 01803
Email terry.don...@oracle.com 





Re: [OMPI users] MPI_Comm_spawn and exit of parent process.

2012-06-18 Thread TERRY DONTJE

On 6/16/2012 8:03 AM, Roland Schulz wrote:

Hi,

I would like to start a single process without mpirun and then use 
MPI_Comm_spawn to start up as many processes as required. I don't want 
the parent process to take up any resources, so I tried to disconnect 
the inter communicator and then finalize mpi and exit the parent. But 
as soon as I do that the children exit too. Why is that? Can I somehow 
change that behavior? Or can I wait on the children to exit without 
the waiting taking up CPU time?


The reason I don't need the parent as soon as the children are 
spawned, is that I need one intra-communicator over all processes. And 
as far as I know I cannot join the parent and children to one 
intra-communicator.
You could use MPI_Intercomm_merge to create an intra-communicator out of
the groups in an inter-communicator; pass it the inter-communicator you
get back from the MPI_Comm_spawn call.
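
For what it's worth, here is a minimal sketch of that approach (my own
illustration, not Roland's program; the worker count of 4 is an assumption):

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Comm parent, intercomm, everyone;
    int rank, size;

    MPI_Init(&argc, &argv);
    MPI_Comm_get_parent(&parent);

    if (parent == MPI_COMM_NULL) {
        /* parent: spawn 4 copies of this same binary */
        MPI_Comm_spawn(argv[0], MPI_ARGV_NULL, 4, MPI_INFO_NULL, 0,
                       MPI_COMM_SELF, &intercomm, MPI_ERRCODES_IGNORE);
        /* merge parent + children; high=0 orders the parent first */
        MPI_Intercomm_merge(intercomm, 0, &everyone);
    } else {
        /* child: merge using the inter-communicator to the parent */
        MPI_Intercomm_merge(parent, 1, &everyone);
    }

    MPI_Comm_rank(everyone, &rank);
    MPI_Comm_size(everyone, &size);
    printf("rank %d of %d in the merged intra-communicator\n", rank, size);

    MPI_Comm_free(&everyone);
    MPI_Finalize();
    return 0;
}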


--td


The purpose of the whole exercise is that I want my program to
use all cores of a node by default when executed without mpirun.


I have tested this with OpenMPI 1.4.5. A sample program is here:
http://pastebin.com/g2XSZwvY . "Child finalized" is only printed when
the sleep(2) in the parent is not commented out.


Roland

--
ORNL/UT Center for Molecular Biophysics cmb.ornl.gov 
865-241-1537, ORNL PO BOX 2008 MS6309


___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users


--
Terry D. Dontje | Principal Software Engineer
Developer Tools Engineering | +1.781.442.2631
Oracle *- Performance Technologies*
95 Network Drive, Burlington, MA 01803
Email terry.don...@oracle.com 





Re: [OMPI users] testing for openMPI

2012-06-07 Thread TERRY DONTJE

Try: ps -elf | grep hello
This should list out all the processes named hello.
In that output is the pid (it should be the 4th column) of the process; give
your debugger that pid.  For example, if the pid was 1234 you'd attach with
"gdb -p 1234".


Actually Jeff's suggestion of this being a firewall issue is something 
to look into.


--td


On 6/7/2012 6:36 AM, Duke wrote:

On 6/7/12 5:31 PM, TERRY DONTJE wrote:
Can you get on one of the nodes and see the job's processes?  If so 
can you then attach a debugger to it and get a stack?  I wonder if 
the processes are stuck in MPI_Init?


Thanks Terry for your suggestion, but please let me know how I would
do it. I can ssh to the nodes, but how do I check the job's processes? I
am new to this.


Thanks,

D.



--td

On 6/7/2012 6:06 AM, Duke wrote:

Hi again,

Somehow the verbose flag (-v) did not work for me. I tried 
--debug-daemon and got:


[mpiuser@fantomfs40a ~]$ mpirun --debug-daemons -np 3 --machinefile 
/home/mpiuser/.mpi_hostfile ./test/mpihello

Daemon was launched on hp430a - beginning to initialize
Daemon [[34432,0],1] checking in as pid 3011 on host hp430a


Somehow the program got stuck when checking on hosts. The secure log 
on hp430a showed that mpiuser logged in just fine:


tail /var/log/secure
Jun  7 17:07:31 hp430a sshd[3007]: Accepted publickey for mpiuser 
from 192.168.0.101 port 34037 ssh2
Jun  7 17:07:31 hp430a sshd[3007]: pam_unix(sshd:session): session 
opened for user mpiuser by (uid=0)


Any idea where/how/what to process/check?

Thanks,

D.

On 6/7/12 4:38 PM, Duke wrote:

Hi Jingha,

On 6/7/12 4:28 PM, Jingcha Joba wrote:

Hello Duke,
Welcome to the forum.
The way openmpi schedules by default is to fill all the slots in a 
host, before moving on to next host.

Check this link for some info:
http://www.open-mpi.org/faq/?category=running#mpirun-scheduling


Thanks for the quick answer. I checked the FAQ and tried with more than
2 processes, but somehow it stalled:


[mpiuser@fantomfs40a ~]$ mpirun -v -np 4 --machinefile 
/home/mpiuser/.mpi_hostfile ./test/mpihello

^Cmpirun: killing job...

I tried --host flag and it got stalled as well:

[mpiuser@fantomfs40a ~]$ mpirun -v -np 4 --host hp430a,hp430b 
./test/mpihello



My configuration must be wrong somewhere. Any idea how I can check
the system?


Thanks,

D.




--
Jingcha
On Thu, Jun 7, 2012 at 2:11 AM, Duke <duke.li...@gmx.com 
<mailto:duke.li...@gmx.com>> wrote:


Hi folks,

Please be gentle to the newest member of openMPI, I am totally
new to this field. I just built a test cluster with 3 boxes on
Scientific Linux 6.2 and openMPI (Open MPI 1.5.3), and I
wanted to test how the cluster works but I can't figure out
what was/is happening. On my master node, I have the hostfile:

[mpiuser@fantomfs40a ~]$ cat .mpi_hostfile
# The Hostfile for Open MPI
fantomfs40a slots=2
hp430a slots=4 max-slots=4
hp430b slots=4 max-slots=4

To test, I used the following c code:

[mpiuser@fantomfs40a ~]$ cat test/mpihello.c
/* program hello */
/* Adapted from mpihello.f by drs */

#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
 int *buf, i, rank, nints, len;
 char hostname[256];

 MPI_Init(&argc, &argv);
 MPI_Comm_rank(MPI_COMM_WORLD, &rank);
 gethostname(hostname,255);
 printf("Hello world!  I am process number: %d on host %s\n",
rank, hostname);
 MPI_Finalize();
 return 0;
}

and then compiled and ran:

[mpiuser@fantomfs40a ~]$ mpicc -o test/mpihello test/mpihello.c
[mpiuser@fantomfs40a ~]$ mpirun -np 2 --machinefile
/home/mpiuser/.mpi_hostfile ./test/mpihello
Hello world!  I am process number: 0 on host fantomfs40a
Hello world!  I am process number: 1 on host fantomfs40a

Unfortunately the result did not show what I wanted. I
expected to see something like:

Hello world!  I am process number: 0 on host hp430a
Hello world!  I am process number: 1 on host hp430b

Anybody has any idea what I am doing wrong?

Thank you in advance,

D.





___
users mailing list
us...@open-mpi.org <mailto:us...@open-mpi.org>
http://www.open-mpi.org/mailman/listinfo.cgi/users




___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users




___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users




___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users


--
Terry D. Dontje | Principal Software Engineer
Developer Tools Engineering | +1.781.442.2631
Oracle *- Performance Technologies*
95 Network Drive, Burlington, MA 01803
Email terry.don...@oracle.com <mailto:terry.don...@oracle.com>





_

Re: [OMPI users] testing for openMPI

2012-06-07 Thread TERRY DONTJE
Another sanity check to try: see if you can run your test program on
just one of the nodes. If that works, more than likely MPI is having
issues setting up connections between the nodes.


--td

On 6/7/2012 6:06 AM, Duke wrote:

Hi again,

Somehow the verbose flag (-v) did not work for me. I tried 
--debug-daemon and got:


[mpiuser@fantomfs40a ~]$ mpirun --debug-daemons -np 3 --machinefile 
/home/mpiuser/.mpi_hostfile ./test/mpihello

Daemon was launched on hp430a - beginning to initialize
Daemon [[34432,0],1] checking in as pid 3011 on host hp430a


Somehow the program got stuck when checking on hosts. The secure log 
on hp430a showed that mpiuser logged in just fine:


tail /var/log/secure
Jun  7 17:07:31 hp430a sshd[3007]: Accepted publickey for mpiuser from 
192.168.0.101 port 34037 ssh2
Jun  7 17:07:31 hp430a sshd[3007]: pam_unix(sshd:session): session 
opened for user mpiuser by (uid=0)


Any idea where/how/what to process/check?

Thanks,

D.

On 6/7/12 4:38 PM, Duke wrote:

Hi Jingha,

On 6/7/12 4:28 PM, Jingcha Joba wrote:

Hello Duke,
Welcome to the forum.
The way openmpi schedules by default is to fill all the slots in a 
host, before moving on to next host.

Check this link for some info:
http://www.open-mpi.org/faq/?category=running#mpirun-scheduling


Thanks for the quick answer. I checked the FAQ and tried with more than
2 processes, but somehow it stalled:


[mpiuser@fantomfs40a ~]$ mpirun -v -np 4 --machinefile 
/home/mpiuser/.mpi_hostfile ./test/mpihello

^Cmpirun: killing job...

I tried --host flag and it got stalled as well:

[mpiuser@fantomfs40a ~]$ mpirun -v -np 4 --host hp430a,hp430b 
./test/mpihello



My configuration must be wrong somewhere. Any idea how I can check
the system?


Thanks,

D.




--
Jingcha
On Thu, Jun 7, 2012 at 2:11 AM, Duke > wrote:


Hi folks,

Please be gentle to the newest member of openMPI, I am totally
new to this field. I just built a test cluster with 3 boxes on
Scientific Linux 6.2 and openMPI (Open MPI 1.5.3), and I wanted
to test how the cluster works but I can't figure out what was/is
happening. On my master node, I have the hostfile:

[mpiuser@fantomfs40a ~]$ cat .mpi_hostfile
# The Hostfile for Open MPI
fantomfs40a slots=2
hp430a slots=4 max-slots=4
hp430b slots=4 max-slots=4

To test, I used the following c code:

[mpiuser@fantomfs40a ~]$ cat test/mpihello.c
/* program hello */
/* Adapted from mpihello.f by drs */

#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
 int *buf, i, rank, nints, len;
 char hostname[256];

 MPI_Init(&argc, &argv);
 MPI_Comm_rank(MPI_COMM_WORLD, &rank);
 gethostname(hostname,255);
 printf("Hello world!  I am process number: %d on host %s\n",
rank, hostname);
 MPI_Finalize();
 return 0;
}

and then compiled and ran:

[mpiuser@fantomfs40a ~]$ mpicc -o test/mpihello test/mpihello.c
[mpiuser@fantomfs40a ~]$ mpirun -np 2 --machinefile
/home/mpiuser/.mpi_hostfile ./test/mpihello
Hello world!  I am process number: 0 on host fantomfs40a
Hello world!  I am process number: 1 on host fantomfs40a

Unfortunately the result did not show what I wanted. I expected
to see something like:

Hello world!  I am process number: 0 on host hp430a
Hello world!  I am process number: 1 on host hp430b

Anybody has any idea what I am doing wrong?

Thank you in advance,

D.





___
users mailing list
us...@open-mpi.org 
http://www.open-mpi.org/mailman/listinfo.cgi/users




___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users




___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users




___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users


--
Terry D. Dontje | Principal Software Engineer
Developer Tools Engineering | +1.781.442.2631
Oracle *- Performance Technologies*
95 Network Drive, Burlington, MA 01803
Email terry.don...@oracle.com 





Re: [OMPI users] testing for openMPI

2012-06-07 Thread TERRY DONTJE
Can you get on one of the nodes and see the job's processes?  If so can 
you then attach a debugger to it and get a stack?  I wonder if the 
processes are stuck in MPI_Init?


--td

On 6/7/2012 6:06 AM, Duke wrote:

Hi again,

Somehow the verbose flag (-v) did not work for me. I tried 
--debug-daemon and got:


[mpiuser@fantomfs40a ~]$ mpirun --debug-daemons -np 3 --machinefile 
/home/mpiuser/.mpi_hostfile ./test/mpihello

Daemon was launched on hp430a - beginning to initialize
Daemon [[34432,0],1] checking in as pid 3011 on host hp430a


Somehow the program got stuck when checking on hosts. The secure log 
on hp430a showed that mpiuser logged in just fine:


tail /var/log/secure
Jun  7 17:07:31 hp430a sshd[3007]: Accepted publickey for mpiuser from 
192.168.0.101 port 34037 ssh2
Jun  7 17:07:31 hp430a sshd[3007]: pam_unix(sshd:session): session 
opened for user mpiuser by (uid=0)


Any idea where/how/what to process/check?

Thanks,

D.

On 6/7/12 4:38 PM, Duke wrote:

Hi Jingha,

On 6/7/12 4:28 PM, Jingcha Joba wrote:

Hello Duke,
Welcome to the forum.
The way openmpi schedules by default is to fill all the slots in a 
host, before moving on to next host.

Check this link for some info:
http://www.open-mpi.org/faq/?category=running#mpirun-scheduling


Thanks for the quick answer. I checked the FAQ and tried with more than
2 processes, but somehow it stalled:


[mpiuser@fantomfs40a ~]$ mpirun -v -np 4 --machinefile 
/home/mpiuser/.mpi_hostfile ./test/mpihello

^Cmpirun: killing job...

I tried --host flag and it got stalled as well:

[mpiuser@fantomfs40a ~]$ mpirun -v -np 4 --host hp430a,hp430b 
./test/mpihello



My configuration must be wrong somewhere. Any idea how I can check
the system?


Thanks,

D.




--
Jingcha
On Thu, Jun 7, 2012 at 2:11 AM, Duke > wrote:


Hi folks,

Please be gentle to the newest member of openMPI, I am totally
new to this field. I just built a test cluster with 3 boxes on
Scientific Linux 6.2 and openMPI (Open MPI 1.5.3), and I wanted
to test how the cluster works but I can't figure out what was/is
happening. On my master node, I have the hostfile:

[mpiuser@fantomfs40a ~]$ cat .mpi_hostfile
# The Hostfile for Open MPI
fantomfs40a slots=2
hp430a slots=4 max-slots=4
hp430b slots=4 max-slots=4

To test, I used the following c code:

[mpiuser@fantomfs40a ~]$ cat test/mpihello.c
/* program hello */
/* Adapted from mpihello.f by drs */

#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
 int *buf, i, rank, nints, len;
 char hostname[256];

 MPI_Init(&argc, &argv);
 MPI_Comm_rank(MPI_COMM_WORLD, &rank);
 gethostname(hostname,255);
 printf("Hello world!  I am process number: %d on host %s\n",
rank, hostname);
 MPI_Finalize();
 return 0;
}

and then compiled and ran:

[mpiuser@fantomfs40a ~]$ mpicc -o test/mpihello test/mpihello.c
[mpiuser@fantomfs40a ~]$ mpirun -np 2 --machinefile
/home/mpiuser/.mpi_hostfile ./test/mpihello
Hello world!  I am process number: 0 on host fantomfs40a
Hello world!  I am process number: 1 on host fantomfs40a

Unfortunately the result did not show what I wanted. I expected
to see something like:

Hello world!  I am process number: 0 on host hp430a
Hello world!  I am process number: 1 on host hp430b

Anybody has any idea what I am doing wrong?

Thank you in advance,

D.





___
users mailing list
us...@open-mpi.org 
http://www.open-mpi.org/mailman/listinfo.cgi/users




___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users




___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users




___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users


--
Terry D. Dontje | Principal Software Engineer
Developer Tools Engineering | +1.781.442.2631
Oracle *- Performance Technologies*
95 Network Drive, Burlington, MA 01803
Email terry.don...@oracle.com 





Re: [OMPI users] "-library=stlport4" neccessary for Sun C

2012-06-06 Thread TERRY DONTJE



On 6/6/2012 4:38 AM, Siegmar Gross wrote:

Hello,

I compiled "openmpi-1.6" on "Solaris 10 sparc", "Solaris 10 x86",
and Linux (openSuSE 12.1) with "Sun C 5.12". Today I searched my
log-files for "WARNING" and found the following message.

WARNING: **
WARNING: *** VampirTrace cannot be built due to your STL appears to
WARNING: *** be broken.
WARNING: *** Please try again re-configuring Open MPI with using
WARNING: *** the STLport4 by adding the compiler flag -library=stlport4
WARNING: *** to CXXFLAGS.
WARNING: *** Pausing to give you time to read this message...
WARNING: **

With this recommendation I could configure and build VampirTrace
support. Perhaps somebody can add this option as default to
"configure" for "Sun C 5.12" on Solaris and Linux.
STLport4 should *not* be the default in cases where VT is not built for
OMPI with the Oracle compilers.  I imagine the configure code could be
bent to detect that VT is going to be built and then default to STLport4,
but I vaguely recollect this is easier said than done.


I'll open a ticket on this issue but I am not going to promise this will 
be addressed anytime soon unless someone else decides to take a crack at 
this issue.


--
Terry D. Dontje | Principal Software Engineer
Developer Tools Engineering | +1.781.442.2631
Oracle *- Performance Technologies*
95 Network Drive, Burlington, MA 01803
Email terry.don...@oracle.com 





Re: [OMPI users] problem with sctp.h on Solaris

2012-06-05 Thread TERRY DONTJE
This looks like a missing check in the sctp configure.m4.  I am working 
on a patch.


--td

On 6/5/2012 10:10 AM, Siegmar Gross wrote:

Hello,

I compiled "openmpi-1.6" on "Solaris 10 sparc" and "Solaris 10 x86"
with "gcc-4.6.2" and "Sun C 5.12". Today I searched my log-files for
"WARNING" and found the following message.

WARNING: netinet/sctp.h: present but cannot be compiled
WARNING: netinet/sctp.h: check for missing prerequisite headers?
WARNING: netinet/sctp.h: see the Autoconf documentation
WARNING: netinet/sctp.h: section "Present But Cannot Be Compiled"
WARNING: netinet/sctp.h: proceeding with the compiler's result
WARNING: ## -- ##
WARNING: ## Report this to http://www.open-mpi.org/community/help/ ##
WARNING: ## -- ##

Looking in "config.log" showed that some types are undefined.

tyr openmpi-1.6-SunOS.sparc.64_cc 323 grep sctp config.log
configure:119568: result: elan, mx, ofud, openib, portals, sctp, sm, tcp, udapl
configure:125730: checking for MCA component btl:sctp compile mode
configure:125752: checking --with-sctp value
configure:125862: checking --with-sctp-libdir value
configure:125946: checking netinet/sctp.h usability
"/usr/include/netinet/sctp.h", line 228:
   incomplete struct/union/enum sockaddr_storage: spc_aaddr
"/usr/include/netinet/sctp.h", line 530: syntax error before or at: socklen_t
"/usr/include/netinet/sctp.h", line 533: syntax error before or at: socklen_t
"/usr/include/netinet/sctp.h", line 537: syntax error before or at: socklen_t
"/usr/include/netinet/sctp.h", line 772: syntax error before or at: ipaddr_t
"/usr/include/netinet/sctp.h", line 779: syntax error before or at: in6_addr_t
| #include <netinet/sctp.h>
...

The missing types are defined via. In which files must
I include this header file to avoid the warning? Thank you very much
for any help in advance.


Kind regards

Siegmar

___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users


--
Terry D. Dontje | Principal Software Engineer
Developer Tools Engineering | +1.781.442.2631
Oracle *- Performance Technologies*
95 Network Drive, Burlington, MA 01803
Email terry.don...@oracle.com 





Re: [OMPI users] possible bug exercised by mpi4py

2012-05-25 Thread TERRY DONTJE
BTW, the changes prior to r26496 failed some of the MTT test runs on
several systems.  So if the current implementation is deemed not
"correct", I suspect we will need to figure out whether there are changes
to the tests that need to be made.


See http://www.open-mpi.org/mtt/index.php?do_redir=2066 for some of the 
failures I think are due to r26495 reduce scatter changes.


--td

On 5/25/2012 12:27 AM, George Bosilca wrote:

On May 24, 2012, at 23:48 , Dave Goodell wrote:


On May 24, 2012, at 10:34 PM CDT, George Bosilca wrote:


On May 24, 2012, at 23:18, Dave Goodell  wrote:


So I take back my prior "right".  Upon further inspection of the text and the
MPICH2 code, I believe it to be true that the number of elements in the recvcounts
array must be equal to the size of the LOCAL group.

This is quite illogical, but it will not be the first time the standard is 
lacking some. So, if I understand you correctly, in the case of an 
intercommunicator a process doesn't know how much data it has to reduce, at 
least not until it receives the array of recvcounts from the remote group. 
Weird!

No, it knows because of the restriction that $sum_i^n{recvcounts[i]}$ yields 
the same sum in each group.

I should have read the entire paragraph of the standard … including the 
rationale. Indeed, the rationale describes exactly what you mentioned.

Apparently the figure 12 on the following [MPI Forum blessed] link is supposed 
to clarify any potential misunderstanding regarding the reduce_scatter. Count 
how many elements are on each side of the intercommunicator ;)

   george.


The way it's implemented in MPICH2, and the way that makes this make a lot more 
sense to me, is that you first do intercommunicator reductions to temporary 
buffers on rank 0 in each group.  Then rank 0 scatters within the local group.  
The way I had been thinking about it was to do a local reduction followed by an 
intercomm scatter, but that isn't what the standard is saying, AFAICS.


___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users


--
Terry D. Dontje | Principal Software Engineer
Developer Tools Engineering | +1.781.442.2631
Oracle *- Performance Technologies*
95 Network Drive, Burlington, MA 01803
Email terry.don...@oracle.com 





Re: [OMPI users] Regarding the execution time calculation

2012-05-08 Thread TERRY DONTJE

On 5/7/2012 8:40 PM, Jeff Squyres (jsquyres) wrote:

On May 7, 2012, at 8:31 PM, Jingcha Joba wrote:


So in the above stated example, end-start will be:  + 
20ms ?

(time slice of P2 + P3 = 20ms)

More or less (there's a nonzero amount of time required for the kernel scheduler,
and the time quantum for each of P2 and P3 is likely not *exactly* 10ms).  But
you're overthinking this.

The elapsed wall-clock time is simply (end-start).

To kind of add to what Jeff is saying, the case you are describing
sounds like oversubscription.  If you really need to find the "pure"
performance of the code, you should be running on a dedicated cluster;
otherwise you'll be battling other issues in addition to timeslicing.
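
For reference, a minimal sketch of the wall-clock measurement being discussed
(my own illustration; it assumes MPI_Wtime is the timer used for start/end):

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    double start = MPI_Wtime();   /* wall-clock time, in seconds */
    /* ... the region of code being timed ... */
    double end = MPI_Wtime();

    /* Elapsed wall-clock time includes any time the process spent scheduled
     * out, which is why oversubscription inflates the measurement. */
    printf("elapsed = %f seconds\n", end - start);

    MPI_Finalize();
    return 0;
}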


--
Terry D. Dontje | Principal Software Engineer
Developer Tools Engineering | +1.781.442.2631
Oracle *- Performance Technologies*
95 Network Drive, Burlington, MA 01803
Email terry.don...@oracle.com 





Re: [OMPI users] MPI over tcp

2012-05-04 Thread TERRY DONTJE



On 5/4/2012 1:17 PM, Don Armstrong wrote:

On Fri, 04 May 2012, Rolf vandeVaart wrote:

On Behalf Of Don Armstrong

On Thu, 03 May 2012, Rolf vandeVaart wrote:

2. If that works, then you can also run with a debug switch to
see what connections are being made by MPI.

You can see the connections being made in the attached log:

[archimedes:29820] btl: tcp: attempting to connect() to [[60576,1],2] address
138.23.141.162 on port 2001

Yes, I missed that. So, can we simplify the problem. Can you run
with np=2 and one process on each node?

It hangs in exactly the same spot without completing the initial
sm-based message. [Specifically, the GUID sending and acking appears
to complete on the tcp connection, but the actual traffic is never
sent, and the
ompi_request_wait_completion(>req_send.req_base.req_ompi);
never clears).


Also, maybe you can send the ifconfig output from each node. We
sometimes see this type of hanging when a node has two different
interfaces on the same subnet.

1: lo:  mtu 16436 qdisc noqueue state UNKNOWN
 link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
 inet 127.0.0.1/8 scope host lo
 inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: eth0:  mtu 1500 qdisc pfifo_fast state UP 
qlen 1000
 link/ether 00:30:48:7d:82:54 brd ff:ff:ff:ff:ff:ff
 inet 138.23.140.43/23 brd 138.23.141.255 scope global eth0
 inet 172.16.30.79/24 brd 172.16.30.255 scope global eth0:1
 inet6 fe80::230:48ff:fe7d:8254/64 scope link
valid_lft forever preferred_lft forever
3: eth1:  mtu 1500 qdisc pfifo_fast state 
DOWN qlen 1000
 link/ether 00:30:48:7d:82:55 brd ff:ff:ff:ff:ff:ff
 inet6 fd74:56b0:69d6::2101/118 scope global
valid_lft forever preferred_lft forever
 inet6 fe80::230:48ff:fe7d:8255/64 scope link
valid_lft forever preferred_lft forever
16: tun0:  mtu 1500 qdisc pfifo_fast 
state UNKNOWN qlen 100
 link/none
 inet 10.134.0.6/24 brd 10.134.0.255 scope global tun0
17: tun1:  mtu 1500 qdisc pfifo_fast 
state UNKNOWN qlen 100
 link/none
 inet 10.137.0.201/24 brd 10.137.0.255 scope global tun1

1: lo:  mtu 16436 qdisc noqueue state UNKNOWN
 link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
 inet 127.0.0.1/8 scope host lo
 inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: eth0:  mtu 1500 qdisc mq state UP qlen 1000
 link/ether 00:17:a4:4b:7c:ea brd ff:ff:ff:ff:ff:ff
 inet 172.16.30.110/24 brd 172.16.30.255 scope global eth0:1
 inet 138.23.141.162/23 brd 138.23.141.255 scope global eth0
 inet6 fe80::217:a4ff:fe4b:7cea/64 scope link
valid_lft forever preferred_lft forever
3: eth1:  mtu 1500 qdisc noop state DOWN qlen 1000
 link/ether 00:17:a4:4b:7c:ec brd ff:ff:ff:ff:ff:ff
7: tun0:  mtu 1500 qdisc pfifo_fast 
state UNKNOWN qlen 100
 link/none
 inet 10.134.0.26/24 brd 10.134.0.255 scope global tun0


Assuming there are multiple interfaces, can you experiment with the runtime 
flags outlined here?
http://www.open-mpi.org/faq/?category=tcp#tcp-selection

It's already running with btl_tcp_if_include=eth0 and
oob_tcp_if_include=eth0; the connections are happening only on eth0,
which has the 138.23.141.16 addresses.
Sorry if this is a stupid question, but what is eth0:1 (it's under eth0)?
Are the 172.16.30.X addresses pingable from each other?



--
Terry D. Dontje | Principal Software Engineer
Developer Tools Engineering | +1.781.442.2631
Oracle *- Performance Technologies*
95 Network Drive, Burlington, MA 01803
Email terry.don...@oracle.com 





Re: [OMPI users] MPI over tcp

2012-05-04 Thread TERRY DONTJE



On 5/4/2012 8:26 AM, Rolf vandeVaart wrote:



2. If that works, then you can also run with a debug switch to see
what connections are being made by MPI.

You can see the connections being made in the attached log:

[archimedes:29820] btl: tcp: attempting to connect() to [[60576,1],2] address
138.23.141.162 on port 2001

Yes, I missed that.  So, can we simplify the problem.  Can you run with np=2 
and one process on each node?
Also, maybe you can send the ifconfig output from each node.  We sometimes see 
this type of hanging when
a node has two different interfaces on the same subnet.

Assuming there are multiple interfaces, can you experiment with the runtime 
flags outlined here?
http://www.open-mpi.org/faq/?category=tcp#tcp-selection

Maybe by restricting to specific interfaces you can figure out which network is 
the problem.

Another cause of tcp hangs, if you are on Linux, is if the virbr0
interfaces are configured.  The tcp btl will incorrectly think that it
can use the virbr interfaces to communicate with other nodes.  You
either need to disable the virbr interfaces or exclude them from being
used by the tcp btl.


--
Terry D. Dontje | Principal Software Engineer
Developer Tools Engineering | +1.781.442.2631
Oracle *- Performance Technologies*
95 Network Drive, Burlington, MA 01803
Email terry.don...@oracle.com 





Re: [OMPI users] MPI doesn't recognize multiple cores available on multicore machines

2012-04-26 Thread TERRY DONTJE



On 4/25/2012 1:00 PM, Jeff Squyres wrote:

On Apr 25, 2012, at 12:51 PM, Ralph Castain wrote:


Sounds rather bizarre. Do you have lstopo on your machine? Might be useful to 
see the output of that so we can understand what it thinks the topology is like 
as this underpins the binding code.

The -nooversubscribe option is a red herring here - it has nothing to do with 
the problem, nor will it help.

FWIW: if you aren't adding --bind-to-core, then OMPI isn't launching your process on any 
specific core at all - we are simply launching it on the node. It sounds to me like your 
code is incorrectly identifying "sharing" when a process isn't bound to a 
specific core.

+1

Put differently: if you're not binding your processes to processor cores, then 
it's quite likely/possible that multiple processes *are* running on the same 
processor cores, at least intermittently, because the OS is allowed to migrate 
processes to whatever processor cores it wants to.
However, Kyle mentioned previously that he was using the -bind-to-core
option.  I would suggest adding -report-bindings to the mpirun command
line and seeing what mpirun really thinks it is binding to, if it is binding at all.


There is one piece of information that seems to be missing and is confusing me.
Kyle, how is your code determining that it is the only process bound to a core,
or conversely that another process is bound to the same core?
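
For reference, one way a code might check this on Linux is to read back its own
CPU affinity mask (a hypothetical check of my own, not necessarily what Kyle's
code does; it shows only the binding mask, not whether another process shares
the core):

#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>

int main(void)
{
    cpu_set_t mask;

    CPU_ZERO(&mask);
    /* pid 0 means "the calling process" */
    if (sched_getaffinity(0, sizeof(mask), &mask) != 0) {
        perror("sched_getaffinity");
        return 1;
    }
    for (int c = 0; c < CPU_SETSIZE; c++) {
        if (CPU_ISSET(c, &mask))
            printf("allowed to run on core %d\n", c);
    }
    return 0;
}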


--
Terry D. Dontje | Principal Software Engineer
Developer Tools Engineering | +1.781.442.2631
Oracle *- Performance Technologies*
95 Network Drive, Burlington, MA 01803
Email terry.don...@oracle.com 





Re: [OMPI users] HRM problem

2012-04-24 Thread TERRY DONTJE
To determine if an MPI process is waiting for a message, do what Rayson
suggested: attach a debugger to the processes and see if any of them
are stuck in MPI, either internally in an MPI_Recv or MPI_Wait call or
looping on an MPI_Test call.
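
For reference, a sketch of the three "waiting for a message" patterns just
mentioned (my own illustration, not the model's code); in a debugger the stack
would be parked inside MPI_Recv or MPI_Wait, or keep re-entering MPI_Test:

#include <mpi.h>

void wait_for_message_examples(void)
{
    double buf[1024];
    MPI_Request req;
    MPI_Status status;
    int flag = 0;

    /* 1. blocking receive: blocks inside MPI_Recv until a message arrives */
    MPI_Recv(buf, 1024, MPI_DOUBLE, MPI_ANY_SOURCE, 0, MPI_COMM_WORLD, &status);

    /* 2. nonblocking receive completed with a blocking MPI_Wait */
    MPI_Irecv(buf, 1024, MPI_DOUBLE, MPI_ANY_SOURCE, 0, MPI_COMM_WORLD, &req);
    MPI_Wait(&req, &status);

    /* 3. polling loop: repeatedly tests the request until it completes */
    MPI_Irecv(buf, 1024, MPI_DOUBLE, MPI_ANY_SOURCE, 0, MPI_COMM_WORLD, &req);
    while (!flag)
        MPI_Test(&req, &flag, &status);
}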


Other things to consider:
  Is this the first time you've run it (with Open MPI? with any MPI?)?
  How many processes is the job using?  Are you oversubscribing your
processors?
  What version of Open MPI are you using?
  Have you tested all network connections?
  It might help us to know the size of the cluster you are running on and what
type of network it uses.


--td
On 4/24/2012 2:42 AM, Syed Ahsan Ali wrote:

Dear Rayson,

That is a numerical model written by the national weather service
of a country. The logs of the model show every detail about the
simulation's progress. I have checked on the remote nodes as well; the
application binary is running but the logs show no progress, it is
just waiting at a point. The input data is correct and everything is fine.
How can I check if the MPI task is waiting for a message?

Ahsan

On Tue, Apr 24, 2012 at 11:03 AM, Rayson Ho > wrote:


Seems like there's a bug in the application. Did you or someone else
write it, or did you get it from an ISV??

You can log onto one of the nodes, attach a debugger, and see if the
MPI task is waiting for a message (looping in one of the MPI receive
functions)...

Rayson

=
Open Grid Scheduler / Grid Engine
http://gridscheduler.sourceforge.net/

Scalable Grid Engine Support Program
http://www.scalablelogic.com/


On Tue, Apr 24, 2012 at 12:49 AM, Syed Ahsan Ali
> wrote:
> Dear All,
>
> I am having a problem running an application on a Dell cluster. The model
> starts well but no further progress is shown. It is just stuck. I have checked
> the systems; no apparent hardware error is there. Other Open MPI
> applications are running well on the same cluster. I have tried running the
> application on cores of the same server as well, but the problem is the same. The
> application just doesn't move further. The same application is also running
> well on a backup cluster. Please help.
>
>
> Thanks and Best Regards
>
> Ahsan
>
> ___
> users mailing list
> us...@open-mpi.org 
> http://www.open-mpi.org/mailman/listinfo.cgi/users



--
==
Open Grid Scheduler - The Official Open Source Grid Engine
http://gridscheduler.sourceforge.net/

___
users mailing list
us...@open-mpi.org 
http://www.open-mpi.org/mailman/listinfo.cgi/users






___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users


--
Terry D. Dontje | Principal Software Engineer
Developer Tools Engineering | +1.781.442.2631
Oracle *- Performance Technologies*
95 Network Drive, Burlington, MA 01803
Email terry.don...@oracle.com 





Re: [OMPI users] Strange OpenMPI messages

2012-02-15 Thread TERRY DONTJE
Do you get any interfaces shown when you run "ibstat" on any of the 
nodes your job is spawned on?


--td

On 2/15/2012 1:27 AM, Tohiko Looka wrote:

Mm... This is really strange
I don't have that service and there is no ib* output in 'ifconfig -a' 
or 'Infinband' in 'lspci'
Which makes me believe that I don't have such a network. I also 
checked on an identical computer on the same network with the same 
results.


What's strange is that these messages didn't use to show up and they 
don't show up on that identical computer; only on mine. Even though 
both computers have the same hardware, openMPI version and on the same 
network.


I guess I can safely ignore these warnings and run on Ethernet, but it 
would be nice to know what happened there, in case anybody has an idea.


Thank you,

On Wed, Feb 15, 2012 at 12:52 AM, Gustavo Correa 
> wrote:


Hi Tohiko

OpenFabrics network a.k.a. Infiniband a.k.a. IB.
To check if the compute nodes have IB interfaces, try:

lspci [and search the output for Infiniband]

To see if the IB interface is configured try:

ifconfig -a  [and search the output for ib0, ib1, or similar]

To check if the OFED module is up try:

'service openibd status'


As an alternative, you could also try to run your program over
Ethernet, avoiding Infinband,
in case you don't have IB or if somehow it is broken.
It is slower than Infiniband, though.

Try something like this:

mpiexec -mca btl tcp,sm,self -np 4 ./my_mpi_program

I hope this helps,
Gus Correa

On Feb 14, 2012, at 4:02 PM, Tohiko Looka wrote:

> Sorry for the noob question, but how do I check my network type
and if OFED service is running correctly or not? And how do I run it
>
> Thank you,
>
> On Tue, Feb 14, 2012 at 2:14 PM, Jeff Squyres
> wrote:
> Do you have an OpenFabrics-based network?  (e.g., InfiniBand or
iWarp)
>
> If so, this error message usually means that OFED is either
installed incorrectly, or is not running properly (e.g., its
services didn't get started properly upon boot).
>
> If you don't have an OpenFabrics-based network, then it usually
means that you have OpenFabrics services running when you really
shouldn't (because you don't have any OpenFabrics-based devices).
>
>
> On Feb 14, 2012, at 4:48 AM, Tohiko Looka wrote:
>
> > Greetings,
> >
> > Until today I was running my openmpi applications with no
errors/warnings
> > Today I restarted my computer (possibly after an automatic
openmpi update) and got these warnings when
> > running my program
> > [tohiko@kw12614 1d]$ mpirun -x LD_LIBRARY_PATH -hostfile hosts
-np 10 hello
> > librdmacm: couldn't read ABI version.
> > librdmacm: assuming: 4
> > CMA: unable to get RDMA device list
> >
--
> > [[21652,1],0]: A high-performance Open MPI point-to-point
messaging module
> > was unable to find any relevant network interfaces:
> >
> > Module: OpenFabrics (openib)
> >   Host: kw12614
> >
> > Another transport will be used instead, although this may
result in
> > lower performance.
> >
--
> > [kw12614:03195] 10 more processes have sent help message
help-mpi-btl-base.txt / btl:no-nics
> > [kw12614:03195] Set MCA parameter "orte_base_help_aggregate"
to 0 to see all help / error messages
> >
> >
> > Is this normal? And how come it happened now?
> > -- Tohiko
> > ___
> > users mailing list
> > us...@open-mpi.org 
> > http://www.open-mpi.org/mailman/listinfo.cgi/users
>
>
> --
> Jeff Squyres
> jsquy...@cisco.com 
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/
>
>
> ___
> users mailing list
> us...@open-mpi.org 
> http://www.open-mpi.org/mailman/listinfo.cgi/users
>
> ___
> users mailing list
> us...@open-mpi.org 
> http://www.open-mpi.org/mailman/listinfo.cgi/users


___
users mailing list
us...@open-mpi.org 
http://www.open-mpi.org/mailman/listinfo.cgi/users




___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users


--
Terry D. Dontje | Principal Software Engineer
Developer Tools Engineering | +1.781.442.2631

Re: [OMPI users] MPI_Allgather problem

2012-01-27 Thread TERRY DONTJE
 ompi_info should tell you the current version of Open MPI your path is 
pointing to.
Are you sure your path is pointing to the area that the OpenFOAM package 
delivered Open MPI into?


--td
On 1/27/2012 5:02 AM, Brett Tully wrote:
Interesting. In the same set of updates, I installed OpenFOAM from 
their Ubuntu deb package and it claims to ship with openmpi. I just 
downloaded their Third-party source tar and unzipped it to see what 
version of openmpi they are using, and it is 1.5.3. However, when I do 
man openmpi, or ompi_info, I get the same version as before (1.4.3). 
How do I determine for sure what is being included when I compile 
something using mpicc?


Thanks,
Brett.


On Thu, Jan 26, 2012 at 10:05 PM, Jeff Squyres > wrote:


What version did you upgrade to?  (we don't control the Ubuntu
packaging)

I see a bullet in the soon-to-be-released 1.4.5 release notes:

- Fix obscure cases where MPI_ALLGATHER could crash.  Thanks to Andrew
 Senin for reporting the problem.

But that would be surprising if this is what fixed your issue,
especially since it's not released yet.  :-)



On Jan 26, 2012, at 5:24 AM, Brett Tully wrote:

> As of two days ago, this problem has disappeared and the tests
that I had written and run each night are now passing. Having
looked through the update log of my machine (Ubuntu 11.10) it
appears as though I got a new version of mpi-default-dev
(0.6ubuntu1). I would like to understand this problem in more
detail -- is it possible to see what changed in this update?
> Thanks,
> Brett.
>
>
>
> On Fri, Dec 9, 2011 at 6:43 PM, teng ma > wrote:
> I guess your output is from different ranks.  You can add rank info inside
the print to tell them apart, as follows:
>
> (void) printf("rank %d: gathered[%d].node = %d\n", rank, i,
gathered[i].node);
>
> From my side, I did not see anything wrong from your code in
Open MPI 1.4.3. after I add rank, the output is
> rank 5: gathered[0].node = 0
> rank 5: gathered[1].node = 1
> rank 5: gathered[2].node = 2
> rank 5: gathered[3].node = 3
> rank 5: gathered[4].node = 4
> rank 5: gathered[5].node = 5
> rank 3: gathered[0].node = 0
> rank 3: gathered[1].node = 1
> rank 3: gathered[2].node = 2
> rank 3: gathered[3].node = 3
> rank 3: gathered[4].node = 4
> rank 3: gathered[5].node = 5
> rank 1: gathered[0].node = 0
> rank 1: gathered[1].node = 1
> rank 1: gathered[2].node = 2
> rank 1: gathered[3].node = 3
> rank 1: gathered[4].node = 4
> rank 1: gathered[5].node = 5
> rank 0: gathered[0].node = 0
> rank 0: gathered[1].node = 1
> rank 0: gathered[2].node = 2
> rank 0: gathered[3].node = 3
> rank 0: gathered[4].node = 4
> rank 0: gathered[5].node = 5
> rank 4: gathered[0].node = 0
> rank 4: gathered[1].node = 1
> rank 4: gathered[2].node = 2
> rank 4: gathered[3].node = 3
> rank 4: gathered[4].node = 4
> rank 4: gathered[5].node = 5
> rank 2: gathered[0].node = 0
> rank 2: gathered[1].node = 1
> rank 2: gathered[2].node = 2
> rank 2: gathered[3].node = 3
> rank 2: gathered[4].node = 4
> rank 2: gathered[5].node = 5
>
> Is that what you expected?
>
> On Fri, Dec 9, 2011 at 12:03 PM, Brett Tully
> wrote:
> Dear all,
>
> I have not used OpenMPI much before, but am maintaining a large
legacy application. We noticed a bug to do with a call to
MPI_Allgather as summarised in this post to Stackoverflow:

http://stackoverflow.com/questions/8445398/mpi-allgather-produces-inconsistent-results
>
> In the process of looking further into the problem, I noticed
that the following function results in strange behaviour.
>
> void test_all_gather() {
>
> struct _TEST_ALL_GATHER {
> int node;
> };
>
> int ierr, size, rank;
> ierr = MPI_Comm_size(MPI_COMM_WORLD, &size);
> ierr = MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>
> struct _TEST_ALL_GATHER local;
> struct _TEST_ALL_GATHER *gathered;
>
> gathered = (struct _TEST_ALL_GATHER*) malloc(size *
sizeof(*gathered));
>
> local.node = rank;
>
> MPI_Allgather(&local, sizeof(struct _TEST_ALL_GATHER), MPI_BYTE,
> gathered, sizeof(struct _TEST_ALL_GATHER), MPI_BYTE,
MPI_COMM_WORLD);
>
> int i;
> for (i = 0; i < numnodes; ++i) {
> (void) printf("gathered[%d].node = %d\n", i,
gathered[i].node);
> }
>
> FREE(gathered);
> }
>
> At one point, this function printed the following:
> gathered[0].node = 2
> gathered[1].node = 3
> gathered[2].node = 2
> gathered[3].node = 3
> 

Re: [OMPI users] localhost only

2012-01-17 Thread TERRY DONTJE
Is there a way to set up an interface analogous to Unix's loopback?  I 
suspect setting "-mca btl self,sm" wouldn't help since this is probably 
happening while the processes are bootstrapping.


--td

On 1/16/2012 7:26 PM, Ralph Castain wrote:

The problem is that OMPI is looking for a tcp port for your computer. With no 
network enabled, you don't have one, and so mpirun aborts. I don't know of any 
way around this at the moment.

Sent from my iPad

On Jan 16, 2012, at 4:53 PM, Gustavo Correa  wrote:


Have you tried to specify the hosts with something like this?

mpirun -np 2 -host localhost ./my_program

See 'man mpirun' for more details.

I hope it helps,
Gus Correa

On Jan 16, 2012, at 6:34 PM, MM wrote:


hi,

when my wireless adapter is down on my laptop, only localhost is configured.
In this case, when I mpirun 2 binaries on my laptop, mpirun fails with this 
error:


It looks like orte_init failed for some reason; your parallel process is
likely to abort.  There are many reasons that a parallel process can
fail during orte_init; some of which are due to configuration or
environment problems.  This failure appears to be an internal failure;
here's some additional information (which may only be relevant to an
Open MPI developer):

  orte_rml_base_select failed
  -->  Returned value Error (-1) instead of ORTE_SUCCESS



when I turn on the wireless adapter back on, the mpirun works fine

Is there a way to make mpirun realize all my binaries run on the same box, and 
therefore don't need any other interface but localhost?

PS: this is ipconfig when the wireless adapter is off


ipconfig /all

Windows IP Configuration

Host Name . . . . . . . . . . . . :
Primary Dns Suffix  . . . . . . . :
Node Type . . . . . . . . . . . . : Hybrid
IP Routing Enabled. . . . . . . . : No
WINS Proxy Enabled. . . . . . . . : No

Ethernet adapter Wireless Network Connection:

Media State . . . . . . . . . . . : Media disconnected
Description . . . . . . . . . . . : Intel(R) WiFi Link 5100 AGN
Physical Address. . . . . . . . . :

rds,

MM
___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users


___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users

___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users


--
Oracle
Terry D. Dontje | Principal Software Engineer
Developer Tools Engineering | +1.781.442.2631
Oracle *- Performance Technologies*
95 Network Drive, Burlington, MA 01803
Email terry.don...@oracle.com 





Re: [OMPI users] Openmpi SGE and BLACS

2012-01-13 Thread TERRY DONTJE
Do you have a stack trace of where exactly things are segfaulting in
blacs_pinfo?


--td

On 1/13/2012 8:12 AM, Conn ORourke wrote:

Dear Openmpi Users,

I am reserving several processors with SGE upon which I want to run a 
number of openmpi jobs, all of which individually (and combined) use 
less than the reserved number of processors. The code I am using uses 
BLACS, and when blacs_pinfo is called I get a seg fault. If the code 
doesn't call blacs_pinfo it runs fine being submitted in this manner. 
blacs_pinfo simply returns the number of available processors, so I 
suspect this is an issue with SGE and openmpi and the requested node 
number being different to that given to mpirun.


Can anyone explain why this would happen with openmpi jobs using
BLACS on SGE, and maybe suggest a way around it?


Many thanks
Conn

example submission script:
#!/bin/bash -f -l
#$ -V
#$ -N test
#$ -S /bin/bash
#$ -cwd
#$ -l vf=1800M
#$ -pe ib-ompi 12
#$ -q infiniband.q


 BIN=~/bin/program
 for i in XPOL YPOL ZPOL; do
    mkdir ${TMPDIR}/4ZP;
    mkdir ${TMPDIR}/4ZP/$i;
    cp ./4ZP/$i/* ${TMPDIR}/4ZP/$i;
 done

 cd ${TMPDIR}/4ZP/XPOL;
 mpirun -np 4 -machinefile ${TMPDIR}/machines $BIN > output &
 cd ${TMPDIR}/4ZP/YPOL;
 mpirun -np 4 -machinefile ${TMPDIR}/machines $BIN > output &
 cd ${TMPDIR}/4ZP/ZPOL;
 mpirun -np 4 -machinefile ${TMPDIR}/machines $BIN > output;

 for i in XPOL YPOL ZPOL; do
  cp ${TMPDIR}/4ZP/$i/* ${HOME}/4ZP/$i;
 done


blacs_pinfo.c:
#include "Bdef.h"

#if (INTFACE == C_CALL)
void Cblacs_pinfo(int *mypnum, int *nprocs)
#else
F_VOID_FUNC blacs_pinfo_(int *mypnum, int *nprocs)
#endif
{
   int ierr;
   extern int BI_Iam, BI_Np;

/*
 * If this is our first call, will need to set up some stuff
 */
   if (BI_F77_MPI_COMM_WORLD == NULL)
   {
/*
 * The BLACS always call f77's mpi_init.  If the user is using C, he should
 * explicitly call MPI_Init . . .
 */
      MPI_Initialized(nprocs);
#ifdef MainInF77
      if (!(*nprocs)) bi_f77_init_();
#else
      if (!(*nprocs))
         BI_BlacsErr(-1, -1, __FILE__,
                     "Users with C main programs must explicitly call MPI_Init");
#endif
      BI_F77_MPI_COMM_WORLD = (int *) malloc(sizeof(int));
#ifdef UseF77Mpi
      BI_F77_MPI_CONSTANTS = (int *) malloc(23*sizeof(int));
      ierr = 1;
      bi_f77_get_constants_(BI_F77_MPI_COMM_WORLD, &ierr, BI_F77_MPI_CONSTANTS);
#else
      ierr = 0;
      bi_f77_get_constants_(BI_F77_MPI_COMM_WORLD, &ierr, nprocs);
#endif
      BI_MPI_Comm_size(BI_MPI_COMM_WORLD, &BI_Np, ierr);
      BI_MPI_Comm_rank(BI_MPI_COMM_WORLD, &BI_Iam, ierr);
   }
   *mypnum = BI_Iam;
   *nprocs = BI_Np;
}


___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users


--
Oracle
Terry D. Dontje | Principal Software Engineer
Developer Tools Engineering | +1.781.442.2631
Oracle *- Performance Technologies*
95 Network Drive, Burlington, MA 01803
Email terry.don...@oracle.com 





Re: [OMPI users] using MPI_Recv in two different threads.

2012-01-11 Thread TERRY DONTJE
I am a little confused by your problem statement.  Are you saying you 
want each MPI process to have multiple threads that can call MPI 
concurrently?  If so, you'll want to read up on the MPI_Init_thread 
function.
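
For reference, a minimal hedged sketch (in C for brevity; mpi4py makes the
analogous request when it initializes MPI) of asking for full thread support:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int provided;

    /* Ask for MPI_THREAD_MULTIPLE so several threads may call MPI concurrently */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
    if (provided < MPI_THREAD_MULTIPLE) {
        printf("MPI library only provides thread level %d\n", provided);
    }

    /* ... spawn threads that each post their own blocking MPI_Recv ... */

    MPI_Finalize();
    return 0;
}

If provided comes back lower than MPI_THREAD_MULTIPLE, concurrent receives
from several threads are not safe with that build.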


--td

On 1/11/2012 7:19 AM, Hamilton Fischer wrote:

Hi, I'm actually using mpi4py but my question should be similar to normal MPI 
in spirit.

Simply, I want to do a MPMD application with a dedicated thread for each node 
(I have a small network). I was wondering if it was okay to do a blocking recv 
in each independent thread. Of course, since each thread has one node, there is 
no problem with wrong recv's being picked up by other threads.


Thanks.

noobermin

___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users


--
Oracle
Terry D. Dontje | Principal Software Engineer
Developer Tools Engineering | +1.781.442.2631
Oracle *- Performance Technologies*
95 Network Drive, Burlington, MA 01803
Email terry.don...@oracle.com 





Re: [OMPI users] Error launching w/ 1.5.3 on IB mthca nodes

2011-12-19 Thread TERRY DONTJE

On 12/19/2011 2:10 AM, V. Ram wrote:

On Thu, Dec 15, 2011, at 09:28 PM, Jeff Squyres wrote:

Very strange.  I have a lot of older mthca-based HCAs in my Cisco MPI
test cluster, and I don't see these kinds of problems.

Can I ask what version of OFED you're using, or what version of OFED the
IB software stack is coming from?

Just to set expectations here, Jeff is on vacation until January so he 
might not reply to this anytime soon.


--td

Thank you.

V. Ram


On Dec 15, 2011, at 7:24 PM, V. Ram wrote:


Hi Terry,

Thanks so much for the response.  My replies are in-line below.

On Thu, Dec 15, 2011, at 07:00 AM, TERRY DONTJE wrote:

IIRC, RNR's are usually due to the receiving side not having a segment
registered and ready to receive data on a QP.  The btl does go through a
big dance and does its own flow control to make sure this doesn't happen.

So when this happens are both the sending and receiving nodes using
mthca's to communicate with?

Yes.  For the newer nodes using onboard mlx4, this issue doesn't arise.
The mlx4-based nodes are using the same core switch as the mthca nodes.


By any chance is it a particular node (or pair of nodes) this seems to
happen with?

No.  I've got 40 nodes total with this hardware configuration, and the
problem has been seen on most/all nodes at one time or another.  It
doesn't seem, based on the limited number of observable parameters I'm
aware of, to be dependent on the number of nodes involved.

It is an intermittent problem, but when it happens, it happens at job
launch, and it does occur most of the time.

Thanks,

V. Ram


--td

Open MPI InfiniBand gurus and/or Mellanox: could I please get some
assistance with this? Any suggestions on tunables or debugging
parameters to try?

Thank you very much.

On Mon, Dec 12, 2011, at 10:42 AM, V. Ram wrote:

Hello,

We are running a cluster that has a good number of older nodes with
Mellanox IB HCAs that have the "mthca" device name ("ib_mthca" kernel
module).

These adapters are all at firmware level 4.8.917 .

The Open MPI in use is 1.5.3 , kernel 2.6.39 , x86-64. Jobs are
launched/managed using Slurm 2.2.7. The IB software and drivers
correspond to OFED 1.5.3.2 , and I've verified that the kernel modules
in use are all from this OFED version.

On nodes with the mthca hardware *only*, we get frequent, but
intermittent job startup failures, with messages like:

/

[[19373,1],54][btl_openib_component.c:3320:handle_wc] from compute-c3-07
to: compute-c3-01 error polling LP CQ with status RECEIVER NOT READY
RETRY EXCEEDED ERROR status
number 13 for wr_id 2a25c200 opcode 128 vendor error 135 qp_idx 0

--
The OpenFabrics "receiver not ready" retry count on a per-peer
connection between two MPI processes has been exceeded. In general,
this should not happen because Open MPI uses flow control on per-peer
connections to ensure that receivers are always ready when data is
sent.

[further standard error text snipped...]

Below is some information about the host that raised the error and the
peer to which it was connected:

Local host: compute-c3-07
Local device: mthca0
Peer host: compute-c3-01

You may need to consult with your system administrator to get this
problem fixed.
--

/

During these job runs, I have monitored the InfiniBand performance
counters on the endpoints and switch. No telltale counters for any of
these ports change during these failed job initiations.

ibdiagnet works fine and properly enumerates the fabric and related
performance counters, both from the affected nodes, as well as other
nodes attached to the IB switch. The IB connectivity itself seems fine
from these nodes.

Other nodes with different HCAs use the same InfiniBand fabric
continuously without any issue, so I don't think it's the fabric/switch.

I'm at a loss for what to do next to try and find the root cause of the
issue. I suspect something perhaps having to do with the mthca
support/drivers, but how can I track this down further?

Thank you,

V. Ram.

--
http://www.fastmail.fm - One of many happy users:
  http://www.fastmail.fm/docs/quotes.html

___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users


--
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/


___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users



--
Oracle
Terry D. Dontje | Principal Software Engineer
Developer Tools Engineering | +1.781.442.2631
Oracle *- Performance Technologies*
95 Network Drive, Burlington, MA 01803
Email terry.don...@oracle.com <mailto:terry.don...@oracle.com>





Re: [OMPI users] Error launching w/ 1.5.3 on IB mthca nodes

2011-12-15 Thread TERRY DONTJE
IIRC, RNR's are usually due to the receiving side not having a segment 
registered and ready to receive data on a QP.  The btl does go through a 
big dance and does its own flow control to make sure this doesn't happen.


So when this happens are both the sending and receiving nodes using 
mthca's to communicate with?


By any chance is it a particular node (or pair of nodes) this seems to 
happen with?


--td


Open MPI InfiniBand gurus and/or Mellanox: could I please get some
assistance with this? Any suggestions on tunables or debugging
parameters to try?

Thank you very much.

On Mon, Dec 12, 2011, at 10:42 AM, V. Ram wrote:
> Hello,
>
> We are running a cluster that has a good number of older nodes with
> Mellanox IB HCAs that have the "mthca" device name ("ib_mthca" kernel
> module).
>
> These adapters are all at firmware level 4.8.917 .
>
> The Open MPI in use is 1.5.3 , kernel 2.6.39 , x86-64. Jobs are
> launched/managed using Slurm 2.2.7. The IB software and drivers
> correspond to OFED 1.5.3.2 , and I've verified that the kernel modules
> in use are all from this OFED version.
>
> On nodes with the mthca hardware *only*, we get frequent, but
> intermittent job startup failures, with messages like:
>
> /
>
> [[19373,1],54][btl_openib_component.c:3320:handle_wc] from compute-c3-07
> to: compute-c3-01 error polling LP CQ with status RECEIVER NOT READY
> RETRY EXCEEDED ERROR status
> number 13 for wr_id 2a25c200 opcode 128 vendor error 135 qp_idx 0
>
> --
> The OpenFabrics "receiver not ready" retry count on a per-peer
> connection between two MPI processes has been exceeded. In general,
> this should not happen because Open MPI uses flow control on per-peer
> connections to ensure that receivers are always ready when data is
> sent.
>
> [further standard error text snipped...]
>
> Below is some information about the host that raised the error and the
> peer to which it was connected:
>
> Local host: compute-c3-07
> Local device: mthca0
> Peer host: compute-c3-01
>
> You may need to consult with your system administrator to get this
> problem fixed.
> --
>
> /
>
> During these job runs, I have monitored the InfiniBand performance
> counters on the endpoints and switch. No telltale counters for any of
> these ports change during these failed job initiations.
>
> ibdiagnet works fine and properly enumerates the fabric and related
> performance counters, both from the affected nodes, as well as other
> nodes attached to the IB switch. The IB connectivity itself seems fine
> from these nodes.
>
> Other nodes with different HCAs use the same InfiniBand fabric
> continuously without any issue, so I don't think it's the fabric/switch.
>
> I'm at a loss for what to do next to try and find the root cause of the
> issue. I suspect something perhaps having to do with the mthca
> support/drivers, but how can I track this down further?
>
> Thank you,
>
> V. Ram. 




Re: [OMPI users] Deadlock at MPI_FInalize

2011-11-28 Thread TERRY DONTJE

Are all the other processes gone?  What version of OMPI are you using?

On 11/28/2011 9:00 AM, Mudassar Majeed wrote:


Dear people,
  In my MPI application, all processes call 
MPI_Finalize (all of them reach it), but the rank 0 process 
never returns from MPI_Finalize and the application keeps 
running. Please suggest what could be causing this.


regards,
Mudassar


___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users


--
Oracle
Terry D. Dontje | Principal Software Engineer
Developer Tools Engineering | +1.781.442.2631
Oracle *- Performance Technologies*
95 Network Drive, Burlington, MA 01803
Email terry.don...@oracle.com 





Re: [OMPI users] How are the Open MPI processes spawned?

2011-11-23 Thread TERRY DONTJE

On 11/23/2011 2:02 PM, Paul Kapinos wrote:

Hello Ralph, hello all,

Two pieces of news, as usual one good and one bad.

The good: we believe we have found out *why* it hangs.

The bad: it seems to me this is a bug, or at least an undocumented 
feature, of Open MPI /1.5.x.


In detail:
As said, we see mystery hang-ups if starting on some nodes using some 
permutation of hostnames. Usually removing "some bad" nodes helps, 
sometimes a permutation of node names in the hostfile is enough(!). 
The behaviour is reproducible.


The machines have at least 2 networks:

*eth0* is used for installation, monitoring, ... - this ethernet is 
very slim


*ib0* - is the "IP over IB" interface and is used for everything: the 
file systems, ssh and so on. The hostnames are bound to the ib0 
network; our idea was not to use eth0 for MPI at all.


all machines are available from any over ib0 (are in one network).

But on eth0 there are at least two different networks; especially the 
computer linuxbsc025 is in different network than the others and is 
not reachable from other nodes over eth0! (but reachable over ib0. The 
name used in the hostfile is resolved to the IP of ib0 ).


So I believe that Open MPI /1.5.x tries to communicate over eth0 and 
cannot do it, and hangs. The /1.4.3 does not hang, so this issue is 
1.5.x-specific (seen in 1.5.3 and 1.5.4). A bug?


I also tried to disable the eth0 completely:

$ mpiexec -mca btl_tcp_if_exclude eth0,lo  -mca btl_tcp_if_include ib0 
...


I believe if you give "-mca btl_tcp_if_include ib0" you do not need to 
specify the exclude parameter.
...but this does not help. All right, the above command should disable 
the use of eth0 for MPI communication itself, but it hangs just 
before MPI is started, doesn't it? (because one process is missing, 
MPI_INIT cannot complete)


By "just before the MPI is started" do you mean while orte is launching 
the processes.
I wonder if you need to specify "-mca oob_tcp_if_include ib0" also but I 
think that may depend on which oob you are using.
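
Something along these lines, as a hedged sketch (assuming ib0 is the IPoIB
interface on every node and the TCP oob is the one in use):

mpiexec -mca oob_tcp_if_include ib0 -mca btl_tcp_if_include ib0 --hostfile hostfile-mini MPI_FastTest.exe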
Now a question: is there a way to forbid the mpiexec to use some 
interfaces at all?


Best wishes,

Paul Kapinos

P.S. Of course we know about the good idea to bring all nodes into the 
same net on eth0, but at this point it is impossible due of technical 
reason[s]...


P.S.2 I'm not sure that the issue is really rooted in the above 
mentioned misconfiguration of eth0, but I have no better idea at this 
point...



The map seem to be correctly build, also the output if the daemons 
seem to be the same (see helloworld.txt)


Unfortunately, it appears that OMPI was not built with --enable-debug 
as there is no debug info in the output. Without a debug installation 
of OMPI, the ability to determine the problem is pretty limited.


well, this will be the next option we will activate. We also have 
another issue here, on (not) using uDAPL..






You should also try putting that long list of nodes in a hostfile - 
see if that makes a difference.
It will process the nodes thru a different code path, so if there 
is some problem in --host,

this will tell us.
No, with the host file instead of host list on command line the 
behaviour is the same.


But, I just found out that the 1.4.3 does *not* hang on this 
constellation. The next thing I will try will be the installation of 
1.5.4 :o)


Best,

Paul

P.S. started:

$ /opt/MPI/openmpi-1.5.3/linux/intel/bin/mpiexec --hostfile 
hostfile-mini -mca odls_base_verbose 5 --leave-session-attached 
--display-map  helloworld 2>&1 | tee helloworld.txt





On Nov 21, 2011, at 9:33 AM, Paul Kapinos wrote:

Hello Open MPI volks,

We use OpenMPI 1.5.3 on our pretty new 1800+ nodes InfiniBand 
cluster, and we have some strange hangups if starting OpenMPI 
processes.


The nodes are named linuxbsc001,linuxbsc002,... (with some lacuna 
due of  offline nodes). Each node is accessible from each other 
over SSH (without password), also MPI programs between any two 
nodes are checked to run.



So long, I tried to start some bigger number of processes, one 
process per node:

$ mpiexec -np NN  --host linuxbsc001,linuxbsc002,... MPI_FastTest.exe

Now the problem: there are some constellations of names in the 
host list on which mpiexec reproducible hangs forever; and more 
surprising: other *permutation* of the *same* node names may run 
without any errors!


Example: the command in laueft.txt runs OK, the command in 
haengt.txt hangs. Note: the only difference is that the node 
linuxbsc025 is put on the end of the host list. Amazed, too?


Looking on the particular nodes during the above mpiexec hangs, we 
found the orted daemons started on *each* node and the binary on 
all but one node (orted.txt, MPI_FastTest.txt).
Again amazing that the node with no user process started (leading 
to hangup in MPI_Init of all processes and thus to hangup, I 
believe) was always the same, linuxbsc005, which is NOT the 
permuted item linuxbsc025...


This behaviour is reproducible. The hang-on only occure 

Re: [OMPI users] EXTERNAL: Re: How to set up state-less node /tmp for OpenMPI usage

2011-11-04 Thread TERRY DONTJE
David, are you saying your jobs consistently leave behind session files 
after the job exits?  It really shouldn't happen, even in the case when a job 
aborts; I thought mpirun took great pains to clean up after itself.
Can you tell us what version of OMPI you are running?  I could see a 
kill -9 of mpirun and the processes below it causing turds to be 
left behind.


--td

On 11/4/2011 2:37 AM, David Turner wrote:

% df /tmp
Filesystem   1K-blocks  Used Available Use% Mounted on
- 12330084822848  11507236   7% /
% df /
Filesystem   1K-blocks  Used Available Use% Mounted on
- 12330084822848  11507236   7% /

That works out to 11GB.  But...

The compute nodes have 24GB.  Freshly booted, about 3.2GB is
consumed by the kernel, various services, and the root file system.
At this time, usage of /tmp is essentially nil.

We set user memory limits to 20GB.

I would imagine that the size of the session directories depends on a
number of factors; perhaps the developers can comment on that.  I have
only seen total sizes in the 10s of MBs on our 8-node, 24GB nodes.

As long as they're removed after each job, they don't really compete
with the application for available memory.

On 11/3/11 8:40 PM, Ed Blosch wrote:

Thanks very much, exactly what I wanted to hear. How big is /tmp?

-Original Message-
From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On
Behalf Of David Turner
Sent: Thursday, November 03, 2011 6:36 PM
To: us...@open-mpi.org
Subject: Re: [OMPI users] EXTERNAL: Re: How to set up state-less node 
/tmp

for OpenMPI usage

I'm not a systems guy, but I'll pitch in anyway.  On our cluster,
all the compute nodes are completely diskless.  The root file system,
including /tmp, resides in memory (ramdisk).  OpenMPI puts these
session directories therein.  All our jobs run through a batch
system (torque).  At the conclusion of each batch job, an epilogue
process runs that removes all files belonging to the owner of the
current batch job from /tmp (and also looks for and kills orphan
processes belonging to the user).  This epilogue had to be written
by our systems staff.
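
For what it's worth, a rough, hypothetical sketch of such an epilogue (the
real one is site-specific; Torque passes the job owner's name as the second
argument, and this assumes only one job per node):

#!/bin/sh
# Hypothetical Torque epilogue sketch: $2 is the job owner's user name.
JOBOWNER=$2
# remove the owner's leftovers (including openmpi-sessions-* dirs) from /tmp
find /tmp -mindepth 1 -maxdepth 1 -user "$JOBOWNER" -exec rm -rf {} + 2>/dev/null
# kill any orphan processes still owned by the job owner
pkill -9 -u "$JOBOWNER" 2>/dev/null
exit 0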

I believe this is a fairly common configuration for diskless
clusters.

On 11/3/11 4:09 PM, Blosch, Edwin L wrote:
Thanks for the help.  A couple follow-up-questions, maybe this 
starts to

go outside OpenMPI:


What's wrong with using /dev/shm?  I think you said earlier in this 
thread

that this was not a safe place.


If the NFS-mount point is moved from /tmp to /work, would a /tmp 
magically
appear in the filesystem for a stateless node?  How big would it be, 
given
that there is no local disk, right?  That may be something I have to 
ask the

vendor, which I've tried, but they don't quite seem to get the question.


Thanks




-Original Message-
From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On

Behalf Of Ralph Castain

Sent: Thursday, November 03, 2011 5:22 PM
To: Open MPI Users
Subject: Re: [OMPI users] EXTERNAL: Re: How to set up state-less 
node /tmp

for OpenMPI usage



On Nov 3, 2011, at 2:55 PM, Blosch, Edwin L wrote:

I might be missing something here. Is there a side-effect or 
performance
loss if you don't use the sm btl?  Why would it exist if there is a 
wholly

equivalent alternative?  What happens to traffic that is intended for
another process on the same node?


There is a definite performance impact, and we wouldn't recommend doing

what Eugene suggested if you care about performance.


The correct solution here is get your sys admin to make /tmp local. 
Making
/tmp NFS mounted across multiple nodes is a major "faux pas" in the 
Linux

world - it should never be done, for the reasons stated by Jeff.





Thanks


-Original Message-
From: users-boun...@open-mpi.org 
[mailto:users-boun...@open-mpi.org] On

Behalf Of Eugene Loh

Sent: Thursday, November 03, 2011 1:23 PM
To: us...@open-mpi.org
Subject: Re: [OMPI users] EXTERNAL: Re: How to set up state-less node

/tmp for OpenMPI usage


Right.  Actually "--mca btl ^sm".  (Was missing "btl".)

On 11/3/2011 11:19 AM, Blosch, Edwin L wrote:

I don't tell OpenMPI what BTLs to use. The default uses sm and puts a

session file on /tmp, which is NFS-mounted and thus not a good choice.


Are you suggesting something like --mca ^sm?


-Original Message-
From: users-boun...@open-mpi.org 
[mailto:users-boun...@open-mpi.org] On

Behalf Of Eugene Loh

Sent: Thursday, November 03, 2011 12:54 PM
To: us...@open-mpi.org
Subject: Re: [OMPI users] EXTERNAL: Re: How to set up state-less node

/tmp for OpenMPI usage


I've not been following closely.  Why must one use shared-memory
communications?  How about using other BTLs in a "loopback" fashion?
___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users

Re: [OMPI users] Changing plm_rsh_agent system wide

2011-10-26 Thread TERRY DONTJE

Sorry please disregard my reply to this email.

:-)

--td

On 10/26/2011 10:44 AM, Ralph Castain wrote:

Did the version you are running get installed in /usr? Sounds like you are 
picking up a different version when running a command - i.e., that your PATH is 
finding a different installation than the one in /usr.
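
A quick, hedged way to check which installation (and therefore which
<prefix>/etc/openmpi-mca-params.conf) the tools are really reading:

which mpirun ompi_info
ompi_info | grep -i prefix
ompi_info --param plm rsh | grep -i agent

If the Prefix line doesn't point at the tree whose openmpi-mca-params.conf
was edited, the edits are simply never seen.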


On Oct 26, 2011, at 3:11 AM, Patrick Begou wrote:


I need to change, system wide, how OpenMPI launches jobs on the nodes of my 
cluster.

Setting:
export OMPI_MCA_plm_rsh_agent=oarsh

works fine, but I would like this config to be the default with OpenMPI. I've 
read several threads (discussions, FAQ) about this but none of the provided 
solutions seems to work.

I have two files:
/usr/lib/openmpi/1.4-gcc/etc/openmpi-mca-params.conf
/usr/lib64/openmpi/1.4-gcc/etc/openmpi-mca-params.conf

In these files I've set various flavors of the syntax (only one at a time, and 
the same in each file of course!):
test 1) plm_rsh_agent = oarsh
test 2) pls_rsh_agent = oarsh
test 3) orte_rsh_agent = oarsh

But each time when I run "ompi_info --param plm rsh" I get:
MCA plm: parameter "plm_rsh_agent" (current value: "ssh : rsh", data source: 
default value, synonyms:
  pls_rsh_agent)
  The command used to launch executables on remote nodes (typically either 
"ssh" or "rsh")

With the exported variable it works fine.
Any suggestion ?

The rpm package of my linux Rocks Cluster provides:
   Package: Open MPI root@build-x86-64 Distribution
   Open MPI: 1.4.3
   Open MPI SVN revision: r23834
   Open MPI release date: Oct 05, 2010

Thanks

Patrick



--
===
|  Equipe M.O.S.T. | http://most.hmg.inpg.fr  |
|  Patrick BEGOU   |      |
|  LEGI| mailto:patrick.be...@hmg.inpg.fr |
|  BP 53 X | Tel 04 76 82 51 35   |
|  38041 GRENOBLE CEDEX| Fax 04 76 82 52 71   |
===

___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users


___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users


--
Oracle
Terry D. Dontje | Principal Software Engineer
Developer Tools Engineering | +1.781.442.2631
Oracle *- Performance Technologies*
95 Network Drive, Burlington, MA 01803
Email terry.don...@oracle.com 





Re: [OMPI users] Changing plm_rsh_agent system wide

2011-10-26 Thread TERRY DONTJE

I am using prefix configuration so no it does not exist in /usr.

--td

On 10/26/2011 10:44 AM, Ralph Castain wrote:

Did the version you are running get installed in /usr? Sounds like you are 
picking up a different version when running a command - i.e., that your PATH is 
finding a different installation than the one in /usr.


On Oct 26, 2011, at 3:11 AM, Patrick Begou wrote:


I need to change, system wide, how OpenMPI launches jobs on the nodes of my 
cluster.

Setting:
export OMPI_MCA_plm_rsh_agent=oarsh

works fine, but I would like this config to be the default with OpenMPI. I've 
read several threads (discussions, FAQ) about this but none of the provided 
solutions seems to work.

I have two files:
/usr/lib/openmpi/1.4-gcc/etc/openmpi-mca-params.conf
/usr/lib64/openmpi/1.4-gcc/etc/openmpi-mca-params.conf

In these files I've set various flavors of the syntax (only one at a time, and 
the same in each file of course!):
test 1) plm_rsh_agent = oarsh
test 2) pls_rsh_agent = oarsh
test 3) orte_rsh_agent = oarsh

But each time when I run "ompi_info --param plm rsh" I get:
MCA plm: parameter "plm_rsh_agent" (current value: "ssh : rsh", data source: 
default value, synonyms:
  pls_rsh_agent)
  The command used to launch executables on remote nodes (typically either 
"ssh" or "rsh")

With the exported variable it works fine.
Any suggestion ?

The rpm package of my linux Rocks Cluster provides:
   Package: Open MPI root@build-x86-64 Distribution
   Open MPI: 1.4.3
   Open MPI SVN revision: r23834
   Open MPI release date: Oct 05, 2010

Thanks

Patrick



--
===
|  Equipe M.O.S.T. | http://most.hmg.inpg.fr  |
|  Patrick BEGOU   |      |
|  LEGI| mailto:patrick.be...@hmg.inpg.fr |
|  BP 53 X | Tel 04 76 82 51 35   |
|  38041 GRENOBLE CEDEX| Fax 04 76 82 52 71   |
===

___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users


___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users


--
Oracle
Terry D. Dontje | Principal Software Engineer
Developer Tools Engineering | +1.781.442.2631
Oracle *- Performance Technologies*
95 Network Drive, Burlington, MA 01803
Email terry.don...@oracle.com 





Re: [OMPI users] exited on signal 11 (Segmentation fault).

2011-10-25 Thread TERRY DONTJE

This looks more like a seg fault in wrf and not OMPI.

Sorry not much I can do here to help you.

--td

On 10/25/2011 9:53 AM, Mouhamad Al-Sayed-Ali wrote:

Hi again,

 This is exactly the error I have:


taskid: 0 hostname: part034.u-bourgogne.fr
[part034:21443] *** Process received signal ***
[part034:21443] Signal: Segmentation fault (11)
[part034:21443] Signal code: Address not mapped (1)
[part034:21443] Failing at address: 0xfffe01eeb340
[part034:21443] [ 0] /lib64/libpthread.so.0 [0x3612c0de70]
[part034:21443] [ 1] wrf.exe(__module_ra_rrtm_MOD_taugb3+0x418) 
[0x11cc9d8]
[part034:21443] [ 2] wrf.exe(__module_ra_rrtm_MOD_gasabs+0x260) 
[0x11cfca0]

[part034:21443] [ 3] wrf.exe(__module_ra_rrtm_MOD_rrtm+0xb31) [0x11e6e41]
[part034:21443] [ 4] wrf.exe(__module_ra_rrtm_MOD_rrtmlwrad+0x25ec) 
[0x11e9bcc]
[part034:21443] [ 5] 
wrf.exe(__module_radiation_driver_MOD_radiation_driver+0xe573) [0xcc4ed3]
[part034:21443] [ 6] 
wrf.exe(__module_first_rk_step_part1_MOD_first_rk_step_part1+0x40c5) 
[0xe0e4f5]

[part034:21443] [ 7] wrf.exe(solve_em_+0x22e58) [0x9b45c8]
[part034:21443] [ 8] wrf.exe(solve_interface_+0x80a) [0x902dda]
[part034:21443] [ 9] wrf.exe(__module_integrate_MOD_integrate+0x236) 
[0x4b2c4a]
[part034:21443] [10] wrf.exe(__module_wrf_top_MOD_wrf_run+0x24) 
[0x47a924]

[part034:21443] [11] wrf.exe(main+0x41) [0x4794d1]
[part034:21443] [12] /lib64/libc.so.6(__libc_start_main+0xf4) 
[0x361201d8b4]

[part034:21443] [13] wrf.exe [0x4793c9]
[part034:21443] *** End of error message ***
---

Mouhamad


--
Oracle
Terry D. Dontje | Principal Software Engineer
Developer Tools Engineering | +1.781.442.2631
Oracle *- Performance Technologies*
95 Network Drive, Burlington, MA 01803
Email terry.don...@oracle.com 





Re: [OMPI users] exited on signal 11 (Segmentation fault).

2011-10-25 Thread TERRY DONTJE

Can you run wrf successfully on one node?
Can you run a simple code across your two nodes?  I would try hostname 
then some simple MPI program like the ring example.
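
If the ring example isn't at hand, even a minimal hedged "MPI hostname" test
(compile with mpicc) will tell you whether launch and wire-up across
part031/part033 are healthy:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size, len;
    char host[MPI_MAX_PROCESSOR_NAME];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Get_processor_name(host, &len);
    printf("rank %d of %d running on %s\n", rank, size, host);
    MPI_Barrier(MPI_COMM_WORLD);   /* forces at least one round of communication */
    MPI_Finalize();
    return 0;
}

Run it with the same machinefile and -np 4 as the wrf.exe job; if all four
ranks report in from both nodes, the MPI layer is fine and the fault is
inside wrf.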


--td

On 10/25/2011 9:05 AM, Mouhamad Al-Sayed-Ali wrote:

hello,


-What version of ompi are you using

  I am using ompi version 1.4.1-1 compiled with gcc 4.5


-What type of machine and os are you running on

   I'm using a 64-bit Linux machine.


-What does the machine file look like

  part033
  part033
  part031
  part031


-Is there a stack trace left behind by the pid that seg faulted?

  No, there is no stack trace


Thanks for your help

Mouhamad Alsayed


--
Oracle
Terry D. Dontje | Principal Software Engineer
Developer Tools Engineering | +1.781.442.2631
Oracle *- Performance Technologies*
95 Network Drive, Burlington, MA 01803
Email terry.don...@oracle.com 





Re: [OMPI users] exited on signal 11 (Segmentation fault).

2011-10-25 Thread TERRY DONTJE

Some more info would be nice like:
-What version of ompi are you using
-What type of machine and os are you running on
-What does the machine file look like
-Is there a stack trace left behind by the pid that seg faulted?

--td

On 10/25/2011 8:07 AM, Mouhamad Al-Sayed-Ali wrote:

Hello,

I have tried to run the executable "wrf.exe", using

  mpirun -machinefile /tmp/108388.1.par2/machines -np 4 wrf.exe

but, I've got the following error:

-- 

mpirun noticed that process rank 1 with PID 9942 on node 
part031.u-bourgogne.fr exited on signal 11 (Segmentation fault).
-- 


   11.54s real 6.03s user 0.32s system
Starter(9908): Return code=139
Starter end(9908)




Thanks for your help


Mouhamad Alsayed


___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users


--
Oracle
Terry D. Dontje | Principal Software Engineer
Developer Tools Engineering | +1.781.442.2631
Oracle *- Performance Technologies*
95 Network Drive, Burlington, MA 01803
Email terry.don...@oracle.com 





Re: [OMPI users] Urgent Question regarding, MPI_ANY_SOURCE.

2011-07-16 Thread Terry Dontje
I have to agree with Jeff: we really need a complete program to debug this.  
Note, without seeing what the structures look like, it is hard to determine 
whether there is some kind of structure mismatch between recv_packet and 
load_packet.  Also, the output you show seems incomplete in that not all 
data transfers are shown, so it is hard to tell whether packets are being 
dropped or something else is going on.
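
For what it's worth, a hedged sketch of how the derived datatype for such a
packet is usually built, assuming a hypothetical struct of an int rank and a
double load (adjust to the real definition); every rank must construct it
identically, or exactly this kind of mismatch appears:

#include <mpi.h>

/* Hypothetical packet layout -- adjust the field types to the real
 * definition used for load_packet / recv_packet. */
typedef struct {
    int    rank;   /* sender's rank */
    double ld;     /* the load value */
} load_packet_t;

/* Build the derived datatype for load_packet_t. */
static MPI_Datatype build_load_datatype(void)
{
    load_packet_t probe;
    MPI_Datatype  loadDatatype;
    int           blocklens[2] = { 1, 1 };
    MPI_Datatype  types[2]     = { MPI_INT, MPI_DOUBLE };
    MPI_Aint      displs[2], base;

    /* compute field displacements relative to the start of the struct */
    MPI_Get_address(&probe, &base);
    MPI_Get_address(&probe.rank, &displs[0]);
    MPI_Get_address(&probe.ld,   &displs[1]);
    displs[0] -= base;
    displs[1] -= base;

    MPI_Type_create_struct(2, blocklens, displs, types, &loadDatatype);
    MPI_Type_commit(&loadDatatype);
    return loadDatatype;
}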


I agree the output looks suspicious but it still leaves a lot to 
interpretation without really seeing a complete code.


Sorry,

--td

On 7/15/2011 3:44 PM, Jeff Squyres wrote:

Can you write this up in a small, complete program that shows the problem, and 
that we can compile and run?


On Jul 15, 2011, at 3:36 PM, Mudassar Majeed wrote:


*id is the same as myid.

I am comparing the results by looking at the messages printed by the 
printfs.

recv_packet.rank is the rank of the sender, which should be equal to 
status.MPI_SOURCE, but it is not.

I have updated the code a little bit; here it is.

if( (is_receiver == 1) && (is_sender != 1) )
{
    printf("\nP%d >> Receiver only ...!!", myid);
    printf("\n");
    MPI_Recv(&recv_packet, 1, loadDatatype, MPI_ANY_SOURCE, MPI_TAG_LOAD, comm, &status);
    printf("\nP%d >> Received from P%d, packet contains rank: %d", myid, status.MPI_SOURCE, recv_packet.rank);
    printf("\n");
}
else if( (is_sender == 1) && (is_receiver != 1) )
{
    load_packet.rank = myid;
    load_packet.ld = load;
    printf("\nP%d >> Sender only ...!! P%d", myid, rec_rank);
    printf("\n");
    MPI_Ssend(&load_packet, 1, loadDatatype, rec_rank, MPI_TAG_LOAD, comm);
}
else if( (is_receiver == 1) && (is_sender == 1) )
{
    load_packet.rank = myid;
    load_packet.ld = load;
    printf("\nP%d >> Both ...!! P%d", myid, rec_rank);
    printf("\n");
    MPI_Sendrecv(&load_packet, 1, loadDatatype, rec_rank, MPI_TAG_LOAD,
                 &recv_packet, 1, loadDatatype, MPI_ANY_SOURCE, MPI_TAG_LOAD, comm, &status);
    printf("\nP%d >> Received from P%d, packet contains rank: %d", myid, status.MPI_SOURCE, recv_packet.rank);
    printf("\n");
}

here is the output

P11>>  Sender only ...!! P2

P14>>  Sender only ...!! P6

P15>>  Neither ...!!

P15>>  I could reach here ...!!

P8>>  Neither ...!!

P8>>  I could reach here ...!!

P1>>  Receiver only ...!!

P9>>  Sender only ...!! P0

P2>>  Receiver only ...!!


P10>>  Sender only ...!! P1

P3>>  Receiver only ...!!

P3>>  Received from P13, packet contains rank: 14


P0>>  Receiver only ...!!

P0>>  Received from P3, packet contains rank: 9

P4>>  Receiver only ...!!

P12>>  Neither ...!!

P12>>  I could reach here ...!!

P5>>  Both ...!! P3

P13>>  Sender only ...!! P4

P13>>  I could reach here ...!!

P6>>  Both ...!! P5

P7>>  Neither ...!!

P7>>  I could reach here ...!!

P14>>  I could reach here ...!!

P1>>  Received from P7, packet contains rank: 11

P1>>  I could reach here ...!!

P9>>  I could reach here ...!!
P2>>  Received from P11, packet contains rank: 13

P2>>  I could reach here ...!!

P0>>  I could reach here ...!!

P11>>  I could reach here ...!!
P3>>  I could reach here ...!!


regards,
Mudassar

From: Terry Dontje<terry.don...@oracle.com>
To: Mudassar Majeed<mudassar...@yahoo.com>
Cc: "us...@open-mpi.org"<us...@open-mpi.org>
Sent: Friday, July 15, 2011 9:06 PM
Subject: Re: [OMPI users] Urgent Question regarding, MPI_ANY_SOURCE.



On 7/15/2011 2:35 PM, Mudassar Majeed wrote:

Here is the code

if( (is_receiver == 1) && (is_sender != 1) )
{
    printf("\nP%d >> Receiver only ...!!", myid);
    printf("\n");
    MPI_Recv(&recv_packet, 1, loadDatatype, MPI_ANY_SOURCE, MPI_TAG_LOAD, comm, &status);
    printf("\nP%d >> Received from P%d", myid, status.MPI_SOURCE);
    printf("\n");
}
else if( (is_sender == 1) && (is_receiver != 1) )
{
    load_packet.rank = *id;
    load_packet.ld = load;
    printf("\nP%d >> Sender only ...!! P%d", myid, rec_rank);
    printf("\n");
    MPI_Ssend(&load_packet, 1, loadDatatype, rec_rank, MPI_TAG_LOAD, comm);
}
else if( (is_receiver == 1) && (is_sender == 1) )
{
    load_packet.rank = *id;
    load_packet.ld = load;
    printf("\nP%d >> Both ...!! P%d", myid, rec_rank);
    printf("\n");
    MPI_Sendrecv(&load_packet, 1, loadDatatype, rec_rank, MPI_TAG_LOAD,
                 &recv_packet, 1, loadDatatype, MPI_ANY_SOURCE, MPI_TAG_LOAD, comm, &status);

Re: [OMPI users] Does Oracle Cluster Tools aka Sun's MPI work with LDAP?

2011-07-15 Thread Terry Dontje



On 7/15/2011 1:46 PM, Paul Kapinos wrote:

Hi OpenMPI volks (and Oracle/Sun experts),

we have a problem with Sun's MPI (Cluster Tools 8.2.x) on a part of 
our cluster. In the part of the cluster where LDAP is activated, 
mpiexec does not even try to spawn tasks on remote nodes, but exits 
with an error message like the one below. When running 'strace -f' on the 
mpiexec, no exec of "ssh" can be found at all. Oddly, mpiexec tries to look 
into /etc/passwd (where the user is not listed, because LDAP is used!).



Note this is an area that should be no different than from stock Open MPI.
I would suspect that the message might be coming from ssh.  I wouldn't 
suspect mpiexec would be looking into /etc/passwd at all, why would it 
need to.  It should just be using ssh.  Can you manually ssh to the same 
node?
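
One quick, hedged check is whether the name service on the affected nodes
actually resolves the LDAP account at all; the wording of the error suggests
a getpwuid()-style lookup is what fails, not ssh itself:

ssh <one of the LDAP nodes> 'id; getent passwd $LOGNAME'

If getent comes back empty there, nsswitch/LDAP isn't wired up on that node
and any password-file lookup will fail exactly as shown.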
On the old part of the cluster, where NIS is used as the 
authentication method, Sun MPI runs just fine.


So, is Sun's MPI compatible with the LDAP authentication method at all?


In as far as whatever launcher you use is compatible with LDAP.

Best wishes,

Paul


P.S. In both parts of the cluster, I (login marked as x here) can 
log in to any node by ssh without needing to type a password.




-- 


The user (x) is unknown to the system (i.e. there is no corresponding
entry in the password file). Please contact your system administrator
for a fix.
-- 

[cluster-beta.rz.RWTH-Aachen.DE:31535] [[57885,0],0] ORTE_ERROR_LOG: 
Fatal in file plm_rsh_module.c at line 1058
-- 





___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users


--
Oracle
Terry D. Dontje | Principal Software Engineer
Developer Tools Engineering | +1.781.442.2631
Oracle *- Performance Technologies*
95 Network Drive, Burlington, MA 01803
Email terry.don...@oracle.com 





Re: [OMPI users] Urgent Question regarding, MPI_ANY_SOURCE.

2011-07-15 Thread Terry Dontje



On 7/15/2011 2:35 PM, Mudassar Majeed wrote:


Here is the code

if( (is_receiver == 1) && (is_sender != 1) )
{
    printf("\nP%d >> Receiver only ...!!", myid);
    printf("\n");
    MPI_Recv(&recv_packet, 1, loadDatatype, MPI_ANY_SOURCE, MPI_TAG_LOAD, comm, &status);
    printf("\nP%d >> Received from P%d", myid, status.MPI_SOURCE);
    printf("\n");
}
else if( (is_sender == 1) && (is_receiver != 1) )
{
    load_packet.rank = *id;
    load_packet.ld = load;
    printf("\nP%d >> Sender only ...!! P%d", myid, rec_rank);
    printf("\n");
    MPI_Ssend(&load_packet, 1, loadDatatype, rec_rank, MPI_TAG_LOAD, comm);
}
else if( (is_receiver == 1) && (is_sender == 1) )
{
    load_packet.rank = *id;
    load_packet.ld = load;
    printf("\nP%d >> Both ...!! P%d", myid, rec_rank);
    printf("\n");
    MPI_Sendrecv(&load_packet, 1, loadDatatype, rec_rank, MPI_TAG_LOAD,
                 &recv_packet, 1, loadDatatype, MPI_ANY_SOURCE, MPI_TAG_LOAD, comm, &status);
    printf("\nP%d >> Received from P%d", myid, status.MPI_SOURCE);
    printf("\n");
}

A process can be a message sender, or receiver or both. There are 16 
ranks. "rec_rank" contains the rank of the receiver. It is displayed 
before the message is sent.
Every sender displays this "rec_rank" and it should correctly. But on 
the receiver sides, status.MPI_SOURCE is displayed (after receiving 
message), but the value

is not matching with the expected sender's rank.
Sorry, but I still don't see how you are detecting the mismatch.  I 
assume load_packet.rank somehow relates to load_packet.  But why are 
you setting it to *id instead of myid?  Also, on the receive side I see 
no place where you pull the rank out of recv_packet to compare it 
with status.MPI_SOURCE.


I did not understand about kernel that you were talking about.

A "kernel" that I am talking about is a small piece of code someone can 
build and run to see the problem.
See the code is very clear and it sends the message to "rec_rank" that 
was displayed before sending the message. But on the receiver side the 
MPI_SOURCE comes to be wrong.
This shows to me that messages on the receiving sides are captured on 
the basis of MPI_ANY_SOURCE, that seems like it does not see the 
destination of message while capturing it from message queue of the 
MPI system.


regards,
Mudassar


*From:* Terry Dontje <terry.don...@oracle.com>
*To:* Mudassar Majeed <mudassar...@yahoo.com>
*Cc:* "us...@open-mpi.org" <us...@open-mpi.org>
*Sent:* Friday, July 15, 2011 7:10 PM
*Subject:* Re: [OMPI users] Urgent Question regarding, MPI_ANY_SOURCE.



On 7/15/2011 12:49 PM, Mudassar Majeed wrote:


Yes, processes receive messages that were not sent to them. I am 
receiving the message with the following call


MPI_Recv(&recv_packet, 1, loadDatatype, MPI_ANY_SOURCE, MPI_TAG_LOAD, 
comm, &status);


and that was sent using the following call,

MPI_Ssend(&load_packet, 1, loadDatatype, rec_rank, MPI_TAG_LOAD, comm);

What problem it can have ?. All the parameters are correct, I have 
seen them by printf.  What I am thinking is that, the receive is done 
with MPI_ANY_SOURCE, so the process is getting any message (from any 
source). What should be done so that only that message is captured 
that had the destination as this process.


By virtue of MPI the MPI_Recv call should only return messages 
destined for that rank.  What makes you think that is not happening?  
Can you make some sort of kernel of code that proves your theory that 
your MPI_Recv is receiving another rank's message?  If you can and 
then post that code maybe we'll be able to figure out what the issue is.


Right now, it seems we are at a deadlock of you claiming something is 
happening that really cannot be happening.  So unless we have more 
than a broad description of the problem it is going to be nearly 
impossible for us to tell you what is wrong.


--td

regards,
Mudassar

Date: Fri, 15 Jul 2011 07:04:34 -0400
From: Terry Dontje <terry.don...@oracle.com 
<mailto:terry.don...@oracle.com>>

Subject: Re: [OMPI users] Urgent Question regarding, MPI_ANY_SOURCE.
To: us...@open-mpi.org <mailto:us...@open-mpi.org>
Message-ID: <4e201ec2@oracle.com <mailto:4e201ec2@oracle.com>>
Content-Type: text/plain; charset="iso-8859-1"; Format="flowed"

Well MPI_Recv does give you the message that was sent specifically to
the rank calling it by any of the processes in the communicator.  If you
think the message you received should have gone to another rank then
there is a bug somewhere.  I would start by either adding debugging
printf's to your code to trace the messages

Re: [OMPI users] Urgent Question regarding, MPI_ANY_SOURCE.

2011-07-15 Thread Terry Dontje



On 7/15/2011 12:49 PM, Mudassar Majeed wrote:


Yes, processes receive messages that were not sent to them. I am 
receiving the message with the following call


MPI_Recv(&recv_packet, 1, loadDatatype, MPI_ANY_SOURCE, MPI_TAG_LOAD, 
comm, &status);


and that was sent using the following call,

MPI_Ssend(&load_packet, 1, loadDatatype, rec_rank, MPI_TAG_LOAD, comm);

What problem it can have ?. All the parameters are correct, I have 
seen them by printf.  What I am thinking is that, the receive is done 
with MPI_ANY_SOURCE, so the process is getting any message (from any 
source). What should be done so that only that message is captured 
that had the destination as this process.


By virtue of MPI the MPI_Recv call should only return messages destined 
for that rank.  What makes you think that is not happening?  Can you 
make some sort of kernel of code that proves your theory that your 
MPI_Recv is receiving another rank's message?  If you can and then post 
that code maybe we'll be able to figure out what the issue is.


Right now, it seems we are at a deadlock of you claiming something is 
happening that really cannot be happening.  So unless we have more than 
a broad description of the problem it is going to be nearly impossible 
for us to tell you what is wrong.


--td

regards,
Mudassar

Date: Fri, 15 Jul 2011 07:04:34 -0400
From: Terry Dontje <terry.don...@oracle.com 
<mailto:terry.don...@oracle.com>>

Subject: Re: [OMPI users] Urgent Question regarding, MPI_ANY_SOURCE.
To: us...@open-mpi.org <mailto:us...@open-mpi.org>
Message-ID: <4e201ec2@oracle.com <mailto:4e201ec2@oracle.com>>
Content-Type: text/plain; charset="iso-8859-1"; Format="flowed"

Well MPI_Recv does give you the message that was sent specifically to
the rank calling it by any of the processes in the communicator.  If you
think the message you received should have gone to another rank then
there is a bug somewhere.  I would start by either adding debugging
printf's to your code to trace the messages.  Or narrowing down the
code to a small kernel such that you can prove to yourself that MPI is
working the way it should and if not you can show us where it is going
wrong.

--td

On 7/15/2011 6:51 AM, Mudassar Majeed wrote:
> I get the sender's rank in status.MPI_SOURCE, but it is different than
> expected. I need to receive that message which was sent to me, not any
> message.
>
> regards,
>
> Date: Fri, 15 Jul 2011 06:33:41 -0400
> From: Terry Dontje <terry.don...@oracle.com 
<mailto:terry.don...@oracle.com>

> <mailto:terry.don...@oracle.com <mailto:terry.don...@oracle.com>>>
> Subject: Re: [OMPI users] Urgent Question regarding, MPI_ANY_SOURCE.
> To: us...@open-mpi.org <mailto:us...@open-mpi.org> 
<mailto:us...@open-mpi.org <mailto:us...@open-mpi.org>>
> Message-ID: <4e201785.6010...@oracle.com 
<mailto:4e201785.6010...@oracle.com>
> <mailto:4e201785.6010...@oracle.com 
<mailto:4e201785.6010...@oracle.com>>>

> Content-Type: text/plain; charset="iso-8859-1"; Format="flowed"
>
> Mudassar,
>
> You can do what you are asking.  The receiver uses MPI_ANY_SOURCE for
> the source rank value and when you receive a message the
> status.MPI_SOURCE will contain the rank of the actual sender not the
> receiver's rank.  If you are not seeing that then there is a bug
> somewhere.
>
> --td
>
> On 7/14/2011 9:54 PM, Mudassar Majeed wrote:
> > Friend,
> >  I can not specify the rank of the sender. Because only
> > the sender knows to which receiver the message is to be sent. The
> > receiver does not know from which sender the message will come. I am
> > trying to do a research work on load balancing in MPI application
> > where load is redistributed, so in that I require a receiver to
> > receive a load value from a sender that it does not know. On the other
> > hand, the sender actually calculates, to which receiver this load
> > value should be sent. So for this, I want sender to send a message
> > containing the load to a receiver, but receiver does not know from
> > which sender the message will come. See, it is like send receiver in
> > DATAGRAM sockets. The receiver, receives the message on the IP and
> > port, the message which was directed for it. I want to have same
> > behavior. But it seems that it is not possible in MPI. Isn't it?
> >
> > regards,
> > Mudassar
> >
> > 

> > *From:* Jeff Squyres <jsquy...@cisco.com 
<mailto:jsquy...@cisco.com> <mailto:jsquy...@cisco.com 
<mailto:jsquy...@cisco.com>>>
> > *To:* Mudassar Majeed <mudassar...@yahoo.com 
<mailto:mudassar...@yahoo.com>

> <

Re: [OMPI users] Urgent Question regarding, MPI_ANY_SOURCE.

2011-07-15 Thread Terry Dontje
Well MPI_Recv does give you the message that was sent specifically to 
the rank calling it by any of the processes in the communicator.  If you 
think the message you received should have gone to another rank then 
there is a bug somewhere.  I would start by either adding debugging 
printf's to your code to trace the messages.   Or narrowing down the 
code to a small kernel such that you can prove to yourself that MPI is 
working the way it should and if not you can show us where it is going 
wrong.
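
Such a kernel can be tiny. A hedged sketch in which every non-zero rank
Ssends its own rank to rank 0, and rank 0 receives with MPI_ANY_SOURCE and
checks that status.MPI_SOURCE always matches the payload:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (rank == 0) {
        int i, payload;
        MPI_Status status;
        for (i = 1; i < size; i++) {
            MPI_Recv(&payload, 1, MPI_INT, MPI_ANY_SOURCE, 0, MPI_COMM_WORLD, &status);
            /* the payload is the sender's rank, so these must always agree */
            printf("got %d from MPI_SOURCE %d %s\n", payload, status.MPI_SOURCE,
                   payload == status.MPI_SOURCE ? "OK" : "MISMATCH");
        }
    } else {
        MPI_Ssend(&rank, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
    }

    MPI_Finalize();
    return 0;
}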


--td

On 7/15/2011 6:51 AM, Mudassar Majeed wrote:
I get the sender's rank in status.MPI_SOURCE, but it is different than 
expected. I need to receive that message which was sent to me, not any 
message.


regards,

Date: Fri, 15 Jul 2011 06:33:41 -0400
From: Terry Dontje <terry.don...@oracle.com 
<mailto:terry.don...@oracle.com>>

Subject: Re: [OMPI users] Urgent Question regarding, MPI_ANY_SOURCE.
To: us...@open-mpi.org <mailto:us...@open-mpi.org>
Message-ID: <4e201785.6010...@oracle.com 
<mailto:4e201785.6010...@oracle.com>>

Content-Type: text/plain; charset="iso-8859-1"; Format="flowed"

Mudassar,

You can do what you are asking.  The receiver uses MPI_ANY_SOURCE for
the source rank value and when you receive a message the
status.MPI_SOURCE will contain the rank of the actual sender not the
receiver's rank.  If you are not seeing that then there is a bug 
somewhere.


--td

On 7/14/2011 9:54 PM, Mudassar Majeed wrote:
> Friend,
>  I can not specify the rank of the sender. Because only
> the sender knows to which receiver the message is to be sent. The
> receiver does not know from which sender the message will come. I am
> trying to do a research work on load balancing in MPI application
> where load is redistributed, so in that I require a receiver to
> receive a load value from a sender that it does not know. On the other
> hand, the sender actually calculates, to which receiver this load
> value should be sent. So for this, I want sender to send a message
> containing the load to a receiver, but receiver does not know from
> which sender the message will come. See, it is like send receiver in
> DATAGRAM sockets. The receiver, receives the message on the IP and
> port, the message which was directed for it. I want to have same
> behavior. But it seems that it is not possible in MPI. Isn't it?
>
> regards,
> Mudassar
>
> 
> *From:* Jeff Squyres <jsquy...@cisco.com <mailto:jsquy...@cisco.com>>
> *To:* Mudassar Majeed <mudassar...@yahoo.com 
<mailto:mudassar...@yahoo.com>>

> *Cc:* Open MPI Users <us...@open-mpi.org <mailto:us...@open-mpi.org>>
> *Sent:* Friday, July 15, 2011 3:30 AM
> *Subject:* Re: [OMPI users] Urgent Question regarding, MPI_ANY_SOURCE.
>
> Right.  I thought you were asking about receiving *another* message
> from whomever you just received from via ANY_SOURCE.
>
> If you want to receive from a specific sender, you just specify the
> rank you want to receive from -- not ANY_SOURCE.
>
> You will always only receive messages that were sent to *you*.
> There's no MPI_SEND_TO_ANYONE_WHO_IS_LISTENING functionality, for
> example.  So your last statement: "But when it captures with ..
> MPI_ANY_SOURCE and MPI_ANY_TAG, the receiver will capture any message
> (even not targetted for it)" is incorrect.
>
> I guess I still don't understand your question...?
>
>
> On Jul 14, 2011, at 9:17 PM, Mudassar Majeed wrote:
>
> >
> > I know this, but when I compare status.MPI_SOURCE with myid, they
> are different. I guess you need to reconsider my question. The
> MPI_Recv function seems to capture message from the queue with some
> search parameters like source, tag etc. So in case the receiver does
> not know the sender and wants to receive only that message which was
> sent for this receiver. But when it captures with source as
> MPI_ANY_SOURCE and MPI_ANY_TAG, the receiver will capture any message
> (even not targetted for it).
> >
> > regards,
> > Mudassar
> >
> >
> > From: Jeff Squyres <jsquy...@cisco.com <mailto:jsquy...@cisco.com> 
<mailto:jsquy...@cisco.com <mailto:jsquy...@cisco.com>>>
> > To: Mudassar Majeed <mudassar...@yahoo.com 
<mailto:mudassar...@yahoo.com>
> <mailto:mudassar...@yahoo.com <mailto:mudassar...@yahoo.com>>>; Open 
MPI Users <us...@open-mpi.org <mailto:us...@open-mpi.org>

> <mailto:us...@open-mpi.org <mailto:us...@open-mpi.org>>>
> > Sent: Friday, July 15, 2011 1:58 AM
> > Subject: Re: [OMPI users] Urgent Question regarding, MPI_ANY_SOURCE.
> >
> > When you use MPI_ANY_SOURCE in a receive, the r

Re: [OMPI users] Open MPI & Grid Engine/Grid Scheduler thread binding

2011-07-15 Thread Terry Dontje
Here's, hopefully, more useful info.  Note that reading the job2core.pdf 
presentation mentioned earlier more closely will also 
clarify a couple of points (I've put those points inline below).


On 7/15/2011 12:01 AM, Ralph Castain wrote:

On Jul 14, 2011, at 5:46 PM, Jeff Squyres wrote:


Looping in the users mailing list so that Ralph and Oracle can comment...

Not entirely sure what I can contribute here, but I'll try - see below for some 
clarifications. I think the discussion here is based on some misunderstanding 
of how OMPI works.



On Jul 14, 2011, at 2:34 PM, Rayson Ho wrote:


(CC'ing Jeff from the Open-MPI project...)

On Thu, Jul 14, 2011 at 1:35 PM, Tad Kollar  wrote:

As I thought more about it, I was afraid that might be the case, but hoped
sge_shepherd would do some magic for tightly-integrated jobs.

To SGE, if each of the tasks is not started by sge_shepherd, then the
only option is to set the binding mask to the allocation, which in
your original case, was the whole system (48 CPUs).



We're running OpenMPI 1.5.3 if that makes a difference. Do you know of
anyone using an MVAPICH2 1.6 pe that can handle binding?

OMPI uses its own binding scheme - we stick within the overall binding envelope 
given to us, but we don't use external bindings of individual procs. Reason is 
simple: SGE has no visibility into the MPI procs we spawn. All SGE sees is 
mpirun and the daemons (called orteds) we launch on each node, and so it can't 
provide a binding scheme for the MPI procs (it actually has no idea how many 
procs are on each node as OMPI's mapper can support a multitude of algorithms, 
all invisible to SGE).


However, page 14 of the job2core.pdf presentation describes how SGE will 
pass a rankfile to Open MPI, which is how SGE drives 
the binding it wants for an Open MPI job.
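
For reference, a hedged sketch of the rankfile mechanism (hostnames and slot
numbers made up; the file is handed to mpirun with -rf):

rank 0=node01 slot=0
rank 1=node01 slot=1
rank 2=node02 slot=0
rank 3=node02 slot=1

mpirun -np 4 -rf myrankfile ./a.out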

I just downloaded Open MPI 1.5.4a and grep'ed the source, looks like
it is not looking at the SGE_BINDING env variable that is set by SGE.

No, we don't. However, the orteds do check to see if they have been bound, and 
if so, to what processors. Those bindings are then used as an envelope limiting 
the processors we use to bind the procs we spawn.

I believe SGE_BINDING is an env-var to SGE that tells it what binding to 
use for the job and SGE will then, as mentioned above, generate a 
rankfile to be used by Open MPI.



The serial case worked (its affinity list was '0' instead of '0-47'), so at
least we know that's in good shape :-)

Please also submit a few more jobs and see if the new hwloc code is
able to handle multiple jobs running on your AMD MC server.



My ultimate goal is for affinity support to be enabled and scheduled
automatically for all MPI users, i.e. without them having to do any more
than they would for a no-affinity job (otherwise I have a feeling most of
them would just ignore it). What do you think it will take to get to that
point?

We tried to do this once - I set a default param to auto-bind processes. Major 
error. I was lynched by the MPI user community until we removed that param.

Reason is simple: suppose you have MPI processes that launch threads. Remember, 
there is no thread-level binding out there - all the OS will let you do is bind 
at the process level. So now you bind someone's MPI process to some core(s), 
which forces all the threads from that process to stay within that 
bindingthereby potentially creating a horrendous thread-contention problem.

It doesn't take threading to cause problems - some applications just don't work 
as well when bound. It's true that the benchmarks generally do, but they aren't 
representative of real applications.

Bottom line: defaulting to binding processes was something the MPI community 
appears to have rejected, with reason. Might argue about whether or not they 
are correct, but that appears to be the consensus, and it is the position OMPI 
has adopted. User ignorance of when to bind and when not to bind is not a valid 
reason to impact everyone.



That's my goal since 2008...

I started a mail thread, "processor affinity -- OpenMPI / batchsystem
integration" to the Open MPI list in 2008. And in 2009, the conclusion
was that Sun was saying that the binding info is set in the
environment and Open MPI would perform the binding itself (so I
assumed that was done):

It is done - we just use OMPI's binding schemes and not the ones provided 
natively by SGE. Like I said above, SGE doesn't see the MPI procs and can't 
provide a binding pattern for them - so looking at the SUNW_MP_BIND envar is 
pointless.

Note SUNW_MP_BIND has *nothing* to do with  Open MPI but is a way that 
SGE feeds binding to OpenMP (note no "I") applications.  So Ralph is 
right that this env-var is pointless from an Open MPI perspective.



http://www.open-mpi.org/community/lists/users/2009/10/10938.php

Revisiting the presentation (see: job2core.pdf link at the above URL),
Sun's variable name is $SUNW_MP_BIND, so it is most 

Re: [OMPI users] Urgent Question regarding, MPI_ANY_SOURCE.

2011-07-15 Thread Terry Dontje

Mudassar,

You can do what you are asking.  The receiver uses MPI_ANY_SOURCE for 
the source rank value and when you receive a message the 
status.MPI_SOURCE will contain the rank of the actual sender not the 
receiver's rank.  If you are not seeing that then there is a bug somewhere.


--td

On 7/14/2011 9:54 PM, Mudassar Majeed wrote:

Friend,
 I can not specify the rank of the sender. Because only 
the sender knows to which receiver the message is to be sent. The 
receiver does not know from which sender the message will come. I am 
trying to do a research work on load balancing in MPI application 
where load is redistributed, so in that I require a receiver to 
receive a load value from a sender that it does not know. On the other 
hand, the sender actually calculates, to which receiver this load 
value should be sent. So for this, I want sender to send a message 
containing the load to a receiver, but receiver does not know from 
which sender the message will come. See, it is like send receiver in 
DATAGRAM sockets. The receiver, receives the message on the IP and 
port, the message which was directed for it. I want to have same 
behavior. But it seems that it is not possible in MPI. Isn't it?


regards,
Mudassar


*From:* Jeff Squyres 
*To:* Mudassar Majeed 
*Cc:* Open MPI Users 
*Sent:* Friday, July 15, 2011 3:30 AM
*Subject:* Re: [OMPI users] Urgent Question regarding, MPI_ANY_SOURCE.

Right.  I thought you were asking about receiving *another* message 
from whomever you just received from via ANY_SOURCE.


If you want to receive from a specific sender, you just specify the 
rank you want to receive from -- not ANY_SOURCE.


You will always only receive messages that were sent to *you*.  
There's no MPI_SEND_TO_ANYONE_WHO_IS_LISTENING functionality, for 
example.  So your last statement: "But when it captures with .. 
MPI_ANY_SOURCE and MPI_ANY_TAG, the receiver will capture any message 
(even not targetted for it)" is incorrect.


I guess I still don't understand your question...?


On Jul 14, 2011, at 9:17 PM, Mudassar Majeed wrote:

>
> I know this, but when I compare status.MPI_SOURCE with myid, they 
are different. I guess you need to reconsider my question. The 
MPI_Recv function seems to capture message from the queue with some 
search parameters like source, tag etc. So in case the receiver does 
not know the sender and wants to receive only that message which was 
sent for this receiver. But when it captures with source as 
MPI_ANY_SOURCE and MPI_ANY_TAG, the receiver will capture any message 
(even not targetted for it).

>
> regards,
> Mudassar
>
>
> From: Jeff Squyres
> To: Mudassar Majeed; Open MPI Users

> Sent: Friday, July 15, 2011 1:58 AM
> Subject: Re: [OMPI users] Urgent Question regarding, MPI_ANY_SOURCE.
>
> When you use MPI_ANY_SOURCE in a receive, the rank of the actual 
sender is passed back to you in the status.MPI_SOURCE.

>
> On Jul 14, 2011, at 7:55 PM, Mudassar Majeed wrote:
>
> > Hello people,
> >I am trapped in the following problem plz 
help me. Suppose a process A sends a message to process B. The process 
B will receive the message with MPI_Recv with MPI_ANY_SOURCE in the 
source argument. Let say process B does not know that A is the sender. 
But I want B to receive message from process A (the one who actually 
sends the message to process B). But if I use MPI_ANY_SOURCE, then any 
message from any source is captured by process B (let say there are 
other processes sending messages). Instead of MPI_ANY_SOURCE I cannot 
use A in the source argument as B does not know about the sender. What 
should I do in this situation ?

> >
> > regards,
> > Mudassar Majeed
> > ___
> > users mailing list
> > us...@open-mpi.org 
> > http://www.open-mpi.org/mailman/listinfo.cgi/users
>
>
> --
> Jeff Squyres
> jsquy...@cisco.com 
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/
>
>
>


--
Jeff Squyres
jsquy...@cisco.com 
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/




___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users


--
Oracle
Terry D. Dontje | Principal Software Engineer
Developer Tools Engineering | +1.781.442.2631
Oracle *- Performance Technologies*
95 Network Drive, Burlington, MA 01803
Email terry.don...@oracle.com 





Re: [OMPI users] OpenMPI LS-DYNA Connection refused

2011-05-03 Thread Terry Dontje
Looking at your output some more, the "connect to address" message below doesn't match 
any messages I see in the source code.  Also, "trying normal 
/usr/bin/rsh" looks odd to me.


You may want to set the mca parameter mpi_abort_delay and attach a 
debugger to the abortive process and dump out a stack trace.  That 
should give a better idea where the failure is being triggered.  You can 
look at http://www.open-mpi.org/faq/?category=debugging question 4 for 
more info on the parameter.


--td

On 05/02/2011 03:40 PM, Robert Walters wrote:


I've attached the typical error message I've been getting. This is 
from a run I initiated this morning. The first few lines or so are 
related to the LS-DYNA program and are just there to let you know its 
running successfully for an hour and a half.


What's interesting is this doesn't happen on every job I run, and will 
recur for the same simulation. For instance, Simulation A will run for 
40 hours, and complete successfully. Simulation B will run for 6 
hours, and die from an error. Any further attempts to run simulation B 
will always end from an error. This makes me think there is some kind 
of bad calculation happening that OpenMPI doesn't know how to handle, 
or LS-DYNA doesn't know how to pass to OpenMPI. On the other hand, 
this particular simulation is one of those "benchmarks" and everyone 
runs it. I should not be getting errors from the FE code itself. 
Odd... I think I'll try this as an SMP job as well as an MPP job over 
a single node and see if the issue continues. That way I can figure 
out if its OpenMPI related or FE code related, but as I mentioned, I 
don't think it is FE code related since others have successfully run 
this particular benchmarking simulation.


*_Error Message:_*

 Parallel execution with 56 MPP proc

 NLQ used/max   152/   152

 Start time   05/02/2011 10:02:20

 End time 05/02/2011 11:24:46

 Elapsed time4946 seconds(  1 hours 22 min. 26 sec.) for9293 
cycles


 E r r o r   t e r m i n a t i o n

--

MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD

with errorcode -1525207032.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.

You may or may not see output from other processes, depending on

exactly when Open MPI kills them.

--

connect to address xx.xxx.xx.xxx port 544: Connection refused

connect to address xx.xxx.xx.xxx port 544: Connection refused

trying normal rsh (/usr/bin/rsh)

--

mpirun has exited due to process rank 0 with PID 24488 on

node allision exiting without calling "finalize". This may

have caused other processes in the application to be

terminated by signals sent by mpirun (as reported here).

--

Regards,

Robert Walters



*From:*users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] 
*On Behalf Of *Terry Dontje

*Sent:* Monday, May 02, 2011 2:50 PM
*To:* us...@open-mpi.org
*Subject:* Re: [OMPI users] OpenMPI LS-DYNA Connection refused

On 05/02/2011 02:04 PM, Robert Walters wrote:

Terry,

I was under the impression that all connections are made because of 
the nature of the program that OpenMPI is invoking. LS-DYNA is a 
finite element solver and for any given simulation I run, the cores on 
each node must constantly communicate with one another to check for 
various occurrences (contact with various pieces/parts, updating nodal 
coordinates, etc...).


You might be right, the connections might have been established but 
the error message you state (connection refused) seems out of place if 
the connection was already established.


Was there more error messages from OMPI other than "connection 
refused"?  If so could you possibly provide that output to us, maybe 
it will give us a hint where in the library things are messing up.


I've run the program using --mca mpi_preconnect_mpi 1 and the 
simulation has started itself up successfully which I think means that 
the mpi_preconnect passed since all of the child processes have 
started up on each individual node. Thanks for the suggestion though, 
it's a good place to start.


Yeah, it possibly could be telling if things do work with this setting.

I've been worried (though I have no basis for it) that messages may be 
getting queued up and hitting some kind of ceiling or timeout. As a 
finite element code, I think the communication occurs on a large 
scale. Lots of very small packets going back and forth quickly. A few 
studies have been done by the High Performance Computing Advisory 
Council 
(http://www.hpcadvisorycouncil.com/pdf/LS-DYNA%20_analysis.pdf) and 
they've suggested 

Re: [OMPI users] OpenMPI LS-DYNA Connection refused

2011-05-03 Thread Terry Dontje

A little more clarification:

1.  Simulations that fail always seem to fail?
2.  Does the same simulation always fail between the same processes (how about nodes)? I thought you said no previously.
3.  Did the mpi_preconnect_mpi help any?
4.  Are there any informational messages in the /var/log/messages file around or before the abort?
5.  Have you tried netstat -s 1 while the program is running on one of the nodes that fail, to see if you are getting any of the failure type events spiking?

The error code coming back from MPI_Abort seems really odd.  I am 
curious whether the connection refused is a result of the abort or what?


--td
On 05/02/2011 03:40 PM, Robert Walters wrote:


I've attached the typical error message I've been getting. This is 
from a run I initiated this morning. The first few lines or so are 
related to the LS-DYNA program and are just there to let you know its 
running successfully for an hour and a half.


What's interesting is this doesn't happen on every job I run, and will 
recur for the same simulation. For instance, Simulation A will run for 
40 hours, and complete successfully. Simulation B will run for 6 
hours, and die from an error. Any further attempts to run simulation B 
will always end from an error. This makes me think there is some kind 
of bad calculation happening that OpenMPI doesn't know how to handle, 
or LS-DYNA doesn't know how to pass to OpenMPI. On the other hand, 
this particular simulation is one of those "benchmarks" and everyone 
runs it. I should not be getting errors from the FE code itself. 
Odd... I think I'll try this as an SMP job as well as an MPP job over 
a single node and see if the issue continues. That way I can figure 
out if its OpenMPI related or FE code related, but as I mentioned, I 
don't think it is FE code related since others have successfully run 
this particular benchmarking simulation.


*_Error Message:_*

 Parallel execution with 56 MPP proc

 NLQ used/max   152/   152

 Start time   05/02/2011 10:02:20

 End time 05/02/2011 11:24:46

 Elapsed time4946 seconds(  1 hours 22 min. 26 sec.) for9293 
cycles


 E r r o r   t e r m i n a t i o n

--

MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD

with errorcode -1525207032.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.

You may or may not see output from other processes, depending on

exactly when Open MPI kills them.

--

connect to address xx.xxx.xx.xxx port 544: Connection refused

connect to address xx.xxx.xx.xxx port 544: Connection refused

trying normal rsh (/usr/bin/rsh)

--

mpirun has exited due to process rank 0 with PID 24488 on

node allision exiting without calling "finalize". This may

have caused other processes in the application to be

terminated by signals sent by mpirun (as reported here).

--

Regards,

Robert Walters



*From:*users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] 
*On Behalf Of *Terry Dontje

*Sent:* Monday, May 02, 2011 2:50 PM
*To:* us...@open-mpi.org
*Subject:* Re: [OMPI users] OpenMPI LS-DYNA Connection refused

On 05/02/2011 02:04 PM, Robert Walters wrote:

Terry,

I was under the impression that all connections are made because of 
the nature of the program that OpenMPI is invoking. LS-DYNA is a 
finite element solver and for any given simulation I run, the cores on 
each node must constantly communicate with one another to check for 
various occurrences (contact with various pieces/parts, updating nodal 
coordinates, etc...).


You might be right, the connections might have been established but 
the error message you state (connection refused) seems out of place if 
the connection was already established.


Was there more error messages from OMPI other than "connection 
refused"?  If so could you possibly provide that output to us, maybe 
it will give us a hint where in the library things are messing up.


I've run the program using --mca mpi_preconnect_mpi 1 and the 
simulation has started itself up successfully which I think means that 
the mpi_preconnect passed since all of the child processes have 
started up on each individual node. Thanks for the suggestion though, 
it's a good place to start.


Yeah, it possibly could be telling if things do work with this setting.

I've been worried (though I have no basis for it) that messages may be 
getting queued up and hitting some kind of ceiling or timeout. As a 
finite element code, I think the communication occurs on a large 
scale. Lots of very small packets going back and forth quickly. A few 
studies have been done by 

Re: [OMPI users] OpenMPI LS-DYNA Connection refused

2011-05-02 Thread Terry Dontje

On 05/02/2011 02:04 PM, Robert Walters wrote:


Terry,

I was under the impression that all connections are made because of 
the nature of the program that OpenMPI is invoking. LS-DYNA is a 
finite element solver and for any given simulation I run, the cores on 
each node must constantly communicate with one another to check for 
various occurrences (contact with various pieces/parts, updating nodal 
coordinates, etc...).


You might be right, the connections might have been established but the 
error message you state (connection refused) seems out of place if the 
connection was already established.


Was there more error messages from OMPI other than "connection 
refused"?  If so could you possibly provide that output to us, maybe it 
will give us a hint where in the library things are messing up.


I've run the program using --mca mpi_preconnect_mpi 1 and the 
simulation has started itself up successfully which I think means that 
the mpi_preconnect passed since all of the child processes have 
started up on each individual node. Thanks for the suggestion though, 
it's a good place to start.



Yeah, it possibly could be telling if things do work with this setting.


I've been worried (though I have no basis for it) that messages may be 
getting queued up and hitting some kind of ceiling or timeout. As a 
finite element code, I think the communication occurs on a large 
scale. Lots of very small packets going back and forth quickly. A few 
studies have been done by the High Performance Computing Advisory 
Council 
(http://www.hpcadvisorycouncil.com/pdf/LS-DYNA%20_analysis.pdf) and 
they've suggested that LS-DYNA communicates at very, very high rates 
(Not sure but from pg.15 of that document they're suggesting hundreds 
of millions of messages in only a few hours). Is there any kind of 
buffer or queue that OpenMPI develops if messages are created too 
quickly? Does it dispatch them immediately or does it attempt to apply 
some kind of traffic flow control?


The queuing really depends on what type of calls the application is 
making.  If it is doing blocking sends then I wouldn't expect too much 
queuing happening using the tcp btl.  As far as traffic flow control is 
concerned I believe the tcp btl doesn't do any for the most part and 
lets tcp handle that.  Maybe someone else on the list could chime in if 
I am wrong here.
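
As a rough illustration of why the call type matters (a sketch only, not LS-DYNA's actual communication pattern): a loop of blocking sends has at most one send in flight at a time, while a burst of non-blocking sends can leave many messages queued inside the library until they are waited on.

/* Sketch: blocking vs. non-blocking sends.  N, peer, and the payload
   are hypothetical. */
#include <mpi.h>

#define N 1000

void blocking_sends(int peer, double *buf)
{
    /* Each send completes (from the application's point of view) before
       the next one starts, so little should pile up inside the library. */
    for (int i = 0; i < N; i++)
        MPI_Send(&buf[i], 1, MPI_DOUBLE, peer, i, MPI_COMM_WORLD);
}

void nonblocking_burst(int peer, double *buf)
{
    /* All N sends are outstanding until MPI_Waitall; this is where
       internal queuing (and memory use) can grow. */
    MPI_Request req[N];
    for (int i = 0; i < N; i++)
        MPI_Isend(&buf[i], 1, MPI_DOUBLE, peer, i,
                  MPI_COMM_WORLD, &req[i]);
    MPI_Waitall(N, req, MPI_STATUSES_IGNORE);
}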


In the past I have seen cases where lots of traffic on the network and to a 
particular node has caused some connections not to be established.  But I 
don't know of any outstanding issues like that right now.


--
Oracle
Terry D. Dontje | Principal Software Engineer
Developer Tools Engineering | +1.781.442.2631
Oracle *- Performance Technologies*
95 Network Drive, Burlington, MA 01803
Email terry.don...@oracle.com 





Re: [OMPI users] OMPI vs. network socket communcation

2011-05-02 Thread Terry Dontje

On 05/02/2011 11:30 AM, Jack Bryan wrote:

Thanks for your reply.

MPI is for academic purpose. How about business applications ?

There are quite a few non-academic MPI applications.  For example, 
there are quite a few simulation codes from different vendors that 
support MPI (Nastran is one common one).

What kinds of parallel/distributed computing environment do the 
financial institutions use for their high frequency trading?

I personally know of a private trading shop that uses MPI, but that's as 
much as I can say.  I am not sure how common it is; however, the direct 
communication to the trading servers is still via sockets or something 
similar as opposed to MPI.


--td



Any help is really appreciated.

Thanks,


Date: Mon, 2 May 2011 08:34:33 -0400
From: terry.don...@oracle.com
To: us...@open-mpi.org
Subject: Re: [OMPI users] OMPI vs. network socket communcation

On 04/30/2011 08:52 PM, Jack Bryan wrote:

Hi, All:

What is the relationship between MPI communication and socket
communication ?

MPI may use socket communications to do communications between two 
processes.  Aside from that they are used for different purposes.


Is the network socket programming better than MPI ?

Depends on what you are trying to do.  If you are writing a parallel 
program that may run in multiple environments with different types of 
high-performance protocols available for its use, then MPI is probably 
better.  If you are looking to do simple client/server type 
programming, then socket programming might have an advantage.



I am a newbie of network socket programming.

I do not know which one is better for parallel/distributed
computing ?

IMO MPI.


I know that network socket is unix-based file communication
between server and client.

If they can also be used for parallel computing, how MPI can work
better than them ?

There is a lot of stuff that MPI does behind the curtain to make a 
parallel application's life a lot easier.  As far as performance goes, MPI 
will not perform better than sockets if it is using sockets as the 
underlying model.  However, the performance difference should be 
negligible, which makes all the other stuff MPI does for you a big win.



I know MPI is for homogeneous cluster system and network socket is
based on internet TCP/IP.

What do you mean by homogeneous cluster?  There are some MPIs that can 
work among different platforms and even different OSes (though some 
initial setup may be necessary).


Hope this helps,


--
Oracle
Terry D. Dontje | Principal Software Engineer
Developer Tools Engineering | +1.781.442.2631
Oracle *- Performance Technologies*
95 Network Drive, Burlington, MA 01803
Email terry.don...@oracle.com 




___ users mailing list 
us...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/users



___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users



--
Oracle
Terry D. Dontje | Principal Software Engineer
Developer Tools Engineering | +1.781.442.2631
Oracle *- Performance Technologies*
95 Network Drive, Burlington, MA 01803
Email terry.don...@oracle.com 





Re: [OMPI users] OMPI vs. network socket communcation

2011-05-02 Thread Terry Dontje

On 04/30/2011 08:52 PM, Jack Bryan wrote:

Hi, All:

What is the relationship between MPI communication and socket 
communication ?


MPI may use socket communications to do communications between two 
processes.  Aside from that they are used for different purposes.

Is the network socket programming better than MPI ?
Depends on what you are trying to do.  If you are writing a parallel 
program that may run in multiple environments with different types of 
high-performance protocols available for its use, then MPI is probably better.  
If you are looking to do simple client/server type programming, then 
socket programming might have an advantage.


I am a newbie of network socket programming.

I do not know which one is better for parallel/distributed computing ?

IMO MPI.


I know that network socket is unix-based file communication between 
server and client.


If they can also be used for parallel computing, how MPI can work 
better than them ?
There is a lot of stuff that MPI does behind the curtain to make a 
parallel application's life a lot easier.  As far as performance goes, MPI will 
not perform better than sockets if it is using sockets as the underlying 
model.  However, the performance difference should be negligible, which 
makes all the other stuff MPI does for you a big win.
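
As a small illustration of the "behind the curtain" point (a sketch, not a benchmark): a global sum across all processes is a single call in MPI, whereas with raw sockets you would have to code the process startup, the connections, the reduction tree, and any data conversion yourself.

/* Sketch: a global sum with one collective call. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size;
    double local, global;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    local = (double)rank;              /* each process contributes a value */
    MPI_Allreduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);

    if (rank == 0)
        printf("sum over %d processes = %g\n", size, global);

    MPI_Finalize();
    return 0;
}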


I know MPI is for homogeneous cluster system and network socket is 
based on internet TCP/IP.
What do you mean by homogeneous cluster?  There are some MPIs that can 
work among different platforms and even different OSes (though some 
initial setup may be necessary).


Hope this helps,


--
Oracle
Terry D. Dontje | Principal Software Engineer
Developer Tools Engineering | +1.781.442.2631
Oracle *- Performance Technologies*
95 Network Drive, Burlington, MA 01803
Email terry.don...@oracle.com 





Re: [OMPI users] Configure fail: OpenMPI/1.5.3 with Support for LSF using Sun Studio compilers

2011-04-08 Thread Terry Dontje
Paul and I have been talking about the below issue and I thought it 
would be useful to update the list just in case someone else runs into 
this problem and ends up searching the email list before we actually fix 
the issue.


The problem is that OMPI's configure tests whether -lm is needed to get 
math library functions (e.g. ceil, sqrt...).  If one is 
using the Solaris Studio compilers (from Oracle) and passes in the 
-fast option via CFLAGS, the -lm test in configure will turn out false, 
because -fast sets the -xlibmopt flag, which provides inline versions of 
some of the math library functions.  Because of that, OMPI decides it 
doesn't need to set -lm for linking.


The above is problematic when configuring with --with-lsf because the LSF 
library libbat.so references the symbol ceil, which needs to be resolved (so it 
needs -lm in the case of the Studio compilers).  Without the -lm, the 
configure check for LSF fails.
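
For reference, the test program involved is tiny; a minimal sketch along the lines of the ceil test discussed below (not the exact program that configure generates) is:

/* ceil.c - minimal sketch of the math-library link test. */
#include <math.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    double x = argc + 0.3;    /* runtime value, so the call cannot be folded away */
    printf("ceil(%f) = %f\n", x, ceil(x));
    return 0;
}

Built as plain "cc ceil.c" this fails to link unless -lm is added (or unless -fast turns on -xlibmopt and inlines the call), which is the situation the -lm probe mis-detects here.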


There are several workarounds:

1.  Put LIBS="-lm" on the configure line.  The compiler will still 
inline the math functions compiled into OMPI, but linking of the OMPI libs 
with the LSF libs will work because of the -lm.


2.  In the CFLAGS add -xnolibmopt in addition to -fast.  This will turn 
off the inlining and cause OMPI's configure script to insert -lm.


3.  Don't use -fast.

--td
On 04/07/2011 08:36 AM, Paul Kapinos wrote:

Hi Terry,


so, the attached ceil.c example file *can* be compiled by "CC" (the 
Studio C++ compiler), but *cannot* be compiled using "cc" (the 
Studio C compiler).

$ CC ceil.c
$ cc ceil.c

Did you try to link in the math library -lm?  When I did this your 
test program worked for me and that actually is the first test that 
the configure does.


5. Looking into configure.log and searching on `ceil' results: there 
was a check for the availability of `ceil' for the C compiler (see 
config.log.ceil). This check says `ceil' is *available* for the "cc" 
Compiler, which is *wrong*, cf. (4).

See above, it actually is right when you link in the math lib.


Thanks for the tip! Yes, when using -lm the Studio C compiler "cc" 
also works fine for ceil.c:


$ cc ceil.c -lm



So, is there an error in the configure stage? Or do the checks 
in config.log.ceil not rely on the availability of the `ceil' 
function in the C compiler?
It looks to me like the -lbat configure test is not linking in the 
math lib. 


Yes, there is no -lm in the configure:84213 line.

Note the checks for ceil again, config.log.ceil. As far as I understood 
these logs, the checks for ceil and for the need of -lm deliver wrong 
results:



configure:55000: checking if we need -lm for ceil

configure:55104: result: no

configure:55115: checking for ceil

configure:55115: result: yes


So, configure assumes "ceil" is available for the "cc" compiler 
without the need for the -lm flag - and this is *wrong*: "cc" needs -lm.


It seems to me to be a configure issue.

Greetings

Paul



___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users



--
Oracle
Terry D. Dontje | Principal Software Engineer
Developer Tools Engineering | +1.781.442.2631
Oracle *- Performance Technologies*
95 Network Drive, Burlington, MA 01803
Email terry.don...@oracle.com 





Re: [OMPI users] Configure fail: OpenMPI/1.5.3 with Support for LSF using Sun Studio compilers

2011-04-07 Thread Terry Dontje

On 04/07/2011 08:36 AM, Paul Kapinos wrote:

Hi Terry,


so, the attached ceil.c example file *can* be compiled by "CC" (the 
Studio C++ compiler), but *cannot* be compiled using "cc" (the 
Studio C compiler).

$ CC ceil.c
$ cc ceil.c

Did you try to link in the math library -lm?  When I did this your 
test program worked for me and that actually is the first test that 
the configure does.


5. Looking into configure.log and searching on `ceil' results: there 
was a check for the availability of `ceil' for the C compiler (see 
config.log.ceil). This check says `ceil' is *available* for the "cc" 
Compiler, which is *wrong*, cf. (4).

See above, it actually is right when you link in the math lib.


Thanks for the tip! Yes, when using -lm the Studio C compiler "cc" 
also works fine for ceil.c:


$ cc ceil.c -lm



So, is there an error in the configure stage? Or do the checks 
in config.log.ceil not rely on the availability of the `ceil' 
function in the C compiler?
It looks to me like the -lbat configure test is not linking in the 
math lib. 


Yes, there is no -lm in the configure:84213 line.

Note the checks for ceil again, config.log.ceil. As far as I understood 
these logs, the checks for ceil and for the need of -lm deliver wrong 
results:



configure:55000: checking if we need -lm for ceil

configure:55104: result: no

configure:55115: checking for ceil

configure:55115: result: yes


So, configure assumes "ceil" is available for the "cc" compiler 
without the need for the -lm flag - and this is *wrong*: "cc" needs -lm.


Interesting.  I've looked at some of my x86, Studio, Linux builds of 
the OMPI 1.5 branch and I see the correct configure results for ceil that 
correctly identify the need for -lm.  Yours definitely does not come up 
with the right answer.  Are you using the "official" ompi 1.5.3 tarball?

It seems to me to be a configure issue.


Certainly does.

--
Oracle
Terry D. Dontje | Principal Software Engineer
Developer Tools Engineering | +1.781.442.2631
Oracle *- Performance Technologies*
95 Network Drive, Burlington, MA 01803
Email terry.don...@oracle.com 





Re: [OMPI users] Configure fail: OpenMPI/1.5.3 with Support for LSF using Sun Studio compilers

2011-04-07 Thread Terry Dontje

On 04/07/2011 06:16 AM, Paul Kapinos wrote:

Dear OpenMPI developers,

We tried to build OpenMPI 1.5.3 including Support for Platform LSF 
using the Sun Studio (=Oracle Solaris Studio now) /12.2 and the 
configure stage failed.


1. Used flags:

./configure --with-lsf --with-openib --with-devel-headers 
--enable-contrib-no-build=vt --enable-mpi-threads CFLAGS="-fast 
-xtarget=nehalem -m64"   CXXFLAGS="-fast -xtarget=nehalem -m64" 
FFLAGS="-fast -xtarget=nehalem" -m64   FCFLAGS="-fast -xtarget=nehalem 
-m64"   F77=f95 LDFLAGS="-fast -xtarget=nehalem -m64" 
--prefix=//openmpi-1.5.3mt/linux64/studio


(note the Support for LSF enabled by --with-lsf). The compiler envvars 
are set as following:

$ echo $CC $FC $CXX
cc f95 CC

The compiler info: (cc -V, CC -V)
cc: Sun C 5.11 Linux_i386 2010/08/13
CC: Sun C++ 5.11 Linux_i386 2010/08/13


2. The configure error was:
##
checking for lsb_launch in -lbat... no
configure: WARNING: LSF support requested (via --with-lsf) but not found.
configure: error: Aborting.
##


3. In the config.log (see the config.log.error) there is more info 
about the problem. crucial info is:

##
/opt/lsf/8.0/linux2.6-glibc2.3-x86_64/lib/libbat.so: undefined 
reference to `ceil'

##

4. Googling for `ceil' results e.g. in 
http://www.cplusplus.com/reference/clibrary/cmath/ceil/


so, the attached ceil.c example file *can* be compiled by "CC" (the 
Studio C++ compiler), but *cannot* be compiled using "cc" (the Studio 
C compiler).

$ CC ceil.c
$ cc ceil.c

Did you try to link in the math library -lm?  When I did this your test 
program worked for me and that actually is the first test that the 
configure does.


5. Looking into configure.log and searching on `ceil' results: there 
was a check for the availability of `ceil' for the C compiler (see 
config.log.ceil). This check says `ceil' is *available* for the "cc" 
Compiler, which is *wrong*, cf. (4).

See above, it actually is right when you link in the math lib.


So, is there an error in the configure stage? Or do the checks in 
config.log.ceil not rely on the availability of the `ceil' function 
in the C compiler?

It looks to me like the -lbat configure test is not linking in the math lib.

Best wishes,
Paul Kapinos






P.S. Note in in the past we build many older versions of OpenMPI with 
no support for LSF and no such problems







___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users



--
Oracle
Terry D. Dontje | Principal Software Engineer
Developer Tools Engineering | +1.781.442.2631
Oracle *- Performance Technologies*
95 Network Drive, Burlington, MA 01803
Email terry.don...@oracle.com 





Re: [OMPI users] mpi problems,

2011-04-07 Thread Terry Dontje

On 04/06/2011 03:38 PM, Nehemiah Dacres wrote:
I am also trying to get netlib's hpl to run via sun cluster tools so i 
am trying to compile it and am having trouble. Which is the proper mpi 
library to give?

naturally this isn't going to work

MPdir= /opt/SUNWhpc/HPC8.2.1c/sun/
MPinc= -I$(MPdir)/include
*MPlib= $(MPdir)/lib/libmpi.a*
Is there a reason you are trying to link with a static libmpi?  You 
really want to link with libmpi.so.  It also seems like whatever 
Makefile you are using is not using mpicc; is that true?  The reason 
that is important is that mpicc would pick up the right libs you need.  
Which brings me to Ralph's comment: if you really want to go around the 
mpicc way of compiling, use mpicc --showme, copy the compile line shown 
in that command's output, and insert your files accordingly.


--td


because that doesn't exist
/opt/SUNWhpc-O/HPC8.2.1c/sun/lib/libotf.a  
/opt/SUNWhpc-O/HPC8.2.1c/sun/lib/libvt.fmpi.a  
/opt/SUNWhpc-O/HPC8.2.1c/sun/lib/libvt.omp.a
/opt/SUNWhpc-O/HPC8.2.1c/sun/lib/libvt.a   
/opt/SUNWhpc-O/HPC8.2.1c/sun/lib/libvt.mpi.a   
/opt/SUNWhpc-O/HPC8.2.1c/sun/lib/libvt.ompi.a


is what I have for listing *.a in the lib directory. None of those 
are equivalent because they are all linked with VampirTrace, if I am 
reading the names right. I've already tried putting 
/opt/SUNWhpc-O/HPC8.2.1c/sun/lib/libvt.mpi.a for this and it didn't 
work, giving errors like


On Wed, Apr 6, 2011 at 12:42 PM, Terry Dontje <terry.don...@oracle.com> wrote:


Something looks fishy about your numbers.  The first two sets of
numbers look the same and the last set does look better for the most
part.  Your mpirun command line looks weird to me with the "-mca
orte_base_help_aggregate btl,openib,self,": did something get
chopped off with the text copy?  You should have had "-mca btl
openib,self".  Can you do a run with "-mca btl tcp,self"?  It
should be slower.

I really wouldn't have expected another compiler over IB to be
that dramatically lower performing.

--td



On 04/06/2011 12:40 PM, Nehemiah Dacres wrote:

also, I'm not sure if I'm reading the results right. According to
the last run, did using the sun compilers (update 1 )  result in
higher performance with sunct?

On Wed, Apr 6, 2011 at 11:38 AM, Nehemiah Dacres <dacre...@slu.edu> wrote:

some tests I did. I hope this isn't an abuse of the list.
please tell me if it is but thanks to all those who helped me.

this  goes to say that the sun MPI works with programs not
compiled with sun’s compilers.
this first test was run as a base case to see if MPI works.,
the sedcond run is to see the speed up using OpenIB provides
jian@therock ~]$ mpirun -machinefile list
/opt/iba/src/mpi_apps/mpi_stress/mpi_stress
Start mpi_stress at Wed Apr  6 10:56:29 2011

  Size (bytes)  TxMessages  TxMillionBytes/s  TxMessages/s

            32           1              2.77      86485.67
            64           1              5.76      90049.42
           128           1             11.00      85923.85
           256           1             18.78      73344.43
           512           1             34.47      67331.98
          1024           1             34.81      33998.09
          2048           1             17.31       8454.27
          4096           1             18.34       4476.61
          8192           1             25.43       3104.28
         16384           1             15.56        949.50
         32768           1             13.95        425.74
         65536           1              9.88        150.79
        131072        8192             11.05         84.31
        262144        4096             13.12         50.04
        524288        2048             16.54         31.55
       1048576        1024             19.92         18.99
       2097152         512             22.54         10.75
       4194304         256             25.46          6.07

Iteration 0 : errors = 0, total = 0 (495 secs, Wed Apr  6
11:04:44 2011)
After 1 iteration(s), 8 mins and 15 secs, total errors = 0

here is the infiniband run

[jian@therock ~]$ mpirun

Re: [OMPI users] mpi problems,

2011-04-06 Thread Terry Dontje
Something looks fishy about your numbers.  The first two sets of numbers 
look the same and the last set does look better for the most part.  Your 
mpirun command line looks weird to me with the "-mca 
orte_base_help_aggregate btl,openib,self,": did something get chopped off 
with the text copy?  You should have had "-mca btl openib,self".  Can 
you do a run with "-mca btl tcp,self"?  It should be slower.


I really wouldn't have expected another compiler over IB to be that 
dramatically lower performing.


--td


On 04/06/2011 12:40 PM, Nehemiah Dacres wrote:
also, I'm not sure if I'm reading the results right. According to the 
last run, did using the sun compilers (update 1 )  result in higher 
performance with sunct?


On Wed, Apr 6, 2011 at 11:38 AM, Nehemiah Dacres > wrote:


some tests I did. I hope this isn't an abuse of the list. please
tell me if it is but thanks to all those who helped me.

this  goes to say that the sun MPI works with programs not
compiled with sun’s compilers.
this first test was run as a base case to see if MPI works., the
sedcond run is to see the speed up using OpenIB provides
jian@therock ~]$ mpirun -machinefile list
/opt/iba/src/mpi_apps/mpi_stress/mpi_stress
Start mpi_stress at Wed Apr  6 10:56:29 2011

  Size (bytes)  TxMessages  TxMillionBytes/s  TxMessages/s

            32           1              2.77      86485.67
            64           1              5.76      90049.42
           128           1             11.00      85923.85
           256           1             18.78      73344.43
           512           1             34.47      67331.98
          1024           1             34.81      33998.09
          2048           1             17.31       8454.27
          4096           1             18.34       4476.61
          8192           1             25.43       3104.28
         16384           1             15.56        949.50
         32768           1             13.95        425.74
         65536           1              9.88        150.79
        131072        8192             11.05         84.31
        262144        4096             13.12         50.04
        524288        2048             16.54         31.55
       1048576        1024             19.92         18.99
       2097152         512             22.54         10.75
       4194304         256             25.46          6.07

Iteration 0 : errors = 0, total = 0 (495 secs, Wed Apr  6 11:04:44
2011)
After 1 iteration(s), 8 mins and 15 secs, total errors = 0

here is the infiniband run

[jian@therock ~]$ mpirun -mca orte_base_help_aggregate
btl,openib,self, -machinefile list
/opt/iba/src/mpi_apps/mpi_stress/mpi_stress
Start mpi_stress at Wed Apr  6 11:07:06 2011

  Size (bytes)  TxMessages  TxMillionBytes/s  TxMessages/s

            32           1              2.72      84907.69
            64           1              5.83      91097.94
           128           1             10.75      83959.63
           256           1             18.53      72384.48
           512           1             34.96      68285.00
          1024           1             11.40      11133.10
          2048           1             20.88      10196.34
          4096           1             10.13       2472.13
          8192           1             19.32       2358.25
         16384           1             14.58        890.10
         32768           1             15.85        483.61
         65536           1              9.04        137.95
        131072        8192             10.90         83.12
        262144        4096             13.57         51.76
        524288        2048             16.82         32.08
       1048576        1024             19.10         18.21
       2097152         512             22.13         10.55
       4194304         256             21.66          5.16

Iteration 0 : errors = 0, total = 0 (511 secs, Wed Apr  6 11:15:37
2011)
After 1 iteration(s), 8 mins and 31 secs, total errors = 0
compiled with the sun compilers i think
[jian@therock ~]$ mpirun -mca orte_base_help_aggregate
btl,openib,self, 

Re: [OMPI users] Not pointing to correct libraries

2011-04-05 Thread Terry Dontje
I am not sure Fedora comes with Open MPI installed on it by default (at 
least my FC13 did not).  You may want to look at trying to install the 
Open MPI from yum or some other package mananger.  Or you can download 
the source tarball from http://www.open-mpi.org/software/ompi/v1.4/, 
build and install it yourself.


--td

On 04/05/2011 11:01 AM, Warnett, Jason wrote:


Hello

I am running on Linux, latest version of mpi built but I've run into a 
few issues with a program which I am trying to run. It is a widely 
used open source application called LIGGGHTS so I know the code works 
and should compile, so I obviously have a setting wrong with MPI. I 
saw a similar problem in a previous post (2007), but couldn't see how 
to resolve it as I am quite new to the terminal environment in Unix 
(always been windows... until now).


So the issue I am getting is the following error...

[Jay@Jay chute_wear]$ mpirun -np 1 lmp_fedora < in.chute_wear
lmp_fedora: error while loading shared libraries: libmpi_cxx.so.0: 
cannot open shared object file: No such file or directory


So I checked where stuff was pointing using the ldd command as in that 
post and found the following:

linux-gate.so.1 =>  (0x00d1)
libmpi_cxx.so.0 => not found
libmpi.so.0 => not found
libopen-rte.so.0 => not found
libopen-pal.so.0 => not found
libdl.so.2 => /lib/libdl.so.2 (0x00cbe000)
libnsl.so.1 => /lib/libnsl.so.1 (0x007e6000)
libutil.so.1 => /lib/libutil.so.1 (0x009fa000)
libstdc++.so.6 => /usr/lib/libstdc++.so.6 (0x04a02000)
libm.so.6 => /lib/libm.so.6 (0x008a4000)
libgcc_s.so.1 => /lib/libgcc_s.so.1 (0x0011)
libpthread.so.0 => /lib/libpthread.so.0 (0x0055)
libc.so.6 => /lib/libc.so.6 (0x003b3000)
/lib/ld-linux.so.2 (0x00bfa000)

so it is the Open MPI files it isn't linking to. How can I sort this? 
I shouldn't need to edit the code of the LIGGGHTS executable I've 
compiled, as I know other people are using the same thing, so I guess it 
is to do with the way I installed Open MPI. I did a system search and 
couldn't find a file called libmpi* anywhere... so my guess is that 
I've installed it incorrectly. I have tried several ways, but could you 
tell me how to fix it / install correctly? (Embarrassing if it is just a 
matter of installing it correctly...)


Thanks

Jay


___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users



--
Oracle
Terry D. Dontje | Principal Software Engineer
Developer Tools Engineering | +1.781.442.2631
Oracle *- Performance Technologies*
95 Network Drive, Burlington, MA 01803
Email terry.don...@oracle.com 





Re: [OMPI users] alltoall messages > 2^26

2011-04-05 Thread Terry Dontje
It was asked during the community concall whether the below may be 
related to ticket #2722 https://svn.open-mpi.org/trac/ompi/ticket/2722?


--td

On 04/04/2011 10:17 PM, David Zhang wrote:
Any error messages?  Maybe the nodes ran out of memory?  I know MPI 
implements some kind of buffering under the hood, so even though you're 
sending arrays over 2^26 in size, it may require more than that for 
MPI to actually send it.
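
For what it is worth, a reduced test along these lines can help separate the MPI layer from the application (a sketch only; the 2^26-byte payload and MPI_BYTE type are just for illustration, not IMB's actual code):

/* Sketch: alltoall with a per-peer payload around the reported threshold.
   Memory use is 2 * nprocs * COUNT bytes per rank, so start with a small np. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

#define COUNT (1 << 26)   /* bytes sent to each peer */

int main(int argc, char **argv)
{
    int rank, size;
    char *sendbuf, *recvbuf;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    sendbuf = malloc((size_t)size * COUNT);   /* contents don't matter for a hang test */
    recvbuf = malloc((size_t)size * COUNT);
    if (sendbuf == NULL || recvbuf == NULL)
        MPI_Abort(MPI_COMM_WORLD, 1);

    MPI_Alltoall(sendbuf, COUNT, MPI_BYTE, recvbuf, COUNT, MPI_BYTE,
                 MPI_COMM_WORLD);

    if (rank == 0)
        printf("alltoall of %d bytes per peer completed\n", COUNT);

    free(sendbuf);
    free(recvbuf);
    MPI_Finalize();
    return 0;
}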


On Mon, Apr 4, 2011 at 2:16 PM, Michael Di Domenico 
> wrote:


Has anyone seen an issue where OpenMPI/Infiniband hangs when sending
messages over 2^26 in size?

For a reason i have not determined just yet machines on my cluster
(OpenMPI v1.5 and Qlogic Stack/QDR IB Adapters) is failing to send
array's over 2^26 in size via the AllToAll collective. (user code)

Further testing seems to indicate that an MPI message over 2^26 fails
(tested with IMB-MPI)

Running the same test on a different older IB connected cluster seems
to work, which would seem to indicate a problem with the infiniband
drivers of some sort rather then openmpi (but i'm not sure).

Any thoughts, directions, or tests?
___
users mailing list
us...@open-mpi.org 
http://www.open-mpi.org/mailman/listinfo.cgi/users




--
David Zhang
University of California, San Diego


___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users



--
Oracle
Terry D. Dontje | Principal Software Engineer
Developer Tools Engineering | +1.781.442.2631
Oracle *- Performance Technologies*
95 Network Drive, Burlington, MA 01803
Email terry.don...@oracle.com 





Re: [OMPI users] orte-odls-default:execv-error

2011-04-05 Thread Terry Dontje

On 04/05/2011 05:11 AM, SLIM H.A. wrote:

After an upgrade of our system I receive the following error message
(openmpi 1.4.2 with gridengine):


quote


--
Sorry!  You were supposed to get help about:
 orte-odls-default:execv-error
But I couldn't open the help file:
 ...path/1.4.2/share/openmpi/help-odls-default.txt: Cannot send after
transport endpoint shut
down.  Sorry!

end quote

and this is this is the section in the text file
...path/1.4.2/share/openmpi/help-odls-default.txt that refers to
orte-odls-default:execv-error





quote

[orte-odls-default:execv-error]
Could not execute the executable "%s": %s

This could mean that your PATH or executable name is wrong, or that you
do not
have the necessary permissions.  Please ensure that the executable is
able to be
found and executed."

end quote

Does the execv-error mean that the file
...path/1.4.2/share/openmpi/help-odls-default.txt was not accessible or
is there a different reason?

No, it thinks it cannot find some executable that it was asked to run.  
Do you have the exact mpirun command line that was being run?  
Can you first try running without gridengine?

The error message continues with


quote


--
[cn004:00591] mca: base: component_find: unable to open
...path/1.4.2/lib/openmpi/mca_grpcomm_basic: file not found (ignored)
[cn004:00586] mca: base: component_find: unable to open
...path/1.4.2/lib/openmpi/mca_notifier_syslog: file not found (ignored)
[cn004:00585] mca: base: component_find: unable to open
...path/1.4.2/lib/openmpi/mca_notifier_syslog: file not found (ignored)

--
Sorry!  You were supposed to get help about:
 find-available:none-found
But I couldn't open the help file:
 ...path/1.4.2/share/openmpi/help-mca-base.txt: Cannot send after
transport endpoint shutdown
.  Sorry!

--
[cn004:00586] PML ob1 cannot be selected

end quote

but there are .so and .la libraries in the directory
...path/1.4.2/lib/openmpi
Are those the ones not found?
I've seen this when either OPAL_PREFIX or LD_LIBRARY_PATH is not set 
up correctly.

Thanks

Henk

___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users



--
Oracle
Terry D. Dontje | Principal Software Engineer
Developer Tools Engineering | +1.781.442.2631
Oracle *- Performance Technologies*
95 Network Drive, Burlington, MA 01803
Email terry.don...@oracle.com 





Re: [OMPI users] mpi problems,

2011-04-04 Thread Terry Dontje
libfui.so is a library that is part of the Solaris Studio Fortran tools.  It 
should be located under the lib directory of your Solaris Studio 
installation.  So one question is whether you actually have Studio 
Fortran installed on all of your nodes.


--td

On 04/04/2011 04:02 PM, Ralph Castain wrote:
Well, where is libfui located? Is that location in your ld path? Is 
the lib present on all nodes in your hostfile?



On Apr 4, 2011, at 1:58 PM, Nehemiah Dacres wrote:


[jian@therock ~]$ echo $LD_LIBRARY_PATH
/opt/sun/sunstudio12.1/lib:/opt/vtk/lib:/opt/gridengine/lib/lx26-amd64:/opt/gridengine/lib/lx26-amd64:/home/jian/.crlibs:/home/jian/.crlibs32
[jian@therock ~]$ /opt/SUNWhpc/HPC8.2.1c/sun/bin/mpirun  -np 4 
-hostfile list ring2
ring2: error while loading shared libraries: libfui.so.1: cannot open 
shared object file: No such file or directory
ring2: error while loading shared libraries: libfui.so.1: cannot open 
shared object file: No such file or directory
ring2: error while loading shared libraries: libfui.so.1: cannot open 
shared object file: No such file or directory

mpirun: killing job...

--
mpirun noticed that process rank 1 with PID 31763 on node compute-0-1 
exited on signal 0 (Unknown signal 0).

--
mpirun: clean termination accomplished

I really don't know what's wrong here. I was sure that would work

On Mon, Apr 4, 2011 at 2:43 PM, Samuel K. Gutierrez > wrote:


Hi,

Try prepending the path to your compiler libraries.

Example (bash-like):

export
LD_LIBRARY_PATH=/compiler/prefix/lib:/ompi/prefix/lib:$LD_LIBRARY_PATH

--
Samuel K. Gutierrez
Los Alamos National Laboratory


On Apr 4, 2011, at 1:33 PM, Nehemiah Dacres wrote:


altering LD_LIBRARY_PATH alters the process's path to MPI's
libraries; how do I alter its path to compiler libs like
libfui.so.1? It needs to find them because it was compiled by a
Sun compiler

On Mon, Apr 4, 2011 at 10:06 AM, Nehemiah Dacres
> wrote:


As Ralph indicated, he'll add the hostname to the error
message (but that might be tricky; that error message is
coming from rsh/ssh...).

In the meantime, you might try (csh style):

foreach host (`cat list`)
   echo $host
   ls -l /opt/SUNWhpc/HPC8.2.1c/sun/bin/orted
end


that's what the tentakel line was refering to, or ...


On Apr 4, 2011, at 10:24 AM, Nehemiah Dacres wrote:

> I have installed it via a symlink on all of the nodes,
I can go 'tentakel which mpirun ' and it finds it' I'll
check the library paths but isn't there a way to find
out which nodes are returning the error?

I found it misslinked on a couple nodes. thank you

-- 
Nehemiah I. Dacres

System Administrator
Advanced Technology Group Saint Louis University




-- 
Nehemiah I. Dacres

System Administrator
Advanced Technology Group Saint Louis University

___
users mailing list
us...@open-mpi.org 
http://www.open-mpi.org/mailman/listinfo.cgi/users



___
users mailing list
us...@open-mpi.org 
http://www.open-mpi.org/mailman/listinfo.cgi/users




--
Nehemiah I. Dacres
System Administrator
Advanced Technology Group Saint Louis University

___
users mailing list
us...@open-mpi.org 
http://www.open-mpi.org/mailman/listinfo.cgi/users



___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users



--
Oracle
Terry D. Dontje | Principal Software Engineer
Developer Tools Engineering | +1.781.442.2631
Oracle *- Performance Technologies*
95 Network Drive, Burlington, MA 01803
Email terry.don...@oracle.com 





Re: [OMPI users] 1.5.3 and SGE integration?

2011-03-21 Thread Terry Dontje

Dave, what version of Grid Engine are you using?
The plm checks for the following env vars to determine if you are 
running under Grid Engine:

SGE_ROOT
ARC
PE_HOSTFILE
JOB_ID

If these are not present in the session in which mpirun is executed, then 
it will resort to ssh.
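
A rough sketch of that kind of check (illustration only, not the actual plm source):

/* Sketch: treat the session as a Grid Engine job only if all of the
   expected environment variables are present. */
#include <stdbool.h>
#include <stdlib.h>

static bool looks_like_gridengine(void)
{
    return getenv("SGE_ROOT")    != NULL &&
           getenv("ARC")         != NULL &&
           getenv("PE_HOSTFILE") != NULL &&
           getenv("JOB_ID")      != NULL;
}

If mpirun is started from a session where any of these are missing (for example an interactive shell outside of a Grid Engine job), the launcher falls back to ssh/rsh.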


--td


On 03/21/2011 08:24 AM, Dave Love wrote:

I've just tried 1.5.3 under SGE with tight integration, which seems to
be broken.  I built and ran in the same way as for 1.4.{1,3}, which
works, and ompi_info reports the same gridengine parameters for 1.5 as
for 1.4.

The symptoms are that it reports a failure to communicate using ssh,
whereas it should be using the SGE builtin method via qrsh.

There doesn't seem to be a relevant bug report, but before I
investigate, has anyone else succeeded/failed with it, or have any
hints?

___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users



--
Oracle
Terry D. Dontje | Principal Software Engineer
Developer Tools Engineering | +1.781.442.2631
Oracle *- Performance Technologies*
95 Network Drive, Burlington, MA 01803
Email terry.don...@oracle.com 





Re: [OMPI users] Error in Binding MPI Process to a socket

2011-03-18 Thread Terry Dontje

On 03/17/2011 03:31 PM, vaibhav dutt wrote:

Hi,

Thanks for your reply. I tried to execute first a process by using

mpirun -machinefile hostfile.txt  --slot-list 0:1   -np 1

but it gives the same error as mentioned previously.

Then, I created a rankfile with contents"

rank 0=t1.tools.xxx  slot=0:0
rank 1=t1.tools.xxx  slot=1:0.

and then used the command

mpirun -machinefile hostfile.txt --rankfile my_rankfile.txt   -np 2

but ended  up getting same error. Is there any patch that I can 
install in my system to make it

topology aware?


You may want to check that you have numa turned on.

If you look in your /etc/grub.conf file does the kernel line have 
"numa=on" in it.  If not I would suggest making a new boot line and 
appending numa=on at the end.  That way if the new boot line doesn't 
work you'll be able to go back to the old one.  Anyway, my boot line 
that turns on numa looks like the following:


title Red Hat Enterprise Linux AS-up (2.6.9-67.EL)
root (hd0,0)
kernel /vmlinuz-2.6.9-67.EL ro root=LABEL=/ console=tty0 
console=ttyS0,9600 rhgb quiet numa=on


And of course once you've saved the changes you'll need to reboot and 
select the new boot line at the grub menu.


--td

Thanks


On Thu, Mar 17, 2011 at 2:05 PM, Ralph Castain > wrote:


The error is telling you that your OS doesn't support queries
telling us what cores are on which sockets, so we can't perform a
"bind to socket" operation. You can probably still "bind to core",
so if you know what cores are in which sockets, then you could use
the rank_file mapper to assign processes to groups of cores in a
socket.

It's just that we can't do it automatically because the OS won't
give us the required info.

See "mpirun -h" for more info on slot lists.

On Mar 17, 2011, at 11:26 AM, vaibhav dutt wrote:

> Hi,
>
> I am trying to perform an experiment in which I can spawn 2 MPI
processes, one on each socket in a 4 core node
> having 2 dual cores. I used the option  "bind to socket" which
mpirun for that but I am getting an error like:
>
> An attempt was made to bind a process to a specific hardware
topology
> mapping (e.g., binding to a socket) but the operating system
does not
> support such topology-aware actions.  Talk to your local system
> administrator to find out if your system can support topology-aware
> functionality (e.g., Linux Kernels newer than v2.6.18).
>
> Systems that do not support processor topology-aware
functionality cannot
> use "bind to socket" and other related functionality.
>
>
> Can anybody please tell me what is this error about. Is there
any other option than "bind to socket"
> that I can use.
>
> Thanks.
> ___
> users mailing list
> us...@open-mpi.org 
> http://www.open-mpi.org/mailman/listinfo.cgi/users


___
users mailing list
us...@open-mpi.org 
http://www.open-mpi.org/mailman/listinfo.cgi/users



___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users



--
Oracle
Terry D. Dontje | Principal Software Engineer
Developer Tools Engineering | +1.781.442.2631
Oracle *- Performance Technologies*
95 Network Drive, Burlington, MA 01803
Email terry.don...@oracle.com 





Re: [OMPI users] Totalview not showing main program on startup with OpenMPI 1.3.x and 1.4.x

2011-02-11 Thread Terry Dontje
Sorry I have to ask this: did you build your latest OMPI version, not 
just the application, with the -g flag too?

IIRC, when I ran into this issue I was actually able to do stepi's and 
eventually pop up the stack; however, that is really no way to debug a 
program :-).


Unless OMPI is somehow trashing the stack I don't see what OMPI could be 
doing to cause this type of issue.  Again, when I ran into this issue 
known working programs still worked; I just was unable to get a full 
stack.  So it was definitely an interfacing issue between TotalView and 
the executable (or the result of how the executable and libraries were 
compiled).  Another thing I noticed was that when using Solaris Studio dbx I 
was also able to see the full stack where I could not when using 
TotalView.  I am not sure if gdb could also see the full stack or not, but 
it might be worth a try to attach gdb to a running program and see if 
you get a full stack.


--td


On 02/09/2011 05:35 PM, Dennis McRitchie wrote:


Thanks Terry.

Unfortunately, -fno-omit-frame-pointer is the default for the Intel 
compiler when -g is used, which I am using since it is necessary for 
source-level debugging. So the compiler kindly tells me that it is 
ignoring your suggested option when I specify it. :-)


Also, since I can reproduce this problem by simply changing the 
OpenMPI version, without changing the compiler version, it strikes me 
as being more likely to be an OpenMPI-related issue: 1.2.8 works, but 
anything later does not (as described below).


I have tried different versions of TotalView from 8.1 to 8.9, but all 
behave the same.


I was wondering if a change to the openmpi-totalview.tcl script might 
be needed?


Dennis

*From:*users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] 
*On Behalf Of *Terry Dontje

*Sent:* Wednesday, February 09, 2011 5:02 PM
*To:* us...@open-mpi.org
*Subject:* Re: [OMPI users] Totalview not showing main program on 
startup with OpenMPI 1.3.x and 1.4.x


This sounds like something I ran into some time ago that involved the 
compiler omitting frame pointers.  You may want to try to compile your 
code with -fno-omit-frame-pointer.  I am unsure if you may need to do 
the same while building MPI though.


--td

On 02/09/2011 02:49 PM, Dennis McRitchie wrote:

Hi,
  
I'm encountering a strange problem and can't find it having been discussed on this mailing list.
  
When building and running my parallel program using any recent Intel compiler and OpenMPI 1.2.8, TotalView behaves entirely correctly, displaying the "Process mpirun is a parallel job. Do you want to stop the job now?" dialog box, and stopping at the start of the program. The code displayed is the source code of my program's function main, and the stack trace window shows that we are stopped in the poll function many levels "up" from my main function's call to MPI_Init. I can then set breakpoints, single step, etc., and the code runs appropriately.
  
But when building and running using Intel compilers with OpenMPI 1.3.x or 1.4.x, TotalView displays the usual dialog box, and stops at the start of the program; but my main program's source code is *not* displayed. The stack trace window again shows that we are stopped in the poll function several levels "up" from my main function's call to MPI_Init; but this time, the code displayed is the assembler code for the poll function itself.
  
If I click on 'main' in the stack trace window, the source code for my program's function main is then displayed, and I can now set breakpoints, single step, etc. as usual.
  
So why is the program's source code not displayed when using 1.3.x and 1.4.x, but is displayed when using 1.2.8. This change in behavior is fairly confusing to our users, and it would be nice to have it work as it used to, if possible.
  
Thanks,

Dennis
  
Dennis McRitchie

Computational Science and Engineering Support (CSES)
Academic Services Department
Office of Information Technology
Princeton University
  
  
___

users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users

--
Oracle
Terry D. Dontje | Principal Software Engineer
Developer Tools Engineering | +1.781.442.2631
Oracle *- Performance Technologies*
95 Network Drive, Burlington, MA 01803
Email terry.don...@oracle.com


___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users



--
Oracle
Terry D. Dontje | Principal Software Engineer
Developer Tools Engineering | +1.781.442.2631
Oracle *- Performance Technologies*
95 Network Drive, Burlington, MA 01803
Email terry.don...@oracle.com





Re: [OMPI users] Totalview not showing main program on startup with OpenMPI 1.3.x and 1.4.x

2011-02-09 Thread Terry Dontje
This sounds like something I ran into some time ago that involved the 
compiler omitting frame pointers.  You may want to try to compile your 
code with -fno-omit-frame-pointer.  I am unsure if you may need to do 
the same while building MPI though.
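For example, something along these lines (my_app.c is just a placeholder here, and the flag assumes a gcc- or icc-style option set):

   mpicc -g -fno-omit-frame-pointer -o my_app my_app.c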


--td

On 02/09/2011 02:49 PM, Dennis McRitchie wrote:

Hi,

I'm encountering a strange problem and can't find it having been discussed on 
this mailing list.

When building and running my parallel program using any recent Intel compiler and OpenMPI 1.2.8, 
TotalView behaves entirely correctly, displaying the "Process mpirun is a parallel job. Do you 
want to stop the job now?" dialog box, and stopping at the start of the program. The code 
displayed is the source code of my program's function main, and the stack trace window shows that 
we are stopped in the poll function many levels "up" from my main function's call to 
MPI_Init. I can then set breakpoints, single step, etc., and the code runs appropriately.

But when building and running using Intel compilers with OpenMPI 1.3.x or 1.4.x, 
TotalView displays the usual dialog box, and stops at the start of the program; but my 
main program's source code is *not* displayed. The stack trace window again shows that we 
are stopped in the poll function several levels "up" from my main function's 
call to MPI_Init; but this time, the code displayed is the assembler code for the poll 
function itself.

If I click on 'main' in the stack trace window, the source code for my 
program's function main is then displayed, and I can now set breakpoints, 
single step, etc. as usual.

So why is the program's source code not displayed when using 1.3.x and 1.4.x, 
but is displayed when using 1.2.8? This change in behavior is fairly confusing 
to our users, and it would be nice to have it work as it used to, if possible.

Thanks,
Dennis

Dennis McRitchie
Computational Science and Engineering Support (CSES)
Academic Services Department
Office of Information Technology
Princeton University


___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users



--
Oracle
Terry D. Dontje | Principal Software Engineer
Developer Tools Engineering | +1.781.442.2631
Oracle *- Performance Technologies*
95 Network Drive, Burlington, MA 01803
Email terry.don...@oracle.com 





Re: [OMPI users] How closely tied is a specific release of OpenMPI to the host operating system and other system software?

2011-02-02 Thread Terry Dontje

On 02/01/2011 07:34 PM, Jeff Squyres wrote:

On Feb 1, 2011, at 5:02 PM, Jeffrey A Cummings wrote:


I'm getting a lot of push back from the SysAdmin folks claiming that OpenMPI is 
closely intertwined with the specific version of the operating system and/or 
other system software (i.e., Rocks on the clusters).

I wouldn't say that this is true.  We test across a wide variety of OS's and 
compilers.  I'm sure that there are particular platforms/environments that can 
trip up some kind of problem (it's happened before), but in general, Open MPI 
is pretty portable.


To state my question another way:  Apparently each release of Linux and/or 
Rocks comes with some version of OpenMPI bundled in.  Is it dangerous in some 
way to upgrade to a newer version of OpenMPI?

Not at all.  Others have said it, but I'm one of the developers and I'll 
reinforce their answers: I regularly have about a dozen different installations 
of Open MPI on my cluster at any given time (all in different stages of 
development -- all installed to different prefixes).  I switch between them 
quite easily by changing my PATH and LD_LIBRARY_PATH (both locally and on 
remote nodes).
Not to be a lone dissenting opinion, but here is my experience in doing the 
above.


First, if you are always recompiling your application against a specific 
version of OMPI, then I would agree with everything Jeff said above.  
That is, you can build many versions of OMPI on many Linux versions and 
have them run.


But there are definite pitfalls once you start trying to keep one set of 
executables and OMPI binaries across different Linux versions.


1.  You may see executables unable to use OMPI libraries that 
differ in the first dot-number release (e.g. the 1.3 vs. 1.4 or 1.5 branches).  
We, the community, try to avoid these incompatibilities as much as 
possible, but they do happen on occasion (I think 1.3 to 1.4 is one such 
occasion).


2.  The system libraries on different linux versions are not always the 
same.  At Oracle we build a binary distribution of OMPI that we test out 
on several different versions of Linux.  The key here is building on a 
machine that is essentially the lowest common denominator of all the 
system software that exists on the machines one will be running on.  
This is essentially why Oracle states a bounded set of OS versions a 
distribution runs on.  An example: there was a component in 
OMPI that relied on a version of libbfd that changed significantly 
between Linux versions.  Once we got rid of the usage of that library we 
were ok.  There are not "a lot" of these instances but the number is not 
zero.


--
Oracle
Terry D. Dontje | Principal Software Engineer
Developer Tools Engineering | +1.781.442.2631
Oracle *- Performance Technologies*
95 Network Drive, Burlington, MA 01803
Email terry.don...@oracle.com 





Re: [OMPI users] Newbie question

2011-01-11 Thread Terry Dontje
So are you trying to start an MPI job in which one process is one executable 
and the other process(es) are something else?  If so, you probably want 
to use a multiple-app context.  If you look at FAQ question 7, "How do I 
run an MPMD MPI Job?", at http://www.open-mpi.org/faq/?category=running, 
this should answer your question below, I believe.
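As a rough illustration (the executable names here are only placeholders), an MPMD launch looks something like:

   mpirun -np 1 ./master_prog : -np 4 ./worker_prog

Each application context (separated by the colon) takes its own -np and related options.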


--td

On 01/11/2011 01:06 AM, Tena Sakai wrote:

Hi,

Thanks for your reply.

I am afraid your terse response doesn’t shed much light.  What I need 
is the “hosts” parameter I can use with the mpi.spawn.Rslaves() function.  
Can you explain, or better yet give an example, as to how I can get this via mpirun?

Looking at mpirun man page, I found an example:
  mpirun –H aa,aa,bb  ./a.out
and similar ones.  But they all execute a program (like a.out above). 
 That’s not
what I want.  What I want is to spawn a bunch of R slaves to other 
machines on
the network.  I can spawn R slaves, as many as I like, to the local 
machine, but
I don’t know how to do this with machines on the network.  That’s what 
“hosts”
parameter of mpi.spawn.Rslaves() enables me to do, I think.  If I can 
do that, then

Rmpi has function(s) to send command to each of the spawned slaves.

My question is: how can I get Open MPI to give me those “hosts” parameters?

Can you please help me?

Thank you in advance.

Tena Sakai
tsa...@gallo.ucsf.edu


On 1/10/11 8:14 PM, "pooja varshneya"  wrote:

You can use mpirun.

On Mon, Jan 10, 2011 at 8:04 PM, Tena Sakai
 wrote:

Hi,

I am an mpi newbie.  My open MPI is v 1.4.3, which I compiled
on a linux machine.

I am using a language called R, which has an mpi
interface/package.
It appears that it is happy, on the surface, with the open MPI
I installed.

There is an R function called mpi.spawn.Rslaves().  An argument to
this function is nslaves.  I can issue, for example,
  mpi.spawn.Rslaves( nslaves=20 )
And it spawns 20 slave processes.  The trouble is that it is
all on the
same node as that of the master.  I want, instead, these 20
(or more)
slaves spawned on other machines on the network.

It so happens that mpi.spawn.Rslaves() has an extra argument called
hosts.  Here’s the definition of hosts from the API document: “NULL or
LAM node numbers to specify where R slaves to be spawned.”  I have
no idea what a LAM node is, but there is a function called lamhosts(),
which returns a rather verbose message:

  It seems that there is no lamd running on the host
compute-0-0.local.

  This indicates that the LAM/MPI runtime environment is not
operating.
  The LAM/MPI runtime environment is necessary for the
"lamnodes" command.

  Please run the "lamboot" command the start the LAM/MPI runtime
  environment.  See the LAM/MPI documentation for how to invoke
  "lamboot" across multiple machines.

Here’s my question.  Is there such a command as lamboot in Open
MPI 1.4.3?  Or am I using the wrong MPI software?  In a FAQ I read that
there are other MPI packages (FT-MPI, LA-MPI, LAM/MPI), but I had the notion
that Open MPI is meant to have the functionality of all of them.  Is this a
wrong impression?

Thank you for your help.

Tena Sakai
tsa...@gallo.ucsf.edu 

___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users




___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users



--
Oracle
Terry D. Dontje | Principal Software Engineer
Developer Tools Engineering | +1.781.442.2631
Oracle *- Performance Technologies*
95 Network Drive, Burlington, MA 01803
Email terry.don...@oracle.com 





Re: [OMPI users] Method for worker to determine its "rank" on a single machine?

2010-12-10 Thread Terry Dontje

On 12/10/2010 03:24 PM, David Mathog wrote:

Ashley Pittman wrote:


For a much simpler approach you could also use these two environment

variables, this is on my current system which is 1.5 based, YMMV of course.

OMPI_COMM_WORLD_LOCAL_RANK
OMPI_COMM_WORLD_LOCAL_SIZE
However, that doesn't really tell you which MPI_COMM_WORLD ranks are on 
the same node as you, I believe.
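For what it's worth, a minimal (untested) sketch of a process reading those Open MPI-specific variables:

#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank;
    char *lrank, *lsize;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    /* these variables are set by Open MPI's launcher, not by the MPI standard */
    lrank = getenv("OMPI_COMM_WORLD_LOCAL_RANK");
    lsize = getenv("OMPI_COMM_WORLD_LOCAL_SIZE");
    printf("world rank %d: local rank %s of %s on this node\n",
           rank, lrank ? lrank : "?", lsize ? lsize : "?");
    MPI_Finalize();
    return 0;
}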


--td

That is simpler.  It works on OMPI 1.4.3 too:

cat>/usr/common/bin/dumpev.sh





Re: [OMPI users] Guaranteed run rank 0 on a given machine?

2010-12-10 Thread Terry Dontje

On 12/10/2010 01:46 PM, David Mathog wrote:

The master is commonly very different from the workers, so I expected
there would be something like

   --rank0-on

but there doesn't seem to be a single switch on mpirun to do that.

If "mastermachine" is the first entry in the hostfile, or the first
machine in a -hosts list, will rank 0 always run there?  If so, will it
always run in the first slot on the first machine listed?  That seems to
be the case in practice, but is it guaranteed?  Even if -loadbalance is
used?


For Open MPI the above is correct; I am hesitant to say "guaranteed", though.

Otherwise, there is the rankfile method.  In the situation where the
master must run on a specific node, but there is no preference for the
workers, would a rank file like this be sufficient?

rank 0=mastermachine slot=0
I thought you might have had to give all the ranks, but empirically it looks 
like you can specify just a subset.

The mpirun man page gives an example where all nodes/slots are
specified, but it doesn't say explicitly what happens if the
configuration is only partially specified, or how it interacts with the
-np parameter.  Modifying the man page example:

cat myrankfile
rank 0=aa slot=1:0-2
rank 1=bb slot=0:0,1
rank 2=cc slot=1-2
mpirun -H aa,bb,cc,dd -np 4 -rf myrankfile ./a.out

Rank 0 runs on node aa, bound to socket 1, cores 0-2.
Rank 1 runs on node bb, bound to socket 0, cores 0 and 1.
Rank 2 runs on node cc, bound to cores 1 and 2.

Rank 3 runs where?  not at all, or on dd, aa:slot=0, or ...?
From my empirical runs it looks to me like rank 3 would end up on aa 
possibly slot=0.
In other words once you run out of entries in the rankfile it looks like 
the processes then start from the beginning of the hostlist again.
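Based on that empirical behavior, something like the following should pin rank 0 to the master and let the remaining ranks fall onto the host list (host and file names are placeholders):

cat masteronly.rf
rank 0=mastermachine slot=0

mpirun -H mastermachine,workerA,workerB,workerC -np 4 -rf masteronly.rf ./a.out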


--td

Also, in my limited testing --host and -hostfile seem to be mutually
exclusive.  That is reasonable, but it isn't clear that it is intended.
  Example, with a hostfile containing one entry for "monkey02.cluster
slots=1":

mpirun  --host monkey01   --mca plm_rsh_agent rsh  hostname
monkey01.cluster
mpirun  --host monkey02   --mca plm_rsh_agent rsh  hostname
monkey02.cluster
mpirun  -hostfile /usr/common/etc/openmpi.machines.test1 \
--mca plm_rsh_agent rsh  hostname
monkey02.cluster
mpirun  --host monkey01  \
   -hostfile /usr/commom/etc/openmpi.machines.test1 \
   --mca plm_rsh_agent rsh  hostname
--
There are no allocated resources for the application
   hostname
that match the requested mapping:


Verify that you have mapped the allocated resources properly using the
--host or --hostfile specification.
--




Thanks,

David Mathog
mat...@caltech.edu
Manager, Sequence Analysis Facility, Biology Division, Caltech
___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users



--
Oracle
Terry D. Dontje | Principal Software Engineer
Developer Tools Engineering | +1.781.442.2631
Oracle *- Performance Technologies*
95 Network Drive, Burlington, MA 01803
Email terry.don...@oracle.com 





Re: [OMPI users] Method for worker to determine its "rank" on a single machine?

2010-12-10 Thread Terry Dontje

On 12/10/2010 09:19 AM, Richard Treumann wrote:


It seems to me the MPI_Get_processor_name description is too ambiguous 
to make this 100% portable.  I assume most MPI implementations simply 
use the hostname so all processes on the same host will return the 
same string.  The suggestion would work then.


However, it would also be reasonable for an MPI  that did processor 
binding to return " hostname.socket#.core#" so every rank would have a 
unique processor name.
Fair enough.  However, I think it is a lot more stable than grabbing 
information from the bowels of the runtime environment.  Of course one 
could just call the appropriate system call to get the hostname, if you 
are on the right type of OS/Architecture :-).


The extension idea is a bit at odds with the idea that MPI is an 
architecture independent API.  That does not rule out the option if 
there is a good use case but it does raise the bar just a bit.


Yeah, that is kind of the rub, isn't it?  There are enough architectural 
differences out there that it might be difficult to come to an agreement 
on the elements of locality you should focus on.  It would be nice if 
there were some sort of distance value assigned to each 
peer a process has.  Of course, then you still have the problem of trying to 
figure out what distance you really want to base your grouping on.


--td

Dick Treumann  -  MPI Team
IBM Systems & Technology Group
Dept X2ZA / MS P963 -- 2455 South Road -- Poughkeepsie, NY 12601
Tele (845) 433-7846 Fax (845) 433-8363



From:   Ralph Castain <r...@open-mpi.org>
To: Open MPI Users <us...@open-mpi.org>
Date:   12/10/2010 08:00 AM
Subject: 	Re: [OMPI users] Method for worker to determine its "rank" 
on a single machine?

Sent by:users-boun...@open-mpi.org






Ick - I agree that's portable, but truly ugly.

Would it make sense to implement this as an MPI extension, and then 
perhaps propose something to the Forum for this purpose?


Just hate to see such a complex, time-consuming method when the info 
is already available on every process.


On Dec 10, 2010, at 3:36 AM, Terry Dontje wrote:

A more portable way of doing what you want below is to gather each 
process's processor name (given by MPI_Get_processor_name), have the 
root that gets this data assign a unique number to each distinct name, then 
scatter that info back to the processes and have them use it as the color 
in an MPI_Comm_split call.  Once you've done that you can do an 
MPI_Comm_size on the new communicator to find how many processes are on the node, 
and send to all the other processes on that node using the new communicator.
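A rough, untested sketch of that approach (no error checking, and kept deliberately simple):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, size, color, local_rank, local_size, i, j, len;
    char name[MPI_MAX_PROCESSOR_NAME];
    char *all_names = NULL;
    int *colors = NULL;
    MPI_Comm node_comm;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    memset(name, 0, sizeof(name));
    MPI_Get_processor_name(name, &len);

    if (rank == 0) {
        all_names = malloc((size_t)size * MPI_MAX_PROCESSOR_NAME);
        colors = malloc((size_t)size * sizeof(int));
    }
    /* root collects every process's name */
    MPI_Gather(name, MPI_MAX_PROCESSOR_NAME, MPI_CHAR,
               all_names, MPI_MAX_PROCESSOR_NAME, MPI_CHAR,
               0, MPI_COMM_WORLD);
    if (rank == 0) {
        /* identical names get identical colors */
        for (i = 0; i < size; i++) {
            colors[i] = i;
            for (j = 0; j < i; j++) {
                if (strcmp(all_names + i * MPI_MAX_PROCESSOR_NAME,
                           all_names + j * MPI_MAX_PROCESSOR_NAME) == 0) {
                    colors[i] = colors[j];
                    break;
                }
            }
        }
    }
    MPI_Scatter(colors, 1, MPI_INT, &color, 1, MPI_INT, 0, MPI_COMM_WORLD);
    MPI_Comm_split(MPI_COMM_WORLD, color, rank, &node_comm);
    MPI_Comm_rank(node_comm, &local_rank);
    MPI_Comm_size(node_comm, &local_size);
    printf("world rank %d on %s: local rank %d of %d\n",
           rank, name, local_rank, local_size);
    MPI_Comm_free(&node_comm);
    if (rank == 0) { free(all_names); free(colors); }
    MPI_Finalize();
    return 0;
}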


Good luck,

--td
On 12/09/2010 08:18 PM, Ralph Castain wrote:
The answer is yes - sort of...

In OpenMPI, every process has information about not only its own local 
rank, but the local rank of all its peers regardless of what node they 
are on. We use that info internally for a variety of things.


Now the "sort of". That info isn't exposed via an MPI API at this 
time. If that doesn't matter, then I can tell you how to get it - it's 
pretty trivial to do.



On Dec 9, 2010, at 6:14 PM, David Mathog wrote:


Is it possible through MPI for a worker to determine:

1. how many MPI processes are running on the local machine
2. within that set its own "local rank"

?

For instance, a quad core with 4 processes might be hosting ranks 10,
14, 15, 20, in which case the "local ranks" would be 1,2,3,4.  The idea
being to use this information so that a program could selectively access
different local resources.  Simple example: on this 4 worker machine
reside telephone directories for Los Angeles, San Diego, San Jose, and
Sacramento.  Each worker is to open one database and search it when the
master sends a request.  With the "local rank" number this would be as
easy as naming the databases file1, file2, file3, and file4.  Without it
the 4 processes would have to communicate with each other somehow to
sort out which is to use which database.  And that could get ugly fast,
especially if they don't all start at the same time.

Thanks,

David Mathog
_mathog@caltech.edu_ <mailto:mat...@caltech.edu>
Manager, Sequence Analysis Facility, Biology Division, Caltech
___
users mailing list
_users@open-mpi.org_ <mailto:us...@open-mpi.org>
_http://www.open-mpi.org/mailman/listinfo.cgi/users_


___
users mailing list
_users@open-mpi.org_ <mailto:us...@open-mpi.org>
_http://www.open-mpi.org/mailman/listinfo.cgi/users_



--

Terry D. Dontje | Principal Software Engineer
Developer Tools Engineering | +1.781.442.2631
Oracle *- Performance Technologies*
95 Network Drive, Burlington, MA 01803
Email _terry.dontje@oracle.com_ <mailto:terry.don...@oracle.com>




Re: [OMPI users] Segmentation fault in mca_pml_ob1.so

2010-12-07 Thread Terry Dontje
I am not sure this has anything to do with your problem but if you look 
at the stack entry for PMPI_Recv I noticed the buf has a value of 0.  
Shouldn't that be an address?


Does your code fail if the MPI library is built with -g?  If it does 
fail the same way, the next step I would take would be to walk up the 
stack and try to figure out where the sendreq address is coming from, 
because supposedly that is the address that is not mapped, according to 
the original stack.
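For example, once attached to the failing process with gdb against a -g build (frame numbers taken from the backtrace below; exact output will differ):

   (gdb) frame 0
   (gdb) print sendreq
   (gdb) print *sendreq
   (gdb) frame 1
   (gdb) info locals

That should at least show whether sendreq itself looks like garbage or whether it points at memory that is simply not mapped.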


--td

On 12/07/2010 08:29 AM, Grzegorz Maj wrote:

Some update on this issue. I've attached gdb to the crashing
application and I got:

-
Program received signal SIGSEGV, Segmentation fault.
mca_pml_ob1_send_request_put (sendreq=0x130c480, btl=0xc49850,
hdr=0xd10e60) at pml_ob1_sendreq.c:1231
1231pml_ob1_sendreq.c: No such file or directory.
in pml_ob1_sendreq.c
(gdb) bt
#0  mca_pml_ob1_send_request_put (sendreq=0x130c480, btl=0xc49850,
hdr=0xd10e60) at pml_ob1_sendreq.c:1231
#1  0x7fc55bf31693 in mca_btl_tcp_endpoint_recv_handler (sd=, flags=, user=) at btl_tcp_endpoint.c:718
#2  0x7fc55fff7de4 in event_process_active (base=0xc1daf0,
flags=2) at event.c:651
#3  opal_event_base_loop (base=0xc1daf0, flags=2) at event.c:823
#4  0x7fc55ffe9ff1 in opal_progress () at runtime/opal_progress.c:189
#5  0x7fc55c9d7115 in opal_condition_wait (addr=, count=, datatype=,
src=, tag=,
 comm=, status=0xcc6100) at
../../../../opal/threads/condition.h:99
#6  ompi_request_wait_completion (addr=,
count=, datatype=,
src=, tag=,
 comm=, status=0xcc6100) at
../../../../ompi/request/request.h:375
#7  mca_pml_ob1_recv (addr=, count=, datatype=, src=, tag=, comm=,
 status=0xcc6100) at pml_ob1_irecv.c:104
#8  0x7fc560511260 in PMPI_Recv (buf=0x0, count=12884048,
type=0xd10410, source=-1, tag=0, comm=0xd0daa0, status=0xcc6100) at
precv.c:75
#9  0x0049cc43 in BI_Srecv ()
#10 0x0049c555 in BI_IdringBR ()
#11 0x00495ba1 in ilp64_Cdgebr2d ()
#12 0x0047ffa0 in Cdgebr2d ()
#13 0x7fc5621da8e1 in PB_CInV2 () from
/home/gmaj/lib/intel_mkl/current/lib/em64t/libmkl_scalapack_ilp64.so
#14 0x7fc56220289c in PB_CpgemmAB () from
/home/gmaj/lib/intel_mkl/current/lib/em64t/libmkl_scalapack_ilp64.so
#15 0x7fc5622b28fd in pdgemm_ () from
/home/gmaj/lib/intel_mkl/current/lib/em64t/libmkl_scalapack_ilp64.so
-

So this looks like the line responsible for segmentation fault is:
mca_bml_base_endpoint_t *bml_endpoint = sendreq->req_endpoint;

I repeated it several times: always crashes in the same line.

I have no idea what to do with this. Again, any help would be appreciated.

Thanks,
Grzegorz Maj



2010/12/6 Grzegorz Maj:

Hi,
I'm using mkl scalapack in my project. Recently, I was trying to run
my application on new set of nodes. Unfortunately, when I try to
execute more than about 20 processes, I get segmentation fault.

[compn7:03552] *** Process received signal ***
[compn7:03552] Signal: Segmentation fault (11)
[compn7:03552] Signal code: Address not mapped (1)
[compn7:03552] Failing at address: 0x20b2e68
[compn7:03552] [ 0] /lib64/libpthread.so.0(+0xf3c0) [0x7f46e0fc33c0]
[compn7:03552] [ 1]
/home/gmaj/lib/openmpi/lib/openmpi/mca_pml_ob1.so(+0xd577)
[0x7f46dd093577]
[compn7:03552] [ 2]
/home/gmaj/lib/openmpi/lib/openmpi/mca_btl_tcp.so(+0x5b4c)
[0x7f46dc5edb4c]
[compn7:03552] [ 3]
/home/gmaj/lib/openmpi/lib/libopen-pal.so.0(+0x1dbe8) [0x7f46e0679be8]
[compn7:03552] [ 4]
(home/gmaj/lib/openmpi/lib/libopen-pal.so.0(opal_progress+0xa1)
[0x7f46e066dbf1]
[compn7:03552] [ 5]
/home/gmaj/lib/openmpi/lib/openmpi/mca_pml_ob1.so(+0x5945)
[0x7f46dd08b945]
[compn7:03552] [ 6]
/home/gmaj/lib/openmpi/lib/libmpi.so.0(MPI_Send+0x6a) [0x7f46e0b4f10a]
[compn7:03552] [ 7] /home/gmaj/matrix/matrix(BI_Ssend+0x21) [0x49cc11]
[compn7:03552] [ 8] /home/gmaj/matrix/matrix(BI_IdringBR+0x79) [0x49c579]
[compn7:03552] [ 9] /home/gmaj/matrix/matrix(ilp64_Cdgebr2d+0x221) [0x495bb1]
[compn7:03552] [10] /home/gmaj/matrix/matrix(Cdgebr2d+0xd0) [0x47ffb0]
[compn7:03552] [11]
/home/gmaj/lib/intel_mkl/current/lib/em64t/libmkl_scalapack_ilp64.so(PB_CInV2+0x1304)
[0x7f46e27f5124]
[compn7:03552] *** End of error message ***

This error appears during some scalapack computation. My processes do
some mpi communication before this error appears.

I found out, that by modifying btl_tcp_eager_limit and
btl_tcp_max_send_size parameters, I can run more processes - the
smaller those values are, the more processes I can run. Unfortunately,
by this method I've succeeded to run up to 30 processes, which is
still far to small.

Some clue may be what valgrind says:

==3894== Syscall param writev(vector[...]) points to uninitialised byte(s)
==3894==at 0x82D009B: writev (in /lib64/libc-2.12.90.so)
==3894==by 0xBA2136D: mca_btl_tcp_frag_send (in
/home/gmaj/lib/openmpi/lib/openmpi/mca_btl_tcp.so)
==3894==by 0xBA203D0: mca_btl_tcp_endpoint_send (in
/home/gmaj/lib/openmpi/lib/openmpi/mca_btl_tcp.so)
==3894==by 

Re: [OMPI users] [Rocks-Discuss] compiling Openmpi on solaris studio express

2010-11-30 Thread Terry Dontje

Ticket 2632 really spells out what the issue is.

On 11/30/2010 10:23 AM, Prentice Bisbal wrote:

Nehemiah Dacres wrote:

that looks about right. So the suggestion:

./configure LDFLAGS="-notpath ... ... ..."

-notpath should be replaced by whatever the proper flag should be, in my case 
-L  ?

Yes, that's exactly what I meant. I should have chosen something better
than "-notpath" to say "put a value there that was not '-path'".
I don't think the above will fix the problem because it has to do with 
how one passes the --rpath option to the linker.  Prior to Studio 
12.2 the --rpath option was passed through to the linker blindly (with a 
warning).  In Studio 12.2 the compiler recognizes -r as a compiler 
option, and now "-path" is blindly passed to the linker, which has no idea 
what that means.  So one really needs to preface "--rpath" with either 
"-Wl," or "-Qoption ld ".  I don't believe changing the LDFLAGS will 
actually fix the problem.


--td

Not sure if my suggestion will help, given the bug report below. If
you're really determined, you can always try editing all the makefiles
after configure. Something like this might work:

find . -name Makefile -exec sed -i.bak s/-path/-L/g \{\} \;

Use that at your own risk. You might change instances of the string
'-path' that are actually correct.

Prentice



On Mon, Nov 29, 2010 at 3:16 PM, Rolf vandeVaart
>  wrote:

 This problem looks a lot like a thread from earlier today.  Can you
 look at this
 ticket and see if it helps?  It has a workaround documented in it.

 https://svn.open-mpi.org/trac/ompi/ticket/2632

 Rolf


 On 11/29/10 16:13, Prentice Bisbal wrote:

 No, it looks like ld is being called with the option -path, and your
  linker doesn't use that switch. Grep your Makefile(s) for the string
 "-path". It's probably in a statement defining LDFLAGS somewhere.

 When you find it, replace it with the equivalent switch for your
 compiler. You may be able to override it's value on the configure
 command-line, which is usually easiest/best:

 ./configure LDFLAGS="-notpath ... ... ..."

 --
 Prentice


 Nehemiah Dacres wrote:


 it may have been that  I didn't set ld_library_path

 On Mon, Nov 29, 2010 at 2:36 PM, Nehemiah 
Dacres
 >  wrote:

 thank you, you have been doubly helpful, but I am having linking
 errors and I do not know what the solaris studio compiler's
 preferred linker is. The

 the configure statement was

 ./configure --prefix=/state/partition1/apps/sunmpi/
 --enable-mpi-threads --with-sge --enable-static
 --enable-sparse-groups CC=/opt/oracle/solstudio12.2/bin/suncc
 CXX=/opt/oracle/solstudio12.2/bin/sunCC
 F77=/opt/oracle/solstudio12.2/bin/sunf77
 FC=/opt/oracle/solstudio12.2/bin/sunf90

compile statement was

 make all install 2>errors


 error below is

 f90: Warning: Option -path passed to ld, if ld is invoked, ignored
 otherwise
 f90: Warning: Option -path passed to ld, if ld is invoked, ignored
 otherwise
 f90: Warning: Option -path passed to ld, if ld is invoked, ignored
 otherwise
 f90: Warning: Option -path passed to ld, if ld is invoked, ignored
 otherwise
 f90: Warning: Option -soname passed to ld, if ld is invoked, ignored
 otherwise
 /usr/bin/ld: unrecognized option '-path'
 /usr/bin/ld: use the --help option for usage information
 make[4]: *** [libmpi_f90.la  
] Error 2
 make[3]: *** [all-recursive] Error 1
 make[2]: *** [all] Error 2
 make[1]: *** [all-recursive] Error 1
 make: *** [all-recursive] Error 1

 am I doing this wrong? are any of those configure flags unnecessary
 or inappropriate



 On Mon, Nov 29, 2010 at 2:06 PM, Gus 
Correa
 >  wrote:

 Nehemiah Dacres wrote:

 I want to compile openmpi to work with the solaris studio
 express  or
 solaris studio. This is a different version than is installed 
on
 rockscluster 5.2  and would like to know if there any
 gotchas or configure
 flags I should use to get it working or portable to nodes on
 the cluster.
 Software-wise,  it is a fairly homogeneous environment with
 only slight
 variations on the hardware side which could be isolated
 (machinefile flag
 and what-not)
 Please advise


 Hi Nehemiah
 I just answered 

Re: [OMPI users] [Rocks-Discuss] compiling Openmpi on solaris studio express

2010-11-30 Thread Terry Dontje
A slight note for the below: there should be a space between "ld" and the 
ending single quote mark, so it should be '-Qoption ld ' not '-Qoption ld'.


--td
On 11/30/2010 06:31 AM, Terry Dontje wrote:
Actually there is a way to modify the configure file that does not 
require autogen.sh to be re-run.
If you go into configure and search for "Sun F", a few lines down you will 
find one of three assignments:

lt_prog_compiler_wl
lt_prog_compiler_wl_F77
lt_prog_compiler_wl_FC

If you change them all to '-Qoption ld' and then run configure, 
things should work.


Good luck,

--td

On 11/30/2010 06:19 AM, Terry Dontje wrote:

On 11/29/2010 05:41 PM, Nehemiah Dacres wrote:

thanks.
FYI: its openmpi-1.4.2 from a tarball like you assume
I changed this line
 *Sun\ F* | *Sun*Fortran*)
  # Sun Fortran 8.3 passes all unrecognized flags to the linker
  _LT_TAGVAR(lt_prog_compiler_pic, $1)='-KPIC'
  _LT_TAGVAR(lt_prog_compiler_static, $1)='-Bstatic'
  _LT_TAGVAR(lt_prog_compiler_wl, $1)='-Qoption ld '

 unfortunately my autoconf tool is out of date (2.59 , it says it 
wants 2.60+ )


The build page (http://www.open-mpi.org/svn/building.php) shows the 
versions of the tools you need to build OMPI.  Sorry, but unfortunately, 
in order for this workaround to work you need to re-run autogen.sh; no way 
around that.


On Mon, Nov 29, 2010 at 4:11 PM, Rolf vandeVaart 
<rolf.vandeva...@oracle.com <mailto:rolf.vandeva...@oracle.com>> wrote:


No, I do not believe so.  First, I assume you are trying to
build either 1.4 or 1.5, not the trunk.
Secondly, I assume you are building from a tarfile that you have
downloaded.  Assuming these
two things are true, then (as stated in the bug report), prior
to running configure, you want to
make the following edits to config/libtool.m4 in all the places
you see it. ( I think just one place)

FROM:

 *Sun\ F*)
   # Sun Fortran 8.3 passes all unrecognized flags to the linker
   _LT_TAGVAR(lt_prog_compiler_pic, $1)='-KPIC'
   _LT_TAGVAR(lt_prog_compiler_static, $1)='-Bstatic'
   _LT_TAGVAR(lt_prog_compiler_wl, $1)=''
   ;;

TO:

 *Sun\ F*)
   # Sun Fortran 8.3 passes all unrecognized flags to the linker
   _LT_TAGVAR(lt_prog_compiler_pic, $1)='-KPIC'
   _LT_TAGVAR(lt_prog_compiler_static, $1)='-Bstatic'
   _LT_TAGVAR(lt_prog_compiler_wl, $1)='-Qoption ld '
   ;;



Note the difference in the lt_prog_compiler_wl line.

I ran ./configure anyway, but I don't think it did anything
It didn't, the change to libtool.m4 only affects the build system 
when you do an autogen.sh.


--td



Then, you need to run ./autogen.sh.  Then, redo your configure
but you do not need to do anything
with LDFLAGS.  Just use your original flags.  I think this
should work, but I am only reading
what is in the ticket.

Rolf



On 11/29/10 16:26, Nehemiah Dacres wrote:

that looks about right. So the suggestion:

./configure LDFLAGS="-notpath ... ... ..."

-notpath should be replaced by whatever the proper flag should be, in my case 
-L  ?

   


On Mon, Nov 29, 2010 at 3:16 PM, Rolf vandeVaart
<rolf.vandeva...@oracle.com
<mailto:rolf.vandeva...@oracle.com>> wrote:

This problem looks a lot like a thread from earlier today. 
Can you look at this

ticket and see if it helps?  It has a workaround documented
in it.

https://svn.open-mpi.org/trac/ompi/ticket/2632

Rolf


On 11/29/10 16:13, Prentice Bisbal wrote:

No, it looks like ld is being called with the option -path, and your
linker doesn't use that switch. Grep your Makefile(s) for the string
"-path". It's probably in a statement defining LDFLAGS somewhere.

When you find it, replace it with the equivalent switch for your
compiler. You may be able to override it's value on the configure
command-line, which is usually easiest/best:

./configure LDFLAGS="-notpath ... ... ..."

--
Prentice


Nehemiah Dacres wrote:
   

it may have been that  I didn't set ld_library_path

On Mon, Nov 29, 2010 at 2:36 PM, Nehemiah Dacres<dacre...@slu.edu  
<mailto:dacre...@slu.edu>
<mailto:dacre...@slu.edu>>  wrote:

 thank you, you have been doubly helpful, but I am having linking
 errors and I do not know what the solaris studio compiler's
 preferred linker is. The

 the configure statement was

 ./configure --prefix=/state/partition1/apps/sunmpi/
 --enable-mpi-threads --with-sge --enable-static
 --enable-sparse-groups CC=/opt/oracle/solstudio12.2/bin/suncc
 CXX=/opt/oracle/solstudio12.2

Re: [OMPI users] [Rocks-Discuss] compiling Openmpi on solaris studio express

2010-11-30 Thread Terry Dontje
Actually there is a way to modify the configure file that does not 
require autogen.sh to be re-run.
If you go into configure and search for "Sun F", a few lines down you will find 
one of three assignments:

lt_prog_compiler_wl
lt_prog_compiler_wl_F77
lt_prog_compiler_wl_FC

If you change them all to '-Qoption ld' and then run configure, things 
should work.
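After the edit, each of the three assignments (they live in separate language sections of configure) would read roughly like this; note that, as the follow-up note in this thread points out, the trailing space after "ld" matters:

   lt_prog_compiler_wl='-Qoption ld '
   lt_prog_compiler_wl_F77='-Qoption ld '
   lt_prog_compiler_wl_FC='-Qoption ld '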


Good luck,

--td

On 11/30/2010 06:19 AM, Terry Dontje wrote:

On 11/29/2010 05:41 PM, Nehemiah Dacres wrote:

thanks.
FYI: its openmpi-1.4.2 from a tarball like you assume
I changed this line
 *Sun\ F* | *Sun*Fortran*)
  # Sun Fortran 8.3 passes all unrecognized flags to the linker
  _LT_TAGVAR(lt_prog_compiler_pic, $1)='-KPIC'
  _LT_TAGVAR(lt_prog_compiler_static, $1)='-Bstatic'
  _LT_TAGVAR(lt_prog_compiler_wl, $1)='-Qoption ld '

 unfortunately my autoconf tool is out of date (2.59 , it says it 
wants 2.60+ )


The build page (http://www.open-mpi.org/svn/building.php) shows the 
versions of the tools you need to build OMPI.  Sorry, but unfortunately, in 
order for this workaround to work you need to re-run autogen.sh; no way 
around that.


On Mon, Nov 29, 2010 at 4:11 PM, Rolf vandeVaart 
<rolf.vandeva...@oracle.com <mailto:rolf.vandeva...@oracle.com>> wrote:


No, I do not believe so.  First, I assume you are trying to build
either 1.4 or 1.5, not the trunk.
Secondly, I assume you are building from a tarfile that you have
downloaded.  Assuming these
two things are true, then (as stated in the bug report), prior to
running configure, you want to
make the following edits to config/libtool.m4 in all the places
you see it. ( I think just one place)

FROM:

 *Sun\ F*)
   # Sun Fortran 8.3 passes all unrecognized flags to the linker
   _LT_TAGVAR(lt_prog_compiler_pic, $1)='-KPIC'
   _LT_TAGVAR(lt_prog_compiler_static, $1)='-Bstatic'
   _LT_TAGVAR(lt_prog_compiler_wl, $1)=''
   ;;

TO:

 *Sun\ F*)
   # Sun Fortran 8.3 passes all unrecognized flags to the linker
   _LT_TAGVAR(lt_prog_compiler_pic, $1)='-KPIC'
   _LT_TAGVAR(lt_prog_compiler_static, $1)='-Bstatic'
   _LT_TAGVAR(lt_prog_compiler_wl, $1)='-Qoption ld '
   ;;



Note the difference in the lt_prog_compiler_wl line.

I ran ./configure anyway, but I don't think it did anything
It didn't, the change to libtool.m4 only affects the build system when 
you do an autogen.sh.


--td



Then, you need to run ./autogen.sh.  Then, redo your configure
but you do not need to do anything
with LDFLAGS.  Just use your original flags.  I think this should
work, but I am only reading
what is in the ticket.

Rolf



On 11/29/10 16:26, Nehemiah Dacres wrote:

that looks about right. So the suggestion:

./configure LDFLAGS="-notpath ... ... ..."

-notpath should be replaced by whatever the proper flag should be, in my case 
-L  ?

   


On Mon, Nov 29, 2010 at 3:16 PM, Rolf vandeVaart
<rolf.vandeva...@oracle.com <mailto:rolf.vandeva...@oracle.com>>
wrote:

This problem looks a lot like a thread from earlier today. 
Can you look at this

ticket and see if it helps?  It has a workaround documented
in it.

https://svn.open-mpi.org/trac/ompi/ticket/2632

Rolf


On 11/29/10 16:13, Prentice Bisbal wrote:

No, it looks like ld is being called with the option -path, and your
linker doesn't use that switch. Grep your Makefile(s) for the string
"-path". It's probably in a statement defining LDFLAGS somewhere.

When you find it, replace it with the equivalent switch for your
compiler. You may be able to override it's value on the configure
command-line, which is usually easiest/best:

./configure LDFLAGS="-notpath ... ... ..."

--
Prentice


Nehemiah Dacres wrote:
   

it may have been that  I didn't set ld_library_path

On Mon, Nov 29, 2010 at 2:36 PM, Nehemiah Dacres<dacre...@slu.edu  
<mailto:dacre...@slu.edu>
<mailto:dacre...@slu.edu>>  wrote:

 thank you, you have been doubly helpful, but I am having linking
 errors and I do not know what the solaris studio compiler's
 preferred linker is. The

 the configure statement was

 ./configure --prefix=/state/partition1/apps/sunmpi/
 --enable-mpi-threads --with-sge --enable-static
 --enable-sparse-groups CC=/opt/oracle/solstudio12.2/bin/suncc
 CXX=/opt/oracle/solstudio12.2/bin/sunCC
 F77=/opt/oracle/solstudio12.2/bin/sunf77
 FC=/opt/oracle/solstudio12.2/bin/sunf90

compile statement was

 make all install 2>errors


   

Re: [OMPI users] cannot build Open MPI 1.5 on Linux x86_64 with Oracle/Sun C 5.11

2010-11-29 Thread Terry Dontje
This is ticket 2632, https://svn.open-mpi.org/trac/ompi/ticket/2632.  A 
fix was put into the trunk last week.  We should be able to CMR 
this fix to the 1.5 and 1.4 branches later this week.  The ticket 
actually has a workaround for the 1.5 branch.


--td
On 11/29/2010 09:46 AM, Siegmar Gross wrote:

Hi,

in the meantime we have installed gcc-4.5.1 and now I get a different error,
when I try to build OpenMPI-1.5 with Oracle Studio 12 Update 2 on Linux.

linpc4 openmpi-1.5-Linux.x86_64.32_cc_gcc-4.5.1 121 head -18 config.log
...
   $ ../openmpi-1.5/configure --prefix=/usr/local/openmpi-1.5_32_cc
LDFLAGS=-m32 CC=cc CXX=CC F77=f77 FC=f95 CFLAGS=-m32 CXXFLAGS=-m32
FFLAGS=-m32 FCFLAGS=-m32 CXXLDFLAGS=-m32 CPPFLAGS= C_INCL_PATH=
C_INCLUDE_PATH= CPLUS_INCLUDE_PATH= OBJC_INCLUDE_PATH= MPICHHOME=
--without-udapl --without-openib --enable-mpi-f90
--with-mpi-f90-size=small --enable-heterogeneous
--enable-cxx-exceptions --enable-shared
--enable-orterun-prefix-by-default --with-threads=posix
--enable-mpi-threads --disable-progress-threads

## - ##
## Platform. ##
## - ##

hostname = linpc4
uname -m = x86_64
uname -r = 2.6.31.14-0.4-desktop
uname -s = Linux
uname -v = #1 SMP PREEMPT 2010-10-25 08:45:30 +0200



linpc4 openmpi-1.5-Linux.x86_64.32_cc_gcc-4.5.1 122 tail -20
   log.make.Linux.x86_64.32_cc
../../../../openmpi-1.5/ompi/mpi/f90/scripts/mpi_wtick_f90.f90.sh
/export2/src/openmpi-1.5/openmpi-1.5-Linux.x86_64.32_cc_gcc-4.5.1/ompi/mpi/f90>
mpi_wtick_f90.f90
   FC mpi_wtick_f90.lo
../../../../openmpi-1.5/ompi/mpi/f90/scripts/mpi_wtime_f90.f90.sh
/export2/src/openmpi-1.5/openmpi-1.5-Linux.x86_64.32_cc_gcc-4.5.1/ompi/mpi/f90>
mpi_wtime_f90.f90
   FC mpi_wtime_f90.lo
   FCLD   libmpi_f90.la
f90: Warning: Option -path passed to ld, if ld is invoked, ignored otherwise
f90: Warning: Option -path passed to ld, if ld is invoked, ignored otherwise
f90: Warning: Option -path passed to ld, if ld is invoked, ignored otherwise
f90: Warning: Option -soname passed to ld, if ld is invoked, ignored otherwise
/usr/bin/ld: unrecognized option '-path'
/usr/bin/ld: use the --help option for usage information
make[4]: *** [libmpi_f90.la] Error 2
make[4]: Leaving directory
`/export2/src/openmpi-1.5/openmpi-1.5-Linux.x86_64.32_cc_gcc-4.5.1/ompi/mpi/f90'
make[3]: *** [all-recursive] Error 1
make[3]: Leaving directory
`/export2/src/openmpi-1.5/openmpi-1.5-Linux.x86_64.32_cc_gcc-4.5.1/ompi/mpi/f90'
make[2]: *** [all] Error 2
make[2]: Leaving directory
`/export2/src/openmpi-1.5/openmpi-1.5-Linux.x86_64.32_cc_gcc-4.5.1/ompi/mpi/f90'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory
`/export2/src/openmpi-1.5/openmpi-1.5-Linux.x86_64.32_cc_gcc-4.5.1/ompi'
make: *** [all-recursive] Error 1
linpc4 openmpi-1.5-Linux.x86_64.32_cc_gcc-4.5.1 123


In my opinion it is still a strange behaviour that building OpenMPI with
"cc" depends on the installed version of "gcc". Has anybody successfully
build OpenMPI-1.5 with Oracle Studio C on Linux? Which command line
options did you use? I get the same error if I try to build a 64-bit
version. I can build and install OpenMPI-1.5 in a 32- and 64-bit version
without Fortran support, if I replace
"--enable-mpi-f90 --with-mpi-f90-size=small" with
"--disable-mpi-f77 --disable-mpi-f90" in the above "configure"-command.

"make check" delivers "PASSED" for all tests in the 64-bit and one
"FAILED" in the 32-bit version.

...
make  check-TESTS
make[3]: Entering directory
`/export2/src/openmpi-1.5/openmpi-1.5-Linux.x86_64.32_cc_gcc-4.5.1_without_f90/te
st/util'
  Failure : Mismatch: input "/home/fd1026", expected:1 got:0

SUPPORT: OMPI Test failed: opal_path_nfs() (1 of 13 failed)
FAIL: opal_path_nfs

1 of 1 test failed
Please report to http://www.open-mpi.org/community/help/

make[3]: *** [check-TESTS] Error 1
make[3]: Leaving directory
`/export2/src/openmpi-1.5/openmpi-1.5-Linux.x86_64.32_cc_gc
c-4.5.1_without_f90/test/util'
make[2]: *** [check-am] Error 2
make[2]: Leaving directory
`/export2/src/openmpi-1.5/openmpi-1.5-Linux.x86_64.32_cc_gc
c-4.5.1_without_f90/test/util'
make[1]: *** [check-recursive] Error 1
make[1]: Leaving directory
`/export2/src/openmpi-1.5/openmpi-1.5-Linux.x86_64.32_cc_gc
c-4.5.1_without_f90/test'
make: *** [check-recursive] Error 1
linpc4 openmpi-1.5-Linux.x86_64.32_cc_gcc-4.5.1_without_f90 131



I can also successfully build and run my small C test programs which I
mentioned in my earlier email with this OpenMPI package. Any ideas how
I can build Fortran support? Thank you very much for any suggestions in
advance.


Kind regards

Siegmar



   Sorry, but can you give us the config line, the config.log and the
full output of make, preferably with make V=1?

--td
On 10/29/2010 04:30 AM, Siegmar Gross wrote:

Hi,

I tried to build Open MPI 1.5 on Solaris X86 and x86_64 with Oracle
Studio 12.2. I can compile Open 

Re: [OMPI users] Prioritization of --mca btl openib,tcp,self

2010-11-23 Thread Terry Dontje

On 11/22/2010 08:18 PM, Paul Monday (Parallel Scientific) wrote:

This is a follow-up to an earlier question: I'm trying to understand how --mca 
btl prioritizes its choice for connectivity.  Going back to my original 
network, there are actually two networks running around.  A point to point 
Infiniband network that looks like this (with two fabrics):

A(port 1)(opensm)-->B
A(port 2)(opensm)-->C

The original question queried whether there was a way to avoid the problem of B 
and C not being able to talk to each other if I were to run

mpirun  -host A,B,C --mca btl openib,self -d /mnt/shared/apps/myapp

"At least one pair of MPI processes are unable to reach each other for
MPI communications." ...

There is an additional network though, I have an ethernet management network 
that connects to all nodes.  If our program retrieves the ranks from the nodes 
using TCP and then can shift to openib, that would be interesting and, in fact, 
if I run

mpirun  -host A,B,C --mca btl openib,tcp,self -d /mnt/shared/apps/myapp

The program does, in fact, run cleanly.

But, the question I have now is does MPI "choose" to use tcp when it can find 
all nodes and then always use tcp, or will it fall back to openib if it can?
For MPI communications (as opposed to the ORTE communications) the 
library will try and pick out the most performant protocol to use for 
communications between two nodes.  So in your case A-B and A-C should 
use the openib btl and B-C should use the tcp btl.

So ... more succinctly:
Given a list of btls, such as openib,tcp,self, and a program can only broadcast 
on tcp but individual operations can occur over openib between nodes, will 
mpirun use the first interconnect that works for each operation or once it 
finds one that the broadcast phase works on will it use that one permanently?
If by broadcast you mean MPI_Bcast, this is actually done using point to 
point algorithms so the communications will happen over a mixture of IB 
and TCP.


If you mean something else by broadcast you'll need to clarify what you 
mean because there really isn't a direct use of protocol broadcasts in 
MPI or even ORTE to my knowledge.

And, as a follow-up, can I turn off the attempt to broadcast to touch all nodes?

See above.

Paul Monday
___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users



--
Oracle
Terry D. Dontje | Principal Software Engineer
Developer Tools Engineering | +1.781.442.2631
Oracle *- Performance Technologies*
95 Network Drive, Burlington, MA 01803
Email terry.don...@oracle.com 





Re: [OMPI users] Multiple Subnet MPI Fail

2010-11-22 Thread Terry Dontje
You're gonna have to use a protocol that can route through a machine; 
OFED User Verbs (i.e., openib) does not do this.  The only way I know of to 
do this via OMPI is with the tcp btl.
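For example (host names as in your description; ./myapp is a placeholder), restricting the job to TCP would look like:

   mpirun -host A,B,C --mca btl tcp,self ./myapp

assuming B and C can reach each other over some IP network (e.g. an Ethernet management network, or IPoIB routed through A).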


--td

On 11/22/2010 09:28 AM, Paul Monday (Parallel Scientific) wrote:
We've been using OpenMPI in a switched environment with success, but 
we've moved to a point to point environment to do some work.  Some of 
the nodes cannot talk directly to one another, sort of like this with 
computers A,B, C with A having two ports:


A(1)(opensm)-->B
A(2)(opensm)-->C

B is not connected to C in any way.

When we try to run our OpenMPI program we are receiving:
At least one pair of MPI processes are unable to reach each other for
MPI communications.  This means that no Open MPI device has indicated
that it can be used to communicate between these processes.  This is
an error; Open MPI requires that all MPI processes be able to reach
each other.  This error can sometimes be the result of forgetting to
specify the "self" BTL.

  Process 1 ([[1581,1],5]) is on host: pg-B
  Process 2 ([[1581,1],0]) is on host: pg-C
  BTLs attempted: openib self sm

Your MPI job is now going to abort; sorry.


I hope I'm not being overly naive, but is there a way to join the 
subnets at the MPI layer?  It seems like IP over IB would be too high 
up the stack.


Paul Monday
___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users



--
Oracle
Terry D. Dontje | Principal Software Engineer
Developer Tools Engineering | +1.781.442.2631
Oracle *- Performance Technologies*
95 Network Drive, Burlington, MA 01803
Email terry.don...@oracle.com 





Re: [OMPI users] Error when using OpenMPI with SGE multiple hosts

2010-11-18 Thread Terry Dontje
Yes, I believe this solves the mystery.  In short OGE and ORTE both 
work.  In the linear:1 case the job is exiting because there are not 
enough resources for the orte binding to work, which actually makes 
sense.  In the linear:2 case I think we've proven that we are binding to 
the right amount of resources and to the correct physical resources at 
the process level.


In the case where you do not pass -bind-to-core to mpirun with a qsub using 
linear:2, the processes on the same node will actually bind to the same 
two cores.  The only way to determine this is to run something that 
prints out the binding from the system.  There is no way to do this via 
OMPI because it only reports bindings when you are requesting mpirun to 
do some type of binding (like -bind-to-core or -bind-to-socket).
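For example, on Linux one crude way to see the actual binding from the system's point of view is to have each process dump its own affinity mask, e.g.:

   mpirun -np 4 sh -c 'grep Cpus_allowed_list /proc/self/status'

(a sketch only; tools like taskset or hwloc can report the same information).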


In the linear:1 case with no binding I think you are having the 
processes on the same node run on the same core.   Which is exactly what 
you are asking for I believe.


So I believe we understand what is going on with the binding, and it 
makes sense to me.  As for the allocation issue of slots vs. cores 
and trying not to overallocate cores, I believe the new allocation rule 
makes sense, but I'll let you hash that out with Daniel.


In summary, I don't believe there are any OMPI bugs related to what we've 
seen, and the OGE issue is just the allocation issue, right?


--td


On 11/18/2010 01:32 AM, Chris Jewell wrote:

Perhaps if someone could run this test again with --report-bindings 
--leave-session-attached and provide -all- output we could verify that analysis 
and clear up the confusion?


Yeah, however I bet you we still won't see output.

Actually, it seems we do get more output!  Results of 'qsub -pe mpi 8 -binding 
linear:2 myScript.com'

with

'mpirun -mca ras_gridengine_verbose 100 -report-bindings 
--leave-session-attached -bycore -bind-to-core ./unterm'

[exec1:06504] System has detected external process binding to cores 0028
[exec1:06504] ras:gridengine: JOB_ID: 59467
[exec1:06504] ras:gridengine: PE_HOSTFILE: 
/usr/sge/default/spool/exec1/active_jobs/59467.1/pe_hostfile
[exec1:06504] ras:gridengine: exec1.cluster.stats.local: PE_HOSTFILE shows 
slots=2
[exec1:06504] ras:gridengine: exec3.cluster.stats.local: PE_HOSTFILE shows 
slots=1
[exec1:06504] ras:gridengine: exec2.cluster.stats.local: PE_HOSTFILE shows 
slots=1
[exec1:06504] ras:gridengine: exec7.cluster.stats.local: PE_HOSTFILE shows 
slots=1
[exec1:06504] ras:gridengine: exec4.cluster.stats.local: PE_HOSTFILE shows 
slots=1
[exec1:06504] ras:gridengine: exec5.cluster.stats.local: PE_HOSTFILE shows 
slots=1
[exec1:06504] ras:gridengine: exec6.cluster.stats.local: PE_HOSTFILE shows 
slots=1
[exec1:06504] [[59608,0],0] odls:default:fork binding child [[59608,1],0] to 
cpus 0008
[exec1:06504] [[59608,0],0] odls:default:fork binding child [[59608,1],1] to 
cpus 0020
[exec3:20248] [[59608,0],1] odls:default:fork binding child [[59608,1],2] to 
cpus 0008
[exec4:26792] [[59608,0],4] odls:default:fork binding child [[59608,1],5] to 
cpus 0001
[exec2:32462] [[59608,0],2] odls:default:fork binding child [[59608,1],3] to 
cpus 0001
[exec7:09833] [[59608,0],3] odls:default:fork binding child [[59608,1],4] to 
cpus 0002
[exec5:10834] [[59608,0],5] odls:default:fork binding child [[59608,1],6] to 
cpus 0001
[exec6:04230] [[59608,0],6] odls:default:fork binding child [[59608,1],7] to 
cpus 0001

AHHA!  Now I get the following if I use 'qsub -pe mpi 8 -binding linear:1 
myScript.com' with the above mpirun command:

[exec1:06552] System has detected external process binding to cores 0020
[exec1:06552] ras:gridengine: JOB_ID: 59468
[exec1:06552] ras:gridengine: PE_HOSTFILE: 
/usr/sge/default/spool/exec1/active_jobs/59468.1/pe_hostfile
[exec1:06552] ras:gridengine: exec1.cluster.stats.local: PE_HOSTFILE shows 
slots=2
[exec1:06552] ras:gridengine: exec3.cluster.stats.local: PE_HOSTFILE shows 
slots=1
[exec1:06552] ras:gridengine: exec2.cluster.stats.local: PE_HOSTFILE shows 
slots=1
[exec1:06552] ras:gridengine: exec7.cluster.stats.local: PE_HOSTFILE shows 
slots=1
[exec1:06552] ras:gridengine: exec4.cluster.stats.local: PE_HOSTFILE shows 
slots=1
[exec1:06552] ras:gridengine: exec5.cluster.stats.local: PE_HOSTFILE shows 
slots=1
[exec1:06552] ras:gridengine: exec6.cluster.stats.local: PE_HOSTFILE shows 
slots=1
--
mpirun was unable to start the specified application as it encountered an error:

Error name: Unknown error: 1
Node: exec1

when attempting to start process rank 0.
--
[exec1:06552] [[59432,0],0] odls:default:fork binding child [[59432,1],0] to 
cpus 0020
--
Not enough processors were found on the local host to meet the requested
binding action:

   Local host:exec1
   Action requested:  bind-to-core
   Application name:  

Re: [OMPI users] Error when using OpenMPI with SGE multiple hosts

2010-11-17 Thread Terry Dontje

On 11/17/2010 10:48 AM, Ralph Castain wrote:
No problem at all. I confess that I am lost in all the sometimes 
disjointed emails in this thread. Frankly, now that I search, I can't 
find it either! :-(


I see one email that clearly shows the external binding report from 
mpirun, but not from any daemons. I see another email (after you asked 
if there was all the output) that states "yep", indicating that was 
all the output, and then proceeds to offer additional output that 
wasn't in the original email you asked about!


So I am now as thoroughly confused as you are...

That said, I am confident in the code in ORTE as it has worked 
correctly when I tested it against external bindings in other 
environments. So I really do believe this is an OGE issue where the 
orted isn't getting correctly bound against all allocated cores.


I am confused by your statement above because we don't even know what is 
being bound or not.  We know that it looks like the hnp is bound to 2 
cores, which is what we asked for, but we don't know what any of the 
processes themselves are bound to.   So I personally cannot point to 
ORTE or OGE as the culprit, because I don't think we know whether there 
is an issue.


So, until we are able to get the -report-bindings output from the a.out 
code (note I did not say orted), it is kind of hard to claim there is 
even an issue.  Which brings me back to the output question.  After some 
thought, the --report-bindings output I am expecting is not from the 
orted itself but from the a.out before it executes the user code.   
Which now makes me wonder if there is some odd OGE/OMPI integration 
issue in which the -bind-to-core and -report-bindings options are not being 
propagated/recognized/honored when qsub is given the -binding option.


Perhaps if someone could run this test again with --report-bindings 
--leave-session-attached and provide -all- output we could verify that 
analysis and clear up the confusion?



Yeah, however I bet you we still won't see output.

--td



On Wed, Nov 17, 2010 at 8:13 AM, Terry Dontje <terry.don...@oracle.com 
<mailto:terry.don...@oracle.com>> wrote:


On 11/17/2010 10:00 AM, Ralph Castain wrote:

--leave-session-attached is always required if you want to see
output from the daemons. Otherwise, the launcher closes the ssh
session (or qrsh session, in this case) as part of its normal
operating procedure, thus terminating the stdout/err channel.



I believe you but isn't it weird that without the --binding option
to qsub we saw -report-bindings output from the orteds?

Do you have the date of the email that has the info you talked
about below?  I really am not trying to be an a-hole about this,
but there has been so much data and email flying around that it would
be nice to actually see the output you mention.

--td



On Wed, Nov 17, 2010 at 7:51 AM, Terry Dontje
<terry.don...@oracle.com <mailto:terry.don...@oracle.com>> wrote:

On 11/17/2010 09:32 AM, Ralph Castain wrote:

Chris' output is coming solely from the HNP, which is correct
given the way things were executed. My comment was from
another email where he did what I asked, which was to
include the flags:

--report-bindings --leave-session-attached

so we could see the output from each orted. In that email,
it was clear that while mpirun was bound to multiple cores,
the orteds are being bound to a -single- core.

Hence the problem.


Hmm, I see Ralph's comment on 11/15 but I don't see any
output that shows what Ralph say's above.  The only
report-bindings output I see is when he runs without OGE
binding.   Can someone give me the date and time of Chris'
email with the --report-bindings and
--leave-session-attached.  Or a rerun of the below with the
--leave-session-attached option would also help.

I find it confusing that --leave-session-attached is not
required when the OGE binding argument is not given.

--td


HTH
Ralph


    On Wed, Nov 17, 2010 at 6:57 AM, Terry Dontje
<terry.don...@oracle.com <mailto:terry.don...@oracle.com>>
wrote:

On 11/17/2010 07:41 AM, Chris Jewell wrote:

On 17 Nov 2010, at 11:56, Terry Dontje wrote:

You are absolutely correct, Terry, and the 1.4 release series does 
include the proper code. The point here, though, is that SGE binds the orted to 
a single core, even though other cores are also allocated. So the orted detects 
an external binding of one core, and binds all its children to that same core.

I do not think you are right here.  Chris sent the following which looks like 
OGE (fka SGE) actually did bind the hnp to multiple cores.  However that message I 
believe is not coming from the processes themselves and actually is o

Re: [OMPI users] Error when using OpenMPI with SGE multiple hosts

2010-11-17 Thread Terry Dontje

On 11/17/2010 10:00 AM, Ralph Castain wrote:
--leave-session-attached is always required if you want to see output 
from the daemons. Otherwise, the launcher closes the ssh session (or 
qrsh session, in this case) as part of its normal operating procedure, 
thus terminating the stdout/err channel.



I believe you but isn't it weird that without the --binding option to 
qsub we saw -report-bindings output from the orteds?


Do you have the date of the email that has the info you talked about 
below?  I really am not trying to be an a-hole about this, but there has 
been so much data and email flying around that it would be nice to actually 
see the output you mention.


--td

On Wed, Nov 17, 2010 at 7:51 AM, Terry Dontje <terry.don...@oracle.com 
<mailto:terry.don...@oracle.com>> wrote:


On 11/17/2010 09:32 AM, Ralph Castain wrote:

Chris' output is coming solely from the HNP, which is correct
given the way things were executed. My comment was from another
email where he did what I asked, which was to include the flags:

--report-bindings --leave-session-attached

so we could see the output from each orted. In that email, it was
clear that while mpirun was bound to multiple cores, the orteds
are being bound to a -single- core.

Hence the problem.


Hmm, I see Ralph's comment on 11/15 but I don't see any output
that shows what Ralph says above.  The only report-bindings
output I see is when he runs without OGE binding.   Can someone
give me the date and time of Chris' email with the
--report-bindings and --leave-session-attached?  Or a rerun of the
below with the --leave-session-attached option would also help.

I find it confusing that --leave-session-attached is not required
when the OGE binding argument is not given.

--td


HTH
Ralph


On Wed, Nov 17, 2010 at 6:57 AM, Terry Dontje
<terry.don...@oracle.com <mailto:terry.don...@oracle.com>> wrote:

On 11/17/2010 07:41 AM, Chris Jewell wrote:

On 17 Nov 2010, at 11:56, Terry Dontje wrote:

You are absolutely correct, Terry, and the 1.4 release series does 
include the proper code. The point here, though, is that SGE binds the orted to 
a single core, even though other cores are also allocated. So the orted detects 
an external binding of one core, and binds all its children to that same core.

I do not think you are right here.  Chris sent the following which looks like OGE 
(fka SGE) actually did bind the hnp to multiple cores.  However that message I believe is 
not coming from the processes themselves and actually is only shown by the hnp.  I wonder 
if Chris adds a "-bind-to-core" option  we'll see more output from the a.out's 
before they exec unterm?

As requested using

$ qsub -pe mpi 8 -binding linear:2 myScript.com'

and

'mpirun -mca ras_gridengine_verbose 100 --report-bindings -by-core 
-bind-to-core ./unterm'

[exec5:06671] System has detected external process binding to cores 0028
[exec5:06671] ras:gridengine: JOB_ID: 59434
[exec5:06671] ras:gridengine: PE_HOSTFILE: 
/usr/sge/default/spool/exec5/active_jobs/59434.1/pe_hostfile
[exec5:06671] ras:gridengine: exec5.cluster.stats.local: PE_HOSTFILE 
shows slots=2
[exec5:06671] ras:gridengine: exec1.cluster.stats.local: PE_HOSTFILE 
shows slots=2
[exec5:06671] ras:gridengine: exec4.cluster.stats.local: PE_HOSTFILE 
shows slots=1
[exec5:06671] ras:gridengine: exec3.cluster.stats.local: PE_HOSTFILE 
shows slots=1
[exec5:06671] ras:gridengine: exec2.cluster.stats.local: PE_HOSTFILE 
shows slots=1
[exec5:06671] ras:gridengine: exec7.cluster.stats.local: PE_HOSTFILE 
shows slots=1

No more info.  I note that the external binding is slightly different 
to what I had before, but our cluster is busier today :-)


I would have expected more output.

--td
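A side note: if the "external process binding to cores 0028" value above is a hexadecimal bitmask of logical core IDs (an assumption on my part, not something confirmed in this thread), a few lines of C are enough to decode which cores it names. Read that way, 0028 and 0022 each contain exactly two cores, which is at least consistent with the linear:2 request.

/* Hypothetical helper (not from the thread): decode a string such as "0028"
 * as a hexadecimal bitmask of logical core IDs and list the set bits.
 * The bitmask interpretation itself is an assumption. */
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    const char *mask_str = (argc > 1) ? argv[1] : "0028";
    unsigned long mask = strtoul(mask_str, NULL, 16);

    printf("mask 0x%lx -> cores:", mask);
    for (int bit = 0; bit < (int)(8 * sizeof(mask)); bit++) {
        if (mask & (1UL << bit))
            printf(" %d", bit);
    }
    printf("\n");
    return 0;
}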


Chris


--
Dr Chris Jewell
Department of Statistics
University of Warwick
Coventry
CV4 7AL
UK
Tel: +44 (0)24 7615 0778






___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users



-- 
Oracle

Terry D. Dontje | Principal Software Engineer
Developer Tools Engineering | +1.781.442.2631
Oracle *- Performance Technologies*
95 Network Drive, Burlington, MA 01803
Email terry.don...@oracle.com




___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users



___
users mailing list

Re: [OMPI users] Error when using OpenMPI with SGE multiple hosts

2010-11-17 Thread Terry Dontje

On 11/17/2010 09:32 AM, Ralph Castain wrote:
Chris' output is coming solely from the HNP, which is correct given the 
way things were executed. My comment was from another email where he 
did what I asked, which was to include the flags:


--report-bindings --leave-session-attached

so we could see the output from each orted. In that email, it was 
clear that while mpirun was bound to multiple cores, the orteds are 
being bound to a -single- core.


Hence the problem.

Hmm, I see Ralph's comment on 11/15 but I don't see any output that 
shows what Ralph says above.  The only report-bindings output I see is 
when he runs without OGE binding.   Can someone give me the date and 
time of Chris' email with the --report-bindings and 
--leave-session-attached.  Or a rerun of the below with the 
--leave-session-attached option would also help.


I find it confusing that --leave-session-attached is not required when 
the OGE binding argument is not given.


--td

HTH
Ralph


On Wed, Nov 17, 2010 at 6:57 AM, Terry Dontje <terry.don...@oracle.com> wrote:


On 11/17/2010 07:41 AM, Chris Jewell wrote:

On 17 Nov 2010, at 11:56, Terry Dontje wrote:

You are absolutely correct, Terry, and the 1.4 release series does include 
the proper code. The point here, though, is that SGE binds the orted to a 
single core, even though other cores are also allocated. So the orted detects 
an external binding of one core, and binds all its children to that same core.

I do not think you are right here.  Chris sent the following which looks like OGE 
(fka SGE) actually did bind the hnp to multiple cores.  However that message I believe is 
not coming from the processes themselves and actually is only shown by the hnp.  I wonder 
if Chris adds a "-bind-to-core" option  we'll see more output from the a.out's 
before they exec unterm?

As requested using

$ qsub -pe mpi 8 -binding linear:2 myScript.com'

and

'mpirun -mca ras_gridengine_verbose 100 --report-bindings -by-core 
-bind-to-core ./unterm'

[exec5:06671] System has detected external process binding to cores 0028
[exec5:06671] ras:gridengine: JOB_ID: 59434
[exec5:06671] ras:gridengine: PE_HOSTFILE: 
/usr/sge/default/spool/exec5/active_jobs/59434.1/pe_hostfile
[exec5:06671] ras:gridengine: exec5.cluster.stats.local: PE_HOSTFILE shows 
slots=2
[exec5:06671] ras:gridengine: exec1.cluster.stats.local: PE_HOSTFILE shows 
slots=2
[exec5:06671] ras:gridengine: exec4.cluster.stats.local: PE_HOSTFILE shows 
slots=1
[exec5:06671] ras:gridengine: exec3.cluster.stats.local: PE_HOSTFILE shows 
slots=1
[exec5:06671] ras:gridengine: exec2.cluster.stats.local: PE_HOSTFILE shows 
slots=1
[exec5:06671] ras:gridengine: exec7.cluster.stats.local: PE_HOSTFILE shows 
slots=1

No more info.  I note that the external binding is slightly different to 
what I had before, but our cluster is busier today :-)


I would have expected more output.

--td


Chris


--
Dr Chris Jewell
Department of Statistics
University of Warwick
Coventry
CV4 7AL
UK
Tel: +44 (0)24 7615 0778






___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users



-- 
Oracle

Terry D. Dontje | Principal Software Engineer
Developer Tools Engineering | +1.781.442.2631
Oracle *- Performance Technologies*
95 Network Drive, Burlington, MA 01803
Email terry.don...@oracle.com




___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users



___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users



--
Oracle
Terry D. Dontje | Principal Software Engineer
Developer Tools Engineering | +1.781.442.2631
Oracle *- Performance Technologies*
95 Network Drive, Burlington, MA 01803
Email terry.don...@oracle.com





Re: [OMPI users] Error when using OpenMPI with SGE multiple hosts

2010-11-17 Thread Terry Dontje

On 11/17/2010 07:41 AM, Chris Jewell wrote:

On 17 Nov 2010, at 11:56, Terry Dontje wrote:

You are absolutely correct, Terry, and the 1.4 release series does include the 
proper code. The point here, though, is that SGE binds the orted to a single 
core, even though other cores are also allocated. So the orted detects an 
external binding of one core, and binds all its children to that same core.

I do not think you are right here.  Chris sent the following which looks like OGE (fka 
SGE) actually did bind the hnp to multiple cores.  However that message I believe is not 
coming from the processes themselves and actually is only shown by the hnp.  I wonder if 
Chris adds a "-bind-to-core" option  we'll see more output from the a.out's 
before they exec unterm?

As requested using

$ qsub -pe mpi 8 -binding linear:2 myScript.com'

and

'mpirun -mca ras_gridengine_verbose 100 --report-bindings -by-core 
-bind-to-core ./unterm'

[exec5:06671] System has detected external process binding to cores 0028
[exec5:06671] ras:gridengine: JOB_ID: 59434
[exec5:06671] ras:gridengine: PE_HOSTFILE: 
/usr/sge/default/spool/exec5/active_jobs/59434.1/pe_hostfile
[exec5:06671] ras:gridengine: exec5.cluster.stats.local: PE_HOSTFILE shows 
slots=2
[exec5:06671] ras:gridengine: exec1.cluster.stats.local: PE_HOSTFILE shows 
slots=2
[exec5:06671] ras:gridengine: exec4.cluster.stats.local: PE_HOSTFILE shows 
slots=1
[exec5:06671] ras:gridengine: exec3.cluster.stats.local: PE_HOSTFILE shows 
slots=1
[exec5:06671] ras:gridengine: exec2.cluster.stats.local: PE_HOSTFILE shows 
slots=1
[exec5:06671] ras:gridengine: exec7.cluster.stats.local: PE_HOSTFILE shows 
slots=1

No more info.  I note that the external binding is slightly different to what I 
had before, but our cluster is busier today :-)


I would have expected more output.

--td

Chris


--
Dr Chris Jewell
Department of Statistics
University of Warwick
Coventry
CV4 7AL
UK
Tel: +44 (0)24 7615 0778






___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users



--
Oracle
Terry D. Dontje | Principal Software Engineer
Developer Tools Engineering | +1.781.442.2631
Oracle *- Performance Technologies*
95 Network Drive, Burlington, MA 01803
Email terry.don...@oracle.com





Re: [OMPI users] Error when using OpenMPI with SGE multiple hosts

2010-11-17 Thread Terry Dontje

On 11/16/2010 08:24 PM, Ralph Castain wrote:



On Tue, Nov 16, 2010 at 12:23 PM, Terry Dontje 
<terry.don...@oracle.com> wrote:


On 11/16/2010 01:31 PM, Reuti wrote:

Hi Ralph,

Am 16.11.2010 um 15:40 schrieb Ralph Castain:


2. have SGE bind procs it launches to -all- of those cores. I believe SGE 
does this automatically to constrain the procs to running on only those cores.

This is another "bug/feature" in SGE: it's a matter of discussion, whether the 
shepherd should get exactly one core (in case you use more than one `qrsh`per node) for each call, 
or *all* cores assigned (which we need right now, as the processes in Open MPI will be forks of 
orte daemon). About such a situation I filed an issue a long time ago and 
"limit_to_one_qrsh_per_host yes/no" in the PE definition would do (this setting should 
then also change the core allocation of the master process):

http://gridengine.sunsource.net/issues/show_bug.cgi?id=1254

I believe this is indeed the crux of the issue

fantastic to share the same view.


FWIW, I think I agree too.


3. tell OMPI to --bind-to-core.

In other words, tell SGE to allocate a certain number of cores on each 
node, but to bind each proc to all of them (i.e., don't bind a proc to a 
specific core). I'm pretty sure that is a standard SGE option today (at least, 
I know it used to be). I don't believe any patch or devel work is required (to 
either SGE or OMPI).

When you use a fixed allocation_rule and a matching -binding request it 
will work today. But any other case won't be distributed in the correct way.

Is it possible to not include the -binding request? If SGE is told to use a 
fixed allocation_rule, and to allocate (for example) 2 cores/node, then won't 
the orted see
itself bound to two specific cores on each node?

When you leave out the -binding, all jobs are allowed to run on any core.



We would then be okay as the spawned children of orted would inherit its 
binding. Just don't tell mpirun to bind the processes and the threads of those 
MPI procs will be able to operate across the provided cores.

Or does SGE only allocate 2 cores/node in that case (i.e., allocate, but no 
-binding given), but doesn't bind the orted to any two specific cores? If so, 
then that would be a problem as the orted would think itself unconstrained. If 
I understand the thread correctly, you're saying that this is what happens 
today - true?

Exactly. It won't apply any binding at all and orted would think of being 
unlimited. I.e. limited only by the number of slots it should use thereon.


So I guess the question I have for Ralph.  I thought, and this
might be mixing some of the ideas Jeff and I've been talking
about, that when a RM executes the orted with a bound set of
resources (ie cores) that orted would bind the individual
processes on a subset of the bounded resources.  Is this not
really the case for 1.4.X branch?  I believe it is the case for
the trunk based on Jeff's refactoring.


You are absolutely correct, Terry, and the 1.4 release series does 
include the proper code. The point here, though, is that SGE binds the 
orted to a single core, even though other cores are also allocated. So 
the orted detects an external binding of one core, and binds all its 
children to that same core.
I do not think you are right here.  Chris sent the following which looks 
like OGE (fka SGE) actually did bind the hnp to multiple cores.  However 
that message I believe is not coming from the processes themselves and 
actually is only shown by the hnp.  I wonder if Chris adds a 
"-bind-to-core" option  we'll see more output from the a.out's before 
they exec unterm?


Sure.   Here's the stderr of a job submitted to my cluster with 'qsub -pe mpi 8 
-binding linear:2 myScript.com'  where myScript.com runs 'mpirun -mca 
ras_gridengine_verbose 100 --report-bindings ./unterm':
 
 [exec4:17384] System has detected external process binding to cores 0022

 [exec4:17384] ras:gridengine: JOB_ID: 59352
 [exec4:17384] ras:gridengine: PE_HOSTFILE: 
/usr/sge/default/spool/exec4/active_jobs/59352.1/pe_hostfile
 [exec4:17384] ras:gridengine: exec4.cluster.stats.local: PE_HOSTFILE shows 
slots=2
 [exec4:17384] ras:gridengine: exec2.cluster.stats.local: PE_HOSTFILE shows 
slots=1
 [exec4:17384] ras:gridengine: exec7.cluster.stats.local: PE_HOSTFILE shows 
slots=1
 [exec4:17384] ras:gridengine: exec3.cluster.stats.local: PE_HOSTFILE shows 
slots=1
 [exec4:17384] ras:gridengine: exec6.cluster.stats.local: PE_HOSTFILE shows 
slots=1
 [exec4:17384] ras:gridengine: exec1.cluster.stats.local: PE_HOSTFILE shows 
slots=1
 [exec4:17384] ras:gridengine: exec5.cluster.stats.local: PE_HOSTFILE shows 
slots=1




--td
What I had suggested to Reuti was to not include the -binding flag to 
SGE in the hopes that SGE would then bind the orted 

Re: [OMPI users] Error when using OpenMPI with SGE multiple hosts

2010-11-16 Thread Terry Dontje

On 11/16/2010 01:31 PM, Reuti wrote:

Hi Ralph,

Am 16.11.2010 um 15:40 schrieb Ralph Castain:


2. have SGE bind procs it launches to -all- of those cores. I believe SGE does 
this automatically to constrain the procs to running on only those cores.

This is another "bug/feature" in SGE: it's a matter of discussion, whether the shepherd 
should get exactly one core (in case you use more than one `qrsh`per node) for each call, or *all* 
cores assigned (which we need right now, as the processes in Open MPI will be forks of orte 
daemon). About such a situation I filed an issue a long time ago and 
"limit_to_one_qrsh_per_host yes/no" in the PE definition would do (this setting should 
then also change the core allocation of the master process):

http://gridengine.sunsource.net/issues/show_bug.cgi?id=1254

I believe this is indeed the crux of the issue

fantastic to share the same view.


FWIW, I think I agree too.

3. tell OMPI to --bind-to-core.

In other words, tell SGE to allocate a certain number of cores on each node, 
but to bind each proc to all of them (i.e., don't bind a proc to a specific 
core). I'm pretty sure that is a standard SGE option today (at least, I know it 
used to be). I don't believe any patch or devel work is required (to either SGE 
or OMPI).

When you use a fixed allocation_rule and a matching -binding request it will 
work today. But any other case won't be distributed in the correct way.

Is it possible to not include the -binding request? If SGE is told to use a 
fixed allocation_rule, and to allocate (for example) 2 cores/node, then won't 
the orted see
itself bound to two specific cores on each node?

When you leave out the -binding, all jobs are allowed to run on any core.



We would then be okay as the spawned children of orted would inherit its 
binding. Just don't tell mpirun to bind the processes and the threads of those 
MPI procs will be able to operate across the provided cores.

Or does SGE only allocate 2 cores/node in that case (i.e., allocate, but no 
-binding given), but doesn't bind the orted to any two specific cores? If so, 
then that would be a problem as the orted would think itself unconstrained. If 
I understand the thread correctly, you're saying that this is what happens 
today - true?

Exactly. It won't apply any binding at all and orted would think of being 
unlimited. I.e. limited only by the number of slots it should use thereon.

So I guess the question I have for Ralph.  I thought, and this might be 
mixing some of the ideas Jeff and I've been talking about, that when a 
RM executes the orted with a bound set of resources (ie cores) that 
orted would bind the individual processes on a subset of the bounded 
resources.  Is this not really the case for 1.4.X branch?  I believe it 
is the case for the trunk based on Jeff's refactoring.
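
To make the idea of "bind each child to a subset of the inherited binding" concrete, here is a toy, Linux-only sketch of what a launcher could do. It is purely illustrative (the helper name and the hard-coded local rank are made up) and is not Open MPI's actual orted code:

/* Toy sketch: bind the calling process to the local_rank-th core of the
 * affinity mask it inherited from its parent (e.g. a daemon bound by the
 * resource manager).  Linux-specific; illustrative only. */
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>

static int bind_to_subset(int local_rank)
{
    cpu_set_t inherited, target;
    int seen = 0;

    if (sched_getaffinity(0, sizeof(inherited), &inherited) != 0)
        return -1;

    CPU_ZERO(&target);
    for (int cpu = 0; cpu < CPU_SETSIZE; cpu++) {
        if (CPU_ISSET(cpu, &inherited)) {
            if (seen == local_rank) {
                CPU_SET(cpu, &target);
                return sched_setaffinity(0, sizeof(target), &target);
            }
            seen++;
        }
    }
    return -1;  /* fewer allowed cores than local ranks */
}

int main(void)
{
    /* Example: pretend this process is local rank 1 on its node. */
    if (bind_to_subset(1) != 0)
        fprintf(stderr, "could not bind local rank 1 to a core\n");
    return 0;
}

If the daemon inherits only one allowed core, every local rank collapses onto that same core, which is exactly the symptom described in this thread.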


--
Oracle
Terry D. Dontje | Principal Software Engineer
Developer Tools Engineering | +1.781.442.2631
Oracle *- Performance Technologies*
95 Network Drive, Burlington, MA 01803
Email terry.don...@oracle.com 





Re: [OMPI users] Error when using OpenMPI with SGE multiple hosts

2010-11-16 Thread Terry Dontje

On 11/16/2010 12:13 PM, Chris Jewell wrote:

On 16 Nov 2010, at 14:26, Terry Dontje wrote:

In the original case of 7 nodes and processes if we do -binding pe linear:2, 
and add the -bind-to-core to mpirun  I'd actually expect 6 of the nodes 
processes bind to one core and the 7th node with 2 processes to have each of 
those processes bound to different cores on the same machine.

Can we get a full output of such a run with -report-bindings turned on.  I 
think we should find out that things actually are happening correctly except 
for the fact that the 6 of the nodes have 2 cores allocated but only one is 
being bound to by a process.

Sure.   Here's the stderr of a job submitted to my cluster with 'qsub -pe mpi 8 
-binding linear:2 myScript.com'  where myScript.com runs 'mpirun -mca 
ras_gridengine_verbose 100 --report-bindings ./unterm':

[exec4:17384] System has detected external process binding to cores 0022
[exec4:17384] ras:gridengine: JOB_ID: 59352
[exec4:17384] ras:gridengine: PE_HOSTFILE: 
/usr/sge/default/spool/exec4/active_jobs/59352.1/pe_hostfile
[exec4:17384] ras:gridengine: exec4.cluster.stats.local: PE_HOSTFILE shows 
slots=2
[exec4:17384] ras:gridengine: exec2.cluster.stats.local: PE_HOSTFILE shows 
slots=1
[exec4:17384] ras:gridengine: exec7.cluster.stats.local: PE_HOSTFILE shows 
slots=1
[exec4:17384] ras:gridengine: exec3.cluster.stats.local: PE_HOSTFILE shows 
slots=1
[exec4:17384] ras:gridengine: exec6.cluster.stats.local: PE_HOSTFILE shows 
slots=1
[exec4:17384] ras:gridengine: exec1.cluster.stats.local: PE_HOSTFILE shows 
slots=1
[exec4:17384] ras:gridengine: exec5.cluster.stats.local: PE_HOSTFILE shows 
slots=1


Is that all that came out?  I would have expected a some output from 
each process after the orted forked the processes but before the exec of 
unterm.


--td

Chris

--
Dr Chris Jewell
Department of Statistics
University of Warwick
Coventry
CV4 7AL
UK
Tel: +44 (0)24 7615 0778






___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users



--
Oracle
Terry D. Dontje | Principal Software Engineer
Developer Tools Engineering | +1.781.442.2631
Oracle *- Performance Technologies*
95 Network Drive, Burlington, MA 01803
Email terry.don...@oracle.com





Re: [OMPI users] Error when using OpenMPI with SGE multiple hosts

2010-11-16 Thread Terry Dontje

On 11/16/2010 12:13 PM, Chris Jewell wrote:

On 16 Nov 2010, at 14:26, Terry Dontje wrote:

In the original case of 7 nodes and processes if we do -binding pe linear:2, 
and add the -bind-to-core to mpirun  I'd actually expect 6 of the nodes 
processes bind to one core and the 7th node with 2 processes to have each of 
those processes bound to different cores on the same machine.

Can we get a full output of such a run with -report-bindings turned on.  I 
think we should find out that things actually are happening correctly except 
for the fact that the 6 of the nodes have 2 cores allocated but only one is 
being bound to by a process.

Sure.   Here's the stderr of a job submitted to my cluster with 'qsub -pe mpi 8 
-binding linear:2 myScript.com'  where myScript.com runs 'mpirun -mca 
ras_gridengine_verbose 100 --report-bindings ./unterm':

[exec4:17384] System has detected external process binding to cores 0022
[exec4:17384] ras:gridengine: JOB_ID: 59352
[exec4:17384] ras:gridengine: PE_HOSTFILE: 
/usr/sge/default/spool/exec4/active_jobs/59352.1/pe_hostfile
[exec4:17384] ras:gridengine: exec4.cluster.stats.local: PE_HOSTFILE shows 
slots=2
[exec4:17384] ras:gridengine: exec2.cluster.stats.local: PE_HOSTFILE shows 
slots=1
[exec4:17384] ras:gridengine: exec7.cluster.stats.local: PE_HOSTFILE shows 
slots=1
[exec4:17384] ras:gridengine: exec3.cluster.stats.local: PE_HOSTFILE shows 
slots=1
[exec4:17384] ras:gridengine: exec6.cluster.stats.local: PE_HOSTFILE shows 
slots=1
[exec4:17384] ras:gridengine: exec1.cluster.stats.local: PE_HOSTFILE shows 
slots=1
[exec4:17384] ras:gridengine: exec5.cluster.stats.local: PE_HOSTFILE shows 
slots=1


Is that all that came out?  I would have expected a some output from 
each process after the orted forked the processes but before the exec of 
unterm.


--td

Chris

--
Dr Chris Jewell
Department of Statistics
University of Warwick
Coventry
CV4 7AL
UK
Tel: +44 (0)24 7615 0778






___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users



--
Oracle
Terry D. Dontje | Principal Software Engineer
Developer Tools Engineering | +1.781.442.2631
Oracle *- Performance Technologies*
95 Network Drive, Burlington, MA 01803
Email terry.don...@oracle.com





Re: [OMPI users] Error when using OpenMPI with SGE multiple hosts

2010-11-16 Thread Terry Dontje

On 11/16/2010 10:59 AM, Reuti wrote:

Am 16.11.2010 um 15:26 schrieb Terry Dontje:



1. allocate a specified number of cores on each node to your job


this is currently the bug in the "slot<=>  core" relation in SGE, which has to 
be removed, updated or clarified. For now slot and core count are out of sync AFAICS.


Technically this isn't a bug but a gap in the allocation rule.  I think the 
solution is a new allocation rule.

Yes, you can phrase it this way. But what do you mean by "new allocation rule"?
The proposal of have a slot allocation rule that forces the number of 
cores allocated on each node to equal the number of slots.

The slot allocation should follow the specified cores?

The other way around I think.



2. have SGE bind procs it launches to -all- of those cores. I believe SGE does 
this automatically to constrain the procs to running on only those cores.


This is another "bug/feature" in SGE: it's a matter of discussion, whether the shepherd 
should get exactly one core (in case you use more than one `qrsh`per node) for each call, or *all* 
cores assigned (which we need right now, as the processes in Open MPI will be forks of orte 
daemon). About such a situation I filed an issue a long time ago and 
"limit_to_one_qrsh_per_host yes/no" in the PE definition would do (this setting should 
then also change the core allocation of the master process):


http://gridengine.sunsource.net/issues/show_bug.cgi?id=1254

Isn't it almost required to have the shepherd bind to all the cores so that the 
orted inherits that binding?

Yes, for orted. But if you want to have any other (legacy) application which 
using N times `qrsh` to an exechost when you got N slots thereon, then only one 
core should be bound to each of the started shepherds.

Blech.  Not sure of the solution for that but I see what you are saying 
now :-).

3. tell OMPI to --bind-to-core.

In other words, tell SGE to allocate a certain number of cores on each node, 
but to bind each proc to all of them (i.e., don't bind a proc to a specific 
core). I'm pretty sure that is a standard SGE option today (at least, I know it 
used to be). I don't believe any patch or devel work is required (to either SGE 
or OMPI).


When you use a fixed allocation_rule and a matching -binding request it will 
work today. But any other case won't be distributed in the correct way.


Ok, so what is the "correct" way and we sure it isn't distributed correctly?

You posted the two cases yesterday. Do we agree that both cases aren't correct, or do you think it's a 
correct allocation for both cases? Even if it could be "repaired" in Open MPI, it would be 
better to fix the generated 'pe' PE hostfile and 'set' allocation, i.e. the "slot<=>  
cores" relation.


So I am not a GE type of guy but from what I've been led to believe what 
happened is correct (in some form of correct).  That is in case one we 
asked for a core allocation of 1 core per node and a core allocation of 
2 cores in the other case.  That is what we were given.  The fact that 
we distributed the slots in a non-uniform manner I am not sure is GE's 
fault.  Note I can understand where it may seem non-intuitive and not 
nice for people wanting to do things like this.

In the original case of 7 nodes and processes if we do -binding pe linear:2, 
and add the -bind-to-core to mpirun  I'd actually expect 6 of the nodes 
processes bind to one core and the 7th node with 2 processes to have each of 
those processes bound to different cores on the same machine.

Yes, possibly it could be repaired this way (for now I have no free machines to play with). But 
then the "reserved" cores by the "-binding pe linear:2" are lost for other 
processes on these 6 nodes, and the slot count gets out of sync with slots.
Right, if you want to rightsize the amount of cores allocated to slots 
allocated on each node then we are stuck unless a new allocation rule is 
made.

Can we get a full output of such a run with -report-bindings turned on.  I 
think we should find out that things actually are happening correctly except 
for the fact that the 6 of the nodes have 2 cores allocated but only one is 
being bound to by a process.

You mean, to accept the current behavior as being the intended one, as finally 
for having only one job running on these machines we get what we asked for - 
despite the fact that cores are lost for other processes?

Yes, that is what I mean.  I first would like to prove at least to 
myself things are working the way we think they are.  I believe the 
discussion of recovering the lost cores is the next step.  Either we 
redefine what -binding linear:X means in light of slots, we make a new 
allocation rule -binding slots:X or live with the lost cores.  Note, the 
"we" here is loosely used.  I am by no means the keeper of GE and just 
injected myself in this discussion because, 

Re: [OMPI users] Error when using OpenMPI with SGE multiple hosts

2010-11-16 Thread Terry Dontje

On 11/16/2010 09:08 AM, Reuti wrote:

Hi,

Am 16.11.2010 um 14:07 schrieb Ralph Castain:


Perhaps I'm missing it, but it seems to me that the real problem lies in the interaction 
between SGE and OMPI during OMPI's two-phase launch. The verbose output shows that SGE 
dutifully allocated the requested number of cores on each node. However, OMPI launches 
only one process on each node (the ORTE daemon), which SGE "binds" to a single 
core since that is what it was told to do.

Since SGE never sees the local MPI procs spawned by ORTE, it can't assign 
bindings to them. The ORTE daemon senses its local binding (i.e., to a single 
core in the allocation), and subsequently binds all its local procs to that 
core.

I believe all you need to do is tell SGE to:

1. allocate a specified number of cores on each node to your job

this is currently the bug in the "slot<=>  core" relation in SGE, which has to 
be removed, updated or clarified. For now slot and core count are out of sync AFAICS.


Technically this isn't a bug but a gap in the allocation rule.  I think 
the solution is a new allocation rule.

2. have SGE bind procs it launches to -all- of those cores. I believe SGE does 
this automatically to constrain the procs to running on only those cores.

This is another "bug/feature" in SGE: it's a matter of discussion, whether the shepherd 
should get exactly one core (in case you use more than one `qrsh`per node) for each call, or *all* 
cores assigned (which we need right now, as the processes in Open MPI will be forks of orte 
daemon). About such a situation I filed an issue a long time ago and 
"limit_to_one_qrsh_per_host yes/no" in the PE definition would do (this setting should 
then also change the core allocation of the master process):

http://gridengine.sunsource.net/issues/show_bug.cgi?id=1254
Isn't it almost required to have the shepherd bind to all the cores so 
that the orted inherits that binding?



3. tell OMPI to --bind-to-core.

In other words, tell SGE to allocate a certain number of cores on each node, 
but to bind each proc to all of them (i.e., don't bind a proc to a specific 
core). I'm pretty sure that is a standard SGE option today (at least, I know it 
used to be). I don't believe any patch or devel work is required (to either SGE 
or OMPI).

When you use a fixed allocation_rule and a matching -binding request it will 
work today. But any other case won't be distributed in the correct way.

Ok, so what is the "correct" way and we sure it isn't distributed correctly?

In the original case of 7 nodes and processes if we do -binding pe 
linear:2, and add the -bind-to-core to mpirun  I'd actually expect 6 of 
the nodes processes bind to one core and the 7th node with 2 processes 
to have each of those processes bound to different cores on the same 
machine.


Can we get a full output of such a run with -report-bindings turned on.  
I think we should find out that things actually are happening correctly 
except for the fact that the 6 of the nodes have 2 cores allocated but 
only one is being bound to by a process.


--td


-- Reuti




On Tue, Nov 16, 2010 at 4:07 AM, Reuti  wrote:
Am 16.11.2010 um 10:26 schrieb Chris Jewell:


Hi all,


On 11/15/2010 02:11 PM, Reuti wrote:

Just to give my understanding of the problem:

Sorry, I am still trying to grok all your email as what the problem you
are trying to solve. So is the issue is trying to have two jobs having
processes on the same node be able to bind there processes on different
resources. Like core 1 for the first job and core 2 and 3 for the 2nd job?

--td

You can't get 2 slots on a machine, as it's limited by the core count to one 
here, so such a slot allocation shouldn't occur at all.

So to clarify, the current -binding <binding_type>:<binding_amount> allocates binding_amount cores to each sge_shepherd process associated with a job_id.  
There appears to be only one sge_shepherd process per job_id per execution node, so all 
child processes run on these allocated cores.  This is irrespective of the number of slots 
allocated to the node.

I agree with Reuti that the binding_amount parameter should be a maximum number 
of bound cores per node, with the actual number determined by the number of 
slots allocated per node.  FWIW, an alternative approach might be to have 
another binding_type ('slot', say) that automatically allocated one core per 
slot.

Of course, a complex situation might arise if a user submits a combined 
MPI/multithreaded job, but then I guess we're into the realm of setting 
allocation_rule.

IIRC there was a discussion on the [GE users] list about it, to get a uniform 
distribution on all slave nodes for such jobs, as also e.g. $OMP_NUM_THREADS will be set 
to the same value for all slave nodes for hybrid jobs. Otherwise it would be necessary to 
adjust SGE to set this value in the "-builtin-" startup method automatically on 
all nodes to the local granted slots value. For now a fixed allocation rule of 

Re: [OMPI users] Error when using OpenMPI with SGE multiple hosts

2010-11-16 Thread Terry Dontje

On 11/16/2010 04:26 AM, Chris Jewell wrote:

Hi all,


On 11/15/2010 02:11 PM, Reuti wrote:

Just to give my understanding of the problem:

Sorry, I am still trying to grok all your email as what the problem you
are trying to solve. So is the issue is trying to have two jobs having
processes on the same node be able to bind there processes on different
resources. Like core 1 for the first job and core 2 and 3 for the 2nd job?

--td

You can't get 2 slots on a machine, as it's limited by the core count to one 
here, so such a slot allocation shouldn't occur at all.

So to clarify, the current -binding <binding_type>:<binding_amount> allocates binding_amount cores to each sge_shepherd process associated with a job_id.  
There appears to be only one sge_shepherd process per job_id per execution node, so all 
child processes run on these allocated cores.  This is irrespective of the number of slots 
allocated to the node.

I believe the above is correct.

I agree with Reuti that the binding_amount parameter should be a maximum number 
of bound cores per node, with the actual number determined by the number of 
slots allocated per node.  FWIW, an alternative approach might be to have 
another binding_type ('slot', say) that automatically allocated one core per 
slot.

That might be correct, I've put in a question to someone who should know.

Of course, a complex situation might arise if a user submits a combined 
MPI/multithreaded job, but then I guess we're into the realm of setting 
allocation_rule.

Yes, that would get ugly.

Is it going to be worth looking at creating a patch for this?  I don't know 
much of the internals of SGE -- would it be hard work to do?  I've not that 
much time to dedicate towards it, but I could put some effort in if necessary...


Is the patch you're wanting for a "slot" binding_type?

--
Oracle
Terry D. Dontje | Principal Software Engineer
Developer Tools Engineering | +1.781.442.2631
Oracle *- Performance Technologies*
95 Network Drive, Burlington, MA 01803
Email terry.don...@oracle.com 





Re: [OMPI users] Error when using OpenMPI with SGE multiple hosts

2010-11-15 Thread Terry Dontje

On 11/15/2010 02:11 PM, Reuti wrote:

Just to give my understanding of the problem:

Am 15.11.2010 um 19:57 schrieb Terry Dontje:


On 11/15/2010 11:08 AM, Chris Jewell wrote:

Sorry, I am still trying to grok all your email as what the problem you
are trying to solve. So is the issue is trying to have two jobs having
processes on the same node be able to bind there processes on different
resources. Like core 1 for the first job and core 2 and 3 for the 2nd job?

--td


That's exactly it.  Each MPI process needs to be bound to 1 processor in a way 
that reflects GE's slot allocation scheme.



I actually don't think that I got it.  So you give two cases:

Case 1:
$ qsub -pe mpi 8 -binding pe linear:1 myScript.com

and my pe_hostfile looks like:

exec6.cluster.stats.local 2 batch.q@exec6.cluster.stats.local 0,1

Shouldn't here two cores be reserved for exec6 as it got two slots?



That's what I was wondering.

exec1.cluster.stats.local 1 batch.q@exec1.cluster.stats.local 0,1
exec7.cluster.stats.local 1 batch.q@exec7.cluster.stats.local 0,1
exec5.cluster.stats.local 1 batch.q@exec5.cluster.stats.local 0,1
exec4.cluster.stats.local 1 batch.q@exec4.cluster.stats.local 0,1
exec3.cluster.stats.local 1 batch.q@exec3.cluster.stats.local 0,1
exec2.cluster.stats.local 1 batch.q@exec2.cluster.stats.local 0,1


Case 2:
Notice that, because I have specified the -binding pe linear:1, each execution 
node binds processes for the job_id to one core.  If I have -binding pe 
linear:2, I get:

exec6.cluster.stats.local 2 batch.q@exec6.cluster.stats.local 0,1:0,2
exec1.cluster.stats.local 1 batch.q@exec1.cluster.stats.local 0,1:0,2
exec7.cluster.stats.local 1 batch.q@exec7.cluster.stats.local 0,1:0,2
exec4.cluster.stats.local 1 batch.q@exec4.cluster.stats.local 0,1:0,2
exec3.cluster.stats.local 1 batch.q@exec3.cluster.stats.local 0,1:0,2
exec2.cluster.stats.local 1 batch.q@exec2.cluster.stats.local 0,1:0,2
exec5.cluster.stats.local 1 batch.q@exec5.cluster.stats.local 0,1:0,2

Is your complaint really the fact that exec6 has been allocated two slots but 
there seems to only be one slot worth of resources allocated

All are wrong except exec6. They should only get one core assigned.


Huh?  I would have thought exec6 would get 4 cores and the rest are correct.

--td


-- Reuti



to it (ie in case one exec6 only has 1 core and case 2 it has two where maybe 
you'd expect 2 and 4 cores allocated respectively)?

--

Terry D. Dontje | Principal Software Engineer
Developer Tools Engineering | +1.781.442.2631
Oracle - Performance Technologies
95 Network Drive, Burlington, MA 01803
Email terry.don...@oracle.com



___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users


___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users



--
Oracle
Terry D. Dontje | Principal Software Engineer
Developer Tools Engineering | +1.781.442.2631
Oracle *- Performance Technologies*
95 Network Drive, Burlington, MA 01803
Email terry.don...@oracle.com





Re: [OMPI users] Error when using OpenMPI with SGE multiple hosts

2010-11-15 Thread Terry Dontje

On 11/15/2010 11:08 AM, Chris Jewell wrote:

Sorry, I am still trying to grok all your email as what the problem you
are trying to solve. So is the issue is trying to have two jobs having
processes on the same node be able to bind there processes on different
resources. Like core 1 for the first job and core 2 and 3 for the 2nd job?

--td

That's exactly it.  Each MPI process needs to be bound to 1 processor in a way 
that reflects GE's slot allocation scheme.


I actually don't think that I got it.  So you give two cases:

Case 1:

$ qsub -pe mpi 8 -binding pe linear:1 myScript.com

and my pe_hostfile looks like:

exec6.cluster.stats.local 2 batch.q@exec6.cluster.stats.local 0,1
exec1.cluster.stats.local 1 batch.q@exec1.cluster.stats.local 0,1
exec7.cluster.stats.local 1 batch.q@exec7.cluster.stats.local 0,1
exec5.cluster.stats.local 1 batch.q@exec5.cluster.stats.local 0,1
exec4.cluster.stats.local 1 batch.q@exec4.cluster.stats.local 0,1
exec3.cluster.stats.local 1 batch.q@exec3.cluster.stats.local 0,1
exec2.cluster.stats.local 1 batch.q@exec2.cluster.stats.local 0,1


Case 2:

Notice that, because I have specified the -binding pe linear:1, each execution 
node binds processes for the job_id to one core.  If I have -binding pe 
linear:2, I get:

exec6.cluster.stats.local 2 batch.q@exec6.cluster.stats.local 0,1:0,2
exec1.cluster.stats.local 1 batch.q@exec1.cluster.stats.local 0,1:0,2
exec7.cluster.stats.local 1 batch.q@exec7.cluster.stats.local 0,1:0,2
exec4.cluster.stats.local 1 batch.q@exec4.cluster.stats.local 0,1:0,2
exec3.cluster.stats.local 1 batch.q@exec3.cluster.stats.local 0,1:0,2
exec2.cluster.stats.local 1 batch.q@exec2.cluster.stats.local 0,1:0,2
exec5.cluster.stats.local 1 batch.q@exec5.cluster.stats.local 0,1:0,2

Is your complaint really the fact that exec6 has been allocated two 
slots but there seems to only be one slot worth of resources allocated 
to it (ie in case one exec6 only has 1 core and case 2 it has two where 
maybe you'd expect 2 and 4 cores allocated respectively)?


--
Oracle
Terry D. Dontje | Principal Software Engineer
Developer Tools Engineering | +1.781.442.2631
Oracle *- Performance Technologies*
95 Network Drive, Burlington, MA 01803
Email terry.don...@oracle.com 





Re: [OMPI users] Error when using OpenMPI with SGE multiple hosts

2010-11-15 Thread Terry Dontje
Sorry, I am still trying to grok all your email to understand what problem you 
are trying to solve.  So is the issue trying to have two jobs with 
processes on the same node be able to bind their processes to different 
resources?  Like core 1 for the first job and cores 2 and 3 for the 2nd job?


--td

On 11/15/2010 09:29 AM, Chris Jewell wrote:

Hi,


If, indeed, it is not possible currently to implement this type of core-binding 
in tightly integrated OpenMPI/GE, then a solution might lie in a custom script 
run in the parallel environment's 'start proc args'. This script would have to 
find out which slots are allocated where on the cluster, and write an OpenMPI 
rankfile.

Exactly this should work.

If you use "binding_instance" "pe" and reformat the information in the $PE_HOSTFILE to a 
"rankfile", it should work to get the desired allocation. Maybe you can share the script with this 
list once you got it working.
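
Purely to make the mechanics of that suggestion concrete (the rest of this message explains why the information in the pe_hostfile may not be sufficient), here is a rough C sketch that turns a $PE_HOSTFILE whose lines look like "hostname slots queue socket,core[:socket,core...]" into an Open MPI rankfile of the form "rank N=host slot=socket:core". The column layout and the pairing of slots to socket,core entries are assumptions based on the listings quoted below:

/* Rough sketch only, not a tested SGE integration: read $PE_HOSTFILE and
 * print an Open MPI rankfile on stdout.  Assumes each line is
 *   hostname slots queue socket,core[:socket,core...]
 * and reuses the last socket,core pair when a host has more slots than
 * pairs (which is exactly the situation discussed in this thread). */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(void)
{
    const char *path = getenv("PE_HOSTFILE");
    if (path == NULL) {
        fprintf(stderr, "PE_HOSTFILE is not set\n");
        return 1;
    }

    FILE *fp = fopen(path, "r");
    if (fp == NULL) {
        perror("fopen");
        return 1;
    }

    char line[1024];
    int rank = 0;

    while (fgets(line, sizeof(line), fp) != NULL) {
        char host[256], queue[256], binding[256];
        int slots;

        if (sscanf(line, "%255s %d %255s %255s", host, &slots, queue, binding) != 4)
            continue;   /* skip lines that do not match the assumed layout */

        char *pair = strtok(binding, ":");
        char last[64] = "0,0";

        for (int s = 0; s < slots; s++) {
            if (pair != NULL) {
                snprintf(last, sizeof(last), "%s", pair);
                pair = strtok(NULL, ":");
            }
            int socket = 0, core = 0;
            sscanf(last, "%d,%d", &socket, &core);
            printf("rank %d=%s slot=%d:%d\n", rank++, host, socket, core);
        }
    }

    fclose(fp);
    return 0;
}

A file written this way would then be handed to mpirun via its rankfile option (e.g. --rankfile myrankfile).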


As far as I can see, that's not going to work.  This is because, exactly like 
"binding_instance" "set", for -binding pe linear:n you get n cores bound per 
node.  This is easily verifiable by using a long job and examining the pe_hostfile.  For example, I 
submit a job with:

$ qsub -pe mpi 8 -binding pe linear:1 myScript.com

and my pe_hostfile looks like:

exec6.cluster.stats.local 2 batch.q@exec6.cluster.stats.local 0,1
exec1.cluster.stats.local 1 batch.q@exec1.cluster.stats.local 0,1
exec7.cluster.stats.local 1 batch.q@exec7.cluster.stats.local 0,1
exec5.cluster.stats.local 1 batch.q@exec5.cluster.stats.local 0,1
exec4.cluster.stats.local 1 batch.q@exec4.cluster.stats.local 0,1
exec3.cluster.stats.local 1 batch.q@exec3.cluster.stats.local 0,1
exec2.cluster.stats.local 1 batch.q@exec2.cluster.stats.local 0,1

Notice that, because I have specified the -binding pe linear:1, each execution 
node binds processes for the job_id to one core.  If I have -binding pe 
linear:2, I get:

exec6.cluster.stats.local 2 batch.q@exec6.cluster.stats.local 0,1:0,2
exec1.cluster.stats.local 1 batch.q@exec1.cluster.stats.local 0,1:0,2
exec7.cluster.stats.local 1 batch.q@exec7.cluster.stats.local 0,1:0,2
exec4.cluster.stats.local 1 batch.q@exec4.cluster.stats.local 0,1:0,2
exec3.cluster.stats.local 1 batch.q@exec3.cluster.stats.local 0,1:0,2
exec2.cluster.stats.local 1 batch.q@exec2.cluster.stats.local 0,1:0,2
exec5.cluster.stats.local 1 batch.q@exec5.cluster.stats.local 0,1:0,2

So the pe_hostfile still doesn't give an accurate representation of the binding 
allocation for use by OpenMPI.  Question: is there a system file or command that I could 
use to check which processors are "occupied"?
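
For what it is worth, on Linux the current affinity of any process can be read from /proc/<pid>/status (the Cpus_allowed_list field), or queried programmatically; a minimal sketch (Linux-specific, not part of the original mail):

/* Minimal sketch: print the cores a given PID is currently allowed to run
 * on (its affinity mask).  Pass a PID as argv[1], or omit it to inspect
 * the current process.  Linux-specific. */
#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    pid_t pid = (argc > 1) ? (pid_t)atoi(argv[1]) : 0;   /* 0 means "this process" */
    cpu_set_t mask;

    if (sched_getaffinity(pid, sizeof(mask), &mask) != 0) {
        perror("sched_getaffinity");
        return 1;
    }

    printf("pid %ld is allowed on cores:", (long)(pid ? pid : getpid()));
    for (int cpu = 0; cpu < CPU_SETSIZE; cpu++) {
        if (CPU_ISSET(cpu, &mask))
            printf(" %d", cpu);
    }
    printf("\n");
    return 0;
}

Running it against the PIDs of the orted and of the a.out processes on a node would show whether they really ended up on the expected cores.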

Chris

--
Dr Chris Jewell
Department of Statistics
University of Warwick
Coventry
CV4 7AL
UK
Tel: +44 (0)24 7615 0778






___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users



--
Oracle
Terry D. Dontje | Principal Software Engineer
Developer Tools Engineering | +1.781.442.2631
Oracle *- Performance Technologies*
95 Network Drive, Burlington, MA 01803
Email terry.don...@oracle.com 





Re: [OMPI users] cannot install Open MPI 1.5 on Solaris x86_64 withOracle/Sun C 5.11

2010-11-01 Thread Terry Dontje
 I am able to build on Linux systems with Sun C 5.11 using gcc-4.1.2.  
Still trying to get a version of gcc 4.3.4 compiled on our systems so I 
can use it with Sun C 5.11 to build OMPI.


--td

On 11/01/2010 05:58 AM, Siegmar Gross wrote:

Hi,


   Sorry, but can you give us the config line, the config.log and the
full output of make preferrably with make V=1?

--td
On 10/29/2010 04:30 AM, Siegmar Gross wrote:

Hi,

I tried to build Open MPI 1.5 on Solaris X86 and x86_64 with Oracle
Studio 12.2. I can compile Open MPI with thread support, but I can
only partly install it because "libtool" will not find "f95" although
it is available. "make check" shows no failures.

I made a mistake the first time. I'm sorry for that. This weekend I
rebuild everything and now the following installations work. "ok"
means I could install the package and successfully run two small
programs (one is a simple matrix multiplication with MPI and OpenMP,
2 processes and 8 threads on a dual processor eight core SPARC64 VII
system). I used gcc-4.2.0 and Oracle/Sun C 5.11.

SunOS sparc,  32-bit, cc: ok
SunOS sparc,  64-bit, cc: ok
SunOS x86,32-bit, cc: ok
SunOS x86_64, 32-bit, cc: ok
SunOS x86_64, 64-bit, cc: ok
Linux x86,32-bit, cc: "make" still breaks
Linux x86_64, 32-bit, cc: "make" still breaks
Linux x86_64, 64-bit, cc: "make" still breaks

SunOS sparc,  32-bit, gcc: ok
SunOS sparc,  64-bit, gcc: ok
SunOS x86,32-bit, gcc: ok
SunOS x86_64, 32-bit, gcc: ok
SunOS x86_64, 64-bit, gcc: ok
Linux x86,32-bit, gcc: ok
Linux x86_64, 32-bit, gcc: ok
Linux x86_64, 64-bit, gcc: ok

The problems on Solaris x86 and Solaris x86_64 could be solved using
Sun C 5.11 instead of Sun C 5.9. Unfortuantely I have still the same
problem on Linux x86 and Linux x86_64 with Sun C 5.11.

tyr openmpi-1.5-Linux.x86_64.32_cc 417 tail -15
   log.make.Linux.x86_64.32_cc
make[3]: Leaving directory `.../opal/libltdl'
make[2]: Leaving directory `.../opal/libltdl'
Making all in asm
make[2]: Entering directory `.../opal/asm'
   CC asm.lo
rm -f atomic-asm.S
ln -s "../../../openmpi-1.5/opal/asm/generated/atomic-ia32-linux-nongas.s"
   atomic-asm.S
   CPPAS  atomic-asm.lo
cc1: error: unrecognized command line option "-fno-directives-only"
cc: cpp failed for atomic-asm.S
make[2]: *** [atomic-asm.lo] Error 1
make[2]: Leaving directory `.../opal/asm'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory `.../opal'
make: *** [all-recursive] Error 1
tyr openmpi-1.5-Linux.x86_64.32_cc 418

I can switch back to Sun C 5.9 on Solaris x86(_64) systems if somebody
is interested to solve the problem for the older compiler. I used the
following options:

../openmpi-1.5/configure --prefix=/usr/local/openmpi-1.5_64_gcc \
   --libdir=/usr/local/openmpi-1.5_64_gcc/lib64 \
   LDFLAGS="-m64 -L/usr/local/gcc-4.2.0/lib/sparcv9" \
   CC="gcc" CPP="cpp" CXX="g++" CXXCPP="cpp" F77="gfortran" \
   CFLAGS="-m64" CXXFLAGS="-m64" FFLAGS="-m64" FCFLAGS="-m64" \
   CXXLDFLAGS="-m64" CPPFLAGS="" \
   C_INCL_PATH="" C_INCLUDE_PATH="" CPLUS_INCLUDE_PATH="" \
   OBJC_INCLUDE_PATH="" MPIHOME="" \
   --without-udapl --without-openib \
   --enable-mpi-f90 --with-mpi-f90-size=small \
   --enable-heterogeneous --enable-cxx-exceptions \
   --enable-shared --enable-orterun-prefix-by-default \
   --with-threads=posix --enable-mpi-threads --disable-progress-threads \
   |&  tee log.configure.$SYSTEM_ENV.$MACHINE_ENV.64_gcc

For x86_64 I changed one line:

   LDFLAGS="-m64 -L/usr/local/gcc-4.2.0/lib/amd64" \


../openmpi-1.5/configure --prefix=/usr/local/openmpi-1.5_64_cc \
   --libdir=/usr/local/openmpi-1.5_64_cc/lib64 \
   LDFLAGS="-m64" \
   CC="cc" CXX="CC" F77="f77" FC="f95" \
   CFLAGS="-m64" CXXFLAGS="-m64" FFLAGS="-m64" FCFLAGS="-m64" \
   CXXLDFLAGS="-m64" CPPFLAGS="" \
   C_INCL_PATH="" C_INCLUDE_PATH="" CPLUS_INCLUDE_PATH="" \
   OBJC_INCLUDE_PATH="" MPICHHOME="" \
   --without-udapl --without-openib \
   --enable-mpi-f90 --with-mpi-f90-size=small \
   --enable-heterogeneous --enable-cxx-exceptions \
   --enable-shared --enable-orterun-prefix-by-default \
   --with-threads=posix --enable-mpi-threads --disable-progress-threads \
   |&  tee log.configure.$SYSTEM_ENV.$MACHINE_ENV.64_cc


For 32-bit systems I changed "-m64" to "-m32", didn't specify "-L..."
in LDFLAGS, and didn't use "--libdir=...".


Kind regards

Siegmar

___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users



--
Oracle
Terry D. Dontje | Principal Software Engineer
Developer Tools Engineering | +1.781.442.2631
Oracle * - Performance Technologies*
95 Network Drive, Burlington, MA 01803
Email terry.don...@oracle.com 





Re: [OMPI users] cannot install Open MPI 1.5 on Solaris x86_64 with Oracle/Sun C 5.11

2010-10-29 Thread Terry Dontje
 Sorry, but can you give us the config line, the config.log and the 
full output of make preferrably with make V=1?


--td
On 10/29/2010 04:30 AM, Siegmar Gross wrote:

Hi,

I tried to build Open MPI 1.5 on Solaris X86 and x86_64 with Oracle
Studio 12.2. I can compile Open MPI with thread support, but I can
only partly install it because "libtool" will not find "f95" although
it is available. "make check" shows no failures.

tyr openmpi-1.5-SunOS.x86_64.32_cc 188 ssh sunpc4 cc -V
cc: Sun C 5.11 SunOS_i386 145355-01 2010/10/11
usage: cc [ options ] files.  Use 'cc -flags' for details

No suspicious warnings or errors in log.configure.SunOS.x86_64.32_cc.

tyr openmpi-1.5-SunOS.x86_64.32_cc 182 grep -i warning:
   log.make.SunOS.x86_64.32_cc | more

".../opal/mca/crs/none/crs_none_module.c", line 136:
   warning: statement not reached

".../orte/mca/errmgr/errmgr.h", line 135: warning: attribute
   "noreturn" may not be applied to variable, ignored
(a lot of these warnings)

".../orte/mca/rmcast/tcp/rmcast_tcp.c", line 982: warning:
   assignment type mismatch:
".../orte/mca/rmcast/tcp/rmcast_tcp.c", line 1023: warning:
   assignment type mismatch:
".../orte/mca/rmcast/udp/rmcast_udp.c", line 877: warning:
   assignment type mismatch:
".../orte/mca/rmcast/udp/rmcast_udp.c", line 918: warning:
   assignment type mismatch:

".../orte/tools/orte-ps/orte-ps.c", line 288: warning:
   initializer does not fit or is out of range: 0xfffe
".../orte/tools/orte-ps/orte-ps.c", line 289: warning:
   initializer does not fit or is out of range: 0xfffe

grep -i error: log.make.SunOS.x86_64.32_cc | more

tyr openmpi-1.5-SunOS.x86_64.32_cc 185 grep -i FAIL
   log.make-check.SunOS.x86_64.32_cc
tyr openmpi-1.5-SunOS.x86_64.32_cc 186 grep -i SKIP
   log.make-check.SunOS.x86_64.32_cc
tyr openmpi-1.5-SunOS.x86_64.32_cc 187 grep -i PASS
   log.make-check.SunOS.x86_64.32_cc
PASS: predefined_gap_test
File opened with dladvise_local, all passed
PASS: dlopen_test
All 2 tests passed
 - 1 threads: Passed
 - 2 threads: Passed
 - 4 threads: Passed
 - 5 threads: Passed
 - 8 threads: Passed
PASS: atomic_barrier
 - 1 threads: Passed
 - 2 threads: Passed
 - 4 threads: Passed
 - 5 threads: Passed
 - 8 threads: Passed
PASS: atomic_barrier_noinline
 - 1 threads: Passed
 - 2 threads: Passed
 - 4 threads: Passed
 - 5 threads: Passed
 - 8 threads: Passed
PASS: atomic_spinlock
 - 1 threads: Passed
 - 2 threads: Passed
 - 4 threads: Passed
 - 5 threads: Passed
 - 8 threads: Passed
PASS: atomic_spinlock_noinline
 - 1 threads: Passed
 - 2 threads: Passed
 - 4 threads: Passed
 - 5 threads: Passed
 - 8 threads: Passed
PASS: atomic_math
 - 1 threads: Passed
 - 2 threads: Passed
 - 4 threads: Passed
 - 5 threads: Passed
 - 8 threads: Passed
PASS: atomic_math_noinline
 - 1 threads: Passed
 - 2 threads: Passed
 - 4 threads: Passed
 - 5 threads: Passed
 - 8 threads: Passed
PASS: atomic_cmpset
 - 1 threads: Passed
 - 2 threads: Passed
 - 4 threads: Passed
 - 5 threads: Passed
 - 8 threads: Passed
PASS: atomic_cmpset_noinline
All 8 tests passed
All 0 tests passed
All 0 tests passed
decode [PASSED]
PASS: opal_datatype_test
PASS: checksum
PASS: position
decode [PASSED]
PASS: ddt_test
decode [PASSED]
PASS: ddt_raw
All 5 tests passed
SUPPORT: OMPI Test Passed: opal_path_nfs(): (0 tests)
PASS: opal_path_nfs
1 test passed


tyr openmpi-1.5-SunOS.x86_64.32_cc 190 grep -i warning:
   log.make-install.SunOS.x86_64.32_cc | more
libtool: install: warning: relinking `libmpi_cxx.la'
libtool: install: warning: relinking `libmpi_f77.la'
libtool: install: warning: relinking `libmpi_f90.la'

tyr openmpi-1.5-SunOS.x86_64.32_cc 191 grep -i error:
   log.make-install.SunOS.x86_64.32_cc | more
libtool: install: error: relink `libmpi_f90.la' with the above
   command before installing it

tyr openmpi-1.5-SunOS.x86_64.32_cc 194 tail -20
   log.make-install.SunOS.x86_64.32_cc
make[4]: Leaving directory `.../ompi/mpi/f90/scripts'
make[4]: Entering directory `.../ompi/mpi/f90'
make[5]: Entering directory `.../ompi/mpi/f90'
test -z "/usr/local/openmpi-1.5_32_cc/lib" ||
   /usr/local/bin/mkdir -p "/usr/local/openmpi-1.5_32_cc/lib"
  /bin/bash ../../../libtool   --mode=install /usr/local/bin/install -c
libmpi_f90.la '/usr/local/openmpi-1.5_32_cc/lib'
libtool: install: warning: relinking `libmpi_f90.la'
libtool: install: (cd
/export2/src/openmpi-1.5/openmpi-1.5-SunOS.x86_64.32_cc/ompi/mpi/f90; /bin/bash
/export2/src/openmpi-1.5/openmpi-1.5-SunOS.x86_64.32_cc/libtool  --silent --tag 
FC
--mode=relink f95 -I../../../ompi/include 
-I../../../../openmpi-1.5/ompi/include -I.
-I../../../../openmpi-1.5/ompi/mpi/f90 -I../../../ompi/mpi/f90 -m32 
-version-info 1:0:0
-export-dynamic -m32 -o libmpi_f90.la -rpath /usr/local/openmpi-1.5_32_cc/lib 
mpi.lo
mpi_sizeof.lo mpi_comm_spawn_multiple_f90.lo mpi_testall_f90.lo 

Re: [OMPI users] Open MPI program cannot complete

2010-10-25 Thread Terry Dontje
 So what you are saying is *all* the ranks have entered MPI_Finalize 
and only a subset has exited per placing prints before and after 
MPI_Finalize.  Good.  So my guess is that the processes stuck in 
MPI_Finalize have a prior MPI request outstanding that for whatever 
reason is unable to complete.  So I would first look at all the MPI 
requests and make sure they completed.


--td
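
A minimal sketch of the pattern suggested above (track every nonblocking request and complete them all before MPI_Finalize on every rank); it is illustrative only and not the original poster's program:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int send_val = rank, recv_val = -1;
    int right = (rank + 1) % size;
    int left  = (rank + size - 1) % size;
    MPI_Request reqs[2];

    /* Simple nonblocking ring exchange. */
    MPI_Irecv(&recv_val, 1, MPI_INT, left,  0, MPI_COMM_WORLD, &reqs[0]);
    MPI_Isend(&send_val, 1, MPI_INT, right, 0, MPI_COMM_WORLD, &reqs[1]);

    /* Every request is completed here, on every rank, before finalizing,
     * so no rank enters MPI_Finalize with communication still pending. */
    MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);

    printf("rank %d received %d; entering MPI_Finalize\n", rank, recv_val);
    MPI_Finalize();
    return 0;
}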

On 10/25/2010 02:38 AM, Jack Bryan wrote:

thanks
I found a problem:

I used:

 cout << " I am rank " << rank << " I am before 
MPI_Finalize()" << endl;

 MPI_Finalize();
cout << " I am rank " << rank << " I am after MPI_Finalize()" << endl;
 return 0;

I can get the output " I am rank 0 (1, 2, ) I am before 
MPI_Finalize() ".


and
   " I am rank 0 I am after MPI_Finalize() "
But, the other processes did not print out "I am rank ... I am after 
MPI_Finalize()".


It is weird. The processes have reached the point just before 
MPI_Finalize(); why are they hanging there?


Are there other better ways to check this ?

Any help is appreciated.

thanks

Jack

Oct. 25 2010


From: solarbik...@gmail.com
Date: Sun, 24 Oct 2010 19:47:54 -0700
To: us...@open-mpi.org
Subject: Re: [OMPI users] Open MPI program cannot complete

how do you know all process call mpi_finalize?  did you have all of 
them print out something before they call mpi_finalize? I think what 
Gustavo is getting at is maybe you had some MPI calls within your 
snippets that hangs your program, thus some of your processes never 
called mpi_finalize.


On Sun, Oct 24, 2010 at 6:59 PM, Jack Bryan wrote:


Thanks,

But, my code is too long to be posted.

What are the common reasons of this kind of problems ?

Any help is appreciated.

Jack

Oct. 24 2010

> From: g...@ldeo.columbia.edu 
> Date: Sun, 24 Oct 2010 18:09:52 -0400

> To: us...@open-mpi.org 
> Subject: Re: [OMPI users] Open MPI program cannot complete
>
> Hi Jack
>
> Your code snippet is too terse, doesn't show the MPI calls.
> It is hard to guess what is the problem this way.
>
> Gus Correa
> On Oct 24, 2010, at 5:43 PM, Jack Bryan wrote:
>
> > Thanks for the reply.
> > But, I use mpi_waitall() to make sure that all MPI
communications have been done before a process call MPI_Finalize()
and returns.
> >
> > Any help is appreciated.
> >
> > thanks
> >
> > Jack
> >
> > Oct. 24 2010
> >
> > > From: g...@ldeo.columbia.edu 
> > > Date: Sun, 24 Oct 2010 17:31:11 -0400
> > > To: us...@open-mpi.org 
> > > Subject: Re: [OMPI users] Open MPI program cannot complete
> > >
> > > Hi Jack
> > >
> > > It may depend on "do some things".
> > > Does it involve MPI communication?
> > >
> > > Also, why not put MPI_Finalize();return 0 outside the ifs?
> > >
> > > Gus Correa
> > >
> > > On Oct 24, 2010, at 2:23 PM, Jack Bryan wrote:
> > >
> > > > Hi
> > > >
> > > > I got a problem of open MPI.
> > > >
> > > > My program has 5 processes.
> > > >
> > > > All of them can run MPI_Finalize() and return 0.
> > > >
> > > > But, the whole program cannot be completed.
> > > >
> > > > In the MPI cluster job queue, it is strill in running status.
> > > >
> > > > If I use 1 process to run it, no problem.
> > > >
> > > > Why ?
> > > >
> > > > My program:
> > > >
> > > > int main (int argc, char **argv)
> > > > {
> > > >
> > > > MPI_Init(&argc, &argv);
> > > > MPI_Comm_rank(MPI_COMM_WORLD, &myRank);
> > > > MPI_Comm_size(MPI_COMM_WORLD, &numProcs);
> > > > MPI_Comm world;
> > > > world = MPI_COMM_WORLD;
> > > >
> > > > if (myRank == 0)
> > > > {
> > > > do some things.
> > > > }
> > > >
> > > > if (myRank != 0)
> > > > {
> > > > do some things.
> > > > MPI_Finalize();
> > > > return 0 ;
> > > > }
> > > > if (myRank == 0)
> > > > {
> > > > MPI_Finalize();
> > > > return 0;
> > > > }
> > > >
> > > > }
> > > >
> > > > And, some output files get wrong codes, which can not be
readible.
> > > > In 1-process case, the program can print correct results
to these output files .
> > > >
> > > > Any help is appreciated.
> > > >
> > > > thanks
> > > >
> > > > Jack
> > > >
> > > > Oct. 24 2010
> > > >
> > > > ___
> > > > users mailing list
> > > > us...@open-mpi.org 
> > > > http://www.open-mpi.org/mailman/listinfo.cgi/users
> > >
> > >
> > > ___
> > > 

Re: [OMPI users] segmentation fault in mpiexec (Linux, Oracle/Sun C)

2010-10-21 Thread Terry Dontje
 When you do a make, can you add a V=1 to have the actual compile lines 
printed out?  That will probably show you the line with 
-fno-directives-only in it.  Which is odd because I think that option is 
a gcc'ism and don't know why it would show up in a studio build (note my 
build doesn't show it).


Maybe a copy of the config.log and config.status might be helpful.  Have 
you tried to start from square one?  It really seems like the configure 
or libtool might be setting things up for gcc which is odd with the 
configure line you show.


--td

On 10/21/2010 09:41 AM, Siegmar Gross wrote:

   I wonder if the error below might be due to crap being left over in the
source tree.  Can you do a "make clean".  Note on a new checkout from
the v1.5 svn branch I was able to build 64 bit with the following
configure line:

linpc4 openmpi-1.5-Linux.x86_64.32_cc 123 make clean
Making clean in test
make[1]: Entering directory
...

../openmpi-1.5/configure \
   FC=f95 F77=f77 CC=cc CXX=CC --without-openib --without-udapl \
   --enable-heterogeneous --enable-cxx-exceptions --enable-shared \
   --enable-orterun-prefix-by-default --with-sge --disable-mpi-threads \
   --enable-mpi-f90 --with-mpi-f90-size=small --disable-progress-threads \
   --prefix=/usr/local/openmpi-1.5_32_cc CFLAGS=-m64 CXXFLAGS=-m64 \
   FFLAGS=-m64 FCFLAGS=-m64

make |&  tee log.make.$SYSTEM_ENV.$MACHINE_ENV.32_cc


...
make[3]: Leaving directory
`/export2/src/openmpi-1.5/openmpi-1.5-Linux.x86_64.32_cc/opal/libltdl'
make[2]: Leaving directory
`/export2/src/openmpi-1.5/openmpi-1.5-Linux.x86_64.32_cc/opal/libltdl'
Making all in asm
make[2]: Entering directory
`/export2/src/openmpi-1.5/openmpi-1.5-Linux.x86_64.32_cc/opal/asm'
   CC asm.lo
rm -f atomic-asm.S
ln -s ".../opal/asm/generated/atomic-ia32-linux-nongas.s" atomic-asm.S
   CPPAS  atomic-asm.lo
cc1: error: unrecognized command line option "-fno-directives-only"
cc: cpp failed for atomic-asm.S
make[2]: *** [atomic-asm.lo] Error 1
make[2]: Leaving directory `.../opal/asm'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory `/.../opal'
make: *** [all-recursive] Error 1


Do you know where I can find "-fno-directives-only"? "grep" didn't
show any results. I tried to rebuild the package with my original
settings and didn't succeed (same error as above), so something
must have changed in the last two days on "linpc4". The operator told
me that he hasn't changed anything, so I have no idea why I cannot
build the package today. The log-files from "configure" are identical,
but the log-files from "make" differ (I changed the language with
"setenv LC_ALL C" because I have some errors on other machines as well
and wanted English messages so that you can read them).


tyr openmpi-1.5 198 diff
   openmpi-1.5-Linux.x86_64.32_cc.old/log.configure.Linux.x86_64.32_cc
   openmpi-1.5-Linux.x86_64.32_cc/log.configure.Linux.x86_64.32_cc |more

tyr openmpi-1.5 199 diff
   openmpi-1.5-Linux.x86_64.32_cc.old/log.make.Linux.x86_64.32_cc
   openmpi-1.5-Linux.x86_64.32_cc/log.make.Linux.x86_64.32_cc | more
3c3
<  make[1]: Für das Ziel »all« ist nichts zu tun.
---

make[1]: Nothing to be done for `all'.

7c7
<  make[1]: Für das Ziel »all« ist nichts zu tun.
---

make[1]: Nothing to be done for `all'.

74,76c74,76
<  :19:0: Warnung: »__FLT_EVAL_METHOD__« redefiniert
<  :93:0: Anmerkung: dies ist die Stelle der vorherigen Definition


Re: [OMPI users] segmentation fault in mpiexec (Linux, Oracle/Sun C)

2010-10-21 Thread Terry Dontje

 On 10/21/2010 10:18 AM, Jeff Squyres wrote:

Terry --

Can you file relevant ticket(s) for v1.5 on Trac?

Once I have more information and have proven it isn't due to us using 
old compilers or a compiler error itself.


--td

On Oct 21, 2010, at 10:10 AM, Terry Dontje wrote:


I've reproduced Siegmar's issue when I have the threads options on but it does 
not show up when they are off.  It is actually segv'ing in 
mca_btl_sm_component_close on an access at address 0 (obviously not a good 
thing).  I am going to compile things with debug on and see if I can track this
further but I think I am smelling the smoke of a bug...

Siegmar, I was able to get stuff working with 32 bits when I removed -with-threads=posix 
and replaced "-enable-mpi-threads" with --disable-mpi-threads in your configure 
line.  I think your previous issue with things not building must be due to left over cruft.

Note, my compiler hang disappeared on me.  So maybe there was an environmental 
issue on my side.

--td


On 10/21/2010 06:47 AM, Terry Dontje wrote:

On 10/21/2010 06:43 AM, Jeff Squyres (jsquyres) wrote:

Also, I'm not entirely sure what all the commands are that you are showing.
Some of those warnings (e.g. in config.log) are normal.

The 32 bit test failure is not, though. Terry - any idea there?

The test program is failing in MPI_Finalize which seems odd and the code itself 
looks pretty dead simple.  I am rebuilding a v1.5 workspace without the 
different thread options.  Once that is done I'll try the test program.

BTW, when I tried to build with the original options Siegmar used, the compiles
looked like they hung, doh.

--td


Sent from my PDA. No type good.

On Oct 21, 2010, at 6:25 AM, "Terry Dontje"<terry.don...@oracle.com>  wrote:


I wonder if the error below could be due to crap being left over in the source tree.  Can you
do a "make clean"?  Note on a new checkout from the v1.5 svn branch I was able
to build 64 bit with the following configure line:

../configure FC=f95 F77=f77 CC=cc CXX=CC --without-openib --without-udapl 
-enable-heterogeneous --enable-cxx-exceptions --enable-shared 
--enable-orterun-prefix-by-default --with-sge --disable-mpi-threads 
--enable-mpi-f90 --with-mpi-f90-size=small --disable-progress-threads 
--prefix=/workspace/tdd/ctnext/v15 CFLAGS=-m64 CXXFLAGS=-m64 
FFLAGS=-m64 FCFLAGS=-m64

--td
On 10/21/2010 05:38 AM, Siegmar Gross wrote:

Hi,

thank you very much for your reply.



   Can you remove the -with-threads and -enable-mpi-threads options from
the configure line and see if that helps your 32 bit problem any?


I cannot build the package when I remove these options.

linpc4 openmpi-1.5-Linux.x86_64.32_cc 189 head -8 config.log
This file contains any messages produced by compilers while
running configure, to aid debugging if configure makes a mistake.

It was created by Open MPI configure 1.5, which was
generated by GNU Autoconf 2.65.  Invocation command line was

   $ ../openmpi-1.5/configure --prefix=/usr/local/openmpi-1.5_32_cc
   CFLAGS=-m32 CXXFLAGS=-m32 FFLAGS=-m32 FCFLAGS=-m32 CXXLDFLAGS=-m32
   CPPFLAGS= LDFLAGS=-m32 C_INCL_PATH= C_INCLUDE_PATH= CPLUS_INCLUDE_PATH=
   OBJC_INCLUDE_PATH= MPICHHOME= CC=cc CXX=CC F77=f95 FC=f95
   --without-udapl --enable-shared --enable-heterogeneous
   --enable-cxx-exceptions


linpc4 openmpi-1.5-Linux.x86_64.32_cc 190 head -8 ../*.old/config.log
This file contains any messages produced by compilers while
running configure, to aid debugging if configure makes a mistake.

It was created by Open MPI configure 1.5, which was
generated by GNU Autoconf 2.65.  Invocation command line was

   $ ../openmpi-1.5/configure --prefix=/usr/local/openmpi-1.5_32_cc
   CFLAGS=-m32 CXXFLAGS=-m32 FFLAGS=-m32 FCFLAGS=-m32 CXXLDFLAGS=-m32
   CPPFLAGS= LDFLAGS=-m32 C_INCL_PATH= C_INCLUDE_PATH= CPLUS_INCLUDE_PATH=
   OBJC_INCLUDE_PATH= MPICHHOME= CC=cc CXX=CC F77=f95 FC=f95
   --without-udapl --with-threads=posix --enable-mpi-threads
   --enable-shared --enable-heterogeneous --enable-cxx-exceptions


linpc4 openmpi-1.5-Linux.x86_64.32_cc 194 dir log.* ../*.old/log.*
... 132406 Oct 19 13:01
   ../openmpi-1.5-Linux.x86_64.32_cc.old/log.configure.Linux.x86_64.32_cc
... 195587 Oct 19 16:09
   ../openmpi-1.5-Linux.x86_64.32_cc.old/log.make-check.Linux.x86_64.32_cc
... 356672 Oct 19 16:07
   ../openmpi-1.5-Linux.x86_64.32_cc.old/log.make-install.Linux.x86_64.32_cc
... 280596 Oct 19 13:42
   ../openmpi-1.5-Linux.x86_64.32_cc.old/log.make.Linux.x86_64.32_cc
... 132265 Oct 21 10:51 log.configure.Linux.x86_64.32_cc
...  10890 Oct 21 10:51 log.make.Linux.x86_64.32_cc


linpc4 openmpi-1.5-Linux.x86_64.32_cc 195 grep -i warning:
   log.configure.Linux.x86_64.32_cc
configure: WARNING: *** Did not find corresponding C type
configure: WARNING: MPI_REAL16 and MPI_COMPLEX32 support have been disabled
configure: WARNING: *** Corresponding Fortran 77 type (REAL*16) not supported
configure: WARNING: *** Skipping Fortran 90 type (REAL*16)

Re: [OMPI users] segmentation fault in mpiexec (Linux, Oracle/Sun C)

2010-10-21 Thread Terry Dontje
 I've reproduced Siegmar's issue when I have the threads options on but 
it does not show up when they are off.  It is actually segv'ing in 
mca_btl_sm_component_close on an access at address 0 (obviously not a 
good thing).  I am going to compile things with debug on and see if I can
track this further but I think I am smelling the smoke of a bug...


Siegmar, I was able to get stuff working with 32 bits when I removed 
-with-threads=posix and replaced "-enable-mpi-threads" with 
--disable-mpi-threads in your configure line.  I think your previous 
issue with things not building must be due to left over cruft.


Note, my compiler hang disappeared on me.  So maybe there was an 
environmental issue on my side.


--td


On 10/21/2010 06:47 AM, Terry Dontje wrote:

On 10/21/2010 06:43 AM, Jeff Squyres (jsquyres) wrote:
Also, I'm not entirely sure what all the commands are that you are
showing. Some of those warnings (e.g. in config.log) are normal.


The 32 bit test failure is not, though. Terry - any idea there?
The test program is failing in MPI_Finalize which seems odd and the 
code itself looks pretty dead simple.  I am rebuilding a v1.5 
workspace without the different thread options.  Once that is done 
I'll try the test program.


BTW, when I tried to build with the original options Siegmar used, the
compiles looked like they hung, doh.


--td



Sent from my PDA. No type good.

On Oct 21, 2010, at 6:25 AM, "Terry Dontje" <terry.don...@oracle.com> wrote:


I wonder if the error below could be due to crap being left over in the
source tree.  Can you do a "make clean"?  Note on a new checkout
from the v1.5 svn branch I was able to build 64 bit with the 
following configure line:


../configure FC=f95 F77=f77 CC=cc CXX=CC --without-openib 
--without-udapl -enable-heterogeneous --enable-cxx-exceptions 
--enable-shared --enable-orterun-prefix-by-default --with-sge 
--disable-mpi-threads --enable-mpi-f90 --with-mpi-f90-size=small 
--disable-progress-threads --prefix=/workspace/tdd/ctnext/v15 
CFLAGS=-m64 CXXFLAGS=-m64 FFLAGS=-m64 FCFLAGS=-m64


--td
On 10/21/2010 05:38 AM, Siegmar Gross wrote:

Hi,

thank you very much for your reply.


   Can you remove the -with-threads and -enable-mpi-threads options from
the configure line and see if that helps your 32 bit problem any?

I cannot build the package when I remove these options.

linpc4 openmpi-1.5-Linux.x86_64.32_cc 189 head -8 config.log
This file contains any messages produced by compilers while
running configure, to aid debugging if configure makes a mistake.

It was created by Open MPI configure 1.5, which was
generated by GNU Autoconf 2.65.  Invocation command line was

   $ ../openmpi-1.5/configure --prefix=/usr/local/openmpi-1.5_32_cc
   CFLAGS=-m32 CXXFLAGS=-m32 FFLAGS=-m32 FCFLAGS=-m32 CXXLDFLAGS=-m32
   CPPFLAGS= LDFLAGS=-m32 C_INCL_PATH= C_INCLUDE_PATH= CPLUS_INCLUDE_PATH=
   OBJC_INCLUDE_PATH= MPICHHOME= CC=cc CXX=CC F77=f95 FC=f95
   --without-udapl --enable-shared --enable-heterogeneous
   --enable-cxx-exceptions


linpc4 openmpi-1.5-Linux.x86_64.32_cc 190 head -8 ../*.old/config.log
This file contains any messages produced by compilers while
running configure, to aid debugging if configure makes a mistake.

It was created by Open MPI configure 1.5, which was
generated by GNU Autoconf 2.65.  Invocation command line was

   $ ../openmpi-1.5/configure --prefix=/usr/local/openmpi-1.5_32_cc
   CFLAGS=-m32 CXXFLAGS=-m32 FFLAGS=-m32 FCFLAGS=-m32 CXXLDFLAGS=-m32
   CPPFLAGS= LDFLAGS=-m32 C_INCL_PATH= C_INCLUDE_PATH= CPLUS_INCLUDE_PATH=
   OBJC_INCLUDE_PATH= MPICHHOME= CC=cc CXX=CC F77=f95 FC=f95
   --without-udapl --with-threads=posix --enable-mpi-threads
   --enable-shared --enable-heterogeneous --enable-cxx-exceptions


linpc4 openmpi-1.5-Linux.x86_64.32_cc 194 dir log.* ../*.old/log.*
... 132406 Oct 19 13:01
   ../openmpi-1.5-Linux.x86_64.32_cc.old/log.configure.Linux.x86_64.32_cc
... 195587 Oct 19 16:09
   ../openmpi-1.5-Linux.x86_64.32_cc.old/log.make-check.Linux.x86_64.32_cc
... 356672 Oct 19 16:07
   ../openmpi-1.5-Linux.x86_64.32_cc.old/log.make-install.Linux.x86_64.32_cc
... 280596 Oct 19 13:42
   ../openmpi-1.5-Linux.x86_64.32_cc.old/log.make.Linux.x86_64.32_cc
... 132265 Oct 21 10:51 log.configure.Linux.x86_64.32_cc
...  10890 Oct 21 10:51 log.make.Linux.x86_64.32_cc


linpc4 openmpi-1.5-Linux.x86_64.32_cc 195 grep -i warning:
   log.configure.Linux.x86_64.32_cc
configure: WARNING: *** Did not find corresponding C type
configure: WARNING: MPI_REAL16 and MPI_COMPLEX32 support have been disabled
configure: WARNING: *** Corresponding Fortran 77 type (REAL*16) not supported
configure: WARNING: *** Skipping Fortran 90 type (REAL*16)
configure: WARNING: valgrind.h not found
configure: WARNING: Unknown architecture ... proceeding anyway
configure: WARNING: File locks may not work with NFS.  See the Installation and
configure: WARNING:  -xldscope=hidden has been added to CFLAGS

linp

Re: [OMPI users] segmentation fault in mpiexec (Linux, Oracle/Sun C)

2010-10-21 Thread Terry Dontje

 On 10/21/2010 06:43 AM, Jeff Squyres (jsquyres) wrote:
Also, I'm not entirely sure what all the commands are that you are
showing. Some of those warnings (e.g. in config.log) are normal.


The 32 bit test failure is not, though. Terry - any idea there?
The test program is failing in MPI_Finalize which seems odd and the code 
itself looks pretty dead simple.  I am rebuilding a v1.5 workspace 
without the different thread options.  Once that is done I'll try the 
test program.


BTW, when I tried to build with the original options Siegmar used, the
compiles looked like they hung, doh.


--td



Sent from my PDA. No type good.

On Oct 21, 2010, at 6:25 AM, "Terry Dontje" <terry.don...@oracle.com> wrote:


I wonder if the error below could be due to crap being left over in the
source tree.  Can you do a "make clean"?  Note on a new checkout from
the v1.5 svn branch I was able to build 64 bit with the following 
configure line:


../configure FC=f95 F77=f77 CC=cc CXX=CC --without-openib 
--without-udapl -enable-heterogeneous --enable-cxx-exceptions 
--enable-shared --enable-orterun-prefix-by-default --with-sge 
--disable-mpi-threads --enable-mpi-f90 --with-mpi-f90-size=small 
--disable-progress-threads --prefix=/workspace/tdd/ctnext/v15 
CFLAGS=-m64 CXXFLAGS=-m64 FFLAGS=-m64 FCFLAGS=-m64


--td
On 10/21/2010 05:38 AM, Siegmar Gross wrote:

Hi,

thank you very much for your reply.


   Can you remove the -with-threads and -enable-mpi-threads options from
the configure line and see if that helps your 32 bit problem any?

I cannot build the package when I remove these options.

linpc4 openmpi-1.5-Linux.x86_64.32_cc 189 head -8 config.log
This file contains any messages produced by compilers while
running configure, to aid debugging if configure makes a mistake.

It was created by Open MPI configure 1.5, which was
generated by GNU Autoconf 2.65.  Invocation command line was

   $ ../openmpi-1.5/configure --prefix=/usr/local/openmpi-1.5_32_cc
   CFLAGS=-m32 CXXFLAGS=-m32 FFLAGS=-m32 FCFLAGS=-m32 CXXLDFLAGS=-m32
   CPPFLAGS= LDFLAGS=-m32 C_INCL_PATH= C_INCLUDE_PATH= CPLUS_INCLUDE_PATH=
   OBJC_INCLUDE_PATH= MPICHHOME= CC=cc CXX=CC F77=f95 FC=f95
   --without-udapl --enable-shared --enable-heterogeneous
   --enable-cxx-exceptions


linpc4 openmpi-1.5-Linux.x86_64.32_cc 190 head -8 ../*.old/config.log
This file contains any messages produced by compilers while
running configure, to aid debugging if configure makes a mistake.

It was created by Open MPI configure 1.5, which was
generated by GNU Autoconf 2.65.  Invocation command line was

   $ ../openmpi-1.5/configure --prefix=/usr/local/openmpi-1.5_32_cc
   CFLAGS=-m32 CXXFLAGS=-m32 FFLAGS=-m32 FCFLAGS=-m32 CXXLDFLAGS=-m32
   CPPFLAGS= LDFLAGS=-m32 C_INCL_PATH= C_INCLUDE_PATH= CPLUS_INCLUDE_PATH=
   OBJC_INCLUDE_PATH= MPICHHOME= CC=cc CXX=CC F77=f95 FC=f95
   --without-udapl --with-threads=posix --enable-mpi-threads
   --enable-shared --enable-heterogeneous --enable-cxx-exceptions


linpc4 openmpi-1.5-Linux.x86_64.32_cc 194 dir log.* ../*.old/log.*
... 132406 Oct 19 13:01
   ../openmpi-1.5-Linux.x86_64.32_cc.old/log.configure.Linux.x86_64.32_cc
... 195587 Oct 19 16:09
   ../openmpi-1.5-Linux.x86_64.32_cc.old/log.make-check.Linux.x86_64.32_cc
... 356672 Oct 19 16:07
   ../openmpi-1.5-Linux.x86_64.32_cc.old/log.make-install.Linux.x86_64.32_cc
... 280596 Oct 19 13:42
   ../openmpi-1.5-Linux.x86_64.32_cc.old/log.make.Linux.x86_64.32_cc
... 132265 Oct 21 10:51 log.configure.Linux.x86_64.32_cc
...  10890 Oct 21 10:51 log.make.Linux.x86_64.32_cc


linpc4 openmpi-1.5-Linux.x86_64.32_cc 195 grep -i warning:
   log.configure.Linux.x86_64.32_cc
configure: WARNING: *** Did not find corresponding C type
configure: WARNING: MPI_REAL16 and MPI_COMPLEX32 support have been disabled
configure: WARNING: *** Corresponding Fortran 77 type (REAL*16) not supported
configure: WARNING: *** Skipping Fortran 90 type (REAL*16)
configure: WARNING: valgrind.h not found
configure: WARNING: Unknown architecture ... proceeding anyway
configure: WARNING: File locks may not work with NFS.  See the Installation and
configure: WARNING:  -xldscope=hidden has been added to CFLAGS

linpc4 openmpi-1.5-Linux.x86_64.32_cc 196 grep -i warning:
   ../*.old/log.configure.Linux.x86_64.32_cc
configure: WARNING: *** Did not find corresponding C type
configure: WARNING: MPI_REAL16 and MPI_COMPLEX32 support have been disabled
configure: WARNING: *** Corresponding Fortran 77 type (REAL*16) not supported
configure: WARNING: *** Skipping Fortran 90 type (REAL*16)
configure: WARNING: valgrind.h not found
configure: WARNING: Unknown architecture ... proceeding anyway
configure: WARNING: File locks may not work with NFS.  See the Installation and
configure: WARNING:  -xldscope=hidden has been added to CFLAGS

linpc4 openmpi-1.5-Linux.x86_64.32_cc 197 grep -i error:
   log.configure.Linux.x86_64.32_cc
configure: error: no libz found; check path for ZLIB package first...

Re: [OMPI users] segmentation fault in mpiexec (Linux, Oracle/Sun C)

2010-10-21 Thread Terry Dontje
I wonder if the error below could be due to crap being left over in the
source tree.  Can you do a "make clean"?  Note on a new checkout from
the v1.5 svn branch I was able to build 64 bit with the following 
configure line:


../configure FC=f95 F77=f77 CC=cc CXX=CC --without-openib 
--without-udapl -enable-heterogeneous --enable-cxx-exceptions 
--enable-shared --enable-orterun-prefix-by-default --with-sge 
--disable-mpi-threads --enable-mpi-f90 --with-mpi-f90-size=small 
--disable-progress-threads --prefix=/workspace/tdd/ctnext/v15 
CFLAGS=-m64 CXXFLAGS=-m64 FFLAGS=-m64 FCFLAGS=-m64


--td
On 10/21/2010 05:38 AM, Siegmar Gross wrote:

Hi,

thank you very much for your reply.


   Can you remove the -with-threads and -enable-mpi-threads options from
the configure line and see if that helps your 32 bit problem any?

I cannot build the package when I remove these options.

linpc4 openmpi-1.5-Linux.x86_64.32_cc 189 head -8 config.log
This file contains any messages produced by compilers while
running configure, to aid debugging if configure makes a mistake.

It was created by Open MPI configure 1.5, which was
generated by GNU Autoconf 2.65.  Invocation command line was

   $ ../openmpi-1.5/configure --prefix=/usr/local/openmpi-1.5_32_cc
   CFLAGS=-m32 CXXFLAGS=-m32 FFLAGS=-m32 FCFLAGS=-m32 CXXLDFLAGS=-m32
   CPPFLAGS= LDFLAGS=-m32 C_INCL_PATH= C_INCLUDE_PATH= CPLUS_INCLUDE_PATH=
   OBJC_INCLUDE_PATH= MPICHHOME= CC=cc CXX=CC F77=f95 FC=f95
   --without-udapl --enable-shared --enable-heterogeneous
   --enable-cxx-exceptions


linpc4 openmpi-1.5-Linux.x86_64.32_cc 190 head -8 ../*.old/config.log
This file contains any messages produced by compilers while
running configure, to aid debugging if configure makes a mistake.

It was created by Open MPI configure 1.5, which was
generated by GNU Autoconf 2.65.  Invocation command line was

   $ ../openmpi-1.5/configure --prefix=/usr/local/openmpi-1.5_32_cc
   CFLAGS=-m32 CXXFLAGS=-m32 FFLAGS=-m32 FCFLAGS=-m32 CXXLDFLAGS=-m32
   CPPFLAGS= LDFLAGS=-m32 C_INCL_PATH= C_INCLUDE_PATH= CPLUS_INCLUDE_PATH=
   OBJC_INCLUDE_PATH= MPICHHOME= CC=cc CXX=CC F77=f95 FC=f95
   --without-udapl --with-threads=posix --enable-mpi-threads
   --enable-shared --enable-heterogeneous --enable-cxx-exceptions


linpc4 openmpi-1.5-Linux.x86_64.32_cc 194 dir log.* ../*.old/log.*
... 132406 Oct 19 13:01
   ../openmpi-1.5-Linux.x86_64.32_cc.old/log.configure.Linux.x86_64.32_cc
... 195587 Oct 19 16:09
   ../openmpi-1.5-Linux.x86_64.32_cc.old/log.make-check.Linux.x86_64.32_cc
... 356672 Oct 19 16:07
   ../openmpi-1.5-Linux.x86_64.32_cc.old/log.make-install.Linux.x86_64.32_cc
... 280596 Oct 19 13:42
   ../openmpi-1.5-Linux.x86_64.32_cc.old/log.make.Linux.x86_64.32_cc
... 132265 Oct 21 10:51 log.configure.Linux.x86_64.32_cc
...  10890 Oct 21 10:51 log.make.Linux.x86_64.32_cc


linpc4 openmpi-1.5-Linux.x86_64.32_cc 195 grep -i warning:
   log.configure.Linux.x86_64.32_cc
configure: WARNING: *** Did not find corresponding C type
configure: WARNING: MPI_REAL16 and MPI_COMPLEX32 support have been disabled
configure: WARNING: *** Corresponding Fortran 77 type (REAL*16) not supported
configure: WARNING: *** Skipping Fortran 90 type (REAL*16)
configure: WARNING: valgrind.h not found
configure: WARNING: Unknown architecture ... proceeding anyway
configure: WARNING: File locks may not work with NFS.  See the Installation and
configure: WARNING:  -xldscope=hidden has been added to CFLAGS

linpc4 openmpi-1.5-Linux.x86_64.32_cc 196 grep -i warning:
   ../*.old/log.configure.Linux.x86_64.32_cc
configure: WARNING: *** Did not find corresponding C type
configure: WARNING: MPI_REAL16 and MPI_COMPLEX32 support have been disabled
configure: WARNING: *** Corresponding Fortran 77 type (REAL*16) not supported
configure: WARNING: *** Skipping Fortran 90 type (REAL*16)
configure: WARNING: valgrind.h not found
configure: WARNING: Unknown architecture ... proceeding anyway
configure: WARNING: File locks may not work with NFS.  See the Installation and
configure: WARNING:  -xldscope=hidden has been added to CFLAGS

linpc4 openmpi-1.5-Linux.x86_64.32_cc 197 grep -i error:
   log.configure.Linux.x86_64.32_cc
configure: error: no libz found; check path for ZLIB package first...
configure: error: no vtf3.h found; check path for VTF3 package first...
configure: error: no BPatch.h found; check path for Dyninst package first...
configure: error: no f2c.h found; check path for CLAPACK package first...
configure: error: MPI Correctness Checking support cannot be built inside Open
MPI
configure: error: no papi.h found; check path for PAPI package first...
configure: error: no libcpc.h found; check path for CPC package first...
configure: error: no ctool/ctool.h found; check path for CTool package first...

linpc4 openmpi-1.5-Linux.x86_64.32_cc 198 grep -i error:
   ../*.old/log.configure.Linux.x86_64.32_cc
configure: error: no libz found; check path for ZLIB package first...
configure: error: no vtf3.h found; check path for VTF3 package first...

Re: [OMPI users] segmentation fault in mpiexec (Linux, Oracle/Sun C)

2010-10-20 Thread Terry Dontje
 Can you remove the -with-threads and -enable-mpi-threads options from 
the configure line and see if that helps your 32 bit problem any?


--td
On 10/20/2010 09:38 AM, Siegmar Gross wrote:

Hi,

I have built Open MPI 1.5 on Linux x86_64 with the Oracle/Sun Studio C
compiler. Unfortunately "mpiexec" breaks when I run a small program.

linpc4 small_prog 106 cc -V
cc: Sun C 5.10 Linux_i386 2009/06/03
usage: cc [ options] files.  Use 'cc -flags' for details

linpc4 small_prog 107 uname -a
Linux linpc4 2.6.27.45-0.1-default #1 SMP 2010-02-22 16:49:47 +0100 x86_64
x86_64 x86_64 GNU/Linux

linpc4 small_prog 108 mpicc -show
cc -I/usr/local/openmpi-1.5_32_cc/include -mt
   -L/usr/local/openmpi-1.5_32_cc/lib -lmpi -ldl -Wl,--export-dynamic -lnsl
   -lutil -lm -ldl

linpc4 small_prog 109 mpicc -m32 rank_size.c
linpc4 small_prog 110 mpiexec -np 2 a.out
I'm process 0 of 2 available processes running on linpc4.
MPI standard 2.1 is supported.
I'm process 1 of 2 available processes running on linpc4.
MPI standard 2.1 is supported.
[linpc4:11564] *** Process received signal ***
[linpc4:11564] Signal: Segmentation fault (11)
[linpc4:11564] Signal code:  (128)
[linpc4:11564] Failing at address: (nil)
[linpc4:11565] *** Process received signal ***
[linpc4:11565] Signal: Segmentation fault (11)
[linpc4:11565] Signal code:  (128)
[linpc4:11565] Failing at address: (nil)
[linpc4:11564] [ 0] [0xe410]
[linpc4:11564] [ 1] /usr/local/openmpi-1.5_32_cc/lib/libmpi.so.1
   (mca_base_components_close+0x8c) [0xf774ccd0]
[linpc4:11564] [ 2] /usr/local/openmpi-1.5_32_cc/lib/libmpi.so.1
   (mca_btl_base_close+0xc5) [0xf76bd255]
[linpc4:11564] [ 3] /usr/local/openmpi-1.5_32_cc/lib/libmpi.so.1
   (mca_bml_base_close+0x32) [0xf76bd112]
[linpc4:11564] [ 4] /usr/local/openmpi-1.5_32_cc/lib/openmpi/
   mca_pml_ob1.so [0xf73d971f]
[linpc4:11564] [ 5] /usr/local/openmpi-1.5_32_cc/lib/libmpi.so.1
   (mca_base_components_close+0x8c) [0xf774ccd0]
[linpc4:11564] [ 6] /usr/local/openmpi-1.5_32_cc/lib/libmpi.so.1
   (mca_pml_base_close+0xc1) [0xf76e4385]
[linpc4:11564] [ 7] /usr/local/openmpi-1.5_32_cc/lib/libmpi.so.1
   [0xf76889e6]
[linpc4:11564] [ 8] /usr/local/openmpi-1.5_32_cc/lib/libmpi.so.1
   (PMPI_Finalize+0x3c) [0xf769dd4c]
[linpc4:11564] [ 9] a.out(main+0x98) [0x8048a18]
[linpc4:11564] [10] /lib/libc.so.6(__libc_start_main+0xe5) [0xf749c705]
[linpc4:11564] [11] a.out(_start+0x41) [0x8048861]
[linpc4:11564] *** End of error message ***
[linpc4:11565] [ 0] [0xe410]
[linpc4:11565] [ 1] /usr/local/openmpi-1.5_32_cc/lib/libmpi.so.1
   (mca_base_components_close+0x8c) [0xf76bccd0]
[linpc4:11565] [ 2] /usr/local/openmpi-1.5_32_cc/lib/libmpi.so.1
   (mca_btl_base_close+0xc5) [0xf762d255]
[linpc4:11565] [ 3] /usr/local/openmpi-1.5_32_cc/lib/libmpi.so.1
   (mca_bml_base_close+0x32) [0xf762d112]
[linpc4:11565] [ 4] /usr/local/openmpi-1.5_32_cc/lib/openmpi/
   mca_pml_ob1.so [0xf734971f]
[linpc4:11565] [ 5] /usr/local/openmpi-1.5_32_cc/lib/libmpi.so.1
   (mca_base_components_close+0x8c) [0xf76bccd0]
[linpc4:11565] [ 6] /usr/local/openmpi-1.5_32_cc/lib/libmpi.so.1
   (mca_pml_base_close+0xc1) [0xf7654385]
[linpc4:11565] [ 7] /usr/local/openmpi-1.5_32_cc/lib/libmpi.so.1
   [0xf75f89e6]
[linpc4:11565] [ 8] /usr/local/openmpi-1.5_32_cc/lib/libmpi.so.1
   (PMPI_Finalize+0x3c) [0xf760dd4c]
[linpc4:11565] [ 9] a.out(main+0x98) [0x8048a18]
[linpc4:11565] [10] /lib/libc.so.6(__libc_start_main+0xe5) [0xf740c705]
[linpc4:11565] [11] a.out(_start+0x41) [0x8048861]
[linpc4:11565] *** End of error message ***
--
mpiexec noticed that process rank 0 with PID 11564 on node linpc4 exited
   on signal 11 (Segmentation fault).
--
2 total processes killed (some possibly by mpiexec during cleanup)
linpc4 small_prog 111


"make check" shows that one test failed.

linpc4 openmpi-1.5-Linux.x86_64.32_cc 114 grep FAIL
   log.make-check.Linux.x86_64.32_cc
FAIL: opal_path_nfs
linpc4 openmpi-1.5-Linux.x86_64.32_cc 115 grep PASS
   log.make-check.Linux.x86_64.32_cc
PASS: predefined_gap_test
PASS: dlopen_test
PASS: atomic_barrier
PASS: atomic_barrier_noinline
PASS: atomic_spinlock
PASS: atomic_spinlock_noinline
PASS: atomic_math
PASS: atomic_math_noinline
PASS: atomic_cmpset
PASS: atomic_cmpset_noinline
decode [PASSED]
PASS: opal_datatype_test
PASS: checksum
PASS: position
decode [PASSED]
PASS: ddt_test
decode [PASSED]
PASS: ddt_raw
linpc4 openmpi-1.5-Linux.x86_64.32_cc 116

I used the following command to build the package.

../openmpi-1.5/configure --prefix=/usr/local/openmpi-1.5_32_cc \
   CFLAGS="-m32" CXXFLAGS="-m32" FFLAGS="-m32" FCFLAGS="-m32" \
   CXXLDFLAGS="-m32" CPPFLAGS="" \
   LDFLAGS="-m32" \
   C_INCL_PATH="" C_INCLUDE_PATH="" CPLUS_INCLUDE_PATH="" \
   OBJC_INCLUDE_PATH="" MPICHHOME="" \
   CC="cc" CXX="CC" F77="f95" FC="f95" \
   --without-udapl --with-threads=posix --enable-mpi-threads \
   --enable-shared 
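
Siegmar's rank_size.c itself is not included in the thread; a minimal test producing the kind of output shown above (rank, size, host name and the supported MPI standard) might look like the following sketch, with illustrative names:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size, version, subversion, namelen;
    char host[MPI_MAX_PROCESSOR_NAME];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Get_processor_name(host, &namelen);
    MPI_Get_version(&version, &subversion);

    printf("I'm process %d of %d available processes running on %s.\n",
           rank, size, host);
    printf("MPI standard %d.%d is supported.\n", version, subversion);

    MPI_Finalize();
    return 0;
}

A program of this shape exercises nothing beyond init, the rank/size queries and finalize, which is consistent with the crash above showing up only inside PMPI_Finalize.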

Re: [OMPI users] Bad performance when scattering big size of data?

2010-10-05 Thread Terry Dontje

 On 10/05/2010 10:23 AM, Storm Zhang wrote:
Sorry, I should say one more thing about the 500 procs test. I tried
to run two 500-proc jobs at the same time using SGE and they run fast and
finish at the same time as the single run. So I think OpenMPI can
handle them separately very well.


For the bind-to-core, I tried to run mpirun --help but did not find the
bind-to-core info. I only see the bynode or byslot options. Are they the same as
bind-to-core? My mpirun shows version 1.3.3 but ompi_info shows 1.4.2.


No, -bynode/-byslot is for mapping, not binding.  I cannot explain the
different release versions of ompi_info and mpirun.  Have you done a
which to see where each of them is located?  Anyways, 1.3.3 does not
have any of the -bind-to-* options.


--td

Thanks a lot.

Linbao


On Mon, Oct 4, 2010 at 9:18 PM, Eugene Loh wrote:


Storm Zhang wrote:


Here is what I meant: the results for 500 procs, in fact run with
272-304 (<500) real cores, show that the program's running time
is good, which is almost five times the 100-proc time. So it can
be handled very well. Therefore I guess OpenMPI or the Rocks OS
does make use of hyperthreading to do the job. But with 600
procs, the running time is more than double that of 500
procs. I don't know why. This is my problem.
BTW, how do I use -bind-to-core? I added it as an mpirun option.
It always gives me the error "the executable 'bind-to-core' can't
be found". Isn't it like:
mpirun --mca btl_tcp_if_include eth0 -np 600  -bind-to-core
scatttest


Thanks for sending the mpirun run and error message.  That helps.

It's not recognizing the --bind-to-core option.  (Single hyphen,
as you had, should also be okay.)  Skimming through the e-mail, it
looks like you are using OMPI 1.3.2 and 1.4.2.  Did you try
--bind-to-core with both?  If I remember my version numbers,
--bind-to-core will not be recognized with 1.3.2, but should be
with 1.4.2.  Could it be that you only tried 1.3.2?

Another option is to try "mpirun --help".  Make sure that it
reports --bind-to-core.

___
users mailing list
us...@open-mpi.org 
http://www.open-mpi.org/mailman/listinfo.cgi/users



___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users



--
Oracle
Terry D. Dontje | Principal Software Engineer
Developer Tools Engineering | +1.781.442.2631
Oracle * - Performance Technologies*
95 Network Drive, Burlington, MA 01803
Email terry.don...@oracle.com 





Re: [OMPI users] [openib] segfault when using openib btl

2010-09-29 Thread Terry Dontje
In some of the testing Eloi did earlier, he disabled eager RDMA and
still saw the issue.


--td

Shamis, Pavel wrote:

Terry,
Ishai Rabinovitz is HPC team manager (I added him to CC)

Eloi,

Back to the issue. I have seen a very similar issue a long time ago on some hardware
platforms that support relaxed-ordering memory operations. If I remember
correctly, it was some IBM platform.
Do you know if relaxed memory ordering is enabled on your platform? If it is
enabled, you have to disable eager rdma.

Regards,
Pasha

On Sep 29, 2010, at 1:04 PM, Terry Dontje wrote:

Pasha, do you by any chance know who at Mellanox might be responsible for OMPI 
working?

--td

Eloi Gaudry wrote:
 Hi Nysal, Terry,
Thanks for your input on this issue.
I'll follow your advice. Do you know any Mellanox developer I could discuss this with,
preferably someone who has spent some time inside the openib btl?

Regards,
Eloi

On 29/09/2010 06:01, Nysal Jan wrote:
Hi Eloi,
We discussed this issue during the weekly developer meeting & there were no 
further suggestions, apart from checking the driver and firmware levels. The 
consensus was that it would be better if you could take this up directly with your 
IB vendor.

Regards
--Nysal
___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users


--

Terry D. Dontje | Principal Software Engineer
Developer Tools Engineering | +1.781.442.2631
Oracle - Performance Technologies
95 Network Drive, Burlington, MA 01803
Email terry.don...@oracle.com




___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users
  



--
Oracle
Terry D. Dontje | Principal Software Engineer
Developer Tools Engineering | +1.781.442.2631
Oracle * - Performance Technologies*
95 Network Drive, Burlington, MA 01803
Email terry.don...@oracle.com



Re: [OMPI users] [openib] segfault when using openib btl

2010-09-29 Thread Terry Dontje
Pasha, do you by any chance know who at Mellanox might be responsible 
for OMPI working?


--td

Eloi Gaudry wrote:

 Hi Nysal, Terry,
Thanks for your input on this issue.
I'll follow your advice. Do you know any Mellanox developer I could
discuss this with, preferably someone who has spent some time inside the
openib btl?


Regards,
Eloi

On 29/09/2010 06:01, Nysal Jan wrote:

Hi Eloi,
We discussed this issue during the weekly developer meeting & there 
were no further suggestions, apart from checking the driver and 
firmware levels. The consensus was that it would be better if you 
could take this up directly with your IB vendor.


Regards
--Nysal

___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users



--
Oracle
Terry D. Dontje | Principal Software Engineer
Developer Tools Engineering | +1.781.442.2631
Oracle * - Performance Technologies*
95 Network Drive, Burlington, MA 01803
Email terry.don...@oracle.com 



Re: [OMPI users] [openib] segfault when using openib btl

2010-09-27 Thread Terry Dontje
Ok, there were no 0-value tags in your files.  Are you running this with
no eager RDMA?  If not, can you set the following options: "-mca 
btl_openib_use_eager_rdma 0 -mca btl_openib_max_eager_rdma 0 -mca 
btl_openib_flags 1".


thanks,

--td

Eloi Gaudry wrote:

Terry,

Please find enclosed the requested check outputs (using -output-filename 
stdout.tag.null option).
I'm displaying frag->hdr->tag here.

Eloi

On Monday 27 September 2010 16:29:12 Terry Dontje wrote:
  

Eloi, sorry can you print out frag->hdr->tag?

Unfortunately from your last email I think it will still all have
non-zero values.
If that ends up being the case then there must be something odd with the
descriptor pointer to the fragment.

--td

Eloi Gaudry wrote:


Terry,

Please find enclosed the requested check outputs (using -output-filename
stdout.tag.null option).

For information, Nysal in his first message referred to
ompi/mca/pml/ob1/pml_ob1_hdr.h and said that the hdr->tag value was wrong on
the receiving side.
#define MCA_PML_OB1_HDR_TYPE_MATCH (MCA_BTL_TAG_PML + 1)
#define MCA_PML_OB1_HDR_TYPE_RNDV  (MCA_BTL_TAG_PML + 2)
#define MCA_PML_OB1_HDR_TYPE_RGET  (MCA_BTL_TAG_PML + 3)
#define MCA_PML_OB1_HDR_TYPE_ACK   (MCA_BTL_TAG_PML + 4)
#define MCA_PML_OB1_HDR_TYPE_NACK  (MCA_BTL_TAG_PML + 5)
#define MCA_PML_OB1_HDR_TYPE_FRAG  (MCA_BTL_TAG_PML + 6)
#define MCA_PML_OB1_HDR_TYPE_GET   (MCA_BTL_TAG_PML + 7)
#define MCA_PML_OB1_HDR_TYPE_PUT   (MCA_BTL_TAG_PML + 8)
#define MCA_PML_OB1_HDR_TYPE_FIN   (MCA_BTL_TAG_PML + 9)
and in ompi/mca/btl/btl.h
#define MCA_BTL_TAG_PML 0x40
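
To make that concrete, here is a minimal standalone sketch (not Open MPI source, just the values quoted above) showing the hdr->tag range the ob1 PML can legitimately produce, and hence why a received tag of 0 is never a valid PML message type:

#include <stdio.h>

/* Values copied from the defines quoted above; everything else is illustrative. */
#define MCA_BTL_TAG_PML 0x40
#define MCA_PML_OB1_HDR_TYPE_MATCH (MCA_BTL_TAG_PML + 1)
#define MCA_PML_OB1_HDR_TYPE_FIN   (MCA_BTL_TAG_PML + 9)

int main(void)
{
    int tag;
    /* The ob1 header types span MCA_BTL_TAG_PML + 1 .. + 9, i.e. 0x41 to 0x49. */
    for (tag = MCA_PML_OB1_HDR_TYPE_MATCH; tag <= MCA_PML_OB1_HDR_TYPE_FIN; tag++)
        printf("valid ob1 hdr->tag: 0x%02x\n", tag);
    /* A received tag of 0x00 falls outside this range, which is why the PML
       callback for such a fragment is never invoked. */
    return 0;
}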

Eloi

On Monday 27 September 2010 14:36:59 Terry Dontje wrote:
  

I am thinking of checking the value of *frag->hdr right before the return
in the post_send function in ompi/mca/btl/openib/btl_openib_endpoint.h.
It is line 548 in the trunk
https://svn.open-mpi.org/source/xref/ompi-trunk/ompi/mca/btl/openib/btl_
ope nib_endpoint.h#548

--td

Eloi Gaudry wrote:


Hi Terry,

Do you have any patch that I could apply to be able to do so ? I'm
remotely working on a cluster (with a terminal) and I cannot use any
parallel debugger or sequential debugger (with a call to xterm...). I
can track frag->hdr->tag value in
ompi/mca/btl/openib/btl_openib_component.c::handle_wc in the
SEND/RDMA_WRITE case, but this is all I can think of alone.

You'll find a stacktrace (receive side) in this thread (10th or 11th
message) but it might be pointless.

Regards,
Eloi

On Monday 27 September 2010 11:43:55 Terry Dontje wrote:
  

So it sounds like coalescing is not your issue and that the problem
has something to do with the queue sizes.  It would be helpful if we
could detect the hdr->tag == 0 issue on the sending side and get at
least a stack trace.  There is something really odd going on here.

--td

Eloi Gaudry wrote:


Hi Terry,

I'm sorry to say that I might have missed a point here.

I've lately been relaunching all previously failing computations with
the message coalescing feature being switched off, and I saw the same
hdr->tag=0 error several times, always during a collective call
(MPI_Comm_create, MPI_Allreduce and MPI_Broadcast, so far). And as
soon as I switched to the peer queue option I was previously using
(--mca btl_openib_receive_queues P,65536,256,192,128 instead of using
--mca btl_openib_use_message_coalescing 0), all computations ran
flawlessly.

As for the reproducer, I've already tried to write something but I
haven't succeeded so far at reproducing the hdr->tag=0 issue with it.

Eloi

On 24/09/2010 18:37, Terry Dontje wrote:
  

Eloi Gaudry wrote:


Terry,

You were right, the error indeed seems to come from the message
coalescing feature. If I turn it off using the "--mca
btl_openib_use_message_coalescing 0", I'm not able to observe the
"hdr->tag=0" error.

There are some trac requests associated with very similar errors
(https://svn.open-mpi.org/trac/ompi/search?q=coalescing) but they
are all closed (except
https://svn.open-mpi.org/trac/ompi/ticket/2352 that might be
related), aren't they ? What would you suggest Terry ?
  

Interesting, though it looks to me like the segv in ticket 2352
would have happened on the send side instead of the receive side
like you have.  As to what to do next it would be really nice to
have some sort of reproducer that we can try and debug what is
really going on.  The only other thing to do without a reproducer
is to inspect the code on the send side to figure out what might
make it generate a 0 hdr->tag.  Or maybe instrument the send side
to stop when it is about ready to send a 0 hdr->tag and see if we
can see how the code got there.

I might have some cycles to look at this Monday.

--td

    

Eloi

On Friday 24 September 2010 16:00:26 Terry Dontje wrote:
  

Eloi Gaudry wrote:


Terry,

No, I 

Re: [OMPI users] [openib] segfault when using openib btl

2010-09-27 Thread Terry Dontje

Eloi, sorry can you print out frag->hdr->tag?

Unfortunately from your last email I think it will still all have 
non-zero values.
If that ends up being the case then there must be something odd with the 
descriptor pointer to the fragment.


--td

Eloi Gaudry wrote:

Terry,

Please find enclosed the requested check outputs (using -output-filename 
stdout.tag.null option).

For information, Nysal in his first message referred to
ompi/mca/pml/ob1/pml_ob1_hdr.h and said that the hdr->tag value was wrong on
the receiving side.
#define MCA_PML_OB1_HDR_TYPE_MATCH (MCA_BTL_TAG_PML + 1)
#define MCA_PML_OB1_HDR_TYPE_RNDV  (MCA_BTL_TAG_PML + 2)
#define MCA_PML_OB1_HDR_TYPE_RGET  (MCA_BTL_TAG_PML + 3)
 #define MCA_PML_OB1_HDR_TYPE_ACK   (MCA_BTL_TAG_PML + 4)
#define MCA_PML_OB1_HDR_TYPE_NACK  (MCA_BTL_TAG_PML + 5)
#define MCA_PML_OB1_HDR_TYPE_FRAG  (MCA_BTL_TAG_PML + 6)
#define MCA_PML_OB1_HDR_TYPE_GET   (MCA_BTL_TAG_PML + 7)
 #define MCA_PML_OB1_HDR_TYPE_PUT   (MCA_BTL_TAG_PML + 8)
#define MCA_PML_OB1_HDR_TYPE_FIN   (MCA_BTL_TAG_PML + 9)
and in ompi/mca/btl/btl.h 
#define MCA_BTL_TAG_PML 0x40


Eloi

On Monday 27 September 2010 14:36:59 Terry Dontje wrote:
  

I am thinking of checking the value of *frag->hdr right before the return
in the post_send function in ompi/mca/btl/openib/btl_openib_endpoint.h.
It is line 548 in the trunk
https://svn.open-mpi.org/source/xref/ompi-trunk/ompi/mca/btl/openib/btl_ope
nib_endpoint.h#548

--td

Eloi Gaudry wrote:


Hi Terry,

Do you have any patch that I could apply to be able to do so ? I'm
remotely working on a cluster (with a terminal) and I cannot use any
parallel debugger or sequential debugger (with a call to xterm...). I
can track frag->hdr->tag value in
ompi/mca/btl/openib/btl_openib_component.c::handle_wc in the
SEND/RDMA_WRITE case, but this is all I can think of alone.

You'll find a stacktrace (receive side) in this thread (10th or 11th
message) but it might be pointless.

Regards,
Eloi

On Monday 27 September 2010 11:43:55 Terry Dontje wrote:
  

So it sounds like coalescing is not your issue and that the problem has
something to do with the queue sizes.  It would be helpful if we could
detect the hdr->tag == 0 issue on the sending side and get at least a
stack trace.  There is something really odd going on here.

--td

Eloi Gaudry wrote:


Hi Terry,

I'm sorry to say that I might have missed a point here.

I've lately been relaunching all previously failing computations with
the message coalescing feature being switched off, and I saw the same
hdr->tag=0 error several times, always during a collective call
(MPI_Comm_create, MPI_Allreduce and MPI_Broadcast, so far). And as
soon as I switched to the peer queue option I was previously using
(--mca btl_openib_receive_queues P,65536,256,192,128 instead of using
--mca btl_openib_use_message_coalescing 0), all computations ran
flawlessly.

As for the reproducer, I've already tried to write something but I
haven't succeeded so far at reproducing the hdr->tag=0 issue with it.

Eloi

On 24/09/2010 18:37, Terry Dontje wrote:
  

Eloi Gaudry wrote:


Terry,

You were right, the error indeed seems to come from the message
coalescing feature. If I turn it off using the "--mca
btl_openib_use_message_coalescing 0", I'm not able to observe the
"hdr->tag=0" error.

There are some trac requests associated with very similar errors
(https://svn.open-mpi.org/trac/ompi/search?q=coalescing) but they are
all closed (except https://svn.open-mpi.org/trac/ompi/ticket/2352
that might be related), aren't they ? What would you suggest Terry ?
  

Interesting, though it looks to me like the segv in ticket 2352 would
have happened on the send side instead of the receive side like you
have.  As to what to do next it would be really nice to have some
sort of reproducer that we can try and debug what is really going
on.  The only other thing to do without a reproducer is to inspect
the code on the send side to figure out what might make it generate
a 0 hdr->tag.  Or maybe instrument the send side to stop when it is
about ready to send a 0 hdr->tag and see if we can see how the code
got there.

I might have some cycles to look at this Monday.

--td

    

Eloi

On Friday 24 September 2010 16:00:26 Terry Dontje wrote:
  

Eloi Gaudry wrote:


Terry,

No, I haven't tried any other values than P,65536,256,192,128 yet.

The reason why is quite simple. I've been reading and reading again
this thread to understand the btl_openib_receive_queues meaning and
I can't figure out why the default values seem to induce the hdr->tag=0 issue
(http://www.open-mpi.org/community/lists/users/2009/01/7808.php).


Yeah, the size of the fragments and number of them really should not
cause this issue.  So I too am a little perplexed about it.

 

Re: [OMPI users] [openib] segfault when using openib btl

2010-09-24 Thread Terry Dontje

Eloi Gaudry wrote:

Terry,

You were right, the error indeed seems to come from the message coalescing 
feature.
If I turn it off using the "--mca btl_openib_use_message_coalescing 0", I'm not able to 
observe the "hdr->tag=0" error.

There are some trac requests associated with very similar errors (https://svn.open-mpi.org/trac/ompi/search?q=coalescing) but they are all closed (except https://svn.open-mpi.org/trac/ompi/ticket/2352
that might be related), aren't they ? What would you suggest Terry ?


  
Interesting, though it looks to me like the segv in ticket 2352 would 
have happened on the send side instead of the receive side like you 
have.  As to what to do next it would be really nice to have some sort 
of reproducer that we can try and debug what is really going on.  The 
only other thing to do without a reproducer is to inspect the code on 
the send side to figure out what might make it generate a 0 hdr->tag.
Or maybe instrument the send side to stop when it is about ready to send 
a 0 hdr->tag and see if we can see how the code got there.


I might have some cycles to look at this Monday.

--td

Eloi


On Friday 24 September 2010 16:00:26 Terry Dontje wrote:
  

Eloi Gaudry wrote:


Terry,

No, I haven't tried any other values than P,65536,256,192,128 yet.

The reason why is quite simple. I've been reading and reading again this
thread to understand the btl_openib_receive_queues meaning and I can't
figure out why the default values seem to induce the hdr->tag=0 issue
(http://www.open-mpi.org/community/lists/users/2009/01/7808.php).


Yeah, the size of the fragments and number of them really should not
cause this issue.  So I too am a little perplexed about it.



Do you think that the default shared received queue parameters are
erroneous for this specific Mellanox card ? Any help on finding the
proper parameters would actually be much appreciated.
  

I don't necessarily think it is the queue size for a specific card but
more so the handling of the queues by the BTL when using certain sizes.
At least that is one gut feel I have.

In my mind, the tag being 0 means either something below OMPI is polluting
the data fragment or OMPI's internal protocol is somehow getting messed
up.  I can imagine (no empirical data here) the queue sizes could change
how the OMPI protocol sets things up.  Another thing may be the
coalescing feature in the openib BTL which tries to gang multiple
messages into one packet when resources are running low.   I can see
where changing the queue sizes might affect the coalescing.  So, it
might be interesting to turn off the coalescing.  You can do that by
setting "--mca btl_openib_use_message_coalescing 0" in your mpirun line.

If that doesn't solve the issue then obviously there must be something
else going on :-).

Note, the reason I am interested in this is I am seeing a similar error
condition (hdr->tag == 0) on a development system.  Though my failing
case fails with np=8 using the connectivity test program, which is mainly
point to point, and there is not a significant amount of data transfer
going on either.

--td



Eloi

On Friday 24 September 2010 14:27:07 you wrote:
  

That is interesting.  So does the number of processes affect your runs
any?  The times I've seen hdr->tag be 0, it has usually been due to protocol
issues.  The tag should never be 0.  Have you tried receive_queue
settings other than the default and the one you mention?

I wonder whether a combination of the two receive queues causes a
failure or not.  Something like

P,128,256,192,128:P,65536,256,192,128

I am wondering if it is the first queuing definition causing the issue
or possibly the SRQ defined in the default.

--td

Eloi Gaudry wrote:


Hi Terry,

The messages being sent/received can be of any size, but the error
seems to happen more often with small messages (such as an int being
broadcast or allreduced). The failing communication differs from one
run to another, but some spots are more likely to fail than
others. And as far as I know, they are always located next to a
small-message communication (an int being broadcast, for instance).
Other typical message sizes are >10k but can be very much larger.


I've been checking the hca being used; it's from Mellanox (with
vendor_part_id=26428). There are no receive_queues parameters associated
with it.

 $ cat share/openmpi/mca-btl-openib-device-params.ini as well:
[...]

  # A.k.a. ConnectX
  [Mellanox Hermon]
  vendor_id = 0x2c9,0x5ad,0x66a,0x8f1,0x1708,0x03ba,0x15b3
  vendor_part_id =
  25408,25418,25428,26418,26428,25448,26438,26448,26468,26478,26488
  use_eager_rdma = 1
  mtu = 2048
  max_inline_data = 128

[..]

$ ompi_info --param btl openib --parsable | grep receive_queues

 mca:btl:openib:param:btl_openib_receive_queues:value:P,128,256,192,128
 :S ,2048,256

Re: [OMPI users] [openib] segfault when using openib btl

2010-09-24 Thread Terry Dontje
That is interesting.  So does the number of processes affect your runs
any?  The times I've seen hdr->tag be 0, it has usually been due to protocol
issues.  The tag should never be 0.  Have you tried receive_queue
settings other than the default and the one you mention?


I wonder whether a combination of the two receive queues causes a
failure or not.  Something like


P,128,256,192,128:P,65536,256,192,128

I am wondering if it is the first queuing definition causing the issue or 
possibly the SRQ defined in the default.

--td

Eloi Gaudry wrote:

Hi Terry,

The messages being sent/received can be of any size, but the error seems to
happen more often with small messages (such as an int being broadcast or
allreduced).
The failing communication differs from one run to another, but some spots are more likely to fail than others. And as far as I know, they are always located next to a small-message
communication (an int being broadcast, for instance). Other typical message sizes are >10k but can be very much larger.


I've been checking the hca being used; it's from Mellanox (with
vendor_part_id=26428). There are no receive_queues parameters associated with it.
 $ cat share/openmpi/mca-btl-openib-device-params.ini as well:
[...]
  # A.k.a. ConnectX
  [Mellanox Hermon]
  vendor_id = 0x2c9,0x5ad,0x66a,0x8f1,0x1708,0x03ba,0x15b3
  vendor_part_id = 
25408,25418,25428,26418,26428,25448,26438,26448,26468,26478,26488
  use_eager_rdma = 1
  mtu = 2048
  max_inline_data = 128
[..]

$ ompi_info --param btl openib --parsable | grep receive_queues
 
mca:btl:openib:param:btl_openib_receive_queues:value:P,128,256,192,128:S,2048,256,128,32:S,12288,256,128,32:S,65536,256,128,32
 mca:btl:openib:param:btl_openib_receive_queues:data_source:default value
 mca:btl:openib:param:btl_openib_receive_queues:status:writable
 mca:btl:openib:param:btl_openib_receive_queues:help:Colon-delimited, comma 
delimited list of receive queues: P,4096,8,6,4:P,32768,8,6,4
 mca:btl:openib:param:btl_openib_receive_queues:deprecated:no

I was wondering if these parameters (automatically computed at openib btl init, from what I understood) were not incorrect in some way, and I plugged in some other values: "P,65536,256,192,128" (someone on
the list used those values when encountering a different issue). Since then, I haven't been able to observe the segfault (occurring as hdr->tag = 0 in btl_openib_component.c:2881) yet.


Eloi


/home/pp_fr/st03230/EG/Softs/openmpi-custom-1.4.2/bin/

On Thursday 23 September 2010 23:33:48 Terry Dontje wrote:
  

Eloi, I am curious about your problem.  Can you tell me what size of job
it is?  Does it always fail on the same bcast,  or same process?

Eloi Gaudry wrote:


Hi Nysal,

Thanks for your suggestions.

I'm now able to get the checksum computed and redirected to stdout,
thanks (I forgot the  "-mca pml_base_verbose 5" option, you were right).
I haven't been able to observe the segmentation fault (with hdr->tag=0)
so far (when using pml csum) but I'll let you know when I am.

I've got two other questions, which may be related to the error observed:

1/ does the maximum number of MPI_Comm objects that can be handled by OpenMPI
somehow depend on the btl being used (i.e. if I'm using openib, may I
use the same number of MPI_Comm objects as with tcp)? Is there something
like MPI_COMM_MAX in OpenMPI?

2/ the segfaults only appear during an MPI collective call, with very
small messages (one int being broadcast, for instance); I followed
the guidelines given at
http://icl.cs.utk.edu/open-mpi/faq/?category=openfabrics#ib-small-message-rdma
but the debug build of OpenMPI asserts if I use a min-size different
from 255. Anyway, if I deactivate eager_rdma, the segfaults remain.
Does the openib btl handle very small messages differently (even with
eager_rdma deactivated) than tcp?
  

Others on the list: does coalescing happen with non-eager_rdma?  If so,
then that would possibly be one difference between the openib btl and
tcp aside from the actual protocol used.



 is there a way to make sure that large messages and small messages are
 handled the same way ?
  

Do you mean so they all look like eager messages?  How large of messages
are we talking about here 1K, 1M or 10M?

--td



Regards,
Eloi

On Friday 17 September 2010 17:57:17 Nysal Jan wrote:
  

Hi Eloi,
Create a debug build of OpenMPI (--enable-debug) and while running with
the csum PML add "-mca pml_base_verbose 5" to the command line. This
will print the checksum details for each fragment sent over the wire.
I'm guessing it didn't catch anything because the BTL failed. The
checksum verification is done in the PML, which the BTL calls via a
callback function. In your case the PML callback is never called
because the hdr->tag is invalid. So enabling checksum tracing also
might not be of much use. Is it the first Bcast that fails or the nth
Bcast and what is the message size? I'm not 
