Re: [OMPI devel] SM BTL hang issue

2007-08-31 Thread Terry D. Dontje

Scott Atchley wrote:

> Terry,
>
> Are you testing on Linux? If so, which kernel?

No, I am running into issues on Solaris, but Ollie's run of the test code
on Linux seems to work fine.

--td

> See the patch to iperf to handle kernel 2.6.21 and the issue that
> they had with usleep(0):
>
> http://dast.nlanr.net/Projects/Iperf2.0/patch-iperf-linux-2.6.21.txt
>
> Scott



Re: [OMPI devel] SM BTL hang issue

2007-08-31 Thread Scott Atchley

Terry,

Are you testing on Linux? If so, which kernel?

See the patch to iperf to handle kernel 2.6.21 and the issue that  
they had with usleep(0):


http://dast.nlanr.net/Projects/Iperf2.0/patch-iperf-linux-2.6.21.txt

Scott
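
[The usleep(0) issue referenced above comes down to what a "zero-length"
sleep actually does: around kernel 2.6.21, with high-resolution timers,
a usleep(0) that used to return almost immediately can reportedly turn
into a real timer sleep. Below is a minimal illustrative sketch of the
back-off strategies discussed in this thread -- not the iperf patch
itself; progress_loop and poll_fn are placeholder names:]

#include <sched.h>
#include <unistd.h>

/* Spin on a progress function, backing off when there is no work.
 * The choice of back-off is the crux of this thread:
 *   sched_yield() - gives up the CPU only if another runnable thread
 *                   wants it; otherwise it returns immediately and the
 *                   loop keeps spinning (the thread's working theory
 *                   for the Solaris behavior).
 *   usleep(0)     - semantics reportedly shifted around Linux 2.6.21
 *                   with hrtimers (the iperf issue above).
 *   usleep(500)   - always blocks, so a descheduled peer gets to run
 *                   (the workaround Terry tries below).
 */
static int progress_loop(int (*poll_fn)(void))
{
    int events;

    while ((events = poll_fn()) == 0)
        sched_yield();          /* swap in usleep(500) to compare */

    return events;
}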

On Aug 31, 2007, at 1:36 PM, Terry D. Dontje wrote:


> Ok, I have an update to this issue.  I believe there is an
> implementation difference of sched_yield between Linux and Solaris.  If
> I change the sched_yield in opal_progress to be a usleep(500) then my
> program completes quite quickly.  I have sent a few questions to a
> Solaris engineer and hopefully will get some useful information.
>
> That being said, CT-6's implementation also used yield calls (note this
> actually is what sched_yield reduces down to in Solaris) and we did not
> see the same degradation issue as with Open MPI.  I believe the reason
> is that CT-6's SM implementation is not calling CT-6's version of
> progress recursively and forcing all the unexpected messages to be read
> in before continuing.  CT-6 also has natural flow control in its
> implementation (i.e., it has a fixed-size FIFO for eager messages).
>
> I believe both of these characteristics keep CT-6 from being completely
> killed by the yield differences.
>
> --td






Re: [OMPI devel] SM BTL hang issue

2007-08-31 Thread Terry D. Dontje
Ok, I have an update to this issue.  I believe there is an 
implementation difference of sched_yield between Linux and Solaris.  If 
I change the sched_yield in opal_progress to be a usleep(500) then my 
program completes quite quickly.  I have sent a few questions to a 
Solaris engineer and hopefully will get some useful information.
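
[As a back-of-the-envelope check on that experiment, the two idle
strategies can be timed directly. This is a self-contained sketch with
illustrative names and numbers, not the opal_progress code: on a quiet
machine sched_yield() typically costs well under a microsecond per
call, while usleep(500) cannot return in less than roughly 500 usec,
which is why swapping it in changes the schedule so dramatically.]

#include <sched.h>
#include <stdio.h>
#include <time.h>
#include <unistd.h>

/* Average cost per call of an idle-loop strategy, in nanoseconds.
 * (On older systems, link with -lrt for clock_gettime.) */
static double ns_per_call(void (*op)(void), int iters)
{
    struct timespec t0, t1;
    int i;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (i = 0; i < iters; i++)
        op();
    clock_gettime(CLOCK_MONOTONIC, &t1);
    return ((t1.tv_sec - t0.tv_sec) * 1e9
            + (t1.tv_nsec - t0.tv_nsec)) / iters;
}

static void do_yield(void) { sched_yield(); }
static void do_sleep(void) { usleep(500); }

int main(void)
{
    printf("sched_yield(): %8.0f ns/call\n", ns_per_call(do_yield, 100000));
    printf("usleep(500):   %8.0f ns/call\n", ns_per_call(do_sleep, 1000));
    return 0;
}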


That being said, CT-6's implementation also used yield calls (note this
actually is what sched_yield reduces down to in Solaris) and we did not
see the same degradation issue as with Open MPI.  I believe the reason
is that CT-6's SM implementation is not calling CT-6's version of
progress recursively and forcing all the unexpected messages to be read
in before continuing.  CT-6 also has natural flow control in its
implementation (i.e., it has a fixed-size FIFO for eager messages).
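
[To make the flow-control point concrete, here is a minimal sketch of
a fixed-size eager FIFO -- an assumed structure, not the CT-6 or Open
MPI source. Once the ring is full, the push fails and the sender has no
choice but to go drain incoming traffic before retrying; that refusal
is the back-pressure the thread suspects is missing on the SM BTL path
when its fifo overflows.]

#include <stdbool.h>
#include <stddef.h>

#define EAGER_SLOTS 64            /* fixed capacity for eager fragments */

struct eager_fifo {
    void  *slot[EAGER_SLOTS];
    size_t head, tail;            /* producer and consumer cursors */
};

/* Returns false when the ring is full; the caller must drain receives
 * before retrying -- that is the "natural flow control". */
static bool fifo_push(struct eager_fifo *f, void *frag)
{
    size_t next = (f->head + 1) % EAGER_SLOTS;

    if (next == f->tail)
        return false;             /* full: back-pressure the sender */
    f->slot[f->head] = frag;
    f->head = next;
    return true;
}

static void *fifo_pop(struct eager_fifo *f)
{
    void *frag;

    if (f->tail == f->head)
        return NULL;              /* empty */
    frag = f->slot[f->tail];
    f->tail = (f->tail + 1) % EAGER_SLOTS;
    return frag;
}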


I believe both of these characteristics keep CT-6 from being completely
killed by the yield differences.


--td


Li-Ta Lo wrote:

> On Thu, 2007-08-30 at 12:45 -0400, terry.don...@sun.com wrote:
>> Li-Ta Lo wrote:
>>> On Thu, 2007-08-30 at 12:25 -0400, terry.don...@sun.com wrote:
>>>> Li-Ta Lo wrote:
>>>>> On Wed, 2007-08-29 at 14:06 -0400, Terry D. Dontje wrote:
>>>>>> hmmm, interesting since my version doesn't abort at all.
>>>>>
>>>>> Some problem with fortran compiler/language binding? My C translation
>>>>> doesn't have any problem.
>>>>>
>>>>> [ollie@exponential ~]$ mpirun -np 4 a.out 10
>>>>> Target duration (seconds): 10.00, #of msgs: 50331, usec per msg:
>>>>> 198.684707
>>>>
>>>> Did you oversubscribe?  I found np=10 on an 8-core system clogged
>>>> things up sufficiently.
>>>
>>> Yea, I used np 10 on a 2 proc, 2 hyper-thread system (total 4 threads).
>>
>> Is this using Linux?
>
> Yes.
>
> Ollie
 





Re: [OMPI devel] SM BTL hang issue

2007-08-30 Thread Terry.Dontje

Li-Ta Lo wrote:

> On Thu, 2007-08-30 at 12:25 -0400, terry.don...@sun.com wrote:
>> Li-Ta Lo wrote:
>>> On Wed, 2007-08-29 at 14:06 -0400, Terry D. Dontje wrote:
>>>> hmmm, interesting since my version doesn't abort at all.
>>>
>>> Some problem with fortran compiler/language binding? My C translation
>>> doesn't have any problem.
>>>
>>> [ollie@exponential ~]$ mpirun -np 4 a.out 10
>>> Target duration (seconds): 10.00, #of msgs: 50331, usec per msg:
>>> 198.684707
>>
>> Did you oversubscribe?  I found np=10 on an 8-core system clogged
>> things up sufficiently.
>
> Yea, I used np 10 on a 2 proc, 2 hyper-thread system (total 4 threads).

Is this using Linux?

--td

> Ollie
 





Re: [OMPI devel] SM BTL hang issue

2007-08-30 Thread Li-Ta Lo
On Thu, 2007-08-30 at 12:25 -0400, terry.don...@sun.com wrote:
> Li-Ta Lo wrote:
> 
> >On Wed, 2007-08-29 at 14:06 -0400, Terry D. Dontje wrote:
> >  
> >
> >>hmmm, interesting since my version doesn't abort at all.
> >>
> >>
> >>
> >
> >
> >Some problem with fortran compiler/language binding? My C translation 
> >doesn't have any problem.
> >
> >[ollie@exponential ~]$ mpirun -np 4 a.out 10
> >Target duration (seconds): 10.00, #of msgs: 50331, usec per msg:
> >198.684707
> >
> >  
> >
> Did you oversubscribe?  I found np=10 on an 8-core system clogged things
> up sufficiently.
> 


Yea, I used np 10 on a 2 proc, 2 hyper-thread system (total 4 threads).

Ollie
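
[Oversubscription is exactly the regime where the yield semantics
discussed above matter: with more ranks than hardware threads, a
spinning sender can keep the rank it is waiting on off the CPU. A
sketch of running the test that way on purpose, with flags assumed
from Open MPI of this era -- mpi_yield_when_idle asks idle ranks to
yield the processor:]

mpirun --mca btl self,sm --mca mpi_yield_when_idle 1 -np 10 ./a.out 10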





Re: [OMPI devel] SM BTL hang issue

2007-08-30 Thread Terry.Dontje

Li-Ta Lo wrote:

> On Wed, 2007-08-29 at 14:06 -0400, Terry D. Dontje wrote:
>> hmmm, interesting since my version doesn't abort at all.
>
> Some problem with fortran compiler/language binding? My C translation
> doesn't have any problem.
>
> [ollie@exponential ~]$ mpirun -np 4 a.out 10
> Target duration (seconds): 10.00, #of msgs: 50331, usec per msg:
> 198.684707

Did you oversubscribe?  I found np=10 on an 8-core system clogged things
up sufficiently.

--td

> Ollie

 




 




 





Re: [OMPI devel] SM BTL hang issue

2007-08-30 Thread Li-Ta Lo
On Wed, 2007-08-29 at 14:06 -0400, Terry D. Dontje wrote:
> hmmm, interesting since my version doesn't abort at all.
> 


Some problem with fortran compiler/language binding? My C translation 
doesn't have any problem.

[ollie@exponential ~]$ mpirun -np 4 a.out 10
Target duration (seconds): 10.00, #of msgs: 50331, usec per msg:
198.684707

Ollie

#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    double duration = 10, endtime;
    long nmsgs = 1;
    int keep_going = 1, rank, size;
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (size == 1) {
        fprintf(stderr, "Need at least 2 processes\n");
    } else if (rank == 0) {
        duration = strtod(argv[1], NULL);
        endtime = MPI_Wtime() + duration;

        /* Rank 0 floods rank 1 with small eager messages until the
         * time budget expires. */
        do {
            MPI_Send(&keep_going, 1, MPI_INT, 1, 0x11, MPI_COMM_WORLD);
            nmsgs += 1;
        } while (MPI_Wtime() < endtime);

        keep_going = 0;
        MPI_Send(&keep_going, 1, MPI_INT, 1, 0x11, MPI_COMM_WORLD);

        fprintf(stderr, "Target duration (seconds): %f, #of msgs: %ld, usec per msg: %f\n",
                duration, nmsgs, 1.0e6 * duration / nmsgs);
    } else {
        /* Everyone else is a stage in the bucket brigade: receive from
         * the left neighbor and forward to the right, until the
         * all-done (keep_going == 0) message arrives. */
        do {
            MPI_Recv(&keep_going, 1, MPI_INT, rank - 1, 0x11, MPI_COMM_WORLD, &status);

            if (rank == (size - 1))
                continue;

            MPI_Send(&keep_going, 1, MPI_INT, rank + 1, 0x11, MPI_COMM_WORLD);
        } while (keep_going);
    }

    MPI_Finalize();

    return 0;
}
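
[For reference, the hang Terry reported was seen when restricting runs
to the shared-memory BTL; an invocation along these lines should
reproduce that configuration. The btl selection flag is standard Open
MPI usage; the source file name here is hypothetical:]

mpicc -o a.out ring.c
mpirun --mca btl self,sm -np 6 ./a.out 10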


Re: [OMPI devel] SM BTL hang issue

2007-08-29 Thread Terry D. Dontje

hmmm, interesting since my version doesn't abort at all.

--td

Li-Ta Lo wrote:

> On Wed, 2007-08-29 at 11:36 -0400, Terry D. Dontje wrote:
>> To run the code I usually do "mpirun -np 6 a.out 10" on a 2 core
>> system.  It'll print out the following and then hang:
>>
>> Target duration (seconds): 10.00
>> # of messages sent in that time:  589207
>> Microseconds per message: 16.972
>
> I know almost nothing about FORTRAN but the stack dump told me
> it got a NULL pointer reference when accessing the "me" variable
> in the do .. while loop. How can this happen?
>
> [ollie@exponential ~]$ mpirun -np 2 a.out 100
> [exponential:22145] *** Process received signal ***
> [exponential:22145] Signal: Segmentation fault (11)
> [exponential:22145] Signal code: Address not mapped (1)
> [exponential:22145] Failing at address: (nil)
> [exponential:22145] [ 0] [0xb7f2a440]
> [exponential:22145] [ 1] a.out(MAIN__+0x54a) [0x804909e]
> [exponential:22145] [ 2] a.out(main+0x27) [0x8049127]
> [exponential:22145] [ 3] /lib/libc.so.6(__libc_start_main+0xe0)
> [0x4e75ef70]
> [exponential:22145] [ 4] a.out [0x8048aa1]
> [exponential:22145] *** End of error message ***
>
>     call MPI_Send(keep_going,1,MPI_LOGICAL,me+1,1,
>    $   MPI_COMM_WORLD,ier)
> 804909e:   8b 45 d4                mov    0xffd4(%ebp),%eax
> 80490a1:   83 c0 01                add    $0x1,%eax
>
> It is compiled with g77/g90.
>
> Ollie
 





Re: [OMPI devel] SM BTL hang issue

2007-08-29 Thread Li-Ta Lo
On Wed, 2007-08-29 at 11:36 -0400, Terry D. Dontje wrote:
> To run the code I usually do "mpirun -np 6 a.out 10" on a 2 core 
> system.  It'll print out the following and then hang:
> Target duration (seconds): 10.00
> # of messages sent in that time:  589207
> Microseconds per message: 16.972
> 


I know almost nothing about FORTRAN but the stack dump told me
it got a NULL pointer reference when accessing the "me" variable
in the do .. while loop. How can this happen?

[ollie@exponential ~]$ mpirun -np 2 a.out 100
[exponential:22145] *** Process received signal ***
[exponential:22145] Signal: Segmentation fault (11)
[exponential:22145] Signal code: Address not mapped (1)
[exponential:22145] Failing at address: (nil)
[exponential:22145] [ 0] [0xb7f2a440]
[exponential:22145] [ 1] a.out(MAIN__+0x54a) [0x804909e]
[exponential:22145] [ 2] a.out(main+0x27) [0x8049127]
[exponential:22145] [ 3] /lib/libc.so.6(__libc_start_main+0xe0)
[0x4e75ef70]
[exponential:22145] [ 4] a.out [0x8048aa1]
[exponential:22145] *** End of error message ***

call MPI_Send(keep_going,1,MPI_LOGICAL,me+1,1,
 $   MPI_COMM_WORLD,ier)
 804909e:   8b 45 d4                mov    0xffd4(%ebp),%eax
 80490a1:   83 c0 01                add    $0x1,%eax

It is compiled with g77/g90.

Ollie




Re: [OMPI devel] SM BTL hang issue

2007-08-29 Thread Richard Graham
If you are going to look at it, I will not bother with this.

Rich


On 8/29/07 10:47 AM, "Gleb Natapov"  wrote:

> On Wed, Aug 29, 2007 at 10:46:06AM -0400, Richard Graham wrote:
>> Gleb,
>>   Are you looking at this?
> Not today. And I need the code to reproduce the bug. Is this possible?
> 
>> 
>> Rich
>> 
>> 
>> On 8/29/07 9:56 AM, "Gleb Natapov"  wrote:
>> 
>>> On Wed, Aug 29, 2007 at 04:48:07PM +0300, Gleb Natapov wrote:
>>>> Is this trunk or 1.2?
>>> Oops. I should read more carefully :) This is trunk.
>>>
>>>> On Wed, Aug 29, 2007 at 09:40:30AM -0400, Terry D. Dontje wrote:
>>>>> I have a program that does a simple bucket brigade of sends and receives
>>>>> where rank 0 is the start and repeatedly sends to rank 1 until a certain
>>>>> amount of time has passed and then it sends an all-done packet.
>>>>>
>>>>> Running this under np=2 always works.  However, when I run with greater
>>>>> than 2 using only the SM btl the program usually hangs and one of the
>>>>> processes has a long stack that has a lot of the following 3 calls in it:
>>>>>
>>>>>  [25] opal_progress(), line 187 in "opal_progress.c"
>>>>>  [26] mca_btl_sm_component_progress(), line 397 in "btl_sm_component.c"
>>>>>  [27] mca_bml_r2_progress(), line 110 in "bml_r2.c"
>>>>>
>>>>> When stepping through the ompi_fifo_write_to_head routine it looks like
>>>>> the fifo has overflowed.
>>>>>
>>>>> I am wondering if what is happening is rank 0 has sent a bunch of
>>>>> messages that have exhausted the resources such that one of the middle
>>>>> ranks, which is in the process of sending, cannot send and therefore
>>>>> never gets to the point of trying to receive the messages from rank 0?
>>>>>
>>>>> Is the above a possible scenario or are messages periodically bled off
>>>>> the SM BTL's fifos?
>>>>>
>>>>> Note, I have seen np=3 pass sometimes and I can get it to pass reliably
>>>>> if I raise the shared memory space used by the BTL.  This is using the
>>>>> trunk.
>>>>>
>>>>> --td
>>>>
>>>> --
>>>> Gleb.
>>>
>>> --
>>> Gleb.
>
> --
> Gleb.



Re: [OMPI devel] SM BTL hang issue

2007-08-29 Thread Richard Graham
Gleb,
  Are you looking at this?

Rich


On 8/29/07 9:56 AM, "Gleb Natapov"  wrote:

> On Wed, Aug 29, 2007 at 04:48:07PM +0300, Gleb Natapov wrote:
>> Is this trunk or 1.2?
> Oops. I should read more carefully :) This is trunk.
> 
>> 
>> On Wed, Aug 29, 2007 at 09:40:30AM -0400, Terry D. Dontje wrote:
>>> I have a program that does a simple bucket brigade of sends and receives
>>> where rank 0 is the start and repeatedly sends to rank 1 until a certain
>>> amount of time has passed and then it sends an all-done packet.
>>> 
>>> Running this under np=2 always works.  However, when I run with greater
>>> than 2 using only the SM btl the program usually hangs and one of the
>>> processes has a long stack that has a lot of the following 3 calls in it:
>>> 
>>>  [25] opal_progress(), line 187 in "opal_progress.c"
>>>   [26] mca_btl_sm_component_progress(), line 397 in "btl_sm_component.c"
>>>   [27] mca_bml_r2_progress(), line 110 in "bml_r2.c"
>>> 
>>> When stepping through the ompi_fifo_write_to_head routine it looks like
>>> the fifo has overflowed.
>>> 
>>> I am wondering if what is happening is rank 0 has sent a bunch of
>>> messages that have exhausted the resources such that one of the middle
>>> ranks, which is in the process of sending, cannot send and therefore
>>> never gets to the point of trying to receive the messages from rank 0?
>>> 
>>> Is the above a possible scenario or are messages periodically bled off
>>> the SM BTL's fifos?
>>> 
>>> Note, I have seen np=3 pass sometimes and I can get it to pass reliably
>>> if I raise the shared memory space used by the BTL.   This is using the
>>> trunk.
>>> 
>>> 
>>> --td
>>> 
>>> 
>> 
>> --
>> Gleb.
> 
> --
> Gleb.



Re: [OMPI devel] SM BTL hang issue

2007-08-29 Thread Terry D. Dontje

Trunk.

--td
Gleb Natapov wrote:

> Is this trunk or 1.2?
>
> On Wed, Aug 29, 2007 at 09:40:30AM -0400, Terry D. Dontje wrote:
>> I have a program that does a simple bucket brigade of sends and receives
>> where rank 0 is the start and repeatedly sends to rank 1 until a certain
>> amount of time has passed and then it sends an all-done packet.
>>
>> Running this under np=2 always works.  However, when I run with greater
>> than 2 using only the SM btl the program usually hangs and one of the
>> processes has a long stack that has a lot of the following 3 calls in it:
>>
>>  [25] opal_progress(), line 187 in "opal_progress.c"
>>  [26] mca_btl_sm_component_progress(), line 397 in "btl_sm_component.c"
>>  [27] mca_bml_r2_progress(), line 110 in "bml_r2.c"
>>
>> When stepping through the ompi_fifo_write_to_head routine it looks like
>> the fifo has overflowed.
>>
>> I am wondering if what is happening is rank 0 has sent a bunch of
>> messages that have exhausted the resources such that one of the middle
>> ranks, which is in the process of sending, cannot send and therefore
>> never gets to the point of trying to receive the messages from rank 0?
>>
>> Is the above a possible scenario or are messages periodically bled off
>> the SM BTL's fifos?
>>
>> Note, I have seen np=3 pass sometimes and I can get it to pass reliably
>> if I raise the shared memory space used by the BTL.  This is using the
>> trunk.
>>
>> --td
>
> --
> Gleb.
 





Re: [OMPI devel] SM BTL hang issue

2007-08-29 Thread Gleb Natapov
Is this trunk or 1.2?

On Wed, Aug 29, 2007 at 09:40:30AM -0400, Terry D. Dontje wrote:
> I have a program that does a simple bucket brigade of sends and receives 
> where rank 0 is the start and repeatedly sends to rank 1 until a certain 
> amount of time has passed and then it sends an all-done packet.
> 
> Running this under np=2 always works.  However, when I run with greater 
> than 2 using only the SM btl the program usually hangs and one of the 
> processes has a long stack that has a lot of the following 3 calls in it:
> 
>  [25] opal_progress(), line 187 in "opal_progress.c"
>   [26] mca_btl_sm_component_progress(), line 397 in "btl_sm_component.c"
>   [27] mca_bml_r2_progress(), line 110 in "bml_r2.c"
> 
> When stepping through the ompi_fifo_write_to_head routine it looks like 
> the fifo has overflowed.
> 
> I am wondering if what is happening is rank 0 has sent a bunch of
> messages that have exhausted the resources such that one of the middle
> ranks, which is in the process of sending, cannot send and therefore
> never gets to the point of trying to receive the messages from rank 0?
> 
> Is the above a possible scenario or are messages periodically bled off 
> the SM BTL's fifos?
> 
> Note, I have seen np=3 pass sometimes and I can get it to pass reliably 
> if I raise the shared memory space used by the BTL.   This is using the 
> trunk.
> 
> 
> --td
> 
> 

--
Gleb.