Re: [OMPI users] Totalview not showing main program on startup with OpenMPI 1.3.x and 1.4.x

2011-02-11 Thread Dennis McRitchie
Hi Terry,

Someone else at the University builds the packages that I use, and we've been 
experimenting for the last few days with different openmpi build options to see 
what might be causing this.

Re the stack, I can always see the entire stack in the TV stack pane, and I can 
always click on 'main' in the stack pane and thereby make my main program's 
source code appear. I can then debug as usual. But, as you said, this is still 
no way to debug a program...

The only thing that might point the finger at OpenMPI is that the same build 
options led to different behavior when running with OpenMPI 1.2.8 vs. anything 
later. But I imagine it will turn out that whether TotalView thinks it should 
display assembler depends on the availability (or lack thereof) of OpenMPI 
symbols to TotalView.
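
One way to test the symbol-availability guess is to check whether the OpenMPI shared libraries carry DWARF debug sections at all. The following is only a sketch under assumptions: it assumes a Linux system with binutils `readelf` on the PATH, and the helper names are made up for illustration. It does not detect debug information kept in separate debuginfo files.

```python
import subprocess


def has_debug_info(section_listing: str) -> bool:
    """Return True if a `readelf -S` style listing names a .debug_info section."""
    return any(".debug_info" in line for line in section_listing.splitlines())


def library_has_debug_info(path: str) -> bool:
    """Run `readelf -S` on an ELF file and look for a DWARF .debug_info section.

    Assumes `readelf` is installed; raises CalledProcessError if it fails.
    """
    result = subprocess.run(
        ["readelf", "-S", path],
        capture_output=True,
        text=True,
        check=True,
    )
    return has_debug_info(result.stdout)
```

Calling `library_has_debug_info()` on the `libmpi.so` from a 1.2.8 install and from a 1.4.x install (the paths depend on the site) would show whether the two builds differ in this respect.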

I'll keep you posted with our progress.

Thanks for the tips.

Dennis


Re: [OMPI users] Totalview not showing main program on startup with OpenMPI 1.3.x and 1.4.x

2011-02-11 Thread Terry Dontje
Sorry, I have to ask this: did you build your latest OMPI version, not 
just the application, with the -g flag too?

IIRC, when I ran into this issue I was actually able to do stepi's and 
eventually pop up the stack; however, that is really no way to debug a 
program :-).

Unless OMPI is somehow trashing the stack, I don't see what OMPI could be 
doing to cause this type of issue.  Again, when I ran into this issue, 
known working programs still worked; I just was unable to get a full 
stack.  So it was definitely an interfacing issue between TotalView and 
the executable (or the result of how the executable and libraries were 
compiled).  Another thing I noticed was that when using Solaris Studio dbx 
I was able to see the full stack where I could not when using 
TotalView.  I am not sure whether gdb could also see the full stack, but 
it might be worth attaching gdb to a running program to see if 
you get a full stack.
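
The gdb check suggested above could be scripted roughly as follows. This is a sketch, not a tested recipe for this problem: it assumes gdb is installed and that you are permitted to attach to the process, and the helper names are hypothetical. The gdb options used (`-p`, `-batch`, `-ex`) and the `thread apply all bt` command are standard gdb features.

```python
import subprocess
from typing import List


def gdb_backtrace_command(pid: int) -> List[str]:
    """Build a gdb command line that attaches to `pid`, prints every thread's
    backtrace, and exits (batch mode detaches when gdb quits)."""
    return ["gdb", "-p", str(pid), "-batch", "-ex", "thread apply all bt"]


def dump_backtrace(pid: int) -> str:
    """Run gdb against a live process and return whatever it printed.

    Assumes gdb is on the PATH and attaching to the process is allowed.
    """
    result = subprocess.run(
        gdb_backtrace_command(pid),
        capture_output=True,
        text=True,
    )
    return result.stdout
```

If the output for one MPI rank shows frames all the way down to `main`, gdb can unwind the stack for that build.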


--td


--
Oracle
Terry D. Dontje | Principal Software Engineer
Developer Tools Engineering | +1.781.442.2631
Oracle *- Performance Technologies*
95 Network Drive, Burlington, MA 01803
Email terry.don...@oracle.com 





Re: [OMPI users] Totalview not showing main program on startup with OpenMPI 1.3.x and 1.4.x

2011-02-09 Thread Dennis McRitchie
Thanks Terry.

Unfortunately, -fno-omit-frame-pointer is already the default for the Intel 
compiler when -g is used, and I am using -g since it is necessary for 
source-level debugging. So the compiler kindly tells me that it is ignoring 
your suggested option when I specify it.  :)

Also, since I can reproduce this problem by simply changing the OpenMPI 
version, without changing the compiler version, it strikes me as being more 
likely to be an OpenMPI-related issue: 1.2.8 works, but anything later does not 
(as described below).

I have tried different versions of TotalView from 8.1 to 8.9, but all behave 
the same.

I was wondering if a change to the openmpi-totalview.tcl script might be needed?

Dennis






Re: [OMPI users] Totalview not showing main program on startup with OpenMPI 1.3.x and 1.4.x

2011-02-09 Thread Terry Dontje
This sounds like something I ran into some time ago that involved the 
compiler omitting frame pointers.  You may want to try compiling your 
code with -fno-omit-frame-pointer.  I am unsure whether you need to do 
the same when building MPI, though.
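
To audit which compile lines in a build log carry the relevant flags, a small check like the one below may help. It is a sketch with a hypothetical helper name, and the sample command in the usage note is invented for illustration. It only tests for the literal flags, so it cannot tell whether a compiler enables either behavior by default.

```python
import shlex
from typing import List, Sequence


def missing_debug_flags(
    compile_command: str,
    required: Sequence[str] = ("-g", "-fno-omit-frame-pointer"),
) -> List[str]:
    """Return the flags from `required` that do not appear literally in the
    given compile command line (variants such as -g3 are not recognized)."""
    tokens = shlex.split(compile_command)
    return [flag for flag in required if flag not in tokens]
```

For instance, `missing_debug_flags("mpicc -g -O2 -c solver.c")` reports that `-fno-omit-frame-pointer` is absent from that (made-up) command line.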


--td









[OMPI users] Totalview not showing main program on startup with OpenMPI 1.3.x and 1.4.x

2011-02-09 Thread Dennis McRitchie
Hi,

I'm encountering a strange problem and can't find any prior discussion of it 
on this mailing list.

When building and running my parallel program using any recent Intel compiler 
and OpenMPI 1.2.8, TotalView behaves entirely correctly, displaying the 
"Process mpirun is a parallel job. Do you want to stop the job now?" dialog 
box, and stopping at the start of the program. The code displayed is the source 
code of my program's function main, and the stack trace window shows that we 
are stopped in the poll function many levels "up" from my main function's call 
to MPI_Init. I can then set breakpoints, single step, etc., and the code runs 
appropriately.

But when building and running using Intel compilers with OpenMPI 1.3.x or 
1.4.x, TotalView displays the usual dialog box, and stops at the start of the 
program; but my main program's source code is *not* displayed. The stack trace 
window again shows that we are stopped in the poll function several levels "up" 
from my main function's call to MPI_Init; but this time, the code displayed is 
the assembler code for the poll function itself.

If I click on 'main' in the stack trace window, the source code for my 
program's function main is then displayed, and I can now set breakpoints, 
single step, etc. as usual.

So why is the program's source code not displayed when using 1.3.x and 1.4.x, 
but is displayed when using 1.2.8? This change in behavior is fairly confusing 
to our users, and it would be nice to have it work as it used to, if possible.

Thanks,
   Dennis

Dennis McRitchie
Computational Science and Engineering Support (CSES)
Academic Services Department
Office of Information Technology
Princeton University