Re: [m5-dev] syscall tracer

2009-01-31 Thread nathan binkert
I think I'm missing the high-level overview of what this thread is
about, but I do have one comment.

> To help find that seg fault, I'd suggest going into the kernel and placing
> m5_exit() calls in arch/x86/mm/fault.c in do_page_fault(), where the
> kernel sends a SIGSEGV to user code and that'll help track down when it
> happens the first time, and reduce the cruft that happens after the program
> halts, like printing Segmentation Fault to the serial port.  I'm not sure
> a syscall tracer will help with finding the segfault, I have a feeling it's
> all in glibc and some weird corner case in the ISA of the M5 implementation
> that is causing the bug.  This version of glibc causing the fault does work
> on real hardware, correct?

I'm not sure what this segfault is all about, but an alternative to
m5_exit is that we could create a breakpoint instruction and allow it
to take a parameter as a breakpoint number.  This way, M5 could
generally run ignoring the breakpoints, unless you turn them on.  The
breakpoint could also be a string which refers to a traceflag.  An
abuse of the traceflags mechanism, but not so bad.
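
As a rough illustration of the numbered-breakpoint idea (not M5 code; the
class and method names below are hypothetical), the simulator could keep a
set of enabled breakpoint numbers and treat the instruction as a no-op
unless its number is in that set:

```python
class BreakpointTable:
    """Hypothetical model of a breakpoint pseudo-instruction with a numeric ID."""

    def __init__(self):
        self.enabled = set()   # breakpoint numbers the user has turned on
        self.hits = []         # (number, pc) for every breakpoint that fired

    def enable(self, number):
        self.enabled.add(number)

    def disable(self, number):
        self.enabled.discard(number)

    def execute(self, number, pc):
        """Called when the guest executes the breakpoint instruction.

        Returns True if simulation should stop, False if the instruction
        should be treated as a no-op.
        """
        if number not in self.enabled:
            return False
        self.hits.append((number, pc))
        return True


table = BreakpointTable()
table.enable(3)
ignored = table.execute(1, pc=0x1000)   # breakpoint 1 is off: no-op
stopped = table.execute(3, pc=0x2000)   # breakpoint 3 is on: stop here
```

Keying the set by string instead of by number would give the traceflag
variant mentioned above.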

Another thing that could be nice for debugging is hacking ld.so a
little bit so that it passes the symbols to M5 with the add_symbol
instruction.  You'd of course not want all instances of ld.so running
to do this, so you'd want to be able to have a set of programs or PIDs
or something that this happened on.
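
A minimal sketch of that filtering, assuming the loader knows the program
name and PID it is running in. Everything here is hypothetical: the real
hack would live in ld.so and issue the add_symbol instruction, which this
model only records in a list.

```python
class SymbolForwarder:
    """Hypothetical filter deciding which processes report symbols to the simulator."""

    def __init__(self, programs=(), pids=()):
        self.programs = set(programs)   # program names to report for
        self.pids = set(pids)           # PIDs to report for
        self.sent = []                  # stands in for issuing add_symbol

    def wanted(self, program, pid):
        return program in self.programs or pid in self.pids

    def load_object(self, program, pid, base, symbols):
        """Called once per loaded object; symbols maps name -> offset."""
        if not self.wanted(program, pid):
            return 0
        for name, offset in sorted(symbols.items()):
            # A real ld.so hack would execute the add_symbol
            # pseudo-instruction here; this sketch only records the call.
            self.sent.append((base + offset, name))
        return len(symbols)


fwd = SymbolForwarder(programs={"init"})
fwd.load_object("init", 1, 0x400000, {"main": 0x850, "_start": 0x100})
fwd.load_object("sh", 7, 0x400000, {"main": 0x900})   # filtered out
```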

Just some thoughts.

  Nate
___
m5-dev mailing list
m5-dev@m5sim.org
http://m5sim.org/mailman/listinfo/m5-dev


Re: [m5-dev] syscall tracer

2009-01-31 Thread Gabe Black

> Another thing that could be nice for debugging is hacking ld.so a
> little bit so that it passes the symbols to M5 with the add_symbol
> instruction.  You'd of course not want all instances of ld.so running
> to do this, so you'd want to be able to have a set of programs or PIDs
> or something that this happened on.
   
It would be nice to hack the kernel's ELF loader to do something
similar as well. Both could provide a sort of overlay symbol table
you'd pull in and out on task switches. I like these ideas, but
unfortunately the binaries I'm dealing with in my disk image don't seem
to have any symbols in the first place.
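
One way to picture the overlay, as a sketch only (the class, the addresses
and the symbol names are all made up for illustration): a base table that
is always visible, plus a per-task table that is swapped on each task
switch.

```python
class OverlaySymbols:
    """Hypothetical base symbol table plus a per-task overlay."""

    def __init__(self, kernel_symbols):
        self.base = dict(kernel_symbols)   # always visible (addr -> name)
        self.per_task = {}                 # pid -> {addr: name}
        self.current = None                # pid whose overlay is active

    def register(self, pid, symbols):
        self.per_task[pid] = dict(symbols)

    def task_switch(self, pid):
        # Swapping the overlay only changes which dict is consulted.
        self.current = pid

    def lookup(self, addr):
        overlay = self.per_task.get(self.current, {})
        return overlay.get(addr, self.base.get(addr))


syms = OverlaySymbols({0xffffffff80000000: "kernel_entry"})
syms.register(1, {0x400850: "init_main"})
syms.register(2, {0x400850: "sh_main"})

syms.task_switch(1)
first = syms.lookup(0x400850)
syms.task_switch(2)
second = syms.lookup(0x400850)
```

The same user address resolves to a different name depending on which task
is current, while kernel addresses fall through to the base table.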

Gabe


Re: [m5-dev] syscall tracer

2009-01-31 Thread nathan binkert
> It would be nice to hack the kernel's ELF loader to do something
> similar as well. Both could provide a sort of overlay symbol table
> you'd pull in and out on task switches. I like these ideas, but
> unfortunately the binaries I'm dealing with in my disk image don't seem
> to have any symbols in the first place.
Unfortunately, that's a common problem.  You'd have to change your
linker options and rebuild everything with that on, though I'd have
thought with Gentoo that that would be easy.

 Nate


Re: [m5-dev] syscall tracer

2009-01-30 Thread Steve Reinhardt
I've been following your thoughts but haven't replied because I don't have
any particular ideas beyond what you've said.  The main things I would say
are to build off the existing SE-mode data structures and make the tracing
work with SE mode as well, but you've already covered those.  You'll see
that there is some cheesy SE-mode tracing that simply prints the first 3
args as ints (or something like that), which was basically as much effort as
I was willing to put into it.  It does sound very useful though.

One idea that just came to mind is that it might be worth looking at the
strace source to see if there are any ideas there you can use.  Better to
look at the BSD one rather than the Linux one of course in case there is
code that you want to re-use directly.

Steve

On Thu, Jan 29, 2009 at 11:54 PM, Gabe Black gbl...@eecs.umich.edu wrote:

> Anybody? I was thinking one option would be to extend SyscallDesc to
> have a gatherArgs() function and a describe() function. describe() would
> just generate a string which would be like disassembly but for syscalls.
> Then, every syscall would have a nice line in SE with the syscall
> traceflag, but it would also automatically be available in FS for my
> tracer. gatherArgs would just populate member variables (for instance)
> with the syscall arguments so they aren't pulled in for both the
> description and the actual syscall. It wouldn't be necessary, especially
> considering that syscalls aren't a very performance sensitive operation
> for us.
>
> Gabe


Re: [m5-dev] syscall tracer

2009-01-30 Thread Geoffrey Blake
What exactly are you trying to do with a syscall tracer, Gabe? I
thought your original problem was GLIBC doing some bizarre pointer
encryption/decryption and getting it wrong, leading to a
segmentation fault?

To help find that seg fault, I'd suggest going into the kernel and placing
m5_exit() calls in arch/x86/mm/fault.c in do_page_fault(), where the
kernel sends a SIGSEGV to user code and that'll help track down when it
happens the first time, and reduce the cruft that happens after the program
halts, like printing Segmentation Fault to the serial port.  I'm not sure
a syscall tracer will help with finding the segfault, I have a feeling it's
all in glibc and some weird corner case in the ISA of the M5 implementation
that is causing the bug.  This version of glibc causing the fault does work
on real hardware, correct?

Geoff

From: m5-dev-boun...@m5sim.org [mailto:m5-dev-boun...@m5sim.org] On Behalf
Of Steve Reinhardt
Sent: Friday, January 30, 2009 10:12 AM
To: M5 Developer List
Subject: Re: [m5-dev] syscall tracer

 

I've been following your thoughts but haven't replied because I don't have
any particular ideas beyond what you've said.  The main things I would say
are to build off the existing SE-mode data structures and make the tracing
work with SE mode as well, but you've already covered those.  You'll see
that there is some cheesy SE-mode tracing that simply prints the first 3
args as ints (or something like that), which was basically as much effort as
I was willing to put into it.  It does sound very useful though.

One idea that just came to mind is that it might be worth looking at the
strace source to see if there are any ideas there you can use.  Better to
look at the BSD one rather than the Linux one of course in case there is
code that you want to re-use directly.

Steve


Re: [m5-dev] syscall tracer

2009-01-30 Thread gblack
Quoting Geoffrey Blake bla...@umich.edu:

> What exactly are you trying to do with a syscall tracer, Gabe? I
> thought your original problem was GLIBC doing some bizarre pointer
> encryption/decryption and getting it wrong, leading to a
> segmentation fault?

That is the base problem. What I was doing manually was that since the binary is
dynamically linked, I was searching for system calls in the Exec trace to find
where the dynamic linker had mmapped slabs of init and libc. That was/is the
only way to know that something like 0x28faff0ac850 really goes with 0x850 in
the linker (or init or libc? I forget). There are patterns, but they're hard to
remember and they were actually changing when I tried the small TLB size.
Address space randomization seems to be sensitive to small changes in
execution, so unless I was just changing the trace flags I'd have to figure out
the mappings all over again. Then it occurred to me there was an easy way to
automate part of that process which is where this part came from.
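
The automation described above could look roughly like the sketch below:
record each mmap seen in the syscall trace, then resolve a runtime address
back to an offset in the mapped file. The class name, the mapping base, its
length and the "ld.so" label are assumptions for illustration; only the
address 0x28faff0ac850 and the offset 0x850 come from the message.

```python
import bisect


class MappingTable:
    """Records mmap results seen in a syscall trace and resolves addresses."""

    def __init__(self):
        self.starts = []     # sorted mapping start addresses
        self.entries = {}    # start -> (length, file_offset, label)

    def record_mmap(self, start, length, file_offset, label):
        if start not in self.entries:
            bisect.insort(self.starts, start)
        self.entries[start] = (length, file_offset, label)

    def resolve(self, addr):
        """Return (label, offset within the file) or None if unmapped."""
        i = bisect.bisect_right(self.starts, addr) - 1
        if i < 0:
            return None
        start = self.starts[i]
        length, file_offset, label = self.entries[start]
        if addr >= start + length:
            return None
        return label, file_offset + (addr - start)


maps = MappingTable()
# Assumed mapping: one page of the dynamic linker at a randomized base.
maps.record_mmap(0x28faff0ac000, 0x1000, 0, "ld.so")
hit = maps.resolve(0x28faff0ac850)
miss = maps.resolve(0x28faff0ad000)
```

Because the table is rebuilt from each run's trace, address space
randomization changing the bases between runs no longer matters.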




> To help find that seg fault, I'd suggest going into the kernel and placing
> m5_exit() calls in arch/x86/mm/fault.c in do_page_fault(), where the
> kernel sends a SIGSEGV to user code and that'll help track down when it
> happens the first time, and reduce the cruft that happens after the program
> halts, like printing Segmentation Fault to the serial port.  I'm not sure
> a syscall tracer will help with finding the segfault, I have a feeling it's
> all in glibc and some weird corner case in the ISA of the M5 implementation
> that is causing the bug.  This version of glibc causing the fault does work
> on real hardware, correct?

I'm assuming it works on real hardware. The image I'm using is the starter file
system for Gentoo, so if it didn't work there'd be a lot of annoyed people.
What I did was add a trace flag for all faults in x86. Since there are no TLB
miss faults, or at least those work differently, the only ones that should show
up are the page faults. That let me home in on the exact instruction at fault,
and then through a lot of painful pattern matching find the C that spawned it.

I think there's some sort of address mapping issue because the thing that set
the pointer that's being garbled appears to be constructing a linked list for
the heap manager. Obviously those two things shouldn't land on top of each
other. Fortunately, I found the code that manipulates that as well.
Unfortunately I forgot where it was, so I'll have to dumpster dive again. What
I'm going to do is to find the actual address used in both cases, fortunately a
statically defined global address in the faulting case, and try to figure out
which one is in the wrong for using it. It's possible they both are right and
the kernel is mistakenly mapping the same physical page to both addresses. In
that case it'll be a little more fun to figure out, but at least I'll be
working with debug information and letting gdb do the heavy lifting. It's also
possible that the kernel is mapping everything right and my page table walker
or TLB code is tripping things up.
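
To check the "same physical page mapped at both addresses" possibility, one
option is to log every translation and group virtual pages by physical
page. The sketch below assumes 4 KiB pages and a list of (virtual,
physical) address pairs collected from the page table walker or TLB fills;
the function name and the sample addresses are made up.

```python
from collections import defaultdict

PAGE_SHIFT = 12  # assumption: 4 KiB pages


def find_aliases(translations):
    """Group virtual pages by the physical page they map to.

    translations is an iterable of (virtual_addr, physical_addr) pairs.
    Returns {physical_page: sorted list of virtual pages} for every
    physical page reached from more than one virtual page.
    """
    by_phys = defaultdict(set)
    for vaddr, paddr in translations:
        by_phys[paddr >> PAGE_SHIFT].add(vaddr >> PAGE_SHIFT)
    return {p: sorted(v) for p, v in by_phys.items() if len(v) > 1}


# Made-up trace: two unrelated virtual pages land on physical page 0x7f3.
trace = [(0x601040, 0x7f3040), (0x7fff1000, 0x7f3ff8), (0x400000, 0x100000)]
aliases = find_aliases(trace)
```

Shared mappings legitimately alias physical pages, so a hit is only a lead
to follow up, not proof of a bug.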

Gabe



Re: [m5-dev] syscall tracer

2009-01-29 Thread Gabe Black
Actually the timer goes off and the UART gets checked manually, and 
everything has passed through by that point so execution continues. I 
think there's supposed to be an interrupt or something for when the UART 
finishes, so there may be an issue with that never showing up.

Gabe Black wrote:
> As a vote in favor of the usefulness of something like this, I think
> I've identified -a- problem with it. There's a close system call that's
> called on file descriptor 0 which is connected to the UART. The kernel
> starts waiting for the buffer to drain, but it never does for some
> reason and it just wakes up every now and then to give it another shot.
> I don't know if this has anything to do with the segfault, but I'd guess
> this is partially from me implementing all interrupts like they're edge
> triggered rather than level triggered like the UART apparently expects.
> If the UART starts driving its interrupt line while they're disabled for
> some reason, that will get lost and some count could end up out of
> balance. I'm going to be working on my somewhat hacky interrupt wiring
> scheme to make it less hacky and something I'm willing to push, and in
> the process I'll probably try to fix this too.
>
> To the folks with more kernel experience than me, does that sound
> like a reasonable theory? Is there something else that might make it
> wait forever? It seems to think it's got 7 characters in the buffer
> which seems like a very small number compared to how much output it's
> generated.
>
> Gabe
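
The edge- versus level-triggered theory quoted above can be illustrated
with a toy model. It assumes an edge-triggered implementation that drops
transitions arriving while the interrupt is masked, which is the failure
mode the message describes; the classes are hypothetical and are not M5's
interrupt code.

```python
class EdgeTriggered:
    """Only notices a low-to-high transition, and only while unmasked."""

    def __init__(self):
        self.masked = False
        self.pending = False

    def set_line(self, old, new):
        if not self.masked and not old and new:
            self.pending = True

    def unmask(self, line):
        self.masked = False   # the current line state is not consulted


class LevelTriggered:
    """Treats the interrupt as pending whenever the line is high and unmasked."""

    def __init__(self):
        self.masked = False
        self.pending = False

    def set_line(self, old, new):
        self.pending = new and not self.masked

    def unmask(self, line):
        self.masked = False
        self.pending = line   # a line that is still high is seen now


def run(controller):
    controller.masked = True
    controller.set_line(False, True)   # device raises its line while masked
    controller.unmask(True)            # line is still high at unmask time
    return controller.pending


edge_result = run(EdgeTriggered())    # the interrupt is lost
level_result = run(LevelTriggered())  # the interrupt is delivered
```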



Re: [m5-dev] syscall tracer

2009-01-29 Thread Gabe Black
Anybody? I was thinking one option would be to extend SyscallDesc to 
have a gatherArgs() function and a describe() function. describe() would 
just generate a string which would be like disassembly but for syscalls. 
Then, every syscall would have a nice line in SE with the syscall 
traceflag, but it would also automatically be available in FS for my 
tracer. gatherArgs() would just populate member variables (for instance)
with the syscall arguments so they aren't fetched twice, once for the
description and once for the actual syscall. That wouldn't be strictly
necessary, especially considering that syscalls aren't a very
performance-sensitive operation for us.
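
A sketch of the shape this could take, written in Python for brevity even
though SyscallDesc itself is C++. The class name, the arg_kinds tuple and
the read_string callback are assumptions for illustration, not the
existing interface.

```python
class SyscallDescSketch:
    """Hypothetical Python stand-in for an extended SyscallDesc."""

    def __init__(self, name, arg_kinds):
        self.name = name
        self.arg_kinds = arg_kinds   # e.g. ("str", "int"): one entry per argument
        self.args = None             # filled in by gather_args()

    def gather_args(self, regs, read_string):
        """Read each argument once and cache it in a member variable.

        regs: argument register values in order.
        read_string: callable standing in for a functional read of a C string.
        """
        self.args = []
        for kind, value in zip(self.arg_kinds, regs):
            if kind == "str":
                self.args.append(read_string(value))
            else:
                self.args.append(value)
        return self.args

    def describe(self):
        """Format the cached arguments, like disassembly for a syscall."""
        parts = []
        for kind, value in zip(self.arg_kinds, self.args):
            parts.append('"%s"' % value if kind == "str" else "%#x" % value)
        return "%s(%s)" % (self.name, ", ".join(parts))


strings = {0x7000: "/dev/ttyS0"}   # fake guest memory holding one string
desc = SyscallDescSketch("open", ("str", "int"))
desc.gather_args([0x7000, 0x2, 0xdead], strings.__getitem__)
line = desc.describe()
```

Both the trace line and the syscall implementation would then read
desc.args rather than going back to the registers.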

Gabe

Gabe Black wrote:
> Actually the timer goes off and the UART gets checked manually, and
> everything has passed through by that point so execution continues. I
> think there's supposed to be an interrupt or something for when the UART
> finishes, so there may be an issue with that never showing up.



[m5-dev] syscall tracer

2009-01-28 Thread Gabe Black
Unfortunately, decreasing the TLB size to one was a red herring.
With only one entry, if an instruction or an access spans pages
(which takes amazingly long to happen), the TLB thrashes back and forth 
in that one entry and never gets anywhere. Now what I'm trying to do to 
get a better handle on the flow of the program is to implement a tracer, 
like the one you get with the Exec traceflag, but that prints out the 
parameters and return value of system calls. I have a simple version of 
this hacked in already, but there are probably four things that prevent 
it from working as well as it could. Three of those are mapping syscall 
numbers to names, knowing how many arguments there are, and knowing 
which are string pointers so the string can be gathered with functional 
accesses. The fourth is identifying when you're entering or exiting a 
syscall in the first place. For the first three, I'm simply
ignoring the issues and printing the raw syscall number and the values
in each of the argument registers. For the last one, I'm doing a
hardcoded check for the mnemonics of the relevant instructions as they
go by, which you can imagine is hardly efficient or flexible.
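
For the string-pointer part, a sketch of gathering a NUL-terminated string
with reads that never cross a page boundary, next to the raw fallback that
prints only the number and registers. The read_bytes callback stands in
for a functional access; it, the 4096-byte page size and the fake memory
are assumptions for illustration.

```python
PAGE_SIZE = 4096  # assumed page size


def read_cstring(read_bytes, addr, limit=256):
    """Gather a NUL-terminated string using page-bounded reads.

    read_bytes(addr, size) stands in for a functional (untimed) memory
    access; it must return exactly size bytes.
    """
    out = bytearray()
    while len(out) < limit:
        # Never let a single read cross a page boundary.
        room = PAGE_SIZE - (addr % PAGE_SIZE)
        chunk = read_bytes(addr, min(room, limit - len(out)))
        end = chunk.find(b"\0")
        if end >= 0:
            out += chunk[:end]
            break
        out += chunk
        addr += len(chunk)
    return out.decode("latin-1")


def raw_line(number, regs):
    """Fallback when nothing is known: raw syscall number and registers."""
    return "syscall %d(%s)" % (number, ", ".join("%#x" % r for r in regs))


# Fake guest memory: a string that straddles the first page boundary.
memory = bytearray(2 * PAGE_SIZE)
memory[PAGE_SIZE - 4:PAGE_SIZE + 7] = b"/etc/passwd"
reads = []


def read_bytes(addr, size):
    reads.append((addr, size))
    return bytes(memory[addr:addr + size])


path = read_cstring(read_bytes, PAGE_SIZE - 4)
```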

Some of the information I'd like to get at, like syscall
names at least, is already present in m5 but is only compiled in for SE.
Other bits of information like the number and types of arguments would 
be harder to get at. I was thinking for system call and system return 
instructions the IsSyscall and IsReturn and/or IsCall could be used to 
generically identify the instructions of interest. Does anyone have any 
thoughts about whether this is worthwhile and how best to get the 
information it would need gathered together?
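
A sketch of the flag-based check, replacing the mnemonic comparison. The
flag names are the ones mentioned above; treating the return instruction
as carrying both IsSyscall and IsReturn, the Inst record and the "num" and
"ret" register names are assumptions made for this example.

```python
from collections import namedtuple

# Hypothetical instruction record: mnemonic plus the set of flags it carries.
Inst = namedtuple("Inst", "mnemonic flags")


def classify(inst):
    """Return 'enter', 'exit' or None using flags rather than mnemonics."""
    if "IsSyscall" not in inst.flags:
        return None
    if "IsReturn" in inst.flags:
        return "exit"
    return "enter"


def trace(stream):
    """stream yields (inst, regs); regs is a dict snapshot at that point."""
    lines, pending = [], None
    for inst, regs in stream:
        kind = classify(inst)
        if kind == "enter":
            pending = regs["num"]          # remember which syscall is in flight
        elif kind == "exit" and pending is not None:
            lines.append("syscall %d = %d" % (pending, regs["ret"]))
            pending = None
    return lines


stream = [
    (Inst("mov", set()), {}),
    (Inst("syscall", {"IsSyscall", "IsCall"}), {"num": 3}),
    (Inst("add", set()), {}),
    (Inst("sysret", {"IsSyscall", "IsReturn"}), {"ret": 0}),
]
lines = trace(stream)
```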

Gabe