Re: question on profiling code

2003-03-21 Thread Bruce Evans
Long ago, On Mon, 17 Feb 2003, Jake Burkholder wrote:

[I quoted everything since this thread is so old.  My main point is at
the end.  It restarts and older part of this thread.]

  Can you explain how fuswintr() and suswintr() work on sparc64's?  They
  seem to cause traps if the user counter is not mapped, and I can't see
  where the traps are handled.  ia64/trap.c has a comment about where these
  traps are handled, but has dummies for fuswintr() and suswintr() so the
  traps never occur.  Depending on traps in fast interrupt handlers is
  a bug IMO.  It extends the scope of the fast interrupt handler to the
  trap handler, and it is difficult to limit this scope and verify the
  locking for it.

 Ok.  Sparc64 uses lazy trap handling, similar to how I saw you'd done
 in your sys.dif.  The functions that access user space are delimited by
 labels, and in trap and trap_pfault we check to see if the pc is inside
 of the labels.  fuswintr and suswintr and bracketed by fs_nofault_intr_begin
 and fs_nofault_intr_end, which trap_pfault checks for specifically before
 doing much of anything:

 if (ctx != TLB_CTX_KERNEL) {
 if ((tf-tf_tstate  TSTATE_PRIV) != 0 
 (tf-tf_tpc = (u_long)fs_nofault_intr_begin 
  tf-tf_tpc = (u_long)fs_nofault_intr_end)) {
 tf-tf_tpc = (u_long)fs_fault;
 tf-tf_tnpc = tf-tf_tpc + 4;
 return (0);
 }
   ... handle fault

 ctx != TLB_CTX_KERNEL is akin to va  VM_MAXUSER_ADDRESS (the address spaces
 overlap on sparc64 so we can only rely on tlb context numbers).  Note that
 the range bracketed by the fs_nofault_intr_* is included in the fs_nofault
 range, which handles alignment or data access exception faults.

I see.  This is much cleaner than my version, since I use lots of equality
checks instead of the range check, and need labels for them all.

 It does take some special care in trap() and trap_pfault() not to access
 important data, or acquire any locks before this test.  Non-trivial
 interrupts are still masked here, which buys us something.  Probably the
 vmspace pointer should not be counted on in this context, but I don't think
 it will ever be invalid for the current process, especially since the original
 interrupt occured in usermode.

The i386 trap() wants to enable interrupts early since this reduces its
complications, but this is wrong even for its existing lazy trap handling.
My version has even more complications in an attempt to only mask interrupts
earlier in the non-lazy trap handling cases.

 The only locking that's required that I can see is that PS_PROFIL not be
 set when the profiling buffer is invalid.  But all that will happen is that
 attempts to update the profiling buffer will be ignored.  Technically the
 process should get a signal but addupc_task doesn't check the return value
 of copyin/out (oops).

addupc_task still doesn't check.  It wouldn't hurt for profil() to
check the buffer up front using useracc().  But silly processes could
still unmap the buffer, so copyin/out should check too.  Strictly, we
need enough locking to prevent invalid profiling buffers being used
once the process has seen that they have been invalidated.  Even silly
processes can reasonably expect to use their profiling buffers for
something else after they have turned off profiling.

Back to an older part of this thread:

 On Mon, 17 Feb 2003, Jake Burkholder wrote:

  Apparently, On Mon, Feb 17, 2003 at 05:35:09PM +1100,
   Now there is a stronger reason: clock interrupt handling is fast,
   so it normally returns to user mode without going near ast(), and the
   counter is not updated until the next non-fast interrupt or syscall.
 
  In freebsd fast interrupts do handle asts, on i386 they return through
  doreti

 Oops.

Actually, this doesn't always help.  ast() is only called on return to user
mode.  doreti doesn't check TDF_NEEDRESCHED on return to kernel mode.  Thus
the following code in ithread_schedule() only works right if the interrupt
occurred in user mode:

%   if (TD_AWAITING_INTR(td)) {
%   CTR2(KTR_INTR, %s: setrunqueue %d, __func__, p-p_pid);
%   TD_CLR_IWAIT(td);
%   setrunqueue(td);
%   if (do_switch 
%   (ctd-td_critnest == 1) ) {
%   KASSERT((TD_IS_RUNNING(ctd)),
%   (ithread_schedule: Bad state for curthread.));
%   ctd-td_proc-p_stats-p_ru.ru_nivcsw++;
%   if (ctd-td_kse-ke_flags  KEF_IDLEKSE)
%   ctd-td_state = TDS_CAN_RUN; /* XXXKSE */
%   mi_switch();
%   } else {
%   curthread-td_flags |= TDF_NEEDRESCHED;
^^^
%   }
%   } else {

There is only a problem when we didn't switch immediately 

Re: question on profiling code

2003-02-17 Thread Bruce Evans
On Mon, 17 Feb 2003, Jake Burkholder wrote:

 Apparently, On Mon, Feb 17, 2003 at 05:35:09PM +1100,
   Bruce Evans said words to the effect of;
  ...  Also, ast() doesn't
  have access to the frame, and there is no macro like CLKF_PC() for
  general frames.  This probably doesn't matter much, since signals are
  rare and the hitting on the signal handler's pc would be perfectly
  correct if the profiling interrupt occurred an instant later.

 There are macros for accessing trapframes, like the ones for clockframe,
 TRAPF_PC etc.

Oops.  ast() always had a frame, and it needed and got some frame access
macros when it was MI'ized.

  Now there is a stronger reason: clock interrupt handling is fast,
  so it normally returns to user mode without going near ast(), and the
  counter is not updated until the next non-fast interrupt or syscall.

 In freebsd fast interrupts do handle asts, on i386 they return through
 doreti

Oops.

 (you may consider this a bug and have removed it in your version).

Indeed I have (my version never had it).  This doesn't result cause
the problem that I thought we had (of not updating the user profiling
counter as soon as possible) since my hardclock() and statclock()
aren't fast interrupt handlers.  However, using ithreads causes similar
problems stashing the pc.  My sched_ithd() saves the pc and other
relevant state for the the ithreads to look at later, much like
addupc_intr() saves things.

 I see no reason not to just use the pc in the trapframe passed to ast,
 even in the case of signals they won't be posted until after addupc_task
 is called.

I think this would work even in my version.  The pc for a clock interrupt
in kernel mode is more interesting in my version, but the pc for a clock
interrupt from user mode must be the same one as in the user's trapframe
except for the special case of signal handling.

Can you explain how fuswintr() and suswintr() work on sparc64's?  They
seem to cause traps if the user counter is not mapped, and I can't see
where the traps are handled.  ia64/trap.c has a comment about where these
traps are handled, but has dummies for fuswintr() and suswintr() so the
traps never occur.  Depending on traps in fast interrupt handlers is
a bug IMO.  It extends the scope of the fast interrupt handler to the
trap handler, and it is difficult to limit this scope and verify the
locking for it.

Bruce


To Unsubscribe: send mail to [EMAIL PROTECTED]
with unsubscribe freebsd-current in the body of the message



Re: question on profiling code

2003-02-17 Thread Bruce Evans
On Sun, 16 Feb 2003, Julian Elischer wrote:

 On Mon, 17 Feb 2003, Bruce Evans wrote:

  On Sun, 16 Feb 2003, Julian Elischer wrote:
 
   In addupc_intr, if the increment cannot be done immediatly, the addres
   to increment the count for is stored and the increment is done later at
   ast or userret() time...
 
  Note that cannot be done immediatly is always except on sparc64's
  under FreeBSD, since incrementing the counter immediately is only
  implemented on sparc64's.

 Care to explain this statement?
 It doesn't corelate what I see in addupc_int()
 which is Machine Independent.

addupc_intr() calls the MD functions fuswintr() and suswintr() to do most
of the work (if possible), and these functions are dummies which find it
impossible to do the work on all arches except sparc64.

    Also, ast() doesn't
  have access to the frame,

 funny, it uses it..

  and there is no macro like CLKF_PC() for
  general frames.

 They seem to be the same. Macro or no macro, ast and userret can
 certainly access the return address.

I misremembered this.  See another reply for more details.

  Now there is a stronger reason: clock interrupt handling is fast,
  so it normally returns to user mode without going near ast(), and the
  counter is not updated until the next non-fast interrupt or syscall.
  This might not happen until another profiling interrupt clobbers the
  saved pc, not to mention the saved tick count (it could just increment
  the tick count so that ticks are assigned to the last saved pc instead
  of lost; currently, the tick count mostly just tracks KEF_OWEUPC so
  it need not be saved).  This was broken by SMPng (hardclock() is not
  really a fast interrupt handler but is abused as one).  However, normal
  applications probably make enough syscalls to get the right counter
  updated, provided we use the counter for the pc at fast interrupt time
  and not the counter for the pc later.

 I'm trying to fix this for multithreaded programs..
 thanks for the answer.. I hadn't considered that reason.
 I assumed that the ASTPENDIG flag would be set, and that
 the AST would happen, (the userret certainly happens).
 (does it not?)

I was wrong here too.  AST's happen soon even in the fast interrupt case
except in my version (where this is moot because fast interrupt handlers
can't schule ASTs).

As I understand it, what you want to do is remove pr_addr and pr_ticks
because these were never needed before KSE and just get in the way for
KSE (they would have to be per-thread and then their redudancy would be
more bogus).  I agree with trying this.

Brcause


To Unsubscribe: send mail to [EMAIL PROTECTED]
with unsubscribe freebsd-current in the body of the message



Re: question on profiling code

2003-02-17 Thread Jake Burkholder
 
 Can you explain how fuswintr() and suswintr() work on sparc64's?  They
 seem to cause traps if the user counter is not mapped, and I can't see
 where the traps are handled.  ia64/trap.c has a comment about where these
 traps are handled, but has dummies for fuswintr() and suswintr() so the
 traps never occur.  Depending on traps in fast interrupt handlers is
 a bug IMO.  It extends the scope of the fast interrupt handler to the
 trap handler, and it is difficult to limit this scope and verify the
 locking for it.

Ok.  Sparc64 uses lazy trap handling, similar to how I saw you'd done
in your sys.dif.  The functions that access user space are delimited by
labels, and in trap and trap_pfault we check to see if the pc is inside
of the labels.  fuswintr and suswintr and bracketed by fs_nofault_intr_begin
and fs_nofault_intr_end, which trap_pfault checks for specifically before
doing much of anything:

if (ctx != TLB_CTX_KERNEL) {
if ((tf-tf_tstate  TSTATE_PRIV) != 0 
(tf-tf_tpc = (u_long)fs_nofault_intr_begin 
 tf-tf_tpc = (u_long)fs_nofault_intr_end)) {
tf-tf_tpc = (u_long)fs_fault;
tf-tf_tnpc = tf-tf_tpc + 4;
return (0);
}
... handle fault

ctx != TLB_CTX_KERNEL is akin to va  VM_MAXUSER_ADDRESS (the address spaces
overlap on sparc64 so we can only rely on tlb context numbers).  Note that
the range bracketed by the fs_nofault_intr_* is included in the fs_nofault
range, which handles alignment or data access exception faults.

It does take some special care in trap() and trap_pfault() not to access
important data, or acquire any locks before this test.  Non-trivial
interrupts are still masked here, which buys us something.  Probably the
vmspace pointer should not be counted on in this context, but I don't think
it will ever be invalid for the current process, especially since the original
interrupt occured in usermode.

The only locking that's required that I can see is that PS_PROFIL not be
set when the profiling buffer is invalid.  But all that will happen is that
attempts to update the profiling buffer will be ignored.  Technically the
process should get a signal but addupc_task doesn't check the return value
of copyin/out (oops).

Jake

To Unsubscribe: send mail to [EMAIL PROTECTED]
with unsubscribe freebsd-current in the body of the message



Re: question on profiling code

2003-02-16 Thread Bruce Evans
On Sun, 16 Feb 2003, Julian Elischer wrote:

 In addupc_intr, if the increment cannot be done immediatly, the addres
 to increment the count for is stored and the increment is done later at
 ast or userret() time...

Note that cannot be done immediatly is always except on sparc64's
under FreeBSD, since incrementing the counter immediately is only
implemented on sparc64's.

 is there any reason that the address of the PC needs to be stored?
 why is the address from the frame at that time not useable?

 is it because the PC in the return frame may be hacked up for signals?

That's was good a reason as any.  I think the next return to user mode
is to the signal handler's context (if there is a signal to be handled).
It would be wrong to use the signal handler's pc.  Also, ast() doesn't
have access to the frame, and there is no macro like CLKF_PC() for
general frames.  This probably doesn't matter much, since signals are
rare and the hitting on the signal handler's pc would be perfectly
correct if the profiling interrupt occurred an instant later.

Now there is a stronger reason: clock interrupt handling is fast,
so it normally returns to user mode without going near ast(), and the
counter is not updated until the next non-fast interrupt or syscall.
This might not happen until unother profiling interrupt clobbers the
saved pc, not to mention the saved tick count (it could just increment
the tick count so that ticks are assigned to the last saved pc instead
of lost; currently, the tick count mostly just tracks KEF_OWEUPC so
it need not be saved).  This was broken by SMPng (hardclock() is not
really a fast interrupt handler but is abused as one).  However, normal
applications probably make enough syscalls to get the right counter
updated, provided we use the counter for the pc at fast interrupt time
and not the counter for the pc later.

Bruce


To Unsubscribe: send mail to [EMAIL PROTECTED]
with unsubscribe freebsd-current in the body of the message



Re: question on profiling code

2003-02-16 Thread Jake Burkholder
Apparently, On Mon, Feb 17, 2003 at 05:35:09PM +1100,
Bruce Evans said words to the effect of;

 On Sun, 16 Feb 2003, Julian Elischer wrote:
 
  In addupc_intr, if the increment cannot be done immediatly, the addres
  to increment the count for is stored and the increment is done later at
  ast or userret() time...
 
 Note that cannot be done immediatly is always except on sparc64's
 under FreeBSD, since incrementing the counter immediately is only
 implemented on sparc64's.
 
  is there any reason that the address of the PC needs to be stored?
  why is the address from the frame at that time not useable?
 
  is it because the PC in the return frame may be hacked up for signals?
 
 That's was good a reason as any.  I think the next return to user mode
 is to the signal handler's context (if there is a signal to be handled).
 It would be wrong to use the signal handler's pc.  Also, ast() doesn't
 have access to the frame, and there is no macro like CLKF_PC() for
 general frames.  This probably doesn't matter much, since signals are
 rare and the hitting on the signal handler's pc would be perfectly
 correct if the profiling interrupt occurred an instant later.

There are macros for accessing trapframes, like the ones for clockframe,
TRAPF_PC etc.

 
 Now there is a stronger reason: clock interrupt handling is fast,
 so it normally returns to user mode without going near ast(), and the
 counter is not updated until the next non-fast interrupt or syscall.

In freebsd fast interrupts do handle asts, on i386 they return through
doreti (you may consider this a bug and have removed it in your version).
I see no reason not to just use the pc in the trapframe passed to ast,
even in the case of signals they won't be posted until after addupc_task
is called.

Jake

To Unsubscribe: send mail to [EMAIL PROTECTED]
with unsubscribe freebsd-current in the body of the message



Re: question on profiling code

2003-02-16 Thread Julian Elischer


On Mon, 17 Feb 2003, Bruce Evans wrote:

 On Sun, 16 Feb 2003, Julian Elischer wrote:
 
  In addupc_intr, if the increment cannot be done immediatly, the addres
  to increment the count for is stored and the increment is done later at
  ast or userret() time...
 
 Note that cannot be done immediatly is always except on sparc64's
 under FreeBSD, since incrementing the counter immediately is only
 implemented on sparc64's.

Care to explain this statement?
It doesn't corelate what I see in addupc_int()
which is Machine Independent.


 
  is there any reason that the address of the PC needs to be stored?
  why is the address from the frame at that time not useable?
 
  is it because the PC in the return frame may be hacked up for signals?
 
 That's was good a reason as any.  I think the next return to user mode
 is to the signal handler's context (if there is a signal to be handled).
 It would be wrong to use the signal handler's pc.  Also, ast() doesn't
 have access to the frame,

funny, it uses it..

 and there is no macro like CLKF_PC() for
 general frames. 

They seem to be the same. Macro or no macro, ast and userret can
certainly access the return address.

 This probably doesn't matter much, since signals are
 rare and the hitting on the signal handler's pc would be perfectly
 correct if the profiling interrupt occurred an instant later.

that is true.

 
 Now there is a stronger reason: clock interrupt handling is fast,
 so it normally returns to user mode without going near ast(), and the
 counter is not updated until the next non-fast interrupt or syscall.
 This might not happen until another profiling interrupt clobbers the
 saved pc, not to mention the saved tick count (it could just increment
 the tick count so that ticks are assigned to the last saved pc instead
 of lost; currently, the tick count mostly just tracks KEF_OWEUPC so
 it need not be saved).  This was broken by SMPng (hardclock() is not
 really a fast interrupt handler but is abused as one).  However, normal
 applications probably make enough syscalls to get the right counter
 updated, provided we use the counter for the pc at fast interrupt time
 and not the counter for the pc later.

I'm trying to fix this for multithreaded programs..
thanks for the answer.. I hadn't considered that reason.
I assumed that the ASTPENDIG flag would be set, and that 
the AST would happen, (the userret certainly happens).
(does it not?)



 
 Bruce
 
 
 To Unsubscribe: send mail to [EMAIL PROTECTED]
 with unsubscribe freebsd-current in the body of the message
 


To Unsubscribe: send mail to [EMAIL PROTECTED]
with unsubscribe freebsd-current in the body of the message