Re: question on profiling code
Long ago, On Mon, 17 Feb 2003, Jake Burkholder wrote: [I quoted everything since this thread is so old. My main point is at the end. It restarts and older part of this thread.] Can you explain how fuswintr() and suswintr() work on sparc64's? They seem to cause traps if the user counter is not mapped, and I can't see where the traps are handled. ia64/trap.c has a comment about where these traps are handled, but has dummies for fuswintr() and suswintr() so the traps never occur. Depending on traps in fast interrupt handlers is a bug IMO. It extends the scope of the fast interrupt handler to the trap handler, and it is difficult to limit this scope and verify the locking for it. Ok. Sparc64 uses lazy trap handling, similar to how I saw you'd done in your sys.dif. The functions that access user space are delimited by labels, and in trap and trap_pfault we check to see if the pc is inside of the labels. fuswintr and suswintr and bracketed by fs_nofault_intr_begin and fs_nofault_intr_end, which trap_pfault checks for specifically before doing much of anything: if (ctx != TLB_CTX_KERNEL) { if ((tf-tf_tstate TSTATE_PRIV) != 0 (tf-tf_tpc = (u_long)fs_nofault_intr_begin tf-tf_tpc = (u_long)fs_nofault_intr_end)) { tf-tf_tpc = (u_long)fs_fault; tf-tf_tnpc = tf-tf_tpc + 4; return (0); } ... handle fault ctx != TLB_CTX_KERNEL is akin to va VM_MAXUSER_ADDRESS (the address spaces overlap on sparc64 so we can only rely on tlb context numbers). Note that the range bracketed by the fs_nofault_intr_* is included in the fs_nofault range, which handles alignment or data access exception faults. I see. This is much cleaner than my version, since I use lots of equality checks instead of the range check, and need labels for them all. It does take some special care in trap() and trap_pfault() not to access important data, or acquire any locks before this test. Non-trivial interrupts are still masked here, which buys us something. Probably the vmspace pointer should not be counted on in this context, but I don't think it will ever be invalid for the current process, especially since the original interrupt occured in usermode. The i386 trap() wants to enable interrupts early since this reduces its complications, but this is wrong even for its existing lazy trap handling. My version has even more complications in an attempt to only mask interrupts earlier in the non-lazy trap handling cases. The only locking that's required that I can see is that PS_PROFIL not be set when the profiling buffer is invalid. But all that will happen is that attempts to update the profiling buffer will be ignored. Technically the process should get a signal but addupc_task doesn't check the return value of copyin/out (oops). addupc_task still doesn't check. It wouldn't hurt for profil() to check the buffer up front using useracc(). But silly processes could still unmap the buffer, so copyin/out should check too. Strictly, we need enough locking to prevent invalid profiling buffers being used once the process has seen that they have been invalidated. Even silly processes can reasonably expect to use their profiling buffers for something else after they have turned off profiling. Back to an older part of this thread: On Mon, 17 Feb 2003, Jake Burkholder wrote: Apparently, On Mon, Feb 17, 2003 at 05:35:09PM +1100, Now there is a stronger reason: clock interrupt handling is fast, so it normally returns to user mode without going near ast(), and the counter is not updated until the next non-fast interrupt or syscall. In freebsd fast interrupts do handle asts, on i386 they return through doreti Oops. Actually, this doesn't always help. ast() is only called on return to user mode. doreti doesn't check TDF_NEEDRESCHED on return to kernel mode. Thus the following code in ithread_schedule() only works right if the interrupt occurred in user mode: % if (TD_AWAITING_INTR(td)) { % CTR2(KTR_INTR, %s: setrunqueue %d, __func__, p-p_pid); % TD_CLR_IWAIT(td); % setrunqueue(td); % if (do_switch % (ctd-td_critnest == 1) ) { % KASSERT((TD_IS_RUNNING(ctd)), % (ithread_schedule: Bad state for curthread.)); % ctd-td_proc-p_stats-p_ru.ru_nivcsw++; % if (ctd-td_kse-ke_flags KEF_IDLEKSE) % ctd-td_state = TDS_CAN_RUN; /* XXXKSE */ % mi_switch(); % } else { % curthread-td_flags |= TDF_NEEDRESCHED; ^^^ % } % } else { There is only a problem when we didn't switch immediately
Re: question on profiling code
On Mon, 17 Feb 2003, Jake Burkholder wrote: Apparently, On Mon, Feb 17, 2003 at 05:35:09PM +1100, Bruce Evans said words to the effect of; ... Also, ast() doesn't have access to the frame, and there is no macro like CLKF_PC() for general frames. This probably doesn't matter much, since signals are rare and the hitting on the signal handler's pc would be perfectly correct if the profiling interrupt occurred an instant later. There are macros for accessing trapframes, like the ones for clockframe, TRAPF_PC etc. Oops. ast() always had a frame, and it needed and got some frame access macros when it was MI'ized. Now there is a stronger reason: clock interrupt handling is fast, so it normally returns to user mode without going near ast(), and the counter is not updated until the next non-fast interrupt or syscall. In freebsd fast interrupts do handle asts, on i386 they return through doreti Oops. (you may consider this a bug and have removed it in your version). Indeed I have (my version never had it). This doesn't result cause the problem that I thought we had (of not updating the user profiling counter as soon as possible) since my hardclock() and statclock() aren't fast interrupt handlers. However, using ithreads causes similar problems stashing the pc. My sched_ithd() saves the pc and other relevant state for the the ithreads to look at later, much like addupc_intr() saves things. I see no reason not to just use the pc in the trapframe passed to ast, even in the case of signals they won't be posted until after addupc_task is called. I think this would work even in my version. The pc for a clock interrupt in kernel mode is more interesting in my version, but the pc for a clock interrupt from user mode must be the same one as in the user's trapframe except for the special case of signal handling. Can you explain how fuswintr() and suswintr() work on sparc64's? They seem to cause traps if the user counter is not mapped, and I can't see where the traps are handled. ia64/trap.c has a comment about where these traps are handled, but has dummies for fuswintr() and suswintr() so the traps never occur. Depending on traps in fast interrupt handlers is a bug IMO. It extends the scope of the fast interrupt handler to the trap handler, and it is difficult to limit this scope and verify the locking for it. Bruce To Unsubscribe: send mail to [EMAIL PROTECTED] with unsubscribe freebsd-current in the body of the message
Re: question on profiling code
On Sun, 16 Feb 2003, Julian Elischer wrote: On Mon, 17 Feb 2003, Bruce Evans wrote: On Sun, 16 Feb 2003, Julian Elischer wrote: In addupc_intr, if the increment cannot be done immediatly, the addres to increment the count for is stored and the increment is done later at ast or userret() time... Note that cannot be done immediatly is always except on sparc64's under FreeBSD, since incrementing the counter immediately is only implemented on sparc64's. Care to explain this statement? It doesn't corelate what I see in addupc_int() which is Machine Independent. addupc_intr() calls the MD functions fuswintr() and suswintr() to do most of the work (if possible), and these functions are dummies which find it impossible to do the work on all arches except sparc64. Also, ast() doesn't have access to the frame, funny, it uses it.. and there is no macro like CLKF_PC() for general frames. They seem to be the same. Macro or no macro, ast and userret can certainly access the return address. I misremembered this. See another reply for more details. Now there is a stronger reason: clock interrupt handling is fast, so it normally returns to user mode without going near ast(), and the counter is not updated until the next non-fast interrupt or syscall. This might not happen until another profiling interrupt clobbers the saved pc, not to mention the saved tick count (it could just increment the tick count so that ticks are assigned to the last saved pc instead of lost; currently, the tick count mostly just tracks KEF_OWEUPC so it need not be saved). This was broken by SMPng (hardclock() is not really a fast interrupt handler but is abused as one). However, normal applications probably make enough syscalls to get the right counter updated, provided we use the counter for the pc at fast interrupt time and not the counter for the pc later. I'm trying to fix this for multithreaded programs.. thanks for the answer.. I hadn't considered that reason. I assumed that the ASTPENDIG flag would be set, and that the AST would happen, (the userret certainly happens). (does it not?) I was wrong here too. AST's happen soon even in the fast interrupt case except in my version (where this is moot because fast interrupt handlers can't schule ASTs). As I understand it, what you want to do is remove pr_addr and pr_ticks because these were never needed before KSE and just get in the way for KSE (they would have to be per-thread and then their redudancy would be more bogus). I agree with trying this. Brcause To Unsubscribe: send mail to [EMAIL PROTECTED] with unsubscribe freebsd-current in the body of the message
Re: question on profiling code
Can you explain how fuswintr() and suswintr() work on sparc64's? They seem to cause traps if the user counter is not mapped, and I can't see where the traps are handled. ia64/trap.c has a comment about where these traps are handled, but has dummies for fuswintr() and suswintr() so the traps never occur. Depending on traps in fast interrupt handlers is a bug IMO. It extends the scope of the fast interrupt handler to the trap handler, and it is difficult to limit this scope and verify the locking for it. Ok. Sparc64 uses lazy trap handling, similar to how I saw you'd done in your sys.dif. The functions that access user space are delimited by labels, and in trap and trap_pfault we check to see if the pc is inside of the labels. fuswintr and suswintr and bracketed by fs_nofault_intr_begin and fs_nofault_intr_end, which trap_pfault checks for specifically before doing much of anything: if (ctx != TLB_CTX_KERNEL) { if ((tf-tf_tstate TSTATE_PRIV) != 0 (tf-tf_tpc = (u_long)fs_nofault_intr_begin tf-tf_tpc = (u_long)fs_nofault_intr_end)) { tf-tf_tpc = (u_long)fs_fault; tf-tf_tnpc = tf-tf_tpc + 4; return (0); } ... handle fault ctx != TLB_CTX_KERNEL is akin to va VM_MAXUSER_ADDRESS (the address spaces overlap on sparc64 so we can only rely on tlb context numbers). Note that the range bracketed by the fs_nofault_intr_* is included in the fs_nofault range, which handles alignment or data access exception faults. It does take some special care in trap() and trap_pfault() not to access important data, or acquire any locks before this test. Non-trivial interrupts are still masked here, which buys us something. Probably the vmspace pointer should not be counted on in this context, but I don't think it will ever be invalid for the current process, especially since the original interrupt occured in usermode. The only locking that's required that I can see is that PS_PROFIL not be set when the profiling buffer is invalid. But all that will happen is that attempts to update the profiling buffer will be ignored. Technically the process should get a signal but addupc_task doesn't check the return value of copyin/out (oops). Jake To Unsubscribe: send mail to [EMAIL PROTECTED] with unsubscribe freebsd-current in the body of the message
Re: question on profiling code
On Sun, 16 Feb 2003, Julian Elischer wrote: In addupc_intr, if the increment cannot be done immediatly, the addres to increment the count for is stored and the increment is done later at ast or userret() time... Note that cannot be done immediatly is always except on sparc64's under FreeBSD, since incrementing the counter immediately is only implemented on sparc64's. is there any reason that the address of the PC needs to be stored? why is the address from the frame at that time not useable? is it because the PC in the return frame may be hacked up for signals? That's was good a reason as any. I think the next return to user mode is to the signal handler's context (if there is a signal to be handled). It would be wrong to use the signal handler's pc. Also, ast() doesn't have access to the frame, and there is no macro like CLKF_PC() for general frames. This probably doesn't matter much, since signals are rare and the hitting on the signal handler's pc would be perfectly correct if the profiling interrupt occurred an instant later. Now there is a stronger reason: clock interrupt handling is fast, so it normally returns to user mode without going near ast(), and the counter is not updated until the next non-fast interrupt or syscall. This might not happen until unother profiling interrupt clobbers the saved pc, not to mention the saved tick count (it could just increment the tick count so that ticks are assigned to the last saved pc instead of lost; currently, the tick count mostly just tracks KEF_OWEUPC so it need not be saved). This was broken by SMPng (hardclock() is not really a fast interrupt handler but is abused as one). However, normal applications probably make enough syscalls to get the right counter updated, provided we use the counter for the pc at fast interrupt time and not the counter for the pc later. Bruce To Unsubscribe: send mail to [EMAIL PROTECTED] with unsubscribe freebsd-current in the body of the message
Re: question on profiling code
Apparently, On Mon, Feb 17, 2003 at 05:35:09PM +1100, Bruce Evans said words to the effect of; On Sun, 16 Feb 2003, Julian Elischer wrote: In addupc_intr, if the increment cannot be done immediatly, the addres to increment the count for is stored and the increment is done later at ast or userret() time... Note that cannot be done immediatly is always except on sparc64's under FreeBSD, since incrementing the counter immediately is only implemented on sparc64's. is there any reason that the address of the PC needs to be stored? why is the address from the frame at that time not useable? is it because the PC in the return frame may be hacked up for signals? That's was good a reason as any. I think the next return to user mode is to the signal handler's context (if there is a signal to be handled). It would be wrong to use the signal handler's pc. Also, ast() doesn't have access to the frame, and there is no macro like CLKF_PC() for general frames. This probably doesn't matter much, since signals are rare and the hitting on the signal handler's pc would be perfectly correct if the profiling interrupt occurred an instant later. There are macros for accessing trapframes, like the ones for clockframe, TRAPF_PC etc. Now there is a stronger reason: clock interrupt handling is fast, so it normally returns to user mode without going near ast(), and the counter is not updated until the next non-fast interrupt or syscall. In freebsd fast interrupts do handle asts, on i386 they return through doreti (you may consider this a bug and have removed it in your version). I see no reason not to just use the pc in the trapframe passed to ast, even in the case of signals they won't be posted until after addupc_task is called. Jake To Unsubscribe: send mail to [EMAIL PROTECTED] with unsubscribe freebsd-current in the body of the message
Re: question on profiling code
On Mon, 17 Feb 2003, Bruce Evans wrote: On Sun, 16 Feb 2003, Julian Elischer wrote: In addupc_intr, if the increment cannot be done immediatly, the addres to increment the count for is stored and the increment is done later at ast or userret() time... Note that cannot be done immediatly is always except on sparc64's under FreeBSD, since incrementing the counter immediately is only implemented on sparc64's. Care to explain this statement? It doesn't corelate what I see in addupc_int() which is Machine Independent. is there any reason that the address of the PC needs to be stored? why is the address from the frame at that time not useable? is it because the PC in the return frame may be hacked up for signals? That's was good a reason as any. I think the next return to user mode is to the signal handler's context (if there is a signal to be handled). It would be wrong to use the signal handler's pc. Also, ast() doesn't have access to the frame, funny, it uses it.. and there is no macro like CLKF_PC() for general frames. They seem to be the same. Macro or no macro, ast and userret can certainly access the return address. This probably doesn't matter much, since signals are rare and the hitting on the signal handler's pc would be perfectly correct if the profiling interrupt occurred an instant later. that is true. Now there is a stronger reason: clock interrupt handling is fast, so it normally returns to user mode without going near ast(), and the counter is not updated until the next non-fast interrupt or syscall. This might not happen until another profiling interrupt clobbers the saved pc, not to mention the saved tick count (it could just increment the tick count so that ticks are assigned to the last saved pc instead of lost; currently, the tick count mostly just tracks KEF_OWEUPC so it need not be saved). This was broken by SMPng (hardclock() is not really a fast interrupt handler but is abused as one). However, normal applications probably make enough syscalls to get the right counter updated, provided we use the counter for the pc at fast interrupt time and not the counter for the pc later. I'm trying to fix this for multithreaded programs.. thanks for the answer.. I hadn't considered that reason. I assumed that the ASTPENDIG flag would be set, and that the AST would happen, (the userret certainly happens). (does it not?) Bruce To Unsubscribe: send mail to [EMAIL PROTECTED] with unsubscribe freebsd-current in the body of the message To Unsubscribe: send mail to [EMAIL PROTECTED] with unsubscribe freebsd-current in the body of the message