Re: Pump my sched: fewer SCHED_LOCK() & kill p_priority
Amit Kulkarni writes: >> root on sd2a (88532b67c09ce3ee.a) swap on sd2b dump on sd2b >> TSC skew=-6129185140 drift=170 >> TSC skew=-6129184900 drift=-10 >> TSC skew=-6129184890 drift=-20 >> TSC skew=-6129184910 drift=30 >> TSC skew=-6129184910 drift=10 >> TSC skew=-6129184900 drift=20 >> TSC skew=-6129184910 drift=30 >> iwm0: hw rev 0x230, fw ver 22.361476.0, address 68:ec:c5:ad:9a:cb >> initializing kernel modesetting (RAVEN 0x1002:0x15DD 0x17AA:0x506F 0xC4). >> amdgpu0: 1920x1080, 32bpp >> wsdisplay0 at amdgpu0 mux 1: console (std, vt100 emulation), using wskbd0 >> wsdisplay0: screen 1-5 added (std, vt100 emulation) >> > > It seems that you have Paul's TSC patch also applied. Please apply > just one patch and test separately, and then report back! > > Thanks Ah, I tested also without the TSC patch and it didn't make any difference. Only other tweak is enabled amdgpu driver in GENERIC. timo
Re: Pump my sched: fewer SCHED_LOCK() & kill p_priority
> root on sd2a (88532b67c09ce3ee.a) swap on sd2b dump on sd2b > TSC skew=-6129185140 drift=170 > TSC skew=-6129184900 drift=-10 > TSC skew=-6129184890 drift=-20 > TSC skew=-6129184910 drift=30 > TSC skew=-6129184910 drift=10 > TSC skew=-6129184900 drift=20 > TSC skew=-6129184910 drift=30 > iwm0: hw rev 0x230, fw ver 22.361476.0, address 68:ec:c5:ad:9a:cb > initializing kernel modesetting (RAVEN 0x1002:0x15DD 0x17AA:0x506F 0xC4). > amdgpu0: 1920x1080, 32bpp > wsdisplay0 at amdgpu0 mux 1: console (std, vt100 emulation), using wskbd0 > wsdisplay0: screen 1-5 added (std, vt100 emulation) > It seems that you have Paul's TSC patch also applied. Please apply just one patch and test separately, and then report back! Thanks
Re: Pump my sched: fewer SCHED_LOCK() & kill p_priority
Martin Pieuchot writes: > On 06/06/19(Thu) 15:16, Martin Pieuchot wrote: >> On 02/06/19(Sun) 16:41, Martin Pieuchot wrote: >> > On 01/06/19(Sat) 18:55, Martin Pieuchot wrote: >> > > Diff below exists mainly for documentation and test purposes. If >> > > you're not interested about how to break the scheduler internals in >> > > pieces, don't read further and go straight to testing! >> > > >> > > - First change is to stop calling tsleep(9) at PUSER. That makes >> > > it clear that all "sleeping priorities" are smaller than PUSER. >> > > That's important to understand for the diff below. `p_priority' >> > > is currently a placeholder for the "sleeping priority" and the >> > > "runnqueue priority". Both fields are separated by this diff. >> > > >> > > - When a thread goes to sleep, the priority argument of tsleep(9) is >> > > now recorded in `p_slpprio'. This argument can be considered as part >> > > of the sleep queue. Its purpose is to place the thread into a higher >> > > runqueue when awoken. >> > > >> > > - Currently, for stopped threads, `p_priority' correspond to `p_usrpri'. >> > > So setrunnable() has been untangled to place SSTOP and SSLEEP threads >> > > in the preferred queue without having to use `p_priority'. Note that >> > > `p_usrpri' is still recalculated *after* having called setrunqueue(). >> > > This is currently fine because setrunnable() is called with >> > > SCHED_LOCK() >> > > but it will be racy when we'll split it. >> > > >> > > - A new field, `p_runprio' has been introduced. It should be considered >> > > as part of the per-CPU runqueues. It indicates where a current thread >> > > is placed. >> > > >> > > - `spc_curpriority' is now updated at every context-switch. That means >> > >need_resched() won't be called after comparing an out-of-date value. >> > >At the same time, `p_usrpri' is initialized to the highest possible >> > >value for idle threads. >> > > >> > > - resched_proc() was calling need_resched() in the following conditions: >> > >- If the SONPROC thread has a higher priority that the current >> > > running thread (itself). >> > >- Twice in setrunnable() when we know that p_priority <= p_usrpri. >> > >- If schedcpu() considered that a thread, after updating its prio, >> > > should preempt the one running on the CPU pointed by `p_cpu'. >> > > >> > > The diff below simplify all of that by calling need_resched() when: >> > >- A thread is inserted in a CPU runqueue at a higher priority than >> > > the one SONPROC. >> > >- schedcpu() decides that a thread in SRUN state should preempt the >> > > one SONPROC. >> > > >> > > - `p_estcpu' `p_usrpri' and `p_slptime' which represent the "priority" >> > > of a thread are now updated while holding a per-thread mutex. As a >> > > result schedclock() and donice() no longer takes the SCHED_LOCK(), >> > > and schedcpu() almost never take it. >> > > >> > > - With this diff top(1) and ps(1) will report the "real" `p_usrpi' value >> > > when displaying priorities. This is helpful to understand what's >> > > happening: >> > > >> > > load averages: 0.99, 0.56, 0.25 two.lab.grenadille.net >> > > 23:42:10 >> > > 70 threads: 68 idle, 2 on processor >> > > up 0:09 >> > > CPU0: 0.0% user, 0.0% nice, 51.0% sys, 2.0% spin, 0.0% intr, 47.1% >> > > idle >> > > CPU1: 2.0% user, 0.0% nice, 51.0% sys, 3.9% spin, 0.0% intr, 43.1% >> > > idle >> > > Memory: Real: 47M/1005M act/tot Free: 2937M Cache: 812M Swap: 0K/4323M >> > > >> > > PID TID PRI NICE SIZE RES STATE WAIT TIMECPU >> > > COMMAND >> > > 81000 145101 7200K 1664K sleep/1 bored 1:15 36.96% >> > > softnet >> > > 47133 244097 730 2984K 4408K sleep/1 netio 1:06 35.06% cvs >> > > 64749 522184 660 176K 148K onproc/1 - 0:55 28.81% nfsd >> > > 21615 602473 12700K 1664K sleep/0 - 7:22 0.00% >> > > idle0 >> > > 12413 606242 12700K 1664K sleep/1 - 7:08 0.00% idle1 >> > > 85778 338258 500 4936K 7308K idle select0:10 0.00% ssh >> > > 22771 575513 500 176K 148K sleep/0 nfsd 0:02 0.00% nfsd >> > > >> > > >> > > >> > > - The removal of `p_priority' and the change that makes mi_switch() >> > > always update `spc_curpriority' might introduce some changes in >> > > behavior, especially with kernel threads that were not going through >> > > tsleep(9). We currently have some situations where the priority of >> > > the running thread isn't correctly reflected. This diff changes that >> > > which means we should be able to better understand where the problems >> > > are. >> > > >> > > I'd be interested in comments/tests/reviews before continuing in this >> > > direction. Note that at least part of this diff are required to split >> > > the accounting apart from
Re: Pump my sched: fewer SCHED_LOCK() & kill p_priority
On 2019-06-21 17:11:26, Martin Pieuchot wrote: > On 06/06/19(Thu) 15:16, Martin Pieuchot wrote: > > On 02/06/19(Sun) 16:41, Martin Pieuchot wrote: > > > On 01/06/19(Sat) 18:55, Martin Pieuchot wrote: > > > > Diff below exists mainly for documentation and test purposes. If > > > > you're not interested about how to break the scheduler internals in > > > > pieces, don't read further and go straight to testing! > > > > > > > > - First change is to stop calling tsleep(9) at PUSER. That makes > > > > it clear that all "sleeping priorities" are smaller than PUSER. > > > > That's important to understand for the diff below. `p_priority' > > > > is currently a placeholder for the "sleeping priority" and the > > > > "runnqueue priority". Both fields are separated by this diff. > > > > > > > > - When a thread goes to sleep, the priority argument of tsleep(9) is > > > > now recorded in `p_slpprio'. This argument can be considered as part > > > > of the sleep queue. Its purpose is to place the thread into a higher > > > > runqueue when awoken. > > > > > > > > - Currently, for stopped threads, `p_priority' correspond to > > > > `p_usrpri'. > > > > So setrunnable() has been untangled to place SSTOP and SSLEEP threads > > > > in the preferred queue without having to use `p_priority'. Note that > > > > `p_usrpri' is still recalculated *after* having called setrunqueue(). > > > > This is currently fine because setrunnable() is called with > > > > SCHED_LOCK() > > > > but it will be racy when we'll split it. > > > > > > > > - A new field, `p_runprio' has been introduced. It should be considered > > > > as part of the per-CPU runqueues. It indicates where a current thread > > > > is placed. > > > > > > > > - `spc_curpriority' is now updated at every context-switch. That means > > > >need_resched() won't be called after comparing an out-of-date value. > > > >At the same time, `p_usrpri' is initialized to the highest possible > > > >value for idle threads. > > > > > > > > - resched_proc() was calling need_resched() in the following conditions: > > > >- If the SONPROC thread has a higher priority that the current > > > > running thread (itself). > > > >- Twice in setrunnable() when we know that p_priority <= p_usrpri. > > > >- If schedcpu() considered that a thread, after updating its prio, > > > > should preempt the one running on the CPU pointed by `p_cpu'. > > > > > > > > The diff below simplify all of that by calling need_resched() when: > > > >- A thread is inserted in a CPU runqueue at a higher priority than > > > > the one SONPROC. > > > >- schedcpu() decides that a thread in SRUN state should preempt the > > > > one SONPROC. > > > > > > > > - `p_estcpu' `p_usrpri' and `p_slptime' which represent the "priority" > > > > of a thread are now updated while holding a per-thread mutex. As a > > > > result schedclock() and donice() no longer takes the SCHED_LOCK(), > > > > and schedcpu() almost never take it. > > > > > > > > - With this diff top(1) and ps(1) will report the "real" `p_usrpi' value > > > > when displaying priorities. This is helpful to understand what's > > > > happening: > > > > > > > > load averages: 0.99, 0.56, 0.25 two.lab.grenadille.net > > > > 23:42:10 > > > > 70 threads: 68 idle, 2 on processor > > > > up 0:09 > > > > CPU0: 0.0% user, 0.0% nice, 51.0% sys, 2.0% spin, 0.0% intr, 47.1% > > > > idle > > > > CPU1: 2.0% user, 0.0% nice, 51.0% sys, 3.9% spin, 0.0% intr, 43.1% > > > > idle > > > > Memory: Real: 47M/1005M act/tot Free: 2937M Cache: 812M Swap: 0K/4323M > > > > > > > > PID TID PRI NICE SIZE RES STATE WAIT TIMECPU > > > > COMMAND > > > > 81000 145101 7200K 1664K sleep/1 bored 1:15 36.96% > > > > softnet > > > > 47133 244097 730 2984K 4408K sleep/1 netio 1:06 35.06% cvs > > > > 64749 522184 660 176K 148K onproc/1 - 0:55 28.81% nfsd > > > > 21615 602473 12700K 1664K sleep/0 - 7:22 0.00% > > > > idle0 > > > > 12413 606242 12700K 1664K sleep/1 - 7:08 0.00% > > > > idle1 > > > > 85778 338258 500 4936K 7308K idle select0:10 0.00% ssh > > > > > > > > 22771 575513 500 176K 148K sleep/0 nfsd 0:02 0.00% > > > > nfsd > > > > > > > > > > > > > > > > - The removal of `p_priority' and the change that makes mi_switch() > > > > always update `spc_curpriority' might introduce some changes in > > > > behavior, especially with kernel threads that were not going through > > > > tsleep(9). We currently have some situations where the priority of > > > > the running thread isn't correctly reflected. This diff changes that > > > > which means we should be able to better understand where the problems > > > > are. > > > > > > > > I'd be interested in
Re: Pump my sched: fewer SCHED_LOCK() & kill p_priority
On Fri, 21 Jun 2019 21:54:18 -0700 Mike Larkin wrote: > On Fri, Jun 21, 2019 at 05:11:26PM -0300, Martin Pieuchot wrote: > > On 06/06/19(Thu) 15:16, Martin Pieuchot wrote: > > > On 02/06/19(Sun) 16:41, Martin Pieuchot wrote: > > > > On 01/06/19(Sat) 18:55, Martin Pieuchot wrote: > > > > > Diff below exists mainly for documentation and test purposes. If > > > > > you're not interested about how to break the scheduler internals in > > > > > pieces, don't read further and go straight to testing! > > > > > > > > > > - First change is to stop calling tsleep(9) at PUSER. That makes > > > > > it clear that all "sleeping priorities" are smaller than PUSER. > > > > > That's important to understand for the diff below. `p_priority' > > > > > is currently a placeholder for the "sleeping priority" and the > > > > > "runnqueue priority". Both fields are separated by this diff. > > > > > > > > > > - When a thread goes to sleep, the priority argument of tsleep(9) is > > > > > now recorded in `p_slpprio'. This argument can be considered as > > > > > part > > > > > of the sleep queue. Its purpose is to place the thread into a > > > > > higher > > > > > runqueue when awoken. > > > > > > > > > > - Currently, for stopped threads, `p_priority' correspond to > > > > > `p_usrpri'. > > > > > So setrunnable() has been untangled to place SSTOP and SSLEEP > > > > > threads > > > > > in the preferred queue without having to use `p_priority'. Note > > > > > that > > > > > `p_usrpri' is still recalculated *after* having called > > > > > setrunqueue(). > > > > > This is currently fine because setrunnable() is called with > > > > > SCHED_LOCK() > > > > > but it will be racy when we'll split it. > > > > > > > > > > - A new field, `p_runprio' has been introduced. It should be > > > > > considered > > > > > as part of the per-CPU runqueues. It indicates where a current > > > > > thread > > > > > is placed. > > > > > > > > > > - `spc_curpriority' is now updated at every context-switch. That > > > > > means > > > > >need_resched() won't be called after comparing an out-of-date > > > > > value. > > > > >At the same time, `p_usrpri' is initialized to the highest possible > > > > >value for idle threads. > > > > > > > > > > - resched_proc() was calling need_resched() in the following > > > > > conditions: > > > > >- If the SONPROC thread has a higher priority that the current > > > > > running thread (itself). > > > > >- Twice in setrunnable() when we know that p_priority <= p_usrpri. > > > > >- If schedcpu() considered that a thread, after updating its prio, > > > > > should preempt the one running on the CPU pointed by `p_cpu'. > > > > > > > > > > The diff below simplify all of that by calling need_resched() when: > > > > >- A thread is inserted in a CPU runqueue at a higher priority than > > > > > the one SONPROC. > > > > >- schedcpu() decides that a thread in SRUN state should preempt the > > > > > one SONPROC. > > > > > > > > > > - `p_estcpu' `p_usrpri' and `p_slptime' which represent the "priority" > > > > > of a thread are now updated while holding a per-thread mutex. As a > > > > > result schedclock() and donice() no longer takes the SCHED_LOCK(), > > > > > and schedcpu() almost never take it. > > > > > > > > > > - With this diff top(1) and ps(1) will report the "real" `p_usrpi' > > > > > value > > > > > when displaying priorities. This is helpful to understand what's > > > > > happening: > > > > > > > > > > load averages: 0.99, 0.56, 0.25 > > > > > two.lab.grenadille.net 23:42:10 > > > > > 70 threads: 68 idle, 2 on processor > > > > > up 0:09 > > > > > CPU0: 0.0% user, 0.0% nice, 51.0% sys, 2.0% spin, 0.0% intr, > > > > > 47.1% idle > > > > > CPU1: 2.0% user, 0.0% nice, 51.0% sys, 3.9% spin, 0.0% intr, > > > > > 43.1% idle > > > > > Memory: Real: 47M/1005M act/tot Free: 2937M Cache: 812M Swap: 0K/4323M > > > > > > > > > > PID TID PRI NICE SIZE RES STATE WAIT TIMECPU > > > > > COMMAND > > > > > 81000 145101 7200K 1664K sleep/1 bored 1:15 36.96% > > > > > softnet > > > > > 47133 244097 730 2984K 4408K sleep/1 netio 1:06 35.06% > > > > > cvs > > > > > 64749 522184 660 176K 148K onproc/1 - 0:55 28.81% > > > > > nfsd > > > > > 21615 602473 12700K 1664K sleep/0 - 7:22 0.00% > > > > > idle0 > > > > > 12413 606242 12700K 1664K sleep/1 - 7:08 0.00% > > > > > idle1 > > > > > 85778 338258 500 4936K 7308K idle select0:10 0.00% > > > > > ssh > > > > > 22771 575513 500 176K 148K sleep/0 nfsd 0:02 0.00% > > > > > nfsd > > > > > > > > > > > > > > > > > > > > - The removal of `p_priority' and the change that makes mi_switch() > > > > > always update `spc_curpriority' might introduce some
Re: Pump my sched: fewer SCHED_LOCK() & kill p_priority
On Fri, Jun 21, 2019 at 05:11:26PM -0300, Martin Pieuchot wrote: > On 06/06/19(Thu) 15:16, Martin Pieuchot wrote: > > On 02/06/19(Sun) 16:41, Martin Pieuchot wrote: > > > On 01/06/19(Sat) 18:55, Martin Pieuchot wrote: > > > > Diff below exists mainly for documentation and test purposes. If > > > > you're not interested about how to break the scheduler internals in > > > > pieces, don't read further and go straight to testing! > > > > > > > > - First change is to stop calling tsleep(9) at PUSER. That makes > > > > it clear that all "sleeping priorities" are smaller than PUSER. > > > > That's important to understand for the diff below. `p_priority' > > > > is currently a placeholder for the "sleeping priority" and the > > > > "runnqueue priority". Both fields are separated by this diff. > > > > > > > > - When a thread goes to sleep, the priority argument of tsleep(9) is > > > > now recorded in `p_slpprio'. This argument can be considered as part > > > > of the sleep queue. Its purpose is to place the thread into a higher > > > > runqueue when awoken. > > > > > > > > - Currently, for stopped threads, `p_priority' correspond to > > > > `p_usrpri'. > > > > So setrunnable() has been untangled to place SSTOP and SSLEEP threads > > > > in the preferred queue without having to use `p_priority'. Note that > > > > `p_usrpri' is still recalculated *after* having called setrunqueue(). > > > > This is currently fine because setrunnable() is called with > > > > SCHED_LOCK() > > > > but it will be racy when we'll split it. > > > > > > > > - A new field, `p_runprio' has been introduced. It should be considered > > > > as part of the per-CPU runqueues. It indicates where a current thread > > > > is placed. > > > > > > > > - `spc_curpriority' is now updated at every context-switch. That means > > > >need_resched() won't be called after comparing an out-of-date value. > > > >At the same time, `p_usrpri' is initialized to the highest possible > > > >value for idle threads. > > > > > > > > - resched_proc() was calling need_resched() in the following conditions: > > > >- If the SONPROC thread has a higher priority that the current > > > > running thread (itself). > > > >- Twice in setrunnable() when we know that p_priority <= p_usrpri. > > > >- If schedcpu() considered that a thread, after updating its prio, > > > > should preempt the one running on the CPU pointed by `p_cpu'. > > > > > > > > The diff below simplify all of that by calling need_resched() when: > > > >- A thread is inserted in a CPU runqueue at a higher priority than > > > > the one SONPROC. > > > >- schedcpu() decides that a thread in SRUN state should preempt the > > > > one SONPROC. > > > > > > > > - `p_estcpu' `p_usrpri' and `p_slptime' which represent the "priority" > > > > of a thread are now updated while holding a per-thread mutex. As a > > > > result schedclock() and donice() no longer takes the SCHED_LOCK(), > > > > and schedcpu() almost never take it. > > > > > > > > - With this diff top(1) and ps(1) will report the "real" `p_usrpi' value > > > > when displaying priorities. This is helpful to understand what's > > > > happening: > > > > > > > > load averages: 0.99, 0.56, 0.25 two.lab.grenadille.net > > > > 23:42:10 > > > > 70 threads: 68 idle, 2 on processor > > > > up 0:09 > > > > CPU0: 0.0% user, 0.0% nice, 51.0% sys, 2.0% spin, 0.0% intr, 47.1% > > > > idle > > > > CPU1: 2.0% user, 0.0% nice, 51.0% sys, 3.9% spin, 0.0% intr, 43.1% > > > > idle > > > > Memory: Real: 47M/1005M act/tot Free: 2937M Cache: 812M Swap: 0K/4323M > > > > > > > > PID TID PRI NICE SIZE RES STATE WAIT TIMECPU > > > > COMMAND > > > > 81000 145101 7200K 1664K sleep/1 bored 1:15 36.96% > > > > softnet > > > > 47133 244097 730 2984K 4408K sleep/1 netio 1:06 35.06% cvs > > > > 64749 522184 660 176K 148K onproc/1 - 0:55 28.81% nfsd > > > > 21615 602473 12700K 1664K sleep/0 - 7:22 0.00% > > > > idle0 > > > > 12413 606242 12700K 1664K sleep/1 - 7:08 0.00% > > > > idle1 > > > > 85778 338258 500 4936K 7308K idle select0:10 0.00% ssh > > > > > > > > 22771 575513 500 176K 148K sleep/0 nfsd 0:02 0.00% > > > > nfsd > > > > > > > > > > > > > > > > - The removal of `p_priority' and the change that makes mi_switch() > > > > always update `spc_curpriority' might introduce some changes in > > > > behavior, especially with kernel threads that were not going through > > > > tsleep(9). We currently have some situations where the priority of > > > > the running thread isn't correctly reflected. This diff changes that > > > > which means we should be able to better understand where the problems > > > > are. > > > > > > > > I'd be interested
Re: Pump my sched: fewer SCHED_LOCK() & kill p_priority
On 06/06/19(Thu) 15:16, Martin Pieuchot wrote: > On 02/06/19(Sun) 16:41, Martin Pieuchot wrote: > > On 01/06/19(Sat) 18:55, Martin Pieuchot wrote: > > > Diff below exists mainly for documentation and test purposes. If > > > you're not interested about how to break the scheduler internals in > > > pieces, don't read further and go straight to testing! > > > > > > - First change is to stop calling tsleep(9) at PUSER. That makes > > > it clear that all "sleeping priorities" are smaller than PUSER. > > > That's important to understand for the diff below. `p_priority' > > > is currently a placeholder for the "sleeping priority" and the > > > "runnqueue priority". Both fields are separated by this diff. > > > > > > - When a thread goes to sleep, the priority argument of tsleep(9) is > > > now recorded in `p_slpprio'. This argument can be considered as part > > > of the sleep queue. Its purpose is to place the thread into a higher > > > runqueue when awoken. > > > > > > - Currently, for stopped threads, `p_priority' correspond to `p_usrpri'. > > > So setrunnable() has been untangled to place SSTOP and SSLEEP threads > > > in the preferred queue without having to use `p_priority'. Note that > > > `p_usrpri' is still recalculated *after* having called setrunqueue(). > > > This is currently fine because setrunnable() is called with > > > SCHED_LOCK() > > > but it will be racy when we'll split it. > > > > > > - A new field, `p_runprio' has been introduced. It should be considered > > > as part of the per-CPU runqueues. It indicates where a current thread > > > is placed. > > > > > > - `spc_curpriority' is now updated at every context-switch. That means > > >need_resched() won't be called after comparing an out-of-date value. > > >At the same time, `p_usrpri' is initialized to the highest possible > > >value for idle threads. > > > > > > - resched_proc() was calling need_resched() in the following conditions: > > >- If the SONPROC thread has a higher priority that the current > > > running thread (itself). > > >- Twice in setrunnable() when we know that p_priority <= p_usrpri. > > >- If schedcpu() considered that a thread, after updating its prio, > > > should preempt the one running on the CPU pointed by `p_cpu'. > > > > > > The diff below simplify all of that by calling need_resched() when: > > >- A thread is inserted in a CPU runqueue at a higher priority than > > > the one SONPROC. > > >- schedcpu() decides that a thread in SRUN state should preempt the > > > one SONPROC. > > > > > > - `p_estcpu' `p_usrpri' and `p_slptime' which represent the "priority" > > > of a thread are now updated while holding a per-thread mutex. As a > > > result schedclock() and donice() no longer takes the SCHED_LOCK(), > > > and schedcpu() almost never take it. > > > > > > - With this diff top(1) and ps(1) will report the "real" `p_usrpi' value > > > when displaying priorities. This is helpful to understand what's > > > happening: > > > > > > load averages: 0.99, 0.56, 0.25 two.lab.grenadille.net > > > 23:42:10 > > > 70 threads: 68 idle, 2 on processorup > > > 0:09 > > > CPU0: 0.0% user, 0.0% nice, 51.0% sys, 2.0% spin, 0.0% intr, 47.1% > > > idle > > > CPU1: 2.0% user, 0.0% nice, 51.0% sys, 3.9% spin, 0.0% intr, 43.1% > > > idle > > > Memory: Real: 47M/1005M act/tot Free: 2937M Cache: 812M Swap: 0K/4323M > > > > > > PID TID PRI NICE SIZE RES STATE WAIT TIMECPU > > > COMMAND > > > 81000 145101 7200K 1664K sleep/1 bored 1:15 36.96% > > > softnet > > > 47133 244097 730 2984K 4408K sleep/1 netio 1:06 35.06% cvs > > > 64749 522184 660 176K 148K onproc/1 - 0:55 28.81% nfsd > > > 21615 602473 12700K 1664K sleep/0 - 7:22 0.00% idle0 > > > > > > 12413 606242 12700K 1664K sleep/1 - 7:08 0.00% idle1 > > > 85778 338258 500 4936K 7308K idle select0:10 0.00% ssh > > > 22771 575513 500 176K 148K sleep/0 nfsd 0:02 0.00% nfsd > > > > > > > > > > > > - The removal of `p_priority' and the change that makes mi_switch() > > > always update `spc_curpriority' might introduce some changes in > > > behavior, especially with kernel threads that were not going through > > > tsleep(9). We currently have some situations where the priority of > > > the running thread isn't correctly reflected. This diff changes that > > > which means we should be able to better understand where the problems > > > are. > > > > > > I'd be interested in comments/tests/reviews before continuing in this > > > direction. Note that at least part of this diff are required to split > > > the accounting apart from the SCHED_LOCK() as well. > > > > > > I'll also work on exporting scheduler statistics unless somebody wants > > > to
Re: Pump my sched: fewer SCHED_LOCK() & kill p_priority
On Sun, Jun 9, 2019 at 10:39 PM Theo de Raadt wrote: > > Amit Kulkarni wrote: > > > > Index: sys/sysctl.h > > > === > > > RCS file: /cvs/src/sys/sys/sysctl.h,v > > > retrieving revision 1.188 > > > diff -u -p -r1.188 sysctl.h > > > --- sys/sysctl.h1 Jun 2019 14:11:18 - 1.188 > > > +++ sys/sysctl.h1 Jun 2019 16:36:13 - > > > @@ -629,7 +629,7 @@ do { > > > \ > > > (kp)->p_stat = (p)->p_stat; \ > > > (kp)->p_slptime = (p)->p_slptime; \ > > > (kp)->p_holdcnt = 1;\ > > > - (kp)->p_priority = (p)->p_priority; \ > > > + (kp)->p_priority = (p)->p_usrpri + PZERO; \ > > > (kp)->p_usrpri = (p)->p_usrpri; \ > > > if ((p)->p_wchan && (p)->p_wmesg) \ > > > copy_str((kp)->p_wmesg, (p)->p_wmesg, \ > > > > > > > > > Hi, > > > > A request, to remove the +PZERO here above and the -PZERO in > > /usr/src/usr.bin/top/machine.c, why do an unnecessary calculation twice, > > once to set and other time to this? > > This is getting out of hand. > > Have you reviewed the *entire universe* of software to ensure that > top is the only program which looks at this? > > No. You have not. So please stop proposing changes where you aren't > willing to invest into studying the history. Got it. Sorry, my fault here.
Re: Pump my sched: fewer SCHED_LOCK() & kill p_priority
Amit Kulkarni wrote: > > Index: sys/sysctl.h > > === > > RCS file: /cvs/src/sys/sys/sysctl.h,v > > retrieving revision 1.188 > > diff -u -p -r1.188 sysctl.h > > --- sys/sysctl.h1 Jun 2019 14:11:18 - 1.188 > > +++ sys/sysctl.h1 Jun 2019 16:36:13 - > > @@ -629,7 +629,7 @@ do { > > \ > > (kp)->p_stat = (p)->p_stat; \ > > (kp)->p_slptime = (p)->p_slptime; \ > > (kp)->p_holdcnt = 1;\ > > - (kp)->p_priority = (p)->p_priority; \ > > + (kp)->p_priority = (p)->p_usrpri + PZERO; \ > > (kp)->p_usrpri = (p)->p_usrpri; \ > > if ((p)->p_wchan && (p)->p_wmesg) \ > > copy_str((kp)->p_wmesg, (p)->p_wmesg, \ > > > > > Hi, > > A request, to remove the +PZERO here above and the -PZERO in > /usr/src/usr.bin/top/machine.c, why do an unnecessary calculation twice, once > to set and other time to this? This is getting out of hand. Have you reviewed the *entire universe* of software to ensure that top is the only program which looks at this? No. You have not. So please stop proposing changes where you aren't willing to invest into studying the history.
Re: Pump my sched: fewer SCHED_LOCK() & kill p_priority
> Index: sys/sysctl.h > === > RCS file: /cvs/src/sys/sys/sysctl.h,v > retrieving revision 1.188 > diff -u -p -r1.188 sysctl.h > --- sys/sysctl.h 1 Jun 2019 14:11:18 - 1.188 > +++ sys/sysctl.h 1 Jun 2019 16:36:13 - > @@ -629,7 +629,7 @@ do { > \ > (kp)->p_stat = (p)->p_stat; \ > (kp)->p_slptime = (p)->p_slptime; \ > (kp)->p_holdcnt = 1;\ > - (kp)->p_priority = (p)->p_priority; \ > + (kp)->p_priority = (p)->p_usrpri + PZERO; \ > (kp)->p_usrpri = (p)->p_usrpri; \ > if ((p)->p_wchan && (p)->p_wmesg) \ > copy_str((kp)->p_wmesg, (p)->p_wmesg, \ > Hi, A request, to remove the +PZERO here above and the -PZERO in /usr/src/usr.bin/top/machine.c, why do an unnecessary calculation twice, once to set and other time to unset? Thanks
Re: Pump my sched: fewer SCHED_LOCK() & kill p_priority
On 02/06/19(Sun) 16:41, Martin Pieuchot wrote: > On 01/06/19(Sat) 18:55, Martin Pieuchot wrote: > > Diff below exists mainly for documentation and test purposes. If > > you're not interested about how to break the scheduler internals in > > pieces, don't read further and go straight to testing! > > > > - First change is to stop calling tsleep(9) at PUSER. That makes > > it clear that all "sleeping priorities" are smaller than PUSER. > > That's important to understand for the diff below. `p_priority' > > is currently a placeholder for the "sleeping priority" and the > > "runnqueue priority". Both fields are separated by this diff. > > > > - When a thread goes to sleep, the priority argument of tsleep(9) is > > now recorded in `p_slpprio'. This argument can be considered as part > > of the sleep queue. Its purpose is to place the thread into a higher > > runqueue when awoken. > > > > - Currently, for stopped threads, `p_priority' correspond to `p_usrpri'. > > So setrunnable() has been untangled to place SSTOP and SSLEEP threads > > in the preferred queue without having to use `p_priority'. Note that > > `p_usrpri' is still recalculated *after* having called setrunqueue(). > > This is currently fine because setrunnable() is called with SCHED_LOCK() > > but it will be racy when we'll split it. > > > > - A new field, `p_runprio' has been introduced. It should be considered > > as part of the per-CPU runqueues. It indicates where a current thread > > is placed. > > > > - `spc_curpriority' is now updated at every context-switch. That means > >need_resched() won't be called after comparing an out-of-date value. > >At the same time, `p_usrpri' is initialized to the highest possible > >value for idle threads. > > > > - resched_proc() was calling need_resched() in the following conditions: > >- If the SONPROC thread has a higher priority that the current > > running thread (itself). > >- Twice in setrunnable() when we know that p_priority <= p_usrpri. > >- If schedcpu() considered that a thread, after updating its prio, > > should preempt the one running on the CPU pointed by `p_cpu'. > > > > The diff below simplify all of that by calling need_resched() when: > >- A thread is inserted in a CPU runqueue at a higher priority than > > the one SONPROC. > >- schedcpu() decides that a thread in SRUN state should preempt the > > one SONPROC. > > > > - `p_estcpu' `p_usrpri' and `p_slptime' which represent the "priority" > > of a thread are now updated while holding a per-thread mutex. As a > > result schedclock() and donice() no longer takes the SCHED_LOCK(), > > and schedcpu() almost never take it. > > > > - With this diff top(1) and ps(1) will report the "real" `p_usrpi' value > > when displaying priorities. This is helpful to understand what's > > happening: > > > > load averages: 0.99, 0.56, 0.25 two.lab.grenadille.net > > 23:42:10 > > 70 threads: 68 idle, 2 on processorup > > 0:09 > > CPU0: 0.0% user, 0.0% nice, 51.0% sys, 2.0% spin, 0.0% intr, 47.1% idle > > CPU1: 2.0% user, 0.0% nice, 51.0% sys, 3.9% spin, 0.0% intr, 43.1% idle > > Memory: Real: 47M/1005M act/tot Free: 2937M Cache: 812M Swap: 0K/4323M > > > > PID TID PRI NICE SIZE RES STATE WAIT TIMECPU COMMAND > > 81000 145101 7200K 1664K sleep/1 bored 1:15 36.96% softnet > > 47133 244097 730 2984K 4408K sleep/1 netio 1:06 35.06% cvs > > 64749 522184 660 176K 148K onproc/1 - 0:55 28.81% nfsd > > 21615 602473 12700K 1664K sleep/0 - 7:22 0.00% idle0 > > 12413 606242 12700K 1664K sleep/1 - 7:08 0.00% idle1 > > 85778 338258 500 4936K 7308K idle select0:10 0.00% ssh > > 22771 575513 500 176K 148K sleep/0 nfsd 0:02 0.00% nfsd > > > > > > > > - The removal of `p_priority' and the change that makes mi_switch() > > always update `spc_curpriority' might introduce some changes in > > behavior, especially with kernel threads that were not going through > > tsleep(9). We currently have some situations where the priority of > > the running thread isn't correctly reflected. This diff changes that > > which means we should be able to better understand where the problems > > are. > > > > I'd be interested in comments/tests/reviews before continuing in this > > direction. Note that at least part of this diff are required to split > > the accounting apart from the SCHED_LOCK() as well. > > > > I'll also work on exporting scheduler statistics unless somebody wants > > to beat me :) > > Updated diff to use IPL_SCHED and rebased to apply on top of -current :) Updated diff that fixes a pagefault reported by sthen@. Index: arch/amd64/amd64/genassym.cf === RCS file:
Re: Pump my sched: fewer SCHED_LOCK() & kill p_priority
On Thu, Jun 06, 2019 at 02:55:35PM +0100, Stuart Henderson wrote: > I'm testing the "pump my sched" and read/write unlock diffs and ran into > the panic below. Seems more likely that it would be connected with the > sched diff rather than anything else. I'll build a WITNESS kernel and > see if I can get more details if it hapens again. > Funny thing, I ran into the same panic last night after running this diff without issue for several days. Mine didn't include a stack trace though. > ddb{1}> show panic > kernel diagnostic assertion "__mp_lock_held(_lock, curcpu()) == 0" > failed > : file "/sys/kern/kern_lock.c", line 63 > ddb{1}> tr > db_enter() at db_enter+0x10 > panic() at panic+0x128 > __assert(81aebffd,81ac7a49,3f,81b0e315) at > __assert+0x2 > e > _kernel_lock() at _kernel_lock+0xf6 > pageflttrap() at pageflttrap+0x75 > kerntrap(800033ee0ca0) at kerntrap+0x91 > alltraps_kern(6,380,fffb,2,80002201aff0,e0) at alltraps_kern+0x7b > setrunqueue(0,8000342b2ce8,e0) at setrunqueue+0xbc > setrunnable(8000342b2ce8,e0) at setrunnable+0xa5 > ptsignal(8000342b2ce8,13,0) at ptsignal+0x3c4 > sys_kill(800033c2fcc0,800033ee0ec0,800033ee0f20) at sys_kill+0x1c5 > syscall(800033ee0f90) at syscall+0x389 > Xsyscall(6,7a,0,7a,13,7f7e3e18) at Xsyscall+0x128 > end of kernel > end trace frame: 0x7f7e3cb0, count: -13 > ddb{1}> ps >PID TID PPIDUID S FLAGS WAIT COMMAND > 89165 411452 1971667 30x1000b2 poll ping > 1971 11232 86869667 30x82 piperdcheck_ping > 66051 380473 92153667 30x1000b2 poll ping > 92153 480496 86869667 30x82 piperdcheck_ping > 7608 110605 8332667 30x1000b2 poll ping > 8332 284560 86869667 30x82 piperdcheck_ping > 40097 336365 35317667 30x1000b2 poll ping > 35317 385677 86869667 30x82 piperdcheck_ping > 33714 102547 21689667 30x1000b2 poll ping > 21689 210571 86869667 30x82 piperdcheck_ping > 72142 347023 62694667 30x1000b2 poll ping > 62694 439660 86869667 30x82 piperdcheck_ping > 87347 447624 76730 1000 4 0x8000430xscreensaver > 37569 319522 76730 1000 20x800402piecewise > 11841 481452 18848667 30x1000b2 poll ping > 18848 111230 86869667 30x82 piperdcheck_ping > 69292 189646 44902667 30x1000b2 poll ping > 44902 21064 86869667 30x82 piperdcheck_ping > 67905 363250 1761507 30x92 kqreadpickup > 86709 146945 91951 1000 30x92 kqreadimap > 36596 227858 91951666 30x92 kqreadimap-login > 90723 180281 91951 1000 30x92 kqreadimap > 42867 129499 91951 1000 30x92 kqreadimap > 52963 203919 91951 1000 30x92 kqreadimap > 61599 167915 91951666 30x92 kqreadimap-login > 54450 154426 91951666 30x92 kqreadimap-login > 54061 103553 91951666 30x92 kqreadimap-login > 39422 362611 85872715 30x90 netconperl > 49809 260623 85872715 30x90 lockf perl > 3824 92473 85872715 30x90 lockf perl > 30990 398152 85872715 30x90 lockf perl > 3327 264795 85872715 30x90 lockf perl > 24406 217143 85872715 30x90 lockf perl > 52871 404585 85872715 30x90 lockf perl > 58220 118881 85872715 30x90 lockf perl > 32898 307331 85872715 30x90 lockf perl > 50203 396987 85872715 30x90 lockf perl > 93024 24785 1 1000 30x100080 selectssh > 22261 482889 52840 1000 30x100083 poll ssh > 52840 404967 61470 1000 30x10008b pause ksh > 35796 132597 61470 1000 30xb0 netio urxvt > 61470 485028 72107 1000 30xb2 selecturxvt > 72107 196192 1 1000 30x10008a pause sh > 70238 396933 1 1000 30x100080 selectssh > 41840 302658 10362 1000 30x100083 poll ssh > 89638 411799 91951 1000 30x92 kqreadimap > 49250 507579 81821 1000 30x100083 ttyin ksh > 37608 523403 91951666 30x92 kqreadimap-login > 75552 67606 33096 1000 30x100083 poll mutt > 33096 492099 81821 1000 30x10008b pause ksh > 81821 270355 1 1000 30x100080 kqreadtmux > 24522 341281 91861 1000 30x100083 kqreadtmux >
Re: Pump my sched: fewer SCHED_LOCK() & kill p_priority
On 2019/06/06 14:55, Stuart Henderson wrote: > I'm testing the "pump my sched" and read/write unlock diffs and ran into > the panic below. Seems more likely that it would be connected with the > sched diff rather than anything else. I'll build a WITNESS kernel and > see if I can get more details if it hapens again. Easily repeatable: Run xscreensaver. Let it lock. Try to unlock. login: panic: kernel diagnostic assertion "__mp_lock_held(_lock, curcpu()) == 0" failed: file "/src/cvs-openbsd/sys/kern/kern_lock.c", line 63 Stopped at db_enter+0x10: popq%rbp TIDPIDUID PRFLAGS PFLAGS CPU COMMAND 6043 83667 1000 0x2 01 firefox 320664 83667 1000 0x2 0x4802 firefox 523640 70812 1000 0x2 0x4003 firefox *307581 55620 10000x32 00K xscreensaver db_enter() at db_enter+0x10 panic() at panic+0x128 __assert(81af6d92,81afc585,3f,81b1a406) at __assert+0x2 e __mp_lock_held(811362e0,800033cfe5e0) at __mp_lock_held pageflttrap() at pageflttrap+0x78 kerntrap(800033cfe6b0) at kerntrap+0x91 alltraps_kern(6,300,fffb,2,800022009ff0,c0) at alltraps_kern+0x7b setrunqueue(0,8000342572a8,c0) at setrunqueue+0xbc setrunnable(8000342572a8,c0) at setrunnable+0xa5 ptsignal(8000342572a8,13,0) at ptsignal+0x3c4 sys_kill(800033c532a8,800033cfe8d0,800033cfe930) at sys_kill+0x1c5 syscall(800033cfe9a0) at syscall+0x399 Xsyscall(6,7a,0,7a,13,7f7e8cc8) at Xsyscall+0x128 end of kernel end trace frame: 0x7f7e8b60, count: 2 https://www.openbsd.org/ddb.html describes the minimum info required in bug reports. Insufficient info makes it difficult to find and fix bugs. ddb{0}> sh witness Sleep locks: _fpriv->lock (type: rwlock, depth: 0)_priv->fbs_lock (type: rwlock, d epth: 0)futex (type: rwlock, depth: 0) _lock (type: kernel_lock, depth: 6 ) >lock (type: rwlock, depth: 5) _lock (type: kernel_lock, depth: 6 ) -- (already displayed) >pr_lock (type: rwlock, depth: 1) _lock (type: kernel_lock, depth: 6) -- (already displayed) swplk (type: rwlock, depth: 0) >lock (type: rwlock, depth: 5) -- (already d isplayed) >i_lock (type: rrwlock, depth: 2) _lock (type: kernel_lock, depth: 6) -- (already displayed) >lock (type: rwlock, depth: 5) -- (already displayed) >dk_lock (type: rwlock, depth: 3) _lock (type: kernel_lock, d epth: 6) -- (already displayed) _mtx (type: rwlock, depth: 3) >dh_mtx (type: rwlock, depth: 4 ) lockflk (type: rwlock, depth: 3)(type: rwlock, depth: 4) ->pr_lock (type: rwlock, depth: 4) >pr_lock (type: rwlock, depth: 4) ->dh_mtx (type: rwlock, depth: 4) -- (already displayed) >uv_lock (type: rwlock, depth: 3) >uv_lock (type: rwlock, depth: 4) >uv_lock (type: rwlock, depth: 4) -- (already displayed) sysctllk (type: rwlock, depth: 0) _lock (type: kernel_lock, depth: 6) -- (already displayed) >lock (type: rwlock, depth: 5) -- (already displayed) >fd_fd.fd_lock (type: rwlock, depth: 1) _lock (type: kernel_lo ck, depth: 6) -- (already displayed) >lock (type: rwlock, depth: 5) -- (already displayed) netlock (type: rwlock, depth: 2) _lock (type: kernel_lock, depth: 6) -- (already displayed) pools (type: rwlock, depth: 3)_lock (type: kernel_lock, depth: 6) -- (already displayed) >lock (type: rwlock, depth: 5) -- (already displayed) >ar_lock (type: rwlock, depth: 3) vlantag (type: rwlock, depth: 3) lo ckflk (type: rwlock, depth: 3) -- (already displayed) >i_lock (type: rrwlock, depth: 2) -- (already displayed) vfs_stall (type: rwlock, depth: 2) _lock (type: kernel_lock, depth: 6 ) -- (already displayed) ptarrlk (type: rwlock, depth: 2) netlock (type: rwlock, depth: 2) -- (already displayed) tc_lock (type: rwlock, depth: 1) >i_lock (type: rrwlock, depth: 2) -- (alr eady displayed) vfs_stall (type: rwlock, depth: 2) -- (already displayed) sysctldlk (type: rwlock, depth: 1)>hdcp_mutex (type: rwlock, depth: 0)>lock (type: rwlock, depth: 0)>mutex (type: rwlock, depth: 0 ) >mode_config.idr_mutex (type: rwlock, depth: 3)_priv->dpll_lock (typ e: rwlock, depth: 0)taskq (type: rwlock, depth: 0) >struct_mutex (type: rw lock, depth: 3) >lock (type: rwlock, depth: 5) -- (already displayed) (type: rwlock, depth: 4) -- (already displayed) >vm_lock (type: rwlock, depth: 5) _priv->pcu_lock (type: rwlock, de pth: 4) _priv->gt_pm.rps.power.mutex (type: rwlock, depth: 5) _dom ains->lock (type: rwlock, depth: 4) _priv->mm.stolen_lock (type: rwlock, d epth: 4) >lock (type: rwlock, depth: 4) >vm_lock (type: rwlock, de pth: 4) >mm.lock (type: rwlock, depth: 4) >lock (type: rwlock, dep th: 5) -- (already displayed) >vm_lock (type: rwlock, depth: 5) -- (already displayed) >mm.get_page.lock (type: rwlock, depth: 4) >vm_lock (type: rwlock, d epth: 5) -- (already displayed) >mm.lock (type: rwlock, depth: 4) -- (already displayed) _priv->av_mutex (type: rwlock, depth:
Re: Pump my sched: fewer SCHED_LOCK() & kill p_priority
I'm testing the "pump my sched" and read/write unlock diffs and ran into the panic below. Seems more likely that it would be connected with the sched diff rather than anything else. I'll build a WITNESS kernel and see if I can get more details if it hapens again. ddb{1}> show panic kernel diagnostic assertion "__mp_lock_held(_lock, curcpu()) == 0" failed : file "/sys/kern/kern_lock.c", line 63 ddb{1}> tr db_enter() at db_enter+0x10 panic() at panic+0x128 __assert(81aebffd,81ac7a49,3f,81b0e315) at __assert+0x2 e _kernel_lock() at _kernel_lock+0xf6 pageflttrap() at pageflttrap+0x75 kerntrap(800033ee0ca0) at kerntrap+0x91 alltraps_kern(6,380,fffb,2,80002201aff0,e0) at alltraps_kern+0x7b setrunqueue(0,8000342b2ce8,e0) at setrunqueue+0xbc setrunnable(8000342b2ce8,e0) at setrunnable+0xa5 ptsignal(8000342b2ce8,13,0) at ptsignal+0x3c4 sys_kill(800033c2fcc0,800033ee0ec0,800033ee0f20) at sys_kill+0x1c5 syscall(800033ee0f90) at syscall+0x389 Xsyscall(6,7a,0,7a,13,7f7e3e18) at Xsyscall+0x128 end of kernel end trace frame: 0x7f7e3cb0, count: -13 ddb{1}> ps PID TID PPIDUID S FLAGS WAIT COMMAND 89165 411452 1971667 30x1000b2 poll ping 1971 11232 86869667 30x82 piperdcheck_ping 66051 380473 92153667 30x1000b2 poll ping 92153 480496 86869667 30x82 piperdcheck_ping 7608 110605 8332667 30x1000b2 poll ping 8332 284560 86869667 30x82 piperdcheck_ping 40097 336365 35317667 30x1000b2 poll ping 35317 385677 86869667 30x82 piperdcheck_ping 33714 102547 21689667 30x1000b2 poll ping 21689 210571 86869667 30x82 piperdcheck_ping 72142 347023 62694667 30x1000b2 poll ping 62694 439660 86869667 30x82 piperdcheck_ping 87347 447624 76730 1000 4 0x8000430xscreensaver 37569 319522 76730 1000 20x800402piecewise 11841 481452 18848667 30x1000b2 poll ping 18848 111230 86869667 30x82 piperdcheck_ping 69292 189646 44902667 30x1000b2 poll ping 44902 21064 86869667 30x82 piperdcheck_ping 67905 363250 1761507 30x92 kqreadpickup 86709 146945 91951 1000 30x92 kqreadimap 36596 227858 91951666 30x92 kqreadimap-login 90723 180281 91951 1000 30x92 kqreadimap 42867 129499 91951 1000 30x92 kqreadimap 52963 203919 91951 1000 30x92 kqreadimap 61599 167915 91951666 30x92 kqreadimap-login 54450 154426 91951666 30x92 kqreadimap-login 54061 103553 91951666 30x92 kqreadimap-login 39422 362611 85872715 30x90 netconperl 49809 260623 85872715 30x90 lockf perl 3824 92473 85872715 30x90 lockf perl 30990 398152 85872715 30x90 lockf perl 3327 264795 85872715 30x90 lockf perl 24406 217143 85872715 30x90 lockf perl 52871 404585 85872715 30x90 lockf perl 58220 118881 85872715 30x90 lockf perl 32898 307331 85872715 30x90 lockf perl 50203 396987 85872715 30x90 lockf perl 93024 24785 1 1000 30x100080 selectssh 22261 482889 52840 1000 30x100083 poll ssh 52840 404967 61470 1000 30x10008b pause ksh 35796 132597 61470 1000 30xb0 netio urxvt 61470 485028 72107 1000 30xb2 selecturxvt 72107 196192 1 1000 30x10008a pause sh 70238 396933 1 1000 30x100080 selectssh 41840 302658 10362 1000 30x100083 poll ssh 89638 411799 91951 1000 30x92 kqreadimap 49250 507579 81821 1000 30x100083 ttyin ksh 37608 523403 91951666 30x92 kqreadimap-login 75552 67606 33096 1000 30x100083 poll mutt 33096 492099 81821 1000 30x10008b pause ksh 81821 270355 1 1000 30x100080 kqreadtmux 24522 341281 91861 1000 30x100083 kqreadtmux 23218 441369 78842 1000 30x82 poll firefox 23218 430421 78842 1000 3 0x482 kqreadfirefox 23218 460884 78842 1000 3 0x482 fsleepfirefox 23218 17619 78842 1000 3 0x482 fsleepfirefox 23218 138796 78842 1000 3 0x482 fsleepfirefox 23218 341276 78842 1000 3 0x482
Re: Pump my sched: fewer SCHED_LOCK() & kill p_priority
On 02/06/19(Sun) 22:03, Mark Kettenis wrote: > > Date: Sat, 1 Jun 2019 18:55:20 -0300 > > From: Martin Pieuchot > [...] > > - First change is to stop calling tsleep(9) at PUSER. That makes > > it clear that all "sleeping priorities" are smaller than PUSER. > > That's important to understand for the diff below. `p_priority' > > is currently a placeholder for the "sleeping priority" and the > > "runnqueue priority". Both fields are separated by this diff. > > Separating out the fields is a good idea. The current way priorities > are recorded is just confusing. The use of PUSER vs. PWAIT seems to > be fairly arbitrary, so that is probably not a big issue. Except > maybe for the single-threded signal stuff. Would be good to get > guenther@'s thoughts on this bit. > > The PUSER -> PWAIT change isn't really necessary is it? It just makes > it easier for you to understand what;s going on when looking at the > queues. The problem becomes easier to understand with this change. They is currently two places where `p_priority' is updated iff it was previously >= PUSER. These two places are schedclock() and schedcpu() they both "duplicate" the same logic to update `p_priority' just after having recalculated `p_usrpri'. So with it we can say that this code do no apply to the new `p_slpprio' because it is always < PUSER. Now the question is why does `p_priority' exist and why it some times reflect `p_usrpri'? The only places where mi_switch() is called without updating `p_priority' nor putting the caller on a queue are related to signals. To exit the SSTOP state, setrunnable() is called on a thread. Since setrunnable() is shared between SSLEEP and SSTOP threads it is not obvious that `p_priority' is the "sleeping priority" in the first case and 'p_usrpri' in the second case. The only exception to this logic are stopped threads with nice(1) value that have a `p_priority' < PUSER. In this specific case my diff, that passes `p_usrpri' as argument to setrunnable(), introduces a change in behavior. However I doubt it matters since such threads are generally exceptions and the only ones with a priority < PUSER. This can be observed by stopping sndiod(8). > > [...] > > - `p_estcpu' `p_usrpri' and `p_slptime' which represent the "priority" > > of a thread are now updated while holding a per-thread mutex. As a > > result schedclock() and donice() no longer takes the SCHED_LOCK(), > > and schedcpu() almost never take it. > > Need to look closer at how this works. `p_slptime' could be removed if we want to simplify the logic. I don't see the point of deferring the calculation of sleeping/stopped threads. > > > - With this diff top(1) and ps(1) will report the "real" `p_usrpi' value > > when displaying priorities. This is helpful to understand what's > > happening: > > Do you intend to remove that bit before committing this? I don't know. Which "priority" should we export then? The sleep/run priority? This would be the most backward-compatible change. However the priority that really matters *is* `p_usrpri'. With this diff it becomes so much easier to understand what's happening...
Re: Pump my sched: fewer SCHED_LOCK() & kill p_priority
On Mon, Jun 03, 2019 at 11:50:14AM +0200, Solene Rapenne wrote: > On Sat, Jun 01, 2019 at 06:55:20PM -0300, Martin Pieuchot wrote: > > Diff below exists mainly for documentation and test purposes. If > > you're not interested about how to break the scheduler internals in > > pieces, don't read further and go straight to testing! > > I'm running it since a few hours. > > - games/gzdoom feels smoother with this patch (stuttering was certainly > related to audio) > - mpd playback doesn't seem interrupted under heavy load as it > occasionnaly did > > this may be coincidences or placebo effect. > On one of my machines, I'm running this diff and the unlock more syscalls diff (the one from mpi@ that claudio@ recently reposted). It does indeed seem more responsive (unclear which diff is doing this, though). The combination is also stable without anything out of the ordinary. -ml
Re: Pump my sched: fewer SCHED_LOCK() & kill p_priority
On 2.6.2019. 21:41, Martin Pieuchot wrote: > On 01/06/19(Sat) 18:55, Martin Pieuchot wrote: >> Diff below exists mainly for documentation and test purposes. If >> you're not interested about how to break the scheduler internals in >> pieces, don't read further and go straight to testing! > Updated diff to use IPL_SCHED and rebased to apply on top of -current :) i'm running this diff together with proctreelk diff on openbsd desktop with gnome and samba server and everything seems fine
Re: Pump my sched: fewer SCHED_LOCK() & kill p_priority
On Sat, Jun 01, 2019 at 06:55:20PM -0300, Martin Pieuchot wrote: > Diff below exists mainly for documentation and test purposes. If > you're not interested about how to break the scheduler internals in > pieces, don't read further and go straight to testing! I'm running it since a few hours. - games/gzdoom feels smoother with this patch (stuttering was certainly related to audio) - mpd playback doesn't seem interrupted under heavy load as it occasionnaly did this may be coincidences or placebo effect.
Re: Pump my sched: fewer SCHED_LOCK() & kill p_priority
Hi, This is pretty cool diff in splitting the sleep prio and the run prio! In a few places, the documentation comment could be changed from process to proc, tried to find it below and mark. It leaves reader confused for a moment. thanks > > - `spc_curpriority' is now updated at every context-switch. That means > >need_resched() won't be called after comparing an out-of-date value. > >At the same time, `p_usrpri' is initialized to the highest possible > >value for idle threads. > > - resched_proc() was calling need_resched() in the following conditions: > >- If the SONPROC thread has a higher priority that the current > > running thread (itself). > >- Twice in setrunnable() when we know that p_priority <= p_usrpri. > >- If schedcpu() considered that a thread, after updating its prio, > > should preempt the one running on the CPU pointed by `p_cpu'. > > > > The diff below simplify all of that by calling need_resched() when: > >- A thread is inserted in a CPU runqueue at a higher priority than > > the one SONPROC. > >- schedcpu() decides that a thread in SRUN state should preempt the > > one SONPROC. Just FYI, this should fix a serious bug, the resched_proc() call was very wrong in comparing stale priority in deciding what to schedule, and it made a pretty bad decision consistently! > > - `p_estcpu' `p_usrpri' and `p_slptime' which represent the "priority" > > of a thread are now updated while holding a per-thread mutex. As a > > result schedclock() and donice() no longer takes the SCHED_LOCK(), > > and schedcpu() almost never take it. You forgot to add resetpriority() which is also moved from SCHED_LOCK! > > > > - With this diff top(1) and ps(1) will report the "real" `p_usrpi' value > > when displaying priorities. This is helpful to understand what's > > happening: > > > > load averages: 0.99, 0.56, 0.25 two.lab.grenadille.net > > 23:42:10 > > 70 threads: 68 idle, 2 on processorup > > 0:09 > > CPU0: 0.0% user, 0.0% nice, 51.0% sys, 2.0% spin, 0.0% intr, 47.1% idle > > CPU1: 2.0% user, 0.0% nice, 51.0% sys, 3.9% spin, 0.0% intr, 43.1% idle > > Memory: Real: 47M/1005M act/tot Free: 2937M Cache: 812M Swap: 0K/4323M > > > > PID TID PRI NICE SIZE RES STATE WAIT TIMECPU COMMAND > > 81000 145101 7200K 1664K sleep/1 bored 1:15 36.96% softnet > > 47133 244097 730 2984K 4408K sleep/1 netio 1:06 35.06% cvs > > 64749 522184 660 176K 148K onproc/1 - 0:55 28.81% nfsd > > 21615 602473 12700K 1664K sleep/0 - 7:22 0.00% idle0 > > 12413 606242 12700K 1664K sleep/1 - 7:08 0.00% idle1 > > 85778 338258 500 4936K 7308K idle select0:10 0.00% ssh > > 22771 575513 500 176K 148K sleep/0 nfsd 0:02 0.00% nfsd > > > > > > > > - The removal of `p_priority' and the change that makes mi_switch() > > always update `spc_curpriority' might introduce some changes in > > behavior, especially with kernel threads that were not going through > > tsleep(9). We currently have some situations where the priority of > > the running thread isn't correctly reflected. This diff changes that > > which means we should be able to better understand where the problems > > are. > > > > I'd be interested in comments/tests/reviews before continuing in this > > direction. Note that at least part of this diff are required to split > > the accounting apart from the SCHED_LOCK() as well. > > > > I'll also work on exporting scheduler statistics unless somebody wants > > to beat me :) > > Updated diff to use IPL_SCHED and rebased to apply on top of -current :) > > Index: arch/amd64/amd64/genassym.cf > === > RCS file: /cvs/src/sys/arch/amd64/amd64/genassym.cf,v > retrieving revision 1.40 > diff -u -p -r1.40 genassym.cf > --- arch/amd64/amd64/genassym.cf 17 May 2019 19:07:15 - 1.40 > +++ arch/amd64/amd64/genassym.cf 1 Jun 2019 16:27:46 - > @@ -32,7 +32,6 @@ export VM_MIN_KERNEL_ADDRESS > > struct proc > member p_addr > -member p_priority > member p_stat > member p_wchan > member P_MD_REGS p_md.md_regs > Index: arch/hppa/hppa/genassym.cf > === > RCS file: /cvs/src/sys/arch/hppa/hppa/genassym.cf,v > retrieving revision 1.47 > diff -u -p -r1.47 genassym.cf > --- arch/hppa/hppa/genassym.cf9 Feb 2015 08:20:13 - 1.47 > +++ arch/hppa/hppa/genassym.cf1 Jun 2019 17:21:44 - > @@ -130,7 +130,6 @@ membertf_cr30 > # proc fields and values > struct proc > member p_addr > -member p_priority > member p_stat > member p_wchan > member p_md > Index: arch/i386/i386/esm.c >
Re: Pump my sched: fewer SCHED_LOCK() & kill p_priority
> Date: Sat, 1 Jun 2019 18:55:20 -0300 > From: Martin Pieuchot > > Diff below exists mainly for documentation and test purposes. If > you're not interested about how to break the scheduler internals in > pieces, don't read further and go straight to testing! Still digesting this, but. > - First change is to stop calling tsleep(9) at PUSER. That makes > it clear that all "sleeping priorities" are smaller than PUSER. > That's important to understand for the diff below. `p_priority' > is currently a placeholder for the "sleeping priority" and the > "runnqueue priority". Both fields are separated by this diff. Separating out the fields is a good idea. The current way priorities are recorded is just confusing. The use of PUSER vs. PWAIT seems to be fairly arbitrary, so that is probably not a big issue. Except maybe for the single-threded signal stuff. Would be good to get guenther@'s thoughts on this bit. The PUSER -> PWAIT change isn't really necessary is it? It just makes it easier for you to understand what;s going on when looking at the queues. > - When a thread goes to sleep, the priority argument of tsleep(9) is > now recorded in `p_slpprio'. This argument can be considered as part > of the sleep queue. Its purpose is to place the thread into a higher > runqueue when awoken. Great! > - Currently, for stopped threads, `p_priority' correspond to `p_usrpri'. > So setrunnable() has been untangled to place SSTOP and SSLEEP threads > in the preferred queue without having to use `p_priority'. Note that > `p_usrpri' is still recalculated *after* having called setrunqueue(). > This is currently fine because setrunnable() is called with SCHED_LOCK() > but it will be racy when we'll split it. > > - A new field, `p_runprio' has been introduced. It should be considered > as part of the per-CPU runqueues. It indicates where a current thread > is placed. You made this an uint8_t, whereas the other priority fields are all u_char. Different names for the same thing, but it probably is a good idea to keep this consistent for now. > - `spc_curpriority' is now updated at every context-switch. That means >need_resched() won't be called after comparing an out-of-date value. >At the same time, `p_usrpri' is initialized to the highest possible >value for idle threads. > > - resched_proc() was calling need_resched() in the following conditions: >- If the SONPROC thread has a higher priority that the current > running thread (itself). >- Twice in setrunnable() when we know that p_priority <= p_usrpri. >- If schedcpu() considered that a thread, after updating its prio, > should preempt the one running on the CPU pointed by `p_cpu'. > > The diff below simplify all of that by calling need_resched() when: >- A thread is inserted in a CPU runqueue at a higher priority than > the one SONPROC. >- schedcpu() decides that a thread in SRUN state should preempt the > one SONPROC. > > - `p_estcpu' `p_usrpri' and `p_slptime' which represent the "priority" > of a thread are now updated while holding a per-thread mutex. As a > result schedclock() and donice() no longer takes the SCHED_LOCK(), > and schedcpu() almost never take it. Need to look closer at how this works. > - With this diff top(1) and ps(1) will report the "real" `p_usrpi' value > when displaying priorities. This is helpful to understand what's > happening: Do you intend to remove that bit before committing this? > load averages: 0.99, 0.56, 0.25 two.lab.grenadille.net > 23:42:10 > 70 threads: 68 idle, 2 on processorup > 0:09 > CPU0: 0.0% user, 0.0% nice, 51.0% sys, 2.0% spin, 0.0% intr, 47.1% idle > CPU1: 2.0% user, 0.0% nice, 51.0% sys, 3.9% spin, 0.0% intr, 43.1% idle > Memory: Real: 47M/1005M act/tot Free: 2937M Cache: 812M Swap: 0K/4323M > > PID TID PRI NICE SIZE RES STATE WAIT TIMECPU COMMAND > 81000 145101 7200K 1664K sleep/1 bored 1:15 36.96% softnet > 47133 244097 730 2984K 4408K sleep/1 netio 1:06 35.06% cvs > 64749 522184 660 176K 148K onproc/1 - 0:55 28.81% nfsd > 21615 602473 12700K 1664K sleep/0 - 7:22 0.00% idle0 > 12413 606242 12700K 1664K sleep/1 - 7:08 0.00% idle1 > 85778 338258 500 4936K 7308K idle select0:10 0.00% ssh > 22771 575513 500 176K 148K sleep/0 nfsd 0:02 0.00% nfsd > > > > - The removal of `p_priority' and the change that makes mi_switch() > always update `spc_curpriority' might introduce some changes in > behavior, especially with kernel threads that were not going through > tsleep(9). We currently have some situations where the priority of > the running thread isn't correctly reflected. This diff changes that > which means we should be able to better understand where the problems >
Re: Pump my sched: fewer SCHED_LOCK() & kill p_priority
On 01/06/19(Sat) 18:55, Martin Pieuchot wrote: > Diff below exists mainly for documentation and test purposes. If > you're not interested about how to break the scheduler internals in > pieces, don't read further and go straight to testing! > > - First change is to stop calling tsleep(9) at PUSER. That makes > it clear that all "sleeping priorities" are smaller than PUSER. > That's important to understand for the diff below. `p_priority' > is currently a placeholder for the "sleeping priority" and the > "runnqueue priority". Both fields are separated by this diff. > > - When a thread goes to sleep, the priority argument of tsleep(9) is > now recorded in `p_slpprio'. This argument can be considered as part > of the sleep queue. Its purpose is to place the thread into a higher > runqueue when awoken. > > - Currently, for stopped threads, `p_priority' correspond to `p_usrpri'. > So setrunnable() has been untangled to place SSTOP and SSLEEP threads > in the preferred queue without having to use `p_priority'. Note that > `p_usrpri' is still recalculated *after* having called setrunqueue(). > This is currently fine because setrunnable() is called with SCHED_LOCK() > but it will be racy when we'll split it. > > - A new field, `p_runprio' has been introduced. It should be considered > as part of the per-CPU runqueues. It indicates where a current thread > is placed. > > - `spc_curpriority' is now updated at every context-switch. That means >need_resched() won't be called after comparing an out-of-date value. >At the same time, `p_usrpri' is initialized to the highest possible >value for idle threads. > > - resched_proc() was calling need_resched() in the following conditions: >- If the SONPROC thread has a higher priority that the current > running thread (itself). >- Twice in setrunnable() when we know that p_priority <= p_usrpri. >- If schedcpu() considered that a thread, after updating its prio, > should preempt the one running on the CPU pointed by `p_cpu'. > > The diff below simplify all of that by calling need_resched() when: >- A thread is inserted in a CPU runqueue at a higher priority than > the one SONPROC. >- schedcpu() decides that a thread in SRUN state should preempt the > one SONPROC. > > - `p_estcpu' `p_usrpri' and `p_slptime' which represent the "priority" > of a thread are now updated while holding a per-thread mutex. As a > result schedclock() and donice() no longer takes the SCHED_LOCK(), > and schedcpu() almost never take it. > > - With this diff top(1) and ps(1) will report the "real" `p_usrpi' value > when displaying priorities. This is helpful to understand what's > happening: > > load averages: 0.99, 0.56, 0.25 two.lab.grenadille.net > 23:42:10 > 70 threads: 68 idle, 2 on processorup > 0:09 > CPU0: 0.0% user, 0.0% nice, 51.0% sys, 2.0% spin, 0.0% intr, 47.1% idle > CPU1: 2.0% user, 0.0% nice, 51.0% sys, 3.9% spin, 0.0% intr, 43.1% idle > Memory: Real: 47M/1005M act/tot Free: 2937M Cache: 812M Swap: 0K/4323M > > PID TID PRI NICE SIZE RES STATE WAIT TIMECPU COMMAND > 81000 145101 7200K 1664K sleep/1 bored 1:15 36.96% softnet > 47133 244097 730 2984K 4408K sleep/1 netio 1:06 35.06% cvs > 64749 522184 660 176K 148K onproc/1 - 0:55 28.81% nfsd > 21615 602473 12700K 1664K sleep/0 - 7:22 0.00% idle0 > 12413 606242 12700K 1664K sleep/1 - 7:08 0.00% idle1 > 85778 338258 500 4936K 7308K idle select0:10 0.00% ssh > 22771 575513 500 176K 148K sleep/0 nfsd 0:02 0.00% nfsd > > > > - The removal of `p_priority' and the change that makes mi_switch() > always update `spc_curpriority' might introduce some changes in > behavior, especially with kernel threads that were not going through > tsleep(9). We currently have some situations where the priority of > the running thread isn't correctly reflected. This diff changes that > which means we should be able to better understand where the problems > are. > > I'd be interested in comments/tests/reviews before continuing in this > direction. Note that at least part of this diff are required to split > the accounting apart from the SCHED_LOCK() as well. > > I'll also work on exporting scheduler statistics unless somebody wants > to beat me :) Updated diff to use IPL_SCHED and rebased to apply on top of -current :) Index: arch/amd64/amd64/genassym.cf === RCS file: /cvs/src/sys/arch/amd64/amd64/genassym.cf,v retrieving revision 1.40 diff -u -p -r1.40 genassym.cf --- arch/amd64/amd64/genassym.cf17 May 2019 19:07:15 - 1.40 +++ arch/amd64/amd64/genassym.cf1 Jun 2019 16:27:46 - @@ -32,7 +32,6 @@ export
Pump my sched: fewer SCHED_LOCK() & kill p_priority
Diff below exists mainly for documentation and test purposes. If you're not interested about how to break the scheduler internals in pieces, don't read further and go straight to testing! - First change is to stop calling tsleep(9) at PUSER. That makes it clear that all "sleeping priorities" are smaller than PUSER. That's important to understand for the diff below. `p_priority' is currently a placeholder for the "sleeping priority" and the "runnqueue priority". Both fields are separated by this diff. - When a thread goes to sleep, the priority argument of tsleep(9) is now recorded in `p_slpprio'. This argument can be considered as part of the sleep queue. Its purpose is to place the thread into a higher runqueue when awoken. - Currently, for stopped threads, `p_priority' correspond to `p_usrpri'. So setrunnable() has been untangled to place SSTOP and SSLEEP threads in the preferred queue without having to use `p_priority'. Note that `p_usrpri' is still recalculated *after* having called setrunqueue(). This is currently fine because setrunnable() is called with SCHED_LOCK() but it will be racy when we'll split it. - A new field, `p_runprio' has been introduced. It should be considered as part of the per-CPU runqueues. It indicates where a current thread is placed. - `spc_curpriority' is now updated at every context-switch. That means need_resched() won't be called after comparing an out-of-date value. At the same time, `p_usrpri' is initialized to the highest possible value for idle threads. - resched_proc() was calling need_resched() in the following conditions: - If the SONPROC thread has a higher priority that the current running thread (itself). - Twice in setrunnable() when we know that p_priority <= p_usrpri. - If schedcpu() considered that a thread, after updating its prio, should preempt the one running on the CPU pointed by `p_cpu'. The diff below simplify all of that by calling need_resched() when: - A thread is inserted in a CPU runqueue at a higher priority than the one SONPROC. - schedcpu() decides that a thread in SRUN state should preempt the one SONPROC. - `p_estcpu' `p_usrpri' and `p_slptime' which represent the "priority" of a thread are now updated while holding a per-thread mutex. As a result schedclock() and donice() no longer takes the SCHED_LOCK(), and schedcpu() almost never take it. - With this diff top(1) and ps(1) will report the "real" `p_usrpi' value when displaying priorities. This is helpful to understand what's happening: load averages: 0.99, 0.56, 0.25 two.lab.grenadille.net 23:42:10 70 threads: 68 idle, 2 on processorup 0:09 CPU0: 0.0% user, 0.0% nice, 51.0% sys, 2.0% spin, 0.0% intr, 47.1% idle CPU1: 2.0% user, 0.0% nice, 51.0% sys, 3.9% spin, 0.0% intr, 43.1% idle Memory: Real: 47M/1005M act/tot Free: 2937M Cache: 812M Swap: 0K/4323M PID TID PRI NICE SIZE RES STATE WAIT TIMECPU COMMAND 81000 145101 7200K 1664K sleep/1 bored 1:15 36.96% softnet 47133 244097 730 2984K 4408K sleep/1 netio 1:06 35.06% cvs 64749 522184 660 176K 148K onproc/1 - 0:55 28.81% nfsd 21615 602473 12700K 1664K sleep/0 - 7:22 0.00% idle0 12413 606242 12700K 1664K sleep/1 - 7:08 0.00% idle1 85778 338258 500 4936K 7308K idle select0:10 0.00% ssh 22771 575513 500 176K 148K sleep/0 nfsd 0:02 0.00% nfsd - The removal of `p_priority' and the change that makes mi_switch() always update `spc_curpriority' might introduce some changes in behavior, especially with kernel threads that were not going through tsleep(9). We currently have some situations where the priority of the running thread isn't correctly reflected. This diff changes that which means we should be able to better understand where the problems are. I'd be interested in comments/tests/reviews before continuing in this direction. Note that at least part of this diff are required to split the accounting apart from the SCHED_LOCK() as well. I'll also work on exporting scheduler statistics unless somebody wants to beat me :) This has been tested on amd64 and sparc64 and includes ze mtx_enter_try(9) diff I just sent. Index: arch/amd64/amd64/genassym.cf === RCS file: /cvs/src/sys/arch/amd64/amd64/genassym.cf,v retrieving revision 1.40 diff -u -p -r1.40 genassym.cf --- arch/amd64/amd64/genassym.cf17 May 2019 19:07:15 - 1.40 +++ arch/amd64/amd64/genassym.cf1 Jun 2019 16:27:46 - @@ -32,7 +32,6 @@ exportVM_MIN_KERNEL_ADDRESS struct proc member p_addr -member p_priority member p_stat member p_wchan member P_MD_REGS p_md.md_regs Index: arch/hppa/hppa/genassym.cf