Re: any scheduler/ipi/wakeup bug fixed in the last year?

2019-12-11 Thread Hans Petter Selasky

> I wonder if there have been any bug fixes in that area over the past year or so.
> Any help and pointers are welcome.


Hi,

A long time ago I fixed an issue for ARM:

http://svnweb.freebsd.org/changeset/base/265913

I've always wondered why x86 does some fixed amount of idle spins before 
going to sleep.


--HPS
___
freebsd-current@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


Re: any scheduler/ipi/wakeup bug fixed in the last year?

2019-12-11 Thread Andriy Gapon
On 11/12/2019 13:05, Konstantin Belousov wrote:
> On Wed, Dec 11, 2019 at 12:48:36PM +0200, Andriy Gapon wrote:
...
>> tdq_oldswitchcnt = 26, tdq_lowpri = 92 '\\', tdq_ipipending = 0 '\000', tdq_idx
...

> What is the value of tdq_ipipending ?
> See https://reviews.freebsd.org/D22758

It's zero, so it's probably a different issue.


-- 
Andriy Gapon


Re: any scheduler/ipi/wakeup bug fixed in the last year?

2019-12-11 Thread Andriy Gapon
On 11/12/2019 12:48, Andriy Gapon wrote:
> So, if I am not confused, it appears like possibly a notification from a waking
> CPU to the woken CPU (CPU3) was never delivered.
> Potentially, a problem with cpu_idle_wakeup() ?
> 
> I wonder if there have been any bug fixes in that area over the past year or so.
> Any help and pointers are welcome.
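
To make the hypothesis quoted above concrete, here is a toy model of a lost
idle wakeup. It is not FreeBSD code: ToyCpu, wake_thread() and the
deliver_notification flag are made-up names for illustration, and the model
assumes an idle CPU looks at its run queue only after being notified. A real
CPU can also leave idle for other reasons (for example a timer interrupt).

```python
class ToyCpu:
    """Hypothetical model of one CPU with a run queue and an idle state."""

    def __init__(self):
        self.runq = []        # runnable threads assigned to this CPU
        self.idle = True      # parked in the idle loop, waiting for a wakeup
        self.executed = []    # threads that actually got to run

    def notify(self):
        # Stand-in for whatever kicks the CPU out of idle.
        self.idle = False

    def step(self):
        # In this model an idle CPU does not inspect its run queue.
        if self.idle:
            return
        while self.runq:
            self.executed.append(self.runq.pop(0))
        self.idle = True      # nothing left to run, go back to idle


def wake_thread(cpu, thread, deliver_notification=True):
    """Make a thread runnable on cpu; optionally drop the notification."""
    cpu.runq.append(thread)
    if deliver_notification:
        cpu.notify()
```

If the notification is dropped, the thread stays on the run queue and never
runs until something else wakes the CPU. That would match a thread that looks
runnable while its queue of work keeps growing.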

Hardware:
CPU: Intel(R) Xeon(R) CPU E5-2680 v4 @ 2.40GHz (2400.05-MHz K8-class CPU)
  Origin="GenuineIntel"  Id=0x406f1  Family=0x6  Model=0x4f  Stepping=1
FreeBSD/SMP: 2 package(s) x 14 core(s)

machdep.idle: acpi
machdep.idle_available: spin, mwait, hlt, acpi
machdep.idle_mwait: 1

-- 
Andriy Gapon


Re: any scheduler/ipi/wakeup bug fixed in the last year?

2019-12-11 Thread Konstantin Belousov
On Wed, Dec 11, 2019 at 12:48:36PM +0200, Andriy Gapon wrote:
> 
> I am investigating a problem that originally looked like a ZFS I/O hang.
> But it quickly became obvious that the GEOM "up" queue was not being processed.
> (kgdb) p g_bio_run_up
> $54 = {bio_queue = {tqh_first = 0xf801d8627178, tqh_last =
> 0xf80134751658}, bio_queue_lock = {lock_object = {lo_name =
> 0x80ad11ab "bio queue", lo_flags = 16973824, lo_data = 0, lo_witness =
> 0x0}, mtx_lock = 0}, bio_queue_length = 19}
> 
> The queue is unlocked and there are 19 bio-s on it.
> At the same time:
> (kgdb) tid 100125
> (kgdb) bt
> #0  sched_switch (td=0xf80111b23000, newtd=0xf801119d2000,
> flags=) at /usr/src/sys/kern/sched_ule.c:1997
> #1  0x80705405 in mi_switch (flags=, newtd=0x0) at
> /usr/src/sys/kern/kern_synch.c:436
> #2  0x8074844a in sleepq_wait (wchan=, pri=)
> at /usr/src/sys/kern/subr_sleepqueue.c:694
> #3  0x80704ed6 in _sleep (ident=0x81233d68 ,
> lock=0x810d72e0 , priority=,
> wmesg=0x80b417e4 "-", sbt=0, pr=0, flags=256) at
> /usr/src/sys/kern/kern_synch.c:216
> #4  0x8067713c in g_io_schedule_up (tp=) at
> /usr/src/sys/geom/geom_io.c:908
> #5  0x8067772d in g_up_procbody (arg=) at
> /usr/src/sys/geom/geom_kern.c:99
> #6  0x806c64c1 in fork_exit (callout=0x806776c0 ,
> arg=0x0, frame=0xfe014cc87ac0) at /usr/src/sys/kern/kern_fork.c:1042
> 
> The "g_up" thread is sleeping as if the queue was empty.
> The code in g_io_schedule_up() and g_io_deliver() is obviously correct with
> respect to synchronizing the queue access and wait/wakeup.
> So, there must be something deeper.
> 
> I examined the struct thread and the related scheduling objects:
> (kgdb) p *td
> $57 = {td_lock = 0x810f3a00 , td_proc =
> 0xf801119cd590, td_plist = {tqe_next = 0xf80111b1f5e0, tqe_prev =
> 0xf80111b235f0}, td_runq = {tqe_next = 0x0,
> tqe_prev = 0x810f3bd8 }, td_slpq = {tqe_next = 0x0,
> tqe_prev = 0xf80100050280}, td_lockq = {tqe_next = 0x0, tqe_prev =
> 0xfe018e443998}, td_hash = {le_next = 0x0, le_prev = 0xfe014bab68e8},
>   td_cpuset = 0xf80111b3a618, td_domain = {dr_policy = 0x810d78d8
> , dr_iterator = 0}, td_sel = 0x0, td_sleepqueue =
> 0xf80100050280, td_turnstile = 0xf801a7ed8a80, td_rlqe = 0x0,
>   td_umtxq = 0xf80111b13e80, td_tid = 100125, td_sigqueue = {sq_signals =
> {__bits = {0, 0, 0, 0}}, sq_kill = {__bits = {0, 0, 0, 0}}, sq_ptrace = {__bits
> = {0, 0, 0, 0}}, sq_list = {tqh_first = 0x0,
>   tqh_last = 0xf80111b230d8}, sq_proc = 0xf801119cd590, sq_flags =
> 1}, td_lend_user_pri = 255 '\377', td_flags = 4, td_inhibitors = 0, td_pflags =
> 2097152, td_dupfd = 0, td_sqqueue = 0, td_wchan = 0x0,
>   td_wmesg = 0x0, td_owepreempt = 0 '\000', td_tsqueue = 0 '\000', td_locks = 0,
> td_rw_rlocks = 0, td_sx_slocks = 0, td_lk_slocks = 0, td_stopsched = 0,
> td_blocked = 0x0, td_lockname = 0x0, td_contested = {lh_first = 0x0},
>   td_sleeplocks = 0x0, td_intr_nesting_level = 0, td_pinned = 0, td_ucred =
> 0xf80100082b00, td_limit = 0xf80100082a00, td_slptick = 0, td_blktick =
> 0, td_swvoltick = -2139537593, td_swinvoltick = -2139537706, td_cow = 0,
>   td_ru = {ru_utime = {tv_sec = 0, tv_usec = 0}, ru_stime = {tv_sec = 0, tv_usec
> = 0}, ru_maxrss = 0, ru_ixrss = 0, ru_idrss = 0, ru_isrss = 0, ru_minflt = 0,
> ru_majflt = 0, ru_nswap = 0, ru_inblock = 0, ru_oublock = 0,
> ru_msgsnd = 0, ru_msgrcv = 0, ru_nsignals = 0, ru_nvcsw = 14113408,
> ru_nivcsw = 240828}, td_rux = {rux_runtime = 202213463115, rux_uticks = 0,
> rux_sticks = 10554, rux_iticks = 0, rux_uu = 0, rux_su = 36818497,
> rux_tu = 36818497}, td_incruntime = 46828278, td_runtime = 202260266673,
> td_pticks = 10557, td_sticks = 3, td_iticks = 0, td_uticks = 0, td_intrval = 0,
> td_oldsigmask = {__bits = {0, 0, 0, 0}}, td_generation = 14354236,
>   td_sigstk = {ss_sp = 0x0, ss_size = 0, ss_flags = 0}, td_xsig = 0,
> td_profil_addr = 0, td_profil_ticks = 0, td_name = "g_up", '\000'  times>, td_fpop = 0x0, td_dbgflags = 0, td_si = {si_signo = 0, si_errno = 0,
> si_code = 0, si_pid = 0, si_uid = 0, si_status = 0, si_addr = 0x0, si_value
> = {sival_int = 0, sival_ptr = 0x0, sigval_int = 0, sigval_ptr = 0x0}, _reason =
> {_fault = {_trapno = 0}, _timer = {_timerid = 0, _overrun = 0},
>   _mesgq = {_mqd = 0}, _poll = {_band = 0}, __spare__ = {__spare1__ = 0,
> __spare2__ = {0, 0, 0, 0, 0, 0, 0, td_ng_outbound = 0, td_osd = {osd_nslots
> = 0, osd_slots = 0x0, osd_next = {le_next = 0x0, le_prev = 0x0}},
>   td_map_def_user = 0x0, td_dbg_forked = 0, td_vp_reserv = 0, td_no_sleeping =
> 0, td_su = 0x0, td_sleeptimo = 0, td_rtcgen = 0, td_sigmask = {__bits = {0, 0,
> 0, 0}}, td_rqindex = 23 '\027', td_base_pri = 92 '\\',
>   td_priority = 92 '\\', td_pri_class = 3 '\003', td_user_pri = 120 'x',
> td_base_user_pri = 120 'x', td_rb_list = 0, td_rbp_list = 

any scheduler/ipi/wakeup bug fixed in the last year?

2019-12-11 Thread Andriy Gapon


I am investigating a problem that originally looked like a ZFS I/O hang.
But it quickly became obvious that the GEOM "up" queue was not being processed.
(kgdb) p g_bio_run_up
$54 = {bio_queue = {tqh_first = 0xf801d8627178, tqh_last =
0xf80134751658}, bio_queue_lock = {lock_object = {lo_name =
0x80ad11ab "bio queue", lo_flags = 16973824, lo_data = 0, lo_witness =
0x0}, mtx_lock = 0}, bio_queue_length = 19}

The queue is unlocked and there are 19 bio-s on it.
At the same time:
(kgdb) tid 100125
(kgdb) bt
#0  sched_switch (td=0xf80111b23000, newtd=0xf801119d2000,
flags=) at /usr/src/sys/kern/sched_ule.c:1997
#1  0x80705405 in mi_switch (flags=, newtd=0x0) at
/usr/src/sys/kern/kern_synch.c:436
#2  0x8074844a in sleepq_wait (wchan=, pri=)
at /usr/src/sys/kern/subr_sleepqueue.c:694
#3  0x80704ed6 in _sleep (ident=0x81233d68 ,
lock=0x810d72e0 , priority=,
wmesg=0x80b417e4 "-", sbt=0, pr=0, flags=256) at
/usr/src/sys/kern/kern_synch.c:216
#4  0x8067713c in g_io_schedule_up (tp=) at
/usr/src/sys/geom/geom_io.c:908
#5  0x8067772d in g_up_procbody (arg=) at
/usr/src/sys/geom/geom_kern.c:99
#6  0x806c64c1 in fork_exit (callout=0x806776c0 ,
arg=0x0, frame=0xfe014cc87ac0) at /usr/src/sys/kern/kern_fork.c:1042

The "g_up" thread is sleeping as if the queue was empty.
The code in g_io_schedule_up() and g_io_deliver() is obviously correct with
respect to synchronizing the queue access and wait/wakeup.
So, there must be something deeper.
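
For readers less familiar with the pattern referred to above, here is a minimal
userland sketch of queue access synchronized with wait/wakeup. It is an
analogy only, not the GEOM code: UpQueue, deliver(), run() and demo() are
hypothetical names, and Python's threading.Condition stands in for the kernel
mutex plus sleep/wakeup channel.

```python
import threading
from collections import deque


class UpQueue:
    """Toy work queue serviced by a single worker that sleeps when empty."""

    def __init__(self):
        self.cond = threading.Condition()  # lock and wait channel together
        self.items = deque()
        self.done = False

    def deliver(self, item):
        # Producer: enqueue under the lock, then wake the worker.
        with self.cond:
            self.items.append(item)
            self.cond.notify()

    def shutdown(self):
        # Ask the worker to exit once the queue has been drained.
        with self.cond:
            self.done = True
            self.cond.notify()

    def run(self, handled):
        # Worker: re-check the queue under the lock before every sleep, so a
        # wakeup posted between the check and the sleep cannot be lost.
        while True:
            with self.cond:
                while not self.items and not self.done:
                    self.cond.wait()
                if not self.items:
                    return
                item = self.items.popleft()
            handled.append(item)  # process the item outside the lock


def demo(n=19):
    """Deliver n items to a worker thread and return what it handled."""
    q = UpQueue()
    handled = []
    worker = threading.Thread(target=q.run, args=(handled,))
    worker.start()
    for i in range(n):
        q.deliver(i)
    q.shutdown()
    worker.join()
    return handled
```

With this structure the worker cannot sleep while items are queued, provided
the wakeup itself actually reaches it. That is why a non-empty queue with a
sleeping worker points below the queue logic.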

I examined the struct thread and the related scheduling objects:
(kgdb) p *td
$57 = {td_lock = 0x810f3a00 , td_proc =
0xf801119cd590, td_plist = {tqe_next = 0xf80111b1f5e0, tqe_prev =
0xf80111b235f0}, td_runq = {tqe_next = 0x0,
tqe_prev = 0x810f3bd8 }, td_slpq = {tqe_next = 0x0,
tqe_prev = 0xf80100050280}, td_lockq = {tqe_next = 0x0, tqe_prev =
0xfe018e443998}, td_hash = {le_next = 0x0, le_prev = 0xfe014bab68e8},
  td_cpuset = 0xf80111b3a618, td_domain = {dr_policy = 0x810d78d8
, dr_iterator = 0}, td_sel = 0x0, td_sleepqueue =
0xf80100050280, td_turnstile = 0xf801a7ed8a80, td_rlqe = 0x0,
  td_umtxq = 0xf80111b13e80, td_tid = 100125, td_sigqueue = {sq_signals =
{__bits = {0, 0, 0, 0}}, sq_kill = {__bits = {0, 0, 0, 0}}, sq_ptrace = {__bits
= {0, 0, 0, 0}}, sq_list = {tqh_first = 0x0,
  tqh_last = 0xf80111b230d8}, sq_proc = 0xf801119cd590, sq_flags =
1}, td_lend_user_pri = 255 '\377', td_flags = 4, td_inhibitors = 0, td_pflags =
2097152, td_dupfd = 0, td_sqqueue = 0, td_wchan = 0x0,
  td_wmesg = 0x0, td_owepreempt = 0 '\000', td_tsqueue = 0 '\000', td_locks = 0,
td_rw_rlocks = 0, td_sx_slocks = 0, td_lk_slocks = 0, td_stopsched = 0,
td_blocked = 0x0, td_lockname = 0x0, td_contested = {lh_first = 0x0},
  td_sleeplocks = 0x0, td_intr_nesting_level = 0, td_pinned = 0, td_ucred =
0xf80100082b00, td_limit = 0xf80100082a00, td_slptick = 0, td_blktick =
0, td_swvoltick = -2139537593, td_swinvoltick = -2139537706, td_cow = 0,
  td_ru = {ru_utime = {tv_sec = 0, tv_usec = 0}, ru_stime = {tv_sec = 0, tv_usec
= 0}, ru_maxrss = 0, ru_ixrss = 0, ru_idrss = 0, ru_isrss = 0, ru_minflt = 0,
ru_majflt = 0, ru_nswap = 0, ru_inblock = 0, ru_oublock = 0,
ru_msgsnd = 0, ru_msgrcv = 0, ru_nsignals = 0, ru_nvcsw = 14113408,
ru_nivcsw = 240828}, td_rux = {rux_runtime = 202213463115, rux_uticks = 0,
rux_sticks = 10554, rux_iticks = 0, rux_uu = 0, rux_su = 36818497,
rux_tu = 36818497}, td_incruntime = 46828278, td_runtime = 202260266673,
td_pticks = 10557, td_sticks = 3, td_iticks = 0, td_uticks = 0, td_intrval = 0,
td_oldsigmask = {__bits = {0, 0, 0, 0}}, td_generation = 14354236,
  td_sigstk = {ss_sp = 0x0, ss_size = 0, ss_flags = 0}, td_xsig = 0,
td_profil_addr = 0, td_profil_ticks = 0, td_name = "g_up", '\000' , td_fpop = 0x0, td_dbgflags = 0, td_si = {si_signo = 0, si_errno = 0,
si_code = 0, si_pid = 0, si_uid = 0, si_status = 0, si_addr = 0x0, si_value
= {sival_int = 0, sival_ptr = 0x0, sigval_int = 0, sigval_ptr = 0x0}, _reason =
{_fault = {_trapno = 0}, _timer = {_timerid = 0, _overrun = 0},
  _mesgq = {_mqd = 0}, _poll = {_band = 0}, __spare__ = {__spare1__ = 0,
__spare2__ = {0, 0, 0, 0, 0, 0, 0, td_ng_outbound = 0, td_osd = {osd_nslots
= 0, osd_slots = 0x0, osd_next = {le_next = 0x0, le_prev = 0x0}},
  td_map_def_user = 0x0, td_dbg_forked = 0, td_vp_reserv = 0, td_no_sleeping =
0, td_su = 0x0, td_sleeptimo = 0, td_rtcgen = 0, td_sigmask = {__bits = {0, 0,
0, 0}}, td_rqindex = 23 '\027', td_base_pri = 92 '\\',
  td_priority = 92 '\\', td_pri_class = 3 '\003', td_user_pri = 120 'x',
td_base_user_pri = 120 'x', td_rb_list = 0, td_rbp_list = 0, td_rb_inact = 0,
td_sa = {code = 0, callp = 0x0, args = {0 }, narg = 0},
  td_pcb = 0xfe014cc87b80, td_state = TDS_RUNQ, td_uretoff = {tdu_retval =
{0, 0}, tdu_off = 0}, td_cowgen = 0, td_slpcallout = {c_links = {le = {le_next =
0x0, le_prev = 0x0}, sle