Re: bus_space_barrier semantics

2022-07-16 Thread Martin Husemann
On Sat, Jul 16, 2022 at 01:42:38PM -0700, Jason Thorpe wrote:

> Didn't OpenBSD remove the offset / length arguments?  Anyway, I'm not
> particularly concerned about this part, but it would be a nice wart to
> rid ourselves of.

Since these calls should (now) be relatively rare, and conversion would
just mean removing the arguments when we port a driver, that sounds like
an acceptable difference to me.

Martin


Re: Anyone recall the dreaded tstile issue?

2022-07-16 Thread matthew green
> I got two off-list mails suggesting LOCKDEBUG.  I tried that today -
> I added LOCKDEBUG and VNODE_LOCKDEBUG both (the latter I found in the
> ALL config when grepping for LOCKDEBUG).
>
> That kernel panicked promptly on boot
>
> panic: vop_read: vp: locked 0, expected 1

just use LOCKDEBUG for now.

this is VNODE_LOCKDEBUG and it might be buggy since no one
uses it, but LOCKDEBUG is quite common.


.mrg.


Re: Anyone recall the dreaded tstile issue?

2022-07-16 Thread Mouse
I got two off-list mails suggesting LOCKDEBUG.  I tried that today -
I added LOCKDEBUG and VNODE_LOCKDEBUG both (the latter I found in the
ALL config when grepping for LOCKDEBUG).

That kernel panicked promptly on boot

panic: vop_read: vp: locked 0, expected 1

with a stack trace that goes breakpoint, panic, VOP_READ, vn_rdwr,
check_exec, execve1, sys_execve, start_init.  I haven't yet dug any
deeper; I don't know whether this is part of my problem (probably not,
I would guess, because of the apparent puffs connection), a bug I
introduced and just never noticed before, or something broken with one
(or both!) of those options.  I'll be investigating to figure out which
of those it is, but, for the moment, LOCKDEBUG is off the table. :(

I'm also going to be using a different machine for at least some of my
testing; rebooting this machine this much is...inconvenient.

/~\ The ASCII Mouse
\ / Ribbon Campaign
 X  Against HTML mo...@rodents-montreal.org
/ \ Email!   7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B


Re: bus_space_barrier semantics

2022-07-16 Thread Jason Thorpe



> On Jul 16, 2022, at 11:15 AM, Taylor R Campbell  wrote:
> 
> Thoughts?

110% on-board with all of this.

> P.S.  I also considered eliminating the offset/length argument,
> because no implementations take advantage of them, so I started (with
> maya@'s help wrangling coccinelle) to draft a patch to remove them.
> However, the other BSDs have the same API, so changing this would make
> it more of a pain to share drivers.

Didn’t OpenBSD remove the offset / length arguments?  Anyway, I’m not 
particularly concerned about this part, but it would be a nice wart to rid 
ourselves of.

-- thorpej




bus_space_barrier semantics

2022-07-16 Thread Taylor R Campbell
tl;dr -- Proposal:
- Clarify bus_space_barrier is needed only if prefetchable/cacheable.
- Define BUS_SPACE_BARRIER_READ as acquire barrier, like membar_acquire.
- Define BUS_SPACE_BARRIER_WRITE as release barrier, like membar_release.


Many people have been confused by the semantics of bus_space_barrier,
and the bus_space(9) man page is internally inconsistent.

But on every machine I've examined, bus_space is implemented so that
bus_space_read/write are observed by the device in program order
already -- except if you enable BUS_SPACE_MAP_PREFETCHABLE or
BUS_SPACE_MAP_CACHEABLE.

Only a handful of drivers other than framebuffer drivers even use
bus_space_barrier -- the vast majority of drivers using bus_space lack
bus_space_barrier calls and would undoubtedly fail to function at all
if the machine were reordering their bus_space_reads/writes.

I propose we explicitly adopt and document the following semantics, to
match most existing usage and implementations:

- bus_space_barrier is needed only for mappings with
  BUS_SPACE_MAP_PREFETCHABLE or BUS_SPACE_MAP_CACHEABLE.
  bus_space_read/write on non-prefetchable, non-cacheable mappings on
  a single device are issued in program order and never fused or torn.

- bus_space_barrier(t, h, o, l, BUS_SPACE_BARRIER_READ) means that any
  program-prior bus_space_read (on any space) has returned data from
  the device or storage before any program-later bus_space_read or
  bus_space_write on t/h/o/l (tag, handle, offset, length), or memory
  access via the l bytes from (char *)bus_space_vaddr(t, h) + o.

  If h was mapped non-prefetchable and non-cacheable this may be a
  noop.

  This functions similarly to membar_acquire, but for I/O.

- bus_space_barrier(t, h, o, l, BUS_SPACE_BARRIER_WRITE) means that
  any program-prior bus_space_read or bus_space_write on t/h/o/l, or
  memory access via the l bytes from (char *)bus_space_vaddr(t, h) +
  o, has returned data from or been observed by the device or storage
  before any program-later bus_space_write (on any space) is
  observed by the device.

  If h was mapped non-prefetchable and non-cacheable this may be a
  noop.

  This functions similarly to membar_release, but for I/O.

- bus_space_barrier(t, h, o, l, BUS_SPACE_BARRIER_READ |
  BUS_SPACE_BARRIER_WRITE) means that all program-prior bus_space_read
  or bus_space_write on any space must complete before any
  program-later bus_space_read or bus_space_write on any space.

  Note that this is independent of t/h/o/l; there's no way the
  arguments can elide any action here.

  This functions similarly to membar_sync, but for I/O.

- No MI notion of `ordering' vs `completion'.

  In practical terms this distinction is relevant only for software
  timing measurement (when do machine instructions retire/commit on
  the CPU?) or highly machine-dependent things (when do speculative
  TLB translation table walks happen? when does a machine-check
  exception trigger?), not for device driver correctness (in what
  order do reads and writes hit the device?).

  Currently there is no such MI difference in the API but the man page
  makes the distinction nevertheless in its prose.

Drivers that don't use BUS_SPACE_MAP_PREFETCHABLE or
BUS_SPACE_MAP_CACHEABLE never have to worry about barriers -- and,
since `cacheable' is almost always, if not always, used for ROMs,
bus_space_barrier is relevant essentially only for drivers that use
BUS_SPACE_MAP_PREFETCHABLE, e.g. for framebuffers or command queues.
(Drivers may still have to worry about bus_dmamap_sync if they use
bus_dma(9), of course.)

The usage model is something like:

/* Map MMIO registers and command/response regions on attach */
bus_space_map(sc->sc_regt, ..., 0, &sc->sc_regh);
bus_space_map(sc->sc_memt, ..., BUS_SPACE_MAP_PREFETCHABLE,
    &sc->sc_memh);

/* Submit a command */
i = sc->sc_nextcmdslot;
bus_space_write_4(sc->sc_memt, sc->sc_memh, CMDSLOT(i), cmd);
bus_space_write_4(sc->sc_memt, sc->sc_memh, CMDSLOT(i) + 4, arg1);
bus_space_write_4(sc->sc_memt, sc->sc_memh, CMDSLOT(i) + 8, arg2);
...
bus_space_write_4(sc->sc_memt, sc->sc_memh, CMDSLOT(i) + 4*n, argn);
bus_space_barrier(sc->sc_memt, sc->sc_memh, CMDSLOT(i), 4*n,
    BUS_SPACE_BARRIER_WRITE);
bus_space_write_4(sc->sc_regt, sc->sc_regh, CMDTAIL, i);
sc->sc_nextcmdslot = (i + 1) % sc->sc_ncmdslots;

/* Obtain a response */
i = bus_space_read_4(sc->sc_regt, sc->sc_regh, RESPTAIL);
bus_space_barrier(sc->sc_memt, sc->sc_memh, RESPSLOT(i), 4*n,
    BUS_SPACE_BARRIER_READ);
... bus_space_read_4(sc->sc_memt, sc->sc_memh, RESPSLOT(i) + 4*j) ...
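
To make the membar analogy concrete, here is what the same
release/acquire discipline looks like on ordinary shared memory.  This
is only an illustrative sketch with hypothetical names (struct msg,
producer, consumer); it is not part of the proposal:

#include <sys/atomic.h>

struct msg {
        int     m_payload;
        int     m_ready;        /* 0 = empty, 1 = m_payload valid */
};

void
producer(struct msg *m, int v)
{
        m->m_payload = v;               /* store the data */
        membar_release();               /* data visible before the flag */
        atomic_store_relaxed(&m->m_ready, 1);
}

int
consumer(struct msg *m)
{
        while (atomic_load_relaxed(&m->m_ready) == 0)
                continue;               /* spin until the flag is seen */
        membar_acquire();               /* flag read before data read */
        return m->m_payload;
}

In the usage model above, BUS_SPACE_BARRIER_WRITE plays the role of
membar_release before the CMDTAIL doorbell write, and
BUS_SPACE_BARRIER_READ plays the role of membar_acquire after reading
RESPTAIL.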

The practical consequences of this proposal are:

1. We remove a lot of verbiage (and internal inconsistency) in
   bus_space(9).

2. We can omit needless bus_space_barrier calls in drivers that don't
   use BUS_SPACE_MAP_PREFETCHABLE (or BUS_SPACE_MAP_CACHEABLE).

Re: Anyone recall the dreaded tstile issue?

2022-07-16 Thread Mouse
One other thing of note.

I just now looked at the record I captured of the last hang.  I looked
for other processes waiting on puffsrpl, on the theory that if there
are two unexpected such, there may be others.

There are:

557    1  3    084   d2251500  mailwrapper  puffsrpl
3985   1  3    184   d1c3ba80  bozohttpd    puffsrpl
19816  1  3    084   d23d9360  smtpd        puffsrpl
557    1  3    084   d2251500  mailwrapper  puffsrpl
3985   1  3    184   d1c3ba80  bozohttpd    puffsrpl
23619  1  3    184   d68ef080  bozohttpd    puffsrpl
3698   1  3    084   d2eafca0  as           puffsrpl
23176  1  3    184   d1f3b7e0  git          puffsrpl
7871   1  3    184   d1e382a0  bozohttpd    puffsrpl

bozohttpd, those do not surprise me.  At least half the hits I get
these days are fetching from the puffs mount (it's
/export/ftp/pub/mouse/git-unpacked; /export/ftp is also the root
directory used by bozohttpd.)

The git and one of the mailwrappers were at the end of the tstile wait
chains I found.  That leaves the other mailwrapper, the smtpd, and the
as.

None of those five would I expect to be going anywhere near the puffs
mount point.  That an as is running does not surprise me; I had a
build-of-the-world going when this happened.  But it shouldn't've been
touching anything outside of /usr/src, the build-into directories, and
/tmp (or moral equivalent - /usr/tmp, or /var/tmp, or some such).

I'm going to have to get stack traces next hang.  I've now set up a
dump partition, so I have a fighting chance of getting a kernel dump.

/~\ The ASCII Mouse
\ / Ribbon Campaign
 X  Against HTML mo...@rodents-montreal.org
/ \ Email!   7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B


Re: Anyone recall the dreaded tstile issue?

2022-07-16 Thread Mouse
Oh, and one thing I don't think I've said yet:

Thank you very much, everyone who's even thought about this issue!
I've seen at least two off-list emails already in addition to the
on-list ones.

/~\ The ASCII Mouse
\ / Ribbon Campaign
 X  Against HTML mo...@rodents-montreal.org
/ \ Email!   7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B


Re: Anyone recall the dreaded tstile issue?

2022-07-16 Thread Mouse
>> My best guess at the moment is that there is a deadlock loop where
>> something tries to touch the puffs filesystem, the user process
>> forks a child as part of that operation, and the child gets locked
>> up trying to access the puffs filesystem.
> That is possible, [...]

> A more common case, I believe, is [...failing to unlock in an error
> path...]

> The function [...] which causes the problem is no longer active, no
> amount of stack tracing will find it.  The process which called it
> might not even still exist, it might have received the error return,
> and exited.

I find the notion of a nonexistent process holding a lock disturbing,
but of course that's just a human-layer issue.

> Finding this kind of thing requires very careful and thorough code
> reading, analysing every lock, and making sure that lock gets
> released, somewhere, on every possible path after it is taken.

Well...if I wanted to debug that, I would probably grow each lock by
some kind of indication (a PC value, or __FILE__ and __LINE__) of where
it was last locked.  Then, once the culprit lock is found, that record
would point straight at the code that last took it.
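
A minimal sketch of that idea - a hypothetical wrapper, not anything
that exists in the tree - might look like:

#include <sys/mutex.h>

struct tracked_lock {
        kmutex_t         tl_mtx;
        const char      *tl_file;       /* where it was last locked */
        int              tl_line;
};

#define TRACKED_LOCK(tl) do {                                   \
        mutex_enter(&(tl)->tl_mtx);                             \
        (tl)->tl_file = __FILE__;                               \
        (tl)->tl_line = __LINE__;                               \
} while (0)

#define TRACKED_UNLOCK(tl) do {                                 \
        (tl)->tl_file = NULL;                                   \
        (tl)->tl_line = 0;                                      \
        mutex_exit(&(tl)->tl_mtx);                              \
} while (0)

A wedged system could then be asked, from ddb or a crash dump, where
each suspect lock was last taken.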

> The best you can really hope for from examining the wedged system is
> to find which lock is (usually "might be") the instigator of it all.
> That can help narrow the focus of code investigation.

At the moment, I'd be happy with that much.  But the system has only
the one puffs mount, which is off under /export/ftp, not anywhere that
is, for example, expected to be on anyone's $PATH, and all the
X-waiting-on-Y-waiting-on-Z chains end up with someone waiting on
puffsrpl.  And the puffs userland processes show no indication of being
stuck in the "holding a lock that's not getting released" sense.  So
there probably is nothing here that could be caught by, for example,
in-kernel deadlock detection.

I am basically certain it has _something_ to do with the puffs
filesystem, because of the puffsrpl waits and because it started
happening shortly after I added the puffs mount.  The real puzzle, for
me, in this latest hang is why/how the mailwrapper and git processes
ended up waiting for puffsrpl.  I will allocate a piece of disk for a
kernel coredump, so I can do detailed post-mortem on a wedged system.
(The machine's main function is to forward packets; I can't really keep
it in ddb for hours while I pore over details of a lockup.)

I will also add timeouts in the puffs userland code, so that if a
forked git process takes too long, it is nuked, with the access that
led to it returning an error - and, of course, logging all over the
place.
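
For the timeout part, a rough userland sketch - hypothetical helper
name, coarse one-second polling - could look like:

#include <sys/types.h>
#include <sys/wait.h>
#include <errno.h>
#include <signal.h>
#include <syslog.h>
#include <unistd.h>

/* Run a child command; kill it and return ETIMEDOUT if it runs too long. */
static int
run_with_timeout(char *const argv[], int timeout_secs)
{
        pid_t pid;
        int status, waited;

        if ((pid = fork()) == -1)
                return errno;
        if (pid == 0) {
                execvp(argv[0], argv);
                _exit(127);             /* exec failed */
        }
        for (waited = 0; waited < timeout_secs; waited++) {
                if (waitpid(pid, &status, WNOHANG) == pid)
                        return (WIFEXITED(status) &&
                            WEXITSTATUS(status) == 0) ? 0 : EIO;
                sleep(1);               /* coarse polling is enough here */
        }
        syslog(LOG_WARNING, "child %s (pid %d) timed out, killing",
            argv[0], (int)pid);
        kill(pid, SIGKILL);
        (void)waitpid(pid, &status, 0); /* reap it */
        return ETIMEDOUT;
}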

I'm also going to change ddb's ps listing to include PPID; in this last
hang, I would have liked to have known whether the git process were a
child of the gitfs process.

I will also take that "other system" I mentioned, make a puffs mount,
and then start playing that game.  If I can get it to tstile in a
reasonable time frame, it will greatly accelerate debugging this.

> Mouse, start with the code you added ... make sure there are no
> problems like this buried in it somewhere (your own code, and
> everything it calls).

I haven't touched the puffs kernel code.  Of course, that doesn't mean
it doesn't have any such issues, but it makes it seem less likely to
me.  While it doesn't rule out such problems in any of my other
changes, it makes that, too, less likely; it would have to be a bug that
remained latent until I started using puffs.

> If that ends up finding nothing, then the best course of action might
> be to use a fairly new kernel.

Possibly, but unless a new kernel can be built with 5.2's compiler, I
run right back into the licensing issue that's the reason I froze at
5.2 to begin with.  I'd also have to port at least a few of my kernel
changes to the new kernel.

I may have to resort to that, but I'd much rather avoid it; even if the
licensing turns out to not be an issue, it would be a lot of work.

> I haven't seen a tstile lockup in ages, [...]

I never saw them at all until I started playing with puffs.  The major
reason I'm reluctant to suspect your "lock held by a nonexistent
process" theory (presumably with the culprit somewhere puffs-related)
here is those two processes waiting on puffsrpl which I would not
expect to be touching the puffs mountpoint at all.

/~\ The ASCII Mouse
\ / Ribbon Campaign
 X  Against HTML mo...@rodents-montreal.org
/ \ Email!   7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B


Re: Anyone recall the dreaded tstile issue?

2022-07-16 Thread Robert Elz
Date:Sat, 16 Jul 2022 00:48:59 -0400 (EDT)
From:Mouse 
Message-ID:  <202207160448.aaa09...@stone.rodents-montreal.org>

  | That's what I was trying to do with my looking at "X is tstiled waiting
  | for Y, who is tstiled waiting for Z, who is..." and looking at the
  | non-tstiled process(es) at the ends of those chains.

That can sometimes help, but this is a difficult issue to debug, as
often the offender is long gone before anyone notices.

  | My best guess at the moment is that there is a deadlock loop where
  | something tries to touch the puffs filesystem, the user process forks a
  | child as part of that operation, and the child gets locked up trying to
  | access the puffs filesystem.

That is possible, as is the case where locking is carried out
improperly (I lock a then try to lock b, you lock b then try to
lock a) - but those are the easier cases to find.
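
For reference, that lock-ordering case looks roughly like this -
hypothetical locks and threads, using the mutex(9) API:

#include <sys/mutex.h>

static kmutex_t lock_a, lock_b;         /* both initialized elsewhere */

static void
thread_one(void)
{
        mutex_enter(&lock_a);
        mutex_enter(&lock_b);   /* sleeps if thread_two holds b */
        /* ... work ... */
        mutex_exit(&lock_b);
        mutex_exit(&lock_a);
}

static void
thread_two(void)
{
        mutex_enter(&lock_b);
        mutex_enter(&lock_a);   /* sleeps if thread_one holds a */
        /* ... work ... */
        mutex_exit(&lock_a);
        mutex_exit(&lock_b);
}

If each thread has taken its first lock, neither can take its second,
and everything else piles up behind them on turnstiles.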

A more common case, I believe, is

func()
{
        lock(something);
        /*
         * do some work
         */
        if (test for something strange) {
                /*
                 * this should not happen
                 */
                return EINVAL;
        }
        /*
         * more stuff
         */
        unlock(something);
        return answer;
}

where I am sure you can see what's missing in this short segment ...  real
code is typically much messier, and the locks are not always that explicit;
they can be acquired/released as side effects of other function calls.
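
For contrast, one common way to keep such an error path from leaking
the lock is a single exit label.  A minimal sketch with hypothetical
names, not taken from any real driver:

#include <sys/errno.h>
#include <sys/mutex.h>

static kmutex_t something;      /* initialized with mutex_init() at attach */

static int
func(int strange, int *answer)
{
        int error = 0;

        mutex_enter(&something);
        /*
         * do some work
         */
        if (strange) {
                /*
                 * this should not happen
                 */
                error = EINVAL;
                goto out;       /* the lock is still dropped below */
        }
        /*
         * more stuff
         */
        *answer = 42;
out:
        mutex_exit(&something);
        return error;
}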

The function (func here) which causes the problem is no longer
active; no amount of stack tracing will find it.  The process
which called it might not even still exist; it might have
received the error return and exited.

Finding this kind of thing requires very careful and thorough
code reading, analysing every lock, and making sure that lock
gets released, somewhere, on every possible path after it is taken.
The best you can really hope for from examining the wedged system
is to find which lock is (usually "might be") the instigator of it all.
That can help narrow the focus of code investigation.

Mouse, start with the code you added ... make sure there are
no problems like this buried in it somewhere (your own code, and
everything it calls).   If that ends up finding nothing, then
the best course of action might be to use a fairly new kernel.
Some very good people (none of whom is me, so I can lavish praise)
have done some very good work in fixing most of the issues we
used to have.  I haven't seen a tstile lockup in ages, and I used
to see them quite often (fortunately mostly ones that affected
comparatively little, but over time, things get more and more
clogged, until a reboot - which can rarely be clean in this state -
is required).

kre