from:"Jason Thorpe"

Re: Forcing a USB device to "ugen"

2024-03-26 Thread Jason Thorpe



> On Mar 26, 2024, at 2:41 PM, Thor Lancelot Simon  wrote:
> 
> I don't think this can be safely allowed at security level > 0, unless,
> perhaps, it's restricted from working on devices that would match disk
> drivers.

The driver being asked to detach could certainly refuse to do so based on why 
it’s being asked.

-- thorpej

Re: Forcing a USB device to "ugen"

2024-03-26 Thread Jason Thorpe

> On Mar 26, 2024, at 10:18 AM, Jason Thorpe  wrote:
> 
> Like I said in a previous message on this thread, I think the better model is 
> for ugen to always exist for every USB device / interface, and for the kernel 
> driver to attach there.  Then it’s easy to just say “detach whatever kernel 
> driver is here [assuming it’s not busy providing some other service] so that 
> user-space can take over”, and “re-attach any kernel driver that might match 
> this device now that I’m done”.

I guess I should explicitly state the other part of this, which is “and 
anything other than totally benign control pipe transfers without having 
CLAIM’d the device/interface must return EBUSY”.  It’s the act of CLAIM’ing the 
device/interface that detaches the kernel driver and the act of RELEASE’ing it 
that causes the kernel to attempt to re-attach one.

-- thorpej

Re: Forcing a USB device to "ugen"

2024-03-26 Thread Jason Thorpe

> On Mar 26, 2024, at 9:31 AM, Brian Buhrow  wrote:
> 
> Isn't it possible to do most of what Jason proposes by using the drvctl 
> interface to
> detach a driver from a specific USB device?  Then, some glue could be added 
> to the ugen driver
> to allow it to be attached to arbitrary devices using the same drvctl 
> interface?  That seems a
> lot easier to me than building a registry of devices and device IDs, which 
> will be  woefully
> out of date before it gets published.  It also has the advantage of allowing 
> the user to do
> creative stuff that the developers didn't think of.  Am I missing something 
> obvious?

It’s absolutely not easier, because there’s not really an easy way, once the 
kernel driver is detached, to say “attach ugen where that other thing was, AND 
ONLY THERE”.

Like I said in a previous message on this thread, I think the better model is 
for ugen to always exist for every USB device / interface, and for the kernel 
driver to attach there.  Then it’s easy to just say “detach whatever kernel 
driver is here [assuming it’s not busy providing some other service] so that 
user-space can take over”, and “re-attach any kernel driver that might match 
this device now that I’m done”.

-- thorpej

Re: Forcing a USB device to "ugen"

2024-03-26 Thread Jason Thorpe

> On Mar 26, 2024, at 5:37 AM, Robert Swindells  wrote:
> 
> I thought that all FTDI devices provided JTAG etc. functionality, just
> that the pins are not connected to anything in some devices.

I guess it depends on how your individual FTDI board is wired up.  Also, for 
chips without the MPSSE, you need something else.  OpenOCD supports JTAG over 
the FT232R, for example, by bit-banging  with the various RS232 signals (RXD, 
TXD, RTS, CTS, DTR, DCD).

Anyway, as I noted in my original email, my particular board has one of the 
2232 interfaces intended for use as a UART (and connected to a header that 
labels it for that purpose), and the second interface brought out two 4 
different headers labeled for JTAG, SWD, SPI, and I2C.

> What would be wrong with attaching an ugen to interface 1 instead of
> an ucom in the ftdi driver itself?

ugen can’t currently attach to things other than uhub.  I think attaching ugen 
as the leaf is the wrong model, though; ugen should be what the kernel drivers 
themselves attach to, IMO.

-- thorpej

Re: Forcing a USB device to "ugen"

2024-03-26 Thread Jason Thorpe

> On Mar 26, 2024, at 2:49 AM, Martin Husemann  wrote:

> It is also not *that* intrusive as it may sound at first look:
> basically we need a central registry that collects all the
> identification data (vid,pid,strings and what have you) plus the parent
> and the device pointer, and a flag if this is a device claimed by some
> driver or one of the currently visible ugen* things.

I was thinking about this while sipping my first cup of coffee this morning..

I think the right model is for a “ugen” or what-have-you attach to uhub, and 
then for the kernel drivers to attach to ugen (either claim the whole device or 
attach to individual interfaces).

-- thorpej

Forcing a USB device to "ugen"

2024-03-25 Thread Jason Thorpe

I have a device based on the FTDI FT2232C:

[ 3285.311079] uftdi1 at uhub1 port 2 configuration 1 interface 0
[ 3285.311079] uftdi1: SecuringHardware.com (0x0403) Tigard V1.1 (0x6010), 
rev 2.00/7.00, addr 3
[ 3285.311079] ucom1 at uftdi1 portno 1
[ 3285.311079] uftdi2 at uhub1 port 2 configuration 1 interface 1
[ 3285.311079] uftdi2: SecuringHardware.com (0x0403) Tigard V1.1 (0x6010), 
rev 2.00/7.00, addr 3
[ 3285.311079] ucom2 at uftdi2 portno 2

It's a combo device that, in addition to a standard TTL-level UART, has a bunch 
of break-out headers for doing things like SPI, SWD, and JTAG (in my case, I 
need to use JTAG for programming some Atmel CPLDs).  I should be able to do 
this with OpenOCD (pkgsrc/devel/openocd), but libfdti1 fails to find the device 
because libusb1 only deals in "ugen".

"ugenif" might have been a possible solution here, except for the fact that 
0x0403,0x6010 is the standard VID,PID for the FTDI FT2232C, and I don't want 
"interface 1" of ALL FT2232C devices to get the "ugen" treatment.  The desire 
to use "ugen" on "interface 1" is not a property of 0x0403,0x6010, it's really 
a property of "SecuringHardware.com","Tigard V1.1".  Unfortunately, there's 
isn't a way to express that in the kernel config syntax.

I think my only short-term option here is to, in uftdi_match(), specifically 
reject based on this criteria:

- VID == 0x0403
- PID == 0x6010
- interface number == 1
- vendor string == "SecuringHardware.com"
- product string == "Tigard V1.1"

(It's never useful, on this particular board, to use the second port as a UART.)

-- thorpej

Re: netlink protocol

2024-02-16 Thread Jason Thorpe

> On Feb 16, 2024, at 3:16 PM, Rui-Xiang Guo  wrote:
> 
> Hi,
> There are some go modules use this syscall and marked for Linux only.
> This protocol also appeared in FreeBSD 13.2.
> Is there any plan to port it? Or any other method to implement it?

Netlink by itself isn't terribly useful (and it's pretty gross, honestly).  
Kernel subsystems also have to interface with it.

What do these Go modules implement?  Do you need to run an application that 
uses one of them?

-- thorpej

Re: Fixing, reestoring DEC FDDI (DEFPA/DEFEA/DEFTA/DEFQA) in 8.2, 9.2; restore to -current?

2024-02-12 Thread Jason Thorpe

> On Feb 11, 2024, at 1:29 PM, Jonathan Stone  wrote:

> Turns out  that  fddi_ifattach() is broken in 8.2 and 9.2.  It never 
> initialises sc_ec.ec_lock, which causes a panic the fisrt time the kernel 
> tries to add a multicast address to the interface.  If you're using IPv6 (I 
> don't; I comment out "options INET6"),  the panic occurs soon after boot when 
> ipv6 discovery starts.

^^^
Important context.

> I would like to restore "pdq" (fpa, fea, fta, whatever a qbus attachment is; 
> or write one if none) to -current.
> However, 10_RC4. doesn't even have if_fddisubr.c. 

Right, it was removed from -current before netbsd-10 branched after some 
discussion.  Same with Token Ring, for the same reason ... a bunch of 
apparently unused code that had no work done to make MP-safe improvements like 
the Ethernet code received, and the work hadn't been done because, well, no one 
was apparently using it.  Looks like I was right, because (a) no one screamed 
when it disappeared, and (b) when someone tried an older version that was still 
around, it blew up in their face. :-)  Anyway, having that unmaintained code 
lying about introduces a practical barrier to making further improvements to 
the networking code, especially when those improvements introduce changes to 
the contracts between the layers.  In the case of the FDDI code, there's the 
additional complication that the PDQ driver is ... well, it's something!  
Namely, a maze of twisty #ifdefs, all alike, where you stand a very good chance 
of being eaten by a grue.

> I don;t want to re-create the hack of having two different initialisers for 
> the IEE 802 (sic) [*] portions of "struct ethercom'.
> A cleaner solution is to declare a new struct with all the members of 'struct 
> ethercom', except the  'struct  ifnet ec_if;
> 'struct ethercom' then becomes a struct with two members: a struct ifnet, and 
> the new struct (struct iee802_common?).
> That allows clean separation of code which manipulates the additions in 
> today's "struct ethercom', from code which also manipulates struct ifnet.
> 
> Thoughts?  Anyone actively against  PR'ing and (hopefully) minimal patches 
> NetBSD-8 and NetBSD-9?
> Or against restoring FDDI to -current. (and perhaps backporting to NetBSD-10)?
> If I have to, I can probably ship a pair of DEFTAs to an interested 
> contributor, if support from me is too tenuous.
> 
> [*] FDDI is not IEEE 802. But it's derived from Token Ring,, 802.5, which is. 
> And I suppose the refactoring I'm proposing here could add supporing Token 
> Ring, if anyone actively wanted to...

See also about Token Ring above :-)

I am not at all opposed to resurrecting this stuff, doing a re-factor to make 
it easier to maintain going forward, etc.  If someone wants to volunteer to do 
that work (and then actually DO it), who am I to say no?  After all, I love 
obsolete technology as much as (and quite possibly more than) the next guy!  
*Stares in 6800.*  But I would prefer we not return to the previous state where 
the code went completely unmaintained and unused.

-- thorpej

Re: MNT Reform2 USB LCP flash

2024-01-26 Thread Jason Thorpe



> On Jan 26, 2024, at 2:41 AM, Robert Elz  wrote:
> 
>Date:Fri, 26 Jan 2024 09:26:38 - (UTC)
>From:mlel...@serpens.de (Michael van Elst)
>Message-ID:  
> 
>  | Fortunately the drive geometry isn't really used anywhere. All
>  | accesses just use the logical block addresses.
> 
> I have been meaning to suggest for ages that we remove all the
> geometry nonsense from everywhere in the kernel, except those
> drivers that actually need it - those should be responsible for
> converting block numbers to CHS in a way that works for thej,
> if they really need it (ancient ide drives before LBA addressing,
> vax massbus drives, sun xd drives ... anything like that which
> almost no-one has seen in decades).
> 
> It is just bizarre to see ssd and even nvme 'drives' claiming
> to have cylinders and heads!

10% agree.  Alas, FFS's whole schtick is caring about drive geometry, so we 
kind of need to fake up something for newfs on such drives (and we should do it 
in a generic way so the code isn’t replicated in a million different places), 
and have some way of getting the real info for drives where it actually does 
exist.

-- thorpej

Re: _KERNEL_OPT and 0x6e074def

2023-12-19 Thread Jason Thorpe




> On Dec 19, 2023, at 10:33 AM, Jason Thorpe  wrote:
> 
> 
>> On Dec 19, 2023, at 10:01 AM, Edgar Fuß  wrote:
>> 
>> 0x6e074def
> 
> the-ripe-vessel:thorpej 99$ grep UNDEFINED * 
> mkheaders.c:#define UNDEFINED ('n' << 24 | 0 << 20 | 't' << 12 | 0xdefU)
> mkheaders.c:/* Value for defined options with value UNDEFINED */
> mkheaders.c: return (unsigned int)(h != UNDEFINED ? h : DEFINED);
> mkheaders.c: fprint_global(fp, dl->dl_name, UNDEFINED);

Oops, forgot the important part:

the-ripe-vessel:thorpej 98$ pwd
/nbsd/src/usr.bin/config

-- thorpej

Re: _KERNEL_OPT and 0x6e074def

2023-12-19 Thread Jason Thorpe



> On Dec 19, 2023, at 10:01 AM, Edgar Fuß  wrote:
> 
> 0x6e074def

the-ripe-vessel:thorpej 99$ grep UNDEFINED * 
mkheaders.c:#define UNDEFINED ('n' << 24 | 0 << 20 | 't' << 12 | 0xdefU)
mkheaders.c:/* Value for defined options with value UNDEFINED */
mkheaders.c: return (unsigned int)(h != UNDEFINED ? h : DEFINED);
mkheaders.c: fprint_global(fp, dl->dl_name, UNDEFINED);

-- thorpej

Re: __futex(2): use outside Linux compat

2023-12-11 Thread Jason Thorpe



> On Dec 11, 2023, at 5:00 AM, Robert Swindells  wrote:
> 
>> Is it OK to use for NetBSD "native" code since it is not "advertised"
>> by a man page?
> 
> No.

Heavy asterisk here.

Thierry, let’s chat off-list.

-- thorpej

Re: kern/57691: NFS client regression with macOS 14 server

2023-12-08 Thread Jason Thorpe



> On Dec 8, 2023, at 10:55 AM, Jason Thorpe  wrote:
> 
> Probably what’s going on is that the server is verifying the directory cookie 
> more strictly than before.  Those two lines that pack the cookieverf should 
> be inserting 0s if uno_offset is 0.

Just confirmed by code inspection that FreeBSD always sends a 0 cookie verifier 
for uio_offset 0.


if (cookie.qval == 0) {
*tl++ = 0;
*tl++ = 0;
} else {


(From their nfsrpc_readdirplus().)

-- thorpej

Re: kern/57691: NFS client regression with macOS 14 server

2023-12-08 Thread Jason Thorpe

> On Nov 10, 2023, at 11:10 AM, schm...@netbsd.org  wrote:
> 
> tcpdump of "ls" after the problem has manifested:
> https://netbsd.schmonz.com/tmp/nfs/ls-after-the-problem-tcpdump.txt

The tcpdump shows:

16:16:35.683996 IP 10.0.2.2.shilp > 10.0.2.15.exp2: Flags [P.], seq 1573:1693, 
ack 1508, win 65535, length 120: NFS reply xid 834502658 reply ok 116 readdir 
ERROR: READDIR/READDIRPLUS cookie is stale

Looking at the code that issues READDIRPLUS in the kernel in 
nfs_readdirplusrpc():

if (nmp->nm_iflag & NFSMNT_SWAPCOOKIE) {
txdr_swapcookie3(uiop->uio_offset, tl);
} else {
txdr_cookie3(uiop->uio_offset, tl);
}
tl += 2;
*tl++ = dnp->n_cookieverf.nfsuquad[0];
*tl++ = dnp->n_cookieverf.nfsuquad[1];
*tl++ = txdr_unsigned(nmp->nm_readdirsize);

I think the cookie verifier is wrong.  See: 
https://www.rfc-editor.org/rfc/rfc1813#page-81

  cookieverf
This should be set to 0 on the first request to read a
directory. On subsequent requests, it should be a
cookieverf as returned by the server. The cookieverf
must match that returned by the READDIRPLUS call in
which the cookie was acquired.

Probably what’s going on is that the server is verifying the directory cookie 
more strictly than before.  Those two lines that pack the cookieverf should be 
inserting 0s if uno_offset is 0.

-- thorpej

Re: cv_fdrestart()

2023-10-14 Thread Jason Thorpe




> On Oct 14, 2023, at 6:38 AM, Taylor R Campbell 
>  wrote:
> 
>> Date: Fri, 13 Oct 2023 13:41:50 -0700
>> From: Jason Thorpe 
>> 
>>> On Oct 13, 2023, at 11:48 AM, Andrew Doran  wrote:
>>> 
>>> Add cv_fdrestart() (better name suggestions welcome):
>> 
>> Oof.
> 
> Why do we need this at all?

Indeed.

> Condition variable semantics is standard, well-studied, and
> well-understood.  This is a foundational API that essentially every
> kernel subsystem relies integrally on, and it strikes me as a mistake
> to tie it in with file descriptors or the ERESTART mechanism.
> 
> I would ask that a controversial change like this be reverted until we
> have had substantive discussion giving a compelling reason to change
> such a foundational API to add custom, nonstandard semantics wiring it
> up to file descriptors and our ERESTART mechanism.
> 
> We can just do
> 
>foo->flags |= FOO_RESTART;
>cv_broadcast(cv);
> 
> and then the corresponding wait logic can do
> 
>if (foo->flags & FOO_RESTART)
>return ERESTART;
>cv_wait(cv);
>goto retry;

See also eventfd_wait().

My main objection was “putting fd-specific restart logic in condvars”.  Of 
course it would be better to not pollute condvars at all.

-- thorpej

cv_fdrestart()

2023-10-13 Thread Jason Thorpe



> On Oct 13, 2023, at 11:48 AM, Andrew Doran  wrote:
> 
> Module Name: src
> Committed By: ad
> Date: Fri Oct 13 18:48:56 UTC 2023
> 
> Modified Files:
> src/sys/kern: kern_condvar.c kern_sleepq.c
> src/sys/rump/librump/rumpkern: locks.c locks_up.c
> src/sys/sys: condvar.h lwp.h
> 
> Log Message:
> Add cv_fdrestart() (better name suggestions welcome):

Oof.

I’d suggest doing something like:

void
cv_broadcast_cb(kcondvar_t *cv, void (*callback)(lwp_t *))
{
. . .
}

. . . to make this a generic mechanism, rather that something so specialized 
for the few call sites that need this behavior.


> 
> Like cv_broadcast(), but make any LWPs that share the same file descriptor
> table as the caller return ERESTART when resuming.  Used to dislodge LWPs
> waiting for I/O that prevent a file descriptor from being closed, without
> upsetting access to the file (not descriptor) made from another direction.
> 
> 
> To generate a diff of this commit:
> cvs rdiff -u -r1.59 -r1.60 src/sys/kern/kern_condvar.c
> cvs rdiff -u -r1.83 -r1.84 src/sys/kern/kern_sleepq.c
> cvs rdiff -u -r1.86 -r1.87 src/sys/rump/librump/rumpkern/locks.c
> cvs rdiff -u -r1.12 -r1.13 src/sys/rump/librump/rumpkern/locks_up.c
> cvs rdiff -u -r1.17 -r1.18 src/sys/sys/condvar.h
> cvs rdiff -u -r1.227 -r1.228 src/sys/sys/lwp.h
> 
> Please note that diffs are not public domain; they are subject to the
> copyright notices on the relevant files.
> 

-- thorpej

Re: Maxphys on -current?

2023-08-04 Thread Jason Thorpe

> On Aug 3, 2023, at 2:19 PM, Brian Buhrow  wrote:
> 
> hello.  I know that this has ben a very long term project, but I'm wondering 
> about the
> status of this effort?  I note that FreeBSD-13 has a Maxphys value of 1048576 
> bytes.
> Have we found other ways to get more throughput from ATA disks that obviate 
> the need for this
> setting which I'm not aware of?
> If not, is anyone working on this project?  The wiki page says the project is 
> stalled.

If someone does pick this up, I think it would be a good idea to start from 
scratch, because MAXPHYS, as it stands, is used for multiple things.  
Thankfully, I think it would be relatively straightforward to do the work that 
I am suggesting incrementally.

Here goes...

MAXPHYS is really supposed to be “maximum transfer via physio”, which is the 
code path you use when you open /dev/rsd0e and read/write to it.  The 
user-space pages are wired and mapped into the kernel for the purpose of doing 
I/O.  MAXPHYS is a per-architecture constant because some systems have 
different constraints as to how much KVA space can be used for that at any 
given time.

Unfortunately, some of the adjacent physio machinery (e.g. minphys()) is also 
used for other purposes, specifically to clamp I/O sizes to constraints defined 
by the physical device and/or the controllers / busses they’re connected to.

Logically, these are two totally separate things, and IMO they should be 
cleanly separated.

What we *should* have is the notion of “I/O parameters” that are defined by the 
device… max I/O size, max queue depth, preferred I/O size, preferred I/O 
alignment, physical block size, logical block size, etc.  The base values for 
these parameters should come from the leaf device (e.g.. the disk), and then be 
clamped as needed by it’s connective tissue (the controller, the system bus the 
controller is connected to, and ultimately the platform-specific e.g. DMA 
constraints).

The the interface layers (the page cache / UBC, the traditional block I/O 
buffer cache, and the physio interface for user-space) can further impose their 
own constraints, as necessary per their API contract.  There is zero reason 
that MAXPHYS should impact the maximum I/O that a file system can do via the 
UBC, for example.

(In a perfect world, we wouldn’t even have to consume virtual address space to 
bring data and and out of the page cache / UBC, because we already know the 
physical addresses of the pages that are being pulled in / cleaned.)

> Any thoughts or news would be greatly appreciated.

Anyway, there are mine :-)

-- thorpej

Re: kcmp(2)

2023-07-20 Thread Jason Thorpe




> On Jul 20, 2023, at 11:01 AM, Mouse  wrote:
> 
 Don't cloner instances differ in minor number?  [...]
>>> Not that I?m aware of.  [...]
>> Well, as noted in this thread, traditionally you can tell when two
>> files are the same by examining stat results.
> 
> Maybe that should be a hint.  See below.
> 
>> And the cloner mechanism replaced an older scheme where you had to
>> pick the number of instances you wanted, and unless I'm
>> misremembering badly in that world each had to have its own minor
>> number.
> 
> That's my understanding as well.
> 
>> That said, it almost certainly isn't important...
> 
> Well, if it means that with a minor tweak NetBSD could have kcmp(3)
> instead of kcmp(2), it could be.

If you read the Linux kcmp(2) manual page, you’ll see that it’s a lot more than 
just file descriptors.

-- thorpej

Re: kcmp(2)

2023-07-18 Thread Jason Thorpe

> On Jul 18, 2023, at 8:51 PM, David Holland  wrote:
> 
> On Tue, Jul 18, 2023 at 03:25:02PM -0700, Jason Thorpe wrote:
>> That *might* work in this particular case, but it would not work
>> for e.g. a cloning device where you get additional descriptors via
>> dup() or whatever.
> 
> Don't cloner instances differ in minor number? If not, shouldn't they?

Not that I’m aware of.  They result in a new file object with a new private 
data pointer, but they don’t change the minor number and I don’t see why 
forcing them to do so would be such a good idea.  What if you had a single 
driver (that consumes a major # slot) that wanted to provide two cloning 
interfaces?  If each clone got its own minor #, then you’d be artificially 
limiting how many could be created.

-- thorpej

Re: kcmp(2)

2023-07-18 Thread Jason Thorpe




> On Jul 18, 2023, at 2:57 PM, Mouse  wrote:
> 
>>> What does Mesa use kcmp(2) for?
>> Working out whether two device handles are to the same DRM device.
> 
>>> Is there another way to accomplish what Mesa is trying to use it for?
>> I don't know of one.
> 
> Is fstat() plus checking st_rdev (and possibly other fields)
> insufficient?

That *might* work in this particular case, but it would not work for e.g. a 
cloning device where you get additional descriptors via dup() or whatever.

-- thorpej

Re: [PATCH] style(5): No struct typedefs

2023-07-14 Thread Jason Thorpe

> On Jul 14, 2023, at 2:56 AM, Martin Husemann  wrote:
> 
> The typedef itself buys you nothing but a few charactes less to type,

No, that’s not correct.  For a type that’s meant to be “opaque” (i.e. “users of 
the interface should not have to know or care about any of the implementation 
details, historical leakage be damned”), the typedef gives you true opacity… if 
you’re using “struct kmutex *” rather than “kmutex_t *” then you’re already 
eating the forbidden fruit.

-- thorpej

Re: [PATCH] style(5): No struct typedefs

2023-07-12 Thread Jason Thorpe




> On Jul 12, 2023, at 5:25 AM, Mouse  wrote:
> 
> [paragraph-length-line damage repaired manually]
> 
>> What about something like  and
>> ?  The type definitions go into the former
>> header file, [...]
> 
> Well, I don't like the "typedefs" name,

Yah, I don't like it either, but.. :-)

> because as I see it the divide
> you're sketching here (which I support, in general) is the divide
> between interface and implementation, and in some cases the interface
> is more than just the typedefs.

Sort of.

  // contains the "vnode_t" opaque type definition
 // contains the guts of "struct vnode" and the other 
implementation details

  // Contains some of the file system interfaces, some of which 
use vnode_t
  // Contains the vnode interfaces, which definitely use vnode_t

The latter if the two would each include .

Feel free to pick a better name, but that's the idea.

-- thorpej

Re: [PATCH] style(5): No struct typedefs

2023-07-12 Thread Jason Thorpe




> On Jul 11, 2023, at 9:51 PM, Robert Elz  wrote:
> 
> That is to rip the definition of __NetBSD_Version__  (with however
> nany underscores it really has) out of  and into a new
> header file all of its own  (perhaps - no bikeshedding
> about names here please) with the rule that *no* other header file is
> oermitted to include it.   C code that needs __NetBSD_Version__ needs
> to explcitly include that file itself ... there is surprisingly little
> of it.   This will make the side effects of doing a kernel version bump
> be so small (really, so it should be) that doing one is an automatic
> decision any time it might be needed, and we end the "even though this
> is a kernel internal ABI change, I don't think any modules will be affected"
> (and no no bump happens - which must be what happens sometimes, either that
> of some of our developers don't understand what a kernal internal ABI
> change means) and should also avoid the need for "ride the kernel
> version bump" (perhaps several hours later) or "let me know when you're
> going to bump the kernel version, I have changes to commit which would
> need that, but don't need to be commmitted right away" - which often
> turns into a "ride the bump" when the developer who did a change, and
> a bump, did not remeber, or perhaps even know, the other developer was
> waiting and would have likes to coordinate.

I’ve been thinking about this a little more…

What about something like  and 
?  The type definitions go into the former header 
file, that can be included by whatever other header needs that type definition. 
 The full-blown structure and internal details of it go into the latter, to be 
only included by things that access the guts, and now it’s VERY clear that “I 
am using implementation details” aspect of touching those guts.

-- thorpej

Re: [PATCH] style(5): No struct typedefs

2023-07-11 Thread Jason Thorpe




> On Jul 11, 2023, at 2:56 PM, Taylor R Campbell 
>  wrote:
> 
> I agree the keyword is ugly, and it's unfortunate that in order to
> omit it we would have to use C++, but the ugliness gives us practical
> benefits of better type-checking, reduced header file maintenance
> burden, and reduced motivation for unnecessary header file
> dependencies.

No -- you just don't have to use "void *".  Can you point to a practical 
problematic example?

-- thorpej

Re: [PATCH] style(5): No struct typedefs

2023-07-11 Thread Jason Thorpe




> On Jul 11, 2023, at 6:43 AM, Johnny Billquist  wrote:
> 
> typedefs in C don't really create new types. They are all just derivatives. 
> Sometimes I even wonder why typedef exists in C. Feels like I could 
> accomplish the same with a #define

Because #define generates pollution in unrelated things.

-- thorpej

Re: [PATCH] style(5): No struct typedefs

2023-07-11 Thread Jason Thorpe

> On Jul 11, 2023, at 3:17 AM, Taylor R Campbell  wrote:
> 
> If we used `struct bus_dma_tag *' instead, the forward declaration
> could be `struct bus_dma_tag;' instead of having to pull in all of
> sys/bus.h, _and_ the C compiler would actually check types.

In the original design, it’s not always a struct.  That was the whole point of 
using a more abstract type.

If you want to hide the struct'ness in a machdep header file, fine, but I 
completely disagree with the notion of requiring the use of the “struct” 
keyword all over the place.

-- thorpej

Re: Anonymous vnodes?

2023-06-26 Thread Jason Thorpe



> On Jun 26, 2023, at 3:13 PM, Theodore Preduta  wrote:
> 
> Is it possible to create a vnode for a regular file in a file system
> without linking the vnode to any directory, so that it disappears when
> all open file descriptors to it are closed?  (As far as I can tell, this
> isn't possible with any of the vn_* or VOP_* functions?)

There isn't a general way to do this.

> For context, I'm currently working on implementing memfd_create(2), and
> thought this might be a shortcut.  Otherwise, I'll have to implement it
> in terms of uvm operations (which is fine, just more work).

Seems like these objects should be implemented above the file system ... just 
create a new descriptor type and interface directly with UVM.

-- thorpej

Re: malloc(9) vs kmem(9) interfaces

2023-06-01 Thread Jason Thorpe



> On Jun 1, 2023, at 5:27 AM, Greg Troxel  wrote:
> 
> However, you are talking about maintaining a count of objects by user.
> That is vastly cheaper, and likely 90%+ as useful.
> 
> SO there is "object attribution" and "total usage attribution".

The old malloc() attribution mechanism just counted bytes-allocated-per-tag.

-- thorpej

Re: malloc(9) vs kmem(9) interfaces

2023-05-31 Thread Jason Thorpe

> On May 31, 2023, at 1:54 PM, Andrew Doran  wrote:

>> - What would the cost of restoring attribution be, other than the
>>  obvious O(ntag*nsizebuckets) memory cost to record it and the effort
>>  to annotate allocations?
> 
> Related to this, in my experiments it turns out that using dedicated pools
> for objects for no other reason than to to get attribution or be space
> efficient seems to exert a lot of presssure on the CPU cache to the point
> that reverting to the general purpose allocator where possible yields a
> small but measurable and repeatable reduction in system time during builds.
> I plan to do this for some objects.

My big beef with specific pools for random objects is that it reduces 
utilization of memory.  Let’s say you have a pool for “foo” structures, and 
they’re sized correctly to fit into a 128 byte kmem pool.  But they have their 
own.  And there are 7 of them allocated.  That’s 3200 bytes that are being 
wasted when those objects could be allocated from a partially-fragmented 
kmem128 pool.

Back in the day, a lot of subsystems were converted to use pools directly 
because that was how access to a direct-mapped super-page was achieved.  Now 
that kmem backs malloc(), there’s less pressure there.  But for fixed-size 
allocations, malloc() is an inefficient API, because of the need to store the 
size and then round up to the minimum allocation alignment for the architecture 
(16 bytes on some systems).

Many of the direct pool consumers should switch to kmem, IMO…  and pool_cache 
should probably have a kmem_cache counterpart (pool_cache ctor/dtor support, 
but with a generic kmem bucket, rather than a specific pool).  This would 
likely lead to a general reduction in memory fragmentation in the system.

If we really want to have an “attribution tag” system, we can do that *outside* 
of the allocator… It wouldn’t be all that different from evcnt.

-- thorpej

Re: sdmmc question.

2023-05-24 Thread Jason Thorpe

> On May 24, 2023, at 2:39 AM, Taylor R Campbell 
>  wrote:
> 
> Possible complication -- not sure if the spibus interface provides the
> hardware access needed by sd@spi, judging by this FreeBSD wiki page:
> https://wiki.freebsd.org/riscv/HiFiveUnmatched  (But this is outside
> my area of expertise, so don't let my assessment get in your way!)

Probably what that FreeBSD wiki is referring to is that you need to be able to 
strobe the CS line for the card explicitly (and repeatedly) outside of the 
context of a normal SPI I/O operation … this is required as part of the process 
of putting the card into SPI mode, which needs to be done any time a card is 
inserted.

-- thorpej

Re: PROPOSAL: Split uiomove into uiopeek, uioskip

2023-05-10 Thread Jason Thorpe

> On May 9, 2023, at 3:09 PM, Taylor R Campbell 
>  wrote:
> 
> - uiopeek leaves uio itself untouched (it may modify the source/target
>  buffers but it's idempotent).

Hm… I’m having second thoughts about uiopeek(), as well.  It implies a 
direction (“peek” feels like “read”, and “write” would feel more like a 
“poke”).  I think uiocopy() is a better name, and I think it is sufficiently 
different from uiomove() (“move” implies a sort of destructive-ness that “copy” 
does not).

-- thorpej

Re: PROPOSAL: Split uiomove into uiopeek, uioskip

2023-05-09 Thread Jason Thorpe

> On May 9, 2023, at 2:21 PM, Taylor R Campbell 
>  wrote:
> 
> tl;dr
> 
> I propose adding uiopeek(buf, n, uio) and uioskip(n, uio) which

I’m not a fan of uioskip() as a name … I think uioadvance() would be better.  
Skip implies, to my brain, that the data is being thrown away (even if you’re 
already consumed it).

-- thorpej

Re: Nixing __HAVE_ATOMIC_AS_MEMBAR

2023-02-22 Thread Jason Thorpe




> On Feb 22, 2023, at 4:48 PM, Taylor R Campbell 
>  wrote:
> 
> (c) Just make the membars unconditional -- take off the #ifndefs, but
>don't add any new functions or change the semantics of existing
>ones.

That’s my preference.  Having ti at all is confusing.

-- thorpej

Re: rw_downgrade/tryupgrade semantics

2023-02-22 Thread Jason Thorpe




> On Feb 22, 2023, at 9:40 AM, Taylor R Campbell 
>  wrote:
> 
> I believe this is guaranteed as rwlock(9) is currently implemented.
> (Whether it is _intended_ to be guaranteed, I'm not sure.)

That guarantee certainly falls in line with the principle of least 
astonishment.  We should probably document it, but it doesn’t seem correct any 
other way.

> The reason these can fail is that rw_tryupgrade and rw_downgrade only
> issue store-before-store barriers (membar_producer), putting no
> restrictions on when the CPU can load foo->nreaders relative to
> rw_tryupgrade or rw_downgrade.

That’s a bug, full-stop.

> Even though the reader is not technically _just_ a `reader', in that
> it issues stores as well as loads, I think these possibilities are
> extremely counterintuitive and possibly dangerous, especially for a
> relatively high-level API like rwlock(9) that prioritizes ease of use
> over maximal parallelism -- merely taking a read lock can lead to
> shared memory contention, which is why we also have harder-to-use but
> cheaper options like pserialize(9), psref(9), and localcount(9).

I think it would be worth documenting that “reader” and “writer” are just 
colloquialisms … “shared” and “exclusive” is what is really meant, but nearly 
everyone uses the “reader” and “writer” names.

> I'm open to other opinions; perhaps it is intended that loads in a
> writer can bleed into an adjacent read section, and that readers
> aren't supposed to issue stores anyway, and perhaps there's a good
> reason for all this.
> 
> But absent a good reason, I think rw_downgrade should be a release
> operation, and rw_tryupgrade should be an acquire operation, just like
> rw_exit and rw_enter.  Which means they need to use membar_release and
> membar_acquire inside, not membar_producer (and we need to issue
> pullups).

I agree 100% … try-upgrade and downgrade are really just optimizations … their 
effect should be equivalent to dropping the writer lock and re-acquiring as a 
reader.

-- thorpej

Re: Add five new escape sequences to wscons

2023-01-18 Thread Jason Thorpe



> On Jan 18, 2023, at 3:34 AM, David Brownlee  wrote:
> 
> Technically the wscons terminal type is wsvt25, an extended ANSI
> compatible terminal, already supporting more sequences than vt100.
> 
> Having it also support a useful subset of xterm, providing it doesn't
> add an excessive amount of complexity, seems like a useful addition,
> particularly if other systems also have a "wscons" with similar
> additional handling.

100% agree.

-- thorpej

Re: 9.99.100 fallout: bootloader and modules

2022-09-21 Thread Jason Thorpe




> On Sep 21, 2022, at 5:00 AM, Johnny Billquist  wrote:
> 
> Searching for uses of netbsd_version, there is some more broken logic in a 
> few places, following similar patterns or assumptions.
> 
> Like in /usr/src/sys/arch/i386/stand/lib/exec.c:
> 
>if (netbsd_version / 100 % 100 == 99) {
>/* -current */
>snprintf(buf, bufsize,
>"/stand/%s/%d.%d.%d/modules", machine,
>netbsd_version / 1,
>netbsd_version / 100 % 100,
>netbsd_version / 100 % 100);
>} else if (netbsd_version != 0) {
>/* release */
>snprintf(buf, bufsize,
>"/stand/%s/%d.%d/modules", machine,
>netbsd_version / 1,
>netbsd_version / 100 % 100);
>}
> 
> 
> So just changing the modulo in the place you were suggesting isn't enough. :-P

Obviously we need a function in libsa or libkern for this.

-- thorpej

Re: Problem with outstanding knotes and device detach - and a fix

2022-07-18 Thread Jason Thorpe

> On Jul 18, 2022, at 2:03 AM, Taylor R Campbell 
>  wrote:
> 
>> Date: Sun, 17 Jul 2022 12:54:56 -0700
>> From: Jason Thorpe 
>> 
>> And another new version.  This:
>> 
>> ==> Creates a knote_impl structure that's private to kern_event.c
>> that has the new lock.  I took the opportunity to move kn_influx to
>> the knote_impl as well, since absolutely no one outside of
>> kern_event.c should touch it.  This change is ABI-compatible.
>> 
>> ==> Improves the comments about the locking rules.
>> 
>> There are no functional changes since the last patch.
> 
> LGTM, thanks for working on this!
> 
> If I understand correctly, the issue is that removing the kqueue
> listener (i.e., filter_detach) and detaching the device notifier
> (i.e., klist_fini, via seldestroy) can race.  A process can close its
> kqueue file descriptor, or EV_DELETE a specific event subscription, at
> the same time as a device it had subscribed to is detaching and
> calling seldestroy.  That is, the lifetimes of:
> 
> (a) a knote in a kqueue listener for any of its event subscriptions, and
> (b) a knote in a driver's klist for any of its subscribers,
> 
> are overlapping but controlled independently, one by the kqueue user
> and the other by the driver, e.g. when the device is physically
> yanked, so the destruction by one path might happen while the other is
> still in use.  filter_detach can't safely call into the driver if the
> knote is being freed by klist_fini; klist_fini can't safely return to
> the driver if filter_detach is still in progress.

It’s actually simpler than that, and doesn’t really involve even much of a 
race… it’s that any knote associated with a device that has just been yanked 
may be referring to a device instance’s private data (via kn->kn_hook), and has 
a kn_fop pointer that points to functions provided by the now-detached device 
instance.  In the case of a driver that is statically linked into the kernel, 
at least the function pointers are probably stable and the problem is a simple 
use-after-free… but in the case of a dynamically-loaded driver module, kn_fop 
might itself now point at garbage, if the driver is unloaded.

So, if the driver detach happens at any time before the kqueue listener is 
detached, there is the potential to go BOOM (it’s not guaranteed, because UAF 
bugs are sometimes like that).

> Serializing the call to .f_detach and the replacement of kn_fop in
> klist_fini with a mutex ensures that one happens after the other for
> each knote.  Using a mutex attached to the knote itself avoids
> problems with the identity of the knote changing (close, EV_DELETE)
> and avoids problems with the identity of the device notifier changing
> (device detach) while either one is trying to run.

More or less… the mutex around the filter ops ensures that kn_fop will point to 
EITHER the driver-provided filter ops OR the stub do-nothing filter ops, and by 
holding the mutex across the call into the filter ops, ensures that 
klist_fini() will not complete while a filter op call is in-progress, thus 
ensuring that the freeing of the memory that backs the klist being torn down 
will not complete until there are no callers inside the about-to-be-gone filter 
ops.

> The kn_fop member is supposed to be stable all use points, so it
> doesn't require additional synchronization like atomic_load/store_*:
> 
> - Before .f_attach returns, the kqueue logic won't touch it, so
>  .f_attach can safely set it without synchronization.
> 
> - By the time the driver calls klist_fini (seldestroy), the driver
>  promises never to KNOTE again so it will never be touched by
>  filter_event.

It also promises to not allow any further f_attach calls to succeed.  (This is 
not new; the same promise is already made for select, and has been for many 
years).

> - filter_touch is used only for EVFILT_TIMER and EVFILT_USER, which
>  don't put the knote on any klist so never use klist_fini on it.

Right.

> - filter_detach and klist_fini are serialized by the knote foplock.

Right.

-- thorpej

Re: Problem with outstanding knotes and device detach - and a fix

2022-07-17 Thread Jason Thorpe


> On Jul 13, 2022, at 7:18 PM, Jason Thorpe  wrote:
> 
> Ok, new version.  Main differences:

And another new version.  This:

==> Creates a knote_impl structure that’s private to kern_event.c that has the 
new lock.  I took the opportunity to move kn_influx to the knote_impl as well, 
since absolutely no one outside of kern_event.c should touch it.  This change 
is ABI-compatible.

==> Improves the comments about the locking rules.

There are no functional changes since the last patch.

Index: kern/kern_event.c
===
RCS file: /cvsroot/src/sys/kern/kern_event.c,v
retrieving revision 1.143
diff -u -p -r1.143 kern_event.c
--- kern/kern_event.c   13 Jul 2022 14:11:46 -  1.143
+++ kern/kern_event.c   17 Jul 2022 19:44:06 -
@@ -120,6 +120,61 @@ static voidfilt_userdetach(struct knote
 static int filt_user(struct knote *, long hint);
 static int filt_usertouch(struct knote *, struct kevent *, long type);
 
+/*
+ * Private knote state that should never be exposed outside
+ * of kern_event.c
+ *
+ * Field locking:
+ *
+ * q   kn_kq->kq_lock
+ */
+struct knote_impl {
+   struct knoteki_knote;
+   unsigned intki_influx;  /* q: in-flux counter */
+   kmutex_tki_foplock; /* for kn_filterops */
+};
+
+#defineKIMPL_TO_KNOTE(kip) (&(kip)->ki_knote)
+#defineKNOTE_TO_KIMPL(knp) container_of((knp), struct knote_impl, 
ki_knote)
+
+static inline struct knote *
+knote_alloc(bool sleepok)
+{
+   struct knote_impl *ki;
+
+   ki = kmem_zalloc(sizeof(*ki), sleepok ? KM_SLEEP : KM_NOSLEEP);
+   mutex_init(>ki_foplock, MUTEX_DEFAULT, IPL_NONE);
+
+   return KIMPL_TO_KNOTE(ki);
+}
+
+static inline void
+knote_free(struct knote *kn)
+{
+   struct knote_impl *ki = KNOTE_TO_KIMPL(kn);
+
+   mutex_destroy(>ki_foplock);
+   kmem_free(ki, sizeof(*ki));
+}
+
+static inline void
+knote_foplock_enter(struct knote *kn)
+{
+   mutex_enter(_TO_KIMPL(kn)->ki_foplock);
+}
+
+static inline void
+knote_foplock_exit(struct knote *kn)
+{
+   mutex_exit(_TO_KIMPL(kn)->ki_foplock);
+}
+
+static inline bool
+knote_foplock_owned(struct knote *kn)
+{
+   return mutex_owned(_TO_KIMPL(kn)->ki_foplock);
+}
+
 static const struct fileops kqueueops = {
.fo_name = "kqueue",
.fo_read = (void *)enxio,
@@ -133,6 +188,31 @@ static const struct fileops kqueueops = 
.fo_restart = kqueue_restart,
 };
 
+static void
+filt_nopdetach(struct knote *kn __unused)
+{
+}
+
+static int
+filt_nopevent(struct knote *kn __unused, long hint __unused)
+{
+   return 0;
+}
+
+static const struct filterops nop_fd_filtops = {
+   .f_flags = FILTEROP_ISFD | FILTEROP_MPSAFE,
+   .f_attach = NULL,
+   .f_detach = filt_nopdetach,
+   .f_event = filt_nopevent,
+};
+
+static const struct filterops nop_filtops = {
+   .f_flags = FILTEROP_MPSAFE,
+   .f_attach = NULL,
+   .f_detach = filt_nopdetach,
+   .f_event = filt_nopevent,
+};
+
 static const struct filterops kqread_filtops = {
.f_flags = FILTEROP_ISFD | FILTEROP_MPSAFE,
.f_attach = NULL,
@@ -232,15 +312,41 @@ static size_t user_kfiltersz; /* size 
  * -> object lock (e.g., device driver lock, )
  * -> kn_kq->kq_lock
  *
- * Locking rules:
+ * Locking rules.  ==> indicates the lock is acquired by the backing
+ * object, locks prior are acquired before calling filter ops:
+ *
+ * f_attach: fdp->fd_lock -> knote foplock ->
+ *   (maybe) KERNEL_LOCK ==> backing object lock
+ *
+ * f_detach: fdp->fd_lock -> knote foplock ->
+ *(maybe) KERNEL_LOCK ==> backing object lock
+ *
+ * f_event via kevent: fdp->fd_lock -> knote foplock ->
+ *(maybe) KERNEL_LOCK ==> backing object lock
+ *N.B. NOTE_SUBMIT will never be set in the "hint" argument
+ *in this case.
  *
- * f_attach: fdp->fd_lock, KERNEL_LOCK
- * f_detach: fdp->fd_lock, KERNEL_LOCK
- * f_event(!NOTE_SUBMIT) via kevent: fdp->fd_lock, _no_ object lock
- * f_event via knote: whatever caller guarantees
- * Typically,  f_event(NOTE_SUBMIT) via knote: object lock
- * f_event(!NOTE_SUBMIT) via knote: nothing,
- * acquires/releases object lock inside.
+ * f_event via knote (via backing object: Whatever caller guarantees.
+ * Typically:
+ * f_event(NOTE_SUBMIT): caller has already acquired backing
+ * object lock.
+ * f_event(!NOTE_SUBMIT): caller has not acquired backing object,
+ * lock or has possibly acquired KERNEL_LOCK.  Backing object
+ * lock may or may not be acquired as-needed.
+ * N.B. the knote foplock will **not** be acquired in this case.  The
+ *

Re: bus_space_barrier semantics

2022-07-16 Thread Jason Thorpe

> On Jul 16, 2022, at 11:15 AM, Taylor R Campbell  wrote:
> 
> Thoughts?

110% on-board with all of this.

> P.S.  I also considered eliminating the offset/length argument,
> because no implementations take advantage of them, so I started (with
> maya@'s help wrangling coccinelle) to draft a patch to remove them.
> However, the other BSDs have the same API, so changing this would make
> it more of a pain to share drivers.

Didn’t OpenBSD remove the offset / length arguments?  Anyway, I’m not 
particularly concerned about this part, but it would be a nice wart to rid 
ourselves of.

-- thorpej

Re: Problem with outstanding knotes and device detach - and a fix

2022-07-13 Thread Jason Thorpe


> On Jul 13, 2022, at 12:02 PM, Jason Thorpe  wrote:
> 
>> On Jul 13, 2022, at 11:25 AM, Taylor R Campbell 
>>  wrote:
>> 
>> Sorry, haven't had time yet to do a full review, but I spot at least
>> one problem that means this won't fly as is: kqueue_register and
>> kqueue_scan both call filter_touch under a spin lock, but with your
>> patch filter_touch now takes an rwlock -- which is forbidden under a
>> spin lock (and it'll crash under LOCKDEBUG).
> 
> I found another issue as well and will post a follow-up later.

Ok, new version.  Main differences:

==> each knote gets its own mutex, rather than using a global rwlock.

==> Moved the locking outside of the filter_*() functions.

==> “Dealt” with the locking issue around the “touch” op, by only allowing 
user_filtops and timer_filtops to use it, because they’re known to be safe to 
call “touch” without holding the knote foplock (they don’t put notes on any 
lists).

The knote foplock got tossed onto the end of struct knote with the idea being 
that it doesn’t change the knote ABI for modules (they can’t allocate their own 
knotes).  If this gets aligned with another change that bumps the kernel 
version, I’ll put the foplock up next to fop.

This passes the kqueue tests on a LOCKDEBUG kernel.

Index: kern/kern_event.c
===
RCS file: /cvsroot/src/sys/kern/kern_event.c,v
retrieving revision 1.143
diff -u -p -r1.143 kern_event.c
--- kern/kern_event.c   13 Jul 2022 14:11:46 -  1.143
+++ kern/kern_event.c   14 Jul 2022 02:01:44 -
@@ -133,6 +133,31 @@ static const struct fileops kqueueops = 
.fo_restart = kqueue_restart,
 };
 
+static void
+filt_nopdetach(struct knote *kn __unused)
+{
+}
+
+static int
+filt_nopevent(struct knote *kn __unused, long hint __unused)
+{
+   return 0;
+}
+
+static const struct filterops nop_fd_filtops = {
+   .f_flags = FILTEROP_ISFD | FILTEROP_MPSAFE,
+   .f_attach = NULL,
+   .f_detach = filt_nopdetach,
+   .f_event = filt_nopevent,
+};
+
+static const struct filterops nop_filtops = {
+   .f_flags = FILTEROP_MPSAFE,
+   .f_attach = NULL,
+   .f_detach = filt_nopdetach,
+   .f_event = filt_nopevent,
+};
+
 static const struct filterops kqread_filtops = {
.f_flags = FILTEROP_ISFD | FILTEROP_MPSAFE,
.f_attach = NULL,
@@ -235,12 +260,14 @@ static size_t user_kfiltersz; /* size 
  * Locking rules:
  *
  * f_attach: fdp->fd_lock, KERNEL_LOCK
- * f_detach: fdp->fd_lock, KERNEL_LOCK
- * f_event(!NOTE_SUBMIT) via kevent: fdp->fd_lock, _no_ object lock
- * f_event via knote: whatever caller guarantees
+ * f_detach: fdp->fd_lock, knote foplock, KERNEL_LOCK
+ * f_event(!NOTE_SUBMIT) via kevent: fdp->fd_lock, knote foplock,
+ *   acquires/releases object lock inside
+ * f_event via knote: whatever caller guarantees,
+ * knote foplock _not_ acquired
  * Typically,  f_event(NOTE_SUBMIT) via knote: object lock
- * f_event(!NOTE_SUBMIT) via knote: nothing,
- * acquires/releases object lock inside.
+ * f_event(!NOTE_SUBMIT) via knote:
+ * acquires/releases object lock inside
  *
  * Locking rules when detaching knotes:
  *
@@ -429,6 +456,7 @@ knote_alloc(bool sleepok)
struct knote *kn;
 
kn = kmem_zalloc(sizeof(*kn), sleepok ? KM_SLEEP : KM_NOSLEEP);
+   mutex_init(>kn_foplock, MUTEX_DEFAULT, IPL_NONE);
 
return kn;
 }
@@ -436,14 +464,28 @@ knote_alloc(bool sleepok)
 static inline void
 knote_free(struct knote *kn)
 {
+   mutex_destroy(>kn_foplock);
kmem_free(kn, sizeof(*kn));
 }
 
+/*
+ * Calls into the filterops need to be resilient against things which
+ * destroy a klist, e.g. device detach, freeing a vnode, etc., to avoid
+ * chasing garbage pointers (to data, or even potentially code in a
+ * module about to be unloaded).  To that end, we acquire the
+ * knote foplock before calling into the filter ops.  When a driver
+ * (or anything else) is tearing down its klist, klist_fini() enumerates
+ * each knote, acquires its foplock, and replaces the filterops with a
+ * nop stub, allowing knote detach (when descriptors are closed) to safely
+ * proceed.
+ */
+
 static int
 filter_attach(struct knote *kn)
 {
int rv;
 
+   KASSERT(mutex_owned(>kn_foplock));
KASSERT(kn->kn_fop != NULL);
KASSERT(kn->kn_fop->f_attach != NULL);
 
@@ -465,6 +507,8 @@ filter_attach(struct knote *kn)
 static void
 filter_detach(struct knote *kn)
 {
+
+   KASSERT(mutex_owned(>kn_foplock));
KASSERT(kn->kn_fop != NULL);
KASSERT(kn->kn_fop->f_detach != NULL);
 
@@ -478,10 +522,12 @

Re: Problem with outstanding knotes and device detach - and a fix

2022-07-13 Thread Jason Thorpe



> On Jul 13, 2022, at 11:25 AM, Taylor R Campbell 
>  wrote:
> 
> Sorry, haven't had time yet to do a full review, but I spot at least
> one problem that means this won't fly as is: kqueue_register and
> kqueue_scan both call filter_touch under a spin lock, but with your
> patch filter_touch now takes an rwlock -- which is forbidden under a
> spin lock (and it'll crash under LOCKDEBUG).

I found another issue as well and will post a follow-up later.

-- thorpej

Re: Problem with outstanding knotes and device detach - and a fix

2022-07-12 Thread Jason Thorpe

> On Jul 12, 2022, at 7:54 AM, Jason Thorpe  wrote:
> 
> The following patch rectifies this situation by having klist_fini() traverse 
> a list of knotes and substitute their filterops with no-op stubs.  This 
> requires synchronizing with any calls into the filterops themselves.  We have 
> convenient funnel-points for this in kern_event.c already, so it’s not 
> particularly difficult… the main issue is that the filterops themselves are 
> allowed to block.  My solution to this was to have a single global rwlock, 
> knote_filterops_lock, that’s acquired for reading when a call though a 
> knote's filterops is made, and in klist_fini() is acquired for writing for 
> the short duration needed to stub out filterops.  This lock should **very 
> rarely** be contended, but it will be *hot* (lots of acquire-for-read), so it 
> gets its own cache line.
> 
> If someone has ideas about another synchronization mechanism, I’m all ears… 
> again, the main issue is that the filterops calls themselves are allowed to 
> block, so I’m not sure that any of our passive serialization mechanisms would 
> work here.

FWIW, another alternative is to put the rwlock into the knote itself, if we 
think that lock will be Too Hot (which is not an unreasonable concern on a 
machine with many cores).  The trade-off is memory.  I could see putting the 
lock in the knote itself on amd64 and aarch64 (where we would expect to see 
high core count machines) and a global lock on everything else.  Then again, on 
amd64, struct knote is 136 bytes, so it already spills into the kmem-00192 
bucket, and on i386 it’s 84 bytes, so kmem-00128 bucket.  I guess each of those 
could easily accommodate an additional pointer-sized struct member to alleviate 
any scalability concern about a global rwlock.

-- thorpej

Re: Problem with outstanding knotes and device detach - and a fix

2022-07-12 Thread Jason Thorpe



> On Jul 12, 2022, at 10:27 AM, Jason Thorpe  wrote:
> 
> 
>> On Jul 12, 2022, at 7:54 AM, Jason Thorpe  wrote:
>> 
>> If someone has ideas about another synchronization mechanism, I’m all ears… 
>> again, the main issue is that the filterops calls themselves are allowed to 
>> block, so I’m not sure that any of our passive serialization mechanisms 
>> would work here.
>> 
>> 
> 
> Well, for some reason, I mistakenly posted an old version of this patch.  
> I’ll post a refresh later today.

Oh, no, never mind, I’m wrong about this :-). NOTHING TO SEE HERE *whistles*

-- thorpej

Re: Problem with outstanding knotes and device detach - and a fix

2022-07-12 Thread Jason Thorpe



> On Jul 12, 2022, at 7:54 AM, Jason Thorpe  wrote:
> 
> If someone has ideas about another synchronization mechanism, I’m all ears… 
> again, the main issue is that the filterops calls themselves are allowed to 
> block, so I’m not sure that any of our passive serialization mechanisms would 
> work here.
> 
> 

Well, for some reason, I mistakenly posted an old version of this patch.  I’ll 
post a refresh later today.

-- thorpej

Fix for PR kern/56713

2022-07-12 Thread Jason Thorpe

When I changed vnode kqueue events to be posted from common code at the VFS 
layer, I introduced a regression in sending those events on stacked file 
systems like nullfs.  The root cause of the problem is that when knotes are 
attached to the vnode, that operation (by way of the VOP bypass mechanism in 
layerfs) drills all the way down to the bottom of the vnode stack and attaches 
there (as intended), but with event delivery being handled now in vnode_if.c, 
it’s the upper vnode that’s being worked with, and no knotes are attached there.

These stacked vnodes already share a bit of state with the bottom layer… 
(locks, etc.), so my solution to this problem is to allow these stacked vnodes 
to also refer to the bottom vnode’s klist.  Each vnode always starts out 
referring to its own, but as a layerfs vnode is constructed, it is told to 
refer to the lower vnode's klist (just as it refers to the lower vnode’s 
interlock and uvm_obj lock).  Various assertions are included to ensure that 
knotes are never attached to a vnode that refers to another vnode’s klist (they 
should always be attached to the bottom-most vnode).

(After applying the patch, vnode_if.c must be rebuilt.)

Index: sys/fs/union/union_subr.c
===
RCS file: /cvsroot/src/sys/fs/union/union_subr.c,v
retrieving revision 1.81
diff -u -p -r1.81 union_subr.c
--- sys/fs/union/union_subr.c   19 Mar 2022 13:53:32 -  1.81
+++ sys/fs/union/union_subr.c   12 Jul 2022 00:17:28 -
@@ -232,10 +232,11 @@ union_newupper(struct union_node *un, st
unlock_ap.a_desc = VDESC(vop_unlock);
unlock_ap.a_vp = UNIONTOV(un);
genfs_unlock(_ap);
-   /* Update union vnode interlock & vmobjlock. */
+   /* Update union vnode interlock, vmobjlock, & klist. */
vshareilock(UNIONTOV(un), uppervp);
rw_obj_hold(uppervp->v_uobj.vmobjlock);
uvm_obj_setlock((un)->v_uobj, uppervp->v_uobj.vmobjlock);
+   vshareklist(UNIONTOV(un), uppervp);
mutex_exit(>un_lock);
if (ohash != nhash) {
LIST_INSERT_HEAD([nhash], un, un_cache);
@@ -577,6 +578,7 @@ union_loadvnode(struct mount *mp, struct
vshareilock(vp, svp);
rw_obj_hold(svp->v_uobj.vmobjlock);
uvm_obj_setlock(>v_uobj, svp->v_uobj.vmobjlock);
+   vshareklist(vp, svp);
 
/* detect the root vnode (and aliases) */
if ((un->un_uppervp == um->um_uppervp) &&
Index: sys/miscfs/genfs/layer_vfsops.c
===
RCS file: /cvsroot/src/sys/miscfs/genfs/layer_vfsops.c,v
retrieving revision 1.54
diff -u -p -r1.54 layer_vfsops.c
--- sys/miscfs/genfs/layer_vfsops.c 23 Feb 2020 15:46:41 -  1.54
+++ sys/miscfs/genfs/layer_vfsops.c 12 Jul 2022 00:17:28 -
@@ -205,10 +205,11 @@ layerfs_loadvnode(struct mount *mp, stru
 
xp = kmem_alloc(lmp->layerm_size, KM_SLEEP);
 
-   /* Share the interlock and vmobjlock with the lower node. */
+   /* Share the interlock, vmobjlock, and klist with the lower node. */
vshareilock(vp, lowervp);
rw_obj_hold(lowervp->v_uobj.vmobjlock);
uvm_obj_setlock(>v_uobj, lowervp->v_uobj.vmobjlock);
+   vshareklist(vp, lowervp);
 
vp->v_tag = lmp->layerm_tag;
vp->v_type = lowervp->v_type;
Index: sys/kern/vfs_vnode.c
===
RCS file: /cvsroot/src/sys/kern/vfs_vnode.c,v
retrieving revision 1.143
diff -u -p -r1.143 vfs_vnode.c
--- sys/kern/vfs_vnode.c9 Apr 2022 23:45:45 -   1.143
+++ sys/kern/vfs_vnode.c12 Jul 2022 00:17:28 -
@@ -457,7 +457,8 @@ vnalloc_marker(struct mount *mp)
vp->v_mount = mp;
vp->v_type = VBAD;
vp->v_interlock = mutex_obj_alloc(MUTEX_DEFAULT, IPL_NONE);
-   klist_init(>v_klist);
+   klist_init(>vi_klist.vk_klist);
+   vp->v_klist = >vi_klist;
vip->vi_state = VS_MARKER;
 
return vp;
@@ -475,7 +476,7 @@ vnfree_marker(vnode_t *vp)
KASSERT(vip->vi_state == VS_MARKER);
mutex_obj_free(vp->v_interlock);
uvm_obj_destroy(>v_uobj, true);
-   klist_fini(>v_klist);
+   klist_fini(>vi_klist.vk_klist);
pool_cache_put(vcache_pool, vip);
 }
 
@@ -1391,7 +1392,8 @@ vcache_alloc(void)
vp->v_interlock = mutex_obj_alloc(MUTEX_DEFAULT, IPL_NONE);
 
uvm_obj_init(>v_uobj, _vnodeops, true, 1);
-   klist_init(>v_klist);
+   klist_init(>vi_klist.vk_klist);
+   vp->v_klist = >vi_klist;
cv_init(>v_cv, "vnode");
cache_vnode_init(vp);
 
@@ -1453,7 +1455,9 @@ vcache_free(vnode_impl_t *vip)
mutex_obj_free(vp->v_interlock);
rw_destroy(>vi_lock);
uvm_obj_destroy(>v_uobj, true);
-   klist_fini(>v_klist);
+   KASSERT(vp->v_klist == >vi_klist ||
+   SLIST_EMPTY(>vi_klist.vk_klist));
+   klist_fini(>vi_klist.vk_klist);
cv_destroy(>v_cv);

Problem with outstanding knotes and device detach - and a fix

2022-07-12 Thread Jason Thorpe

Background: kqueue events are represented by a “knote” that is kept on a list 
of knotes (a klist) by its respective driver (usually indirectly in the 
“selinfo” structure).  A similar situation exists for vnodes.  Then, when 
activated, they are queued onto the kqueue’s active note list.  The driver gets 
called to attach the knote in a generic fashion, but then provides its own ops 
vector for subsequent operations, including “detach”.  When a file descriptor 
for a given knote is closed, this driver-supplied “detach” function is called 
to remove it from whatever driver-specific klist the knote is on.

When a driver detaches, it frees its softc structure, which includes the 
selinfo / klist.  This list might still have notes on it… but that isn’t 
inherently problematic, because the driver owns the list and no one will ever 
traverse it again — the generic kqueue code doesn’t care about it.

But it is a problem when an outstanding file descriptor with a registered 
kqueue event (e.g. EVFILT_READ) for that device that may have been open when 
the device was detached.  In this case, when the file descriptor is closed, the 
knote, which still points to the now-detached device’s filterops, will call the 
“detach” operation.  For a driver that is still loaded into the kernel, we call 
code that’s still valid, but a use-after-free condition exists where the 
now-freed softc structure will be accessed to unlink the knote from the klist.  
The situation is worse for modular drivers that may have been unloaded after 
the device detaches — jumping through function pointers to code that no longer 
exists is bad, m’kay?

The following patch rectifies this situation by having klist_fini() traverse a 
list of knotes and substitute their filterops with no-op stubs.  This requires 
synchronizing with any calls into the filterops themselves.  We have convenient 
funnel-points for this in kern_event.c already, so it’s not particularly 
difficult… the main issue is that the filterops themselves are allowed to 
block.  My solution to this was to have a single global rwlock, 
knote_filterops_lock, that’s acquired for reading when a call though a knote's 
filterops is made, and in klist_fini() is acquired for writing for the short 
duration needed to stub out filterops.  This lock should **very rarely** be 
contended, but it will be *hot* (lots of acquire-for-read), so it gets its own 
cache line.

If someone has ideas about another synchronization mechanism, I’m all ears… 
again, the main issue is that the filterops calls themselves are allowed to 
block, so I’m not sure that any of our passive serialization mechanisms would 
work here.

Index: kern/kern_event.c
===
RCS file: /cvsroot/src/sys/kern/kern_event.c,v
retrieving revision 1.141
diff -u -p -r1.141 kern_event.c
--- kern/kern_event.c   24 May 2022 20:50:19 -  1.141
+++ kern/kern_event.c   11 Jul 2022 20:53:40 -
@@ -133,6 +133,31 @@ static const struct fileops kqueueops = 
.fo_restart = kqueue_restart,
 };
 
+static void
+filt_nopdetach(struct knote *kn __unused)
+{
+}
+
+static int
+filt_nopevent(struct knote *kn __unused, long hint __unused)
+{
+   return 0;
+}
+
+static const struct filterops nop_fd_filtops = {
+   .f_flags = FILTEROP_ISFD | FILTEROP_MPSAFE,
+   .f_attach = NULL,
+   .f_detach = filt_nopdetach,
+   .f_event = filt_nopevent,
+};
+
+static const struct filterops nop_filtops = {
+   .f_flags = FILTEROP_MPSAFE,
+   .f_attach = NULL,
+   .f_detach = filt_nopdetach,
+   .f_event = filt_nopevent,
+};
+
 static const struct filterops kqread_filtops = {
.f_flags = FILTEROP_ISFD | FILTEROP_MPSAFE,
.f_attach = NULL,
@@ -232,6 +257,9 @@ static size_t   user_kfiltersz; /* size 
  * -> object lock (e.g., device driver lock, )
  * -> kn_kq->kq_lock
  *
+ * knote_filterops_lock (rw)
+ * -> whatever will be acquired in filter_*()
+ *
  * Locking rules:
  *
  * f_attach: fdp->fd_lock, KERNEL_LOCK
@@ -279,6 +307,24 @@ static size_t  user_kfiltersz; /* size 
  */
 static krwlock_t   kqueue_filter_lock; /* lock on filter lists */
 
+/*
+ * Calls into the filterops need to be resilient against things which
+ * destroy a klist, e.g. device detach, freeing a vnode, etc., to avoid
+ * chasing garbage pointers (to data, or even potentially code in a
+ * module about to be unloaded).  To that end, we acquire the
+ * knote_filterops_lock as a reader whenver calling into the filter
+ * ops (in the filter_*() functions).  When a driver (or anything else)
+ * is tearing down its klist, klist_fini() acquires knote_filterops_lock
+ * as a writer, and replaces all of the filterops in each knote with a
+ * stub, allowing knote detach (when descriptors are closed) to safely
+ * proceed.
+ *
+ * This lock should be very rarely *contended* (klist tear-down is fairly
+ * rare), but it

Re: membar_enter semantics

2022-02-11 Thread Jason Thorpe

> On Feb 11, 2022, at 10:27 AM, Taylor R Campbell  wrote:
> 
> So x86, powerpc, and sparc64 all implement what I suggest membar_enter
> _should_ be (and what all current users use it for!), and _fail_ to
> implement what membar_enter is documented to be in the man page.
> 
> And of all the membar_enter definitions, only the riscv one fails to
> guarantee the load-before-load/store ordering I suggest it should be
> documented to have.

My beef with the membar_enter definitional change is with the word "acquire".  
I.e. you want to give it what is called today "acquire" semantics.  My beef is 
with now "acquire" is defined, as load-before-load/store.  I think it's a 
non-intuitive word to use for that kind of barrier, especially when used in 
context of the Solaris description of membar_enter(), as you quoted earlier:

The membar_enter() function is a generic memory barrier used
during lock entry.  It is placed after the memory operation that
acquires the lock to guarantee that the lock protects its data.
No stores from after the memory barrier will reach visibility and
no loads from after the barrier will be resolved before the lock
acquisition reaches global visibility.

So, let's again take a look at the rules for PSO from the SPARC v9 architecture 
manual:

"""
The rules for PSO are:

* Loads are blocking and ordered with respect to earlier loads.
* Atomic load-stores are ordered with respect to loads.

Thus, PSO ensures that:
* Each load and atomic load-store instruction behaves as if it were followed by 
a MEMBAR with a mask value of 05.
* Explicit MEMBAR instructions are required to order store and atomic 
load-store instructions with respect to each other.
"""

My gripe hinges on what "lock acquisition" means.  In SPARC-land, the above 
Solaris description can be broken down into three cases:

v7 -- Locks are acquired with "ldstub", and memory is always strongly-ordered; 
membar_enter() is a nop.

v8 -- Locks are acquired with "ldstub", and CPU Partial Store Order; 
membar_enter() is "stbar" (equivalent to "membar #StoreStore" in v9-speak).

V9 -- Locks are acquired with either "ldstub" or "casx", and CPU can do PSO and 
RMO.  Both "ldstub" and "casx" are the same with respect to how they interact 
with barriers.  But PSO and RMO have two different requirements:

v9-PSO -- Because Atomic load-stores ("ldstub" and "casx") are not ordered with 
respect to stores, you would need "membar #StoreStore" (in PSO mode, Atomic 
load-stores are already strongly ordered with respect to other loads).

v9-RMO -- In **this case**, because an Atomic load-store is a load, you can use 
"membar #LoadLoad | #LoadStore" (your proposed semantics: 
load-before-load/store).  But because an Atomic load-store is also a store, you 
can also use "membar #StoreLoad | #StoreStore" (the current definition in our 
man page: store-before-load/store).

Now, because in PSO mode, Atomic load-stores are not strongly ordered with 
respect to stores, in order for the following code to work:

mutex_enter();
*foo = 0;
result = *bar;
mutex_exit();

...then you need to issue a "membar #StoreStore" because the ordering of the 
lock acquisition and the store through *foo is not guaranteed without it.  But 
you can also issue a "membar #StoreLoad | #StoreStore", which also works in RMO 
mode.

In other words, it's the **store into the lock cell** that actually performs 
the acquisition of the lock.  In addition to being true on platforms that have 
Atomic load-store (like SPARC), it is also true on platforms that have LL/SC 
semantics (the load in that case doesn't mean jack-squat, and the ordering 
guarantees that the LL has are specifically with respect to the paired SC).  
Until that store is globally visible, the lock does not protect its data, on 
all platforms that NetBSD supports.  And this is why I described membar_enter() 
the way I did.

Now, if we want to loosen the definition to be "acquisition of lock", which 
would allow a load-before-load/store after an Atomic load-store, then that's 
fine, but I would object to defining "membar_enter()" as 
"load-before-load/store" ... rather, I would word it in a more fuzzy way such 
that load-before-load/store is allowed if such a barrier works for that 
architecture, but would also allow store-before-load/store if some other 
architecture needs those semantics for whatever reason.

And that's why I'd also prefer to have a new name for an additional operation 
if you want to have one that is explicitly "load-before-load/store".

> (sparc64 membar_sync also fails to order store-before-load, and that's
> clearly just an implementation bug because membar_sync has always been
> unequivocally intended and used as a full load/store-before-load/store
> sequential consistency barrier.  Easy fix: change membar_sync to do
> `membar #StoreLoad' for kernel running in TSO.  If userland is really
> running in RMO, then all the membars need to be

Re: membar_enter semantics

2022-02-11 Thread Jason Thorpe

> On Feb 11, 2022, at 9:03 AM, Jason Thorpe  wrote:
> 
> 
>> On Feb 11, 2022, at 8:45 AM, Martin Husemann  wrote:
>> 
>> so Jason is correct. For userland we default to RMO for 64bit binaries
>> (but honor their elf flags), and TSO for 32bit.
> 
> Right, v7 didn't have any barrier instructions, and v8 only had STBAR (a 
> stores-before-stores barrier), IIRC.  I did not realize we are using RMO for 
> 64-bit userland, tho.

So, given the existing definition of membar_enter() in the man page, sparc64's 
userland membar_sync() really ought to be "membar #StoreStore | #StoreLoad", 
and with Taylor's newly-proposed semantics it would need to be "membar 
#LoadLoad | #LoadStore".

The current "membar #LoadLoad" is *only* maybe-OK if you're in TSO mode, and 
even then I think it's dicey.

-- thorpej

Re: membar_enter semantics

2022-02-11 Thread Jason Thorpe

> On Feb 11, 2022, at 8:45 AM, Martin Husemann  wrote:
> 
> so Jason is correct. For userland we default to RMO for 64bit binaries
> (but honor their elf flags), and TSO for 32bit.

Right, v7 didn't have any barrier instructions, and v8 only had STBAR (a 
stores-before-stores barrier), IIRC.  I did not realize we are using RMO for 
64-bit userland, tho.

-- thorpej

Re: membar_enter semantics

2022-02-11 Thread Jason Thorpe

> On Feb 11, 2022, at 5:33 AM, Taylor R Campbell  wrote:
> 
> sparc64: alias for membar_sync (not sure membar #LoadLoad is right, though!)

Almost forgot to mention ... I wouldn't read too much into how sparc64's 
barriers are implemented ... I'm pretty sure our sparc64 port configures the 
system for Total Store Order and not Partial Store Order.

-- thorpej

Re: membar_enter semantics

2022-02-11 Thread Jason Thorpe



> On Feb 11, 2022, at 7:07 AM, Taylor R Campbell  wrote:
> 
>> Date: Fri, 11 Feb 2022 06:52:48 -0800
>> From: Jason Thorpe 
>> 
>> I would prefer we adopt the Solaris description about a generic
>> barrier that provides "lock-is-visible-before-load/store" without
>> explicitly stating "load-before-load/store", and provide a new
>> membar_acquire() that means "load-before-load/store".
> 
> What does `lock-is-visible' mean?  How would you ever use it
> correctly, or audit correct use?  What operations do we have that have
> `lock-is-visible' semantics, which don't also imply an acquire
> operation and thus don't need an explicit barrier anyway?

Actually, you know what, I’m just going to defer on all of this.  I have a huge 
beef with the modern naming of various memory barrier operations (they are not 
intuitive at all, IMO) … so whatever.

> It turns out that _even on x86_, our membar_enter fails to implement
> the semantics we documented for it

Sure, change the documentation rather than fix the bug, then.

-- thorpej

Re: membar_enter semantics

2022-02-11 Thread Jason Thorpe

> On Feb 11, 2022, at 5:33 AM, Taylor R Campbell  wrote:
> 
> So I propose to change the membar_enter documentation to match the
> definitions and usage (and change the riscv definition), making it
> instead:
> 
> membar_enter()
>   Any load preceding membar_enter() will happen before all memory
>   operations following it.
> 
> This will also let us delete the obnoxious text I added to
> atomic_loadstore(9) warning about membar_enter semantics.

I would prefer we adopt the Solaris description about a generic barrier that 
provides “lock-is-visible-before-load/store” without explicitly stating 
“load-before-load/store”, and provide a new membar_acquire() that means 
“load-before-load/store”.

-- thorpej

Re: membar_enter semantics

2022-02-11 Thread Jason Thorpe

> On Feb 11, 2022, at 5:33 AM, Taylor R Campbell  wrote:
> 
> The following definitions provide load-before-load/store:
> 
> aarch64: alias for membar_sync
> alpha: alias for membar_sync
> arm: alias for membar_sync

Actually, on Alpha[*], an alias to membar_sync is really providing 
“load/store-before-load/store”.  That’s supposed to be what membar_sync() is 
defined as everywhere.

[*] MB is “load/store-before-load/store” and WMB is “store-before-store”.

-- thorpej

Re: Regulator

2022-01-12 Thread Jason Thorpe

> On Jan 12, 2022, at 9:08 AM, Jason Thorpe  wrote:
> 
>> On Jan 12, 2022, at 8:02 AM, Taylor R Campbell 
>>  wrote:
>> 
>>> Date: Wed, 12 Jan 2022 15:42:17 +
>>> From: Robert Swindells 
>>> 
>>> I'm guessing this is a platform that doesn't use FDT.
>>> 
>>> Maybe we need a more general API for regulators.
>> 
>> What other instances would a more general API cover?  Is there some
>> ACPI interface for regulators?
> 
> The ACPI interface for regulators is usually "put device into power state X", 
> but of course that depends on the right ASL being present in the node for the 
> device.

I guess I forgot to mention, it's not necessarily about setting the device into 
a specific power state (like "soft-suspend" vs "deep-sleep" vs "off").  In 
addition to "on" and "off", the regulator API can be used to step voltages.  
This is useful when you want to consume as little power as possible for a given 
performance requirement... like if you're running an interface at a slower 
clock rate (for whatever reason), you can sometimes step down the voltage 
because you don't need as much drive strength to defeat the capacitances of the 
physical device, for example.  I don't know that ACPI really encapsulates that 
(unless there are DSMs or whatever defined for the device type that know how to 
do such stepping).

In general, the philosophy between ACPI and Device Tree are different in this 
regard.  ACPI aims to encapsulate the knowledge in methods that the firmware 
provides, and Device Tree aims to describe the connections between the elements 
and then provides generic interfaces to interact with those elements.

-- thorpej

Re: Regulator

2022-01-12 Thread Jason Thorpe



> On Jan 12, 2022, at 9:17 AM, Jason Thorpe  wrote:
> 
>> Both lima and panfrost need to make calls to the regulator API, as well
>> as clock, reset and interrupt ones.
> 
> Lima will probably be fine because the platforms that need it will use FDT.  
> Except for if you're using that board with ACPI, I guess.  Same situation 
> with resets.  Interrupts are another story and a bit more complicated.

"Resets" is also a little more complicated, because there are two kinds on the 
Device Tree bindings universe:

- Reset Controllers (which use the "Reset" bindings)

- Reset GPIOs (which use the "GPIO" bindings)

Our FDT GPIO and generic GPIO interfaces are in need of harmonizing, and I've 
been thinking about that lately, so hopefully will have some bandwidth to make 
progress on that front in the coming months.

-- thorpej

Re: Regulator

2022-01-12 Thread Jason Thorpe

> On Jan 12, 2022, at 9:12 AM, Robert Swindells  wrote:
> 
> 
> Taylor R Campbell  wrote:
>> Date: Wed, 12 Jan 2022 15:42:17 +
>>> From: Robert Swindells 
>>> 
>>> I'm guessing this is a platform that doesn't use FDT.
>>> 
>>> Maybe we need a more general API for regulators.
>> 
>> What other instances would a more general API cover?  Is there some
>> ACPI interface for regulators?
> 
> Don't know, somebody could look at what the Linux API covers.
> 
> Could also look at doing clocks, resets and interrupts.

We have one for clocks, but the only way to acquire a clock handle is via FDT 
at the moment.

> Both lima and panfrost need to make calls to the regulator API, as well
> as clock, reset and interrupt ones.

Lima will probably be fine because the platforms that need it will use FDT.  
Except for if you're using that board with ACPI, I guess.  Same situation with 
resets.  Interrupts are another story and a bit more complicated.

-- thorpej

Re: Regulator

2022-01-12 Thread Jason Thorpe




> On Jan 12, 2022, at 9:08 AM, Jason Thorpe  wrote:
> 
> 
> 
>> On Jan 12, 2022, at 8:02 AM, Taylor R Campbell 
>>  wrote:
>> 
>>> Date: Wed, 12 Jan 2022 15:42:17 +
>>> From: Robert Swindells 
>>> 
>>> I'm guessing this is a platform that doesn't use FDT.
>>> 
>>> Maybe we need a more general API for regulators.
>> 
>> What other instances would a more general API cover?  Is there some
>> ACPI interface for regulators?
> 
> The ACPI interface for regulators is usually "put device into power state X", 
> but of course that depends on the right ASL being present in the node for the 
> device.

(Of course I mean AML here, but of course it's all initially written in ASL, 
so...)

> 
>>> The current situation causes a fair bit of "#ifdef FDT" in the drm
>>> compat code.
>> 
>> I count 1:
>> 
>> https://nxr.netbsd.org/xref/src/sys/external/bsd/drm2/include/linux/regulator/consumer.h?r=1.5#39
>> 
>> Am I missing some?
>> 
>> There's also one in linux/clk.h, for clocks, and two in nouveau_pci.c
>> and radeon_pci.c, for kicking out the FBT framebuffer; these don't
>> appear to be relevant to regulators.
> 
> Any place where there's an "#ifdef FDT" is a failure to create a proper 
> abstraction.
> 
> -- thorpej

-- thorpej

Re: Regulator

2022-01-12 Thread Jason Thorpe




> On Jan 12, 2022, at 8:02 AM, Taylor R Campbell 
>  wrote:
> 
>> Date: Wed, 12 Jan 2022 15:42:17 +
>> From: Robert Swindells 
>> 
>> I'm guessing this is a platform that doesn't use FDT.
>> 
>> Maybe we need a more general API for regulators.
> 
> What other instances would a more general API cover?  Is there some
> ACPI interface for regulators?

The ACPI interface for regulators is usually "put device into power state X", 
but of course that depends on the right ASL being present in the node for the 
device.

>> The current situation causes a fair bit of "#ifdef FDT" in the drm
>> compat code.
> 
> I count 1:
> 
> https://nxr.netbsd.org/xref/src/sys/external/bsd/drm2/include/linux/regulator/consumer.h?r=1.5#39
> 
> Am I missing some?
> 
> There's also one in linux/clk.h, for clocks, and two in nouveau_pci.c
> and radeon_pci.c, for kicking out the FBT framebuffer; these don't
> appear to be relevant to regulators.

Any place where there's an "#ifdef FDT" is a failure to create a proper 
abstraction.

-- thorpej

Re: Regulator

2022-01-12 Thread Jason Thorpe




> On Jan 12, 2022, at 6:34 AM, Emmanuel Dreyfus  wrote:
> 
> Hello
> 
> I am still working on Goodix touchpanel as told in [1] and [2]. I
> wrote a driver for the Intel GPIO chip that interfaces with Goodix
> chipreset and interrupt pins. I have not committed yet, because it
> remains untested.
> 
> There is another readblock. The Linux driver for Goodix touchpanel 
> uses a beast called regulator, which seems to control the chip
> power supply:
>ts->avdd28 = devm_regulator_get(dev, "AVDD28");
> (...)
>ts->vddio = devm_regulator_get(dev, "VDDIO");
> (...)
>/* power up the controller */
>error = regulator_enable(ts->avdd28);
> (...)
>error = regulator_enable(ts->vddio);
> 
> Is it another chip that neds another driver? Or a feature of the
> Goodix chip?  What support do we have for thins kind of thing? 
> grep -r regulator returns manu hits in src/sys/dev/i2c and
> src/sys/dev/fdt but we do not have any generic API for that, right?

We support regulators using FDT.

-- thorpej

Passive serialization support in the pool allocator

2021-12-21 Thread Jason Thorpe

An #ifdef that Taylor added in the new DRM code was bugging me, because a 
similar situation already existed in the NetBSD kernel, whereby the type 
stability of a object’s backing memory (the iam existentium case being LWPs) 
was required for a passively-serialized weak reference to work.  The LWP pool 
cache had a hack to handle it, and the hack was propagated to the new DRM code 
under an “#ifdef __NetBSD__”.

I initially fixed this by adding a “pre-destruct” callback that could be 
optionally set for a pool cache, but after further discussion, it seemed like 
having direct knowledge of passive serialization synchronization points in the 
allocator was a better solution, so here is a diff that implements it for you 
review.

https://www.netbsd.org/~thorpej/pool-pser-diff.txt

Please comment ASAP, because I want to ride a kernel version bump.

-- thorpej

Re: Request for implementation of KERN_PROC_SIGTRAMP sysctl

2021-11-21 Thread Jason Thorpe

If strip removes it, then you’re doomed anyway and trampoline assist via a 
function won’t help you, because you won’t be able to get to the trampoline or 
past it anyway.

-- thorpej
Sent from my iPhone.

> On Nov 21, 2021, at 7:54 AM, John Marino (NetBSD)  wrote:
> 
> 
> I'm not very familiar with CFA information.  I've been worried that strip(1) 
> removes those symbols.  Is that worry meritless?
> 
>> On Sun, Nov 21, 2021 at 9:32 AM Jason Thorpe  wrote:
>> 
>> > On Nov 21, 2021, at 6:35 AM, John Marino (NetBSD)  wrote:
>> > 
>> > Sorry, I meant to answer and got overwhelmed.
>> > Actually, I had a typo in the previous response. I meant to say "GCC" 
>> > unwinder, not gdb unwinder.
>> > I don't see how this helps support the GCC unwinder.  The context 
>> > information to support the unwind is already provided, we just need the 
>> > boolean (is this a sigtramp frame?) to decide whether or not to use it.  I 
>> > thought Uwe's suggestion to return the context was either an expansion for 
>> > more general use or a second function.  All we have is the pc pointer. 
>> 
>> Well, getting back to a previous part of the conversation, there can be 
>> multiple kinds of contexts, either a “sigcontext” or a “ucontext_t”, and so 
>> telling you “it’s in a trampoline” isn’t necessarily useful — you need to 
>> know what kind it is.  The idea behind __sigtramp_unwind_np() is that you 
>> would need less architecture-dependent logic (and that the API would be 
>> future-proof, in case the way the handlers work changes for some reason).
>> 
>> However, Joerg correctly pointed out that the real correct solution already 
>> exists, which is to say “make sure DWARF CFA information exists for signal 
>> trampolines”, which is what I’ve been focusing on over the last couple of 
>> weeks.  Only if that proves to be insufficient for some reason should we add 
>> a new non-portable API to assist unwind.  DWARF unwind information doesn’t 
>> really work for FreeBSD trampolines, because the handler is supplied by the 
>> kernel, and so of course there’s not really a good way to look up the CIE / 
>> FDE for it.
>> 
>> -- thorpej
>>

Re: Request for implementation of KERN_PROC_SIGTRAMP sysctl

2021-11-21 Thread Jason Thorpe

> On Nov 21, 2021, at 8:28 AM, Jason Thorpe  wrote:
> 
> If strip removes it, then you’re doomed anyway and trampoline assist via a 
> function won’t help you, because you won’t be able to get to the trampoline 
> or past it anyway.

Here’s a before/after of a “strip -s” of a binary on the DWARF unwind 
information:

 12 .eh_frame_hdr 01a4  0001200077f0  0001200077f0  77f0  2**2
  CONTENTS, ALLOC, LOAD, READONLY, DATA
 13 .eh_frame 07a4  000120007998  000120007998  7998  2**3
  CONTENTS, ALLOC, LOAD, READONLY, DATA

 12 .eh_frame_hdr 01a4  0001200077f0  0001200077f0  77f0  2**2
  CONTENTS, ALLOC, LOAD, READONLY, DATA
 13 .eh_frame 07a4  000120007998  000120007998  7998  2**3
  CONTENTS, ALLOC, LOAD, READONLY, DATA

I.e. strip does not effect the unwind information, because unwind information 
is not debugging information nor is it part of the symbol information; it is, 
in fact, required for correct operation of the program in the face of 
exceptions.  And my test program still works:

alpha-vm:thorpej 22$ ./test1  
^Cx 2
Backtrace 4 stack frames.
0x1200014e4 <_init+0x304> at ./test1
0x3fffdddbfd4 <__sigtramp_siginfo_2> at /usr/lib/libc.so.12
0x1200015e4 <_init+0x404> at ./test1
0x120001590 <_init+0x3b0> at ./test1

As you can see, because I stripped the symbols out of the program binary, the 
symbol names are wrong, but the unwind works and the program counter values are 
the same as an un-stripped copy of the program:

alpha-vm:thorpej 23$ ./sigbttest   
^Cx 2
Backtrace 4 stack frames.
0x1200014e4  at ./sigbttest
0x3fffdddbfd4 <__sigtramp_siginfo_2> at /usr/lib/libc.so.12
0x1200015e4  at ./sigbttest
0x120001590  at ./sigbttest

So, I think your worry about it is unwarranted.

>> On Nov 21, 2021, at 7:54 AM, John Marino (NetBSD)  wrote:
>> 
>> 
>> I'm not very familiar with CFA information.  I've been worried that strip(1) 
>> removes those symbols.  Is that worry meritless?

-- thorpej

Re: Request for implementation of KERN_PROC_SIGTRAMP sysctl

2021-11-21 Thread Jason Thorpe

> On Nov 21, 2021, at 6:35 AM, John Marino (NetBSD)  wrote:
> 
> Sorry, I meant to answer and got overwhelmed.
> Actually, I had a typo in the previous response. I meant to say "GCC" 
> unwinder, not gdb unwinder.
> I don't see how this helps support the GCC unwinder.  The context information 
> to support the unwind is already provided, we just need the boolean (is this 
> a sigtramp frame?) to decide whether or not to use it.  I thought Uwe's 
> suggestion to return the context was either an expansion for more general use 
> or a second function.  All we have is the pc pointer. 

Well, getting back to a previous part of the conversation, there can be 
multiple kinds of contexts, either a “sigcontext” or a “ucontext_t”, and so 
telling you “it’s in a trampoline” isn’t necessarily useful — you need to know 
what kind it is.  The idea behind __sigtramp_unwind_np() is that you would need 
less architecture-dependent logic (and that the API would be future-proof, in 
case the way the handlers work changes for some reason).

However, Joerg correctly pointed out that the real correct solution already 
exists, which is to say “make sure DWARF CFA information exists for signal 
trampolines”, which is what I’ve been focusing on over the last couple of 
weeks.  Only if that proves to be insufficient for some reason should we add a 
new non-portable API to assist unwind.  DWARF unwind information doesn’t really 
work for FreeBSD trampolines, because the handler is supplied by the kernel, 
and so of course there’s not really a good way to look up the CIE / FDE for it.

-- thorpej

Re: timecounters

2021-11-15 Thread Jason Thorpe



> On Nov 15, 2021, at 8:32 AM, Michael  wrote:
> 
> IIRC on macppc SMP startup we stop the timebase, then zero the counters
> on all CPUs while they hatch, then start the timebase again.

But that's for the clock interrupt, right?  The time counter thing isn't about 
clock interrupts.

-- thorpej

Re: timecounters

2021-11-15 Thread Jason Thorpe

> On Nov 14, 2021, at 4:49 PM, Joerg Sonnenberger  wrote:
> 
>> x86 TSC: cycle count from CPU register. Very quick to read, but unreliable 
>> if 
>> CPU frequency changes because of power saving. Also each CPU has its own
>> value (how do we cope with that?)
> 
> It's even more complicated. For older x86 CPUs, different counts could
> go at different frequencies, but this was fixed around the Netburst era
> or so. That said, they frequently have issues when used in SMP systems.
> It's a high resolution timer.

This is handled correctly on Alpha now (fixed roughly 1 year ago with 
kern_cctr.c 1.11).  If the cycle counters are running at the same nominal 
frequency, the offset of the counter on the secondary CPUs is calibrated 
periodically (~once per second) relative to the primary CPU’s counter.  Alpha 
is the only port using that code at the moment, but there is nothing 
machine-dependent about it.

However, that code cannot deal with counters whose frequency changes, or 
counters that run at a different nominal frequency than the primary’s.

-- thorpej

Re: Request for implementation of KERN_PROC_SIGTRAMP sysctl

2021-11-14 Thread Jason Thorpe

> On Nov 14, 2021, at 3:12 PM, John Marino (NetBSD)  wrote:
> 
> So for the purpose of the gdb unwinder, I would pass NULL for the sp argument?
> The unwinder would only be checking the result to see __sigtramp_unwind_np 
> returns null or not.

This would not work for the gdb unwinder — it only works for unwinding from 
within the same process, so for exception handling, etc.  I.e. if the unwinder 
in the compiler runtime wasn’t able to process the DWARF call frame info for 
the signal trampoline, for example.  The gdb unwinder may have to operate on a 
process that’s no longer running (and, at the very least, is not the current 
process), so it needs to rely only on DWARF call frame info and/or heuristics 
based on the symbol name (which I added to gdb quite some years ago now).

The requirements for using this are:

==> In the frame that represents the handler itself, you are able to get the 
return address the handler will return to.

==> In that same frame, you can compute what the stack pointer should be when 
the handler returns (by either inspecting the instructions in the function 
prologue that setup the handler’s stack frame, or by parsing the DWARF call 
frame info).

It’s that return address and stack pointer that you would pass to 
__sigtramp_unwind_np().

Then the context returned by __sigtramp_unwind_np() would allow you to then get 
the stack pointer, PC, etc. for the context that would be resumed after the 
handler returns (because remember, signals can run on their own stack).

So, this is a little more than just “is this PC in the signal trampoline?”.  
This is “If this PC is in the signal trampoline, then return a pointer to the 
context that will be restored when the signal returns, given this SP.”  This 
was uwe’s suggestion.

I can still expose just __sigtramp_check_np(), but the assumption was that you 
would use a “YES!” result from that to to off an find the context-to-restore… 
so we decided to encapsulate some of that work for you as well.  Does that make 
sense?

If that assumption about how you would use __sigtramp_check_np() is invalid, 
please let me know so I can adjust the proposal.

-- thorpej

Re: Request for implementation of KERN_PROC_SIGTRAMP sysctl

2021-11-14 Thread Jason Thorpe

> On Oct 28, 2021, at 8:49 AM, Valery Ushakov  wrote:
> 
> It is ucontext for the siginfo trampoline and sigcontext for the older
> one, isn't it?

Ok, I’ve settled on the following:

void *__sigtramp_unwind_np(void *pc, void *sp, int *versp);

Given a program counter and stack pointer, return a pointer to the context that 
will be restored when the signal handler returns.  The signal trampoline 
version is returned in *versp; versp must not be NULL, and passing NULL will 
result in undefined behavior.

Returns NULL if the provided pc is not within the signal trampoline, or if the 
location of the context to restore cannot be determined.

If the returned version is within the range __SIGTRAMP_SIGINFO_VERSION_MIN … 
__SIGTRAMP_SIGINFO_VERSION_MAX, then the returned context points to a 
ucontext_t.

If the returned version is within the range __SIGTRAMP_SIGCONTEXT_VERSION_MIN … 
__SIGTRAMP_SIGCONTEXT_VERSION_MAX, then the returned context points to a struct 
sigcontext.

Note that the layout of the context structures is architecture-dependent and 
may also be version-dependent.

Does this sound good to everyone? 

-- thorpej

Re: if_ethersubr.c: if_ierrors -> if_iqdrops?

2021-11-08 Thread Jason Thorpe



> On Nov 7, 2021, at 11:07 PM, RVP  wrote:
> 
> So, I hacked up a small patch to put most of these into the
> "if_iqdrops" bucket. The rest (following FreeBSD) remain as errors.

LGTM!

-- thorpej

Re: Request for implementation of KERN_PROC_SIGTRAMP sysctl

2021-10-29 Thread Jason Thorpe

> On Oct 29, 2021, at 7:54 AM, Jason Thorpe  wrote:
> 
> 
> 
>> On Oct 29, 2021, at 2:45 AM, Martin Husemann  wrote:
>> 
>> On Thu, Oct 28, 2021 at 11:14:39AM -0700, Jason Thorpe wrote:
>>> We really just need to kill the sigcontext stuff completely.  More to the 
>>> point, we should version sigaction() before (or concurrently with) adding 
>>> whatever call we add here so as to ensure that it will only ever be a 
>>> ucontext.
>> 
>> Something there is broken in -current, see PR 56471 (freshly compiled macppc
>> ntpd tries to use compat_16___sigreturn14).
> 
> Yes, and from your description it should have failed to register a 
> “sigcontext” handler in the first place.

Martin, is kern.module.autoload enabled on your machine?  (Sorry, my Mac mini 
is in a state of “has been turned off for a while”, so I need to resurrect it 
today, so asking for info here to get the investigation going…)

The PowerPC signal trampoline also has a case where it can return EINVAL 
because it thinks the saved SRR1 has been tampered with.  I’m wondering if 
somehow the old sigcontext trampoline is getting invoked for a siginfo-style 
handler.

-- thorpej

Re: Request for implementation of KERN_PROC_SIGTRAMP sysctl

2021-10-29 Thread Jason Thorpe




> On Oct 29, 2021, at 2:45 AM, Martin Husemann  wrote:
> 
> On Thu, Oct 28, 2021 at 11:14:39AM -0700, Jason Thorpe wrote:
>> We really just need to kill the sigcontext stuff completely.  More to the 
>> point, we should version sigaction() before (or concurrently with) adding 
>> whatever call we add here so as to ensure that it will only ever be a 
>> ucontext.
> 
> Something there is broken in -current, see PR 56471 (freshly compiled macppc
> ntpd tries to use compat_16___sigreturn14).

Yes, and from your description it should have failed to register a “sigcontext” 
handler in the first place.

I’m looking into it.

-- thorpej

Re: Request for implementation of KERN_PROC_SIGTRAMP sysctl

2021-10-28 Thread Jason Thorpe




> On Oct 28, 2021, at 8:49 AM, Valery Ushakov  wrote:
> 
> On Wed, Oct 27, 2021 at 20:59:12 -0700, Jason Thorpe wrote:
> 
>>> On Oct 27, 2021, at 4:01 PM, Jason Thorpe  wrote:
>>> 
>>> 
>>>> On Oct 27, 2021, at 3:44 PM, Valery Ushakov  wrote:
>>>> 
>>>> On Wed, Oct 27, 2021 at 07:50:55 -0700, Jason Thorpe wrote:
>>>> 
>>>> I was wondering if it might be easier to not put the onus onto the
>>>> caller and instead have a function that returns the interrupted
>>>> ucontext (or NULL, if the pc is not in a trampoline).
>>>> 
>>>> ucontext_t *__unwind_sigtramp(return_pc, return_sp)
>>> 
>>> That would certainly be a nicer API.
>> 
>> Thought about it a little more.
>> 
>> To make this really work, we'd definitely have to version
>> sigaction() so that it fully de-supported sigcontext handlers.
>> Otherwise, it's a toss-up whether you have a sigcontext or a
>> ucontext on the stack.
> 
> It is ucontext for the siginfo trampoline and sigcontext for the older
> one, isn't it?

Sure, unless you're a platform that has never supported sigcontext, if you have 
COMPAT_16 disabled, or, ...

We really just need to kill the sigcontext stuff completely.  More to the 
point, we should version sigaction() before (or concurrently with) adding 
whatever call we add here so as to ensure that it will only ever be a ucontext.

-- thorpej

Re: Request for implementation of KERN_PROC_SIGTRAMP sysctl

2021-10-27 Thread Jason Thorpe

> On Oct 27, 2021, at 4:01 PM, Jason Thorpe  wrote:
> 
> 
>> On Oct 27, 2021, at 3:44 PM, Valery Ushakov  wrote:
>> 
>> On Wed, Oct 27, 2021 at 07:50:55 -0700, Jason Thorpe wrote:
>> 
>> I was wondering if it might be easier to not put the onus onto the
>> caller and instead have a function that returns the interrupted
>> ucontext (or NULL, if the pc is not in a trampoline).
>> 
>> ucontext_t *__unwind_sigtramp(return_pc, return_sp)
> 
> That would certainly be a nicer API.

Thought about it a little more.

To make this really work, we’d definitely have to version sigaction() so that 
it fully de-supported sigcontext handlers.  Otherwise, it’s a toss-up whether 
you have a sigcontext or a ucontext on the stack.  I do think a 
__sigtramp_unwind() would also be a good addition.  But I also see some value 
in the basic check (which you need to have to __sigtramp_unwind() anyway…).

-- thorpej

Re: Request for implementation of KERN_PROC_SIGTRAMP sysctl

2021-10-27 Thread Jason Thorpe



> On Oct 27, 2021, at 3:44 PM, Valery Ushakov  wrote:
> 
> On Wed, Oct 27, 2021 at 07:50:55 -0700, Jason Thorpe wrote:
> 
> I was wondering if it might be easier to not put the onus onto the
> caller and instead have a function that returns the interrupted
> ucontext (or NULL, if the pc is not in a trampoline).
> 
> ucontext_t *__unwind_sigtramp(return_pc, return_sp)

That would certainly be a nicer API.

-- thorpej

Re: Request for implementation of KERN_PROC_SIGTRAMP sysctl

2021-10-27 Thread Jason Thorpe

> On Oct 18, 2021, at 9:41 AM, John Marino (NetBSD)  wrote:
> 
> yes, it sounds like a __in_signal_trampoline function would work for
> the GCC unwind, and I would think it would work for GDB as well.

Ok, I have implemented a new function with this signature:

/*
 * __sigtramp_check_np --
 *
 *  Non-portable function that checks if the specified program
 *  counter value is within the signal return trampoline.  Returns
 *  the trampoline version numnber corresponding to what style of
 *  trampoline it matches, or -1 if the program value is not within
 *  the signal return trampoline.
 */
int __sigtramp_check_np(void *pc);

Usage would be like:

/*
 * If you need access to “struct sigcontext” to perform the unwind, you must
 * define _LIBC before including .
 */
#define _LIBC
#include 

.
.
.

int rv = __sigtramp_check_np((void *)context->return_address);

if (rv == -1) {
/* address was not within a signal return trampoline. */
}

if (rv == __SIGTRAMP_SIGCODE_VERSION) {
/*
 * ultra-legacy — this will never be returned for modern 
binaries.
 *  in fact, the current implementation of __sigtramp_check_np()
 * will never return it and there is no reason to add support 
for it.
 */
}

if (rv >= __SIGTRAMP_SIGCONTEXT_VERSION_MIN &&
rv <= __SIGTRAMP_SIGCONTEXT_VERSION_MAX) {
#ifdef __HAVE_STRUCT_SIGCONTEXT
/* do sigcontext stuff, distinguishing between versions, as 
needed */
#else
/*
 * no sigcontext structure is available here, so none of these 
values
 * should have been returned.
 */
#endif
}

if (rv >= __SIGTRAMP_SIGINFO_VERSION_MIN &&
rv <= __SIGTRAMP_SIGINFO_VERSION_MAX) {
/* do siginfo stuff, distinguishing between versions, as needed 
*/
}

/*
 * Address was within a signal return trampoline, but it’s not a version
 * that this program knows how to decode so *shrug*.
 */

For the most part, the MIN and MAX values will be the same (currently only the 
VAX has multiple versions of either kind of signal trampoline, and it’s for the 
“sigcontext” type).

Again, to be clear, this can query the current process ONLY.

Is this suitable for your needs?  Looking at your GCC unwinder for FreeBSD 
(link in your original email), it seems it would work fine (that unwinder uses 
getpid() to pass to the sysctl).

You would be able to test for the availability of __sigtramp_check_np() by 
testing for __SIGTRAMP_SIGCODE_VERSION being defined:

#ifdef __SIGTRAMP_SIGCODE_VERSION
/*
 * Code to check if we’re in a NetBSD signal trampoline.
 */
...
#endif /* __SIGTRAMP_SIGCODE_VERSION */

It’s not the prettiest API in the world, but you’re by definition diving into 
implementation details here, and the API does provide all the info needed to do 
the work.

Any other comments?  Christos?

…and while there was a flurry of activity lately around this area, the minimal 
changes to support __sigtramp_check_np() are easily isolated and could be back 
ported to the netbsd-9 branch without much trouble.

-- thorpej

Re: Request for implementation of KERN_PROC_SIGTRAMP sysctl

2021-10-18 Thread Jason Thorpe

> On Oct 18, 2021, at 9:41 AM, John Marino (NetBSD)  wrote:
> 
> yes, it sounds like a __in_signal_trampoline function would work for
> the GCC unwind, and I would think it would work for GDB as well.

Maybe I missed the GDB use case ... what is it that GDB needs to be able to do 
here?  This hypothetical function would only be able to query "self", not 
another process, so if GDB needs to be able to do the check for another process 
(the debug target, presumably), then it will need to use something else (can't 
it use the CFI notations to do the unwind?)

(Not all of the trampolines are correctly notated, but that's not a 
particularly tough problem to solve, either...)

-- thorpej

Re: Request for implementation of KERN_PROC_SIGTRAMP sysctl

2021-10-18 Thread Jason Thorpe




> On Oct 18, 2021, at 8:41 AM, John Marino (NetBSD)  wrote:
> 
> For GCC, we've got the return address (context->ra in the unwind
> programs in the original post).   The "problem" is that we want to
> know if that address falls on a signal trampoline frame.  If NO,
> return "end of stack", otherwise unwind the frame.

Seems like a non-portable libc function that could check among the various 
candidates that libc knows about would not be terribly difficult.

> A sysctl that returns an array of address pairs for all signal
> trampolines in the process is what I'm requesting.

Would a function like (hypothetically-named) "bool 
__in_signal_trampoline(uintptr_t addr);" provided by libc be workable for your 
use case?

> If there's another way to determine if an address falls within a
> signal trampoline, I'd like to see actual code to see if I can adapt
> it.
> Of course, the kernel team could just deny the request, but I won't be
> able to fix the regression caused when the per-signal trampolines were
> introduced.
> Thanks,
> John

-- thorpej

Re: Request for implementation of KERN_PROC_SIGTRAMP sysctl

2021-10-15 Thread Jason Thorpe



> On Oct 15, 2021, at 1:19 PM, Valery Ushakov  wrote:
> 
> On Fri, Oct 15, 2021 at 23:14:39 +0300, Valery Ushakov wrote:
> 
>> On Fri, Oct 15, 2021 at 14:44:16 -0500, John Marino (NetBSD) wrote:
>> 
>>> Is it possible for NetBSD to implement KERN_PROC_SIGTRAMP sysctl?
>> 
>> It's been ages since I touched this area, but don't we have
>> per-sigaction trampolines?  I mean, in practice they all use the same
>> __sigtramp_siginfo_$version trampoline, that sigaction passes to the
>> actual syscall, but in principle the process can have different
>> trampolines for different signals, can't it?
>> 
>> struct sys___sigaction_sigtramp_args {
>>  syscallarg(int) signum;
>>  syscallarg(const struct sigaction *) nsa;
>>  syscallarg(struct sigaction *) osa;
>>  syscallarg(const void *) tramp; // <-
>>  syscallarg(int) vers;
>> };
> 
> PS: We used to have a trampoline that the kernel copied out into the
> process address space (bottom of the stack, iirc) - and that would be
> something for KERN_PROC_SIGTRAMP to return indeed.  But that was like
> before netbsd 2.0, iirc.

Right, that was The Very Old Way.  There is an API contract between libc and 
the kernel vis a vis the signal trampoline, but the trampoline itself is in 
libc, which could get mapped anywhere.  So there is not a single static address 
that could be returned, even per-process, because, as uwe points out, the 
trampoline can be specified on a per-signal basis (there is one for "siginfo" 
signal delivery and another for the old-style "sigcontext" signal delivery).

-- thorpej

Re: Representing a rotary encoder input device

2021-09-22 Thread Jason Thorpe




> On Sep 22, 2021, at 5:23 PM, Chris Hanson  wrote:
> 
> On Sep 22, 2021, at 7:10 AM, Jason Thorpe  wrote:
> 
>> Well, ultimately, we translate “HID report” -> “ws* input event”.  Or are 
>> you suggesting that we should have a new interface to user-space that just 
>> sends HID reports?
> 
> That sounds like a good long-term plan, IMO, given the prevalence of HID in 
> the modern era and the ability to map virtually everything to it.

That wouldn't really bother me and seems like a really good idea, it's just 
work that I don't want to do :-)

(A set of helper functions to generate HID reports for devices that need to 
translate to HID from their native protocol would of course be super helpful.)

> I've though occasionally about putting together drivers for some of the 
> HP-HIL devices (single-dial-and-button rotary encoder, dial box with nine 
> rotary encoders, and button box with 32 buttons with LEDs) but haven't had 
> much clue how to wire them into the higher layers of the system to actually 
> use.

Some people have all the luck getting cool HP-HIL devices!

-- thorpej

Locking protocol for dev/scsipi

2021-09-22 Thread Jason Thorpe

Is the locking protocol for the SCSI subsystem documented anywhere?  
Specifically, the contract between the generic SCSI layer and controller 
drivers?

Thx.

-- thorpej

Re: Representing a rotary encoder input device

2021-09-22 Thread Jason Thorpe

> On Sep 22, 2021, at 8:43 AM, Greg Troxel  wrote:
> 
> 
> What do other systems do?  It strikes me what wsmouse feels like it is
> for things connected with the kbd/mouse/display world.  To be
> cantankerous, using it seems a little bit like representing a GPIO input
> as a 1-button mouse that doesn't move.

I haven't looked at what Linux does with rotary encoders yet, specifically.  
For GPIO buttons, I do know that those are translated into Linux key press 
events in their general input stream.  We do that for buttons, too... that's 
what the "gpiokeys" driver is for, and these are the Devicetree bindings for it:

https://github.com/devicetree-org/devicetree-source/blob/master/Bindings/input/gpio-keys.txt

> I would imagine that a rotary encoder is more likely to be a volume or
> level control, but perhaps not for the machine, perhaps just reported
> over MQTT so Home Assistant on some other machine can deal.

In my use case where the encoder is a volume control, it really is for the 
device it's attached to.

> If you are really talking about encoders hooked to gpio, then perhaps
> gpio should grow a facility to take N pins and say they are some kind of
> encoder and then have a gpio encoder abstraction.

Yes, that's what the Devicetree binding facilitates ... writing a driver that 
gathers up multiple GPIO inputs to present a higher-level device.

> But maybe you are trying to use an encoder to add scroll to a 3-button
> mouse?

No, that's not what I'm doing.

-- thorpej

Re: Representing a rotary encoder input device

2021-09-22 Thread Jason Thorpe

> On Sep 22, 2021, at 7:52 AM, Mouse  wrote:
> 
>> Well, ultimately, we translate â??HID reportâ?? -> â??ws* input eventâ??.  O$
> 
> Do you really want input devices to be inaccessible except through
> ws* interfaces?  Unless it's possible to get ws* events without needing
> the whole wscons infrastructure set up (eg, on a machine with serial
> console), that strikes me as a bad idea.  (Keyboards and mice also
> really should be accessible without all of wscons too, but that's less
> likely to be a practical issue.)

Well, I'm not particularly interested in having to decode each device type 
separately in user-space.  ws* presents an abstraction from which you can read 
the events.  There's nothing that says you have to have those things connected 
to a display.

-- thorpej

Re: Representing a rotary encoder input device

2021-09-22 Thread Jason Thorpe




> On Sep 21, 2021, at 10:47 PM, Michael van Elst  wrote:
> 
> thor...@me.com (Jason Thorpe) writes:
> 
>> Trying to think about the best way to represent such a device, I guess =
>> within wscons (they almost seem sort of like a 1-axis mouse, but I could =
>> be convinced otherwise).
> 
> You can make it a HID, because that's what it is.
> 
> Currently we only expose USB hid devices and sys/dev/hid is not much
> more than a framework to support these. This would need to be extended
> to a real abstraction (and uhidev might then be just a subclass).

Well, ultimately, we translate “HID report” -> “ws* input event”.  Or are you 
suggesting that we should have a new interface to user-space that just sends 
HID reports?

-- thorpej

Representing a rotary encoder input device

2021-09-21 Thread Jason Thorpe

Hey folks...

As part of a long-running hardware project I have been playing with, I'm 
experimenting with using a rotary encoder for input.  For reference, here is 
the Devicetree binding for rotary encoders:


https://github.com/devicetree-org/devicetree-source/blob/master/Bindings/input/rotary-encoder.txt

Rotary encoders come in a bunch of different forms, of course... ranging from 
simple knobs with detents to devices similar to the original iPod click wheel.  
Some include built-in push buttons (that signal a separate GPIO from the two 
GPIOs used to signal the rotary control itself).

Trying to think about the best way to represent such a device, I guess within 
wscons (they almost seem sort of like a 1-axis mouse, but I could be convinced 
otherwise).

Personally I'm seeing my use cases teetering towards using relative events 
(value selection that from 0-9 that wraps around back to 0, and also volume 
control that simply saturates at 0 or 100).  But I want to make sure I cover 
the absolute cases, as well.

Thoughts?

-- thorpej

Re: eventfd(2) and timerfd(2) APIs

2021-09-19 Thread Jason Thorpe

> On Sep 18, 2021, at 4:59 PM, John Nemeth  wrote:
> 
> Nice.  timerfd(2) is Asterisk's preferred timing source.  This
> should improve our support for Asterisk.  As you might guess, I'm
> all for the addition of these.

Yes, I remember you mentioning that before, and I just double-checked to make 
sure that Asterisk isn’t using anything that isn’t supported by this 
implementation (Linux has some additional non-standard clock IDs that we don’t 
support).

-- thorpej

Re: eventfd(2) and timerfd(2) APIs

2021-09-19 Thread Jason Thorpe

> On Sep 18, 2021, at 3:54 PM, Jason Thorpe  wrote:
> 
>> 
>> | else plumbs it through either, but I'll go ahead and do so.
>> 
>> Please do.   What other things don't permit fcntl() to work?   We
>> should fix any of those.
> 
> Well, I’ll fix these 2 anyway.

…and I just double-checked this as well… F_GETFL and F_SETFL don’t need any 
specific hooks in the implementation; the flags are all handled in common code.

-- thorpej

Re: eventfd(2) and timerfd(2) APIs

2021-09-19 Thread Jason Thorpe




> On Sep 18, 2021, at 6:26 PM, Jason Thorpe  wrote:
> 
>> On Sep 18, 2021, at 5:51 PM, Robert Elz  wrote:
>> 
>> | You'll get EBADF if you try it on anything else
>> 
>> and that's definitely wrong, on a pipe the system is allowed to return 
>> EINVAL,
>> and that's what ought to be returned whenever the fd is not something that
>> supports chmod (etc).   EBADF should only be used when the process doesn't
>> have the fd open.
> 
> Oops, I may be wrong about that specific detail.  I’ll double-check later.

Indeed, I was mistaken, you do get EINVAL in the “good fd that’s not a vnode” 
case.

-- thorpej

Re: eventfd(2) and timerfd(2) APIs

2021-09-18 Thread Jason Thorpe




> On Sep 18, 2021, at 5:51 PM, Robert Elz  wrote:
> 
>  | You'll get EBADF if you try it on anything else
> 
> and that's definitely wrong, on a pipe the system is allowed to return EINVAL,
> and that's what ought to be returned whenever the fd is not something that
> supports chmod (etc).   EBADF should only be used when the process doesn't
> have the fd open.

Oops, I may be wrong about that specific detail.  I’ll double-check later.

-- thorpej

Re: eventfd(2) and timerfd(2) APIs

2021-09-18 Thread Jason Thorpe



> On Sep 18, 2021, at 3:35 PM, Robert Elz  wrote:
> 
>Date:Sat, 18 Sep 2021 13:21:27 -0700
>From:    Jason Thorpe 
>Message-ID:  <5e7b8a22-14c2-4dce-ace2-31552f412...@me.com>
> 
> 
>  | >  unless the
>  | >  .Nm
>  | >  object was created with
>  | >  .Dv TFD_NONBLOCK .
>  |
>  | I'm using those names, because those are the names used in the Linux API.
> 
> It wasn't the names I was concerned about.
> 
>  | If you look at the code (it's on the thorpej-futex branch),
>  | you will see that they are aliases for O_NONBLOCK and O_CLOEXEC.
> 
> That was kind of obvious anyway from the man page:
> 
>  The following flags define the behavior of the resulting object:
>  .Bl -tag -width "EFD_SEMAPHORE"
>  .It Dv EFD_CLOEXEC
>  Sets the
>  .Dv O_CLOEXEC
>  flag; see
>  .Xr open 2
>  for more information.
> 
> So:
>  | I will clarify this in the man page.
> 
> probably isn't really necessary.   I was more concerned with the
> "unless the object was created with" - implying that if those flags
> are changed later, that would be irrelevant, as it is the state at
> create time that matters.   That would be unfortunate indeed, but:

I’ve changed the man pages to state “set for non-blocking I/O”.

>  | Actually, I didn't plumb fcntl through because just about nothing
> 
> might explain part of that (though you can't avoid the ability to
> alter O_CLOEXEC that way, as that's a much higher level operation).

>  | else plumbs it through either, but I'll go ahead and do so.
> 
> Please do.   What other things don't permit fcntl() to work?   We
> should fix any of those.

Well, I’ll fix these 2 anyway.

>  | The behavior of timerfd with respect to read is documented in my man page:
> 
> Yes, I saw that.
> 
>  | Writes to a timerfd return an error.  I will clarify this in the man page.
> 
> That would be useful.   You might want to also indicate how these
> descriptors are destroyed (I assume just close(2) but who knows).

Yes, they’re file descriptors, so close(2) gets rid of them.  Does this really 
need to be stated explicitly?

>  | > Finally, what does fstat() return about these fds?
> 
> The one I should have asked about, but forgot, was (st_mode & _S_FMT)
> Ie: what kind of object are these things pretending to be?

static int 
timerfd_fop_stat(file_t * const fp, struct stat * const st)
{
struct timerfd * const tfd = fp->f_timerfd;
 
memset(st, 0, sizeof(*st)); 
 
itimer_lock();
st->st_size = (off_t)timerfd_fire_count(tfd);
st->st_atimespec = tfd->tfd_atime;
st->st_mtimespec = tfd->tfd_mtime;
itimer_unlock();

st->st_blksize = sizeof(uint64_t);
st->st_mode = S_IFIFO | S_IRUSR | S_IWUSR;
st->st_blocks = 1;
st->st_birthtimespec = st->st_ctimespec = tfd->tfd_btime;
st->st_uid = kauth_cred_geteuid(fp->f_cred);
st->st_gid = kauth_cred_getegid(fp->f_cred);
 
return 0;
}

eventfd is similar.

> Since they're fd's, they can be inherited, open, by other processes
> (and since the man page hints at it, probably sent through a AF_UNIX
> socket), but particularly in the former case, the receiving process
> needs to know (or at least be able to find out) what it is that is on
> this fd it has received.
> 
>  | Of course, we don't document what these are for other kinds of descriptors,
> 
> for many there's no need, as everything is exactly what stat(2) claims
> it will be.   For any where that is not true, or is insufficient, we
> should be documenting it.

There are, of course, not enough _S_FMT bits to describe the possible 
combinations.

> If this was just a linux compat hack, so linux binaries could run,
> then most of this wouldn't matter - the application would do whatever
> linux allows it to do, and nothing actually built on NetBSD would
> ever care.
> 
> But if these are to be full NetBSD interfaces, they need to be
> both complete (and sane) and properly documented.   That means
> which of the f*() interfaces (fstat, fchmod, fchown, ...) work,

Actually, fchmod(), fchown(), etc. only work on DTYPE_VNODE descriptors.  
You’ll get EBADF if you try it on anything else (look for any place that calls 
fd_getvnode()).

-- thorpej

Re: eventfd(2) and timerfd(2) APIs

2021-09-18 Thread Jason Thorpe

> On Sep 18, 2021, at 12:17 PM, Robert Elz  wrote:
> 
>Date:Sat, 18 Sep 2021 10:26:29 -0700
>From:    Jason Thorpe 
>Message-ID:  <986563ad-88c2-41b9-bf69-51b26240b...@me.com>
> 
> 
>  |https://www.netbsd.org/~thorpej/timerfd.2
> 
> This one contains duplicated text...
> 
>  Because they are associated with a file descriptor, they may be passed
>  to other processes, inherited across a fork, and multiplexed using
>  .Xr kevent ,
>  .Xr poll ,
>  or
>  .Xr select  they are associated with a file descriptor, they may be passed
>  to other processes, inherited across a fork, and multiplexed using
>  .Xr kevent 2 ,
>  .Xr poll 2 ,
>  or
>  .Xr select 2 .
> 
> That should be fixed before anything is committed.

Thanks, fixed.

> Apart from that both man pages contain text like
> 
>  unless the
>  .Nm
>  object was created with
>  .Dv TFD_NONBLOCK .

I’m using those names, because those are the names used in the Linux API.  If 
you look at the code (it’s on the thorpej-futex branch), you will see that they 
are aliases for O_NONBLOCK and O_CLOEXEC.  I will clarify this in the man page.

> Since these things are working with file descriptors, I assume that
> fcntl(2) can be used to manipulate flags like O_NONBLOCK and O_CLOEXEC
> in which case I would guess (and hope) that the state of those flags when the
> object was created isn't what is releant, but the state of the flags at
> the time of the operation concerned.

Actually, I didn’t plumb fcntl through because just about nothing else plumbs 
it through either, but I’ll go ahead and do so.

> The man pages should probably be reworded with that in mind.
> 
> The exact relationships of the {event,timer}fd_*() functions
> and read()/write() is also not clear to me - are those just wrappers
> around read/write or are they distinct sys calls of their own?

In the case of eventfd_read() and eventfd_write(), those are in fact just 
wrappers around read() and write(), they’re implemented in libc, and they’re 
provided only because Linux also provides them and I was aiming for API 
compatibility.

In the case of timerfd, Linux does not provide a timerfd_read() wrapper, so I 
also did not.  timerfd_gettime() and timerfd_settime() are not wrappers around 
anything.  They are themselves system calls, just as they are on Linux.

> I initially assumed the former, but then I see that timerfd_settimer()
> has an extra flags arg, which write() (I presume) has no easy way to
> pass in, so now I am not sure.
> 
> If these are distinct operations how to actual read()/write() interact?

The behavior of timerfd with respect to read is documented in my man page:

 Each time a timerfd timer expires, an internal counter is incremented.
 Reads return the value of this counter as an unsigned 64-bit integer and
 reset the counter to 0.  If the value of the counter is 0, then reads
 will block, unless the timerfd object was created with TFD_NONBLOCK.

Writes to a timerfd return an error.  I will clarify this in the man page.

> Finally, what does fstat() return about these fds?   What is the dev_t ?
> What is the inode number, is the link count meaningfil, how about the
> uid and permissions?And what affects the time fields?

For timerfd:

struct timespec tfd_btime;  /* time created */
struct timespec tfd_mtime;  /* last timerfd_settime() */
struct timespec tfd_atime;  /* last read */

For eventfd:

struct timespec efd_btime;  /* time created */
struct timespec efd_mtime;  /* last write */
struct timespec efd_atime;  /* last read */

Of course, we don’t document what these are for other kinds of descriptors, so 
I didn’t spend a lot of time documenting it for these.  It certainly might be a 
nice idea to fully document the stat info for every descriptor type in the 
system, but I don’t think the lack of that information (for which there is no 
standardized format, it seems, since no other descriptor types seem to document 
it) should be considered a blocker for adding these calls.

-- thorpej

Re: eventfd(2) and timerfd(2) APIs

2021-09-18 Thread Jason Thorpe

(Yes, this is a re-run … but I’m ready to merge it if there are no objections, 
so…)

> On Sep 18, 2021, at 10:26 AM, Jason Thorpe  wrote:
> 
> Last year, I wrote implementations of the Linux eventfd(2) and timerfd(2) 
> interfaces for NetBSD, with the goal of improving our Linux emulation.  In 
> order to be able to test them with ATF tests, I went ahead and made them 
> native calls as well.
> 
> Here are the man pages describing the interfaces:
> 
>   https://www.netbsd.org/~thorpej/eventfd.2
>   https://www.netbsd.org/~thorpej/timerfd.2
> 
> Any objections to adding these?
> 
> -- thorpej
> 

-- thorpej

eventfd(2) and timerfd(2) APIs

2021-09-18 Thread Jason Thorpe

Last year, I wrote implementations of the Linux eventfd(2) and timerfd(2) 
interfaces for NetBSD, with the goal of improving our Linux emulation.  In 
order to be able to test them with ATF tests, I went ahead and made them native 
calls as well.

Here are the man pages describing the interfaces:

https://www.netbsd.org/~thorpej/eventfd.2
https://www.netbsd.org/~thorpej/timerfd.2

Any objections to adding these?

-- thorpej

Re: General device properties API

2021-09-14 Thread Jason Thorpe

> On Sep 12, 2021, at 9:14 AM, Taylor R Campbell 
>  wrote:
> 
>> Date: Sun, 12 Sep 2021 08:57:07 -0700
>> From: Jason Thorpe 
>> 
>>> On Sep 12, 2021, at 8:17 AM, Jason Thorpe  wrote:
>>> 
>>> Doing this with symbols is a mess.
>> 
>> Here's a way to basically get most of what you want without
>> referencing symbols:
> 
> Now the linker doesn't detect namespace collisions.  So if two
> different subsystems in different modules take the same name you might
> silently get runtime memory corruption.

I think we’re at a point here where we disagree on whether or not this will be 
an actual problem.  I don’t believe it will be an actual problem and is 
therefore not worth the extra module management complexity (which isn’t free 
from a “space” perspective because each module has to have code to handle 
loading and unloading).  We’re in engineering trade-off land here, and I think 
you’re making the wrong one.

In any case, here is a diff that adds the argument structure type checking.  I 
also added an awk program to generate the argument structures and all of the 
call binding stuff automatically from an interface description file.  So, if 
someone wants to deal with the additional module management complexity, they 
can change gendevcalls.awk, re-gen the header files, and all of the call sites 
will be updated automatically.

https://www.netbsd.org/~thorpej/device-call-typing-diffs.txt

-- thorpej

Re: Overhaul of I2C and SPI device autoconfiguration

2021-09-12 Thread Jason Thorpe

> On Sep 12, 2021, at 6:16 PM, David Holland  wrote:
> 
> On Sat, Sep 11, 2021 at 03:43:24PM -0700, Jason Thorpe wrote:
>> The basic idea involves the device call mechanism I introduced a while ago.
> 
> You still haven't explained why that needs to be untyped and based on
> uncheckable string constants.

I proposed something for type checking.  However, there is an engineering 
trade-off disagreement about the requirement to reference symbols.

In any case, I consider that to be an implementation detail that is unrelated 
to the meat of this discussion.

-- thorpej

Re: Overhaul of I2C and SPI device autoconfiguration

2021-09-12 Thread Jason Thorpe

> On Sep 12, 2021, at 2:49 PM, Tobias Nygren  wrote:
> 
> On Sun, 12 Sep 2021 12:30:14 -0700
> Jason Thorpe  wrote:
> 
>> Ok, sparc64 should be good do go now.  I made a bunch of fake fixup entries 
>> for the Qemu machine and, after a couple of additional fixes, verified that 
>> the machinery is working as expected.
> 
> Gets further but hits another problem.
> bus mutex uninitialized or points at wrong data?
> 
> iic0 at pcfiic0: I2C bus
> cpu0: data fault: pc=105f440 rpc=143e594 addr=1afc000
> kernel trap 6c: +fast data access protection
> Stopped in pid 0.0 (system) at  netbsd:_lock_cas:   casxa   0x80, 
> %o1, %o2
> db{0}> bt
> iic_acquire_bus(1afd088, 8, 2003dac, 1af0400, 1afd098, 10) at 
> netbsd:iic_acquire_bus+0x94
> spdmem_i2c_match(1075dfa80, 1c66f00, 2004710, 1aecc00, 73, 1c60b98) at 
> netbsd:spdmem_i2c_match+0xa4
> config_match(1075dfa80, 1c66f00, 2004710, 2004710, 20, 0) at 
> netbsd:config_match+0x38
> [rest of trace omitted]

Ok, I'm a little confused by this one.  pcfiic_attach() allocates a single 
channel structure if the ebus front-end did not already do so.  And then for 
either case, it then enumerates the channels and calls 
iic_tag_init(>ch_i2c) for the tag on each one, which initializes the bus 
mutex.

Hm, but I spotted another bug ... I didn't update pcfiic_i2c_exec() for the 
channel split.  That could be clobbering something.  I just pushed an update to 
dev/ic/pcf8584.c -- can you give it a spin?

-- thorpej

Re: Overhaul of I2C and SPI device autoconfiguration

2021-09-12 Thread Jason Thorpe



> On Sep 12, 2021, at 10:26 AM, Jason Thorpe  wrote:
> 
> 
> 
>> On Sep 12, 2021, at 7:37 AM, Tobias Nygren  wrote:
>> 
>> iic3 at rkiic3: I2C bus
>> typec-portc (fcs,fusb302) at iic3 addr 0x22 not configured
>> panic: kernel diagnostic assertion "rv" failed: file 
>> "/usr/src/sys/dev/i2c/i2c.c", line 630
>> cpu0: Begin traceback...
>> trace fp c0f784e0
>> fp c0f78510 vpanic() at c053bd5c netbsd:vpanic+0x14c
>> fp c0f78570 kern_assert() at c079f928 netbsd:kern_assert+0x58
>> fp c0f78600 iic_enumerate_devices_callback() at c03448f4 
>> netbsd:iic_enumerate_devices_callback+0xd4
>> fp c0f78670 of_i2c_enumerate_devices() at c06483cc 
>> netbsd:of_i2c_enumerate_devices+0x10c
>> fp c0f78730 device_call() at c0523e60 netbsd:device_call+0x90
> 
> Huh, ok, I’ll take a look.  I’ll craft a scenario where my DT has a “not 
> configured” device to reproduce the problem.
> 
> Thanks for giving it a spin.  I’ll follow up shortly.

Ok, I reproduced it and found the problem pretty quickly (an errant "return 
false;" that was a paste-o).


[   1.000] bsciic2 at simplebus1: Broadcom Serial Controller
[   1.000] bsciic2: interrupting on icu irq 53
[   1.000] iic2 at bsciic2: I2C bus
[   1.000] power-management (netbsd,uugear-wittypi3) at iic2 addr 0x69 not 
configured
[   1.000] /soc/vec@7e806000 at simplebus1 not configured
[   1.000] dwctwo0 at simplebus1: USB controller
[   1.000] dwctwo0: interrupting on icu irq 9


-- thorpej

Re: Overhaul of I2C and SPI device autoconfiguration

2021-09-12 Thread Jason Thorpe

> On Sep 12, 2021, at 11:32 AM, Jason Thorpe  wrote:
> 
> 
>> On Sep 12, 2021, at 10:27 AM, Jason Thorpe  wrote:
>> 
>> Huh.  Ok, this didn’t happen in qemu, but I’ll take a look today.
> 
> Oh, yes it does.  This is almost certainly due to a small tweak I made 
> locally after I tested it the first time.  I am a dum-dum.  Fixing now.

Ok, sparc64 should be good do go now.  I made a bunch of fake fixup entries for 
the Qemu machine and, after a couple of additional fixes, verified that the 
machinery is working as expected.

-- thorpej

Re: Overhaul of I2C and SPI device autoconfiguration

2021-09-12 Thread Jason Thorpe



> On Sep 12, 2021, at 10:27 AM, Jason Thorpe  wrote:
> 
> Huh.  Ok, this didn’t happen in qemu, but I’ll take a look today.

Oh, yes it does.  This is almost certainly due to a small tweak I made locally 
after I tested it the first time.  I am a dum-dum.  Fixing now.

-- thorpej

Re: Overhaul of I2C and SPI device autoconfiguration

2021-09-12 Thread Jason Thorpe




> On Sep 12, 2021, at 5:31 AM, Tobias Nygren  wrote:
> 
> sparc64 kernel built from thorpej-i2c-spi-conf2 crashed early:
> 
> r...@netra240.rymdfartsverket.se:/usr/obj/sys/arch/sparc64/compile/GENERIC.netra240-debug
> total memory = 7168 MB
> avail memory = 7023 MB
> mainbus0 (root)cpu0: data fault: pc=17b7754 rpc=165c3ec addr=0
> kernel trap 30: data access exception
> Stopped in pid 0.0 (system) at  netbsd:strcmp+0x14: ldub[%o0 
> + %g1], %g3
> db{0}> bt
> device_compatible_lookup(1c7df08, 1, 17deb08, 0, 1060c43cc, 17deb08) at 
> netbsd:device_compatible_lookup+0x4c
> sparc64_device_tree_fixup(1060c4380, 0, 2005b08, 1ae7800, 1c82400, 1c82400) 
> at netbsd:sparc64_device_tree_fixup+0x8c

Huh.  Ok, this didn’t happen in qemu, but I’ll take a look today.

-- thorpej

Re: Overhaul of I2C and SPI device autoconfiguration

2021-09-12 Thread Jason Thorpe




> On Sep 12, 2021, at 7:37 AM, Tobias Nygren  wrote:
> 
> iic3 at rkiic3: I2C bus
> typec-portc (fcs,fusb302) at iic3 addr 0x22 not configured
> panic: kernel diagnostic assertion "rv" failed: file 
> "/usr/src/sys/dev/i2c/i2c.c", line 630
> cpu0: Begin traceback...
> trace fp c0f784e0
> fp c0f78510 vpanic() at c053bd5c netbsd:vpanic+0x14c
> fp c0f78570 kern_assert() at c079f928 netbsd:kern_assert+0x58
> fp c0f78600 iic_enumerate_devices_callback() at c03448f4 
> netbsd:iic_enumerate_devices_callback+0xd4
> fp c0f78670 of_i2c_enumerate_devices() at c06483cc 
> netbsd:of_i2c_enumerate_devices+0x10c
> fp c0f78730 device_call() at c0523e60 netbsd:device_call+0x90

Huh, ok, I’ll take a look.  I’ll craft a scenario where my DT has a “not 
configured” device to reproduce the problem.

Thanks for giving it a spin.  I’ll follow up shortly.

-- thorpej

Re: General device properties API

2021-09-12 Thread Jason Thorpe



> On Sep 12, 2021, at 8:17 AM, Jason Thorpe  wrote:
> 
> Doing this with symbols is a mess.

Here’s a way to basically get most of what you want without referencing symbols:

struct device_call_generic {
const char *name; 
void *args; 
};
 
int device_call_generic(device_t, const struct device_call_generic *); 
 
struct i2c_enumerate_devices_args {
...
};

union i2c_enumerate_devices_binding { 
struct device_call_generic generic;
struct {
const char *name;
struct i2c_enumerate_devices_args *args;
} binding;
};

#define I2C_ENUMERATE_DEVICES(_args_)   \
&((const union i2c_enumerate_devices_binding){  \
.binding.name = "i2c-enumerate-devices",\
.binding.args = (_args_),   \
})  

#define device_call(dev, call)  \
device_call_generic((dev), (call)->generic) 
 
 

struct i2c_attach_args ia = {
.ia_tag = ic,
};
struct i2c_enumerate_devices_args enumargs = {
.ia = ,
.callback = iic_enumerate_devices_callback,
};
 
rv = device_call(dev, I2C_ENUMERATE_DEVICES());

-- thorpej

1 2 3 4 5 >

1 - 100 of 407 matches

Mail list logo