Re: hang in vcache_vget()

2024-06-06 Thread Greg Troxel
Emmanuel Dreyfus  writes:

> Hello
>
> I experienced a system freeze on NetBSD-10.0/i386. Many processes 
> waiting on tstile, and one waiting on vnode, with this backtrace:
> sleepq_block
> cv_wait
> vcache_vget
> vcache_get
> ufs_lookup
> VOP_LOOKUP
> lookup_once
> namei_tryemulroot.constprop.0
> namei
> vn_open
> do_open
> do_sys_openat
>
> I regret I did not take the time to show the vnode. 
>
> Is it worth a PR? I have no clue if it can be reproduced.

I would say yes it's worth it.

I have had hangs on 10/amd64, on a system with 32G of ram.  I have been
blaming zfs, but my "never hangs" experience has been on 9/ufs.  But,
others say zfs is fine.

I just came across the "threads leak memory" problem pointed
out by Brian Marcotte, and found a 17G gpg-agent.  I now wonder if
whatever is hanging is being provoked by running out of memory.  Still a
bug, but I no longer feel my "new problem" can be pointed at zfs.

Do you think your system had high memory pressure at the time of your
crash?

(Sort of off topic, you should know that because 32-bit computers are no
longer manufactured, the rust project thinks you shouldn't be using
them.  Take it to the ewaste center right away!)


Re: poll(): IN/OUT vs {RD,WR}NORM

2024-05-27 Thread Greg Troxel
Johnny Billquist  writes:

>  POLLPRI    High priority data may be read without blocking.
>
>  POLLRDBAND Priority data may be read without blocking.
>
>  POLLRDNORM Normal data may be read without blocking.

Is this related to the "oob data" scheme in TCP (which is a hack that
doesn't work)?   Where do we attach 3 priority levels to data?


Re: Forcing a USB device to "ugen"

2024-03-25 Thread Greg Troxel
Jason Thorpe  writes:

> I should be able to do this with OpenOCD (pkgsrc/devel/openocd), but
> libftdi1 fails to find the device because libusb1 only deals in
> "ugen".

Is that fundamental, in that ugen has ioctls that are ugen-ish that
uftdi does not?   I am guessing you thought about fixing libusb1.

> The desire to use "ugen" on "interface 1" is not a property of
> 0x0403,0x6010, it's really a property of
> "SecuringHardware.com","Tigard V1.1".  Unfortunately, there isn't a
> way to express that in the kernel config syntax.
>
> I think my only short-term option here is to, in uftdi_match(), specifically 
> reject based on these criteria:
>
>   - VID == 0x0403
>   - PID == 0x6010
>   - interface number == 1
>   - vendor string == "SecuringHardware.com"
>   - product string == "Tigard V1.1"
>
> (It's never useful, on this particular board, to use the second port as a 
> UART.)

That seems reasonable to me.  It seems pretty unlikely to break other
things.
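Something like the following sketch is presumably what that match-time rejection would look like; the struct and field names here are invented stand-ins for the real usbif_attach_arg plumbing, kept purely for illustration:

```c
#include <string.h>

/* Invented stand-in for the attach arguments uftdi_match() really sees. */
struct match_info {
	unsigned vid, pid;
	int ifaceno;
	const char *vendor_str, *product_str;
};

/*
 * Return nonzero if uftdi should decline this interface so that ugen can
 * claim it: Tigard's second port is never useful as a UART.
 */
static int
tigard_reject(const struct match_info *mi)
{
	return mi->vid == 0x0403 && mi->pid == 0x6010 &&
	    mi->ifaceno == 1 &&
	    mi->vendor_str != NULL &&
	    strcmp(mi->vendor_str, "SecuringHardware.com") == 0 &&
	    mi->product_str != NULL &&
	    strcmp(mi->product_str, "Tigard V1.1") == 0;
}
```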


Re: Polymorphic devices

2024-01-06 Thread Greg Troxel
Brad Spencer  writes:

> I don't know just yet, but there might be an unwanted device reset with the
> "use the one you open" technique.  That is, you might have to reset the chip
> to change modes, and if you support, say, I2C and GPIO at the same time
> (which is possible), but then change to just GPIO, the chip has to be
> reset, and that will disrupt any settings you might have set (I think; I
> am still working out what needs to happen with the mode switches).
> This may not matter in the bigger picture and it wouldn't matter as much
> if the mode switch was a sysctl, which one can say will reset the chip
> anyway.

Interesting complexity, but I'd say state the user has asked for should
live in the driver, and if it has to be written again on a mode switch, so
be it.  Generally, if you open a device and close it, you don't have much
grounds to expect things you did to persist to the next session, but
devices have device-specific semantics anyway.


Re: Polymorphic devices

2024-01-05 Thread Greg Troxel
Brad Spencer  writes:

> The first is enhancements to uftdi to support the MPSSE engine that some
> of the FTDI chip variants have.  This engine allows the chip to be an I2C
> bus, SPI bus and provides some simple GPIO, and a bunch of other stuff,
> as well as the normal USB UART.  It is not possible to use all of the
> modes at the same time.  That is, these are not separate devices, but
> modes within one device.  Or another way, depending on the mode of the
> chip you get different child devices attached to it.  I am curious on
> what the thoughts are on how this might be modeled.

My reaction without much thought is to attach them all and to have the
non-selected one return ENXIO or similar.  And to have another device on
which you call the ioctl to choose which device to enable.

Or perhaps, to let you open any of them, flipping the mode, and to fail
the 2nd simultaneous open.
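As a toy userland model of that second option (all names invented; no relation to the real driver structures):

```c
#include <errno.h>

/*
 * Toy model: one MPSSE-style chip with mutually exclusive modes.  Opening
 * any mode node switches the chip to that mode; a second simultaneous open
 * of a *different* mode fails with EBUSY.
 */
enum mode { MODE_UART, MODE_I2C, MODE_SPI, MODE_GPIO };

struct chip {
	enum mode cur;
	int opens;		/* concurrent opens of the current mode */
};

static int
mode_open(struct chip *c, enum mode m)
{
	if (c->opens > 0 && c->cur != m)
		return EBUSY;	/* someone is using another mode */
	c->cur = m;		/* in a real driver this may imply a reset */
	c->opens++;
	return 0;
}

static void
mode_close(struct chip *c)
{
	if (c->opens > 0)
		c->opens--;
}
```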


Re: Maxphys on -current?

2023-08-03 Thread Greg Troxel
Brian Buhrow  writes:

>   hello.  I know that this has been a very long term project, but I'm
> wondering about the status of this effort?  I note that FreeBSD-13 has a
> MAXPHYS value of 1048576 bytes.  Have we found other ways to get more
> throughput from ATA disks that obviate the need for this setting which
> I'm not aware of?
> If not, is anyone working on this project?  The wiki page says the
> project is stalled.

I haven't heard that anyone is.

When you run dd with bs=64k and then bs=1m, how different are the
results?  (I believe raw requests happen accordingly, vs MAXPHYS for fs
etc. access.)
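A sketch of that measurement; the scratch file only makes the commands runnable anywhere, and you would substitute your own raw disk to see the per-request behavior for real:

```shell
# Scratch file stands in for the disk so this runs anywhere; substitute a
# raw device (e.g. /dev/rwd0d) for IMG to see per-request sizes for real.
# Block sizes are spelled out in bytes (64k / 1m) since dd suffix syntax
# differs between NetBSD and GNU dd.
IMG=/tmp/maxphys-demo.img
dd if=/dev/zero of="$IMG" bs=65536 count=256 2>/dev/null   # 16 MB scratch
dd if="$IMG" of=/dev/null bs=65536      # 64k requests; dd reports the rate
dd if="$IMG" of=/dev/null bs=1048576    # 1m requests
rm -f "$IMG"
```

If the two rates are close on the raw device, a larger MAXPHYS is unlikely to buy much on that hardware.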


Re: RFC: Native epoll syscalls

2023-06-22 Thread Greg Troxel
Mouse  writes:

>> It is definitely a real problem that people write linuxy code that
>> seems unaware of POSIX and portability.
>
> While I feel a bit uncomfortable appearing to defend the practice (and,
> to be sure, it definitely can be a problem) - but, it's also one of the
> ways advancements happen: add an extension, use it, it turns out to be
> useful, it gets popular
>
> I've done it myself (well, except for the "gets popular" part, which no
> one person can do alone): labeled control structure, AF_TIMER sockets,
> pidconn, validusershell, the list goes on.

Sure, but this is "there are several extensions, and write code that
only uses the local one, even though it could have been written to use
any".  And perhaps "there are mechanisms which could have been adopted,
but instead make up a third".

And I really meant "seems unaware", not "made a deliberate decision,
evidenced by written design" :-)



Re: RFC: Native epoll syscalls

2023-06-22 Thread Greg Troxel
Martin Husemann  writes:

> On Wed, Jun 21, 2023 at 01:50:47PM -0400, Theodore Preduta wrote:
>> There are two main benefits to adding native epoll syscalls:
>> 
>> 1. They can be used to help port Linux software to NetBSD.
>
> Well, syscall numbers are cheap and plenty...
>
> The real question is: is it a useful and consistent API to have?
> At first sight it looks like a mix of kqueue and poll, and seems to be
> quite complex.

It is definitely a real problem that people write linuxy code that seems
unaware of POSIX and portability.  If we had native epoll, then that
code could be built and used.  That of course doesn't fix the
portability issues, but it avoids them.

It seems to me that if we have epoll emulation, it should not be that
hard to also have it native, and I think the benefit in being able to
run (natively) programs written unportably is significant.
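For reference, this is the shape of minimal epoll usage that such native support would have to cover (Linux-flavored C with invented wrapper name; the API shape in question, not a proposed NetBSD implementation):

```c
#include <sys/epoll.h>
#include <unistd.h>

/*
 * The canonical pattern unportable-but-common Linux code relies on:
 * register a descriptor, wait for readability.
 * Returns 1 if readable, 0 on timeout, -1 on error.
 */
int
wait_readable(int fd, int timeout_ms)
{
	struct epoll_event ev, out;
	int ep, n;

	if ((ep = epoll_create1(0)) < 0)
		return -1;
	ev.events = EPOLLIN;
	ev.data.fd = fd;
	if (epoll_ctl(ep, EPOLL_CTL_ADD, fd, &ev) < 0) {
		close(ep);
		return -1;
	}
	n = epoll_wait(ep, &out, 1, timeout_ms);
	close(ep);
	return n;
}
```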


Re: malloc(9) vs kmem(9) interfaces

2023-06-01 Thread Greg Troxel
Taylor R Campbell  writes:

> Right, so the question is -- can we get the attribution _without_
> that?  Surely attribution itself is just a matter of some per-CPU
> counters.

Reading along, it strikes me there is a huge point implicit in your
last sentence.

I first thought of attribution as being able to tell what a particular
allocated object is being used for.  That requires state per object.

However, you are talking about maintaining a count of objects by user.
That is vastly cheaper, and likely 90%+ as useful.

So there is "object attribution" and "total usage attribution".
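A sketch of the cheap variant, total usage attribution, with invented tag names; a kernel version would use per-CPU counters rather than this single shared array:

```c
#include <stddef.h>

/*
 * "Total usage attribution": instead of tagging every allocated object
 * (state per object), keep one running byte count per user/tag.
 * Tag names here are illustrative only.
 */
enum alloc_tag { TAG_VNODE, TAG_MBUF, TAG_OTHER, TAG_COUNT };

static size_t bytes_by_tag[TAG_COUNT];

static void
attr_alloc(enum alloc_tag tag, size_t len)
{
	bytes_by_tag[tag] += len;	/* per-CPU in a real kernel */
}

static void
attr_free(enum alloc_tag tag, size_t len)
{
	bytes_by_tag[tag] -= len;
}
```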



Re: LINEAR24 userland format in audio(4) - do we really want it?

2023-05-08 Thread Greg Troxel
nia  writes:

> Unfortunately file formats are standardized but the
> way the audio APIs are implemented varies. :/
>
>> It's now no longer broken to handle 24bit WAV files.
>
> This is true, but audioplay is hardly the only
> consumer of the API and could easily be made to communicate
> with the kernel using 32-bit samples.
>
> What is the behaviour of everything in pkgsrc when thrown
> 24bit WAV files?

I'm not following.  Are you saying

  we should remove support from the kernel API for 24-bit linear?

  lots of stuff in pkgsrc should be fixed so it works better?



Re: USB-related panic in 8.2_STABLE

2023-04-27 Thread Greg Troxel
Timo Buhrmester  writes:

> Apparently out of nothing, one of our servers paniced.
>
>
> uname -a gives:
>
> | NetBSD trave.math.uni-bonn.de 8.2_STABLE NetBSD 8.2_STABLE
> | (MI-Server) #17: Fri Jul 16 14:01:03 CEST 2021
> | supp...@trave.math.uni-bonn.de:/var/work/obj-8/sys/arch/amd64/compile/miserv
> | amd64

My impression is that there have been a lot of USB fixes since 8.

> I've transcribed the panic message and backtrace:
>
> | ohci0: 1 scheduling overruns
> | ugen0: detached
> | ugen0: at uhub4 port 2 (addr 2) disconnected
> | ugen0 at uhub4 port 2
> | ugen0: Phoenixtec Power (0x6da) USB Cable (V2.00) (0x02), rev 1.00/0.06, 
> addr 2
> | uvm_fault(0xfe82574c2458, 0x0, 1) -> e
> | fatal page fault in supervisor mode
> | trap type 6 code 0 rip 0x802f627e cs 0x8 rflags 0x10246 cr2 0x2 
> ilevel 6 (NB: could be ilevel 0 as well) rsp 0x80013f482c10
> | curlwp 0xfe83002b2000 pid 8393.1 lowest kstack 0x80013f4802c0
> | kernel: page fault trap, code=0
> | Stopped in pid 8393.1 (nutdrv_qx_usb) at   netbsd:ugen_get_cdesc+0xb1:
> | movzwl 2(%rax),%edx
> | db{2}> bt
> | ugen_get_cdesc() at netbsd:ugen_get_cdesc+0xb1
> | ugenioctl() at netbsd:ugenioctl+0x9a4
> | cdev_ioctl() at netbsd:cdev_ioctl+0xb4
> | VOP_IOCTL() at netbsd:VOP_IOCTL+0x54
> | vn_ioctl() at netbsd:vn_ioctl+0xa6
> | sys_ioctl() at netbsd:sys_ioctl+0x11a
> | syscall() at netbsd:syscall+0x1ec
> | --- syscall (number 54) ---
> | 7a73c9eff13a:
> | db{2}>
>
> Any idea what's going on?

It can always be hardware.  (Even if one can argue bad hardware should never
lead to panic.)  I'm not saying it is, or is likely, but keep that in
mind.

You didn't give timing.  If this immediately followed the disconnect,
it's perhaps a bug in ugen to do something after the device is gone.  It
may be that this bug has always been there and that normally the UPS
doesn't disconnect, or you hit a bad race.

Try updating to 9 or 10 :-)


Re: crash in timerfd building pandoc / ghc94 related

2023-02-06 Thread Greg Troxel
PHO  writes:

> On 2/6/23 5:27 PM, Nikita Ronja Gillmann wrote:
>> I encountered this on some version of 10.99.2 and last night again on
>> 10.99.2 from Friday morning.
>> This is an obvious blocker for me for making 9.4.4 the default.
>> I propose to either revert to the last version or make the default GHC
>> version setable.
>
> I wish I could do the latter, but unfortunately not all Haskell
> packages are buildable with 2 major versions of GHC at the same time
> (most are, but there are a few exceptions).
>
> Alternatively, I think I can patch GHC 9.4 so that it won't use
> timerfd. It appears to be an optional feature after all; if its
> ./configure doesn't find timerfd it won't use it. Let me try that.

If it's possible to only do this on NetBSD 10.99, that would be good.
It seems so far, from not really paying attention, that there is nothing
wrong with ghc but that there is a bug in the kernel.   It would also
be good to get a reproduction recipe without haskell.


Re: Enable to send packets on if_loop via bpf

2022-11-21 Thread Greg Troxel

Ryota Ozaki  writes:

> In the specification DLT_NULL assumes a protocol family in the host
> byte order followed by a payload.  Interfaces of DLT_NULL use
> bpf_mtap_af to pass an mbuf prepending a protocol family.  All interfaces
> follow the spec and work well.
>
> OTOH, bpf_write to interfaces of DLT_NULL is a bit of a sad situation.
> Data written to an interface of DLT_NULL is treated as raw data
> (I don't know why); the data is passed to the interface's output routine
> as is with dst (sa_family=AF_UNSPEC).  tun seems to be able
> to handle such raw data but the others can't handle the data (probably
> the data will be dropped, as in if_loop).

Summarizing and commenting to make sure I'm not confused

  on receive/read, DLT_NULL  prepends AF in host byte order
  on transmit/write, it just sends with AF_UNSPEC

  This seems broken as it is asymmetric, and is bad because it throws
  away information that is hard to reliably recreate.  On the other hand
  this is for link-layer formats, and it seems that some interfaces have
  an AF that is not really part of what is transmitted, even though
  really it is.  For example tun is using an IP proto byte to specify AF
  and really this is part of the link protocol.  Except we pretend it
  isn't.

> Correcting bpf_write to assume a prepending protocol family will
> save some interfaces like gif and gre but won't save others like stf
> and wg.  Even worse, the change may break existing users of tun
> that want to treat data as is (though I don't know if users exist).
>
> BTW, prepending a protocol family on tun is a different protocol from
> DLT_NULL of bpf.  tun has three protocol modes and doesn't always prepend
> a protocol family.  (And also the network byte order is used on tun
> as gert says while DLT_NULL assumes the host byte order.)

wow.

> So my fix will:
> - keep DLT_NULL of if_loop to not break bpf_mtap_af, and
> - leave DLT_NULL handling in bpf_write unchanged except for if_loop, so
> as not to bother existing users.
> The patch looks like this:
>
> @@ -447,6 +448,14 @@ bpf_movein(struct uio *uio, int linktype,
> uint64_t mtu, struct mbuf **mp,
> m0->m_len -= hlen;
> }
>
> +   if (linktype == DLT_NULL && ifp->if_type == IFT_LOOP) {
> +   uint32_t af;
> +   memcpy(&af, mtod(m0, void *), sizeof(af));
> +   sockp->sa_family = af;
> +   m0->m_data += sizeof(af);
> +   m0->m_len -= sizeof(af);
> +   }
> +
> *mp = m0;
> return (0);

That seems ok to me.


I think the long-term right fix is to define DLT_AF which has an AF word
in host order on receive and transmit always, and to modify interfaces
to use it whenever they are AF aware at all.   In this case tun would
fill in the AF word from the IP proto field, and you'd get a
transformed/regularized AF word when really the "link layer packet" had
the IP proto field.  But that's ok as it's just cleanup and reversible.
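In userland terms, a writer under such a scheme just prepends a 4-byte AF word; a sketch with my own names (host byte order, as DLT_NULL specifies; DLT_LOOP as on OpenBSD would store it in network order):

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

/*
 * Build what a bpf_write() to a DLT_NULL-style interface would consume:
 * a 4-byte address family in host byte order, then the payload.
 * Returns the frame length, or 0 if the buffer is too small.
 */
static size_t
dlt_null_frame(unsigned char *buf, size_t buflen,
    uint32_t af, const unsigned char *pkt, size_t pktlen)
{
	if (buflen < sizeof(af) + pktlen)
		return 0;
	memcpy(buf, &af, sizeof(af));		/* host byte order */
	memcpy(buf + sizeof(af), pkt, pktlen);
	return sizeof(af) + pktlen;
}
```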


signature.asc
Description: PGP signature


Re: Enable to send packets on if_loop via bpf

2022-11-09 Thread Greg Troxel

Ryota Ozaki  writes:

> NetBSD can't do this because a loopback interface
> registers itself to bpf as DLT_NULL and bpf treats
> packets being sent over the interface as AF_UNSPEC.
> Packets of AF_UNSPEC are just dropped by loopback
> interfaces.
>
> FreeBSD and OpenBSD enable this by letting users
> prepend a protocol family to the data being sent.  bpf (or if_loop)
> extracts it and handles the packet as an extracted protocol
> family.  The following patch follows them (the implementation
> is inspired by OpenBSD).
>
> http://www.netbsd.org/~ozaki-r/loop-bpf.patch
>
> The patch changes if_loop to register itself to bpf
> as DLT_LOOP and bpf to handle a prepending protocol
> family on bpf_write if a sender interface is DLT_LOOP.

I am surprised that there is not already a DLT_foo that already has this
concept, an AF word followed by data.  But I guess every interface
already has a more-specific format.

Looking at if_tun.c, I see DLT_NULL.  This should have the same ability
to write.   I have forgotten the details of how tun encodes AF when
transmitting, but I know you can have v4 or v6 inside, and tcpdump works
now, so obviously I must be missing something.

My suggestion is to look at the rest of the drivers that register
DLT_NULL and see if they are amenable to the same fix, and choose a new
DLT_SOMETHING that accommodates the broader situation.

I am not demanding that you add features to the rest of the drivers.  I
am only asking that you think about the architectural issue of how the
rest of them would be updated, so we don't end up with DLT_LOOP,
DLT_TUN, and so on, where they all do almost the same thing, when they
could be the same.

I don't really have an opinion on host vs network for AF, but I think
your choice of aligning with FreeBSD is reasonable.






Re: #pragma once

2022-10-16 Thread Greg Troxel

My quick reaction is that we should stick to actual standards, absent a
really compelling case.  This isn't compelling to me, and the point that
linting for wrong usage isn't hard is a good one.

I happen to be in the middle of a paper (from the guix crowd) about
de-bootstrapping ocaml.  It's about getting rid of binary bootstraps.
That's a problem we also have in pkgsrc, but we haven't issued a
manifesto.  While it might seem tangential, the de-bootstrapping world
often wants to compile older code with older tools to construct a build
graph that starts from as little binary as possible.

Thus, "newer compilers all do this" is a bit scary, as while that's what
people usually use, it's more comfortable to say "we need C99 plus X"
for as little X as possible.




Re: Can version bump up to 9.99.100?

2022-09-17 Thread Greg Troxel

David Holland  writes:

> On Fri, Sep 16, 2022 at 07:00:23PM +0700, Robert Elz wrote:
>  > That is, except for in pkgsrc, which is why I still
>  > have a (very mild) concern about that one - it actually compares the
>  > version numbers using its (until it gets changed) "Dewey" comparison
>  > routines, and for those, 9.99.100 is uncharted territory.
>
> No, it's not, pkgsrc-Dewey is well defined on arbitrarily large
> numbers. In fact, that's in some sense the whole point of it relative
> to using fixed-width fields.

And, surely we had 9.99.9 and 9.99.10.  The third digit is no more
special than the second.  It's just that it happens less often so the
problem of arguably incorrectly written two-digit patterns is more
likely than for that to happen with one.

It's not reasonable to constrain a normal process because other bugs
might exist.
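A simplified model of the comparison (the real pkgsrc Dewey code also understands suffixes like alpha/beta/rc/nb, which this sketch ignores) shows why 9.99.100 is unremarkable:

```python
def dewey(version: str) -> list[int]:
    """Split a dotted version into numeric components of arbitrary width.

    Simplified model of pkgsrc's "Dewey" comparison: components are
    compared numerically, element by element, so field width never
    matters -- the third component is no more special than the second.
    """
    return [int(part) for part in version.split(".")]

# Element-wise list comparison makes these come out right automatically:
assert dewey("9.99.100") > dewey("9.99.99")
assert dewey("9.99.10") > dewey("9.99.9")
assert dewey("10.0") > dewey("9.99.100")
```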




Re: Module autounload proposal: opt-in, not opt-out

2022-08-08 Thread Greg Troxel

Paul Goyette  writes:

> (personal note)
> It really seems to me that the current module sub-systems is at
> best a second-class capability.  I often get the feeling that
> others don't really care about modules, until it's the only way
> to provide something else (dtrace).  This proposal feels like
> another nail in the modular coffin.  Rather than disabling (part
> of) the module feature, we should find ways to improve testing
> the feature.

I'd just like to say that while I haven't gone down the "modules first"
path, I have been watching your commits and cheering you on.

I do use a few modules, and this is making me think I should try to run
MODULAR, especially on machines with less memory.
I'm a little scared of not even having UFS, but I can try it as the
low-memory machine is not important.




Re: Module autounload proposal: opt-in, not opt-out

2022-08-08 Thread Greg Troxel

Martin Husemann  writes:

> I think that all modules that we deliver and declare safe for *autoload*
> should require to be actually tested, and a basic test (for me) includes
> testing auto-unload. That does not cover races that slip through "casual"
> testing, but should have caught the worst bugs.

That's a reasonable position for adding modules, but

> So the error in the cases you stumbled in is the autoload and keeping the
> badly tested module autoloadable but forbid its unloading sounds a bit
> strange to me.

Given where we are, do you really mean

  we should withdraw every module from autoload that does not have a
  documented test result, right now?

It seems far better to have them stay loaded than be unavailable.




Re: New iwn firmware & upgrade procedure

2022-06-19 Thread Greg Troxel

Havard Eidnes  writes:

>> A quick skim of /libdata/firmware makes me think it is mostly not
>> versioned.
>
> Really?  I suspect all the if_iwn files are versioned; if it
> follows the pattern for iwlwifi-6000g2a-5, the number behind the
> last hyphen is the version number.

Look at all the devices, not just if_iwn.




Re: New iwn firmware & upgrade procedure

2022-06-19 Thread Greg Troxel

Havard Eidnes  writes:

> 1) Could the if_iwn driver fall back to using the 6000g2a-5 microcode
>without any code changes?  (My gut feeling says "yes", but I have
>no existence proof of that.)

Unless it's really necessary (ABI change in accessing device with new
firmware), it seems that the firmware should just be named for the
device and not have the firmware version.  Thus you'd get the version
you have in tree, and that might be a little old.  Alternatively there
could be a symlink.  But I don't understand why it is versioned.

> Should the wireless firmware go into a different set which we also
> learn the habit of extracting before reboot of the kernel?

If the versioning is really intractable and frequent, perhaps, but I
think this can be 99% solved by not putting firmware versions in
filenames.

A quick skim of /libdata/firmware makes me think it is mostly not
versioned.




Re: Slightly off topic, question about git

2022-06-06 Thread Greg Troxel

David Brownlee  writes:

> I suspect most of this also works with s/git/hg/ assuming NetBSD
> switches to a mercurial repo

Indeed, all of this is not really about git.  Systems in the class of
"distributed VCS" have two important properties:

  commits are atomic across the repo, not per file

  anyone can prepare commits, whether or not they are authorized to
  apply them to the repo.  an authorized person can apply someone else's
  commit.

These more or less lead to "local copy of the repo".  And there are web
tools for people who just want to look at something occasionally.  But
I find that it's not that big, that right now I have 3 copies (8, 9,
current), and that it's nice to be able to do things offline (browse,
diff, commit).

CVS is really just RCS with
  organization into groups of files
  ability to operate over ssh (rsh originally :-)
That was really great in 1994; I remember what a big advance it was
(seriously).
  




Re: wsvt25 backspace key should match terminfo definition

2021-11-24 Thread Greg Troxel

RVP  writes:

> On Tue, 23 Nov 2021, Michael van Elst wrote:
>
>> If you restrict yourself to PC hardware (i386/amd64 arch) then
>> you probably have either
>>
>> a PS/2 keyboard -> the backspace key generates a DEL.
>> a USB keyboard  -> the backspace key generates a BS.
>>
>> That's something you cannot shoehorn into a single terminfo
>> attribute and that's why many programs simply ignore terminfo
>> here, in particular when you think about remote access.
>
> So, if I had a USB keyboard (don't have one to check right now), the
> terminfo entry would be correct? How do we make this consistent then?
> Have 2 terminfo entries: wsvt25-ps2 and wsvt25-usb (and fix-up getty
> to set the correct one)?

wscons is supposed to abstract all this, so making wsvt25-foo for
different keyboard classes seems like the wrong approach.

wskbd(4) says:

 •   Mapping from keycodes (defined by the specific keyboard driver) to
 keysyms (hardware independent, defined in
 /usr/include/dev/wscons/wsksymdef.h).

As uwe@ points out, the terms we use and the actual key labels are
confusing.  When I've talked about the DEL key, I've meant the key that
the user types to delete backwards, almost always upper right and easily
reachable when touch typing, and that in DEC tradition sent the DEL 0x1f
character.  It was pointed out that newer terminals have a backarrow
logo, and I see that an IBM USB keyboard has that too.

Then there's the BS key, which older (almost all actual?) terminals had,
but my IBM USB keyboard doesn't have one, and my mac doesn't either.

Looking in wsksymdef.h (netbsd-9, which is handy), we see "keysyms"
which is what keycodes are supposed to map into, and it talks about them
being aligned with ASCII.   Relevant to this discussion there is

#define KS_BackSpace    0x08
#define KS_Delete       0x7f
#define KS_KP_Delete    0xf29f

So that's for BS, DEL (to use ASCII) and the extended keypad "delete
right" introduced with I think the VT220.

On my USB keyboard, in NetBSD 9 wscons without trying to mess with
mappings, I get

  backarrow (key where DEL should be) ==> BS (^H)
  keypad Delete key (next to insert/home/end/pageup/pagedown) ==> DEL (^?)

and I see that stty has erase set to ^H.


The underlying issue is that the norms of some systems are to map that
"user wants to delete left easily reachable key" to BS and some want to
map it to DEL.  I see these as the PC tradition and the UNIX tradition.

So I think NetBSD should decide that we're following the UNIX tradition
that this key is DEL, have wskbd map it that way for all keyboard types,
and have stty erase start out DEL.  (Plus of course carrying this across
ssh so cross-deletionism works, which I think is already the case.)

A quick glance at wskbd and ukbd did not enlighten me.

xev shows similar wrong x keysyms, BS and DEL for "backarrow" and
"keypad delete".





Re: wsvt25 backspace key should match terminfo definition

2021-11-23 Thread Greg Troxel

Valery Ushakov  writes:

> vt52 is different.  I never used a real vt52 or a clone, but the
> manual at vt100.net gives the following picture:
>
>   https://vt100.net/docs/vt52-mm/figure3-1.html
>
> and the description
>
>   https://vt100.net/docs/vt52-mm/chapter3.html#S3.1.2.3
>
>   Key         Code    Action Taken if Codes Are Echoed
>   BACK SPACE  010     Backspace (Cursor Left) function
>   DELETE      177     Nothing

That is explaining what the terminal does when those codes are sent by
the computer.  That is a different thing from how the computer
interprets user input.

When using a VT52 on Seventh Edition, for example one pushed DEL to
remove the previous character, and the computer would send
"" to make it disappear and leave the cursor left.  One
basically never pushed BS.

> vt100 had similar keyboard (again, never used a real one personally)
>
>   https://vt100.net/docs/vt100-ug/chapter3.html#F3-2
>
>   BACKSPACE   010     Backspace function
>   DELETE      177     Ignored by the VT100

same as vt52, I think.


> But vt200 and later use a different keyboard, lk201 (and i did use a
> real vt220 a lot)
>
>   https://vt100.net/docs/vt220-rm/figure3-1.html
>
> that picture is not very good, the one from the vt320 manual is better
>
>   https://vt100.net/docs/vt320-uu/chapter3.html
>
> vt220 does NOT have a configuration option that selects the code that
> the backarrow key sends.  But somehow the official terminfo database has kbs=^H for vt220!
>
> Later it became configurable:
>
>   https://vt100.net/docs/vt320-uu/chapter4.html#S4.13
>
> For vt320 (where it *is* configurable) terminfo has
>
>   $ infocmp -1 vt320 | grep kbs
>   kbs=^?,

Very interesting!

>
>> I think the first thing to answer is "what is kbs in terminfo supposed
>> to mean".
>
> X/Open Curses, Issue 7 doesn't explain, other than saying "backspace"
> key, which is an unfortunate name, as it's loaded.  But it's
> sufficiently clear from the context that it's the key that deletes
> backwards, i.e. to the left, not under the cursor.

So it's the codes generated by the DEL key (as opposed to the Delete
key).

>> My other question is how kbs is used from terminfo.  Is it about
>> generating output sequences to move the active cursor one left?  If so,
>> it's right.  Is it about "what should the user type to delete left",
>> then for a vt52/vt220, that's wrong.  If it is supposed to be both,
>> that's an architectural bug as those aren't the same thing.
>
> No, k* capabilities are sequences generated by the terminal when some
> key is pressed.  The capability for the sequence sent to the the
> terminal to move the cursor left one position is cub1
>
>   $ infocmp -1 vt220 | grep cub1
>   cub1=^H,
>   kcub1=\E[D,
>
> (kcub1 is the sequence generated by the left arrow _k_ey).

Then I'm convinced that kbs should be ^? for these terminals.





Re: wsvt25 backspace key should match terminfo definition

2021-11-23 Thread Greg Troxel

Johnny Billquist  writes:

>> For vt320 (where it *is* configurable) terminfo has
>>
>>$ infocmp -1 vt320 | grep kbs
>>kbs=^?,
>
> Which I think it should be.


But what does kbs mean?

 - the ASCII character sent by the computer to move the cursor left?
 - the ASCII character sent by the BS key?
 - the ASCII character sent by the DEL key that the user uses to delete left?




Re: wsvt25 backspace key should match terminfo definition

2021-11-23 Thread Greg Troxel

Valery Ushakov  writes:

> On Tue, Nov 23, 2021 at 00:01:40 +, RVP wrote:
>
>> On Tue, 23 Nov 2021, Johnny Billquist wrote:
>> 
>> > If something pretends to be a VT220, then the key that deletes
>> > characters to the left should send DEL, not BS...
>> > Just saying...
>> 
>> That's fine with me too. As long as things are consistent. I suggested the
>> kernel change because both terminfo definitions (and the FreeBSD console)
>> go for ^H.
>
> Note that the pckbd_keydesc_us keymap maps the scancode of the <- key to
>
> KC(14),  KS_Cmd_ResetEmul, KS_Delete,
>
> i.e. 0x7f (^?).
>
> terminfo is obviously incorrect here.  Amazingly, the bug is actually
> in vt220 description!  wsvt25 just inherits from it:
>
> $ infocmp -1 vt220 | grep kbs
> kbs=^H,
>
> I checked termcap.src from netbsd-4 and it's wrong there too.  I have
> no idea htf that could have happened.

I think (memory is getting fuzzy) the problem is that the old terminals
had a delete key, in the upper right, that users use to remove the
previous character, and a BS key, upper left, that was actually a
carriage control character.


The basic problem is that in the PC world, the idea is that the key where
DEL should be has a backarrow, and the PC world thinks it is backspace.
That's the DEC-centric viewpoint of course :-)

I think any change needs a careful proposal and review, because there
are lots of opinions here and a change is likely to mess up a bunch of
people's configs, even if they have worked around something broken.  I
don't mean "no changes", just that if you don't think this is a really
hard problem you probably shouldn't change it (globally).

Also /usr/include/sys/ttydefaults.h is about all of NetBSD on all sorts
of hardware, not just PCs and there are lots of keyboards as well as
actual terminals.   Ever since we moved beyond ASR33, CERASE has been
0177 (my Unix use more or less began with a VT52 and a Beehive CRT).

xterm has a config to say "make the key where DEL ought to be generate
the key that the tty has configured as ERASE".  I suspect that the right
approach is

  1) choose what wscons generates for the "key where DEL belongs"

  2) have the tty set so that the choice in (1) is 'stty erase'.

I see the same kbs=^H on vt52.


I think the first thing to answer is "what is kbs in terminfo supposed
to mean".



My other question is how kbs is used from terminfo.  Is it about
generating output sequences to move the active cursor one left?  If so,
it's right.  Is it about "what should the user type to delete left",
then for a vt52/vt220, that's wrong.  If it is supposed to be both,
that's an architectural bug as those aren't the same thing.
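For anyone wanting to check what the installed database actually claims for a given entry, the capability can be queried programmatically; a small sketch using Python's curses binding (function name mine):

```python
import curses


def kbs(term: str):
    """Return the terminfo kbs (key_backspace) capability for a terminal
    type, or None if the entry or capability is missing.

    b"\\x08" is ^H (BS); b"\\x7f" is ^? (DEL).
    """
    try:
        curses.setupterm(term)
    except curses.error:
        return None  # no terminfo entry installed for this type
    cap = curses.tigetstr("kbs")
    # tigetstr returns -1 for a non-string capability name, None if absent
    return cap if isinstance(cap, bytes) else None
```

Running `kbs("vt220")` against a stock database shows whichever value the vt220 entry ships with (the thread argues it ought to be ^?).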




Re: timecounters

2021-11-13 Thread Greg Troxel

I think it makes sense to document them, and arguably each counter
should have a man page, except for things that are somehow in
timecounter(9) instead (if they don't have a device name?).





Re: Representing a rotary encoder input device

2021-09-22 Thread Greg Troxel

What do other systems do?  It strikes me that wsmouse is meant for
things connected with the kbd/mouse/display world.  To be
cantankerous, using it seems a little bit like representing a GPIO input
as a 1-button mouse that doesn't move.

I would imagine that a rotary encoder is more likely to be a volume or
level control, but perhaps not for the machine, perhaps just reported
over MQTT so Home Assistant on some other machine can deal.

If you are really talking about encoders hooked to gpio, then perhaps
gpio should grow a facility to take N pins and say they are some kind of
encoder and then have a gpio encoder abstraction.

But maybe you are trying to use an encoder to add scroll to a 3-button
mouse?




Re: SCSI scanners

2021-06-29 Thread Greg Troxel

Julian Coleman  writes:

> Can we get rid of the SCSI scanner support as well?  It only supports old
> HP and Mustek scanners, and its functionality is superseded by SANE (which
> sends the relevant SCSI commands from userland).

If it's really the case that SANE works with these, then that seems ok.
(I actually have a UMAX SCSI scanner but haven't powered it on in
years.)

I wonder though if this is causing the kind of trouble that uscanner
caused.




Re: protect pmf from network drivers that don't provide if_stop

2021-06-29 Thread Greg Troxel

Martin Husemann  writes:

> On Tue, Jun 29, 2021 at 03:46:20PM +0930, Brett Lymn wrote:
>> I turned up a fix I had put into my source tree a while back, I think at
>> the time the wireless driver (urtwn IIRC) did not set an entry for
>> if_stop.
>
> This is a driver bug, we should not work around it but catch it early
> and fix it.

So maybe KASSERT that stop exists, and then call it if non-NULL, so
regular users don't crash, and DIAGNOSTIC does what DIAGNOSTIC is
supposed to do?




Re: regarding the changes to kernel entropy gathering

2021-04-06 Thread Greg Troxel

Thanks - that is useful information.

I think the big point is that the new seed file is generated from
urandom, not from the internal state, so the new seed doesn't leak
internal state.  The "save entropy" language didn't allow me to conclude
that.

Also, your explanation is about updating, but it doesn't address
generation of a file for the first time.  Presumably that just takes
urandom without the old seed that isn't there and doesn't overwrite the
old seed that isn't there.

Interestingly, I have a machine running current, running as a dom0
sometimes, and haven't had problems.  I now realize that's only because
the machine had a seed file created under either 7 or 9 (installed 7,
updated to 9, updated to current).  So it has trusted, untrustworthy
entropy (even though surely after all this time some of it must have
been unobserved).




Re: regarding the changes to kernel entropy gathering

2021-04-06 Thread Greg Troxel

Thor Lancelot Simon  writes:

> shuts down, again all entropy samples that have been added (which, again,
> are accumulating in the per-cpu pools) are propagated to the global pool;
> all the stream RNGs rekey themselves again; then the seed is extracted.

It seems obvious to me that "extracting" the seed should be done in such
a way that the state of the internal rng is still unpredictable from the
saved seed, even if the state of the newly-booted rng will be
predictable.  Perhaps by pulling 256 bytes from urandom, perhaps by
something more direct and then some sort of hash/rekey to get back
traffic protection.

Probably this is already done in a way much better thought out than my
30s reaction, the man page doesn't really say this, at least that I
could follow; rndctl -S says "save entropy pool".




Re: ZFS: time to drop Big Scary Warning

2021-03-25 Thread Greg Troxel

chris...@astron.com (Christos Zoulas) writes:

> That's a good test, but how does zfs compare in for the same test with lets
> say ffs or ext2fs (filesystems that offer persistence)?

With the same system, booted in the  same way, but with 3 different
filesystems mounted on /tmp, I get similar numbers of failures:

tmpfs  12
ffs2   13
zfs    18

So tmpfs/ffs2 are ~equal and zfs has a few more failures (but it all
looks a bit random and non-repeatable).  So it's hard to sort out "zfs
is buggy" vs "some tests fail in timing-related hard-to-understand ways
and that seems provoked slightly more with /tmp on zfs".

Did you mean something else?




Re: ZFS: time to drop Big Scary Warning

2021-03-23 Thread Greg Troxel

I got a suggestion to run atf with a ZFS tmp.  This is all with current
from around March 1, and is straight current, no Xen.

Creating tank0/tmp and having it be mounted on /tmp failed the mount
(but created the volume) with some sort of "busy" error.  I already had
a tmpfs mounted.  Rebooting, zfs got mounted and then tmpfs; I
unmounted tmpfs and then had a zfs /tmp.  So not sure what's up, but
feels like a tmpfs issue more than a zfs issue, and not a big deal.  Or
maybe it's a feature that you can't mount over tmpfs.


With /tmp being tmpfs, my results are similar to the releng runs.  I've
indented things that don't match two spaces.

Failed test cases:
  lib/libc/sys/t_futex_ops:futex_wait_timeout_deadline
lib/libc/sys/t_ptrace_waitid:syscall_signal_on_sce
lib/libc/sys/t_truncate:truncate_err
  lib/librumpclient/t_exec:threxec
net/if_wg/t_misc:wg_rekey
  usr.bin/cc/t_tsan_data_race:data_race
usr.bin/make/t_make:archive
usr.bin/c++/t_tsan_data_race:data_race
usr.sbin/cpuctl/t_cpuctl:nointr
usr.sbin/cpuctl/t_cpuctl:offline
fs/ffs/t_quotalimit:slimit_le_1_user
modules/t_x86_pte:rwx

Summary for 903 test programs:
9570 passed test cases.
12 failed test cases.
73 expected failed test cases.
530 skipped test cases.

With /tmp being zfs:tank0/tmp, I get

Failed test cases:
  ./bin/cp/t_cp:file_to_file
  ./lib/libarchive/t_libarchive:libarchive
  ./lib/libc/stdlib/t_mktemp:mktemp_large_template
./lib/libc/sys/t_ptrace_waitid:syscall_signal_on_sce
  ./lib/libc/sys/t_stat:stat_chflags
./lib/libc/sys/t_truncate:truncate_err
./net/if_wg/t_misc:wg_rekey
  ./usr.bin/cc/t_tsan_data_race:data_race_pie
./usr.bin/make/t_make:archive
  ./usr.bin/ztest/t_ztest:assert
./usr.bin/c++/t_tsan_data_race:data_race
  ./usr.bin/c++/t_tsan_data_race:data_race_pie
./usr.sbin/cpuctl/t_cpuctl:nointr
./usr.sbin/cpuctl/t_cpuctl:offline
  ./fs/nfs/t_rquotad:get_nfs_be_1_group
./modules/t_x86_pte:rwx
  ./modules/t_x86_pte:svs_g_bit_set

Summary for 903 test programs:
9567 passed test cases.
17 failed test cases.
72 expected failed test cases.
529 skipped test cases.

which is also similar, but slightly different.

So overall I conclude that there's nothing terrible going on, and that
these results are in the same class of mostly passing but somewhat
irregular as the base case.  So work to do, but it doesn't support "ZFS
is scary".

(Of course, the system stayed up through the tests and has no apparent
trouble, or I would have said.)

As an aside, it would be nice if atf-test used TMPDIR or had an argument
to say what place to do tests.




Re: ZFS: time to drop Big Scary Warning

2021-03-20 Thread Greg Troxel

"J. Hannken-Illjes"  writes:

>> On 19. Mar 2021, at 21:18, Michael  wrote:
>> 
>> On Fri, 19 Mar 2021 15:57:18 -0400
>> Greg Troxel  wrote:
>> 
>>> Even in current, zfs has a Big Scary Warning.  Lots of people are using
>>> it and it seems quite solid, especially by -current standards.  So it
>>> feels like time to drop the warning.
>>> 
>>> I am not proposing dropping the warning in 9.
>>> 
>>> Objections/comments?
>> 
>> I've been using it on sparc64 without issues for a while now.
>> Does nfs sharing work these days? I dimly remember problems there.
>
> If you mean misc/55042: Panic when creating a directory on a NFS served ZFS
> it should be fixed in -current.

I have a box running current/amd64 from about March 4, with a zpool on a
disklabel partition, and a filesystem from that exported, mounted on a
9/amd64 box, and did the mkdir test and it was totally fine.   I was
able to have the maproot segfault happen, before the fix.  So yes, this
is fixed.


So summarizing:

  nobody has said there is any remaining serious issue

  many remember issues about NFS (true) but they all seem ok now

and I just looked over the open PRs and w.r.t. current don't see
anything serious.





ZFS: time to drop Big Scary Warning

2021-03-19 Thread Greg Troxel

Even in current, zfs has a Big Scary Warning.  Lots of people are using
it and it seems quite solid, especially by -current standards.  So it
feels like time to drop the warning.

I am not proposing dropping the warning in 9.

Objections/comments?




Re: kmem pool for half pagesize is very wasteful

2021-02-23 Thread Greg Troxel

Chuck Silvers  writes:

> in the longer term, I think it would be good to use even larger pool pages
> for large pool objects on systems that have relatively large amount of memory.
> even with your patch, a 1k pool object on a system with a 4k VM page size
> still has 33% overhead for the redzone, which is a lot for something that
> is enabled by DIAGNOSTIC and is thus supposed to be "inexpensive".

So maybe the real bug is that this check should not be part of
DIAGNOSTIC.  I remember from 2.8BSD that DIAGNOSTIC was basically just
supposed to add cheap asserts and panic earlier but not really be slower
in any way anybody would care about.

It seems easy enough to make this separate and not get turned on for
DIAGNOSTIC, but some other define.   It might even be that for current
the checked-in GENERIC enables this.  But someone turning on DIAGNOSTIC
on 9 shouldn't get things that hurt memory usage really at all, or more
than say a 2% degradation in speed.

> there's a tradeoff here in that using a pool page size that matches the
> VM page size allows us to use the direct map, whereas with a larger
> pool page size we can't use the direct map (at least effectively can't today),
> but for pools that already use a pool page size that is larger than
> the VM page size (eg. buf16k using a 64k pool page size) we already
> aren't using the direct map, so there's no real penalty for increasing
> the pool page size even further, as long as the larger pool page size
> is still a tiny percentage of RAM and KVA.  we can choose the pool page size
> such that the overhead of the redzone is bounded by whatever percentage
> we would like.  this way we can use a redzone for most pools while
> still keeping the overhead down to a reasonable level.

That sounds like great progress and I don't mean to say anything
negative about that.




Re: fsync error reporting

2021-02-19 Thread Greg Troxel

Greg Troxel  writes:

>   1) operating system has a successful return from a write transaction to
>   a disk controller (perhaps via a controller that has a write-back
>   cache)
>
>   2) operating system has been told by the controller that the write has
>   actually completed to stable storage (guaranteed even if OS crashes or
>   power fails, so actually written or perhaps in battery-backed cache)

I see our man page addresses this with FDISKSYNC.   It sounds like you
aren't proposing to change this (makes sense), but there's the pesky
issue of errors within the disk when writing from cache to media.
Perhaps those are unreportable.




Re: fsync error reporting

2021-02-19 Thread Greg Troxel

David Holland  writes:

>  > > everything that process wrote is on disk,
>  > 
>  > That is probably unattainable, since I've seen it plausibly asserted
>  > that some disks lie, reporting that writes are on the media when this
>  > is not actually true.
>
> Indeed. What I meant to say is that everything has been sent to disk,
> as opposed to being accidentally skipped in the cache because the
> buffer was busy, which will currently happen on some of the fsync
> paths.
>
> That's why flushing the disk-level caches was a separate point.

(ignoring errors as I have no objection to what you proposed and
clarified with mouse@)

Maybe I'm way off in space, but I'd like to see us be careful about

  1) operating system has a successful return from a write transaction to
  a disk controller (perhaps via a controller that has a write-back
  cache)

  2) operating system has been told by the controller that the write has
  actually completed to stable storage (guaranteed even if OS crashes or
  power fails, so actually written or perhaps in battery-backed cache)

  A) for stacked filesystems like raid, cgd, and for things like NFS,
  there's basically and e2e ack of the above condition.

POSIX is of course weasely about this.  But it seems obvious that if you
call fsync, you want the property that if there is a crash or power
failure (but not a disk media failure :-) that your bits are there,
which is case 2.  Case 1 is only useful in that files could remain in OS
cache for a long time, and there is a pretty good but not guaranteed
notion that once in device writeback cache they will get to the actual
media in not that long.  The old "sync;sync;sync;sleep 10" thing from
before there was shutdown(8)...

I thought NCQ was supposed to give acks for actual writing, but allow
them to be perhaps ordered and multiple in flight, so that one could use
that instead of the big-hammer inscrutable writeback cache.

If the controller doesn't support NCQ, then it seems one has to issue a
cache flush, which presumably is defined to get all data in the cache as
of the flush onto disk before reporting that it's done.


Is that what you're thinking, or do you think this is all about case 1?




Re: fsync_range and O_RDONLY

2021-02-18 Thread Greg Troxel

David Holland  writes:

> Well, if you have it open for write and I have it open for read, and I
> fsync it, it'll sync your changes.

I guess maybe POSIX is wrong then :-)

But as a random user I can type sync to the shell.

> And report any errors to me, so if you're a database and I'm feeling
> nasty I can maybe mess with you that way. So I'm not sure it's a great
> idea.
>
> Right now fsync error reporting is a trainwreck though.

I think that's the real problem; if I open for write and fsync, then I
should get status back that lets me know about my writes, regardless of
who else asked for sync.   Once that's fixed, then the 'others asking
for sync' is much less of a big deal.
I know, ENOPATCH.





Re: fsync_range and O_RDONLY

2021-02-17 Thread Greg Troxel

David Holland  writes:

> Last year, fdatasync() was changed to allow syncing files opened
> read-only, because that ceased to be prohibited by POSIX and something
> apparently depended on it.

I have a dim memory of this and mongodb.

> However, fsync_range() was not also so changed. Should it have been?
> It's now inconsistent with fsync and fdatasync and it seems like it's
> meant to be a strict superset of them.

It seems like it might as well be.  I would expect this to only really
sync the file's metadata, same as the others, but I do not feel like I
really understand this.




Re: partial failures in write(2) (and read(2))

2021-02-05 Thread Greg Troxel

David Holland  writes:

> Basically, it is not feasible to check for and report all possible
> errors ahead of time, nor in general is it possible or even desirable
> to unwind portions of a write that have already been completed, which
> means that if a failure occurs partway through a write there are two
> reasonable choices for proceeding:
>(a) return success with a short count reporting how much data has
>already been written;
>(b) return failure.
>
> In case (a) the error gets lost unless additional steps are taken
> (which as far as I know we currently have no support for); in case (b)
> the fact that some data was written gets lost, potentially leading to
> corrupted output. Neither of these outcomes is optimal, but optimal
> (detecting all errors beforehand, or rolling back the data already
> written) isn't on the table.
>
> It seems to me that for most errors (a) is preferable, since correctly
> written user software will detect the short count, retry with the rest
> of the data, and hit the error case directly, but it seems not
> everyone agrees with me.

It seems to me that (a) is obviously the correct approach.

An obvious question is what POSIX requires, pause for `kill -HUP kred` :)

I am only a junior POSIX lawyer, not a senior one, but as I read

  
https://pubs.opengroup.org/onlinepubs/9699919799/functions/write.html#tag_16_685

I think your case (a) is the only conforming behavior and obviously what
the spec says must happen.   I do not even see a glimmer of support for (b).

There is the issue of PIPE_BUF, and  requests <= PIPE_BUF being atomic,
but I don't think you are talking about that.

Note that write is obligated to return partial completion if interrupted
by a signal.

I think your notion that it's ok to not return the reason the full
amount wasn't written is entirely valid.

I am surprised this is contentious (really; not trying to be difficult).





Re: Temporary memory allocation from interrupt context

2020-11-11 Thread Greg Troxel

Martin Husemann  writes:

> On Wed, Nov 11, 2020 at 08:26:45AM -0500, Greg Troxel wrote:
>> >LOCK(st);
>> >size_t n, max_n = st->num_items;
>> >some_state_item **tmp_list =
>> >kmem_intr_alloc(max_n * sizeof(*tmp_list));
>> 
>> kmem_intr_alloc takes a flag, and it seems that you need to pass
>> KM_NOSLEEP, as blocking for memory in softint context is highly unlikely
>> to be the right thing.
>
> Yes, and of course the real code has that (and works). It's just that 
>  - memoryallocators(9) does not cover this case
>  - kmem_intr_alloc(9) is kinda deprecated - quoting the man page:
>
>   These routines are for the special cases.  Normally,
>   pool_cache(9) should be used for memory allocation from interrupt
>   context.
>
>but how would I use pool_cache(9) here?

Not deprecated, but for "special cases".  I think needing a possibly-big
variable-size chunk of memory at interrupt time is special.

You would use pool_cache by being able to use a fixed-sized object.
But it seems that's not how the situation is.

I think memoryallocators(9) could use some spiffing up; it (on 9) says
kmem(9) cannot be used from interrupt context.

The central hard problem is orthogonal, though: if you don't
pre-allocate, you have to choose between waiting and coping with
failure.




Re: Temporary memory allocation from interrupt context

2020-11-11 Thread Greg Troxel

Martin Husemann  writes:

> Consider the following pseudo-code running in softint context:
>
> void
> softint_func(some_state *st, )
> {
>   LOCK(st);
>   size_t n, max_n = st->num_items;
>   some_state_item **tmp_list =
>   kmem_intr_alloc(max_n * sizeof(*tmp_list));

kmem_intr_alloc takes a flag, and it seems that you need to pass
KM_NOSLEEP, as blocking for memory in softint context is highly unlikely
to be the right thing.

The man page is silent on whether lack of both flags is an error, and if
not what the semantics are.  (It seems to me it should be an error.)

With KM_NOSLEEP, it is possible that the allocation will fail.  Thus
there needs to be a strategy to deal with that.

>   n = 0;
>   for (i : st->items) {
>   if (!(i matches some predicate))
>   continue;
>   i->retain();
>   tmp_list[n++] = i;
>   }
>   UNLOCK(st);
>   /* do something with all elements in tmp_list */
>   kmem_intr_free(tmp_list, max_n * sizeof(*tmp_list));
> }
>
> I don't want to alloca here (the list could be quite huge) and max_n could
> vary a lot, so having a "manual" pool of a few common (preallocated)
> list sizes hanging off the state does not go well either.

I think that you need to pick one of

  pre-allocate the largest size and use it temporarily

  be able to deal with not having memory.  This leads to hard-to-debug
  situations if that code is wrong, because usually malloc will succeed.

  figure out that this softint can block indefinitely, only harming
  later calls of the same family, and not leading to kernel
  deadlock/etc.  This leads to hard-to-debug situations if lack of
  memory does lead to hangs, because usually malloc will succeed.

> In a perfect world we would avoid the interrupt allocation all together, but
> I have not found a way to rearrange things here to make this feasible.
>
> Is kmem_intr_alloc(9) the best way forward?

With all that said, note that I'm not the allocation expert.




Re: New header for GPIO-like definitions

2020-10-27 Thread Greg Troxel

Julian Coleman  writes:

>   name="LED activity"
>   name="LED disk5_fault"
>
>   name="INDICATOR key_diag"
>   name="INDICATOR disk5_present"
>
> and similar, then parse that in MI code.

Another approach would be to extend the fdt schema in the way they would
if solving this problem and use that.  In other words: if you were in
charge of fdt and were going to add this feature, what would you do?

But, your name overloading proposal seems ok.




Re: New header for GPIO-like definitions

2020-10-26 Thread Greg Troxel

Julian Coleman  writes:

>> >   #define GPIO_PIN_LED0x01
>> >   #define GPIO_PIN_SENSOR 0x02
>> >
>> > Does this seem reasonable, or is there a better way to do this?
>
>> I don't really understand how this is different from in/out.
>> Presumably this is coming from some request from userspace originally,
>> where someone, perhaps in a config file, has told the system how a pin
>> is hooked up.
>
> The definitions of pins are coming from hardware-specific properties.

That's what I missed.  On a device you are dealing with, pin N is
*always* wired to an LED because that's how it comes from the factory.
My head was in maker-land where there is an LED because someone wired
one up.

> In the driver, I'd like to be able to handle requests based on what is
> connected to the pin.  For example, for LED's, attach them to the LED
> framework using led_attach()

That makes sense, then.  But how do you denote that logical high turns
on the light, vs logical low?

>> LED seems overly specific.  Presumably you care that the output does
>> something like "makes a light".  But I don't understand why your API
>> cares about light vs noise.  And I don't see an active high/low in your
>> proposal.   So I don't understand how this is different from just
>> "controllable binary output"
>
> As above, I want to be able to route the pin to the correct internal
> subsystem in the GPIO driver.

I just remember lights before LED, and the fact that they are LED vs
incandescent is not important to how they are used.  I don't know what's
next.

But given there is an led system, there is no incremental harm and it
seems ok.

>> I am also not following SENSOR.  Do you just mean "reads if the logic
>> level at the pin is high or low".
>> 
>> I don't think you mean using i2c bitbang for a temp sensor.
>
> Yes, just reading the logic level to display whether the "thing" connected
> is on or off.  A better name would be appreciated.  Maybe "INDICATOR", which
> would match the envsys name "ENVSYS_INDICATOR"?

Or even "GPIO_ENVSYS_INDICATOR" because there might be some binary
inputs later that get hooked up to some other kind of framework.

> Hopefully, the above is enough, but maybe a code snippet would help (this
> snippet is only for LED's, but similar applies for other types).  In the
> hardware-specific driver, I add the pins to proplib:
>
>   add_gpio_pin(pins, "disk_fault", GPIO_PIN_LED,
>   0, GPIO_PIN_ACT_LOW, -1);
>   ...

So I see the ACT_LOW.

GPIO_PIN_LED is an output, but presumably this means that one can no
longer use it with GPIO and only via led_.  Which seems fine. Is that
what you mean?

> Then, in the MD driver I have:
>
>   pin = prop_array_get(pins, i);
>   prop_dictionary_get_uint32(pin, "type", );
>   switch (type) {
>   case GPIO_PIN_LED:
>   ...
>   led_attach(n, l, pcf8574_get, pcf8574_set);

Do you mean MD, or MI?

> and because of the way that this chip works, I also need to know in advance
> which pins are input and which are output, to avoid inadvertently changing
> the input pins to output when writing to the chip.  For that, generic
> GPIO_PIN_IS_INPUT and GPIO_PIN_IS_OUTPUT definitions might be useful too.

I 95% follow, but I am convinced that what you are doing is ok, so to be
clear I have no objections.





Re: New header for GPIO-like definitions

2020-10-26 Thread Greg Troxel

Julian Coleman  writes:

> I'm adding a driver and hardware-specific properties for GPIO's (which pins
> control LED's, which control sensors, etc).  I need to be able to pass the
> pin information from the arch-specific configuration to the MI driver.  I'd
> like to add a new dev/gpio/gpiotypes.h, so that I can share the defintions
> between the MI and MD code, e.g.:
>
>   #define GPIO_PIN_LED0x01
>   #define GPIO_PIN_SENSOR 0x02
>
> Does this seem reasonable, or is there a better way to do this?

I don't really understand how this is different from in/out.
Presumably this is coming from some request from userspace originally,
where someone, perhaps in a config file, has told the system how a pin
is hooked up.

LED seems overly specific.  Presumably you care that the output does
something like "makes a light".  But I don't understand why your API
cares about light vs noise.  And I don't see an active high/low in your
proposal.   So I don't understand how this is different from just
"controllable binary output"

I am also not following SENSOR.  Do you just mean "reads if the logic
level at the pin is high or low".

I don't think you mean using i2c bitbang for a temp sensor.

Perhaps you could step back and explain  the bigger picture and what's
awkward currently.  I don't doubt you that more is needed, but I am not
able to understand enough to discuss.






Re: make COMPAT_LINUX match SYSV binaries

2020-10-21 Thread Greg Troxel

co...@sdf.org writes:

> I feel compelled to explain further:
> any OS that doesn't rely on this tag is prone to spitting out binaries
> with the wrong tag. For example, Go spits out Solaris binaries with SYSV
> as well.
>
> Our current solution to it is the kernel reading through the binary,
> checking if it contains certain known symbols that are common on Linux.
>
> We support the following forms of compat:
>
> ultrixnot ELF
> sunos not ELF (we support only oold stuff)
> freebsd   always correctly tagged, because the native OS
>   checks this, like we do.
> linux ELF, not always correctly tagged
>
>
> So, currently, we only support one OS that has this problem, which is
> linux. I am proposing we take advantage of it.
>
> In the event someone adds support for another OS with this problem (say,
> modern Solaris), I don't expect this compat to be enabled by default,
> for security reasons. So the problem will only occur if a user enables
> both forms of compat at the same time.
>
> Users already have to opt in to have Linux compat support. I think it is
> a lot to ask to have them tag every binary.

Thanks for the explanation.  I'm still not thrilled, but I withdraw my
objection.




Re: make COMPAT_LINUX match SYSV binaries

2020-10-20 Thread Greg Troxel

co...@sdf.org writes:

> As a background, some Linux binaries don't claim to be targeting the
> Linux OS, but instead are "SYSV".
>
> We have used some heuristics to still identify those binaries as being
> Linux binaries, like looking into the symbols defined by the binary.
>
> it looks like we no longer have other forms of compat expected to use
> SYSV ELF binaries. Perhaps we should drop this elaborate detection logic
> in favour of detecting SYSV == Linux?

In general adapting to every confused practice out there leads us to a
bad place.  This just feels like a step along that path.

I could see having a sysctl/etc. to enable this behavior, but it seems
really irregular.   Is there a way to have a tool to retag binaries
that are tagged incorrectly?   It seems SYSV emulation should not allow
non-SYSV system calls.





Re: autoloading compat43 on tty ioctls

2020-10-10 Thread Greg Troxel

chris...@astron.com (Christos Zoulas) writes:

> Aside for the TIOCGSID bug which I am about to fix (it is in tty_43.c
> and is used in libc tcgetsid(), all the compat tty ioctls are defined
> in /usr/src/sys/sys/ioctl_compat.h... We can empty that file and try
> to build the tree :-), but I am guessing things will break. Also a lot
> of pkgsrc will break too. It is not 4.3 applications that break, it is
> applications that still use the 4.3 terminal api's.

If the API is still present in our source tree, then the implementation
probably does not belong under COMPAT_43.  As I see it COMPAT_43 is to
match an old ABI that one can no longer (on modern NetBSD) compile to.
What you are describing sounds like "we have an API still, and we've had
it since 4.3", which is not in my view COMPAT.




Re: Sample boot.cfg for upgraded systems (rndseed & friends)

2020-09-22 Thread Greg Troxel

David Brownlee  writes:

> What would people think of installing an original copy of the etc set
> in /usr/share/examples/etc or similar - its 4.9M extracted and ~500K
> compressed and the ability to compare what is on the system to what it
> was shipped with would have saved me so much effort over the years :)

I personally unpack etc and xetc to /usr/netbsd-etc via the
INSTALL-NetBSD update script in etcmanage.  I would not be super keen on
adding a full etc by default, especially because then there's the issue
of managing it for upgrades.   But if it is unpacked someplace, and
updated on updates, and old files removed on updates via postinstall
fix, maybe.




Re: Logging a kernel message when blocking on entropy

2020-09-22 Thread Greg Troxel

Andreas Gustafsson  writes:

> The following patch will cause a kernel message to be logged when a
> process blocks on /dev/random or some other randomness API.  It may
> help some users befuddled by pkgsrc builds blocking on /dev/random,
> and I'm finding it useful when testing changes aimed at fixing PR
> 55659.

I'm in favor.

I have not dug in to the brave new entropy world.  I'm sure it's better
in many ways, but it also seems like people/systems that used to not end
up blocked before now do, apparently because some sources that used to
be considered ok (timing of events) no longer are.   So I think people
should be given clues - things appear a bit too difficult now.




Re: Proposal to enable WAPBL by default for 10.0

2020-07-23 Thread Greg Troxel
Taylor R Campbell  writes:

[lots of good points, no disagreement]

If /etc/master.passwd is ending up with junk, that's a clue that the code
that updates it isn't using the write-a-secondary-file, fsync it, rename
approach.  As I understand it, with POSIX filesystems you have to do that
because there is no guarantee on open/write/close that you'll see one
version or the other.  Even with zfs, you could have done the write of the
first half and not the second, so I think you still need this.

> work...which is why I used to use ffs+sync on my laptop, and these
> days I avoid ffs altogether in favour of zfs and lfs, except on
> install images written to USB media.)

Do you find that lfs is 100% solid now (in 9-stable, or current)?  I
have seen fixes and never really been sure.


Re: AES leaks, cgd ciphers, and vector units in the kernel

2020-06-23 Thread Greg Troxel
a data point on a machine from 2014:

$ ./aestest -l
BearSSL aes_ct
Intel SSE2 bitsliced

$ progress -f /dev/zero sh -c 'exec ./aestest -e -b 256 -c aes-xts -i "Intel SSE2 bitsliced" > /dev/null'
   399 MiB   56.98 MiB/s ^C
$ progress -f /dev/zero sh -c 'exec ./aestest -e -b 256 -c aes-xts -i "BearSSL aes_ct" > /dev/null'
   211 MiB   26.38 MiB/s ^C
$ progress -f /dev/zero sh -c 'exec ./bad -e -b 256 -c aes-xts > /dev/null'
   869 MiB   86.85 MiB/s ^C

So the sse2 code is slower than the old implementation, but not enough
to get upset about.



cpu0: "Intel(R) Core(TM) i7 CPU 930  @ 2.80GHz"
cpu0: Intel Core i7, Xeon 34xx, 35xx and 55xx (Nehalem) (686-class), 2800.09 MHz
cpu0: family 0x6 model 0x1a stepping 0x5 (id 0x106a5)
cpu0: features 0xbfebfbff
cpu0: features 0xbfebfbff
cpu0: features 0xbfebfbff
cpu0: features1 0x98e3bd
cpu0: features1 0x98e3bd
cpu0: features2 0x28100800
cpu0: features3 0x1
cpu0: features7 0x9c00


Re: AES leaks, cgd ciphers, and vector units in the kernel

2020-06-23 Thread Greg Troxel
Taylor R Campbell  writes:

>> What I meant is: consider an external USB disk of say 4T, which has a
>> cgd partition within which is ffs.
>> 
>> Someone attaches it to several systems in turn, doing cgd_attach, mount,
>> and then runs bup with /mnt/bup as the target, getting deduplication
>> across systems.
>
> (Side note: as a matter of architecture I would recommend
> incorporating the cryptography into the application, like borgbackup,
> restic, or Tarsnap do -- anything at a higher level than disks (even
> at the level of the file system, like zfs encryption) has much more
> flexibility and can also provide authentication.  Generally the main
> use case for disk encryption is to enable recycling disks without
> worrying about information disclosure; the threat model and security
> of disk encryption systems are both qualitatively very weak.)

Sure, but this is about doing something that is really reliable about
getting data back for disaster recovery, simplicity, only using tools
that have existed for a long time.  (You can't run zfs on old systems,
and borgbackup has had enough stability issues that I wouldn't trust
it.)

>> So, using the new faster cipher won't work, because it's not supported
>> by the older systems.
>> 
>> However, if the -current system does AES slowly because it has the new
>> constant-time implementation, and the older ones do it like they used
>> to, I don't see a real problem.
>
> OK.  If you encounter a scenario where this is likely to be a real
> problem, let me know.

From my viewpoint, a 3x slowdown, but with 100% reliability, is not a
big deal.

> I drafted an SSE2 implementation which considerably improves on the
> BearSSL aes_ct implementation on a number of amd64 CPUs I tested from
> around a decade ago.  It is still slower than before -- and AES-CBC
> encryption hurts by far the most, because it is necessarily
> sequential, whereas AES-CBC decryption and AES-XTS in both directions
> can be vectorized -- but it does mitigate the problem somewhat.  This
> covers all amd64 CPUs and probably most `i386' CPUs of the last 15-20
> years.
>
> There is some more room for improvement -- SSSE3 provides PSHUFB which
> can sequentially speed up parts of AES, and is supported by a good
> number of amd64 CPUs starting around 14 years ago that lack AES-NI --
> but there are diminishing returns for increasing implementation and
> maintenance effort, so I'd like to focus on making an impact on
> systems that matter.  (That includes non-x86 CPUs -- e.g., we could
> probably easily adapt the Intel SSE2 logic to ARM NEON -- but I would
> like to focus on systems where there is demand.)

That sounds good.

> I drafted a couple programs to approximately measure performance from
> userland.  They are very naive and do nothing to measure overhead from
> cgd(4) or disk i/o itself.
>
> https://www.NetBSD.org/~riastradh/tmp/20200621/aestest.tgz
> https://www.NetBSD.org/~riastradh/tmp/20200622/adiantum.tgz

Thanks - will try them.

>> So it remains to make userland AES use also constant time, as a separate
>> step?
>
> Correct.

ok - and helpful details from nia@ noted.


Re: AES leaks, cgd ciphers, and vector units in the kernel

2020-06-18 Thread Greg Troxel
Taylor R Campbell  writes:

>> I don't really see the new cipher as a reasonable option for removable
>> disks that need to be accessed by older systems.  I can see it for
>> encrypted local disk.  But given AES hardware speedup, I suspect most
>> people can just stay with AES.
>
> Can you be more specific about the systems you're concerned about?
> What are the characteristics and performance requirements of the
> different systems that need to share disks?  Do you have a reason to
> need to share a backup drive that you use on an up-to-date NetBSD on
> older hardware where it has to be fast, with a much older version of
> NetBSD?
>
> (I am sure there are use cases I haven't thought of; I just want to
> make sure I understand the use cases before I try to address them.)

What I meant is: consider an external USB disk of say 4T, which has a
cgd partition within which is ffs.

Someone attaches it to several systems in turn, doing cgd_attach, mount,
and then runs bup with /mnt/bup as the target, getting deduplication
across systems.  Of these systems, some are older NetBSD and some are
newer.  Posit one each netbsd 5, 7, 8, 9, current in the mix, as a blend
of strawman and not-so-crazy example.  After this, the disk is taken to
an undisclosed location where it is unlikely to be destroyed (or at
least, unlikely to be destroyed correlated with the main systems'
disks), but at which it does not have reliable physical protection
against snooping.  I submit that this is not an odd model for cgd
usage.

(I don't actually do this; I mount disks on one system and do
over-the-network backups from the older systems, and my mix of system
versions is different.)

So, using the new faster cipher won't work, because it's not supported
by the older systems.

However, if the -current system does AES slowly because it has the new
constant-time implementation, and the older ones do it like they used
to, I don't see a real problem.

>> Is there an easy way to publish code that does hardware AES, to allow
>> people to measure on their hardware?  If a call for that on -users turns
>> up essentially zero actual people that would be bothered, I think that
>> would be interesting.
>
> I am not quite sure what you're asking.  Correct me if I have
> misunderstood, but I suspect what you're getting at is:
>
>How can someone on netbsd<=9 test empirically whether this patch
>will have a substantial negative performance impact or not?
>
> On basically all amd64 systems of the past decade, and on most if not
> all aarch64 systems, there is essentially guaranteed to be a net
> performance improvement.  What about other systems?
>
> The best way to test this is to just boot a new kernel and try a
> workload.  But I assume you are looking for a userland program that
> one can compile and run to test it without booting a new kernel.

Yes, that's what I meant.  Kind of like "openssl speed".

> I could in a couple hours make a program that checks cpuid to detect
> hardware support and does some measurements in isolation -- to
> estimate an _upper bound_ on the system performance impact.
>
> The upper bound is likely to be extremely conservative unless your
> workload is actually reading and writing zeros to cgd on a RAM-backed
> file in tmpfs; for a realistic impact on cgd or ipsec you would have
> to take into account the disk or network throughput -- the fraction of
> it that is spent in the crypto is what the 1/3-2/3 figure applies to.

I did sort of mean "how many MB/s would the old impl do, and how many
MB/s would the new one do", realizing that actually reading/writing from
disk might overwhelm that.

I'm not sure my request is reasonable; it might help raise the comfort
level for people.

> (Note that there is no impact on userland crypto, which means no
> impact on TLS or OpenVPN or anything like that, unless for some
> bizarre reason you've turned on kern.cryptodevallowsoft and the
> userland crypto uses /dev/crypto, the solution to which is to stop
> using /dev/crypto and/or turn off kern.cryptodevallowsoft for anything
> other than testing because it's terrible (and also the apparently
> boolean nature of kern.cryptodevallowsoft is a lie).)

So it remains to make userland AES use also constant time, as a separate
step?

>> I'm unclear on openssl and hardware support; "openssl speed" might be a
>> good home for this, and I don't know if openssl needs the same treatment
>> as cgd.  (Fine to keep separable things separate; not a complaint!)
>
> OpenSSL is a mixed bag.  It has a lot more MD implementations of
> various cryptographic primitives.  But many of them are still leaky.
> So it's probably not a very good proxy for what the performance impact
> of this patch set will be.

I sort of meant putting the new code in there so it can be measured, but
I realize that's messy.

Please don't take my "is there a way" question as a demand.


Re: AES leaks, cgd ciphers, and vector units in the kernel

2020-06-18 Thread Greg Troxel
Taylor R Campbell  writes:

>> Date: Thu, 18 Jun 2020 07:19:43 +0200
>> From: Martin Husemann 
>> 
>> One minor nit: with the performance impact that high, and there being
>> setups where runtime side channel attacks are totally not an issue,
>> maybe we should leave the old code in place for now, #ifdef'd as
>> default-off options to provide time for a full reconstruction (or untill
>> the machine gets update to "the last decade" cpu)?
>
> Having leaky AES code around is asking for trouble -- and would
> require additional complexity to implement and maintain (e.g., is it
> always unhooked from the build, or do we hook it in just enough to run
> tests?), which would add further burden on an audit to verify that
> it's _not_ being used in a real application.
>
> The goals here are to make that burden completely go away by making
> the answer unconditionally no, there's essentially no danger that AES
> in the kernel is leaky; and to provide alternatives with performance
> ranging from `not worse' to `much better' to avoid the conflict that
> AES invites between performance and security.
>
> If you have a specific system where there's a real negative
> performance impact that matters to you, I would be happy to talk over
> the details and see how we can address it better.

I see your point, and I think this is probably ok, but I share Martin's
concern.

For me, the main use of cgd is to encrypt backup drives.  I am therefore
not really concerned about side channel attacks when they are attached
and keyed on the system being backed up.  (I realize other people use
cgd for other reasons.)

I don't really see the new cipher as a reasonable option for removable
disks that need to be accessed by older systems.  I can see it for
encrypted local disk.  But given AES hardware speedup, I suspect most
people can just stay with AES.

Is there an easy way to publish code that does hardware AES, to allow
people to measure on their hardware?  If a call for that on -users turns
up essentially zero actual people that would be bothered, I think that
would be interesting.

I'm unclear on openssl and hardware support; "openssl speed" might be a
good home for this, and I don't know if openssl needs the same treatment
as cgd.  (Fine to keep separable things separate; not a complaint!)



Re: makesyscalls (moving forward)

2020-06-15 Thread Greg Troxel
David Holland  writes:

> Meanwhile it doesn't belong in sbin because it doesn't require root,
> nor does doing something useful with it require root, and it doesn't
> need to be on /, so... usr.bin. Unless we think libexec is reasonable,
> but if 3rd-party code is going to be running it we really want it on
> the $PATH, so...

I agree with that logic, that makesyscalls is kind of like config, and
that /usr/bin makes sense.  There's nothing admin-ish about it, as
building an operating system is not about configuring the host.

We could have a directory for tools used only for building NetBSD that
are not otherwise useful, and put config and makesyscalls there, but
given that we aren't overwhelming bin in a way that causes trouble, that
doesn't seem like a good idea.


Re: KAUTH_SYSTEM_UNENCRYPTED_SWAP

2020-05-17 Thread Greg Troxel
Alexander Nasonov  writes:

> Greg Troxel wrote:
>> Kamil Rytarowski  writes:
>> 
>> > Is it possible to avoid negation in the name?
>> >
>> > KAUTH_SYSTEM_ENABLE_SWAP_ENCRYPTION
>> 
>> I think the point is to have one permission to enable it, which is
>> perhaps just regular root, and another to disable it if securelevel is
>> elevated.
>> 
>> So perhaps there should be two names, one to enable, one to disable.
>
> Kauth is about security rather than speed or convenience. Disabling
> encryption may improve speed but it definitely degrades your security
> level. So, you can enable vm.swap_encrypt at any level but you can't
> disable it if you care about security.

I understand that.  But there's still a question of "should there be a
KAUTH name for enabling as well as disabling", separate from "what
should the rules be".

I think everybody believes that regardless of securelevel, root should
be able to enable encrypted swap.  But probably almost everyone thinks
regular users should not be allowed to enable it.

I realize we have a lot of "root can", and that extending kauth to make
everything separate is almost certainly too much.  But when disabling is
a big deal, I think it makes sense to add names for both enabling and
disabling, to make that intent clearer in the sources.

But, I don't think this is that important, and a comment would do.


Re: KAUTH_SYSTEM_UNENCRYPTED_SWAP

2020-05-16 Thread Greg Troxel
Kamil Rytarowski  writes:

> Is it possible to avoid negation in the name?
>
> KAUTH_SYSTEM_ENABLE_SWAP_ENCRYPTION

I think the point is to have one permission to enable it, which is
perhaps just regular root, and another to disable it if securelevel is
elevated.

So perhaps there should be two names, one to enable, one to disable.


Re: Rump makes the kernel problematically brittle

2020-04-02 Thread Greg Troxel
Thor Lancelot Simon  writes:

> I'd love to see a GSoC project to actually make rump build like the
> kernel...but it may be too much work.

Good points, and improvement would be great.


Re: Rump makes the kernel problematically brittle

2020-04-02 Thread Greg Troxel
The other side of the coin to "rump is fragile" is "an operating system
without rump-style tests that can be run automatically is susceptible to
hard-to-detect failures from changes, and is therefore fragile".  There
have been many instances (usually on current-users, I think) of reports
of newly-failing tests cases, leading to rapid removal of
newly-introduced defects.




Re: Rump dependencies (5.2)?

2020-01-13 Thread Greg Troxel
Mouse  writes:

>> The rump build is done with separate reachover makefiles.  [...]
>
> Hm.  Then I think possibly the right answer for the moment is for me to
> excise rump from my tree entirely.  I can't recall ever wanting its
> functionality, and trying to figure out what the dependency graph is
> when it exists only implicitly in Makefiles scattered all over the tree
> sounds like a recipe for serious headaches.
>
> If and when it looks worth the effort, I can always back out the
> removal commit and clean up the result.  But SCM_MEMORY looks like the
> more valuable thing for my use cases for the moment.

Your tree, your call.  But it seems really obvious that you should fix
the rump build and write some atf test cases for your SCM_MEMORY stuff,
and then you will be able to test it automatically.


Re: Proposal, again: Disable autoload of compat_xyz modules

2019-09-26 Thread Greg Troxel
chris...@astron.com (Christos Zoulas) writes:

> I propose something very slightly different that can preserve the current
> functionality with user action:
>
> 1. Remove them from standard kernels in architectures where modules are
>supported. Users can add them back or just use modules.
> 2. Disable autoloading, but provide a sysctl to enable autoloading
>(1 global sysctl for all compat modules). Users can change the default
>in /etc/sysctl.conf (adds sysctl to the proposal)

I am assuming that we are talking about disabling autoloading of a
number of compat modules that are some combination of believed likely to
have security bugs and not used extensively, and this includes compat
for foreign OS, but does not, at least for now, include compat for older
NetBSD.

This situation is basically a balancing act of the needs/benefits
somehow aggregated (I will avoid "averaged") over all users.   It seems
pretty unclear how to evaluate that in total.  But, it does seem like
your single-sysctl proposal means:

  people who like compat being autoloaded can add one line in
  sysctl.conf and be back where they were

  people who want specific modules can load them and not enable the
  general sysctl

  people who don't know about any of this who try to run Linux binaries
  will lose, and presumably there'd be a line in dmesg that says which
  module failed to autoload, like
policy blocked autoloading compat_linux module; see compat_linux(8)
  which would then explain.

I'm also assuming this is being talked about for HEAD and hence 10, and
not 9.

Overall, this seems like a reasonable compromise among conflicting
goals.

If older NetBSD compat were included, I'd want to see a separate sysctl,
default-on for now.  (My guess is that wanting to disable that is a
fairly extreme position, at least these days.)


Re: build.sh sets with xz (was Re: vfs cache timing changes)

2019-09-13 Thread Greg Troxel
Martin Husemann  writes:

> On Fri, Sep 13, 2019 at 06:59:42AM -0400, Greg Troxel wrote:
>> I'd like us to keep somewhat separate the notions of:
>> 
>>   someone is doing build.sh release
>> 
>>   someone wants min-size sets at the expense of a lot of cpu time
>> 
>> 
>> I regularly do build.sh release, and rsync the releasedir bits to other
>> machines, and use them to install.  Now perhaps I should be doing
>> "distribution", but sometimes I want the ISOs.
>
> The default is MKDEBUG=no so you probably will not notice the compression
> difference that much.

I don't follow what DEBUG has to do with this, but that's not important.

> If you set MKDEBUG=yes you can just as easily set USE_XZ_SETS=no
> (or USE_PIGZGZIP=yes if you have pigz installed).

Sure, I realize I could do this.  The question is about defaults.

> The other side of the coin is that we have reproducable builds, and we
> should not make it harder than needed to reproduce our official builds.

It should not be difficult or hard to understand, which is perhaps
different than defaults.

> But ... it already needs some settings (which we still need to document
> on a wiki page properly), so we could also default to something else
> and force maximal compressions via the build.sh command line on the
> build cluster.

I could see

MKREPRODUCIBLE=yes

causing defaults of various things to be a particular way, and perhaps
letting XZ default to no otherwise.  I would hope that what
MKREPRODUCIBLE=yes has to set is not very many things, but I haven't kept
up.



Re: build.sh sets with xz (was Re: vfs cache timing changes)

2019-09-13 Thread Greg Troxel
"Tom Spindler (moof)"  writes:

>> PS: The xz compression for the debug set takes 36 minutes on my machine.
>> We shoudl do something about it. Matt to use -T for more parallelism?
>
> On older machines, xz's default settings are pretty much unusable,
> and USE_XZ_SETS=no (or USE_PIGZGZIP=yes) is almost a requirement.
> On my not-exactly-slow i7 6700K, build.sh -j4 parallel is just fine
> until it hits the xz stage; gzip is many orders of magnitude faster.
> Maybe if xz were cranked down to -2 or -3 it'd be better at not
> that much of a compression loss, or it defaulted to the higher
> compression level only when doing a `build.sh release`.

(I have not really been building current so am unclear on the xz
details.)

I'd like us to keep somewhat separate the notions of:

  someone is doing build.sh release

  someone wants min-size sets at the expense of a lot of cpu time


I regularly do build.sh release, and rsync the releasedir bits to other
machines, and use them to install.  Now perhaps I should be doing
"distribution", but sometimes I want the ISOs.

Sometimes I do builds just to see if they work, e.g. if being diligent
about testing changes.

(Overall the notion of staying with gzip in most cases, with a tunable
for extreme savings sounds sensible but I am too unclear to really weigh
in on it.)


Re: NFS lockup after UDP fragments getting lost

2019-07-31 Thread Greg Troxel
Edgar Fuß  writes:

> Thanks to riastradh@, this turned out to be caused by a (UDP, hard)
> NFS mount combined with a mis-configured IPFilter that blocked all but
> the first fragment of a fragmented NFS reply (e.g., readdir) combined
> with a NetBSD design error (or so Taylor says) that a vnode lock may
> be held across I/O, in this case, network I/O.

Holding a vnode lock across IO seems like a bug to me too.  Marking the
vnode as having an in-process operation so others can
lock/read/report-that-status/unlock seems ok.  But I'm sure you already
know that vnode locking is hard.

> It looks like the operation to which the reply was lost sometimes
> doesn't get retried. Do we have some weird bug where the first
> fragment arriving stops the timeout but the blocking of the remaining
> fragments cause it to wedge?

Probably not.  Fragments sit until the packet is complete, and only then
is the packet sent up the stack.  So the NFS code is almost certainly
totally unaware of the arrival of the first fragment.


Re: /dev/random is hot garbage

2019-07-22 Thread Greg Troxel
Taylor R Campbell  writes:

>> It would also be reasonable to have a sysctl to allow /dev/random to
>> return bytes anyway, like urandom would, and to turn this on for our xen
>> builders, as a different workaround.  That's easy, and it doesn't break
>> the way things are supposed to be for people that don't ask for it.
>
> What's the advantage of this over using replacing /dev/random by a
> symlink to /dev/urandom in the build system?
>
> A symlink can be restricted to a chroot, while a sysctl knob would
> affect the host outside the chroot.  The two would presumably require
> essentially the same privileges to enact.

None, now that I think of it.

So let's change that on the xen build host.

And, the other issue is that systems need randomness, and we need a way
to inject some into xen guests.  Enabling some with rndctl works, or at
least used to, even if it is theoretically dangerous.  But we aren't
trying to defend against the dom0.


Re: /dev/random is hot garbage

2019-07-21 Thread Greg Troxel
I don't think we should change /dev/random.   For a very long time, the
notion is that the bits from /dev/random really are ok for keys, and
there has been a notion that such bits are precious and you should be
prepared to wait.  If you aren't generating a key, you shouldn't read
from /dev/random.
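
That advice can be sketched in a few lines; the helper name below is
made up for illustration:

```c
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Non-key randomness should come from /dev/urandom, which never blocks;
 * reserve /dev/random for long-term key generation.  Loops because
 * read(2) may return fewer bytes than requested.
 */
static int
get_nonkey_random(void *buf, size_t len)
{
	int fd = open("/dev/urandom", O_RDONLY);
	size_t off = 0;

	if (fd == -1)
		return -1;
	while (off < len) {
		ssize_t n = read(fd, (char *)buf + off, len - off);
		if (n <= 0) {
			close(fd);
			return -1;
		}
		off += (size_t)n;
	}
	close(fd);
	return 0;
}
```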

So I think rust is wrong and should be fixed.

I can see the reason for frustration, but I believe that we should not
break things that are sensible because they are abused and cause
problems in some environments.

It would also be reasonable to have a sysctl to allow /dev/random to
return bytes anyway, like urandom would, and to turn this on for our xen
builders, as a different workaround.  That's easy, and it doesn't break
the way things are supposed to be for people that don't ask for it.

Also, on the xen build hosts, it would perhaps be good to turn on
entropy collection from network and disk.

Another approach, harder, is to create a xenrnd(4) pseudodevice and
hypervisor call that gets bits from the host's /dev/random and injects
them as if from a hardware rng.




Re: mknod(2) and POSIX

2019-06-18 Thread Greg Troxel
David Holland  writes:

> However, I notice that mknod(2) does not describe how to set the
> object type with the type bits of the mode argument, or document which
> object types are allowed, and mkfifo(2) also does not say whether
> S_IFIFO should be set in the mode argument or not.

This is documented quite well in the opengroup.org standards pages (the
type goes in the mode bits as S_IFIFO, and for mkfifo you just don't set
any special bits, respectively).  Agreed that fixing the man pages would
be good.

> (Though mkfifo(2) hints not by not documenting EINVAL for "The
> supplied mode is invalid", this sort of inference is annoying even in
> standards and really not ok for docs...)

https://pubs.opengroup.org/onlinepubs/9699919799/functions/mknod.html
https://pubs.opengroup.org/onlinepubs/9699919799/functions/mkfifo.html#tag_16_327

Those seem clear to me.


Re: mknod(2) and POSIX

2019-06-18 Thread Greg Troxel
Agreed with uwe@ about not mixing unrelated changes. Pretend we are
using git :-)


The patch looks fine.

Agreed that making fifos with mknod is an odd thing to do, but if it's
in posix, then we should do it unless there's something really bad about
supporting the posix usage.  In this case, it just seems silly to have a
second way to make fifos, not harmful.


mknod(2) and POSIX

2019-06-18 Thread Greg Troxel
I recently noticed that pkgsrc/sysutils/bup failed when restoring a fifo
under NetBSD because it calls mknod (in python) which calls mknod(3) and
hence mknod(2).

Our mknod(2) man page does not mention creating FIFOS, and claims

 The mknod() function conforms to IEEE Std 1003.1-1990 (“POSIX.1”).
 mknodat() conforms to IEEE Std 1003.1-2008 (“POSIX.1”).

I can't find 1990 online, but 2004 and 2008 require fifo support in
mknod:

  https://pubs.opengroup.org/onlinepubs/009695399/functions/mknod.html
  
https://pubs.opengroup.org/onlinepubs/9699919799.2008edition/functions/mknod.html

However, at least in netbsd-8, our kernel (sys/vfs_syscalls.c:do_mknod_at):

  requires KAUTH_SYSTEM_MKNOD for all callers, and hence EPERM for non-root

  has a switch on allowable types, and S_IFIFO is not one of them, and
  hence EINVAL

I realize mkfifo is preferred in our world, and POSIX says it is
preferred.  But I believe we have a failure to follow POSIX.

Other opinions?


Re: pool: removing ioff?

2019-03-16 Thread Greg Troxel
Maxime Villard  writes:

> I would like to remove the 'ioff' argument from pool_init() and friends,
> documented as 'align_offset' in the man page. This parameter allows the
> caller to specify that the alignment given in 'align' is to be applied at
> the offset 'ioff' within the buffer.
>
> I think we're better-off with hard-aligned structures, ie with __aligned(32)
> in the case of XSCALE. Then we just pass align=32 in the pool, and that's it.
>
> I would prefer to avoid any confusion in the pool initializers and drop ioff,
> rather than having this kind of marginal and not-well-defined features that
> add complexity with no real good reason.
>
> Note also that, as far as I can tell, our policy in the kernel has always
> been to hard-align the structures, and then pass the same alignment in the
> allocators.

I am not objecting as I can't make a performance/complexity argument.

But, I wonder if this comes from the Solaris allocation design, and that
the ioff notion is not about alignment for 4/8 objects to fit the way
the CPU wants, but for say 128 byte objects to be lined up on various
different offsets in different pages to make caching work better.  But
perhaps that doesn't exist in NetBSD, or is done differently, or my
memory of the paper is off.


Re: patch: debug instrumentation for dev/raidframe/rf_netbsdkintf.c

2019-01-21 Thread Greg Troxel


Christoph Badura  writes:

> On Mon, Jan 21, 2019 at 04:24:49PM -0500, Greg Troxel wrote:
>> Separetaly from debug code being careful, if it's a rule that bdv can't
>> be NULL, it's just as well to put in a KASSERT.  Then we'll find out
>> where that isn't true and can fix it.
>
> I must not be getting something.  If rf_containsboot() is passed a NULL
> pointer, it will trap with a page fault and we can get a stacktrace from
> ddb.  If we add a KASSERT it will panic and we can get a stacktrace from
> ddb.  I don't see where the benefit in that is.

The benefit is that the panic from the KASSERT is cleaner, and it
documents for readers of the function that the author believes it is a
rule.  And it will definitely fault even on machines that can
dereference NULL - that is technically if not practically architecture
dependent.

> Do you think we should add a KASSERT to document that rf_containsboot()
> does expect a valid pointer? I'd see value in that and would go ahead with
> it.

Yes.  Basically, in any kernel function, if there is a requirement that
a pointer be non-NULL, then there should be a KASSERT and the code
should then feel free to assume it is valid.

When a KASSERT is hit, the user gets a message with the KASSERT
expression and the source file/line, instead of a page fault traceback.
It's very easy and quick to go from that printout to the KASSERT that
failed.
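
A sketch of that convention, using assert(3) as a userland stand-in for
the kernel's KASSERT; the data structure and function below are invented
stand-ins for rf_containsboot(), not the real code:

```c
#include <assert.h>
#include <string.h>

/* Userland stand-in for the NetBSD kernel macro. */
#define KASSERT(e)	assert(e)

struct demo_disk {
	const char *devname;
};

/*
 * The KASSERTs both check the preconditions (under DIAGNOSTIC, in the
 * kernel) and document them for readers: after them, the body is free
 * to assume the pointers are valid.
 */
static int
contains_name(const struct demo_disk *disks, int ncol, const char *bootname)
{
	KASSERT(disks != NULL);		/* precondition: valid disk array */
	KASSERT(bootname != NULL);	/* precondition: valid name */

	for (int col = 0; col < ncol; col++)
		if (strcmp(disks[col].devname, bootname) == 0)
			return 1;
	return 0;
}
```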

Plus, adding the KASSERT, or talking about adding it, is a good way to
check if there is consensus among the other developers that this really
is a rule.  In NetBSD, people are really good at telling you you're
wrong!


Re: patch: debug instrumentation for dev/raidframe/rf_netbsdkintf.c

2019-01-21 Thread Greg Troxel
Christoph Badura  writes:

>> > +  if (bdv == NULL)
>> > +  return 0;
>> > +
>> 
>> This looked suspicious, even before I read the code.
>> The question is if it is ever legitimate for bdv to be NULL.
>
> That is an excellent point.  The short answer is, no it isn't.  And it
> never was NULL in the code that used it.  I got a trap into ddb because of
> a null pointer deref in the DPRINTF that I changed (in the 4th hunk of my
> patch).
>
>> I am a fan of having comments before every function declaring their
>> preconditions and what they guarantee on exit.  Then all uses can be
>> audited if they guarantee the the preconditions are true.  This approach
>> is really hard-core in eiffel, known as design by contract.
>
> Yes, I totally agree.  Also to the rest of your message that I didn't quote.
>
> When I prepared the patch yesterday I was about to delete the above change
> because at first I couldn't remember why I added it ~3 weeks ago.  That
> should have raised a big fat warning sign.
>
> I thought about adding a comment after I read your private mail
> earlier today.  In the end I decided it is better to not change
> rf_containsboot() and instead introduce a wrapper for the benefit of the
> DPRINTF.

Separately from debug code being careful, if it's a rule that bdv can't
be NULL, it's just as well to put in a KASSERT.  Then we'll find out
where that isn't true and can fix it.



Re: patch: debug instrumentation for dev/raidframe/rf_netbsdkintf.c

2019-01-21 Thread Greg Troxel
Christoph Badura  writes:

> Here is some instrumentation I found useful during my recent debugging.
> If there are no objections, I'd like to commit soon.
>
> The change to rf_containsroot() simplifies the second DPRINTF that I added.
>
> Index: rf_netbsdkintf.c
> ===
> RCS file: /cvsroot/src/sys/dev/raidframe/rf_netbsdkintf.c,v
> retrieving revision 1.356
> diff -u -r1.356 rf_netbsdkintf.c
> --- rf_netbsdkintf.c  23 Jan 2018 22:42:29 -  1.356
> +++ rf_netbsdkintf.c  20 Jan 2019 22:32:14 -
> @@ -472,6 +472,9 @@
>   const char *bootname = device_xname(bdv);
>   size_t len = strlen(bootname);
>  
> + if (bdv == NULL)
> + return 0;
> +
>   for (int col = 0; col < r->numCol; col++) {
>   const char *devname = r->Disks[col].devname;
>   devname += sizeof("/dev/") - 1;

This looked suspicious, even before I read the code.

The question is if it is ever legitimate for bdv to be NULL.

I am a fan of having comments before every function declaring their
preconditions and what they guarantee on exit.  Then all uses can be
audited if they guarantee the the preconditions are true.  This approach
is really hard-core in eiffel, known as design by contract.

In NetBSD, many functions have KASSERT at the beginning.  This checks them
(under DIAGNOSTIC) but it also is a way of documenting the rules.

From a quick glance at the code it seems obvious that it's not ok to
call these functions with a NULL bdv.

So if bdv is an argument and not allowed to be NULL, then early on in
that function, where you check/return, there should be

  KASSERT(bdv != NULL);

Not really on point, but as a caution there should be no behavior change
in any function under DIAGNOSTIC, if the code is bug free and
preconditions are met.  So "if something we can rely on isn't true,
panic" is fine, but many other things are not.
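A minimal user-space sketch of that shape (the function name and
arguments here are hypothetical, not the real rf_containsboot()
signature, and assert(3) stands in for the kernel's KASSERT): the
precondition is checked before the pointer is used, not after it has
already been dereferenced the way the quoted patch does.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in for a function like rf_containsboot(): the
 * precondition on bootname is asserted up front (KASSERT under
 * DIAGNOSTIC in the kernel), before any dereference of it.
 */
static int
contains_name(const char *devnames[], int ncols, const char *bootname)
{
	assert(bootname != NULL);	/* documented precondition */

	for (int col = 0; col < ncols; col++) {
		if (strcmp(devnames[col], bootname) == 0)
			return 1;
	}
	return 0;
}
```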


Re: Importing libraries for the kernel

2018-12-12 Thread Greg Troxel
m...@netbsd.org writes:

> I don't expect there to be any problems with the ISC license. It's the
> preferred license for OpenBSD and we use a lot of their code (it's
> everywhere in sys/dev/)

Agreed that the license is ok.

> external, as I understand it, means "please upstream your changes, and
> avoid unnecessary local changes".

Agreed.

And also that we have a plan/expectation of tracking changes and
improvements that upstream makes.  Code that is not in external more or
less implies that we are the maintainer.  For these libraries, my
expectation is that they are being actively maintained and that we will
want to update to newer upstream versions from time to time.


Re: noatime mounts inhibiting atime updates via utime()

2018-12-05 Thread Greg Troxel
Edgar Fuß  writes:

>> Honestly, I think atime is one of the dumbest thing ever.
> We occasionally use them to find out (or have a first guess at):
> -- has anyone used libfoobar last year?
> -- who uses kbaz, i.e. has /home/xyz/.config/kbaz.conf been accessed?
>
> We use snapshots to run backups, so atimes are not touched by them.

I fairly often look at atimes to find out if old libraries have been
used, and various other things.

I have also had a test that tried to use utime fail on a machine that
was noatime.

So the notion that noatime should mean what it does now, but allow
explicit writes sounds good.

I don't see any value in changing the naming of the flags.  Having a fs
write atime updates unless mounted noatime seems fine, and if people
want noatime that's easy.  I would be opposed to e.g. dropping the
noatime option, making noatime default, and adding an atime option.
That's just churn violating historical norms for no good reason.

There's a question of what the default for installs should be, and I
don't have a real opinion about that.


It would be good to have stats about writes, separately including atime
updates.  Right now we know it causes writes but I haven't seen data.
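For concreteness, an explicit atime write looks like the sketch below
(assuming POSIX utimes(2); set_atime() is a made-up helper name).  The
argument above is that this explicit write should succeed even on a
noatime mount, since noatime is about implicit updates on read.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/time.h>
#include <unistd.h>

/*
 * Set a file's atime explicitly while preserving its mtime.  noatime
 * suppresses the implicit atime update on read; it arguably should not
 * block a deliberate utimes(2) call like this one.
 */
static int
set_atime(const char *path, time_t when)
{
	struct stat st;

	if (stat(path, &st) == -1)
		return -1;

	struct timeval tv[2] = {
		{ .tv_sec = when,        .tv_usec = 0 },	/* atime */
		{ .tv_sec = st.st_mtime, .tv_usec = 0 },	/* mtime kept */
	};
	return utimes(path, tv);
}
```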


Re: fixing coda in -current

2018-11-26 Thread Greg Troxel
m...@netbsd.org writes:

> On Sun, Nov 25, 2018 at 08:05:21PM -0500, Greg Troxel wrote:
>> However, I am pleased to report that the coda people have said that they
>> are working on a fuse interface, although it's expected to be slower.
>> We'll see, both if it materializes and how fast it is.
>
> That'd be neat.
> ... can we get general consensus about removing kernel coda if that
> happens, and the FUSE implementation works for netbsd too?
> dholland speaks poorly of it, we don't have a volunteer to write out
> tests, and it has a history of breakage.

Getting consensus is hard enough that I would prefer to defer that
until we see where we are.

The breakage history from NetBSD VFS changes isn't really that bad -- a
few times in 20 years, and it has caused very little trouble for others.



Re: fixing coda in -current

2018-11-26 Thread Greg Troxel
m...@netbsd.org (Emmanuel Dreyfus) writes:

> Greg Troxel  wrote:
>
>> However, I am pleased to report that the coda people have said that they
>> are working on a fuse interface, although it's expected to be slower.
>
> FUSE vs kernel does not really matter when we deal with network
> filesystem performance. The latency of requesting a network operation is
> orders of magnitude higher than issuing a few system calls.

That's true when the file has to be fetched.

Coda, like AFS, caches files in normal operation, and there are read
lock callbacks.  So the first fetch is over the network and slow, and
subsequent reads are at nearly the speed of the underlying filesystem.
It is this speed that people are talking about.




Re: fixing coda in -current

2018-11-25 Thread Greg Troxel


David Holland  writes:

> So I have no immediate comment on the patch but I'd like to understand
> better what it's doing -- the last time I crawled around in it
> (probably 7-8 years ago) it appeared to among other things have an
> incestuous relationship with ufs_readdir such that if you tried to use
> anything under than ffs as the local container it would detonate
> violently. But I never did figure out exactly what the deal was other
> than it was confusing and seemed to violate a lot of abstractions.
>
> Can you clarify? It would be nice to have it working properly and
> stuff like the above is only going to continue to fail in the future...

I didn't read this patch carefully, and I'm not Brett.  But the basic
scheme is that a container file representing a directory is in a
particular format.  This has been a source of issues when there was an
alignment change in directory reading.  My impression is that the way it
should be is that a container file that's a directory should be read in
ufs format, regardless of the container filesystem type.  I am not sure
that's the way the code is.

However, I am pleased to report that the coda people have said that they
are working on a fuse interface, although it's expected to be slower.
We'll see, both if it materializes and how fast it is.


Re: fixing coda in -current

2018-11-20 Thread Greg Troxel
I volunteer to bug Satya about using FUSE instead of a homegrown
(pre-FUSE) kernel interface.

I am unaware of anything else that allows writes while disconnected and
reintegrates them.  I have actually done that, both on purpose and for
several days while my IPsec connection was messed up, and it really
worked.



Re: fixing coda in -current

2018-11-20 Thread Greg Troxel
I used to use it, and may again.  So I'd like to see it stay, partly
because I think it's good to keep NetBSD relevant in the filesystem
research world.  I am expecting to see new upstream activity.

But, I think it makes sense to remove it from GENERIC, and perhaps have
whatever don't-autoload treatment, so that people only get it if they
explicitly ask for it.  That way it should not bother others.





Re: Things not referenced in kernel configs, but mentioned in files.*

2018-11-12 Thread Greg Troxel


co...@sdf.org writes:

> So, I am excluding things that appear in ALL, and I am not checking if

But ALL is an x86 thing, currently.

> they appear as modules.

Interesting, but I suppose they then belong in ALL also.

> So far I had complaints about the appearance of 'lm' which cannot be
> safely included in a default kernel, for example.

Sure, lots of things are not ok in GENERIC, but do those concerns apply
to it being in ALL?


Re: Things not referenced in kernel configs, but mentioned in files.*

2018-11-12 Thread Greg Troxel


co...@sdf.org writes:

> This is an automatically generated list with some hand touchups, feel
> free to do whatever with it. I only generated the output.
>
> ac100ic
> acemidi
> acpipmtr
> [snip]

I wonder if these are candidates to add to an ALL kernel, and if it will
turn out that they are mostly not x86 things.

I see we only have ALL for i386/amd64.  I wonder if it makes sense to
have one in evbarm.


Re: NetBSD-8 kernel too big?

2018-08-30 Thread Greg Troxel

Two thoughts:

  When trimming, ls -lSr in the kernel build directory will identify
  large objects.

  We have had kernel modules for a while, but I'm not entirely clear on
  where we are.  I would think that moving to a mode of aggressively not
  including things that can be modules and loading them from the fs as
  needed would help, particularly if the issue is the bootloader, vs
  memory used up when running.   This is not built as part of an -8
  release build, but there is MODULAR in the conf directory.
  



signature.asc
Description: PGP signature


Re: How to prevent a mutex being _enter()ed from being _destroy()ed?

2018-08-10 Thread Greg Troxel

Edgar Fuß  writes:

> I know very little about locking so that's probably a stupid question, but:

Don't worry - locking is probably the hardest thing to get right.

> Is there a general scheme/rule/proposal how to prevent a mutex that someone 
> is trying to mutex_enter() from being torn down by someone else calling 
> mutex_destroy() on it during that?

Not really.  Basically it's a bug to destroy a mutex that could possibly
be in use.  So there has to be something else protecting getting at the
mutex that is being destroyed.

> Specifically, I'm trying to understand what should prevent a thread from 
> destroying a socket's lock (by sofree()/soput()) while unp_gc() is trying 
> to acquire that lock.

I would expect (without reading the code) that there would be some lock
on the socket structure (using list here; the type is not the point),
and there would be a

  acquire socket_list lock
  find socket
  lock socket
  unlock socket_list

or alternatively

  acquire socket_list lock
  find socket
  unlink socket from the list
  unlock socket_list

  do whatever to the socket

So there has to be a rule about when various things are valid based on
being in various higher-level data structures.  In an ideal world this
rule would be clearly explained in the source code.  Ancient BSD
tradition is not to explain these things :-(
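A user-space sketch of that rule with POSIX threads (the structure and
function names are invented for illustration, not the actual kern/uipc
code): the per-object lock is acquired while the list lock is still
held, and the lock is only destroyed after the object has been unlinked
from the list, so no new lookup can reach it.  A real implementation
additionally needs a reference count or similar so the object is not
destroyed while another thread still holds its lock.

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical object with a per-object lock, kept on a global list. */
struct sock {
	pthread_mutex_t	lock;
	int		id;
	struct sock	*next;
};

static pthread_mutex_t list_lock = PTHREAD_MUTEX_INITIALIZER;
static struct sock *sock_list;

/*
 * Look up an object: the list lock protects the walk, and the object's
 * own lock is taken before the list lock is dropped, so the object
 * cannot be unlinked (and hence destroyed) out from under us.
 */
static struct sock *
sock_lookup(int id)
{
	pthread_mutex_lock(&list_lock);
	for (struct sock *sp = sock_list; sp != NULL; sp = sp->next) {
		if (sp->id == id) {
			pthread_mutex_lock(&sp->lock);
			pthread_mutex_unlock(&list_lock);
			return sp;	/* returned locked */
		}
	}
	pthread_mutex_unlock(&list_lock);
	return NULL;
}

/*
 * Tear down: unlink under the list lock first; only then is it safe to
 * destroy the per-object lock, because no new lookup can find it.
 */
static void
sock_destroy(struct sock *sp)
{
	pthread_mutex_lock(&list_lock);
	for (struct sock **spp = &sock_list; *spp != NULL;
	    spp = &(*spp)->next) {
		if (*spp == sp) {
			*spp = sp->next;
			break;
		}
	}
	pthread_mutex_unlock(&list_lock);
	pthread_mutex_destroy(&sp->lock);
	free(sp);
}
```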




Re: new errno ?

2018-07-06 Thread Greg Troxel

Phil Nelson  writes:

> Hello,
>
> In working on the 802.11 refresh, I ran into a new errno code from 
> FreeBSD:
>
> #define EDOOFUS 88  /* Programming error */
>
> Shall we add this one?  (Most likely with a different number since 88 is 
> taken
> in the NetBSD errno.h.)
>
>I could use EPROTO instead, but 

My immediate reaction is not to add it. It's pretty clearly not in
posix, unlikely to be added, and sounds unprofessional.

It seems like it would be used in cases where there is a KASSERT in the
non-DIAGNOSTIC case.  I might just map it to EFAULT or EINVAL.




Re: ./build.sh -k feature request (was Re: GENERIC Kernel Build errors with -fsanitize=undefined option enabled)

2018-06-25 Thread Greg Troxel

[grudgingly keeping tech-kern and current-users both :-]

Reinoud Zandijk  writes:

> @@ -1084,6 +1084,9 @@ Usage: ${progname} [-EhnoPRrUuxy] [-a ar
> Should not be used without expert knowledge of the build 
> system.
>  -h Print this help message.
>  -j njobRun up to njob jobs in parallel; see make(1) -j.
> +-k Continue processing after errors are encountered, but 
> only on
> +   those targets that do not depend on the target whose 
> creation
> +   caused the error.
>  -M obj Set obj root directory to obj; sets MAKEOBJDIRPREFIX.
> Unsets MAKEOBJDIR.
>  -m machSet MACHINE to mach.  Some mach values are actually

a few unrelated comments:

I think it's pretty clear that if you don't pass -k, nothing will
change.  So people that don't like -k or think it's unsafe can just not
pass it.

It would be nice if invoking build.sh with -k resulted in two
properties: if any subpart fails, getting a non-zero exit status from
build.sh, and having that be clear from the end of the log.  Currently
'make -k' in pkgsrc results in -k being passed to the WRKSRC make
invocation, which results in a zero status, which results in a
.build_done cookie, which is wrong.

The full release build is a number of substeps.   It makes sense to use
-k for the main build of the system, after the tools build, to get the
"find all problems" property you want.  But it's not clear to me that it
makes sense to build the main system if the tool build fails.   And it's
not clear that sets should be produced.  Therefore, I wonder if passing
-k to make should be limited to the main build, and the main build
should be deemed to fail if there are any errors, so that there are no
sets produced in that case.







Re: i2c and indirect vs. direct config

2018-05-30 Thread Greg Troxel

(I am a noob at i2c.)

Your points about explicit config make a lot of sense; reminds me of
qbus and isa bus where you have to know.

However, baking into the kernel is unfortunate, and I wonder if it makes
sense to have the i2c plan either in a boot variable or as something
that can configure them after boot, sort of like gpio.conf.




Re: cpu1: failed to start (amd64)

2018-05-30 Thread Greg Troxel

It feels to me like you might be having two problems: SMP/cpu and USB.
I do not understand if they are related or not.

Assuming you have a working netbsd-7 machine (i386 is fine), it might be
best to build a netbsd-8 kernel and debug there, since 8 has many fixes
since 7.  You have built a kernel, but you may find BUILD-NetBSD from
pkgsrc/sysutils/etcmanage useful; that's my heavily annotated invocation
of build.sh.  Your choice of debug options sounds good as a first step.

First, I suspect that "no ACPI" on a modern machine is just not going to
work, as that's how many things are configured.  My impression is that
disabling ACPI is appropriate on hardware that just barely has ACPI
support, and that support is buggy.   So I'm going to not address the
"no ACPI" case.

This is strange, because while I'm not familiar with that mobo model, it
and the CPU sound very normal.

I wonder if your motherboard's BIOS is up to date.  It might be that the
kernel is getting bad ACPI info.

With 4 cpus, there is something going wrong, and I haven't seen this
before.  You could look in the kernel sources for "failed to start" and
see if you can understand the code.  It may help to print out whatever
information is being used to try to start the other cpus, but I have no
idea what that is.

  db{0}> bt
  vmem_alloc() at netbsd:vmem_alloc+0x3f
  uvm_km_kmem_alloc() at netbsd:uvm_km_kmem_alloc+0x46
  kmem_intr_alloc() at netbsd:kmem_intr_alloc+0x6d
  kmem_intr_zalloc() at netbsd:kmem_intr_zalloc+0xf
  mpbios_scan() at netbsd:mpbios_scan+0x4cd
  mainbus_attach() at netbsd:mainbus_attach+0x2d0
  config_attach_loc() at netbsd:config_attach_loc+0x16e
  cpu_configure() at netbsd:cpu_configure+0x26
  main() at netbsd:main+0x2a3

This looks like mpbios_scan has asked to allocate memory in some
unreasonable or crazy amount.  Really that should not fault/panic, but
if you are able to read mpbios_scan (maybe even disassemble to find the
C line for 0x4cd) and add sanity checking before alloc, that might lead
to figuring it out.
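Such a sanity check could be as simple as the sketch below
(MPBIOS_ALLOC_MAX is an invented bound and checked_alloc() a made-up
name; the point is only to bound a firmware-derived size before handing
it to the allocator, which in the kernel would sit just before the
kmem allocation in mpbios_scan).

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical sanity bound for a size computed from MP BIOS tables. */
#define MPBIOS_ALLOC_MAX	(1024 * 1024)

/*
 * Refuse obviously bogus sizes instead of passing them straight to the
 * allocator, so bad firmware data produces a diagnostic rather than an
 * unreasonable allocation attempt.
 */
static void *
checked_alloc(size_t len)
{
	if (len == 0 || len > MPBIOS_ALLOC_MAX) {
		fprintf(stderr, "refusing bogus allocation of %zu bytes\n",
		    len);
		return NULL;
	}
	return malloc(len);
}
```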

Interestingly, the product id is different for each (so I am guessing
they are all on one chipset).

  The kernel boots if I remove
uhci* at pci? dev ? function ?
  but then the USB drive is not detected and the boot device is not found.

as expected not to be found, but good that everything else is ok.
Presumably there is no 30s delay?

  The system has five uhci entries across two dev numbers. Enabling

Yes, I see

  uhci0 at pci0 dev 26 function 0: vendor 0x8086 product 0x2834 (rev. 0x02)
  uhci1 at pci0 dev 26 function 1: vendor 0x8086 product 0x2835 (rev. 0x02)
  ehci0 at pci0 dev 26 function 7: vendor 0x8086 product 0x283a (rev. 0x02)

  uhci2 at pci0 dev 29 function 0: vendor 0x8086 product 0x2830 (rev. 0x02)
  uhci3 at pci0 dev 29 function 1: vendor 0x8086 product 0x2831 (rev. 0x02)
  uhci4 at pci0 dev 29 function 2: vendor 0x8086 product 0x2832 (rev. 0x02)
  ehci1 at pci0 dev 29 function 7: vendor 0x8086 product 0x2836 (rev. 0x02)

Note that your system has ehci controllers, which handle USB2.  Your
flash drive is on usb6, which is ehci1.  The way USB2 works is that the
ehci controllers have the ports and USB 1.x devices are handed off to
the companion uhci (or, on non-Intel chipsets, ohci) controllers.  So I
wonder if you disabled those, or if it's only the uhci one you
disabled.  But it looks like sd0
attaches in your posted dmesg.

  specific
  devices such as:
uhci4 at pci0 dev 29 function 2
  allows the kernel to boot, but I have not yet got it to detect the USB
  drive in any combinations I have tried so far.

But your posted dmesg attaches?

  If I enable all five devices specifying dev and function numbers then it
  boots but pauses for a very long time (maybe 30+ seconds). I wondered if
  there USB retries/errors not being displayed so I turned on USBVERBOSE
  but saw no additional output.

So what you posted is with all 5 uhci lines, no uhci wildcard, and you
didn't change the ehci wildcard?


So it seems that something is matching the uhci driver, but when the
attach runs it is crashing, perhaps on some device which is somewhere
else.  You can use "pcictl pci0 list" and look up the ids for anything
odd, and then for the other buses.

And yes, if you can set up a serial console, at least to be used when
booting (even if the bios doesn't really cope, if you do boot with
consdev set), and capture, then you can add debugging and maybe figure
out more what's wrong.  I suspect that if you know exactly what's wrong,
this is not too hard to fix, adding some sort of quirk to not believe
something from ACPI, or substitute something sane, exclude some device
id, etc.

With serial you can also setup kgdb, but I'm not sure how soon in boot
that is set up relative to the crash.  This lets you run gdb on another
machine and debug the kernel remotely, with full source listings.  But
ddb is quite useful.

The 30s delay could be a third thing wrong.




Re: Reading a DDS tape with 2M blocks

2018-01-09 Thread Greg Troxel

Edgar Fuß  writes:

> I have a DDS tape (written on an IRIX machine) with 2M blocks.
> Any way to read this on a NetBSD machine?
> My memories of SCSI ILI handling on DDS are fuzzy. I remember you can operate 
> these tapes in fixed or variable block size mode, where some values in the 
> CDB 
> either mean blocks or bytes. I thought in variable mode, you could read block 
> sizes other than the (virtual) physical block size of the tape.

Did you try

dd if=/dev/rsd0d of=FILE bs=2m

or similar?  I believe that dd does reads of the given bs and these
reads are passed to the tape device driver which then does reads of that
size from the hardware, and that this then works fine.




Re: Merging ugen into the usb stack

2017-12-11 Thread Greg Troxel

Martin Husemann <mar...@duskware.de> writes:

> On Mon, Dec 11, 2017 at 08:24:00AM -0500, Greg Troxel wrote:
>> I wonder if we should be attaching drivers to endpoints, rather than
>> devices.
>
> This is the drivers decision (we have drivers that do it).
>
> However, ugen is not able to attach to a single interface itself (currently).

Well, I guess I think it's better to allow drivers to attach to single
interfaces in general, than to make ugen special and try to integrate
it.  But I haven't looked at things enough to justify that opinion.





Re: Merging ugen into the usb stack

2017-12-11 Thread Greg Troxel

Martin Husemann  writes:

> However, it can not work with the way NetBSD uses ugen devices:
>
> uftdi0 at uhub3 port 2
> uftdi0: FTDI (0x9e88) SheevaPlug JTAGKey FT2232D B (0x9e8f), rev 2.00/5.00, 
> addr 3
> ucom0 at uftdi0 portno 1
> ucom1 at uftdi0 portno 2
>
> I can disable the ucom at uftdi0 portno 1, but there is no way to get a ugen
> device to attach there.
>
> The uftdi chip itself offers a separate interface for each of the ports,
> at that layer there should not be a problem.

I wonder if we should be attaching drivers to endpoints, rather than
devices.  It seems fairly common to have multiple endpoints doing
different things (among more-than-one-thing devices), rather than
multiple devices behind a hub.

Letting ugen attach to endpoints that drivers don't deal with seems like
an entirely reasonable solution, and it seems to have lower risk of
problems from things we can't predict.

I also wonder what other operating systems do here (beyond point
solutions that they've had to do).




Re: Proposal: Disable autoload of compat_xyz modules

2017-09-10 Thread Greg Troxel

Manuel Bouyer  writes:

> On Sun, Sep 10, 2017 at 12:17:58PM +0200, Maxime Villard wrote:
>> Re-thinking about this again, it seems to me we could simply add a flags
>> field in modinfo_t, with a bit that says "if this module is builtin, then
>> don't load it". To use compat_xyz, you'll have to type modload, and the
>> kernel will load the module from the builtin list.
>
> If I compile a kernel with a built-in module, I expect this module to
> be active. Otherwise I don't compile it.

But maxv@ is not talking about you deciding to compile a kernel and
putting in a line for a module.  The question is about compat modules
that are in GENERIC, and how to choose defaults so that users who want
to use them aren't inconvenienced and that users who don't want to use
them don't have reduced security.

Reading maxv@'s suggestion, I wondered about autoload of non-built-in
modules (but maybe that is already disabled).  My quick reaction is that
it would be nice if the "don't autoload" flag had the same behavior for
builtin and non-builtin modules, so that builtin/not is just a linking
style thing, and not more.

But I see your point about respecting explicit configuration.

So I wonder about (without providing a patch of course):

  having a per-compiled-module flag to disable autoload, as suggested
  (in builtin and not, unless I'm confused)

  set the noautoload flag to true in modules that are deemed an
  unnecessary risk to people who have not made a choice to use them

  [so far this is maxv's proposal, I think]

  expand config(8) to be able to set "noautoload", so that if a module
  is included as part of a kernel, it will be marked noautoload if and
  only if the flag is on the line, regardless of defaults.  This would
  not affect the modules in stand; they'd still have the default value
  of the noautoload flag from the default

  add the noautoload flag to in-tree kernel configs for the above modules

which means that in Manuel's custom kernel he can just leave out the
noautoload flag and then that kernel will behave as always.

People trying to run a MODULAR kernel would still need to either edit
their module sources to change the flag (which if you are a MODULAR
type, is more or less like editing GENERIC) or do manual modload.


Overall I find this disabling of things by default but leaving them in
far preferable to not building them or removing them from sources in
terms of getting to a better place in the security/usability trade
space.




Re: kernel aslr: someone interested?

2017-03-25 Thread Greg Troxel

Maxime Villard  writes:

> I would also add - even if it is not a relevant argument - that most
> "commonly-used" operating systems do have kernel aslr: Windows, Mac, Linux,
> etc.

There's another point, which various people may also consider invalid :-)

In the US, there's a federal computer security standard NIST 800-53, and
essentially a subset of that NIST 800-171, and more or less all federal
contractors handling non-public information have to implement it.  There
are a lot of security controls, and exploit mitigation is one of them.

I am not claiming that kernel ASLR is a requirement.  But, I would hate
to see people in these environments be told not to use NetBSD because it
lacks some security controls compared to alternatives.




Re: spurious DIAGNOSTIC message "no disk label"

2016-12-26 Thread Greg Troxel

I think it's wrong to print out messages like that because DIAGNOSTIC is
defined.   DIAGNOSTIC is supposed to just add KASSERT (was panic, long
ago) about conditions that must be true unless the kernel is buggy.

Separately, given that there's no rule that all disks must have labels,
it seems wrong of the kernel to print this.  Certainly readdisklabel()
or something can return an error, and a caller can do something, but
IMHO external input shouldn't trigger printfs like this.

So I would be inclined to drop the printf, but as you say you might want
to figure out why there is more than one attempt  to read the label.





Re: Howto use agr to aggregate VPN tunnels

2016-12-14 Thread Greg Troxel

BERTRAND Joël  writes:

>   Hi,
>
>   I have seen in manual :
>
> There is no way to configure agr interfaces without attaching physical
> interfaces.
>
>   Is tap considered as physical interface or not ? tap has MAC
> address thus I think that is not a limitation. And agr created with
> tap0 and tap1 uses tap0 MAC address.

They are not actually physical of course, but I don't see any reason it
should not work.   However, if no one has tried and fixed any bugs that
stop it from working, it might well not.   So I suspect digging in with
gdb or printf might help.

Also, I would suggest setting up agr with two normal ethernet interfaces
to be really sure you are doing everything else right.




Re: CPUs and processes not evenly assigned?!

2016-11-26 Thread Greg Troxel

Hubert Feyrer  writes:

> On Fri, 11 Nov 2016, Michael van Elst wrote:
>> Since we don't have floating point the computation should be done in
>> fixed point arithmetic, e.g.
>>
>>  r_avgcount = (A * r_avgcount + B * INT2FIX(r_mcount)) / (A + B);
>>
>> With the current A=B=1 you get alpha=0.5, but other values are thinkable
>> to make the balancer decide on the short term thread count or an even
>> longer term moving average.
>>
>> Using one fractional bit for INT2FIX by multiplying by two might not
>> be enough.
>
> I see a lot of ground for more research here, determining right amount
> of bits and A and B. To sum up our options at this point:

I see two separate issues.  One is the mixing ratio of the old average
and the new value.  The other is how values are scaled to have a
representation with enough precision.

The patch to multiply by 4, but still adding and dividing by 2, seems to
me to have the average count be 4* the true value, and provide 2 bits of
fraction.  As long as counts are compared and not used in an absolute
sense, I don't see any problems with that approach.

> a) leave the situation as-is and wait for research to get a perfect formula
> b) commit the patch we have and wait for the research to be done
>
> Given that the existing patch in PR kern/43561 and PR kern/51615 does
> improve the current situation, I'd vote for option "b".

I concur with b.





Re: FUA and TCQ

2016-09-23 Thread Greg Troxel

Johnny Billquist  writes:

> With rotating rust, the order of operations can make a huge difference
> in speed. With SSDs you don't have those seek times to begin with, so
> I would expect the gains to be marginal.

For reordering, I agree with you, but the SSD speeds are so high that
pipelining is probably necessary to keep the SSD from stalling due to not
having enough data to write.  So this could help move from 300 MB/s
(that I am seeing) to 550 MB/s.




Re: struct file reference at VFS level

2016-04-23 Thread Greg Troxel

Joerg Sonnenberger <jo...@bec.de> writes:

> On Fri, Apr 22, 2016 at 10:42:10AM -0400, Greg Troxel wrote:
>> I still don't understand why this is about FUSE.  What if a file were
>> opened without O_NONBLOCK and then the same file were opened with?
>
> O_NONBLOCK is pretty much pointless for regular files. It only really
> changes something for sockets, pipes and the like and they behave
> different already.

Sure, but I meant to include especially character special files.
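A small demonstration of why the flag is a no-op for regular files
(sketch only; read_nonblock() is an invented helper): a read through an
O_NONBLOCK descriptor on a regular file returns the data immediately,
exactly as a blocking read would.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/*
 * Open a regular file with O_NONBLOCK and read from it.  For regular
 * files the flag changes nothing; it only matters for sockets, pipes,
 * and (especially) character special files.
 */
static ssize_t
read_nonblock(const char *path, char *buf, size_t len)
{
	int fd = open(path, O_RDONLY | O_NONBLOCK);
	if (fd == -1)
		return -1;
	ssize_t n = read(fd, buf, len);
	close(fd);
	return n;
}
```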



