Re: hang in vcache_vget()
Emmanuel Dreyfus writes:

> Hello
>
> I experienced a system freeze on NetBSD-10.0/i386. Many processes
> waiting on tstile, and one waiting on vnode, with this backtrace:
> sleepq_block
> cv_wait
> vcache_vget
> vcache_get
> ufs_lookup
> VOP_LOOKUP
> lookup_once
> namei_tryemulroot.constprop.0
> namei
> vn_open
> do_open
> do_sys_openat
>
> I regret I did not take the time to show the vnode.
>
> Is it worth a PR? I have no clue if it can be reproduced.

I would say yes, it's worth it.

I have had hangs on 10/amd64, on a system with 32 GB of RAM. I have been blaming zfs, but my "never hangs" experience has been on 9/ufs. But, others say zfs is fine.

I just came across the "threads leak memory" problem pointed out by Brian Marcotte, and found a 17G gpg-agent. I now wonder if whatever is hanging is being provoked by running out of memory. Still a bug, but I no longer feel my "new problem" can be pointed at zfs.

Do you think your system had high memory pressure at the time of your crash?

(Sort of off topic, you should know that because 32-bit computers are no longer manufactured, the rust project thinks you shouldn't be using them. Take it to the ewaste center right away!)
Re: poll(): IN/OUT vs {RD,WR}NORM
Johnny Billquist writes:

> POLLPRI     High priority data may be read without blocking.
>
> POLLRDBAND  Priority data may be read without blocking.
>
> POLLRDNORM  Normal data may be read without blocking.

Is this related to the "oob data" scheme in TCP (which is a hack that doesn't work)? Where do we attach 3 priority levels to data?
Re: Forcing a USB device to "ugen"
Jason Thorpe writes:

> I should be able to do this with OpenOCD (pkgsrc/devel/openocd), but
> libftdi1 fails to find the device because libusb1 only deals in
> "ugen".

Is that fundamental, in that ugen has ioctls that are ugen-ish that uftdi does not? I am guessing you thought about fixing libusb1.

> The desire to use "ugen" on "interface 1" is not a property of
> 0x0403,0x6010, it's really a property of
> "SecuringHardware.com","Tigard V1.1". Unfortunately, there isn't a
> way to express that in the kernel config syntax.
>
> I think my only short-term option here is to, in uftdi_match(), specifically
> reject based on this criteria:
>
> - VID == 0x0403
> - PID == 0x6010
> - interface number == 1
> - vendor string == "SecuringHardware.com"
> - product string == "Tigard V1.1"
>
> (It's never useful, on this particular board, to use the second port as a
> UART.)

That seems reasonable to me. It seems pretty unlikely to break other things.
Re: Polymorphic devices
Brad Spencer writes:

> I don't know just yet, but there might be an unwanted device reset with
> the "use the one you open" technique. That is, you might have to reset
> the chip to change mode, and if you support say, I2C and GPIO at the
> same time (which is possible), but then change to just GPIO, the chip
> has to be reset and that will disrupt any setting you might have set
> (I think, I am still working out what needs to happen with the mode
> switches). This may not matter in the bigger picture and it wouldn't
> matter as much if the mode switch was a sysctl, which one can say will
> reset the chip anyway.

Interesting complexity, but I'd say state the user has asked for should live in the driver, and if the driver has to write that state again on a mode switch, so be it. Generally if you open a device and close it you don't have much grounds to expect things you did to persist to the next session, but devices have device-specific semantics anyway.
Re: Polymorphic devices
Brad Spencer writes:

> The first is enhancements to uftdi to support the MPSSE engine that some
> of the FTDI chip variants have. This engine allows the chip to be an I2C
> bus, SPI bus and provides some simple GPIO, and a bunch of other stuff,
> as well as the normal USB UART. It is not possible to use all of the
> modes at the same time. That is, these are not separate devices, but
> modes within one device. Or another way, depending on the mode of the
> chip you get different child devices attached to it. I am curious
> what the thoughts are on how this might be modeled.

My reaction without much thought is to attach them all and to have the non-selected ones return ENXIO or similar, and to have another device on which you call an ioctl to choose which device to enable. Or perhaps, to let you open any of them, flipping the mode, and to fail the 2nd simultaneous open.
Re: Maxphys on -current?
Brian Buhrow writes:

> hello. I know that this has been a very long term project, but I'm
> wondering about the status of this effort? I note that FreeBSD-13 has a
> MAXPHYS value of 1048576 bytes.
> Have we found other ways to get more throughput from ATA disks that
> obviate the need for this setting which I'm not aware of?
> If not, is anyone working on this project? The wiki page says the
> project is stalled.

I haven't heard that anyone is.

When you run dd with bs=64k and then bs=1m, how different are the results? (I believe raw requests happen accordingly, vs MAXPHYS for fs etc. access.)
Re: RFC: Native epoll syscalls
Mouse writes:

>> It is definitely a real problem that people write linuxy code that
>> seems unaware of POSIX and portability.
>
> While I feel a bit uncomfortable appearing to defend the practice (and,
> to be sure, it definitely can be a problem) - but, it's also one of the
> ways advancements happen: add an extension, use it, it turns out to be
> useful, it gets popular.
>
> I've done it myself (well, except for the "gets popular" part, which no
> one person can do alone): labeled control structure, AF_TIMER sockets,
> pidconn, validusershell, the list goes on.

Sure, but this is "there are several extensions, and write code that only uses the local one, even though it could have been written to use any". And perhaps "there are mechanisms which could have been adopted, but instead make up a third".

And I really meant "seems unaware", not "made a deliberate decision, evidenced by written design" :-)
Re: RFC: Native epoll syscalls
Martin Husemann writes:

> On Wed, Jun 21, 2023 at 01:50:47PM -0400, Theodore Preduta wrote:
>> There are two main benefits to adding native epoll syscalls:
>>
>> 1. They can be used to help port Linux software to NetBSD.
>
> Well, syscall numbers are cheap and plenty...
>
> The real question is: is it a useful and consistent API to have?
> At first sight it looks like a mix of kqueue and poll, and seems to be
> quite complex.

It is definitely a real problem that people write linuxy code that seems unaware of POSIX and portability. If we had native epoll, then that code could be built and used. That of course doesn't fix the portability issues, but it avoids them.

It seems to me that if we have epoll emulation, it should not be that hard to also have it native, and I think the benefit in being able to run (natively) programs written unportably is significant.
Re: malloc(9) vs kmem(9) interfaces
Taylor R Campbell writes:

> Right, so the question is -- can we get the attribution _without_
> that? Surely attribution itself is just a matter of some per-CPU
> counters.

Reading along, it strikes me that there is a huge point implicit in your last sentence.

I first thought of attribution as being able to tell what a particular allocated object is being used for. That requires state per object. However, you are talking about maintaining a count of objects by user. That is vastly cheaper, and likely 90%+ as useful. So there is "object attribution" and "total usage attribution".
Re: LINEAR24 userland format in audio(4) - do we really want it?
nia writes:

> Unfortunately file formats are standardized but the
> way the audio APIs are implemented varies. :/
>
>> It's now no longer broken to handle 24bit WAV files.
>
> This is true, but audioplay is hardly the only
> consumer of the API and could easily be made to communicate
> with the kernel using 32-bit samples.
>
> What is the behaviour of everything in pkgsrc when thrown
> 24bit WAV files?

I'm not following. Are you saying we should remove support from the kernel API for 24-bit linear? Or that lots of stuff in pkgsrc should be fixed so it works better?
Re: USB-related panic in 8.2_STABLE
Timo Buhrmester writes:

> Apparently out of nothing, one of our servers paniced.
>
> uname -a gives:
>
> | NetBSD trave.math.uni-bonn.de 8.2_STABLE NetBSD 8.2_STABLE
> | (MI-Server) #17: Fri Jul 16 14:01:03 CEST 2021
> | supp...@trave.math.uni-bonn.de:/var/work/obj-8/sys/arch/amd64/compile/miserv
> | amd64

My impression is that there have been a lot of USB fixes since 8.

> I've transcribed the panic message and backtrace:
>
> | ohci0: 1 scheduling overruns
> | ugen0: detached
> | ugen0: at uhub4 port 2 (addr 2) disconnected
> | ugen0 at uhub4 port 2
> | ugen0: Phoenixtec Power (0x6da) USB Cable (V2.00) (0x02), rev 1.00/0.06, addr 2
> | uvm_fault(0xfe82574c2458, 0x0, 1) -> e
> | fatal page fault in supervisor mode
> | trap type 6 code 0 rip 0x802f627e cs 0x8 rflags 0x10246 cr2 0x2 ilevel 6 (NB: could be ilevel 0 as well) rsp 0x80013f482c10
> | curlwp 0xfe83002b2000 pid 8393.1 lowest kstack 0x80013f4802c0
> | kernel: page fault trap, code=0
> | Stopped in pid 8393.1 (nutdrv_qx_usb) at netbsd:ugen_get_cdesc+0xb1:
> | movzwl 2(%rax),%edx
> | db{2}> bt
> | ugen_get_cdesc() at netbsd:ugen_get_cdesc+0xb1
> | ugenioctl() at netbsd:ugenioctl+0x9a4
> | cdev_ioctl() at netbsd:cdev_ioctl+0xb4
> | VOP_IOCTL() at netbsd:VOP_IOCTL+0x54
> | vn_ioctl() at netbsd:vn_ioctl+0xa6
> | sys_ioctl() at netbsd:sys_ioctl+0x11a
> | syscall() at netbsd:syscall+0x1ec
> | --- syscall (number 54) ---
> | 7a73c9eff13a:
> | db{2}>
>
> Any idea what's going on?

It can always be hardware. (Even if one can argue bad hardware should never lead to panic.) I'm not saying it is, or is likely, but keep that in mind.

You didn't give timing. If this immediately followed the disconnect, it's perhaps a bug in ugen doing something after the device is gone. It may be that this bug has always been there and that normally the UPS doesn't disconnect, or you hit a bad race.

Try updating to 9 or 10 :-)
Re: crash in timerfd building pandoc / ghc94 related
PHO writes:

> On 2/6/23 5:27 PM, Nikita Ronja Gillmann wrote:
>> I encountered this on some version of 10.99.2 and last night again on
>> 10.99.2 from Friday morning.
>> This is an obvious blocker for me for making 9.4.4 the default.
>> I propose to either revert to the last version or make the default GHC
>> version setable.
>
> I wish I could do the latter, but unfortunately not all Haskell
> packages are buildable with 2 major versions of GHC at the same time
> (most are, but there are a few exceptions).
>
> Alternatively, I think I can patch GHC 9.4 so that it won't use
> timerfd. It appears to be an optional feature after all; if its
> ./configure doesn't find timerfd it won't use it. Let me try that.

If it's possible to only do this on NetBSD 10.99, that would be good. It seems so far, from not really paying attention, that there is nothing wrong with ghc but that there is a bug in the kernel.

It would also be good to get a reproduction recipe without haskell.
Re: Enable to send packets on if_loop via bpf
Ryota Ozaki writes:

> In the specification DLT_NULL assumes a protocol family in the host
> byte order followed by a payload. Interfaces of DLT_NULL use
> bpf_mtap_af to pass an mbuf prepending a protocol family. All interfaces
> follow the spec and work well.
>
> OTOH, bpf_write to interfaces of DLT_NULL is a bit of a sad situation.
> Data written to an interface of DLT_NULL is treated as raw data
> (I don't know why); the data is passed to the interface's output routine
> as is with dst (sa_family=AF_UNSPEC). tun seems to be able
> to handle such raw data but the others can't handle the data (probably
> the data will be dropped like if_loop).

Summarizing and commenting, to make sure I'm not confused:

- on receive/read, DLT_NULL prepends the AF in host byte order
- on transmit/write, it just sends with AF_UNSPEC

This seems broken as it is asymmetric, and is bad because it throws away information that is hard to reliably recreate.

On the other hand, this is for link-layer formats, and it seems that some interfaces have an AF that is not really part of what is transmitted, even though really it is. For example tun is using an IP proto byte to specify AF, and really this is part of the link protocol. Except we pretend it isn't.

> Correcting bpf_write to assume a prepended protocol family will
> save some interfaces like gif and gre but won't save others like stf
> and wg. Even worse, the change may break existing users of tun
> that want to treat data as is (though I don't know if users exist).
>
> BTW, prepending a protocol family on tun is a different protocol from
> DLT_NULL of bpf. tun has three protocol modes and doesn't always prepend
> a protocol family. (And also the network byte order is used on tun
> as gert says while DLT_NULL assumes the host byte order.)

wow.

> So my fix will:
> - keep DLT_NULL of if_loop to not break bpf_mtap_af, and
> - unchange DLT_NULL handling in bpf_write except for if_loop to not
>   bother existing users.
> The patch looks like this:
>
> @@ -447,6 +448,14 @@ bpf_movein(struct uio *uio, int linktype, uint64_t mtu, struct mbuf **mp,
>  		m0->m_len -= hlen;
>  	}
>
> +	if (linktype == DLT_NULL && ifp->if_type == IFT_LOOP) {
> +		uint32_t af;
> +		memcpy(&af, mtod(m0, void *), sizeof(af));
> +		sockp->sa_family = af;
> +		m0->m_data += sizeof(af);
> +		m0->m_len -= sizeof(af);
> +	}
> +
>  	*mp = m0;
>  	return (0);

That seems ok to me.

I think the long-term right fix is to define DLT_AF which has an AF word in host order on receive and transmit always, and to modify interfaces to use it whenever they are AF aware at all. In this case tun would fill in the AF word from the IP proto field, and you'd get a transformed/regularized AF word when really the "link layer packet" had the IP proto field. But that's ok as it's just cleanup and reversible.
Re: Enable to send packets on if_loop via bpf
Ryota Ozaki writes:

> NetBSD can't do this because a loopback interface
> registers itself to bpf as DLT_NULL and bpf treats
> packets being sent over the interface as AF_UNSPEC.
> Packets of AF_UNSPEC are just dropped by loopback
> interfaces.
>
> FreeBSD and OpenBSD enable doing that by letting users
> prepend a protocol family to the data being sent. bpf (or if_loop)
> extracts it and handles the packet as the extracted protocol
> family. The following patch follows them (the implementation
> is inspired by OpenBSD).
>
> http://www.netbsd.org/~ozaki-r/loop-bpf.patch
>
> The patch changes if_loop to register itself to bpf
> as DLT_LOOP and bpf to handle a prepended protocol
> family on bpf_write if a sender interface is DLT_LOOP.

I am surprised that there is not already a DLT_foo that has this concept, an AF word followed by data. But I guess every interface already has a more-specific format.

Looking at if_tun.c, I see DLT_NULL. This should have the same ability to write. I have forgotten the details of how tun encodes AF when transmitting, but I know you can have v4 or v6 inside, and tcpdump works now, so obviously I must be missing something.

My suggestion is to look at the rest of the drivers that register DLT_NULL and see if they are amenable to the same fix, and choose a new DLT_SOMETHING that accommodates the broader situation. I am not demanding that you add features to the rest of the drivers. I am only asking that you think about the architectural issue of how the rest of them would be updated, so we don't end up with DLT_LOOP, DLT_TUN, and so on, where they all do almost the same thing, when they could be the same.

I don't really have an opinion on host vs network byte order for the AF, but I think your choice of aligning with FreeBSD is reasonable.
Re: #pragma once
My quick reaction is that we should stick to actual standards, absent a really compelling case. This isn't compelling to me, and the point that linting for wrong usage isn't hard is a good one.

I happen to be in the middle of a paper (from the guix crowd) about de-bootstrapping ocaml. It's about getting rid of binary bootstraps. That's a problem we also have in pkgsrc, but we haven't issued a manifesto. While it might seem tangential, the de-bootstrapping world often wants to compile older code with older tools to construct a build graph that starts from as little binary as possible. Thus, "newer compilers all do this" is a bit scary, as while that's what people usually use, it's more comfortable to say "we need C99 plus X" for as little X as possible.
Re: Can version bump up to 9.99.100?
David Holland writes:

> On Fri, Sep 16, 2022 at 07:00:23PM +0700, Robert Elz wrote:
> > That is, except for in pkgsrc, which is why I still
> > have a (very mild) concern about that one - it actually compares the
> > version numbers using its (until it gets changed) "Dewey" comparison
> > routines, and for those, 9.99.100 is uncharted territory.
>
> No, it's not, pkgsrc-Dewey is well defined on arbitrarily large
> numbers. In fact, that's in some sense the whole point of it relative
> to using fixed-width fields.

And, surely we had 9.99.9 and 9.99.10. The third component is no more special than the second. It's just that rolling over to an extra digit happens less often in the third place, so arguably incorrectly written two-digit patterns are more likely to be lurking there than one-digit ones.

It's not reasonable to constrain a normal process because other bugs might exist.
Re: Module autounload proposal: opt-in, not opt-out
Paul Goyette writes:

> (personal note)
> It really seems to me that the current module sub-system is at
> best a second-class capability. I often get the feeling that
> others don't really care about modules, until it's the only way
> to provide something else (dtrace). This proposal feels like
> another nail in the modular coffin. Rather than disabling (part
> of) the module feature, we should find ways to improve testing
> the feature.

I'd just like to say that while I haven't gone down the "modules first" path, I have been watching your commits and cheering you on. I do use a few modules, and this is making me think I should try to run MODULAR, especially on machines with less memory. I'm a little scared of not even having UFS, but I can try it as the low-memory machine is not important.
Re: Module autounload proposal: opt-in, not opt-out
Martin Husemann writes:

> I think that all modules that we deliver and declare safe for *autoload*
> should require to be actually tested, and a basic test (for me) includes
> testing auto-unload. That does not cover races that slip through "casual"
> testing, but should have caught the worst bugs.

That's a reasonable position for adding modules, but

> So the error in the cases you stumbled in is the autoload, and keeping the
> badly tested module autoloadable but forbidding its unloading sounds a bit
> strange to me.

Given where we are, do you really mean we should withdraw every module from autoload that does not have a documented test result, right now? It seems far better to have them stay loaded than be unavailable.
Re: New iwn firmware & upgrade procedure
Havard Eidnes writes:

>> A quick skim of /libdata/firmware makes me think it is mostly not
>> versioned.
>
> Really? I suspect all the if_iwn files are versioned; if it
> follows the pattern for iwlwifi-6000g2a-5, the number behind the
> last hyphen is the version number.

Look at all the devices, not just if_iwn.
Re: New iwn firmware & upgrade procedure
Havard Eidnes writes:

> 1) Could the if_iwn driver fall back to using the 6000g2a-5 microcode
>    without any code changes? (My gut feeling says "yes", but I have
>    no existence proof of that.)

Unless it's really necessary (ABI change in accessing the device with new firmware), it seems that the firmware should just be named for the device and not have the firmware version. Thus you'd get the version you have in tree, and that might be a little old. Alternatively there could be a symlink. But I don't understand why it is versioned.

> Should the wireless firmware go into a different set which we also
> learn the habit of extracting before reboot of the kernel?

If the versioning is really intractable and frequent, perhaps, but I think this can be 99% solved by not putting firmware versions in filenames. A quick skim of /libdata/firmware makes me think it is mostly not versioned.
Re: Slightly off topic, question about git
David Brownlee writes:

> I suspect most of this also works with s/git/hg/ assuming NetBSD
> switches to a mercurial repo

Indeed, all of this is not really about git. Systems in the class of "distributed VCS" have two important properties:

- commits are atomic across the repo, not per file
- anyone can prepare commits, whether or not they are authorized to apply them to the repo; an authorized person can apply someone else's commit

These more or less lead to "local copy of the repo". And there are web tools for people who just want to look at something occasionally. But I find that it's not that big, that right now I have 3 copies (8, 9, current), and that it's nice to be able to do things offline (browse, diff, commit).

CVS is really just RCS with:

- organization into groups of files
- ability to operate over ssh (rsh originally :-)

That was really great in 1994; I remember what a big advance it was (seriously).
Re: wsvt25 backspace key should match terminfo definition
RVP writes:

> On Tue, 23 Nov 2021, Michael van Elst wrote:
>
>> If you restrict yourself to PC hardware (i386/amd64 arch) then
>> you probably have either
>>
>> a PS/2 keyboard -> the backspace key generates a DEL.
>> a USB keyboard -> the backspace key generates a BS.
>>
>> That's something you cannot shoehorn into a single terminfo
>> attribute and that's why many programs simply ignore terminfo
>> here, in particular when you think about remote access.
>
> So, if I had a USB keyboard (don't have one to check right now), the
> terminfo entry would be correct? How do we make this consistent then?
> Have 2 terminfo entries: wsvt25-ps2 and wsvt25-usb (and fix-up getty
> to set the correct one)?

wscons is supposed to abstract all this, so making wsvt25-foo for different keyboard classes seems like the wrong approach. wskbd(4) says:

  • Mapping from keycodes (defined by the specific keyboard driver) to
    keysyms (hardware independent, defined in
    /usr/include/dev/wscons/wsksymdef.h).

As uwe@ points out, the terms we use and the actual key labels are confusing. When I've talked about the DEL key, I've meant the key that the user types to delete backwards, almost always upper right and easily reachable when touch typing, and that in DEC tradition sent the DEL 0x7f character. It was pointed out that newer terminals have a backarrow logo, and I see that an IBM USB keyboard has that too.

Then there's the BS key, which older (almost all actual?) terminals had, but my IBM USB keyboard doesn't have one, and my mac doesn't either.

Looking in wsksymdef.h (netbsd-9, which is handy), we see "keysyms" which is what keycodes are supposed to map into, and it talks about them being aligned with ASCII. Relevant to this discussion there is

  #define KS_BackSpace  0x08
  #define KS_Delete     0x7f
  #define KS_KP_Delete  0xf29f

So that's for BS, DEL (to use ASCII) and the extended keypad "delete right" introduced with I think the VT220.
On my USB keyboard, in NetBSD 9 wscons, without trying to mess with mappings, I get

  backarrow (key where DEL should be)                          ==> BS  (^H)
  keypad Delete key (next to insert/home/end/pageup/pagedown)  ==> DEL (^?)

and I see that stty has erase set to ^H.

The underlying issue is that the norms of some systems are to map that "user wants to delete left" easily reachable key to BS and some want to map it to DEL. I see these as the PC tradition and the UNIX tradition.

So I think NetBSD should decide that we're following the UNIX tradition that this key is DEL, have wskbd map it that way for all keyboard types, and have stty erase start out DEL. (Plus of course carrying this across ssh so cross-deletionism works, which I think is already the case.)

A quick glance at wskbd and ukbd did not enlighten me. xev shows similar wrong X keysyms, BS and DEL for "backarrow" and "keypad delete".
Re: wsvt25 backspace key should match terminfo definition
Valery Ushakov writes:

> vt52 is different. I never used a real vt52 or a clone, but the
> manual at vt100.net gives the following picture:
>
> https://vt100.net/docs/vt52-mm/figure3-1.html
>
> and the description
>
> https://vt100.net/docs/vt52-mm/chapter3.html#S3.1.2.3
>
> Key         Code  Action Taken if Codes Are Echoed
> BACK SPACE  010   Backspace (Cursor Left) function
> DELETE      177   Nothing

That is explaining what the terminal does when those codes are sent by the computer. That is a different thing from how the computer interprets user input. When using a VT52 on Seventh Edition, for example, one pushed DEL to remove the previous character, and the computer would send "" to make it disappear and leave the cursor left. One basically never pushed BS.

> vt100 had similar keyboard (again, never used a real one personally)
>
> https://vt100.net/docs/vt100-ug/chapter3.html#F3-2
>
> BACKSPACE  010  Backspace function
> DELETE     177  Ignored by the VT100

same as vt52, I think.

> But vt200 and later use a different keyboard, lk201 (and i did use a
> real vt220 a lot)
>
> https://vt100.net/docs/vt220-rm/figure3-1.html
>
> that picture is not very good, the one from the vt320 manual is better
>
> https://vt100.net/docs/vt320-uu/chapter3.html
>
> vt220 does NOT have a configuration option that selects the code that
> the key sends.

But somehow the official terminfo database has kbs=^H for vt220!

> Later it became configurable:
>
> https://vt100.net/docs/vt320-uu/chapter4.html#S4.13
>
> For vt320 (where it *is* configurable) terminfo has
>
> $ infocmp -1 vt320 | grep kbs
> kbs=^?,

Very interesting!

>> I think the first thing to answer is "what is kbs in terminfo supposed
>> to mean".
>
> X/Open Curses, Issue 7 doesn't explain, other than saying "backspace"
> key, which is an unfortunate name, as it's loaded. But it's
> sufficiently clear from the context that it's the key that deletes
> backwards, i.e. deletes under.

So it's the code generated by the DEL key (as opposed to the Delete key).
>> My other question is how kbs is used from terminfo. Is it about
>> generating output sequences to move the active cursor one left? If so,
>> it's right. Is it about "what should the user type to delete left",
>> then for a vt52/vt220, that's wrong. If it is supposed to be both,
>> that's an architectural bug as those aren't the same thing.
>
> No, k* capabilities are sequences generated by the terminal when some
> key is pressed. The capability for the sequence sent to the
> terminal to move the cursor left one position is cub1
>
> $ infocmp -1 vt220 | grep cub1
> cub1=^H,
> kcub1=\E[D,
>
> (kcub1 is the sequence generated by the left arrow _k_ey).

Then I'm convinced that kbs should be ^? for these terminals.
Re: wsvt25 backspace key should match terminfo definition
Johnny Billquist writes:

>> For vt320 (where it *is* configurable) terminfo has
>>
>>    $ infocmp -1 vt320 | grep kbs
>>    kbs=^?,
>
> Which I think it should be.

But what does kbs mean?

- the ASCII character sent by the computer to move the cursor left?
- the ASCII character sent by the BS key?
- the ASCII character sent by the DEL key that the user uses to delete left?
Re: wsvt25 backspace key should match terminfo definition
Valery Ushakov writes:

> On Tue, Nov 23, 2021 at 00:01:40 +, RVP wrote:
>
>> On Tue, 23 Nov 2021, Johnny Billquist wrote:
>>
>> > If something pretends to be a VT220, then the key that deletes
>> > characters to the left should send DEL, not BS...
>> > Just saying...
>>
>> That's fine with me too. As long as things are consistent. I suggested the
>> kernel change because both terminfo definitions (and the FreeBSD console)
>> go for ^H.
>
> Note that the pckbd_keydesc_us keymap maps the scancode of the <- key to
>
> KC(14), KS_Cmd_ResetEmul, KS_Delete,
>
> i.e. 0x7f (^?).
>
> terminfo is obviously incorrect here. Amazingly, the bug is actually
> in the vt220 description! wsvt25 just inherits from it:
>
> $ infocmp -1 vt220 | grep kbs
> kbs=^H,
>
> I checked termcap.src from netbsd-4 and it's wrong there too. I have
> no idea htf that could have happened.

I think (memory is getting fuzzy) the problem is that the old terminals had a delete key, in the upper right, that users used to remove the previous character, and a BS key, upper left, that was actually a carriage control character.

The basic problem is that in the PC world, the key where DEL should be has a backarrow on it, and the PC world thinks it is backspace. That's the DEC-centric viewpoint of course :-)

I think any change needs a careful proposal and review, because there are lots of opinions here and a change is likely to mess up a bunch of people's configs, even if they have worked around something broken. I don't mean "no changes", just that if you don't think this is a really hard problem you probably shouldn't change it (globally).

Also /usr/include/sys/ttydefaults.h is about all of NetBSD on all sorts of hardware, not just PCs, and there are lots of keyboards as well as actual terminals. Ever since we moved beyond the ASR33, CERASE has been 0177 (my Unix use more or less began with a VT52 and a Beehive CRT).
xterm has a config to say "make the key where DEL ought to be generate the key that the tty has configured as ERASE". I suspect that the right approach is

1) choose what wscons generates for the "key where DEL belongs"
2) have the tty set up so that the choice in (1) is 'stty erase'.

I see the same kbs=^H on vt52.

I think the first thing to answer is "what is kbs in terminfo supposed to mean".

My other question is how kbs is used from terminfo. Is it about generating output sequences to move the active cursor one left? If so, it's right. Is it about "what should the user type to delete left"? Then for a vt52/vt220, that's wrong. If it is supposed to be both, that's an architectural bug as those aren't the same thing.
Re: timecounters
I think it makes sense to document them, and arguably each counter should have a man page, except for things that are somehow in timecounter(9) instead (if they don't have a device name?).
Re: Representing a rotary encoder input device
What do other systems do?

It strikes me that wsmouse feels like it is for things connected with the kbd/mouse/display world. To be cantankerous, using it seems a little bit like representing a GPIO input as a 1-button mouse that doesn't move.

I would imagine that a rotary encoder is more likely to be a volume or level control, but perhaps not for the machine, perhaps just reported over MQTT so Home Assistant on some other machine can deal with it.

If you are really talking about encoders hooked to gpio, then perhaps gpio should grow a facility to take N pins and say they are some kind of encoder, and then have a gpio encoder abstraction.

But maybe you are trying to use an encoder to add scroll to a 3-button mouse?
Re: SCSI scanners
Julian Coleman writes:

> Can we get rid of the SCSI scanner support as well? It only supports old
> HP and Mustek scanners, and its functionality is superseded by SANE (which
> sends the relevant SCSI commands from userland).

If it's really the case that SANE works with these, then that seems ok. (I actually have a UMAX SCSI scanner but haven't powered it on in years.)

I wonder though if this is causing the kind of trouble that uscanner caused.
Re: protect pmf from network drivers that don't provide if_stop
Martin Husemann writes: > On Tue, Jun 29, 2021 at 03:46:20PM +0930, Brett Lymn wrote: >> I turned up a fix I had put into my source tree a while back, I think at >> the time the wireless driver (urtwn IIRC) did not set an entry for >> if_stop. > > This is a driver bug, we should not work around it but catch it early > and fix it. So maybe KASSERT that stop exists, and then call it if non-NULL, so regular users don't crash, and DIAGNOSTIC does what DIAGNOSTIC is supposed to do?
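Sketched in userland (assert standing in for KASSERT, a fake ifnet standing in for the real struct; everything here is illustrative), the suggested pattern would look something like:

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for struct ifnet, just enough for the sketch. */
struct fakeif {
	const char *name;
	void (*if_stop)(struct fakeif *, int);
};

static int stop_calls;		/* counts invocations for the demo */

static void
demo_stop(struct fakeif *ifp, int disable)
{
	(void)ifp; (void)disable;
	stop_calls++;
}

/*
 * The pmf-side check: under DIAGNOSTIC the assertion (KASSERT in the
 * kernel) catches the buggy driver early; without DIAGNOSTIC the
 * missing if_stop is skipped instead of dereferenced.  Returns 1 if
 * the stop routine ran, 0 if it was absent.
 */
static int
pmf_call_if_stop(struct fakeif *ifp)
{
#ifdef DIAGNOSTIC
	assert(ifp->if_stop != NULL);
#endif
	if (ifp->if_stop == NULL)
		return 0;	/* buggy driver: skip rather than crash */
	(*ifp->if_stop)(ifp, 1);
	return 1;
}
```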
Re: regarding the changes to kernel entropy gathering
Thanks - that is useful information. I think the big point is that the new seed file is generated from urandom, not from the internal state, so the new seed doesn't leak internal state. The "save entropy" language didn't allow me to conclude that. Also, your explanation is about updating, but it doesn't address generation of a file for the first time. Presumably that just takes urandom, without an old seed to mix in, and doesn't overwrite an old seed that isn't there. Interestingly, I have a machine running current, running as a dom0 sometimes, and haven't had problems. I now realize that's only because the machine had a seed file created under either 7 or 9 (installed 7, updated to 9, updated to current). So it has trusted, untrustworthy entropy (even though surely after all this time some of it must have been unobserved).
Re: regarding the changes to kernel entropy gathering
Thor Lancelot Simon writes: > shuts down, again all entropy samples that have been added (which, again, > are accumulating in the per-cpu pools) are propagated to the global pool; > all the stream RNGs rekey themselves again; then the seed is extracted. It seems obvious to me that "extracting" the seed should be done in such a way that the state of the internal rng is still unpredictable from the saved seed, even if the state of the newly-booted rng will be predictable. Perhaps by pulling 256 bytes from urandom, perhaps by something more direct and then some sort of hash/rekey to get back traffic protection. Probably this is already done in a way much better thought out than my 30s reaction, but the man page doesn't really say so, at least not in a way I could follow; rndctl -S just says "save entropy pool".
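The property being asked for can be sketched in a few lines (this is an illustration of the idea, not the actual rndctl(8) implementation): the saved seed should be rng *output*, e.g. bytes read from /dev/urandom, so possession of the seed file reveals nothing about the pool's internal state.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Fill buf with len bytes of rng output.  Because /dev/urandom runs
 * the pool through its output function, the bytes saved to a seed
 * file do not expose the pool state itself.  Returns bytes obtained.
 */
static size_t
make_seed(unsigned char *buf, size_t len)
{
	FILE *f = fopen("/dev/urandom", "rb");
	size_t n;

	if (f == NULL)
		return 0;
	n = fread(buf, 1, len, f);
	fclose(f);
	return n;
}
```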
Re: ZFS: time to drop Big Scary Warning
chris...@astron.com (Christos Zoulas) writes: > That's a good test, but how does zfs compare in for the same test with lets > say ffs or ext2fs (filesystems that offer persistence)? With the same system, booted in the same way, but with 3 different filesystems mounted on /tmp, I get similar numbers of failures:

  tmpfs  12
  ffs2   13
  zfs    18

So tmpfs/ffs2 are ~equal and zfs has a few more failures (but it all looks a bit random and non-repeatable). So it's hard to sort out "zfs is buggy" vs "some tests fail in timing-related hard-to-understand ways and that seems provoked slightly more with /tmp on zfs". Did you mean something else?
Re: ZFS: time to drop Big Scary Warning
I got a suggestion to run atf with a ZFS tmp. This is all with current from around March 1, and is straight current, no Xen.

Creating tank0/tmp and having it be mounted on /tmp failed the mount (but created the volume) with some sort of "busy" error. I already had a tmpfs mounted. Rebooting, zfs got mounted and then tmpfs, and I unmounted tmpfs and then I have a zfs tmp. So not sure what's up, but it feels like a tmpfs issue more than a zfs issue, and not a big deal. Or maybe it's a feature that you can't mount over tmpfs.

With /tmp being tmpfs, my results are similar to the releng runs. I've indented things that don't match two spaces.

Failed test cases:
lib/libc/sys/t_futex_ops:futex_wait_timeout_deadline
lib/libc/sys/t_ptrace_waitid:syscall_signal_on_sce
lib/libc/sys/t_truncate:truncate_err
lib/librumpclient/t_exec:threxec
net/if_wg/t_misc:wg_rekey
usr.bin/cc/t_tsan_data_race:data_race
usr.bin/make/t_make:archive
usr.bin/c++/t_tsan_data_race:data_race
usr.sbin/cpuctl/t_cpuctl:nointr
usr.sbin/cpuctl/t_cpuctl:offline
fs/ffs/t_quotalimit:slimit_le_1_user
modules/t_x86_pte:rwx

Summary for 903 test programs:
9570 passed test cases.
12 failed test cases.
73 expected failed test cases.
530 skipped test cases.

With /tmp being zfs:tank0/tmp, I get

Failed test cases:
./bin/cp/t_cp:file_to_file
./lib/libarchive/t_libarchive:libarchive
./lib/libc/stdlib/t_mktemp:mktemp_large_template
./lib/libc/sys/t_ptrace_waitid:syscall_signal_on_sce
./lib/libc/sys/t_stat:stat_chflags
./lib/libc/sys/t_truncate:truncate_err
./net/if_wg/t_misc:wg_rekey
./usr.bin/cc/t_tsan_data_race:data_race_pie
./usr.bin/make/t_make:archive
./usr.bin/ztest/t_ztest:assert
./usr.bin/c++/t_tsan_data_race:data_race
./usr.bin/c++/t_tsan_data_race:data_race_pie
./usr.sbin/cpuctl/t_cpuctl:nointr
./usr.sbin/cpuctl/t_cpuctl:offline
./fs/nfs/t_rquotad:get_nfs_be_1_group
./modules/t_x86_pte:rwx
./modules/t_x86_pte:svs_g_bit_set

Summary for 903 test programs:
9567 passed test cases.
17 failed test cases.
72 expected failed test cases.
529 skipped test cases.

which is also similar, but slightly different. So overall I conclude that there's nothing terrible going on, and that these results are in the same class of mostly passing but somewhat irregular as the base case. So work to do, but it doesn't support "ZFS is scary". (Of course, the system stayed up through the tests and has no apparent trouble, or I would have said.) As an aside, it would be nice if atf-test used TMPDIR or had an argument to say what place to do tests.
Re: ZFS: time to drop Big Scary Warning
"J. Hannken-Illjes" writes: >> On 19. Mar 2021, at 21:18, Michael wrote: >> >> On Fri, 19 Mar 2021 15:57:18 -0400 >> Greg Troxel wrote: >> >>> Even in current, zfs has a Big Scary Warning. Lots of people are using >>> it and it seems quite solid, especially by -current standards. So it >>> feels times to drop the warning. >>> >>> I am not proposing dropping the warning in 9. >>> >>> Objections/comments? >> >> I've been using it on sparc64 without issues for a while now. >> Does nfs sharing work these days? I dimly remember problems there. > > If you mean misc/55042: Panic when creating a directory on a NFS served ZFS > it should be fixed in -current. I have a box running current/amd64 from about March 4, with a zpool on a disklabel partition, and a filesystem from that exported, mounted on a 9/amd64 box, and did the mkdir test and it was totally fine. I was able to have the maproot segfault happen, before the fix. So yes, this is fixed. So summarizing: nobody has said there is any remaining serious issue many remember issues about NFS (true) but they all seem ok now and I just looked over the open PRs and w.r.t. current don't see anything serious. signature.asc Description: PGP signature
ZFS: time to drop Big Scary Warning
Even in current, zfs has a Big Scary Warning. Lots of people are using it and it seems quite solid, especially by -current standards. So it feels like time to drop the warning. I am not proposing dropping the warning in 9. Objections/comments?
Re: kmem pool for half pagesize is very wasteful
Chuck Silvers writes: > in the longer term, I think it would be good to use even larger pool pages > for large pool objects on systems that have relatively large amount of memory. > even with your patch, a 1k pool object on a system with a 4k VM page size > still has 33% overhead for the redzone, which is a lot for something that > is enabled by DIAGNOSTIC and is thus supposed to be "inexpensive". So maybe the real bug is that this check should not be part of DIAGNOSTIC. I remember from 2.8BSD that DIAGNOSTIC was basically just supposed to add cheap asserts and panic earlier but not really be slower in any way anybody would care about. It seems easy enough to make this separate and not get turned on for DIAGNOSTIC, but some other define. It might even be that for current the checked-in GENERIC enables this. But someone turning on DIAGNOSTIC on 9 shouldn't get things that hurt memory usage really at all, or more than say a 2% degradation in speed. > there's a tradeoff here in that using a pool page size that matches the > VM page size allows us to use the direct map, whereas with a larger > pool page size we can't use the direct map (at least effectively can't today), > but for pools that already use a pool page size that is larger than > the VM page size (eg. buf16k using a 64k pool page size) we already > aren't using the direct map, so there's no real penalty for increasing > the pool page size even further, as long as the larger pool page size > is still a tiny percentage of RAM and KVA. we can choose the pool page size > such that the overhead of the redzone is bounded by whatever percentage > we would like. this way we can use a redzone for most pools while > still keeping the overhead down to a reasonable level. That sounds like great progress and I don't mean to say anything negative about that.
Re: fsync error reporting
Greg Troxel writes: > 1) operating system has a successful return from a write transaction to > a disk controller (perhaps via a controller that has a write-back > cache) > > 2) operating system has been told by the controller that the write has > actually completed to stable storage (guaranteed even if OS crashes or > power fails, so actually written or perhaps in battery-backed cache) I see our man page addresses this with FDISKSYNC. It sounds like you aren't proposing to change this (makes sense), but there's the pesky issue of errors within the disk when writing from cache to media. Perhaps those are unreportable.
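For reference, a hedged sketch of how a program asks for the stronger guarantee on NetBSD: fsync_range(2) with FDISKSYNC additionally requests a flush of the drive's write-back cache. The #ifdef fallback to plain fsync(2), which only gives the weaker "handed to the controller" semantics, is just so the sketch builds on systems without fsync_range.

```c
#include <assert.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Push fd's data toward stable storage.  On NetBSD, FDISKSYNC asks
 * the disk itself to flush its cache too (case 2 above); elsewhere
 * fall back to fsync(2), which only promises case 1.
 */
static int
sync_to_media(int fd)
{
#ifdef FDISKSYNC
	return fsync_range(fd, FFILESYNC | FDISKSYNC, 0, 0);
#else
	return fsync(fd);
#endif
}
```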
Re: fsync error reporting
David Holland writes: > > > everything that process wrote is on disk, > > > > That is probably unattainable, since I've seen it plausibly asserted > > that some disks lie, reporting that writes are on the media when this > > is not actually true. > > Indeed. What I meant to say is that everything has been sent to disk, > as opposed to being accidentally skipped in the cache because the > buffer was busy, which will currently happen on some of the fsync > paths. > > That's why flushing the disk-level caches was a separate point. (ignoring errors as I have no objection to what you proposed and clarified with mouse@) Maybe I'm way off in space, but I'd like to see us be careful about:

  1) operating system has a successful return from a write transaction
     to a disk controller (perhaps via a controller that has a
     write-back cache)

  2) operating system has been told by the controller that the write
     has actually completed to stable storage (guaranteed even if OS
     crashes or power fails, so actually written or perhaps in
     battery-backed cache)

For stacked filesystems like raid, cgd, and for things like NFS, there's basically an e2e ack of the above condition. POSIX is of course weaselly about this. But it seems obvious that if you call fsync, you want the property that if there is a crash or power failure (but not a disk media failure :-) that your bits are there, which is case 2. Case 1 is only useful in that files could remain in OS cache for a long time, and there is a pretty good but not guaranteed notion that once in device writeback cache they will get to the actual media in not that long. The old "sync;sync;sync;sleep 10" thing from before there was shutdown(8)... I thought NCQ was supposed to give acks for actual writing, but allow them to be perhaps ordered and multiple in flight, so that one could use that instead of the big-hammer inscrutable writeback cache. If the controller doesn't support NCQ, then it seems one has to issue a cache flush, which presumably is defined to get all data in cache as of the flush onto disk before reporting that it's done. Is that what you're thinking, or do you think this is all about case 1?
Re: fsync_range and O_RDONLY
David Holland writes: > Well, if you have it open for write and I have it open for read, and I > fsync it, it'll sync your changes. I guess maybe POSIX is wrong then :-) But as a random user I can type sync to the shell. > And report any errors to me, so if you're a database and I'm feeling > nasty I can maybe mess with you that way. So I'm not sure it's a great > idea. > > Right now fsync error reporting is a trainwreck though. I think that's the real problem; if I open for write and fsync, then I should get status back that lets me know about my writes, regardless of who else asked for sync. Once that's fixed, then the 'others asking for sync' is much less of a big deal. I know, ENOPATCH.
Re: fsync_range and O_RDONLY
David Holland writes: > Last year, fdatasync() was changed to allow syncing files opened > read-only, because that ceased to be prohibited by POSIX and something > apparently depended on it. I have a dim memory of this and mongodb. > However, fsync_range() was not also so changed. Should it have been? > It's now inconsistent with fsync and fdatasync and it seems like it's > meant to be a strict superset of them. It seems like it might as well be. I would expect this to only really sync the file's metadata, same as the others, but I do not feel like I really understand this.
Re: partial failures in write(2) (and read(2))
David Holland writes: > Basically, it is not feasible to check for and report all possible > errors ahead of time, nor in general is it possible or even desirable > to unwind portions of a write that have already been completed, which > means that if a failure occurs partway through a write there are two > reasonable choices for proceeding: > (a) return success with a short count reporting how much data has > already been written; > (b) return failure. > > In case (a) the error gets lost unless additional steps are taken > (which as far as I know we currently have no support for); in case (b) > the fact that some data was written gets lost, potentially leading to > corrupted output. Neither of these outcomes is optimal, but optimal > (detecting all errors beforehand, or rolling back the data already > written) isn't on the table. > > It seems to me that for most errors (a) is preferable, since correctly > written user software will detect the short count, retry with the rest > of the data, and hit the error case directly, but it seems not > everyone agrees with me. It seems to me that (a) is obviously the correct approach. An obvious question is what POSIX requires, pause for `kill -HUP kred` :) I am only a junior POSIX lawyer, not a senior one, but as I read https://pubs.opengroup.org/onlinepubs/9699919799/functions/write.html#tag_16_685 I think your case (a) is the only conforming behavior and obviously what the spec says must happen. I do not even see a glimmer of support for (b). There is the issue of PIPE_BUF, and requests <= PIPE_BUF being atomic, but I don't think you are talking about that. Note that write is obligated to return partial completion if interrupted by a signal. I think your notion that it's ok to not return the reason the full amount wasn't written is entirely valid. I am surprised this is contentious (really; not trying to be difficult).
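The "correctly written user software" loop under behavior (a) is the classic full-write retry; a sketch:

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Keep writing until everything is out or a real error occurs.  A
 * short count just means "resubmit the tail"; a persistent error
 * surfaces on the retry with errno set.
 */
static ssize_t
full_write(int fd, const void *buf, size_t len)
{
	const char *p = buf;
	size_t resid = len;

	while (resid > 0) {
		ssize_t n = write(fd, p, resid);

		if (n == -1) {
			if (errno == EINTR)
				continue;	/* interrupted, no progress */
			return -1;	/* caller sees the error; len - resid
					   bytes already went to the file */
		}
		p += n;
		resid -= n;
	}
	return (ssize_t)len;
}
```

Under behavior (b), by contrast, this loop would have no way to know how much had landed, and retrying from the start would duplicate data, which is exactly the corrupted-output concern.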
Re: Temporary memory allocation from interrupt context
Martin Husemann writes: > On Wed, Nov 11, 2020 at 08:26:45AM -0500, Greg Troxel wrote: >> >LOCK(st); >> >size_t n, max_n = st->num_items; >> >some_state_item **tmp_list = >> >kmem_intr_alloc(max_n * sizeof(*tmp_list)); >> >> kmem_intr_alloc takes a flag, and it seems that you need to pass >> KM_NOSLEEP, as blocking for memory in softint context is highly unlikely >> to be the right thing. > > Yes, and of course the real code has that (and works). It's just that > - memoryallocators(9) does not cover this case > - kmem_intr_alloc(9) is kinda deprecated - quoting the man page: > > These routines are for the special cases. Normally, > pool_cache(9) should be used for memory allocation from interrupt > context. > > but how would I use pool_cache(9) here? Not deprecated, but for "special cases". I think needing a possibly-big variable-size chunk of memory at interrupt time is special. You would use pool_cache by being able to use a fixed-sized object. But it seems that's not how the situation is. I think memoryallocators(9) could use some spiffing up; it (on 9) says kmem(9) cannot be used from interrupt context. The central hard problem is orthogonal, though: if you don't pre-allocate, you have to choose between waiting and coping with failure.
Re: Temporary memory allocation from interrupt context
Martin Husemann writes: > Consider the following pseudo-code running in softint context: > > void > softint_func(some_state *st, ) > { > LOCK(st); > size_t n, max_n = st->num_items; > some_state_item **tmp_list = > kmem_intr_alloc(max_n * sizeof(*tmp_list)); kmem_intr_alloc takes a flag, and it seems that you need to pass KM_NOSLEEP, as blocking for memory in softint context is highly unlikely to be the right thing. The man page is silent on whether lack of both flags is an error, and if not what the semantics are. (It seems to me it should be an error.) With KM_NOSLEEP, it is possible that the allocation will fail. Thus there needs to be a strategy to deal with that. > n = 0; > for (i : st->items) { > if (!(i matches some predicate)) > continue; > i->retain(); > tmp_list[n++] = i; > } > UNLOCK(st); > /* do something with all elements in tmp_list */ > kmem_intr_free(tmp_list, max_n * sizeof(*tmp_list)); > } > > I don't want to alloca here (the list could be quite huge) and max_n could > vary a lot, so having a "manual" pool of a few common (preallocated) > list sizes hanging off the state does not go well either. I think that you need to pick one of:

  - pre-allocate the largest size and use it
  - temporarily be able to deal with not having memory. This leads to
    hard-to-debug situations if that code is wrong, because usually
    malloc will succeed.
  - figure out that this softint can block indefinitely, only harming
    later calls of the same family, and not leading to kernel
    deadlock/etc. This leads to hard-to-debug situations if lack of
    memory does lead to hangs, because usually malloc will succeed.

> In a perfect world we would avoid the interrupt allocation all together, but > I have not found a way to rearrange things here to make this feasible. > > Is kmem_intr_alloc(9) the best way forward? With all that said, note that I'm not the allocation expert.
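A userland mock of the first two options combined (all names here are made up for illustration; malloc stands in for kmem_intr_alloc(sz, KM_NOSLEEP)): try the cheap NOSLEEP allocation, and fall back to a worst-case buffer preallocated at attach time when it fails.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

static int force_fail;		/* simulate memory pressure in tests */

/* Stand-in for kmem_intr_alloc(sz, KM_NOSLEEP): may return NULL. */
static void *
intr_alloc(size_t sz)
{
	return force_fail ? NULL : malloc(sz);
}

struct softstate {
	void *prealloc;		/* worst-case list, allocated at attach */
	size_t prealloc_sz;
};

/*
 * Get a temporary list for the softint.  *used_prealloc tells the
 * caller whether to free() the result or just release the fallback.
 */
static void *
get_tmp_list(struct softstate *st, size_t sz, int *used_prealloc)
{
	void *p = intr_alloc(sz);

	if (p != NULL) {
		*used_prealloc = 0;
		return p;
	}
	/* NOSLEEP failed: fall back if the request fits. */
	*used_prealloc = 1;
	return sz <= st->prealloc_sz ? st->prealloc : NULL;
}
```

A real driver would also need to serialize use of the fallback buffer, since two concurrent softints could otherwise hand it out twice.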
Re: New header for GPIO-like definitions
Julian Coleman writes: > name="LED activity" > name="LED disk5_fault" > > name="INDICATOR key_diag" > name="INDICATOR disk5_present" > > and similar, then parse that in MI code. Another approach would be to extend the fdt schema in the way they would if solving this problem and use that. In other words: if you were in charge of fdt and were going to add this feature, what would you do? But, your name overloading proposal seems ok.
Re: New header for GPIO-like definitions
Julian Coleman writes: >> > #define GPIO_PIN_LED 0x01 >> > #define GPIO_PIN_SENSOR 0x02 >> > >> > Does this seem reasonable, or is there a better way to do this? > >> I don't really understand how this is different from in/out. >> Presumably this is coming from some request from userspace originally, >> where someone, perhaps in a config file, has told the system how a pin >> is hooked up. > > The definitions of pins are coming from hardware-specific properties. That's what I missed. On a device you are dealing with, pin N is *always* wired to an LED because that's how it comes from the factory. My head was in maker-land where there is an LED because someone wired one up. > In the driver, I'd like to be able to handle requests based on what is > connected to the pin. For example, for LED's, attach them to the LED > framework using led_attach() That makes sense, then. But how do you denote that logical high turns on the light, vs logical low? >> LED seems overly specific. Presumably you care that the output does >> something like "makes a light". But I don't understand why your API >> cares about light vs noise. And I don't see an active high/low in your >> proposal. So I don't understand how this is different from just >> "controllable binary output" > > As above, I want to be able to route the pin to the correct internal > subsystem in the GPIO driver. I just remember lights before LED, and the fact that they are LED vs incandescent is not important to how they are used. I don't know what's next. But given there is an led system, there is no incremental harm and it seems ok. >> I am also not following SENSOR. Do you just mean "reads if the logic >> level at the pin is high or low"? >> >> I don't think you mean using i2c bitbang for a temp sensor. > > Yes, just reading the logic level to display whether the "thing" connected > is on or off. A better name would be appreciated. Maybe "INDICATOR", which > would match the envsys name "ENVSYS_INDICATOR"?
Or even "GPIO_ENVSYS_INDICATOR" because there might be some binary inputs later that get hooked up to some other kind of framework. > Hopefully, the above is enough, but maybe a code snippet would help (this > snippet is only for LED's, but similar applies for other types). In the > hardware-specific driver, I add the pins to proplib: > > add_gpio_pin(pins, "disk_fault", GPIO_PIN_LED, > 0, GPIO_PIN_ACT_LOW, -1); > ... So I see the ACT_LOW. GPIO_PIN_LED is an output, but presumably this means that one can no longer use it with GPIO and only via led_. Which seems fine. Is that what you mean? > Then, in the MD driver I have: > > pin = prop_array_get(pins, i); > prop_dictionary_get_uint32(pin, "type", &type); > switch (type) { > case GPIO_PIN_LED: > ... > led_attach(n, l, pcf8574_get, pcf8574_set); Do you mean MD, or MI? > and because of the way that this chip works, I also need to know in advance > which pins are input and which are output, to avoid inadvertently changing > the input pins to output when writing to the chip. For that, generic > GPIO_PIN_IS_INPUT and GPIO_PIN_IS_OUTPUT definitions might be useful too. I 95% follow, but I am convinced that what you are doing is ok, so to be clear I have no objections.
Re: New header for GPIO-like definitions
Julian Coleman writes: > I'm adding a driver and hardware-specific properties for GPIO's (which pins > control LED's, which control sensors, etc). I need to be able to pass the > pin information from the arch-specific configuration to the MI driver. I'd > like to add a new dev/gpio/gpiotypes.h, so that I can share the definitions > between the MI and MD code, e.g.: > > #define GPIO_PIN_LED 0x01 > #define GPIO_PIN_SENSOR 0x02 > > Does this seem reasonable, or is there a better way to do this? I don't really understand how this is different from in/out. Presumably this is coming from some request from userspace originally, where someone, perhaps in a config file, has told the system how a pin is hooked up. LED seems overly specific. Presumably you care that the output does something like "makes a light". But I don't understand why your API cares about light vs noise. And I don't see an active high/low in your proposal. So I don't understand how this is different from just "controllable binary output". I am also not following SENSOR. Do you just mean "reads if the logic level at the pin is high or low"? I don't think you mean using i2c bitbang for a temp sensor. Perhaps you could step back and explain the bigger picture and what's awkward currently. I don't doubt you that more is needed, but I am not able to understand enough to discuss.
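Pulling the thread's suggestions together, the proposed header might end up looking something like the following. Only GPIO_PIN_LED/GPIO_PIN_SENSOR come from the original proposal; everything else (the INDICATOR rename, the direction and active-level defines, and all values) is illustrative guesswork based on the questions raised in the replies.

```c
/*
 * Hypothetical sketch of dev/gpio/gpiotypes.h, shared between MI and
 * MD code.  Names and values beyond the original proposal are made up.
 */
#define GPIO_PIN_LED		0x01	/* output, routed to led(4) */
#define GPIO_PIN_INDICATOR	0x02	/* input, routed to envsys(4) */

/* Direction, so chips like the pcf8574 can avoid clobbering inputs. */
#define GPIO_PIN_IS_INPUT	0x0100
#define GPIO_PIN_IS_OUTPUT	0x0200

/* Active level, so "asserted" is well defined per pin. */
#define GPIO_PIN_ACT_HIGH	0x00
#define GPIO_PIN_ACT_LOW	0x01
```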
Re: make COMPAT_LINUX match SYSV binaries
co...@sdf.org writes: > I feel compelled to explain further: > any OS that doesn't rely on this tag is prone to spitting out binaries > with the wrong tag. For example, Go spits out Solaris binaries with SYSV > as well. > > Our current solution to it is the kernel reading through the binary, > checking if it contains certain known symbols that are common on Linux. > > We support the following forms of compat: > > ultrix    not ELF > sunos     not ELF (we support only old stuff) > freebsd   always correctly tagged, because the native OS > checks this, like we do. > linux     ELF, not always correctly tagged > > So, currently, we only support one OS that has this problem, which is > linux. I am proposing we take advantage of it. > > In the event someone adds support for another OS with this problem (say, > modern Solaris), I don't expect this compat to be enabled by default, > for security reasons. So the problem will only occur if a user enables > both forms of compat at the same time. > > Users already have to opt in to have Linux compat support. I think it is > a lot to ask to have them tag every binary. Thanks for the explanation. I'm still not thrilled, but I withdraw my objection.
Re: make COMPAT_LINUX match SYSV binaries
co...@sdf.org writes: > As a background, some Linux binaries don't claim to be targeting the > Linux OS, but instead are "SYSV". > > We have used some heuristics to still identify those binaries as being > Linux binaries, like looking into the symbols defined by the binary. > > it looks like we no longer have other forms of compat expected to use > SYSV ELF binaries. Perhaps we should drop this elaborate detection logic > in favour of detecting SYSV == Linux? In general adapting to every confused practice out there leads us to a bad place. This just feels like a step along that path. I could see having a sysctl/etc. to enable this behavior, but it seems really irregular. Is there a way to have a tool to retag binaries that are tagged incorrectly? It seems SYSV emulation should not allow non-SYSV system calls.
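On the retagging question: the tag in question is a single byte, e_ident[EI_OSABI] at offset 7 of the ELF header (0 is ELFOSABI_SYSV, which is what Go emits; 3 is ELFOSABI_LINUX/GNU), so a retag tool would just rewrite that byte in place. A minimal reader sketch:

```c
#include <assert.h>
#include <stdio.h>

/*
 * Read the OS/ABI tag of an ELF file: byte 7 (EI_OSABI) of e_ident.
 * Returns the tag (0 = SYSV, 3 = Linux/GNU, ...), or -1 if the file
 * cannot be read or is not ELF.  A retagging tool would open the
 * file read-write and store the desired value at the same offset.
 */
static int
elf_osabi(const char *path)
{
	unsigned char ident[16];	/* EI_NIDENT */
	FILE *f = fopen(path, "rb");
	int abi = -1;

	if (f == NULL)
		return -1;
	if (fread(ident, 1, sizeof(ident), f) == sizeof(ident) &&
	    ident[0] == 0x7f && ident[1] == 'E' &&
	    ident[2] == 'L' && ident[3] == 'F')
		abi = ident[7];
	fclose(f);
	return abi;
}
```

(Note this is only the header tag; the symbol-based heuristics exist precisely because this byte is so often left at SYSV.)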
Re: autoloading compat43 on tty ioctls
chris...@astron.com (Christos Zoulas) writes: > Aside for the TIOCGSID bug which I am about to fix (it is in tty_43.c > and is used in libc tcgetsid(), all the compat tty ioctls are defined > in /usr/src/sys/sys/ioctl_compat.h... We can empty that file and try > to build the tree :-), but I am guessing things will break. Also a lot > of pkgsrc will break too. It is not 4.3 applications that break it is > applications that still use the 4.3 terminal api's. If the API is still present in our source tree, then the implementation probably does not belong under COMPAT_43. As I see it COMPAT_43 is to match an old ABI that one can no longer (on modern NetBSD) compile to. What you are describing sounds like "we have an API still, and we've had it since 4.3", which is not in my view COMPAT.
Re: Sample boot.cfg for upgraded systems (rndseed & friends)
David Brownlee writes: > What would people think of installing an original copy of the etc set > in /usr/share/examples/etc or similar - its 4.9M extracted and ~500K > compressed and the ability to compare what is on the system to what it > was shipped with would have saved me so much effort over the years :) I personally unpack etc and xetc to /usr/netbsd-etc via the INSTALL-NetBSD update script in etcmanage. I would not be super keen on adding a full etc by default, especially because then there's the issue of managing it for upgrades. But if it is unpacked someplace, and updated on updates, and old files removed on updates via postinstall fix, maybe.
Re: Logging a kernel message when blocking on entropy
Andreas Gustafsson writes: > The following patch will cause a kernel message to be logged when a > process blocks on /dev/random or some other randomness API. It may > help some users befuddled by pkgsrc builds blocking on /dev/random, > and I'm finding it useful when testing changes aimed at fixing PR > 55659. I'm in favor. I have not dug in to the brave new entropy world. I'm sure it's better in many ways, but it also seems like people/systems that used to not end up blocked before now do, apparently because some sources that used to be considered ok (timing of events) no longer are. So I think people should be given clues - things appear a bit too difficult now.
Re: Proposal to enable WAPBL by default for 10.0
Taylor R Campbell writes: [lots of good points, no disagreement] If /etc/master.passwd is ending up with junk, that's a clue that code that updates it isn't doing the write-secondary-file, fsync it, rename approach. As I understand it, with POSIX filesystems you have to do that because there is no guarantee on open/write/close that you'll have one or the other. Even with zfs, you could have done the write on the first half and not the second, so I think you still need this. > work...which is why I used to use ffs+sync on my laptop, and these > days I avoid ffs altogether in favour of zfs and lfs, except on > install images written to USB media.) Do you find that lfs is 100% solid now (in 9-stable, or current)? I have seen fixes and never really been sure.
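That dance, sketched for a generic file (the paths here are made up): write the new contents to a sibling file, fsync it, then rename(2) it into place, so a crash leaves either the whole old file or the whole new one, never half-written junk.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Atomically replace path with data.  The fsync before the rename is
 * the step that matters: it ensures the new contents are on disk
 * before the directory entry switches over.
 */
static int
safe_replace(const char *path, const char *tmppath,
    const char *data, size_t len)
{
	int fd = open(tmppath, O_WRONLY | O_CREAT | O_TRUNC, 0600);

	if (fd == -1)
		return -1;
	if (write(fd, data, len) != (ssize_t)len || fsync(fd) == -1) {
		close(fd);
		(void)unlink(tmppath);
		return -1;
	}
	close(fd);
	return rename(tmppath, path);	/* atomic switch-over */
}
```

(A fully careful version also fsyncs the containing directory afterward, so the rename itself is durable.)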
Re: AES leaks, cgd ciphers, and vector units in the kernel
a data point on a machine from 2014: $ ./aestest -l BearSSL aes_ct Intel SSE2 bitsliced $ progress -f /dev/zero sh -c 'exec ./aestest -e -b 256 -c aes-xts -i "Intel SSE2 bitsliced" > /dev/null' 399 MiB 56.98 MiB/s ^C $ progress -f /dev/zero sh -c 'exec ./aestest -e -b 256 -c aes-xts -i "BearSSL aes_ct" > /dev/null' 211 MiB 26.38 MiB/s ^C $ progress -f /dev/zero sh -c 'exec ./bad -e -b 256 -c aes-xts > /dev/null' 869 MiB 86.85 MiB/s ^C So the sse2 is slower, but not enough to get upset about. cpu0: "Intel(R) Core(TM) i7 CPU 930 @ 2.80GHz" cpu0: Intel Core i7, Xeon 34xx, 35xx and 55xx (Nehalem) (686-class), 2800.09 MHz cpu0: family 0x6 model 0x1a stepping 0x5 (id 0x106a5) cpu0: features 0xbfebfbff cpu0: features 0xbfebfbff cpu0: features 0xbfebfbff cpu0: features1 0x98e3bd cpu0: features1 0x98e3bd cpu0: features2 0x28100800 cpu0: features3 0x1 cpu0: features7 0x9c00
Re: AES leaks, cgd ciphers, and vector units in the kernel
Taylor R Campbell writes: >> What I meant is: consider an external USB disk of say 4T, which has a >> cgd partition within which is ffs. >> >> Someone attaches it to several systems in turn, doing cgd_attach, mount, >> and then runs bup with /mnt/bup as the target, getting deduplication >> across systems. > > (Side note: as a matter of architecture I would recommend > incorporating the cryptography into the application, like borgbackup, > restic, or Tarsnap do -- anything at a higher level than disks (even > at the level of the file system, like zfs encryption) has much more > flexibility and can also provide authentication. Generally the main > use case for disk encryption is to enable recycling disks without > worrying about information disclosure; the threat model and security > of disk encryption systems are both qualitatively very weak.) Sure, but this is about doing something that is really reliable about getting data back for disaster recovery, simplicity, only using tools that have existed for a long time. (You can't run zfs on old systems, and borgbackup has had enough stability issues that I wouldn't trust it.) >> So, using the new faster cipher won't work, because it's not supported >> by the older systems. >> >> However, if the -current system does AES slowly because it has the new >> constant-time implementation, and the older ones do it like they used >> to, I don't see a real problem. > > OK. If you encounter a scenario where this is likely to be a real > problem, let me know. From my viewpoint, a 3x slowdown, but with 100% reliability, is not a big deal. > I drafted an SSE2 implementation which considerably improves on the > BearSSL aes_ct implementation on a number of amd64 CPUs I tested from > around a decade ago.
> It is still slower than before -- and AES-CBC encryption hurts by far the most, because it is necessarily sequential, whereas AES-CBC decryption and AES-XTS in both directions can be vectorized -- but it does mitigate the problem somewhat. This covers all amd64 CPUs and probably most `i386' CPUs of the last 15-20 years.
>
> There is some more room for improvement -- SSSE3 provides PSHUFB which can sequentially speed up parts of AES, and is supported by a good number of amd64 CPUs starting around 14 years ago that lack AES-NI -- but there are diminishing returns for increasing implementation and maintenance effort, so I'd like to focus on making an impact on systems that matter. (That includes non-x86 CPUs -- e.g., we could probably easily adapt the Intel SSE2 logic to ARM NEON -- but I would like to focus on systems where there is demand.)

That sounds good.

> I drafted a couple programs to approximately measure performance from userland. They are very naive and do nothing to measure overhead from cgd(4) or disk i/o itself.
>
> https://www.NetBSD.org/~riastradh/tmp/20200621/aestest.tgz
> https://www.NetBSD.org/~riastradh/tmp/20200622/adiantum.tgz

Thanks - will try them.

>> So it remains to make userland AES use also constant time, as a separate step?
>
> Correct.

ok - and helpful details from nia@ noted.
Re: AES leaks, cgd ciphers, and vector units in the kernel
Taylor R Campbell writes:
>> I don't really see the new cipher as a reasonable option for removable disks that need to be accessed by older systems. I can see it for encrypted local disk. But given AES hardware speedup, I suspect most people can just stay with AES.
>
> Can you be more specific about the systems you're concerned about? What are the characteristics and performance requirements of the different systems that need to share disks? Do you have a reason to need to share a backup drive that you use on an up-to-date NetBSD on older hardware where it has to be fast, with a much older version of NetBSD?
>
> (I am sure there are use cases I haven't thought of; I just want to make sure I understand the use cases before I try to address them.)

What I meant is: consider an external USB disk of say 4T, which has a cgd partition within which is ffs.

Someone attaches it to several systems in turn, doing cgd_attach, mount, and then runs bup with /mnt/bup as the target, getting deduplication across systems. Of these systems, some are older NetBSD and some are newer. Posit one each of NetBSD 5, 7, 8, 9, and current in the mix, as a blend of strawman and not-so-crazy example.

After this, the disk is taken to an undisclosed location where it is unlikely to be destroyed (or at least, unlikely to be destroyed correlated with the main systems' disks), but at which it does not have reliable physical protection against snooping.

I submit that this is not an odd model for cgd usage. (I don't actually do this; I mount disks on one system and do over-the-network backups from the older systems, and my mix of system versions is different.)

So, using the new faster cipher won't work, because it's not supported by the older systems.

However, if the -current system does AES slowly because it has the new constant-time implementation, and the older ones do it like they used to, I don't see a real problem.
>> Is there an easy way to publish code that does hardware AES, to allow people to measure on their hardware? If a call for that on -users turns up essentially zero actual people that would be bothered, I think that would be interesting.
>
> I am not quite sure what you're asking. Correct me if I have misunderstood, but I suspect what you're getting at is:
>
>     How can someone on netbsd<=9 test empirically whether this patch will have a substantial negative performance impact or not?
>
> On basically all amd64 systems of the past decade, and on most if not all aarch64 systems, there is essentially guaranteed to be a net performance improvement. What about other systems?
>
> The best way to test this is to just boot a new kernel and try a workload. But I assume you are looking for a userland program that one can compile and run to test it without booting a new kernel.

Yes, that's what I meant. Kind of like "openssl speed".

> I could in a couple hours make a program that checks cpuid to detect hardware support and does some measurements in isolation -- to estimate an _upper bound_ on the system performance impact.
>
> The upper bound is likely to be extremely conservative unless your workload is actually reading and writing zeros to cgd on a RAM-backed file in tmpfs; for a realistic impact on cgd or ipsec you would have to take into account the disk or network throughput -- the fraction of it that is spent in the crypto is what the 1/3-2/3 figure applies to.

I did sort of mean "how many MB/s would the old impl do, and how many MB/s would the new one do", realizing that actually reading/writing from disk might overwhelm that. I'm not sure my request is reasonable; it might help up the comfort level for people.
> (Note that there is no impact on userland crypto, which means no > impact on TLS or OpenVPN or anything like that, unless for some > bizarre reason you've turned on kern.cryptodevallowsoft and the > userland crypto uses /dev/crypto, the solution to which is to stop > using /dev/crypto and/or turn off kern.cryptodevallowsoft for anything > other than testing because it's terrible (and also the apparently > boolean nature of kern.cryptodevallowsoft is a lie).) So it remains to make userland AES use also constant time, as a separate step? >> I'm unclear on openssl and hardware support; "openssl speed" might be a >> good home for this, and I don't know if openssl needs the same treatment >> as cgd. (Fine to keep separable things separate; not a complaint!) > > OpenSSL is a mixed bag. It has a lot more MD implementations of > various cryptographic primitives. But many of them are still leaky. > So it's probably not a very good proxy for what the performance impact > of this patch set will be. I sort of meant putting the new code in there so it can be measured, but I realize that's messy. Please don't take my "is there a way" question as a demand.
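To make the "openssl speed"-style measurement discussed above concrete, here is a rough userland sketch of my own (not one of the programs from the thread): time a transform over an in-memory buffer and report MiB/s, which gives the isolation-only upper bound Taylor describes. SHA-256 stands in for the candidate AES code purely for illustration; a real comparison would time the old and new AES implementations and compare the two rates.

```python
import hashlib
import time

def measure_throughput(transform, block_size=65536, duration=0.25):
    """Estimate an upper bound on MiB/s for `transform`, in isolation
    (no disk or network I/O, just the per-byte cost of the routine)."""
    buf = bytes(block_size)
    processed = 0
    start = time.perf_counter()
    while time.perf_counter() - start < duration:
        transform(buf)
        processed += block_size
    elapsed = time.perf_counter() - start
    return processed / elapsed / (1024 * 1024)

# SHA-256 is only a stand-in for the cipher code being measured.
rate = measure_throughput(lambda b: hashlib.sha256(b).digest())
print("%.1f MiB/s" % rate)
```

As noted in the thread, a number like this is very conservative: once real disk or network I/O is in the path, only the fraction of time spent in the crypto is affected.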
Re: AES leaks, cgd ciphers, and vector units in the kernel
Taylor R Campbell writes:
>> Date: Thu, 18 Jun 2020 07:19:43 +0200
>> From: Martin Husemann
>>
>> One minor nit: with the performance impact that high, and there being setups where runtime side channel attacks are totally not an issue, maybe we should leave the old code in place for now, #ifdef'd as default-off options to provide time for a full reconstruction (or until the machine gets updated to "the last decade" cpu)?
>
> Having leaky AES code around is asking for trouble -- and would require additional complexity to implement and maintain (e.g., is it always unhooked from the build, or do we hook it in just enough to run tests?), which would add further burden on an audit to verify that it's _not_ being used in a real application.
>
> The goals here are to make that burden completely go away by making the answer unconditionally no, there's essentially no danger that AES in the kernel is leaky; and to provide alternatives with performance ranging from `not worse' to `much better' to avoid the conflict that AES invites between performance and security.
>
> If you have a specific system where there's a real negative performance impact that matters to you, I would be happy to talk over the details and see how we can address it better.

I see your point, and I think this is probably ok, but I share Martin's concern.

For me, the main use of cgd is to encrypt backup drives. I am therefore not really concerned about side channel attacks when they are attached and keyed on the system being backed up. (I realize other people use cgd for other reasons.)

I don't really see the new cipher as a reasonable option for removable disks that need to be accessed by older systems. I can see it for encrypted local disk. But given AES hardware speedup, I suspect most people can just stay with AES.

Is there an easy way to publish code that does hardware AES, to allow people to measure on their hardware?
If a call for that on -users turns up essentially zero actual people that would be bothered, I think that would be interesting. I'm unclear on openssl and hardware support; "openssl speed" might be a good home for this, and I don't know if openssl needs the same treatment as cgd. (Fine to keep separable things separate; not a complaint!)
Re: makesyscalls (moving forward)
David Holland writes: > Meanwhile it doesn't belong in sbin because it doesn't require root, > nor does doing something useful with it require root, and it doesn't > need to be on /, so... usr.bin. Unless we think libexec is reasonable, > but if 3rd-party code is going to be running it we really want it on > the $PATH, so... I agree with that logic, that makesyscalls is kind of like config, and that /usr/bin makes sense. There's nothing admin-ish about it, as building an operating system is not about configuring the host. We could have a directory for tools used only for building NetBSD that are not otherwise useful, and put config and makesyscalls there, but given that we aren't overwhelming bin in a way that causes trouble, that doesn't seem like a good idea.
Re: KAUTH_SYSTEM_UNENCRYPTED_SWAP
Alexander Nasonov writes: > Greg Troxel wrote: >> Kamil Rytarowski writes: >> >> > Is it possible to avoid negation in the name? >> > >> > KAUTH_SYSTEM_ENABLE_SWAP_ENCRYPTION >> >> I think the point is to have one permission to enable it, which is >> perhaps just regular root, and another to disable it if securelevel is >> elevated. >> >> So perhaps there should be two names, one to enable, one to disable. > > Kauth is about security rather than speed or convenience. Disabling > encryption may improve speed but it definitely degrades your security > level. So, you can enable vm.swap_encrypt at any level but you can't > disable it if you care about security. I understand that. But there's still a question of "should there be a KAUTH name for enabling as well as disabling", separate from "what should the rules be". I think everybody believes that regardless of securelevel, root should be able to enable encrypted swap. But probably almost everyone thinks regular users should not be allowed to enable it. I realize we have a lot of "root can", and that extending kauth to make everything separate is almost certainly too much. But when disabling is a big deal, I think it makes sense to add names for both enabling and disabling, to make that intent clearer in the sources. But, I don't think this is that important, and a comment would do.
Re: KAUTH_SYSTEM_UNENCRYPTED_SWAP
Kamil Rytarowski writes: > Is it possible to avoid negation in the name? > > KAUTH_SYSTEM_ENABLE_SWAP_ENCRYPTION I think the point is to have one permission to enable it, which is perhaps just regular root, and another to disable it if securelevel is elevated. So perhaps there should be two names, one to enable, one to disable.
Re: Rump makes the kernel problematically brittle
Thor Lancelot Simon writes: > I'd love to see a GSoC project to actually make rump build like the > kernel...but it may be too much work. Good points, and improvement would be great.
Re: Rump makes the kernel problematically brittle
The other side of the coin to "rump is fragile" is "an operating system without rump-style tests that can be run automatically is susceptible to hard-to-detect failures from changes, and is therefore fragile". There have been many instances (usually on current-users, I think) of reports of newly failing test cases, leading to rapid removal of newly-introduced defects.
Re: Rump dependencies (5.2)?
Mouse writes: >> The rump build is done with separate reachover makefiles. [...] > > Hm. Then I think possibly the right answer for the moment is for me to > excise rump from my tree entirely. I can't recall ever wanting its > functionality, and trying to figure out what the dependency graph is > when it exists only implicitly in Makefiles scattered all over the tree > sounds like a recipe for serious headaches. > > If and when it looks worth the effort, I can always back out the > removal commit and clean up the result. But SCM_MEMORY looks like the > more valuable thing for my use cases for the moment. Your tree, your call. But it seems really obvious that you should fix the rump build and write some atf test cases for your SCM_MEMORY stuff, and then you will be able to test it automatically.
Re: Proposal, again: Disable autoload of compat_xyz modules
chris...@astron.com (Christos Zoulas) writes:
> I propose something very slightly different that can preserve the current functionality with user action:
>
> 1. Remove them from standard kernels in architectures where modules are supported. Users can add them back or just use modules.
> 2. Disable autoloading, but provide a sysctl to enable autoloading (1 global sysctl for all compat modules). Users can change the default in /etc/sysctl.conf (adds sysctl to the proposal)

I am assuming that we are talking about disabling autoloading of a number of compat modules that are some combination of believed likely to have security bugs and not used extensively, and this includes compat for foreign OSes, but does not, at least for now, include compat for older NetBSD.

This situation is basically a balancing act of the needs/benefits somehow aggregated (I will avoid "averaged") over all users. It seems pretty unclear how to evaluate that in total. But, it does seem like your single-sysctl proposal means:

  people who like compat being autoloaded can add one line in sysctl.conf and be back where they were

  people who want specific modules can load them and not enable the general sysctl

  people who don't know about any of this who try to run Linux binaries will lose, and presumably there'd be a line in dmesg that says which module failed to autoload, like

    policy blocked autoloading compat_linux module; see compat_linux(8)

  which would then explain.

I'm also assuming this is being talked about for HEAD and hence 10, and not 9.

Overall, this seems like a reasonable compromise among conflicting goals. If older NetBSD compat were included, I'd want to see a separate sysctl, default-on for now. (My guess is that wanting to disable that is a fairly extreme position, at least these days.)
Re: build.sh sets with xz (was Re: vfs cache timing changes)
Martin Husemann writes:
> On Fri, Sep 13, 2019 at 06:59:42AM -0400, Greg Troxel wrote:
>> I'd like us to keep somewhat separate the notions of:
>>
>>   someone is doing build.sh release
>>
>>   someone wants min-size sets at the expense of a lot of cpu time
>>
>> I regularly do build.sh release, and rsync the releasedir bits to other machines, and use them to install. Now perhaps I should be doing "distribution", but sometimes I want the ISOs.
>
> The default is MKDEBUG=no so you probably will not notice the compression difference that much.

I don't follow what DEBUG has to do with this, but that's not important.

> If you set MKDEBUG=yes you can just as easily set USE_XZ_SETS=no (or USE_PIGZGZIP=yes if you have pigz installed).

Sure, I realize I could do this. The question is about defaults.

> The other side of the coin is that we have reproducible builds, and we should not make it harder than needed to reproduce our official builds.

It should not be difficult or hard to understand, which is perhaps different than defaults.

> But ... it already needs some settings (which we still need to document on a wiki page properly), so we could also default to something else and force maximal compression via the build.sh command line on the build cluster.

I could see MKREPRODUCIBLE=yes causing defaults of various things to be a particular way, and perhaps letting XZ default to no otherwise. I would hope that what MKREPRODUCIBLE=yes has to set is not very many things, but I haven't kept up.
Re: build.sh sets with xz (was Re: vfs cache timing changes)
"Tom Spindler (moof)" writes:
>> PS: The xz compression for the debug set takes 36 minutes on my machine. We should do something about it. Matt to use -T for more parallelism?
>
> On older machines, xz's default settings are pretty much unusable, and USE_XZ_SETS=no (or USE_PIGZGZIP=yes) is almost a requirement. On my not-exactly-slow i7 6700K, build.sh -j4 parallel is just fine until it hits the xz stage; gzip is many orders of magnitude faster. Maybe if xz were cranked down to -2 or -3 it'd be better at not that much of a compression loss, or it defaulted to the higher compression level only when doing a `build.sh release`.

(I have not really been building current so am unclear on the xz details.)

I'd like us to keep somewhat separate the notions of:

  someone is doing build.sh release

  someone wants min-size sets at the expense of a lot of cpu time

I regularly do build.sh release, and rsync the releasedir bits to other machines, and use them to install. Now perhaps I should be doing "distribution", but sometimes I want the ISOs. Sometimes I do builds just to see if they work, e.g. if being diligent about testing changes.

(Overall the notion of staying with gzip in most cases, with a tunable for extreme savings, sounds sensible, but I am too unclear to really weigh in on it.)
Re: NFS lockup after UDP fragments getting lost
Edgar Fuß writes:
> Thanks to riastradh@, this turned out to be caused by an (UDP, hard) NFS mount combined with a mis-configured IPFilter that blocked all but the first fragment of a fragmented NFS reply (e.g., readdir) combined with a NetBSD design error (or so Taylor says) that a vnode lock may be held across I/O, in this case, network I/O.

Holding a vnode lock across IO seems like a bug to me too. Marking the vnode as having an in-process operation so others can lock/read/report-that-status/unlock seems ok. But I'm sure you already know that vnode locking is hard.

> It looks like the operation to which the reply was lost sometimes doesn't get retried. Do we have some weird bug where the first fragment arriving stops the timeout but the blocking of the remaining fragments cause it to wedge?

Probably not. Fragments sit until there's a complete packet, and then the packet is sent to the stack. So the NFS code is almost certainly totally unaware of the arrival of the first fragment.
Re: /dev/random is hot garbage
Taylor R Campbell writes:
>> It would also be reasonable to have a sysctl to allow /dev/random to return bytes anyway, like urandom would, and to turn this on for our xen builders, as a different workaround. That's easy, and it doesn't break the way things are supposed to be for people that don't ask for it.
>
> What's the advantage of this over replacing /dev/random by a symlink to /dev/urandom in the build system?
>
> A symlink can be restricted to a chroot, while a sysctl knob would affect the host outside the chroot. The two would presumably require essentially the same privileges to enact.

None, now that I think of it. So let's change that on the xen build host.

And, the other issue is that systems need randomness, and we need a way to inject some into xen guests. Enabling some with rndctl works, or at least used to, even if it is theoretically dangerous. But we aren't trying to defend against the dom0.
Re: /dev/random is hot garbage
I don't think we should change /dev/random. For a very long time, the notion is that the bits from /dev/random really are ok for keys, and there has been a notion that such bits are precious and you should be prepared to wait. If you aren't generating a key, you shouldn't read from /dev/random. So I think rust is wrong and should be fixed. I can see the reason for frustration, but I believe that we should not break things that are sensible because they are abused and cause problems in some environments. It would also be reasonable to have a sysctl to allow /dev/random to return bytes anyway, like urandom would, and to turn this on for our xen builders, as a different workaround. That's easy, and it doesn't break the way things are supposed to be for people that don't ask for it. Also, on the xen build hosts, it would perhaps be good to turn on entropy collection from network and disk. Another approach, harder, is to create a xenrnd(4) pseudodevice and hypervisor call that gets bits from the host's /dev/random and injects them as if from a hardware rng.
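As a concrete illustration of the "don't read /dev/random unless you are generating a key" point, here is a small sketch of my own (not from the thread): os.urandom() draws from the kernel's nonblocking pool, which is the right interface for everything other than long-term key generation.

```python
import os

# os.urandom() reads from the kernel's nonblocking randomness pool
# (the /dev/urandom semantics); it never blocks waiting for entropy,
# which is what a build system or ordinary application wants.
nonce = os.urandom(16)
session_id = os.urandom(32).hex()

print(len(nonce), len(session_id))  # -> 16 64
```

A program that instead opens /dev/random for every random byte is, in the view above, abusing an interface whose bits are meant to be precious.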
Re: mknod(2) and POSIX
David Holland writes:
> However, I notice that mknod(2) does not describe how to set the object type with the type bits of the mode argument, or document which object types are allowed, and mkfifo(2) also does not say whether S_IFIFO should be set in the mode argument or not.

This is documented quite well in the opengroup.org standards pages (set S_IFIFO in the mode, and just don't set any special bits, respectively). Agreed that fixing the man pages would be good.

> (Though mkfifo(2) hints not by not documenting EINVAL for "The supplied mode is invalid", this sort of inference is annoying even in standards and really not ok for docs...)

https://pubs.opengroup.org/onlinepubs/9699919799/functions/mknod.html
https://pubs.opengroup.org/onlinepubs/9699919799/functions/mkfifo.html#tag_16_327

Those seem clear to me.
Re: mknod(2) and POSIX
Agreed with uwe@ about not mixing unrelated changes. Pretend we are using git :-) The patch looks fine. Agreed that making fifos with mknod is an odd thing to do, but if it's in posix, then we should do it unless there's something really bad about supporting the posix usage. In this case, it just seems silly to have a second way to make fifos, not harmful.
mknod(2) and POSIX
I recently noticed that pkgsrc/sysutils/bup failed when restoring a fifo under NetBSD, because it calls mknod (in python) which calls mknod(3) and hence mknod(2).

Our mknod(2) man page does not mention creating FIFOs, and claims

     The mknod() function conforms to IEEE Std 1003.1-1990 (“POSIX.1”).
     mknodat() conforms to IEEE Std 1003.1-2008 (“POSIX.1”).

I can't find 1990 online, but 2004 and 2008 require fifo support in mknod:

  https://pubs.opengroup.org/onlinepubs/009695399/functions/mknod.html
  https://pubs.opengroup.org/onlinepubs/9699919799.2008edition/functions/mknod.html

However, at least in netbsd-8, our kernel (sys/vfs_syscalls.c:do_mknod_at):

  requires KAUTH_SYSTEM_MKNOD for all callers, and hence EPERM for non-root

  has a switch on allowable types, and S_IFIFO is not one of them, and hence EINVAL

I realize mkfifo is preferred in our world, and POSIX says it is preferred. But I believe we have a failure to follow POSIX. Other opinions?
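The POSIX behavior at issue can be demonstrated from Python, the same path bup takes (this sketch is mine, not from the report). On a conforming system both calls below create a fifo without privilege; on netbsd-8, per the above, the os.mknod() call is the one that fails with EPERM or EINVAL.

```python
import os
import stat
import tempfile

d = tempfile.mkdtemp()
p_mknod = os.path.join(d, "fifo-via-mknod")
p_mkfifo = os.path.join(d, "fifo-via-mkfifo")

# POSIX mknod(): creating a FIFO must work for unprivileged callers
# when S_IFIFO is set in the mode.
os.mknod(p_mknod, stat.S_IFIFO | 0o600)

# mkfifo(): the preferred interface; no type bits in the mode.
os.mkfifo(p_mkfifo, 0o600)

for p in (p_mknod, p_mkfifo):
    print(p.rsplit("-", 1)[-1], stat.S_ISFIFO(os.lstat(p).st_mode))
```

lstat() is used rather than open() because opening a fifo with no reader would block.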
Re: pool: removing ioff?
Maxime Villard writes: > I would like to remove the 'ioff' argument from pool_init() and friends, > documented as 'align_offset' in the man page. This parameter allows the > caller to specify that the alignment given in 'align' is to be applied at > the offset 'ioff' within the buffer. > > I think we're better-off with hard-aligned structures, ie with __aligned(32) > in the case of XSCALE. Then we just pass align=32 in the pool, and that's it. > > I would prefer to avoid any confusion in the pool initializers and drop ioff, > rather than having this kind of marginal and not-well-defined features that > add complexity with no real good reason. > > Note also that, as far as I can tell, our policy in the kernel has always > been to hard-align the structures, and then pass the same alignment in the > allocators. I am not objecting as I can't make a performance/complexity argument. But, I wonder if this comes from the Solaris allocation design, and that the ioff notion is not about alignment for 4/8 objects to fit the way the CPU wants, but for say 128 byte objects to be lined up on various different offsets in different pages to make caching work better. But perhaps that doesn't exist in NetBSD, or is done differently, or my memory of the paper is off.
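For what it's worth, the idea recalled above from the Solaris slab allocator design can be sketched as follows. Treat this as a hypothetical illustration of "coloring" from memory of the paper, not as a description of what NetBSD's pool(9) or its ioff parameter actually does: the slack left after packing objects into a page is doled out in alignment-sized steps, so that equal-index objects in different pages land on different cache lines.

```python
def color_offsets(page_size, obj_size, align):
    """Slab-style cache coloring: distribute the slack left over after
    packing objects into a page as per-page starting offsets, in
    `align`-sized steps, so equal-index objects in different pages
    map to different cache lines."""
    slack = page_size % obj_size
    ncolors = slack // align + 1
    return [i * align for i in range(ncolors)]

# 4096-byte pages, 360-byte objects, 64-byte cache lines:
# 11 objects fit, leaving 136 bytes of slack -> three colors.
print(color_offsets(4096, 360, 64))  # -> [0, 64, 128]
```

An allocator would cycle through these offsets as it allocates new pages for the cache.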
Re: patch: debug instrumentation for dev/raidframe/rf_netbsdkintf.c
Christoph Badura writes:
> On Mon, Jan 21, 2019 at 04:24:49PM -0500, Greg Troxel wrote:
>> Separately from debug code being careful, if it's a rule that bdv can't be NULL, it's just as well to put in a KASSERT. Then we'll find out where that isn't true and can fix it.
>
> I must not be getting something. If rf_containsboot() is passed a NULL pointer, it will trap with a page fault and we can get a stacktrace from ddb. If we add a KASSERT it will panic and we can get a stacktrace from ddb. I don't see where the benefit in that is.

The benefit is that the panic from the KASSERT is cleaner, and it documents for readers of the function that the author believes it is a rule. And it will definitely fault, even on machines which can dereference NULL - that is technically, if not practically, architecture dependent.

> Do you think we should add a KASSERT to document that rf_containsboot() does expect a valid pointer? I'd see value in that and would go ahead with it.

Yes. Basically, in any kernel function, if there is a requirement that a pointer be non-NULL, then there should be a KASSERT and the code should then feel free to assume it is valid.

When a KASSERT is hit, the user gets a message with the KASSERT expression and the source file/line, instead of a page fault traceback. It's very easy and quick to go from that printout to the KASSERT that failed.

Plus, adding the KASSERT, or talking about adding it, is a good way to check if there is consensus among the other developers that this really is a rule. In NetBSD, people are really good at telling you you're wrong!
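The pattern under discussion, transliterated into a userland sketch with Python's assert standing in for the kernel's KASSERT (the function body here is invented for illustration, not the real rf_containsboot()):

```python
def contains_boot(bootname, disks):
    """Return True if the boot device name appears among the disks.

    The assert plays the role of KASSERT: it states the precondition
    up front, fails with the expression and file/line when violated,
    and documents that the body may assume a valid value.
    """
    assert bootname is not None, "bootname must not be NULL"
    return any(d == "/dev/" + bootname for d in disks)

print(contains_boot("wd0", ["/dev/wd0", "/dev/wd1"]))  # -> True
```

As with DIAGNOSTIC kernels, the check can be compiled out (python -O), so the body must still be correct when the precondition holds and the assertion is absent.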
Re: patch: debug instrumentation for dev/raidframe/rf_netbsdkintf.c
Christoph Badura writes:
>> > +	if (bdv == NULL)
>> > +		return 0;
>> > +
>>
>> This looked suspicious, even before I read the code. The question is if it is ever legitimate for bdv to be NULL.
>
> That is an excellent point. The short answer is, no it isn't. And it never was NULL in the code that used it. I got a trap into ddb because of a null pointer deref in the DPRINTF that I changed (in the 4th hunk of my patch).
>
>> I am a fan of having comments before every function declaring their preconditions and what they guarantee on exit. Then all uses can be audited to check that they guarantee the preconditions are true. This approach is really hard-core in eiffel, known as design by contract.
>
> Yes, I totally agree. Also to the rest of your message that I didn't quote.
>
> When I prepared the patch yesterday I was about to delete the above change because at first I couldn't remember why I added it ~3 weeks ago. That should have raised a big fat warning sign.
>
> I thought about adding a comment after I read your private mail earlier today. In the end I decided it is better to not change rf_containsboot() and instead introduce a wrapper for the benefit of the DPRINTF.

Separately from debug code being careful, if it's a rule that bdv can't be NULL, it's just as well to put in a KASSERT. Then we'll find out where that isn't true and can fix it.
Re: patch: debug instrumentation for dev/raidframe/rf_netbsdkintf.c
Christoph Badura writes:
> Here is some instrumentation I found useful during my recent debugging. If there are no objections, I'd like to commit soon.
>
> The change to rf_containsboot() simplifies the second DPRINTF that I added.
>
> Index: rf_netbsdkintf.c
> ===================================================================
> RCS file: /cvsroot/src/sys/dev/raidframe/rf_netbsdkintf.c,v
> retrieving revision 1.356
> diff -u -r1.356 rf_netbsdkintf.c
> --- rf_netbsdkintf.c	23 Jan 2018 22:42:29 -0000	1.356
> +++ rf_netbsdkintf.c	20 Jan 2019 22:32:14 -0000
> @@ -472,6 +472,9 @@
>  	const char *bootname = device_xname(bdv);
>  	size_t len = strlen(bootname);
>  
> +	if (bdv == NULL)
> +		return 0;
> +
>  	for (int col = 0; col < r->numCol; col++) {
>  		const char *devname = r->Disks[col].devname;
>  		devname += sizeof("/dev/") - 1;

This looked suspicious, even before I read the code. The question is if it is ever legitimate for bdv to be NULL.

I am a fan of having comments before every function declaring their preconditions and what they guarantee on exit. Then all uses can be audited to check that they guarantee the preconditions are true. This approach is really hard-core in eiffel, known as design by contract.

In NetBSD, many functions have KASSERT at the beginning. This checks them (under DIAGNOSTIC) but it also is a way of documenting the rules.

From a quick glance at the code it seems obvious that it's not ok to call these functions with a NULL bdv. So if bdv is an argument and not allowed to be NULL, then early on in that function, where you check/return, there should be

	KASSERT(bdv != NULL)

Not really on point, but as a caution: there should be no behavior change in any function under DIAGNOSTIC, if the code is bug free and preconditions are met. So "if something we can rely on isn't true, panic" is fine, but many other things are not.
Re: Importing libraries for the kernel
m...@netbsd.org writes: > I don't expect there to be any problems with the ISC license. It's the > preferred license for OpenBSD and we use a lot of their code (it's > everywhere in sys/dev/) Agreed that the license is ok. > external, as I understand it, means "please upstream your changes, and > avoid unnecessary local changes". Agreed. And also that we have a plan/expectation of tracking changes and improvements that upstream makes. Code that is not in external more or less implies that we are the maintainer. For these libraries, my expectation is that they are being actively maintained and that we will want to update to newer upstream versions from time to time.
Re: noatime mounts inhibiting atime updates via utime()
Edgar Fuß writes: >> Honestly, I think atime is one of the dumbest thing ever. > We occasionally use them to find out (or have a first guess at): > -- has anyone used libfoobar last year? > -- who uses kbaz, i.e. has /home/xyz/.config/kbaz.conf been accessed? > > We use snapshots to run backups, so atimes are not touched by them. I fairly often look at atimes to find out if old libraries have been used, and various other things. I have also had a test that tried to use utime fail on a machine that was noatime. So the notion that noatime should mean what it does now, but allow explicit writes sounds good. I don't see any value in changing the naming of the flags. Having a fs write atime updates unless mounted noatime seems fine, and if people want noatime that's easy. I would be opposed to e.g. dropping the noatime option, making noatime default, and adding an atime option. That's just churn violating historical norms for no good reason. There's a question of what the default for installs should be, and I don't have a real opinion about that. It would be good to have stats about writes, separately including atime updates. Right now we know it causes writes but I haven't seen data.
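The distinction argued for above - noatime suppressing implicit, read-triggered atime updates while still honoring explicit writes - is visible through the utime interface. A small sketch of my own for illustration:

```python
import os
import tempfile

fd, path = tempfile.mkstemp()
os.close(fd)

# An explicit utime() sets exactly the timestamps the caller asks for.
# The position above is that noatime should only suppress the implicit
# atime updates caused by reads, not an explicit request like this --
# which is what the failing test on a noatime mount was relying on.
os.utime(path, (1_000_000_000, 1_000_000_000))

st = os.stat(path)
print(int(st.st_atime), int(st.st_mtime))  # -> 1000000000 1000000000
os.unlink(path)
```

Merely reading the file back, by contrast, is the operation whose atime side effect noatime is meant to elide.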
Re: fixing coda in -current
m...@netbsd.org writes:
> On Sun, Nov 25, 2018 at 08:05:21PM -0500, Greg Troxel wrote:
>> However, I am pleased to report that the coda people have said that they are working on a fuse interface, although it's expected to be slower. We'll see, both if it materializes and how fast it is.
>
> That'd be neat. ... can we get general consensus about removing kernel coda if that happens, and the FUSE implementation works for netbsd too? dholland speaks poorly of it, we don't have a volunteer to write out tests, and it has a history of breakage.

Getting consensus is hard enough that I would prefer to defer that until we see where we are. The breakage history from NetBSD VFS changes isn't really that bad -- a few times in 20 years, and it has caused very little trouble for others.
Re: fixing coda in -current
m...@netbsd.org (Emmanuel Dreyfus) writes: > Greg Troxel wrote: > >> However, I am pleased to report that the coda people have said that they >> are working on a fuse interface, although it's expected to be slower. > > FUSE vs kernel does not really matter when we deal with network > filesystem performance. The latency of requesting a network operation is > orders of magnitude higher than issuing a few system calls. That's true when the file has to be fetched. Coda, like AFS, caches files in normal operation, and there are read lock callbacks. So the first fetch is over the network and slow, and subsequent reads are at nearly the speed of the underlying filesystem. It is this speed that people are talking about.
Re: fixing coda in -current
David Holland writes: > So I have no immediate comment on the patch but I'd like to understand > better what it's doing -- the last time I crawled around in it > (probably 7-8 years ago) it appeared to among other things have an > incestuous relationship with ufs_readdir such that if you tried to use > anything under than ffs as the local container it would detonate > violently. But I never did figure out exactly what the deal was other > than it was confusing and seemed to violate a lot of abstractions. > > Can you clarify? It would be nice to have it working properly and > stuff like the above is only going to continue to fail in the future... I didn't read this patch carefully, and I'm not Brett. But the basic scheme is that a container file representing a directory is in a particular format. This has been a source of issues when there was an alignment change in directory reading. My impression is that the way it should be is that a container file that's a directory should be read in ufs format, regardless of the container filesystem type. I am not sure that's the way the code is. However, I am pleased to report that the coda people have said that they are working on a fuse interface, although it's expected to be slower. We'll see, both if it materializes and how fast it is.
Re: fixing coda in -current
I volunteer to bug Satya about using FUSE instead of a homegrown (pre-FUSE) kernel interface. I am unaware of anything else that allows writes while disconnected and reintegrates them. I have actually done that, both on purpose and for several days while my IPsec connection was messed up, and it really worked.

Re: fixing coda in -current
I used to use it, and may again. So I'd like to see it stay, partly because I think it's good to keep NetBSD relevant in the filesystem research world. I am expecting to see new upstream activity. But, I think it makes sense to remove it from GENERIC, and perhaps give it whatever don't-autoload treatment we have, so that people only get it if they explicitly ask for it. That way it should not bother others.
Re: Things not referenced in kernel configs, but mentioned in files.*
co...@sdf.org writes: > So, I am excluding things that appear in ALL, and I am not checking if But ALL is an x86 thing, currently. > they appear as modules. Interesting, but I suppose they then belong in ALL also. > So far I had complaints about the appearance of 'lm' which cannot be > safely included in a default kernel, for example. Sure, lots of things are not ok in GENERIC, but do those concerns apply to it being in ALL?
Re: Things not referenced in kernel configs, but mentioned in files.*
co...@sdf.org writes: > This is an automatically generated list with some hand touchups, feel > free to do whatever with it. I only generated the output. > > ac100ic > acemidi > acpipmtr > [snip] I wonder if these are candidates to add to an ALL kernel, and if it will turn out that they are mostly not x86 things. I see we only have ALL for i386/amd64. I wonder if it makes sense to have one in evbarm.
Re: NetBSD-8 kernel too big?
Two thoughts:

When trimming, ls -lSr in the kernel build directory will identify large objects.

We have had kernel modules for a while, but I'm not entirely clear on where we are. I would think that moving to a mode of aggressively not including things that can be modules and loading them from the fs as needed would help, particularly if the issue is the bootloader, vs memory used up when running. This is not built as part of an -8 release build, but there is MODULAR in the conf directory. signature.asc Description: PGP signature
Re: How to prevent a mutex being _enter()ed from being _destroy()ed?
Edgar Fuß writes: > I know very little about locking so that's probably a stupid question, but: Don't worry - locking is probably the hardest thing to get right. > Is there a general scheme/rule/proposal how to prevent a mutex that someone > is trying to mutex_enter() from being teared down by someone else calling > mutex_destroy() on it during that? Not really. Basically it's a bug to destroy a mutex that could possibly be in use. So there has to be something else protecting access to the mutex that is being destroyed. > Specifically, I'm trying to understand what should prevent a thread from > destroying a socket's lock (by sofree()/soput()) while unp_gc() is trying > to acquire that lock. I would expect (without reading the code) that there would be some lock on the list of sockets (using a list here; the type is not the point), and there would be a sequence like

  acquire socket_list lock
  find socket
  lock socket
  unlock socket_list

or alternatively

  acquire socket_list lock
  find socket
  unlink socket from the list
  unlock socket_list
  do whatever to the socket

So there has to be a rule about when various things are valid based on being in various higher-level data structures. In an ideal world this rule would be clearly explained in the source code. Ancient BSD tradition is not to explain these things :-(
Re: new errno ?
Phil Nelson writes: > Hello, > > In working on the 802.11 refresh, I ran into a new errno code from > FreeBSD: > > #define EDOOFUS 88 /* Programming error */ > > Shall we add this one? (Most likely with a different number since 88 is > taken > in the NetBSD errno.h.) > >I could use EPROTO instead, but My immediate reaction is not to add it. It's pretty clearly not in POSIX, unlikely to be added, and sounds unprofessional. It seems like it would be used in cases that a KASSERT would catch, for the non-DIAGNOSTIC case. I might just map it to EFAULT or EINVAL.
Re: ./build.sh -k feature request (was Re: GENERIC Kernel Build errors with -fsanitize=undefined option enabled)
[grudgingly keeping tech-kern and current-users both :-] Reinoud Zandijk writes:

> @@ -1084,6 +1084,9 @@ Usage: ${progname} [-EhnoPRrUuxy] [-a ar
>            Should not be used without expert knowledge of the build system.
> -h         Print this help message.
> -j njob    Run up to njob jobs in parallel; see make(1) -j.
> +-k        Continue processing after errors are encountered, but only on
> +          those targets that do not depend on the target whose creation
> +          caused the error.
> -M obj     Set obj root directory to obj; sets MAKEOBJDIRPREFIX. Unsets MAKEOBJDIR.
> -m mach    Set MACHINE to mach. Some mach values are actually

a few unrelated comments:

I think it's pretty clear that if you don't pass -k, nothing will change. So people that don't like -k or think it's unsafe can just not pass it.

It would be nice if invoking build.sh with -k resulted in two properties: if any subpart fails, getting a non-zero exit status from build.sh, and having that be clear from the end of the log. Currently 'make -k' in pkgsrc results in -k being passed to the WRKSRC make invocation, which results in a zero status, which results in a .build_done cookie, which is wrong.

The full release build is a number of substeps. It makes sense to use -k for the main build of the system, after the tools build, to get the "find all problems" property you want. But it's not clear to me that it makes sense to build the main system if the tool build fails. And it's not clear that sets should be produced. Therefore, I wonder if passing -k to make should be limited to the main build, and the main build should be deemed to fail if there are any errors, so that there are no sets produced in that case.
Re: i2c and indirect vs. direct config
(I am a noob at i2c.) Your points about explicit config make a lot of sense; reminds me of qbus and isa bus where you have to know. However, baking the device list into the kernel is unfortunate, and I wonder if it makes sense to have the i2c plan either in a boot variable or as something that can configure devices after boot, sort of like gpio.conf.
Re: cpu1: failed to start (amd64)
It feels to me like you might be having two problems: SMP/cpu and USB. I do not understand if they are related or not.

Assuming you have a working netbsd-7 machine (i386 is fine), it might be best to build a netbsd-8 kernel and debug there, since 8 has many fixes since 7. You have built a kernel, but you may find BUILD-NetBSD from pkgsrc/sysutils/etcmanage useful; that's my heavily annotated invocation of build.sh. Your choice of debug options sounds good as a first step.

First, I suspect that "no ACPI" on a modern machine is just not going to work, as that's how many things are configured. My impression is that disabling ACPI is appropriate on hardware that just barely has ACPI support, and that support is buggy. So I'm going to not address the "no ACPI" case.

This is strange, because while I'm not familiar with that mobo model, it and the CPU sound very normal. I wonder if your motherboard's BIOS is up to date. It might be that the kernel is getting bad ACPI info.

With 4 cpus, there is something going wrong, and I haven't seen this before. You could look in the kernel sources for "failed to start" and see if you can understand the code. It may help to print out whatever information is being used to try to start the other cpus, but I have no idea what that is.

db{0}> bt
vmem_alloc() at netbsd:vmem_alloc+0x3f
uvm_km_kmem_alloc() at netbsd:uvm_km_kmem_alloc+0x46
kmem_intr_alloc() at netbsd:kmem_intr_alloc+0x6d
kmem_intr_zalloc() at netbsd:kmem_intr_zalloc+0xf
mpbios_scan() at netbsd:mpbios_scan+0x4cd
mainbus_attach() at netbsd:mainbus_attach+0x2d0
config_attach_loc() at netbsd:config_attach_loc+0x16e
cpu_configure() at netbsd:cpu_configure+0x26
main() at netbsd:main+0x2a3

This looks like mpbios_scan has asked to allocate memory in some unreasonable or crazy amount. Really that should not fault/panic, but if you are able to read mpbios_scan (maybe even disassemble to find the C line for 0x4cd) and add sanity checking before alloc, that might lead to figuring it out.
> Interestingly the product id is different for all (so I am guessing it is all one chipset). The kernel boots if I remove uhci* at pci? dev ? function ? but then the USB drive is not detected and the boot device is not found.

as expected not to be found, but good that everything else is ok. Presumably there is no 30s delay?

> The system has five uhci entries across two dev numbers. Enabling

Yes, I see

uhci0 at pci0 dev 26 function 0: vendor 0x8086 product 0x2834 (rev. 0x02)
uhci1 at pci0 dev 26 function 1: vendor 0x8086 product 0x2835 (rev. 0x02)
ehci0 at pci0 dev 26 function 7: vendor 0x8086 product 0x283a (rev. 0x02)
uhci2 at pci0 dev 29 function 0: vendor 0x8086 product 0x2830 (rev. 0x02)
uhci3 at pci0 dev 29 function 1: vendor 0x8086 product 0x2831 (rev. 0x02)
uhci4 at pci0 dev 29 function 2: vendor 0x8086 product 0x2832 (rev. 0x02)
ehci1 at pci0 dev 29 function 7: vendor 0x8086 product 0x2836 (rev. 0x02)

Note that your system has ehci controllers, which handle USB 2.0 high speed. Your flashdrive is on usb6, which is ehci1. The way USB 2.0 works is that the ehci controllers own the ports, and USB 1.x devices are handed off to the companion uhci (or ohci, on non-Intel) controllers. So I wonder if you disabled those, or if it's only the uhci one you disabled. But it looks like sd0 attaches in your posted dmesg.

> specific devices such as: uhci4 at pci0 dev 29 function 2 allows the kernel to boot, but I have not yet got it to detect the USB drive in any combinations I have tried so far.

But your posted dmesg attaches?

> If I enable all five devices specifying dev and function numbers then it boots but pauses for a very long time (maybe 30+ seconds). I wondered if there were USB retries/errors not being displayed so I turned on USBVERBOSE but saw no additional output.

So what you posted is with all 5 uhci lines, no uhci wildcard, and you didn't change the ehci wildcard?
So it seems that something is matching the uhci driver, but when the attach runs it is crashing, perhaps on some device which is somewhere else. You can use "pcictl pci0 list" and look up the ids for anything odd, and then for the other buses. And yes, if you can set up a serial console, at least to be used when booting (even if the bios doesn't really cope, if you do boot with consdev set), and capture, then you can add debugging and maybe figure out more what's wrong. I suspect that if you know exactly what's wrong, this is not too hard to fix, adding some sort of quirk to not believe something from ACPI, or substitute something sane, exclude some device id, etc. With serial you can also set up kgdb, but I'm not sure how soon in boot that is set up relative to the crash. This lets you run gdb on another machine and debug the kernel remotely, with full source listings. But ddb is quite useful. The 30s delay could be a third thing wrong.
Re: Reading a DDS tape with 2M blocks
Edgar Fuß writes: > I have a DDS tape (written on an IRIX machine) with 2M blocks. > Any way to read this on a NetBSD machine? > My memories of SCSI ILI handling on DDS are fuzzy. I remember you can operate > these tapes in fixed or variable block size mode, where some values in the > CDB > either mean blocks or bytes. I thought in variable mode, you could read block > sizes other than the (virtual) physical block size of the tape. Did you try dd if=/dev/rsd0d of=FILE bs=2m or similar? I believe that dd does reads of the given bs and these reads are passed to the tape device driver which then does reads of that size from the hardware, and that this then works fine.
Re: Merging ugen into the usb stack
Martin Husemann <mar...@duskware.de> writes: > On Mon, Dec 11, 2017 at 08:24:00AM -0500, Greg Troxel wrote: >> I wonder if we should be attaching drivers to endpoints, rather than >> devices. > > This is the drivers decision (we have drivers that do it). > > However, ugen is not able to attach to a single interface itself (currently). Well, I guess I think it's better to allow drivers to attach to single interfaces in general, than to make ugen special and try to integrate it. But I haven't looked at things enough to justify that opinion.
Re: Merging ugen into the usb stack
Martin Husemann writes: > However, it can not work with the way NetBSD uses ugen devices: > > uftdi0 at uhub3 port 2 > uftdi0: FTDI (0x9e88) SheevaPlug JTAGKey FT2232D B (0x9e8f), rev 2.00/5.00, > addr 3 > ucom0 at uftdi0 portno 1 > ucom1 at uftdi0 portno 2 > > I can disable the ucom at uftdi0 portno 1, but there is no way to get a ugen > device to attach there. > > The uftdi chip itself offers a separate interface for each of the ports, > at that layer there should not be a problem. I wonder if we should be attaching drivers to endpoints, rather than devices. It seems fairly common to have multiple endpoints doing different things (among more-than-one-thing devices), rather than multiple devices behind a hub. Letting ugen attach to endpoints that drivers don't deal with seems like an entirely reasonable solution, and it seems to have lower risk of problems from things we can't predict. I also wonder what other operating systems do here (beyond point solutions that they've had to do).
Re: Proposal: Disable autoload of compat_xyz modules
Manuel Bouyer writes: > On Sun, Sep 10, 2017 at 12:17:58PM +0200, Maxime Villard wrote: >> Re-thinking about this again, it seems to me we could simply add a flags >> field in modinfo_t, with a bit that says "if this module is builtin, then >> don't load it". To use compat_xyz, you'll have to type modload, and the >> kernel will load the module from the builtin list. > > If I compile a kernel with a built-in module, I expect this module to > be active. Otherwise I don't compile it. But maxv@ is not talking about you deciding to compile a kernel and putting in a line for a module. The question is about compat modules that are in GENERIC, and how to choose defaults so that users who want to use them aren't inconvenienced and that users that don't want to use them don't have reduced security.

Reading maxv@'s suggestion, I wondered about autoload of non-built-in modules (but maybe that is already disabled). My quick reaction is that it would be nice if the "don't autoload" flag had the same behavior for builtin and non-builtin modules, so that builtin/not is just a linking style thing, and not more. But I see your point about respecting explicit configuration. So I wonder about (without providing a patch of course):

- having a per-compiled-module flag to disable autoload, as suggested (in builtin and not, unless I'm confused)

- set the noautoload flag to true in modules that are deemed an unnecessary risk to people who have not made a choice to use them [so far this is maxv's proposal, I think]

- expand config(8) to be able to set "noautoload", so that if a module is included as part of a kernel, it will be marked noautoload if and only if the flag is on the line, regardless of defaults. This would not affect the modules in stand; they'd still have the default value of the noautoload flag from the default.

- add the noautoload flag to in-tree kernel configs for the above modules

which means that in Manuel's custom kernel he can just leave out the noautoload flag and then that kernel will behave as always. People trying to run a MODULAR kernel would still need to either edit their module sources to change the flag (which if you are a MODULAR type, is more or less like editing GENERIC) or do manual modload. Overall I find this disabling of things by default but leaving them in far preferable to not building them or removing them from sources in terms of getting to a better place in the security/usability trade space.
Re: kernel aslr: someone interested?
Maxime Villard writes: > I would also add - even if it is not a relevant argument - that most > "commonly-used" operating systems do have kernel aslr: Windows, Mac, Linux, > etc. There's another point, which various people may also consider invalid :-) In the US, there's a federal computer security standard NIST 800-53, and essentially a subset of that NIST 800-171, and more or less all federal contractors handling non-public information have to implement it. There are a lot of security controls, and exploit mitigation is one of them. I am not claiming that kernel ASLR is a requirement. But, I would hate to see people in these environments be told not to use NetBSD because it lacks some security controls compared to alternatives.
Re: spurious DIAGNOSTIC message "no disk label"
I think it's wrong to print out messages like that because DIAGNOSTIC is defined. DIAGNOSTIC is supposed to just add KASSERT (was panic, long ago) about conditions that must be true unless the kernel is buggy. Separately, given that there's no rule that all disks must have labels, it seems wrong of the kernel to print this. Certainly readdisklabel() or something can return an error, and a caller can do something, but IMHO external input shouldn't trigger printfs like this. So I would be inclined to drop the printf, but as you say you might want to figure out why there is more than one attempt to read the label.
Re: Howto use agr to aggregate VPN tunnels
BERTRAND Joël writes: > Hi, > > I have seen in manual : > > There is no way to configure agr interfaces without attaching physical > interfaces. > > Is tap considered as physical interface or not ? tap has MAC > address thus I think that is not a limitation. And agr created with > tap0 and tap1 uses tap0 MAC address. They are not actually physical of course, but I don't see any reason it should not work. However, if no one has tried and fixed any bugs that stop it from working, it might well not. So I suspect digging in with gdb or printf might help. Also, I would suggest setting up agr with two normal ethernet interfaces to be really sure you are doing everything else right.
Re: CPUs and processes not evenly assigned?!
Hubert Feyrer writes: > On Fri, 11 Nov 2016, Michael van Elst wrote: >> Since we don't have floating point the computation should be done in >> fixed point arithmetic, e.g. >> >> r_avgcount = (A * r_avgcount + B * INT2FIX(r_mcount)) / (A + B); >> >> With the current A=B=1 you get alpha=0.5, but other values are thinkable >> to make the balancer decide on the short term thread count or an even >> longer term moving average. >> >> Using one fractional bit for INT2FIX by multiplying by two might not >> be enough. > > I see a lot of ground for more research here, determining right amount > of bits and A and B. To sum up our options at this point: I see two separate issues. One is the mixing ratio of the old average and the new value. The other is how values are scaled to have a representation with enough precision. The patch to multiply by 4, but still adding and dividing by 2, seems to me to have the average count be 4* the true value, and provide 2 bits of fraction. As long as counts are compared and not used in an absolute sense, I don't see any problems with that approach. > a) leave the situation as-is and wait for research to get a perfect formula > b) commit the patch we have and wait for the research to be done > > Given that the existing patch in PR kern/43561 and PR kern/51615 does > improve the current situation, I'd vote for option "b". I concur with b.
Re: FUA and TCQ
Johnny Billquist writes: > With rotating rust, the order of operations can make a huge difference > in speed. With SSDs you don't have those seek times to begin with, so > I would expect the gains to be marginal. For reordering, I agree with you, but the SSD speeds are so high that pipelining is probably necessary to keep the SSD from stalling due to not having enough data to write. So this could help move from 300 MB/s (that I am seeing) to 550 MB/s.
Re: struct file reference at VFS level
Joerg Sonnenberger <jo...@bec.de> writes: > On Fri, Apr 22, 2016 at 10:42:10AM -0400, Greg Troxel wrote: >> I still don't understand why this is about FUSE. What if a file were >> opened without O_NONBLOCK and then the same file were opened with? > > O_NONBLOCK is pretty much pointless for regular files. It only really > changes something for sockets, pipes and the like and they behave > different already. Sure, but I meant to include especially character special files.