Re: NetBSD 9.3 to 10.0 upgrade failure - check for DOS fs
Date:Tue, 9 Apr 2024 22:28:46 +0200 From:Riccardo Mottola Message-ID: <9f0bd479-42ef-7842-90fb-0d6a503cf...@libero.it> | no "e" of course... and no MS-DOS in sight. It was already a fully | BSD-ized system. What does fdisk show? (ie: the MBR label). kre
Re: Is use of 'binary' mode necessary to open files on NetBSD?
Date:Sat, 2 Dec 2023 09:18:56 +0530 From:Mayuresh Message-ID: | On NetBSD, the fopen man page clearly says 'b' is ignored. So wonder if | gcc layer introduces the need to use it in above usage pattern. It is in stdio (see src/lib/libc/stdio/flags.c) - that is in fopen() and its siblings. What some higher-level library might do, however, is entirely up to it. kre
Re: athn0 interface not showing up after detection
Date:Sat, 25 Nov 2023 15:21:13 -0500 From:tiny.sock7...@fastmail.com Message-ID: <43591e56-bbb6-4de5-bec5-37468ddb9...@app.fastmail.com> | The have those firmware files in my /libdata/firmware/if_athn, | I believe they came with the default install. Yes, they do. Your setup looks normal in that regard. But I'm afraid that means I can't assist any more, I know nothing about the USB system or athn devices in particular. You might get better results sending a PR describing what happens, than just sending here to netbsd-users (where some of the more technically oriented people don't necessarily read the messages). kre
Re: athn0 interface not showing up after detection
Date:Sat, 25 Nov 2023 12:51:56 -0500 From:tiny.sock7...@fastmail.com Message-ID: <9cc985c4-f389-4aa2-8315-6394bc697...@app.fastmail.com> | Is there an additional driver or command needed to load them into | kernel memory? No, the kernel should simply load whatever it needs (into the usb device) when it is recognised. You might want to check that sysctl hw.firmware.path starts with /libdata/firmware:... (or at the very least has /libdata/firmware in it). kre ps: you still did not say which version of NetBSD you're using.
Re: athn0 interface not showing up after detection
Date:Sat, 25 Nov 2023 01:59:36 -0500 From:tiny.sock7...@fastmail.com Message-ID: <0cda2184-60f3-4568-8900-0845a093e...@app.fastmail.com> You didn't say which version of NetBSD you're using, that might be important. | What might be causing this? I personally know nothing of athn devices, but the man page does say: For USB devices, the driver needs at least version 1.1 of the following firmware files, which are loaded when an interface is attached: /libdata/firmware/athn-ar7010 /libdata/firmware/athn-ar7010-11 /libdata/firmware/athn-ar9271 Not sure if you need all three, or only the last one - do you have those installed? If not, that would be where I would start. Also read athn(4). kre
Re: buildworld failure due to md5 not supported by openssl3
Date:Fri, 24 Nov 2023 19:45:00 +0100 From:Ede Wolf Message-ID: <6b8e764f-6f34-4f4e-82e1-2c7e7b724...@nebelschwaden.de> | I am having somewhat cosmetic wm(4) issues though, but that is more for | the alpha-port list, as at least on vbox - the only other machine I have | with wm drivers - those issues do not appear. As I understand it (which isn't very much; hardware issues aren't really my thing), many hardware wm devices have errata that need working around, and there's no guarantee that we have workarounds for all the varieties that exist in the tree. Things generally basically work anyway, but there can be issues. A software implementation (like on vbox) is less likely to have problems like that, as any issues that arise can just get fixed in the next vbox update - that's many, many orders of magnitude harder to do with hardware that's already been sold and doesn't rely upon loadable firmware. | Talking about virtualbox. On vbox 7.0.12 (running on a linux host) I was | unable to install amd64 10RC1 with vitio-net. Sorry, can't help with that one either, certainly not without a lot more debugging info as to what was actually happening. | No issues with the emulated Pro/1000MT. Good enough for me Yes, when I did use virtualbox (which I previously needed on my laptop so I could run NetBSD) there were several cases like that - just find the emulated hardware that works, and use that - forget anything that doesn't... The change to the order of the lines in that libsaslc/lib/Makefile has been made in the HEAD sources - it needs fixing in NetBSD 10 as well, and it looks as if Christos didn't send a pullup for the change, so I will ... kre
Re: buildworld failure due to md5 not supported by openssl3
Date:Fri, 24 Nov 2023 17:21:42 +0100 From:Ede Wolf Message-ID: <5b8928a4-32b5-4015-8eb1-2432d3eb6...@nebelschwaden.de> | For what it is worth, as you have probably known it before, here my | confirmation: Swapping those lines and disabling kerberos the build | finished without problems. Thanks - I hadn't actually tested it, but I was fairly confident that would be what happened. | So I cannot comment on how usable this build is. Aside from not having Kerberos, it should be identical to the previous one - if you haven't tested that either, ie: you haven't run HEAD at all on your alpha yet, then please do send a message if there are any issues when you do get a chance to try it. I have asked if those 2 lines can be swapped around in the distribution sources, and I suspect that will happen soon, unless there was some obscure reason (rather than just an editing error) for the positioning. kre
Re: buildworld failure due to md5 not supported by openssl3
I just did an alpha build as follows. The -V's are what's in your mk.conf I believe (with MKKERBEROS=yes and USE_KERBEROS=yes) and it worked without issue. The build host is close enough to yours (kernel is 10.99.10 but that's irrelevant - userland is HEAD from before -10 was branched, but not all that long before). The sources don't have the Makefile change that would allow building with MKKERBEROS=no so I didn't try that.

build.sh command: build.sh -j 16 -V MKATF=no -V MKCLEANSRC=yes -V MKCLEANVERIFY=yes -V MKCOMPAT=no -V MKCVS=yes -V MKDEBUGLIB=no -V MKDOC=yes -V MKDTRACE=no -V MKGDB=no -V MKHOSTOBJ=no -V MKHTML=no -V MKINFO=no -V MKIPFILTER=no -V MKISCSI=yes -V MKKERBEROS=yes -V USE_KERBEROS=yes -V MKLDAP=no -V USE_LDAP=no -V MKLVM=no -V MKMANZ=yes -V MKMDNS=no -V MKNOUVEAUFIRMWARE=no -V MKNPF=yes -V MKPF=no -V MKPOSTFIX=yes -V MKPROFILE=no -V MKRADEONFIRMWARE=no -V MKREPRO=yes -V MKRUMP=no -V MKX11=no -V MKX11FONTS=no -V MKX11MOTIF=no -V MKZFS=no -V MKYP=no -V USE_YP=no -V MKHESIOD=no -V USE_HESIOD=no -V MKPAM=yes -V USE_PAM=yes -V MKSKEY=no -V USE_SKEY=no -m alpha -D /release/testing/alpha -O /usr/obj/testing/alpha -R /local/snap/20231123-testing-10.99.10-alpha -T /usr/obj/testing/tools -X /readonly/release/testing/src/xsrc -u -x iso-image

    build.sh started: Thu Nov 23 20:04:16 +07 2023
    NetBSD version: 10.99.10
    MACHINE: alpha
    MACHINE_ARCH: alpha
    Build platform: NetBSD 10.99.10 amd64
    HOST_SH: /bin/sh
    getenv MAKECONF: /dev/null
    MAKECONF file: /dev/null
    TOOLDIR path: /usr/obj/testing/tools
    DESTDIR path: /release/testing/alpha
    RELEASEDIR path: /local/snap/20231123-testing-10.99.10-alpha
    Updated makewrapper: /usr/obj/testing/tools/bin/nbmake-alpha
    MKREPRO_TIMESTAMP: Wed Nov 22 14:51:55 UTC 2023
    Successful make iso-image
    build.sh ended: Thu Nov 23 20:06:04 +07 2023

While I use '-u' (update build), this is the first alpha build I've done in decades I think (certainly the first on this system), so everything was clean to start with (no .o files, no .d (dependency) files; I had to actually mkdir the target (DESTDIR) directory, so that was certainly empty). So one thing to check: while you didn't seem to be doing an update build, so everything should have been cleaned before it started, you might want to make certain of that by manually cleaning it all (rm -fr on relevant directories) and trying again. It is possible that the change of MKKERBEROS allowed something to not get properly cleaned, which later messed up the build. kre ps: I actually did a release build, rather than the same as yours, not really by design, but just because that's what I normally always do, and I didn't think to change it! I was expecting the build to fail the way you reported, so I didn't think it would make any real difference.
Re: buildworld failure due to md5 not supported by openssl3
Date:Thu, 23 Nov 2023 12:13:42 +0100 From:Ede Wolf Message-ID: <77602506-626c-4fff-90ec-48e2f4aaf...@nebelschwaden.de> | Ok, I did not see this as yet verified, because, as with MKKERBEROS=yes | and USE_KERBEROS=yes the build fails as well. Even though at a slightly | different place, but still crypto related. | | But very likely that those are related. I'll just sit back, relax and wait. I'll do a build in a minute, using your mk.conf settings, and see if I can work out what other hidden dependency we have that is causing the problem. This one isn't as obvious as the last one. If you want to do something other than wait (and perhaps, just perhaps, get a usable build) you can try altering src/crypto/external/bsd/libsaslc/lib/Makefile. Swap the order of the two lines:

    COPTS.crypto.c+=-Wno-error=deprecated-declarations
    .endif

(so you get:

    .endif
    COPTS.crypto.c+=-Wno-error=deprecated-declarations

instead) and then go back to your original mk.conf with MKKERBEROS=no (etc) and see what happens. kre
Re: buildworld failure due to md5 not supported by openssl3
Date:Wed, 22 Nov 2023 16:21:01 +0100 From:Ede Wolf Message-ID: | # cat /etc/mk.conf | MKKERBEROS=no That one is the problem, the COPTS.crypto.c entry that Martin mentioned is not included if MKKERBEROS is "no". I have no idea why. kre
Re: buildworld failure due to md5 not supported by openssl3
Date:Wed, 22 Nov 2023 16:21:01 +0100 From:Ede Wolf Message-ID: | My command: | | ./build.sh -a alpha -m alpha -j 4 -r -M /data/obj -D /data/destdir -R | /data/release distsets | | My mk.conf should be rather unspectacular as well: | | | # cat /etc/mk.conf That all looks clean enough - but what about your environment? Do you have CFLAGS or COPTS or anything similar in the environment? kre
Re: iscsid - lfs and ipv6 issues
Date:Sat, 18 Nov 2023 10:46:18 - (UTC) From:mlel...@serpens.de (Michael van Elst) Message-ID: And wrt this part: | The address string is later used in iscsid_driverif.c, a name | is resolved with gethostbyname(), so while an ipv6 address might | be accepted, the code lacks ipv6 support. That's probably correct by default, but it looks to me as if, when you have "options inet6" in /etc/resolv.conf, gethostbyname_r (which gethostbyname calls to do all of the work) does ...

    if (res->options & RES_USE_INET6) {
        struct hostent *nhp = gethostbyname_internal(name, AF_INET6, res, hp, buf, buflen, he);
        if (nhp) {
            __res_put_state(res);
            return nhp;
        }
    }
    hp = gethostbyname_internal(name, AF_INET, res, hp, buf, buflen, he);
    __res_put_state(res);

"options inet6" sets RES_USE_INET6 in res->options. gethostbyname_internal() does all the real work of gethostbyname(), looking up "name" for an address in the AF given by the 2nd param. ie: if the inet6 option is set, then gethostbyname() will first look for an IPv6 address (or addresses) and if found, return those. If there are none (or if inet6 is not set in the options) then it will look for an IPv4 address (AF_INET). So, it might be possible to use iscsi with IPv6 without further changes. Doing it that way would cause other gethostbyname() users to also be given v6 addresses, which their code might not be expecting, so YMMV (ie: caveat emptor). Using getaddrinfo() would be much better of course. kre
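Since getaddrinfo() is suggested above as the better approach, here is a minimal sketch of resolving an iSCSI target with it, so that either an IPv4 or an IPv6 address comes back without relying on the resolver's inet6 option. The function name resolve_target() is hypothetical (it is not part of iscsid), and keeping only the first result is a simplification; real code would walk the ai_next list and try connect() on each entry.

```c
#define _POSIX_C_SOURCE 200809L

#include <sys/types.h>
#include <sys/socket.h>
#include <netdb.h>
#include <string.h>

/*
 * Resolve host/port to a TCP address of either family (hypothetical helper).
 * Returns 0 on success, or a getaddrinfo() error code, which
 * gai_strerror() can turn into a message.
 */
int
resolve_target(const char *host, const char *port,
    struct sockaddr_storage *out, socklen_t *outlen)
{
	struct addrinfo hints, *res;
	int error;

	memset(&hints, 0, sizeof(hints));
	hints.ai_family = AF_UNSPEC;		/* IPv4 or IPv6, no resolver options needed */
	hints.ai_socktype = SOCK_STREAM;	/* iSCSI runs over TCP */

	error = getaddrinfo(host, port, &hints, &res);
	if (error != 0)
		return error;

	/* Simplification: keep only the first result. */
	memcpy(out, res->ai_addr, res->ai_addrlen);
	*outlen = res->ai_addrlen;
	freeaddrinfo(res);
	return 0;
}
```

As noted elsewhere in this thread, getaddrinfo() does not strip or handle the [ ] around an address literal, so any brackets would have to be removed before the call. The port "3260" used when trying this is just the usual iSCSI port, chosen as an example.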
Re: iscsid - lfs and ipv6 issues
Date:Sat, 18 Nov 2023 18:25:58 +0700 From:Robert Elz Message-ID: <28754.1700306...@jacaranda.noi.kre.to> | one way to do that might be | if (sp2 = strchr(str, ']')) And in that, sp2 isn't needed, just use sp instead, leading to sp = strchr(sp, ':'); (etc). kre
Re: iscsid - lfs and ipv6 issues
Date:Sat, 18 Nov 2023 11:26:50 - (UTC) From:mlel...@serpens.de (Michael van Elst) Message-ID: | k...@munnari.oz.au (Robert Elz) writes: | | >That looks to me as if it should work, and is a lot cleaner, though | >I doubt there's a great need to remove the [] if they were given. | | getaddrinfo() doesn't strip or handle brackets. As I said before, I haven't looked at all at how the saved address string is handled after that routine returns it - I assumed that something else must be processing those (and probably in a way that allowed your 'x' workaround to work) - but if not, by all means, remove them there, I don't see any harm in doing that, as the user isn't being required to give them. kre
Re: iscsid - lfs and ipv6 issues
Actually, no, I don't think that will work after all. In an address like [fe80::1]:1234 the

    + sp = strchr(str, ':');
    + if (sp != NULL) {
    +     if (strchr(sp + 1, ':') != NULL) {

branch is going to be taken, and set the port to 0 (instead of the intended 1234). It needs to ignore :'s inside [] the way the old code was doing; one way to do that might be

    if (sp2 = strchr(str, ']'))
        sp2++;
    else
        sp2 = str;
    sp = strchr(sp2, ':');
    if (sp != NULL) {
        /* etc as it is in your patch */

That's very very very crude, but I think it will do the right thing for valid addresses. kre
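The bracket-skipping idea above can be written as a small standalone helper. This is only a sketch, assuming a writable, NUL-terminated string; split_addr_port() is a hypothetical name, not a function in iscsictl, and like the suggestion above it makes no attempt to cope with malformed input. It leaves any [ ] in place, so a caller passing the address to getaddrinfo() would still need to strip them.

```c
#include <stddef.h>
#include <string.h>

/*
 * Split "addr:port" in place, ignoring any ':' inside "[...]"
 * (hypothetical helper). On return *portp points at the port text,
 * or is NULL if no port was given.
 */
void
split_addr_port(char *str, char **portp)
{
	char *start, *sp;

	*portp = NULL;

	/* Skip past a bracketed IPv6 literal, if there is one. */
	start = strchr(str, ']');
	start = (start != NULL) ? start + 1 : str;

	sp = strchr(start, ':');
	if (sp == NULL)
		return;			/* no port at all */
	if (strchr(sp + 1, ':') != NULL)
		return;			/* bare IPv6 literal such as fe80::1 */

	*sp = '\0';			/* terminate the address part */
	*portp = sp + 1;
}
```

With "[fe80::1]:1234" this leaves "[fe80::1]" as the address and "1234" as the port, while an unbracketed "fe80::1" is left alone with no port.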
Re: iscsid - lfs and ipv6 issues
That looks to me as if it should work, and is a lot cleaner, though I doubt there's a great need to remove the [] if they were given. kre
Re: iscsid - lfs and ipv6 issues
Date:Fri, 17 Nov 2023 22:22:24 - (UTC) From:mlel...@serpens.de (Michael van Elst) Message-ID: | The address parser looks broken. It certainly is, it is horrid. | For some reason the first character is skipped when it tries | to identify IPv6, At the relevant point it doesn't really seem to care which addr family; it is trying to deal with v6 address literals. | I was successful with | iscsictl add_send_target -a 'x[ipv6-address]' I can't imagine how that would work (how it avoids the current problem is clear). The relevant function simply copies the address (as a string) to be processed later; for current purposes I didn't look to see how that is processed into an actual sockaddr type address, or how that can possibly work with that 'x' there, but if it does, there is very likely more dubious code. The actual arg parsing of that -a option (for add_send_target, maybe other commands as well) is in src/sbin/iscsictl/iscsic_parse.c and the relevant function is get_address(). After checking that the address is present (not NULL or "") the code does ...

    /* is there a port? don't check inside square brackets (IPv6 addr) */
    for (sp = str + 1, val = 0; *sp && (*sp != ':' || val); sp++) {
        if (*sp == '[')
            val = 1;
        else if (*sp == ']')
            val = 0;
    }

That "sp = str + 1" is your "skips the first character". The problem is that if the '[' is the first char (which you'd normally expect it to be, if present at all) then it will never be seen, so val will remain 0, the first ':' that is seen will then appear to be a ':' that separates the address from the port number (rather than just part of the syntax of the v6 address), and it is all downhill after that. (When you put that 'x' there, the '[' is seen, and everything up to the ']' is just treated as the v6 addr, so this part of the code works.) Simply removing that ' + 1' from the init of sp should fix that one. But that's just the beginning of the problems...
The code goes on:

    if (*sp) {

If that's true, we know that *sp == ':' from the loop above, but there are two possible cases: one is an addr followed by ':' and a port number. The other is a literal IPv6 addr which isn't enclosed in [ ] (in which case no port number would be possible, but that's the user's choice). The code needs to work out which case we have, and it does that by:

        for (sp2 = sp + 1; *sp2 && *sp2 != ':'; sp2++);

That is, simply look at the string following the ':' and see if there's another ':' later; if there is, then the assumption is that this is a v6 addr which didn't include the [ ] as protection. That's OK (though there are a million ways this stuff can fail to correctly handle various broken input formats).

        if (!*sp2) {

That is, there are no more ':' in the address, so the code assumes that what follows the first one (*sp) is a port number, and parses that.

            /* truncate source, that's the address */
            *sp++ = '\0';

Now sp points past the ':' (which has been obliterated) at the port number itself, and the string pointed to by str is the address, from which the port number (and anything which follows) has been removed. This is followed by code that parses the value of the port number, and while it isn't particularly resilient against errors [aside: scanf makes for crappy parsers, use strtol() instead], it isn't relevant to anything here, so I'll skip that.

        }

After this, the code wants to move past the just-parsed port number, and see if that was terminated by '\0' or ','; in the latter case a group tag (whatever that is, and no, I don't need to be told) can follow.

        for (; isdigit((unsigned char)*sp); sp++);

That's skipping the digits that were the port number - but note that this is being done whether or not there was a port number given (oops). In the case above where *sp2 == ':' (that is, we have a v6 addr with no [ ] around it) this is just nonsense (this is what's happening in the case that was reported).
        if (*sp && *sp != ',')
            arg_error(arg, "Bad address format: Extra character(s) '

When that loop is done, we either stopped on the ',' or the '\0' (or that's what was expected); if it stopped on anything else, that's an error. In the reported case it is stopping at a later ':' in the v6 addr, which implies that between the first ':' and the second there was a string consisting entirely of decimal digits (or nothing, in a case like fe80::...), and not any of the alphabetic hex chars that can make up an IPv6 addr. eg: mine is currently: 2001:fb1:12a: and in that case it would complain that the 'f' is an invalid "extra character". Next the code goes on to the case where there was no port number (no ':' was seen outside [ ] in the address) ..

    } else
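To illustrate the two fixes suggested in this message, here is a hedged sketch: the bracket-tracking scan from get_address() with the ' + 1' removed, plus a port parser built on strtol() instead of scanf. The function names find_port_sep() and parse_port() are hypothetical, and the rule that a port may be followed only by '\0' or ',' (for a group tag) is taken from the description above rather than checked against the actual source.

```c
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>

/*
 * The bracket-tracking scan, starting at str (not str + 1) so that a
 * leading '[' is noticed. Returns a pointer to the first ':' that is
 * outside square brackets, or to the terminating '\0' if there is none.
 */
const char *
find_port_sep(const char *str)
{
	const char *sp;
	int inbrackets = 0;

	for (sp = str; *sp && (*sp != ':' || inbrackets); sp++) {
		if (*sp == '[')
			inbrackets = 1;
		else if (*sp == ']')
			inbrackets = 0;
	}
	return sp;
}

/*
 * Parse a TCP port with strtol() rather than scanf. The port must
 * start with a digit, fit in 1..65535, and be followed by '\0' or ','
 * (where a group tag may follow). Returns the port and sets *endp to
 * the terminating character, or returns -1 on any error.
 */
long
parse_port(const char *s, const char **endp)
{
	char *end;
	long val;

	if (!isdigit((unsigned char)*s))
		return -1;		/* empty, signed, or leading blanks */
	errno = 0;
	val = strtol(s, &end, 10);
	if (errno != 0 || val < 1 || val > 65535)
		return -1;		/* overflow, or not a usable port */
	if (*end != '\0' && *end != ',')
		return -1;		/* the "extra character(s)" case */
	*endp = end;
	return val;
}
```

Unlike the loop-over-isdigit approach, strtol() reports exactly where the number ended, so the "what follows the port" check is made on the right character, and out-of-range values are rejected instead of silently truncated.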
Re: Meaning of size of /dev/pts/ files
Date:Mon, 25 Sep 2023 09:42:36 +0200 From:rockyho...@firemail.cc Message-ID: <1354f06f549eb36716bca02777cb7...@firemail.cc> | The /dev/pts/ files seem to have each their own size, as if they were | regular files. Everything which has an inode (or equivalent) has a size (everything that stat() can be applied to must have one, as a size field is in the resulting structure). | First curious fact: `ls -l' doesn't show the size in | bytes of such files (for some reason). Because it is meaningless. POSIX says (in the definition of the <sys/stat.h> header): off_t st_size For regular files, the file size in bytes. For symbolic links, the length in bytes of the pathname contained in the symbolic link. For a shared memory object, the length in bytes. For a typed memory object, the length in bytes. For other file types, the use of this field is unspecified. The final sentence is the relevant one. | Instead, `exa' shows their sizes: RVP already indicated how you misinterpreted that. | So, second curious fact: the sizes of these pts files are not | related to the number of characters received by them as output of some | command. Not curious: what you're looking at isn't the size field. | Any clue about what these sizes actually represent? RVP answered that for what you're looking at. The actual size, which is in the stat() results, is irrelevant: applications should always simply ignore it for anything which isn't a regular file, a symlink, or one of the memory types, as it is unspecified, and that is what both ls and exa (whatever that is) are, correctly, doing. And as RVP indicated, for these devices it should always be 0, as nothing ever sets it to anything different. Terminal type devices don't get bigger (which is what the size represents) as you write data to them, they just pass the data through to someplace else, and forget it.
They do tend to count how much they processed, but that's not a size, and is terminal-dependent data, so not available via stat() (so ls will certainly never tell you that number). kre
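As a sketch of what "ignore st_size unless it is defined" looks like in practice, the helper below (hypothetically named meaningful_size()) uses lstat() and only reports a size for regular files and symbolic links, the cases from the POSIX text quoted above that can be told apart from the file type alone. For a terminal such as /dev/pts/N it simply returns 0, meaning "unspecified, don't look".

```c
#define _POSIX_C_SOURCE 200809L

#include <sys/types.h>
#include <sys/stat.h>

/*
 * Report st_size only where POSIX gives it a meaning (hypothetical helper).
 * Returns 1 and sets *size for regular files and symbolic links,
 * 0 for every other type (terminals, devices, directories, ...),
 * where st_size is unspecified and should be ignored, and -1 if
 * lstat() fails. POSIX also defines st_size for shared and typed
 * memory objects; this sketch does not try to detect those.
 */
int
meaningful_size(const char *path, long long *size)
{
	struct stat st;

	if (lstat(path, &st) == -1)
		return -1;
	if (S_ISREG(st.st_mode) || S_ISLNK(st.st_mode)) {
		*size = (long long)st.st_size;	/* bytes, or link target length */
		return 1;
	}
	return 0;
}
```

lstat() is used rather than stat() so that a symbolic link reports the length of its own target pathname, as POSIX describes, instead of the size of whatever it points at.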
Re: segfault in libterminfo with ncurses with nethack
Date:Fri, 1 Sep 2023 23:32:03 + (UTC) From:RVP Message-ID: | So, something like this:
|
| PREFER.curses= pkgsrc
| .include "../../mk/curses.buildlink3.mk"
| .if ! ${PREFER.curses:U} == "pkgsrc"
| .include "../../mk/termcap.buildlink3.mk"
| .endif
| .include "../../mk/bsd.pkg.mk"

Wouldn't it be better to just delete the include of the termcap buildlink file entirely? That is, unless the application is actually using termcap functionality directly itself. If the only reason it is there is because the NetBSD curses requires it, surely the curses buildlink file should be adding it when it is needed (and not otherwise). kre
Re: UEFI installation
It would be possible to add a manual override in the installer, but currently there is no such thing. A better solution would probably be to simply set up all possible boot methods (for the way the system is being configured) without caring which method happened to be used to boot the install image. kre
Re: Using 'groff'
Date:Sat, 17 Jun 2023 15:18:35 + From:Todd Gruhn Message-ID: | This works: | groff -man /usr/pkg/man/man1/man.1 -Tascii 2> /dev/null | more I'm surprised, I would have expected it to need to be groff -man -Tascii /usr/pkg/man/man1/man.1 ... though with GNU tools one can never tell what they might allow (the order of the -m and -T options isn't important). | OR , does 'groff -man ...' always need to have a full dir-name | (/usr/pkg/man/man1/* )? It needs to be given a path to the file(s) it is to format, yes. groff is not a manual page reader, it is a document formatter. It works for man pages because man pages are documents, but groff itself has no idea that what is being formatted is a manual page, nor where such things might be stored. Note that "-man" is not an option in itself: -m is the option, and it says which macro package to use, "an" being the name of the manual macros. In practice no-one ever writes "-m an" (though you could); with *roff the macro package name is, by convention, always given attached to the -m, as "-man" in this case. There are a whole bunch of other possibilities for documents written for other macros (you have to use the right macros for the document), and usually the 'm' is considered part of the macro set name: the manuscript macros ('s') are -ms, the memorandum macros ('m') -mm, Eric's macros ('e') -me, the manual macros ('an') -man, the new-form doc macros ('doc') -mdoc, and the macros that work out whether a page is man or mdoc and then use either -man or -mdoc ("andoc") are -mandoc. I would also suggest not redirecting stderr to /dev/null - if anything is being printed to stderr, you (or someone) probably wants to investigate, as it usually indicates some kind of error. kre
Re: Advice for new travelling server: Intel Z690 chipset?
Date:Fri, 5 May 2023 14:57:17 +0200 From:Johan Stenstam Message-ID: <9a50686b-bd7c-4a00-84b9-3434395d0...@ihren.org> | But I’m concerned about the Intel Z690 chipset No need, that works fine - I have a setup with that (definitely not in a travelling system though - I can barely lift it) and it works just fine. You said you didn't care, but if you were considering using (in any way at all) the on-cpu graphics (assuming the CPU in the system you're looking at has that) that is unlikely to be supported in NetBSD - not even sure if it is recognised as a graphics device suitable for running wsfb or even a text based console. | * disk performance from the multiple M.2 PCIe X4 Gen4 slots (PCH) devices? Should be very good - for me enough that I had to add extra column width in iostat output to make the results (transfers/sec in particular) look reasonable... Capacity is limited (I think there may be 4TB M.2 devices around now, but common is just 2TB (or less, each)). | * networking: the NUC 12 has 10GbE (AQC113) + Intel® i225-LM. That one I can't help with. | * USB keyboard: can this still be an issue? No, that all just works. I have used nothing else for years now (KB & mouse). I use a wireless KB/mouse combo with just one dongle in a USB port for both. Wired USB keyboards with a hub and a mouse plug in port on them also exist. | * a working console (there is no VGA, but 2xHDMI?) and Again, that might be an issue, depending what it is using for generating the graphics. But a cheap (old, perhaps even pre-loved) PCIe graphics card is very likely to work OK, provided the system has a slot to put it in, and sufficient power to drive it (some of the old ones were hungry). Even if for some reason the DRM stuff doesn't work, it should at least appear as a frame buffer, and be usable as the console, and for wsfb.
Any modern monitor/TV without HDMI support isn't worth considering, so HDMI should be no issue (and if you need to run a VGA based old monitor, or OHP, I think that HDMI->VGA converters exist). I have just gone and had a look at the specs of the NUC12 Extreme - apart from having way less SATA availability than I have, that's almost identical to my system. Same CPU (or almost) (blindingly fast) and much of the rest of the specs look similar (I could have more RAM than that supports, but I have just the 64GB it can take, and if anything, even that is more than I need - I have never seen any swap space used, even when doing a -j16 build of NetBSD). The integrated WiFi is unlikely to be supported (until the new WiFi branch is finished anyway - that's still some time away yet I believe). In mine, I think the line "Intel product 7af0 (miscellaneous network, revision 0x11) at pci0 dev 20 function 3 not configured" is probably that. I haven't used it, but the bluetooth which is on the same board/chipset is recognised by NetBSD. My Intel network (LAN) chip is an I219V, which works fine. I also have an rge (2.5Gb/s - Realtek Semiconductor Killer E3000) which I have no current use for, but which is supported. I have the i9-12900 graphics disabled in the BIOS, so that doesn't appear in the dmesg output. The specs I saw are not terribly clear (I didn't bother downloading the datasheet) but it may be there's only a single PCIe slot available, so you might need to choose between graphics & network expansions (near-term support for the i225 Intel LAN, if it isn't there already (I don't know), is more likely than for the integrated graphics). kre
Re: Keeping NetBSD disklabel up to date
Date:Thu, 26 Jan 2023 15:32:59 -0700 From:beaker Message-ID: <14c843c8-9069-d45c-e103-fd1502c67...@lavabit.com> | I don't believe GPT is supported on the system in question. If you're referring to old OS installations, you might be right; without knowing the versions no-one could say (and I couldn't in any case for linux). But if you are referring to the hardware and BIOS, then it will not care: GPT was designed to be sufficiently backwards compatible with that that it should all just work. You need OSs and boot code that understand GPT, but that's just software. All the BIOS needs is an MBR (which GPT retains) with boot code installed in it that can handle actually finding and loading the OS, and you can install a GPT version of that. kre
Re: sending/receiving UTF-8 characters from terminal to program
Date:Fri, 20 Jan 2023 08:55:45 + (UTC) From:RVP Message-ID: <4dd21c1f-f5c3-c3ba-96d8-cab73a0b...@sdf.org> | Both /bin/sh and bash output UTF-8 if given Unicode code- | points in the form `\u'. So, I believe bash will take your current locale into account when doing that, whereas neither /bin/sh nor /usr/bin/printf do; they simply emit UTF-8 unconditionally. This kind of difference is (partly) why POSIX is not including the \u (or \U) escape sequences in $'...' quoted strings in Issue 8. Another reason is how the end of the hex digit sequence is detected: is it always exactly 4 hex digits (or 8 for \U), or any number up to 4 (or 8) if followed by a non-hex char, or as many hex chars as exist? To be portable (as input) such a string needs to be exactly 4 (8) hex digits, and be followed by something which is not a hex digit - the closing ' is often useful there, as it can always be followed immediately by $' to resume quoting again (or just ' or " if those are adequate). But that's just the input; you also need to be using a locale with a UTF-8 char encoding to get predictable output. kre
|
| $ printf 'néz' | hexdump -C
| 00000000  6e c3 a9 7a                                       |n..z|
| 00000004
| $ printf $'n\uE9z' | hexdump -C
| 00000000  6e c3 a9 7a                                       |n..z|
| 00000004
| $
|
| If that works, then check those UTF-8 bytes against whatever the
| terminal emulator generated from your keystrokes for the `é'
| in `néz'.
|
| -RVP
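For reference, this is what emitting UTF-8 unconditionally, as /bin/sh and printf are described as doing above, means for a single code point. It is a minimal sketch of standard UTF-8 encoding (utf8_encode() is a hypothetical name); it ignores the locale entirely, which is exactly the behaviour that makes the output unpredictable when the locale is not UTF-8. Encoding U+00E9 gives c3 a9, matching the hexdump output quoted above.

```c
#include <stddef.h>

/*
 * Encode one Unicode code point as UTF-8 into out[] (hypothetical helper).
 * Returns the number of bytes written (1..4), or 0 for values that
 * are not valid characters (UTF-16 surrogates, or above U+10FFFF).
 */
size_t
utf8_encode(unsigned long cp, unsigned char out[4])
{
	if (cp < 0x80) {			/* ASCII: one byte */
		out[0] = (unsigned char)cp;
		return 1;
	}
	if (cp < 0x800) {			/* two bytes: 110xxxxx 10xxxxxx */
		out[0] = (unsigned char)(0xC0 | (cp >> 6));
		out[1] = (unsigned char)(0x80 | (cp & 0x3F));
		return 2;
	}
	if (cp >= 0xD800 && cp <= 0xDFFF)
		return 0;			/* surrogates are not characters */
	if (cp < 0x10000) {			/* three bytes */
		out[0] = (unsigned char)(0xE0 | (cp >> 12));
		out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
		out[2] = (unsigned char)(0x80 | (cp & 0x3F));
		return 3;
	}
	if (cp <= 0x10FFFF) {			/* four bytes */
		out[0] = (unsigned char)(0xF0 | (cp >> 18));
		out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
		out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
		out[3] = (unsigned char)(0x80 | (cp & 0x3F));
		return 4;
	}
	return 0;
}
```

Encoding 'n', U+00E9 and 'z' in turn yields the bytes 6e c3 a9 7a, the same four bytes shown for both printf commands.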
Re: -10.0_BETA panics when system is rebooting
Date:Fri, 6 Jan 2023 22:04:26 +0100 From:BERTRAND Joël Message-ID: <85d8d94d-7cd6-8f8c-3b67-8e97a7c00...@systella.fr> I can't help with the panic cause, but: | [ 856605,000596] acpiout5 at acpivga0 (DD.5961966] dump device bad | | I don't understand last line as dmesg indicates : That's because it isn't really a line: somewhere in the "DD.5961966" string, one message has been overwritten by another. The last line is really just a (not all there) timestamp, and "dump device bad". Do we do crash dumps onto raidsets? kre
Re: 'cd' if HOME is unset
Date:Mon, 26 Dec 2022 10:41:25 -0800 From:Michael Cheponis Message-ID: | Well, as a zsh user: What zsh does you'd need to take up with the zsh developers. But it is one of 2 shells I tested which don't require HOME to be set for "cd". zsh I am not all that surprised about. It tends not to concentrate a lot on conformance with other shells, but rather on what its designers believe is better for its users. (The other that does not error is dash.) | $ echo $HOME | /usr/mac | $ unset HOME | $ echo $HOME | | $ zsh -c cd | $(no change in directory, no error msg) No error message, yes, but are you sure there was no change in directory? (Even if that change was into the directory it started from.)

    jacaranda$ (cd /; unset HOME; zsh -c 'cd; pwd')
    /home/kre

Looks like it changed directory to me (from / to /home/kre - my normal home). | in all cases, directory does not change when HOME is not defined. Note that in the test cases (like "(unset HOME; zsh -c cd)") the subshell (parentheses) is there so the "unset HOME" doesn't affect the shell from which the command is run (you won't need to set HOME again after the test), and the cd is run only in the context of the shell that runs it, so only that process has its directory changed. In this test, that shell exits immediately after, so doing the change this way would normally be pointless; here, it was done solely for the purpose of viewing the error message (if any). In the form I use above, rather than simply exit, the shell that ran the "cd" then ran "pwd" after, to reveal what its directory was now. You could change the command string to "pwd; cd; pwd" to see before and after directories. kre
Re: 'cd' if HOME is unset
Date:Sun, 25 Dec 2022 15:33:57 -0800 From:Michael Cheponis Message-ID: | Maybe it should print "$HOME is not set" in that case? Did you try it? It is easy... (unset HOME; sh -c cd) or use ksh (or some other shell) instead of sh to test it.

    Script started on Tue Dec 27 00:01:34 2022
    jacaranda$ (unset HOME; sh -c cd)
    cd: HOME not set
    jacaranda$ (unset HOME; ksh -c cd)
    ksh: cd: no home directory (HOME not set)
    jacaranda$ exit
    Script done on Tue Dec 27 00:03:06 2022

Note that it is HOME, not $HOME, that is not set. $HOME is the value: a pathname if HOME is set, or "" if HOME is not set, and neither of those makes any sense to describe as 'not set'. kre
Re: 'cd' if HOME is unset
Date:Sat, 24 Dec 2022 22:32:22 -0500 From:Jan Schaumann Message-ID: | I happily admit that it's a rare edge case. I simply | find it surprising that 'cd' gives up if HOME is | unset. Seems unintuitive to me. It is how it is defined to work, and always has been. Only dash and zsh seem to handle that case; no other shells bother. (I am a little surprised that dash does: their general philosophy tends towards minimalist implementation, with almost nothing that isn't required.) Better is just to always have HOME set. For /bin/sh ~ works without HOME, so you could define

    cd() {
        case "$#" in
        0) set -- ~ ;;
        esac
        command cd "$@"
    }

if you wanted to. But that's not required to work either (I'm not surprised that dash doesn't expand ~ when HOME is not set, that's more in line with what I'd expect ... though tilde expansion working is, in general, more useful than cd with no args when HOME is unset, so if a shell was to do just one, I'd generally do it the way we do (as does bash)). kre ps: I am a little surprised that csh acts this way though; it started from the Thompson sh (ie: pre 7th edition) and back then there was no environment, and while csh had vars (incl home) it couldn't have cd depend upon what was in the environment, and had to use the passwd db, either for "cd" or to init "home", or both. I guess that has been changed since.
Re: 'cd' if HOME is unset
Why bother? It is already clear that one cannot depend upon this working, and nothing should normally ever have HOME unset, unless that is done deliberately (perhaps even to prevent a simple "cd" from going there). kre
Re: NetBSD-10.0_BETA: clock: unknown CMOS layout
Date:Fri, 23 Dec 2022 10:20:27 - (UTC) From:mlel...@serpens.de (Michael van Elst) Message-ID: | The message says that no century information is found in the CMOS RAM, | the hardware clock itself seems to keep only 2 year digits. The century | is then deduced as 1900 if the year number is less than 70 and 2000 | otherwise. I would hope that it is the other way around: if >= 70, assume 1900, and if < 70, assume 2000 (which is why 22 now produces 2022). | This heuristic will fail 2030. 2070 would be more likely. But given how very unlikely it now is that anyone will ever again (legitimately - people doing weird things can deal with the issues) boot a system in the 20th century, perhaps we should alter the heuristic to assume all years are 21st century for now. Then, in another 50 years or so, if systems still exist with this issue and we are still measuring civil time the same way, the heuristic can be changed again so that 22nd century years work with a boundary similar to the one used for 20th/21st century years, until sometime into the 22nd century, when it can start assuming all boots occur in that period (and so on, for as long as this is needed). kre
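A minimal C sketch of the two-digit-year pivot described above (years 70-99 read as 19xx, 00-69 as 20xx), alongside the "assume every year is 21st century" alternative. PIVOT and both function names are hypothetical illustrations, not the code in NetBSD's clock driver.

```c
/*
 * Illustrative only: PIVOT and these function names are hypothetical,
 * not taken from NetBSD's RTC code.  yy is the 0..99 value from the RTC.
 */
#define PIVOT 70

/* Current heuristic: yy >= 70 means 19yy, yy < 70 means 20yy. */
static int rtc_year_pivot(int yy)
{
	return (yy >= PIVOT) ? 1900 + yy : 2000 + yy;
}

/* Suggested alternative: treat every two-digit year as 20yy. */
static int rtc_year_21c(int yy)
{
	return 2000 + yy;
}
```

With the pivot, 22 maps to 2022 but 70 maps to 1970. The alternative maps 70 to 2070, which pushes the ambiguity out to the end of the century.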
Re: timers slow (sleep 1 taking five seconds)
What you're seeing clearly isn't the TSC calibration problem that I was having, and that Michael fixed (but which would not be fixed in 9.2, so it was a possible source). What I was seeing was time actually running slow, and you're clearly not seeing that, just internal sleep timers running slow. However, when I switched from using the (miscalibrated) TSC to using a different timer hardware source, after having started with a badly running TSC, I did observe behaviour much like you are reporting. That is, time was running at the right rate, but internal sleeps were slow - still running at the incorrect frequency. That was mentioned somewhere in the "Weird clock behaviour with current (amd64) kernel" thread. This one I never bothered looking into, as it seemed (to me) likely to be just a side effect of the miscalibrated TSC, but perhaps not - perhaps there is some other bug in the internal timing that is being triggered by something (if I had to guess, now, I'd say perhaps some overflow, just because of how long your system has been running without a reboot). I'd also assume this is something in (relatively) new code - clearly in the 9 series. I have had older vintage NetBSD systems running much longer than 90 days without any issues, but I have never really run a 9.x system, except when that was HEAD - the long running systems tend to be older (early 8 at best), and on HEAD, I tend to reboot much more frequently to keep up to date. If this is what it is, a reboot would almost certainly fix things, for a while - but would also lose the ability to debug what is currently happening. kre
Re: Upgrade 9.2->9.3 amd64 issues - device not configured
Date:Wed, 9 Nov 2022 21:36:30 +0100 From:Riccardo Mottola Message-ID: <60792a38-6644-2687-0a3d-38e9f8a50...@libero.it> | What do you suggest to do now? Unless there is something wrong, nothing. On a 9.x system you don't need any boot code updates with upgrades, so not having one done is harmless. The error you saw is harmless (to you); there is nothing to fix. kre
Re: tm(3) vs "double leap second"
Date:Sat, 22 Oct 2022 23:21:04 -0400 From:Jan Schaumann Message-ID: | I believe the notion of tm_sec allowing for 0-61 to | account for a possible "double leap second" was a | mistake It was indeed, somehow the notion of "possibly two leap seconds a year" was interpreted as "possibly two leap seconds at once". | I believe this should be updated, but I don't know | whether this is a documentation/comment fix only, or | if we are actually somewhere using a value of 61? I very much doubt that anyone is (any more) assuming that tm_sec can ever be 61, or that it ever jumps from 57 to 0 without going through (at least) 58 first. The patches you proposed look fine to me (even if there were some code assuming otherwise somewhere, those changes wouldn't affect it). kre
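Since C99, the standard range for tm_sec has been [0, 60], where 60 allows for a single positive leap second (C89 specified [0, 61]). Below is a small sketch of a range check that follows the C99 rule. The function name is hypothetical.

```c
#include <stdbool.h>
#include <time.h>

/*
 * Hypothetical helper: 60 is accepted (23:59:60 during a positive leap
 * second), 61 is not, matching the C99 range rather than the old
 * "double leap second" one.
 */
static bool tm_sec_in_range(const struct tm *tm)
{
	return tm->tm_sec >= 0 && tm->tm_sec <= 60;
}
```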
Re: Backing up "stuff"
Date:Tue, 18 Oct 2022 07:03:08 -0400 From:Todd Gruhn Message-ID: | DVD+DL? I have not heard this name. | What is DVD-DL? Dual Layer. Capacity is about twice that of a regular DVD (BluRay discs hold much more, however). It needs dual layer blank discs and a dual layer capable writer. kre
Re: vi -r crash, netbsd-9 amd64
Date:Sat, 15 Oct 2022 15:38:14 +1100 From:Paul Ripke Message-ID: | Interestingly, the file was last updated about 10 days before the crash... | and I do see fsync calls from vi on the recno recovery file, too. I'm not sure that crashes are necessarily the cause of corrupted vi recovery files - it is possible that some sequence of editing mods is what makes bad ones. In any case, occasional corrupt recovery files have been a vi "feature" for as long as recovery files have existed. It would be nice to see this fixed, but as no-one really knows what causes it, and so no-one is able to make it happen at will, finding the cause is not an easy task. | I must admit I haven't experienced this - and I either crash my system | or suffer accidental power loss every month or so. You're running 9.3_STABLE, right? I'm running HEAD. There is very likely a difference there. I also have quite a lot of (mostly unused) RAM, which can hold a lot of buffers that rarely ever actually require flushing in the normal operation of my system. | The last corruption I saw was back in Aug 2019: Note that the corruption I meant was file content corruption. The kind you refer to here: | https://mail-index.netbsd.org/current-users/2019/08/19/msg036431.html is meta-data corruption, which is far more rare (lots of work has gone into making sure that doesn't happen, because if it does, and isn't corrected, things just get worse and worse). On the other hand, a file with corrupted data inside is simply a file with corrupted data inside, and only affects users of that file. | I have postgres and mongodb running, but they both do the right thing | with fsync, etc, None of the files I referred to would have been subject to any kind of sync: certainly not fsync, but no general sync either. (I have now taken to running a while sleep N; do sync; done loop, where I pick N more or less at random after each reboot - more or less replicating what the old update program used to do.) kre
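The `while sleep N; do sync; done` loop amounts to a small flusher of the kind the old update program provided. Here is a minimal C sketch of the same idea using only sleep(3) and sync(2). The function name and the `rounds` limit are hypothetical. The limit exists only so the loop can be exercised without running forever.

```c
#include <unistd.h>

/*
 * Hypothetical sketch of an update-style flusher: every `interval`
 * seconds, ask the kernel to write out dirty buffers with sync().
 * rounds == 0 loops forever, like the shell loop; otherwise the loop
 * stops after `rounds` syncs and returns how many were done.
 */
static unsigned flush_loop(unsigned interval, unsigned rounds)
{
	unsigned done = 0;

	while (rounds == 0 || done < rounds) {
		sleep(interval);
		sync();         /* may return before the writes complete */
		done++;
	}
	return done;
}
```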
Re: vi -r crash, netbsd-9 amd64
Date:Wed, 12 Oct 2022 22:57:00 +1100 From:Paul Ripke Message-ID: | "vi -r", etc, and it seemed to work fine. The recovery file that causes the | crash was left behind after a kernel panic. The recovery file will be corrupted. I have seen that kind of thing from time to time. vi -r really shouldn't crash when it attempts to recover the mangled file, but it really doesn't matter: nothing is likely to ever recover it, and vi core dumping is just its weird way of telling you that. The bigger issue is with file system flushing. On panic, everything is normally supposed to be written, but that can't be guaranteed ... and a bigger issue still is that we almost never flush file data at all. A while ago, I had a power failure (no chance for the kernel to flush anything before the system died - so no surprise there were corrupted files). What was a surprise was that files that had last been touched 12 hours previously hadn't been updated. That's not really acceptable. kre
Re: set -o emacs ; stty -echo
Date:Tue, 19 Jul 2022 21:34:27 -0400 From:Andrew Cagney Message-ID: | should, like for bash, this put the terminal into -echo mode? | | arm64$ echo $SHELL | /bin/sh | arm64$ set -o emacs ; stty -echo | arm64$ pwd | /home/cagney | | other combinations are equally puzzling. for instance: | | set -o emacs ; stty -echo ; set +o emacs | doesn't flip to -echo mode either Sorry this has taken so long for (my part of) looking into this. I am now confident that sh has nothing to do with this at all, in fact, sh never makes any changes (by itself) to any of the terminal operating modes -- it will check to see if the line discipline happens to be the old one (which I don't think exists on NetBSD any more, and probably not anywhere else relevant either) but if it is, all it does is refuse to enable job control, it doesn't even attempt to alter it. Aside from that, the only tty/sh interactions are to change the terminal's process group as needed as the foreground job alters, turn off O_NONBLOCK if someone stupidly set it on the shell's input stream (whether a terminal or otherwise) and monitor the window width of stdout (if a tty) for some output to be able to be wrapped better. Everything else related to tty modes is handled entirely by libedit. Hence, I am passing this to Christos to look at. kre
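One of the few things listed above that sh itself does to its input is to turn O_NONBLOCK off if it has been left set. Below is a minimal C sketch of that step using fcntl(2). It is an illustration, not the code from NetBSD's sh, and the function name is hypothetical.

```c
#include <fcntl.h>

/*
 * Hypothetical helper: if O_NONBLOCK is set on fd, clear it while
 * leaving the other file status flags alone.
 * Returns 0 on success, -1 on error.
 */
static int clear_nonblock(int fd)
{
	int flags = fcntl(fd, F_GETFL);

	if (flags == -1)
		return -1;
	if ((flags & O_NONBLOCK) == 0)
		return 0;               /* already blocking: nothing to do */
	return fcntl(fd, F_SETFL, flags & ~O_NONBLOCK) == -1 ? -1 : 0;
}
```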
Re: GPT on RAID
Date:Wed, 28 Sep 2022 13:53:42 +0300 From:Dima Veselov Message-ID: <4fd66e90-86f4-9fc3-aaf8-27ce417b1...@lich.phys.spbu.ru>
| I put GPT on RAID device (because disk is large) and it seems no good
| way to root autoconfig.

That's probably true, with the emphasis on "good" - but there is a way.

| If there any way to autoconfig or tell kernel via bootloader that
| my root reside on certain GPT partition which is on RAID device which
| is on GPT of two disks?

I've had a setup essentially like that for years. You need to configure the raid with "-A root" to tell raidframe to claim the root partition (and autoconfigure itself). The tricky (not good) part is that the GPT partition that is to be the root must have a wedge name of raidNa (where raidN is the raid set). So:

	gpt label -i M -l raidNa raidN

(M is the relevant partition index). Don't forget to change any relevant NAME= entries (in fstab, or elsewhere) to match. Then:

	raidctl -A root raidN

It is somewhat bizarre, but it works. My system has:

	NAME=raid7a	/	ffs	rw,log	1 1

in fstab, and raid7 includes...

	raidctl -s raid7
	Components:
	    /dev/dk8: optimal
	    /dev/dk18: optimal
	No spares.
	Component label for /dev/dk8:
	[...]
	    RAID Level: 1
	    Autoconfig: Yes
	    Root partition: Yes
	    Last configured as: raid7
	[and the same for dk18]

and

	gpt show -l raid7
	    start     size  index  contents
	        0        1         PMBR
	        1        1         Pri GPT header
	        2       32         Pri GPT table
	       34      990
	     1024  1047552      1  GPT part - "raid7a"
	(etc).

I don't think it is important (or even relevant) that the root partition happens to be the first one in the gpt on the raidframe.

kre
Re: Qemu/nvmm - time in NetBSD guest system lags behind (with estd on host)
Date:Sun, 4 Sep 2022 10:52:31 +0200 From:Matthias Petermann Message-ID: <06b5d183-36c7-30bf-56be-8e507dffd...@petermann-it.de> | Hi Robert, | | please allow me one mor more question Sure, but this one I cannot answer. I know nothing about the module build system, so I am punting this to Paul Goyette, the expert on all things module related. Paul? | | On 04.09.22 10:42, Matthias Petermann wrote: | > Hi Robert, | > | > On 04.09.22 02:58, Robert Elz wrote: | >> if that implies that you rebuilt the kernel with HZ=1000 and then used | >> the zfs module built with HZ=100 then I think the first thing I would try | >> would be to rebuild the module(s?) with HZ=1000 | >> | > | > Good point... I'll try that right away. This might coincide with my | > observation (race condition when initializing the ZPOOL, mail from just | > now). | I did build the kernel with build.sh as follows: | | ``` | $ cd /build/netbsd-93-1000hz/usr/src/sys/arch/amd64/conf | $ cp GENERIC VHOST | $ vi VHOST | | options HZ=1000 | | $ cd /build/netbsd-93-1000hz/usr/src/ | $ mkdir ../obj | $ ./build.sh -O ../obj -j 4 -U tools | $ ./build.sh -O ../obj -j 4 -U kernel=VHOST | $ ./build.sh -O ../obj -U releasekernel=VHOST | ``` | | ...and picked it up from | | While for the *kernel* / *releasekernel* target the name of the kernel | configuration to be used can be provided, I don't see such an option for | the *modules* target. How can I make sure the modules are built with the | HZ option set in VHOST config? Or does it simply adapt these from a | previous run of the *kernel* target?
| | Kind regards | Matthias
Re: Qemu/nvmm - time in NetBSD guest system lags behind (with estd on host)
Date:Sat, 3 Sep 2022 13:51:25 +0200 From:Matthias Petermann Message-ID: <8c9bbdbc-5583-7f2d-4e04-ab550b6ee...@petermann-it.de> I cannot really help with zfs issues, I know nothing about it, but: | The zfs module was loaded though, I also built the kernel with exactly | the same sources as the "original" one, so I assume for now that the | modules are compatible. if that implies that you rebuilt the kernel with HZ=1000 and then used the zfs module built with HZ=100 then I think the first thing I would try would be to rebuild the module(s?) with HZ=1000 Long ago there was much work done to get rid of the constant HZ from the kernel, and replace it with the variable hz (which is initialised to HZ). However, I am by no means sure that this has consistently been maintained in all the intervening years, and in particular with external modules, and there might be places that are still (or again) using HZ (the constant) rather than hz (the variable) (but beware just doing a grep, in many contexts there is a #define HZ hz in scope that can defeat that simple way of checking). While you definitely need new modules if the kernel version changes, I don't believe that the converse applies. I suspect you need new modules (or might) when kernel options change as well (whether you do or not depends upon whether the module is affected by the option that has been changed - which is not always easy to detect, so just rebuilding is generally safer.) kre
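The HZ/hz distinction can be shown in a few lines. In this hypothetical sketch, MODULE_HZ plays the part of a compile-time HZ frozen into a module built for a 100 Hz kernel, and kernel_hz stands for the runtime hz of a kernel built with HZ=1000. Neither name comes from NetBSD.

```c
/*
 * Hypothetical names: MODULE_HZ stands for a compile-time HZ frozen
 * into a module, kernel_hz for the running kernel's hz variable.
 */
#define MODULE_HZ 100
static int kernel_hz = 1000;

/* Fragile: uses the constant the module happened to be built with. */
static int ms_to_ticks_const(int ms)
{
	return ms * MODULE_HZ / 1000;
}

/* Robust: uses the rate the kernel is actually running at. */
static int ms_to_ticks_var(int ms)
{
	return ms * kernel_hz / 1000;
}
```

A 100 ms timeout becomes 10 ticks with the constant, so on the 1000 Hz kernel it expires after only 10 ms. With the variable it becomes 100 ticks, which is the intended 100 ms.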
Re: Qemu/nvmm - time in NetBSD guest system lags behind (with estd on host)
ps: arithmetic has never really been my thing, the 10.1ms I mentioned should probably have been 11ms instead. kre
Re: Qemu/nvmm - time in NetBSD guest system lags behind (with estd on host)
Date:Wed, 31 Aug 2022 13:42:13 +0200 From:Matthias Petermann Message-ID: | I'm also curious about the effect on energy consumption - i.e., whether | it's measurable. I'm sure it's measurable, but I suspect you'd need a highly accurate and very precise ammeter to do that. kre
Re: Qemu/nvmm - time in NetBSD guest system lags behind (with estd on host)
Date:Wed, 31 Aug 2022 11:29:06 +0200 From:Matthias Petermann Message-ID: | The guests' time increasingly lags behind with continued operation. Also | the ntpd seems to have no compensating effect in the guests here. This is a well known issue. ntp in the guest cannot help; the clock drift is too much for its algorithms to handle. | What could be the reason for this? Can estd be a source of interference? On estd, no. The problem is that the qemu guest keeps time at 100 Hz: it expects a clock interrupt every 10ms. To make that happen, qemu (effectively) sleeps for 10ms, then signals a clock interrupt to the guest. But the host running qemu is also running with a clock at 100Hz. When a process asks to sleep for 10ms, it needs 2 clock ticks to occur: one which might happen an instant after the sleep request (just 1us or so after the request), and so would not be long enough, and another 10ms later - then we know that 10ms has passed. If it always happened just that way, all would be close enough (and NTP in the host would cope with any minor time shift that occurs). What actually happens is that qemu wakes up from its 10ms sleep (or gets a 10 ms SIGALRM - that difference doesn't matter) and, as well as signalling its guest, immediately requests a new 10ms sleep, for next time. Here, rather than being 1us before the next clock interrupt happens, it is more likely to be 1us after the previous one happened, ie: 9.999 ms until the next one happens. We wait that long, then 10ms more, and the sleep finishes: just about 20ms elapses to do one 10ms sleep. The guest is getting one clock interrupt every 20ms, but believing that 10ms has elapsed, as that's what it requested. Or that is how I understand it from what has been explained to me - the details might not be exactly right (and I've never looked inside qemu), but that's more or less the effect and its underlying cause. | All I could find so far is [1].
It is recommended to add the rtc switch | to the qemu command. Is there any recommendation here in the meantime | which setting works best with NetBSD? About the rtc, no idea. But to deal with the problem, aside from major NetBSD code rewrites (the so-called tickless kernel), the one solution that should work is to run the host with HZ set a lot higher, and leave the guest(s) at 100Hz. For any modern host (anything you'd really want to use to run a qemu guest in production), running with HZ=1000 will be fine (you'll never notice the tiny extra overhead). Some of the NetBSD ports already run at that kind of rate - alpha has been at 1024Hz forever (and these days, alphas are slow processors - though they weren't compared to others when that change was made). With this, the 10ms interrupts might actually occur about 10.1 ms apart, but that much drift NTP should be able to handle. If not, run the host with an even higher HZ rate, even 1 should work with a modern amd64 CPU (though I have never tested that, nor heard of anyone who has - but 2000 should not be an issue). If for some reason you cannot change the clock rate of the host (that is, compile a new kernel with "options HZ=1000" in the config file), then make the guests run with a much slower clock rate - nothing faster than 50Hz. That should be acceptable (pdp-11's used to run at 50 or 60hz, and worked OK), but to deal with the clock drift it would need to be even slower. The problem is that if the OS clock rate is too slow, it will start to impact upon (perceived) performance, and some application capabilities. kre
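The 20 ms arithmetic above, and the effect of the two remedies, can be put into a small model. This is a simplification of the explanation, not qemu's or NetBSD's actual timer code:

- a sleep is rounded up to whole host ticks;
- one more tick is added because the partly elapsed current tick cannot count;
- the sleep is assumed to start just after a tick, so that extra tick is lost in full.

Both function names are hypothetical.

```c
/*
 * Simplified model (not qemu's or NetBSD's actual timer code): a sleep
 * is rounded up to whole host ticks, plus one more tick, and is assumed
 * to start just after a tick, so the extra tick is lost in full.
 */

/* Worst-case real length (ms) of an `ms` sleep on a host at `hz`. */
static double worst_case_sleep_ms(int ms, int hz)
{
	double tick_ms = 1000.0 / hz;
	int ticks = (ms * hz + 999) / 1000;     /* ceil(ms * hz / 1000) */

	return (ticks + 1) * tick_ms;
}

/*
 * Fraction of real time a guest clock advances when every guest tick
 * is such a sleep (1.0 would mean no drift).
 */
static double guest_clock_rate(int guest_hz, int host_hz)
{
	int period_ms = 1000 / guest_hz;   /* assumes guest_hz divides 1000 */

	return period_ms / worst_case_sleep_ms(period_ms, host_hz);
}
```

For example, worst_case_sleep_ms(10, 100) is 20 ms, the doubling described above, and guest_clock_rate(100, 100) is 0.5, so the guest clock runs at half speed. A host at HZ=1000 gives an 11 ms sleep and a rate of about 0.91. A host at HZ=2000 gives about 0.95. A 50 Hz guest on a 100 Hz host reaches only about 0.67. These are worst-case figures from the model, not measurements.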
Re: updating direct from 5 to 9?
Date:Mon, 22 Aug 2022 21:59:28 +0700 From:Robert Elz Message-ID: <8246.1661180...@jacaranda.noi.kre.to> That is, I am replying to myself... (sad that). | And second, find out why the existence of wedges has any effect on | mounting wd0a (would be different if you were using dk0 for some | reason on a filesystem not intended to have wedges). And of course (I should think before pressing "send") that's because the wedge has the block device open, and those are single use devices (when the wedge has it open, nothing else can open it). The question of why wedges were being created at all remains though. kre
Re: updating direct from 5 to 9?
Date:Mon, 22 Aug 2022 16:19:03 +0200 From:Martin Husemann Message-ID: <20220822141903.ga13...@mail.duskware.de> | Booting a 9.3 install CD and digging around a bit I found the 9.3 kernel | | - auto-creates bogus wedges dk0 (for the FFSv1 at /) and dk1 (for the |swap partition. You might want to try to find out why it is making wedges for a disklabelled drive at all? DKWEDGE_METHOD_BSDLABEL (and the MBR form) are supposed to be off by default. And second, find out why the existence of wedges has any effect on mounting wd0a (would be different if you were using dk0 for some reason on a filesystem not intended to have wedges). This has no bearing on why wedges created would not have the proper settings of course. | I dimly recall the disklabel moved into the type 169 MBR partition | a long time ago - I bet 4.0 was before that change and this is what | now causes the broken wedge auto-detection. I doubt you will win that bet. I don't recall a time when the label was ever not in the NetBSD partition, which means if it was, we're talking about sometime in the early 1.x versions, or before. (I started with 1.3, but didn't really ever use that on an x86 system, more on sparc, which is MBR free). But that is easy for you to check - just hunt for the disklabel on the "drive" you have configured the 4.0 system on, and see where it is. Maybe somehow there are two, perhaps different ones? kre
Re: updating direct from 5 to 9?
Date:Sun, 21 Aug 2022 07:55:45 -0400 From:Greg Troxel Message-ID: | But interesting that 9.2 build.sh works on 6.1. Not relevant to the actual topic, but that stopped working for me, I think even before -9 was released. The basic system would build, but something in X required tools that couldn't be compiled using the -6 version of gcc. But if you just meant the script, rather than a complete successful build, then yes, it was designed (not by me, in case that is not clear) to work almost anywhere. kre
Re: set -o emacs ; stty -echo
Date:Wed, 20 Jul 2022 07:43:28 -0400 From:Andrew Cagney Message-ID: | Do you want a bug report? It isn't needed, but you can make a PR for it if you like. kre
Re: NetBSD 9.2 installer can't detect disk of some Hetzner VPSes
Date:Wed, 20 Jul 2022 08:25:00 +0200 From:Matthias Petermann Message-ID: | Unfortunaly, the kernel panics shortly before it passes control to init: | | ``` | [] panic: cnopen: no console device What kind of console interface does that setup give you? Emulated serial port? Emulated graphics interface? One of the virtio devices (1043) is described in pcidevs as a virtio console, but it doesn't look like we have any kind of driver for that one (whatever that actually means). If their setup emulates some kind of standard com port (serial) or vga, then it should be possible to attach to that, but the boot code would need to tell the kernel which of those to use. You'd probably do better asking on current-users (or perhaps tech-kern, but just pick one of those) than netbsd-users for this kind of info. kre
Re: set -o emacs ; stty -echo
I will take a look, but I suspect you're seeing the interaction between what editline (libedit) wants, and its settings, and how the shell interacts with it to preserve sane settings for other commands that also use the terminal. bash and readline are much more tightly coupled than sh and libedit. kre ps: when I look at this I am much more likely to use vi mode than emacs ... that should make no difference to the interaction, just to how libedit edits (key bindings).
Re: Re: LTFS support for HP tape drive devices
Date:Fri, 15 Jul 2022 23:41:13 +0200 From:Colo Colo Message-ID: <20220715234113.840de...@pobox.sk> | But I am new to BSD, and the question is, if it is possible to combine | LTFS for NetBSD & LTFS for FreeBSD & Linux source packages to make | LTFS works with HP drives on NetBSD or FreeBSD As long as it is software (and you pay attention to licensing issues, so no GPL'd code tries to get into our kernel - and as little as possible in the distributed userland), almost anything is possible. If you're asking whether, if you do the work to make it happen, we would accept it, I'd say probably yes (assuming the licensing is BSD compatible, ideally the code style is compatible, and it works without breaking something else). Packages from anywhere (which work) tend to be accepted in pkgsrc, so by all means, go for it. If you're asking for someone else to make that happen for you, then you'd need to find someone with the desire to make it happen, or who can be convinced to have such a desire. kre
Re: NetBSD 9.2 installer can't detect disk of some Hetzner VPSes
Date:Mon, 11 Jul 2022 21:43:15 +0530 From:Mayuresh Message-ID: <20220711161315.hoakmn5fgz76gtov@localhost> | Hetzner agreed to set a compatible chipset for my instance. So I finally | got the configuration I needed and have just installed NetBSD 9.2 on that. Good. | Shouldn't qemu with the chipset setting they mentioned suffice for | testing? Yes, I guess it should ... I'm not a qemu user, and don't even have it installed (though that is in progress now), so I may need some advice on how to operate it to get the desired effect... | There are some difficulties in testing on Hetzner such as | | - As their reply suggests, there is no guarantee about which chipset | your new instances will get. It's a bit random. Yes, though they did say (from what you posted) that all the AMD cpu instances have the one which has the problem, so that might not have been a problem. | - I doubt whether they'd allow an arbitrary image. They have provided | NetBSD 9.2 released image on their platform. But that's a different issue... And as you suggest, testing would be much easier done locally than by exporting ISO images for you to try anyway. Of course, even if we find something to fix, it isn't likely to get into their provided image any time soon. kre
Re: NetBSD 9.2 installer can't detect disk of some Hetzner VPSes
Date:Mon, 11 Jul 2022 15:25:42 +0530 From:Mayuresh Message-ID: <20220711095542.mnyb4o54j5kd476c@localhost> | Following is a reply from Hetzner (they have quoted freebsd link, not sure | how relevant): It looks to be in the area I already thought was perhaps related, but that's old (2019), and we supposedly already have support for revision 1, which FreeBSD apparently didn't (back then), if I read all of that correctly (I have really just skimmed it so far). | This is most likely due too a bug within BSD: | https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=3D236922 (for anyone else who goes looking, the =3D is a QP encoded = of course, so just omit the "3D"). In what form would you need a new NetBSD 9.2_STABLE in order to test the driver with some diagnostics added, so we can see what is really going on? Is just a kernel enough, or do you need an ISO image? (Unless something changes, it is unlikely to work any better than the previous attempt; it would just provide more info. So it doesn't need to contain install sets etc - with no accessible disks, those are useless - and it could be quite a small ISO, perhaps the one intended for installing from the net, not that that could work either, with no working net or disks.) kre
Re: NetBSD 9.2 installer can't detect disk of some Hetzner VPSes
Date:Sun, 10 Jul 2022 11:09:40 +0200 From:Martin Husemann Message-ID: <20220710090940.ga16...@mail.duskware.de>
| Yeah, I noticed that we already have support for vioscsi* at virtio?
| [which is what the spec draft I linked ended in] and vioif* at virtio?
| (at least in current), so it can't be this simple.

It has been there a while; I used to use it via virtualbox a laptop or two ago...

It actually might be simpler than I thought. I was basing my "no support" on a grep not finding the product ID anywhere but in the pcidevs file (and files built from it). But dev/pci/virtio_pci.c does this ...

	/*
	 * Non-transitional devices SHOULD have a PCI Revision
	 * ID of 1 or higher.  Drivers MUST match any PCI
	 * Revision ID value.
	 */
	if (((PCI_PRODUCT_QUMRANET_VIRTIO_1040 <= PCI_PRODUCT(pa->pa_id)) &&
	    (PCI_PRODUCT(pa->pa_id) <= PCI_PRODUCT_QUMRANET_VIRTIO_107F)) &&
	    /* XXX: TODO */
	    PCI_REVISION(pa->pa_class) == 1)
		return 1;

and all the devices in question should be between 1040 & 107f, so the only issue might be if the revision is not 1. Given that the comment says any rev >= 1 is supposed to work (there's an earlier test for rev 0 - transitional devices), it might be that that is the problem: the devices being configured just might be rev > 1.

In that case, if nothing in our drivers is affected by the rev bump, all that might be needed is to adjust that final test (the one with the XXX comment...). "If".

kre
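Below is a sketch of what relaxing that final test could look like. It is written as a standalone function rather than as a patch to virtio_pci.c. The two constants are hypothetical stand-ins for the pcidevs names, with values taken from the product IDs in those names (0x1040 and 0x107f). Whether this change is all that is needed is, as said above, only an "if".

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical stand-ins for PCI_PRODUCT_QUMRANET_VIRTIO_1040 and
 * PCI_PRODUCT_QUMRANET_VIRTIO_107F from pcidevs.
 */
#define VIRTIO_MODERN_FIRST 0x1040
#define VIRTIO_MODERN_LAST  0x107f

/*
 * Relaxed form of the final test: match any non-transitional virtio
 * product ID with revision >= 1, as the quoted comment says drivers
 * should, instead of only revision == 1.
 */
static bool virtio_modern_match(uint16_t product, uint8_t revision)
{
	return product >= VIRTIO_MODERN_FIRST &&
	    product <= VIRTIO_MODERN_LAST &&
	    revision >= 1;
}
```

In the driver itself, this would amount to changing `PCI_REVISION(pa->pa_class) == 1` to `>= 1`.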
Re: NetBSD 9.2 installer can't detect disk of some Hetzner VPSes
Date:Sun, 10 Jul 2022 09:58:48 +0200 From:Martin Husemann Message-ID: <20220710075848.gc25...@mail.duskware.de> | Is this the spec for the virtual devices? | https://lists.gnu.org/archive/html/qemu-devel/2011-06/msg00754.html No idea. That's 11 years old, and says TBD for the PCI ID to be used. It is also just the SCSI interface, the net, and I assume probably, console interfaces (and maybe more of them) are likely to be needed as well. | Might be worth to ping the tech-kern mailing list with the unconfigured | dmesg lines and that pointer, maybe someone has done some work on this | already or in a related area. Since it is easily available in qemu, | testing is easy. Might be. It isn't impossible that these are the same basic virtio interfaces as Virtualbox/VMware/... use, just with a different manufacturer ID. But I'm not sure how to find out. kre
Re: NetBSD 9.2 installer can't detect disk of some Hetzner VPSes
Date:Sun, 10 Jul 2022 09:46:26 +0530 From:Mayuresh Message-ID: <20220710041626.zvnntquax4w7jnwq@localhost> With this: | [ 1.236385] sd0 at scsibus0 target 0 lun 0: disk fixed you should do as I indicated in the earlier mail: see the config for scsibus0, and then for whatever that is attached to (but perhaps just scsibus0 will be enough). You want the "at pciN" config line, which should reveal the pci vendor and device code being used. Then you need to find out how to configure the bigger drive to use the same ones, rather than linux specials. kre
Re: NetBSD 9.2 installer can't detect disk of some Hetzner VPSes
Date:Sun, 10 Jul 2022 09:39:08 +0530 From:Mayuresh Message-ID: <20220710040908.exdwph7o7wuf3xrn@localhost> | Yes. Both network and disk are appearing not configured. PFA. Vendor 1af4 is (in our pcidevs) QUMRANET - the web says it is Red Hat, and used for virtio devices (which corresponds with our pcidevs, which has device 1041 listed as "Virtio Network", 1043 as "Virtio Console", and 1048 as "Virtio SCSI"). The latter corresponds with your linux boot dmesg... [1.682859] scsi host2: Virtio SCSI HBA [1.705095] scsi 2:0:0:0: Attached scsi generic sg0 type 0 [1.705167] sd 2:0:0:0: Power-on or device reset occurred [1.706859] sd 2:0:0:0: [sda] 320004096 512-byte logical blocks: (164 GB/153 GiB) At the minute I can see no NetBSD drivers for these devices, and I have no idea what their interface is like (that would require examining linux sources, I suspect). Check your 48GB version; my guess is that it isn't using those linux devices. You may have the 160GB system configured expressly for linux. The host most likely doesn't have a special case for NetBSD (sad as that is), but they will certainly have an option for windows. Use that instead (linux should still work just fine). kre
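A "not configured" line shows only numeric IDs, so a small lookup table can make them easier to read. The sketch below contains only the entries quoted above from pcidevs (vendor 0x1af4, products 0x1041, 0x1043 and 0x1048). The table and function are hypothetical illustrations, not the generated pcidevs tables.

```c
#include <stddef.h>
#include <stdint.h>

struct pci_name {
	uint16_t vendor, product;
	const char *name;
};

/* Only the entries mentioned above; the real pcidevs list is far longer. */
static const struct pci_name known[] = {
	{ 0x1af4, 0x1041, "Virtio Network" },
	{ 0x1af4, 0x1043, "Virtio Console" },
	{ 0x1af4, 0x1048, "Virtio SCSI" },
};

/* Return the description for vendor/product, or NULL if unknown. */
static const char *pci_lookup(uint16_t vendor, uint16_t product)
{
	for (size_t i = 0; i < sizeof known / sizeof known[0]; i++)
		if (known[i].vendor == vendor && known[i].product == product)
			return known[i].name;
	return NULL;
}
```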
Re: NetBSD 9.2 installer can't detect disk of some Hetzner VPSes
Date:Sun, 10 Jul 2022 01:14:05 +0530 From:Mayuresh Message-ID: <20220709194405.towbk55owqgg3xsb@localhost> | If you could give me some words to grep that will help. Look for "not configured" first. Then ata, wd[0-9], ld[0-9] and sd[0-9] (you could probably just use wd0, ld0 and sd0) - but those assume a relatively normal x86 type install; there are lots of names for disc drivers that might appear. What would help would be to look at the 48GB install that works, see what the drive is called in that one, look for that name in dmesg output (probably in /var/run/dmesg.boot in a running system), then see what it connects to, look for that, see what it connects to, etc. Eg: I have
ld0 at nvme0 nsid 1
ld0: 1863 GB, 243201 cyl, 255 head, 63 sec, 512 bytes/sect x 3907029168 sectors
ld0: GPT GUID: fdb094e2-f6fc-45de-827a-106c6748e9c4
dk0 at ld0: "NetBSD_EFI", 522240 blocks at 2048, type: msdos
(and several more dkN at ld0: lines - those aren't important).
jacaranda$ grep nvme0 /var/run/dmesg.boot
nvme0 at pci2 dev 0 function 0: Samsung Electronics (3rd vendor ID) product a80a (rev. 0x00)
nvme0: NVMe 1.3
nvme0: for admin queue interrupting at msix1 vec 0
nvme0: Samsung SSD 980 PRO 2TB, firmware 3B2QGXA7, serial S69ENF0RA54347E
nvme0: for io queue 1 interrupting at msix1 vec 1 affinity to cpu0
[lots more interrupt related lines]
ld0 at nvme0 nsid 1
jacaranda$ grep pci2 /var/run/dmesg.boot
pci2 at ppb1 bus 2
pci2: i/o space, memory space enabled, rd/line, wr/inv ok
nvme0 at pci2 dev 0 function 0: Samsung Electronics (3rd vendor ID) product a80a (rev. 0x00)
jacaranda$ grep ppb1 /var/run/dmesg.boot
ppb1 at pci0 dev 6 function 0: Intel Alder Lake PCIe G4 Root Port 2 (x4) (rev. 0x02)
ppb1: PCI Express capability version 2 x4 @ 16.0GT/s
pci2 at ppb1 bus 2
[plus a bunch of false matches - ppb10 (and more) matches as well...]
jacaranda$ grep pci0 /var/run/dmesg.boot
pci0 at mainbus0 bus 0: configuration mode 1
pci0: i/o space, memory space enabled, rd/line, rd/mult, wr/inv ok
[various things connected, including...]
ppb1 at pci0 dev 6 function 0: Intel Alder Lake PCIe G4 Root Port 2 (x4) (rev. 0x02)
Intel product 7af0 (miscellaneous network, revision 0x11) at pci0 dev 20 function 3 not configured
When you reach mainbus0 (or anything that looks like it) you can stop. The "not configured" there I included just as an example: that's (what will one day be) an iwl WiFi interface, but NetBSD doesn't have a driver for it yet. If you get that info for the system that works, you can search for the same things (except start in the reverse order) in the one that doesn't. I can't think of any reason simply configuring a bigger drive should make any difference - it is likely there are other config differences between the two systems than just that. kre
Re: Setting keyboard layout on xterm
Date:Wed, 29 Jun 2022 17:56:09 +0200 From:Martin Husemann Message-ID: <20220629155609.ga21...@mail.duskware.de> | I don't know if the in-tree xterm supports unicode In systems that have it (HEAD does, I haven't checked earlier) uxterm does. uxterm is just "xterm -class UXTerm" under the hood, so yes, xterm (or a recent enough xterm anyway) does support unicode, if started in the appropriate way. For me, it has no trouble showing me CJK/Cyrillic/... spam messages from nmh in an xterm. Very decorative! I haven't tried typing; with X, that usually requires an input method to be installed, and I haven't done that yet (on other systems I allow Thai input, which has worked fine ... my keyboards have all the symbols on the keycaps, in addition to ascii ... quite crowded keycaps!) kre
Re: how to turn off devices that monitor sensors
Date:Tue, 21 Jun 2022 10:29:54 +0900 From:Henry Message-ID: | Thank you for the ideas. The manufacture date of this HP Pavillion | Notebook 15-au123d was 07/01/2017. NetBSD is installed UEFI. That should all be new enough that ACPI should work fine, and if the other OS's (well windows) can shut down, then I'd assume that entering S5 state should make that happen for NetBSD as well. What other hardware exists in that system? Does reboot (or shutdown -r) work correctly? | I tried `boot -2' but the startup stopped at the following. I don't | know how to proceed. | boot device: | root device: At that point you should be able to type ? and get a list of possible root device values, pick the right one, and type it. But it is possible that without ACPI the disk isn't being seen by NetBSD at all, and there will be nothing appropriate in that list. This is about as far as I can take it, I don't know the x86 architecture or the MD x86 code nearly well enough to suggest anything else that you can try. kre
Re: how to turn off devices that monitor sensors
You might try "boot -2 netbsd" to disable ACPI completely, in which case NetBSD would not be able to request ACPI S5 state to shut down and power off (it would need to use older BIOS interfaces). If your system is an older one (HP model numbers mean nothing to me) then it should work OK without ACPI. You could also confirm whether other OS's are able to power off that particular system (the test needs to be of that exact system, as minor variations like what BIOS rev is installed, what other hardware, and the BIOS config all could alter the results). Installing other systems should not be needed; "live" cd/dvd boots (or USB stick) should be sufficient to test this. But testing NetBSD with ACPI disabled first is the quick test to perform. kre
Re: how to turn off devices that monitor sensors
Date:Sun, 12 Jun 2022 18:18:26 +0900 From:Henry Message-ID: | The machine freezes with the last messages to the console: | acpi0: entering state S5 S5 is "off" (more or less); the system should be doing nothing except waiting for someone (or something) to request that it be turned on again. It looks as if your system is one of those which NetBSD doesn't know how to really shut down, or the BIOS has bugs which are preventing that from happening. Do you have any "Wake on" type events configured in the BIOS? If so, you might want to try disabling those and see if that makes a difference - the BIOS might be keeping the system more alive than you want so that wake on lan, or wake on usb keyboard, or something can work. | acpitz0: workqueue busy: updates stopped | coretemp0: workqueue busy: updates stopped | coretemp1: workqueue busy: updates stopped As Martin said, that's just noise, because the BIOS hasn't reset enough to stop that stuff from interrupting, and is apparently keeping enough power enabled to the ram (or at least caches) that enough of NetBSD is still around to report that stuff. I agree with Martin, those are almost certainly not related to your issue (they're a symptom caused by it, not causing it). [I also would agree that there's potentially a driver bug; once the system is off (or supposed to be off) nothing should be being processed at all.] kre ps: this might, or might not, make it direct from me to gmail, so I hope you're subscribed to netbsd-users so you can get the reply that way. gmail doesn't like me, and tends to bounce mail I send.
Re: Adding Raidframe to existent GPT system
Date:Mon, 9 May 2022 18:02:47 +0200 From:Martin Husemann Message-ID: <20220509160247.gd2...@mail.duskware.de> | 34 2014 Unused | 2048 65536 1 GPT part - EFI System There's no good reason to align an EFI partition, is there? Those 2014 unused blocks would serve a more useful purpose being included in that partition. | 67584 11720976384 2 GPT part - NetBSD RAIDFrame component That one should be (and is) aligned. | The two (identical) EFI partitions are not strictly needed (one would | do), but my theory was to allow booting from either disk if one of them | fails completely. That makes sense, assuming you're using EFI booting. But even if you had only one, you'd want something of that size on the other drive to keep them approximately identical. A second EFI partition is better than just more unused space. It would be nice if the EFI partition could be in the raidframe, but GPT doesn't allow overlapping partitions - if it did we could put the EFI partition inside the raidframe, and then just make the outer GPT partition table's EFI partition refer to the same section of the drive. kre
Re: Adding Raidframe to existent GPT system
Date:Mon, 09 May 2022 10:01:47 -0400 From:César Catrián C. Message-ID: | Got three drives, two for a RAID-1 array and one more for backup. OK, that's good. At least last time I looked raidframe was unable to autoconfigure hot spare drives - they need to be added after each boot, which is a minor inconvenience, as it can easily be scripted in rc.local ... just watch out for trying to add a drive as a spare after an earlier failure has caused the spare to already have been incorporated into the raid set. It is also no great problem if you don't set up a hot spare: raidframe runs fine (without redundancy) if a drive fails (I've had that happen several times), and then you can add the replacement manually after a failure. Just keep an eye on things so you detect failures quickly; it is easy to get complacent after years of nothing happening. Which reminds me... | Got it enabled for MBR, but it seems the GPT adds complexity | for Raidframe due that each GPT partition is offered as a new | disk/wedge to the system. One man's complexity is another's flexibility. You can partition the drive, and make separate raid sets from partitions on different drives (with more drives to play with, I do things like that, using parts of different drives for different raid sets). | Should be created only one wedge at first, using the entire disk, | then apply raidframe to it You can do that, or break the drives into smaller pieces and make each of those a separate raid. It all depends upon your needs. It appears from your current GPT that you are using legacy (BIOS) booting, rather than EFI - that's fine, and NetBSD's boot allows booting from and root on a raid1 (though someone else will need to provide the recipe if you want to do that, I don't run things that way) but the firmware will not understand raidframe, so EFI booting needs an EFI partition in the physical drive's GPT partition table, not in a raid partition in that.
| (don't know if raidframe is ready for GPT?), It is. Raidframe just gives you a simulated drive. Other than BIOS access you can do anything with a raidframe that you can with any other drive. | then do again a GPT layout into the raid0 device to deploy the | filesystems? That works, if you take care of booting (assuming you plan on booting from this drive/drives). The one thing to watch for is partition alignment and stripe sizes, relative to your filesystem block (newfs -b value) size, to avoid getting lots of read/modify/write cycles happening when all that should have happened is to write a block (once to each drive with RAID 1). There are other people better able to explain the issues here than me. Much better to wait a day or so, get the correct info, and set things up properly, than to do everything, init the raid, set up filesystems, and then discover that the config forces poor performance, and you need to start over (if write performance matters; it doesn't always). kre
Re: Adding Raidframe to existent GPT system
oh, I forgot to say, that if you do have multiple drives, and are going to use raidframe, change the GPT partition type from ffs to raid. You can put a new GPT (or disklabel if it is small enough) inside the created raidframe, which appears to the system as a drive. kre
Re: GPT and UEFI booting
Date:Tue, 5 Apr 2022 12:54:40 +0200 From:Martin Husemann Message-ID: <20220405105440.ga20...@mail.duskware.de> | This is meant for expert use firmware bugs workarounds, and there seems | to be no official way to toggle it off again. If I am reading the gpt sources correctly, using gpt biosboot without giving the -A flag should turn off the PMBR "active" bit. kre
Re: is /bin/sh the almquist shell?
Date:Tue, 29 Mar 2022 23:34:08 GMT From:Mayuresh Kathe Message-ID: <202203292334.22tny8vp027...@sdf.org> | should i start a separate thread asking for information | regarding netbsd's /bin/sh support for recursion? New thread? Probably not needed. To actually answer the question depends on exactly what you mean/need. But as a simple (possibly incorrect) interpretation, the original Bourne sh had no functions, so the only way it could do anything recursive was by having a script run itself, either as a standalone command, or via the '.' command. All modern shells have functions (they are part of the POSIX sh spec) and all shell functions have always supported recursion. Not all shells support local vars in functions, however; they are not in POSIX. Without them some recursive techniques can be more difficult. I believe that the original Almquist shell, and all shells descended from it (which includes dash incidentally), support functions and local variables. Please read the sh man page. kre
Re: manpage section-names
Date:Sat, 30 Oct 2021 20:32:23 -0500 (CDT) From:"Jeremy C. Reed" Message-ID: | n I didn't search for definition of "n" "new", back in about 1980... kre
Re: proposed change to getty
Date:Tue, 12 Oct 2021 09:42:55 -0400 From:matthew sporleder Message-ID: | Do you mean modem like a telephone modem or modem like a serial port? I meant telephone modems - they're what most commonly used DTR as a functional signal, and it is disabling that signal that all this is about. (There were other devices that behaved similarly, but they're even less likely to be seen now.) The serial port is the system interface to the modem (that is the interface in question - modems that are implemented on ISA/EISA/PCI/PCIe/... cards are entirely different beasts, though I believe that some of those, probably even most, present a system interface that looks like a serial port, and most likely manipulating the fake DTR on that fake serial port would have similar effects). And yes, I know that these things are not quite as commonly used these days as they were in the 1970's and 80's ... kre
Re: proposed change to getty
Date:Mon, 11 Oct 2021 18:09:23 -0300 (ADT) From:Jared McNeill Message-ID: <5ab793c9-8cab-2e79-e6ba-8017d924b...@invisible.ca> | There's a 2 second sleep in getty before opening the tty | that has been there since before NetBSD I don't recall if that was there when this version of getty was created or not -- probably, in which case it was probably also in the 7th edition getty. That all got done about 40 years ago, far too long to remember. 2 secs back then was the smallest sleep that was guaranteed to be > 0 (the delay for sleep(n) was between n-1 and n secs). Since that is no longer true, the smallest change that should happen is s/2/1/, or use usleep() or nanosleep() and make the delay even smaller, 200ms should be enough for any modem. Doing it in the driver is OK as well, but probably needs to remain in getty until we are sure that all drivers do this correctly. Since you are handling this by blocking open until long enough after the close had passed, also delaying the open in getty should have no real effect. kre
Re: FreeRADIUS instability
Date:Thu, 30 Sep 2021 08:37:44 -0400 From:Christos Zoulas Message-ID: <49c53880-d427-489d-92fa-881cd01b5...@zoulas.com> | I have committed it to head, but I want to make sure that everything is | ok and that people don't prefer to fix it via a fork hook, There's nothing wrong with that as a fix for the DNS resolver issue, but I suspect that the underlying issue isn't fixed this way - any process that has a kqueue open (by some code in some library, so not known to the application, as here) will face the same problem and so need a similar solution. I'd suggest that when a fork happens, rather than the kqueue fd being closed in the child, it be left open, but redirected to a nothing object (one which simply returns errors on almost all operations, except ones that only affect the fd (eg: dup) and close()). That would still need something like your fix if the kqueue is desired to work (again) in the child, but would avoid issues like the one in question where the fd is recorded somewhere, and used after that same fd has been reassigned elsewhere. Alternatively of course, simply make kqueue remain open across fork; it already needs to be able to handle multiple fd's aimed at the same queue, right? After all the fd can still be dup'd. kre
Re: FreeRADIUS instability
Date:Wed, 29 Sep 2021 13:48:51 -0700 From:"Pawel S. Veselov" Message-ID: <72a4f226-78dc-22f9-4d4b-90e434b76...@gmail.com> | I think the only way to fix this is to have the resolver state | cleaned up thoroughly after fork(). I can't see how this can be | worked around by applications. Maybe put a call to res_init() in the child, immediately after a successful fork (before any other fd manipulation). That will try to close the old kqueue, which will fail, but no-one cares, and then open a new one. kre
Re: /bin/sh fd 12
Date:Tue, 14 Sep 2021 21:26:43 +0700 From:Robert Elz Message-ID: <14651.1631629...@jinx.noi.kre.to> | the fix is almost done. And it was committed. But then it was just almost almost done ... now it should really be almost done. What I committed last night is OK, for normal sh use, but can be made to exhibit somewhat weird behaviour if subjected to exotic testing (trying to see how much of this really works, etc). I have a much better fix (for the "fd 13" issue) being built for testing now, but it won't be committed until much later today. kre
Re: /bin/sh fd 12
Date:Tue, 14 Sep 2021 09:12:17 -0400 From:Jan Schaumann Message-ID: <20210914131217.gk6...@netmeister.org> | Do you want me to send-pr the redirection to fds | 12/13? You can if you want, but the fix is almost done. (My test build is just completing now, then I need to run tests to make sure there are no regressions). As I said in an earlier message, the fix for 12 (any fd the sh has opened for its own needs) is trivial, but dealing with the issue with 13 (temporarily moved user fds) is messier, but I think I have it almost done (there are still one or two minor issues to deal with, which I will get to in a later fix). kre
Re: /bin/sh fd 12
Date:Tue, 14 Sep 2021 06:14:31 - (UTC) From:mlel...@serpens.de (Michael van Elst) Message-ID: | /bin/sh uses /dev/tty for job control which is enabled automatically | when running as interactive shell. But there is a -m option where | you can enable/disable it, i.e. 'sh +m' runs a shell with job control | disabled and descriptor 12 not open. And that's correct, rather than what I thought when I replied to the original message earlier. | Maybe kre@ knows if a shell should allow redirection to its | own internal file descriptors. It shouldn't. Fixing that one is trivial. Fixing the fd 13 one is trickier (but will happen). | (Our) ksh only supports the single digit descriptors 0..9 for redirection All ksh versions, I believe. kre
Re: /bin/sh fd 12
Date:Mon, 13 Sep 2021 23:32:39 -0400 From:Jan Schaumann Message-ID: <20210914033238.gj6...@netmeister.org> | 0, 1, and 2 are obvious, but fd 12 did not seem | obvious to me. | | Descriptor 12 being open to the current terminal means | I can do this: | | $ echo foo >&12 | foo | $ That's a bug, I will fix it. | But I can also: | | $ echo foo >&13 | foo | $ That's also a bug, a similar one, the same fix should apply. | even though fd 13 did not show up under /proc/$$/fd/. No, it is created for the echo command. | Where does that fd come from, and why is not shown | under /proc/$$/fd? When you redirect standard output of a built-in command, the existing standard output needs to be moved somewhere else (saved) before the new one can be opened (dup'd in this case). 13 is the next available fd, so that's where it is moved to - just in time for the dup() back to fd 1... When echo is done, fd 13 is moved back to fd 1, so it is closed again before you get a chance to look. | And what's the purpose of fds 12 and 13? 12 is the script input (which, when there is no script, is a copy of stdin, which is the terminal ... strange as it seems, when a tty is stdin/stdout/stderr it is open read/write on all three fd's). 13 is as above. But those numbers are not fixed, in various circumstances others might be used. | When using /bin/ksh, I see a different extraneous fd, fd 10, Same thing as fd 12 in /bin/sh - all Bourne shell clones will have something similar. | but I can't write to it: /bin/ksh only allows you to reference fd's 0..9 (so do many shells incidentally, that's all that's guaranteed by POSIX). That's why the "illegal file descriptor name". | Is this documented anywhere? No. Aside from /proc/*/fd these things are supposed to be invisible (an internal implementation detail) - you won't see them via the fdflags sh built-in for example. kre
Re: backspace in wscons console sends ^H to processes
Date:Mon, 19 Jul 2021 08:49:31 -0400 From:Greg Troxel Message-ID: | As additional background, IMHO all of this confusion arose from the | differing setups of DEC computers and the IBM PC. It is older than that. | On a real terminal as | one would have used with a PDP-11 or VAX in the 70s/80s/early-90s, "Real terminals" existed long before PDP-11's or VAXen. They had keys that struck paper and made marks. On those backspace moved backwards a character, and typing something else overstruck the previous character (sometimes intentionally, often just making a mess). Delete did nothing at all. Some of those paper terminals also had paper tape punches and readers. This allowed "off line" preparation of input, or messages (these terminals were used for telex / telegraph type operations, more than computers, computers just needed something for interactive input, that is something different than punched cards, and these were available). When preparing a punched paper tape, to erase a mistaken character, one relied on "delete does nothing" along with "delete is 0x7f", which when converted to even parity format, is 0xFF, and since a "1" was recorded on the tape as a hole, punching a delete changed whatever was there before into the delete code (all rows punched), which, as above, did nothing. But to overstrike the previous character, one needed first to move backwards to be over it - that was what a "backspace" accomplished. So, on "real terminals" the sequence to erase the previous character was "backspace delete", and yes, the user had to type both of them - and to erase two characters, one needed backspace backspace delete delete (etc). When "glass ttys" started to become popular, their manufacturers initially provided both keys - but computer systems wouldn't require users to type both chars, but different systems picked differently. 
Some used DEL (most Digital OS's did that), others used BSP (most other systems, since that was the thing that worked easiest on a glass tty - overstriking was destructive, so to replace one char with another one simply did "backspace, replacement"). But DEC had settled on DEL as the preferred choice before glass ttys (or the "modern" form, Tektronix had a storage scope kind of thing, which was more like a paper terminal, just without paper) were invented. Unix used # for the erase character, as on a paper terminal you could see that, count how many # chars, and mentally erase that number of previous other characters (and @ was the line kill character, DEL was "interrupt"). Many unix users (but not all) had come from a DEC background, and in particular a lot of BSD users, where there was "VMS vs BSD unix" type competition all over the place. So when BSD changed the defaults from # and @, they picked the DEC convention (DEL ^U ^C) - which irritated non-DEC users a bit (like me) who used non-DEC glass terminals, with BSP in a convenient location, and DEL somewhere obscure (more obscure, usually a smaller key). Never mind, it was user configurable (had been since the very early days, at least for erase and kill - interrupt and quit were not configurable initially - which was one reason many people in the early days used BSP for erase, DEL remained hard wired as interrupt). | That led, I think, i386 unix (386BSD, then early NetBSD) to let the key | send ^H and configure erase to ^H, breaking emacs That alone IMO is the biggest feature. Anything that breaks emacs, even in trivial ways like this, is GOOD. | I don't see that as possible, and I have no idea why you would want | that. Once you have the key that is logically the delete key sending | DEL as original ASCII intended, It certainly intended nothing of the kind, DEL was "deleted" just as NUL was "never entered" (empty space on the paper tape where nothing had been punched).
Both were simply ignored and had no effect whatever (and so could be used for padding characters after sending something which the terminal would take a long time to execute, like carriage-return or line feed). If you ever actually used paper tape (I did) the last thing you'd ever want was for DEL to start being interpreted as anything other than "nothing here". On the other hand, as backspace (usually) never got onto the tape, since it changed its position instead, that one turns out to be a nice choice for the erase character (paper tapes don't need it - the erasing is already done). That using ^H (BSP) as erase screws emacs users is just a bonus point. | What are you trying to | accomplish with this? Or are you asking "is there some way to have | multiple characters function as erase in the tty"? But that's a good question. The tty driver (which is all you get when you're in cat - unlike in a program which uses libedit or readline, or similar where all this is done in those libraries, or in the program itself) has exactly one erase character - you can set it to anything you
Re: where is device manufacturer/model kept?
Date:Mon, 28 Jun 2021 12:18:50 + (UTC) From:RVP Message-ID: <556bb7f-3792-635e-86ed-6d7c6b752...@sdf.org> | echo $(sysctl -n machdep.dmi.system-vendor) That's a convoluted way of writing sysctl -n machdep.dmi.system-vendor and one which could fail if the string just happened to contain the "wrong" characters (depending upon which version of echo is being used for which are "wrong" for this purpose). kre
Re: procfs difference between NetBSD and Linux
Date:Tue, 8 Jun 2021 06:45:28 + From:David Holland Message-ID: | No such luck, 1000+ atf failures with a supposedly clean tree, | something's badly borked. Might take a while :-( Something's locally not quite right ... I see nothing like that, and all the failures I'm seeing now are either the "normal" ones that happen everywhere (various ptrace test failures etc) or ones that are caused by the kernel I'm running not being GENERIC (it has no audio at all, not even pad, so lots of audio tests fail .. they should skip rather than fail, but that's a different problem; I don't have MODULAR turned on, so several tests which want to load kernel modules fail (again, should skip); and I don't have COMPAT32, so everything that wants to try running a 32 bit binary also fails (also should skip, or simply not be attempted at all)). Aside from that I have a bunch of c++ test failures, not sure whether those are normal or not, but I can't imagine that changing namei should have any effect there. So, my guess would be some trivial problem in the recent editing -- even with your original patch, untouched, I didn't get anything like that number of test failures. kre ps: send me (off list) updated files (or patches against HEAD) if you would like me to take another look.
Re: procfs difference between NetBSD and Linux
Date:Sun, 06 Jun 2021 00:28:50 +0700 From:Robert Elz Message-ID: <28802.1622914...@jinx.noi.kre.to> Once more, into the self-reply... | (all the rest of the files | your patch modified are as you modified them). It turns out there is another fix needed, in vfs_vnops.c. In that one, the patch did ...
-    if (fmode & O_CREAT) {
+    /*
+     * 20210604 dholland ditto
+     */
+    if ((fmode & O_CREAT) != 0 && ndp->ni_dvp != NULL) {
which means that we now only get into the following code (only then does the if succeed) when we have a parent vnode, which now only happens when the target node doesn't exist; when it exists, we won't be creating anything, so no parent gets returned. The code that followed either actually created the file (or at least attempted to) - that part is still fine, and still works - or, if the target exists (no create needed), released the parent vnode (skipping that part is fine, since we don't have it); also, if O_EXCL is set, it returned EEXIST - that's OK, as if O_EXCL is set we don't do this modified code, so the EEXIST will come from namei() instead of this code in some cases, but no-one cares where it comes from. But the code also cleared the O_CREAT bit, and no longer does. This means that, as O_CREAT remains set, we don't bother with vn_openchk(), which means things like O_REGULAR no longer work. (The permission checks are all in there too!) I found this when I saw that the fopen("/dev/null", "wf") (and other similar) tests in tests/lib/libc/stdio/t_fopen failed in the ATF test run when I got time to go through the failures (OK, in reality I stopped at that point, I'll run them all again with the fix for this). I think that all that might be needed is to clear O_CREAT in the else case of this if .. that was pointless before, as we never got there with O_CREAT set, but now we can.
Once that's done, the t_fopen test succeeds (or as much as it can for me, I don't have MODULAR in my kernels, so a couple of the sub-tests are skipped, but those are unrelated to these changes). I have done some testing with that change made, but I need to run all the ATF tests, and make sure there's nothing else that's now failing and shouldn't be. All (quick & dirty) tests I have run on the various situations related to what looked like they might be problems here are working now. I also need to think more on the possible permutations. kre
Re: procfs difference between NetBSD and Linux
Date:Sat, 05 Jun 2021 23:03:05 +0700 From:Robert Elz Message-ID: <2011.1622908...@jinx.noi.kre.to> | Replying to my own message again (draw your own conclusions)... And now I am replying to my reply to my message. All remaining hope is lost! | Building now and then will test this version soon (I had already run | the AFS tests, which don't test this particular scenario, apparently, And of course I meant ATF tests. Did you need any more proof? | as they all worked about as well as they typically do for me, certainly | no kernel crash from them). I will run them again on this version, and won't reply again unless there is something worth saying ("they worked, as well as usual" isn't it). But for now, my modified version of dholland@'s patch is looking good:
netbsd# >/
-sh: cannot create /: is a directory
netbsd# >/bin
-sh: cannot create /bin: is a directory
No more panic, and the results that we want. Further, with /proc mounted, and this (modified from the original in this thread to supply a little more info) test program:
#include <errno.h>
#include <fcntl.h>
#include <limits.h>
#include <stdio.h>
#include <unistd.h>

int main ()
{
    int fd, new_fd, err;
    char buf[PATH_MAX];

    fd = openat (AT_FDCWD, "foo.txt", O_RDONLY|O_NOFOLLOW|O_DIRECT);
    err = errno;
    printf ("fd = %d (flags %#x) err=%d\n", fd, fcntl(fd, F_GETFL, 0), (fd == -1 ? err : 0));
    sprintf(buf, "/proc/self/fd/%d", fd);
    sleep(2);
    new_fd = openat(AT_FDCWD, buf, O_RDWR|O_CREAT|O_NONBLOCK, 0744);
    err = errno;
    printf ("new_fd = %d (flags %#x) err=%d\n", new_fd, fcntl(new_fd, F_GETFL, 0), (new_fd == -1 ? err : 0));
}
(and after creating "foo.txt" of course), the results are...
fd = 3 (flags 0x8) err=0
new_fd = 4 (flags 0x6) err=0
The "O_DIRECT" in the first open I added just so the result from F_GETFL wouldn't be 0; it is otherwise meaningless (that is the 0x8). The flags value is O_DIRECT|O_RDONLY, as expected; O_NOFOLLOW is an operation, not a mode, and isn't saved.
For new_fd, flags 6 is O_RDWR|O_NONBLOCK (O_CREAT isn't saved, naturally, that's also an operation, not a mode). I suspect this is what we want. Without /proc mounted ... and yes, initially I forgot it was needed, we get:
fd = 3 (flags 0x8) err=0
new_fd = -1 (flags 0x) err=2
which also looks correct, not that anything there is in any way related to these changes in that case. Now I'll leave it for David to turn my mangling into something sane, but I think we probably have (in perhaps a messy way) this one solved. kre ps: David, I'll send the vfs_lookup.c I used in an off-list message, to save you recreating it out of the e-mail...
Re: procfs difference between NetBSD and Linux
Date:Sat, 05 Jun 2021 20:13:53 +0700 From:Robert Elz Message-ID: <16349.1622898...@jinx.noi.kre.to> Replying to my own message again (draw your own conclusions)... | It applies, compiled, and builds a release with no problems, running | tests now. Unfortunately, it doesn't work: kernel segv in vn_open(). I believe the cause is this code (in namei()):

	if (cnp->cn_nameiop != LOOKUP &&
	    (searchdir == NULL ||
	     searchdir->v_mount != foundobj->v_mount)) {
		if (searchdir) {
			/*... irrelevant for now */
		}
		vrele(foundobj);
		foundobj = NULL;
		ndp->ni_dvp = NULL;
		ndp->ni_vp = NULL;
		state->attempt_retry = 1;

which is followed by the code that changed:

	switch (cnp->cn_nameiop) {
	case CREATE:
		if (cnp->cn_flags & NONEXCLHACK) {

(etc). The problem (of course) is those

		foundobj = NULL;
and
		ndp->ni_vp = NULL;

lines, neither of which we want to happen in this case. Then when we return to vn_open() (without an error) ndp->ni_vp == NULL and kaboom. I am trying a fix for this by making the initial test shown above be:

	if (cnp->cn_nameiop != LOOKUP &&
	    (cnp->cn_flags & NONEXCLHACK) == 0 &&
	    (searchdir == NULL ||
	     searchdir->v_mount != foundobj->v_mount)) {

which of course then makes the test of NONEXCLHACK inside "case CREATE:" meaningless, but harmless, so I just left that for now. This change makes a NONEXCLHACK CREATE op function identically to a LOOKUP op, which I believe is what we want in this case. Then because we're now no longer doing the

		ndp->ni_dvp = NULL;

and the code in vn_open() relies on that, I added

	if (foundobj != NULL && cnp->cn_flags & NONEXCLHACK) {
		if (searchdir != NULL) {
			if (searchdir_locked) {
				VOP_UNLOCK(searchdir);
				searchdir_locked = false;
			}
			vrele(searchdir);
		}
		searchdir = NULL;
	}

which might be overly complicated, but seems to fit with what is needed (or done anyway) in what comes later when searchdir != NULL. (searchdir is later placed into ndp->ni_dvp).
Building now and then will test this version soon (I had already run the AFS tests, which don't test this particular scenario, apparently, as they all worked about as well as they typically do for me, certainly no kernel crash from them). kre ps: The rhialto@ suggested test: echo >/usr made it really easy to test this). Thanks. My test setup has no (extra) mount points, or not until I get around to mounting a procfs to test the code that failed anyway, so I can't use /usr - but / works just as well -- / is a mount point. Using an 8.1 kernel (the relevant code hasn't changed in a decade - until today - so anything vaguely recent should give the same results):

$ echo >/
sh: cannot create /: file exists
$ echo >/bin
sh: cannot create /bin: is a directory

shows it is just as good to use for the test as any other mount point. (The "echo" isn't needed, just ">/" works as a test, but that's immaterial.)
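To make the effect of the added `(cnp->cn_flags & NONEXCLHACK) == 0` clause easier to see, here is a user-space toy model of just that condition from namei(). It is not kernel code: the enum values, the flag value and drops_foundobj() are hypothetical stand-ins, and all vnode locking and reference counting is ignored.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-ins for the kernel's namei operation codes and flag bit.
 * The names mirror those in the thread; the values are arbitrary. */
enum nameiop { LOOKUP, CREATE, DELETE, RENAME };
#define NONEXCLHACK 0x1u

/* Model of the modified test: true when namei() would discard the
 * found vnode because the lookup ended on a mount point crossing.
 * has_searchdir models "searchdir != NULL"; same_mount models
 * "searchdir->v_mount == foundobj->v_mount". */
bool drops_foundobj(enum nameiop op, unsigned flags,
                    bool has_searchdir, bool same_mount)
{
    return op != LOOKUP
        && (flags & NONEXCLHACK) == 0
        && (!has_searchdir || !same_mount);
}
```

With NONEXCLHACK set, a CREATE that ends on a mount point crossing no longer discards foundobj, which is the "behave like a LOOKUP" effect described above; a plain CREATE (as for mkdir()) still does.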
Re: procfs difference between NetBSD and Linux
Date:Fri, 4 Jun 2021 20:09:14 + From:David Holland Message-ID: | The patch below has not even been compile-tested and so may need some | adjustments (and might conceivably break rump) but should address the | problem in a way that will, with luck, not explode anything else. It applies, compiled, and builds a release with no problems, running tests now. kre
Re: procfs difference between NetBSD and Linux
Date:Fri, 04 Jun 2021 10:32:24 +1000 From:Simon Burge Message-ID: <20210604003224.cc9b44e...@thoreau.thistledown.com.au> | https://pubs.opengroup.org/onlinepubs/007908799/xsh/open.html doesn't | mention anything about what filesystem types back the path being opened. No, but there are lots of other things also not mentioned that also affect what POSIX requires. For example, and somewhat bizarrely, if the process in question was started with one (or more) of fds 0, 1 or 2 closed, then what happens would also be unspecified. There the underlying issue is that the open might (in fact, would) return the lowest of those closed fds, which isn't what applications expect, and so bizarre things happen. But POSIX has no notion of types of bizarre; there is just unspecified (something happens, but implementations get to decide), undefined (anything goes, reasonable or not) and specified. As soon as you move out of the POSIX defined environment, everything becomes unspecified or undefined. Obviously, it wouldn't be useful to take liberties: we wouldn't want open("/dev/null", 0) to call abort() just because the system has a procfs mounted, and particularly if the application isn't using it. But POSIX doesn't say we cannot. This is simply outside the standard. | It does say that O_CREAT without O_EXCL should have no effect if the | files exists. Yes, and obviously, wherever possible, that's what should happen; it is just that you cannot say "required for POSIX conformance" when the file is on a filesystem that doesn't conform with POSIX. | That this particular instance is related to procfs | shouldn't make a difference, right?
I'm not aware of any discussions related to procfs type filesystems related to POSIX (doesn't mean there have never been any) but this type of issue comes up from time to time related to NFS, which also has slightly different semantics than "normal" filesystems - and I believe the answer has always been that as soon as you step away from a POSIX environment, the requirements no longer apply. Files and operations on an NFS filesystem aren't required to behave the same way as files on a normal filesystem (which is good, as they don't). kre ps: if we were to be overly cynical, we could also say that to conform to POSIX all that is required (of the implementation, leaving aside for now all the paperwork etc required of the implementors) is that the system pass the POSIX conformance tests. Those have no procfs (or NFS) because those things are not POSIX. Hence testing O_CREAT on a /proc/$$/fd/N type file name will never be done (or it could be, but it would just be a regular file in a regular directory, and irrelevant here), and so cannot cause a system to fail, whatever it does.
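Simon's point about O_CREAT without O_EXCL, and the `>/` test used elsewhere in this thread, can both be probed from user space on an ordinary filesystem, where POSIX does apply. A sketch with hypothetical helper names: o_creat_semantics() checks that plain O_CREAT simply opens an existing file while O_CREAT|O_EXCL fails with EEXIST; probe_redirect() opens a path roughly the way a shell handles `>path` (the O_WRONLY|O_CREAT|O_TRUNC flag set is an assumption about the shell) and returns the errno, or 0 on success. Note that on success probe_redirect() truncates the file, so point it only at directories or scratch files.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/* Create an empty scratch file from a template such as "/tmp/xXXXXXX". */
int make_scratch(char *tmpl)
{
    int fd = mkstemp(tmpl);
    if (fd == -1)
        return -1;
    close(fd);
    return 0;
}

/* Open an existing file twice: with O_CREAT alone, then with
 * O_CREAT|O_EXCL. Each out parameter receives 0 on success or errno. */
void o_creat_semantics(const char *existing, int *plain_err, int *excl_err)
{
    int fd;

    fd = open(existing, O_RDWR | O_CREAT, 0644);
    *plain_err = (fd == -1) ? errno : 0;
    if (fd != -1)
        close(fd);

    fd = open(existing, O_RDWR | O_CREAT | O_EXCL, 0644);
    *excl_err = (fd == -1) ? errno : 0;
    if (fd != -1)
        close(fd);
}

/* Approximately what sh does for ">path". Returns 0 or the errno. */
int probe_redirect(const char *path)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0666);
    if (fd == -1)
        return errno;
    close(fd);
    return 0;
}
```

For an existing regular file, plain O_CREAT is expected to succeed and O_CREAT|O_EXCL to give EEXIST. For "/", the 8.1 transcript in this thread shows the unpatched kernel reporting "file exists" (EEXIST) where "is a directory" (EISDIR) is wanted.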
Re: procfs difference between NetBSD and Linux
Date:Thu, 3 Jun 2021 18:45:53 - (UTC) From:mlel...@serpens.de (Michael van Elst) Message-ID: | procfs will anser EOPNOTSUPP on VOP_CREATE. But it never comes that | far. No, it doesn't. What I was suggesting doesn't come close to fitting the way things actually work; I should have considered it more before sending. | On the other hand, the logic in namei() might not be correct. I'm not sure it is that simple (that's what I thought a half hour or so ago). | It looks like a check to prevent CREATE operations on a mountpoint, | but that's neither necessary nor compatible when the object | already exists. The issue (which is easier to see in much older versions of namei() than the current one) is that a parent vnode pointer is required for CREATE (and DELETE and RENAME) vnode ops, but across a mount point that makes no sense (or does it? Could we simply return the previous vnode in the path regardless of the filesystem - or would that wreck the locking somewhere?) If the CREATE is for a mkdir() or link() (or mknod(), mkfifo() ...) then all of this makes sense, the EEXIST is correct, and simply returning the existing vnode as it is might not be. But open(path, O_CREAT|..., ...) is different: it is only a CREATE if the path doesn't exist, otherwise it is simply an open. It could do 2 lookups, one to discover if the path exists (returning if it does), and then a second CREATE lookup if it doesn't - but that would be full of races or locking nightmares. kre
Re: procfs difference between NetBSD and Linux
Replying to myself... | I think I am going to experiment with simply removing that error case | and see if anything breaks. But that cannot work: the issue is that the operations in question return the parent vnode, which, when a mount point has been crossed, isn't possible. Simply returning success in that case won't work at all for all the other uses of the CREATE vnop, which expect that parent vnode. I considered dealing with EEXIST in open() (where it makes no sense, unless O_EXCL is set) but that is unlikely to work, as namei() when it returns an error isn't also going to be properly returning the target vnode. My guess at the minute is that to fix this we need a new vnop, OCREATE (optional create) (or something), which works identically to the CREATE operation, except that it doesn't fail if it cannot return the parent vnode - and then callers (which would probably only be open()) using this new op would need to deal with that when it happens. But that's beyond my pay grade here; someone who has worked a lot on namei() and the vnops needs to consider all this a lot more. kre
Re: procfs difference between NetBSD and Linux
Date:Fri, 4 Jun 2021 14:29:51 - (UTC) From:mlel...@serpens.de (Michael van Elst) Message-ID: | We need to understand why namei() does this check and how it can be | corrected. Yes, I was wondering about that; it seems to make no sense to me. A mountpoint, by definition, must exist, so the O_CREAT flag (without O_EXCL) will never be creating anything, so if we hit a mountpoint boundary, just at the resolution of the name, the result cannot be affected by O_CREAT (alone - O_CREAT|O_EXCL is always going to fail, mountpoint or not, if the target name exists). Simply removing whatever the test is should (hypothetically anyway) make no difference to anything, so discovering why the check was added would be useful. I've been taking a bit of a look at the history, and while the error wasn't always EEXIST (that's only from 2011) the test has been there for ages. At the minute I'm thinking it might be a deficiency in the design of the vnops ... the error comes from a "create" operation on a mount point, which obviously is going to fail (as do delete and rename). The problem is that O_CREAT isn't always a create op: it can simply be a lookup, and it only turns into an actual create operation if the target doesn't exist. Perhaps that means the way the create vnop works needs to be altered, or perhaps this test doesn't really need to be there, as if it is really intended to be a create (as in mkdir() or link()) it should simply fail when it detects the target exists (mount point or not), and if it is an "optional create" (as with O_CREAT on open) then if the target exists it isn't really a create at all. I think I am going to experiment with simply removing that error case and see if anything breaks. kre
Re: procfs difference between NetBSD and Linux
Date:Thu, 3 Jun 2021 09:12:52 - (UTC) From:mlel...@serpens.de (Michael van Elst) Message-ID: | namei() return EEXIST when it works on a CREATE operation and | crosses a mountpoint. Could we perhaps simply have procfs remove O_CREAT from the flags passed by the user? It is never going to work to create a file inside a procfs mount, is it? kre ps: But I'm not sure this is a POSIX problem, POSIX has no procfs, and so anything that uses one is outside the bounds of what POSIX specifies, and into the great vastness of beyond all knowledge - ie: for POSIX, anything on a procfs is an unspecified operation.
Re: procfs difference between NetBSD and Linux
Date:Tue, 1 Jun 2021 14:32:19 +0200 From:Martin Husemann Message-ID: <20210601123219.ga16...@mail.duskware.de> | Good idea - and raise an upstream issue pointing at the non-portable | procfs assumption. And while doing that, ask them what they're possibly trying to achieve with the O_CREAT flag - if /proc/$$/fd/N doesn't exist, how is creating (what would be a normal file, if procfs allowed it) going to possibly do anything useful? It is hard to believe that they're intending that creating a file there will magically cause the fd to open (open to what underlying object?) If they know the fd is open (which they seem to do here) then they know that /proc/$$/fd/N already exists, in which case O_CREAT is useless (in the best of cases). kre
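If all an application really needs is to know whether descriptor N is open, that can be done without procfs at all. A sketch, assuming that is the need (fd_is_open() is a hypothetical name): fcntl() with F_GETFD fails with EBADF for a descriptor that is not open.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <unistd.h>

/* Portable check that fd is open in this process, without /proc:
 * F_GETFD fails with EBADF only when the descriptor is not open. */
bool fd_is_open(int fd)
{
    return fcntl(fd, F_GETFD) != -1 || errno != EBADF;
}
```

This does not reproduce what opening /proc/self/fd/N typically gives, namely a new open file description; dup() instead shares the existing one, including its offset and status flags.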
Re: `man` cannot find any entry
Date:Thu, 13 May 2021 15:29:25 -0700 From:J C Message-ID: | Any idea how I can fix this? unset MANPATH Then find where it is being set, and make that stop happening. Use of MANPATH is for unusual situations, it should not normally be required. kre
Re: toupper and warnings
Date:Thu, 06 May 2021 12:52:36 -0700 From:"Greg A. Woods" Message-ID: | Yeah, "Undefined Behaviour" should be undefined -- i.e. removed from the | spec -- i.e. become either fully defined or at least implementation | defined. It is not helpful at all -- it was a very VERY bad idea. Not really possible. To become implementation defined, the implementation needs to be able to specify what happens (even if different from what other implementations specify for the same thing). Sometimes that's not possible, and what happens depends upon things outside the control of the implementation. Eg: accessing an array out of bounds might just return random data from some other data structure, or it might generate a segmentation violation - it all depends upon how far out of bounds the access was, and where in the memory map the array in question happened to be placed. There's no way to define what will happen - even worse on an embedded system, running with no memory management or privilege separation, the access might hit on memory mapped I/O control, or CPU control registers, and do almost anything. | E.g. for ctype.h interfaces the spec should just say that values outside | the recognized range will simply be truncated as if by assignment to an | unsigned char. That might have been a good idea, perhaps, if it had been specified that way initially - only perhaps because it means penalizing good code with meaningless extra checks or no-op data manipulations (&0xFF or whatever) that do nothing for it except make the code run slower, just so bad code behaves in some kind of predictable (but probably still incorrect) way. But it wasn't specified like that. And standards bodies are not legislatures - they don't (or shouldn't) go defining how things should be, and then attempt to force implementations to obey. 
Rather, they set out what is known to work on all implementations (just omitting ones with admitted bugs which should be fixed), so that applications will know what they need to do to correctly use the interfaces provided, and what they should not do, as the results would either be unspecified (or implementation defined) or even simply undefined. They also make it clear what a new implementation needs to implement in order to be compatible with the other existing implementations, so that applications which work with other implementations will also work with the new one. | What I am pretty sure of though is that there's a vast difference | between the massive number of warnings spit out by the compiler vs. the | relatively low number of actual cases of passing values outside of -1..255. | We certainly wouldn't want to claim UB and abort for all of the warnings! It is certainly true that the compiler is guessing when it issues one of these warnings, in some cases it cannot know what the range of value will be at run time, in others its analysis functionality is simply not up to the task. So a lot of false warnings occur - for some of the warnings the vast majority look to be bogus (which is annoying) for others a warning most commonly means a problem exists. kre
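For the ctype.h warnings themselves, the usual application-side fix is to convert the argument to unsigned char before calling toupper(). That is in effect the "truncated as if by assignment to an unsigned char" behaviour Greg proposes, done by the caller rather than required of the implementation. A minimal sketch (str_toupper() is a hypothetical name):

```c
#include <assert.h>
#include <ctype.h>

/* Convert a string to upper case in place. The cast matters: where
 * plain char is signed, a byte such as 0xE9 would otherwise reach
 * toupper() as a negative int, outside the range the <ctype.h>
 * functions accept (unsigned char values and EOF). */
void str_toupper(char *s)
{
    for (; *s != '\0'; s++)
        *s = (char)toupper((unsigned char)*s);
}
```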
Re: reboot hangs at "uhid2 at uhidev9 report id 7 ..."
Date:Wed, 5 May 2021 18:55:38 +0200 From:Rhialto Message-ID: | On Wed 05 May 2021 at 15:18:03 +0900, Henry wrote: | > The system kept booting into single user mode, but searching around I | > finally figured out that I needed to edit /etc/rc.conf. I thought I | > had successfully changed to rc_configured=YES. | | The installer is also supposed to do that for you, so there must have | been something weird there. Which might also explain the current problem. If the final stages of the system setup weren't done correctly, /dev might not have been set up either. In that case, no-one is going to be able to open /dev/console to output any further (from userlevel) messages. That last message you're seeing is often one of the last from the kernel before you start getting messages from running /etc/rc. That is, it is entirely possible that the system is up and running, but there is simply no way to communicate with it (if rc.conf wasn't set up, the network probably isn't enabled either). Boot in single user mode, check what is in /dev - if what's there doesn't look correct (ordinary file for /dev/null or missing, no /dev/console or not a char device, ...) then delete everything (except MAKEDEV if that is in /dev on your version) and "cd /dev; sh MAKEDEV std" (or sh /etc/MAKEDEV or wherever it is to be found). If what is in /dev looks to be correct, check /etc/ttys next, but incorrect config there is less likely to explain things. kre
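The /dev checks suggested above amount to "does this path exist, and is it a character special file?". In single user mode `ls -l /dev/null /dev/console` answers that directly (a leading 'c' in the mode field); the same test as a small C sketch, with is_char_device() a hypothetical helper name:

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <sys/stat.h>

/* True if path exists and is a character special file. After a
 * broken install /dev/null may be missing or a plain regular file,
 * and /dev/console may be absent; both should pass this check. */
bool is_char_device(const char *path)
{
    struct stat st;
    return stat(path, &st) == 0 && S_ISCHR(st.st_mode);
}
```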
Re: IPv6: in6_setscope: can't set scope for not loopback interface
Actually, you can ignore the "-s1600" request; it looks as if someone has finally made tcpdump's default snap length somewhat bigger... It won't hurt if you have done it using that option, but it also should no longer (I have no idea for how long back into the past) be needed. kre
Re: IPv6: in6_setscope: can't set scope for not loopback interface
Date:Thu, 22 Apr 2021 20:50:09 +0200 From:Jörn Clausen Message-ID: | $ ifconfig -a That all looks OK. | I have configured the IPv4 part of vioif0 via /etc/ifconfig.vioif0: Now I'm going to suggest that you (at least temporarily) configure a v6 address on that interface. My suspicion is that something on your system is seeing those v6 incoming multicast packets, and is attempting to reply (with its own multicast packets). But you have no global address - the only non link-local v6 address it can find is ::1. If all of this goes away when you have a v6 address configured, then we'll be much closer to finding out what is going on. Just add

inet6 2a04:52c0:101:162::1/64

at the end of /etc/ifconfig.vioif0 (the '1' could be any 16 bit hex value you like; there are more ways to config v6 addrs, but this will do for now). The 2a04:52c0:101:162 is what you said your ISP assigned you. | and define the default route in /etc/rc.conf. Don't bother with that for ipv6 for now (no default v6 route). | According to my ISP, he doesn't see the bogus packets with ::1 source, so | indeed they seem to be a product of my machine. Assuming that's correct, which this test should verify (those packets should go away and be replaced by packets from 2a04:52c0:101:162::1 if my guess is correct), then we need to try and work out why the network stack is allowing that to happen. | the resulting PCAP file is at | | https://drive.google.com/file/d/1b_QlSW_oqYb2lMe4m_FO-DU7mAQdd86c/view?usp=sharing I'm unable to fetch that (or rather, I can connect to that page, but all it ever does is show a "rotating circle" kind of thing). Can you just send the pcap file (or perhaps a new version) to me (not the list) via e-mail? A second tcpdump pcap file after the v6 global addr is configured might help as well.
And please, use -s 1600 on the tcpdump command that writes the file - I'm not certain that it is required when -w is used, but it certainly won't hurt (without that, only the packet headers tend to be captured, and sometimes not even all that, 1600 is bigger than the MTU (plus ethernet headers) so should get everything). kre
Re: IPv6: in6_setscope: can't set scope for not loopback interface
Date:Thu, 22 Apr 2021 11:06:04 +0200 From:Jörn Clausen Message-ID: | BTW: This is all happening on the actual network interface, | not the loopback interface. Yes, I knew that, but the NetBSD network stack uses the loopback interface for local packet delivery, so it has to be configured correctly or (some) things won't work. | I can see a constant stream of these packets: | | 10:31:46.504046 IP6 2a04:52c0:101:7b1::.5344 > ff15::efc0:988f.6771: UDP, | length 138 Those are multicast packets. Multicast is one of the packet types for which the interface scopes are important. What port 6771 is being used for I'm not sure; /etc/services says it is "plysrv-https" (yes, including for UDP) but it might easily be something else. Maybe someone else here can recognise it. Or you might check, initially using netstat, and then perhaps fstat, whether your host has anything listening on that port. | 2a04:52c0:101:7b1 is on the same network as my machine That would be a network prefix; the source addr is 2a04:52c0:101:7b1:: (those extra colons are important, and indicate a host part of all zeroes, which is unusual, but I don't think actually incorrect). | (technically, my ISP gave me the address 2a04:52c0:101:162::/64, That's also a network prefix (a block of 2^64 addresses). A different one than the prefix of the sender of those packets, though it is unclear what that prefix (the one assigned to you) is intended for - most likely for your internal network (if you have one, which for your usage you probably don't) rather than for the link between the ISP and you, which might be the 2a04:52c0:101:7b1 prefix. | but I don't use it and haven't configured the interface with it). That won't stop multicast packets arriving. The switch shouldn't be sending them unless something has joined the multicast group, but without knowing a lot more about how your ISP has configured the connections to its KVM guests, it is hard to say that anything wrong is happening.
| Every now and then I see this: | | 10:31:49.689606 IP6 ::1.52736 > ff15::efc0:988f.6771: UDP, length 139 | 10:31:49.690455 IP6 ::1.6771 > ff15::efc0:988f.6771: UDP, length 139 | 10:31:51.690739 IP6 ::1.52736 > ff15::efc0:988f.6771: UDP, length 139 | 10:31:51.691180 IP6 ::1.6771 > ff15::efc0:988f.6771: UDP, length 139 Those are simply wrong. That ::1 source addr should never be attempting to send any packets off its host - and if they're arriving over the vioif0 interface, rather than being sent, then some other host out there is horribly broken (I'd tend to suspect your config first though). | and this correlates perfectly with /var/log/messages: | | [Thu Apr 22 10:31:49 CEST 2021 < 27.000723>] in6_setscope: can't set scope | for not loopback interface vioif0 and loopback address ::1 Yes, it would. Those packets are nonsense. | So I see packets on my network interface (i.e. not the loopback interface) | with a source of ::1. I am waiting for a reply from my ISP if I am seeing | pink elephants or if there are actually such packets on the network. If there are, the sender of them needs to be fixed, but I wouldn't be surprised if something on your host is trying to send those. | Do you know if port 6771 is some well-known port in IPv6 for housekeeping? No, it is not a port I recognise. But that means nothing. | The information I found seem to lean more to malware, and 2a04:52c0:101:7b1 | might not be acting in good faith...? I don't think I'd be assuming malware, when mistakes are far more likely. The two most likely possibilities are some kind of mis-config on your host, or some kind of mis-config on some other host running in a different KVM guest on the same server. kre
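Two of the address questions in this message can be checked mechanically. This sketch uses only inet_pton() and the standard IN6_IS_ADDR_LOOPBACK() macro from <netinet/in.h>; in6_prefix_match() and src_ok_off_host() are hypothetical helper names. The first confirms that the sender's 2a04:52c0:101:7b1:: lies outside the assigned 2a04:52c0:101:162::/64; the second encodes the rule that ::1 must never appear as a source address off the host.

```c
#define _POSIX_C_SOURCE 200809L
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <stdbool.h>
#include <sys/socket.h>

/* True if addr lies inside prefix/plen (plen 0..128). Returns false
 * if either string fails to parse as an IPv6 address. */
bool in6_prefix_match(const char *addr, const char *prefix, int plen)
{
    struct in6_addr a, p;

    if (plen < 0 || plen > 128 ||
        inet_pton(AF_INET6, addr, &a) != 1 ||
        inet_pton(AF_INET6, prefix, &p) != 1)
        return false;

    int full = plen / 8, rest = plen % 8;
    for (int i = 0; i < full; i++)          /* whole bytes of the prefix */
        if (a.s6_addr[i] != p.s6_addr[i])
            return false;
    if (rest != 0) {                        /* leftover high-order bits */
        unsigned char mask = (unsigned char)(0xFF << (8 - rest));
        if ((a.s6_addr[full] & mask) != (p.s6_addr[full] & mask))
            return false;
    }
    return true;
}

/* False for the loopback address ::1, which must never be used as a
 * source on a real interface; also false if src does not parse. */
bool src_ok_off_host(const char *src)
{
    struct in6_addr a;

    if (inet_pton(AF_INET6, src, &a) != 1)
        return false;
    return !IN6_IS_ADDR_LOOPBACK(&a);
}
```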
Re: IPv6: in6_setscope: can't set scope for not loopback interface
Date:Wed, 21 Apr 2021 22:50:40 +0200 From:Jörn Clausen Message-ID: | I am mostly ignorant to everything IPv6, so I have no clue what that | message means, and I was not able to find any enlightenment online. IPv6 link local (and multicast, and sometimes some other) addresses have a "scope" in addition to the address itself. That's because there is nothing in the address which indicates which interface it belongs to (no sub-net identifier or anything like that). The reference to ::1 in the messages is interesting: that's the v6 equivalent of 127.0.0.1 in v4 - the loopback address, and should only be assigned to lo0 (but needs to be there). | Is this something I can fix from inside the OS? Almost certainly. There's probably something mis-configured. What is the status of the loopback interface (lo0)? Mine looks like:

lo0: flags=0x8049 mtu 33624
	inet 127.0.0.1/8 flags 0x0
	inet6 ::1/128 flags 0x20
	inet6 fe80::1%lo0/64 flags 0x0 scopeid 0x3

| $ ifconfig vioif0 | vioif0: flags=0x8843 mtu 1500 | ec_capabilities=1 | ec_enabled=0 | address: 00:16:3e:b3:00:8a | inet 5.2.76.44/24 broadcast 5.2.76.255 flags 0x0 | inet6 fe80::216:3eff:feb3:8a%vioif0/64 flags 0x0 scopeid 0x1 Nothing looks wrong there. fe80::216:3eff:feb3:8a is your link local address on that interface, the "%vioif0" is the scope (and the /64 is essentially the netmask of course). While the changes at your ISP may have triggered something, and of course it is possible they're doing something incorrect or unusual, it is probably more likely that it is just different. You might want to capture a short sequence of packets on that interface to see what is happening. Since the timestamps you included show the messages appearing several times a minute, capturing packets for just a minute or two should be enough to see if there's anything strange.

tcpdump -i vioif0 -s 1600 -w /tmp/packets.pcap ip6

should do it; simply interrupt it after a couple of minutes.
Then you can use tcpdump -r or wireshark to look at the packets, or put the file somewhere it can be fetched. kre
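For multicast, the scope mentioned above is carried in the address itself: in an IPv6 multicast address (first byte 0xff) the low four bits of the second byte are the scope field defined in RFC 4291, and the high four bits are flags. So the ff15::efc0:988f group seen in the captures quoted above has scope 5 (site-local) and flag value 1 (transient). Link-local unicast addresses carry no such field, which is why the %vioif0 zone suffix is needed. A sketch decoding the field; mcast_scope() is a hypothetical name:

```c
#define _POSIX_C_SOURCE 200809L
#include <arpa/inet.h>
#include <assert.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Return the 4-bit scope field of an IPv6 multicast address
 * (RFC 4291: 1 interface-local, 2 link-local, 5 site-local,
 * 8 organization-local, 14 global), or -1 if addr does not parse
 * or is not a multicast address. */
int mcast_scope(const char *addr)
{
    struct in6_addr a;

    if (inet_pton(AF_INET6, addr, &a) != 1 || !IN6_IS_ADDR_MULTICAST(&a))
        return -1;
    return a.s6_addr[1] & 0x0F;
}
```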
Re: ispell-british (for 9.0/amd64) has broken hash files
Date:Tue, 02 Feb 2021 22:15:07 -0800 From:"Greg A. Woods" Message-ID: | I'm wondering ie anyone can reproduce this | with the standard packages, yes. I have given up on ispell because of this and now mostly use aspell instead (sometimes just spell, but more often nothing as my typical e-mail would indicate...) kre
Re: Creating a GPT tab
Date:Sun, 24 Jan 2021 14:12:20 -0800 From:John Nemeth Message-ID: <202101242212.10omckhx022...@server.cornerstoneservice.ca> | The tools won't replicate this, nor should they, as it is a | seriously broken setup. To fix this setup, delete MBR parition 0. Actually, it would take more than that... The PMBR should cover the entire drive, the one shown only accounted for the (unused except for the partitioning headers) section before the start of the windows partition. This is even worse than when I looked at it before. If this works for anything at all (except perhaps ignoring the GPT partitions entirely, and simply allowing access via the MBR to the windows partition) then I suspect that indicates a bug somewhere. It shouldn't. kre
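For reference, the protective MBR rule as summarised here from the UEFI specification: a single partition record of type 0xEE starting at LBA 1 and covering the rest of the disk, with the 32-bit size field set to 0xFFFFFFFF when the disk is too large to describe. A sketch of just the size calculation, with pmbr_size() a hypothetical helper:

```c
#include <assert.h>
#include <stdint.h>

/* Size in sectors of the single type 0xEE record in a protective MBR:
 * it starts at LBA 1 and covers everything after LBA 0, clamped to
 * what the 32-bit MBR size field can hold. */
uint32_t pmbr_size(uint64_t disk_sectors)
{
    if (disk_sectors == 0)
        return 0;
    uint64_t n = disk_sectors - 1;          /* all sectors after LBA 0 */
    return n > UINT32_MAX ? UINT32_MAX : (uint32_t)n;
}
```

A PMBR whose 0xEE record stops at the start of the first real partition, as described above, would fail this comparison.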
Re: Creating a GPT tab
Hmm, in my previous reply I missed the 0 MBR partition. That one is weird. That's a duplicate spec of the first GPT partition (the windows GPT partition) - which I assume is there to allow ancient windows systems, and other things that understand MBRs, to find that partition without understanding GPT. I am now guessing this is not a PC (for which that would make this an invalid PMBR) but an ARM system perhaps? I am not sure that we have tools that can make something like that. Perhaps someone else will need to answer. kre
Re: Creating a GPT tab
Date:Sun, 24 Jan 2021 11:49:09 -0700 From:Brook Milligan Message-ID: <818cc659-27b2-4207-94e2-a14c9579f...@nmsu.edu> | I am trying to create GPT partitions that are the same as the following: | The complicating factor is that there is an MBR in sector 0 and the | primary GPT begins at sector 1. That's normal. The MBR is a "protective MBR" - it exists entirely so that old systems that don't understand GPT won't think the drive is empty and overwrite its contents without verifying first. | I cannot figure out how to make the tools replicate this and would | appreciate help. Assuming you mean to use the tools to make a similar structure, rather than make a literal copy of what is on the drive, then

gpt create sd0
gpt add -b 32788 -s 163840 -t windows -l Windows_Data sd0
gpt add -s 33554432 -t ffs -l NetBSD_Root sd0
gpt add -s 207618048 -t ffs -l NetBSD-Data sd0

You didn't show the existing partition labels, so I just made some up; use whatever is appropriate (but they should be different from the ones on the existing drive, if you are making a new one, unless the two drives will (absolutely 100% for certain) never be connected to the same system at the same time). That's it. gpt will create the PMBR for you as part of gpt create; don't try and use fdisk to make that one appear. kre
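The sizes in the gpt add commands above are in sectors. Assuming 512-byte sectors (the drive's real sector size should be checked; this sketch does not do that), they work out to 80 MiB, 16 GiB and 99 GiB. The helpers below (hypothetical names) do that arithmetic and find the last sector of the first partition.

```c
#include <assert.h>
#include <stdint.h>

/* Sector count to MiB, assuming 512-byte sectors (2048 per MiB). */
uint64_t sectors_to_mib(uint64_t sectors)
{
    return sectors * 512 / (1024 * 1024);
}

/* Last sector occupied by a partition of size sectors at start. */
uint64_t last_sector(uint64_t start, uint64_t size)
{
    return start + size - 1;
}
```

So the Windows partition created with -b 32788 -s 163840 ends at sector 196627.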
Re: On upgrade: NetBSD ?.? (UNKNOWN)
As an alternative to what Herbert suggested, you can just edit /etc/motd and put whatever you like there. It is intended to give a (brief, hopefully) message to people when they log in. If you have update_motd set to YES in rc.conf (or do not have it there at all; YES is the default) then at each boot the first line of /etc/motd is made to contain the system version string (by the automatic run at startup of /etc/rc.d/motd ... Herbert just suggested doing it manually). kre