Re: pax and ext2fs
On Thu, May 16, 2024 at 12:08 AM Philip Guenther wrote:
> On Wed, May 15, 2024 at 1:14 AM Philip Guenther wrote:
> ...
>> I think you've managed to hit a spot where the POSIX standard doesn't
>> provide a way for a program to find the information it needs to do its job
>> correctly.  I've filed a ticket there
>> https://austingroupbugs.net/view.php?id=1831
>>
>> We'll see if my understanding of pathconf() is incorrect or if someone has a
>> great idea for how to get around this...
>
> So yeah, what's needed is pathconfat(2)** but whether this winding loose end
> ("That poor yak.") merits that much code and surface is yet to be examined
> deeply.

The fix for this has now been committed, so it'll be in 7.6 and a near future snapshot.

Philip Guenther
Re: pax and ext2fs
On Thu, May 16, 2024 at 5:33 AM Walter Alejandro Iglesias wrote:
> On Thu May 16 09:48:45 2024 Philip Guenther wrote:
> > So yeah, what's needed is pathconfat(2)** but whether this winding loose
> > end ("That poor yak.") merits that much code and surface is yet to be
> > examined deeply.
...
> I read what you posted here:
>
>   https://austingroupbugs.net/view.php?id=1831
>
> In the footnote you wrote:
>
>   "(This was encountered when trying to fix a pax implementation's
>   handling of timestamp comparison for -u when the target filesystem had
>   coarser resolution than the source filesystem by using
>   pathconf(_PC_TIMESTAMP_RESOLUTION) on the target path to handle the
>   loss of high-precision time info...but the symlink pointed to a
>   location with high-precision timestamps so it couldn't know to round
>   the times when doing the comparison...)"
>
> I did one more experiment.  I removed the offending soft link from my
> hard disk, then I copied the backed-up version of the soft link from the
> ext2 drive back to my system tree.

So you did so and then checked the timestamps on the symlinks using stat to see how they compared, yes?

> Now pax (with your patches) doesn't
> insist on re-updating the file,

Sounds like you copied with something like 'cp -p', so the copy has an mtime with a zero nsecs part, so now they do compare as equal.

> *even after updating the file with
> touch(1)*.

Why would the symlink need to be recopied by pax?  You didn't update the symlink's timestamps.

> The soft link *still* points to a location with high-precision
> timestamps, but pax does the right job.

Because the symlinks now have the exact same timestamp, one with zero nsecs.

> Intuitively this suggests to me that there is something more than mtime
> precision in this misunderstanding between OpenBSD and ext2 file
> systems.

I think you should check the timestamps on the symlinks at each step to validate that.

> P.S.: I'm curious about the following.  After running the stat command
> here and there, I found *many* files showing that lack of mtime
> granularity spread throughout all my system tree (as a side note: this
> doesn't happen with their ctime and atime.)

The released install tgz files (base75.tgz, etc) use a format where the contained files all have simple integer mtimes, and tar is invoked with the -p option (required for correct permissions on setuid/gid files), which makes it also set the mtime on the extracted file to match what's in the tar file.  ctime is always set from the local clock when the inode is allocated/updated, so there's no reason for it to always have zero nsecs.  atime is of course updated from the local clock when you, uh, access them.

Philip Guenther
Re: pax and ext2fs
On Wed, May 15, 2024 at 1:14 AM Philip Guenther wrote:
> On Tue, May 14, 2024 at 11:59 AM Walter Alejandro Iglesias <w...@roquesor.com> wrote:
>
>> Hi Philip,
>>
>> On Tue May 14 19:40:04 2024 Philip Guenther wrote:
>> > If you like, you could try the following patch to pax to more gracefully
>> > handle filesystems with time resolution coarser than nanoseconds.
>>
>> After applying your patch, as I'd done before reporting the issue, I
>> synchronized my home directory to an external ext2fs drive with the
>> command shown by the man page:
>>
>>   $ pax -rw -v -Z -Y source target
>>
>> This time only one file keeps updating again and again, a soft link I
>> have in my ~/bin folder to /usr/local/bin/prename.
>
> I think you've managed to hit a spot where the POSIX standard doesn't
> provide a way for a program to find the information it needs to do its job
> correctly.  I've filed a ticket there
> https://austingroupbugs.net/view.php?id=1831
>
> We'll see if my understanding of pathconf() is incorrect or if someone has
> a great idea for how to get around this...

So yeah, what's needed is pathconfat(2)** but whether this winding loose end ("That poor yak.") merits that much code and surface is yet to be examined deeply.

Philip Guenther

** or lpathconf(2), but pathconfat(2) is better
Re: pax and ext2fs
On Tue, May 14, 2024 at 11:59 AM Walter Alejandro Iglesias wrote:
> Hi Philip,
>
> On Tue May 14 19:40:04 2024 Philip Guenther wrote:
> > If you like, you could try the following patch to pax to more gracefully
> > handle filesystems with time resolution coarser than nanoseconds.
>
> After applying your patch, as I'd done before reporting the issue, I
> synchronized my home directory to an external ext2fs drive with the
> command shown by the man page:
>
>   $ pax -rw -v -Z -Y source target
>
> This time only one file keeps updating again and again, a soft link I
> have in my ~/bin folder to /usr/local/bin/prename.

I think you've managed to hit a spot where the POSIX standard doesn't provide a way for a program to find the information it needs to do its job correctly.  I've filed a ticket there
https://austingroupbugs.net/view.php?id=1831

We'll see if my understanding of pathconf() is incorrect or if someone has a great idea for how to get around this...

Philip Guenther
Re: viomb0 unable to allocate 256 physmem pages, error 12
viomb is a driver that tries to support OpenBSD, as a VM guest, responding to a request from the VM host to stop using so much physical memory.  That log message indicates that the kernel couldn't easily free up that much physical memory, sorry!  The VM host is, of course, free to decide to just page out whatever memory it wants instead, possibly resulting in thrashing: running a VM setup oversubscribed for memory is a great way to be frustrated and hate computers.

How can you make that message go away?  Provision your VM setup with enough memory that it's not oversubscribed, or at least so that the OpenBSD guest(s) isn't the one being asked to slim itself (possibly by giving it *less* but _reserved_ memory, so that the VM host never tries to shrink its usage).

Philip Guenther

On Tue, May 14, 2024 at 4:16 PM F Bax wrote:
> I'm not a coder; but I found source for viomb; which
> calls uvm_pglistalloc; which calls uvm_pmr_getpages which mentions ENOMEM:
>
> https://cvsweb.openbsd.org/cgi-bin/cvsweb/~checkout~/src/sys/uvm/uvm_pmemrange.c?rev=1.66=text/plain
> There I found this comment:
>  * fail if any of these conditions is true:
>  * [1]  there really are no free pages, or
>  * [2]  only kernel "reserved" pages remain and
>  *      the UVM_PLA_USERESERVE flag wasn't used.
>  * [3]  only pagedaemon "reserved" pages remain and
>  *      the requestor isn't the pagedaemon nor the syncer.
>
> Unsure how I might use this information to get rid of the previously
> mentioned error message..
>
> On Tue, May 14, 2024 at 2:28 PM Peter J. Philipp wrote:
>
>> On Tue, May 14, 2024 at 01:58:18PM -0400, F Bax wrote:
>> > Recently installed 7.5 amd64 in qemu VM (8G RAM) under proxmox.  See this
>> > message many times on console and dmesg.
>> >
>> > viomb0 unable to allocate 256 physmem pages, error 12
>> >
>> > What does this mean?  How to resolve this issue?
>>
>> Hi,
>>
>> When you see "error " it's good to look up the manpage on errno.
>> Under number 12 it says: ENOMEM "Cannot Allocate Memory".  But look for
>> yourself for a deeper explanation.  Also if you want to hunt for this
>> errno in the code you would most likely grep for ENOMEM.
>>
>> Best Regards,
>> -pjp
>>
>> --
>> ** all info about me: lynx https://callpeter.tel, dig loc
>> delphinusdns.org **
Re: pax and ext2fs
If you like, you could try the following patch to pax to more gracefully handle filesystems with time resolution coarser than nanoseconds.  The whitespace will presumably be mauled by gmail so use patch's -l option.

Philip Guenther


Index: ar_subs.c
===================================================================
RCS file: /data/src/openbsd/src/bin/pax/ar_subs.c,v
diff -u -p -r1.51 ar_subs.c
--- ar_subs.c	10 Jul 2023 16:28:33 -0000	1.51
+++ ar_subs.c	14 May 2024 17:19:15 -0000
@@ -146,23 +146,59 @@ list(void)
 }
 
 static int
-cmp_file_times(int mtime_flag, int ctime_flag, ARCHD *arcn, struct stat *sbp)
+cmp_file_times(int mtime_flag, int ctime_flag, ARCHD *arcn, const char *path)
 {
 	struct stat sb;
+	long res;
 
-	if (sbp == NULL) {
-		if (lstat(arcn->name, &sb) != 0)
-			return (0);
-		sbp = &sb;
+	if (path == NULL)
+		path = arcn->name;
+	if (lstat(path, &sb) != 0)
+		return (0);
+
+	/*
+	 * The target (sb) mtime might be rounded down due to the limitations
+	 * of the FS it's on.  If it's strictly greater or we don't care about
+	 * mtime, then precision doesn't matter, so check those cases first.
+	 */
+	if (ctime_flag && mtime_flag) {
+		if (timespeccmp(&arcn->sb.st_mtim, &sb.st_mtim, <=))
+			return timespeccmp(&arcn->sb.st_ctim, &sb.st_ctim, <=);
+		if (!timespeccmp(&arcn->sb.st_ctim, &sb.st_ctim, <=))
+			return 0;
+		/* <= ctim, but >= mtim */
+	} else if (ctime_flag)
+		return timespeccmp(&arcn->sb.st_ctim, &sb.st_ctim, <=);
+	else if (timespeccmp(&arcn->sb.st_mtim, &sb.st_mtim, <=))
+		return 1;
+
+	/*
+	 * If we got here then the target arcn > sb for mtime *and* that's
+	 * the deciding factor.  Check whether they're equal after rounding
+	 * down the arcn mtime to the precision of the target path.
+	 */
+	res = pathconf(path, _PC_TIMESTAMP_RESOLUTION);
+	if (res == -1)
+		return 0;
+
+	/* nanosecond resolution?  previous comparisons were accurate */
+	if (res == 1)
+		return 0;
+
+	/* common case: second accuracy */
+	if (res == 1000000000)
+		return arcn->sb.st_mtime <= sb.st_mtime;
+
+	if (res < 1000000000) {
+		struct timespec ts = arcn->sb.st_mtim;
+		ts.tv_nsec = (ts.tv_nsec / res) * res;
+		return timespeccmp(&ts, &sb.st_mtim, <=);
+	} else {
+		/* not a POSIX compliant FS */
+		res /= 1000000000;
+		return ((arcn->sb.st_mtime / res) * res) <= sb.st_mtime;
+		return arcn->sb.st_mtime <= ((sb.st_mtime / res) * res);
 	}
-
-	if (ctime_flag && mtime_flag)
-		return (timespeccmp(&arcn->sb.st_mtim, &sbp->st_mtim, <=) &&
-		    timespeccmp(&arcn->sb.st_ctim, &sbp->st_ctim, <=));
-	else if (ctime_flag)
-		return (timespeccmp(&arcn->sb.st_ctim, &sbp->st_ctim, <=));
-	else
-		return (timespeccmp(&arcn->sb.st_mtim, &sbp->st_mtim, <=));
 }
 
 /*
@@ -842,14 +878,12 @@ copy(void)
 			/*
 			 * if existing file is same age or newer skip
 			 */
-			res = lstat(dirbuf, &sb);
-			*dest_pt = '\0';
-
-			if (res == 0) {
+			if (cmp_file_times(uflag, Dflag, arcn, dirbuf)) {
+				*dest_pt = '\0';
 				ftree_skipped_newer(arcn);
-				if (cmp_file_times(uflag, Dflag, arcn, &sb))
-					continue;
+				continue;
 			}
+			*dest_pt = '\0';
 		}
 
 		/*


On Thu, May 2, 2024 at 6:54 AM Walter Alejandro Iglesias wrote:
> On Thu, 2 May 2024 12:03:10, Stuart Henderson wrote
> > I don't have a suitable filesystem handy to test, but does OpenBSD's
> > implementation of ext2fs support sub-second timestamps?
> >
> > stat -f %Fm $filename
> >
> > If not, that's a probable explanation for the difference in behaviour.
> > You could probably confirm by forcing timestamps with no nanosecond
> > components, e.g. touch -t mmddhhmm.ss $filename, or copy to ext2fs
> > and back again.
>
> $ doas mount -t ext2fs /dev/sd0i /mnt
> $ touch ~/test.txt
> $ cp ~/test.txt /mnt
> $ stat -f %Fm /mnt/test.txt
> 1714657214.000000000
> $ cp ~/test.txt /mnt
> $ stat -f %Fm /mnt/test.txt
> 1714657409.000000000
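The rounding step at the heart of the patch can be looked at in isolation. The following is only a sketch under stated assumptions, not the pax code: mtime_not_newer() is a made-up name, it takes plain timespecs instead of an ARCHD, it expects the resolution in nanoseconds as pathconf(_PC_TIMESTAMP_RESOLUTION) would report it, and it deliberately ignores resolutions coarser than one second.

```c
#include <time.h>

/*
 * Hypothetical helper (not part of pax): returns nonzero when the
 * source mtime, rounded down to the target filesystem's resolution,
 * is not newer than the target mtime.
 *
 * res_ns is the target's timestamp resolution in nanoseconds and is
 * assumed to satisfy 1 <= res_ns <= 1000000000.
 */
int
mtime_not_newer(struct timespec src, struct timespec dst, long res_ns)
{
	/*
	 * Round the nanosecond part down to a multiple of res_ns.
	 * With res_ns == 1000000000 this clears it entirely, since
	 * tv_nsec is always below one second.
	 */
	if (res_ns > 1 && res_ns <= 1000000000L)
		src.tv_nsec = (src.tv_nsec / res_ns) * res_ns;

	if (src.tv_sec != dst.tv_sec)
		return src.tv_sec < dst.tv_sec;
	return src.tv_nsec <= dst.tv_nsec;
}
```

With a source mtime of 100.123456789 and a target of 100.000000000, this reports "not newer" for a one-second resolution and "newer" for a one-nanosecond resolution, which is the difference between pax recopying the file on every run and leaving it alone.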
Re: pax and ext2fs
On Tue, Apr 30, 2024 at 5:50 AM Walter Alejandro Iglesias wrote:
> I'd never used pax(1), reading the man page I found this command can be
> used to make a backup:
>
>   $ pax -r -w -v -Y -Z home /backup
>
> Faster than using rsync indeed, but it seems that the -Y and -Z options
> don't work with ext2fs?

It should work the same as on ffs, but since you put zero effort into describing _how_ its behavior didn't match your expectations, I wouldn't expect anyone to put more than zero effort in reading your mind.

Good luck!

Philip Guenther
Re: Getting "Boot error" after replacing a disk in softraid
RAID replicates the data in the RAIDed area, yes?  Do you have some reason to believe that the boot information (MBR, etc) is _inside_ the RAID area, because I do not believe that.  Really feels like installboot needs to be run on this drive to, uh, install the proper boot info.

Philip Guenther

On Tue, Apr 23, 2024 at 8:19 AM wrote:
> Also, if I boot from a USB stick, with only the new SSD attached, the
> softraid is registered as degraded (as the other old disk is missing), so
> it has been populated, and the partition is also marked with an asterisk
> for boot, but I still cannot boot from that drive.
Re: AAAA entry for openbsd.org
On Sun, Oct 22, 2023 at 6:53 PM Armin Jenewein wrote:
> Hi.
>
> On 23-10-22 15:47:45, Kastus Shchuka wrote:
> > On Sun, Oct 22, 2023 at 10:29:08PM +0200, Armin Jenewein wrote:
> > > Hi,
> > >
> > > as I'm almost 100% sure adding IPv6 connectivity to the openbsd.org
> > > host wouldn't introduce side-effects for IPv4 users: is there any reason
> > > openbsd.org still has no AAAA entry at the end of 2023?
> >
> > Why do you need it?
>
> Because it's extremely inconvenient to have to manually type in the name of
> a mirror that I know has an AAAA entry.  The installer won't even be able
> to download the mirror list because of the reason I mentioned.  It tries
> to talk to openbsd.org which obviously fails.

See, this is why being clear about What Fine Problem You're Trying To Solve is important: AFAICT the installer tries to fetch the mirror list from ftplist1.openbsd.org and not from openbsd.org.

Can you confirm that your _actual_ request is to have the installer be able to get the mirror list when on an IPv6-only host?

(Please don't rant at people who try to help, particularly when doing exactly what you requested would NOT HAVE HELPED, unless you *want* people to drop you in their kill-file as "not worth trying to help".)

Philip Guenther
Re: ImageMagick fails on OpenBSD 7.4 fresh install
Ah, sorry for my misreading what you wrote.  Please use 'sendbug' to report the sequence of pkg_add operations that didn't work.

Philip Guenther

On Sun, Oct 22, 2023 at 5:50 PM Mark wrote:
> It wasn't an upgraded system, that's a fresh install, a completely new
> OpenBSD 7.4 amd64.
>
> And the first package I wanted to install was imagick.  And it failed as in
> the screenshot image link.
>
> However, installing the "gtk-update-icon-cache" package and then running
> pkg_add imagick solved the problem.
>
> That was the suggestion of "quinq", from IRC #openbsd. ("can you try
> installing that package, and then ImageMagick?")
>
> On Mon, 23 Oct 2023 at 02:54, Philip Guenther wrote:
>
> > Don't know what's wrong with the pkg database (/var/db/pkg/) on your
> > system, but on mine the shared-mime-info-2.2 package includes a definition
> > for the update-mime-info tag, so if yours lacks that then something in
> > there got hosed during your upgrade.  Could be data loss from disk failure,
> > could be something pruned critical info from /var/db/pkg/, could be
> > something I can't think of.
> >
> > So, I would suggest starting with verifying your confidence in your
> > storage (no kernel log error messages about I/O errors?  If this machine
> > has suffered any file system issues then maybe backup, verify-backup, newfs
> > and restore?)
> >
> > Then I would probably reinstall *all* packages, but since I don't (fully)
> > trust the pkg database, I would probably do it with the
> > 1) pkg_info -mz > manual
> > 2) cd /var/db/pkg && pkg_delete *
> > 3) make sure nothing unexpected has been left behind in /var/db/pkg/ or
> >    /usr/local/*
> > 4) pkg_add -l manual
> >
> > Or maybe now's a good time to do a fresh install.
> >
> > Philip Guenther
> >
> > On Sun, Oct 22, 2023 at 3:34 PM Mark wrote:
> >
> >> Tried changing the installurl to another mirror, but it didn't help.
> >>
> >> Here's what actually happens;
> >>
> >> https://i.ibb.co/G0wbGf5/terminal-sshot.png
> >>
> >> Regards.
> >>
> >> On Mon, 23 Oct 2023 at 01:16, Mark wrote:
> >>
> >> > pkg_add ImageMagick-6.9.12.88p0 gives me;
> >> >
> >> > (after fetching few libraries)
> >> >
> >> > "Can't install ImageMagick-6.9.12.88p0: can't resolve
> >> > djvulibre-3.5.28p1,libheif-1.16.2p0"
> >> >
> >> > and then;
> >> > "Couldn't install ImageMagick-6.9.12.88p0 djvulibre-3.5.28p1
> >> > libheif-1.16.2p0."
> >> >
> >> > This is a fresh OpenBSD 7.4 amd64 release.  My installurl is pointed to
> >> > cdn.openbsd.org/pub/OpenBSD.
> >> >
> >> > Any other php packages were installed fine.  But both
> >> > pecl80-imagick-3.7.0p1 and ImageMagick fail.
> >> >
> >> > Some idea would be much appreciated!
> >> >
> >> > Regards.
Re: X session doesn't survive zzz
I would start by removing X from the picture and verify that suspend and resume are working (or not) when X is not running.  Are USB devices failing to reattach or coming back in some weird mode which isn't working?  Can you ssh in?

If that's working fine, then bring X back into the picture but capture /var/log/Xorg.0.log both before suspending and then after resuming (ssh in if necessary) and see what X is falling over on.

Philip Guenther

On Wed, Oct 18, 2023 at 4:17 AM Jan Stary wrote:
> On Oct 18 11:11:54, h...@stare.cz wrote:
> > This is current/amd64 on a PC (dmesg below).
> > After a resume from zzz inside a running X session,
> > I am greeted with the xenodm login screen
> > into which I cannot log in: the keyboard does nothing
> > (is it the USB keyboard not reattaching properly?).
> >
> > Logging in on the console,
>
> To be clear: typing the username and passwd
> into the xenodm login screen does nothing,
> but on the console the kbd works as expected.
>
> > I see that the X session
> > and the X applications (firefox, xterms) are dead.
> > On the other hand, the mplayer that has been zzz'ed
> > inside a tmux session starts playing again.
> >
> > After restarting xenodm with rcctl restart xenodm,
> > I can log in and everything seems to work again.
> >
> > See the dmesg below, including the zzz and resume,
> > and the full X log up to here.  How can I debug this?
> >
> > Jan
> >
> >
> > OpenBSD 7.4-current (GENERIC.MP) #1406: Sun Oct 15 10:34:05 MDT 2023
> >     dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> > real mem = 8285454336 (7901MB)
> > avail mem = 8014598144 (7643MB)
> > random: good seed from bootblocks
> > mpath0 at root
> > scsibus0 at mpath0: 256 targets
> > mainbus0 at root
> > bios0 at mainbus0: SMBIOS rev. 2.4 @ 0xf0100 (36 entries)
> > bios0: vendor Award Software International, Inc. version "F3" date 03/31/2011
> > bios0: Gigabyte Technology Co., Ltd. H67MA-USB3-B3
> > acpi0 at bios0: ACPI 1.0
> > acpi0: sleep states S0 S3 S4 S5
> > acpi0: tables DSDT FACP HPET MCFG ASPT SSPT EUDS MATS TAMG APIC SSDT
> > acpi0: wakeup devices PCI0(S5) PEX0(S5) PEX1(S5) PEX2(S5) PEX3(S5) PEX4(S5) PEX5(S5) PEX6(S5) PEX7(S5) HUB0(S5) UAR1(S3) USBE(S3) USE2(S3) AZAL(S5)
> > acpitimer0 at acpi0: 3579545 Hz, 24 bits
> > acpihpet0 at acpi0: 14318179 Hz
> > acpimcfg0 at acpi0
> > acpimcfg0: addr 0xf400, bus 0-63
> > acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
> > cpu0 at mainbus0: apid 0 (boot processor)
> > cpu0: Intel(R) Core(TM) i7-2600 CPU @ 3.40GHz, 3492.09 MHz, 06-2a-07, patch 002f
> > cpu0: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,POPCNT,DEADLINE,AES,XSAVE,AVX,NXE,RDTSCP,LONG,LAHF,PERF,ITSC,MD_CLEAR,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,MELTDOWN
> > cpu0: 32KB 64b/line 8-way D-cache, 32KB 64b/line 8-way I-cache, 256KB 64b/line 8-way L2 cache, 8MB 64b/line 16-way L3 cache
> > cpu0: smt 0, core 0, package 0
> > mtrr: Pentium Pro MTRR support, 10 var ranges, 88 fixed ranges
> > cpu0: apic clock running at 99MHz
> > cpu0: mwait min=64, max=64, C-substates=0.2.1.1, IBE
> > cpu1 at mainbus0: apid 2 (application processor)
> > cpu1: Intel(R) Core(TM) i7-2600 CPU @ 3.40GHz, 3492.12 MHz, 06-2a-07, patch 002f
> > cpu1: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,POPCNT,DEADLINE,AES,XSAVE,AVX,NXE,RDTSCP,LONG,LAHF,PERF,ITSC,MD_CLEAR,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,MELTDOWN
> > cpu1: 32KB 64b/line 8-way D-cache, 32KB 64b/line 8-way I-cache, 256KB 64b/line 8-way L2 cache, 8MB 64b/line 16-way L3 cache
> > cpu1: smt 0, core 1, package 0
> > cpu2 at mainbus0: apid 4 (application processor)
> > cpu2: Intel(R) Core(TM) i7-2600 CPU @ 3.40GHz, 3492.19 MHz, 06-2a-07, patch 002f
> > cpu2: FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,POPCNT,DEADLINE,AES,XSAVE,AVX,NXE,RDTSCP,LONG,LAHF,PERF,ITSC,MD_CLEAR,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,MELTDOWN
> > cpu2: 32KB 64b/line 8-way D-cache, 32KB 64b/line 8-way I-cache, 256KB 64b/line 8-way L2 cache, 8MB 64b/line 16-way L3 cache
> > cpu2: smt 0, cor
Re: ImageMagick fails on OpenBSD 7.4 fresh install
Don't know what's wrong with the pkg database (/var/db/pkg/) on your system, but on mine the shared-mime-info-2.2 package includes a definition for the update-mime-info tag, so if yours lacks that then something in there got hosed during your upgrade.  Could be data loss from disk failure, could be something pruned critical info from /var/db/pkg/, could be something I can't think of.

So, I would suggest starting with verifying your confidence in your storage (no kernel log error messages about I/O errors?  If this machine has suffered any file system issues then maybe backup, verify-backup, newfs and restore?)

Then I would probably reinstall *all* packages, but since I don't (fully) trust the pkg database, I would probably do it with the following steps:
1) pkg_info -mz > manual
2) cd /var/db/pkg && pkg_delete *
3) make sure nothing unexpected has been left behind in /var/db/pkg/ or /usr/local/*
4) pkg_add -l manual

Or maybe now's a good time to do a fresh install.

Philip Guenther

On Sun, Oct 22, 2023 at 3:34 PM Mark wrote:
> Tried changing the installurl to another mirror, but it didn't help.
>
> Here's what actually happens;
>
> https://i.ibb.co/G0wbGf5/terminal-sshot.png
>
> Regards.
>
> On Mon, 23 Oct 2023 at 01:16, Mark wrote:
>
> > pkg_add ImageMagick-6.9.12.88p0 gives me;
> >
> > (after fetching few libraries)
> >
> > "Can't install ImageMagick-6.9.12.88p0: can't resolve
> > djvulibre-3.5.28p1,libheif-1.16.2p0"
> >
> > and then;
> > "Couldn't install ImageMagick-6.9.12.88p0 djvulibre-3.5.28p1
> > libheif-1.16.2p0."
> >
> > This is a fresh OpenBSD 7.4 amd64 release.  My installurl is pointed to
> > cdn.openbsd.org/pub/OpenBSD.
> >
> > Any other php packages were installed fine.  But both
> > pecl80-imagick-3.7.0p1 and ImageMagick fail.
> >
> > Some idea would be much appreciated!
> >
> > Regards.
Re: Delay in starting xterm via ssh after upgrade from 7.3 to 7.4
If this had been observed _during_ 7.4 development then it would have been simpler to isolate what set of changes caused it.  Since that didn't happen you'll have to debug this yourself on the affected systems.

For starters, I would suggest turning up ssh logging with the -v option and capturing that to a file and comparing the output on working and not working systems.  Or ktrace the stuttering processes and see what kdump -T output shows as the operations where the delays occurred.

As for your "should I have never been doing these this way?" question, that's unanswerable without knowing _why_ you had written them that way.  Using -Y instead of -X to disable XSecurity enforcement?  Why tunnel X instead of having the remote client connect directly to the X server?  You wrote those to solve some problem; changing that means going back and reopening that question, which is probably a distraction from the "why did the latency change" question.

On Sun, Oct 22, 2023 at 7:22 AM Roger Marsh wrote:
> On Thu, 19 Oct 2023 17:23:47 +0000
> Roger Marsh wrote:
>
> > Hi,
> >
> > After upgrade from 7.3 to 7.4 (on both boxes) the xterm session for this
> > entry in .fvwmrc (on monitor):
> >
> > 'Exec exec ssh -Y opendev xterm -title roger@opendev'
> >
> > takes several seconds to deliver the xterm window, while I did not
> > notice any delay before upgrade.
> >
> > For other usernames on opendev the .fvwmrc entry is like (without the
> > '-X' for most usernames other than grading):
> >
> > 'Exec exec xterm -title grading@opendev -e ssh -X grading@opendev'
> >
> > and I do not notice any delay after upgrade compared with before upgrade.
> >
> > Expressing the 'roger@opendev' entry as:
> >
> > 'Exec exec xterm -title roger@opendev -e ssh -Y roger@opendev'
> >
> > fixes the delay problem, but was the delay a predictable consequence of
> > some change?  Or perhaps the entry should never have been expressed in the
> > way that led to the delay?
> >
> > Below are dmesg and pkg_info for both boxes involved.
> >
> > Roger
>
> ...
> dmesg and pkg_info for monitor and opendev snipped.
> ...
>
> Hi,
>
> Later I saw opening files with Python's Idle editor suffers the same
> pattern of slow response, in terms of serving up the file edit window, as
> seen with xterm.  Scrolling through an editor window is slower too, and
> stutters, compared with what was seen when both boxes were at 7.3 (PgUp and
> PgDn buttons are what I used).
>
> One box (gash) had not been upgraded to 7.4 (because I thought it did not
> have OpenBSD disks).  It was modified, in particular adding Python Idle and
> Chromium, to see what happens when 7.3 has the Xserver role and 7.4 the
> Xclient role; and the other way round.
>
> Idle
> Xserver  Xclient  Display file window  Scrolling
> 7.4      7.3      slow                 stutter
> 7.3      7.4      quick                smooth
> 7.4      7.4      slow                 stutter
> 7.3      7.3      quick                smooth (from memory: confirmed on reverting)
> Same 7.4 box      quick                smooth
>
> Idle is started by 'Exec exec ssh -Y idle3.10' in .fvwmrc file.
> Chromium is started by 'Exec exec ssh -X @ chrome' in
> .fvwmrc file.
>
> This behaviour with Python persuades me to revert the OpenBSD 7.4 box
> (monitor) in the Xserver role to 7.3 until 7.4 or later provides more
> acceptable response times.
>
> Chromium seemed unaffected except for slow response when typing in the URL
> bar on the separate 7.4 Xserver box.  I thought I could mostly avoid this
> by starting to use bookmarks, but the effect on Python matters more.
>
> Apologies for going off-topic by discussing Python and Chromium rather
> than xterm: but the Python stuff changes my attitude to the problem from
> minor annoyance to something which needs an immediate workaround.
>
> Below are dmesg (most recent reboot only) and pkg_info for the OpenBSD 7.3
> box (gash).
>
> Roger
>
> Script started on Sat Oct 21 17:09:38 2023
> gash$ dmesg
> syncing disks... done
> rebooting...
> OpenBSD 7.3 (GENERIC.MP) #1125: Sat Mar 25 10:36:29 MDT 2023
>     dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> real mem = 3967422464 (3783MB)
> avail mem = 3827781632 (3650MB)
> random: good seed from bootblocks
> mpath0 at root
> scsibus0 at mpath0: 256 targets
> mainbus0 at root
> bios0 at mainbus0: SMBIOS rev. 2.5 @ 0xe9f80 (85 entries)
> bios0: vendor Hewlett-Packard version "786G1 v01.16" date 03/05/2009
> bios0: Hewlett-Packard HP Compaq dc7900 Small Form Factor
> acpi0 at bios0: ACPI 1.0
> acpi0: sleep states S0 S3 S4 S5
> acpi0: tables DSDT FACP APIC ASF! MCFG TCPA SLIC HPET DMAR
> acpi0: wakeup devices PCI0(S4) PEG1(S4) PEG2(S4) IGBE(S4) PCX1(S4) PCX2(S4) PCX5(S4) PCX6(S4) HUB_(S4) USB1(S3) USB2(S3) USB3(S3) USB4(S3) USB5(S3) USB6(S3) EUS1(S3) [...]
> acpitimer0 at acpi0: 3579545 Hz, 24 bits
> acpimadt0 at acpi0 addr 0xfee0:
Re: Crash on TOSHIBA PORTEGE Z30-A laptop
On Sat, Oct 21, 2023 at 2:27 AM wrote:
> Hi Philip,
>
> Thank you very much for your answer.
>
> I tried to disable all options (+devices) possible.  Same issue.
> And what about disabling acpi in the kernel using the bsd.re-config?

As Mike and Theo noted, this will certainly cause problems.

> Do you think if I replace the wireless card with something else, it could
> resolve this issue?

Very unlikely.  The problem is the stack depth of the ACPI processing.  The crash you saw had the wifi interrupt occur during the ACPI processing but it could just as well happen with some other device interrupting the ACPI processing.

If there isn't a newer BIOS that resolves this, I would tend to return the box as not suitable.

Philip Guenther
Re: Crash on TOSHIBA PORTEGE Z30-A laptop
On Fri, Oct 20, 2023 at 1:23 PM wrote:
> I've recently installed OpenBSD 7.4 on this laptop.
>
> However, I'm experiencing random crashes.  These occur at various times,
> including during kernel loading (before running /etc/rc),
> or later while I'm using the system.
>
> I've included the contents of /var/run/dmesg.boot below and attached the
> screens with the ddb output command.
...
> bios0: vendor TOSHIBA version "Version 4.30" date 04/26/2018

The screenshots show that the fault happens during a wifi interrupt that catches the ACPI thread processing a very deeply nested AML code.  I suspect it's actually running out of kernel stack space as a result.  Everything below is based on that hypothesis.

So, the first thing to try is to see if there's a BIOS update newer than the 2018 rev it currently has.  They may have optimized the AML code, or at least made it less deeply nested.

Another possibility is to see if there's a device you can disable that would result in that AML not being called.  If there's anything that you aren't using then disable it in the BIOS and hope.

The last possibility would be to build a kernel which allocates more pages per thread for its kernel stack by bumping the UPAGES #define in /usr/src/sys/arch/amd64/include/param.h and building a new kernel.  It's really only the ACPI thread that needs this, but we don't currently have code to control that on a per-thread basis.

Philip Guenther
Re: reorder_kernel: failed
On Tue, Oct 17, 2023 at 10:34 AM Karel Lucas wrote:
> Content of relink.log:
>
> (SHA256) /bsd: OK
> LD="ld" sh makegap.sh 0xcccccccc gapdummy.o
> ld -T ld.script -X --warn-common -nopie -o newbsd ${SYSTEM_HEAD} vers.o ${OBJS}
> text      data    bss      dec       hex
> 21325291  403432  1241088  22969811  15e7dd3
> mv newbsd newbsd.gdb
> ctfstrip -S -o newbsd newbsd.gdb
> rm -f bsd.gdb
> mv -f newbsd bsd
> install -F -m 700 bsd /bsd && sha256 -h /var/db/kernel.SHA256 /bsd
> install: rename: INS@4erJJ3bo3 to /bsd: Operation not permitted
> *** Error 1 in /usr/share/relink/kernel/GENERIC.MP (Makefile:2267 'newinstall')

So renaming over /bsd failed with EPERM.  That smells like /bsd is marked immutable via chflags.  To verify, what's the output of

    ls -ldo / /bsd

?

If it *is* marked immutable, then uh, you'll need to undo that and figure out how the heck that happened and make sure it doesn't happen again.

(If _you_ marked it immutable, then don't, or at least don't waste people's time when that breaks things.)

Philip Guenther
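The same check that 'ls -ldo' does by eye can be sketched in C. This is only an illustration under stated assumptions: is_immutable() is a made-up name, and it relies on the BSD-specific st_flags member of struct stat together with the SF_IMMUTABLE and UF_IMMUTABLE bits from <sys/stat.h>. On systems that lack those (the #if guard below), it can only report that lstat() worked.

```c
#include <sys/stat.h>

/*
 * Hypothetical helper: returns 1 if path carries the system or user
 * immutable flag, 0 if it does not, and -1 if lstat() fails.
 *
 * st_flags and the *_IMMUTABLE bits are BSD extensions; where they
 * are not defined this always returns 0 for an existing path.
 */
int
is_immutable(const char *path)
{
	struct stat sb;

	if (lstat(path, &sb) != 0)
		return -1;
#if defined(SF_IMMUTABLE) && defined(UF_IMMUTABLE)
	return (sb.st_flags & (SF_IMMUTABLE | UF_IMMUTABLE)) != 0;
#else
	return 0;
#endif
}
```

Running it against both / and /bsd would mirror the ls invocation above, since a rename over /bsd can be refused because of flags on either the file or its directory.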
Re: debugging "invalid argument" errors when loading elf files
On Tue, Oct 10, 2023 at 11:44 PM Lorenz (xha) wrote: > On Mon, Oct 09, 2023 at 01:29:52PM -0700, Philip Guenther wrote: > > On Mon, Oct 9, 2023 at 11:21 AM Lorenz (xha) wrote: > > > > > hi misc@, > > > > > > i'm currently porting the hare programming language to openbsd and i am > > > having quite a few problems trying to use a linker script. i am always > > > getting a "/bin/ksh: .bin/hare: Invalid argument" error. > > > > > > so far i tried a lot of stuff like comparing a working version without > a > > > linker script, looking if any of the programm headers are missing, etc. > ... > > Read /usr/src/sys/kern/*exec* and review the logic around the 10 > > occurrences of EINVAL in that code. Presumably the differences you > > identified will point to one or more of them > > found it: PT_PHDRS is missing. i didn't identify that difference at > first tho. it's needeed for PIE if i understand correctly. > Yeah. > why is ld not adding a PT_PHDR programm header? as far as i undestand, > PT_PHDR are the programm headers themselfs? PT_PHDR is the tag for an entry in the program headers that points to the program headers themselves. Some ELF files (for example, core files) have a program header but don't include a PT_PHDR entry in it. It's presumably not added by ld because you supplied a linker script and ld is trying to give you as much control as possible. Some of the other arguments you supplied may have required it to fill in other details, but including a DT_PHDR entry in the program header is apparently not one of them. As Theo says, this sort of thing makes linker scripts very subtle, with arch dependencies**, interactions with RELRO and W^X processing, and plain ABI weirdness. Even those of us who have written several of them often have to start from what 'readelf -lS' shows the default is to put together a starting point and massage it from there to achieve whatever our goal for going through this effort is. 
** e.g., permissions on and immutability of .plt, .got, etc sections vary.
Some archs have required some sections to be before or after others due to
the CPU treating a limited range offset in some instructions as unsigned,
so 'before' cannot be reached

> this is my linker script
...
> i am moving the init functions in a
> different section so that the hare runtime can execute them and not
> libc.

Well, the first issue with this is that libc doesn't invoke init functions,
so there's some sort of misunderstanding going on.

The .init section of the executable itself is invoked by the entry point
code in crt0 before main is called, not libc. At a low level, that's done
not via DT_INIT's value or the section name but via a symbol placed in the
section which would presumably be carried along if you renamed the section,
so you can't affect that.

The .init_array section is handled by ld.so in dynamic programs where it's
located via DT_INIT_ARRAY, and by crt0 in static programs where it's located
via *_start and *_end symbols which I _guess_ would be from whatever ended
up with the .init_array section name, so maybe renaming the section would
prevent them from being invoked in that case, but I'm not sure and that's an
implementation detail that might change.

(Moving a chunk of the crt0 code into libc would be possible (glibc did it,
for example) and might make some evolution easier, but it hasn't happened.)

...but, backing up...what problem exists with the ordering that currently
happens that makes you believe you need to interpose and have the "hare
runtime" (sorry, I'm not familiar with that) execute them? How will you
measure success of the change?

Philip Guenther
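For readers following the PT_PHDR part of this thread, here is a minimal sketch in C of the check being discussed: does a program header table contain a PT_PHDR entry at all? The function name has_pt_phdr() is made up for illustration, and the sketch assumes a 64-bit ELF file whose program headers have already been read into memory; it is not the kernel's exec code.

```c
#include <assert.h>
#include <elf.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Return true if any of the `count` program headers is a PT_PHDR entry.
 * Assumption: the caller has already read the table (e_phnum entries
 * starting at file offset e_phoff) into `phdrs`.
 * has_pt_phdr() is a hypothetical helper, not part of any library.
 */
bool
has_pt_phdr(const Elf64_Phdr *phdrs, size_t count)
{
	assert(phdrs != NULL || count == 0);

	for (size_t i = 0; i < count; i++) {
		if (phdrs[i].p_type == PT_PHDR)
			return true;
	}
	return false;
}
```

Running 'readelf -l' on the working and non-working binaries shows the same information without writing any code; the sketch only spells out what to look for.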
Re: debugging "invalid argument" errors when loading elf files
On Mon, Oct 9, 2023 at 11:21 AM Lorenz (xha) wrote:
> hi misc@,
>
> i'm currently porting the hare programming language to openbsd and i am
> having quite a few problems trying to use a linker script. i am always
> getting a "/bin/ksh: .bin/hare: Invalid argument" error.
>
> so far i tried a lot of stuff like comparing a working version without a
> linker script, looking if any of the programm headers are missing, etc.

So you have a working binary (w/o linker script) and a not-working binary
(w/linker script) and you've even done the comparison of the program headers
of the two...and you're not going to show those but rather ask what, in
general, could be wrong?

Okay. Lacking the specifics of those differences (which you've already
identified), the general advice is this:

Read /usr/src/sys/kern/*exec* and review the logic around the 10
occurrences of EINVAL in that code. Presumably the differences you
identified will point to one or more of them

Philip Guenther
Re: Speed: dump/restore vs rsync
Whelp, that's bizarre: AFAICT that file type has never been used by any of
the mainline BSDs. I guess you could have had a bitflip from a bug (or
drive issue, or cosmic ray, etc) and it should really be either 012
(symlink) or 014 (socket).

You could try tracking it down (update dump to print the inum too?) and then
use find(1) to see what path(s) it has to get a hint about what it should
have been, and then maybe use fsdb to change its type to what seems the
correct one, though I would *test* my backup before doing that and be
prepared to spend the time to newfs+restore the filesystem in case things go
wrong.

Philip

On Fri, Sep 22, 2023 at 12:44 PM vitmau...@gmail.com wrote:
> Dear Philip,
>
> thank you for pulling my ears. The complete error message is:
>
> DUMP: Warning: undefined file type 013
>
> Best,
> Vitor
>
> On Fri, Sep 22, 2023 at 16:17, Philip Guenther wrote:
>
>> On Fri, Sep 22, 2023 at 11:18 AM vitmau...@gmail.com wrote:
>>
>>> I'm also getting an "undefined file type" error from dump. I found one
>>> guy from the FreeBSD mail list that got the same error, but he solved
>>> his problem using fsck on the partition. I forced fsck on my side,
>>> since the filesystem was marked as clean, but to no avail.
>>
>> In OpenBSD's dump, that warning includes the actual file type which
>> provoked the error:
>>
>> : bleys; pwd
>> /usr/src/sbin/dump
>> : bleys; grep 'undefined file type' *.c
>> traverse.c: msg("Warning: undefined file type 0%o\n",
>> : bleys;
>>
>> Since yours didn't include that, you would appear to be running some
>> other version of dump, which would make this the wrong place to get help.
>>
>> ...unless you truncated an error message before asking about it, which is
>> kinda self-defeating.
>>
>> Philip Guenther
Re: rmt, rcmd, /etc/hosts.equiv and .rhosts
On Sat, Sep 9, 2023 at 5:34 PM Daniele B. wrote:
> Just investigating about /etc/hosts.equiv and ~/.rhosts and I was
> quite serious to think that my system doesn't need both of them

I have not used a system in 30 years where /etc/hosts.equiv existed. I
deleted my .rhosts when ssh was present on all the systems I had to deal
with.

> I then start to look carefully my /etc and discovered a link
> that read like this:
>
> 0 lrwxrwx--- 1 root wheel 13 Mar 25 17:14 /etc/rmt -> /usr/sbin/rmt

So, 45 years ago, /sbin didn't exist and when "systems" programs were added
they were put in /etc. So, when someone decided it would be neat if dump
could send its output over an rcmd(3) connection to another machine, the rmt
program was put in /etc and dump was told to invoke "/etc/rmt" via rcmd().
Indeed, dump *still does that* in OpenBSD.

> man rmt:
> ...
> man rcmd:
> ...
> SUPERBUG (by myself):
>
> One can be "tempted" to think to a ruserok() function that hacked can
> return always OK (0) and otherwise one can always revert to rcmdsh()
> with the help of a "good" rshprog.

I'm sorry, but I don't understand what you're trying to say here.
ruserok() is in libc and linked into rmt, so in this case "a ruserok()
function that hacked can return always OK" would mean "if you can alter a
root owned binary on the target system", which is...boring?

> I'm here to ask enlightment about the opportunity to define
> /etc/hosts.equiv and ~/.rhosts but mainly

Short answer: don't.

Longer answer: "what problem are you trying to solve?"

I suppose OpenSSH still has some hosts.equiv and .rhosts bits, but I trust
that Theo periodically attempts to kill them completely and only the "we
will sell no wine / until its time" hands of Damien and Darren will measure
when they can finally be removed (if I haven't lost track and they're
already gone).

If there is still support for those in base libc, well, your asking about
them may result in them being removed.

> if it is still the case (and
> why) to have this rmt link in etc.

See explanation above.

> Last if not first, what is the best
> practice to defend myself form BUG and SUPERBUG listed above.

"Don't let untrusted people alter libc on your system"

You *do* understand that you are trusting the OpenBSD developers by using
OpenBSD, just as you are trusting the FreeBSD developers if you use FreeBSD,
and trusting the Linux kernel and glibc and GNU-whatever utils, and systemd,
and distro developers if you use a Linux distribution, yes? If you don't
trust a community or find its values don't match yours, then find a
different community that matches it better, or build your own.

Philip Guenther
OpenBSD developer
Re: struct kinfo_proc: p_schedflags and PSCHED_*
On Mon, Sep 11, 2023 at 12:01 PM Benjamin Stürz wrote:
> I'm writing a little toy /proc fuse-fs for OpenBSD.
>
> The field p_schedflags defined in struct kinfo_proc
> in file /usr/include/sys/sysctl.h refers to PSCHED_*,
> but I can't find any references to these macros with:
> $ grep -rn PSCHED_ /usr/include /usr/src/sys
>
> Nor can I find any references to p_schedflags:
> $ grep -rn p_schedflags /usr/src
>
> This leads me to believe that this field is unused
> and should also be marked accordingly,
> to avoid future confusion.
>
> If not, please let me know how I interpret use this field.

Yeah, that field hasn't actually been set since 2007, when the scheduler
state was moved into per-CPU struct schedstate_percpu. I guess it should be
deleted the next time we bump the kinfo_proc ABI, along with a review of
whether there are other dead items.

I don't think we're, at least at this time, interested in exposing or making
any promises about the inner workings of the scheduler, as those flags did.

Philip
Re: signify: invalid comment in SHA256
On Sat, Jun 17, 2023 at 9:38 PM Avon Robertson wrote:
> Used lynx to get the 4 files shown below. The shell prompt is a 2
> line prompt with current dir on 1st line,'$ ' only, on the 2nd line.
>
> Below is output captured from a tmux pane with a script.
>
> aahno:/
> $ cat /etc/installurl
> https://mirror.aarnet.edu.au/pub/OpenBSD
> aahno:/
> $ lynx $(cat /etc/installurl)/snapshots/amd64
> aahno:/
> $ cd ~/download
> aahno:/home/anon/download
> $ ls -la
> total 1361400
> drwxr-x--- 2 anon anon 512 Jun 18 15:35 .
> drwxr-xr-x 25 anon anon 1536 Jun 18 08:01 ..
> -rw-r- 1 anon anon 44817 Jun 18 15:34 INSTALL.amd64
> -rw-r- 1 anon anon 1992 Jun 18 15:34 SHA256
> -rw-r- 1 anon anon 2144 Jun 18 15:34 SHA256.sig
> -rw-r- 1 anon anon 696745984 Jun 18 15:36 install73.img
> aahno:/home/anon/download
> $ signify -C -p /etc/signify/openbsd-73-base.pub -x SHA256 install73.img
> signify: invalid comment in SHA256; must start with 'untrusted comment: '
> aahno:/home/anon/download

You downloaded SHA256.sig, but then told signify to read the SHA256 file.
Perhaps you should follow all the examples for signify and pass it the
SHA256.sig file.

Philip Guenther
Re: chmod change means dump(8) the file
On Wed, Jan 25, 2023 at 4:35 PM Jan Stary wrote:
> On Jan 26 00:18:45, h...@stare.cz wrote:
> > I have a large /media disk that I backup nightly using dump(8):
> > full level 0 on the Sun/Mon night, incrementals through the week.
> > The level 0 dump is huge, the incrementals are usualy trivial
> > unless I add something to /media.
> >
> > Yesterday I chmod'd a lot of the files, without making any other change.
> > That resulted in a huge level 2 dump; I suppose a chmod change counts
> > as a changed file, so they all got dumped anew, even though the content
> > of the file(s) has not changed.
> >
> > Is that intentional? It seems there is a lot of space to be saved
> > if it's "only" the metadata that have changed. Is that decided by
> > simply looking at the stat(2)? In particular, newer ctime is
> > just as good a reason to dump the _content_ as newer mtime?
>
> Seems so:
>
> /* Determine if given inode should be dumped */
> [...]
> if (CHECKNODUMP(dp) &&
>     (DIP(dp, di_mtime) >= spcl.c_ddate ||
>     DIP(dp, di_ctime) >= spcl.c_ddate)) {

Right: if the ctime is newer than the previous backup then you don't know
what else has changed: the contents could have been modified and then the
file's mtime backdated to before the previous backup. At that point the
ctime is the only indicator that the file no longer matches its backup.

(Of course, the second problem that follows from that limitation is that the
'dump' format doesn't have a way to record inode-only info (like mode,
times, and flags) without also recording the file contents. So, even if
the filesystem provided enough info for dump to know that only the file's
mode had been changed, there's nothing it can do about it other than back up
the entire file.)

Philip Guenther
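The reasoning about ctime in this message can be illustrated with a small C sketch. It is a simplified stand-in for the condition quoted from dump, not dump's actual code: struct inode_times and changed_since() are hypothetical names, and the nodump-flag check is left out.

```c
#include <stdbool.h>
#include <time.h>

/* Hypothetical holder for the two timestamps the decision depends on. */
struct inode_times {
	time_t mtime;	/* last modification of the file contents */
	time_t ctime;	/* last change of the inode (chmod, chown, utimes, ...) */
};

/*
 * Return true if the file must be included in an incremental dump taken
 * after `last_dump`.  Either timestamp being new enough is sufficient:
 * mtime can be set backwards by the user, but doing so bumps ctime.
 */
bool
changed_since(const struct inode_times *it, time_t last_dump)
{
	return it->mtime >= last_dump || it->ctime >= last_dump;
}
```

A file whose contents were edited after the last dump and whose mtime was then backdated has an old mtime but a new ctime, so it is still selected; a plain chmod looks exactly the same from this point of view, which is why the whole file gets dumped again.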
Re: Weirdness with du/df/my brain (latter more likely)
On Sun, Jan 22, 2023 at 2:08 PM Steve Fairhead wrote:
> I was cloning a server with rsync in preparation for a major upgrade
> (elderly OpenBSD to 7.2). I noticed that the home partition usage was a
> good deal greater on the new machine than the old (as seen by df).

Good thing "cloning with rsync" has a specific meani...

$ rsync --help | grep -c ^-
144
$

oh, hmm.

You'll need to be specific about what rsync options you used, and perhaps
eyeball what the manpage says about them. For example, the description of
the -a option has a specific warning which seems a plausible explanation of
the expansion.

Philip Guenther
Re: Relinking to create unique kernel... failed!
On Fri, Jan 13, 2023 at 10:59 AM Nick Templeton wrote:
> Ever since upgrading my machine to 7.2 I've been unable to relink my
> kernel, anybody have any idea why?
...
> Running "/usr/libexec//reorder_kernel" manually resulted in a kernel panic:
>
> mode = 0100600, inum = 7, fs = /tmp
> panic: ffs_valloc: dup alloc
> Stopped at db_enter+0x10: popq %rbp

You have at least one filesystem with latent corruption. You should reboot
in single-user mode and run fsck with the -f option on each partition.

Philip Guenther
Re: rdist remove option and default behaviour
On Mon, Dec 12, 2022 at 9:02 PM All wrote:
> I wanted to clarify.
>
> In manpage for rdist I see that we can use option -o remove .
> remove Remove extraneous files. If a directory is being
>        updated, any files that exist on the remote host that do
>        not exist in the master directory are removed. This is
>        useful for maintaining truly identical copies of
>        directories.
> However, this seems to be the default anyway.
>
> If I specify "install /tmp/" and try to copy /tmp/test.file all the files
> in /tmp/ on the remote host will be wiped out and only test.file will
> remain there after copy.
> This behaviour seems to fit with "directory update" feature of "remove"
> (like if we do "install -o remove /tmp/"). Yet, "remove" was not
> specified above.
>
> Is my understanding of default behaviour correct? This how it supposed to
> be working?

When reporting an issue, please include precise information about both
* what your desired end result / goal was, and
* what you tried, including how you invoked the command and/or the config
  used.

If you leave out the former, then we'll be guessing as to why the result
wasn't what you wanted. If you leave out the latter, then we'll be guessing
as to what you did that didn't work as desired.

...or be prepared to be accepting of people guessing wrong.

ALSO: rdist has been largely superseded by rsync, which has a much more
efficient underlying protocol and, in my experience, a more regular set of
behaviors. Before committing to rdist and its (limited by history)
behavior, you should consider using rsync instead.

It seems you
* wanted to copy /tmp/test.file to /tmp/test.file on one or more other
  hosts?
* you tried a distfile like this:
    whatever:
    ( /tmp/test.file ) -> HOSTNAME
            install /tmp/;
  ?

You're correct that the latter does not achieve the former.

To achieve the former, you would need to either
* leave off the opt_dest_name from the 'install' directive, so that rdist
  would know to install the source to the exact same path on the target host
* specify the full target path in the 'install' directive, ala "install
  /tmp/test.file"
* have multiple source files, so that it treated opt_dest_name as a target
  directory and not a target path (like cp)

So, what happened with what you _did_ try? Well, it was taken as a request
to install the contents of the file "/tmp/test.file" as a file "/tmp/"!
rdist is smart enough to know that it can't remove a directory without first
removing its contents, so it tried that and presumably failed. If it
_could_ remove the contents it would then remove the directory...and then
fail when it tried to create a tempfile with prefix "/tmp/".

Could rdist's behavior be improved? In some ways, yes, but lots of sharp
corners (e.g., single vs multiple source handling) would remain. Frankly,
if rsync serves your purposes, you should use it instead.

Philip Guenther
Re: port builds with inline source
Take a look at the Makefile for the sysutils/cpuid port, which has just one
C file included in the ports source tree itself.

Philip Guenther

On Wed, Jun 29, 2022 at 3:53 PM Lyndon Nerenberg (VE7TFX/VE6BBM)
<lyn...@orthanc.ca> wrote:
> We have a number of in-house utilities that we push out as packages.
> Right now these are built using the standard make framework, with
> a bunch of hand-crafted glue to build and sign the packages before
> pushing them to our internal distribution server.
>
> I would really like to take advantage of to automate
> as much of the packing process as I can. The problem is that port
> builds assume you're obtaining the program source from external
> distribution files, whereas I want to build right out of the port
> directory itself, i.e. have the program source live under
> /usr/ports/foo/bar/src/.
>
> Has anyone come up with an idiomatic solution to this that doesn't
> involve surgery on /usr/share/mk/*port*?
>
> --lyndon
Re: rpcbind security
On Fri, Jun 17, 2022 at 8:42 PM Gustavo Rios wrote:
> Excuse me, but how does rpcbind know that a incoming request, for
> set/unset, comes from the root user ?

Theo has already told you how the *portmap* program decides that: by looking
at the host and port the request is coming from.

(There is no rpcbind program in OpenBSD and that word doesn't appear in the
manuals. If you see an rpcbind process then you're not on OpenBSD and need
to check with a different mailing list.)

Philip Guenther
Re: C states lost on amd64
On Fri, 27 May 2022, Jan Stary wrote:
> ... and with the latest snapshot, they are back.
...
> acpicpu0 at acpi0: C3(350@96 mwait.1@0x20), C2(500@64 mwait.1@0x10), C1(1000@1 mwait.1), PSS
> acpicpu1 at acpi0: C3(350@96 mwait.1@0x20), C2(500@64 mwait.1@0x10), C1(1000@1 mwait.1), PSS
> acpicpu2 at acpi0: C3(350@96 mwait.1@0x20), C2(500@64 mwait.1@0x10), C1(1000@1 mwait.1), PSS
> acpicpu3 at acpi0: C3(350@96 mwait.1@0x20), C2(500@64 mwait.1@0x10), C1(1000@1 mwait.1), PSS
>
> On May 26 14:34:43, h...@stare.cz wrote:
> > This is current/amd64, dmesgs below.
> > With the current snapshot, the C states are gone:
> >
> > -acpicpu0 at acpi0: C3(350@96 mwait.1@0x20), C2(500@64 mwait.1@0x10), C1(1000@1 mwait.1), PSS
> > -acpicpu1 at acpi0: C3(350@96 mwait.1@0x20), C2(500@64 mwait.1@0x10), C1(1000@1 mwait.1), PSS
> > -acpicpu2 at acpi0: C3(350@96 mwait.1@0x20), C2(500@64 mwait.1@0x10), C1(1000@1 mwait.1), PSS
> > -acpicpu3 at acpi0: C3(350@96 mwait.1@0x20), C2(500@64 mwait.1@0x10), C1(1000@1 mwait.1), PSS
> > +acpicpu0 at acpi0: C1(@1 halt!), PSS
> > +acpicpu1 at acpi0: C1(@1 halt!), PSS
> > +acpicpu2 at acpi0: C1(@1 halt!), PSS
> > +acpicpu3 at acpi0: C1(@1 halt!), PSS
> >
> > Is this expected?
> > Is it related to the recent apmd -A change?

Not really. Well, unless your box is one where the states change depending
on, say, whether the box is plugged in.

You could give this diff a shot. It enables processing of CST change
notifications. No committers have a (working) box that does that, so I
couldn't get any interest and I have no idea when--or even if--it might go
in.
Philip Guenther

Index: sys/dev/acpi/acpicpu.c
===================================================================
RCS file: /data/src/openbsd/src/sys/dev/acpi/acpicpu.c,v
retrieving revision 1.92
diff -u -p -r1.92 acpicpu.c
--- sys/dev/acpi/acpicpu.c	6 Apr 2022 18:59:27 -0000	1.92
+++ sys/dev/acpi/acpicpu.c	12 Apr 2022 06:13:55 -0000
@@ -25,6 +25,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
@@ -80,6 +81,7 @@ void acpicpu_setperf_ppc_change(struct a
 #define CST_FLAG_FALLBACK	0x4000	/* fallback for broken _CST */
 #define CST_FLAG_SKIP		0x8000	/* state is worse choice */
 
+#define FLAGS_NOCST		0x01
 #define FLAGS_MWAIT_ONLY	0x02
 #define FLAGS_BMCHECK		0x04
 #define FLAGS_NOTHROTTLE	0x08
@@ -113,8 +115,10 @@ struct acpi_cstate
 	uint64_t	address;	/* or mwait hint */
 };
 
-unsigned long cst_stats[4] = { 0 };
-
+/*
+ * Locking:
+ *	m	sc_mtx
+ */
 struct acpicpu_softc {
 	struct device	sc_dev;
 	int		sc_cpu;
@@ -130,6 +134,10 @@ struct acpicpu_softc {
 	struct cpu_info	*sc_ci;
 	SLIST_HEAD(,acpi_cstate) sc_cstates;
 
+	struct mutex	sc_mtx;
+	struct acpi_cstate *sc_cstates_active;	/* [m] */
+	int		sc_mwait_only;		/* [m] */
+
 	bus_space_tag_t	sc_iot;
 	bus_space_handle_t sc_ioh;
@@ -161,10 +169,13 @@ struct acpicpu_softc {
 void	acpicpu_add_cstatepkg(struct aml_value *, void *);
 void	acpicpu_add_cdeppkg(struct aml_value *, void *);
+void	acpicpu_cst_activate(struct acpicpu_softc *);
 int	acpicpu_getppc(struct acpicpu_softc *);
 int	acpicpu_getpct(struct acpicpu_softc *);
 int	acpicpu_getpss(struct acpicpu_softc *);
 int	acpicpu_getcst(struct acpicpu_softc *);
+int	acpicpu_cst_changed(struct acpicpu_softc *);
+void	acpicpu_free_states(struct acpi_cstate *);
 void	acpicpu_getcst_from_fadt(struct acpicpu_softc *);
 void	acpicpu_print_one_cst(struct acpi_cstate *_cx);
 void	acpicpu_print_cst(struct acpicpu_softc *_sc);
@@ -510,11 +521,11 @@ acpicpu_getcst(struct acpicpu_softc *sc)
 	struct acpi_cstate	*cx, *next_cx;
 	int			use_nonmwait;
 
-	/* delete the existing list */
-	while ((cx = SLIST_FIRST(&sc->sc_cstates)) != NULL) {
-		SLIST_REMOVE_HEAD(&sc->sc_cstates, link);
-		free(cx, M_DEVBUF, sizeof(*cx));
-	}
+	/* set aside the existing list and free it if not active */
+	cx = SLIST_FIRST(&sc->sc_cstates);
+	SLIST_INIT(&sc->sc_cstates);
+	if (cx != sc->sc_cstates_active)
+		acpicpu_free_states(cx);
 
 	/* provide a fallback C1-via-halt in case _CST's C1 is bogus */
 	acpicpu_add_cstate(sc, ACPI_STATE_C1, CST_METH_HALT,
@@ -528,8 +539,10 @@ acpicpu_getcst(struct acpicpu_softc *sc)
 
 	/* only have fallback state? then no _CST objects were understood */
 	cx = SLIST_FIRST(&sc->sc_cstates);
-	if (cx->flags & CST_FLAG_FALLBACK)
+	if (cx->flags & CST_FLAG_FALLBACK) {
+		sc->sc_flags
Re: Can't attach gdb to cwm
On Wed, Mar 9, 2022 at 8:28 AM Rob Whitlock wrote:
> I'm trying to attach gdb to an already running cwm but I get the following
> error:
>
> ptrace: Invalid argument.
>
> Why am I getting this error? Also, I have already set kern.global_ptrace=1,
> and both cwm and gdb are being run by the same user. This problem occurs
> both with the gdb in base and the gdb/egdb in ports.

Let me guess: the cwm process is an ancestor of the shell where you're
invoking gdb. We don't permit that as the reparenting done by ptrace()
would create a loop in the process tree, which breaks assumptions by both
kernel and userspace programs.

If that's the case, run gdb from an ssh session or something like that.

Hmm, I guess I never updated the ptrace(2) manpage to mention that...

Philip Guenther
Re: C2 state on AC/battery
On Mon, Feb 7, 2022 at 10:04 AM Jan Stary wrote:
> On Feb 05 13:41:25, guent...@gmail.com wrote:
> > On Sat, Feb 5, 2022 at 2:54 AM Jan Stary wrote:
> > > This is current/amd64 on a Thinkpad T420s, dmesgs below.
> > > It seems that C2 is or is not supported depending on
> > > whether the machine boots on AC or on battery
> > > (judging by three boots of each).
> > > Is this intended?
> >
> > The acpicpu driver is reporting what ACPI told it; presumably the authors
> > of the AML intended this change as a way to reduce power consumption.
> >
> > Now, ACPI provides a mechanism for the OS to tell it to notify the OS if
> > the contents of the _CST table changes and at least in some cases
> > acpicpu registers for that and if called it would write new acpicpu lines
> > to the dmesg.
> >
> > If you're not seeing those when plugging/unplugging,
>
> I don't.
>
> > there are two possibilities:
> > * does the AML on your system actually change the values and trigger the
> >   notify?
> > * is acpicpu actually registering the callback correctly?
> >
> > I would suggest adding a printf() right before the aml_register_notify()
> > call in acpicpu.c to see if it's actually being hit,
>
> Probably not: I added a printf() right there
> but nothing shows in dmesg when plugging/unplugging.

That aml_register_notify() path is a *boot* time path, when acpicpu is
attaching. What printf() did you add and did it appear during boot?

If not, then the OS isn't registering the notify callback. Please send a
report to bugs@ with sendbug as root, including the acpidump output.

Philip Guenther
Re: C2 state on AC/battery
On Sat, Feb 5, 2022 at 2:54 AM Jan Stary wrote:
> This is current/amd64 on a Thinkpad T420s, dmesgs below.
> It seems that C2 is or is not supported depending on
> whether the machine boots on AC or on battery
> (judging by three boots of each).
> Is this intended?

The acpicpu driver is reporting what ACPI told it; presumably the authors of
the AML intended this change as a way to reduce power consumption.

Now, ACPI provides a mechanism for the OS to tell it to notify the OS if the
contents of the _CST table changes and at least in some cases acpicpu
registers for that and if called it would write new acpicpu lines to the
dmesg.

If you're not seeing those when plugging/unplugging, there are two
possibilities:
* does the AML on your system actually change the values and trigger the
  notify?
* is acpicpu actually registering the callback correctly?

I would suggest adding a printf() right before the aml_register_notify()
call in acpicpu.c to see if it's actually being hit, and if it is then dump
the tables on your box and grovel around in them to see if you see
notification support on the CPU nodes.

Philip Guenther
Re: SSL write error: certificate verification failed: certificate has expired
On Wed, Feb 2, 2022 at 6:26 PM Yogendra Kumar Chaudhary wrote:
> I am facing the following error while using pkg_add on OpenBSD 6.2.

6.2? A four year old release which has been out of support for three
years? You should download the 7.0 ISO and do a fresh install. And then
read the FAQ about upgrades so that you can keep your system up to date
after installing.

Philip Guenther
Re: cd*.iso reboot loop (vultr, Skylake AVX MDS)
On Sat, Dec 4, 2021 at 4:32 AM Claus Assmann wrote:
> My vultr OpenBSD 6.8 instance crashed and when it tried to reboot it
> failed at:
>
> root on sd0a (...)
> WARNING: / was not properly unmounted
> kernel: privileged instruction fault trap, code=0
> mds_handler_skl_avx+0x33: clflush __ALIGN_SIZE+0x500(%rid,%rax,8)
> ...
> I noticed at least one difference however:
> the crashing system shows
> Using Skylake AVX MDS workaround
> which might be something related to the function mentioned above?

They have your virtualization guest configured in a way that doesn't match
any real hardware: it has a family-model-stepping combination that matches
the Skylake line, all real hardware of which has the clflushopt extension,
but the host is making the guest trap when that instruction is used.

You could test this theory by changing "clflushopt" to "clflush" in mds.S
and building a new ISO, but poking them to provide a more consistent
virtualization setup, whether by migration or reconfiguration, is the better
solution.

We could add more tests of the cpuid data and codepatch out the instructions
that should be there but aren't, but for something like the clflushopt
instruction where there's no real good reason for not passing through the
extension when the CPU presumably has it, it's hard to get much enthusiasm
up for working around a pointlessly dumb (or buggy) virtualization config.

> Is this workaround something that could be turned off to see whether
> it causes the problem?
> The weird thing is that OpenBSD 6.8 was installed fine
> (11 months ago), so I don't understand why this problem happens now
> (could vultr have changed something in the underlying system?)

The machine hosting your guest probably suffered some failure (thus the
crash that you experienced) and they migrated your guest to another host to
get you back up and running. I periodically see the tickets go by at my
$DAYJOB of this sort of replacement. Hardware, especially modern PCs,
doesn't live anywhere near forever...

Philip Guenther
Re: Transferring ownership of SSH connection from process A to B, letting A quit nicely?
On Tue, Aug 10, 2021 at 12:13 PM mid wrote:
> On Monday, August 9th, 2021 at 5:36 AM, Philip Guenther
> <guent...@gmail.com> wrote:
>
> > If you're 100% sure you have it right, then it should be easy to provide
> > a program that demonstrates
> > 1. passing an fd between processes
> > 2. using it successfully in the receiving process
> > 3. the sending process exiting
> > 4. attempts to use it failing in the receiving process
>
> Not 100%, but I'm out of ideas, so here goes nothing.
>
> client.c (process A):
> ...
> Compiled with:
> cc -std=c99 -o server server.c
> cc -std=c99 -o client client.c
>
> `client` is also the shell of the user, but the results are the same if
> I call it from within a "real" shell, too.
>
> The server receives the correct FDs, and prints
> "Hello from the Server\n" correctly, too. But as soon as `client`
> exits, the SSH connection goes with it, instead of staying (as in,
> I get "Connection to localhost closed").

Your problems have nothing to do with fd passing but rather are around not
understanding how session management works. The client is passing its
stdin/stdout, which are either pipes or a pseudo-tty connected to the ssh
server and NOT the actual TCP socket carrying the ssh connection. When the
session leader process exits the kernel will perform various cleanup
operations (block tty access, send some signals).

If you _really_ want to hack around in this area, you need to do a bunch of
reading and research. I recommend buying/borrowing a copy of
_Advanced_Programming_in_the_UNIX_Environment_ by W. Richard Stevens.

Philip Guenther
Re: Transferring ownership of SSH connection from process A to B, letting A quit nicely?
On Sun, Aug 8, 2021 at 10:13 AM mid wrote:
...
> I have tried sending the file descriptors associated with the connection
> to process B via sendmsg, thinking that maybe the
> file descriptors are reference-counted. It's a logical
> assumption, but it didn't work - the connection closed with
> process A.

File descriptors sent via sendmsg() on a unix domain socket in SCM_RIGHTS
control messages *are* reference-counted. If you think you've done that and
it's not behaving as expected, then first check for and report errors on
*all* the system calls, and verify that the returned data fields on things
like recvmsg() have the values you expect. If sendmsg() is failing or
you're accidentally discarding the fds in the recvmsg() by not providing the
space needed then yeah, the fds will be closed because the last reference is
gone.

If you're 100% sure you have it right, then it should be easy to provide a
program that demonstrates
1) passing an fd between processes
2) using it *successfully* in the receiving process
3) the sending process exiting
4) attempts to use it failing in the receiving process

No?

Philip

(Replies not on the list will be deleted)
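As a reference point for the advice in this message, here is a minimal sketch in C of passing one descriptor over a unix domain socket with SCM_RIGHTS, checking every return value and the received control data. The helper names send_fd(), recv_fd() and hand_over_fd() are made up for this illustration, and passing exactly one descriptor alongside a single data byte is an assumption of the sketch, not something the thread prescribes.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Control buffer sized and aligned for exactly one descriptor. */
union fd_cmsg {
	struct cmsghdr hdr;
	char buf[CMSG_SPACE(sizeof(int))];
};

/* Send `fd` over unix socket `sock`; returns 0 on success, -1 on error. */
int
send_fd(int sock, int fd)
{
	char byte = 0;
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	union fd_cmsg cm;
	struct msghdr msg;
	struct cmsghdr *cmsg;

	memset(&msg, 0, sizeof(msg));
	memset(&cm, 0, sizeof(cm));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = cm.buf;
	msg.msg_controllen = sizeof(cm.buf);

	cmsg = CMSG_FIRSTHDR(&msg);
	cmsg->cmsg_level = SOL_SOCKET;
	cmsg->cmsg_type = SCM_RIGHTS;
	cmsg->cmsg_len = CMSG_LEN(sizeof(int));
	memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

	return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one descriptor from `sock`; returns the new fd, or -1 on error. */
int
recv_fd(int sock)
{
	char byte;
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	union fd_cmsg cm;
	struct msghdr msg;
	struct cmsghdr *cmsg;
	int fd;

	memset(&msg, 0, sizeof(msg));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = cm.buf;
	msg.msg_controllen = sizeof(cm.buf);

	if (recvmsg(sock, &msg, 0) <= 0)
		return -1;
	if (msg.msg_flags & (MSG_TRUNC | MSG_CTRUNC))
		return -1;	/* control data cut short: the fd was discarded */
	cmsg = CMSG_FIRSTHDR(&msg);
	if (cmsg == NULL || cmsg->cmsg_level != SOL_SOCKET ||
	    cmsg->cmsg_type != SCM_RIGHTS ||
	    cmsg->cmsg_len != CMSG_LEN(sizeof(int)))
		return -1;
	memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
	return fd;
}

/* Pass `fd` to the peer, then drop the sender's own reference to it. */
int
hand_over_fd(int sock, int fd)
{
	if (send_fd(sock, fd) == -1)
		return -1;
	return close(fd);
}
```

If the receiver can still use the descriptor after hand_over_fd() has closed the sender's copy, the reference counting is working and the problem lies elsewhere, which is the kind of evidence the four-step list above asks for.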
Re: /var/log/failedlogin is a binary file with a lot of null bytes?!
On Fri, Jul 16, 2021 at 11:49 PM podolica wrote:
> On my OpenBSD installation (6.9) one of the log files created by login(1)
> seems to be a binary file:
> $ less /var/log/failedlogin
> "failedlogin" may be a binary file. See it anyway?
> ...
> What can I learn from this logfile?
> A lot of repeating null bytes and "ttyC2" and "ttyC3" does not seems
> to be very informative.
>
> Is this an error?

No, it's not an error. That file is specific to the 'login' command,
specifically the source file /usr/src/usr.bin/login/failedlogin.c and
consists of an array of the 'badlogin' structure specified there.

If you want to dump its contents in a more readable format then you should
write a small program to do so in C or some other language which can easily
handle binary files.

Philip Guenther
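The "small program" suggested in this message could start from something like the following C sketch, which reads a file as an array of fixed-size records and prints the ones that are in use. Note the loud assumption: struct record below is a HYPOTHETICAL layout invented for this example. To read the real /var/log/failedlogin, replace it with the 'badlogin' structure copied from failedlogin.c, and adjust the printing to match its fields.

```c
#include <stdio.h>
#include <time.h>

/*
 * HYPOTHETICAL record layout, for illustration only.  The real layout
 * is the 'badlogin' structure in /usr/src/usr.bin/login/failedlogin.c.
 */
struct record {
	char		line[8];	/* tty name, may lack a trailing NUL */
	char		name[32];	/* user name, may lack a trailing NUL */
	time_t		when;		/* time of the attempt */
	unsigned long	count;		/* number of attempts */
};

/*
 * Print every non-empty record of `path` to `out`.
 * Returns the number of records printed, or -1 if the file can't be opened.
 */
long
dump_records(const char *path, FILE *out)
{
	FILE *fp = fopen(path, "rb");
	struct record r;
	long printed = 0;

	if (fp == NULL)
		return -1;
	while (fread(&r, sizeof(r), 1, fp) == 1) {
		if (r.name[0] == '\0')
			continue;	/* slot that reads back as NUL bytes */
		/* %.*s bounds the read, since the arrays may be unterminated */
		fprintf(out, "%.*s %.*s count=%lu time=%lld\n",
		    (int)sizeof(r.line), r.line,
		    (int)sizeof(r.name), r.name,
		    r.count, (long long)r.when);
		printed++;
	}
	fclose(fp);
	return printed;
}
```

The long runs of NUL bytes seen with less(1) correspond to record slots that were never filled in, which the sketch simply skips.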
Re: udp sendto performance
On Mon, Jul 5, 2021 at 3:56 PM Brian Empson wrote:
...
> I'm running 6.5, is there any significant performance improvements in
> the newer versions of OpenBSD that would improve sendto()'s performance?

Yes.

I'll suggest that before you do any serious perf measurement or try to
"squeeze more performance out of" *any* codebase you update to a current
release and not measure a two year / four version old release.

There are people for whom tracking performance of a set up over time is
important and for them measuring obsolete versions is useful. However, if
you have a target and are trying to figure out whether a setup can _reach_
that target then measuring an older release tells you nothing, because you
would never deploy an out of date release.

I Would Dearly Hope.

Philip Guenther
Re: EACCES of UDP packet
On Mon, Jun 21, 2021 at 9:07 PM Siegfried Levin wrote: > Thanks a lot for the hint. Unfortunately I’m still not able to see why > sendto failed with 13 Permission denied. The AF_INET address masked is the > correct one of my server, not a broadcast address. A sendto before this one > to the same address just worked. > > 3058 myapp CALL > sendto(5,0x1689f5f6500,0x5d,0x400,0x7f7f1144,0x10) > 3058 myapp STRU struct sockaddr { AF_INET, xxx.xxx.xxx.xxx: } > Why have you chosen to hide information that may be useful in debugging your problem? "Hi, I'm asking for help but I have to hide addresses because...this application is insecure if anyone else has its IP+port? Because I've never heard of shodan and don't believe that people are constantly scanning the Internet? And while I don't know why it's failing I'm 1000% sure that there's no information to be gained from seeing the IP, so if it later turns out my understanding of 'broadcast address' is incorrect, the time I've wasted for myself and others will be...a total loss?" 3058 myapp RET sendto -1 errno 13 Permission denied > 3058 myapp CALL close(5) > 3058 myapp RET close 0 > The dump file is like 600MB. I can provide more trace log if it is > necessary for locating the root cause. > Use the scientific method: * make a testable hypothesis * devise a test for that * perform the test * determine whether the hypothesis has been ruled out or confirmed So, since the manpage mentions blocking pf, I suggest the hypothesis "it returns EACCES because pf is blocking your packets". I can think of several ways to test that; what testing have you performed to confirm or rule out that possibility? "doas pfctl -d; run test; doas pfctl -e"? Alternatively: what's different about *that* call? Does every sento() call on that socket fail? What is special about that socket? If other sendto() calls succeed, what is different about that call? Earlier setsockopt() calls? 
You say "I can confirm the packet was not sent to a broadcast address": *how* have you confirmed that your understanding of 'broadcast address' matches the kernel's understanding? It ain't just 255.255.255.255 Philip Guenther > > > Siegfried > siegfried.le...@gmail.com > > > > > > On Jun 15, 2021, at 8:50 PM, Theo de Raadt wrote: > > > > use ktrace > > > > Siegfried Levin wrote: > > > >> Hi, > >> > >> I have a application run by a normal user communicating with the server > with UDP. It crashes very occasionally, like once per week, due to EACCES > when sending a UDP packet. According to the manpage ( > https://man.openbsd.org/OpenBSD-6.9/sendmsg.2#EACCES), the reason might > be either being blocked by PF or sending to a broadcast address. I can > confirm the packet was not sent to a broadcast address. However, I cannot > figure out what rule could block the connection occasionally either. The > application can be brought back online without changing any configuration. > Does anyone know what might fix this? I can also rewrite the code to make > it ignore the error and keep trying but that might not be a proper > solution. Running it as root might not be a good idea, too. > >> > >> It happens since OpenBSD 6.8. Now I’m running it on 6.9. The > application is written in Rust. > >> > >> Siegfried > >> siegfried.le...@gmail.com > >> > >> > >> > >> > >
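As an illustration of why "not 255.255.255.255" is not enough to rule out a broadcast destination, here is a minimal sketch in C. It assumes IPv4 addresses and netmasks held in host byte order, and the function names are made up for this example; it is not the kernel's actual check, which also depends on how the interfaces are configured.

```c
#include <assert.h>
#include <stdint.h>

/* Directed broadcast address of the subnet that contains addr.
 * addr and netmask are IPv4 values in host byte order. */
uint32_t
directed_broadcast(uint32_t addr, uint32_t netmask)
{
	uint32_t host = ~netmask;	/* the host bits of the subnet */

	/* a sane netmask has contiguous host bits, i.e. host == 2^k - 1 */
	assert((host & (host + 1)) == 0);
	return addr | host;
}

/* Nonzero if addr is the limited broadcast address or the directed
 * broadcast address of the given subnet. */
int
is_broadcast(uint32_t addr, uint32_t netmask)
{
	if (addr == UINT32_C(0xffffffff))
		return 1;
	/* /31 and /32 networks have no directed broadcast address */
	if (netmask >= UINT32_C(0xfffffffe))
		return 0;
	return addr == directed_broadcast(addr, netmask);
}
```

With a /25 netmask (255.255.255.128), for example, 10.0.0.127 counts as a broadcast address even though it looks like an ordinary host address.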
Re: Usage of .note.openbsd.ident
On Fri, May 21, 2021 at 5:28 AM George Brown <321.geo...@gmail.com> wrote: > It seems this ELF note was used for the now dead compat_linux feature. > Aside from compat systems in other operating systems that may wish to > identify OpenBSD binaries does this note have any other active uses? > The point of the note (and/or the OS/ABI field in the ELF header) is to permit portable ELF tools to identify how to interpret OS-specific values, those in the OS-ranges for types, for example. Not inserting _some_ identifying factor is basically doing an embrace-and-extend on ELF and actively hostile to portability of tooling. If you find that ELF note obnoxious, just fix the linkers to instead set the ELF ABI field correctly. As I understand it, the 'go' tool chain has done that for years. It's really the better choice for this, would take less space and be faster to process. Philip Guenther
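For readers who have not looked at the OS/ABI field mentioned above, a minimal sketch of how a portable tool might read it follows. The byte offset (7, EI_OSABI) and the OpenBSD value (12, ELFOSABI_OPENBSD) come from the ELF specification; the function and macro names are invented for this example, and it does not look at ELF notes at all.

```c
#include <assert.h>
#include <stddef.h>

#define OSABI_OFFSET	7	/* index of the OS/ABI byte (EI_OSABI) in e_ident */
#define OSABI_OPENBSD	12	/* ELFOSABI_OPENBSD */

/* Return the OS/ABI byte of an ELF image held in buf,
 * or -1 if buf does not start with a full ELF identification. */
int
elf_osabi(const unsigned char *buf, size_t len)
{
	assert(buf != NULL);
	if (len < 16)		/* e_ident is 16 bytes long */
		return -1;
	if (buf[0] != 0x7f || buf[1] != 'E' || buf[2] != 'L' || buf[3] != 'F')
		return -1;
	return buf[OSABI_OFFSET];
}
```

A tool would compare the result against OSABI_OPENBSD before interpreting any values in the OS-specific ranges.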
Re: Fwd: umm_map returns unaligned address?
On Fri, Apr 23, 2021 at 4:50 PM Alessandro Pistocchi wrote: ... > What I was flagging is just that sometimes uvm_map returns an address that > is not > aligned to PAGE_SIZE ( I printed it out and it has 0x004 in the lower 12 > bits).On the > other hand uvm_unmap has an assertion that panics if the address passed to > it is not > page aligned. I believe that there could be a bug somewhere. > You apparently didn't print out the value directly after return from uvm_map() but rather later after a bunch of your other code had run. Yes, there's a bug, in your game_mode_start_audio_thread(), where you advance the pointer from uvm_map() by four. Philip Guenther
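The alignment test being discussed is a plain mask operation. A small userland sketch, assuming a power-of-two page size passed in by the caller (the helper names are made up here and are not the kernel's macros):

```c
#include <assert.h>
#include <stdint.h>

/* Byte offset of addr within its page; page_size must be a power of two. */
uintptr_t
page_offset(uintptr_t addr, uintptr_t page_size)
{
	assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
	return addr & (page_size - 1);
}

/* Nonzero if addr sits exactly on a page boundary. */
int
is_page_aligned(uintptr_t addr, uintptr_t page_size)
{
	return page_offset(addr, page_size) == 0;
}
```

An address with 0x004 in its lower 12 bits has a page offset of 4 with 4096-byte pages, which is what you would see if a page-aligned pointer had been advanced by four bytes after the mapping call returned.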
Re: umm_map returns unaligned address?
On Fri, Apr 23, 2021 at 3:13 PM Alessandro Pistocchi wrote: > -- Forwarded message - > From: Alessandro Pistocchi > Date: Fri, Apr 23, 2021 at 1:55 PM > Subject: umm_map returns unaligned address? > To: > > > Hi all, > > I am fairly new to openbsd so if this is something obvious that I missed > please be understanding. > > I am adding a syscall to openbsd 6.8. I am working on a raspberry pi. > > During the syscall I allocate some memory that I want to share between the > kernel > and the calling process. > > When it's time to wrap up and unmap the memory, I unmap it both from the > kernel > map and from the process map. > > The unmapping from the process map goes fine, the unmapping from the kernel > map > fails by saying that the virtual address in kernel map is not aligned to > the page size > ( it's actually 4 bytes off ). > > What have I missed? I assumed that umm_map would return a page aligned > virtual > address for the kernel mapping as well. > > Here is my code for creating the shared memory chunk: > Stop sending summaries and just send diffs that compile: you don't know everything that is relevant and keep leaving out stuff that is. I'm the third person to say this. > > > // memory_size is a multiple of page size > uvm_object = uao_create(memory_size, 0); > if(!uvm_object) return; > > // TODO(ale): make sure that this memory cannot be swapped out > > uao_reference(uvm_object) > if(uvm_map(kernel_map, (vaddr_t *), round_page(memory_size), > uvm_object, >0, 0, UVM_MAPFLAG(PROT_READ | PROT_WRITE, PROT_READ | PROT_WRITE, >MAP_INHERIT_SHARED, MADV_NORMAL, 0))) { > The cast of is wrong: it's either unnecessary (if memory is of the correct type) or totally broken (if it isn't). Why did you think it was unnecessary to show how you declared your variables? You also fail to show your initialization of 'memory'. If you didn't then that's ABSOLUTELY wrong and not in line with the existing uses of uvm_map() in the kernel. 
Please consult the uvm_map(9) manpage for what the incoming value means. ... > uao_reference(uvm_object); > if(uvm_map(>p_vmspace->vm_map, _in_proc_space, > round_page(memory_size), uvm_object, >0, 0, UVM_MAPFLAG(PROT_READ | PROT_WRITE, PROT_READ | PROT_WRITE, >MAP_INHERIT_NONE, MADV_NORMAL, 0))) { > memory = 0; > This error handling is incomplete, lacking an unmap. Philip Guenther
Re: Unable to listen properly on UDP port 4500
: bleys; grep 4500 /etc/services ipsec-nat-t 4500/tcp ipsec-msft # IPsec NAT-Traversal ipsec-nat-t 4500/udp ipsec-msft # IPsec NAT-Traversal : bleys; sysctl net.inet.esp.udpencap net.inet.esp.udpencap=1 : bleys You're trying to use the ipsec ESP encapsulation port, which is enabled by default. If you're a masochist and like making your life more difficult, you can use that port for your own purposes by disabling that sysctl. If you're not a masochist, use a different port. Philip Guenther On Tue, Dec 8, 2020 at 4:13 PM Chris Johnson wrote: > Hello All, > > I am unable to set up a localhost netcat listener on UDP port 4500 that > responds to a client on that same host. I encountered this issue > attempting to test whether UDP 4500 was open on our departmental firewall. > > Simple test case: Fresh build of OpenBSD 6.8. No local network, no > packet filter, no iked running. > > # netstat -na -f inet | grep 4500 > [empty] > # fstat | grep 4500 > [empty] > > $ nc -ul localhost 4501 & > [1] 72638 > $ nc -u localhost 4501 > Z > Z > ^C > $ pkill nc > > [1]+ Stopped nc -ul localhost 4501 > $ nc -ul localhost 4500 & > [2] 70181 > $ nc -u localhost 4500 > Z > ^C > $ pkill nc > [2]- Terminated nc -ul localhost 4500 > > The server running on port 4500 does not echo. Why not? Is there > something obvious that I'm missing? > > I've tried this on three different OpenBSD 6.8 systems (all amd64). Is > UDP 4500 reserved in some way? Other ports I've tried work fine. Linux > and MacOS systems work fine on this port. > > Cheers, > > Chris > >
Re: Potential ksh bug?
On Mon, Nov 16, 2020 at 11:04 PM Bodie wrote: > On 17.11.2020 05:04, Jordan Geoghegan wrote: > > Hello, > > > > I'm not sure if this is a bug, or if it's just a pdksh thing, but I > > stumbled upon some interesting behaviour when I was tinkering around > > with quoting and using a poor mans array: > > > > test=$(cat <<'__EOT' > > # I'll choose not to close this quote > > other_stuff > > __EOT > > ) > > > > echo "$test" > > > > > > When I run this command on ash, dash, yash, bash, zsh or ksh93 I get > > the following output: > > > > # I'll choose not to close this quote > > other_stuff > > > > But when I run it on ksh from base or any pdksh derivative it throws > > an error about an unclosed quote: > > > > test.sh[8]: no closing quote > > > > This snippet works on every POSIX-y shell in the ports tree, and fails > > on every pdksh variant I tried, including on NetBSD and DragonflyBSD > > as well. I don't have the requisite esoteric knowledge regarding > > pdksh's internal quoting logic, so I'm hoping one of the gurus here > > can determine whether this is a bug or if I'm just doing something > > annoying. > > > > Any insight that can be provided would be much appreciated. > > > > What exactly are you trying to achieve? > > If you will look in sh(1) for 'Command expansion' then there are defined > rules and your form is not between them. > I disagree. I believe this: cat <<'__EOT' # I'll choose not to close this quote other_stuff __EOT matches the syntax for 'command'...once you take into account redirections, including 'here-docs'. Or do you believe that's not a valid command on its own? To put it another way, I agree with halex@ that this is a (known, not yet fixed) bug. So error message about missing closing quote is actually proper > behavior. > Nope. This is a bug in OpenBSD ksh. > As well it is good idea to avoid reserved words as a names for variables > ;-) > (test) Hmm? 
* 'test' is not a reserved word in the shell * shell variable names are a completely different namespace than shell reserved words or commands * code written to check whether something is a bug is 1000% out-of-bounds for style comments: either there's a bug or there isn't Philip Guenther
Re: OpenBSD 6.8 (release) guest (qemu/kvm) on Linux 5.9 host (amd64) fails with protection fault trap
On Sun, Nov 15, 2020 at 10:24 AM Gabriel Garcia wrote: > I would like to run OpenBSD as stated on the subject - I have been able, > however, to run it successfully with "-cpu Opteron_G2-v1", but I would > rather use "-cpu host" instead. Also note that on an Intel host, OpenBSD > appears to work successfully on the same Linux base. > > qemu invocation that yields a trap: > ... Lots of looking everywhere but the error going on here. Let's look at the trap/ddb output: > kernel: protection fault trap, code=0 > Stopped at amd64_errata_setmsr+0x4e: wrmsr > > Contents of CPU registers: > ddb> show registers > rdi 0x9c5a203a > rsi 0x820ff920errata+0xe0 > rbp 0x824c5740end+0x2c5740 > rbx 0x18 > rdx0 > rcx 0xc0011029 > rax 0x3 > r80x824c55a8end+0x2c55a8 > r9 0 > r10 0xbdf7dabff85d847b > r11 0x51e076fef1dcfa7b > r120 > r130 > r14 0x820ff940acpihid_ca > r15 0x820ff920errata+0xe0 > rip 0x81bc6edeamd64_errata_setmsr+0x4e > cs 0x8 > rflags 0x10256__ALIGN_SIZE+0xf256 > rsp 0x824c5730end+0x2c5730 > ss 0x10 > amd64_errata_setmsr+0x4e: wrmsr Oh hey, it says RIGHT THERE that a wrmsr instruction faulted. Which one? Well, it's in the function amd64_errata_setmsr(). Furthermore, we just have to remember that wrmsr takes the MSR to write in the %ecx register (something the qemu people surely know) and so it's the 0xc0011029 MSR. Let's grep for that in the amd64 kernel source: : bleys; cd /usr/src/sys/arch/amd64/ : bleys; grep -rw 0xc0011029 * include/specialreg.h:#define MSR_DE_CFG 0xc0011029 /* Decode Configuration */ : bleys; grep -rwl MSR_DE_CFG * amd64/identcpu.c amd64/vmm.c amd64/amd64errata.c include/specialreg.h : bleys; grep -rwl ^amd64_errata_setmsr * amd64/amd64errata.c : bleys; less +/MSR_DE_CFG amd64/amd64errata.c <...> /* * 721: Processor May Incorrectly Update Stack Pointer */ { 721, 0, MSR_DE_CFG, amd64_errata_set9, amd64_errata_setmsr, DE_CFG_721 }, Looks like qemu fails to behave like a real AMD CPU by failing to handle the wrmsr() for that errata. 
Also the kernel you're running it on is failing to apply the errata itself (because otherwise OpenBSD wouldn't be trying to flip the bit itself). Go shake an AMD errata document at the qemu people and figure out why your host kernel isn't applying a documented fix. Paying attention to what the kernel tells you is a Good Thing. Honestly, what you showed above, that it trapped on wrmsr with those registers, should have been enough for the qemu people to figure out what wasn't working. Philip Guenther
Re: time_t
On Mon, Oct 5, 2020 at 12:27 PM Roderick wrote: ... > Back to tar files: there is place for 11 octal digits, that is > only twice the time you can count with 32 bits, in years: > > 2^33/(60*60*24*365.25*2)=136.09930083403047126524 > > Also not too much. Is it not a better solution to begin a new epoch > every 68.05 years? We can do a big celebration at the beginning of > each new epoch. > The pax file format (which is supported by many 'tar' binaries) supports expressing the time as a decimal string with sub-integer part, bounded only by the block size, solving both the field size limit problem and the lack of subsecond resolution. Philip Guenther
Re: i386, parallel port permission error?
On Wed, Aug 19, 2020 at 3:09 AM Doug Moss wrote: > On 2020-08-17, Stuart Henderson wrote: > >On 2020-08-17, Doug Moss wrote: > >> > >> Did something change at OpenBSD i386 between 5.9 and 6.0 > >> related to parallel port / lpt hardware permissions? > >> > >> Up to OpenBSD i386 5.9, > >> I used to be able to have a working case-LCD-screen > >> with lcdproc-0.5.7, driver=hd44780, winamp wiring, with 'allowaperture'. > >> At OpenBSD i386 6.0 and after, it fails. > > > >I think this is due to kernel memory access restrictions that were added. > >Setting sysctl kern.allowkmem=1 before securelevel is raised bypasses them > >but of course weakens protections. > > I think the problem in lcdproc is in the code from this file (port.h) > https://github.com/lcdproc/lcdproc/blob/master/server/drivers/port.h > > I am out of my depth with this code. I have never even seen these > calls 'outb' and 'inb' > The code looks like it was begun in 1995. > Is that what you are talking about 'kernel memory access'? > Those are direct CPU instructions for I/O. To use them, the code must use i386_iopl(2) from libarch.a to enable it, which in turn requires the machdep.allowaperture sysctl to be set to a non-zero value (per the manpage). > Any advice about this? Is this code amenable to being 'modernized'? > > If can't modernize the lcdproc code, can you give me specifics about: > Do I just put a line in /etc/rc.securelevel > kern.allowkmem=1 > Try machdep.allowaperture=1 instead. Philip Guenther
Re: sysctl and panic
On Tue, Aug 4, 2020 at 12:23 PM Sven F. wrote: ... > # sysctl -w ddb.panic=1 > sysctl: ddb.panic: Operation not permitted ... > Is this expected and can be set only early in boot ? > Yes, exactly. Read the securelevel(7) or sysctl(2) manpages for details. > is ddb.panic=0 still supported ? > Yes. Philip Guenther
Re: perl hex possible bug
On Tue, Jul 21, 2020 at 3:12 PM Edgar Pettijohn wrote: > I was playing around with the hex function in perl. So naturally I > started with: > > perldoc -f hex > > Which showed me a few examples namely the following: > > print hex '0xAf'; # prints '175' > print hex 'aF'; # same > $valid_input =~ /\A(?:0?[xX])?(?:_?[0-9a-fA-F])*\z/ > > However, I get the following output: (newlines added for clarity) > > laptop$ perl -e 'print hex '0xAf';' > 373 > You used the same quotes on the inside and out, so the "inner" quotes actually never get to the perl! The shell parses the argument to perl to print hex 0xAf 0xAf is a numeric literal whose value is 175. The hex() function then takes its argument (175) converts it to a string ("175") and interprets that string per its rules...as if you passed it "0x175" which equals 373. If you use distinct quotes, you get the value you expect: $ perl -le 'print hex "0xAf";' 175 $ > laptop$ perl -e 'print hex 'aF';' > 175 > That relies on the so-called poetry extension, where a bare word like aF is treated as a string. Turn on strict... $ perl -Mstrict -le 'print hex Af;' Bareword "Af" not allowed while "strict subs" in use at -e line 1. Execution of -e aborted due to compilation errors. $ > I'm guessing there is a bug here but not sure if its software or > documentation. > No bug, just shell quoting traps. Philip Guenther
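The number-to-string-to-hex round trip described above can be mimicked outside perl. The following C sketch only imitates the two conversions using snprintf() and strtol(); the function names are invented here and this is not how perl implements hex().

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* What happens when a hex() style function is handed a string:
 * the digits are parsed as base 16, with an optional 0x prefix. */
long
hex_of_string(const char *s)
{
	return strtol(s, NULL, 16);
}

/* What happens when it is handed a number instead: the number is first
 * rendered as a decimal string, and that string is then parsed as hex. */
long
hex_of_number(long n)
{
	char buf[32];
	int len;

	len = snprintf(buf, sizeof(buf), "%ld", n);
	assert(len > 0 && (size_t)len < sizeof(buf));
	return hex_of_string(buf);
}
```

hex_of_string("0xAf") gives 175, while hex_of_number(0xAf) goes through the string "175" and gives 0x175, which is 373.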
Re: Potential grep bug?
Nope. This is a grep of a single file, so procfile() must be overflowing and this only 'fixes' it by relying on signed overflow, which is undefined behavior, being handled in a particular way by the compiler. So, luck (which fails when the compiler decides to hate you). There are more places that need to change for the reported problem to be handled safely. Philip Guenther On Tue, Jun 23, 2020 at 9:58 PM Martijn van Duren < open...@list.imperialat.at> wrote: > This seems to fix the issue for me. > > OK? > > martijn@ > > On Tue, 2020-06-23 at 19:29 -0700, Jordan Geoghegan wrote: > > Hello, > > > > I was working on a couple POSIX regular expressions to search for and > > validate IPv4 and IPv6 addresses with optional CIDR blocks, and > > encountered some strange behaviour from the base system grep. > > > > I wanted to validate my regex against a list of every valid IPv4 > > address, so I generated a list with a zsh 1 liner: > > > > for i in {0..255}; do; echo $i.{0..255}.{0..255}.{0..255} ; done | > > tr '[:space:]' '\n' > IPv4.txt > > > > My intentions were to test the regex by running it with 'grep -c' to > > confirm there was indeed 2^32 addresses matched, and I also wanted to > > benchmark and compare performance between BSD grep, GNU grep and > > ripgrep. The command I used: > > > > grep -Eoc > > > "((25[0-5]|(2[0-4]|1{0,1}[[:digit:]]){0,1}[[:digit:]])\.){3,3}(25[0-5]|(2[0-4]|1{0,1}[[:digit:]]){0,1}[[:digit:]])(/[1-9]|/[1-2][[:digit:]]|/3[0-2])?" > > > > My findings were surprising. Both GNU grep and ripgrep were able get > > through the file in roughly 10 and 20 minutes respectively, whereas the > > base system grep took over 20 hours! What interested me the most was > > that the base system grep when run with '-c' returned '0' for match > > count. It seems that 'grep -c' will have its counter overflow if there > > are more than 2^32-1 matches (4294967295) and then the counter will > > start counting from zero again for further matches. 
> > > > ryzen$ time zcat IPv4.txt.gz | grep -Eoc > "((25[0-5]|(2[0-4]|1{0,1}... > > 0 > > 1222m09.32s real 1224m28.02s user 1m16.17s system > > > > ryzen$ time zcat allip.txt.gz | ggrep -Eoc > "((25[0-5]|(2[0-4]|1{0,1}... > > 4294967296 > > 10m00.38s real11m40.57s user 0m30.55s system > > > > ryzen$ time rg -zoc "((25[0-5]|(2[0-4]|1{0,1}... > > 4294967296 > > 21m06.36s real27m06.04s user 0m50.08s system > > > > # See the counter overflow/reset: > > jot 4294967350 | grep -c "^[[:digit:]]" > > 54 > > > > All testing was done on a Ryzen desktop machine running 6.7 stable. > > > > The grep counting bug can be reproduced with this command: > > jot 4294967296 | nice grep -c "^[[:digit:]]" > > > > Regards, > > > > Jordan > > > Index: util.c > === > RCS file: /cvs/src/usr.bin/grep/util.c,v > retrieving revision 1.62 > diff -u -p -r1.62 util.c > --- util.c 3 Dec 2019 09:14:37 - 1.62 > +++ util.c 24 Jun 2020 06:46:52 - > @@ -106,7 +106,8 @@ procfile(char *fn) > { > str_t ln; > file_t *f; > - int c, t, z, nottext; > + int t, z, nottext; > + unsigned long long c; > > mcount = mlimit; > > @@ -169,7 +170,7 @@ procfile(char *fn) > if (cflag) { > if (!hflag) > printf("%s:", ln.file); > - printf("%u\n", c); > + printf("%llu\n", c); > } > if (lflag && c != 0) > printf("%s\n", fn); > >
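The numbers in the report are consistent with a 32-bit counter wrapping around. Below is a small sketch of that arithmetic, deliberately done with unsigned types where wraparound is well defined; as noted at the top of this message, the overflow of a signed int in the real code is undefined behavior, so the observed result there is luck rather than a guarantee. The function name is made up for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Value an unsigned counter of the given width (in bits) shows after
 * being incremented `matches` times starting from zero. */
uint64_t
wrapped_count(uint64_t matches, unsigned int bits)
{
	assert(bits >= 1 && bits <= 64);
	if (bits == 64)
		return matches;
	return matches & ((UINT64_C(1) << bits) - 1);	/* matches mod 2^bits */
}
```

wrapped_count(4294967350, 32) is 54 and wrapped_count(4294967296, 32) is 0, matching the jot experiments above, while a 64-bit counter reports the full count.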
Re: Potential awk bug?
On Sat, Jun 6, 2020 at 5:08 PM Zé Loff wrote: > On Sat, Jun 06, 2020 at 03:51:58PM -0700, Jordan Geoghegan wrote: > > I'm working on a simple awk snippet to convert the IP range data listed > in > > the Extended Delegation Statistics data from ARIN [1] and convert it into > > CIDR blocks. I have a snippet that works perfectly fine on mawk and gawk, > > but not on the base system awk. I'm 99% sure I'm not using any GNUisms, > as > > when I break the command up into two parts, it works perfectly. > > > > The snippet below does not work with base awk, but does work with gawk > and > > mawk: (Running on 6.6 -stable system) > > > > awk -F '|' '{ if ( $3 == "ipv4" && $2 == "US") printf("%s/%d\n", $4, > > 32-log($5)/log(2))}' delegated-arin-extended-latest.txt > > > > > > The command does output data, but it also throws errors for certain > lines: > > > > awk: log result out of range > > input record number 94027, file delegated-arin-extended-latest.txt > > source line number 1 > > > > Most CIDR blocks are calculated correctly, but about 10% of them have > errors > > (ie something that should calculated to be a /24 is instead calculated > to be > > a /30). > ... > I have no idea about what is going on, but FWIW I can reproduce this on > i386 6.7-stable and amd64 6.7-current (well, current-ish, #232). > Truncating the file to a single offending line produces the same result: > log($5) is out of range. > > It appears to have something to do with the last field. Removing it or > changing some of its characters seems to work, e.g.: > > > arin|US|ipv4|216.250.144.0|4096|20050503|allocated|5e58386636aa775c2106140445cf2c30 > > arin|US|ipv4|216.250.144.0|4096|20050503|allocated|5a58386636aa775c2106140445cf2c30 > ^ > Fails on the first line but works on the second. > Hah! Nice observation! 
The last field of the first line looks kinda like a number in scientific notation, but when awk internally tries to set up the fields it generates an ERANGE error...and the global errno variable is left with that value. Several builtins in awk, including log(), perform operations and then check whether errno is set to EDOM or ERANGE but fail to clear errno beforehand. The fix is to zero errno before all the code sequences that use the errcheck() function, ala: --- run.c 13 Aug 2019 10:45:56 - 1.44 +++ run.c 7 Jun 2020 03:14:38 - @@ -26,6 +26,7 @@ THIS SOFTWARE. #define DEBUG #include #include +#include #include #include #include @@ -1041,8 +1042,10 @@ Cell *arith(Node **a, int n) /* a[0] + a case POWER: if (j >= 0 && modf(j, ) == 0.0) /* pos integer exponent */ i = ipow(i, (int) j); - else + else { + errno = 0; i = errcheck(pow(i, j), "pow"); + } break; default:/* can't happen */ FATAL("illegal arithmetic operator %d", n); @@ -1135,8 +1138,10 @@ Cell *assign(Node **a, int n)/* a[0] = case POWEQ: if (yf >= 0 && modf(yf, ) == 0.0) /* pos integer exponent */ xf = ipow(xf, (int) yf); - else + else { + errno = 0; xf = errcheck(pow(xf, yf), "pow"); + } break; default: FATAL("illegal assignment operator %d", n); @@ -1499,12 +1504,15 @@ Cell *bltin(Node **a, int n)/* builtin u = strlen(getsval(x)); break; case FLOG: + errno = 0; u = errcheck(log(getfval(x)), "log"); break; case FINT: modf(getfval(x), ); break; case FEXP: + errno = 0; u = errcheck(exp(getfval(x)), "exp"); break; case FSQRT: + errno = 0; u = errcheck(sqrt(getfval(x)), "sqrt"); break; case FSIN: u = sin(getfval(x)); break; Todd, are we up to date with upstream, or is this latent there too? Philip Guenther
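The failure mode is easy to reproduce outside awk. The sketch below is a stand-alone C illustration, not awk's code: the helper names are invented, and a trivial halve() function stands in for log() so the example needs no math library. In real use one would pass log, exp, sqrt and so on from <math.h>.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Leave a stale ERANGE in errno, the way an overflowing string-to-number
 * conversion of a field like "5e58386636aa..." would. */
void
leave_stale_erange(void)
{
	(void)strtod("5e58386636aa775c2106140445cf2c30", NULL);
}

/* Stand-in for a math routine that succeeds and never touches errno. */
double
halve(double x)
{
	return x / 2.0;
}

/* Buggy pattern: inspect errno after the call without clearing it first.
 * Returns nonzero if errno claims a domain or range error. */
int
op_failed_stale(double (*op)(double), double x)
{
	assert(op != NULL);
	(void)op(x);
	return errno == EDOM || errno == ERANGE;
}

/* Fixed pattern: zero errno immediately before the call being checked. */
int
op_failed_fresh(double (*op)(double), double x)
{
	assert(op != NULL);
	errno = 0;
	(void)op(x);
	return errno == EDOM || errno == ERANGE;
}
```

After leave_stale_erange(), the stale variant reports a failure for a perfectly good argument such as 4096, while the fresh variant does not.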
Re: Convert ffs1 to ffs2?
On Tue, May 19, 2020 at 10:50 PM Christer Solskogen < christer.solsko...@gmail.com> wrote: > Is that possible? > "Possible" is irrelevant. Lots of things are _possible_ but not done. "Has anyone actually written a tool to do this, and would you *trust* it?" is the proper question...and the answer appears to be *no*. Philip Guenther
Re: OpenBSD insecurity rumors from isopenbsdsecu.re
On Mon, May 11, 2020 at 6:09 PM wrote: ... > > And why would *you* care about those ways? If you can't tell us why you > would care, how can we answer your _real_ question? > Treat it as my secret, I want and that is why I ask because I can, I wish > you tell me the answer without a knowledge of "why I ask", > it is a very long discussion of answering by a question to question in > your Jewish style, is not it? > I considered treating your questions in good faith, but then you said this. If my questions have you spouting this nonrational drivel then you should stay away from OpenBSD because I am a committer and if you can't trust my questions then you shouldn't trust my code. Philip Guenther
Re: OpenBSD insecurity rumors from isopenbsdsecu.re
On Mon, May 11, 2020 at 4:28 PM wrote: > Is not a prohibition for USA citizens to work on OpenBSD cryptography > software parts an indication of trust relationship between current OpenBSD > and current USA? > I'm not sure what that sentence even means. What would a "trust relationship" between OpenBSD and "current USA" actually mean in terms of a CHANGE IN BEHAVIOR? Hell, what does "current USA" even _mean_?!? Did you mean to say "the US Federal Government"? If so, what would "trust between OpenBSD and the US Federal Government" actually mean in terms of a change in behavior that you, i...@aulix.com, could actually detect? And why would *you* care about those ways? If you can't tell us why you would care, how can we answer your _real_ question? There is cryptographic software in OpenBSD that was developed in part by someone who is/was a US citizen, in OpenSSH even, as a check of copyright/license statements on source files show. How does that change your world view? Philip Guenther
Re: Double fault trap in rtable_l2
On Sat, Apr 18, 2020 at 11:28 PM Thomas de Grivel wrote: > I got this error last night on an OpenBSD 6.6-stable amd64 on which I > recently enabled IKEv2 : > > > kernel: double fault trap, code=0 > > Stopped at rtable_l2+0x27: callq srp_enter+0x4 > That was the *complete* output from ddb? Really? Not a screen full of backtrace after that showing that it has a very deep stack? As you might guess from my questions: the #1 cause of double fault traps is kernel bugs causing deep recursion where it runs off the end of the allocated stack, triggering a page fault exception which itself faults when it can't write the stack frame for the page fault. That "fault while trying to fault" results in a double fault, which I configured to be delivered on its own stack so that we can report this. Fixing the deep recursion in this case would require you to provide the full stack trace to the list, so that the correct parties can see it and identify where it's incorrectly looping. Philip Guenther
Re: Openbsd supports pae?
Because it would be a total PITA now and in the future and benefit only that small set of machines that have >4GB of memory but that can't run 64bit. Since you like one-liner questions: why do you care?
Re: ffs details
On Tue, Feb 25, 2020 at 6:03 PM wrote: > Hi, I need some details about ffs, I read the kernel source but my c > knowledge is very basic. I understood all about the superblock but my > problem is understand how the files are allocated on the disk. > Anyone could give me more details about files allocation ? > You should start from the paper that described the design and implementation: https://www.cs.berkeley.edu/~brewer/cs262/FFS.pdf as linked to from https://en.wikipedia.org/wiki/Unix_File_System Kirk McKusick has continued to revise and improve FFS; many of those changes have been included in OpenBSD. Check his biography and his personal website for links to papers and presentations. Philip Guenther
Re: is there a 2GB limit on amd64 link?
On Wed, Feb 5, 2020 at 7:38 PM wrote: > I am encountering a linker error when compiling with ports-gcc Fortran: > > ld: error: lbug2.f90:(function MAIN__: .text+0x80): relocation > R_X86_64_PC32 out o > f range: 2456507324 is not in [-2147483648, 2147483647] > > The code has several large arrays, the total size of which exceeds 2GB. > > Is this a linker issue, a gcc fortran issue, or a pebkac? > It's at least a gnu fortran issue: it needs to generate object code in a larger "model" than it currently is. I've never used gnu fortran, but it might accept the -mcmodel=medium option like gcc and generate code sequences for data symbols that don't limit them to the bottom 2GB (or to within 2GB of the involved code, depending on gcc's choices in implementing the model). If it doesn't accept that option, then you'll need to work with the docs, mailing lists, etc of the upstream gnu fortran project about how to have it generate code for the medium or large data models per the amd64 ABI. Philip Guenther
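The linker's complaint is a plain range check on a signed 32-bit displacement. A minimal sketch of that check follows; the helper names are invented for this illustration and are not taken from any linker's source.

```c
#include <assert.h>
#include <stdint.h>

/* Nonzero if disp fits the signed 32-bit field that an
 * R_X86_64_PC32 relocation has to fill in. */
int
fits_pc32(int64_t disp)
{
	return disp >= INT32_MIN && disp <= INT32_MAX;
}

/* Narrow a displacement to the 32-bit field; callers must have
 * checked the range first, which the assert enforces. */
int32_t
encode_pc32(int64_t disp)
{
	assert(fits_pc32(disp));
	return (int32_t)disp;
}
```

The value from the error message, 2456507324, fails fits_pc32(), which is why the data has to be addressed with the longer code sequences of a larger code model.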
Re: Readv and writev failing across ethernet
On Tue, Dec 24, 2019 at 8:14 PM Raymond, David wrote: > Openmpi uses readv/writev. I am beginning to think that the timeout > and permission errors are legit and reflect real conditions. What does re do when it receives a write request when it is busy? > 're' does not expose a device, but rather provides network interfaces that are then used with sockets. What sort of sockets does openmpi use? What sort of packet loss is generated on this network and what protocols does openmpi use to recover from that? (Lacking both dmesg or kdump, I'll probably have nothing further to contribute to this thread) Philip Guenther
Re: Re-organising partitions without re-installation
On Mon, Dec 23, 2019 at 3:10 PM Stuart Longland wrote: ... > Where do you get `sysclean` from? I don't seem to have it: > > sjl-router# man sysclean > > > man: No entry for sysclean in the manual. > > sjl-router# which sysclean > > which: sysclean: Command not found. > $ pkg_info sysclean Information for http://mirrors.sonic.net/pub/OpenBSD/snapshots/packages/amd64/sysclean-2.8.tgz Comment: list obsolete files between OpenBSD upgrades Description: sysclean is a script designed to help remove obsolete files between OpenBSD upgrades. sysclean compares a reference root directory against the currently installed files, taking files from both the base system and packages into account. sysclean does not remove any files on the system. It only reports obsolete filenames or packages using out-of-date libraries. Maintainer: Sebastien Marie WWW: https://github.com/semarie/sysclean/ $
Re: Readv and writev failing across ethernet
On Mon, Dec 23, 2019 at 6:07 AM Ingo Schwarze wrote: > Theo de Raadt wrote on Sun, Dec 22, 2019 at 05:34:45PM -0700: > > Philip Guenther wrote: > >> Somebody wrote: > > >>> The man pages for readv and writev don't document the possibility of > >>> such errors. > > >> IMO, weird errnos from devices should be documented in the manpage for > the > >> device. Consider the termios(4) manpage, for example. > > > I agree on that. Otherwise the information-flood is too much. > > > > But I think some of our manual pages are a bit weak indicating there > > are other errors not listed: > > Is the following good enough? > > Or are you saying that *all* section 2 and 3 manual pages should be > reworded to say: "FOOBAR may for example fail if:"? > Not all. For section 2 it's the calls that take an fd that need to be open-ended about errors. Philip Guenther
Re: Disabling ACPI permanently
On Mon, Dec 23, 2019 at 5:10 AM Radek wrote: > I'm trying to permanently disable acpi doing the following steps[1]. > After the first reboot OS boots fine. > After the second reboot acpi seems to be re-enabled at boot - I get [2]. > What Am I doing wrong? > First, you should also check whether there's a newer BIOS firmware for this box, as there's a good chance Intel has fixed issues and issued a new one. If so, installing that may totally resolve the issue. If not, or if upgrading the firmware doesn't resolve this, then you should next send a bug report to b...@openbsd.org using sendbug. To get the most data when you do so, disable _just_ the acpipci device (using boot -c) instead of all of acpi and then run sendbug as root on that system. The bug report will then include the data from the ACPI tables, so that the driver can be fixed to deal with this. ... > acpipci0 at acpi0 PCI0panic: malloc: allocation too large, type = 33, size > = 292057776136 > Philip Guenther
Re: Readv and writev failing across ethernet
On Mon, Dec 23, 2019 at 5:04 AM Raymond, David wrote: > The "timeout" error was numerically 60. Curiously, boards with RTL > 8111GR chips did not produce these errors, but those with RTL 8111H > chips did. Unfortunately, this chipset seems to be in a lot of newer > motherboards. > > I didn't use ktrace/kdump. The openmpi software returned the error > presented by readv/writev. > > It sounds like the simplest solution at this point is to try > non-Realtek pcie network cards. Any suggestions? How are Intel or > Broadcom cards? > At this point I think you're clearly in the "device driver is buggy" situation. If this device has an in-tree driver (and not something you're compiling locally into your kernel) then you should start a new thread starting with a dmesg and a clear description of the involved hardware. Philip Guenther
Re: Readv and writev failing across ethernet
On Sun, Dec 22, 2019 at 3:33 PM Raymond, David wrote: > I am running openmpi-4.0.2 (self-compiled with GDS patches) on > up-to-date 6.6 stable with a Go program that calls Clang MPI routines. > With particular hardware (details provided if desired), readv and > writev calls randomly fail with respectively "Timeout" and "Permission > denied" errors for calls from one machine to another across the > ethernet. While "Permission denied" is the error message for EACCES, "Timeout" is not a complete errno error message on OpenBSD. Has it been established that the underlying readv/writev syscalls are returning particular errors by using ktrace/kdump? Next: if you have a device open, then the device driver *totally controls* what errnos syscalls get. If a device driver wanted to return EDOM ("Numerical argument out of domain") it totally could. If you're getting a weird errno from a device, well, review the device source! > The errors don't occur between cores on the same machine. THIS SHOULD NOT BE A SURPRISE: the net is not the same as your local machine. > The man pages for readv and writev don't document the possibility of > such errors. IMO, weird errnos from devices should be documented in the manpage for the device. Consider the termios(4) manpage, for example. Philip Guenther
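One way to establish which errno the syscall itself returned, short of ktrace/kdump, is to log the numeric value together with strerror(3) right at the call site. The wrapper below is only a sketch: the name readv_logged() is made up for illustration and is not part of openmpi or any system library.

```c
#include <sys/types.h>
#include <sys/uio.h>

#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: call readv(2) and, on failure, report the
 * numeric errno and its strerror(3) text before returning.
 * errno is preserved for the caller.
 */
ssize_t
readv_logged(int fd, const struct iovec *iov, int iovcnt)
{
	ssize_t n = readv(fd, iov, iovcnt);

	if (n == -1) {
		int saved = errno;	/* fprintf() may clobber errno */

		fprintf(stderr, "readv(fd=%d): errno %d (%s)\n",
		    fd, saved, strerror(saved));
		errno = saved;
	}
	return n;
}
```

Logging the number as well as the text matters because a middle layer may shorten or replace the message, while the number can be checked against errno(2) on the system in question.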
Re: Unable to build OpenBSD 6.6 libc on beaglebone black
On Sat, Dec 7, 2019 at 5:10 PM Jacob Adams wrote: > When trying to build libc with the latest security patch applied on my > beaglebone black, I was met with the following error: > > cc -O2 -pipe -g -Wimplicit -I/usr/src/lib/libc/include > -I/usr/src/lib/libc/hidden -D__LIBC__ > -Werror-implicit-function-declaration > -include namespace.h -Werror=deprecated-declarations -DAPIWARN -DYP > -I/usr/src/lib/libc/yp -DSOFTFLOAT_FOR_GCC -I/usr/src/lib/libc/softfloat > -I/usr/src/lib/libc -I/usr/src/lib/libc/gdtoa > -I/usr/src/lib/libc/arch/arm/gdtoa > -DINFNAN_CHECK -DMULTIPLE_THREADS -DNO_FENV_H -DUSE_LOCALE > -I/usr/src/lib/libc > -I/usr/src/lib/libc/citrus -DRESOLVSORT -DFLOATING_POINT -DPRINTF_WIDE_CHAR > -DSCANF_WIDE_CHAR -DFUTEX -MD -MP -c > /usr/src/lib/libc/db/btree/bt_close.c -o > bt_close.o > In file included from /usr/src/lib/libc/db/btree/bt_close.c:37: > /usr/src/lib/libc/hidden/stdlib.h:68:14: error: use of undeclared > identifier > 'calloc_conceal' > PROTO_NORMAL(calloc_conceal); > ^ > /usr/src/lib/libc/hidden/stdlib.h:109:14: error: use of undeclared > identifier > 'malloc_conceal' > PROTO_NORMAL(malloc_conceal); > ^ > 2 errors generated. > *** Error 1 in /usr/src/lib/libc (:39 'bt_close.o': @cc -O2 > -pipe -g > -Wimplicit -I/usr/src/lib/libc/include -I/usr/src/lib/libc/...) > > > I unpacked a copy of the 6.6 src.tar.gz that I had downloaded a while ago > in > /usr/src, and then updated to the stable branch with: > > cvs -qd anon...@anoncvs.ca.openbsd.org:/cvs up -Pd -rOPENBSD_6_6 > > I then ran: > > cd lib/libc > make obj > make > > and encountered this error. > > Clearly I've done something wrong, could someone please point me to my > mistake? > This box didn't have the 6.6 include files installed. This is demonstrated by the lack of a calloc_conceal() declaration in your /usr/include/stdlib.h. You can't just build 6.6 pieces without their dependent pieces being present. 
Now, unless you can explain _exactly_ how you ended up with this franken-system (6.6 kernel but not the include files?) and come up with a plan to get it out of the franken-state into a normal "matched kernel and userland, including compilation environment", then my recommendation would be to grab the 6.6 bsd.rd, boot to it, and (u)pgrade to 6.6 being sure to include the comp66 set in your install, and _then_ try building things...or just run syspatch. Philip Guenther
Re: How to achieve O_TTY_INIT when opening a USB modem?
On Sun, Nov 24, 2019 at 7:53 PM Jeffrey Walton wrote: > On Sun, Nov 24, 2019 at 10:10 PM Philip Guenther > wrote: > > > > On Sun, Nov 24, 2019 at 3:11 AM Jeffrey Walton > wrote: > >> > >> I am struggling to get a USB modem and terminal configured properly > >> under OpenBSD. The same code on Linux is fine. The symptom I am seeing > >> is a hung read() after issuing ATZ\r to the modem. > >> > >> I'm guessing there's an uninitialized field in my struct termios tty. > > > > I'm not sure what you mean by that. Do you mean you're concerned that > you're making a tcsetattr(3) call on an incompletely initialized > structure? Or do you mean you're concerned that the initial configuration > of the tty provided by the kernel is in a "not good" state? > > I think cfmakeraw is not initializing the structure properly. It is an > intermittent failure. > This code is misusing cfmakeraw(3): it needs to call tcgetattr(3) on the tty fd and only call cfmakeraw() on the termios structure that tcgetattr() has filled in. (There may be other problems; I only reviewed enough to see that it was violating the rule I mentioned in my previous post. The _only_ portable way to initialize a struct termios is to use tcgetattr()!) Philip Guenther
Re: How to achieve O_TTY_INIT when opening a USB modem?
On Sun, Nov 24, 2019 at 3:11 AM Jeffrey Walton wrote: > I am struggling to get a USB modem and terminal configured properly > under OpenBSD. The same code on Linux is fine. The symptom I am seeing > is a hung read() after issuing ATZ\r to the modem. > > I'm guessing there's an uninitialized field in my struct termios tty. > I'm not sure what you mean by that. Do you mean you're concerned that you're making a tcsetattr(3) call on an incompletely initialized structure? Or do you mean you're concerned that the initial configuration of the tty provided by the kernel is in a "not good" state? > The latest Posix provides O_TTY_INIT to ensure a terminal is in a good > configuration, but OpenBSD does not recognize it. > What is the equivalent under OpenBSD? OpenBSD, like all BSDs, does not require anything special to be done to initialize a tty on first open. We can (and I guess we should at this point) define O_TTY_INIT to be zero. > How do I achieve O_TTY_INIT when > using a struct termios tty? > Before calling tcsetattr(3) you should call tcgetattr(3) to get the tty device's current settings and only alter the settings you care about. Philip Guenther
Re: Value of eax register after BIOS interrupt call from boot(8)
On Friday, November 8, 2019, Theo de Raadt wrote: > Philip Guenther wrote: > > > No, it should be the other way, moving the “clear NT flag” block down > after > > the “save registers into save area” block > > Ah. > > Index: arch/amd64/stand/libsa/gidt.S > === > RCS file: /cvs/src/sys/arch/amd64/stand/libsa/gidt.S,v > retrieving revision 1.11 > diff -u -p -u -r1.11 gidt.S > --- arch/amd64/stand/libsa/gidt.S 27 Oct 2012 15:43:42 - > 1.11 > +++ arch/amd64/stand/libsa/gidt.S 9 Nov 2019 06:50:57 - > @@ -423,14 +423,6 @@ intno = . - 1 > movl%edx, 0x9*4(%esp) > movb%bh , 0xe*4(%esp) > > - /* clear NT flag in eflags */ > - /* Martin Fredriksson */ > - pushf > - pop %eax > - and $0xbfff, %eax > - push%eax > - popf > - > /* save registers into save area */ > movl%eax, _C_LABEL(BIOS_regs)+BIOSR_AX > movl%ecx, _C_LABEL(BIOS_regs)+BIOSR_CX > @@ -438,6 +430,13 @@ intno = . - 1 > movl%ebp, _C_LABEL(BIOS_regs)+BIOSR_BP > movl%esi, _C_LABEL(BIOS_regs)+BIOSR_SI > movl%edi, _C_LABEL(BIOS_regs)+BIOSR_DI > + > + /* clear NT flag in eflags */ > + pushf > + pop %eax > + and $0xbfff, %eax > + push%eax > + popf > > pop %gs > pop %fs > Index: arch/i386/stand/libsa/gidt.S > === > RCS file: /cvs/src/sys/arch/i386/stand/libsa/gidt.S,v > retrieving revision 1.36 > diff -u -p -u -r1.36 gidt.S > --- arch/i386/stand/libsa/gidt.S31 Oct 2012 13:55:58 - > 1.36 > +++ arch/i386/stand/libsa/gidt.S9 Nov 2019 06:51:29 - > @@ -426,14 +426,6 @@ intno = . - 1 > movl%edx, 0x9*4(%esp) > movb%bh , 0xe*4(%esp) > > - /* clear NT flag in eflags */ > - /* Martin Fredriksson */ > - pushf > - pop %eax > - and $0xbfff, %eax > - push%eax > - popf > - > /* save registers into save area */ > movl%eax, _C_LABEL(BIOS_regs)+BIOSR_AX > movl%ecx, _C_LABEL(BIOS_regs)+BIOSR_CX > @@ -441,6 +433,13 @@ intno = . 
- 1 > movl%ebp, _C_LABEL(BIOS_regs)+BIOSR_BP > movl%esi, _C_LABEL(BIOS_regs)+BIOSR_SI > movl%edi, _C_LABEL(BIOS_regs)+BIOSR_DI > + > + /* clear NT flag in eflags */ > + pushf > + pop %eax > + and $0xbfff, %eax > + push%eax > + popf > > pop %gs > pop %f > Ok guenther@
Re: Value of eax register after BIOS interrupt call from boot(8)
On Friday, November 8, 2019, Theo de Raadt wrote: > Philip Guenther wrote: > > > Since we're unlikely to do _more_ with BIOS calls in the boot loader, my > > inclination would be to eliminate the structure value and the code that > > sets it (incorrectly). Opinions? > > I dunno, my crystal ball provides a more cynical outlook. > > How about we just repair by swapping the blocks as you propose, then > noone gets surprised down the road if they try to use the bios-interface > API's full functionality. > > The bootblocks don't shrink, but they don't grow either. > > Is this the right diff? I'm deleting the name which is in the commitlogs > since that isn't our style. ... > --- sys/arch/amd64/stand/libsa/gidt.S 27 Oct 2012 15:43:42 - > 1.11 > +++ sys/arch/amd64/stand/libsa/gidt.S 9 Nov 2019 03:57:11 - > @@ -417,19 +417,18 @@ intno = . - 1 > .byte 0xb8 > 2: .long 0x90909090 > > - /* pass BIOS return values back to caller */ > - movl%eax, 0xb*4(%esp) > - movl%ecx, 0xa*4(%esp) > - movl%edx, 0x9*4(%esp) > - movb%bh , 0xe*4(%esp) > - > /* clear NT flag in eflags */ > - /* Martin Fredriksson */ > pushf > pop %eax > and $0xbfff, %eax > push%eax > popf No, it should be the other way, moving the “clear NT flag” block down after the “save registers into save area” block Philip
Re: vi in ramdisk?
On Thu, Nov 7, 2019 at 9:57 PM Brennan Vincent wrote: > I am asking this out of pure curiosity, not to criticize or start a debate. > > Why does the ramdisk not include /usr/bin/vi by default? To date, > it is the only UNIX-like environment I have ever seen without some form > of vi. > The ramdisk space is extremely tight. We include what we feel is necessary, PUSHING OUT other stuff as priorities shift. If you had watched the commits closely, you would have seen drivers vanish from the ramdisks on tight archs as new functionality was added. Given what we want people to use the ramdisks for (installing, reinstalling, upgrading, fixing boot and set issues), vi is not necessary, while other functionality and drivers extend their applicability. We will keep the latter and not include the former. Philip Guenther
Re: Value of eax register after BIOS interrupt call from boot(8)
On Thu, Nov 7, 2019 at 9:31 AM Julius Zint wrote: > the following code snipped is from sys/arch/amd64/stand/libsa/gidt.S > > /* pass BIOS return values back to caller */ > movl%eax, 0xb*4(%esp) > movl%ecx, 0xa*4(%esp) > movl%edx, 0x9*4(%esp) > movb%bh , 0xe*4(%esp) > > /* clear NT flag in eflags */ > /* Martin Fredriksson */ > pushf > pop %eax > and $0xbfff, %eax > push%eax > popf > > /* save registers into save area */ > movl%eax, _C_LABEL(BIOS_regs)+BIOSR_AX > movl%ecx, _C_LABEL(BIOS_regs)+BIOSR_CX > movl%edx, _C_LABEL(BIOS_regs)+BIOSR_DX > movl%ebp, _C_LABEL(BIOS_regs)+BIOSR_BP > movl%esi, _C_LABEL(BIOS_regs)+BIOSR_SI > movl%edi, _C_LABEL(BIOS_regs)+BIOSR_DI > > These instructions are being executed after a BIOS interrupt. If i read > correctly, than (BIOS_regs)+BIOSR_AX contains the contents of the eflags > processor register and not of %eax. Is this intended or should it contain > the value of %eax? > Yeah, it looks like it's in the wrong order. The trick, of course, is that nothing actually examines BIOS_regs.biosr_ax, so the fact that the wrong value is saved there hasn't mattered. Since we're unlikely to do _more_ with BIOS calls in the boot loader, my inclination would be to eliminate the structure value and the code that sets it (incorrectly). Opinions? Philip Guenther
Re: this assembly example works in linux, netbsd - but not in openbsd, why?
On Tue, 29 Oct 2019, Guild Navigator wrote: > Program prints first two strings directly. > But it does not print the third string (1st array string). > > And debugging says why. > The address of msg1 and msg2 is not stored correctly in the array. > So when I access the address of msg1 from the array: > movq array (%rip), %rsi > it is NOT the address of msg1. > > I dont know if it is a linker problem? > I could kind of do "manual relocation" of sorts to manually > correct the addresses put in the array. In general, if you're going to exclude all the C startup bits that the operating system has provided, then you've signed up for handling all the possible ELF bits yourself. If you haven't boned up on ELF and its variations yet, then you should do so, if just so you can recognize when stuff has gone wrong. In particular, the OpenBSD linker defaults to PIE. To quote clang-local(1) (similar text is in gcc-local(1)): - clang will generate PIE code by default, allowing the system to load the resulting binary at a random location. This behavior can be turned off by passing -fno-pie to the compiler and -nopie to the linker. It is also turned off when the -pg flag is used. This means that yes, your executable has relocations. You've left out the rcrt0.o code that OpenBSD provides that handles such relocations, therefore you must either do the relocation processing yourself, or invoke the linker with the -nopie flag to instead generate a statically positioned binary. If you want to handle the relocations yourself, then eyeball the code in /usr/src/lib/csu/, particularly boot.h and amd64/md_init.h, and read the ELF spec and the amd64 ABI spec for structure definitions and similar. > But what would be the OpenBSD correct way to > write such simple print-from-the-array-of-strings program? The answer to that literal question is "write it in C", but you obviously have another requirement of "...in ASM". There are *zero* places in OpenBSD where we write pure assembler programs. 
There are only two places where we do process bootstrap bits, lib/csu and libexec/ld.so/, and for both of those we do _just_ enough ASM to make calling a limited subset of C possible, and then call a C routine to do self-relocation. Improving the C version of the self-relocation code is *much* faster than trying to improve N assembly versions. Heck, I did so just this month. ASM is cool for stuff that C can't do, but if C can do it the developer's time (and the time of future OpenBSD maintainers!) is much better spent in C than in ASM. I've touched ASM on every single current OpenBSD arch, so I understand the high cost of doing that when it _has_ been necessary and I have no interest in borking around in ASM for stuff C can reasonably do. Philip Guenther
Re: help with understanding __BSD_VISIBLE
On Fri, Jul 12, 2019 at 10:39 AM Allan Streib wrote: > Probably an elementary question stemming from my lack of C expertise. > > I am trying to complile some C code that includes its own "bcrypt" > function. This is conflicting with the declaration in pwd.h. > > error: conflicting types for 'bcrypt' > int bcrypt(char *, const char *, const char *); > ^ > /usr/include/pwd.h:112:8: note: previous declaration is here > char*bcrypt(const char *, const char *); > > In pwd.h I see that the bcrypt declaration is wrapped in a #if block: > ... > __POSIX_VISIBLE is defined as 200809, so __BSD_VISIBLE should be 0 and > the pwd.h declaration for bcrypt should be skipped? > There are four options here: 1) change the software to not use the name 'bcrypt' for a non-static function. OpenBSD has only been using it for 15 years... 2) If you're going to use the name bcrypt, then don't do so in files that pull in <pwd.h>. (This would be a last choice in my book, as that's a fragile setup) 3) *IF* the software was written to only rely on the interfaces of some version of the POSIX standard, then follow the compilation rules described in that standard. You mention POSIX 2008, so perhaps this software would build when following those rules, passing the compiler -D_POSIX_C_SOURCE=200809L to only declare the symbols from that standard. Note that application software should *never* define macros matching the pattern __*_VISIBLE such as __BSD_VISIBLE. Those are in the reserved namespace and on OpenBSD they are set by <sys/cdefs.h> based on the macros specified in the various standards for use by application and build software. The ones you should care about are: _POSIX_C_SOURCE -- standardized: specifies a POSIX version _XOPEN_SOURCE -- standardized: specifies a POSIX + XSI version _ISOC11_SOURCE -- adds C2011 interfaces _BSD_SOURCE -- adds all BSD and obsoleted interfaces Make sense? Philip Guenther
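As an illustration of option 3, the feature-test macro can also be defined at the top of the source file instead of on the compiler command line. The sketch below assumes the application's bcrypt() has the three-argument signature shown in the error message; the body is a placeholder invented for the example, not a real hash.

```c
/* Must appear before the first #include so the system headers see it. */
#define _POSIX_C_SOURCE 200809L

#include <pwd.h>

/*
 * The application's own bcrypt(), with the signature from the error
 * message. With only POSIX 2008 interfaces requested, <pwd.h> is
 * expected not to declare the BSD bcrypt(), so this no longer conflicts.
 * The body is a stand-in that stores an empty string.
 */
int
bcrypt(char *out, const char *key, const char *salt)
{
	(void)key;
	(void)salt;
	out[0] = '\0';
	return 0;
}
```

The catch is the one already noted in the reply: this only works if the rest of the file really limits itself to POSIX 2008 interfaces, since every BSD extension disappears from the headers along with bcrypt().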
Re: Putting fifos in subshells into the background
On Wed, Jun 12, 2019 at 12:54 AM Richard Ulmer wrote: > while making the Kakoune editor work on OpenBSD, I encountered some > strange behaviour [1]. This little script doesn't work with the OpenBSD > sh, but works at least with dash, bash and zsh: > > mkfifo 'testfifo' > cat "$( > ( printf 'foo\n' > testfifo 2>&1 ) > /dev/null 2>&1 & > printf 'testfifo' > )" > > I can make it work for all the mentioned shells like this: > > mkfifo 'testfifo' > cat "$( > ( ( printf 'foo\n' > testfifo 2>&1 ) & ) > /dev/null 2>&1 > printf 'testfifo' > )" > > Can someone explain or justify the behaviour of the OpenBSD sh, or do > you think this is a bug? > This is a bug, almost certainly from an over-zealous optimization in the logic handling subshells where the possibility that an inner redirection could be blocking wasn't taken into account when it tries to avoid unnecessary forks. Sorry, I don't have a fix in my back pocket. Your workaround is good; I'll note the intermediate set of parens can also be braces, which would let you avoid the otherwise necessary whitespace between open-parens if that grates on your soul like it does mine. :) Philip Guenther
Re: hw.ncpu=1, hw.ncpuonline=1, hw.ncpufound=4
On Mon, May 27, 2019 at 6:18 PM Ipsen S Ripsbusker < ips...@ripsbusker.no.eu.org> wrote: > Aaron Mason writes: > > Looks to me like you're not running bsd.mp. A dmesg would clear this > up. > > Indeed I was not running bsd.mp. I switched to bsd.mp, and then 2 of 4 > CPUs were online. Then I set "sysctl hw.smt = 1" to get all 4 online. > This is a side-point, but you do understand that those extra 2 aren't full CPUs, they're just the cardboard mockups that Intel sold you, and that if you run any untrusted code (including javascript in a web-browser) that those fake CPUs leak data across process boundaries, right? > Otto Moerbeek writes: > > On Sun, Apr 07, 2019 at 01:54:35PM +, Ipsen S Ripsbusker wrote: > > > ... > > > Also, now that I have realized this, I have a theory about a related > > > issue, and I would like to know how I can debug it. I am using softraid > > > CRYPTO, and I have found that accessing the disk with one process will > > > interrupt the other processes accessing the disk. Now I wonder this > > > happens because the sole core must switch encryption/decription > > > processes for the different files. How could I determine whether this > is > > > indeed happening? > Can you explain in more detail what you were observing when you said "found that accessing the disk with one process will interrupt the other processes accessing the disk"? The word 'interrupt' is overloaded in computing and what you saw may be a real problem with device support, or it may be completely innocuous, something which you should be ignoring. Philip Guenther
Re: Purpose of primary and secondary user groups
On Sun, Jan 13, 2019 at 6:13 AM Bryan Harris wrote: > Is there also a difference when creating a file in a folder with set GID > bit on that folder and owned by secondary group? I think in normal > behavior, if folder allows a user to create a file (sec. group w/ 770 > perm.) then the new file group will not take the group of the folder but > will take the group of the user's primary group. But if you have set GID > bit then the new file will take the group of the folder it's in (which > will be one of the user's secondary groups). > > I thought in OpenBSD there is also a flag to mount the filesystem to > always do this regardless of set GID but I can't remember. I don't see > it in the man page so maybe with all of this I'm really thinking of > Linux but I can't remember. > Nope. OpenBSD always uses the BSD behavior. The use of the SGID bit on directories to request BSD behavior was an addition in SystemV-based systems when enough of their devs and users yelled at them to Not Be Stupid And Provide the Better Behavior. I'm not sure who or when first added the mount option. Linux certainly has both of those, but is not the only one. Philip Guenther
Re: demystifying trap
On Sat, Jan 12, 2019 at 10:49 AM Predrag Punosevac wrote: > Could one of peple with some rudimental knowledge of kernel interals > tell me what am I seeing here > > Jan 12 13:42:37 oko /bsd: trap [mmonit-bin]89524/427284 type 6: sp > 122488ae75d0 not inside 7f7fffbf4000-7f7f4000 > 'sp' means "stack pointer" in here. The kernel is killing your process because it moved its stack pointer outside the memory which was mapped with MAP_STACK. This is most often seen with userspace thread implementations that haven't been updated to use MAP_STACK when allocating memory for thread stacks. Philip Guenther
Re: Porting some software to OpenBSD
On Sat, Jan 5, 2019 at 7:25 PM Adam Steen wrote: > I have a question about string (printf) formatting. > > I have a variable > > 'uint64_t freq' > > which is printed with > > 'log(DEBUG, "Solo5: clock_init(): freq=%lu\n", freq);' > > but am getting the following error > > ' > error: format specifies type 'unsigned long' but the argument has type > 'uint64_t' (aka 'unsigned long long') [-Werror,-Wformat] > freq); > ^~~~ > 1 error generated. > ' > > The easy fix is to change the format to '%llu', but this brakes FreeBSD > and Linux. Am i missing something or should i be investigating the log > implementation? > Option 1) log(DEBUG, "Solo5: clock_init(): freq=%llu\n", (unsigned long long)freq); Option 2) #include <inttypes.h> log(DEBUG, "Solo5: clock_init(): freq=%"PRIu64"\n", freq); Software native to OpenBSD uses option 1 when necessary. Philip Guenther
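Both options can be tried side by side with plain snprintf(3). This is a sketch only: the function names are invented for the example, and snprintf stands in for the poster's log() call, whose implementation isn't shown in the thread.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Option 1: cast the argument to the type that %llu names. */
int
format_freq_cast(char *buf, size_t len, uint64_t freq)
{
	return snprintf(buf, len, "freq=%llu", (unsigned long long)freq);
}

/* Option 2: let <inttypes.h> supply the right conversion for uint64_t. */
int
format_freq_pri(char *buf, size_t len, uint64_t freq)
{
	return snprintf(buf, len, "freq=%" PRIu64, freq);
}
```

Either form compiles without -Wformat complaints whether uint64_t is unsigned long or unsigned long long on the platform, which is the portability problem the original %lu had.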
Re: Purpose of primary and secondary user groups
On Sat, Dec 29, 2018 at 11:29 AM Ipsen S Ripsbusker < ip...@ripsbusker.no.eu.org> wrote: > Aside from compatibility, what is the purpose of primary groups, > compared to secondary groups? > > Said otherwise, why do we have both primary and secondary groups > rather than only secondary groups? > > Yet another phrasing: Why do I need to set a primary group? > Secondary groups can only be set, all at once, when running as root (e.g., login, sshd), while the primary group can be altered by setgid binaries and then switched among using set*gid(2). For filesystem objects like files and directories, the BSD behavior is for the object to get its group from the directory in which it was created, ignoring the groups of the process that created it. On more SysV-like systems the default is to take the primary group of the process that created it. However, for objects that exist in the kernel but not the filesystem such as pipes, sockets, and SysV shared memory segments, semaphores, and message queues, the common behavior is to take the primary group of the process that created it. This doesn't have much effect other than fstat() for pipes and sockets, but for SysV stuff it affects what operations processes can perform. Philip Guenther
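To see the two kinds of group on a running process, the primary group comes from getgid(2)/getegid(2) and the secondary groups from getgroups(2). A small sketch follows; the function name print_process_groups() is made up for the example.

```c
#include <sys/types.h>

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Hypothetical helper: print the calling process's primary group ids
 * and its secondary (supplementary) group list to 'out'.
 * Returns the number of secondary groups, or -1 on error.
 */
int
print_process_groups(FILE *out)
{
	gid_t *groups;
	int i, n;

	fprintf(out, "primary: real gid %lu, effective gid %lu\n",
	    (unsigned long)getgid(), (unsigned long)getegid());

	/* A size of 0 asks only for the count. */
	n = getgroups(0, NULL);
	if (n == -1)
		return -1;

	groups = calloc(n > 0 ? (size_t)n : 1, sizeof(*groups));
	if (groups == NULL)
		return -1;

	n = getgroups(n, groups);
	for (i = 0; i < n; i++)
		fprintf(out, "secondary[%d]: %lu\n", i,
		    (unsigned long)groups[i]);

	free(groups);
	return n;
}
```

An unprivileged process can only read the secondary list this way; changing it takes setgroups(2), which requires root, matching the point made in the reply.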
Re: I can't make build stable 6.4
On Sat, Dec 22, 2018 at 10:29 AM Krzysztof Strzeszewski wrote: > I change permission: > > chown build /usr/src/lib/libcrypto/obj/v3_info.o.d > chown build /usr/src/lib/libcrypto/obj/v3_info.po.d > chown build /usr/src/lib/libcrypto/obj/v3_info.so.d > chown build /usr/src/lib/libcrypto/obj/v3_purp.so.d > > end it's ok. > > it is a bug end 4 files have bad permission for user root instead build... > Many of us run builds and haven't seen this. The source files for those have been around for over a decade and haven't been updated recently, so this wasn't source files updated in the middle of the builds. Without knowing the exact sequence of operations on this tree it would be hard to diagnose how this happened. "When in doubt, rm -rf /usr/obj/* before building". Philip Guenther
Re: Is HPET timer accessible in userland?
On Thu, Dec 13, 2018 at 4:58 PM Paul Swanson wrote: > Is the HPET timer on AMD64 available to > developers in OpenBSD user land? > No. The CPU TSC is available to userspace. Note you may need to use the RDTSCP instruction on MP boxes (and VMs...) where the TSC is not consistent across CPUs, real or virtual. Philip Guenther
Re: netstat *:* udp sockets
On Thu, Dec 13, 2018 at 10:40 AM Ted Unangst wrote: > netstat -an tells me I am listening to all the udp. > > Active Internet connections (including servers) > Proto Recv-Q Send-Q Local Address Foreign Address (state) > udp 0 0 *.* *.* > udp 0 0 127.0.0.1.53 *.* > udp 0 0 *.* *.* > udp 0 0 *.5353 *.* > udp 0 0 *.* *.* > > What are those *.* sockets doing? How can you listen to all the ports? > Those are just UDP sockets on which connect() hasn't been called and that aren't in the middle of a recvfrom() or recvmsg(), no? > And, perhaps more directly, how would I block this in pf.conf? > Excellent choice, blocking dhclient from receiving the leases that it requests. "What problem are you trying to solve?" Philip Guenther
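A quick way to check that reading is to create a UDP socket, skip bind() and connect(), and ask the kernel for its local address. The sketch below assumes such a socket reports the wildcard address and port 0, which is presumably what netstat renders as *.*; the helper name is invented for the example.

```c
#include <sys/types.h>
#include <sys/socket.h>

#include <netinet/in.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: create a UDP socket without bind() or connect()
 * and store its local address, as reported by getsockname(2), in *sin.
 * Returns 0 on success, -1 on error.
 */
int
unbound_udp_local(struct sockaddr_in *sin)
{
	socklen_t len = sizeof(*sin);
	int s;

	s = socket(AF_INET, SOCK_DGRAM, 0);
	if (s == -1)
		return -1;

	memset(sin, 0, sizeof(*sin));
	if (getsockname(s, (struct sockaddr *)sin, &len) == -1) {
		close(s);
		return -1;
	}
	close(s);
	return 0;
}
```

A socket in this state is not listening on every port; it simply has no local port assigned yet.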
Re: Core Dev?
On Tue, Dec 4, 2018 at 2:47 AM Marc Espie wrote: > (note that Antoine is the 2nd most prolific contributor to OpenBSD in terms > of # of commits) > Sure, Marc, but that's just because Antoine is such a high caliber mole that 22 years and 22k commits in order to backdoor AWS systems that were _clearly_ going to happen is completely believable. Philip Guenther
Re: statethreads crashes in ld on 6.4
On Mon, 3 Dec 2018, Claus Assmann wrote: > Here's the dissambler output and the ktrace output follows. > Unfortunately I don't know enough about this to figure out > what is wrong, hopefully someone else can (or tell me which > other information is still needed). TIA! A close read of the ktrace output points to the problem: ... > 65554 server GIO fd 2 wrote 89 bytes >"[03/Dec/2018:08:28:29] INFO: process 0 (pid 65554): starting 8 > threads on localhost:1234 >" So it's just about to create its eight (userspace) threads... > 65554 server CALL > mmap(0,0x12000,0x3,0x1002,-1,0) > 65554 server RET mmap 21771804393472/0x13cd24aac000 > 65554 server CALL > mmap(0,0x12000,0x3,0x1002,-1,0) > 65554 server RET mmap 21771451404288/0x13cd0fa09000 > 65554 server CALL > mmap(0,0x12000,0x3,0x1002,-1,0) > 65554 server RET mmap 21773345935360/0x13cd808cd000 > 65554 server CALL > mmap(0,0x12000,0x3,0x1002,-1,0) > 65554 server RET mmap 21774756491264/0x13cdd4a03000 > 65554 server CALL > mmap(0,0x12000,0x3,0x1002,-1,0) > 65554 server RET mmap 21774604423168/0x13cdcb8fd000 > 65554 server CALL > mmap(0,0x12000,0x3,0x1002,-1,0) > 65554 server RET mmap 21773142749184/0x13cd74707000 > 65554 server CALL > mmap(0,0x12000,0x3,0x1002,-1,0) > 65554 server RET mmap 21773994246144/0x13cda7314000 > 65554 server CALL > mmap(0,0x12000,0x3,0x1002,-1,0) > 65554 server RET mmap 21774606540800/0x13cdcbb02000 Eight mmaps, presumably one per thread... > 65554 server CALL kbind(0x7f7d4fa8,24,0x8a4abe18ba78cb4a) > 65554 server RET kbind 0 Okay, so this kbind() is by the original thread. The first argument to kbind() happens to be a buffer which is always on the current thread's stack. All is good here. ... 
> 65554 server CALL kbind(0x13cd24abcc48,24,0x8a4abe18ba78cb4a) > 65554 server PSIG SIGSEGV SIG_DFL addr=0x0 trapno=0 > 65554 server NAMI "server.core" And now this kbind() call blows up: the address is not on the original thread's stack but in one of those mmap()s...but those mmap()s were not marked as stacks by including MAP_STACK. To quote the "Security improvements" section of https://www.openbsd.org/64.html * Implemented MAP_STACK option for mmap(2). At pagefaults and syscalls the kernel will check that the stack pointer points to MAP_STACK memory, which mitigates against attacks using stack pivots. To confirm, if you check your dmesg(8) or /var/log/messages you should find the kernel complaining something like syscall [server]65554/### sp 13cd24a## not inside 0x7f7f###-0x7f7f### Philip Guenther
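For a userspace thread library, the change this implies is to add MAP_STACK to the mmap(2) call that allocates each thread stack. The sketch below is not the statethreads source; the helper names are invented, and the size passed in the usage note simply mirrors the 0x12000 seen in the ktrace output.

```c
#define _DEFAULT_SOURCE	/* assumption: only needed on glibc under strict -std= modes */

#include <sys/types.h>
#include <sys/mman.h>

#include <stddef.h>

/*
 * Hypothetical helper: allocate memory for a userspace thread stack.
 * MAP_STACK tells the kernel this region may legitimately hold a
 * stack pointer. Returns NULL on failure.
 */
void *
alloc_thread_stack(size_t size)
{
	void *p;

	p = mmap(NULL, size, PROT_READ | PROT_WRITE,
	    MAP_PRIVATE | MAP_ANON | MAP_STACK, -1, 0);
	return p == MAP_FAILED ? NULL : p;
}

/* Release a stack obtained from alloc_thread_stack(). */
int
free_thread_stack(void *stack, size_t size)
{
	return munmap(stack, size);
}
```

A call such as alloc_thread_stack(0x12000) would replace each of the eight plain anonymous mappings in the trace. Systems that define MAP_STACK but don't enforce it should accept the same call unchanged.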
Re: statethreads crashes in ld on 6.4
On Sun, Dec 2, 2018 at 7:51 PM Edgar Pettijohn wrote: > Sorry just saw it came with some examples. Testing with the `lookupdns' > program > ended with a Bus error (core dumped). Here is gdb output: > > Core was generated by `lookupdns'. > Program terminated with signal SIGBUS, Bus error. > #0 _longjmp () at /usr/src/lib/libc/arch/amd64/gen/_setjmp.S:99 > 99 1: movq%r11,0(%rsp) > (gdb) bt > #0 _longjmp () at /usr/src/lib/libc/arch/amd64/gen/_setjmp.S:99 > Backtrace stopped: Cannot access memory at address 0xb044815db732800f > Crashing on _longjmp() would suggest it's not happy with OpenBSD's setjmp/longjmp XOR cookies, but those have been in for a while. If statethreads were working for Claus with 6.3 then he's hitting something different. Philip
Re: 6.4-release tset(1) really slow, what have I missed?
On Sun, Dec 2, 2018 at 2:15 PM Adam Thompson wrote: > I've successfully installed OpenBSD 6.4-RELEASE at OVH, but I'm noticing > one thing there that's different from everywhere else I've used 6.4. > > tset(1) takes approximately 12-15 seconds to execute, (almost) every > time. > > On a DigitalOcean VPS running 6.3-STABLE (via openup) tset sensibly > takes about 1 or 2 seconds: >athom...@mail.athompso.net:~$ time tset -s >TERM=xterm; >0m01.01s real 0m00.00s user 0m00.01s system >athom...@mail.athompso.net:~$ uname -r >6.3 > > On the OVH VPS running 6.4-STABLE (via openup), the same command takes > 15 seconds: >athom...@mail2.athompso.net:~$ time tset -s >TERM=xterm; >0m15.19s real 0m00.00s user 0m00.01s system >athom...@mail2.athompso.net:~$ uname -r >6.4 > > > That's from two SSH sessions from the same client with the same > parameters. > > I've captured ktrace(1) output, which shows tset(1) doing, well, > nothing: > ... > 57429/443422 tset 0.035908 CALL > kbind(0x7f7f7678,24,0xecf2201fc1aab9ca) > 57429/443422 tset 0.035933 RET kbind 0 > 57429/443422 tset 0.035950 CALL > nanosleep(0x7f7f7760,0x7f7f7750) > 57429/443422 tset 0.035967 STRU struct timespec { 1 } > 57429/443422 tset 15.809238 STRU struct timespec { 0 } > 57429/443422 tset 15.809272 RET nanosleep 0 > 57429/443422 tset 15.809303 CALL > kbind(0x7f7f76c8,24,0xecf2201fc1aab9ca) > 57429/443422 tset 15.809380 RET kbind 0 > ... > > I don't think this is a bug in 6.4, it's clearly environment-specific... > but I have no idea what on earth could be causing it. > It requested a sleep of 1 second and 15 seconds passed. That's a kernel timetracking issue, so the output of "sysctl kern.timecounter" would be a good place to start. Is this an MP kernel using the CPU TSC, but on a VM where the virtual CPU's TSCs aren't in sync? Philip Guenther
Re: statethreads crashes in ld on 6.4
On Sat, Dec 1, 2018 at 6:34 AM Claus Assmann wrote: > statethreads (http://state-threads.sourceforge.net/) crashes on > OpenBSD 6.4/amd64 (release) with an error in ld (see below); it > works fine on previous OpenBSD versions. Do I have to set some > "special" cc/ld options to make this work? That'll depend on what the problem turns out to be, of course... > Or are patches to > statehreads required (there doesn't seem to be a port for it, > otherwise I would try that)? > Not that I know of. > #0 0x0c0b0980db08 in _dl_bind (object=0xc0a85cff400, index=) >from /usr/libexec/ld.so > (gdb) > Since ld.so is relinked on each boot, just an address doesn't really show what died. The disassembly up to that address would help. More important is knowing what signal killed the process. ktracing it and seeing what the syscalls leading up to signal were (and what extra info was in the signal) tells a lot. Philip Guenther
Re: why thread is not usable in perl5 of OpenBSD6.4?
On Sun, Nov 25, 2018 at 1:57 AM 岡本健二 wrote: > I have to use thread on the perl5 of OpenBSD 6.4. > However, it was disabled on the distribution. > Hmm, is this something that worked in previous releases, or is it something that you've only tried in OpenBSD 6.4? Off-hand, it's still disabled by default in the Configure script that perl people ship, and I don't see anything in the OpenBSD bits to override their choice. > I tried to make the thread active to recompile the perl5 with -Dusethreads, > which led me to many test fails. > Were there tests that failed with -Dusethreads that passed when that wasn't used? If so, which, and what was their output? To put it another way: if you're suggesting that we build the base perl with -Dusethreads, what are the consequences of that? Test failures? Bigger binary? pkg_add is slower? > Why the thread function was disabled in this release? > Is it security reason? > Upstream has it off by default, nothing so far has needed it, and it makes things slower (or at least that's what upstream says). Why would we enable it? Philip Guenther
Re: non-interactive sh and SIGTERM
On Fri, Nov 23, 2018 at 1:51 PM Olivier Taïbi wrote: > Sorry about the wrong report, I just tested again and I can see the same > behaviour with OpenBSD 6.4: sending SIGTERM to the sh process after > launching sh -c 'sleep 1000' does not result in sh sending a SIGTERM to > the sleep process. > Hmm, why should it? If you wanted to kill whatever processes were started from that invocation, shouldn't you send SIGTERM to the process group? > Philip, what was your test? > : morgaine; sh -c 'while :; do :; done' & [3] 16632 : morgaine; kill 16632 [3] - Terminated sh -c "while :; do :; done" : morgaine; : morgaine; sh -c 'while :; do sleep 1; done' & [3] 59539 : morgaine; kill 59539 : morgaine; [3] - Terminated sh -c "while :; do sleep 1; done" : morgaine; sh itself doesn't ignore SIGTERM, but rather exits after receiving it. Philip Guenther
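For reference, signalling a whole process group rather than a single pid looks roughly like this in C. It is a sketch, not anything taken from sh(1): terminate_group and demo_group_kill are hypothetical names, and the demo puts the child in its own group only so that the signal cannot reach the caller.

```c
#define _POSIX_C_SOURCE 200809L
#include <sys/types.h>
#include <sys/wait.h>
#include <signal.h>
#include <unistd.h>

/* Send SIGTERM to every process in the group pgid (negative pid form). */
int
terminate_group(pid_t pgid)
{
	return kill(-pgid, SIGTERM);
}

/*
 * Demo: start a child in its own process group, signal the group,
 * and return the signal that ended the child (or -1 on failure).
 */
int
demo_group_kill(void)
{
	int status;
	pid_t pid = fork();

	if (pid == -1)
		return -1;
	if (pid == 0) {
		setpgid(0, 0);		/* become leader of a new group */
		for (;;)
			pause();	/* wait to be signalled */
	}
	/* Set it from the parent too, so there is no race with the child. */
	setpgid(pid, pid);
	if (terminate_group(pid) == -1)
		return -1;
	if (waitpid(pid, &status, 0) == -1)
		return -1;
	return WIFSIGNALED(status) ? WTERMSIG(status) : -1;
}
```

From a shell the equivalent is kill with a negative process group ID, which reaches the sleep as well as the sh that started it.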
Re: non-interactive sh and SIGTERM
On Thu, Nov 22, 2018 at 3:08 PM Olivier Taïbi wrote: > It seems that non-interactive sh(1) (i.e. sh -c command or sh file) > ignores the TERM signal. I'm surprised, is this the intended behaviour? > The man page says that interactive shells will ignore SIGTERM, but does > not mention the non-interactive case. > In my quick test it doesn't ignore SIGTERM, so you'll need to provide additional information for us to help you. Philip Guenther
Re: FreeBSD in vmm
On Tue, Nov 20, 2018 at 6:29 PM Ken M wrote: > Has anyone gotten this working? > > Just trying it as an experiment. > > I installed using qemu, serial console is working but when I boot through > vmctl > the console shows a supervisor read error, page not found which from what > I read > is indicative of bad memory. In qemu it boots fine though. Not sure what I > am > missing. > Not supported yet. There will be some sort of announcement when it works. Philip Guenther
Re: CURRENT userland does not compile due to games/glorkz
On Mon, Nov 12, 2018 at 2:41 AM Jyri Hovila [Turvamies.fi] < jyri.hov...@turvamies.fi> wrote: > > It's not a shortcut, > > This, as many things in this world, completely depend on the point of view. > > One can not simply say "this is this" or "this is not this", without > sufficient background information and overall understanding of the > situation as a whole. > ...which you didn't include. As the line from diagnostic medicine goes "hear hoofbeats? expect horses, not zebras". Failure to mention why your case is unusual suggests that you're a "normal case" -current follower, not someone who has an undisclosed reason for never using snaps. If you're uninterested in what you (now?) know to be the normal answer, you should say that so everyone's time can be saved. It also means that you need to crank up your debugging and analysis, so you can work through these things yourself. What failed? Why? What does that imply? What can you change to resolve it? To avoid it? To undo it? If, when you build -current, you also build a release, do you still have the files from your last successful build so you can roll back to something you accept? Do you have a test machine on which you can use your own snap to run through the source update multiple times to experiment with solutions? "What problem are you trying to solve?" ... > > It's fine if you want to waste your own time, but this is the > > one single method of getting out of many holes, like yours. > > It is also perfectly fine if you want to ignore how the real world > functions, and/or give a super irritating / dislikable impression of > yourself and your personality. To give you back just a little, it certainly > seems you know your holes well enough. > That was an unpleasant turn. Philip Guenther
Re: heap full during amd64 boot.
On Sat, Nov 3, 2018 at 12:13 PM Angelo Rossi wrote: > First of all you can't endorse me with services I cannot fullfill just > because you're lowly sense of humor told this. What is the effect (or goal, for that matter) of sharing this information with a broad base of users? Surely it's for more people to have and be able to use that knowledge, no? Meanwhile there have been multiple threads between bugs@ and misc@ where people reported such single-filesystem setups as having problems and were told "don't do that; it's a bad idea; use a normal multi-FS setup". If supporting such setups wasn't your goal, it was not clear what your goal was from your original message. > Then if the right behaviour > for bootloader is to give error on this broken configuration it follows > that the i386 arch is broiken since it permits to boot from such "crazy" > partitioning scheme. For my longer explanation of resource limitations at the bootloader and how that interacts with testing the envelope, see here: https://marc.info/?l=openbsd-misc=154053727724928=3 Given that background, you should understand that adding extra checks to the bootloader to detect and give a clearer error message for these "crazy" setups will actually break *more* of them! There are trade-offs: we made the changes we did because we thought they were worth the cost. Breaking more systems just to tell people clearly that their setups are unwise seems like a bad trade-off to me, but maybe it's the Right Thing if it'll eliminate these threads. Philip Guenther
Re: heap full during amd64 boot.
On Sat, Nov 3, 2018 at 7:59 AM Angelo Rossi wrote: > When using a=/ and b=swap partitioning scheme on installation or upgrading > from 6.3 with same partitioning scheme the default HEAP_SIZE=0xA > generates heap full error during boot in 6.4 amd64. To solve this I > installed the -stable soiurces from AnonCVS, and changed HEAP_SIZE from > 0xA to 0xC (the machine I tested an needs 700687 B so it should > work with an > HEAP_SIZE=0xB) in the file > /usr/src/sys/arch/amd64/stand/Makefile.inc . Then I compiled and installed > the new bootloader. > ...and thereby continued to run a badly configured system, while providing information to others on how to do so as well. HEY EVERYONE USING A SINGLE PARTITION: YOU CAN GET SUPPORT FROM < angelo.rossi.home...@gmail.com>.
Re: How effectiate login.conf changes in console? ("ksh -l" does not)
On Mon, Oct 29, 2018 at 9:19 PM Joseph Mayer wrote: > On Tuesday, October 30, 2018 1:56 PM, Philip Guenther > wrote: > > On Mon, Oct 29, 2018 at 8:40 PM Joseph Mayer joseph.ma...@protonmail.com > > wrote: > > > > > After having changed /etc/login.conf I'd like to effectuate the > > > changes directly in the console, without doing a logout-relogin > > > cycle. > > > Running "ksh -l" does not effectuate login.conf changes but only > > > re-runs the profile script [1]. > > > Running "login" asks for username and password which seems less > > > efficient than possible. > > > Is there any way to do this? > > > > Since changes to login.conf may mean raising/increasing hard limits, > which > > can only be done by privileged processes, the only sure fire way to have > > login.conf changes take effect is to logout and log back in. > ... > What about "su -l" [1]? > ... > If I'm root and do "su -l", will root's login.conf settings be applied? > > su.c [2] uses setusercontext() [3], and because emlogin is 0, > LOGIN_SETRESOURCES is specified as flag, and so is LOGIN_SETUMASK - > meaning login.conf settings are indeed effectuated by root doing > "su -l" (relogin as root) or "su -l someuser" (login as someuser). > > Correct? > I guess? Frankly, this is not an area I really care about: if I wanted to test a login.conf change I would either logout/login if I had the password for the account, or I suppose if it came up that I didn't, "su -c class" as root. If 'su -l' works for you in your testing (you did test it, yes?), then use it. Philip Guenther
Re: How effectiate login.conf changes in console? ("ksh -l" does not)
On Mon, Oct 29, 2018 at 8:40 PM Joseph Mayer wrote: > After having changed /etc/login.conf I'd like to effectuate the > changes directly in the console, without doing a logout-relogin > cycle. > > Running "ksh -l" does *not* effectuate login.conf changes but only > re-runs the profile script [1]. > > Running "login" asks for username and password which seems less > efficient than possible. > > Is there any way to do this? Since changes to login.conf may mean raising/increasing hard limits, which can only be done by privileged processes, the only sure fire way to have login.conf changes take effect is to logout and log back in. Philip Guenther
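The point about hard limits can be seen with plain getrlimit(2)/setrlimit(2), independent of login.conf. The following is a minimal sketch under that assumption; try_raise_hard_limit is a hypothetical helper name, RLIMIT_CORE is picked only because lowering it is harmless, and the calling process is left with its hard core-size limit reduced by one if the raise is refused.

```c
#include <sys/resource.h>
#include <errno.h>

/*
 * Lower the hard RLIMIT_CORE limit by one, then try to raise it back.
 * Returns 0 if the raise succeeded, the errno value if it was refused,
 * or -1 if the setup steps failed unexpectedly.
 */
int
try_raise_hard_limit(void)
{
	struct rlimit lim;

	if (getrlimit(RLIMIT_CORE, &lim) == -1)
		return -1;

	/* Any process may lower its own hard limit. */
	if (lim.rlim_max > 0)
		lim.rlim_max--;
	if (lim.rlim_cur > lim.rlim_max)
		lim.rlim_cur = lim.rlim_max;
	if (setrlimit(RLIMIT_CORE, &lim) == -1)
		return -1;

	/* Raising it again is what needs privilege. */
	lim.rlim_max++;
	if (setrlimit(RLIMIT_CORE, &lim) == -1)
		return errno;
	return 0;
}
```

Run as an ordinary user this is expected to return EPERM, which is why an already-running unprivileged shell cannot simply re-apply a login.conf that raises a hard limit.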
Re: Dell PowerEdge R410 not booting 6.4
On Thu, Oct 25, 2018 at 8:44 PM diego righi wrote: > So why openbsd 6.4 i386 and amd64 bootloaders (not biosboot, boot!) > express different behavior? Wasn't openbsd about correctness? :/ > If I'm wrong and it is documented that I can't do this fine, but so also > i386 should not work, this behavior is just strange for me, that's it. > This is something that most people, perhaps even most software developers, are not strongly aware of: that resource requirements are often both fine-grained and sharp-edged. That is: the exact requirements can vary in many fine steps between systems, but there can be a sharp edge at which performance plummets badly or the system totally fails. This is true of *many* systems (including lots of cloud services) which work just fine until they *suddenly* fail, because the memory straw broke the available RAM camel's back, or the micro-service is now taking just _longer_ to service one request than the inter-request arrival interval so the queue grows in latency past the user/system tolerance. Case in point: the memory resources required by the biosboot code depend on many factors including: * the size of the root partition * the block size of the root partition (which is affected by the size) * the inode number of the kernel being booted * the exact disk block numbers which the booted kernel was put in We all, the developers and the community of users who actively test -current kernels (THANK YOU!) exercise various combinations of those, the *vast* majority of which use the recommended system layout. That recommended layout doesn't push the first two of those items at all, and keeps the third and fourth in sane ranges. Meanwhile, those using monster root partitions have unknowingly been pushing the memory usage by biosboot, but below its limits. So, some change was made during the 6.3->6.4 dev cycle which requires _slightly_ more memory in biosboot. 
Maybe it was something about the compiler upgrades, or maybe it was the SoftRaid crypto passphrase-retry change. Or maybe it was the tiny tweak of making biosboot default the console to com0 @ 115200 on VMs. Something made biosboot take more memory...and now those systems with monster root partitions were pushed over the edge of how much memory biosboot has available. Rule of thumb: the costs must be worth the gain. So: * enhancements and fixes break systems that don't get actively tested * we are not going to block enhancements/fixes because of that * we test what we recommend, on many systems * if a change breaks the recommended config, then it'll get reverted/fixed * ...this is more likely the more quickly the problem is reported * ...and even then the recommendation for the future might change * we also test some systems that go beyond those recommendations... * ...but not all * if a system that doesn't follow the recommendations breaks as the result of a change, the developers will make a judgement about whether the gain of the change is worth the costs. We don't like breaking any config, even unusual ones, but if we think the setup is inadvisable, we'll say so and move on. In this particular case: * the changes to biosboot were in snapshots for MONTHS, but no one reported problems * if you aren't following recommendations, and aren't testing snapshots, then you should be 100% willing to change your configuration on upgrade, 'cause you ain't giving the feedback necessary to keep your unusual config alive * SINGLE PARTITION CONFIGS ARE DUMB, DON'T DO THAT; DON'T BOTHER COMPLAINING, JUST FIX THEM. Philip Guenther
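The "sharp edge" in the micro-service example from this message can be put into numbers with a toy model. This is only an illustrative sketch with made-up names (last_latency and its parameters): a single FIFO server, requests arriving at a fixed interval, each needing a fixed amount of service time.

```c
/*
 * Deterministic single-server FIFO queue: request i arrives at
 * i * interarrival and needs `service` time units of work.
 * Returns the latency (finish - arrival) of the last of n requests.
 */
long
last_latency(long n, long service, long interarrival)
{
	long finish = 0, arrival = 0, i;

	for (i = 0; i < n; i++) {
		arrival = i * interarrival;
		if (finish < arrival)
			finish = arrival;	/* server was idle */
		finish += service;
	}
	return finish - arrival;
}
```

With an arrival interval of 10, a service time of 9 keeps the 1000th request at a latency of 9, while a service time of 11 puts it at 1010 and still climbing. A small step in cost, but on the wrong side of the edge.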
Re: dmesg for edgerouter 6p
On Tue, Oct 23, 2018 at 12:14 PM Holger Glaess wrote: > i upgrade from an native 6.4 beta installation , no problems at all. > To quote the email sent to your local 'root' user after install/upgrade: If you wish to ensure that OpenBSD runs better on your machines, please do us a favor (after you have your mail system configured!) and type something like: # (dmesg; sysctl hw.sensors) | \ mail -s "Sony VAIO 505R laptop, apm works OK" dm...@openbsd.org so that we can see what kinds of configurations people are running. As shown, including a bit of information about your machine in the subject or the body can help us even further. We will use this information to improve device driver support in future releases. (Please do this using the supplied GENERIC kernel, not for a custom compiled kernel, unless you're unable to boot the GENERIC kernel. If you have a multi-processor machine, dmesg results of both GENERIC.MP and GENERIC kernels are appreciated.) The device driver information we get from this helps us fix existing drivers. Thank you!
Re: Bootloader failing to install on 2012 Mac Mini (Openbsd 6.4)
On Tue, Oct 23, 2018 at 4:38 PM Liam Wigney wrote: > I've used Openbsd before but my installs have gone smoothly with no issues > and this is really the first time it's been a problem. The install is a > super boring one, it's whole disk Openbsd with the default gpt partition > layout and nothing else special. > > During the install after the sets are successfully installed there's a > notification that the bootloader has failed to install due to mkdir being > called with an invalid argument. All the error messages from installboot from mkdir failing include both the path and the specific error message. Those are included because they're helpful in understanding exactly what failed (and thus what could be wrong). Including the _exact_ and _full_ error message would make it easier to assist. (Ruling out stuff that _didn't_ fail is key to figuring out root causes.) > Some research online said that I should > try to do installboot manually in the subsequent prrompt, so I called > installboot sd0 and got the following error > > installboot: /usr/mdec/biosboot: No such file or directory > Yes, when running from the bsd.rd ramdisk additional arguments are necessary so that installboot can find the files it needs and the disk on which to install them. ...but doing that will just replicate what the upgrade script already did and the error it gave you... At this point, the two pieces of information that would help the most are: 1) the *EXACT AND FULL* error message that the upgrader reported from installboot 2) what your disklabel and partition layout looks like. The output of "df -k" from the ramdisk shell prompt after the upgrade fails would be good, for example, as it has everything mounted under /mnt. Philip Guenther
Re: ath.c -> dmesg -> bug
On Wed, Oct 10, 2018 at 3:35 PM NN wrote: > I try to analyse my dmesg with: > > # dmesg | grep ath0 > > and I can see ERROR message: > > > ath0 device timeout ... > > I have checked "ath.c" file in "/cvs/src/sys/dev/ic/" on stable branch. > > I found this one construction: "--sc->sc_tx_time == 0". Probably it's > meen "0 == 0", > I have made this patch (see in attachment) and now it's working without > any ERROR/WARNING for me. > > Please confirm. > > If my FIX for "ath.c" is correct, please update cvs in new 6.4 Release. > ... > --- ath.c31 Jan 2018 11:27:03 -1.116 > +++ ath.c11 Oct 2018 00:06:54 - > @@ -930,7 +930,7 @@ ath_watchdog(struct ifnet *ifp) > if ((ifp->if_flags & IFF_RUNNING) == 0 || sc->sc_invalid) > return; > if (sc->sc_tx_timer) { > -if (--sc->sc_tx_timer == 0) { > +if (sc->sc_tx_timer == 0) { > This diff cannot be correct: the condition right above it is only true if sc->sc_tx_timer is non-zero, so then testing whether it is _currently_ zero will never be true. That's also the only place sc->sc_tx_timer is decremented, so deleting the '--' disables the timeout. The existing code decrements it and then tests whether it's zero, effectively testing whether sc->sc_tx_timer was exactly 1. Your work to update the driver in the thread from October 5th is a more productive way to address the issues you're experiencing. Philip Guenther
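The difference between the existing test and the proposed one can be shown in isolation. This is a reduced sketch, not the driver code: tick_original and tick_patched are made-up names, and a plain int stands in for sc->sc_tx_timer.

```c
/*
 * One watchdog tick as in the existing code: decrement a running timer
 * and report a timeout (1) when it reaches zero.
 */
int
tick_original(int *timer)
{
	if (*timer) {
		if (--*timer == 0)
			return 1;	/* device timeout */
	}
	return 0;
}

/* The same tick with the '--' removed, as in the proposed diff. */
int
tick_patched(int *timer)
{
	if (*timer) {
		if (*timer == 0)	/* never true: *timer is non-zero here */
			return 1;
	}
	return 0;
}
```

Starting from a timer of 2, tick_original returns 0 and then 1, leaving the timer at 0. tick_patched returns 0 on every call and never changes the timer, so the "device timeout" message goes away only because the check can no longer fire.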