Re: pax and ext2fs

2024-05-17 Thread Philip Guenther
On Thu, May 16, 2024 at 12:08 AM Philip Guenther  wrote:
> On Wed, May 15, 2024 at 1:14 AM Philip Guenther  wrote:
...
>> I think you've managed to hit a spot where the POSIX standard doesn't 
>> provide a way for a program to find the information it needs to do its job 
>> correctly.  I've filed a ticket there
>>    https://austingroupbugs.net/view.php?id=1831
>>
>> We'll see if my understanding of pathconf() is incorrect or if someone has a 
>> great idea for how to get around this...
>
> So yeah, what's needed is pathconfat(2)** but whether this winding loose end 
> ("That poor yak.") merits that much code and surface is yet to be examined 
> deeply.

The fix for this has now been committed, so it'll be in 7.6 and a near
future snapshot.


Philip Guenther



Re: pax and ext2fs

2024-05-17 Thread Philip Guenther
On Thu, May 16, 2024 at 5:33 AM Walter Alejandro Iglesias
 wrote:
>
> On Thu May 16 09:48:45 2024 Philip Guenther wrote:
> > So yeah, what's needed is pathconfat(2)** but whether this winding loose
> > end ("That poor yak.") merits that much code and surface is yet to be
> > examined deeply.
...
> I read what you posted here:
>
>   https://austingroupbugs.net/view.php?id=1831
>
> In the footnote you wrote:
>
>   "(This was encountered when trying to fix a pax implementation's
>   handling of timestamp comparison for -u when the target filesystem had
>   coarser resolution than the source filesystem by using
>   pathconf(_PC_TIMESTAMP_RESOLUTION) on the target path to handle the
>   loss of high-precision time info...but the symlink pointed to a
>   location with high-precision timestamps so it couldn't know to round
>   the times when doing the comparison...)"
>
>
> I did one more experiment.  I removed the offending soft link from my
> hard disk, then I copied the backed-up version of the soft link from the
> ext2 drive back to my system tree.

So you did so and then checked the timestamps on the symlinks using
stat to see how they compared, yes?

>  Now pax (with your patches) doesn't
> insist on re-updating the file,

Sounds like you copied with something like 'cp -p' so the copy has an
mtime with a zero nsecs part, so now they do compare as equal.

> *even after updating the file with
> touch(1)*.

Why would the symlink need to be recopied by pax?  You didn't update
the symlink's timestamps.

> The soft link *still* points to a location with high-precision
> timestamps, but pax does the right job.

Because the symlinks now have the exact same timestamp, one with zero nsecs.

> Intuitively this suggests to me that there is something more than mtime
> precision in this misunderstanding between OpenBSD and ext2 file
> systems.

I think you should check the timestamps on the symlinks at each step
to validate that.
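
One way to make that check concrete is to read the link's own timestamp with lstat(2), which does not follow the symlink. The following is a minimal sketch, not part of pax; the helper names link_mtime and print_link_mtime are made up for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <sys/stat.h>
#include <time.h>

/*
 * Fetch the mtime of the path itself (for a symlink: the link, not its
 * target).  Returns 0 on success, -1 if lstat() fails.
 */
int
link_mtime(const char *path, struct timespec *out)
{
	struct stat sb;

	if (lstat(path, &sb) != 0)
		return -1;
	*out = sb.st_mtim;
	return 0;
}

/* Print "path: seconds.nanoseconds"; returns 0 on success, -1 on error. */
int
print_link_mtime(const char *path)
{
	struct timespec ts;

	if (link_mtime(path, &ts) != 0)
		return -1;
	printf("%s: %lld.%09ld\n", path, (long long)ts.tv_sec, ts.tv_nsec);
	return 0;
}
```

Calling print_link_mtime() on the source link and on the copy after each step would show whether the nanosecond parts differ, which is what decides the comparison.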


> P.S.: I'm curious about the following.  After running the stat command
> here and there, I found *many* files showing that lack of mtime
> granularity spread throughout all my system tree (as a side note: this
> doesn't happen with their ctime and atime.)

The released install tgz files (base75.tgz, etc) use a format where
the contained files all have simple integer mtimes and tar is invoked
with the -p option (required for correct permissions on setuid/gid
files) which makes it also set the mtime on the extracted file to
match what's in the tar file.

ctime is always set from the local clock when the inode is
allocated/updated, so there's no reason for it to always have a zero
nsecs part.

atime is of course updated from the local clock when you, uh, access them.
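
The effect described above can be reproduced in isolation: setting an mtime from a whole-seconds value leaves a zero nanosecond part, whatever the filesystem could store. This is a minimal sketch using utimensat(2); it is not how tar is implemented, and the function names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <sys/stat.h>
#include <time.h>

/*
 * Set the mtime of path to a whole number of seconds, as an extraction
 * from a seconds-only archive would, leaving atime untouched.
 * Returns 0 on success, -1 on error.
 */
int
set_integer_mtime(const char *path, time_t secs)
{
	struct timespec times[2];

	times[0].tv_sec = 0;
	times[0].tv_nsec = UTIME_OMIT;	/* do not touch atime */
	times[1].tv_sec = secs;
	times[1].tv_nsec = 0;		/* no sub-second part */
	return utimensat(AT_FDCWD, path, times, 0);
}

/* Return the nanosecond part of path's mtime, or -1 on error. */
long
mtime_nsec(const char *path)
{
	struct stat sb;

	if (stat(path, &sb) != 0)
		return -1;
	return sb.st_mtim.tv_nsec;
}
```

After set_integer_mtime() succeeds, mtime_nsec() on the same file reports 0, while ctime and atime still come from the local clock.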


Philip Guenther



Re: pax and ext2fs

2024-05-16 Thread Philip Guenther
On Wed, May 15, 2024 at 1:14 AM Philip Guenther  wrote:

> On Tue, May 14, 2024 at 11:59 AM Walter Alejandro Iglesias <
> w...@roquesor.com> wrote:
>
>> Hi Philip,
>>
>> On Tue May 14 19:40:04 2024 Philip Guenther wrote:
>> > If you like, you could try the following patch to pax to more gracefully
>> > handle filesystems with time resolution coarser than nanoseconds.
>>
>> After applying your patch, as I'd done before reporting the issue, I
>> synchronized my home directory to an external ext2fs drive with the
>> command shown by the man page:
>>
>>   $ pax -rw -v -Z -Y source target
>>
>> This time only one file keeps updating again and again, a soft link I
>> have in my ~/bin folder to /usr/local/bin/prename.
>
>
> I think you've managed to hit a spot where the POSIX standard doesn't
> provide a way for a program to find the information it needs to do its job
> correctly.  I've filed a ticket there
>    https://austingroupbugs.net/view.php?id=1831
>
> We'll see if my understanding of pathconf() is incorrect or if someone has
> a great idea for how to get around this...
>

So yeah, what's needed is pathconfat(2)** but whether this winding loose
end ("That poor yak.") merits that much code and surface is yet to be
examined deeply.

Philip Guenther


** or lpathconf(2), but pathconfat(2) is better
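
To illustrate the gap: pathconf(3) resolves symlinks, so when the path is a symlink the answer describes the filesystem of the link's target, and there is no standard call that asks about the link itself. A minimal sketch of the query follows; the wrapper name timestamp_resolution is hypothetical, and the #ifdef is there because not every system defines _PC_TIMESTAMP_RESOLUTION.

```c
#include <unistd.h>

/*
 * Ask for the timestamp resolution, in nanoseconds, of the filesystem
 * holding path.  pathconf() follows symlinks, so for a symlink this
 * reports on the link's target rather than on the link.
 * Returns -1 if the value is unavailable or the name is not defined here.
 */
long
timestamp_resolution(const char *path)
{
#ifdef _PC_TIMESTAMP_RESOLUTION
	return pathconf(path, _PC_TIMESTAMP_RESOLUTION);
#else
	(void)path;
	return -1;
#endif
}
```

A pathconfat(2)-style interface would presumably take a flag meaning "do not follow the final symlink", much as fstatat(2) does with AT_SYMLINK_NOFOLLOW; that is an assumption about its shape, not something specified in this thread.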


Re: pax and ext2fs

2024-05-15 Thread Philip Guenther
On Tue, May 14, 2024 at 11:59 AM Walter Alejandro Iglesias 
wrote:

> Hi Philip,
>
> On Tue May 14 19:40:04 2024 Philip Guenther wrote:
> > If you like, you could try the following patch to pax to more gracefully
> > handle filesystems with time resolution coarser than nanoseconds.
>
> After applying your patch, as I'd done before reporting the issue, I
> synchronized my home directory to an external ext2fs drive with the
> command shown by the man page:
>
>   $ pax -rw -v -Z -Y source target
>
> This time only one file keeps updating again and again, a soft link I
> have in my ~/bin folder to /usr/local/bin/prename.


I think you've managed to hit a spot where the POSIX standard doesn't
provide a way for a program to find the information it needs to do its job
correctly.  I've filed a ticket there
   https://austingroupbugs.net/view.php?id=1831

We'll see if my understanding of pathconf() is incorrect or if someone has
a great idea for how to get around this...


Philip Guenther


Re: viomb0 unable to allocate 256 physmem pages, error 12

2024-05-15 Thread Philip Guenther
viomb is a driver that tries to support OpenBSD, as a VM guest, responding
to a request from the VM host to stop using so much physical memory.  That
log message indicates that the kernel couldn't easily free up that much
physical memory, sorry!  The VM host is, of course, free to decide to just
page out whatever memory it wants instead, possibly resulting in thrashing:
running a VM setup oversubscribed for memory is a great way to be
frustrated and hate computers.

How can you make that message go away?  Provision your VM setup with enough
memory that it's not oversubscribed, or at least so that the OpenBSD
guest(s) isn't the one being asked to slim itself (possibly by giving it
*less* but _reserved_ memory, so that the VM host never tries to shrink
its usage).


Philip Guenther


On Tue, May 14, 2024 at 4:16 PM F Bax  wrote:

> I'm not a coder; but I found source for viomb; which
> calls uvm_pglistalloc; which calls uvm_pmr_getpages which mentions ENOMEM:
>
> https://cvsweb.openbsd.org/cgi-bin/cvsweb/~checkout~/src/sys/uvm/uvm_pmemrange.c?rev=1.66&content-type=text/plain
> There I found this comment:
> * fail if any of these conditions is true:
> * [1]  there really are no free pages, or
> * [2]  only kernel "reserved" pages remain and
> *        the UVM_PLA_USERESERVE flag wasn't used.
> * [3]  only pagedaemon "reserved" pages remain and
> *        the requestor isn't the pagedaemon nor the syncer.
>
> Unsure how I might use this information to get rid of the previously
> mentioned error message..
>
> On Tue, May 14, 2024 at 2:28 PM Peter J. Philipp 
> wrote:
>
>> On Tue, May 14, 2024 at 01:58:18PM -0400, F Bax wrote:
>> > Recently installed 7.5 amd64 in qemu VM (8G RAM) under proxmox. See this
>> > message many times on console and dmesg.
>> >
>> > viomb0 unable to allocate 256 physmem pages, error 12
>> >
>> > What does this mean? How to resolve this issue?
>>
>> Hi,
>>
>> When you see "error " it's good to look up the manpage on errno.
>> Under number 12 it says:  ENOMEM "Cannot Allocate Memory".  But look for
>> yourself for a deeper explanation.  Also if you want to hunt for this
>> errno
>> in the code you would most likely grep for ENOMEM.
>>
>> Best Regards,
>> -pjp
>>
>> --
>> ** all info about me:  lynx https://callpeter.tel, dig loc
>> delphinusdns.org **
>>
>>
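
The errno lookup suggested above can also be done from a program rather than the manual page. This is a small sketch; the helper name errno_text is made up, and the exact message text returned by strerror(3) varies between systems.

```c
#include <errno.h>
#include <string.h>

/*
 * Return the system's description for an errno value such as the
 * 12 (ENOMEM) printed in the viomb0 message.
 */
const char *
errno_text(int code)
{
	return strerror(code);
}
```

Passing 12, or the symbolic ENOMEM, yields the "cannot allocate memory" description, and ENOMEM is the name to grep for in the kernel source.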


Re: pax and ext2fs

2024-05-14 Thread Philip Guenther
If you like, you could try the following patch to pax to more gracefully
handle filesystems with time resolution coarser than nanoseconds.
The whitespace will presumably be mauled by gmail so use patch's -l option.

Philip Guenther


Index: ar_subs.c
===================================================================
RCS file: /data/src/openbsd/src/bin/pax/ar_subs.c,v
diff -u -p -r1.51 ar_subs.c
--- ar_subs.c	10 Jul 2023 16:28:33 -0000	1.51
+++ ar_subs.c	14 May 2024 17:19:15 -0000
@@ -146,23 +146,59 @@ list(void)
 }
 
 static int
-cmp_file_times(int mtime_flag, int ctime_flag, ARCHD *arcn, struct stat *sbp)
+cmp_file_times(int mtime_flag, int ctime_flag, ARCHD *arcn, const char *path)
 {
 	struct stat sb;
+	long res;
 
-	if (sbp == NULL) {
-		if (lstat(arcn->name, &sb) != 0)
-			return (0);
-		sbp = &sb;
+	if (path == NULL)
+		path = arcn->name;
+	if (lstat(path, &sb) != 0)
+		return (0);
+
+	/*
+	 * The target (sb) mtime might be rounded down due to the limitations
+	 * of the FS it's on.  If it's strictly greater or we don't care about
+	 * mtime, then precision doesn't matter, so check those cases first.
+	 */
+	if (ctime_flag && mtime_flag) {
+		if (timespeccmp(&arcn->sb.st_mtim, &sb.st_mtim, <=))
+			return timespeccmp(&arcn->sb.st_ctim, &sb.st_ctim, <=);
+		if (!timespeccmp(&arcn->sb.st_ctim, &sb.st_ctim, <=))
+			return 0;
+		/* <= ctim, but >= mtim */
+	} else if (ctime_flag)
+		return timespeccmp(&arcn->sb.st_ctim, &sb.st_ctim, <=);
+	else if (timespeccmp(&arcn->sb.st_mtim, &sb.st_mtim, <=))
+		return 1;
+
+	/*
+	 * If we got here then the target arcn > sb for mtime *and* that's
+	 * the deciding factor.  Check whether they're equal after rounding
+	 * down the arcn mtime to the precision of the target path.
+	 */
+	res = pathconf(path, _PC_TIMESTAMP_RESOLUTION);
+	if (res == -1)
+		return 0;
+
+	/* nanosecond resolution?  previous comparisons were accurate */
+	if (res == 1)
+		return 0;
+
+	/* common case: second accuracy */
+	if (res == 1000000000)
+		return arcn->sb.st_mtime <= sb.st_mtime;
+
+	if (res < 1000000000) {
+		struct timespec ts = arcn->sb.st_mtim;
+		ts.tv_nsec = (ts.tv_nsec / res) * res;
+		return timespeccmp(&ts, &sb.st_mtim, <=);
+	} else {
+		/* not a POSIX compliant FS */
+		res /= 1000000000;
+		return ((arcn->sb.st_mtime / res) * res) <= sb.st_mtime;
+		return arcn->sb.st_mtime <= ((sb.st_mtime / res) * res);
 	}
-
-	if (ctime_flag && mtime_flag)
-		return (timespeccmp(&arcn->sb.st_mtim, &sbp->st_mtim, <=) &&
-		    timespeccmp(&arcn->sb.st_ctim, &sbp->st_ctim, <=));
-	else if (ctime_flag)
-		return (timespeccmp(&arcn->sb.st_ctim, &sbp->st_ctim, <=));
-	else
-		return (timespeccmp(&arcn->sb.st_mtim, &sbp->st_mtim, <=));
 }
 
 /*
@@ -842,14 +878,12 @@ copy(void)
 			/*
 			 * if existing file is same age or newer skip
 			 */
-			res = lstat(dirbuf, &sb);
-			*dest_pt = '\0';
-
-			if (res == 0) {
+			if (cmp_file_times(uflag, Dflag, arcn, dirbuf)) {
+				*dest_pt = '\0';
 				ftree_skipped_newer(arcn);
-				if (cmp_file_times(uflag, Dflag, arcn, &sb))
-					continue;
+				continue;
 			}
+			*dest_pt = '\0';
 		}
 
 		/*
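
The core idea of the patch, rounding the source mtime down to the target filesystem's resolution before comparing, can be sketched on its own. This is an illustration under stated assumptions rather than the pax code: the function names are hypothetical, the resolution is taken to be in nanoseconds as pathconf(_PC_TIMESTAMP_RESOLUTION) reports it, and timestamps are assumed non-negative.

```c
#include <stdbool.h>
#include <time.h>

static const long nsec_per_sec = 1000000000L;

/* Round ts down to a multiple of res_ns nanoseconds. */
struct timespec
round_down_to_resolution(struct timespec ts, long res_ns)
{
	if (res_ns <= 1)
		return ts;			/* already exact */
	if (res_ns < nsec_per_sec) {
		/* sub-second resolution: trim the nanosecond part */
		ts.tv_nsec = (ts.tv_nsec / res_ns) * res_ns;
	} else {
		/* one second or coarser: trim whole seconds */
		long res_sec = res_ns / nsec_per_sec;

		ts.tv_sec = (ts.tv_sec / res_sec) * res_sec;
		ts.tv_nsec = 0;
	}
	return ts;
}

/*
 * True if src, once rounded to what the target filesystem can store,
 * is not newer than dst, meaning the copy can be skipped.
 */
bool
source_not_newer(struct timespec src, struct timespec dst, long res_ns)
{
	struct timespec r = round_down_to_resolution(src, res_ns);

	if (r.tv_sec != dst.tv_sec)
		return r.tv_sec < dst.tv_sec;
	return r.tv_nsec <= dst.tv_nsec;
}
```

With a source mtime of 100.123456789 and a target that stored 100.000000000, the comparison reports "not newer" at one-second resolution but "newer" at nanosecond resolution, which is the repeated-update behaviour reported in this thread.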

On Thu, May 2, 2024 at 6:54 AM Walter Alejandro Iglesias 
wrote:

> On Thu, 2 May 2024 12:03:10, Stuart Henderson wrote
> > I don't have a suitable filesystem handy to test, but does OpenBSD's
> > implementation of ext2fs support sub-second timestamps?
> >
> > stat -f %Fm $filename
> >
> > If not, that's a probable explanation for the difference in behaviour.
> > You could probably confirm by forcing timestamps with no nanosecond
> > components, e.g. touch -t mmddhhmm.ss $filename, or copy to ext2fs
> > and back again.
>
> $ doas mount -t ext2fs /dev/sd0i /mnt
> $ touch ~/test.txt
> $ cp ~/test.txt /mnt
> $ stat -f %Fm /mnt/test.txt
> 1714657214.0
> $ cp ~/test.txt /mnt
> $ stat -f %Fm /mnt/test.txt
> 1714657409.0
>


Re: pax and ext2fs

2024-05-01 Thread Philip Guenther
On Tue, Apr 30, 2024 at 5:50 AM Walter Alejandro Iglesias 
wrote:

> I'd never used pax(1), reading the man page I found this command can be
> used to make a backup:
>
>   $ pax -r -w -v -Y -Z home /backup
>
> Faster than using rsync indeed, but it seems that the -Y and -Z options
> don't work with ext2fs?
>

It should work the same as on ffs, but since you put zero effort into
describing _how_ its behavior didn't match your expectations, I wouldn't
expect anyone to put more than zero effort in reading your mind.

Good luck!


Philip Guenther


Re: Getting "Boot error" after replacing a disk in softraid

2024-04-23 Thread Philip Guenther
RAID replicates the data in the RAIDed area, yes?

Do you have some reason to believe that the boot information (MBR, etc) is
_inside_ the RAID area, because I do not believe that.  Really feels like
installboot needs to be run on this drive to, uh, install the proper boot
info.


Philip Guenther


On Tue, Apr 23, 2024 at 8:19 AM  wrote:

> Also, if I boot from a USB stick, with only the new SSD attached, the
> softraid is registered as degraded (as the other old disk is missing), so
> it has been populated, and the partition is also marked with an asterisk
> for boot, but I still cannot boot from that drive.
>
>


Re: AAAA entry for openbsd.org

2023-10-22 Thread Philip Guenther
On Sun, Oct 22, 2023 at 6:53 PM Armin Jenewein  wrote:

> Hi.
>
> On 23-10-22 15:47:45, Kastus Shchuka wrote:
> > On Sun, Oct 22, 2023 at 10:29:08PM +0200, Armin Jenewein wrote:
> > > Hi,
> > >
> > > as I'm almost 100% sure adding IPv6 connectivity to the openbsd.org
> > > host
> > > wouldn't introduce side-effects for IPv4 users: is there any reason
> > > openbsd.org still has no AAAA entry at the end of 2023?
> >
> > Why do you need it?
>
> Because it's extremely inconvenient to have to manually type in the name of
> a mirror that I know has an AAAA entry. The installer won't even be able
> to download the mirror list because of the reason I mentioned. It tries
> to talk to openbsd.org which obviously fails.


See, this is why being clear about What Fine Problem You're Trying To Solve
is important: AFAICT the installer tries to fetch the mirror list from
ftplist1.openbsd.org and not from openbsd.org.

Can you confirm that your _actual_ request is to have the installer be able
to get the mirror list when on an IPv6-only host?

(Please don't rant at people who try to help, particularly when doing
exactly what you requested would NOT HAVE HELPED, unless you *want* people
to drop you in their kill-file as "not worth trying to help".)


Philip Guenther


Re: ImageMagick fails on OpenBSD 7.4 fresh install

2023-10-22 Thread Philip Guenther
Ah, sorry for my misreading what you wrote.

Please use 'sendbug' to report the sequence of pkg_add operations that
didn't work.


Philip Guenther


On Sun, Oct 22, 2023 at 5:50 PM Mark  wrote:

> It wasn't an upgraded system, that's fresh install, a completely new
> OpenBSD 7.4 amd64.
>
> And the first package I wanted to install, was imagick. And it failed as on
> the screenshot image link.
>
> However, installing the "gtk-update-icon-cache" package, and after, pkg_add
> imagick solved the problem.
>
> That was the suggestion of "quinq", from IRC #openbsd. ("can you try
> installing that package, and then ImageMagick?")
>
> On Mon, Oct 23, 2023 at 2:54 AM Philip Guenther wrote:
>
> > Don't know what's wrong with the pkg database (/var/db/pkg/) on your
> > system, but on mine the shared-mime-info-2.2 package includes a
> > definition for the update-mime-info tag, so if yours lacks that then
> > something in there got hosed during your upgrade.  Could be data loss
> > from disk failure, could be something pruned critical info from
> > /var/db/pkg/, could be something I can't think of.
> >
> > So, I would suggest starting with verifying your confidence in your
> > storage (no kernel log error messages about I/O errors?  If this machine
> > has suffered any file system issues then maybe backup, verify-backup,
> > newfs and restore?)
> >
> > Then I would probably reinstall *all* packages, but since I don't (fully)
> > trust the pkg database, I would probably do it with these steps:
> > 1) pkg_info -mz > manual
> > 2) cd /var/db/pkg && pkg_delete *
> > 3) make sure nothing unexpected has been left behind in /var/db/pkg/ or
> > /usr/local/*
> > 4) pkg_add -l manual
> >
> >
> > Or maybe now's a good time to do a fresh install.  
> >
> >
> > Philip Guenther
> >
> >
> > On Sun, Oct 22, 2023 at 3:34 PM Mark  wrote:
> >
> >> Tried changing the installurl to another mirror, but it didn't help.
> >>
> >> Here's what actually happens;
> >>
> >> https://i.ibb.co/G0wbGf5/terminal-sshot.png
> >>
> >> Regards.
> >>
> >> On Mon, Oct 23, 2023 at 1:16 AM Mark wrote:
> >>
> >> > pkg_add ImageMagick-6.9.12.88p0 gives me;
> >> >
> >> > (after fetching few libraries)
> >> >
> >> > "Can't install ImageMagick-6.9.12.88p0: can't resolve
> >> > djvulibre-3.5.28p1,libheif-1.16.2p0"
> >> >
> >> > and then;
> >> > "Couldn't install ImageMagick-6.9.12.88p0 djvulibre-3.5.28p1
> >> > libheif-1.16.2p0."
> >> >
> >> > This is a fresh OpenBSD 7.4 amd64 release. My installurl is pointed to
> >> > cdn.openbsd.org/pub/OpenBSD.
> >> >
> >> > Any other php packages were installed fine. But both
> >> > pecl80-imagick-3.7.0p1 and ImageMagick fail.
> >> >
> >> > Some idea would be much appreciated!
> >> >
> >> > Regards.
> >> >
> >>
> >
>


Re: X session doesn't survive zzz

2023-10-22 Thread Philip Guenther
I would start by removing X from the picture and verify that suspend and
resume are working (or not) when X is not running.  Are USB devices failing
to reattach or coming back in some weird mode which isn't working?  Can you
ssh in?

If that's working fine, then bring X back into the picture but capture
/var/log/Xorg.0.log both before suspending and then after resuming (ssh in
if necessary) and see what X is falling over on.


Philip Guenther


On Wed, Oct 18, 2023 at 4:17 AM Jan Stary  wrote:

> On Oct 18 11:11:54, h...@stare.cz wrote:
> > This is current/amd64 on a PC (dmesg below).
> > After a resume from zzz inside a running X session,
> > I am greeted with the xenodm login screen
> > into which I cannot login: the keyboard does nothing
> > (is it the USB keyboard not reattaching properly?).
> >
> > Logging in on the console,
>
> To be clear: typing the username and passwd
> into the xenodm login screen does nothing,
> but on the console the kbd works as expected.
>
> > I see that the X session
> > and the X applications (firefox, xterms) are dead.
> > On the other hand, the mplayer that has been zzz'ed
> > inside a tmux session starts playing again.
> >
> > After restarting xenodm with rcctl restart xenodm,
> > I can log in and everything seems to work again.
> >
> > See the dmesg below, including the zzz and resume,
> > and the full X log up to here. How can I debug this?
> >
> >   Jan
> >
> >
> > OpenBSD 7.4-current (GENERIC.MP) #1406: Sun Oct 15 10:34:05 MDT 2023
> > dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> > real mem = 8285454336 (7901MB)
> > avail mem = 8014598144 (7643MB)
> > random: good seed from bootblocks
> > mpath0 at root
> > scsibus0 at mpath0: 256 targets
> > mainbus0 at root
> > bios0 at mainbus0: SMBIOS rev. 2.4 @ 0xf0100 (36 entries)
> > bios0: vendor Award Software International, Inc. version "F3" date
> 03/31/2011
> > bios0: Gigabyte Technology Co., Ltd. H67MA-USB3-B3
> > acpi0 at bios0: ACPI 1.0
> > acpi0: sleep states S0 S3 S4 S5
> > acpi0: tables DSDT FACP HPET MCFG ASPT SSPT EUDS MATS TAMG APIC SSDT
> > acpi0: wakeup devices PCI0(S5) PEX0(S5) PEX1(S5) PEX2(S5) PEX3(S5)
> PEX4(S5) PEX5(S5) PEX6(S5) PEX7(S5) HUB0(S5) UAR1(S3) USBE(S3) USE2(S3)
> AZAL(S5)
> > acpitimer0 at acpi0: 3579545 Hz, 24 bits
> > acpihpet0 at acpi0: 14318179 Hz
> > acpimcfg0 at acpi0
> > acpimcfg0: addr 0xf400, bus 0-63
> > acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
> > cpu0 at mainbus0: apid 0 (boot processor)
> > cpu0: Intel(R) Core(TM) i7-2600 CPU @ 3.40GHz, 3492.09 MHz, 06-2a-07,
> patch 002f
> > cpu0:
> FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,POPCNT,DEADLINE,AES,XSAVE,AVX,NXE,RDTSCP,LONG,LAHF,PERF,ITSC,MD_CLEAR,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,MELTDOWN
> > cpu0: 32KB 64b/line 8-way D-cache, 32KB 64b/line 8-way I-cache, 256KB
> 64b/line 8-way L2 cache, 8MB 64b/line 16-way L3 cache
> > cpu0: smt 0, core 0, package 0
> > mtrr: Pentium Pro MTRR support, 10 var ranges, 88 fixed ranges
> > cpu0: apic clock running at 99MHz
> > cpu0: mwait min=64, max=64, C-substates=0.2.1.1, IBE
> > cpu1 at mainbus0: apid 2 (application processor)
> > cpu1: Intel(R) Core(TM) i7-2600 CPU @ 3.40GHz, 3492.12 MHz, 06-2a-07,
> patch 002f
> > cpu1:
> FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,POPCNT,DEADLINE,AES,XSAVE,AVX,NXE,RDTSCP,LONG,LAHF,PERF,ITSC,MD_CLEAR,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,MELTDOWN
> > cpu1: 32KB 64b/line 8-way D-cache, 32KB 64b/line 8-way I-cache, 256KB
> 64b/line 8-way L2 cache, 8MB 64b/line 16-way L3 cache
> > cpu1: smt 0, core 1, package 0
> > cpu2 at mainbus0: apid 4 (application processor)
> > cpu2: Intel(R) Core(TM) i7-2600 CPU @ 3.40GHz, 3492.19 MHz, 06-2a-07,
> patch 002f
> > cpu2:
> FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,x2APIC,POPCNT,DEADLINE,AES,XSAVE,AVX,NXE,RDTSCP,LONG,LAHF,PERF,ITSC,MD_CLEAR,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,MELTDOWN
> > cpu2: 32KB 64b/line 8-way D-cache, 32KB 64b/line 8-way I-cache, 256KB
> 64b/line 8-way L2 cache, 8MB 64b/line 16-way L3 cache
> > cpu2: smt 0, cor

Re: ImageMagick fails on OpenBSD 7.4 fresh install

2023-10-22 Thread Philip Guenther
Don't know what's wrong with the pkg database (/var/db/pkg/) on your
system, but on mine the shared-mime-info-2.2 package includes a definition
for the update-mime-info tag, so if yours lacks that then something in
there got hosed during your upgrade.  Could be data loss from disk failure,
could be something pruned critical info from /var/db/pkg/, could be
something I can't think of.

So, I would suggest starting with verifying your confidence in your storage
(no kernel log error messages about I/O errors?  If this machine has
suffered any file system issues then maybe backup, verify-backup, newfs and
restore?)

Then I would probably reinstall *all* packages, but since I don't (fully)
trust the pkg database, I would probably do it with these steps:
1) pkg_info -mz > manual
2) cd /var/db/pkg && pkg_delete *
3) make sure nothing unexpected has been left behind in /var/db/pkg/ or
/usr/local/*
4) pkg_add -l manual


Or maybe now's a good time to do a fresh install.  


Philip Guenther


On Sun, Oct 22, 2023 at 3:34 PM Mark  wrote:

> Tried changing the installurl to another mirror, but it didn't help.
>
> Here's what actually happens;
>
> https://i.ibb.co/G0wbGf5/terminal-sshot.png
>
> Regards.
>
> On Mon, Oct 23, 2023 at 1:16 AM Mark wrote:
>
> > pkg_add ImageMagick-6.9.12.88p0 gives me;
> >
> > (after fetching few libraries)
> >
> > "Can't install ImageMagick-6.9.12.88p0: can't resolve
> > djvulibre-3.5.28p1,libheif-1.16.2p0"
> >
> > and then;
> > "Couldn't install ImageMagick-6.9.12.88p0 djvulibre-3.5.28p1
> > libheif-1.16.2p0."
> >
> > This is a fresh OpenBSD 7.4 amd64 release. My installurl is pointed to
> > cdn.openbsd.org/pub/OpenBSD.
> >
> > Any other php packages were installed fine. But both
> > pecl80-imagick-3.7.0p1 and ImageMagick fail.
> >
> > Some idea would be much appreciated!
> >
> > Regards.
> >
>


Re: Delay in starting xterm via ssh after upgrade from 7.3 to 7.4

2023-10-22 Thread Philip Guenther
If this had been observed _during_ 7.4 development then it would have been
simpler to isolate what set of changes caused it.  Since that didn't happen
you'll have to debug this yourself on the affected systems.  For starters,
I would suggest turning up ssh logging with the -v option and capturing
that to a file and comparing the output on working and not working
systems.  Or ktrace the stuttering processes and see what kdump -T output
shows as the operations where the delays occurred.


As for your "should I have never been doing these this way?" question,
that's unanswerable without knowing _why_ you had written them that way.
Using -Y instead of -X to disable XSecurity enforcement?  Why tunnel X
instead of have the remote client connect directly to the X server?  You
wrote those to solve some problem, changing that means going back and
reopening that question, which is probably a distraction from the "why did
the latency change" question.



On Sun, Oct 22, 2023 at 7:22 AM Roger Marsh  wrote:

> On Thu, 19 Oct 2023 17:23:47 +
> Roger Marsh  wrote:
>
> > Hi,
> >
> > After upgrade from 7.3 to 7.4 (on both boxes) the xterm session for this
> > entry in .fvwmrc (on monitor):
> >
> > 'Exec exec ssh -Y opendev xterm -title roger@opendev'
> >
> > takes several seconds to deliver the xterm window, while I did not
> > notice any delay before upgrade.
> >
> > For other usernames on opendev the .fvwmrc entry is like (without the
> > '-X' for most usernames other than grading):
> >
> > 'Exec exec xterm -title grading@opendev -e ssh -X grading@opendev'
> >
> > and I do not notice any delay after upgrade compared with before upgrade.
> >
> > Expressing the 'roger@opendev' entry as:
> >
> > 'Exec exec xterm -title roger@opendev -e ssh -Y roger@opendev'
> >
> > fixes the delay problem, but was the delay a predictable consequence of
> > some change?  Or perhaps the entry should never have been expressed in the
> > way that led to the delay?
> >
> > Below are dmesg and pkg_info for both boxes involved.
> >
> > Roger
>
> ...
> dmesg and pkg_info for monitor and opendev snipped.
> ...
>
> Hi,
>
> Later I saw opening files with Python's Idle editor suffers the same
> pattern of slow response, in terms of serving up the file edit window, as
> seen with xterm.  Scrolling through an editor window is slower too, and
> stutters, compared with what was seen when both boxes were at 7.3 (PgUp and
> PgDn buttons are what I used).
>
> One box (gash) had not been upgraded to 7.4 (because I thought it did not
> have OpenBSD disks).  It was modified, in particular adding Python Idle and
> Chromium, to see what happens when 7.3 has the Xserver role and 7.4 the
> Xclient role; and the other way round.
>
>                              Idle
> Xserver   Xclient   Display file window   Scrolling
>   7.4       7.3     slow                  stutter
>   7.3       7.4     quick                 smooth
>   7.4       7.4     slow                  stutter
>   7.3       7.3     quick                 smooth (from memory: confirmed on reverting)
>    Same 7.4 box     quick                 smooth
>
> Idle is started by 'Exec exec ssh -Y  idle3.10' in .fvwmrc file.
> Chromium is started by 'Exec exec ssh -X @ chrome' in
> .fvwmrc file.
>
> This behaviour with Python persuades me to revert the OpenBSD 7.4 box
> (monitor) in the Xserver role to 7.3 until 7.4 or later provides more
> acceptable response times.
>
> Chromium seemed unaffected except for slow response when typing in the URL
> bar on the separate 7.4 Xserver box.  I thought I could mostly avoid this
> by starting to use bookmarks, but the effect on Python matters more.
>
> Apologies for going off-topic by discussing Python and Chromium rather
> than xterm: but the Python stuff changes my attitude to the problem from
> minor annoyance to something which needs an immediate workaround.
>
> Below are dmesg (most recent reboot only) and pkg_info for the OpenBSD 7.3
> box (gash).
>
> Roger
>
> Script started on Sat Oct 21 17:09:38 2023
> gash$ dmesg
> syncing disks... done
> rebooting...
> OpenBSD 7.3 (GENERIC.MP) #1125: Sat Mar 25 10:36:29 MDT 2023
> dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> real mem = 3967422464 (3783MB)
> avail mem = 3827781632 (3650MB)
> random: good seed from bootblocks
> mpath0 at root
> scsibus0 at mpath0: 256 targets
> mainbus0 at root
> bios0 at mainbus0: SMBIOS rev. 2.5 @ 0xe9f80 (85 entries)
> bios0: vendor Hewlett-Packard version "786G1 v01.16" date 03/05/2009
> bios0: Hewlett-Packard HP Compaq dc7900 Small Form Factor
> acpi0 at bios0: ACPI 1.0
> acpi0: sleep states S0 S3 S4 S5
> acpi0: tables DSDT FACP APIC ASF! MCFG TCPA SLIC HPET DMAR
> acpi0: wakeup devices PCI0(S4) PEG1(S4) PEG2(S4) IGBE(S4) PCX1(S4)
> PCX2(S4) PCX5(S4) PCX6(S4) HUB_(S4) USB1(S3) USB2(S3) USB3(S3) USB4(S3)
> USB5(S3) USB6(S3) EUS1(S3) [...]
> acpitimer0 at acpi0: 3579545 Hz, 24 bits
> acpimadt0 at acpi0 addr 0xfee0: 

Re: Crash on TOSHIBA PORTEGE Z30-A laptop

2023-10-22 Thread Philip Guenther
On Sat, Oct 21, 2023 at 2:27 AM  wrote:

> Hi Philip,
>
> Thank you very much for your answer.
>
> I tried to disable all options (+devices) possible. Same issue.
> And what about disabling acpi in the kernel using bsd.re-config?
>

As Mike and Theo noted, this will certainly cause problems.



> Do you think that replacing the wireless card with something else could
> resolve this issue?
>

Very unlikely.  The problem is the stack depth of the ACPI processing.  The
crash you saw had the wifi interrupt occur during the ACPI processing but
it could just as well happen with some other device interrupting the ACPI
processing.

If there isn't a newer BIOS that resolves this, I would tend to return the
box as not suitable.


Philip Guenther


Re: Crash on TOSHIBA PORTEGE Z30-A laptop

2023-10-20 Thread Philip Guenther
On Fri, Oct 20, 2023 at 1:23 PM  wrote:

> I've recently installed OpenBSD 7.4 on this laptop.
>
> However, I'm experiencing random crashes. These occur at various times,
> including during kernel loading (before running /etc/rc),
>
> or later while I'm using the system.
>
>
> I've included the contents of /var/run/dmesg.boot below and attached the
> screens with the ddb output command.
>
...

> bios0: vendor TOSHIBA version "Version 4.30" date 04/26/2018
>

The screenshots show that the fault happens during a wifi interrupt that
catches the ACPI thread processing very deeply nested AML code.  I
suspect it's actually running out of kernel stack space as a result.
Everything below is based on that hypothesis.

So, the first thing to try is to see if there's a BIOS update newer than
the 2018 rev it currently has.  They may have optimized the AML code, or at
least made it less deeply nested.

Another possibility is to see if there's a device you can disable that
would result in that AML not being called.  If there's anything that you
aren't using then disable it in the BIOS and hope.

The last possibility would be to build a kernel which allocates more pages
per thread for its kernel stack by bumping the UPAGES #define
in /usr/src/sys/arch/amd64/include/param.h and building a new kernel.  It's
really only the ACPI thread that needs this, but we don't currently have
code to control that on a per-thread basis.


Philip Guenther


Re: reorder_kernel: failed

2023-10-17 Thread Philip Guenther
On Tue, Oct 17, 2023 at 10:34 AM Karel Lucas  wrote:

> Content of relink.log:
>
> (SHA256) /bsd: OK
> LD="ld" sh makegap.sh 0x gapdummy.o
> ld  -T  ld.script -X  --warn-common -nopie -o newbsd ${SYSTEM_HEAD}
> vers.o ${OBJS}
> text     data    bss      dec       hex
> 21325291 403432  1241088  22969811  15e7dd3
> mv newbsd newbsd.gdb
> ctfstrip -S -o newbsd  newbsd.gdb
> rm -f bsd.gdb
> mv -f newbsd bsd
> install -F -m 700 bsd /bsd && sha256 -h /var/db/kernel.SHA256 /bsd
> install: rename: INS@4erJJ3bo3 to /bsd: Operation not permitted
> *** Error 1 in /usr/share/relink/kernel/GENERIC.MP (Makefile:2267
> 'newinstall')
>

So renaming over /bsd failed with EPERM.  That smells like /bsd is marked
immutable via chflags.  To verify, what's the output of
   ls -ldo / /bsd

?

If it *is* marked immutable, then uh, you'll need to undo that, figure out
how the heck that happened, and make sure it doesn't happen again.

(If _you_ marked it immutable, then don't, or at least don't waste people's
time when that breaks things.)


Philip Guenther


Re: debugging "invalid argument" errors when loading elf files

2023-10-11 Thread Philip Guenther
On Tue, Oct 10, 2023 at 11:44 PM Lorenz (xha)  wrote:

> On Mon, Oct 09, 2023 at 01:29:52PM -0700, Philip Guenther wrote:
> > On Mon, Oct 9, 2023 at 11:21 AM Lorenz (xha)  wrote:
> >
> > > hi misc@,
> > >
> > > i'm currently porting the hare programming language to openbsd and i am
> > > having quite a few problems trying to use a linker script. i am always
> > > getting a "/bin/ksh: .bin/hare: Invalid argument" error.
> > >
> > > so far i tried a lot of stuff like comparing a working version without a
> > > linker script, looking if any of the programm headers are missing, etc.
> ...
> > Read /usr/src/sys/kern/*exec* and review the logic around the 10
> > occurrences of EINVAL in that code.  Presumably the differences you
> > identified will point to one or more of them
>
> found it: PT_PHDRS is missing. i didn't identify that difference at
> first tho. it's needeed for PIE if i understand correctly.
>

Yeah.


> why is ld not adding a PT_PHDR programm header? as far as i undestand,
> PT_PHDR are the programm headers themselfs?


PT_PHDR is the tag for an entry in the program headers that points to the
program headers themselves.  Some ELF files (for example, core files) have
a program header but don't include a PT_PHDR entry in it.  It's presumably
not added by ld because you supplied a linker script and ld is trying to
give you as much control as possible.  Some of the other arguments you
supplied may have required it to fill in other details, but including a
PT_PHDR entry in the program header is apparently not one of them.
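
For what it's worth, checking for the entry is just a scan of the program
header table.  A minimal sketch, assuming a 64-bit ELF whose program headers
have already been read into memory (has_pt_phdr() is a made-up helper name,
not something from ld or the kernel):

```c
#include <elf.h>
#include <stddef.h>

/*
 * Return 1 if the program header table contains a PT_PHDR entry,
 * 0 otherwise.  'phdrs' and 'count' are assumed to have been loaded
 * from the file already (e_phoff / e_phnum in the ELF header).
 * Hypothetical helper for illustration only.
 */
int
has_pt_phdr(const Elf64_Phdr *phdrs, size_t count)
{
	size_t i;

	for (i = 0; i < count; i++) {
		if (phdrs[i].p_type == PT_PHDR)
			return 1;
	}
	return 0;
}
```

Running something like this over the working and non-working binaries is the
same comparison 'readelf -l' lets you do by eye.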

As Theo says, this sort of thing makes linker scripts very subtle, with
arch dependencies**, interactions with RELRO and W^X processing, and plain
ABI weirdness.  Even those of us who have written several of them often
have to start from what 'readelf -lS' shows the default is to put together
a starting point and massage it from there to achieve whatever our goal for
going through this effort is.

** e.g., permissions on and immutability of .plt, .got, etc sections vary.
Some archs have required some sections to be before or after others due to
the CPU treating a limited range offset in some instructions as unsigned, so
'before' cannot be reached

> this is my linker script

...

> i am moving the init functions in a
> different section so that the hare runtime can execute them and not
> libc.


Well, the first issue with this is that libc doesn't invoke init functions,
so there's some sort of misunderstanding going on.

The .init section of the executable itself is invoked by the entry point
code in crt0 before main is called, not libc.  At a low level, that's done
not via DT_INIT's value or the section name but via a symbol placed in the
section which would presumably be carried along if you renamed the section,
so you can't affect that.

The .init_array section is handled by ld.so in dynamic programs where it's
located via DT_INIT_ARRAY, and by crt0 in static programs where it's
located via *_start and *_end symbols which I _guess_ would be from
whatever ended up with the .init_array section name, so maybe renaming the
section would prevent them from being invoked in that case, but I'm not
sure and that's an implementation detail that might change.
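
To illustrate what "invoked before main" means in practice, here is a small
sketch.  It assumes GCC or Clang, whose constructor attribute (on current ELF
targets, as far as I know) puts a pointer to the function in .init_array; the
names early_init() and init_has_run() are made up for the example:

```c
/* Set by the constructor below; stays 0 if the constructor never runs. */
static int init_ran;

/*
 * With GCC/Clang this attribute asks the toolchain to arrange for the
 * function to be called by the startup machinery before main().
 * Assumption: the target uses .init_array for this.
 */
__attribute__((constructor))
static void
early_init(void)
{
	init_ran = 1;
}

/* Report whether the constructor has already been invoked. */
int
init_has_run(void)
{
	return init_ran;
}
```

If init_has_run() returns 1 as the first thing in main(), then something
other than your own code invoked the function, which is the behavior being
discussed here.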

(Moving a chunk of the crt0 code into libc would be possible (glibc did it,
for example) and might make some evolution easier, but it hasn't happened.)


...but, backing up...what problem exists with the ordering that currently
happens that makes you believe you need to interpose and have the "hare
runtime" (sorry, I'm not familiar with that) execute them?  How will you
measure success of the change?


Philip Guenther


Re: debugging "invalid argument" errors when loading elf files

2023-10-09 Thread Philip Guenther
On Mon, Oct 9, 2023 at 11:21 AM Lorenz (xha)  wrote:

> hi misc@,
>
> i'm currently porting the hare programming language to openbsd and i am
> having quite a few problems trying to use a linker script. i am always
> getting a "/bin/ksh: .bin/hare: Invalid argument" error.
>
> so far i tried a lot of stuff like comparing a working version without a
> linker script, looking if any of the programm headers are missing, etc.
>

So you have a working binary (w/o linker script) and a not-working binary
(w/linker script) and you've even done the comparison of the program
headers of the two...and you're not going to show those but rather ask
what, in general, could be wrong?  Okay.

Lacking the specifics of those differences (which you've already
identified), the general advice is this:

Read /usr/src/sys/kern/*exec* and review the logic around the 10
occurrences of EINVAL in that code.  Presumably the differences you
identified will point to one or more of them


Philip Guenther


Re: Speed: dump/restore vs rsync

2023-09-22 Thread Philip Guenther
Whelp, that's bizarre: AFAICT that file type has never been used by any of
the mainline BSDs.

I guess you could have had a bitflip from a bug (or drive issue, or cosmic
ray, etc) and it should really be either 012 (symlink) or 014
(socket).  You could try tracking it down (update dump to print the inum
too?) and then use find(1) to see what path(s) it has to get a hint about
what it should have been, and then maybe use fsdb to change its type to
what seems the correct one, though I would *test* my backup before doing
that and be prepared to spend the time to newfs+restore the filesystem in
case things go wrong.
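
For reference, the number in that warning is the top four bits of the inode
mode.  A rough sketch of decoding it, using the traditional octal values
written out literally rather than taken from a header (the function names are
made up for the example):

```c
/*
 * Map the file type printed by dump (the inode mode shifted right by
 * 12 bits) to a name.  Values are the traditional S_IF* codes.
 */
const char *
file_type_name(unsigned int type)
{
	switch (type) {
	case 001: return "fifo";		/* S_IFIFO  */
	case 002: return "character device";	/* S_IFCHR  */
	case 004: return "directory";		/* S_IFDIR  */
	case 006: return "block device";	/* S_IFBLK  */
	case 010: return "regular file";	/* S_IFREG  */
	case 012: return "symlink";		/* S_IFLNK  */
	case 014: return "socket";		/* S_IFSOCK */
	default:  return "undefined";
	}
}

/* Return 1 if a and b differ in exactly one bit, else 0. */
int
differs_by_one_bit(unsigned int a, unsigned int b)
{
	unsigned int x = a ^ b;

	return x != 0 && (x & (x - 1)) == 0;
}
```

With this, file_type_name(013) gives "undefined", and
differs_by_one_bit(013, 012) is true: 013 and 012 differ only in the lowest
bit.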

Philip


On Fri, Sep 22, 2023 at 12:44 PM vitmau...@gmail.com 
wrote:

> Dear Philip,
>
> thank you for pulling my ears. The complete error message is:
>
> DUMP: Warning: undefined file type 013
>
> Best,
> Vitor
>
> On Fri, Sep 22, 2023 at 16:17, Philip Guenther
> wrote:
>
>> On Fri, Sep 22, 2023 at 11:18 AM vitmau...@gmail.com 
>> wrote:
>>
>>> I'm also getting an "undefined file type" error from dump. I found one guy
>>> from the FreeBSD mail list that got the same error, but he solved his
>>> problem using fsck on the partition. I forced fsck on my side, since the
>>> filesystem was marked as clean, but to no avail.
>>>
>>
>> In OpenBSD's dump, that warning includes the actual file type which
>> provoked the error:
>>
>>  : bleys; pwd
>> /usr/src/sbin/dump
>> : bleys; grep  'undefined file type' *.c
>> traverse.c: msg("Warning: undefined file type 0%o\n",
>> : bleys;
>>
>> Since yours didn't include that, you would appear to be running some
>> other version of dump, which would make this the wrong place to get help.
>>
>> ...unless you truncated an error message before asking about it, which is
>> kinda self-defeating.
>>
>>
>> Philip Guenther
>>
>>


Re: rmt, rcmd, /etc/hosts.equiv and .rhosts

2023-09-11 Thread Philip Guenther
On Sat, Sep 9, 2023 at 5:34 PM Daniele B.  wrote:

> Just investigating about /etc/hosts.equiv and ~/.rhosts and I was
> quite serious to think that my system doesn't need both of them
>

I have not used a system in 30 years where /etc/hosts.equiv existed.  I
deleted my .rhosts when ssh was present on all the systems I had to deal
with.


> I then start to look carefully my /etc and discovered a link
> that read like this:
>
> 0 lrwxrwx---  1 root  wheel  13 Mar 25 17:14 /etc/rmt -> /usr/sbin/rmt
>

So, 45 years ago, /sbin didn't exist and when "systems" programs were added
they were put in /etc.  So, when someone decided it would be neat if dump
could send its output over an rcmd(3) connection to another machine, the
rmt program was put in /etc and dump was told to invoke "/etc/rmt" via
rcmd().  Indeed, dump *still does that* in OpenBSD.


> man rmt:
>
...

> man rcmd:
>
...

> SUPERBUG (by myself):
> 
> One can be "tempted" to think to a ruserok() function that hacked can
> return always OK (0) and otherwise one can always revert to rcmdsh()
> with the help of a "good" rshprog.
>

I'm sorry, but I don't understand what you're trying to say here.
ruserok() is in libc and linked into rmt, so in this case "a ruserok()
function that hacked can return always OK" would mean "if you can alter a
root owned binary on the target system", which is...boring?


> I'm here to ask enlightment about the opportunity to define
> /etc/hosts.equiv and ~/.rhosts but mainly


Short answer: don't.
Longer answer: "what problem are you trying to solve?"

I suppose OpenSSH still has some hosts.equiv and .rhosts bits, but I trust
that Theo periodically attempts to kill them completely and only the "we
will sell no wine / until its time" hands of Damien and Darren will measure
when they can finally be removed (if I haven't lost track and they're
already gone).

If there is still support for those in base libc, well, your asking about
them may result in them being removed.



> if it is still the case (and
> why) to have this rmt link in etc.


See explanation above.



> Last if not first, what is the best
> practice to defend myself form BUG and SUPERBUG listed above.
>

"Don't let untrusted people alter libc on your system"

You *do* understand that you are trusting the OpenBSD developers by using
OpenBSD, just as you are trusting the FreeBSD developers if you use
FreeBSD, and trusting the Linux kernel and glibc and GNU-whatever utils,
and systemd, and distro developers if you use a Linux distribution, yes?

If you don't trust a community or find its values don't match yours, then
find a different community that matches it better, or build your own.


Philip Guenther
OpenBSD developer


Re: struct kinfo_proc: p_schedflags and PSCHED_*

2023-09-11 Thread Philip Guenther
On Mon, Sep 11, 2023 at 12:01 PM Benjamin Stürz 
wrote:

> I'm writing a little toy /proc fuse-fs for OpenBSD.
>
> The field p_schedflags defined in struct kinfo_proc
> in file /usr/include/sys/sysctl.h refers to PSCHED_*,
> but I can't find any references to these macros with:
> $ grep -rn PSCHED_ /usr/include /usr/src/sys
>
> Nor can I find any references to p_schedflags:
> $ grep -rn p_schedflags /usr/src
>
> This leads me to believe that this field is unused
> and should also be marked accordingly,
> to avoid future confusion.
>
> If not, please let me know how I interpret use this field.
>

Yeah, that field hasn't actually been set since 2007, when the scheduler
state was moved into per-CPU struct schedstate_percpu.  I guess it should
be deleted the next time we bump the kinfo_proc ABI, along with a review of
whether there are other dead items.

I don't think we're, at least at this time, interested in exposing or
making any promises about the inner workings of the scheduler, as those
flags did.


Philip


Re: signify: invalid comment in SHA256

2023-06-17 Thread Philip Guenther
On Sat, Jun 17, 2023 at 9:38 PM Avon Robertson  wrote:

> Used lynx to get the 4 files shown below. The shell prompt is a 2
> line prompt with current dir on 1st line,'$ ' only, on the 2nd line.
>
> Below is output captured from a tmux pane with a script.
>
> aahno:/
> $ cat /etc/installurl
> https://mirror.aarnet.edu.au/pub/OpenBSD
> aahno:/
> $ lynx $(cat /etc/installurl)/snapshots/amd64
> aahno:/
> $ cd ~/download
> aahno:/home/anon/download
> $ ls -la
> total 1361400
> drwxr-x---   2 anon  anon        512 Jun 18 15:35 .
> drwxr-xr-x  25 anon  anon       1536 Jun 18 08:01 ..
> -rw-r-----   1 anon  anon      44817 Jun 18 15:34 INSTALL.amd64
> -rw-r-----   1 anon  anon       1992 Jun 18 15:34 SHA256
> -rw-r-----   1 anon  anon       2144 Jun 18 15:34 SHA256.sig
> -rw-r-----   1 anon  anon  696745984 Jun 18 15:36 install73.img
> aahno:/home/anon/download
> $ signify -C -p /etc/signify/openbsd-73-base.pub -x SHA256 install73.img
> signify: invalid comment in SHA256; must start with 'untrusted comment: '
> aahno:/home/anon/download


You downloaded SHA256.sig, but then told signify to read the SHA256 file.
Perhaps you should follow all the examples for signify and pass it the
SHA256.sig file.


Philip Guenther


Re: chmod change means dump(8) the file

2023-01-25 Thread Philip Guenther
On Wed, Jan 25, 2023 at 4:35 PM Jan Stary  wrote:

> On Jan 26 00:18:45, h...@stare.cz wrote:
> > I have a large /media disk that I backup nightly using dump(8):
> > full level 0 on the Sun/Mon night, incrementals through the week.
> > The level 0 dump is huge, the incrementals are usualy trivial
> > unless I add something to /media.
> >
> > Yesterday I chmod'd a lot of the files, without making any other change.
> > That resulted in a huge level 2 dump; I suppose a chmod change counts
> > as a changed file, so they all got dumped anew, even though the content
> > of the file(s) has not changed.
> >
> > Is that intentional? It seems there is a lot of space to be saved
> > if it's "only" the metadata that have changed. Is that decided by
> > simply looking at the stat(2)? In particular, newer ctime is
> > just as good a reason to dump the _content_ as newer mtime?
>
> Seems so:
>
> /* Determine if given inode should be dumped */
> [...]
> if (CHECKNODUMP(dp) &&
> (DIP(dp, di_mtime) >= spcl.c_ddate ||
>  DIP(dp, di_ctime) >= spcl.c_ddate)) {
>

Right: if the ctime is newer than the previous backup then you don't know
what else has changed: the contents could have been modified and then the
file's mtime backdated to before the previous backup.  At that point the
ctime is the only indicator that the file no longer matches its backup.

(Of course, the second problem that follows from that limitation is that
the 'dump' format doesn't have a way to record inode-only info (like mode,
times, and flags) without also recording the file contents.  So, even if
the filesystem provided enough info for dump to know that only the file's
mode had been changed, there's nothing it can do about it other than back
up the entire file.)
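
Boiled down, the time test in the quoted code is the following.  This is a
simplified sketch with made-up names (changed_since_dump(), last_dump), not
the code from traverse.c, and it leaves out the nodump-flag check:

```c
#include <time.h>

/*
 * Return 1 if the inode must go into an incremental dump, given the
 * time of the previous dump.  Either timestamp being at or after the
 * previous dump is enough, because mtime can be set backwards by the
 * user while ctime cannot.
 */
int
changed_since_dump(time_t inode_mtime, time_t inode_ctime, time_t last_dump)
{
	return inode_mtime >= last_dump || inode_ctime >= last_dump;
}
```

A chmod after the last dump bumps only the ctime, and that alone makes the
function return 1, which is why the whole file gets dumped again.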


Philip Guenther


Re: Weirdness with du/df/my brain (latter more likely)

2023-01-22 Thread Philip Guenther
On Sun, Jan 22, 2023 at 2:08 PM Steve Fairhead  wrote:

> I was cloning a server with rsync in preparation for a major upgrade
> (elderly OpenBSD to 7.2). I noticed that the home partition usage was a
> good deal greater on the new machine than the old (as seen by df).
>

Good thing "cloning with rsync" has a specific meani...

$ rsync --help | grep -c ^-
144
$

oh, hmm.

You'll need to be specific about what rsync options you used, and perhaps
eyeball what the manpage says about them.  For example, the description of
the -a option has a specific warning which seems a plausible explanation of
the expansion.


Philip Guenther


Re: Relinking to create unique kernel... failed!

2023-01-13 Thread Philip Guenther
On Fri, Jan 13, 2023 at 10:59 AM Nick Templeton 
wrote:

> Ever since upgrading my machine to 7.2 I've been unable to relink my
> kernel, anybody have any idea why?

 ...

> Running "/usr/libexec//reorder_kernel" manually resulted in a kernel panic:
>
> mode = 0100600, inum = 7, fs = /tmp
> panic: ffs_valloc: dup alloc
> Stopped at db_enter+0x10: popq %rbp
>

You have at least one filesystem with latent corruption.  You should reboot
in single-user mode and run fsck with the -f option on each partition.

Philip Guenther


Re: rdist remove option and default behaviour

2022-12-13 Thread Philip Guenther
On Mon, Dec 12, 2022 at 9:02 PM All  wrote:

> I wanted to clarify.
>
> In manpage for rdist I see that we can use option -o remove .
> remove  Remove extraneous files.  If a directory is being
>  updated, any files that exist on the remote host that do
>  not exist in the master directory are removed.  This is
>  useful for maintaining truly identical copies of
>  directories.
> However, this seems to be the default anyway.
>
> If I specify "install /tmp/" and try to copy /tmp/test.file all the files
> in /tmp/
> on the remote host will be wiped out and only test.file will remain there
> after copy.
> This behaviour seems to fit with "directory update" feature of "remove"
> (like
> if we do "install -o remove /tmp/"). Yet, "remove" was not specified above.
>
> Is my understanding of default behaviour correct?  This how it supposed to
> be working?
>

When reporting an issue, please include precise information about both
 * what your desired end result / goal was, and
 * what you tried, including how you invoked the command and/or the config
used.

If you leave out the former, then we'll be guessing as to why the result
wasn't what you wanted.
If you leave out the latter, then we'll be guessing as to what you did that
didn't work as desired.
...or be prepared to be accepting of people guessing wrong.

ALSO: rdist has been largely superseded by rsync, which has a much more
efficient underlying protocol and, in my experience, a more regular set of
behaviors.  Before committing to rdist and its (limited by history)
behavior, you should consider using rsync instead.


It seems you
 * wanted to copy /tmp/test.file to /tmp/test.file on one or more other
hosts?
 * you tried a distfile like this:
  whatever:  ( /tmp/test.file ) -> HOSTNAME
install /tmp/;
?

You're correct that the latter does not achieve the former.  To achieve the
former, you would need to either
 * leave off the opt_dest_name from the 'install' directive, so that rdist
would know to install the source to
   the exact same path on the target host
 * specify the full target path in the 'install' directive, ala "install
/tmp/test.file"
 * have multiple source files, so that it treated opt_dest_name as a target
directory and not a target path (like cp)


So, what happened with what you _did_ try?  Well, it was taken as a request
to install the contents of the file "/tmp/test.file" as a file "/tmp/"!  rdist
is smart enough to know that it can't remove a directory without first
removing its contents, so it tried that and presumably failed.  If it
_could_ remove the contents it would then remove the directory...and then
fail when it tried to create a tempfile with prefix "/tmp/".

Could rdist's behavior be improved?  In some ways, yes, but lots of
sharp corners (e.g., single vs multiple source handling) would remain.
Frankly, if rsync serves your purposes, you should use it instead.


Philip Guenther


Re: port builds with inline source

2022-06-29 Thread Philip Guenther
Take a look at the Makefile for the sysutils/cpuid port, which has just one
C file included in the ports source tree itself.

Philip Guenther


On Wed, Jun 29, 2022 at 3:53 PM Lyndon Nerenberg (VE7TFX/VE6BBM) <
lyn...@orthanc.ca> wrote:

> We have a number of in-house utilities that we push out as packages.
> Right now these are built using the standard make framework, with
> a bunch of hand-crafted glue to build and sign the packages before
> pushing them to our internal distribution server.
>
> I would really like to take advantage of  to automate
> as much of the packing process as I can.  The problem is that port
> builds assume you're obtaining the program source from external
> distribution files, whereas I want to build right out of the port
> directory itself, i.e. have the program source live under
> /usr/ports/foo/bar/src/.
>
> Has anyone come up with an idiomatic solution to this that doesn't
> involve surgery on /usr/share/mk/*port*?
>
> --lyndon
>
>


Re: rpcbind security

2022-06-18 Thread Philip Guenther
On Fri, Jun 17, 2022 at 8:42 PM Gustavo Rios  wrote:

> Excuse me, but how does rpcbind know that a incoming request, for
> set/unset, comes from the root user ?
>

Theo has already told you how the *portmap* program decides that: by
looking at the host and port the request is coming from.

(There is no rpcbind program in OpenBSD and that word doesn't appear in the
manuals.  If you see an rpcbind process then you're not on OpenBSD and
need to check with a different mailing list.)


Philip Guenther


Re: C states lost on amd64

2022-05-27 Thread Philip Guenther
On Fri, 27 May 2022, Jan Stary wrote:
> ... and with the latest snapshot, they are back.
...
> acpicpu0 at acpi0: C3(350@96 mwait.1@0x20), C2(500@64 mwait.1@0x10), 
> C1(1000@1 mwait.1), PSS
> acpicpu1 at acpi0: C3(350@96 mwait.1@0x20), C2(500@64 mwait.1@0x10), 
> C1(1000@1 mwait.1), PSS
> acpicpu2 at acpi0: C3(350@96 mwait.1@0x20), C2(500@64 mwait.1@0x10), 
> C1(1000@1 mwait.1), PSS
> acpicpu3 at acpi0: C3(350@96 mwait.1@0x20), C2(500@64 mwait.1@0x10), 
> C1(1000@1 mwait.1), PSS
> 
> On May 26 14:34:43, h...@stare.cz wrote:
> > This is current/adm64, dmesgs below.
> > With the current snapshot, the C states are gone:
> > 
> > -acpicpu0 at acpi0: C3(350@96 mwait.1@0x20), C2(500@64 mwait.1@0x10), 
> > C1(1000@1 mwait.1), PSS
> > -acpicpu1 at acpi0: C3(350@96 mwait.1@0x20), C2(500@64 mwait.1@0x10), 
> > C1(1000@1 mwait.1), PSS
> > -acpicpu2 at acpi0: C3(350@96 mwait.1@0x20), C2(500@64 mwait.1@0x10), 
> > C1(1000@1 mwait.1), PSS
> > -acpicpu3 at acpi0: C3(350@96 mwait.1@0x20), C2(500@64 mwait.1@0x10), 
> > C1(1000@1 mwait.1), PSS
> > +acpicpu0 at acpi0: C1(@1 halt!), PSS
> > +acpicpu1 at acpi0: C1(@1 halt!), PSS
> > +acpicpu2 at acpi0: C1(@1 halt!), PSS
> > +acpicpu3 at acpi0: C1(@1 halt!), PSS
> > 
> > Is this expected?
> > Is it related to the recent apmd -A change?

Not really.  Well, unless your box is one where the states change 
depending on, say, whether the box is plugged in.

You could give this diff a shot.  It enables processing of CST change 
notifications.  No committers have a (working) box that does that, so I 
couldn't get any interest and I have no idea when--or even if--it might go 
in.

Philip Guenther


Index: sys/dev/acpi/acpicpu.c
===
RCS file: /data/src/openbsd/src/sys/dev/acpi/acpicpu.c,v
retrieving revision 1.92
diff -u -p -r1.92 acpicpu.c
--- sys/dev/acpi/acpicpu.c  6 Apr 2022 18:59:27 -   1.92
+++ sys/dev/acpi/acpicpu.c  12 Apr 2022 06:13:55 -
@@ -25,6 +25,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -80,6 +81,7 @@ void  acpicpu_setperf_ppc_change(struct a
 #define CST_FLAG_FALLBACK  0x4000  /* fallback for broken _CST */
 #define CST_FLAG_SKIP  0x8000  /* state is worse choice */
 
+#define FLAGS_NOCST        0x01
 #define FLAGS_MWAIT_ONLY   0x02
 #define FLAGS_BMCHECK  0x04
 #define FLAGS_NOTHROTTLE   0x08
@@ -113,8 +115,10 @@ struct acpi_cstate
        uint64_t        address;        /* or mwait hint */
 };
 
-unsigned long cst_stats[4] = { 0 };
-
+/*
+ * Locking:
+ * m   sc_mtx
+ */
 struct acpicpu_softc {
struct device   sc_dev;
int sc_cpu;
@@ -130,6 +134,10 @@ struct acpicpu_softc {
struct cpu_info *sc_ci;
SLIST_HEAD(,acpi_cstate) sc_cstates;
 
+   struct mutex        sc_mtx;
+   struct acpi_cstate  *sc_cstates_active; /* [m] */
+   int                 sc_mwait_only;  /* [m] */
+
bus_space_tag_t sc_iot;
bus_space_handle_t  sc_ioh;
 
@@ -161,10 +169,13 @@ struct acpicpu_softc {
 
 void   acpicpu_add_cstatepkg(struct aml_value *, void *);
 void   acpicpu_add_cdeppkg(struct aml_value *, void *);
+void   acpicpu_cst_activate(struct acpicpu_softc *);
 int    acpicpu_getppc(struct acpicpu_softc *);
 int    acpicpu_getpct(struct acpicpu_softc *);
 int    acpicpu_getpss(struct acpicpu_softc *);
 int    acpicpu_getcst(struct acpicpu_softc *);
+int    acpicpu_cst_changed(struct acpicpu_softc *);
+void   acpicpu_free_states(struct acpi_cstate *);
 void   acpicpu_getcst_from_fadt(struct acpicpu_softc *);
 void   acpicpu_print_one_cst(struct acpi_cstate *_cx);
 void   acpicpu_print_cst(struct acpicpu_softc *_sc);
@@ -510,11 +521,11 @@ acpicpu_getcst(struct acpicpu_softc *sc)
struct acpi_cstate  *cx, *next_cx;
int use_nonmwait;
 
-   /* delete the existing list */
-   while ((cx = SLIST_FIRST(&sc->sc_cstates)) != NULL) {
-       SLIST_REMOVE_HEAD(&sc->sc_cstates, link);
-       free(cx, M_DEVBUF, sizeof(*cx));
-   }
+   /* set aside the existing list and free it if not active */
+   cx = SLIST_FIRST(&sc->sc_cstates);
+   SLIST_INIT(&sc->sc_cstates);
+   if (cx != sc->sc_cstates_active)
+       acpicpu_free_states(cx);
 
/* provide a fallback C1-via-halt in case _CST's C1 is bogus */
acpicpu_add_cstate(sc, ACPI_STATE_C1, CST_METH_HALT,
@@ -528,8 +539,10 @@ acpicpu_getcst(struct acpicpu_softc *sc)
 
/* only have fallback state?  then no _CST objects were understood */
    cx = SLIST_FIRST(&sc->sc_cstates);
-   if (cx->flags & CST_FLAG_FALLBACK)
+   if (cx->flags & CST_FLAG_FALLBACK) {
+   sc->sc_flags 

Re: Can't attach gdb to cwm

2022-03-09 Thread Philip Guenther
On Wed, Mar 9, 2022 at 8:28 AM Rob Whitlock  wrote:

> I'm trying to attach gdb to an already running cwm but I get the following
> error:
>
> ptrace: Invalid argument.
>
> Why am I getting this error? Also, I have already set kern.global_ptrace=1,
> and both cwm and gdb are being run by the same user. This problem occurs
> both with the gdb in base and the gdb/egdb in ports.
>

Let me guess: the cwm process is an ancestor of the shell where you're
invoking gdb.  We don't permit that as the reparenting done by ptrace()
would create a loop in the process tree, which breaks assumptions by both
kernel and userspace programs.  If that's the case, run gdb from an ssh
session or something like that.
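
To make the loop problem concrete: the check amounts to walking the parent
links from the would-be tracer upward and seeing whether you hit the target.
The following is only a userland sketch over a hypothetical table of
(pid, ppid) pairs; struct proc_entry and is_ancestor() are invented for the
example and this is not the kernel's code:

```c
#include <stddef.h>
#include <sys/types.h>

/* One row of a hypothetical process table snapshot. */
struct proc_entry {
	pid_t pid;
	pid_t ppid;
};

/* Look up the parent of 'pid'; returns -1 if 'pid' is not in the table. */
static pid_t
parent_of(const struct proc_entry *tbl, size_t n, pid_t pid)
{
	size_t i;

	for (i = 0; i < n; i++) {
		if (tbl[i].pid == pid)
			return tbl[i].ppid;
	}
	return -1;
}

/*
 * Return 1 if 'candidate' is an ancestor of 'pid', 0 otherwise.
 * The hop count is bounded by the table size so that a corrupt
 * table containing a cycle cannot make this spin forever.
 */
int
is_ancestor(const struct proc_entry *tbl, size_t n, pid_t candidate, pid_t pid)
{
	size_t hops;

	for (hops = 0; hops <= n; hops++) {
		pid = parent_of(tbl, n, pid);
		if (pid == -1)
			return 0;
		if (pid == candidate)
			return 1;
	}
	return 0;
}
```

In the cwm case, the window manager is the ancestor of the terminal, the
shell, and therefore gdb, so the check is true; a gdb started from an ssh
session hangs off sshd instead and the check is false.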

Hmm, I guess I never updated the ptrace(2) manpage to mention that...

Philip Guenther


Re: C2 state on AC/battery

2022-02-07 Thread Philip Guenther
On Mon, Feb 7, 2022 at 10:04 AM Jan Stary  wrote:

> On Feb 05 13:41:25, guent...@gmail.com wrote:
> > On Sat, Feb 5, 2022 at 2:54 AM Jan Stary  wrote:
> >
> > > This is current/amd64 on a Thinkapd T420s, dmesgs below.
> > > It seems that C2 is or is not supported depending on
> > > whether the machine boots on AC or on battery
> > > (judging by three boots of each).
> > > Is this intended?
> >
> > The acpicpu driver is reporting what ACPI told it; presumably the authors
> > of the AML intended this change as a way to reduce power consumption.
> >
> > Now, ACPI provides a mechanism for the OS to tell it to notify the OS if
> > the contents of the _CST table changes and at least in some cases
> > acpicpu registers for that and if called it would write new acpicpu lines
> > to the dmesg.
> >
> > If you're not seeing those when plugging/unplugging,
>
> I don't.
>
> > there are two
> > possibilities:
> >  * does the AML on your system actually change the values and trigger the
> > notify?
> >  * is acpicpu actually registering the callback correctly?
> >
> > I would suggest adding a printf() right before the aml_register_notify()
> > call in acpicpu.c to see if it's actually being hit,
>
> Probably not: I added a printf() right there
> but nothing shows in dmesg when plugging/unpluging.
>

That aml_register_notify() path is a *boot* time path, when acpicpu is
attaching.  What printf() did you add and did it appear during boot?  If
not, then the OS isn't registering the notify callback.

Please send a report to bugs@ with sendbug as root, including the acpidump
output.


Philip Guenther


Re: C2 state on AC/battery

2022-02-05 Thread Philip Guenther
On Sat, Feb 5, 2022 at 2:54 AM Jan Stary  wrote:

> This is current/amd64 on a Thinkapd T420s, dmesgs below.
> It seems that C2 is or is not supported depending on
> whether the machine boots on AC or on battery
> (judging by three boots of each).
> Is this intended?
>

The acpicpu driver is reporting what ACPI told it; presumably the authors
of the AML intended this change as a way to reduce power consumption.

Now, ACPI provides a mechanism for the OS to tell it to notify the OS if
the contents of the _CST table changes and at least in some cases
acpicpu registers for that and if called it would write new acpicpu lines
to the dmesg.

If you're not seeing those when plugging/unplugging, there are two
possibilities:
 * does the AML on your system actually change the values and trigger the
notify?
 * is acpicpu actually registering the callback correctly?

I would suggest adding a printf() right before the aml_register_notify()
call in acpicpu.c to see if it's actually being hit, and if it is then dump
the tables on your box and grovel around in them to see if you see
notification support on the CPU nodes.


Philip Guenther


Re: SSL write error: certificate verification failed: certificate has expired

2022-02-02 Thread Philip Guenther
On Wed, Feb 2, 2022 at 6:26 PM Yogendra Kumar Chaudhary 
wrote:

> I am facing the following error while using pkg_add on OpenBSD 6.2.
>

6.2?  A four year old release which has been out of support for three years?

You should download the 7.0 ISO and do a fresh install.  And then read the
FAQ about upgrades so that you can keep your system up to date after
installing.


Philip Guenther


Re: cd*.iso reboot loop (vultr, Skylake AVX MDS)

2021-12-04 Thread Philip Guenther
On Sat, Dec 4, 2021 at 4:32 AM Claus Assmann 
wrote:

> My vultr OpenBSD 6.8 instance crashed and when it tried to reboot it
> failed at:
>
> root on sd0a (...)
> WARNING: / was not properly unmounted
> kernel: privileged instruction fault trap, code=0
> mds_handler_skl_avx+0x33:  clflush __ALIGN_SIZE+0x500(%rid,%rax,8)
>
...

> I noticed at least one difference however:
> the crashing system shows
> Using Skylake AVX MDS workaround
> which might be something related to the function mentioned above?


They have your virtualization guest configured in a way that doesn't match
any real hardware: it has a family-model-stepping combination that matches
the Skylake line, all real hardware of which has the clflushopt extension,
but the host is making the guest trap when that instruction is used.

You could test this theory by changing "clflushopt" to "clflush" in mds.S
and building a new ISO, but poking them to provide a more consistent
virtualization setup, whether by migration or reconfiguration, is the
better solution.

We could add more tests of the cpuid data and codepatch out the
instructions that should be there but aren't, but for something like the
clflushopt instruction where there's no real good reason for not passing
through the extension when the CPU presumably has it, it's hard to get much
enthusiasm up for working around a pointlessly dumb (or buggy)
virtualization config.



> Is this workaround something that could be turned off to see whether
> it causes the problem?
> The weird thing is that OpenBSD 6.8 was installed fine
> (11 months ago), so I don't understand why this problem happens now
> (could vultr have changed something in the underlying system?)
>

The machine hosting your guest probably suffered some failure (thus the
crash that you experienced) and they migrated your guest to another host to
get you back up and running.  I periodically see the tickets go by at my
$DAYJOB of this sort of replacement.  Hardware, especially modern PCs,
don't live anywhere near forever...


Philip Guenther


Re: Transferring ownership of SSH connection from process A to B, letting A quit nicely?

2021-08-11 Thread Philip Guenther
On Tue, Aug 10, 2021 at 12:13 PM mid  wrote:

> On Monday, August 9th, 2021 at 5:36 AM, Philip Guenther <
> guent...@gmail.com> wrote:
>
> > If you're 100% sure you have it right, then it should be easy to provide
> a
> > program that demonstrates
> > 1.  passing an fd between processes
> > 2.  using it successfully in the receiving process
> > 3.  the sending process exiting
> > 4.  attempts to use it failing in the receiving process
>
> Not 100%, but I'm out of ideas, so here goes nothing.
>
> client.c (process A):
>
...

> Compiled with:
>   cc -std=c99 -o server server.c
>   cc -std=c99 -o client client.c
>
> `client` is also the shell of the user, but the results are the same if
> I call it from within a "real" shell, too.
>
> The server receives the correct FDs, and prints
> "Hello from the Server\n" correctly, too. But as soon as `client`
> exits, the SSH connection goes with it, instead of staying (as in,
> I get "Connection to localhost closed").
>

Your problems have nothing to do with fd passing but rather are around not
understanding how session management works.
The client is passing its stdin/stdout, which are either pipes or a
pseudo-tty connected to the ssh server and NOT the actual TCP socket
carrying the ssh connection.  When the session leader process exits the
kernel will perform various cleanup operations (block tty access, send some
signals).

If you _really_ want to hack around in this area, you need to do a bunch of
reading and research.  I recommend buying/borrowing a copy of
_Advanced_Programming_in_the_UNIX_Environment_ by W. Richard Stevens.


Philip Guenther


Re: Transferring ownership of SSH connection from process A to B, letting A quit nicely?

2021-08-08 Thread Philip Guenther
On Sun, Aug 8, 2021 at 10:13 AM mid  wrote:
...

> I have tried sending the file descriptors associated with the connection
> to process B via sendmsg, thinking that maybe the
> file descriptors are reference-counted. It's a logical
> assumption, but it didn't work - the connection closed with
> process A.
>

File descriptors sent in SCM_RIGHTS control messages via sendmsg() on a
unix domain socket *are* reference-counted.

If you think you've done that and it's not behaving as expected, then first
check and report errors on *all* the system calls, and that the returned
data fields on things like recvmsg() have the values you expect.  If
sendmsg() is failing or you're accidentally discarding the fds in the
recvmsg() by not providing the space needed then yeah, the fds will be
closed because the last reference is gone.
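
As a rough illustration of the checks described above, here is a minimal
sketch of an SCM_RIGHTS sender and receiver.  The function names send_fd()
and recv_fd() are made up for this example; the receiver supplies
CMSG_SPACE() worth of control buffer and treats truncation as an error,
since that is exactly where passed fds get silently dropped.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: pass one fd over a unix domain socket. */
int send_fd(int sock, int fd)
{
	char byte = 0;			/* at least one byte of real data */
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	union {				/* union keeps the buffer aligned */
		struct cmsghdr hdr;
		char buf[CMSG_SPACE(sizeof(int))];
	} u;
	struct msghdr msg;
	struct cmsghdr *cmsg;

	memset(&u, 0, sizeof(u));
	memset(&msg, 0, sizeof(msg));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = u.buf;
	msg.msg_controllen = sizeof(u.buf);

	cmsg = CMSG_FIRSTHDR(&msg);
	cmsg->cmsg_level = SOL_SOCKET;
	cmsg->cmsg_type = SCM_RIGHTS;
	cmsg->cmsg_len = CMSG_LEN(sizeof(int));
	memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

	return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Hypothetical helper: receive one fd, or -1 if none arrived intact. */
int recv_fd(int sock)
{
	char byte;
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	union {
		struct cmsghdr hdr;
		char buf[CMSG_SPACE(sizeof(int))];
	} u;
	struct msghdr msg;
	struct cmsghdr *cmsg;
	int fd;

	memset(&msg, 0, sizeof(msg));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = u.buf;
	msg.msg_controllen = sizeof(u.buf);

	if (recvmsg(sock, &msg, 0) <= 0)
		return -1;
	/* Truncated control data means the fds were discarded. */
	if (msg.msg_flags & (MSG_TRUNC | MSG_CTRUNC))
		return -1;

	for (cmsg = CMSG_FIRSTHDR(&msg); cmsg != NULL;
	    cmsg = CMSG_NXTHDR(&msg, cmsg)) {
		if (cmsg->cmsg_level == SOL_SOCKET &&
		    cmsg->cmsg_type == SCM_RIGHTS &&
		    cmsg->cmsg_len == CMSG_LEN(sizeof(int))) {
			memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
			return fd;
		}
	}
	return -1;
}
```

With these, closing the sender's copy of the descriptor after send_fd()
succeeds should leave the copy returned by recv_fd() fully usable.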

If you're 100% sure you have it right, then it should be easy to provide a
program that demonstrates
1) passing an fd between processes
2) using it *successfully* in the receiving process
3) the sending process exiting
4) attempts to use it failing in the receiving process

No?


Philip

(Replies not on the list will be deleted)


Re: /var/log/failedlogin is a binary file with a lot of null bytes?!

2021-07-17 Thread Philip Guenther
On Fri, Jul 16, 2021 at 11:49 PM podolica  wrote:

> On my OpenBSD installation (6.9) one of the log files created by login(1)
> seems to be a binary file:
> $ less /var/log/failedlogin
> "failedlogin" may be a binary file. See it anyway?
>
...

> What can I learn from this logfile?
> A lot of repeating null bytes and "ttyC2" and "ttyC3" do not seem
> to be very informative.
>
> Is this an error?
>

No, it's not an error.  That file is specific to the 'login' command,
specifically the source file /usr/src/usr.bin/login/failedlogin.c and
consists of an array of the 'badlogin' structure specified there.  If you
want to dump its contents in a more readable format then you should write a
small program to do so in C or some other language which can easily handle
binary files.
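
A small dumper might look roughly like the sketch below.  Note that the
record layout here is an ASSUMPTION made for illustration (tty name, host
name, timestamp, count); check the 'badlogin' structure in failedlogin.c
and adjust 'struct record' to match before trusting any output.  The names
dump_records(), REC_LINESIZE and REC_HOSTSIZE are invented for the example.

```c
#include <stdio.h>
#include <string.h>
#include <time.h>

/* ASSUMED field widths; verify against failedlogin.c. */
#define REC_LINESIZE	8
#define REC_HOSTSIZE	256

/* ASSUMED layout of one on-disk record; verify against the source. */
struct record {
	char	tty[REC_LINESIZE];
	char	host[REC_HOSTSIZE];
	time_t	when;
	int	count;
};

/*
 * Read fixed-size records from 'in' and print each slot that has a
 * non-zero timestamp.  Returns the number of records printed, or -1
 * on a read error.
 */
int dump_records(FILE *in, FILE *out)
{
	struct record r;
	long slot = 0;
	int printed = 0;

	while (fread(&r, sizeof(r), 1, in) == 1) {
		if (r.when != 0) {
			/* %.*s: the char arrays may lack a trailing NUL */
			fprintf(out,
			    "slot %ld: tty=%.*s host=%.*s count=%d time=%lld\n",
			    slot, REC_LINESIZE, r.tty, REC_HOSTSIZE, r.host,
			    r.count, (long long)r.when);
			printed++;
		}
		slot++;
	}
	return ferror(in) ? -1 : printed;
}
```

Empty slots are all zero bytes, which is where the long runs of NULs in
the file come from.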


Philip Guenther


Re: udp sendto performance

2021-07-06 Thread Philip Guenther
On Mon, Jul 5, 2021 at 3:56 PM Brian Empson  wrote:
...

> I'm running 6.5, are there any significant performance improvements in
> the newer versions of OpenBSD that would improve sendto()'s performance?
>

Yes.

I'll suggest that before you do any serious perf measurement or try to
"squeeze more performance out of" *any* codebase you update to a current
release and not measure a two year / four version old release.

There are people for whom tracking performance of a setup over time is
important and for them measuring obsolete versions is useful.  However, if
you have a target and are trying to figure out whether a setup can _reach_
that target then measuring an older release tells you nothing, because you
would never deploy an out of date release.  I Would Dearly Hope.


Philip Guenther


Re: EACCES of UDP packet

2021-06-22 Thread Philip Guenther
On Mon, Jun 21, 2021 at 9:07 PM Siegfried Levin 
wrote:

> Thanks a lot for the hint. Unfortunately I’m still not able to see why
> sendto failed with 13 Permission denied. The AF_INET address masked is the
> correct one of my server, not a broadcast address. A sendto before this one
> to the same address just worked.
>
>   3058 myapp  CALL
> sendto(5,0x1689f5f6500,0x5d,0x400,0x7f7f1144,0x10)
>   3058 myapp  STRU  struct sockaddr { AF_INET, xxx.xxx.xxx.xxx: }
>

Why have you chosen to hide information that may be useful in debugging
your problem?

"Hi, I'm asking for help but I have to hide addresses because...this
application is insecure if anyone else has its IP+port?  Because I've never
heard of shodan and don't believe that people are constantly scanning the
Internet?  And while I don't know why it's failing I'm 1000% sure that
there's no information to be gained from seeing the IP, so if it later
turns out my understanding of 'broadcast address' is incorrect, the time
I've wasted for myself and others will be...a total loss?"


>   3058 myapp  RET   sendto -1 errno 13 Permission denied
>   3058 myapp  CALL  close(5)
>   3058 myapp  RET   close 0
>
> The dump file is like 600MB. I can provide more trace log if it is
> necessary for locating the root cause.
>

Use the scientific method:
 * make a testable hypothesis
 * devise a test for that
 * perform the test
 * determine whether the hypothesis has been ruled out or confirmed

So, since the manpage mentions blocking pf, I suggest the hypothesis "it
returns EACCES because pf is blocking your packets".  I can think of
several ways to test that; what testing have you performed to confirm or
rule out that possibility?  "doas pfctl -d; run test; doas pfctl -e"?


Alternatively: what's different about *that* call?  Does every sendto() call
on that socket fail?  What is special about that socket?  If other sendto()
calls succeed, what is different about that call?  Earlier setsockopt()
calls?


You say "I can confirm the packet was not sent to a broadcast address":
*how* have you confirmed that your understanding of 'broadcast address'
matches the kernel's understanding?  It ain't just 255.255.255.255
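
To make that last point concrete, here is a sketch of the arithmetic for a
directed (subnet) broadcast address.  It is only an illustration: the
function name is made up, addresses are taken in host byte order, and the
kernel's real check looks at the configured interfaces and may differ in
details such as how it treats /31 and /32 masks.

```c
#include <stdint.h>

/* Build a host-byte-order IPv4 address from dotted-quad components. */
#define IP4(a, b, c, d) \
	(((uint32_t)(a) << 24) | ((uint32_t)(b) << 16) | \
	 ((uint32_t)(c) << 8) | (uint32_t)(d))

/*
 * Hypothetical helper: is 'dst' a broadcast address as seen from an
 * interface configured with 'ifaddr'/'netmask'?
 */
int is_broadcast_for(uint32_t dst, uint32_t ifaddr, uint32_t netmask)
{
	uint32_t directed = (ifaddr & netmask) | ~netmask;

	if (dst == 0xffffffffu)		/* limited broadcast */
		return 1;
	/* Simplification: treat /31 and /32 as having no broadcast. */
	if (netmask == 0xffffffffu || netmask == 0xfffffffeu)
		return 0;
	return dst == directed;		/* all host bits set */
}
```

For an interface at 10.1.2.5 with netmask 255.255.252.0, the directed
broadcast is 10.1.3.255, while 10.1.2.255 is an ordinary host address.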


Philip Guenther



>
>
> Siegfried
> siegfried.le...@gmail.com
>
>
>
>
> > On Jun 15, 2021, at 8:50 PM, Theo de Raadt  wrote:
> >
> > use ktrace
> >
> > Siegfried Levin  wrote:
> >
> >> Hi,
> >>
> >> I have a application run by a normal user communicating with the server
> with UDP. It crashes very occasionally, like once per week, due to EACCES
> when sending a UDP packet. According to the manpage (
> https://man.openbsd.org/OpenBSD-6.9/sendmsg.2#EACCES), the reason might
> be either being blocked by PF or sending to a broadcast address. I can
> confirm the packet was not sent to a broadcast address. However, I cannot
> figure out what rule could block the connection occasionally either. The
> application can be brought back online without changing any configuration.
> Does anyone know what might fix this? I can also rewrite the code to make
> it ignore the error and keep trying but that might not be a proper
> solution. Running it as root might not be a good idea, too.
> >>
> >> It happens since OpenBSD 6.8. Now I’m running it on 6.9. The
> application is written in Rust.
> >>
> >> Siegfried
> >> siegfried.le...@gmail.com
> >>
> >>
> >>
> >>
>
>


Re: Usage of .note.openbsd.ident

2021-05-26 Thread Philip Guenther
On Fri, May 21, 2021 at 5:28 AM George Brown <321.geo...@gmail.com> wrote:

> It seems this ELF note was used for the now dead compat_linux feature.
> Aside from compat systems in other operating systems that may wish to
> identify OpenBSD binaries does this note have any other active uses?
>

The point of the note (and/or the OS/ABI field in the ELF header) is to
permit portable ELF tools to identify how to interpret OS-specific values,
those in the OS-ranges for types, for example.  Not inserting _some_
identifying factor is basically doing an embrace-and-extend on ELF and
actively hostile to portability of tooling.

If you find that ELF note obnoxious, just fix the linkers to instead set
the ELF ABI field correctly.  As I understand it, the 'go' tool chain has
done that for years.  It's really the better choice for this, would take
less space and be faster to process.
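
For a sense of how cheap the header check is, here is a sketch that reads
the OS/ABI byte from the first bytes of a file.  The constants are defined
locally rather than pulled from a system header so the example stands
alone; the function name elf_osabi() is made up.  The byte sits at index 7
of e_ident, and 12 is the value registered for OpenBSD.

```c
#include <stddef.h>
#include <string.h>

#define EI_OSABI_INDEX	7	/* offset of the OS/ABI byte in e_ident */
#define OSABI_OPENBSD	12	/* ELFOSABI_OPENBSD */

/*
 * Hypothetical helper: return the OS/ABI byte from an ELF header, or
 * -1 if 'buf' does not start with a complete ELF identification.
 */
int elf_osabi(const unsigned char *buf, size_t len)
{
	static const unsigned char magic[4] = { 0x7f, 'E', 'L', 'F' };

	if (len < 16 || memcmp(buf, magic, sizeof(magic)) != 0)
		return -1;
	return buf[EI_OSABI_INDEX];
}
```

Compare that single byte lookup with locating and parsing a note section,
which needs the program or section headers to be walked first.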


Philip Guenther


Re: Fwd: umm_map returns unaligned address?

2021-04-23 Thread Philip Guenther
On Fri, Apr 23, 2021 at 4:50 PM Alessandro Pistocchi 
wrote:
...

> What I was flagging is just that sometimes uvm_map returns an address that
> is not aligned to PAGE_SIZE (I printed it out and it has 0x004 in the lower
> 12 bits).  On the other hand uvm_unmap has an assertion that panics if the
> address passed to it is not page aligned. I believe that there could be a
> bug somewhere.
>

You apparently didn't print out the value directly after return from
uvm_map() but rather later after a bunch of your other code had run.  Yes,
there's a bug, in your game_mode_start_audio_thread(), where you advance
the pointer from uvm_map() by four.
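
The check itself is a one-liner, which is why it pays to do it immediately
after the call rather than later.  A userland sketch, assuming 4096-byte
pages (the macro and function names are made up; kernel code would use
PAGE_MASK):

```c
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u	/* assumption: 4 KiB pages */

/* Offset of an address within its page; 0 means page-aligned. */
unsigned int page_offset(uintptr_t va)
{
	return (unsigned int)(va & (SKETCH_PAGE_SIZE - 1));
}
```

A page-aligned address advanced by one 32-bit word gives page_offset() of
4, i.e. exactly the 0x004 in the low 12 bits reported above.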


Philip Guenther


Re: umm_map returns unaligned address?

2021-04-23 Thread Philip Guenther
On Fri, Apr 23, 2021 at 3:13 PM Alessandro Pistocchi 
wrote:

> -- Forwarded message -
> From: Alessandro Pistocchi 
> Date: Fri, Apr 23, 2021 at 1:55 PM
> Subject: umm_map returns unaligned address?
> To: 
>
>
> Hi all,
>
> I am fairly new to openbsd so if this is something obvious that I missed
> please be understanding.
>
> I am adding a syscall to openbsd 6.8. I am working on a raspberry pi.
>
> During the syscall I allocate some memory that I want to share between the
> kernel
> and the calling process.
>
> When it's time to wrap up and unmap the memory, I unmap it both from the
> kernel
> map and from the process map.
>
> The unmapping from the process map goes fine, the unmapping from the kernel
> map
> fails by saying that the virtual address in kernel map is not aligned to
> the page size
> ( it's actually 4 bytes off ).
>
> What have I missed? I assumed that umm_map would return a page aligned
> virtual
> address for the kernel mapping as well.
>
> Here is my code for creating the shared memory chunk:
>

Stop sending summaries and just send diffs that compile: you don't know
everything that is relevant and keep leaving out stuff that is.  I'm the
third person to say this.


>
> 
> // memory_size is a multiple of page size
> uvm_object = uao_create(memory_size, 0);
> if(!uvm_object) return;
>
> // TODO(ale): make sure that this memory cannot be swapped out
>
> uao_reference(uvm_object)
> if(uvm_map(kernel_map, (vaddr_t *)&memory, round_page(memory_size),
> uvm_object,
>0, 0, UVM_MAPFLAG(PROT_READ | PROT_WRITE, PROT_READ | PROT_WRITE,
>MAP_INHERIT_SHARED, MADV_NORMAL, 0))) {
>

The cast of &memory is wrong: it's either unnecessary (if memory is of the
correct type) or totally broken (if it isn't).  Why did you think it was
unnecessary to show how you declared your variables?

You also fail to show your initialization of 'memory'.  If you didn't then
that's ABSOLUTELY wrong and not in line with the existing uses of uvm_map()
in the kernel.  Please consult the uvm_map(9) manpage for what the incoming
value means.

...

> uao_reference(uvm_object);
> if(uvm_map(>p_vmspace->vm_map, _in_proc_space,
> round_page(memory_size), uvm_object,
>0, 0, UVM_MAPFLAG(PROT_READ | PROT_WRITE, PROT_READ | PROT_WRITE,
>MAP_INHERIT_NONE, MADV_NORMAL, 0))) {
> memory = 0;
>

This error handling is incomplete, lacking an unmap.


Philip Guenther


Re: Unable to listen properly on UDP port 4500

2020-12-08 Thread Philip Guenther
: bleys; grep 4500 /etc/services
ipsec-nat-t     4500/tcp        ipsec-msft      # IPsec NAT-Traversal
ipsec-nat-t     4500/udp        ipsec-msft      # IPsec NAT-Traversal
: bleys; sysctl net.inet.esp.udpencap
net.inet.esp.udpencap=1
: bleys

You're trying to use the ipsec ESP encapsulation port, which is enabled by
default.  If you're a masochist and like making your life more difficult,
you can use that port for your own purposes by disabling that sysctl.  If
you're not a masochist, use a different port.


Philip Guenther


On Tue, Dec 8, 2020 at 4:13 PM Chris Johnson 
wrote:

> Hello All,
>
> I am unable to set up a localhost netcat listener on UDP port 4500 that
> responds to a client on that same host. I encountered this issue
> attempting to test whether UDP 4500 was open on our departmental firewall.
>
> Simple test case: Fresh build of OpenBSD 6.8. No local network, no
> packet filter, no iked running.
>
> # netstat -na -f inet | grep 4500
> [empty]
> # fstat | grep 4500
> [empty]
>
> $ nc -ul localhost 4501 &
> [1] 72638
> $ nc -u localhost 4501
> Z
> Z
> ^C
> $ pkill nc
>
> [1]+  Stopped nc -ul localhost 4501
> $ nc -ul localhost 4500 &
> [2] 70181
> $ nc -u localhost 4500
> Z
> ^C
> $ pkill nc
> [2]-  Terminated  nc -ul localhost 4500
>
> The server running on port 4500 does not echo. Why not? Is there
> something obvious that I'm missing?
>
> I've tried this on three different OpenBSD 6.8 systems (all amd64). Is
> UDP 4500 reserved in some way? Other ports I've tried work fine. Linux
> and MacOS systems work fine on this port.
>
> Cheers,
>
> Chris
>
>


Re: Potential ksh bug?

2020-11-17 Thread Philip Guenther
On Mon, Nov 16, 2020 at 11:04 PM Bodie  wrote:

> On 17.11.2020 05:04, Jordan Geoghegan wrote:
> > Hello,
> >
> > I'm not sure if this is a bug, or if it's just a pdksh thing, but I
> > stumbled upon some interesting behaviour when I was tinkering around
> > with quoting and using a poor mans array:
> >
> > test=$(cat <<'__EOT'
> > # I'll choose not to close this quote
> > other_stuff
> > __EOT
> > )
> >
> > echo "$test"
> >
> >
> > When I run this command on ash, dash, yash, bash, zsh or ksh93 I get
> > the following output:
> >
> > # I'll choose not to close this quote
> > other_stuff
> >
> > But when I run it on ksh from base or any pdksh derivative it throws
> > an error about an unclosed quote:
> >
> > test.sh[8]: no closing quote
> >
> > This snippet works on every POSIX-y shell in the ports tree, and fails
> > on every pdksh variant I tried, including on NetBSD and DragonflyBSD
> > as well.  I don't have the requisite esoteric knowledge regarding
> > pdksh's internal quoting logic, so I'm hoping one of the gurus here
> > can determine whether this is a bug or if I'm just doing something
> > annoying.
> >
> > Any insight that can be provided would be much appreciated.
> >
>
> What exactly are you trying to achieve?
>
> If you will look in sh(1) for 'Command expansion' then there are defined
> rules and your form is not between them.
>

I disagree.  I believe this:

cat <<'__EOT'
# I'll choose not to close this quote
other_stuff
__EOT

matches the syntax for 'command'...once you take into account redirections,
including 'here-docs'.  Or do you believe that's not a valid command on
its own?  To put it another way, I agree with halex@ that this is a (known,
not yet fixed) bug.


> So error message about missing closing quote is actually proper
> behavior.
>

Nope.  This is a bug in OpenBSD ksh.



> As well it is good idea to avoid reserved words as a names for variables
> ;-)
> (test)


Hmm?

* 'test' is not a reserved word in the shell
* shell variable names are a completely different namespace than shell
reserved words or commands
* code written to check whether something is a bug is 1000% out-of-bounds
for style comments: either there's a bug or there isn't


Philip Guenther


Re: OpenBSD 6.8 (release) guest (qemu/kvm) on Linux 5.9 host (amd64) fails with protection fault trap

2020-11-16 Thread Philip Guenther
On Sun, Nov 15, 2020 at 10:24 AM Gabriel Garcia  wrote:

> I would like to run OpenBSD as stated on the subject - I have been able,
> however, to run it successfully with "-cpu Opteron_G2-v1", but I would
> rather use "-cpu host" instead. Also note that on an Intel host, OpenBSD
> appears to work successfully on the same Linux base.
>
> qemu invocation that yields a trap:
>
...

Lots of looking everywhere but the error going on here.  Let's look at the
trap/ddb output:


>   kernel: protection fault trap, code=0
>   Stopped at  amd64_errata_setmsr+0x4e:   wrmsr
>
> Contents of CPU registers:
> ddb> show registers
>   rdi      0x9c5a203a
>   rsi      0x820ff920          errata+0xe0
>   rbp      0x824c5740          end+0x2c5740
>   rbx      0x18
>   rdx      0
>   rcx      0xc0011029
>   rax      0x3
>   r8       0x824c55a8          end+0x2c55a8
>   r9       0
>   r10      0xbdf7dabff85d847b
>   r11      0x51e076fef1dcfa7b
>   r12      0
>   r13      0
>   r14      0x820ff940          acpihid_ca
>   r15      0x820ff920          errata+0xe0
>   rip      0x81bc6ede          amd64_errata_setmsr+0x4e
>   cs       0x8
>   rflags   0x10256             __ALIGN_SIZE+0xf256
>   rsp      0x824c5730          end+0x2c5730
>   ss       0x10
>   amd64_errata_setmsr+0x4e:   wrmsr


Oh hey, it says RIGHT THERE that a wrmsr instruction faulted.  Which one?
Well, it's in the function amd64_errata_setmsr().  Furthermore, we just
have to remember that wrmsr takes the MSR to write in the %ecx register
(something the qemu people surely know) and so it's the 0xc0011029 MSR.
Let's grep for that in the amd64 kernel source:

: bleys; cd /usr/src/sys/arch/amd64/
: bleys; grep -rw 0xc0011029  *
include/specialreg.h:#define MSR_DE_CFG 0xc0011029  /* Decode Configuration */
: bleys; grep -rwl MSR_DE_CFG *
amd64/identcpu.c
amd64/vmm.c
amd64/amd64errata.c
include/specialreg.h
: bleys; grep -rwl ^amd64_errata_setmsr *
amd64/amd64errata.c
: bleys; less +/MSR_DE_CFG amd64/amd64errata.c
<...>
/*
 * 721: Processor May Incorrectly Update Stack Pointer
 */
{
721, 0, MSR_DE_CFG, amd64_errata_set9,
amd64_errata_setmsr, DE_CFG_721
},


Looks like qemu fails to behave like a real AMD CPU by failing to handle
the wrmsr() for that errata.  Also the kernel you're running it on is
failing to apply the errata itself (because otherwise OpenBSD wouldn't be
trying to flip the bit itself).  Go shake an AMD errata document at the
qemu people and figure out why your host kernel isn't applying a documented
fix.

Paying attention to what the kernel tells you is a Good Thing.  Honestly,
what you showed above, that it trapped on wrmsr with those registers should
have been enough for the qemu people to figure out what wasn't working.


Philip Guenther


Re: time_t

2020-10-05 Thread Philip Guenther
On Mon, Oct 5, 2020 at 12:27 PM Roderick  wrote:
...

> Back to tar files: there is place for 11 octal digits, that is
> only twice the time you can count with 32 bits, in years:
>
> 2^33/(60*60*24*365.25*2)=136.09930083403047126524
>
> Also not too much. Is it not a better solution to begin a new epoch
> every 68.05 years? We can do a big celebration at the beginning of
> each new epoch.
>

The pax file format (which is supported by many 'tar' binaries) supports
expressing the time as a decimal string with sub-integer part, bounded only
by the block size, solving both the field size limit problem and the lack
of subsecond resolution.


Philip Guenther


Re: i386, parallel port permission error?

2020-08-19 Thread Philip Guenther
On Wed, Aug 19, 2020 at 3:09 AM Doug Moss  wrote:

> On 2020-08-17, Stuart Henderson wrote:
> >On 2020-08-17, Doug Moss  wrote:
> >>
> >> Did something change at OpenBSD i386 between 5.9 and 6.0
> >> related to parallel port / lpt hardware permissions?
> >>
> >> Up to OpenBSD i386 5.9,
> >> I used to be able to have a working case-LCD-screen
> >> with lcdproc-0.5.7, driver=hd44780, winamp wiring, with 'allowaperture'.
> >> At OpenBSD i386 6.0 and after, it fails.
> >
> >I think this is due to kernel memory access restrictions that were added.
> >Setting sysctl kern.allowkmem=1 before securelevel is raised bypasses them
> >but of course weakens protections.
>
> I think the problem in lcdproc is in the code from this file (port.h)
> https://github.com/lcdproc/lcdproc/blob/master/server/drivers/port.h
>
> I am out of my depth with this code. I have never even seen these
> calls 'outb' and 'inb'
> The code looks like it was begun in 1995.
> Is that what you are talking about 'kernel memory access'?
>

Those are direct CPU instructions for I/O.  To use them, the code must use
i386_iopl(2) from libarch.a to enable it, which in turn requires the
machdep.allowaperture sysctl to be set to a non-zero value (per the manpage).



> Any advice about this? Is this code amenable to being 'modernized'?
>
> If can't modernize the lcdproc code, can you give me specifics about:
> Do I just put a line in /etc/rc.securelevel
> kern.allowkmem=1
>

Try machdep.allowaperture=1 instead.


Philip Guenther


Re: sysctl and panic

2020-08-04 Thread Philip Guenther
On Tue, Aug 4, 2020 at 12:23 PM Sven F.  wrote:
...

> # sysctl -w  ddb.panic=1
> sysctl: ddb.panic: Operation not permitted

...

> Is this expected and can be set only early in boot ?
>

Yes, exactly.  Read the securelevel(7) or sysctl(2) manpages for details.



> is ddb.panic=0 still supported ?
>

Yes.

Philip Guenther


Re: perl hex possible bug

2020-07-21 Thread Philip Guenther
On Tue, Jul 21, 2020 at 3:12 PM Edgar Pettijohn 
wrote:

> I was playing around with the hex function in perl. So naturally I
> started with:
>
> perldoc -f hex
>
> Which showed me a few examples namely the following:
>
> print hex '0xAf'; # prints '175'
> print hex 'aF';   # same
> $valid_input =~ /\A(?:0?[xX])?(?:_?[0-9a-fA-F])*\z/
>
> However, I get the following output: (newlines added for clarity)
>
> laptop$ perl -e 'print hex '0xAf';'
> 373
>

You used the same quotes on the inside and out, so the "inner"
quotes actually never get to the perl!  The shell parses the argument to
perl to
print hex 0xAf

0xAf is a numeric literal whose value is 175.  The hex() function then
takes its argument (175), converts it to a string ("175") and interprets
that string per its rules...as if you passed it "0x175" which equals 373.
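
The same round trip can be sketched in C, purely as an analogue of what
perl ends up doing here (this is not perl's implementation, and the
function name is made up): the number is rendered in decimal, then that
decimal text is re-read as hexadecimal digits.

```c
#include <stdio.h>
#include <stdlib.h>

/* Render n as decimal text, then parse that text as hexadecimal. */
long hex_of_number(long n)
{
	char text[32];

	snprintf(text, sizeof(text), "%ld", n);	/* 175 -> "175" */
	return strtol(text, NULL, 16);		/* "175" -> 0x175 */
}
```

So hex_of_number(0xAf) gives 373, while parsing the string "Af" directly
as base 16 gives 175.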

If you use distinct quotes, you get the value you expect:

$ perl -le 'print hex "0xAf";'
175
$


> laptop$ perl -e 'print hex 'aF';'
> 175
>

That relies on the so-called poetry extension, where a bare word like aF is
treated as a string.  Turn on strict...

$ perl -Mstrict -le 'print hex Af;'
Bareword "Af" not allowed while "strict subs" in use at -e line 1.
Execution of -e aborted due to compilation errors.
$



> I'm guessing there is a bug here but not sure if its software or
> documentation.
>

No bug, just shell quoting traps.


Philip Guenther


Re: Potential grep bug?

2020-06-24 Thread Philip Guenther
Nope.  This is a grep of a single file, so procfile() must be overflowing
and this only 'fixes' it by relying on signed overflow, which is undefined
behavior, being handled in a particular way by the compiler.  So, luck
(which fails when the compiler decides to hate you).  There are more places
that need to change for the reported problem to be handled safely.
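
For reference, the numbers in the report are what you get from reducing
the true count modulo 2^32.  The sketch below uses an unsigned conversion,
which is well defined, only to show that arithmetic; the real counter is a
signed int, where overflow is undefined, so nothing guarantees this result.
The function name is made up.

```c
#include <stdint.h>

/* What a 32-bit counter would hold if it wrapped modulo 2^32. */
uint32_t wrapped_count(uint64_t matches)
{
	return (uint32_t)matches;
}
```

4294967350 matches wrap to 54 and 4294967296 matches wrap to 0, matching
the jot examples in the report.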

Philip Guenther


On Tue, Jun 23, 2020 at 9:58 PM Martijn van Duren <
open...@list.imperialat.at> wrote:

> This seems to fix the issue for me.
>
> OK?
>
> martijn@
>
> On Tue, 2020-06-23 at 19:29 -0700, Jordan Geoghegan wrote:
> > Hello,
> >
> > I was working on a couple POSIX regular expressions to search for and
> > validate IPv4 and IPv6 addresses with optional CIDR blocks, and
> > encountered some strange behaviour from the base system grep.
> >
> > I wanted to validate my regex against a list of every valid IPv4
> > address, so I generated a list with a zsh 1 liner:
> >
> >   for i in {0..255}; do; echo $i.{0..255}.{0..255}.{0..255} ; done |
> > tr '[:space:]' '\n' > IPv4.txt
> >
> > My intentions were to test the regex by running it with 'grep -c' to
> > confirm there was indeed 2^32 addresses matched, and I also wanted to
> > benchmark and compare performance between BSD grep, GNU grep and
> > ripgrep. The command I used:
> >
> > grep -Eoc
> >
> "((25[0-5]|(2[0-4]|1{0,1}[[:digit:]]){0,1}[[:digit:]])\.){3,3}(25[0-5]|(2[0-4]|1{0,1}[[:digit:]]){0,1}[[:digit:]])(/[1-9]|/[1-2][[:digit:]]|/3[0-2])?"
> >
> > My findings were surprising. Both GNU grep and ripgrep were able get
> > through the file in roughly 10 and 20 minutes respectively, whereas the
> > base system grep took over 20 hours! What interested me the most was
> > that the base system grep when run with '-c' returned '0' for match
> > count. It seems that 'grep -c' will have its counter overflow if there
> > are more than 2^32-1 matches (4294967295) and then the counter will
> > start counting from zero again for further matches.
> >
> >  ryzen$ time zcat IPv4.txt.gz | grep -Eoc
> "((25[0-5]|(2[0-4]|1{0,1}...
> >  0
> >  1222m09.32s real  1224m28.02s user 1m16.17s system
> >
> >  ryzen$ time zcat allip.txt.gz | ggrep -Eoc
> "((25[0-5]|(2[0-4]|1{0,1}...
> >  4294967296
> >  10m00.38s real11m40.57s user 0m30.55s system
> >
> >  ryzen$ time rg -zoc "((25[0-5]|(2[0-4]|1{0,1}...
> >  4294967296
> >  21m06.36s real27m06.04s user 0m50.08s system
> >
> > # See the counter overflow/reset:
> >  jot 4294967350 | grep -c "^[[:digit:]]"
> >  54
> >
> > All testing was done on a Ryzen desktop machine running 6.7 stable.
> >
> > The grep counting bug can be reproduced with this command:
> > jot 4294967296 | nice grep -c "^[[:digit:]]"
> >
> > Regards,
> >
> > Jordan
> >
> Index: util.c
> ===
> RCS file: /cvs/src/usr.bin/grep/util.c,v
> retrieving revision 1.62
> diff -u -p -r1.62 util.c
> --- util.c  3 Dec 2019 09:14:37 -   1.62
> +++ util.c  24 Jun 2020 06:46:52 -
> @@ -106,7 +106,8 @@ procfile(char *fn)
>  {
> str_t ln;
> file_t *f;
> -   int c, t, z, nottext;
> +   int t, z, nottext;
> +   unsigned long long c;
>
> mcount = mlimit;
>
> @@ -169,7 +170,7 @@ procfile(char *fn)
> if (cflag) {
> if (!hflag)
> printf("%s:", ln.file);
> -   printf("%u\n", c);
> +   printf("%llu\n", c);
> }
> if (lflag && c != 0)
> printf("%s\n", fn);
>
>


Re: Potential awk bug?

2020-06-06 Thread Philip Guenther
On Sat, Jun 6, 2020 at 5:08 PM Zé Loff  wrote:

> On Sat, Jun 06, 2020 at 03:51:58PM -0700, Jordan Geoghegan wrote:
> > I'm working on a simple awk snippet to convert the IP range data listed
> in
> > the Extended Delegation Statistics data from ARIN [1] and convert it into
> > CIDR blocks. I have a snippet that works perfectly fine on mawk and gawk,
> > but not on the base system awk. I'm 99% sure I'm not using any GNUisms,
> as
> > when I break the command up into two parts, it works perfectly.
> >
> > The snippet below does not work with base awk, but does work with gawk
> and
> > mawk: (Running on 6.6 -stable system)
> >
> >   awk -F '|' '{ if ( $3 == "ipv4" && $2 == "US") printf("%s/%d\n", $4,
> > 32-log($5)/log(2))}' delegated-arin-extended-latest.txt
> >
> >
> > The command does output data, but it also throws errors for certain
> lines:
> >
> >   awk: log result out of range
> >   input record number 94027, file delegated-arin-extended-latest.txt
> >   source line number 1
> >
> > Most CIDR blocks are calculated correctly, but about 10% of them have
> errors
> > (ie something that should calculated to be a /24 is instead calculated
> to be
> > a /30).
>
...

> I have no idea about what is going on, but FWIW I can reproduce this on
> i386 6.7-stable and amd64 6.7-current (well, current-ish, #232).
> Truncating the file to a single offending line produces the same result:
> log($5) is out of range.
>
> It appears to have something to do with the last field.  Removing it or
> changing some of its characters seems to work, e.g.:
>
>
> arin|US|ipv4|216.250.144.0|4096|20050503|allocated|5e58386636aa775c2106140445cf2c30
>
> arin|US|ipv4|216.250.144.0|4096|20050503|allocated|5a58386636aa775c2106140445cf2c30
> ^
> Fails on the first line but works on the second.
>

Hah!  Nice observation!

The last field of the first line looks kinda like a number in scientific
notation, but when awk internally tries to set up the fields it generates
an ERANGE error...and the global errno variable is left with that value.
Several builtins in awk, including log(), perform operations and then check
whether errno is set to EDOM or ERANGE but fail to clear errno beforehand.
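
The stale-errno pattern is easy to reproduce outside awk.  In the sketch
below, strtod() stands in both for awk's field conversion and for the
later checked call, so that the example needs nothing beyond libc; the
function name is made up.  It relies on strtod() leaving errno alone when
it succeeds, which is the usual behavior.

```c
#include <errno.h>
#include <stdlib.h>

/*
 * Convert 's' and report whether errno says EDOM/ERANGE afterwards.
 * With clear_first == 0 this mimics the buggy pattern: an errno left
 * over from unrelated earlier code is blamed on this call.
 */
int flags_range_error(const char *s, int clear_first, double *out)
{
	if (clear_first)
		errno = 0;
	*out = strtod(s, NULL);
	return errno == ERANGE || errno == EDOM;
}
```

After an earlier strtod("5e58386636", NULL) has overflowed and set ERANGE,
converting "4096" is reported as an error unless errno is cleared first.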

The fix is to zero errno before all the code sequences that use the
errcheck() function, ala:

--- run.c   13 Aug 2019 10:45:56 -  1.44
+++ run.c   7 Jun 2020 03:14:38 -
@@ -26,6 +26,7 @@ THIS SOFTWARE.
 #define DEBUG
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -1041,8 +1042,10 @@ Cell *arith(Node **a, int n) /* a[0] + a
case POWER:
if (j >= 0 && modf(j, ) == 0.0)   /* pos integer
exponent */
i = ipow(i, (int) j);
-   else
+   else {
+   errno = 0;
i = errcheck(pow(i, j), "pow");
+   }
break;
default:/* can't happen */
FATAL("illegal arithmetic operator %d", n);
@@ -1135,8 +1138,10 @@ Cell *assign(Node **a, int n)/* a[0] =
case POWEQ:
if (yf >= 0 && modf(yf, ) == 0.0) /* pos integer
exponent */
xf = ipow(xf, (int) yf);
-   else
+   else {
+   errno = 0;
xf = errcheck(pow(xf, yf), "pow");
+   }
break;
default:
FATAL("illegal assignment operator %d", n);
@@ -1499,12 +1504,15 @@ Cell *bltin(Node **a, int n)/* builtin
u = strlen(getsval(x));
break;
case FLOG:
+   errno = 0;
u = errcheck(log(getfval(x)), "log"); break;
case FINT:
modf(getfval(x), ); break;
case FEXP:
+   errno = 0;
    u = errcheck(exp(getfval(x)), "exp"); break;
case FSQRT:
+   errno = 0;
u = errcheck(sqrt(getfval(x)), "sqrt"); break;
case FSIN:
u = sin(getfval(x)); break;


Todd, are we up to date with upstream, or is this latent there too?


Philip Guenther


Re: Convert ffs1 to ffs2?

2020-05-20 Thread Philip Guenther
On Tue, May 19, 2020 at 10:50 PM Christer Solskogen <
christer.solsko...@gmail.com> wrote:

> Is that possible?
>

"Possible" is irrelevant.  Lots of things are _possible_ but not done.
"Has anyone actually written a tool to do this, and would you *trust* it?"
are the proper question...and the answer appears to be *no*.


Philip Guenther


Re: OpenBSD insecurity rumors from isopenbsdsecu.re

2020-05-11 Thread Philip Guenther
On Mon, May 11, 2020 at 6:09 PM  wrote:
...

> > And why would *you* care about those ways? If you can't tell us why you
> would care, how can we answer your _real_ question?


> Treat it as my secret, I want and that is why I ask because I can, I wish
> you tell me the answer without a knowledge of "why I ask",
> it is a very long discussion of answering by a question to question in
> your Jewish style, is not it?
>

I considered treating your questions in good faith, but then you said
this.  If my questions have you spouting this nonrational drivel then you
should stay away from OpenBSD because I am a committer and if you can't
trust my questions then you shouldn't trust my code.




Philip Guenther


Re: OpenBSD insecurity rumors from isopenbsdsecu.re

2020-05-11 Thread Philip Guenther
On Mon, May 11, 2020 at 4:28 PM  wrote:

> Is not a prohibition for USA citizens to work on OpenBSD cryptography
> software parts an indication of trust relationship between current OpenBSD
> and current USA?
>

I'm not sure what that sentence even means.  What would a "trust
relationship" between OpenBSD and "current USA" actually mean in terms of a
CHANGE IN BEHAVIOR?  Hell, what does "current USA" even _mean_?!?  Did you
mean to say "the US Federal Government"?  If so, what would "trust between
OpenBSD and the US Federal Government" actually mean in terms of a change
in behavior that you, i...@aulix.com, could actually detect?

And why would *you* care about those ways?  If you can't tell us why you
would care, how can we answer your _real_ question?

There is cryptographic software in OpenBSD that was developed in part by
someone who is/was a US citizen, in OpenSSH even, as a check of
copyright/license statements on source files shows.  How does that change
your world view?


Philip Guenther


Re: Double fault trap in rtable_l2

2020-04-20 Thread Philip Guenther
On Sat, Apr 18, 2020 at 11:28 PM Thomas de Grivel 
wrote:

> I got this error last night on an OpenBSD 6.6-stable amd64 on which I
> recently enabled IKEv2 :
>
> > kernel: double fault trap, code=0
> > Stopped at      rtable_l2+0x27: callq   srp_enter+0x4
>

That was the *complete* output from ddb?  Really?  Not a screen full of
backtrace after that showing that it has a very deep stack?

As you might guess from my questions: the #1 cause of double fault traps
is kernel bugs causing deep recursion that runs off the end of the
allocated stack, triggering a page fault exception which itself faults when
it can't write the stack frame for the page fault.  That "fault while
trying to fault" results in a double fault, which I configured to be
delivered on its own stack so that we can report this.

Fixing the deep recursion in this case would require you providing the full
stack trace to the list, so that the correct parties can see it and
identify where it's incorrectly looping.


Philip Guenther


Re: Openbsd supports pae?

2020-04-10 Thread Philip Guenther
Because it would be a total PITA now and in the future and benefit only
that small set of machines that have >4GB of memory but that can't run
64bit.

Since you like one-liner questions: why do you care?


Re: ffs details

2020-02-25 Thread Philip Guenther
On Tue, Feb 25, 2020 at 6:03 PM  wrote:

> Hi, I need some details about ffs, I read the kernel source but my c
> knowledge is very basic. I understood all about the superblock but my
> problem is understand how the files are allocated on the disk.
> Anyone could give me more details about files allocation ?
>

You should start from the paper that described the design and
implementation:
https://www.cs.berkeley.edu/~brewer/cs262/FFS.pdf

as linked to from https://en.wikipedia.org/wiki/Unix_File_System

Kirk McKusick has continued to revise and improve FFS; many of those
changes have been included in OpenBSD.  Check his biography and his
personal website for links to papers and presentations.


Philip Guenther


Re: is there a 2GB limit on amd64 link?

2020-02-05 Thread Philip Guenther
On Wed, Feb 5, 2020 at 7:38 PM  wrote:

> I am encountering a linker error when compiling with ports-gcc Fortran:
>
> ld: error: lbug2.f90:(function MAIN__: .text+0x80): relocation
> R_X86_64_PC32 out o
> f range: 2456507324 is not in [-2147483648, 2147483647]
>
> The code has several large arrays, the total size of which exceeds 2GB.
>
> Is this a linker issue, a gcc fortran issue, or a pebkac?
>

It's at least a gnu fortran issue: it needs to generate object code in a
larger "model" than it currently is.  I've never used gnu fortran, but it
might accept the -mcmodel=medium option like gcc and generate code
sequences for data symbols that don't limit them to the bottom 2GB (or to
within 2GB of the involved code, depending on gcc's choices in implementing
the model).

If it doesn't accept that option, then you'll need to work with the
docs, mailing lists, etc of the upstream gnu fortran project about how to
have it generate code for the medium or large data models per the amd64 ABI.


Philip Guenther


Re: Readv and writev failing across ethernet

2019-12-24 Thread Philip Guenther
On Tue, Dec 24, 2019 at 8:14 PM Raymond, David 
wrote:

> Openmpi uses readv/writev.  I am beginning to think that the timeout
> and permission errors are legit and reflect real conditions.  What
> does re do when it receives a write request when it is busy?
>

're' does not expose a device, but rather provides network interfaces that
are then used with sockets.  What sort of sockets does openmpi use?  What
sort of packet loss is generated on this network and what protocols does
openmpi use to recover from that?

(Lacking both dmesg and kdump output, I'll probably have nothing further to
contribute to this thread)

Philip Guenther


Re: Re-organising partitions without re-installation

2019-12-23 Thread Philip Guenther
On Mon, Dec 23, 2019 at 3:10 PM Stuart Longland 
wrote:
...

> Where do you get `sysclean` from?  I don't seem to have it:
> > sjl-router# man sysclean
>
> > man: No entry for sysclean in the manual.
> > sjl-router# which sysclean
> > which: sysclean: Command not found.
>

$ pkg_info sysclean
Information for
http://mirrors.sonic.net/pub/OpenBSD/snapshots/packages/amd64/sysclean-2.8.tgz

Comment:
list obsolete files between OpenBSD upgrades

Description:
sysclean is a script designed to help remove obsolete files between OpenBSD
upgrades.

sysclean compares a reference root directory against the currently installed
files, taking files from both the base system and packages into account.

sysclean does not remove any files on the system. It only reports obsolete
filenames or packages using out-of-date libraries.

Maintainer: Sebastien Marie 

WWW: https://github.com/semarie/sysclean/

$


Re: Readv and writev failing across ethernet

2019-12-23 Thread Philip Guenther
On Mon, Dec 23, 2019 at 6:07 AM Ingo Schwarze  wrote:

> Theo de Raadt wrote on Sun, Dec 22, 2019 at 05:34:45PM -0700:
> > Philip Guenther  wrote:
> >> Somebody wrote:
>
> >>> The man pages for readv and writev don't document the possibility of
> >>> such errors.
>
> >> IMO, weird errnos from devices should be documented in the manpage for
> >> the device.  Consider the termios(4) manpage, for example.
>
> > I agree on that.  Otherwise the information-flood is too much.
> >
> > But I think some of our manual pages are a bit weak indicating there
> > are other errors not listed:
>
> Is the following good enough?
>
> Or are you saying that *all* section 2 and 3 manual pages should be
> reworded to say:  "FOOBAR may for example fail if:"?
>

Not all.  For section 2 it's the calls that take an fd that need to be
open-ended about errors.
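
To illustrate what "open-ended" means for a caller, here is a minimal
sketch of a readv(2) wrapper that does not assume the errno list in the
manual is exhaustive.  The name readv_retry is hypothetical; it retries
only on EINTR and reports anything else via strerror(3) instead of matching
against a fixed set of values.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/uio.h>

/*
 * Hypothetical helper: call readv(2), retrying only on EINTR.
 * Any other errno is reported as-is, since the object behind the fd
 * (file, socket, device) decides which errors can come back.
 */
ssize_t
readv_retry(int fd, const struct iovec *iov, int iovcnt)
{
	ssize_t n;

	do {
		n = readv(fd, iov, iovcnt);
	} while (n == -1 && errno == EINTR);

	if (n == -1) {
		int saved = errno;	/* fprintf() may clobber errno */

		fprintf(stderr, "readv: %s (errno %d)\n",
		    strerror(saved), saved);
		errno = saved;
	}
	return n;
}
```

The caller still sees -1 and the original errno, so it can add special
handling for the values it does know about.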

Philip Guenther


Re: Disabling ACPI permanently

2019-12-23 Thread Philip Guenther
On Mon, Dec 23, 2019 at 5:10 AM Radek  wrote:

> I'm trying to permanently disable acpi doing the following steps[1].
> After the first reboot OS boots fine.
> After the second reboot acpi seems to be re-enabled at boot - I get [2].
> What Am I doing wrong?
>

First, you should also check whether there's a newer BIOS firmware for this
box, as there's a good chance Intel has fixed issues and issued a new one.
If so, installing that may totally resolve the issue.

If not, or if upgrading the firmware doesn't resolve this, then you should
next send a bug report to b...@openbsd.org using sendbug.  To get the most
data when you do so, disable _just_ the acpipci device (using boot -c)
instead of all of acpi and then run sendbug as root on that system.  The
bug report will then include the data from the ACPI tables, so that the
driver can be fixed to deal with this.

...

> acpipci0 at acpi0 PCI0panic: malloc: allocation too large, type = 33, size
> = 292057776136
>


Philip Guenther


Re: Readv and writev failing across ethernet

2019-12-23 Thread Philip Guenther
On Mon, Dec 23, 2019 at 5:04 AM Raymond, David 
wrote:

> The "timeout" error was numerically 60.  Curiously, boards with RTL
> 8111GR chips did not produce these errors, but those with RTL 8111H
> chips did.  Unfortunately, this chipset seems to be in a lot of newer
> motherboards.
>
> I didn't use ktrace/kdump.  The openmpi software returned the error
> presented by readv/writev.
>
> It sounds like the simplest solution at this point is to try
> non-Realtek pcie network cards.  Any suggestions?  How are Intel or
> Broadcom cards?
>

At this point I think you're clearly in the "device driver is buggy"
situation.  If this device has an in-tree driver (and not something you're
compiling locally into your kernel) then you should start a new thread
starting with a dmesg and a clear description of the involved hardware.


Philip Guenther


Re: Readv and writev failing across ethernet

2019-12-22 Thread Philip Guenther
On Sun, Dec 22, 2019 at 3:33 PM Raymond, David 
wrote:

> I am running openmpi-4.0.2 (self-compiled with GDS patches) on
> up-to-date 6.6 stable with a Go program that calls Clang MPI routines.
> With particular hardware (details provided if desired), readv and
> writev calls randomly fail with respectively "Timeout" and "Permission
> denied" errors for calls from one machine to another across the
> ethernet.


While "Permission denied" is the error message for EACCES, "Timeout" is not
a complete errno error message on OpenBSD.  Has it been established that the
underlying readv/writev syscalls are returning particular errors by using
ktrace/kdump?
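
Short of ktrace/kdump, one way to remove the ambiguity is to log the
numeric errno next to the strerror(3) text.  A minimal sketch follows; the
helper name format_errno is made up for this example, and the exact message
text and numbers vary between systems.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/*
 * Format "NUMBER: message" for an errno value, so that a log line
 * can be matched against the errno list without guessing.
 */
int
format_errno(int err, char *buf, size_t len)
{
	return snprintf(buf, len, "%d: %s", err, strerror(err));
}
```

Calling format_errno(errno, buf, sizeof buf) right after the failing call
gives a string that can be compared against errno(2) directly.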

Next: if you have a device open, then the device driver *totally controls*
what errnos syscalls get.  If a device driver wanted to return EDOM
("Numerical argument out of domain") it totally could.  If you're getting a
weird errno from a device, well, review the device source!


> The errors don't occur between cores on the same machine.
>

THIS SHOULD NOT BE A SURPRISE: the net is not the same as your local
machine.


> The man pages for readv and writev don't document the possibility of
> such errors.


IMO, weird errnos from devices should be documented in the manpage for the
device.  Consider the termios(4) manpage, for example.


Philip Guenther


Re: Unable to build OpenBSD 6.6 libc on beaglebone black

2019-12-07 Thread Philip Guenther
On Sat, Dec 7, 2019 at 5:10 PM Jacob Adams  wrote:

> When trying to build libc with the latest security patch applied on my
> beaglebone black, I was met with the following error:
>
> cc -O2 -pipe -g -Wimplicit -I/usr/src/lib/libc/include
> -I/usr/src/lib/libc/hidden -D__LIBC__
> -Werror-implicit-function-declaration
> -include namespace.h -Werror=deprecated-declarations -DAPIWARN -DYP
> -I/usr/src/lib/libc/yp -DSOFTFLOAT_FOR_GCC -I/usr/src/lib/libc/softfloat
> -I/usr/src/lib/libc -I/usr/src/lib/libc/gdtoa
> -I/usr/src/lib/libc/arch/arm/gdtoa
> -DINFNAN_CHECK -DMULTIPLE_THREADS -DNO_FENV_H -DUSE_LOCALE
> -I/usr/src/lib/libc
> -I/usr/src/lib/libc/citrus -DRESOLVSORT -DFLOATING_POINT -DPRINTF_WIDE_CHAR
> -DSCANF_WIDE_CHAR -DFUTEX  -MD -MP  -c
> /usr/src/lib/libc/db/btree/bt_close.c -o
> bt_close.o
> In file included from /usr/src/lib/libc/db/btree/bt_close.c:37:
> /usr/src/lib/libc/hidden/stdlib.h:68:14: error: use of undeclared
> identifier
> 'calloc_conceal'
> PROTO_NORMAL(calloc_conceal);
>  ^
> /usr/src/lib/libc/hidden/stdlib.h:109:14: error: use of undeclared
> identifier
> 'malloc_conceal'
> PROTO_NORMAL(malloc_conceal);
>  ^
> 2 errors generated.
> *** Error 1 in /usr/src/lib/libc (:39 'bt_close.o': @cc -O2
> -pipe -g
> -Wimplicit -I/usr/src/lib/libc/include -I/usr/src/lib/libc/...)
>
>
> I unpacked a copy of the 6.6 src.tar.gz that I had downloaded a while ago
> in
> /usr/src, and then updated to the stable branch with:
>
> cvs -qd anon...@anoncvs.ca.openbsd.org:/cvs up -Pd -rOPENBSD_6_6
>
> I then ran:
>
> cd lib/libc
> make obj
> make
>
> and encountered this error.
>
> Clearly I've done something wrong, could someone please point me to my
> mistake?
>

This box didn't have the 6.6 include files installed.  This is demonstrated
by the lack of a calloc_conceal() declaration in your
/usr/include/stdlib.h. You can't just build 6.6 pieces without their
dependent pieces being present.

Now, unless you can explain _exactly_ how you ended up with this
franken-system (6.6 kernel but not the include files?) and come up with a plan to get it
out of the franken-state into a normal "matched kernel and userland,
including compilation environment", then my recommendation would be to grab
the 6.6 bsd.rd, boot to it, and (u)pgrade to 6.6 being sure to include the
comp66 set in your install, and _then_ try building things...or just run
syspatch.


Philip Guenther


Re: How to achieve O_TTY_INIT when opening a USB modem?

2019-11-24 Thread Philip Guenther
On Sun, Nov 24, 2019 at 7:53 PM Jeffrey Walton  wrote:

> On Sun, Nov 24, 2019 at 10:10 PM Philip Guenther 
> wrote:
> >
> > On Sun, Nov 24, 2019 at 3:11 AM Jeffrey Walton 
> wrote:
> >>
> >> I am struggling to get a USB modem and terminal configured properly
> >> under OpenBSD. The same code on Linux is fine. The symptom I am seeing
> >> is a hung read() after issuing ATZ\r to the modem.
> >>
> >> I'm guessing there's an uninitialized field in my struct termios tty.
> >
> > I'm not sure what you mean by that.  Do you mean you're concerned that
> > you're making a tcsetattr(3) call on an incompletely initialized
> > structure?  Or do you mean you're concerned that the initial configuration
> > of the tty provided by the kernel is in a "not good" state?
>
> I think cfmakeraw is not initializing the structure properly. It is an
> intermittent failure.
>

This code is misusing cfmakeraw(3): it needs to call tcgetattr(3) on the
tty fd and only call cfmakeraw() on the termios structure that tcgetattr()
has filled in.

(There may be other problems; I only reviewed enough to see that it was
violating the rule I mentioned in my previous post.  The _only_ portable
way to initialize a struct termios is to use tcgetattr()!)
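
The rule can be sketched as follows.  This is an illustration under
assumptions, not the poster's code: the helper name open_raw_tty is made
up, and the open(2) flags are one reasonable choice for a modem line.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1	/* glibc: expose cfmakeraw() in strict modes */
#endif
#include <fcntl.h>
#include <termios.h>
#include <unistd.h>

/*
 * Hypothetical helper: open a tty and switch it to raw mode.
 * Returns the fd, or -1 on failure.
 */
int
open_raw_tty(const char *path)
{
	struct termios tio;
	int fd;

	fd = open(path, O_RDWR | O_NOCTTY);
	if (fd == -1)
		return -1;

	/* Start from what the kernel reports, never from zeroed memory. */
	if (tcgetattr(fd, &tio) == -1) {
		close(fd);
		return -1;
	}

	cfmakeraw(&tio);	/* modifies the filled-in structure in place */

	if (tcsetattr(fd, TCSANOW, &tio) == -1) {
		close(fd);
		return -1;
	}
	return fd;
}
```

If the path is not a tty, tcgetattr() fails and the helper returns -1
without ever handing an uninitialized structure to tcsetattr().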


Philip Guenther


Re: How to achieve O_TTY_INIT when opening a USB modem?

2019-11-24 Thread Philip Guenther
On Sun, Nov 24, 2019 at 3:11 AM Jeffrey Walton  wrote:

> I am struggling to get a USB modem and terminal configured properly
> under OpenBSD. The same code on Linux is fine. The symptom I am seeing
> is a hung read() after issuing ATZ\r to the modem.
>
> I'm guessing there's an uninitialized field in my struct termios tty.
>

I'm not sure what you mean by that.  Do you mean you're concerned that
you're making a tcsetattr(3) call on an incompletely initialized
structure?  Or do you mean you're concerned that the initial configuration
of the tty provided by the kernel is in a "not good" state?


> The latest Posix provides O_TTY_INIT to ensure a terminal is in a good
> configuration, but OpenBSD does not recognize it.
> What is the equivalent under OpenBSD?
>


OpenBSD, like all BSDs, does not require anything special to be done to
initialize a tty on first open.  We can (and I guess we should at this
point) define O_TTY_INIT to be zero.


> How do I achieve O_TTY_INIT when
> using a struct termios tty?
>

Before calling tcsetattr(3) you should call tcgetattr(3) to get the tty
device's current settings and only alter the setting you care about.


Philip Guenther


Re: Value of eax register after BIOS interrupt call from boot(8)

2019-11-09 Thread Philip Guenther
On Friday, November 8, 2019, Theo de Raadt  wrote:

> Philip Guenther  wrote:
>
> > No, it should be the other way, moving the “clear NT flag” block down
> after
> > the “save registers into save area” block
>
> Ah.
>
> Index: arch/amd64/stand/libsa/gidt.S
> ===
> RCS file: /cvs/src/sys/arch/amd64/stand/libsa/gidt.S,v
> retrieving revision 1.11
> diff -u -p -u -r1.11 gidt.S
> --- arch/amd64/stand/libsa/gidt.S   27 Oct 2012 15:43:42 -
> 1.11
> +++ arch/amd64/stand/libsa/gidt.S   9 Nov 2019 06:50:57 -
> @@ -423,14 +423,6 @@ intno  = . - 1
> movl%edx, 0x9*4(%esp)
> movb%bh , 0xe*4(%esp)
>
> -   /* clear NT flag in eflags */
> -   /* Martin Fredriksson  */
> -   pushf
> -   pop %eax
> -   and $0xbfff, %eax
> -   push%eax
> -   popf
> -
> /* save registers into save area */
> movl%eax, _C_LABEL(BIOS_regs)+BIOSR_AX
> movl%ecx, _C_LABEL(BIOS_regs)+BIOSR_CX
> @@ -438,6 +430,13 @@ intno  = . - 1
> movl%ebp, _C_LABEL(BIOS_regs)+BIOSR_BP
> movl%esi, _C_LABEL(BIOS_regs)+BIOSR_SI
> movl%edi, _C_LABEL(BIOS_regs)+BIOSR_DI
> +
> +   /* clear NT flag in eflags */
> +   pushf
> +   pop %eax
> +   and $0xbfff, %eax
> +   push%eax
> +   popf
>
> pop %gs
> pop %fs
> Index: arch/i386/stand/libsa/gidt.S
> ===
> RCS file: /cvs/src/sys/arch/i386/stand/libsa/gidt.S,v
> retrieving revision 1.36
> diff -u -p -u -r1.36 gidt.S
> --- arch/i386/stand/libsa/gidt.S31 Oct 2012 13:55:58 -
> 1.36
> +++ arch/i386/stand/libsa/gidt.S9 Nov 2019 06:51:29 -
> @@ -426,14 +426,6 @@ intno  = . - 1
> movl%edx, 0x9*4(%esp)
> movb%bh , 0xe*4(%esp)
>
> -   /* clear NT flag in eflags */
> -   /* Martin Fredriksson  */
> -   pushf
> -   pop %eax
> -   and $0xbfff, %eax
> -   push%eax
> -   popf
> -
> /* save registers into save area */
> movl%eax, _C_LABEL(BIOS_regs)+BIOSR_AX
> movl%ecx, _C_LABEL(BIOS_regs)+BIOSR_CX
> @@ -441,6 +433,13 @@ intno  = . - 1
> movl%ebp, _C_LABEL(BIOS_regs)+BIOSR_BP
> movl%esi, _C_LABEL(BIOS_regs)+BIOSR_SI
> movl%edi, _C_LABEL(BIOS_regs)+BIOSR_DI
> +
> +   /* clear NT flag in eflags */
> +   pushf
> +   pop %eax
> +   and $0xbfff, %eax
> +   push%eax
> +   popf
>
> pop %gs
> pop %f
>

Ok guenther@


Re: Value of eax register after BIOS interrupt call from boot(8)

2019-11-08 Thread Philip Guenther
On Friday, November 8, 2019, Theo de Raadt  wrote:

> Philip Guenther  wrote:
>
> > Since we're unlikely to do _more_ with BIOS calls in the boot loader, my
> > inclination would be to eliminate the structure value and the code that
> > sets it (incorrectly).  Opinions?
>
> I dunno, my crystal ball provides a more cynical outlook.
>
> How about we just repair by swapping the blocks as you propose, then
> noone gets surprised down the road if they try to use the bios-interface
> API's full functionality.
>
> The bootblocks don't shrink, but they don't grow either.
>
> Is this the right diff?  I'm deleting the name which is in the commitlogs
> since that isn't our style.

...

> --- sys/arch/amd64/stand/libsa/gidt.S   27 Oct 2012 15:43:42 -
> 1.11
>
+++ sys/arch/amd64/stand/libsa/gidt.S   9 Nov 2019 03:57:11 -
> @@ -417,19 +417,18 @@ intno = . - 1
> .byte   0xb8
>  2: .long   0x90909090
>
> -   /* pass BIOS return values back to caller */
> -   movl%eax, 0xb*4(%esp)
> -   movl%ecx, 0xa*4(%esp)
> -   movl%edx, 0x9*4(%esp)
> -   movb%bh , 0xe*4(%esp)
> -
> /* clear NT flag in eflags */
> -   /* Martin Fredriksson  */
> pushf
> pop %eax
> and $0xbfff, %eax
> push%eax
> popf


No, it should be the other way, moving the “clear NT flag” block down after
the “save registers into save area” block

Philip


Re: vi in ramdisk?

2019-11-07 Thread Philip Guenther
On Thu, Nov 7, 2019 at 9:57 PM Brennan Vincent 
wrote:

> I am asking this out of pure curiosity, not to criticize or start a debate.
>
> Why does the ramdisk not include /usr/bin/vi by default? To date,
> it is the only UNIX-like environment I have ever seen without some form
> of vi.
>

The ramdisk space is extremely tight.  We include what we feel is
necessary, PUSHING OUT other stuff as priorities shift.  If you have watched
the commits closely, you would have seen drivers vanish from the ramdisks
on tight archs as new functionality was added.

Given what we want people to use the ramdisks for (installing,
reinstalling, upgrading, fixing boot and set issues), vi is not necessary,
while other functionality and drivers extend their applicability.  We will
keep the latter and not include the former.


Philip Guenther


Re: Value of eax register after BIOS interrupt call from boot(8)

2019-11-07 Thread Philip Guenther
On Thu, Nov 7, 2019 at 9:31 AM Julius Zint  wrote:

> the following code snipped is from sys/arch/amd64/stand/libsa/gidt.S
>
> /* pass BIOS return values back to caller */
> movl%eax, 0xb*4(%esp)
> movl%ecx, 0xa*4(%esp)
> movl%edx, 0x9*4(%esp)
> movb%bh , 0xe*4(%esp)
>
> /* clear NT flag in eflags */
> /* Martin Fredriksson  */
> pushf
> pop %eax
> and $0xbfff, %eax
> push%eax
> popf
>
> /* save registers into save area */
> movl%eax, _C_LABEL(BIOS_regs)+BIOSR_AX
> movl%ecx, _C_LABEL(BIOS_regs)+BIOSR_CX
> movl%edx, _C_LABEL(BIOS_regs)+BIOSR_DX
> movl%ebp, _C_LABEL(BIOS_regs)+BIOSR_BP
> movl%esi, _C_LABEL(BIOS_regs)+BIOSR_SI
> movl%edi, _C_LABEL(BIOS_regs)+BIOSR_DI
>
> These instructions are being executed after a BIOS interrupt. If I read
> correctly, then (BIOS_regs)+BIOSR_AX contains the contents of the eflags
> processor register and not of %eax. Is this intended or should it contain
> the value of %eax?
>

Yeah, it looks like it's in the wrong order.  The trick, of course, is that
nothing actually examines BIOS_regs.biosr_ax, so the fact that the wrong
value is saved there hasn't mattered.

Since we're unlikely to do _more_ with BIOS calls in the boot loader, my
inclination would be to eliminate the structure value and the code that
sets it (incorrectly).  Opinions?


Philip Guenther


Re: this assembly example works in linux, netbsd - but not in openbsd, why?

2019-10-29 Thread Philip Guenther
On Tue, 29 Oct 2019, Guild Navigator wrote:
> Program prints first two strings directly.
> But it does not print the third string (1st array string).
> 
> And debugging says why.
> The address of msg1 and msg2 is not stored correctly in the array.
> So when I access the address of msg1 from the array:
> movq array (%rip), %rsi
> it is NOT the address of msg1.
> 
> I dont know if it is a linker problem?
> I could kind of do "manual relocation" of sorts to manually
> correct the addresses put in the array.

In general, if you're going to exclude all the C startup bits that the 
operating system has provided, then you've signed up for handling all the 
possible ELF bits yourself.  If you haven't boned up on ELF and its 
variations yet, then you should do so, if just so you can recognize when 
stuff has gone wrong.

In particular, the OpenBSD linker defaults to PIE.  To quote 
clang-local(1) (similar text is in gcc-local(1)):
 -   clang will generate PIE code by default, allowing the system to load
 the resulting binary at a random location.  This behavior can be
 turned off by passing -fno-pie to the compiler and -nopie to the
 linker.  It is also turned off when the -pg flag is used.

This means that yes, your executable has relocations.  You've left out the 
rcrt0.o code that OpenBSD provides that handles such relocations, 
therefore you must either do the relocation processing yourself, or invoke 
the linker with the -nopie flag to instead generate a statically positioned 
binary.

If you want to handle the relocations yourself, then eyeball the code in 
/usr/src/lib/csu/, particularly boot.h and amd64/md_init.h, and read the 
ELF spec and the amd64 ABI spec for structure definitions and similar.


> But what would be the OpenBSD correct way to
> write such simple print-from-the-array-of-strings program?

The answer to that literal question is "write it in C", but you obviously 
have another requirement of "...in ASM".
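
For comparison, a C version of the print-from-an-array-of-strings program
might look like the sketch below.  The message contents and the function
name print_messages are invented for illustration.  The point is that the
array stores the addresses of the strings, which is exactly the data that
needs relocating in a PIE; with the normal C startup code linked in, that
is handled before this code runs.

```c
#include <string.h>
#include <unistd.h>

static const char msg1[] = "first string\n";
static const char msg2[] = "second string\n";

/*
 * This table holds the addresses of msg1 and msg2.  In a PIE those
 * addresses are only known at load time, so relocations must be
 * applied before the table is usable.
 */
static const char *const messages[] = { msg1, msg2 };

/* Write every message to fd; returns total bytes written, or -1. */
ssize_t
print_messages(int fd)
{
	ssize_t total = 0;
	size_t i;

	for (i = 0; i < sizeof(messages) / sizeof(messages[0]); i++) {
		ssize_t n = write(fd, messages[i], strlen(messages[i]));

		if (n == -1)
			return -1;
		total += n;
	}
	return total;
}
```

Calling print_messages(STDOUT_FILENO) from main() prints both strings.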

There are *zero* places in OpenBSD where we write pure assembler programs.  
There are only two places where we do process bootstrap bits, lib/csu and 
libexec/ld.so/, and for both of those we do _just_ enough ASM to make 
calling a limited subset of C possible, and then call a C routine to do 
self-relocation.  Improving the C version of the self-relocation code is 
*much* faster than trying to improve N assembly versions.  Heck, I did so 
just this month.

ASM is cool for stuff that C can't do, but if C can do it the developer's 
time (and the time of future openbsd maintainers!) is much better spent in 
C than in ASM.  I've touched ASM on every single current OpenBSD arch, so 
I understand the high cost of doing that when it _has_ been necessary and 
I have no interest in borking around in ASM for stuff C can reasonably do.


Philip Guenther



Re: help with understanding __BSD_VISIBLE

2019-07-13 Thread Philip Guenther
On Fri, Jul 12, 2019 at 10:39 AM Allan Streib  wrote:

> Probably an elementary question stemming from my lack of C expertise.
>
> I am trying to compile some C code that includes its own "bcrypt"
> function. This is conflicting with the declaration in pwd.h.
>
> error: conflicting types for 'bcrypt'
> int bcrypt(char *, const char *, const char *);
> ^
> /usr/include/pwd.h:112:8: note: previous declaration is here
> char*bcrypt(const char *, const char *);
>
> In pwd.h I see that the bcrypt declaration is wrapped in a #if block:
>
...

> __POSIX_VISIBLE is defined as 200809, so __BSD_VISIBLE should be 0 and
> the pwd.h declaration for bcrypt should be skipped?
>

There are four options here:
1) change the software to not use the name 'bcrypt' for a non-static
function.  OpenBSD has only been using it for 15 years...

2) If you're going to use the name bcrypt, then don't do so in files that
pull in <pwd.h>.  (This would be a last choice in my book, as that's a
fragile setup)

3) *IF* the software was written to only rely on the interfaces of some
version of the POSIX standard, then follow the compilation rules described
in that standard.  You mention POSIX 2008, so perhaps this software would
build when following those rules, passing the compiler
-D_POSIX_C_SOURCE=200809L to only declare the symbols from that standard.

Note that application software should *never* define macros matching the
pattern __*_VISIBLE such as __BSD_VISIBLE.  Those are in the reserved
namespace and on OpenBSD they are set by the system headers based on the macros
specified in the various standards for use by application and build
software.  The ones you should care about are:
  _POSIX_C_SOURCE -- standardized: specifies a POSIX version
  _XOPEN_SOURCE-- standardized: specifies a POSIX + XSI version
  _ISOC11_SOURCE-- adds C2011 interfaces
  _BSD_SOURCE     -- adds all BSD and obsoleted interfaces
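
Option 3 could look like the following sketch.  It assumes the software
really does stick to POSIX 2008 interfaces; the bcrypt() body is a
placeholder that uses the signature from the quoted error message and does
no hashing at all.  Defining the macro at the top of the file is meant to
have the same effect as passing -D_POSIX_C_SOURCE=200809L on the command
line.

```c
/* Must appear before any #include to take effect. */
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif

#include <pwd.h>

/*
 * Stand-in for the application's own bcrypt().  With only POSIX
 * symbols requested, the system declaration in <pwd.h> should be
 * hidden, so this definition no longer conflicts.
 * Placeholder body: produces an empty result and always fails.
 */
int
bcrypt(char *out, const char *pass, const char *salt)
{
	(void)pass;
	(void)salt;
	out[0] = '\0';
	return -1;
}
```

Note that the application requests a standard with _POSIX_C_SOURCE; it
never touches __BSD_VISIBLE or any other __*_VISIBLE macro itself.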

Make sense?

Philip Guenther


Re: Putting fifos in subshells into the background

2019-06-12 Thread Philip Guenther
On Wed, Jun 12, 2019 at 12:54 AM Richard Ulmer 
wrote:

> while making the Kakoune editor work on OpenBSD, I encountered some
> strange behaviour [1]. This little script doesn't work with the OpenBSD
> sh, but works at least with dash, bash and zsh:
>
> mkfifo 'testfifo'
> cat "$(
> ( printf 'foo\n' > testfifo 2>&1 ) > /dev/null 2>&1 &
> printf 'testfifo'
> )"
>
> I can make it work for all the mentioned shells like this:
>
> mkfifo 'testfifo'
> cat "$(
> ( ( printf 'foo\n' > testfifo 2>&1 ) & ) > /dev/null 2>&1
> printf 'testfifo'
> )"
>
> Can someone explain or justify the behaviour of the OpenBSD sh, or do
> you think this is a bug?
>

This is a bug, almost certainly from an over-zealous optimization in the
logic handling subshells where the possibility that an inner redirection
could be blocking wasn't taken into account when it tries to avoid
unnecessary forks.

Sorry, I don't have a fix in my back pocket.  Your workaround is good; I'll
note the intermediate set of parens can also be braces, which would let you
avoid the otherwise necessary whitespace between open-parens if that grates
on your soul like it does mine.  :)


Philip Guenther


Re: hw.ncpu=1, hw.ncpuonline=1, hw.ncpufound=4

2019-05-27 Thread Philip Guenther
On Mon, May 27, 2019 at 6:18 PM Ipsen S Ripsbusker <
ips...@ripsbusker.no.eu.org> wrote:

> Aaron Mason writes:
> > Looks to me like you're not running bsd.mp.  A dmesg would clear this
> > up.
>
> Indeed I was not running bsd.mp. I switched to bsd.mp, and then 2 of 4
> CPUs were online. Then I set "sysctl hw.smt = 1" to get all 4 online.
>

This is a side-point, but you do understand that those extra 2 aren't full
CPUs, they're just the cardboard mockups that Intel sold you, and that if
you run any untrusted code (including javascript in a web-browser) that
those fake CPUs leak data across process boundaries, right?



> Otto Moerbeek writes:
> > On Sun, Apr 07, 2019 at 01:54:35PM +, Ipsen S Ripsbusker wrote:
> > > ...
> > > Also, now that I have realized this, I have a theory about a related
> > > issue, and I would like to know how I can debug it. I am using softraid
> > > CRYPTO, and I have found that accessing the disk with one process will
> > > interrupt the other processes accessing the disk. Now I wonder whether
> > > this happens because the sole core must switch encryption/decryption
> > > processes for the different files. How could I determine whether this
> > > is indeed happening?
>

Can you explain in more detail what you were observing when you said "found
that accessing the disk with one process will interrupt the other processes
accessing the disk"?  The word 'interrupt' is overloaded in computing and
what you saw may be a real problem with device support, or it may be
completely innocuous, something which you should be ignoring.

Philip Guenther


Re: Purpose of primary and secondary user groups

2019-01-13 Thread Philip Guenther
On Sun, Jan 13, 2019 at 6:13 AM Bryan Harris  wrote:

> Is there also a difference when creating a file in a folder with set GID
> bit on that folder and owned by secondary group? I think in normal
> behavior, if folder allows a user to create a file (sec. group w/ 770
> perm.) then the new file group will not take the group of the folder but
> will take the group of the user's primary group. But if you have set GID
> bit then the new file will take the group of the folder it's in (which
> will be one of the user's secondary groups).
>
> I thought in OpenBSD there is also a flag to mount the filesystem to
> always do this regardless of set GID but I can't remember. I don't see
> it in the man page so maybe with all of this I'm really thinking of
> Linux but I can't remember.
>

Nope.  OpenBSD always uses the BSD behavior.  The use of the SGID bit on
directories to request BSD behavior was an addition in SystemV-based
systems when enough of their devs and users yelled at them to Not Be Stupid
And Provide the Better Behavior.  I'm not sure who or when first added the
mount option.  Linux certainly has both of those, but is not the only one.


Philip Guenther


Re: demystifying trap

2019-01-12 Thread Philip Guenther
On Sat, Jan 12, 2019 at 10:49 AM Predrag Punosevac 
wrote:

> Could one of the people with some rudimentary knowledge of kernel internals
> tell me what I am seeing here
>
> Jan 12 13:42:37 oko /bsd: trap [mmonit-bin]89524/427284 type 6: sp
> 122488ae75d0 not inside 7f7fffbf4000-7f7f4000
>

'sp' means "stack pointer" here.  The kernel is killing your process
because it moved its stack pointer outside the memory which was mapped with
MAP_STACK.  This is most often seen with userspace thread implementations
that haven't been updated to use MAP_STACK when allocating memory for
thread stacks.
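
For a userspace thread library, the update amounts to something like this
sketch.  The helper names are hypothetical and a real implementation would
likely also want a guard page; this only shows the mmap(2) flags involved.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1	/* glibc: expose MAP_ANON and MAP_STACK */
#endif
#include <stddef.h>
#include <sys/mman.h>

/*
 * Hypothetical helper: allocate memory suitable for use as a thread
 * stack.  Returns the lowest address of the region, or NULL.  Since
 * stacks usually grow down, the initial stack pointer would be set
 * near base + size.
 */
void *
alloc_thread_stack(size_t size)
{
	void *p;

	p = mmap(NULL, size, PROT_READ | PROT_WRITE,
	    MAP_PRIVATE | MAP_ANON | MAP_STACK, -1, 0);
	return p == MAP_FAILED ? NULL : p;
}

/* Release a stack obtained from alloc_thread_stack(). */
void
free_thread_stack(void *base, size_t size)
{
	munmap(base, size);
}
```

Memory obtained from malloc(3) or from mmap(2) without MAP_STACK is what
leads to the trap message quoted above once the stack pointer moves into it.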


Philip Guenther


Re: Porting some software to OpenBSD

2019-01-05 Thread Philip Guenther
On Sat, Jan 5, 2019 at 7:25 PM Adam Steen  wrote:

> I have a question about string (printf) formatting.
>
> I have a variable
>
> 'uint64_t freq'
>
> which is printed with
>
> 'log(DEBUG, "Solo5: clock_init(): freq=%lu\n", freq);'
>
> but am getting the following error
>
> '
> error: format specifies type 'unsigned long' but the argument has type
> 'uint64_t' (aka 'unsigned long long') [-Werror,-Wformat]
> freq);
> ^~~~
> 1 error generated.
> '
>
> The easy fix is to change the format to '%llu', but this breaks FreeBSD
> and Linux. Am I missing something or should I be investigating the log
> implementation?
>

Option 1)
log(DEBUG, "Solo5: clock_init(): freq=%llu\n", (unsigned long
long)freq);

Option 2)
#include <inttypes.h>
log(DEBUG, "Solo5: clock_init(): freq=%"PRIu64"\n", freq);

Software native to OpenBSD uses option 1 when necessary.
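
Both options can be written as small self-contained helpers.  This is only
a sketch: the function names are made up, and snprintf(3) stands in for the
project's own log() function, whose implementation isn't shown in the
thread.

```c
#include <inttypes.h>
#include <stdio.h>

/* Option 1: cast to a type whose printf conversion is fixed. */
int
format_freq_cast(char *buf, size_t len, uint64_t freq)
{
	return snprintf(buf, len, "freq=%llu", (unsigned long long)freq);
}

/* Option 2: let <inttypes.h> supply the conversion for uint64_t. */
int
format_freq_pri(char *buf, size_t len, uint64_t freq)
{
	return snprintf(buf, len, "freq=%" PRIu64, freq);
}
```

Either form compiles without format warnings whether uint64_t is
'unsigned long' or 'unsigned long long' on the platform in question.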


Philip Guenther


Re: Purpose of primary and secondary user groups

2018-12-29 Thread Philip Guenther
On Sat, Dec 29, 2018 at 11:29 AM Ipsen S Ripsbusker <
ip...@ripsbusker.no.eu.org> wrote:

> Aside from compatibility, what is the purpose of primary groups,
> compared to secondary groups?
>
> Said otherwise, why do we have both primary and secondary groups
> rather than only secondary groups?
>
> Yet another phrasing: Why do I need to set a primary group?
>

Secondary groups can only be set, all at once, when running as root (e.g.,
login, sshd), while the primary group can be altered by setgid binaries and
then switched among using set*gid(2).

For filesystem objects like files and directories, the BSD behavior is for
the object to get its group from the directory in which it was created,
ignoring the groups of the process that created it.  On more SysV-like
systems the default is to take the primary group of the process that
created it.  However, for objects that exist in the kernel but not the
filesystem such as pipes, sockets, and SysV shared memory segments,
semaphores, and message queues, the common behavior is to take the primary
group of the process that created it.  This  doesn't have much effect other
than fstat() for pipes and sockets, but for SysV stuff it affects what
operations processes can perform.
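
The filesystem half of this can be observed with a short test.  The helper
below is a hypothetical sketch (the name and the scratch file name are made
up): it creates a file in a directory and reports whether the new file's
group matches the directory's group, the process's effective gid, or both.

```c
#include <sys/stat.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Create a scratch file in dir and report where its group came from.
 * Returns a bitmask: 1 = same group as the directory (BSD behavior),
 * 2 = same as the process's effective gid (SysV-style default),
 * 0 = neither, -1 = error.
 */
int
new_file_group_source(const char *dir)
{
	struct stat dsb, fsb;
	char path[1024];
	int fd, n, result = 0;

	if (stat(dir, &dsb) == -1)
		return -1;
	n = snprintf(path, sizeof path, "%s/grouptest.%ld", dir,
	    (long)getpid());
	if (n < 0 || (size_t)n >= sizeof path)
		return -1;

	fd = open(path, O_WRONLY | O_CREAT | O_EXCL, 0600);
	if (fd == -1)
		return -1;
	if (fstat(fd, &fsb) == -1) {
		close(fd);
		unlink(path);
		return -1;
	}
	close(fd);
	unlink(path);

	if (fsb.st_gid == dsb.st_gid)
		result |= 1;
	if (fsb.st_gid == getegid())
		result |= 2;
	return result;
}
```

To tell the two behaviors apart, run it on a directory whose group differs
from the caller's effective gid; when they are equal both bits are set.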


Philip Guenther


Re: I can't make build stable 6.4

2018-12-22 Thread Philip Guenther
On Sat, Dec 22, 2018 at 10:29 AM Krzysztof Strzeszewski 
wrote:

> I change permission:
>
> chown build /usr/src/lib/libcrypto/obj/v3_info.o.d
> chown build /usr/src/lib/libcrypto/obj/v3_info.po.d
> chown build /usr/src/lib/libcrypto/obj/v3_info.so.d
> chown build /usr/src/lib/libcrypto/obj/v3_purp.so.d
>
> and it's ok.
>
> It is a bug, and 4 files are owned by user root instead of build...
>

Many of us run builds and haven't seen this.  The source files for those
have been around for over a decade and haven't been updated recently, so
this wasn't source files updated in the middle of the builds.

Without knowing the exact sequence of operations on this tree it would be hard
to diagnose how this happened.

"When in doubt, rm -rf /usr/obj/* before building"

Philip Guenther


Re: Is HPET timer accessible in userland?

2018-12-13 Thread Philip Guenther
On Thu, Dec 13, 2018 at 4:58 PM Paul Swanson  wrote:

> Is the HPET timer on AMD64 available to
> developers in OpenBSD user land?
>

No.

The CPU TSC is available to userspace.  Note you may need to use the RDTSCP
instruction on MP boxes (and VMs...) where the TSC is not consistent across
CPUs, real or virtual.
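
A rough sketch of why RDTSCP helps, assuming a GCC or Clang toolchain (for the __rdtscp() intrinsic) and assuming the kernel loads a per-CPU value into IA32_TSC_AUX, which is not confirmed here for any particular system.  The function names are invented for the example.

```c
#include <stdint.h>
#if defined(__x86_64__)
#include <x86intrin.h>
#endif

/*
 * Read the TSC together with the IA32_TSC_AUX value that RDTSCP returns.
 * Returns 0 on success, -1 when not built for x86-64.
 * ASSUMPTION: the kernel stores something CPU-specific in IA32_TSC_AUX.
 */
int
read_tsc_with_aux(uint64_t *tsc, uint32_t *aux)
{
#if defined(__x86_64__)
	unsigned int a;

	*tsc = __rdtscp(&a);
	*aux = a;
	return 0;
#else
	(void)tsc;
	(void)aux;
	return -1;
#endif
}

/*
 * Take two readings and only trust the difference if both came from
 * the same CPU (same aux value).
 * Returns 0 with *delta set, 1 if the aux value changed between the
 * readings, -1 if unsupported.
 */
int
tsc_delta_same_cpu(uint64_t *delta)
{
	uint64_t t1, t2;
	uint32_t c1, c2;

	if (read_tsc_with_aux(&t1, &c1) != 0 ||
	    read_tsc_with_aux(&t2, &c2) != 0)
		return -1;
	if (c1 != c2)
		return 1;
	*delta = t2 - t1;
	return 0;
}
```

A caller that gets 1 back would simply retry; comparing raw TSC values taken on different CPUs is exactly the case warned about above.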


Philip Guenther


Re: netstat *:* udp sockets

2018-12-13 Thread Philip Guenther
On Thu, Dec 13, 2018 at 10:40 AM Ted Unangst  wrote:

> netstat -an tells me I am listening to all the udp.
>
> Active Internet connections (including servers)
> Proto   Recv-Q Send-Q  Local Address          Foreign Address        (state)
> udp          0      0  *.*                    *.*
> udp          0      0  127.0.0.1.53           *.*
> udp          0      0  *.*                    *.*
> udp          0      0  *.5353                 *.*
> udp          0      0  *.*                    *.*
>
> What are those *.* sockets doing? How can you listen to all the ports?
>

Those are just UDP sockets on which connect() hasn't been called and that
aren't in the middle of a recvfrom()  or recvmsg(), no?
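
For illustration, a small sketch of such a socket: created, but never passed to bind() or connect().  It asks getsockname(2) what the local side looks like; the helper name is hypothetical.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Create a UDP socket, do nothing else with it, and report its local
 * port in host byte order.  Returns -1 on error or if the local
 * address is not the wildcard address.
 * fresh_udp_local_port() is a made-up name for this sketch.
 */
int
fresh_udp_local_port(void)
{
	struct sockaddr_in sin;
	socklen_t len = sizeof(sin);
	int port = -1;
	int s;

	s = socket(AF_INET, SOCK_DGRAM, 0);
	if (s < 0)
		return -1;

	memset(&sin, 0, sizeof(sin));
	if (getsockname(s, (struct sockaddr *)&sin, &len) == 0 &&
	    sin.sin_addr.s_addr == htonl(INADDR_ANY))
		port = ntohs(sin.sin_port);

	close(s);
	return port;
}
```

A wildcard address with port 0 is what a tool like netstat would have to print as "*.*": the socket isn't listening on every port, it just has no local or foreign address assigned yet.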


> And, perhaps more directly, how would I block this in pf.conf?
>

Excellent choice, blocking dhclient from receiving the leases that it
requests.
"What problem are you trying to solve?"

Philip Guenther


Re: Core Dev?

2018-12-04 Thread Philip Guenther
On Tue, Dec 4, 2018 at 2:47 AM Marc Espie  wrote:

> (note that Antoine is the 2nd most prolific contributor to OpenBSD in terms
> of # of commits)
>

Sure, Marc, but that's just because Antoine is such a high caliber mole
that 22 years and 22k commits in order to backdoor AWS systems that were
_clearly_ going to happen is completely believable.

Philip Guenther


Re: statethreads crashes in ld on 6.4

2018-12-03 Thread Philip Guenther
On Mon, 3 Dec 2018, Claus Assmann wrote:
> Here's the dissambler output and the ktrace output follows.
> Unfortunately I don't know enough about this to figure out
> what is wrong, hopefully someone else can (or tell me which
> other information is still needed). TIA!

A close read of the ktrace output points to the problem:

...
>  65554 server   GIO   fd 2 wrote 89 bytes
>"[03/Dec/2018:08:28:29] INFO: process 0 (pid 65554): starting 8 
> threads on localhost:1234
>"

So it's just about to create its eight (userspace) threads...


>  65554 server   CALL  mmap(0,0x12000,0x3,0x1002,-1,0)
>  65554 server   RET   mmap 21771804393472/0x13cd24aac000
>  65554 server   CALL  mmap(0,0x12000,0x3,0x1002,-1,0)
>  65554 server   RET   mmap 21771451404288/0x13cd0fa09000
>  65554 server   CALL  mmap(0,0x12000,0x3,0x1002,-1,0)
>  65554 server   RET   mmap 21773345935360/0x13cd808cd000
>  65554 server   CALL  mmap(0,0x12000,0x3,0x1002,-1,0)
>  65554 server   RET   mmap 21774756491264/0x13cdd4a03000
>  65554 server   CALL  mmap(0,0x12000,0x3,0x1002,-1,0)
>  65554 server   RET   mmap 21774604423168/0x13cdcb8fd000
>  65554 server   CALL  mmap(0,0x12000,0x3,0x1002,-1,0)
>  65554 server   RET   mmap 21773142749184/0x13cd74707000
>  65554 server   CALL  mmap(0,0x12000,0x3,0x1002,-1,0)
>  65554 server   RET   mmap 21773994246144/0x13cda7314000
>  65554 server   CALL  mmap(0,0x12000,0x3,0x1002,-1,0)
>  65554 server   RET   mmap 21774606540800/0x13cdcbb02000

Eight mmaps, presumably one per thread...


>  65554 server   CALL  kbind(0x7f7d4fa8,24,0x8a4abe18ba78cb4a)
>  65554 server   RET   kbind 0

Okay, so this kbind() is by the original thread.  The first argument to 
kbind() happens to be a buffer which is always on the current thread's 
stack.  All is good here.

...
>  65554 server   CALL  kbind(0x13cd24abcc48,24,0x8a4abe18ba78cb4a)
>  65554 server   PSIG  SIGSEGV SIG_DFL addr=0x0 trapno=0
>  65554 server   NAMI  "server.core"

And now this kbind() call blows up: the address is not on the original 
thread's stack but in one of those mmap()s...but those mmap()s were not 
marked as stacks by including MAP_STACK.  To quote the "Security 
improvements" section of https://www.openbsd.org/64.html

* Implemented MAP_STACK option for mmap(2). At pagefaults and
  syscalls the kernel will check that the stack pointer points
  to MAP_STACK memory, which mitigates against attacks using
  stack pivots.


To confirm, if you check your dmesg(8) or /var/log/messages you should 
find the kernel complaining something like
   syscall [server]65554/### sp 13cd24a## not inside 0x7f7f###-0x7f7f###
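
What the fix in a userspace thread library might look like, as a sketch only: allocate each thread stack with MAP_STACK added to the mmap(2) flags.  The helper names are invented, this is not the actual state-threads code, and my reading of the ktrace flags above (0x3 as PROT_READ|PROT_WRITE, 0x1002 as MAP_PRIVATE|MAP_ANON) is an assumption.

```c
/* Needed by glibc under strict -std= modes; harmless elsewhere. */
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE
#endif
#include <stddef.h>
#include <sys/mman.h>

/*
 * Allocate memory intended to be used as a thread stack.
 * The only difference from a plain anonymous mapping is MAP_STACK,
 * which tells the kernel the stack pointer may legitimately point here.
 * alloc_thread_stack()/free_thread_stack() are hypothetical names.
 */
void *
alloc_thread_stack(size_t size)
{
	void *p;

	p = mmap(NULL, size, PROT_READ | PROT_WRITE,
	    MAP_PRIVATE | MAP_ANON | MAP_STACK, -1, 0);
	return p == MAP_FAILED ? NULL : p;
}

/* Release a stack obtained from alloc_thread_stack(). */
int
free_thread_stack(void *p, size_t size)
{
	return munmap(p, size);
}
```

Memory obtained some other way (malloc, a static array) can't be marked after the fact this way, so a library that carves stacks out of a larger arena would need that whole arena mapped with MAP_STACK.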


Philip Guenther



Re: statethreads crashes in ld on 6.4

2018-12-02 Thread Philip Guenther
On Sun, Dec 2, 2018 at 7:51 PM Edgar Pettijohn 
wrote:

> Sorry just saw it came with some examples. Testing with the `lookupdns'
> program
> ended with a Bus error (core dumped). Here is gdb output:
>
> Core was generated by `lookupdns'.
> Program terminated with signal SIGBUS, Bus error.
> #0  _longjmp () at /usr/src/lib/libc/arch/amd64/gen/_setjmp.S:99
> 99  1:  movq%r11,0(%rsp)
> (gdb) bt
> #0  _longjmp () at /usr/src/lib/libc/arch/amd64/gen/_setjmp.S:99
> Backtrace stopped: Cannot access memory at address 0xb044815db732800f
>

Crashing on _longjmp() would suggest it's not happy with OpenBSD's
setjmp/longjmp XOR cookies, but those have been in for a while.  If
statethreads were working for Claus with 6.3 then he's hitting something
different.


Philip


Re: 6.4-release tset(1) really slow, what have I missed?

2018-12-02 Thread Philip Guenther
On Sun, Dec 2, 2018 at 2:15 PM Adam Thompson  wrote:

> I've successfully installed OpenBSD 6.4-RELEASE at OVH, but I'm noticing
> one thing there that's different from everywhere else I've used 6.4.
>
> tset(1) takes approximately 12-15 seconds to execute, (almost) every
> time.
>
> On a DigitalOcean VPS running 6.3-STABLE (via openup) tset sensibly
> takes about 1 or 2 seconds:
>athom...@mail.athompso.net:~$ time tset -s
>TERM=xterm;
>0m01.01s real 0m00.00s user 0m00.01s system
>athom...@mail.athompso.net:~$ uname -r
>6.3
>
> On the OVH VPS running 6.4-STABLE (via openup), the same command takes
> 15 seconds:
>athom...@mail2.athompso.net:~$ time tset -s
>TERM=xterm;
>0m15.19s real 0m00.00s user 0m00.01s system
>athom...@mail2.athompso.net:~$ uname -r
>6.4
>
>
> That's from two SSH sessions from the same client with the same
> parameters.
>
> I've captured ktrace(1) output, which shows tset(1) doing, well,
> nothing:
> ...
>   57429/443422  tset 0.035908 CALL  kbind(0x7f7f7678,24,0xecf2201fc1aab9ca)
>   57429/443422  tset 0.035933 RET   kbind 0
>   57429/443422  tset 0.035950 CALL  nanosleep(0x7f7f7760,0x7f7f7750)
>   57429/443422  tset 0.035967 STRU  struct timespec { 1 }
>   57429/443422  tset 15.809238 STRU  struct timespec { 0 }
>   57429/443422  tset 15.809272 RET   nanosleep 0
>   57429/443422  tset 15.809303 CALL  kbind(0x7f7f76c8,24,0xecf2201fc1aab9ca)
>   57429/443422  tset 15.809380 RET   kbind 0
> ...
>
> I don't think this is a bug in 6.4, it's clearly environment-specific...
> but I have no idea what on earth could be causing it.
>

It requested a sleep of 1 second and 15 seconds passed.  That's a kernel
timekeeping issue, so the output of "sysctl kern.timecounter" would be a
good place to start.  Is this an MP kernel using the CPU TSC, but on a
VM where the virtual CPUs' TSCs aren't in sync?
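
A quick way to reproduce the symptom without tset, sketched with standard POSIX calls: request a short sleep and compare against CLOCK_MONOTONIC.  The function name is made up, and on a box whose timecounter is misbehaving the clock readings themselves may be skewed too, so it's worth comparing against a stopwatch as well.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <time.h>

/*
 * Sleep for request_us microseconds and return how many microseconds
 * actually elapsed according to CLOCK_MONOTONIC, or -1 on error
 * (including an interrupted sleep).
 * measure_sleep_us() is a hypothetical helper name.
 */
long
measure_sleep_us(long request_us)
{
	struct timespec req, before, after;
	long long elapsed_ns;

	req.tv_sec = request_us / 1000000L;
	req.tv_nsec = (request_us % 1000000L) * 1000L;

	if (clock_gettime(CLOCK_MONOTONIC, &before) != 0)
		return -1;
	if (nanosleep(&req, NULL) != 0)
		return -1;
	if (clock_gettime(CLOCK_MONOTONIC, &after) != 0)
		return -1;

	elapsed_ns = (long long)(after.tv_sec - before.tv_sec) * 1000000000LL +
	    (after.tv_nsec - before.tv_nsec);
	return (long)(elapsed_ns / 1000);
}
```

Calling measure_sleep_us(1000000) on a healthy system should return a value only slightly above one million; a result near fifteen million would match the ktrace above.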


Philip Guenther


Re: statethreads crashes in ld on 6.4

2018-12-02 Thread Philip Guenther
On Sat, Dec 1, 2018 at 6:34 AM Claus Assmann 
wrote:

> statethreads (http://state-threads.sourceforge.net/) crashes on
> OpenBSD 6.4/amd64 (release) with an error in ld (see below); it
> works fine on previous OpenBSD versions.  Do I have to set some
> "special" cc/ld options to make this work?


That'll depend on what the problem turns out to be, of course...


> Or are patches to
> statehreads required (there doesn't seem to be a port for it,
> otherwise I would try that)?
>

Not that I know of.



> #0  0x0c0b0980db08 in _dl_bind (object=0xc0a85cff400, index=)
>from /usr/libexec/ld.so
> (gdb)
>

Since ld.so is relinked on each boot, just an address doesn't really show
what died.  The disassembly up to that address would help.
More important is knowing what signal killed the process.  ktracing it and
seeing what the syscalls leading up to the signal were (and what extra info
was in the signal) tells a lot.


Philip Guenther


Re: why thread is not usable in perl5 of OpenBSD6.4?

2018-11-25 Thread Philip Guenther
On Sun, Nov 25, 2018 at 1:57 AM 岡本健二  wrote:

> I have to use thread on the perl5 of OpenBSD 6.4.
> However, it was disabled on the distribution.
>

Hmm, is this something that worked in previous releases, or is something
that you've only tried in OpenBSD 6.4?

Off-hand, it's still disabled by default in the Configure script that perl
people ship, and I don't see anything in the OpenBSD bits to override their
choice.



> I tried to make the thread active to recompile the perl5 with -Dusethreads,
> which led me to many test fails.
>

Were there tests that failed with -Dusethreads that passed when that wasn't
used?  If so, which, and what was their output?

To put it another way: if you're suggesting that we build the base perl
with -Dusethreads, what are the consequences of that?  Test failures?
Bigger binary?  pkg_add is slower?


> Why the thread function was disabled in this release?
> Is it security reason?
>

Upstream has it off by default, nothing so far has needed it, and it makes
things slower (or at least that's what upstream says).  Why would we enable
it?


Philip Guenther


Re: non-interactive sh and SIGTERM

2018-11-25 Thread Philip Guenther
On Fri, Nov 23, 2018 at 1:51 PM Olivier Taïbi  wrote:

> Sorry about the wrong report, I just tested again and I can see the same
> behaviour with OpenBSD 6.4: sending SIGTERM to the sh process after
> launching sh -c 'sleep 1000' does not result in sh sending a SIGTERM to
> the sleep process.
>

Hmm, why should it?  If you wanted to kill whatever processes were started
from that invocation, shouldn't you send SIGTERM to the process group?
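
Signalling a process group from C is just kill(2) with a negated process group ID.  A self-contained sketch: the function name is invented, and the child here only stands in for "sh and whatever it started".

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Fork a child into its own process group, send SIGTERM to that whole
 * group, and return the signal that terminated the child (0 if it
 * exited normally, -1 on error).
 * terminate_child_group() is a hypothetical name for this sketch.
 */
int
terminate_child_group(void)
{
	int status;
	pid_t pid;

	pid = fork();
	if (pid < 0)
		return -1;
	if (pid == 0) {
		/* Child: become a group leader and wait to be signalled. */
		signal(SIGTERM, SIG_DFL);
		setpgid(0, 0);
		for (;;)
			pause();
	}

	/* Also set it from the parent so there's no race with the child. */
	setpgid(pid, pid);

	/* A negative pid means "every process in process group |pid|". */
	if (kill(-pid, SIGTERM) != 0) {
		kill(pid, SIGKILL);
		waitpid(pid, &status, 0);
		return -1;
	}
	if (waitpid(pid, &status, 0) != pid)
		return -1;
	return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

From a shell the equivalent is "kill -- -PGID"; either way every member of the group gets the signal, not just the process whose pid you happened to know.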



> Philip, what was your test?
>

: morgaine; sh -c 'while :; do :; done' &
[3] 16632
: morgaine; kill 16632
[3] - Terminated   sh -c "while :; do :; done"
: morgaine;
: morgaine; sh -c 'while :; do sleep 1; done' &
[3] 59539
: morgaine; kill 59539
: morgaine;
[3] - Terminated   sh -c "while :; do sleep 1; done"
: morgaine;

sh itself doesn't ignore SIGTERM, but rather exits after receiving it.


Philip Guenther


Re: non-interactive sh and SIGTERM

2018-11-22 Thread Philip Guenther
On Thu, Nov 22, 2018 at 3:08 PM Olivier Taïbi  wrote:

> It seems that non-interactive sh(1) (i.e. sh -c command or sh file)
> ignores the TERM signal. I'm surprised, is this the intended behaviour?
> The man page says that interactive shells will ignore SIGTERM, but does
> not mention the non-interactive case.
>

In my quick test it doesn't ignore SIGTERM, so you'll need to provide
additional information for us to help you.


Philip Guenther


Re: FreeBSD in vmm

2018-11-20 Thread Philip Guenther
On Tue, Nov 20, 2018 at 6:29 PM Ken M  wrote:

> Has anyone gotten this working?
>
> Just trying it as an experiment.
>
> I installed using qemu, serial console is working but when I boot through
> vmctl the console shows a supervisor read error, page not found which from
> what I read is indicative of bad memory. In qemu it boots fine though. Not
> sure what I am missing.
>

Not supported yet.  There will be some sort of announcement when it works.


Philip Guenther


Re: CURRENT userland does not compile due to games/glorkz

2018-11-12 Thread Philip Guenther
On Mon, Nov 12, 2018 at 2:41 AM Jyri Hovila [Turvamies.fi] <
jyri.hov...@turvamies.fi> wrote:

> > It's not a shortcut,
>
> This, as many things in this world, completely depend on the point of view.
>
> One can not simply say "this is this" or "this is not this", without
> sufficient background information and overall understanding of the
> situation as a whole.
>

...which you didn't include.  As the line from diagnostic medicine goes
"hear hoofbeats?  expect horses, not zebras".  Failure to mention why your
case is unusual suggests that you're a "normal case" -current follower, not
someone who has an undisclosed reason for never using snaps.  If you're
uninterested in what you (now?) know to be the normal answer, you should
say that so everyone's time can be saved.

It also means that you need to crank up your debugging and analysis, so you
can work through these things yourself.  What failed?  Why?  What does that
imply?  What can you change to resolve it?  To avoid it?  To undo it?  If
when you build -current, you also build a release, do you still have the
files from your last successful build so you can rollback to something you
accept?  Do you have a test machine you can use your own snap to run
through the source update multiple times to experiment with solutions?

"What problem are you trying to solve?"

...

> >  It's fine if you want to waste your own time, but this is the
> > one single method of getting out of many holes, like yours.
>
> It is also perfectly fine if you want to ignore how the real world
> functions, and/or give a super irritating / dislikable impression of
> yourself and your personality. To give you back just a little, it certainly
> seems you know your holes well enough.
>

That was an unpleasant turn.


Philip Guenther


Re: heap full during amd64 boot.

2018-11-03 Thread Philip Guenther
On Sat, Nov 3, 2018 at 12:13 PM Angelo Rossi 
wrote:

> First of all you can't endorse me with services I cannot fullfill just
> because you're lowly sense of humor told this.


What is the effect (or goal, for that matter) of sharing this information
with a broad base of users?  Surely it's for more people to have and be
able to use that knowledge, no?  Meanwhile there have been multiple threads
between bugs@ and misc@ where people reported such single-filesystem setups
as having problems and were told "don't do that; it's a bad idea; use a
normal multi-FS setup".

If supporting such setups wasn't your goal, it was not clear what your goal
was from your original message.



> Then if the right behaviour
> for bootloader is to give error on this broken configuration it follows
> that the i386 arch is broiken since it permits to boot from such "crazy"
> partitioning scheme.


For my longer explanation of resource limitations at the bootloader and how
that interacts with testing the envelope, see here:
   https://marc.info/?l=openbsd-misc=154053727724928=3

Given that background, you should understand that adding extra checks to
the bootloader to detect and give a clearer error message for these "crazy"
setups will actually break *more* of them!

There are trade-offs: we made the changes we did because we thought they
were worth the cost.  Breaking more systems just to tell people clearly
that their setups are unwise seems like a bad trade-off to me, but maybe
it's the Right Thing if it'll eliminate these threads.


Philip Guenther


Re: heap full during amd64 boot.

2018-11-03 Thread Philip Guenther
On Sat, Nov 3, 2018 at 7:59 AM Angelo Rossi 
wrote:

> When using a=/ and b=swap partitioning scheme on installation or upgrading
> from 6.3 with same partitioning scheme the default HEAP_SIZE=0xA
> generates heap full error during boot in 6.4 amd64. To solve this I
> installed the -stable soiurces from AnonCVS, and changed HEAP_SIZE from
> 0xA to 0xC (the machine I tested an needs 700687 B so it should
> work with an
> HEAP_SIZE=0xB) in the file
> /usr/src/sys/arch/amd64/stand/Makefile.inc . Then I compiled and installed
> the new bootloader.
>

...and thereby continued to run a badly configured system, while providing
information to others on how to do so as well.

HEY EVERYONE USING A SINGLE PARTITION: YOU CAN GET SUPPORT FROM <
angelo.rossi.home...@gmail.com>.




Re: How effectiate login.conf changes in console? ("ksh -l" does not)

2018-10-30 Thread Philip Guenther
On Mon, Oct 29, 2018 at 9:19 PM Joseph Mayer 
wrote:

> On Tuesday, October 30, 2018 1:56 PM, Philip Guenther 
> wrote:
> > On Mon, Oct 29, 2018 at 8:40 PM Joseph Mayer joseph.ma...@protonmail.com
> > wrote:
> >
> > > After having changed /etc/login.conf I'd like to effectuate the
> > > changes directly in the console, without doing a logout-relogin
> > > cycle.
> > > Running "ksh -l" does not effectuate login.conf changes but only
> > > re-runs the profile script [1].
> > > Running "login" asks for username and password which seems less
> > > efficient than possible.
> > > Is there any way to do this?
> >
> > Since changes to login.conf may mean raising/increasing hard limits,
> > which can only be done by privileged processes, the only sure fire way
> > to have login.conf changes take effect is to logout and log back in.
>
...

> What about "su -l" [1]?
>
...

> If I'm root and do "su -l", will root's login.conf settings be applied?
>
> su.c [2] uses setusercontext() [3], and because emlogin is 0,
> LOGIN_SETRESOURCES is specified as flag, and so is LOGIN_SETUMASK -
> meaning login.conf settings are indeed effectuated by root doing
> "su -l" (relogin as root) or "su -l someuser" (login as someuser).
>
> Correct?
>

I guess?  Frankly, this is not an area I really care about: if I wanted to
test a login.conf change I would either logout/login if I had the password
for the account, or I suppose if it came up that I didn't, "su -c class" as
root.  If 'su -l' works for you in your testing (you did test it, yes?),
then use it.


Philip Guenther


Re: How effectiate login.conf changes in console? ("ksh -l" does not)

2018-10-29 Thread Philip Guenther
On Mon, Oct 29, 2018 at 8:40 PM Joseph Mayer 
wrote:

> After having changed /etc/login.conf I'd like to effectuate the
> changes directly in the console, without doing a logout-relogin
> cycle.
>
> Running "ksh -l" does *not* effectuate login.conf changes but only
> re-runs the profile script [1].
>
> Running "login" asks for username and password which seems less
> efficient than possible.
>
> Is there any way to do this?


Since changes to login.conf may mean raising/increasing hard limits, which
can only be done by privileged processes, the only sure fire way to have
login.conf changes take effect is to logout and log back in.
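
The hard-limit restriction can be seen directly with setrlimit(2).  A sketch with an invented function name; note that it permanently lowers the core-size limit of the calling process, so try it in a throwaway program.

```c
#include <errno.h>
#include <sys/resource.h>

/*
 * Lower the hard RLIMIT_CORE limit to 0, then try to raise it again.
 * Returns 0 if the raise succeeded (privileged process), otherwise the
 * errno from the failing setrlimit() call (EPERM is expected for an
 * unprivileged process).
 * try_raise_hard_core_limit() is a hypothetical name.
 * SIDE EFFECT: the caller's core limit stays lowered afterwards.
 */
int
try_raise_hard_core_limit(void)
{
	struct rlimit rl;

	/* Lowering a hard limit is always permitted. */
	rl.rlim_cur = 0;
	rl.rlim_max = 0;
	if (setrlimit(RLIMIT_CORE, &rl) != 0)
		return errno;

	/* Raising it back requires privilege. */
	rl.rlim_max = 1;
	if (setrlimit(RLIMIT_CORE, &rl) != 0)
		return errno;
	return 0;
}
```

A shell started with "ksh -l" is in the same position as this unprivileged caller, which is why re-reading login.conf there couldn't apply a raised hard limit even if the shell tried.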


Philip Guenther


Re: Dell PowerEdge R410 not booting 6.4

2018-10-26 Thread Philip Guenther
On Thu, Oct 25, 2018 at 8:44 PM diego righi  wrote:

> So why openbsd 6.4 i386 and amd64 bootloaders (not biosboot, boot!)
> express different behavior? Wasn't openbsd about correctness? :/
>
> If I'm wrong and it is documented that I can't do this fine, but so also
> i386 should not work, this behavior is just strange for me, that's it.
>

This is something that most people, perhaps even most software developers,
are not strongly aware of: that resource requirements are often both
fine-grained and sharp-edged.  That is: the exact requirements can vary in
many fine steps between systems, but there can be a sharp edge at which
performance plummets badly or the system totally fails.  This is true of
*many* systems (including lots of cloud services) which work just fine
until they *suddenly* fail, because the memory straw broke the available
RAM camel's back, or the micro-service is now taking just _longer_ to
service one request than the inter-request arrival interval so the queue
grows in latency past the user/system tolerance.
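
The queueing cliff is easy to show with a toy model: one server, requests arriving at a fixed interval, each taking a fixed service time.  This is an idealized sketch with made-up names, not a model of any real service.

```c
/*
 * Single-server FIFO queue with deterministic arrivals and service.
 * Request i arrives at i * arrival_interval and takes service_time
 * once the server is free.  Returns the latency (finish time minus
 * arrival time) seen by the last of n requests, in the same time unit
 * as the inputs.
 * last_request_latency() is a hypothetical helper name.
 */
long
last_request_latency(long n, long arrival_interval, long service_time)
{
	long finish_prev = 0;	/* when the server finished the previous request */
	long latency = 0;
	long i;

	for (i = 0; i < n; i++) {
		long arrive = i * arrival_interval;
		/* Start when the request has arrived AND the server is free. */
		long start = arrive > finish_prev ? arrive : finish_prev;

		finish_prev = start + service_time;
		latency = finish_prev - arrive;
	}
	return latency;
}
```

With an arrival interval of 100, a service time of 99 gives the 1000th request a latency of 99, while a service time of 101 gives it 1100: a 2% change in cost, an 11x change in what the user sees, and it keeps growing with every further request.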

Case in point: the memory resources required by the biosboot code depend
on many factors including:
* the size of the root partition
* the block size of the root partition (which is affected by the size)
* the inode number of the kernel being booted
* the exact disk block numbers which the booted kernel was put in

We all, the developers and the community of users who actively test -current
kernels (THANK YOU!), exercise various combinations of those, the *vast*
majority of which use the recommended system layout.  That recommended layout
doesn't push the first two of those items at all, and keeps the third and
fourth in sane ranges.

Meanwhile, those using monster root partitions have unknowingly been
pushing the memory usage by biosboot, but below its limits.

So, some change was made during the 6.3->6.4 dev cycle which requires
_slightly_ more memory in biosboot.  Maybe it was something about the
compiler upgrades, or maybe it was the SoftRaid crypto passphrase-retry
change.  Or maybe it was the tiny tweak of making biosboot default the
console to com0 @ 115200 on VMs.  Something made biosboot take more
memory...and now those systems with monster root partitions were pushed
over the edge of how much memory biosboot has available.

Rule of thumb: the costs must be worth the gain.
So:
 * enhancements and fixes break systems that don't get actively tested
 * we are not going to block enhancements/fixes because of that
 * we test what we recommend, on many systems
 * if a change breaks the recommended config, then it'll get reverted/fixed
 * ...this is more likely the more quickly the problem is reported
 * ...and even then the recommendation for the future might change
 * we also test some systems that go beyond those recommendations...
 * ...but not all
 * if a system that doesn't follow the recommendations breaks as the result
of a change, the developers will make a judgement about whether the gain of
the change is worth the costs.  We don't like breaking any config, even
unusual ones, but if we think the setup is inadvisable, we'll say so and
move on.

In this particular case:
 * the changes to biosboot were in snapshots for MONTHS, but no one
reported problems
 * if you aren't following recommendations, and aren't testing snapshots,
then you should be 100% willing to change your configuration on upgrade,
'cause you ain't giving the feedback necessary to keep your unusual config
alive
 * SINGLE PARTITION CONFIGS ARE DUMB, DON'T DO THAT; DON'T BOTHER
COMPLAINING, JUST FIX THEM.


Philip Guenther


Re: dmesg for edgerouter 6p

2018-10-23 Thread Philip Guenther
On Tue, Oct 23, 2018 at 12:14 PM Holger Glaess  wrote:

> i upgrade from an native 6.4 beta installation , no problems at all.
>

To quote the email sent to your local 'root' user after install/upgrade:

If you wish to ensure that OpenBSD runs better on your machines, please do us
a favor (after you have your mail system configured!) and type something like:
 # (dmesg; sysctl hw.sensors) | \
mail -s "Sony VAIO 505R laptop, apm works OK" dm...@openbsd.org
so that we can see what kinds of configurations people are running.  As shown,
including a bit of information about your machine in the subject or the body
can help us even further.  We will use this information to improve device
driver support in future releases.  (Please do this using the supplied GENERIC
kernel, not for a custom compiled kernel, unless you're unable to boot the
GENERIC kernel.  If you have a multi-processor machine, dmesg results of both
GENERIC.MP and GENERIC kernels are appreciated.)  The device driver information
we get from this helps us fix existing drivers. Thank you!


Re: Bootloader failing to install on 2012 Mac Mini (Openbsd 6.4)

2018-10-23 Thread Philip Guenther
On Tue, Oct 23, 2018 at 4:38 PM Liam Wigney  wrote:

> I've used Openbsd before but my installs have gone smoothly with no issues
> and this is really the first time it's been a problem. The install is a
> super boring one, it's whole disk Openbsd with the default gpt partition
> layout and nothing else special.
>
> During the install after the sets are successfully installed there's a
> notification that the bootloader has failed to install due to mkdir being
> called with an invalid argument.


All the error messages from installboot from mkdir failing include both the
path and the specific error message.  Those are included because they're
helpful in understanding exactly what failed (and thus what could be
wrong).  Including the _exact_ and _full_ error message would make it
easier to assist.

(Ruling out stuff that _didn't_ fail is key to figuring out root causes.)



> Some research online said that I should
> try to do installboot manually in the subsequent prrompt, so I called
> installboot sd0 and got the following error
>
> installboot: /usr/mdec/biosboot: No such file or directory
>

Yes, when running from the bsd.rd ramdisk additional arguments are necessary
so that installboot can find the files it needs and the disk on which to
install them.  ...but doing that will just replicate what the upgrade
script already did and the error it gave you...

At this point, the two pieces of information that would help the most are:
1) the *EXACT AND FULL* error message that the upgrader reported from
installboot
2) what your disklabel and partition layout looks like.  The output of "df
-k" from the ramdisk shell prompt after the upgrade fails would be good,
for example, as it has everything mounted under /mnt.


Philip Guenther


Re: ath.c -> dmesg -> bug

2018-10-10 Thread Philip Guenther
On Wed, Oct 10, 2018 at 3:35 PM NN  wrote:

> I try to analyse my dmesg with:
>
>  # dmesg | grep ath0
>
> and I can see ERROR message:
>
>  > ath0 device timeout ...
>
> I have checked "ath.c" file in "/cvs/src/sys/dev/ic/" on stable branch.
>
> I found this one construction: "--sc->sc_tx_time == 0". Probably it's
> meen "0 == 0",


> I have made this patch (see in attachment) and now it's working without
> any ERROR/WARNING for me.
>
> Please confirm.
>
> If my FIX for "ath.c" is correct, please update cvs in new 6.4 Release.
>
...

> --- ath.c31 Jan 2018 11:27:03 -1.116
> +++ ath.c11 Oct 2018 00:06:54 -
> @@ -930,7 +930,7 @@ ath_watchdog(struct ifnet *ifp)
>   if ((ifp->if_flags & IFF_RUNNING) == 0 || sc->sc_invalid)
>   return;
>   if (sc->sc_tx_timer) {
> -if (--sc->sc_tx_timer == 0) {
> +if (sc->sc_tx_timer == 0) {
>

This diff cannot be correct: the condition right above it is only true if
sc->sc_tx_timer is non-zero, so then testing whether it is _currently_ zero
will never be true.  That's also the only place sc->sc_tx_timer is
decremented, so deleting the '--' disables the timeout.  The existing code
decrements it and then tests whether it's zero, effectively testing whether
sc->sc_tx_timer was exactly 1.
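
Stripped of the driver, the two variants behave like this.  A standalone sketch: the function names are invented and a plain int stands in for sc->sc_tx_timer.

```c
/*
 * Mirrors the existing logic: if the timer is armed, count it down and
 * report a timeout (1) on the tick where it reaches zero.
 */
int
watchdog_tick(int *tx_timer)
{
	if (*tx_timer) {
		if (--*tx_timer == 0)
			return 1;	/* timed out on this tick */
	}
	return 0;
}

/*
 * Mirrors the proposed patch: the '--' is gone, so inside the outer
 * 'if' the value is known to be non-zero and the inner test can never
 * be true.  The timer is also never decremented.
 */
int
watchdog_tick_patched(int *tx_timer)
{
	if (*tx_timer) {
		if (*tx_timer == 0)
			return 1;	/* unreachable */
	}
	return 0;
}
```

Starting from a timer value of 3, watchdog_tick() returns 0, 0, then 1 and leaves the timer at 0, while watchdog_tick_patched() returns 0 forever and leaves it at 3.  So the patched driver is quiet because the watchdog has been turned off, not because the device stopped timing out.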

Your work to update the driver in the thread from October 5th is a more
productive way to address the issues you're experiencing.


Philip Guenther

