Re: rw_lock_held() & friends

2020-12-07 Thread Philip Guenther
On Mon, Dec 7, 2020 at 11:16 AM Alexandr Nedvedicky <
alexandr.nedvedi...@oracle.com> wrote:

> What's the plan for rw_write_held()? Currently macro is true, if whoever
> holds
> the lock.
>
> >
> > +#define  rw_write_held(rwl)  (rw_status(rwl) == RW_WRITE)
>

Nope.  rw_status() returns RW_WRITE_OTHER if a different thread holds the
lock.


Philip


Re: Use SMR_TAILQ for `ps_threads'

2020-12-02 Thread Philip Guenther
On Tue, Dec 1, 2020 at 5:48 AM Martin Pieuchot  wrote:
...

> exit1() for a multi-threaded process does the following:
>
> atomic_setbits_int(>p_flag, P_WEXIT);
> single_thread_check(p, 0);  // directly or via single_thread_set()
> SMR_TAILQ_REMOVE_LOCKED(>ps_threads, p, p_thr_link);
>
> Note that exit1() can't be called in parallel.
>

Well, multiple threads in a process can easily hit exit1() practically
concurrently, under the limitations of the kernel lock.  If anything in
exit1() sleeps, it has to be ready for some other thread to start into
exit1()!


If exit1() is called with EXIT_NORMAL, the current thread will start by
> setting `ps_single' then iterate over `ps_thread'.  This is done before
> updating `ps_thread' so there's no race in that case.
>
> If exit1() is called with EXIT_THREAD, the current thread might end up
> removing itself from `ps_thread' while another one is iterating over it.
> Since we're currently concerned about the iteration in single_thread_set()
> notice that `P_WEXIT' is checked first.  So if my understanding is correct
> is should be enough to do the following:
>
> SMR_TAILQ_REMOVE_LOCKED(>ps_threads, p, p_thr_link);
> smr_barrier();
>
> That would delays exit1() a bit but doesn't involve callback which is
> IMHO simpler.
>
> Did I miss anything?
>

What does it take to be sure that an "is empty" (or "only contains one")
test is _really_ true with SMR?

Philip


Re: stdio: fclose(3), fopen(3): intended locking hierarchy?

2020-11-30 Thread Philip Guenther
On Mon, Nov 30, 2020 at 6:10 PM Scott Cheloha 
wrote:

> On Sat, Nov 28, 2020 at 01:41:50PM -0800, Philip Guenther wrote:
>
...

> > After thinking through states more, #4 isn't safe: _fwalk() can't take
> > sfp_mutex because it's called from __srefill() which has its FILE locked,
> > which would reverse the order: two concurrent calls to __srefill() from
> > line-buffered FILEs could have one in _fwalk() blocking on locking the
> > other, while the other blocks on the sfp_mutex for _fwalk().
>
> This part in __srefill():
>
> /*
>  * Before reading from a line buffered or unbuffered file,
>  * flush all line buffered output files, per the ANSI C
>  * standard.
>  */
>
...

> Where in the standard(s) is this behavior required?  I'm not even sure
> how to look for it.
>

Pick a version of the C standard and search for "buffered" until you find
something like this, which is part of 7.19.3p3 in the draft I'm looking at:

   <...>  When a stream is line buffered, characters are intended to be
   transmitted to or from the host environment as a block when a new-line
character is
   encountered. Furthermore, characters are intended to be transmitted as a
block to the host
   environment when a buffer is filled, when input is requested on an
unbuffered stream, or
   when input is requested on a line buffered stream that requires the
transmission of
   characters from the host environment. Support for these characteristics
is
   implementation-defined, and may be affected via the setbuf and setvbuf
functions.

Effectively the same text appears in the POSIX standard in XSH 2.5p6.

Basically, you're allowed to stop doing that by instead printing out your
cell-phone number so that everyone who wants to complain that their program
failed to output its prompt before blocking for input can call and scream
at you.  :D


Philip Guenther


Re: kvm_getfiles and KERN_FILE_BYFILE

2020-11-29 Thread Philip Guenther
On Sun, Nov 29, 2020 at 12:14 PM Martijn van Duren <
openbsd+t...@list.imperialat.at> wrote:

> On Sat, 2020-11-28 at 16:23 -0800, Philip Guenther wrote:
> > On Thu, Nov 26, 2020 at 1:08 PM Martijn van Duren <
> openbsd+t...@list.imperialat.at> wrote:
> > > I'm currently playing around a bit with kvm_getfiles and found that I
> > > couldn't use KERN_FILE_BYFILE with DTYPE_SOCKET.
> > > According to kvm_getfiles(3):
> > >  For KERN_FILE_BYFILE the recognized file types are defined in
> > >  :
> > >
> > >DTYPE_VNODE   files and devices
> > >DTYPE_SOCKET  sockets, regardless of domain
> > >DTYPE_PIPEpipes and FIFOs
> > >DTYPE_KQUEUE  kqueues
> > >
> > > But these defines are under ifdef _KERNEL.
> > >
> > > So is the manpage lying here, or should the defines be hoisted out
> > > of the ifdef?
> > >
> >
> >
> > Let's go ahead and hoist them: FreeBSD and NetBSD already have.  If
> possible, the diff to do that should also simplify the #include bits in
> these files:
> > usr.bin/netstat/inet.c
> > usr.bin/fstat/fstat.c
> > usr.bin/fstat/fuser.c
> > usr.bin/systat/netstat.c
> >
> >
> > Philip Guenther
> >
>
> The others have the #endif/#ifdef break rather low in the file.
> Personally I reckon it's better reading if the common code is more
> towards the top.
>
> OK?
>

ok guenther@

How do the userland clean up bits look?


Philip Guenther


Re: kvm_getfiles and KERN_FILE_BYFILE

2020-11-28 Thread Philip Guenther
On Thu, Nov 26, 2020 at 1:08 PM Martijn van Duren <
openbsd+t...@list.imperialat.at> wrote:

> I'm currently playing around a bit with kvm_getfiles and found that I
> couldn't use KERN_FILE_BYFILE with DTYPE_SOCKET.
> According to kvm_getfiles(3):
>  For KERN_FILE_BYFILE the recognized file types are defined in
>  :
>
>DTYPE_VNODE   files and devices
>DTYPE_SOCKET  sockets, regardless of domain
>DTYPE_PIPEpipes and FIFOs
>DTYPE_KQUEUE  kqueues
>
> But these defines are under ifdef _KERNEL.
>
> So is the manpage lying here, or should the defines be hoisted out
> of the ifdef?
>

Let's go ahead and hoist them: FreeBSD and NetBSD already have.  If
possible, the diff to do that should also simplify the #include bits in
these files:
usr.bin/netstat/inet.c
usr.bin/fstat/fstat.c
    usr.bin/fstat/fuser.c
usr.bin/systat/netstat.c


Philip Guenther


Re: stdio: fclose(3), fopen(3): intended locking hierarchy?

2020-11-28 Thread Philip Guenther
On Fri, Nov 27, 2020 at 10:35 PM Philip Guenther  wrote:
...

> So yeah, maybe it does work to:
> 1) make __sfp() FLOCKFILE() the allocated FILE before unlocking sfp_mutex
> 2) update f{,d,mem,un}open() and open_*memstream() to match (1), unlocking
>in all paths when the FILE is safe to be accessed by _fwalk(), and
> locking
>sfp_mutex around the zeroing of _flags.
> 3) make fclose() and freopen() also lock sfp_mutex around the zeroing of
> _flags
>(should add an _frelease() to findfp.c that does this dance for (2) and
> (3))
> 4) make _fwalk() take sfp_mutex, and maybe also a FILE* so the setting of
>__SIGN can be done under the lock?
>
5) decide how/whether to handle adjust the FLOCKFILE placement in the
> _fwalk()
>tree: is the testing of the "is line-buffered" flag in lflush() safe
> without
>a lock?  Mumble...
>

After thinking through states more, #4 isn't safe: _fwalk() can't take
sfp_mutex because it's called from __srefill() which has its FILE locked,
which would reverse the order: two concurrent calls to __srefill() from
line-buffered FILEs could have one in _fwalk() blocking on locking the
other, while the other blocks on the sfp_mutex for _fwalk().

Hmm, there's currently a loop between two __srefill() calls like that, as
there's nothing to force visibility of the __SIGN flag between CPUs so they
could try to lock each other.  Grrr.

Time to check other BSDs and see if they have a good solution to this...


Philip





>
> Now, back to that first assumption: if you're not willing to assume
> it then the "is allocated" test needs to not use _flags but be some
> other state change which can be relied on to not have false-positives,
> but otherwise the other bits above still apply, I believe.  Would
> be a major ABI change...and if we do that there's like 3 other
> things we should do at the same time (merge __sfileext into FILE,
> grow _file to an int, and maybe move the recursive mutex for
> f{,un}lockfile() into FILE?)
>
>
> Philip Guenther
>
>


Re: stdio: fclose(3), fopen(3): intended locking hierarchy?

2020-11-27 Thread Philip Guenther
ted" test needs to not use _flags but be some
other state change which can be relied on to not have false-positives,
but otherwise the other bits above still apply, I believe.  Would
be a major ABI change...and if we do that there's like 3 other
things we should do at the same time (merge __sfileext into FILE,
grow _file to an int, and maybe move the recursive mutex for
f{,un}lockfile() into FILE?)


Philip Guenther


Re: Fix ilogb(3)

2020-11-07 Thread Philip Guenther
On Fri, Nov 6, 2020 at 4:51 PM George Koehler  wrote:

> Your ilogb fix is ok gkoehler@
>

It's annoying that C and/or ieee754 and the original hardware
implementation in the x87 instructions diverged in their definitions, but
the former is what matters and libm needs to follow that.  ok guenther@



> On Sat, 31 Oct 2020 16:09:07 +0100 (CET)
> Mark Kettenis  wrote:
>
> > - Dropping the amd64 and i386 versions.  Fixing the corner cases in
> >   assembly is hard, and the C implementation should be fast enough for
> >   regular floating-point values.
>
> The amd64 and i386 assembly uses the x87 fxtract instruction.  I feel
> that x87 instructions are obsolete on amd64.  <...>


Umm, no?  The amd64 ABI defines "long double" as matching the format used
by the x87 instructions; a function returning that type returns it in the
x87 %st(0) register and a review of libm finds many of the functions are
naturally implemented by the native x87 instructions.

I believe the main issue in this case is that the standard evolved away
from what Kahan originally did as a consultant with Intel.  For a note on
how the ieee754 standard originated in (large) part from what Kahan did
with Intel, see
  https://people.eecs.berkeley.edu/~wkahan/ieee754status/754story.html

I'm not nearly enough of a numerical analyst to judge the decision of the
standard.


Philip Guenther


Re: powerpc ld.lld fix

2020-11-01 Thread Philip Guenther
Makes sense.  This code is just the space reservation, the relocation
generation or whatever fills them in is suppressed already, yes?  Assuming
so, r+

On Sat, Oct 31, 2020 at 2:46 PM Mark Kettenis 
wrote:

> > Date: Sat, 10 Oct 2020 23:19:19 +0200 (CEST)
> > From: Mark Kettenis 
> >
> > On powerpc with the secure-plt ABI we need a .got section, even if the
> > _GLOBAL_OFFSET_TABLE_ symbol isn't referenced.  This is needed because
> > the first three entries of the GOT are used by the dynamic linker.
> >
> > With this fix I can build executables of all flavours (including
> > -static/-nopie).
>
> Turns out that adding these GOT entries when using "ld -r" is a bad
> idea.  Dif below fixes that.
>
> ok?
>
>
> Index: gnu/llvm/lld/ELF/SyntheticSections.cpp
> ===
> RCS file: /cvs/src/gnu/llvm/lld/ELF/SyntheticSections.cpp,v
> retrieving revision 1.2
> diff -u -p -r1.2 SyntheticSections.cpp
> --- gnu/llvm/lld/ELF/SyntheticSections.cpp  11 Oct 2020 13:10:13
> -  1.2
> +++ gnu/llvm/lld/ELF/SyntheticSections.cpp  31 Oct 2020 23:37:11 -
> @@ -604,7 +604,7 @@ GotSection::GotSection()
>// ElfSym::globalOffsetTable.
>if (ElfSym::globalOffsetTable && !target->gotBaseSymInGotPlt)
>  numEntries += target->gotHeaderEntriesNum;
> -  else if (config->emachine == EM_PPC)
> +  else if (config->emachine == EM_PPC && !config->relocatable)
>  numEntries += target->gotHeaderEntriesNum;
>  }
>
>
>


Re: Diff: Introductory Clause Comma Crap

2020-10-31 Thread Philip Guenther
On Sat, Oct 31, 2020 at 10:19 AM VARIK VALEFOR  wrote:

> This message contains even more grammatical fixes for the OpenBSD
> manual pages.  To ensure that the changes which are proposed in this
> message can be justified, this message primarily fixes only a single
> type of error: the absence of commas after introductory clauses.
>

As a procedural side-note, I would like to suggest that before going on a
quest to make a change that touches so many files cross OpenBSD's code
base, that it would be wise to send out a diff with just a couple examples
and verify that the particular item of concern and the proposed fix is
agreed upon before spending the time to search and edit many other pages.

This is true not just for documentation changes but code changes, of
course: doing lots of work before there's "buy-in" is risking your time.


Philip Guenther


Re: [PATCH] ifconfig: keep track of allowed ips pointer correctly

2020-10-27 Thread Philip Guenther
On Tue, Oct 27, 2020 at 9:18 AM Jason A. Donenfeld  wrote:
...

> @@ -5721,16 +5720,18 @@ growwgdata(size_t by)
> if (wg_aip != NULL)
> wg_aip = (void *)wg_interface + aip_offset;
>
> -   ret = (void *)wg_interface + wgdata.wgd_size - by;
> -   bzero(ret, by);
> -
> -   return ret;
> +   bzero((void *)wg_interface + wgdata.wgd_size - by, by);
>  }
>

Total side note, but this caught my eye as relying on the gcc extension to
do pointer arithmetic on "void *" pointers.  At least historically we've
avoided relying on that, which is easy in this case by just casting to
"char *" instead.

Philip Guenther


Re: [patch] const cclasses reference in ksh

2020-10-26 Thread Philip Guenther
On Mon, Oct 26, 2020 at 8:35 AM Matthew Martin  wrote:

> Recently cclasses in lib/libc/gen/charclass.h was made const.[1]
> Mark the pointer used to walk the array in ksh const as well.
>
> 1: https://marc.info/?l=openbsd-cvs=160256416506433=2
>

Ugh, I totally missed that reach-around.

ok guenther@



> diff --git misc.c misc.c
> index 9e6e9db5e76..7226f74eccf 100644
> --- misc.c
> +++ misc.c
> @@ -713,7 +713,7 @@ do_gmatch(const unsigned char *s, const unsigned char
> *se,
>  static int
>  posix_cclass(const unsigned char *pattern, int test, const unsigned char
> **ep)
>  {
> -   struct cclass *cc;
> +   const struct cclass *cc;
> const unsigned char *colon;
> size_t len;
> int rval = 0;
>
>


Re: Infinite loop in uvm protection mapping

2020-10-19 Thread Philip Guenther
On Mon, Oct 19, 2020 at 3:13 PM Tom Rollet  wrote:

> Hi,
>
> I'm starting to help in the development of the dt device.
>
> I'm stuck on permission handling of memory. I'm trying to allocate a
> page in kernel with read/write protections, fill the allocated page
> with data then change the permissions to  read/exec.
>
> Snippet of my code:
>
>   addr = uvm_km_alloc(kernel_map, PAGE_SIZE);
>
>  [...] (memcpy data in allocated page)
>
>   uvm_map_protect(kernel_map, addr, addr + PAGE_SIZE, PROT_READ
>  | PROT_EXEC, FALSE)))
>

This is same usage as seen in the 'sti' driver...which is on hppa only, so
while it's presumably the correct usage of uvm_km_alloc() and
uvm_map_protect(), I don't think uvm_map_protect() has been used on
kernel-space on amd64 (or possibly all non-hppa archs) before in OpenBSD.
Whee?


It triggers the following error at boot time when executing
> the uvm_map_protect function.
>
> uvm_fault(0x81fb2c90, 0x7ffec0008000, 0, 2) -> e kernel: page fault
> trap, code=0 Stopped atpmap_write_protect+0x1f5:  lock andq
> $-0x3,0(%rdi)
>
> Trace:
>
> pmap_write_protect(82187b28,80002255b000,80002255c000,
>  5,50e8b70481f4f622,fd81b6567e70) at pmap_write_protect+0x212
> uvm_map_protect(82129ae0,80002255b000,80002255c000
>  ,5,0,82129ae0) at uvm_map_protect+0x501
> dt_alloc_kprobe(815560e0,80173900,e7ef01a2855152cc,
>  82395c98,0,815560e0) at dt_alloc_kprobe+0x1ff
> dt_prov_kprobe_init(2333e28db00d3edd,0,82121150,0,0,
>  824d9008) at dt_prov_kprobe_init+0x1d9
> dtattach(1,821fb384,f,1,c2ee1c3f472154e,2dda28) at dtattach+0x5d
> main(0,0,0,0,0,1) at main+0x419
>
> The problem comes from the loop in pmap_write_protect
> (sys/arch/amd64/amd64/pmap.c:2108) that is executed
> infinity in my case.
>
> Entry of function pmap_write_protect:
>  sva:  80002250A000
>  eva:  80002250B000
>
> After &= PG_FRAME (line 2098-2099)
>  sva= F80002250A000
>  eva= F80002250B000
>
>   loop:  (ligne 2108)
>
>   first iteration:
>  va   = F80002250A000
>  eva = F80002250B000
>  blockend = 080012240
>
...

> Does anyone have an idea how to fix this issue?


So, blockend is clearly wrong for va and eva.  I suspect the use of
L2_FRAME here:
   blockend = (va & L2_FRAME) + NBPD_L2;

is wrong here and it should be
   blockend = (va & VA_SIGN_NEG(L2_FRAME)) + NBPD_L2;

or some equivalent expression to keep all the bits above the frame.


Philip Guenther


Re: incorrect result from getppid for ptraced processes

2020-09-04 Thread Philip Guenther
On Fri, Sep 4, 2020 at 2:59 PM Mateusz Guzik  wrote:

> On 9/5/20, Philip Guenther  wrote:
> > On Fri, Sep 4, 2020 at 1:06 PM Mateusz Guzik  wrote:
> >
> >> On 9/4/20, Vitaliy Makkoveev  wrote:
> >> > On Fri, Sep 04, 2020 at 05:24:42PM +0200, Mateusz Guzik wrote:
> >> >> getppid blindly follows the parent pointer and reads the pid.
> >> >>
> >> >> The problem is that ptrace reparents the traced process, so in
> >> >> particular if you gdb -p $something, the target proc will start
> seeing
> >> >> gdb instead of its actual parent.
> >> >>
> >> >> There is a lot to say about the entire reparenting business or
> storing
> >> >> the original pid in ps_oppid (instead of some form of a reference to
> >> >> the process).
> >> >>
> >> >> However, I think the most feasible fix for now is the same thing
> >> >> FreeBSD did: *always* store the actual parent pid in ps_oppid. This
> >> >> means all repareting will keep updating it (most notably when
> >> >> abandoning children on exit), while ptrace will skip that part.
> >> >>
> >> >> Side effect of such a change be that getppid will stop requiring the
> >> >> kernel lock.
> >> >>
> >> >
> >> > Thanks for report. But we are in beta stage now so such modification
> is
> >> > impossible until next iteration.
> >> >
> >> > Since original parent identifier is stored as `ps_oppid' while process
> >> > is traced we just return it to userland for this case. This is the way
> >> > I
> >> > propose to fix this bug for now.
> >> >
> >> > Comments? OKs?
> >> >
> >> > Index: sys/kern/kern_prot.c
> >> > ===
> >> > RCS file: /cvs/src/sys/kern/kern_prot.c,v
> >> > retrieving revision 1.76
> >> > diff -u -p -r1.76 kern_prot.c
> >> > --- sys/kern/kern_prot.c  9 Jul 2019 12:23:25 -   1.76
> >> > +++ sys/kern/kern_prot.c  4 Sep 2020 21:12:15 -
> >> > @@ -84,7 +84,11 @@ int
> >> >  sys_getppid(struct proc *p, void *v, register_t *retval)
> >> >  {
> >> >
> >> > - *retval = p->p_p->ps_pptr->ps_pid;
> >> > + if (p->p_p->ps_flags & PS_TRACED)
> >> > + *retval = p->p_p->ps_oppid;
> >> > + else
> >> > + *retval = p->p_p->ps_pptr->ps_pid;
> >> > +
> >> >   return (0);
> >> >  }
> >>
> >> This is definitely a bare minimum fix, but it does the job.
> >>
> >
> > ptrace() has behaved like this for the life of OpenBSD and an indefinite
> > number of years previous in the BSD releases.  What has happened that a
> > definitely incomplete fix is needed Right Now?
>
> I don't see how this reads as a demand this is fixed Right Now.
>

I didn't call it a demand, but the point stands: what has changed?


I don't see how the fix is incomplete either. It can be done better
> with more effort, but AFAICS the above results in correct behavior.
>

There are at least 2 other uses of ps_pptr->ps_pid that should also change,
unless you like coredumps and ps disagreeing with getppid(), and someone
needs to think how it affects doas.

Philip Guenther


Re: incorrect result from getppid for ptraced processes

2020-09-04 Thread Philip Guenther
On Fri, Sep 4, 2020 at 1:06 PM Mateusz Guzik  wrote:

> On 9/4/20, Vitaliy Makkoveev  wrote:
> > On Fri, Sep 04, 2020 at 05:24:42PM +0200, Mateusz Guzik wrote:
> >> getppid blindly follows the parent pointer and reads the pid.
> >>
> >> The problem is that ptrace reparents the traced process, so in
> >> particular if you gdb -p $something, the target proc will start seeing
> >> gdb instead of its actual parent.
> >>
> >> There is a lot to say about the entire reparenting business or storing
> >> the original pid in ps_oppid (instead of some form of a reference to
> >> the process).
> >>
> >> However, I think the most feasible fix for now is the same thing
> >> FreeBSD did: *always* store the actual parent pid in ps_oppid. This
> >> means all repareting will keep updating it (most notably when
> >> abandoning children on exit), while ptrace will skip that part.
> >>
> >> Side effect of such a change be that getppid will stop requiring the
> >> kernel lock.
> >>
> >
> > Thanks for report. But we are in beta stage now so such modification is
> > impossible until next iteration.
> >
> > Since original parent identifier is stored as `ps_oppid' while process
> > is traced we just return it to userland for this case. This is the way I
> > propose to fix this bug for now.
> >
> > Comments? OKs?
> >
> > Index: sys/kern/kern_prot.c
> > ===
> > RCS file: /cvs/src/sys/kern/kern_prot.c,v
> > retrieving revision 1.76
> > diff -u -p -r1.76 kern_prot.c
> > --- sys/kern/kern_prot.c  9 Jul 2019 12:23:25 -   1.76
> > +++ sys/kern/kern_prot.c  4 Sep 2020 21:12:15 -
> > @@ -84,7 +84,11 @@ int
> >  sys_getppid(struct proc *p, void *v, register_t *retval)
> >  {
> >
> > - *retval = p->p_p->ps_pptr->ps_pid;
> > + if (p->p_p->ps_flags & PS_TRACED)
> > + *retval = p->p_p->ps_oppid;
> > + else
> > +     *retval = p->p_p->ps_pptr->ps_pid;
> > +
> >   return (0);
> >  }
>
> This is definitely a bare minimum fix, but it does the job.
>

ptrace() has behaved like this for the life of OpenBSD and an indefinite
number of years previous in the BSD releases.  What has happened that a
definitely incomplete fix is needed Right Now?


Philip Guenther


Re: timekeep: fixing large skews on amd64 with RDTSCP

2020-07-17 Thread Philip Guenther
On Thu, Jul 16, 2020 at 4:55 PM Scott Cheloha 
wrote:

> > On Jul 16, 2020, at 19:36, Theo de Raadt  wrote:
> >
> >> Note the third sentence.
> >>
> >> Given that, I reason that a serializing instruction before *and* after
> >> the RDTSC should freeze it in place.
> >
> > I haven't seen anyone read it that way.
>
> They say that instructions after RDTSC can run before it because
> it isn't a serializing instruction.
>
> Do we want that?
>
> And then, consider this bit of programming advice.  Also from the
> ISA reference (Vol. 2B 4-547):
>
> > If software requires RDTSC to be executed prior to execution of any
> > subsequent instruction (including any memory accesses), it can
> > execute the sequence LFENCE immediately after RDTSC.
>
> What other reading is possible given this documentation?
>
> What is your interpretation?  Am I missing something?
>

If you're trying to time a sequence of instructions via rdtsc, then you
have to put an lfence after the first rdtsc (so that the sequence being
timed doesn't get lifted to before it completes) and an lfence before the
second rdtsc (so that the sequence is actually complete before the second
one).  In this case, is it a problem if instructions after the rdtsc that
are not dependent on its result may be started before it's actually
complete?  If not, then there's no reason to bracket that side.


Philip Guenther


Re: disable libc sys wrappers?

2020-07-09 Thread Philip Guenther
On Wed, Jul 8, 2020 at 8:06 AM Theo de Raadt  wrote:

> Mark Kettenis  wrote:
>
> > > From: "Theo de Raadt" 
> > > Date: Wed, 08 Jul 2020 09:42:41 -0600
> > >
> > > I think we need something like this.
> > >
> > > Documenting it will be a challenge.
> > >
> > > I really don't like the name as is too generic, when the control is
> only
> > > for a narrow set of "current time" system calls.
> >
> > I'm not sure we should be using getenv() in this early initialization
> > function though.
>
> Ah, you worry about the static "#ifndef PIC / early_static_init" versus
> "PIC ld.so" environ setup, and this very early getenv() call might not be
> looking at the environ global.
>

It's late enough in the process (after a possible call
to early_static_init(), and definitely after any fixup by ld.so) that it
should work Just Fine.

I would flip the test to check the environment before running issetugid(2)
because the syscall is more expensive and it's nice not to clutter the
kdump output.  ;-)


Philip Guenther


Re: SSE in kernel?

2020-06-23 Thread Philip Guenther
On Mon, Jun 22, 2020 at 9:12 PM  wrote:

> Are SSE instructions allowed in the AMD64 kernel?  Is #ifdef __SSE__
> a sufficient guard?
>
> I have a rasops32 putchar with SSE that is 2x faster.
>

As Bryan and Patrick noted: it's possible, but there are restrictions and
costs.

The main restriction is that the code must not permit a context-switch
between the fpu_kernel_enter() and fpu_kernel_exit() calls.  No taking an
rwlock or calling any of the sleep functions, for example.

If you're using more than the minimal level of SSE which is already
required by the kernel (for lfence, etc) then you should also check whether
the necessary extension bits are present in curcpu()->ci_feature_* and fall
back to the current code if not present.

The cost is that if the thread doing this isn't a system thread, then the
first fpu_kernel_enter() call after the userspace->kernel transition has to
save and reset the FPU registers (XSAVEOPT + XRSTOR on newish CPUs).  Every
fpu_kernel_exit(), regardless of thread type, resets them again (XRSTOR).

If the restriction isn't a problem and the cost of those is worth the gain,
then sure, go for it.  We already do it for AES stuff in the kernel, for
example.  c.f. /usr/src/sys/arch/amd64/amd64/aesni.c


Philip Guenther


Re: EV_SET(2) shadows variable

2020-04-05 Thread Philip Guenther
On Sat, 4 Apr 2020, Theo de Raadt wrote:
> Philip Guenther  wrote:
> 
> > On Fri, 3 Apr 2020, Martin Pieuchot wrote:
> > > Thanks, here it is, ok?
> > 
> > ok guenther@
> 
> Should we do the same to all other macros, just in case?

Checking /usr/include/{,sys/}*.h, the diff below fixes the only ones I 
found to be potential problems

/usr/include/net* and some others have not-completely-safe macros, like 
IP6_EXTHDR_GET()

Index: include/bitstring.h
===
RCS file: /data/src/openbsd/src/include/bitstring.h,v
retrieving revision 1.5
diff -u -p -r1.5 bitstring.h
--- include/bitstring.h 2 Jun 2003 19:34:12 -   1.5
+++ include/bitstring.h 6 Apr 2020 00:37:52 -
@@ -83,46 +83,46 @@ typedef unsigned char bitstr_t;
 
/* clear bits start ... stop in bitstring */
 #definebit_nclear(name, start, stop) do { \
-   register bitstr_t *_name = name; \
-   register int _start = start, _stop = stop; \
-   while (_start <= _stop) { \
-   bit_clear(_name, _start); \
-   _start++; \
+   register bitstr_t *__name = (name); \
+   register int ___start = (start), __stop = (stop); \
+   while (__start <= __stop) { \
+   bit_clear(__name, __start); \
+   __start++; \
} \
 } while(0)
 
/* set bits start ... stop in bitstring */
 #definebit_nset(name, start, stop) do { \
-   register bitstr_t *_name = name; \
-   register int _start = start, _stop = stop; \
-   while (_start <= _stop) { \
-   bit_set(_name, _start); \
-   _start++; \
+   register bitstr_t *__name = (name); \
+   register int __start = (start), __stop = (stop); \
+   while (__start <= __stop) { \
+   bit_set(__name, __start); \
+   __start++; \
} \
 } while(0)
 
/* find first bit clear in name */
 #definebit_ffc(name, nbits, value) do { \
-   register bitstr_t *_name = name; \
-   register int _bit, _nbits = nbits, _value = -1; \
-   for (_bit = 0; _bit < _nbits; ++_bit) \
-   if (!bit_test(_name, _bit)) { \
-   _value = _bit; \
+   register bitstr_t *__name = (name); \
+   register int __bit, __nbits = (nbits), __value = -1; \
+   for (__bit = 0; __bit < __nbits; ++__bit) \
+   if (!bit_test(__name, __bit)) { \
+   __value = __bit; \
break; \
} \
-   *(value) = _value; \
+   *(value) = __value; \
 } while(0)
 
/* find first bit set in name */
 #definebit_ffs(name, nbits, value) do { \
-   register bitstr_t *_name = name; \
-   register int _bit, _nbits = nbits, _value = -1; \
-   for (_bit = 0; _bit < _nbits; ++_bit) \
-   if (bit_test(_name, _bit)) { \
-   _value = _bit; \
+   register bitstr_t *__name = (name); \
+   register int __bit, __nbits = (nbits), __value = -1; \
+   for (__bit = 0; __bit < __nbits; ++__bit) \
+   if (bit_test(__name, __bit)) { \
+   __value = __bit; \
break; \
} \
-   *(value) = _value; \
+   *(value) = __value; \
 } while(0)
 
 #endif /* !_BITSTRING_H_ */
Index: sys/sys/disklabel.h
===
RCS file: /data/src/openbsd/src/sys/sys/disklabel.h,v
retrieving revision 1.75
diff -u -p -r1.75 disklabel.h
--- sys/sys/disklabel.h 24 Oct 2017 09:36:13 -  1.75
+++ sys/sys/disklabel.h 6 Apr 2020 00:52:08 -
@@ -156,37 +156,37 @@ struct__partitionv0 { /* old (v0) part
 
 #define DL_GETPSIZE(p) (((u_int64_t)(p)->p_sizeh << 32) + (p)->p_size)
 #define DL_SETPSIZE(p, n)  do { \
-   u_int64_t x = (n); \
-   (p)->p_sizeh = x >> 32; \
-   (p)->p_size = x; \
+   u_int64_t __x = (n); \
+   (p)->p_sizeh = __x >> 32; \
+   (p)->p_size = __x; \
} while (0)
 #define DL_GETPOFFSET(p)   (((u_int64_t)(p)->p_offseth << 32) + 
(p)->p_offset)
 #define DL_SETPOFFSET(p, n)do { \
-   u_int64_t x = (n); \
-   (p)->p_offseth = x >> 32; \
-   (p)->p_offset = x; \
+   u_int64_t __x = (n); \
+   (p)->p_offseth = __x >> 32; \
+   

Re: split futex into three

2020-04-05 Thread Philip Guenther
On Sun, 5 Apr 2020, Stuart Henderson wrote:
> On 2020/04/05 10:28, Martin Pieuchot wrote:
> > Another way to proceed would be to do a port grep for futex and see what
> > the ecosystem is using.
> 
> Sorry it's not filtered, but :
> 
> https://junkpile.org/grep.futex.gz

Sure looks like the only occurence of futex() used with FUTEX_REQUEUE (== 
3) is the linux kernel test program.  Everything else, including rust, is 
using FUTEX_CMP_REQUEUE or one of the PI operations 
(FUTEX_WAIT_REQUEUE_PI, FUTEX_CMP_REQUEUE_PI).


Philip



Re: kdump futex fix

2020-04-04 Thread Philip Guenther
On Sat, 4 Apr 2020, Martin Pieuchot wrote:
> On 03/04/20(Fri) 19:26, Philip Guenther wrote:
> > On Fri, 3 Apr 2020, Martin Pieuchot wrote:
> > > Depending on the operation requested futex(2) might return the number of 
> > > woken threads or an error.  That means the following...
> > > 
> > > mpv  CALL  
> > > futex(0xa58935899b0,0x82,1,0,0)
> > > mpv  RET   futex -1 errno 1 Operation not permitted
> > > 
> > > ...is not an error but it indicates that 1 thread has been awoken.
> > > 
> > > Diff below would have saved me quite some time looking in the wrong 
> > > direction.
> > > 
> > > ok?
> > 
> > Isn't that just a complicated way to get the same effect as deleting the 
> > "case SYS_futex:" line shortly above there?
> 
> Thanks!  Updated below :)

ok guenther@



Re: split futex into three

2020-04-04 Thread Philip Guenther
On Sat, 4 Apr 2020, Paul Irofti wrote:
> Here is a proper diff (both sys and libc into one).

Okay, bunch o' comments and thoughts of varying strength of opinion.

> diff --git lib/libc/Symbols.list lib/libc/Symbols.list
> index f9aa62ab6e8..4fa37a835aa 100644
> --- lib/libc/Symbols.list
> +++ lib/libc/Symbols.list
> @@ -86,7 +86,10 @@ _thread_sys_fstatat
>  _thread_sys_fstatfs
>  _thread_sys_fsync
>  _thread_sys_ftruncate
> -_thread_sys_futex
> +_thread_sys_ofutex

The ofutex syscall should not be built (or exported) by libc. The kernel 
needs to continue to provide it for the normal period to support older 
libc's where futex(2) is a direct syscall,but no application does, would, 
or should invoke ofutex() under that name.


> +_thread_sys_futex_wait
> +_thread_sys_futex_wake
> +_thread_sys_futex_requeue

glibc has internal inline functions futex_wait() and futex_wake() and 
there has been at least discussion about exporting some version of them.  
If our signatures matched the last-best-proposal over there (which was 
dropped, mind you) then I would be tempted to use those names.  If not, 
then maybe go with _futex_wait() and _futex_wake()?

FUTEX_REQUEUE is the old bad one, with no val3 argument that's checked 
before the operation.  Our libc/libpthread don't actually use them, and in 
the Linux world glibc switched completely to FUTEX_CMP_REQUEUE.  Perhaps 
we should drop support for FUTEX_REQUEUE (major bump, yah) and add 
_futex_cmp_requeue(2) when we need it?


> --- lib/libc/gen/sigwait.c
> +++ lib/libc/gen/sigwait.c

This is unrelated to futex stuff.  :)


> --- lib/libc/hidden/sys/futex.h
> +++ lib/libc/hidden/sys/futex.h
> @@ -20,6 +20,8 @@
>  
>  #include_next 
>  
> -PROTO_NORMAL(futex);

You need to keep this as long as futex() is used internally.  Once futex() 
is no longer used in libc then this should change to 
"PROTO_DEPRECATED(futex);" and the "DEF_STRONG(futex);" line should be 
deleted.

(If you have DEF_STRONG(foo) or DEF_WEAK(foo), then you must have 
PROTO_NORMAL(foo).  c.f libc/include/README II.B.2)


> --- lib/libc/thread/synch.h
> +++ lib/libc/thread/synch.h
> @@ -19,10 +19,33 @@
>  #include 
>  #include 
>  
> +static inline int
> +_futex(volatile uint32_t *p, int op, int val, const struct timespec 
> *timeout, uint32_t *g)
> +{
> + int flags = 0;
> +
> + if (op & FUTEX_PRIVATE_FLAG)
> + flags |= FT_PRIVATE;
> +
> + switch (op) {
> + case FUTEX_WAIT:
> + case FUTEX_WAIT_PRIVATE:
> + return futex_wait(p, val, timeout, flags);
> + case FUTEX_WAKE:
> + case FUTEX_WAKE_PRIVATE:
> + return futex_wake(p, val, flags);
> + case FUTEX_REQUEUE:
> + case FUTEX_REQUEUE_PRIVATE:
> + return futex_requeue(p, val, g, timeout, flags);
> + }
> +
> + return ENOSYS;
> +}

I would put this logic directly into futex()'s definition, and then...


>  static inline int
>  _wake(volatile uint32_t *p, int n)
>  {
> - return futex(p, FUTEX_WAKE_PRIVATE, n, NULL, NULL);
> + return _futex(p, FUTEX_WAKE_PRIVATE, n, NULL, NULL);
>  }

have this just be
return futex_wake(p, n, FUTEX_WAKE_PRIVATE);

and similar for _twait() below.


(I would be inclined to use FUTEX_WAKE_PRIVATE as the flag for the new 
syscalls instead of exposing FT_PRIVATE into the userspace API.)


> --- sys/kern/sys_futex.c
> +++ sys/kern/sys_futex.c
> @@ -83,9 +83,74 @@ futex_init(void)
>  }
>  
>  int
> -sys_futex(struct proc *p, void *v, register_t *retval)
> +sys_futex_wait(struct proc *p, void *v, register_t *retval)
>  {
> - struct sys_futex_args /* {
> + struct sys_futex_wait_args /* {
> + syscallarg(uint32_t *) f;
> + syscallarg(inr) val;
> + syscallarg(const struct timespec *) timeout;
> + syscallarg(int) flags;
> + } */ *uap = v;
> + uint32_t *uaddr = SCARG(uap, f);
> + uint32_t val = SCARG(uap, val);
> + const struct timespec *timeout = SCARG(uap, timeout);
> + int flags = SCARG(uap, flags);
> +
> + KERNEL_LOCK();
> + rw_enter_write();
> + *retval = futex_wait(uaddr, val, timeout, flags);
> + rw_exit_write();
> + KERNEL_UNLOCK();
> +
> + return 0;
> +}
> +
> +int
> +sys_futex_wake(struct proc *p, void *v, register_t *retval)
> +{
> + struct sys_futex_wake_args /* {
> + syscallarg(uint32_t *) f;
> + syscallarg(int) val;
> + syscallarg(int) flags;
> + } */ *uap = v;
> + uint32_t *uaddr = SCARG(uap, f);
> + uint32_t val = SCARG(uap, val);
> + int flags = SCARG(uap, flags);
> +
> + rw_enter_write();
> + *retval = futex_wake(uaddr, val, flags);
> + rw_exit_write();
> +
> + return 0;
> +}
> +
> +int
> +sys_futex_requeue(struct proc *p, void *v, register_t *retval)
> +{
> + struct sys_futex_requeue_args /* {
> + syscallarg(uint32_t *) f;
> + syscallarg(int) val;
> + syscallarg(uint32_t *) g;
> + 

Re: kdump futex fix

2020-04-03 Thread Philip Guenther
On Fri, 3 Apr 2020, Martin Pieuchot wrote:
> Depending on the operation requested futex(2) might return the number of 
> woken threads or an error.  That means the following...
> 
> mpv  CALL  futex(0xa58935899b0,0x82,1,0,0)
> mpv  RET   futex -1 errno 1 Operation not permitted
> 
> ...is not an error but it indicates that 1 thread has been awoken.
> 
> Diff below would have saved me quite some time looking in the wrong 
> direction.
> 
> ok?

Isn't that just a complicated way to get the same effect as deleting the 
"case SYS_futex:" line shortly above there?

The real problem is that futex(2) is actually 3 different syscalls wrapped 
into one.  It was split into three then kdump could properly report 
futex_wake(2) and futex_requeue(2) as returning a count, while 
futex_wait(2) returns an errno.  The existing 'switch' in sys_futex() 
would just move to userspace's futex(3), provided for linux compat.

(And our syscalls would have proper prototypes and futex_wait() could take 
a clockid_t, and)



Re: EV_SET(2) shadows variable

2020-04-03 Thread Philip Guenther
On Fri, 3 Apr 2020, Martin Pieuchot wrote:
> Thanks, here it is, ok?

ok guenther@



Re: EV_SET(2) shadows variable

2020-04-02 Thread Philip Guenther
On Tue, Mar 31, 2020 at 11:24 PM Martin Pieuchot  wrote:

> The current form of EV_SET(2) declares a `kevp' variable.  This can
> cause to subtle bugs hard to discover if one uses a variable of the
> same to retrieve events.
>
> Diff below prefixes the locally declared variable by an underscore,
> like it it done in FD_ZERO(), which should be good enough to prevent
> surprises.
>
> Is it the right way to correct such issue?  How many underscores are
> enough?  Can the standards gurus tell me?
>

tl;dr: this should use a variable that starts with either two underbars
(__kevp) or an underbar and a capital (_Kevp).  The same is true of
FD_ZERO().

The namespace of identifiers that begin with a single underbar is only
"reserved for use as identifiers with file scope in both the ordinary and
tag name spaces".  That means it's perfectly legal for an application to
declare a *local* variable with such a name.  So, this should be legal:

#include 
#include 
#include 
#include 
int main(void)
{
struct fd_set _p;
struct kevent _kevp;
FD_ZERO(&_p):
EV_SET(&_kevp, 0, 0, 0, 0, 0, 0);
return 0;
}


...but it currently blows up on the FD_ZERO() and would blow up in the
EV_SET() with your proposed diff.


Philip Guenther


Re: ktrwriteraw & vget

2020-03-17 Thread Philip Guenther
On Tue, Mar 17, 2020 at 5:18 AM Martin Pieuchot  wrote:

> On 17/03/20(Tue) 04:02, Philip Guenther wrote:
> > On Tue, Mar 17, 2020 at 1:07 AM Martin Pieuchot  wrote:
> > [...]
> > > @@ -663,8 +668,6 @@ ktrwriteraw(struct proc *curp, struct vn
> > > LIST_FOREACH(pr, , ps_list)
> > > if (pr->ps_tracevp == vp && pr->ps_tracecred == cred)
> > > ktrcleartrace(pr);
> > > -
> > > -   vput(vp);
> > > return (error);
> > >  }
> > >
> >
> > This looks unsafe to me: isn't ktrcleartrace() only safe if the caller
> > holds a reference to the vnode?  Once ktrcleartrace() clears the
> reference
> > from the current thread's process and it goes on the freelist, can't the
> > vnode vp points to be invalidated and reused?
>
> As long as a process holds a reference to the vnode, via `ps_tracevp',
> it wont be recycle.  Only the last call of ktrcleartrace() will release
> the vnode via vrele(9).
>

...and after that last reference is released this code will continue to
walk the allprocess list, comparing a possibly-recycled pointer to
ps_tracevp pointers in the remaining processes.  Good thing that vrele(9)
is guaranteed to never sleep and thereby let another process start
ktracing, reusing that vnode.??


Re: ktrwriteraw & vget

2020-03-17 Thread Philip Guenther
On Tue, Mar 17, 2020 at 1:07 AM Martin Pieuchot  wrote:

> On 16/03/20(Mon) 14:01, Martin Pieuchot wrote:
> > vget(9) might fail, stop right away if that happens.
> >
> > CID 1453020 Unchecked return value.
>
> Updated diff that stops tracing if vget(9) fails, similar to what's
> currently done if VOP_WRITE(9) fails, suggested by visa@.
>
> Code shuffling is there to avoid calling vput(9) if vget(9) doesn't
> succeed.
>
> ok?
>
> Index: kern/kern_ktrace.c
> ===
> RCS file: /cvs/src/sys/kern/kern_ktrace.c,v
> retrieving revision 1.100
> diff -u -p -r1.100 kern_ktrace.c
> --- kern/kern_ktrace.c  6 Oct 2019 16:24:14 -   1.100
> +++ kern/kern_ktrace.c  17 Mar 2020 09:46:03 -
> @@ -649,12 +649,17 @@ ktrwriteraw(struct proc *curp, struct vn
> auio.uio_iovcnt++;
> auio.uio_resid += kth->ktr_len;
> }
> -   vget(vp, LK_EXCLUSIVE | LK_RETRY);
> +   error = vget(vp, LK_EXCLUSIVE | LK_RETRY);
> +   if (error)
> +   goto bad;
> error = VOP_WRITE(vp, , IO_UNIT|IO_APPEND, cred);
> -   if (!error) {
> -   vput(vp);
> -   return (0);
> -   }
> +   vput(vp);
> +   if (error)
> +   goto bad;
> +
> +   return (0);
> +
> +bad:
> /*
>  * If error encountered, give up tracing on this vnode.
>  */
> @@ -663,8 +668,6 @@ ktrwriteraw(struct proc *curp, struct vn
> LIST_FOREACH(pr, , ps_list)
> if (pr->ps_tracevp == vp && pr->ps_tracecred == cred)
> ktrcleartrace(pr);
> -
> -   vput(vp);
> return (error);
>  }
>

This looks unsafe to me: isn't ktrcleartrace() only safe if the caller
holds a reference to the vnode?  Once ktrcleartrace() clears the reference
from the current thread's process and it goes on the freelist, can't the
vnode vp points to be invalidated and reused?

Maybe this vget() should be split into vref() + vn_lock(), and the vput()
similarly split into VOP_UNLOCK() + vrele(), so the vrele() can be left
after this LIST_FOREACH() while the VOP_UNLOCK() is hoisted to the
"vn_lock() succeeded" path.


Philip Guenther


Re: posix_openpt: allow O_CLOEXEC

2020-02-06 Thread Philip Guenther
On Thu, Feb 6, 2020 at 11:38 AM joshua stein  wrote:

> On Thu, 06 Feb 2020 at 11:21:11 -0700, Todd C. Miller wrote:
> > On Thu, 06 Feb 2020 10:45:44 -0700, "Theo de Raadt" wrote:
> >
> > > That feels better, and will be more atomic.
> >
> > Unfortunately, we can't do this in ptmioctl() since we don't have
> > the index of the open ptm device to use to check for the presence
> > of UF_EXCLOSE.  So it has to be done in libc instead.
>
> Alright, so like this then?
>

I'm sorry, but this is only superficially correct: the otherwise
unobtainable value of O_CLOEXEC (and SOCK_CLOEXEC) is that it's atomic with
the creation of the fd: there is no window where another thread calling
execve() or fork() will catch the fd without the flag set.  Setting the
flag in the libc code doesn't provide that guarantee.

I see two ways to do this in a way that provides the guarantee without
breaking the existing ioctl(PTMGET) ABI:

1) rename PTMGET to PTMGET_O66; add PTMGET which *always* sets FD_CLOEXEC
on both fds and change libc to instead _clear_ FD_CLOEXEC when O_CLOEXEC is
not passed.  Downside: extra syscalls in the common case

2) rename PTMGET to PTMGET_O66 and struct ptmget to struct ptmget_o66;
define a new struct ptmget which includes an "int flags" member which can
be used to pass O_* flags that should be applied by the kernel to both
control and slave fd before returning; change libc to set that from its
flags argument.

2b) bonus: do (2) but also provide a flag which tells the kernel to *not
open* the slave, and use that in posix_openpt(), fixing a subtle bug in our
posix_openpt()'s behavior

I guess I meant three ways.


Side note: posix_openpt() should, like openpty(), unconditionally pass
O_CLOEXEC when opening /dev/ptm.


Philip Guenther


Re: waitpid vs ptrace

2020-01-30 Thread Philip Guenther
On Thu, 30 Jan 2020, Marc Espie wrote:
> So, basically, if we start arbitrary commands, then
> the classic loop 
> /* Wait for the child to exit.  */
> while (waitpid(cpid, , 0) == -1 && errno == EINTR)
>   continue;
> 
> is not quite enough.
> 
> See the small note in manpages (not only us, but everyone):
>  WIFSTOPPED(status)
>  True if the process has not terminated, but has stopped and can
>  be restarted.  This macro can be true only if the wait call
>  specified the WUNTRACED option or if the child process is being
>  traced (see ptrace(2)).
> 
> this means that somebody could run a command that forks, and then its child
> (our grand-child) could use ptrace on its parent, and then we would get
> a notification with WIFSTOPPED (assuming the current kernel bug is fixed).

No, what you describe would not cause waitpid(2) to return in the 'make' 
process: when ptrace(PT_ATTACH) is used, then target is effectively 
reparented to the tracing process such that the tracing process is the one 
which sees the wait() status changes, including the "stopped for tracing" 
state.  Only when it is detached (and thus either continued or killed!) 
will the make process see any wait statuses; at that point they will be 
normal wait statuses.


The case you describe _can_ happen if make's direct child simply calls 
ptrace(PT_TRACE_ME) and then takes a signal or execs, but a process doing 
that without providing a tracer process as well is simply buggy and 
misdesigned.


(The kernel bug is that wait calls in the "original but not currently the 
parent" process do not act like the traced process is still there and 
running: the fact that the child has been attached to and is being stopped 
and started by its tracer should be completely invisible to the original 
parent.)


Philip Guenther



Re: TIMESPEC_TO_NSEC(): futex(2), __thrsigdivert(2) & __thrsleep(2)

2020-01-12 Thread Philip Guenther
On Sun, Jan 12, 2020 at 4:44 AM Martin Pieuchot  wrote:

> The consensus is now to switch syscall doing sleeps to use tsleep_nsec(9).
> Our direct goal is to stop expressing time as ticks, more might come
> later.
>

The API futex(2) has a bug: it doesn't take a clockid_t and doesn't have a
way to take an absolute timeout, which means userspace has to do non-ideal
conversions to work around those limitations.  While it might(?) make sense
to keep futex(2) the way it is to support ports that use that exact API**,
it feels like we should support a richer API to make libc/libpthread
happier...and perhaps not have the insane argument unuse/reuse/overloading
that futex(2) has (e.g., "pass uint32_t val2 as the fourth argument instead
of timeout").

Let's say a diff magically appeared splitting out three syscalls for the
three ops, with correct types and args and where timeouts were specified
with timespec+clockid_t+"is absolute" flag, can that be slotted into all
this without wailing and gnashing of teeth?

** I have no idea if any ports actually do that

Philip Guenther


Re: PATCH: fix race condition in binutils-2.17 build

2020-01-02 Thread Philip Guenther
On Thu, Jan 2, 2020 at 2:26 AM Marc Espie  wrote:

> We don't normally hit this thanks to the "expensive job" heuristics
> (meaning that we don't start the second target while the first is
> building), but still there is a race, and if for whatever reason we
> remove that heuristics we hit it fairly hard.
>
> The patch is fairly trivial, just run both targets separately.
> Example of a crash and analysis follows.
>
> Note that the problem only occurs in binutils-2.17. The former versions
> did have one less level of indirection in their Makefiles.
>

ok guenther@


Re: sparc64: iommu: unbreak DEBUG build: %lx and time_t

2019-12-23 Thread Philip Guenther
On Mon, Dec 23, 2019 at 12:42 PM Klemens Nanni  wrote:

> `option DEBUG' builds are broken on sparc64, here's the first step in
> fixing it.
>
> /sys/arch/sparc64/dev/iommu.c: In function 'iommu_strbuf_flush_done':
> /sys/arch/sparc64/dev/iommu.c:582: warning: format '%lx' expects type
> 'long unsigned int', but argument 4 has type 'time_t'
> /sys/arch/sparc64/dev/iommu.c:582: warning: format '%lx' expects type
> 'long unsigned int', but argument 6 has type 'time_t'
>
> time_t is a long unsinged integer, that is
>
> /usr/include/time.h:typedef __time_ttime_t;
> /usr/include/sys/_types.h:typedef   __int64_t   __time_t;   /*
> epoch time */
> /usr/include/amd64/_types.h:typedef long long   __int64_t;
>
> OK?  Any better idea?


In general, for time_t we've case to (long long) and used %ll, so
that it's consistent across the system.


Philip Guenther


Re: dmesg alloc

2019-12-23 Thread Philip Guenther
On Mon, 23 Dec 2019, Alexander Bluhm wrote:
> dmesg(8) allocates a bit too much memory.
> 
> sysctl KERN_MSGBUFSIZE returns msg_bufs.
> sysctl KERN_MSGBUF copies out msg_bufs + offsetof(struct msgbuf,
> msg_bufc) bytes.
> dmesg allocates msg_bufs + sizeof(struct msgbuf) - 1 bytes.
> 
> This is not the same value as struct msgbuf is padded.
> struct msgbuf {
>   ...
> longmsg_bufd;   /* number of dropped bytes */
> charmsg_bufc[1];/* buffer */
> };
> 
> Better use the same size algorithm in kernel and userland.
> 
> ok?

The first chunk should be replaced with this one:

@@ -39,6 +39,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 

to just pick up the offsetof() supplied for the last 30 years by the C 
standard.  Otherwise, ok.



Re: fix mcount.po target in libc/gmon/Makefile.inc

2019-12-19 Thread Philip Guenther
On Thu, 19 Dec 2019, Theo Buehler wrote:
> According to the comment in lib/libc/gmon/Makefile.inc, mcount needs
> special treatment since it cannot be compiled with profiling or pie.
> 
> When depend was removed as an independent target, this special case was
> missed.  Align it with the inference rules in bsd.lib.mk. This also
> makes 'make clean' work correctly (it currently leaves mcount.po.d
> behind).

ok guenther@



Re: amd64 SMEP SMAP trap panic print

2019-12-19 Thread Philip Guenther
On Fri, 20 Dec 2019, Alexander Bluhm wrote:
> Can we use the regular trap panic for SMEP and SMAP?  pageflttrap() 
> returns 0 to print a nice reason in kerntrap().  Especially if ddb is 
> disabled, additional information is printed.
> 
> attempt to access user address 0xe27539f1000 in supervisor mode
> fatal page fault in supervisor mode
> trap type 6 code 3 rip 819c2665 cs 8 rflags 10202 cr2 e27539f1000 cpl 
> 0 rsp 80001ffbfe28
> gsbase 0x80001fb63ff0  kgsbase 0x0
> panic: trap type 6, code=3, pc=819c2665
...
> Index: arch/amd64/amd64/trap.c
> ===
> RCS file: /data/mirror/openbsd/cvs/src/sys/arch/amd64/amd64/trap.c,v
> retrieving revision 1.77
> diff -u -p -r1.77 trap.c
> --- arch/amd64/amd64/trap.c   6 Sep 2019 12:22:01 -   1.77
> +++ arch/amd64/amd64/trap.c   19 Dec 2019 23:46:40 -
> @@ -163,14 +163,18 @@ pageflttrap(struct trapframe *frame, int
>   extern struct vm_map *kernel_map;
> 
>   /* This will only trigger if SMEP is enabled */
> - if (cr2 <= VM_MAXUSER_ADDRESS && frame->tf_err & PGEX_I)
> - panic("attempt to execute user address %p "
> - "in supervisor mode", (void *)cr2);
> + if (cr2 <= VM_MAXUSER_ADDRESS && frame->tf_err & PGEX_I) {
> + printf("attempt to execute user address %p "
> + "in supervisor mode\n", (void *)cr2);
> + return 0;
> + }
>   /* This will only trigger if SMAP is enabled */
>   if (pcb->pcb_onfault == NULL && cr2 <= VM_MAXUSER_ADDRESS &&
> - frame->tf_err & PGEX_P)
> - panic("attempt to access user address %p "
> - "in supervisor mode", (void *)cr2);
> + frame->tf_err & PGEX_P) {
> + printf("attempt to access user address %p "
> + "in supervisor mode\n", (void *)cr2);
> + return 0;
> + }
> 
>   /*
>* It is only a kernel address space fault iff:

For this part, should we reuse the 'faultstr' logic seen later to set the 
panic string and do something like, say...

Index: amd64/trap.c
===
RCS file: /data/src/openbsd/src/sys/arch/amd64/amd64/trap.c,v
retrieving revision 1.77
diff -u -p -r1.77 trap.c
--- amd64/trap.c6 Sep 2019 12:22:01 -   1.77
+++ amd64/trap.c20 Dec 2019 02:21:03 -
@@ -77,6 +77,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 
@@ -132,6 +133,24 @@ static inline void verify_smap(const cha
 static inline void debug_trap(struct trapframe *_frame, struct proc *_p,
 long _type);
 
+static inline int
+fault(const char *format, ...)
+{
+   static char faultbuf[512];
+   va_list ap;
+
+   /*
+* Save the fault info for DDB and retain the kernel lock to keep
+* faultbuf from being overwritten by another CPU.
+*/
+   va_start(ap, format);
+   vsnprintf(faultbuf, sizeof faultbuf, format, ap);
+   va_end(ap);
+   printf("%s\n", faultbuf);
+   faultstr = faultbuf;
+   return 0;
+}
+
 /*
  * pageflttrap(frame, usermode): page fault handler
  * Returns non-zero if the fault was handled (possibly by generating
@@ -164,12 +183,12 @@ pageflttrap(struct trapframe *frame, int
 
/* This will only trigger if SMEP is enabled */
if (cr2 <= VM_MAXUSER_ADDRESS && frame->tf_err & PGEX_I)
-   panic("attempt to execute user address %p "
+   return fault("attempt to execute user address %p "
"in supervisor mode", (void *)cr2);
/* This will only trigger if SMAP is enabled */
if (pcb->pcb_onfault == NULL && cr2 <= VM_MAXUSER_ADDRESS &&
frame->tf_err & PGEX_P)
-   panic("attempt to access user address %p "
+   return fault("attempt to access user address %p "
"in supervisor mode", (void *)cr2);
 
/*
@@ -211,18 +230,9 @@ pageflttrap(struct trapframe *frame, int
frame->tf_rip = (u_int64_t)pcb->pcb_onfault;
return 1;
} else {
-   /*
-* Bad memory access in the kernel; save the fault
-* info for DDB and retain the kernel lock to keep
-* faultbuf from being overwritten by another CPU.
-*/
-   static char faultbuf[512];
-   snprintf(faultbuf, sizeof faultbuf,
-   "uvm_fault(%p, 0x%llx, 0, %d) -> %x",
+   /* bad memory access in the kernel */
+   return 

Re: inteldrm(4) fix

2019-12-17 Thread Philip Guenther
On Tue, Dec 17, 2019 at 5:02 AM Mark Kettenis 
wrote:

> Hit the
>
>   KASSERT(curcpu()->ci_inatomic == 0);
>
> in pagefault_disable().  Analysis of the code in i915_gem_execbuffer.c
> shows that pagefault_disable() may be called when page faults are
> already disabled, i.e. from eb_relocate_slow() where eb_relocate_vma()
> is called after pagefault_disable().
>
> Diff below fixes this.
>
> ok?
>

Like a making a lock recursive, it suggests confusion in the code...but if
upstream supports it then doing otherwise is a losing proposition.

ok guenther@


Re: Sync KVE_ET_* with UVM_ET_*

2019-12-05 Thread Philip Guenther
On Thu, Dec 5, 2019 at 9:15 AM Martin Pieuchot  wrote:

> Sync with reality, will help KERN_PROC_VMMAP consumers.  Ok?
>

ok guenther@


Re: fwide() does not unlock if error was occurred

2019-12-02 Thread Philip Guenther
On Mon, Dec 2, 2019 at 7:48 PM Masato Asou  wrote:

> fwide() does not unlock if error was occurred.
>
> ok?
>

ok guenther@


Re: iwm: support 9260 devices

2019-11-16 Thread Philip Guenther
On Sat, Nov 16, 2019 at 8:19 AM Stefan Sperling  wrote:

> On Sat, Nov 16, 2019 at 04:51:44PM +0100, Stefan Sperling wrote:
> > This diff adds support for iwm(4) 9260 devices and hopefully 9560
> > devices as well but I have not yet had time to test those.
> >
> > Joint work with patrick@. Some parts were lifted from FreeBSD.
> >
> > If you have the followng device in pcidump it should at least get
> > an IP address from DHCP and be able to ping:
> >  4:0:0: Intel Dual Band Wireless-AC 9260
> >  0x: Vendor ID: 8086, Product ID: 2526
> >
> > The firmware is not in fw_update yet.
> > In the meantime firmware can be fetched from here:
> >
> https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/tree/
> >
> > Copy these files to /etc/firmware as indicated:
> > for 9260: iwlwifi-9260-th-b0-jf-b0-34.ucode -> /etc/firmware/iwm-9260-34
> > for 9560: iwlwifi-9000-pu-b0-jf-b0-34.ucode -> /etc/firmware/iwm-9000-34
> >
> > Checks for regressions on already supported devices are also welcome,
> > in which case the firmware isn't needed.
>
> Better diff which fixes an Rx throughput issue which was present in
> the previous diff.


Looking good so far on my X1 extreme once I
added PCI_PRODUCT_INTEL_WL_9560_2 next to the _1 lines in if_iwm.c.
Awesome!

Philip Guenther


Re: Fix cpio -Hustar -o

2019-11-14 Thread Philip Guenther
On Thu, Nov 14, 2019 at 1:36 PM Christian Weisgerber 
wrote:

> I found this discrepancy surprising:
>
> $ find /bin -print | cpio -o -Hustar >foo
> $ file foo
> foo: POSIX tar archive
> $ find /bin -print | cpio -Hustar -o >foo
> $ file foo
> foo: ASCII cpio archive (SVR4 with CRC)
>
> The argument order should not matter here.
>
> Fix: When processing the -o switch, only set the archive format if
> not already set.
>
> ok?
>
> Index: bin/pax/options.c
> ===
> RCS file: /cvs/src/bin/pax/options.c,v
> retrieving revision 1.102
> diff -u -p -r1.102 options.c
> --- bin/pax/options.c   13 Sep 2018 12:33:43 -  1.102
> +++ bin/pax/options.c   14 Nov 2019 20:59:04 -
> @@ -1226,7 +1226,8 @@ cpio_options(int argc, char **argv)
>  * create an archive
>  */
> act = ARCHIVE;
> -   frmt = &(fsub[F_CPIO]);
> +   if (frmt == NULL)
> +   frmt = &(fsub[F_CPIO]);
> break;
> case 'p':
> /*
>

ok guenther@


Re: fix amd64 pmap comment

2019-11-02 Thread Philip Guenther
On Sat, Nov 2, 2019 at 4:34 AM Martin Pieuchot  wrote:

> uvm_km_alloc(9) has never been used in amd64's pmap.  pool_get(9) is the
> thing since its import by mickey@.
>
> ok?
>

ok guenther@


ftp: utime->utimensat: use UTIME_OMIT instead of calling time()

2019-10-13 Thread Philip Guenther


ftp currently uses utime(3) to set the modification time on the retrieved 
file.  It set the access time on the file to the current time when doing 
so.  By converting this to use utimensat(), we can avoid the silly call 
to time(3) to get the current time for the atime by instead passing 
UTIME_NOW.

However, I _think_ ftp does this just because utime(3) has no way to set 
the modification time without also setting the access time and calling 
stat(2) just to get the current atime (and dealing with the possible error 
cases) wasn't worth it, so instead of setting the atime to the current 
time with UTIME_NOW, I propose leaving the atime unchanged with 
UTIME_OMIT.

Thoughts?  oks?


Philip Guenther


Index: ftp.c
===
RCS file: /data/src/openbsd/src/usr.bin/ftp/ftp.c,v
retrieving revision 1.105
diff -u -p -r1.105 ftp.c
--- ftp.c   28 Jun 2019 13:35:01 -  1.105
+++ ftp.c   14 Oct 2019 01:32:56 -
@@ -72,6 +72,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -79,7 +80,6 @@
 #include 
 #include 
 #include 
-#include 
 
 #include "ftp_var.h"
 
@@ -1211,11 +1211,12 @@ break2:
if (preserve && (closefunc == fclose)) {
mtime = remotemodtime(remote, 0);
if (mtime != -1) {
-   struct utimbuf ut;
+   struct timespec times[2];
 
-   ut.actime = time(NULL);
-   ut.modtime = mtime;
-   if (utime(local, ) == -1)
+   times[0].tv_nsec = UTIME_OMIT;
+   times[1].tv_sec = mtime;
+   times[1].tv_nsec = 0;
+   if (utimensat(AT_FDCWD, local, times, 0) == -1)
fprintf(ttyout,
"Can't change modification time on %s to %s",
local, asctime(localtime()));



Re: Remove some sigvec Xr's

2019-07-25 Thread Philip Guenther
On Thu, Jul 25, 2019 at 1:04 PM Todd C. Miller  wrote:

> We should only cross-reference the obsolete sigvec(3) function from
> the signal compat manuals and sigaction(2).
>
> This also syncs the SEE ALSO section in ualarm(3) match that of
> alarm(3).
>

ok guenther@


We could reference signal(3) in csh instead of sigaction(2) if
> that's what people prefer.


I see no reason to prefer signal(3) over sigaction(2) for pages that aren't
about C-standard functions.

Hmm: sh(1) and ksh(1) have *nothing* from sections 2 or 3 in their SEE
ALSO.  That doesn't seem like a wrong choice, albeit inconsistent with
csh(1).  Part of me feels like _if_ they're going to mention umask(2),
setrlimit(2), and sigaction(2), then they should mention chdir(2), as the
other classic "must be in the shell" syscall.  

Philip Guenther


Re: remove duplicate definitions of LAPIC_ID_MASK and LAPIC_ID_SHIFT

2019-07-25 Thread Philip Guenther
On Thu, Jul 25, 2019 at 4:44 PM Kevin Lo  wrote:

> ok?
>

Yes, please.  guenther@


Re: Comment fix: No mcontext

2019-07-13 Thread Philip Guenther
On Thu, Jul 11, 2019 at 10:12 PM Benjamin Baier 
wrote:

> there is no mcontext, and since V1.2 of process_machdep.c struct reg
> and struct trapframe don't have to be closely synced.
> process_machdep.c no longer memcpy's one to the other.
>

Yep.  Committed.  Thanks!


Philip Guenther


logger(1): add -c option for LOG_CONS

2019-06-16 Thread Philip Guenther


Testing something else I needed to call syslog(3) with LOG_CONS.  Diff 
below adds support to logger(1) for doing that.  Option choice is 
compatible with NetBSD.

ok?

Philip Guenther

Index: logger.1
===
RCS file: /data/src/openbsd/src/usr.bin/logger/logger.1,v
retrieving revision 1.21
diff -u -p -r1.21 logger.1
--- logger.15 Jun 2013 17:35:12 -   1.21
+++ logger.116 Jun 2019 19:19:26 -
@@ -38,7 +38,7 @@
 .Nd make entries in the system log
 .Sh SYNOPSIS
 .Nm logger
-.Op Fl is
+.Op Fl cis
 .Op Fl f Ar file
 .Op Fl p Ar pri
 .Op Fl t Ar tag
@@ -52,6 +52,10 @@ system log module.
 .Pp
 The options are as follows:
 .Bl -tag -width "-f file"
+.It Fl c
+If unable to pass the message to
+.Xr syslogd 8 ,
+attempt to write the message to the console.
 .It Fl f Ar file
 Read from the specified
 .Ar file .
@@ -102,5 +106,5 @@ utility is compliant with the
 specification.
 .Pp
 The flags
-.Op Fl ifpst
+.Op Fl cfipst
 are extensions to that specification.
Index: logger.c
===
RCS file: /data/src/openbsd/src/usr.bin/logger/logger.c,v
retrieving revision 1.17
diff -u -p -r1.17 logger.c
--- logger.c28 Mar 2016 18:18:52 -  1.17
+++ logger.c16 Jun 2019 19:18:33 -
@@ -61,8 +61,11 @@ main(int argc, char *argv[])
tag = NULL;
pri = LOG_NOTICE;
logflags = 0;
-   while ((ch = getopt(argc, argv, "f:ip:st:")) != -1)
+   while ((ch = getopt(argc, argv, "cf:ip:st:")) != -1)
switch(ch) {
+   case 'c':   /* log to console */
+   logflags |= LOG_CONS;
+   break;
case 'f':   /* file to log */
if (freopen(optarg, "r", stdin) == NULL) {
(void)fprintf(stderr, "logger: %s: %s.\n",
@@ -180,6 +183,6 @@ void
 usage(void)
 {
(void)fprintf(stderr,
-   "usage: logger [-is] [-f file] [-p pri] [-t tag] [message ...]\n");
+   "usage: logger [-cis] [-f file] [-p pri] [-t tag] [message ...]\n");
exit(1);
 }



Re: usr.bin/getconf: Add reporting LONG_BIT

2019-05-27 Thread Philip Guenther
On Mon, May 27, 2019 at 3:43 PM Brian Callahan  wrote:

> Below is a small diff in response to some configuration attempts
> found in ports land.
> lang/ponyc uses $(shell getconf LONG_BIT) in its Makefile to
> determine whether or not we're on a 64-bit platform.
>

It needs to know that at the scripting level, outside of the C code itself
where it can directly test LONG_BIT (or perhaps better, _LP64)?  Huh.



> However, our getconf(1) doesn't support reporting LONG_BIT.
> This diff enables it. GNU/FreeBSD/DragonFly/MacOS getconf reports
> LONG_BIT, but NetBSD getconf does not.
> Desirable? OK?
>

That's the way to do it, I just have this vague "what tempting lunacy has
led them to this?" lurking in my mind.


Philip Guenther


Re: Make pthread_np.h standalone

2019-05-27 Thread Philip Guenther
On Mon, May 27, 2019 at 7:36 AM Jeremie Courreges-Anglas 
wrote:

>
> A bunch of our ports expect pthread_np.h to be standalone, alas our
> version doesn't include pthread.h.  The diff below should help us get
> rid of some patches in (at least) mongodb, mono, gnustep and webkitgtk4.
>
> ok?
>

ok guenther@


Re: wall(1) unveil for non-root users

2019-05-16 Thread Philip Guenther
On Wed, May 15, 2019 at 10:54 PM Theo de Raadt  wrote:

> Martijn van Duren  wrote:
>
> > I don't see much point in the check.
> >
> > If we don't have write permissions open(2) will fail.
> > If we open it based on S_IWOTH permissions than checking for S_IWGRP
> > without considering who is in that group seems really absurd to me.
> >
> > So I'd be OK with patch 1
>
> There are 3 write permission checks which could permit the open,
> but then it checks the specific one.
>
> I believe this is similar to comsat / biff, which in those programs
> are used to indicate "I am ok with messages to my tty".
>
> That is an old behaviour, but I bet that is the history of this S_IWGRP
> check.
>

Yep.  mesg(1) manipulates your tty's group-write bit so that wall(1),
talk(1), and write(1) can check it.

Philip Guenther


Re: ld.so speedup (part 2)

2019-05-11 Thread Philip Guenther
On Tue, 7 May 2019, Jeremie Courreges-Anglas wrote:
> On Sat, Apr 27 2019, Nathanael Rensen  
> wrote:
> > The diff below speeds up ld.so library intialisation where the 
> > dependency tree is broad and deep, such as samba's smbd which links 
> > over 100 libraries.
...
> As I told mpi@ earlier today, I think your changes are correct as is,
> and are good to be committed.  So this counts as an ok jca@.  But I'd
> expect other developers to chime in soon, maybe they'll spot something
> that I didn't.

drahn@ and I pulled on our ld.so waders and agreed it's good, so I've 
committed it with some tweaking to the #defines to make them 
self-explanatory and have contiguous bit-assignments.

Thank you for identifying this badly inefficient algorithm and spotting 
how easy it was to fix!


Philip Guenther



Re: Switch powerpc to big PIC

2019-02-05 Thread Philip Guenther
On Tue, Feb 5, 2019 at 12:59 PM Mark Kettenis 
wrote:

> The architecture already has big PIE.  The issue is that clang doesn't
> support secure-plt for small pic.  I haven't entirely figured out
> what's going on here and we probably need some further fixes to clang
> here.  On the other hand I think it is probably time to recognize
> there is more and more bloat in the world.
>
> Thoughts?
>

libexec/ld.so/powerpc/Makefile.inc needs updating to match, no?
---
CFLAGS += -fpic -msoft-float
---


Also, does clang support secure-plt for small pie?  If not, then
lib/csu/Makefile needs updating too:
---
# Override powerpc default of -fPIE
# XXX if this is safe, why not override CFLAGS for alpha and sparc64 too?
# Does it work because the csu bits come first and get the first few GOT
# entries?
.if ${MACHINE_ARCH} == "powerpc"
CFLAGS+=-fpie
.endif
---


Philip Guenther


Re: trunk shouldnt care if it's stacked

2019-01-09 Thread Philip Guenther
On Wed, Jan 9, 2019 at 1:11 AM Klemens Nanni  wrote:

> On Wed, Jan 09, 2019 at 01:12:31PM +1000, David Gwynne wrote:
> > -#define TRUNK_MAX_STACKING   4   /* maximum number of stacked
> trunks */
> Is this an arbitrary limit or does it conceal other limitiations?
>

If stacking is handled via possibly recursive calls then there's an
effective limit imposed by size of the kernel stack.

Imposing a depth limit also prevents looping a stack by adding an upper
trunk to a bottom one.


Philip Guenther


Re: ksh: quote empties

2018-12-30 Thread Philip Guenther
On Sun, 15 Apr 2018, Martijn Dekker wrote:
> Op 15-04-18 om 03:41 schreef Philip Guenther:
> > On Sun, 15 Apr 2018, Klemens Nanni wrote:
> > > It also badly effects non-empty cases:
> > ...
> > >   $ ./obj/ksh -c alias
> > >   autoload=''
> > >   functions=''
> > 
> > Hah!  The original diff i actually broken (it tests the wrong variable)
> > but I fixed that by accident when I manually made the diff in my tree!
> > 
> > So, uh, I'm no longer fine with the original diff...
> 
> D'oh!
> 
> That's embarrassing. Sorry about that. :-/
...
> --- misc.c9 Apr 2018 17:53:36 -   1.70
> +++ misc.c15 Apr 2018 02:06:55 -
> @@ -966,6 +966,12 @@ print_value_quoted(const char *s)
>   const char *p;
>   int inquote = 0;
> 
> + /* Check for empty */
> + if (!*s) {
> + shprintf("''");
> + return;
> + }
> +

This thread was never resolved/committed.  Looking again at the diffs, I 
still think I prefer that we _not_ touch print_value_quoted(), as the 
other callers all use the 'key=value' format and don't need special 
handling of empty values, leaving a diff like the one below.

ok?

Philip


Index: bin/ksh/c_sh.c
===
RCS file: /data/src/openbsd/src/bin/ksh/c_sh.c,v
retrieving revision 1.63
diff -u -p -r1.63 c_sh.c
--- bin/ksh/c_sh.c  9 Apr 2018 17:53:36 -   1.63
+++ bin/ksh/c_sh.c  30 Dec 2018 22:24:19 -
@@ -478,7 +478,10 @@ c_trap(char **wp)
for (p = sigtraps, i = NSIG+1; --i >= 0; p++) {
if (p->trap != NULL) {
shprintf("trap -- ");
-   print_value_quoted(p->trap);
+   if (p->trap[0])
+   print_value_quoted(p->trap);
+   else
+   shprintf("''");
shprintf(" %s\n", p->name);
}
}
Index: regress/bin/ksh/obsd-regress.t
===
RCS file: /data/src/openbsd/src/regress/bin/ksh/obsd-regress.t,v
retrieving revision 1.10
diff -u -p -r1.10 obsd-regress.t
--- regress/bin/ksh/obsd-regress.t  8 Dec 2018 21:03:51 -   1.10
+++ regress/bin/ksh/obsd-regress.t  30 Dec 2018 22:41:27 -
@@ -503,3 +503,20 @@ description:
 stdin:
kill -s SIGINFO $$
 ---
+
+name: trap-ignore-output
+description:
+   trap output shows ignored signals correctly
+stdin:
+   trap '' SIGUSR1 SIGUSR2
+   trap "" SIGPIPE
+   trap exit SIGINT
+   trap exit\ 1 SIGTERM
+   trap
+expected-stdout:
+   trap -- exit INT
+   trap -- '' PIPE
+   trap -- 'exit 1' TERM
+   trap -- '' USR1
+   trap -- '' USR2
+---



reduce libgen.h usage

2018-12-30 Thread Philip Guenther


 only prototypes dirname() and basename().  There are a bunch of 
source files that #include it but don't need it; the diff below deletes 
those pointless includes.

ok?

Philip

Index: bin/ksh/edit.c
===
RCS file: /data/src/openbsd/src/bin/ksh/edit.c,v
retrieving revision 1.66
diff -u -p -r1.66 edit.c
--- bin/ksh/edit.c  18 Jun 2018 17:03:58 -  1.66
+++ bin/ksh/edit.c  21 Oct 2018 03:47:03 -
@@ -12,7 +12,6 @@
 
 #include 
 #include 
-#include 
 #include 
 #include 
 #include 
Index: usr.bin/cvs/commit.c
===
RCS file: /data/src/openbsd/src/usr.bin/cvs/commit.c,v
retrieving revision 1.158
diff -u -p -r1.158 commit.c
--- usr.bin/cvs/commit.c1 Jun 2017 08:08:24 -   1.158
+++ usr.bin/cvs/commit.c21 Oct 2018 03:39:46 -
@@ -20,7 +20,6 @@
 
 #include 
 #include 
-#include 
 #include 
 #include 
 #include 
Index: usr.bin/less/less.h
===
RCS file: /data/src/openbsd/src/usr.bin/less/less.h,v
retrieving revision 1.27
diff -u -p -r1.27 less.h
--- usr.bin/less/less.h 26 Mar 2016 08:59:29 -  1.27
+++ usr.bin/less/less.h 21 Oct 2018 03:41:31 -
@@ -19,7 +19,6 @@
 
 #include 
 #include 
-#include 
 #include 
 #include 
 #include 
Index: usr.bin/mg/dired.c
===
RCS file: /data/src/openbsd/src/usr.bin/mg/dired.c,v
retrieving revision 1.83
diff -u -p -r1.83 dired.c
--- usr.bin/mg/dired.c  7 Oct 2016 00:17:20 -   1.83
+++ usr.bin/mg/dired.c  21 Oct 2018 03:39:48 -
@@ -16,7 +16,6 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 #include 
 #include 
Index: usr.bin/mg/grep.c
===
RCS file: /data/src/openbsd/src/usr.bin/mg/grep.c,v
retrieving revision 1.46
diff -u -p -r1.46 grep.c
--- usr.bin/mg/grep.c   9 Jan 2018 17:59:29 -   1.46
+++ usr.bin/mg/grep.c   21 Oct 2018 03:39:50 -
@@ -7,7 +7,6 @@
 #include 
 
 #include 
-#include 
 #include 
 #include 
 #include 
Index: usr.bin/patch/util.c
===
RCS file: /data/src/openbsd/src/usr.bin/patch/util.c,v
retrieving revision 1.41
diff -u -p -r1.41 util.c
--- usr.bin/patch/util.c7 Apr 2018 14:55:13 -   1.41
+++ usr.bin/patch/util.c20 Apr 2018 02:22:30 -
@@ -31,7 +31,6 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 #include 
 #include 
Index: usr.bin/rcs/rcs.c
===
RCS file: /data/src/openbsd/src/usr.bin/rcs/rcs.c,v
retrieving revision 1.85
diff -u -p -r1.85 rcs.c
--- usr.bin/rcs/rcs.c   9 May 2016 13:03:55 -   1.85
+++ usr.bin/rcs/rcs.c   21 Oct 2018 03:39:52 -
@@ -30,7 +30,6 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 #include 
 #include 
Index: usr.sbin/npppd/npppd/npppd.c
===
RCS file: /data/src/openbsd/src/usr.sbin/npppd/npppd/npppd.c,v
retrieving revision 1.48
diff -u -p -r1.48 npppd.c
--- usr.sbin/npppd/npppd/npppd.c25 Jul 2018 02:18:36 -  1.48
+++ usr.sbin/npppd/npppd/npppd.c21 Oct 2018 03:35:48 -
@@ -50,7 +50,6 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 #include 
 #include 
Index: usr.sbin/smtpd/envelope.c
===
RCS file: /data/src/openbsd/src/usr.sbin/smtpd/envelope.c,v
retrieving revision 1.41
diff -u -p -r1.41 envelope.c
--- usr.sbin/smtpd/envelope.c   27 Dec 2018 15:41:50 -  1.41
+++ usr.sbin/smtpd/envelope.c   30 Dec 2018 18:39:47 -
@@ -33,7 +33,6 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 #include 
 #include 
Index: usr.sbin/smtpd/queue.c
===
RCS file: /data/src/openbsd/src/usr.sbin/smtpd/queue.c,v
retrieving revision 1.188
diff -u -p -r1.188 queue.c
--- usr.sbin/smtpd/queue.c  8 Dec 2018 08:01:15 -   1.188
+++ usr.sbin/smtpd/queue.c  9 Dec 2018 21:05:57 -
@@ -28,7 +28,6 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 #include 
 #include 
Index: usr.sbin/smtpd/queue_backend.c
===
RCS file: /data/src/openbsd/src/usr.sbin/smtpd/queue_backend.c,v
retrieving revision 1.64
diff -u -p -r1.64 queue_backend.c
--- usr.sbin/smtpd/queue_backend.c  31 May 2018 21:06:12 -  1.64
+++ usr.sbin/smtpd/queue_backend.c  21 Oct 2018 03:36:06 -
@@ -31,7 +31,6 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 #include 
 #include 
Index: usr.sbin/smtpd/queue_fs.c
===
RCS file: /data/src/openbsd/src/usr.sbin/smtpd/queue_fs.c,v
retrieving revision 1.17
diff -u -p 

Re: ldapd, fix warnings when compiling with gcc

2018-12-04 Thread Philip Guenther
On Tue, Dec 4, 2018 at 5:40 AM Claudio Jeker 
wrote:

> Gcc is unhappy about the void * usage in printf:
> search.c:325: warning: format '%s' expects type 'char *', but argument 2
> has type 'void *'
> search.c:345: warning: format '%.*s' expects type 'char *', but argument 3
> has type 'void *'
> search.c:365: warning: format '%s' expects type 'char *', but argument 2
> has type 'void *'
>
> This diff fixes those. Now neither clang nor gcc complain.
> OK?
>

ok guenther@


Re: [patch] sh.1 getopts inaccurate

2018-11-28 Thread Philip Guenther
On Wed, Nov 28, 2018 at 10:52 PM Solene Rapenne  wrote:

> I'm not familiar with getopts, first time I use it and the man page was
> confusing. It says "a colon following an option indicates it may take an
> argument".
> I understand this as: if I use the string "s:" then I can use "-s" or "-s
> something" as it **may** take an argument.
>
> But if you use "s:", getopts reports an error "-s requires argument".
>
> I propose to modify that sentence (not sure for the s in requires).
>
> Index: sh.1
> ===
> RCS file: /data/cvs/src/bin/ksh/sh.1,v
> retrieving revision 1.149
> diff -u -p -r1.149 sh.1
> --- sh.128 Sep 2018 18:32:39 -  1.149
> +++ sh.129 Nov 2018 06:44:50 -
> @@ -495,7 +495,7 @@ to the index of the next variable to be
>  The string
>  .Ar optstring
>  contains a list of acceptable options;
> -a colon following an option indicates it may take an argument.
> +a colon following an option indicates it requires an argument.
>  If an option not recognised by
>  .Ar optstring
>  is found,
>

"requires" is correct there.
ok guenther@


Re: ksh: be even more posixly correct on local/typeset

2018-11-20 Thread Philip Guenther
On Tue, Nov 20, 2018 at 12:47 AM Martijn van Duren <
openbsd+t...@list.imperialat.at> wrote:

> When running a script with -o posix I usually do so to make sure that
> the script will run anywhere with a POSIX compliant shell. Currently we
> support typeset when running in posix mode, where POSIX states:
> If the command name matches the name of a utility listed in the
> following table, the results are unspecified[0].
> >snip<
> local
> >snip<
> typeset
> >snip<
>
> When looking at the rationale[1] I found the following:
> Local variables within a function were considered and included in
> another early proposal (controlled by the special built-in local), but
> were removed because they do not fit the simple model developed for
> functions and because there was some opposition to adding yet another
> new special built-in that was not part of historical practice.
> Implementations should reserve the identifier local (as well as typeset,
> as used in the KornShell) in case this local variable mechanism is
> adopted in a future version of this standard.
>
> I know bash supports local and dash has the keyword, but ignores its
> meaning (the variables are still global), but since POSIX is pretty
> vague in how the keyword should be reserved I reckon it would help
> people who want to write portable scripts.
>
> Thoughts? Are people running -o posix in production where this could
> breakage?
>

"set -o posix" is, IMO, about dealing with the places where ksh and posix
diverge.  Using it as a hammer for strict compliance eliminates that middle
ground of "posix behavior with extensions".  I would not consider that a
positive direction.


Philip Guenther


Re: tap(4) minor number too large

2018-11-11 Thread Philip Guenther
On Sun, Nov 11, 2018 at 10:03 PM David Gwynne  wrote:
...

> so like this?
>
...

> --- if_tun.c24 Feb 2018 07:20:04 -  1.181
> +++ if_tun.c12 Nov 2018 06:02:51 -
> @@ -193,6 +193,9 @@ tun_create(struct if_clone *ifc, int uni
> struct tun_softc*tp;
> struct ifnet*ifp;
>
> +   if (unit > minor(~0U))
> +   return (ENXIO);
>

I would be inclined to write minor(~(dev_t)0) Just In Case, but ok either
way.


Re: tap(4) minor number too large

2018-11-11 Thread Philip Guenther
On Sun, Nov 11, 2018 at 9:45 PM David Gwynne  wrote:

> If you can trick someone into implementing 64bit device ids you can have
> more than your 16 millionth tap device.
>

Hhahahahahahahhahahahahah.

That would involve rolling six syscall numbers, not to mention handling the
64bit padding in one of mknod() or mknodat().

As the guy who did the last type bump that invasive, Just Say No.


Philip Guenther


Re: AMD64 buffer cache 4GB cap anything new, multiqueueing plans? ("64bit DMA on amd64" cont)

2018-11-06 Thread Philip Guenther
On Tue, Nov 6, 2018 at 9:51 PM Joseph Mayer 
wrote:

> Previously there was a years-long thread about a 4GB (32bit) buffer
> cache constraint on AMD64, ref
> https://marc.info/?t=14682443664=1=2 .
>
> What I gather is,
>
>  * The problematique is that on AMD64, DMA is limited to 32bit
>addressing, I guess because unlike AMD64 arch CPU:s which all have
>64bit DMA support, popular PCI accessories and supporting hardware
>out there like bridges, have DMA functionality limited to 32bit
>addressing.
>

My read of that thread, particularly Theo's comments, is that no one
actually demonstrated a case where lack of 64bit DMA caused any problems or
limitations.

If you have a system and use where lack of 64bit DMA creates a performance
limitation, then describe it and, *more importantly*, *why* you think the
DMA limit is involved.


Philip Guenther


Re: httpd: fix/consistency cast for ctype function

2018-11-03 Thread Philip Guenther
On Fri, Nov 2, 2018 at 4:14 AM Hiltjo Posthuma 
wrote:

> I noticed many ctype functions (such as isalpha, isdigit, tolower) are
> cast to
> unsigned char in httpd. This patch changes it also for one remaining check.
>

Yep.  Committed.  Thanks!

Philip Guenther


Re: cron.c patch

2018-10-20 Thread Philip Guenther
On Sat, Oct 20, 2018 at 2:41 PM Edgar Pettijohn III 
wrote:

> On 10/20/18 6:40 PM, Philip Guenther wrote:
>
> On Sat, Oct 20, 2018 at 2:34 PM Edgar Pettijohn III <
> ed...@pettijohn-web.com> wrote:
>
>> I'm guessing the if block will never hit since listen returns either 0 or
>> -1.
>>
>
> Uh, but -1 is true.
>
> my bad. -1 just feels so untrue.
>

On two's-complement systems (i.e., all modern CPUs), -1 is all bits set
making it the *truest* value!  :-)


Re: cron.c patch

2018-10-20 Thread Philip Guenther
On Sat, Oct 20, 2018 at 2:34 PM Edgar Pettijohn III 
wrote:

> I'm guessing the if block will never hit since listen returns either 0 or
> -1.
>

Uh, but -1 is true.

Philip


Re: multicast.4: drop unneeded (void *) casts

2018-10-19 Thread Philip Guenther
On Fri, Oct 19, 2018 at 6:20 AM Scott Cheloha 
wrote:
...

> Sure, looks cleaner.  ok?


ok guenther@


Re: multicast.4: drop unneeded (void *) casts

2018-10-18 Thread Philip Guenther
On Thu, Oct 18, 2018 at 4:00 PM  wrote:

> getsockopt(2) and setsockopt(2) take a void pointer for the third
> parameter so there is no need for any of these casts.
>

Yep.


> I can do other style(9)-type cleanup in a subsequent diff or I can
> cook this diff and do it all in one.
>

> Thoughts, preferences?
>

I would prefer line-wrapping adjustment that's enabled by the deletion of
the casts be done in the same commit.  Those to change, IMHO, called out
below.


> @@ -168,7 +168,7 @@ memcpy(_lcl_addr, _local_add
>  if (vc.vifc_flags & VIFF_TUNNEL)
>  memcpy(_rmt_addr, _remote_address,
> sizeof(vc.vifc_rmt_addr));
> -setsockopt(mrouter_s4, IPPROTO_IP, MRT_ADD_VIF, (void *),
> +setsockopt(mrouter_s4, IPPROTO_IP, MRT_ADD_VIF, ,
> sizeof(vc));
>

Unwrap completely.

@@ -205,7 +205,7 @@ memset(, 0, sizeof(mc));
>  mc.mif6c_mifi = mif_index;
>  mc.mif6c_flags = mif_flags;
>  mc.mif6c_pifi = pif_index;
> -setsockopt(mrouter_s6, IPPROTO_IPV6, MRT6_ADD_MIF, (void *),
> +setsockopt(mrouter_s6, IPPROTO_IPV6, MRT6_ADD_MIF, ,
> sizeof(mc));
>

Ditto

 ...

> @@ -302,7 +302,7 @@ mc.mfcc_parent = iif_index;
>  for (i = 0; i < maxvifs; i++)
>  mc.mfcc_ttls[i] = oifs_ttl[i];
>  setsockopt(mrouter_s4, IPPROTO_IP, MRT_ADD_MFC,
> -   (void *), sizeof(mc));
> +   , sizeof(mc));
>

Ditto and the next three.


>  setsockopt(mrouter_s4, IPPROTO_IPV6, MRT6_ADD_MFC,
> -   (void *), sizeof(mc));
> +   , sizeof(mc));
>
...

>  setsockopt(mrouter_s4, IPPROTO_IP, MRT_DEL_MFC,
> -   (void *), sizeof(mc));
> +   , sizeof(mc));

...

>  setsockopt(mrouter_s4, IPPROTO_IPV6, MRT6_DEL_MFC,
> -   (void *), sizeof(mc));
> +   , sizeof(mc));
>


@@ -478,7 +478,7 @@ would fail.
>  To modify the API, and to set some specific feature in the kernel, then:
>  .Bd -literal -offset 3n
>  uint32_t v = MRT_MFC_FLAGS_DISABLE_WRONGVIF;
> -if (setsockopt(sock, IPPROTO_IP, MRT_API_CONFIG, (void *), sizeof(v))
> +if (setsockopt(sock, IPPROTO_IP, MRT_API_CONFIG, , sizeof(v))
>  != 0) {
>

Unwrap the !=0{ line.


Re: copy and paste error in rb tree code

2018-10-08 Thread Philip Guenther
On Mon, Oct 8, 2018 at 6:03 PM David Gwynne  wrote:

> According to https://github.com/FRRouting/frr/pull/3106, Coverity thinks
> I made a copy and paste error regarding one of the augment calls in
> RBT_REMOVE. I'm inclined to agree.
>
> The augment code is only used by uvm_map.c, and it seems to be fine with
> this change. I believe it coped because the augment handler it provides
> is very conservative and walks all parents on every augment call, so
> the code missing one wasn't a problem.
>
> ok?
>

Looking at the macro code this was based on in tree.h rev 1.14, I agree
it's wrong and the diff is correct:
...
 if (RB_LEFT(RB_PARENT(old, field), field) == old)\
RB_LEFT(RB_PARENT(old, field), field) =
elm;\
else\
RB_RIGHT(RB_PARENT(old, field), field) =
elm;\
RB_AUGMENT(RB_PARENT(old, field));  \
...

ok guenther@


Re: bsd.rd failure in VirtualBox

2018-09-15 Thread Philip Guenther
On Sat, Sep 15, 2018 at 11:59 AM David Higgs  wrote:

> I often use VirtualBox (version 5.2.18 on OS X) to familiarize myself
> with new features in snapshots, before upgrading my physical hardware.
>
> This afternoon, I tried updating bsd.rd (amd64, 6.4-beta RAMDISK_CD
> #281) and wasn't able to successfully boot it.  I had to rely on the
> video capture ability of VirtualBox to even notice there was a panic
> (typed out below) before it rebooted to the "BIOS" splash screen.
>
...

> Also attached is the dmesg from a prior working snapshot.  I haven't
> tried updating since this prior snapshot, so I don't have further
> insight into when the issue first appeared.
>

Thank you for the complete and clear report!

I have a diff in the amd64 snapshots to use the CPU's PCID support in many
cases and this VirtualBox setup found a bug in it.  I've generated a new
diff that should fix this, so a future snap should fix this, though when
that'll happend depends on the snap builder's schedule.


Philip Guenther


Re: [patch] Fix an error in chmod.1

2018-09-13 Thread Philip Guenther
On Thu, Sep 13, 2018 at 8:17 PM Nan Xiao  wrote:

> The following patch fixes an error in chmod.1, and very sorry if I am
> wrong, thanks!
>
...

> --- chmod.1 7 Jun 2017 09:41:57 -   1.42
> +++ chmod.1 14 Sep 2018 05:10:44 -
> @@ -243,7 +243,7 @@ If no value is supplied for
>  each permission bit specified in
>  .Ar perm ,
>  for which the corresponding bit in the file mode creation mask
> -is clear, is cleared.
> +is set, is cleared.
>  Otherwise, the mode bits represented by the specified
>  .Ar who
>  and
>

The current documentation matches both the POSIX specification and the
utility behavior.  Consider:

$ touch file
$ chmod 666 file
$ mkdir -m 777 dir
$ ls -l
total 2
drwxrwxrwx  2 guenther  wheel  512 Sep 13 22:24 dir
-rw-rw-rw-  1 guenther  wheel0 Sep 13 22:24 file
$ umask
002
$ chmod -rw *
$ ls -l
total 2
d--x--x-wx  2 guenther  wheel  512 Sep 13 22:24 dir
w-  1 guenther  wheel0 Sep 13 22:24 file
$

The other-write bit was set in my file mode creation mask, so the "-rw" was
treated like "ug-rw,o-r" and did not change the other-write permission on
the file or directory.


Philip Guenther


Re: add uid_from_user/gid_from_group [1/3]

2018-09-12 Thread Philip Guenther
On Wed, 12 Sep 2018, Todd C. Miller wrote:
> Thanks for the feedback, here's an updated diff that eliminates 
> pwcache.h, gracefully handles table allocation failure and massages the 
> comments to be a bit more general.
> 
> I can take a look at supporting arbitrary length names in the future.

ok guenther@


> +/*
> + * Constants and data structures used to implement group and password file
> + * caches.  Name lengths have been chosen to be as large as those supported
> + * by the passwd and group files as well as the standard archive formats.
> + * CACHE SIZES MUST BE PRIME
> + */

Side note: I seem to recall reading an analysis somewhere that found that 
with a good hash function, the size of the table didn't matter.  I 
remember Kernighan & Pike's The Practice of Programming showing the 
primality of the size didn't matter for the hash table they whip up in the 
algorithms section.  Maybe the current hash just needs updating...



Re: add uid_from_user/gid_from_group [2/3]

2018-09-11 Thread Philip Guenther
On Mon, 10 Sep 2018, Todd C. Miller wrote:
> This removes cache.c from pax in favor of using the new uid_from_user() 
> and gid_from_group() functions in libc.  I've added explicit calls to 
> setpassent() and setgroupent() since they are no longer implicitly 
> called.

This part is ok guenther@ (once part 1 is haggled in)



Re: add uid_from_user/gid_from_group [1/3]

2018-09-11 Thread Philip Guenther
On Mon, 10 Sep 2018, Todd C. Miller wrote:
...
> --- /dev/null 1 Jan 1970 00:00:00 -
> +++ lib/libc/gen/pwcache.h10 Sep 2018 00:46:44 -
...
> +/*
> + * Constants and data structures used to implement group and password file
> + * caches. Traditional passwd/group cache routines perform quite poorly with
> + * archives. The chances of hitting a valid lookup with an archive is quite a
> + * bit worse than with files already resident on the file system. These 
> misses
> + * create a MAJOR performance cost. To adress this problem, these routines
> + * cache both hits and misses.
> + *
> + * NOTE:  name lengths must be as large as those stored in ANY PROTOCOL and
> + * as stored in the passwd and group files. CACHE SIZES MUST BE PRIME
> + */

Oh yeah, this comment should be massaged to match the new context,
perhaps noting that the limits on these functions are set to provide
effective caching, both positive and negative, for the standard
archive formats, blah blah blah.


(I mentioned "maul"ing before.  I'm been tempted to make malloc earn its
pay and use
struct uidc {
uid_t uid;
char valid;
char name[];
};

instead, mallocing the correct size for the name...and eliminating the
UNMLEN/GNMLEN limits!)


Philip



Re: add uid_from_user/gid_from_group [1/3]

2018-09-11 Thread Philip Guenther
On Mon, 10 Sep 2018, Todd C. Miller wrote:
> This is a port of uid_from_user() and gid_from_group() from NetBSD (also 
> present in FreeBSD).
...
> --- lib/libc/gen/pwcache.c25 Nov 2015 23:16:01 -  1.13
> +++ lib/libc/gen/pwcache.c10 Sep 2018 15:04:35 -
...
> +/*
> + * routines that control user, group, uid and gid caches (for the archive
> + * member print routine).

The parenthetical side comment should be deleted.  I suggest capitalizing 
'routines' while there.

...
> +const char *
> +group_from_gid(gid_t gid, int noname)
> +{
> + struct group grstore, *gr = NULL;
> + char grbuf[_GR_BUF_LEN];
> + GIDC *ptr, **pptr;
> +
> + if ((gidtb == NULL) && (gidtb_start() < 0))
> + return NULL;

One question is whether these routines should fall back to uncached 
operation if the allocation fails.  "tar -tv" will segfault currently if a 
[ug]idtb_start() call fails.


...
> Index: lib/libc/gen/pwcache.h
> ===
> RCS file: lib/libc/gen/pwcache.h

I merged cache.h into cache.c in pax because the separate header was 
pointless; why split these bits into a single-use header again?

(...specially because I would maul these data structure a bit...)



> Index: lib/libc/gen/getgrent.c
> Index: lib/libc/Symbols.list
> Index: lib/libc/hidden/grp.h
> Index: lib/libc/hidden/pwd.h

These bits are all correct and do what you desire.


Philip



Re: Linux DRM

2018-09-03 Thread Philip Guenther
On Mon, Sep 3, 2018 at 11:46 AM Thomas de Grivel  wrote:

> I was browsing the DRM code ported from Linux and it's a terrible
> mess, is there any ongoing project to clean up that codebase or
> rewrite it entirely ?
>

No.  OpenBSD doesn't have the resources to reimplement the DRM subsystem or
maintain a non-trivial fork of the Linux version.  We don't want to get
stuck with a code base that doesn't support close-to-current hardware, so
the porting work has concentrated on minimizing the changes necessary to
make the upstream code base work in OpenBSD.

It's clear that the hardware support in the upstream has large
contributions from developers with inside access at the hardware vendors;
without such access it's doubtful that all the hardware bugs^Wlimitations
can be worked around with non-infinite resource.

Improvements in the DRM code itself should be done in the upstream, not
just to minimize OpenBSD costs in this area, but so that all OSes that draw
from that base can benefit.


Philip Guenther


Re: Floating point exception on boot after using syspatch(8)

2018-08-01 Thread Philip Guenther
On Wed, 1 Aug 2018, Zbyszek Żółkiewski wrote:
> OpenBSD 6.3 GENERIC.MP running on XEN (AWS m4.large) - dmesg at the end 
> of this mail. Image build from https://github.com/kolargol/openbsd-aws 
> which is fork from https://github.com/ajacoutot/aws-openbsd - my fork 
> have some minor modifications.
...
> forgive me if i missed something obvious here - but any idea what is 
> wrong here?  What can cause Floating point exception during boot on 
> system that otherwise works without patching?

This smells like some compatibility issue with the eager-FPU change.  One 
thought is whether the Xen presents the guest with a correctly reset FPU 
state on all CPUs, or if there's some odd state that we're not clearing.

I made a later commit in that area (sys/arch/amd64/amd64/cpu.c rev 1.124) 
so it would be interesting to know if -current shows the same issue or if 
the symptoms have improved (or changed at all) there.


Philip Guenther



Re: Kill MD-specific interrupt enable/disable API on amd64

2018-07-24 Thread Philip Guenther
On Tue, Jul 24, 2018 at 1:02 PM Mark Kettenis 
wrote:

> Diff below switches to the MI equivalent and kills the MD-specific
> API.
>
> ok?
>

ok guenther@


Re: CVS: cvs.openbsd.org: src

2018-07-12 Thread Philip Guenther
On Thu, Jul 12, 2018 at 2:11 PM Philip Guenther 
wrote:

> CVSROOT:/cvs
> Module name:src
> Changes by: guent...@cvs.openbsd.org2018/07/12 08:11:11
>
> Modified files:
> sys/arch/amd64/amd64: cpu.c identcpu.c locore.S machdep.c pmap.c
>   vector.S
> sys/arch/amd64/conf: ld.script
> sys/arch/amd64/include: asm.h codepatch.h
>
> Log message:
> Reorganize the Meltdown entry and exit trampolines for syscall and
> traps so that the "mov %rax,%cr3" is followed by an infinite loop
> which is avoided because the mapping of the code being executed is
> changed.  This means the sysretq/iretq isn't even present in that
> flow of instructions in the kernel mapping, so userspace code can't
> be speculatively reached on the kernel mapping and totally eliminates
> the conditional jump over the the %cr3 change that supported CPUs
> without the Meltdown vulnerability.  The return paths were probably
> vulnerable to Spectre v1 (and v1.1/1.2) style attacks, speculatively
> executing user code post-system-call with the kernel mappings, thus
> creating cache/TLB/etc side-effects.


Damnit, I left out that since this evolves the _Meltdown_ fix with mapping
_over_ the trampoline, we're calling this the Tuna Meltover.

Because you can tuna meltover, but you can't tune a fish.
(hat tip to the author of the tunefs(8) manpage.)


Philip Guenther


Re: signal to process or posix thread

2018-07-09 Thread Philip Guenther
On Sun, Jul 1, 2018 at 9:47 AM Joerg Sonnenberger  wrote:

> On Fri, Jun 29, 2018 at 04:21:17PM +0200, Alexander Bluhm wrote:
> > The problem is that POSIX has signals that are sent to processes
> > and signals sent to individual threads.  Our kernel does not support
> > this properly.
>
> Well, not exactly. POSIX has synchronous and asynchronous signals. I.e.
> SIGFPE after a division by zero or SIGBUS/SIGSEGV are typically
> synchronous traps thrown by the processing of the instructions of the
> current thread. This signals are delivered to the current thread. All
> other signals, i.e. those created as side effect of kill(2) are sent to
> the process at large.


I think Bluhm's wording is more true to POSIX, as there are other methods
for a signal to be delivered to a specific thread.  To quote POSIX XSH
2.4.1p2:
At the time of generation, a determination shall be made whether the
signal
has been generated for the process or for a specific thread within the
process.
Signals which are generated by some action attributable to a particular
thread,
such as a hardware fault, shall be generated for the thread that caused
the
signal to be generated. Signals that are generated in association with
a process
ID or process group ID or an asynchronous event, such as terminal
activity,
shall be generated for the process.

Later, it specifies that pthread_kill() generates the signal for the
specific thread:
The pthread_kill() function shall request that a signal be delivered to
the specified thread.


Those signals are handled by the first thread that
> doesn't have them masked. In that case, it should be put on the pending
> list of the process and any unmasking operation should check the pending
> list on whether a signal should be delivered delayed.
>

Yep.


Philip Guenther


Re: pthread_barrier_init: check count argument

2018-07-05 Thread Philip Guenther
On Thu, Jul 5, 2018 at 1:20 AM Paul Irofti  wrote:

> POSIX mandates that we return EINVAL if count equals zero on barrier
> initialization. Which makes sense to me.
>
> This also fixes posixtestsuite conformance 3-1 test.
>
> OK?
>

ok guenther@


Re: acpi(4) attach glue for amd64/i386

2018-07-01 Thread Philip Guenther
On Sun, Jul 1, 2018 at 8:00 AM Mark Kettenis 
wrote:

> Diff below moves the attach glue from acpi.c into acpi_machdep.c.
> Gets rid of an #ifdef and helps me avoid an ugly hack on arm64.  There
> is some additional code duplication, but I think this is acceptable.
>
> ok?
>

Fewer ifdefs, yay.  ok guenther@


Re: Use -z notext -z norelro for armv7 efiboot

2018-06-22 Thread Philip Guenther
On Fri, Jun 22, 2018 at 3:04 AM Mark Kettenis 
wrote:

> Creating relocations inside .text is inherent to the way we wrap an
> ELF shared object inside a PE executable.  But without -z notext
> lld(1) complains about this.  Also pass -z norelro to suppress the
> creation of a separate rodata segment that makes lld(1) complain about
> relocations against a read-only section.  This makes it possible to
> build efiboot with lld(1) on armv7.
>
> ok?
>

ok guenther@


Re: [patch] Use snprintf to implement concat in etc.c

2018-06-08 Thread Philip Guenther
On Thu, Jun 7, 2018 at 12:34 AM Nan Xiao  wrote:

> I think maybe one snprintf is more efficient and concise. Sorry if I am
> wrong, thanks!
>

Seems unlikely.  More productive would be to go up a level and look at
whether concat() is carrying its weight as an abstraction.  One of the
calls is completely superfluous, to my eye.

But really, ldconfig is pretty uninteresting, being run very, very rarely,
so polishing it is like sanding the rough spots on the bottoms of table
legs, where they touch the floor.  We've spent more cycles talking about it
then it'll use in a year...



Philip Guenther


Re: witness report

2018-06-03 Thread Philip Guenther
On Sun, Jun 3, 2018 at 12:51 PM, Amit Kulkarni  wrote:

> On Sun, 3 Jun 2018 10:37:30 -0700
> Philip Guenther  wrote:
>
...

> > Index: kern/kern_rwlock.c
> > ===
> > RCS file: /data/src/openbsd/src/sys/kern/kern_rwlock.c,v
> > retrieving revision 1.35
> > diff -u -p -r1.35 kern_rwlock.c
> > --- kern/kern_rwlock.c21 Mar 2018 12:28:39 -  1.35
> > +++ kern/kern_rwlock.c3 Jun 2018 17:00:02 -
> > @@ -223,6 +223,8 @@ _rw_enter(struct rwlock *rwl, int flags
> >   lop_flags = LOP_NEWORDER;
> >   if (flags & RW_WRITE)
> >   lop_flags |= LOP_EXCLUSIVE;
> > + if (flags & RW_DUPOK)
> > + lop_flags |= LOP_DUPOK;
> >   if ((flags & RW_NOSLEEP) == 0 && (flags & RW_DOWNGRADE) == 0)
> >   WITNESS_CHECKORDER(>rwl_lock_obj, lop_flags, file,
> line,
> >   NULL);
> > Index: kern/vfs_subr.c
> > ===
> > RCS file: /data/src/openbsd/src/sys/kern/vfs_subr.c,v
> > retrieving revision 1.273
> > diff -u -p -r1.273 vfs_subr.c
> > --- kern/vfs_subr.c   27 May 2018 06:02:14 -  1.273
> > +++ kern/vfs_subr.c   3 Jun 2018 17:04:09 -
> > @@ -188,6 +188,11 @@ vfs_busy(struct mount *mp, int flags)
> >   else
> >   rwflags |= RW_NOSLEEP;
> >
> > +#ifdef WITNESS
> > + if (flags & VB_DUPOK)
> > + rwflags |= RW_DUPOK;
> > +#endif
> > +
>
> The other parts where you added the dup are not checking for Witness. This
> part above should be for all kernels, right? Witness or non-witness.
>

No, the other code-generating additions, in kern_rwlock.c, are also inside
#ifdef WITNESS, just outside of the context of the diff.  The RW_DUPOK flag
has no effect if it's not a WITNESS kernel so excluding those lines is
intentional.


Philip Guenther


Re: witness report

2018-06-03 Thread Philip Guenther
On Sun, 3 Jun 2018, Theo de Raadt wrote:
> Philip Guenther  wrote:
> > The warning is not that a single filesystem is being locked 
> > recursively by a single thread, but just that a single thread is 
> > holding locks on multiple filesystems.
> 
> vfs_stall() needs to grab locks on all filesystems, to stop a variety of 
> filesystem transactions.  (Other types of transactions are blocked in 
> other ways).
> 
> sys_umount() grabs locks on all filesystems above it, to stop anyone 
> from doing a parallel mount/unmount along the same path.
> 
> This is all normal.

Diff below adds VB_DUPOK to indicate that a given vfs_busy() call is 
expected/permitted to occur while the thread already holds another 
filesystem busy; the caller is responsible for ensuring the filesystems 
are locked in the correct order (or that NOWAIT is used safely as in the 
sys_mount() case).

As part of this, this plumbs RW_DUPOK for rw_enter().

I no longer get the witness warning on hibernate with this.

ok?

Philip Guenther


Index: sys/mount.h
===
RCS file: /data/src/openbsd/src/sys/sys/mount.h,v
retrieving revision 1.136
diff -u -p -r1.136 mount.h
--- sys/mount.h 8 May 2018 08:58:49 -   1.136
+++ sys/mount.h 3 Jun 2018 17:05:28 -
@@ -564,6 +564,7 @@ int vfs_busy(struct mount *, int);
 #define VB_WRITE   0x02
 #define VB_NOWAIT  0x04/* immediately fail on busy lock */
 #define VB_WAIT0x08/* sleep fail on busy lock */
+#define VB_DUPOK   0x10/* permit duplicate mount busying */
 
 int vfs_isbusy(struct mount *);
 int vfs_mount_foreach_vnode(struct mount *, int (*func)(struct vnode *,
Index: sys/rwlock.h
===
RCS file: /data/src/openbsd/src/sys/sys/rwlock.h,v
retrieving revision 1.22
diff -u -p -r1.22 rwlock.h
--- sys/rwlock.h12 Aug 2017 23:27:44 -  1.22
+++ sys/rwlock.h3 Jun 2018 17:05:32 -
@@ -116,6 +116,7 @@ struct rwlock {
 #define RW_SLEEPFAIL   0x0020UL /* fail if we slept for the lock */
 #define RW_NOSLEEP 0x0040UL /* don't wait for the lock */
 #define RW_RECURSEFAIL 0x0080UL /* Fail on recursion for RRW locks. */
+#define RW_DUPOK   0x0100UL /* Permit duplicate lock */
 
 /*
  * for rw_status() and rrw_status() only: exclusive lock held by
Index: kern/kern_rwlock.c
===
RCS file: /data/src/openbsd/src/sys/kern/kern_rwlock.c,v
retrieving revision 1.35
diff -u -p -r1.35 kern_rwlock.c
--- kern/kern_rwlock.c  21 Mar 2018 12:28:39 -  1.35
+++ kern/kern_rwlock.c  3 Jun 2018 17:00:02 -
@@ -223,6 +223,8 @@ _rw_enter(struct rwlock *rwl, int flags 
lop_flags = LOP_NEWORDER;
if (flags & RW_WRITE)
lop_flags |= LOP_EXCLUSIVE;
+   if (flags & RW_DUPOK)
+   lop_flags |= LOP_DUPOK;
if ((flags & RW_NOSLEEP) == 0 && (flags & RW_DOWNGRADE) == 0)
WITNESS_CHECKORDER(>rwl_lock_obj, lop_flags, file, line,
NULL);
Index: kern/vfs_subr.c
===
RCS file: /data/src/openbsd/src/sys/kern/vfs_subr.c,v
retrieving revision 1.273
diff -u -p -r1.273 vfs_subr.c
--- kern/vfs_subr.c 27 May 2018 06:02:14 -  1.273
+++ kern/vfs_subr.c 3 Jun 2018 17:04:09 -
@@ -188,6 +188,11 @@ vfs_busy(struct mount *mp, int flags)
else
rwflags |= RW_NOSLEEP;
 
+#ifdef WITNESS
+   if (flags & VB_DUPOK)
+   rwflags |= RW_DUPOK;
+#endif
+
if (rw_enter(>mnt_lock, rwflags))
return (EBUSY);
 
@@ -1602,7 +1607,7 @@ vfs_stall(struct proc *p, int stall)
 */
TAILQ_FOREACH_REVERSE(mp, , mntlist, mnt_list) {
if (stall) {
-   error = vfs_busy(mp, VB_WRITE|VB_WAIT);
+   error = vfs_busy(mp, VB_WRITE|VB_WAIT|VB_DUPOK);
if (error) {
printf("%s: busy\n", mp->mnt_stat.f_mntonname);
allerror = error;
Index: kern/vfs_syscalls.c
===
RCS file: /data/src/openbsd/src/sys/kern/vfs_syscalls.c,v
retrieving revision 1.284
diff -u -p -r1.284 vfs_syscalls.c
--- kern/vfs_syscalls.c 2 Jun 2018 10:27:43 -   1.284
+++ kern/vfs_syscalls.c 3 Jun 2018 17:04:55 -
@@ -230,7 +230,7 @@ sys_mount(struct proc *p, void *v, regis
 
 update:
/* Ensure that the parent mountpoint does not get unmounted. */
-   error = vfs_busy(vp->v_mount, VB_READ|VB_NOWAIT);
+   error = vfs_busy(vp->v_mount, VB_READ|VB_NOWAIT|VB_DUPOK);
if (error) {
if (mp->mnt_flag & MNT_UPDATE) {

Re: witness report

2018-06-03 Thread Philip Guenther
On Sun, Jun 3, 2018 at 9:08 AM, Theo de Raadt  wrote:

> Philip Guenther  wrote:
>
> > On Sun, Jun 3, 2018 at 3:26 AM, Klemens Nanni  wrote:
> >
> > > Snap from 31.05.2018 with
> > >
> > > OpenBSD 6.3-current (GENERIC.MP) #0: Sat Jun  2 16:21:22 CEST 2018
> > > k...@x250.my.domain:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> > >
> > >
> > > acquiring duplicate lock of same type: ">mnt_lock"
> > > 1st vfslock @ /usr/src/sys/kern/vfs_subr.c:191
> > > 2nd vfslock @ /usr/src/sys/kern/vfs_subr.c:191
> > > Starting stack trace...
> > > witness_checkorder(9,81ab15c8,bf,80d00040,21) at
> > > witness_checkorder+0x63d
> > > _rw_enter(0,1,0,80d0) at _rw_enter+0x56
> > > vfs_stall(1,80025400) at vfs_stall+0xab
> > >
> >
> > [Also reported by bluhm@ and others]
> >
> > Is vfs_stall() the only place that locks (busies) multiple mounts?
>
> Isn't that also how unmount works?  I believe it locks the mount it will
> be discarding, and the filesystem it is mounted upon.  It is serializing
> against other activities, such as a new mount or another umount occuring
> against the same dir.




Hmm, yes, dounmount() can busy multiple filesystems concurrently when doing
a forced unmount of a filesystem which has a filesystem below it, but in
the normal case of unmounting a leaf filesystem I don't see where it would
have multiple filesystems busy concurrently.


I see vfs_stall() doing 1 lock per filesystem.  Why does this report
> say there are duplicates?
>

 The warning is not that a single filesystem is being locked recursively by
a single thread, but just that a single thread is holding locks on multiple
filesystems.


Philip


Re: witness report

2018-06-03 Thread Philip Guenther
On Sun, Jun 3, 2018 at 3:26 AM, Klemens Nanni  wrote:

> Snap from 31.05.2018 with
>
> OpenBSD 6.3-current (GENERIC.MP) #0: Sat Jun  2 16:21:22 CEST 2018
> k...@x250.my.domain:/usr/src/sys/arch/amd64/compile/GENERIC.MP
>
>
> acquiring duplicate lock of same type: ">mnt_lock"
> 1st vfslock @ /usr/src/sys/kern/vfs_subr.c:191
> 2nd vfslock @ /usr/src/sys/kern/vfs_subr.c:191
> Starting stack trace...
> witness_checkorder(9,81ab15c8,bf,80d00040,21) at
> witness_checkorder+0x63d
> _rw_enter(0,1,0,80d0) at _rw_enter+0x56
> vfs_stall(1,80025400) at vfs_stall+0xab
>

[Also reported by bluhm@ and others]

Is vfs_stall() the only place that locks (busies) multiple mounts?  If it's
the only place _and_ it hotlds no other locks when doing that then I think
it actually is safe, because it would be equivalent to multiple threads
each holding only a single mount lock and nothing else.

Even if we agree that evaluation is correct, I don't think we want to mark
mnt_lock as DUPOK for all purposes, but rather just pass LOP_DUPOK through
for the calls from vfs_stall().


Philip Guenther


Re: Drop memory barriers from _aromic_lock() on armv7 and arm64

2018-05-21 Thread Philip Guenther
On Mon, May 14, 2018 at 11:36 AM, Mark Kettenis 
wrote:

> Since the callers now add the barriers we can drop them here.
>
> ok?
>

ok guenther@


Re: de-hole some structs on amd64

2018-05-20 Thread Philip Guenther
On Sat, May 19, 2018 at 4:30 PM, Amit Kulkarni <amit.o...@gmail.com> wrote:
>
> I tested removing some slop (i.e. structure packing/de-holing) on amd64,
> this went through a full kernel + userland build.
>

Parts of this are probably okay, but there's some stuff which needs better
placement vs comments and at least one move which needs a justification for
it being safe (or not).



> --- a/sys/sys/proc.h
> +++ b/sys/sys/proc.h
> @@ -170,8 +170,8 @@ struct process {
>
>  /* The following fields are all zeroed upon creation in process_new. */
>  #defineps_startzerops_klist
> -   struct  klist ps_klist; /* knotes attached to this process
> */
> int ps_flags;   /* PS_* flags. */
> +   struct  klist ps_klist; /* knotes attached to this process
> */
>

Nope: you've moved ps_flags from inside the "zeroed out on fork" region to
outside of it
a) without justifying why that's safe, and
b) while leaving it below the comment saying that it's zeroed, when it no
longer is.

Do any of the other moves here cross a start/end zero/copy marker?



> @@ -285,6 +284,7 @@ struct proc {
> struct  futex   *p_futex;   /* Current sleeping futex. */
>
> /* substructures: */
> +   LIST_ENTRY(proc) p_hash;/* Hash chain. */
> struct  filedesc *p_fd; /* copy of p_p->ps_fd */
> struct  vmspace *p_vmspace; /* copy of p_p->ps_vmspace */
>

p_hash isn't a substructure, so putting it below the /* substructures: */
comment is wrong.  Please pay attention to the comments and consider how
the apply (or don't) to the members you're moving.



> @@ -305,6 +304,11 @@ struct proc {
> longp_thrslpid; /* for thrsleep syscall */
>
> /* scheduling */
> +   struct  cpu_info * volatile p_cpu; /* CPU we're running on. */
> +
> +   struct  rusage p_ru;/* Statistics */
> +   struct  tusage p_tu;/* accumulated times. */
> +   struct  timespec p_rtime;   /* Real time. */
> u_int   p_estcpu;/* Time averaged value of p_cpticks. */
> int p_cpticks;   /* Ticks of cpu time. */
>

Again, you've separated the scheduling parameter from the /* scheduling */
comment, putting member that aren't about scheduling between them.


Philip Guenther


Re: ld.so arm64 syscalls

2018-05-15 Thread Philip Guenther
On Tuesday, May 15, 2018, Mark Kettenis  wrote:

> There's a subtle bug in the DL_SYSCALL() implementation on arm64.
> Upon error we're supposed to return -errno.  The code does a negate of
> the lower 32-bit bits.  This means that syscalls that return a 64-bit
> number (i.e. ssize_t) still return a positive number as the upper 32
> bits remain zero.  So we should negate the full 64 bits.  That's safe
> even for system calls that return a 32-bit number since setting the
> lower 32 bits zeroes the upper 32 bits on arm64 just like on amd64.
>
> Fixes a crash I saw during ports building where dl_readlink()
> returning EINVAL resulted in an out-of-bounds access.
>
> ok?


Nice catch. Ok guenther@


Re: Missing hardlink for /usr/bin/cc

2018-05-13 Thread Philip Guenther
On Sat, May 12, 2018 at 8:59 AM, Anthony Coulter <b...@anthonycoulter.name>
wrote:

> $ ls -li /usr/bin/{cc,c++,clang,clang++,clang-cpp} /usr/libexec/cpp
> 156140 -r-xr-xr-x  5 root  bin  46885664 May  4 11:12 /usr/bin/c++
> 155926 -r-xr-xr-x  1 root  bin  46885664 May  4 11:12 /usr/bin/cc
> 156140 -r-xr-xr-x  5 root  bin  46885664 May  4 11:12 /usr/bin/clang
> 156140 -r-xr-xr-x  5 root  bin  46885664 May  4 11:12 /usr/bin/clang++
> 156140 -r-xr-xr-x  5 root  bin  46885664 May  4 11:12 /usr/bin/clang-cpp
> 156140 -r-xr-xr-x  5 root  bin  46885664 May  4 11:12 /usr/libexec/cpp
> $ diff /usr/bin/{cc,clang}
> $
>
> I interpret this as /usr/bin/cc accidentally being a copy instead of a
> hard link. Is this correct?
>

The underlying issue is that /usr/bin/cc is packaged in baseXY.tgz where
it's needed for (sane) kernel and library relinking, while the others are
packaged in compXY.tgz.

Perhaps we should move the others into baseXY.tgz so the link is preserved,
but that would have to be only on the clang-as-cc archs and there are
probably other catches I haven't noticed.  This is perhaps a puzzle best
for Theo...


Philip Guenther


Re: Say no to LARVAL

2018-05-10 Thread Philip Guenther
On Wed, May 9, 2018 at 3:31 AM, Martin Pieuchot <m...@openbsd.org> wrote:

> On 08/05/18(Tue) 14:12, Philip Guenther wrote:
> > On Tue, 8 May 2018, Martin Pieuchot wrote:
> > > The way our kernel allocates and populates new 'struct file *' is
> > > currently a complete mess.  One of the problems is that it puts
> > > incomplete descriptors on the global `filehead' list and on the
> > > open file array `fd_ofiles'.  But to make sure that other threads
> > > won't use such descriptors before they are ready, they are first
> > > marked as LARVAL then unmarked via the FILE_SET_MATURE() macro.
> >
> > The reason we do that is to meet two goals:
> > 1) we want to allocate the fd (and fail if we can't) before
> >possibly blocking while completing the creation of the struct file.
> >For example, in doaccept(), this thread being blocked waiting for a
> new
> >connection must not block other threads from making fd changes.
>
> But the way I understand it, allocating the fd isn't changed by my diff.
> fdalloc(), called by falloc(), still allocates it and mark the
> corresponding
> bits with fd_used().
>

Those bits don't block dup2() *targeting* the fd, or close of the fd.



> > 2) close(half_created) and dup2(old, half_created) must be safe: it must
> >not leak the half_created file in the kernel, and on completion the
> >half_created fd must either be closed or dup'ed to; the half-created
> >file must not suddenly show up there later
>
> I don't understand, are you talking about close(2) and dup2(2)?  How
> syscalls are related to half_created files?  How is that related to
> my explanation below?
>

thread 1: socket() -- > returns 3
thread 1: bind(3, someaddr); listen(3);
thread 1: accept(3) --> expected to return 4 (it's the first free fd), but
blocks waiting for connection
thread 2:   socket() --> returns 5 (if not, then explain that and
ignore the rest of this sequences)
thread 2:   dup2(0, 4) --> better not block!  after this fd 4 must be a
dup of 0!
thread 2:   connect(5, someaddr) --> unblocks thread 1

So what happens at the end of this?  Did accept() return 4 as expected and
if so is fd 4 a duplicate of fd 0 or is it the accepted socket?  I argue
that it must be a dup of 0 because dup2 succeeded, and the accept that
completed later cannot close that fd.  That's how the current kernel
behaves.

I haven't tested it, but my reading of the diff is that is fd 4 will be the
accepted fd and the dup'ed copy of fd 0 will be lost, leaving it's
reference count too high by one.


Philip


Re: Say no to LARVAL

2018-05-08 Thread Philip Guenther
On Tue, 8 May 2018, Martin Pieuchot wrote:
> The way our kernel allocates and populates new 'struct file *' is
> currently a complete mess.  One of the problems is that it puts
> incomplete descriptors on the global `filehead' list and on the
> open file array `fd_ofiles'.  But to make sure that other threads
> won't use such descriptors before they are ready, they are first
> marked as LARVAL then unmarked via the FILE_SET_MATURE() macro.

The reason we do that is to meet two goals:
1) we want to allocate the fd (and fail if we can't) before 
   possibly blocking while completing the creation of the struct file.  
   For example, in doaccept(), this thread being blocked waiting for a new 
   connection must not block other threads from making fd changes.

2) close(half_created) and dup2(old, half_created) must be safe: it must 
   not leak the half_created file in the kernel, and on completion the 
   half_created fd must either be closed or dup'ed to; the half-created 
   file must not suddenly show up there later


> This is a problem because we'll soon need to serialize accesses
> to the above mentioned data structures with finer locks.  And having
> to deal with specific checks in multiple places is simply not viable.
> 
> So the diff below changes the logic by introducing a new function:
> fdinsert().  This function will be used to insert new descriptors into
> shared data structured when they are completely initialized.

doaccept() is still releasing the filedesc lock between falloc() and what 
used to be FILE_SET_MATURE() but is now fdinsert(), but that's not safe 
since the fd_ofile[] entry is still NULL so finishdup() and close() will 
happily overwrite it or return and then later doaccept()->fdinsert() will 
overwrite one pointer with another, leaking file references.

So this diff is not safe.


Philip



restore: preserve symlink mode

2018-04-26 Thread Philip Guenther

When a symlink is restored, set its mode to the value from the dump.  
This is totally cosmetic, as the kernel ignores the mode on symlinks, but 
it's nice to restore to the original.

ok?

Philip

Index: tape.c
===
RCS file: /cvs/src/sbin/restore/tape.c,v
retrieving revision 1.49
diff -u -p -r1.49 tape.c
--- tape.c  21 Jan 2017 08:31:44 -  1.49
+++ tape.c  26 Apr 2018 17:56:00 -
@@ -564,6 +564,8 @@ extractfile(char *name)
if (linkit(lnkbuf, name, SYMLINK) == FAIL)
return (FAIL);
(void)lchown(name, uid, gid);
+   (void)fchmodat(AT_FDCWD, name, mode,
+   AT_SYMLINK_NOFOLLOW);
return (GOOD);
}
 



Re: on-line kernel debugging

2018-04-25 Thread Philip Guenther
On Wed, Apr 25, 2018 at 1:54 AM, Il Ka <kazakevichi...@gmail.com> wrote:
>
> I am also interested in kernel debugging.
> > but we do support running gdb inside a system against its running kernel
> Thank you for this recipe, but is it true that live debug does not work for
> some cases?
> You can't set break point in wsconsole and tty and debug them nor you can
> debug kernel that
> is frozen, and it seems that remote debug
> (like http://www.netbsd.org/docs/kernel/kgdb.html)
> is not supported by obsd, right?
>

Correct.



> Is it because of security reasons?
>

No one was using it, it got in the way of development that was in progress,
and none of the developers was interested in fixing it.  If someone wants
it and can keep it from being in the way of other work, then some form of
it could come back in, but if it would just decay into an impediment then
that would be a waste of time.


Philip Guenther


Re: Make binutils not barf on binaries linked with lld

2018-04-24 Thread Philip Guenther
On Tue, Apr 24, 2018 at 8:43 PM, Mark Kettenis 
wrote:

> So lld generates .gnu.hash sections that our ancient binutils doesn't
> grok.  With the diff below (taken from FreeBSD's binutils, so GPLv2)
> fixes this.
>
> ok?
>
> --- gnu/usr.bin/binutils-2.17/bfd/elf.c 10 Aug 2016 20:46:08 -
> 1.12
> +++ gnu/usr.bin/binutils-2.17/bfd/elf.c 24 Apr 2018 18:41:12 -
>
...

> @@ -2785,6 +2787,10 @@ elf_fake_sections (bfd *abfd, asection *
>  case SHT_GROUP:
>this_hdr->sh_entsize = 4;
>break;
> +
> +case SHT_GNU_HASH:
> +  this_hdr->sh_entsize = bed->s->arch_size == 64 ? 0 : 4;
> +  break;
>

The entity size is _zero_ for 64bit archs?  (Or does that not really matter
because you're not trying to make it dump the actual hash table itself?)

Other than that possibly being wrong, ok guenther


Re: semaphore: validate timespec and handle SA_RESTART

2018-04-24 Thread Philip Guenther
On Mon, 23 Apr 2018, Paul Irofti wrote:
> After discussing further with mpi@ and guenther@, we decided to first
> fix the existing semaphore implementation with regards to SA_RESTART
> and POSIX compliant returns in the case where we deal with restartable
> signals.
> 
> Currently we return EINTR everywhere which is mostly incorrect as the
> user can not know if she needs to recall the syscall or not. Philip
> suggested we use ECANCELED for that which is what my diff does.
> 
> Regression tests pass and so does the posixsuite. Timespec validation
> bits are needed to pass the later.

ok guenther@



Re: ksh: quote empties

2018-04-14 Thread Philip Guenther
On Sun, 15 Apr 2018, Klemens Nanni wrote:
> It also badly effects non-empty cases:
...
>   $ ./obj/ksh -c alias
>   autoload=''
>   functions=''

Hah!  The original diff i actually broken (it tests the wrong variable) 
but I fixed that by accident when I manually made the diff in my tree!

So, uh, I'm no longer fine with the original diff...


Philip



Re: ksh: quote empties

2018-04-14 Thread Philip Guenther
On Sun, 15 Apr 2018, Martijn Dekker wrote:
> $ ksh -c 'trap "" CONT; trap'
> trap --  CONT
> 
> That is not "suitable for re-entry into the shell". Empty words must be
> quoted, or they disappear. Expected output:
> 
> trap -- '' CONT
> 
> Patch below. OK?

That also changes the output of set, export, and alias:

: morgaine; ksh -c 'alias foo=""; alias -p; export foo=; set; export -p' | grep 
foo
alias foo=
foo=
export foo=
: morgaine; obj/ksh -c 'alias foo=""; alias -p; export foo=; set; export -p' | 
grep foo
alias foo=''
foo=''
export foo=''
: morgaine;

Are there any regress tests affected by this?


> --- misc.c9 Apr 2018 17:53:36 -   1.70
> +++ misc.c14 Apr 2018 22:44:39 -
> @@ -966,6 +966,12 @@ print_value_quoted(const char *s)
>   const char *p;
>   int inquote = 0;
> 
> + /* Check for empty */
> + if (!*p) {
> + shprintf("''");
> + return;
> + }
> +

My bikeshed would move the handling into the unquoted case to make it just 
a test+set:

--- misc.c  15 Mar 2018 16:51:29 -  1.69
+++ misc.c  15 Apr 2018 01:00:31 -
@@ -971,6 +971,9 @@ print_value_quoted(const char *s)
if (ctype(*p, C_QUOTE))
break;
if (!*p) {
+   /* handle zero-length value */
+   if (p == s)
+   s = "''";
shprintf("%s", s);
return;
}


...but I'm find with the original diff too.


Philip



Re: sem_trywait(3) and sem_wait(3) can return EINTR

2018-03-28 Thread Philip Guenther
On Wed, Mar 28, 2018 at 2:14 PM, Paul Irofti <p...@irofti.net> wrote:
>
> I do not know if this is expected or not, but sem_trywait(3) can and
> does return EINTR. From the manpage I got the impression that it should
> not. Should we amend the manpage or is this something to be fixed in the
> implementation?
>
>  73 do {
>  74 r = __thrsleep(ident, CLOCK_REALTIME, abstime,
>  75 >lock, delayed_cancel);
>  76 _spinlock(>lock);
>  77 /* ignore interruptions other than cancelation
> */
>  78 if (r == EINTR && (delayed_cancel == NULL ||
>  79 *delayed_cancel == 0))
>  80 r = 0;
>  81 } while (r == 0 && sem->value == 0);
>
> Lines 78--80 are the interesting ones.


Lines 69-70 are more important for sem_trywait():
 69 } else if (tryonly) {
 70 r = EAGAIN;
 71 } else {

sem_trywait() calls _sem_wait() with tryonly=1, so it can't reach that
do{tsleep}while loop and will never return EINTR.

In the end, _sem_wait() will return EINTR only if the thread was
canceled...in which case the function that called _sem_wait() will never
return but instead call _thread_canceled() via
the LEAVE_CANCEL_POINT_INNER() macro.


Philip Guenther


Re: free(9) while holding a rwlock

2018-02-21 Thread Philip Guenther
On Wed, Feb 21, 2018 at 1:26 AM, Martin Pieuchot  wrote:

> Is it safe?  What kind of deadlock/weird situation can occur?
>
> I'm asking because the diff below, that introduces a lock to protect uid
> globals, has an XXX comment from guenther@ asking if it is a problem.
>

The unclarity of concision: the question behind the XXX was whether the
code should release uidinfolk before the free() and then reacquire it, as
holding uidinfolk blocks all (real) UID changing calls as well as
waitpid(), fork(), and __tfork().  Could free() be so slow that not
blocking those could be a concern?  Doubtful, but 

Philip


Meltdown, aka "Dear Intel, you suck"

2018-01-05 Thread Philip Guenther
So, yes, we the OpenBSD developers are not totally asleep and a handful of
us are working out how to deal with Intel's fuck-up aka the Meltdown
attack.  While we have the advantage of less complexity in this area (e.g.,
no 32bit-on-64bit compat), there's still a pile of details to work through
about what has to be *always* in the page tables vs what can/should/must be
hidden.

Do KARL or other features of OpenBSD mitigate this?  Not particularly.  If
you're running code from multiple trust domains then yeah, you're largely
at risk.

We have received *no* non-public information.  I've seen posts elsewhere by
other *BSD people implying that they receive little or no prior warning, so
I have no reason to believe this was specific to OpenBSD and/or our
philosophy.  Personally, I do find itamusing? that public announcements
were moved up after the issue was deduced from development discussions and
commits to a different open source OS project.  Aren't we all glad that
this was under embargo and strongly believe in the future value of
embargoes?

Unless something unexpected happens, we'll be applying the workaround to
amd64 first and then working out what to do for i386 and arm* (if still
though to be necessary for arm) after that.  No promises on only applying
it to Intel CPUs or knobs to disable it, yet: we'll see how complex that
would make things.  As always, our own developer laptops are the first
targets, so we're invested in it working and being usable.


Philip Guenther
guent...@openbsd.org


  1   2   3   4   5   6   7   8   9   10   >