Re: Picky, but much more efficient arc4random_uniform!

2022-05-17 Thread Otto Moerbeek
On Wed, May 18, 2022 at 02:50:28PM +1000, Damien Miller wrote:

> On Tue, 17 May 2022, Raimo Niskanen wrote:
> 
> > Why reinvent the wheel?
> > 
> > Here is a pretty good walkthrough of established methods:
> > 
> > https://www.pcg-random.org/posts/bounded-rands.html
> > 
> > It sounds to me as if your suggested methor essentially is
> > "Bitmask with Rejection -- Apple's Method" with the added twist
> > to keep the unused bits of the base generator's word and append
> > new, which might be an optimization for small ranges, but might
> > be just overhead for large ones.
> > 
> > In that post one can see that there might be other suggested smaller
> > optimizations that can be applied to the OpenBSD method, and that
> > the bitmask method is effective in many cases but not a clear winner.
> 
> Oh nice, I wasn't aware that Apple had an improved algorithm. I always
> thought the best avenue for a speedup was to consider only the bits that
> could satisfy the constraint, but it never occurred to me how to actually
> make use of this :)
> 
> The toy benchmark below compares the existing implementation with
> reimplementations of both their version as well as something more close
> to Apple's actual method (which reuses the high bits).
> 
> However, I don't see a speedup for either of the alternate approaches.
> 
> [djm@djm ~]$ cc -O2 -Werror -Wall -Wextra -o rb rb.c && doas nice -n -20 ./rb 
> openbsd; elapsed = 8.327954
> bitmask; elapsed = 13.306228
> bitmask+reuse; elapsed = 11.567389

instrumenting the code to count the number of arc4random calls I see thsi:

openbsd; elapsed = 2.835819; calls = 12340949
bitmask; elapsed = 4.335576; calls = 17836216
bitmask+reuse; elapsed = 3.710277; calls = 15245337

(this is a different number of trials on a slow machine).

So the promise of less calls is not fulfilled. Sounds like a bug.

-Otto



Re: Picky, but much more efficient arc4random_uniform!

2022-05-17 Thread Theo de Raadt
> However, I don't see a speedup for either of the alternate approaches.

Or maybe, just maybe, the underlying function (arc4random, which in the
dominant case it is just a small chacha step) is so inexpensive relative
to the proposed higher-level logic changes?  So this is all tilting at
windmills, and also something about measuring before optimizing^Wbreaking...



Re: Picky, but much more efficient arc4random_uniform!

2022-05-17 Thread Damien Miller
On Tue, 17 May 2022, Raimo Niskanen wrote:

> Why reinvent the wheel?
> 
> Here is a pretty good walkthrough of established methods:
> 
> https://www.pcg-random.org/posts/bounded-rands.html
> 
> It sounds to me as if your suggested methor essentially is
> "Bitmask with Rejection -- Apple's Method" with the added twist
> to keep the unused bits of the base generator's word and append
> new, which might be an optimization for small ranges, but might
> be just overhead for large ones.
> 
> In that post one can see that there might be other suggested smaller
> optimizations that can be applied to the OpenBSD method, and that
> the bitmask method is effective in many cases but not a clear winner.

Oh nice, I wasn't aware that Apple had an improved algorithm. I always
thought the best avenue for a speedup was to consider only the bits that
could satisfy the constraint, but it never occurred to me how to actually
make use of this :)

The toy benchmark below compares the existing implementation with
reimplementations of both their version as well as something more close
to Apple's actual method (which reuses the high bits).

However, I don't see a speedup for either of the alternate approaches.

[djm@djm ~]$ cc -O2 -Werror -Wall -Wextra -o rb rb.c && doas nice -n -20 ./rb 
openbsd; elapsed = 8.327954
bitmask; elapsed = 13.306228
bitmask+reuse; elapsed = 11.567389

Maybe my benchmark is crap or maybe I need dlg@ to omgoptimize it for me...



#include 
#include 
#include 
#include 

#ifndef __has_builtin
#define __has_builtin(x) 0
#endif
#if __has_builtin(__builtin_clz)
#define arc4random_clz(x) __builtin_clz(x)
#else
#warning lacks __builtin_clz()
/* Count most-significant zero bits, like __builtin_clz() */
static int
arc4random_clz(unsigned int x)
{
int ret = 0;
unsigned int n;
const int lut[16] = { 4, 3, 2, 2, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0 };

while (x != 0) {
n = (x >> 28) & 0xf;
if (n != 0)
return ret + lut[n];
x <<= 4;
ret += 4;
}
return 32;
}
#endif

/* Current OpenBSD implementation */
static uint32_t
arc4random_uniform1(uint32_t upper_bound)
{
uint32_t r, min;

if (upper_bound < 2)
return 0;

/* 2**32 % x == (2**32 - x) % x */
min = -upper_bound % upper_bound;

/*
 * This could theoretically loop forever but each retry has
 * p > 0.5 (worst case, usually far better) of selecting a
 * number inside the range we need, so it should rarely need
 * to re-roll.
 */
for (;;) {
r = arc4random();
if (r >= min)
break;
}

return r % upper_bound;
}

/*
 * Like "Bitmask with Rejection" implementation from
 * https://www.pcg-random.org/posts/bounded-rands.html
 */
static uint32_t
arc4random_uniform2(uint32_t upper_bound)
{
uint32_t mask, r;

if (upper_bound < 2)
return 0;

mask = (uint32_t)-1 >> arc4random_clz((upper_bound - 1) | 1);;
for (;;) {
r = arc4random();
if ((r & mask) < upper_bound)
return r & mask;
}
/* NOTREACHED */
}

/*
 * Like Apple's
 * 
https://opensource.apple.com/source/Libc/Libc-1158.50.2/gen/FreeBSD/arc4random.c
 */
static uint32_t
arc4random_uniform3(uint32_t upper_bound)
{
int zeroes, bits, remain = 32;
uint32_t mask, r;

if (upper_bound < 2)
return 0;

zeroes = arc4random_clz((upper_bound - 1) | 1);
bits = 32 - zeroes;
mask = (uint32_t)-1 >> zeroes;

for (;;) {
r = arc4random();
if ((r & mask) < upper_bound)
return r & mask;
for (remain = zeroes; remain >= bits; remain -= bits) {
r >>= bits;
if ((r & mask) < upper_bound)
return r & mask;
}
}
/* NOTREACHED */
}


#include 

static uint32_t
fixture(const char *s, uint32_t (* const rnd)(uint32_t))
{
const uint32_t trials = 50 * 1000 * 1000;
const uint32_t max = 0x8000;
const uint32_t mul = 7;
unsigned int v, i, r;
struct timeval start, finish, delta;

gettimeofday(, NULL);
for (v = 3, r = 0; v < max; v *= mul) {
/* printf("%u\n", v); */
for (i = 0; i < trials; i++)
r |= rnd(v);
}
gettimeofday(, NULL);
timersub(, , );
printf("%s; elapsed = %lld.%06lld\n", s,
(long long)delta.tv_sec, (long long)delta.tv_usec);
return r;
}

int main(void) {
fixture("openbsd", arc4random_uniform1);
fixture("bitmask", arc4random_uniform2);
fixture("bitmask+reuse", arc4random_uniform3);
}



Re: Picky, but much more efficient arc4random_uniform!

2022-05-17 Thread Philip Guenther
On Tue, May 17, 2022 at 1:10 PM Steffen Nurpmeso  wrote:

> Joerg Sonnenberger wrote in
>  :
>  |Am Fri, May 13, 2022 at 09:43:26AM -0500 schrieb Luke Small:
>  |> I made a couple new versions of a new kind of arc4random_uniform-like
>  ...
>  |If your main use case is limiting the amount of cryptography when using
>  |small bounds, there is a much simpler approach to be taken here. For
>  |boundaries below 256, use arc4random_buf to extract one byte if bound is
>  |a power of two, otherwise two. This gives most of the performance
>  |benefit without complicating the algorithm. Extracting two bytes ensures
>  |that the propability of success is > 99% and the double extracting
>  |doesn't eat up the benefits.
>
> You can use (really implemented) _buf() if you need a 8-bit or
> 16-bit etc number.
>
> I find that _uniform() often makes no difference to a simple
> modulo because like the comment in _uniform() says "p > 0.5 (worst

case, usually far better", and usually RNGs sprinkle bits nicely,
>

What does that statement mean?  You seem to be saying "module is uniform,
except when it isn't, which could be almost half the time for some cases,
but when it's uniform it's uniform, so why bother making it actually
correct and dependable".

I mean, what does that _mean_???   It's as if I said "my text handling
program handles all characters uniformly, except those with accents, but
that's less than 10% of the characters I type, so it handles all characters
uniformly."  WTF, NO!



> 0 bytes "do not occur", so a 32-bit RNG value "is" >=0x01FF in
> most cases for "my RNG" (of 10 803/759/793 NOT; 776/805/793
> NOT for Linux getrandom(2)), which is a pretty high cut off.
> Using _uniform() just because of its name seems strange thus.
>

Where do these ideas come from, that "0 bytes 'do not occur'"??  If your
rand generator doesn't provide zero bytes at the expected frequency, you
know, 1 in 256, then you're using a garbage random number generator.
Please stop making such suggestions here because THEY ARE NOT TRUE ABOUT
OPENBSD.  Do ya'll not bother to test the claims that you make?

: bleys; cat f.c
#include 
#include 

int
main(void)
{
uint32_t u;
long count;

count = 0;
while ((u = arc4random()) > 0x1ff)
count++;
printf("%08x\t%ld\n", u, count);

count = 0;
for (;;) {
u = arc4random();
if ((u & 0xff00) == 0 ||
(u & 0x00ff) == 0 ||
(u & 0xff00) == 0 ||
(u & 0x00ff) == 0)
break;
count++;
}
printf("%08x\t%ld\n", u, count);

return 0;
}
: bleys; cc f.c
: bleys; ./a.out
00b82e5c58
ab47880036
: bleys;


Philip Guenther


Re: powerpc*: make EXC_LAST less awkward

2022-05-17 Thread Mark Kettenis
> Date: Tue, 17 May 2022 19:20:49 +
> From: Miod Vallat 
> 
> As seen in sys/arch/powerpc64/include/trap.h, not-so-ancient PowerPC and
> POWER processors define exception addresses past EXC_LAST.

Erh, wait, not really.  EXC_AST is "fake" and EXC_USER is a flag that
indicates we came from userland.
 
> The following diff changes EXC_LAST to no longer be the last
> "traditional" (0x100 bytes long) exception address, but the
> (conveniently page-aligned) end of the exception vectors area, and
> updates the pointer arithmetic accordingly.

Should we rename EXC_LAST to EXC_END?  That should avoid confusion
with the FreeBSD code that I used as a model to implement the
low-lever powerpc64 code.

> While there, this replaces a hardcoded number for the main Altivec
> exception with the proper symbolic constant (although the comment block
> following that ought to be updated).
> 
> Tested on macppc, not tested on powerpc64 due to lack of hardware.
> 
> Index: sys/arch/macppc/macppc/machdep.c
> ===
> RCS file: /OpenBSD/src/sys/arch/macppc/macppc/machdep.c,v
> retrieving revision 1.195
> diff -u -p -r1.195 machdep.c
> --- sys/arch/macppc/macppc/machdep.c  7 Dec 2021 17:50:44 -   1.195
> +++ sys/arch/macppc/macppc/machdep.c  17 May 2022 14:56:04 -
> @@ -169,7 +169,7 @@ initppc(u_int startkernel, u_int endkern
>   /*
>* Set up trap vectors
>*/
> - for (exc = EXC_RSVD; exc <= EXC_LAST; exc += 0x100) {
> + for (exc = EXC_RSVD; exc < EXC_LAST; exc += 0x100) {
>   switch (exc) {
>   default:
>   bcopy(, (void *)exc, (size_t));
> @@ -212,7 +212,7 @@ initppc(u_int startkernel, u_int endkern
>   }
>  
>   /* Grr, ALTIVEC_UNAVAIL is a vector not ~0xff aligned: 0x0f20 */
> - bcopy(, (void *)0xf20, (size_t));
> + bcopy(, (void *)EXC_VEC, (size_t));
>  
>   /*
>* since trapsize is > 0x20, we just overwrote the EXC_PERF handler
> @@ -222,7 +222,7 @@ initppc(u_int startkernel, u_int endkern
>* do not generate EXC_PERF exceptions...
>*/
>  
> - syncicache((void *)EXC_RST, EXC_LAST - EXC_RST + 0x100);
> + syncicache((void *)EXC_RST, EXC_LAST - EXC_RST);
>  
>   /*
>* Now enable translation (and machine checks/recoverable interrupts).
> Index: sys/arch/powerpc/include/trap.h
> ===
> RCS file: /OpenBSD/src/sys/arch/powerpc/include/trap.h,v
> retrieving revision 1.7
> diff -u -p -r1.7 trap.h
> --- sys/arch/powerpc/include/trap.h   26 Apr 2007 21:36:32 -  1.7
> +++ sys/arch/powerpc/include/trap.h   17 May 2022 14:56:04 -
> @@ -58,7 +58,8 @@
>  #define  EXC_DLMISS  0x1100  /* Data load translation miss */
>  #define  EXC_DSMISS  0x1200  /* Data store translation miss 
> */
>  
> -#define  EXC_LAST0x2f00  /* Last possible exception 
> vector */
> +#define  EXC_LAST0x3000  /* End of last possible 
> exception
> +vector */
>  
>  #define  EXC_AST 0x3000  /* Fake AST vector */
>  
> Index: sys/arch/powerpc64/include/trap.h
> ===
> RCS file: /OpenBSD/src/sys/arch/powerpc64/include/trap.h,v
> retrieving revision 1.8
> diff -u -p -r1.8 trap.h
> --- sys/arch/powerpc64/include/trap.h 11 Mar 2021 11:17:00 -  1.8
> +++ sys/arch/powerpc64/include/trap.h 17 May 2022 14:56:04 -
> @@ -1,3 +1,4 @@
> +/*   $OpenBSD$   */
>  /*-
>   * SPDX-License-Identifier: BSD-4-Clause
>   *
> @@ -105,7 +106,8 @@
>  /* POWER8 */
>  #define EXC_SOFT_PATCH   0x1500  /* POWER8 Soft Patch Exception 
> */
>  
> -#define  EXC_LAST0x2f00  /* Last possible exception 
> vector */
> +#define  EXC_LAST0x3000  /* End of last possible 
> exception
> +vector */
>  
>  #define  EXC_AST 0x3000  /* Fake AST vector */
>  
> 
> 



Re: pf: remove unused include files

2022-05-17 Thread Alexander Bluhm
On Tue, May 17, 2022 at 06:40:12PM +, Miod Vallat wrote:
> sys/net/pf.c r1.968 added a call to m_print() #ifdef DDB in a
> troublesome situation.
> 
> Once the root cause of the problem was fixed, the DDB-specific code path
> was removed in r1.970, but the added includes were kept, although
> nothing in pf.c depends on DDB anymore.

OK bluhm@

> Index: pf.c
> ===
> RCS file: /OpenBSD/src/sys/net/pf.c,v
> retrieving revision 1.1129
> diff -u -p -r1.1129 pf.c
> --- pf.c  5 May 2022 16:44:22 -   1.1129
> +++ pf.c  17 May 2022 18:38:34 -
> @@ -103,11 +103,6 @@
>  struct pfsync_deferral;
>  #endif /* NPFSYNC > 0 */
>  
> -#ifdef DDB
> -#include 
> -#include 
> -#endif
> -
>  /*
>   * Global variables
>   */



Re: Picky, but much more efficient arc4random_uniform!

2022-05-17 Thread Steffen Nurpmeso
Steffen Nurpmeso wrote in
 <20220517220924.xohqc%stef...@sdaoden.eu>:
 |Joerg Sonnenberger wrote in
 | :
 ||Am Fri, May 13, 2022 at 09:43:26AM -0500 schrieb Luke Small:
 ||> I made a couple new versions of a new kind of arc4random_uniform-like
 ...
 |0 bytes "do not occur", so a 32-bit RNG value "is" >=0x01FF in
 |most cases for "my RNG" (of 10 803/759/793 NOT; 776/805/793
 |NOT for Linux getrandom(2)), which is a pretty high cut off.
 ...

  /*@ lotto.cxx - output six random numbers */

  #include 
  #include 

  #include 

  //#define NYDPROF_ENABLE
  //#define NYD_ENABLE
  //#define NYD2_ENABLE
  #include 
  NSPC_USE(su)

  static u32
  bounded_rand(u32 range, u32 rv){
 for(u32 t = -range % range;;){
if(rv >= t)
   return rv % range;
log::write(log::emerg, "NOFIT");
 }
  }

  int
  main(void){
 u32 uni[50], mod[50], rv;

 state::create_core(NIL, (state::debug | log::debug), state::err_nopass);

 su_mem_set(uni, 0, sizeof uni);
 su_mem_set(mod, 0, sizeof mod);

  u32 au=0;
 for(u32 i = 0; i < 10; ++i){
random::builtin_generate(, sizeof rv, state::err_nopass);
//if(getrandom(, sizeof rv, GRND_NONBLOCK) == -1)
//   log::write(log::emerg, "getrandom(2)");
  if(rv < 0x01FF)
  log::write(log::info, "AU %u - 0x%X", ++au,rv);
++mod[rv % NELEM(mod)];
++uni[bounded_rand(NELEM(mod), rv)];
 }

 u32 unilo, modlo, unihi, modhi;
 unilo = modlo = max::u32;
 unihi = modhi = 0;

 for(u32 i = 0; i < NELEM(uni); ++i){
unilo = get_min(unilo, uni[i]);
modlo = get_min(modlo, mod[i]);
unihi = get_max(unihi, uni[i]);
modhi = get_max(modhi, mod[i]);
log::write(log::info, "%u - %u / %u %s",
   i, uni[i], mod[i], (uni[i] != mod[i] ? "!!!" : "="));
 }
 log::write(log::info, "MIN %u / %u, MAX %u / %u - au %u",
   unilo, modlo, unihi, modhi,au);

 return 0;
  }

  #include 
  // s-it-mode

--steffen
|
|Der Kragenbaer,The moon bear,
|der holt sich munter   he cheerfully and one by one
|einen nach dem anderen runter  wa.ks himself off
|(By Robert Gernhardt)



Re: Picky, but much more efficient arc4random_uniform!

2022-05-17 Thread Steffen Nurpmeso
Joerg Sonnenberger wrote in
 :
 |Am Fri, May 13, 2022 at 09:43:26AM -0500 schrieb Luke Small:
 |> I made a couple new versions of a new kind of arc4random_uniform-like
 ...
 |If your main use case is limiting the amount of cryptography when using
 |small bounds, there is a much simpler approach to be taken here. For
 |boundaries below 256, use arc4random_buf to extract one byte if bound is
 |a power of two, otherwise two. This gives most of the performance
 |benefit without complicating the algorithm. Extracting two bytes ensures
 |that the propability of success is > 99% and the double extracting
 |doesn't eat up the benefits.

You can use (really implemented) _buf() if you need a 8-bit or
16-bit etc number.

I find that _uniform() often makes no difference to a simple
modulo because like the comment in _uniform() says "p > 0.5 (worst
case, usually far better", and usually RNGs sprinkle bits nicely,
0 bytes "do not occur", so a 32-bit RNG value "is" >=0x01FF in
most cases for "my RNG" (of 10 803/759/793 NOT; 776/805/793
NOT for Linux getrandom(2)), which is a pretty high cut off.
Using _uniform() just because of its name seems strange thus.

--steffen
|
|Der Kragenbaer,The moon bear,
|der holt sich munter   he cheerfully and one by one
|einen nach dem anderen runter  wa.ks himself off
|(By Robert Gernhardt)



Re: Picky, but much more efficient arc4random_uniform!

2022-05-17 Thread Joerg Sonnenberger
Am Fri, May 13, 2022 at 09:43:26AM -0500 schrieb Luke Small:
> I made a couple new versions of a new kind of arc4random_uniform-like
> function and some example functions which use them. Instead of having a
> sufficiently large random number greater than the modulus, I pick a random
> number using arc4random() from a bitfield where the length of the bitfield
> is just below or slightly beyond the value of the modulus and returns the
> bitfield it if it is less than the value of the modulus.

If your main use case is limiting the amount of cryptography when using
small bounds, there is a much simpler approach to be taken here. For
boundaries below 256, use arc4random_buf to extract one byte if bound is
a power of two, otherwise two. This gives most of the performance
benefit without complicating the algorithm. Extracting two bytes ensures
that the propability of success is > 99% and the double extracting
doesn't eat up the benefits.

Joerg



mg: dobeep_msgs and %s

2022-05-17 Thread Omar Polo
dobeep_msgs isn't printf-like: it just prints the two arguments
separated by a space in the minibuffer.  When it was introduced some
call from ewprintf were incorrectly translated and the "%s" remained.

For example, "M-g M-g abc RET" shows "Line number %s invalid".

ok?

diff bd057c8434699e3a647a884343d442828df4678e /usr/src
blob - 251e7e8185b0b897d28f26fa3bf11d4f8eb281f5
file + usr.bin/mg/basic.c
--- usr.bin/mg/basic.c
+++ usr.bin/mg/basic.c
@@ -536,7 +536,7 @@ gotoline(int f, int n)
return (ABORT);
n = (int)strtonum(buf, INT_MIN, INT_MAX, );
if (err)
-   return(dobeep_msgs("Line number %s", err));
+   return(dobeep_msgs("Line number", err));
}
return(setlineno(n));
 }
blob - 9ad06e49bc715c81e70573ed000e6f5c321d97a0
file + usr.bin/mg/cscope.c
--- usr.bin/mg/cscope.c
+++ usr.bin/mg/cscope.c
@@ -185,9 +185,9 @@ cscreatelist(int f, int n)
return (FALSE);
 
if (stat(dir, ) == -1)
-   return(dobeep_msgs("stat: %s", strerror(errno)));
+   return(dobeep_msgs("stat:", strerror(errno)));
else if (S_ISDIR(sb.st_mode) == 0)
-   return(dobeep_msgs("%s: Not a directory", dir));
+   return(dobeep_msgs(dir, "Not a directory"));
 
if (csexists("cscope-indexer") == FALSE)
return(dobeep_msg("no such file or directory, cscope-indexer"));



powerpc*: make EXC_LAST less awkward

2022-05-17 Thread Miod Vallat
As seen in sys/arch/powerpc64/include/trap.h, not-so-ancient PowerPC and
POWER processors define exception addresses past EXC_LAST.

The following diff changes EXC_LAST to no longer be the last
"traditional" (0x100 bytes long) exception address, but the
(conveniently page-aligned) end of the exception vectors area, and
updates the pointer arithmetic accordingly.

While there, this replaces a hardcoded number for the main Altivec
exception with the proper symbolic constant (although the comment block
following that ought to be updated).

Tested on macppc, not tested on powerpc64 due to lack of hardware.

Index: sys/arch/macppc/macppc/machdep.c
===
RCS file: /OpenBSD/src/sys/arch/macppc/macppc/machdep.c,v
retrieving revision 1.195
diff -u -p -r1.195 machdep.c
--- sys/arch/macppc/macppc/machdep.c7 Dec 2021 17:50:44 -   1.195
+++ sys/arch/macppc/macppc/machdep.c17 May 2022 14:56:04 -
@@ -169,7 +169,7 @@ initppc(u_int startkernel, u_int endkern
/*
 * Set up trap vectors
 */
-   for (exc = EXC_RSVD; exc <= EXC_LAST; exc += 0x100) {
+   for (exc = EXC_RSVD; exc < EXC_LAST; exc += 0x100) {
switch (exc) {
default:
bcopy(, (void *)exc, (size_t));
@@ -212,7 +212,7 @@ initppc(u_int startkernel, u_int endkern
}
 
/* Grr, ALTIVEC_UNAVAIL is a vector not ~0xff aligned: 0x0f20 */
-   bcopy(, (void *)0xf20, (size_t));
+   bcopy(, (void *)EXC_VEC, (size_t));
 
/*
 * since trapsize is > 0x20, we just overwrote the EXC_PERF handler
@@ -222,7 +222,7 @@ initppc(u_int startkernel, u_int endkern
 * do not generate EXC_PERF exceptions...
 */
 
-   syncicache((void *)EXC_RST, EXC_LAST - EXC_RST + 0x100);
+   syncicache((void *)EXC_RST, EXC_LAST - EXC_RST);
 
/*
 * Now enable translation (and machine checks/recoverable interrupts).
Index: sys/arch/powerpc/include/trap.h
===
RCS file: /OpenBSD/src/sys/arch/powerpc/include/trap.h,v
retrieving revision 1.7
diff -u -p -r1.7 trap.h
--- sys/arch/powerpc/include/trap.h 26 Apr 2007 21:36:32 -  1.7
+++ sys/arch/powerpc/include/trap.h 17 May 2022 14:56:04 -
@@ -58,7 +58,8 @@
 #defineEXC_DLMISS  0x1100  /* Data load translation miss */
 #defineEXC_DSMISS  0x1200  /* Data store translation miss 
*/
 
-#defineEXC_LAST0x2f00  /* Last possible exception 
vector */
+#defineEXC_LAST0x3000  /* End of last possible 
exception
+  vector */
 
 #defineEXC_AST 0x3000  /* Fake AST vector */
 
Index: sys/arch/powerpc64/include/trap.h
===
RCS file: /OpenBSD/src/sys/arch/powerpc64/include/trap.h,v
retrieving revision 1.8
diff -u -p -r1.8 trap.h
--- sys/arch/powerpc64/include/trap.h   11 Mar 2021 11:17:00 -  1.8
+++ sys/arch/powerpc64/include/trap.h   17 May 2022 14:56:04 -
@@ -1,3 +1,4 @@
+/* $OpenBSD$   */
 /*-
  * SPDX-License-Identifier: BSD-4-Clause
  *
@@ -105,7 +106,8 @@
 /* POWER8 */
 #define EXC_SOFT_PATCH 0x1500  /* POWER8 Soft Patch Exception */
 
-#defineEXC_LAST0x2f00  /* Last possible exception 
vector */
+#defineEXC_LAST0x3000  /* End of last possible 
exception
+  vector */
 
 #defineEXC_AST 0x3000  /* Fake AST vector */
 



pf: remove unused include files

2022-05-17 Thread Miod Vallat
sys/net/pf.c r1.968 added a call to m_print() #ifdef DDB in a
troublesome situation.

Once the root cause of the problem was fixed, the DDB-specific code path
was removed in r1.970, but the added includes were kept, although
nothing in pf.c depends on DDB anymore.

Index: pf.c
===
RCS file: /OpenBSD/src/sys/net/pf.c,v
retrieving revision 1.1129
diff -u -p -r1.1129 pf.c
--- pf.c5 May 2022 16:44:22 -   1.1129
+++ pf.c17 May 2022 18:38:34 -
@@ -103,11 +103,6 @@
 struct pfsync_deferral;
 #endif /* NPFSYNC > 0 */
 
-#ifdef DDB
-#include 
-#include 
-#endif
-
 /*
  * Global variables
  */



Re: start unlocking kbind(2)

2022-05-17 Thread Theo de Raadt
Martin Pieuchot  wrote:

> On 17/05/22(Tue) 10:44, David Gwynne wrote:
> > this narrows the scope of the KERNEL_LOCK in kbind(2) so the syscall
> > argument checks can be done without the kernel lock.
> > 
> > care is taken to allow the pc/cookie checks to run without any lock by
> > being careful with the order of the checks. all modifications to the
> > pc/cookie state are serialised by the per process mutex.
> 
> I don't understand why it is safe to do the following check without
> holding a mutex:
> 
>   if (pr->ps_kbind_addr == pc)
>   ...
> 
> Is there much differences when always grabbing the per-process mutex?

I think the theory is ps_kbind_addr is fixed to a shared address space
in "pr", if you are threaded there is only one ps_kbind_addr for all the
processes "p" sharing that address space.

And execve() uses single_thread_set, which means you can't race?




Re: start unlocking kbind(2)

2022-05-17 Thread Martin Pieuchot
On 17/05/22(Tue) 10:44, David Gwynne wrote:
> this narrows the scope of the KERNEL_LOCK in kbind(2) so the syscall
> argument checks can be done without the kernel lock.
> 
> care is taken to allow the pc/cookie checks to run without any lock by
> being careful with the order of the checks. all modifications to the
> pc/cookie state are serialised by the per process mutex.

I don't understand why it is safe to do the following check without
holding a mutex:

if (pr->ps_kbind_addr == pc)
...

Is there much differences when always grabbing the per-process mutex?

> i dont know enough about uvm to say whether it is safe to unlock the
> actual memory updates too, but even if i was confident i would still
> prefer to change it as a separate step.

I agree.

> Index: kern/init_sysent.c
> ===
> RCS file: /cvs/src/sys/kern/init_sysent.c,v
> retrieving revision 1.236
> diff -u -p -r1.236 init_sysent.c
> --- kern/init_sysent.c1 May 2022 23:00:04 -   1.236
> +++ kern/init_sysent.c17 May 2022 00:36:03 -
> @@ -1,4 +1,4 @@
> -/*   $OpenBSD: init_sysent.c,v 1.236 2022/05/01 23:00:04 tedu Exp $  */
> +/*   $OpenBSD$   */
>  
>  /*
>   * System call switch table.
> @@ -204,7 +204,7 @@ const struct sysent sysent[] = {
>   sys_utimensat },/* 84 = utimensat */
>   { 2, s(struct sys_futimens_args), 0,
>   sys_futimens }, /* 85 = futimens */
> - { 3, s(struct sys_kbind_args), 0,
> + { 3, s(struct sys_kbind_args), SY_NOLOCK | 0,
>   sys_kbind },/* 86 = kbind */
>   { 2, s(struct sys_clock_gettime_args), SY_NOLOCK | 0,
>   sys_clock_gettime },/* 87 = clock_gettime */
> Index: kern/syscalls.master
> ===
> RCS file: /cvs/src/sys/kern/syscalls.master,v
> retrieving revision 1.223
> diff -u -p -r1.223 syscalls.master
> --- kern/syscalls.master  24 Feb 2022 07:41:51 -  1.223
> +++ kern/syscalls.master  17 May 2022 00:36:03 -
> @@ -194,7 +194,7 @@
>   const struct timespec *times, int flag); }
>  85   STD { int sys_futimens(int fd, \
>   const struct timespec *times); }
> -86   STD { int sys_kbind(const struct __kbind *param, \
> +86   STD NOLOCK  { int sys_kbind(const struct __kbind *param, \
>   size_t psize, int64_t proc_cookie); }
>  87   STD NOLOCK  { int sys_clock_gettime(clockid_t clock_id, \
>   struct timespec *tp); }
> Index: uvm/uvm_mmap.c
> ===
> RCS file: /cvs/src/sys/uvm/uvm_mmap.c,v
> retrieving revision 1.169
> diff -u -p -r1.169 uvm_mmap.c
> --- uvm/uvm_mmap.c19 Jan 2022 10:43:48 -  1.169
> +++ uvm/uvm_mmap.c17 May 2022 00:36:03 -
> @@ -70,6 +70,7 @@
>  #include 
>  #include   /* for KBIND* */
>  #include 
> +#include 
>  
>  #include /* for __LDPGSZ */
>  
> @@ -1125,33 +1126,64 @@ sys_kbind(struct proc *p, void *v, regis
>   const char *data;
>   vaddr_t baseva, last_baseva, endva, pageoffset, kva;
>   size_t psize, s;
> - u_long pc;
> + u_long pc, kpc;
>   int count, i, extra;
> + uint64_t cookie;
>   int error;
>  
>   /*
>* extract syscall args from uap
>*/
>   paramp = SCARG(uap, param);
> - psize = SCARG(uap, psize);
>  
>   /* a NULL paramp disables the syscall for the process */
>   if (paramp == NULL) {
> + mtx_enter(>ps_mtx);
>   if (pr->ps_kbind_addr != 0)
> - sigexit(p, SIGILL);
> + goto leave_sigill;
>   pr->ps_kbind_addr = BOGO_PC;
> + mtx_leave(>ps_mtx);
>   return 0;
>   }
>  
>   /* security checks */
> +
> + /*
> +  * ps_kbind_addr can only be set to 0 or BOGO_PC by the
> +  * kernel, not by a call from userland.
> +  */
>   pc = PROC_PC(p);
> - if (pr->ps_kbind_addr == 0) {
> - pr->ps_kbind_addr = pc;
> - pr->ps_kbind_cookie = SCARG(uap, proc_cookie);
> - } else if (pc != pr->ps_kbind_addr || pc == BOGO_PC)
> - sigexit(p, SIGILL);
> - else if (pr->ps_kbind_cookie != SCARG(uap, proc_cookie))
> - sigexit(p, SIGILL);
> + if (pc == 0 || pc == BOGO_PC)
> + goto sigill;
> +
> + cookie = SCARG(uap, proc_cookie);
> + if (pr->ps_kbind_addr == pc) {
> + membar_datadep_consumer();
> + if (pr->ps_kbind_cookie != cookie)
> + goto sigill;
> + } else {
> + mtx_enter(>ps_mtx);
> + kpc = pr->ps_kbind_addr;
> +
> + /*
> +  * If we're the first thread in (kpc is 0), then
> +  * 

Call uvm_vnp_uncache() before VOP_RENAME()

2022-05-17 Thread Martin Pieuchot
nfsrv_rename() should behave like dorenameat() and tell UVM to "flush" a
possibly mmap'ed file before calling VOP_RENAME().

ok?

Index: nfs/nfs_serv.c
===
RCS file: /cvs/src/sys/nfs/nfs_serv.c,v
retrieving revision 1.120
diff -u -p -r1.120 nfs_serv.c
--- nfs/nfs_serv.c  11 Mar 2021 13:31:35 -  1.120
+++ nfs/nfs_serv.c  4 May 2022 15:29:06 -
@@ -1488,6 +1488,9 @@ nfsrv_rename(struct nfsrv_descript *nfsd
error = -1;
 out:
if (!error) {
+   if (tvp) {
+   (void)uvm_vnp_uncache(tvp);
+   }
error = VOP_RENAME(fromnd.ni_dvp, fromnd.ni_vp, _cnd,
   tond.ni_dvp, tond.ni_vp, _cnd);
} else {



wsconscfg(8): added "-g" option

2022-05-17 Thread Sergiy Kopchalyuk
While debugging simplefb-related issues with /dev/console initialization
on Pinebook Pro I've discovered that wsconscfg(8) does not provide 
a way to call WSDISPLAY_GETSCREEN ioctl. Hence this small patch. 

It is similar to what NetBSD has in their codebase.


Index: usr.sbin/wsconscfg/wsconscfg.8
===
RCS file: /cvs/src/usr.sbin/wsconscfg/wsconscfg.8,v
retrieving revision 1.20
diff -u -p -r1.20 wsconscfg.8
--- usr.sbin/wsconscfg/wsconscfg.8  1 Jul 2010 02:46:06 -   1.20
+++ usr.sbin/wsconscfg/wsconscfg.8  17 May 2022 11:48:52 -
@@ -33,7 +33,7 @@
 .Nd configure virtual terminals on a wscons display
 .Sh SYNOPSIS
 .Nm wsconscfg
-.Op Fl dFkm
+.Op Fl gdFkm
 .Op Fl e Ar emul
 .Op Fl f Ar ctldev
 .Op Fl t Ar type
@@ -41,7 +41,7 @@
 .Sh DESCRIPTION
 The
 .Nm
-tool allows for the creation and removal of virtual terminals
+tool allows for the viewing, creation and removal of virtual terminals
 on display devices controlled by the wscons terminal framework,
 as long as the underlying display hardware driver supports multiple screens.
 Furthermore, it controls the assignment of keyboards to displays.
@@ -111,6 +111,13 @@ display properties.
 Valid
 .Ar type
 arguments are defined by the underlying display device driver.
+.It Fl g
+Print information (index, screen type and terminal emulation) about
+virtual terminal, specified by
+.Ar index .
+If the
+.Ar index
+argument is omitted, information about current terminal is displayed.
 .El
 .\" .Pp
 .\" Typically, the
Index: usr.sbin/wsconscfg/wsconscfg.c
===
RCS file: /cvs/src/usr.sbin/wsconscfg/wsconscfg.c,v
retrieving revision 1.17
diff -u -p -r1.17 wsconscfg.c
--- usr.sbin/wsconscfg/wsconscfg.c  24 Oct 2021 21:24:19 -  1.17
+++ usr.sbin/wsconscfg/wsconscfg.c  17 May 2022 11:48:52 -
@@ -52,21 +52,23 @@ usage(void)
extern char *__progname;
 
(void)fprintf(stderr,
-   "usage: %s [-dFkm] [-e emul] [-f ctldev] [-t type] index\n",
+   "usage: %s [-gdFkm] [-e emul] [-f ctldev] [-t type] index\n",
__progname);
exit(1);
 }
 
+
 int
 main(int argc, char *argv[])
 {
char *wsdev;
-   int c, delete, kbd, idx, wsfd, res, mux;
+   int c, get, delete, kbd, idx, wsfd, res, mux;
struct wsdisplay_addscreendata asd;
struct wsdisplay_delscreendata dsd;
struct wsmux_device wmd;
 
wsdev = DEFDEV;
+   get = 0;
delete = 0;
kbd = 0;
mux = 0;
@@ -74,11 +76,14 @@ main(int argc, char *argv[])
asd.emul[0] = 0;
dsd.flags = 0;
 
-   while ((c = getopt(argc, argv, "f:dkmt:e:F")) != -1) {
+   while ((c = getopt(argc, argv, "f:gdkmt:e:F")) != -1) {
switch (c) {
case 'f':
wsdev = optarg;
break;
+   case 'g':
+   get = 1;
+   break;
case 'd':
delete = 1;
break;
@@ -107,14 +112,14 @@ main(int argc, char *argv[])
argc -= optind;
argv += optind;
 
-   if (kbd ? (argc > 1) : (argc != 1))
+   if ((kbd ^ get) ? (argc > 1) : (argc != 1))
usage();
 
idx = -1;
if (argc > 0 && sscanf(argv[0], "%d", ) != 1)
errx(1, "invalid index");
 
-   wsfd = open(wsdev, O_RDWR);
+   wsfd = open(wsdev, get ? O_RDONLY : O_RDWR);
if (wsfd < 0)
err(2, "%s", wsdev);
 
@@ -133,6 +138,14 @@ main(int argc, char *argv[])
if (res < 0)
err(3, "WSMUXIO_ADD_DEVICE");
}
+   } else if (get) {
+   asd.idx = idx;
+   res = ioctl(wsfd, WSDISPLAYIO_GETSCREEN, );
+   if (res < 0)
+   err(3, "WSDISPLAYIO_GETSCREEN");
+   else
+   printf("idx: %d; screentype: %s; emul: %s\n",
+  asd.idx, asd.screentype, asd.emul);
} else if (delete) {
dsd.idx = idx;
res = ioctl(wsfd, WSDISPLAYIO_DELSCREEN, );



arpresolve: XXXXX route contains no arp information

2022-05-17 Thread Hrvoje Popovski
Hi all,

here we have carp pfsync setup mostly for wireless and mobile users.
Beside from pure forwarding and firewalling boxes are dhcp server with
dhcpsync.


>From time to time in logs i can see:

May 17 11:44:18 bcbnfw1 /bsd: arpresolve: 10.30.0.0: route contains no
arp information


I've route monitor it and i think that this is the one:

got message of size 272 on Tue May 17 11:44:18 2022
RTM_MISS: Lookup failed on this address: len 272, priority 0, table 0,
if# 0, pid: 0, seq 0, errno 17
flags:
fmask:
use:0   mtu:0expire:0
locks:  inits:
sockaddrs: 
 10.30.15.39 (53)
Qb2.20.00.80.ff.ff.7c.dd.7e.81.ff.ff.ff.ff.00.00.00.00.00.00.00.00.07.00.00.00.00.00.00.00.70.0d.26.66.80.fd.ff.ff.80.65.01.00.60.00.00.00.00.00.00.00.00.00.00.00.38.36.b2.20.00.80.ff.ff.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.11


at that time 11:44:18 there are few logs for ip 10.30.15.39 (mobile
phone), please see log in attachment.


route 10.30.0/20 is connected route to interface vlan1130

DestinationGatewayFlags   Refs  Use   Mtu  Prio
Iface
10.30.0/20 10.30.0.2  UCn  14122619 - 4
vlan1130

arp -an | grep 10.30.15.39
10.30.15.39  76:b9:98:72:7c:9c vlan1130 15m30s

Beside that in logs there are lot's of:
arp: attempt to add entry for 10.33.0.116 on vlan1133 by
a4:45:19:7d:bb:28 on vlan1132

i'm explaining this log to myself - because there is one SSID (eduroam)
and lots of vlans i will this this log no matter what ..


is this "arpresolve log" bug ?

got message of size 192 on Tue May 17 11:44:18 2022
RTM_ADD: Add Route: len 192, priority 3, table 0, if# 37, name vlan1130, pid: 
0, seq 0, errno 0
flags:
fmask:
use:0   mtu:0expire:0
locks:  inits:
sockaddrs: 
 10.30.15.39 link#37 90:e2:ba:d7:1b:f4 10.30.0.2
got message of size 192 on Tue May 17 11:44:18 2022
RTM_DELETE: Delete Route: len 192, priority 3, table 0, if# 37, name vlan1130, 
pid: 0, seq 0, errno 0
flags:
fmask:
use:0   mtu:0expire:0
locks:  inits:
sockaddrs: 
 10.30.15.39 link#37 90:e2:ba:d7:1b:f4 10.30.0.2
got message of size 192 on Tue May 17 11:44:18 2022
RTM_ADD: Add Route: len 192, priority 3, table 0, if# 37, name vlan1130, pid: 
0, seq 0, errno 0
flags:
fmask:
use:0   mtu:0expire:0
locks:  inits:
sockaddrs: 
 10.30.15.39 link#37 90:e2:ba:d7:1b:f4 10.30.0.2
got message of size 192 on Tue May 17 11:44:18 2022
RTM_DELETE: Delete Route: len 192, priority 3, table 0, if# 37, name vlan1130, 
pid: 0, seq 0, errno 0
flags:
fmask:
use:0   mtu:0expire:0
locks:  inits:
sockaddrs: 
 10.30.15.39 link#37 90:e2:ba:d7:1b:f4 10.30.0.2
got message of size 192 on Tue May 17 11:44:18 2022
RTM_ADD: Add Route: len 192, priority 3, table 0, if# 37, name vlan1130, pid: 
0, seq 0, errno 0
flags:
fmask:
use:0   mtu:0expire:0
locks:  inits:
sockaddrs: 
 10.30.15.39 link#37 90:e2:ba:d7:1b:f4 10.30.0.2
got message of size 272 on Tue May 17 11:44:18 2022
RTM_MISS: Lookup failed on this address: len 272, priority 0, table 0, if# 0, 
pid: 0, seq 0, errno 17
flags:
fmask:
use:0   mtu:0expire:0
locks:  inits:
sockaddrs: 
 10.30.15.39 (53) 
Qb2.20.00.80.ff.ff.7c.dd.7e.81.ff.ff.ff.ff.00.00.00.00.00.00.00.00.07.00.00.00.00.00.00.00.70.0d.26.66.80.fd.ff.ff.80.65.01.00.60.00.00.00.00.00.00.00.00.00.00.00.38.36.b2.20.00.80.ff.ff.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.00.11
got message of size 192 on Tue May 17 11:44:18 2022
RTM_RESOLVE: Route created by cloning: len 192, priority 3, table 0, if# 37, 
name vlan1130, pid: 0, seq 0, errno 0
flags:
fmask:
use:0   mtu:0expire:0
locks:  inits:
sockaddrs: 
 10.30.15.39 76:b9:98:72:7c:9c 90:e2:ba:d7:1b:f4 10.30.0.2


Re: mg and trailing whitespaces

2022-05-17 Thread Omar Polo
10 weeks bump :)

(+cc lum@)

Omar Polo  wrote:
> Hello tech,
> 
> mg(1) has this tendency to leave a lot of trailing whitespaces around in
> auto-indent-mode and c-mode which I find annoying.  Yes, there's
> delete-trailing-space but it works only on the current line (and not the
> whole buffer as in emacs) and one has to manually call it anyway.
> Emacs, and even vi in base, are clever in this regard: trailing
> whitespaces before the cursor are delete upon RET before adding a new
> line.
> 
> So, here's the same heuristic for mg when auto-indent-mode or c-mode is
> enabled.  It's still possible to leave trailing whitespace in a buffer
> in those modes, it only gets a little bit harder to do.
> 
> (as a next step I guess we could also garbage collect cc_strip_trailp,
> it was used in only one place before this patch and is unconditionally
> set to TRUE.)
> 
> Thoughts/OK?

This was also tested by Mikhail (thanks!) who suggested to follow the
GNU Emacs behavior of auto-indent-mode (i.e. first trim, then compute
the autoindent.)  Initially I agreed, but after some usage I find my
proposed behaviour more useful since mg lacks an automagic indentation
for sh, awk, conf, ...

diff fd726a6788bfd748bd7a97d560a67c0fa887 
c9a41b1dcf7c2abb23d39c14e8e488d5bb193a95
blob - 6c0ef5b897536c8c2d6e0faa6500e5a64ea57598
blob + 7b47402aa1145e41a7e8b14ea62fcd3d49bab074
--- usr.bin/mg/cmode.c
+++ usr.bin/mg/cmode.c
@@ -205,6 +205,8 @@ cc_lfindent(int f, int n)
 {
if (n < 0)
return (FALSE);
+   if (cc_strip_trailp)
+   (void)delwhite(FFRAND, 1);
if (enewline(FFRAND, 1) == FALSE)
return (FALSE);
return (cc_indent(FFRAND, n));
blob - 4d38284f3b886043336ff967602354f0fc81a4fd
blob + 2593eb6b63a675ee6b928cf13a574ebf8f9b5acf
--- usr.bin/mg/util.c
+++ usr.bin/mg/util.c
@@ -373,6 +373,7 @@ lfindent(int f, int n)
nicol |= 0x07;
++nicol;
}
+   (void)delwhite(FFRAND, 1);
if (lnewline() == FALSE || ((
 #ifdef NOTAB
curbp->b_flag & BFNOTAB) ? linsert(nicol, ' ') == FALSE : (





turn rc_exec variable into an rcexec function

2022-05-17 Thread Antoine Jacoutot
Hi folks.

I would like to move away from the rc_exec variable to an rcexec() function for
rc.d scripts.
That will allow daemon_logger to work out of the box even for manually
crafted rc_start() functions.
I will also simplify the addition of new features like daemon_startdir.

I have tested this with a few daemons only.
So I am calling for testing this diff on your own setup to detect any possible
regression (especially for daemons with a complex starting sequence).
To test, just apply the diff to rc.subr and replace ${rcexec} with rc_exec in
your rc.d scripts.

As I don't want to spam the list, please report OK tests directly to me and
KO ones to the list.

Thanks a lot!


Index: rc.subr
===
RCS file: /cvs/src/etc/rc.d/rc.subr,v
retrieving revision 1.152
diff -u -p -r1.152 rc.subr
--- rc.subr 10 Feb 2022 16:57:33 -  1.152
+++ rc.subr 17 May 2022 07:05:57 -
@@ -159,11 +159,19 @@ _rc_wait_for_start() {
return
 }
 
-rc_start() {
-   ${rcexec} "${daemon_logger:+set -o pipefail; }${daemon} 
${daemon_flags}${daemon_logger:+ 2>&1 |
+rc_exec() {
+   local _rcexec="su -fl -c ${daemon_class} -s /bin/sh ${daemon_user} -c"
+   [ "${daemon_rtable}" -eq "$(id -R)" ] ||
+   _rcexec="route -T ${daemon_rtable} exec ${_rcexec}"
+
+   ${_rcexec} "${daemon_logger:+set -o pipefail; }$@${daemon_logger:+ 2>&1 
|
logger -ip ${daemon_logger} -t ${_name}}"
 }
 
+rc_start() {
+   rc_exec "${daemon} ${daemon_flags}"
+}
+
 rc_check() {
pgrep -T "${daemon_rtable}" -q -xf "${pexp}"
 }
@@ -343,6 +351,3 @@ unset _rcflags _rclogger _rcrtable _rcti
 # the shell will strip the quotes from daemon_flags when starting a daemon;
 # make sure pexp matches the process (i.e. doesn't include the quotes)
 pexp="$(eval echo ${daemon}${daemon_flags:+ ${daemon_flags}})"
-rcexec="su -fl -c ${daemon_class} -s /bin/sh ${daemon_user} -c"
-[ "${daemon_rtable}" -eq "$(id -R)" ] ||
-   rcexec="route -T ${daemon_rtable} exec ${rcexec}"


-- 
Antoine