Re: asr(3): strip AD flag in responses

2021-11-21 Thread Otto Moerbeek
On Sun, Nov 21, 2021 at 04:51:45PM +0100, Florian Obser wrote:

> On 2021-11-20 21:16 +01, Otto Moerbeek  wrote:
> > On Sat, Nov 20, 2021 at 06:44:58PM +0100, Florian Obser wrote:
> >
> >> On 2021-11-20 18:41 +01, Florian Obser  wrote:
> >> > On 2021-11-20 18:19 +01, Florian Obser  wrote:
> >> >
> >> >> +/*
> >> >> + * Clear AD flag in the answer.
> >> >> + */
> >> >> +static void
> >> >> +clear_ad(struct asr_result *ar)
> >> >> +{
> >> >> +   struct asr_dns_header   *h;
> >> >> +   uint16_t flags;
> >> >> +
> >> >> +   h = (struct asr_dns_header *)ar->ar_data;
> >> >> +   flags = ntohs(h->flags);
> >> >> +   flags &= ~(AD_MASK);
> >> >> +   h->flags = htons(flags);
> >> >> +}
> >> >> +
> >> >
> >> > btw. is it possible that this is not aligned correctly on sparc64?
> >> >
> >> > should we do something like (not even compile tested)
> >> >
> >> > static void
> >> > clear_ad(struct asr_result *ar)
> >> > {
> >> >     struct asr_dns_header h;
> >> >
> >> >     memmove(&h, ar->ar_data, sizeof(h));
> >> >     h.flags = ntohs(h.flags);
> >> >     h.flags &= ~(AD_MASK);
> >> >     h.flags = htons(h.flags);
> >> >     memmove(ar->ar_data, &h, sizeof(h));
> >> > }
> >> >
> >> 
> >> memcpy obviously, I was distracted by the copious amount of memmove in
> >> asr code...
> >
> > It is not needed to copy the "whole" header just to change the flags.
> > You could just copy out, modify and copy back the flags field only.
> >
> > otoh, it's just 12 bytes, so no big deal.
> 
> right. So I have tried my patch (without the memcpy dance) on sparc64
> over udp and tcp and I have also tracked this down in the code. This
> should be fine as is. ar->ar_data comes directly out of malloc
> (reallocarray) in ensure_ibuf() and the struct is defined thusly:
> 
> struct asr_dns_header {
>     uint16_t    id;
>     uint16_t    flags;
>     uint16_t    qdcount;
>     uint16_t    ancount;
>     uint16_t    nscount;
>     uint16_t    arcount;
> };
> 

So that is indeed safe as long as nobody starts allocating packet
buffers in different ways.

-Otto
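
A minimal sketch of the flags-only variant discussed above, assuming the
answer is a raw DNS message in a byte buffer. The names
clear_ad_unaligned, DNS_FLAGS_OFFSET and DNS_AD_MASK are made up for
this illustration and are not part of asr. The mask assumes the usual
DNS header layout, where AD is bit 5 of the 16-bit flags word counted
from the least significant bit.

```c
#include <arpa/inet.h>	/* ntohs, htons */
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DNS_FLAGS_OFFSET	2	/* flags follow the 16-bit id (assumed name) */
#define DNS_AD_MASK		0x0020	/* AD bit in host byte order (assumed name) */

/*
 * Clear the AD bit in a raw DNS message without ever dereferencing a
 * possibly misaligned uint16_t pointer. Returns -1 if the buffer is
 * too short to contain the flags field, 0 otherwise.
 */
static int
clear_ad_unaligned(unsigned char *pkt, size_t len)
{
	uint16_t flags;

	if (len < DNS_FLAGS_OFFSET + sizeof(flags))
		return -1;

	/* Copy out only the flags field, modify it, copy it back. */
	memcpy(&flags, pkt + DNS_FLAGS_OFFSET, sizeof(flags));
	flags = htons((uint16_t)(ntohs(flags) & ~DNS_AD_MASK));
	memcpy(pkt + DNS_FLAGS_OFFSET, &flags, sizeof(flags));
	return 0;
}
```

Since only memcpy touches the buffer, this form would keep working even
if packet buffers were one day allocated with weaker alignment than
malloc gives.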



Re: asr(3): strip AD flag in responses

2021-11-20 Thread Otto Moerbeek
On Sat, Nov 20, 2021 at 06:44:58PM +0100, Florian Obser wrote:

> On 2021-11-20 18:41 +01, Florian Obser  wrote:
> > On 2021-11-20 18:19 +01, Florian Obser  wrote:
> >
> >> +/*
> >> + * Clear AD flag in the answer.
> >> + */
> >> +static void
> >> +clear_ad(struct asr_result *ar)
> >> +{
> >> +  struct asr_dns_header   *h;
> >> +  uint16_t flags;
> >> +
> >> +  h = (struct asr_dns_header *)ar->ar_data;
> >> +  flags = ntohs(h->flags);
> >> +  flags &= ~(AD_MASK);
> >> +  h->flags = htons(flags);
> >> +}
> >> +
> >
> > btw. is it possible that this is not aligned correctly on sparc64?
> >
> > should we do something like (not even compile tested)
> >
> > static void
> > clear_ad(struct asr_result *ar)
> > {
> >     struct asr_dns_header h;
> >
> >     memmove(&h, ar->ar_data, sizeof(h));
> >     h.flags = ntohs(h.flags);
> >     h.flags &= ~(AD_MASK);
> >     h.flags = htons(h.flags);
> >     memmove(ar->ar_data, &h, sizeof(h));
> > }
> >
> 
> memcpy obviously, I was distracted by the copious amount of memmove in
> asr code...

It is not needed to copy the "whole" header just to change the flags.
You could just copy out, modify and copy back the flags field only.

otoh, it's just 12 bytes, so no big deal.

-Otto



Re: asr(3): strip AD flag in responses

2021-11-20 Thread Otto Moerbeek
On Sat, Nov 20, 2021 at 02:40:59PM +0100, Otto Moerbeek wrote:

> On Sat, Nov 20, 2021 at 12:20:32PM +0100, Florian Obser wrote:
> 
> > The Authentic Data (AD) flag indicates that the nameserver validated
> > the response using DNSSEC. For clients to trust this the nameserver
> > and the path to the nameserver must be trusted. In the general case
> > this is not true.
> > 
> > We can trust localhost so we set the AD flag on queries to request
> > validation and preserve the AD flag in answers. (*)
> > 
> > If, and only if, trusted nameservers (that are not on localhost) have
> > been added to resolv.conf and the path to them is secure the trust-ad
> > flag may be used to request validation from them and trust answers with
> > the AD flag set.
> > 
> > The trust-ad option first appeared in glibc 2.31.
> > ( https://gnutoolchain-gerrit.osci.io/r/c/glibc/+/461 and
> > https://man7.org/linux/man-pages/man5/resolv.conf.5.html )
> > 
> > Thomas Habets (thomas at habets.se) pointed out on bugs@ that
> > VerifyHostKeyDNS in ssh only works with unwind (which is good) but
> > only by accident (which is bad).
> > https://marc.info/?t=16371749593=1=2
> > 
> > *) This is for people running unwind, unbound or some other validating
> > resolver on localhost. Yes, it is possible that someone set up some sort
> > of forwarder where they trust the DNS answers but not that they are
> > DNSSEC validated. This feels contrived and a case of DON'T DO THAT!
> > 
> > OK?
> 
> I like this much better than the sketch I posted on bugs@
> 
> Two comments wrt the docs inline.
> 
> Code looks and tests good.
> 
>   -Otto
> 
> > 
> > diff --git include/resolv.h include/resolv.h
> > index fb02483871e..2422deb5484 100644
> > --- include/resolv.h
> > +++ include/resolv.h
> > @@ -191,6 +191,7 @@ struct __res_state_ext {
> >  /* DNSSEC extensions: use higher bit to avoid conflict with ISC use */
> >  #define RES_USE_DNSSEC  0x2000  /* use DNSSEC using OK bit in OPT */
> >  #define RES_USE_CD      0x1000  /* set Checking Disabled flag */
> > +#define RES_TRUSTAD     0x8000  /* Request AD, keep it in responses. */
> >  
> >  #define RES_DEFAULT (RES_RECURSE | RES_DEFNAMES | RES_DNSRCH)
> >  
> > diff --git lib/libc/asr/asr.c lib/libc/asr/asr.c
> > index 8bcb61b6000..77bc3854420 100644
> > --- lib/libc/asr/asr.c
> > +++ lib/libc/asr/asr.c
> > @@ -661,7 +661,8 @@ pass0(char **tok, int n, struct asr_ctx *ac)
> > d = strtonum(tok[i] + 6, 1, 16, &e);
> > if (e == NULL)
> > ac->ac_ndots = d;
> > -   }
> > +   } else if (!strcmp(tok[i], "trust-ad"))
> > +   ac->ac_options |= RES_TRUSTAD;
> > }
> > }
> >  }
> > @@ -672,7 +673,10 @@ pass0(char **tok, int n, struct asr_ctx *ac)
> >  static int
> >  asr_ctx_from_string(struct asr_ctx *ac, const char *str)
> >  {
> > -   char buf[512], *ch;
> > +   struct sockaddr_in6 *sin6;
> > +   struct sockaddr_in  *sin;
> > +   int  i, trustad;
> > +   char buf[512], *ch;
> >  
> > asr_ctx_parse(ac, str);
> >  
> > @@ -702,6 +706,27 @@ asr_ctx_from_string(struct asr_ctx *ac, const char *str)
> > break;
> > }
> >  
> > +   trustad = 1;
> > +   for (i = 0; i < ac->ac_nscount && trustad; i++) {
> > +   switch (ac->ac_ns[i]->sa_family) {
> > +   case AF_INET:
> > +   sin = (struct sockaddr_in *)ac->ac_ns[i];
> > +   if (sin->sin_addr.s_addr != htonl(INADDR_LOOPBACK))
> > +   trustad = 0;
> > +   break;
> > +   case AF_INET6:
> > +   sin6 = (struct sockaddr_in6 *)ac->ac_ns[i];
> > +   if (!IN6_IS_ADDR_LOOPBACK(&sin6->sin6_addr))
> > +   trustad = 0;
> > +   break;
> > +   default:
> > +   trustad = 0;
> > +   break;
> > +   }
> > +   }
> > +   if (trustad)
> > +   ac->ac_options |= RES_TRUSTAD;
> > +
> > return (0);
> >  }
> >  
> > diff --git lib/libc/asr/getrrsetbyname_async.c 
> > lib

Re: asr(3): strip AD flag in responses

2021-11-20 Thread Otto Moerbeek
On Sat, Nov 20, 2021 at 12:20:32PM +0100, Florian Obser wrote:

> The Authentic Data (AD) flag indicates that the nameserver validated
> the response using DNSSEC. For clients to trust this the nameserver
> and the path to the nameserver must be trusted. In the general case
> this is not true.
> 
> We can trust localhost so we set the AD flag on queries to request
> validation and preserve the AD flag in answers. (*)
> 
> If, and only if, trusted nameservers (that are not on localhost) have
> been added to resolv.conf and the path to them is secure the trust-ad
> flag may be used to request validation from them and trust answers with
> the AD flag set.
> 
> The trust-ad option first appeared in glibc 2.31.
> ( https://gnutoolchain-gerrit.osci.io/r/c/glibc/+/461 and
> https://man7.org/linux/man-pages/man5/resolv.conf.5.html )
> 
> Thomas Habets (thomas at habets.se) pointed out on bugs@ that
> VerifyHostKeyDNS in ssh only works with unwind (which is good) but
> only by accident (which is bad).
> https://marc.info/?t=16371749593=1=2
> 
> *) This is for people running unwind, unbound or some other validating
> resolver on localhost. Yes, it is possible that someone set up some sort
> of forwarder where they trust the DNS answers but not that they are
> DNSSEC validated. This feels contrived and a case of DON'T DO THAT!
> 
> OK?

I like this much better than the sketch I posted on bugs@

Two comments wrt the docs inline.

Code looks and tests good.

-Otto

> 
> diff --git include/resolv.h include/resolv.h
> index fb02483871e..2422deb5484 100644
> --- include/resolv.h
> +++ include/resolv.h
> @@ -191,6 +191,7 @@ struct __res_state_ext {
>  /* DNSSEC extensions: use higher bit to avoid conflict with ISC use */
>  #define  RES_USE_DNSSEC  0x2000  /* use DNSSEC using OK bit in OPT */
>  #define  RES_USE_CD  0x1000  /* set Checking Disabled flag */
> +#define  RES_TRUSTAD 0x8000  /* Request AD, keep it in responses. */
>  
>  #define RES_DEFAULT  (RES_RECURSE | RES_DEFNAMES | RES_DNSRCH)
>  
> diff --git lib/libc/asr/asr.c lib/libc/asr/asr.c
> index 8bcb61b6000..77bc3854420 100644
> --- lib/libc/asr/asr.c
> +++ lib/libc/asr/asr.c
> @@ -661,7 +661,8 @@ pass0(char **tok, int n, struct asr_ctx *ac)
>   d = strtonum(tok[i] + 6, 1, 16, &e);
>   if (e == NULL)
>   ac->ac_ndots = d;
> - }
> + } else if (!strcmp(tok[i], "trust-ad"))
> + ac->ac_options |= RES_TRUSTAD;
>   }
>   }
>  }
> @@ -672,7 +673,10 @@ pass0(char **tok, int n, struct asr_ctx *ac)
>  static int
>  asr_ctx_from_string(struct asr_ctx *ac, const char *str)
>  {
> - char buf[512], *ch;
> + struct sockaddr_in6 *sin6;
> + struct sockaddr_in  *sin;
> + int  i, trustad;
> + char buf[512], *ch;
>  
>   asr_ctx_parse(ac, str);
>  
> @@ -702,6 +706,27 @@ asr_ctx_from_string(struct asr_ctx *ac, const char *str)
>   break;
>   }
>  
> + trustad = 1;
> + for (i = 0; i < ac->ac_nscount && trustad; i++) {
> + switch (ac->ac_ns[i]->sa_family) {
> + case AF_INET:
> + sin = (struct sockaddr_in *)ac->ac_ns[i];
> + if (sin->sin_addr.s_addr != htonl(INADDR_LOOPBACK))
> + trustad = 0;
> + break;
> + case AF_INET6:
> + sin6 = (struct sockaddr_in6 *)ac->ac_ns[i];
> + if (!IN6_IS_ADDR_LOOPBACK(&sin6->sin6_addr))
> + trustad = 0;
> + break;
> + default:
> + trustad = 0;
> + break;
> + }
> + }
> + if (trustad)
> + ac->ac_options |= RES_TRUSTAD;
> +
>   return (0);
>  }
>  
> diff --git lib/libc/asr/getrrsetbyname_async.c lib/libc/asr/getrrsetbyname_async.c
> index e5e7c23c261..06a998b0381 100644
> --- lib/libc/asr/getrrsetbyname_async.c
> +++ lib/libc/asr/getrrsetbyname_async.c
> @@ -32,7 +32,7 @@
>  #include "asr_private.h"
>  
>  static int getrrsetbyname_async_run(struct asr_query *, struct asr_result *);
> -static void get_response(struct asr_result *, const char *, int);
> +static void get_response(struct asr_result *, const char *, int, int);
>  
>  struct asr_query *
>  getrrsetbyname_async(const char *hostname, unsigned int rdclass,
> @@ -150,7 +150,8 @@ getrrsetbyname_async_run(struct asr_query *as, struct asr_result *ar)
>   break;
>   }
>  
> - get_response(ar, ar->ar_data, ar->ar_datalen);
> + get_response(ar, ar->ar_data, ar->ar_datalen,
> + as->as_ctx->ac_options & RES_TRUSTAD);
>   free(ar->ar_data);
>   
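
The loopback test in the diff above can be read as a standalone
predicate. The following is only a sketch under that reading: the
function name all_nameservers_loopback and its array-of-pointers
argument are assumptions for illustration, not the asr interface. As in
the diff, only 127.0.0.1 and ::1 count as trusted, and an empty list
yields 1.

```c
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>	/* htonl */
#include <stddef.h>

/*
 * Return 1 only if every nameserver address is 127.0.0.1 or ::1,
 * 0 as soon as one address is anything else (including unknown
 * address families). Hypothetical helper, mirroring the loop above.
 */
static int
all_nameservers_loopback(struct sockaddr *const *ns, size_t count)
{
	const struct sockaddr_in	*sin;
	const struct sockaddr_in6	*sin6;
	size_t				 i;

	for (i = 0; i < count; i++) {
		switch (ns[i]->sa_family) {
		case AF_INET:
			sin = (const struct sockaddr_in *)ns[i];
			if (sin->sin_addr.s_addr != htonl(INADDR_LOOPBACK))
				return 0;
			break;
		case AF_INET6:
			sin6 = (const struct sockaddr_in6 *)ns[i];
			if (!IN6_IS_ADDR_LOOPBACK(&sin6->sin6_addr))
				return 0;
			break;
		default:
			return 0;
		}
	}
	return 1;
}
```

A caller would set its equivalent of RES_TRUSTAD when this returns 1 and
otherwise rely on an explicit trust-ad option.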

Re: [PATCH] Change maximum size of /usr/src to 3G for autoinstall

2021-11-07 Thread Otto Moerbeek
On Sun, Nov 07, 2021 at 07:44:57PM +0300, Mikhail wrote:

> On Sat, Oct 30, 2021 at 11:39:54AM +0300, Mikhail wrote:
> > On Sun, Oct 24, 2021 at 02:17:25PM +0300, Mikhail wrote:
> > > On Sun, Oct 24, 2021 at 11:32:26AM +0100, Stuart Henderson wrote:
> > > > The minimum needs to go up too, a cvs checkout is 1.3G already.
> > > > 
> > > > (Not that I use auto defaults without changes anyway, they don't
> > > > work too well for ports dev..)
> > > 
> > > Changed minimum to 1.5G.
> > 
> > Weekly friendly ping. Comments, objections, feedback?
> > 
> > Maybe someone has another opinion on max (3G) and min (1.5G) values?
> > 
> > I think bumping them makes sense, since more and more users use git.
> 
> Last ping; maybe an interested committer has appeared this week.
> 

I'll take a look. Remind me if I forget. Sorry for the delay.

-Otto



Re: Missing semicolon in snmpd/parse.y

2021-10-20 Thread Otto Moerbeek
On Wed, Oct 20, 2021 at 01:58:03PM +0200, Gerhard Roth wrote:

> Hi,
> 
> the rule for 'listen_udptcp' is missing a semicolon at its end.
> 
> I have no idea what yacc does to the following 'port' rule without
> that semicolon.

Looks like the generated C code is the same;

ok otto@

-Otto

> 
> Gerhard
> 
> 
> Index: usr.sbin/snmpd/parse.y
> ===
> RCS file: /cvs/src/usr.sbin/snmpd/parse.y,v
> retrieving revision 1.70
> diff -u -p -u -p -r1.70 parse.y
> --- usr.sbin/snmpd/parse.y 15 Oct 2021 15:01:29 -  1.70
> +++ usr.sbin/snmpd/parse.y 20 Oct 2021 11:45:29 -
> @@ -350,6 +350,7 @@ listen_udptcp : listenproto STRING port 
>   free($2);
>   free($3);
>   }
> + ;
>  
>  port : /* empty */   {
>   $$ = NULL;




Re: Unwind + NSD usage question

2021-09-28 Thread Otto Moerbeek
On Mon, Sep 27, 2021 at 08:50:06PM -0400, abyx...@mnetic.ch wrote:

> Hello, trying to set up unwind with nsd on the same machine serving a 
> internal domain (home.arpa) with all my machines being part of that domain, 
> eg router.home.arpa. If I point dig at my nsd instance (dig @127.0.0.1 -p 
> 10053 router.home.arpa. A) I see my subdomains in the zone all being returned 
> (router.home.arpa. -> 10.0.0.1). If I set nsd as a forwarder in unwind.conf 
> (forwarder 127.0.0.1 port 10053) though, things get weird. My ISP doesn't 
> return any results for home.arpa but some other servers (quad9 and 
> cloudfare?) return a blackhole address pointing to prisoner.iana.org. If I 
> limit unwind to preference {forwarder recursor} I now get my local nsd 
> results for my domains as expected. If I comment out the preference line, 
> unwind eventually learns a server that will answer to home.arpa with the 
> blackhole prisoner.iana.org address (at least a minute in, sometimes longer, 
> makes testing difficult). The use of force forwarder {home.arpa} and force 
> accept bogus forwarder {home.arpa} don't appear to have any effect at all. 
> (Full configs and dmesg below). 
> 

> I dug through the code a bit, if I'm following it correctly in
> sbin/unwind/resolver.c:check_resolver_done, nsd seems to be returning
> a SERVFAIL and being marked dead (as confirmed with unwindctl status).
> I am not sure I followed the code correctly at this point, but being
> set to DEAD and/or returning a SERVFAIL seems to preempt the use of
> force accept bogus. I am not sure what test unwind/libunbound are
> doing to check the health status of the different resolvers but I have
> yet to see my nsd forwarder not marked as "dead" in unwindctl status.
> Any ideas on how to debug this? This happens on both 6.9 and -current.
> The -current dmesg is posted below. 

(Please wrap your lines).

Your issue might be that an NSD instance does not work as a forwarding
target, since it is not a recursive resolver. unwind expects
forwarders to be able to resolve the whole DNS tree, even if they are
marked to be used for a subtree only.

I have a similar setup, but I am forwarding to a recursive resolver
that is authoritative for my local private domain. Any resolver I know
has that capability, e.g. with unbound you would use local-zone.

-Otto
> 
> 
> 
> ---
> router# cat /etc/unwind.conf  
>  
> forwarder {
> 127.0.0.1 port 10053
> }
> 
> force accept bogus forwarder { home.arpa }
> #force autoconf { home.arpa }
> preference { forwarder recursor }
> #preference { recursor DoT forwarder }
> ---
> 
> 
> ---
> router# cat /var/nsd/etc/nsd.conf 
>  
> # $OpenBSD: nsd.conf,v 1.13 2018/08/16 17:59:12 florian Exp $
> 
> server:
> hide-version: yes
> verbosity: 1
> database: "" # disable database
> 
> ## bind to a specific address/port
> ip-address: 127.0.0.1@10053
> 
> ## make packets as small as possible, on by default
> #   minimal-responses: yes
> 
> ## respond with truncation for ANY queries over UDP and allow ANY over TCP,
> ## on by default
> #   refuse-any: yes
> 
> remote-control:
> control-enable: yes
> control-interface: /var/run/nsd.sock
> 
> zone:
> name: "home.arpa."
> zonefile: "master/home.arpa"
> ---
> 
> 
> ---
> router# unwindctl status  
>  
> 1. recursor validating,  30ms   2. forwarder dead,  15ms
> 
>   histograms: lifetime[ms], decaying[ms]
>  <10   <20   <40   <60   <80  <100  <200  <400  <600  <800 <1000 >
>   rec   1634  1008  1014   619   292   339   973   667   15626 7 1
>   1614 8 6 1 3 6 5 0 0 0 0
>  forw   223886 0 0 0 0 0 0 0 0 0 0
>   19 0 0 0 0 0 0 0 0 0 0 0
> ---
> 
> 
> ---
> router# dig @127.0.0.1 home.arpa. A
> 
> ; <<>> dig 9.10.8-P1 <<>> @127.0.0.1 home.arpa. A
> ; (1 server found)
> ;; global options: +cmd
> ;; Got answer:
> ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 41102
> ;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0
> 
> ;; QUESTION SECTION:
> ;home.arpa. IN  A
> 
> ;; ANSWER SECTION:
> home.arpa.  413 IN  A   10.0.0.1
> 
> ;; Query time: 62 msec
> ;; SERVER: 127.0.0.1#53(127.0.0.1)
> ;; WHEN: Mon Sep 27 20:46:38 EDT 2021
> ;; MSG SIZE  rcvd: 43
> ---
> 
> 
> ---
> router# dig @9.9.9.9 home.arpa. A   
> 
> ; <<>> dig 9.10.8-P1 <<>> @9.9.9.9 home.arpa. A
> ; (1 server found)
> ;; global options: +cmd
> ;; Got answer:
> ;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 53702
> ;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 1
> 
> ;; OPT PSEUDOSECTION:
> ; EDNS: version: 0, flags:; 

Re: libedit: stop ignoring SIGINT

2021-08-09 Thread Otto Moerbeek
On Mon, Aug 09, 2021 at 07:20:31AM -0600, Theo de Raadt wrote:

> Ingo Schwarze  wrote:
> 
> > as mentioned earlier, deraadt@ reported that sftp(1) ignores Ctrl-C.
> > Fixing that without longjmp(3) requires making editline(3) better
> > behaved.
> 
> If a library interface encourages use longjmp(), that library should be
> thrown into the dustbin of history.  Our src tree has very few longjmp
> these days.  Thank you for the effort to discourage addition of more.
> 
> > The following patch causes el_gets(3) and el_wgets(3) to return
> > failure when read(2)ing from the terminal is interrupted by a
> > signal other than SIGCONT or SIGWINCH.  That allows the calling
> > program to decide what to do, usually either exit the program or
> > provide a fresh prompt to the user.
> 
> Looks good.
> 
> >  * bc(1)
> >It behaves well with the patch: Ctrl-C discards the current
> >input line and starts a new input line.
> >The reason why this already works even without the patch
> >is that bc(1) does very scary stuff inside the signal handler:
> >pass a file-global EditLine pointer on the heap to el_line(3)
> >and access fields inside the returned struct.  Needless to
> >say that no signal handler should do such things...
> 
> Otto -- if you are short of time, at minimum mark the variable usage
> line with /* XXX signal race */ as we have done throughout the tree.  I
> would encourage anyone who sees such problems inside any signal handler
> to show such comment-adding diffs.  If these problems are documented,
> they can be fixed eventually, usually through event-loop logic, but I'll
> admit many of the comments are in non-event-loop programs.

Added the comment. Don't know what I was thinking when I did that change.

-Otto

> 
> >  * ftp(1)
> >It behaves well with the patch: Ctrl-C discards the current
> >input line and provides a fresh prompt.
> >The reason why it already works without the patch is that ftp(1)
> >uses setjmp(3)/longjmp(3) to forcefully grab back control
> >from el_gets(3) without waiting for it to return.
> 
> Horrible isn't it.
> 
> >  * sftp(1)
> >Behaviour is improved with the patch: Ctrl-C now exits sftp(1).
> >If desired, i can supply a very simple follow-up patch to sftp.c
> >to instead behave like ftp(1) and bc(1), i.e. discard the
> >current input line and provide a fresh prompt.
> >Neither doing undue work in the signal handler nor longjmp(3)
> >will be required for that (if this patch gets committed).
> 
> I suspect dtucker will want to decide on the interactive behaviour,
> but what you describe sounds right.
> 
> > Also note that deraadt@ pointed out in private mail to me that the
> > fact that read__fixio() clears FIONBIO is probably a symptom of
> > botched editline(3) API design.  That might be worth fixing, too,
> > as far as that is feasible, but it is unrelated to the sftp(1)
> > Ctrl-C issue; let's address one topic at a time.
> 
> I mentioned rarely having seen a good outcome from code mixing any of
> the 3: FIONBIO, FIONREAD, and select/poll.  Often the non-blocking was
> added to select/poll code to hide some sort of bug, or the select/poll
> code was added amateurishly to older code without removing the FIONBIO.
> There are a few situations you need both approaches mixed, but it isn't
> the general case, and thus FIONBIO has a "polled IO" smell for me.
> 



Re: [CAN IGNORE] Proposal for new bc(1) and dc(1)

2021-07-31 Thread Otto Moerbeek
On Fri, Jul 30, 2021 at 09:54:27AM -0600, Gavin Howard wrote:

> Whoops; I thought Theo would make the decision, and his last email made
> me think he might have.
> 
> I am happy to help as much as I can to make the process easy for you.
> 
> In the meantime, I think I will release 5.0.0 when it's ready. I'll take
> into account your feedback in a future release.
> 
> Gavin Howard
> 

It's mostly a question of finding time. 

As for the decision to import itself, Theo listed important drawbacks
of importing something from upstream into our base. The advantages of
doing that must be very big to outweigh the disadvantages. Given that
the current dc and bc are quite ok (if I may say so myself), the chances
of your code getting into base are quite slim indeed.

-Otto



Re: [CAN IGNORE] Proposal for new bc(1) and dc(1)

2021-07-30 Thread Otto Moerbeek
On Thu, Jul 29, 2021 at 10:31:34PM -0600, Gavin Howard wrote:

> Hello,
> 
> At this point, because of the lack of reply, I am going to assume that
> my proposal is rejected. While I am sad, I understand.
> 
> Thank you for taking the time to consider my proposal.
> 
> Gavin Howard
> 

I just did not find the time to look at it. Sorry for that. I still
might one day.

-Otto



Re: [CAN IGNORE] Proposal for new bc(1) and dc(1)

2021-06-17 Thread Otto Moerbeek
On Thu, Jun 17, 2021 at 10:01:02AM -0600, Gavin Howard wrote:

> Otto,
> 
> > I think it is interesting. As for the incompatibilities, my memory says
> > I followed the original dc and bc when deciding on those (GNU chose to
> > differ in these cases). But it has been 18 years since I wrote the
> > current versions, so I might misremember.
> 
> I think that makes sense to me. Unfortunately, when I was building my
> dc, I couldn't find any mention in the OpenBSD man pages, which I used
> to ensure as much compatibility as I could, that arrays and registers
> were not separate. Well, there was one (the `;` command mentions
> registers, but the `:` command does not, so I thought that was a typo).
> 
> Regarding the 0 having length 0 or 1, that was a decision I agonized
> over. My dad, who is a mathematician, said that it could go either way.
> Unfortunately for me, if this is a showstopper incompatibility (and it
> might be based on how the test suite uses `length()` and `Z`), I do
> think I would keep it as it is and accept that OpenBSD will not want my
> bc(1) and dc(1).

It looks like GNU dc and bc do not agree:

$ dc -V
dc (GNU bc 1.06) 1.3

Copyright 1994, 1997, 1998, 2000 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE,
to the extent permitted by law.
iMac:~ otto$ dc
0Zp
0

and 

$ bc
bc 1.06
Copyright 1991-1994, 1997, 1998, 2000 Free Software Foundation, Inc.
This is free software with ABSOLUTELY NO WARRANTY.
For details type `warranty'. 
length(0)
1

I confirmed that the original dc by Morris and Cherry does indeed print
0 for the above test case.

-Otto
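
To make the two conventions concrete, here is a small sketch restricted
to non-negative integers. decimal_length and its zero_len parameter are
invented for this illustration and are not taken from any bc or dc
source; zero_len picks what the length of 0 is, namely 0 as printed by
the dc runs above or 1 as printed by GNU bc's length(0).

```c
#include <stdint.h>

/*
 * Count the decimal digits of n. For n == 0 the answer is a matter of
 * convention, so the caller supplies it in zero_len (0 or 1).
 */
static unsigned int
decimal_length(uint64_t n, unsigned int zero_len)
{
	unsigned int digits = 0;

	if (n == 0)
		return zero_len;

	/* Strip one decimal digit per iteration. */
	while (n != 0) {
		digits++;
		n /= 10;
	}
	return digits;
}
```

Every non-zero input gives the same result under both conventions; only
the value 0 distinguishes them, which is why so few regress tests are
affected.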
> 
> > As for moving to your version, I have no opinion yet. I have some
> > attachment to the current code, but not so strong that I am opposing
> > replacement upfront. OTOH the current implementation has been almost
> > maintenance free for many years already. So I dunno.
> 
> You have a right to have attachment to it; I have attachment to mine!
> 
> In fact, I was pleasantly surprised at how clean and readable your code
> was. I usually struggle to read code written by others, but I could
> easily read yours.
> 
> On that note, since last night, I thought of more disadvantages of
> moving to my bc and dc, which I feel I must mention.
> 
> More disadvantages:
> 
> * The current dc(1) and bc(1) are from a known member of the OpenBSD
>   community with many contributions. I am an unknown quantity.
> * The current dc(1) and bc(1) do not have ugly portability code that
>   OpenBSD probably doesn't care about.
> * The current dc(1) and bc(1) do not have ugly code to support build
>   options that OpenBSD does not care about.
> * The binary size of the OpenBSD dc(1) and bc(1) combined are 78% the
>   size of mine combined (on amd64). The size of OpenBSD combined is
>   145440, and the size of mine combined are 185706.
> * The current dc(1) and bc(1) have much less source code and have been
>   nearly maintenance-free for many years. Mine were started in 2018 and
>   do not have as long of a track record for being low maintenance.
> 
> > I'll take a look at your code soon and maybe other devs have opinions.
> 
> Thank you very much!
> 
> Gavin Howard
> 



Re: [CAN IGNORE] Proposal for new bc(1) and dc(1)

2021-06-17 Thread Otto Moerbeek
On Wed, Jun 16, 2021 at 11:40:08PM -0600, Gavin Howard wrote:

> Hello,
> 
> My name is Gavin Howard. I have developed a new bc(1) and dc(1)
> implementation. [0]
> 
> I propose replacing the current implementations with mine.
> 
> Advantages:
> 
> * Performance. [1]
> * Extensions for ease of use.
> * With build options to remove most extensions, if desired.
> * Compatible with GNU bc.
> * Already used by default in FreeBSD.
> * Fuzzed thoroughly.
> * No exec needed for bc(1) (both programs are contained in the same
>   multi-call binary).
> 
> Expectations met:
> 
> * Already uses pledge(2) and unveil(2).
> * No dependencies beyond C99, POSIX `make`, and POSIX `sh`.
> * This includes no dependency on editline(3), even though my bc(1)
>   and dc(1) have a history implementation.
> * Thorough test suite.
> * Comprehensive man pages.
> * Locale support.
> 
> Disadvantages:
> 
> * There are incompatibilities with the current bc(1) and dc(1), which
>   are listed below. All users would need to be made aware of these
>   incompatibilities, so they can update scripts, and scripts in `src/`
>   will also need to be updated.
> 
> Incompatibilities (failing tests from `regress/usr.bin`):
> 
> 1. Current bc(1) and dc(1) return 0 for length(a) where a is 0. Mine
>return 1. This causes my dc(1) to fail `dc/t1.in` and `dc/t28.in`.
> 2. Current dc(1) implements arrays as part of registers. Mine keeps
>arrays and registers separate. This causes my dc(1) to fail
>`dc/t1.in` and `dc/t8.in`.
> 3. Current dc(1) does not print a `nul` byte if given the `P` or `a`
>commands with 0 on the top of the stack. My dc(1) does (because it
>considers 0 to have one digit, see #1). This causes my dc(1) to fail
>`dc/t3.in` and `dc/t13.in`.
> 4. Current dc(1) starts with empty registers, and allows the user to pop
>all items off the register stack. My dc(1) starts with a single item
>in the register and does not allow the user to remove it.
> 5. Current dc(1) will push an item onto a register stack for the `s`
>command. My dc(1) does not since one already exists. This, combined
>with #4, causes my dc(1) to fail `dc/t5.in`
> 6. Current bc(1) and dc(1) have a larger maximum `obase` than mine. This
>causes my dc(1) to fail `dc/t9.in`.
> 7. Current dc(1) does not reset on errors. My dc(1) does, so it fails
>`dc/t12.in`.
> 8. Current dc(1) allows register names with any character. My dc(1)
>requires non-control characters and has a different way of doing
>extended registers. This causes my dc(1) to fail `dc/t15.in`,
>`dc/t16.in`, `dc/t19.in`, `dc/t21.in`, and `dc/t23.in`.
> 9. Current bc(1) is a frontend to dc(1). Mine are combined into the same
>binary and generate and run bytecode. This means that my bc(1) fails
>all of the bc(1) regression tests (which generate dc(1) code) and
>does not have the `-c` option.
> 
> Notes:
> 
> My dc(1) also fails `dc/t10.in` because it doesn't have the `!` command,
> but https://github.com/openbsd/src/commit/dc405aa075 makes it appear as
> though the current dc(1) does not have the `!` command either.
> 
> In https://youtu.be/gvmGfpMgny4?t=1277 , Bob Beck said that unveil(2)
> must not be used on command-line arguments, so I use unveil(2) after all
> command-line files are executed.
> 
> Current version is 4.0.2. I am planning to release version 4.1.0 soon,
> but held off in case you are interested and had feedback that might
> help.
> 
> I am willing to help maintain them if they are put into OpenBSD, but I
> am also willing to pass them off to whoever you wish, should you wish to
> do so.
> 
> I do have a mirror on GitHub.
> 
> If you are not interested, feel free to ignore this email.
> 
> Regardless, thank you for your time.
> 
> Gavin Howard
> 
> [0]: https://git.yzena.com/gavin/bc
> [1]: https://git.yzena.com/gavin/bc/src/branch/master/manuals/benchmarks.md
> 

I think it is interesting. As for the incompatibilities, my memory says
I followed the original dc and bc when deciding on those (GNU chose to
differ in these cases). But it has been 18 years since I wrote the
current versions, so I might misremember.

As for moving to your version, I have no opinion yet. I have some
attachment to the current code, but not so strong that I am opposing
replacement upfront. OTOH the current implementation has been almost
maintenance free for many years already. So I dunno.

I'll take a look at your code soon and maybe other devs have opinions.

-Otto



Re: Ryzen 5800X hw.setperf vs hw.cpuspeed

2021-06-01 Thread Otto Moerbeek
On Mon, May 31, 2021 at 10:24:01PM +0200, Josh wrote:

> thanks Otto for the dmesg.
> 
> I'd like to get one B550 mobo as well. Which version of Gigabyte B550
> AORUS ELITE do you have exactly? ATX? mATX ?
> Most of them listed here[1] have either RTL8118 or RTL8125 chipset and
> re(4) doesn't list them...
> 
> Can't find any reference to your model there[1] (RTL8168 chipset) "re0
> at pci5 dev 0 function 0 "Realtek 8168" rev 0x06: RTL8168E/8111E
> (0x2c00), msi, address 64:70:02:01:db:3c
> rgephy0 at re0 phy 7: RTL8169S/8110S/8211 PHY, rev. 4"
> 
> Could it be this one[2]?

Yes, that's the mb.

The re(4) is a random PCI Express card I plugged in initially because
the on-board rge(4) did not work properly (interrupt storm and no WOL).
Both issues are fixed now.

I'm not using power management atm.

-Otto
> 
> Cheers
> 
> [1] https://www.gigabyte.com/Motherboard/AORUS-Gaming
> [2] https://www.gigabyte.com/Motherboard/B550-AORUS-ELITE-rev-10/sp#sp
> 
> On Fri, Nov 20, 2020 at 9:28 AM Otto Moerbeek  wrote:
> >
> > Hi,
> >
> > I got a new Ryzen machine, dmesg below. What I'm observing might be an
> > issue with hw.setperf.
> >
> > On startup it shows:
> >
> > hw.cpuspeed=3800
> > hw.setperf=100
> >
> > If I lower hw.setperf to zero, the new state is reflected immediately in
> > hw.cpuspeed:
> >
> > hw.cpuspeed=2200
> > hw.setperf=0
> >
> > And also sha256 -t becomes slower as expected.
> >
> > But if I raise hw.setperf to 100 I'm seeing:
> >
> > hw.cpuspeed=2200
> > hw.setperf=100
> >
> > and sha256 -t is still slow. Only after some time passes (let's say a
> > couple of tens of seconds) does it show:
> >
> > hw.cpuspeed=3800
> > hw.setperf=100
> >
> > and sha256 -t is fast again.
> >
> > This behaviour is different from my old machine, where setting
> > hw.setperf was reflected in hw.cpuspeed immediately, both ways.
> >
> > Any clue?
> >
> > -Otto
> >
> > OpenBSD 6.8-current (GENERIC.MP) #1: Thu Nov 19 21:01:06 CET 2020
> > o...@lou.intra.drijf.net:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> > real mem = 34286964736 (32698MB)
> > avail mem = 33232543744 (31693MB)
> > random: good seed from bootblocks
> > mpath0 at root
> > scsibus0 at mpath0: 256 targets
> > mainbus0 at root
> > bios0 at mainbus0: SMBIOS rev. 3.3 @ 0xe8d60 (55 entries)
> > bios0: vendor American Megatrends Inc. version "F11d" date 10/29/2020
> > bios0: Gigabyte Technology Co., Ltd. B550 AORUS ELITE
> > acpi0 at bios0: ACPI 6.0
> > acpi0: sleep states S0 S3 S4 S5
> > acpi0: tables DSDT FACP SSDT SSDT SSDT SSDT FIDT MCFG HPET BGRT IVRS PCCT 
> > SSDT CRAT CDIT SSDT SSDT SSDT SSDT WSMT APIC SSDT SSDT SSDT FPDT
> > acpi0: wakeup devices GPP0(S4) GP12(S4) GP13(S4) XHC0(S4) GP30(S4) GP31(S4) 
> > GPP2(S4) GPP3(S4) GPP8(S4) GPP1(S4)
> > acpitimer0 at acpi0: 3579545 Hz, 32 bits
> > acpimcfg0 at acpi0
> > acpimcfg0: addr 0xf000, bus 0-127
> > acpihpet0 at acpi0: 14318180 Hz
> > acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
> > cpu0 at mainbus0: apid 0 (boot processor)
> > cpu0: AMD Ryzen 7 5800X 8-Core Processor, 3793.35 MHz, 19-21-00
> > cpu0: 
> > FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,HTT,SSE3,PCLMUL,MWAIT,SSSE3,FMA3,CX16,SSE4.1,SSE4.2,MOVBE,POPCNT,AES,XSAVE,AVX,F16C,RDRAND,NXE,MMXX,FFXSR,PAGE1GB,RDTSCP,LONG,LAHF,CMPLEG,SVM,EAPICSP,AMCR8,ABM,SSE4A,MASSE,3DNOWP,OSVW,IBS,SKINIT,TCE,TOPEXT,CPCTR,DBKP,PCTRL3,MWAITX,ITSC,FSGSBASE,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,PQM,RDSEED,ADX,SMAP,CLFLUSHOPT,CLWB,SHA,UMIP,PKU,IBPB,IBRS,STIBP,SSBD,XSAVEOPT,XSAVEC,XGETBV1,XSAVES
> > cpu0: 32KB 64b/line 8-way I-cache, 32KB 64b/line 8-way D-cache, 512KB 
> > 64b/line 8-way L2 cache, 32MB 64b/line disabled L3 cache
> > cpu0: ITLB 64 4KB entries fully associative, 64 4MB entries fully 
> > associative
> > cpu0: DTLB 64 4KB entries fully associative, 64 4MB entries fully 
> > associative
> > cpu0: smt 0, core 0, package 0
> > mtrr: Pentium Pro MTRR support, 8 var ranges, 88 fixed ranges
> > cpu0: apic clock running at 99MHz
> > cpu0: mwait min=64, max=64, C-substates=1.1, IBE
> > cpu1 at mainbus0: apid 2 (application processor)
> > cpu1: AMD Ryzen 7 5800X 8-Core Processor, 3792.89 MHz, 19-21-00
> > cpu1: 
> > FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,HTT,SSE3,PCLMUL,MWAIT,SSSE3,FMA3,CX16,SSE4.1,SSE4.2,MOVBE,POPCNT,AES,XSAVE,AVX,F16C,R

Re: pthread_once fix memory leak

2021-05-02 Thread Otto Moerbeek
On Sun, May 02, 2021 at 02:07:21PM +0200, Mark Kettenis wrote:

> > From: Martijn van Duren 
> > Date: Sun, 02 May 2021 13:28:10 +0200
> > 
> > Found this while tracing a memory leak in filter-dkimsign, thanks to
> > libcrypto. The mutex in pthread_once_t is never destroyed, so the
> > memory allocated inside the mutex is never released.
> > 
> > The diff below was inspired by Ed Schouten and switches from mutex to
> > futex to prevent any memory allocation. I've run with it for about a
> > week without issues and tb@ has given it some beating on sparc64.
> > However I'm no expert in this area and scrutiny from people with more
> > experience in this area and testing in general would be appreciated.
> > 
> > This implementation has one shortcoming I can see, namely[0]:
> > The pthread_once() function is not a cancellation point. However, if
> > init_routine is a cancellation point and is canceled, the effect on
> > once_control shall be as if pthread_once() was never called.
> > It doesn't handle this situation by waking up the sleeping threads.
> > However, the current code doesn't handle this requirement either:
> > #include <pthread.h>
> > #include <stdio.h>
> > 
> > pthread_once_t once = PTHREAD_ONCE_INIT;
> > 
> > void
> > init(void)
> > {
> > 	printf("init\n");
> > 	pthread_exit(NULL);
> > }
> > 
> > void *
> > routine(void *arg)
> > {
> > 	pthread_once(&once, init);
> > 	printf("%s\n", __func__);
> > 	return NULL;
> > }
> > 
> > int
> > main(int argc, char *argv[])
> > {
> > 	pthread_t thread;
> > 	pthread_create(&thread, NULL, routine, NULL);
> > 	pthread_once(&once, init);
> > 	printf("%s\n", __func__);
> > 	return 0;
> > }
> > 
> > Since our current code shows similar behaviour without real world
> > problems and all the solutions that I can come up with are racy, I think 
> > this diff can stand on its own and some other brave soul can fix this
> > requirement at a later time. :-)
> > 
> > OK?
> 
> Sorry, no, this is an ABI break.  And a libpthreads major bump is a
> major flag day.
> 
> I don't think this is worth fixing on its own.  There are other
> instances where using a mutex will leak memory.  We need to change the
> mutex implementation such that it doesn't use malloc.  This is needed
> for process shared mutexes too.

Agreed. This is a one-time leak, since once_control must not be on
the stack. So not a big issue. I would love to see malloc-free mutexes
as well.
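
For readers following along, the three-state scheme in the diff quoted below
(0 = not started, 1 = init running, 2 = done) can be sketched in portable C11
roughly like this. It is only an illustration under assumptions: the names
once_sketch, once_sketch_run, once_calls and count_init are made up, and
waiters simply spin with sched_yield() where the diff sleeps with
_twait()/_wake(). Like the diff, it does not handle a cancelled init routine.

```c
#include <assert.h>
#include <sched.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical state names; the diff uses the bare values 0, 1 and 2. */
enum { ONCE_NEW = 0, ONCE_RUNNING = 1, ONCE_DONE = 2 };

/* Hypothetical stand-in for struct pthread_once: one word, no mutex. */
struct once_sketch {
	atomic_uint state;
};

/* Example init routine that counts how often it was run. */
static int once_calls;

static void
count_init(void)
{
	once_calls++;
}

static void
once_sketch_run(struct once_sketch *o, void (*init)(void))
{
	unsigned int expected = ONCE_NEW;

	assert(o != NULL && init != NULL);

	/* Exactly one caller wins the 0 -> 1 transition and runs init. */
	if (atomic_compare_exchange_strong(&o->state, &expected,
	    ONCE_RUNNING)) {
		init();
		atomic_store(&o->state, ONCE_DONE);
		return;
	}

	/* Everybody else waits until the winner has published "done". */
	while (atomic_load(&o->state) != ONCE_DONE)
		sched_yield();
}
```

Since the whole object is a single integer, nothing is allocated and nothing
needs to be destroyed, which is the property the diff is after.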

-Otto

> 
> > Index: include/pthread.h
> > ===
> > RCS file: /cvs/src/include/pthread.h,v
> > retrieving revision 1.4
> > diff -u -p -r1.4 pthread.h
> > --- include/pthread.h   5 Mar 2018 01:15:26 -   1.4
> > +++ include/pthread.h   2 May 2021 11:24:17 -
> > @@ -136,20 +136,13 @@ typedef void  *(*pthread_startroutine_t)(
> >   * Once definitions.
> >   */
> >  struct pthread_once {
> > -   int state;
> > -   pthread_mutex_t mutex;
> > +   volatile unsigned int   state;
> >  };
> >  
> >  /*
> > - * Flags for once initialization.
> > - */
> > -#define PTHREAD_NEEDS_INIT  0
> > -#define PTHREAD_DONE_INIT   1
> > -
> > -/*
> >   * Static once initialization values. 
> >   */
> > -#define PTHREAD_ONCE_INIT   { PTHREAD_NEEDS_INIT, PTHREAD_MUTEX_INITIALIZER }
> > +#define PTHREAD_ONCE_INIT   { 0 }
> >  
> >  /*
> >   * Static initialization values. 
> > Index: lib/libc/thread/rthread_once.c
> > ===
> > RCS file: /cvs/src/lib/libc/thread/rthread_once.c,v
> > retrieving revision 1.3
> > diff -u -p -r1.3 rthread_once.c
> > --- lib/libc/thread/rthread_once.c  4 Nov 2017 22:53:57 -   1.3
> > +++ lib/libc/thread/rthread_once.c  2 May 2021 11:24:17 -
> > @@ -18,15 +18,25 @@
> >  
> >  #include 
> >  
> > +#include "synch.h"
> > +
> >  int
> >  pthread_once(pthread_once_t *once_control, void (*init_routine)(void))
> >  {
> > -   pthread_mutex_lock(&once_control->mutex);
> > -   if (once_control->state == PTHREAD_NEEDS_INIT) {
> > +   switch (atomic_cas_uint(&(once_control->state), 0, 1)) {
> > +   case 0:
> > init_routine();
> > -   once_control->state = PTHREAD_DONE_INIT;
> > +   atomic_inc_int(&once_control->state);
> > +   _wake(&once_control->state, INT_MAX);
> > +   break;
> > +   case 1:
> > +   do {
> > +   _twait(&once_control->state, 1, 0, NULL);
> > +   } while (once_control->state != 2);
> > +   break;
> > +   default:
> > +   break;
> > }
> > -   pthread_mutex_unlock(&once_control->mutex);
> >  
> > -   return (0);
> > +   return 0;
> >  }
> > 
> > 
> > 
> 



Re: malloc vs emacs

2021-04-28 Thread Otto Moerbeek
On Sun, Apr 25, 2021 at 06:41:09PM +0200, Mark Kettenis wrote:

> > Date: Sun, 25 Apr 2021 17:53:31 +0200
> > From: Otto Moerbeek 
> > 
> > Hi,
> > 
> > A local test and jca@ confirm the special casing isn't needed anymore.
> > 
> > Two things:
> > 
> > - This could do with a ports bulk build to find other offenders
> > 
> > - Would this require a libc bump?
> 
> Unless I'm mistaken, this removes the PLT entries for
> malloc/calloc/realloc/free.  That means it will no longer be possible
> to intercept calls to those functions from within libc.  Intercepting
> these calls is what some memory leak detection tools do.  With this
> diff those tools will no longer see allocations made by libc.
> 
> Is that something people care about?

Personally I do not care a lot. Though if people do (nobody reacted
so far), would changing it to weak help?
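
For readers unfamiliar with the mechanism being discussed, here is a rough
sketch of the general idea, not the actual PROTO_NORMAL/DEF_STRONG macros.
It assumes GCC or clang on an ELF platform, and the names sk_alloc,
sk_alloc_internal and sk_internal_user are made up for illustration. A
library exports one public name and gives itself a hidden alias for the same
code; calls through the hidden alias are bound inside the library and never
go through the PLT, so a preloaded tracer that replaces the public name does
not see them.

```c
#include <assert.h>
#include <stdlib.h>

/*
 * Public entry point. In a shared library, outside callers reach this
 * through normal dynamic symbol lookup, so it can be interposed.
 */
void *
sk_alloc(size_t n)
{
	assert(n > 0);	/* sketch only: keep the example away from malloc(0) */
	return malloc(n);
}

/*
 * Hidden alias for the same code. It is not exported, so callers inside
 * the library that use this name bind directly to the local definition.
 */
extern __typeof__(sk_alloc) sk_alloc_internal
    __attribute__((alias("sk_alloc"), visibility("hidden")));

/* Stand-in for an internal caller elsewhere in the same library. */
void *
sk_internal_user(size_t n)
{
	return sk_alloc_internal(n);
}
```

With this layout an interposed sk_alloc would still see calls from the
application, but not the ones made by sk_internal_user.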

(I still have a WIP project to get malloc tracing via a utrace export;
I'm not 100% happy about it, mostly since we do not have a robust way to
get stack traces.)

-Otto

> 
> 
> > Index: hidden/stdlib.h
> > ===
> > RCS file: /cvs/src/lib/libc/hidden/stdlib.h,v
> > retrieving revision 1.16
> > diff -u -p -r1.16 stdlib.h
> > --- hidden/stdlib.h 10 May 2019 15:03:24 -  1.16
> > +++ hidden/stdlib.h 24 Apr 2021 11:12:27 -
> > @@ -54,7 +54,7 @@ PROTO_STD_DEPRECATED(_Exit);
> >  PROTO_DEPRECATED(a64l);
> >  PROTO_NORMAL(abort);
> >  PROTO_NORMAL(abs);
> > -/* PROTO_NORMAL(aligned_alloc) not yet, breaks emacs */
> > +PROTO_NORMAL(aligned_alloc);
> >  PROTO_NORMAL(arc4random);
> >  PROTO_NORMAL(arc4random_buf);
> >  PROTO_NORMAL(arc4random_uniform);
> > @@ -64,7 +64,7 @@ PROTO_NORMAL(atoi);
> >  PROTO_STD_DEPRECATED(atol);
> >  PROTO_STD_DEPRECATED(atoll);
> >  PROTO_STD_DEPRECATED(bsearch);
> > -/*PROTO_NORMAL(calloc);not yet, breaks emacs */
> > +PROTO_NORMAL(calloc);
> >  PROTO_NORMAL(calloc_conceal);
> >  PROTO_NORMAL(cgetcap);
> >  PROTO_NORMAL(cgetclose);
> > @@ -85,7 +85,7 @@ PROTO_DEPRECATED(ecvt);
> >  PROTO_NORMAL(erand48);
> >  PROTO_NORMAL(exit);
> >  PROTO_DEPRECATED(fcvt);
> > -/*PROTO_NORMAL(free);  not yet, breaks emacs */
> > +PROTO_NORMAL(free);
> >  PROTO_NORMAL(freezero);
> >  PROTO_DEPRECATED(gcvt);
> >  PROTO_DEPRECATED(getbsize);
> > @@ -105,7 +105,7 @@ PROTO_DEPRECATED(ldiv);
> >  PROTO_STD_DEPRECATED(llabs);
> >  PROTO_STD_DEPRECATED(lldiv);
> >  PROTO_DEPRECATED(lrand48);
> > -/*PROTO_NORMAL(malloc);not yet, breaks emacs */
> > +PROTO_NORMAL(malloc);
> >  PROTO_NORMAL(malloc_conceal);
> >  PROTO_STD_DEPRECATED(mblen);
> >  PROTO_STD_DEPRECATED(mbstowcs);
> > @@ -119,7 +119,7 @@ PROTO_DEPRECATED(mkstemps);
> >  PROTO_DEPRECATED(mktemp);
> >  PROTO_DEPRECATED(mrand48);
> >  PROTO_DEPRECATED(nrand48);
> > -/*PROTO_NORMAL(posix_memalign);not yet, breaks emacs */
> > +PROTO_NORMAL(posix_memalign);
> >  PROTO_DEPRECATED(posix_openpt);
> >  PROTO_DEPRECATED(ptsname);
> >  PROTO_NORMAL(putenv);
> > @@ -130,7 +130,7 @@ PROTO_DEPRECATED(radixsort);
> >  PROTO_STD_DEPRECATED(rand);
> >  PROTO_NORMAL(rand_r);
> >  PROTO_DEPRECATED(random);
> > -/*PROTO_NORMAL(realloc);   not yet, breaks emacs */
> > +PROTO_NORMAL(realloc);
> >  PROTO_NORMAL(reallocarray);
> >  PROTO_NORMAL(recallocarray);
> >  PROTO_DEPRECATED(realpath);
> > Index: stdlib/malloc.c
> > ===
> > RCS file: /cvs/src/lib/libc/stdlib/malloc.c,v
> > retrieving revision 1.270
> > diff -u -p -r1.270 malloc.c
> > --- stdlib/malloc.c 9 Apr 2021 06:05:21 -   1.270
> > +++ stdlib/malloc.c 24 Apr 2021 11:12:27 -
> > @@ -1284,7 +1284,7 @@ malloc(size_t size)
> > EPILOGUE()
> > return r;
> >  }
> > -/*DEF_STRONG(malloc);*/
> > +DEF_STRONG(malloc);
> >  
> >  void *
> >  malloc_conceal(size_t size)
> > @@ -1472,7 +1472,7 @@ free(void *ptr)
> > _MALLOC_UNLOCK(d->mutex);
> > errno = saved_errno;
> >  }
> > -/*DEF_STRONG(free);*/
> > +DEF_STRONG(free);
> >  
> >  static void
> >  freezero_p(void *ptr, size_t sz)
> > @@ -1695,7 +1695,7 @@ realloc(void *ptr, size_t size)
> > EPILOGUE()
> > return r;
> >  }
> > -/*DEF_STRONG(realloc);*/
> > +DEF_STRONG(realloc);
> >  
> >  /*
> > 

malloc vs emacs

2021-04-25 Thread Otto Moerbeek
Hi,

A local test and jca@ confirm the special casing isn't needed anymore.

Two things:

- This could do with a ports bulk build to find other offenders

- Would this require a libc bump? 

-Otto

Index: hidden/stdlib.h
===
RCS file: /cvs/src/lib/libc/hidden/stdlib.h,v
retrieving revision 1.16
diff -u -p -r1.16 stdlib.h
--- hidden/stdlib.h 10 May 2019 15:03:24 -  1.16
+++ hidden/stdlib.h 24 Apr 2021 11:12:27 -
@@ -54,7 +54,7 @@ PROTO_STD_DEPRECATED(_Exit);
 PROTO_DEPRECATED(a64l);
 PROTO_NORMAL(abort);
 PROTO_NORMAL(abs);
-/* PROTO_NORMAL(aligned_alloc) not yet, breaks emacs */
+PROTO_NORMAL(aligned_alloc);
 PROTO_NORMAL(arc4random);
 PROTO_NORMAL(arc4random_buf);
 PROTO_NORMAL(arc4random_uniform);
@@ -64,7 +64,7 @@ PROTO_NORMAL(atoi);
 PROTO_STD_DEPRECATED(atol);
 PROTO_STD_DEPRECATED(atoll);
 PROTO_STD_DEPRECATED(bsearch);
-/*PROTO_NORMAL(calloc);not yet, breaks emacs */
+PROTO_NORMAL(calloc);
 PROTO_NORMAL(calloc_conceal);
 PROTO_NORMAL(cgetcap);
 PROTO_NORMAL(cgetclose);
@@ -85,7 +85,7 @@ PROTO_DEPRECATED(ecvt);
 PROTO_NORMAL(erand48);
 PROTO_NORMAL(exit);
 PROTO_DEPRECATED(fcvt);
-/*PROTO_NORMAL(free);  not yet, breaks emacs */
+PROTO_NORMAL(free);
 PROTO_NORMAL(freezero);
 PROTO_DEPRECATED(gcvt);
 PROTO_DEPRECATED(getbsize);
@@ -105,7 +105,7 @@ PROTO_DEPRECATED(ldiv);
 PROTO_STD_DEPRECATED(llabs);
 PROTO_STD_DEPRECATED(lldiv);
 PROTO_DEPRECATED(lrand48);
-/*PROTO_NORMAL(malloc);not yet, breaks emacs */
+PROTO_NORMAL(malloc);
 PROTO_NORMAL(malloc_conceal);
 PROTO_STD_DEPRECATED(mblen);
 PROTO_STD_DEPRECATED(mbstowcs);
@@ -119,7 +119,7 @@ PROTO_DEPRECATED(mkstemps);
 PROTO_DEPRECATED(mktemp);
 PROTO_DEPRECATED(mrand48);
 PROTO_DEPRECATED(nrand48);
-/*PROTO_NORMAL(posix_memalign);not yet, breaks emacs */
+PROTO_NORMAL(posix_memalign);
 PROTO_DEPRECATED(posix_openpt);
 PROTO_DEPRECATED(ptsname);
 PROTO_NORMAL(putenv);
@@ -130,7 +130,7 @@ PROTO_DEPRECATED(radixsort);
 PROTO_STD_DEPRECATED(rand);
 PROTO_NORMAL(rand_r);
 PROTO_DEPRECATED(random);
-/*PROTO_NORMAL(realloc);   not yet, breaks emacs */
+PROTO_NORMAL(realloc);
 PROTO_NORMAL(reallocarray);
 PROTO_NORMAL(recallocarray);
 PROTO_DEPRECATED(realpath);
Index: stdlib/malloc.c
===
RCS file: /cvs/src/lib/libc/stdlib/malloc.c,v
retrieving revision 1.270
diff -u -p -r1.270 malloc.c
--- stdlib/malloc.c 9 Apr 2021 06:05:21 -   1.270
+++ stdlib/malloc.c 24 Apr 2021 11:12:27 -
@@ -1284,7 +1284,7 @@ malloc(size_t size)
EPILOGUE()
return r;
 }
-/*DEF_STRONG(malloc);*/
+DEF_STRONG(malloc);
 
 void *
 malloc_conceal(size_t size)
@@ -1472,7 +1472,7 @@ free(void *ptr)
_MALLOC_UNLOCK(d->mutex);
errno = saved_errno;
 }
-/*DEF_STRONG(free);*/
+DEF_STRONG(free);
 
 static void
 freezero_p(void *ptr, size_t sz)
@@ -1695,7 +1695,7 @@ realloc(void *ptr, size_t size)
EPILOGUE()
return r;
 }
-/*DEF_STRONG(realloc);*/
+DEF_STRONG(realloc);
 
 /*
  * This is sqrt(SIZE_MAX+1), as s1*s2 <= SIZE_MAX
@@ -1726,7 +1726,7 @@ calloc(size_t nmemb, size_t size)
EPILOGUE()
return r;
 }
-/*DEF_STRONG(calloc);*/
+DEF_STRONG(calloc);
 
 void *
 calloc_conceal(size_t nmemb, size_t size)
@@ -2036,7 +2036,7 @@ err:
errno = saved_errno;
return res;
 }
-/*DEF_STRONG(posix_memalign);*/
+DEF_STRONG(posix_memalign);
 
 void *
 aligned_alloc(size_t alignment, size_t size)
@@ -2061,7 +2061,7 @@ aligned_alloc(size_t alignment, size_t s
EPILOGUE()
return r;
 }
-/*DEF_STRONG(aligned_alloc);*/
+DEF_STRONG(aligned_alloc);
 
 #ifdef MALLOC_STATS
 



Re: small malloc diff

2021-04-08 Thread Otto Moerbeek
On Fri, Apr 09, 2021 at 07:39:05AM +0200, Theo Buehler wrote:

> On Fri, Apr 09, 2021 at 07:36:35AM +0200, Otto Moerbeek wrote:
> > On Thu, Apr 01, 2021 at 11:23:58AM +0200, Otto Moerbeek wrote:
> > 
> > > Hi,
> > > 
> > > here's a small malloc diff. Most important part is an extra internal
> > > > consistency check. I have been running this for a few weeks already.
> > 
> > ping?
> 
> Been running this since you posted it on several busy boxes.
> 
> ok tb

Thanks, will commit soon.

-Otto

> 
> > 
> > > 
> > >   -Otto
> > > 
> > > Index: stdlib/malloc.3
> > > ===
> > > RCS file: /cvs/src/lib/libc/stdlib/malloc.3,v
> > > retrieving revision 1.127
> > > diff -u -p -r1.127 malloc.3
> > > --- stdlib/malloc.3   25 Feb 2021 15:20:18 -  1.127
> > > +++ stdlib/malloc.3   1 Apr 2021 09:21:59 -
> > > @@ -366,7 +366,8 @@ If a program changes behavior if any of 
> > >  are used,
> > >  it is buggy.
> > >  .Pp
> > > -The default number of free pages cached is 64 per malloc pool.
> > > +The default size of the cache is 64 single page allocations.
> > > +It also caches a number of larger regions.
> > >  Multi-threaded programs use multiple pools.
> > >  .Sh RETURN VALUES
> > >  Upon successful completion, the allocation functions
> > > Index: stdlib/malloc.c
> > > ===
> > > RCS file: /cvs/src/lib/libc/stdlib/malloc.c,v
> > > retrieving revision 1.269
> > > diff -u -p -r1.269 malloc.c
> > > --- stdlib/malloc.c   9 Mar 2021 07:39:28 -   1.269
> > > +++ stdlib/malloc.c   1 Apr 2021 09:22:00 -
> > > @@ -1404,6 +1404,8 @@ ofree(struct dir_info **argpool, void *p
> > >   } else {
> > >   /* Validate and optionally canary check */
> > >   struct chunk_info *info = (struct chunk_info *)r->size;
> > > + if (info->size != sz)
> > > + wrterror(pool, "internal struct corrupt");
> > >   find_chunknum(pool, info, p, mopts.chunk_canaries);
> > >   if (!clear) {
> > >   void *tmp;
> > > @@ -1608,6 +1610,7 @@ orealloc(struct dir_info **argpool, void
> > >   }
> > >   if (munmap((char *)r->p + rnewsz, roldsz - rnewsz))
> > > >   wrterror(pool, "munmap %p", (char *)r->p + rnewsz);
> > > + STATS_SUB(d->malloc_used, roldsz - rnewsz);
> > >   r->size = gnewsz;
> > >   if (MALLOC_MOVE_COND(gnewsz)) {
> > >   void *pp = MALLOC_MOVE(r->p, gnewsz);
> > > 
> > > 
> > 
> 



Re: small malloc diff

2021-04-08 Thread Otto Moerbeek
On Thu, Apr 01, 2021 at 11:23:58AM +0200, Otto Moerbeek wrote:

> Hi,
> 
> here's a small malloc diff. Most important part is an extra internal
> consistency check. I have been running this for a few weeks already.

ping?

> 
>   -Otto
> 
> Index: stdlib/malloc.3
> ===
> RCS file: /cvs/src/lib/libc/stdlib/malloc.3,v
> retrieving revision 1.127
> diff -u -p -r1.127 malloc.3
> --- stdlib/malloc.3   25 Feb 2021 15:20:18 -  1.127
> +++ stdlib/malloc.3   1 Apr 2021 09:21:59 -
> @@ -366,7 +366,8 @@ If a program changes behavior if any of 
>  are used,
>  it is buggy.
>  .Pp
> -The default number of free pages cached is 64 per malloc pool.
> +The default size of the cache is 64 single page allocations.
> +It also caches a number of larger regions.
>  Multi-threaded programs use multiple pools.
>  .Sh RETURN VALUES
>  Upon successful completion, the allocation functions
> Index: stdlib/malloc.c
> ===
> RCS file: /cvs/src/lib/libc/stdlib/malloc.c,v
> retrieving revision 1.269
> diff -u -p -r1.269 malloc.c
> --- stdlib/malloc.c   9 Mar 2021 07:39:28 -   1.269
> +++ stdlib/malloc.c   1 Apr 2021 09:22:00 -
> @@ -1404,6 +1404,8 @@ ofree(struct dir_info **argpool, void *p
>   } else {
>   /* Validate and optionally canary check */
>   struct chunk_info *info = (struct chunk_info *)r->size;
> + if (info->size != sz)
> + wrterror(pool, "internal struct corrupt");
>   find_chunknum(pool, info, p, mopts.chunk_canaries);
>   if (!clear) {
>   void *tmp;
> @@ -1608,6 +1610,7 @@ orealloc(struct dir_info **argpool, void
>   }
>   if (munmap((char *)r->p + rnewsz, roldsz - rnewsz))
>   wrterror(pool, "munmap %p", (char *)r->p + rnewsz);
> + STATS_SUB(d->malloc_used, roldsz - rnewsz);
>   r->size = gnewsz;
>   if (MALLOC_MOVE_COND(gnewsz)) {
>   void *pp = MALLOC_MOVE(r->p, gnewsz);
> 
> 



small malloc diff

2021-04-01 Thread Otto Moerbeek
Hi,

here's a small malloc diff. Most important part is an extra internal
consistency check. I have been running this for a few weeks already.
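
The kind of check meant here can be sketched as follows. This is a
simplified illustration, not the malloc.c data structures: sk_chunk_info,
sk_region and sk_check_region are hypothetical names, and the sketch returns
an error where the diff calls wrterror() and aborts. The idea is that the
size is recorded in two places, and a mismatch on free means some internal
bookkeeping has been overwritten.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-page metadata: the chunk size the page was carved into. */
struct sk_chunk_info {
	size_t size;
};

/* Hypothetical region record as found by the lookup done on free. */
struct sk_region {
	void *page;
	size_t size;			/* size derived from the lookup */
	struct sk_chunk_info *info;	/* metadata that should agree with it */
};

/* Returns 0 when both records agree, -1 when the bookkeeping is corrupt. */
static int
sk_check_region(const struct sk_region *r)
{
	assert(r != NULL);

	if (r->info == NULL || r->info->size != r->size)
		return -1;
	return 0;
}
```

The check costs one comparison per free and catches corruption before the
damaged metadata is used any further.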

-Otto

Index: stdlib/malloc.3
===
RCS file: /cvs/src/lib/libc/stdlib/malloc.3,v
retrieving revision 1.127
diff -u -p -r1.127 malloc.3
--- stdlib/malloc.3 25 Feb 2021 15:20:18 -  1.127
+++ stdlib/malloc.3 1 Apr 2021 09:21:59 -
@@ -366,7 +366,8 @@ If a program changes behavior if any of 
 are used,
 it is buggy.
 .Pp
-The default number of free pages cached is 64 per malloc pool.
+The default size of the cache is 64 single page allocations.
+It also caches a number of larger regions.
 Multi-threaded programs use multiple pools.
 .Sh RETURN VALUES
 Upon successful completion, the allocation functions
Index: stdlib/malloc.c
===
RCS file: /cvs/src/lib/libc/stdlib/malloc.c,v
retrieving revision 1.269
diff -u -p -r1.269 malloc.c
--- stdlib/malloc.c 9 Mar 2021 07:39:28 -   1.269
+++ stdlib/malloc.c 1 Apr 2021 09:22:00 -
@@ -1404,6 +1404,8 @@ ofree(struct dir_info **argpool, void *p
} else {
/* Validate and optionally canary check */
struct chunk_info *info = (struct chunk_info *)r->size;
+   if (info->size != sz)
+   wrterror(pool, "internal struct corrupt");
find_chunknum(pool, info, p, mopts.chunk_canaries);
if (!clear) {
void *tmp;
@@ -1608,6 +1610,7 @@ orealloc(struct dir_info **argpool, void
}
if (munmap((char *)r->p + rnewsz, roldsz - rnewsz))
wrterror(pool, "munmap %p", (char *)r->p + rnewsz);
+   STATS_SUB(d->malloc_used, roldsz - rnewsz);
r->size = gnewsz;
if (MALLOC_MOVE_COND(gnewsz)) {
void *pp = MALLOC_MOVE(r->p, gnewsz);




Re: vmm crash on 6.9-beta

2021-03-22 Thread Otto Moerbeek
On Mon, Mar 22, 2021 at 03:20:37PM +0100, Mischa wrote:

> 
> 
> > On 22 Mar 2021, at 15:18, Otto Moerbeek  wrote:
> > 
> > On Mon, Mar 22, 2021 at 03:06:40PM +0100, Mischa wrote:
> > 
> >>> On 22 Mar 2021, at 15:05, Dave Voutila  wrote:
> >>> Otto Moerbeek writes:
> >>>> On Mon, Mar 22, 2021 at 09:51:19AM -0400, Dave Voutila wrote:
> >>>>> Otto Moerbeek writes:
> >>>>>> On Mon, Mar 22, 2021 at 01:47:18PM +0100, Mischa wrote:
> >>>>>>>> On 22 Mar 2021, at 13:43, Stuart Henderson  
> >>>>>>>> wrote:
> >>>>>>>> 
> >>>>>>>>>> Created a fresh install qcow2 image and derived 35 new VMs from it.
> >>>>>>>>>> Then I started all the VMs in four cycles, 10 VMs per cycle and 
> >>>>>>>>>> waiting 240 seconds after each cycle.
> >>>>>>>>>> Similar to the staggered start based on the amount of CPUs.
> >>>>>>>> 
> >>>>>>>>> For me this is not enough info to even try to reproduce, I know 
> >>>>>>>>> little
> >>>>>>>>> of vmm or vmd and have no idea what "derive" means in this context.
> >>>>>>>> 
> >>>>>>>> This is a big bit of information that was missing from the original
> >>>>>>> 
> >>>>>>> Well.. could have been better described indeed. :))
> >>>>>>> "I created 41 additional VMs based on a single qcow2 base image."
> >>>>>>> 
> >>>>>>>> report ;) qcow has a concept of a read-only base image (or 'backing
> >>>>>>>> file') which can be shared between VMs, with writes diverted to a
> >>>>>>>> separate image ('derived image').
> >>>>>>>> 
> >>>>>>>> So e.g. you can create a base image, do a simple OS install for a
> >>>>>>>> particular OS version to that base image, then you stop using that
> >>>>>>>> for a VM and just use it as a base to create derived images from.
> >>>>>>>> You then run VMs using the derived image and make whatever config
> >>>>>>>> changes. If you have a bunch of VMs using the same OS release then
> >>>>>>>> you save some disk space for the common files.
> >>>>>>>> 
> >>>>>>>> Mischa did you leave a VM running which is working on the base
> >>>>>>>> image directly? That would certainly cause problems.
> >>>>>>> 
> >>>>>>> I did indeed. Let me try that again without keeping the base image 
> >>>>>>> running.
> >>>>>> 
> >>>>>> Right. As a safeguard, I would change the base image to be r/o.
> >>>>> 
> >>>>> vmd(8) should be treating it r/o...the config process is responsible for
> >>>>> opening the disk files and passing the fd's to the vm process. In
> >>>>> config.c, the call to open(2) for the base images should be using the
> >>>>> flags O_RDONLY | O_NONBLOCK.
> >>>>> 
> >>>>> A ktrace on my system shows that's the case. Below, "new.qcow2" is a new
> >>>>> disk image I based off the "alpine.qcow2" image:
> >>>>> 
> >>>>> 20862 vmd  CALL  open(0x7f7d4370,0x26)
> >>>>> 20862 vmd  NAMI  "/home/dave/vm/new.qcow2"
> >>>>> 20862 vmd  RET   open 10/0xa
> >>>>> 20862 vmd  CALL  fstat(10,0x7f7d42b8)
> >>>>> 20862 vmd  STRU  struct stat { dev=1051, ino=19531847, 
> >>>>> mode=-rw--- , nlink=1, uid=1000<"dave">, gid=1000<"dave">, 
> >>>>> rdev=78096304, atime=1616420730<"Mar 22 09:45:30 2021">.509011764, 
> >>>>> mtime=1616420697<"Mar 22 09:44:57 2021">.189185158, 
> >>>>> ctime=1616420697<"Mar 22 09:44:57 2021">.189185158, size=262144, 
> >>>>> blocks=256, blksize=32768, flags=0x0, gen=0xb64d5d98 }
> >>>>> 20862 vmd  RET   fstat 0
> >>>>> 20862 vmd  CALL  kbind(0x7f7d39d8,24,0x2a9349e63ae9950c)
> >>

Re: vmm crash on 6.9-beta

2021-03-22 Thread Otto Moerbeek
On Mon, Mar 22, 2021 at 03:06:40PM +0100, Mischa wrote:

> > On 22 Mar 2021, at 15:05, Dave Voutila  wrote:
> > Otto Moerbeek writes:
> >> On Mon, Mar 22, 2021 at 09:51:19AM -0400, Dave Voutila wrote:
> >>> Otto Moerbeek writes:
> >>>> On Mon, Mar 22, 2021 at 01:47:18PM +0100, Mischa wrote:
> >>>>>> On 22 Mar 2021, at 13:43, Stuart Henderson  
> >>>>>> wrote:
> >>>>>> 
> >>>>>>>> Created a fresh install qcow2 image and derived 35 new VMs from it.
> >>>>>>>> Then I started all the VMs in four cycles, 10 VMs per cycle and 
> >>>>>>>> waiting 240 seconds after each cycle.
> >>>>>>>> Similar to the staggered start based on the amount of CPUs.
> >>>>>> 
> >>>>>>> For me this is not enough info to even try to reproduce, I know little
> >>>>>>> of vmm or vmd and have no idea what "derive" means in this context.
> >>>>>> 
> >>>>>> This is a big bit of information that was missing from the original
> >>>>> 
> >>>>> Well.. could have been better described indeed. :))
> >>>>> "I created 41 additional VMs based on a single qcow2 base image."
> >>>>> 
> >>>>>> report ;) qcow has a concept of a read-only base image (or 'backing
> >>>>>> file') which can be shared between VMs, with writes diverted to a
> >>>>>> separate image ('derived image').
> >>>>>> 
> >>>>>> So e.g. you can create a base image, do a simple OS install for a
> >>>>>> particular OS version to that base image, then you stop using that
> >>>>>> for a VM and just use it as a base to create derived images from.
> >>>>>> You then run VMs using the derived image and make whatever config
> >>>>>> changes. If you have a bunch of VMs using the same OS release then
> >>>>>> you save some disk space for the common files.
> >>>>>> 
> >>>>>> Mischa did you leave a VM running which is working on the base
> >>>>>> image directly? That would certainly cause problems.
> >>>>> 
> >>>>> I did indeed. Let me try that again without keeping the base image 
> >>>>> running.
> >>>> 
> >>>> Right. As a safeguard, I would change the base image to be r/o.
> >>> 
> >>> vmd(8) should be treating it r/o...the config process is responsible for
> >>> opening the disk files and passing the fd's to the vm process. In
> >>> config.c, the call to open(2) for the base images should be using the
> >>> flags O_RDONLY | O_NONBLOCK.
> >>> 
> >>> A ktrace on my system shows that's the case. Below, "new.qcow2" is a new
> >>> disk image I based off the "alpine.qcow2" image:
> >>> 
> >>> 20862 vmd  CALL  open(0x7f7d4370,0x26)
> >>> 20862 vmd  NAMI  "/home/dave/vm/new.qcow2"
> >>> 20862 vmd  RET   open 10/0xa
> >>> 20862 vmd  CALL  fstat(10,0x7f7d42b8)
> >>> 20862 vmd  STRU  struct stat { dev=1051, ino=19531847, 
> >>> mode=-rw--- , nlink=1, uid=1000<"dave">, gid=1000<"dave">, 
> >>> rdev=78096304, atime=1616420730<"Mar 22 09:45:30 2021">.509011764, 
> >>> mtime=1616420697<"Mar 22 09:44:57 2021">.189185158, ctime=1616420697<"Mar 
> >>> 22 09:44:57 2021">.189185158, size=262144, blocks=256, blksize=32768, 
> >>> flags=0x0, gen=0xb64d5d98 }
> >>> 20862 vmd  RET   fstat 0
> >>> 20862 vmd  CALL  kbind(0x7f7d39d8,24,0x2a9349e63ae9950c)
> >>> 20862 vmd  RET   kbind 0
> >>> 20862 vmd  CALL  pread(10,0x7f7d42a8,0x68,0)
> >>> 20862 vmd  GIO   fd 10 read 104 bytes
> >>>   
> >>> "QFI\M-{\0\0\0\^C\0\0\0\0\0\0\0h\0\0\0\f\0\0\0\^P\0\0\0\^E\0\0\0\0\0\0\
> >>>
> >>> \0\0\0\0\0(\0\0\0\0\0\^A\0\0\0\0\0\0\0\^B\0\0\0\0\0\^A\0\0\0\0\0\0\0\
> >>>
> >>> \0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\^D\0\
> >>>\0\0h"
> >>> 20862 vmd  RET   pread 104/0x68
> >>> 20862 vmd  CALL  pread(10,0x7f7d4770,0xc,0x68)
> >>> 20862 vm

Re: vmm crash on 6.9-beta

2021-03-22 Thread Otto Moerbeek
On Mon, Mar 22, 2021 at 01:59:17PM +, Stuart Henderson wrote:

> > > I'm more familiar with the vmd(8) codebase than any ffs stuff, but I
> > > don't think the issue is the base image being r/w.
> > 
> > AFAIKS, the issue is that if you start a vm that modifies the base because
> > it uses it as a regular image, the r/o open for the other vms does not
> > matter a lot.
> 
> vmd could possibly refuse to use an image as a base for a derived image
> if it already has the base open (or vice-versa), but then some other
> software (e.g. qemu, qemu-img) could modify the base image too; there
> are always ways to break things, no matter what the safeguards.
> 

Hence my safeguard at the OS level: chmod it...

-Otto



Re: vmm crash on 6.9-beta

2021-03-22 Thread Otto Moerbeek
On Mon, Mar 22, 2021 at 09:51:19AM -0400, Dave Voutila wrote:

> 
> Otto Moerbeek writes:
> 
> > On Mon, Mar 22, 2021 at 01:47:18PM +0100, Mischa wrote:
> >
> >>
> >>
> >> > On 22 Mar 2021, at 13:43, Stuart Henderson  wrote:
> >> >
> >> >>> Created a fresh install qcow2 image and derived 35 new VMs from it.
> >> >>> Then I started all the VMs in four cycles, 10 VMs per cycle and 
> >> >>> waiting 240 seconds after each cycle.
> >> >>> Similar to the staggered start based on the amount of CPUs.
> >> >
> >> >> For me this is not enough info to even try to reproduce, I know little
> >> >> of vmm or vmd and have no idea what "derive" means in this context.
> >> >
> >> > This is a big bit of information that was missing from the original
> >>
> >> Well.. could have been better described indeed. :))
> >> "I created 41 additional VMs based on a single qcow2 base image."
> >>
> >> > report ;) qcow has a concept of a read-only base image (or 'backing
> >> > file') which can be shared between VMs, with writes diverted to a
> >> > separate image ('derived image').
> >> >
> >> > So e.g. you can create a base image, do a simple OS install for a
> >> > particular OS version to that base image, then you stop using that
> >> > for a VM and just use it as a base to create derived images from.
> >> > You then run VMs using the derived image and make whatever config
> >> > changes. If you have a bunch of VMs using the same OS release then
> >> > you save some disk space for the common files.
> >> >
> >> > Mischa did you leave a VM running which is working on the base
> >> > image directly? That would certainly cause problems.
> >>
> >> I did indeed. Let me try that again without keeping the base image running.
> >
> > Right. As a safeguard, I would change the base image to be r/o.
> 
> vmd(8) should be treating it r/o...the config process is responsible for
> opening the disk files and passing the fd's to the vm process. In
> config.c, the call to open(2) for the base images should be using the
> flags O_RDONLY | O_NONBLOCK.
> 
> A ktrace on my system shows that's the case. Below, "new.qcow2" is a new
> disk image I based off the "alpine.qcow2" image:
> 
>  20862 vmd  CALL  open(0x7f7d4370,0x26)
>  20862 vmd  NAMI  "/home/dave/vm/new.qcow2"
>  20862 vmd  RET   open 10/0xa
>  20862 vmd  CALL  fstat(10,0x7f7d42b8)
>  20862 vmd  STRU  struct stat { dev=1051, ino=19531847, mode=-rw--- , 
> nlink=1, uid=1000<"dave">, gid=1000<"dave">, rdev=78096304, 
> atime=1616420730<"Mar 22 09:45:30 2021">.509011764, mtime=1616420697<"Mar 22 
> 09:44:57 2021">.189185158, ctime=1616420697<"Mar 22 09:44:57 
> 2021">.189185158, size=262144, blocks=256, blksize=32768, flags=0x0, 
> gen=0xb64d5d98 }
>  20862 vmd  RET   fstat 0
>  20862 vmd  CALL  kbind(0x7f7d39d8,24,0x2a9349e63ae9950c)
>  20862 vmd  RET   kbind 0
>  20862 vmd  CALL  pread(10,0x7f7d42a8,0x68,0)
>  20862 vmd  GIO   fd 10 read 104 bytes
>"QFI\M-{\0\0\0\^C\0\0\0\0\0\0\0h\0\0\0\f\0\0\0\^P\0\0\0\^E\0\0\0\0\0\0\
> \0\0\0\0\0(\0\0\0\0\0\^A\0\0\0\0\0\0\0\^B\0\0\0\0\0\^A\0\0\0\0\0\0\0\
> \0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\^D\0\
> \0\0h"
>  20862 vmd  RET   pread 104/0x68
>  20862 vmd  CALL  pread(10,0x7f7d4770,0xc,0x68)
>  20862 vmd  GIO   fd 10 read 12 bytes
>"alpine.qcow2"
>  20862 vmd  RET   pread 12/0xc
>  20862 vmd  CALL  kbind(0x7f7d39d8,24,0x2a9349e63ae9950c)
>  20862 vmd  RET   kbind 0
>  20862 vmd  CALL  kbind(0x7f7d39d8,24,0x2a9349e63ae9950c)
>  20862 vmd  RET   kbind 0
>  20862 vmd  CALL  __realpath(0x7f7d3ea0,0x7f7d3680)
>  20862 vmd  NAMI  "/home/dave/vm/alpine.qcow2"
>  20862 vmd  NAMI  "/home/dave/vm/alpine.qcow2"
>  20862 vmd  RET   __realpath 0
>  20862 vmd  CALL  open(0x7f7d4370,0x4)
>  20862 vmd  NAMI  "/home/dave/vm/alpine.qcow2"
>  20862 vmd  RET   open 11/0xb
>  20862 vmd  CALL  fstat(11,0x7f7d42b8)
> 
> 
> I'm more familiar with the vmd(8) codebase than any ffs stuff, but I
> don't think the issue is the base image being r/w.
> 
> -Dave

AFAICS, the issue is that if you start a vm that modifies the base because
it uses it as a regular image, the r/o open for the other vms does not
matter a lot.

-Otto
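
For readers following the ktrace above, the two open(2) calls can be told
apart by decoding their flags argument. The sketch below does that for the
two values in the trace (0x26 for new.qcow2, 0x4 for alpine.qcow2). The
OBSD_ names and the helper functions are made up for this example; the
numeric values are the ones OpenBSD's <sys/fcntl.h> uses, spelled out so
the result does not depend on the host's own header. Only the bits that
appear in the trace are decoded.

```c
#include <stdio.h>
#include <string.h>

/* Flag values as in OpenBSD's <sys/fcntl.h>; OBSD_ prefix is hypothetical. */
#define OBSD_O_ACCMODE	0x0003
#define OBSD_O_NONBLOCK	0x0004
#define OBSD_O_EXLOCK	0x0020

/* Append s to the NUL-terminated string in buf without overflowing it. */
static void
append(char *buf, size_t len, const char *s)
{
	size_t used = strlen(buf);

	if (used < len)
		snprintf(buf + used, len - used, "%s", s);
}

/* Render the flags argument of an open(2) ktrace line as text. */
static const char *
decode_open_flags(int flags, char *buf, size_t len)
{
	static const char *const mode[] = {
		"O_RDONLY", "O_WRONLY", "O_RDWR", "?"
	};

	snprintf(buf, len, "%s", mode[flags & OBSD_O_ACCMODE]);
	if (flags & OBSD_O_NONBLOCK)
		append(buf, len, "|O_NONBLOCK");
	if (flags & OBSD_O_EXLOCK)
		append(buf, len, "|O_EXLOCK");
	return buf;
}
```

Decoded this way, 0x26 is a read/write open with an exclusive lock on the
derived image, and 0x4 is the read-only open of the base that Dave
describes. Neither protects the base from a separate vm that was
configured to use it as its own regular disk.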



Re: vmm crash on 6.9-beta

2021-03-22 Thread Otto Moerbeek
On Mon, Mar 22, 2021 at 01:47:18PM +0100, Mischa wrote:

> 
> 
> > On 22 Mar 2021, at 13:43, Stuart Henderson  wrote:
> > 
> >>> Created a fresh install qcow2 image and derived 35 new VMs from it.
> >>> Then I started all the VMs in four cycles, 10 VMs per cycle and waiting 
> >>> 240 seconds after each cycle.
> >>> Similar to the staggered start based on the amount of CPUs.
> > 
> >> For me this is not enough info to even try to reproduce, I know little
> >> of vmm or vmd and have no idea what "derive" means in this context.
> > 
> > This is a big bit of information that was missing from the original
> 
> Well.. could have been better described indeed. :))
> "I created 41 additional VMs based on a single qcow2 base image."
> 
> > report ;) qcow has a concept of a read-only base image (or 'backing
> > file') which can be shared between VMs, with writes diverted to a
> > separate image ('derived image').
> > 
> > So e.g. you can create a base image, do a simple OS install for a
> > particular OS version to that base image, then you stop using that
> > for a VM and just use it as a base to create derived images from.
> > You then run VMs using the derived image and make whatever config
> > changes. If you have a bunch of VMs using the same OS release then
> > you save some disk space for the common files.
> > 
> > Mischa did you leave a VM running which is working on the base
> > image directly? That would certainly cause problems.
> 
> I did indeed. Let me try that again without keeping the base image running.

Right. As a safeguard, I would change the base image to be r/o.

I was just looking at your script and scratching my head: why is Mischa
starting vm01 ...

-Otto
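
The base/derived relationship Stuart describes is recorded in the derived
image itself: the qcow2 header stores the offset and length of the backing
file name. That matches the ktrace quoted above in this archive, where vmd
reads a 104-byte header and then 12 bytes ("alpine.qcow2") at offset 0x68.
A minimal sketch of pulling those fields out of an in-memory image follows.
The struct and function names are made up for this example and this is not
vmd's code; the field positions (magic at byte 0, version at 4, backing
file offset at 8, backing file size at 16, all big-endian) are the ones
the qcow2 format defines.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define QCOW2_MAGIC	0x514649fbU	/* "QFI\xfb" */

/* Hypothetical holder for the header fields we care about. */
struct qcow2_backing {
	uint32_t version;
	uint64_t offset;	/* where the backing file name starts */
	uint32_t size;		/* length of the name in bytes, 0 if none */
};

static uint32_t
be32(const unsigned char *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	    ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

static uint64_t
be64(const unsigned char *p)
{
	return ((uint64_t)be32(p) << 32) | be32(p + 4);
}

/* Parse the first 20 header bytes; returns -1 if this is not qcow2. */
static int
qcow2_read_backing(const unsigned char *img, size_t len,
    struct qcow2_backing *out)
{
	if (len < 20 || be32(img) != QCOW2_MAGIC)
		return -1;
	out->version = be32(img + 4);
	out->offset = be64(img + 8);
	out->size = be32(img + 16);
	return 0;
}

/* Copy the backing file name out as a NUL-terminated string. */
static int
qcow2_backing_name(const unsigned char *img, size_t len,
    const struct qcow2_backing *b, char *name, size_t namelen)
{
	if (b->size == 0 || b->size >= namelen)
		return -1;	/* no backing file, or name does not fit */
	if (b->offset > len || b->size > len - b->offset)
		return -1;	/* name would lie outside the image */
	memcpy(name, img + b->offset, b->size);
	name[b->size] = '\0';
	return 0;
}
```

A size of 0 means the image has no backing file, which is what one would
expect for the base image itself.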

> 
> Mischa
> 
> > 
> > 
> >> Would it be possible for you to show the exact steps (preferably a
> >> script) to reproduce the issue?
> >> 
> >> Though the specific hardware might play a role as well...
> >> 
> >>-Otto
> >>> 
> >>> Mischa
> >>> 
> >>> OpenBSD 6.9-beta (GENERIC.MP) #421: Sun Mar 21 13:17:22 MDT 2021
> >>>dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> >>> real mem = 137374924800 (131010MB)
> >>> avail mem = 133196165120 (127025MB)
> >>> random: good seed from bootblocks
> >>> mpath0 at root
> >>> scsibus0 at mpath0: 256 targets
> >>> mainbus0 at root
> >>> bios0 at mainbus0: SMBIOS rev. 2.7 @ 0xbf42c000 (99 entries)
> >>> bios0: vendor Dell Inc. version "2.8.0" date 06/26/2019
> >>> bios0: Dell Inc. PowerEdge R620
> >>> acpi0 at bios0: ACPI 3.0
> >>> acpi0: sleep states S0 S4 S5
> >>> acpi0: tables DSDT FACP APIC SPCR HPET DMAR MCFG WD__ SLIC ERST HEST BERT 
> >>> EINJ TCPA PC__ SRAT SSDT
> >>> acpi0: wakeup devices PCI0(S5) PCI1(S5)
> >>> acpitimer0 at acpi0: 3579545 Hz, 24 bits
> >>> acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
> >>> cpu0 at mainbus0: apid 0 (boot processor)
> >>> cpu0: Intel(R) Xeon(R) CPU E5-2630 0 @ 2.30GHz, 2600.34 MHz, 06-2d-07
> >>> cpu0: 
> >>> FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,DCA,SSE4.1,SSE4.2,x2APIC,POPCNT,DEADLINE,AES,XSAVE,AVX,NXE,PAGE1GB,RDTSCP,LONG,LAHF,PERF,ITSC,MD_CLEAR,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,MELTDOWN
> >>> cpu0: 256KB 64b/line 8-way L2 cache
> >>> cpu0: smt 0, core 0, package 0
> >>> mtrr: Pentium Pro MTRR support, 10 var ranges, 88 fixed ranges
> >>> cpu0: apic clock running at 99MHz
> >>> cpu0: mwait min=64, max=64, C-substates=0.2.1.1.2, IBE
> >>> cpu1 at mainbus0: apid 32 (application processor)
> >>> cpu1: Intel(R) Xeon(R) CPU E5-2630 0 @ 2.30GHz, 1200.02 MHz, 06-2d-07
> >>> cpu1: 
> >>> FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,DCA,SSE4.1,SSE4.2,x2APIC,POPCNT,DEADLINE,AES,XSAVE,AVX,NXE,PAGE1GB,RDTSCP,LONG,LAHF,PERF,ITSC,MD_CLEAR,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,MELTDOWN
> >>> cpu1: 256KB 64b/line 8-way L2 cache
> >>> cpu1: smt 0, core 0, package 1
> >>> cpu2 at mainbus0: apid 2 (application processor)
> >>> cpu2: Intel(R) Xeon(R) CPU E5-2630 0 @ 2.30GHz, 2600.03 MHz, 06-2d-07
> >>> cpu2: 
> >>> FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,DCA,SSE4.1,SSE4.2,x2APIC,POPCNT,DEADLINE,AES,XSAVE,AVX,NXE,PAGE1GB,RDTSCP,LONG,LAHF,PERF,ITSC,MD_CLEAR,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,MELTDOWN
> >>> cpu2: 256KB 64b/line 8-way L2 cache
> >>> cpu2: smt 0, core 1, package 0
> >>> cpu3 at mainbus0: apid 34 (application processor)
> >>> cpu3: Intel(R) Xeon(R) CPU E5-2630 0 @ 2.30GHz, 2600.03 MHz, 06-2d-07
> >>> cpu3: 
> >>> 

Re: vmm crash on 6.9-beta

2021-03-22 Thread Otto Moerbeek
On Mon, Mar 22, 2021 at 11:34:25AM +0100, Mischa wrote:

> > On 21 Mar 2021, at 02:31, Theo de Raadt  wrote:
> > Otto Moerbeek  wrote:
> >> On Fri, Mar 19, 2021 at 04:15:31PM +, Stuart Henderson wrote:
> >> 
> >>> On 2021/03/19 17:05, Jan Klemkow wrote:
> >>>> Hi,
> >>>> 
> >>>> I had the same issue a few days ago on server hardware of mine.  I just
> >>>> ran 'cvs up'.  So, it looks like a generic bug in FFS and not related to
> >>>> vmm.
> >>> 
> >>> This panic generally relates to filesystem corruption. If fsck doesn't
> >>> help then recreating which filesystem is triggering it is usually needed.
> >> 
> >> Yeah, once in a while we see reports of it. It seems to be some nasty
> >> conspiracy between the generic filesystem code, ffs and fsck_ffs.
> >> Maybe even the device (driver) itself is involved. A possible
> >> underlying issue may be that some operations are re-ordered while they
> >> should not.
> > 
> > Yes, it does hint at a reordering.
> > 
> >> Now the strange thing is, fsck_ffs *should* be able to repair the
> >> inconsistency, but it appears in some cases it is not, and some bits
> >> on the disk remain to trigger it again.
> > 
> > fsck_ffs can only repair one inconsistency.  There are a number of lockstep
> > operations, I suppose we can call them acid-in-lowercase, which allow fsck
> > to determine at which point the crashed system gave up the ghost.  fsck then
> > removes the partial operations, leaving a viable filesystem.  But if the
> > disk layer lands later writes but not earlier writes, fsck cannot handle it.
> 
> I managed to re-create the issue.
> 
> Created a fresh install qcow2 image and derived 35 new VMs from it.
> Then I started all the VMs in four cycles, 10 VMs per cycle and waiting 240 
> seconds after each cycle.
> Similar to the staggered start based on the amount of CPUs.
> 
> This time it was “only” one VM that was affected by this. VM four that got 
> started.
> 
> ddb> show panic
> ffs_valloc: dup alloc
> ddb> trace
> db_enter() at db_enter+0x10
> panic(81dc21b2) at panic+0x12a
> ffs_inode_alloc(fd803c94ef00,81a4,fd803f7bbf00,800014d728b8) at ffs_inode_alloc+0x442
> ufs_makeinode(81a4,fd803c930908,800014d72bb0,800014d72c00) at ufs_makeinode+0x7f
> ufs_create(800014d72960) at ufs_create+0x3c
> VOP_CREATE(fd803c930908,800014d72bb0,800014d72c00,800014d729c0) at VOP_CREATE+0x4a
> vn_open(800014d72b80,602,1a4) at vn_open+0x182
> doopenat(8000c778,ff9c,f8fc28f00f4,601,1b6,800014d72d80) at doopenat+0x1d0
> syscall(800014d72df0) at syscall+0x315
> Xsyscall() at Xsyscall+0x128
> end of kernel
> end trace frame: 0x7f7be450, count: -10
> 
> dmesg of the host below.

For me this is not enough info to even try to reproduce; I know little
of vmm or vmd and have no idea what "derive" means in this context.

Would it be possible for you to show the exact steps (preferably a
script) to reproduce the issue?

Though the specific hardware might play a role as well...

-Otto
> 
> Mischa
> 
> OpenBSD 6.9-beta (GENERIC.MP) #421: Sun Mar 21 13:17:22 MDT 2021
> dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> real mem = 137374924800 (131010MB)
> avail mem = 133196165120 (127025MB)
> random: good seed from bootblocks
> mpath0 at root
> scsibus0 at mpath0: 256 targets
> mainbus0 at root
> bios0 at mainbus0: SMBIOS rev. 2.7 @ 0xbf42c000 (99 entries)
> bios0: vendor Dell Inc. version "2.8.0" date 06/26/2019
> bios0: Dell Inc. PowerEdge R620
> acpi0 at bios0: ACPI 3.0
> acpi0: sleep states S0 S4 S5
> acpi0: tables DSDT FACP APIC SPCR HPET DMAR MCFG WD__ SLIC ERST HEST BERT 
> EINJ TCPA PC__ SRAT SSDT
> acpi0: wakeup devices PCI0(S5) PCI1(S5)
> acpitimer0 at acpi0: 3579545 Hz, 24 bits
> acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
> cpu0 at mainbus0: apid 0 (boot processor)
> cpu0: Intel(R) Xeon(R) CPU E5-2630 0 @ 2.30GHz, 2600.34 MHz, 06-2d-07
> cpu0: 
> FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE,SSE3,PCLMUL,DTES64,MWAIT,DS-CPL,VMX,SMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,DCA,SSE4.1,SSE4.2,x2APIC,POPCNT,DEADLINE,AES,XSAVE,AVX,NXE,PAGE1GB,RDTSCP,LONG,LAHF,PERF,ITSC,MD_CLEAR,IBRS,IBPB,STIBP,L1DF,SSBD,SENSOR,ARAT,XSAVEOPT,MELTDOWN
> cpu0: 256KB 64b/line 8-way L2 cache
> cpu0: smt 0, core 0, package 0
> mtrr: Pentium Pro MTRR support, 10 var ranges, 88 fixed ranges
> cpu0: api

Re: slaacd(8): pltime 0 and temporary addresses

2021-03-21 Thread Otto Moerbeek
On Sun, Mar 21, 2021 at 02:38:42PM +0100, Florian Obser wrote:

> 
> Don't warn that we can't form a temporary address when a router
> deprecates a prefix by sending a pltime of 0, this is normal.
> Continue warning when the pltime is smaller than 5 as this is almost
> certainly a configuration error.
> 
> OK?

yes,

-Otto

> 
> diff --git engine.c engine.c
> index 7b49b330328..94a4a232d6a 100644
> --- engine.c
> +++ engine.c
> @@ -1932,14 +1932,15 @@ update_iface_ra_prefix(struct slaacd_iface *iface, struct radv *ra,
>  
>   /* privacy addresses do not depend on eui64 */
>   if (!found_privacy && iface->autoconfprivacy) {
> - if (prefix->pltime < PRIV_REGEN_ADVANCE) {
> + if (prefix->pltime >= PRIV_REGEN_ADVANCE) {
> + /* new privacy proposal */
> + gen_address_proposal(iface, ra, prefix, 1);
> + } else if (prefix->pltime > 0) {
>   log_warnx("%s: pltime from %s is too small: %d < %d; "
>   "not generating privacy address", __func__,
>   sin6_to_str(&ra->from), prefix->pltime,
>   PRIV_REGEN_ADVANCE);
> - } else
> - /* new privacy proposal */
> - gen_address_proposal(iface, ra, prefix, 1);
> + }
>   }
>  }
>  
> 
> 
> -- 
> I'm not entirely sure you are real.
> 
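
The behaviour change in the diff above boils down to a three-way decision
on the preferred lifetime. A minimal sketch of just that decision, outside
of slaacd, is below. The enum and function names are made up for this
example, and the value 5 for PRIV_REGEN_ADVANCE is taken from Florian's
description ("smaller than 5") rather than from the header.

```c
#include <stdint.h>

#define PRIV_REGEN_ADVANCE	5	/* seconds; assumed from the message */

/* Hypothetical names for the three outcomes the diff distinguishes. */
enum privacy_action {
	PRIV_GENERATE,	/* form a new temporary address */
	PRIV_WARN,	/* pltime too small: log a warning, no address */
	PRIV_SILENT	/* pltime 0: prefix deprecated, nothing to report */
};

static enum privacy_action
privacy_action_for(uint32_t pltime)
{
	if (pltime >= PRIV_REGEN_ADVANCE)
		return PRIV_GENERATE;
	if (pltime > 0)
		return PRIV_WARN;
	return PRIV_SILENT;
}
```

Before the change, a pltime of 0 fell into the warning branch together with
the values 1 through 4; afterwards only those non-zero small values warn.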



Re: vmm crash on 6.9-beta

2021-03-20 Thread Otto Moerbeek
On Fri, Mar 19, 2021 at 04:15:31PM +, Stuart Henderson wrote:

> On 2021/03/19 17:05, Jan Klemkow wrote:
> > Hi,
> > 
> > I had the same issue a few days ago on server hardware of mine.  I just
> > ran 'cvs up'.  So, it looks like a generic bug in FFS and not related to
> > vmm.
> 
> This panic generally relates to filesystem corruption. If fsck doesn't
> help then recreating which filesystem is triggering it is usually needed.

Yeah, once in a while we see reports of it. It seems to be some nasty
conspiracy between the generic filesystem code, ffs and fsck_ffs.
Maybe even the device (driver) itself is involved. A possible
underlying issue may be that some operations are re-ordered while they
should not be.

Now the strange thing is, fsck_ffs *should* be able to repair the
inconsistency, but it appears in some cases it is not, and some bits
on the disk remain to trigger it again.

-Otto

> 
> 
> > OpenBSD 6.9-beta (GENERIC.MP) #396: Thu Mar 11 19:15:56 MST 2021
> > dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> > 
> > ciao,
> > Jan
> > 
> > ddb{2}> show panic
> > ffs_valloc: dup alloc
> > 
> > ddb{2}> trace
> > db_enter() at db_enter+0x10
> > panic(81dda170) at panic+0x12a
> > ffs_inode_alloc(fd8a1acb50f0,81a4,fd8c3f7ba120,8000229d3088) at ffs_inode_alloc+0x442
> > ufs_makeinode(81a4,fd8a8a498940,8000229d3380,8000229d33d0) at ufs_makeinode+0x7f
> > ufs_create(8000229d3130) at ufs_create+0x3c
> > VOP_CREATE(fd8a8a498940,8000229d3380,8000229d33d0,8000229d3190) at VOP_CREATE+0x4a
> > vn_open(8000229d3350,602,1a4) at vn_open+0x182
> > doopenat(800022915500,ff9c,cc7a0280ad0,601,1b6,8000229d3550) at doopenat+0x1cd
> > syscall(8000229d35c0) at syscall+0x389
> > Xsyscall() at Xsyscall+0x128
> > end of kernel
> > end trace frame: 0x7f7c5520, count: -10
> > 
> > ddb{2}> ps
> >PID TID   PPIDUID  S   FLAGS  WAIT  COMMAND
> > *56226  366608  70629  0  70x13cvs
> > 



Re: vmm crash on 6.9-beta

2021-03-13 Thread Otto Moerbeek
On Sat, Mar 13, 2021 at 12:08:52AM -0800, Mike Larkin wrote:

> On Wed, Mar 10, 2021 at 08:30:32PM +0100, Mischa wrote:
> > On 10 Mar at 18:59, Mike Larkin  wrote:
> > > On Wed, Mar 10, 2021 at 03:08:21PM +0100, Mischa wrote:
> > > > Hi All,
> > > >
> > > > Currently I am running 6.9-beta on one of my hosts to test 
> > > > veb(4)/vport(4).
> > > >
> > > > root@server14:~ # sysctl kern.version
> > > > kern.version=OpenBSD 6.9-beta (GENERIC.MP) #385: Mon Mar  8 12:57:12 
> > > > MST 2021
> > > > dera...@amd64.openbsd.org:/usr/src/sys/arch/amd64/compile/GENERIC.MP
> > > >
> > > > In order to add some load to the system I created 41 additional VMs 
> > > > based on a single qcow2 base image.
> > > > A couple of those VMs crashed with the following ddb output.
> > > >
> > > > ddb> show panic
> > > > ffs_valloc: dup alloc
> > > > ddb> trace
> > > > db_enter() at db_enter+0x10
> > > > panic(81dc0709) at panic+0x12a
> > > > ffs_inode_alloc(fd80269831e0,8180,fd803f7bb540,800014e1e3e8) at ffs_inode_alloc+0x442
> > > > ufs_makeinode(8180,fd8026a386a0,800014e1e6e0,800014e1e730) at ufs_makeinode+0x7f
> > > > ufs_create(800014e1e490) at ufs_create+0x3c
> > > > VOP_CREATE(fd8026a386a0,800014e1e6e0,800014e1e730,800014e1e4f0) at VOP_CREATE+0x4a
> > > > vn_open(800014e1e6b0,10602,180) at vn_open+0x182
> > > > doopenat(800014e8a518,ff9c,70e0e92a500,10601,1b6,800014e1e8b0) at doopenat+0x1d0
> > > > syscall(800014e1e920) at syscall+0x315
> > > > Xsyscall() at Xsyscall+0x128
> > > > end of kernel
> > > > end trace frame: 0x7f7e5000, count: -10
> > > >
> > > > Mischa
> > > >
> > >
> > > Probably not vmm(4) related but thanks for reporting!
> >
> > Could it be qcow2 related? or is this general disk? At least that is what I 
> > think ffs_ is. :)
> >
> > Mischa
> >
> 
> likely completely unrelated to anything vmd(8) is doing.
> 

Apart from kernel/ffs bugs, a dup alloc can also be caused by an
inconsistent fs.  Please run a *forced* (-f) fsck on the fs (after
unmounting, of course).

-Otto



Re: malloc cache changes

2021-03-09 Thread Otto Moerbeek
On Tue, Mar 09, 2021 at 09:12:03AM +0100, Otto Moerbeek wrote:

> Hi,
> 
> I just committed a malloc change that is interesting. It has been in
> snaps already for a while.
> 
> It changes the malloc cache to be a little more friendly to the
> kernel, mallocs tendency to split large allocations into page-sized ones
> was giving the kernel a hard time in some cases.
> 
> By changing the data structure I am also able to find a proper sized
> block in the cache immediately, instead of searching for it.
> 
> The original cache was very aggressive ununmapping regions. Which is

s/unun/un

> nice to find use-after-frees, but also increases the pressure on the
> kernel. This version is a bit less aggressive.
> 
> The behaviour for malloc option S is unchanged, there you get the
> maximum protection, at the cost of performance. Cache resizing also
> works as before, except that the maximum size of the cache is now
> larger: apart from the default 64 1 page sized regions it also tracks
> a number of bigger regions.
> 
>   -Otto
> 
> 



malloc cache changes

2021-03-09 Thread Otto Moerbeek
Hi,

I just committed a malloc change that is interesting. It has been in
snaps already for a while.

It changes the malloc cache to be a little more friendly to the
kernel: malloc's tendency to split large allocations into page-sized ones
was giving the kernel a hard time in some cases.

By changing the data structure I am also able to find a proper sized
block in the cache immediately, instead of searching for it.

The original cache was very aggressive in unmapping regions, which is
nice to find use-after-frees, but also increases the pressure on the
kernel. This version is a bit less aggressive.

The behaviour for malloc option S is unchanged; there you get the
maximum protection, at the cost of performance. Cache resizing also
works as before, except that the maximum size of the cache is now
larger: apart from the default 64 one-page-sized regions it also tracks
a number of bigger regions.

-Otto
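
To illustrate what "find a proper sized block in the cache immediately,
instead of searching for it" can look like, here is a toy cache that is
indexed by region size in pages. This is only a sketch of the general idea
and not the code that was committed to libc; every name and both limits
below are made up for the example.

```c
#include <stddef.h>

#define MAX_CACHED_PAGES	8	/* hypothetical: largest size tracked */
#define SLOTS_PER_SIZE		4	/* hypothetical: regions kept per size */

/* One small stack of free regions per size, indexed by page count - 1. */
struct region_cache {
	void	*slot[MAX_CACHED_PAGES][SLOTS_PER_SIZE];
	size_t	 count[MAX_CACHED_PAGES];
};

/* Keep a freed region; returns 0 if the caller has to unmap it instead. */
static int
cache_put(struct region_cache *c, void *p, size_t npages)
{
	if (npages == 0 || npages > MAX_CACHED_PAGES)
		return 0;
	if (c->count[npages - 1] == SLOTS_PER_SIZE)
		return 0;
	c->slot[npages - 1][c->count[npages - 1]++] = p;
	return 1;
}

/* Hand back a region of exactly npages pages, or NULL. No scan needed. */
static void *
cache_get(struct region_cache *c, size_t npages)
{
	if (npages == 0 || npages > MAX_CACHED_PAGES)
		return NULL;
	if (c->count[npages - 1] == 0)
		return NULL;
	return c->slot[npages - 1][--c->count[npages - 1]];
}
```

Because the size selects the bucket directly, a lookup costs the same no
matter how many regions are cached. Regions larger than the tracked
maximum, or arriving when a bucket is full, are simply not cached.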




Re: occasional SSIGSEGV on C++ exception handling

2021-02-22 Thread Otto Moerbeek
On Tue, Feb 23, 2021 at 06:23:22PM +1100, Jonathan Gray wrote:

> On Tue, Feb 23, 2021 at 08:10:54AM +0100, Otto Moerbeek wrote:
> > On Mon, Feb 22, 2021 at 08:58:07PM -, Miod Vallat wrote:
> > 
> > > 
> > > > No problem, real-life often takes precedence.
> > > 
> > > No way! operator(7) would need an update!
> > > 
> > 
> > What do we do when we see a bug? We fix it! What if it is not fixable?
> > We document it!
> > 
> > -Otto
> 
> real life is not a C operator

That's the bug I'm describing.

> 
> > 
> > Index: operator.7
> > ===
> > RCS file: /cvs/src/share/man/man7/operator.7,v
> > retrieving revision 1.11
> > diff -u -p -r1.11 operator.7
> > --- operator.7  21 Jun 2019 02:28:34 -  1.11
> > +++ operator.7  23 Feb 2021 07:09:10 -
> > @@ -57,3 +57,5 @@
> >  .It "\&," Ta "left to right"
> >  .El
> >  .Ed
> > +.Sh BUGS
> > +Often real life takes precedence.
> > 
> > 
> 



Re: occasional SSIGSEGV on C++ exception handling

2021-02-22 Thread Otto Moerbeek
On Mon, Feb 22, 2021 at 08:58:07PM -, Miod Vallat wrote:

> 
> > No problem, real-life often takes precedence.
> 
> No way! operator(7) would need an update!
> 

What do we do when we see a bug? We fix it! What if it is not fixable?
We document it!

-Otto

Index: operator.7
===
RCS file: /cvs/src/share/man/man7/operator.7,v
retrieving revision 1.11
diff -u -p -r1.11 operator.7
--- operator.7  21 Jun 2019 02:28:34 -  1.11
+++ operator.7  23 Feb 2021 07:09:10 -
@@ -57,3 +57,5 @@
 .It "\&," Ta "left to right"
 .El
 .Ed
+.Sh BUGS
+Often real life takes precedence.



Re: occasional SSIGSEGV on C++ exception handling

2021-02-22 Thread Otto Moerbeek
On Mon, Feb 22, 2021 at 11:09:41AM +0200, Paul Irofti wrote:

> >   - investigate the commit you mention above. Sadly I cannot
> >     remember the original case that prompted for the caching code to be
> >     added.
> 
> Sorry I could not reply earlier.


No problem, real-life often takes precedence.

> 
> The caching code was added by me to make libreoffice work with non-toy
> spreadsheets. Apparently referencing external cells is done through
> exception handling in office suites and this led to waiting for whole
> minutes for libreoffice to load.
> 
> The gcc toolchain has this optimization, but the clang one does not. So I added
> a simple caching mechanism a few years ago that makes this bearable.
> 
> See revision 1.8 of AddressSpace.hpp. Of course it moved in the meantime and
> CVS lost all its history.
> 
> https://cvsweb.openbsd.org/src/lib/libunwind/src/Attic/AddressSpace.hpp
> 

In the meantime I made and committed a fix. The root cause was that
the cache isn't thread safe. My change fixes that by making the cache
object thread_local:

http://cvsweb.openbsd.org/cgi-bin/cvsweb/src/gnu/llvm/libunwind/src/UnwindCursor.hpp.diff?r1=1.2=1.3

-Otto
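
For anyone wondering why thread-local storage removes the race without a
lock, the same idea can be shown in plain C11 with _Thread_local. This is
a sketch only: the real fix is in C++ and keeps the existing tree, while
the direct-mapped table, its size, and all names below are made up to keep
the example short.

```c
#include <stddef.h>
#include <stdint.h>

#define CACHE_SLOTS	16	/* hypothetical table size */

struct cache_entry {
	uintptr_t pc;	/* key: program counter being unwound */
	uintptr_t info;	/* value: stands in for the unwind info */
	int	  valid;
};

/*
 * Each thread gets its own zero-initialized copy of this table, so two
 * threads throwing at the same time never touch the same entries.
 */
static _Thread_local struct cache_entry cache[CACHE_SLOTS];

/* Remember info for pc, overwriting whatever shared the slot. */
static void
cache_set(uintptr_t pc, uintptr_t info)
{
	struct cache_entry *e = &cache[pc % CACHE_SLOTS];

	e->pc = pc;
	e->info = info;
	e->valid = 1;
}

/* Returns 1 and fills *info on a hit, 0 on a miss. */
static int
cache_lookup(uintptr_t pc, uintptr_t *info)
{
	const struct cache_entry *e = &cache[pc % CACHE_SLOTS];

	if (!e->valid || e->pc != pc)
		return 0;
	*info = e->info;
	return 1;
}
```

The cost is that every thread warms up its own cache, so an entry found by
one thread is not visible to the others.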



Re: occasional SSIGSEGV on C++ exception handling

2021-02-20 Thread Otto Moerbeek
On Sat, Feb 20, 2021 at 06:30:23PM +0100, Mark Kettenis wrote:

> > Date: Sat, 20 Feb 2021 18:23:26 +0100
> > From: Otto Moerbeek 
> > Cc: tech@openbsd.org, piro...@openbsd.org
> > Content-Type: text/plain; charset=us-ascii
> > Content-Disposition: inline
> > 
> > On Fri, Feb 19, 2021 at 05:29:31PM +0100, Mark Kettenis wrote:
> > 
> > > > Date: Fri, 19 Feb 2021 16:43:10 +0100
> > > > From: Otto Moerbeek 
> > > > 
> > > > On Fri, Feb 19, 2021 at 01:06:43PM +0100, Otto Moerbeek wrote:
> > > > 
> > > > > On Fri, Feb 19, 2021 at 12:45:58PM +0100, Mark Kettenis wrote:
> > > > > 
> > > > > > > Date: Fri, 19 Feb 2021 10:57:30 +0100
> > > > > > > From: Otto Moerbeek 
> > > > > > > 
> > > > > > > Hi,
> > > > > > > 
> > > > > > > working on PowerDNS Recursor, once in a while I'm seeing:
> > > > > > > 
> > > > > > > #0  0x09fd67ef09dc in
> > > > > > > libunwind::UnwindInfoSectionsCache::CacheTree_RB_INSERT_COLOR
> > > > > > > (this=, 
> > > > > > > head=0x9fd67efc8e8 , 
> > > > > > > elm=0x9fca04be900)
> > > > > > > at 
> > > > > > > /usr/src/gnu/lib/libcxxabi/../../../gnu/llvm/libunwind/src/AddressSpace.hpp:243
> > > > > > > 243   RB_GENERATE(CacheTree, CacheItem, entry, CacheCmp);
> > > > > > > [Current thread is 1 (process 349420)]
> > > > > > > (gdb) bt
> > > > > > > #0  0x09fd67ef09dc in
> > > > > > > libunwind::UnwindInfoSectionsCache::CacheTree_RB_INSERT_COLOR
> > > > > > > (this=, 
> > > > > > > head=0x9fd67efc8e8 , 
> > > > > > > elm=0x9fca04be900)
> > > > > > > at 
> > > > > > > /usr/src/gnu/lib/libcxxabi/../../../gnu/llvm/libunwind/src/AddressSpace.hpp:243
> > > > > > > #1  0x09fd67eeddef in
> > > > > > > libunwind::UnwindInfoSectionsCache::CacheTree_RB_INSERT
> > > > > > > (this=, 
> > > > > > > head=, elm=)
> > > > > > > at 
> > > > > > > /usr/src/gnu/lib/libcxxabi/../../../gnu/llvm/libunwind/src/AddressSpace.hpp:243
> > > > > > > #2  libunwind::UnwindInfoSectionsCache::setUnwindInfoSectionsForPC
> > > > > > > (this=, key=10983975073074, 
> > > > > > > uis=...) at 
> > > > > > > /usr/src/gnu/lib/libcxxabi/../../../gnu/llvm/libunwind/src/AddressSpace.hpp:237
> > > > > > > #3  libunwind::UnwindCursor > > > > > > libunwind::Registers_x86_64>::setInfoBasedOnIPRegister (
> > > > > > > this=0x9fd2ca0aa68, isReturnAddress=)
> > > > > > > at 
> > > > > > > /usr/src/gnu/lib/libcxxabi/../../../gnu/llvm/libunwind/src/UnwindCursor.hpp:1891
> > > > > > > #4  0x09fd67eedaa4 in
> > > > > > > libunwind::UnwindCursor > > > > > > libunwind::Registers_x86_64>::step (
> > > > > > > this=0x9fd2ca0aa68) at 
> > > > > > > /usr/src/gnu/lib/libcxxabi/../../../gnu/llvm/libunwind/src/UnwindCursor.hpp:2031
> > > > > > > #5  0x09fd67ef15a4 in unwind_phase1 (uc=,
> > > > > > > cursor=, exception_object=0x9fd37b24560)
> > > > > > > at 
> > > > > > > /usr/src/gnu/lib/libcxxabi/../../../gnu/llvm/libunwind/src/UnwindLevel1.c:46
> > > > > > > #6  _Unwind_RaiseException (exception_object=0x9fd37b24560)
> > > > > > > at 
> > > > > > > /usr/src/gnu/lib/libcxxabi/../../../gnu/llvm/libunwind/src/UnwindLevel1.c:363
> > > > > > > #7  0x09fd67eeb533 in __cxa_throw 
> > > > > > > (thrown_object=0x9fd37b24580, 
> > > > > > > tinfo=0x9fa6c615a00 , 
> > > > > > > dest=)
> > > > > > > at 
> > > > > > > /usr/src/gnu/lib/libcxxabi/../../../gnu/llvm/libcxxabi/src/cxa_exception.cpp:279
> > > > > > > #8  0x09fa6c295955 in ComboAddress::ComboAddress 
> > > > > > > (this= > > > > > > out>, str=..., port=)
> > > > > > > at ./iputils.hh:219
> > > > > > >

Re: occasional SSIGSEGV on C++ exception handling

2021-02-20 Thread Otto Moerbeek
On Fri, Feb 19, 2021 at 05:29:31PM +0100, Mark Kettenis wrote:

> > Date: Fri, 19 Feb 2021 16:43:10 +0100
> > From: Otto Moerbeek 
> > 
> > On Fri, Feb 19, 2021 at 01:06:43PM +0100, Otto Moerbeek wrote:
> > 
> > > On Fri, Feb 19, 2021 at 12:45:58PM +0100, Mark Kettenis wrote:
> > > 
> > > > > Date: Fri, 19 Feb 2021 10:57:30 +0100
> > > > > From: Otto Moerbeek 
> > > > > 
> > > > > Hi,
> > > > > 
> > > > > working on PowerDNS Recursor, once in a while I'm seeing:
> > > > > 
> > > > > #0  0x09fd67ef09dc in
> > > > > libunwind::UnwindInfoSectionsCache::CacheTree_RB_INSERT_COLOR
> > > > > (this=, 
> > > > > head=0x9fd67efc8e8 , elm=0x9fca04be900)
> > > > > at 
> > > > > /usr/src/gnu/lib/libcxxabi/../../../gnu/llvm/libunwind/src/AddressSpace.hpp:243
> > > > > 243   RB_GENERATE(CacheTree, CacheItem, entry, CacheCmp);
> > > > > [Current thread is 1 (process 349420)]
> > > > > (gdb) bt
> > > > > #0  0x09fd67ef09dc in
> > > > > libunwind::UnwindInfoSectionsCache::CacheTree_RB_INSERT_COLOR
> > > > > (this=, 
> > > > > head=0x9fd67efc8e8 , elm=0x9fca04be900)
> > > > > at 
> > > > > /usr/src/gnu/lib/libcxxabi/../../../gnu/llvm/libunwind/src/AddressSpace.hpp:243
> > > > > #1  0x09fd67eeddef in
> > > > > libunwind::UnwindInfoSectionsCache::CacheTree_RB_INSERT
> > > > > (this=, 
> > > > > head=, elm=)
> > > > > at 
> > > > > /usr/src/gnu/lib/libcxxabi/../../../gnu/llvm/libunwind/src/AddressSpace.hpp:243
> > > > > #2  libunwind::UnwindInfoSectionsCache::setUnwindInfoSectionsForPC
> > > > > (this=, key=10983975073074, 
> > > > > uis=...) at 
> > > > > /usr/src/gnu/lib/libcxxabi/../../../gnu/llvm/libunwind/src/AddressSpace.hpp:237
> > > > > #3  libunwind::UnwindCursor > > > > libunwind::Registers_x86_64>::setInfoBasedOnIPRegister (
> > > > > this=0x9fd2ca0aa68, isReturnAddress=)
> > > > > at 
> > > > > /usr/src/gnu/lib/libcxxabi/../../../gnu/llvm/libunwind/src/UnwindCursor.hpp:1891
> > > > > #4  0x09fd67eedaa4 in
> > > > > libunwind::UnwindCursor > > > > libunwind::Registers_x86_64>::step (
> > > > > this=0x9fd2ca0aa68) at 
> > > > > /usr/src/gnu/lib/libcxxabi/../../../gnu/llvm/libunwind/src/UnwindCursor.hpp:2031
> > > > > #5  0x09fd67ef15a4 in unwind_phase1 (uc=,
> > > > > cursor=, exception_object=0x9fd37b24560)
> > > > > at 
> > > > > /usr/src/gnu/lib/libcxxabi/../../../gnu/llvm/libunwind/src/UnwindLevel1.c:46
> > > > > #6  _Unwind_RaiseException (exception_object=0x9fd37b24560)
> > > > > at 
> > > > > /usr/src/gnu/lib/libcxxabi/../../../gnu/llvm/libunwind/src/UnwindLevel1.c:363
> > > > > #7  0x09fd67eeb533 in __cxa_throw (thrown_object=0x9fd37b24580, 
> > > > > tinfo=0x9fa6c615a00 , dest= > > > > out>)
> > > > > at 
> > > > > /usr/src/gnu/lib/libcxxabi/../../../gnu/llvm/libcxxabi/src/cxa_exception.cpp:279
> > > > > #8  0x09fa6c295955 in ComboAddress::ComboAddress (this= > > > > out>, str=..., port=)
> > > > > at ./iputils.hh:219
> > > > > #9  0x09fa6c489970 in startFrameStreamServers (config=...) at 
> > > > > pdns_recursor.cc:1248
> > > > > #10 checkFrameStreamExport (luaconfsLocal=...) at 
> > > > > pdns_recursor.cc:1290
> > > > > #11 0x09fa6c48158f in recursorThread (n=,
> > > > > ...
> > > > > 
> > > > > This does not happen always, most of the time this exception is
> > > > > handled correctly, afaik.
> > > > > 
> > > > > > The code that throws an exception is:
> > > > >   try {
> > > > > ComboAddress address(server);
> > > > > ...
> > > > >   }
> > > > >   catch ...
> > > > > 
> > > > > The ComboAddress constructor throws the exception (and is supposed to
> > > > > do that). It looks like libunwind gets confused somehow.
> > > > > 
> > > > > Any clue?
> > > > 
> > > > T

Re: occasional SSIGSEGV on C++ exception handling

2021-02-19 Thread Otto Moerbeek
On Fri, Feb 19, 2021 at 01:06:43PM +0100, Otto Moerbeek wrote:

> On Fri, Feb 19, 2021 at 12:45:58PM +0100, Mark Kettenis wrote:
> 
> > > Date: Fri, 19 Feb 2021 10:57:30 +0100
> > > From: Otto Moerbeek 
> > > 
> > > Hi,
> > > 
> > > working on PowerDNS Recursor, once in a while I'm seeing:
> > > 
> > > #0  0x09fd67ef09dc in
> > > libunwind::UnwindInfoSectionsCache::CacheTree_RB_INSERT_COLOR
> > > (this=, 
> > > head=0x9fd67efc8e8 , elm=0x9fca04be900)
> > > at 
> > > /usr/src/gnu/lib/libcxxabi/../../../gnu/llvm/libunwind/src/AddressSpace.hpp:243
> > > 243   RB_GENERATE(CacheTree, CacheItem, entry, CacheCmp);
> > > [Current thread is 1 (process 349420)]
> > > (gdb) bt
> > > #0  0x09fd67ef09dc in
> > > libunwind::UnwindInfoSectionsCache::CacheTree_RB_INSERT_COLOR
> > > (this=, 
> > > head=0x9fd67efc8e8 , elm=0x9fca04be900)
> > > at 
> > > /usr/src/gnu/lib/libcxxabi/../../../gnu/llvm/libunwind/src/AddressSpace.hpp:243
> > > #1  0x09fd67eeddef in
> > > libunwind::UnwindInfoSectionsCache::CacheTree_RB_INSERT
> > > (this=, 
> > > head=, elm=)
> > > at 
> > > /usr/src/gnu/lib/libcxxabi/../../../gnu/llvm/libunwind/src/AddressSpace.hpp:243
> > > #2  libunwind::UnwindInfoSectionsCache::setUnwindInfoSectionsForPC
> > > (this=, key=10983975073074, 
> > > uis=...) at 
> > > /usr/src/gnu/lib/libcxxabi/../../../gnu/llvm/libunwind/src/AddressSpace.hpp:237
> > > #3  libunwind::UnwindCursor > > libunwind::Registers_x86_64>::setInfoBasedOnIPRegister (
> > > this=0x9fd2ca0aa68, isReturnAddress=)
> > > at 
> > > /usr/src/gnu/lib/libcxxabi/../../../gnu/llvm/libunwind/src/UnwindCursor.hpp:1891
> > > #4  0x09fd67eedaa4 in
> > > libunwind::UnwindCursor > > libunwind::Registers_x86_64>::step (
> > > this=0x9fd2ca0aa68) at 
> > > /usr/src/gnu/lib/libcxxabi/../../../gnu/llvm/libunwind/src/UnwindCursor.hpp:2031
> > > #5  0x09fd67ef15a4 in unwind_phase1 (uc=,
> > > cursor=, exception_object=0x9fd37b24560)
> > > at 
> > > /usr/src/gnu/lib/libcxxabi/../../../gnu/llvm/libunwind/src/UnwindLevel1.c:46
> > > #6  _Unwind_RaiseException (exception_object=0x9fd37b24560)
> > > at 
> > > /usr/src/gnu/lib/libcxxabi/../../../gnu/llvm/libunwind/src/UnwindLevel1.c:363
> > > #7  0x09fd67eeb533 in __cxa_throw (thrown_object=0x9fd37b24580, 
> > > tinfo=0x9fa6c615a00 , dest= > > out>)
> > > at 
> > > /usr/src/gnu/lib/libcxxabi/../../../gnu/llvm/libcxxabi/src/cxa_exception.cpp:279
> > > #8  0x09fa6c295955 in ComboAddress::ComboAddress (this= > > out>, str=..., port=)
> > > at ./iputils.hh:219
> > > #9  0x09fa6c489970 in startFrameStreamServers (config=...) at 
> > > pdns_recursor.cc:1248
> > > #10 checkFrameStreamExport (luaconfsLocal=...) at pdns_recursor.cc:1290
> > > #11 0x09fa6c48158f in recursorThread (n=,
> > > ...
> > > 
> > > This does not happen always, most of the time this exception is
> > > handled correctly, afaik.
> > > 
> > > > The code that throws an exception is:
> > >   try {
> > > ComboAddress address(server);
> > > ...
> > >   }
> > >   catch ...
> > > 
> > > The ComboAddress constructor throws the exception (and is supposed to
> > > do that). It looks like libunwind gets confused somehow.
> > > 
> > > Any clue?
> > 
> > The cache that pirofti@ added a while ago isn't thread-safe.  Or maybe
> > this is a use-after free caused by dlclose(3).  We should probably
> > disable/remove it while he is working on a better solution.
> > Unfortunately I don't think adding locking here is a good idea so this
> > may need a more fundamental rethink.
> > 
> > Upstream did add an optimization in this area a few months ago:
> > 
> >   
> > https://github.com/llvm/llvm-project/commit/881aba7071c6e4cc2417e875ca5027ec7c0a92a3
> > 
> > The version of libunwind we're using is older than that, so it may be
> > worth picking that up and see if that improves the original problem.
> 
> First I'm going to try to fix it by making the cache thread_local.
> 
> I'm probably going to regret looking at this code,
> 
>   -Otto

The diff below works for my test case on amd64.

It also feels right from a theoretical point of view. As for practical
matters

Re: occasional SSIGSEGV on C++ exception handling

2021-02-19 Thread Otto Moerbeek
On Fri, Feb 19, 2021 at 12:45:58PM +0100, Mark Kettenis wrote:

> > Date: Fri, 19 Feb 2021 10:57:30 +0100
> > From: Otto Moerbeek 
> > 
> > Hi,
> > 
> > working on PowerDNS Recursor, once in a while I'm seeing:
> > 
> > #0  0x09fd67ef09dc in
> > libunwind::UnwindInfoSectionsCache::CacheTree_RB_INSERT_COLOR
> > (this=, 
> > head=0x9fd67efc8e8 , elm=0x9fca04be900)
> > at 
> > /usr/src/gnu/lib/libcxxabi/../../../gnu/llvm/libunwind/src/AddressSpace.hpp:243
> > 243   RB_GENERATE(CacheTree, CacheItem, entry, CacheCmp);
> > [Current thread is 1 (process 349420)]
> > (gdb) bt
> > #0  0x09fd67ef09dc in
> > libunwind::UnwindInfoSectionsCache::CacheTree_RB_INSERT_COLOR
> > (this=, 
> > head=0x9fd67efc8e8 , elm=0x9fca04be900)
> > at 
> > /usr/src/gnu/lib/libcxxabi/../../../gnu/llvm/libunwind/src/AddressSpace.hpp:243
> > #1  0x09fd67eeddef in
> > libunwind::UnwindInfoSectionsCache::CacheTree_RB_INSERT
> > (this=, 
> > head=, elm=)
> > at 
> > /usr/src/gnu/lib/libcxxabi/../../../gnu/llvm/libunwind/src/AddressSpace.hpp:243
> > #2  libunwind::UnwindInfoSectionsCache::setUnwindInfoSectionsForPC
> > (this=, key=10983975073074, 
> > uis=...) at 
> > /usr/src/gnu/lib/libcxxabi/../../../gnu/llvm/libunwind/src/AddressSpace.hpp:237
> > #3  libunwind::UnwindCursor<libunwind::LocalAddressSpace, libunwind::Registers_x86_64>::setInfoBasedOnIPRegister (
> > this=0x9fd2ca0aa68, isReturnAddress=)
> > at 
> > /usr/src/gnu/lib/libcxxabi/../../../gnu/llvm/libunwind/src/UnwindCursor.hpp:1891
> > #4  0x09fd67eedaa4 in
> > libunwind::UnwindCursor<libunwind::LocalAddressSpace, libunwind::Registers_x86_64>::step (
> > this=0x9fd2ca0aa68) at 
> > /usr/src/gnu/lib/libcxxabi/../../../gnu/llvm/libunwind/src/UnwindCursor.hpp:2031
> > #5  0x09fd67ef15a4 in unwind_phase1 (uc=,
> > cursor=, exception_object=0x9fd37b24560)
> > at 
> > /usr/src/gnu/lib/libcxxabi/../../../gnu/llvm/libunwind/src/UnwindLevel1.c:46
> > #6  _Unwind_RaiseException (exception_object=0x9fd37b24560)
> > at 
> > /usr/src/gnu/lib/libcxxabi/../../../gnu/llvm/libunwind/src/UnwindLevel1.c:363
> > #7  0x09fd67eeb533 in __cxa_throw (thrown_object=0x9fd37b24580, 
> > tinfo=0x9fa6c615a00 , dest=)
> > at 
> > /usr/src/gnu/lib/libcxxabi/../../../gnu/llvm/libcxxabi/src/cxa_exception.cpp:279
> > #8  0x09fa6c295955 in ComboAddress::ComboAddress (this=<optimized out>, str=..., port=)
> > at ./iputils.hh:219
> > #9  0x09fa6c489970 in startFrameStreamServers (config=...) at 
> > pdns_recursor.cc:1248
> > #10 checkFrameStreamExport (luaconfsLocal=...) at pdns_recursor.cc:1290
> > #11 0x09fa6c48158f in recursorThread (n=,
> > ...
> > 
> > This does not happen always, most of the time this exception is
> > handled correctly, afaik.
> > 
> > The code that throws an exception is:
> >   try {
> > ComboAddress address(server);
> > ...
> >   }
> >   catch ...
> > 
> > The ComboAddress constructor throws the exception (and is supposed to
> > do that). It looks like libunwind gets confused somehow.
> > 
> > Any clue?
> 
> The cache that pirofti@ added a while ago isn't thread-safe.  Or maybe
> this is a use-after-free caused by dlclose(3).  We should probably
> disable/remove it while he is working on a better solution.
> Unfortunately I don't think adding locking here is a good idea so this
> may need a more fundamental rethink.
> 
> Upstream did add an optimization in this area a few months ago:
> 
>   
> https://github.com/llvm/llvm-project/commit/881aba7071c6e4cc2417e875ca5027ec7c0a92a3
> 
> The version of libunwind we're using is older than that, so it may be
> worth picking that up and see if that improves the original problem.

First I'm going to try to fix it by making the cache thread_local.

I'm probably going to regret looking at this code,

-Otto

Index: UnwindCursor.hpp
===
RCS file: /cvs/src/gnu/llvm/libunwind/src/UnwindCursor.hpp,v
retrieving revision 1.2
diff -u -p -r1.2 UnwindCursor.hpp
--- UnwindCursor.hpp2 Jan 2021 01:10:02 -   1.2
+++ UnwindCursor.hpp19 Feb 2021 12:05:26 -
@@ -75,7 +75,7 @@ extern "C" _Unwind_Reason_Code __libunwi
 
 namespace libunwind {
 
-static UnwindInfoSectionsCache uwis_cache;
+static thread_local UnwindInfoSectionsCache uwis_cache;
 
 #if defined(_LIBUNWIND_SUPPORT_DWARF_UNWIND)
 /// Cache of recently found FDEs.



occasional SIGSEGV on C++ exception handling

2021-02-19 Thread Otto Moerbeek
Hi,

working on PowerDNS Recursor, once in a while I'm seeing:

#0  0x09fd67ef09dc in
libunwind::UnwindInfoSectionsCache::CacheTree_RB_INSERT_COLOR
(this=, 
head=0x9fd67efc8e8 , elm=0x9fca04be900)
at 
/usr/src/gnu/lib/libcxxabi/../../../gnu/llvm/libunwind/src/AddressSpace.hpp:243
243   RB_GENERATE(CacheTree, CacheItem, entry, CacheCmp);
[Current thread is 1 (process 349420)]
(gdb) bt
#0  0x09fd67ef09dc in
libunwind::UnwindInfoSectionsCache::CacheTree_RB_INSERT_COLOR
(this=, 
head=0x9fd67efc8e8 , elm=0x9fca04be900)
at 
/usr/src/gnu/lib/libcxxabi/../../../gnu/llvm/libunwind/src/AddressSpace.hpp:243
#1  0x09fd67eeddef in
libunwind::UnwindInfoSectionsCache::CacheTree_RB_INSERT
(this=, 
head=, elm=)
at 
/usr/src/gnu/lib/libcxxabi/../../../gnu/llvm/libunwind/src/AddressSpace.hpp:243
#2  libunwind::UnwindInfoSectionsCache::setUnwindInfoSectionsForPC
(this=, key=10983975073074, 
uis=...) at 
/usr/src/gnu/lib/libcxxabi/../../../gnu/llvm/libunwind/src/AddressSpace.hpp:237
#3  libunwind::UnwindCursor<libunwind::LocalAddressSpace, libunwind::Registers_x86_64>::setInfoBasedOnIPRegister (
this=0x9fd2ca0aa68, isReturnAddress=)
at 
/usr/src/gnu/lib/libcxxabi/../../../gnu/llvm/libunwind/src/UnwindCursor.hpp:1891
#4  0x09fd67eedaa4 in
libunwind::UnwindCursor<libunwind::LocalAddressSpace, libunwind::Registers_x86_64>::step (
this=0x9fd2ca0aa68) at 
/usr/src/gnu/lib/libcxxabi/../../../gnu/llvm/libunwind/src/UnwindCursor.hpp:2031
#5  0x09fd67ef15a4 in unwind_phase1 (uc=,
cursor=, exception_object=0x9fd37b24560)
at 
/usr/src/gnu/lib/libcxxabi/../../../gnu/llvm/libunwind/src/UnwindLevel1.c:46
#6  _Unwind_RaiseException (exception_object=0x9fd37b24560)
at 
/usr/src/gnu/lib/libcxxabi/../../../gnu/llvm/libunwind/src/UnwindLevel1.c:363
#7  0x09fd67eeb533 in __cxa_throw (thrown_object=0x9fd37b24580, 
tinfo=0x9fa6c615a00 , dest=)
at 
/usr/src/gnu/lib/libcxxabi/../../../gnu/llvm/libcxxabi/src/cxa_exception.cpp:279
#8  0x09fa6c295955 in ComboAddress::ComboAddress (this=, str=..., port=)
at ./iputils.hh:219
#9  0x09fa6c489970 in startFrameStreamServers (config=...) at 
pdns_recursor.cc:1248
#10 checkFrameStreamExport (luaconfsLocal=...) at pdns_recursor.cc:1290
#11 0x09fa6c48158f in recursorThread (n=,
...

This does not happen always, most of the time this exception is
handled correctly, afaik.

The code that throws an exception is:
  try {
ComboAddress address(server);
...
  }
  catch ...

The ComboAddress constructor throws the exception (and is supposed to
do that). It looks like libunwind gets confused somehow.

Any clue?

-Otto



Re: if calloc() needs nmemb and size, why doesn't freezero()?

2021-02-18 Thread Otto Moerbeek
On Thu, Feb 18, 2021 at 03:24:36PM -0600, Luke Small wrote:

> However, calloc(ptr, nmemb, size) may have been called using smaller int
> variable types which would overflow when multiplied. Where if the variables
> storing the values passed to nmemb and size are less than or especially
> equal to their original values, I think it’d be good to state that:
> 
> freezero(ptr, (size_t)nmemb * (size_t)size);
> is guaranteed to work, but
> freezero(ptr, nmemb * size);
> does not have that guarantee.

Let's try to make things explicit.

The function c() does the overflow check like calloc does.
The function f() takes a size_t.

#include <limits.h>
#include <stdio.h>

#define MUL_NO_OVERFLOW (1UL << (sizeof(size_t) * 4))

void c(size_t nmemb, size_t size)
{
if ((nmemb >= MUL_NO_OVERFLOW || size >= MUL_NO_OVERFLOW) &&
nmemb > 0 && SIZE_T_MAX / nmemb < size)
printf("Overflow\n");
else
printf("%zu\n", nmemb * size);
}

void f(size_t m)
{
printf("%zu\n", m);
}

int
main()
{
int a = INT_MAX;
int b = INT_MAX;
c(a, b);
f(a * b); 
}

Now the issue is that the multiplication in the last line of main()
overflows:

$ ./a.out
4611686014132420609
1

because this is an int multiplication; only after that is the result
converted to size_t.

So you are right that this can happen, *if you are using the wrong
types*. But I would argue that feeding anything other than either
size_t or constants to calloc() is already wrong. You *have* to
consider the argument conversion rules when feeding values to calloc()
(or any function). To avoid having to think about those, start with
size_t already for everything that is a size or count of a memory
object.
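
To illustrate the conversion rule above, here is a minimal sketch that is
not from the original program. The helper names mul_narrow() and mul_wide()
are made up for this example. It uses unsigned int rather than int, because
signed overflow is undefined behaviour in C while unsigned arithmetic wraps
in a defined way, and it assumes a 32-bit unsigned int.

```c
#include <stddef.h>

/*
 * Hypothetical helper: the operands are unsigned int, so the
 * multiplication is done in unsigned int and may wrap. Only the
 * already wrapped result is converted to size_t on return.
 */
size_t
mul_narrow(unsigned int nmemb, unsigned int size)
{
	return nmemb * size;
}

/*
 * Hypothetical helper: the operands are size_t from the start, so the
 * multiplication itself is done in size_t.
 */
size_t
mul_wide(size_t nmemb, size_t size)
{
	return nmemb * size;
}
```

With a 32-bit unsigned int and a 64-bit size_t, mul_narrow(65536, 65536)
yields 0, while mul_wide(65536, 65536) yields 4294967296.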

-Otto



Re: malloc junking tweaks

2021-02-18 Thread Otto Moerbeek
On Fri, Feb 12, 2021 at 02:48:34PM +0100, Otto Moerbeek wrote:

> On Fri, Feb 12, 2021 at 02:18:08PM +0100, Otto Moerbeek wrote:
> 
> > Hi,
> > 
> > > Currently, junking is done like this:
> > 
> > - for small chunk, the whole chunk junked on free
> > 
> > - for pages sized and larger, the first half a page is filled
> > 
> > - after a delayed free, the first 32 bytes of small chunks are
> > validated to not be overwritten
> > 
> > - page sized and larger allocations are not validated at all, even if
> > they end up in the cache.
> > 
> > This diff changes the following:
> > 
> > - I make use of the fact that I know how the chunks are aligned, and
> > write 8 bytes at the time by using a uint64_t pointer. For an
> > allocation a max of 4 such uint64_t's are written spread over the
> > allocation. For pages sized and larger, the first page is junked in
> > such a way.
> > 
> > > - Delayed free of a small chunk checks the corresponding way.
> > 
> > - Pages ending up in the cache are validated upon unmapping or re-use.
> > 
> > The last point is the real gain: we also check for write-after-free
> > for large allocations, which we did not do before.
> > 
> > So we are catching more writes-after-frees. A price to pay is that
> > larger chunks are not completely junked, but only a total of 32 bytes
> > are. I chose this number after comparing performance with the current
> > code: we still gain a bit in speed.
> > 
> > Junk mode 0 (j) and junk mode 2 (J) are not changed.
> > 
> > Please test and review,
> > 
> > -Otto
> > 
> 
> And now with correct version of diff

Any feedback?

-Otto
> 
> 
> Index: stdlib/malloc.3
> ===
> RCS file: /cvs/src/lib/libc/stdlib/malloc.3,v
> retrieving revision 1.126
> diff -u -p -r1.126 malloc.3
> --- stdlib/malloc.3   14 Sep 2019 13:16:50 -  1.126
> +++ stdlib/malloc.3   12 Feb 2021 08:14:54 -
> @@ -619,7 +619,7 @@ or
>  reallocate an unallocated pointer was made.
>  .It Dq chunk is already free
>  There was an attempt to free a chunk that had already been freed.
> -.It Dq use after free
> +.It Dq write after free
>  A chunk has been modified after it was freed.
>  .It Dq modified chunk-pointer
>  The pointer passed to
> Index: stdlib/malloc.c
> ===
> RCS file: /cvs/src/lib/libc/stdlib/malloc.c,v
> retrieving revision 1.267
> diff -u -p -r1.267 malloc.c
> --- stdlib/malloc.c   23 Nov 2020 15:42:11 -  1.267
> +++ stdlib/malloc.c   12 Feb 2021 08:14:54 -
> @@ -89,6 +89,7 @@
>   */
>  #define SOME_JUNK0xdb/* deadbeef */
>  #define SOME_FREEJUNK0xdf/* dead, free */
> +#define SOME_FREEJUNK_ULL0xdfdfdfdfdfdfdfdfULL
>  
>  #define MMAP(sz,f)   mmap(NULL, (sz), PROT_READ | PROT_WRITE, \
>  MAP_ANON | MAP_PRIVATE | (f), -1, 0)
> @@ -655,6 +656,49 @@ delete(struct dir_info *d, struct region
>   }
>  }
>  
> +static inline void
> +junk_free(int junk, void *p, size_t sz)
> +{
> + size_t i, step = 1;
> + uint64_t *lp = p;
> +
> + if (junk == 0 || sz == 0)
> + return;
> + sz /= sizeof(uint64_t);
> + if (junk == 1) {
> + if (sz > MALLOC_PAGESIZE / sizeof(uint64_t))
> + sz = MALLOC_PAGESIZE / sizeof(uint64_t);
> + step = sz / 4;
> + if (step == 0)
> + step = 1;
> + }
> + for (i = 0; i < sz; i += step)
> + lp[i] = SOME_FREEJUNK_ULL;
> +}
> +
> +static inline void
> +validate_junk(struct dir_info *pool, void *p, size_t sz)
> +{
> + size_t i, step = 1;
> + uint64_t *lp = p;
> +
> + if (pool->malloc_junk == 0 || sz == 0)
> + return;
> + sz /= sizeof(uint64_t);
> + if (pool->malloc_junk == 1) {
> + if (sz > MALLOC_PAGESIZE / sizeof(uint64_t))
> + sz = MALLOC_PAGESIZE / sizeof(uint64_t);
> + step = sz / 4;
> + if (step == 0)
> + step = 1;
> + }
> + for (i = 0; i < sz; i += step) {
> + if (lp[i] != SOME_FREEJUNK_ULL)
> + wrterror(pool, "write after free %p", p);
> + }
> +}
> +
> +
>  /*
>   * Cache maintenance. We keep at most malloc_cache pages cached.
>   * If the cache is becoming full, unmap pages in the cache for real,
> @@ -663,7 +707,7 @@ delete(struct d

Re: if calloc() needs nmemb and size, why doesn't freezero()?

2021-02-18 Thread Otto Moerbeek
On Wed, Feb 17, 2021 at 11:05:49AM -0700, Theo de Raadt wrote:

> Luke Small  wrote:
> 
> > I guess I always thought there'd be some more substantial overflow 
> > mitigation.
> 
> You have to free with the exact same size as allocation.

Small correction: the size may be smaller than the original. In that
case, only a partial clear is guaranteed, the deallocation will still
be for the full allocation. Originally we were more strict, but iirc
that was causing too many headaches for some. See
https://cvsweb.openbsd.org/src/lib/libc/stdlib/malloc.c?rev=1.221

But the point stands: nmemb * size does not overflow, since the
original allocation would have overflowed and thus failed.
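
A minimal sketch of the "compute once, reuse" pattern this relies on. The
struct sized_buf and the functions sized_alloc() and sized_free() are
hypothetical names for this example only. The non-OpenBSD branch is a
stand-in for freezero(), not its real implementation.

```c
#include <stdlib.h>

/* Hypothetical wrapper that remembers the byte count of an allocation. */
struct sized_buf {
	void	*p;
	size_t	 len;
};

/* Returns 0 on success, -1 if calloc() failed (including on overflow). */
int
sized_alloc(struct sized_buf *b, size_t nmemb, size_t size)
{
	b->len = 0;
	b->p = calloc(nmemb, size);	/* calloc() does the overflow check */
	if (b->p == NULL)
		return -1;
	b->len = nmemb * size;		/* cannot overflow, calloc() succeeded */
	return 0;
}

/* Clear and release the buffer using the size stored at allocation. */
void
sized_free(struct sized_buf *b)
{
#ifdef __OpenBSD__
	freezero(b->p, b->len);
#else
	/* Stand-in for freezero(): clear through a volatile pointer. */
	volatile unsigned char *vp = b->p;
	size_t i;

	for (i = 0; i < b->len; i++)
		vp[i] = 0;
	free(b->p);
#endif
	b->p = NULL;
	b->len = 0;
}
```

The multiplication appears exactly once, right after the allocation that
already validated it, so the free path never has to redo or re-check it.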

-Otto

> 
> nmemb and size did not change.
> 
> The math has already been checked, and regular codeflows will store the
> multiple in a single variable after successful checking, for
> reuse.
> 
> > Would it be too much hand-holding to put in the manpage that to avoid 
> > potential
> > freeezero() integer overflow,
> > it may be useful to run freezero() as freezero((size_t)nmemb * 
> > (size_t)size);
> 
> Wow, Those casts make it very clear you don't understand C, if you do
> that kind of stuff elsewhere you are introducing problems.
> 
> Sorry no you are wrong.
> 



Re: malloc junking tweaks

2021-02-12 Thread Otto Moerbeek
On Fri, Feb 12, 2021 at 02:18:08PM +0100, Otto Moerbeek wrote:

> Hi,
> 
> Currently, junking is done like this:
> 
> - for small chunk, the whole chunk junked on free
> 
> - for pages sized and larger, the first half a page is filled
> 
> - after a delayed free, the first 32 bytes of small chunks are
> validated to not be overwritten
> 
> - page sized and larger allocations are not validated at all, even if
> they end up in the cache.
> 
> This diff changes the following:
> 
> - I make use of the fact that I know how the chunks are aligned, and
> write 8 bytes at the time by using a uint64_t pointer. For an
> allocation a max of 4 such uint64_t's are written spread over the
> allocation. For pages sized and larger, the first page is junked in
> such a way.
> 
> - Delayed free of a small chunk checks the corresponding way.
> 
> - Pages ending up in the cache are validated upon unmapping or re-use.
> 
> The last point is the real gain: we also check for write-after-free
> for large allocations, which we did not do before.
> 
> So we are catching more writes-after-frees. A price to pay is that
> larger chunks are not completely junked, but only a total of 32 bytes
> are. I chose this number after comparing performance with the current
> code: we still gain a bit in speed.
> 
> Junk mode 0 (j) and junk mode 2 (J) are not changed.
> 
> Please test and review,
> 
>   -Otto
> 

And now with correct version of diff

-Otto


Index: stdlib/malloc.3
===
RCS file: /cvs/src/lib/libc/stdlib/malloc.3,v
retrieving revision 1.126
diff -u -p -r1.126 malloc.3
--- stdlib/malloc.3 14 Sep 2019 13:16:50 -  1.126
+++ stdlib/malloc.3 12 Feb 2021 08:14:54 -
@@ -619,7 +619,7 @@ or
 reallocate an unallocated pointer was made.
 .It Dq chunk is already free
 There was an attempt to free a chunk that had already been freed.
-.It Dq use after free
+.It Dq write after free
 A chunk has been modified after it was freed.
 .It Dq modified chunk-pointer
 The pointer passed to
Index: stdlib/malloc.c
===
RCS file: /cvs/src/lib/libc/stdlib/malloc.c,v
retrieving revision 1.267
diff -u -p -r1.267 malloc.c
--- stdlib/malloc.c 23 Nov 2020 15:42:11 -  1.267
+++ stdlib/malloc.c 12 Feb 2021 08:14:54 -
@@ -89,6 +89,7 @@
  */
 #define SOME_JUNK  0xdb/* deadbeef */
 #define SOME_FREEJUNK  0xdf/* dead, free */
+#define SOME_FREEJUNK_ULL  0xdfdfdfdfdfdfdfdfULL
 
 #define MMAP(sz,f) mmap(NULL, (sz), PROT_READ | PROT_WRITE, \
 MAP_ANON | MAP_PRIVATE | (f), -1, 0)
@@ -655,6 +656,49 @@ delete(struct dir_info *d, struct region
}
 }
 
+static inline void
+junk_free(int junk, void *p, size_t sz)
+{
+   size_t i, step = 1;
+   uint64_t *lp = p;
+
+   if (junk == 0 || sz == 0)
+   return;
+   sz /= sizeof(uint64_t);
+   if (junk == 1) {
+   if (sz > MALLOC_PAGESIZE / sizeof(uint64_t))
+   sz = MALLOC_PAGESIZE / sizeof(uint64_t);
+   step = sz / 4;
+   if (step == 0)
+   step = 1;
+   }
+   for (i = 0; i < sz; i += step)
+   lp[i] = SOME_FREEJUNK_ULL;
+}
+
+static inline void
+validate_junk(struct dir_info *pool, void *p, size_t sz)
+{
+   size_t i, step = 1;
+   uint64_t *lp = p;
+
+   if (pool->malloc_junk == 0 || sz == 0)
+   return;
+   sz /= sizeof(uint64_t);
+   if (pool->malloc_junk == 1) {
+   if (sz > MALLOC_PAGESIZE / sizeof(uint64_t))
+   sz = MALLOC_PAGESIZE / sizeof(uint64_t);
+   step = sz / 4;
+   if (step == 0)
+   step = 1;
+   }
+   for (i = 0; i < sz; i += step) {
+   if (lp[i] != SOME_FREEJUNK_ULL)
+   wrterror(pool, "write after free %p", p);
+   }
+}
+
+
 /*
  * Cache maintenance. We keep at most malloc_cache pages cached.
  * If the cache is becoming full, unmap pages in the cache for real,
@@ -663,7 +707,7 @@ delete(struct dir_info *d, struct region
  * cache are in MALLOC_PAGESIZE units.
  */
 static void
-unmap(struct dir_info *d, void *p, size_t sz, size_t clear, int junk)
+unmap(struct dir_info *d, void *p, size_t sz, size_t clear)
 {
size_t psz = sz >> MALLOC_PAGESHIFT;
size_t rsz;
@@ -695,6 +739,8 @@ unmap(struct dir_info *d, void *p, size_
r = &d->free_regions[(i + offset) & mask];
if (r->p != NULL) {
rsz = r->size << MALLOC_PAGESHIFT;
+   if (!mopts.malloc_freeunmap)
+   validate_junk(d, r->p, rsz);
  

malloc junking tweaks

2021-02-12 Thread Otto Moerbeek
Hi,

Currently, junking is done like this:

- for small chunk, the whole chunk junked on free

- for pages sized and larger, the first half a page is filled

- after a delayed free, the first 32 bytes of small chunks are
validated to not be overwritten

- page sized and larger allocations are not validated at all, even if
they end up in the cache.

This diff changes the following:

- I make use of the fact that I know how the chunks are aligned, and
write 8 bytes at the time by using a uint64_t pointer. For an
allocation a max of 4 such uint64_t's are written spread over the
allocation. For pages sized and larger, the first page is junked in
such a way.

- Delayed free of a small chunk checks the corresponding way.

- Pages ending up in the cache are validated upon unmapping or re-use.

The last point is the real gain: we also check for write-after-free
for large allocations, which we did not do before.

So we are catching more writes-after-frees. A price to pay is that
larger chunks are not completely junked, but only a total of 32 bytes
are. I chose this number after comparing performance with the current
code: we still gain a bit in speed.
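
To make the trade-off concrete, here is a standalone sketch of the sampled
fill and check for junk mode 1 only. It is a simplified model, not the
malloc.c code: PAGE_BYTES, junk_plan(), junk_fill() and junk_intact() are
names invented for this example, the page size of 4096 is an assumption,
and sizes are assumed to be powers of two.

```c
#include <stddef.h>
#include <stdint.h>

#define PAGE_BYTES	4096	/* assumed page size for this sketch */
#define FREEJUNK_WORD	0xdfdfdfdfdfdfdfdfULL

/* Number of 64-bit words considered, and the step between samples. */
static size_t
junk_plan(size_t sz, size_t *step)
{
	size_t words = sz / sizeof(uint64_t);

	if (words > PAGE_BYTES / sizeof(uint64_t))
		words = PAGE_BYTES / sizeof(uint64_t);
	*step = words / 4;
	if (*step == 0)
		*step = 1;
	return words;
}

/* Write the junk pattern to the sampled words; returns words written. */
size_t
junk_fill(uint64_t *p, size_t sz)
{
	size_t i, step, n = 0;
	size_t words = junk_plan(sz, &step);

	for (i = 0; i < words; i += step) {
		p[i] = FREEJUNK_WORD;
		n++;
	}
	return n;
}

/* Returns 1 if every sampled word still holds the pattern, else 0. */
int
junk_intact(const uint64_t *p, size_t sz)
{
	size_t i, step;
	size_t words = junk_plan(sz, &step);

	for (i = 0; i < words; i += step) {
		if (p[i] != FREEJUNK_WORD)
			return 0;
	}
	return 1;
}
```

For an 8192-byte region this touches words 0, 128, 256 and 384 of the first
page, i.e. 32 bytes. A stray write to word 128 is caught, a stray write to
word 1 is not: that is the price mentioned above.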

Junk mode 0 (j) and junk mode 2 (J) are not changed.

Please test and review,

-Otto

Index: stdlib/malloc.c
===
RCS file: /cvs/src/lib/libc/stdlib/malloc.c,v
retrieving revision 1.267
diff -u -p -r1.267 malloc.c
--- stdlib/malloc.c 23 Nov 2020 15:42:11 -  1.267
+++ stdlib/malloc.c 6 Feb 2021 20:03:49 -
@@ -89,6 +89,7 @@
  */
 #define SOME_JUNK  0xdb/* deadbeef */
 #define SOME_FREEJUNK  0xdf/* dead, free */
+#define SOME_FREEJUNK_ULL  0xdfdfdfdfdfdfdfdfULL
 
 #define MMAP(sz,f) mmap(NULL, (sz), PROT_READ | PROT_WRITE, \
 MAP_ANON | MAP_PRIVATE | (f), -1, 0)
@@ -655,6 +656,46 @@ delete(struct dir_info *d, struct region
}
 }
 
+static inline void
+junkfree(int junk, void *p, size_t sz)
+{
+   size_t byte, step = 1;
+
+   if (junk == 0 || sz == 0)
+   return;
+   if (junk == 1) {
+   if (sz > MALLOC_PAGESIZE)
+   sz = MALLOC_PAGESIZE;
+   step = sz / 32;
+   if (step == 0)
+   step = 1;
+   }
+   for (byte = 0; byte < sz; byte++)
+   ((unsigned char *)p)[byte] = SOME_FREEJUNK;
+}
+
+static inline void
+validate_junk(struct dir_info *pool, void *p, size_t sz)
+{
+   size_t byte, step = 1;
+
+   if (pool->malloc_junk == 0 || sz == 0)
+   return;
+   if (pool->malloc_junk == 1) {
+   if (sz > MALLOC_PAGESIZE)
+   sz = MALLOC_PAGESIZE;
+   step = sz / 32;
+   if (step == 0)
+   step = 1;
+   }
+   for (byte = 0; byte < sz; byte += step) {
+   if (((unsigned char *)p)[byte] != SOME_FREEJUNK)
+   wrterror(pool, "write after free %p %#zx@%#zx", p,
+   byte, sz);
+   }
+}
+
+
 /*
  * Cache maintenance. We keep at most malloc_cache pages cached.
  * If the cache is becoming full, unmap pages in the cache for real,
@@ -663,7 +704,7 @@ delete(struct dir_info *d, struct region
  * cache are in MALLOC_PAGESIZE units.
  */
 static void
-unmap(struct dir_info *d, void *p, size_t sz, size_t clear, int junk)
+unmap(struct dir_info *d, void *p, size_t sz, size_t clear)
 {
size_t psz = sz >> MALLOC_PAGESHIFT;
size_t rsz;
@@ -716,12 +757,10 @@ unmap(struct dir_info *d, void *p, size_
if (r->p == NULL) {
if (clear > 0)
memset(p, 0, clear);
-   if (junk && !mopts.malloc_freeunmap) {
-   size_t amt = junk == 1 ?  MALLOC_MAXCHUNK : sz;
-   memset(p, SOME_FREEJUNK, amt);
-   }
if (mopts.malloc_freeunmap)
mprotect(p, sz, PROT_NONE);
+   else
+   junkfree(d->malloc_junk, p, psz << 
MALLOC_PAGESHIFT);
r->p = p;
r->size = psz;
d->free_regions_size += psz;
@@ -760,15 +799,16 @@ map(struct dir_info *d, size_t sz, int z
if (r->p != NULL) {
if (r->size == psz) {
p = r->p;
+   if (!mopts.malloc_freeunmap)
+   validate_junk(d, p, psz << 
MALLOC_PAGESHIFT);
r->p = NULL;
d->free_regions_size -= psz;
if (mopts.malloc_freeunmap)
mprotect(p, sz, PROT_READ | PROT_WRITE);
if (zero_fill)
memset(p, 0, 

Re: execve -1 errno 12 Cannot allocate memory

2021-02-01 Thread Otto Moerbeek
On Mon, Feb 01, 2021 at 10:24:31PM -0500, Philippe Meunier wrote:

> Anyone?

Fixing a particular issue is fine, but more important is an assessment
that it does not break other things. In particular, does this limit the VM
for data available to any program (which is already quite limited on
i386)?

-Otto

> 
> Philippe
> 
> 
> Philippe Meunier wrote:
> >Jonathan Gray wrote:
> >>MAXTSIZ is 128 MB on i386
> >>see sys/arch/i386/include/vmparam.h
> >
> >Mark Kettenis wrote:
> >>sys/arch/i386/include/vmparam.h has:
> >>#define MAXTSIZ (128*1024*1024) /* max text size */
> >
> >Thanks to both of you for the pointer!
> >
> >So what about the patch below?  I've checked with a new kernel that it
> >fixes the problem with chrome (even when using the default limits in
> >/etc/login.conf).
> >
> >Philippe
> >
> >
> >Index: sys/arch/i386/include/vmparam.h
> >===
> >RCS file: /cvs/src/sys/arch/i386/include/vmparam.h,v
> >retrieving revision 1.56
> >diff -u -r1.56 vmparam.h
> >--- sys/arch/i386/include/vmparam.h  17 Apr 2018 15:50:05 -  1.56
> >+++ sys/arch/i386/include/vmparam.h  31 Jan 2021 09:41:00 -
> >@@ -46,7 +46,7 @@
> > /*
> >  * Virtual memory related constants, all in bytes
> >  */
> >-#define MAXTSIZ (128*1024*1024) /* max text size */
> >+#define MAXTSIZ (256*1024*1024) /* max text size */
> > #ifndef DFLDSIZ
> > #define DFLDSIZ (64*1024*1024)  /* initial data size 
> > limit */
> > #endif
> >
> >
> 



Re: [PATCH v3 (resend)] tee: Add -q, --quiet, --silent option to not write to stdout

2021-01-24 Thread Otto Moerbeek
On Sun, Jan 24, 2021 at 01:01:45PM -0700, Alex Henrie wrote:

> On Sun, Jan 24, 2021 at 10:51 AM Otto Moerbeek  wrote:
> >
> > Please stop pushing your diff to this list. So far nobody showed any
> > interest.
> 
> I am definitely interested. Bernhard Voelker seemed to express
> interest as well, conditional on -q being added to POSIX first.[1]
> Also, a --quiet flag was proposed back in 2001 by Roman Czyborra [2]
> and Jim Meyering expressed support for the idea.[3]
> 
> -Alex
> 
> [1] https://lists.gnu.org/archive/html/coreutils/2021-01/msg00043.html
> [2] https://lists.gnu.org/archive/html/bug-sh-utils/2001-05/msg00024.html
> [3] https://lists.gnu.org/archive/html/bug-sh-utils/2001-05/msg00039.html

"This list" is the OpenBSD tech list, sorry I did leave out this
context info.

-Otto



Re: unwind: silence "udp connect failed" errors

2021-01-24 Thread Otto Moerbeek
On Sun, Jan 24, 2021 at 07:24:07PM +0100, Florian Obser wrote:

> On Sun, Jan 24, 2021 at 01:06:31PM +0100, Klemens Nanni wrote:
> > On Sun, Jan 24, 2021 at 12:52:50PM +0100, Theo Buehler wrote:
> > > Probably better to sync first with the corresponding unbound commit
> > > https://cvsweb.openbsd.org/src/usr.sbin/unbound/services/outside_network.c#rev1.21
> > > then adjust udp_connect_needs_log() as needed.
> > Good call, thanks.
> > 
> > Here's the combined diff that syncs with unbound and adds EADDRNOTAVAIL
> > in the same fashion.
> > 
> > In case that is OK, I'd commit sync and addition separately.
> > 
> > Feedback? OK?
> 
> I consider the libunbound directory off-limits, it is not a fork and
> it should not be a fork.
> 
> We currently carry two diffs in there and I want to get rid of them
> (see at the end).
> 
> Especially the local syslog one is stupid. I think we are doing
> something wrong or libunbound does something wrong, not sure.
> 
> Anyway, I think what we should actually do is disable logging in
> libunbound unless we crank logging to debug in unwind. There is
> nothing interesting coming out of libunbound for us.
> 
> This leaves the 2nd problem, why is it going off the rails so badly if
> there is no v4? Is it also doing this for v6 when that's not present?
> Try resolving dnssec-failed.org, it's delegated to comcast which uses
> 5 nameservers all with v4 and v6 addresses. Since dnssec validation is
> intentionally broken unwind tries to talk to *all* nameservers, the
> log does not stop scrolling. It's so bad it actually times out on IPv6
> only.

(lib)unbound is very aggressive in finding an answer that validates. If
a reply fails validation, it will continue to recurse to see if it can
find a valid one. Other recursors use different approaches, e.g.
PowerDNS Recursor cuts off the recursion if it finds an answer, even
if it fails validation. I'd say if some of your nameservers provide
non-validating answers, fix them.

-Otto
> 
> Is this a bug in (lib)unbound or is it doing what it's supposed to do?
> 
> It is possible that from the point of view of (lib)unbound it is
> behaving perfectly fine, if you don't have an address family you are
> supposed to hit the available button in unbound.conf.
> 
> If that is the case we should do the same, automatically. I mean
> that's the whole point of unwind, figure out what works and use that.
> Meaning if we detect that we don't have working IPv4 set use-ipv4: no.
> 
> commit d273f78b8643bdb01f621260eb323123b774e431
> Author: florian 
> Date:   Fri Dec 6 13:08:48 2019 +
> 
> Stop fiddling with openlog / closelog in libunbound. unwind handles
> this. We need to find a way to properly upstream this.
> OK otto
> 
> diff --git libunbound/util/log.c libunbound/util/log.c
> index dfbb2334994..e8e987963c5 100644
> --- libunbound/util/log.c
> +++ libunbound/util/log.c
> @@ -109,16 +109,20 @@ log_init(const char* filename, int use_syslog, const 
> char* chrootdir)
>   fclose(cl);
>   }
>  #ifdef HAVE_SYSLOG_H
> +#if 0/* unwind handles syslog for us */
>   if(logging_to_syslog) {
>   closelog();
>   logging_to_syslog = 0;
>   }
> +#endif
>   if(use_syslog) {
>   /* do not delay opening until first write, because we may
>* chroot and no longer be able to access dev/log and so on */
>   /* the facility is LOG_DAEMON by default, but
>* --with-syslog-facility=LOCAL[0-7] can override it */
> +#if 0/* unwind handles syslog for us */
>   openlog(ident, LOG_NDELAY, UB_SYSLOG_FACILITY);
> +#endif
>   logging_to_syslog = 1;
>   lock_basic_unlock(&log_lock);
>   return;
> 
> commit 81b0d744ff77e26ea69cee28aed10081d3973fe8
> Author: otto 
> Date:   Sat Dec 14 19:56:26 2019 +
> 
> Be less aggressive pre-allocating memory; ok florian@
> 
> diff --git libunbound/util/alloc.c libunbound/util/alloc.c
> index 7e9618931ca..e9613b10dcd 100644
> --- libunbound/util/alloc.c
> +++ libunbound/util/alloc.c
> @@ -113,7 +113,7 @@ alloc_init(struct alloc_cache* alloc, struct alloc_cache* 
> super,
>   alloc->last_id -= 1;/* for compiler portability. */
>   alloc->last_id |= alloc->next_id;
>   alloc->next_id += 1;/* because id=0 is special. */
> - alloc->max_reg_blocks = 100;
> + alloc->max_reg_blocks = 10;
>   alloc->num_reg_blocks = 0;
>   alloc->reg_list = NULL;
>   alloc->cleanup = NULL;
> 
> 
> > 
> > 
> > Index: libunbound/services/outside_network.c
> > ===
> > RCS file: /cvs/src/sbin/unwind/libunbound/services/outside_network.c,v
> > retrieving revision 1.9
> > diff -u -p -r1.9 outside_network.c
> > --- libunbound/services/outside_network.c   11 Dec 2020 12:21:40 -  
> > 1.9
> > +++ libunbound/services/outside_network.c   24 Jan 2021 

Re: [PATCH v3 (resend)] tee: Add -q, --quiet, --silent option to not write to stdout

2021-01-24 Thread Otto Moerbeek
On Sun, Jan 24, 2021 at 01:18:46PM +0100, Alejandro Colomar wrote:

> This is useful for using tee to just write to a file,
> at the end of a pipeline,
> without having to redirect to /dev/null
> 
> Example:
> 
> echo 'foo' | sudo tee -q /etc/foo;
> 
> is equivalent to the old (and ugly)

You keep repeating "ugly" as the reason you are wanting this.

I consider adding special options to commands to solve an imagined
issue that can be solved with a general concept like redirection ugly.
Please stop pushing your diff to this list. So far nobody showed any
interest.

-Otto

> 
> echo 'foo' | sudo tee /etc/foo >/dev/null;
> 
> Signed-off-by: Alejandro Colomar 
> ---
> 
> Resend as v3. I forgot to change the subject line.
> Everything else is the same as in
> <20210123145356.53962-1-alx.manpa...@gmail.com>.
> 
>  src/tee.c | 18 +++---
>  1 file changed, 15 insertions(+), 3 deletions(-)
> 
> diff --git a/src/tee.c b/src/tee.c
> index c81faea91..1dfa92cf2 100644
> --- a/src/tee.c
> +++ b/src/tee.c
> @@ -45,6 +45,9 @@ static bool append;
>  /* If true, ignore interrupts. */
>  static bool ignore_interrupts;
>  
> +/* Don't write to stdout */
> +static bool quiet;
> +
>  enum output_error
>{
>  output_error_sigpipe,  /* traditional behavior, sigpipe enabled.  */
> @@ -61,6 +64,8 @@ static struct option const long_options[] =
>{"append", no_argument, NULL, 'a'},
>{"ignore-interrupts", no_argument, NULL, 'i'},
>{"output-error", optional_argument, NULL, 'p'},
> +  {"quiet", no_argument, NULL, 'q'},
> +  {"silent", no_argument, NULL, 'q'},
>{GETOPT_HELP_OPTION_DECL},
>{GETOPT_VERSION_OPTION_DECL},
>{NULL, 0, NULL, 0}
> @@ -93,6 +98,7 @@ Copy standard input to each FILE, and also to standard 
> output.\n\
>  "), stdout);
>fputs (_("\
>-pdiagnose errors writing to non pipes\n\
> +  -q, --quiet, --silent don't write to standard output\n\
>--output-error[=MODE]   set behavior on write error.  See MODE below\n\
>  "), stdout);
>fputs (HELP_OPTION_DESCRIPTION, stdout);
> @@ -130,8 +136,9 @@ main (int argc, char **argv)
>  
>append = false;
>ignore_interrupts = false;
> +  quiet = false;
>  
> -  while ((optc = getopt_long (argc, argv, "aip", long_options, NULL)) != -1)
> +  while ((optc = getopt_long (argc, argv, "aipq", long_options, NULL)) != -1)
>  {
>switch (optc)
>  {
> @@ -151,6 +158,10 @@ main (int argc, char **argv)
>  output_error = output_error_warn_nopipe;
>break;
>  
> +case 'q':
> +  quiet = true;
> +  break;
> +
>  case_GETOPT_HELP_CHAR;
>  
>  case_GETOPT_VERSION_CHAR (PROGRAM_NAME, AUTHORS);
> @@ -235,8 +246,9 @@ tee_files (int nfiles, char **files)
>  break;
>  
>/* Write to all NFILES + 1 descriptors.
> - Standard output is the first one.  */
> -  for (i = 0; i <= nfiles; i++)
> + Standard output is the first one.
> + If 'quiet' is true, write to descriptors 1 and above (omit stdout)  
> */
> +  for (i = quiet; i <= nfiles; i++)
>  if (descriptors[i]
>  && fwrite (buffer, bytes_read, 1, descriptors[i]) != 1)
>{
> -- 
> 2.30.0
> 



Re: [PATCH] tee: Add -q, --quiet, --silent option to not write to stdout

2021-01-23 Thread Otto Moerbeek
On Sat, Jan 23, 2021 at 03:28:01PM +, Stuart Henderson wrote:

> [cc's trimmed]
> 
> On 2021/01/23 15:53, Alejandro Colomar wrote:
> > This is useful for using tee to just write to a file,
> > at the end of a pipeline,
> > without having to redirect to /dev/null
> > 
> > Example:
> > 
> > echo 'foo' | sudo tee -q /etc/foo;
> > 
> > is equivalent to the old (and ugly)
> > 
> > echo 'foo' | sudo tee /etc/foo >/dev/null;
> 
> If this added a new very useful feature then *maybe* it would be
> worthwhile. But as things stand, as an alternative way to do something
> which can already be done trivially, why would you want to encourage
> unportable scripts by adding it?
> 
> It's much less ugly to use >/dev/null than to create a script which
> will fail (possibly in bad ways) on slightly older OS releases.
> 
> I seriously doubt this will be added to OpenBSD.

Agreed. Using a general concept like redirection is much better than
adding specific options to commands. I have no idea why redirection
would be considered ugly.

-Otto



cmp -s bugfix

2021-01-08 Thread Otto Moerbeek
As reported on misc@

https://marc.info/?l=openbsd-misc&m=161016188503894&w=2

-Otto

Index: regular.c
===
RCS file: /cvs/src/usr.bin/cmp/regular.c,v
retrieving revision 1.12
diff -u -p -r1.12 regular.c
--- regular.c   6 Feb 2015 23:21:59 -   1.12
+++ regular.c   9 Jan 2021 06:53:20 -
@@ -51,15 +51,15 @@ c_regular(int fd1, char *file1, off_t sk
off_t byte, length, line;
int dfound;
 
-   if (sflag && len1 != len2)
-   exit(1);
-
if (skip1 > len1)
eofmsg(file1);
len1 -= skip1;
if (skip2 > len2)
eofmsg(file2);
len2 -= skip2;
+
+   if (sflag && len1 != len2)
+   exit(1);
 
length = MINIMUM(len1, len2);
if (length > SIZE_MAX) {
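
A minimal sketch of the corrected order, not the actual cmp code: the
helper name silent_size_check() and its return convention are made up
for illustration. The point is that the -s shortcut has to compare the
lengths that remain after the skips are applied, not the raw file sizes.

```c
#include <assert.h>
#include <sys/types.h>

/*
 * Hypothetical helper, for illustration only.
 * Returns -1 if a skip runs past the end of its file (cmp reports EOF there),
 *          1 if the remaining lengths differ (cmp -s can exit 1 without reading),
 *          0 if the remaining lengths match and the contents must be compared.
 */
static int
silent_size_check(off_t len1, off_t skip1, off_t len2, off_t skip2)
{
	assert(skip1 >= 0 && skip2 >= 0);

	if (skip1 > len1 || skip2 > len2)
		return -1;

	/* Apply the skips first ... */
	len1 -= skip1;
	len2 -= skip2;

	/* ... and only then compare what is left. */
	return len1 != len2;
}
```

With len1 = 10, skip1 = 4, len2 = 6 and skip2 = 0, both files have 6
bytes left, so the shortcut must not fire; comparing the raw sizes
(10 against 6) would exit early.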



Re: use getnameinfo in bgpd to print addresses

2021-01-04 Thread Otto Moerbeek
On Mon, Jan 04, 2021 at 05:50:53PM +0100, Otto Moerbeek wrote:

> On Mon, Jan 04, 2021 at 01:42:48PM +0100, Theo Buehler wrote:
> 
> > > > +   return log_sockaddr(addr2sa(addr, 0, &len), len);
> > > 
> > > Perhaps I haven't yet had enough coffee this year, but I'm unsure
> > > whether it is actually guaranteed that addr2sa() is called before the
> > > second len in this line is passed to log_sockaddr().
> > 
> > Answering my own question: C99 and C11 6.5.2.2.12 require that all side
> > effects must be completed before log_sockaddr() is called. As addr2sa()
> > changes the second len as a side effect, this should be fine.
> > 
> > ok tb
> > 
> 
> I am not convinced. Consider
> 
> #include <stdio.h>
> 
> char x;
> 
> char f(char *p)
> {
> 	*p = 'f';
> 	return 'r';
> }
> 
> int main()
> {
> 	x = 'm';
> 	printf("%c %c %c\n", x, f(&x), x);
> }
> 
> prints "m r f" here. The first x is not influenced by the call to
> f(), while the second is. So the side effect only affects one of the args.
> 
>   -Otto
> 
> 

And if you compile with gcc on amd64 you'll get "f r m".

In my reading, the C standard is clear: the order of evaluation of the
arguments is unspecified. The side effects should indeed take effect
before calling the function, but that's something different; e.g.
before the side effects are complete, some of the args may already be
on the stack.
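
To take the evaluation order out of the picture, the call can be pulled
into its own statement so that each value is read at a known point. A
minimal sketch based on the test program above; format_sequenced() is a
made-up name and writes into a buffer instead of printing:

```c
#include <assert.h>
#include <stdio.h>

static char x;

/* Same as f() in the test program: writes through p, returns 'r'. */
static char
f(char *p)
{
	*p = 'f';
	return 'r';
}

/*
 * Hypothetical helper: formats "before ret after" into buf.
 * Each statement ends at a sequence point, so the order of the reads
 * of x relative to the call no longer depends on the compiler.
 */
static int
format_sequenced(char *buf, size_t len)
{
	char before, ret, after;

	assert(buf != NULL && len >= 6);

	x = 'm';
	before = x;	/* read before the call: 'm' */
	ret = f(&x);	/* the side effect on x happens here */
	after = x;	/* read after the call: 'f' */

	return snprintf(buf, len, "%c %c %c", before, ret, after);
}
```

This yields "m r f" regardless of compiler or architecture, because
nothing is left to the argument evaluation order.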

-Otto



Re: Thread local data setup and destruct

2021-01-04 Thread Otto Moerbeek
On Mon, Jan 04, 2021 at 06:03:46PM +0100, Mark Kettenis wrote:

> > Date: Sun, 3 Jan 2021 13:47:45 +0100
> > From: Otto Moerbeek 
> > 
> > On Thu, Dec 31, 2020 at 05:54:06PM +0100, Alexander Bluhm wrote:
> > 
> > > On Tue, Dec 29, 2020 at 04:07:19PM +0100, Otto Moerbeek wrote:
> > > > This works better; checking the flags does not work if the thread is
> > > > already on the road to destruction.
> > > 
> > > This diff survived a full regress run on amd64.
> > > 
> > > bluhm
> > 
> > Does anybody want to ok this?
> 
> Don't forget the debug leftovers (char buf[100] and int len).
> 
> Also, is the #include  in thread/rthread_cb.c really needed?

nope, that was part of the debug code.

> 
> Otherwise, this is ok kettenis@
> 
> 
> > > > Index: asr/asr.c
> > > > ===
> > > > RCS file: /cvs/src/lib/libc/asr/asr.c,v
> > > > retrieving revision 1.64
> > > > diff -u -p -r1.64 asr.c
> > > > --- asr/asr.c   6 Jul 2020 13:33:05 -   1.64
> > > > +++ asr/asr.c   29 Dec 2020 15:05:45 -
> > > > @@ -117,7 +117,7 @@ _asr_resolver_done(void *arg)
> > > > _asr_ctx_unref(ac);
> > > > return;
> > > > } else {
> > > > -   priv = _THREAD_PRIVATE(_asr, _asr, &_asr);
> > > > +   priv = _THREAD_PRIVATE_DT(_asr, _asr, NULL, &_asr);
> > > > if (*priv == NULL)
> > > > return;
> > > > asr = *priv;
> > > > @@ -128,6 +128,23 @@ _asr_resolver_done(void *arg)
> > > > free(asr);
> > > >  }
> > > >  
> > > > +static void
> > > > +_asr_resolver_done_tp(void *arg)
> > > > +{
> > > > +   char buf[100];
> > > > +   int len;
> > > > +   struct asr **priv = arg;
> > > > +   struct asr *asr;
> > > > +
> > > > +   if (*priv == NULL)
> > > > +   return;
> > > > +   asr = *priv;
> > > > +
> > > > +   _asr_ctx_unref(asr->a_ctx);
> > > > +   free(asr);
> > > > +   free(priv);
> > > > +}
> > > > +
> > > >  void *
> > > >  asr_resolver_from_string(const char *str)
> > > >  {
> > > > @@ -349,7 +366,8 @@ _asr_use_resolver(void *arg)
> > > > }
> > > > else {
> > > > DPRINT("using thread-local resolver\n");
> > > > -   priv = _THREAD_PRIVATE(_asr, _asr, &_asr);
> > > > +   priv = _THREAD_PRIVATE_DT(_asr, _asr, 
> > > > _asr_resolver_done_tp,
> > > > +   &_asr);
> > > > if (*priv == NULL) {
> > > > DPRINT("setting up thread-local resolver\n");
> > > > *priv = _asr_resolver();
> > > > Index: include/thread_private.h
> > > > ===
> > > > RCS file: /cvs/src/lib/libc/include/thread_private.h,v
> > > > retrieving revision 1.35
> > > > diff -u -p -r1.35 thread_private.h
> > > > --- include/thread_private.h13 Feb 2019 13:22:14 -  1.35
> > > > +++ include/thread_private.h29 Dec 2020 15:05:45 -
> > > > @@ -98,7 +98,8 @@ struct thread_callbacks {
> > > > void(*tc_mutex_destroy)(void **);
> > > > void(*tc_tag_lock)(void **);
> > > > void(*tc_tag_unlock)(void **);
> > > > -   void*(*tc_tag_storage)(void **, void *, size_t, void *);
> > > > +   void*(*tc_tag_storage)(void **, void *, size_t, void 
> > > > (*)(void *),
> > > > +  void *);
> > > > __pid_t (*tc_fork)(void);
> > > > __pid_t (*tc_vfork)(void);
> > > > void(*tc_thread_release)(struct pthread *);
> > > > @@ -142,6 +143,7 @@ __END_HIDDEN_DECLS
> > > >  #define _THREAD_PRIVATE_MUTEX_LOCK(name)   do {} while (0)
> > > >  #define _THREAD_PRIVATE_MUTEX_UNLOCK(name) do {} while (0)
> > > >  #define _THREAD_PRIVATE(keyname, storage, error)   &(storage)
> > > > +#define _THREAD_PRIVATE_DT(keyname, stora

Re: use getnameinfo in bgpd to print addresses

2021-01-04 Thread Otto Moerbeek
On Mon, Jan 04, 2021 at 01:42:48PM +0100, Theo Buehler wrote:

> > > + return log_sockaddr(addr2sa(addr, 0, &len), len);
> > 
> > Perhaps I haven't yet had enough coffee this year, but I'm unsure
> > whether it is actually guaranteed that addr2sa() is called before the
> > second len in this line is passed to log_sockaddr().
> 
> Answering my own question: C99 and C11 6.5.2.2.12 require that all side
> effects must be completed before log_sockaddr() is called. As addr2sa()
> changes the second len as a side effect, this should be fine.
> 
> ok tb
> 

I am not convinced. Consider

#include <stdio.h>

char x;

char f(char *p)
{
	*p = 'f';
	return 'r';
}

int main()
{
	x = 'm';
	printf("%c %c %c\n", x, f(&x), x);
}

prints "m r f" here. The first x is not influenced by the call to
f(), while the second is. So the side effect only affects one of the args.

-Otto




Re: Thread local data setup and destruct

2021-01-03 Thread Otto Moerbeek
On Thu, Dec 31, 2020 at 05:54:06PM +0100, Alexander Bluhm wrote:

> On Tue, Dec 29, 2020 at 04:07:19PM +0100, Otto Moerbeek wrote:
> > This works better; checking the flags does not work if the thread is
> > already on the road to destruction.
> 
> This diff survived a full regress run on amd64.
> 
> bluhm

Does anybody want to ok this?

-Otto

> 
> > Index: asr/asr.c
> > ===
> > RCS file: /cvs/src/lib/libc/asr/asr.c,v
> > retrieving revision 1.64
> > diff -u -p -r1.64 asr.c
> > --- asr/asr.c   6 Jul 2020 13:33:05 -   1.64
> > +++ asr/asr.c   29 Dec 2020 15:05:45 -
> > @@ -117,7 +117,7 @@ _asr_resolver_done(void *arg)
> > _asr_ctx_unref(ac);
> > return;
> > } else {
> > -   priv = _THREAD_PRIVATE(_asr, _asr, &_asr);
> > +   priv = _THREAD_PRIVATE_DT(_asr, _asr, NULL, &_asr);
> > if (*priv == NULL)
> > return;
> > asr = *priv;
> > @@ -128,6 +128,23 @@ _asr_resolver_done(void *arg)
> > free(asr);
> >  }
> >  
> > +static void
> > +_asr_resolver_done_tp(void *arg)
> > +{
> > +   char buf[100];
> > +   int len;
> > +   struct asr **priv = arg;
> > +   struct asr *asr;
> > +
> > +   if (*priv == NULL)
> > +   return;
> > +   asr = *priv;
> > +
> > +   _asr_ctx_unref(asr->a_ctx);
> > +   free(asr);
> > +   free(priv);
> > +}
> > +
> >  void *
> >  asr_resolver_from_string(const char *str)
> >  {
> > @@ -349,7 +366,8 @@ _asr_use_resolver(void *arg)
> > }
> > else {
> > DPRINT("using thread-local resolver\n");
> > -   priv = _THREAD_PRIVATE(_asr, _asr, &_asr);
> > +   priv = _THREAD_PRIVATE_DT(_asr, _asr, _asr_resolver_done_tp,
> > +   &_asr);
> > if (*priv == NULL) {
> > DPRINT("setting up thread-local resolver\n");
> > *priv = _asr_resolver();
> > Index: include/thread_private.h
> > ===
> > RCS file: /cvs/src/lib/libc/include/thread_private.h,v
> > retrieving revision 1.35
> > diff -u -p -r1.35 thread_private.h
> > --- include/thread_private.h13 Feb 2019 13:22:14 -  1.35
> > +++ include/thread_private.h29 Dec 2020 15:05:45 -
> > @@ -98,7 +98,8 @@ struct thread_callbacks {
> > void(*tc_mutex_destroy)(void **);
> > void(*tc_tag_lock)(void **);
> > void(*tc_tag_unlock)(void **);
> > -   void*(*tc_tag_storage)(void **, void *, size_t, void *);
> > +   void*(*tc_tag_storage)(void **, void *, size_t, void (*)(void *),
> > +  void *);
> > __pid_t (*tc_fork)(void);
> > __pid_t (*tc_vfork)(void);
> > void(*tc_thread_release)(struct pthread *);
> > @@ -142,6 +143,7 @@ __END_HIDDEN_DECLS
> >  #define _THREAD_PRIVATE_MUTEX_LOCK(name)   do {} while (0)
> >  #define _THREAD_PRIVATE_MUTEX_UNLOCK(name) do {} while (0)
> >  #define _THREAD_PRIVATE(keyname, storage, error)   &(storage)
> > +#define _THREAD_PRIVATE_DT(keyname, storage, dt, error)&(storage)
> >  #define _MUTEX_LOCK(mutex) do {} while (0)
> >  #define _MUTEX_UNLOCK(mutex)   do {} while (0)
> >  #define _MUTEX_DESTROY(mutex)  do {} while (0)
> > @@ -168,7 +170,12 @@ __END_HIDDEN_DECLS
> >  #define _THREAD_PRIVATE(keyname, storage, error)   \
> > (_thread_cb.tc_tag_storage == NULL ? &(storage) :   \
> > _thread_cb.tc_tag_storage(&(__THREAD_NAME(keyname)),\
> > -   &(storage), sizeof(storage), error))
> > +   &(storage), sizeof(storage), NULL, (error)))
> > +
> > +#define _THREAD_PRIVATE_DT(keyname, storage, dt, error)
> > \
> > +   (_thread_cb.tc_tag_storage == NULL ? &(storage) :   \
> > +   _thread_cb.tc_tag_storage(&(__THREAD_NAME(keyname)),\
> > +   &(storage), sizeof(storage), (dt), (error)))
> >  
> >  /*
> >   * Macros used in libc to access mutexes.
> > Index: thread/rthread_cb.h
> > ===
> > RCS file: /cvs/src/lib/libc/thread/rthread_cb.h,v
> > retrieving revision 1.2
> > diff -u -p -r1.2 rthread_cb.h

Re: drm(4) memory allocation diff

2021-01-01 Thread Otto Moerbeek
On Thu, Dec 31, 2020 at 10:09:36PM +0100, Mark Kettenis wrote:

> The diff below changes the emulated Linux memory allocation functions
> a bit such that they only use malloc(9) for allocations smaller than a
> page.  This reduces pressure on the "interrupt safe" map and hopefully
> will avoid the
> 
> uvm_mapent_alloc: out of static map entries
> 
> messages that some people have seen more often in the last few months.
> It also could help with some memory allocation issues seem by
> amdgpu(4) users.
> 
> The downside of this approach is that memory leaks will be harder to
> spot as the larger memory allocations are no longer included in the
> DRM type as reported by vmstat -m.  Another downside is that this no
> longer caps the amount of kernel memory that can be allocated by
> drm(4).  If that turns out to be a problem, we can impose a limit on
> the amount of memory that can be allocated this way.
> 
> The implementation needs to keep track of the size of each allocated
> memory block.  This is done using a red-black tree.  Our kernel uses
> red-black trees in similar situations but I wouldn't call myself an
> expert in the area of data structures so a there may be a better
> approach.
> 
> Some real-life testing would be appreciated.
> 

A hash table (as used in userland) would lead to an amortized
complexity of O(1) for an insert or delete, while an RB-tree would be
O(log n) for a tree of size n. The drawback is that a hash table
is more complex. Basically, whether it's worth the trouble depends on
the number of inserts and deletes you expect.
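
To make the trade-off concrete, here is a minimal userland sketch of
the hash table variant: a chained table mapping an address to its
allocation size. All names (size_insert(), size_remove(), NBUCKETS) are
made up, and it is not kernel code: a kernel version would need locking
and would probably take its entries from a pool. With a fixed bucket
count the O(1) only holds while the table stays sparsely filled; growing
the table is left out, and that is a good part of the extra complexity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define NBUCKETS 64	/* power of two; arbitrary for this sketch */

struct size_entry {
	const void *addr;
	size_t size;
	struct size_entry *next;	/* chain of entries in the same bucket */
};

static struct size_entry *buckets[NBUCKETS];

static size_t
hash_addr(const void *addr)
{
	/* Drop the low 12 bits, assuming 4 KiB page-aligned addresses. */
	return ((uintptr_t)addr >> 12) & (NBUCKETS - 1);
}

/* Record the size of the block at addr; returns 0 on success, -1 on failure. */
static int
size_insert(const void *addr, size_t size)
{
	struct size_entry *e;
	size_t h = hash_addr(addr);

	assert(size > 0);	/* 0 is reserved as "not found" below */

	e = malloc(sizeof(*e));
	if (e == NULL)
		return -1;
	e->addr = addr;
	e->size = size;
	e->next = buckets[h];
	buckets[h] = e;
	return 0;
}

/* Forget the block at addr and return its size, or 0 if it is not tracked. */
static size_t
size_remove(const void *addr)
{
	struct size_entry **pp, *e;
	size_t size;

	for (pp = &buckets[hash_addr(addr)]; (e = *pp) != NULL; pp = &e->next) {
		if (e->addr == addr) {
			*pp = e->next;	/* unlink from the chain */
			size = e->size;
			free(e);
			return size;
		}
	}
	return 0;
}
```

A lookup-only function for the is_vmalloc_addr() case would walk the
same chain without unlinking.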

A few nits inline,

-Otto

> Index: dev/pci/drm/drm_linux.c
> ===
> RCS file: /cvs/src/sys/dev/pci/drm/drm_linux.c,v
> retrieving revision 1.74
> diff -u -p -r1.74 drm_linux.c
> --- dev/pci/drm/drm_linux.c   31 Dec 2020 06:31:55 -  1.74
> +++ dev/pci/drm/drm_linux.c   31 Dec 2020 20:44:38 -
> @@ -430,6 +430,116 @@ dmi_check_system(const struct dmi_system
>   return (num);
>  }
>  
> +struct vmalloc_entry {
> + const void  *addr;
> + size_t  size;
> + RBT_ENTRY(vmalloc_entry) vmalloc_node;
> +};
> +
> +struct pool vmalloc_pool;
> +RBT_HEAD(vmalloc_tree, vmalloc_entry) vmalloc_tree;
> +
> +RBT_PROTOTYPE(vmalloc_tree, vmalloc_entry, vmalloc_node, vmalloc_compare);
> +
> +static inline int
> +vmalloc_compare(const struct vmalloc_entry *a, const struct vmalloc_entry *b)
> +{
> + vaddr_t va = (vaddr_t)a->addr;
> + vaddr_t vb = (vaddr_t)b->addr;
> +
> + return va < vb ? -1 : va > vb;
> +}
> +
> +RBT_GENERATE(vmalloc_tree, vmalloc_entry, vmalloc_node, vmalloc_compare);
> +
> +bool
> +is_vmalloc_addr(const void *addr)
> +{
> + struct vmalloc_entry key;
> + struct vmalloc_entry *entry;
> +
> + key.addr = addr;
> + entry = RBT_FIND(vmalloc_tree, &vmalloc_tree, &key);
> + return (entry != NULL);
> +}
> +
> +void *
> +vmalloc(unsigned long size)


Why sometimes unsigned long and sometimes size_t for sizes?

> +{
> + struct vmalloc_entry *entry;
> + void *addr;
> +
> + size = roundup(size, PAGE_SIZE);
> + addr = km_alloc(size, &kv_any, &kp_dirty, &kd_waitok);
> + if (addr) {
> + entry = pool_get(&vmalloc_pool, PR_WAITOK);
> + entry->addr = addr;
> + entry->size = size;
> + RBT_INSERT(vmalloc_tree, &vmalloc_tree, entry);
> + }
> +
> + return addr;
> +}
> +
> +void *
> +vzalloc(unsigned long size)
> +{
> + struct vmalloc_entry *entry;
> + void *addr;
> +
> + size = roundup(size, PAGE_SIZE);
> + addr = km_alloc(size, &kv_any, &kp_zero, &kd_waitok);
> + if (addr) {
> + entry = pool_get(&vmalloc_pool, PR_WAITOK);
> + entry->addr = addr;
> + entry->size = size;
> + RBT_INSERT(vmalloc_tree, &vmalloc_tree, entry);
> + }
> +
> + return addr;
> +}
> +
> +void
> +vfree(const void *addr)
> +{
> + struct vmalloc_entry key;
> + struct vmalloc_entry *entry;
> +
> + key.addr = addr;
> + entry = RBT_FIND(vmalloc_tree, &vmalloc_tree, &key);
> + if (entry == NULL)
> + panic("%s: non vmalloced addr %p", __func__, addr);
> +
> + RBT_REMOVE(vmalloc_tree, &vmalloc_tree, entry);
> + km_free((void *)addr, entry->size, &kv_any, &kp_dirty);
> + pool_put(&vmalloc_pool, entry);
> +}
> +
> +void *
> +kvmalloc(size_t size, gfp_t flags)
> +{
> + if (flags != GFP_KERNEL || size < PAGE_SIZE)
> + return malloc(size, M_DRM, flags);
> + return vmalloc(size);
> +}
> +
> +void *
> +kvzalloc(size_t size, gfp_t flags)
> +{
> + if (flags != GFP_KERNEL || size < PAGE_SIZE)
> + return malloc(size, M_DRM, flags | M_ZERO);
> + return vzalloc(size);
> +}
> +
> +void
> +kvfree(const void *addr)
> +{
> + if (is_vmalloc_addr(addr))
> + vfree(addr);
> + else
> + free((void *)addr, M_DRM, 0);
> +}
> +
>  struct vm_page *
>  alloc_pages(unsigned int gfp_mask, unsigned int order)
>  {
> @@ -1939,6 +2049,10 @@ 

Re: Thread local data setup and destruct

2020-12-29 Thread Otto Moerbeek
On Tue, Dec 29, 2020 at 01:42:57PM +0100, Otto Moerbeek wrote:

> On Tue, Dec 29, 2020 at 12:46:54PM +0100, Mark Kettenis wrote:
> 
> > > Date: Tue, 29 Dec 2020 12:21:25 +0100
> > > From: Otto Moerbeek 
> > > 
> > > Hi,
> > > 
> > > this is a continuation from the threads on bugs@
> > > 
> > > > This version makes it explicit to *only* set up "TLS" (which actually
> > > > is just a pointer to static data) with the data provided if we're
> > > > running single threaded (i.e. no -pthread, or -pthread but no pthread
> > > > function that triggers multi-threaded init called yet).  In all other
> > > > cases the real allocated TLS is zeroed. This avoids the setup issue
> > > > asr is having.  This diff also makes sure asr uses a specialized
> > > > destructor for its TLS to kill a mem leak that occurs on thread
> > > > destruction.
> > > 
> > > All in all I'm still a bit fond of the constructor approach to
> > > librthread init: it makes the distinction between having a single
> > > thread and being the main thread in a multi-threaded setup moot.
> > > 
> > > Anyway, we can consider that later, please test this.
> > 
> > This isn't right.  The memcpy() call you're removing is there such
> > that the contents of the per-thread storage for the initial (main)
> > thread remains the same.
> > 
> > With your change it reverts back to zero when the first thread is
> > created.  In the particular case at hand that means the initial thread
> > would get a fresh resolver context at that point.
> 
> Yes, but if I do not do that, crashes happen on destruct. Note that your
> initial suggestion (in asr.c)
> 
> if (priv != &_asr && *priv == _asr)
> *priv = NULL;
> 
> has the same property: it clears the contents if they happen to be the
> same as _asr.
> 
> The crash happens if the pre-mt (static) and the TLS of the main
> thread share the same data. I did not find a way to make that work, so
> I decided to use the big hammer. I might be overlooking something
> obvious though...


This works better; checking the flags does not work if the thread is
already on the road to destruction.

-Otto

Index: asr/asr.c
===
RCS file: /cvs/src/lib/libc/asr/asr.c,v
retrieving revision 1.64
diff -u -p -r1.64 asr.c
--- asr/asr.c   6 Jul 2020 13:33:05 -   1.64
+++ asr/asr.c   29 Dec 2020 15:05:45 -
@@ -117,7 +117,7 @@ _asr_resolver_done(void *arg)
_asr_ctx_unref(ac);
return;
} else {
-   priv = _THREAD_PRIVATE(_asr, _asr, &_asr);
+   priv = _THREAD_PRIVATE_DT(_asr, _asr, NULL, &_asr);
if (*priv == NULL)
return;
asr = *priv;
@@ -128,6 +128,23 @@ _asr_resolver_done(void *arg)
free(asr);
 }
 
+static void
+_asr_resolver_done_tp(void *arg)
+{
+   char buf[100];
+   int len;
+   struct asr **priv = arg;
+   struct asr *asr;
+
+   if (*priv == NULL)
+   return;
+   asr = *priv;
+
+   _asr_ctx_unref(asr->a_ctx);
+   free(asr);
+   free(priv);
+}
+
 void *
 asr_resolver_from_string(const char *str)
 {
@@ -349,7 +366,8 @@ _asr_use_resolver(void *arg)
}
else {
DPRINT("using thread-local resolver\n");
-   priv = _THREAD_PRIVATE(_asr, _asr, &_asr);
+   priv = _THREAD_PRIVATE_DT(_asr, _asr, _asr_resolver_done_tp,
+   &_asr);
if (*priv == NULL) {
DPRINT("setting up thread-local resolver\n");
*priv = _asr_resolver();
Index: include/thread_private.h
===
RCS file: /cvs/src/lib/libc/include/thread_private.h,v
retrieving revision 1.35
diff -u -p -r1.35 thread_private.h
--- include/thread_private.h13 Feb 2019 13:22:14 -  1.35
+++ include/thread_private.h29 Dec 2020 15:05:45 -
@@ -98,7 +98,8 @@ struct thread_callbacks {
void(*tc_mutex_destroy)(void **);
void(*tc_tag_lock)(void **);
void(*tc_tag_unlock)(void **);
-   void*(*tc_tag_storage)(void **, void *, size_t, void *);
+   void*(*tc_tag_storage)(void **, void *, size_t, void (*)(void *),
+  void *);
__pid_t (*tc_fork)(void);
__pid_t (*tc_vfork)(void);
void(*tc_thread_release)(struct pthread *);
@@ -142,6 +143,7 @@ __END_HIDDEN_DECLS
 #define _THREAD_PRIVATE_MUTEX_LOCK(name)   do {} while (0)
 #define _THREAD_PRIVATE_MUTEX_UNLOCK(name) do {} while (0)
 #define _

Re: Thread local data setup and destruct

2020-12-29 Thread Otto Moerbeek
On Tue, Dec 29, 2020 at 12:46:54PM +0100, Mark Kettenis wrote:

> > Date: Tue, 29 Dec 2020 12:21:25 +0100
> > From: Otto Moerbeek 
> > 
> > Hi,
> > 
> > this is a continuation from the threads on bugs@
> > 
> > This version makes it explicit to *only* set up "TLS" (which actually
> > is just a pointer to static data) with the data provided if we're
> > running single threaded (i.e. no -pthread, or -pthread but no pthread
> > function that triggers multi-threaded init called yet).  In all other
> > cases the real allocated TLS is zeroed. This avoids the setup issue
> > asr is having.  This diff also makes sure asr uses a specialized
> > destructor for its TLS to kill a mem leak that occurs on thread
> > destruction.
> > 
> > All in all I'm still a bit fond of the constructor approach to
> > librthread init: it makes the distinction between having a single
> > thread and being the main thread in a multi-threaded setup moot.
> > 
> > Anyway, we can consider that later, please test this.
> 
> This isn't right.  The memcpy() call you're removing is there such
> that the contents of the per-thread storage for the initial (main)
> thread remains the same.
> 
> With your change it reverts back to zero when the first thread is
> created.  In the particular case at hand that means the initial thread
> would get a fresh resolver context at that point.

Yes, but if I do not do that, crashes happen on destruct. Note that your
initial suggestion (in asr.c)

if (priv != &_asr && *priv == _asr)
*priv = NULL;

has the same property: it clears the contents if they happen to be the
same as _asr.

The crash happens if the pre-mt (static) and the TLS of the main
thread share the same data. I did not find a way to make that work, so
I decided to use the big hammer. I might be overlooking something
obvious though...

-Otto

> 
> > Index: asr/asr.c
> > ===
> > RCS file: /cvs/src/lib/libc/asr/asr.c,v
> > retrieving revision 1.64
> > diff -u -p -r1.64 asr.c
> > --- asr/asr.c   6 Jul 2020 13:33:05 -   1.64
> > +++ asr/asr.c   29 Dec 2020 11:10:08 -
> > @@ -117,7 +117,7 @@ _asr_resolver_done(void *arg)
> > _asr_ctx_unref(ac);
> > return;
> > } else {
> > -   priv = _THREAD_PRIVATE(_asr, _asr, &_asr);
> > +   priv = _THREAD_PRIVATE_DT(_asr, _asr, NULL, &_asr);
> > if (*priv == NULL)
> > return;
> > asr = *priv;
> > @@ -128,6 +128,21 @@ _asr_resolver_done(void *arg)
> > free(asr);
> >  }
> >  
> > +static void
> > +_asr_resolver_done_tp(void *arg)
> > +{
> > +   struct asr **priv = arg;
> > +   struct asr *asr;
> > +
> > +   if (*priv == NULL)
> > +   return;
> > +   asr = *priv;
> > +
> > +   _asr_ctx_unref(asr->a_ctx);
> > +   free(asr);
> > +   free(priv);
> > +}
> > +
> >  void *
> >  asr_resolver_from_string(const char *str)
> >  {
> > @@ -349,7 +364,8 @@ _asr_use_resolver(void *arg)
> > }
> > else {
> > DPRINT("using thread-local resolver\n");
> > -   priv = _THREAD_PRIVATE(_asr, _asr, &_asr);
> > +   priv = _THREAD_PRIVATE_DT(_asr, _asr, _asr_resolver_done_tp,
> > +   &_asr);
> > if (*priv == NULL) {
> > DPRINT("setting up thread-local resolver\n");
> > *priv = _asr_resolver();
> > Index: include/thread_private.h
> > ===
> > RCS file: /cvs/src/lib/libc/include/thread_private.h,v
> > retrieving revision 1.35
> > diff -u -p -r1.35 thread_private.h
> > --- include/thread_private.h13 Feb 2019 13:22:14 -  1.35
> > +++ include/thread_private.h29 Dec 2020 11:10:08 -
> > @@ -51,7 +51,7 @@ PROTO_NORMAL(_malloc_init);
> >   * tc_tag_storage:
> >   * Returns a pointer to per-thread instance of data associated
> >   * with the given tag.  If the given tag is NULL a tag is
> > - * allocated and initialized automatically.
> > + * allocated and cleared automatically.
> >   *
> >   * tc_fork, tc_vfork:
> >   * If not NULL, they are called instead of the syscall stub, so that
> > @@ -98,7 +98,7 @@ struct thread_callbacks {
> > void(*tc_mutex_destroy)(void **);
> > void(*tc_tag_lock)(void **);
> >

Thread local data setup and destruct

2020-12-29 Thread Otto Moerbeek
Hi,

this is a continuation from the threads on bugs@

This version makes it explicit to *only* set up "TLS" (which actually
is just a pointer to static data) with the data provided if we're
running single threaded (i.e. no -pthread, or -pthread but no pthread
function that triggers multi-threaded init called yet).  In all other
cases the real allocated TLS is zeroed. This avoids the setup issue
asr is having.  This diff also makes sure asr uses a specialized
destructor for its TLS to kill a mem leak that occurs on thread
destruction.
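
For comparison, the public pthread interface expresses the same idea
with pthread_key_create(): the per-thread slot starts out NULL, and a
destructor frees whatever the thread stored there when it exits. This
is only a sketch of that analogy, not how tc_tag_storage works inside
libc; struct resolver, resolver_get() and the destroyed counter are
made-up names for illustration.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

struct resolver {	/* hypothetical stand-in for struct asr */
	int id;
};

static pthread_key_t resolver_key;
static pthread_once_t resolver_once = PTHREAD_ONCE_INIT;
static atomic_int destroyed;	/* counts destructor runs; demo only */

/* Called at thread exit for each thread whose slot is non-NULL. */
static void
resolver_destroy(void *arg)
{
	free(arg);
	atomic_fetch_add(&destroyed, 1);
}

static void
resolver_key_init(void)
{
	int rc = pthread_key_create(&resolver_key, resolver_destroy);

	assert(rc == 0);
	(void)rc;
}

/* Return the calling thread's resolver, creating it on first use. */
static struct resolver *
resolver_get(void)
{
	struct resolver *r;

	pthread_once(&resolver_once, resolver_key_init);
	r = pthread_getspecific(resolver_key);	/* NULL for a fresh thread */
	if (r == NULL) {
		r = calloc(1, sizeof(*r));
		if (r != NULL && pthread_setspecific(resolver_key, r) != 0) {
			free(r);
			r = NULL;
		}
	}
	return r;
}

/* Thread body: touch the resolver so that the slot gets populated. */
static void *
worker(void *arg)
{
	(void)arg;
	return resolver_get() != NULL ? (void *)1 : NULL;
}
```

A thread that runs worker() and is then joined leaves destroyed at 1:
the destructor has run and nothing is leaked on thread destruction.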

All in all I'm still a bit fond of the constructor approach to
librthread init: it makes the distinction between having a single
thread and being the main thread in a multi-threaded setup moot.

Anyway, we can consider that later, please test this.

-Otto

Index: asr/asr.c
===
RCS file: /cvs/src/lib/libc/asr/asr.c,v
retrieving revision 1.64
diff -u -p -r1.64 asr.c
--- asr/asr.c   6 Jul 2020 13:33:05 -   1.64
+++ asr/asr.c   29 Dec 2020 11:10:08 -
@@ -117,7 +117,7 @@ _asr_resolver_done(void *arg)
_asr_ctx_unref(ac);
return;
} else {
-   priv = _THREAD_PRIVATE(_asr, _asr, &_asr);
+   priv = _THREAD_PRIVATE_DT(_asr, _asr, NULL, &_asr);
if (*priv == NULL)
return;
asr = *priv;
@@ -128,6 +128,21 @@ _asr_resolver_done(void *arg)
free(asr);
 }
 
+static void
+_asr_resolver_done_tp(void *arg)
+{
+   struct asr **priv = arg;
+   struct asr *asr;
+
+   if (*priv == NULL)
+   return;
+   asr = *priv;
+
+   _asr_ctx_unref(asr->a_ctx);
+   free(asr);
+   free(priv);
+}
+
 void *
 asr_resolver_from_string(const char *str)
 {
@@ -349,7 +364,8 @@ _asr_use_resolver(void *arg)
}
else {
DPRINT("using thread-local resolver\n");
-   priv = _THREAD_PRIVATE(_asr, _asr, &_asr);
+   priv = _THREAD_PRIVATE_DT(_asr, _asr, _asr_resolver_done_tp,
+   &_asr);
if (*priv == NULL) {
DPRINT("setting up thread-local resolver\n");
*priv = _asr_resolver();
Index: include/thread_private.h
===
RCS file: /cvs/src/lib/libc/include/thread_private.h,v
retrieving revision 1.35
diff -u -p -r1.35 thread_private.h
--- include/thread_private.h13 Feb 2019 13:22:14 -  1.35
+++ include/thread_private.h29 Dec 2020 11:10:08 -
@@ -51,7 +51,7 @@ PROTO_NORMAL(_malloc_init);
  * tc_tag_storage:
  * Returns a pointer to per-thread instance of data associated
  * with the given tag.  If the given tag is NULL a tag is
- * allocated and initialized automatically.
+ * allocated and cleared automatically.
  *
  * tc_fork, tc_vfork:
  * If not NULL, they are called instead of the syscall stub, so that
@@ -98,7 +98,7 @@ struct thread_callbacks {
void(*tc_mutex_destroy)(void **);
void(*tc_tag_lock)(void **);
void(*tc_tag_unlock)(void **);
-   void*(*tc_tag_storage)(void **, void *, size_t, void *);
+   void*(*tc_tag_storage)(void **, size_t, void (*)(void *), void *);
__pid_t (*tc_fork)(void);
__pid_t (*tc_vfork)(void);
void(*tc_thread_release)(struct pthread *);
@@ -142,6 +142,7 @@ __END_HIDDEN_DECLS
 #define _THREAD_PRIVATE_MUTEX_LOCK(name)   do {} while (0)
 #define _THREAD_PRIVATE_MUTEX_UNLOCK(name) do {} while (0)
 #define _THREAD_PRIVATE(keyname, storage, error)   &(storage)
+#define _THREAD_PRIVATE_DT(keyname, storage, dt, error)&(storage)
 #define _MUTEX_LOCK(mutex) do {} while (0)
 #define _MUTEX_UNLOCK(mutex)   do {} while (0)
 #define _MUTEX_DESTROY(mutex)  do {} while (0)
@@ -168,7 +169,12 @@ __END_HIDDEN_DECLS
 #define _THREAD_PRIVATE(keyname, storage, error)   \
(_thread_cb.tc_tag_storage == NULL ? &(storage) :   \
_thread_cb.tc_tag_storage(&(__THREAD_NAME(keyname)),\
-   &(storage), sizeof(storage), error))
+   sizeof(storage), NULL, (error)))
+
+#define _THREAD_PRIVATE_DT(keyname, storage, dt, error)
\
+   (_thread_cb.tc_tag_storage == NULL ? &(storage) :   \
+   _thread_cb.tc_tag_storage(&(__THREAD_NAME(keyname)),\
+   sizeof(storage), (dt), (error)))
 
 /*
  * Macros used in libc to access mutexes.
Index: thread/rthread_cb.h
===
RCS file: /cvs/src/lib/libc/thread/rthread_cb.h,v
retrieving revision 1.2
diff -u -p -r1.2 rthread_cb.h
--- thread/rthread_cb.h 5 Sep 2017 02:40:54 -   1.2
+++ thread/rthread_cb.h 29 Dec 2020 11:10:08 -

Re: Wake on LAN support for rge(4)

2020-12-23 Thread Otto Moerbeek
On Wed, Dec 23, 2020 at 12:35:46PM +0800, Kevin Lo wrote:

> Hi,
> 
> This diff implements WoL support in rge(4).  I can wakeup the machine with WoL
> after suspending it through `zzz` or powering off it through `halt -p`.

Thanks! This works as expected in my testing.

-Otto

> 
> Index: share/man/man4/rge.4
> ===
> RCS file: /cvs/src/share/man/man4/rge.4,v
> retrieving revision 1.4
> diff -u -p -u -p -r1.4 rge.4
> --- share/man/man4/rge.4  12 Oct 2020 02:11:10 -  1.4
> +++ share/man/man4/rge.4  23 Dec 2020 04:33:26 -
> @@ -37,6 +37,15 @@ Rivet Networks Killer E3000 Adapter (250
>  .It
>  TP-LINK TL-NG421 Adapter (2500baseT)
>  .El
> +.Pp
> +The
> +.Nm
> +driver additionally supports Wake on LAN (WoL).
> +See
> +.Xr arp 8
> +and
> +.Xr ifconfig 8
> +for more details.
>  .Sh SEE ALSO
>  .Xr arp 4 ,
>  .Xr ifmedia 4 ,
> Index: sys/dev/pci/if_rge.c
> ===
> RCS file: /cvs/src/sys/dev/pci/if_rge.c,v
> retrieving revision 1.9
> diff -u -p -u -p -r1.9 if_rge.c
> --- sys/dev/pci/if_rge.c  12 Dec 2020 11:48:53 -  1.9
> +++ sys/dev/pci/if_rge.c  23 Dec 2020 04:33:27 -
> @@ -59,6 +59,7 @@ int rge_debug = 0;
>  
>  int  rge_match(struct device *, void *, void *);
>  void rge_attach(struct device *, struct device *, void *);
> +int  rge_activate(struct device *, int);
>  int  rge_intr(void *);
>  int  rge_encap(struct rge_softc *, struct mbuf *, int);
>  int  rge_ioctl(struct ifnet *, u_long, caddr_t);
> @@ -111,6 +112,10 @@ int  rge_get_link_status(struct rge_soft
>  void rge_txstart(void *);
>  void rge_tick(void *);
>  void rge_link_state(struct rge_softc *);
> +#ifndef SMALL_KERNEL
> +int  rge_wol(struct ifnet *, int);
> +void rge_wol_power(struct rge_softc *);
> +#endif
>  
>  static const struct {
>   uint16_t reg;
> @@ -126,7 +131,7 @@ static const struct {
>  };
>  
>  struct cfattach rge_ca = {
> - sizeof(struct rge_softc), rge_match, rge_attach
> + sizeof(struct rge_softc), rge_match, rge_attach, NULL, rge_activate
>  };
>  
>  struct cfdriver rge_cd = {
> @@ -272,6 +277,11 @@ rge_attach(struct device *parent, struct
>   ifp->if_capabilities |= IFCAP_VLAN_HWTAGGING;
>  #endif
>  
> +#ifndef SMALL_KERNEL
> + ifp->if_capabilities |= IFCAP_WOL;
> + ifp->if_wol = rge_wol;
> + rge_wol(ifp, 0);
> +#endif
>   timeout_set(&sc->sc_timeout, rge_tick, sc);
>   task_set(&sc->sc_task, rge_txstart, sc);
>  
> @@ -288,6 +298,25 @@ rge_attach(struct device *parent, struct
>  }
>  
>  int
> +rge_activate(struct device *self, int act)
> +{
> + struct rge_softc *sc = (struct rge_softc *)self;
> + int rv = 0;
> +
> + switch (act) {
> + case DVACT_POWERDOWN:
> + rv = config_activate_children(self, act);
> +#ifndef SMALL_KERNEL
> + rge_wol_power(sc);
> +#endif
> + default:
> + rv = config_activate_children(self, act);
> + break;
> + }
> + return (rv);
> +}
> +
> +int
>  rge_intr(void *arg)
>  {
>   struct rge_softc *sc = arg;
> @@ -2025,6 +2054,7 @@ rge_hw_init(struct rge_softc *sc)
>   /* Set PCIe uncorrectable error status. */
>   rge_write_csi(sc, 0x108,
>   rge_read_csi(sc, 0x108) | 0x0010);
> +
>  }
>  
>  void
> @@ -2391,3 +2421,48 @@ rge_link_state(struct rge_softc *sc)
>   if_link_state_change(ifp);
>   }
>  }
> +
> +#ifndef SMALL_KERNEL
> +int
> +rge_wol(struct ifnet *ifp, int enable)
> +{
> + struct rge_softc *sc = ifp->if_softc;
> +
> + if (enable) {
> + if (!(RGE_READ_1(sc, RGE_CFG1) & RGE_CFG1_PM_EN)) {
> + printf("%s: power management is disabled, "
> + "cannot do WOL\n", sc->sc_dev.dv_xname);
> + return (ENOTSUP);
> + }
> +
> + }
> +
> + rge_iff(sc);
> +
> + if (enable)
> + RGE_MAC_SETBIT(sc, 0xc0b6, 0x0001);
> + else
> + RGE_MAC_CLRBIT(sc, 0xc0b6, 0x0001);
> +
> + RGE_SETBIT_1(sc, RGE_EECMD, RGE_EECMD_WRITECFG);
> + RGE_CLRBIT_1(sc, RGE_CFG5, RGE_CFG5_WOL_LANWAKE | RGE_CFG5_WOL_UCAST |
> + RGE_CFG5_WOL_MCAST | RGE_CFG5_WOL_BCAST);
> + RGE_CLRBIT_1(sc, RGE_CFG3, RGE_CFG3_WOL_LINK | RGE_CFG3_WOL_MAGIC);
> + if (enable)
> + RGE_SETBIT_1(sc, RGE_CFG5, RGE_CFG5_WOL_LANWAKE);
> + RGE_CLRBIT_1(sc, RGE_EECMD, RGE_EECMD_WRITECFG);
> +
> + return (0);
> +}
> +
> +void
> +rge_wol_power(struct rge_softc *sc)
> +{
> + /* Disable RXDV gate. */
> + RGE_CLRBIT_1(sc, RGE_PPSW, 0x08);
> + DELAY(2000);
> +
> + RGE_SETBIT_1(sc, RGE_CFG1, RGE_CFG1_PM_EN);
> + RGE_SETBIT_1(sc, RGE_CFG2, RGE_CFG2_PMSTS_EN);
> +}
> +#endif
> Index: sys/dev/pci/if_rgereg.h
> ===
> 

Re: kdump: show scope for v6 addresses if set

2020-12-20 Thread Otto Moerbeek
On Sun, Dec 20, 2020 at 02:34:09PM +0100, Claudio Jeker wrote:

> On Sun, Dec 20, 2020 at 01:39:57PM +0100, Otto Moerbeek wrote:
> > Hi,
> > 
> > scope is there, just not shown. While there, use proper constants for
> > two sizes.
> > 
> > -Otto
> > 
> > 
> > Index: ktrstruct.c
> > ===
> > RCS file: /cvs/src/usr.bin/kdump/ktrstruct.c,v
> > retrieving revision 1.28
> > diff -u -p -r1.28 ktrstruct.c
> > --- ktrstruct.c 17 Nov 2018 20:46:12 -  1.28
> > +++ ktrstruct.c 20 Dec 2020 12:34:34 -
> > @@ -90,7 +90,7 @@ ktrsockaddr(struct sockaddr *sa)
> > switch(sa->sa_family) {
> > case AF_INET: {
> > struct sockaddr_in  *sa_in;
> > -   char addr[64];
> > +   char addr[INET_ADDRSTRLEN];
> >  
> > sa_in = (struct sockaddr_in *)sa;
> > check_sockaddr_len(in);
> > @@ -100,12 +100,15 @@ ktrsockaddr(struct sockaddr *sa)
> > }
> > case AF_INET6: {
> > struct sockaddr_in6 *sa_in6;
> > -   char addr[64];
> > +   char addr[INET6_ADDRSTRLEN], scope[12] = { 0 };
> >  
> > sa_in6 = (struct sockaddr_in6 *)sa;
> > check_sockaddr_len(in6);
> > inet_ntop(AF_INET6, &sa_in6->sin6_addr, addr, sizeof addr);
> > -   printf("[%s]:%u", addr, htons(sa_in6->sin6_port));
> > +   if (sa_in6->sin6_scope_id)
> > +   snprintf(scope, sizeof(scope), "%%%u",
> > +   sa_in6->sin6_scope_id);
> 
> Would it make sense to use if_indextoname() here to translate the string
> into an interface name? The snprintf would still be needed for the case
> where NULL is returned by if_indextoname().

That translation depends on the machine kdump is run on, so it
will often give the wrong interface. And even on the same machine the
network config might have changed.

-Otto


> > +   printf("[%s%s]:%u", addr, scope, htons(sa_in6->sin6_port));
> > break;
> > }
> > case AF_UNIX: {
> > 
> 
> -- 
> :wq Claudio
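
The numeric-scope formatting discussed above can be sketched outside kdump as follows. This is only an illustration under assumptions: format_sin6() is a hypothetical helper, not code from ktrstruct.c, and it uses ntohs() to print the port where the diff uses htons() (the two perform the same byte swap).

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper (not part of kdump): format an IPv6 socket address
 * as "[addr%scope]:port".  The scope is printed as a plain number, so the
 * result does not depend on the interface table of the machine that
 * formats it.
 */
static int
format_sin6(const struct sockaddr_in6 *sin6, char *buf, size_t len)
{
	/* "%" + up to 10 digits of a 32-bit value + NUL fits in 12 bytes. */
	char addr[INET6_ADDRSTRLEN], scope[12] = { 0 };

	if (inet_ntop(AF_INET6, &sin6->sin6_addr, addr, sizeof(addr)) == NULL)
		return -1;
	/* Only show the scope when one is set. */
	if (sin6->sin6_scope_id != 0)
		snprintf(scope, sizeof(scope), "%%%u",
		    (unsigned int)sin6->sin6_scope_id);
	snprintf(buf, len, "[%s%s]:%u", addr, scope,
	    (unsigned int)ntohs(sin6->sin6_port));
	return 0;
}
```

With sin6_scope_id set to 3 on fe80::1 port 53 this yields "[fe80::1%3]:53"; with a zero scope the "%..." part is simply left out.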



kdump: show scope for v6 addresses if set

2020-12-20 Thread Otto Moerbeek
Hi,

scope is there, just not shown. While there, use proper constants for
two sizes.

-Otto


Index: ktrstruct.c
===
RCS file: /cvs/src/usr.bin/kdump/ktrstruct.c,v
retrieving revision 1.28
diff -u -p -r1.28 ktrstruct.c
--- ktrstruct.c 17 Nov 2018 20:46:12 -  1.28
+++ ktrstruct.c 20 Dec 2020 12:34:34 -
@@ -90,7 +90,7 @@ ktrsockaddr(struct sockaddr *sa)
switch(sa->sa_family) {
case AF_INET: {
struct sockaddr_in  *sa_in;
-   char addr[64];
+   char addr[INET_ADDRSTRLEN];
 
sa_in = (struct sockaddr_in *)sa;
check_sockaddr_len(in);
@@ -100,12 +100,15 @@ ktrsockaddr(struct sockaddr *sa)
}
case AF_INET6: {
struct sockaddr_in6 *sa_in6;
-   char addr[64];
+   char addr[INET6_ADDRSTRLEN], scope[12] = { 0 };
 
sa_in6 = (struct sockaddr_in6 *)sa;
check_sockaddr_len(in6);
inet_ntop(AF_INET6, &sa_in6->sin6_addr, addr, sizeof addr);
-   printf("[%s]:%u", addr, htons(sa_in6->sin6_port));
+   if (sa_in6->sin6_scope_id)
+   snprintf(scope, sizeof(scope), "%%%u",
+   sa_in6->sin6_scope_id);
+   printf("[%s%s]:%u", addr, scope, htons(sa_in6->sin6_port));
break;
}
case AF_UNIX: {



Re: dig vs ipv6 (scoped) addresses

2020-12-20 Thread Otto Moerbeek
On Sun, Dec 20, 2020 at 10:48:01AM +0100, Florian Obser wrote:

> On Thu, Dec 17, 2020 at 12:15:16PM +0100, Otto Moerbeek wrote:
> > Hi,
> 
> > 
> > As noted on misc, dig does not like to talk to link-local addresses.
> > This fixes that case. While investigating I also found another bug:
> 
> Thanks for looking into this. Looks like I got distracted while
> ripping out isc_sockaddr and did not fully clean it up. Probably
> because I found another isc_indirection to delete :/
> 
> I'd rather like to get rid of isc_sockaddr_fromin* completely, see
> diff at the end.
> 
> > selecting v6 or v4 addresses only from resolv.conf via the -4 or -6
> > command line argument does not work as expected.
> 
> Nice catch.
> 
> > 
> > This fixes both. I did not test binding to a src address with this.
> > This is likely as broken as it was before.
> 
> My diff fixes that, too.
> 
> I still need to keep isc_sockaddr_fromin* because it's used for
> +subnet i.e. ecs. Which is broken, too. I'm having a look now.
> 
> > 
> > -Otto
> > 
> 
> > Index: lib/lwres/lwconfig.c
> > ===
> > RCS file: /cvs/src/usr.bin/dig/lib/lwres/lwconfig.c,v
> > retrieving revision 1.6
> > diff -u -p -r1.6 lwconfig.c
> > --- lib/lwres/lwconfig.c25 Feb 2020 05:00:43 -  1.6
> > +++ lib/lwres/lwconfig.c17 Dec 2020 11:06:49 -
> > @@ -231,7 +231,7 @@ lwres_conf_parsenameserver(lwres_conf_t 
> >  
> > res = lwres_create_addr(word, , 1);
> > use_ipv4 = confdata->flags & LWRES_USEIPV4;
> > -   use_ipv6 = confdata->flags & LWRES_USEIPV4;
> > +   use_ipv6 = confdata->flags & LWRES_USEIPV6;
> > if (res == LWRES_R_SUCCESS &&
> > ((address.family == LWRES_ADDRTYPE_V4 && use_ipv4) ||
> > (address.family == LWRES_ADDRTYPE_V6 && use_ipv6))) {
> > 
> 
> OK florian for this

Committed

> 
> OK for this version for the rest?

A few nits inline. With those either addressed or ignored, OK,

-Otto
> 
> 
> diff --git dig.c dig.c
> index a0988a0567b..6b142a03239 100644
> --- dig.c
> +++ dig.c
> @@ -17,7 +17,10 @@
>  /* $Id: dig.c,v 1.18 2020/09/15 11:47:42 florian Exp $ */
>  
>  /*! \file */
> -#include 
> +#include 
> +#include 
> +
> +#include 
>  
>  #include 
>  #include 
> @@ -1275,10 +1278,7 @@ dash_option(char *option, char *next, dig_lookup_t **lookup,
>   dns_rdatatype_t rdtype;
>   dns_rdataclass_t rdclass;
>   char textname[MXNAME];
> - struct in_addr in4;
> - struct in6_addr in6;
> - in_port_t srcport;
> - char *hash, *cmd;
> + char *cmd;
>   uint32_t num;
>   const char *errstr;
>  
> @@ -1346,28 +1346,39 @@ dash_option(char *option, char *next, dig_lookup_t **lookup,
>   if (value == NULL)
>   goto invalid_option;
>   switch (opt) {
> - case 'b':
> + case 'b': {
> + struct addrinfo *ai = NULL, hints;
> + int error;
> + char *hash;
> +
> + memset(&hints, 0, sizeof(hints));
> + hints.ai_flags = AI_NUMERICHOST;
> + hints.ai_socktype = SOCK_STREAM;

It does not really matter for the result, but SOCK_DGRAM feels more
natural for DNS.

> +
>   hash = strchr(value, '#');
>   if (hash != NULL) {
> - num = strtonum(hash + 1, 0, MAXPORT, &errstr);
> - if (errstr != NULL)
> - fatal("port number is %s: '%s'", errstr,
> - hash + 1);
> - srcport = num;
>   *hash = '\0';
> + error = getaddrinfo(value, hash + 1, &hints, &ai);
> + *hash = '#';
>   } else
> - srcport = 0;
> - if (have_ipv6 && inet_pton(AF_INET6, value, &in6) == 1)
> - isc_sockaddr_fromin6(&bind_address, &in6, srcport);
> - else if (have_ipv4 && inet_pton(AF_INET, value, &in4) == 1)
> - isc_sockaddr_fromin(&bind_address, &in4, srcport);
> - else
> + error = getaddrinfo(value, NULL, &hints, &ai);
> +
> + if (error)
> + fatal("invalid address %s: %s", value,
> + gai_strerror(error));
> + if (ai == NULL || ai->ai_addrlen > sizeof(bind_address))
> + fatal("invalid address %s", value);
> + if (!have_ipv4 && ai->ai_family ==

dig vs ipv6 (scoped) addresses

2020-12-17 Thread Otto Moerbeek
Hi,

As noted on misc, dig does not like to talk to link-local addresses.
This fixes that case. While investigating I also found another bug:
selecting v6 or v4 addresses only from resolv.conf via the -4 or -6
command line argument does not work as expected.

This fixes both. I did not test binding to a src address with this.
This is likely as broken as it was before.

-Otto

Index: dig.c
===
RCS file: /cvs/src/usr.bin/dig/dig.c,v
retrieving revision 1.18
diff -u -p -r1.18 dig.c
--- dig.c   15 Sep 2020 11:47:42 -  1.18
+++ dig.c   17 Dec 2020 11:06:49 -
@@ -1358,7 +1358,7 @@ dash_option(char *option, char *next, di
} else
srcport = 0;
if (have_ipv6 && inet_pton(AF_INET6, value, &in6) == 1)
-   isc_sockaddr_fromin6(&bind_address, &in6, srcport);
+   isc_sockaddr_fromin6(&bind_address, &in6, srcport, 0);
else if (have_ipv4 && inet_pton(AF_INET, value, &in4) == 1)
isc_sockaddr_fromin(&bind_address, &in4, srcport);
else
Index: dighost.c
===
RCS file: /cvs/src/usr.bin/dig/dighost.c,v
retrieving revision 1.34
diff -u -p -r1.34 dighost.c
--- dighost.c   15 Sep 2020 11:47:42 -  1.34
+++ dighost.c   17 Dec 2020 11:06:49 -
@@ -540,7 +540,7 @@ get_addresses(const char *hostname, in_p
struct sockaddr_in6 *sin6;
sin6 = (struct sockaddr_in6 *)tmpai->ai_addr;
isc_sockaddr_fromin6([i], >sin6_addr,
-dstport);
+dstport, sin6->sin6_scope_id);
}
i++;
 
@@ -972,7 +972,7 @@ parse_netprefix(struct sockaddr_storage 
 
if (inet_pton(AF_INET6, buf, ) == 1) {
parsed = 1;
-   isc_sockaddr_fromin6(sa, , 0);
+   isc_sockaddr_fromin6(sa, , 0, 0);
if (prefix_length > 128)
prefix_length = 128;
} else if (inet_pton(AF_INET, buf, ) == 1) {
Index: lib/isc/sockaddr.c
===
RCS file: /cvs/src/usr.bin/dig/lib/isc/sockaddr.c,v
retrieving revision 1.14
diff -u -p -r1.14 sockaddr.c
--- lib/isc/sockaddr.c  28 Nov 2020 06:33:55 -  1.14
+++ lib/isc/sockaddr.c  17 Dec 2020 11:06:49 -
@@ -223,7 +223,7 @@ isc_sockaddr_anyofpf(struct sockaddr_sto
 
 void
 isc_sockaddr_fromin6(struct sockaddr_storage *sockaddr, const struct in6_addr *ina6,
-in_port_t port)
+in_port_t port, uint32_t scope)
 {
struct sockaddr_in6 *sin6 = (struct sockaddr_in6 *) sockaddr;
memset(sockaddr, 0, sizeof(*sockaddr));
@@ -231,6 +231,7 @@ isc_sockaddr_fromin6(struct sockaddr_sto
sin6->sin6_len = sizeof(*sin6);
sin6->sin6_addr = *ina6;
sin6->sin6_port = htons(port);
+   sin6->sin6_scope_id = scope;
 }
 
 int
Index: lib/isc/include/isc/sockaddr.h
===
RCS file: /cvs/src/usr.bin/dig/lib/isc/include/isc/sockaddr.h,v
retrieving revision 1.7
diff -u -p -r1.7 sockaddr.h
--- lib/isc/include/isc/sockaddr.h  15 Sep 2020 11:47:42 -  1.7
+++ lib/isc/include/isc/sockaddr.h  17 Dec 2020 11:06:49 -
@@ -91,7 +91,7 @@ isc_sockaddr_fromin(struct sockaddr_stor
 
 void
 isc_sockaddr_fromin6(struct sockaddr_storage *sockaddr, const struct in6_addr *ina6,
-in_port_t port);
+in_port_t port, uint32_t scope);
 /*%<
  * Construct an struct sockaddr_storage from an IPv6 address and port.
  */
Index: lib/lwres/lwconfig.c
===
RCS file: /cvs/src/usr.bin/dig/lib/lwres/lwconfig.c,v
retrieving revision 1.6
diff -u -p -r1.6 lwconfig.c
--- lib/lwres/lwconfig.c25 Feb 2020 05:00:43 -  1.6
+++ lib/lwres/lwconfig.c17 Dec 2020 11:06:49 -
@@ -231,7 +231,7 @@ lwres_conf_parsenameserver(lwres_conf_t 
 
res = lwres_create_addr(word, , 1);
use_ipv4 = confdata->flags & LWRES_USEIPV4;
-   use_ipv6 = confdata->flags & LWRES_USEIPV4;
+   use_ipv6 = confdata->flags & LWRES_USEIPV6;
if (res == LWRES_R_SUCCESS &&
((address.family == LWRES_ADDRTYPE_V4 && use_ipv4) ||
(address.family == LWRES_ADDRTYPE_V6 && use_ipv6))) {
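
For context on why the scope id matters: inet_pton(3) only fills in a struct in6_addr, so the scope has to travel separately, while getaddrinfo(3) with AI_NUMERICHOST returns a complete sockaddr_in6 and, on the systems I know of, fills in sin6_scope_id from a "%scope" suffix. A minimal sketch under those assumptions follows; parse_numeric_addr() is a hypothetical helper, not part of dig.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netdb.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: parse a numeric address such as "192.0.2.1" or
 * "fe80::1%em0" and report its address family and IPv6 scope id.
 * Returns 0 on success or a getaddrinfo(3) error code.
 */
static int
parse_numeric_addr(const char *text, int *family, uint32_t *scope)
{
	struct addrinfo hints, *ai = NULL;
	int error;

	memset(&hints, 0, sizeof(hints));
	hints.ai_family = AF_UNSPEC;
	hints.ai_flags = AI_NUMERICHOST;	/* never query DNS */
	hints.ai_socktype = SOCK_DGRAM;

	error = getaddrinfo(text, NULL, &hints, &ai);
	if (error != 0)
		return error;

	*family = ai->ai_family;
	*scope = 0;
	if (ai->ai_family == AF_INET6) {
		/* For AF_INET6 results ai_addr holds a struct sockaddr_in6. */
		struct sockaddr_in6 sin6;

		memcpy(&sin6, ai->ai_addr, sizeof(sin6));
		*scope = sin6.sin6_scope_id;
	}
	freeaddrinfo(ai);
	return 0;
}
```

The caller can then apply the -4/-6 filtering on the returned family, which is the other bug fixed in the lwconfig.c hunk.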



Re: syspatch exit state

2020-12-06 Thread Otto Moerbeek
On Sun, Dec 06, 2020 at 03:31:19PM +, SW wrote:

> On 06/12/2020 14:32, Otto Moerbeek wrote:
> > On Sun, Dec 06, 2020 at 02:19:05PM +, SW wrote:
> >
> >> Hi,
> >> I've been looking to have syspatch give me a quick indication of whether
> >> a reboot is likely to be required. As a quick and dirty check, I've just
> >> been treating "Were patches applied?" as the indicator.
> >>
> >> The following diff will cause syspatch to exit when applying patches
> >> with status code 0 only if patches were actually applied.
> >>
> >> My biggest concern is that it does cause a change in behaviour, so
> >> perhaps this either needs making into an option or a different approach
> >> entirely?
> >>
> >> --- syspatch    Sun Dec  6 14:11:12 2020
> >> +++ syspatch    Sun Dec  6 14:10:23 2020
> >> @@ -323,3 +323,9 @@ if ((OPTIND == 1)); then
> >>     _PATCH_APPLIED=true
> >>     done
> >>  fi
> >> +
> >> +if [ "$_PATCH_APPLIED" = "true" ]; then
> >> +   exit 0
> >> +else
> >> +   exit 1
> >> +fi
> >>
> >> Thanks,
> >> S
> >>
> > I don't think this is correct since it makes syspatch exit 1 on "no patches
> > applied".
> >
> > -Otto
> > .
> That's precisely the idea- from previous discussion with a couple of
> people there didn't seem to be an easy (programmatic) way to figure out
> whether syspatch had done anything or not.

Exit code 1 is normally used for error conditions. A system being
up-to-date is not an error condition.

-Otto


> 
> Doing this would be a bit of a blunt way of handling things, and perhaps
> it would be better gated behind a flag, but is there a better way to
> make a scripted update work automatically (including rebooting as
> necessary)?
> 
> Thanks,
> S



Re: syspatch exit state

2020-12-06 Thread Otto Moerbeek
On Sun, Dec 06, 2020 at 02:19:05PM +, SW wrote:

> Hi,
> I've been looking to have syspatch give me a quick indication of whether
> a reboot is likely to be required. As a quick and dirty check, I've just
> been treating "Were patches applied?" as the indicator.
> 
> The following diff will cause syspatch to exit when applying patches
> with status code 0 only if patches were actually applied.
> 
> My biggest concern is that it does cause a change in behaviour, so
> perhaps this either needs making into an option or a different approach
> entirely?
> 
> --- syspatch    Sun Dec  6 14:11:12 2020
> +++ syspatch    Sun Dec  6 14:10:23 2020
> @@ -323,3 +323,9 @@ if ((OPTIND == 1)); then
>     _PATCH_APPLIED=true
>     done
>  fi
> +
> +if [ "$_PATCH_APPLIED" = "true" ]; then
> +   exit 0
> +else
> +   exit 1
> +fi
> 
> Thanks,
> S
> 

I don't think this is correct since it makes syspatch exit 1 on "no patches applied".

-Otto



Re: clean /dev from /etc/daily ?

2020-11-23 Thread Otto Moerbeek
On Mon, Nov 23, 2020 at 01:53:01PM +0100, Solene Rapenne wrote:

> A common mistake when using dd is to create a file in /dev which
> fills up the space of / and may stay silent until / gets filled up
> by something else that will fail.
> 
> Would it be OK to add this in /etc/daily?
> 
> find /dev -type f ! -name MAKEDEV -delete
> 
> AFAIK /dev should have only MAKEDEV as a regular file.
> hier(7) says /dev only have block and character devices
> with the exception of MAKEDEV.
> 

Reporting is good, but deleting is not.

-Otto



Ryzen 5800X hw.setperf vs hw.cpuspeed

2020-11-20 Thread Otto Moerbeek
Hi,

I got a new Ryzen machine, dmesg below. What I'm observing might be an
issue with hw.setperf.

On startup it shows:

hw.cpuspeed=3800
hw.setperf=100

If I lower hw.setperf to zero, the new state is reflected immediately in
hw.cpuspeed:

hw.cpuspeed=2200
hw.setperf=0

And also sha256 -t becomes slower as expected.

But if I raise hw.setperf to 100 I'm seeing:

hw.cpuspeed=2200
hw.setperf=100

and sha256 -t is still slow. Only after some time passes (let's say a
couple of tens of seconds) does it show:

hw.cpuspeed=3800
hw.setperf=100

and sha256 -t is fast again.

This behaviour is different from my old machine, where setting
hw.setperf was reflected in hw.cpuspeed immediately both ways.

Any clue?

-Otto

OpenBSD 6.8-current (GENERIC.MP) #1: Thu Nov 19 21:01:06 CET 2020
o...@lou.intra.drijf.net:/usr/src/sys/arch/amd64/compile/GENERIC.MP
real mem = 34286964736 (32698MB)
avail mem = 33232543744 (31693MB)
random: good seed from bootblocks
mpath0 at root
scsibus0 at mpath0: 256 targets
mainbus0 at root
bios0 at mainbus0: SMBIOS rev. 3.3 @ 0xe8d60 (55 entries)
bios0: vendor American Megatrends Inc. version "F11d" date 10/29/2020
bios0: Gigabyte Technology Co., Ltd. B550 AORUS ELITE
acpi0 at bios0: ACPI 6.0
acpi0: sleep states S0 S3 S4 S5
acpi0: tables DSDT FACP SSDT SSDT SSDT SSDT FIDT MCFG HPET BGRT IVRS PCCT SSDT CRAT CDIT SSDT SSDT SSDT SSDT WSMT APIC SSDT SSDT SSDT FPDT
acpi0: wakeup devices GPP0(S4) GP12(S4) GP13(S4) XHC0(S4) GP30(S4) GP31(S4) GPP2(S4) GPP3(S4) GPP8(S4) GPP1(S4)
acpitimer0 at acpi0: 3579545 Hz, 32 bits
acpimcfg0 at acpi0
acpimcfg0: addr 0xf000, bus 0-127
acpihpet0 at acpi0: 14318180 Hz
acpimadt0 at acpi0 addr 0xfee0: PC-AT compat
cpu0 at mainbus0: apid 0 (boot processor)
cpu0: AMD Ryzen 7 5800X 8-Core Processor, 3793.35 MHz, 19-21-00
cpu0: 
FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,HTT,SSE3,PCLMUL,MWAIT,SSSE3,FMA3,CX16,SSE4.1,SSE4.2,MOVBE,POPCNT,AES,XSAVE,AVX,F16C,RDRAND,NXE,MMXX,FFXSR,PAGE1GB,RDTSCP,LONG,LAHF,CMPLEG,SVM,EAPICSP,AMCR8,ABM,SSE4A,MASSE,3DNOWP,OSVW,IBS,SKINIT,TCE,TOPEXT,CPCTR,DBKP,PCTRL3,MWAITX,ITSC,FSGSBASE,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,PQM,RDSEED,ADX,SMAP,CLFLUSHOPT,CLWB,SHA,UMIP,PKU,IBPB,IBRS,STIBP,SSBD,XSAVEOPT,XSAVEC,XGETBV1,XSAVES
cpu0: 32KB 64b/line 8-way I-cache, 32KB 64b/line 8-way D-cache, 512KB 64b/line 8-way L2 cache, 32MB 64b/line disabled L3 cache
cpu0: ITLB 64 4KB entries fully associative, 64 4MB entries fully associative
cpu0: DTLB 64 4KB entries fully associative, 64 4MB entries fully associative
cpu0: smt 0, core 0, package 0
mtrr: Pentium Pro MTRR support, 8 var ranges, 88 fixed ranges
cpu0: apic clock running at 99MHz
cpu0: mwait min=64, max=64, C-substates=1.1, IBE
cpu1 at mainbus0: apid 2 (application processor)
cpu1: AMD Ryzen 7 5800X 8-Core Processor, 3792.89 MHz, 19-21-00
cpu1: 
FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,HTT,SSE3,PCLMUL,MWAIT,SSSE3,FMA3,CX16,SSE4.1,SSE4.2,MOVBE,POPCNT,AES,XSAVE,AVX,F16C,RDRAND,NXE,MMXX,FFXSR,PAGE1GB,RDTSCP,LONG,LAHF,CMPLEG,SVM,EAPICSP,AMCR8,ABM,SSE4A,MASSE,3DNOWP,OSVW,IBS,SKINIT,TCE,TOPEXT,CPCTR,DBKP,PCTRL3,MWAITX,ITSC,FSGSBASE,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,PQM,RDSEED,ADX,SMAP,CLFLUSHOPT,CLWB,SHA,UMIP,PKU,IBPB,IBRS,STIBP,SSBD,XSAVEOPT,XSAVEC,XGETBV1,XSAVES
cpu1: 32KB 64b/line 8-way I-cache, 32KB 64b/line 8-way D-cache, 512KB 64b/line 8-way L2 cache, 32MB 64b/line disabled L3 cache
cpu1: ITLB 64 4KB entries fully associative, 64 4MB entries fully associative
cpu1: DTLB 64 4KB entries fully associative, 64 4MB entries fully associative
cpu1: smt 0, core 1, package 0
cpu2 at mainbus0: apid 4 (application processor)
cpu2: AMD Ryzen 7 5800X 8-Core Processor, 3792.89 MHz, 19-21-00
cpu2: 
FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CFLUSH,MMX,FXSR,SSE,SSE2,HTT,SSE3,PCLMUL,MWAIT,SSSE3,FMA3,CX16,SSE4.1,SSE4.2,MOVBE,POPCNT,AES,XSAVE,AVX,F16C,RDRAND,NXE,MMXX,FFXSR,PAGE1GB,RDTSCP,LONG,LAHF,CMPLEG,SVM,EAPICSP,AMCR8,ABM,SSE4A,MASSE,3DNOWP,OSVW,IBS,SKINIT,TCE,TOPEXT,CPCTR,DBKP,PCTRL3,MWAITX,ITSC,FSGSBASE,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,PQM,RDSEED,ADX,SMAP,CLFLUSHOPT,CLWB,SHA,UMIP,PKU,IBPB,IBRS,STIBP,SSBD,XSAVEOPT,XSAVEC,XGETBV1,XSAVES
cpu2: 32KB 64b/line 8-way I-cache, 32KB 64b/line 8-way D-cache, 512KB 64b/line 8-way L2 cache, 32MB 64b/line disabled L3 cache
cpu2: ITLB 64 4KB entries fully associative, 64 4MB entries fully associative
cpu2: DTLB 64 4KB entries fully associative, 64 4MB entries fully associative
cpu2: smt 0, core 2, package 0
cpu3 at mainbus0: apid 6 (application processor)
cpu3: AMD Ryzen 7 5800X 8-Core Processor, 3792.89 MHz, 19-21-00
cpu3: 

Re: diff: tcp ack improvement

2020-11-05 Thread Otto Moerbeek
On Fri, Nov 06, 2020 at 01:10:52AM +0100, Jan Klemkow wrote:

> Hi,
> 
> bluhm and I did some network performance measurements and kernel
> profiling.
> 
> Setup: Linux (iperf) -10gbit-> OpenBSD (relayd) -10gbit-> Linux (iperf)
> 
> We figured out that the kernel uses a huge amount of processing time
> for sending ACKs to the sender on the receiving interface.  After
> receiving a data segment, we send out two ACKs.  The first one is sent
> in tcp_input() directly after receiving.  The second ACK is sent out
> after the userland or the sosplice task has read some data out of the
> socket buffer.
> 
> The first ACK in tcp_input() is sent after receiving every other data
> segment, as described in RFC 1122:
> 
>   4.2.3.2  When to Send an ACK Segment
>   A TCP SHOULD implement a delayed ACK, but an ACK should
>   not be excessively delayed; in particular, the delay
>   MUST be less than 0.5 seconds, and in a stream of
>   full-sized segments there SHOULD be an ACK for at least
>   every second segment.
> 
> This advice is based on the paper "Congestion Avoidance and Control":
> 
>   4 THE GATEWAY SIDE OF CONGESTION CONTROL
>   The 8 KBps senders were talking to 4.3+BSD receivers
>   which would delay an ack for at most one packet (because
>   of an ack's 'clock' role, the authors believe that the
>   minimum ack frequency should be every other packet).
> 
> Sending the first ACK (on every other packet) costs us too much
> processing time.  Thus, we run into a full socket buffer earlier.  The
> first ACK just acknowledges the received data, but does not update the
> window.  The second ACK, caused by the socket buffer reader, also
> acknowledges the data and also updates the window.  So the second ACK
> is worth much more for fast packet processing than the first one.
> 
> The performance improvement is about 33% with splicing and 20% without
> splicing:
> 
>                   splicing    relaying
> 
>   current         3.1 GBit/s  2.6 GBit/s
>   w/o first ack   4.1 GBit/s  3.1 GBit/s
> 
> As far as I understand the implementation of other operating systems:
> Linux has implemented a custom TCP_QUICKACK socket option to turn this
> kind of feature on and off.  FreeBSD and NetBSD still depend on it when
> using the New Reno implementation.
> 
> The following diff turns off the direct ACK on every other segment.  We
> have been running this diff in production on our own machines at genua
> and on our products for several months now.  We haven't noticed any
> problems, either with interactive network sessions (ssh) or with bulk
> traffic.
> 
> Another solution could be a sysctl(3) or an additional socket option,
> similar to Linux, to control this behavior per socket or system wide.
> Also, a counter to ACK every 3rd, 4th... data segment could beat the
> problem.

I am wondering if you also looked at another scenario: the process
reading the socket is sleeping so the receive buffer fills up without
any acks being sent. Won't that lead to a lot of retransmissions
containing data?

-Otto

> 
> bye,
> Jan
> 
> Index: netinet/tcp_input.c
> ===
> RCS file: /cvs/src/sys/netinet/tcp_input.c,v
> retrieving revision 1.365
> diff -u -p -r1.365 tcp_input.c
> --- netinet/tcp_input.c   19 Jun 2020 22:47:22 -  1.365
> +++ netinet/tcp_input.c   5 Nov 2020 23:00:34 -
> @@ -165,8 +165,8 @@ do { \
>  #endif
>  
>  /*
> - * Macro to compute ACK transmission behavior.  Delay the ACK unless
> - * we have already delayed an ACK (must send an ACK every two segments).
> + * Macro to compute ACK transmission behavior.  Delay the ACK until
> + * a read from the socket buffer or the delayed ACK timer causes one.
>   * We also ACK immediately if we received a PUSH and the ACK-on-PUSH
>   * option is enabled or when the packet is coming from a loopback
>   * interface.
> @@ -176,8 +176,7 @@ do { \
>   struct ifnet *ifp = NULL; \
>   if (m && (m->m_flags & M_PKTHDR)) \
>   ifp = if_get(m->m_pkthdr.ph_ifidx); \
> - if (TCP_TIMER_ISARMED(tp, TCPT_DELACK) || \
> - (tcp_ack_on_push && (tiflags) & TH_PUSH) || \
> + if ((tcp_ack_on_push && (tiflags) & TH_PUSH) || \
>   (ifp && (ifp->if_flags & IFF_LOOPBACK))) \
>   tp->t_flags |= TF_ACKNOW; \
>   else \
> 
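
To make the effect of the macro change easier to see, here is a toy model of the decision, assuming (as a simplification of mine) that sending an immediate ACK also cancels the pending delayed-ACK timer. The struct and function names are made up for illustration and do not appear in tcp_input.c.

```c
#include <stdbool.h>

/* Simplified per-connection state; the names are illustrative only. */
struct ack_state {
	bool delack_armed;	/* delayed-ACK timer pending */
	unsigned int acks_now;	/* immediate ACKs sent so far */
};

/*
 * Decide what to do for one received data segment.  With
 * ack_every_other set this models the old macro: a segment arriving
 * while the timer is armed forces an immediate ACK.  Without it only
 * PUSH (if enabled) or a loopback interface does.
 */
static void
segment_received(struct ack_state *st, bool ack_every_other, bool push,
    bool ack_on_push, bool loopback)
{
	if ((ack_every_other && st->delack_armed) ||
	    (ack_on_push && push) || loopback) {
		st->acks_now++;
		st->delack_armed = false;	/* the ACK cancels the timer */
	} else {
		st->delack_armed = true;	/* wait for reader or timer */
	}
}

/* Count immediate ACKs for n back-to-back segments without PUSH. */
static unsigned int
count_immediate_acks(unsigned int n, bool ack_every_other)
{
	struct ack_state st = { false, 0 };
	unsigned int i;

	for (i = 0; i < n; i++)
		segment_received(&st, ack_every_other, false, true, false);
	return st.acks_now;
}
```

In this model ten bulk segments cause five immediate ACKs under the old rule and none under the new one; the remaining ACKs then come from the socket buffer reader or the delayed-ACK timer.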



Re: dig(1): Extended DNS Error (RFC 8914)

2020-10-30 Thread Otto Moerbeek
On Fri, Oct 30, 2020 at 03:04:03PM +0100, Florian Obser wrote:

Love it,

-Otto

> $ obj/dig @1.1.1.1 dnssec-failed.org
> 
> ; <<>> dig 9.10.8-P1 <<>> @1.1.1.1 dnssec-failed.org
> ; (1 server found)
> ;; global options: +cmd
> ;; Got answer:
> ;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 26772
> ;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1
> 
> ;; OPT PSEUDOSECTION:
> ; EDNS: version: 0, flags:; udp: 1232
> ; EDE: 6 (DNSSEC Bogus)
> ;; QUESTION SECTION:
> ;dnssec-failed.org. IN  A
> 
> ;; Query time: 244 msec
> ;; SERVER: 1.1.1.1#53(1.1.1.1)
> ;; WHEN: Fri Oct 30 14:59:09 CET 2020
> ;; MSG SIZE  rcvd: 52
> 
> Since I'm not aware of a server/query combination that responds with
> UTF-8 encoded EXTENDED-TEXT I didn't implement anything special for
> this so it will use the default renderer that's also used for NSIDs,
> printing a hexdump + printable ascii, e.g.:
> 
> $ dig @k.root-servers.net +nsid . soa
> [...]
> ;; OPT PSEUDOSECTION:
> ; EDNS: version: 0, flags:; udp: 1232
> ; NSID: 6e 73 33 2e 6e 6c 2d 61 6d 73 2e 6b 2e 72 69 70 65 2e 6e 65 74 ("ns3.nl-ams.k.ripe.net")
> 
> OK?
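
For readers unfamiliar with the wire format: per RFC 8914 the EDE option payload is a 16-bit INFO-CODE in network byte order followed by optional UTF-8 EXTRA-TEXT that is not guaranteed to be NUL-terminated. A standalone decoding sketch follows; struct ede_option and ede_parse() are hypothetical names, not part of dig or the ISC library.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical container for a decoded EDE option. */
struct ede_option {
	uint16_t info_code;		/* INFO-CODE, host byte order */
	const uint8_t *extra_text;	/* EXTRA-TEXT, not NUL-terminated */
	size_t extra_len;		/* may be zero */
};

/*
 * Decode the payload of an EDNS option with code 15.  Returns 0 on
 * success, -1 if the payload is too short to hold the INFO-CODE.
 */
static int
ede_parse(const uint8_t *data, size_t len, struct ede_option *out)
{
	if (len < 2)
		return -1;
	/* The first two octets are the big-endian INFO-CODE. */
	out->info_code = (uint16_t)((data[0] << 8) | data[1]);
	out->extra_text = data + 2;
	out->extra_len = len - 2;
	return 0;
}
```

The payload 00 06 decodes to info code 6, which the table in the diff maps to "DNSSEC Bogus", matching the dig output above.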
> 
> diff --git lib/dns/include/dns/message.h lib/dns/include/dns/message.h
> index 65ffcfd4c3f..a70720eee39 100644
> --- lib/dns/include/dns/message.h
> +++ lib/dns/include/dns/message.h
> @@ -104,6 +104,7 @@
>  #define DNS_OPT_COOKIE   10  /*%< COOKIE opt code */
>  #define DNS_OPT_PAD  12  /*%< PAD opt code */
>  #define DNS_OPT_KEY_TAG  14  /*%< Key tag opt code */
> +#define DNS_OPT_EDE  15  /* RFC 8914 */
>  
>  /*%< The number of EDNS options we know about. */
>  #define DNS_EDNSOPTIONS  4
> diff --git lib/dns/message.c lib/dns/message.c
> index 5e0fb167382..9721f9c0ef4 100644
> --- lib/dns/message.c
> +++ lib/dns/message.c
> @@ -2434,6 +2434,68 @@ render_ecs(isc_buffer_t *ecsbuf, isc_buffer_t *target) {
>   return (ISC_R_SUCCESS);
>  }
>  
> +static const char *
> +ede_info_code2str(uint16_t info_code)
> +{
> + if (info_code > 49151)
> + return "Private Use";
> +
> + switch (info_code) {
> + case 0:
> + return "Other Error";
> + case 1:
> + return "Unsupported DNSKEY Algorithm";
> + case 2:
> + return "Unsupported DS Digest Type";
> + case 3:
> + return "Stale Answer";
> + case 4:
> + return "Forged Answer";
> + case 5:
> + return "DNSSEC Indeterminate";
> + case 6:
> + return "DNSSEC Bogus";
> + case 7:
> + return "Signature Expired";
> + case 8:
> + return "Signature Not Yet Valid";
> + case 9:
> + return "DNSKEY Missing";
> + case 10:
> + return "RRSIGs Missing";
> + case 11:
> + return "No Zone Key Bit Set";
> + case 12:
> + return "NSEC Missing";
> + case 13:
> + return "Cached Error";
> + case 14:
> + return "Not Ready";
> + case 15:
> + return "Blocked";
> + case 16:
> + return "Censored";
> + case 17:
> + return "Filtered";
> + case 18:
> + return "Prohibited";
> + case 19:
> + return "Stale NXDomain Answer";
> + case 20:
> + return "Not Authoritative";
> + case 21:
> + return "Not Supported";
> + case 22:
> + return "No Reachable Authority";
> + case 23:
> + return "Network Error";
> + case 24:
> + return "Invalid Data";
> + default:
> + return "Unassigned";
> + }
> +}
> +
>  isc_result_t
>  dns_message_pseudosectiontotext(dns_message_t *msg,
>   dns_pseudosection_t section,
> @@ -2557,6 +2619,20 @@ dns_message_pseudosectiontotext(dns_message_t *msg,
>   ADD_STRING(target, "\n");
>   continue;
>   }
> + } else if (optcode == DNS_OPT_EDE) {
> + uint16_t info_code;
> + ADD_STRING(target, "; EDE");
> + if (optlen >= 2) {
> + info_code =
> + isc_buffer_getuint16();
> + optlen -= 2;
> + snprintf(buf, sizeof(buf), ": %u (",
> + info_code);
> + ADD_STRING(target, buf);
> + ADD_STRING(target,
> + ede_info_code2str(info_code));
> + ADD_STRING(target, ")");
> + }
>   } else {
>   ADD_STRING(target, "; 

tree.h: returning void, legal but weird

2020-10-10 Thread Otto Moerbeek


OK?

-Otto

Index: tree.h
===
RCS file: /cvs/src/sys/sys/tree.h,v
retrieving revision 1.29
diff -u -p -r1.29 tree.h
--- tree.h  30 Jul 2017 19:27:20 -  1.29
+++ tree.h  10 Oct 2020 16:36:15 -
@@ -910,25 +910,25 @@ _name##_RBT_PARENT(struct _type *elm) 
 __unused static inline void\
 _name##_RBT_SET_LEFT(struct _type *elm, struct _type *left)\
 {  \
-   return _rb_set_left(_name##_RBT_TYPE, elm, left);   \
+   _rb_set_left(_name##_RBT_TYPE, elm, left);  \
 }  \
\
 __unused static inline void\
 _name##_RBT_SET_RIGHT(struct _type *elm, struct _type *right)  \
 {  \
-   return _rb_set_right(_name##_RBT_TYPE, elm, right); \
+   _rb_set_right(_name##_RBT_TYPE, elm, right);\
 }  \
\
 __unused static inline void\
 _name##_RBT_SET_PARENT(struct _type *elm, struct _type *parent) \
 {  \
-   return _rb_set_parent(_name##_RBT_TYPE, elm, parent);   \
+   _rb_set_parent(_name##_RBT_TYPE, elm, parent);  \
 }  \
\
 __unused static inline void\
 _name##_RBT_POISON(struct _type *elm, unsigned long poison)\
 {  \
-   return _rb_poison(_name##_RBT_TYPE, elm, poison);   \
+   _rb_poison(_name##_RBT_TYPE, elm, poison);  \
 }  \
\
 __unused static inline int \



Re: random canary bytes for malloc

2020-10-04 Thread Otto Moerbeek
On Tue, Sep 29, 2020 at 08:17:54AM +0200, Otto Moerbeek wrote:

> Hi,
> 
> until now, canary bytes (used by the C option) were the same as the
> bytes used to junk (0xfd).  This means that certain overwrites are not
> detected, like setting the high bit.
> 
> This makes the byte value used to write canaries random. I do not want
> to complicate the code to handle all combinations of F and C, so 0xfd
> is still accepted as a canary byte.
> 
> Please test with all your favourite combinations of malloc flags.

Any takers apart from tb@ who tested this earlier?

-Otto

> 
> Index: malloc.c
> ===
> RCS file: /cvs/src/lib/libc/stdlib/malloc.c,v
> retrieving revision 1.263
> diff -u -p -r1.263 malloc.c
> --- malloc.c  6 Sep 2020 06:41:03 -   1.263
> +++ malloc.c  10 Sep 2020 10:53:18 -
> @@ -193,7 +193,7 @@ struct malloc_readonly {
>   int def_malloc_junk;/* junk fill? */
>   int malloc_realloc; /* always realloc? */
>   int malloc_xmalloc; /* xmalloc behaviour? */
> - int chunk_canaries; /* use canaries after chunks? */
> + u_int   chunk_canaries; /* use canaries after chunks? */
>   int internal_funcs; /* use better recallocarray/freezero? */
>   u_int   def_malloc_cache;   /* free pages we cache */
>   size_t  malloc_guard;   /* use guard pages after allocations? */
> @@ -468,6 +468,11 @@ omalloc_init(void)
>  
>   while ((mopts.malloc_canary = arc4random()) == 0)
>   ;
> + if (mopts.chunk_canaries)
> + do {
> + mopts.chunk_canaries = arc4random();
> + } while ((u_char)mopts.chunk_canaries == 0 ||
> + (u_char)mopts.chunk_canaries == SOME_FREEJUNK); 
>  }
>  
>  static void
> @@ -938,7 +943,7 @@ fill_canary(char *ptr, size_t sz, size_t
>  
>   if (check_sz > CHUNK_CHECK_LENGTH)
>   check_sz = CHUNK_CHECK_LENGTH;
> - memset(ptr + sz, SOME_JUNK, check_sz);
> + memset(ptr + sz, mopts.chunk_canaries, check_sz);
>  }
>  
>  /*
> @@ -1039,7 +1044,7 @@ validate_canary(struct dir_info *d, u_ch
>   q = p + check_sz;
>  
>   while (p < q) {
> - if (*p != SOME_JUNK) {
> + if (*p != (u_char)mopts.chunk_canaries && *p != SOME_JUNK) {
>   wrterror(d, "chunk canary corrupted %p %#tx@%#zx%s",
>   ptr, p - ptr, sz,
>   *p == SOME_FREEJUNK ? " (double free?)" : "");
> 



dump: better handling of large filesystems

2020-09-29 Thread Otto Moerbeek
Hi,

this fixes an overwrite of spcl.c_addr.  Taken from FreeBSD.

See https://marc.info/?l=openbsd-misc=160018252418088=2

-Otto


Index: tape.c
===
RCS file: /cvs/src/sbin/dump/tape.c,v
retrieving revision 1.45
diff -u -p -r1.45 tape.c
--- tape.c  28 Jun 2019 13:32:43 -  1.45
+++ tape.c  26 Sep 2020 06:30:37 -
@@ -330,7 +330,10 @@ flushtape(void)
}
 
blks = 0;
-   if (spcl.c_type != TS_END) {
+   if (spcl.c_type != TS_END && spcl.c_type != TS_CLRI &&
+   spcl.c_type != TS_BITS) {
+   if (spcl.c_count > TP_NINDIR)
+   quit("c_count too large\n");
for (i = 0; i < spcl.c_count; i++)
if (spcl.c_addr[i] != 0)
blks++;



random canary bytes for malloc

2020-09-29 Thread Otto Moerbeek
Hi,

until now, canary bytes (used by the C option) were the same as the
bytes used to junk (0xfd).  This means that certain overwrites are not
detected, like setting the high bit.

This makes the byte value used to write canaries random. I do not want
to complicate the code to handle all combinations of F and C, so 0xfd
is still accepted as a canary byte.

Please test with all your favourite combinations of malloc flags.

-Otto

Index: malloc.c
===
RCS file: /cvs/src/lib/libc/stdlib/malloc.c,v
retrieving revision 1.263
diff -u -p -r1.263 malloc.c
--- malloc.c6 Sep 2020 06:41:03 -   1.263
+++ malloc.c10 Sep 2020 10:53:18 -
@@ -193,7 +193,7 @@ struct malloc_readonly {
int def_malloc_junk;/* junk fill? */
int malloc_realloc; /* always realloc? */
int malloc_xmalloc; /* xmalloc behaviour? */
-   int chunk_canaries; /* use canaries after chunks? */
+   u_int   chunk_canaries; /* use canaries after chunks? */
int internal_funcs; /* use better recallocarray/freezero? */
u_int   def_malloc_cache;   /* free pages we cache */
size_t  malloc_guard;   /* use guard pages after allocations? */
@@ -468,6 +468,11 @@ omalloc_init(void)
 
while ((mopts.malloc_canary = arc4random()) == 0)
;
+   if (mopts.chunk_canaries)
+   do {
+   mopts.chunk_canaries = arc4random();
+   } while ((u_char)mopts.chunk_canaries == 0 ||
+   (u_char)mopts.chunk_canaries == SOME_FREEJUNK); 
 }
 
 static void
@@ -938,7 +943,7 @@ fill_canary(char *ptr, size_t sz, size_t
 
if (check_sz > CHUNK_CHECK_LENGTH)
check_sz = CHUNK_CHECK_LENGTH;
-   memset(ptr + sz, SOME_JUNK, check_sz);
+   memset(ptr + sz, mopts.chunk_canaries, check_sz);
 }
 
 /*
@@ -1039,7 +1044,7 @@ validate_canary(struct dir_info *d, u_ch
q = p + check_sz;
 
while (p < q) {
-   if (*p != SOME_JUNK) {
+   if (*p != (u_char)mopts.chunk_canaries && *p != SOME_JUNK) {
wrterror(d, "chunk canary corrupted %p %#tx@%#zx%s",
ptr, p - ptr, sz,
*p == SOME_FREEJUNK ? " (double free?)" : "");



Re: btrace: add boolean AND and OR operators

2020-09-14 Thread Otto Moerbeek
On Mon, Sep 14, 2020 at 03:28:17PM +0200, Jasper Lievisse Adriaanse wrote:

> Hi,
> 
> This diff adds support for the '&' and '|' operators, along with
> a new testcase.
> 
> OK?

The precedence looks funny

I'd guess you want

%left '|'
%left '&'
%left '+' '-'
%left '/' '*'

To avoid surprises.
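
In yacc, operators declared on later %left lines bind tighter, so a
%left '&' '|' line placed after '/' '*' gives both operators the highest
precedence and puts them on the same level. The effect of the grouping
can be shown in plain C. The function names are hypothetical and only
make the two parses explicit.

```c
/* "a + b & c" with '+' binding tighter than '&', as in C. */
static long
add_before_and(long a, long b, long c)
{
	return (a + b) & c;
}

/* The same expression with '&' binding tighter than '+'. */
static long
and_before_add(long a, long b, long c)
{
	return a + (b & c);
}

/* "a | b & c" with both on one level, grouped left to right. */
static long
same_level(long a, long b, long c)
{
	return (a | b) & c;
}

/* "a | b & c" with '&' binding tighter than '|', as in C. */
static long
and_before_or(long a, long b, long c)
{
	return a | (b & c);
}
```

For 2 + 1 & 1 the two groupings give 1 and 3; for 4 | 1 & 2 they give
0 and 4.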

-Otto

> 
> Index: usr.sbin/btrace/bt_parse.y
> ===
> RCS file: /cvs/src/usr.sbin/btrace/bt_parse.y,v
> retrieving revision 1.16
> diff -u -p -r1.16 bt_parse.y
> --- usr.sbin/btrace/bt_parse.y  11 Jul 2020 14:52:14 -  1.16
> +++ usr.sbin/btrace/bt_parse.y  14 Sep 2020 15:14:10 -
> @@ -119,6 +119,7 @@ static int yylex(void);
>  
>  %left'+' '-'
>  %left'/' '*'
> +%left'&' '|'
>  %%
>  
>  grammar  : /* empty */
> @@ -172,6 +173,8 @@ term  : '(' term ')'  { $$ = $2; }
>   | term '-' term { $$ = ba_op('-', $1, $3); }
>   | term '/' term { $$ = ba_op('/', $1, $3); }
>   | term '*' term { $$ = ba_op('*', $1, $3); }
> + | term '&' term { $$ = ba_op('&', $1, $3); }
> + | term '|' term { $$ = ba_op('|', $1, $3); }
>   | NUMBER{ $$ = ba_new($1, B_AT_LONG); }
>   | builtin   { $$ = ba_new(NULL, $1); }
>   | gvar  { $$ = bv_get($1); }
> @@ -331,6 +334,12 @@ ba_op(const char op, struct bt_arg *da0,
>   break;
>   case '/':
>   type = B_AT_OP_DIVIDE;
> + break;
> + case '&':
> + type = B_AT_OP_AND;
> + break;
> + case '|':
> + type = B_AT_OP_OR;
>   break;
>   default:
>   assert(0);
> Index: usr.sbin/btrace/bt_parser.h
> ===
> RCS file: /cvs/src/usr.sbin/btrace/bt_parser.h,v
> retrieving revision 1.9
> diff -u -p -r1.9 bt_parser.h
> --- usr.sbin/btrace/bt_parser.h   13 Aug 2020 11:29:39 -  1.9
> +++ usr.sbin/btrace/bt_parser.h   14 Sep 2020 15:14:10 -
> @@ -143,6 +143,8 @@ struct bt_arg {
>   B_AT_OP_MINUS,
>   B_AT_OP_MULT,
>   B_AT_OP_DIVIDE,
> + B_AT_OP_AND,
> + B_AT_OP_OR,
>   }ba_type;
>  };
>  
> Index: usr.sbin/btrace/btrace.c
> ===
> RCS file: /cvs/src/usr.sbin/btrace/btrace.c,v
> retrieving revision 1.24
> diff -u -p -r1.24 btrace.c
> --- usr.sbin/btrace/btrace.c  11 Sep 2020 08:16:15 -  1.24
> +++ usr.sbin/btrace/btrace.c  14 Sep 2020 15:14:10 -
> @@ -812,7 +812,7 @@ stmt_store(struct bt_stmt *bs, struct dt
>   case B_AT_BI_NSECS:
>   bv->bv_value = ba_new(builtin_nsecs(dtev), B_AT_LONG);
>   break;
> - case B_AT_OP_ADD ... B_AT_OP_DIVIDE:
> + case B_AT_OP_ADD ... B_AT_OP_OR:
>   bv->bv_value = ba_new(ba2long(ba, dtev), B_AT_LONG);
>   break;
>   default:
> @@ -992,6 +992,12 @@ baexpr2long(struct bt_arg *ba, struct dt
>   case B_AT_OP_DIVIDE:
>   result = first / second;
>   break;
> + case B_AT_OP_AND:
> + result = first & second;
> + break;
> + case B_AT_OP_OR:
> + result = first | second;
> + break;
>   default:
>   xabort("unsuported operation %d", ba->ba_type);
>   }
> @@ -1025,7 +1031,7 @@ ba2long(struct bt_arg *ba, struct dt_evt
>   case B_AT_BI_RETVAL:
>   val = dtev->dtev_sysretval[0];
>   break;
> - case B_AT_OP_ADD ... B_AT_OP_DIVIDE:
> + case B_AT_OP_ADD ... B_AT_OP_OR:
>   val = baexpr2long(ba, dtev);
>   break;
>   default:
> @@ -1093,7 +1099,7 @@ ba2str(struct bt_arg *ba, struct dt_evt 
>   case B_AT_VAR:
>   str = ba2str(ba_read(ba), dtev);
>   break;
> - case B_AT_OP_ADD ... B_AT_OP_DIVIDE:
> + case B_AT_OP_ADD ... B_AT_OP_OR:
>   snprintf(buf, sizeof(buf) - 1, "%ld", ba2long(ba, dtev));
>   str = buf;
>   break;
> @@ -1152,7 +1158,7 @@ ba2dtflags(struct bt_arg *ba)
>   case B_AT_MF_MAX:
>   case B_AT_MF_MIN:
>   case B_AT_MF_SUM:
> - case B_AT_OP_ADD ... B_AT_OP_DIVIDE:
> + case B_AT_OP_ADD ... B_AT_OP_OR:
>   break;
>   default:
>   xabort("invalid argument type %d", ba->ba_type);
> Index: regress/usr.sbin/btrace/Makefile
> ===
> RCS file: /cvs/src/regress/usr.sbin/btrace/Makefile,v
> retrieving revision 1.4
> diff -u -p -r1.4 Makefile
> --- 

Re: asn1/a_bitstring.c: zeroing after recallocarray

2020-09-02 Thread Otto Moerbeek
On Thu, Sep 03, 2020 at 07:03:14AM +0200, Theo Buehler wrote:

> The memset is not needed as recallocarray(3) does the zeroing already.
> (I also think the a->data == NULL check in the if clause is redundant,
> but I'm just suggesting to remove a bit that confused me)

ok,

-Otto
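
The reason the memset is redundant is that the newly exposed tail of the
buffer comes back zeroed. Since recallocarray(3) is not available
everywhere, the sketch below uses a hypothetical portable stand-in,
`grow_zeroed`, that imitates only that one property for byte-sized
members.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical stand-in for the zero-on-grow behaviour of
 * recallocarray(3).  It does not reproduce the overflow checks or the
 * clearing of discarded memory.
 */
static void *
grow_zeroed(void *old, size_t oldlen, size_t newlen)
{
	unsigned char *p;

	p = calloc(newlen ? newlen : 1, 1);	/* calloc returns zeroed memory */
	if (p == NULL)
		return NULL;
	if (old != NULL)
		memcpy(p, old, oldlen < newlen ? oldlen : newlen);
	free(old);
	return p;
}
```

After growing a 2-byte buffer to 5 bytes, bytes 2 to 4 read as zero
without any further memset.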

> 
> Index: asn1/a_bitstr.c
> ===
> RCS file: /var/cvs/src/lib/libcrypto/asn1/a_bitstr.c,v
> retrieving revision 1.29
> diff -u -p -U7 -r1.29 a_bitstr.c
> --- asn1/a_bitstr.c   20 Oct 2018 16:07:09 -  1.29
> +++ asn1/a_bitstr.c   15 Jun 2020 12:46:00 -
> @@ -211,16 +211,14 @@ ASN1_BIT_STRING_set_bit(ASN1_BIT_STRING 
>   if ((a->length < (w + 1)) || (a->data == NULL)) {
>   if (!value)
>   return(1); /* Don't need to set */
>   if ((c = recallocarray(a->data, a->length, w + 1, 1)) == NULL) {
>   ASN1error(ERR_R_MALLOC_FAILURE);
>   return 0;
>   }
> - if (w + 1 - a->length > 0)
> - memset(c + a->length, 0, w + 1 - a->length);
>   a->data = c;
>   a->length = w + 1;
>   }
>   a->data[w] = ((a->data[w]) & iv) | v;
>   while ((a->length > 0) && (a->data[a->length - 1] == 0))
>   a->length--;
>  
> 



Re: shrinking and growing reallocs: a theoretical? bad case for performance

2020-09-02 Thread Otto Moerbeek
On Tue, Sep 01, 2020 at 11:56:36AM +0100, Stuart Henderson wrote:

> On 2020/08/31 08:39, Otto Moerbeek wrote:
> > A question from Theo made me think about realloc and come up with a
> > particular bad case for performance. I do not know if it happens in
> > practice, but it was easy to create a test program to hit the case.
> 
> Not very scientific testing (a single attempt at building one port), but
> this seems to help quite a lot when compiling programs written in rust.
> I encourage others to test the diff :-)
> 

It turned out this particular case was a fluke. But I'm still very
interested in cases where it does matter and tests in general as well.

-Otto



Re: shrinking and growing reallocs: a theoretical? bad case for performance

2020-08-31 Thread Otto Moerbeek
On Mon, Aug 31, 2020 at 11:25:51AM -0600, Theo de Raadt wrote:

> > Taking advantage of the sparse address space is smart and as 64-bit
> > is now the norm, that space is even sparser.
> 
> Fundamentally this is moving various forms of pressure to the kernel,
> which does not do the best job yet.

This effect is reduced by making small shrinks a no-op.

> 
> The pivot code in mmap for new mappings isn't entirely bug-free so we've
> avoided turning it on.  The idea of that code is to be as random as
> necessary -- creating "unknowable addresses", but in doing so avoid
> fragmenting the address space excessively.  Excessive fragmentation in turn
> fragments allocation in multi-level page-tables, and that in turn
> results in excessive TLB pressure.  Which is difficult to gauge since things
> keep working, but brings in a big performance cost.
> 
> Basically we were brave to do very high amounts of randomization early on.
> At a cost.  But our work to improve the cost isn't finished.



Re: shrinking and growing reallocs: a theoretical? bad case for performance

2020-08-31 Thread Otto Moerbeek
On Mon, Aug 31, 2020 at 08:28:25AM -0400, David Higgs wrote:

> On Mon, Aug 31, 2020 at 2:41 AM Otto Moerbeek  wrote:
> 
> > Hi,
> >
> > A question from Theo made me think about realloc and come up with a
> > particular bad case for performance. I do not know if it happens in
> > practice, but it was easy to create a test program to hit the case.
> >
> > We're talking about allocations >= a page here. Smaller allocations follow
> > different rules.
> >
> > If an allocation is grown by realloc, it first tries to extend the
> > allocation by mapping pages next to the existing allocation. Since
> > the location of pages is randomized, chances are high that next to an
> > allocation there are unmapped pages so the grow will work out.
> >
> > If realloc needs to shrink the allocation it puts the high pages no
> > longer needed in the malloc cache. There they can be re-used by other
> > allocations. But if that happens, the next grow of the first allocation
> > will fail: the pages are already mapped. So realloc needs to do a new
> > allocation followed by a copy and a cleanup of the original.
> >
> > So this strategy of a shrinking realloc putting unneeded pages into
> > the cache can work against us, plus it has the consequence that use of
> > realloc leads to allocations close to each other: no free guard pages.
> >
> 
> If I am interpreting this correctly, realloc could be used to groom/shape
> the heap such that future allocations are less random and more predictable?
> 
> --david

In a way yes, but that's a consequence of caching pages: new
allocations will come from the cache if possible. But with this diff
there are fewer possibilities. Also note that malloc option S disables
the cache. 

-Otto



shrinking and growing reallocs: a theoretical? bad case for performance

2020-08-31 Thread Otto Moerbeek
Hi,

A question from Theo made me think about realloc and come up with a
particular bad case for performance. I do not know if it happens in
practice, but it was easy to create a test program to hit the case.

We're talking about allocations >= a page here. Smaller allocations follow
different rules.

If an allocation is grown by realloc, it first tries to extend the
allocation by mapping pages next to the existing allocation. Since
the location of pages is randomized, chances are high that next to an
allocation there are unmapped pages so the grow will work out.

If realloc needs to shrink the allocation it puts the high pages no
longer needed in the malloc cache. There they can be re-used by other
allocations. But if that happens, the next grow of the first allocation
will fail: the pages are already mapped. So realloc needs to do a new
allocation followed by a copy and a cleanup of the original.

So this strategy of a shrinking realloc putting unneeded pages into
the cache can work against us, plus it has the consequence that use of
realloc leads to allocations close to each other: no free guard pages.

The program below tests this scenario and runs awfully slow. The diff
fixes this by applying two strategies. The first already makes a huge
difference, but the second strategy will also reduce the total number
of syscalls at the cost of some more memory use.

1. I do not put high pages of shrinking reallocs into the cache, but
unmap them directly.

2. For small shrinking reallocs realloc becomes a no-op. Pro: no
syscalls at all. Con: the actual allocation is larger, so less
overflow detection. So I do not do this if guard pages are active or
the reduction is larger than the cache size.
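
The condition in point 2 can be written down as a small predicate. This
is only a sketch of the rule as described above, not the code from the
diff; `shrink_is_noop` and its parameters are hypothetical names, and
all sizes are in pages.

```c
#include <stddef.h>

/*
 * Decide whether a shrinking realloc of a page-sized allocation can
 * leave the mapping untouched.
 */
static int
shrink_is_noop(size_t oldpages, size_t newpages, size_t cachepages,
    int guard_active)
{
	if (newpages >= oldpages)
		return 0;		/* not a shrink */
	if (guard_active)
		return 0;		/* keep overflow detection tight */
	return oldpages - newpages <= cachepages;
}
```

For the test program below, shrinking from 1024 to 1023 pages would be
a no-op under this rule, assuming a cache of at least one page and no
guard pages.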

Some stats. The first run is -current, the second one is with (an
earlier version of) the diff, on an armv7 machine. Other systems also
show huge differences.

[otto@wand:19]$ time ./a.out
0m31.68s real 0m10.02s user 0m21.65s system

[otto@wand:33]$ time ./a.out
0m00.16s real 0m00.12s user 0m00.03s system

I do not see any difference for builds. But I can imagine real-life
programs hitting the case.

-Otto



#include <err.h>
#include <stdlib.h>
#include <unistd.h>

void *p;
size_t psz;
#define E(x) if ((x) == NULL) err(1, NULL)

void f(void)
{
int i;
void *s[64];

p = realloc(p, 1023*psz);
E(p);
for (i = 0; i < 64; i++) {
s[i] = malloc(psz);
E(s[i]);
}
p = realloc(p, 1024*psz);
E(p);
for (i = 0; i < 64; i++)
free(s[i]);

}

int main()
{
int i;

psz = getpagesize();
p = malloc(1024*psz);
E(p);
for (i = 0; i < 1000; i++)
f();
}


Index: malloc.c
===
RCS file: /cvs/src/lib/libc/stdlib/malloc.c,v
retrieving revision 1.262
diff -u -p -r1.262 malloc.c
--- malloc.c  28 Jun 2019 13:32:42 -  1.262
+++ malloc.c  31 Aug 2020 06:01:40 -
@@ -728,28 +728,8 @@ unmap(struct dir_info *d, void *p, size_
wrterror(d, "malloc cache overflow");
 }
 
-static void
-zapcacheregion(struct dir_info *d, void *p, size_t len)
-{
-   u_int i;
-   struct region_info *r;
-   size_t rsz;
-
-   for (i = 0; i < d->malloc_cache; i++) {
-               r = &d->free_regions[i];
-   if (r->p >= p && r->p <= (void *)((char *)p + len)) {
-   rsz = r->size << MALLOC_PAGESHIFT;
-   if (munmap(r->p, rsz))
-   wrterror(d, "munmap %p", r->p);
-   r->p = NULL;
-   d->free_regions_size -= r->size;
-   STATS_SUB(d->malloc_used, rsz);
-   }
-   }
-}
-
 static void *
-map(struct dir_info *d, void *hint, size_t sz, int zero_fill)
+map(struct dir_info *d, size_t sz, int zero_fill)
 {
size_t psz = sz >> MALLOC_PAGESHIFT;
struct region_info *r, *big = NULL;
@@ -762,7 +742,7 @@ map(struct dir_info *d, void *hint, size
if (sz != PAGEROUND(sz))
wrterror(d, "map round");
 
-   if (hint == NULL && psz > d->free_regions_size) {
+   if (psz > d->free_regions_size) {
_MALLOC_LEAVE(d);
p = MMAP(sz, d->mmap_flag);
_MALLOC_ENTER(d);
@@ -774,8 +754,6 @@ map(struct dir_info *d, void *hint, size
for (i = 0; i < d->malloc_cache; i++) {
r = &d->free_regions[(i + d->rotor) & (d->malloc_cache - 1)];
if (r->p != NULL) {
-   if (hint != NULL && r->p != hint)
-   continue;
if (r->size == psz) {
p = r->p;
r->p = NULL;
@@ -807,8 +785,6 @@ map(struct dir_info *d, void *hint, size
memset(p, SOME_FREEJUNK, sz);
return p;
}
-   if (hint != NULL)
-   return 

Re: ntpd: go into unsynced mode

2020-08-30 Thread Otto Moerbeek
On Sat, Aug 22, 2020 at 03:51:48PM +0200, Otto Moerbeek wrote:

> Hi,
> 
> At the moment ntpd never goes into unsynced mode if network
> connectivity is lost. The code to do that is only triggered when a
> packet is received, which does not happen.
> 
> This diff fixes that by going into unsynced mode if no time data was
> processed for a while. 
> 
> An earlier version of this diff was tested by naddy@. Compared to that
> version, the needed period of inactivity is now three times as large
> and I set scale to 1, so recovery goes faster.
> 
> Please test and review,

anyone wants to ok?

-Otto


> 
> Index: ntp.c
> ===
> RCS file: /cvs/src/usr.sbin/ntpd/ntp.c,v
> retrieving revision 1.165
> diff -u -p -r1.165 ntp.c
> --- ntp.c 22 Jun 2020 06:11:34 -  1.165
> +++ ntp.c 22 Aug 2020 13:48:34 -
> @@ -89,6 +89,7 @@ ntp_main(struct ntpd_conf *nconf, struct
>   struct stat  stb;
>   struct ctl_conn *cc;
>   time_t   nextaction, last_sensor_scan = 0, now;
> + time_t   last_action = 0, interval;
>   void*newp;
>  
>   if (socketpair(AF_UNIX, SOCK_STREAM | SOCK_CLOEXEC, PF_UNSPEC,
> @@ -402,6 +403,7 @@ ntp_main(struct ntpd_conf *nconf, struct
>   for (; nfds > 0 && j < idx_clients; j++) {
>   if (pfd[j].revents & (POLLIN|POLLERR)) {
>   nfds--;
> + last_action = now;
>   if (client_dispatch(idx2peer[j - idx_peers],
>   conf->settime, conf->automatic) == -1) {
>   log_warn("pipe write error (settime)");
> @@ -417,8 +419,24 @@ ntp_main(struct ntpd_conf *nconf, struct
>           for (s = TAILQ_FIRST(&conf->ntp_sensors); s != NULL;
>   s = next_s) {
>   next_s = TAILQ_NEXT(s, entry);
> - if (s->next <= getmonotime())
> + if (s->next <= now) {
> + last_action = now;
>   sensor_query(s);
> + }
> + }
> +
> + /*
> +  * Compute maximum of scale_interval(INTERVAL_QUERY_NORMAL),
> +  * if we did not process a time message for three times that
> +  * interval, stop advertising we're synced.
> +  */
> + interval = INTERVAL_QUERY_NORMAL * conf->scale;
> + interval += MAXIMUM(5, interval / 10) - 1;
> + if (conf->status.synced && last_action + 3 * interval < now) {
> + log_info("clock is now unsynced");
> + conf->status.synced = 0;
> + conf->scale = 1;
> + priv_dns(IMSG_UNSYNCED, NULL, 0);
>   }
>   }
>  
> 



Re: ntpd: go into unsynced mode

2020-08-25 Thread Otto Moerbeek
On Tue, Aug 25, 2020 at 07:05:31PM +0200, Matthias Schmidt wrote:

> Hi Otto,
> 
> * Otto Moerbeek wrote:
> > Hi,
> > 
> > At the moment ntpd never goes into unsynced mode if network
> > connectivity is lost. The code to do that is only triggered when a
> > packet is received, which does not happen.
> > 
> > This diff fixes that by going into unsynced mode if no time data was
> > processed for a while. 
> > 
> > An earlier version of this diff was tested by naddy@. Compared to that
> > version, the needed period of inactivity is now three times as large
> > and I set scale to 1, so recovery goes faster.
> > 
> > Please test and review,
> 
> I have your diff running on my laptop, which is sometimes not connected to a
> network so it should be a good test case.
> 
> I haven't noticed any difference to before, so I count that as a good
> sign :)  I spotted only one thing:  While "ntpctl -s a" says that the
> clock is unsynced I see no message from ntpd in the logs.  Not sure if
> that's on purpose or not, I just noticed it.

Thanks for testing. 

There should be "clock is now unsynced" and "clock is now synced" messages
in /var/log/daemon... here they do appear.

-Otto



ntpd: go into unsynced mode

2020-08-22 Thread Otto Moerbeek
Hi,

At the moment ntpd never goes into unsynced mode if network
connectivity is lost. The code to do that is only triggered when a
packet is received, which does not happen.

This diff fixes that by going into unsynced mode if no time data was
processed for a while. 

An earlier version of this diff was tested by naddy@. Compared to that
version, the needed period of inactivity is now three times as large
and I set scale to 1, so recovery goes faster.
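
As a rough illustration of the timeout, the arithmetic from the diff
below can be pulled out into two helpers. This is a sketch, not the
ntpd code: `QUERY_INTERVAL` is an assumed value for
INTERVAL_QUERY_NORMAL, and `max_interval` and `should_unsync` are
hypothetical names.

```c
#include <time.h>

#define QUERY_INTERVAL	30	/* assumed value of INTERVAL_QUERY_NORMAL */
#define MAXIMUM(a, b)	((a) > (b) ? (a) : (b))

/* Longest query interval for a given scale, following the diff's arithmetic. */
static time_t
max_interval(time_t scale)
{
	time_t interval = QUERY_INTERVAL * scale;

	interval += MAXIMUM(5, interval / 10) - 1;
	return interval;
}

/* True once no time data was processed for more than three such intervals. */
static int
should_unsync(int synced, time_t last_action, time_t now, time_t scale)
{
	return synced && last_action + 3 * max_interval(scale) < now;
}
```

Under that assumption a scale of 1 gives an interval of 34 seconds, so
the clock is marked unsynced after more than 102 seconds without time
data.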

Please test and review,

-Otto

Index: ntp.c
===
RCS file: /cvs/src/usr.sbin/ntpd/ntp.c,v
retrieving revision 1.165
diff -u -p -r1.165 ntp.c
--- ntp.c   22 Jun 2020 06:11:34 -  1.165
+++ ntp.c   22 Aug 2020 13:48:34 -
@@ -89,6 +89,7 @@ ntp_main(struct ntpd_conf *nconf, struct
struct stat  stb;
struct ctl_conn *cc;
time_t   nextaction, last_sensor_scan = 0, now;
+   time_t   last_action = 0, interval;
void*newp;
 
if (socketpair(AF_UNIX, SOCK_STREAM | SOCK_CLOEXEC, PF_UNSPEC,
@@ -402,6 +403,7 @@ ntp_main(struct ntpd_conf *nconf, struct
for (; nfds > 0 && j < idx_clients; j++) {
if (pfd[j].revents & (POLLIN|POLLERR)) {
nfds--;
+   last_action = now;
if (client_dispatch(idx2peer[j - idx_peers],
conf->settime, conf->automatic) == -1) {
log_warn("pipe write error (settime)");
@@ -417,8 +419,24 @@ ntp_main(struct ntpd_conf *nconf, struct
for (s = TAILQ_FIRST(&conf->ntp_sensors); s != NULL;
s = next_s) {
next_s = TAILQ_NEXT(s, entry);
-   if (s->next <= getmonotime())
+   if (s->next <= now) {
+   last_action = now;
sensor_query(s);
+   }
+   }
+
+   /*
+* Compute maximum of scale_interval(INTERVAL_QUERY_NORMAL),
+* if we did not process a time message for three times that
+* interval, stop advertising we're synced.
+*/
+   interval = INTERVAL_QUERY_NORMAL * conf->scale;
+   interval += MAXIMUM(5, interval / 10) - 1;
+   if (conf->status.synced && last_action + 3 * interval < now) {
+   log_info("clock is now unsynced");
+   conf->status.synced = 0;
+   conf->scale = 1;
+   priv_dns(IMSG_UNSYNCED, NULL, 0);
}
}
 



Re: adjtime(2): distribute skew along arbitrary runtime period

2020-07-16 Thread Otto Moerbeek
On Wed, Jul 15, 2020 at 09:08:29AM -0500, Scott Cheloha wrote:

> Hi,
> 
> adjtime(2) skews the clock at up to 5000ppm per second.  The way this
> actually happens is pretty straightforward: at the start of every UTC
> second we call ntp_update_second() from tc_windup() and reset
> th_adjustment.  th_adjustment is then mixed into the scale for one UTC
> second.  This cycle slowly chips away at th_adjtimedelta, eventually
> reducing it to zero.
> 
> This is fine, except that using UTC for your update period requires
> you to work around how the UTC time can jump forward a huge amount.
> There are two notable jumps:
> 
> 1. The big jump forward to the RTC time during boot.
> 
> 2. The big jump forward to the RTC time after each resume.
> 
> To handle this we have a magic number in the code, LARGE_STEP.  If the
> UTC time jumps more than LARGE_STEP (200) seconds we truncate the
> number of ntp_update_second() calls to 2 to avoid looping endlessly in
> tc_windup().  Here we find a wart: we do 2 calls to account for a
> missed leap second, even though we no longer handle those in the
> kernel.
> 
> The magic number approach is less than ideal because it doesn't handle
> short suspends correctly: suspends shorter than 200 seconds are
> deducted from th_adjtimedelta even though we do not skew the clock
> during suspend.
> 
> Now that the timehands have a concept of "runtime" (time spent not
> suspended) I think it would be nicer if we called ntp_update_second()
> along an arbitrary period on the runtime clock.
> 
> So, this diff:
> 
> When adjtime(2) is called the NTP update period (th_next_ntp_update)
> is changed to align with the current runtime.  Thereafter, once per
> second, ntp_update_second() is called.
> 
> We don't deduct any skew from th_adjtimedelta across a big UTC jump
> (like a suspend) because the runtime clock does not advance while the
> machine is down.
> 
> Another upside is that skew changes via adjtime(2) happen immediately
> instead of being applied up to one second later.  For example, if the
> adjtime(2) skew is cancelled, the skew stops right away instead of
> continuing for up to one second.  This behavior seems more correct to
> me.
> 
> And, obviously, we can get rid of the magic number.
> 
> --
> 
> otto: Does the NTP algorithm *require* us to distribute the adjtime(2)
>   skew as we do?  At the start of the UTC second?  Or can we choose
>   an arbitrary starting point for the period like I do in this diff?
> 
>   My intuition is that this diff shouldn't break anything, and my
>   testing suggests it doesn't, but I'd appreciate a test all the same.

As far as I know, the NTP adjustment algorithm does not depend on a
particular point.
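
For reference, the per-second chipping away at the outstanding
adjustment described in the quoted mail can be sketched as below. This
is a simplified model in microseconds, not the kernel code: `skew_step`
and `MAX_SKEW_US` are hypothetical names, and the scaling into
th_adjustment is left out.

```c
#include <stdint.h>

#define MAX_SKEW_US	5000	/* 5000 ppm: at most 5000 us of skew per second */

/*
 * Take one second's worth of skew out of the outstanding delta
 * (microseconds) and return the amount applied during that second.
 */
static int64_t
skew_step(int64_t *delta)
{
	int64_t adj;

	if (*delta > MAX_SKEW_US)
		adj = MAX_SKEW_US;
	else if (*delta < -MAX_SKEW_US)
		adj = -MAX_SKEW_US;
	else
		adj = *delta;
	*delta -= adj;
	return adj;
}
```

Nothing in this model depends on where the one-second period starts,
only on the step being taken once per second of runtime.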

-Otto

> 
> Index: kern_tc.c
> ===
> RCS file: /cvs/src/sys/kern/kern_tc.c,v
> retrieving revision 1.62
> diff -u -p -r1.62 kern_tc.c
> --- kern_tc.c 6 Jul 2020 13:33:09 -   1.62
> +++ kern_tc.c 15 Jul 2020 13:56:22 -
> @@ -35,14 +35,6 @@
>  #include 
>  #include 
>  
> -/*
> - * A large step happens on boot.  This constant detects such steps.
> - * It is relatively small so that ntp_update_second gets called enough
> - * in the typical 'missed a couple of seconds' case, but doesn't loop
> - * forever when the time step is large.
> - */
> -#define LARGE_STEP   200
> -
>  u_int dummy_get_timecount(struct timecounter *);
>  
>  int sysctl_tc_hardware(void *, size_t *, void *, size_t);
> @@ -77,6 +69,7 @@ struct timehands {
>   /* These fields must be initialized by the driver. */
>   struct timecounter  *th_counter;/* [W] */
>   int64_t th_adjtimedelta;/* [T,W] */
> + struct bintime  th_next_ntp_update; /* [T,W] */
>   int64_t th_adjustment;  /* [W] */
>   u_int64_t   th_scale;   /* [W] */
>   u_int   th_offset_count;/* [W] */
> @@ -564,12 +557,11 @@ void
>  tc_windup(struct bintime *new_boottime, struct bintime *new_offset,
>  int64_t *new_adjtimedelta)
>  {
> - struct bintime bt;
> + struct bintime diff, runtime, utc;
>   struct timecounter *active_tc;
>   struct timehands *th, *tho;
>   u_int64_t scale;
>   u_int delta, ncount, ogen;
> - int i;
>  
>   if (new_boottime != NULL || new_adjtimedelta != NULL)
>   rw_assert_wrlock(_lock);
> @@ -609,8 +601,8 @@ tc_windup(struct bintime *new_boottime, 
>* accordingly.
>*/
>   if (new_offset != NULL && bintimecmp(>th_offset, new_offset, <)) {
> - bintimesub(new_offset, >th_offset, );
> - bintimeadd(>th_naptime, , >th_naptime);
> + bintimesub(new_offset, >th_offset, );
> + bintimeadd(>th_naptime, , >th_naptime);
>   th->th_offset = *new_offset;
>   }
>  
> @@ -633,30 +625,29 @@ tc_windup(struct bintime *new_boottime, 
>*/
>   

Re: fsck_ffs: faster with lots of cylinder groups

2020-07-12 Thread Otto Moerbeek
On Sun, Jul 12, 2020 at 11:07:05AM +0200, Solene Rapenne wrote:

> On Sun, 12 Jul 2020 09:13:47 +0200
> Otto Moerbeek :
> 
> > On Mon, Jun 29, 2020 at 02:30:41PM +0200, Otto Moerbeek wrote:
> > 
> > > On Sun, Jun 21, 2020 at 03:35:21PM +0200, Otto Moerbeek wrote:
> > >   
> > > > Hi,
> > > > 
> > > > both phase 1 and phase 5 need cylinder group metadata.  This diff
> > > > keeps the cg data read in phase 1 in memory to be used by phase 5 if
> > > > possible. From FreeBSD. 
> > > > 
> > > > -Otto
> > > > 
> > > > On an empty 30T filesystem:
> > > > 
> > > > $ time obj/fsck_ffs -f /dev/sd3a
> > > > 2m44.10s real 0m13.21s user 0m07.38s system
> > > > 
> > > > $ time doas obj/fsck_ffs -f /dev/sd3a
> > > > 1m32.81s real 0m12.86s user 0m05.25s system
> > > > 
> > > > The difference will be less if a filesystem is filled up, but still
> > > > nice.  
> > > 
> > > Any takers?  
> > 
> > No feedback. I'm getting discouraged in doing more filesystem work...
> > 
> > What to do?
> > 
> > 1) Abandon the diff
> > 2) Commit without ok
> > 
> > I did quite extensive testing, but both options are unsatisfactory.
> > 
> > -Otto
> 
> I'm not sure how to test your diff.
> Would running fsck on a sane filesystem be enough?
> 
> Are you using VMs that you halt to force a
> fsck on them? Would this be a good test too?

I have used both large and small filesystems, clean and with
inconsistencies, both ffs1 and ffs2. Sometimes I create
inconsistencies by power cycling a machine, but I have created faulty
filesystems by carefully overwriting meta data with dd in the past as
well.

In this case running with a restricted ulimit -d to force the fallback
code to kick in is also a good idea.

-Otto



Re: fsck_ffs: faster with lots of cylinder groups

2020-07-12 Thread Otto Moerbeek
On Mon, Jun 29, 2020 at 02:30:41PM +0200, Otto Moerbeek wrote:

> On Sun, Jun 21, 2020 at 03:35:21PM +0200, Otto Moerbeek wrote:
> 
> > Hi,
> > 
> > both phase 1 and phase 5 need cylinder group metadata.  This diff
> > keeps the cg data read in phase 1 in memory to be used by phase 5 if
> > possible. From FreeBSD. 
> > 
> > -Otto
> > 
> > On an empty 30T filesystem:
> > 
> > $ time obj/fsck_ffs -f /dev/sd3a
> > 2m44.10s real 0m13.21s user 0m07.38s system
> > 
> > $ time doas obj/fsck_ffs -f /dev/sd3a
> > 1m32.81s real 0m12.86s user 0m05.25s system
> > 
> > The difference will be less if a filesystem is filled up, but still nice.
> 
> Any takers?

No feedback. I'm getting discouraged in doing more filesystem work...

What to do?

1) Abandon the diff
2) Commit without ok

I did quite extensive testing, but both options are unsatisfactory.

-Otto

> 
> > 
> > Index: fsck.h
> > ===
> > RCS file: /cvs/src/sbin/fsck_ffs/fsck.h,v
> > retrieving revision 1.32
> > diff -u -p -r1.32 fsck.h
> > --- fsck.h  5 Jan 2018 09:33:47 -   1.32
> > +++ fsck.h  21 Jun 2020 12:48:50 -
> > @@ -136,7 +136,6 @@ struct bufarea {
> >  struct bufarea bufhead;/* head of list of other blks in filesys */
> >  struct bufarea sblk;   /* file system superblock */
> >  struct bufarea asblk;  /* alternate file system superblock */
> > -struct bufarea cgblk;  /* cylinder group blocks */
> >  struct bufarea *pdirbp;/* current directory contents */
> >  struct bufarea *pbp;   /* current inode block */
> >  struct bufarea *getdatablk(daddr_t, long);
> > @@ -148,9 +147,7 @@ struct bufarea *getdatablk(daddr_t, long
> > (bp)->b_flags = 0;
> >  
> >  #definesbdirty()   sblk.b_dirty = 1
> > -#definecgdirty()   cgblk.b_dirty = 1
> >  #definesblock  (*sblk.b_un.b_fs)
> > -#definecgrp(*cgblk.b_un.b_cg)
> >  
> >  enum fixstate {DONTKNOW, NOFIX, FIX, IGNORE};
> >  
> > @@ -275,9 +272,13 @@ struct ufs2_dinode ufs2_zino;
> >  #defineFOUND   0x10
> >  
> >  union dinode *ginode(ino_t);
> > +struct bufarea *cglookup(u_int cg);
> >  struct inoinfo *getinoinfo(ino_t);
> >  void getblk(struct bufarea *, daddr_t, long);
> >  ino_t allocino(ino_t, int);
> > +void *Malloc(size_t);
> > +void *Calloc(size_t, size_t);
> > +void *Reallocarray(void *, size_t, size_t);
> >  
> >  int(*info_fn)(char *, size_t);
> >  char   *info_filesys;
> > Index: inode.c
> > ===
> > RCS file: /cvs/src/sbin/fsck_ffs/inode.c,v
> > retrieving revision 1.49
> > diff -u -p -r1.49 inode.c
> > --- inode.c 16 Sep 2018 02:43:11 -  1.49
> > +++ inode.c 21 Jun 2020 12:48:50 -
> > @@ -370,7 +370,7 @@ setinodebuf(ino_t inum)
> > partialsize = inobufsize;
> > }
> > if (inodebuf == NULL &&
> > -   (inodebuf = malloc((unsigned)inobufsize)) == NULL)
> > +   (inodebuf = Malloc((unsigned)inobufsize)) == NULL)
> > errexit("Cannot allocate space for inode buffer\n");
> >  }
> >  
> > @@ -401,7 +401,7 @@ cacheino(union dinode *dp, ino_t inumber
> > blks = howmany(DIP(dp, di_size), sblock.fs_bsize);
> > if (blks > NDADDR)
> > blks = NDADDR + NIADDR;
> > -   inp = malloc(sizeof(*inp) + (blks ? blks - 1 : 0) * sizeof(daddr_t));
> > +   inp = Malloc(sizeof(*inp) + (blks ? blks - 1 : 0) * sizeof(daddr_t));
> > if (inp == NULL)
> > errexit("cannot allocate memory for inode cache\n");
> >     inpp = &inphead[inumber % numdirs];
> > @@ -423,10 +423,10 @@ cacheino(union dinode *dp, ino_t inumber
> > inp->i_blks[NDADDR + i] = DIP(dp, di_ib[i]);
> > if (inplast == listmax) {
> > newlistmax = listmax + 100;
> > -   newinpsort = reallocarray(inpsort,
> > +   newinpsort = Reallocarray(inpsort,
> > (unsigned)newlistmax, sizeof(struct inoinfo *));
> > if (newinpsort == NULL)
> > -   errexit("cannot increase directory list");
> > +   errexit("cannot increase directory list\n");
> > inpsort = newinpsort;
> > listmax = newlistmax;
> > }

Re: Undefined Behavior at jsmn.c

2020-07-12 Thread Otto Moerbeek
On Sun, Jul 12, 2020 at 09:57:02AM +0430, Ali Farzanrad wrote:

> Hi @tech,
> 
> I was comparing jsmn.c in acme-client with jsmn.c in FreeBSD [1].
> I found a switch without a default case which is an undefined behavior:
> 
> @@ -69,6 +69,8 @@
>   case '\t' : case '\r' : case '\n' : case ' ' :
>   case ','  : case ']'  : case '}' :
>   goto found;
> + default:
> + break;
>   }
>   if (js[parser->pos] < 32 || js[parser->pos] >= 127) {
>   parser->pos = start;
> 
> I have patched that undefined behavior + some style fix.

It is bad practice to intermix style changes with bug fixes.
Please post the fix separately.

-Otto

> 
> [1] https://svnweb.freebsd.org/base/head/lib/libpmc/pmu-events/jsmn.c
> 
> Index: jsmn.c
> ===
> RCS file: /cvs/src/usr.sbin/acme-client/jsmn.c,v
> retrieving revision 1.1
> diff -u -p -r1.1 jsmn.c
> --- jsmn.c  31 Aug 2016 22:01:42 -  1.1
> +++ jsmn.c  12 Jul 2020 05:10:34 -
> @@ -1,31 +1,33 @@
>  /*
> - Copyright (c) 2010 Serge A. Zaitsev
> - 
> - Permission is hereby granted, free of charge, to any person obtaining a copy
> - of this software and associated documentation files (the "Software"), to deal
> - in the Software without restriction, including without limitation the rights
> - to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
> - copies of the Software, and to permit persons to whom the Software is
> - furnished to do so, subject to the following conditions:
> - 
> - The above copyright notice and this permission notice shall be included in
> - all copies or substantial portions of the Software.
> - 
> - THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> - IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> - FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
> - AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> - LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
> - OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
> - THE SOFTWARE.*
> + * Copyright (c) 2010 Serge A. Zaitsev
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a 
> copy
> + * of this software and associated documentation files (the "Software"), to 
> deal
> + * in the Software without restriction, including without limitation the 
> rights
> + * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
> + * copies of the Software, and to permit persons to whom the Software is
> + * furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice shall be included in
> + * all copies or substantial portions of the Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL 
> THE
> + * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING 
> FROM,
> + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
> + * THE SOFTWARE.
>   */
> +
>  #include "jsmn.h"
>  
> -/**
> - * Allocates a fresh unused token from the token pull.
> +/*
> + * Allocates a fresh unused token from the token pool.
>   */
> -static jsmntok_t *jsmn_alloc_token(jsmn_parser *parser,
> - jsmntok_t *tokens, size_t num_tokens) {
> +static jsmntok_t *
> +jsmn_alloc_token(jsmn_parser *parser, jsmntok_t *tokens, size_t num_tokens)
> +{
>   jsmntok_t *tok;
>   if (parser->toknext >= num_tokens) {
>   return NULL;
> @@ -39,22 +41,25 @@ static jsmntok_t *jsmn_alloc_token(jsmn_
>   return tok;
>  }
>  
> -/**
> +/*
>   * Fills token type and boundaries.
>   */
> -static void jsmn_fill_token(jsmntok_t *token, jsmntype_t type,
> -int start, int end) {
> +static void
> +jsmn_fill_token(jsmntok_t *token, jsmntype_t type, int start, int end)
> +{
>   token->type = type;
>   token->start = start;
>   token->end = end;
>   token->size = 0;
>  }
>  
> -/**
> +/*
>   * Fills next available token with JSON primitive.
>   */
> -static int jsmn_parse_primitive(jsmn_parser *parser, const char *js,
> - size_t len, jsmntok_t *tokens, size_t num_tokens) {
> +static int
> +jsmn_parse_primitive(jsmn_parser *parser, const char *js,
> +size_t len, jsmntok_t *tokens, size_t num_tokens)
> +{
>   jsmntok_t *token;
>   int start;
>  
> @@ -63,12 +68,19 @@ static int jsmn_parse_primitive(jsmn_par
>   for (; parser->pos < len && 

Re: adjfreq(2): limit adjustment to prevent overflow during tc_windup()

2020-07-03 Thread Otto Moerbeek
On Thu, Jul 02, 2020 at 08:27:58PM -0500, Scott Cheloha wrote:

> Hi,
> 
> When we recompute the scaling factor during tc_windup() there is an
> opportunity for arithmetic overflow/underflow when we add the NTP
> adjustment into the scale:
> 
>649  scale = (u_int64_t)1 << 63;
>650  scale += \
>651  ((th->th_adjustment + th->th_counter->tc_freq_adj) / 
> 1024) * 2199;
>652  scale /= th->th_counter->tc_frequency;
>653  th->th_scale = scale * 2;
> 
> At lines 650 and 651, you will overflow/underflow if
> th->th_counter->tc_freq_adj is sufficiently positive/negative.
> 
> I don't like the idea of checking for that overflow during
> tc_windup().  We can pick a reasonable adjustment range and check for
> it during adjfreq(2) and that should be good enough.
> 
> My strawman proposal is a range of -5 to 5 parts per
> billion.  We could push the limits a bit, but half a billion seems
> like a nice round number to me.
> 
> On a perfect clock, this means you can effect a 0.5x slowdown or a
> 1.5x speedup via adjfreq(2), but no slower/faster.
> 
> I don't *think* ntpd(8) would ever reach such extreme adjustments
> through its algorithm.  I don't think this will break anyone's working
> setup.
> 
> (Maybe I'm wrong, though.  otto@?)

Right, ntpd is pretty conservative and won't do big adjustments.

-Otto

> 
> Just so we're all clear that the math is sound, here's the result at
> the upper limit of the input range.  Note that adjtime(2) is capped at
> 5000PPM in ntp_update_second(), hence its value here.
> 
>   int64_t th_adjustment = (5000 * 1000) << 32;/* 2147483648000 */
>   int64_t tc_freq_adj = 5LL << 32;/* 21474836480 
> */
>   
> 
>   scale = (u_int64_t)1 << 63  /* 9223372036854775808 
> */
>   scale += (th_adjustment + tc_freq_adj) / 1024 * 2199;
>   /*+= (216895848448000) / 1024 * 2199; */
>   /*+= 465775362048000; */
> 
> 9223372036854775808 + 465775362048000 = 13881125657334775808,
> which is less than 18446744073709551616, so we don't have overflow.
> 
> At the negative end of the input range, i.e.
> 
>   int64_t th_adjustment = (-5000 * 1000) << 32;
>   int64_t tc_freq_adj = -5LL << 32;
> 
> you have 9223372036854775808 - 465775362048000 = 4565618416374775808,
> so no underflow either.
> 
> Thoughts?
> 
> What is the best way to express this range in the documentation?  Do I
> say "parts per billion", or something else?
> 
> Index: sys/kern/kern_time.c
> ===
> RCS file: /cvs/src/sys/kern/kern_time.c,v
> retrieving revision 1.131
> diff -u -p -r1.131 kern_time.c
> --- sys/kern/kern_time.c  22 Jun 2020 18:25:57 -  1.131
> +++ sys/kern/kern_time.c  3 Jul 2020 00:57:49 -
> @@ -391,6 +391,9 @@ sys_settimeofday(struct proc *p, void *v
>   return (0);
>  }
>  
> +#define ADJFREQ_MAX (5LL << 32)
> +#define ADJFREQ_MIN (-5LL << 32)
> +
>  int
>  sys_adjfreq(struct proc *p, void *v, register_t *retval)
>  {
> @@ -408,6 +411,8 @@ sys_adjfreq(struct proc *p, void *v, reg
>   return (error);
>   if ((error = copyin(freq, , sizeof(f
>   return (error);
> + if (f < ADJFREQ_MIN || f > ADJFREQ_MAX)
> + return (EINVAL);
>   }
>  
>   rw_enter(_lock, (freq == NULL) ? RW_READ : RW_WRITE);
> Index: lib/libc/sys/adjfreq.2
> ===
> RCS file: /cvs/src/lib/libc/sys/adjfreq.2,v
> retrieving revision 1.7
> diff -u -p -r1.7 adjfreq.2
> --- lib/libc/sys/adjfreq.210 Sep 2015 17:55:21 -  1.7
> +++ lib/libc/sys/adjfreq.23 Jul 2020 00:57:49 -
> @@ -60,6 +60,10 @@ The
>  .Fa freq
>  argument is non-null and the process's effective user ID is not that
>  of the superuser.
> +.It Bq Er EINVAL
> +.Fa freq
> +is less than -5 parts-per-billion or greater than 5
> +parts-per-billion.
>  .El
>  .Sh SEE ALSO
>  .Xr date 1 ,
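
The range argument above can be checked mechanically. The following is
only a sketch of the quoted scale computation with explicit overflow
checks; checked_scale() and SCALE_BASE are hypothetical names, not kernel
code, and the later "scale * 2" step from tc_windup() is not modelled.

```c
#include <stdint.h>

#define SCALE_BASE	((uint64_t)1 << 63)

/*
 * Hypothetical helper: computes 2^63 + ((adj + freq) / 1024) * 2199 as
 * in the quoted tc_windup() lines.  Returns 0 and stores the result in
 * *scale, or -1 if a signed intermediate would overflow.
 */
static int
checked_scale(int64_t th_adjustment, int64_t tc_freq_adj, uint64_t *scale)
{
	int64_t sum, delta;

	/* th_adjustment + tc_freq_adj must fit in int64_t. */
	if ((tc_freq_adj > 0 && th_adjustment > INT64_MAX - tc_freq_adj) ||
	    (tc_freq_adj < 0 && th_adjustment < INT64_MIN - tc_freq_adj))
		return -1;
	sum = (th_adjustment + tc_freq_adj) / 1024;

	/* sum * 2199 must fit in int64_t as well. */
	if (sum > INT64_MAX / 2199 || sum < INT64_MIN / 2199)
		return -1;
	delta = sum * 2199;

	/* 2^63 plus any int64_t value lies in [0, 2^64). */
	if (delta >= 0)
		*scale = SCALE_BASE + (uint64_t)delta;
	else
		*scale = SCALE_BASE - (uint64_t)(-(delta + 1)) - 1;
	return 0;
}
```

Called with th_adjustment = 5000000 * 2^32 and tc_freq_adj =
500000000 * 2^32 (input values assumed here because they reproduce the
totals quoted above), this yields 13881125657334775808; with both inputs
negated it yields 4565618416374775808.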



Re: fsck_ffs: faster with lots of cylinder groups

2020-06-29 Thread Otto Moerbeek
On Sun, Jun 21, 2020 at 03:35:21PM +0200, Otto Moerbeek wrote:

> Hi,
> 
> both phase 1 and phase 5 need cylinder group metadata.  This diff
> keeps the cg data read in phase 1 in memory to be used by phase 5 if
> possible. From FreeBSD. 
> 
>   -Otto
> 
> On an empty 30T filesystem:
> 
> $ time obj/fsck_ffs -f /dev/sd3a
> 2m44.10s real 0m13.21s user 0m07.38s system
> 
> $ time doas obj/fsck_ffs -f /dev/sd3a
> 1m32.81s real 0m12.86s user 0m05.25s system
> 
> The difference will be less if a filesystem is filled up, but still nice.

Any takers?

-Otto

> 
> Index: fsck.h
> ===
> RCS file: /cvs/src/sbin/fsck_ffs/fsck.h,v
> retrieving revision 1.32
> diff -u -p -r1.32 fsck.h
> --- fsck.h5 Jan 2018 09:33:47 -   1.32
> +++ fsck.h21 Jun 2020 12:48:50 -
> @@ -136,7 +136,6 @@ struct bufarea {
>  struct bufarea bufhead;  /* head of list of other blks in 
> filesys */
>  struct bufarea sblk; /* file system superblock */
>  struct bufarea asblk;/* alternate file system superblock */
> -struct bufarea cgblk;/* cylinder group blocks */
>  struct bufarea *pdirbp;  /* current directory contents */
>  struct bufarea *pbp; /* current inode block */
>  struct bufarea *getdatablk(daddr_t, long);
> @@ -148,9 +147,7 @@ struct bufarea *getdatablk(daddr_t, long
>   (bp)->b_flags = 0;
>  
>  #define  sbdirty()   sblk.b_dirty = 1
> -#define  cgdirty()   cgblk.b_dirty = 1
>  #define  sblock  (*sblk.b_un.b_fs)
> -#define  cgrp(*cgblk.b_un.b_cg)
>  
>  enum fixstate {DONTKNOW, NOFIX, FIX, IGNORE};
>  
> @@ -275,9 +272,13 @@ struct ufs2_dinode ufs2_zino;
>  #define  FOUND   0x10
>  
>  union dinode *ginode(ino_t);
> +struct bufarea *cglookup(u_int cg);
>  struct inoinfo *getinoinfo(ino_t);
>  void getblk(struct bufarea *, daddr_t, long);
>  ino_t allocino(ino_t, int);
> +void *Malloc(size_t);
> +void *Calloc(size_t, size_t);
> +void *Reallocarray(void *, size_t, size_t);
>  
>  int  (*info_fn)(char *, size_t);
>  char *info_filesys;
> Index: inode.c
> ===
> RCS file: /cvs/src/sbin/fsck_ffs/inode.c,v
> retrieving revision 1.49
> diff -u -p -r1.49 inode.c
> --- inode.c   16 Sep 2018 02:43:11 -  1.49
> +++ inode.c   21 Jun 2020 12:48:50 -
> @@ -370,7 +370,7 @@ setinodebuf(ino_t inum)
>   partialsize = inobufsize;
>   }
>   if (inodebuf == NULL &&
> - (inodebuf = malloc((unsigned)inobufsize)) == NULL)
> + (inodebuf = Malloc((unsigned)inobufsize)) == NULL)
>   errexit("Cannot allocate space for inode buffer\n");
>  }
>  
> @@ -401,7 +401,7 @@ cacheino(union dinode *dp, ino_t inumber
>   blks = howmany(DIP(dp, di_size), sblock.fs_bsize);
>   if (blks > NDADDR)
>   blks = NDADDR + NIADDR;
> - inp = malloc(sizeof(*inp) + (blks ? blks - 1 : 0) * sizeof(daddr_t));
> + inp = Malloc(sizeof(*inp) + (blks ? blks - 1 : 0) * sizeof(daddr_t));
>   if (inp == NULL)
>   errexit("cannot allocate memory for inode cache\n");
>   inpp = [inumber % numdirs];
> @@ -423,10 +423,10 @@ cacheino(union dinode *dp, ino_t inumber
>   inp->i_blks[NDADDR + i] = DIP(dp, di_ib[i]);
>   if (inplast == listmax) {
>   newlistmax = listmax + 100;
> - newinpsort = reallocarray(inpsort,
> + newinpsort = Reallocarray(inpsort,
>   (unsigned)newlistmax, sizeof(struct inoinfo *));
>   if (newinpsort == NULL)
> - errexit("cannot increase directory list");
> + errexit("cannot increase directory list\n");
>   inpsort = newinpsort;
>   listmax = newlistmax;
>   }
> @@ -582,7 +582,8 @@ allocino(ino_t request, int type)
>  {
>   ino_t ino;
>   union dinode *dp;
> - struct cg *cgp = 
> + struct bufarea *cgbp;
> + struct cg *cgp;
>   int cg;
>   time_t t;
>   struct inostat *info;
> @@ -602,7 +603,7 @@ allocino(ino_t request, int type)
>   unsigned long newalloced, i;
>   newalloced = MINIMUM(sblock.fs_ipg,
>   MAXIMUM(2 * inostathead[cg].il_numalloced, 10));
> - info = calloc(newalloced, sizeof(struct inostat));
> + info = Calloc(newalloced, sizeof(struct inostat));
>   if (info == NULL) {
>

Re: obsd 6.7 - ntpd error msg

2020-06-22 Thread Otto Moerbeek
On Thu, Jun 18, 2020 at 11:41:17AM +0200, Otto Moerbeek wrote:

> On Thu, Jun 18, 2020 at 09:57:34AM +0200, Salvatore Cuzzilla wrote:
> 
> > Perfect, tnx!
> > 
> > On 18.06.2020 07:58, Otto Moerbeek wrote:
> > > On Wed, Jun 17, 2020 at 10:53:54PM +0200, Salvatore Cuzzilla wrote:
> > > 
> > > > Hi Otto here the logs (multitail) - @22:49:15 I restarted ntpd:
> > > > -
> > > > Jun 17 22:49:23 obsd ntpd[88568]: constraint reply from 188.61.106.24: 
> > > > offset -0.541051
> > > > Jun 17 22:49:46 obsd ntpd[88568]: peer 172.17.1.1 now valid
> > > > 01] /var/log/daemon  <---   
> > > > 
> > > > 
> > > > 
> > > >  248KB - 2020/06/17 22:49:46
> > > > -
> > > > Jun 17 14:00:01 obsd syslogd[80400]: restart
> > > > Jun 17 16:19:41 obsd ntpd[92699]: pipe write error (from main): No such 
> > > > file or directory
> > > > Jun 17 16:21:07 obsd ntpd[29588]: pipe write error (from main): No such 
> > > > file or directory
> > > > Jun 17 17:00:01 obsd syslogd[80400]: restart
> > > > Jun 17 17:01:25 obsd ntpd[96273]: pipe write error (from main): No such 
> > > > file or directory
> > > > Jun 17 17:02:38 obsd ntpd[94737]: pipe write error (from main): No such 
> > > > file or directory
> > > > Jun 17 20:00:01 obsd syslogd[80400]: restart
> > > > Jun 17 22:00:01 obsd syslogd[80400]: restart
> > > > Jun 17 22:49:22 obsd ntpd[40936]: pipe write error (from main): No such 
> > > > file or directory
> > > > 02] /var/log/messages <---  
> > > > 
> > > > 
> > > > 
> > > >  205KB - 2020/06/17 22:49:22
> > > > -
> > > > 22:49:15 -ksh ToTo@obsd ~ $ doas rcctl restart ntpd
> > > > ntpd(ok)
> > > > ntpd(ok)
> > > > 22:49:23 -ksh ToTo@obsd ~ $
> > > 
> > > 
> > > OK, now we're getting somewhere.  It always helps to provide lots of
> > > information from the start.
> > > 
> > > The message is generated by ntpd being stopped.  It is harmless,
> > > though it is actually wrong, it's a pipe read error.
> > > 
> > > So nothing to worry about.  I'll see if the log level should be
> > > changed to debug for this one or maybe another solution.
> > > 
> 
> And now with diff.

I committed a slightly more conservative version of this diff. A dns
read error (which should not happen) still logs at warn level.

-Otto

> 
> Index: ntp.c
> ===
> RCS file: /cvs/src/usr.sbin/ntpd/ntp.c,v
> retrieving revision 1.164
> diff -u -p -r1.164 ntp.c
> --- ntp.c 11 Apr 2020 07:49:48 -  1.164
> +++ ntp.c 18 Jun 2020 09:39:03 -
> @@ -365,7 +365,7 @@ ntp_main(struct ntpd_conf *nconf, struct
>   if (nfds > 0 && pfd[PFD_PIPE_MAIN].revents & (POLLIN|POLLERR)) {
>   nfds--;
>   if (ntp_dispatch_imsg() == -1) {
> - log_warn("pipe write error (from main)");
> + log_debug("pipe read error (from main)");
>   ntp_quit = 1;
>   }
>   }
> @@ -380,7 +380,7 @@ ntp_main(struct ntpd_conf *nconf, struct
>   if (nfds > 0 && pfd[PFD_PIPE_DNS].revents & (POLLIN|POLLERR)) {
>   nfds--;
>   if (ntp_dispatch_imsg_dns() == -1) {
> - log_warn("pipe write error (from dns engine)");
> + log_debug("pipe read error (from dns engine)");
>   ntp_quit = 1;
>   }
>   }
> 



fsck_ffs: faster with lots of cylinder groups

2020-06-21 Thread Otto Moerbeek
Hi,

both phase 1 and phase 5 need cylinder group metadata.  This diff
keeps the cg data read in phase 1 in memory to be used by phase 5 if
possible. From FreeBSD. 

-Otto

On an empty 30T filesystem:

$ time obj/fsck_ffs -f /dev/sd3a
2m44.10s real 0m13.21s user 0m07.38s system

$ time doas obj/fsck_ffs -f /dev/sd3a
1m32.81s real 0m12.86s user 0m05.25s system

The difference will be less if a filesystem is filled up, but still nice.

-Otto
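
The idea of keeping cylinder group data around between phases can be
sketched as a read-once cache indexed by cylinder group number. This is
an illustration only, not the code in the diff below: struct cgbuf,
read_cg(), cgcache_init() and cgcache_lookup() are hypothetical names,
the "disk read" is simulated, and the fallback for when memory runs out
("if possible" above) is not modelled.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a cached cylinder group buffer. */
struct cgbuf {
	unsigned char	*data;
	int		 loaded;
};

static struct cgbuf	*cgcache;	/* one slot per cylinder group */
static unsigned int	 cgcount;
static size_t		 cgbytes;
static unsigned long	 cgreads;	/* number of simulated disk reads */

/* Hypothetical disk read: fills buf with a pattern instead of doing I/O. */
static void
read_cg(unsigned int cg, unsigned char *buf)
{
	memset(buf, (int)(cg & 0xff), cgbytes);
	cgreads++;
}

static int
cgcache_init(unsigned int ncg, size_t size)
{
	cgcache = calloc(ncg, sizeof(*cgcache));
	if (cgcache == NULL)
		return -1;
	cgcount = ncg;
	cgbytes = size;
	cgreads = 0;
	return 0;
}

/* Return the buffer for cg, reading it only on first use. */
static struct cgbuf *
cgcache_lookup(unsigned int cg)
{
	struct cgbuf *bp;

	if (cg >= cgcount)
		return NULL;
	bp = &cgcache[cg];
	if (bp->data == NULL && (bp->data = malloc(cgbytes)) == NULL)
		return NULL;
	if (!bp->loaded) {
		read_cg(cg, bp->data);
		bp->loaded = 1;
	}
	return bp;
}
```

A second lookup of the same cylinder group returns the same buffer
without another read, which is where a later phase would save its I/O.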

Index: fsck.h
===
RCS file: /cvs/src/sbin/fsck_ffs/fsck.h,v
retrieving revision 1.32
diff -u -p -r1.32 fsck.h
--- fsck.h  5 Jan 2018 09:33:47 -   1.32
+++ fsck.h  21 Jun 2020 12:48:50 -
@@ -136,7 +136,6 @@ struct bufarea {
 struct bufarea bufhead;/* head of list of other blks in 
filesys */
 struct bufarea sblk;   /* file system superblock */
 struct bufarea asblk;  /* alternate file system superblock */
-struct bufarea cgblk;  /* cylinder group blocks */
 struct bufarea *pdirbp;/* current directory contents */
 struct bufarea *pbp;   /* current inode block */
 struct bufarea *getdatablk(daddr_t, long);
@@ -148,9 +147,7 @@ struct bufarea *getdatablk(daddr_t, long
(bp)->b_flags = 0;
 
 #definesbdirty()   sblk.b_dirty = 1
-#definecgdirty()   cgblk.b_dirty = 1
 #definesblock  (*sblk.b_un.b_fs)
-#definecgrp(*cgblk.b_un.b_cg)
 
 enum fixstate {DONTKNOW, NOFIX, FIX, IGNORE};
 
@@ -275,9 +272,13 @@ struct ufs2_dinode ufs2_zino;
 #defineFOUND   0x10
 
 union dinode *ginode(ino_t);
+struct bufarea *cglookup(u_int cg);
 struct inoinfo *getinoinfo(ino_t);
 void getblk(struct bufarea *, daddr_t, long);
 ino_t allocino(ino_t, int);
+void *Malloc(size_t);
+void *Calloc(size_t, size_t);
+void *Reallocarray(void *, size_t, size_t);
 
 int(*info_fn)(char *, size_t);
 char   *info_filesys;
Index: inode.c
===
RCS file: /cvs/src/sbin/fsck_ffs/inode.c,v
retrieving revision 1.49
diff -u -p -r1.49 inode.c
--- inode.c 16 Sep 2018 02:43:11 -  1.49
+++ inode.c 21 Jun 2020 12:48:50 -
@@ -370,7 +370,7 @@ setinodebuf(ino_t inum)
partialsize = inobufsize;
}
if (inodebuf == NULL &&
-   (inodebuf = malloc((unsigned)inobufsize)) == NULL)
+   (inodebuf = Malloc((unsigned)inobufsize)) == NULL)
errexit("Cannot allocate space for inode buffer\n");
 }
 
@@ -401,7 +401,7 @@ cacheino(union dinode *dp, ino_t inumber
blks = howmany(DIP(dp, di_size), sblock.fs_bsize);
if (blks > NDADDR)
blks = NDADDR + NIADDR;
-   inp = malloc(sizeof(*inp) + (blks ? blks - 1 : 0) * sizeof(daddr_t));
+   inp = Malloc(sizeof(*inp) + (blks ? blks - 1 : 0) * sizeof(daddr_t));
if (inp == NULL)
errexit("cannot allocate memory for inode cache\n");
inpp = [inumber % numdirs];
@@ -423,10 +423,10 @@ cacheino(union dinode *dp, ino_t inumber
inp->i_blks[NDADDR + i] = DIP(dp, di_ib[i]);
if (inplast == listmax) {
newlistmax = listmax + 100;
-   newinpsort = reallocarray(inpsort,
+   newinpsort = Reallocarray(inpsort,
(unsigned)newlistmax, sizeof(struct inoinfo *));
if (newinpsort == NULL)
-   errexit("cannot increase directory list");
+   errexit("cannot increase directory list\n");
inpsort = newinpsort;
listmax = newlistmax;
}
@@ -582,7 +582,8 @@ allocino(ino_t request, int type)
 {
ino_t ino;
union dinode *dp;
-   struct cg *cgp = 
+   struct bufarea *cgbp;
+   struct cg *cgp;
int cg;
time_t t;
struct inostat *info;
@@ -602,7 +603,7 @@ allocino(ino_t request, int type)
unsigned long newalloced, i;
newalloced = MINIMUM(sblock.fs_ipg,
MAXIMUM(2 * inostathead[cg].il_numalloced, 10));
-   info = calloc(newalloced, sizeof(struct inostat));
+   info = Calloc(newalloced, sizeof(struct inostat));
if (info == NULL) {
pwarn("cannot alloc %zu bytes to extend inoinfo\n",
sizeof(struct inostat) * newalloced);
@@ -619,7 +620,8 @@ allocino(ino_t request, int type)
inostathead[cg].il_numalloced = newalloced;
info = inoinfo(ino);
}
-   getblk(, cgtod(, cg), sblock.fs_cgsize);
+   cgbp = cglookup(cg);
+   cgp = cgbp->b_un.b_cg;
if (!cg_chkmagic(cgp))
pfatal("CG %d: BAD MAGIC NUMBER\n", cg);
setbit(cg_inosused(cgp), ino % sblock.fs_ipg);
@@ -637,7 +639,7 @@ allocino(ino_t request, int type)
default:

Re: obsd 6.7 - ntpd error msg

2020-06-18 Thread Otto Moerbeek
On Thu, Jun 18, 2020 at 09:57:34AM +0200, Salvatore Cuzzilla wrote:

> Perfect, tnx!
> 
> On 18.06.2020 07:58, Otto Moerbeek wrote:
> > On Wed, Jun 17, 2020 at 10:53:54PM +0200, Salvatore Cuzzilla wrote:
> > 
> > > Hi Otto here the logs (multitail) - @22:49:15 I restarted ntpd:
> > > -
> > > Jun 17 22:49:23 obsd ntpd[88568]: constraint reply from 188.61.106.24: 
> > > offset -0.541051
> > > Jun 17 22:49:46 obsd ntpd[88568]: peer 172.17.1.1 now valid
> > > 01] /var/log/daemon  <--- 
> > >   
> > >   
> > >
> > > 248KB - 2020/06/17 22:49:46
> > > -
> > > Jun 17 14:00:01 obsd syslogd[80400]: restart
> > > Jun 17 16:19:41 obsd ntpd[92699]: pipe write error (from main): No such 
> > > file or directory
> > > Jun 17 16:21:07 obsd ntpd[29588]: pipe write error (from main): No such 
> > > file or directory
> > > Jun 17 17:00:01 obsd syslogd[80400]: restart
> > > Jun 17 17:01:25 obsd ntpd[96273]: pipe write error (from main): No such 
> > > file or directory
> > > Jun 17 17:02:38 obsd ntpd[94737]: pipe write error (from main): No such 
> > > file or directory
> > > Jun 17 20:00:01 obsd syslogd[80400]: restart
> > > Jun 17 22:00:01 obsd syslogd[80400]: restart
> > > Jun 17 22:49:22 obsd ntpd[40936]: pipe write error (from main): No such 
> > > file or directory
> > > 02] /var/log/messages <---
> > >   
> > >   
> > >
> > > 205KB - 2020/06/17 22:49:22
> > > -
> > > 22:49:15 -ksh ToTo@obsd ~ $ doas rcctl restart ntpd
> > > ntpd(ok)
> > > ntpd(ok)
> > > 22:49:23 -ksh ToTo@obsd ~ $
> > 
> > 
> > OK, now we're getting somewhere.  It always helps to provide lots of
> > information from the start.
> > 
> > The message is generated by ntpd being stopped.  It is harmless,
> > though it is actually wrong, it's a pipe read error.
> > 
> > So nothing to worry about.  I'll see if the log level should be
> > changed to debug for this one or maybe another solution.
> > 

And now with diff.

-Otto

Index: ntp.c
===
RCS file: /cvs/src/usr.sbin/ntpd/ntp.c,v
retrieving revision 1.164
diff -u -p -r1.164 ntp.c
--- ntp.c   11 Apr 2020 07:49:48 -  1.164
+++ ntp.c   18 Jun 2020 09:39:03 -
@@ -365,7 +365,7 @@ ntp_main(struct ntpd_conf *nconf, struct
if (nfds > 0 && pfd[PFD_PIPE_MAIN].revents & (POLLIN|POLLERR)) {
nfds--;
if (ntp_dispatch_imsg() == -1) {
-   log_warn("pipe write error (from main)");
+   log_debug("pipe read error (from main)");
ntp_quit = 1;
}
}
@@ -380,7 +380,7 @@ ntp_main(struct ntpd_conf *nconf, struct
if (nfds > 0 && pfd[PFD_PIPE_DNS].revents & (POLLIN|POLLERR)) {
nfds--;
if (ntp_dispatch_imsg_dns() == -1) {
-   log_warn("pipe write error (from dns engine)");
+   log_debug("pipe read error (from dns engine)");
ntp_quit = 1;
}
}



Re: obsd 6.7 - ntpd error msg

2020-06-17 Thread Otto Moerbeek
On Wed, Jun 17, 2020 at 10:53:54PM +0200, Salvatore Cuzzilla wrote:

> Hi Otto here the logs (multitail) - @22:49:15 I restarted ntpd:
> -
> Jun 17 22:49:23 obsd ntpd[88568]: constraint reply from 188.61.106.24: offset 
> -0.541051
> Jun 17 22:49:46 obsd ntpd[88568]: peer 172.17.1.1 now valid
> 01] /var/log/daemon  <--- 
>   
>   
>248KB - 2020/06/17 
> 22:49:46
> -
> Jun 17 14:00:01 obsd syslogd[80400]: restart
> Jun 17 16:19:41 obsd ntpd[92699]: pipe write error (from main): No such file 
> or directory
> Jun 17 16:21:07 obsd ntpd[29588]: pipe write error (from main): No such file 
> or directory
> Jun 17 17:00:01 obsd syslogd[80400]: restart
> Jun 17 17:01:25 obsd ntpd[96273]: pipe write error (from main): No such file 
> or directory
> Jun 17 17:02:38 obsd ntpd[94737]: pipe write error (from main): No such file 
> or directory
> Jun 17 20:00:01 obsd syslogd[80400]: restart
> Jun 17 22:00:01 obsd syslogd[80400]: restart
> Jun 17 22:49:22 obsd ntpd[40936]: pipe write error (from main): No such file 
> or directory
> 02] /var/log/messages <---
>   
>   
>205KB - 2020/06/17 
> 22:49:22
> -
> 22:49:15 -ksh ToTo@obsd ~ $ doas rcctl restart ntpd
> ntpd(ok)
> ntpd(ok)
> 22:49:23 -ksh ToTo@obsd ~ $


OK, now we're getting somewhere.  It always helps to provide lots of
information from the start.

The message is generated by ntpd being stopped.  It is harmless,
though it is actually wrong, it's a pipe read error.

So nothing to worry about.  I'll see if the log level should be
changed to debug for this one or maybe another solution.

-Otto
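
A small sketch of the situation described above, under the assumption
that the read side simply sees end-of-file once the other process has
closed its end of the pipe. read_after_writer_close() is a hypothetical
helper, not ntpd code. It also shows one plausible reason for the odd
"No such file or directory" suffix: read(2) returning 0 is not an error,
so errno may still hold whatever an earlier call left behind.

```c
#include <errno.h>
#include <unistd.h>

/*
 * Hypothetical helper: close the write end of a fresh pipe, then read
 * from the read end.  Returns what read(2) returned, or -2 if pipe(2)
 * failed; *errno_after receives errno as observed after the read.
 */
static ssize_t
read_after_writer_close(int *errno_after)
{
	int fds[2];
	char buf[16];
	ssize_t n;

	if (pipe(fds) == -1)
		return -2;
	close(fds[1]);		/* the peer goes away */

	errno = ENOENT;		/* pretend an earlier call left this behind */
	n = read(fds[0], buf, sizeof(buf));
	*errno_after = errno;	/* read(2) need not reset errno on success */
	close(fds[0]);
	return n;
}
```

The return value here is 0 (end-of-file), not -1, so any message built
from errno at that point describes an unrelated, earlier failure.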
> 
> On 17.06.2020 21:18, Otto Moerbeek wrote:
> > On Wed, Jun 17, 2020 at 09:15:22PM +0200, Salvatore Cuzzilla wrote:
> > 
> > > Hi Otto,
> > > 
> > > thanks for helping, really appreciated!
> > > The msg is showing after each restart. My simple conf here below:
> > > -
> > > 21:05:52 -ksh ToTo@obsd ~ $ doas cat /etc/ntpd.conf
> > > # $OpenBSD: ntpd.conf,v 1.14 2015/07/15 20:28:37 ajacoutot Exp $
> > > #
> > > # See ntpd.conf(5) and /etc/examples/ntpd.conf
> > > 
> > > server 172.17.1.1
> > > sensor *
> > > > constraints from "https://www.alfanetti.org"
> > > -
> > 
> > And show the log lines, all of them
> > 
> > -Otto
> > 
> > > 
> > > On 17.06.2020 20:51, Otto Moerbeek wrote:
> > > > On Wed, Jun 17, 2020 at 04:50:46PM +0200, Salvatore Cuzzilla wrote:
> > > >
> > > > > Hi Folks,
> > > > >
> > > > > when I restart ntpd I see this msg in /var/log/daemon:
> > > > >
> > > > > Jun 17 16:19:41 obsd ntpd[92699]: pipe write error (from main): No 
> > > > > > such file or directory
> > > > >
> > > > > however, time seems to be in sync:
> > > > >
> > > > > ---
> > > > > 16:37:17 -ksh ToTo@obsd ~ $ ntpctl -sa
> > > > > 1/1 peers valid, 1/1 sensors valid, constraint offset -1s, clock 
> > > > > unsynced
> > > > >
> > > > > peer
> > > > >wt tl st  next  poll  offset   delay  jitter
> > > > > 172.17.1.1
> > > > > 1 10  3 2361s 3069s-0.008ms 0.716ms 0.137ms
> > > > >
> > > > > sensor
> > > > >wt gd st  next  poll  offset  correction
> > > > > vmt0
> > > > > 1  1  07s   15s27.860ms 0.000ms
> > > > >
> > > > > 16:38:20 -ksh ToTo@obsd ~ $ doas sysctl -a | grep timecounter
> > > > > kern.timecounter.tick=1
> > > > > kern.timecounter.timestepwarnings=0
> > > > > kern.timecounter.hardware=tsc
> > > > > kern.timecounter.choice=i8254(0) acpihpet0(1000) tsc(2000) 
> > > > > acpitimer0(1000)
> > > > > ---
> > > > >
> > > > > anyone else experiencing the same?
> > > > >
> > > > > ---
> > > > > :wq,
> > > > > Salvatore.
> > > > >
> > > >
> > > > Was the message in the log before or after restarting?
> > > > Please show your ntpd.conf
> > > >
> > > > -Otto
> > > >
> > > 
> > > ---
> > > :wq,
> > > Salvatore.
> > 
> 
> ---
> :wq,
> Salvatore.



Re: obsd 6.7 - ntpd error msg

2020-06-17 Thread Otto Moerbeek
On Wed, Jun 17, 2020 at 09:15:22PM +0200, Salvatore Cuzzilla wrote:

> Hi Otto,
> 
> thanks for helping, really appreciated!
> The msg is showing after each restart. My simple conf here below:
> -
> 21:05:52 -ksh ToTo@obsd ~ $ doas cat /etc/ntpd.conf
> # $OpenBSD: ntpd.conf,v 1.14 2015/07/15 20:28:37 ajacoutot Exp $
> #
> # See ntpd.conf(5) and /etc/examples/ntpd.conf
> 
> server 172.17.1.1
> sensor *
> constraints from "https://www.alfanetti.org"
> -

And show the log lines, all of them

    -Otto

> 
> On 17.06.2020 20:51, Otto Moerbeek wrote:
> > On Wed, Jun 17, 2020 at 04:50:46PM +0200, Salvatore Cuzzilla wrote:
> > 
> > > Hi Folks,
> > > 
> > > when I restart ntpd I see this msg in /var/log/daemon:
> > > 
> > > Jun 17 16:19:41 obsd ntpd[92699]: pipe write error (from main): No 
> > > such file or directory
> > > 
> > > however, time seems to be in sync:
> > > 
> > > ---
> > > 16:37:17 -ksh ToTo@obsd ~ $ ntpctl -sa
> > > 1/1 peers valid, 1/1 sensors valid, constraint offset -1s, clock unsynced
> > > 
> > > peer
> > >wt tl st  next  poll  offset   delay  jitter
> > > 172.17.1.1
> > > 1 10  3 2361s 3069s-0.008ms 0.716ms 0.137ms
> > > 
> > > sensor
> > >wt gd st  next  poll  offset  correction
> > > vmt0
> > > 1  1  07s   15s27.860ms 0.000ms
> > > 
> > > 16:38:20 -ksh ToTo@obsd ~ $ doas sysctl -a | grep timecounter
> > > kern.timecounter.tick=1
> > > kern.timecounter.timestepwarnings=0
> > > kern.timecounter.hardware=tsc
> > > kern.timecounter.choice=i8254(0) acpihpet0(1000) tsc(2000) 
> > > acpitimer0(1000)
> > > ---
> > > 
> > > anyone else experiencing the same?
> > > 
> > > ---
> > > :wq,
> > > Salvatore.
> > > 
> > 
> > Was the message in the log before or after restarting?
> > Please show your ntpd.conf
> > 
> > -Otto
> > 
> 
> ---
> :wq,
> Salvatore.



Re: obsd 6.7 - ntpd error msg

2020-06-17 Thread Otto Moerbeek
On Wed, Jun 17, 2020 at 04:50:46PM +0200, Salvatore Cuzzilla wrote:

> Hi Folks,
> 
> when I restart ntpd I see this msg in /var/log/daemon:
> 
> Jun 17 16:19:41 obsd ntpd[92699]: pipe write error (from main): No such file
> or directory
> 
> however, time seems to be in sync:
> 
> ---
> 16:37:17 -ksh ToTo@obsd ~ $ ntpctl -sa
> 1/1 peers valid, 1/1 sensors valid, constraint offset -1s, clock unsynced
> 
> peer
>wt tl st  next  poll  offset   delay  jitter
> 172.17.1.1
> 1 10  3 2361s 3069s-0.008ms 0.716ms 0.137ms
> 
> sensor
>wt gd st  next  poll  offset  correction
> vmt0
> 1  1  07s   15s27.860ms 0.000ms
> 
> 16:38:20 -ksh ToTo@obsd ~ $ doas sysctl -a | grep timecounter
> kern.timecounter.tick=1
> kern.timecounter.timestepwarnings=0
> kern.timecounter.hardware=tsc
> kern.timecounter.choice=i8254(0) acpihpet0(1000) tsc(2000) acpitimer0(1000)
> ---
> 
> anyone else experiencing the same?
> 
> ---
> :wq,
> Salvatore.
> 

Was the message in the log before or after restarting?
Please show your ntpd.conf

-Otto



sparc64: bootblocks vs ofwboot load address

2020-06-05 Thread Otto Moerbeek
Hi,

Miod remarked that the overwriting of the bootblocks is actually a
regression I introduced. So reintroduce the lost comment and load
ofwboot at 0x6000.

OK?

-Otto
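
The address arithmetic can be sketched in C as a simple range-overlap
check. LOAD_BASE and BOOTBLK_MAX mirror the 0x4000 and 0x2000 values from
the comment in the diff below; ranges_overlap() and loader_base() are
hypothetical helpers for illustration, not part of the Forth bootblock.

```c
#include <stdint.h>

#define LOAD_BASE	0x4000u	/* default load-base per the diff comment */
#define BOOTBLK_MAX	0x2000u	/* room reserved for the Fcode bootblock */

/* Return 1 if [a, a + alen) and [b, b + blen) share at least one byte. */
static int
ranges_overlap(uint32_t a, uint32_t alen, uint32_t b, uint32_t blen)
{
	return a < b + blen && b < a + alen;
}

/* Address at which to load the second-stage program. */
static uint32_t
loader_base(void)
{
	return LOAD_BASE + BOOTBLK_MAX;
}
```

Loading a program of any non-zero size at LOAD_BASE overlaps the
bootblock's own range, while loading it at loader_base() (0x6000) starts
exactly where the reserved 8KB ends.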

Index: bootblk.fth
===
RCS file: /cvs/src/sys/arch/sparc64/stand/bootblk/bootblk.fth,v
retrieving revision 1.9
diff -u -p -r1.9 bootblk.fth
--- bootblk.fth 2 Apr 2020 06:06:22 -   1.9
+++ bootblk.fth 5 Jun 2020 08:09:33 -
@@ -716,7 +716,15 @@ create cur-blockno -1 l, -1 l, \ Curren
 2drop
 ;
 
-" load-base " evaluate constant loader-base
+\
+\ According to the 1275 addendum for SPARC processors:
+\ Default load-base is 0x4000.  At least 0x8. or
+\ 512KB must be available at that address.  
+\
+\ The Fcode bootblock can take up up to 8KB (O.K., 7.5KB) 
+\ so load programs at 0x4000 + 0x2000=> 0x6000
+\
+" load-base " evaluate 2000 + constant loader-base
 
 : load-file-signon ( load-file len boot-path len -- load-file len boot-path 
len )
." Loading file" space 2over type cr ." from device" space 2dup type cr
@@ -821,7 +829,7 @@ create cur-blockno -1 l, -1 l,  \ Curren
 ;
 
 : do-boot ( bootfile -- )
-   ." OpenBSD IEEE 1275 Bootblock 2.0" cr
+   ." OpenBSD IEEE 1275 Bootblock 2.1" cr
 
\ Open boot device
boot-path   ( boot-path len )



Re: filesystem code integer and many inodes

2020-06-02 Thread Otto Moerbeek
On Fri, May 29, 2020 at 09:30:04AM +0200, Otto Moerbeek wrote:

> On Thu, May 28, 2020 at 12:54:41PM -0600, Todd C. Miller wrote:
> 
> > On Thu, 28 May 2020 20:53:07 +0200, Otto Moerbeek wrote:
> > 
> > > Here's the separate diff for the prefcg loops. From FreeBSD.
> > 
> > OK millert@
> > 
> >  - todd
> > 
> 
> And here's the updated diff against -current. I removed a redundant
> cast in a fs_ipg * fs_ncg multiplication in fsck_ffs. Since both are
> u_int32 and we know the product is <= UINT_MAX, so we do not need to
> cast.
> 
> I would like to make some progress here, I have a followup diff to
> speed up Phase 5 of fsck_ffs...

Did anyone look closer at this?

Did anyone test?

-Otto
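
For readers following the cast discussion quoted above: a sketch of when
a product of two 32-bit unsigned counts needs widening. total_inodes()
and product_fits_u32() are hypothetical helpers, not code from the diff,
and the comments assume the usual 32-bit int.

```c
#include <stdint.h>

/* Hypothetical helper: total inode count, computed without truncation. */
static uint64_t
total_inodes(uint32_t ipg, uint32_t ncg)
{
	/* Widen one operand first; otherwise the product wraps modulo 2^32. */
	return (uint64_t)ipg * ncg;
}

/* Return 1 if ipg * ncg fits in 32 bits, so no cast would be needed. */
static int
product_fits_u32(uint32_t ipg, uint32_t ncg)
{
	return ncg == 0 || ipg <= UINT32_MAX / ncg;
}
```

When some earlier check already guarantees the product is at most
UINT_MAX, as stated in the quoted mail, the plain 32-bit multiplication
gives the same result as the widened one and the cast is redundant.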


> 
> Index: sbin/clri/clri.c
> ===
> RCS file: /cvs/src/sbin/clri/clri.c,v
> retrieving revision 1.20
> diff -u -p -r1.20 clri.c
> --- sbin/clri/clri.c  28 Jun 2019 13:32:43 -  1.20
> +++ sbin/clri/clri.c  29 May 2020 07:23:27 -
> @@ -68,7 +68,8 @@ main(int argc, char *argv[])
>   char *fs, sblock[SBLOCKSIZE];
>   size_t bsize;
>   off_t offset;
> - int i, fd, imax, inonum;
> + int i, fd;
> + ino_t imax, inonum;
>  
>   if (argc < 3)
>   usage();
> Index: sbin/dumpfs/dumpfs.c
> ===
> RCS file: /cvs/src/sbin/dumpfs/dumpfs.c,v
> retrieving revision 1.35
> diff -u -p -r1.35 dumpfs.c
> --- sbin/dumpfs/dumpfs.c  17 Feb 2020 16:11:25 -  1.35
> +++ sbin/dumpfs/dumpfs.c  29 May 2020 07:23:27 -
> @@ -69,7 +69,7 @@ union {
>  #define acg  cgun.cg
>  
>  int  dumpfs(int, const char *);
> -int  dumpcg(const char *, int, int);
> +int  dumpcg(const char *, int, u_int);
>  int  marshal(const char *);
>  int  open_disk(const char *);
>  void pbits(void *, int);
> @@ -163,6 +163,7 @@ dumpfs(int fd, const char *name)
>   size_t size;
>   off_t off;
>   int i, j;
> + u_int cg;
>  
>   switch (afs.fs_magic) {
>   case FS_UFS2_MAGIC:
> @@ -172,7 +173,7 @@ dumpfs(int fd, const char *name)
>   afs.fs_magic, ctime());
>   printf("superblock location\t%jd\tid\t[ %x %x ]\n",
>   (intmax_t)afs.fs_sblockloc, afs.fs_id[0], afs.fs_id[1]);
> - printf("ncg\t%d\tsize\t%jd\tblocks\t%jd\n",
> + printf("ncg\t%u\tsize\t%jd\tblocks\t%jd\n",
>   afs.fs_ncg, (intmax_t)fssize, (intmax_t)afs.fs_dsize);
>   break;
>   case FS_UFS1_MAGIC:
> @@ -198,7 +199,7 @@ dumpfs(int fd, const char *name)
>   printf("cylgrp\t%s\tinodes\t%s\tfslevel %d\n",
>   i < 1 ? "static" : "dynamic",
>   i < 2 ? "4.2/4.3BSD" : "4.4BSD", i);
> - printf("ncg\t%d\tncyl\t%d\tsize\t%d\tblocks\t%d\n",
> + printf("ncg\t%u\tncyl\t%d\tsize\t%d\tblocks\t%d\n",
>   afs.fs_ncg, afs.fs_ncyl, afs.fs_ffs1_size, 
> afs.fs_ffs1_dsize);
>   break;
>   default:
> @@ -223,9 +224,9 @@ dumpfs(int fd, const char *name)
>   (intmax_t)afs.fs_cstotal.cs_ndir,
>   (intmax_t)afs.fs_cstotal.cs_nifree, 
>   (intmax_t)afs.fs_cstotal.cs_nffree);
> - printf("bpg\t%d\tfpg\t%d\tipg\t%d\n",
> + printf("bpg\t%d\tfpg\t%d\tipg\t%u\n",
>   afs.fs_fpg / afs.fs_frag, afs.fs_fpg, afs.fs_ipg);
> - printf("nindir\t%d\tinopb\t%d\tmaxfilesize\t%ju\n",
> + printf("nindir\t%d\tinopb\t%u\tmaxfilesize\t%ju\n",
>   afs.fs_nindir, afs.fs_inopb, 
>   (uintmax_t)afs.fs_maxfilesize);
>   printf("sbsize\t%d\tcgsize\t%d\tcsaddr\t%jd\tcssize\t%d\n",
> @@ -238,10 +239,10 @@ dumpfs(int fd, const char *name)
>   printf("nbfree\t%d\tndir\t%d\tnifree\t%d\tnffree\t%d\n",
>   afs.fs_ffs1_cstotal.cs_nbfree, afs.fs_ffs1_cstotal.cs_ndir,
>   afs.fs_ffs1_cstotal.cs_nifree, 
> afs.fs_ffs1_cstotal.cs_nffree);
> - printf("cpg\t%d\tbpg\t%d\tfpg\t%d\tipg\t%d\n",
> + printf("cpg\t%d\tbpg\t%d\tfpg\t%d\tipg\t%u\n",
>   afs.fs_cpg, afs.fs_fpg / afs.fs_frag, afs.fs_fpg,
>   afs.fs_ipg);
> - printf("nindir\t%d\tinopb\t%d\tnspf\t%d\tmaxfilesize\t%ju\n",
> + printf("nindir\t%d\tinopb\t%u\tnspf\t%d\tmaxfilesize\t%ju\n",
>
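
The %d to %u changes in the quoted diff follow from the counters being treated as unsigned (the diff itself adds a u_int cg). A minimal sketch of the difference, assuming a 32-bit unsigned int; format_count is a hypothetical helper and not part of dumpfs:

```c
#include <stdio.h>

/*
 * Hypothetical helper, not taken from dumpfs: format an unsigned
 * counter with %u, the conversion the patched printf calls use.
 * With %d a value above INT_MAX would not be printed correctly.
 */
static int
format_count(char *buf, size_t len, unsigned int count)
{
	return snprintf(buf, len, "ncg\t%u", count);
}
```

Calling format_count(buf, sizeof(buf), 3000000000u) stores the full value in buf, which a signed conversion cannot represent.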

Re: sparc64 boot issue on qemu

2020-05-31 Thread Otto Moerbeek
On Sun, May 31, 2020 at 09:49:34AM +0100, Mark Cave-Ayland wrote:

> On 30/05/2020 10:54, Otto Moerbeek wrote:
> 
> > https://cdn.openbsd.org/pub/OpenBSD/snapshots/sparc64/
> > contains the unpatched miniroot.
> > 
> > https://www.drijf.net/openbsd/disk.qcow2
> > 
> > is the disk image based on the miniroot containing the patch in the
> > first post in this thread.
> > 
> > Thanks for looking into this.
> > 
> > Note that we did *not* observe boot failure on any real sparc64
> > hardware. The bootblock changes I did for the 6.7 release were tested
> > on many different machines.
> 
> Thanks for the test case which enables me to reproduce the issue. With
> ?fcode-verbose enabled you see this at the end of the FCode execution:
> 
> ...
> ...
> 5acf :  [ 0x8b7 ]
> 5ad0 : b(lit) [ 0x10 ]
> 5ad6 :  [ 0x81e ]
> 5ad7 : 0= [ 0x34 ]
> 5ad8 : swap [ 0x49 ]
> 5ad9 : drop [ 0x46 ]
> 5ada : b?branch [ 0x14 ]
>(offset) 5
> 5ade : (compile)  [ 0x8c8 ]
> 5adf : (compile) b(>resolve) [ 0xb2 ]
> OpenBSD IEEE 1275 Bootblock 2.0
> Booting from device /pci@1fe,0/pci@1,1/ide@3/ide@1/cdrom@0
> Try superblock read
> FFS v1
> ufs-open complete
> .Looking for ofwboot in directory...
> .
> ..
> ofwboot
> Found it
> .Loading 1a1c8  bytes of file...
> Copying 2000 bytes to 4000
> Copying 2000 bytes to 6000
> Copying 2000 bytes to 8000
> Copying 2000 bytes to a000
> Copying 2000 bytes to c000
> Copying 2000 bytes to e000
> Copying 2000 bytes to 1
> Copying 2000 bytes to 12000
> Copying 2000 bytes to 14000
> Copying 2000 bytes to 16000
> Copying 2000 bytes to 18000
> Copying 2000 bytes to 1a000
> Copying 2000 bytes to 1c000
> Copying 2000 bytes to 1e000
> 5ae0 : expect [ 0x8a ]
> 
> 
> Now that 0x8a is completely wrong since according to
> https://github.com/openbsd/src/blob/master/sys/arch/sparc64/stand/bootblk/bootblk.fth
> the last instruction should be exit which is 0x33.
> 
> Since the FCode itself is located at load-base (0x4000) it looks to me from
> the above debug that you're loading ofwboot at the same address, overwriting
> the FCode. Once do-boot has finished executing, the FCode interpreter returns
> to execute the exit word which has now been overwritten: so instead of
> returning to the updated client context via exit to execute ofwboot, it
> executes expect which asks for input from the keyboard and then crashes
> because the stack is incorrect.
> 
> My recommendation would be to load ofwboot at 0x6000 instead of 0x4000 which I
> believe will fix the issue. It's interesting you mention that this works on
> real hardware, since it doesn't agree with my reading of the IEEE-1275
> specification so you're certainly relying on some undocumented behaviour here.
> 
> 
> ATB,
> 
> Mark.

Thanks, the following works indeed. 

-Otto

Index: bootblk.fth
===
RCS file: /cvs/src/sys/arch/sparc64/stand/bootblk/bootblk.fth,v
retrieving revision 1.9
diff -u -p -r1.9 bootblk.fth
--- bootblk.fth 2 Apr 2020 06:06:22 -   1.9
+++ bootblk.fth 31 May 2020 13:17:25 -
@@ -716,7 +716,8 @@ create cur-blockno -1 l, -1 l,  \ Curren
 2drop
 ;
 
-" load-base " evaluate constant loader-base
+\ Do not overwrite bootblocks
+" load-base " evaluate 2000 + constant loader-base
 
 : load-file-signon ( load-file len boot-path len -- load-file len boot-path len )
." Loading file" space 2over type cr ." from device" space 2dup type cr

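The overlap Mark describes can be checked with a little arithmetic. The sketch below is only an illustration using numbers quoted in this thread (FCode image at 0x4000, "Loaded 6882 bytes", ofwboot of 0x1a1c8 bytes); the function and macro names are hypothetical and nothing here comes from the bootblock source.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values quoted in the thread; the macro names are made up for this sketch. */
#define FCODE_BASE	0x4000u		/* load-base, "entry point is 0x4000" */
#define FCODE_LEN	6882u		/* "Loaded 6882 bytes" (0x1ae2) */
#define OFWBOOT_LEN	0x1a1c8u	/* "Loading 1a1c8  bytes of file" */

/* Do the half-open ranges [a, a + alen) and [b, b + blen) intersect? */
static bool
ranges_overlap(uint32_t a, uint32_t alen, uint32_t b, uint32_t blen)
{
	return a < b + blen && b < a + alen;
}

/* Would loading ofwboot at loader_base overwrite the running FCode? */
static bool
load_clobbers_fcode(uint32_t loader_base)
{
	return ranges_overlap(FCODE_BASE, FCODE_LEN, loader_base, OFWBOOT_LEN);
}
```

With these numbers the FCode ends at 0x4000 + 0x1ae2 = 0x5ae2, which is consistent with the trace ending near 5ae0. load_clobbers_fcode(0x4000) is true, while load_clobbers_fcode(0x4000 + 0x2000) is false, because 0x5ae2 is below 0x6000.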


Re: sparc64 boot issue on qemu

2020-05-30 Thread Otto Moerbeek
On Sat, May 30, 2020 at 10:11:08AM +0100, Mark Cave-Ayland wrote:

> On 30/05/2020 10:03, Otto Moerbeek wrote:
> 
> > Hi,
> > 
> > thanks for the hints, but an unpatched 6.7 miniroot still fails to
> > boot for me
> > 
> > qemu-system-sparc64 -machine sun4u -m 1024 -drive \
> > file=miniroot67.img,format=raw -nographic -serial stdio -monitor none
> > 
> > OpenBIOS for Sparc64
> > Configuration device id QEMU version 1 machine id 0
> > kernel cmdline 
> > CPUs: 1 x SUNW,UltraSPARC-IIi
> > UUID: ----
> > Welcome to OpenBIOS v1.1 built on Oct 28 2019 17:08
> >   Type 'help' for detailed information
> > Trying disk:a...
> > Not a bootable ELF image
> > Not a bootable a.out image
> > 
> > Loading FCode image...
> > Loaded 6882 bytes
> > entry point is 0x4000
> > Evaluating FCode...
> > OpenBSD IEEE 1275 Bootblock 2.0
> > ..
> > 
> > And then hangs
> > 
> > While the patched bootblocks do boot (but hang later after
> > 
> > scsibus1 at softraid0: 256 targets
> > 
> > 
> > as before).
> > 
> > -Otto
> 
> Hmmm odd. Is it possible for you to upload your miniroot somewhere for me to
> take a quick look? I don't have a great deal of time right now, but I can run
> it through a debugger to see if anything obvious shows up.
> 
> 
> ATB,
> 
> Mark.

https://cdn.openbsd.org/pub/OpenBSD/snapshots/sparc64/
contains the unpatched miniroot.

https://www.drijf.net/openbsd/disk.qcow2

is the disk image based on the miniroot containing the patch in the
first post in this thread.

Thanks for looking into this.

Note that we did *not* observe boot failure on any real sparc64
hardware. The bootblock changes I did for the 6.7 release were tested
on many different machines.

-Otto




Re: sparc64 boot issue on qemu

2020-05-30 Thread Otto Moerbeek
On Sat, May 30, 2020 at 09:29:36AM +0100, Mark Cave-Ayland wrote:

> On 29/05/2020 23:56, Jason A. Donenfeld wrote:
> 
> > Oh that's a nice observation about `boot disk -V`. Doing so actually
> > got me booting up entirely:
> > 
> > $ qemu-img convert -O qcow2 miniroot66.fs disk.qcow2
> > $ qemu-img resize disk.qcow2 20G
> > $ qemu-system-sparc64 -m 1024 -drive file=disk.qcow2,if=ide -net
> > nic,model=ne2k_pci -net user -boot a -nographic -monitor none -serial
> > stdio
> 
> I think the problem here is that you're asking OpenBIOS to boot from the
> (empty) floppy disk with "-boot a" rather than the qcow2 image which is
> normally attached to the first hard disk "-boot c". As this is the default,
> then I would expect the command line above to work if you simply drop
> "-boot a".
> 
> Also is there a particular reason for using the ne2k_pci NIC instead of the
> default in-built sunhme device? I try and keep the documentation at
> https://wiki.qemu.org/Documentation/Platforms/SPARC as accurate as I can, so
> do look there for latest best practices and command line examples.
> 
> Finally the version of qemu-system-sparc64 you are running can also boot from
> a virtio-blk-pci device (again see the above wiki page for details) if you are
> looking for the best emulated disk performance.
> 
> 
> ATB,
> 
> Mark.

Hi,

thanks for the hints, but an unpatched 6.7 miniroot still fails to
boot for me

qemu-system-sparc64 -machine sun4u -m 1024 -drive \
file=miniroot67.img,format=raw -nographic -serial stdio -monitor none

OpenBIOS for Sparc64
Configuration device id QEMU version 1 machine id 0
kernel cmdline 
CPUs: 1 x SUNW,UltraSPARC-IIi
UUID: ----
Welcome to OpenBIOS v1.1 built on Oct 28 2019 17:08
  Type 'help' for detailed information
Trying disk:a...
Not a bootable ELF image
Not a bootable a.out image

Loading FCode image...
Loaded 6882 bytes
entry point is 0x4000
Evaluating FCode...
OpenBSD IEEE 1275 Bootblock 2.0
..

And then hangs

While the patched bootblocks do boot (but hang later after

scsibus1 at softraid0: 256 targets


as before).

-Otto



sparc64 boot issue on qemu

2020-05-29 Thread Otto Moerbeek
On Thu, May 28, 2020 at 10:11:28AM +0200, Otto Moerbeek wrote:

> On Thu, May 28, 2020 at 01:21:21AM -0600, Jason A. Donenfeld wrote:
> 
> > On Thu, May 28, 2020 at 1:19 AM Otto Moerbeek  wrote:
> > > Of course.., I was running it from a !wxallowed mount. BTW, qemu is in
> > > packages, no need to build it yourself.
> > 
> > Sure, but now I've been somewhat nerd sniped and am playing with this
> > fcode forth implementation in qemu :-P. I wonder if there's something
> > missing in the 64-bit extensions to IEEE 1275, in table.fs...
> 
> OK, can reproduce. I'll see if I can find out something.
> 
>   -Otto
> 

After running the bootblocks in debug mode (using boot disk -V) and
seeing ofwboot was found and loaded, I added some debug code to the
bootblocks and now it correctly starts ofwboot on qemu:


Trying disk:a...
Not a bootable ELF image
Not a bootable a.out image

Loading FCode image...
Loaded 6936 bytes
entry point is 0x4000
Evaluating FCode...
OpenBSD IEEE 1275 Bootblock 2.0
..free mem
close boot dev
start loaded program
>> OpenBSD BOOT 1.17
Trying bsd...
open /etc/random.seed: No such file or directory
Booting /pci@1fe,0/pci@1,1/ide@3/ide@0/disk@0:a/bsd
4225784@0x100+1288@0x1407af8+3249436@0x1c0+944868@0x1f1951c 
symbols @ 0xfef50340 139 start=0x100
console is /pci@1fe,0/pci@1,1/ebus@1/su
Copyright (c) 1982, 1986, 1989, 1991, 1993
The Regents of the University of California.  All rights reserved.
Copyright (c) 1995-2020 OpenBSD. All rights reserved.  https://www.OpenBSD.org
real mem = 2147483648 (2048MB)
avail mem = 2099232768 (2001MB)
random: boothowto does not indicate good seed
mainbus0 at root: OpenBiosTeam,OpenBIOS
cpu0 at mainbus0: SUNW,UltraSPARC-IIi (rev 9.1) @ 100 MHz
cpu0: physical 256K instruction (64 b/l), 16K data (32 b/l), 256K external (64 b/l)
psycho0 at mainbus0: SUNW,sabre, impl 0, version 0, ign 7c0
psycho0: bus range 0-2, PCI bus 0
psycho0: dvma map c000-dfff
pci0 at psycho0
ppb0 at pci0 dev 1 function 1 "Sun Simba" rev 0x11
pci1 at ppb0 bus 1
ebus0 at pci1 dev 1 function 0 "Sun PCIO EBus2" rev 0x01
clock1 at ebus0 addr 2000-3fff: mk48t59
"power" at ebus0 addr 7240-7243 ivec 0x1 not configured
"fdthree" at ebus0 addr 0- not configured
com0 at ebus0 addr 3f8-3ff ivec 0x2b: ns16550a, 16 byte fifo
com0: console
pckbc0 at ebus0 addr 60-67 ivec 0x29
pckbd0 at pckbc0 (kbd slot)
wskbd0 at pckbd0
"Bochs VGA" rev 0x02 at pci1 dev 2 function 0 not configured
pciide0 at pci1 dev 3 function 0 "CMD Technology PCI0646" rev 0x07: DMA, channel 0 configured to native-PCI, channel 1 configured to native-PCI
pciide0: using ivec 0x7e0 for native-PCI interrupt
wd0 at pciide0 channel 0 drive 0: 
wd0: 16-sector PIO, LBA48, 3MB, 6400 sectors
wd0(pciide0:0:0): using PIO mode 4, Ultra-DMA mode 2
atapiscsi0 at pciide0 channel 1 drive 0
scsibus0 at atapiscsi0: 2 targets
cd0 at scsibus0 targ 0 lun 0:  removable
cd0(pciide0:1:0): using PIO mode 4, Ultra-DMA mode 2
ppb1 at pci0 dev 1 function 0 "Sun Simba" rev 0x11
pci2 at ppb1 bus 2
ne0 at pci2 dev 0 function 0 "Realtek 8029" rev 0x00: ivec 0x7d0, address 52:54:00:12:34:56
softraid0 at root
scsibus1 at softraid0: 256 targets

It hangs at this point, but that's clearly another issue.

Puzzled...

-Otto

Index: bootblk.fth
===
RCS file: /cvs/src/sys/arch/sparc64/stand/bootblk/bootblk.fth,v
retrieving revision 1.9
diff -u -p -r1.9 bootblk.fth
--- bootblk.fth 2 Apr 2020 06:06:22 -   1.9
+++ bootblk.fth 29 May 2020 11:53:36 -
@@ -850,16 +850,22 @@ create cur-blockno -1 l, -1 l,\ Curren
   " /ofwboot" load-file( -- load-base )
then
 
+   ." free mem" cr
+
\ Free memory for reading disk blocks
cur-block 0<> if
   dev-block dev-blocksize free-mem
then
 
+   ." close boot dev" cr
+
\ Close boot device
boot-ihandle dup -1 <> if
   cif-close -1 to boot-ihandle 
then

+   ." start loaded program" cr
+
dup 0<> if " to load-base init-program" evaluate then
 ;
 



Re: filesystem code integer and many inodes

2020-05-29 Thread Otto Moerbeek
On Fri, May 29, 2020 at 09:30:04AM +0200, Otto Moerbeek wrote:

> On Thu, May 28, 2020 at 12:54:41PM -0600, Todd C. Miller wrote:
> 
> > On Thu, 28 May 2020 20:53:07 +0200, Otto Moerbeek wrote:
> > 
> > > Here's the separate diff for the prefcg loops. From FreeBSD.
> > 
> > OK millert@
> > 
> >  - todd
> > 
> 
> And here's the updated diff against -current. I removed a redundant
> cast in a fs_ipg * fs_ncg multiplication in fsck_ffs. Both are
> u_int32 and we know the product is <= UINT_MAX, so we do not need to
> cast.
> 
> I would like to make some progress here, I have a followup diff to
> speed up Phase 5 of fsck_ffs...

This last line was directed at other tech@ subscribers and not so much
at millert@. Please review and/or test. Thanks!

-Otto
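
The reasoning about the dropped cast can be illustrated as follows. This is only a sketch, assuming 32-bit unsigned arithmetic; the function names are hypothetical and the code is not taken from fsck_ffs.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical check: widen to 64 bits to see whether ipg * ncg
 * fits in 32 bits at all.
 */
static bool
inode_count_fits(uint32_t ipg, uint32_t ncg)
{
	return (uint64_t)ipg * ncg <= UINT32_MAX;
}

/*
 * Once the bound is known to hold, the plain 32-bit product is exact
 * and no cast is needed. Without the bound it wraps modulo 2^32.
 */
static uint32_t
inode_count(uint32_t ipg, uint32_t ncg)
{
	return ipg * ncg;
}
```

For example, 65536 * 65535 still fits and is computed exactly, while 65536 * 65536 does not fit and would wrap to 0, which is the case the known bound rules out.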


