Re: uvm_fault: Kill goto Case2

2020-11-13 Thread Theo Buehler
On Fri, Nov 13, 2020 at 12:04:23PM -0300, Martin Pieuchot wrote:
> Another simple refactoring of uvm_fault() removing a goto, ok?

ok tb



Re: uvm_pagealloc() & uvm.free accounting

2020-11-13 Thread Mark Kettenis
> Date: Fri, 13 Nov 2020 11:38:33 -0300
> From: Martin Pieuchot 
> 
> The uvmexp.free* variables are read in uvm_pagealloc() & uvm_pglistalloc()
> before grabbing the `fpageqlock'.
> 
> `uvmexp.free' is always modified by the pmemrange allocator under the
> above mentioned lock.  To avoid races and the chance of missing a wakeup,
> the diff below moves the checks inside the critical section.
> 
> Note that this doesn't solve the race with reading `freemin', `freetarg',
> `inactive' and `inactarg'.  These are currently updated under another
> lock and will be addressed in an upcoming diff.

Well, the bit of code that wakes up the pagedaemon:

/*
 * check to see if we need to generate some free pages waking
 * the pagedaemon.
 */
if ((uvmexp.free - BUFPAGES_DEFICIT) < uvmexp.freemin ||
((uvmexp.free - BUFPAGES_DEFICIT) < uvmexp.freetarg &&
(uvmexp.inactive + BUFPAGES_INACT) < uvmexp.inactarg))
wakeup();

is actually fine.  All this really does is kick the pagedaemon if
we don't meet the targets of free/inactive pages, but we have some
leeway here and if we narrowly miss kicking the pagedaemon this time
around, we will kick it the next time.

The bit where we check the actual limits may indeed be important.  On
top of that your approach avoids duplicating the checks, so that's
good.

> To fix this race I introduced a new UVM_PLA_USERESERVE.  Note that those
> flags are now for uvm_pmr_getpages(), so reflect that in the comment.

That's wrong.  The existing flags remain flags for uvm_pglistalloc()
and that is the interface that the rest of the kernel calls.  What do
you think PLA stands for?

> Careful reviewers will spot an off-by-one change in the check for 
> pagedaemon's reserved memory.  My understanding is that it's a bugfix,
> is that correct?

You mean for uvm_pagealloc().  I'd say yes.  But this does mean that
in some sense the pagedaemon reserve grows by a page at the cost of
the kernel reserve.  So I think it would be a good idea to bump the
kernel reserve a bit.  Maybe to 8 pages?

> I also started to document the locking for some of the `uvmexp' fields.
> I'm well aware that reading `uvmexp.free' is done without the correct
> lock in the pagedaemon, this will be addressed soon.  The other accesses
> shouldn't matter much as they are not used to make decisions.
> 
> This should hopefully be good enough to make uvm_pagealloc() completely
> mp-safe, something that matters much because it's called from pmap_enter(9)
> on some architectures.
> 
> ok?

Some additional comments below...

> Index: uvm/uvm_extern.h
> ===
> RCS file: /cvs/src/sys/uvm/uvm_extern.h,v
> retrieving revision 1.154
> diff -u -p -r1.154 uvm_extern.h
> --- uvm/uvm_extern.h  19 Oct 2020 08:19:46 -  1.154
> +++ uvm/uvm_extern.h  13 Nov 2020 14:05:08 -
> @@ -142,14 +142,15 @@ typedef int vm_prot_t;
>  #define  UVM_PGA_ZERO0x0002  /* returned page must be zeroed 
> */
>  
>  /*
> - * flags for uvm_pglistalloc()
> + * flags for uvm_pmr_getpages()

See my comments above.  Maybe add "also used by uvm_pmr_getpages()".

>   */
>  #define UVM_PLA_WAITOK   0x0001  /* may sleep */
>  #define UVM_PLA_NOWAIT   0x0002  /* can't sleep (need one of the 
> two) */
>  #define UVM_PLA_ZERO 0x0004  /* zero all pages before returning */
>  #define UVM_PLA_TRYCONTIG0x0008  /* try to allocate contig physmem */
>  #define UVM_PLA_FAILOK   0x0010  /* caller can handle failure */
> -#define UVM_PLA_NOWAKE   0x0020  /* don't wake the page daemon 
> on failure */
> +#define UVM_PLA_NOWAKE   0x0020  /* don't wake page daemon on 
> failure */
> +#define UVM_PLA_USERESERVE   0x0040  /* can allocate from kernel reserve */
>  
>  /*
>   * lockflags that control the locking behavior of various functions.
> Index: uvm/uvm_page.c
> ===
> RCS file: /cvs/src/sys/uvm/uvm_page.c,v
> retrieving revision 1.150
> diff -u -p -r1.150 uvm_page.c
> --- uvm/uvm_page.c22 Sep 2020 14:31:08 -  1.150
> +++ uvm/uvm_page.c13 Nov 2020 13:52:25 -
> @@ -733,32 +733,11 @@ uvm_pglistalloc(psize_t size, paddr_t lo
>   size = atop(round_page(size));
>  
>   /*
> -  * check to see if we need to generate some free pages waking
> -  * the pagedaemon.
> -  */
> - if ((uvmexp.free - BUFPAGES_DEFICIT) < uvmexp.freemin ||
> - ((uvmexp.free - BUFPAGES_DEFICIT) < uvmexp.freetarg &&
> - (uvmexp.inactive + BUFPAGES_INACT) < uvmexp.inactarg))
> - wakeup();
> -
> - /*
>* XXX uvm_pglistalloc is currently only used for kernel
>* objects. Unlike the checks in uvm_pagealloc, below, here
> -  * we are always allowed to use the kernel reserve. However, we
> -  * have to enforce the pagedaemon 

unwind(8): handle large answers differently

2020-11-13 Thread Florian Obser
The recent fix for handling large (about 16k) answers in unwind has
the downside that we are now always copying at least 16k per answer
between the resolver and frontend process.
That seems wasteful.

This re-arranges things to only copy what is needed.

Tests, OKs?

diff --git frontend.c frontend.c
index 8adbb07219b..74715087c5f 100644
--- frontend.c
+++ frontend.c
@@ -86,7 +86,8 @@ struct pending_query {
int  fd;
int  bogus;
int  rcode_override;
-   ssize_t  answer_len;
+   int  answer_len;
+   int  received;
uint8_t *answer;
 };
 
@@ -424,10 +425,7 @@ frontend_dispatch_resolver(int fd, short event, void *bula)
struct imsgev   *iev = bula;
struct imsgbuf  *ibuf = &iev->ibuf;
struct imsg  imsg;
-   struct query_imsg   *query_imsg;
-   struct answer_imsg  *answer_imsg;
int  n, shut = 0, chg;
-   uint8_t *p;
 
if (event & EV_READ) {
if ((n = imsg_read(ibuf)) == -1 && errno != EAGAIN)
@@ -449,45 +447,30 @@ frontend_dispatch_resolver(int fd, short event, void 
*bula)
break;
 
switch (imsg.hdr.type) {
-   case IMSG_ANSWER_HEADER:
-   if (IMSG_DATA_SIZE(imsg) != sizeof(*query_imsg))
-   fatalx("%s: IMSG_ANSWER_HEADER wrong length: "
-   "%lu", __func__, IMSG_DATA_SIZE(imsg));
-   query_imsg = (struct query_imsg *)imsg.data;
-   if ((pq = find_pending_query(query_imsg->id)) ==
-   NULL) {
-   log_warnx("cannot find pending query %llu",
-   query_imsg->id);
-   break;
-   }
-   if (query_imsg->err) {
-   send_answer(pq);
-   pq = NULL;
-   break;
-   }
-   pq->bogus = query_imsg->bogus;
-   break;
-   case IMSG_ANSWER:
-   if (IMSG_DATA_SIZE(imsg) != sizeof(*answer_imsg))
+   case IMSG_ANSWER: {
+   struct answer_header*answer_header;
+   int  data_len;
+   uint8_t *data;
+
+   if (IMSG_DATA_SIZE(imsg) < sizeof(*answer_header))
fatalx("%s: IMSG_ANSWER wrong length: "
"%lu", __func__, IMSG_DATA_SIZE(imsg));
-   answer_imsg = (struct answer_imsg *)imsg.data;
-   if ((pq = find_pending_query(answer_imsg->id)) ==
+   answer_header = (struct answer_header *)imsg.data;
+   data = (uint8_t *)imsg.data + sizeof(*answer_header);
+   if (answer_header->answer_len > 65535)
+   fatalx("%s: IMSG_ANSWER answer too big: %d",
+   __func__, answer_header->answer_len);
+   data_len = IMSG_DATA_SIZE(imsg) -
+   sizeof(*answer_header);
+
+   if ((pq = find_pending_query(answer_header->id)) ==
NULL) {
-   log_warnx("cannot find pending query %llu",
-   answer_imsg->id);
+   log_warnx("%s: cannot find pending query %llu",
+   __func__, answer_header->id);
break;
}
 
-   p = realloc(pq->answer, pq->answer_len +
-   answer_imsg->len);
-
-   if (p != NULL) {
-   pq->answer = p;
-   memcpy(pq->answer + pq->answer_len,
-   answer_imsg->answer, answer_imsg->len);
-   pq->answer_len += answer_imsg->len;
-   } else {
+   if (answer_header->srvfail) {
free(pq->answer);
pq->answer_len = 0;
pq->answer = NULL;
@@ -495,9 +478,32 @@ frontend_dispatch_resolver(int fd, short event, void *bula)
send_answer(pq);
break;
}
-   if (!answer_imsg->truncated)
+
+   if (pq->answer == NULL) {
+ 

uvm_fault: Kill goto Case2

2020-11-13 Thread Martin Pieuchot
Another simple refactoring of uvm_fault() removing a goto, ok?

Index: uvm/uvm_fault.c
===
RCS file: /cvs/src/sys/uvm/uvm_fault.c,v
retrieving revision 1.106
diff -u -p -r1.106 uvm_fault.c
--- uvm/uvm_fault.c 13 Nov 2020 14:18:25 -  1.106
+++ uvm/uvm_fault.c 13 Nov 2020 15:01:41 -
@@ -942,12 +942,24 @@ ReFault:
return error;
}
 
-   amap = ufi.entry->aref.ar_amap;
-   uobj = ufi.entry->object.uvm_obj;
-
/* (shadowed == TRUE) if there is an anon at the faulting address */
shadowed = uvm_fault_upper_lookup(, , anons, pages);
 
+   /* handle case 1: fault on an anon in our amap */
+   if (shadowed == TRUE) {
+   error = uvm_fault_upper(, , anons, fault_type,
+   access_type);
+   switch (error) {
+   case ERESTART:
+   goto ReFault;
+   default:
+   return error;
+   }
+   }
+
+   amap = ufi.entry->aref.ar_amap;
+   uobj = ufi.entry->object.uvm_obj;
+
/*
 * if the desired page is not shadowed by the amap and we have a
 * backing object, then we check to see if the backing object would
@@ -1055,30 +1067,12 @@ ReFault:
/*
 * note that at this point we are done with any front or back pages.
 * we are now going to focus on the center page (i.e. the one we've
-* faulted on).  if we have faulted on the top (anon) layer
-* [i.e. case 1], then the anon we want is anons[centeridx] (we have
-* not touched it yet).  if we have faulted on the bottom (uobj)
+* faulted on).  if we have faulted on the bottom (uobj)
 * layer [i.e. case 2] and the page was both present and available,
 * then we've got a pointer to it as "uobjpage" and we've already
 * made it BUSY.
 */
-   /*
-* there are four possible cases we must address: 1A, 1B, 2A, and 2B
-*/
-   /* redirect case 2: if we are not shadowed, go to case 2. */
-   if (shadowed == FALSE)
-   goto Case2;
-
-   /* handle case 1: fault on an anon in our amap */
-   error = uvm_fault_upper(, , anons, fault_type, access_type);
-   switch (error) {
-   case ERESTART:
-   goto ReFault;
-   default:
-   return error;
-   }
 
-Case2:
/* handle case 2: faulting on backing object or zero fill */
/*
 * note that uobjpage can not be PGO_DONTCARE at this point.  we now



uvm_pagealloc() & uvm.free accounting

2020-11-13 Thread Martin Pieuchot
The uvmexp.free* variables are read in uvm_pagealloc() & uvm_pglistalloc()
before grabbing the `fpageqlock'.

`uvmexp.free' is always modified by the pmemrange allocator under the
above mentioned lock.  To avoid races and the chance of missing a wakeup,
the diff below moves the checks inside the critical section.

Note that this doesn't solve the race with reading `freemin', `freetarg',
`inactive' and `inactarg'.  These are currently updated under another
lock and will be addressed in an upcoming diff.

To fix this race I introduced a new UVM_PLA_USERESERVE.  Note that those
flags are now for uvm_pmr_getpages(), so reflect that in the comment.

Careful reviewers will spot an off-by-one change in the check for 
pagedaemon's reserved memory.  My understanding is that it's a bugfix,
is that correct?

I also started to document the locking for some of the `uvmexp' fields.
I'm well aware that reading `uvmexp.free' is done without the correct
lock in the pagedaemon, this will be addressed soon.  The other accesses
shouldn't matter much as they are not used to make decisions.

This should hopefully be good enough to make uvm_pagealloc() completely
mp-safe, something that matters much because it's called from pmap_enter(9)
on some architectures.

ok?

Index: uvm/uvm_extern.h
===
RCS file: /cvs/src/sys/uvm/uvm_extern.h,v
retrieving revision 1.154
diff -u -p -r1.154 uvm_extern.h
--- uvm/uvm_extern.h19 Oct 2020 08:19:46 -  1.154
+++ uvm/uvm_extern.h13 Nov 2020 14:05:08 -
@@ -142,14 +142,15 @@ typedef int   vm_prot_t;
 #defineUVM_PGA_ZERO0x0002  /* returned page must be zeroed 
*/
 
 /*
- * flags for uvm_pglistalloc()
+ * flags for uvm_pmr_getpages()
  */
 #define UVM_PLA_WAITOK 0x0001  /* may sleep */
 #define UVM_PLA_NOWAIT 0x0002  /* can't sleep (need one of the two) */
 #define UVM_PLA_ZERO   0x0004  /* zero all pages before returning */
 #define UVM_PLA_TRYCONTIG  0x0008  /* try to allocate contig physmem */
 #define UVM_PLA_FAILOK 0x0010  /* caller can handle failure */
-#define UVM_PLA_NOWAKE 0x0020  /* don't wake the page daemon on 
failure */
+#define UVM_PLA_NOWAKE 0x0020  /* don't wake page daemon on failure */
+#define UVM_PLA_USERESERVE 0x0040  /* can allocate from kernel reserve */
 
 /*
  * lockflags that control the locking behavior of various functions.
Index: uvm/uvm_page.c
===
RCS file: /cvs/src/sys/uvm/uvm_page.c,v
retrieving revision 1.150
diff -u -p -r1.150 uvm_page.c
--- uvm/uvm_page.c  22 Sep 2020 14:31:08 -  1.150
+++ uvm/uvm_page.c  13 Nov 2020 13:52:25 -
@@ -733,32 +733,11 @@ uvm_pglistalloc(psize_t size, paddr_t lo
size = atop(round_page(size));
 
/*
-* check to see if we need to generate some free pages waking
-* the pagedaemon.
-*/
-   if ((uvmexp.free - BUFPAGES_DEFICIT) < uvmexp.freemin ||
-   ((uvmexp.free - BUFPAGES_DEFICIT) < uvmexp.freetarg &&
-   (uvmexp.inactive + BUFPAGES_INACT) < uvmexp.inactarg))
-   wakeup();
-
-   /*
 * XXX uvm_pglistalloc is currently only used for kernel
 * objects. Unlike the checks in uvm_pagealloc, below, here
-* we are always allowed to use the kernel reserve. However, we
-* have to enforce the pagedaemon reserve here or allocations
-* via this path could consume everything and we can't
-* recover in the page daemon.
+* we are always allowed to use the kernel reserve.
 */
- again:
-   if ((uvmexp.free <= uvmexp.reserve_pagedaemon + size &&
-   !((curproc == uvm.pagedaemon_proc) ||
-   (curproc == syncerproc)))) {
-   if (flags & UVM_PLA_WAITOK) {
-   uvm_wait("uvm_pglistalloc");
-   goto again;
-   }
-   return (ENOMEM);
-   }
+   flags |= UVM_PLA_USERESERVE;
 
if ((high & PAGE_MASK) != PAGE_MASK) {
printf("uvm_pglistalloc: Upper boundary 0x%lx "
@@ -871,7 +850,7 @@ uvm_pagerealloc_multi(struct uvm_object 
 }
 
 /*
- * uvm_pagealloc_strat: allocate vm_page from a particular free list.
+ * uvm_pagealloc: allocate vm_page from a particular free list.
  *
  * => return null if no pages free
  * => wake up pagedaemon if number of free pages drops below low water mark
@@ -886,37 +865,16 @@ uvm_pagealloc(struct uvm_object *obj, vo
struct vm_page *pg;
struct pglist pgl;
int pmr_flags;
-   boolean_t use_reserve;
 
KASSERT(obj == NULL || anon == NULL);
+   KASSERT(anon == NULL || off == 0);
KASSERT(off == trunc_page(off));
 
-   /*
-* check to see if we need to generate some free pages waking
-* the pagedaemon.
-*/
-   if ((uvmexp.free - BUFPAGES_DEFICIT) < uvmexp.freemin 

Re: uvm_fault: is there an anon?

2020-11-13 Thread Mark Kettenis
> Date: Fri, 13 Nov 2020 08:19:03 -0300
> From: Martin Pieuchot 
> 
> On 04/11/20(Wed) 11:04, Martin Pieuchot wrote:
> > Diff below introduces a helper that looks for existing mapping.  The
> > value returned by this lookup function determines if there's an anon
> > at the faulting address which tells us if we're dealing with a fault
> > of type 1 or 2.
> > 
> > This small refactoring is part of the current work to separate the code
> > handling faults of type 1 and 2.  The end goal being to move the type 1
> > faults handling out of the KERNEL_LOCK().
> > 
> > The function name is taken from NetBSD to not introduce more difference
> > than there's already.
> 
> New diff now that the other one has been committed, ok?

ok kettenis@

> Index: uvm/uvm_fault.c
> ===
> RCS file: /cvs/src/sys/uvm/uvm_fault.c,v
> retrieving revision 1.105
> diff -u -p -r1.105 uvm_fault.c
> --- uvm/uvm_fault.c   13 Nov 2020 11:16:08 -  1.105
> +++ uvm/uvm_fault.c   13 Nov 2020 11:17:52 -
> @@ -801,6 +801,84 @@ uvm_fault_upper(struct uvm_faultinfo *uf
>   return 0;
>  }
>  
> +
> +/*
> + * uvm_fault_upper_lookup: look up existing h/w mapping and amap.
> + *
> + * iterate range of interest:
> + *  1. check if h/w mapping exists.  if yes, we don't care
> + *  2. check if anon exists.  if not, page is lower.
> + *  3. if anon exists, enter h/w mapping for neighbors.
> + */
> +boolean_t
> +uvm_fault_upper_lookup(struct uvm_faultinfo *ufi,
> +const struct uvm_faultctx *flt, struct vm_anon **anons,
> +struct vm_page **pages)
> +{
> + struct vm_amap *amap = ufi->entry->aref.ar_amap;
> + struct vm_anon *anon;
> + boolean_t shadowed;
> + vaddr_t currva;
> + paddr_t pa;
> + int lcv;
> +
> + /*
> +  * map in the backpages and frontpages we found in the amap in hopes
> +  * of preventing future faults.  we also init the pages[] array as
> +  * we go.
> +  */
> + currva = flt->startva;
> + shadowed = FALSE;
> + for (lcv = 0 ; lcv < flt->npages ; lcv++, currva += PAGE_SIZE) {
> + /*
> +  * dont play with VAs that are already mapped
> +  * (except for center)
> +  */
> + if (lcv != flt->centeridx &&
> + pmap_extract(ufi->orig_map->pmap, currva, )) {
> + pages[lcv] = PGO_DONTCARE;
> + continue;
> + }
> +
> + /* unmapped or center page. check if any anon at this level. */
> + if (amap == NULL || anons[lcv] == NULL) {
> + pages[lcv] = NULL;
> + continue;
> + }
> +
> + /* check for present page and map if possible. re-activate it */
> + pages[lcv] = PGO_DONTCARE;
> + if (lcv == flt->centeridx) {/* save center for later! */
> + shadowed = TRUE;
> + continue;
> + }
> + anon = anons[lcv];
> + if (anon->an_page &&
> + (anon->an_page->pg_flags & (PG_RELEASED|PG_BUSY)) == 0) {
> + uvm_lock_pageq();
> + uvm_pageactivate(anon->an_page);/* reactivate */
> + uvm_unlock_pageq();
> + uvmexp.fltnamap++;
> +
> + /*
> +  * Since this isn't the page that's actually faulting,
> +  * ignore pmap_enter() failures; it's not critical
> +  * that we enter these right now.
> +  */
> + (void) pmap_enter(ufi->orig_map->pmap, currva,
> + VM_PAGE_TO_PHYS(anon->an_page) | flt->pa_flags,
> + (anon->an_ref > 1) ?
> + (flt->enter_prot & ~PROT_WRITE) : flt->enter_prot,
> + PMAP_CANFAIL |
> +  (VM_MAPENT_ISWIRED(ufi->entry) ? PMAP_WIRED : 0));
> + }
> + }
> + if (flt->npages > 1)
> + pmap_update(ufi->orig_map->pmap);
> +
> + return shadowed;
> +}
> +
>  /*
>   *   F A U L T   -   m a i n   e n t r y   p o i n t
>   */
> @@ -827,7 +905,6 @@ uvm_fault(vm_map_t orig_map, vaddr_t vad
>   int result, lcv, gotpages;
>   vaddr_t currva;
>   voff_t uoff;
> - paddr_t pa;
>   struct vm_amap *amap;
>   struct uvm_object *uobj;
>   struct vm_anon *anons_store[UVM_MAXRANGE], **anons, *anon;
> @@ -868,61 +945,9 @@ ReFault:
>   amap = ufi.entry->aref.ar_amap;
>   uobj = ufi.entry->object.uvm_obj;
>  
> - /*
> -  * map in the backpages and frontpages we found in the amap in hopes
> -  * of preventing future faults.  we also init the pages[] array as
> -  * we go.
> -  */
> - currva = flt.startva;
> - shadowed = FALSE;
> - for (lcv = 0 ; lcv < flt.npages ; lcv++, currva += PAGE_SIZE) {
> - 

Re: Strange snmpd issue with 6.7

2020-11-13 Thread Stuart Henderson
Moving this to tech@.  Not an answer, but here's some more information.
misc thread archived at https://marc.info/?t=16046954722=1=2
but I think the important bits are all in quotes in this mail anyway.

On 2020/11/12 20:35, Winfred Harrelson wrote:
> On Thu, Nov 12, 2020 at 09:51:46AM -, Stuart Henderson wrote:
> > On 2020-11-09, Winfred Harrelson  wrote:
> > > On Sat, Nov 07, 2020 at 01:53:00PM -, Stuart Henderson wrote:
> > >> On 2020-11-06, Winfred Harrelson  wrote:
> > >> > I am running OpenBSD 6.7 and am having a strange issue with snmpd(8).
> > >> >
> > >> > The issue is that it doesn't have all the arp entries but this was
> > >> > working before.  I don't know exactly when this started happening
> > >> > but I just noticed today.
> > >> >
> > >> > Here is the machine in question and what I get:
> > >> >
> > >> > wharrels@styx1:/home/wharrels$ uname -a
> > >> > OpenBSD styx1 6.7 GENERIC.MP#3 amd64
> > >> >
> > >> > wharrels@styx1:/home/wharrels$ arp -a | wc -l
> > >> >  985
> > >> >
> > >> > Box is acting as a firewall so that is normal.  Actually normal to
> > >> > have many more than that.  But if I do a query from another machine
> > >> > via snmpwalk I get a completely different number of machines in
> > >> > the arp table:
> > >> >
> > >> > [wharrels@newtron ~]$ snmpwalk -v2c -c public styx1 
> > >> > ip.ipNetToMediaTable.ipNetToMediaEntry.ipNetToMediaPhysAddress | wc -l
> > >> > 456
> > >> >
> > >> > Not even close to the same number of machines.  The above OID 
> > >> > translates to
> > >> > 1.3.6.1.2.1.4.22.1.2 if you want to try this and see what you get.
> > >> >
> > >> > Should I be using a different OID to get the arp table or is there
> > >> > another way to do this?  It might be that this was not working quite
> > >> > right before but I don't remember it being off like this.
> > >> >
> > >> > Any help would be appreciated, thanks.
> > >> >
> > >> > Winfred
> > >> >
> > >> >
> > >> 
> > >> If you have set "filter-routes yes" then this is expected as it will
> > >> stop snmpd from seeing route updates and thus new additions to the
> > >> ARP table.
> > >
> > > I do not have that in my config file.  Man page says the default is "no"
> > > so this should not be it correct?  I will try adding the line with a
> > > "no" just to see if that changes anything though.
> > 
> > Correct.
> > 
> > >> If you have not then I'd say this is a bug and best reported to
> > >> bugs@ rather than misc@.
> > >
> > > I am running 6.7 on this box so I may wait until I can get it updated
> > > to 6.8 before reporting to bugs@.
> > 
> > Worth doing though I think 6.8 is unlikely to help.
> > 
> > Does restarting snmpd result in picking up the full arp table again?
> 
> Yes initially.  Unfortunately, after a while they get out of sync
> again.  It gets more out of sync the longer it runs, though not quickly.
> 
> I have another box running 6.4 that is also having the same issue.
> I am not going to bother complaining about that one because it is
> too old and in more of a need to be updated.  Am hoping to update
> both over the coming holidays.

I see what looks like the same issue with -current too.  I usually have
filter-routes on most firewalls/routers to reduce cpu load (especially
if running dynamic routing protocols) so I have to ignore results
from those machines because they will definitely miss any changes
made after the initial load[1].

After hunting around I found some other machines that don't use
filter-routes and do have a mismatch between arp -an and
ipNetToMediaTable entries (I suppose the simplest query to find
these is a plain snmpwalk on ipNetToMediaPhysAddress). I only
usually see either the same number or one fewer address in
ipNetToMediaTable than arp -an.

I wanted to see if I could spot anything in snmpd debug output
(snmpd -dv) when this happened so I ran this in one terminal

$ while true; do echo `snmp walk -v 2c -c public 82.68.199.132 
ipNetToMediaPhysAddress|grep -c [H]ex-STRING` `arp -an | grep -c :`; sleep .5; 
done

to show up discrepancies while running "arp -da" in another terminal
several times.

The debug output was pretty noisy so I started again and grepped out
the query parsing lines - "snmpd -dv 2>&1 | grep -v snmpe_parse" -
and another round of arp -da, no unexpected output there.

Then I restarted again while "route -n monitor" was running, provoked
the discrepancy again, and searched monitor output for the IP address
that was missing in snmp, and compared to one that showed up.

An address showing in both snmpd and arp -an had

RTM_GET 
RTM_DELETE  
RTM_ADD 
RTM_RESOLVE 

And the address only showing in arp -an had

RTM_GET 
RTM_DELETE  
(no RTM_ADD/RTM_RESOLVE)

So whatever the root cause (probably something to do with the fact that
it's a CACHED route), there doesn't appear to be a new route socket
message generated when those arp entries come back. The lack of route
socket message would certainly explain why snmpd doesn't show the arp

Re: uvm_fault: is there an anon?

2020-11-13 Thread Jonathan Matthew
On Fri, Nov 13, 2020 at 12:17:04PM +0100, Theo Buehler wrote:
> On Wed, Nov 04, 2020 at 11:04:12AM -0300, Martin Pieuchot wrote:
> > Diff below introduces a helper that looks for existing mapping.  The
> > value returned by this lookup function determines if there's an anon
> > at the faulting address which tells us if we're dealing with a fault
> > of type 1 or 2.
> > 
> > This small refactoring is part of the current work to separate the code
> > handling faults of type 1 and 2.  The end goal being to move the type 1
> > faults handling out of the KERNEL_LOCK().
> > 
> > The function name is taken from NetBSD to not introduce more difference
> > than there's already.
> > 
> > ok?
> 
> ok tb.
> 
> I've been running the three diffs for two days and this went through two
> 'make release'
> 

Same here.  NetBSD's uvm fault handler is a lot more readable than ours, so
heading in that direction seems like a pretty good idea, ok jmatthew@



Re: uvm_fault: is there an anon?

2020-11-13 Thread Martin Pieuchot
On 04/11/20(Wed) 11:04, Martin Pieuchot wrote:
> Diff below introduces a helper that looks for existing mapping.  The
> value returned by this lookup function determines if there's an anon
> at the faulting address which tells us if we're dealing with a fault
> of type 1 or 2.
> 
> This small refactoring is part of the current work to separate the code
> handling faults of type 1 and 2.  The end goal being to move the type 1
> faults handling out of the KERNEL_LOCK().
> 
> The function name is taken from NetBSD to not introduce more difference
> than there's already.

New diff now that the other one has been committed, ok?

Index: uvm/uvm_fault.c
===
RCS file: /cvs/src/sys/uvm/uvm_fault.c,v
retrieving revision 1.105
diff -u -p -r1.105 uvm_fault.c
--- uvm/uvm_fault.c 13 Nov 2020 11:16:08 -  1.105
+++ uvm/uvm_fault.c 13 Nov 2020 11:17:52 -
@@ -801,6 +801,84 @@ uvm_fault_upper(struct uvm_faultinfo *uf
return 0;
 }
 
+
+/*
+ * uvm_fault_upper_lookup: look up existing h/w mapping and amap.
+ *
+ * iterate range of interest:
+ *  1. check if h/w mapping exists.  if yes, we don't care
+ *  2. check if anon exists.  if not, page is lower.
+ *  3. if anon exists, enter h/w mapping for neighbors.
+ */
+boolean_t
+uvm_fault_upper_lookup(struct uvm_faultinfo *ufi,
+const struct uvm_faultctx *flt, struct vm_anon **anons,
+struct vm_page **pages)
+{
+   struct vm_amap *amap = ufi->entry->aref.ar_amap;
+   struct vm_anon *anon;
+   boolean_t shadowed;
+   vaddr_t currva;
+   paddr_t pa;
+   int lcv;
+
+   /*
+* map in the backpages and frontpages we found in the amap in hopes
+* of preventing future faults.  we also init the pages[] array as
+* we go.
+*/
+   currva = flt->startva;
+   shadowed = FALSE;
+   for (lcv = 0 ; lcv < flt->npages ; lcv++, currva += PAGE_SIZE) {
+   /*
+* dont play with VAs that are already mapped
+* (except for center)
+*/
+   if (lcv != flt->centeridx &&
+   pmap_extract(ufi->orig_map->pmap, currva, )) {
+   pages[lcv] = PGO_DONTCARE;
+   continue;
+   }
+
+   /* unmapped or center page. check if any anon at this level. */
+   if (amap == NULL || anons[lcv] == NULL) {
+   pages[lcv] = NULL;
+   continue;
+   }
+
+   /* check for present page and map if possible. re-activate it */
+   pages[lcv] = PGO_DONTCARE;
+   if (lcv == flt->centeridx) {/* save center for later! */
+   shadowed = TRUE;
+   continue;
+   }
+   anon = anons[lcv];
+   if (anon->an_page &&
+   (anon->an_page->pg_flags & (PG_RELEASED|PG_BUSY)) == 0) {
+   uvm_lock_pageq();
+   uvm_pageactivate(anon->an_page);/* reactivate */
+   uvm_unlock_pageq();
+   uvmexp.fltnamap++;
+
+   /*
+* Since this isn't the page that's actually faulting,
+* ignore pmap_enter() failures; it's not critical
+* that we enter these right now.
+*/
+   (void) pmap_enter(ufi->orig_map->pmap, currva,
+   VM_PAGE_TO_PHYS(anon->an_page) | flt->pa_flags,
+   (anon->an_ref > 1) ?
+   (flt->enter_prot & ~PROT_WRITE) : flt->enter_prot,
+   PMAP_CANFAIL |
+(VM_MAPENT_ISWIRED(ufi->entry) ? PMAP_WIRED : 0));
+   }
+   }
+   if (flt->npages > 1)
+   pmap_update(ufi->orig_map->pmap);
+
+   return shadowed;
+}
+
 /*
  *   F A U L T   -   m a i n   e n t r y   p o i n t
  */
@@ -827,7 +905,6 @@ uvm_fault(vm_map_t orig_map, vaddr_t vad
int result, lcv, gotpages;
vaddr_t currva;
voff_t uoff;
-   paddr_t pa;
struct vm_amap *amap;
struct uvm_object *uobj;
struct vm_anon *anons_store[UVM_MAXRANGE], **anons, *anon;
@@ -868,61 +945,9 @@ ReFault:
amap = ufi.entry->aref.ar_amap;
uobj = ufi.entry->object.uvm_obj;
 
-   /*
-* map in the backpages and frontpages we found in the amap in hopes
-* of preventing future faults.  we also init the pages[] array as
-* we go.
-*/
-   currva = flt.startva;
-   shadowed = FALSE;
-   for (lcv = 0 ; lcv < flt.npages ; lcv++, currva += PAGE_SIZE) {
-   /*
-* dont play with VAs that are already mapped
-* (except for center)
-*/
-   if (lcv != flt.centeridx &&

Re: uvm_fault: is there an anon?

2020-11-13 Thread Theo Buehler
On Wed, Nov 04, 2020 at 11:04:12AM -0300, Martin Pieuchot wrote:
> Diff below introduces a helper that looks for existing mapping.  The
> value returned by this lookup function determines if there's an anon
> at the faulting address which tells us if we're dealing with a fault
> of type 1 or 2.
> 
> This small refactoring is part of the current work to separate the code
> handling faults of type 1 and 2.  The end goal being to move the type 1
> faults handling out of the KERNEL_LOCK().
> 
> The function name is taken from NetBSD to not introduce more difference
> than there's already.
> 
> ok?

ok tb.

I've been running the three diffs for two days and this went through two
'make release'



[PATCH]: Clearer documentation when using EVFILT_EXCEPT

2020-11-13 Thread Emil Engler
Currently it isn't mentioned that a socket is required when using
EVFILT_EXCEPT with NOTE_OOB.  To some experienced users it might be clear
that it must be a socket, but I don't think an additional word would hurt
anyone.


Index: lib/libc/sys/kqueue.2
===
RCS file: /cvs/src/lib/libc/sys/kqueue.2,v
retrieving revision 1.42
diff -u -p -u -p -r1.42 kqueue.2
--- lib/libc/sys/kqueue.2   22 Jun 2020 13:42:06 -  1.42
+++ lib/libc/sys/kqueue.2   13 Nov 2020 10:46:44 -
@@ -315,7 +315,8 @@ Takes a descriptor as the identifier, an
 specified exceptional conditions has occurred on the descriptor.
 Conditions are specified in
 .Fa fflags .
-Currently, a filter can monitor the reception of out-of-band data with
+Currently, a filter can monitor the reception of out-of-band data on a
+socket with
 .Dv NOTE_OOB .
 .It Dv EVFILT_WRITE
 Takes a descriptor as the identifier, and returns whenever