converting uvm_km_valloc to km_alloc

2020-12-17 Thread Jonathan Matthew
On Wed, Dec 16, 2020 at 12:00:38AM +0100, Mark Kettenis wrote:
> > Date: Tue, 15 Dec 2020 21:21:37 +0100
> > From: Alexander Bluhm 
> > 
> > On Tue, Dec 15, 2020 at 06:57:03PM +0100, Mark Kettenis wrote:
> > > Does the diff below fix this?
> > 
> > I can reproduce the panic and your diff fixes it.
> > 
> > Usually my regress machines do not trigger it as I do not install
> > firmware.  fw_update and reboot makes it crash.
> > 
> > bluhm
> 
> Thanks.  This is committed now.  However, there may be other case
> where we use uvm_km_valloc() early on that will trip over the kernel
> lock assertion that mpi@ added in uvm_km_pgremove().  Ideally we
> should get rid of all the uvm_km_free() calls in the kernel.

Here are a couple of relatively easy ones, applying changes from r1.86 of
amd64's acpi_machdep.c to i386 and arm64.  I've tested i386 but it turns out I
don't have any arm64 machines with acpi.


Index: arch/arm64/arm64/acpi_machdep.c
===
RCS file: /cvs/src/sys/arch/arm64/arm64/acpi_machdep.c,v
retrieving revision 1.10
diff -u -p -u -p -r1.10 acpi_machdep.c
--- arch/arm64/arm64/acpi_machdep.c 6 Dec 2020 21:19:55 -   1.10
+++ arch/arm64/arm64/acpi_machdep.c 18 Dec 2020 00:23:01 -
@@ -74,7 +74,8 @@ acpi_map(paddr_t pa, size_t len, struct 
 {
paddr_t pgpa = trunc_page(pa);
paddr_t endpa = round_page(pa + len);
-   vaddr_t va = uvm_km_valloc(kernel_map, endpa - pgpa);
+   vaddr_t va = (vaddr_t)km_alloc(endpa - pgpa, _any, _none,
+   _nowait);
 
if (va == 0)
return (ENOMEM);
@@ -97,7 +98,7 @@ void
 acpi_unmap(struct acpi_mem_map *handle)
 {
pmap_kremove(handle->baseva, handle->vsize);
-   uvm_km_free(kernel_map, handle->baseva, handle->vsize);
+   km_free((void *)handle->baseva, handle->vsize, _any, _none);
 }
 
 int
Index: arch/i386/i386/acpi_machdep.c
===
RCS file: /cvs/src/sys/arch/i386/i386/acpi_machdep.c,v
retrieving revision 1.74
diff -u -p -u -p -r1.74 acpi_machdep.c
--- arch/i386/i386/acpi_machdep.c   21 Jul 2020 03:48:06 -  1.74
+++ arch/i386/i386/acpi_machdep.c   18 Dec 2020 00:23:01 -
@@ -117,7 +117,8 @@ acpi_map(paddr_t pa, size_t len, struct 
 {
paddr_t pgpa = trunc_page(pa);
paddr_t endpa = round_page(pa + len);
-   vaddr_t va = uvm_km_valloc(kernel_map, endpa - pgpa);
+   vaddr_t va = (vaddr_t)km_alloc(endpa - pgpa, _any, _none,
+   _nowait);
 
if (va == 0)
return (ENOMEM);
@@ -140,7 +141,7 @@ void
 acpi_unmap(struct acpi_mem_map *handle)
 {
pmap_kremove(handle->baseva, handle->vsize);
-   uvm_km_free(kernel_map, handle->baseva, handle->vsize);
+   km_free((void *)handle->baseva, handle->vsize, _any, _none);
 }
 
 int



Re: [diff] src/usr.sbin/smtpd: plug two memory leaks

2020-12-17 Thread gilles
December 17, 2020 4:02 PM, gil...@poolp.org wrote:

> Hello,
> 
> The following diffs plug two memory leaks in smtpd:
> 
> a- in lka_filter.c, the name of the filter chain for a session is strdup()-ed
> upon session allocation but not released upon session release. free() it
> in lka_filter_end().
> 
> b- in smtp_session.c, filter io channel should be released when a tx is over,
> but it isn't. call io_free() on the channel in smtp_tx_free() if we do
> have a channel ready.
> 

please disregard b-, there's no leak here as martijn@ pointed out in private



Re: diff: tcp ack improvement

2020-12-17 Thread Jan Klemkow
ping

On Fri, Nov 06, 2020 at 01:10:52AM +0100, Jan Klemkow wrote:
> Hi,
> 
> bluhm and I make some network performance measurements and kernel
> profiling.
> 
> Setup:Linux (iperf) -10gbit-> OpenBSD (relayd) -10gbit-> Linux (iperf)
> 
> We figured out, that the kernel uses a huge amount of processing time
> for sending ACKs to the sender on the receiving interface.  After
> receiving a data segment, we send our two ACK.  The first one in
> tcp_input() direct after receiving.  The second ACK is send out, after
> the userland or the sosplice task read some data out of the socket
> buffer.
> 
> The fist ACK in tcp_input() is called after receiving every other data
> segment like it is discribed in RFC1122:
> 
>   4.2.3.2  When to Send an ACK Segment
>   A TCP SHOULD implement a delayed ACK, but an ACK should
>   not be excessively delayed; in particular, the delay
>   MUST be less than 0.5 seconds, and in a stream of
>   full-sized segments there SHOULD be an ACK for at least
>   every second segment.
> 
> This advice is based on the paper "Congestion Avoidance and Control":
> 
>   4 THE GATEWAY SIDE OF CONGESTION CONTROL
>   The 8 KBps senders were talking to 4.3+BSD receivers
>   which would delay an ack for atmost one packet (because
>   of an ack’s clock’ role, the authors believe that the
>   minimum ack frequency should be every other packet).
> 
> Sending the first ACK (on every other packet) coasts us too much
> processing time.  Thus, we run into a full socket buffer earlier.  The
> first ACK just acknowledges the received data, but does not update the
> window.  The second ACK, caused by the socket buffer reader, also
> acknowledges the data and also updates the window.  So, the second ACK,
> is much more worth for a fast packet processing than the fist one.
> 
> The performance improvement is between 33% with splicing and 20% without
> splice:
> 
>   splicingrelaying
> 
>   current 3.1 GBit/s  2.6 GBit/s
>   w/o first ack   4.1 GBit/s  3.1 GBit/s
> 
> As far as I understand the implementation of other operating systems:
> Linux has implement a custom TCP_QUICKACK socket option, to turn this
> kind of feature on and off.  FreeBSD and NetBSD sill depend on it, when
> using the New Reno implementation.
> 
> The following diff turns off the direct ACK on every other segment.  We
> are running this diff in production on our own machines at genua and on
> our products for several month, now.  We don't noticed any problems,
> even with interactive network sessions (ssh) nor with bulk traffic.
> 
> Another solution could be a sysctl(3) or an additional socket option,
> similar to Linux, to control this behavior per socket or system wide.
> Also, a counter to ACK every 3rd, 4th... data segment could beat the
> problem.
> 
> bye,
> Jan
> 
> Index: netinet/tcp_input.c
> ===
> RCS file: /cvs/src/sys/netinet/tcp_input.c,v
> retrieving revision 1.365
> diff -u -p -r1.365 tcp_input.c
> --- netinet/tcp_input.c   19 Jun 2020 22:47:22 -  1.365
> +++ netinet/tcp_input.c   5 Nov 2020 23:00:34 -
> @@ -165,8 +165,8 @@ do { \
>  #endif
>  
>  /*
> - * Macro to compute ACK transmission behavior.  Delay the ACK unless
> - * we have already delayed an ACK (must send an ACK every two segments).
> + * Macro to compute ACK transmission behavior.  Delay the ACK until
> + * a read from the socket buffer or the delayed ACK timer causes one.
>   * We also ACK immediately if we received a PUSH and the ACK-on-PUSH
>   * option is enabled or when the packet is coming from a loopback
>   * interface.
> @@ -176,8 +176,7 @@ do { \
>   struct ifnet *ifp = NULL; \
>   if (m && (m->m_flags & M_PKTHDR)) \
>   ifp = if_get(m->m_pkthdr.ph_ifidx); \
> - if (TCP_TIMER_ISARMED(tp, TCPT_DELACK) || \
> - (tcp_ack_on_push && (tiflags) & TH_PUSH) || \
> + if ((tcp_ack_on_push && (tiflags) & TH_PUSH) || \
>   (ifp && (ifp->if_flags & IFF_LOOPBACK))) \
>   tp->t_flags |= TF_ACKNOW; \
>   else \
> 



Re: WITNESS panic: acquiring blockable sleep lock with spinlock or critical section held (rwlock) kmmaplk

2020-12-17 Thread Martin Pieuchot
On 17/12/20(Thu) 23:16, Mark Kettenis wrote:
> > Date: Thu, 17 Dec 2020 18:56:52 -0300
> > From: Martin Pieuchot 
> > 
> > On 16/12/20(Wed) 22:49, Greg Steuck wrote:
> > > I just hit this while booting an i386-current in vmd. The source tree is
> > > synced to "Remove the assertion in uvm_km_pgremove()."
> > > 
> > > I enabled WITNESS on top of GENERIC. Naturally, GENERIC-Dec15 snap works.
> > > 
> > > Anybody else see this so I know it's worth a bisect?
> > > [...]
> > 
> > I can reproduce it.  Diff below fixes it.  This is the beginning of a
> > rabbit hole... thanks!
> > 
> > > witness: lock_object uninitialized: 0xd0f3c828
> > > Starting stack trace...
> > > witness_checkorder(0,d6bb011c,d1155e6c,d02e10e4,90) at 
> > > witness_checkorder+0x8a
> > > witness_checkorder(d0f3c828,9,0) at witness_checkorder+0x8a
> > > mtx_enter(d0f3c81c) at mtx_enter+0x27
> > > pmap_extract_pae(d8bb0d80,f5605000,d8bb0da0) at pmap_extract_pae+0x53
> > > pmap_pinit_pd_pae(d8bb0d80) at pmap_pinit_pd_pae+0x268
> > > pmap_create(1,1000,f6fe5e86,d8bbfd54,d0f5ba18) at pmap_create+0xa8
> > > uvmspace_fork(d0f5b5fc,d8bb3e34,d0f5b5fc,1,d1155f70) at uvmspace_fork+0x56
> > > process_new(d8bb3e34,d0f5b5fc,1) at process_new+0xeb
> > > fork1(d0eb7b14,1,d04eb560,0,0,d1155f90) at fork1+0x1ba
> > > panic: acquiring blockable sleep lock with spinlock or critical section 
> > > held (rwlock) kmmaplk
> > 
> > pmap_kernel()'s mutexes aren't initialized.  Diff below does that.
> 
> Well, that is somewhat intentional.  Those mutexes should never be
> used for the kernel pmap.  The kernel pmap is always there and is
> updated atomically.
> 
> So how did we end up trying to grab one of these mutexs?

pmap_map_ptes() (both version of them) grab the current's pmap
`pm_apte_mtx' which ends up being the kernel one in this case.

> > Index: arch/i386/i386/pmap.c
> > ===
> > RCS file: /cvs/src/sys/arch/i386/i386/pmap.c,v
> > retrieving revision 1.209
> > diff -u -p -r1.209 pmap.c
> > --- arch/i386/i386/pmap.c   24 Sep 2020 11:36:50 -  1.209
> > +++ arch/i386/i386/pmap.c   17 Dec 2020 21:47:11 -
> > @@ -961,6 +961,8 @@ pmap_bootstrap(vaddr_t kva_start)
> >  */
> >  
> > kpm = pmap_kernel();
> > +   mtx_init(>pm_mtx, IPL_VM);
> > +   mtx_init(>pm_apte_mtx, IPL_VM);
> > uvm_objinit(>pm_obj, NULL, 1);
> > bzero(>pm_list, sizeof(kpm->pm_list));  /* pm_list not used */
> > kpm->pm_pdir = (vaddr_t)(proc0.p_addr->u_pcb.pcb_cr3 + KERNBASE);
> > 
> > 



Re: WITNESS panic: acquiring blockable sleep lock with spinlock or critical section held (rwlock) kmmaplk

2020-12-17 Thread Mark Kettenis
> Date: Thu, 17 Dec 2020 18:56:52 -0300
> From: Martin Pieuchot 
> 
> On 16/12/20(Wed) 22:49, Greg Steuck wrote:
> > I just hit this while booting an i386-current in vmd. The source tree is
> > synced to "Remove the assertion in uvm_km_pgremove()."
> > 
> > I enabled WITNESS on top of GENERIC. Naturally, GENERIC-Dec15 snap works.
> > 
> > Anybody else see this so I know it's worth a bisect?
> > [...]
> 
> I can reproduce it.  Diff below fixes it.  This is the beginning of a
> rabbit hole... thanks!
> 
> > witness: lock_object uninitialized: 0xd0f3c828
> > Starting stack trace...
> > witness_checkorder(0,d6bb011c,d1155e6c,d02e10e4,90) at 
> > witness_checkorder+0x8a
> > witness_checkorder(d0f3c828,9,0) at witness_checkorder+0x8a
> > mtx_enter(d0f3c81c) at mtx_enter+0x27
> > pmap_extract_pae(d8bb0d80,f5605000,d8bb0da0) at pmap_extract_pae+0x53
> > pmap_pinit_pd_pae(d8bb0d80) at pmap_pinit_pd_pae+0x268
> > pmap_create(1,1000,f6fe5e86,d8bbfd54,d0f5ba18) at pmap_create+0xa8
> > uvmspace_fork(d0f5b5fc,d8bb3e34,d0f5b5fc,1,d1155f70) at uvmspace_fork+0x56
> > process_new(d8bb3e34,d0f5b5fc,1) at process_new+0xeb
> > fork1(d0eb7b14,1,d04eb560,0,0,d1155f90) at fork1+0x1ba
> > panic: acquiring blockable sleep lock with spinlock or critical section 
> > held (rwlock) kmmaplk
> 
> pmap_kernel()'s mutexes aren't initialized.  Diff below does that.

Well, that is somewhat intentional.  Those mutexes should never be
used for the kernel pmap.  The kernel pmap is always there and is
updated atomically.

So how did we end up trying to grab one of these mutexs?

> Index: arch/i386/i386/pmap.c
> ===
> RCS file: /cvs/src/sys/arch/i386/i386/pmap.c,v
> retrieving revision 1.209
> diff -u -p -r1.209 pmap.c
> --- arch/i386/i386/pmap.c 24 Sep 2020 11:36:50 -  1.209
> +++ arch/i386/i386/pmap.c 17 Dec 2020 21:47:11 -
> @@ -961,6 +961,8 @@ pmap_bootstrap(vaddr_t kva_start)
>*/
>  
>   kpm = pmap_kernel();
> + mtx_init(>pm_mtx, IPL_VM);
> + mtx_init(>pm_apte_mtx, IPL_VM);
>   uvm_objinit(>pm_obj, NULL, 1);
>   bzero(>pm_list, sizeof(kpm->pm_list));  /* pm_list not used */
>   kpm->pm_pdir = (vaddr_t)(proc0.p_addr->u_pcb.pcb_cr3 + KERNBASE);
> 
> 



Re: WITNESS panic: acquiring blockable sleep lock with spinlock or critical section held (rwlock) kmmaplk

2020-12-17 Thread Martin Pieuchot
On 16/12/20(Wed) 22:49, Greg Steuck wrote:
> I just hit this while booting an i386-current in vmd. The source tree is
> synced to "Remove the assertion in uvm_km_pgremove()."
> 
> I enabled WITNESS on top of GENERIC. Naturally, GENERIC-Dec15 snap works.
> 
> Anybody else see this so I know it's worth a bisect?
> [...]

I can reproduce it.  Diff below fixes it.  This is the beginning of a
rabbit hole... thanks!

> witness: lock_object uninitialized: 0xd0f3c828
> Starting stack trace...
> witness_checkorder(0,d6bb011c,d1155e6c,d02e10e4,90) at witness_checkorder+0x8a
> witness_checkorder(d0f3c828,9,0) at witness_checkorder+0x8a
> mtx_enter(d0f3c81c) at mtx_enter+0x27
> pmap_extract_pae(d8bb0d80,f5605000,d8bb0da0) at pmap_extract_pae+0x53
> pmap_pinit_pd_pae(d8bb0d80) at pmap_pinit_pd_pae+0x268
> pmap_create(1,1000,f6fe5e86,d8bbfd54,d0f5ba18) at pmap_create+0xa8
> uvmspace_fork(d0f5b5fc,d8bb3e34,d0f5b5fc,1,d1155f70) at uvmspace_fork+0x56
> process_new(d8bb3e34,d0f5b5fc,1) at process_new+0xeb
> fork1(d0eb7b14,1,d04eb560,0,0,d1155f90) at fork1+0x1ba
> panic: acquiring blockable sleep lock with spinlock or critical section held 
> (rwlock) kmmaplk

pmap_kernel()'s mutexes aren't initialized.  Diff below does that.

Index: arch/i386/i386/pmap.c
===
RCS file: /cvs/src/sys/arch/i386/i386/pmap.c,v
retrieving revision 1.209
diff -u -p -r1.209 pmap.c
--- arch/i386/i386/pmap.c   24 Sep 2020 11:36:50 -  1.209
+++ arch/i386/i386/pmap.c   17 Dec 2020 21:47:11 -
@@ -961,6 +961,8 @@ pmap_bootstrap(vaddr_t kva_start)
 */
 
kpm = pmap_kernel();
+   mtx_init(>pm_mtx, IPL_VM);
+   mtx_init(>pm_apte_mtx, IPL_VM);
uvm_objinit(>pm_obj, NULL, 1);
bzero(>pm_list, sizeof(kpm->pm_list));  /* pm_list not used */
kpm->pm_pdir = (vaddr_t)(proc0.p_addr->u_pcb.pcb_cr3 + KERNBASE);



Enhancing (some) PF state

2020-12-17 Thread Sven F.
Dear readers,

pfctl -vv -ss shows detailed information on states.
I would like to improve the information provided about specific TCP connections,
regarding the latency of the network.
An obvious way seems to be to measure the time to get ACKs back.
Another way would be to use packets timestamps.

I patched my kernel to extract this information and
log it, before trying to report it to the userland.
I did not use timestamps, because I am not sure how i could do that.
If you have any advice on that, it would be welcome.
Moreover the patch is a prototype,
So I would appreciate any feedback on my diff (attached):
Currently the code is using a LABEL to trigger the measure,
of course, later it should be a keyword like "latency" in the rules.
For example :
match proto tcp to port 80 latency
Or something else.
This would be discuss after choosing a method for latency computation,

Maybe there is a better way to extract network TCP latencies information
( i would like to avoid running in promiscuous, but if a current
packaged software does it well... )
but I did not come across it.
Please share if you know of a better way to tackle this.


Happy holidays !


lag.diff
Description: Binary data


[diff] src/usr.sbin/smtpd: plug two memory leaks

2020-12-17 Thread gilles
Hello,

The following diffs plug two memory leaks in smtpd:

a- in lka_filter.c, the name of the filter chain for a session is strdup()-ed
upon session allocation but not released upon session release. free() it
in lka_filter_end().

b- in smtp_session.c, filter io channel should be released when a tx is over,
but it isn't. call io_free() on the channel in smtp_tx_free() if we do
have a channel ready.

Gilles


diff --git a/usr.sbin/smtpd/lka_filter.c b/usr.sbin/smtpd/lka_filter.c
index 9891e6140a3..6eb0829efca 100644
--- a/usr.sbin/smtpd/lka_filter.c
+++ b/usr.sbin/smtpd/lka_filter.c
@@ -535,6 +535,7 @@ lka_filter_end(uint64_t reqid)
free(fs->mail_from);
free(fs->username);
free(fs->lastparam);
+   free(fs->filter_name);
free(fs);
log_trace(TRACE_FILTERS, "%016"PRIx64" filters session-end", reqid);
 }

diff --git a/usr.sbin/smtpd/smtp_session.c b/usr.sbin/smtpd/smtp_session.c
index 60123ad9a80..4530f44fb69 100644
--- a/usr.sbin/smtpd/smtp_session.c
+++ b/usr.sbin/smtpd/smtp_session.c
@@ -2438,6 +2438,11 @@ smtp_tx_free(struct smtp_tx *tx)
if (tx->ofile)
fclose(tx->ofile);
 
+   if (tx->filter) {
+   io_free(tx->filter);
+   tx->filter = NULL;
+   }
+
tx->session->tx = NULL;
 
free(tx);



Re: [diff] src/usr.sbin/smtpd: plug two memory leaks

2020-12-17 Thread Todd C . Miller
On Thu, 17 Dec 2020 15:02:41 +, gil...@poolp.org wrote:

> a- in lka_filter.c, the name of the filter chain for a session is strdup()-ed
> upon session allocation but not released upon session release. free() it
> in lka_filter_end().
>
> b- in smtp_session.c, filter io channel should be released when a tx is over,
> but it isn't. call io_free() on the channel in smtp_tx_free() if we do
> have a channel ready.
>
> diff --git a/usr.sbin/smtpd/lka_filter.c b/usr.sbin/smtpd/lka_filter.c
> index 9891e6140a3..6eb0829efca 100644
> --- a/usr.sbin/smtpd/lka_filter.c
> +++ b/usr.sbin/smtpd/lka_filter.c
> @@ -535,6 +535,7 @@ lka_filter_end(uint64_t reqid)
>   free(fs->mail_from);
>   free(fs->username);
>   free(fs->lastparam);
> + free(fs->filter_name);
>   free(fs);
>   log_trace(TRACE_FILTERS, "%016"PRIx64" filters session-end", reqid);
>  }

OK

> diff --git a/usr.sbin/smtpd/smtp_session.c b/usr.sbin/smtpd/smtp_session.c
> index 60123ad9a80..4530f44fb69 100644
> --- a/usr.sbin/smtpd/smtp_session.c
> +++ b/usr.sbin/smtpd/smtp_session.c
> @@ -2438,6 +2438,11 @@ smtp_tx_free(struct smtp_tx *tx)
>   if (tx->ofile)
>   fclose(tx->ofile);
>  
> + if (tx->filter) {
> + io_free(tx->filter);
> + tx->filter = NULL;
> + }
> +
>   tx->session->tx = NULL;
>  
>   free(tx);

Is there any reason to clear tx->filter when we are about to free tx?
If so, maybe tx->ofile should be cleared after close too.

 - todd



Re: regress print target name

2020-12-17 Thread Theo Buehler
On Thu, Dec 17, 2020 at 12:01:25PM +0100, Alexander Bluhm wrote:
> On Wed, Dec 16, 2020 at 04:42:59PM +0100, Alexander Bluhm wrote:
> > When debugging tests, it is useful to see the target name and which
> > output belongs to it.
> 
> A small addition:
> 
> Run setup_once targets in a sepearate block with headline before
> all other targets.
> 
> ok?

ok tb

> 
> bluhm
> 
> Index: share/mk/bsd.regress.mk
> ===
> RCS file: /data/mirror/openbsd/cvs/src/share/mk/bsd.regress.mk,v
> retrieving revision 1.22
> diff -u -p -r1.22 bsd.regress.mk
> --- share/mk/bsd.regress.mk   16 Dec 2020 16:53:24 -  1.22
> +++ share/mk/bsd.regress.mk   17 Dec 2020 00:56:08 -
> @@ -75,13 +75,16 @@ ${REGRESS_TARGETS}: ${REGRESS_SETUP}
>  CLEANFILES+=${REGRESS_SETUP_ONCE:S/^/stamp-/}
>  ${REGRESS_TARGETS}: ${REGRESS_SETUP_ONCE:S/^/stamp-/}
>  ${REGRESS_SETUP_ONCE:S/^/stamp-/}: .SILENT
> + echo ' ${@:S/^stamp-//} '
>   ${MAKE} -C ${.CURDIR} ${@:S/^stamp-//}
>   date >$@
> + echo
>  .endif
>  
>  regress: .SILENT
>  .if !empty(REGRESS_SETUP_ONCE)
>   rm -f ${REGRESS_SETUP_ONCE:S/^/stamp-/}
> + ${MAKE} -C ${.CURDIR} ${REGRESS_SETUP_ONCE:S/^/stamp-/}
>  .endif
>  .for RT in ${REGRESS_TARGETS}
>   echo ' ${RT} '
> 



bgpd: getifaddrs ifa_addr NULL check

2020-12-17 Thread Claudio Jeker
getifaddrs can return a struct ifaddrs entry with a NULL ifa_addr.
I think an unnumbered point-to-point interface can trigger this.
So better check for it before accessing anything in ifa_addr.

-- 
:wq Claudio

Index: config.c
===
RCS file: /cvs/src/usr.sbin/bgpd/config.c,v
retrieving revision 1.95
diff -u -p -r1.95 config.c
--- config.c14 Feb 2020 13:54:31 -  1.95
+++ config.c4 Dec 2020 11:46:33 -
@@ -339,7 +339,8 @@ get_bgpid(void)
fatal("getifaddrs");
 
for (ifa = ifap; ifa; ifa = ifa->ifa_next) {
-   if (ifa->ifa_addr->sa_family != AF_INET)
+   if (ifa->ifa_addr == NULL ||
+   ifa->ifa_addr->sa_family != AF_INET)
continue;
cur = ((struct sockaddr_in *)ifa->ifa_addr)->sin_addr.s_addr;
if ((cur & localnet) == localnet)   /* skip 127/8 */
Index: session.c
===
RCS file: /cvs/src/usr.sbin/bgpd/session.c,v
retrieving revision 1.406
diff -u -p -r1.406 session.c
--- session.c   11 Dec 2020 12:00:01 -  1.406
+++ session.c   17 Dec 2020 12:18:54 -
@@ -1223,7 +1223,8 @@ get_alternate_addr(struct sockaddr *sa, 
fatal("getifaddrs");
 
for (match = ifap; match != NULL; match = match->ifa_next)
-   if (sa_cmp(sa, match->ifa_addr) == 0)
+   if (match->ifa_addr != NULL &&
+   sa_cmp(sa, match->ifa_addr) == 0)
break;
 
if (match == NULL) {
@@ -1234,7 +1235,8 @@ get_alternate_addr(struct sockaddr *sa, 
switch (sa->sa_family) {
case AF_INET6:
for (ifa = ifap; ifa != NULL; ifa = ifa->ifa_next) {
-   if (ifa->ifa_addr->sa_family == AF_INET &&
+   if (ifa->ifa_addr != NULL &&
+   ifa->ifa_addr->sa_family == AF_INET &&
strcmp(ifa->ifa_name, match->ifa_name) == 0) {
sa2addr(ifa->ifa_addr, alt, NULL);
break;
@@ -1243,10 +1245,12 @@ get_alternate_addr(struct sockaddr *sa, 
break;
case AF_INET:
for (ifa = ifap; ifa != NULL; ifa = ifa->ifa_next) {
-   struct sockaddr_in6 *s =
-   (struct sockaddr_in6 *)ifa->ifa_addr;
-   if (ifa->ifa_addr->sa_family == AF_INET6 &&
+   if (ifa->ifa_addr != NULL &&
+   ifa->ifa_addr->sa_family == AF_INET6 &&
strcmp(ifa->ifa_name, match->ifa_name) == 0) {
+   struct sockaddr_in6 *s =
+   (struct sockaddr_in6 *)ifa->ifa_addr;
+
/* only accept global scope addresses */
if (IN6_IS_ADDR_LINKLOCAL(>sin6_addr) ||
IN6_IS_ADDR_SITELOCAL(>sin6_addr))



Re: bgpd send side hold timer

2020-12-17 Thread Claudio Jeker
On Wed, Dec 16, 2020 at 10:41:42PM +, Job Snijders wrote:
> On Tue, Dec 15, 2020 at 05:02:19PM +0100, Claudio Jeker wrote:
> > On Mon, Dec 14, 2020 at 06:22:09PM +, Job Snijders wrote:
> > > This patch appears to be a very elegant solution to a thorny subtle
> > > problem: what to do when a peer is not accepting new routing
> > > information from you?
> > 
> > One thing I'm unsure about is the value of the SendHold timer. I reused
> > the hold timer value with the assumption that for dead connections the
> > regular hold timer expires before the SendHold timer (the send buffer
> > needs to be full before send starts blocking).
> 
> Let's be conservative while being progressive! :-)
> 
> If the 'Send Hold Timer' value is moved from 'infinite' to 90 seconds
> ("The suggested default value for the HoldTime", RFC 4271), I think
> we'll be able to see benefits.
> 
> > People should look out for cases where the SendHold timer triggered before
> > either a NOTIFICATION form the peer arrived or where the SendHold timer
> > triggered before the HoldTimer. Now that may be tricky since both SendHold
> > and Hold timer trigger the same EVNT_TIMER_HOLDTIME event so they can not
> > be distinguished easily.
> 
> Separation of the cases might be helpful.
> 
> > I think that the SendHold timer will almost never trigger and if it does
> > only for the case where a session only works in one direction.
> 
> If it is rare, maybe it should be logged as a unique message:
> 
> "SendHoldTimer Expired".
> 

This diff does both suggestions. Adds a new event and a uses the default
hold time of 90 sec for the send timeout (unless holdtime is larger than
90 sec in which case holdtime is used).

This will show up in the logs like this:
neighbor (badjojo): sending notification: HoldTimer expired
neighbor (badjojo): state change Established -> Idle, reason: SendHoldTimer 
expired

In bgpctl show nei badjojo it will show this:
  Last error sent: HoldTimer expired

Since there is no NOTIFICATION error code for SendHoldTimer expired this
is about the best we can do for now.
-- 
:wq Claudio

Index: bgpd.h
===
RCS file: /cvs/src/usr.sbin/bgpd/bgpd.h,v
retrieving revision 1.405
diff -u -p -r1.405 bgpd.h
--- bgpd.h  5 Nov 2020 11:52:59 -   1.405
+++ bgpd.h  17 Dec 2020 12:01:38 -
@@ -1377,6 +1377,7 @@ static const char * const eventnames[] =
"ConnectRetryTimer expired",
"HoldTimer expired",
"KeepaliveTimer expired",
+   "SendHoldTimer expired",
"OPEN message received",
"KEEPALIVE message received",
"UPDATE message received",
@@ -1467,6 +1468,7 @@ static const char * const timernames[] =
"ConnectRetryTimer",
"KeepaliveTimer",
"HoldTimer",
+   "SendHoldTimer",
"IdleHoldTimer",
"IdleHoldResetTimer",
"CarpUndemoteTimer",
Index: session.c
===
RCS file: /cvs/src/usr.sbin/bgpd/session.c,v
retrieving revision 1.406
diff -u -p -r1.406 session.c
--- session.c   11 Dec 2020 12:00:01 -  1.406
+++ session.c   17 Dec 2020 12:11:35 -
@@ -375,6 +375,9 @@ session_main(int debug, int verbose)
case Timer_Hold:
bgp_fsm(p, EVNT_TIMER_HOLDTIME);
break;
+   case Timer_SendHold:
+   bgp_fsm(p, EVNT_TIMER_SENDHOLD);
+   break;
case Timer_ConnectRetry:
bgp_fsm(p, EVNT_TIMER_CONNRETRY);
break;
@@ -597,6 +600,7 @@ bgp_fsm(struct peer *peer, enum session_
switch (event) {
case EVNT_START:
timer_stop(>timers, Timer_Hold);
+   timer_stop(>timers, Timer_SendHold);
timer_stop(>timers, Timer_Keepalive);
timer_stop(>timers, Timer_IdleHold);
 
@@ -709,6 +713,7 @@ bgp_fsm(struct peer *peer, enum session_
change_state(peer, STATE_IDLE, event);
break;
case EVNT_TIMER_HOLDTIME:
+   case EVNT_TIMER_SENDHOLD:
session_notification(peer, ERR_HOLDTIMEREXPIRED,
0, NULL, 0);
change_state(peer, STATE_IDLE, event);
@@ -749,6 +754,7 @@ bgp_fsm(struct peer *peer, enum session_
change_state(peer, STATE_IDLE, event);
break;
case EVNT_TIMER_HOLDTIME:
+   case EVNT_TIMER_SENDHOLD:
session_notification(peer, ERR_HOLDTIMEREXPIRED,
0, NULL, 0);
change_state(peer, STATE_IDLE, event);
@@ -784,6 

amd64 pmap pv_entry SLIST

2020-12-17 Thread Alexander Bluhm
Hi,

Can we convert the pv_entry list in amd64 pmap into an SLIST?
I think the code with macros is easier to read.

ok?

bluhm

Index: arch/amd64//amd64/pmap.c
===
RCS file: /data/mirror/openbsd/cvs/src/sys/arch/amd64/amd64/pmap.c,v
retrieving revision 1.141
diff -u -p -r1.141 pmap.c
--- arch/amd64//amd64/pmap.c16 Dec 2020 21:11:35 -  1.141
+++ arch/amd64//amd64/pmap.c16 Dec 2020 21:13:40 -
@@ -321,9 +321,9 @@ void pmap_remove_ept(struct pmap *, vadd
 void pmap_do_remove_ept(struct pmap *, vaddr_t);
 int pmap_enter_ept(struct pmap *, vaddr_t, paddr_t, vm_prot_t);
 int pmap_remove_pte(struct pmap *, struct vm_page *, pt_entry_t *,
-vaddr_t, int, struct pv_entry **);
+vaddr_t, int, struct pvlist *);
 void pmap_remove_ptes(struct pmap *, struct vm_page *, vaddr_t,
-vaddr_t, vaddr_t, int, struct pv_entry **);
+vaddr_t, vaddr_t, int, struct pvlist *);
 #define PMAP_REMOVE_ALL0   /* remove all mappings */
 #define PMAP_REMOVE_SKIPWIRED  1   /* skip wired mappings */
 
@@ -1029,8 +1029,7 @@ pmap_enter_pv(struct vm_page *pg, struct
pve->pv_va = va;
pve->pv_ptp = ptp;  /* NULL for kernel pmap */
mtx_enter(>mdpage.pv_mtx);
-   pve->pv_next = pg->mdpage.pv_list;  /* add to ... */
-   pg->mdpage.pv_list = pve;   /* ... list */
+   SLIST_INSERT_HEAD(>mdpage.pv_list, pve, pv_next);
mtx_leave(>mdpage.pv_mtx);
 }
 
@@ -1044,16 +1043,19 @@ pmap_enter_pv(struct vm_page *pg, struct
 struct pv_entry *
 pmap_remove_pv(struct vm_page *pg, struct pmap *pmap, vaddr_t va)
 {
-   struct pv_entry *pve, **prevptr;
+   struct pv_entry *pve, *prev;
 
mtx_enter(>mdpage.pv_mtx);
-   prevptr = >mdpage.pv_list;
-   while ((pve = *prevptr) != NULL) {
+   prev = NULL;
+   SLIST_FOREACH(pve, >mdpage.pv_list, pv_next) {
if (pve->pv_pmap == pmap && pve->pv_va == va) { /* match? */
-   *prevptr = pve->pv_next;/* remove it! */
+   if (prev == NULL)
+   SLIST_REMOVE_HEAD(>mdpage.pv_list, pv_next);
+   else
+   SLIST_REMOVE_AFTER(prev, pv_next);
break;
}
-   prevptr = >pv_next;/* previous pointer */
+   prev = pve; /* previous pointer */
}
mtx_leave(>mdpage.pv_mtx);
return(pve);/* return removed pve */
@@ -1583,7 +1585,7 @@ pmap_copy_page(struct vm_page *srcpg, st
 
 void
 pmap_remove_ptes(struct pmap *pmap, struct vm_page *ptp, vaddr_t ptpva,
-vaddr_t startva, vaddr_t endva, int flags, struct pv_entry **free_pvs)
+vaddr_t startva, vaddr_t endva, int flags, struct pvlist *free_pvs)
 {
struct pv_entry *pve;
pt_entry_t *pte = (pt_entry_t *) ptpva;
@@ -1643,10 +1645,8 @@ pmap_remove_ptes(struct pmap *pmap, stru
/* sync R/M bits */
pmap_sync_flags_pte(pg, opte);
pve = pmap_remove_pv(pg, pmap, startva);
-   if (pve != NULL) {
-   pve->pv_next = *free_pvs;
-   *free_pvs = pve;
-   }
+   if (pve != NULL)
+   SLIST_INSERT_HEAD(free_pvs, pve, pv_next);
 
/* end of "for" loop: time for next pte */
}
@@ -1663,7 +1663,7 @@ pmap_remove_ptes(struct pmap *pmap, stru
 
 int
 pmap_remove_pte(struct pmap *pmap, struct vm_page *ptp, pt_entry_t *pte,
-vaddr_t va, int flags, struct pv_entry **free_pvs)
+vaddr_t va, int flags, struct pvlist *free_pvs)
 {
struct pv_entry *pve;
struct vm_page *pg;
@@ -1708,10 +1708,8 @@ pmap_remove_pte(struct pmap *pmap, struc
/* sync R/M bits */
pmap_sync_flags_pte(pg, opte);
pve = pmap_remove_pv(pg, pmap, va);
-   if (pve != NULL) {
-   pve->pv_next = *free_pvs;
-   *free_pvs = pve;
-   }
+   if (pve != NULL)
+   SLIST_INSERT_HEAD(free_pvs, pve, pv_next);
 
return 1;
 }
@@ -1746,7 +1744,7 @@ pmap_do_remove(struct pmap *pmap, vaddr_
vaddr_t blkendva;
struct vm_page *ptp;
struct pv_entry *pve;
-   struct pv_entry *free_pvs = NULL;
+   struct pvlist free_pvs = SLIST_HEAD_INITIALIZER(pvlist);
vaddr_t va;
int shootall = 0, shootself;
struct pg_to_free empty_ptps;
@@ -1864,8 +1862,8 @@ pmap_do_remove(struct pmap *pmap, vaddr_
pmap_tlb_shootwait();
 
 cleanup:
-   while ((pve = free_pvs) != NULL) {
-   free_pvs = pve->pv_next;
+   while ((pve = SLIST_FIRST(_pvs)) != NULL) {
+   SLIST_REMOVE_HEAD(_pvs, pv_next);
pool_put(_pv_pool, pve);
}
 
@@ -1898,7 +1896,7 @@ pmap_page_remove(struct vm_page *pg)

dig vs ipv6 (scoped) addresses

2020-12-17 Thread Otto Moerbeek
Hi,

as noted on misc dig does not like to talk to local link addresses.
This fixes that case. While investigating I also found another bug:
selecting v6 or v4 addresses only from resolv.conf via the -4 or -6
command line argument does not work as expected.

This fixes both. I did not test binding to a src address with this.
This is likely as broken as it was before.

-Otto

Index: dig.c
===
RCS file: /cvs/src/usr.bin/dig/dig.c,v
retrieving revision 1.18
diff -u -p -r1.18 dig.c
--- dig.c   15 Sep 2020 11:47:42 -  1.18
+++ dig.c   17 Dec 2020 11:06:49 -
@@ -1358,7 +1358,7 @@ dash_option(char *option, char *next, di
} else
srcport = 0;
if (have_ipv6 && inet_pton(AF_INET6, value, ) == 1)
-   isc_sockaddr_fromin6(_address, , srcport);
+   isc_sockaddr_fromin6(_address, , srcport, 0);
else if (have_ipv4 && inet_pton(AF_INET, value, ) == 1)
isc_sockaddr_fromin(_address, , srcport);
else
Index: dighost.c
===
RCS file: /cvs/src/usr.bin/dig/dighost.c,v
retrieving revision 1.34
diff -u -p -r1.34 dighost.c
--- dighost.c   15 Sep 2020 11:47:42 -  1.34
+++ dighost.c   17 Dec 2020 11:06:49 -
@@ -540,7 +540,7 @@ get_addresses(const char *hostname, in_p
struct sockaddr_in6 *sin6;
sin6 = (struct sockaddr_in6 *)tmpai->ai_addr;
isc_sockaddr_fromin6([i], >sin6_addr,
-dstport);
+dstport, sin6->sin6_scope_id);
}
i++;
 
@@ -972,7 +972,7 @@ parse_netprefix(struct sockaddr_storage 
 
if (inet_pton(AF_INET6, buf, ) == 1) {
parsed = 1;
-   isc_sockaddr_fromin6(sa, , 0);
+   isc_sockaddr_fromin6(sa, , 0, 0);
if (prefix_length > 128)
prefix_length = 128;
} else if (inet_pton(AF_INET, buf, ) == 1) {
Index: lib/isc/sockaddr.c
===
RCS file: /cvs/src/usr.bin/dig/lib/isc/sockaddr.c,v
retrieving revision 1.14
diff -u -p -r1.14 sockaddr.c
--- lib/isc/sockaddr.c  28 Nov 2020 06:33:55 -  1.14
+++ lib/isc/sockaddr.c  17 Dec 2020 11:06:49 -
@@ -223,7 +223,7 @@ isc_sockaddr_anyofpf(struct sockaddr_sto
 
 void
 isc_sockaddr_fromin6(struct sockaddr_storage *sockaddr, const struct in6_addr 
*ina6,
-in_port_t port)
+in_port_t port, uint32_t scope)
 {
struct sockaddr_in6 *sin6 = (struct sockaddr_in6 *) sockaddr;
memset(sockaddr, 0, sizeof(*sockaddr));
@@ -231,6 +231,7 @@ isc_sockaddr_fromin6(struct sockaddr_sto
sin6->sin6_len = sizeof(*sin6);
sin6->sin6_addr = *ina6;
sin6->sin6_port = htons(port);
+   sin6->sin6_scope_id = scope;
 }
 
 int
Index: lib/isc/include/isc/sockaddr.h
===
RCS file: /cvs/src/usr.bin/dig/lib/isc/include/isc/sockaddr.h,v
retrieving revision 1.7
diff -u -p -r1.7 sockaddr.h
--- lib/isc/include/isc/sockaddr.h  15 Sep 2020 11:47:42 -  1.7
+++ lib/isc/include/isc/sockaddr.h  17 Dec 2020 11:06:49 -
@@ -91,7 +91,7 @@ isc_sockaddr_fromin(struct sockaddr_stor
 
 void
 isc_sockaddr_fromin6(struct sockaddr_storage *sockaddr, const struct in6_addr 
*ina6,
-in_port_t port);
+in_port_t port, uint32_t scope);
 /*%<
  * Construct an struct sockaddr_storage from an IPv6 address and port.
  */
Index: lib/lwres/lwconfig.c
===
RCS file: /cvs/src/usr.bin/dig/lib/lwres/lwconfig.c,v
retrieving revision 1.6
diff -u -p -r1.6 lwconfig.c
--- lib/lwres/lwconfig.c25 Feb 2020 05:00:43 -  1.6
+++ lib/lwres/lwconfig.c17 Dec 2020 11:06:49 -
@@ -231,7 +231,7 @@ lwres_conf_parsenameserver(lwres_conf_t 
 
res = lwres_create_addr(word, , 1);
use_ipv4 = confdata->flags & LWRES_USEIPV4;
-   use_ipv6 = confdata->flags & LWRES_USEIPV4;
+   use_ipv6 = confdata->flags & LWRES_USEIPV6;
if (res == LWRES_R_SUCCESS &&
((address.family == LWRES_ADDRTYPE_V4 && use_ipv4) ||
(address.family == LWRES_ADDRTYPE_V6 && use_ipv6))) {



Re: regress print target name

2020-12-17 Thread Alexander Bluhm
On Wed, Dec 16, 2020 at 04:42:59PM +0100, Alexander Bluhm wrote:
> When debugging tests, it is useful to see the target name and which
> output belongs to it.

A small addition:

Run setup_once targets in a sepearate block with headline before
all other targets.

ok?

bluhm

Index: share/mk/bsd.regress.mk
===
RCS file: /data/mirror/openbsd/cvs/src/share/mk/bsd.regress.mk,v
retrieving revision 1.22
diff -u -p -r1.22 bsd.regress.mk
--- share/mk/bsd.regress.mk 16 Dec 2020 16:53:24 -  1.22
+++ share/mk/bsd.regress.mk 17 Dec 2020 00:56:08 -
@@ -75,13 +75,16 @@ ${REGRESS_TARGETS}: ${REGRESS_SETUP}
 CLEANFILES+=${REGRESS_SETUP_ONCE:S/^/stamp-/}
 ${REGRESS_TARGETS}: ${REGRESS_SETUP_ONCE:S/^/stamp-/}
 ${REGRESS_SETUP_ONCE:S/^/stamp-/}: .SILENT
+   echo ' ${@:S/^stamp-//} '
${MAKE} -C ${.CURDIR} ${@:S/^stamp-//}
date >$@
+   echo
 .endif
 
 regress: .SILENT
 .if !empty(REGRESS_SETUP_ONCE)
rm -f ${REGRESS_SETUP_ONCE:S/^/stamp-/}
+   ${MAKE} -C ${.CURDIR} ${REGRESS_SETUP_ONCE:S/^/stamp-/}
 .endif
 .for RT in ${REGRESS_TARGETS}
echo ' ${RT} '