Re: adjfreq(2): limit adjustment to prevent overflow during tc_windup()

2020-07-06 Thread Nicholas Marriott
10^12 was the old definition of billion in the UK also, although in the
last few decades it has become rare and 10^9 is now the norm.

https://en.wikipedia.org/wiki/Long_and_short_scales has quite a bit about
it.


On Tue, 7 Jul 2020 at 00:27, Scott Cheloha  wrote:

> On Mon, Jul 06, 2020 at 11:57:32PM +0200, Mark Kettenis wrote:
> > > Date: Mon, 6 Jul 2020 16:40:35 -0500
> > > From: Scott Cheloha 
> > >
> > > On Fri, Jul 03, 2020 at 10:52:15AM +0200, Otto Moerbeek wrote:
> > > > On Thu, Jul 02, 2020 at 08:27:58PM -0500, Scott Cheloha wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > When we recompute the scaling factor during tc_windup() there is an
> > > > > opportunity for arithmetic overflow/underflow when we add the NTP
> > > > > adjustment into the scale:
> > > > >
> > > > >649  scale = (u_int64_t)1 << 63;
> > > > >650  scale += \
> > > > >651  ((th->th_adjustment +
> th->th_counter->tc_freq_adj) / 1024) * 2199;
> > > > >652  scale /= th->th_counter->tc_frequency;
> > > > >653  th->th_scale = scale * 2;
> > > > >
> > > > > At lines 650 and 651, you will overflow/underflow if
> > > > > th->th_counter->tc_freq_adj is sufficiently positive/negative.
> > > > >
> > > > > I don't like the idea of checking for that overflow during
> > > > > tc_windup().  We can pick a reasonable adjustment range and check
> for
> > > > > it during adjfreq(2) and that should be good enough.
> > > > >
> > > > > My strawman proposal is a range of -5 to 5 parts
> per
> > > > > billion.  We could push the limits a bit, but half a billion seems
> > > > > like a nice round number to me.
> > > > >
> > > > > On a perfect clock, this means you can effect a 0.5x slowdown or a
> > > > > 1.5x speedup via adjfreq(2), but no slower/faster.
> > > > >
> > > > > I don't *think* ntpd(8) would ever reach such extreme adjustments
> > > > > through its algorithm.  I don't think this will break anyone's
> working
> > > > > setup.
> > > > >
> > > > > (Maybe I'm wrong, though.  otto@?)
> > > >
> > > > Right, ntpd is pretty conversative and won't do big adjustments.
> > >
> > > So, what is the right way to describe these limits?
> > >
> > > "Parts per billion"?  Something else?
> >
> > The traditional way to express clock drift in the context of NTP is
> > "parts per million" or "ppm" for short.  There are good reasons to
> > avoid using "billion" in documentation as a very similar word is used
> > in Germanic languages for 10^12 where in English you mean 10^9.
>
> Huh.  So from what I've gathered:
>
> German  English ISO
>
> Million Million 10^6
> Milliarde   Billion 10^9
> Billion Trillion10^12
> Billiarde   Quadrillion 10^15
> TrillionQuintillion 10^18
>
> Potentially confusing.
>
> The "German Connection" to NTP in particular eludes me, though.
>
> > Stripping three zeroes also makes it easier to read.  [...]
>
> Yes, it always does.  It looks nicer as "N ppm" in the mandoc(1)
> output, too.
>
> --
>
> otto@, from what you said I take it you're OK with the new limits so
> I'm going commit it as follows in a day or two.
>
> Index: lib/libc/sys/adjfreq.2
> ===
> RCS file: /cvs/src/lib/libc/sys/adjfreq.2,v
> retrieving revision 1.7
> diff -u -p -r1.7 adjfreq.2
> --- lib/libc/sys/adjfreq.2  10 Sep 2015 17:55:21 -  1.7
> +++ lib/libc/sys/adjfreq.2  6 Jul 2020 23:00:24 -
> @@ -60,6 +60,9 @@ The
>  .Fa freq
>  argument is non-null and the process's effective user ID is not that
>  of the superuser.
> +.It Bq Er EINVAL
> +.Fa freq
> +is less than -50 ppm or greater than 50 ppm.
>  .El
>  .Sh SEE ALSO
>  .Xr date 1 ,
> Index: sys/kern/kern_time.c
> ===
> RCS file: /cvs/src/sys/kern/kern_time.c,v
> retrieving revision 1.131
> diff -u -p -r1.131 kern_time.c
> --- sys/kern/kern_time.c22 Jun 2020 18:25:57 -  1.131
> +++ sys/kern/kern_time.c6 Jul 2020 23:00:24 -
> @@ -391,6 +391,9 @@ sys_settimeofday(struct proc *p, void *v
> return (0);
>  }
>
> +#define ADJFREQ_MAX (5LL << 32)
> +#define ADJFREQ_MIN (-5LL << 32)
> +
>  int
>  sys_adjfreq(struct proc *p, void *v, register_t *retval)
>  {
> @@ -408,6 +411,8 @@ sys_adjfreq(struct proc *p, void *v, reg
> return (error);
> if ((error = copyin(freq, , sizeof(f
> return (error);
> +   if (f < ADJFREQ_MIN || f > ADJFREQ_MAX)
> +   return (EINVAL);
> }
>
> rw_enter(_lock, (freq == NULL) ? RW_READ : RW_WRITE);
>
>


ws(4): Wheel emulation touch scroll on touchscreens

2020-07-06 Thread wdaver

My first diff...

This patch has the ws driver update mouse cursor absolute position
before scrolling begins on a touchscreen when wheel emulation is
enabled.

This change should not impact other uses of wheel emulation.

Is this ok?

Index: driver/xf86-input-ws/src/ws.c
===
RCS file: /cvs/xenocara/driver/xf86-input-ws/src/ws.c,v
retrieving revision 1.64
diff -u -p -u -p -r1.64 driver/xf86-input-ws/src/ws.c
--- driver/xf86-input-ws/src/ws.c    24 Mar 2019 17:59:19 -  1.64
+++ driver/xf86-input-ws/src/ws.c    6 Jul 2020 21:57:03 -
@@ -679,7 +679,8 @@ wsReadInput(InputInfoPtr pInfo)
         int ydelta = hw.ay - priv->old_ay;
         priv->old_ax = hw.ax;
         priv->old_ay = hw.ay;
-        if (wsWheelEmuFilterMotion(pInfo, xdelta, ydelta))
+        if (wsWheelEmuFilterMotion(pInfo, xdelta, ydelta)
+        && priv->type != WSMOUSE_TYPE_TPANEL)
             return;

         /* absolute position event */



Re: userland clock_gettime proof of concept

2020-07-06 Thread Vitaliy Makkoveev
Sorry for late reaction. At least VirtualBox based virtual machines
started to panic with the recent kernel. I reverted your diff and panics
stopped.
Screenshot attached.


Re: userland clock_gettime proof of concept

2020-07-06 Thread Vitaliy Makkoveev


> On 5 Jul 2020, at 20:31, Paul Irofti  wrote:
> 
> On Fri, Jul 03, 2020 at 06:36:39PM +0300, Paul Irofti wrote:
>> 
>> 
>> În 3 iulie 2020 17:55:25 EEST, Mark Kettenis  a 
>> scris:
 Date: Fri, 3 Jul 2020 15:13:22 +0200
 From: Robert Nagy 
 
 On 02/07/20 00:31 +0100, Stuart Henderson wrote:
> running on 38 of these, btw.
 
 been running with this on all my workstations and laptops and on 3
>>> build
 servers as well
>>> 
>>> Are the issue that naddy@ saw solved?
>>> 
>>> Did anybody do a *proper* test on anything besides amd64?  Especially
>>> on architectures where the optimized clock_gettime is *not* available?
>> 
>> Yes and yes. 
> 
> So, can we go ahead with this?
> 

Sorry for late reaction. At least VirtualBox based virtual machines started to
panic with the recent kernel. I reverted your diff and panics stopped.

Screenshot attached.



Re: [patch] dhclient(8) crashes with vm.malloc_conf=J

2020-07-06 Thread David Gwynne



> On 7 Jul 2020, at 7:57 am, Jesper Wallin  wrote:
> 
> Hi all,
> 
> I received a segmentation fault from dhclient(8) upon boot and decided
> to investigate...  My system is running with vm.malloc_conf=CFGJUR and
> figured one of those options was the cause of the crash.  I noticed that
> the buffer which holds my config options contained a lot of junk at the
> end and learned that 'J' is to blame together with a missing \0.
> 
> 
> How to reproduce:
> # sysctl vm.malloc_conf=J
> # cp /etc/dhclient.conf /etc/dhclient.conf.backup
> # echo 'supersede domain-name "ifconfig.se";' > /etc/dhclient.conf
> 
> Then run 'dhclient if0' a lot of times until it crashes, sometimes it
> takes more than 100 attempts.  Using vm.malloc_conf=CFGJUR might trigger
> it faster.
> 
> 
> In clparse.c:916, malloc(3) is used to get a buffer of the same length
> as the option in the config file.  But with 'J' in vm.malloc_conf, the
> buffer is bigger and contains junk.  I wouldn't say that my fix is the
> prettiest, but I get an extra byte and zero out the buffer.  Maybe
> someone has a more elegant fix for this.
> 
> 
> Yours,
> Jesper Wallin

you might want to put the memset after the check to see if the malloc failed...

> Index: clparse.c
> ===
> RCS file: /cvs/src/sbin/dhclient/clparse.c,v
> retrieving revision 1.199
> diff -u -p -r1.199 clparse.c
> --- clparse.c 13 May 2020 20:55:41 -  1.199
> +++ clparse.c 6 Jul 2020 21:25:54 -
> @@ -913,7 +913,8 @@ parse_option(FILE *cfile, int *code, str
>   } while (*fmt == 'A' && token == ',');
> 
>   free(options[i].data);
> - options[i].data = malloc(hunkix);
> + options[i].data = malloc(hunkix+1);
> + memset(options[i].data, 0, hunkix+1);
>   if (options[i].data == NULL)
>   fatal("option data");
>   memcpy(options[i].data, hunkbuf, hunkix);
> 



Re: pipex(4): kill pipexintr()

2020-07-06 Thread Vitaliy Makkoveev
On Mon, Jul 06, 2020 at 08:47:23PM +0200, Martin Pieuchot wrote:
> On 06/07/20(Mon) 19:23, Vitaliy Makkoveev wrote:
> > > On 6 Jul 2020, at 17:36, Martin Pieuchot  wrote:
> > [...] 
> > Unfortunately you can’t be sure about NET_LOCK() status while you are
> > in pppac_start(). It was described at this thread [1].
> > 
> > We have two cases:
> > 1. pppac_start() called from pppac_output(). NET_LOCK() was inherited.
> 
> Such recursions should be avoided.  if_enqueue() should take care of
> that.

I suggest to finish the route to if_get(9) before. Updated diff which
removes pipexintr() below. Just against the most resent source tree.

Index: lib/libc/sys/sysctl.2
===
RCS file: /cvs/src/lib/libc/sys/sysctl.2,v
retrieving revision 1.40
diff -u -p -r1.40 sysctl.2
--- lib/libc/sys/sysctl.2   17 May 2020 05:48:39 -  1.40
+++ lib/libc/sys/sysctl.2   6 Jul 2020 21:55:16 -
@@ -2033,35 +2033,11 @@ The currently defined variable names are
 .Bl -column "Third level name" "integer" "Changeable" -offset indent
 .It Sy "Third level name" Ta Sy "Type" Ta Sy "Changeable"
 .It Dv PIPEXCTL_ENABLE Ta integer Ta yes
-.It Dv PIPEXCTL_INQ Ta node Ta not applicable
-.It Dv PIPEXCTL_OUTQ Ta node Ta not applicable
 .El
 .Bl -tag -width "123456"
 .It Dv PIPEXCTL_ENABLE
 If set to 1, enable PIPEX processing.
 The default is 0.
-.It Dv PIPEXCTL_INQ Pq Va net.pipex.inq
-Fourth level comprises an array of
-.Vt struct ifqueue
-structures containing information about the PIPEX packet input queue.
-The forth level names for the elements of
-.Vt struct ifqueue
-are the same as described in
-.Li ip.arpq
-in the
-.Dv PF_INET
-section.
-.It Dv PIPEXCTL_OUTQ Pq Va net.pipex.outq
-Fourth level comprises an array of
-.Vt struct ifqueue
-structures containing information about PIPEX packet output queue.
-The forth level names for the elements of
-.Vt struct ifqueue
-are the same as described in
-.Li ip.arpq
-in the
-.Dv PF_INET
-section.
 .El
 .El
 .Ss CTL_VFS
Index: sys/net/if.c
===
RCS file: /cvs/src/sys/net/if.c,v
retrieving revision 1.611
diff -u -p -r1.611 if.c
--- sys/net/if.c30 Jun 2020 09:31:38 -  1.611
+++ sys/net/if.c6 Jul 2020 21:55:17 -
@@ -1012,13 +1012,6 @@ if_netisr(void *unused)
KERNEL_UNLOCK();
}
 #endif
-#ifdef PIPEX
-   if (n & (1 << NETISR_PIPEX)) {
-   KERNEL_LOCK();
-   pipexintr();
-   KERNEL_UNLOCK();
-   }
-#endif
t |= n;
}
 
Index: sys/net/netisr.h
===
RCS file: /cvs/src/sys/net/netisr.h,v
retrieving revision 1.51
diff -u -p -r1.51 netisr.h
--- sys/net/netisr.h6 Aug 2019 22:57:54 -   1.51
+++ sys/net/netisr.h6 Jul 2020 21:55:17 -
@@ -48,7 +48,6 @@
 #defineNETISR_IPV6 24  /* same as AF_INET6 */
 #defineNETISR_ISDN 26  /* same as AF_E164 */
 #defineNETISR_PPP  28  /* for PPP processing */
-#defineNETISR_PIPEX27  /* for pipex processing */
 #defineNETISR_BRIDGE   29  /* for bridge processing */
 #defineNETISR_PPPOE30  /* for pppoe processing */
 #defineNETISR_SWITCH   31  /* for switch dataplane */
@@ -68,7 +67,6 @@ void  bridgeintr(void);
 void   pppoeintr(void);
 void   switchintr(void);
 void   pfsyncintr(void);
-void   pipexintr(void);
 
 #defineschednetisr(anisr)  
\
 do {   \
Index: sys/net/pipex.c
===
RCS file: /cvs/src/sys/net/pipex.c,v
retrieving revision 1.119
diff -u -p -r1.119 pipex.c
--- sys/net/pipex.c 6 Jul 2020 20:37:51 -   1.119
+++ sys/net/pipex.c 6 Jul 2020 21:55:17 -
@@ -97,10 +97,6 @@ struct radix_node_head   *pipex_rd_head6 =
 struct timeout pipex_timer_ch; /* callout timer context */
 int pipex_prune = 1;   /* walk list every seconds */
 
-/* pipex traffic queue */
-struct mbuf_queue pipexinq = MBUF_QUEUE_INITIALIZER(IFQ_MAXLEN, IPL_NET);
-struct mbuf_queue pipexoutq = MBUF_QUEUE_INITIALIZER(IFQ_MAXLEN, IPL_NET);
-
 /* borrow an mbuf pkthdr field */
 #define ph_ppp_proto ether_vtag
 
@@ -713,82 +709,6 @@ pipex_lookup_by_session_id(int protocol,
 }
 
 /***
- * Queue and Software Interrupt Handler
- ***/
-void
-pipexintr(void)
-{
-   struct pipex_session *pkt_session;
-   u_int16_t proto;
-   struct mbuf *m;
-   struct mbuf_list ml;
-
-   /* ppp output */
-   

Re: amd64: lapic: refactor lapic timer programming

2020-07-06 Thread Mike Larkin
On Fri, Jul 03, 2020 at 07:41:45PM -0500, Scott Cheloha wrote:
> Hi,
>
> I want to run the lapic timer in one-shot mode on amd64 as we do with
> other interrupt clocks on other platforms.  I aim to make the clock
> interrupt code MD where possible.
>
> However, nobody is going to test my MD clock interrupt work unless
> amd64 is ready to use it.  amd64 doesn't run in oneshot mode so there
> is preliminary work to do first.
>
> --
>
> Before we can run the lapic timer in one-shot mode we need to simplify
> the process of actually programming it.
>
> This patch refactors all lapic timer programming into a single
> routine.  We don't use any divisor other than 1 so I don't see a need
> to make it a parameter to lapic_timer_arm().  We can add TSC deadline
> support later if someone wants it.
>
> The way we program the timer differs from how e.g. Darwin and FreeBSD
> and Linux do it.  They write:
>
>  - lvtt (mode + vector + (maybe) mask)
>  - dcr
>  - icr
>
> while we do:
>
>  - lvtt (mode + mask)
>  - dcr
>  - icr
>  - (maybe) lvtt (mode + vector)
>
> I don't see a reason to arm the timer with four writes instead of
> three, so in this patch I use the three-write ordering.
>
> Am I missing something?  Do I need to disable interrupts before I
> reprogram the timer?
>

This reads ok to me. I am not aware of any requirements to disable
interrupts while reprogramming the timer.

-ml

> -Scott
>
> Index: lapic.c
> ===
> RCS file: /cvs/src/sys/arch/amd64/amd64/lapic.c,v
> retrieving revision 1.55
> diff -u -p -r1.55 lapic.c
> --- lapic.c   3 Aug 2019 14:57:51 -   1.55
> +++ lapic.c   4 Jul 2020 00:40:26 -
> @@ -413,6 +413,42 @@ u_int32_t lapic_frac_usec_per_cycle;
>  u_int64_t lapic_frac_cycle_per_usec;
>  u_int32_t lapic_delaytab[26];
>
> +void lapic_timer_arm(uint32_t, int, uint32_t);
> +void lapic_timer_arm_once(int, uint32_t);
> +void lapic_timer_arm_period(int, uint32_t);
> +
> +/*
> + * Start the local apic countdown timer.
> + *
> + * First set the mode, vector, and (maybe) the mask.
> + * then set the divisor,
> + * and finally set the cycle count.
> + */
> +void
> +lapic_timer_arm(uint32_t mode, int masked, uint32_t cycles)
> +{
> + uint32_t lvtt;
> +
> + lvtt = mode | LAPIC_TIMER_VECTOR;
> + lvtt |= (masked) ? LAPIC_LVTT_M : 0;
> +
> + lapic_writereg(LAPIC_LVTT, lvtt);
> + lapic_writereg(LAPIC_DCR_TIMER, LAPIC_DCRT_DIV1);
> + lapic_writereg(LAPIC_ICR_TIMER, cycles);
> +}
> +
> +void
> +lapic_timer_arm_once(int masked, uint32_t cycles)
> +{
> + lapic_timer_arm(LAPIC_LVTT_TM_ONESHOT, masked, cycles);
> +}
> +
> +void
> +lapic_timer_arm_period(int masked, uint32_t cycles)
> +{
> + lapic_timer_arm(LAPIC_LVTT_TM_PERIODIC, masked, cycles);
> +}
> +
>  void
>  lapic_clockintr(void *arg, struct intrframe frame)
>  {
> @@ -430,17 +466,7 @@ lapic_clockintr(void *arg, struct intrfr
>  void
>  lapic_startclock(void)
>  {
> - /*
> -  * Start local apic countdown timer running, in repeated mode.
> -  *
> -  * Mask the clock interrupt and set mode,
> -  * then set divisor,
> -  * then unmask and set the vector.
> -  */
> - lapic_writereg(LAPIC_LVTT, LAPIC_LVTT_TM|LAPIC_LVTT_M);
> - lapic_writereg(LAPIC_DCR_TIMER, LAPIC_DCRT_DIV1);
> - lapic_writereg(LAPIC_ICR_TIMER, lapic_tval);
> - lapic_writereg(LAPIC_LVTT, LAPIC_LVTT_TM|LAPIC_TIMER_VECTOR);
> + lapic_timer_arm_period(0, lapic_tval);
>  }
>
>  void
> @@ -498,9 +524,7 @@ lapic_calibrate_timer(struct cpu_info *c
>* Configure timer to one-shot, interrupt masked,
>* large positive number.
>*/
> - lapic_writereg(LAPIC_LVTT, LAPIC_LVTT_M);
> - lapic_writereg(LAPIC_DCR_TIMER, LAPIC_DCRT_DIV1);
> - lapic_writereg(LAPIC_ICR_TIMER, 0x8000);
> + lapic_timer_arm_once(1, 0x8000);
>
>   s = intr_disable();
>
> @@ -540,10 +564,7 @@ skip_calibration:
>   lapic_tval = (lapic_per_second * 2) / hz;
>   lapic_tval = (lapic_tval / 2) + (lapic_tval & 0x1);
>
> - lapic_writereg(LAPIC_LVTT, LAPIC_LVTT_TM | LAPIC_LVTT_M |
> - LAPIC_TIMER_VECTOR);
> - lapic_writereg(LAPIC_DCR_TIMER, LAPIC_DCRT_DIV1);
> - lapic_writereg(LAPIC_ICR_TIMER, lapic_tval);
> + lapic_timer_arm_period(0, lapic_tval);
>
>   /*
>* Compute fixed-point ratios between cycles and
>



Re: userland clock_gettime proof of concept

2020-07-06 Thread Theo de Raadt
Ah, it was seen.

But everyone has too much memory these days.

Vitaliy Makkoveev  wrote:

> Sorry for late reaction. At least VirtualBox based virtual machines
> started to panic with the recent kernel. I reverted your diff and panics
> stopped.
> Screenshot attached.



Re: adjfreq(2): limit adjustment to prevent overflow during tc_windup()

2020-07-06 Thread Scott Cheloha
On Mon, Jul 06, 2020 at 11:57:32PM +0200, Mark Kettenis wrote:
> > Date: Mon, 6 Jul 2020 16:40:35 -0500
> > From: Scott Cheloha 
> > 
> > On Fri, Jul 03, 2020 at 10:52:15AM +0200, Otto Moerbeek wrote:
> > > On Thu, Jul 02, 2020 at 08:27:58PM -0500, Scott Cheloha wrote:
> > > 
> > > > Hi,
> > > > 
> > > > When we recompute the scaling factor during tc_windup() there is an
> > > > opportunity for arithmetic overflow/underflow when we add the NTP
> > > > adjustment into the scale:
> > > > 
> > > >649  scale = (u_int64_t)1 << 63;
> > > >650  scale += \
> > > >651  ((th->th_adjustment + th->th_counter->tc_freq_adj) 
> > > > / 1024) * 2199;
> > > >652  scale /= th->th_counter->tc_frequency;
> > > >653  th->th_scale = scale * 2;
> > > > 
> > > > At lines 650 and 651, you will overflow/underflow if
> > > > th->th_counter->tc_freq_adj is sufficiently positive/negative.
> > > > 
> > > > I don't like the idea of checking for that overflow during
> > > > tc_windup().  We can pick a reasonable adjustment range and check for
> > > > it during adjfreq(2) and that should be good enough.
> > > > 
> > > > My strawman proposal is a range of -5 to 5 parts per
> > > > billion.  We could push the limits a bit, but half a billion seems
> > > > like a nice round number to me.
> > > > 
> > > > On a perfect clock, this means you can effect a 0.5x slowdown or a
> > > > 1.5x speedup via adjfreq(2), but no slower/faster.
> > > > 
> > > > I don't *think* ntpd(8) would ever reach such extreme adjustments
> > > > through its algorithm.  I don't think this will break anyone's working
> > > > setup.
> > > > 
> > > > (Maybe I'm wrong, though.  otto@?)
> > > 
> > > Right, ntpd is pretty conversative and won't do big adjustments.
> > 
> > So, what is the right way to describe these limits?
> > 
> > "Parts per billion"?  Something else?
> 
> The traditional way to express clock drift in the context of NTP is
> "parts per million" or "ppm" for short.  There are good reasons to
> avoid using "billion" in documentation as a very similar word is used
> in Germanic languages for 10^12 where in English you mean 10^9.

Huh.  So from what I've gathered:

German  English ISO

Million Million 10^6
Milliarde   Billion 10^9
Billion Trillion10^12
Billiarde   Quadrillion 10^15
TrillionQuintillion 10^18

Potentially confusing.

The "German Connection" to NTP in particular eludes me, though.

> Stripping three zeroes also makes it easier to read.  [...]

Yes, it always does.  It looks nicer as "N ppm" in the mandoc(1)
output, too.

--

otto@, from what you said I take it you're OK with the new limits so
I'm going commit it as follows in a day or two.

Index: lib/libc/sys/adjfreq.2
===
RCS file: /cvs/src/lib/libc/sys/adjfreq.2,v
retrieving revision 1.7
diff -u -p -r1.7 adjfreq.2
--- lib/libc/sys/adjfreq.2  10 Sep 2015 17:55:21 -  1.7
+++ lib/libc/sys/adjfreq.2  6 Jul 2020 23:00:24 -
@@ -60,6 +60,9 @@ The
 .Fa freq
 argument is non-null and the process's effective user ID is not that
 of the superuser.
+.It Bq Er EINVAL
+.Fa freq
+is less than -50 ppm or greater than 50 ppm.
 .El
 .Sh SEE ALSO
 .Xr date 1 ,
Index: sys/kern/kern_time.c
===
RCS file: /cvs/src/sys/kern/kern_time.c,v
retrieving revision 1.131
diff -u -p -r1.131 kern_time.c
--- sys/kern/kern_time.c22 Jun 2020 18:25:57 -  1.131
+++ sys/kern/kern_time.c6 Jul 2020 23:00:24 -
@@ -391,6 +391,9 @@ sys_settimeofday(struct proc *p, void *v
return (0);
 }
 
+#define ADJFREQ_MAX (5LL << 32)
+#define ADJFREQ_MIN (-5LL << 32)
+
 int
 sys_adjfreq(struct proc *p, void *v, register_t *retval)
 {
@@ -408,6 +411,8 @@ sys_adjfreq(struct proc *p, void *v, reg
return (error);
if ((error = copyin(freq, , sizeof(f
return (error);
+   if (f < ADJFREQ_MIN || f > ADJFREQ_MAX)
+   return (EINVAL);
}
 
rw_enter(_lock, (freq == NULL) ? RW_READ : RW_WRITE);



Re: userland clock_gettime proof of concept

2020-07-06 Thread Mark Kettenis
> From: Vitaliy Makkoveev 
> Date: Tue, 7 Jul 2020 01:34:33 +0300
> 
> > On 5 Jul 2020, at 20:31, Paul Irofti  wrote:
> > 
> > On Fri, Jul 03, 2020 at 06:36:39PM +0300, Paul Irofti wrote:
> >> 
> >> 
> >> În 3 iulie 2020 17:55:25 EEST, Mark Kettenis  a 
> >> scris:
>  Date: Fri, 3 Jul 2020 15:13:22 +0200
>  From: Robert Nagy 
>  
>  On 02/07/20 00:31 +0100, Stuart Henderson wrote:
> > running on 38 of these, btw.
>  
>  been running with this on all my workstations and laptops and on 3
> >>> build
>  servers as well
> >>> 
> >>> Are the issue that naddy@ saw solved?
> >>> 
> >>> Did anybody do a *proper* test on anything besides amd64?  Especially
> >>> on architectures where the optimized clock_gettime is *not* available?
> >> 
> >> Yes and yes. 
> > 
> > So, can we go ahead with this?
> > 
> 
> Sorry for late reaction. At least VirtualBox based virtual machines
> started to panic with the recent kernel. I reverted your diff and
> panics stopped.

Should already be fixed.



[patch] dhclient(8) crashes with vm.malloc_conf=J

2020-07-06 Thread Jesper Wallin
Hi all,

I received a segmentation fault from dhclient(8) upon boot and decided
to investigate...  My system is running with vm.malloc_conf=CFGJUR and
figured one of those options was the cause of the crash.  I noticed that
the buffer which holds my config options contained a lot of junk at the
end and learned that 'J' is to blame together with a missing \0.


How to reproduce:
# sysctl vm.malloc_conf=J
# cp /etc/dhclient.conf /etc/dhclient.conf.backup
# echo 'supersede domain-name "ifconfig.se";' > /etc/dhclient.conf

Then run 'dhclient if0' a lot of times until it crashes, sometimes it
takes more than 100 attempts.  Using vm.malloc_conf=CFGJUR might trigger
it faster.


In clparse.c:916, malloc(3) is used to get a buffer of the same length
as the option in the config file.  But with 'J' in vm.malloc_conf, the
buffer is bigger and contains junk.  I wouldn't say that my fix is the
prettiest, but I get an extra byte and zero out the buffer.  Maybe
someone has a more elegant fix for this.


Yours,
Jesper Wallin


Index: clparse.c
===
RCS file: /cvs/src/sbin/dhclient/clparse.c,v
retrieving revision 1.199
diff -u -p -r1.199 clparse.c
--- clparse.c   13 May 2020 20:55:41 -  1.199
+++ clparse.c   6 Jul 2020 21:25:54 -
@@ -913,7 +913,8 @@ parse_option(FILE *cfile, int *code, str
} while (*fmt == 'A' && token == ',');
 
free(options[i].data);
-   options[i].data = malloc(hunkix);
+   options[i].data = malloc(hunkix+1);
+   memset(options[i].data, 0, hunkix+1);
if (options[i].data == NULL)
fatal("option data");
memcpy(options[i].data, hunkbuf, hunkix);



Re: adjfreq(2): limit adjustment to prevent overflow during tc_windup()

2020-07-06 Thread Mark Kettenis
> Date: Mon, 6 Jul 2020 16:40:35 -0500
> From: Scott Cheloha 
> 
> On Fri, Jul 03, 2020 at 10:52:15AM +0200, Otto Moerbeek wrote:
> > On Thu, Jul 02, 2020 at 08:27:58PM -0500, Scott Cheloha wrote:
> > 
> > > Hi,
> > > 
> > > When we recompute the scaling factor during tc_windup() there is an
> > > opportunity for arithmetic overflow/underflow when we add the NTP
> > > adjustment into the scale:
> > > 
> > >649  scale = (u_int64_t)1 << 63;
> > >650  scale += \
> > >651  ((th->th_adjustment + th->th_counter->tc_freq_adj) / 
> > > 1024) * 2199;
> > >652  scale /= th->th_counter->tc_frequency;
> > >653  th->th_scale = scale * 2;
> > > 
> > > At lines 650 and 651, you will overflow/underflow if
> > > th->th_counter->tc_freq_adj is sufficiently positive/negative.
> > > 
> > > I don't like the idea of checking for that overflow during
> > > tc_windup().  We can pick a reasonable adjustment range and check for
> > > it during adjfreq(2) and that should be good enough.
> > > 
> > > My strawman proposal is a range of -5 to 5 parts per
> > > billion.  We could push the limits a bit, but half a billion seems
> > > like a nice round number to me.
> > > 
> > > On a perfect clock, this means you can effect a 0.5x slowdown or a
> > > 1.5x speedup via adjfreq(2), but no slower/faster.
> > > 
> > > I don't *think* ntpd(8) would ever reach such extreme adjustments
> > > through its algorithm.  I don't think this will break anyone's working
> > > setup.
> > > 
> > > (Maybe I'm wrong, though.  otto@?)
> > 
> > Right, ntpd is pretty conversative and won't do big adjustments.
> 
> So, what is the right way to describe these limits?
> 
> "Parts per billion"?  Something else?

The traditional way to express clock drift in the context of NTP is
"parts per million" or "ppm" for short.  There are good reasons to
avoid using "billion" in documentation as a very similar word is used
in Germanic languages for 10^12 where in English you mean 10^9.
Stripping three zeroes also makes it easier to read.

I don't think the fact that this is expressed in the code as "parts
per billion" is an issue.

Cheers,

Mark

> Index: lib/libc/sys/adjfreq.2
> ===
> RCS file: /cvs/src/lib/libc/sys/adjfreq.2,v
> retrieving revision 1.7
> diff -u -p -r1.7 adjfreq.2
> --- lib/libc/sys/adjfreq.210 Sep 2015 17:55:21 -  1.7
> +++ lib/libc/sys/adjfreq.26 Jul 2020 21:39:37 -
> @@ -60,6 +60,10 @@ The
>  .Fa freq
>  argument is non-null and the process's effective user ID is not that
>  of the superuser.
> +.It Bq Er EINVAL
> +.Fa freq
> +is less than -5 parts-per-billion or greater than 5
> +parts-per-billion.
>  .El
>  .Sh SEE ALSO
>  .Xr date 1 ,
> Index: sys/kern/kern_time.c
> ===
> RCS file: /cvs/src/sys/kern/kern_time.c,v
> retrieving revision 1.131
> diff -u -p -r1.131 kern_time.c
> --- sys/kern/kern_time.c  22 Jun 2020 18:25:57 -  1.131
> +++ sys/kern/kern_time.c  6 Jul 2020 21:39:38 -
> @@ -391,6 +391,9 @@ sys_settimeofday(struct proc *p, void *v
>   return (0);
>  }
>  
> +#define ADJFREQ_MAX (5LL << 32)
> +#define ADJFREQ_MIN (-5LL << 32)
> +
>  int
>  sys_adjfreq(struct proc *p, void *v, register_t *retval)
>  {
> @@ -408,6 +411,8 @@ sys_adjfreq(struct proc *p, void *v, reg
>   return (error);
>   if ((error = copyin(freq, , sizeof(f
>   return (error);
> + if (f < ADJFREQ_MIN || f > ADJFREQ_MAX)
> + return (EINVAL);
>   }
>  
>   rw_enter(_lock, (freq == NULL) ? RW_READ : RW_WRITE);
> 
> 



Use CPU_IS_PRIMARY macro in identifycpu() on amd64

2020-07-06 Thread Frederic Cambus
Hi tech@,

While having a glance at identcpu.c to see how CPU flags were printed
for each CPU, I noticed we have the following macro in cpu.h:

#define CPU_IS_PRIMARY(ci)  ((ci)->ci_flags & CPUF_PRIMARY)

Here is a diff to use it in identifycpu() on amd64.

Comments? OK?

Index: sys/arch/amd64/amd64/identcpu.c
===
RCS file: /cvs/src/sys/arch/amd64/amd64/identcpu.c,v
retrieving revision 1.115
diff -u -p -r1.115 identcpu.c
--- sys/arch/amd64/amd64/identcpu.c 27 May 2020 05:08:53 -  1.115
+++ sys/arch/amd64/amd64/identcpu.c 6 Jul 2020 21:04:13 -
@@ -480,7 +480,7 @@ identifycpu(struct cpu_info *ci)
ci->ci_efeature_ecx, ci->ci_feature_eflags);
/* Other bits may clash */
ci->ci_feature_flags |= (ci->ci_feature_eflags & CPUID_NXE);
-   if (ci->ci_flags & CPUF_PRIMARY)
+   if (CPU_IS_PRIMARY(ci))
ecpu_ecxfeature = ci->ci_efeature_ecx;
/* Let cpu_feature be the common bits */
cpu_feature &= ci->ci_feature_flags;
@@ -515,7 +515,7 @@ identifycpu(struct cpu_info *ci)
sizeof(mycpu_model));
 
/* If primary cpu, fill in the global cpu_model used by sysctl */
-   if (ci->ci_flags & CPUF_PRIMARY)
+   if (CPU_IS_PRIMARY(ci))
strlcpy(cpu_model, mycpu_model, sizeof(cpu_model));
 
ci->ci_family = (ci->ci_signature >> 8) & 0x0f;
@@ -561,7 +561,7 @@ identifycpu(struct cpu_info *ci)
printf(", %llu.%02llu MHz", (freq + 4999) / 100,
((freq + 4999) / 1) % 100);
 
-   if (ci->ci_flags & CPUF_PRIMARY) {
+   if (CPU_IS_PRIMARY(ci)) {
cpuspeed = (freq + 4999) / 100;
cpu_cpuspeed = cpu_amd64speed;
}
@@ -689,7 +689,7 @@ identifycpu(struct cpu_info *ci)
ci->ci_dev->dv_xname);
}
 
-   if (ci->ci_flags & CPUF_PRIMARY) {
+   if (CPU_IS_PRIMARY(ci)) {
 #ifndef SMALL_KERNEL
if (!strcmp(cpu_vendor, "AuthenticAMD") &&
ci->ci_pnfeatset >= 0x8007) {
@@ -737,7 +737,7 @@ identifycpu(struct cpu_info *ci)
 #endif
 
 #ifdef CRYPTO
-   if (ci->ci_flags & CPUF_PRIMARY) {
+   if (CPU_IS_PRIMARY(ci)) {
if (cpu_ecxfeature & CPUIDECX_PCLMUL)
amd64_has_pclmul = 1;
 



Re: adjfreq(2): limit adjustment to prevent overflow during tc_windup()

2020-07-06 Thread Scott Cheloha
On Fri, Jul 03, 2020 at 10:52:15AM +0200, Otto Moerbeek wrote:
> On Thu, Jul 02, 2020 at 08:27:58PM -0500, Scott Cheloha wrote:
> 
> > Hi,
> > 
> > When we recompute the scaling factor during tc_windup() there is an
> > opportunity for arithmetic overflow/underflow when we add the NTP
> > adjustment into the scale:
> > 
> >649  scale = (u_int64_t)1 << 63;
> >650  scale += \
> >651  ((th->th_adjustment + th->th_counter->tc_freq_adj) / 
> > 1024) * 2199;
> >652  scale /= th->th_counter->tc_frequency;
> >653  th->th_scale = scale * 2;
> > 
> > At lines 650 and 651, you will overflow/underflow if
> > th->th_counter->tc_freq_adj is sufficiently positive/negative.
> > 
> > I don't like the idea of checking for that overflow during
> > tc_windup().  We can pick a reasonable adjustment range and check for
> > it during adjfreq(2) and that should be good enough.
> > 
> > My strawman proposal is a range of -5 to 5 parts per
> > billion.  We could push the limits a bit, but half a billion seems
> > like a nice round number to me.
> > 
> > On a perfect clock, this means you can effect a 0.5x slowdown or a
> > 1.5x speedup via adjfreq(2), but no slower/faster.
> > 
> > I don't *think* ntpd(8) would ever reach such extreme adjustments
> > through its algorithm.  I don't think this will break anyone's working
> > setup.
> > 
> > (Maybe I'm wrong, though.  otto@?)
> 
> Right, ntpd is pretty conversative and won't do big adjustments.

So, what is the right way to describe these limits?

"Parts per billion"?  Something else?

Index: lib/libc/sys/adjfreq.2
===
RCS file: /cvs/src/lib/libc/sys/adjfreq.2,v
retrieving revision 1.7
diff -u -p -r1.7 adjfreq.2
--- lib/libc/sys/adjfreq.2  10 Sep 2015 17:55:21 -  1.7
+++ lib/libc/sys/adjfreq.2  6 Jul 2020 21:39:37 -
@@ -60,6 +60,10 @@ The
 .Fa freq
 argument is non-null and the process's effective user ID is not that
 of the superuser.
+.It Bq Er EINVAL
+.Fa freq
+is less than -5 parts-per-billion or greater than 5
+parts-per-billion.
 .El
 .Sh SEE ALSO
 .Xr date 1 ,
Index: sys/kern/kern_time.c
===
RCS file: /cvs/src/sys/kern/kern_time.c,v
retrieving revision 1.131
diff -u -p -r1.131 kern_time.c
--- sys/kern/kern_time.c22 Jun 2020 18:25:57 -  1.131
+++ sys/kern/kern_time.c6 Jul 2020 21:39:38 -
@@ -391,6 +391,9 @@ sys_settimeofday(struct proc *p, void *v
return (0);
 }
 
+#define ADJFREQ_MAX (5LL << 32)
+#define ADJFREQ_MIN (-5LL << 32)
+
 int
 sys_adjfreq(struct proc *p, void *v, register_t *retval)
 {
@@ -408,6 +411,8 @@ sys_adjfreq(struct proc *p, void *v, reg
return (error);
if ((error = copyin(freq, , sizeof(f
return (error);
+   if (f < ADJFREQ_MIN || f > ADJFREQ_MAX)
+   return (EINVAL);
}
 
rw_enter(_lock, (freq == NULL) ? RW_READ : RW_WRITE);



Re: ksh(1) set -o pipefail

2020-07-06 Thread Todd C . Miller
On Mon, 06 Jul 2020 22:22:36 +0200, Jeremie Courreges-Anglas wrote:

> Requested by ajacoutot@, here's an attempt at implementing set -o
> pipefail.  As pointed by sthen@ this option should be included in the
> next POSIX standard:
>
>   https://www.austingroupbugs.net/view.php?id=789
>
> There are several ways to implement it, the diff I showed to Antoine was
> based on option 2) in
>
>   https://www.austingroupbugs.net/view.php?id=789#c4102
>
> whereas the diff below implements option 1) described by kre@NetBSD,
> which looks like the most sensible approach.

Looks good to me.  OK millert@

 - todd



ksh(1) set -o pipefail

2020-07-06 Thread Jeremie Courreges-Anglas


Requested by ajacoutot@, here's an attempt at implementing set -o
pipefail.  As pointed by sthen@ this option should be included in the
next POSIX standard:

  https://www.austingroupbugs.net/view.php?id=789

There are several ways to implement it, the diff I showed to Antoine was
based on option 2) in

  https://www.austingroupbugs.net/view.php?id=789#c4102

whereas the diff below implements option 1) described by kre@NetBSD,
which looks like the most sensible approach.

No, this doesn't add support for a PIPESTATUS array, I'm not sure we need it.

This diff includes manpage and regress bits.
Thoughts, oks?


Index: bin/ksh/jobs.c
===
RCS file: /d/cvs/src/bin/ksh/jobs.c,v
retrieving revision 1.61
diff -u -p -r1.61 jobs.c
--- bin/ksh/jobs.c  28 Jun 2019 13:34:59 -  1.61
+++ bin/ksh/jobs.c  6 Jul 2020 18:10:43 -
@@ -70,6 +70,7 @@ struct proc {
 #define JF_REMOVE  0x200   /* flagged for removal (j_jobs()/j_notify()) */
 #define JF_USETTYMODE  0x400   /* tty mode saved if process exits normally */
 #define JF_SAVEDTTYPGRP0x800   /* j->saved_ttypgrp is valid */
+#define JF_PIPEFAIL0x1000  /* pipefail on when job was started */
 
 typedef struct job Job;
 struct job {
@@ -421,6 +422,8 @@ exchild(struct op *t, int flags, volatil
 */
j->flags = (flags & XXCOM) ? JF_XXCOM :
((flags & XBGND) ? 0 : (JF_FG|JF_USETTYMODE));
+   if (Flag(FPIPEFAIL))
+   j->flags |= JF_PIPEFAIL;
timerclear(>usrtime);
timerclear(>systime);
j->state = PRUNNING;
@@ -1084,7 +1087,30 @@ j_waitj(Job *j,
 
j_usrtime = j->usrtime;
j_systime = j->systime;
-   rv = j->status;
+
+   if (j->flags & JF_PIPEFAIL) {
+   Proc *p;
+   int status;
+
+   rv = 0;
+   for (p = j->proc_list; p != NULL; p = p->next) {
+   switch (p->state) {
+   case PEXITED:
+   status = WEXITSTATUS(p->status);
+   break;
+   case PSIGNALLED:
+   status = 128 + WTERMSIG(p->status);
+   break;
+   default:
+   status = 0;
+   break;
+   }
+   if (status)
+   rv = status;
+   }
+   } else
+   rv = j->status;
+
 
if (!(flags & JW_ASYNCNOTIFY) &&
(!Flag(FMONITOR) || j->state != PSTOPPED)) {
Index: bin/ksh/ksh.1
===
RCS file: /d/cvs/src/bin/ksh/ksh.1,v
retrieving revision 1.208
diff -u -p -r1.208 ksh.1
--- bin/ksh/ksh.1   26 Nov 2019 22:49:01 -  1.208
+++ bin/ksh/ksh.1   6 Jul 2020 16:04:34 -
@@ -361,7 +361,9 @@ token to form pipelines, in which the st
 last is piped (see
 .Xr pipe 2 )
 to the standard input of the following command.
-The exit status of a pipeline is that of its last command.
+The exit status of a pipeline is that of its last command, unless the
+.Ic pipefail
+option is set.
 A pipeline may be prefixed by the
 .Ql \&!
 reserved word, which causes the exit status of the pipeline to be logically
@@ -3664,6 +3666,10 @@ See the
 and
 .Ic pwd
 commands above for more details.
+.It Ic pipefail
+The exit status of a pipeline is the exit status of the rightmost
+command in the pipeline that doesn't return 0, or 0 if all commands
+returned a 0 exit status.
 .It Ic posix
 Enable POSIX mode.
 See
Index: bin/ksh/misc.c
===
RCS file: /d/cvs/src/bin/ksh/misc.c,v
retrieving revision 1.73
diff -u -p -r1.73 misc.c
--- bin/ksh/misc.c  28 Jun 2019 13:34:59 -  1.73
+++ bin/ksh/misc.c  6 Jul 2020 16:04:34 -
@@ -147,6 +147,7 @@ const struct option sh_options[] = {
{ "notify", 'b',OF_ANY },
{ "nounset",'u',OF_ANY },
{ "physical", 0,OF_ANY }, /* non-standard */
+   { "pipefail", 0,OF_ANY }, /* non-standard */
{ "posix",0,OF_ANY }, /* non-standard */
{ "privileged", 'p',OF_ANY },
{ "restricted", 'r',OF_CMDLINE },
Index: bin/ksh/sh.h
===
RCS file: /d/cvs/src/bin/ksh/sh.h,v
retrieving revision 1.75
diff -u -p -r1.75 sh.h
--- bin/ksh/sh.h20 Feb 2019 23:59:17 -  1.75
+++ bin/ksh/sh.h6 Jul 2020 16:13:59 -
@@ -158,6 +158,7 @@ enum sh_flag {
FNOTIFY,/* -b: asynchronous job completion notification */
FNOUNSET,   /* -u: using an unset var is an error */
FPHYSICAL,  /* -o physical: don't do 

Re: pipex(4): kill pipexintr()

2020-07-06 Thread Vitaliy Makkoveev
> On 6 Jul 2020, at 17:36, Martin Pieuchot  wrote:
> 
> On 06/07/20(Mon) 16:42, Vitaliy Makkoveev wrote:
>> [...] 
>> pipex(4) is simultaneously locked by NET_LOCK() and KERNEL_LOCK() but
>> with two exceptions:
>> 
>> 1. As you pointed pipex_pppoe_input() called without KERNEL_LOCK() held.
>> 2. pppac_start() called without NET_LOCK() held. Or with NET_LOCK()
>>   held. It depends on `if_snd' usage.
>> 
>> Diff below enforces pppac_start() to be called with NET_LOCK() held.
>> Also all externally called pipex(4) input and output routines have
>> NET_ASSERT_LOCKED() assertion.
>> 
>> Now pipex(4) is fully protected by NET_LOCK() so description of struct
>> members chenget too.
>> 
>> Index: sys/net/if_pppx.c
>> ===
>> RCS file: /cvs/src/sys/net/if_pppx.c,v
>> retrieving revision 1.90
>> diff -u -p -r1.90 if_pppx.c
>> --- sys/net/if_pppx.c24 Jun 2020 08:52:53 -  1.90
>> +++ sys/net/if_pppx.c6 Jul 2020 11:10:17 -
>> @@ -1117,6 +1117,8 @@ pppacopen(dev_t dev, int flags, int mode
>>  ifp->if_output = pppac_output;
>>  ifp->if_start = pppac_start;
>>  ifp->if_ioctl = pppac_ioctl;
>> +/* XXXSMP: be sure pppac_start() called under NET_LOCK() */
>> +IFQ_SET_MAXLEN(>if_snd, 1);
> 
> Is it possible to grab the NET_LOCK() inside pppac_start() instead of
> grabbing it outside?  This should allow *start() routine to be called
> from any context.

Unfortunately you can’t be sure about NET_LOCK() status while you are
in pppac_start(). It was described at this thread [1].

We have two cases:
1. pppac_start() called from pppac_output(). NET_LOCK() was inherited.
2. pppac_start() called from `systq’. There is no NET_LOCK() held.

NET_LOCK() is not required for pipex_{,ppp}_output() because underlay
routines were switched to ip{,6}_send().

I intentionally want to made all pipex(4) locked by one lock and be
sure it’s locking is consistent. I hope to start implementing fine
graining locks just after pipex(4) be switched to if_get(9).

> 
> It might be interesting to see that as a difference between the NET_LOCK()
> used to protect the network stack internals and the NET_LOCK() used to
> protect pipex(4) internals.  Such distinction might help to convert the
> latter into a different lock or primitive.
> 
>> Index: sys/net/pipex.c
>> ===
>> RCS file: /cvs/src/sys/net/pipex.c,v
>> retrieving revision 1.117
>> diff -u -p -r1.117 pipex.c
>> --- sys/net/pipex.c  30 Jun 2020 14:05:13 -  1.117
>> +++ sys/net/pipex.c  6 Jul 2020 11:10:17 -
>> @@ -869,6 +869,7 @@ pipex_output(struct mbuf *m0, int af, in
>>  struct ip ip;
>>  struct mbuf *mret;
>> 
>> +NET_ASSERT_LOCKED();
> 
> This function doesn't touch any shared data structure, we'd better move
> the NET_ASSERT_LOCKED() above rn_lookuo() in pipex_lookup_by_ip_address().
> 
> Note that `pipex_rd_head4' and `pipex_rd_head6' are, with this diff,
> also protected by the NET_LOCK() and should be annotated as such.
> 
>>  session = NULL;
>>  mret = NULL;
>>  switch (af) {
>> @@ -962,6 +963,8 @@ pipex_ppp_output(struct mbuf *m0, struct
>> {
>>  u_char *cp, hdr[16];
>> 
>> +NET_ASSERT_LOCKED();
> 
> Same here, it seems that the only reason the NET_LOCK() is necessary in
> the output path is to prevent corruption of the `session' descriptor being
> used.  So we'd rather put the assertion above the LIST_FOREACH(). 

They touch `session->stat’ :) We can switch pipex(4) to percpu counters and
output will be lockless. But I guess this should be done later.

> 
> Anyway all of those can be addressed later, your diff is ok mpi@
> 

Thanks. I will commit it if no one has objections.

1. https://marc.info/?t=15899861152=1=2



Re: pipex(4): kill pipexintr()

2020-07-06 Thread Martin Pieuchot
On 06/07/20(Mon) 19:23, Vitaliy Makkoveev wrote:
> > On 6 Jul 2020, at 17:36, Martin Pieuchot  wrote:
> [...] 
> Unfortunately you can’t be sure about NET_LOCK() status while you are
> in pppac_start(). It was described at this thread [1].
> 
> We have two cases:
> 1. pppac_start() called from pppac_output(). NET_LOCK() was inherited.

Such recursions should be avoided.  if_enqueue() should take care of
that.



pipex(4) prevent `old_session_keys' memory leak

2020-07-06 Thread Vitaliy Makkoveev
Before session freeing pipex_rele_session() will check and release
`old_session_keys' if necessary. So use it instead of pool_put(9) within
pipex_destroy_session().

Index: sys/net/pipex.c
===
RCS file: /cvs/src/sys/net/pipex.c,v
retrieving revision 1.117
diff -u -p -r1.117 pipex.c
--- sys/net/pipex.c 30 Jun 2020 14:05:13 -  1.117
+++ sys/net/pipex.c 6 Jul 2020 13:23:25 -
@@ -652,7 +652,7 @@ pipex_destroy_session(struct pipex_sessi
}
 
pipex_unlink_session(session);
-   pool_put(_session_pool, session);
+   pipex_rele_session(session);
 
return (0);
 }



Re: pipex(4): kill pipexintr()

2020-07-06 Thread Vitaliy Makkoveev
On Mon, Jul 06, 2020 at 10:59:17AM +0200, Martin Pieuchot wrote:
> On 01/07/20(Wed) 22:42, Vitaliy Makkoveev wrote:
> > pipex(4) has 2 mbuf queues: `pipexinq' and `pipexoutq'. When mbuf passed
> > to pipex it goes to one of these queues and pipexintr() will be
> > scheduled to process them. pipexintr() called from `netisr' context.
> > 
> > It's true for pppac(4) but for pppx(4) only incoming mbufs go to
> > `pipexinq. Outgoing mbufs go directly to stack. pppx(4) enabled in
> > npppd.conf(5) by default so I guess it's the common case of pipex(4)
> > usage.
> > 
> > The code looks like there is no requirements to this delayed mbufs
> > processing, we can pass it directly to stack as we do for pppx(4)
> > outgoing traffic.
> > 
> > Also we have some troubles with pipexintr() as it was described in [1].
> > It's protection of `ph_cookie'. We don't this protection this time and
> > we can't because we should brake if_get(9) logic.
> > 
> > Diff below removes pipexintr(). Now all mbufs passed directly without
> > enqueueing within pipex(4). We also can destroy sessions safe in all
> > cases. We also can use if_get(9) instead using unreferenced pointers to
> > `ifnet' within pipex(4). We also avoided context switch while we
> > processing mbufs within pipex(4). We decreased latency.
> > 
> > I'm seeding debian torrents with this diff an all goes well.
> 
> With this diff the content of pipexintr() is no longer executed with the
> KERNEL_LOCK() held.  This can be seen by following the code starting in
> ether_input().
> 
> Grabbing the KERNEL_LOCK() there is not a way forward.  The whole idea
> of if_input_process() is to be free of KERNEL_LOCK() to not introduce
> latency delay.
> 
> So this changes implies that `pipex_session_list' and possibly other
> global data structures as well as the elements linked in those are all
> protected by the NET_LOCK().  I believe this is the easiest way forward.
> 
> That said I would be comfortable with this diff going in if an audit of
> the data structures accessed in the code path starting by pipex_pppoe_input()
> has been done.  That implies annotating/documenting which data structures
> are now protected by the NET_LOCK() and adding the necessary
> NET_ASSERT_LOCK().  Such audit might lead to consider some ioctl code
> path changes to now serialize on the NET_LOCK() instead of the
> KERNEL_LOCK().
> 

pipex(4) is simultaneously locked by NET_LOCK() and KERNEL_LOCK() but
with two exceptions:

1. As you pointed pipex_pppoe_input() called without KERNEL_LOCK() held.
2. pppac_start() called without NET_LOCK() held. Or with NET_LOCK()
   held. It depends on `if_snd' usage.

Diff below enforces pppac_start() to be called with NET_LOCK() held.
Also all externally called pipex(4) input and output routines have
NET_ASSERT_LOCKED() assertion.

Now pipex(4) is fully protected by NET_LOCK() so description of struct
members chenget too.

Index: sys/net/if_pppx.c
===
RCS file: /cvs/src/sys/net/if_pppx.c,v
retrieving revision 1.90
diff -u -p -r1.90 if_pppx.c
--- sys/net/if_pppx.c   24 Jun 2020 08:52:53 -  1.90
+++ sys/net/if_pppx.c   6 Jul 2020 11:10:17 -
@@ -1117,6 +1117,8 @@ pppacopen(dev_t dev, int flags, int mode
ifp->if_output = pppac_output;
ifp->if_start = pppac_start;
ifp->if_ioctl = pppac_ioctl;
+   /* XXXSMP: be sure pppac_start() called under NET_LOCK() */
+   IFQ_SET_MAXLEN(>if_snd, 1);
 
if_counters_alloc(ifp);
if_attach(ifp);
Index: sys/net/pipex.c
===
RCS file: /cvs/src/sys/net/pipex.c,v
retrieving revision 1.117
diff -u -p -r1.117 pipex.c
--- sys/net/pipex.c 30 Jun 2020 14:05:13 -  1.117
+++ sys/net/pipex.c 6 Jul 2020 11:10:17 -
@@ -869,6 +869,7 @@ pipex_output(struct mbuf *m0, int af, in
struct ip ip;
struct mbuf *mret;
 
+   NET_ASSERT_LOCKED();
session = NULL;
mret = NULL;
switch (af) {
@@ -962,6 +963,8 @@ pipex_ppp_output(struct mbuf *m0, struct
 {
u_char *cp, hdr[16];
 
+   NET_ASSERT_LOCKED();
+
 #ifdef PIPEX_MPPE
if (pipex_session_is_mppe_enabled(session)) {
if (proto == PPP_IP) {
@@ -1355,6 +1358,7 @@ pipex_pppoe_input(struct mbuf *m0, struc
int hlen;
struct pipex_pppoe_header pppoe;
 
+   NET_ASSERT_LOCKED();
/* already checked at pipex_pppoe_lookup_session */
KASSERT(m0->m_pkthdr.len >= (sizeof(struct ether_header) +
sizeof(pppoe)));
@@ -1586,6 +1590,7 @@ pipex_pptp_input(struct mbuf *m0, struct
struct pipex_pptp_session *pptp_session;
int rewind = 0;
 
+   NET_ASSERT_LOCKED();
KASSERT(m0->m_pkthdr.len >= PIPEX_IPGRE_HDRLEN);
pptp_session = >proto.pptp;
 
@@ -2031,6 +2036,7 @@ pipex_l2tp_input(struct mbuf *m0, int of
uint16_t flags, ns = 0, nr = 0;
int rewind = 0;
 
+   

DNS options for sppp(4)

2020-07-06 Thread Peter J. Philipp
Hello,

This is an old patch from Gerhard Roth, and mpf@ dating back to 2007.  Please
see:  https://marc.info/?l=openbsd-tech=134943767022961=2

I contacted Gerhard who said instead of begging for this I should make it
IPv6 capable.  So I tried and nearly flooded my ISP off the net (sorry),
it seems that there is no IPV6CP capabilities for this yet.  I googled and
found a few drafts but all said the protocol number is TBD.  Also I looked
on IANA.org https://www.iana.org/assignments/ppp-numbers/ppp-numbers.xhtml
and found no official IPV6CP specification.

I'm submitting this patch because I have a need for DNS servers as my ISP
does not give these out via official support channels.  So I'm relying on
either using a public resolver (such as 8.8.8.8) or unwind.  No offence to 
unwind but I'd rather use these ISP nameservers if I knew where to get them.
I have also made a forwarding feature on my authoritative nameserver (with
caching) and would write the code to automatically configure on startup to
automatically determined pppoe(4) nameservers.  So this is the self pitch.

Patch is below with lots of #if 0, that I'd clean up if it passes the
political hurdles.  Also I have no -current on me currently so this is all
against 6.7.  I understand there is wireguard code in ifconfig and I'd have
to bite the bullet and sysupgrade a machine here to make new new patch
fit.  I just need an OK from someone and I'll clean it up otherwise I 
won't do anymore work on this subject.  Patch after my signature:

Best Regards,
-peter


Index: ifconfig.c
===
RCS file: /cvs/src/sbin/ifconfig/ifconfig.c,v
retrieving revision 1.421
diff -u -p -u -r1.421 ifconfig.c
--- ifconfig.c  27 Feb 2020 08:28:35 -  1.421
+++ ifconfig.c  6 Jul 2020 17:09:49 -
@@ -690,6 +690,7 @@ u_int   getwpacipher(const char *);
 void   print_cipherset(u_int32_t);
 
 void   spppauthinfo(struct sauthreq *, int);
+void   spppdnsinfo(struct sdnsreq *);
 
 /* Known address families */
 const struct afswtch {
@@ -5401,6 +5402,17 @@ spppauthinfo(struct sauthreq *spa, int d
 }
 
 void
+spppdnsinfo(struct sdnsreq *spd)
+{
+   bzero(spd, sizeof(struct sdnsreq));
+   
+   ifr.ifr_data = (caddr_t)spd;
+   spd->cmd = SPPPIOGDNS;
+   if (ioctl(sock, SIOCGSARAMS, ) == -1)
+   err(1, "SIOCGSARAMS(SPPPIOGDNS)");
+}
+
+void
 setsroto(const char *val, int d)
 {
struct sauthreq spa;
@@ -5534,6 +5546,11 @@ sppp_status(void)
 {
struct spppreq spr;
struct sauthreq spa;
+   struct sdnsreq spd;
+   struct sockaddr_storage *ss = NULL;
+
+   char buf[INET6_ADDRSTRLEN];
+   int i;
 
bzero(, sizeof(spr));
 
@@ -5573,6 +5590,39 @@ sppp_status(void)
if (spa.flags & AUTHFLAG_NORECHALLENGE)
printf("norechallenge ");
putchar('\n');
+
+   spppdnsinfo();
+   for (i = 0; i < 2; i++) {
+   ss = (struct sockaddr_storage *)_dns[i];
+   if (ss->ss_family == AF_UNSPEC)
+   continue;
+
+   switch (i) {
+   case 0:
+   inet_ntop(AF_INET, &((struct sockaddr_in 
*)ss)->sin_addr.s_addr,
+   buf, sizeof(buf));
+   printf("\tipcp pri_dns: %s\n", buf);
+   break;
+   case 1:
+   inet_ntop(AF_INET, &((struct sockaddr_in 
*)ss)->sin_addr.s_addr,
+   buf, sizeof(buf));
+   printf("\tipcp sec_dns: %s\n", buf);
+   break;
+#if notyet
+   case 2:
+   inet_ntop(AF_INET6, &((struct sockaddr_in6 
*)ss)->sin6_addr,
+   buf, sizeof(buf));
+   printf("\tpri_dns6: %s\n", buf);
+   break;
+   case 3:
+   inet_ntop(AF_INET6, &((struct sockaddr_in6 
*)ss)->sin6_addr,
+   buf, sizeof(buf));
+   printf("\tsec_dns6: %s\n", buf);
+   break;
+#endif
+   }
+   }
+
 }
 
 void
Index: if_sppp.h
===
RCS file: /cvs/src/sys/net/if_sppp.h,v
retrieving revision 1.27
diff -u -p -u -r1.27 if_sppp.h
--- if_sppp.h   24 Jun 2019 21:36:53 -  1.27
+++ if_sppp.h   6 Jul 2020 17:12:27 -
@@ -82,12 +82,18 @@ struct spppreq {
enum ppp_phase phase;   /* phase we're currently in */
 };
 
+struct sdnsreq {
+   int cmd;
+   struct sockaddr_storage ss_dns[4];
+};
+
 #define SPPPIOGDEFS  ((int)(('S' << 24) + (1 << 16) + sizeof(struct spppreq)))
 #define SPPPIOSDEFS  ((int)(('S' << 24) + (2 << 16) + sizeof(struct spppreq)))
 #define SPPPIOGMAUTH ((int)(('S' << 24) + (3 << 16) + sizeof(struct sauthreq)))
 #define SPPPIOSMAUTH ((int)(('S' << 24) + (4 << 16) + sizeof(struct sauthreq)))
 

Re: fix races in if_clone_create()

2020-07-06 Thread Vitaliy Makkoveev



> On 6 Jul 2020, at 12:17, Martin Pieuchot  wrote:
> 
> On 01/07/20(Wed) 00:02, Vitaliy Makkoveev wrote:
>> On Tue, Jun 30, 2020 at 03:48:22PM +0300, Vitaliy Makkoveev wrote:
>>> On Tue, Jun 30, 2020 at 12:08:03PM +0200, Martin Pieuchot wrote:
 On 29/06/20(Mon) 11:59, Vitaliy Makkoveev wrote:
> [...] 
> I reworked tool for reproduce. Now I avoided fork()/exec() route and it
> takes couple of minutes to take panic on 4 cores. Also some screenshots
> attached.
 
 Setting kern.pool_debug=2 makes the race reproducible in seconds.
>> 
>> Unfortunately you will catch splassert() caused by kern/sched_bsd.c:304.
>> malloc() will call yield() while we are holding NET_LOCK(). I attached
>> screenshot with splassertion to this mail.
> 
> With kern.splassert < 3 it is fine. 
> 
 Could you turn this test into something committable in regress/?  We can
 link it to the build once a fix is committed.
 
>>> 
>>> We have 3 races with cloned interfaces:
>>> 1. if_clone_create() vs if_clone_create()
>>> 2. if_clone_destroy() vs if_clone_destroy()
>>> 3. if_clone_destroy() vs the rest of stack
>>> 
>>> It makes sences to commit unified test to regress/, so I suggest to wait
>>> a little.
>> 
>> The another solution.
>> 
>> Diff below introduces per-`ifc' serialization for if_clone_create() and
>> if_clone_destroy(). There is no index bitmap anymore.
> 
> I like the simplification.  More comments below:
> 
>> +/*
>> + * Lock a clone network interface.
>> + */
>> +int
>> +if_clone_lock(struct if_clone *ifc)
>> +{
>> +int error;
>> +
>> +rw_enter_write(>ifc_lock);
>> +
>> +while (ifc->ifc_flags & IFC_CREATE_LOCKED) {
>> +ifc->ifc_flags |= IFC_CREATE_LOCKWAIT;
>> +error = rwsleep_nsec(>ifc_flags, >ifc_lock,
>> +PWAIT|PCATCH, "ifclk", INFSLP);
>> +if(error != 0) {
>> +ifc->ifc_flags &= ~IFC_CREATE_LOCKWAIT;
>> +rw_exit_write(>ifc_lock);
>> +return error;
>> +}
>> +}
>> +ifc->ifc_flags |= IFC_CREATE_LOCKED;
>> +ifc->ifc_flags &= ~IFC_CREATE_LOCKWAIT;
>> +
>> +rw_exit_write(>ifc_lock);
>> +
>> +return 0;
>> +}
> 
> This is like re-implementing a rwlock but loosing the debugging ability of
> WITNESS.

The reason to do this is to avoid call `ifc_create’ with rwlock held.
We have unique sleep points for each underlaying routine for `ifc_create’,
so this "rwlock reimplementation" looks better. Also this lock is used in
one place only and impact of loosing debugging ability is not such big.

> 
> I also don't see any reason for having a per-ifc lock.  If, at least one
> of the problems, is a double insert in `ifnet' then we should be able to
> assert that a lock is held when doing such assertion.

This race breaks ifunit() not `if_list’. I mean LIST_*() operation are
not broken because `le_{prev,next}’ are valid, but list is inconsistent of 
course.

Since only "ifconfig clonerA0 create& ifconfig clonerA0 create” will
break, I see no reason to deny simultaneous execution of
“ifconfig clonerA0 create& ifconfig clonerB0 create”.

Let this lock be per `ifc’ not global.

> 
> Assertions and documentation are more important than preventing races
> because they allow to build awareness and elegant solutions instead of
> hacking diffs until stuff work without knowing why.
> 
> There are two cases where `ifp' are inserted into `ifnet':
> 1. by autoconf during boot or hotplug
> 2. by cloning ioctl
> 
> In the second case it is always about pseudo-devices.  So the assertion
> should be conditional like:
> 
>   if (ISSET(ifp->if_xflags, IFXF_CLONED))
>   rw_assert_wrlock(_lock);
> 
> In other words this fixes serializes insertions/removal on the global
> list `ifnet', the KERNEL_LOCK() being still required for reading it.
> 
> Is there any other data structure which ends up being protected by this
> approach and could be documented?

We should be sure there is no multiple `ifnet’s in `if_list’ with the same
`if_xname’. And the assertion you proposed looks not obvious here.
Assertion like below looks more reasonable but introduces performance
impact.

 cut begin 
void
if_attach(struct ifnet *ifp)
{
if_attach_common(ifp);
NET_LOCK();
KASSERT(ifunit(ifp->if_xname) == NULL);
TAILQ_INSERT_TAIL(, ifp, if_list);
if_attachsetup(ifp);
NET_UNLOCK();
}
 cut end 

I guess the commentary within if_clone_create() is the best solution.
Something like this: “Deny simultaneous execution to prevent multiple
creation of interfaces with the same name”.


Re: pipex(4): kill pipexintr()

2020-07-06 Thread Martin Pieuchot
On 06/07/20(Mon) 16:42, Vitaliy Makkoveev wrote:
> [...] 
> pipex(4) is simultaneously locked by NET_LOCK() and KERNEL_LOCK() but
> with two exceptions:
> 
> 1. As you pointed pipex_pppoe_input() called without KERNEL_LOCK() held.
> 2. pppac_start() called without NET_LOCK() held. Or with NET_LOCK()
>held. It depends on `if_snd' usage.
> 
> Diff below enforces pppac_start() to be called with NET_LOCK() held.
> Also all externally called pipex(4) input and output routines have
> NET_ASSERT_LOCKED() assertion.
> 
> Now pipex(4) is fully protected by NET_LOCK() so description of struct
> members chenget too.
> 
> Index: sys/net/if_pppx.c
> ===
> RCS file: /cvs/src/sys/net/if_pppx.c,v
> retrieving revision 1.90
> diff -u -p -r1.90 if_pppx.c
> --- sys/net/if_pppx.c 24 Jun 2020 08:52:53 -  1.90
> +++ sys/net/if_pppx.c 6 Jul 2020 11:10:17 -
> @@ -1117,6 +1117,8 @@ pppacopen(dev_t dev, int flags, int mode
>   ifp->if_output = pppac_output;
>   ifp->if_start = pppac_start;
>   ifp->if_ioctl = pppac_ioctl;
> + /* XXXSMP: be sure pppac_start() called under NET_LOCK() */
> + IFQ_SET_MAXLEN(>if_snd, 1);

Is it possible to grab the NET_LOCK() inside pppac_start() instead of
grabbing it outside?  This should allow *start() routine to be called
from any context.

It might be interesting to see that as a difference between the NET_LOCK()
used to protect the network stack internals and the NET_LOCK() used to
protect pipex(4) internals.  Such distinction might help to convert the
latter into a different lock or primitive.

> Index: sys/net/pipex.c
> ===
> RCS file: /cvs/src/sys/net/pipex.c,v
> retrieving revision 1.117
> diff -u -p -r1.117 pipex.c
> --- sys/net/pipex.c   30 Jun 2020 14:05:13 -  1.117
> +++ sys/net/pipex.c   6 Jul 2020 11:10:17 -
> @@ -869,6 +869,7 @@ pipex_output(struct mbuf *m0, int af, in
>   struct ip ip;
>   struct mbuf *mret;
>  
> + NET_ASSERT_LOCKED();

This function doesn't touch any shared data structure, we'd better move
the NET_ASSERT_LOCKED() above rn_lookuo() in pipex_lookup_by_ip_address().

Note that `pipex_rd_head4' and `pipex_rd_head6' are, with this diff,
also protected by the NET_LOCK() and should be annotated as such.

>   session = NULL;
>   mret = NULL;
>   switch (af) {
> @@ -962,6 +963,8 @@ pipex_ppp_output(struct mbuf *m0, struct
>  {
>   u_char *cp, hdr[16];
>  
> + NET_ASSERT_LOCKED();

Same here, it seems that the only reason the NET_LOCK() is necessary in
the output path is to prevent corruption of the `session' descriptor being
used.  So we'd rather put the assertion above the LIST_FOREACH(). 

Anyway all of those can be addressed later, your diff is ok mpi@



Re: urtwn(4): Add support for D-Link DWA-121 rev B1

2020-07-06 Thread Jonathan Gray
On Mon, Jul 06, 2020 at 10:15:14AM +0100, Miguel Landaeta wrote:
> Hi,
> 
> The patch at the end should add support for USB wifi dongle
> DWA-121 from D-Link [1].
> 
> The USB id of such device is 2001:331b.
> 
> lykke$ usbdevs
> Controller /dev/usb0:
> addr 01: : Generic, EHCI root hub
> Controller /dev/usb1:
> addr 01: : Generic, EHCI root hub
> addr 02: 05e3:0608 Genesys Logic, USB2.0 Hub
> addr 03: 0c45:6321 Sonix Technology Co., Ltd., USB Camera
> Controller /dev/usb2:
> addr 01: : Generic, xHCI root hub
> Controller /dev/usb3:
> addr 01: : Generic, xHCI root hub
> addr 02: 2001:331b Realtek\r,
> Controller /dev/usb4:
> addr 01: : Generic, OHCI root hub
> addr 02: 258a:001e HAILUCK CO.,LTD, USB KEYBOARD
> Controller /dev/usb5:
> addr 01: : Generic, OHCI root hub
> 
> Relevant dmesg message (full dmesg is attached):
> 
> urtwn0 at uhub3 port 1 configuration 1 interface 0 "Realtek " rev 2.00/0.00 
> addr 2
> urtwn0: MAC/BB RTL8188EU, RF 6052 1T1R, address 60:63:4c:xx:xx:xx
> 
> Thanks,
> Miguel.
> 
> 1. https://eu.dlink.com/uk/en/products/dwa-121-wireless-n-150-pico-usb-adapter

Thanks, committed with if_urtwn.c changed to be in sorted order.

If you run 'fw_update bwfm' bwfm firmware will be installed and builtin
802.11 should work.

> 
> 
> Index: sys/dev/usb/if_urtwn.c
> ===
> RCS file: /cvs/src/sys/dev/usb/if_urtwn.c,v
> retrieving revision 1.90
> diff -u -p -r1.90 if_urtwn.c
> --- sys/dev/usb/if_urtwn.c11 Jun 2020 00:56:12 -  1.90
> +++ sys/dev/usb/if_urtwn.c6 Jul 2020 08:26:30 -
> @@ -328,6 +328,7 @@ static const struct urtwn_type {
>   URTWN_DEV_8188EU(REALTEK,   RTL8188ETV),
>   URTWN_DEV_8188EU(REALTEK,   RTL8188EU),
>   URTWN_DEV_8188EU(TPLINK,RTL8188EUS),
> + URTWN_DEV_8188EU(DLINK, DWA121B1),
>   /* URTWN_RTL8192EU */
>   URTWN_DEV_8192EU(DLINK, DWA131E1),
>   URTWN_DEV_8192EU(REALTEK,   RTL8192EU),
> Index: sys/dev/usb/usbdevs
> ===
> RCS file: /cvs/src/sys/dev/usb/usbdevs,v
> retrieving revision 1.717
> diff -u -p -r1.717 usbdevs
> --- sys/dev/usb/usbdevs   22 Jun 2020 15:49:37 -  1.717
> +++ sys/dev/usb/usbdevs   6 Jul 2020 08:26:30 -
> @@ -1565,6 +1565,7 @@ product DLINK DWA125D1  0x330f  DWA-125 r
>  product DLINK DWA123D1   0x3310  DWA-123 rev D1
>  product DLINK DWA137A1   0x3317  DWA-137 rev A1
>  product DLINK DWA131E1   0x3319  DWA-131 rev E1
> +product DLINK DWA121B1   0x331b  DWA-121 rev B1
>  product DLINK DWA182D1   0x331c  DWA-182 rev D1
>  product DLINK DWA171C1   0x331d  DWA-171 rev C1
>  product DLINK DWL122 0x3700  DWL-122
> Index: sys/dev/usb/usbdevs.h
> ===
> RCS file: /cvs/src/sys/dev/usb/usbdevs.h,v
> retrieving revision 1.729
> diff -u -p -r1.729 usbdevs.h
> --- sys/dev/usb/usbdevs.h 22 Jun 2020 15:52:39 -  1.729
> +++ sys/dev/usb/usbdevs.h 6 Jul 2020 08:26:30 -
> @@ -1570,6 +1570,7 @@
>  #define  USB_PRODUCT_DLINK_DWA131B   0x330d  /* DWA-131 rev 
> B */
>  #define  USB_PRODUCT_DLINK_DWA125D1  0x330f  /* DWA-125 rev 
> D1 */
>  #define  USB_PRODUCT_DLINK_DWA123D1  0x3310  /* DWA-123 rev 
> D1 */
> +#define  USB_PRODUCT_DLINK_DWA121B1  0x331b  /* DWA-121 rev 
> B1 */
>  #define  USB_PRODUCT_DLINK_DWA137A1  0x3317  /* DWA-137 rev 
> A1 */
>  #define  USB_PRODUCT_DLINK_DWA131E1  0x3319  /* DWA-131 rev 
> E1 */
>  #define  USB_PRODUCT_DLINK_DWA182D1  0x331c  /* DWA-182 rev 
> D1 */
> Index: sys/dev/usb/usbdevs_data.h
> ===
> RCS file: /cvs/src/sys/dev/usb/usbdevs_data.h,v
> retrieving revision 1.723
> diff -u -p -r1.723 usbdevs_data.h
> --- sys/dev/usb/usbdevs_data.h22 Jun 2020 15:52:39 -  1.723
> +++ sys/dev/usb/usbdevs_data.h6 Jul 2020 08:26:31 -
> @@ -2618,6 +2618,10 @@ const struct usb_known_product usb_known
>   "DWA-123 rev D1",
>   },
>   {
> + USB_VENDOR_DLINK, USB_PRODUCT_DLINK_DWA121B1,
> + "DWA-121 rev B1",
> + },
> + {
>   USB_VENDOR_DLINK, USB_PRODUCT_DLINK_DWA137A1,
>   "DWA-137 rev A1",
>   },
> 
> 
> -- 
> Miguel Landaeta, nomadium at debian.org
> secure email with PGP 0x6E608B637D8967E9 available at http://miguel.cc/key.
> "Faith means not wanting to know what is true." -- Nietzsche

> OpenBSD 6.7-current (CUSTOM) #0: Fri Jul  3 14:18:45 IST 2020
> mig...@lykke.nomadium.net:/sys/arch/arm64/compile/CUSTOM
> real mem  = 4092612608 (3903MB)
> avail mem = 3891392512 (3711MB)
> random: good seed from bootblocks
> mainbus0 at root: Pine64 Pinebook Pro
> 

urtwn(4): Add support for D-Link DWA-121 rev B1

2020-07-06 Thread Miguel Landaeta
Hi,

The patch at the end should add support for USB wifi dongle
DWA-121 from D-Link [1].

The USB id of such device is 2001:331b.

lykke$ usbdevs
Controller /dev/usb0:
addr 01: : Generic, EHCI root hub
Controller /dev/usb1:
addr 01: : Generic, EHCI root hub
addr 02: 05e3:0608 Genesys Logic, USB2.0 Hub
addr 03: 0c45:6321 Sonix Technology Co., Ltd., USB Camera
Controller /dev/usb2:
addr 01: : Generic, xHCI root hub
Controller /dev/usb3:
addr 01: : Generic, xHCI root hub
addr 02: 2001:331b Realtek\r,
Controller /dev/usb4:
addr 01: : Generic, OHCI root hub
addr 02: 258a:001e HAILUCK CO.,LTD, USB KEYBOARD
Controller /dev/usb5:
addr 01: : Generic, OHCI root hub

Relevant dmesg message (full dmesg is attached):

urtwn0 at uhub3 port 1 configuration 1 interface 0 "Realtek " rev 2.00/0.00 
addr 2
urtwn0: MAC/BB RTL8188EU, RF 6052 1T1R, address 60:63:4c:xx:xx:xx

Thanks,
Miguel.

1. https://eu.dlink.com/uk/en/products/dwa-121-wireless-n-150-pico-usb-adapter


Index: sys/dev/usb/if_urtwn.c
===
RCS file: /cvs/src/sys/dev/usb/if_urtwn.c,v
retrieving revision 1.90
diff -u -p -r1.90 if_urtwn.c
--- sys/dev/usb/if_urtwn.c  11 Jun 2020 00:56:12 -  1.90
+++ sys/dev/usb/if_urtwn.c  6 Jul 2020 08:26:30 -
@@ -328,6 +328,7 @@ static const struct urtwn_type {
URTWN_DEV_8188EU(REALTEK,   RTL8188ETV),
URTWN_DEV_8188EU(REALTEK,   RTL8188EU),
URTWN_DEV_8188EU(TPLINK,RTL8188EUS),
+   URTWN_DEV_8188EU(DLINK, DWA121B1),
/* URTWN_RTL8192EU */
URTWN_DEV_8192EU(DLINK, DWA131E1),
URTWN_DEV_8192EU(REALTEK,   RTL8192EU),
Index: sys/dev/usb/usbdevs
===
RCS file: /cvs/src/sys/dev/usb/usbdevs,v
retrieving revision 1.717
diff -u -p -r1.717 usbdevs
--- sys/dev/usb/usbdevs 22 Jun 2020 15:49:37 -  1.717
+++ sys/dev/usb/usbdevs 6 Jul 2020 08:26:30 -
@@ -1565,6 +1565,7 @@ product DLINK DWA125D10x330f  DWA-125 r
 product DLINK DWA123D1 0x3310  DWA-123 rev D1
 product DLINK DWA137A1 0x3317  DWA-137 rev A1
 product DLINK DWA131E1 0x3319  DWA-131 rev E1
+product DLINK DWA121B1 0x331b  DWA-121 rev B1
 product DLINK DWA182D1 0x331c  DWA-182 rev D1
 product DLINK DWA171C1 0x331d  DWA-171 rev C1
 product DLINK DWL122   0x3700  DWL-122
Index: sys/dev/usb/usbdevs.h
===
RCS file: /cvs/src/sys/dev/usb/usbdevs.h,v
retrieving revision 1.729
diff -u -p -r1.729 usbdevs.h
--- sys/dev/usb/usbdevs.h   22 Jun 2020 15:52:39 -  1.729
+++ sys/dev/usb/usbdevs.h   6 Jul 2020 08:26:30 -
@@ -1570,6 +1570,7 @@
 #defineUSB_PRODUCT_DLINK_DWA131B   0x330d  /* DWA-131 rev 
B */
 #defineUSB_PRODUCT_DLINK_DWA125D1  0x330f  /* DWA-125 rev 
D1 */
 #defineUSB_PRODUCT_DLINK_DWA123D1  0x3310  /* DWA-123 rev 
D1 */
+#defineUSB_PRODUCT_DLINK_DWA121B1  0x331b  /* DWA-121 rev 
B1 */
 #defineUSB_PRODUCT_DLINK_DWA137A1  0x3317  /* DWA-137 rev 
A1 */
 #defineUSB_PRODUCT_DLINK_DWA131E1  0x3319  /* DWA-131 rev 
E1 */
 #defineUSB_PRODUCT_DLINK_DWA182D1  0x331c  /* DWA-182 rev 
D1 */
Index: sys/dev/usb/usbdevs_data.h
===
RCS file: /cvs/src/sys/dev/usb/usbdevs_data.h,v
retrieving revision 1.723
diff -u -p -r1.723 usbdevs_data.h
--- sys/dev/usb/usbdevs_data.h  22 Jun 2020 15:52:39 -  1.723
+++ sys/dev/usb/usbdevs_data.h  6 Jul 2020 08:26:31 -
@@ -2618,6 +2618,10 @@ const struct usb_known_product usb_known
"DWA-123 rev D1",
},
{
+   USB_VENDOR_DLINK, USB_PRODUCT_DLINK_DWA121B1,
+   "DWA-121 rev B1",
+   },
+   {
USB_VENDOR_DLINK, USB_PRODUCT_DLINK_DWA137A1,
"DWA-137 rev A1",
},


-- 
Miguel Landaeta, nomadium at debian.org
secure email with PGP 0x6E608B637D8967E9 available at http://miguel.cc/key.
"Faith means not wanting to know what is true." -- Nietzsche
OpenBSD 6.7-current (CUSTOM) #0: Fri Jul  3 14:18:45 IST 2020
mig...@lykke.nomadium.net:/sys/arch/arm64/compile/CUSTOM
real mem  = 4092612608 (3903MB)
avail mem = 3891392512 (3711MB)
random: good seed from bootblocks
mainbus0 at root: Pine64 Pinebook Pro
psci0 at mainbus0: PSCI 1.1, SMCCC 1.1
cpu0 at mainbus0 mpidr 0: ARM Cortex-A53 r0p4
cpu0: 32KB 64b/line 2-way L1 VIPT I-cache, 32KB 64b/line 4-way L1 D-cache
cpu0: 512KB 64b/line 16-way L2 cache
cpu1 at mainbus0 mpidr 1: ARM Cortex-A53 r0p4
cpu1: 32KB 64b/line 2-way L1 VIPT I-cache, 32KB 64b/line 4-way L1 D-cache
cpu1: 512KB 64b/line 16-way L2 cache
cpu2 at mainbus0 mpidr 2: ARM Cortex-A53 r0p4
cpu2: 32KB 64b/line 2-way L1 VIPT I-cache, 32KB 64b/line 4-way L1 

Re: fix races in if_clone_create()

2020-07-06 Thread Martin Pieuchot
On 01/07/20(Wed) 00:02, Vitaliy Makkoveev wrote:
> On Tue, Jun 30, 2020 at 03:48:22PM +0300, Vitaliy Makkoveev wrote:
> > On Tue, Jun 30, 2020 at 12:08:03PM +0200, Martin Pieuchot wrote:
> > > On 29/06/20(Mon) 11:59, Vitaliy Makkoveev wrote:
> > > > [...] 
> > > > I reworked tool for reproduce. Now I avoided fork()/exec() route and it
> > > > takes couple of minutes to take panic on 4 cores. Also some screenshots
> > > > attached.
> > > 
> > > Setting kern.pool_debug=2 makes the race reproducible in seconds.
> 
> Unfortunately you will catch splassert() caused by kern/sched_bsd.c:304.
> malloc() will call yield() while we are holding NET_LOCK(). I attached
> screenshot with splassertion to this mail.

With kern.splassert < 3 it is fine. 

> > > Could you turn this test into something committable in regress/?  We can
> > > link it to the build once a fix is committed.
> > > 
> > 
> > We have 3 races with cloned interfaces:
> > 1. if_clone_create() vs if_clone_create()
> > 2. if_clone_destroy() vs if_clone_destroy()
> > 3. if_clone_destroy() vs the rest of stack
> > 
> > It makes sences to commit unified test to regress/, so I suggest to wait
> > a little.
> 
> The another solution.
> 
> Diff below introduces per-`ifc' serialization for if_clone_create() and
> if_clone_destroy(). There is no index bitmap anymore.

I like the simplification.  More comments below:

> +/*
> + * Lock a clone network interface.
> + */
> +int
> +if_clone_lock(struct if_clone *ifc)
> +{
> + int error;
> +
> + rw_enter_write(>ifc_lock);
> +
> + while (ifc->ifc_flags & IFC_CREATE_LOCKED) {
> + ifc->ifc_flags |= IFC_CREATE_LOCKWAIT;
> + error = rwsleep_nsec(>ifc_flags, >ifc_lock,
> + PWAIT|PCATCH, "ifclk", INFSLP);
> + if(error != 0) {
> + ifc->ifc_flags &= ~IFC_CREATE_LOCKWAIT;
> + rw_exit_write(>ifc_lock);
> + return error;
> + }
> + }
> + ifc->ifc_flags |= IFC_CREATE_LOCKED;
> + ifc->ifc_flags &= ~IFC_CREATE_LOCKWAIT;
> +
> + rw_exit_write(>ifc_lock);
> + 
> + return 0;
> +}

This is like re-implementing a rwlock but loosing the debugging ability of
WITNESS.

I also don't see any reason for having a per-ifc lock.  If, at least one
of the problems, is a double insert in `ifnet' then we should be able to
assert that a lock is held when doing such assertion.

Assertions and documentation are more important than preventing races
because they allow to build awareness and elegant solutions instead of
hacking diffs until stuff work without knowing why.

There are two cases where `ifp' are inserted into `ifnet':
 1. by autoconf during boot or hotplug
 2. by cloning ioctl

In the second case it is always about pseudo-devices.  So the assertion
should be conditional like:

if (ISSET(ifp->if_xflags, IFXF_CLONED))
rw_assert_wrlock(_lock);

In other words this fixes serializes insertions/removal on the global
list `ifnet', the KERNEL_LOCK() being still required for reading it.

Is there any other data structure which ends up being protected by this
approach and could be documented?



Re: route add ::/0 ...

2020-07-06 Thread YASUOKA Masahiko
Let me updated the diff.

On Mon, 06 Jul 2020 17:54:30 +0900 (JST)
YASUOKA Masahiko  wrote:
> On Tue, 30 Jun 2020 02:42:02 +0200
> Klemens Nanni  wrote:
>> On Tue, Jun 30, 2020 at 09:00:30AM +0900, YASUOKA Masahiko wrote:
>>> inet_makenetandmask() had required another treatment.
>>> 
>>> Also -prefixlen 0 for -inet has a bug
>>> 
>>>  % doas ./obj/route -T100 add -inet 0.0.0.0 -prefixlen 0 127.0.0.1
>>>  add net 0.0.0.0: gateway 127.0.0.1
>>>  % netstat -nrf inet -T 100
>>>  Routing tables
>>> 
>>>  Internet:
>>>  DestinationGatewayFlags   Refs  Use   Mtu  Prio 
>>> Iface
>>>  0.0.0.0/32 127.0.0.1  UGS00 32768 8 
>>> lo100
>>> 
>>> /0 becomes /32.  The diff following also fixes the problem.
>> Yes, this looks correct to me;  regress is also happy (again).
>> 
>> OK kn
> 
> Thanks,
> 
> I'm  going to commit the diff.  ok or comments, are still welcome.
> 
> 
> Stop using make_addr() which trims trailing zeros of the netmask, set
> family and length field.  This fixes route(8) to handle "::/0"
> properly.  Also fix "route add -inet 0.0.0.0 -prefixlen 0 (gateway)"
> to work properly.
> 
> Index: sbin/route/route.c
> ===
> RCS file: /cvs/src/sbin/route/route.c,v
> retrieving revision 1.247
> diff -u -p -r1.247 route.c
> --- sbin/route/route.c15 Jan 2020 10:26:25 -  1.247
> +++ sbin/route/route.c6 Jul 2020 08:45:06 -
(snip)
> @@ -781,12 +780,9 @@ inet_makenetandmask(u_int32_t net, struc
>   sin->sin_addr.s_addr = htonl(net);
>   sin = _mask.sin;
>   sin->sin_addr.s_addr = htonl(mask);
> - sin->sin_len = 0;
> - sin->sin_family = 0;
> + sin->sin_family = AF_INET;
>   cp = (char *)(>sin_addr + 1);
> - while (*--cp == '\0' && cp > (char *)sin)
> - continue;
> - sin->sin_len = 1 + cp - (char *)sin;
> + sin->sin_len = sizeof(struct sockaddr_in);
>  }
>  
>  /*

"cp" becomes unused.  The updated diff removes "cp" as well.

Index: sbin/route/route.c
===
RCS file: /cvs/src/sbin/route/route.c,v
retrieving revision 1.247
diff -u -p -r1.247 route.c
--- sbin/route/route.c  15 Jan 2020 10:26:25 -  1.247
+++ sbin/route/route.c  6 Jul 2020 08:57:25 -
@@ -107,7 +107,6 @@ void print_rtmsg(struct rt_msghdr *, in
 voidpmsg_common(struct rt_msghdr *);
 voidpmsg_addrs(char *, int);
 voidbprintf(FILE *, int, char *);
-voidmask_addr(union sockunion *, union sockunion *, int);
 int getaddr(int, int, char *, struct hostent **);
 voidgetmplslabel(char *, int);
 int rtmsg(int, int, int, uint8_t);
@@ -767,7 +766,6 @@ void
 inet_makenetandmask(u_int32_t net, struct sockaddr_in *sin, int bits)
 {
u_int32_t mask;
-   char *cp;
 
rtm_addrs |= RTA_NETMASK;
if (bits == 0 && net == 0)
@@ -781,12 +779,8 @@ inet_makenetandmask(u_int32_t net, struc
sin->sin_addr.s_addr = htonl(net);
sin = _mask.sin;
sin->sin_addr.s_addr = htonl(mask);
-   sin->sin_len = 0;
-   sin->sin_family = 0;
-   cp = (char *)(>sin_addr + 1);
-   while (*--cp == '\0' && cp > (char *)sin)
-   continue;
-   sin->sin_len = 1 + cp - (char *)sin;
+   sin->sin_family = AF_INET;
+   sin->sin_len = sizeof(struct sockaddr_in);
 }
 
 /*
@@ -1001,7 +995,8 @@ prefixlen(int af, char *s)
memset(_mask, 0, sizeof(so_mask));
so_mask.sin.sin_family = AF_INET;
so_mask.sin.sin_len = sizeof(struct sockaddr_in);
-   so_mask.sin.sin_addr.s_addr = htonl(0x << (32 - len));
+   if (len != 0)
+   so_mask.sin.sin_addr.s_addr = htonl(0x << (32 - 
len));
break;
case AF_INET6:
so_mask.sin6.sin6_family = AF_INET6;
@@ -1088,8 +1083,6 @@ rtmsg(int cmd, int flags, int fmask, uin
rtm.rtm_mpls = mpls_flags;
rtm.rtm_hdrlen = sizeof(rtm);
 
-   if (rtm_addrs & RTA_NETMASK)
-   mask_addr(_dst, _mask, RTA_DST);
/* store addresses in ascending order of RTA values */
NEXTADDR(RTA_DST, so_dst);
NEXTADDR(RTA_GATEWAY, so_gate);
@@ -1118,34 +,6 @@ rtmsg(int cmd, int flags, int fmask, uin
}
 #undef rtm
return (0);
-}
-
-void
-mask_addr(union sockunion *addr, union sockunion *mask, int which)
-{
-   int olen = mask->sa.sa_len;
-   char *cp1 = olen + (char *)mask, *cp2;
-
-   for (mask->sa.sa_len = 0; cp1 > (char *)mask; )
-   if (*--cp1 != '\0') {
-   mask->sa.sa_len = 1 + cp1 - (char *)mask;
-   break;
-   }
-   if ((rtm_addrs & which) == 0)
-   return;
-   switch (addr->sa.sa_family) {
-   case AF_INET:
-   case AF_INET6:
-   case AF_UNSPEC:
-   return;
-   }
-   cp1 = 

Re: pipex(4): kill pipexintr()

2020-07-06 Thread Martin Pieuchot
On 01/07/20(Wed) 22:42, Vitaliy Makkoveev wrote:
> pipex(4) has 2 mbuf queues: `pipexinq' and `pipexoutq'. When mbuf passed
> to pipex it goes to one of these queues and pipexintr() will be
> scheduled to process them. pipexintr() called from `netisr' context.
> 
> It's true for pppac(4) but for pppx(4) only incoming mbufs go to
> `pipexinq. Outgoing mbufs go directly to stack. pppx(4) enabled in
> npppd.conf(5) by default so I guess it's the common case of pipex(4)
> usage.
> 
> The code looks like there is no requirements to this delayed mbufs
> processing, we can pass it directly to stack as we do for pppx(4)
> outgoing traffic.
> 
> Also we have some troubles with pipexintr() as it was described in [1].
> It's protection of `ph_cookie'. We don't this protection this time and
> we can't because we should brake if_get(9) logic.
> 
> Diff below removes pipexintr(). Now all mbufs passed directly without
> enqueueing within pipex(4). We also can destroy sessions safe in all
> cases. We also can use if_get(9) instead using unreferenced pointers to
> `ifnet' within pipex(4). We also avoided context switch while we
> processing mbufs within pipex(4). We decreased latency.
> 
> I'm seeding debian torrents with this diff an all goes well.

With this diff the content of pipexintr() is no longer executed with the
KERNEL_LOCK() held.  This can be seen by following the code starting in
ether_input().

Grabbing the KERNEL_LOCK() there is not a way forward.  The whole idea
of if_input_process() is to be free of KERNEL_LOCK() to not introduce
latency delay.

So this changes implies that `pipex_session_list' and possibly other
global data structures as well as the elements linked in those are all
protected by the NET_LOCK().  I believe this is the easiest way forward.

That said I would be comfortable with this diff going in if an audit of
the data structures accessed in the code path starting by pipex_pppoe_input()
has been done.  That implies annotating/documenting which data structures
are now protected by the NET_LOCK() and adding the necessary
NET_ASSERT_LOCK().  Such audit might lead to consider some ioctl code
path changes to now serialize on the NET_LOCK() instead of the
KERNEL_LOCK().

> 1. https://marc.info/?t=15930080902=1=2
> 
> Index: lib/libc/sys/sysctl.2
> ===
> RCS file: /cvs/src/lib/libc/sys/sysctl.2,v
> retrieving revision 1.40
> diff -u -p -r1.40 sysctl.2
> --- lib/libc/sys/sysctl.2 17 May 2020 05:48:39 -  1.40
> +++ lib/libc/sys/sysctl.2 1 Jul 2020 19:20:22 -
> @@ -2033,35 +2033,11 @@ The currently defined variable names are
>  .Bl -column "Third level name" "integer" "Changeable" -offset indent
>  .It Sy "Third level name" Ta Sy "Type" Ta Sy "Changeable"
>  .It Dv PIPEXCTL_ENABLE Ta integer Ta yes
> -.It Dv PIPEXCTL_INQ Ta node Ta not applicable
> -.It Dv PIPEXCTL_OUTQ Ta node Ta not applicable
>  .El
>  .Bl -tag -width "123456"
>  .It Dv PIPEXCTL_ENABLE
>  If set to 1, enable PIPEX processing.
>  The default is 0.
> -.It Dv PIPEXCTL_INQ Pq Va net.pipex.inq
> -Fourth level comprises an array of
> -.Vt struct ifqueue
> -structures containing information about the PIPEX packet input queue.
> -The forth level names for the elements of
> -.Vt struct ifqueue
> -are the same as described in
> -.Li ip.arpq
> -in the
> -.Dv PF_INET
> -section.
> -.It Dv PIPEXCTL_OUTQ Pq Va net.pipex.outq
> -Fourth level comprises an array of
> -.Vt struct ifqueue
> -structures containing information about PIPEX packet output queue.
> -The forth level names for the elements of
> -.Vt struct ifqueue
> -are the same as described in
> -.Li ip.arpq
> -in the
> -.Dv PF_INET
> -section.
>  .El
>  .El
>  .Ss CTL_VFS
> Index: sys/net/if.c
> ===
> RCS file: /cvs/src/sys/net/if.c,v
> retrieving revision 1.611
> diff -u -p -r1.611 if.c
> --- sys/net/if.c  30 Jun 2020 09:31:38 -  1.611
> +++ sys/net/if.c  1 Jul 2020 19:20:27 -
> @@ -1012,13 +1012,6 @@ if_netisr(void *unused)
>   KERNEL_UNLOCK();
>   }
>  #endif
> -#ifdef PIPEX
> - if (n & (1 << NETISR_PIPEX)) {
> - KERNEL_LOCK();
> - pipexintr();
> - KERNEL_UNLOCK();
> - }
> -#endif
>   t |= n;
>   }
>  
> Index: sys/net/netisr.h
> ===
> RCS file: /cvs/src/sys/net/netisr.h,v
> retrieving revision 1.51
> diff -u -p -r1.51 netisr.h
> --- sys/net/netisr.h  6 Aug 2019 22:57:54 -   1.51
> +++ sys/net/netisr.h  1 Jul 2020 19:20:27 -
> @@ -48,7 +48,6 @@
>  #define  NETISR_IPV6 24  /* same as AF_INET6 */
>  #define  NETISR_ISDN 26  /* same as AF_E164 */
>  #define  NETISR_PPP  28  /* for PPP processing */
> -#define  NETISR_PIPEX27  /* for 

Re: route add ::/0 ...

2020-07-06 Thread YASUOKA Masahiko


On Tue, 30 Jun 2020 02:42:02 +0200
Klemens Nanni  wrote:
> On Tue, Jun 30, 2020 at 09:00:30AM +0900, YASUOKA Masahiko wrote:
>> inet_makenetandmask() had required another treatment.
>> 
>> Also -prefixlen 0 for -inet has a bug
>> 
>>  % doas ./obj/route -T100 add -inet 0.0.0.0 -prefixlen 0 127.0.0.1
>>  add net 0.0.0.0: gateway 127.0.0.1
>>  % netstat -nrf inet -T 100
>>  Routing tables
>> 
>>  Internet:
>>  DestinationGatewayFlags   Refs  Use   Mtu  Prio 
>> Iface
>>  0.0.0.0/32 127.0.0.1  UGS00 32768 8 
>> lo100
>> 
>> /0 becomes /32.  The diff following also fixes the problem.
> Yes, this looks correct to me;  regress is also happy (again).
> 
> OK kn

Thanks,

I'm  going to commit the diff.  ok or comments, are still welcome.


Stop using make_addr() which trims trailing zeros of the netmask, set
family and length field.  This fixes route(8) to handle "::/0"
properly.  Also fix "route add -inet 0.0.0.0 -prefixlen 0 (gateway)"
to work properly.

Index: sbin/route/route.c
===
RCS file: /cvs/src/sbin/route/route.c,v
retrieving revision 1.247
diff -u -p -r1.247 route.c
--- sbin/route/route.c  15 Jan 2020 10:26:25 -  1.247
+++ sbin/route/route.c  6 Jul 2020 08:45:06 -
@@ -107,7 +107,6 @@ void print_rtmsg(struct rt_msghdr *, in
 voidpmsg_common(struct rt_msghdr *);
 voidpmsg_addrs(char *, int);
 voidbprintf(FILE *, int, char *);
-voidmask_addr(union sockunion *, union sockunion *, int);
 int getaddr(int, int, char *, struct hostent **);
 voidgetmplslabel(char *, int);
 int rtmsg(int, int, int, uint8_t);
@@ -781,12 +780,9 @@ inet_makenetandmask(u_int32_t net, struc
sin->sin_addr.s_addr = htonl(net);
sin = _mask.sin;
sin->sin_addr.s_addr = htonl(mask);
-   sin->sin_len = 0;
-   sin->sin_family = 0;
+   sin->sin_family = AF_INET;
cp = (char *)(>sin_addr + 1);
-   while (*--cp == '\0' && cp > (char *)sin)
-   continue;
-   sin->sin_len = 1 + cp - (char *)sin;
+   sin->sin_len = sizeof(struct sockaddr_in);
 }
 
 /*
@@ -1001,7 +997,8 @@ prefixlen(int af, char *s)
memset(_mask, 0, sizeof(so_mask));
so_mask.sin.sin_family = AF_INET;
so_mask.sin.sin_len = sizeof(struct sockaddr_in);
-   so_mask.sin.sin_addr.s_addr = htonl(0x << (32 - len));
+   if (len != 0)
+   so_mask.sin.sin_addr.s_addr = htonl(0x << (32 - 
len));
break;
case AF_INET6:
so_mask.sin6.sin6_family = AF_INET6;
@@ -1088,8 +1085,6 @@ rtmsg(int cmd, int flags, int fmask, uin
rtm.rtm_mpls = mpls_flags;
rtm.rtm_hdrlen = sizeof(rtm);
 
-   if (rtm_addrs & RTA_NETMASK)
-   mask_addr(_dst, _mask, RTA_DST);
/* store addresses in ascending order of RTA values */
NEXTADDR(RTA_DST, so_dst);
NEXTADDR(RTA_GATEWAY, so_gate);
@@ -1118,34 +1113,6 @@ rtmsg(int cmd, int flags, int fmask, uin
}
 #undef rtm
return (0);
-}
-
-void
-mask_addr(union sockunion *addr, union sockunion *mask, int which)
-{
-   int olen = mask->sa.sa_len;
-   char *cp1 = olen + (char *)mask, *cp2;
-
-   for (mask->sa.sa_len = 0; cp1 > (char *)mask; )
-   if (*--cp1 != '\0') {
-   mask->sa.sa_len = 1 + cp1 - (char *)mask;
-   break;
-   }
-   if ((rtm_addrs & which) == 0)
-   return;
-   switch (addr->sa.sa_family) {
-   case AF_INET:
-   case AF_INET6:
-   case AF_UNSPEC:
-   return;
-   }
-   cp1 = mask->sa.sa_len + 1 + (char *)addr;
-   cp2 = addr->sa.sa_len + 1 + (char *)addr;
-   while (cp2 > cp1)
-   *--cp2 = '\0';
-   cp2 = mask->sa.sa_len + 1 + (char *)mask;
-   while (cp1 > addr->sa.sa_data)
-   *--cp1 &= *--cp2;
 }
 
 char *msgtypes[] = {