Re: gptimer(4): switch to clockintr

2023-01-24 Thread Jonathan Gray
On Tue, Jan 24, 2023 at 08:57:48PM +0100, Patrick Wildt wrote:
> Am Mon, Jan 23, 2023 at 04:34:27PM -0600 schrieb Scott Cheloha:
> > Whoops, missed one.  This is the fifth and (I think) last armv7 clock
> > interrupt driver that needs to switch to clockintr.  gptimer(4) is
> > nearly identical to dmtimer(4).
> > 
> > Notable changes:
> > 
> > - Switch stathz from 128 to hz.
> > - Switch profhz from 1024 to (stathz * 10).
> > 
> > Everything else in the patch is just normal clockintr switching stuff
> > or duplicated from the dmtimer(4) patch.
> > 
> > jca@ has compile-tested this.
> > 
> > I would appreciate a test to confirm that the GENERIC boots.  I don't
> > think we need to do a full release build.
> > 
> > ... if nobody knows where to find a board with gptimer(4), I am
> > looking for permission to just commit this as-is.  I cannot find any
> > entries in the dmesg mails of any machines with gptimer(4).  kettenis@
> > was uncertain whether we actually support any of the machines that have
> > this clock.
> > 
> > Preferences?  ok?
> 
> Should we just get rid of it?  It's only used on OMAP3, which was
> already outdated when I started my ARMv7 endeavour a decade ago.
> I never had that machine, drahn@ might have one in the attic.
> 
> The relevant platforms were the Pandaboard (OMAP4) and the BeagleBone
> Black (AM335x).  But the OMAP3, did this thing ever work?

We don't match it with device tree but it appears to be
ti,omap3430-timer

Appears on OMAP4.  Though OMAP4/Cortex A9 also has amptimer(4).



Re: gptimer(4): switch to clockintr

2023-01-24 Thread Patrick Wildt
Am Tue, Jan 24, 2023 at 10:32:49PM +0100 schrieb Mark Kettenis:
> > Date: Tue, 24 Jan 2023 20:57:48 +0100
> > From: Patrick Wildt 
> > 
> > Am Mon, Jan 23, 2023 at 04:34:27PM -0600 schrieb Scott Cheloha:
> > > Whoops, missed one.  This is the fifth and (I think) last armv7 clock
> > > interrupt driver that needs to switch to clockintr.  gptimer(4) is
> > > nearly identical to dmtimer(4).
> > > 
> > > Notable changes:
> > > 
> > > - Switch stathz from 128 to hz.
> > > - Switch profhz from 1024 to (stathz * 10).
> > > 
> > > Everything else in the patch is just normal clockintr switching stuff
> > > or duplicated from the dmtimer(4) patch.
> > > 
> > > jca@ has compile-tested this.
> > > 
> > > I would appreciate a test to confirm that the GENERIC boots.  I don't
> > > think we need to do a full release build.
> > > 
> > > ... if nobody knows where to find a board with gptimer(4), I am
> > > looking for permission to just commit this as-is.  I cannot find any
> > > entries in the dmesg mails of any machines with gptimer(4).  kettenis@
> > > was uncertain whether we actually support any of the machines that have
> > > this clock.
> > > 
> > > Preferences?  ok?
> > 
> > Should we just get rid of it?  It's only used on OMAP3, which was
> > already outdated when I started my ARMv7 endeavour a decade ago.
> > I never had that machine, drahn@ might have one in the attic.
> > 
> > The relevant platforms were the Pandaboard (OMAP4) and the BeagleBone
> > Black (AM335x).  But the OMAP3, did this thing ever work?
> 
> Might as well commit this such that we have it in the attic just in
> case someone wants to revive OMAP3?

Yup, sounds good to me.

> > > Index: gptimer.c
> > > ===
> > > RCS file: /cvs/src/sys/arch/armv7/omap/gptimer.c,v
> > > retrieving revision 1.17
> > > diff -u -p -r1.17 gptimer.c
> > > --- gptimer.c 22 Jan 2023 18:36:38 -  1.17
> > > +++ gptimer.c 22 Jan 2023 23:52:32 -
> > > @@ -23,9 +23,11 @@
> > >  
> > >  #include 
> > >  #include 
> > > +#include 
> > >  #include 
> > >  #include 
> > >  #include 
> > > +#include 
> > >  #include 
> > >  #include 
> > >  #include 
> > > @@ -93,14 +95,12 @@
> > >  
> > >  #define TIMER_FREQUENCY  32768   /* 32kHz is used, 
> > > selectable */
> > >  
> > > -static struct evcount clk_count;
> > > -static struct evcount stat_count;
> > > -
> > >  void gptimer_attach(struct device *parent, struct device *self, void 
> > > *args);
> > >  int gptimer_intr(void *frame);
> > >  void gptimer_wait(int reg);
> > >  void gptimer_cpu_initclocks(void);
> > >  void gptimer_delay(u_int);
> > > +void gptimer_reset_tisr(void);
> > >  void gptimer_setstatclockrate(int newhz);
> > >  
> > >  bus_space_tag_t gptimer_iot;
> > > @@ -120,13 +120,16 @@ static struct timecounter gptimer_timeco
> > >   .tc_user = 0,
> > >  };
> > >  
> > > -volatile u_int32_t nexttickevent;
> > > -volatile u_int32_t nextstatevent;
> > > -u_int32_tticks_per_second;
> > > -u_int32_tticks_per_intr;
> > > -u_int32_tticks_err_cnt;
> > > -u_int32_tticks_err_sum;
> > > -u_int32_tstatvar, statmin;
> > > +uint64_t gptimer_nsec_cycle_ratio;
> > > +uint64_t gptimer_nsec_max;
> > > +
> > > +void gptimer_rearm(void *, uint64_t);
> > > +void gptimer_trigger(void *);
> > > +
> > > +const struct intrclock gptimer_intrclock = {
> > > + .ic_rearm = gptimer_rearm,
> > > + .ic_trigger = gptimer_trigger
> > > +};
> > >  
> > >  const struct cfattachgptimer_ca = {
> > >   sizeof (struct device), NULL, gptimer_attach
> > > @@ -177,98 +180,10 @@ gptimer_attach(struct device *parent, st
> > >   gptimer_setstatclockrate, NULL);
> > >  }
> > >  
> > > -/*
> > > - * See comment in arm/xscale/i80321_clock.c
> > > - *
> > > - * counter is count up, but with autoreload timers it is not possible
> > > - * to detect how many  interrupts passed while interrupts were blocked.
> > > - * also it is not possible to atomically add to the register
> > > - * get get it to precisely fire at a non-fixed interval.
> > > - *
> > > - * To work around this two timers are used, GPT1 is used as a reference
> > > - * clock without reload , however we just ignore the interrupt it
> > > - * would (may?) generate.
> > > - *
> > > - * Internally this keeps track of when the next timer should fire
> > > - * and based on that time and the current value of the reference
> > > - * clock a number is written into the timer count register to schedule
> > > - * the next event.
> > > - */
> > > -
> > >  int
> > >  gptimer_intr(void *frame)
> > >  {
> > > - u_int32_t now, r;
> > > - u_int32_t nextevent, duration;
> > > -
> > > - /* clear interrupt */
> > > - now = bus_space_read_4(gptimer_iot, gptimer_ioh1, GP_TCRR);
> > > -
> > > - while ((int32_t) (nexttickevent - now) < 0) {
> > > - nexttickevent += ticks_per_intr;
> > > - ticks_err_sum += ticks_err_cnt;
> > > -#if 0
> > > - if (ticks_err_sum  

Re: gptimer(4): switch to clockintr

2023-01-24 Thread Mark Kettenis
> Date: Tue, 24 Jan 2023 20:57:48 +0100
> From: Patrick Wildt 
> 
> Am Mon, Jan 23, 2023 at 04:34:27PM -0600 schrieb Scott Cheloha:
> > Whoops, missed one.  This is the fifth and (I think) last armv7 clock
> > interrupt driver that needs to switch to clockintr.  gptimer(4) is
> > nearly identical to dmtimer(4).
> > 
> > Notable changes:
> > 
> > - Switch stathz from 128 to hz.
> > - Switch profhz from 1024 to (stathz * 10).
> > 
> > Everything else in the patch is just normal clockintr switching stuff
> > or duplicated from the dmtimer(4) patch.
> > 
> > jca@ has compile-tested this.
> > 
> > I would appreciate a test to confirm that the GENERIC boots.  I don't
> > think we need to do a full release build.
> > 
> > ... if nobody knows where to find a board with gptimer(4), I am
> > looking for permission to just commit this as-is.  I cannot find any
> > entries in the dmesg mails of any machines with gptimer(4).  kettenis@
> > was uncertain whether we actually support any of the machines that have
> > this clock.
> > 
> > Preferences?  ok?
> 
> Should we just get rid of it?  It's only used on OMAP3, which was
> already outdated when I started my ARMv7 endeavour a decade ago.
> I never had that machine, drahn@ might have one in the attic.
> 
> The relevant platforms were the Pandaboard (OMAP4) and the BeagleBone
> Black (AM335x).  But the OMAP3, did this thing ever work?

Might as well commit this such that we have it in the attic just in
case someone wants to revive OMAP3?

> > Index: gptimer.c
> > ===
> > RCS file: /cvs/src/sys/arch/armv7/omap/gptimer.c,v
> > retrieving revision 1.17
> > diff -u -p -r1.17 gptimer.c
> > --- gptimer.c   22 Jan 2023 18:36:38 -  1.17
> > +++ gptimer.c   22 Jan 2023 23:52:32 -
> > @@ -23,9 +23,11 @@
> >  
> >  #include 
> >  #include 
> > +#include 
> >  #include 
> >  #include 
> >  #include 
> > +#include 
> >  #include 
> >  #include 
> >  #include 
> > @@ -93,14 +95,12 @@
> >  
> >  #define TIMER_FREQUENCY32768   /* 32kHz is used, 
> > selectable */
> >  
> > -static struct evcount clk_count;
> > -static struct evcount stat_count;
> > -
> >  void gptimer_attach(struct device *parent, struct device *self, void 
> > *args);
> >  int gptimer_intr(void *frame);
> >  void gptimer_wait(int reg);
> >  void gptimer_cpu_initclocks(void);
> >  void gptimer_delay(u_int);
> > +void gptimer_reset_tisr(void);
> >  void gptimer_setstatclockrate(int newhz);
> >  
> >  bus_space_tag_t gptimer_iot;
> > @@ -120,13 +120,16 @@ static struct timecounter gptimer_timeco
> > .tc_user = 0,
> >  };
> >  
> > -volatile u_int32_t nexttickevent;
> > -volatile u_int32_t nextstatevent;
> > -u_int32_t  ticks_per_second;
> > -u_int32_t  ticks_per_intr;
> > -u_int32_t  ticks_err_cnt;
> > -u_int32_t  ticks_err_sum;
> > -u_int32_t  statvar, statmin;
> > +uint64_t gptimer_nsec_cycle_ratio;
> > +uint64_t gptimer_nsec_max;
> > +
> > +void gptimer_rearm(void *, uint64_t);
> > +void gptimer_trigger(void *);
> > +
> > +const struct intrclock gptimer_intrclock = {
> > +   .ic_rearm = gptimer_rearm,
> > +   .ic_trigger = gptimer_trigger
> > +};
> >  
> >  const struct cfattach  gptimer_ca = {
> > sizeof (struct device), NULL, gptimer_attach
> > @@ -177,98 +180,10 @@ gptimer_attach(struct device *parent, st
> > gptimer_setstatclockrate, NULL);
> >  }
> >  
> > -/*
> > - * See comment in arm/xscale/i80321_clock.c
> > - *
> > - * counter is count up, but with autoreload timers it is not possible
> > - * to detect how many  interrupts passed while interrupts were blocked.
> > - * also it is not possible to atomically add to the register
> > - * get get it to precisely fire at a non-fixed interval.
> > - *
> > - * To work around this two timers are used, GPT1 is used as a reference
> > - * clock without reload , however we just ignore the interrupt it
> > - * would (may?) generate.
> > - *
> > - * Internally this keeps track of when the next timer should fire
> > - * and based on that time and the current value of the reference
> > - * clock a number is written into the timer count register to schedule
> > - * the next event.
> > - */
> > -
> >  int
> >  gptimer_intr(void *frame)
> >  {
> > -   u_int32_t now, r;
> > -   u_int32_t nextevent, duration;
> > -
> > -   /* clear interrupt */
> > -   now = bus_space_read_4(gptimer_iot, gptimer_ioh1, GP_TCRR);
> > -
> > -   while ((int32_t) (nexttickevent - now) < 0) {
> > -   nexttickevent += ticks_per_intr;
> > -   ticks_err_sum += ticks_err_cnt;
> > -#if 0
> > -   if (ticks_err_sum  > hz) {
> > -   u_int32_t match_error;
> > -   match_error = ticks_err_sum / hz
> > -   ticks_err_sum -= (match_error * hz);
> > -   }
> > -#else
> > -   /* looping a few times is faster than divide */
> > -   while (ticks_err_sum  > hz) {
> > -   nexttickevent += 1;

Re: gptimer(4): switch to clockintr

2023-01-24 Thread Patrick Wildt
Am Mon, Jan 23, 2023 at 04:34:27PM -0600 schrieb Scott Cheloha:
> Whoops, missed one.  This is the fifth and (I think) last armv7 clock
> interrupt driver that needs to switch to clockintr.  gptimer(4) is
> nearly identical to dmtimer(4).
> 
> Notable changes:
> 
> - Switch stathz from 128 to hz.
> - Switch profhz from 1024 to (stathz * 10).
> 
> Everything else in the patch is just normal clockintr switching stuff
> or duplicated from the dmtimer(4) patch.
> 
> jca@ has compile-tested this.
> 
> I would appreciate a test to confirm that the GENERIC boots.  I don't
> think we need to do a full release build.
> 
> ... if nobody knows where to find a board with gptimer(4), I am
> looking for permission to just commit this as-is.  I cannot find any
> entries in the dmesg mails of any machines with gptimer(4).  kettenis@
> was uncertain whether we actually support any of the machines that have
> this clock.
> 
> Preferences?  ok?

Should we just get rid of it?  It's only used on OMAP3, which was
already outdated when I started my ARMv7 endeavour a decade ago.
I never had that machine, drahn@ might have one in the attic.

The relevant platforms were the Pandaboard (OMAP4) and the BeagleBone
Black (AM335x).  But the OMAP3, did this thing ever work?

> Index: gptimer.c
> ===
> RCS file: /cvs/src/sys/arch/armv7/omap/gptimer.c,v
> retrieving revision 1.17
> diff -u -p -r1.17 gptimer.c
> --- gptimer.c 22 Jan 2023 18:36:38 -  1.17
> +++ gptimer.c 22 Jan 2023 23:52:32 -
> @@ -23,9 +23,11 @@
>  
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 
>  #include 
> @@ -93,14 +95,12 @@
>  
>  #define TIMER_FREQUENCY  32768   /* 32kHz is used, 
> selectable */
>  
> -static struct evcount clk_count;
> -static struct evcount stat_count;
> -
>  void gptimer_attach(struct device *parent, struct device *self, void *args);
>  int gptimer_intr(void *frame);
>  void gptimer_wait(int reg);
>  void gptimer_cpu_initclocks(void);
>  void gptimer_delay(u_int);
> +void gptimer_reset_tisr(void);
>  void gptimer_setstatclockrate(int newhz);
>  
>  bus_space_tag_t gptimer_iot;
> @@ -120,13 +120,16 @@ static struct timecounter gptimer_timeco
>   .tc_user = 0,
>  };
>  
> -volatile u_int32_t nexttickevent;
> -volatile u_int32_t nextstatevent;
> -u_int32_tticks_per_second;
> -u_int32_tticks_per_intr;
> -u_int32_tticks_err_cnt;
> -u_int32_tticks_err_sum;
> -u_int32_tstatvar, statmin;
> +uint64_t gptimer_nsec_cycle_ratio;
> +uint64_t gptimer_nsec_max;
> +
> +void gptimer_rearm(void *, uint64_t);
> +void gptimer_trigger(void *);
> +
> +const struct intrclock gptimer_intrclock = {
> + .ic_rearm = gptimer_rearm,
> + .ic_trigger = gptimer_trigger
> +};
>  
>  const struct cfattachgptimer_ca = {
>   sizeof (struct device), NULL, gptimer_attach
> @@ -177,98 +180,10 @@ gptimer_attach(struct device *parent, st
>   gptimer_setstatclockrate, NULL);
>  }
>  
> -/*
> - * See comment in arm/xscale/i80321_clock.c
> - *
> - * counter is count up, but with autoreload timers it is not possible
> - * to detect how many  interrupts passed while interrupts were blocked.
> - * also it is not possible to atomically add to the register
> - * get get it to precisely fire at a non-fixed interval.
> - *
> - * To work around this two timers are used, GPT1 is used as a reference
> - * clock without reload , however we just ignore the interrupt it
> - * would (may?) generate.
> - *
> - * Internally this keeps track of when the next timer should fire
> - * and based on that time and the current value of the reference
> - * clock a number is written into the timer count register to schedule
> - * the next event.
> - */
> -
>  int
>  gptimer_intr(void *frame)
>  {
> - u_int32_t now, r;
> - u_int32_t nextevent, duration;
> -
> - /* clear interrupt */
> - now = bus_space_read_4(gptimer_iot, gptimer_ioh1, GP_TCRR);
> -
> - while ((int32_t) (nexttickevent - now) < 0) {
> - nexttickevent += ticks_per_intr;
> - ticks_err_sum += ticks_err_cnt;
> -#if 0
> - if (ticks_err_sum  > hz) {
> - u_int32_t match_error;
> - match_error = ticks_err_sum / hz
> - ticks_err_sum -= (match_error * hz);
> - }
> -#else
> - /* looping a few times is faster than divide */
> - while (ticks_err_sum  > hz) {
> - nexttickevent += 1;
> - ticks_err_sum -= hz;
> - }
> -#endif
> - clk_count.ec_count++;
> - hardclock(frame);
> - }
> - while ((int32_t) (nextstatevent - now) < 0) {
> - do {
> - r = random() & (statvar -1);
> - } while (r == 0); /* random == 0 not allowed */
> - nextstatevent += statmin + r;
> - 

Re: refactor mbuf parsing on driver level

2023-01-24 Thread David Gwynne



> On 25 Jan 2023, at 02:27, Vitaliy Makkoveev  wrote:
> 
> 
> 
>> On 24 Jan 2023, at 19:21, Jan Klemkow  wrote:
>> 
>> On Tue, Jan 24, 2023 at 05:40:55PM +0300, Vitaliy Makkoveev wrote:
>>> On Tue, Jan 24, 2023 at 03:14:36PM +0100, Jan Klemkow wrote:
 On Tue, Jan 24, 2023 at 09:32:53PM +1000, David Gwynne wrote:
> On Mon, Jan 23, 2023 at 09:25:34AM +0100, Jan Klemkow wrote:
>> On Wed, Jan 18, 2023 at 03:49:25PM -0700, Theo de Raadt wrote:
>>> Jan Klemkow  wrote:
 On Wed, Jan 18, 2023 at 10:50:25AM +0300, Vitaliy Makkoveev wrote:
> On Tue, Jan 17, 2023 at 11:09:17PM +0100, Jan Klemkow wrote:
>> we have several drivers which have to parse the content of mbufs.  
>> This
>> diff suggest a central parsing function for this.  Thus, we can 
>> reduce
>> redundant code.
>> 
>> I just start with ix(4) and ixl(4) because it was easy to test for 
>> me.
>> But, this could also improve em(4), igc(4), ale(4) and oce(4).
>> 
>> I'm not sure about the name, the api nor the place of this code.  
>> So, if
>> someone has a better idea: i'm open to anything.
> 
> I like code this deduplication.
> 
> This newly introduced function doesn't touch ifnet but only extracts
> protocol headers from mbuf(9). I guess mbuf_extract_headers() or
> something like is much better for name with the ern/uipc_mbuf2.c as
> place.
 
 Good Point.  Updates diff below.
>>> 
>>> I agree, "extract" is a better name.  dlg, do you have a comment?
>> 
>> Whats you opinion about this diff?
> 
> it makes ix and ixl prettier, so that's a good enough reason to do
> it. it should go in net/if_ethersubr.c as ether_extract_headers()
> though.
> 
> could you try using a struct to carry the header pointers around and see
> what that looks like?
> 
> struct ether_extracted {
> struct ether_header *eh;
> struct ip *ip4;
> struct ip6_hdr *ip6;
> struct tcphdr *tcp;
> struct udphdr *udp;
> };
> 
> void ether_extract_headers(struct mbuf *, struct ether_extracted *);
> 
> you can add a depth or flags argument if you want to be able to
> tell it to return before looking for the tcp/udp headers if you
> want.
>>> 
>>> Looks better then m_extract_headers(). Since ext->eh is always assigned
>>> to non NULL value below, the "ext->eh = NULL;" is not necessary. Also
>>> I'm not sure, but is memset() more reliable for `ext' zeroing? Anyway,
>>> feel free to commit without memset().
>> 
>> OK?
>> 
> 
> Sure. ok mvs@

ok by me too.




Re: refactor mbuf parsing on driver level

2023-01-24 Thread Vitaliy Makkoveev



> On 24 Jan 2023, at 19:21, Jan Klemkow  wrote:
> 
> On Tue, Jan 24, 2023 at 05:40:55PM +0300, Vitaliy Makkoveev wrote:
>> On Tue, Jan 24, 2023 at 03:14:36PM +0100, Jan Klemkow wrote:
>>> On Tue, Jan 24, 2023 at 09:32:53PM +1000, David Gwynne wrote:
 On Mon, Jan 23, 2023 at 09:25:34AM +0100, Jan Klemkow wrote:
> On Wed, Jan 18, 2023 at 03:49:25PM -0700, Theo de Raadt wrote:
>> Jan Klemkow  wrote:
>>> On Wed, Jan 18, 2023 at 10:50:25AM +0300, Vitaliy Makkoveev wrote:
 On Tue, Jan 17, 2023 at 11:09:17PM +0100, Jan Klemkow wrote:
> we have several drivers which have to parse the content of mbufs.  
> This
> diff suggest a central parsing function for this.  Thus, we can reduce
> redundant code.
> 
> I just start with ix(4) and ixl(4) because it was easy to test for me.
> But, this could also improve em(4), igc(4), ale(4) and oce(4).
> 
> I'm not sure about the name, the api nor the place of this code.  So, 
> if
> someone has a better idea: i'm open to anything.
 
 I like code this deduplication.
 
 This newly introduced function doesn't touch ifnet but only extracts
 protocol headers from mbuf(9). I guess mbuf_extract_headers() or
 something like is much better for name with the ern/uipc_mbuf2.c as
 place.
>>> 
>>> Good Point.  Updates diff below.
>> 
>> I agree, "extract" is a better name.  dlg, do you have a comment?
> 
> Whats you opinion about this diff?
 
 it makes ix and ixl prettier, so that's a good enough reason to do
 it. it should go in net/if_ethersubr.c as ether_extract_headers()
 though.
 
 could you try using a struct to carry the header pointers around and see
 what that looks like?
 
 struct ether_extracted {
struct ether_header *eh;
struct ip   *ip4;
struct ip6_hdr  *ip6;
struct tcphdr   *tcp;
struct udphdr   *udp;
 };
 
 void ether_extract_headers(struct mbuf *, struct ether_extracted *);
 
 you can add a depth or flags argument if you want to be able to
 tell it to return before looking for the tcp/udp headers if you
 want.
>> 
>> Looks better then m_extract_headers(). Since ext->eh is always assigned
>> to non NULL value below, the "ext->eh = NULL;" is not necessary. Also
>> I'm not sure, but is memset() more reliable for `ext' zeroing? Anyway,
>> feel free to commit without memset().
> 
> OK?
> 

Sure. ok mvs@



Re: refactor mbuf parsing on driver level

2023-01-24 Thread Jan Klemkow
On Tue, Jan 24, 2023 at 05:40:55PM +0300, Vitaliy Makkoveev wrote:
> On Tue, Jan 24, 2023 at 03:14:36PM +0100, Jan Klemkow wrote:
> > On Tue, Jan 24, 2023 at 09:32:53PM +1000, David Gwynne wrote:
> > > On Mon, Jan 23, 2023 at 09:25:34AM +0100, Jan Klemkow wrote:
> > > > On Wed, Jan 18, 2023 at 03:49:25PM -0700, Theo de Raadt wrote:
> > > > > Jan Klemkow  wrote:
> > > > > > On Wed, Jan 18, 2023 at 10:50:25AM +0300, Vitaliy Makkoveev wrote:
> > > > > > > On Tue, Jan 17, 2023 at 11:09:17PM +0100, Jan Klemkow wrote:
> > > > > > > > we have several drivers which have to parse the content of 
> > > > > > > > mbufs.  This
> > > > > > > > diff suggest a central parsing function for this.  Thus, we can 
> > > > > > > > reduce
> > > > > > > > redundant code.
> > > > > > > > 
> > > > > > > > I just start with ix(4) and ixl(4) because it was easy to test 
> > > > > > > > for me.
> > > > > > > > But, this could also improve em(4), igc(4), ale(4) and oce(4).
> > > > > > > > 
> > > > > > > > I'm not sure about the name, the api nor the place of this 
> > > > > > > > code.  So, if
> > > > > > > > someone has a better idea: i'm open to anything.
> > > > > > > 
> > > > > > > I like code this deduplication.
> > > > > > > 
> > > > > > > This newly introduced function doesn't touch ifnet but only 
> > > > > > > extracts
> > > > > > > protocol headers from mbuf(9). I guess mbuf_extract_headers() or
> > > > > > > something like is much better for name with the ern/uipc_mbuf2.c 
> > > > > > > as
> > > > > > > place.
> > > > > > 
> > > > > > Good Point.  Updates diff below.
> > > > > 
> > > > > I agree, "extract" is a better name.  dlg, do you have a comment?
> > > > 
> > > > Whats you opinion about this diff?
> > > 
> > > it makes ix and ixl prettier, so that's a good enough reason to do
> > > it. it should go in net/if_ethersubr.c as ether_extract_headers()
> > > though.
> > > 
> > > could you try using a struct to carry the header pointers around and see
> > > what that looks like?
> > > 
> > > struct ether_extracted {
> > >   struct ether_header *eh;
> > >   struct ip   *ip4;
> > >   struct ip6_hdr  *ip6;
> > >   struct tcphdr   *tcp;
> > >   struct udphdr   *udp;
> > > };
> > > 
> > > void ether_extract_headers(struct mbuf *, struct ether_extracted *);
> > > 
> > > you can add a depth or flags argument if you want to be able to
> > > tell it to return before looking for the tcp/udp headers if you
> > > want.
> 
> Looks better then m_extract_headers(). Since ext->eh is always assigned
> to non NULL value below, the "ext->eh = NULL;" is not necessary. Also
> I'm not sure, but is memset() more reliable for `ext' zeroing? Anyway,
> feel free to commit without memset().

OK?

Thanks,
Jan

Index: dev/pci/if_ix.c
===
RCS file: /cvs/src/sys/dev/pci/if_ix.c,v
retrieving revision 1.189
diff -u -p -r1.189 if_ix.c
--- dev/pci/if_ix.c 2 Sep 2022 14:08:09 -   1.189
+++ dev/pci/if_ix.c 24 Jan 2023 13:34:17 -
@@ -2477,25 +2477,16 @@ static inline int
 ixgbe_csum_offload(struct mbuf *mp, uint32_t *vlan_macip_lens,
 uint32_t *type_tucmd_mlhl, uint32_t *olinfo_status)
 {
-   struct ether_header *eh = mtod(mp, struct ether_header *);
-   struct mbuf *m;
-   int hoff;
+   struct ether_extracted ext;
int offload = 0;
uint32_t iphlen;
-   uint8_t ipproto;
 
-   *vlan_macip_lens |= (sizeof(*eh) << IXGBE_ADVTXD_MACLEN_SHIFT);
+   ether_extract_headers(mp, &ext);
 
-   switch (ntohs(eh->ether_type)) {
-   case ETHERTYPE_IP: {
-   struct ip *ip;
+   *vlan_macip_lens |= (sizeof(*ext.eh) << IXGBE_ADVTXD_MACLEN_SHIFT);
 
-   m = m_getptr(mp, sizeof(*eh), &hoff);
-   KASSERT(m != NULL && m->m_len - hoff >= sizeof(*ip));
-   ip = (struct ip *)(mtod(m, caddr_t) + hoff);
-
-   iphlen = ip->ip_hl << 2;
-   ipproto = ip->ip_p;
+   if (ext.ip4) {
+   iphlen = ext.ip4->ip_hl << 2;
 
if (ISSET(mp->m_pkthdr.csum_flags, M_IPV4_CSUM_OUT)) {
*olinfo_status |= IXGBE_TXD_POPTS_IXSM << 8;
@@ -2503,46 +2494,30 @@ ixgbe_csum_offload(struct mbuf *mp, uint
}
 
*type_tucmd_mlhl |= IXGBE_ADVTXD_TUCMD_IPV4;
-   break;
-   }
-
 #ifdef INET6
-   case ETHERTYPE_IPV6: {
-   struct ip6_hdr *ip6;
-
-   m = m_getptr(mp, sizeof(*eh), &hoff);
-   KASSERT(m != NULL && m->m_len - hoff >= sizeof(*ip6));
-   ip6 = (struct ip6_hdr *)(mtod(m, caddr_t) + hoff);
-
-   iphlen = sizeof(*ip6);
-   ipproto = ip6->ip6_nxt;
+   } else if (ext.ip6) {
+   iphlen = sizeof(*ext.ip6);
 
*type_tucmd_mlhl |= IXGBE_ADVTXD_TUCMD_IPV6;
-   break;
-   }
 #endif
-
-   default:
+   } else {
return offload;

Re: refactor mbuf parsing on driver level

2023-01-24 Thread Vitaliy Makkoveev
On Tue, Jan 24, 2023 at 03:14:36PM +0100, Jan Klemkow wrote:
> On Tue, Jan 24, 2023 at 09:32:53PM +1000, David Gwynne wrote:
> > On Mon, Jan 23, 2023 at 09:25:34AM +0100, Jan Klemkow wrote:
> > > On Wed, Jan 18, 2023 at 03:49:25PM -0700, Theo de Raadt wrote:
> > > > Jan Klemkow  wrote:
> > > > > On Wed, Jan 18, 2023 at 10:50:25AM +0300, Vitaliy Makkoveev wrote:
> > > > > > On Tue, Jan 17, 2023 at 11:09:17PM +0100, Jan Klemkow wrote:
> > > > > > > we have several drivers which have to parse the content of mbufs. 
> > > > > > >  This
> > > > > > > diff suggest a central parsing function for this.  Thus, we can 
> > > > > > > reduce
> > > > > > > redundant code.
> > > > > > > 
> > > > > > > I just start with ix(4) and ixl(4) because it was easy to test 
> > > > > > > for me.
> > > > > > > But, this could also improve em(4), igc(4), ale(4) and oce(4).
> > > > > > > 
> > > > > > > I'm not sure about the name, the api nor the place of this code.  
> > > > > > > So, if
> > > > > > > someone has a better idea: i'm open to anything.
> > > > > > 
> > > > > > I like code this deduplication.
> > > > > > 
> > > > > > This newly introduced function doesn't touch ifnet but only extracts
> > > > > > protocol headers from mbuf(9). I guess mbuf_extract_headers() or
> > > > > > something like is much better for name with the ern/uipc_mbuf2.c as
> > > > > > place.
> > > > > 
> > > > > Good Point.  Updates diff below.
> > > > 
> > > > I agree, "extract" is a better name.  dlg, do you have a comment?
> > > 
> > > Whats you opinion about this diff?
> > 
> > it makes ix and ixl prettier, so that's a good enough reason to do
> > it. it should go in net/if_ethersubr.c as ether_extract_headers()
> > though.
> > 
> > could you try using a struct to carry the header pointers around and see
> > what that looks like?
> > 
> > struct ether_extracted {
> > struct ether_header *eh;
> > struct ip   *ip4;
> > struct ip6_hdr  *ip6;
> > struct tcphdr   *tcp;
> > struct udphdr   *udp;
> > };
> > 
> > void ether_extract_headers(struct mbuf *, struct ether_extracted *);
> > 
> > you can add a depth or flags argument if you want to be able to
> > tell it to return before looking for the tcp/udp headers if you
> > want.
> 
> OK?
> 

Looks better then m_extract_headers(). Since ext->eh is always assigned
to non NULL value below, the "ext->eh = NULL;" is not necessary. Also
I'm not sure, but is memset() more reliable for `ext' zeroing? Anyway,
feel free to commit without memset().

> + /* Return NULL if header was not recognized. */
> + ext->eh = NULL;
> + ext->ip4 = NULL;
> + ext->ip6 = NULL;
> + ext->tcp = NULL;
> + ext->udp = NULL;
> +
> + ext->eh = mtod(mp, struct ether_header *);



Re: refactor mbuf parsing on driver level

2023-01-24 Thread Jan Klemkow
On Tue, Jan 24, 2023 at 09:32:53PM +1000, David Gwynne wrote:
> On Mon, Jan 23, 2023 at 09:25:34AM +0100, Jan Klemkow wrote:
> > On Wed, Jan 18, 2023 at 03:49:25PM -0700, Theo de Raadt wrote:
> > > Jan Klemkow  wrote:
> > > > On Wed, Jan 18, 2023 at 10:50:25AM +0300, Vitaliy Makkoveev wrote:
> > > > > On Tue, Jan 17, 2023 at 11:09:17PM +0100, Jan Klemkow wrote:
> > > > > > we have several drivers which have to parse the content of mbufs.  
> > > > > > This
> > > > > > diff suggest a central parsing function for this.  Thus, we can 
> > > > > > reduce
> > > > > > redundant code.
> > > > > > 
> > > > > > I just start with ix(4) and ixl(4) because it was easy to test for 
> > > > > > me.
> > > > > > But, this could also improve em(4), igc(4), ale(4) and oce(4).
> > > > > > 
> > > > > > I'm not sure about the name, the api nor the place of this code.  
> > > > > > So, if
> > > > > > someone has a better idea: i'm open to anything.
> > > > > 
> > > > > I like code this deduplication.
> > > > > 
> > > > > This newly introduced function doesn't touch ifnet but only extracts
> > > > > protocol headers from mbuf(9). I guess mbuf_extract_headers() or
> > > > > something like is much better for name with the ern/uipc_mbuf2.c as
> > > > > place.
> > > > 
> > > > Good Point.  Updates diff below.
> > > 
> > > I agree, "extract" is a better name.  dlg, do you have a comment?
> > 
> > Whats you opinion about this diff?
> 
> it makes ix and ixl prettier, so that's a good enough reason to do
> it. it should go in net/if_ethersubr.c as ether_extract_headers()
> though.
> 
> could you try using a struct to carry the header pointers around and see
> what that looks like?
> 
> struct ether_extracted {
>   struct ether_header *eh;
>   struct ip   *ip4;
>   struct ip6_hdr  *ip6;
>   struct tcphdr   *tcp;
>   struct udphdr   *udp;
> };
> 
> void ether_extract_headers(struct mbuf *, struct ether_extracted *);
> 
> you can add a depth or flags argument if you want to be able to
> tell it to return before looking for the tcp/udp headers if you
> want.

OK?

Thanks,
Jan

Index: dev/pci/if_ix.c
===
RCS file: /cvs/src/sys/dev/pci/if_ix.c,v
retrieving revision 1.189
diff -u -p -r1.189 if_ix.c
--- dev/pci/if_ix.c 2 Sep 2022 14:08:09 -   1.189
+++ dev/pci/if_ix.c 24 Jan 2023 13:34:17 -
@@ -2477,25 +2477,16 @@ static inline int
 ixgbe_csum_offload(struct mbuf *mp, uint32_t *vlan_macip_lens,
 uint32_t *type_tucmd_mlhl, uint32_t *olinfo_status)
 {
-   struct ether_header *eh = mtod(mp, struct ether_header *);
-   struct mbuf *m;
-   int hoff;
+   struct ether_extracted ext;
int offload = 0;
uint32_t iphlen;
-   uint8_t ipproto;
 
-   *vlan_macip_lens |= (sizeof(*eh) << IXGBE_ADVTXD_MACLEN_SHIFT);
+   ether_extract_headers(mp, &ext);
 
-   switch (ntohs(eh->ether_type)) {
-   case ETHERTYPE_IP: {
-   struct ip *ip;
+   *vlan_macip_lens |= (sizeof(*ext.eh) << IXGBE_ADVTXD_MACLEN_SHIFT);
 
-   m = m_getptr(mp, sizeof(*eh), &hoff);
-   KASSERT(m != NULL && m->m_len - hoff >= sizeof(*ip));
-   ip = (struct ip *)(mtod(m, caddr_t) + hoff);
-
-   iphlen = ip->ip_hl << 2;
-   ipproto = ip->ip_p;
+   if (ext.ip4) {
+   iphlen = ext.ip4->ip_hl << 2;
 
if (ISSET(mp->m_pkthdr.csum_flags, M_IPV4_CSUM_OUT)) {
*olinfo_status |= IXGBE_TXD_POPTS_IXSM << 8;
@@ -2503,46 +2494,30 @@ ixgbe_csum_offload(struct mbuf *mp, uint
}
 
*type_tucmd_mlhl |= IXGBE_ADVTXD_TUCMD_IPV4;
-   break;
-   }
-
 #ifdef INET6
-   case ETHERTYPE_IPV6: {
-   struct ip6_hdr *ip6;
-
-   m = m_getptr(mp, sizeof(*eh), &hoff);
-   KASSERT(m != NULL && m->m_len - hoff >= sizeof(*ip6));
-   ip6 = (struct ip6_hdr *)(mtod(m, caddr_t) + hoff);
-
-   iphlen = sizeof(*ip6);
-   ipproto = ip6->ip6_nxt;
+   } else if (ext.ip6) {
+   iphlen = sizeof(*ext.ip6);
 
*type_tucmd_mlhl |= IXGBE_ADVTXD_TUCMD_IPV6;
-   break;
-   }
 #endif
-
-   default:
+   } else {
return offload;
}
 
*vlan_macip_lens |= iphlen;
 
-   switch (ipproto) {
-   case IPPROTO_TCP:
+   if (ext.tcp) {
*type_tucmd_mlhl |= IXGBE_ADVTXD_TUCMD_L4T_TCP;
if (ISSET(mp->m_pkthdr.csum_flags, M_TCP_CSUM_OUT)) {
*olinfo_status |= IXGBE_TXD_POPTS_TXSM << 8;
offload = 1;
}
-   break;
-   case IPPROTO_UDP:
+   } else if (ext.udp) {
*type_tucmd_mlhl |= IXGBE_ADVTXD_TUCMD_L4T_UDP;
if (ISSET(mp->m_pkthdr.csum_flags, M_UDP_CSUM_OUT)) {
*

Re: bgpd final bits for ASPA support

2023-01-24 Thread Theo Buehler
On Tue, Jan 24, 2023 at 02:21:47PM +0100, Claudio Jeker wrote:
> Here is hopefully the last pieces to finish ASAP support.
> This adds filter support. It is similar to origin validation and called
> AVS (ASPA validation state). Only difference is that not-found is called
> unknown in AVS.
> The diff also adds `bgpctl show rib avs invalid` support.
> 
> ~> bgpctl show rib avs invalid
> flags: * = Valid, > = Selected, I = via IBGP, A = Announced,
>S = Stale, E = Error
> origin validation state: N = not-found, V = valid, ! = invalid
> aspa validation state: ? = unknown, V = valid, ! = invalid
> origin: i = IGP, e = EGP, ? = Incomplete
> 
> flags  vs destination  gateway  lpref   med aspath origin
> *>V-! 2606:b0c0:b00b::/48  2001:db8::252100 0 8271 6939 61138 945 
> i
> *mV-! 2606:b0c0:b00b::/48  2001:db8::253100 0 8271 6939 61138 945 
> i
> 
> So with this I found the first two invalid ASPA paths, time to adjust the
> filters :)

Diff reads fine,

ok tb



bgpd final bits for ASPA support

2023-01-24 Thread Claudio Jeker
Here is hopefully the last pieces to finish ASAP support.
This adds filter support. It is similar to origin validation and called
AVS (ASPA validation state). Only difference is that not-found is called
unknown in AVS.
The diff also adds `bgpctl show rib avs invalid` support.

~> bgpctl show rib avs invalid
flags: * = Valid, > = Selected, I = via IBGP, A = Announced,
   S = Stale, E = Error
origin validation state: N = not-found, V = valid, ! = invalid
aspa validation state: ? = unknown, V = valid, ! = invalid
origin: i = IGP, e = EGP, ? = Incomplete

flags  vs destination  gateway  lpref   med aspath origin
*>V-! 2606:b0c0:b00b::/48  2001:db8::252100 0 8271 6939 61138 945 i
*mV-! 2606:b0c0:b00b::/48  2001:db8::253100 0 8271 6939 61138 945 i

So with this I found the first two invalid ASPA paths, time to adjust the
filters :)
-- 
:wq Claudio

Index: usr.sbin/bgpd/bgpd.conf.5
===
RCS file: /cvs/src/usr.sbin/bgpd/bgpd.conf.5,v
retrieving revision 1.229
diff -u -p -r1.229 bgpd.conf.5
--- usr.sbin/bgpd/bgpd.conf.5   20 Jan 2023 15:41:33 -  1.229
+++ usr.sbin/bgpd/bgpd.conf.5   24 Jan 2023 13:10:37 -
@@ -1538,6 +1538,14 @@ deny from any { AS { 1, 2, 3 }, source-a
 .Ed
 .Pp
 .It Xo
+.Ic avs
+.Pq Ic valid | unknown | invalid
+.Xc
+This rule applies only to
+.Em UPDATES
+where the ASPA Validation State (AVS) matches.
+.Pp
+.It Xo
 .Ic community
 .Ar as-number Ns Li \&: Ns Ar local
 .Xc
Index: usr.sbin/bgpd/bgpd.h
===
RCS file: /cvs/src/usr.sbin/bgpd/bgpd.h,v
retrieving revision 1.459
diff -u -p -r1.459 bgpd.h
--- usr.sbin/bgpd/bgpd.h24 Jan 2023 11:28:41 -  1.459
+++ usr.sbin/bgpd/bgpd.h24 Jan 2023 13:10:37 -
@@ -94,6 +94,9 @@
 #defineF_CTL_OVS_NOTFOUND  0x20
 #defineF_CTL_NEIGHBORS 0x40 /* only used by bgpctl */
 #defineF_CTL_HAS_PATHID0x80 /* only set on requests */
+#defineF_CTL_AVS_VALID 0x100
+#defineF_CTL_AVS_INVALID   0x200
+#defineF_CTL_AVS_UNKNOWN   0x400
 
 #define CTASSERT(x)extern char  _ctassert[(x) ? 1 : -1 ] \
__attribute__((__unused__))
@@ -896,7 +899,7 @@ struct filter_originset {
struct rde_prefixset*ps;
 };
 
-struct filter_ovs {
+struct filter_vs {
uint8_t  validity;
uint8_t  is_set;
 };
@@ -1082,7 +1085,8 @@ struct filter_match {
struct communitycommunity[MAX_COMM_MATCH];
struct filter_prefixset prefixset;
struct filter_originset originset;
-   struct filter_ovs   ovs;
+   struct filter_vsovs;
+   struct filter_vsavs;
int maxcomm;
int maxextcomm;
int maxlargecomm;
Index: usr.sbin/bgpd/parse.y
===
RCS file: /cvs/src/usr.sbin/bgpd/parse.y,v
retrieving revision 1.439
diff -u -p -r1.439 parse.y
--- usr.sbin/bgpd/parse.y   20 Jan 2023 15:41:33 -  1.439
+++ usr.sbin/bgpd/parse.y   24 Jan 2023 13:10:37 -
@@ -231,7 +231,7 @@ typedef struct {
 %token COMMUNITY EXTCOMMUNITY LARGECOMMUNITY DELETE
 %token MAXCOMMUNITIES MAXEXTCOMMUNITIES MAXLARGECOMMUNITIES
 %token PREFIX PREFIXLEN PREFIXSET
-%token ASPASET ROASET ORIGINSET OVS EXPIRES
+%token ASPASET ROASET ORIGINSET OVS AVS EXPIRES
 %token ASSET SOURCEAS TRANSITAS PEERAS PROVIDERAS CUSTOMERAS MAXASLEN MAXASSEQ
 %token SET LOCALPREF MED METRIC NEXTHOP REJECT BLACKHOLE NOMODIFY SELF
 %token PREPEND_SELF PREPEND_PEER PFTABLE WEIGHT RTLABEL ORIGIN PRIORITY
@@ -244,7 +244,8 @@ typedef struct {
 %token   NUMBER
 %typeasnumber as4number as4number_any optnumber
 %typeespah family safi restart origincode nettype
-%typeyesno inout restricted validity expires enforce
+%typeyesno inout restricted expires enforce
+%typevalidity aspa_validity
 %typeaddpathextra addpathmax
 %typestring
 %type  address
@@ -2622,6 +2623,14 @@ filter_elm   : filter_prefix_h   {
fmopts.m.ovs.validity = $2;
fmopts.m.ovs.is_set = 1;
}
+   | AVS aspa_validity {
+   if (fmopts.m.avs.is_set) {
+   yyerror("avs filter already specified");
+   YYERROR;
+   }
+   fmopts.m.avs.validity = $2;
+   fmopts.m.avs.is_set = 1;
+   }
;
 
 prefixlenop: /* empty */   { memset(&$$, 0, sizeof($$)

Re: bgpd validate ASPATH with ASPA

2023-01-24 Thread Theo Buehler
On Fri, Jan 20, 2023 at 02:44:16PM +0100, Claudio Jeker wrote:
> On Fri, Jan 20, 2023 at 12:21:14PM +0100, Claudio Jeker wrote:
> > This diff adds the reload logic and rewrites larger parts of what was
> > already there to have ASPA validation in the RDE.
> > 
> > The main reason this diff is so large is that the ASPA state cache on
> > struct rde_aspath needs to be afi/aid and role independent. So I changed
> > the aspa functions to be role and aid independent which results in a lot
> > of churn.
> > 
> > The code now uses rde_aspa_validity() with the cache in rde_aspath to
> > figure out if a prefix is ASPA valid, invalid or unknown.
> > rde_aspa_validity() is cheap since it just checks various bits to decide.
> > The cache is updated by checking a generation counter that is increased
> > during reload. This is done since the tables are walked by prefix and not
> > by ASPATH.
> > 
> > There is still no filter syntax available to deny aspa invalid routes but
> > that will follow soon.
> > 
> > The diff includes bgpd, bgpctl and regress test changes. There is a lot of
> > churn in regress test because of bgpctl output changes.
> 
> I missed a small bit in the diff. In rde_filter_match() the state->vstate
> needs to be masked with the ROA_MASK else the ovs validity will not match.
> 
> I added this to the big diff but just included the delta here.

I spent a lot of time on this diff and can't spot anything wrong with it.
It is super tricky, so it is of course very easy to miss subtleties...

ok tb