Re: timeout.9: document kclock timeouts + a bunch of other changes

2021-06-24 Thread Jason McIntyre
On Thu, Jun 24, 2021 at 05:39:56PM -0500, Scott Cheloha wrote:
> On Thu, Jun 24, 2021 at 06:51:07AM +0100, Jason McIntyre wrote:
> > On Wed, Jun 23, 2021 at 06:57:00PM -0500, Scott Cheloha wrote:
> > > Hi,
> > > 
> > > I want to document kclock timeouts so others can use them.
> > 
> > morning. reads fine, except one issue:
> > 
> > [...]
> > 
> > > +.Bl -tag -width kclock
> > > +.It Fa kclock
> > > +The timeout is scheduled against the given
> > > +.Fa kclock,
> > 
> > you need a space between kclock and the comma
> 
> Huh, I'm a little surprised -Tlint doesn't flag that.
> 

it did! you must have just missed it somehow.

> Fixed.
> 
> Are there any other mdoc(7)-type stylistic anachronisms we can fix up?
> 

everything else looked ok.
jmc

> Index: share/man/man9/timeout.9
> ===
> RCS file: /cvs/src/share/man/man9/timeout.9,v
> retrieving revision 1.53
> diff -u -p -r1.53 timeout.9
> --- share/man/man9/timeout.9  11 May 2021 13:29:25 -  1.53
> +++ share/man/man9/timeout.9  24 Jun 2021 22:37:38 -
> @@ -44,7 +44,7 @@
>  .Nm timeout_triggered ,
>  .Nm TIMEOUT_INITIALIZER ,
>  .Nm TIMEOUT_INITIALIZER_FLAGS
> -.Nd execute a function after a specified period of time
> +.Nd execute a function in the future
>  .Sh SYNOPSIS
>  .In sys/types.h
>  .In sys/timeout.h
> @@ -55,12 +55,13 @@
>  .Fa "struct timeout *to"
>  .Fa "void (*fn)(void *)"
>  .Fa "void *arg"
> +.Fa "int kclock"
>  .Fa "int flags"
>  .Fc
>  .Ft void
>  .Fn timeout_set_proc "struct timeout *to" "void (*fn)(void *)" "void *arg"
>  .Ft int
> -.Fn timeout_add "struct timeout *to" "int ticks"
> +.Fn timeout_add "struct timeout *to" "int nticks"
>  .Ft int
>  .Fn timeout_del "struct timeout *to"
>  .Ft int
> @@ -83,174 +84,218 @@
>  .Fn timeout_add_usec "struct timeout *to" "int usec"
>  .Ft int
>  .Fn timeout_add_nsec "struct timeout *to" "int nsec"
> -.Fn TIMEOUT_INITIALIZER "void (*fn)(void *)" "void *arg"
> -.Fn TIMEOUT_INITIALIZER_FLAGS "void (*fn)(void *)" "void *arg" "int flags"
> +.Ft int
> +.Fn timeout_in_nsec "struct timeout *to" "uint64_t nsecs"
> +.Ft int
> +.Fn timeout_at_ts "struct timeout *to" "const struct timespec *abs"
> +.Fo TIMEOUT_INITIALIZER
> +.Fa "void (*fn)(void *)"
> +.Fa "void *arg"
> +.Fc
> +.Fo TIMEOUT_INITIALIZER_FLAGS
> +.Fa "void (*fn)(void *)"
> +.Fa "void *arg"
> +.Fa "int kclock"
> +.Fa "int flags"
> +.Fc
>  .Sh DESCRIPTION
>  The
>  .Nm timeout
> -API provides a mechanism to execute a function at a given time.
> -The granularity of the time is limited by the granularity of the
> -.Xr hardclock 9
> -timer which executes
> -.Xr hz 9
> -times a second.
> +API provides a mechanism to schedule a function for asynchronous
> +execution in the future.
>  .Pp
> -It is the responsibility of the caller to provide these functions with
> -pre-allocated timeout structures.
> +All state is encapsulated in a caller-allocated timeout structure
> +.Pq hereafter, a Qo timeout Qc .
> +A timeout must be initialized before it may be used as input to other
> +functions in the API.
>  .Pp
>  The
>  .Fn timeout_set
> -function prepares the timeout structure
> -.Fa to
> -to be used in future calls to
> -.Fn timeout_add
> -and
> -.Fn timeout_del .
> -The timeout will be prepared to call the function specified by the
> +function initializes the timeout
> +.Fa to .
> +When executed,
> +the timeout will call the function
>  .Fa fn
> -argument with a
> -.Fa void *
> -argument given in the
> +with
>  .Fa arg
> -argument.
> -Once initialized, the
> -.Fa to
> -structure can be used repeatedly in
> -.Fn timeout_add
> -and
> -.Fn timeout_del
> -and does not need to be reinitialized unless
> -the function called and/or its argument must change.
> +as its first parameter.
> +The timeout is implicitly scheduled against the
> +.Dv KCLOCK_NONE
> +clock and is not configured with any additional flags.
>  .Pp
>  The
>  .Fn timeout_set_flags
>  function is similar to
> -.Fn timeout_set
> -but it additionally accepts the bitwise OR of zero or more of the
> -following
> +.Fn timeout_set ,
> +except that it takes two additional parameters:
> +.Bl -tag -width kclock
> +.It Fa kclock
> +The timeout is scheduled against the given
> +.Fa kclock ,
> +which must be one of the following:
> +.Bl -tag -width KCLOCK_UPTIME
> +.It Dv KCLOCK_NONE
> +Low resolution tick-based clock.
> +The granularity of this clock is limited by the
> +.Xr hardclock 9 ,
> +which executes roughly
> +.Xr hz 9
> +times per second.
> +.It Dv KCLOCK_UPTIME
> +The uptime clock.
> +Counts the time elapsed since the system booted.
> +.El
> +.It Fa flags
> +The timeout's behavior may be configured with the bitwise OR of
> +zero or more of the following
>  .Fa flags :
> -.Bl -tag -width TIMEOUT_PROC -offset indent
> +.Bl -tag -width TIMEOUT_PROC
>  .It Dv TIMEOUT_PROC
> -Runs the timeout in a process context instead of the default
> +Execute the timeout in a process context instead of the default
>  .Dv IPL_SOFTCLOCK
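> 
> For readers who want to see the proposed interfaces together, here is a
> sketch of how a driver might use them as the new text describes.  The
> handler, softc, and delay are illustrative placeholders, and this is
> kernel context, so it is not compilable standalone:
> 
> 	struct timeout my_to;
> 
> 	void my_handler(void *);
> 
> 	void
> 	example(struct my_softc *sc)
> 	{
> 		/* Run my_handler(sc) in process context against the uptime clock. */
> 		timeout_set_flags(&my_to, my_handler, sc, KCLOCK_UPTIME,
> 		    TIMEOUT_PROC);
> 
> 		/* Fire roughly 500ms from now; timeout_in_nsec() is relative. */
> 		timeout_in_nsec(&my_to, 500 * 1000 * 1000ULL);
> 	}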
>  

Re: [External] : Re: if_etherbridge.c vs. parallel forwarding

2021-06-24 Thread David Gwynne
On Thu, Jun 24, 2021 at 09:31:20AM +0200, Alexandr Nedvedicky wrote:
> Hello David,
> 
> 
> > 
> > i think we can get away with not refcounting eb_entry structures at all.
> > either they're in the etherbridge map/table or they're not, and the
> > thing that takes them out of the map while holding the eb_lock mutex
> > becomes responsible for their cleanup.
> > 
> > i feel like most of the original logic can still hold up if we fix my
> > stupid refcnting mistake(s) and do a better job of avoiding a double
> > free.
> 
> I'm not sure. It seems to me the code in your diff deals with
> insert vs. insert race properly. how about delete vs. insert?

hrm. i can half convince myself that the outcome of losing a delete vs
insert race would have a semantically correct outcome, but while i'm
trying to do that it occurs to me that i'm trying to make the code too
clever and i should dumb it down.

> 350 mtx_enter(&eb->eb_lock);
> 351 num = eb->eb_num + (oebe == NULL);
>
> 352 if (num <= eb->eb_max && ebt_insert(eb, nebe) == oebe) {  
>
> 353 /* we won, do the update */   
>
> 354 ebl_insert(ebl, nebe);
> 355 
> 356 if (oebe != NULL) {
> 357 ebl_remove(ebl, oebe);
> 358 ebt_replace(eb, oebe, nebe);
> 359 }
> 360 
> 361 nebe = NULL; /* give nebe reference to the table */
> 362 eb->eb_num = num; 
>
> 363 } else {  
>
> 364 /* we lost, we didn't end up replacing oebe */
> 365 oebe = NULL;
> 366 }
> 367 mtx_leave(&eb->eb_lock);
> 368 
> 
> assume cpu0 got oebe and assumes it is going to perform update (oebe != 
> NULL).
> the cpu1 runs ahead and won the mutex (eb->eb_lock) in etherbridge_del_addr() 
> and
> removed the entry successfully. as soon as cpu1 leaves ->eb_lock, it's
> cpu0's turn. In this case ebt_insert() returns NULL, because there is
> no conflict any more. However 'NULL != oebe'.

in this situation it would look like etherbridge_del_addr ran after nebe
was inserted, which i think is a plausible history.

> I'm not sure we can fix insert vs. delete race properly without atomic
> reference counter.

during the drive to work it occurred to me that we should basically have
the same logic around whether we should insert or replace or do nothing
in both the smr and mutex critical sections.

it at least makes the code easier to understand. i think?

Index: if_etherbridge.c
===
RCS file: /cvs/src/sys/net/if_etherbridge.c,v
retrieving revision 1.6
diff -u -p -r1.6 if_etherbridge.c
--- if_etherbridge.c10 Mar 2021 10:21:47 -  1.6
+++ if_etherbridge.c25 Jun 2021 03:56:37 -
@@ -44,7 +44,6 @@
 
 #include 
 
-static inline void ebe_take(struct eb_entry *);
 static inline void ebe_rele(struct eb_entry *);
 static void	ebe_free(void *);
 
@@ -233,16 +232,9 @@ ebt_remove(struct etherbridge *eb, struc
 }
 
 static inline void
-ebe_take(struct eb_entry *ebe)
-{
-   refcnt_take(&ebe->ebe_refs);
-}
-
-static void
 ebe_rele(struct eb_entry *ebe)
 {
-   if (refcnt_rele(&ebe->ebe_refs))
-   smr_call(&ebe->ebe_smr_entry, ebe_free, ebe);
+   smr_call(&ebe->ebe_smr_entry, ebe_free, ebe);
 }
 
 static void
@@ -309,19 +301,21 @@ etherbridge_map(struct etherbridge *eb, 
 
smr_read_enter();
oebe = ebl_find(ebl, eba);
-   if (oebe == NULL)
-   new = 1;
-   else {
+   if (oebe == NULL) {
+   /*
+* peek at the space to see if it's worth trying
+* to make a new entry.
+*/
+   if (eb->eb_num < eb->eb_max)
+   new = 1;
+   } else {
if (oebe->ebe_age != now)
oebe->ebe_age = now;
 
/* does this entry need to be replaced? */
if (oebe->ebe_type == EBE_DYNAMIC &&
-   !eb_port_eq(eb, oebe->ebe_port, port)) {
+   !eb_port_eq(eb, oebe->ebe_port, port))
new = 1;
-   ebe_take(oebe);
-   } else
-   oebe = NULL;
}
smr_read_leave();
 
@@ -342,7 +336,6 @@ etherbridge_map(struct etherbridge *eb, 
}
 
 smr_init(&nebe->ebe_smr_entry);
-   refcnt_init(&nebe->ebe_refs);
nebe->ebe_etherbridge = eb;
 
nebe->ebe_addr = eba;
@@ -351,40 +344,49 @@ etherbridge_map(struct etherbridge *eb, 

Re: [please test] amd64: schedule clock interrupts against system clock

2021-06-24 Thread Scott Cheloha
On Thu, Jun 24, 2021 at 09:50:07PM -0500, Scott Cheloha wrote:
> 
> [...]
> 
> Stats for the clockintr subsystem are exposed via sysctl(2).  If
> you're interested in providing them you can compile and run the
> program attached inline in my next mail.  A snippet of the output from
> across a suspend/resume is especially useful.
> 
> [...]

Here's the program I mentioned.  It prints clockintr stats once per
second until you kill it (C-c works).

You will need to do `make includes` first, as the patch in the prior
mail added a new file to sys/dev and changed .  That or
you can do a full `make build`.

Assuming you put the inline file in clockintrstat.c, just build and
run:

$ cc -o clockintrstat clockintrstat.c
$ ./clockintrstat

Thanks,

-Scott

--

#include 
#include 

#include 

#include 
#include 
#include 
#include 
#include 
#include 

volatile sig_atomic_t uninterrupted = 1;
volatile sig_atomic_t need_header = 1;

void alrm(int);
void info(int);
void quit(int);

int
main(void)
{
struct clockintr_stat diff, new, old;
struct itimerval itv;
struct timespec elapsed, now, then;
uint64_t avg_lateness;
int mib[] = { CTL_KERN, KERN_CLOCKINTR_STATS };
sigset_t empty;
size_t len;
int first, rows;

memset(, 0, sizeof(diff));
memset(, 0, sizeof(new));
memset(, 0, sizeof(old));
timespecclear(&now);
timespecclear(&then);
sigemptyset(&empty);
len = sizeof(old);
first = 1;

signal(SIGALRM, alrm);
signal(SIGINT, quit);
signal(SIGINFO, info);

itv.it_value.tv_sec = 1;
itv.it_value.tv_usec = 0;
itv.it_interval = itv.it_value;
if (setitimer(ITIMER_REAL, &itv, NULL) == -1)
err(1, "setitimer");

while (uninterrupted) {
if (need_header) {
need_header = 0;
printf("%20s %15s %7s %7s %7s %10s\n",
"uptime",
"elapsed",
"run",
"early",
"prompt",
"avg-late");
if (!first) {
sigsuspend(&empty);
continue;
}
}
old = new;
if (sysctl(mib, nitems(mib), &new, &len, NULL, 0) == -1)
err(1, "sysctl: KERN_CLOCKINTR_STATS");
then = now;
clock_gettime(CLOCK_BOOTTIME, &now);
if (first) {
first = 0;
sigsuspend(&empty);
continue;
}
timespecsub(&now, &then, &elapsed);
diff.cs_dispatch_early = new.cs_dispatch_early - 
old.cs_dispatch_early;
diff.cs_dispatch_prompt = new.cs_dispatch_prompt - 
old.cs_dispatch_prompt;
diff.cs_dispatch_lateness = new.cs_dispatch_lateness - 
old.cs_dispatch_lateness;
diff.cs_events_run = new.cs_events_run - old.cs_events_run;
if (diff.cs_dispatch_prompt == 0)
avg_lateness = 0;
else
avg_lateness = diff.cs_dispatch_lateness / 
diff.cs_dispatch_prompt;
printf("%10lld.%09ld %5lld.%09ld %7llu %7llu %7llu %10llu\n",
(long long)now.tv_sec,
now.tv_nsec,
(long long)elapsed.tv_sec,
elapsed.tv_nsec,
diff.cs_events_run,
diff.cs_dispatch_early,
diff.cs_dispatch_prompt,
avg_lateness);
sigsuspend(&empty);
}

return 0;
}

void
alrm(int signo)
{
}

void
info(int signo)
{
need_header = 1;
}

void
quit(int signo)
{
uninterrupted = 0;
}



[please test] amd64: schedule clock interrupts against system clock

2021-06-24 Thread Scott Cheloha
Hi,

I'm looking for testers for the attached patch.  You need an amd64
machine with a lapic.

This includes:

- All "real" amd64 machines ever made
- amd64 VMs running on hypervisors that provide a virtual lapic

Note that this does *not* include:

- amd64 VMs running on OpenBSD's vmm(4).

(I will ask for a separate round of testing for vmm(4) VMs, don't
worry.)

The patch adds a new machine-independent clock interrupt scheduling
layer (hereafter, "clockintr") to the kernel in kern/kern_clockintr.c,
configures GENERIC amd64 kernels to use clockintr, and changes
amd64/lapic.c to use clockintr instead of calling hardclock(9)
directly.

Please apply the patch and make sure to reconfigure your kernel before
recompiling/installing it to test.  I am especially interested in
whether this breaks suspend/resume or hibernate/unhibernate.
Suspend/resume is unaffected on my Lenovo X1C7, is the same true for
your machine?  Please include a dmesg with your results.

Stats for the clockintr subsystem are exposed via sysctl(2).  If
you're interested in providing them you can compile and run the
program attached inline in my next mail.  A snippet of the output from
across a suspend/resume is especially useful.

This is the end of the mail if you just want to test this.  If you are
interested in the possible behavior changes or a description of how
clockintr works, keep reading.

Thanks,

-Scott

--

There are some behavior changes, but I have found them to be small,
harmless, and/or useful.  The first one is the most significant:

- Clockintr schedules events against the system clock, so hardclock(9)
  ticks are pegged to the system clock, so the length of a tick is now
  subject to NTP adjustment via adjtime(2) and adjfreq(2).

  In practice, NTP adjustment is very conservative.  In my testing the
  delta between the raw frequency and the NTP frequency is small
  when ntpd(8) is doing coarse correction with adjtime(2) and invisible
  when ntpd(8) is doing fine correction with adjfreq(2).

  The upshot of the frequency difference is sometimes you will get
  some spurious ("early") interrupts while ntpd(8) is correcting the
  clock.  They go away once ntpd(8) finishes synchronizing.

  FWIW: Linux, FreeBSD, and DragonflyBSD have all made this jump.

- hardclock(9) will run simultaneously on every CPU in the system.
  This seems to be fine, but there might be some subtle contention
  that gets worse as you add more CPUs.  Worth investigating.

- amd64 gets a pseudorandom statclock().  This is desirable, right?

- "Lost" or delayed ticks are handled by the clockintr layer
  transparently.  This means that if the clock interrupt is delayed
  due to e.g. hypervisor delays, we don't "lose" ticks and the
  timeout schedule does not decay.

  This is super relevant for vmm(4), but it may also be relevant for
  other hypervisors.

--

Last, here are notes for people interested in the design or the actual
code.  Ask questions if something about my approach seems off, I have
never added a subsystem to the kernel before.  The code has not
changed much in the last six months so I think I am nearing a stable
design.  I will document these interfaces in a real manpage soon.

- Clockintr prototypes are declared in .

- Clockintr depends on the timecounter and the system clock to do its
  scheduling.  If there is no working timecounter the machine will
  hang, as multitasking preemption will cease.

- Global clockintr initialization is done via clockintr_init().  You
  call this from cpu_initclocks() on the primary CPU *after* you install
  a timecounter.  The function sets a global frequency for the
  hardclock(9) (required), a global frequency for the statclock()
  (or zero if you don't want a statclock), and sets global behavior
  flags.

  There is only one flag right now, CI_RNDSTAT, which toggles whether
  the statclock() has a pseudorandom period.  If the platform has a
  one-shot clock (e.g. amd64, arm64, etc.) it makes sense to set
  CI_RNDSTAT.  If the platform does not have a one-shot clock (e.g.
  alpha) there is no point in setting CI_RNDSTAT as the hardware
  cannot provide the feature.

- Per-CPU clockintr initialization is done via clockintr_cpu_init().
  On the primary CPU, call this immediately *after* you call
  clockintr_init().  Secondary CPUs should call this late in
  cpu_hatch(), probably right before cpu_switchto(9).  The function
  allocates memory for the local CPU's schedule and "installs" the local
  interrupt clock, if any.

  If the platform cannot provide a local interrupt clock with a
  one-shot mode you just pass a NULL pointer to clockintr_cpu_init().
  The clockintr layer on that CPU will then run in "dummy" mode.  The
  platform is left responsible for scheduling clock interrupt delivery
  without input from the clockintr code.  The platform should deliver a
  clock interrupt hz(9) times per second to keep things running
  smoothly.  This mode works but it is not very accurate.

  If the platform 

Re: timeout.9: document kclock timeouts + a bunch of other changes

2021-06-24 Thread Scott Cheloha
On Thu, Jun 24, 2021 at 06:51:07AM +0100, Jason McIntyre wrote:
> On Wed, Jun 23, 2021 at 06:57:00PM -0500, Scott Cheloha wrote:
> > Hi,
> > 
> > I want to document kclock timeouts so others can use them.
> 
> morning. reads fine, except one issue:
> 
> [...]
> 
> > +.Bl -tag -width kclock
> > +.It Fa kclock
> > +The timeout is scheduled against the given
> > +.Fa kclock,
> 
> you need a space between kclock and the comma

Huh, I'm a little surprised -Tlint doesn't flag that.

Fixed.

Are there any other mdoc(7)-type stylistic anachronisms we can fix up?

Index: share/man/man9/timeout.9
===
RCS file: /cvs/src/share/man/man9/timeout.9,v
retrieving revision 1.53
diff -u -p -r1.53 timeout.9
--- share/man/man9/timeout.911 May 2021 13:29:25 -  1.53
+++ share/man/man9/timeout.924 Jun 2021 22:37:38 -
@@ -44,7 +44,7 @@
 .Nm timeout_triggered ,
 .Nm TIMEOUT_INITIALIZER ,
 .Nm TIMEOUT_INITIALIZER_FLAGS
-.Nd execute a function after a specified period of time
+.Nd execute a function in the future
 .Sh SYNOPSIS
 .In sys/types.h
 .In sys/timeout.h
@@ -55,12 +55,13 @@
 .Fa "struct timeout *to"
 .Fa "void (*fn)(void *)"
 .Fa "void *arg"
+.Fa "int kclock"
 .Fa "int flags"
 .Fc
 .Ft void
 .Fn timeout_set_proc "struct timeout *to" "void (*fn)(void *)" "void *arg"
 .Ft int
-.Fn timeout_add "struct timeout *to" "int ticks"
+.Fn timeout_add "struct timeout *to" "int nticks"
 .Ft int
 .Fn timeout_del "struct timeout *to"
 .Ft int
@@ -83,174 +84,218 @@
 .Fn timeout_add_usec "struct timeout *to" "int usec"
 .Ft int
 .Fn timeout_add_nsec "struct timeout *to" "int nsec"
-.Fn TIMEOUT_INITIALIZER "void (*fn)(void *)" "void *arg"
-.Fn TIMEOUT_INITIALIZER_FLAGS "void (*fn)(void *)" "void *arg" "int flags"
+.Ft int
+.Fn timeout_in_nsec "struct timeout *to" "uint64_t nsecs"
+.Ft int
+.Fn timeout_at_ts "struct timeout *to" "const struct timespec *abs"
+.Fo TIMEOUT_INITIALIZER
+.Fa "void (*fn)(void *)"
+.Fa "void *arg"
+.Fc
+.Fo TIMEOUT_INITIALIZER_FLAGS
+.Fa "void (*fn)(void *)"
+.Fa "void *arg"
+.Fa "int kclock"
+.Fa "int flags"
+.Fc
 .Sh DESCRIPTION
 The
 .Nm timeout
-API provides a mechanism to execute a function at a given time.
-The granularity of the time is limited by the granularity of the
-.Xr hardclock 9
-timer which executes
-.Xr hz 9
-times a second.
+API provides a mechanism to schedule a function for asynchronous
+execution in the future.
 .Pp
-It is the responsibility of the caller to provide these functions with
-pre-allocated timeout structures.
+All state is encapsulated in a caller-allocated timeout structure
+.Pq hereafter, a Qo timeout Qc .
+A timeout must be initialized before it may be used as input to other
+functions in the API.
 .Pp
 The
 .Fn timeout_set
-function prepares the timeout structure
-.Fa to
-to be used in future calls to
-.Fn timeout_add
-and
-.Fn timeout_del .
-The timeout will be prepared to call the function specified by the
+function initializes the timeout
+.Fa to .
+When executed,
+the timeout will call the function
 .Fa fn
-argument with a
-.Fa void *
-argument given in the
+with
 .Fa arg
-argument.
-Once initialized, the
-.Fa to
-structure can be used repeatedly in
-.Fn timeout_add
-and
-.Fn timeout_del
-and does not need to be reinitialized unless
-the function called and/or its argument must change.
+as its first parameter.
+The timeout is implicitly scheduled against the
+.Dv KCLOCK_NONE
+clock and is not configured with any additional flags.
 .Pp
 The
 .Fn timeout_set_flags
 function is similar to
-.Fn timeout_set
-but it additionally accepts the bitwise OR of zero or more of the
-following
+.Fn timeout_set ,
+except that it takes two additional parameters:
+.Bl -tag -width kclock
+.It Fa kclock
+The timeout is scheduled against the given
+.Fa kclock ,
+which must be one of the following:
+.Bl -tag -width KCLOCK_UPTIME
+.It Dv KCLOCK_NONE
+Low resolution tick-based clock.
+The granularity of this clock is limited by the
+.Xr hardclock 9 ,
+which executes roughly
+.Xr hz 9
+times per second.
+.It Dv KCLOCK_UPTIME
+The uptime clock.
+Counts the time elapsed since the system booted.
+.El
+.It Fa flags
+The timeout's behavior may be configured with the bitwise OR of
+zero or more of the following
 .Fa flags :
-.Bl -tag -width TIMEOUT_PROC -offset indent
+.Bl -tag -width TIMEOUT_PROC
 .It Dv TIMEOUT_PROC
-Runs the timeout in a process context instead of the default
+Execute the timeout in a process context instead of the default
 .Dv IPL_SOFTCLOCK
 interrupt context.
 .El
+.El
 .Pp
 The
 .Fn timeout_set_proc
-function is similar to
+function is equivalent to
+.Fn timeout_set ,
+except that the given timeout is configured with the
+.Dv TIMEOUT_PROC
+flag.
+.Pp
+A timeout may also be initialized statically.
+The
+.Fn TIMEOUT_INITIALIZER
+macro is equivalent to the
 .Fn timeout_set
-but it runs the timeout in a process context instead of the default
-.Dv IPL_SOFTCLOCK
-interrupt context.
+function and the
+.Fn 

OpenSMTPD: status set incorrectly on envelope expiration + fix

2021-06-24 Thread Erik Brens

Our OpenSMTPD daemon sent us the following confusing message:


A message is delayed for more than 0 seconds for the following
list of recipients:

b...@gus.test: Envelope expired


The enclosed patch will fix the issue permanently.

Sincerely, Erik
Index: usr.sbin/smtpd/queue.c
===
RCS file: /cvs/src/usr.sbin/smtpd/queue.c,v
retrieving revision 1.193
diff -u -p -r1.193 queue.c
--- usr.sbin/smtpd/queue.c  14 Jun 2021 17:58:16 -  1.193
+++ usr.sbin/smtpd/queue.c  24 Jun 2021 20:34:29 -
@@ -199,7 +199,7 @@ queue_imsg(struct mproc *p, struct imsg 
 
bounce.type = B_FAILED;
envelope_set_errormsg(&evp, "Envelope expired");
-   envelope_set_esc_class(&evp, ESC_STATUS_TEMPFAIL);
+   envelope_set_esc_class(&evp, ESC_STATUS_PERMFAIL);
envelope_set_esc_code(&evp, ESC_DELIVERY_TIME_EXPIRED);
queue_bounce(&evp, &bounce);
queue_log(&evp, "Expire", "Envelope expired");


Re: bgpd refactor network flush code a bit

2021-06-24 Thread Sebastian Benoit
Claudio Jeker(cje...@diehard.n-r-g.com) on 2021.06.24 17:06:36 +0200:
> The network flush code only operates on peerself (like all the other
> network commands). Instead of passing a peer to the tree walker just
> default to peerself in network_flush_upcall().
This makes it more obvious that the code operates on peerself.
> 

ok


> -- 
> :wq Claudio
> 
> Index: rde.c
> ===
> RCS file: /cvs/src/usr.sbin/bgpd/rde.c,v
> retrieving revision 1.528
> diff -u -p -r1.528 rde.c
> --- rde.c 24 Jun 2021 13:03:31 -  1.528
> +++ rde.c 24 Jun 2021 15:00:32 -
> @@ -517,7 +517,7 @@ badnetdel:
>   break;
>   }
>   if (rib_dump_new(RIB_ADJ_IN, AID_UNSPEC,
> - RDE_RUNNER_ROUNDS, peerself, network_flush_upcall,
> + RDE_RUNNER_ROUNDS, NULL, network_flush_upcall,
>   NULL, NULL) == -1)
>   log_warn("rde_dispatch: IMSG_NETWORK_FLUSH");
>   break;
> @@ -4065,13 +4065,12 @@ network_dump_upcall(struct rib_entry *re
>  static void
>  network_flush_upcall(struct rib_entry *re, void *ptr)
>  {
> - struct rde_peer *peer = ptr;
>   struct bgpd_addr addr;
>   struct prefix *p;
>   u_int32_t i;
>   u_int8_t prefixlen;
>  
> - p = prefix_bypeer(re, peer);
> + p = prefix_bypeer(re, peerself);
>   if (p == NULL)
>   return;
>   if ((prefix_aspath(p)->flags & F_ANN_DYNAMIC) != F_ANN_DYNAMIC)
> @@ -4084,14 +4083,14 @@ network_flush_upcall(struct rib_entry *r
>   struct rib *rib = rib_byid(i);
>   if (rib == NULL)
>   continue;
> - if (prefix_withdraw(rib, peer, &addr, prefixlen) == 1)
> - rde_update_log("flush announce", i, peer,
> + if (prefix_withdraw(rib, peerself, &addr, prefixlen) == 1)
> + rde_update_log("flush announce", i, peerself,
>   NULL, &addr, prefixlen);
>   }
>  
> - if (prefix_withdraw(rib_byid(RIB_ADJ_IN), peer, &addr,
> + if (prefix_withdraw(rib_byid(RIB_ADJ_IN), peerself, &addr,
>   prefixlen) == 1)
> - peer->prefix_cnt--;
> + peerself->prefix_cnt--;
>  }
>  
>  /* clean up */
> 



Re: bgpd shuffle some code around

2021-06-24 Thread Sebastian Benoit
ok

Claudio Jeker(cje...@diehard.n-r-g.com) on 2021.06.24 17:03:58 +0200:
> In rde_update_dispatch() do the AFI check for IPv4 prefixes before
> extracting the prefix. This is similar to what the MP code does and
> is also more logical.
> 
> OK?
> -- 
> :wq Claudio
> 
> Index: rde.c
> ===
> RCS file: /cvs/src/usr.sbin/bgpd/rde.c,v
> retrieving revision 1.528
> diff -u -p -r1.528 rde.c
> --- rde.c 24 Jun 2021 13:03:31 -  1.528
> +++ rde.c 24 Jun 2021 15:00:10 -
> @@ -1280,6 +1280,14 @@ rde_update_dispatch(struct rde_peer *pee
>  
>   /* withdraw prefix */
>   while (len > 0) {
> + if (peer->capa.mp[AID_INET] == 0) {
> + log_peer_warnx(&peer->conf,
> + "bad withdraw, %s disabled", aid2str(AID_INET));
> + rde_update_err(peer, ERR_UPDATE, ERR_UPD_OPTATTR,
> + NULL, 0);
> + goto done;
> + }
> +
>   if ((pos = nlri_get_prefix(p, len, &prefix,
>   &prefixlen)) == -1) {
>   /*
> @@ -1294,14 +1302,6 @@ rde_update_dispatch(struct rde_peer *pee
>   p += pos;
>   len -= pos;
>  
> - if (peer->capa.mp[AID_INET] == 0) {
> - log_peer_warnx(&peer->conf,
> - "bad withdraw, %s disabled", aid2str(AID_INET));
> - rde_update_err(peer, ERR_UPDATE, ERR_UPD_OPTATTR,
> - NULL, 0);
> - goto done;
> - }
> -
>   rde_update_withdraw(peer, &prefix, prefixlen);
>   }
>  
> @@ -1393,6 +1393,14 @@ rde_update_dispatch(struct rde_peer *pee
>  
>   /* parse nlri prefix */
>   while (nlri_len > 0) {
> + if (peer->capa.mp[AID_INET] == 0) {
> + log_peer_warnx(&peer->conf,
> + "bad update, %s disabled", aid2str(AID_INET));
> + rde_update_err(peer, ERR_UPDATE, ERR_UPD_OPTATTR,
> + NULL, 0);
> + goto done;
> + }
> +
>   if ((pos = nlri_get_prefix(p, nlri_len, &prefix,
>   &prefixlen)) == -1) {
>   log_peer_warnx(&peer->conf, "bad nlri prefix");
> @@ -1402,14 +1410,6 @@ rde_update_dispatch(struct rde_peer *pee
>   }
>   p += pos;
>   nlri_len -= pos;
> -
> - if (peer->capa.mp[AID_INET] == 0) {
> - log_peer_warnx(&peer->conf,
> - "bad update, %s disabled", aid2str(AID_INET));
> - rde_update_err(peer, ERR_UPDATE, ERR_UPD_OPTATTR,
> - NULL, 0);
> - goto done;
> - }
>  
>   if (rde_update_update(peer, &state, &prefix, prefixlen) == -1)
>   goto done;
> 



Re: bgpd fix add-path capability encoding

2021-06-24 Thread Sebastian Benoit
Claudio Jeker(cje...@diehard.n-r-g.com) on 2021.06.22 19:47:04 +0200:
> Dumb copy paste error. The add-path capability is 4byte per AFI/SAFI
> the 2 + is from graceful restart where two extra bytes are at the front of
> the AFI/SAFI list.

ok.

> 
> -- 
> :wq Claudio
> 
> Index: session.c
> ===
> RCS file: /cvs/src/usr.sbin/bgpd/session.c,v
> retrieving revision 1.421
> diff -u -p -r1.421 session.c
> --- session.c 17 Jun 2021 16:05:26 -  1.421
> +++ session.c 22 Jun 2021 11:50:08 -
> @@ -1470,9 +1470,9 @@ session_open(struct peer *p)
>   u_int8_t aplen;
>  
>   if (mpcapa)
> - aplen = 2 + 4 * mpcapa;
> + aplen = 4 * mpcapa;
>   else /* AID_INET */
> - aplen = 2 + 4;
> + aplen = 4;
>   errs += session_capa_add(opb, CAPA_ADD_PATH, aplen);
>   if (mpcapa) {
>   for (i = AID_MIN; i < AID_MAX; i++) {
> 



SiFive Unmatched if_cad fix

2021-06-24 Thread Mickael Torres
Hello,

On the risc-v SiFive Unmatched the internal cad0 ethernet interface stops 
working randomly after some packets are sent/received. It looks like it's 
because the bus_dmamap used isn't restricted to lower than 4GB physical 
addresses, and the interface itself is.

Configuring the interface for 64-bit DMA fixes the problem, and the machine 
is now usable with its internal ethernet port.

I didn't test very extensively, but it was very easy to run into 
"cad0: hresp error, interface stopped" before the patch. After the patch it 
survived a couple hours of tests and ping -f from and to it.

It now depends on whether the kernel is compiled for __riscv64__; would it be 
better to decide dynamically when matching "sifive,fu740-c000-gem" ?

Best,
Mickael



Index: sys/dev/fdt/if_cad.c
===
RCS file: /cvs/src/sys/dev/fdt/if_cad.c,v
retrieving revision 1.2
diff -u -p -u -r1.2 if_cad.c
--- sys/dev/fdt/if_cad.c13 Jun 2021 02:56:48 -  1.2
+++ sys/dev/fdt/if_cad.c24 Jun 2021 18:56:07 -
@@ -54,6 +54,10 @@
 #include 
 #include 
 
+#ifdef __riscv64__
+#define GEM_DMA64
+#endif
+
 #define GEM_NETCTL 0x
 #define  GEM_NETCTL_DPRAM  (1 << 18)
 #define  GEM_NETCTL_STARTTX (1 << 9)
@@ -92,6 +96,7 @@
 #define  GEM_DMACR_ES_DESCR (1 << 6)
 #define  GEM_DMACR_BLEN_MASK   (0x1f << 0)
 #define  GEM_DMACR_BLEN_16 (0x10 << 0)
+#define  GEM_DMACR_DMA64   (1 << 30)
 #define GEM_TXSR   0x0014
 #define  GEM_TXSR_TXGO (1 << 3)
 #define GEM_RXQBASE 0x0018
@@ -168,6 +173,8 @@
 #define GEM_RXIPCCNT   0x01a8
 #define GEM_RXTCPCCNT  0x01ac
 #define GEM_RXUDPCCNT  0x01b0
+#define GEM_TXQBASE_HI 0x04c8
+#define GEM_RXQBASE_HI 0x04d4
 
 #define GEM_CLK_TX "tx_clk"
 
@@ -186,6 +193,10 @@ struct cad_dmamem {
 struct cad_desc {
 	uint32_t	d_addr;
 	uint32_t	d_status;
+#ifdef GEM_DMA64
+   uint32_t	d_addrh;
+   uint32_t	d_pad;
+#endif
 };
 
 #define GEM_RXD_ADDR_WRAP  (1 << 1)
@@ -211,6 +222,10 @@ struct cad_desc {
 struct cad_txdesc {
 	uint32_t	txd_addr;
 	uint32_t	txd_status;
+#ifdef GEM_DMA64
+   uint32_t	txd_addrh;
+   uint32_t	txd_pad;
+#endif
 };
 
 #define GEM_TXD_USED   (1 << 31)
@@ -549,6 +564,10 @@ cad_reset(struct cad_softc *sc)
HWRITE4(sc, GEM_TXSR, 0);
HWRITE4(sc, GEM_RXQBASE, 0);
HWRITE4(sc, GEM_TXQBASE, 0);
+#ifdef GEM_DMA64
+   HWRITE4(sc, GEM_RXQBASE_HI, 0);
+   HWRITE4(sc, GEM_TXQBASE_HI, 0);
+#endif
 
/* MDIO clock rate must not exceed 2.5 MHz. */
freq = clock_get_frequency(sc->sc_node, "pclk");
@@ -607,6 +626,10 @@ cad_up(struct cad_softc *sc)
0, sc->sc_txring->cdm_size,
BUS_DMASYNC_PREREAD | BUS_DMASYNC_PREWRITE);
 
+#ifdef GEM_DMA64
+   HWRITE4(sc, GEM_TXQBASE_HI,
+   sc->sc_txring->cdm_map->dm_segs[0].ds_addr >> 32);
+#endif
HWRITE4(sc, GEM_TXQBASE, sc->sc_txring->cdm_map->dm_segs[0].ds_addr);
 
/*
@@ -642,6 +665,10 @@ cad_up(struct cad_softc *sc)
0, sc->sc_rxring->cdm_size,
BUS_DMASYNC_PREREAD | BUS_DMASYNC_PREWRITE);
 
+#ifdef GEM_DMA64
+   HWRITE4(sc, GEM_RXQBASE_HI,
+   sc->sc_rxring->cdm_map->dm_segs[0].ds_addr >> 32);
+#endif
HWRITE4(sc, GEM_RXQBASE, sc->sc_rxring->cdm_map->dm_segs[0].ds_addr);
 
/*
@@ -697,6 +724,9 @@ cad_up(struct cad_softc *sc)
val |= (MCLBYTES / 64) << GEM_DMACR_RXBUF_SHIFT;
val &= ~GEM_DMACR_BLEN_MASK;
val |= GEM_DMACR_BLEN_16;
+#ifdef GEM_DMA64
+   val |= GEM_DMACR_DMA64;
+#endif
 
if (ifp->if_capabilities & IFCAP_CSUM_IPv4)
val |= GEM_DMACR_TXCSUMEN;
@@ -987,6 +1017,9 @@ cad_encap(struct cad_softc *sc, struct m
 
txd = >sc_txdesc[idx];
txd->d_addr = map->dm_segs[i].ds_addr;
+#ifdef GEM_DMA64
+   txd->d_addrh = map->dm_segs[i].ds_addr >> 32;
+#endif
 
/* Make d_addr visible before GEM_TXD_USED is cleared
 * in d_status. */
@@ -1151,6 +1184,10 @@ cad_rxfill(struct cad_softc *sc)
rxd = >sc_rxdesc[idx];
rxd->d_status = 0;
 
+#ifdef GEM_DMA64
+   rxd->d_addrh = rxb->bf_map->dm_segs[0].ds_addr >> 32;
+#endif
+
/* Make d_status visible before clearing GEM_RXD_ADDR_USED
 * in d_addr. */
bus_dmamap_sync(sc->sc_dmat, sc->sc_rxring->cdm_map,
@@ -1413,9 +1450,16 @@ cad_dmamem_alloc(struct cad_softc *sc, b
cdm = malloc(sizeof(*cdm), M_DEVBUF, M_WAITOK | M_ZERO);

Whitespace diff for sys/arch/amd64/amd64/identcpu.c

2021-06-24 Thread Ashton Fagg
Found these while tinkering with my CPU scaling issue the other day. Two
instances of trailing white-space.

Index: sys/arch/amd64/amd64/identcpu.c
===
RCS file: /cvs/src/sys/arch/amd64/amd64/identcpu.c,v
retrieving revision 1.118
diff -u -p -u -p -r1.118 identcpu.c
--- sys/arch/amd64/amd64/identcpu.c	31 Dec 2020 06:22:33 -	1.118
+++ sys/arch/amd64/amd64/identcpu.c	24 Jun 2021 15:42:05 -
@@ -941,7 +941,7 @@ cpu_check_vmm_cap(struct cpu_info *ci)
 			/* VM Functions available? */
 			if (msr & (IA32_VMX_ENABLE_VM_FUNCTIONS) << 32) {
 ci->ci_vmm_cap.vcc_vmx.vmx_vm_func =
-rdmsr(IA32_VMX_VMFUNC);	
+rdmsr(IA32_VMX_VMFUNC);
 			}
 		}
 	}
@@ -1025,7 +1025,7 @@ cpu_check_vmm_cap(struct cpu_info *ci)
 		 * hardware (RDCL_NO), or we may be nested in an VMM that
 		 * is doing flushes (SKIP_L1DFL_VMENTRY) using the MSR.
 		 * In either case no mitigation at all is necessary.
-		 */	
+		 */
 		if (ci->ci_feature_sefflags_edx & SEFF0EDX_ARCH_CAP) {
 			msr = rdmsr(MSR_ARCH_CAPABILITIES);
 			if ((msr & ARCH_CAPABILITIES_RDCL_NO) ||


OpenBSD Errata: June 25, 2021 (bgpd)

2021-06-24 Thread Sebastian Benoit
An errata patch for the bgpd routing daemon has been released for
OpenBSD 6.9.

During bgpd(8) config reloads, prefixes of the wrong address family could
leak to peers, resulting in session resets.

Binary updates for the amd64, i386, and arm64 platforms are available
via the syspatch utility. Source code patches can be found on the
respective errata page:

  https://www.openbsd.org/errata69.html
  



bgpd refactor network flush code a bit

2021-06-24 Thread Claudio Jeker
The network flush code only operates on peerself (like all the other
network commands). Instead of passing a peer to the tree walker, just
default to peerself in network_flush_upcall().
This makes it more obvious that the code operates on peerself.

-- 
:wq Claudio

Index: rde.c
===
RCS file: /cvs/src/usr.sbin/bgpd/rde.c,v
retrieving revision 1.528
diff -u -p -r1.528 rde.c
--- rde.c   24 Jun 2021 13:03:31 -  1.528
+++ rde.c   24 Jun 2021 15:00:32 -
@@ -517,7 +517,7 @@ badnetdel:
break;
}
if (rib_dump_new(RIB_ADJ_IN, AID_UNSPEC,
-   RDE_RUNNER_ROUNDS, peerself, network_flush_upcall,
+   RDE_RUNNER_ROUNDS, NULL, network_flush_upcall,
NULL, NULL) == -1)
log_warn("rde_dispatch: IMSG_NETWORK_FLUSH");
break;
@@ -4065,13 +4065,12 @@ network_dump_upcall(struct rib_entry *re
 static void
 network_flush_upcall(struct rib_entry *re, void *ptr)
 {
-   struct rde_peer *peer = ptr;
struct bgpd_addr addr;
struct prefix *p;
u_int32_t i;
u_int8_t prefixlen;
 
-   p = prefix_bypeer(re, peer);
+   p = prefix_bypeer(re, peerself);
if (p == NULL)
return;
if ((prefix_aspath(p)->flags & F_ANN_DYNAMIC) != F_ANN_DYNAMIC)
@@ -4084,14 +4083,14 @@ network_flush_upcall(struct rib_entry *r
struct rib *rib = rib_byid(i);
if (rib == NULL)
continue;
-   if (prefix_withdraw(rib, peer, &addr, prefixlen) == 1)
-   rde_update_log("flush announce", i, peer,
+   if (prefix_withdraw(rib, peerself, &addr, prefixlen) == 1)
+   rde_update_log("flush announce", i, peerself,
NULL, &addr, prefixlen);
}
 
-   if (prefix_withdraw(rib_byid(RIB_ADJ_IN), peer, &addr,
+   if (prefix_withdraw(rib_byid(RIB_ADJ_IN), peerself, &addr,
prefixlen) == 1)
-   peer->prefix_cnt--;
+   peerself->prefix_cnt--;
 }
 
 /* clean up */



bgpd shuffle some code around

2021-06-24 Thread Claudio Jeker
In rde_update_dispatch() do the AFI check for IPv4 prefixes before
extracting the prefix. This is similar to what the MP code does and
is also more logical.

OK?
-- 
:wq Claudio

Index: rde.c
===
RCS file: /cvs/src/usr.sbin/bgpd/rde.c,v
retrieving revision 1.528
diff -u -p -r1.528 rde.c
--- rde.c   24 Jun 2021 13:03:31 -  1.528
+++ rde.c   24 Jun 2021 15:00:10 -
@@ -1280,6 +1280,14 @@ rde_update_dispatch(struct rde_peer *pee
 
/* withdraw prefix */
while (len > 0) {
+   if (peer->capa.mp[AID_INET] == 0) {
+   log_peer_warnx(&peer->conf,
+   "bad withdraw, %s disabled", aid2str(AID_INET));
+   rde_update_err(peer, ERR_UPDATE, ERR_UPD_OPTATTR,
+   NULL, 0);
+   goto done;
+   }
+
if ((pos = nlri_get_prefix(p, len, &prefix,
&prefixlen)) == -1) {
/*
@@ -1294,14 +1302,6 @@ rde_update_dispatch(struct rde_peer *pee
p += pos;
len -= pos;
 
-   if (peer->capa.mp[AID_INET] == 0) {
-   log_peer_warnx(&peer->conf,
-   "bad withdraw, %s disabled", aid2str(AID_INET));
-   rde_update_err(peer, ERR_UPDATE, ERR_UPD_OPTATTR,
-   NULL, 0);
-   goto done;
-   }
-
rde_update_withdraw(peer, &prefix, prefixlen);
}
 
@@ -1393,6 +1393,14 @@ rde_update_dispatch(struct rde_peer *pee
 
/* parse nlri prefix */
while (nlri_len > 0) {
+   if (peer->capa.mp[AID_INET] == 0) {
+   log_peer_warnx(&peer->conf,
+   "bad update, %s disabled", aid2str(AID_INET));
+   rde_update_err(peer, ERR_UPDATE, ERR_UPD_OPTATTR,
+   NULL, 0);
+   goto done;
+   }
+
if ((pos = nlri_get_prefix(p, nlri_len, &prefix,
&prefixlen)) == -1) {
log_peer_warnx(&peer->conf, "bad nlri prefix");
@@ -1402,14 +1410,6 @@ rde_update_dispatch(struct rde_peer *pee
}
p += pos;
nlri_len -= pos;
-
-   if (peer->capa.mp[AID_INET] == 0) {
-   log_peer_warnx(&peer->conf,
-   "bad update, %s disabled", aid2str(AID_INET));
-   rde_update_err(peer, ERR_UPDATE, ERR_UPD_OPTATTR,
-   NULL, 0);
-   goto done;
-   }
 
if (rde_update_update(peer, &state, &prefix, prefixlen) == -1)
goto done;



Re: bgpd fix bad free() call when deflating aspath

2021-06-24 Thread Claudio Jeker
On Tue, Jun 22, 2021 at 08:19:22PM +0200, Claudio Jeker wrote:
> I changed the way up_generate_attr() calls aspath_deflate() but did not
> realize that aspath_deflate() frees the pdata at the end. That free should
> no longer happen, but for that the mrt case, which also uses
> aspath_deflate(), needs to be adjusted as well.
> 
> With this both the mrt and as0 integration test pass again.

Ping. This crashes bgpd when talking to old 2-byte only BGP speakers. So
I would like to commit this soon.

-- 
:wq Claudio

Index: rde_attr.c
===
RCS file: /cvs/src/usr.sbin/bgpd/rde_attr.c,v
retrieving revision 1.124
diff -u -p -r1.124 rde_attr.c
--- rde_attr.c  16 Jan 2021 13:14:54 -  1.124
+++ rde_attr.c  22 Jun 2021 18:13:15 -
@@ -568,7 +568,6 @@ aspath_put(struct aspath *aspath)
 
 /*
  * convert a 4 byte aspath to a 2 byte one.
- * data is freed by aspath_deflate
  */
 u_char *
 aspath_deflate(u_char *data, u_int16_t *len, int *flagnew)
@@ -614,7 +613,6 @@ aspath_deflate(u_char *data, u_int16_t *
}
}
 
-   free(data);
*len = nlen;
return (ndata);
 }
Index: mrt.c
===
RCS file: /cvs/src/usr.sbin/bgpd/mrt.c,v
retrieving revision 1.103
diff -u -p -r1.103 mrt.c
--- mrt.c   9 Jan 2020 11:55:25 -   1.103
+++ mrt.c   22 Jun 2021 18:12:45 -
@@ -160,15 +160,19 @@ mrt_attr_dump(struct ibuf *buf, struct r
return (-1);
 
/* aspath */
-   pdata = aspath_prepend(a->aspath, rde_local_as(), 0, &plen);
+   plen = aspath_length(a->aspath);
+   pdata = aspath_dump(a->aspath);
+
if (!v2)
pdata = aspath_deflate(pdata, &plen, &flag);
if (attr_writebuf(buf, ATTR_WELL_KNOWN, ATTR_ASPATH, pdata,
plen) == -1) {
-   free(pdata);
+   if (!v2)
+   free(pdata);
return (-1);
}
-   free(pdata);
+   if (!v2)
+   free(pdata);
 
if (nexthop && nexthop->aid == AID_INET) {
/* nexthop, already network byte order */




Re: Bad comparison between double and uint64_t

2021-06-24 Thread Masato Asou
Hi Ali,

From: Ali Farzanrad 
Date: Thu, 24 Jun 2021 07:25:24 +

> Hi Masato,
> 
> Masato Asou  wrote:
>> Hi Ali,
>> 
>> In this case, ULLONG_MAX is implicitly cast to double, isn't it?
> 
> Yes it is implicitly cast to double, but due to floating point number
> precision, ULLONG_MAX will convert to (ULLONG_MAX + 1).
> So in case val is really ULLONG_MAX + 1, this condition is false:
> 
> if (val > ULLONG_MAX)
> 
> and when we want to convert val to u_int64_t we will have
> (ULLONG_MAX + 1) which is 0.
> 
> *n = val; /* stores ULLONG_MAX + 1 which is 0 */

I was not aware of that problem.
Thank you for the explanation.

> I'm not sure if it causes any real problem, but it is wrong and should be
> fixed.

It is true that I don't know what specific problem would occur in
practice, but there is a potential problem.
--
ASOU Masato

>> 
>> Do you have any problems if you don't cast to double? 
>> --
>> ASOU Masato
>> 
>> From: Ali Farzanrad 
>> Date: Wed, 23 Jun 2021 20:24:27 +
>> 
>> > Oh, my bad, sorry.
>> > I assumed that val is always integer.
>> > I guess it is better to ignore val == ULLONG_MAX:
>> > 
>> > ===
>> > RCS file: /cvs/src/sbin/disklabel/editor.c,v
>> > retrieving revision 1.368
>> > diff -u -p -r1.368 editor.c
>> > --- editor.c   2021/05/30 19:02:30 1.368
>> > +++ editor.c   2021/06/23 20:23:03
>> > @@ -2418,7 +2418,7 @@ apply_unit(double val, u_char unit, u_int64_t *n)
>> >}
>> >  
>> >val *= factor / DEV_BSIZE;
>> > -  if (val > ULLONG_MAX)
>> > +  if (val >= (double)ULLONG_MAX)
>> >return (-1);
>> >*n = val;
>> >return (0);
>> > @@ -2429,7 +2429,7 @@ parse_sizespec(const char *buf, double *val, char 
>> > **un
>> >  {
>> >errno = 0;
>> >*val = strtod(buf, unit);
>> > -  if (errno == ERANGE || *val < 0 || *val > ULLONG_MAX)
>> > +  if (errno == ERANGE || *val < 0 || *val >= (double)ULLONG_MAX)
>> >return (-1);/* too big/small */
>> >if (*val == 0 && *unit == buf)
>> >return (-1);/* No conversion performed. */
>> > 
>> > Ali Farzanrad  wrote:
>> >> Hi tech@,
>> >> 
>> >> disklabel shows a warning at build time which might be important.
>> >> The following diff will suppress the warning.
>> >> 
>> >> ===
>> >> RCS file: /cvs/src/sbin/disklabel/editor.c,v
>> >> retrieving revision 1.368
>> >> diff -u -p -r1.368 editor.c
>> >> --- editor.c  2021/05/30 19:02:30 1.368
>> >> +++ editor.c  2021/06/23 19:25:45
>> >> @@ -2418,7 +2418,7 @@ apply_unit(double val, u_char unit, u_int64_t *n)
>> >>   }
>> >>  
>> >>   val *= factor / DEV_BSIZE;
>> >> - if (val > ULLONG_MAX)
>> >> + if (val != (double)(u_int64_t)val)
>> >>   return (-1);
>> >>   *n = val;
>> >>   return (0);
>> >> @@ -2429,7 +2429,7 @@ parse_sizespec(const char *buf, double *val, char 
>> >> **un
>> >>  {
>> >>   errno = 0;
>> >>   *val = strtod(buf, unit);
>> >> - if (errno == ERANGE || *val < 0 || *val > ULLONG_MAX)
>> >> + if (errno == ERANGE || *val < 0 || *val != (double)(u_int64_t)*val)
>> >>   return (-1);/* too big/small */
>> >>   if (*val == 0 && *unit == buf)
>> >>   return (-1);/* No conversion performed. */
>> >> 
>> >> 
>> > 
>> 
>> 



Re: [External] : Re: if_etherbridge.c vs. parallel forwarding

2021-06-24 Thread Alexandr Nedvedicky
Hello David,


> 
> i think we can get away with not refcounting eb_entry structures at all.
> either they're in the etherbridge map/table or they're not, and the
> thing that takes them out of the map while holding the eb_lock mutex
> becomes responsible for their cleanup.
> 
> i feel like most of the original logic can still hold up if we fix my
> stupid refcnting mistake(s) and do a better job of avoiding a double
> free.

I'm not sure. It seems to me the code in your diff deals with the
insert vs. insert race properly. How about delete vs. insert?

350 mtx_enter(&eb->eb_lock);
351 num = eb->eb_num + (oebe == NULL);
352 if (num <= eb->eb_max && ebt_insert(eb, nebe) == oebe) {
353 /* we won, do the update */
354 ebl_insert(ebl, nebe);
355 
356 if (oebe != NULL) {
357 ebl_remove(ebl, oebe);
358 ebt_replace(eb, oebe, nebe);
359 }
360 
361 nebe = NULL; /* give nebe reference to the table */
362 eb->eb_num = num;
363 } else {
364 /* we lost, we didn't end up replacing oebe */
365 oebe = NULL;
366 }
367 mtx_leave(&eb->eb_lock);
368 

Assume cpu0 got oebe and is going to perform an update (oebe != NULL).
cpu1 runs ahead, wins the mutex (->eb_lock) in etherbridge_del_addr(), and
removes the entry successfully. As soon as cpu1 leaves ->eb_lock, it's
cpu0's turn. In this case ebt_insert() returns NULL, because there is
no conflict any more. However, 'NULL != oebe'.

I'm not sure we can fix insert vs. delete race properly without atomic
reference counter.

thanks and
regards
sashan



Re: Bad comparison between double and uint64_t

2021-06-24 Thread Ali Farzanrad
Hi Masato,

Masato Asou  wrote:
> Hi Ali,
> 
> In this case, ULLONG_MAX is implicitly cast to double, isn't it?

Yes it is implicitly cast to double, but due to floating point number
precision, ULLONG_MAX will convert to (ULLONG_MAX + 1).
So in case val is really ULLONG_MAX + 1, this condition is false:

if (val > ULLONG_MAX)

and when we want to convert val to u_int64_t we will have
(ULLONG_MAX + 1) which is 0.

*n = val; /* stores ULLONG_MAX + 1 which is 0 */

I'm not sure if it causes any real problem, but it is wrong and should be
fixed.

> 
> Do you have any problems if you don't cast to double? 
> --
> ASOU Masato
> 
> From: Ali Farzanrad 
> Date: Wed, 23 Jun 2021 20:24:27 +
> 
> > Oh, my bad, sorry.
> > I assumed that val is always integer.
> > I guess it is better to ignore val == ULLONG_MAX:
> > 
> > ===
> > RCS file: /cvs/src/sbin/disklabel/editor.c,v
> > retrieving revision 1.368
> > diff -u -p -r1.368 editor.c
> > --- editor.c2021/05/30 19:02:30 1.368
> > +++ editor.c2021/06/23 20:23:03
> > @@ -2418,7 +2418,7 @@ apply_unit(double val, u_char unit, u_int64_t *n)
> > }
> >  
> > val *= factor / DEV_BSIZE;
> > -   if (val > ULLONG_MAX)
> > +   if (val >= (double)ULLONG_MAX)
> > return (-1);
> > *n = val;
> > return (0);
> > @@ -2429,7 +2429,7 @@ parse_sizespec(const char *buf, double *val, char **un
> >  {
> > errno = 0;
> > *val = strtod(buf, unit);
> > -   if (errno == ERANGE || *val < 0 || *val > ULLONG_MAX)
> > +   if (errno == ERANGE || *val < 0 || *val >= (double)ULLONG_MAX)
> > return (-1);/* too big/small */
> > if (*val == 0 && *unit == buf)
> > return (-1);/* No conversion performed. */
> > 
> > Ali Farzanrad  wrote:
> >> Hi tech@,
> >> 
> >> disklabel shows a warning at build time which might be important.
> >> The following diff will suppress the warning.
> >> 
> >> ===
> >> RCS file: /cvs/src/sbin/disklabel/editor.c,v
> >> retrieving revision 1.368
> >> diff -u -p -r1.368 editor.c
> >> --- editor.c   2021/05/30 19:02:30 1.368
> >> +++ editor.c   2021/06/23 19:25:45
> >> @@ -2418,7 +2418,7 @@ apply_unit(double val, u_char unit, u_int64_t *n)
> >>}
> >>  
> >>val *= factor / DEV_BSIZE;
> >> -  if (val > ULLONG_MAX)
> >> +  if (val != (double)(u_int64_t)val)
> >>return (-1);
> >>*n = val;
> >>return (0);
> >> @@ -2429,7 +2429,7 @@ parse_sizespec(const char *buf, double *val, char 
> >> **un
> >>  {
> >>errno = 0;
> >>*val = strtod(buf, unit);
> >> -  if (errno == ERANGE || *val < 0 || *val > ULLONG_MAX)
> >> +  if (errno == ERANGE || *val < 0 || *val != (double)(u_int64_t)*val)
> >>return (-1);/* too big/small */
> >>if (*val == 0 && *unit == buf)
> >>return (-1);/* No conversion performed. */
> >> 
> >> 
> > 
> 
>