Re: setitimer(2): increase interval upper bound to UINT_MAX seconds

2021-06-17 Thread Scott Cheloha
On Fri, Jun 11, 2021 at 12:17:02PM -0500, Scott Cheloha wrote:
> Hi,
> 
> setitimer(2) has a one hundred million second upper bound for timers.
> Any timer interval larger than this is considered invalid and we set
> EINVAL.
> 
> There is no longer any reason to use this particular limit.  Kclock
> timeouts support the full range of a timespec, so we can trivially
> increase the upper bound without any practical risk of overflow.
> 
> This patch increases the upper bound to UINT_MAX seconds.
> 
> Why UINT_MAX?  UINT_MAX is the largest possible input to alarm(3).  We
> could then simplify the alarm(3) manpage and the libc alarm.c code in
> a subsequent patch.  POSIX says alarm(3) "is always successful".  Our
> implementation can fail.  It would be nicer/simpler if ours were free
> of failure modes.
> 
> ok?

1 week bump.

Updated patch: make the maximum value ("max") static and const.

Index: kern_time.c
===
RCS file: /cvs/src/sys/kern/kern_time.c,v
retrieving revision 1.153
diff -u -p -r1.153 kern_time.c
--- kern_time.c 11 Jun 2021 16:36:34 -  1.153
+++ kern_time.c 18 Jun 2021 01:40:42 -
@@ -709,15 +709,16 @@ out:
 int
 itimerfix(struct itimerval *itv)
 {
+   static const struct timeval max = { .tv_sec = UINT_MAX, .tv_usec = 0 };
struct timeval min_interval = { .tv_sec = 0, .tv_usec = tick };
 
if (itv->it_value.tv_sec < 0 || !timerisvalid(>it_value))
return EINVAL;
-   if (itv->it_value.tv_sec > 1)
+   if (timercmp(>it_value, , >))
return EINVAL;
if (itv->it_interval.tv_sec < 0 || !timerisvalid(>it_interval))
return EINVAL;
-   if (itv->it_interval.tv_sec > 1)
+   if (timercmp(>it_interval, , >))
return EINVAL;
 
if (!timerisset(>it_value))



Re: [CAN IGNORE] Proposal for new bc(1) and dc(1)

2021-06-17 Thread enh
On Thu, Jun 17, 2021 at 9:48 AM Otto Moerbeek  wrote:

> On Thu, Jun 17, 2021 at 10:01:02AM -0600, Gavin Howard wrote:
>
> > Otto,
> >
> > > I think it is interesting. As for the incompatibilites, my memory says
> > > I followed the original dc and bc when deciding on those (GNU chose to
> > > differs in these cases). Bit it has been 18 years since I wrote the
> > > current versions, so I might misrememeber.
> >
> > I think that makes sense to me. Unfortunately, when I was building my
> > dc, I couldn't find any mention in the OpenBSD man pages, which I used
> > to ensure as much compatibility as I could, that arrays and registers
> > were not separate. Well, there was one (the `;` command mentions
> > registers, but the `:` command does not, so I thought that was a typo).
> >
> > Regarding the 0 having length 0 or 1, that was a decision I agonized
> > over. My dad, who is a mathematician, said that it could go either way.
> > Unfortunately for me, if this is a showstopper incompatibility (and it
> > might be based on how the test suite uses `length()` and `Z`), I do
> > think I would keep it as it is and accept that OpenBSD will not want my
> > bc(1) and dc(1).
>
> It looks like GNU dc and bc do not agree:
>

heh. while i know what Theo means -- and his is also a valid philosophical
standpoint -- this kind of thing is one reason why i personally prefer
fewer implementations of things and more shared code between them :-)

at least it leads to more consistency, and to having fewer places to fix.
(since i think we've all had enough counterexamples to not believe that
whole "many eyes" thing, the real question is "how easily/quickly can you
make fixes, and how easily/quickly can you distribute them?"... given how
few eyes are actually open, i'd rather have them all looking at the same
set of problems :-( )


> $ dc -V
> dc (GNU bc 1.06) 1.3
>
> Copyright 1994, 1997, 1998, 2000 Free Software Foundation, Inc.
> This is free software; see the source for copying conditions.  There
> is NO
> warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE,
> to the extent permitted by law.
> iMac:~ otto$ dc
> 0Zp
> 0
>
> and
>
> $ bc
> bc 1.06
> Copyright 1991-1994, 1997, 1998, 2000 Free Software Foundation, Inc.
> This is free software with ABSOLUTELY NO WARRANTY.
> For details type `warranty'.
> length(0)
> 1
>
> I confirmed the original dc by Morris and Cherry indeed print 0 for
> the above test case.
>
> -Otto
> >
> > > As for moving to your version, I have no opinion yet. I have some
> > > attachment to the current code, but not so strong that I am opposing
> > > replacement upfront. OTOH the current implementaion is almost
> > > maintainance free for many years already. So I dunno.
> >
> > You have a right to have attachment to it; I have attachment to mine!
> >
> > In fact, I was pleasantly surprised at how clean and readable your code
> > was. I usually struggle to read code written by others, but I could
> > easily read yours.
> >
> > On that note, since last night, I thought of more disadvantages of
> > moving to my bc and dc, which I feel I must mention.
> >
> > More disadvantages:
> >
> > * The current dc(1) and bc(1) are from a known member of the OpenBSD
> >   community with many contributions. I am an unknown quantity.
> > * The current dc(1) and bc(1) do not have ugly portability code that
> >   OpenBSD probably doesn't care about.
> > * The current dc(1) and bc(1) do not have ugly code to support build
> >   options that OpenBSD does not care about.
> > * The binary size of the OpenBSD dc(1) and bc(1) combined are 78% the
> >   size of mine combined (on amd64). The size of OpenBSD combined is
> >   145440, and the size of mine combined are 185706.
> > * The current dc(1) and bc(1) have much less source code and have been
> >   nearly maintenance-free for many years. Mine were started in 2018 and
> >   do not have as long of a track record for being low maintenance.
> >
> > > I'll take a look at your code soon and maybe other devs have opinions.
> >
> > Thank you very much!
> >
> > Gavin Howard
> >
>
>


Re: [CAN IGNORE] Proposal for new bc(1) and dc(1)

2021-06-17 Thread enh
it's perhaps worth mentioning in the "pros" column that toybox (a
0BSD-licensed busybox which provides most of /bin on Android, which i
maintain) and busybox both use Gavin's bc/dc implementations[1]. so
although the GNU versions are their own thing and i assume likely to stay
that way, "all Android devices (plus any other toybox users) and everything
that uses busybox" is a fairly significant chunk of the market if you're
thinking about compatibility/interoperability issues.


1. Android doesn't actually use the toybox copy of bc, it uses Gavin's
upstream version directly. the only issue we've had with bc/dc on Android
was from a bug that was introduced in toybox that was never in upstream. if
you're thinking "how much use do bc/dc even get on Android devices
anyway?", that's a fair point, but note that we also use host prebuilts of
Gavin's bc/dc to _build_ Android (including all the third-party open source
packages and the kernel, not just code we wrote). that was the _real_
reason for me to get into the business of shipping bc/dc!

On Thu, Jun 17, 2021 at 9:08 AM Gavin Howard 
wrote:

> Otto,
>
> > I think it is interesting. As for the incompatibilites, my memory says
> > I followed the original dc and bc when deciding on those (GNU chose to
> > differs in these cases). Bit it has been 18 years since I wrote the
> > current versions, so I might misrememeber.
>
> I think that makes sense to me. Unfortunately, when I was building my
> dc, I couldn't find any mention in the OpenBSD man pages, which I used
> to ensure as much compatibility as I could, that arrays and registers
> were not separate. Well, there was one (the `;` command mentions
> registers, but the `:` command does not, so I thought that was a typo).
>
> Regarding the 0 having length 0 or 1, that was a decision I agonized
> over. My dad, who is a mathematician, said that it could go either way.
> Unfortunately for me, if this is a showstopper incompatibility (and it
> might be based on how the test suite uses `length()` and `Z`), I do
> think I would keep it as it is and accept that OpenBSD will not want my
> bc(1) and dc(1).
>
> > As for moving to your version, I have no opinion yet. I have some
> > attachment to the current code, but not so strong that I am opposing
> > replacement upfront. OTOH the current implementaion is almost
> > maintainance free for many years already. So I dunno.
>
> You have a right to have attachment to it; I have attachment to mine!
>
> In fact, I was pleasantly surprised at how clean and readable your code
> was. I usually struggle to read code written by others, but I could
> easily read yours.
>
> On that note, since last night, I thought of more disadvantages of
> moving to my bc and dc, which I feel I must mention.
>
> More disadvantages:
>
> * The current dc(1) and bc(1) are from a known member of the OpenBSD
>   community with many contributions. I am an unknown quantity.
> * The current dc(1) and bc(1) do not have ugly portability code that
>   OpenBSD probably doesn't care about.
> * The current dc(1) and bc(1) do not have ugly code to support build
>   options that OpenBSD does not care about.
> * The binary size of the OpenBSD dc(1) and bc(1) combined are 78% the
>   size of mine combined (on amd64). The size of OpenBSD combined is
>   145440, and the size of mine combined are 185706.
> * The current dc(1) and bc(1) have much less source code and have been
>   nearly maintenance-free for many years. Mine were started in 2018 and
>   do not have as long of a track record for being low maintenance.
>
> > I'll take a look at your code soon and maybe other devs have opinions.
>
> Thank you very much!
>
> Gavin Howard
>
>


Re: [CAN IGNORE] Proposal for new bc(1) and dc(1)

2021-06-17 Thread Gavin Howard
Mr. de Raadt,

> Eyes will usually look at the version they are used to, and any effort
> to shink the number of versions will probably fail to re-adapt those
> eyes towards looking at another version with the same focus, as they are
> not familiar with the replacement.

I think you are correct here. However, having more *users* does imply
that bugs would be run into more often. When users report bugs, I fix
them. (See [1] [2] [3], and there are many more by email.) This means
that even if there are no more eyes than mine looking at the code, more
bugs will be fixed.

Also, I have spent a lot of time making these programs as high quality
as I can. When I said that they have been fuzzed thoroughly, I wasn't
kidding. For this upcoming release, I have already fuzzed them and found
no crashes or memory bugs, and I will still fuzz them again after I take
into account all of your feedback.

> Additionally, all software efforts also face a limit on the number of
> cooks huddled in the kitchen, so I do not believe the development
> community grows in that way. I don't know of any succesfull examples
> where divergent teams merged into a single track.

The fact that divergent teams do not merge also implies that divergent
programs may not become more consistent with each other either.

> To me, it seems best if we urge everyone to craft their independent
> implementations towards consistency, and conversation on those details
> is important to get there.

I am not sure consistency will happen here, especially with dc(1), but
even with bc(1). For example, the incompatibility with the `length()`
function is an ambiguity that is fundamental to math because the POSIX
standard defers to the mathematical definition of "significant digits."

The POSIX standard is also ambiguous on whether numbers with all zeros
after the decimal point count as integers, which is important because
the power `^` operator requires an integer argument on the right.

For my implementations, I tried to follow the standard as best I could,
but these ambiguities pop up over and over.

But dc(1) is even worse because there is no standard; there are only
implementations from years ago. I based my dc's behavior on the OpenBSD
manpage specifically, but there were important things that were not
documented.

In my opinion, one of the advantages of my bc and dc is that I have
tried very hard to document and guarantee every bit of behavior. I am
even working on a document about how to do development on them. [4]

> There is another problem in our tree: The use of external upstream
> managed code is a royal pain in the ass. We do it where we have to,
> but the process for managing that kind stuff is a culturally weird.

I can't disagree, but what I can say is that I am willing to make this a
non-issue by making my code build using OpenBSD's infrastructure,
especially OpenBSD's `make`, so that you will not notice a difference.
That would include adapting my test suite to work in `regress/`, as well
as adapting existing dc and bc scripts in the tree, among other things.
And then I would take responsibility for maintaining those adaptations.
If I do, then if these are accepted, Otto doesn't have to worry about
bc(1) and dc(1) anymore; that would be my job.

However, you said before that there might be too many cooks in the
kitchen, so only you can decide if that would work for OpenBSD. I can
understand if it would not.

Gavin Howard

[1]: https://github.com/gavinhoward/bc/issues/31
[2]: https://github.com/gavinhoward/bc/issues/3
[3]: https://git.yzena.com/gavin/bc/issues/4
[4]: https://git.yzena.com/gavin/bc/src/branch/master/manuals/development.md



Re: vmd(8): add barebones vioblk GET_ID support

2021-06-17 Thread Mike Larkin
On Thu, Jun 17, 2021 at 12:07:10PM -0400, Dave Voutila wrote:
>
> Dave Voutila writes:
>
> > The virtio spec has had a de facto command for drivers to read the
> > serial number off a virtual block device. QEMU introduced this feature
> > years ago. Last November, the virtio governing group voted in favor of
> > adopting it officially into v1.2 (the next virtio spec) [1].
> >
> > The below diff adds the basics of handling the request returning an
> > empty serial number. (Serial numbers are limited to 20 bytes.) This
> > stops vmd from complaining about "unsupported command 0x8" when guests
> > send this command type.
>
> Got some feedback off-list from claudio@ that I think is sound. Instead
> of providing an "empty" serial id/number, simply return an UNSUPP status
> to indicate we don't support the value.
>
> I think this approach better than the approach I was suggesting that was
> based off QEMU's design of defaulting to "". (FreeBSD's Bhyve generates
> a serial like "BHYVE-1122-3344-5566" where the suffix is some truncated
> md5 of the backing filename. I'm not a fan of this approach.)
>
> >
> > secdata_desc{,idx} variables are renamed to just data_desc{,idx} to
> > semantically match the change since they're used for more than sector
> > data.
>
> I undid this renaming for now to reduce noise.
>
> > This is primarily part of my work to clean up and bring vmd's virtio
> > implementation more up to date and to align to our own
> > v{io,ioblk,ioscsi,etc.}(4) current capabilities. (vioblk(4) doesn't
> > support this yet, but Linux guests use it frequently.)
>
> While adding the BLK_ID support, I also switched the FLUSH/FLUSH_OUT
> response to be VIRTIO_BLK_S_UNSUPP as well since the device does not
> negotiate that feature. Any request from the guest to "flush" currently
> doesn't do anything (some hypervisors will fsync(2) the underlying fd)
> but for now I'm correcting the response code.
>
> I also noticed and added read/write checks prior to calls to
> {read,write}_mem. The virtio spec says a device MUST not write to a
> read-only descriptor and SHOULD NOT read a write-only descriptor with an
> exception being made for debugging. (See 2.6.5.1 Device requirements:
> The Virtqueue Descriptor Table.)
>
> Next steps in this are of code will be to properly implement the missing
> VIRTIO_BLK_S_IOERR results for failed i/o. Right now the device bails
> processing the command and doesn't reply to the driver, which is not
> conforming with virtio spec.
>
> > OK?
>
> Any other feedback? OK?
>

ok mlarkin

> >
> > -dv
> >
> > [1] https://www.oasis-open.org/committees/ballot.php?id=3536
>
>
> Index: virtio.c
> ===
> RCS file: /cvs/src/usr.sbin/vmd/virtio.c,v
> retrieving revision 1.89
> diff -u -p -r1.89 virtio.c
> --- virtio.c  16 Jun 2021 16:55:02 -  1.89
> +++ virtio.c  17 Jun 2021 15:57:56 -
> @@ -517,6 +517,11 @@ vioblk_notifyq(struct vioblk_dev *dev)
>   }
>
>   /* Read command from descriptor ring */
> + if (cmd_desc->flags & VRING_DESC_F_WRITE) {
> + log_warnx("vioblk: unexpected writable cmd descriptor "
> + "%d", cmd_desc_idx);
> + goto out;
> + }
>   if (read_mem(cmd_desc->addr, , sizeof(cmd))) {
>   log_warnx("vioblk: command read_mem error @ 0x%llx",
>   cmd_desc->addr);
> @@ -541,6 +546,13 @@ vioblk_notifyq(struct vioblk_dev *dev)
>   struct ioinfo *info;
>   const uint8_t *secdata;
>
> + if ((secdata_desc->flags & VRING_DESC_F_WRITE)
> + == 0) {
> + log_warnx("vioblk: unwritable data "
> + "descriptor %d", secdata_desc_idx);
> + goto out;
> + }
> +
>   info = vioblk_start_read(dev,
>   cmd.sector + secbias, secdata_desc->len);
>
> @@ -607,6 +619,13 @@ vioblk_notifyq(struct vioblk_dev *dev)
>   do {
>   struct ioinfo *info;
>
> + if (secdata_desc->flags & VRING_DESC_F_WRITE) {
> + log_warnx("wr vioblk: unexpected "
> + "writable data descriptor %d",
> + secdata_desc_idx);
> + goto out;
> + }
> +
>   info = vioblk_start_write(dev,
>   cmd.sector + secbias,
>   secdata_desc->addr, secdata_desc->len);
> @@ -654,7 +673,35 @@ vioblk_notifyq(struct vioblk_dev *dev)
>   ds_desc_idx = 

Re: [CAN IGNORE] Proposal for new bc(1) and dc(1)

2021-06-17 Thread Theo de Raadt
enh  wrote:

> heh. while i know what Theo means -- and his is also a valid philosophical
> standpoint -- this kind of thing is one reason why i personally prefer
> fewer implementations of things and more shared code between them :-)
> 
> at least it leads to more consistency, and to having fewer places to fix.
> (since i think we've all had enough counterexamples to not believe that
> whole "many eyes" thing, the real question is "how easily/quickly can you
> make fixes, and how easily/quickly can you distribute them?"... given how
> few eyes are actually open, i'd rather have them all looking at the same
> set of problems :-( )

Eyes will usually look at the version they are used to, and any effort
to shink the number of versions will probably fail to re-adapt those
eyes towards looking at another version with the same focus, as they are
not familiar with the replacement.

Additionally, all software efforts also face a limit on the number of
cooks huddled in the kitchen, so I do not believe the development
community grows in that way.  I don't know of any succesfull examples
where divergent teams merged into a single track.

To me, it seems best if we urge everyone to craft their independent
implementations towards consistency, and conversation on those details
is important to get there.


There is another problem in our tree: The use of external upstream
managed code is a royal pain in the ass.  We do it where we have to,
but the process for managing that kind stuff is a culturally weird.



Re: [CAN IGNORE] Proposal for new bc(1) and dc(1)

2021-06-17 Thread Theo de Raadt
Having fewer versions of software is not neccessarily a plus.

It is easily considered a negative.


enh  wrote:

> it's perhaps worth mentioning in the "pros" column that toybox (a
> 0BSD-licensed busybox which provides most of /bin on Android, which i
> maintain) and busybox both use Gavin's bc/dc implementations[1]. so
> although the GNU versions are their own thing and i assume likely to stay
> that way, "all Android devices (plus any other toybox users) and everything
> that uses busybox" is a fairly significant chunk of the market if you're
> thinking about compatibility/interoperability issues.
> 
> 
> 1. Android doesn't actually use the toybox copy of bc, it uses Gavin's
> upstream version directly. the only issue we've had with bc/dc on Android
> was from a bug that was introduced in toybox that was never in upstream. if
> you're thinking "how much use do bc/dc even get on Android devices
> anyway?", that's a fair point, but note that we also use host prebuilts of
> Gavin's bc/dc to _build_ Android (including all the third-party open source
> packages and the kernel, not just code we wrote). that was the _real_
> reason for me to get into the business of shipping bc/dc!
> 
> On Thu, Jun 17, 2021 at 9:08 AM Gavin Howard 
> wrote:
> 
> > Otto,
> >
> > > I think it is interesting. As for the incompatibilites, my memory says
> > > I followed the original dc and bc when deciding on those (GNU chose to
> > > differs in these cases). Bit it has been 18 years since I wrote the
> > > current versions, so I might misrememeber.
> >
> > I think that makes sense to me. Unfortunately, when I was building my
> > dc, I couldn't find any mention in the OpenBSD man pages, which I used
> > to ensure as much compatibility as I could, that arrays and registers
> > were not separate. Well, there was one (the `;` command mentions
> > registers, but the `:` command does not, so I thought that was a typo).
> >
> > Regarding the 0 having length 0 or 1, that was a decision I agonized
> > over. My dad, who is a mathematician, said that it could go either way.
> > Unfortunately for me, if this is a showstopper incompatibility (and it
> > might be based on how the test suite uses `length()` and `Z`), I do
> > think I would keep it as it is and accept that OpenBSD will not want my
> > bc(1) and dc(1).
> >
> > > As for moving to your version, I have no opinion yet. I have some
> > > attachment to the current code, but not so strong that I am opposing
> > > replacement upfront. OTOH the current implementaion is almost
> > > maintainance free for many years already. So I dunno.
> >
> > You have a right to have attachment to it; I have attachment to mine!
> >
> > In fact, I was pleasantly surprised at how clean and readable your code
> > was. I usually struggle to read code written by others, but I could
> > easily read yours.
> >
> > On that note, since last night, I thought of more disadvantages of
> > moving to my bc and dc, which I feel I must mention.
> >
> > More disadvantages:
> >
> > * The current dc(1) and bc(1) are from a known member of the OpenBSD
> >   community with many contributions. I am an unknown quantity.
> > * The current dc(1) and bc(1) do not have ugly portability code that
> >   OpenBSD probably doesn't care about.
> > * The current dc(1) and bc(1) do not have ugly code to support build
> >   options that OpenBSD does not care about.
> > * The binary size of the OpenBSD dc(1) and bc(1) combined are 78% the
> >   size of mine combined (on amd64). The size of OpenBSD combined is
> >   145440, and the size of mine combined are 185706.
> > * The current dc(1) and bc(1) have much less source code and have been
> >   nearly maintenance-free for many years. Mine were started in 2018 and
> >   do not have as long of a track record for being low maintenance.
> >
> > > I'll take a look at your code soon and maybe other devs have opinions.
> >
> > Thank you very much!
> >
> > Gavin Howard
> >
> >



Re: [CAN IGNORE] Proposal for new bc(1) and dc(1)

2021-06-17 Thread Otto Moerbeek
On Thu, Jun 17, 2021 at 10:01:02AM -0600, Gavin Howard wrote:

> Otto,
> 
> > I think it is interesting. As for the incompatibilites, my memory says
> > I followed the original dc and bc when deciding on those (GNU chose to
> > differs in these cases). Bit it has been 18 years since I wrote the
> > current versions, so I might misrememeber.
> 
> I think that makes sense to me. Unfortunately, when I was building my
> dc, I couldn't find any mention in the OpenBSD man pages, which I used
> to ensure as much compatibility as I could, that arrays and registers
> were not separate. Well, there was one (the `;` command mentions
> registers, but the `:` command does not, so I thought that was a typo).
> 
> Regarding the 0 having length 0 or 1, that was a decision I agonized
> over. My dad, who is a mathematician, said that it could go either way.
> Unfortunately for me, if this is a showstopper incompatibility (and it
> might be based on how the test suite uses `length()` and `Z`), I do
> think I would keep it as it is and accept that OpenBSD will not want my
> bc(1) and dc(1).

It looks like GNU dc and bc do not agree:

$ dc -V
dc (GNU bc 1.06) 1.3

Copyright 1994, 1997, 1998, 2000 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There
is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE,
to the extent permitted by law.
iMac:~ otto$ dc
0Zp
0

and 

$ bc
bc 1.06
Copyright 1991-1994, 1997, 1998, 2000 Free Software Foundation, Inc.
This is free software with ABSOLUTELY NO WARRANTY.
For details type `warranty'. 
length(0)
1

I confirmed the original dc by Morris and Cherry indeed print 0 for
the above test case.

-Otto
> 
> > As for moving to your version, I have no opinion yet. I have some
> > attachment to the current code, but not so strong that I am opposing
> > replacement upfront. OTOH the current implementaion is almost
> > maintainance free for many years already. So I dunno.
> 
> You have a right to have attachment to it; I have attachment to mine!
> 
> In fact, I was pleasantly surprised at how clean and readable your code
> was. I usually struggle to read code written by others, but I could
> easily read yours.
> 
> On that note, since last night, I thought of more disadvantages of
> moving to my bc and dc, which I feel I must mention.
> 
> More disadvantages:
> 
> * The current dc(1) and bc(1) are from a known member of the OpenBSD
>   community with many contributions. I am an unknown quantity.
> * The current dc(1) and bc(1) do not have ugly portability code that
>   OpenBSD probably doesn't care about.
> * The current dc(1) and bc(1) do not have ugly code to support build
>   options that OpenBSD does not care about.
> * The binary size of the OpenBSD dc(1) and bc(1) combined are 78% the
>   size of mine combined (on amd64). The size of OpenBSD combined is
>   145440, and the size of mine combined are 185706.
> * The current dc(1) and bc(1) have much less source code and have been
>   nearly maintenance-free for many years. Mine were started in 2018 and
>   do not have as long of a track record for being low maintenance.
> 
> > I'll take a look at your code soon and maybe other devs have opinions.
> 
> Thank you very much!
> 
> Gavin Howard
> 



Re: vmd(8): add barebones vioblk GET_ID support

2021-06-17 Thread Dave Voutila


Dave Voutila writes:

> The virtio spec has had a de facto command for drivers to read the
> serial number off a virtual block device. QEMU introduced this feature
> years ago. Last November, the virtio governing group voted in favor of
> adopting it officially into v1.2 (the next virtio spec) [1].
>
> The below diff adds the basics of handling the request returning an
> empty serial number. (Serial numbers are limited to 20 bytes.) This
> stops vmd from complaining about "unsupported command 0x8" when guests
> send this command type.

Got some feedback off-list from claudio@ that I think is sound. Instead
of providing an "empty" serial id/number, simply return an UNSUPP status
to indicate we don't support the value.

I think this approach better than the approach I was suggesting that was
based off QEMU's design of defaulting to "". (FreeBSD's Bhyve generates
a serial like "BHYVE-1122-3344-5566" where the suffix is some truncated
md5 of the backing filename. I'm not a fan of this approach.)

>
> secdata_desc{,idx} variables are renamed to just data_desc{,idx} to
> semantically match the change since they're used for more than sector
> data.

I undid this renaming for now to reduce noise.

> This is primarily part of my work to clean up and bring vmd's virtio
> implementation more up to date and to align to our own
> v{io,ioblk,ioscsi,etc.}(4) current capabilities. (vioblk(4) doesn't
> support this yet, but Linux guests use it frequently.)

While adding the BLK_ID support, I also switched the FLUSH/FLUSH_OUT
response to be VIRTIO_BLK_S_UNSUPP as well since the device does not
negotiate that feature. Any request from the guest to "flush" currently
doesn't do anything (some hypervisors will fsync(2) the underlying fd)
but for now I'm correcting the response code.

I also noticed and added read/write checks prior to calls to
{read,write}_mem. The virtio spec says a device MUST not write to a
read-only descriptor and SHOULD NOT read a write-only descriptor with an
exception being made for debugging. (See 2.6.5.1 Device requirements:
The Virtqueue Descriptor Table.)

Next steps in this are of code will be to properly implement the missing
VIRTIO_BLK_S_IOERR results for failed i/o. Right now the device bails
processing the command and doesn't reply to the driver, which is not
conforming with virtio spec.

> OK?

Any other feedback? OK?

>
> -dv
>
> [1] https://www.oasis-open.org/committees/ballot.php?id=3536


Index: virtio.c
===
RCS file: /cvs/src/usr.sbin/vmd/virtio.c,v
retrieving revision 1.89
diff -u -p -r1.89 virtio.c
--- virtio.c16 Jun 2021 16:55:02 -  1.89
+++ virtio.c17 Jun 2021 15:57:56 -
@@ -517,6 +517,11 @@ vioblk_notifyq(struct vioblk_dev *dev)
}

/* Read command from descriptor ring */
+   if (cmd_desc->flags & VRING_DESC_F_WRITE) {
+   log_warnx("vioblk: unexpected writable cmd descriptor "
+   "%d", cmd_desc_idx);
+   goto out;
+   }
if (read_mem(cmd_desc->addr, , sizeof(cmd))) {
log_warnx("vioblk: command read_mem error @ 0x%llx",
cmd_desc->addr);
@@ -541,6 +546,13 @@ vioblk_notifyq(struct vioblk_dev *dev)
struct ioinfo *info;
const uint8_t *secdata;

+   if ((secdata_desc->flags & VRING_DESC_F_WRITE)
+   == 0) {
+   log_warnx("vioblk: unwritable data "
+   "descriptor %d", secdata_desc_idx);
+   goto out;
+   }
+
info = vioblk_start_read(dev,
cmd.sector + secbias, secdata_desc->len);

@@ -607,6 +619,13 @@ vioblk_notifyq(struct vioblk_dev *dev)
do {
struct ioinfo *info;

+   if (secdata_desc->flags & VRING_DESC_F_WRITE) {
+   log_warnx("wr vioblk: unexpected "
+   "writable data descriptor %d",
+   secdata_desc_idx);
+   goto out;
+   }
+
info = vioblk_start_write(dev,
cmd.sector + secbias,
secdata_desc->addr, secdata_desc->len);
@@ -654,7 +673,35 @@ vioblk_notifyq(struct vioblk_dev *dev)
ds_desc_idx = cmd_desc->next & VIOBLK_QUEUE_MASK;
ds_desc = [ds_desc_idx];

-   ds = VIRTIO_BLK_S_OK;
+   ds = VIRTIO_BLK_S_UNSUPP;
+   break;
+ 

Re: [CAN IGNORE] Proposal for new bc(1) and dc(1)

2021-06-17 Thread Gavin Howard
Otto,

> I think it is interesting. As for the incompatibilites, my memory says
> I followed the original dc and bc when deciding on those (GNU chose to
> differs in these cases). Bit it has been 18 years since I wrote the
> current versions, so I might misrememeber.

I think that makes sense to me. Unfortunately, when I was building my
dc, I couldn't find any mention in the OpenBSD man pages, which I used
to ensure as much compatibility as I could, that arrays and registers
were not separate. Well, there was one (the `;` command mentions
registers, but the `:` command does not, so I thought that was a typo).

Regarding the 0 having length 0 or 1, that was a decision I agonized
over. My dad, who is a mathematician, said that it could go either way.
Unfortunately for me, if this is a showstopper incompatibility (and it
might be based on how the test suite uses `length()` and `Z`), I do
think I would keep it as it is and accept that OpenBSD will not want my
bc(1) and dc(1).

> As for moving to your version, I have no opinion yet. I have some
> attachment to the current code, but not so strong that I am opposing
> replacement upfront. OTOH the current implementaion is almost
> maintainance free for many years already. So I dunno.

You have a right to have attachment to it; I have attachment to mine!

In fact, I was pleasantly surprised at how clean and readable your code
was. I usually struggle to read code written by others, but I could
easily read yours.

On that note, since last night, I thought of more disadvantages of
moving to my bc and dc, which I feel I must mention.

More disadvantages:

* The current dc(1) and bc(1) are from a known member of the OpenBSD
  community with many contributions. I am an unknown quantity.
* The current dc(1) and bc(1) do not have ugly portability code that
  OpenBSD probably doesn't care about.
* The current dc(1) and bc(1) do not have ugly code to support build
  options that OpenBSD does not care about.
* The binary size of the OpenBSD dc(1) and bc(1) combined are 78% the
  size of mine combined (on amd64). The size of OpenBSD combined is
  145440, and the size of mine combined are 185706.
* The current dc(1) and bc(1) have much less source code and have been
  nearly maintenance-free for many years. Mine were started in 2018 and
  do not have as long of a track record for being low maintenance.

> I'll take a look at your code soon and maybe other devs have opinions.

Thank you very much!

Gavin Howard



Re: bgpd support for enhanced route refresh

2021-06-17 Thread Claudio Jeker
On Thu, Jun 17, 2021 at 01:40:01PM +, Job Snijders wrote:
> On Thu, Jun 17, 2021 at 03:29:38PM +0200, Claudio Jeker wrote:
> > On Thu, Jun 17, 2021 at 01:25:07PM +, Job Snijders wrote:
> > > On Thu, Jun 17, 2021 at 12:24:16PM +0200, Claudio Jeker wrote:
> > > > On Mon, Jun 14, 2021 at 05:10:07PM +0200, Claudio Jeker wrote:
> > > > > On Thu, May 27, 2021 at 06:24:06PM +0200, Claudio Jeker wrote:
> > > > > > Implement RFC 7313 enhanced route refresh.
> > > > > > 
> > > > > > While there also change when graceful restart EoR markers are sent.
> > > > > > In short the graceful restart marker should only be sent initally. 
> > > > > > After
> > > > > > that the End of Route Refresh message should be sent instead.
> > > > > > Because of this track if an EoR marker was received or should be 
> > > > > > sent in
> > > > > > the peer config.
> > > > > > 
> > > > > > For now this setting is off by default but that may be changed at a 
> > > > > > later
> > > > > > state.
> > > > > > 
> > > > > > Please try this out and tell me if it works for you. The message and
> > > > > > prefix/rrefresh counters in bgpctl show nei output help a lot to 
> > > > > > see what
> > > > > > is going on.
> > > > 
> > > > Minor cleanup, instead of this comment use CTASSERT to make sure that 
> > > > the
> > > > recv_eor and send_eor fields are large enough to work as a bitfield for
> > > > AID_MAX
> > > > 
> > > > > -#define  AID_MAX 5
> > > > > +#define  AID_MAX 5   /* check rde_peer.recv_eor when 
> > > > > max reaches 7 */
> > > > 
> > > > Enhanced route refresh is currently off by default. So unless somebody
> > > > is against this I will commit this soon. At least then it gets tested :)
> > > 
> > > Under what circumstances will bgpd(8) send an enhanced route refresh?
> > 
> > The enhanced route refresh messages BoRR and EoRR are sent from the system
> > that got the normal route refresh request. So you need to issue a refresh
> > from the peer to see the new messages.
> 
> Ah... there we go:
> 
> 13:39:14.349247 10.0.0.12.28035 > 10.0.0.1.bgp: P 19:42(23) ack 1 win 256 
> : BGP (ROUTE-REFRESH Request (IPv4 
> Unicast)) (DF) [tos 0xc0]
> 13:39:14.351043 10.0.0.1.bgp > 10.0.0.12.28035: P 1:24(23) ack 42 win 267 
> : BGP (ROUTE-REFRESH BoRR (IPv4 
> Unicast)) [tos 0xc0]
> 13:39:14.548996 10.0.0.12.28035 > 10.0.0.1.bgp: . ack 24 win 256 
>  (DF) [tos 0xc0]
> 13:39:18.855601 10.0.0.1.bgp > 10.0.0.12.28035: P 24:47(23) ack 42 win 267 
> : BGP (ROUTE-REFRESH EoRR (IPv4 
> Unicast)) [tos 0xc0]
> 13:39:19.049045 10.0.0.12.28035 > 10.0.0.1.bgp: . ack 47 win 256 
>  (DF) [tos 0xc0]
> 
> Index: print-bgp.c
> ===
> RCS file: /cvs/src/usr.sbin/tcpdump/print-bgp.c,v
> retrieving revision 1.29
> diff -u -p -r1.29 print-bgp.c
> --- print-bgp.c   3 Jul 2019 03:24:03 -   1.29
> +++ print-bgp.c   17 Jun 2021 13:39:37 -
> @@ -97,7 +97,7 @@ struct bgp_route_refresh {
>   u_int16_t len;
>   u_int8_t type;
>   u_int8_t afi[2]; /* unaligned; should be u_int16_t */
> - u_int8_t res;
> + u_int8_t subtype;
>   u_int8_t safi;
>  };
>  #define BGP_ROUTE_REFRESH_SIZE  23
> @@ -189,6 +189,7 @@ static const char *bgp_capcode[] = {
>   /* 67: [Chen] */ "DYNAMIC_CAPABILITY",
>   /* 68: [Appanna] */ "MULTISESSION",
>   /* 69: [draft-ietf-idr-add-paths] */ "ADD-PATH",
> + /* 70: RFC7313 */ "ENHANCED_ROUTE_REFRESH"
>  };
>  
>  #define bgp_capcode(x) \
> @@ -199,7 +200,7 @@ static const char *bgpnotify_major[] = {
>   NULL, "Message Header Error",
>   "OPEN Message Error", "UPDATE Message Error",
>   "Hold Timer Expired", "Finite State Machine Error",
> - "Cease", "Capability Message Error",
> + "Cease", "ROUTE_REFRESH Message Error",
>  };
>  #define bgp_notify_major(x) \
>   num_or_str(bgpnotify_major, \
> @@ -323,6 +324,11 @@ static const char *afnumber[] = AFNUM_NA
>   num_or_str(afnumber, \
>   sizeof(afnumber)/sizeof(afnumber[0]), (x)))
>  
> +static const char *refreshtype[] = {
> + "Request", "BoRR", "EoRR"
> +};
> +#define refresh_subtype(x) \
> + num_or_str(refreshtype, sizeof(refreshtype)/sizeof(refreshtype[0]), (x))
>  
>  static const char *
>  num_or_str(const char **table, size_t siz, int value)
> @@ -1069,7 +1075,8 @@ bgp_route_refresh_print(const u_char *da
>  
>   bgp_route_refresh_header = (const struct bgp_route_refresh *)dat;
>  
> - printf(" (%s %s)",
> + printf(" %s (%s %s)",
> + refresh_subtype(bgp_route_refresh_header->subtype),
>   af_name(EXTRACT_16BITS(_route_refresh_header->afi)),
>   bgp_attr_nlri_safi(bgp_route_refresh_header->safi));
>  
> 

Not a tcpdump expert but this diff looks OK.

-- 
:wq Claudio



Re: bgpd support for enhanced route refresh

2021-06-17 Thread Job Snijders
Disabled by default is a good start.

OK job@



Re: ipsec crypto kernel lock

2021-06-17 Thread Vitaliy Makkoveev
On Thu, Jun 17, 2021 at 03:19:11PM +0200, Alexander Bluhm wrote:
> On Thu, Jun 17, 2021 at 10:09:47AM +0200, Martin Pieuchot wrote:
> > It's not clear to me which field is the KERNEL_LOCK() protecting.  Is it
> > the access to `swcr_sessions'?  Is it a reference?  If so grabbing/releasing
> > the lock might not be enough to fix the use-after-free.
> 
> I am running the fix for nearly a day now.  Seems to be stable.  I
> guess the crash is a regression from unlocking the pfkey socket.
> Rekeying had the kernel lock before, now it does not.  Calling
> crypto operations without kernel lock looks wrong.  There is no
> other locking in crypto.
> 
> > Could you annotate which field is being protected by the KERNEL_LOCK()? 
> 
> No.  I do not want to invest into fine grained crypto locking.  I
> need a stable test machine.  If you think my aporoach of throwing
> the kernel lock between the boundary of network stack, pfkey and
> crypto is too naive, I would suggest that we backoout the pfkey
> unlocking.
> 
> But my aproach is stable and has less locks than before.  I think
> it is an improvement.

We perform crypto(9) access from net stack without kernel lock held, so
pfkey unlocking backout could not be enough and 'tdbp->tdb_xform->xf_*'
routines should be also wrapped by kernel lock. So this diff is welcomed
anyway.

> 
> bluhm
> 
> > > Index: netinet/ip_ah.c
> > > ===
> > > RCS file: /data/mirror/openbsd/cvs/src/sys/netinet/ip_ah.c,v
> > > retrieving revision 1.146
> > > diff -u -p -r1.146 ip_ah.c
> > > --- netinet/ip_ah.c   25 Feb 2021 02:48:21 -  1.146
> > > +++ netinet/ip_ah.c   16 Jun 2021 17:29:37 -
> > > @@ -98,6 +98,7 @@ ah_init(struct tdb *tdbp, struct xformsw
> > >  {
> > >   struct auth_hash *thash = NULL;
> > >   struct cryptoini cria, crin;
> > > + int error;
> > >  
> > >   /* Authentication operation. */
> > >   switch (ii->ii_authalg) {
> > > @@ -162,7 +163,10 @@ ah_init(struct tdb *tdbp, struct xformsw
> > >   cria.cri_next = 
> > >   }
> > >  
> > > - return crypto_newsession(>tdb_cryptoid, , 0);
> > > + KERNEL_LOCK();
> > > + error = crypto_newsession(>tdb_cryptoid, , 0);
> > > + KERNEL_UNLOCK();
> > > + return error;
> > >  }
> > >  
> > >  /*
> > > @@ -171,7 +175,7 @@ ah_init(struct tdb *tdbp, struct xformsw
> > >  int
> > >  ah_zeroize(struct tdb *tdbp)
> > >  {
> > > - int err;
> > > + int error;
> > >  
> > >   if (tdbp->tdb_amxkey) {
> > >   explicit_bzero(tdbp->tdb_amxkey, tdbp->tdb_amxkeylen);
> > > @@ -179,9 +183,11 @@ ah_zeroize(struct tdb *tdbp)
> > >   tdbp->tdb_amxkey = NULL;
> > >   }
> > >  
> > > - err = crypto_freesession(tdbp->tdb_cryptoid);
> > > + KERNEL_LOCK();
> > > + error = crypto_freesession(tdbp->tdb_cryptoid);
> > > + KERNEL_UNLOCK();
> > >   tdbp->tdb_cryptoid = 0;
> > > - return err;
> > > + return error;
> > >  }
> > >  
> > >  /*
> > > @@ -626,7 +632,9 @@ ah_input(struct mbuf *m, struct tdb *tdb
> > >   }
> > >  
> > >   /* Get crypto descriptors. */
> > > + KERNEL_LOCK();
> > >   crp = crypto_getreq(1);
> > > + KERNEL_UNLOCK();
> > >   if (crp == NULL) {
> > >   DPRINTF(("%s: failed to acquire crypto descriptors\n",
> > >   __func__));
> > > @@ -696,11 +704,16 @@ ah_input(struct mbuf *m, struct tdb *tdb
> > >   tc->tc_rdomain = tdb->tdb_rdomain;
> > >   memcpy(>tc_dst, >tdb_dst, sizeof(union sockaddr_union));
> > >  
> > > - return crypto_dispatch(crp);
> > > + KERNEL_LOCK();
> > > + error = crypto_dispatch(crp);
> > > + KERNEL_UNLOCK();
> > > + return error;
> > >  
> > >   drop:
> > >   m_freem(m);
> > > + KERNEL_LOCK();
> > >   crypto_freereq(crp);
> > > + KERNEL_UNLOCK();
> > >   free(tc, M_XDATA, 0);
> > >   return error;
> > >  }
> > > @@ -1047,7 +1060,9 @@ ah_output(struct mbuf *m, struct tdb *td
> > >  #endif
> > >  
> > >   /* Get crypto descriptors. */
> > > + KERNEL_LOCK();
> > >   crp = crypto_getreq(1);
> > > + KERNEL_UNLOCK();
> > >   if (crp == NULL) {
> > >   DPRINTF(("%s: failed to acquire crypto descriptors\n",
> > >   __func__));
> > > @@ -1144,11 +1159,16 @@ ah_output(struct mbuf *m, struct tdb *td
> > >   tc->tc_rdomain = tdb->tdb_rdomain;
> > >   memcpy(>tc_dst, >tdb_dst, sizeof(union sockaddr_union));
> > >  
> > > - return crypto_dispatch(crp);
> > > + KERNEL_LOCK();
> > > + error = crypto_dispatch(crp);
> > > + KERNEL_UNLOCK();
> > > + return error;
> > >  
> > >   drop:
> > >   m_freem(m);
> > > + KERNEL_LOCK();
> > >   crypto_freereq(crp);
> > > + KERNEL_UNLOCK();
> > >   free(tc, M_XDATA, 0);
> > >   return error;
> > >  }
> > > Index: netinet/ip_esp.c
> > > ===
> > > RCS file: /data/mirror/openbsd/cvs/src/sys/netinet/ip_esp.c,v
> > > retrieving revision 1.162
> > > diff -u -p -r1.162 ip_esp.c
> > > --- netinet/ip_esp.c  25 Feb 2021 02:48:21 -  1.162
> > > +++ netinet/ip_esp.c  16 Jun 2021 17:29:55 -
> > 

Re: bgpd support for enhanced route refresh

2021-06-17 Thread Job Snijders
On Thu, Jun 17, 2021 at 03:29:38PM +0200, Claudio Jeker wrote:
> On Thu, Jun 17, 2021 at 01:25:07PM +, Job Snijders wrote:
> > On Thu, Jun 17, 2021 at 12:24:16PM +0200, Claudio Jeker wrote:
> > > On Mon, Jun 14, 2021 at 05:10:07PM +0200, Claudio Jeker wrote:
> > > > On Thu, May 27, 2021 at 06:24:06PM +0200, Claudio Jeker wrote:
> > > > > Implement RFC 7313 enhanced route refresh.
> > > > > 
> > > > > While there also change when graceful restart EoR markers are sent.
> > > > > In short the graceful restart marker should only be sent initally. 
> > > > > After
> > > > > that the End of Route Refresh message should be sent instead.
> > > > > Because of this track if an EoR marker was received or should be sent 
> > > > > in
> > > > > the peer config.
> > > > > 
> > > > > For now this setting is off by default but that may be changed at a 
> > > > > later
> > > > > state.
> > > > > 
> > > > > Please try this out and tell me if it works for you. The message and
> > > > > prefix/rrefresh counters in bgpctl show nei output help a lot to see 
> > > > > what
> > > > > is going on.
> > > 
> > > Minor cleanup, instead of this comment use CTASSERT to make sure that the
> > > recv_eor and send_eor fields are large enough to work as a bitfield for
> > > AID_MAX
> > > 
> > > > -#defineAID_MAX 5
> > > > +#defineAID_MAX 5   /* check rde_peer.recv_eor when 
> > > > max reaches 7 */
> > > 
> > > Enhanced route refresh is currently off by default. So unless somebody
> > > is against this I will commit this soon. At least then it gets tested :)
> > 
> > Under what circumstances will bgpd(8) send an enhanced route refresh?
> 
> The enhanced route refresh messages BoRR and EoRR are sent from the system
> that got the normal route refresh request. So you need to issue a refresh
> from the peer to see the new messages.

Ah... there we go:

13:39:14.349247 10.0.0.12.28035 > 10.0.0.1.bgp: P 19:42(23) ack 1 win 256 
: BGP (ROUTE-REFRESH Request (IPv4 
Unicast)) (DF) [tos 0xc0]
13:39:14.351043 10.0.0.1.bgp > 10.0.0.12.28035: P 1:24(23) ack 42 win 267 
: BGP (ROUTE-REFRESH BoRR (IPv4 
Unicast)) [tos 0xc0]
13:39:14.548996 10.0.0.12.28035 > 10.0.0.1.bgp: . ack 24 win 256 
 (DF) [tos 0xc0]
13:39:18.855601 10.0.0.1.bgp > 10.0.0.12.28035: P 24:47(23) ack 42 win 267 
: BGP (ROUTE-REFRESH EoRR (IPv4 
Unicast)) [tos 0xc0]
13:39:19.049045 10.0.0.12.28035 > 10.0.0.1.bgp: . ack 47 win 256 
 (DF) [tos 0xc0]

Index: print-bgp.c
===
RCS file: /cvs/src/usr.sbin/tcpdump/print-bgp.c,v
retrieving revision 1.29
diff -u -p -r1.29 print-bgp.c
--- print-bgp.c 3 Jul 2019 03:24:03 -   1.29
+++ print-bgp.c 17 Jun 2021 13:39:37 -
@@ -97,7 +97,7 @@ struct bgp_route_refresh {
u_int16_t len;
u_int8_t type;
u_int8_t afi[2]; /* unaligned; should be u_int16_t */
-   u_int8_t res;
+   u_int8_t subtype;
u_int8_t safi;
 };
 #define BGP_ROUTE_REFRESH_SIZE  23
@@ -189,6 +189,7 @@ static const char *bgp_capcode[] = {
/* 67: [Chen] */ "DYNAMIC_CAPABILITY",
/* 68: [Appanna] */ "MULTISESSION",
/* 69: [draft-ietf-idr-add-paths] */ "ADD-PATH",
+   /* 70: RFC7313 */ "ENHANCED_ROUTE_REFRESH"
 };
 
 #define bgp_capcode(x) \
@@ -199,7 +200,7 @@ static const char *bgpnotify_major[] = {
NULL, "Message Header Error",
"OPEN Message Error", "UPDATE Message Error",
"Hold Timer Expired", "Finite State Machine Error",
-   "Cease", "Capability Message Error",
+   "Cease", "ROUTE_REFRESH Message Error",
 };
 #define bgp_notify_major(x) \
num_or_str(bgpnotify_major, \
@@ -323,6 +324,11 @@ static const char *afnumber[] = AFNUM_NA
num_or_str(afnumber, \
sizeof(afnumber)/sizeof(afnumber[0]), (x)))
 
+static const char *refreshtype[] = {
+   "Request", "BoRR", "EoRR"
+};
+#define refresh_subtype(x) \
+   num_or_str(refreshtype, sizeof(refreshtype)/sizeof(refreshtype[0]), (x))
 
 static const char *
 num_or_str(const char **table, size_t siz, int value)
@@ -1069,7 +1075,8 @@ bgp_route_refresh_print(const u_char *da
 
bgp_route_refresh_header = (const struct bgp_route_refresh *)dat;
 
-   printf(" (%s %s)",
+   printf(" %s (%s %s)",
+   refresh_subtype(bgp_route_refresh_header->subtype),
af_name(EXTRACT_16BITS(_route_refresh_header->afi)),
bgp_attr_nlri_safi(bgp_route_refresh_header->safi));
 



Re: bgpd support for enhanced route refresh

2021-06-17 Thread Claudio Jeker
On Thu, Jun 17, 2021 at 01:25:07PM +, Job Snijders wrote:
> On Thu, Jun 17, 2021 at 12:24:16PM +0200, Claudio Jeker wrote:
> > On Mon, Jun 14, 2021 at 05:10:07PM +0200, Claudio Jeker wrote:
> > > On Thu, May 27, 2021 at 06:24:06PM +0200, Claudio Jeker wrote:
> > > > Implement RFC 7313 enhanced route refresh.
> > > > 
> > > > While there also change when graceful restart EoR markers are sent.
> > > > In short the graceful restart marker should only be sent initally. After
> > > > that the End of Route Refresh message should be sent instead.
> > > > Because of this track if an EoR marker was received or should be sent in
> > > > the peer config.
> > > > 
> > > > For now this setting is off by default but that may be changed at a 
> > > > later
> > > > state.
> > > > 
> > > > Please try this out and tell me if it works for you. The message and
> > > > prefix/rrefresh counters in bgpctl show nei output help a lot to see 
> > > > what
> > > > is going on.
> > 
> > Minor cleanup, instead of this comment use CTASSERT to make sure that the
> > recv_eor and send_eor fields are large enough to work as a bitfield for
> > AID_MAX
> > 
> > > -#define  AID_MAX 5
> > > +#define  AID_MAX 5   /* check rde_peer.recv_eor when max 
> > > reaches 7 */
> > 
> > Enhanced route refresh is currently off by default. So unless somebody
> > is against this I will commit this soon. At least then it gets tested :)
> 
> Under what circumstances will bgpd(8) send an enhanced route refresh?

The enhanced route refresh messages BoRR and EoRR are sent from the system
that got the normal route refresh request. So you need to issue a refresh
from the peer to see the new messages.
 
> If I issue 'bgpctl neighbor xxx refresh' on a session where Enhanced
> Refresh is negotiated
> 
>   Negotiated capabilities:
> Multiprotocol extensions: IPv4 unicast
> 4-byte AS numbers
> Route Refresh
> Enhanced Route Refresh
> Graceful Restart
> 
> I see in tshark:
> 
>Border Gateway Protocol - ROUTE-REFRESH Message
>Marker: 
>Length: 23
>Type: ROUTE-REFRESH Message (5)
>Address family identifier (AFI): IPv4 (1)
>Subtype: Normal route refresh request [RFC2918] with/without ORF 
> [RFC5291] (0)
>Subsequent address family identifier (SAFI): Unicast (1)
> 
> Attached below a tcpdump diff that perhaps can help improve visibility
> :-)
> 
> Kind regards,
> 
> Job
> 
> Index: print-bgp.c
> ===
> RCS file: /cvs/src/usr.sbin/tcpdump/print-bgp.c,v
> retrieving revision 1.29
> diff -u -p -r1.29 print-bgp.c
> --- print-bgp.c   3 Jul 2019 03:24:03 -   1.29
> +++ print-bgp.c   17 Jun 2021 13:24:33 -
> @@ -97,7 +97,7 @@ struct bgp_route_refresh {
>   u_int16_t len;
>   u_int8_t type;
>   u_int8_t afi[2]; /* unaligned; should be u_int16_t */
> - u_int8_t res;
> + u_int8_t subtype;
>   u_int8_t safi;
>  };
>  #define BGP_ROUTE_REFRESH_SIZE  23
> @@ -189,6 +189,7 @@ static const char *bgp_capcode[] = {
>   /* 67: [Chen] */ "DYNAMIC_CAPABILITY",
>   /* 68: [Appanna] */ "MULTISESSION",
>   /* 69: [draft-ietf-idr-add-paths] */ "ADD-PATH",
> + /* 70: RFC7313 */ "ENHANCED_ROUTE_REFRESH"
>  };
>  
>  #define bgp_capcode(x) \
> @@ -199,7 +200,7 @@ static const char *bgpnotify_major[] = {
>   NULL, "Message Header Error",
>   "OPEN Message Error", "UPDATE Message Error",
>   "Hold Timer Expired", "Finite State Machine Error",
> - "Cease", "Capability Message Error",
> + "Cease", "ROUTE_REFRESH Message Error",
>  };
>  #define bgp_notify_major(x) \
>   num_or_str(bgpnotify_major, \
> @@ -323,6 +324,11 @@ static const char *afnumber[] = AFNUM_NA
>   num_or_str(afnumber, \
>   sizeof(afnumber)/sizeof(afnumber[0]), (x)))
>  
> +static const char *refreshtype[] = {
> + "Normal", "BoRR", "EoRR"
> +};
> +#define refresh_subtype(x) \
> + num_or_str(refreshtype, sizeof(refreshtype)/sizeof(refreshtype[0]), (x))
>  
>  static const char *
>  num_or_str(const char **table, size_t siz, int value)
> @@ -1069,7 +1075,8 @@ bgp_route_refresh_print(const u_char *da
>  
>   bgp_route_refresh_header = (const struct bgp_route_refresh *)dat;
>  
> - printf(" (%s %s)",
> + printf(" %s (%s %s)",
> + refresh_subtype(bgp_route_refresh_header->subtype),
>   af_name(EXTRACT_16BITS(_route_refresh_header->afi)),
>   bgp_attr_nlri_safi(bgp_route_refresh_header->safi));
>  
> 

-- 
:wq Claudio



Re: bgpd support for enhanced route refresh

2021-06-17 Thread Job Snijders
On Thu, Jun 17, 2021 at 12:24:16PM +0200, Claudio Jeker wrote:
> On Mon, Jun 14, 2021 at 05:10:07PM +0200, Claudio Jeker wrote:
> > On Thu, May 27, 2021 at 06:24:06PM +0200, Claudio Jeker wrote:
> > > Implement RFC 7313 enhanced route refresh.
> > > 
> > > While there also change when graceful restart EoR markers are sent.
> > > In short the graceful restart marker should only be sent initally. After
> > > that the End of Route Refresh message should be sent instead.
> > > Because of this track if an EoR marker was received or should be sent in
> > > the peer config.
> > > 
> > > For now this setting is off by default but that may be changed at a later
> > > state.
> > > 
> > > Please try this out and tell me if it works for you. The message and
> > > prefix/rrefresh counters in bgpctl show nei output help a lot to see what
> > > is going on.
> 
> Minor cleanup, instead of this comment use CTASSERT to make sure that the
> recv_eor and send_eor fields are large enough to work as a bitfield for
> AID_MAX
> 
> > -#defineAID_MAX 5
> > +#defineAID_MAX 5   /* check rde_peer.recv_eor when max 
> > reaches 7 */
> 
> Enhanced route refresh is currently off by default. So unless somebody
> is against this I will commit this soon. At least then it gets tested :)

Under what circumstances will bgpd(8) send an enhanced route refresh?

If I issue 'bgpctl neighbor xxx refresh' on a session where Enhanced
Refresh is negotiated

  Negotiated capabilities:
Multiprotocol extensions: IPv4 unicast
4-byte AS numbers
Route Refresh
Enhanced Route Refresh
Graceful Restart

I see in tshark:

   Border Gateway Protocol - ROUTE-REFRESH Message
   Marker: 
   Length: 23
   Type: ROUTE-REFRESH Message (5)
   Address family identifier (AFI): IPv4 (1)
   Subtype: Normal route refresh request [RFC2918] with/without ORF 
[RFC5291] (0)
   Subsequent address family identifier (SAFI): Unicast (1)

Attached below a tcpdump diff that perhaps can help improve visibility
:-)

Kind regards,

Job

Index: print-bgp.c
===
RCS file: /cvs/src/usr.sbin/tcpdump/print-bgp.c,v
retrieving revision 1.29
diff -u -p -r1.29 print-bgp.c
--- print-bgp.c 3 Jul 2019 03:24:03 -   1.29
+++ print-bgp.c 17 Jun 2021 13:24:33 -
@@ -97,7 +97,7 @@ struct bgp_route_refresh {
u_int16_t len;
u_int8_t type;
u_int8_t afi[2]; /* unaligned; should be u_int16_t */
-   u_int8_t res;
+   u_int8_t subtype;
u_int8_t safi;
 };
 #define BGP_ROUTE_REFRESH_SIZE  23
@@ -189,6 +189,7 @@ static const char *bgp_capcode[] = {
/* 67: [Chen] */ "DYNAMIC_CAPABILITY",
/* 68: [Appanna] */ "MULTISESSION",
/* 69: [draft-ietf-idr-add-paths] */ "ADD-PATH",
+   /* 70: RFC7313 */ "ENHANCED_ROUTE_REFRESH"
 };
 
 #define bgp_capcode(x) \
@@ -199,7 +200,7 @@ static const char *bgpnotify_major[] = {
NULL, "Message Header Error",
"OPEN Message Error", "UPDATE Message Error",
"Hold Timer Expired", "Finite State Machine Error",
-   "Cease", "Capability Message Error",
+   "Cease", "ROUTE_REFRESH Message Error",
 };
 #define bgp_notify_major(x) \
num_or_str(bgpnotify_major, \
@@ -323,6 +324,11 @@ static const char *afnumber[] = AFNUM_NA
num_or_str(afnumber, \
sizeof(afnumber)/sizeof(afnumber[0]), (x)))
 
+static const char *refreshtype[] = {
+   "Normal", "BoRR", "EoRR"
+};
+#define refresh_subtype(x) \
+   num_or_str(refreshtype, sizeof(refreshtype)/sizeof(refreshtype[0]), (x))
 
 static const char *
 num_or_str(const char **table, size_t siz, int value)
@@ -1069,7 +1075,8 @@ bgp_route_refresh_print(const u_char *da
 
bgp_route_refresh_header = (const struct bgp_route_refresh *)dat;
 
-   printf(" (%s %s)",
+   printf(" %s (%s %s)",
+   refresh_subtype(bgp_route_refresh_header->subtype),
af_name(EXTRACT_16BITS(_route_refresh_header->afi)),
bgp_attr_nlri_safi(bgp_route_refresh_header->safi));
 



Re: ipsec crypto kernel lock

2021-06-17 Thread Alexander Bluhm
On Thu, Jun 17, 2021 at 10:09:47AM +0200, Martin Pieuchot wrote:
> It's not clear to me which field is the KERNEL_LOCK() protecting.  Is it
> the access to `swcr_sessions'?  Is it a reference?  If so grabbing/releasing
> the lock might not be enough to fix the use-after-free.

I am running the fix for nearly a day now.  Seems to be stable.  I
guess the crash is a regression from unlocking the pfkey socket.
Rekeying had the kernel lock before, now it does not.  Calling
crypto operations without kernel lock looks wrong.  There is no
other locking in crypto.

> Could you annotate which field is being protected by the KERNEL_LOCK()? 

No.  I do not want to invest into fine grained crypto locking.  I
need a stable test machine.  If you think my aporoach of throwing
the kernel lock between the boundary of network stack, pfkey and
crypto is too naive, I would suggest that we backoout the pfkey
unlocking.

But my aproach is stable and has less locks than before.  I think
it is an improvement.

bluhm

> > Index: netinet/ip_ah.c
> > ===
> > RCS file: /data/mirror/openbsd/cvs/src/sys/netinet/ip_ah.c,v
> > retrieving revision 1.146
> > diff -u -p -r1.146 ip_ah.c
> > --- netinet/ip_ah.c 25 Feb 2021 02:48:21 -  1.146
> > +++ netinet/ip_ah.c 16 Jun 2021 17:29:37 -
> > @@ -98,6 +98,7 @@ ah_init(struct tdb *tdbp, struct xformsw
> >  {
> > struct auth_hash *thash = NULL;
> > struct cryptoini cria, crin;
> > +   int error;
> >  
> > /* Authentication operation. */
> > switch (ii->ii_authalg) {
> > @@ -162,7 +163,10 @@ ah_init(struct tdb *tdbp, struct xformsw
> > cria.cri_next = 
> > }
> >  
> > -   return crypto_newsession(>tdb_cryptoid, , 0);
> > +   KERNEL_LOCK();
> > +   error = crypto_newsession(>tdb_cryptoid, , 0);
> > +   KERNEL_UNLOCK();
> > +   return error;
> >  }
> >  
> >  /*
> > @@ -171,7 +175,7 @@ ah_init(struct tdb *tdbp, struct xformsw
> >  int
> >  ah_zeroize(struct tdb *tdbp)
> >  {
> > -   int err;
> > +   int error;
> >  
> > if (tdbp->tdb_amxkey) {
> > explicit_bzero(tdbp->tdb_amxkey, tdbp->tdb_amxkeylen);
> > @@ -179,9 +183,11 @@ ah_zeroize(struct tdb *tdbp)
> > tdbp->tdb_amxkey = NULL;
> > }
> >  
> > -   err = crypto_freesession(tdbp->tdb_cryptoid);
> > +   KERNEL_LOCK();
> > +   error = crypto_freesession(tdbp->tdb_cryptoid);
> > +   KERNEL_UNLOCK();
> > tdbp->tdb_cryptoid = 0;
> > -   return err;
> > +   return error;
> >  }
> >  
> >  /*
> > @@ -626,7 +632,9 @@ ah_input(struct mbuf *m, struct tdb *tdb
> > }
> >  
> > /* Get crypto descriptors. */
> > +   KERNEL_LOCK();
> > crp = crypto_getreq(1);
> > +   KERNEL_UNLOCK();
> > if (crp == NULL) {
> > DPRINTF(("%s: failed to acquire crypto descriptors\n",
> > __func__));
> > @@ -696,11 +704,16 @@ ah_input(struct mbuf *m, struct tdb *tdb
> > tc->tc_rdomain = tdb->tdb_rdomain;
> > memcpy(>tc_dst, >tdb_dst, sizeof(union sockaddr_union));
> >  
> > -   return crypto_dispatch(crp);
> > +   KERNEL_LOCK();
> > +   error = crypto_dispatch(crp);
> > +   KERNEL_UNLOCK();
> > +   return error;
> >  
> >   drop:
> > m_freem(m);
> > +   KERNEL_LOCK();
> > crypto_freereq(crp);
> > +   KERNEL_UNLOCK();
> > free(tc, M_XDATA, 0);
> > return error;
> >  }
> > @@ -1047,7 +1060,9 @@ ah_output(struct mbuf *m, struct tdb *td
> >  #endif
> >  
> > /* Get crypto descriptors. */
> > +   KERNEL_LOCK();
> > crp = crypto_getreq(1);
> > +   KERNEL_UNLOCK();
> > if (crp == NULL) {
> > DPRINTF(("%s: failed to acquire crypto descriptors\n",
> > __func__));
> > @@ -1144,11 +1159,16 @@ ah_output(struct mbuf *m, struct tdb *td
> > tc->tc_rdomain = tdb->tdb_rdomain;
> > memcpy(>tc_dst, >tdb_dst, sizeof(union sockaddr_union));
> >  
> > -   return crypto_dispatch(crp);
> > +   KERNEL_LOCK();
> > +   error = crypto_dispatch(crp);
> > +   KERNEL_UNLOCK();
> > +   return error;
> >  
> >   drop:
> > m_freem(m);
> > +   KERNEL_LOCK();
> > crypto_freereq(crp);
> > +   KERNEL_UNLOCK();
> > free(tc, M_XDATA, 0);
> > return error;
> >  }
> > Index: netinet/ip_esp.c
> > ===
> > RCS file: /data/mirror/openbsd/cvs/src/sys/netinet/ip_esp.c,v
> > retrieving revision 1.162
> > diff -u -p -r1.162 ip_esp.c
> > --- netinet/ip_esp.c25 Feb 2021 02:48:21 -  1.162
> > +++ netinet/ip_esp.c16 Jun 2021 17:29:55 -
> > @@ -93,6 +93,7 @@ esp_init(struct tdb *tdbp, struct xforms
> > struct enc_xform *txform = NULL;
> > struct auth_hash *thash = NULL;
> > struct cryptoini cria, crie, crin;
> > +   int error;
> >  
> > if (!ii->ii_encalg && !ii->ii_authalg) {
> > DPRINTF(("esp_init(): neither authentication nor encryption "
> > @@ -294,8 +295,11 @@ esp_init(struct tdb *tdbp, struct xforms
> > cria.cri_key = 

Re: Rationale behind exec clearing out unveil paths

2021-06-17 Thread Kevin Chadwick
On 6/15/21 4:33 PM, dz...@disroot.org wrote:
>  If it only needs access to its lock file,
> why should I give it access to my ssh keys?

I think that it is worth understanding that you can use file and process
permissions, for that.

Unveil is an extra layer, because no matter what ssh key you provide to an
unveiled app. The developer of that application can decide that I only need
access to a particular key provided on the command line and only within certain
execution paths.

The app design may have a separate process that just handles the key and limited
operations by talking via a socket. In a way, accomplishing that which you
wanted in the first place. Possibly without you the user, even knowing.



Re: [CAN IGNORE] Proposal for new bc(1) and dc(1)

2021-06-17 Thread Otto Moerbeek
On Wed, Jun 16, 2021 at 11:40:08PM -0600, Gavin Howard wrote:

> Hello,
> 
> My name is Gavin Howard. I have developed a new bc(1) and dc(1)
> implementation. [0]
> 
> I propose replacing the current implementations with mine.
> 
> Advantages:
> 
> * Performance. [1]
> * Extensions for ease of use.
> * With build options to remove most extensions, if desired.
> * Compatible with GNU bc.
> * Already used by default in FreeBSD.
> * Fuzzed thoroughly.
> * No exec needed for bc(1) (both programs are contained in the same
>   multi-call binary).
> 
> Expectations met:
> 
> * Already uses pledge(2) and unveil(2).
> * No dependencies beyond C99, POSIX `make`, and POSIX `sh`.
> * This includes no dependency on editline(3), even though my bc(1)
>   and dc(1) have a history implementation.
> * Thorough test suite.
> * Comprehensive man pages.
> * Locale support.
> 
> Disadvantages:
> 
> * There are incompatibilities with the current bc(1) and dc(1), which
>   are listed below. All users would need to be made aware of these
>   incompatibilities, so they can update scripts, and scripts in `src/`
>   will also need to be updated.
> 
> Incompatibilities (failing tests from `regress/usr.bin`):
> 
> 1. Current bc(1) and dc(1) return 0 for length(a) where a is 0. Mine
>return 1. This causes my dc(1) to fail `dc/t1.in` and `dc/t28.in`.
> 2. Current dc(1) implements arrays as part of registers. Mine keeps
>arrays and registers separate. This causes my dc(1) to fail
>`dc/t1.in` and `dc/t8.in`.
> 3. Current dc(1) does not print a `nul` byte if given the `P` or `a`
>commands with 0 on the top of the stack. My dc(1) does (because it
>considers 0 to have one digit, see #1). This causes my dc(1) to fail
>`dc/t3.in` and `dc/t13.in`.
> 4. Current dc(1) starts with empty registers, and allows the user to pop
>all items off the register stack. My dc(1) starts with a single item
>in the register and does not allow the user to remove it.
> 5. Current dc(1) will push an item onto a register stack for the `s`
>command. My dc(1) does not since one already exists. This, combined
>with #4, causes my dc(1) to fail `dc/t5.in`
> 6. Current bc(1) and dc(1) have a larger maximum `obase` than mine. This
>causes my dc(1) to fail `dc/t9.in`.
> 7. Current dc(1) does not reset on errors. My dc(1) does, so it fails
>`dc/t12.in`.
> 8. Current dc(1) allows register names with any character. My dc(1)
>requires non-control characters and has a different way of doing
>extended registers. This causes my dc(1) to fail `dc/t15.in`,
>`dc/t16.in`, `dc/t19.in`, `dc/t21.in`, and `dc/t23.in`.
> 9. Current bc(1) is a frontend to dc(1). Mine are combined into the same
>binary and generate and run bytecode. This means that my bc(1) fails
>all of the bc(1) regression tests (which generate dc(1) code) and
>does not have the `-c` option.
> 
> Notes:
> 
> My dc(1) also fails `dc/t10.in` because it doesn't have the `!` command,
> but https://github.com/openbsd/src/commit/dc405aa075 makes it appear as
> though the current dc(1) does not have the `!` command either.
> 
> In https://youtu.be/gvmGfpMgny4?t=1277 , Bob Beck said that unveil(2)
> must not be used on command-line arguments, so I use unveil(2) after all
> command-line files are executed.
> 
> Current version is 4.0.2. I am planning to release version 4.1.0 soon,
> but held off in case you are interested and had feedback that might
> help.
> 
> I am willing to help maintain them if they are put into OpenBSD, but I
> am also willing to pass them off to whoever you wish, should you wish to
> do so.
> 
> I do have a mirror on GitHub.
> 
> If you are not interested, feel free to ignore this email.
> 
> Regardless, thank you for your time.
> 
> Gavin Howard
> 
> [0]: https://git.yzena.com/gavin/bc
> [1]: https://git.yzena.com/gavin/bc/src/branch/master/manuals/benchmarks.md
> 

I think it is interesting. As for the incompatibilites, my memory says
I followed the original dc and bc when deciding on those (GNU chose to
differs in these cases). Bit it has been 18 years since I wrote the
current versions, so I might misrememeber.

As for moving to your version, I have no opinion yet. I have some
attachment to the current code, but not so strong that I am opposing
replacement upfront. OTOH the current implementaion is almost
maintainance free for many years already. So I dunno.

I'll take a look at your code soon and maybe other devs have opinions.

-Otto



Re: bgpd support for enhanced route refresh

2021-06-17 Thread Claudio Jeker
On Mon, Jun 14, 2021 at 05:10:07PM +0200, Claudio Jeker wrote:
> On Thu, May 27, 2021 at 06:24:06PM +0200, Claudio Jeker wrote:
> > Implement RFC 7313 enhanced route refresh.
> > 
> > While there also change when graceful restart EoR markers are sent.
> > In short the graceful restart marker should only be sent initally. After
> > that the End of Route Refresh message should be sent instead.
> > Because of this track if an EoR marker was received or should be sent in
> > the peer config.
> > 
> > For now this setting is off by default but that may be changed at a later
> > state.
> > 
> > Please try this out and tell me if it works for you. The message and
> > prefix/rrefresh counters in bgpctl show nei output help a lot to see what
> > is going on.

Minor cleanup, instead of this comment use CTASSERT to make sure that the
recv_eor and send_eor fields are large enough to work as a bitfield for
AID_MAX

> -#define  AID_MAX 5
> +#define  AID_MAX 5   /* check rde_peer.recv_eor when max 
> reaches 7 */

Enhanced route refresh is currently off by default. So unless somebody
is against this I will commit this soon. At least then it gets tested :)

-- 
:wq Claudio


Index: bgpd.8
===
RCS file: /cvs/src/usr.sbin/bgpd/bgpd.8,v
retrieving revision 1.68
diff -u -p -r1.68 bgpd.8
--- bgpd.8  16 Jun 2021 16:24:12 -  1.68
+++ bgpd.8  17 Jun 2021 09:35:52 -
@@ -382,6 +382,15 @@ has been started.
 .Re
 .Pp
 .Rs
+.%A K. Patel
+.%A E. Chen
+.%A B. Venkatachalapathy
+.%D July 2014
+.%R RFC 7313
+.%T Enhanced Route Refresh Capability for BGP-4
+.Re
+.Pp
+.Rs
 .%A W. Kumari
 .%A R. Bush
 .%A H. Schiller
Index: bgpd.conf.5
===
RCS file: /cvs/src/usr.sbin/bgpd/bgpd.conf.5,v
retrieving revision 1.210
diff -u -p -r1.210 bgpd.conf.5
--- bgpd.conf.5 6 May 2021 09:21:35 -   1.210
+++ bgpd.conf.5 27 May 2021 16:03:30 -
@@ -831,6 +831,16 @@ The default is
 .Ic yes .
 .Pp
 .It Xo
+.Ic announce enhanced refresh
+.Pq Ic yes Ns | Ns Ic no
+.Xc
+If set to
+.Ic yes ,
+the enhanced route refresh capability is announced.
+The default is
+.Ic no .
+.Pp
+.It Xo
 .Ic announce refresh
 .Pq Ic yes Ns | Ns Ic no
 .Xc
Index: bgpd.h
===
RCS file: /cvs/src/usr.sbin/bgpd/bgpd.h,v
retrieving revision 1.414
diff -u -p -r1.414 bgpd.h
--- bgpd.h  27 May 2021 08:27:48 -  1.414
+++ bgpd.h  17 Jun 2021 10:13:28 -
@@ -96,6 +96,9 @@
 #defineF_CTL_OVS_NOTFOUND  0x20
 #defineF_CTL_NEIGHBORS 0x40 /* only used by bgpctl */
 
+#define CTASSERT(x)extern char  _ctassert[(x) ? 1 : -1 ] \
+   __attribute__((__unused__))
+
 /*
  * Note that these numeric assignments differ from the numbers commonly
  * used in route origin validation context.
Index: parse.y
===
RCS file: /cvs/src/usr.sbin/bgpd/parse.y,v
retrieving revision 1.416
diff -u -p -r1.416 parse.y
--- parse.y 20 May 2021 10:06:20 -  1.416
+++ parse.y 27 May 2021 16:03:30 -
@@ -204,7 +204,7 @@ typedef struct {
 %token GROUP NEIGHBOR NETWORK
 %token EBGP IBGP
 %token LOCALAS REMOTEAS DESCR LOCALADDR MULTIHOP PASSIVE MAXPREFIX RESTART
-%token ANNOUNCE CAPABILITIES REFRESH AS4BYTE CONNECTRETRY
+%token ANNOUNCE CAPABILITIES REFRESH AS4BYTE CONNECTRETRY ENHANCED
 %token DEMOTE ENFORCE NEIGHBORAS ASOVERRIDE REFLECTOR DEPEND DOWN
 %token DUMP IN OUT SOCKET RESTRICTED
 %token LOG TRANSPARENT
@@ -1446,6 +1446,9 @@ peeropts  : REMOTEAS as4number{
| ANNOUNCE REFRESH yesno {
curpeer->conf.capabilities.refresh = $3;
}
+   | ANNOUNCE ENHANCED REFRESH yesno {
+   curpeer->conf.capabilities.enhanced_rr = $4;
+   }
| ANNOUNCE RESTART yesno {
curpeer->conf.capabilities.grestart.restart = $3;
}
@@ -2898,6 +2901,7 @@ lookup(char *s)
{ "dump",   DUMP},
{ "ebgp",   EBGP},
{ "enforce",ENFORCE},
+   { "enhanced",   ENHANCED },
{ "esp",ESP},
{ "evaluate",   EVALUATE},
{ "export", EXPORT},
Index: rde.c
===
RCS file: /cvs/src/usr.sbin/bgpd/rde.c,v
retrieving revision 1.525
diff -u -p -r1.525 rde.c
--- rde.c   17 Jun 2021 08:43:06 -  1.525
+++ rde.c   17 Jun 2021 09:35:52 -
@@ -1068,6 +1068,7 @@ rde_dispatch_imsg_rtr(struct imsgbuf *ib
 void
 rde_dispatch_imsg_peer(struct rde_peer *peer, void *bula)
 {
+   struct route_refresh rr;
struct session_up sup;
struct imsg imsg;
u_int8_t 

Re: ifnewlladdr spl

2021-06-17 Thread Vitaliy Makkoveev
On Thu, Jun 17, 2021 at 02:03:03AM +0200, Alexander Bluhm wrote:
> Thanks dlg@ for the hint with the ixl(4) fixes in current.  I will
> try so solve my 6.6 problem with that.
> 
> On Wed, Jun 16, 2021 at 10:19:03AM +0200, Martin Pieuchot wrote:
> > The diff discussed in this thread reduces the scope of the splnet/splx()
> > dance to only surround the modification of `if_flags'.   How is this
> > related to what you said?  Is it because `if_flags' is read in interrupt
> > handlers and it isn't modified atomically?  Does that imply that every
> > modification of `if_flags' should be done at IPL_NET?  Does that mean
> > some love is needed to ensure reading `if_flags' is coherent?
> 
> I have checked some drivers: hme, gem, em, ix.
> 
> They read if_flags in the interrupt handler, but never set it there.
> So it seems to be save to change it without splnet().  We can remove
> it from ifnewlladdr().
> 
> And most changes seem to happen from ioctl path, there we have net
> lock.  But I see also places where if_flags is changed with kernel
> lock.  This is in carp_carpdev_state() task and e.g. ixgbe_watchdog().
> 
> As we also have kernel lock in the ioctl path, I think this is the
> lock which protects if_flags modifications.  Look at if_watchdog_task()
> and carp_carpdev_state().
> 
> > Does that change also imply that it is safe to issue a SIOCSIFFLAGS on
> > a legacy drivers without blocking interrupts?  If the IPL needs to be
> > raised, this is left to the driver, right?
> 
> The drivers I have checked do splnet() in their ioctl function.
> ifioctl() calls driver ioctl with net lock, but without splnet().
> ixl does not use splnet() at all.
> 
> > Was the current splnet/splx() dance an easy way to block packet reception
> > between multiple SIOCSIFFLAGS ioctls?  This might have been a way to not
> > receive packets on a DOWN interface.  This is no longer the case on MP
> > kernels as there's a window between the UP and DOWN ioctls.  Do we care?
> > Is this down/up/down dance fine for legacy a modern drivers?
> 
> I guess, if we allow interrupts to happen, all packets should have
> been processed after we return from setting the interface down.
> 
> Is this the correct version of the diff?  Not tested yet.
> 
> bluhm
> 

Hypothetically we could have the driver for the old NIC which relies
on disabled interrupts on this ioctl() sequence. dlg@ pointed that this
splx(9) logic doesn't work on MP machines, but if such NIC was never
used on MP machines this diff could trigger it.

Also we have the mess around `if_flags' protection, because some code
should be serialized with `if_flags' modification and such code relies
on netlock. But also we have some pieces of code which modify `if_flags'
with the kernel lock held.

> 
> Index: net/if.c
> ===
> RCS file: /data/mirror/openbsd/cvs/src/sys/net/if.c,v
> retrieving revision 1.641
> diff -u -p -r1.641 if.c
> --- net/if.c  25 May 2021 22:45:09 -  1.641
> +++ net/if.c  16 Jun 2021 23:46:42 -
> @@ -3107,9 +3107,10 @@ ifnewlladdr(struct ifnet *ifp)
>  #endif
>   struct ifreq ifrq;
>   short up;
> - int s;
>  
> - s = splnet();
> + NET_ASSERT_LOCKED();/* for ioctl and in6 */
> + KERNEL_ASSERT_LOCKED(); /* for if_flags */
> +
>   up = ifp->if_flags & IFF_UP;
>  
>   if (up) {
> @@ -3143,7 +3144,6 @@ ifnewlladdr(struct ifnet *ifp)
>   ifrq.ifr_flags = ifp->if_flags;
>   (*ifp->if_ioctl)(ifp, SIOCSIFFLAGS, (caddr_t));
>   }
> - splx(s);
>  }
>  
>  void
> 



bgpd refactor common code

2021-06-17 Thread Claudio Jeker
To not recreate the issue of missing another check in one of the
up_generate_updates() call points factor out the common code into
rde_skip_peer().

I hope this way a similar f-up can be avoided
-- 
:wq Claudio

? obj
Index: rde.c
===
RCS file: /cvs/src/usr.sbin/bgpd/rde.c,v
retrieving revision 1.525
diff -u -p -r1.525 rde.c
--- rde.c   17 Jun 2021 08:43:06 -  1.525
+++ rde.c   17 Jun 2021 08:50:37 -
@@ -2879,6 +2879,28 @@ rde_evaluate_all(void)
return rde_eval_all;
 }
 
+static int
+rde_skip_peer(struct rde_peer *peer, u_int16_t rib_id, u_int8_t aid)
+{
+   /* skip ourself */
+   if (peer == peerself)
+   return 1;
+   if (peer->state != PEER_UP)
+   return 1;
+   /* skip peers using a different rib */
+   if (peer->loc_rib_id != rib_id)
+   return 1;
+   /* check if peer actually supports the address family */
+   if (peer->capa.mp[aid] == 0)
+   return 1;
+   /* skip peers with special export types */
+   if (peer->export_type == EXPORT_NONE ||
+   peer->export_type == EXPORT_DEFAULT_ROUTE)
+   return 1;
+
+   return 0;
+}
+
 void
 rde_generate_updates(struct rib *rib, struct prefix *new, struct prefix *old,
 int eval_all)
@@ -2903,20 +2925,7 @@ rde_generate_updates(struct rib *rib, st
aid = old->pt->aid;
 
LIST_FOREACH(peer, , peer_l) {
-   /* skip ourself */
-   if (peer == peerself)
-   continue;
-   if (peer->state != PEER_UP)
-   continue;
-   /* skip peers using a different rib */
-   if (peer->loc_rib_id != rib->id)
-   continue;
-   /* check if peer actually supports the address family */
-   if (peer->capa.mp[aid] == 0)
-   continue;
-   /* skip peers with special export types */
-   if (peer->export_type == EXPORT_NONE ||
-   peer->export_type == EXPORT_DEFAULT_ROUTE)
+   if (rde_skip_peer(peer, rib->id, aid))
continue;
/* skip regular peers if the best path didn't change */
if ((peer->flags & PEERFLAG_EVALUATE_ALL) == 0 && eval_all)
@@ -3571,20 +3580,7 @@ rde_softreconfig_out(struct rib_entry *r
return;
 
LIST_FOREACH(peer, , peer_l) {
-   /* skip ourself */
-   if (peer == peerself)
-   continue;
-   if (peer->state != PEER_UP)
-   continue;
-   /* skip peers using a different rib */
-   if (peer->loc_rib_id != p->re->rib_id)
-   continue;
-   /* check if peer actually supports the address family */
-   if (peer->capa.mp[aid] == 0)
-   continue;
-   /* skip peers with special export types */
-   if (peer->export_type == EXPORT_NONE ||
-   peer->export_type == EXPORT_DEFAULT_ROUTE)
+   if (rde_skip_peer(peer, re->rib_id, aid))
continue;
/* skip peers which don't need to reconfigure */
if (peer->reconf_out == 0)



Re: ipsec crypto kernel lock

2021-06-17 Thread Martin Pieuchot
On 16/06/21(Wed) 22:05, Alexander Bluhm wrote:
> Hi,
> 
> I have seen a kernel crash with while forwarding and encrypting
> much traffic through OpenBSD IPsec gateways running iked.
> 
> kernel: protection fault trap, code=0
> Stopped at  aes_ctr_crypt+0x1e: addb$0x1,0x2e3(%rdi)
> 
> ddb{2}> trace
> aes_ctr_crypt(16367ed4be021a53,8000246e1db0) at aes_ctr_crypt+0x1e
> swcr_authenc(fd8132a21b08) at swcr_authenc+0x5c3
> swcr_process(fd8132a21b08) at swcr_process+0x1e8
> crypto_invoke(fd8132a21b08) at crypto_invoke+0xde
> taskq_thread(80200500) at taskq_thread+0x81
> end trace frame: 0x0, count: -5
> 
> *64926  109760  0  0  7 0x14200crypto
> 
> swcr_authenc() passes swe->sw_kschedule to aes_ctr_crypt(), swe has
> been freed and contains deaf006cdeaf4152, which looks like some
> sort of poison.  I suspect a use after free.
> 
> The swe value comes from the swcr_sessions global pointers.  Its
> content looks sane in ddb.  Noone touches it in swcr_authenc().  So
> I guess that an other CPU changes the global structures while
> swcr_authenc() is working with it.
> 
> The crypto thread is protected by kernel lock, both network stack
> and pfkey use net lock.  The kernel lock has been recently removed
> from pfkey.
> 
> I think the required lock for the crypto framework is the kernel
> lock.  If crypto_ functions are called, IPsec must grab the kernel
> lock.  pfkey accesses crypto only via tdb_ functions, so this diff
> also covers that case.

It's not clear to me which field is the KERNEL_LOCK() protecting.  Is it
the access to `swcr_sessions'?  Is it a reference?  If so grabbing/releasing
the lock might not be enough to fix the use-after-free.

Could you annotate which field is being protected by the KERNEL_LOCK()? 

> Index: netinet/ip_ah.c
> ===
> RCS file: /data/mirror/openbsd/cvs/src/sys/netinet/ip_ah.c,v
> retrieving revision 1.146
> diff -u -p -r1.146 ip_ah.c
> --- netinet/ip_ah.c   25 Feb 2021 02:48:21 -  1.146
> +++ netinet/ip_ah.c   16 Jun 2021 17:29:37 -
> @@ -98,6 +98,7 @@ ah_init(struct tdb *tdbp, struct xformsw
>  {
>   struct auth_hash *thash = NULL;
>   struct cryptoini cria, crin;
> + int error;
>  
>   /* Authentication operation. */
>   switch (ii->ii_authalg) {
> @@ -162,7 +163,10 @@ ah_init(struct tdb *tdbp, struct xformsw
>   cria.cri_next = 
>   }
>  
> - return crypto_newsession(>tdb_cryptoid, , 0);
> + KERNEL_LOCK();
> + error = crypto_newsession(>tdb_cryptoid, , 0);
> + KERNEL_UNLOCK();
> + return error;
>  }
>  
>  /*
> @@ -171,7 +175,7 @@ ah_init(struct tdb *tdbp, struct xformsw
>  int
>  ah_zeroize(struct tdb *tdbp)
>  {
> - int err;
> + int error;
>  
>   if (tdbp->tdb_amxkey) {
>   explicit_bzero(tdbp->tdb_amxkey, tdbp->tdb_amxkeylen);
> @@ -179,9 +183,11 @@ ah_zeroize(struct tdb *tdbp)
>   tdbp->tdb_amxkey = NULL;
>   }
>  
> - err = crypto_freesession(tdbp->tdb_cryptoid);
> + KERNEL_LOCK();
> + error = crypto_freesession(tdbp->tdb_cryptoid);
> + KERNEL_UNLOCK();
>   tdbp->tdb_cryptoid = 0;
> - return err;
> + return error;
>  }
>  
>  /*
> @@ -626,7 +632,9 @@ ah_input(struct mbuf *m, struct tdb *tdb
>   }
>  
>   /* Get crypto descriptors. */
> + KERNEL_LOCK();
>   crp = crypto_getreq(1);
> + KERNEL_UNLOCK();
>   if (crp == NULL) {
>   DPRINTF(("%s: failed to acquire crypto descriptors\n",
>   __func__));
> @@ -696,11 +704,16 @@ ah_input(struct mbuf *m, struct tdb *tdb
>   tc->tc_rdomain = tdb->tdb_rdomain;
>   memcpy(>tc_dst, >tdb_dst, sizeof(union sockaddr_union));
>  
> - return crypto_dispatch(crp);
> + KERNEL_LOCK();
> + error = crypto_dispatch(crp);
> + KERNEL_UNLOCK();
> + return error;
>  
>   drop:
>   m_freem(m);
> + KERNEL_LOCK();
>   crypto_freereq(crp);
> + KERNEL_UNLOCK();
>   free(tc, M_XDATA, 0);
>   return error;
>  }
> @@ -1047,7 +1060,9 @@ ah_output(struct mbuf *m, struct tdb *td
>  #endif
>  
>   /* Get crypto descriptors. */
> + KERNEL_LOCK();
>   crp = crypto_getreq(1);
> + KERNEL_UNLOCK();
>   if (crp == NULL) {
>   DPRINTF(("%s: failed to acquire crypto descriptors\n",
>   __func__));
> @@ -1144,11 +1159,16 @@ ah_output(struct mbuf *m, struct tdb *td
>   tc->tc_rdomain = tdb->tdb_rdomain;
>   memcpy(>tc_dst, >tdb_dst, sizeof(union sockaddr_union));
>  
> - return crypto_dispatch(crp);
> + KERNEL_LOCK();
> + error = crypto_dispatch(crp);
> + KERNEL_UNLOCK();
> + return error;
>  
>   drop:
>   m_freem(m);
> + KERNEL_LOCK();
>   crypto_freereq(crp);
> + KERNEL_UNLOCK();
>   free(tc, M_XDATA, 0);
>   return error;
>  }
> Index: netinet/ip_esp.c
> 

Re: tcpdump(8): improve dhcp6

2021-06-17 Thread Martijn van Duren
dlg@ asked me for an example output, so here it is:
07:52:52.326084 fe80::fce1:baff:fed2:8886.dhcpv6-client > 
ff02::1:2.dhcpv6-server: [udp sum ok] DHCPv6 Solicit xid 0xdc0732
OPTION_CLIENTID: 00:01:00:01:28:5c:b7:1e:e8:6a:64:e8:d5:3f
OPTION_ELAPSED_TIME: 0.00s
OPTION_IA_PD: 34e3ca60 [hlim 1] (len 52)
07:52:52.326916 fe80::fce1:bbff:fed1:8a39.15622 > 
fe80::fce1:baff:fed2:8886.dhcpv6-client: [udp sum ok] DHCPv6 Advertise xid 
0xdc0732
OPTION_CLIENTID: 00:01:00:01:28:5c:b7:1e:e8:6a:64:e8:d5:3f
OPTION_SERVERID: 00:01:00:01:28:5c:b6:f1:fe:e1:bb:d1:8a:39
OPTION_IA_PD: 
001a00193020010db8
 (len 93, hlim 64)
07:52:53.335961 fe80::fce1:baff:fed2:8886.dhcpv6-client > 
ff02::1:2.dhcpv6-server: [udp sum ok] DHCPv6 Request xid 0xa1d03f
OPTION_CLIENTID: 00:01:00:01:28:5c:b7:1e:e8:6a:64:e8:d5:3f
OPTION_SERVERID: 00:01:00:01:28:5c:b6:f1:fe:e1:bb:d1:8a:39
OPTION_ELAPSED_TIME: 0.00s
OPTION_IA_PD: 
001a00193020010db8c135e3ca
 [hlim 1] (len 99)
07:52:53.336899 fe80::fce1:bbff:fed1:8a39.15622 > 
fe80::fce1:baff:fed2:8886.dhcpv6-client: [udp sum ok] DHCPv6 Reply xid 0xa1d03f
OPTION_CLIENTID: 00:01:00:01:28:5c:b7:1e:e8:6a:64:e8:d5:3f
OPTION_SERVERID: 00:01:00:01:28:5c:b6:f1:fe:e1:bb:d1:8a:39
OPTION_IA_PD: 
001a00193020010db8
 (len 93, hlim 64)

This contains both the pretty values as described below as well as the
old format to which IA_PD falls back to. Also worth mentioning is that I
choose this output for the duid, because it's simpler, it's what
wide-dhcpv6 uses in it's configs and knowing how a UID is build up
doesn't give us any useful information imho.

On Wed, 2021-06-16 at 17:03 +0200, Martijn van Duren wrote:
> According to the last commit message to print-dhcp6.c by dlg:
> if someone is interested in making this easier to read, it would
> be a straightforward and  well contained project to better handle
> option printing.
> 
> I'm playing around a little with dhcp6 and I heeded the call.
> 
> This diff does a few things:
> - Extract the 24 bits as specified by the spec instead of stripping them
>   inside the printf
> - Add the defines of all the options inside RFC8415.  This one can be
>   expanded by iana's official list[0], but that's too much for this
>   diff.
> - Print the human readable name of options, or print the numeric value
>   if unknown.
>   I also opted for the OPTION_* syntax, instead of the full name,
>   because "Identity Association for Non-temporary Addresses" seemed a
>   bit much and it keeps the code simpler.
> - Pretty print the clientid, serverid, and elapsed_time options. In my
>   minimal test setup these were the first ones to come by. Values of
>   other options continue to be printed as they were and might follow if
>   people are interested in this diff.
> 
> OK?
> 
> martijn@
> 
> [0] https://www.iana.org/assignments/dhcpv6-parameters/dhcpv6-parameters.xhtml
> 
> Index: print-dhcp6.c
> ===
> RCS file: /cvs/src/usr.sbin/tcpdump/print-dhcp6.c,v
> retrieving revision 1.12
> diff -u -p -r1.12 print-dhcp6.c
> --- print-dhcp6.c   2 Dec 2019 22:07:20 -   1.12
> +++ print-dhcp6.c   16 Jun 2021 14:55:42 -
> @@ -49,10 +49,44 @@ struct rtentry;
>  #define DH6_RELAY_FORW 12
>  #define DH6_RELAY_REPL 13
>  
> +#define XIDLENGTH  3
> +
> +#define OPTION_CLIENTID1
> +#define OPTION_SERVERID2
> +#define OPTION_IA_NA   3
> +#define OPTION_IA_TA   4
> +#define OPTION_IAADDR  5
> +#define OPTION_ORO 6
> +#define OPTION_PREFERENCE  7
> +#define OPTION_ELAPSED_TIME8
> +#define OPTION_RELAY_MSG   9
> +#define OPTION_AUTH11
> +#define OPTION_UNICAST 12
> +#define OPTION_STATUS_CODE 13
> +#define OPTION_RAPID_COMMIT14
> +#define OPTION_USER_CLASS  15
> +#define OPTION_VENDOR_CLASS16
> +#define OPTION_VENDOR_OPTS 17
> +#define OPTION_INTERFACE_ID18
> +#define OPTION_RECONF_MSG  19
> +#define OPTION_RECONF_ACCEPT   20
> +#define OPTION_IA_PD   25
> +#define OPTION_IAPREFIX26
> +#define OPTION_INFORMATION_REFRESH_TIME32
> +#define OPTION_SOL_MAX_RT  82
> +#define OPTION_INF_MAX_RT  83
> +
> +#define CASEEXPAND(var, OPTION) \
> +case OPTION:   \
> +   var = #OPTION;  \
> +   break;
> +
>  static void
>  dhcp6opt_print(const u_char *cp, u_int length)
>  {
> uint16_t code, len;
> +   const char *codename;
> +   char codenameunknown[sizeof("OPTION (65535)")];
> u_int i;
> int l = snapend - cp;
>  
> @@ -77,22 +111,96 @@ dhcp6opt_print(const u_char *cp, u_int l
>