Re: 802.1q with 0 tag

2018-10-11 Thread David Gwynne
Hi Rivo,

vlan(4) can be configured to receive (and send) packets with 0 as the tag on 
the wire, which this diff will break.

I'm pretty confident you can leave an IP configured on the physical network 
interface, and create and configure a vlan interface on it without setting a 
VLAN id. The untagged frames should be received as normal on the parent, and 
the tagged frames with the priority will come in on the vlan interface but 
still get accepted for IP on the parent.

Note that the default priority of packets in the OpenBSD kernel is 3, which 
might be higher than what you expect an untagged packets priority to be. The 
default 802.1q priority is 1 iirc.

I'm curious as to where you see both tagged and untagged frames at the same 
time.

dlg

> On 11 Oct 2018, at 7:20 pm, Rivo Nurges  wrote:
> 
> Hi!
> 
> In theory 802.1q header with vlan tag 0 can be used to just signal
> priority. Now I'm seeing this in the wild. On a untagged port some
> packets are coming with dotq header and some without. Currently we drop
> all the packets with dotq header and vlan tag 0.
> 
> Following patch fixes this by recording the priority and removing the
> header, so existing code to handle the packet. At least NetBSD does
> something similar.
> 
> Rivo
> 
> diff --git sys/net/if_ethersubr.c sys/net/if_ethersubr.c
> index 76f6c3147e0..68f5b03b4f4 100644
> --- sys/net/if_ethersubr.c
> +++ sys/net/if_ethersubr.c
> @@ -362,6 +362,22 @@ ether_input(struct ifnet *ifp, struct mbuf *m, void 
> *cookie)
> 
>   etype = ntohs(eh->ether_type);
> 
> + /*
> +  * 802.1Q header with a tag of 0 can be used to store priority.
> +  */
> + if (etype == ETHERTYPE_VLAN) {
> + struct ether_vlan_header *evl = (void *)eh;
> + if (EVL_VLANOFTAG(evl->evl_tag) == EVL_VLID_NULL) {
> + etype = ntohs(evl->evl_proto);
> + m->m_pkthdr.pf.prio = EVL_PRIOFTAG(evl->evl_tag);
> + /* IEEE 802.1p has prio 0 and 1 swapped */
> + if (m->m_pkthdr.pf.prio <= 1)
> + m->m_pkthdr.pf.prio = !m->m_pkthdr.pf.prio;
> + memmove((char *)eh + EVL_ENCAPLEN, eh, sizeof(*eh));
> + m_adj(m, EVL_ENCAPLEN);
> + }
> + }
> +
>   switch (etype) {
>   case ETHERTYPE_IP:
>   input = ipv4_input;
> 



Re: vmd: servicing virtio devices from separate processes

2018-10-11 Thread Sergio Lopez
On Thu, 2018-10-11 at 09:02 -0700, Mike Larkin wrote:
> On Wed, Oct 10, 2018 at 05:04:34PM +0200, Sergio Lopez wrote:
> > On Mon, 2018-10-08 at 09:58 -0700, Mike Larkin wrote:
> > > On Fri, Oct 05, 2018 at 01:50:10PM +0200, Sergio Lopez wrote:
> > > > Hi,
> > > > 
> > > > I have an idea in mind that I'd like to share to ask you if you think
> > > > it's worth giving it a try.
> > > > 
> > > > Right now, vmd already features an excellent privsep model to ensure
> > > > the process servicing the VM requests to the outside world is running
> > > > with the lowest possible privileges.
> > > > 
> > > > I was wondering if we could take a step further, servicing each virtio
> > > > device from a different process. This design would simplify the
> > > > implementation and maintenance of those devices, improve the privsep
> > > > model and increase the resilience of VMs (the crash of a process
> > > > servicing a device won't bring down the whole VM, and a mechanism to
> > > > recover from this scenario could be explored).
> > > > 
> > > > Doing this in an efficient way requires:
> > > > 
> > > >  - The ability to receive virtqueue notifications directly on the
> > > > process. I've already sent an RFC patch for this (see "Lightweight
> > > > mechanism to kick virtio VQs"), as it'd be useful for the non-separated
> > > > model too.
> > > > 
> > > >  - An in-kernel IRQ chip. This one is the most controversial, as it
> > > > means emulating a device from a privileged domain, but I'm pretty sure
> > > > a lapic implementation with enough functionality to serve *BSD/Linux
> > > > Guests can be small and simple enough to be easily auditable.
> > > > 
> > > >  - A way to map the VM memory into a third process context. Can
> > > > uvm_share for this? Here's also the opportunity to explore options to
> > > > avoid mapping the whole VM memory, though that'll possibly require
> > > > functionality non covered by the virtio specs.
> > > > 
> > > > Do you think it's worth exploring this model? What are feelings
> > > > regarding the in-kernel IRQ chip?
> > > > 
> > > > Sergio (slp).
> > > > 
> > > 
> > > Lots of things to read through in this and the attached diff. I'll try to
> > > catch up and reply as soon as I can.
> > 
> > Ack. Let me know if you need me to split it in different patches, or if
> > you want to take a look at its use on vmd (the actual patch for vmd
> > still needs polishing, but works).
> > 
> > Sergio (slp).
> > 
> 
> I'm not sure if splitting this into a separate process or using a taskq is
> the right way to go ; dlg@ had done the latter before, but we dropped that
> for some reason.

Well, it's not a question of one thing or the other. It'd be possible
to have some devices serviced by code emmbedded in vmd, as it is today,
and others serviced by separated processes.

A nice use case for having a separated process servicing a device could
be implementing a virtio_blk disk with a complex backend, like ceph or
glusterfs. Not only you would avoid linking vmd against
librbd/libgfapi, but also would be able to attach gdb to inspect it
without halting the whole VM. Perhaps even restarting the process (hot-
plug/unplug?).

On the other hand, the underlying features (kickfd/in-kernel lapic) are
useful on their own, even with all devices being served by vmd,
providing lower latency, less vmexits, and parallel execution of I/O
requests. The only drawback is enlarging the kernel a bit, but I guess
this could be out of the GENERIC* configs by default.

IMHO this is win-win scenario.

Sergio (slp).



Re: vmd: servicing virtio devices from separate processes

2018-10-11 Thread Mike Larkin
On Wed, Oct 10, 2018 at 05:04:34PM +0200, Sergio Lopez wrote:
> On Mon, 2018-10-08 at 09:58 -0700, Mike Larkin wrote:
> > On Fri, Oct 05, 2018 at 01:50:10PM +0200, Sergio Lopez wrote:
> > > Hi,
> > > 
> > > I have an idea in mind that I'd like to share to ask you if you think
> > > it's worth giving it a try.
> > > 
> > > Right now, vmd already features an excellent privsep model to ensure
> > > the process servicing the VM requests to the outside world is running
> > > with the lowest possible privileges.
> > > 
> > > I was wondering if we could take a step further, servicing each virtio
> > > device from a different process. This design would simplify the
> > > implementation and maintenance of those devices, improve the privsep
> > > model and increase the resilience of VMs (the crash of a process
> > > servicing a device won't bring down the whole VM, and a mechanism to
> > > recover from this scenario could be explored).
> > > 
> > > Doing this in an efficient way requires:
> > > 
> > >  - The ability to receive virtqueue notifications directly on the
> > > process. I've already sent an RFC patch for this (see "Lightweight
> > > mechanism to kick virtio VQs"), as it'd be useful for the non-separated
> > > model too.
> > > 
> > >  - An in-kernel IRQ chip. This one is the most controversial, as it
> > > means emulating a device from a privileged domain, but I'm pretty sure
> > > a lapic implementation with enough functionality to serve *BSD/Linux
> > > Guests can be small and simple enough to be easily auditable.
> > > 
> > >  - A way to map the VM memory into a third process context. Can
> > > uvm_share for this? Here's also the opportunity to explore options to
> > > avoid mapping the whole VM memory, though that'll possibly require
> > > functionality non covered by the virtio specs.
> > > 
> > > Do you think it's worth exploring this model? What are feelings
> > > regarding the in-kernel IRQ chip?
> > > 
> > > Sergio (slp).
> > > 
> > 
> > Lots of things to read through in this and the attached diff. I'll try to
> > catch up and reply as soon as I can.
> 
> Ack. Let me know if you need me to split it in different patches, or if
> you want to take a look at its use on vmd (the actual patch for vmd
> still needs polishing, but works).
> 
> Sergio (slp).
> 

I'm not sure if splitting this into a separate process or using a taskq is
the right way to go ; dlg@ had done the latter before, but we dropped that
for some reason.

dlg@ do you still have that diff for comparison?

-ml



802.1q with 0 tag

2018-10-11 Thread Rivo Nurges
Hi!

In theory 802.1q header with vlan tag 0 can be used to just signal
priority. Now I'm seeing this in the wild. On a untagged port some
packets are coming with dotq header and some without. Currently we drop
all the packets with dotq header and vlan tag 0.

Following patch fixes this by recording the priority and removing the
header, so existing code to handle the packet. At least NetBSD does
something similar.

Rivo

diff --git sys/net/if_ethersubr.c sys/net/if_ethersubr.c
index 76f6c3147e0..68f5b03b4f4 100644
--- sys/net/if_ethersubr.c
+++ sys/net/if_ethersubr.c
@@ -362,6 +362,22 @@ ether_input(struct ifnet *ifp, struct mbuf *m, void 
*cookie)
 
etype = ntohs(eh->ether_type);
 
+   /*
+* 802.1Q header with a tag of 0 can be used to store priority.
+*/
+   if (etype == ETHERTYPE_VLAN) {
+   struct ether_vlan_header *evl = (void *)eh;
+   if (EVL_VLANOFTAG(evl->evl_tag) == EVL_VLID_NULL) {
+   etype = ntohs(evl->evl_proto);
+   m->m_pkthdr.pf.prio = EVL_PRIOFTAG(evl->evl_tag);
+   /* IEEE 802.1p has prio 0 and 1 swapped */
+   if (m->m_pkthdr.pf.prio <= 1)
+   m->m_pkthdr.pf.prio = !m->m_pkthdr.pf.prio;
+   memmove((char *)eh + EVL_ENCAPLEN, eh, sizeof(*eh));
+   m_adj(m, EVL_ENCAPLEN);
+   }
+   }
+
switch (etype) {
case ETHERTYPE_IP:
input = ipv4_input;