Re: [RFC PATCH 5/7] tun: Introduce virtio-net hashing feature

2023-10-10 Thread Jason Wang
On Tue, Oct 10, 2023 at 2:19 PM Akihiko Odaki  wrote:
>
> On 2023/10/10 15:00, Jason Wang wrote:
> > On Tue, Oct 10, 2023 at 1:51 PM Akihiko Odaki  
> > wrote:
> >>
> >> On 2023/10/10 14:45, Jason Wang wrote:
> >>> On Tue, Oct 10, 2023 at 9:52 AM Akihiko Odaki  
> >>> wrote:
> 
>  On 2023/10/09 19:44, Willem de Bruijn wrote:
> > On Mon, Oct 9, 2023 at 3:12 AM Akihiko Odaki  
> > wrote:
> >>
> >> On 2023/10/09 19:06, Willem de Bruijn wrote:
> >>> On Mon, Oct 9, 2023 at 3:02 AM Akihiko Odaki 
> >>>  wrote:
> 
>  On 2023/10/09 18:57, Willem de Bruijn wrote:
> > On Mon, Oct 9, 2023 at 3:57 AM Akihiko Odaki 
> >  wrote:
> >>
> >> On 2023/10/09 17:04, Willem de Bruijn wrote:
> >>> On Sun, Oct 8, 2023 at 3:46 PM Akihiko Odaki 
> >>>  wrote:
> 
>  On 2023/10/09 5:08, Willem de Bruijn wrote:
> > On Sun, Oct 8, 2023 at 10:04 PM Akihiko Odaki 
> >  wrote:
> >>
> >> On 2023/10/09 4:07, Willem de Bruijn wrote:
> >>> On Sun, Oct 8, 2023 at 7:22 AM Akihiko Odaki 
> >>>  wrote:
> 
>  virtio-net have two usage of hashes: one is RSS and another 
>  is hash
>  reporting. Conventionally the hash calculation was done by 
>  the VMM.
>  However, computing the hash after the queue was chosen 
>  defeats the
>  purpose of RSS.
> 
>  Another approach is to use eBPF steering program. This 
>  approach has
>  another downside: it cannot report the calculated hash due 
>  to the
>  restrictive nature of eBPF.
> 
>  Introduce the code to compute hashes to the kernel in order 
>  to overcome
>  thse challenges. An alternative solution is to extend the 
>  eBPF steering
>  program so that it will be able to report to the userspace, 
>  but it makes
>  little sense to allow to implement different hashing 
>  algorithms with
>  eBPF since the hash value reported by virtio-net is strictly 
>  defined by
>  the specification.
> 
>  The hash value already stored in sk_buff is not used and 
>  computed
>  independently since it may have been computed in a way not 
>  conformant
>  with the specification.
> 
>  Signed-off-by: Akihiko Odaki 
>  ---
> >>>
>  +static const struct tun_vnet_hash_cap tun_vnet_hash_cap = {
>  +   .max_indirection_table_length =
>  +   TUN_VNET_HASH_MAX_INDIRECTION_TABLE_LENGTH,
>  +
>  +   .types = VIRTIO_NET_SUPPORTED_HASH_TYPES
>  +};
> >>>
> >>> No need to have explicit capabilities exchange like this? Tun 
> >>> either
> >>> supports all or none.
> >>
> >> tun does not support VIRTIO_NET_RSS_HASH_TYPE_IP_EX,
> >> VIRTIO_NET_RSS_HASH_TYPE_TCP_EX, and 
> >> VIRTIO_NET_RSS_HASH_TYPE_UDP_EX.
> >>
> >> It is because the flow dissector does not support IPv6 
> >> extensions. The
> >> specification is also vague, and does not tell how many TLVs 
> >> should be
> >> consumed at most when interpreting destination option header 
> >> so I chose
> >> to avoid adding code for these hash types to the flow 
> >> dissector. I doubt
> >> anyone will complain about it since nobody complains for Linux.
> >>
> >> I'm also adding this so that we can extend it later.
> >> max_indirection_table_length may grow for systems with 128+ 
> >> CPUs, or
> >> types may have other bits for new protocols in the future.
> >>
> >>>
>  case TUNSETSTEERINGEBPF:
>  -   ret = tun_set_ebpf(tun, >steering_prog, 
>  argp);
>  +   bpf_ret = tun_set_ebpf(tun, 
>  >steering_prog, argp);
>  +   if (IS_ERR(bpf_ret))
>  +   ret = PTR_ERR(bpf_ret);
>  +   else if (bpf_ret)
>  +   tun->vnet_hash.flags &= 
>  ~TUN_VNET_HASH_RSS;
> >>>
> >>> Don't make one feature disable another.
> >>>
> >>> 

Re: [RFC PATCH 5/7] tun: Introduce virtio-net hashing feature

2023-10-10 Thread Jason Wang
On Tue, Oct 10, 2023 at 1:51 PM Akihiko Odaki  wrote:
>
> On 2023/10/10 14:45, Jason Wang wrote:
> > On Tue, Oct 10, 2023 at 9:52 AM Akihiko Odaki  
> > wrote:
> >>
> >> On 2023/10/09 19:44, Willem de Bruijn wrote:
> >>> On Mon, Oct 9, 2023 at 3:12 AM Akihiko Odaki  
> >>> wrote:
> 
>  On 2023/10/09 19:06, Willem de Bruijn wrote:
> > On Mon, Oct 9, 2023 at 3:02 AM Akihiko Odaki  
> > wrote:
> >>
> >> On 2023/10/09 18:57, Willem de Bruijn wrote:
> >>> On Mon, Oct 9, 2023 at 3:57 AM Akihiko Odaki 
> >>>  wrote:
> 
>  On 2023/10/09 17:04, Willem de Bruijn wrote:
> > On Sun, Oct 8, 2023 at 3:46 PM Akihiko Odaki 
> >  wrote:
> >>
> >> On 2023/10/09 5:08, Willem de Bruijn wrote:
> >>> On Sun, Oct 8, 2023 at 10:04 PM Akihiko Odaki 
> >>>  wrote:
> 
>  On 2023/10/09 4:07, Willem de Bruijn wrote:
> > On Sun, Oct 8, 2023 at 7:22 AM Akihiko Odaki 
> >  wrote:
> >>
> >> virtio-net have two usage of hashes: one is RSS and another is 
> >> hash
> >> reporting. Conventionally the hash calculation was done by the 
> >> VMM.
> >> However, computing the hash after the queue was chosen defeats 
> >> the
> >> purpose of RSS.
> >>
> >> Another approach is to use eBPF steering program. This 
> >> approach has
> >> another downside: it cannot report the calculated hash due to 
> >> the
> >> restrictive nature of eBPF.
> >>
> >> Introduce the code to compute hashes to the kernel in order to 
> >> overcome
> >> thse challenges. An alternative solution is to extend the eBPF 
> >> steering
> >> program so that it will be able to report to the userspace, 
> >> but it makes
> >> little sense to allow to implement different hashing 
> >> algorithms with
> >> eBPF since the hash value reported by virtio-net is strictly 
> >> defined by
> >> the specification.
> >>
> >> The hash value already stored in sk_buff is not used and 
> >> computed
> >> independently since it may have been computed in a way not 
> >> conformant
> >> with the specification.
> >>
> >> Signed-off-by: Akihiko Odaki 
> >> ---
> >
> >> +static const struct tun_vnet_hash_cap tun_vnet_hash_cap = {
> >> +   .max_indirection_table_length =
> >> +   TUN_VNET_HASH_MAX_INDIRECTION_TABLE_LENGTH,
> >> +
> >> +   .types = VIRTIO_NET_SUPPORTED_HASH_TYPES
> >> +};
> >
> > No need to have explicit capabilities exchange like this? Tun 
> > either
> > supports all or none.
> 
>  tun does not support VIRTIO_NET_RSS_HASH_TYPE_IP_EX,
>  VIRTIO_NET_RSS_HASH_TYPE_TCP_EX, and 
>  VIRTIO_NET_RSS_HASH_TYPE_UDP_EX.
> 
>  It is because the flow dissector does not support IPv6 
>  extensions. The
>  specification is also vague, and does not tell how many TLVs 
>  should be
>  consumed at most when interpreting destination option header so 
>  I chose
>  to avoid adding code for these hash types to the flow dissector. 
>  I doubt
>  anyone will complain about it since nobody complains for Linux.
> 
>  I'm also adding this so that we can extend it later.
>  max_indirection_table_length may grow for systems with 128+ 
>  CPUs, or
>  types may have other bits for new protocols in the future.
> 
> >
> >>case TUNSETSTEERINGEBPF:
> >> -   ret = tun_set_ebpf(tun, >steering_prog, 
> >> argp);
> >> +   bpf_ret = tun_set_ebpf(tun, 
> >> >steering_prog, argp);
> >> +   if (IS_ERR(bpf_ret))
> >> +   ret = PTR_ERR(bpf_ret);
> >> +   else if (bpf_ret)
> >> +   tun->vnet_hash.flags &= 
> >> ~TUN_VNET_HASH_RSS;
> >
> > Don't make one feature disable another.
> >
> > TUNSETSTEERINGEBPF and TUNSETVNETHASH are mutually exclusive
> > functions. If one is enabled the other call should fail, with 
> > EBUSY
> > for instance.
> >
> >> +   case TUNSETVNETHASH:
> >> +   len = sizeof(vnet_hash);
> >> +

Re: [RFC PATCH 5/7] tun: Introduce virtio-net hashing feature

2023-10-09 Thread Jason Wang
On Tue, Oct 10, 2023 at 9:52 AM Akihiko Odaki  wrote:
>
> On 2023/10/09 19:44, Willem de Bruijn wrote:
> > On Mon, Oct 9, 2023 at 3:12 AM Akihiko Odaki  
> > wrote:
> >>
> >> On 2023/10/09 19:06, Willem de Bruijn wrote:
> >>> On Mon, Oct 9, 2023 at 3:02 AM Akihiko Odaki  
> >>> wrote:
> 
>  On 2023/10/09 18:57, Willem de Bruijn wrote:
> > On Mon, Oct 9, 2023 at 3:57 AM Akihiko Odaki  
> > wrote:
> >>
> >> On 2023/10/09 17:04, Willem de Bruijn wrote:
> >>> On Sun, Oct 8, 2023 at 3:46 PM Akihiko Odaki 
> >>>  wrote:
> 
>  On 2023/10/09 5:08, Willem de Bruijn wrote:
> > On Sun, Oct 8, 2023 at 10:04 PM Akihiko Odaki 
> >  wrote:
> >>
> >> On 2023/10/09 4:07, Willem de Bruijn wrote:
> >>> On Sun, Oct 8, 2023 at 7:22 AM Akihiko Odaki 
> >>>  wrote:
> 
>  virtio-net have two usage of hashes: one is RSS and another is 
>  hash
>  reporting. Conventionally the hash calculation was done by the 
>  VMM.
>  However, computing the hash after the queue was chosen defeats 
>  the
>  purpose of RSS.
> 
>  Another approach is to use eBPF steering program. This approach 
>  has
>  another downside: it cannot report the calculated hash due to the
>  restrictive nature of eBPF.
> 
>  Introduce the code to compute hashes to the kernel in order to 
>  overcome
>  thse challenges. An alternative solution is to extend the eBPF 
>  steering
>  program so that it will be able to report to the userspace, but 
>  it makes
>  little sense to allow to implement different hashing algorithms 
>  with
>  eBPF since the hash value reported by virtio-net is strictly 
>  defined by
>  the specification.
> 
>  The hash value already stored in sk_buff is not used and computed
>  independently since it may have been computed in a way not 
>  conformant
>  with the specification.
> 
>  Signed-off-by: Akihiko Odaki 
>  ---
> >>>
>  +static const struct tun_vnet_hash_cap tun_vnet_hash_cap = {
>  +   .max_indirection_table_length =
>  +   TUN_VNET_HASH_MAX_INDIRECTION_TABLE_LENGTH,
>  +
>  +   .types = VIRTIO_NET_SUPPORTED_HASH_TYPES
>  +};
> >>>
> >>> No need to have explicit capabilities exchange like this? Tun 
> >>> either
> >>> supports all or none.
> >>
> >> tun does not support VIRTIO_NET_RSS_HASH_TYPE_IP_EX,
> >> VIRTIO_NET_RSS_HASH_TYPE_TCP_EX, and 
> >> VIRTIO_NET_RSS_HASH_TYPE_UDP_EX.
> >>
> >> It is because the flow dissector does not support IPv6 extensions. 
> >> The
> >> specification is also vague, and does not tell how many TLVs 
> >> should be
> >> consumed at most when interpreting destination option header so I 
> >> chose
> >> to avoid adding code for these hash types to the flow dissector. I 
> >> doubt
> >> anyone will complain about it since nobody complains for Linux.
> >>
> >> I'm also adding this so that we can extend it later.
> >> max_indirection_table_length may grow for systems with 128+ CPUs, 
> >> or
> >> types may have other bits for new protocols in the future.
> >>
> >>>
>    case TUNSETSTEERINGEBPF:
>  -   ret = tun_set_ebpf(tun, >steering_prog, 
>  argp);
>  +   bpf_ret = tun_set_ebpf(tun, >steering_prog, 
>  argp);
>  +   if (IS_ERR(bpf_ret))
>  +   ret = PTR_ERR(bpf_ret);
>  +   else if (bpf_ret)
>  +   tun->vnet_hash.flags &= 
>  ~TUN_VNET_HASH_RSS;
> >>>
> >>> Don't make one feature disable another.
> >>>
> >>> TUNSETSTEERINGEBPF and TUNSETVNETHASH are mutually exclusive
> >>> functions. If one is enabled the other call should fail, with 
> >>> EBUSY
> >>> for instance.
> >>>
>  +   case TUNSETVNETHASH:
>  +   len = sizeof(vnet_hash);
>  +   if (copy_from_user(_hash, argp, len)) {
>  +   ret = -EFAULT;
>  +   break;
>  +   }
>  +
>  +   if (((vnet_hash.flags & TUN_VNET_HASH_REPORT) &&
>  +(tun->vnet_hdr_sz < sizeof(struct 
> 

Re: [RFC PATCH 5/7] tun: Introduce virtio-net hashing feature

2023-10-09 Thread Michael S. Tsirkin
On Mon, Oct 09, 2023 at 05:44:20PM +0900, Akihiko Odaki wrote:
> On 2023/10/09 17:13, Willem de Bruijn wrote:
> > On Sun, Oct 8, 2023 at 12:22 AM Akihiko Odaki  
> > wrote:
> > > 
> > > virtio-net have two usage of hashes: one is RSS and another is hash
> > > reporting. Conventionally the hash calculation was done by the VMM.
> > > However, computing the hash after the queue was chosen defeats the
> > > purpose of RSS.
> > > 
> > > Another approach is to use eBPF steering program. This approach has
> > > another downside: it cannot report the calculated hash due to the
> > > restrictive nature of eBPF.
> > > 
> > > Introduce the code to compute hashes to the kernel in order to overcome
> > > thse challenges. An alternative solution is to extend the eBPF steering
> > > program so that it will be able to report to the userspace, but it makes
> > > little sense to allow to implement different hashing algorithms with
> > > eBPF since the hash value reported by virtio-net is strictly defined by
> > > the specification.
> > > 
> > > The hash value already stored in sk_buff is not used and computed
> > > independently since it may have been computed in a way not conformant
> > > with the specification.
> > > 
> > > Signed-off-by: Akihiko Odaki 
> > 
> > > @@ -2116,31 +2172,49 @@ static ssize_t tun_put_user(struct tun_struct 
> > > *tun,
> > >  }
> > > 
> > >  if (vnet_hdr_sz) {
> > > -   struct virtio_net_hdr gso;
> > > +   union {
> > > +   struct virtio_net_hdr hdr;
> > > +   struct virtio_net_hdr_v1_hash v1_hash_hdr;
> > > +   } hdr;
> > > +   int ret;
> > > 
> > >  if (iov_iter_count(iter) < vnet_hdr_sz)
> > >  return -EINVAL;
> > > 
> > > -   if (virtio_net_hdr_from_skb(skb, ,
> > > -   tun_is_little_endian(tun), 
> > > true,
> > > -   vlan_hlen)) {
> > > +   if ((READ_ONCE(tun->vnet_hash.flags) & 
> > > TUN_VNET_HASH_REPORT) &&
> > > +   vnet_hdr_sz >= sizeof(hdr.v1_hash_hdr) &&
> > > +   skb->tun_vnet_hash) {
> > 
> > Isn't vnet_hdr_sz guaranteed to be >= hdr.v1_hash_hdr, by virtue of
> > the set hash ioctl failing otherwise?
> > 
> > Such checks should be limited to control path where possible
> 
> There is a potential race since tun->vnet_hash.flags and vnet_hdr_sz are not
> read at once.

And then it's a complete mess and you get inconsistent
behaviour with packets getting sent all over the place, right?
So maybe keep a pointer to this struct so it can be
changed atomically then. Maybe even something with rcu I donnu.

-- 
MST

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [RFC PATCH 5/7] tun: Introduce virtio-net hashing feature

2023-10-09 Thread Michael S. Tsirkin
Akihiko Odaki sorry about reposts.
Having an email with two "@" in the CC list:
 xuanzhuo@linux.alibaba.comsh...@kernel.org
tripped up mutt's reply-all for me and made it send to you only.

I am guessing you meant two addresses: xuanz...@linux.alibaba.com
and sh...@kernel.org.


On Sun, Oct 08, 2023 at 02:20:49PM +0900, Akihiko Odaki wrote:
> virtio-net have two usage of hashes: one is RSS and another is hash
> reporting. Conventionally the hash calculation was done by the VMM.
> However, computing the hash after the queue was chosen defeats the
> purpose of RSS.
> 
> Another approach is to use eBPF steering program. This approach has
> another downside: it cannot report the calculated hash due to the
> restrictive nature of eBPF.
> 
> Introduce the code to compute hashes to the kernel in order to overcome
> thse challenges. An alternative solution is to extend the eBPF steering
> program so that it will be able to report to the userspace, but it makes
> little sense to allow to implement different hashing algorithms with
> eBPF since the hash value reported by virtio-net is strictly defined by
> the specification.
> 
> The hash value already stored in sk_buff is not used and computed
> independently since it may have been computed in a way not conformant
> with the specification.
> 
> Signed-off-by: Akihiko Odaki 
> ---
>  drivers/net/tun.c   | 187 
>  include/uapi/linux/if_tun.h |  16 +++
>  2 files changed, 182 insertions(+), 21 deletions(-)
> 
> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
> index 89ab9efe522c..561a573cd008 100644
> --- a/drivers/net/tun.c
> +++ b/drivers/net/tun.c
> @@ -171,6 +171,9 @@ struct tun_prog {
>   struct bpf_prog *prog;
>  };
>  
> +#define TUN_VNET_HASH_MAX_KEY_SIZE 40
> +#define TUN_VNET_HASH_MAX_INDIRECTION_TABLE_LENGTH 128
> +

where do these come from?

>  /* Since the socket were moved to tun_file, to preserve the behavior of 
> persist
>   * device, socket filter, sndbuf and vnet header size were restore when the
>   * file were attached to a persist device.
> @@ -209,6 +212,9 @@ struct tun_struct {
>   struct tun_prog __rcu *steering_prog;
>   struct tun_prog __rcu *filter_prog;
>   struct ethtool_link_ksettings link_ksettings;
> + struct tun_vnet_hash vnet_hash;
> + u16 
> vnet_hash_indirection_table[TUN_VNET_HASH_MAX_INDIRECTION_TABLE_LENGTH];
> + u32 vnet_hash_key[TUN_VNET_HASH_MAX_KEY_SIZE / 4];

That's quite a lot of data to add in this struct, and will be used
by a small minority of users. Are you pushing any hot data out of cache
with this? Why not allocate these as needed?

>   /* init args */
>   struct file *file;
>   struct ifreq *ifr;
> @@ -219,6 +225,13 @@ struct veth {
>   __be16 h_vlan_TCI;
>  };
>  
> +static const struct tun_vnet_hash_cap tun_vnet_hash_cap = {
> + .max_indirection_table_length =
> + TUN_VNET_HASH_MAX_INDIRECTION_TABLE_LENGTH,
> +
> + .types = VIRTIO_NET_SUPPORTED_HASH_TYPES
> +};
> +
>  static void tun_flow_init(struct tun_struct *tun);
>  static void tun_flow_uninit(struct tun_struct *tun);
>  
> @@ -320,10 +333,16 @@ static long tun_set_vnet_be(struct tun_struct *tun, int 
> __user *argp)
>   if (get_user(be, argp))
>   return -EFAULT;
>  
> - if (be)
> + if (be) {
> + if (!(tun->flags & TUN_VNET_LE) &&
> + (tun->vnet_hash.flags & TUN_VNET_HASH_REPORT)) {
> + return -EINVAL;
> + }
> +
>   tun->flags |= TUN_VNET_BE;
> - else
> + } else {
>   tun->flags &= ~TUN_VNET_BE;
> + }
>  
>   return 0;
>  }
> @@ -558,15 +577,47 @@ static u16 tun_ebpf_select_queue(struct tun_struct 
> *tun, struct sk_buff *skb)
>   return ret % numqueues;
>  }
>  
> +static u16 tun_vnet_select_queue(struct tun_struct *tun, struct sk_buff *skb)
> +{
> + u32 value = qdisc_skb_cb(skb)->tun_vnet_hash_value;
> + u32 numqueues;
> + u32 index;
> + u16 queue;
> +
> + numqueues = READ_ONCE(tun->numqueues);
> + if (!numqueues)
> + return 0;
> +
> + index = value & READ_ONCE(tun->vnet_hash.indirection_table_mask);
> + queue = READ_ONCE(tun->vnet_hash_indirection_table[index]);
> + if (!queue)
> + queue = READ_ONCE(tun->vnet_hash.unclassified_queue);

Apparently 0 is an illegal queue value? You are making this part
of UAPI better document things like this.

> +
> + return queue % numqueues;
> +}
> +
>  static u16 tun_select_queue(struct net_device *dev, struct sk_buff *skb,
>   struct net_device *sb_dev)
>  {
>   struct tun_struct *tun = netdev_priv(dev);
> + u8 vnet_hash_flags = READ_ONCE(tun->vnet_hash.flags);
> + struct virtio_net_hash hash;
>   u16 ret;
>  
> + if (vnet_hash_flags & (TUN_VNET_HASH_RSS | TUN_VNET_HASH_REPORT)) {
> + virtio_net_hash(skb, READ_ONCE(tun->vnet_hash.types),
> + 

Re: [RFC PATCH 5/7] tun: Introduce virtio-net hashing feature

2023-10-09 Thread Willem de Bruijn
On Mon, Oct 9, 2023 at 3:12 AM Akihiko Odaki  wrote:
>
> On 2023/10/09 19:06, Willem de Bruijn wrote:
> > On Mon, Oct 9, 2023 at 3:02 AM Akihiko Odaki  
> > wrote:
> >>
> >> On 2023/10/09 18:57, Willem de Bruijn wrote:
> >>> On Mon, Oct 9, 2023 at 3:57 AM Akihiko Odaki  
> >>> wrote:
> 
>  On 2023/10/09 17:04, Willem de Bruijn wrote:
> > On Sun, Oct 8, 2023 at 3:46 PM Akihiko Odaki  
> > wrote:
> >>
> >> On 2023/10/09 5:08, Willem de Bruijn wrote:
> >>> On Sun, Oct 8, 2023 at 10:04 PM Akihiko Odaki 
> >>>  wrote:
> 
>  On 2023/10/09 4:07, Willem de Bruijn wrote:
> > On Sun, Oct 8, 2023 at 7:22 AM Akihiko Odaki 
> >  wrote:
> >>
> >> virtio-net have two usage of hashes: one is RSS and another is hash
> >> reporting. Conventionally the hash calculation was done by the VMM.
> >> However, computing the hash after the queue was chosen defeats the
> >> purpose of RSS.
> >>
> >> Another approach is to use eBPF steering program. This approach has
> >> another downside: it cannot report the calculated hash due to the
> >> restrictive nature of eBPF.
> >>
> >> Introduce the code to compute hashes to the kernel in order to 
> >> overcome
> >> thse challenges. An alternative solution is to extend the eBPF 
> >> steering
> >> program so that it will be able to report to the userspace, but it 
> >> makes
> >> little sense to allow to implement different hashing algorithms 
> >> with
> >> eBPF since the hash value reported by virtio-net is strictly 
> >> defined by
> >> the specification.
> >>
> >> The hash value already stored in sk_buff is not used and computed
> >> independently since it may have been computed in a way not 
> >> conformant
> >> with the specification.
> >>
> >> Signed-off-by: Akihiko Odaki 
> >> ---
> >
> >> +static const struct tun_vnet_hash_cap tun_vnet_hash_cap = {
> >> +   .max_indirection_table_length =
> >> +   TUN_VNET_HASH_MAX_INDIRECTION_TABLE_LENGTH,
> >> +
> >> +   .types = VIRTIO_NET_SUPPORTED_HASH_TYPES
> >> +};
> >
> > No need to have explicit capabilities exchange like this? Tun either
> > supports all or none.
> 
>  tun does not support VIRTIO_NET_RSS_HASH_TYPE_IP_EX,
>  VIRTIO_NET_RSS_HASH_TYPE_TCP_EX, and VIRTIO_NET_RSS_HASH_TYPE_UDP_EX.
> 
>  It is because the flow dissector does not support IPv6 extensions. 
>  The
>  specification is also vague, and does not tell how many TLVs should 
>  be
>  consumed at most when interpreting destination option header so I 
>  chose
>  to avoid adding code for these hash types to the flow dissector. I 
>  doubt
>  anyone will complain about it since nobody complains for Linux.
> 
>  I'm also adding this so that we can extend it later.
>  max_indirection_table_length may grow for systems with 128+ CPUs, or
>  types may have other bits for new protocols in the future.
> 
> >
> >>  case TUNSETSTEERINGEBPF:
> >> -   ret = tun_set_ebpf(tun, >steering_prog, argp);
> >> +   bpf_ret = tun_set_ebpf(tun, >steering_prog, 
> >> argp);
> >> +   if (IS_ERR(bpf_ret))
> >> +   ret = PTR_ERR(bpf_ret);
> >> +   else if (bpf_ret)
> >> +   tun->vnet_hash.flags &= ~TUN_VNET_HASH_RSS;
> >
> > Don't make one feature disable another.
> >
> > TUNSETSTEERINGEBPF and TUNSETVNETHASH are mutually exclusive
> > functions. If one is enabled the other call should fail, with EBUSY
> > for instance.
> >
> >> +   case TUNSETVNETHASH:
> >> +   len = sizeof(vnet_hash);
> >> +   if (copy_from_user(_hash, argp, len)) {
> >> +   ret = -EFAULT;
> >> +   break;
> >> +   }
> >> +
> >> +   if (((vnet_hash.flags & TUN_VNET_HASH_REPORT) &&
> >> +(tun->vnet_hdr_sz < sizeof(struct 
> >> virtio_net_hdr_v1_hash) ||
> >> + !tun_is_little_endian(tun))) ||
> >> +vnet_hash.indirection_table_mask >=
> >> +TUN_VNET_HASH_MAX_INDIRECTION_TABLE_LENGTH) {
> >> +   ret = -EINVAL;
> >> +   break;
> >> +   }
> >> +
> >> +   argp = (u8 __user *)argp + len;
> >> +

Re: [RFC PATCH 5/7] tun: Introduce virtio-net hashing feature

2023-10-09 Thread Willem de Bruijn
On Mon, Oct 9, 2023 at 3:11 AM Akihiko Odaki  wrote:
>
> On 2023/10/09 19:07, Willem de Bruijn wrote:
> > On Mon, Oct 9, 2023 at 3:05 AM Akihiko Odaki  
> > wrote:
> >>
> >>
> >>
> >> On 2023/10/09 18:54, Willem de Bruijn wrote:
> >>> On Mon, Oct 9, 2023 at 3:44 AM Akihiko Odaki  
> >>> wrote:
> 
>  On 2023/10/09 17:13, Willem de Bruijn wrote:
> > On Sun, Oct 8, 2023 at 12:22 AM Akihiko Odaki 
> >  wrote:
> >>
> >> virtio-net have two usage of hashes: one is RSS and another is hash
> >> reporting. Conventionally the hash calculation was done by the VMM.
> >> However, computing the hash after the queue was chosen defeats the
> >> purpose of RSS.
> >>
> >> Another approach is to use eBPF steering program. This approach has
> >> another downside: it cannot report the calculated hash due to the
> >> restrictive nature of eBPF.
> >>
> >> Introduce the code to compute hashes to the kernel in order to overcome
> >> thse challenges. An alternative solution is to extend the eBPF steering
> >> program so that it will be able to report to the userspace, but it 
> >> makes
> >> little sense to allow to implement different hashing algorithms with
> >> eBPF since the hash value reported by virtio-net is strictly defined by
> >> the specification.
> >>
> >> The hash value already stored in sk_buff is not used and computed
> >> independently since it may have been computed in a way not conformant
> >> with the specification.
> >>
> >> Signed-off-by: Akihiko Odaki 
> >
> >> @@ -2116,31 +2172,49 @@ static ssize_t tun_put_user(struct tun_struct 
> >> *tun,
> >>}
> >>
> >>if (vnet_hdr_sz) {
> >> -   struct virtio_net_hdr gso;
> >> +   union {
> >> +   struct virtio_net_hdr hdr;
> >> +   struct virtio_net_hdr_v1_hash v1_hash_hdr;
> >> +   } hdr;
> >> +   int ret;
> >>
> >>if (iov_iter_count(iter) < vnet_hdr_sz)
> >>return -EINVAL;
> >>
> >> -   if (virtio_net_hdr_from_skb(skb, ,
> >> -   tun_is_little_endian(tun), 
> >> true,
> >> -   vlan_hlen)) {
> >> +   if ((READ_ONCE(tun->vnet_hash.flags) & 
> >> TUN_VNET_HASH_REPORT) &&
> >> +   vnet_hdr_sz >= sizeof(hdr.v1_hash_hdr) &&
> >> +   skb->tun_vnet_hash) {
> >
> > Isn't vnet_hdr_sz guaranteed to be >= hdr.v1_hash_hdr, by virtue of
> > the set hash ioctl failing otherwise?
> >
> > Such checks should be limited to control path where possible
> 
>  There is a potential race since tun->vnet_hash.flags and vnet_hdr_sz are
>  not read at once.
> >>>
> >>> It should not be possible to downgrade the hdr_sz once v1 is selected.
> >>
> >> I see nothing that prevents shrinking the header size.
> >>
> >> tun->vnet_hash.flags is read after vnet_hdr_sz so the race can happen
> >> even for the case the header size grows though this can be fixed by
> >> reordering the two reads.
> >
> > One option is to fail any control path that tries to re-negotiate
> > header size once this hash option is enabled?
> >
> > There is no practical reason to allow feature re-negotiation at any
> > arbitrary time.
>
> I think it's a bit awkward interface design since tun allows to
> reconfigure any of its parameters, but it's certainly possible.

If this would be the only exception to that rule, and this is the only
place that needs a datapath check, then it's fine to leave as is.

In general, this runtime configurability serves little purpose but to
help syzbot exercise code paths no real application would attempt. But
I won't ask to diverge from whatever tun already does. We just have to
be more careful about the possible races it brings.
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [RFC PATCH 5/7] tun: Introduce virtio-net hashing feature

2023-10-09 Thread Willem de Bruijn
On Mon, Oct 9, 2023 at 3:05 AM Akihiko Odaki  wrote:
>
>
>
> On 2023/10/09 18:54, Willem de Bruijn wrote:
> > On Mon, Oct 9, 2023 at 3:44 AM Akihiko Odaki  
> > wrote:
> >>
> >> On 2023/10/09 17:13, Willem de Bruijn wrote:
> >>> On Sun, Oct 8, 2023 at 12:22 AM Akihiko Odaki  
> >>> wrote:
> 
>  virtio-net have two usage of hashes: one is RSS and another is hash
>  reporting. Conventionally the hash calculation was done by the VMM.
>  However, computing the hash after the queue was chosen defeats the
>  purpose of RSS.
> 
>  Another approach is to use eBPF steering program. This approach has
>  another downside: it cannot report the calculated hash due to the
>  restrictive nature of eBPF.
> 
>  Introduce the code to compute hashes to the kernel in order to overcome
>  thse challenges. An alternative solution is to extend the eBPF steering
>  program so that it will be able to report to the userspace, but it makes
>  little sense to allow to implement different hashing algorithms with
>  eBPF since the hash value reported by virtio-net is strictly defined by
>  the specification.
> 
>  The hash value already stored in sk_buff is not used and computed
>  independently since it may have been computed in a way not conformant
>  with the specification.
> 
>  Signed-off-by: Akihiko Odaki 
> >>>
>  @@ -2116,31 +2172,49 @@ static ssize_t tun_put_user(struct tun_struct 
>  *tun,
>    }
> 
>    if (vnet_hdr_sz) {
>  -   struct virtio_net_hdr gso;
>  +   union {
>  +   struct virtio_net_hdr hdr;
>  +   struct virtio_net_hdr_v1_hash v1_hash_hdr;
>  +   } hdr;
>  +   int ret;
> 
>    if (iov_iter_count(iter) < vnet_hdr_sz)
>    return -EINVAL;
> 
>  -   if (virtio_net_hdr_from_skb(skb, ,
>  -   tun_is_little_endian(tun), 
>  true,
>  -   vlan_hlen)) {
>  +   if ((READ_ONCE(tun->vnet_hash.flags) & 
>  TUN_VNET_HASH_REPORT) &&
>  +   vnet_hdr_sz >= sizeof(hdr.v1_hash_hdr) &&
>  +   skb->tun_vnet_hash) {
> >>>
> >>> Isn't vnet_hdr_sz guaranteed to be >= hdr.v1_hash_hdr, by virtue of
> >>> the set hash ioctl failing otherwise?
> >>>
> >>> Such checks should be limited to control path where possible
> >>
> >> There is a potential race since tun->vnet_hash.flags and vnet_hdr_sz are
> >> not read at once.
> >
> > It should not be possible to downgrade the hdr_sz once v1 is selected.
>
> I see nothing that prevents shrinking the header size.
>
> tun->vnet_hash.flags is read after vnet_hdr_sz so the race can happen
> even for the case the header size grows though this can be fixed by
> reordering the two reads.

One option is to fail any control path that tries to re-negotiate
header size once this hash option is enabled?

There is no practical reason to allow feature re-negotiation at any
arbitrary time.
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [RFC PATCH 5/7] tun: Introduce virtio-net hashing feature

2023-10-09 Thread Willem de Bruijn
On Mon, Oct 9, 2023 at 3:02 AM Akihiko Odaki  wrote:
>
> On 2023/10/09 18:57, Willem de Bruijn wrote:
> > On Mon, Oct 9, 2023 at 3:57 AM Akihiko Odaki  
> > wrote:
> >>
> >> On 2023/10/09 17:04, Willem de Bruijn wrote:
> >>> On Sun, Oct 8, 2023 at 3:46 PM Akihiko Odaki  
> >>> wrote:
> 
>  On 2023/10/09 5:08, Willem de Bruijn wrote:
> > On Sun, Oct 8, 2023 at 10:04 PM Akihiko Odaki 
> >  wrote:
> >>
> >> On 2023/10/09 4:07, Willem de Bruijn wrote:
> >>> On Sun, Oct 8, 2023 at 7:22 AM Akihiko Odaki 
> >>>  wrote:
> 
>  virtio-net have two usage of hashes: one is RSS and another is hash
>  reporting. Conventionally the hash calculation was done by the VMM.
>  However, computing the hash after the queue was chosen defeats the
>  purpose of RSS.
> 
>  Another approach is to use eBPF steering program. This approach has
>  another downside: it cannot report the calculated hash due to the
>  restrictive nature of eBPF.
> 
>  Introduce the code to compute hashes to the kernel in order to 
>  overcome
>  thse challenges. An alternative solution is to extend the eBPF 
>  steering
>  program so that it will be able to report to the userspace, but it 
>  makes
>  little sense to allow to implement different hashing algorithms with
>  eBPF since the hash value reported by virtio-net is strictly defined 
>  by
>  the specification.
> 
>  The hash value already stored in sk_buff is not used and computed
>  independently since it may have been computed in a way not conformant
>  with the specification.
> 
>  Signed-off-by: Akihiko Odaki 
>  ---
> >>>
>  +static const struct tun_vnet_hash_cap tun_vnet_hash_cap = {
>  +   .max_indirection_table_length =
>  +   TUN_VNET_HASH_MAX_INDIRECTION_TABLE_LENGTH,
>  +
>  +   .types = VIRTIO_NET_SUPPORTED_HASH_TYPES
>  +};
> >>>
> >>> No need to have explicit capabilities exchange like this? Tun either
> >>> supports all or none.
> >>
> >> tun does not support VIRTIO_NET_RSS_HASH_TYPE_IP_EX,
> >> VIRTIO_NET_RSS_HASH_TYPE_TCP_EX, and VIRTIO_NET_RSS_HASH_TYPE_UDP_EX.
> >>
> >> It is because the flow dissector does not support IPv6 extensions. The
> >> specification is also vague, and does not tell how many TLVs should be
> >> consumed at most when interpreting destination option header so I chose
> >> to avoid adding code for these hash types to the flow dissector. I 
> >> doubt
> >> anyone will complain about it since nobody complains for Linux.
> >>
> >> I'm also adding this so that we can extend it later.
> >> max_indirection_table_length may grow for systems with 128+ CPUs, or
> >> types may have other bits for new protocols in the future.
> >>
> >>>
>  case TUNSETSTEERINGEBPF:
>  -   ret = tun_set_ebpf(tun, >steering_prog, argp);
>  +   bpf_ret = tun_set_ebpf(tun, >steering_prog, 
>  argp);
>  +   if (IS_ERR(bpf_ret))
>  +   ret = PTR_ERR(bpf_ret);
>  +   else if (bpf_ret)
>  +   tun->vnet_hash.flags &= ~TUN_VNET_HASH_RSS;
> >>>
> >>> Don't make one feature disable another.
> >>>
> >>> TUNSETSTEERINGEBPF and TUNSETVNETHASH are mutually exclusive
> >>> functions. If one is enabled the other call should fail, with EBUSY
> >>> for instance.
> >>>
>  +   case TUNSETVNETHASH:
>  +   len = sizeof(vnet_hash);
>  +   if (copy_from_user(_hash, argp, len)) {
>  +   ret = -EFAULT;
>  +   break;
>  +   }
>  +
>  +   if (((vnet_hash.flags & TUN_VNET_HASH_REPORT) &&
>  +(tun->vnet_hdr_sz < sizeof(struct 
>  virtio_net_hdr_v1_hash) ||
>  + !tun_is_little_endian(tun))) ||
>  +vnet_hash.indirection_table_mask >=
>  +TUN_VNET_HASH_MAX_INDIRECTION_TABLE_LENGTH) {
>  +   ret = -EINVAL;
>  +   break;
>  +   }
>  +
>  +   argp = (u8 __user *)argp + len;
>  +   len = (vnet_hash.indirection_table_mask + 1) * 2;
>  +   if (copy_from_user(vnet_hash_indirection_table, 
>  argp, len)) {
>  +   ret = -EFAULT;
>  +   break;
>  +   }
>  +
>  +   argp = (u8 __user *)argp + len;
>  +   len = 

Re: [RFC PATCH 5/7] tun: Introduce virtio-net hashing feature

2023-10-09 Thread Willem de Bruijn
On Mon, Oct 9, 2023 at 3:57 AM Akihiko Odaki  wrote:
>
> On 2023/10/09 17:04, Willem de Bruijn wrote:
> > On Sun, Oct 8, 2023 at 3:46 PM Akihiko Odaki  
> > wrote:
> >>
> >> On 2023/10/09 5:08, Willem de Bruijn wrote:
> >>> On Sun, Oct 8, 2023 at 10:04 PM Akihiko Odaki  
> >>> wrote:
> 
>  On 2023/10/09 4:07, Willem de Bruijn wrote:
> > On Sun, Oct 8, 2023 at 7:22 AM Akihiko Odaki  
> > wrote:
> >>
> >> virtio-net have two usage of hashes: one is RSS and another is hash
> >> reporting. Conventionally the hash calculation was done by the VMM.
> >> However, computing the hash after the queue was chosen defeats the
> >> purpose of RSS.
> >>
> >> Another approach is to use eBPF steering program. This approach has
> >> another downside: it cannot report the calculated hash due to the
> >> restrictive nature of eBPF.
> >>
> >> Introduce the code to compute hashes to the kernel in order to overcome
> >> thse challenges. An alternative solution is to extend the eBPF steering
> >> program so that it will be able to report to the userspace, but it 
> >> makes
> >> little sense to allow to implement different hashing algorithms with
> >> eBPF since the hash value reported by virtio-net is strictly defined by
> >> the specification.
> >>
> >> The hash value already stored in sk_buff is not used and computed
> >> independently since it may have been computed in a way not conformant
> >> with the specification.
> >>
> >> Signed-off-by: Akihiko Odaki 
> >> ---
> >
> >> +static const struct tun_vnet_hash_cap tun_vnet_hash_cap = {
> >> +   .max_indirection_table_length =
> >> +   TUN_VNET_HASH_MAX_INDIRECTION_TABLE_LENGTH,
> >> +
> >> +   .types = VIRTIO_NET_SUPPORTED_HASH_TYPES
> >> +};
> >
> > No need to have explicit capabilities exchange like this? Tun either
> > supports all or none.
> 
>  tun does not support VIRTIO_NET_RSS_HASH_TYPE_IP_EX,
>  VIRTIO_NET_RSS_HASH_TYPE_TCP_EX, and VIRTIO_NET_RSS_HASH_TYPE_UDP_EX.
> 
>  It is because the flow dissector does not support IPv6 extensions. The
>  specification is also vague, and does not tell how many TLVs should be
>  consumed at most when interpreting destination option header so I chose
>  to avoid adding code for these hash types to the flow dissector. I doubt
>  anyone will complain about it since nobody complains for Linux.
> 
>  I'm also adding this so that we can extend it later.
>  max_indirection_table_length may grow for systems with 128+ CPUs, or
>  types may have other bits for new protocols in the future.
> 
> >
> >>case TUNSETSTEERINGEBPF:
> >> -   ret = tun_set_ebpf(tun, >steering_prog, argp);
> >> +   bpf_ret = tun_set_ebpf(tun, >steering_prog, argp);
> >> +   if (IS_ERR(bpf_ret))
> >> +   ret = PTR_ERR(bpf_ret);
> >> +   else if (bpf_ret)
> >> +   tun->vnet_hash.flags &= ~TUN_VNET_HASH_RSS;
> >
> > Don't make one feature disable another.
> >
> > TUNSETSTEERINGEBPF and TUNSETVNETHASH are mutually exclusive
> > functions. If one is enabled the other call should fail, with EBUSY
> > for instance.
> >
> >> +   case TUNSETVNETHASH:
> >> +   len = sizeof(vnet_hash);
> >> +   if (copy_from_user(_hash, argp, len)) {
> >> +   ret = -EFAULT;
> >> +   break;
> >> +   }
> >> +
> >> +   if (((vnet_hash.flags & TUN_VNET_HASH_REPORT) &&
> >> +(tun->vnet_hdr_sz < sizeof(struct 
> >> virtio_net_hdr_v1_hash) ||
> >> + !tun_is_little_endian(tun))) ||
> >> +vnet_hash.indirection_table_mask >=
> >> +TUN_VNET_HASH_MAX_INDIRECTION_TABLE_LENGTH) {
> >> +   ret = -EINVAL;
> >> +   break;
> >> +   }
> >> +
> >> +   argp = (u8 __user *)argp + len;
> >> +   len = (vnet_hash.indirection_table_mask + 1) * 2;
> >> +   if (copy_from_user(vnet_hash_indirection_table, argp, 
> >> len)) {
> >> +   ret = -EFAULT;
> >> +   break;
> >> +   }
> >> +
> >> +   argp = (u8 __user *)argp + len;
> >> +   len = virtio_net_hash_key_length(vnet_hash.types);
> >> +
> >> +   if (copy_from_user(vnet_hash_key, argp, len)) {
> >> +   ret = -EFAULT;
> >> +   break;
> >> +   }
> >
> > Probably easier and less error-prone to define a fixed size control
> > struct with the max indirection table size.

Re: [RFC PATCH 5/7] tun: Introduce virtio-net hashing feature

2023-10-09 Thread Willem de Bruijn
On Mon, Oct 9, 2023 at 3:44 AM Akihiko Odaki  wrote:
>
> On 2023/10/09 17:13, Willem de Bruijn wrote:
> > On Sun, Oct 8, 2023 at 12:22 AM Akihiko Odaki  
> > wrote:
> >>
> >> virtio-net have two usage of hashes: one is RSS and another is hash
> >> reporting. Conventionally the hash calculation was done by the VMM.
> >> However, computing the hash after the queue was chosen defeats the
> >> purpose of RSS.
> >>
> >> Another approach is to use eBPF steering program. This approach has
> >> another downside: it cannot report the calculated hash due to the
> >> restrictive nature of eBPF.
> >>
> >> Introduce the code to compute hashes to the kernel in order to overcome
> >> thse challenges. An alternative solution is to extend the eBPF steering
> >> program so that it will be able to report to the userspace, but it makes
> >> little sense to allow to implement different hashing algorithms with
> >> eBPF since the hash value reported by virtio-net is strictly defined by
> >> the specification.
> >>
> >> The hash value already stored in sk_buff is not used and computed
> >> independently since it may have been computed in a way not conformant
> >> with the specification.
> >>
> >> Signed-off-by: Akihiko Odaki 
> >
> >> @@ -2116,31 +2172,49 @@ static ssize_t tun_put_user(struct tun_struct *tun,
> >>  }
> >>
> >>  if (vnet_hdr_sz) {
> >> -   struct virtio_net_hdr gso;
> >> +   union {
> >> +   struct virtio_net_hdr hdr;
> >> +   struct virtio_net_hdr_v1_hash v1_hash_hdr;
> >> +   } hdr;
> >> +   int ret;
> >>
> >>  if (iov_iter_count(iter) < vnet_hdr_sz)
> >>  return -EINVAL;
> >>
> >> -   if (virtio_net_hdr_from_skb(skb, ,
> >> -   tun_is_little_endian(tun), 
> >> true,
> >> -   vlan_hlen)) {
> >> +   if ((READ_ONCE(tun->vnet_hash.flags) & 
> >> TUN_VNET_HASH_REPORT) &&
> >> +   vnet_hdr_sz >= sizeof(hdr.v1_hash_hdr) &&
> >> +   skb->tun_vnet_hash) {
> >
> > Isn't vnet_hdr_sz guaranteed to be >= hdr.v1_hash_hdr, by virtue of
> > the set hash ioctl failing otherwise?
> >
> > Such checks should be limited to control path where possible
>
> There is a potential race since tun->vnet_hash.flags and vnet_hdr_sz are
> not read at once.

It should not be possible to downgrade the hdr_sz once v1 is selected.
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [RFC PATCH 5/7] tun: Introduce virtio-net hashing feature

2023-10-09 Thread Willem de Bruijn
On Sun, Oct 8, 2023 at 12:22 AM Akihiko Odaki  wrote:
>
> virtio-net have two usage of hashes: one is RSS and another is hash
> reporting. Conventionally the hash calculation was done by the VMM.
> However, computing the hash after the queue was chosen defeats the
> purpose of RSS.
>
> Another approach is to use eBPF steering program. This approach has
> another downside: it cannot report the calculated hash due to the
> restrictive nature of eBPF.
>
> Introduce the code to compute hashes to the kernel in order to overcome
> thse challenges. An alternative solution is to extend the eBPF steering
> program so that it will be able to report to the userspace, but it makes
> little sense to allow to implement different hashing algorithms with
> eBPF since the hash value reported by virtio-net is strictly defined by
> the specification.
>
> The hash value already stored in sk_buff is not used and computed
> independently since it may have been computed in a way not conformant
> with the specification.
>
> Signed-off-by: Akihiko Odaki 

> @@ -2116,31 +2172,49 @@ static ssize_t tun_put_user(struct tun_struct *tun,
> }
>
> if (vnet_hdr_sz) {
> -   struct virtio_net_hdr gso;
> +   union {
> +   struct virtio_net_hdr hdr;
> +   struct virtio_net_hdr_v1_hash v1_hash_hdr;
> +   } hdr;
> +   int ret;
>
> if (iov_iter_count(iter) < vnet_hdr_sz)
> return -EINVAL;
>
> -   if (virtio_net_hdr_from_skb(skb, ,
> -   tun_is_little_endian(tun), true,
> -   vlan_hlen)) {
> +   if ((READ_ONCE(tun->vnet_hash.flags) & TUN_VNET_HASH_REPORT) 
> &&
> +   vnet_hdr_sz >= sizeof(hdr.v1_hash_hdr) &&
> +   skb->tun_vnet_hash) {

Isn't vnet_hdr_sz guaranteed to be >= hdr.v1_hash_hdr, by virtue of
the set hash ioctl failing otherwise?

Such checks should be limited to control path where possible

> +   vnet_hdr_content_sz = sizeof(hdr.v1_hash_hdr);
> +   ret = virtio_net_hdr_v1_hash_from_skb(skb,
> + 
> _hash_hdr,
> + true,
> + vlan_hlen,
> + _hash);
> +   } else {
> +   vnet_hdr_content_sz = sizeof(hdr.hdr);
> +   ret = virtio_net_hdr_from_skb(skb, ,
> + 
> tun_is_little_endian(tun),
> + true, vlan_hlen);
> +   }
> +
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization

Re: [RFC PATCH 5/7] tun: Introduce virtio-net hashing feature

2023-10-09 Thread Willem de Bruijn
On Sun, Oct 8, 2023 at 3:46 PM Akihiko Odaki  wrote:
>
> On 2023/10/09 5:08, Willem de Bruijn wrote:
> > On Sun, Oct 8, 2023 at 10:04 PM Akihiko Odaki  
> > wrote:
> >>
> >> On 2023/10/09 4:07, Willem de Bruijn wrote:
> >>> On Sun, Oct 8, 2023 at 7:22 AM Akihiko Odaki  
> >>> wrote:
> 
>  virtio-net have two usage of hashes: one is RSS and another is hash
>  reporting. Conventionally the hash calculation was done by the VMM.
>  However, computing the hash after the queue was chosen defeats the
>  purpose of RSS.
> 
>  Another approach is to use eBPF steering program. This approach has
>  another downside: it cannot report the calculated hash due to the
>  restrictive nature of eBPF.
> 
>  Introduce the code to compute hashes to the kernel in order to overcome
>  thse challenges. An alternative solution is to extend the eBPF steering
>  program so that it will be able to report to the userspace, but it makes
>  little sense to allow to implement different hashing algorithms with
>  eBPF since the hash value reported by virtio-net is strictly defined by
>  the specification.
> 
>  The hash value already stored in sk_buff is not used and computed
>  independently since it may have been computed in a way not conformant
>  with the specification.
> 
>  Signed-off-by: Akihiko Odaki 
>  ---
> >>>
>  +static const struct tun_vnet_hash_cap tun_vnet_hash_cap = {
>  +   .max_indirection_table_length =
>  +   TUN_VNET_HASH_MAX_INDIRECTION_TABLE_LENGTH,
>  +
>  +   .types = VIRTIO_NET_SUPPORTED_HASH_TYPES
>  +};
> >>>
> >>> No need to have explicit capabilities exchange like this? Tun either
> >>> supports all or none.
> >>
> >> tun does not support VIRTIO_NET_RSS_HASH_TYPE_IP_EX,
> >> VIRTIO_NET_RSS_HASH_TYPE_TCP_EX, and VIRTIO_NET_RSS_HASH_TYPE_UDP_EX.
> >>
> >> It is because the flow dissector does not support IPv6 extensions. The
> >> specification is also vague, and does not tell how many TLVs should be
> >> consumed at most when interpreting destination option header so I chose
> >> to avoid adding code for these hash types to the flow dissector. I doubt
> >> anyone will complain about it since nobody complains for Linux.
> >>
> >> I'm also adding this so that we can extend it later.
> >> max_indirection_table_length may grow for systems with 128+ CPUs, or
> >> types may have other bits for new protocols in the future.
> >>
> >>>
>    case TUNSETSTEERINGEBPF:
>  -   ret = tun_set_ebpf(tun, >steering_prog, argp);
>  +   bpf_ret = tun_set_ebpf(tun, >steering_prog, argp);
>  +   if (IS_ERR(bpf_ret))
>  +   ret = PTR_ERR(bpf_ret);
>  +   else if (bpf_ret)
>  +   tun->vnet_hash.flags &= ~TUN_VNET_HASH_RSS;
> >>>
> >>> Don't make one feature disable another.
> >>>
> >>> TUNSETSTEERINGEBPF and TUNSETVNETHASH are mutually exclusive
> >>> functions. If one is enabled the other call should fail, with EBUSY
> >>> for instance.
> >>>
>  +   case TUNSETVNETHASH:
>  +   len = sizeof(vnet_hash);
>  +   if (copy_from_user(_hash, argp, len)) {
>  +   ret = -EFAULT;
>  +   break;
>  +   }
>  +
>  +   if (((vnet_hash.flags & TUN_VNET_HASH_REPORT) &&
>  +(tun->vnet_hdr_sz < sizeof(struct 
>  virtio_net_hdr_v1_hash) ||
>  + !tun_is_little_endian(tun))) ||
>  +vnet_hash.indirection_table_mask >=
>  +TUN_VNET_HASH_MAX_INDIRECTION_TABLE_LENGTH) {
>  +   ret = -EINVAL;
>  +   break;
>  +   }
>  +
>  +   argp = (u8 __user *)argp + len;
>  +   len = (vnet_hash.indirection_table_mask + 1) * 2;
>  +   if (copy_from_user(vnet_hash_indirection_table, argp, 
>  len)) {
>  +   ret = -EFAULT;
>  +   break;
>  +   }
>  +
>  +   argp = (u8 __user *)argp + len;
>  +   len = virtio_net_hash_key_length(vnet_hash.types);
>  +
>  +   if (copy_from_user(vnet_hash_key, argp, len)) {
>  +   ret = -EFAULT;
>  +   break;
>  +   }
> >>>
> >>> Probably easier and less error-prone to define a fixed size control
> >>> struct with the max indirection table size.
> >>
> >> I made its size variable because the indirection table and key may grow
> >> in the future as I wrote above.
> >>
> >>>
> >>> Btw: please trim the CC: list considerably on future patches.
> >>
> >> I'll do so in the next version with the TUNSETSTEERINGEBPF change you
> >> proposed.
> >
> > To be clear: please don't just 

Re: [RFC PATCH 5/7] tun: Introduce virtio-net hashing feature

2023-10-08 Thread Willem de Bruijn
On Sun, Oct 8, 2023 at 10:04 PM Akihiko Odaki  wrote:
>
> On 2023/10/09 4:07, Willem de Bruijn wrote:
> > On Sun, Oct 8, 2023 at 7:22 AM Akihiko Odaki  
> > wrote:
> >>
> >> virtio-net have two usage of hashes: one is RSS and another is hash
> >> reporting. Conventionally the hash calculation was done by the VMM.
> >> However, computing the hash after the queue was chosen defeats the
> >> purpose of RSS.
> >>
> >> Another approach is to use eBPF steering program. This approach has
> >> another downside: it cannot report the calculated hash due to the
> >> restrictive nature of eBPF.
> >>
> >> Introduce the code to compute hashes to the kernel in order to overcome
> >> thse challenges. An alternative solution is to extend the eBPF steering
> >> program so that it will be able to report to the userspace, but it makes
> >> little sense to allow to implement different hashing algorithms with
> >> eBPF since the hash value reported by virtio-net is strictly defined by
> >> the specification.
> >>
> >> The hash value already stored in sk_buff is not used and computed
> >> independently since it may have been computed in a way not conformant
> >> with the specification.
> >>
> >> Signed-off-by: Akihiko Odaki 
> >> ---
> >
> >> +static const struct tun_vnet_hash_cap tun_vnet_hash_cap = {
> >> +   .max_indirection_table_length =
> >> +   TUN_VNET_HASH_MAX_INDIRECTION_TABLE_LENGTH,
> >> +
> >> +   .types = VIRTIO_NET_SUPPORTED_HASH_TYPES
> >> +};
> >
> > No need to have explicit capabilities exchange like this? Tun either
> > supports all or none.
>
> tun does not support VIRTIO_NET_RSS_HASH_TYPE_IP_EX,
> VIRTIO_NET_RSS_HASH_TYPE_TCP_EX, and VIRTIO_NET_RSS_HASH_TYPE_UDP_EX.
>
> It is because the flow dissector does not support IPv6 extensions. The
> specification is also vague, and does not tell how many TLVs should be
> consumed at most when interpreting destination option header so I chose
> to avoid adding code for these hash types to the flow dissector. I doubt
> anyone will complain about it since nobody complains for Linux.
>
> I'm also adding this so that we can extend it later.
> max_indirection_table_length may grow for systems with 128+ CPUs, or
> types may have other bits for new protocols in the future.
>
> >
> >>  case TUNSETSTEERINGEBPF:
> >> -   ret = tun_set_ebpf(tun, >steering_prog, argp);
> >> +   bpf_ret = tun_set_ebpf(tun, >steering_prog, argp);
> >> +   if (IS_ERR(bpf_ret))
> >> +   ret = PTR_ERR(bpf_ret);
> >> +   else if (bpf_ret)
> >> +   tun->vnet_hash.flags &= ~TUN_VNET_HASH_RSS;
> >
> > Don't make one feature disable another.
> >
> > TUNSETSTEERINGEBPF and TUNSETVNETHASH are mutually exclusive
> > functions. If one is enabled the other call should fail, with EBUSY
> > for instance.
> >
> >> +   case TUNSETVNETHASH:
> >> +   len = sizeof(vnet_hash);
> >> +   if (copy_from_user(_hash, argp, len)) {
> >> +   ret = -EFAULT;
> >> +   break;
> >> +   }
> >> +
> >> +   if (((vnet_hash.flags & TUN_VNET_HASH_REPORT) &&
> >> +(tun->vnet_hdr_sz < sizeof(struct 
> >> virtio_net_hdr_v1_hash) ||
> >> + !tun_is_little_endian(tun))) ||
> >> +vnet_hash.indirection_table_mask >=
> >> +TUN_VNET_HASH_MAX_INDIRECTION_TABLE_LENGTH) {
> >> +   ret = -EINVAL;
> >> +   break;
> >> +   }
> >> +
> >> +   argp = (u8 __user *)argp + len;
> >> +   len = (vnet_hash.indirection_table_mask + 1) * 2;
> >> +   if (copy_from_user(vnet_hash_indirection_table, argp, 
> >> len)) {
> >> +   ret = -EFAULT;
> >> +   break;
> >> +   }
> >> +
> >> +   argp = (u8 __user *)argp + len;
> >> +   len = virtio_net_hash_key_length(vnet_hash.types);
> >> +
> >> +   if (copy_from_user(vnet_hash_key, argp, len)) {
> >> +   ret = -EFAULT;
> >> +   break;
> >> +   }
> >
> > Probably easier and less error-prone to define a fixed size control
> > struct with the max indirection table size.
>
> I made its size variable because the indirection table and key may grow
> in the future as I wrote above.
>
> >
> > Btw: please trim the CC: list considerably on future patches.
>
> I'll do so in the next version with the TUNSETSTEERINGEBPF change you
> proposed.

To be clear: please don't just resubmit with that one change.

The skb and cb issues are quite fundamental issues that need to be resolved.

I'd like to understand why adjusting the existing BPF feature for this
exact purpose cannot be amended to return the key it produced.

As you point out, the C flow dissector is insufficient. The BPF flow
dissector does not have this problem. The same 

Re: [RFC PATCH 5/7] tun: Introduce virtio-net hashing feature

2023-10-08 Thread Willem de Bruijn
On Sun, Oct 8, 2023 at 7:22 AM Akihiko Odaki  wrote:
>
> virtio-net have two usage of hashes: one is RSS and another is hash
> reporting. Conventionally the hash calculation was done by the VMM.
> However, computing the hash after the queue was chosen defeats the
> purpose of RSS.
>
> Another approach is to use eBPF steering program. This approach has
> another downside: it cannot report the calculated hash due to the
> restrictive nature of eBPF.
>
> Introduce the code to compute hashes to the kernel in order to overcome
> thse challenges. An alternative solution is to extend the eBPF steering
> program so that it will be able to report to the userspace, but it makes
> little sense to allow to implement different hashing algorithms with
> eBPF since the hash value reported by virtio-net is strictly defined by
> the specification.
>
> The hash value already stored in sk_buff is not used and computed
> independently since it may have been computed in a way not conformant
> with the specification.
>
> Signed-off-by: Akihiko Odaki 
> ---

> +static const struct tun_vnet_hash_cap tun_vnet_hash_cap = {
> +   .max_indirection_table_length =
> +   TUN_VNET_HASH_MAX_INDIRECTION_TABLE_LENGTH,
> +
> +   .types = VIRTIO_NET_SUPPORTED_HASH_TYPES
> +};

No need to have explicit capabilities exchange like this? Tun either
supports all or none.

> case TUNSETSTEERINGEBPF:
> -   ret = tun_set_ebpf(tun, >steering_prog, argp);
> +   bpf_ret = tun_set_ebpf(tun, >steering_prog, argp);
> +   if (IS_ERR(bpf_ret))
> +   ret = PTR_ERR(bpf_ret);
> +   else if (bpf_ret)
> +   tun->vnet_hash.flags &= ~TUN_VNET_HASH_RSS;

Don't make one feature disable another.

TUNSETSTEERINGEBPF and TUNSETVNETHASH are mutually exclusive
functions. If one is enabled the other call should fail, with EBUSY
for instance.

> +   case TUNSETVNETHASH:
> +   len = sizeof(vnet_hash);
> +   if (copy_from_user(_hash, argp, len)) {
> +   ret = -EFAULT;
> +   break;
> +   }
> +
> +   if (((vnet_hash.flags & TUN_VNET_HASH_REPORT) &&
> +(tun->vnet_hdr_sz < sizeof(struct 
> virtio_net_hdr_v1_hash) ||
> + !tun_is_little_endian(tun))) ||
> +vnet_hash.indirection_table_mask >=
> +TUN_VNET_HASH_MAX_INDIRECTION_TABLE_LENGTH) {
> +   ret = -EINVAL;
> +   break;
> +   }
> +
> +   argp = (u8 __user *)argp + len;
> +   len = (vnet_hash.indirection_table_mask + 1) * 2;
> +   if (copy_from_user(vnet_hash_indirection_table, argp, len)) {
> +   ret = -EFAULT;
> +   break;
> +   }
> +
> +   argp = (u8 __user *)argp + len;
> +   len = virtio_net_hash_key_length(vnet_hash.types);
> +
> +   if (copy_from_user(vnet_hash_key, argp, len)) {
> +   ret = -EFAULT;
> +   break;
> +   }

Probably easier and less error-prone to define a fixed size control
struct with the max indirection table size.

Btw: please trim the CC: list considerably on future patches.
___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization