Re: [Linux-kernel-mentees][PATCH 0/2] reorder members of structures in virtio_net for optimization

2020-10-02 Thread David Miller
From: Anant Thazhemadam 
Date: Wed, 30 Sep 2020 10:47:20 +0530

> The structures virtnet_info and receive_queue have byte holes in 
> the middle, and their members could use some rearranging 
> (order-of-declaration wise) in order to overcome this.
> 
> Rearranging the members helps to:
>   * eliminate the byte holes in the middle of the structures
>   * reduce the size of the structure (virtnet_info)
>   * keep more members in one cache line (as opposed to 
> unnecessarily crossing the cacheline boundary and spanning
> different cachelines)
> 
> The analysis was performed using pahole.
> 
> These patches may be applied in any order.

What effects do these changes have on performance?

The cache locality for various TX and RX paths could be affected.

I'm not applying these patches without some data on the performance
impact.

Thank you.

___
Virtualization mailing list
Virtualization@lists.linux-foundation.org
https://lists.linuxfoundation.org/mailman/listinfo/virtualization


Re: [PATCH net v2] virtio-net: don't disable guest csum when disable LRO

2020-09-29 Thread David Miller
From: xiangxia.m@gmail.com
Date: Tue, 29 Sep 2020 09:58:06 +0800

> From: Tonghao Zhang 
> 
> Open vSwitch and the Linux bridge disable LRO on an interface
> when the interface is added to them. Currently, when LRO is
> disabled, the virtio-net guest csum is disabled too. That hurts
> the forwarding performance.
> 
> Fixes: a02e8964eaf9 ("virtio-net: ethtool configurable LRO")
> Cc: Michael S. Tsirkin 
> Cc: Jason Wang 
> Cc: Willem de Bruijn 
> Signed-off-by: Tonghao Zhang 

Applied and queued up for -stable, thank you.


Re: [PATCH net-next] vhost: fix typo in error message

2020-09-01 Thread David Miller
From: Yunsheng Lin 
Date: Tue, 1 Sep 2020 10:39:09 +0800

> "enable" should be "disable" when the function name is
> vhost_disable_notify(), which does the disabling work.
> 
> Signed-off-by: Yunsheng Lin 

Applied to 'net'.


Re: [PATCH v3] virtio_vsock: Fix race condition in virtio_transport_recv_pkt

2020-05-30 Thread David Miller
From: Jia He 
Date: Sat, 30 May 2020 09:38:28 +0800

> When a client on the host tries to connect(SOCK_STREAM, O_NONBLOCK) to the
> server on the guest, there will be a panic on a ThunderX2 (armv8a server):
 ...
> The race condition is as follows:
> Task1                                Task2
> =====                                =====
> __sock_release                       virtio_transport_recv_pkt
>   __vsock_release                      vsock_find_bound_socket (found sk)
>     lock_sock_nested
>     vsock_remove_sock
>     sock_orphan
>       sk_set_socket(sk, NULL)
>     sk->sk_shutdown = SHUTDOWN_MASK
>     ...
>     release_sock
>                                        lock_sock
>                                        virtio_transport_recv_connecting
>                                          sk->sk_socket->state (panic!)
> 
> The root cause is that vsock_find_bound_socket can't hold the lock_sock,
> so there is a small race window between vsock_find_bound_socket() and
> lock_sock(). If __vsock_release() is running in another task,
> sk->sk_socket will be set to NULL inadvertently.
> 
> This patch fixes it by checking sk->sk_shutdown (suggested by Stefano)
> after lock_sock, since sk->sk_shutdown is set to SHUTDOWN_MASK under
> the protection of lock_sock_nested.
> 
> Signed-off-by: Jia He 
> Reviewed-by: Stefano Garzarella 

Applied and queued up for -stable, thank you.


Re: [PATCH v2] virtio_net: fix lockdep warning on 32 bit

2020-05-07 Thread David Miller
From: "Michael S. Tsirkin" 
Date: Thu, 7 May 2020 03:25:56 -0400

> When we fill up a receive VQ, try_fill_recv currently tries to count
> kicks using a 64 bit stats counter. It turns out that, on a 32 bit
> kernel, that counter uses a seqcount. Sequence counts are "lock"
> constructs where you need to make sure that writers are serialized.
> 
> In turn, this means that we mustn't run two try_fill_recv calls
> concurrently. Which of course we don't. We do run try_fill_recv
> sometimes from a softirq napi context, and sometimes from a fully
> preemptible context, but the latter always runs with napi disabled.
> 
> However, when it comes to the seqcount, lockdep is trying to enforce the
> rule that the same lock isn't accessed from preemptible and softirq
> context - it doesn't know about napi being enabled/disabled. This causes
> a false-positive warning:
> 
> WARNING: inconsistent lock state
> ...
> inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
> 
> As a workaround, shut down the warning by switching
> to u64_stats_update_begin_irqsave - which disables
> interrupts on 32 bit only and is a NOP on 64 bit.
> 
> Reported-by: Thomas Gleixner 
> Suggested-by: Eric Dumazet 
> Signed-off-by: Michael S. Tsirkin 

Applied and queued up for -stable, thanks Michael.


Re: [PATCH] virtio_net: fix lockdep warning on 32 bit

2020-05-06 Thread David Miller
From: "Michael S. Tsirkin" 
Date: Tue, 5 May 2020 20:01:31 -0400

> - u64_stats_update_end(&rq->stats.syncp);
> + u64_stats_update_end_irqrestore(&rq->stats.syncp);

Need to pass flags to this function.
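For context, the _irqsave variant of this kernel API returns the saved interrupt state, and the matching restore call needs it back as a second argument — the shape the v2 patch above adopted:

```c
/* Kernel-API fragment (not standalone code): the flags variable
 * carries the saved interrupt state between the two calls. */
unsigned long flags;

flags = u64_stats_update_begin_irqsave(&rq->stats.syncp);
/* ... update the 64-bit counters ... */
u64_stats_update_end_irqrestore(&rq->stats.syncp, flags);
```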


Re: [PATCH net v2 0/2] vsock/virtio: fixes about packet delivery to monitoring devices

2020-04-27 Thread David Miller
From: Stefano Garzarella 
Date: Fri, 24 Apr 2020 17:08:28 +0200

> During the review of v1, Stefan pointed out an issue introduced by
> that patch, where replies can appear in the packet capture before
> the transmitted packet.
> 
> While fixing my patch, reverting it and adding a new flag in
> 'struct virtio_vsock_pkt' (patch 2/2), I found that we already had
> that issue in vhost-vsock, so I fixed it (patch 1/2).
> 
> v1 -> v2:
> - reverted the v1 patch, to avoid that replies can appear in the
>   packet capture before the transmitted packet [Stefan]
> - added patch to fix packet delivering to monitoring devices in
>   vhost-vsock
> - added patch to check if the packet is already delivered to
>   monitoring devices
> 
> v1: 
> https://patchwork.ozlabs.org/project/netdev/patch/20200421092527.41651-1-sgarz...@redhat.com/

Series applied, thank you.


Re: [PATCH RESEND] ptp: add VMware virtual PTP clock driver

2020-03-05 Thread David Miller
From: Vivek Thampi 
Date: Fri, 28 Feb 2020 05:32:46 +

> Add a PTP clock driver called ptp_vmw, for guests running on VMware ESXi
> hypervisor. The driver attaches to a VMware virtual device called
> "precision clock" that provides a mechanism for querying host system time.
> Similar to existing virtual PTP clock drivers (e.g. ptp_kvm), ptp_vmw
> utilizes the kernel's PTP hardware clock API to implement a clock device
> that can be used as a reference in Chrony for synchronizing guest time with
> host.
> 
> The driver is only applicable to x86 guests running in VMware virtual
> machines with the precision clock virtual device present. It uses a
> VMware-specific hypercall mechanism to read time from the device.
> 
> Reviewed-by: Thomas Hellstrom 
> Signed-off-by: Vivek Thampi 

Thanks for your explanation of why this is a reasonable driver, makes sense.

Applied to net-next.


Re: [PATCH RESEND] ptp: add VMware virtual PTP clock driver

2020-03-04 Thread David Miller
From: Vivek Thampi 
Date: Fri, 28 Feb 2020 05:32:46 +

> Add a PTP clock driver called ptp_vmw, for guests running on VMware ESXi
> hypervisor. The driver attaches to a VMware virtual device called
> "precision clock" that provides a mechanism for querying host system time.
> Similar to existing virtual PTP clock drivers (e.g. ptp_kvm), ptp_vmw
> utilizes the kernel's PTP hardware clock API to implement a clock device
> that can be used as a reference in Chrony for synchronizing guest time with
> host.
> 
> The driver is only applicable to x86 guests running in VMware virtual
> machines with the precision clock virtual device present. It uses a
> VMware-specific hypercall mechanism to read time from the device.
> 
> Reviewed-by: Thomas Hellstrom 
> Signed-off-by: Vivek Thampi 
> ---
>  Based on feedback, resending patch to include a broader audience.

If it's just providing a read of an accurate timesource, I think it's kinda
pointless to provide a full PTP driver for it.


Re: [PATCH net] vsock: fix potential deadlock in transport->release()

2020-02-27 Thread David Miller
From: Stefano Garzarella 
Date: Wed, 26 Feb 2020 11:58:18 +0100

> Some transports (hyperv, virtio) acquire the sock lock during the
> .release() callback.
> 
> In the vsock_stream_connect() we call vsock_assign_transport(); if
> the socket was previously assigned to another transport, the
> vsk->transport->release() is called, but the sock lock is already
> held in the vsock_stream_connect(), causing a deadlock reported by
> syzbot:
 ...
> To avoid this issue, this patch removes the lock acquisition in the
> .release() callback of the hyperv and virtio transports, and holds
> the lock when calling vsk->transport->release() in the vsock core.
> 
> Reported-by: syzbot+731710996d79d0d58...@syzkaller.appspotmail.com
> Fixes: 408624af4c89 ("vsock: use local transport when it is loaded")
> Signed-off-by: Stefano Garzarella 

Applied, thank you.


Re: [PATCH v3] virtio: Work around frames incorrectly marked as gso

2020-02-26 Thread David Miller
From: anton.iva...@cambridgegreys.com
Date: Mon, 24 Feb 2020 13:25:50 +

> From: Anton Ivanov 
> 
> Some of the locally generated frames marked as GSO which
> arrive at virtio_net_hdr_from_skb() have no GSO_TYPE, no
> fragments (data_len = 0), and a length significantly shorter
> than the MTU (752 in my experiments).
> 
> This is observed on raw sockets reading off vEth interfaces
> in all 4.x and 5.x kernels. The frames are reported as
> invalid, while they are in fact gso-less frames.
> 
> The easiest way to reproduce is to connect a User Mode
> Linux instance to the host using the vector raw transport
> and a vEth interface. Vector raw uses recvmmsg/sendmmsg
> with virtio headers on af_packet sockets. When running iperf
> between the UML and the host, UML regularly complains about
> EINVAL return from recvmmsg.
> 
> This patch marks the vnet header as non-GSO instead of
> reporting it as invalid.
> 
> Signed-off-by: Anton Ivanov 

I don't feel comfortable applying this until we know where these
weird frames are coming from and how they are created.

Please respin this patch once you know this information and make
sure to mention it in the commit log.

Thank you.


Re: [PATCH net-next 1/3] vsock: add network namespace support

2020-01-20 Thread David Miller
From: Stefano Garzarella 
Date: Thu, 16 Jan 2020 18:24:26 +0100

> This patch adds 'netns' module param to enable this new feature
> (disabled by default), because it changes vsock's behavior with
> network namespaces and could break existing applications.

Sorry, no.

I wonder if you can even design a legitimate, reasonable, use case
where these netns changes could break things.

I am totally against adding a module parameter for this; it's
incredibly confusing for users and will create a test scenario
that is far less likely to be covered.



Re: [PATCH v2] virtio_net: CTRL_GUEST_OFFLOADS depends on CTRL_VQ

2020-01-09 Thread David Miller
From: "Michael S. Tsirkin" 
Date: Sun, 5 Jan 2020 08:22:07 -0500

> The only way for guest to control offloads (as enabled by
> VIRTIO_NET_F_CTRL_GUEST_OFFLOADS) is by sending commands
> through CTRL_VQ. So it does not make sense to
> acknowledge VIRTIO_NET_F_CTRL_GUEST_OFFLOADS without
> VIRTIO_NET_F_CTRL_VQ.
> 
> The spec does not outlaw devices with such a configuration, so we have
> to support it. Simply clear VIRTIO_NET_F_CTRL_GUEST_OFFLOADS.
> Note that Linux is still crashing if it tries to
> change the offloads when there's no control vq.
> That needs to be fixed by another patch.
> 
> Reported-by: Alistair Delva 
> Reported-by: Willem de Bruijn 
> Fixes: 3f93522ffab2 ("virtio-net: switch off offloads on demand if possible on XDP set")
> Signed-off-by: Michael S. Tsirkin 
> ---
> 
> Same patch as v1 but update documentation so it's clear it's not
> enough to fix the crash.

Where are we with this patch?  There seems to still be some unresolved
discussion about how we should actually handle this case.

Thanks.


Re: [PATCH net-next v3 00/11] VSOCK: add vsock_test test suite

2019-12-20 Thread David Miller
From: Stefano Garzarella 
Date: Wed, 18 Dec 2019 19:06:57 +0100

> The vsock_diag.ko module already has a test suite but the core AF_VSOCK
> functionality has no tests. This patch series adds several test cases that
> exercise AF_VSOCK SOCK_STREAM socket semantics (send/recv, connect/accept,
> half-closed connections, simultaneous connections).
> 
> The v1 of this series was originally sent by Stefan.
> 
> v3:
> - Patch 6:
>   * check the byte received in recv_byte()
>   * use send(2)/recv(2) instead of write(2)/read(2) to also test flags
>     (e.g. MSG_PEEK)
> - Patch 8:
>   * removed unnecessary control_expectln("CLOSED") [Stefan]
> - removed patches 9, 10, 11 added in v2
> - new Patch 9 adds parameters to list and skip tests (e.g. useful for vmci,
>   which doesn't support half-closed sockets in the host)
> - new Patch 10 prints a list of options in the help
> - new Patch 11 tests the MSG_PEEK flag of recv(2)
> 
> v2: https://patchwork.ozlabs.org/cover/1140538/
> v1: https://patchwork.ozlabs.org/cover/847998/

Series applied, thanks.


Re: [PATCH net 0/2] vsock/virtio: fix null-pointer dereference and related precautions

2019-12-16 Thread David Miller
From: Stefano Garzarella 
Date: Fri, 13 Dec 2019 19:47:59 +0100

> This series mainly solves a possible null-pointer dereference in
> virtio_transport_recv_listen() introduced with the multi-transport
> support [PATCH 1].
> 
> PATCH 2 adds a WARN_ON check for the same potential issue, and
> returns an error from virtio_transport_send_pkt_info() to avoid
> crashing the kernel.

Series applied, thanks.


Re: [PATCH net-next v12 0/3] netdev: ndo_tx_timeout cleanup

2019-12-12 Thread David Miller
From: "Michael S. Tsirkin" 
Date: Tue, 10 Dec 2019 09:23:47 -0500

> Yet another forward declaration I missed. Hopefully the last one ...
> 
> A bunch of drivers want to know which tx queue triggered a timeout,
> and virtio wants to do the same.
> We actually have the info to hand, let's just pass it on to drivers.
> Note: tested with an experimental virtio patch by Julio.
> That patch itself isn't ready yet though, so not included.
> Other drivers compiled only.

Series applied, will push out after build testing completes.

Thanks.


Re: [PATCH] vhost/vsock: accept only packets with the right dst_cid

2019-12-12 Thread David Miller
From: Stefano Garzarella 
Date: Thu, 12 Dec 2019 14:14:53 +0100

> On Thu, Dec 12, 2019 at 07:56:26AM -0500, Michael S. Tsirkin wrote:
>> On Thu, Dec 12, 2019 at 01:36:24PM +0100, Stefano Garzarella wrote:
>> I'd say it's better to backport to all stable releases where it applies,
>> but yes it's only a security issue in 5.4.  Dave could you forward pls?
> 
> Yes, I agree with you.
> 
> @Dave let me know if I should do it.

I've queued it up for -stable.


Re: [PATCH net-next v2 0/6] vsock: add local transport support

2019-12-11 Thread David Miller
From: Stefano Garzarella 
Date: Tue, 10 Dec 2019 11:43:01 +0100

> v2:
>  - style fixes [Dave]
>  - removed RCU sync and made 'the_vsock_loopback' a global
>    static variable [Stefan]
>  - use G2H transport when local transport is not loaded and remote cid
>is VMADDR_CID_LOCAL [Stefan]
>  - rebased on net-next
> 
> v1: https://patchwork.kernel.org/cover/11251735/
> 
> This series introduces a new transport (vsock_loopback) to handle
> local communication.
...

Series applied, thanks.


Re: [PATCH net-next v11 0/3] netdev: ndo_tx_timeout cleanup

2019-12-10 Thread David Miller
From: "Michael S. Tsirkin" 
Date: Tue, 10 Dec 2019 08:08:58 -0500

> Sorry about the churn, v10 was based on net - not on net-next
> by mistake.

Ugh sorry, I just saw this.  vger is sending postings massively out of
order today.

Ignore my previous reply to #1 :-)


Re: [PATCH net-next v10 1/3] netdev: pass the stuck queue to the timeout handler

2019-12-10 Thread David Miller


Michael, please provide a proper introductory posting for your patch series
just like everyone else does.

Not only does it help people understand at a high level what the patch
series is doing, how it is doing it, and why it is doing it that way.  It
also gives me a single email to reply to when I apply your patch series.

Therefore, please respin this properly.

Thank you.


Re: [PATCH net-next v9 0/3] netdev: ndo_tx_timeout cleanup

2019-12-09 Thread David Miller
From: "Michael S. Tsirkin" 
Date: Mon, 9 Dec 2019 11:28:57 -0500

> A bunch of drivers want to know which tx queue triggered a timeout,
> and virtio wants to do the same.
> We actually have the info to hand, let's just pass it on to drivers.
> Note: tested with an experimental virtio patch by Julio.
> That patch itself isn't ready yet though, so not included.
> Other drivers compiled only.

Besides the "int" --> "unsigned int" typing issue, I never saw patch #2
neither on the mailing list nor directly sent to me.


Re: [PATCH] vhost/vsock: accept only packets with the right dst_cid

2019-12-07 Thread David Miller
From: Stefano Garzarella 
Date: Fri,  6 Dec 2019 15:39:12 +0100

> When we receive a new packet from the guest, we check if the
> src_cid is correct, but we forgot to check the dst_cid.
> 
> The host should accept only packets where dst_cid is
> equal to the host CID.
> 
> Signed-off-by: Stefano Garzarella 

Applied.


Re: [net-next V3 0/2] drivers: net: virtio_net: implement

2019-11-27 Thread David Miller
From: "Michael S. Tsirkin" 
Date: Wed, 27 Nov 2019 06:38:35 -0500

> On Tue, Nov 26, 2019 at 02:06:30PM -0800, David Miller wrote:
>> 
>> net-next is closed
> 
> Could you merge this early when net-next reopens though?
> This way I don't need to keep adding drivers to update.

It simply needs to be reposted as soon as net-next opens back up.

I fail to understand what special treatment you want given to
this change; it doesn't make any sense.  We have a process for
doing this: it's simple, it's straightforward, and it is fair to
everyone.

Thanks.


Re: [net-next V3 0/2] drivers: net: virtio_net: implement

2019-11-26 Thread David Miller


net-next is closed


Re: [PATCH] MAINTAINERS: Add myself as maintainer of virtio-vsock

2019-11-22 Thread David Miller
From: Stefano Garzarella 
Date: Fri, 22 Nov 2019 11:20:10 +0100

> Since I'm actively working on vsock and the virtio/vhost transports,
> Stefan suggested that I help him maintain it.
> 
> Signed-off-by: Stefano Garzarella 

Applied.


Re: [PATCH] vsock/virtio: fix sock refcnt holding during the shutdown

2019-11-09 Thread David Miller
From: Stefano Garzarella 
Date: Fri,  8 Nov 2019 17:08:50 +0100

> The "42f5cda5eaf4" commit rightly set SOCK_DONE on peer shutdown,
> but there is an issue if we receive the SHUTDOWN(RDWR) while the
> virtio_transport_close_timeout() is scheduled.
> In this case, when the timeout fires, the SOCK_DONE is already
> set and the virtio_transport_close_timeout() will not call
> virtio_transport_reset() and virtio_transport_do_close().
> This causes both sockets to remain open and never be released,
> preventing the unloading of the [virtio|vhost]_transport modules.
> 
> This patch fixes the issue by calling virtio_transport_reset() and
> virtio_transport_do_close() when we receive the SHUTDOWN(RDWR)
> and there is nothing left to read.
> 
> Fixes: 42f5cda5eaf4 ("vsock/virtio: set SOCK_DONE on peer shutdown")
> Cc: Stephen Barber 
> Signed-off-by: Stefano Garzarella 

Applied and queued up for -stable, thanks.


Re: [PATCH net 0/2] vsock/virtio: make the credit mechanism more robust

2019-10-18 Thread David Miller
From: Stefano Garzarella 
Date: Thu, 17 Oct 2019 14:44:01 +0200

> This series makes the credit mechanism implemented in the
> virtio-vsock devices more robust.
> Patch 1 sends an update to the remote peer when buf_alloc
> changes.
> Patch 2 prevents a malicious peer (especially the guest) from
> consuming all the memory of the other peer, by discarding packets
> when the available credit is not respected.

Series applied, thanks.


Re: [PATCH] vsock/virtio: remove unused 'work' field from 'struct virtio_vsock_pkt'

2019-10-15 Thread David Miller
From: Stefano Garzarella 
Date: Tue, 15 Oct 2019 17:00:51 +0200

> The 'work' field was introduced with commit 06a8fc78367d0
> ("VSOCK: Introduce virtio_vsock_common.ko")
> but it is never used in the code, so we can remove it to save
> memory allocated in the per-packet 'struct virtio_vsock_pkt'.
> 
> Suggested-by: Michael S. Tsirkin 
> Signed-off-by: Stefano Garzarella 

Michael, will you take this?  I assume so...

Thanks.


Re: [PATCH net-next] net, uapi: fix -Wpointer-arith warnings

2019-10-04 Thread David Miller
From: Alexey Dobriyan 
Date: Thu, 3 Oct 2019 23:29:24 +0300

> Add casts to fix these warnings:
> 
> ./usr/include/linux/netfilter_arp/arp_tables.h:200:19: error: pointer of type 'void *' used in arithmetic [-Werror=pointer-arith]
> ./usr/include/linux/netfilter_bridge/ebtables.h:197:19: error: pointer of type 'void *' used in arithmetic [-Werror=pointer-arith]
> ./usr/include/linux/netfilter_ipv4/ip_tables.h:223:19: error: pointer of type 'void *' used in arithmetic [-Werror=pointer-arith]
> ./usr/include/linux/netfilter_ipv6/ip6_tables.h:263:19: error: pointer of type 'void *' used in arithmetic [-Werror=pointer-arith]
> ./usr/include/linux/tipc_config.h:310:28: error: pointer of type 'void *' used in arithmetic [-Werror=pointer-arith]
> ./usr/include/linux/tipc_config.h:410:24: error: pointer of type 'void *' used in arithmetic [-Werror=pointer-arith]
> ./usr/include/linux/virtio_ring.h:170:16: error: pointer of type 'void *' used in arithmetic [-Werror=pointer-arith]
> 
> These are probably theoretical, but the kernel doesn't control
> compiler flags in userspace.
> 
> Signed-off-by: Alexey Dobriyan 

Applied.


Re: [PATCH net-next v2] vsock/virtio: add support for MSG_PEEK

2019-10-01 Thread David Miller
From: Matias Ezequiel Vara Larsen 
Date: Mon, 30 Sep 2019 18:25:23 +

> This patch adds support for MSG_PEEK. In such a case, packets are not
> removed from the rx_queue and credit updates are not sent.
> 
> Signed-off-by: Matias Ezequiel Vara Larsen 
> Reviewed-by: Stefano Garzarella 
> Tested-by: Stefano Garzarella 

Applied.


Re: [PATCH net v3] vsock: Fix a lockdep warning in __vsock_release()

2019-10-01 Thread David Miller
From: Dexuan Cui 
Date: Mon, 30 Sep 2019 18:43:50 +

> Lockdep is unhappy if two locks from the same class are held.
> 
> Fix the below warning for hyperv and virtio sockets (vmci socket code
> doesn't have the issue) by using lock_sock_nested() when __vsock_release()
> is called recursively:
 ...
> Tested-by: Stefano Garzarella 
> Signed-off-by: Dexuan Cui 

Applied, thanks.


Re: [PATCH v2] vsock/virtio: add support for MSG_PEEK

2019-09-27 Thread David Miller


This is net-next material.


Re: [PATCH 0/2] Revert and rework on the metadata acceleration

2019-09-06 Thread David Miller
From: Jason Wang 
Date: Fri, 6 Sep 2019 18:02:35 +0800

> On 2019/9/5 at 9:59 PM, Jason Gunthorpe wrote:
>> I think you should apply the revert this cycle and rebase the other
>> patch for next..
>>
>> Jason
> 
> Yes, the plan is to revert in this release cycle.

Then you should resend patch #1 all by itself, targeting 'net'.


Re: [PATCH net-next] vsock/virtio: a better comment on credit update

2019-09-05 Thread David Miller
From: "Michael S. Tsirkin" 
Date: Tue, 3 Sep 2019 03:38:16 -0400

> The comment we have is just repeating what the code does.
> Include the *reason* for the condition instead.
> 
> Cc: Stefano Garzarella 
> Signed-off-by: Michael S. Tsirkin 

Applied.


Re: [PATCH V5 0/9] Fixes for vhost metadata acceleration

2019-08-11 Thread David Miller
From: Jason Wang 
Date: Mon, 12 Aug 2019 10:44:51 +0800

> On 2019/8/11 at 1:52 AM, Michael S. Tsirkin wrote:
>> At this point how about we revert
>> 7f466032dc9e5a61217f22ea34b2df932786bbfc
>> for this release, and then re-apply a corrected version
>> for the next one?
> 
> If possible, considering we've actually disabled the feature, how
> about just queuing those patches for the next release?

I'm tossing this series while you and Michael decide how to move forward.


Re: [PATCH V4 0/9] Fixes for metadata acceleration

2019-08-08 Thread David Miller
From: Jason Wang 
Date: Wed,  7 Aug 2019 03:06:08 -0400

> This series tries to fix several issues introduced by the metadata
> acceleration series. Please review.
 ...

My impression is that patch #7 will be changed to use spinlocks so there
will be a v5.


Re: [PATCH v5 12/29] compat_ioctl: move drivers to compat_ptr_ioctl

2019-07-30 Thread David Miller
From: Arnd Bergmann 
Date: Tue, 30 Jul 2019 21:50:28 +0200

> Each of these drivers has a copy of the same trivial helper function to
> convert the pointer argument and then call the native ioctl handler.
> 
> We now have a generic implementation of that, so use it.
> 
> Acked-by: Greg Kroah-Hartman 
> Acked-by: Michael S. Tsirkin 
> Reviewed-by: Jarkko Sakkinen 
> Reviewed-by: Jason Gunthorpe 
> Reviewed-by: Jiri Kosina 
> Reviewed-by: Stefan Hajnoczi 
> Signed-off-by: Arnd Bergmann 

I assume this has to go via your series, thus:

Acked-by: David S. Miller 


Re: [PATCH v3 0/3] vsock/virtio: several fixes in the .probe() and .remove()

2019-07-08 Thread David Miller
From: Stefano Garzarella 
Date: Fri,  5 Jul 2019 13:04:51 +0200

> During the review of "[PATCH] vsock/virtio: Initialize core virtio vsock
> before registering the driver", Stefan pointed out some possible issues
> in the .probe() and .remove() callbacks of the virtio-vsock driver.
 ...

Series applied to net-next, thanks.


Re: [PATCH net-next] vhost_net: disable zerocopy by default

2019-06-17 Thread David Miller
From: Jason Wang 
Date: Mon, 17 Jun 2019 05:20:54 -0400

> Vhost_net was known to suffer from HOL[1] issues which are not easy
> to fix. Several downstreams disable the feature by default. What's
> more, the datapath was split, and the datacopy path recently gained
> batching and XDP support, which makes it faster than the zerocopy
> path for small packet transmission.
> 
> It looks to me that disabling zerocopy by default is more
> appropriate. It could be enabled by default again in the future if
> we fix the above issues.
> 
> [1] https://patchwork.kernel.org/patch/3787671/
> 
> Signed-off-by: Jason Wang 

Applied, thanks Jason.


Re: [PATCH net-next] vsock: correct removal of socket from the list

2019-06-14 Thread David Miller
From: Sunil Muthuswamy 
Date: Thu, 13 Jun 2019 03:52:27 +

> The current vsock code for removing a socket from the list is both
> racy and inefficient. It takes the lock, checks whether the socket
> is in the list, drops the lock, and, if the socket was on the list,
> deletes it from the list. This is racy because, once the lock is
> dropped after the presence check, that condition can no longer be
> relied upon for any decision. It is also inefficient because, if
> the socket is present in the list, it takes the lock twice.
> 
> Signed-off-by: Sunil Muthuswamy 

Applied.


Re: [PATCH v3 2/5] vsock/virtio: fix locking for fwd_cnt and buf_alloc

2019-06-02 Thread David Miller
From: Stefano Garzarella 
Date: Fri, 31 May 2019 15:39:51 +0200

> @@ -434,7 +434,9 @@ void virtio_transport_set_buffer_size(struct vsock_sock *vsk, u64 val)
>  	if (val > vvs->buf_size_max)
>  		vvs->buf_size_max = val;
>  	vvs->buf_size = val;
> +	spin_lock_bh(&vvs->rx_lock);
>  	vvs->buf_alloc = val;
> +	spin_unlock_bh(&vvs->rx_lock);

This locking doesn't do anything other than to strongly order the
buf_size store to occur before the buf_alloc one.

If you need a memory barrier, use one.


Re: [PATCH net-next 0/6] vhost: accelerate metadata access

2019-05-30 Thread David Miller
From: "Michael S. Tsirkin" 
Date: Thu, 30 May 2019 14:13:28 -0400

> On Thu, May 30, 2019 at 11:07:30AM -0700, David Miller wrote:
>> From: Jason Wang 
>> Date: Fri, 24 May 2019 04:12:12 -0400
>> 
>> > This series tries to access virtqueue metadata through kernel virtual
>> > address instead of copy_user() friends since they had too much
>> > overheads like checks, spec barriers or even hardware feature
>> > toggling like SMAP. This is done by setting up kernel addresses through
>> > direct mapping and cooperating with VM management via MMU notifiers.
>> > 
>> > Test shows about 23% improvement on TX PPS. TCP_STREAM doesn't see
>> > obvious improvement.
>> 
>> I'm still waiting for some review from mst.
>> 
>> If I don't see any review soon I will just wipe these changes from
>> patchwork as it serves no purpose to just let them rot there.
>> 
>> Thank you.
> 
> I thought we agreed I'm merging this through my tree, not net-next.
> So you can safely wipe it.

Aha, I didn't catch that, thanks!


Re: [PATCH net-next 0/6] vhost: accelerate metadata access

2019-05-30 Thread David Miller
From: Jason Wang 
Date: Fri, 24 May 2019 04:12:12 -0400

> This series tries to access virtqueue metadata through kernel virtual
> address instead of copy_user() friends since they had too much
> overheads like checks, spec barriers or even hardware feature
> toggling like SMAP. This is done by setting up kernel addresses through
> direct mapping and cooperating with VM management via MMU notifiers.
> 
> Test shows about 23% improvement on TX PPS. TCP_STREAM doesn't see
> obvious improvement.

I'm still waiting for some review from mst.

If I don't see any review soon I will just wipe these changes from
patchwork as it serves no purpose to just let them rot there.

Thank you.


Re: [PATCH 1/4] vsock/virtio: fix locking around 'the_virtio_vsock'

2019-05-29 Thread David Miller
From: Stefano Garzarella 
Date: Tue, 28 May 2019 12:56:20 +0200

> @@ -68,7 +68,13 @@ struct virtio_vsock {
>  
>  static struct virtio_vsock *virtio_vsock_get(void)
>  {
> - return the_virtio_vsock;
> + struct virtio_vsock *vsock;
> +
> +	mutex_lock(&the_virtio_vsock_mutex);
> +	vsock = the_virtio_vsock;
> +	mutex_unlock(&the_virtio_vsock_mutex);
> +
> + return vsock;

This doesn't do anything as far as I can tell.

No matter what, you will either get the value before it's changed or
after it's changed.

Since you should never publish the pointer by assigning it until the
object is fully initialized, this can never be a problem even without
the mutex being there.

Even if you sampled the the_virtio_vsock value right before it's being
set to NULL by the remove function, that can still happen with the
mutex held too.

This function is also terribly named btw, it implies that a reference
count is being taken.  But that's not what this function does, it
just returns the pointer value as-is.


Re: [PATCH V2 0/4] Prevent vhost kthread from hogging CPU

2019-05-18 Thread David Miller
From: Jason Wang 
Date: Fri, 17 May 2019 00:29:48 -0400

> Hi:
> 
> This series tries to prevent guest-triggerable CPU hogging through the
> vhost kthread. This is done by introducing and checking a weight
> after each request. The patch has been tested with reproducers for
> vsock and virtio-net. Only a compile test was done for vhost-scsi.
> 
> Please review.
> 
> This addresses CVE-2019-3900.
> 
> Changs from V1:
> - fix use-after-free in vsock patch

I am assuming that not only will mst review this, it will also go via
his tree rather than mine.

Thanks.


Re: [PATCH RESEND] vsock/virtio: Initialize core virtio vsock before registering the driver

2019-05-18 Thread David Miller
From: "Jorge E. Moreira" 
Date: Thu, 16 May 2019 13:51:07 -0700

> Avoid a race in which static variables in net/vmw_vsock/af_vsock.c are
> accessed (while handling interrupts) before they are initialized.
 ...
> Fixes: 22b5c0b63f32 ("vsock/virtio: fix kernel panic after device hot-unplug")
> Cc: Stefan Hajnoczi 
> Cc: Stefano Garzarella 
> Cc: "David S. Miller" 
> Cc: k...@vger.kernel.org
> Cc: virtualization@lists.linux-foundation.org
> Cc: net...@vger.kernel.org
> Cc: kernel-t...@android.com
> Cc: sta...@vger.kernel.org [4.9+]
> Signed-off-by: Jorge E. Moreira 
> Reviewed-by: Stefano Garzarella 
> Reviewed-by: Stefan Hajnoczi 

Applied and queued up for -stable, thanks.


Re: [PATCH v3] vsock/virtio: free packets during the socket release

2019-05-17 Thread David Miller
From: Stefano Garzarella 
Date: Fri, 17 May 2019 16:45:43 +0200

> When the socket is released, we should free all packets
> queued in the per-socket list in order to avoid a memory
> leak.
> 
> Signed-off-by: Stefano Garzarella 

Applied and queued up for -stable.


Re: [PATCH net] vhost: don't use kmap() to log dirty pages

2019-05-13 Thread David Miller
From: Jason Wang 
Date: Mon, 13 May 2019 01:27:45 -0400

> Vhost logs dirty pages directly to a userspace bitmap through GUP and
> kmap_atomic() since the kernel doesn't have a set_bit_to_user()
> helper. This will cause issues for arches that have virtually tagged
> caches. The way to fix this is to keep using the userspace virtual
> address. Fortunately, futex has arch_futex_atomic_op_inuser(), which
> could be used for setting a bit to user.
> 
> Note there are several cases where the futex helper can fail, e.g. a
> page fault or an arch that doesn't have the support. For those cases, a
> simplified get_user()/put_user() pair protected by a global mutex is
> provided as a fallback. The fallback may lead to false positives where
> userspace sees more dirty pages.
> 
> Cc: Christoph Hellwig 
> Cc: James Bottomley 
> Cc: Andrea Arcangeli 
> Cc: Thomas Gleixner 
> Cc: Ingo Molnar 
> Cc: Peter Zijlstra 
> Cc: Darren Hart 
> Fixes: 3a4d5c94e9593 ("vhost_net: a kernel-level virtio server")
> Signed-off-by: Jason Wang 

I want to see a review from Michael for this change before applying.


Re: [PATCH v2 2/8] vsock/virtio: free packets during the socket release

2019-05-10 Thread David Miller
From: Stefano Garzarella 
Date: Fri, 10 May 2019 14:58:37 +0200

> @@ -827,12 +827,20 @@ static bool virtio_transport_close(struct vsock_sock *vsk)
>  
>  void virtio_transport_release(struct vsock_sock *vsk)
>  {
> +	struct virtio_vsock_sock *vvs = vsk->trans;
> +	struct virtio_vsock_buf *buf;
>  	struct sock *sk = &vsk->sk;
>  	bool remove_sock = true;
>  
>  	lock_sock(sk);
>  	if (sk->sk_type == SOCK_STREAM)
>  		remove_sock = virtio_transport_close(vsk);
> +	while (!list_empty(&vvs->rx_queue)) {
> +		buf = list_first_entry(&vvs->rx_queue,
> +				       struct virtio_vsock_buf, list);

Please use list_for_each_entry_safe().


Re: [PATCH net] vhost: reject zero size iova range

2019-04-10 Thread David Miller
From: Jason Wang 
Date: Tue,  9 Apr 2019 12:10:25 +0800

> We used to accept a zero size iova range, which will lead to an
> infinite loop in translate_desc(). Fix this by failing the request in
> this case.
> 
> Reported-by: syzbot+d21e6e297322a900c...@syzkaller.appspotmail.com
> Fixes: 6b1e6cc7 ("vhost: new device IOTLB API")
> Signed-off-by: Jason Wang 

Applied and queued up for -stable.


Re: [PATCH net v8] failover: allow name change on IFF_UP slave interfaces

2019-04-10 Thread David Miller
From: Si-Wei Liu 
Date: Mon,  8 Apr 2019 19:45:27 -0400

> When a netdev appears through hot plug then gets enslaved by a failover
> master that is already up and running, the slave will be opened
> right away after getting enslaved. Today there's a race that userspace
> (udev) may fail to rename the slave if the kernel (net_failover)
> opens the slave earlier than when the userspace rename happens.
> Unlike bond or team, the primary slave of failover can't be renamed by
> userspace ahead of time, since the kernel initiated auto-enslavement is
> unable to, or rather, is never meant to be synchronized with the rename
> request from userspace.
> 
> As the failover slave interfaces are not designed to be operated
> directly by userspace apps: IP configuration, filter rules with
> regard to network traffic passing and etc., should all be done on master
> interface. In general, userspace apps only care about the
> name of master interface, while slave names are less important as long
> as admin users can see reliable names that may carry
> other information describing the netdev. E.g., they can infer that
> "ens3nsby" is a standby slave of "ens3", while for a
> name like "eth0" they can't tell which master it belongs to.
> 
> Historically the name of IFF_UP interface can't be changed because
> there might be admin script or management software that is already
> relying on such behavior and assumes that the slave name can't be
> changed once UP. But failover is special: with the in-kernel
> auto-enslavement mechanism, the userspace expectation for device
> enumeration and bring-up order is already broken. Previously initramfs
> and various userspace config tools were modified to bypass failover
> slaves because of auto-enslavement and duplicate MAC address. Similarly,
> in case that users care about seeing reliable slave name, the new type
> of failover slaves needs to be taken care of specifically in userspace
> anyway.
> 
> It's less risky to lift up the rename restriction on failover slave
> which is already UP. Although it's possible this change may potentially
> break userspace component (most likely configuration scripts or
> management software) that assumes slave name can't be changed while
> UP, it's relatively a limited and controllable set among all userspace
> components, which can be fixed specifically to listen for the rename
> events on failover slaves. Userspace component interacting with slaves
> is expected to be changed to operate on failover master interface
> instead, as the failover slave is dynamic in nature which may come and
> go at any point.  The goal is to make the role of failover slaves less
> relevant, and userspace components should only deal with failover master
> in the long run.
> 
> Fixes: 30c8bd5aa8b2 ("net: Introduce generic failover module")
> Signed-off-by: Si-Wei Liu 
> Reviewed-by: Liran Alon 

Applied and queued up for -stable.


Re: [PATCH] virtio-net: Remove inclusion of pci.h

2019-04-06 Thread David Miller
From: Yuval Shaia 
Date: Wed,  3 Apr 2019 11:20:45 +0300

> This header is not in use - remove it.
> 
> Signed-off-by: Yuval Shaia 

Applied to net-next


Re: [PATCH] virtio-net: Fix some minor formatting errors

2019-04-06 Thread David Miller
From: Yuval Shaia 
Date: Wed,  3 Apr 2019 12:10:13 +0300

> Signed-off-by: Yuval Shaia 

Applied to net-next


Re: [PATCH] virtio_net: remove hcpu from virtnet_clean_affinity

2019-03-18 Thread David Miller
From: Peter Xu 
Date: Mon, 18 Mar 2019 14:56:06 +0800

> The variable is never used.
> 
> CC: Michael S. Tsirkin 
> CC: Jason Wang 
> CC: virtualization@lists.linux-foundation.org
> CC: net...@vger.kernel.org
> CC: linux-ker...@vger.kernel.org
> Signed-off-by: Peter Xu 

This looks rather uncontroversial and straightforward so I've applied
this to net-next.

Thanks.


Re: [RFC PATCH V2 0/5] vhost: accelerate metadata access through vmap()

2019-03-11 Thread David Miller
From: "Michael S. Tsirkin" 
Date: Mon, 11 Mar 2019 09:59:28 -0400

> On Mon, Mar 11, 2019 at 03:13:17PM +0800, Jason Wang wrote:
>> 
>> On 2019/3/8 10:12 PM, Christoph Hellwig wrote:
>> > On Wed, Mar 06, 2019 at 02:18:07AM -0500, Jason Wang wrote:
>> > > This series tries to access virtqueue metadata through kernel virtual
>> > > address instead of copy_user() friends since they had too much
>> > > overheads like checks, spec barriers or even hardware feature
>> > > toggling. This is done by setting up kernel addresses through vmap() and
>> > > registering an MMU notifier for invalidation.
>> > > 
>> > > Test shows about 24% improvement on TX PPS. TCP_STREAM doesn't see
>> > > obvious improvement.
>> > How is this going to work for CPUs with virtually tagged caches?
>> 
>> 
>> Anything different that you worry?
> 
> If caches have virtual tags then kernel and userspace view of memory
> might not be automatically in sync if they access memory
> through different virtual addresses. You need to do things like
> flush_cache_page, probably multiple times.

"flush_dcache_page()"


Re: [RFC PATCH net] failover: allow name change on IFF_UP slave interfaces

2019-03-04 Thread David Miller


Why did you send this three times?

What's different in each of these copies?



Re: [PATCH net V2] vhost: correctly check the return value of translate_desc() in log_used()

2019-02-19 Thread David Miller
From: Jason Wang 
Date: Tue, 19 Feb 2019 14:53:44 +0800

> On failure, translate_desc() returns a negative value; otherwise it
> returns the number of iovs. So we should fail when the return value is
> negative instead of blindly checking against zero.
> 
> Detected by CoverityScan, CID# 1442593:  Control flow issues  (DEADCODE)
> 
> Fixes: cc5e71075947 ("vhost: log dirty page correctly")
> Acked-by: Michael S. Tsirkin 
> Reported-by: Stephen Hemminger 
> Signed-off-by: Jason Wang 

Applied and queued up for -stable, thanks.


Re: [PATCH net] vhost: correctly check the return value of translate_desc() in log_used()

2019-02-15 Thread David Miller
From: Jason Wang 
Date: Fri, 15 Feb 2019 15:53:24 +0800

> On failure, translate_desc() returns a negative value; otherwise it
> returns the number of iovs. So we should fail when the return value is
> negative instead of blindly checking against zero.
> 
> Reported-by: Stephen Hemminger 
> Fixes: cc5e71075947 ("vhost: log dirty page correctly")
> Signed-off-by: Jason Wang 

Jason, please put the Fixes tag first.

Thank you.


Re: [PATCH net] virtio_net: Account for tx bytes and packets on sending xdp_frames

2019-02-04 Thread David Miller
From: Toshiaki Makita 
Date: Thu, 31 Jan 2019 20:40:30 +0900

> Previously virtnet_xdp_xmit() did not account for device tx counters,
> which caused confusion.
> To be consistent with SKBs, account them on freeing xdp_frames.
> 
> Reported-by: David Ahern 
> Signed-off-by: Toshiaki Makita 

Applied, thank you.


Re: [PATCH v3 0/2] vsock/virtio: fix issues on device hot-unplug

2019-02-04 Thread David Miller
From: Stefano Garzarella 
Date: Fri,  1 Feb 2019 12:42:05 +0100

> These patches try to handle the hot-unplug of vsock virtio transport device in
> a proper way.
> 
> Maybe moving the vsock_core_init()/vsock_core_exit() functions into the
> module_init and module_exit of the vsock_virtio_transport module isn't
> the best way, but the architecture of vsock_core forces us to this
> approach for now.
> 
> The vsock_core proto_ops expect a valid pointer to the transport device, so we
> can't call vsock_core_exit() until there are open sockets.
> 
> v2 -> v3:
>  - Rebased on master
> 
> v1 -> v2:
>  - Fixed commit message of patch 1.
>  - Added Reviewed-by, Acked-by tags by Stefan

Series applied.


Re: [PATCH v2 net 0/7] virtio_net: Fix problems around XDP tx and napi_tx

2019-01-31 Thread David Miller
From: Toshiaki Makita 
Date: Tue, 29 Jan 2019 09:45:52 +0900

> While looking into how to account standard tx counters on XDP tx
> processing, I found several bugs around XDP tx and napi_tx.
> 
> Patch1: Fix oops on error path. Patch2 depends on this.
> Patch2: Fix memory corruption on freeing xdp_frames with napi_tx enabled.
> Patch3: Minor fix patch5 depends on.
> Patch4: Fix memory corruption on processing xdp_frames when XDP is disabled.
>   Also patch5 depends on this.
> Patch5: Fix memory corruption on processing xdp_frames while XDP is being
>   disabled.
> Patch6: Minor fix patch7 depends on.
> Patch7: Fix memory corruption on freeing sk_buff or xdp_frames when a normal
>   queue is reused for XDP and vice versa.
> 
> v2:
> - patch5: Make rcu_assign_pointer/synchronize_net conditional instead of
>   _virtnet_set_queues.
> - patch7: Use napi_consume_skb() instead of dev_consume_skb_any()
> 
> Signed-off-by: Toshiaki Makita 

Series applied.


Re: [PATCH net] vhost: fix OOB in get_rx_bufs()

2019-01-31 Thread David Miller
From: "Michael S. Tsirkin" 
Date: Tue, 29 Jan 2019 20:36:31 -0500

> If it helps I can include most virtio stuff in my pull requests instead.
> Or if that can't work since there's too often a dependency on net-next,
> maybe Jason wants to create a tree and send pull requests to you.  Let
> us know if that will help, and which of the options looks better from
> your POV.

Thanks for offering Michael, I really appreciate it.

Let me think about the logistics of that and how it may or may not
help me with my backlog.


Re: [PATCH net] virtio_net: Account for tx bytes and packets on sending xdp_frames

2019-01-31 Thread David Miller
From: "Michael S. Tsirkin" 
Date: Thu, 31 Jan 2019 10:25:17 -0500

> On Thu, Jan 31, 2019 at 08:40:30PM +0900, Toshiaki Makita wrote:
>> Previously virtnet_xdp_xmit() did not account for device tx counters,
>> which caused confusion.
>> To be consistent with SKBs, account them on freeing xdp_frames.
>> 
>> Reported-by: David Ahern 
>> Signed-off-by: Toshiaki Makita 
> 
> Well we count them on receive so I guess it makes sense for consistency
> 
> Acked-by: Michael S. Tsirkin 
> 
> however, I really wonder whether adding more and more standard net stack
> things like this will end up costing most of XDP its speed.
> 
> Should we instead make sure *not* to account XDP packets
> in any counters at all? XDP programs can use maps
> to do their own counting...

This has been definitely a discussion point, and something we should
develop a clear, strong, policy on.

David, Jesper, care to chime in on where we ended up in that last thread
discussing this?


Re: [PATCH net] vhost: fix OOB in get_rx_bufs()

2019-01-29 Thread David Miller
From: Jason Wang 
Date: Mon, 28 Jan 2019 15:05:05 +0800

> After batched used ring updating was introduced in commit e2b3b35eb989
> ("vhost_net: batch used ring update in rx"), we tend to batch heads in
> vq->heads for more than one packet. But the quota passed to
> get_rx_bufs() was not correctly limited, which can result in an OOB write
> in vq->heads.
> 
> headcount = get_rx_bufs(vq, vq->heads + nvq->done_idx,
>                         vhost_len, &in, vq_log, &log,
>                         likely(mergeable) ? UIO_MAXIOV : 1);
> 
> UIO_MAXIOV was still used, which is wrong since we could have batched
> used entries in vq->heads; this will cause an OOB if the next buffer
> needs more than 960 (1024 (UIO_MAXIOV) - 64 (VHOST_NET_BATCH)) heads
> after we've batched 64 (VHOST_NET_BATCH) heads:
 ...
> Fix this by allocating UIO_MAXIOV + VHOST_NET_BATCH iovs for
> vhost-net. This is done by setting the limitation through
> vhost_dev_init(); set_owner can then allocate the number of iovs in a
> per-device manner.
> 
> This fixes CVE-2018-16880.
> 
> Fixes: e2b3b35eb989 ("vhost_net: batch used ring update in rx")
> Signed-off-by: Jason Wang 

Applied and queued up for -stable, thanks!


Re: [PATCH net] vhost: fix OOB in get_rx_bufs()

2019-01-29 Thread David Miller
From: David Miller 
Date: Tue, 29 Jan 2019 15:10:26 -0800 (PST)

> Yeah the CVE pushed my hand a little bit, and I knew I was going to
> send Linus a pull request today because David Watson needs some TLS
> changes in net-next.

I also want to make a general comment for the record.

If I let patches slip consistently past 24 hours my backlog is
unmanageable.  Even with aggressively applying things quickly I'm
right now at 70-75.  If I do not do what I am doing, then it's in the
100-150 range.

So I am at the point where I often must move forward with patches that
I think I personally can verify and vet on my own.


Re: [PATCH net] vhost: fix OOB in get_rx_bufs()

2019-01-29 Thread David Miller
From: "Michael S. Tsirkin" 
Date: Tue, 29 Jan 2019 17:54:44 -0500

> On Mon, Jan 28, 2019 at 10:54:44PM -0800, David Miller wrote:
>> From: Jason Wang 
>> Date: Mon, 28 Jan 2019 15:05:05 +0800
>> 
>> > After batched used ring updating was introduced in commit e2b3b35eb989
>> > ("vhost_net: batch used ring update in rx"), we tend to batch heads in
>> > vq->heads for more than one packet. But the quota passed to
>> > get_rx_bufs() was not correctly limited, which can result in an OOB write
>> > in vq->heads.
>> > 
>> > headcount = get_rx_bufs(vq, vq->heads + nvq->done_idx,
>> >                         vhost_len, &in, vq_log, &log,
>> >                         likely(mergeable) ? UIO_MAXIOV : 1);
>> > 
>> > UIO_MAXIOV was still used, which is wrong since we could have batched
>> > used entries in vq->heads; this will cause an OOB if the next buffer
>> > needs more than 960 (1024 (UIO_MAXIOV) - 64 (VHOST_NET_BATCH)) heads
>> > after we've batched 64 (VHOST_NET_BATCH) heads:
>>  ...
>> > Fix this by allocating UIO_MAXIOV + VHOST_NET_BATCH iovs for
>> > vhost-net. This is done by setting the limitation through
>> > vhost_dev_init(); set_owner can then allocate the number of iovs in a
>> > per-device manner.
>> > 
>> > This fixes CVE-2018-16880.
>> > 
>> > Fixes: e2b3b35eb989 ("vhost_net: batch used ring update in rx")
>> > Signed-off-by: Jason Wang 
>> 
>> Applied and queued up for -stable, thanks!
> 
> Wow, it seems we are down to a round-trip time of hours from post to
> queue. It would be hard to keep up that rate generally.
> However, I am guessing this was already in downstreams, and it's a CVE,
> so I guess it's a no brainer and review wasn't really necessary - was
> that the idea? Just checking.

Yeah the CVE pushed my hand a little bit, and I knew I was going to send Linus
a pull request today because David Watson needs some TLS changes in net-next.


Re: [PATCH net-next V4 0/5] vhost: accelerate metadata access through vmap()

2019-01-27 Thread David Miller
From: Jason Wang 
Date: Wed, 23 Jan 2019 17:55:52 +0800

> This series tries to access virtqueue metadata through kernel virtual
> address instead of copy_user() friends since they had too much
> overheads like checks, spec barriers or even hardware feature
> toggling.
> 
> Test shows about 24% improvement on TX PPS. It should benefit other
> cases as well.

I've read over the discussion of patch #5 a few times.

And it seems to me that, at a minimum, a few things still need to
be resolved:

1) More perf data added to commit message.

2) Whether invalidate_range_start() and invalidate_range_end() must
   be paired.

Etc.  So I am marking this series "Changes Requested".


Re: [PATCH net-next V4 0/5] vhost: accelerate metadata access through vmap()

2019-01-23 Thread David Miller
From: "Michael S. Tsirkin" 
Date: Wed, 23 Jan 2019 08:58:07 -0500

> On Wed, Jan 23, 2019 at 05:55:52PM +0800, Jason Wang wrote:
>> This series tries to access virtqueue metadata through kernel virtual
>> address instead of copy_user() friends since they had too much
>> overheads like checks, spec barriers or even hardware feature
>> toggling.
>> 
>> Test shows about 24% improvement on TX PPS. It should benefit other
>> cases as well.
> 
> ok I think this addresses most comments but it's a big change and we
> just started 1.1 review so to pls give me a week to review this ok?

Ok. :)


Re: [PATCH v2] virtio_net: bulk free tx skbs

2019-01-19 Thread David Miller
From: "Michael S. Tsirkin" 
Date: Thu, 17 Jan 2019 23:20:07 -0500

> Use napi_consume_skb() to get bulk free.  Note that napi_consume_skb is
> safe to call in a non-napi context as long as the napi_budget flag is
> correct.
> 
> Signed-off-by: Michael S. Tsirkin 

Applied, thanks.


Re: [PATCH net V4] vhost: log dirty page correctly

2019-01-18 Thread David Miller
From: Jason Wang 
Date: Wed, 16 Jan 2019 16:54:42 +0800

> Vhost dirty page logging API is designed to sync through GPA. But we
> try to log GIOVA when device IOTLB is enabled. This is wrong and may
> lead to missing data after migration.
> 
> To solve this issue, when logging with device IOTLB enabled, we will:
> 
> 1) reuse the device IOTLB translation result of GIOVA->HVA mapping to
>    get HVA; for writable descriptors, get HVA through iovec. For used
>    ring updates, translate the GIOVA to HVA.
> 2) traverse the GPA->HVA mapping to get the possible GPA and log
>    through GPA. Note that this reverse mapping is not guaranteed
>    to be unique, so we should log each possible GPA in this case.
> 
> This fixes the failure of scp to the guest during migration. In -next, we
> will probably support passing GIOVA->GPA instead of GIOVA->HVA.
> 
> Fixes: 6b1e6cc7855b ("vhost: new device IOTLB API")
> Reported-by: Jintack Lim 
> Cc: Jintack Lim 
> Signed-off-by: Jason Wang 

Applied and queued up for -stable.


Re: [PATCH net V4] vhost: log dirty page correctly

2019-01-17 Thread David Miller
From: Jason Wang 
Date: Wed, 16 Jan 2019 16:54:42 +0800

> Vhost dirty page logging API is designed to sync through GPA. But we
> try to log GIOVA when device IOTLB is enabled. This is wrong and may
> lead to missing data after migration.
> 
> To solve this issue, when logging with device IOTLB enabled, we will:
> 
> 1) reuse the device IOTLB translation result of GIOVA->HVA mapping to
>    get HVA; for writable descriptors, get HVA through iovec. For used
>    ring updates, translate the GIOVA to HVA.
> 2) traverse the GPA->HVA mapping to get the possible GPA and log
>    through GPA. Note that this reverse mapping is not guaranteed
>    to be unique, so we should log each possible GPA in this case.
> 
> This fixes the failure of scp to the guest during migration. In -next, we
> will probably support passing GIOVA->GPA instead of GIOVA->HVA.
> 
> Fixes: 6b1e6cc7855b ("vhost: new device IOTLB API")
> Reported-by: Jintack Lim 
> Cc: Jintack Lim 
> Signed-off-by: Jason Wang 

Michael, can I get a review for this please?

Thank you.


Re: [PATCH] vhost/vsock: fix vhost vsock cid hashing inconsistent

2019-01-16 Thread David Miller
From: Zha Bin 
Date: Tue,  8 Jan 2019 16:07:03 +0800

> The vsock core only supports a 32bit CID, but the Virtio-vsock spec
> defines CID (dst_cid and src_cid) as u64, and the upper 32 bits are
> reserved as zero. This inconsistency causes one bug in the vhost vsock
> driver. The scenario is:
> 
>   1. A hash table (vhost_vsock_hash) is used to map a CID to a vsock
>   object. And hash_min() is used to compute the hash key. hash_min() is
>   defined as:
>   (sizeof(val) <= 4 ? hash_32(val, bits) : hash_long(val, bits)).
>   That means the hash algorithm has a dependency on the size of the
>   macro argument 'val'.
>   2. In function vhost_vsock_set_cid(), a 64bit CID is passed to
>   hash_min() to compute the hash key when inserting a vsock object into
>   the hash table.
>   3. In function vhost_vsock_get(), a 32bit CID is passed to hash_min()
>   to compute the hash key when looking up a vsock for a CID.
> 
> Because of the different sizes of the CID, hash_min() returns different
> hash keys, and thus fails to look up the vsock object for a CID.
> 
> To fix this bug, we keep CID as u64 in the IOCTLs and virtio message
> headers, but explicitly convert u64 to u32 when deal with the hash table
> and vsock core.
> 
> Fixes: 834e772c8db0 ("vhost/vsock: fix use-after-free in network stack 
> callers")
> Link: https://github.com/stefanha/virtio/blob/vsock/trunk/content.tex
> Signed-off-by: Zha Bin 
> Reviewed-by: Liu Jiang 

Applied, thank you.


Re: [PATCH net-next] virtio_net: bulk free tx skbs

2019-01-16 Thread David Miller
From: "Michael S. Tsirkin" 
Date: Mon, 14 Jan 2019 20:34:26 -0500

> Use napi_consume_skb() to get bulk free.  Note that napi_consume_skb is
> safe to call in a non-napi context as long as the napi_budget flag is
> correct.
> 
> Signed-off-by: Michael S. Tsirkin 
> ---
> 
> My perf testing setup is down but it works fine on my devel box and
> should be fairly uncontroversial.

It would be uncontroversial if it compiled.

drivers/net/virtio_net.c: In function ‘free_old_xmit_skbs’:
drivers/net/virtio_net.c:1346:25: error: ‘use_napi’ undeclared (first use in 
this function); did you mean ‘used_math’?
   napi_consume_skb(skb, use_napi);
 ^~~~
 used_math
drivers/net/virtio_net.c:1346:25: note: each undeclared identifier is reported 
only once for each function it appears in

Re: [RFC PATCH V2 3/3] vhost: access vq metadata through kernel virtual address

2018-12-28 Thread David Miller
From: Jason Wang 
Date: Fri, 28 Dec 2018 15:55:37 +0800

> +static int vhost_invalidate_vmap(struct vhost_virtqueue *vq,
> +  struct vhost_vmap *map,
> +  unsigned long uaddr,
> +  unsigned long start,
> +  unsigned long end,
> +  bool blockable)
> +{
> + if (start < uaddr && end >= uaddr) {
> + if (!blockable)
> + return -EAGAIN;
> + mutex_lock(&vq->mutex);
> + if (map->addr)
> + vunmap(map->unmap_addr);
> + map->addr = NULL;
> + map->unmap_addr = NULL;
> + mutex_unlock(&vq->mutex);
> + }
> +
> + return 0;
> +}

What are the rules for these invalidate operations?

Can there be partial overlaps?  If so, wouldn't you need some way of
keeping track of the partially overlapping unmaps so that once all of
the invalidates covering the range occur you properly cleanup and do
the vunmap()?
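For illustration, the quoted predicate fires only when the invalidated range straddles uaddr; a full interval-overlap test (a sketch of the general predicate, not the actual vhost fix) would be:

```c
#include <assert.h>
#include <stdbool.h>

/* The quoted check fires only when the invalidated range straddles
 * the mapping's start address (uaddr): */
static bool quoted_check(unsigned long uaddr,
			 unsigned long start, unsigned long end)
{
	return start < uaddr && end >= uaddr;
}

/* A full interval-overlap predicate also catches an invalidate that
 * lands wholly inside the mapping, or overlaps only its tail: */
static bool ranges_overlap(unsigned long start, unsigned long end,
			   unsigned long m_start, unsigned long m_end)
{
	return start <= m_end && end >= m_start;
}
```

An invalidate contained entirely within the mapped range passes the overlap test but is missed by the quoted check, which is the partial-overlap concern raised above.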


Re: [PATCH v2] VSOCK: Send reset control packet when socket is partially bound

2018-12-19 Thread David Miller
From: Jorgen Hansen 
Date: Tue, 18 Dec 2018 00:34:06 -0800

> If a server side socket is bound to an address, but not in the listening
> state yet, incoming connection requests should receive a reset control
> packet in response. However, the function used to send the reset
> silently drops the reset packet if the sending socket isn't bound
> to a remote address (as is the case for a bound socket not yet in
> the listening state). This change fixes this by using the src
> of the incoming packet as destination for the reset packet in
> this case.
> 
> Fixes: d021c344051a ("VSOCK: Introduce VM Sockets")
> Reviewed-by: Adit Ranadive 
> Reviewed-by: Vishnu Dasa 
> Signed-off-by: Jorgen Hansen 

Applied and queued up for -stable, thanks.


Re: [PATCH] VSOCK: Send reset control packet when socket is partially bound

2018-12-19 Thread David Miller
From: Jorgen Hansen 
Date: Wed, 12 Dec 2018 01:38:59 -0800

>  static int vmci_transport_send_reset_bh(struct sockaddr_vm *dst,
> @@ -312,12 +328,29 @@ static int vmci_transport_send_reset_bh(struct 
> sockaddr_vm *dst,
>  static int vmci_transport_send_reset(struct sock *sk,
>struct vmci_transport_packet *pkt)
>  {
> + struct sockaddr_vm dst;
> + struct sockaddr_vm *dst_ptr;
> + struct vsock_sock *vsk;
> +

Please order local variables from longest to shortest line.

Thank you.
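The requested ordering is netdev's "reverse Christmas tree" convention: longest declaration line first, shortest last. A compilable user-space sketch (toy stand-ins for the real vmw_vmci types) of the quoted block, reordered:

```c
#include <assert.h>

/* Toy stand-ins so the ordering example compiles in user space;
 * the real types live in the VMCI transport code. */
struct sockaddr_vm { unsigned int svm_cid, svm_port; };
struct vsock_sock { struct sockaddr_vm remote_addr; };

static unsigned int send_reset_locals(void)
{
	struct sockaddr_vm *dst_ptr;	/* longest line first */
	struct vsock_sock *vsk;
	struct sockaddr_vm dst;		/* shortest line last */

	dst.svm_cid = 42;
	dst.svm_port = 1024;
	dst_ptr = &dst;
	vsk = (struct vsock_sock *)0;
	(void)vsk;

	return dst_ptr->svm_cid;
}
```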


Re: [PATCH net-next 3/3] vhost: access vq metadata through kernel virtual address

2018-12-18 Thread David Miller
From: Jason Wang 
Date: Fri, 14 Dec 2018 11:57:35 +0800

> This is the price for all GUP users, not only vhost itself. What's more
> important, the goal is not to be left too far behind other
> backends like DPDK or AF_XDP (all of which are using GUP).

+1


Re: [PATCH] vhost: return EINVAL if iovecs size does not match the message size

2018-12-18 Thread David Miller
From: Pavel Tikhomirov 
Date: Thu, 13 Dec 2018 17:53:50 +0300

> We've failed to copy and process a vhost_iotlb_msg, so let userspace at
> least know about it. For instance, before this patch the code below runs
> without any error:
 ...
> Signed-off-by: Pavel Tikhomirov 

Michael, will you be taking this in via your tree?

Thanks.


Re: [PATCH net-next 0/3] vhost: accelerate metadata access through vmap()

2018-12-18 Thread David Miller
From: Jason Wang 
Date: Fri, 14 Dec 2018 12:29:54 +0800

> 
> On 2018/12/14 4:12 AM, Michael S. Tsirkin wrote:
>> On Thu, Dec 13, 2018 at 06:10:19PM +0800, Jason Wang wrote:
>>> Hi:
>>>
>>> This series tries to access virtqueue metadata through kernel virtual
>>> address instead of copy_user() friends since they had too much
>>> overheads like checks, spec barriers or even hardware feature
>>> toggling.
>>>
>>> Test shows about 24% improvement on TX PPS. It should benefit other
>>> cases as well.
>>>
>>> Please review
>> I think the idea of speeding up userspace access is a good one.
>> However I think that moving all checks to start is way too aggressive.
> 
> 
> So did AF_PACKET and AF_XDP. Anyway, sharing the address space and
> accessing it directly is the fastest way. Performance is the major
> consideration for people choosing a backend. Compared to a userspace
> implementation, vhost does not have security advantages at any
> level. If vhost is still slow, people will start to develop backends
> based on e.g. AF_XDP.

Exactly, this is precisely how this kind of problem should be solved.

Michael, I strongly support the approach Jason is taking here, and I
would like to ask you to seriously reconsider your objections.

Thank you.


Re: [PATCH v2 2/5] VSOCK: support fill data to mergeable rx buffer in host

2018-12-15 Thread David Miller
From: jiangyiwen 
Date: Thu, 13 Dec 2018 11:11:48 +0800

> I hope the host can fill fewer bytes into the rx virtqueue, so
> I keep struct virtio_vsock_mrg_rxbuf_hdr at one-byte
> alignment.

The question is whether this actually matters.

Do you know?

If the object this is embedded inside of is at least 2-byte aligned,
you are marking it packed for nothing.

There are only 100% downsides to using the packed attribute.

Simply define your data structures properly, with fixed-size types,
and all padding defined explicitly.
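On mainstream ABIs the trade-off can be seen directly; the struct names below are illustrative, not the vsock header itself:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Without "packed": explicit fixed-size fields with all padding
 * spelled out, as suggested in the review. */
struct hdr_explicit {
	uint8_t  op;
	uint8_t  pad;	/* padding made explicit */
	uint16_t len;	/* naturally aligned: normal 16-bit access */
};

/* With "packed": the compiler may no longer assume natural alignment,
 * so some architectures fall back to byte-wise loads and stores for
 * the 16-bit field. */
struct hdr_packed {
	uint8_t  op;
	uint16_t len;	/* misaligned at offset 1 */
} __attribute__((packed));
```

The explicit layout costs one documented padding byte; the packed one saves it at the price of misaligned field access on every use.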


Re: [PATCH net V2 0/4] Fix various issue of vhost

2018-12-15 Thread David Miller
From: Jason Wang 
Date: Wed, 12 Dec 2018 18:08:15 +0800

> This series tries to fix various issues of vhost:
> 
> - Patch 1 adds a missing write barrier between used idx updating and
>   logging.
> - Patch 2-3 brings back the protection of device IOTLB through vq
>   mutex, this fixes possible use after free in device IOTLB entries.
> - Patch 4-7 fixes the dirty page logging when device IOTLB is
>   enabled. We should do it through GPA instead of GIOVA; this was done
>   by introducing an HVA->GPA reverse mapping and converting HVA to GPA
>   when logging dirty pages.
> 
> Please consider them for -stable.
> 
> Thanks
> 
> Changes from V1:
> - silent compiler warning for 32bit.
> - use mutex_trylock() on slowpath instead of mutex_lock() even on fast
>   path.

Hello Jason.

Looks like Michael wants you to split out patch #4 and target
net-next with it.

So please do that and respin the first 3 patches here with Michael's
ACKs.

Thanks.


Re: [PATCH v2 2/5] VSOCK: support fill data to mergeable rx buffer in host

2018-12-15 Thread David Miller
From: jiangyiwen 
Date: Wed, 12 Dec 2018 17:29:31 +0800

> diff --git a/include/uapi/linux/virtio_vsock.h 
> b/include/uapi/linux/virtio_vsock.h
> index 1d57ed3..2292f30 100644
> --- a/include/uapi/linux/virtio_vsock.h
> +++ b/include/uapi/linux/virtio_vsock.h
> @@ -63,6 +63,11 @@ struct virtio_vsock_hdr {
>   __le32  fwd_cnt;
>  } __attribute__((packed));
> 
> +/* It add mergeable rx buffers feature */
> +struct virtio_vsock_mrg_rxbuf_hdr {
> + __le16  num_buffers;/* number of mergeable rx buffers */
> +} __attribute__((packed));
> +

I know the rest of this file uses 'packed' but this attribute should
only be used if absolutely necessary as it incurs a
non-trivial performance penalty for some architectures.


Re: [PATCH v2 1/5] VSOCK: support fill mergeable rx buffer in guest

2018-12-14 Thread David Miller
From: jiangyiwen 
Date: Wed, 12 Dec 2018 17:28:16 +0800

> +static int fill_mergeable_rx_buff(struct virtio_vsock *vsock,
> + struct virtqueue *vq)
> +{
> + struct page_frag *alloc_frag = &vsock->alloc_frag;
> + struct scatterlist sg;
> + /* Currently we don't use ewma len, use PAGE_SIZE instead, because too
> +  * small size can't fill one full packet, sadly we only 128 vq num now.
> +  */
> + unsigned int len = PAGE_SIZE, hole;
> + void *buf;
> + int err;

Please don't break up a set of local variable declarations with a
comment like this.  The comment seems to be about the initialization
of 'len', so move that initialization into the code below the variable
declarations and bring the comment along for the ride as well.
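Applied to the quoted snippet, the restructuring would read roughly as follows (a user-space sketch; EXAMPLE_PAGE_SIZE is a stand-in for the kernel's PAGE_SIZE):

```c
#include <assert.h>

#define EXAMPLE_PAGE_SIZE 4096u	/* stand-in for the kernel's PAGE_SIZE */

/* The declaration block stays unbroken; the comment moves down with
 * the initialization it describes (wording taken from the patch). */
static unsigned int pick_rx_buf_len(void)
{
	unsigned int hole = 0;
	unsigned int len;

	/* Currently we don't use the ewma len; use PAGE_SIZE instead,
	 * because too small a size can't fill one full packet, and we
	 * only have 128 vq entries for now. */
	len = EXAMPLE_PAGE_SIZE;

	return len + hole;
}
```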


Re: [PATCH net V3 0/3] Fix various issue of vhost

2018-12-12 Thread David Miller
From: Jason Wang 
Date: Thu, 13 Dec 2018 10:53:36 +0800

> This series tries to fix various issues of vhost:
> 
> - Patch 1 adds a missing write barrier between used idx updating and
>   logging.
> - Patch 2-3 brings back the protection of device IOTLB through vq
>   mutex, this fixes possible use after free in device IOTLB entries.
> 
> Please consider them for -stable.
> 
> Thanks
> 
> Changes from V2:
> - drop dirty page fix and make it for net-next
> Changes from V1:
> - silent compiler warning for 32bit.
> - use mutex_trylock() on slowpath instead of mutex_lock() even on fast
>   path.

Series applied and queued up for -stable, thanks Jason.


Re: [PATCH net 0/4] Fix various issue of vhost

2018-12-11 Thread David Miller
From: Jason Wang 
Date: Mon, 10 Dec 2018 17:44:50 +0800

> This series tries to fix various issues of vhost:
> 
> - Patch 1 adds a missing write barrier between used idx updating and
>   logging.
> - Patch 2-3 brings back the protection of device IOTLB through vq
>   mutex, this fixes possible use after free in device IOTLB entries.
> - Patch 4 fixes the dirty page logging when device IOTLB is
>   enabled. We should do it through GPA instead of GIOVA; this was done
>   by logging through the iovec and traversing the GPA->HPA list for the
>   GPA.
> 
> Please consider them for -stable.

Looks like the kbuild robot found some problems.

vq->used is a pointer (which might be 32-bit) and you're casting it to
a u64 in the translate_desc() calls of patch #4.

Please make sure that you don't actually require the full domain of
a u64 in these values, as obviously if vq->used is a pointer you will
only get a 32-bit domain on 32-bit architectures.



Re: [PATCH v2] vhost: fix IOTLB locking

2018-12-03 Thread David Miller
From: Jean-Philippe Brucker 
Date: Fri, 30 Nov 2018 16:05:53 +

> Commit 78139c94dc8c ("net: vhost: lock the vqs one by one") moved the vq
> lock to improve scalability, but introduced a possible deadlock in
> vhost-iotlb. vhost_iotlb_notify_vq() now takes vq->mutex while holding
> the device's IOTLB spinlock. And on the vhost_iotlb_miss() path, the
> spinlock is taken while holding vq->mutex.
> 
> Since calling vhost_poll_queue() doesn't require any lock, avoid the
> deadlock by not taking vq->mutex.
> 
> Fixes: 78139c94dc8c ("net: vhost: lock the vqs one by one")
> Acked-by: Jason Wang 
> Acked-by: Michael S. Tsirkin 
> Signed-off-by: Jean-Philippe Brucker 

Applied, thank you.


Re: [PATCH net] virtio-net: keep vnet header zeroed after processing XDP

2018-11-30 Thread David Miller
From: Jason Wang 
Date: Thu, 29 Nov 2018 13:53:16 +0800

> We copy the vnet header unconditionally in page_to_skb(); this is wrong
> since XDP may modify the packet data. So let's keep a zeroed vnet
> header to avoid confusing the conversion between vnet header and skb
> metadata.
> 
> In the future, we should be able to detect whether or not the packet was
> modified and keep using the vnet header when the packet was not touched.
> 
> Fixes: f600b6905015 ("virtio_net: Add XDP support")
> Reported-by: Pavel Popa 
> Signed-off-by: Jason Wang 

Applied and queued up for -stable, thanks.


Re: [PATCH net-next v3 00/13] virtio: support packed ring

2018-11-26 Thread David Miller
From: "Michael S. Tsirkin" 
Date: Tue, 27 Nov 2018 01:08:08 -0500

> On Wed, Nov 21, 2018 at 06:03:17PM +0800, Tiwei Bie wrote:
>> Hi,
>> 
>> This patch set implements packed ring support in virtio driver.
>> 
>> A performance test between pktgen (pktgen_sample03_burst_single_flow.sh)
>> and DPDK vhost (testpmd/rxonly/vhost-PMD) has been done, I saw
>> ~30% performance gain in packed ring in this case.
>> 
>> To make this patch set work with below patch set for vhost,
>> some hacks are needed to set the _F_NEXT flag in indirect
>> descriptors (this should be fixed in vhost):
>> 
>> https://lkml.org/lkml/2018/7/3/33
> 
> I went over it and I think it's correct spec-wise.
> 
> I have some ideas for enhancements but let's start
> with getting this stuff merged first.
> 
> Acked-by: Michael S. Tsirkin 

Series applied.


Re: [PATCH net 2/2] virtio-net: fail XDP set if guest csum is negotiated

2018-11-24 Thread David Miller
From: Jason Wang 
Date: Thu, 22 Nov 2018 14:36:31 +0800

> We don't support partially csumed packets since their metadata will be
> lost or incorrect during XDP processing. So fail the XDP set if the
> guest_csum feature is negotiated.
> 
> Fixes: f600b6905015 ("virtio_net: Add XDP support")
> Reported-by: Jesper Dangaard Brouer 
> Cc: Jesper Dangaard Brouer 
> Cc: Pavel Popa 
> Cc: David Ahern 
> Signed-off-by: Jason Wang 

Applied and queued up for -stable.

Same comments as for patch #1.


Re: [PATCH net 1/2] virtio-net: disable guest csum during XDP set

2018-11-24 Thread David Miller
From: Jason Wang 
Date: Thu, 22 Nov 2018 14:36:30 +0800

> We don't disable VIRTIO_NET_F_GUEST_CSUM if XDP was set. This means we
> can receive partially csumed packets with metadata kept in the
> vnet_hdr. This may have several side effects:
> 
> - It could be overridden by header adjustment, thus it might not be
>   correct after XDP processing.
> - There's no way to pass such metadata information through
>   XDP_REDIRECT to another driver.
> - XDP does not support checksum offload right now.
> 
> So simply disable guest csum if possible in the case of XDP.
> 
> Fixes: 3f93522ffab2d ("virtio-net: switch off offloads on demand if possible 
> on XDP set")
> Reported-by: Jesper Dangaard Brouer 
> Cc: Jesper Dangaard Brouer 
> Cc: Pavel Popa 
> Cc: David Ahern 
> Signed-off-by: Jason Wang 

Applied and queued up for -stable.

We really should have a way to use the checksum provided if the XDP
program returns XDP_PASS and does not modify the packet contents
or size.


Re: [PATCH net-next v3 00/13] virtio: support packed ring

2018-11-21 Thread David Miller
From: "Michael S. Tsirkin" 
Date: Wed, 21 Nov 2018 07:20:27 -0500

> Dave, given the holiday, attempts to wrap up the 1.1 spec and the
> patchset size I would very much appreciate a bit more time for
> review. Say until Nov 28?

Ok.


Re: [PATCH net-next 2/2] tuntap: free XDP dropped packets in a batch

2018-11-17 Thread David Miller
From: Jason Wang 
Date: Thu, 15 Nov 2018 17:43:10 +0800

> Thanks to the batched XDP buffs passed through msg_control, instead of
> calling put_page() for each page, which involves an atomic operation,
> let's batch them by recording the last page that needs to be freed and
> its refcnt count, and free them in a batch.
> 
> Testpmd(virtio-user + vhost_net) + XDP_DROP shows 3.8% improvement.
> 
> Before: 4.71Mpps
> After : 4.89Mpps
> 
> Signed-off-by: Jason Wang 

Applied.


Re: [PATCH net-next 1/2] vhost_net: mitigate page reference counting during page frag refill

2018-11-17 Thread David Miller
From: Jason Wang 
Date: Thu, 15 Nov 2018 17:43:09 +0800

> We do a get_page() which involves an atomic operation. This patch tries
> to mitigate the per-packet atomic operation by maintaining a reference
> bias which is initially USHRT_MAX. Each time a page is got, instead of
> calling get_page() we decrease the bias, and when we find it's time to
> use a new page we drain the remaining bias in one go through
> __page_frag_cache_drain().
> 
> Testpmd(virtio_user + vhost_net) + XDP_DROP on TAP shows about 1.6%
> improvement.
> 
> Before: 4.63Mpps
> After:  4.71Mpps
> 
> Signed-off-by: Jason Wang 

Applied.
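The bias trick can be modelled in user space; the counters below are stand-ins, not the kernel's page refcount API:

```c
#include <assert.h>
#include <limits.h>

/* Toy model of the reference-bias trick: take USHRT_MAX references
 * with a single atomic bump, hand them out by decrementing a plain
 * (non-atomic) bias counter, and return the unused remainder with one
 * more atomic op when the frag page is retired. */
struct frag_page {
	int refcount;		/* models the page's atomic refcount      */
	unsigned int bias;	/* references still available to hand out */
	unsigned int atomic_ops;/* how many "atomic" updates we did       */
};

static void frag_init(struct frag_page *p)
{
	p->refcount = 1 + USHRT_MAX;	/* one atomic op up front */
	p->bias = USHRT_MAX;
	p->atomic_ops = 1;
}

static void frag_get(struct frag_page *p)
{
	p->bias--;			/* per-packet cost: no atomic op */
}

static void frag_retire(struct frag_page *p)
{
	p->refcount -= p->bias;		/* drain unused refs: one atomic op */
	p->bias = 0;
	p->atomic_ops++;
}
```

Per page-frag lifetime this is two atomic updates total instead of one per packet, which is where the quoted ~1.6% gain comes from.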


Re: [PATCH net] vhost: Fix Spectre V1 vulnerability

2018-10-31 Thread David Miller
From: Jason Wang 
Date: Tue, 30 Oct 2018 14:10:49 +0800

> The idx in vhost_vring_ioctl() was controlled by userspace, hence a
> potential exploitation of the Spectre variant 1 vulnerability.
> 
> Fixing this by sanitizing idx before using it to index d->vqs.
> 
> Cc: Michael S. Tsirkin 
> Cc: Josh Poimboeuf 
> Cc: Andrea Arcangeli 
> Signed-off-by: Jason Wang 

Applied and queued up for -stable.
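The shape of such a sanitization, modelled on the kernel's generic array_index_nospec()/array_index_mask_nospec(), is roughly the following user-space sketch (it relies on arithmetic right shift of a negative long, as the kernel's generic version does):

```c
#include <assert.h>

/* Branchless clamp of a user-controlled index: the mask is all-ones
 * when index < size and zero otherwise, so an out-of-range index is
 * forced to 0 even if the CPU speculates past the bounds check. */
static unsigned long index_mask_nospec(unsigned long index,
				       unsigned long size)
{
	/* (size - 1 - index) underflows when index >= size, setting the
	 * sign bit; the arithmetic shift then smears it into the mask. */
	return ~(long)(index | (size - 1UL - index)) >>
	       (sizeof(long) * 8 - 1);
}

static unsigned long sanitize_index(unsigned long idx, unsigned long size)
{
	return idx & index_mask_nospec(idx, size);
}
```

In the vhost fix the sanitized value is what indexes d->vqs, so even a mispredicted bounds check cannot steer a speculative out-of-bounds load.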


Re: [PATCH] virtio_net: add local_bh_disable() around u64_stats_update_begin

2018-10-18 Thread David Miller
From: Sebastian Andrzej Siewior 
Date: Thu, 18 Oct 2018 10:43:13 +0200

> On 32bit, lockdep notices that virtnet_open() and refill_work() invoke
> try_fill_recv() from process context while virtnet_receive() invokes the
> same function from BH context. The problem is that the seqcounter within
> u64_stats_update_begin() may deadlock if it is interrupted by BH and
> then acquired again.
> 
> Introduce u64_stats_update_begin_bh() which disables BH on 32bit
> architectures. Since the BH might interrupt the worker, this new
> function should not be limited to SMP like the others, which are
> expected to be used in softirq.
> 
> With this change we might lose increments, but this is okay. The
> important part is that the two 32bit parts of the 64bit counter are not
> corrupted.
> 
> Fixes: 461f03dc99cf6 ("virtio_net: Add kick stats").
> Suggested-by: Stephen Hemminger 
> Signed-off-by: Sebastian Andrzej Siewior 

Trying to get down to the bottom of this:

1) virtnet_receive() runs from softirq but only if NAPI is active and
   enabled.  It is in this context that it invokes try_fill_recv().

2) refill_work() runs from process context, but disables NAPI (and
   thus invocation of virtnet_receive()) before calling
   try_fill_recv().

3) virtnet_open() invokes from process context as well, but before the
   NAPI instances are enabled, it is same as case #2.

4) virtnet_restore_up() is the same situations as #3.

Therefore I agree that this is a false positive, and simply lockdep
cannot see the NAPI synchronization done by case #2.

I think we shouldn't add unnecessary BH disabling here, and instead
find some way to annotate this for lockdep's sake.

Thank you.



Re: [PATCH v3] virtio_net: avoid using netif_tx_disable() for serializing tx routine

2018-10-17 Thread David Miller
From: Ake Koomsin 
Date: Wed, 17 Oct 2018 19:44:12 +0900

> Commit 713a98d90c5e ("virtio-net: serialize tx routine during reset")
> introduces netif_tx_disable() after netif_device_detach() in order to
> avoid use-after-free of tx queues. However, there are two issues.
> 
> 1) Its operation is redundant with netif_device_detach() in case the
>interface is running.
> 2) In case the interface is not running before suspending and
>resuming, the tx does not get resumed by netif_device_attach().
>This results in losing network connectivity.
> 
> It is better to use netif_tx_lock_bh()/netif_tx_unlock_bh() instead for
> serializing tx routine during reset. This also preserves the symmetry
> of netif_device_detach() and netif_device_attach().
> 
> Fixes: 713a98d90c5e ("virtio-net: serialize tx routine during reset")
> Signed-off-by: Ake Koomsin 

Applied and queued up for -stable.


Re: [PATCH net-next V3] virtio_net: ethtool tx napi configuration

2018-10-10 Thread David Miller
From: Jason Wang 
Date: Tue,  9 Oct 2018 10:06:26 +0800

> Implement ethtool .set_coalesce (-C) and .get_coalesce (-c) handlers.
> Interrupt moderation is currently not supported, so these accept and
> display the default settings of 0 usec and 1 frame.
> 
> Toggle tx napi through setting tx-frames. So as to not interfere
> with possible future interrupt moderation, value 1 means tx napi while
> value 0 means not.
> 
> Only allow the switching when device is down for simplicity.
> 
> Link: https://patchwork.ozlabs.org/patch/948149/
> Suggested-by: Jason Wang 
> Signed-off-by: Willem de Bruijn 
> Signed-off-by: Jason Wang 
> ---
> Changes from V2:
> - only allow the switching when device is down
> - remove unnecessary global variable and initialization
> Changes from V1:
> - try to synchronize with datapath to allow changing mode when
>   interface is up.
> - use tx-frames 0 as to disable tx napi while tx-frames 1 to enable tx napi

Applied, with...

> + bool running = netif_running(dev);

this unused variable removed.


Re: [REBASE PATCH net-next v9 0/4] net: vhost: improve performance when enable busyloop

2018-09-26 Thread David Miller
From: xiangxia.m@gmail.com
Date: Tue, 25 Sep 2018 05:36:48 -0700

> From: Tonghao Zhang 
> 
> These patches improve the guest receive performance.
> On the handle_tx side, we poll the sock receive queue
> at the same time. handle_rx does the same.
> 
> For more performance report, see patch 4

Series applied, thank you.


Re: [PATCH net-next V2] virtio_net: ethtool tx napi configuration

2018-09-13 Thread David Miller
From: Jason Wang 
Date: Thu, 13 Sep 2018 13:35:45 +0800

> Toggle tx napi through a bit in tx-frames.

This is not what the code implements as the interface any more.

Please fix the commit message to match the code.

Thanks.


Re: [PATCH net-next V2 00/11] vhost_net TX batching

2018-09-13 Thread David Miller
From: "Michael S. Tsirkin" 
Date: Thu, 13 Sep 2018 14:02:13 -0400

> On Thu, Sep 13, 2018 at 09:28:19AM -0700, David Miller wrote:
>> From: Jason Wang 
>> Date: Wed, 12 Sep 2018 11:16:58 +0800
>> 
>> > This series tries to batch submitting packets to the underlying
>> > socket through msg_control during sendmsg(). This is done by:
>>  ...
>> 
>> Series applied, thanks Jason.
> 
> Going over it now I don't see a lot to complain about, but I'd
> appreciate a bit more time for review in the future. 3 days ok?

I try to handle every patchwork entry within 48 hours, more
specifically 2 business days.

Given the rate at which patches flow through networking, I think
this is both reasonable and necessary.

One always has the option to send a quick reply to the list saying
"Please wait for my review, I'll get to it in the next day or two."
which I will certainly honor.

Thank you.

