RE: [PATCH] hv_netvsc: Fix a deadlock by getting rtnl_lock earlier in netvsc_probe()
> From: David Miller
> Sent: Wednesday, August 29, 2018 17:49
>
> > From: Dexuan Cui
> > Date: Wed, 22 Aug 2018 21:20:03 +
> >
> > ---
> >  drivers/net/hyperv/netvsc_drv.c | 11 ++-
> >  1 file changed, 10 insertions(+), 1 deletion(-)
> >
> > FYI: these are the related 3 paths which show the deadlock:
>
> This incredibly useful information belongs in the commit log
> message, and therefore before the --- and signoffs.

Hi David,
I was afraid the call-traces were too detailed. :-)

Can you please move the info to before the "---" line? Or should I resend
the patch with the commit log updated?

Thanks,
-- Dexuan
[PATCH net] hv_sock: add locking in the open/close/release code paths
Without the patch, when hvs_open_connection() hasn't completely established
a connection (e.g. it has changed sk->sk_state to SS_CONNECTED, but hasn't
inserted the sock into the connected queue), vsock_stream_connect() may see
the sk_state change and return the connection to userspace; if userspace
then closes the connection quickly, hvs_release() may not find the
connection in the connected queue. Finally hvs_open_connection() inserts
the connection into the queue, and we're never able to purge it.

Signed-off-by: Dexuan Cui <de...@microsoft.com>
Cc: K. Y. Srinivasan <k...@microsoft.com>
Cc: Haiyang Zhang <haiya...@microsoft.com>
Cc: Stephen Hemminger <sthem...@microsoft.com>
Cc: Vitaly Kuznetsov <vkuzn...@redhat.com>
Cc: Cathy Avery <cav...@redhat.com>
Cc: Rolf Neugebauer <rolf.neugeba...@docker.com>
Cc: Marcelo Cerri <marcelo.ce...@canonical.com>
---

Please consider this for v4.14.

 net/vmw_vsock/hyperv_transport.c | 22 ++
 1 file changed, 18 insertions(+), 4 deletions(-)

diff --git a/net/vmw_vsock/hyperv_transport.c b/net/vmw_vsock/hyperv_transport.c
index 14ed5a3..e21991f 100644
--- a/net/vmw_vsock/hyperv_transport.c
+++ b/net/vmw_vsock/hyperv_transport.c
@@ -310,11 +310,15 @@ static void hvs_close_connection(struct vmbus_channel *chan)
 	struct sock *sk = get_per_channel_state(chan);
 	struct vsock_sock *vsk = vsock_sk(sk);
 
+	lock_sock(sk);
+
 	sk->sk_state = SS_UNCONNECTED;
 	sock_set_flag(sk, SOCK_DONE);
 	vsk->peer_shutdown |= SEND_SHUTDOWN | RCV_SHUTDOWN;
 
 	sk->sk_state_change(sk);
+
+	release_sock(sk);
 }
 
 static void hvs_open_connection(struct vmbus_channel *chan)
@@ -344,6 +348,8 @@ static void hvs_open_connection(struct vmbus_channel *chan)
 	if (!sk)
 		return;
 
+	lock_sock(sk);
+
 	if ((conn_from_host && sk->sk_state != VSOCK_SS_LISTEN) ||
 	    (!conn_from_host && sk->sk_state != SS_CONNECTING))
 		goto out;
@@ -395,9 +401,7 @@ static void hvs_open_connection(struct vmbus_channel *chan)
 
 		vsock_insert_connected(vnew);
 
-		lock_sock(sk);
 		vsock_enqueue_accept(sk, new);
-		release_sock(sk);
 	} else {
 		sk->sk_state = SS_CONNECTED;
 		sk->sk_socket->state = SS_CONNECTED;
@@ -410,6 +414,8 @@ static void hvs_open_connection(struct vmbus_channel *chan)
 out:
 	/* Release refcnt obtained when we called vsock_find_bound_socket() */
 	sock_put(sk);
+
+	release_sock(sk);
 }
 
 static u32 hvs_get_local_cid(void)
@@ -476,13 +482,21 @@ static int hvs_shutdown(struct vsock_sock *vsk, int mode)
 
 static void hvs_release(struct vsock_sock *vsk)
 {
+	struct sock *sk = sk_vsock(vsk);
 	struct hvsock *hvs = vsk->trans;
-	struct vmbus_channel *chan = hvs->chan;
+	struct vmbus_channel *chan;
 
+	lock_sock(sk);
+
+	sk->sk_state = SS_DISCONNECTING;
+	vsock_remove_sock(vsk);
+
+	release_sock(sk);
+
+	chan = hvs->chan;
 	if (chan)
 		hvs_shutdown(vsk, RCV_SHUTDOWN | SEND_SHUTDOWN);
-	vsock_remove_sock(vsk);
 }
 
 static void hvs_destruct(struct vsock_sock *vsk)
-- 
2.7.4
RE: [PATCH] vsock: only load vmci transport on VMware hypervisor by default
> From: Jorgen S. Hansen [mailto:jhan...@vmware.com]
> Sent: Wednesday, September 6, 2017 7:11 AM
> > ...
> > I'm currently working on NFS over AF_VSOCK and sock_diag support (for
> > ss(8) and netstat-like tools).
> >
> > Multi-transport support is lower priority for me at the moment. I'm
> > happy to review patches though. If there is no progress on this by the
> > end of the year then I will have time to work on it.
>
> I'll try to find time to write a more coherent proposal in the coming
> weeks, and we can discuss that.
>
> Jorgen

Thank you!

Thanks,
-- Dexuan
RE: [PATCH] vsock: only load vmci transport on VMware hypervisor by default
> From: Stefan Hajnoczi [mailto:stefa...@redhat.com]
> Sent: Thursday, August 31, 2017 4:55 AM
> ...
> On Tue, Aug 29, 2017 at 03:37:07PM +, Jorgen S. Hansen wrote:
> > > On Aug 29, 2017, at 4:36 AM, Dexuan Cui <de...@microsoft.com> wrote:
> > If we allow multiple host side transports, virtio host side support
> > and vmci should be able to coexist regardless of the order of
> > initialization.
>
> That sounds good to me.
>
> This means af_vsock.c needs to be aware of CID allocation. Currently the
> vhost_vsock.ko driver handles this itself (it keeps a list of CIDs and
> checks that they are not used twice). It should be possible to move
> that state into af_vsock.c so we have <cid, host_transport> pairs.
>
> I'm currently working on NFS over AF_VSOCK and sock_diag support (for
> ss(8) and netstat-like tools).
>
> Multi-transport support is lower priority for me at the moment. I'm
> happy to review patches though. If there is no progress on this by the
> end of the year then I will have time to work on it.

I understand. Thank you both for sharing the details about the plan!

> Are either of you in Prague, Czech Republic on October 25-27 for
> Linux Kernel Summit, Open Source Summit Europe, Embedded Linux
> Conference Europe, KVM Forum, or MesosCon Europe?
>
> Stefan

I regret I won't be there this year.

Thanks,
-- Dexuan
RE: [PATCH] vsock: only load vmci transport on VMware hypervisor by default
> From: Dexuan Cui
> Sent: Tuesday, August 22, 2017 21:21
> > ...
> > The only problem here would be the potential for a guest and a host
> > app to have a conflict wrt port numbers, even though they would be
> > able to operate fine, if restricted to their appropriate transport.
> >
> > Thanks,
> > Jorgen
>
> Hi Jorgen, Stefan,
> Thank you for the detailed analysis!
> You have a much better understanding than me about the complex
> scenarios. Can you please work out a patch? :-)

Hi Jorgen, Stefan,
May I know your plan for this?

> IMO Linux driver of Hyper-V sockets is the simplest case, as we only
> have the "to host" option (the host side driver of Hyper-V sockets runs
> on Windows kernel and I don't think the other hypervisors emulate
> the full Hyper-V VMBus 4.0, which is required to support Hyper-V
> sockets).
>
> -- Dexuan

Thanks,
-- Dexuan
RE: [PATCH v3 net-next 1/1] hv_sock: implements Hyper-V transport for Virtual Sockets (AF_VSOCK)
> From: David Miller [mailto:da...@davemloft.net]
> Sent: Monday, August 28, 2017 15:39
> From: Dexuan Cui <de...@microsoft.com>
> Date: Sat, 26 Aug 2017 04:52:43 +
>
> > Hyper-V Sockets (hv_sock) supplies a byte-stream based communication
> > mechanism between the host and the guest. It uses VMBus ringbuffer as
> > the transportation layer.
> >
> > With hv_sock, applications between the host (Windows 10, Windows
> > Server 2016 or newer) and the guest can talk with each other using the
> > traditional socket APIs.
> >
> > Signed-off-by: Dexuan Cui <de...@microsoft.com>
>
> Applied, thank you.

Thanks a lot! There are some supporting patches still pending in the
VMBus driver. I'll make sure they go in through the char-misc tree.

Thanks,
-- Dexuan
[PATCH v3 net-next 1/1] hv_sock: implements Hyper-V transport for Virtual Sockets (AF_VSOCK)
Hyper-V Sockets (hv_sock) supplies a byte-stream based communication
mechanism between the host and the guest. It uses VMBus ringbuffer as the
transportation layer.

With hv_sock, applications between the host (Windows 10, Windows Server
2016 or newer) and the guest can talk with each other using the
traditional socket APIs.

More info about Hyper-V Sockets is available here:

"Make your own integration services":
https://docs.microsoft.com/en-us/virtualization/hyper-v-on-windows/user-guide/make-integration-service

The patch implements the necessary support in Linux guest by introducing
a new vsock transport for AF_VSOCK.

Signed-off-by: Dexuan Cui <de...@microsoft.com>
Cc: K. Y. Srinivasan <k...@microsoft.com>
Cc: Haiyang Zhang <haiya...@microsoft.com>
Cc: Stephen Hemminger <sthem...@microsoft.com>
Cc: Andy King <ack...@vmware.com>
Cc: Dmitry Torokhov <d...@vmware.com>
Cc: George Zhang <georgezh...@vmware.com>
Cc: Jorgen Hansen <jhan...@vmware.com>
Cc: Reilly Grant <gra...@vmware.com>
Cc: Asias He <as...@redhat.com>
Cc: Stefan Hajnoczi <stefa...@redhat.com>
Cc: Vitaly Kuznetsov <vkuzn...@redhat.com>
Cc: Cathy Avery <cav...@redhat.com>
Cc: Rolf Neugebauer <rolf.neugeba...@docker.com>
Cc: Marcelo Cerri <marcelo.ce...@canonical.com>
---

Changes in v2:
- fixed hvs_stream_allow() for cid and the comments
  (Thanks, Stefan Hajnoczi!)
- added proper locking when using vsock_enqueue_accept()
  (Thanks, Stefan Hajnoczi and Jorgen Hansen!)
The previous v1 patch is not needed any more:
  [PATCH net-next 2/3] vsock: fix vsock_dequeue/enqueue_accept race

Another previous v1 patch is being discussed in another thread:
  vsock: only load vmci transport on VMware hypervisor by default

Changes in v3 (addressed David Miller's comments):
- used better naming: VMBUS_PKT_TRAILER_SIZE
- better handled fin_sent: removed atomic
- removed "inline" tags
- better handled uuid service_id assignments: avoid pointers

 MAINTAINERS                      |   1 +
 net/vmw_vsock/Kconfig            |  12 +
 net/vmw_vsock/Makefile           |   3 +
 net/vmw_vsock/hyperv_transport.c | 904 +++
 4 files changed, 920 insertions(+)
 create mode 100644 net/vmw_vsock/hyperv_transport.c

diff --git a/MAINTAINERS b/MAINTAINERS
index 2db0f8c..dae0573 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -6279,6 +6279,7 @@ F:	drivers/net/hyperv/
 F:	drivers/scsi/storvsc_drv.c
 F:	drivers/uio/uio_hv_generic.c
 F:	drivers/video/fbdev/hyperv_fb.c
+F:	net/vmw_vsock/hyperv_transport.c
 F:	include/linux/hyperv.h
 F:	tools/hv/
 F:	Documentation/ABI/stable/sysfs-bus-vmbus

diff --git a/net/vmw_vsock/Kconfig b/net/vmw_vsock/Kconfig
index a7ae09d..3f52929 100644
--- a/net/vmw_vsock/Kconfig
+++ b/net/vmw_vsock/Kconfig
@@ -46,3 +46,15 @@ config VIRTIO_VSOCKETS_COMMON
 	  This option is selected by any driver which needs to access the
 	  virtio_vsock. The module will be called
 	  vmw_vsock_virtio_transport_common.
+
+config HYPERV_VSOCKETS
+	tristate "Hyper-V transport for Virtual Sockets"
+	depends on VSOCKETS && HYPERV
+	help
+	  This module implements a Hyper-V transport for Virtual Sockets.
+
+	  Enable this transport if your Virtual Machine host supports Virtual
+	  Sockets over Hyper-V VMBus.
+
+	  To compile this driver as a module, choose M here: the module will be
+	  called hv_sock. If unsure, say N.

diff --git a/net/vmw_vsock/Makefile b/net/vmw_vsock/Makefile
index 09fc2eb..e63d574 100644
--- a/net/vmw_vsock/Makefile
+++ b/net/vmw_vsock/Makefile
@@ -2,6 +2,7 @@ obj-$(CONFIG_VSOCKETS) += vsock.o
 obj-$(CONFIG_VMWARE_VMCI_VSOCKETS) += vmw_vsock_vmci_transport.o
 obj-$(CONFIG_VIRTIO_VSOCKETS) += vmw_vsock_virtio_transport.o
 obj-$(CONFIG_VIRTIO_VSOCKETS_COMMON) += vmw_vsock_virtio_transport_common.o
+obj-$(CONFIG_HYPERV_VSOCKETS) += hv_sock.o
 
 vsock-y += af_vsock.o af_vsock_tap.o vsock_addr.o
 
@@ -11,3 +12,5 @@ vmw_vsock_vmci_transport-y += vmci_transport.o vmci_transport_notify.o \
 vmw_vsock_virtio_transport-y += virtio_transport.o
 
 vmw_vsock_virtio_transport_common-y += virtio_transport_common.o
+
+hv_sock-y += hyperv_transport.o

diff --git a/net/vmw_vsock/hyperv_transport.c b/net/vmw_vsock/hyperv_transport.c
new file mode 100644
index 000..14ed5a3
--- /dev/null
+++ b/net/vmw_vsock/hyperv_transport.c
@@ -0,0 +1,904 @@
+/*
+ * Hyper-V transport for vsock
+ *
+ * Hyper-V Sockets supplies a byte-stream based communication mechanism
+ * between the host and the VM. This driver implements the necessary
+ * support in the VM by introducing the new vsock transport.
+ *
+ * Copyright (c) 2017, Microsoft Corporation.
+ *
+ * This program is free software; you can redistribute i
RE: [PATCH v2 net-next 1/1] hv_sock: implements Hyper-V transport for Virtual Sockets (AF_VSOCK)
> From: David Miller [mailto:da...@davemloft.net]
> Sent: Thursday, August 24, 2017 18:20
>
> > +#define VMBUS_PKT_TRAILER	(sizeof(u64))
>
> This is not the packet trailer, it's the size of the packet trailer.

Thanks! I'll change it to VMBUS_PKT_TRAILER_SIZE.

> > +	/* Have we sent the zero-length packet (FIN)? */
> > +	unsigned long fin_sent;
>
> Why does this need to be atomic?

It doesn't have to be. It was originally made for a quick workaround.
Thanks! I should do it in the right way now.

> Why can't a smaller simpler mechanism be used to make sure
> hvs_shutdown() only performs the hvs_send_data() call once on the
> channel?

I'll change "fin_sent" to bool, and avoid test_and_set_bit().
I'll add lock_sock()/release_sock() in hvs_shutdown() like this:

static int hvs_shutdown(struct vsock_sock *vsk, int mode)
{
	...
	lock_sock(sk);

	hvs = vsk->trans;
	if (hvs->fin_sent)
		goto out;

	send_buf = (struct hvs_send_buf *)
	(void)hvs_send_data(hvs->chan, send_buf, 0);

	hvs->fin_sent = true;
out:
	release_sock(sk);
	return 0;
}

> > +static inline bool is_valid_srv_id(const uuid_le *id)
> > +{
> > +	return !memcmp(&id->b[4], &srv_id_template.b[4], sizeof(uuid_le) - 4);
> > +}
>
> Do not use the inline function attribute in *.c code. Let the
> compiler decide.

OK. Will remove all the inline usages.

> > +	*((u32 *)&hvs->vm_srv_id) = vsk->local_addr.svm_port;
> > +	*((u32 *)&hvs->host_srv_id) = vsk->remote_addr.svm_port;
>
> There has to be a better way to express this.

I may need to define a union here. Let me try it.

> And if this is partially initializing vm_srv_id, at a minimum
> endianness needs to be taken into account.

I may need to use cpu_to_le32(). Let me check it.
[PATCH v2 net-next 1/1] hv_sock: implements Hyper-V transport for Virtual Sockets (AF_VSOCK)
Hyper-V Sockets (hv_sock) supplies a byte-stream based communication
mechanism between the host and the guest. It uses VMBus ringbuffer as the
transportation layer.

With hv_sock, applications between the host (Windows 10, Windows Server
2016 or newer) and the guest can talk with each other using the
traditional socket APIs.

More info about Hyper-V Sockets is available here:

"Make your own integration services":
https://docs.microsoft.com/en-us/virtualization/hyper-v-on-windows/user-guide/make-integration-service

The patch implements the necessary support in Linux guest by introducing
a new vsock transport for AF_VSOCK.

Signed-off-by: Dexuan Cui <de...@microsoft.com>
Cc: K. Y. Srinivasan <k...@microsoft.com>
Cc: Haiyang Zhang <haiya...@microsoft.com>
Cc: Stephen Hemminger <sthem...@microsoft.com>
Cc: Andy King <ack...@vmware.com>
Cc: Dmitry Torokhov <d...@vmware.com>
Cc: George Zhang <georgezh...@vmware.com>
Cc: Jorgen Hansen <jhan...@vmware.com>
Cc: Reilly Grant <gra...@vmware.com>
Cc: Stefan Hajnoczi <stefa...@redhat.com>
Cc: Vitaly Kuznetsov <vkuzn...@redhat.com>
Cc: Cathy Avery <cav...@redhat.com>
Cc: Rolf Neugebauer <rolf.neugeba...@docker.com>
Cc: Marcelo Cerri <marcelo.ce...@canonical.com>
---

Changes in v2:
- Fixed hvs_stream_allow() for cid and the comments
  (Thanks, Stefan Hajnoczi!)
- Added proper locking when using vsock_enqueue_accept()
  (Thanks, Stefan Hajnoczi and Jorgen Hansen!)
The previous v1 patch is not needed any more:
  [PATCH net-next 2/3] vsock: fix vsock_dequeue/enqueue_accept race

Another previous v1 patch is being discussed in another thread:
  vsock: only load vmci transport on VMware hypervisor by default

 MAINTAINERS                      |   1 +
 net/vmw_vsock/Kconfig            |  12 +
 net/vmw_vsock/Makefile           |   3 +
 net/vmw_vsock/hyperv_transport.c | 888 +++
 4 files changed, 904 insertions(+)
 create mode 100644 net/vmw_vsock/hyperv_transport.c

diff --git a/MAINTAINERS b/MAINTAINERS
index 2db0f8c..dae0573 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -6279,6 +6279,7 @@ F:	drivers/net/hyperv/
 F:	drivers/scsi/storvsc_drv.c
 F:	drivers/uio/uio_hv_generic.c
 F:	drivers/video/fbdev/hyperv_fb.c
+F:	net/vmw_vsock/hyperv_transport.c
 F:	include/linux/hyperv.h
 F:	tools/hv/
 F:	Documentation/ABI/stable/sysfs-bus-vmbus

diff --git a/net/vmw_vsock/Kconfig b/net/vmw_vsock/Kconfig
index a7ae09d..3f52929 100644
--- a/net/vmw_vsock/Kconfig
+++ b/net/vmw_vsock/Kconfig
@@ -46,3 +46,15 @@ config VIRTIO_VSOCKETS_COMMON
 	  This option is selected by any driver which needs to access the
 	  virtio_vsock. The module will be called
 	  vmw_vsock_virtio_transport_common.
+
+config HYPERV_VSOCKETS
+	tristate "Hyper-V transport for Virtual Sockets"
+	depends on VSOCKETS && HYPERV
+	help
+	  This module implements a Hyper-V transport for Virtual Sockets.
+
+	  Enable this transport if your Virtual Machine host supports Virtual
+	  Sockets over Hyper-V VMBus.
+
+	  To compile this driver as a module, choose M here: the module will be
+	  called hv_sock. If unsure, say N.

diff --git a/net/vmw_vsock/Makefile b/net/vmw_vsock/Makefile
index 09fc2eb..e63d574 100644
--- a/net/vmw_vsock/Makefile
+++ b/net/vmw_vsock/Makefile
@@ -2,6 +2,7 @@ obj-$(CONFIG_VSOCKETS) += vsock.o
 obj-$(CONFIG_VMWARE_VMCI_VSOCKETS) += vmw_vsock_vmci_transport.o
 obj-$(CONFIG_VIRTIO_VSOCKETS) += vmw_vsock_virtio_transport.o
 obj-$(CONFIG_VIRTIO_VSOCKETS_COMMON) += vmw_vsock_virtio_transport_common.o
+obj-$(CONFIG_HYPERV_VSOCKETS) += hv_sock.o
 
 vsock-y += af_vsock.o af_vsock_tap.o vsock_addr.o
 
@@ -11,3 +12,5 @@ vmw_vsock_vmci_transport-y += vmci_transport.o vmci_transport_notify.o \
 vmw_vsock_virtio_transport-y += virtio_transport.o
 
 vmw_vsock_virtio_transport_common-y += virtio_transport_common.o
+
+hv_sock-y += hyperv_transport.o

diff --git a/net/vmw_vsock/hyperv_transport.c b/net/vmw_vsock/hyperv_transport.c
new file mode 100644
index 000..597fb25
--- /dev/null
+++ b/net/vmw_vsock/hyperv_transport.c
@@ -0,0 +1,888 @@
+/*
+ * Hyper-V transport for vsock
+ *
+ * Hyper-V Sockets supplies a byte-stream based communication mechanism
+ * between the host and the VM. This driver implements the necessary
+ * support in the VM by introducing the new vsock transport.
+ *
+ * Copyright (c) 2017, Microsoft Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS F
RE: [PATCH] vsock: only load vmci transport on VMware hypervisor by default
> From: Jorgen S. Hansen [mailto:jhan...@vmware.com]
>
> > On Aug 22, 2017, at 11:54 AM, Stefan Hajnoczi wrote:
> > ...
> > We *can* by looking at the destination CID. Please take a look at
> > drivers/misc/vmw_vmci/vmci_route.c:vmci_route() to see how VMCI
> > handles nested virt.
> >
> > It boils down to something like this:
> >
> > static int vsock_stream_connect(struct socket *sock, struct sockaddr *addr,
> > 				int addr_len, int flags)
> > {
> > 	...
> > 	if (remote_addr.svm_cid == VMADDR_CID_HOST)
> > 		transport = host_transport;
> > 	else
> > 		transport = guest_transport;
> >
> > It's easy for connect(2) but Jorgen mentioned it's harder for listen(2)
> > because the socket would need to listen on both transports. We define
> > two new constants VMADDR_CID_LISTEN_FROM_GUEST and
> > VMADDR_CID_LISTEN_FROM_HOST for bind(2) so that applications can
> > decide which side to listen on.
>
> If a socket is bound to VMADDR_CID_HOST, we would consider that socket
> as bound to the host side transport, so that would be the same as
> VMADDR_CID_LISTEN_FROM_GUEST. For the guest, we have
> IOCTL_VM_SOCKETS_GET_LOCAL_CID, so that could be used to get and bind
> a socket to the guest transport (VMCI will always return the guest CID as
> the local one, if the VMCI driver is used in a guest, and it looks like
> virtio will do the same). We could treat VMADDR_CID_ANY as always being
> the guest transport, since that is the use case where you don't know
> upfront what your CID is, if we don't want to listen on all transports.
> So we would use the host transport, if a socket is bound to
> VMADDR_CID_HOST, or if there is no guest transport, and in all other
> cases use the guest transport. However, having a couple of symbolic
> names like you suggest certainly makes it more obvious, and could be
> used in combination with this. It would be a plus if existing
> applications would function as intended in most cases.
>
> > Or the listen socket could simply listen to both sides.
> > The only problem here would be the potential for a guest and a host app to > have a conflict wrt port numbers, even though they would be able to > operate fine, if restricted to their appropriate transport. > > Thanks, > Jorgen Hi Jorgen, Stefan, Thank you for the detailed analysis! You have a much better understanding than me about the complex scenarios. Can you please work out a patch? :-) IMO Linux driver of Hyper-V sockets is the simplest case, as we only have the "to host" option (the host side driver of Hyper-V sockets runs on Windows kernel and I don't think the other hypervisors emulate the full Hyper-V VMBus 4.0, which is required to support Hyper-V sockets). -- Dexuan
RE: [PATCH net-next 3/3] hv_sock: implements Hyper-V transport for Virtual Sockets (AF_VSOCK)
> From: Stefan Hajnoczi [mailto:stefa...@redhat.com]
> On Fri, Aug 18, 2017 at 10:23:54PM +, Dexuan Cui wrote:
> > > > +static bool hvs_stream_allow(u32 cid, u32 port)
> > > > +{
> > > > +	static const u32 valid_cids[] = {
> > > > +		VMADDR_CID_ANY,
> > >
> > > Is this for loopback?
> >
> > No, we don't support loopback in Linux VM, at least for now.
> > In our Linux implementation, Linux VM can only connect to the host,
> > and here when Linux VM calls connect(), I treat VMADDR_CID_ANY
> > the same as VMADDR_CID_HOST.
>
> VMCI and virtio-vsock do not treat connect(VMADDR_CID_ANY) the same as
> connect(VMADDR_CID_HOST). It is an error to connect to VMADDR_CID_ANY.

Ok. Then I'll only allow VMADDR_CID_HOST as the destination CID, since
we don't support loopback mode.

> > > > +	/* The host's port range [MIN_HOST_EPHEMERAL_PORT, 0x)
> > > > +	 * is reserved as ephemeral ports, which are used as the
> > > > +	 * host's ports when the host initiates connections.
> > > > +	 */
> > > > +	if (port > MAX_HOST_LISTEN_PORT)
> > > > +		return false;
> > >
> > > Without this if statement the guest will attempt to connect. I
> > > guess there will be no listen sockets above MAX_HOST_LISTEN_PORT,
> > > so the connection attempt will fail.
> >
> > You're correct.
> > To use the vsock common infrastructure, we have to map Hyper-V's
> > GUID <VM_ID, Service_ID> to int <cid, port>, and hence we must limit
> > the port range we can listen() on to [0, MAX_LISTEN_PORT], i.e.
> > we can only use half of the whole 32-bit port space for listen().
> > This is detailed in the long comments starting at about Line 100.
> >
> > > ...but hardcode this knowledge into the guest driver?
> > I'd like the guest's connect() to fail immediately here.
> > IMO this is better than a connect timeout. :-)
>
> Thanks for explaining.
> Perhaps the comment could be updated:
>
> /* The host's port range [MIN_HOST_EPHEMERAL_PORT, 0x) is
>  * reserved as ephemeral ports, which are used as the host's ports when
>  * the host initiates connections.
>  *
>  * Perform this check in the guest so an immediate error is produced
>  * instead of a timeout.
>  */
>
> Stefan

Thank you, Stefan! Please see the below for the updated version of the
function:

static bool hvs_stream_allow(u32 cid, u32 port)
{
	/* The host's port range [MIN_HOST_EPHEMERAL_PORT, 0x) is
	 * reserved as ephemeral ports, which are used as the host's ports
	 * when the host initiates connections.
	 *
	 * Perform this check in the guest so an immediate error is produced
	 * instead of a timeout.
	 */
	if (port > MAX_HOST_LISTEN_PORT)
		return false;

	if (cid == VMADDR_CID_HOST)
		return true;

	return false;
}

I'll send a v2 of the patch later today.

-- Dexuan
RE: [PATCH] vsock: only load vmci transport on VMware hypervisor by default
> From: Stefan Hajnoczi [mailto:stefa...@redhat.com]
>
> > CID is not really used by us, because we only support guest<->host
> > communication, and don't support guest<->guest communication. The
> > Hyper-V host references every VM by VmID (which is invisible to the
> > VM), and a VM can only talk to the host via this feature.
>
> Applications running inside the guest should use VMADDR_CID_HOST (2) to
> connect to the host, even on Hyper-V.

I have no objection, and this patch does support this usage by
user-space applications.

> By the way, we should collaborate on a test suite and a vsock(7) man
> page that documents the semantics of AF_VSOCK sockets. This way our
> transports will have the same behavior and AF_VSOCK applications will
> work on all 3 hypervisors.

I can't agree more. :-)

BTW, I have been using Rolf's test suite to test my patch:
https://github.com/rn/virtsock/tree/master/c
Maybe this can be a good starting point.

> Not all features need to be supported. For example, VMCI supports
> SOCK_DGRAM while Hyper-V and virtio do not. But features that are
> available should behave identically.

I totally agree, though I'm afraid Hyper-V may have a few more
limitations compared to VMware/KVM, due to the GUID <--> <cid, port>
mapping.

> > Can we use the 'protocol' parameter in the socket() function:
> >     int socket(int domain, int type, int protocol)
> >
> > IMO currently the 'protocol' is not really used.
> > I think we can modify __vsock_core_init() to allow multiple transport
> > layers to be registered, and we can define different 'protocol'
> > numbers for VMware/KVM/Hyper-V, and ask the application to explicitly
> > specify what should be used. Considering compatibility, we can use
> > the default transport in a given VM depending on the underlying
> > hypervisor.
>
> I think AF_VSOCK should hide the transport from users/applications.
Ideally yes, but let's consider the KVM-on-KVM nested scenario: when an
application in the Level-1 VM creates an AF_VSOCK socket and calls
connect() on it, how can we know if the app is trying to connect to the
Level-0 host, or to the Level-2 VM? We can't. That's why I propose we
should use the 'protocol' parameter to distinguish between "to guest"
and "to host".

With my proposal, in the above scenario, by default (the 'protocol' is
0), we choose the "to host" transport layer when socket() is called; if
the userspace app explicitly specifies "to guest", we choose the "to
guest" transport layer when socket() is called. This way, the connect(),
bind(), etc. can work automatically. (Of course, the default transport
for a given VM can be better chosen if we detect which nested level the
app is running on.)

> Think of same-on-same nested virtualization: VMware-on-VMware or
> KVM-on-KVM. In that case specifying VMCI or virtio doesn't help.
>
> We'd still need to distinguish between "to guest" and "to host"
> (currently VMCI has code to do this but virtio does not).
>
> The natural place to distinguish the destination is when dealing with
> the sockaddr in connect(), bind(), etc.
>
> Stefan

Thanks,
-- Dexuan
RE: [PATCH net-next 3/3] hv_sock: implements Hyper-V transport for Virtual Sockets (AF_VSOCK)
> From: Stefan Hajnoczi [mailto:stefa...@redhat.com]
> Sent: Thursday, August 17, 2017 07:56
> To: Dexuan Cui <de...@microsoft.com>
> On Tue, Aug 15, 2017 at 10:18:41PM +, Dexuan Cui wrote:
> > +static u32 hvs_get_local_cid(void)
> > +{
> > +	return VMADDR_CID_ANY;
> > +}
>
> Interesting concept: the guest never knows its CID. This is nice from a
> live migration perspective. Currently VMCI and virtio adjust listen
> socket local CIDs after migration.
>
> > +static bool hvs_stream_allow(u32 cid, u32 port)
> > +{
> > +	static const u32 valid_cids[] = {
> > +		VMADDR_CID_ANY,
>
> Is this for loopback?

No, we don't support loopback in Linux VM, at least for now.
In our Linux implementation, Linux VM can only connect to the host, and
here when Linux VM calls connect(), I treat VMADDR_CID_ANY the same as
VMADDR_CID_HOST.

> > +		VMADDR_CID_HOST,
> > +	};
> > +	int i;
> > +
> > +	/* The host's port range [MIN_HOST_EPHEMERAL_PORT, 0x) is
> > +	 * reserved as ephemeral ports, which are used as the host's ports
> > +	 * when the host initiates connections.
> > +	 */
> > +	if (port > MAX_HOST_LISTEN_PORT)
> > +		return false;
>
> Without this if statement the guest will attempt to connect. I guess
> there will be no listen sockets above MAX_HOST_LISTEN_PORT, so the
> connection attempt will fail.

You're correct.
To use the vsock common infrastructure, we have to map Hyper-V's
GUID <VM_ID, Service_ID> to int <cid, port>, and hence we must limit the
port range we can listen() on to [0, MAX_LISTEN_PORT], i.e. we can only
use half of the whole 32-bit port space for listen(). This is detailed
in the long comments starting at about Line 100.

> ...but hardcode this knowledge into the guest driver?

I'd like the guest's connect() to fail immediately here.
IMO this is better than a connect timeout. :-)

Thanks,
-- Dexuan
RE: [PATCH net-next 2/3] vsock: fix vsock_dequeue/enqueue_accept race
> From: Stefan Hajnoczi [mailto:stefa...@redhat.com]
> Sent: Thursday, August 17, 2017 07:06
>
> On Tue, Aug 15, 2017 at 10:15:39PM +0000, Dexuan Cui wrote:
> > With the current code, when vsock_dequeue_accept() is removing a sock
> > from the list, nothing prevents vsock_enqueue_accept() from adding a
> > new sock into the list concurrently. We should add a lock to protect
> > the list.
>
> The listener sock is locked, preventing concurrent modification. I have
> checked both the virtio and vmci transports. Can you post an example
> where the listener sock isn't locked?
>
> Stefan

Sorry, I was not careful when checking the vmci code. Please ignore the
patch. Now I realize the expectation is that the individual transport
drivers should do the locking for vsock_enqueue_accept(), while for
vsock_dequeue_accept() the locking is done by the common vsock driver.

Thanks,
-- Dexuan
RE: [PATCH] vsock: only load vmci transport on VMware hypervisor by default
> From: Jorgen S. Hansen [mailto:jhan...@vmware.com]
> Sent: Thursday, August 17, 2017 08:17
>
> > Putting aside nested virtualization, I want to load the transport
> > (vmci, Hyper-V, vsock) for which there is paravirtualized hardware
> > present inside the guest.
>
> Good points. Completely agree that this is the desired behavior for a
> guest.
>
> > It's a little trickier on the host side (doesn't matter for Hyper-V
> > and probably also doesn't for VMware) because the host-side driver is
> > a software device with no hardware backing it. In KVM we assume the
> > vhost_vsock.ko kernel module will be loaded sufficiently early.
>
> Since the vmci driver is currently tied to PF_VSOCK it hasn't been a
> problem, but on the host side the VMCI driver has no hardware backing
> it either, so when we move to a more appropriate solution, this will be
> an issue for VMCI as well. I'll check our shipped products, but they
> most likely assume that if an upstreamed vmci module is present, it
> will be loaded automatically.

Hyper-V Sockets is a standard feature of VMBus v4.0, so we can easily
know we can and should load the transport iff vmbus_proto_version >=
VERSION_WIN10.

> > Things get trickier with nested virtualization because the VM might
> > want to talk to its host but also to its nested VMs. The simple way
> > of fixing this would be to allow two transports loaded simultaneously
> > and route traffic destined to CID 2 to the host transport and all
> > other traffic to the guest transport.

This sounds a little tricky to me. CID is not really used by us, because
we only support guest<->host communication, and don't support
guest<->guest communication. The Hyper-V host references every VM by
VmID (which is invisible to the VM), and a VM can only talk to the host
via this feature.

> This is close to the routing the VMCI driver does in a nested
> environment, but that is with the assumption that there is only one
> type of transport.
Having > two > different transports would require that we delay resolving the transport type > until the socket endpoint has been bound to an address. Things get trickier if > listening sockets use VMADDR_CID_ANY - if only one transport is present, this > would allow the socket to accept connections from both guests and outer host, > but with multiple transports that won’t work, since we can’t associate a > socket > with a transport until the socket is bound. > > > > > Perhaps we should discuss these cases a bit more to figure out how to > > avoid conflicts over MODULE_ALIAS_NETPROTO(PF_VSOCK). > > Agreed. Can we use the 'protocol' parameter in the socket() function: int socket(int domain, int type, int protocol) IMO currently the 'protocol' is not really used. I think we can modify __vsock_core_init() to allow multiple transport layers to be registered, and we can define different 'protocol' numbers for VMware/KVM/Hyper-V, and ask the application to explicitly specify what should be used. Considering compatibility, we can use the default transport in a given VM depending on the underlying hypervisor. -- Dexuan
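The dual-transport routing idea discussed in this thread (traffic destined to CID 2 goes to the host transport, everything else to the guest transport) can be sketched in plain C. This is only an illustration of the dispatch rule under discussion, not kernel code — `select_transport()` and the enum are invented names; the one real detail assumed here is that `VMADDR_CID_HOST` is 2 in the vsock ABI:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical transport IDs -- not real kernel identifiers. */
enum transport { TRANSPORT_GUEST, TRANSPORT_HOST };

#define CID_HOST 2 /* VMADDR_CID_HOST in the real vsock ABI */

/* Route a connection by destination CID: CID 2 always means "the
 * outer host"; any other CID is assumed to name a nested guest. */
enum transport select_transport(uint32_t dst_cid)
{
	return dst_cid == CID_HOST ? TRANSPORT_HOST : TRANSPORT_GUEST;
}
```

As the follow-up notes, this simple rule breaks down for listening sockets bound to VMADDR_CID_ANY, since the transport cannot be chosen until the destination is known.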
RE: [PATCH net-next 2/3] vsock: fix vsock_dequeue/enqueue_accept race
> > On Aug 16, 2017, at 12:15 AM, Dexuan Cui <de...@microsoft.com> wrote: > > With the current code, when vsock_dequeue_accept() is removing a sock > > from the list, nothing prevents vsock_enqueue_accept() from adding a new > > sock into the list concurrently. We should add a lock to protect the list. > > For the VMCI socket transport, we always lock the sockets before calling into > vsock_enqueue_accept and af_vsock.c locks the socket before calling > vsock_dequeue_accept, so from our point of view these operations are already > protected, but with finer granularity than a single global lock. As far as I > can see, > the virtio transport also locks the socket before calling > vsock_enqueue_accept, > so they should be fine with the current version as well, but Stefan can > comment > on that. > > Jorgen Hi Jorgen, Thanks, you're correct. Please ignore this patch. I'll update the hv_sock driver to add proper lock_sock()/release_sock(). Thanks, -- Dexuan
RE: [PATCH] vsock: only load vmci transport on VMware hypervisor by default
> From: David Miller [mailto:da...@davemloft.net] > Sent: Thursday, August 17, 2017 10:04 > I would avoid module parameters at all costs. > > It is the worst possible interface for users of your software. > > You really need to fundamentally solve the problems related to making > sure the proper modules for the VM actually present on the system get > loaded when necessary rather than adding hacks like this. > > Unlike a proper solution, these hacks are ugly but have to stay around > forever once you put them in place. Thanks for reminding me again, David! :-) I'll try to figure out the correct solution. Thanks, -- Dexuan
RE: [PATCH net-next 1/3] VMCI: only load on VMware hypervisor
> From: Dexuan Cui > Sent: Wednesday, August 16, 2017 15:34 > > From: Jorgen S. Hansen [mailto:jhan...@vmware.com] > > > Without the patch, vmw_vsock_vmci_transport.ko and vmw_vmci.ko can > > > automatically load when an application creates an AF_VSOCK socket. > > > > > > This is the expected good behavior on VMware hypervisor, but as we > > > are going to add hv_sock.ko (i.e. Hyper-V transport for AF_VSOCK), we > > > should make sure vmw_vsock_vmci_transport.ko doesn't load on Hyper- > V, > > > otherwise there is a -EBUSY conflict when both > vmw_vsock_vmci_transport.ko > > > and hv_sock.ko try to call vsock_core_init() on Hyper-V. > > > > The VMCI driver (vmw_vmci.ko) is used both by the VMware guest support > > (VMware Tools primarily) and by our Workstation product. Always > disabling the > > VMCI driver on Hyper-V means that user won’t be able to run Workstation > > nested in Linux VMs on Hyper-V. Since the VMCI driver itself isn’t the > problem > > here, maybe we could move the check to vmw_vsock_vmci_transport.ko? > > Ideally, there should be some way for a user to have access to both > protocols, > > but for now disabling the VMCI socket transport for Hyper-V (possibly with > a > > module option to skip that check and always load it) but leaving the VMCI > driver > > functional would be better, > > > > Jorgen > > Thank you for explaining the background! > Then I'll make a new patch, following your suggestion. > > -- Dexuan Hi Jorgen, David, Just now I posted a new patch "[PATCH] vsock: only load vmci transport on VMware hypervisor by default" to replace this patch. @Jorgen: FWIW, with the new patch, when I create an AF_VSOCK sockets on Hyper-V, vmw_vmci.ko is also automatically loaded and 3 lines of kernel messages are printed, but I think I'm OK with this, since it's harmless. -- Dexuan
[PATCH] vsock: only load vmci transport on VMware hypervisor by default
Without the patch, vmw_vsock_vmci_transport.ko can automatically load when an application creates an AF_VSOCK socket. This is the expected good behavior on VMware hypervisor, but as we are going to add hv_sock.ko (i.e. Hyper-V transport for AF_VSOCK), we should make sure vmw_vsock_vmci_transport.ko can't load on Hyper-V, otherwise there is a -EBUSY conflict when both vmw_vsock_vmci_transport.ko and hv_sock.ko try to call vsock_core_init() on Hyper-V. On the other hand, hv_sock.ko can only load on Hyper-V, because it depends on hv_vmbus.ko, which detects Hyper-V in hv_acpi_init(). KVM's vsock_virtio_transport doesn't have the issue because it doesn't define MODULE_ALIAS_NETPROTO(PF_VSOCK). The patch also adds a module parameter "skip_hypervisor_check" for vmw_vsock_vmci_transport.ko. Signed-off-by: Dexuan Cui <de...@microsoft.com> Cc: Alok Kataria <akata...@vmware.com> Cc: Andy King <ack...@vmware.com> Cc: Adit Ranadive <ad...@vmware.com> Cc: George Zhang <georgezh...@vmware.com> Cc: Jorgen Hansen <jhan...@vmware.com> Cc: K. Y. Srinivasan <k...@microsoft.com> Cc: Haiyang Zhang <haiya...@microsoft.com> Cc: Stephen Hemminger <sthem...@microsoft.com> --- net/vmw_vsock/Kconfig | 2 +- net/vmw_vsock/vmci_transport.c | 11 +++ 2 files changed, 12 insertions(+), 1 deletion(-) diff --git a/net/vmw_vsock/Kconfig b/net/vmw_vsock/Kconfig index a24369d..3f52929 100644 --- a/net/vmw_vsock/Kconfig +++ b/net/vmw_vsock/Kconfig @@ -17,7 +17,7 @@ config VSOCKETS config VMWARE_VMCI_VSOCKETS tristate "VMware VMCI transport for Virtual Sockets" - depends on VSOCKETS && VMWARE_VMCI + depends on VSOCKETS && VMWARE_VMCI && HYPERVISOR_GUEST help This module implements a VMCI transport for Virtual Sockets. 
diff --git a/net/vmw_vsock/vmci_transport.c b/net/vmw_vsock/vmci_transport.c index 10ae782..c068873 100644 --- a/net/vmw_vsock/vmci_transport.c +++ b/net/vmw_vsock/vmci_transport.c @@ -16,6 +16,7 @@ #include #include #include +#include <asm/hypervisor.h> #include #include #include @@ -73,6 +74,10 @@ struct vmci_transport_recv_pkt_info { struct vmci_transport_packet pkt; }; +static bool skip_hypervisor_check; +module_param(skip_hypervisor_check, bool, 0444); +MODULE_PARM_DESC(skip_hypervisor_check, "If set, attempt to load on non-VMware platforms"); + static LIST_HEAD(vmci_transport_cleanup_list); static DEFINE_SPINLOCK(vmci_transport_cleanup_lock); static DECLARE_WORK(vmci_transport_cleanup_work, vmci_transport_cleanup); @@ -2085,6 +2090,12 @@ static int __init vmci_transport_init(void) { int err; + /* Check if we are running on VMware's hypervisor and bail out + * if we are not. + */ + if (!skip_hypervisor_check && x86_hyper != &x86_hyper_vmware) + return -ENODEV; + /* Create the datagram handle that we will use to send and receive all * VSocket control messages for this context. */ -- 2.7.4
RE: [PATCH net-next 1/3] VMCI: only load on VMware hypervisor
> From: Jorgen S. Hansen [mailto:jhan...@vmware.com] > > Without the patch, vmw_vsock_vmci_transport.ko and vmw_vmci.ko can > > automatically load when an application creates an AF_VSOCK socket. > > > > This is the expected good behavior on VMware hypervisor, but as we > > are going to add hv_sock.ko (i.e. Hyper-V transport for AF_VSOCK), we > > should make sure vmw_vsock_vmci_transport.ko doesn't load on Hyper-V, > > otherwise there is a -EBUSY conflict when both vmw_vsock_vmci_transport.ko > > and hv_sock.ko try to call vsock_core_init() on Hyper-V. > > The VMCI driver (vmw_vmci.ko) is used both by the VMware guest support > (VMware Tools primarily) and by our Workstation product. Always disabling the > VMCI driver on Hyper-V means that user won’t be able to run Workstation > nested in Linux VMs on Hyper-V. Since the VMCI driver itself isn’t the problem > here, maybe we could move the check to vmw_vsock_vmci_transport.ko? > Ideally, there should be some way for a user to have access to both protocols, > but for now disabling the VMCI socket transport for Hyper-V (possibly with a > module option to skip that check and always load it) but leaving the VMCI > driver > functional would be better, > > Jorgen Thank you for explaining the background! Then I'll make a new patch, following your suggestion. -- Dexuan
RE: [PATCH net-next 1/3] VMCI: only load on VMware hypervisor
> From: David Miller [mailto:da...@davemloft.net] > Sent: Wednesday, August 16, 2017 11:07 > > From: Dexuan Cui <de...@microsoft.com> > Date: Tue, 15 Aug 2017 22:13:29 +0000 > > > + /* > > + * Check if we are running on VMware's hypervisor and bail out > > + * if we are not. > > + */ > > + if (x86_hyper != &x86_hyper_vmware) > > + return -ENODEV; > > This symbol is only available when CONFIG_HYPERVISOR_GUEST is defined. > But this driver does not have a Kconfig dependency on that symbol so > the build can fail in some configurations. Hi David, It looks like modern Linux distros typically have CONFIG_HYPERVISOR_GUEST=y by default, but I agree here we should make the dependency explicit: --- a/drivers/misc/vmw_vmci/Kconfig +++ b/drivers/misc/vmw_vmci/Kconfig @@ -4,7 +4,7 @@ config VMWARE_VMCI tristate "VMware VMCI Driver" - depends on X86 && PCI + depends on X86 && PCI && HYPERVISOR_GUEST help This is VMware's Virtual Machine Communication Interface. It enables high-speed communication between host and guest in a virtual environment. And it looks like it's not a bad thing to add the dependency, because some existing VMware drivers have had the dependency on CONFIG_HYPERVISOR_GUEST=y: drivers/input/mouse/vmmouse.c (MOUSE_PS2_VMMOUSE) drivers/misc/vmw_balloon.c (VMWARE_BALLOON) Do you want me to submit a v2 for this patch with the Kconfig change? -- Dexuan
[PATCH net-next 3/3] hv_sock: implements Hyper-V transport for Virtual Sockets (AF_VSOCK)
Hyper-V Sockets (hv_sock) supplies a byte-stream based communication mechanism between the host and the guest. It uses VMBus ringbuffer as the transportation layer. With hv_sock, applications between the host (Windows 10, Windows Server 2016 or newer) and the guest can talk with each other using the traditional socket APIs. More info about Hyper-V Sockets is available here: "Make your own integration services": https://docs.microsoft.com/en-us/virtualization/hyper-v-on-windows/user-guide/make-integration-service The patch implements the necessary support in Linux guest by introducing a new vsock transport for AF_VSOCK. Signed-off-by: Dexuan Cui <de...@microsoft.com> Cc: K. Y. Srinivasan <k...@microsoft.com> Cc: Haiyang Zhang <haiya...@microsoft.com> Cc: Stephen Hemminger <sthem...@microsoft.com> Cc: Andy King <ack...@vmware.com> Cc: Dmitry Torokhov <d...@vmware.com> Cc: George Zhang <georgezh...@vmware.com> Cc: Jorgen Hansen <jhan...@vmware.com> Cc: Reilly Grant <gra...@vmware.com> Cc: Asias He <as...@redhat.com> Cc: Stefan Hajnoczi <stefa...@redhat.com> Cc: Vitaly Kuznetsov <vkuzn...@redhat.com> Cc: Cathy Avery <cav...@redhat.com> Cc: Rolf Neugebauer <rolf.neugeba...@docker.com> Cc: Marcelo Cerri <marcelo.ce...@canonical.com> --- MAINTAINERS | 1 + net/vmw_vsock/Kconfig| 12 + net/vmw_vsock/Makefile | 3 + net/vmw_vsock/hyperv_transport.c | 890 +++ 4 files changed, 906 insertions(+) create mode 100644 net/vmw_vsock/hyperv_transport.c diff --git a/MAINTAINERS b/MAINTAINERS index 2db0f8c..dae0573 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -6279,6 +6279,7 @@ F:drivers/net/hyperv/ F: drivers/scsi/storvsc_drv.c F: drivers/uio/uio_hv_generic.c F: drivers/video/fbdev/hyperv_fb.c +F: net/vmw_vsock/hyperv_transport.c F: include/linux/hyperv.h F: tools/hv/ F: Documentation/ABI/stable/sysfs-bus-vmbus diff --git a/net/vmw_vsock/Kconfig b/net/vmw_vsock/Kconfig index 8831e7c..a24369d 100644 --- a/net/vmw_vsock/Kconfig +++ b/net/vmw_vsock/Kconfig @@ -46,3 +46,15 @@ config 
VIRTIO_VSOCKETS_COMMON This option is selected by any driver which needs to access the virtio_vsock. The module will be called vmw_vsock_virtio_transport_common. + +config HYPERV_VSOCKETS + tristate "Hyper-V transport for Virtual Sockets" + depends on VSOCKETS && HYPERV + help + This module implements a Hyper-V transport for Virtual Sockets. + + Enable this transport if your Virtual Machine host supports Virtual + Sockets over Hyper-V VMBus. + + To compile this driver as a module, choose M here: the module will be + called hv_sock. If unsure, say N. diff --git a/net/vmw_vsock/Makefile b/net/vmw_vsock/Makefile index 09fc2eb..e63d574 100644 --- a/net/vmw_vsock/Makefile +++ b/net/vmw_vsock/Makefile @@ -2,6 +2,7 @@ obj-$(CONFIG_VSOCKETS) += vsock.o obj-$(CONFIG_VMWARE_VMCI_VSOCKETS) += vmw_vsock_vmci_transport.o obj-$(CONFIG_VIRTIO_VSOCKETS) += vmw_vsock_virtio_transport.o obj-$(CONFIG_VIRTIO_VSOCKETS_COMMON) += vmw_vsock_virtio_transport_common.o +obj-$(CONFIG_HYPERV_VSOCKETS) += hv_sock.o vsock-y += af_vsock.o af_vsock_tap.o vsock_addr.o @@ -11,3 +12,5 @@ vmw_vsock_vmci_transport-y += vmci_transport.o vmci_transport_notify.o \ vmw_vsock_virtio_transport-y += virtio_transport.o vmw_vsock_virtio_transport_common-y += virtio_transport_common.o + +hv_sock-y += hyperv_transport.o diff --git a/net/vmw_vsock/hyperv_transport.c b/net/vmw_vsock/hyperv_transport.c new file mode 100644 index 000..1913b38 --- /dev/null +++ b/net/vmw_vsock/hyperv_transport.c @@ -0,0 +1,890 @@ +/* + * Hyper-V transport for vsock + * + * Hyper-V Sockets supplies a byte-stream based communication mechanism + * between the host and the VM. This driver implements the necessary + * support in the VM by introducing the new vsock transport. + * + * Copyright (c) 2017, Microsoft Corporation. + * + * This program is free software; you can redistribute it and/or modify it + * under the terms and conditions of the GNU General Public License, + * version 2, as published by the Free Software Foundation. 
+ * + * This program is distributed in the hope it will be useful, but WITHOUT + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or + * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for + * more details. + * + */ +#include <linux/module.h> +#include <linux/vmalloc.h> +#include <linux/hyperv.h> +#include <net/sock.h> +#include <net/af_vsock.h> + +/* The host side's design of the feature requires 6 exact 4KB pages for + * recv/send rings respectively -- this is suboptimal considering memory + * consumption, however unluckily we have to live with it, before the + * host comes up with a better design in the future. + */ +#define PAGE_SIZE_4K 4096 +#define RINGBUFFER_
[PATCH net-next 2/3] vsock: fix vsock_dequeue/enqueue_accept race
With the current code, when vsock_dequeue_accept() is removing a sock from the list, nothing prevents vsock_enqueue_accept() from adding a new sock into the list concurrently. We should add a lock to protect the list. Signed-off-by: Dexuan Cui <de...@microsoft.com> Cc: Andy King <ack...@vmware.com> Cc: Dmitry Torokhov <d...@vmware.com> Cc: George Zhang <georgezh...@vmware.com> Cc: Jorgen Hansen <jhan...@vmware.com> Cc: Reilly Grant <gra...@vmware.com> Cc: Asias He <as...@redhat.com> Cc: Stefan Hajnoczi <stefa...@redhat.com> Cc: Vitaly Kuznetsov <vkuzn...@redhat.com> Cc: Cathy Avery <cav...@redhat.com> Cc: K. Y. Srinivasan <k...@microsoft.com> Cc: Haiyang Zhang <haiya...@microsoft.com> Cc: Stephen Hemminger <sthem...@microsoft.com> --- net/vmw_vsock/af_vsock.c | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c index dfc8c51e..b7b2c66 100644 --- a/net/vmw_vsock/af_vsock.c +++ b/net/vmw_vsock/af_vsock.c @@ -126,6 +126,7 @@ static struct proto vsock_proto = { static const struct vsock_transport *transport; static DEFINE_MUTEX(vsock_register_mutex); +static DEFINE_SPINLOCK(vsock_accept_queue_lock); /**** EXPORTS ****/ @@ -406,7 +407,10 @@ void vsock_enqueue_accept(struct sock *listener, struct sock *connected) sock_hold(connected); sock_hold(listener); + + spin_lock(&vsock_accept_queue_lock); list_add_tail(&vconnected->accept_queue, &vlistener->accept_queue); + spin_unlock(&vsock_accept_queue_lock); } EXPORT_SYMBOL_GPL(vsock_enqueue_accept); @@ -423,7 +427,10 @@ static struct sock *vsock_dequeue_accept(struct sock *listener) vconnected = list_entry(vlistener->accept_queue.next, struct vsock_sock, accept_queue); + spin_lock(&vsock_accept_queue_lock); list_del_init(&vconnected->accept_queue); + spin_unlock(&vsock_accept_queue_lock); + sock_put(listener); /* The caller will need a reference on the connected socket so we let * it call sock_put(). -- 2.7.4
[PATCH net-next 1/3] VMCI: only load on VMware hypervisor
Without the patch, vmw_vsock_vmci_transport.ko and vmw_vmci.ko can automatically load when an application creates an AF_VSOCK socket. This is the expected good behavior on VMware hypervisor, but as we are going to add hv_sock.ko (i.e. Hyper-V transport for AF_VSOCK), we should make sure vmw_vsock_vmci_transport.ko doesn't load on Hyper-V, otherwise there is a -EBUSY conflict when both vmw_vsock_vmci_transport.ko and hv_sock.ko try to call vsock_core_init() on Hyper-V. On the other hand, hv_sock.ko can only load on Hyper-V, because it depends on hv_vmbus.ko, which detects Hyper-V in hv_acpi_init(). KVM's vsock_virtio_transport doesn't have the issue because it doesn't define MODULE_ALIAS_NETPROTO(PF_VSOCK). Signed-off-by: Dexuan Cui <de...@microsoft.com> Cc: Alok Kataria <akata...@vmware.com> Cc: Andy King <ack...@vmware.com> Cc: Adit Ranadive <ad...@vmware.com> Cc: George Zhang <georgezh...@vmware.com> Cc: Jorgen Hansen <jhan...@vmware.com> Cc: K. Y. Srinivasan <k...@microsoft.com> Cc: Haiyang Zhang <haiya...@microsoft.com> Cc: Stephen Hemminger <sthem...@microsoft.com> --- drivers/misc/vmw_vmci/vmci_driver.c | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/drivers/misc/vmw_vmci/vmci_driver.c b/drivers/misc/vmw_vmci/vmci_driver.c index d7eaf1e..1789ea7 100644 --- a/drivers/misc/vmw_vmci/vmci_driver.c +++ b/drivers/misc/vmw_vmci/vmci_driver.c @@ -19,6 +19,7 @@ #include #include #include +#include <asm/hypervisor.h> #include "vmci_driver.h" #include "vmci_event.h" @@ -58,6 +59,13 @@ static int __init vmci_drv_init(void) int vmci_err; int error; + /* + * Check if we are running on VMware's hypervisor and bail out + * if we are not. + */ + if (x86_hyper != &x86_hyper_vmware) + return -ENODEV; + vmci_err = vmci_event_init(); if (vmci_err < VMCI_SUCCESS) { pr_err("Failed to initialize VMCIEvent (result=%d)\n", -- 2.7.4
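The -EBUSY conflict described in the commit message comes from af_vsock keeping a single registered transport. A minimal userspace model of that behavior (the real `vsock_core_init()` takes a `struct vsock_transport *` and serializes on a mutex; the names and the bare pointer below are illustrative only):

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_EBUSY (-16) /* numeric value of kernel -EBUSY */

/* af_vsock holds one global transport pointer; a second registration
 * attempt fails -- this is the vmci-vs-hv_sock conflict in miniature. */
static const void *registered_transport;

int demo_vsock_core_init(const void *transport)
{
	if (registered_transport)
		return DEMO_EBUSY;
	registered_transport = transport;
	return 0;
}
```

Because module autoloading via MODULE_ALIAS_NETPROTO(PF_VSOCK) can pull in whichever transport module is aliased first, the patch instead prevents the wrong module from initializing at all on the wrong hypervisor.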
[PATCH net-next 0/3] add Hyper-V transport for Virtual Sockets
Hyper-V Sockets (hv_sock) supplies a byte-stream based communication mechanism between the host and the guest. It uses VMBus ringbuffer as the transportation layer. PATCH 01 and 02 are for VMCI and the common infrastructure vsock. PATCH 03 implements the necessary support in Linux guest by introducing a new vsock transport for AF_VSOCK. Please review them. Note: there are some other supporting fixes in the VMBus driver. I'll post them separately for the char-misc tree. PS, there was an old implementation of Hyper-V Sockets posted last year: https://patchwork.kernel.org/patch/9244467/, which was not accepted. The biggest challenge was why Hyper-V Sockets required a new address family, and I explained that was because of its different end point format. Compared to the old implementation, this new implementation maps the Hyper-V Sockets end point format to vsock's <cid, port> format, and hence it manages to share the common vsock infrastructure to greatly reduce duplicate code, and avoid adding a new address family. The details are documented in PATCH 03. Dexuan Cui (3): VMCI: only load on VMware hypervisor vsock: fix vsock_dequeue/enqueue_accept race hv_sock: implements Hyper-V transport for Virtual Sockets (AF_VSOCK) MAINTAINERS | 1 + drivers/misc/vmw_vmci/vmci_driver.c | 8 + net/vmw_vsock/Kconfig | 12 + net/vmw_vsock/Makefile | 3 + net/vmw_vsock/af_vsock.c | 7 + net/vmw_vsock/hyperv_transport.c | 890 ++++++++++ 6 files changed, 921 insertions(+) create mode 100644 net/vmw_vsock/hyperv_transport.c -- 2.7.4
[PATCH] netvsc: fix use-after-free in netvsc_change_mtu()
'nvdev' is freed in rndis_filter_device_remove -> netvsc_device_remove -> free_netvsc_device, so we mustn't access it, before it's re-created in rndis_filter_device_add -> netvsc_device_add. Signed-off-by: Dexuan Cui <de...@microsoft.com> Cc: "K. Y. Srinivasan" <k...@microsoft.com> Cc: Haiyang Zhang <haiya...@microsoft.com> Cc: Stephen Hemminger <sthem...@microsoft.com> --- drivers/net/hyperv/netvsc_drv.c | 15 +++++++++++---- 1 file changed, 11 insertions(+), 4 deletions(-) diff --git a/drivers/net/hyperv/netvsc_drv.c b/drivers/net/hyperv/netvsc_drv.c index 2d3cdb0..bc05c89 100644 --- a/drivers/net/hyperv/netvsc_drv.c +++ b/drivers/net/hyperv/netvsc_drv.c @@ -859,15 +859,22 @@ static int netvsc_change_mtu(struct net_device *ndev, int mtu) if (ret) goto out; + memset(&device_info, 0, sizeof(device_info)); + device_info.ring_size = ring_size; + device_info.num_chn = nvdev->num_chn; + device_info.max_num_vrss_chns = nvdev->num_chn; + ndevctx->start_remove = true; rndis_filter_device_remove(hdev, nvdev); + /* 'nvdev' has been freed in rndis_filter_device_remove() -> + * netvsc_device_remove() -> free_netvsc_device(). + * We mustn't access it before it's re-created in + * rndis_filter_device_add() -> netvsc_device_add(). + */ + ndev->mtu = mtu; - memset(&device_info, 0, sizeof(device_info)); - device_info.ring_size = ring_size; - device_info.num_chn = nvdev->num_chn; - device_info.max_num_vrss_chns = nvdev->num_chn; rndis_filter_device_add(hdev, &device_info); out: -- 2.7.4
Mellanox ConnectX-3 VF driver can't work with 16 CPUs?
Hi, While trying SR-IOV with a Linux guest running on Hyper-V, I found this issue: the VF driver can't work if the guest has 16 virtual CPUs (fewer vCPUs, e.g. 8, work fine): [9.927820] mlx4_core: Mellanox ConnectX core driver v2.2-1 (Feb, 2014) [9.927882] mlx4_core: Initializing b961:00:02.0 [9.970994] mlx4_core b961:00:02.0: Detected virtual function - running in slave mode [9.976783] mlx4_core b961:00:02.0: Sending reset [9.985858] mlx4_core b961:00:02.0: Sending vhcr0 [ 10.004855] mlx4_core b961:00:02.0: HCA minimum page size:512 [ 10.010465] mlx4_core b961:00:02.0: Timestamping is not supported in slave mode [ 10.203065] mlx4_core b961:00:02.0: Failed to initialize event queue table, aborting [ 10.226728] mlx4_core: probe of b961:00:02.0 failed with error -12 I'm using the mainline kernel (4.10.0-rc4). Any idea? Thanks, -- Dexuan
RE: [PATCH v18 net-next 1/1] hv_sock: introduce Hyper-V Sockets
> From: David Miller [mailto:da...@davemloft.net] > Sent: Wednesday, July 27, 2016 1:45 > To: Dexuan Cui <de...@microsoft.com> > > From: Dexuan Cui <de...@microsoft.com> > Date: Tue, 26 Jul 2016 07:09:41 + > > > I googled "S390 hypervisor socket" but didn't find anything related (I > > think). > > That would be net/iucv/ Thanks for the info! I'll look into this. > There's also VMWare's stuff under net/vmw_vsock > > It's just absolutely rediculous to make a new hypervisor socket > interface over and over again, so much code duplication and > replication. I agree on this principle of avoiding duplication. However my feeling is: IMHO different hypervisor sockets were developed independently without coordination, and the implementation details could be so different that a sufficiently generic framework/infrastructure is difficult, e.g., at first glance, it looks like AF_IUCV is quite different from AF_VSOCK, and this might explain why AF_VSOCK wasn't built on AF_IUCV(?). I'll dig more into AF_IUCV, AF_VSOCK and AF_HYPERV and figure out the best direction to go. Thanks, -- Dexuan
RE: [PATCH v18 net-next 1/1] hv_sock: introduce Hyper-V Sockets
> From: netdev-ow...@vger.kernel.org [mailto:netdev- > ow...@vger.kernel.org] On Behalf Of Dexuan Cui > Sent: Tuesday, July 26, 2016 21:22 > ... > This is because, the design of AF_HYPERV in the Hyper-V host side is > suboptimal IMHO (the current host side design requires the least > change in the host side, but it makes my life difficult. :-( It may > change in the future, but luckily we have to live with it at present): BTW, sorry for my typo: "luckily" should be "unluckily". Thanks, -- Dexuan
RE: [PATCH v18 net-next 1/1] hv_sock: introduce Hyper-V Sockets
> From: Michal Kubecek [mailto:mkube...@suse.cz] > Sent: Tuesday, July 26, 2016 17:57 > ... > On Tue, Jul 26, 2016 at 07:09:41AM +0000, Dexuan Cui wrote: > > ... I don't think Michal > > Kubecek was suggesting I build my code using the existing AF_VSOCK > > code(?) I think he was only asking me to clarify the way I used to write > > the text to explain why I can't fit my code into the existing AF_VSOCK > > code. BTW, AF_VSOCK is not on S390, I think. > > Actually, I believe building on top of existing AF_VSOCK should be the > first thought and only if this way shows unfeasible, one should consider > a completely new implementation from scratch. After all, when VMware > was upstreaming vsock, IIRC they had to work hard on making it > a generic solution rather than a one purpose tool tailored for their specific > use > case. > > What I wanted to say in that mail was that I didn't find the reasoning > very convincing. The only point that wasn't like "AF_VSOCK has many > features we don't need" was the incompatible addressing scheme. The > cover letter text didn't convince me it was given as much thought as it > deserved. I felt - and it still feel - that the option of building on > top of vsock wasn't considered seriously enough. Hi Michal, Thank you very much for the detailed explanation! Just now I read your previous reply again and I think I actually failed to get your point and my reply was inappropriate. I'm sorry about that. When I first made the patch last July, I did try to build it on AF_VSOCK, but my feeling was that I had to make big changes to AF_VSOCK code and its related transport layer driver's code. My feeling was that the AF_VSOCK solution's implementation is not generic enough for me to fit mine in (easily). To make my feeling more concrete so I can answer your question properly, I'll be figuring out exactly how big the required changes will be -- I'm afraid this would take non-trivial time, but I'll try to finish the investigation ASAP. 
The biggest challenge is the incompatible addressing scheme. If you could give some advice, I would be very grateful. > I must also admit I'm a bit confused by your response to the issue of > socket lookup performance. I always thought the main reason to use > special hypervisor sockets instead of TCP/IP over virtual network > devices was efficiency (to avoid the overhead of network protocol > processing). Yes, I agree with you. BTW, IMO hypervisor sockets have an advantage of "zero-configuration". To make TCP/IP work between host/guest, we need to add a NIC to the guest, configure the NIC properly in the guest and find a way to let the host/guest know each other's IP address, etc. With hypervisor sockets, there is almost no such configuration effort. > The fact that traversing a linear linked list under > a global mutex for each socket lookup is not an issue as opening > a connection is going to be slow anyway surprised me therefore. This is because, the design of AF_HYPERV in the Hyper-V host side is suboptimal IMHO (the current host side design requires the least change in the host side, but it makes my life difficult. :-( It may change in the future, but luckily we have to live with it at present): 1) A new connection is treated as a new Hyper-V device, so it has to go through the slow device_register(). Please see vmbus_device_register(). 2) A connection/device must have its own ringbuffer that is shared between host/guest. Allocating the ringbuffer memory in the VM and sharing the memory with the host by messages are both slow, though I didn't measure the exact cost. Please see hvsock_open_connection() -> vmbus_open(). 3) The max length of the linear linked list is 2048, and in practice, typically I guess the length should be small, so my gut feeling is that the list traversing shouldn't be the bottleneck. Having said that, I agree it's good to use some mechanism, like hash table, to speed up the lookup. I'll add this. 
> But > maybe it's fine as the typical use case is going to be small number of > long running connections and traffic performance is going to make for > the connection latency. Yeah, IMO it seems traffic performance and zero-configuration came first when the current host side design was made. > Or there are other advantages, I don't know. > But if that is the case, it would IMHO deserve to be explained. > > Michal Kubecek Thanks, -- Dexuan
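The hash-table speedup promised above ("I'll add this") can be sketched as a port-hashed bucket array: lookup walks one short chain instead of the whole (up to 2048-entry) linear list. This is a userspace illustration, not the actual hv_sock code; the struct and function names are invented:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define HVSOCK_HASH_SIZE 64

struct conn {
	uint32_t local_port;
	struct conn *next; /* chain within one bucket */
};

static struct conn *hash_table[HVSOCK_HASH_SIZE];

static unsigned int bucket_of(uint32_t port)
{
	return port % HVSOCK_HASH_SIZE;
}

void conn_insert(struct conn *c)
{
	unsigned int b = bucket_of(c->local_port);

	c->next = hash_table[b];
	hash_table[b] = c;
}

struct conn *conn_lookup(uint32_t port)
{
	struct conn *c = hash_table[bucket_of(port)];

	while (c && c->local_port != port)
		c = c->next;
	return c;
}
```

With at most 2048 connections spread over 64 buckets, the expected chain length drops from O(n) to a handful of entries, which matters less for connection setup (dominated by device_register() and ring-buffer setup, as explained above) than for per-packet lookups.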
RE: [PATCH v18 net-next 1/1] hv_sock: introduce Hyper-V Sockets
> From: devel [mailto:driverdev-devel-boun...@linuxdriverproject.org] On > Behalf Of Dexuan Cui > ... > > From: David Miller [mailto:da...@davemloft.net] > > ... > > From: Dexuan Cui <de...@microsoft.com> > > Date: Tue, 26 Jul 2016 03:09:16 + > > > > > BTW, during the past month, at least 7 other people also reviewed > > > the patch and gave me quite a few good comments, which have > > > been addressed. > > > > Correction: Several people gave coding style and simple corrections > > to your patch. > > > > Very few gave any review of the _SUBSTANCE_ of your changes. > > > > And the one of the few who did, and suggested you build your > > facilities using the existing S390 hypervisor socket infrastructure, > > you brushed off _IMMEDIATELY_. > > > > That drives me crazy. The one person who gave you real feedback > > you basically didn't consider seriously at all. > > Hi David, > I'm very sorry -- I guess I must have missed something here -- I don't > remember somebody replied with S390 hypervisor socket > infrastructure... I'm re-reading all the replies, trying to locate the > reply and I'll find out why I didn't take it seriously. Sorry in advance. Hi, David, I checked all the comments I received and all my replies (at least I really tried my best to check my Inbox) , but couldn't find the "S390 hypervisor socket infrastructure" mail. I googled "S390 hypervisor socket" but didn't find anything related (I think). I'm really sorry -- could you please give a little more info about it? If you meant https://lkml.org/lkml/2016/7/13/382, I don't think Michal Kubecek was suggesting I build my code using the existing AF_VSOCK code(?) I think he was only asking me to clarify the way I used to write the text to explain why I can't fit my code into the existing AF_VSOCK code. BTW, AF_VSOCK is not on S390, I think. If this is the case, I'm sorry I didn't explain the reason clearer. 
My replies last year explained the reason with more info: https://lkml.org/lkml/2015/7/7/1162 https://lkml.org/lkml/2015/7/17/67 And I thought people agreed that a new address family is justified. Please let me excerpt the most related snippets in my old replies: -- The biggest difference is the size of the endpoint (u32 in AF_VSOCK vs. u128 in AF_HYPERV). In the AF_VSOCK code and the related transport layer (the wrapper ops of VMware's VMCI), the size is widely used in kernel space (and by user space applications). If I have to fit my code to AF_VSOCK code, I would have to mess up the AF_VSOCK code in many places by adding ugly code like: IF the endpoint size is <u32, u32> THEN use the existing logic; ELSE use the new logic; And the user space application has to explicitly handle the different endpoint sizes too. -- Looking forward to your reply! Thanks, -- Dexuan
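The endpoint-size mismatch described above can be made concrete with two illustrative address layouts. The field names are invented for comparison only; the real structures (`struct sockaddr_vm`, and the `struct sockaddr_hv` proposed in this patch) carry additional family and padding fields:

```c
#include <assert.h>
#include <stdint.h>

/* AF_VSOCK identifies a peer with a 32-bit context ID plus a 32-bit
 * port, while Hyper-V Sockets needs two 128-bit GUIDs: one naming
 * the VM and one naming the service. */
struct vsock_ep {
	uint32_t cid;
	uint32_t port;
};

struct hyperv_ep {
	uint8_t vm_id[16];      /* GUID identifying the VM */
	uint8_t service_id[16]; /* GUID identifying the service */
};
```

The 8-byte vs 32-byte gap is why fitting the GUID-based endpoints into the AF_VSOCK code was argued to require invasive "IF endpoint is u32 THEN ... ELSE ..." branching throughout; the later hv_sock submission (PATCH net-next 3/3 above) avoided this by mapping the GUID endpoints onto vsock's <cid, port> instead.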
RE: [PATCH v18 net-next 1/1] hv_sock: introduce Hyper-V Sockets
> From: David Miller [mailto:da...@davemloft.net] > ... > From: Dexuan Cui <de...@microsoft.com> > Date: Tue, 26 Jul 2016 03:09:16 + > > > BTW, during the past month, at least 7 other people also reviewed > > the patch and gave me quite a few good comments, which have > > been addressed. > > Correction: Several people gave coding style and simple corrections > to your patch. > > Very few gave any review of the _SUBSTANCE_ of your changes. > > And the one of the few who did, and suggested you build your > facilities using the existing S390 hypervisor socket infrastructure, > you brushed off _IMMEDIATELY_. > > That drives me crazy. The one person who gave you real feedback > you basically didn't consider seriously at all. Hi David, I'm very sorry -- I guess I must have missed something here -- I don't remember somebody replied with S390 hypervisor socket infrastructure... I'm re-reading all the replies, trying to locate the reply and I'll find out why I didn't take it seriously. Sorry in advance. > I know why you don't want to consider alternative implementations, > and it's because you guys have so much invested in what you've > implemented already. This is not true. I'm absolutely open to any possibility to have an alternative better implementation. Please allow me to find the "S390 hypervisor socket infrastructure" reply first and I'll report back ASAP. > But that's tough and not our problem. > > And until this changes, yes, this submission will be stuck in the > mud and continue slogging on like this. I definitely agree and understand. Thanks, -- Dexuan
RE: [PATCH v18 net-next 1/1] hv_sock: introduce Hyper-V Sockets
> From: David Miller [mailto:da...@davemloft.net]
>
> From: Dexuan Cui <de...@microsoft.com>
> Date: Sat, 23 Jul 2016 01:35:51 +
>
> > +static struct sock *hvsock_create(struct net *net, struct socket *sock,
> > +				  gfp_t priority, unsigned short type)
> > +{
> > +	struct hvsock_sock *hvsk;
> > +	struct sock *sk;
> > +
> > +	sk = sk_alloc(net, AF_HYPERV, priority, _proto, 0);
> > +	if (!sk)
> > +		return NULL;
> ...
> > +	/* Looks stream-based socket doesn't need this. */
> > +	sk->sk_backlog_rcv = NULL;
> > +
> > +	sk->sk_state = 0;
> > +	sock_reset_flag(sk, SOCK_DONE);
>
> All of these are unnecessary initializations, since sk_alloc() zeroes
> out the 'sk' object for you.

Hi David,
Thanks for the comment! I'll remove the 3 lines.

May I know if you have more comments?

BTW, during the past month, at least 7 other people also reviewed the patch and gave me quite a few good comments, which have been addressed. Though only one of them gave the Reviewed-by line for now, I guess I would get more if I pinged them to have a look at the latest version of the patch, i.e., v19 -- I'm going to post it with the aforementioned 3 lines removed, and if you have more comments, I'm ready to address them too. :-)

Thanks,
-- Dexuan
[PATCH v18 net-next 1/1] hv_sock: introduce Hyper-V Sockets
Hyper-V Sockets (hv_sock) supplies a byte-stream based communication mechanism between the host and the guest. It's somewhat like TCP over VMBus, but the transportation layer (VMBus) is much simpler than IP. With Hyper-V Sockets, applications between the host and the guest can talk to each other directly by the traditional BSD-style socket APIs.

Hyper-V Sockets is only available on new Windows hosts, like Windows Server 2016. More info is in this article "Make your own integration services":
https://msdn.microsoft.com/en-us/virtualization/hyperv_on_windows/develop/make_mgmt_service

The patch implements the necessary support in the guest side by introducing a new socket address family AF_HYPERV.

Signed-off-by: Dexuan Cui <de...@microsoft.com>
Cc: "K. Y. Srinivasan" <k...@microsoft.com>
Cc: Haiyang Zhang <haiya...@microsoft.com>
Cc: Vitaly Kuznetsov <vkuzn...@redhat.com>
Cc: Cathy Avery <cav...@redhat.com>
Cc: Olaf Hering <o...@aepfle.de>
---
You can also get the patch by (commit 84146dfb):
https://github.com/dcui/linux/commits/decui/hv_sock/net-next/20160721_v18

For the change log before v12, please see https://lkml.org/lkml/2016/5/15/31

In v12, the changes are mainly the following:

1) remove the module params as David suggested.

2) use 5 exact pages for VMBus send/recv rings, respectively. The host side's design of the feature requires 5 exact pages for recv/send rings respectively -- this is suboptimal considering memory consumption; however, unluckily we have to live with it, before the host comes up with a new design in the future. :-(

3) remove the per-connection static send/recv buffers. Instead, we allocate and free the buffers dynamically only when we recv/send data. This means: when a connection is idle, no memory is consumed as recv/send buffers at all.

In v13: I return ENOMEM on buffer allocation failure.
Actually "man read/write" says "Other errors may occur, depending on the object connected to fd". "man send/recv" indeed lists ENOMEM.
Considering AF_HYPERV is a new socket type, ENOMEM seems OK here. In the long run, I think we should add a new API in the VMBus driver, allowing data copy from the VMBus ringbuffer into a user mode buffer directly. This way, we can even eliminate this temporary buffer.

In v14: fix some coding style issues pointed out by David.

In v15: Just some stylistic changes addressing comments from Joe Perches and Olaf Hering -- thank you!
- add a GPL blurb.
- define a new macro PAGE_SIZE_4K and use it to replace PAGE_SIZE
- change sk_to_hvsock/hvsock_to_sk() from macros to inline functions
- remove a not-very-useful pr_err()
- fix some typos in comments and coding style issues.

In v16: Made stylistic changes addressing comments from Vitaly Kuznetsov. Thank you very much for the detailed comments, Vitaly!

In v17:
- PAGE_SIZE -> PAGE_SIZE_4K
- allow regular users to use the socket
Thank you Michal Kubecek for the suggestions!

In v18: Just some tiny updates to address some spurious compiler warnings: "xxx may be used uninitialized in this function".

Looking forward to your comments!
 MAINTAINERS                 |    2 +
 include/linux/hyperv.h      |   13 +
 include/linux/socket.h      |    4 +-
 include/net/af_hvsock.h     |   78 +++
 include/uapi/linux/hyperv.h |   23 +
 net/Kconfig                 |    1 +
 net/Makefile                |    1 +
 net/hv_sock/Kconfig         |   10 +
 net/hv_sock/Makefile        |    3 +
 net/hv_sock/af_hvsock.c     | 1507 +++
 10 files changed, 1641 insertions(+), 1 deletion(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 50f69ba..6eaa26f 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -5514,7 +5514,9 @@ F:	drivers/pci/host/pci-hyperv.c
 F:	drivers/net/hyperv/
 F:	drivers/scsi/storvsc_drv.c
 F:	drivers/video/fbdev/hyperv_fb.c
+F:	net/hv_sock/
 F:	include/linux/hyperv.h
+F:	include/net/af_hvsock.h
 F:	tools/hv/
 F:	Documentation/ABI/stable/sysfs-bus-vmbus

diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h
index 50f493e..1cda6ea5 100644
--- a/include/linux/hyperv.h
+++ b/include/linux/hyperv.h
@@ -1508,5 +1508,18 @@ static inline void commit_rd_index(struct vmbus_channel *channel)
 	vmbus_set_event(channel);
 }

+struct vmpipe_proto_header {
+	u32 pkt_type;
+	u32 data_size;
+};
+
+#define HVSOCK_HEADER_LEN	(sizeof(struct vmpacket_descriptor) + \
+				 sizeof(struct vmpipe_proto_header))
+
+/* See 'prev_indices' in hv_ringbuffer_read(), hv_ringbuffer_write() */
+#define PREV_INDICES_LEN	(sizeof(u64))
+#define HVSOCK_PKT_LEN(payload_len)	(HVSOCK_HEADER_LEN + \
+					 ALIGN((payload_len), 8) + \
+					 PREV_INDICES_LEN)
 #endif /* _HYPERV_H */

diff --git a/include/linux/socket.
[PATCH v18 net-next 0/1] introduce Hyper-V VM Sockets (hv_sock)
_sock sockets. By default, 1024 sockets take about 1024 * (3+2+1+1) * 4KB = 28M bytes. (Here 1+1 means 1 page for send/recv buffers per connection, respectively.)

3) implement the TODO in hvsock_shutdown().

4) fix a bug in hvsock_close_connection():
I remove "sk->sk_socket->state = SS_UNCONNECTED;" -- actually this line is not really useful. For a connection triggered by a host app's connect(), sk->sk_socket remains NULL before the connection is accepted by the server app (in the Linux VM): see hvsock_accept() -> hvsock_accept_wait() -> sock_graft(connected, newsock). If the host app exits before the server app's accept() returns, the host can send a rescind-message to close the connection, and later in the Linux VM's message handler (i.e. vmbus_onoffer_rescind()), Linux will get a NULL de-referencing crash.

5) fix a bug in hvsock_open_connection():
I move the vmbus_set_chn_rescind_callback() to a later place, because when vmbus_open() fails, hvsock_close_connection() can do nothing and we count on vmbus_onoffer_rescind() -> vmbus_device_unregister() to clean up the device.

6) some stylistic modifications.

Changes since v11:

1) remove the module params as David suggested.

2) use 5 exact pages for VMBus send/recv rings, respectively. The host side's design of the feature requires 5 exact pages for recv/send rings respectively -- this is suboptimal considering memory consumption; however, unluckily we have to live with it, before the host comes up with a new design in the future. :-(

3) remove the per-connection static send/recv buffers. Instead, we allocate and free the buffers dynamically only when we recv/send data. This means: when a connection is idle, no memory is consumed as recv/send buffers at all.

Dexuan Cui (1):
  hv_sock: introduce Hyper-V Sockets

Changes since v12: return ENOMEM on buffer allocation failure.
Actually "man read/write" says "Other errors may occur, depending on the object connected to fd". "man send/recv" indeed lists ENOMEM.
Considering AF_HYPERV is a new socket type, ENOMEM seems OK here. In the long run, I think we should add a new API in the VMBus driver, allowing data copy from the VMBus ringbuffer into a user mode buffer directly. This way, we can even eliminate this temporary buffer.

Changes since v13: fix some coding style issues pointed out by David.

Changes since v14: Just some stylistic changes addressing comments from Joe Perches and Olaf Hering -- thank you!
- add a GPL blurb.
- define a new macro PAGE_SIZE_4K and use it to replace PAGE_SIZE
- change sk_to_hvsock/hvsock_to_sk() from macros to inline functions
- remove a not-very-useful pr_err()
- fix some typos in comments and coding style issues.

Changes since v15: Made stylistic changes addressing comments from Vitaly Kuznetsov. Thank you very much for the detailed comments, Vitaly!

Changes since v16:
- PAGE_SIZE -> PAGE_SIZE_4K
- allow regular users to use the socket
Thank you Michal Kubecek for the suggestions!

Changes since v17: Just some tiny updates to address some spurious compiler warnings: "xxx may be used uninitialized in this function".

Dexuan Cui (1):
  hv_sock: introduce Hyper-V Sockets

 MAINTAINERS                 |    2 +
 include/linux/hyperv.h      |   13 +
 include/linux/socket.h      |    4 +-
 include/net/af_hvsock.h     |   78 +++
 include/uapi/linux/hyperv.h |   23 +
 net/Kconfig                 |    1 +
 net/Makefile                |    1 +
 net/hv_sock/Kconfig         |   10 +
 net/hv_sock/Makefile        |    3 +
 net/hv_sock/af_hvsock.c     | 1507 +++
 10 files changed, 1641 insertions(+), 1 deletion(-)
 create mode 100644 include/net/af_hvsock.h
 create mode 100644 net/hv_sock/Kconfig
 create mode 100644 net/hv_sock/Makefile
 create mode 100644 net/hv_sock/af_hvsock.c

--
2.1.0
RE: [PATCH v17 net-next 1/1] hv_sock: introduce Hyper-V Sockets
> From: David Miller [mailto:da...@davemloft.net]
>
> >> From: kbuild test robot [mailto:l...@intel.com]
> >> [auto build test WARNING on net-next/master]
> >>
> >> url: https://github.com/0day-ci/linux/commits/Dexuan-Cui/introduce-Hyper-V-VM-Sockets-hv_sock/20160715-223433
> >> config: x86_64-randconfig-a0-07191719 (attached as .config)
> >> compiler: gcc-4.4 (Debian 4.4.7-8) 4.4.7
> >> reproduce:
> >>         # save the attached .config to linux build tree
> >>         make ARCH=x86_64
> >>
> >> All warnings (new ones prefixed by >>):
> >>
> >>    net/hv_sock/af_hvsock.c: In function 'hvsock_open_connection':
> >>    net/hv_sock/af_hvsock.c:693: warning: 'hvsk' may be used uninitialized in this function
> >>    net/hv_sock/af_hvsock.c:693: warning: 'new_hvsk' may be used uninitialized in this function
> >>    net/hv_sock/af_hvsock.c:697: warning: 'new_sk' may be used uninitialized in this function
> >>    net/hv_sock/af_hvsock.c: In function 'hvsock_sendmsg_wait':
> >>    net/hv_sock/af_hvsock.c:1053: warning: 'ret' may be used uninitialized in this function
> >> >> net/hv_sock/af_hvsock.o: warning: objtool: hvsock_on_channel_cb()+0x1d: function has unreachable instruction
> >
> > These warnings are all false alarms.
>
> But you still have to quiet them.

Sure. Will do.

Thanks,
-- Dexuan
RE: [PATCH v17 net-next 1/1] hv_sock: introduce Hyper-V Sockets
> From: kbuild test robot [mailto:l...@intel.com]
> Sent: Wednesday, July 20, 2016 1:10
>
> Hi,
>
> [auto build test WARNING on net-next/master]
>
> url: https://github.com/0day-ci/linux/commits/Dexuan-Cui/introduce-Hyper-V-VM-Sockets-hv_sock/20160715-223433
> config: x86_64-randconfig-a0-07191719 (attached as .config)
> compiler: gcc-4.4 (Debian 4.4.7-8) 4.4.7
> reproduce:
>         # save the attached .config to linux build tree
>         make ARCH=x86_64
>
> All warnings (new ones prefixed by >>):
>
>    net/hv_sock/af_hvsock.c: In function 'hvsock_open_connection':
>    net/hv_sock/af_hvsock.c:693: warning: 'hvsk' may be used uninitialized in this function
>    net/hv_sock/af_hvsock.c:693: warning: 'new_hvsk' may be used uninitialized in this function
>    net/hv_sock/af_hvsock.c:697: warning: 'new_sk' may be used uninitialized in this function
>    net/hv_sock/af_hvsock.c: In function 'hvsock_sendmsg_wait':
>    net/hv_sock/af_hvsock.c:1053: warning: 'ret' may be used uninitialized in this function
> >> net/hv_sock/af_hvsock.o: warning: objtool: hvsock_on_channel_cb()+0x1d: function has unreachable instruction

These warnings are all false alarms.

Thanks,
-- Dexuan
[PATCH v17 net-next 1/1] hv_sock: introduce Hyper-V Sockets
Hyper-V Sockets (hv_sock) supplies a byte-stream based communication mechanism between the host and the guest. It's somewhat like TCP over VMBus, but the transportation layer (VMBus) is much simpler than IP. With Hyper-V Sockets, applications between the host and the guest can talk to each other directly by the traditional BSD-style socket APIs.

Hyper-V Sockets is only available on new Windows hosts, like Windows Server 2016. More info is in this article "Make your own integration services":
https://msdn.microsoft.com/en-us/virtualization/hyperv_on_windows/develop/make_mgmt_service

The patch implements the necessary support in the guest side by introducing a new socket address family AF_HYPERV.

Signed-off-by: Dexuan Cui <de...@microsoft.com>
Cc: "K. Y. Srinivasan" <k...@microsoft.com>
Cc: Haiyang Zhang <haiya...@microsoft.com>
Cc: Vitaly Kuznetsov <vkuzn...@redhat.com>
Cc: Cathy Avery <cav...@redhat.com>
Cc: Olaf Hering <o...@aepfle.de>
---
You can also get the patch by (commit fcf045af6):
https://github.com/dcui/linux/tree/decui/hv_sock/net-next/20160715_v17

For the change log before v12, please see https://lkml.org/lkml/2016/5/15/31

In v12, the changes are mainly the following:

1) remove the module params as David suggested.

2) use 5 exact pages for VMBus send/recv rings, respectively. The host side's design of the feature requires 5 exact pages for recv/send rings respectively -- this is suboptimal considering memory consumption; however, unluckily we have to live with it, before the host comes up with a new design in the future. :-(

3) remove the per-connection static send/recv buffers. Instead, we allocate and free the buffers dynamically only when we recv/send data. This means: when a connection is idle, no memory is consumed as recv/send buffers at all.

In v13: I return ENOMEM on buffer allocation failure.
Actually "man read/write" says "Other errors may occur, depending on the object connected to fd". "man send/recv" indeed lists ENOMEM.
Considering AF_HYPERV is a new socket type, ENOMEM seems OK here. In the long run, I think we should add a new API in the VMBus driver, allowing data copy from the VMBus ringbuffer into a user mode buffer directly. This way, we can even eliminate this temporary buffer.

In v14: fix some coding style issues pointed out by David.

In v15: Just some stylistic changes addressing comments from Joe Perches and Olaf Hering -- thank you!
- add a GPL blurb.
- define a new macro PAGE_SIZE_4K and use it to replace PAGE_SIZE
- change sk_to_hvsock/hvsock_to_sk() from macros to inline functions
- remove a not-very-useful pr_err()
- fix some typos in comments and coding style issues.

In v16: Made stylistic changes addressing comments from Vitaly Kuznetsov. Thank you very much for the detailed comments, Vitaly!

In v17:
- PAGE_SIZE -> PAGE_SIZE_4K
- allow regular users to use the socket
Thank you Michal Kubecek for the suggestions!

Looking forward to your comments!

 MAINTAINERS                 |    2 +
 include/linux/hyperv.h      |   13 +
 include/linux/socket.h      |    4 +-
 include/net/af_hvsock.h     |   78 +++
 include/uapi/linux/hyperv.h |   23 +
 net/Kconfig                 |    1 +
 net/Makefile                |    1 +
 net/hv_sock/Kconfig         |   10 +
 net/hv_sock/Makefile        |    3 +
 net/hv_sock/af_hvsock.c     | 1507 +++
 10 files changed, 1641 insertions(+), 1 deletion(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 50f69ba..6eaa26f 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -5514,7 +5514,9 @@ F:	drivers/pci/host/pci-hyperv.c
 F:	drivers/net/hyperv/
 F:	drivers/scsi/storvsc_drv.c
 F:	drivers/video/fbdev/hyperv_fb.c
+F:	net/hv_sock/
 F:	include/linux/hyperv.h
+F:	include/net/af_hvsock.h
 F:	tools/hv/
 F:	Documentation/ABI/stable/sysfs-bus-vmbus

diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h
index 50f493e..1cda6ea5 100644
--- a/include/linux/hyperv.h
+++ b/include/linux/hyperv.h
@@ -1508,5 +1508,18 @@ static inline void commit_rd_index(struct vmbus_channel *channel)
 	vmbus_set_event(channel);
 }

+struct vmpipe_proto_header {
+	u32 pkt_type;
+	u32 data_size;
+};
+
+#define HVSOCK_HEADER_LEN	(sizeof(struct vmpacket_descriptor) + \
+				 sizeof(struct vmpipe_proto_header))
+
+/* See 'prev_indices' in hv_ringbuffer_read(), hv_ringbuffer_write() */
+#define PREV_INDICES_LEN	(sizeof(u64))
+#define HVSOCK_PKT_LEN(payload_len)	(HVSOCK_HEADER_LEN + \
+					 ALIGN((payload_len), 8) + \
+					 PREV_INDICES_LEN)
 #endif /* _HYPERV_H */

diff --git a/include/linux/socket.h b/include/linux/socket.h
index b5cc5a6..0b68b58 100644
--- a/include/linux/socket.h
+++ b/include/linux/socket.h
@@ -202,8 +202,9 @@ struct ucred {
 #de
[PATCH v17 net-next 0/1] introduce Hyper-V VM Sockets (hv_sock)
_sock sockets. By default, 1024 sockets take about 1024 * (3+2+1+1) * 4KB = 28M bytes. (Here 1+1 means 1 page for send/recv buffers per connection, respectively.)

3) implement the TODO in hvsock_shutdown().

4) fix a bug in hvsock_close_connection():
I remove "sk->sk_socket->state = SS_UNCONNECTED;" -- actually this line is not really useful. For a connection triggered by a host app's connect(), sk->sk_socket remains NULL before the connection is accepted by the server app (in the Linux VM): see hvsock_accept() -> hvsock_accept_wait() -> sock_graft(connected, newsock). If the host app exits before the server app's accept() returns, the host can send a rescind-message to close the connection, and later in the Linux VM's message handler (i.e. vmbus_onoffer_rescind()), Linux will get a NULL de-referencing crash.

5) fix a bug in hvsock_open_connection():
I move the vmbus_set_chn_rescind_callback() to a later place, because when vmbus_open() fails, hvsock_close_connection() can do nothing and we count on vmbus_onoffer_rescind() -> vmbus_device_unregister() to clean up the device.

6) some stylistic modifications.

Changes since v11:

1) remove the module params as David suggested.

2) use 5 exact pages for VMBus send/recv rings, respectively. The host side's design of the feature requires 5 exact pages for recv/send rings respectively -- this is suboptimal considering memory consumption; however, unluckily we have to live with it, before the host comes up with a new design in the future. :-(

3) remove the per-connection static send/recv buffers. Instead, we allocate and free the buffers dynamically only when we recv/send data. This means: when a connection is idle, no memory is consumed as recv/send buffers at all.

Dexuan Cui (1):
  hv_sock: introduce Hyper-V Sockets

Changes since v12: return ENOMEM on buffer allocation failure.
Actually "man read/write" says "Other errors may occur, depending on the object connected to fd". "man send/recv" indeed lists ENOMEM.
Considering AF_HYPERV is a new socket type, ENOMEM seems OK here. In the long run, I think we should add a new API in the VMBus driver, allowing data copy from the VMBus ringbuffer into a user mode buffer directly. This way, we can even eliminate this temporary buffer.

Changes since v13: fix some coding style issues pointed out by David.

Changes since v14: Just some stylistic changes addressing comments from Joe Perches and Olaf Hering -- thank you!
- add a GPL blurb.
- define a new macro PAGE_SIZE_4K and use it to replace PAGE_SIZE
- change sk_to_hvsock/hvsock_to_sk() from macros to inline functions
- remove a not-very-useful pr_err()
- fix some typos in comments and coding style issues.

Changes since v15: Made stylistic changes addressing comments from Vitaly Kuznetsov. Thank you very much for the detailed comments, Vitaly!

Changes since v16:
- PAGE_SIZE -> PAGE_SIZE_4K
- allow regular users to use the socket
Thank you Michal Kubecek for the suggestions!

Dexuan Cui (1):
  hv_sock: introduce Hyper-V Sockets

 MAINTAINERS                 |    2 +
 include/linux/hyperv.h      |   13 +
 include/linux/socket.h      |    4 +-
 include/net/af_hvsock.h     |   78 +++
 include/uapi/linux/hyperv.h |   23 +
 net/Kconfig                 |    1 +
 net/Makefile                |    1 +
 net/hv_sock/Kconfig         |   10 +
 net/hv_sock/Makefile        |    3 +
 net/hv_sock/af_hvsock.c     | 1507 +++
 10 files changed, 1641 insertions(+), 1 deletion(-)
 create mode 100644 include/net/af_hvsock.h
 create mode 100644 net/hv_sock/Kconfig
 create mode 100644 net/hv_sock/Makefile
 create mode 100644 net/hv_sock/af_hvsock.c

--
2.1.0
RE: [PATCH v16 net-next 0/1] introduce Hyper-V VM Sockets(hv_sock)
> From: Michal Kubecek [mailto:mkube...@suse.cz]
>
> > ..
> > However, though Hyper-V Sockets may seem conceptually similar to
> > AF_VSOCK, there are differences in the transportation layer, and IMO these
> > make the direct code reusing impractical:
> >
> > 1. In AF_VSOCK, the endpoint type is: , but in
> > AF_HYPERV, the endpoint type is: . Here GUID
> > is 128-bit.
>
> OK, this could be a problem.
>
> > 2. AF_VSOCK supports SOCK_DGRAM, while AF_HYPERV doesn't.
> >
> > 3. AF_VSOCK supports some special sock opts, like SO_VM_SOCKETS_BUFFER_SIZE,
> > SO_VM_SOCKETS_BUFFER_MIN/MAX_SIZE and SO_VM_SOCKETS_CONNECT_TIMEOUT.
> > These are meaningless to AF_HYPERV.
> >
> > 4. Some of AF_VSOCK's VMCI transportation ops are meaningless to AF_HYPERV/VMBus, like
> > .notify_recv_init
> > .notify_recv_pre_block
> > .notify_recv_pre_dequeue
> > .notify_recv_post_dequeue
> > .notify_send_init
> > .notify_send_pre_block
> > .notify_send_pre_enqueue
> > .notify_send_post_enqueue
> > etc.
> >
> > So I think we'd better introduce a new address family: AF_HYPERV.
>
> I don't quite understand the logic here. All these sound like "AF_VSOCK
> has this feature we don't need so (rather than not using the feature) we
> are not going to use AF_VSOCK". I would understand if you pointed out
> features important for you that are missing in AF_VSOCK but this kind of
> reasoning sounds strange to me.
>
> Michal Kubecek

Hi Michal,
Sorry, I might not have made myself clear. I didn't mean "AF_VSOCK has this feature we don't need". I didn't mean "features important for me that are missing in AF_VSOCK", either.

I just wanted to say that I need a new protocol number and I should have a separate directory in net/, i.e., net/hv_sock/. Because AF_VSOCK and AF_HYPERV are conceptually similar, some people asked why I didn't fit my code into net/vmw_vsock/, and I wrote the text to explain why that wasn't a good idea: the implementation details are different and I can't directly reuse the vsock code.
Thanks, -- Dexuan
RE: [PATCH v16 net-next 1/1] hv_sock: introduce Hyper-V Sockets
> From: Michal Kubecek [mailto:mkube...@suse.cz]
>
> > ..
> > +static struct sock *hvsock_find_connected_socket_by_channel(
> > +	const struct vmbus_channel *channel)
> > +{
> > +	struct hvsock_sock *hvsk;
> > +
> > +	list_for_each_entry(hvsk, _connected_list, connected_list) {
> > +		if (hvsk->channel == channel)
> > +			return hvsock_to_sk(hvsk);
> > +	}
> > +	return NULL;
> > +}
>
> How does this work from performance point of view if there are many
> connected sockets and/or high frequency of new connections? AFAICS most
> other families use a hash table for socket lookup.

Hi Michal,
Per the current design of the feature in the host, there is actually an implicit inherent limit on the number of per-guest connections: a guest can't have more than 2048 connections. This is because 1 connection takes a VMBus channel ID, and at most 2048 channel IDs per guest are supported.

And I don't think the lookup function is a bottleneck, because the whole process of creating or closing a connection actually does lots of things, which need several extra rounds of interaction between the host and the guest, taking many more cycles than the lookup here.

> > +static void get_ringbuffer_rw_status(struct vmbus_channel *channel,
> > +				     bool *can_read, bool *can_write)
> ..
> > +	if (can_write) {
> > +		hv_get_ringbuffer_availbytes(>outbound,
> > +					     ,
> > +					     _write_bytes);
> > +
> > +		/* We only write if there is enough space */
> > +		*can_write = avl_write_bytes > HVSOCK_PKT_LEN(PAGE_SIZE);
>
> I'm not sure where does this come from but is this really supposed to be
> PAGE_SIZE (not the fixed 4KB PAGE_SIZE_4K)?

Thanks for pointing this out! I'll replace it with PAGE_SIZE_4K.

> > +	/* see get_ringbuffer_rw_status() */
> > +	set_channel_pending_send_size(channel, HVSOCK_PKT_LEN(PAGE_SIZE) + 1);
>
> Same question.

I'll replace it with PAGE_SIZE_4K too.
> > +static int hvsock_create_sock(struct net *net, struct socket *sock,
> > +			      int protocol, int kern)
> > +{
> > +	struct sock *sk;
> > +
> > +	if (!capable(CAP_SYS_ADMIN) && !capable(CAP_NET_ADMIN))
> > +		return -EPERM;
>
> Looks like any application wanting to use hyper-v sockets will need
> rather high privileges. It would make sense if these sockets were
> reserved for privileged tasks like VM management. But according to the
> commit message, hv_sock is supposed to be used for regular application
> to application communication. Requiring CAP_{SYS,NET}_ADMIN looks like
> an overkill to me.

I agree with you. Let me remove this check.

BTW, the check was supposed to prevent regular apps from using the socket, because the current design by the host has a drawback: a connection consumes at least 40KB of unswappable memory as the host<->guest shared ring, and we don't want malicious regular apps to be able to consume all the memory.

Later I realized the per-guest number of connections can't exceed 2048, so at most the host<->guest rings consume 2K * 40KB = 80MB of memory, and this isn't a big concern to me.

Thanks,
-- Dexuan
[PATCH v16 net-next 1/1] hv_sock: introduce Hyper-V Sockets
Hyper-V Sockets (hv_sock) supplies a byte-stream based communication mechanism between the host and the guest. It's somewhat like TCP over VMBus, but the transportation layer (VMBus) is much simpler than IP. With Hyper-V Sockets, applications between the host and the guest can talk to each other directly by the traditional BSD-style socket APIs.

Hyper-V Sockets is only available on new Windows hosts, like Windows Server 2016. More info is in this article "Make your own integration services":
https://msdn.microsoft.com/en-us/virtualization/hyperv_on_windows/develop/make_mgmt_service

The patch implements the necessary support in the guest side by introducing a new socket address family AF_HYPERV.

Signed-off-by: Dexuan Cui <de...@microsoft.com>
Cc: "K. Y. Srinivasan" <k...@microsoft.com>
Cc: Haiyang Zhang <haiya...@microsoft.com>
Cc: Vitaly Kuznetsov <vkuzn...@redhat.com>
Cc: Cathy Avery <cav...@redhat.com>
Cc: Olaf Hering <o...@aepfle.de>
---
You can also get the patch by (commit 5dde7975):
https://github.com/dcui/linux/tree/decui/hv_sock/net-next/20160711_v16

For the change log before v12, please see https://lkml.org/lkml/2016/5/15/31

In v12, the changes are mainly the following:

1) remove the module params as David suggested.

2) use 5 exact pages for VMBus send/recv rings, respectively. The host side's design of the feature requires 5 exact pages for recv/send rings respectively -- this is suboptimal considering memory consumption; however, unluckily we have to live with it, before the host comes up with a new design in the future. :-(

3) remove the per-connection static send/recv buffers. Instead, we allocate and free the buffers dynamically only when we recv/send data. This means: when a connection is idle, no memory is consumed as recv/send buffers at all.

In v13: I return ENOMEM on buffer allocation failure.
Actually "man read/write" says "Other errors may occur, depending on the object connected to fd". "man send/recv" indeed lists ENOMEM.
Considering AF_HYPERV is a new socket type, ENOMEM seems OK here. In the long run, I think we should add a new API in the VMBus driver, allowing data copy from the VMBus ringbuffer into a user mode buffer directly. This way, we can even eliminate this temporary buffer.

In v14: fix some coding style issues pointed out by David.

In v15: Just some stylistic changes addressing comments from Joe Perches and Olaf Hering -- thank you!
- add a GPL blurb.
- define a new macro PAGE_SIZE_4K and use it to replace PAGE_SIZE
- change sk_to_hvsock/hvsock_to_sk() from macros to inline functions
- remove a not-very-useful pr_err()
- fix some typos in comments and coding style issues.

In v16: Made stylistic changes addressing comments from Vitaly Kuznetsov. Thank you very much for the detailed comments, Vitaly!

Looking forward to your comments!

 MAINTAINERS                 |    2 +
 include/linux/hyperv.h      |   13 +
 include/linux/socket.h      |    4 +-
 include/net/af_hvsock.h     |   78 +++
 include/uapi/linux/hyperv.h |   23 +
 net/Kconfig                 |    1 +
 net/Makefile                |    1 +
 net/hv_sock/Kconfig         |   10 +
 net/hv_sock/Makefile        |    3 +
 net/hv_sock/af_hvsock.c     | 1509 +++
 10 files changed, 1643 insertions(+), 1 deletion(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 50f69ba..6eaa26f 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -5514,7 +5514,9 @@ F:	drivers/pci/host/pci-hyperv.c
 F:	drivers/net/hyperv/
 F:	drivers/scsi/storvsc_drv.c
 F:	drivers/video/fbdev/hyperv_fb.c
+F:	net/hv_sock/
 F:	include/linux/hyperv.h
+F:	include/net/af_hvsock.h
 F:	tools/hv/
 F:	Documentation/ABI/stable/sysfs-bus-vmbus

diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h
index 50f493e..1cda6ea5 100644
--- a/include/linux/hyperv.h
+++ b/include/linux/hyperv.h
@@ -1508,5 +1508,18 @@ static inline void commit_rd_index(struct vmbus_channel *channel)
 	vmbus_set_event(channel);
 }

+struct vmpipe_proto_header {
+	u32 pkt_type;
+	u32 data_size;
+};
+
+#define HVSOCK_HEADER_LEN	(sizeof(struct vmpacket_descriptor) + \
+				 sizeof(struct vmpipe_proto_header))
+
+/* See 'prev_indices' in hv_ringbuffer_read(), hv_ringbuffer_write() */
+#define PREV_INDICES_LEN	(sizeof(u64))
+#define HVSOCK_PKT_LEN(payload_len)	(HVSOCK_HEADER_LEN + \
+					 ALIGN((payload_len), 8) + \
+					 PREV_INDICES_LEN)
 #endif /* _HYPERV_H */

diff --git a/include/linux/socket.h b/include/linux/socket.h
index b5cc5a6..0b68b58 100644
--- a/include/linux/socket.h
+++ b/include/linux/socket.h
@@ -202,8 +202,9 @@ struct ucred {
 #define AF_VSOCK	40	/* vSockets */
 #define AF_KCM		41	/* Kernel Connection Multiplexor*/
#defin
[PATCH v16 net-next 0/1] introduce Hyper-V VM Sockets (hv_sock)
send_ring_page is 2.

2) add module param max_socket_number (the default is 1024). A user can enlarge the number to create more than 1024 hv_sock sockets. By default, 1024 sockets take about 1024 * (3+2+1+1) * 4KB = 28M bytes. (Here 1+1 means 1 page for send/recv buffers per connection, respectively.)

3) implement the TODO in hvsock_shutdown().

4) fix a bug in hvsock_close_connection():
I remove "sk->sk_socket->state = SS_UNCONNECTED;" -- actually this line is not really useful. For a connection triggered by a host app's connect(), sk->sk_socket remains NULL before the connection is accepted by the server app (in the Linux VM): see hvsock_accept() -> hvsock_accept_wait() -> sock_graft(connected, newsock). If the host app exits before the server app's accept() returns, the host can send a rescind-message to close the connection, and later in the Linux VM's message handler (i.e. vmbus_onoffer_rescind()), Linux will get a NULL de-referencing crash.

5) fix a bug in hvsock_open_connection():
I move the vmbus_set_chn_rescind_callback() to a later place, because when vmbus_open() fails, hvsock_close_connection() can do nothing and we count on vmbus_onoffer_rescind() -> vmbus_device_unregister() to clean up the device.

6) some stylistic modifications.

Changes since v11:

1) remove the module params as David suggested.

2) use 5 exact pages for VMBus send/recv rings, respectively. The host side's design of the feature requires 5 exact pages for recv/send rings respectively -- this is suboptimal considering memory consumption; however, unluckily we have to live with it, before the host comes up with a new design in the future. :-(

3) remove the per-connection static send/recv buffers. Instead, we allocate and free the buffers dynamically only when we recv/send data. This means: when a connection is idle, no memory is consumed as recv/send buffers at all.
Dexuan Cui (1):
  hv_sock: introduce Hyper-V Sockets

Changes since v12: return ENOMEM on buffer allocation failure.
Actually "man read/write" says "Other errors may occur, depending on the object connected to fd". "man send/recv" indeed lists ENOMEM.

Considering AF_HYPERV is a new socket type, ENOMEM seems OK here. In the long run, I think we should add a new API in the VMBus driver, allowing data copy from the VMBus ringbuffer into a user mode buffer directly. This way, we can even eliminate this temporary buffer.

Changes since v13: fix some coding style issues pointed out by David.

Changes since v14: Just some stylistic changes addressing comments from Joe Perches and Olaf Hering -- thank you!
- add a GPL blurb.
- define a new macro PAGE_SIZE_4K and use it to replace PAGE_SIZE
- change sk_to_hvsock/hvsock_to_sk() from macros to inline functions
- remove a not-very-useful pr_err()
- fix some typos in comments and coding style issues.

Changes since v15: Made stylistic changes addressing comments from Vitaly Kuznetsov. Thank you very much for the detailed comments, Vitaly!

Dexuan Cui (1):
  hv_sock: introduce Hyper-V Sockets

 MAINTAINERS                 |    2 +
 include/linux/hyperv.h      |   13 +
 include/linux/socket.h      |    4 +-
 include/net/af_hvsock.h     |   78 +++
 include/uapi/linux/hyperv.h |   23 +
 net/Kconfig                 |    1 +
 net/Makefile                |    1 +
 net/hv_sock/Kconfig         |   10 +
 net/hv_sock/Makefile        |    3 +
 net/hv_sock/af_hvsock.c     | 1509 +++
 10 files changed, 1643 insertions(+), 1 deletion(-)
 create mode 100644 include/net/af_hvsock.h
 create mode 100644 net/hv_sock/Kconfig
 create mode 100644 net/hv_sock/Makefile
 create mode 100644 net/hv_sock/af_hvsock.c

--
2.1.0
RE: [PATCH v15 net-next 1/1] hv_sock: introduce Hyper-V Sockets
> From: Vitaly Kuznetsov [mailto:vkuzn...@redhat.com]
> ...
> Some comments below. The vast majority of them are really minor, the
> only thing which bothers me a little bit is WARN() in hvsock_sendmsg()
> which I think shouldn't be there. But I may have missed something.

Thank you for the very detailed comments, Vitaly! Now I see I shouldn't put pr_err() in hvsock_sendmsg() and hvsock_recvmsg(), because IMO a malicious app can use this to generate lots of messages to slow down the system. I'll remove them. I'll reply to your other comments below.

> > +#define guid_t uuid_le
> > +struct sockaddr_hv {
> > +	__kernel_sa_family_t shv_family; /* Address family */
> > +	u16 reserved;          /* Must be Zero */
> > +	guid_t shv_vm_id;      /* VM ID */
> > +	guid_t shv_service_id; /* Service ID */
> > +};
>
> I'm not sure it is worth it to introduce a new 'guid_t' type here, we
> may want to rename
>
> shv_vm_id -> shv_vm_guid
> shv_service_id -> shv_service_guid
>
> and use uuid_le type.

Ok. I'll make the change.

> > +config HYPERV_SOCK
> > +	tristate "Hyper-V Sockets"
> > +	depends on HYPERV
> > +	default m if HYPERV
> > +	help
> > +	  Hyper-V Sockets is somewhat like TCP over VMBus, allowing
> > +	  communication between Linux guest and Hyper-V host without TCP/IP.
> > +
>
> I know it's hard to come up with a simple description but I'd rather
> describe it as "Socket interface for high speed communication between
> Linux guest and Hyper-V host over VMBus."

OK.

> > +static bool uuid_equals(uuid_le u1, uuid_le u2)
> > +{
> > +	return !uuid_le_cmp(u1, u2);
> > +}
>
> why not use uuid_le_cmp directly?

OK. I will change to it.

> > +static unsigned int hvsock_poll(struct file *file, struct socket *sock,
> > +				poll_table *wait)
> > ...
> > +	if (channel) {
> > +		/* If there is something in the queue then we can read */
> > +		get_ringbuffer_rw_status(channel, &can_read, &can_write);
> > +
> > +		if (!can_read && hvsk->recv)
> > +			can_read = true;
> > +
> > +		if (!(sk->sk_shutdown & RCV_SHUTDOWN) && can_read)
> > +			mask |= POLLIN | POLLRDNORM;
> > +	} else {
> > +		can_read = false;
>
> we don't use can_read below

I'll remove the can_read assignment.

> > +	channel = hvsk->channel;
> > +	if (!channel) {
> > +		WARN_ONCE(1, "NULL channel! There is a programming bug.\n");
>
> BUG() then

OK.

> > +static int hvsock_open_connection(struct vmbus_channel *channel)
> > +{
> > +	..
> > +	if (conn_from_host) {
> > +		if (sk->sk_ack_backlog >= sk->sk_max_ack_backlog) {
> > +			ret = -EMFILE;
>
> I'm not sure -EMFILE is appropriate, we don't really have "too many open
> files".

Here the ret value doesn't really matter, because the return value of the function is not really used at present. However, I agree with you that EMFILE is unsuitable. Let me change it to ECONNREFUSED, which seems better to me.

> > +static int hvsock_connect_wait(struct socket *sock,
> > +			       int flags, int current_ret)
> > +{
> > +	struct sock *sk = sock->sk;
> > +	struct hvsock_sock *hvsk;
> > +	int ret = current_ret;
> > +	DEFINE_WAIT(wait);
> > +	long timeout;
> > +
> > +	hvsk = sk_to_hvsock(sk);
> > +	timeout = 30 * HZ;
>
> We may want to introduce a define for this timeout. Does it actually
> match host's timeout?

I'll add HVSOCK_CONNECT_TIMEOUT for this. Yes, the value is from the Hyper-V team.

> > +static int hvsock_accept_wait(struct sock *listener,
> > +			      ..
> > +
> > +	if (ret) {
> > +		release_sock(connected);
> > +		sock_put(connected);
> > +	} else {
> > +		newsock->state = SS_CONNECTED;
> > +		sock_graft(connected, newsock);
> > +		release_sock(connected);
> > +		sock_put(connected);
>
> so we do release_sock()/sock_put() unconditionally and this piece could
> be rewritten as
>
> 	if (!ret) {
> 		newsock->state = SS_CONNECTED;
> 		sock_graft(connected, newsock);
> 	}
> 	release_sock(connected);
> 	sock_put(connected);

Will do.

> > +static int hvsock_listen(struct socket *sock, int backlog)
> > +{
> > +	..
> > +	/* This is an artificial limit */
> > +	if (backlog > 128)
> > +		backlog = 128;
>
> Let's do a define for it.

Ok.

> > +static int hvsock_sendmsg(struct socket *sock, struct msghdr *msg,
> > +			  size_t len)
> > +{
> > +	struct hvsock_sock *hvsk;
> > +	struct sock *sk;
> > +	int ret;
> > +
> > +	if (len == 0)
> > +		return -EINVAL;
> > +
> > +	if (msg->msg_flags & ~MSG_DONTWAIT) {
> > +		pr_err("%s: unsupported flags=0x%x\n", __func__,
> > +		       msg->msg_flags);
>
> I don't think we
RE: [PATCH v14 net-next 1/1] hv_sock: introduce Hyper-V Sockets
> From: Olaf Hering [mailto:o...@aepfle.de]
> Sent: Friday, July 8, 2016 0:02
> On Thu, Jun 30, Dexuan Cui wrote:
>
> > +/* The MTU is 16KB per the host side's design. */
> > +struct hvsock_recv_buf {
> > +	unsigned int data_len;
> > +	unsigned int data_offset;
> > +
> > +	struct vmpipe_proto_header hdr;
> > +	u8 buf[PAGE_SIZE * 4];
>
> Please use some macro related to the protocol rather than a Linux
> compiletime macro.

OK. I'll fix this.

> > +/* We send at most 4KB payload per VMBus packet. */
> > +struct hvsock_send_buf {
> > +	struct vmpipe_proto_header hdr;
> > +	u8 buf[PAGE_SIZE];
>
> Same here.

OK. I'll fix this.

> > + * Copyright(c) 2016, Microsoft Corporation. All rights reserved.
>
> Here the BSD license follows. I think it's required/desired to also
> include a GPL blurb like it is done in many other files:
> ...
> * Alternatively, this software may be distributed under the terms of
> * the GNU General Public License ("GPL") version 2 as published by the
> * Free Software Foundation.
>
> Otherwise the MODULE_LICENSE string might be incorrect.

I'll add the GPL blurb.

> > +	/* Hyper-V Sockets requires at least VMBus 4.0 */
> > +	if ((vmbus_proto_version >> 16) < 4) {
> > +		pr_err("failed to load: VMBus 4 or later is required\n");
>
> I guess this means WS 2016+, and loading in earlier host versions will
> trigger this path? I think a silent ENODEV is enough.

Yes. OK, I'll remove the pr_err().

> > +		return -ENODEV;
>
> Olaf

I'll post v15 shortly, which will address all the comments from Joe and Olaf.

Thanks,
-- Dexuan
RE: [PATCH v15 net-next 1/1] hv_sock: introduce Hyper-V Sockets
> From: Dexuan Cui
> Sent: Friday, July 8, 2016 15:47
>
> You can also get the patch here (2764221d):
> https://github.com/dcui/linux/commits/decui/hv_sock/net-next/20160708_v15
>
> In v14:
> fix some coding style issues pointed out by David.
>
> In v15:
> Just some stylistic changes addressing comments from Joe Perches and
> Olaf Hering -- thank you!
> - add a GPL blurb.
> - define a new macro PAGE_SIZE_4K and use it to replace PAGE_SIZE
> - change sk_to_hvsock/hvsock_to_sk() from macros to inline functions
> - remove a not-very-useful pr_err()
> - fix some typos in comment and coding style issues.

FYI: the diff between v14 and v15 is attached. The diff is generated by git-diff-ing the 2 branches decui/hv_sock/net-next/20160629_v14 and decui/hv_sock/net-next/20160708_v15 in the above github repo.

Thanks,
-- Dexuan

Attachment: delta_v14_vs.v15.patch
[PATCH v15 net-next 1/1] hv_sock: introduce Hyper-V Sockets
Hyper-V Sockets (hv_sock) supplies a byte-stream based communication mechanism between the host and the guest. It's somewhat like TCP over VMBus, but the transport layer (VMBus) is much simpler than IP. With Hyper-V Sockets, applications on the host and in the guest can talk to each other directly with the traditional BSD-style socket APIs.

Hyper-V Sockets is only available on new Windows hosts, like Windows Server 2016. More info is in this article "Make your own integration services":
https://msdn.microsoft.com/en-us/virtualization/hyperv_on_windows/develop/make_mgmt_service

The patch implements the necessary support in the guest side by introducing a new socket address family AF_HYPERV.

Signed-off-by: Dexuan Cui <de...@microsoft.com>
Cc: "K. Y. Srinivasan" <k...@microsoft.com>
Cc: Haiyang Zhang <haiya...@microsoft.com>
Cc: Vitaly Kuznetsov <vkuzn...@redhat.com>
Cc: Cathy Avery <cav...@redhat.com>
---
You can also get the patch here (2764221d):
https://github.com/dcui/linux/commits/decui/hv_sock/net-next/20160708_v15

For the change log before v12, please see https://lkml.org/lkml/2016/5/15/31

In v12, the changes are mainly the following:
1) remove the module params as David suggested.
2) use 5 exact pages for the VMBus send/recv rings, respectively. The host side's design of the feature requires exactly 5 pages for the recv/send rings respectively -- this is suboptimal considering memory consumption, but unluckily we have to live with it until the host comes up with a new design in the future. :-(
3) remove the per-connection static send/recv buffers. Instead, we allocate and free the buffers dynamically, and only when we recv/send data. This means: when a connection is idle, no memory is consumed as recv/send buffers at all.

In v13: I return ENOMEM on buffer allocation failure.
Actually "man read/write" says "Other errors may occur, depending on the object connected to fd". "man send/recv" indeed lists ENOMEM.
Considering AF_HYPERV is a new socket type, ENOMEM seems OK here. In the long run, I think we should add a new API in the VMBus driver, allowing data copy from VMBus ringbuffer into user mode buffer directly. This way, we can even eliminate this temporary buffer. In v14: fix some coding style issues pointed out by David. In v15: Just some stylistic changes addressing comments from Joe Perches and Olaf Hering -- thank you! - add a GPL blurb. - define a new macro PAGE_SIZE_4K and use it to replace PAGE_SIZE - change sk_to_hvsock/hvsock_to_sk() from macros to inline functions - remove a not-very-useful pr_err() - fix some typos in comment and coding style issues. Looking forward to your comments! MAINTAINERS |2 + include/linux/hyperv.h | 13 + include/linux/socket.h |4 +- include/net/af_hvsock.h | 78 +++ include/uapi/linux/hyperv.h | 24 + net/Kconfig |1 + net/Makefile|1 + net/hv_sock/Kconfig | 10 + net/hv_sock/Makefile|3 + net/hv_sock/af_hvsock.c | 1523 +++ 10 files changed, 1658 insertions(+), 1 deletion(-) diff --git a/MAINTAINERS b/MAINTAINERS index 50f69ba..6eaa26f 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -5514,7 +5514,9 @@ F:drivers/pci/host/pci-hyperv.c F: drivers/net/hyperv/ F: drivers/scsi/storvsc_drv.c F: drivers/video/fbdev/hyperv_fb.c +F: net/hv_sock/ F: include/linux/hyperv.h +F: include/net/af_hvsock.h F: tools/hv/ F: Documentation/ABI/stable/sysfs-bus-vmbus diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h index 50f493e..1cda6ea5 100644 --- a/include/linux/hyperv.h +++ b/include/linux/hyperv.h @@ -1508,5 +1508,18 @@ static inline void commit_rd_index(struct vmbus_channel *channel) vmbus_set_event(channel); } +struct vmpipe_proto_header { + u32 pkt_type; + u32 data_size; +}; + +#define HVSOCK_HEADER_LEN (sizeof(struct vmpacket_descriptor) + \ +sizeof(struct vmpipe_proto_header)) + +/* See 'prev_indices' in hv_ringbuffer_read(), hv_ringbuffer_write() */ +#define PREV_INDICES_LEN (sizeof(u64)) +#define 
HVSOCK_PKT_LEN(payload_len)(HVSOCK_HEADER_LEN + \ + ALIGN((payload_len), 8) + \ + PREV_INDICES_LEN) #endif /* _HYPERV_H */ diff --git a/include/linux/socket.h b/include/linux/socket.h index b5cc5a6..0b68b58 100644 --- a/include/linux/socket.h +++ b/include/linux/socket.h @@ -202,8 +202,9 @@ struct ucred { #define AF_VSOCK 40 /* vSockets */ #define AF_KCM 41 /* Kernel Connection Multiplexor*/ #define AF_QIPCRTR 42 /* Qualcomm IPC Router */ +#define AF_HYPERV 43 /* Hyper-V Sockets */ -#define AF_MAX 43 /* For now..
[PATCH v15 net-next 0/1] introduce Hyper-V VM Sockets(hv_sock)
By default, 1024 sockets take about 1024 * (3+2+1+1) * 4KB = 28M bytes. (Here 1+1 means 1 page for send/recv buffers per connection, respectively.)
3) implement the TODO in hvsock_shutdown().
4) fix a bug in hvsock_close_connection(): I removed "sk->sk_socket->state = SS_UNCONNECTED;" -- actually this line is not really useful. For a connection triggered by a host app's connect(), sk->sk_socket remains NULL before the connection is accepted by the server app (in the Linux VM): see hvsock_accept() -> hvsock_accept_wait() -> sock_graft(connected, newsock). If the host app exits before the server app's accept() returns, the host can send a rescind-message to close the connection, and later, in the Linux VM's message handler (i.e. vmbus_onoffer_rescind()), Linux will get a NULL-dereferencing crash.
5) fix a bug in hvsock_open_connection(): I moved the vmbus_set_chn_rescind_callback() to a later place, because when vmbus_open() fails, hvsock_close_connection() can do nothing and we count on vmbus_onoffer_rescind() -> vmbus_device_unregister() to clean up the device.
6) some stylistic modifications.

Changes since v11:
1) remove the module params as David suggested.
2) use 5 exact pages for the VMBus send/recv rings, respectively. The host side's design of the feature requires exactly 5 pages for the recv/send rings respectively -- this is suboptimal considering memory consumption, but unluckily we have to live with it until the host comes up with a new design in the future. :-(
3) remove the per-connection static send/recv buffers. Instead, we allocate and free the buffers dynamically, and only when we recv/send data. This means: when a connection is idle, no memory is consumed as recv/send buffers at all.

Dexuan Cui (1): hv_sock: introduce Hyper-V Sockets

Changes since v12: return ENOMEM on buffer allocation failure.
Actually "man read/write" says "Other errors may occur, depending on the object connected to fd". "man send/recv" indeed lists ENOMEM.
Considering AF_HYPERV is a new socket type, ENOMEM seems OK here. In the long run, I think we should add a new API in the VMBus driver, allowing data copy from VMBus ringbuffer into user mode buffer directly. This way, we can even eliminate this temporary buffer. Changes since v13: fix some coding style issues pointed out by David. Changes since v14: Just some stylistic changes addressing comments from Joe Perches and Olaf Hering -- thank you! - add a GPL blurb. - define a new macro PAGE_SIZE_4K and use it to replace PAGE_SIZE - change sk_to_hvsock/hvsock_to_sk() from macros to inline functions - remove a not-very-useful pr_err() - fix some typos in comment and coding style issues. Dexuan Cui (1): hv_sock: introduce Hyper-V Sockets MAINTAINERS |2 + include/linux/hyperv.h | 13 + include/linux/socket.h |4 +- include/net/af_hvsock.h | 78 +++ include/uapi/linux/hyperv.h | 24 + net/Kconfig |1 + net/Makefile|1 + net/hv_sock/Kconfig | 10 + net/hv_sock/Makefile|3 + net/hv_sock/af_hvsock.c | 1523 +++ 10 files changed, 1658 insertions(+), 1 deletion(-) create mode 100644 include/net/af_hvsock.h create mode 100644 net/hv_sock/Kconfig create mode 100644 net/hv_sock/Makefile create mode 100644 net/hv_sock/af_hvsock.c -- 2.1.0
RE: [PATCH v14 net-next 1/1] hv_sock: introduce Hyper-V Sockets
> From: Joe Perches [mailto:j...@perches.com] > Sent: Tuesday, July 5, 2016 17:39 > To: Dexuan Cui <de...@microsoft.com>; da...@davemloft.net; > gre...@linuxfoundation.org; netdev@vger.kernel.org; linux- > ker...@vger.kernel.org; de...@linuxdriverproject.org; o...@aepfle.de; > a...@canonical.com; jasow...@redhat.com; Vitaly Kuznetsov > <vkuzn...@redhat.com>; Cathy Avery <cav...@redhat.com>; KY Srinivasan > <k...@microsoft.com> > Cc: Haiyang Zhang <haiya...@microsoft.com>; Rolf Neugebauer > <rolf.neugeba...@docker.com> > Subject: Re: [PATCH v14 net-next 1/1] hv_sock: introduce Hyper-V Sockets > > On Tue, 2016-07-05 at 09:31 +, Dexuan Cui wrote: > > > > > +/* This is the address fromat of Hyper-V Sockets. > > > format > > I suppose you meant I should change > > /* This is ... > > to > > /* > > * This is ... > > I'll fix this. > > No, I just meant fromat should be format Oh... Got it. Thanks! I'll fix the typo. -- Dexuan
RE: [PATCH v14 net-next 1/1] hv_sock: introduce Hyper-V Sockets
> From: Joe Perches [mailto:j...@perches.com] > > > +#define sk_to_hvsock(__sk) ((struct hvsock_sock *)(__sk)) > > +#define hvsock_to_sk(__hvsk) ((struct sock *)(__hvsk)) > > Might as well be static inlines Hi Joe, Thank you for the suggestions (again)! :-) I'll change them to static inlines. > > +/* We send at most 4KB payload per VMBus packet. */ > > +struct hvsock_send_buf { > > + struct vmpipe_proto_header hdr; > > + u8 buf[PAGE_SIZE]; > > PAGE_SIZE might not be the right define here if > the comment is to be believed. I'll change to something like this: +#define HVSOCK_MAX_SND_SIZE_BY_VM (1024 * 4) struct hvsock_send_buf { struct vmpipe_proto_header hdr; - u8 buf[PAGE_SIZE]; + u8 buf[HVSOCK_MAX_SND_SIZE_BY_VM]; }; > > diff --git a/include/uapi/linux/hyperv.h b/include/uapi/linux/hyperv.h > [] > > @@ -396,4 +397,27 @@ struct hv_kvp_ip_msg { > > struct hv_kvp_ipaddr_value kvp_ip_val; > > } __attribute__((packed)); > > > > +/* This is the address fromat of Hyper-V Sockets. > > format I suppose you meant I should change /* This is ... to /* * This is ... I'll fix this. > > diff --git a/net/hv_sock/af_hvsock.c b/net/hv_sock/af_hvsock.c > [] > > @@ -0,0 +1,1519 @@ > > +/* > > + * Hyper-V Sockets -- a socket-based communication channel between the > > + * Hyper-V host and the virtual machines running on it. > > + * > > + * Copyright(c) 2016, Microsoft Corporation. All rights reserved. > > + * > > + * Redistribution and use in source and binary forms, with or without > > + * modification, are permitted provided that the following conditions > > + * are met > . > Is this license GPL compatible? Yes. At the end of the file, there is a line +MODULE_LICENSE("Dual BSD/GPL"); > > +static struct proto hvsock_proto = { > > + .name = "HV_SOCK", > > + .owner = THIS_MODULE, > > + .obj_size = sizeof(struct hvsock_sock), > > +}; > > const? No. In hvsock_create(), hvsock_proto is passed to sk_alloc(), which requires a non-const argument. 
> > +static int hvsock_recvmsg_wait(struct sock *sk, struct msghdr *msg,
> > +			       size_t len, int flags)
> > +{
> []
> > +	if (ret != 0 || payload_len >
> > +	    sizeof(hvsk->recv->buf)) {
>
> This could look nicer as
>
> 	if (ret != 0 ||
> 	    payload_len > sizeof(hvsk->recv->buf)) {

I'll fix this.

Thanks,
-- Dexuan
RE: [PATCH v14 net-next 1/1] hv_sock: introduce Hyper-V Sockets
> From: David Miller [mailto:da...@davemloft.net] > Sent: Tuesday, July 5, 2016 14:27 > To: Dexuan Cui <de...@microsoft.com> > Subject: Re: [PATCH v14 net-next 1/1] hv_sock: introduce Hyper-V Sockets > > From: Dexuan Cui <de...@microsoft.com> > Date: Tue, 5 Jul 2016 01:58:31 + > > > Not sure if you had a chance to review this version. > > Why me? I just think you're the most responsive reviewer. :-) > Other people have to review this too. Sure. Let me try to ask more people to review this. > > Now I have a question: may I split the include/linux/socket.h change > > and ask you to pre-allocate the number for AF_HYPERV to allow > > backporting of Hyper-V Sockets to distro kernels, and to make sure > > that applications using the socket type will work with the backport > > as well as the upstream kernel? > > Sorry, I'm not going to do this. > > You cannot commit anything in userspace to this value anywhere > until it is accepted upstream. Got it. Thanks for the explanation! Thanks, -- Dexuan
RE: [PATCH v14 net-next 1/1] hv_sock: introduce Hyper-V Sockets
> From: linux-kernel-ow...@vger.kernel.org [mailto:linux-kernel- > ow...@vger.kernel.org] On Behalf Of Dexuan Cui > Sent: Thursday, June 30, 2016 23:59 > diff --git a/include/linux/socket.h b/include/linux/socket.h > index b5cc5a6..0b68b58 100644 > --- a/include/linux/socket.h > +++ b/include/linux/socket.h > @@ -202,8 +202,9 @@ struct ucred { > #define AF_VSOCK 40 /* vSockets */ > #define AF_KCM 41 /* Kernel Connection Multiplexor*/ > #define AF_QIPCRTR 42 /* Qualcomm IPC Router */ > +#define AF_HYPERV43 /* Hyper-V Sockets */ > > -#define AF_MAX 43 /* For now.. */ > +#define AF_MAX 44 /* For now.. */ Hi David, Not sure if you had a chance to review this version. Now I have a question: may I split the include/linux/socket.h change and ask you to pre-allocate the number for AF_HYPERV to allow backporting of Hyper-V Sockets to distro kernels, and to make sure that applications using the socket type will work with the backport as well as the upstream kernel? Thanks, -- Dexuan
RE: [PATCH v14 net-next 1/1] hv_sock: introduce Hyper-V Sockets
> From: Olaf Hering [mailto:o...@aepfle.de] > Sent: Friday, July 1, 2016 0:12 > To: Dexuan Cui <de...@microsoft.com> > Cc: da...@davemloft.net; gre...@linuxfoundation.org; > netdev@vger.kernel.org; linux-ker...@vger.kernel.org; > de...@linuxdriverproject.org; a...@canonical.com; jasow...@redhat.com; > Vitaly Kuznetsov <vkuzn...@redhat.com>; Cathy Avery <cav...@redhat.com>; > KY Srinivasan <k...@microsoft.com>; Haiyang Zhang > <haiya...@microsoft.com>; j...@perches.com; Rolf Neugebauer > <rolf.neugeba...@docker.com> > Subject: Re: [PATCH v14 net-next 1/1] hv_sock: introduce Hyper-V Sockets > > On Thu, Jun 30, Dexuan Cui wrote: > > > -#define AF_MAX 43 /* For now.. */ > > +#define AF_MAX 44 /* For now.. */ > > Should this patch also change the places where AF_MAX is used, > like all the arrays in net/core/sock.c? > > Olaf Thanks for the reminder, Olaf! I think we may as well make a separate patch for this. It is in my To-Do list. Thanks, -- Dexuan
[PATCH v14 net-next 0/1] introduce Hyper-V VM Sockets(hv_sock)
By default, 1024 sockets take about 1024 * (3+2+1+1) * 4KB = 28M bytes. (Here 1+1 means 1 page for send/recv buffers per connection, respectively.)
3) implement the TODO in hvsock_shutdown().
4) fix a bug in hvsock_close_connection(): I removed "sk->sk_socket->state = SS_UNCONNECTED;" -- actually this line is not really useful. For a connection triggered by a host app's connect(), sk->sk_socket remains NULL before the connection is accepted by the server app (in the Linux VM): see hvsock_accept() -> hvsock_accept_wait() -> sock_graft(connected, newsock). If the host app exits before the server app's accept() returns, the host can send a rescind-message to close the connection, and later, in the Linux VM's message handler (i.e. vmbus_onoffer_rescind()), Linux will get a NULL-dereferencing crash.
5) fix a bug in hvsock_open_connection(): I moved the vmbus_set_chn_rescind_callback() to a later place, because when vmbus_open() fails, hvsock_close_connection() can do nothing and we count on vmbus_onoffer_rescind() -> vmbus_device_unregister() to clean up the device.
6) some stylistic modifications.

Changes since v11:
1) remove the module params as David suggested.
2) use 5 exact pages for the VMBus send/recv rings, respectively. The host side's design of the feature requires exactly 5 pages for the recv/send rings respectively -- this is suboptimal considering memory consumption, but unluckily we have to live with it until the host comes up with a new design in the future. :-(
3) remove the per-connection static send/recv buffers. Instead, we allocate and free the buffers dynamically, and only when we recv/send data. This means: when a connection is idle, no memory is consumed as recv/send buffers at all.

Dexuan Cui (1): hv_sock: introduce Hyper-V Sockets

Changes since v12: return ENOMEM on buffer allocation failure.
Actually "man read/write" says "Other errors may occur, depending on the object connected to fd". "man send/recv" indeed lists ENOMEM.
Considering AF_HYPERV is a new socket type, ENOMEM seems OK here. In the long run, I think we should add a new API in the VMBus driver, allowing data copy from VMBus ringbuffer into user mode buffer directly. This way, we can even eliminate this temporary buffer. Changes since v13: fix some coding style issues pointed out by David. Dexuan Cui (1): hv_sock: introduce Hyper-V Sockets MAINTAINERS |2 + include/linux/hyperv.h | 13 + include/linux/socket.h |4 +- include/net/af_hvsock.h | 59 ++ include/uapi/linux/hyperv.h | 24 + net/Kconfig |1 + net/Makefile|1 + net/hv_sock/Kconfig | 10 + net/hv_sock/Makefile|3 + net/hv_sock/af_hvsock.c | 1519 +++ 10 files changed, 1635 insertions(+), 1 deletion(-) create mode 100644 include/net/af_hvsock.h create mode 100644 net/hv_sock/Kconfig create mode 100644 net/hv_sock/Makefile create mode 100644 net/hv_sock/af_hvsock.c -- 2.1.0
[PATCH v14 net-next 1/1] hv_sock: introduce Hyper-V Sockets
Hyper-V Sockets (hv_sock) supplies a byte-stream based communication mechanism between the host and the guest. It's somewhat like TCP over VMBus, but the transport layer (VMBus) is much simpler than IP. With Hyper-V Sockets, applications on the host and in the guest can talk to each other directly with the traditional BSD-style socket APIs.

Hyper-V Sockets is only available on new Windows hosts, like Windows Server 2016. More info is in this article "Make your own integration services":
https://msdn.microsoft.com/en-us/virtualization/hyperv_on_windows/develop/make_mgmt_service

The patch implements the necessary support in the guest side by introducing a new socket address family AF_HYPERV.

Signed-off-by: Dexuan Cui <de...@microsoft.com>
Cc: "K. Y. Srinivasan" <k...@microsoft.com>
Cc: Haiyang Zhang <haiya...@microsoft.com>
Cc: Vitaly Kuznetsov <vkuzn...@redhat.com>
Cc: Cathy Avery <cav...@redhat.com>
---
You can also get the patch here (8ba95c8ec9):
https://github.com/dcui/linux/commits/decui/hv_sock/net-next/20160629_v14

For the change log before v12, please see https://lkml.org/lkml/2016/5/15/31

In v12, the changes are mainly the following:
1) remove the module params as David suggested.
2) use 5 exact pages for the VMBus send/recv rings, respectively. The host side's design of the feature requires exactly 5 pages for the recv/send rings respectively -- this is suboptimal considering memory consumption, but unluckily we have to live with it until the host comes up with a new design in the future. :-(
3) remove the per-connection static send/recv buffers. Instead, we allocate and free the buffers dynamically, and only when we recv/send data. This means: when a connection is idle, no memory is consumed as recv/send buffers at all.

In v13: I return ENOMEM on buffer allocation failure.
Actually "man read/write" says "Other errors may occur, depending on the object connected to fd". "man send/recv" indeed lists ENOMEM.
Considering AF_HYPERV is a new socket type, ENOMEM seems OK here. In the long run, I think we should add a new API in the VMBus driver, allowing data copy from VMBus ringbuffer into user mode buffer directly. This way, we can even eliminate this temporary buffer. In v14: fix some coding style issues pointed out by David. Looking forward to your comments! MAINTAINERS |2 + include/linux/hyperv.h | 13 + include/linux/socket.h |4 +- include/net/af_hvsock.h | 59 ++ include/uapi/linux/hyperv.h | 24 + net/Kconfig |1 + net/Makefile|1 + net/hv_sock/Kconfig | 10 + net/hv_sock/Makefile|3 + net/hv_sock/af_hvsock.c | 1519 +++ 10 files changed, 1635 insertions(+), 1 deletion(-) diff --git a/MAINTAINERS b/MAINTAINERS index 50f69ba..6eaa26f 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -5514,7 +5514,9 @@ F:drivers/pci/host/pci-hyperv.c F: drivers/net/hyperv/ F: drivers/scsi/storvsc_drv.c F: drivers/video/fbdev/hyperv_fb.c +F: net/hv_sock/ F: include/linux/hyperv.h +F: include/net/af_hvsock.h F: tools/hv/ F: Documentation/ABI/stable/sysfs-bus-vmbus diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h index 50f493e..1cda6ea5 100644 --- a/include/linux/hyperv.h +++ b/include/linux/hyperv.h @@ -1508,5 +1508,18 @@ static inline void commit_rd_index(struct vmbus_channel *channel) vmbus_set_event(channel); } +struct vmpipe_proto_header { + u32 pkt_type; + u32 data_size; +}; + +#define HVSOCK_HEADER_LEN (sizeof(struct vmpacket_descriptor) + \ +sizeof(struct vmpipe_proto_header)) + +/* See 'prev_indices' in hv_ringbuffer_read(), hv_ringbuffer_write() */ +#define PREV_INDICES_LEN (sizeof(u64)) +#define HVSOCK_PKT_LEN(payload_len)(HVSOCK_HEADER_LEN + \ + ALIGN((payload_len), 8) + \ + PREV_INDICES_LEN) #endif /* _HYPERV_H */ diff --git a/include/linux/socket.h b/include/linux/socket.h index b5cc5a6..0b68b58 100644 --- a/include/linux/socket.h +++ b/include/linux/socket.h @@ -202,8 +202,9 @@ struct ucred { #define AF_VSOCK 40 /* vSockets */ #define AF_KCM 41 /* Kernel Connection 
Multiplexor*/ #define AF_QIPCRTR 42 /* Qualcomm IPC Router */ +#define AF_HYPERV 43 /* Hyper-V Sockets */ -#define AF_MAX 43 /* For now.. */ +#define AF_MAX 44 /* For now.. */ /* Protocol families, same as address families. */ #define PF_UNSPEC AF_UNSPEC @@ -251,6 +252,7 @@ struct ucred { #define PF_VSOCK AF_VSOCK #define PF_KCM AF_KCM #define PF_QIPCRTR AF_QIPCRTR +#define PF_HYPERV AF_HYPERV #define PF_MAX AF_MAX /* Maxi
RE: [PATCH v13 net-next 1/1] hv_sock: introduce Hyper-V Sockets
> From: David Miller [mailto:da...@davemloft.net] > Sent: Thursday, June 30, 2016 20:45 > To: Dexuan Cui <de...@microsoft.com> > Cc: gre...@linuxfoundation.org; netdev@vger.kernel.org; linux- > ker...@vger.kernel.org; de...@linuxdriverproject.org; o...@aepfle.de; > a...@canonical.com; jasow...@redhat.com; vkuzn...@redhat.com; > cav...@redhat.com; KY Srinivasan <k...@microsoft.com>; Haiyang Zhang > <haiya...@microsoft.com>; j...@perches.com; rolf.neugeba...@docker.com > Subject: Re: [PATCH v13 net-next 1/1] hv_sock: introduce Hyper-V Sockets > > From: Dexuan Cui <de...@microsoft.com> > Date: Wed, 29 Jun 2016 11:30:40 + > > > @@ -1509,4 +1509,18 @@ static inline void commit_rd_index(struct > vmbus_channel *channel) > > } > > > > > > +struct vmpipe_proto_header { > > + u32 pkt_type; > > It is wasteful to have two empty lines before this structure definition, one > is sufficient. > > ... Hi David, Thank you for pointing out the issues! I'll fix all of them, and check all the similar issues in the patch. Will post a new version ASAP. Thanks, -- Dexuan
[PATCH v13 net-next 1/1] hv_sock: introduce Hyper-V Sockets
Hyper-V Sockets (hv_sock) supplies a byte-stream based communication mechanism between the host and the guest. It's somewhat like TCP over VMBus, but the transport layer (VMBus) is much simpler than IP. With Hyper-V Sockets, applications on the host and in the guest can talk to each other directly with the traditional BSD-style socket APIs.

Hyper-V Sockets is only available on new Windows hosts, like Windows Server 2016. More info is in this article "Make your own integration services":
https://msdn.microsoft.com/en-us/virtualization/hyperv_on_windows/develop/make_mgmt_service

The patch implements the necessary support in the guest side by introducing a new socket address family AF_HYPERV.

Signed-off-by: Dexuan Cui <de...@microsoft.com>
Cc: "K. Y. Srinivasan" <k...@microsoft.com>
Cc: Haiyang Zhang <haiya...@microsoft.com>
Cc: Vitaly Kuznetsov <vkuzn...@redhat.com>
Cc: Cathy Avery <cav...@redhat.com>
---
You can also get the patch here (ae3cbdabca):
https://github.com/dcui/linux/commits/decui/hv_sock/net-next/20160629_v13

For the change log before v12, please see https://lkml.org/lkml/2016/5/15/31

In v12, the changes are mainly the following:
1) remove the module params as David suggested.
2) use 5 exact pages for the VMBus send/recv rings, respectively. The host side's design of the feature requires exactly 5 pages for the recv/send rings respectively -- this is suboptimal considering memory consumption, but unluckily we have to live with it until the host comes up with a new design in the future. :-(
3) remove the per-connection static send/recv buffers. Instead, we allocate and free the buffers dynamically, and only when we recv/send data. This means: when a connection is idle, no memory is consumed as recv/send buffers at all.

In v13: I return ENOMEM on buffer allocation failure.
Actually "man read/write" says "Other errors may occur, depending on the object connected to fd". "man send/recv" indeed lists ENOMEM.
Considering AF_HYPERV is a new socket type, ENOMEM seems OK here. In the long run, I think we should add a new API in the VMBus driver, allowing data copy from VMBus ringbuffer into user mode buffer directly. This way, we can even eliminate this temporary buffer. Looking forward to your comments! MAINTAINERS |2 + include/linux/hyperv.h | 14 + include/linux/socket.h |4 +- include/net/af_hvsock.h | 59 ++ include/uapi/linux/hyperv.h | 25 + net/Kconfig |1 + net/Makefile|1 + net/hv_sock/Kconfig | 10 + net/hv_sock/Makefile|3 + net/hv_sock/af_hvsock.c | 1519 +++ 10 files changed, 1637 insertions(+), 1 deletion(-) diff --git a/MAINTAINERS b/MAINTAINERS index 50f69ba..6eaa26f 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -5514,7 +5514,9 @@ F:drivers/pci/host/pci-hyperv.c F: drivers/net/hyperv/ F: drivers/scsi/storvsc_drv.c F: drivers/video/fbdev/hyperv_fb.c +F: net/hv_sock/ F: include/linux/hyperv.h +F: include/net/af_hvsock.h F: tools/hv/ F: Documentation/ABI/stable/sysfs-bus-vmbus diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h index 50f493e..95d159e 100644 --- a/include/linux/hyperv.h +++ b/include/linux/hyperv.h @@ -1509,4 +1509,18 @@ static inline void commit_rd_index(struct vmbus_channel *channel) } +struct vmpipe_proto_header { + u32 pkt_type; + u32 data_size; +}; + +#define HVSOCK_HEADER_LEN (sizeof(struct vmpacket_descriptor) + \ +sizeof(struct vmpipe_proto_header)) + +/* See 'prev_indices' in hv_ringbuffer_read(), hv_ringbuffer_write() */ +#define PREV_INDICES_LEN (sizeof(u64)) + +#define HVSOCK_PKT_LEN(payload_len)(HVSOCK_HEADER_LEN + \ + ALIGN((payload_len), 8) + \ + PREV_INDICES_LEN) #endif /* _HYPERV_H */ diff --git a/include/linux/socket.h b/include/linux/socket.h index b5cc5a6..0b68b58 100644 --- a/include/linux/socket.h +++ b/include/linux/socket.h @@ -202,8 +202,9 @@ struct ucred { #define AF_VSOCK 40 /* vSockets */ #define AF_KCM 41 /* Kernel Connection Multiplexor*/ #define AF_QIPCRTR 42 /* Qualcomm IPC Router */ +#define AF_HYPERV 43 /* 
Hyper-V Sockets */ -#define AF_MAX 43 /* For now.. */ +#define AF_MAX 44 /* For now.. */ /* Protocol families, same as address families. */ #define PF_UNSPEC AF_UNSPEC @@ -251,6 +252,7 @@ struct ucred { #define PF_VSOCK AF_VSOCK #define PF_KCM AF_KCM #define PF_QIPCRTR AF_QIPCRTR +#define PF_HYPERV AF_HYPERV #define PF_MAX AF_MAX /* Maximum queue length specifiable by listen. */ diff --git a/include/net/af_hvsock.h b/include/net/af_
[PATCH v13 net-next 0/1] introduce Hyper-V VM Sockets(hv_sock)
efault, 1024 sockets take about 1024 * (3+2+1+1) * 4KB = 28M bytes. (Here 1+1 means 1 page for send/recv buffers per connection, respectively.)

3) implement the TODO in hvsock_shutdown().

4) fix a bug in hvsock_close_connection(): I remove "sk->sk_socket->state = SS_UNCONNECTED;" -- actually this line is not really useful. For a connection triggered by a host app's connect(), sk->sk_socket remains NULL before the connection is accepted by the server app (in Linux VM): see hvsock_accept() -> hvsock_accept_wait() -> sock_graft(connected, newsock). If the host app exits before the server app's accept() returns, the host can send a rescind-message to close the connection and later, in the Linux VM's message handler (i.e. vmbus_onoffer_rescind()), Linux will get a NULL de-referencing crash.

5) fix a bug in hvsock_open_connection(): I move the vmbus_set_chn_rescind_callback() to a later place, because when vmbus_open() fails, hvsock_close_connection() can do nothing and we count on vmbus_onoffer_rescind() -> vmbus_device_unregister() to clean up the device.

6) some stylistic modification.

Changes since v11:

1) remove the module params as David suggested.

2) use 5 exact pages for VMBus send/recv rings, respectively. The host side's design of the feature requires 5 exact pages for recv/send rings respectively -- this is suboptimal considering memory consumption, however unluckily we have to live with it, before the host comes up with a new design in the future. :-(

3) remove the per-connection static send/recv buffers. Instead, we allocate and free the buffers dynamically only when we recv/send data. This means: when a connection is idle, no memory is consumed as recv/send buffers at all.

Dexuan Cui (1): hv_sock: introduce Hyper-V Sockets

Changes since v12: return ENOMEM on buffer allocation failure. Actually "man read/write" says "Other errors may occur, depending on the object connected to fd". "man send/recv" indeed lists ENOMEM.
Considering AF_HYPERV is a new socket type, ENOMEM seems OK here. In the long run, I think we should add a new API in the VMBus driver, allowing data copy from VMBus ringbuffer into user mode buffer directly. This way, we can even eliminate this temporary buffer. MAINTAINERS |2 + include/linux/hyperv.h | 14 + include/linux/socket.h |4 +- include/net/af_hvsock.h | 59 ++ include/uapi/linux/hyperv.h | 25 + net/Kconfig |1 + net/Makefile|1 + net/hv_sock/Kconfig | 10 + net/hv_sock/Makefile|3 + net/hv_sock/af_hvsock.c | 1519 +++ 10 files changed, 1637 insertions(+), 1 deletion(-) create mode 100644 include/net/af_hvsock.h create mode 100644 net/hv_sock/Kconfig create mode 100644 net/hv_sock/Makefile create mode 100644 net/hv_sock/af_hvsock.c -- 2.1.0
RE: [PATCH v12 net-next 1/1] hv_sock: introduce Hyper-V Sockets
> From: Rick Jones [mailto:rick.jon...@hpe.com]
> Sent: Tuesday, June 28, 2016 23:43
> To: Dexuan Cui <de...@microsoft.com>; David Miller <da...@davemloft.net>
> Cc: gre...@linuxfoundation.org; netdev@vger.kernel.org; linux-ker...@vger.kernel.org; de...@linuxdriverproject.org; o...@aepfle.de; a...@canonical.com; jasow...@redhat.com; vkuzn...@redhat.com; cav...@redhat.com; KY Srinivasan <k...@microsoft.com>; Haiyang Zhang <haiya...@microsoft.com>; j...@perches.com
> Subject: Re: [PATCH v12 net-next 1/1] hv_sock: introduce Hyper-V Sockets
>
> On 06/28/2016 02:59 AM, Dexuan Cui wrote:
> > The idea here is: IMO the syscalls sys_read()/write() shouldn't return -ENOMEM, so I have to make sure the buffer allocation succeeds?
> >
> > I tried to use kmalloc with __GFP_NOFAIL, but I hit a warning in mm/page_alloc.c:
> > WARN_ON_ONCE((gfp_flags & __GFP_NOFAIL) && (order > 1));
> >
> > What error code do you think I should return? EAGAIN, ERESTARTSYS, or something else?
> >
> > May I have your suggestion? Thanks!
>
> What happens as far as errno is concerned when an application makes a read() call against a (say TCP) socket associated with a connection which has been reset?

I suppose it is ECONNRESET (Connection reset by peer).

> Is it limited to those errno values listed in the read() manpage, or does it end-up getting an errno value from those listed in the recv() manpage? Or, perhaps even one not (presently) listed in either?
>
> rick jones

Actually "man read/write" says "Other errors may occur, depending on the object connected to fd". "man send/recv" indeed lists ENOMEM. Considering AF_HYPERV is a new socket type, ENOMEM seems OK to me and I'm going to post a new version of the patch.

In the long run, I think we should add a new API in the VMBus driver, allowing data copy from the VMBus ringbuffer into a user mode buffer directly. This way, we can even eliminate this temporary buffer.

Thanks,
-- Dexuan
RE: [PATCH v12 net-next 1/1] hv_sock: introduce Hyper-V Sockets
> From: David Miller [mailto:da...@davemloft.net]
> Sent: Tuesday, June 28, 2016 21:45
> To: Dexuan Cui <de...@microsoft.com>
> Cc: gre...@linuxfoundation.org; netdev@vger.kernel.org; linux-ker...@vger.kernel.org; de...@linuxdriverproject.org; o...@aepfle.de; a...@canonical.com; jasow...@redhat.com; vkuzn...@redhat.com; cav...@redhat.com; KY Srinivasan <k...@microsoft.com>; Haiyang Zhang <haiya...@microsoft.com>; j...@perches.com
> Subject: Re: [PATCH v12 net-next 1/1] hv_sock: introduce Hyper-V Sockets
>
> From: Dexuan Cui <de...@microsoft.com>
> Date: Tue, 28 Jun 2016 09:59:21 +
>
> > The idea here is: IMO the syscalls sys_read()/write() shouldn't return -ENOMEM, so I have to make sure the buffer allocation succeeds?
>
> You have to fail if resources cannot be allocated.

OK, I'll try to fix this, probably by returning -EAGAIN or -ERESTARTSYS. I'll report back ASAP.

Thanks,
-- Dexuan
RE: [PATCH v12 net-next 1/1] hv_sock: introduce Hyper-V Sockets
> From: David Miller [mailto:da...@davemloft.net]
> Sent: Tuesday, June 28, 2016 17:34
> To: Dexuan Cui <de...@microsoft.com>
> Cc: gre...@linuxfoundation.org; netdev@vger.kernel.org; linux-ker...@vger.kernel.org; de...@linuxdriverproject.org; o...@aepfle.de; a...@canonical.com; jasow...@redhat.com; vkuzn...@redhat.com; cav...@redhat.com; KY Srinivasan <k...@microsoft.com>; Haiyang Zhang <haiya...@microsoft.com>; j...@perches.com
> Subject: Re: [PATCH v12 net-next 1/1] hv_sock: introduce Hyper-V Sockets
>
> From: Dexuan Cui <de...@microsoft.com>
> Date: Fri, 24 Jun 2016 07:45:24 +
>
> > + while ((ret = vmalloc(size)) == NULL)
> > + ssleep(1);
>
> This is completely, and entirely, unacceptable.
>
> If the allocation fails, you return an error and release your resources.
>
> You don't just loop forever waiting for it to succeed.

Hi David,
I agree this is ugly... The idea here is: IMO the syscalls sys_read()/write() shouldn't return -ENOMEM, so I have to make sure the buffer allocation succeeds?

I tried to use kmalloc with __GFP_NOFAIL, but I hit a warning in mm/page_alloc.c:
WARN_ON_ONCE((gfp_flags & __GFP_NOFAIL) && (order > 1));

What error code do you think I should return? EAGAIN, ERESTARTSYS, or something else?

May I have your suggestion? Thanks!
-- Dexuan
[PATCH v12 net-next 1/1] hv_sock: introduce Hyper-V Sockets
Hyper-V Sockets (hv_sock) supplies a byte-stream based communication mechanism between the host and the guest. It's somewhat like TCP over VMBus, but the transportation layer (VMBus) is much simpler than IP. With Hyper-V Sockets, applications between the host and the guest can talk to each other directly by the traditional BSD-style socket APIs. Hyper-V Sockets is only available on new Windows hosts, like Windows Server 2016. More info is in this article "Make your own integration services": https://msdn.microsoft.com/en-us/virtualization/hyperv_on_windows/develop/make_mgmt_service The patch implements the necessary support in the guest side by introducing a new socket address family AF_HYPERV. Signed-off-by: Dexuan Cui <de...@microsoft.com> Cc: "K. Y. Srinivasan" <k...@microsoft.com> Cc: Haiyang Zhang <haiya...@microsoft.com> Cc: Vitaly Kuznetsov <vkuzn...@redhat.com> Cc: Cathy Avery <cav...@redhat.com> --- You can also get the patch here: https://github.com/dcui/linux/commits/decui/hv_sock/net-next/20160620_v12 For the change log before v12, please see https://lkml.org/lkml/2016/5/15/31 In v12, the changes are mainly the following: 1) remove the module params as David suggested. 2) use 5 exact pages for VMBus send/recv rings, respectively. The host side's design of the feature requires 5 exact pages for recv/send rings respectively -- this is suboptimal considering memory consumption, however unluckily we have to live with it, before the host comes up with a new design in the future. :-( 3) remove the per-connection static send/recv buffers Instead, we allocate and free the buffers dynamically only when we recv/send data. This means: when a connection is idle, no memory is consumed as recv/send buffers at all. Looking forward to your comments! 
MAINTAINERS |2 + include/linux/hyperv.h | 14 + include/linux/socket.h |4 +- include/net/af_hvsock.h | 59 ++ include/uapi/linux/hyperv.h | 25 + net/Kconfig |1 + net/Makefile|1 + net/hv_sock/Kconfig | 10 + net/hv_sock/Makefile|3 + net/hv_sock/af_hvsock.c | 1514 +++ 10 files changed, 1632 insertions(+), 1 deletion(-) diff --git a/MAINTAINERS b/MAINTAINERS index 50f69ba..6eaa26f 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -5514,7 +5514,9 @@ F:drivers/pci/host/pci-hyperv.c F: drivers/net/hyperv/ F: drivers/scsi/storvsc_drv.c F: drivers/video/fbdev/hyperv_fb.c +F: net/hv_sock/ F: include/linux/hyperv.h +F: include/net/af_hvsock.h F: tools/hv/ F: Documentation/ABI/stable/sysfs-bus-vmbus diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h index 50f493e..95d159e 100644 --- a/include/linux/hyperv.h +++ b/include/linux/hyperv.h @@ -1509,4 +1509,18 @@ static inline void commit_rd_index(struct vmbus_channel *channel) } +struct vmpipe_proto_header { + u32 pkt_type; + u32 data_size; +}; + +#define HVSOCK_HEADER_LEN (sizeof(struct vmpacket_descriptor) + \ +sizeof(struct vmpipe_proto_header)) + +/* See 'prev_indices' in hv_ringbuffer_read(), hv_ringbuffer_write() */ +#define PREV_INDICES_LEN (sizeof(u64)) + +#define HVSOCK_PKT_LEN(payload_len)(HVSOCK_HEADER_LEN + \ + ALIGN((payload_len), 8) + \ + PREV_INDICES_LEN) #endif /* _HYPERV_H */ diff --git a/include/linux/socket.h b/include/linux/socket.h index b5cc5a6..0b68b58 100644 --- a/include/linux/socket.h +++ b/include/linux/socket.h @@ -202,8 +202,9 @@ struct ucred { #define AF_VSOCK 40 /* vSockets */ #define AF_KCM 41 /* Kernel Connection Multiplexor*/ #define AF_QIPCRTR 42 /* Qualcomm IPC Router */ +#define AF_HYPERV 43 /* Hyper-V Sockets */ -#define AF_MAX 43 /* For now.. */ +#define AF_MAX 44 /* For now.. */ /* Protocol families, same as address families. 
*/ #define PF_UNSPEC AF_UNSPEC @@ -251,6 +252,7 @@ struct ucred { #define PF_VSOCK AF_VSOCK #define PF_KCM AF_KCM #define PF_QIPCRTR AF_QIPCRTR +#define PF_HYPERV AF_HYPERV #define PF_MAX AF_MAX /* Maximum queue length specifiable by listen. */ diff --git a/include/net/af_hvsock.h b/include/net/af_hvsock.h new file mode 100644 index 000..20d23d5 --- /dev/null +++ b/include/net/af_hvsock.h @@ -0,0 +1,59 @@ +#ifndef __AF_HVSOCK_H__ +#define __AF_HVSOCK_H__ + +#include +#include +#include + +/* The host side's design of the feature requires 5 exact pages for recv/send + * rings respectively -- this is suboptimal considering memory consumption, + * however unluckily we have to live with it, before the host comes up with + * a better new design in the future. + */ +#define R
[PATCH v12 net-next 0/1] introduce Hyper-V VM Sockets(hv_sock)
efault, 1024 sockets take about 1024 * (3+2+1+1) * 4KB = 28M bytes. (Here 1+1 means 1 page for send/recv buffers per connection, respectively.)

3) implement the TODO in hvsock_shutdown().

4) fix a bug in hvsock_close_connection(): I remove "sk->sk_socket->state = SS_UNCONNECTED;" -- actually this line is not really useful. For a connection triggered by a host app's connect(), sk->sk_socket remains NULL before the connection is accepted by the server app (in Linux VM): see hvsock_accept() -> hvsock_accept_wait() -> sock_graft(connected, newsock). If the host app exits before the server app's accept() returns, the host can send a rescind-message to close the connection and later, in the Linux VM's message handler (i.e. vmbus_onoffer_rescind()), Linux will get a NULL de-referencing crash.

5) fix a bug in hvsock_open_connection(): I move the vmbus_set_chn_rescind_callback() to a later place, because when vmbus_open() fails, hvsock_close_connection() can do nothing and we count on vmbus_onoffer_rescind() -> vmbus_device_unregister() to clean up the device.

6) some stylistic modification.

Changes since v11:

1) remove the module params as David suggested.

2) use 5 exact pages for VMBus send/recv rings, respectively. The host side's design of the feature requires 5 exact pages for recv/send rings respectively -- this is suboptimal considering memory consumption, however unluckily we have to live with it, before the host comes up with a new design in the future. :-(

3) remove the per-connection static send/recv buffers. Instead, we allocate and free the buffers dynamically only when we recv/send data. This means: when a connection is idle, no memory is consumed as recv/send buffers at all.
Dexuan Cui (1): hv_sock: introduce Hyper-V Sockets MAINTAINERS |2 + include/linux/hyperv.h | 14 + include/linux/socket.h |4 +- include/net/af_hvsock.h | 59 ++ include/uapi/linux/hyperv.h | 25 + net/Kconfig |1 + net/Makefile|1 + net/hv_sock/Kconfig | 10 + net/hv_sock/Makefile|3 + net/hv_sock/af_hvsock.c | 1514 +++ 10 files changed, 1632 insertions(+), 1 deletion(-) create mode 100644 include/net/af_hvsock.h create mode 100644 net/hv_sock/Kconfig create mode 100644 net/hv_sock/Makefile create mode 100644 net/hv_sock/af_hvsock.c -- 2.1.0
RE: [PATCH v11 net-next 0/1] introduce Hyper-V VM Sockets(hv_sock)
> From: David Miller [mailto:da...@davemloft.net]
> Sent: Thursday, May 19, 2016 12:13
> To: Dexuan Cui <de...@microsoft.com>
> Cc: KY Srinivasan <k...@microsoft.com>; o...@aepfle.de; gre...@linuxfoundation.org; jasow...@redhat.com; linux-ker...@vger.kernel.org; j...@perches.com; netdev@vger.kernel.org; a...@canonical.com; de...@linuxdriverproject.org; Haiyang Zhang <haiya...@microsoft.com>
> Subject: Re: [PATCH v11 net-next 0/1] introduce Hyper-V VM Sockets(hv_sock)
>
> I'm travelling and very busy with the merge window. So sorry I won't be able to think about this for some time.

David,
Sure, I understand. Please let me recap my last mail:

1) I'll replace my statically-allocated per-connection "send/recv bufs" with dynamically allocated ones, so no buf is used when there is no traffic.

2) Another kind of buffer, i.e. the multi-page "VMBus send/recv ringbuffer", is a must IMO due to the host side's design of the feature: every connection needs its own ringbuffer, which takes several pages (2~3 pages at least, and 5 pages should suffice for good performance). The ringbuffer can be accessed by the host at any time, so IMO the pages can't be swappable.

I understand net-next is closed now. I'm going to post the next version after 4.7-rc1 is out in several weeks. If you could give me some suggestions, I would definitely be happy to take them.

Thanks!
-- Dexuan
RE: [PATCH v11 net-next 0/1] introduce Hyper-V VM Sockets(hv_sock)
> From: devel [mailto:driverdev-devel-boun...@linuxdriverproject.org] On Behalf > Of Dexuan Cui > Sent: Tuesday, May 17, 2016 10:46 > To: David Miller <da...@davemloft.net> > Cc: o...@aepfle.de; gre...@linuxfoundation.org; jasow...@redhat.com; > linux-ker...@vger.kernel.org; j...@perches.com; netdev@vger.kernel.org; > a...@canonical.com; de...@linuxdriverproject.org; Haiyang Zhang > <haiya...@microsoft.com> > Subject: RE: [PATCH v11 net-next 0/1] introduce Hyper-V VM Sockets(hv_sock) > > > From: David Miller [mailto:da...@davemloft.net] > > Sent: Monday, May 16, 2016 1:16 > > To: Dexuan Cui <de...@microsoft.com> > > Cc: gre...@linuxfoundation.org; netdev@vger.kernel.org; linux- > > ker...@vger.kernel.org; de...@linuxdriverproject.org; o...@aepfle.de; > > a...@canonical.com; jasow...@redhat.com; cav...@redhat.com; KY > > Srinivasan <k...@microsoft.com>; Haiyang Zhang <haiya...@microsoft.com>; > > j...@perches.com; vkuzn...@redhat.com > > Subject: Re: [PATCH v11 net-next 0/1] introduce Hyper-V VM Sockets(hv_sock) > > > > From: Dexuan Cui <de...@microsoft.com> > > Date: Sun, 15 May 2016 09:52:42 -0700 > > > > > Changes since v10 > > > > > > 1) add module params: send_ring_page, recv_ring_page. They can be used > to > > > enlarge the ringbuffer size to get better performance, e.g., > > > # modprobe hv_sock recv_ring_page=16 send_ring_page=16 > > > By default, recv_ring_page is 3 and send_ring_page is 2. > > > > > > 2) add module param max_socket_number (the default is 1024). > > > A user can enlarge the number to create more than 1024 hv_sock sockets. > > > By default, 1024 sockets take about 1024 * (3+2+1+1) * 4KB = 28M bytes. > > > (Here 1+1 means 1 page for send/recv buffers per connection, > > > respectively.) > > > > This is papering around my objections, and create module parameters which > > I am fundamentally against. > > > > You're making the facility unusable by default, just to work around my > > memory consumption concerns. 
> > > > What will end up happening is that everyone will simply increase the > > values. > > > > You're not really addressing the core issue, and I will be ignoring you > > future submissions of this change until you do. > > David, > I am sorry I came across as ignoring your feedback; that was not my intention. > The current host side design for this feature is such that each socket > connection > needs its own channel, which consists of > > 1.A ring buffer for host to guest communication > 2.A ring buffer for guest to host communication > > The memory for the ring buffers has to be pinned down as this will be accessed > both from interrupt level in Linux guest and from the host OS at any time. > > To address your concerns, I am planning to re-implement both the receive path > and the send path so that no additional pinned memory will be needed. > > Receive Path: > When the application does a read on the socket, we will dynamically allocate > the buffer and perform the read operation on the incoming ring buffer. Since > we will be in the process context, we can sleep here and will set the > "GFP_KERNEL | __GFP_NOFAIL" flags. This buffer will be freed once the > application consumes all the data. > > Send Path: > On the send side, we will construct the payload to be sent directly on the > outgoing ringbuffer. > > So, with these changes, the only memory that will be pinned down will be the > memory for the ring buffers on a per-connection basis and this memory will be > pinned down until the connection is torn down. > > Please let me know if this addresses your concerns. > > -- Dexuan Hi David, Ping. Really appreciate your comment. Thanks, -- Dexuan
RE: [PATCH v11 net-next 0/1] introduce Hyper-V VM Sockets(hv_sock)
> From: David Miller [mailto:da...@davemloft.net] > Sent: Monday, May 16, 2016 1:16 > To: Dexuan Cui <de...@microsoft.com> > Cc: gre...@linuxfoundation.org; netdev@vger.kernel.org; linux- > ker...@vger.kernel.org; de...@linuxdriverproject.org; o...@aepfle.de; > a...@canonical.com; jasow...@redhat.com; cav...@redhat.com; KY > Srinivasan <k...@microsoft.com>; Haiyang Zhang <haiya...@microsoft.com>; > j...@perches.com; vkuzn...@redhat.com > Subject: Re: [PATCH v11 net-next 0/1] introduce Hyper-V VM Sockets(hv_sock) > > From: Dexuan Cui <de...@microsoft.com> > Date: Sun, 15 May 2016 09:52:42 -0700 > > > Changes since v10 > > > > 1) add module params: send_ring_page, recv_ring_page. They can be used to > > enlarge the ringbuffer size to get better performance, e.g., > > # modprobe hv_sock recv_ring_page=16 send_ring_page=16 > > By default, recv_ring_page is 3 and send_ring_page is 2. > > > > 2) add module param max_socket_number (the default is 1024). > > A user can enlarge the number to create more than 1024 hv_sock sockets. > > By default, 1024 sockets take about 1024 * (3+2+1+1) * 4KB = 28M bytes. > > (Here 1+1 means 1 page for send/recv buffers per connection, respectively.) > > This is papering around my objections, and create module parameters which > I am fundamentally against. > > You're making the facility unusable by default, just to work around my > memory consumption concerns. > > What will end up happening is that everyone will simply increase the > values. > > You're not really addressing the core issue, and I will be ignoring you > future submissions of this change until you do. David, I am sorry I came across as ignoring your feedback; that was not my intention. 
The current host side design for this feature is such that each socket connection needs its own channel, which consists of:

1. A ring buffer for host to guest communication
2. A ring buffer for guest to host communication

The memory for the ring buffers has to be pinned down as this will be accessed both from interrupt level in Linux guest and from the host OS at any time.

To address your concerns, I am planning to re-implement both the receive path and the send path so that no additional pinned memory will be needed.

Receive Path:
When the application does a read on the socket, we will dynamically allocate the buffer and perform the read operation on the incoming ring buffer. Since we will be in the process context, we can sleep here and will set the "GFP_KERNEL | __GFP_NOFAIL" flags. This buffer will be freed once the application consumes all the data.

Send Path:
On the send side, we will construct the payload to be sent directly on the outgoing ringbuffer.

So, with these changes, the only memory that will be pinned down will be the memory for the ring buffers on a per-connection basis and this memory will be pinned down until the connection is torn down.

Please let me know if this addresses your concerns.

Thanks,
-- Dexuan
[PATCH v11 net-next 0/1] introduce Hyper-V VM Sockets(hv_sock)
efault, 1024 sockets take about 1024 * (3+2+1+1) * 4KB = 28M bytes. (Here 1+1 means 1 page for send/recv buffers per connection, respectively.)

3) implement the TODO in hvsock_shutdown().

4) fix a bug in hvsock_close_connection(): I remove "sk->sk_socket->state = SS_UNCONNECTED;" -- actually this line is not really useful. For a connection triggered by a host app's connect(), sk->sk_socket remains NULL before the connection is accepted by the server app (in Linux VM): see hvsock_accept() -> hvsock_accept_wait() -> sock_graft(connected, newsock). If the host app exits before the server app's accept() returns, the host can send a rescind-message to close the connection and later, in the Linux VM's message handler (i.e. vmbus_onoffer_rescind()), Linux will get a NULL de-referencing crash.

5) fix a bug in hvsock_open_connection(): I move the vmbus_set_chn_rescind_callback() to a later place, because when vmbus_open() fails, hvsock_close_connection() can do nothing and we count on vmbus_onoffer_rescind() -> vmbus_device_unregister() to clean up the device.

6) some stylistic modification.

Dexuan Cui (1): hv_sock: introduce Hyper-V Sockets

MAINTAINERS | 2 +
include/linux/hyperv.h | 14 +
include/linux/socket.h | 4 +-
include/net/af_hvsock.h | 78 +++
include/uapi/linux/hyperv.h | 25 +
net/Kconfig | 1 +
net/Makefile | 1 +
net/hv_sock/Kconfig | 10 +
net/hv_sock/Makefile | 3 +
net/hv_sock/af_hvsock.c | 1520 +++
10 files changed, 1657 insertions(+), 1 deletion(-)
create mode 100644 include/net/af_hvsock.h
create mode 100644 net/hv_sock/Kconfig
create mode 100644 net/hv_sock/Makefile
create mode 100644 net/hv_sock/af_hvsock.c
-- 2.7.4
[PATCH v11 net-next 1/1] hv_sock: introduce Hyper-V Sockets
Hyper-V Sockets (hv_sock) supplies a byte-stream based communication mechanism between the host and the guest. It's somewhat like TCP over VMBus, but the transportation layer (VMBus) is much simpler than IP. With Hyper-V Sockets, applications between the host and the guest can talk to each other directly by the traditional BSD-style socket APIs. Hyper-V Sockets is only available on new Windows hosts, like Windows Server 2016. More info is in this article "Make your own integration services": https://msdn.microsoft.com/en-us/virtualization/hyperv_on_windows/develop/make_mgmt_service The patch implements the necessary support in the guest side by introducing a new socket address family AF_HYPERV. Signed-off-by: Dexuan Cui <de...@microsoft.com> Cc: "K. Y. Srinivasan" <k...@microsoft.com> Cc: Haiyang Zhang <haiya...@microsoft.com> Cc: Vitaly Kuznetsov <vkuzn...@redhat.com> Cc: Cathy Avery <cav...@redhat.com> --- You can also get the patch on this branch: https://github.com/dcui/linux/commits/decui/hv_sock/net-next/20160515_v11 For the change log before v10, please see https://lkml.org/lkml/2016/5/4/532 In v10, the main changes consist of 1) minimize struct hvsock_sock by making the send/recv buffers pointers. the buffers are allocated by kmalloc() in __hvsock_create(). 2) minimize the sizes of the send/recv buffers and the vmbus ringbuffers. In v11, the changes are: 1) add module params: send_ring_page, recv_ring_page. They can be used to enlarge the ringbuffer size to get better performance, e.g., # modprobe hv_sock recv_ring_page=16 send_ring_page=16 By default, recv_ring_page is 3 and send_ring_page is 2. 2) add module param max_socket_number (the default is 1024). A user can enlarge the number to create more than 1024 hv_sock sockets. By default, 1024 sockets take about 1024 * (3+2+1+1) * 4KB = 28M bytes. (Here 1+1 means 1 page for send/recv buffers per connection, respectively.) 3) implement the TODO in hvsock_shutdown(). 
4) fix a bug in hvsock_close_connection(): I remove "sk->sk_socket->state = SS_UNCONNECTED;" -- actually this line is not really useful. For a connection triggered by a host app's connect(), sk->sk_socket remains NULL before the connection is accepted by the server app (in Linux VM): see hvsock_accept() -> hvsock_accept_wait() -> sock_graft(connected, newsock). If the host app exits before the server app's accept() returns, the host can send a rescind-message to close the connection and later, in the Linux VM's message handler (i.e. vmbus_onoffer_rescind()), Linux will get a NULL de-referencing crash.

5) fix a bug in hvsock_open_connection(): I move the vmbus_set_chn_rescind_callback() to a later place, because when vmbus_open() fails, hvsock_close_connection() can do nothing and we count on vmbus_onoffer_rescind() -> vmbus_device_unregister() to clean up the device.

6) some stylistic modification.

MAINTAINERS | 2 +
include/linux/hyperv.h | 14 +
include/linux/socket.h | 4 +-
include/net/af_hvsock.h | 78 +++
include/uapi/linux/hyperv.h | 25 +
net/Kconfig | 1 +
net/Makefile | 1 +
net/hv_sock/Kconfig | 10 +
net/hv_sock/Makefile | 3 +
net/hv_sock/af_hvsock.c | 1520 +++
10 files changed, 1657 insertions(+), 1 deletion(-)

diff --git a/MAINTAINERS b/MAINTAINERS index b57df66..c9fe2c6 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -5271,7 +5271,9 @@ F:drivers/pci/host/pci-hyperv.c F: drivers/net/hyperv/ F: drivers/scsi/storvsc_drv.c F: drivers/video/fbdev/hyperv_fb.c +F: net/hv_sock/ F: include/linux/hyperv.h +F: include/net/af_hvsock.h F: tools/hv/ F: Documentation/ABI/stable/sysfs-bus-vmbus diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h index aa0fadc..7be7237 100644 --- a/include/linux/hyperv.h +++ b/include/linux/hyperv.h @@ -1338,4 +1338,18 @@ extern __u32 vmbus_proto_version; int vmbus_send_tl_connect_request(const uuid_le *shv_guest_servie_id, const uuid_le *shv_host_servie_id); +struct vmpipe_proto_header { + u32 pkt_type; + u32 data_size; +}; + +#define HVSOCK_HEADER_LEN (sizeof(struct vmpacket_descriptor) + \ +sizeof(struct vmpipe_proto_header)) + +/* See 'prev_indices' in hv_ringbuffer_read(), hv_ringbuffer_write() */ +#define PREV_INDICES_LEN (sizeof(u64)) + +#define HVSOCK_PKT_LEN(payload_len)(HVSOCK_HEADER_LEN + \ + ALIGN((payload_len), 8) + \ + PREV_INDICES_LEN) #endif /* _HYPERV_H */ diff --git a/include/linux/socket.h b/include/linux/socket.h index b5cc5a6..0b68b58 100644 --- a/include/linux/socket.h +++ b/include/linux/socket.h @@ -202,8 +202,9 @@ struct ucred { #define AF_VSOCK 40
[PATCH v10 net-next 1/1] hv_sock: introduce Hyper-V Sockets
Hyper-V Sockets (hv_sock) supplies a byte-stream based communication mechanism between the host and the guest. It's somewhat like TCP over VMBus, but the transportation layer (VMBus) is much simpler than IP. With Hyper-V Sockets, applications between the host and the guest can talk to each other directly by the traditional BSD-style socket APIs. Hyper-V Sockets is only available on new Windows hosts, like Windows Server 2016. More info is in this article "Make your own integration services": https://msdn.microsoft.com/en-us/virtualization/hyperv_on_windows/develop/make_mgmt_service The patch implements the necessary support in the guest side by introducing a new socket address family AF_HYPERV. Signed-off-by: Dexuan Cui <de...@microsoft.com> Cc: "K. Y. Srinivasan" <k...@microsoft.com> Cc: Haiyang Zhang <haiya...@microsoft.com> Cc: Vitaly Kuznetsov <vkuzn...@redhat.com> Cc: Cathy Avery <cav...@redhat.com> --- You can also get the patch on this branch: https://github.com/dcui/linux/commits/decui/hv_sock/net-next/20160512_v10 For the change log before v10, please see https://lkml.org/lkml/2016/5/4/532 In v10, the main changes consist of 1) minimize struct hvsock_sock by making the send/recv buffers pointers. the buffers are allocated by kmalloc() in __hvsock_create(). 2) minimize the sizes of the send/recv buffers and the vmbus ringbuffers. 
MAINTAINERS |2 + include/linux/hyperv.h | 14 + include/linux/socket.h |4 +- include/net/af_hvsock.h | 78 +++ include/uapi/linux/hyperv.h | 25 + net/Kconfig |1 + net/Makefile|1 + net/hv_sock/Kconfig | 10 + net/hv_sock/Makefile|3 + net/hv_sock/af_hvsock.c | 1484 +++ 10 files changed, 1621 insertions(+), 1 deletion(-) diff --git a/MAINTAINERS b/MAINTAINERS index b57df66..c9fe2c6 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -5271,7 +5271,9 @@ F:drivers/pci/host/pci-hyperv.c F: drivers/net/hyperv/ F: drivers/scsi/storvsc_drv.c F: drivers/video/fbdev/hyperv_fb.c +F: net/hv_sock/ F: include/linux/hyperv.h +F: include/net/af_hvsock.h F: tools/hv/ F: Documentation/ABI/stable/sysfs-bus-vmbus diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h index aa0fadc..7be7237 100644 --- a/include/linux/hyperv.h +++ b/include/linux/hyperv.h @@ -1338,4 +1338,18 @@ extern __u32 vmbus_proto_version; int vmbus_send_tl_connect_request(const uuid_le *shv_guest_servie_id, const uuid_le *shv_host_servie_id); +struct vmpipe_proto_header { + u32 pkt_type; + u32 data_size; +}; + +#define HVSOCK_HEADER_LEN (sizeof(struct vmpacket_descriptor) + \ +sizeof(struct vmpipe_proto_header)) + +/* See 'prev_indices' in hv_ringbuffer_read(), hv_ringbuffer_write() */ +#define PREV_INDICES_LEN (sizeof(u64)) + +#define HVSOCK_PKT_LEN(payload_len)(HVSOCK_HEADER_LEN + \ + ALIGN((payload_len), 8) + \ + PREV_INDICES_LEN) #endif /* _HYPERV_H */ diff --git a/include/linux/socket.h b/include/linux/socket.h index b5cc5a6..0b68b58 100644 --- a/include/linux/socket.h +++ b/include/linux/socket.h @@ -202,8 +202,9 @@ struct ucred { #define AF_VSOCK 40 /* vSockets */ #define AF_KCM 41 /* Kernel Connection Multiplexor*/ #define AF_QIPCRTR 42 /* Qualcomm IPC Router */ +#define AF_HYPERV 43 /* Hyper-V Sockets */ -#define AF_MAX 43 /* For now.. */ +#define AF_MAX 44 /* For now.. */ /* Protocol families, same as address families. 
*/ #define PF_UNSPEC AF_UNSPEC @@ -251,6 +252,7 @@ struct ucred { #define PF_VSOCK AF_VSOCK #define PF_KCM AF_KCM #define PF_QIPCRTR AF_QIPCRTR +#define PF_HYPERV AF_HYPERV #define PF_MAX AF_MAX /* Maximum queue length specifiable by listen. */ diff --git a/include/net/af_hvsock.h b/include/net/af_hvsock.h new file mode 100644 index 000..e002397 --- /dev/null +++ b/include/net/af_hvsock.h @@ -0,0 +1,78 @@ +#ifndef __AF_HVSOCK_H__ +#define __AF_HVSOCK_H__ + +#include +#include +#include + +/* Note: 3-page is the minimal recv ringbuffer size: + * + * the 1st page is used as the shared read/write index etc, rather than data: + * see hv_ringbuffer_init(); + * + * the payload length in the vmbus pipe message received from the host can + * be 4096 bytes, and considing the header of HVSOCK_HEADER_LEN bytes, we + * need at least 2 extra pages for ringbuffer data. + */ +#define HVSOCK_RCV_BUF_SZPAGE_SIZE +#define VMBUS_RINGBUFFER_SIZE_HVSOCK_RCV (3 * PAGE_SIZE) + +/* As to send, here let's make sure the hvsock_send_buf struct can be held in 1 + * page, and since we want to use 2 pages for the send rin
[PATCH v10 net-next 0/1] introduce Hyper-V VM Sockets(hv_sock)
Hyper-V Sockets (hv_sock) supplies a byte-stream based communication mechanism between the host and the guest. It's somewhat like TCP over VMBus, but the transportation layer (VMBus) is much simpler than IP. With Hyper-V Sockets, applications between the host and the guest can talk to each other directly by the traditional BSD-style socket APIs. Hyper-V Sockets is only available on new Windows hosts, like Windows Server 2016. More info is in this article "Make your own integration services": https://msdn.microsoft.com/en-us/virtualization/hyperv_on_windows/develop/make_mgmt_service

The patch implements the necessary support in the guest side by introducing a new socket address family AF_HYPERV.

You can also get the patch by: https://github.com/dcui/linux/commits/decui/hv_sock/net-next/20160512_v10

Note: the VMBus driver side's supporting patches have been in the mainline tree.

I know the kernel has already had a VM Sockets driver (AF_VSOCK) based on VMware VMCI (net/vmw_vsock/, drivers/misc/vmw_vmci), and KVM is proposing AF_VSOCK of virtio version: http://marc.info/?l=linux-netdev=145952064004765=2

However, though Hyper-V Sockets may seem conceptually similar to AF_VSOCK, there are differences in the transportation layer, and IMO these make direct code reuse impractical:

1. In AF_VSOCK, the endpoint type is <u32 cid, u32 port>, but in AF_HYPERV, the endpoint type is <GUID VM_ID, GUID ServiceID>. Here GUID is 128-bit.

2. AF_VSOCK supports SOCK_DGRAM, while AF_HYPERV doesn't.

3. AF_VSOCK supports some special sock opts, like SO_VM_SOCKETS_BUFFER_SIZE, SO_VM_SOCKETS_BUFFER_MIN/MAX_SIZE and SO_VM_SOCKETS_CONNECT_TIMEOUT. These are meaningless to AF_HYPERV.

4. Some of AF_VSOCK's VMCI transportation ops are meaningless to AF_HYPERV/VMBus, like .notify_recv_init, .notify_recv_pre_block, .notify_recv_pre_dequeue, .notify_recv_post_dequeue, .notify_send_init, .notify_send_pre_block, .notify_send_pre_enqueue, .notify_send_post_enqueue, etc.

So I think we'd better introduce a new address family: AF_HYPERV.
Please review the patch. Looking forward to your comments, especially comments from David. :-)

Changes since v1:
- updated "[PATCH 6/7] hvsock: introduce Hyper-V VM Sockets feature"
- added __init and __exit for the module init/exit functions
- net/hv_sock/Kconfig: "default m" -> "default m if HYPERV"
- MODULE_LICENSE: "Dual MIT/GPL" -> "Dual BSD/GPL"

Changes since v2:
- fixed various coding issues pointed out by David Miller
- fixed indentation issues
- removed pr_debug in net/hv_sock/af_hvsock.c
- used reverse-Christmas-tree style for local variables
- EXPORT_SYMBOL -> EXPORT_SYMBOL_GPL

Changes since v3:
- fixed a few coding issues pointed out by Vitaly Kuznetsov and Dan Carpenter
- fixed the ret value in vmbus_recvpacket_hvsock on error
- fixed the style of the multi-line comment: vmbus_get_hvsock_rw_status()

Changes since v4 (https://lkml.org/lkml/2015/7/28/404):
- addressed all the comments about v4
- treat the hvsock offers/channels as special VMBus devices
- add a mechanism to pass hvsock events to the hvsock driver
- fixed some corner cases with proper locking when a connection is closed
- rebased to the latest Greg's tree

Changes since v5 (https://lkml.org/lkml/2015/12/24/103):
- addressed the coding style issues (Vitaly Kuznetsov & David Miller, thanks!)
- used a better coding for the per-channel rescind callback (Thank Vitaly!)
- avoided the introduction of the new VMBus driver APIs vmbus_sendpacket_hvsock() and vmbus_recvpacket_hvsock(), and used vmbus_sendpacket()/vmbus_recvpacket() in the higher level (i.e., the hv_sock driver). Thank Vitaly!

Changes since v6 (http://lkml.iu.edu/hypermail/linux/kernel/1601.3/01813.html):
- only a few minor changes of coding style and comments

Changes since v7:
- a few minor changes of coding style: thanks, Joe Perches!
- added some lines of comments about GUID/UUID before the struct sockaddr_hv.

Changes since v8:
- removed the unnecessary __packed for some definitions: thanks, David!
- hvsock_open_connection(): use offer.u.pipe.user_def[0] to know the connection direction, and reorganized the function
- reorganized the code according to suggestions from Cathy Avery: split big functions into small ones, set .setsockopt and .getsockopt to sock_no_setsockopt/sock_no_getsockopt
- inlined some small list helper functions

Changes since v9:
- minimized struct hvsock_sock by making the send/recv buffers pointers; the buffers are allocated by kmalloc() in __hvsock_create() now
- minimized the sizes of the send/recv buffers and the vmbus ringbuffers

Dexuan Cui (1):
  hv_sock: introduce Hyper-V Sockets

 MAINTAINERS                 |    2 +
 include/linux/hyperv.h      |   14 +
 include/linux/socket.h      |    4 +-
 include/net/af_hvsock.h     |   78 +++
 include/uapi/linux/hyperv.h |   25 +
 net/Kconfig                 |    1 +
 net/Makefile                |    1 +
 net/hv_sock/Kconfig         |   10 +
RE: [PATCH v9 net-next 1/2] hv_sock: introduce Hyper-V Sockets
> From: David Miller [mailto:da...@davemloft.net]
> Sent: Monday, May 9, 2016 1:45
> To: Dexuan Cui <de...@microsoft.com>
> Cc: gre...@linuxfoundation.org; netdev@vger.kernel.org; linux-ker...@vger.kernel.org; de...@linuxdriverproject.org; o...@aepfle.de; a...@canonical.com; jasow...@redhat.com; cav...@redhat.com; KY Srinivasan <k...@microsoft.com>; Haiyang Zhang <haiya...@microsoft.com>; j...@perches.com; vkuzn...@redhat.com
> Subject: Re: [PATCH v9 net-next 1/2] hv_sock: introduce Hyper-V Sockets
>
> From: Dexuan Cui <de...@microsoft.com>
> Date: Sun, 8 May 2016 06:11:04 +
>
> > Thanks for pointing this out!
> > I understand, so I think I should add a module parameter, e.g.,
> > "hv_sock.max_socket_number" with a default value, say, 1024?
>
> No, you should get rid of the huge multi-page buffers.

Hi David,

Ok, how do you like the below proof-of-concept patch snippet? I use 1 page for the recv buf and another page for the send buf. They should be allocated by kmalloc(sizeof(struct hvsock_send/recv_buf), GFP_KERNEL).

And, by default, I use 2 pages for the VMBus send/recv ringbuffers respectively. (Note: 2 is the minimal ringbuffer size, because 1 page of the two is actually used as the shared read/write index etc, rather than data.)

A module parameter will be added to allow the user to use a bigger ringbuffer size, if the user cares much about the performance. Another parameter will be added to limit how many hvsock sockets can be created at most. The default value can be 1024, meaning at most 1024 * (2+2+1+1) * 4KB = 24MB memory is used.
-#define VMBUS_RINGBUFFER_SIZE_HVSOCK_RECV (5 * PAGE_SIZE) -#define VMBUS_RINGBUFFER_SIZE_HVSOCK_SEND (5 * PAGE_SIZE) +#define VMBUS_RINGBUFFER_SIZE_HVSOCK_RECV (2 * PAGE_SIZE) +#define VMBUS_RINGBUFFER_SIZE_HVSOCK_SEND (2 * PAGE_SIZE) -#define HVSOCK_RCV_BUF_SZ VMBUS_RINGBUFFER_SIZE_HVSOCK_RECV +#define HVSOCK_RCV_BUF_SZ PAGE_SIZE #define HVSOCK_SND_BUF_SZ PAGE_SIZE +struct hvsock_send_buf { + struct vmpipe_proto_header hdr; + u8 buf[HVSOCK_SND_BUF_SZ]; +}; + +struct hvsock_recv_buf { + struct vmpipe_proto_header hdr; + u8 buf[HVSOCK_RCV_BUF_SZ]; + + unsigned int data_len; + unsigned int data_offset; +}; + @@ -35,21 +48,8 @@ struct hvsock_sock { struct vmbus_channel *channel; - struct { - struct vmpipe_proto_header hdr; - u8 buf[HVSOCK_SND_BUF_SZ]; - } send; - - struct { - struct vmpipe_proto_header hdr; - u8 buf[HVSOCK_RCV_BUF_SZ]; - - unsigned int data_len; - unsigned int data_offset; - } recv; + struct hvsock_send_buf *send_buf; + struct hvsock_recv_buf *recv_buf; }; Thanks, -- Dexuan
RE: [PATCH v9 net-next 1/2] hv_sock: introduce Hyper-V Sockets
> From: David Miller [mailto:da...@davemloft.net]
> Sent: Sunday, May 8, 2016 1:41
> To: Dexuan Cui <de...@microsoft.com>
> Cc: gre...@linuxfoundation.org; netdev@vger.kernel.org; linux-ker...@vger.kernel.org; de...@linuxdriverproject.org; o...@aepfle.de; a...@canonical.com; jasow...@redhat.com; cav...@redhat.com; KY Srinivasan <k...@microsoft.com>; Haiyang Zhang <haiya...@microsoft.com>; j...@perches.com; vkuzn...@redhat.com
> Subject: Re: [PATCH v9 net-next 1/2] hv_sock: introduce Hyper-V Sockets
>
> From: Dexuan Cui <de...@microsoft.com>
> Date: Sat, 7 May 2016 10:49:25 +
>
> > I should be able to make 'send', 'recv' here to pointers and use vmalloc()
> > to allocate the memory for them. I will do this.
>
> That's still unswappable kernel memory.

Hi David,

My understanding is: kernel pages are not swappable in Linux, so it looks like I can't avoid unswappable kernel memory here?

> People can open N sockets, where N is something on the order of the FD
> limit the process has, per process. This allows someone to quickly
> eat up a lot of memory and hold onto it nearly indefinitely.

Thanks for pointing this out! I understand, so I think I should add a module parameter, e.g., "hv_sock.max_socket_number", with a default value, say, 1024?

1 established hv_sock connection takes less than 20 pages, including 10 pages for the VMBus ringbuffers, 6 pages for the send/recv buffers (I'll use vmalloc() for this), etc. Here the recv buf needs a size of 5 pages because potentially the host can send the guest a VMBus packet with an up-to-5-page payload, i.e., the VMBus inbound ringbuffer size.

1024 hv_sock connections take less than 20 * 4KB * 1K = 80MB memory. A user who needs more connections can change the module parameter without reboot.

hv_sock connections are designed to work only between the host and the guest. I think 1024 connections seem pretty enough. BTW, a user can't create hv_sock connections without enough privilege.
Please see +static int hvsock_create(struct net *net, struct socket *sock, +int protocol, int kern) +{ + if (!capable(CAP_SYS_ADMIN) && !capable(CAP_NET_ADMIN)) + return -EPERM; David, does this make sense to you? Thanks, -- Dexuan
RE: [PATCH v9 net-next 1/2] hv_sock: introduce Hyper-V Sockets
> From: David Miller [mailto:da...@davemloft.net] > Sent: Saturday, May 7, 2016 1:04 > To: Dexuan Cui <de...@microsoft.com> > Cc: gre...@linuxfoundation.org; netdev@vger.kernel.org; linux- > ker...@vger.kernel.org; de...@linuxdriverproject.org; o...@aepfle.de; > a...@canonical.com; jasow...@redhat.com; cav...@redhat.com; KY > Srinivasan <k...@microsoft.com>; Haiyang Zhang <haiya...@microsoft.com>; > j...@perches.com; vkuzn...@redhat.com > Subject: Re: [PATCH v9 net-next 1/2] hv_sock: introduce Hyper-V Sockets > > From: Dexuan Cui <de...@microsoft.com> > Date: Wed, 4 May 2016 09:56:57 -0700 > > > +#define VMBUS_RINGBUFFER_SIZE_HVSOCK_RECV (5 * PAGE_SIZE) > > +#define VMBUS_RINGBUFFER_SIZE_HVSOCK_SEND (5 * PAGE_SIZE) > > + > > +#define HVSOCK_RCV_BUF_SZ > VMBUS_RINGBUFFER_SIZE_HVSOCK_RECV > ... > > +struct hvsock_sock { > ... > > + /* The 'hdr' and 'buf' in the below 'send' and 'recv' definitions must > > +* be consecutive: see hvsock_send_data() and hvsock_recv_data(). > > +*/ > > + struct { > > + struct vmpipe_proto_header hdr; > > + u8 buf[HVSOCK_SND_BUF_SZ]; > > + } send; > > + > > + struct { > > + struct vmpipe_proto_header hdr; > > + u8 buf[HVSOCK_RCV_BUF_SZ]; > > + > > + unsigned int data_len; > > + unsigned int data_offset; > > + } recv; > > I don't think allocating 5 pages of unswappable memory for every Hyper-V > socket > created is reasonable. Thanks for the comment, David! I should be able to make 'send', 'recv' here to pointers and use vmalloc() to allocate the memory for them. I will do this. Thanks, -- Dexuan
[PATCH v9 net-next 0/2] introduce Hyper-V VM Sockets(hv_sock)
Hyper-V Sockets (hv_sock) supplies a byte-stream based communication mechanism between the host and the guest. It's somewhat like TCP over VMBus, but the transportation layer (VMBus) is much simpler than IP. With Hyper-V Sockets, applications between the host and the guest can talk to each other directly by the traditional BSD-style socket APIs. Hyper-V Sockets is only available on new Windows hosts, like Windows Server 2016. More info is in this article "Make your own integration services": https://msdn.microsoft.com/en-us/virtualization/hyperv_on_windows/develop/make_mgmt_service

The patch implements the necessary support in the guest side by introducing a new socket address family AF_HYPERV.

You can also get the patch by: https://github.com/dcui/linux/commits/decui/hv_sock/net-next/20160502_v09

Note: the VMBus driver side's supporting patches have been in the mainline tree.

I know the kernel has already had a VM Sockets driver (AF_VSOCK) based on VMware VMCI (net/vmw_vsock/, drivers/misc/vmw_vmci), and KVM is proposing AF_VSOCK of virtio version: http://marc.info/?l=linux-netdev=145952064004765=2

However, though Hyper-V Sockets may seem conceptually similar to AF_VSOCK, there are differences in the transportation layer, and IMO these make direct code reuse impractical:

1. In AF_VSOCK, the endpoint type is <u32 cid, u32 port>, but in AF_HYPERV, the endpoint type is <GUID VM_ID, GUID ServiceID>. Here GUID is 128-bit.

2. AF_VSOCK supports SOCK_DGRAM, while AF_HYPERV doesn't.

3. AF_VSOCK supports some special sock opts, like SO_VM_SOCKETS_BUFFER_SIZE, SO_VM_SOCKETS_BUFFER_MIN/MAX_SIZE and SO_VM_SOCKETS_CONNECT_TIMEOUT. These are meaningless to AF_HYPERV.

4. Some of AF_VSOCK's VMCI transportation ops are meaningless to AF_HYPERV/VMBus, like .notify_recv_init, .notify_recv_pre_block, .notify_recv_pre_dequeue, .notify_recv_post_dequeue, .notify_send_init, .notify_send_pre_block, .notify_send_pre_enqueue, .notify_send_post_enqueue, etc.

So I think we'd better introduce a new address family: AF_HYPERV.
Please review the patch. Looking forward to your comments, especially comments from David. :-)

Changes since v1:
- updated "[PATCH 6/7] hvsock: introduce Hyper-V VM Sockets feature"
- added __init and __exit for the module init/exit functions
- net/hv_sock/Kconfig: "default m" -> "default m if HYPERV"
- MODULE_LICENSE: "Dual MIT/GPL" -> "Dual BSD/GPL"

Changes since v2:
- fixed various coding issues pointed out by David Miller
- fixed indentation issues
- removed pr_debug in net/hv_sock/af_hvsock.c
- used reverse-Christmas-tree style for local variables
- EXPORT_SYMBOL -> EXPORT_SYMBOL_GPL

Changes since v3:
- fixed a few coding issues pointed out by Vitaly Kuznetsov and Dan Carpenter
- fixed the ret value in vmbus_recvpacket_hvsock on error
- fixed the style of the multi-line comment: vmbus_get_hvsock_rw_status()

Changes since v4 (https://lkml.org/lkml/2015/7/28/404):
- addressed all the comments about v4
- treat the hvsock offers/channels as special VMBus devices
- add a mechanism to pass hvsock events to the hvsock driver
- fixed some corner cases with proper locking when a connection is closed
- rebased to the latest Greg's tree

Changes since v5 (https://lkml.org/lkml/2015/12/24/103):
- addressed the coding style issues (Vitaly Kuznetsov & David Miller, thanks!)
- used a better coding for the per-channel rescind callback (Thank Vitaly!)
- avoided the introduction of the new VMBus driver APIs vmbus_sendpacket_hvsock() and vmbus_recvpacket_hvsock(), and used vmbus_sendpacket()/vmbus_recvpacket() in the higher level (i.e., the hv_sock driver). Thank Vitaly!

Changes since v6 (http://lkml.iu.edu/hypermail/linux/kernel/1601.3/01813.html):
- only a few minor changes of coding style and comments

Changes since v7:
- a few minor changes of coding style: thanks, Joe Perches!
- added some lines of comments about GUID/UUID before the struct sockaddr_hv.

Changes since v8:
- removed the unnecessary __packed for some definitions: thanks, David!
- hvsock_open_connection(): use offer.u.pipe.user_def[0] to know the connection direction, and reorganized the function
- reorganized the code according to suggestions from Cathy Avery: split big functions into small ones, set .setsockopt and .getsockopt to sock_no_setsockopt/sock_no_getsockopt
- inlined some small list helper functions

Dexuan Cui (2):
  hv_sock: introduce Hyper-V Sockets
  net: add the AF_HYPERV entries to family name tables

 MAINTAINERS                 |    2 +
 include/linux/hyperv.h      |   16 +
 include/linux/socket.h      |    5 +-
 include/net/af_hvsock.h     |   55 ++
 include/uapi/linux/hyperv.h |   25 +
 net/Kconfig                 |    1 +
 net/Makefile                |    1 +
 net/core/sock.c             |    6 +-
 net/hv_sock/Kconfig         |   10 +
 net/hv_sock/Makefile        |    3 +
 net/hv_sock/af_hvsock.c     | 1434 +++
 11 files changed, 1553 insertions(+), 5 deletions(-)
 create mo
[PATCH v9 net-next 1/2] hv_sock: introduce Hyper-V Sockets
Hyper-V Sockets (hv_sock) supplies a byte-stream based communication mechanism between the host and the guest. It's somewhat like TCP over VMBus, but the transportation layer (VMBus) is much simpler than IP. With Hyper-V Sockets, applications between the host and the guest can talk to each other directly by the traditional BSD-style socket APIs. Hyper-V Sockets is only available on new Windows hosts, like Windows Server 2016. More info is in this article "Make your own integration services": https://msdn.microsoft.com/en-us/virtualization/hyperv_on_windows/develop/make_mgmt_service The patch implements the necessary support in the guest side by introducing a new socket address family AF_HYPERV. Signed-off-by: Dexuan Cui <de...@microsoft.com> Cc: "K. Y. Srinivasan" <k...@microsoft.com> Cc: Haiyang Zhang <haiya...@microsoft.com> Cc: Vitaly Kuznetsov <vkuzn...@redhat.com> Cc: Cathy Avery <cav...@redhat.com> --- MAINTAINERS |2 + include/linux/hyperv.h | 16 + include/linux/socket.h |5 +- include/net/af_hvsock.h | 55 ++ include/uapi/linux/hyperv.h | 25 + net/Kconfig |1 + net/Makefile|1 + net/hv_sock/Kconfig | 10 + net/hv_sock/Makefile|3 + net/hv_sock/af_hvsock.c | 1434 +++ 10 files changed, 1550 insertions(+), 2 deletions(-) diff --git a/MAINTAINERS b/MAINTAINERS index fa02825..b32716f 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -5268,7 +5268,9 @@ F:drivers/pci/host/pci-hyperv.c F: drivers/net/hyperv/ F: drivers/scsi/storvsc_drv.c F: drivers/video/fbdev/hyperv_fb.c +F: net/hv_sock/ F: include/linux/hyperv.h +F: include/net/af_hvsock.h F: tools/hv/ F: Documentation/ABI/stable/sysfs-bus-vmbus diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h index aa0fadc..e756719 100644 --- a/include/linux/hyperv.h +++ b/include/linux/hyperv.h @@ -1338,4 +1338,20 @@ extern __u32 vmbus_proto_version; int vmbus_send_tl_connect_request(const uuid_le *shv_guest_servie_id, const uuid_le *shv_host_servie_id); +struct vmpipe_proto_header { + u32 pkt_type; + u32 data_size; +}; + 
+#define HVSOCK_HEADER_LEN (sizeof(struct vmpacket_descriptor) + \ +sizeof(struct vmpipe_proto_header)) + +/* See 'prev_indices' in hv_ringbuffer_read(), hv_ringbuffer_write() */ +#define PREV_INDICES_LEN (sizeof(u64)) + +#define HVSOCK_PKT_LEN(payload_len)(HVSOCK_HEADER_LEN + \ + ALIGN((payload_len), 8) + \ + PREV_INDICES_LEN) +#define HVSOCK_MIN_PKT_LEN HVSOCK_PKT_LEN(1) + #endif /* _HYPERV_H */ diff --git a/include/linux/socket.h b/include/linux/socket.h index 73bf6c6..88b1ccd 100644 --- a/include/linux/socket.h +++ b/include/linux/socket.h @@ -201,8 +201,8 @@ struct ucred { #define AF_NFC 39 /* NFC sockets */ #define AF_VSOCK 40 /* vSockets */ #define AF_KCM 41 /* Kernel Connection Multiplexor*/ - -#define AF_MAX 42 /* For now.. */ +#define AF_HYPERV 42 /* Hyper-V Sockets */ +#define AF_MAX 43 /* For now.. */ /* Protocol families, same as address families. */ #define PF_UNSPEC AF_UNSPEC @@ -249,6 +249,7 @@ struct ucred { #define PF_NFC AF_NFC #define PF_VSOCK AF_VSOCK #define PF_KCM AF_KCM +#define PF_HYPERV AF_HYPERV #define PF_MAX AF_MAX /* Maximum queue length specifiable by listen. */ diff --git a/include/net/af_hvsock.h b/include/net/af_hvsock.h new file mode 100644 index 000..04bc40c --- /dev/null +++ b/include/net/af_hvsock.h @@ -0,0 +1,55 @@ +#ifndef __AF_HVSOCK_H__ +#define __AF_HVSOCK_H__ + +#include +#include +#include + +#define VMBUS_RINGBUFFER_SIZE_HVSOCK_RECV (5 * PAGE_SIZE) +#define VMBUS_RINGBUFFER_SIZE_HVSOCK_SEND (5 * PAGE_SIZE) + +#define HVSOCK_RCV_BUF_SZ VMBUS_RINGBUFFER_SIZE_HVSOCK_RECV +#define HVSOCK_SND_BUF_SZ PAGE_SIZE + +#define sk_to_hvsock(__sk)((struct hvsock_sock *)(__sk)) +#define hvsock_to_sk(__hvsk) ((struct sock *)(__hvsk)) + +struct hvsock_sock { + /* sk must be the first member. 
*/ + struct sock sk; + + struct sockaddr_hv local_addr; + struct sockaddr_hv remote_addr; + + /* protected by the global hvsock_mutex */ + struct list_head bound_list; + struct list_head connected_list; + + struct list_head accept_queue; + /* used by enqueue and dequeue */ + struct mutex accept_queue_mutex; + + struct delayed_work dwork; + + u32 peer_shutdown; + + struct vmbus_channel *channel; + + /* The 'hdr' and 'buf' in the below 'send' and 'recv' definitions must +* be consecutive: see hvsock_send_data()
[PATCH v9 net-next 2/2] net: add the AF_HYPERV entries to family name tables
This is for the hv_sock driver, which introduces AF_HYPERV(42). Signed-off-by: Dexuan Cui <de...@microsoft.com> Cc: "K. Y. Srinivasan" <k...@microsoft.com> Cc: Haiyang Zhang <haiya...@microsoft.com> Cc: Vitaly Kuznetsov <vkuzn...@redhat.com> Cc: Cathy Avery <cav...@redhat.com> --- net/core/sock.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/net/core/sock.c b/net/core/sock.c index e16a5db..c0884c7 100644 --- a/net/core/sock.c +++ b/net/core/sock.c @@ -222,7 +222,7 @@ static const char *const af_family_key_strings[AF_MAX+1] = { "sk_lock-AF_RXRPC" , "sk_lock-AF_ISDN" , "sk_lock-AF_PHONET" , "sk_lock-AF_IEEE802154", "sk_lock-AF_CAIF" , "sk_lock-AF_ALG" , "sk_lock-AF_NFC" , "sk_lock-AF_VSOCK", "sk_lock-AF_KCM" , - "sk_lock-AF_MAX" + "sk_lock-AF_HYPERV", "sk_lock-AF_MAX" }; static const char *const af_family_slock_key_strings[AF_MAX+1] = { "slock-AF_UNSPEC", "slock-AF_UNIX" , "slock-AF_INET" , @@ -239,7 +239,7 @@ static const char *const af_family_slock_key_strings[AF_MAX+1] = { "slock-AF_RXRPC" , "slock-AF_ISDN" , "slock-AF_PHONET" , "slock-AF_IEEE802154", "slock-AF_CAIF" , "slock-AF_ALG" , "slock-AF_NFC" , "slock-AF_VSOCK","slock-AF_KCM" , - "slock-AF_MAX" + "slock-AF_HYPERV", "slock-AF_MAX" }; static const char *const af_family_clock_key_strings[AF_MAX+1] = { "clock-AF_UNSPEC", "clock-AF_UNIX" , "clock-AF_INET" , @@ -256,7 +256,7 @@ static const char *const af_family_clock_key_strings[AF_MAX+1] = { "clock-AF_RXRPC" , "clock-AF_ISDN" , "clock-AF_PHONET" , "clock-AF_IEEE802154", "clock-AF_CAIF" , "clock-AF_ALG" , "clock-AF_NFC" , "clock-AF_VSOCK", "clock-AF_KCM" , - "clock-AF_MAX" + "clock-AF_HYPERV", "clock-AF_MAX" }; /* -- 2.1.0
RE: [PATCH v8 net-next 1/1] hv_sock: introduce Hyper-V Sockets
> From: Cathy Avery [mailto:cav...@redhat.com] > Sent: Wednesday, April 27, 2016 0:19 > To: Dexuan Cui <de...@microsoft.com>; gre...@linuxfoundation.org; > da...@davemloft.net; netdev@vger.kernel.org; linux-ker...@vger.kernel.org; > de...@linuxdriverproject.org; o...@aepfle.de; Jason Wang > <jasow...@redhat.com>; KY Srinivasan <k...@microsoft.com>; Haiyang Zhang > <haiya...@microsoft.com>; vkuzn...@redhat.com; j...@perches.com > Subject: Re: [PATCH v8 net-next 1/1] hv_sock: introduce Hyper-V Sockets > > Hi, > > I will be working with Dexuan to possibly port this functionality into RHEL. > > Here are my initial comments. Mostly stylistic. They are prefaced by CAA. > > Cathy Avery Thank you very much, Cathy! I'll take your pretty good suggestions and post a new version. Thanks, -- Dexuan
RE: [PATCH v8 net-next 1/1] hv_sock: introduce Hyper-V Sockets
> From: David Miller [mailto:da...@davemloft.net] > Sent: Thursday, April 14, 2016 10:30 > To: Dexuan Cui <de...@microsoft.com> > Cc: gre...@linuxfoundation.org; netdev@vger.kernel.org; linux- > ker...@vger.kernel.org; de...@linuxdriverproject.org; o...@aepfle.de; > a...@canonical.com; jasow...@redhat.com; cav...@redhat.com; KY > Srinivasan <k...@microsoft.com>; Haiyang Zhang <haiya...@microsoft.com>; > j...@perches.com; vkuzn...@redhat.com > Subject: Re: [PATCH v8 net-next 1/1] hv_sock: introduce Hyper-V Sockets > > From: Dexuan Cui <de...@microsoft.com> > Date: Thu, 7 Apr 2016 18:36:51 -0700 > > > +struct vmpipe_proto_header { > > + u32 pkt_type; > > + u32 data_size; > > +} __packed; > > There is no reason to specify __packed here. > > The types are strongly sized to word aligned quantities. > No holes are possible in this structure, nor is any padding > possible either. > > Do not ever slap __packed onto protocol or HW defined structures, > simply just define them properly with proper types and explicit > padding when necessary. Hi David, Thank you very much for taking a look at the patch! I'll remove all the 3 __packed usages in my patch. > > + struct { > > + struct vmpipe_proto_header hdr; > > + char buf[HVSOCK_SND_BUF_SZ]; > > + } __packed send; > > And so on, and so forth.. I'll remove __packed and use u8 to replace the 'char' here. > I'm really disappointed that I couldn't even get one hunk into this > patch submission without finding a major problem. David, Could you please point out more issues in the patch? I'm definitely happy to fix them. :-) > I expect this patch to take several more iterations before I can even > come close to applying it. So please set your expectations properly, > and also it seems like nobody else wants to even review this stuff > either. It is you who needs to find a way to change all of this, not > me. 
A few people took a look at the early versions of the patch and did give me good suggestions on the interface APIs with VMBus and some coding style issues, especially Vitaly from Redhat. Cathy from Redhat was also looking into the patch recently and gave me some good feedback. I'll try to invite more people to review the patch.

And, I'm updating the patch to address some issues:
1) the feature is only properly supported on Windows 10/2016 build 14290 and later, so I'm going to not enable the feature on old hosts.
2) there is actually some mechanism we can use to simplify hvsock_open_connection() and help to better support hvsock_shutdown().

Thanks,
-- Dexuan
RE: [PATCH v8 net-next 1/1] hv_sock: introduce Hyper-V Sockets
> From: Joe Perches [mailto:j...@perches.com]
> Sent: Friday, April 8, 2016 9:15
> On Thu, 2016-04-07 at 18:36 -0700, Dexuan Cui wrote:
> > diff --git a/include/net/af_hvsock.h b/include/net/af_hvsock.h
> []
> > +#define VMBUS_RINGBUFFER_SIZE_HVSOCK_RECV (5 * PAGE_SIZE)
> > +#define VMBUS_RINGBUFFER_SIZE_HVSOCK_SEND (5 * PAGE_SIZE)
> > +
> > +#define HVSOCK_RCV_BUF_SZ VMBUS_RINGBUFFER_SIZE_HVSOCK_RECV
> > +#define HVSOCK_SND_BUF_SZ PAGE_SIZE
> []
> > +struct hvsock_sock {
> []
> > +	struct {
> > +		struct vmpipe_proto_header hdr;
> > +		char buf[HVSOCK_SND_BUF_SZ];
> > +	} __packed send;
> > +
> > +	struct {
> > +		struct vmpipe_proto_header hdr;
> > +		char buf[HVSOCK_RCV_BUF_SZ];
> > +		unsigned int data_len;
> > +		unsigned int data_offset;
> > +	} __packed recv;
> > +};
>
> These bufs are not page aligned and so can span pages.
>
> Is there any value in allocating these bufs separately
> as pages instead of as a kmalloc?

The bufs are not required to be page aligned. Here the 'hdr' and the 'buf' must be consecutive, i.e., the 'buf' must be an array rather than a pointer: please see hvsock_send_data().

It looks to me there is no big value in making sure the 'buf' is page aligned: on x86_64, at least it should already be 8-byte aligned due to the adjacent channel pointer, so memcpy_from_msg() should work well enough, and in hvsock_send_data() -> vmbus_sendpacket(), we don't copy the 'buf'.

Thanks,
-- Dexuan
[PATCH v8 net-next 1/1] hv_sock: introduce Hyper-V Sockets
Hyper-V Sockets (hv_sock) supplies a byte-stream based communication mechanism between the host and the guest. It's somewhat like TCP over VMBus, but the transportation layer (VMBus) is much simpler than IP. With Hyper-V Sockets, applications between the host and the guest can talk to each other directly by the traditional BSD-style socket APIs. Hyper-V Sockets is only available on new Windows hosts, like Windows Server 2016. More info is in this article "Make your own integration services": https://msdn.microsoft.com/en-us/virtualization/hyperv_on_windows/develop/make_mgmt_service The patch implements the necessary support in the guest side by introducing a new socket address family AF_HYPERV. Signed-off-by: Dexuan Cui <de...@microsoft.com> Cc: "K. Y. Srinivasan" <k...@microsoft.com> Cc: Haiyang Zhang <haiya...@microsoft.com> Cc: Vitaly Kuznetsov <vkuzn...@redhat.com> --- MAINTAINERS |2 + include/linux/hyperv.h | 16 + include/linux/socket.h |5 +- include/net/af_hvsock.h | 51 ++ include/uapi/linux/hyperv.h | 25 + net/Kconfig |1 + net/Makefile|1 + net/hv_sock/Kconfig | 10 + net/hv_sock/Makefile|3 + net/hv_sock/af_hvsock.c | 1483 +++ 10 files changed, 1595 insertions(+), 2 deletions(-) create mode 100644 include/net/af_hvsock.h create mode 100644 net/hv_sock/Kconfig create mode 100644 net/hv_sock/Makefile create mode 100644 net/hv_sock/af_hvsock.c diff --git a/MAINTAINERS b/MAINTAINERS index 67d99dd..7b6f203 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -5267,7 +5267,9 @@ F:drivers/pci/host/pci-hyperv.c F: drivers/net/hyperv/ F: drivers/scsi/storvsc_drv.c F: drivers/video/fbdev/hyperv_fb.c +F: net/hv_sock/ F: include/linux/hyperv.h +F: include/net/af_hvsock.h F: tools/hv/ F: Documentation/ABI/stable/sysfs-bus-vmbus diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h index aa0fadc..b92439d 100644 --- a/include/linux/hyperv.h +++ b/include/linux/hyperv.h @@ -1338,4 +1338,20 @@ extern __u32 vmbus_proto_version; int vmbus_send_tl_connect_request(const uuid_le 
*shv_guest_servie_id, const uuid_le *shv_host_servie_id); +struct vmpipe_proto_header { + u32 pkt_type; + u32 data_size; +} __packed; + +#define HVSOCK_HEADER_LEN (sizeof(struct vmpacket_descriptor) + \ +sizeof(struct vmpipe_proto_header)) + +/* See 'prev_indices' in hv_ringbuffer_read(), hv_ringbuffer_write() */ +#define PREV_INDICES_LEN (sizeof(u64)) + +#define HVSOCK_PKT_LEN(payload_len)(HVSOCK_HEADER_LEN + \ + ALIGN((payload_len), 8) + \ + PREV_INDICES_LEN) +#define HVSOCK_MIN_PKT_LEN HVSOCK_PKT_LEN(1) + #endif /* _HYPERV_H */ diff --git a/include/linux/socket.h b/include/linux/socket.h index 73bf6c6..88b1ccd 100644 --- a/include/linux/socket.h +++ b/include/linux/socket.h @@ -201,8 +201,8 @@ struct ucred { #define AF_NFC 39 /* NFC sockets */ #define AF_VSOCK 40 /* vSockets */ #define AF_KCM 41 /* Kernel Connection Multiplexor*/ - -#define AF_MAX 42 /* For now.. */ +#define AF_HYPERV 42 /* Hyper-V Sockets */ +#define AF_MAX 43 /* For now.. */ /* Protocol families, same as address families. */ #define PF_UNSPEC AF_UNSPEC @@ -249,6 +249,7 @@ struct ucred { #define PF_NFC AF_NFC #define PF_VSOCK AF_VSOCK #define PF_KCM AF_KCM +#define PF_HYPERV AF_HYPERV #define PF_MAX AF_MAX /* Maximum queue length specifiable by listen. */ diff --git a/include/net/af_hvsock.h b/include/net/af_hvsock.h new file mode 100644 index 000..a5aa28d --- /dev/null +++ b/include/net/af_hvsock.h @@ -0,0 +1,51 @@ +#ifndef __AF_HVSOCK_H__ +#define __AF_HVSOCK_H__ + +#include +#include +#include + +#define VMBUS_RINGBUFFER_SIZE_HVSOCK_RECV (5 * PAGE_SIZE) +#define VMBUS_RINGBUFFER_SIZE_HVSOCK_SEND (5 * PAGE_SIZE) + +#define HVSOCK_RCV_BUF_SZ VMBUS_RINGBUFFER_SIZE_HVSOCK_RECV +#define HVSOCK_SND_BUF_SZ PAGE_SIZE + +#define sk_to_hvsock(__sk)((struct hvsock_sock *)(__sk)) +#define hvsock_to_sk(__hvsk) ((struct sock *)(__hvsk)) + +struct hvsock_sock { + /* sk must be the first member. 
+ */
+	struct sock sk;
+
+	struct sockaddr_hv local_addr;
+	struct sockaddr_hv remote_addr;
+
+	/* protected by the global hvsock_mutex */
+	struct list_head bound_list;
+	struct list_head connected_list;
+
+	struct list_head accept_queue;
+	/* used by enqueue and dequeue */
+	struct mutex accept_queue_mutex;
+
+	struct delayed_work dwork;
+
+	u32 peer_shutdown;
+
+	struct vmbus_channel *channel;
[PATCH v8 net-next 0/1] introduce Hyper-V VM Sockets(hv_sock)
Hyper-V Sockets (hv_sock) supplies a byte-stream based communication mechanism between the host and the guest. It's somewhat like TCP over VMBus, but the transportation layer (VMBus) is much simpler than IP. With Hyper-V Sockets, applications between the host and the guest can talk to each other directly by the traditional BSD-style socket APIs. Hyper-V Sockets is only available on new Windows hosts, like Windows Server 2016. More info is in this article "Make your own integration services": https://msdn.microsoft.com/en-us/virtualization/hyperv_on_windows/develop/make_mgmt_service

The patch implements the necessary support in the guest side by introducing a new socket address family AF_HYPERV. Note: the VMBus driver side's supporting patches have been in the mainline tree.

I know the kernel has already had a VM Sockets driver (AF_VSOCK) based on VMware VMCI (net/vmw_vsock/, drivers/misc/vmw_vmci), and KVM is proposing a virtio version of AF_VSOCK: http://marc.info/?l=linux-netdev&m=145952064004765&w=2

However, though Hyper-V Sockets may seem conceptually similar to AF_VSOCK, there are differences in the transportation layer, and IMO these make direct code reuse impractical:

1. In AF_VSOCK, the endpoint type is <u32 ContextID, u32 Port>, but in AF_HYPERV, the endpoint type is <GUID VM_ID, GUID ServiceID>. Here a GUID is 128-bit.

2. AF_VSOCK supports SOCK_DGRAM, while AF_HYPERV doesn't.

3. AF_VSOCK supports some special socket options, like SO_VM_SOCKETS_BUFFER_SIZE, SO_VM_SOCKETS_BUFFER_MIN/MAX_SIZE and SO_VM_SOCKETS_CONNECT_TIMEOUT. These are meaningless to AF_HYPERV.

4. Some of AF_VSOCK's VMCI transportation ops are meaningless to AF_HYPERV/VMBus, like .notify_recv_init, .notify_recv_pre_block, .notify_recv_pre_dequeue, .notify_recv_post_dequeue, .notify_send_init, .notify_send_pre_block, .notify_send_pre_enqueue, .notify_send_post_enqueue, etc.

So I think we'd better introduce a new address family: AF_HYPERV.

Please review the patch. Looking forward to your comments!
Changes since v1:
- updated "[PATCH 6/7] hvsock: introduce Hyper-V VM Sockets feature"
- added __init and __exit for the module init/exit functions
- net/hv_sock/Kconfig: "default m" -> "default m if HYPERV"
- MODULE_LICENSE: "Dual MIT/GPL" -> "Dual BSD/GPL"

Changes since v2:
- fixed various coding issues pointed out by David Miller
- fixed indentation issues
- removed pr_debug in net/hv_sock/af_hvsock.c
- used reverse-Christmas-tree style for local variables
- EXPORT_SYMBOL -> EXPORT_SYMBOL_GPL

Changes since v3:
- fixed a few coding issues pointed out by Vitaly Kuznetsov and Dan Carpenter
- fixed the ret value in vmbus_recvpacket_hvsock on error
- fixed the style of a multi-line comment: vmbus_get_hvsock_rw_status()

Changes since v4 (https://lkml.org/lkml/2015/7/28/404):
- addressed all the comments about v4
- treat the hvsock offers/channels as special VMBus devices
- add a mechanism to pass hvsock events to the hvsock driver
- fixed some corner cases with proper locking when a connection is closed
- rebased to the latest Greg's tree

Changes since v5 (https://lkml.org/lkml/2015/12/24/103):
- addressed the coding style issues (Vitaly Kuznetsov & David Miller, thanks!)
- used a better coding for the per-channel rescind callback (thanks, Vitaly!)
- avoided introducing the new VMBus driver APIs vmbus_sendpacket_hvsock() and vmbus_recvpacket_hvsock() and used vmbus_sendpacket()/vmbus_recvpacket() in the higher level (i.e., the hv_sock driver). Thanks, Vitaly!

Changes since v6 (http://lkml.iu.edu/hypermail/linux/kernel/1601.3/01813.html):
- only a few minor changes of coding style and comments

Changes since v7:
- a few minor changes of coding style: thanks, Joe Perches!
- added some lines of comments about GUID/UUID before the struct sockaddr_hv.
Dexuan Cui (1):
  hv_sock: introduce Hyper-V Sockets

 MAINTAINERS                 |    2 +
 include/linux/hyperv.h      |   16 +
 include/linux/socket.h      |    5 +-
 include/net/af_hvsock.h     |   51 ++
 include/uapi/linux/hyperv.h |   25 +
 net/Kconfig                 |    1 +
 net/Makefile                |    1 +
 net/hv_sock/Kconfig         |   10 +
 net/hv_sock/Makefile        |    3 +
 net/hv_sock/af_hvsock.c     | 1483 +++
 10 files changed, 1595 insertions(+), 2 deletions(-)
 create mode 100644 include/net/af_hvsock.h
 create mode 100644 net/hv_sock/Kconfig
 create mode 100644 net/hv_sock/Makefile
 create mode 100644 net/hv_sock/af_hvsock.c

--
2.1.0
RE: [PATCH v7 net-next 1/1] hv_sock: introduce Hyper-V Sockets
> From: Joe Perches [mailto:j...@perches.com]
> Sent: Thursday, April 7, 2016 19:30
> To: Dexuan Cui <de...@microsoft.com>; gre...@linuxfoundation.org;
> da...@davemloft.net; netdev@vger.kernel.org; linux-ker...@vger.kernel.org;
> de...@linuxdriverproject.org; o...@aepfle.de; a...@canonical.com;
> jasow...@redhat.com; KY Srinivasan <k...@microsoft.com>; Haiyang Zhang
> <haiya...@microsoft.com>
> Cc: vkuzn...@redhat.com
> Subject: Re: [PATCH v7 net-next 1/1] hv_sock: introduce Hyper-V Sockets
>
> On Thu, 2016-04-07 at 05:50 -0700, Dexuan Cui wrote:
> > Hyper-V Sockets (hv_sock) supplies a byte-stream based communication
> > mechanism between the host and the guest. It's somewhat like TCP over
> > VMBus, but the transportation layer (VMBus) is much simpler than IP.
>
> style trivia:
>
> > diff --git a/net/hv_sock/af_hvsock.c b/net/hv_sock/af_hvsock.c
> []
> > +static struct sock *__hvsock_find_bound_socket(const struct sockaddr_hv *addr)
> > +{
> > +	struct hvsock_sock *hvsk;
> > +
> > +	list_for_each_entry(hvsk, &hvsock_bound_list, bound_list)
> > +		if (uuid_equals(addr->shv_service_id,
> > +				hvsk->local_addr.shv_service_id))
> > +			return hvsock_to_sk(hvsk);
>
> Because there's an if, it's generally nicer to use
> braces in the list_for_each

Thanks for the suggestion, Joe! I'll add {}.

> > +static struct sock *__hvsock_find_connected_socket_by_channel(
> > +	const struct vmbus_channel *channel)
> > +{
> > +	struct hvsock_sock *hvsk;
> > +
> > +	list_for_each_entry(hvsk, &hvsock_connected_list, connected_list)
> > +		if (hvsk->channel == channel)
> > +			return hvsock_to_sk(hvsk);
> > +	return NULL;
>
> here too

I'll fix this too.

> > +static int hvsock_sendmsg(struct socket *sock, struct msghdr *msg, size_t len)
> > +{
> []
> > +	if (msg->msg_flags & ~MSG_DONTWAIT) {
> > +		pr_err("hvsock_sendmsg: unsupported flags=0x%x\n",
> > +		       msg->msg_flags);
>
> All the pr_ messages with embedded function
> names could use "%s:", __func__

I'll fix this.

Thanks,
-- Dexuan
[PATCH v7 net-next 1/1] hv_sock: introduce Hyper-V Sockets
Hyper-V Sockets (hv_sock) supplies a byte-stream based communication mechanism between the host and the guest. It's somewhat like TCP over VMBus, but the transportation layer (VMBus) is much simpler than IP. With Hyper-V Sockets, applications between the host and the guest can talk to each other directly by the traditional BSD-style socket APIs. Hyper-V Sockets is only available on new Windows hosts, like Windows Server 2016. More info is in this article "Make your own integration services": https://msdn.microsoft.com/en-us/virtualization/hyperv_on_windows/develop/make_mgmt_service The patch implements the necessary support in the guest side by introducing a new socket address family AF_HYPERV. Signed-off-by: Dexuan Cui <de...@microsoft.com> Cc: "K. Y. Srinivasan" <k...@microsoft.com> Cc: Haiyang Zhang <haiya...@microsoft.com> Cc: Vitaly Kuznetsov <vkuzn...@redhat.com> --- MAINTAINERS |2 + include/linux/hyperv.h | 16 + include/linux/socket.h |5 +- include/net/af_hvsock.h | 51 ++ include/uapi/linux/hyperv.h | 16 + net/Kconfig |1 + net/Makefile|1 + net/hv_sock/Kconfig | 10 + net/hv_sock/Makefile|3 + net/hv_sock/af_hvsock.c | 1481 +++ 10 files changed, 1584 insertions(+), 2 deletions(-) create mode 100644 include/net/af_hvsock.h create mode 100644 net/hv_sock/Kconfig create mode 100644 net/hv_sock/Makefile create mode 100644 net/hv_sock/af_hvsock.c diff --git a/MAINTAINERS b/MAINTAINERS index 67d99dd..7b6f203 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -5267,7 +5267,9 @@ F:drivers/pci/host/pci-hyperv.c F: drivers/net/hyperv/ F: drivers/scsi/storvsc_drv.c F: drivers/video/fbdev/hyperv_fb.c +F: net/hv_sock/ F: include/linux/hyperv.h +F: include/net/af_hvsock.h F: tools/hv/ F: Documentation/ABI/stable/sysfs-bus-vmbus diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h index aa0fadc..b92439d 100644 --- a/include/linux/hyperv.h +++ b/include/linux/hyperv.h @@ -1338,4 +1338,20 @@ extern __u32 vmbus_proto_version; int vmbus_send_tl_connect_request(const uuid_le 
*shv_guest_servie_id, const uuid_le *shv_host_servie_id); +struct vmpipe_proto_header { + u32 pkt_type; + u32 data_size; +} __packed; + +#define HVSOCK_HEADER_LEN (sizeof(struct vmpacket_descriptor) + \ +sizeof(struct vmpipe_proto_header)) + +/* See 'prev_indices' in hv_ringbuffer_read(), hv_ringbuffer_write() */ +#define PREV_INDICES_LEN (sizeof(u64)) + +#define HVSOCK_PKT_LEN(payload_len)(HVSOCK_HEADER_LEN + \ + ALIGN((payload_len), 8) + \ + PREV_INDICES_LEN) +#define HVSOCK_MIN_PKT_LEN HVSOCK_PKT_LEN(1) + #endif /* _HYPERV_H */ diff --git a/include/linux/socket.h b/include/linux/socket.h index 73bf6c6..88b1ccd 100644 --- a/include/linux/socket.h +++ b/include/linux/socket.h @@ -201,8 +201,8 @@ struct ucred { #define AF_NFC 39 /* NFC sockets */ #define AF_VSOCK 40 /* vSockets */ #define AF_KCM 41 /* Kernel Connection Multiplexor*/ - -#define AF_MAX 42 /* For now.. */ +#define AF_HYPERV 42 /* Hyper-V Sockets */ +#define AF_MAX 43 /* For now.. */ /* Protocol families, same as address families. */ #define PF_UNSPEC AF_UNSPEC @@ -249,6 +249,7 @@ struct ucred { #define PF_NFC AF_NFC #define PF_VSOCK AF_VSOCK #define PF_KCM AF_KCM +#define PF_HYPERV AF_HYPERV #define PF_MAX AF_MAX /* Maximum queue length specifiable by listen. */ diff --git a/include/net/af_hvsock.h b/include/net/af_hvsock.h new file mode 100644 index 000..a5aa28d --- /dev/null +++ b/include/net/af_hvsock.h @@ -0,0 +1,51 @@ +#ifndef __AF_HVSOCK_H__ +#define __AF_HVSOCK_H__ + +#include +#include +#include + +#define VMBUS_RINGBUFFER_SIZE_HVSOCK_RECV (5 * PAGE_SIZE) +#define VMBUS_RINGBUFFER_SIZE_HVSOCK_SEND (5 * PAGE_SIZE) + +#define HVSOCK_RCV_BUF_SZ VMBUS_RINGBUFFER_SIZE_HVSOCK_RECV +#define HVSOCK_SND_BUF_SZ PAGE_SIZE + +#define sk_to_hvsock(__sk)((struct hvsock_sock *)(__sk)) +#define hvsock_to_sk(__hvsk) ((struct sock *)(__hvsk)) + +struct hvsock_sock { + /* sk must be the first member. 
+ */
+	struct sock sk;
+
+	struct sockaddr_hv local_addr;
+	struct sockaddr_hv remote_addr;
+
+	/* protected by the global hvsock_mutex */
+	struct list_head bound_list;
+	struct list_head connected_list;
+
+	struct list_head accept_queue;
+	/* used by enqueue and dequeue */
+	struct mutex accept_queue_mutex;
+
+	struct delayed_work dwork;
+
+	u32 peer_shutdown;
+
+	struct vmbus_channel *channel;
[PATCH v7 net-next 0/1] introduce Hyper-V VM Sockets(hv_sock)
Hyper-V Sockets (hv_sock) supplies a byte-stream based communication mechanism between the host and the guest. It's somewhat like TCP over VMBus, but the transportation layer (VMBus) is much simpler than IP. With Hyper-V Sockets, applications between the host and the guest can talk to each other directly by the traditional BSD-style socket APIs. Hyper-V Sockets is only available on new Windows hosts, like Windows Server 2016. More info is in this article "Make your own integration services": https://msdn.microsoft.com/en-us/virtualization/hyperv_on_windows/develop/make_mgmt_service

The patch implements the necessary support in the guest side by introducing a new socket address family AF_HYPERV. Note: the VMBus driver side's supporting patches have been in the mainline tree.

I know the kernel has already had a VM Sockets driver (AF_VSOCK) based on VMware VMCI (net/vmw_vsock/, drivers/misc/vmw_vmci), and KVM is proposing a virtio version of AF_VSOCK: http://marc.info/?l=linux-netdev&m=145952064004765&w=2

However, though Hyper-V Sockets may seem conceptually similar to AF_VSOCK, there are differences in the transportation layer, and IMO these make direct code reuse impractical:

1. In AF_VSOCK, the endpoint type is <u32 ContextID, u32 Port>, but in AF_HYPERV, the endpoint type is <GUID VM_ID, GUID ServiceID>. Here a GUID is 128-bit.

2. AF_VSOCK supports SOCK_DGRAM, while AF_HYPERV doesn't.

3. AF_VSOCK supports some special socket options, like SO_VM_SOCKETS_BUFFER_SIZE, SO_VM_SOCKETS_BUFFER_MIN/MAX_SIZE and SO_VM_SOCKETS_CONNECT_TIMEOUT. These are meaningless to AF_HYPERV.

4. Some of AF_VSOCK's VMCI transportation ops are meaningless to AF_HYPERV/VMBus, like .notify_recv_init, .notify_recv_pre_block, .notify_recv_pre_dequeue, .notify_recv_post_dequeue, .notify_send_init, .notify_send_pre_block, .notify_send_pre_enqueue, .notify_send_post_enqueue, etc.

So I think we'd better introduce a new address family: AF_HYPERV.

Please review the patch. Looking forward to your comments!
Changes since v1:
- updated "[PATCH 6/7] hvsock: introduce Hyper-V VM Sockets feature"
- added __init and __exit for the module init/exit functions
- net/hv_sock/Kconfig: "default m" -> "default m if HYPERV"
- MODULE_LICENSE: "Dual MIT/GPL" -> "Dual BSD/GPL"

Changes since v2:
- fixed various coding issues pointed out by David Miller
- fixed indentation issues
- removed pr_debug in net/hv_sock/af_hvsock.c
- used reverse-Christmas-tree style for local variables
- EXPORT_SYMBOL -> EXPORT_SYMBOL_GPL

Changes since v3:
- fixed a few coding issues pointed out by Vitaly Kuznetsov and Dan Carpenter
- fixed the ret value in vmbus_recvpacket_hvsock on error
- fixed the style of a multi-line comment: vmbus_get_hvsock_rw_status()

Changes since v4 (https://lkml.org/lkml/2015/7/28/404):
- addressed all the comments about v4
- treat the hvsock offers/channels as special VMBus devices
- add a mechanism to pass hvsock events to the hvsock driver
- fixed some corner cases with proper locking when a connection is closed
- rebased to the latest Greg's tree

Changes since v5 (https://lkml.org/lkml/2015/12/24/103):
- addressed the coding style issues (Vitaly Kuznetsov & David Miller, thanks!)
- used a better coding for the per-channel rescind callback (thanks, Vitaly!)
- avoided introducing the new VMBus driver APIs vmbus_sendpacket_hvsock() and vmbus_recvpacket_hvsock() and used vmbus_sendpacket()/vmbus_recvpacket() in the higher level (i.e., the hv_sock driver). Thanks, Vitaly!
Changes since v6 (http://lkml.iu.edu/hypermail/linux/kernel/1601.3/01813.html):
- only a few minor changes of coding style and comments

Dexuan Cui (1):
  hv_sock: introduce Hyper-V Sockets

 MAINTAINERS                 |    2 +
 include/linux/hyperv.h      |   16 +
 include/linux/socket.h      |    5 +-
 include/net/af_hvsock.h     |   51 ++
 include/uapi/linux/hyperv.h |   16 +
 net/Kconfig                 |    1 +
 net/Makefile                |    1 +
 net/hv_sock/Kconfig         |   10 +
 net/hv_sock/Makefile        |    3 +
 net/hv_sock/af_hvsock.c     | 1481 +++
 10 files changed, 1584 insertions(+), 2 deletions(-)
 create mode 100644 include/net/af_hvsock.h
 create mode 100644 net/hv_sock/Kconfig
 create mode 100644 net/hv_sock/Makefile
 create mode 100644 net/hv_sock/af_hvsock.c

--
2.1.0
RE: [PATCH net-next] net: add the AF_KCM entries to family name tables
> From: David Miller [mailto:da...@davemloft.net] > Sent: Thursday, April 7, 2016 11:59 > To: Dexuan Cui <de...@microsoft.com> > Cc: netdev@vger.kernel.org > Subject: Re: [PATCH net-next] net: add the AF_KCM entries to family name > tables > > From: Dexuan Cui <de...@microsoft.com> > Date: Thu, 7 Apr 2016 01:54:18 + > > > Can you please apply this to net-next too? > > That will happen transparently the next time I merge 'net' into > 'net-next'. > > It will happen at a time of my own choosing, and usually occurs > when I do a push of my 'net' tree to Linus and he takes it in, > and I know people need some 'net' things in 'net-next'. Thanks for the explanation! So, at present, let me only post the single AF_HYPERV patch to net-next and hold the patch that adds AF_HYPERV entries to the family name tables. Thanks, -- Dexuan
RE: [PATCH net-next] net: add the AF_KCM entries to family name tables
> From: David Miller [mailto:da...@davemloft.net] > Sent: Thursday, April 7, 2016 5:00 > To: Dexuan Cui <de...@microsoft.com> > Cc: netdev@vger.kernel.org > Subject: Re: [PATCH net-next] net: add the AF_KCM entries to family name > tables > > From: Dexuan Cui <de...@microsoft.com> > Date: Tue, 5 Apr 2016 07:41:11 -0700 > > > This is for the recent kcm driver, which introduces AF_KCM(41) in > > b7ac4eb(kcm: Kernel Connection Multiplexor module). > > > > Signed-off-by: Dexuan Cui <de...@microsoft.com> > > Cc: Signed-off-by: Tom Herbert <t...@herbertland.com> > > As this is a bug fix actually, applied to 'net'. David, Can you please apply this to net-next too? It looks net-next is open now and I'm going to resubmit my AF_HYPERV patchset, which needs to add AF_HYPERV entries to the family name tables too. Thanks, -- Dexuan
RE: [PATCH net-next] net: add the AF_KCM entries to family name tables
> From: netdev-ow...@vger.kernel.org [mailto:netdev-ow...@vger.kernel.org] > On Behalf Of Dexuan Cui > Sent: Tuesday, April 5, 2016 22:41 > To: da...@davemloft.net; netdev@vger.kernel.org > Subject: [PATCH net-next] net: add the AF_KCM entries to family name tables > > This is for the recent kcm driver, which introduces AF_KCM(41) in > b7ac4eb(kcm: Kernel Connection Multiplexor module). > > Signed-off-by: Dexuan Cui <de...@microsoft.com> > Cc: Signed-off-by: Tom Herbert <t...@herbertland.com> > --- > net/core/sock.c | 9 ++--- > 1 file changed, 6 insertions(+), 3 deletions(-) > > diff --git a/net/core/sock.c b/net/core/sock.c > index b67b9ae..7e73c26 100644 > --- a/net/core/sock.c > +++ b/net/core/sock.c > @@ -221,7 +221,8 @@ static const char *const > af_family_key_strings[AF_MAX+1] = { >"sk_lock-AF_TIPC" , "sk_lock-AF_BLUETOOTH", "sk_lock-IUCV", >"sk_lock-AF_RXRPC" , "sk_lock-AF_ISDN" , "sk_lock-AF_PHONET" , >"sk_lock-AF_IEEE802154", "sk_lock-AF_CAIF" , "sk_lock-AF_ALG" , > - "sk_lock-AF_NFC" , "sk_lock-AF_VSOCK", "sk_lock-AF_MAX" > + "sk_lock-AF_NFC" , "sk_lock-AF_VSOCK", "sk_lock-AF_KCM" , > + "sk_lock-AF_MAX" > }; > static const char *const af_family_slock_key_strings[AF_MAX+1] = { >"slock-AF_UNSPEC", "slock-AF_UNIX" , "slock-AF_INET" , > @@ -237,7 +238,8 @@ static const char *const > af_family_slock_key_strings[AF_MAX+1] = { >"slock-AF_TIPC" , "slock-AF_BLUETOOTH", "slock-AF_IUCV" , >"slock-AF_RXRPC" , "slock-AF_ISDN" , "slock-AF_PHONET" , >"slock-AF_IEEE802154", "slock-AF_CAIF" , "slock-AF_ALG" , > - "slock-AF_NFC" , "slock-AF_VSOCK","slock-AF_MAX" > + "slock-AF_NFC" , "slock-AF_VSOCK","slock-AF_KCM" , > + "slock-AF_MAX" > }; > static const char *const af_family_clock_key_strings[AF_MAX+1] = { >"clock-AF_UNSPEC", "clock-AF_UNIX" , "clock-AF_INET" , > @@ -253,7 +255,8 @@ static const char *const > af_family_clock_key_strings[AF_MAX+1] = { >"clock-AF_TIPC" , "clock-AF_BLUETOOTH", "clock-AF_IUCV" , >"clock-AF_RXRPC" , "clock-AF_ISDN" , "clock-AF_PHONET" , 
>"clock-AF_IEEE802154", "clock-AF_CAIF" , "clock-AF_ALG" , > - "clock-AF_NFC" , "clock-AF_VSOCK", "clock-AF_MAX" > + "clock-AF_NFC" , "clock-AF_VSOCK", "clock-AF_KCM" , > + "clock-AF_MAX" > }; > > /* Added Tom to Cc. Thanks, -- Dexuan
[PATCH net-next] net: add the AF_KCM entries to family name tables
This is for the recent kcm driver, which introduces AF_KCM(41) in b7ac4eb(kcm: Kernel Connection Multiplexor module). Signed-off-by: Dexuan Cui <de...@microsoft.com> Cc: Signed-off-by: Tom Herbert <t...@herbertland.com> --- net/core/sock.c | 9 ++--- 1 file changed, 6 insertions(+), 3 deletions(-) diff --git a/net/core/sock.c b/net/core/sock.c index b67b9ae..7e73c26 100644 --- a/net/core/sock.c +++ b/net/core/sock.c @@ -221,7 +221,8 @@ static const char *const af_family_key_strings[AF_MAX+1] = { "sk_lock-AF_TIPC" , "sk_lock-AF_BLUETOOTH", "sk_lock-IUCV", "sk_lock-AF_RXRPC" , "sk_lock-AF_ISDN" , "sk_lock-AF_PHONET" , "sk_lock-AF_IEEE802154", "sk_lock-AF_CAIF" , "sk_lock-AF_ALG" , - "sk_lock-AF_NFC" , "sk_lock-AF_VSOCK", "sk_lock-AF_MAX" + "sk_lock-AF_NFC" , "sk_lock-AF_VSOCK", "sk_lock-AF_KCM" , + "sk_lock-AF_MAX" }; static const char *const af_family_slock_key_strings[AF_MAX+1] = { "slock-AF_UNSPEC", "slock-AF_UNIX" , "slock-AF_INET" , @@ -237,7 +238,8 @@ static const char *const af_family_slock_key_strings[AF_MAX+1] = { "slock-AF_TIPC" , "slock-AF_BLUETOOTH", "slock-AF_IUCV" , "slock-AF_RXRPC" , "slock-AF_ISDN" , "slock-AF_PHONET" , "slock-AF_IEEE802154", "slock-AF_CAIF" , "slock-AF_ALG" , - "slock-AF_NFC" , "slock-AF_VSOCK","slock-AF_MAX" + "slock-AF_NFC" , "slock-AF_VSOCK","slock-AF_KCM" , + "slock-AF_MAX" }; static const char *const af_family_clock_key_strings[AF_MAX+1] = { "clock-AF_UNSPEC", "clock-AF_UNIX" , "clock-AF_INET" , @@ -253,7 +255,8 @@ static const char *const af_family_clock_key_strings[AF_MAX+1] = { "clock-AF_TIPC" , "clock-AF_BLUETOOTH", "clock-AF_IUCV" , "clock-AF_RXRPC" , "clock-AF_ISDN" , "clock-AF_PHONET" , "clock-AF_IEEE802154", "clock-AF_CAIF" , "clock-AF_ALG" , - "clock-AF_NFC" , "clock-AF_VSOCK", "clock-AF_MAX" + "clock-AF_NFC" , "clock-AF_VSOCK", "clock-AF_KCM" , + "clock-AF_MAX" }; /* -- 2.1.0
Is the net-next tree open now?
Hi David,

I saw that the v4.6-rc1 tag is now in net-next.git, and a bunch of stmmac patches appeared on the tree's master branch yesterday.

Thanks,
-- Dexuan
RE: [PATCH net-next 1/3] net: add the AF_KCM entries to family name tables
> From: David Miller [mailto:da...@davemloft.net] > Sent: Monday, March 21, 2016 23:28 > To: Dexuan Cui <de...@microsoft.com> > Cc: gre...@linuxfoundation.org; netdev@vger.kernel.org; linux- > ker...@vger.kernel.org; de...@linuxdriverproject.org; o...@aepfle.de; > a...@canonical.com; jasow...@redhat.com; KY Srinivasan > <k...@microsoft.com>; Haiyang Zhang <haiya...@microsoft.com>; > vkuzn...@redhat.com > Subject: Re: [PATCH net-next 1/3] net: add the AF_KCM entries to family > name tables > > > Two things wrong with this submission: > > 1) You need to provide an initial "[PATCH net-next 0/3] ..." header posting >explaining at a high level what this patch series is about and how it is >implemented and why. Hi David, Thanks for the reply! I'll fix this. > 2) The net-next tree is closed at this time because we are in the merge > window, >therefore no new feature patches should be submitted to the netdev > mailing >list at this time. Please wait until some (reasonable) amount of time > after >the merge window closes to resubmit this. OK. I'll repost it when the merge window is open -- I suppose that would happen in 1~2 weeks, according to my reading the documentation. Thanks, -- Dexuan
[PATCH net-next 2/3] hv_sock: introduce Hyper-V Sockets
Hyper-V Sockets (hv_sock) supplies a byte-stream based communication mechanism between the host and the guest. It's somewhat like TCP over VMBus, but the transportation layer (VMBus) is much simpler than IP. With Hyper-V Sockets, applications between the host and the guest can talk to each other directly by the traditional BSD-style socket APIs. Hyper-V Sockets is only available on new Windows hosts, like Windows Server 2016. More info is in this article "Make your own integration services": https://msdn.microsoft.com/en-us/virtualization/hyperv_on_windows/develop/make_mgmt_service

The patch implements the necessary support in the guest side by introducing a new socket address family AF_HYPERV.

Signed-off-by: Dexuan Cui <de...@microsoft.com>
Cc: "K. Y. Srinivasan" <k...@microsoft.com>
Cc: Haiyang Zhang <haiya...@microsoft.com>
Cc: Vitaly Kuznetsov <vkuzn...@redhat.com>
---
I posted the V6 of the hv_sock patchset in Jan:
[PATCH V6 0/8] introduce Hyper-V VM Socket(hv_sock)
http://lkml.iu.edu/hypermail/linux/kernel/1601.3/01813.html

Now that all the supporting patches on the VMBus side have been merged into the mainline tree and the net-next tree, I think it's time to re-post the net/ side's change -- I'm not sure if net-next is closed now, since I haven't seen a "net-next is CLOSED" mail recently.

The patch shouldn't cause any regression because it adds a new driver, not touching the existing code.

Please comment on the patch.
MAINTAINERS |2 + include/linux/hyperv.h | 16 + include/linux/socket.h |5 +- include/net/af_hvsock.h | 51 ++ include/uapi/linux/hyperv.h | 16 + net/Kconfig |1 + net/Makefile|1 + net/hv_sock/Kconfig | 10 + net/hv_sock/Makefile|3 + net/hv_sock/af_hvsock.c | 1480 +++ 10 files changed, 1583 insertions(+), 2 deletions(-) create mode 100644 include/net/af_hvsock.h create mode 100644 net/hv_sock/Kconfig create mode 100644 net/hv_sock/Makefile create mode 100644 net/hv_sock/af_hvsock.c diff --git a/MAINTAINERS b/MAINTAINERS index 0cbfc69..6fa438d 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -5222,7 +5222,9 @@ F:drivers/pci/host/pci-hyperv.c F: drivers/net/hyperv/ F: drivers/scsi/storvsc_drv.c F: drivers/video/fbdev/hyperv_fb.c +F: net/hv_sock/ F: include/linux/hyperv.h +F: include/net/af_hvsock.h F: tools/hv/ F: Documentation/ABI/stable/sysfs-bus-vmbus diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h index aa0fadc..b92439d 100644 --- a/include/linux/hyperv.h +++ b/include/linux/hyperv.h @@ -1338,4 +1338,20 @@ extern __u32 vmbus_proto_version; int vmbus_send_tl_connect_request(const uuid_le *shv_guest_servie_id, const uuid_le *shv_host_servie_id); +struct vmpipe_proto_header { + u32 pkt_type; + u32 data_size; +} __packed; + +#define HVSOCK_HEADER_LEN (sizeof(struct vmpacket_descriptor) + \ +sizeof(struct vmpipe_proto_header)) + +/* See 'prev_indices' in hv_ringbuffer_read(), hv_ringbuffer_write() */ +#define PREV_INDICES_LEN (sizeof(u64)) + +#define HVSOCK_PKT_LEN(payload_len)(HVSOCK_HEADER_LEN + \ + ALIGN((payload_len), 8) + \ + PREV_INDICES_LEN) +#define HVSOCK_MIN_PKT_LEN HVSOCK_PKT_LEN(1) + #endif /* _HYPERV_H */ diff --git a/include/linux/socket.h b/include/linux/socket.h index 73bf6c6..88b1ccd 100644 --- a/include/linux/socket.h +++ b/include/linux/socket.h @@ -201,8 +201,8 @@ struct ucred { #define AF_NFC 39 /* NFC sockets */ #define AF_VSOCK 40 /* vSockets */ #define AF_KCM 41 /* Kernel Connection Multiplexor*/ - -#define AF_MAX 42 /* For now.. 
*/ +#define AF_HYPERV 42 /* Hyper-V Sockets */ +#define AF_MAX 43 /* For now.. */ /* Protocol families, same as address families. */ #define PF_UNSPEC AF_UNSPEC @@ -249,6 +249,7 @@ struct ucred { #define PF_NFC AF_NFC #define PF_VSOCK AF_VSOCK #define PF_KCM AF_KCM +#define PF_HYPERV AF_HYPERV #define PF_MAX AF_MAX /* Maximum queue length specifiable by listen. */ diff --git a/include/net/af_hvsock.h b/include/net/af_hvsock.h new file mode 100644 index 000..a5aa28d --- /dev/null +++ b/include/net/af_hvsock.h @@ -0,0 +1,51 @@ +#ifndef __AF_HVSOCK_H__ +#define __AF_HVSOCK_H__ + +#include +#include +#include + +#define VMBUS_RINGBUFFER_SIZE_HVSOCK_RECV (5 * PAGE_SIZE) +#define VMBUS_RINGBUFFER_SIZE_HVSOCK_SEND (5 * PAGE_SIZE) + +#define HVSOCK_RCV_BUF_SZ VMBUS_RINGBUFFER_SIZE_HVSOCK_RECV +#define HVSOCK_SND_BUF_SZ PAGE_SIZE + +#define sk_to_hvsock(__sk)((struct hvsock_sock *)(__sk)) +#define hvso
[PATCH net-next 3/3] net: add the AF_HYPERV entries to family name tables
This is for the hv_sock driver, which introduces AF_HYPERV(42). Signed-off-by: Dexuan Cui <de...@microsoft.com> Cc: "K. Y. Srinivasan" <k...@microsoft.com> Cc: Haiyang Zhang <haiya...@microsoft.com> --- net/core/sock.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/net/core/sock.c b/net/core/sock.c index 7e73c26..51ffc54 100644 --- a/net/core/sock.c +++ b/net/core/sock.c @@ -222,7 +222,7 @@ static const char *const af_family_key_strings[AF_MAX+1] = { "sk_lock-AF_RXRPC" , "sk_lock-AF_ISDN" , "sk_lock-AF_PHONET" , "sk_lock-AF_IEEE802154", "sk_lock-AF_CAIF" , "sk_lock-AF_ALG" , "sk_lock-AF_NFC" , "sk_lock-AF_VSOCK", "sk_lock-AF_KCM" , - "sk_lock-AF_MAX" + "sk_lock-AF_HYPERV", "sk_lock-AF_MAX" }; static const char *const af_family_slock_key_strings[AF_MAX+1] = { "slock-AF_UNSPEC", "slock-AF_UNIX" , "slock-AF_INET" , @@ -239,7 +239,7 @@ static const char *const af_family_slock_key_strings[AF_MAX+1] = { "slock-AF_RXRPC" , "slock-AF_ISDN" , "slock-AF_PHONET" , "slock-AF_IEEE802154", "slock-AF_CAIF" , "slock-AF_ALG" , "slock-AF_NFC" , "slock-AF_VSOCK","slock-AF_KCM" , - "slock-AF_MAX" + "slock-AF_HYPERV", "slock-AF_MAX" }; static const char *const af_family_clock_key_strings[AF_MAX+1] = { "clock-AF_UNSPEC", "clock-AF_UNIX" , "clock-AF_INET" , @@ -256,7 +256,7 @@ static const char *const af_family_clock_key_strings[AF_MAX+1] = { "clock-AF_RXRPC" , "clock-AF_ISDN" , "clock-AF_PHONET" , "clock-AF_IEEE802154", "clock-AF_CAIF" , "clock-AF_ALG" , "clock-AF_NFC" , "clock-AF_VSOCK", "clock-AF_KCM" , - "clock-AF_MAX" + "clock-AF_HYPERV", "clock-AF_MAX" }; /* -- 2.1.0
[PATCH net-next 1/3] net: add the AF_KCM entries to family name tables
This is for the recent kcm driver, which introduces AF_KCM(41) in b7ac4eb(kcm: Kernel Connection Multiplexor module). Signed-off-by: Dexuan Cui <de...@microsoft.com> Cc: Signed-off-by: Tom Herbert <t...@herbertland.com> --- net/core/sock.c | 9 ++--- 1 file changed, 6 insertions(+), 3 deletions(-) diff --git a/net/core/sock.c b/net/core/sock.c index b67b9ae..7e73c26 100644 --- a/net/core/sock.c +++ b/net/core/sock.c @@ -221,7 +221,8 @@ static const char *const af_family_key_strings[AF_MAX+1] = { "sk_lock-AF_TIPC" , "sk_lock-AF_BLUETOOTH", "sk_lock-IUCV", "sk_lock-AF_RXRPC" , "sk_lock-AF_ISDN" , "sk_lock-AF_PHONET" , "sk_lock-AF_IEEE802154", "sk_lock-AF_CAIF" , "sk_lock-AF_ALG" , - "sk_lock-AF_NFC" , "sk_lock-AF_VSOCK", "sk_lock-AF_MAX" + "sk_lock-AF_NFC" , "sk_lock-AF_VSOCK", "sk_lock-AF_KCM" , + "sk_lock-AF_MAX" }; static const char *const af_family_slock_key_strings[AF_MAX+1] = { "slock-AF_UNSPEC", "slock-AF_UNIX" , "slock-AF_INET" , @@ -237,7 +238,8 @@ static const char *const af_family_slock_key_strings[AF_MAX+1] = { "slock-AF_TIPC" , "slock-AF_BLUETOOTH", "slock-AF_IUCV" , "slock-AF_RXRPC" , "slock-AF_ISDN" , "slock-AF_PHONET" , "slock-AF_IEEE802154", "slock-AF_CAIF" , "slock-AF_ALG" , - "slock-AF_NFC" , "slock-AF_VSOCK","slock-AF_MAX" + "slock-AF_NFC" , "slock-AF_VSOCK","slock-AF_KCM" , + "slock-AF_MAX" }; static const char *const af_family_clock_key_strings[AF_MAX+1] = { "clock-AF_UNSPEC", "clock-AF_UNIX" , "clock-AF_INET" , @@ -253,7 +255,8 @@ static const char *const af_family_clock_key_strings[AF_MAX+1] = { "clock-AF_TIPC" , "clock-AF_BLUETOOTH", "clock-AF_IUCV" , "clock-AF_RXRPC" , "clock-AF_ISDN" , "clock-AF_PHONET" , "clock-AF_IEEE802154", "clock-AF_CAIF" , "clock-AF_ALG" , - "clock-AF_NFC" , "clock-AF_VSOCK", "clock-AF_MAX" + "clock-AF_NFC" , "clock-AF_VSOCK", "clock-AF_KCM" , + "clock-AF_MAX" }; /* -- 2.1.0
RE: When will net-next merge with linux-next?
> From: gre...@linuxfoundation.org [mailto:gre...@linuxfoundation.org] > Sent: Wednesday, March 16, 2016 10:41 > To: Dexuan Cui <de...@microsoft.com> > Cc: David Miller <da...@davemloft.net>; netdev@vger.kernel.org; KY > Srinivasan <k...@microsoft.com>; Haiyang Zhang <haiya...@microsoft.com> > Subject: Re: When will net-next merge with linux-next? > > On Wed, Mar 16, 2016 at 01:58:50AM +, Dexuan Cui wrote: > > > From: gre...@linuxfoundation.org [mailto:gre...@linuxfoundation.org] > > > Sent: Tuesday, March 15, 2016 23:06 > > > To: Dexuan Cui <de...@microsoft.com> > > > Cc: David Miller <da...@davemloft.net>; netdev@vger.kernel.org; KY > > > Srinivasan <k...@microsoft.com>; Haiyang Zhang <haiya...@microsoft.com> > > > Subject: Re: When will net-next merge with linux-next? > > > > > > On Tue, Mar 15, 2016 at 11:11:34AM +, Dexuan Cui wrote: > > > > I'm wondering whether (and when) step 2 will happen in the next 2 weeks, > > > > that is, before the tag 4.6-rc1 is made. > > > > If not, I guess I'll miss 4.6? > > > > > > You missed 4.6 as your patch was not in any of our trees a few days > > > before 4.5 was released, sorry. > > > > > > greg k-h > > Hi Greg, > > Thanks for the reply! > > > > My patch has to go in net-next first, but even today's mainline and net-next > > haven't had the supporting patches in the VMBus driver, so I can't post my > > patch to net-next even today -- it seems it's doomed to need 2 major > > release cycles to push a feature that makes changes to 2 subsystems? :-( > > Usually, yes, unless you talk to us ahead of time so we can coordinate, > or have one of the patches go through a different tree (i.e. all in one > tree.). Greg, Thanks for your patient explanation! I thought the patch (AF_HYPERV) must go through net-next. > Just wait until 4.6-rc1 is out and all will be fine. 
> greg k-h

BTW, I saw this in Documentation/development-process/2.Process:

"As a general rule, if you miss the merge window for a given feature, the
best thing to do is to wait for the next development cycle. (An occasional
exception is made for drivers for previously-unsupported hardware; if they
touch no in-tree code, they cannot cause regressions and should be safe to
add at any time)"

I hope David could make an exception for the AF_HYPERV patch, since it is
a new driver that touches no in-tree code and is unlikely to cause
regressions. :-)

And actually the new driver won't be loaded automatically -- a user must
load it manually before the feature can be used.

Thanks,
-- Dexuan
RE: When will net-next merge with linux-next?
> From: gre...@linuxfoundation.org [mailto:gre...@linuxfoundation.org]
> Sent: Tuesday, March 15, 2016 23:06
> To: Dexuan Cui <de...@microsoft.com>
> Cc: David Miller <da...@davemloft.net>; netdev@vger.kernel.org; KY
> Srinivasan <k...@microsoft.com>; Haiyang Zhang <haiya...@microsoft.com>
> Subject: Re: When will net-next merge with linux-next?
>
> On Tue, Mar 15, 2016 at 11:11:34AM +, Dexuan Cui wrote:
> > I'm wondering whether (and when) step 2 will happen in the next 2 weeks,
> > that is, before the tag 4.6-rc1 is made.
> > If not, I guess I'll miss 4.6?
>
> You missed 4.6 as your patch was not in any of our trees a few days
> before 4.5 was released, sorry.
>
> greg k-h

Hi Greg,
Thanks for the reply!

My patch has to go in net-next first, but even today's mainline and
net-next haven't had the supporting patches in the VMBus driver, so I
can't post my patch to net-next even today -- it seems it's doomed to
need 2 major release cycles to push a feature that makes changes to 2
subsystems? :-(

I guess Greg will send a pull request to Linus within the next 1~2 weeks,
so the supporting VMBus patches will be in the mainline after that.

Hi David,
May I know when you'll merge with the mainline kernel, and how frequently
you usually do it?

Thanks,
-- Dexuan
RE: When will net-next merge with linux-next?
> From: gre...@linuxfoundation.org [mailto:gre...@linuxfoundation.org]
> Sent: Tuesday, March 15, 2016 0:22
> To: Dexuan Cui <de...@microsoft.com>
> Cc: David Miller <da...@davemloft.net>; netdev@vger.kernel.org; KY
> Srinivasan <k...@microsoft.com>; Haiyang Zhang <haiya...@microsoft.com>
> Subject: Re: When will net-next merge with linux-next?
>
> On Mon, Mar 14, 2016 at 06:09:41AM +, Dexuan Cui wrote:
> > Hi David,
> > I have a pending patch of the hv_sock driver, which should go into the
> > kernel through the net-next tree:
> > https://lkml.org/lkml/2016/2/14/7
> >
> > The VMBus side's supporting patches of hv_sock have been in Greg's tree
> > and linux-next for more than 1 month, but they haven't been in net-next
> > yet, I suppose this is because of the releasing of 4.5.
> >
> > Now 4.5 is released. Will you merge with Greg's tree or linux-next?
>
> linux-next is a merge of all of the maintainer's trees, and it is
> rebased every day, it's impossible to merge that back into a maintainers
> tree, sorry.

Greg, thanks for the reply!

> > I read netdev-FAQ.txt, but still don't have a clear idea about how things
> > work in my case.
>
> Try reading Documentation/development-process/ please. Things will get
> merged together into Linus's tree over the next 2 weeks as we ask him to
> pull our trees.
>
> greg k-h

I read the development-process documents. Since 4.5 was released and
Linus's merge window is open for 4.6, I guess what will happen next is:

1. Linus will pull from char-misc.git and net-next.git;
2. David will merge with Linus's mainline tree;
3. I can post my patch to net-next.git then.

I'm wondering whether (and when) step 2 will happen in the next 2 weeks,
that is, before the tag 4.6-rc1 is made.
If not, I guess I'll miss 4.6?

Thanks,
-- Dexuan
When will net-next merge with linux-next?
Hi David,
I have a pending patch of the hv_sock driver, which should go into the
kernel through the net-next tree:
https://lkml.org/lkml/2016/2/14/7

The VMBus side's supporting patches of hv_sock have been in Greg's tree
and linux-next for more than 1 month, but they haven't been in net-next
yet; I suppose this is because of the 4.5 release.

Now 4.5 is released. Will you merge with Greg's tree or linux-next?

I read netdev-FAQ.txt, but still don't have a clear idea about how things
work in my case.

Thanks,
-- Dexuan
RE: [REGRESSION, bisect] net: ipv6: unregister_netdevice: waiting for lo to become free. Usage count = 2
> Hi David,
>
> On Wed, Mar 02, 2016 at 01:00:21PM -0800, David Ahern wrote:
> > On 3/2/16 12:31 PM, Jeremiah Mahler wrote:
> > > > On Tue, Mar 01, 2016 at 08:11:54AM +, Dexuan Cui wrote:
> > > > > Hi, I got this line every 10 seconds with today's linux-next in a
> > > > > Hyper-V guest, even when I didn't configure any NIC for the guest:
> > > > >
> > > > > [  72.604249] unregister_netdevice: waiting for lo to become free. Usage count = 2
> > > > > [  82.708170] unregister_netdevice: waiting for lo to become free. Usage count = 2
> > > > > [  92.788079] unregister_netdevice: waiting for lo to become free. Usage count = 2
> > > > > [ 102.808132] unregister_netdevice: waiting for lo to become free. Usage count = 2
> > > > > [ 112.928166] unregister_netdevice: waiting for lo to become free. Usage count = 2
> > > > > [ 122.952069] unregister_netdevice: waiting for lo to become free. Usage count = 2
> > > > >
> > > > > I don't think this is related to the underlying host, since it's
> > > > > related to "lo".
> >
> > This should fix it:
> > https://patchwork.ozlabs.org/patch/591102/
> >
> > David
>
> That patch fixes the problem on my machine.
> Thanks for the quick fix :-)
>
> - Jeremiah Mahler

This works for me too! Thanks!

Thanks,
-- Dexuan
RE: [PATCH V6 0/8] introduce Hyper-V VM Socket(hv_sock)
> From: devel [mailto:driverdev-devel-boun...@linuxdriverproject.org] On Behalf
> Of Dexuan Cui
> Sent: Tuesday, January 26, 2016 17:40
> ...
> Dexuan Cui (8):
>   Drivers: hv: vmbus: add a helper function to set a channel's pending
>     send size
>   Drivers: hv: vmbus: define the new offer type for Hyper-V socket
>     (hvsock)
>   Drivers: hv: vmbus: vmbus_sendpacket_ctl: hvsock: avoid unnecessary
>     signaling
>   Drivers: hv: vmbus: define a new VMBus message type for hvsock
>   Drivers: hv: vmbus: add a hvsock flag in struct hv_driver
>   Drivers: hv: vmbus: add a per-channel rescind callback
>   Drivers: hv: vmbus: add an API vmbus_hvsock_device_unregister()
>   hvsock: introduce Hyper-V Socket feature

Hi David,
Greg has accepted all my VMBus driver side's patches. I'm going to post
the net/hv_sock/ patch now.

I know I should rebase my patch to the net-next tree, but net-next
doesn't contain my VMBus driver side's patches yet, and they are a
prerequisite of my net/hv_sock/ patch.

It looks like I have to wait until you merge net-next with Greg's tree, or
with the mainline (after Greg pushes the changes to the mainline)?

If so, may I know when the next merge will happen (so I don't need to
check net-next every day :-) )?

Thanks,
-- Dexuan
[PATCH V6 3/8] Drivers: hv: vmbus: vmbus_sendpacket_ctl: hvsock: avoid unnecessary signaling
When the hvsock channel's outbound ringbuffer is full (i.e.,
hv_ringbuffer_write() returns -EAGAIN), we should avoid unnecessarily
signaling the host.

Signed-off-by: Dexuan Cui <de...@microsoft.com>
---
 drivers/hv/channel.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/hv/channel.c b/drivers/hv/channel.c
index 1161d68..3f04533 100644
--- a/drivers/hv/channel.c
+++ b/drivers/hv/channel.c
@@ -659,6 +659,9 @@ int vmbus_sendpacket_ctl(struct vmbus_channel *channel, void *buffer,
 	 * If we cannot write to the ring-buffer; signal the host
 	 * even if we may not have written anything. This is a rare
 	 * enough condition that it should not matter.
+	 * NOTE: in this case, the hvsock channel is an exception, because
+	 * it looks like the host side's hvsock implementation has a throttling
+	 * mechanism which can hurt the performance otherwise.
 	 */
 
 	if (channel->signal_policy)
@@ -666,7 +669,8 @@ int vmbus_sendpacket_ctl(struct vmbus_channel *channel, void *buffer,
 	else
 		kick_q = true;
 
-	if (((ret == 0) && kick_q && signal) || (ret))
+	if (((ret == 0) && kick_q && signal) ||
+	    (ret && !is_hvsock_channel(channel)))
 		vmbus_setevent(channel);
 
 	return ret;
-- 
2.1.0