RE: [PATCH] hv_netvsc: Fix a deadlock by getting rtnl_lock earlier in netvsc_probe()

2018-08-29 Thread Dexuan Cui
> From: David Miller 
> Sent: Wednesday, August 29, 2018 17:49
> 
> From: Dexuan Cui 
> Date: Wed, 22 Aug 2018 21:20:03 +
> 
> > ---
> >  drivers/net/hyperv/netvsc_drv.c | 11 ++-
> >  1 file changed, 10 insertions(+), 1 deletion(-)
> >
> >
> > FYI: these are the related 3 paths which show the deadlock:
> 
> This incredibly useful information belongs in the commit log
> message, and therefore before the --- and signoffs.

Hi David,
I was afraid the call traces were too detailed. :-)

Can you please move the info to before the --- line?

Or, should I resend the patch with the commit log updated?

Thanks,
-- Dexuan


[PATCH net] hv_sock: add locking in the open/close/release code paths

2017-10-18 Thread Dexuan Cui

Without the patch, when hvs_open_connection() hasn't completely established
a connection (e.g. it has changed sk->sk_state to SS_CONNECTED, but hasn't
inserted the sock into the connected queue), vsock_stream_connect() may see
the sk_state change and return the connection to the userspace, and next
when the userspace closes the connection quickly, hvs_release() may not see
the connection in the connected queue; finally hvs_open_connection()
inserts the connection into the queue, but we will never be able to purge
the connection.

Signed-off-by: Dexuan Cui <de...@microsoft.com>
Cc: K. Y. Srinivasan <k...@microsoft.com>
Cc: Haiyang Zhang <haiya...@microsoft.com>
Cc: Stephen Hemminger <sthem...@microsoft.com>
Cc: Vitaly Kuznetsov <vkuzn...@redhat.com>
Cc: Cathy Avery <cav...@redhat.com>
Cc: Rolf Neugebauer <rolf.neugeba...@docker.com>
Cc: Marcelo Cerri <marcelo.ce...@canonical.com>
---

Please consider this for v4.14.

 net/vmw_vsock/hyperv_transport.c | 22 ++
 1 file changed, 18 insertions(+), 4 deletions(-)

diff --git a/net/vmw_vsock/hyperv_transport.c b/net/vmw_vsock/hyperv_transport.c
index 14ed5a3..e21991f 100644
--- a/net/vmw_vsock/hyperv_transport.c
+++ b/net/vmw_vsock/hyperv_transport.c
@@ -310,11 +310,15 @@ static void hvs_close_connection(struct vmbus_channel *chan)
struct sock *sk = get_per_channel_state(chan);
struct vsock_sock *vsk = vsock_sk(sk);
 
+   lock_sock(sk);
+
sk->sk_state = SS_UNCONNECTED;
sock_set_flag(sk, SOCK_DONE);
vsk->peer_shutdown |= SEND_SHUTDOWN | RCV_SHUTDOWN;
 
sk->sk_state_change(sk);
+
+   release_sock(sk);
 }
 
 static void hvs_open_connection(struct vmbus_channel *chan)
@@ -344,6 +348,8 @@ static void hvs_open_connection(struct vmbus_channel *chan)
if (!sk)
return;
 
+   lock_sock(sk);
+
if ((conn_from_host && sk->sk_state != VSOCK_SS_LISTEN) ||
(!conn_from_host && sk->sk_state != SS_CONNECTING))
goto out;
@@ -395,9 +401,7 @@ static void hvs_open_connection(struct vmbus_channel *chan)
 
vsock_insert_connected(vnew);
 
-   lock_sock(sk);
vsock_enqueue_accept(sk, new);
-   release_sock(sk);
} else {
sk->sk_state = SS_CONNECTED;
sk->sk_socket->state = SS_CONNECTED;
@@ -410,6 +414,8 @@ static void hvs_open_connection(struct vmbus_channel *chan)
 out:
/* Release refcnt obtained when we called vsock_find_bound_socket() */
sock_put(sk);
+
+   release_sock(sk);
 }
 
 static u32 hvs_get_local_cid(void)
@@ -476,13 +482,21 @@ static int hvs_shutdown(struct vsock_sock *vsk, int mode)
 
 static void hvs_release(struct vsock_sock *vsk)
 {
+   struct sock *sk = sk_vsock(vsk);
struct hvsock *hvs = vsk->trans;
-   struct vmbus_channel *chan = hvs->chan;
+   struct vmbus_channel *chan;
 
+   lock_sock(sk);
+
+   sk->sk_state = SS_DISCONNECTING;
+   vsock_remove_sock(vsk);
+
+   release_sock(sk);
+
+   chan = hvs->chan;
if (chan)
hvs_shutdown(vsk, RCV_SHUTDOWN | SEND_SHUTDOWN);
 
-   vsock_remove_sock(vsk);
 }
 
 static void hvs_destruct(struct vsock_sock *vsk)
-- 
2.7.4
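
The locking rule in the commit message above can be modelled in user space.
This is only a rough sketch, not the kernel code: a pthread mutex stands in
for lock_sock()/release_sock(), a boolean stands in for membership in the
connected queue, and every model_* name is made up for illustration. The
point is that the open path publishes the state change and the queue
insertion under one lock hold, and that a late open after release becomes
a no-op because the state is no longer "connecting".

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the socket states used in the patch. */
enum model_state { MODEL_CONNECTING, MODEL_CONNECTED, MODEL_DISCONNECTING };

struct model_sock {
	pthread_mutex_t lock;     /* plays the role of lock_sock()/release_sock() */
	enum model_state state;
	bool in_connected_table;  /* plays the role of the connected queue */
};

static void model_init(struct model_sock *s)
{
	assert(s != NULL);
	pthread_mutex_init(&s->lock, NULL);
	s->state = MODEL_CONNECTING;
	s->in_connected_table = false;
}

/* Open path: the state change and the insertion happen under one lock hold. */
static void model_open(struct model_sock *s)
{
	pthread_mutex_lock(&s->lock);
	if (s->state == MODEL_CONNECTING) {
		s->state = MODEL_CONNECTED;
		s->in_connected_table = true;
	}
	pthread_mutex_unlock(&s->lock);
}

/* Release path: mark the socket and remove it under the same lock.
 * Returns true if the socket was in the table when it was released.
 */
static bool model_release(struct model_sock *s)
{
	bool was_listed;

	pthread_mutex_lock(&s->lock);
	s->state = MODEL_DISCONNECTING;
	was_listed = s->in_connected_table;
	s->in_connected_table = false;
	pthread_mutex_unlock(&s->lock);
	return was_listed;
}
```

Calling model_release() before model_open() leaves the table empty, which
is the case the patch is meant to make safe.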



RE: [PATCH] vsock: only load vmci transport on VMware hypervisor by default

2017-09-06 Thread Dexuan Cui
> From: Jorgen S. Hansen [mailto:jhan...@vmware.com]
> Sent: Wednesday, September 6, 2017 7:11 AM
>> ...
> > I'm currently working on NFS over AF_VSOCK and sock_diag support (for
> > ss(8) and netstat-like tools).
> >
> > Multi-transport support is lower priority for me at the moment.  I'm
> > happy to review patches though.  If there is no progress on this by the
> > end of the year then I will have time to work on it.
> >
> 
> I’ll try to find time to write a more coherent proposal in the coming weeks,
> and we can discuss that.
> 
> Jorgen

Thank you! 

Thanks,
-- Dexuan


RE: [PATCH] vsock: only load vmci transport on VMware hypervisor by default

2017-09-02 Thread Dexuan Cui
> From: Stefan Hajnoczi [mailto:stefa...@redhat.com]
> Sent: Thursday, August 31, 2017 4:55 AM
> ...
> On Tue, Aug 29, 2017 at 03:37:07PM +, Jorgen S. Hansen wrote:
> > > On Aug 29, 2017, at 4:36 AM, Dexuan Cui <de...@microsoft.com> wrote:
> > If we allow multiple host side transports, virtio host side support and
> > vmci should be able to coexist regardless of the order of initialization.
> 
> That sounds good to me.
> 
> This means af_vsock.c needs to be aware of CID allocation.  Currently the
> vhost_vsock.ko driver handles this itself (it keeps a list of CIDs and
> checks that they are not used twice).  It should be possible to move
> that state into af_vsock.c so we have <cid, host_transport> pairs.
> 
> I'm currently working on NFS over AF_VSOCK and sock_diag support (for
> ss(8) and netstat-like tools).
> 
> Multi-transport support is lower priority for me at the moment.  I'm
> happy to review patches though.  If there is no progress on this by the
> end of the year then I will have time to work on it.
I understand. Thank you both for sharing the details about the plan!
 
> Are either of you are in Prague, Czech Republic on October 25-27 for
> Linux Kernel Summit, Open Source Summit Europe, Embedded Linux
> Conference Europe, KVM Forum, or MesosCon Europe?
> 
> Stefan
I regret I won't be there this year. 

Thanks,
-- Dexuan


RE: [PATCH] vsock: only load vmci transport on VMware hypervisor by default

2017-08-28 Thread Dexuan Cui
> From: Dexuan Cui
> Sent: Tuesday, August 22, 2017 21:21
> > ...
> > ...
> > The only problem here would be the potential for a guest and a host app
> to
> > have a conflict wrt port numbers, even though they would be able to
> > operate fine, if restricted to their appropriate transport.
> >
> > Thanks,
> > Jorgen
> 
> Hi Jorgen, Stefan,
> Thank you for the detailed analysis!
> You have a much better understanding than me about the complex
> scenarios. Can you please work out a patch? :-)

Hi Jorgen, Stefan,
May I know your plan for this? 
 
> IMO Linux driver of Hyper-V sockets is the simplest case, as we only have
> the "to host" option (the host side driver of Hyper-V sockets runs on
> Windows kernel and I don't think the other hypervisors emulate
> the full Hyper-V VMBus 4.0, which is required to support Hyper-V sockets).
> 
> -- Dexuan

Thanks,
-- Dexuan


RE: [PATCH v3 net-next 1/1] hv_sock: implements Hyper-V transport for Virtual Sockets (AF_VSOCK)

2017-08-28 Thread Dexuan Cui
> From: David Miller [mailto:da...@davemloft.net]
> Sent: Monday, August 28, 2017 15:39
> From: Dexuan Cui <de...@microsoft.com>
> Date: Sat, 26 Aug 2017 04:52:43 +
> 
> >
> > Hyper-V Sockets (hv_sock) supplies a byte-stream based communication
> > mechanism between the host and the guest. It uses VMBus ringbuffer as
> the
> > transportation layer.
> >
> > With hv_sock, applications between the host (Windows 10, Windows
> Server
> > 2016 or newer) and the guest can talk with each other using the traditional
> > socket APIs.
> >
> > Signed-off-by: Dexuan Cui <de...@microsoft.com>
> 
> Applied, thank you.

Thanks a lot!

There are some supporting patches still pending in the VMBus driver.
I'll make sure they go in through the char-misc tree.

Thanks,
-- Dexuan


[PATCH v3 net-next 1/1] hv_sock: implements Hyper-V transport for Virtual Sockets (AF_VSOCK)

2017-08-25 Thread Dexuan Cui

Hyper-V Sockets (hv_sock) supplies a byte-stream based communication
mechanism between the host and the guest. It uses VMBus ringbuffer as the
transportation layer.

With hv_sock, applications between the host (Windows 10, Windows Server
2016 or newer) and the guest can talk with each other using the traditional
socket APIs.

More info about Hyper-V Sockets is available here:

"Make your own integration services":
https://docs.microsoft.com/en-us/virtualization/hyper-v-on-windows/user-guide/make-integration-service

The patch implements the necessary support in Linux guest by introducing a new
vsock transport for AF_VSOCK.

Signed-off-by: Dexuan Cui <de...@microsoft.com>
Cc: K. Y. Srinivasan <k...@microsoft.com>
Cc: Haiyang Zhang <haiya...@microsoft.com>
Cc: Stephen Hemminger <sthem...@microsoft.com>
Cc: Andy King <ack...@vmware.com>
Cc: Dmitry Torokhov <d...@vmware.com>
Cc: George Zhang <georgezh...@vmware.com>
Cc: Jorgen Hansen <jhan...@vmware.com>
Cc: Reilly Grant <gra...@vmware.com>
Cc: Asias He <as...@redhat.com>
Cc: Stefan Hajnoczi <stefa...@redhat.com>
Cc: Vitaly Kuznetsov <vkuzn...@redhat.com>
Cc: Cathy Avery <cav...@redhat.com>
Cc: Rolf Neugebauer <rolf.neugeba...@docker.com>
Cc: Marcelo Cerri <marcelo.ce...@canonical.com>

---

Changes in v2:
fixed hvs_stream_allow() for cid and the comments
Thanks Stefan Hajnoczi!

added proper locking when using vsock_enqueue_accept()
Thanks Stefan Hajnoczi and Jorgen Hansen!


The previous v1 patch is not needed any more:
[PATCH net-next 2/3] vsock: fix vsock_dequeue/enqueue_accept race

Another previous v1 patch is being discussed in another thread:
vsock: only load vmci transport on VMware hypervisor by default

Changes in v3 (addressed David Miller's comments):
used better naming: VMBUS_PKT_TRAILER_SIZE
better handled fin_sent: removed atomic
removed "inline" tags
better handled uuid service_id assignments: avoid pointers

 MAINTAINERS  |   1 +
 net/vmw_vsock/Kconfig|  12 +
 net/vmw_vsock/Makefile   |   3 +
 net/vmw_vsock/hyperv_transport.c | 904 +++
 4 files changed, 920 insertions(+)
 create mode 100644 net/vmw_vsock/hyperv_transport.c

diff --git a/MAINTAINERS b/MAINTAINERS
index 2db0f8c..dae0573 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -6279,6 +6279,7 @@ F:drivers/net/hyperv/
 F: drivers/scsi/storvsc_drv.c
 F: drivers/uio/uio_hv_generic.c
 F: drivers/video/fbdev/hyperv_fb.c
+F: net/vmw_vsock/hyperv_transport.c
 F: include/linux/hyperv.h
 F: tools/hv/
 F: Documentation/ABI/stable/sysfs-bus-vmbus
diff --git a/net/vmw_vsock/Kconfig b/net/vmw_vsock/Kconfig
index a7ae09d..3f52929 100644
--- a/net/vmw_vsock/Kconfig
+++ b/net/vmw_vsock/Kconfig
@@ -46,3 +46,15 @@ config VIRTIO_VSOCKETS_COMMON
  This option is selected by any driver which needs to access
  the virtio_vsock.  The module will be called
  vmw_vsock_virtio_transport_common.
+
+config HYPERV_VSOCKETS
+   tristate "Hyper-V transport for Virtual Sockets"
+   depends on VSOCKETS && HYPERV
+   help
+ This module implements a Hyper-V transport for Virtual Sockets.
+
+ Enable this transport if your Virtual Machine host supports Virtual
+ Sockets over Hyper-V VMBus.
+
+ To compile this driver as a module, choose M here: the module will be
+ called hv_sock. If unsure, say N.
diff --git a/net/vmw_vsock/Makefile b/net/vmw_vsock/Makefile
index 09fc2eb..e63d574 100644
--- a/net/vmw_vsock/Makefile
+++ b/net/vmw_vsock/Makefile
@@ -2,6 +2,7 @@ obj-$(CONFIG_VSOCKETS) += vsock.o
 obj-$(CONFIG_VMWARE_VMCI_VSOCKETS) += vmw_vsock_vmci_transport.o
 obj-$(CONFIG_VIRTIO_VSOCKETS) += vmw_vsock_virtio_transport.o
 obj-$(CONFIG_VIRTIO_VSOCKETS_COMMON) += vmw_vsock_virtio_transport_common.o
+obj-$(CONFIG_HYPERV_VSOCKETS) += hv_sock.o
 
 vsock-y += af_vsock.o af_vsock_tap.o vsock_addr.o
 
@@ -11,3 +12,5 @@ vmw_vsock_vmci_transport-y += vmci_transport.o vmci_transport_notify.o \
 vmw_vsock_virtio_transport-y += virtio_transport.o
 
 vmw_vsock_virtio_transport_common-y += virtio_transport_common.o
+
+hv_sock-y += hyperv_transport.o
diff --git a/net/vmw_vsock/hyperv_transport.c b/net/vmw_vsock/hyperv_transport.c
new file mode 100644
index 0000000..14ed5a3
--- /dev/null
+++ b/net/vmw_vsock/hyperv_transport.c
@@ -0,0 +1,904 @@
+/*
+ * Hyper-V transport for vsock
+ *
+ * Hyper-V Sockets supplies a byte-stream based communication mechanism
+ * between the host and the VM. This driver implements the necessary
+ * support in the VM by introducing the new vsock transport.
+ *
+ * Copyright (c) 2017, Microsoft Corporation.
+ *
+ * This program is free software; you can redistribute i

RE: [PATCH v2 net-next 1/1] hv_sock: implements Hyper-V transport for Virtual Sockets (AF_VSOCK)

2017-08-24 Thread Dexuan Cui
> From: David Miller [mailto:da...@davemloft.net]
> Sent: Thursday, August 24, 2017 18:20
> > +#define VMBUS_PKT_TRAILER  (sizeof(u64))
>
> This is not the packet trailer, it's the size of the packet trailer.

Thanks! I'll change it to VMBUS_PKT_TRAILER_SIZE.

> > +   /* Have we sent the zero-length packet (FIN)? */
> > +   unsigned long fin_sent;
>
> Why does this need to be atomic?  Why can't a smaller simpler
It doesn't have to be. It was originally made for a quick workaround.
Thanks! I should do it in the right way now.

> mechanism be used to make sure hvs_shutdown() only performs
> hvs_send_data() call once on the channel?
I'll change "fin_sent" to bool, and avoid test_and_set_bit().
I'll add lock_sock/release_sock()  in hvs_shutdown() like this:

static int hvs_shutdown(struct vsock_sock *vsk, int mode)
{
	...
	lock_sock(sk);

	hvs = vsk->trans;
	if (hvs->fin_sent)
		goto out;

	send_buf = (struct hvs_send_buf *)&hdr;
	(void)hvs_send_data(hvs->chan, send_buf, 0);

	hvs->fin_sent = true;
out:
	release_sock(sk);
	return 0;
}

> > +static inline bool is_valid_srv_id(const uuid_le *id)
> > +{
> > +   return !memcmp(&id->b[4], &srv_id_template.b[4], sizeof(uuid_le) - 4);
> > +}
>
> Do not use the inline function attribute in *.c code.  Let the
> compiler decide.

OK. Will remove all the inline usages.

> > +   *((u32 *)&h->vm_srv_id) = vsk->local_addr.svm_port;
> > +   *((u32 *)&h->host_srv_id) = vsk->remote_addr.svm_port;
>
> There has to be a better way to express this.
I may need to define a union here. Let me try it.

> And if this is partially initializing vm_srv_id, at a minimum
> endianness needs to be taken into account.
I may need to use cpu_to_le32(). Let me check it.
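
One way to avoid both the pointer cast and the endianness question is to
write the port into the first four bytes of the service ID one byte at a
time, least significant byte first. The following is a user-space sketch
under stated assumptions, not the driver code: struct model_guid, the
template bytes and all model_* helpers are placeholders invented here, and
the real template GUID is not reproduced.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define MODEL_GUID_LEN 16

struct model_guid {
	uint8_t b[MODEL_GUID_LEN];
};

/* Placeholder template: bytes 0..3 carry the port, bytes 4..15 are fixed.
 * These fixed bytes are invented for the example.
 */
static const struct model_guid model_srv_id_template = {
	{ 0x00, 0x00, 0x00, 0x00, 0xa1, 0xa2, 0xa3, 0xa4,
	  0xa5, 0xa6, 0xa7, 0xa8, 0xa9, 0xaa, 0xab, 0xac }
};

/* Build a service ID from a port, storing the port in little-endian order
 * regardless of the CPU's native byte order.
 */
static struct model_guid model_port_to_srv_id(uint32_t port)
{
	struct model_guid id = model_srv_id_template;

	id.b[0] = (uint8_t)(port & 0xff);
	id.b[1] = (uint8_t)((port >> 8) & 0xff);
	id.b[2] = (uint8_t)((port >> 16) & 0xff);
	id.b[3] = (uint8_t)((port >> 24) & 0xff);
	return id;
}

/* Recover the port from the first four bytes. */
static uint32_t model_srv_id_to_port(const struct model_guid *id)
{
	assert(id != NULL);
	return (uint32_t)id->b[0] |
	       ((uint32_t)id->b[1] << 8) |
	       ((uint32_t)id->b[2] << 16) |
	       ((uint32_t)id->b[3] << 24);
}

/* A service ID is valid when everything after the port matches the template. */
static int model_is_valid_srv_id(const struct model_guid *id)
{
	return memcmp(&id->b[4], &model_srv_id_template.b[4],
		      MODEL_GUID_LEN - 4) == 0;
}
```

In kernel code the same effect would normally come from a union plus
cpu_to_le32(), as discussed above; the byte-wise form is used here only so
that the sketch builds outside the kernel.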



[PATCH v2 net-next 1/1] hv_sock: implements Hyper-V transport for Virtual Sockets (AF_VSOCK)

2017-08-22 Thread Dexuan Cui

Hyper-V Sockets (hv_sock) supplies a byte-stream based communication
mechanism between the host and the guest. It uses VMBus ringbuffer as the
transportation layer.

With hv_sock, applications between the host (Windows 10, Windows Server
2016 or newer) and the guest can talk with each other using the traditional
socket APIs.

More info about Hyper-V Sockets is available here:

"Make your own integration services":
https://docs.microsoft.com/en-us/virtualization/hyper-v-on-windows/user-guide/make-integration-service

The patch implements the necessary support in Linux guest by introducing a new
vsock transport for AF_VSOCK.

Signed-off-by: Dexuan Cui <de...@microsoft.com>
Cc: K. Y. Srinivasan <k...@microsoft.com>
Cc: Haiyang Zhang <haiya...@microsoft.com>
Cc: Stephen Hemminger <sthem...@microsoft.com>
Cc: Andy King <ack...@vmware.com>
Cc: Dmitry Torokhov <d...@vmware.com>
Cc: George Zhang <georgezh...@vmware.com>
Cc: Jorgen Hansen <jhan...@vmware.com>
Cc: Reilly Grant <gra...@vmware.com>
Cc: Stefan Hajnoczi <stefa...@redhat.com>
Cc: Vitaly Kuznetsov <vkuzn...@redhat.com>
Cc: Cathy Avery <cav...@redhat.com>
Cc: Rolf Neugebauer <rolf.neugeba...@docker.com>
Cc: Marcelo Cerri <marcelo.ce...@canonical.com>
---

Changes in v2:
Fixed hvs_stream_allow() for cid and the comments
Thanks Stefan Hajnoczi!

Added proper locking when using vsock_enqueue_accept()
Thanks Stefan Hajnoczi and Jorgen Hansen!

The previous v1 patch is not needed any more:
[PATCH net-next 2/3] vsock: fix vsock_dequeue/enqueue_accept race

Another previous v1 patch is being discussed in another thread:
vsock: only load vmci transport on VMware hypervisor by default


 MAINTAINERS  |   1 +
 net/vmw_vsock/Kconfig|  12 +
 net/vmw_vsock/Makefile   |   3 +
 net/vmw_vsock/hyperv_transport.c | 888 +++
 4 files changed, 904 insertions(+)
 create mode 100644 net/vmw_vsock/hyperv_transport.c

diff --git a/MAINTAINERS b/MAINTAINERS
index 2db0f8c..dae0573 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -6279,6 +6279,7 @@ F:drivers/net/hyperv/
 F: drivers/scsi/storvsc_drv.c
 F: drivers/uio/uio_hv_generic.c
 F: drivers/video/fbdev/hyperv_fb.c
+F: net/vmw_vsock/hyperv_transport.c
 F: include/linux/hyperv.h
 F: tools/hv/
 F: Documentation/ABI/stable/sysfs-bus-vmbus
diff --git a/net/vmw_vsock/Kconfig b/net/vmw_vsock/Kconfig
index a7ae09d..3f52929 100644
--- a/net/vmw_vsock/Kconfig
+++ b/net/vmw_vsock/Kconfig
@@ -46,3 +46,15 @@ config VIRTIO_VSOCKETS_COMMON
  This option is selected by any driver which needs to access
  the virtio_vsock.  The module will be called
  vmw_vsock_virtio_transport_common.
+
+config HYPERV_VSOCKETS
+   tristate "Hyper-V transport for Virtual Sockets"
+   depends on VSOCKETS && HYPERV
+   help
+ This module implements a Hyper-V transport for Virtual Sockets.
+
+ Enable this transport if your Virtual Machine host supports Virtual
+ Sockets over Hyper-V VMBus.
+
+ To compile this driver as a module, choose M here: the module will be
+ called hv_sock. If unsure, say N.
diff --git a/net/vmw_vsock/Makefile b/net/vmw_vsock/Makefile
index 09fc2eb..e63d574 100644
--- a/net/vmw_vsock/Makefile
+++ b/net/vmw_vsock/Makefile
@@ -2,6 +2,7 @@ obj-$(CONFIG_VSOCKETS) += vsock.o
 obj-$(CONFIG_VMWARE_VMCI_VSOCKETS) += vmw_vsock_vmci_transport.o
 obj-$(CONFIG_VIRTIO_VSOCKETS) += vmw_vsock_virtio_transport.o
 obj-$(CONFIG_VIRTIO_VSOCKETS_COMMON) += vmw_vsock_virtio_transport_common.o
+obj-$(CONFIG_HYPERV_VSOCKETS) += hv_sock.o
 
 vsock-y += af_vsock.o af_vsock_tap.o vsock_addr.o
 
@@ -11,3 +12,5 @@ vmw_vsock_vmci_transport-y += vmci_transport.o vmci_transport_notify.o \
 vmw_vsock_virtio_transport-y += virtio_transport.o
 
 vmw_vsock_virtio_transport_common-y += virtio_transport_common.o
+
+hv_sock-y += hyperv_transport.o
diff --git a/net/vmw_vsock/hyperv_transport.c b/net/vmw_vsock/hyperv_transport.c
new file mode 100644
index 0000000..597fb25
--- /dev/null
+++ b/net/vmw_vsock/hyperv_transport.c
@@ -0,0 +1,888 @@
+/*
+ * Hyper-V transport for vsock
+ *
+ * Hyper-V Sockets supplies a byte-stream based communication mechanism
+ * between the host and the VM. This driver implements the necessary
+ * support in the VM by introducing the new vsock transport.
+ *
+ * Copyright (c) 2017, Microsoft Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS F

RE: [PATCH] vsock: only load vmci transport on VMware hypervisor by default

2017-08-22 Thread Dexuan Cui
> From: Jorgen S. Hansen [mailto:jhan...@vmware.com]
> > On Aug 22, 2017, at 11:54 AM, Stefan Hajnoczi 
> wrote:
> > ...
> > We *can* by looking at the destination CID.  Please take a look at
> > drivers/misc/vmw_vmci/vmci_route.c:vmci_route() to see how VMCI
> handles
> > nested virt.
> >
> > It boils down to something like this:
> >
> >  static int vsock_stream_connect(struct socket *sock, struct sockaddr *addr,
> >  int addr_len, int flags)
> >  {
> >  ...
> >  if (remote_addr.svm_cid == VMADDR_CID_HOST)
> >  transport = host_transport;
> >  else
> >  transport = guest_transport;
> >
> > It's easy for connect(2) but Jorgen mentioned it's harder for listen(2)
> > because the socket would need to listen on both transports.  We define
> > two new constants VMADDR_CID_LISTEN_FROM_GUEST and
> > VMADDR_CID_LISTEN_FROM_HOST for bind(2) so that applications can
> decide
> > which side to listen on.
> 
> If a socket is bound to VMADDR_CID_HOST, we would consider that socket as
> bound to the host side transport, so that would be the same as
> VMADDR_CID_LISTEN_FROM_GUEST. For the guest, we have
> IOCTL_VM_SOCKETS_GET_LOCAL_CID, so that could be used to get and bind
> a socket to the guest transport (VMCI will always return the guest CID as the
> local one, if the VMCI driver is used in a guest, and it looks like virtio 
> will do
> the same). We could treat VMADDR_CID_ANY as always being the guest
> transport, since that is the use case where you don’t know upfront what
> your CID is, if we don’t want to listen on all transports. So we would use the
> host transport, if a socket is bound to VMADDR_CID_HOST, or if there is no
> guest transport, and in all other cases use the guest transport. However,
> having a couple of symbolic names like you suggest certainly makes it more
> obvious, and could be used in combination with this. It would be a plus if
> existing applications would function as intended in most cases.
> 
> >   Or the listen socket could simply listen to
> > both sides.
> 
> The only problem here would be the potential for a guest and a host app to
> have a conflict wrt port numbers, even though they would be able to
> operate fine, if restricted to their appropriate transport.
> 
> Thanks,
> Jorgen

Hi Jorgen, Stefan,
Thank you for the detailed analysis!
You have a much better understanding than me about the complex
scenarios. Can you please work out a patch? :-)

IMO Linux driver of Hyper-V sockets is the simplest case, as we only have
the "to host" option (the host side driver of Hyper-V sockets runs on 
Windows kernel and I don't think the other hypervisors emulate
the full Hyper-V VMBus 4.0, which is required to support Hyper-V sockets).

-- Dexuan
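
The bind-time rule Jorgen describes above ("use the host transport if a
socket is bound to VMADDR_CID_HOST, or if there is no guest transport, and
in all other cases use the guest transport") can be written down as a small
decision function. This is a sketch only: the enum and function names are
hypothetical, and the two CID constants mirror the values of
VMADDR_CID_HOST (2) and VMADDR_CID_ANY (-1U).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Mirrors of the vsock CID constants, redefined so the sketch is standalone. */
#define MODEL_CID_ANY  0xFFFFFFFFu
#define MODEL_CID_HOST 2u

static_assert(MODEL_CID_HOST != MODEL_CID_ANY, "the two CIDs must differ");

/* Hypothetical names for the two sides a listening socket could attach to. */
enum model_side {
	MODEL_HOST_SIDE,   /* this system acts as host for its nested guests */
	MODEL_GUEST_SIDE   /* this system acts as a guest of the outer host */
};

/* Pick the transport for a bound socket following the rule quoted above. */
static enum model_side model_pick_bind_transport(uint32_t bound_cid,
						 bool have_guest_transport)
{
	if (bound_cid == MODEL_CID_HOST || !have_guest_transport)
		return MODEL_HOST_SIDE;
	return MODEL_GUEST_SIDE;
}
```

Note that under this rule a socket bound to VMADDR_CID_ANY lands on the
guest side whenever a guest transport exists, which is the compromise
suggested for not listening on all transports.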



RE: [PATCH net-next 3/3] hv_sock: implements Hyper-V transport for Virtual Sockets (AF_VSOCK)

2017-08-22 Thread Dexuan Cui
> From: Stefan Hajnoczi [mailto:stefa...@redhat.com]
> On Fri, Aug 18, 2017 at 10:23:54PM +0000, Dexuan Cui wrote:
> > > > +static bool hvs_stream_allow(u32 cid, u32 port)
> > > > +{
> > > > +   static const u32 valid_cids[] = {
> > > > +   VMADDR_CID_ANY,
> > >
> > > Is this for loopback?
> >
> > No, we don't support loopback in Linux VM, at least for now.
> > In our Linux implementation, Linux VM can only connect to the host, and
> > here when Linux VM calls connect(), I treat  VMADDR_CID_ANY
> > the same as VMADDR_CID_HOST.
> 
> VMCI and virtio-vsock do not treat connect(VMADDR_CID_ANY) the same as
> connect(VMADDR_CID_HOST).  It is an error to connect to VMADDR_CID_ANY.

Ok. Then I'll only allow VMADDR_CID_HOST as the destination CID, since 
we don't support loopback mode.

> > > > +   /* The host's port range [MIN_HOST_EPHEMERAL_PORT, 0xFFFFFFFF) is
> > > > +* reserved as ephemeral ports, which are used as the host's ports
> > > > +* when the host initiates connections.
> > > > +*/
> > > > +   if (port > MAX_HOST_LISTEN_PORT)
> > > > +   return false;
> > >
> > > Without this if statement the guest will attempt to connect.  I guess
> > > there will be no listen sockets above MAX_HOST_LISTEN_PORT, so the
> > > connection attempt will fail.
> >
> > You're correct.
> > To use the vsock common infrastructure, we have to map Hyper-V's
> > GUID <VM_ID, Service_ID> to int <cid, port>, and hence we must limit
> > the port range we can listen() on to [0, MAX_LISTEN_PORT], i.e.
> > we can only use half of the whole 32-bit port space for listen().
> > This is detailed in the long comments starting at about Line 100.
> >
> > > ...but hardcode this knowledge into the guest driver?
> > I'd like the guest's connect() to fail immediately here.
> > IMO this is better than a connect timeout. :-)
> 
> Thanks for explaining.  Perhaps the comment could be updated:
> 
>  /* The host's port range [MIN_HOST_EPHEMERAL_PORT, 0xFFFFFFFF) is
>   * reserved as ephemeral ports, which are used as the host's ports when
>   * the host initiates connections.
>   *
>   * Perform this check in the guest so an immediate error is produced
>   * instead of a timeout.
>   */
> 
> Stefan

Thank you, Stefan! 
Please see the below for the updated version of the function:

static bool hvs_stream_allow(u32 cid, u32 port)
{
	/* The host's port range [MIN_HOST_EPHEMERAL_PORT, 0xFFFFFFFF) is
	 * reserved as ephemeral ports, which are used as the host's ports
	 * when the host initiates connections.
	 *
	 * Perform this check in the guest so an immediate error is produced
	 * instead of a timeout.
	 */
	if (port > MAX_HOST_LISTEN_PORT)
		return false;

	if (cid == VMADDR_CID_HOST)
		return true;

	return false;
}

I'll send a v2 of the patch later today.

-- Dexuan
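
For readers who want to try the check outside the kernel, here is a
user-space sketch of the same logic. The constants are assumptions made for
the example: MODEL_CID_HOST mirrors VMADDR_CID_HOST (2), and
MODEL_MAX_HOST_LISTEN_PORT is set to the top of the lower half of the
32-bit port space, which matches the "half of the whole 32-bit port space"
description in this thread but is not copied from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed values, redefined so the sketch is standalone. */
#define MODEL_CID_ANY              0xFFFFFFFFu
#define MODEL_CID_HOST             2u
#define MODEL_MAX_HOST_LISTEN_PORT 0x7FFFFFFFu

static_assert(MODEL_MAX_HOST_LISTEN_PORT == UINT32_MAX / 2,
	      "listen ports cover the lower half of the port space");

/* Allow a connect() only to the host, and only to a port the host can
 * listen on. Ports above the limit are host ephemeral ports, so failing
 * here gives an immediate error instead of a connect timeout.
 */
static bool model_stream_allow(uint32_t cid, uint32_t port)
{
	if (port > MODEL_MAX_HOST_LISTEN_PORT)
		return false;

	return cid == MODEL_CID_HOST;
}
```

With these assumed values, a connect to the host on port 80 is allowed,
while VMADDR_CID_ANY as a destination or any port at or above 0x80000000
is refused.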


RE: [PATCH] vsock: only load vmci transport on VMware hypervisor by default

2017-08-18 Thread Dexuan Cui
> From: Stefan Hajnoczi [mailto:stefa...@redhat.com]
> > CID is not really used by us, because we only support guest<->host
> communication,
> > and don't support guest<->guest communication. The Hyper-V host
> references
> > every VM by VmID (which is invisible to the VM), and a VM can only talk to
> the
> > host via this feature.
> 
> Applications running inside the guest should use VMADDR_CID_HOST (2) to
> connect to the host, even on Hyper-V.
I have no objection, and this patch does support this usage of the
user-space applications.
 
> By the way, we should collaborate on a test suite and a vsock(7) man
> page that documents the semantics of AF_VSOCK sockets.  This way our
> transports will have the same behavior and AF_VSOCK applications will
> work on all 3 hypervisors.
I can't agree more. :-)
BTW, I have been using Rolf's test suite to test my patch:
https://github.com/rn/virtsock/tree/master/c
Maybe this can be a good starting point.
 
> Not all features need to be supported.  For example, VMCI supports
> SOCK_DGRAM while Hyper-V and virtio do not.  But features that are
> available should behave identically.
I totally agree, though I'm afraid Hyper-V may have a few more limitations
compared to VMware/KVM due to the <VM_ID, Service_ID> <--> <cid, port>
mapping.
 
> > Can we use the 'protocol' parameter in the socket() function:
> > int socket(int domain, int type, int protocol)
> >
> > IMO currently the 'protocol' is not really used.
> > I think we can modify __vsock_core_init() to allow multiple transport layers
> to
> >  be registered, and we can define different 'protocol' numbers for
> > VMware/KVM/Hyper-V, and ask the application to explicitly specify what
> should
> > be used. Considering compatibility, we can use the default transport in a
> given
> > VM depending on the underlying hypervisor.
> 
> I think AF_VSOCK should hide the transport from users/applications.
Ideally yes, but let's consider the KVM-on-KVM nested scenario: when
an application in the Level-1 VM creates an AF_VSOCK socket and calls
connect() for it, how can we know if the app is trying to connect to
the Level-0 host, or connect to the Level-2 VM? We can't. That's why
I propose we should use the 'protocol' parameter to distinguish between
"to guest" and "to host".

With my proposal, in the above scenario, by default (the 'protocol' is 0),
we choose the "to host" transport layer when socket() is called; if the
userspace app explicitly specifies "to guest", we choose the "to guest"
transport layer when socket() is called. This way, the connect(), bind(), etc.
can work automatically.
(Of course, the default transport for a given VM can be better chosen
if we detect which nested level the app is running on.)

> Think of same-on-same nested virtualization: VMware-on-VMware or
> KVM-on-KVM.  In that case specifying VMCI or virtio doesn't help.
> 
> We'd still need to distinguish between "to guest" and "to host"
> (currently VMCI has code to do this but virtio does not).
> 
> The natural place to distinguish the destination is when dealing with
> the sockaddr in connect(), bind(), etc.
> 
> Stefan

Thanks,
-- Dexuan
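
The 'protocol' proposal above could look roughly like the following at
socket-creation time. This is a sketch of the idea only: no such protocol
numbers exist for AF_VSOCK, so MODEL_PROTO_TO_GUEST and the other model_*
names are hypothetical, and returning -EPROTONOSUPPORT for unknown values
is an assumption about how such a scheme might report errors.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical protocol numbers for socket(AF_VSOCK, type, protocol). */
enum model_protocol {
	MODEL_PROTO_DEFAULT  = 0,  /* existing applications pass 0 */
	MODEL_PROTO_TO_GUEST = 1   /* explicit request for the "to guest" side */
};

enum model_transport {
	MODEL_TO_HOST,
	MODEL_TO_GUEST
};

/* Choose a transport when the socket is created. Returns 0 and fills *out
 * on success, or a negative errno value for an unknown protocol.
 */
static int model_pick_transport(int protocol, enum model_transport *out)
{
	assert(out != NULL);

	switch (protocol) {
	case MODEL_PROTO_DEFAULT:
		*out = MODEL_TO_HOST;
		return 0;
	case MODEL_PROTO_TO_GUEST:
		*out = MODEL_TO_GUEST;
		return 0;
	default:
		return -EPROTONOSUPPORT;
	}
}
```

Because the choice is made in socket(), later connect() and bind() calls
would not need to inspect the address to find the transport, which is the
property argued for above.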


RE: [PATCH net-next 3/3] hv_sock: implements Hyper-V transport for Virtual Sockets (AF_VSOCK)

2017-08-18 Thread Dexuan Cui
> From: Stefan Hajnoczi [mailto:stefa...@redhat.com]
> Sent: Thursday, August 17, 2017 07:56
> To: Dexuan Cui <de...@microsoft.com>
> On Tue, Aug 15, 2017 at 10:18:41PM +0000, Dexuan Cui wrote:
> > +static u32 hvs_get_local_cid(void)
> > +{
> > +   return VMADDR_CID_ANY;
> > +}
> 
> Interesting concept: the guest never knows its CID.  This is nice from a
> live migration perspective.  Currently VMCI and virtio adjust listen
> socket local CIDs after migration.
> 
> > +static bool hvs_stream_allow(u32 cid, u32 port)
> > +{
> > +   static const u32 valid_cids[] = {
> > +   VMADDR_CID_ANY,
> 
> Is this for loopback?

No, we don't support loopback in Linux VM, at least for now.
In our Linux implementation, Linux VM can only connect to the host, and
here when Linux VM calls connect(), I treat  VMADDR_CID_ANY 
the same as VMADDR_CID_HOST.

> > +   VMADDR_CID_HOST,
> > +   };
> > +   int i;
> > +
> > +   /* The host's port range [MIN_HOST_EPHEMERAL_PORT, 0xFFFFFFFF) is
> > +* reserved as ephemeral ports, which are used as the host's ports
> > +* when the host initiates connections.
> > +*/
> > +   if (port > MAX_HOST_LISTEN_PORT)
> > +   return false;
> 
> Without this if statement the guest will attempt to connect.  I guess
> there will be no listen sockets above MAX_HOST_LISTEN_PORT, so the
> connection attempt will fail.

You're correct.
To use the vsock common infrastructure, we have to map Hyper-V's
GUID <VM_ID, Service_ID> to int <cid, port>, and hence we must limit
the port range we can listen() on to [0, MAX_LISTEN_PORT], i.e.
we can only use half of the whole 32-bit port space for listen().
This is detailed in the long comments starting at about Line 100.
 
> ...but hardcode this knowledge into the guest driver?
I'd like the guest's connect() to fail immediately here.
IMO this is better than a connect timeout. :-)

Thanks,
-- Dexuan


RE: [PATCH net-next 2/3] vsock: fix vsock_dequeue/enqueue_accept race

2017-08-18 Thread Dexuan Cui
> From: Stefan Hajnoczi [mailto:stefa...@redhat.com]
> Sent: Thursday, August 17, 2017 07:06
> 
> On Tue, Aug 15, 2017 at 10:15:39PM +0000, Dexuan Cui wrote:
> > With the current code, when vsock_dequeue_accept() is removing a sock
> > from the list, nothing prevents vsock_enqueue_accept() from adding a new
> > sock into the list concurrently. We should add a lock to protect the list.
> 
> The listener sock is locked, preventing concurrent modification.  I have
> checked both the virtio and vmci transports.  Can you post an example
> where the listener sock isn't locked?
> 
> Stefan
Sorry, I was not careful when checking the vmci code. 
Please ignore the patch.

Now I realize the expectation is that the individual transport drivers should
do the locking for vsock_enqueue_accept(), but for vsock_dequeue_accept(),
the locking is done by the common vsock driver.

Thanks,
-- Dexuan
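
The convention described above (the transport takes the listener lock
around the enqueue, while the common code takes it around the dequeue) can
be illustrated with a small user-space model. It is a sketch under
assumptions: a pthread mutex replaces the socket lock, a fixed-size ring
replaces the accept list, connections are plain integers, and all model_*
names are invented for the example.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_QUEUE_CAP 8

struct model_listener {
	pthread_mutex_t lock;  /* stands in for the listener's socket lock */
	bool held;             /* debug flag: true while the lock is held */
	int queue[MODEL_QUEUE_CAP];
	size_t head, tail;
};

static void model_listener_init(struct model_listener *l)
{
	pthread_mutex_init(&l->lock, NULL);
	l->held = false;
	l->head = l->tail = 0;
}

static void model_lock(struct model_listener *l)
{
	pthread_mutex_lock(&l->lock);
	l->held = true;
}

static void model_unlock(struct model_listener *l)
{
	l->held = false;
	pthread_mutex_unlock(&l->lock);
}

/* Like the enqueue helper: does no locking itself, the caller must hold
 * the listener lock.
 */
static void model_enqueue_accept(struct model_listener *l, int conn)
{
	assert(l->held);
	assert(l->tail - l->head < MODEL_QUEUE_CAP);
	l->queue[l->tail++ % MODEL_QUEUE_CAP] = conn;
}

/* Transport side: takes the lock around the enqueue. */
static void model_transport_new_connection(struct model_listener *l, int conn)
{
	model_lock(l);
	model_enqueue_accept(l, conn);
	model_unlock(l);
}

/* Common side: the accept path locks the listener before dequeuing.
 * Returns -1 when no connection is pending.
 */
static int model_common_accept(struct model_listener *l)
{
	int conn = -1;

	model_lock(l);
	if (l->head != l->tail)
		conn = l->queue[l->head++ % MODEL_QUEUE_CAP];
	model_unlock(l);
	return conn;
}
```

Calling model_enqueue_accept() without the lock trips the assertion, which
corresponds to the locking requirement on transport drivers that this reply
describes.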


RE: [PATCH] vsock: only load vmci transport on VMware hypervisor by default

2017-08-17 Thread Dexuan Cui
> From: Jorgen S. Hansen [mailto:jhan...@vmware.com]
> Sent: Thursday, August 17, 2017 08:17
> >
> > Putting aside nested virtualization, I want to load the transport (vmci,
> > Hyper-V, vsock) for which there is paravirtualized hardware present
> > inside the guest.
> 
> Good points. Completely agree that this is the desired behavior for a guest.
> 
> 
> > It's a little tricker on the host side (doesn't matter for Hyper-V and
> > probably also doesn't for VMware) because the host-side driver is a
> > software device with no hardware backing it.  In KVM we assume the
> > vhost_vsock.ko kernel module will be loaded sufficiently early.
> 
> Since the vmci driver is currently tied to PF_VSOCK it hasn’t been a problem,
> but on the host side the VMCI driver has no hardware backing it either, so
> when we move to a more appropriate solution, this will be an issue for VMCI as
> well. I’ll check our shipped products, but they most likely assume that if an
> upstreamed vmci module is present, it will be loaded automatically.

Hyper-V Sockets is a standard feature of VMBus v4.0, so we can easily know
we can and should load iff vmbus_proto_version >= VERSION_WIN10.
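
A rough sketch of that check, for illustration only. The major/minor encoding
and the numeric values below are assumptions (please check
include/linux/hyperv.h), and VMBUS_MAKE_VERSION() and hv_sock_supported() are
hypothetical names:

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed encoding: major version in the high 16 bits, minor in the low 16. */
#define VMBUS_MAKE_VERSION(major, minor) \
	(((uint32_t)(major) << 16) | (uint32_t)(minor))

/* Assumed values for illustration; check include/linux/hyperv.h. */
#define VERSION_WIN8_1	VMBUS_MAKE_VERSION(3, 0)
#define VERSION_WIN10	VMBUS_MAKE_VERSION(4, 0)

/* Hypothetical helper: true when the negotiated VMBus protocol is new
 * enough for Hyper-V Sockets.
 */
static bool hv_sock_supported(uint32_t vmbus_proto_version)
{
	return vmbus_proto_version >= VERSION_WIN10;
}
```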

> > Things get trickier with nested virtualization because the VM might want
> > to talk to its host but also to its nested VMs.  The simple way of
> > fixing this would be to allow two transports loaded simultaneously and
> > route traffic destined to CID 2 to the host transport and all other
> > traffic to the guest transport.

This sounds a little tricky to me.
CID is not really used by us, because we only support guest<->host
communication, and don't support guest<->guest communication. The Hyper-V
host references every VM by VmID (which is invisible to the VM), and a VM can
only talk to the host via this feature.

> This is close to the routing the VMCI driver does in a nested environment, but
> that is with the assumption that there is only one type of transport. Having
> two different transports would require that we delay resolving the transport
> type until the socket endpoint has been bound to an address. Things get
> trickier if listening sockets use VMADDR_CID_ANY - if only one transport is
> present, this would allow the socket to accept connections from both guests
> and outer host, but with multiple transports that won’t work, since we can’t
> associate a socket with a transport until the socket is bound.
> 
> >
> > Perhaps we should discuss these cases a bit more to figure out how to
> > avoid conflicts over MODULE_ALIAS_NETPROTO(PF_VSOCK).
> 
> Agreed.

Can we use the 'protocol' parameter of the socket() function:
int socket(int domain, int type, int protocol)

IMO the 'protocol' parameter is currently not really used.
I think we can modify __vsock_core_init() to allow multiple transport layers to
be registered, define different 'protocol' numbers for VMware/KVM/Hyper-V, and
ask the application to explicitly specify which one should be used. For
compatibility, we can pick the default transport in a given VM depending on the
underlying hypervisor.
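
To make the idea a bit more concrete, here is a small user-space sketch of a
registry keyed by 'protocol'. It is only an illustration of the proposal: the
VSOCK_PROTO_* numbers, 'struct transport' and the transport_*() helpers are
all hypothetical and do not exist in the vsock code:

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical protocol numbers; 0 means "use the default transport". */
enum {
	VSOCK_PROTO_DEFAULT = 0,
	VSOCK_PROTO_VMCI,
	VSOCK_PROTO_VIRTIO,
	VSOCK_PROTO_HYPERV,
	VSOCK_PROTO_MAX
};

struct transport {
	const char *name;
};

static const struct transport *transports[VSOCK_PROTO_MAX];
static int default_proto = VSOCK_PROTO_DEFAULT;

/* Register one transport per protocol number. A second registration for
 * the same number fails, but different transports can coexist.
 */
static int transport_register(int proto, const struct transport *t)
{
	if (proto <= VSOCK_PROTO_DEFAULT || proto >= VSOCK_PROTO_MAX)
		return -EINVAL;
	if (transports[proto])
		return -EBUSY;
	transports[proto] = t;
	return 0;
}

/* Chosen once per VM, e.g. from the detected hypervisor. */
static int transport_set_default(int proto)
{
	if (proto <= VSOCK_PROTO_DEFAULT || proto >= VSOCK_PROTO_MAX ||
	    !transports[proto])
		return -EINVAL;
	default_proto = proto;
	return 0;
}

/* Resolve the 'protocol' argument of socket(); NULL if nothing matches. */
static const struct transport *transport_lookup(int proto)
{
	if (proto == VSOCK_PROTO_DEFAULT)
		proto = default_proto;
	if (proto <= VSOCK_PROTO_DEFAULT || proto >= VSOCK_PROTO_MAX)
		return NULL;
	return transports[proto];
}
```

With this shape, an existing application that passes protocol 0 keeps working
and gets the per-VM default, while a new application can name a transport
explicitly.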

-- Dexuan


RE: [PATCH net-next 2/3] vsock: fix vsock_dequeue/enqueue_accept race

2017-08-17 Thread Dexuan Cui
> > On Aug 16, 2017, at 12:15 AM, Dexuan Cui <de...@microsoft.com> wrote:
> > With the current code, when vsock_dequeue_accept() is removing a sock
> > from the list, nothing prevents vsock_enqueue_accept() from adding a new
> > sock into the list concurrently. We should add a lock to protect the list.
> 
> For the VMCI socket transport, we always lock the sockets before calling into
> vsock_enqueue_accept and af_vsock.c locks the socket before calling
> vsock_dequeue_accept, so from our point of view these operations are already
> protected, but with finer granularity than a single global lock. As far as I
> can see, the virtio transport also locks the socket before calling
> vsock_enqueue_accept, so they should be fine with the current version as well,
> but Stefan can comment on that.
> 
> Jorgen

Hi Jorgen,
Thanks, you're correct.
Please ignore this patch. I'll update the hv_sock driver to add proper
lock_sock()/release_sock() calls.

Thanks,
-- Dexuan


RE: [PATCH] vsock: only load vmci transport on VMware hypervisor by default

2017-08-17 Thread Dexuan Cui
> From: David Miller [mailto:da...@davemloft.net]
> Sent: Thursday, August 17, 2017 10:04
> I would avoid module parameters at all costs.
> 
> It is the worst possible interface for users of your software.
> 
> You really need to fundamentally solve the problems related to making
> sure the proper modules for the VM actually present on the system get
> loaded when necessary rather than adding hacks like this.
> 
> Unlike a proper solution, these hacks are ugly but have to stay around
> forever once you put them in place.

Sorry, and thanks for reminding me again, David! :-)

I'll try to figure out the correct solution.

Thanks,
-- Dexuan


RE: [PATCH net-next 1/3] VMCI: only load on VMware hypervisor

2017-08-17 Thread Dexuan Cui
> From: Dexuan Cui
> Sent: Wednesday, August 16, 2017 15:34
> > From: Jorgen S. Hansen [mailto:jhan...@vmware.com]
> > > Without the patch, vmw_vsock_vmci_transport.ko and vmw_vmci.ko can
> > > automatically load when an application creates an AF_VSOCK socket.
> > >
> > > This is the expected good behavior on VMware hypervisor, but as we
> > > are going to add hv_sock.ko (i.e. Hyper-V transport for AF_VSOCK), we
> > > should make sure vmw_vsock_vmci_transport.ko doesn't load on Hyper-V,
> > > otherwise there is a -EBUSY conflict when both vmw_vsock_vmci_transport.ko
> > > and hv_sock.ko try to call vsock_core_init() on Hyper-V.
> >
> > The VMCI driver (vmw_vmci.ko) is used both by the VMware guest support
> > (VMware Tools primarily) and by our Workstation product. Always disabling the
> > VMCI driver on Hyper-V means that user won’t be able to run Workstation
> > nested in Linux VMs on Hyper-V. Since the VMCI driver itself isn’t the problem
> > here, maybe we could move the check to vmw_vsock_vmci_transport.ko?
> > Ideally, there should be some way for a user to have access to both protocols,
> > but for now disabling the VMCI socket transport for Hyper-V (possibly with a
> > module option to skip that check and always load it) but leaving the VMCI
> > driver functional would be better,
> >
> > Jorgen
> 
> Thank you for explaining the background!
> Then I'll make a new patch, following your suggestion.
> 
> -- Dexuan

Hi Jorgen, David,

Just now I posted a new patch
 "[PATCH] vsock: only load vmci transport on VMware hypervisor by default"
to replace this patch.

@Jorgen: 
FWIW, with the new patch, when I create an AF_VSOCK socket on Hyper-V,
vmw_vmci.ko is also automatically loaded and 3 lines of kernel messages are
printed, but I think I'm OK with this, since it's harmless.

-- Dexuan


[PATCH] vsock: only load vmci transport on VMware hypervisor by default

2017-08-17 Thread Dexuan Cui

Without the patch, vmw_vsock_vmci_transport.ko can automatically load
when an application creates an AF_VSOCK socket.

This is the expected good behavior on VMware hypervisor, but as we
are going to add hv_sock.ko (i.e. Hyper-V transport for AF_VSOCK), we
should make sure vmw_vsock_vmci_transport.ko can't load on Hyper-V,
otherwise there is a -EBUSY conflict when both vmw_vsock_vmci_transport.ko
and hv_sock.ko try to call vsock_core_init() on Hyper-V.
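
For reference, the conflict can be modelled with a tiny user-space sketch.
This is a simplified model of a single-transport registration, not the actual
vsock_core_init() code; 'registered', core_init() and core_exit() are
hypothetical names:

```c
#include <errno.h>
#include <stddef.h>

struct transport {
	const char *name;
};

/* One global slot, as in the current single-transport design. */
static const struct transport *registered;

/* Simplified model of transport registration: the second caller
 * gets -EBUSY because the slot is already taken.
 */
static int core_init(const struct transport *t)
{
	if (registered)
		return -EBUSY;
	registered = t;
	return 0;
}

static void core_exit(void)
{
	registered = NULL;
}
```

Whichever module loads first wins the slot, so the result depends on load
order unless the unwanted transport refuses to load at all.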

On the other hand, hv_sock.ko can only load on Hyper-V, because it
depends on hv_vmbus.ko, which detects Hyper-V in hv_acpi_init().

KVM's vsock_virtio_transport doesn't have the issue because it doesn't
define MODULE_ALIAS_NETPROTO(PF_VSOCK).

The patch also adds a module parameter "skip_hypervisor_check" for
vmw_vsock_vmci_transport.ko.

Signed-off-by: Dexuan Cui <de...@microsoft.com>
Cc: Alok Kataria <akata...@vmware.com>
Cc: Andy King <ack...@vmware.com>
Cc: Adit Ranadive <ad...@vmware.com>
Cc: George Zhang <georgezh...@vmware.com>
Cc: Jorgen Hansen <jhan...@vmware.com>
Cc: K. Y. Srinivasan <k...@microsoft.com>
Cc: Haiyang Zhang <haiya...@microsoft.com>
Cc: Stephen Hemminger <sthem...@microsoft.com>
---
 net/vmw_vsock/Kconfig  |  2 +-
 net/vmw_vsock/vmci_transport.c | 11 +++
 2 files changed, 12 insertions(+), 1 deletion(-)

diff --git a/net/vmw_vsock/Kconfig b/net/vmw_vsock/Kconfig
index a24369d..3f52929 100644
--- a/net/vmw_vsock/Kconfig
+++ b/net/vmw_vsock/Kconfig
@@ -17,7 +17,7 @@ config VSOCKETS
 
 config VMWARE_VMCI_VSOCKETS
tristate "VMware VMCI transport for Virtual Sockets"
-   depends on VSOCKETS && VMWARE_VMCI
+   depends on VSOCKETS && VMWARE_VMCI && HYPERVISOR_GUEST
help
  This module implements a VMCI transport for Virtual Sockets.
 
diff --git a/net/vmw_vsock/vmci_transport.c b/net/vmw_vsock/vmci_transport.c
index 10ae782..c068873 100644
--- a/net/vmw_vsock/vmci_transport.c
+++ b/net/vmw_vsock/vmci_transport.c
@@ -16,6 +16,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -73,6 +74,10 @@ struct vmci_transport_recv_pkt_info {
struct vmci_transport_packet pkt;
 };
 
+static bool skip_hypervisor_check;
+module_param(skip_hypervisor_check, bool, 0444);
+MODULE_PARM_DESC(skip_hypervisor_check, "If set, attempt to load on non-VMware platforms");
+
 static LIST_HEAD(vmci_transport_cleanup_list);
 static DEFINE_SPINLOCK(vmci_transport_cleanup_lock);
 static DECLARE_WORK(vmci_transport_cleanup_work, vmci_transport_cleanup);
@@ -2085,6 +2090,12 @@ static int __init vmci_transport_init(void)
 {
int err;
 
+   /* Check if we are running on VMware's hypervisor and bail out
+* if we are not.
+*/
+   if (!skip_hypervisor_check && x86_hyper != &x86_hyper_vmware)
+   return -ENODEV;
+
/* Create the datagram handle that we will use to send and receive all
 * VSocket control messages for this context.
 */
-- 
2.7.4



RE: [PATCH net-next 1/3] VMCI: only load on VMware hypervisor

2017-08-16 Thread Dexuan Cui
> From: Jorgen S. Hansen [mailto:jhan...@vmware.com]
> > Without the patch, vmw_vsock_vmci_transport.ko and vmw_vmci.ko can
> > automatically load when an application creates an AF_VSOCK socket.
> >
> > This is the expected good behavior on VMware hypervisor, but as we
> > are going to add hv_sock.ko (i.e. Hyper-V transport for AF_VSOCK), we
> > should make sure vmw_vsock_vmci_transport.ko doesn't load on Hyper-V,
> > otherwise there is a -EBUSY conflict when both vmw_vsock_vmci_transport.ko
> > and hv_sock.ko try to call vsock_core_init() on Hyper-V.
> 
> The VMCI driver (vmw_vmci.ko) is used both by the VMware guest support
> (VMware Tools primarily) and by our Workstation product. Always disabling the
> VMCI driver on Hyper-V means that user won’t be able to run Workstation
> nested in Linux VMs on Hyper-V. Since the VMCI driver itself isn’t the problem
> here, maybe we could move the check to vmw_vsock_vmci_transport.ko?
> Ideally, there should be some way for a user to have access to both protocols,
> but for now disabling the VMCI socket transport for Hyper-V (possibly with a
> module option to skip that check and always load it) but leaving the VMCI
> driver functional would be better,
> 
> Jorgen

Thank you for explaining the background!
Then I'll make a new patch, following your suggestion.

-- Dexuan


RE: [PATCH net-next 1/3] VMCI: only load on VMware hypervisor

2017-08-16 Thread Dexuan Cui
> From: David Miller [mailto:da...@davemloft.net]
> Sent: Wednesday, August 16, 2017 11:07
> 
> From: Dexuan Cui <de...@microsoft.com>
> Date: Tue, 15 Aug 2017 22:13:29 +
> 
> > +   /*
> > +* Check if we are running on VMware's hypervisor and bail out
> > +* if we are not.
> > +*/
> > +   if (x86_hyper != &x86_hyper_vmware)
> > +   return -ENODEV;
> 
> This symbol is only available when CONFIG_HYPERVISOR_GUEST is defined.
> But this driver does not have a Kconfig dependency on that symbol so
> the build can fail in some configurations.

Hi David,
It looks like modern Linux distros typically have CONFIG_HYPERVISOR_GUEST=y
by default, but I agree we should make the dependency explicit here:

--- a/drivers/misc/vmw_vmci/Kconfig
+++ b/drivers/misc/vmw_vmci/Kconfig
@@ -4,7 +4,7 @@

 config VMWARE_VMCI
tristate "VMware VMCI Driver"
-   depends on X86 && PCI
+   depends on X86 && PCI && HYPERVISOR_GUEST
help
  This is VMware's Virtual Machine Communication Interface.  It enables
  high-speed communication between host and guest in a virtual

And it looks like adding the dependency is not a bad thing, because some
existing VMware drivers already depend on CONFIG_HYPERVISOR_GUEST:
drivers/input/mouse/vmmouse.c (MOUSE_PS2_VMMOUSE)
drivers/misc/vmw_balloon.c (VMWARE_BALLOON)

Do you want me to submit a v2 for this patch with the Kconfig change?

-- Dexuan


[PATCH net-next 3/3] hv_sock: implements Hyper-V transport for Virtual Sockets (AF_VSOCK)

2017-08-15 Thread Dexuan Cui

Hyper-V Sockets (hv_sock) supplies a byte-stream based communication
mechanism between the host and the guest. It uses VMBus ringbuffer as the
transportation layer.

With hv_sock, applications between the host (Windows 10, Windows Server
2016 or newer) and the guest can talk with each other using the traditional
socket APIs.

More info about Hyper-V Sockets is available here:

"Make your own integration services":
https://docs.microsoft.com/en-us/virtualization/hyper-v-on-windows/user-guide/make-integration-service

The patch implements the necessary support in Linux guest by introducing a new
vsock transport for AF_VSOCK.

Signed-off-by: Dexuan Cui <de...@microsoft.com>
Cc: K. Y. Srinivasan <k...@microsoft.com>
Cc: Haiyang Zhang <haiya...@microsoft.com>
Cc: Stephen Hemminger <sthem...@microsoft.com>
Cc: Andy King <ack...@vmware.com>
Cc: Dmitry Torokhov <d...@vmware.com>
Cc: George Zhang <georgezh...@vmware.com>
Cc: Jorgen Hansen <jhan...@vmware.com>
Cc: Reilly Grant <gra...@vmware.com>
Cc: Asias He <as...@redhat.com>
Cc: Stefan Hajnoczi <stefa...@redhat.com>
Cc: Vitaly Kuznetsov <vkuzn...@redhat.com>
Cc: Cathy Avery <cav...@redhat.com>
Cc: Rolf Neugebauer <rolf.neugeba...@docker.com>
Cc: Marcelo Cerri <marcelo.ce...@canonical.com>
---
 MAINTAINERS  |   1 +
 net/vmw_vsock/Kconfig|  12 +
 net/vmw_vsock/Makefile   |   3 +
 net/vmw_vsock/hyperv_transport.c | 890 +++
 4 files changed, 906 insertions(+)
 create mode 100644 net/vmw_vsock/hyperv_transport.c

diff --git a/MAINTAINERS b/MAINTAINERS
index 2db0f8c..dae0573 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -6279,6 +6279,7 @@ F:drivers/net/hyperv/
 F: drivers/scsi/storvsc_drv.c
 F: drivers/uio/uio_hv_generic.c
 F: drivers/video/fbdev/hyperv_fb.c
+F: net/vmw_vsock/hyperv_transport.c
 F: include/linux/hyperv.h
 F: tools/hv/
 F: Documentation/ABI/stable/sysfs-bus-vmbus
diff --git a/net/vmw_vsock/Kconfig b/net/vmw_vsock/Kconfig
index 8831e7c..a24369d 100644
--- a/net/vmw_vsock/Kconfig
+++ b/net/vmw_vsock/Kconfig
@@ -46,3 +46,15 @@ config VIRTIO_VSOCKETS_COMMON
  This option is selected by any driver which needs to access
  the virtio_vsock.  The module will be called
  vmw_vsock_virtio_transport_common.
+
+config HYPERV_VSOCKETS
+   tristate "Hyper-V transport for Virtual Sockets"
+   depends on VSOCKETS && HYPERV
+   help
+ This module implements a Hyper-V transport for Virtual Sockets.
+
+ Enable this transport if your Virtual Machine host supports Virtual
+ Sockets over Hyper-V VMBus.
+
+ To compile this driver as a module, choose M here: the module will be
+ called hv_sock. If unsure, say N.
diff --git a/net/vmw_vsock/Makefile b/net/vmw_vsock/Makefile
index 09fc2eb..e63d574 100644
--- a/net/vmw_vsock/Makefile
+++ b/net/vmw_vsock/Makefile
@@ -2,6 +2,7 @@ obj-$(CONFIG_VSOCKETS) += vsock.o
 obj-$(CONFIG_VMWARE_VMCI_VSOCKETS) += vmw_vsock_vmci_transport.o
 obj-$(CONFIG_VIRTIO_VSOCKETS) += vmw_vsock_virtio_transport.o
 obj-$(CONFIG_VIRTIO_VSOCKETS_COMMON) += vmw_vsock_virtio_transport_common.o
+obj-$(CONFIG_HYPERV_VSOCKETS) += hv_sock.o
 
 vsock-y += af_vsock.o af_vsock_tap.o vsock_addr.o
 
@@ -11,3 +12,5 @@ vmw_vsock_vmci_transport-y += vmci_transport.o vmci_transport_notify.o \
 vmw_vsock_virtio_transport-y += virtio_transport.o
 
 vmw_vsock_virtio_transport_common-y += virtio_transport_common.o
+
+hv_sock-y += hyperv_transport.o
diff --git a/net/vmw_vsock/hyperv_transport.c b/net/vmw_vsock/hyperv_transport.c
new file mode 100644
index 000..1913b38
--- /dev/null
+++ b/net/vmw_vsock/hyperv_transport.c
@@ -0,0 +1,890 @@
+/*
+ * Hyper-V transport for vsock
+ *
+ * Hyper-V Sockets supplies a byte-stream based communication mechanism
+ * between the host and the VM. This driver implements the necessary
+ * support in the VM by introducing the new vsock transport.
+ *
+ * Copyright (c) 2017, Microsoft Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ */
+#include 
+#include 
+#include 
+#include 
+#include 
+
+/* The host side's design of the feature requires 6 exact 4KB pages for
+ * recv/send rings respectively -- this is suboptimal considering memory
+ * consumption, however unluckily we have to live with it, before the
+ * host comes up with a better design in the future.
+ */
+#define PAGE_SIZE_4K   4096
+#define RINGBUFFER_

[PATCH net-next 2/3] vsock: fix vsock_dequeue/enqueue_accept race

2017-08-15 Thread Dexuan Cui

With the current code, when vsock_dequeue_accept() is removing a sock
from the list, nothing prevents vsock_enqueue_accept() from adding a new
sock into the list concurrently. We should add a lock to protect the list.

Signed-off-by: Dexuan Cui <de...@microsoft.com>
Cc: Andy King <ack...@vmware.com>
Cc: Dmitry Torokhov <d...@vmware.com>
Cc: George Zhang <georgezh...@vmware.com>
Cc: Jorgen Hansen <jhan...@vmware.com>
Cc: Reilly Grant <gra...@vmware.com>
Cc: Asias He <as...@redhat.com>
Cc: Stefan Hajnoczi <stefa...@redhat.com>
Cc: Vitaly Kuznetsov <vkuzn...@redhat.com>
Cc: Cathy Avery <cav...@redhat.com>
Cc: K. Y. Srinivasan <k...@microsoft.com>
Cc: Haiyang Zhang <haiya...@microsoft.com>
Cc: Stephen Hemminger <sthem...@microsoft.com>
---
 net/vmw_vsock/af_vsock.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/net/vmw_vsock/af_vsock.c b/net/vmw_vsock/af_vsock.c
index dfc8c51e..b7b2c66 100644
--- a/net/vmw_vsock/af_vsock.c
+++ b/net/vmw_vsock/af_vsock.c
@@ -126,6 +126,7 @@ static struct proto vsock_proto = {
 
 static const struct vsock_transport *transport;
 static DEFINE_MUTEX(vsock_register_mutex);
+static DEFINE_SPINLOCK(vsock_accept_queue_lock);
 
 /**** EXPORTS ****/
 
@@ -406,7 +407,10 @@ void vsock_enqueue_accept(struct sock *listener, struct sock *connected)
 
sock_hold(connected);
sock_hold(listener);
+
+   spin_lock(&vsock_accept_queue_lock);
list_add_tail(&vconnected->accept_queue, &vlistener->accept_queue);
+   spin_unlock(&vsock_accept_queue_lock);
 }
 EXPORT_SYMBOL_GPL(vsock_enqueue_accept);
 
@@ -423,7 +427,10 @@ static struct sock *vsock_dequeue_accept(struct sock *listener)
vconnected = list_entry(vlistener->accept_queue.next,
struct vsock_sock, accept_queue);
 
+   spin_lock(&vsock_accept_queue_lock);
list_del_init(&vconnected->accept_queue);
+   spin_unlock(&vsock_accept_queue_lock);
+
sock_put(listener);
/* The caller will need a reference on the connected socket so we let
 * it call sock_put().
-- 
2.7.4



[PATCH net-next 1/3] VMCI: only load on VMware hypervisor

2017-08-15 Thread Dexuan Cui

Without the patch, vmw_vsock_vmci_transport.ko and vmw_vmci.ko can
automatically load when an application creates an AF_VSOCK socket.

This is the expected good behavior on VMware hypervisor, but as we
are going to add hv_sock.ko (i.e. Hyper-V transport for AF_VSOCK), we
should make sure vmw_vsock_vmci_transport.ko doesn't load on Hyper-V,
otherwise there is a -EBUSY conflict when both vmw_vsock_vmci_transport.ko
and hv_sock.ko try to call vsock_core_init() on Hyper-V.

On the other hand, hv_sock.ko can only load on Hyper-V, because it
depends on hv_vmbus.ko, which detects Hyper-V in hv_acpi_init().

KVM's vsock_virtio_transport doesn't have the issue because it doesn't
define MODULE_ALIAS_NETPROTO(PF_VSOCK).

Signed-off-by: Dexuan Cui <de...@microsoft.com>
Cc: Alok Kataria <akata...@vmware.com>
Cc: Andy King <ack...@vmware.com>
Cc: Adit Ranadive <ad...@vmware.com>
Cc: George Zhang <georgezh...@vmware.com>
Cc: Jorgen Hansen <jhan...@vmware.com>
Cc: K. Y. Srinivasan <k...@microsoft.com>
Cc: Haiyang Zhang <haiya...@microsoft.com>
Cc: Stephen Hemminger <sthem...@microsoft.com>
---
 drivers/misc/vmw_vmci/vmci_driver.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/drivers/misc/vmw_vmci/vmci_driver.c b/drivers/misc/vmw_vmci/vmci_driver.c
index d7eaf1e..1789ea7 100644
--- a/drivers/misc/vmw_vmci/vmci_driver.c
+++ b/drivers/misc/vmw_vmci/vmci_driver.c
@@ -19,6 +19,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "vmci_driver.h"
 #include "vmci_event.h"
@@ -58,6 +59,13 @@ static int __init vmci_drv_init(void)
int vmci_err;
int error;
 
+   /*
+* Check if we are running on VMware's hypervisor and bail out
+* if we are not.
+*/
+   if (x86_hyper != &x86_hyper_vmware)
+   return -ENODEV;
+
vmci_err = vmci_event_init();
if (vmci_err < VMCI_SUCCESS) {
pr_err("Failed to initialize VMCIEvent (result=%d)\n",
-- 
2.7.4



[PATCH net-next 0/3] add Hyper-V transport for Virtual Sockets

2017-08-15 Thread Dexuan Cui

Hyper-V Sockets (hv_sock) supplies a byte-stream based communication
mechanism between the host and the guest. It uses VMBus ringbuffer as the
transportation layer.

PATCH 01 and 02 are for VMCI and the common infrastructure vsock.

PATCH 03 implements the necessary support in Linux guest by introducing a
new vsock transport for AF_VSOCK.

Please review them.

Note: there are some other supporting fixes in the VMBus driver. I'll
post them separately for the char-misc tree.


PS, there was an old implementation of Hyper-V Sockets posted last year:
https://patchwork.kernel.org/patch/9244467/, which was not accepted. The
biggest challenge was explaining why Hyper-V Sockets required a new address
family; I explained that it was because of its different end point format.

Compared to the old implementation, this new implementation maps Hyper-V
Sockets end point format  to vsock's
, and hence it manages to share the common vsock
infrastructure to greatly reduce duplicate code, and avoid adding a new
address family. The details are documented in PATCH 03.

Dexuan Cui (3):
  VMCI: only load on VMware hypervisor
  vsock: fix vsock_dequeue/enqueue_accept race
  hv_sock: implements Hyper-V transport for Virtual Sockets (AF_VSOCK)

 MAINTAINERS |   1 +
 drivers/misc/vmw_vmci/vmci_driver.c |   8 +
 net/vmw_vsock/Kconfig   |  12 +
 net/vmw_vsock/Makefile  |   3 +
 net/vmw_vsock/af_vsock.c|   7 +
 net/vmw_vsock/hyperv_transport.c| 890 
 6 files changed, 921 insertions(+)
 create mode 100644 net/vmw_vsock/hyperv_transport.c

-- 
2.7.4



[PATCH] netvsc: fix use-after-free in netvsc_change_mtu()

2017-03-02 Thread Dexuan Cui
'nvdev' is freed in rndis_filter_device_remove -> netvsc_device_remove ->
free_netvsc_device, so we mustn't access it, before it's re-created in
rndis_filter_device_add -> netvsc_device_add.
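
The ordering rule can be illustrated with a small user-space sketch: copy
everything the re-create step needs out of the old object before it is freed.
The struct layouts and the nvdev_*() helpers below are simplified, hypothetical
stand-ins and not the driver's real definitions:

```c
#include <stdlib.h>

/* Simplified stand-ins for the driver structures. */
struct nvdev {
	unsigned int num_chn;
};

struct device_info {
	unsigned int num_chn;
	unsigned int max_num_vrss_chns;
};

static struct nvdev *nvdev_create(unsigned int num_chn)
{
	struct nvdev *d = malloc(sizeof(*d));

	if (d)
		d->num_chn = num_chn;
	return d;
}

static void nvdev_remove(struct nvdev *d)
{
	free(d);
}

/* Copy what the re-create step needs before the old object is freed. */
static struct nvdev *nvdev_recreate(struct nvdev *old)
{
	struct device_info info;

	info.num_chn = old->num_chn;		/* read while 'old' is valid */
	info.max_num_vrss_chns = old->num_chn;

	nvdev_remove(old);			/* 'old' must not be touched below */

	return nvdev_create(info.num_chn);
}
```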

Signed-off-by: Dexuan Cui <de...@microsoft.com>
Cc: "K. Y. Srinivasan" <k...@microsoft.com>
Cc: Haiyang Zhang <haiya...@microsoft.com>
Cc: Stephen Hemminger <sthem...@microsoft.com>
---
 drivers/net/hyperv/netvsc_drv.c | 15 +++
 1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/drivers/net/hyperv/netvsc_drv.c b/drivers/net/hyperv/netvsc_drv.c
index 2d3cdb0..bc05c89 100644
--- a/drivers/net/hyperv/netvsc_drv.c
+++ b/drivers/net/hyperv/netvsc_drv.c
@@ -859,15 +859,22 @@ static int netvsc_change_mtu(struct net_device *ndev, int mtu)
if (ret)
goto out;
 
+   memset(&device_info, 0, sizeof(device_info));
+   device_info.ring_size = ring_size;
+   device_info.num_chn = nvdev->num_chn;
+   device_info.max_num_vrss_chns = nvdev->num_chn;
+
ndevctx->start_remove = true;
rndis_filter_device_remove(hdev, nvdev);
 
+   /* 'nvdev' has been freed in rndis_filter_device_remove() ->
+* netvsc_device_remove () -> free_netvsc_device().
+* We mustn't access it before it's re-created in
+* rndis_filter_device_add() -> netvsc_device_add().
+*/
+
ndev->mtu = mtu;
 
-   memset(&device_info, 0, sizeof(device_info));
-   device_info.ring_size = ring_size;
-   device_info.num_chn = nvdev->num_chn;
-   device_info.max_num_vrss_chns = nvdev->num_chn;
rndis_filter_device_add(hdev, &device_info);
 
 out:
-- 
2.7.4



Mellanox ConnectX-3 VF driver can't work with 16 CPUs?

2017-02-09 Thread Dexuan Cui
Hi, 
While trying SR-IOV with a Linux guest running on Hyper-V, I found this issue:
the VF driver can't work if the guest has 16 virtual CPUs (fewer vCPUs, e.g. 8,
work fine):

[9.927820] mlx4_core: Mellanox ConnectX core driver v2.2-1 (Feb, 2014)
[9.927882] mlx4_core: Initializing b961:00:02.0
[9.970994] mlx4_core b961:00:02.0: Detected virtual function - running in slave mode
[9.976783] mlx4_core b961:00:02.0: Sending reset
[9.985858] mlx4_core b961:00:02.0: Sending vhcr0
[   10.004855] mlx4_core b961:00:02.0: HCA minimum page size:512
[   10.010465] mlx4_core b961:00:02.0: Timestamping is not supported in slave mode
[   10.203065] mlx4_core b961:00:02.0: Failed to initialize event queue table, aborting
[   10.226728] mlx4_core: probe of b961:00:02.0 failed with error -12

I'm using the mainline kernel (4.10.0-rc4).

Any idea?

Thanks,
-- Dexuan



RE: [PATCH v18 net-next 1/1] hv_sock: introduce Hyper-V Sockets

2016-07-27 Thread Dexuan Cui
> From: David Miller [mailto:da...@davemloft.net]
> Sent: Wednesday, July 27, 2016 1:45
> To: Dexuan Cui <de...@microsoft.com>
> 
> From: Dexuan Cui <de...@microsoft.com>
> Date: Tue, 26 Jul 2016 07:09:41 +
> 
> > I googled "S390 hypervisor socket" but didn't find anything related (I
> > think).
> 
> That would be net/iucv/
Thanks for the info! I'll look into this.
 
> There's also VMWare's stuff under net/vmw_vsock
> 
> It's just absolutely ridiculous to make a new hypervisor socket
> interface over and over again, so much code duplication and
> replication.
I agree with the principle of avoiding duplication.
However, my feeling is that the different hypervisor sockets were developed
independently without coordination, and the implementation details could be
so different that a sufficiently generic framework/infrastructure is difficult,
e.g., at first glance, it looks like AF_IUCV is quite different from AF_VSOCK,
and this might explain why AF_VSOCK wasn't built on AF_IUCV(?).

I'll dig more into AF_IUCV, AF_VSOCK and AF_HYPERV and figure out what
is the best direction I should go.

Thanks,
-- Dexuan


RE: [PATCH v18 net-next 1/1] hv_sock: introduce Hyper-V Sockets

2016-07-27 Thread Dexuan Cui
> From: netdev-ow...@vger.kernel.org [mailto:netdev-ow...@vger.kernel.org]
> On Behalf Of Dexuan Cui
> Sent: Tuesday, July 26, 2016 21:22
> ...
> This is because, the design of AF_HYPERV in the Hyper-V host side is
> suboptimal IMHO (the current host side design requires the least
> change in the host side, but it makes my life difficult. :-(  It may
> change in the future, but luckily we have to live with it at present):
BTW, sorry for my typo: "luckily" should be "unluckily".

Thanks,
-- Dexuan


RE: [PATCH v18 net-next 1/1] hv_sock: introduce Hyper-V Sockets

2016-07-26 Thread Dexuan Cui
> From: Michal Kubecek [mailto:mkube...@suse.cz]
> Sent: Tuesday, July 26, 2016 17:57
>  ...
> On Tue, Jul 26, 2016 at 07:09:41AM +0000, Dexuan Cui wrote:
> > ... I don't think Michal
> > Kubecek was suggesting I build my code using the existing AF_VSOCK
> > code(?)  I think he was only asking me to clarify the way I used to write
> > the text to explain why I can't fit my code into the existing AF_VSOCK
> > code. BTW, AF_VSOCK is not on S390, I think.
> 
> Actually, I believe building on top of existing AF_VSOCK should be the
> first thought and only if this way shows unfeasible, one should consider
> a completely new implementation from scratch. After all, when VMware
> was upstreaming vsock, IIRC they had to work hard on making it
> a generic solution rather than a one purpose tool tailored for their
> specific use case.
> 
> What I wanted to say in that mail was that I didn't find the reasoning
> very convincing. The only point that wasn't like "AF_VSOCK has many
> features we don't need" was the incompatible addressing scheme. The
> cover letter text didn't convince me it was given as much thought as it
> deserved. I felt - and I still feel - that the option of building on
> top of vsock wasn't considered seriously enough.
Hi Michal,
Thank you very much for the detailed explanation!

Just now I read your previous reply again and I think I actually failed to
get your point and my reply was inappropriate. I'm sorry about that.
 
When I first made the patch last July, I did try to build it on AF_VSOCK,
but my feeling was that I had to make big changes to the AF_VSOCK
code and its related transport layer driver's code. My feeling was that
the AF_VSOCK implementation is not so generic that I can fit
mine in (easily).

To make my feeling more concrete so I can answer your question
properly, I'll be figuring out exactly how big the required changes will
be -- I'm afraid this would take non-trivial time, but I'll try to finish the
investigation ASAP.

The biggest challenge is the incompatible addressing scheme.
If you could give some advice, I would be very grateful.

> I must also admit I'm a bit confused by your response to the issue of
> socket lookup performance. I always thought the main reason to use
> special hypervisor sockets instead of TCP/IP over virtual network
> devices was efficiency (to avoid the overhead of network protocol
> processing). 
Yes, I agree with you.

BTW, IMO hypervisor sockets have an advantage of "zero-configuration".
To make TCP/IP work between host/guest, we need to add a NIC to
the guest, configure the NIC properly in the guest and find a way to
let the host/guest know each other's IP address, etc.

With hypervisor sockets, there is almost no such configuration effort.

> The fact that traversing a linear linked list under
> a global mutex for each socket lookup is not an issue as opening
> a connection is going to be slow anyway surprised me therefore. 
This is because, the design of AF_HYPERV in the Hyper-V host side is
suboptimal IMHO (the current host side design requires the least
change in the host side, but it makes my life difficult. :-(  It may
change in the future, but luckily we have to live with it at present):

1) A new connection is treated as a new Hyper-V device, so it has to
go through the slow device_register(). Please see
vmbus_device_register().

2) A connection/device must have its own ringbuffer that is shared
between host/guest. Allocating the ringbuffer memory in the VM 
and sharing the memory with the host by messages are both slow,
though I didn't measure the exact cost. Please see
hvsock_open_connection() -> vmbus_open().

3) The max length of the linear linked list is 2048, and in practice,
typically I guess the length should be small, so my gut feeling is that
the list traversing shouldn't be the bottleneck.
Having said that, I agree it's good to use some mechanism, like 
hash table, to speed up the lookup. I'll add this.
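
As a rough illustration of the hash-table idea (not the driver code), here is
a user-space sketch with chained buckets. It assumes a connection can be keyed
by a 32-bit number; 'struct conn', the bucket count and the conn_*() helpers
are hypothetical:

```c
#include <stddef.h>
#include <stdint.h>

#define CONN_HASH_BITS	8
#define CONN_HASH_SIZE	(1u << CONN_HASH_BITS)

/* Hypothetical connection record keyed by a 32-bit port number. */
struct conn {
	uint32_t port;
	struct conn *next;	/* chain within one bucket */
};

static struct conn *buckets[CONN_HASH_SIZE];

static unsigned int conn_hash(uint32_t port)
{
	/* Multiplicative hash; keep the top CONN_HASH_BITS bits. */
	return (uint32_t)(port * 2654435761u) >> (32 - CONN_HASH_BITS);
}

static void conn_insert(struct conn *c)
{
	unsigned int b = conn_hash(c->port);

	c->next = buckets[b];
	buckets[b] = c;
}

static struct conn *conn_lookup(uint32_t port)
{
	struct conn *c;

	/* Only one bucket's chain is walked, not the whole list. */
	for (c = buckets[conn_hash(port)]; c; c = c->next)
		if (c->port == port)
			return c;
	return NULL;
}
```

With 256 buckets and at most 2048 entries, a well-spread key gives chains of
about 8 entries on average instead of a walk over up to 2048.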

> But
> maybe it's fine as the typical use case is going to be small number of
> long running connections and traffic performance is going to make for
> the connection latency. 
Yeah, IMO it seems traffic performance and zero-configuration came
first when the current host side design was made.

> Or there are other advantages, I don't know.
> But if that is the case, it would IMHO deserve to be explained.
> 
> Michal Kubecek

Thanks,
-- Dexuan


RE: [PATCH v18 net-next 1/1] hv_sock: introduce Hyper-V Sockets

2016-07-26 Thread Dexuan Cui
> From: devel [mailto:driverdev-devel-boun...@linuxdriverproject.org] On
> Behalf Of Dexuan Cui
> ...
> > From: David Miller [mailto:da...@davemloft.net]
> > ...
> > From: Dexuan Cui <de...@microsoft.com>
> > Date: Tue, 26 Jul 2016 03:09:16 +
> >
> > > BTW, during the past month, at least 7 other people also reviewed
> > > the patch and gave me quite a few good comments, which have
> > > been addressed.
> >
> > Correction: Several people gave coding style and simple corrections
> > to your patch.
> >
> > Very few gave any review of the _SUBSTANCE_ of your changes.
> >
> > And the one of the few who did, and suggested you build your
> > facilities using the existing S390 hypervisor socket infrastructure,
> > you brushed off _IMMEDIATELY_.
> >
> > That drives me crazy.  The one person who gave you real feedback
> > you basically didn't consider seriously at all.
>
> Hi David,
> I'm very sorry -- I guess I must have missed something here -- I don't
> remember somebody replied with S390 hypervisor socket
> infrastructure... I'm re-reading all the replies, trying to locate the
> reply and I'll find out why I didn't take it seriously. Sorry in advance.

Hi, David,
I checked all the comments I received and all my replies (at least I really
tried my best to check my Inbox), but couldn't find the "S390 hypervisor
socket infrastructure" mail.

I googled "S390 hypervisor socket" but didn't find anything related (I think).

I'm really sorry -- could you please give a little more info about it?

If you meant https://lkml.org/lkml/2016/7/13/382, I don't think Michal
Kubecek was suggesting that I build my code on the existing AF_VSOCK
code. I think he was only asking me to clarify the text I wrote to
explain why I can't fit my code into the existing AF_VSOCK code.
BTW, AF_VSOCK is not on S390, I think.

If this is the case, I'm sorry I didn't explain the reason more clearly.
My replies last year explained the reason with more info:
https://lkml.org/lkml/2015/7/7/1162
https://lkml.org/lkml/2015/7/17/67
And I thought people agreed that a new address family is justified.

Please let me excerpt the most related snippets in my old replies:

--
The biggest difference is the size of the endpoint (u128 vs. u32):
<u32, u32> in AF_VSOCK
vs.
<u128, u128> in AF_HYPERV.

In the AF_VSOCK code and the related transport layer (the wrapper
ops of VMware's VMCI), the size is widely used in kernel space
(and user space application). If I have to fit my code to AF_VSOCK
code, I would have to mess up the AF_VSOCK code in many places
by adding ugly code like:

IF the endpoint size is <u32, u32> THEN
use the existing logic;
ELSE
use the new logic;

And the user space application has to explicitly handle the
different endpoint sizes too.
--
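
To make the size difference in the excerpt concrete, here is a small sketch
of the two endpoint layouts. The struct and field names are hypothetical and
are not the real uapi definitions; only the widths (two u32 values versus two
128-bit GUIDs) come from the text above.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative only: <u32, u32> endpoint as described for AF_VSOCK. */
struct vsock_endpoint {
	uint32_t cid;
	uint32_t port;
};

/* A 128-bit GUID stored as raw bytes. */
struct guid128 {
	uint8_t b[16];
};

/* Illustrative only: <u128, u128> endpoint as described for AF_HYPERV. */
struct hyperv_endpoint {
	struct guid128 vm_id;
	struct guid128 service_id;
};

/* The two layouts differ by a factor of four in size. */
static_assert(sizeof(struct vsock_endpoint) == 8, "two u32 values");
static_assert(sizeof(struct hyperv_endpoint) == 32, "two 128-bit GUIDs");
```

Any code that stores or compares endpoints with a fixed 8-byte layout would
need a second path for the 32-byte form, which is the kind of branching the
excerpt warns about.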

Looking forward to your reply!

Thanks,
-- Dexuan


RE: [PATCH v18 net-next 1/1] hv_sock: introduce Hyper-V Sockets

2016-07-25 Thread Dexuan Cui
> From: David Miller [mailto:da...@davemloft.net]
> ...
> From: Dexuan Cui <de...@microsoft.com>
> Date: Tue, 26 Jul 2016 03:09:16 +
> 
> > BTW, during the past month, at least 7 other people also reviewed
> > the patch and gave me quite a few good comments, which have
> > been addressed.
> 
> Correction: Several people gave coding style and simple corrections
> to your patch.
> 
> Very few gave any review of the _SUBSTANCE_ of your changes.
> 
> And the one of the few who did, and suggested you build your
> facilities using the existing S390 hypervisor socket infrastructure,
> you brushed off _IMMEDIATELY_.
>
> That drives me crazy.  The one person who gave you real feedback
> you basically didn't consider seriously at all.

Hi David,
I'm very sorry -- I guess I must have missed something here -- I don't
remember somebody replied with S390 hypervisor socket
infrastructure... I'm re-reading all the replies, trying to locate the
reply and I'll find out why I didn't take it seriously. Sorry in advance.

> I know why you don't want to consider alternative implementations,
> and it's because you guys have so much invested in what you've
> implemented already.
This is not true. I'm absolutely open to any possibility to have an
alternative better implementation.
Please allow me to find the "S390 hypervisor socket infrastructure" reply
first and I'll report back ASAP.
 
> But that's tough and not our problem.
> 
> And until this changes, yes, this submission will be stuck in the
> mud and continue slogging on like this.

I definitely agree and understand.

Thanks,
-- Dexuan


RE: [PATCH v18 net-next 1/1] hv_sock: introduce Hyper-V Sockets

2016-07-25 Thread Dexuan Cui
> From: David Miller [mailto:da...@davemloft.net]
> 
> From: Dexuan Cui <de...@microsoft.com>
> Date: Sat, 23 Jul 2016 01:35:51 +
> 
> > +static struct sock *hvsock_create(struct net *net, struct socket *sock,
> > + gfp_t priority, unsigned short type)
> > +{
> > +   struct hvsock_sock *hvsk;
> > +   struct sock *sk;
> > +
> > +   sk = sk_alloc(net, AF_HYPERV, priority, _proto, 0);
> > +   if (!sk)
> > +   return NULL;
>  ...
> > +   /* Looks stream-based socket doesn't need this. */
> > +   sk->sk_backlog_rcv = NULL;
> > +
> > +   sk->sk_state = 0;
> > +   sock_reset_flag(sk, SOCK_DONE);
> 
> All of these are unnecessary initializations, since sk_alloc() zeroes
> out the 'sk' object for you.

Hi David,
Thanks for the comment!  I'll remove the 3 lines.

May I know if you have more comments?

BTW, during the past month, at least 7 other people also reviewed
the patch and gave me quite a few good comments, which have
been addressed. Though only one of them gave the Reviewed-by
line for now, I guess I would get more if I ping them to have a look
at the latest version of the patch, i.e., v19 -- I'm going to post it
with the aforementioned 3 lines removed and if you've more 
comments, I'm ready to address them too. :-)

Thanks,
-- Dexuan


[PATCH v18 net-next 1/1] hv_sock: introduce Hyper-V Sockets

2016-07-22 Thread Dexuan Cui
Hyper-V Sockets (hv_sock) supplies a byte-stream based communication
mechanism between the host and the guest. It's somewhat like TCP over
VMBus, but the transportation layer (VMBus) is much simpler than IP.

With Hyper-V Sockets, applications between the host and the guest can talk
to each other directly by the traditional BSD-style socket APIs.

Hyper-V Sockets is only available on new Windows hosts, like Windows Server
2016. More info is in this article "Make your own integration services":
https://msdn.microsoft.com/en-us/virtualization/hyperv_on_windows/develop/make_mgmt_service

The patch implements the necessary support in the guest side by introducing
a new socket address family AF_HYPERV.

Signed-off-by: Dexuan Cui <de...@microsoft.com>
Cc: "K. Y. Srinivasan" <k...@microsoft.com>
Cc: Haiyang Zhang <haiya...@microsoft.com>
Cc: Vitaly Kuznetsov <vkuzn...@redhat.com>
Cc: Cathy Avery <cav...@redhat.com>
Cc: Olaf Hering <o...@aepfle.de>

---

You can also get the patch by (commit 84146dfb):
https://github.com/dcui/linux/commits/decui/hv_sock/net-next/20160721_v18

For the change log before v12, please see https://lkml.org/lkml/2016/5/15/31

In v12, the changes are mainly the following:

1) remove the module params as David suggested.

2) use 5 exact pages for VMBus send/recv rings, respectively.
The host side's design of the feature requires 5 exact pages for recv/send
rings respectively -- this is suboptimal considering memory consumption,
however unluckily we have to live with it, before the host comes up with
a new design in the future. :-(

3) remove the per-connection static send/recv buffers
Instead, we allocate and free the buffers dynamically only when we recv/send
data. This means: when a connection is idle, no memory is consumed as
recv/send buffers at all.

In v13:
I return ENOMEM on buffer allocation failure

   Actually "man read/write" says "Other errors may occur, depending on the
object connected to fd". "man send/recv" indeed lists ENOMEM.
   Considering AF_HYPERV is a new socket type, ENOMEM seems OK here.
   In the long run, I think we should add a new API in the VMBus driver,
allowing data copy from VMBus ringbuffer into user mode buffer directly.
This way, we can even eliminate this temporary buffer.
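
The v12/v13 notes above (allocate the buffer only while data is moved, and
report ENOMEM if that fails) can be sketched in userspace as follows. This is
an illustration under stated assumptions, not the patch's code:
recv_via_temp_buffer() is a hypothetical name, and the two memcpy() calls
stand in for the ring-buffer read and the copy to user space.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/*
 * Copy up to 'len' bytes from 'ring' into 'dst' through a temporary buffer
 * that exists only for the duration of the call. Returns the number of
 * bytes copied, or -ENOMEM if the temporary buffer cannot be allocated.
 */
static long recv_via_temp_buffer(const char *ring, size_t ring_len,
				 char *dst, size_t len)
{
	size_t n = len < ring_len ? len : ring_len;
	char *tmp;

	assert(ring && dst);
	if (n == 0)
		return 0;

	tmp = malloc(n);
	if (!tmp)
		return -ENOMEM;

	memcpy(tmp, ring, n);	/* stands in for reading the ring buffer */
	memcpy(dst, tmp, n);	/* stands in for copying to user space */
	free(tmp);		/* an idle connection holds no buffer */
	return (long)n;
}
```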

In v14:
fix some coding style issues pointed out by David.

In v15:
Just some stylistic changes addressing comments from Joe Perches and
Olaf Hering -- thank you!
- add a GPL blurb.
- define a new macro PAGE_SIZE_4K and use it to replace PAGE_SIZE
- change sk_to_hvsock/hvsock_to_sk() from macros to inline functions
- remove a not-very-useful pr_err()
- fix some typos in comment and coding style issues.

In v16:
Made stylistic changes addressing comments from Vitaly Kuznetsov.
Thank you very much for the detailed comments, Vitaly!

In v17:
- PAGE_SIZE -> PAGE_SIZE_4K
- allow regular users to use the socket
Thank you Michal Kubecek for the suggestions!

In v18:
Just some tiny updates to address some spurious compiler warnings:
"xxx may be used uninitialized in this function".

Looking forward to your comments!

 MAINTAINERS |2 +
 include/linux/hyperv.h  |   13 +
 include/linux/socket.h  |4 +-
 include/net/af_hvsock.h |   78 +++
 include/uapi/linux/hyperv.h |   23 +
 net/Kconfig |1 +
 net/Makefile|1 +
 net/hv_sock/Kconfig |   10 +
 net/hv_sock/Makefile|3 +
 net/hv_sock/af_hvsock.c | 1507 +++
 10 files changed, 1641 insertions(+), 1 deletion(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 50f69ba..6eaa26f 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -5514,7 +5514,9 @@ F:drivers/pci/host/pci-hyperv.c
 F: drivers/net/hyperv/
 F: drivers/scsi/storvsc_drv.c
 F: drivers/video/fbdev/hyperv_fb.c
+F: net/hv_sock/
 F: include/linux/hyperv.h
+F: include/net/af_hvsock.h
 F: tools/hv/
 F: Documentation/ABI/stable/sysfs-bus-vmbus
 
diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h
index 50f493e..1cda6ea5 100644
--- a/include/linux/hyperv.h
+++ b/include/linux/hyperv.h
@@ -1508,5 +1508,18 @@ static inline void commit_rd_index(struct vmbus_channel *channel)
vmbus_set_event(channel);
 }
 
+struct vmpipe_proto_header {
+   u32 pkt_type;
+   u32 data_size;
+};
+
+#define HVSOCK_HEADER_LEN  (sizeof(struct vmpacket_descriptor) + \
+sizeof(struct vmpipe_proto_header))
+
+/* See 'prev_indices' in hv_ringbuffer_read(), hv_ringbuffer_write() */
+#define PREV_INDICES_LEN   (sizeof(u64))
 
+#define HVSOCK_PKT_LEN(payload_len)(HVSOCK_HEADER_LEN + \
+   ALIGN((payload_len), 8) + \
+   PREV_INDICES_LEN)
 #endif /* _HYPERV_H */
diff --git a/include/linux/socket.h b/include/linux/socket.

[PATCH v18 net-next 0/1] introduce Hyper-V VM Sockets(hv_sock)

2016-07-22 Thread Dexuan Cui
_sock sockets.
By default, 1024 sockets take about 1024 * (3+2+1+1) * 4KB = 28M bytes.
(Here 1+1 means 1 page for send/recv buffers per connection, respectively.)

3) implement the TODO in hvsock_shutdown().

4) fix a bug in hvsock_close_connection():
   I remove "sk->sk_socket->state = SS_UNCONNECTED;" -- actually this line
is not really useful. For a connection triggered by a host app's connect(),
sk->sk_socket remains NULL before the connection is accepted by the server
app (in Linux VM): see hvsock_accept() -> hvsock_accept_wait() ->
sock_graft(connected, newsock). If the host app exits before the server
app's accept() returns, the host can send a rescind-message to close the
connection and later in the Linux VM's message handler
(i.e. vmbus_onoffer_rescind()), Linux will hit a NULL-dereference crash.

5) fix a bug in hvsock_open_connection()
  I move the vmbus_set_chn_rescind_callback() to a later place, because
when vmbus_open() fails, hvsock_close_connection() can do nothing and we
count on vmbus_onoffer_rescind() -> vmbus_device_unregister() to clean up
the device.

6) some stylistic modifications.


Changes since v11:
1) remove the module params as David suggested.

2) use 5 exact pages for VMBus send/recv rings, respectively.
The host side's design of the feature requires 5 exact pages for recv/send
rings respectively -- this is suboptimal considering memory consumption,
however unluckily we have to live with it, before the host comes up with
a new design in the future. :-(

3) remove the per-connection static send/recv buffers
Instead, we allocate and free the buffers dynamically only when we recv/send
data. This means: when a connection is idle, no memory is consumed as
recv/send buffers at all.

Changes since v12:
return ENOMEM on buffer allocation failure

   Actually "man read/write" says "Other errors may occur, depending on the
object connected to fd". "man send/recv" indeed lists ENOMEM.
   Considering AF_HYPERV is a new socket type, ENOMEM seems OK here.
   In the long run, I think we should add a new API in the VMBus driver,
allowing data copy from VMBus ringbuffer into user mode buffer directly.
This way, we can even eliminate this temporary buffer.

Changes since v13:
fix some coding style issues pointed out by David.

Changes since v14:
Just some stylistic changes addressing comments from Joe Perches and
Olaf Hering -- thank you!
- add a GPL blurb.
- define a new macro PAGE_SIZE_4K and use it to replace PAGE_SIZE
- change sk_to_hvsock/hvsock_to_sk() from macros to inline functions
- remove a not-very-useful pr_err()
- fix some typos in comment and coding style issues.

Changes since v15:
Made stylistic changes addressing comments from Vitaly Kuznetsov.
Thank you very much for the detailed comments, Vitaly! 

Changes since v16:
- PAGE_SIZE -> PAGE_SIZE_4K
- allow regular users to use the socket
Thank you Michal Kubecek for the suggestions!

Changes since v17:
Just some tiny updates to address some spurious compiler warnings:
"xxx may be used uninitialized in this function".


Dexuan Cui (1):
  hv_sock: introduce Hyper-V Sockets

 MAINTAINERS |2 +
 include/linux/hyperv.h  |   13 +
 include/linux/socket.h  |4 +-
 include/net/af_hvsock.h |   78 +++
 include/uapi/linux/hyperv.h |   23 +
 net/Kconfig |1 +
 net/Makefile|1 +
 net/hv_sock/Kconfig |   10 +
 net/hv_sock/Makefile|3 +
 net/hv_sock/af_hvsock.c | 1507 +++
 10 files changed, 1641 insertions(+), 1 deletion(-)
 create mode 100644 include/net/af_hvsock.h
 create mode 100644 net/hv_sock/Kconfig
 create mode 100644 net/hv_sock/Makefile
 create mode 100644 net/hv_sock/af_hvsock.c

-- 
2.1.0



RE: [PATCH v17 net-next 1/1] hv_sock: introduce Hyper-V Sockets

2016-07-19 Thread Dexuan Cui
> From: David Miller [mailto:da...@davemloft.net]
> >> From: kbuild test robot [mailto:l...@intel.com]
> >> [auto build test WARNING on net-next/master]
> >>
> >> url:https://github.com/0day-ci/linux/commits/Dexuan-Cui/introduce-
> >> Hyper-V-VM-Sockets-hv_sock/20160715-223433
> >> config: x86_64-randconfig-a0-07191719 (attached as .config)
> >> compiler: gcc-4.4 (Debian 4.4.7-8) 4.4.7
> >> reproduce:
> >> # save the attached .config to linux build tree
> >> make ARCH=x86_64
> >>
> >> All warnings (new ones prefixed by >>):
> >>
> >>net/hv_sock/af_hvsock.c: In function 'hvsock_open_connection':
> >>net/hv_sock/af_hvsock.c:693: warning: 'hvsk' may be used uninitialized
> in
> >> this function
> >>net/hv_sock/af_hvsock.c:693: warning: 'new_hvsk' may be used
> >> uninitialized in this function
> >>net/hv_sock/af_hvsock.c:697: warning: 'new_sk' may be used
> uninitialized
> >> in this function
> >>net/hv_sock/af_hvsock.c: In function 'hvsock_sendmsg_wait':
> >>net/hv_sock/af_hvsock.c:1053: warning: 'ret' may be used uninitialized
> in
> >> this function
> >> >> net/hv_sock/af_hvsock.o: warning: objtool:
> hvsock_on_channel_cb()+0x1d:
> >> function has unreachable instruction
> >
> > These warnings are all false alarms.
> 
> But you still have to quiet them.

Sure. Will do.
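
For context, a generic sketch of how a "may be used uninitialized" false
alarm is usually quieted. parse_digit() is a hypothetical example and is not
taken from af_hvsock.c: 'val' is only read when 'ok' is set, but some older
compilers cannot prove that, so an explicit initializer is added.

```c
#include <assert.h>
#include <stddef.h>

/* Store the value of a decimal digit in '*out'; return 1 on success. */
static int parse_digit(char c, int *out)
{
	int val = 0;	/* initializer added only to quiet the warning */
	int ok = 0;

	assert(out != NULL);
	if (c >= '0' && c <= '9') {
		val = c - '0';
		ok = 1;
	}
	if (ok)
		*out = val;	/* 'val' is always set when 'ok' is set */
	return ok;
}
```

The initializer does not change behavior, since every path that reads 'val'
has already assigned it.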

Thanks,
-- Dexuan


RE: [PATCH v17 net-next 1/1] hv_sock: introduce Hyper-V Sockets

2016-07-19 Thread Dexuan Cui
> From: kbuild test robot [mailto:l...@intel.com]
> Sent: Wednesday, July 20, 2016 1:10
> 
> Hi,
> 
> [auto build test WARNING on net-next/master]
> 
> url:https://github.com/0day-ci/linux/commits/Dexuan-Cui/introduce-
> Hyper-V-VM-Sockets-hv_sock/20160715-223433
> config: x86_64-randconfig-a0-07191719 (attached as .config)
> compiler: gcc-4.4 (Debian 4.4.7-8) 4.4.7
> reproduce:
> # save the attached .config to linux build tree
> make ARCH=x86_64
> 
> All warnings (new ones prefixed by >>):
> 
>net/hv_sock/af_hvsock.c: In function 'hvsock_open_connection':
>net/hv_sock/af_hvsock.c:693: warning: 'hvsk' may be used uninitialized in
> this function
>net/hv_sock/af_hvsock.c:693: warning: 'new_hvsk' may be used
> uninitialized in this function
>net/hv_sock/af_hvsock.c:697: warning: 'new_sk' may be used uninitialized
> in this function
>net/hv_sock/af_hvsock.c: In function 'hvsock_sendmsg_wait':
>net/hv_sock/af_hvsock.c:1053: warning: 'ret' may be used uninitialized in
> this function
> >> net/hv_sock/af_hvsock.o: warning: objtool: hvsock_on_channel_cb()+0x1d:
> function has unreachable instruction

These warnings are all false alarms.

Thanks,
-- Dexuan


[PATCH v17 net-next 1/1] hv_sock: introduce Hyper-V Sockets

2016-07-15 Thread Dexuan Cui
Hyper-V Sockets (hv_sock) supplies a byte-stream based communication
mechanism between the host and the guest. It's somewhat like TCP over
VMBus, but the transportation layer (VMBus) is much simpler than IP.

With Hyper-V Sockets, applications between the host and the guest can talk
to each other directly by the traditional BSD-style socket APIs.

Hyper-V Sockets is only available on new Windows hosts, like Windows Server
2016. More info is in this article "Make your own integration services":
https://msdn.microsoft.com/en-us/virtualization/hyperv_on_windows/develop/make_mgmt_service

The patch implements the necessary support in the guest side by introducing
a new socket address family AF_HYPERV.

Signed-off-by: Dexuan Cui <de...@microsoft.com>
Cc: "K. Y. Srinivasan" <k...@microsoft.com>
Cc: Haiyang Zhang <haiya...@microsoft.com>
Cc: Vitaly Kuznetsov <vkuzn...@redhat.com>
Cc: Cathy Avery <cav...@redhat.com>
Cc: Olaf Hering <o...@aepfle.de>

---

You can also get the patch by (commit fcf045af6):
https://github.com/dcui/linux/tree/decui/hv_sock/net-next/20160715_v17

For the change log before v12, please see https://lkml.org/lkml/2016/5/15/31

In v12, the changes are mainly the following:

1) remove the module params as David suggested.

2) use 5 exact pages for VMBus send/recv rings, respectively.
The host side's design of the feature requires 5 exact pages for recv/send
rings respectively -- this is suboptimal considering memory consumption,
however unluckily we have to live with it, before the host comes up with
a new design in the future. :-(

3) remove the per-connection static send/recv buffers
Instead, we allocate and free the buffers dynamically only when we recv/send
data. This means: when a connection is idle, no memory is consumed as
recv/send buffers at all.

In v13:
I return ENOMEM on buffer allocation failure

   Actually "man read/write" says "Other errors may occur, depending on the
object connected to fd". "man send/recv" indeed lists ENOMEM.
   Considering AF_HYPERV is a new socket type, ENOMEM seems OK here.
   In the long run, I think we should add a new API in the VMBus driver,
allowing data copy from VMBus ringbuffer into user mode buffer directly.
This way, we can even eliminate this temporary buffer.

In v14:
fix some coding style issues pointed out by David.

In v15:
Just some stylistic changes addressing comments from Joe Perches and
Olaf Hering -- thank you!
- add a GPL blurb.
- define a new macro PAGE_SIZE_4K and use it to replace PAGE_SIZE
- change sk_to_hvsock/hvsock_to_sk() from macros to inline functions
- remove a not-very-useful pr_err()
- fix some typos in comment and coding style issues.

In v16:
Made stylistic changes addressing comments from Vitaly Kuznetsov.
Thank you very much for the detailed comments, Vitaly!

In v17:
- PAGE_SIZE -> PAGE_SIZE_4K
- allow regular users to use the socket
Thank you Michal Kubecek for the suggestions!

Looking forward to your comments!

 MAINTAINERS |2 +
 include/linux/hyperv.h  |   13 +
 include/linux/socket.h  |4 +-
 include/net/af_hvsock.h |   78 +++
 include/uapi/linux/hyperv.h |   23 +
 net/Kconfig |1 +
 net/Makefile|1 +
 net/hv_sock/Kconfig |   10 +
 net/hv_sock/Makefile|3 +
 net/hv_sock/af_hvsock.c | 1507 +++
 10 files changed, 1641 insertions(+), 1 deletion(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 50f69ba..6eaa26f 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -5514,7 +5514,9 @@ F:drivers/pci/host/pci-hyperv.c
 F: drivers/net/hyperv/
 F: drivers/scsi/storvsc_drv.c
 F: drivers/video/fbdev/hyperv_fb.c
+F: net/hv_sock/
 F: include/linux/hyperv.h
+F: include/net/af_hvsock.h
 F: tools/hv/
 F: Documentation/ABI/stable/sysfs-bus-vmbus
 
diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h
index 50f493e..1cda6ea5 100644
--- a/include/linux/hyperv.h
+++ b/include/linux/hyperv.h
@@ -1508,5 +1508,18 @@ static inline void commit_rd_index(struct vmbus_channel *channel)
vmbus_set_event(channel);
 }
 
+struct vmpipe_proto_header {
+   u32 pkt_type;
+   u32 data_size;
+};
+
+#define HVSOCK_HEADER_LEN  (sizeof(struct vmpacket_descriptor) + \
+sizeof(struct vmpipe_proto_header))
+
+/* See 'prev_indices' in hv_ringbuffer_read(), hv_ringbuffer_write() */
+#define PREV_INDICES_LEN   (sizeof(u64))
 
+#define HVSOCK_PKT_LEN(payload_len)(HVSOCK_HEADER_LEN + \
+   ALIGN((payload_len), 8) + \
+   PREV_INDICES_LEN)
 #endif /* _HYPERV_H */
diff --git a/include/linux/socket.h b/include/linux/socket.h
index b5cc5a6..0b68b58 100644
--- a/include/linux/socket.h
+++ b/include/linux/socket.h
@@ -202,8 +202,9 @@ struct ucred {
 #de

[PATCH v17 net-next 0/1] introduce Hyper-V VM Sockets(hv_sock)

2016-07-15 Thread Dexuan Cui
_sock sockets.
By default, 1024 sockets take about 1024 * (3+2+1+1) * 4KB = 28M bytes.
(Here 1+1 means 1 page for send/recv buffers per connection, respectively.)

3) implement the TODO in hvsock_shutdown().

4) fix a bug in hvsock_close_connection():
   I remove "sk->sk_socket->state = SS_UNCONNECTED;" -- actually this line
is not really useful. For a connection triggered by a host app's connect(),
sk->sk_socket remains NULL before the connection is accepted by the server
app (in Linux VM): see hvsock_accept() -> hvsock_accept_wait() ->
sock_graft(connected, newsock). If the host app exits before the server
app's accept() returns, the host can send a rescind-message to close the
connection and later in the Linux VM's message handler
(i.e. vmbus_onoffer_rescind()), Linux will hit a NULL-dereference crash.

5) fix a bug in hvsock_open_connection()
  I move the vmbus_set_chn_rescind_callback() to a later place, because
when vmbus_open() fails, hvsock_close_connection() can do nothing and we
count on vmbus_onoffer_rescind() -> vmbus_device_unregister() to clean up
the device.

6) some stylistic modifications.


Changes since v11:
1) remove the module params as David suggested.

2) use 5 exact pages for VMBus send/recv rings, respectively.
The host side's design of the feature requires 5 exact pages for recv/send
rings respectively -- this is suboptimal considering memory consumption,
however unluckily we have to live with it, before the host comes up with
a new design in the future. :-(

3) remove the per-connection static send/recv buffers
Instead, we allocate and free the buffers dynamically only when we recv/send
data. This means: when a connection is idle, no memory is consumed as
recv/send buffers at all.

Changes since v12:
return ENOMEM on buffer allocation failure

   Actually "man read/write" says "Other errors may occur, depending on the
object connected to fd". "man send/recv" indeed lists ENOMEM.
   Considering AF_HYPERV is a new socket type, ENOMEM seems OK here.
   In the long run, I think we should add a new API in the VMBus driver,
allowing data copy from VMBus ringbuffer into user mode buffer directly.
This way, we can even eliminate this temporary buffer.

Changes since v13:
fix some coding style issues pointed out by David.

Changes since v14:
Just some stylistic changes addressing comments from Joe Perches and
Olaf Hering -- thank you!
- add a GPL blurb.
- define a new macro PAGE_SIZE_4K and use it to replace PAGE_SIZE
- change sk_to_hvsock/hvsock_to_sk() from macros to inline functions
- remove a not-very-useful pr_err()
- fix some typos in comment and coding style issues.

Changes since v15:
Made stylistic changes addressing comments from Vitaly Kuznetsov.
Thank you very much for the detailed comments, Vitaly! 

Changes since v16:
- PAGE_SIZE -> PAGE_SIZE_4K
- allow regular users to use the socket
Thank you Michal Kubecek for the suggestions!

Dexuan Cui (1):
  hv_sock: introduce Hyper-V Sockets

 MAINTAINERS |2 +
 include/linux/hyperv.h  |   13 +
 include/linux/socket.h  |4 +-
 include/net/af_hvsock.h |   78 +++
 include/uapi/linux/hyperv.h |   23 +
 net/Kconfig |1 +
 net/Makefile|1 +
 net/hv_sock/Kconfig |   10 +
 net/hv_sock/Makefile|3 +
 net/hv_sock/af_hvsock.c | 1507 +++
 10 files changed, 1641 insertions(+), 1 deletion(-)
 create mode 100644 include/net/af_hvsock.h
 create mode 100644 net/hv_sock/Kconfig
 create mode 100644 net/hv_sock/Makefile
 create mode 100644 net/hv_sock/af_hvsock.c

-- 
2.1.0


RE: [PATCH v16 net-next 0/1] introduce Hyper-V VM Sockets(hv_sock)

2016-07-13 Thread Dexuan Cui
> From: Michal Kubecek [mailto:mkube...@suse.cz]
> > ..
> > However, though Hyper-V Sockets may seem conceptually similar to
> > AF_VSOCK, there are differences in the transportation layer, and IMO these
> > make the direct code reusing impractical:
> >
> > 1. In AF_VSOCK, the endpoint type is: <u32, u32>, but in
> > AF_HYPERV, the endpoint type is: <GUID, GUID>. Here GUID
> > is 128-bit.
> 
> OK, this could be a problem.
> 
> > 2. AF_VSOCK supports SOCK_DGRAM, while AF_HYPERV doesn't.
> >
> > 3. AF_VSOCK supports some special sock opts, like
> SO_VM_SOCKETS_BUFFER_SIZE,
> > SO_VM_SOCKETS_BUFFER_MIN/MAX_SIZE and
> SO_VM_SOCKETS_CONNECT_TIMEOUT.
> > These are meaningless to AF_HYPERV.
> >
> > 4. Some of AF_VSOCK's VMCI transportation ops are meaningless to
> AF_HYPERV/VMBus,
> > like .notify_recv_init
> > .notify_recv_pre_block
> > .notify_recv_pre_dequeue
> > .notify_recv_post_dequeue
> > .notify_send_init
> > .notify_send_pre_block
> > .notify_send_pre_enqueue
> > .notify_send_post_enqueue
> > etc.
> >
> > So I think we'd better introduce a new address family: AF_HYPERV.
> 
> I don't quite understand the logic here. All these sound like "AF_VSOCK
> has this feature we don't need so (rather than not using the feature) we
> are not going to use AF_VSOCK". I would understand if you pointed out
> features important for you that are missing in AF_VSOCK but this kind of
> reasoning sounds strange to me.
> 
> Michal Kubecek

Hi Michal,
Sorry, I might not have made myself clear.
I didn't mean "AF_VSOCK has this feature we don't need". I didn't mean
"features important for me that are missing in AF_VSOCK", either.

I just wanted to say that I need a new protocol number and I should
have a separate directory in net/, i.e., net/hv_sock/. 

Because AF_VSOCK and AF_HYPERV are conceptually similar, some
people asked why I didn't fit my code into net/vmw_vsock/ and I wrote the
text to explain why that wasn't a good idea: the implementation details
are different and I can't directly reuse the vsock code.

Thanks,
-- Dexuan


RE: [PATCH v16 net-next 1/1] hv_sock: introduce Hyper-V Sockets

2016-07-13 Thread Dexuan Cui
> From: Michal Kubecek [mailto:mkube...@suse.cz]
> > ..
> > +static struct sock *hvsock_find_connected_socket_by_channel(
> > +   const struct vmbus_channel *channel)
> > +{
> > +   struct hvsock_sock *hvsk;
> > +
> > +   list_for_each_entry(hvsk, _connected_list, connected_list) {
> > +   if (hvsk->channel == channel)
> > +   return hvsock_to_sk(hvsk);
> > +   }
> > +   return NULL;
> > +}
> 
> How does this work from performance point of view if there are many
> connected sockets and/or high frequency of new connections? AFAICS most
> other families use a hash table for socket lookup.

Hi Michal,
Per the current design of the feature in the host, there is an inherent
limit on the number of connections per guest: a guest can't have more than
2048 connections.  This is because each connection takes a VMBus channel ID
and at most 2048 channel IDs per guest are supported.

And I don't think the lookup function is a bottleneck, because creating or
closing a connection involves a lot of other work, including several extra
rounds of interaction between the host and the guest, which take many more
cycles than the lookup here.

> > +static void get_ringbuffer_rw_status(struct vmbus_channel *channel,
> > +bool *can_read, bool *can_write)
> > ..
> > +   if (can_write) {
> > +   hv_get_ringbuffer_availbytes(>outbound,
> > +,
> > +_write_bytes);
> > +
> > +   /* We only write if there is enough space */
> > +   *can_write = avl_write_bytes > HVSOCK_PKT_LEN(PAGE_SIZE);
> 
> I'm not sure where does this come from but is this really supposed to be
> PAGE_SIZE (not the fixed 4KB PAGE_SIZE_4K)?

Thanks for pointing this out!
I'll replace it with PAGE_SIZE_4K.

> > +   /* see get_ringbuffer_rw_status() */
> > +   set_channel_pending_send_size(channel, HVSOCK_PKT_LEN(PAGE_SIZE)
> + 1);
> 
> Same question.
I'll replace it with PAGE_SIZE_4K too.
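
A userspace sketch of the write-space check with the fixed 4 KB size, to
show what the threshold works out to. The HVSOCK_* macros mirror the ones in
the patch; the layout of struct vmpacket_descriptor (16 bytes) is an
assumption made here, ALIGN8() replaces the kernel's ALIGN(), and
ring_can_write() is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE_4K	4096
#define ALIGN8(x)	(((x) + 7u) & ~(size_t)7u)

/* Assumed layout: four u16 fields followed by a u64 transaction ID. */
struct vmpacket_descriptor {
	uint16_t type;
	uint16_t offset8;
	uint16_t len8;
	uint16_t flags;
	uint64_t trans_id;
};

struct vmpipe_proto_header {
	uint32_t pkt_type;
	uint32_t data_size;
};

static_assert(sizeof(struct vmpacket_descriptor) == 16, "assumed layout");

#define HVSOCK_HEADER_LEN	(sizeof(struct vmpacket_descriptor) + \
				 sizeof(struct vmpipe_proto_header))
#define PREV_INDICES_LEN	(sizeof(uint64_t))
#define HVSOCK_PKT_LEN(payload_len)	(HVSOCK_HEADER_LEN + \
					 ALIGN8(payload_len) + \
					 PREV_INDICES_LEN)

/* Writable only if more than one full 4 KB payload packet fits. */
static int ring_can_write(size_t avl_write_bytes)
{
	return avl_write_bytes > HVSOCK_PKT_LEN(PAGE_SIZE_4K);
}
```

Under these assumptions the threshold is 24 + 4096 + 8 = 4128 bytes whatever
the kernel's PAGE_SIZE is. With a larger PAGE_SIZE (64 KB, say) the original
expression would exceed the 5 x 4 KB ring, so the check could never succeed.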

> > +static int hvsock_create_sock(struct net *net, struct socket *sock,
> > + int protocol, int kern)
> > +{
> > +   struct sock *sk;
> > +
> > +   if (!capable(CAP_SYS_ADMIN) && !capable(CAP_NET_ADMIN))
> > +   return -EPERM;
> 
> Looks like any application wanting to use hyper-v sockets will need
> rather high privileges. It would make sense if these sockets were
> reserved for privileged tasks like VM management. But according to the
> commit message, hv_sock is supposed to be used for regular application
> to application communication. Requiring CAP_{SYS,NET}_ADMIN looks like
> an overkill to me.

I agree with you. Let me remove this check.

BTW, the check was supposed to prevent regular apps from using the socket,
because the current design by the host has a drawback: a connection consumes
at least 40KB of unswappable memory as the host<->guest shared ring, and we
don't want malicious regular apps to be able to consume all the memory.

Later I realized the per-guest number of connections can't exceed 2048,
so the host<->guest rings consume at most 2K * 40KB = 80MB of memory, and
this isn't a big concern to me.
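
The arithmetic behind the 40KB and 80MB figures, as a small sketch. The
constants come from this thread (5 pages of 4 KB per ring direction, at most
2048 channel IDs per guest); the macro and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_SIZE_4K		4096u
#define RING_PAGES_PER_DIR	5u	/* 5 pages for send, 5 for recv */
#define MAX_CONNECTIONS		2048u	/* VMBus channel IDs per guest */

/* Cross-check the per-connection figure at compile time. */
static_assert(2u * RING_PAGES_PER_DIR * PAGE_SIZE_4K == 40u * 1024u,
	      "send + recv rings take 40 KB per connection");

/* Ring memory pinned by one connection (send ring + recv ring). */
static size_t ring_bytes_per_connection(void)
{
	return 2u * RING_PAGES_PER_DIR * PAGE_SIZE_4K;
}

/* Upper bound on ring memory for one guest. */
static size_t max_ring_bytes(void)
{
	return (size_t)MAX_CONNECTIONS * ring_bytes_per_connection();
}
```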

Thanks,
-- Dexuan


[PATCH v16 net-next 1/1] hv_sock: introduce Hyper-V Sockets

2016-07-11 Thread Dexuan Cui
Hyper-V Sockets (hv_sock) supplies a byte-stream based communication
mechanism between the host and the guest. It's somewhat like TCP over
VMBus, but the transportation layer (VMBus) is much simpler than IP.

With Hyper-V Sockets, applications between the host and the guest can talk
to each other directly by the traditional BSD-style socket APIs.

Hyper-V Sockets is only available on new Windows hosts, like Windows Server
2016. More info is in this article "Make your own integration services":
https://msdn.microsoft.com/en-us/virtualization/hyperv_on_windows/develop/make_mgmt_service

The patch implements the necessary support in the guest side by introducing
a new socket address family AF_HYPERV.

Signed-off-by: Dexuan Cui <de...@microsoft.com>
Cc: "K. Y. Srinivasan" <k...@microsoft.com>
Cc: Haiyang Zhang <haiya...@microsoft.com>
Cc: Vitaly Kuznetsov <vkuzn...@redhat.com>
Cc: Cathy Avery <cav...@redhat.com>
Cc: Olaf Hering <o...@aepfle.de>
---

You can also get the patch by (commit 5dde7975):
https://github.com/dcui/linux/tree/decui/hv_sock/net-next/20160711_v16


For the change log before v12, please see https://lkml.org/lkml/2016/5/15/31

In v12, the changes are mainly the following:

1) remove the module params as David suggested.

2) use 5 exact pages for VMBus send/recv rings, respectively.
The host side's design of the feature requires 5 exact pages for recv/send
rings respectively -- this is suboptimal considering memory consumption,
however unluckily we have to live with it, before the host comes up with
a new design in the future. :-(

3) remove the per-connection static send/recv buffers
Instead, we allocate and free the buffers dynamically only when we recv/send
data. This means: when a connection is idle, no memory is consumed as
recv/send buffers at all.

In v13:
I return ENOMEM on buffer allocation failure

   Actually "man read/write" says "Other errors may occur, depending on the
object connected to fd". "man send/recv" indeed lists ENOMEM.
   Considering AF_HYPERV is a new socket type, ENOMEM seems OK here.
   In the long run, I think we should add a new API in the VMBus driver,
allowing data copy from VMBus ringbuffer into user mode buffer directly.
This way, we can even eliminate this temporary buffer.

In v14:
fix some coding style issues pointed out by David.

In v15:
Just some stylistic changes addressing comments from Joe Perches and
Olaf Hering -- thank you!
- add a GPL blurb.
- define a new macro PAGE_SIZE_4K and use it to replace PAGE_SIZE
- change sk_to_hvsock/hvsock_to_sk() from macros to inline functions
- remove a not-very-useful pr_err()
- fix some typos in comment and coding style issues.

In v16:
Made stylistic changes addressing comments from Vitaly Kuznetsov.
Thank you very much for the detailed comments, Vitaly!

Looking forward to your comments!

 MAINTAINERS                 |    2 +
 include/linux/hyperv.h      |   13 +
 include/linux/socket.h      |    4 +-
 include/net/af_hvsock.h     |   78 +++
 include/uapi/linux/hyperv.h |   23 +
 net/Kconfig                 |    1 +
 net/Makefile                |    1 +
 net/hv_sock/Kconfig         |   10 +
 net/hv_sock/Makefile        |    3 +
 net/hv_sock/af_hvsock.c     | 1509 +++
 10 files changed, 1643 insertions(+), 1 deletion(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 50f69ba..6eaa26f 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -5514,7 +5514,9 @@ F:drivers/pci/host/pci-hyperv.c
 F: drivers/net/hyperv/
 F: drivers/scsi/storvsc_drv.c
 F: drivers/video/fbdev/hyperv_fb.c
+F: net/hv_sock/
 F: include/linux/hyperv.h
+F: include/net/af_hvsock.h
 F: tools/hv/
 F: Documentation/ABI/stable/sysfs-bus-vmbus
 
diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h
index 50f493e..1cda6ea5 100644
--- a/include/linux/hyperv.h
+++ b/include/linux/hyperv.h
@@ -1508,5 +1508,18 @@ static inline void commit_rd_index(struct vmbus_channel *channel)
vmbus_set_event(channel);
 }
 
+struct vmpipe_proto_header {
+   u32 pkt_type;
+   u32 data_size;
+};
+
+#define HVSOCK_HEADER_LEN  (sizeof(struct vmpacket_descriptor) + \
+sizeof(struct vmpipe_proto_header))
+
+/* See 'prev_indices' in hv_ringbuffer_read(), hv_ringbuffer_write() */
+#define PREV_INDICES_LEN   (sizeof(u64))
 
+#define HVSOCK_PKT_LEN(payload_len)(HVSOCK_HEADER_LEN + \
+   ALIGN((payload_len), 8) + \
+   PREV_INDICES_LEN)
 #endif /* _HYPERV_H */
diff --git a/include/linux/socket.h b/include/linux/socket.h
index b5cc5a6..0b68b58 100644
--- a/include/linux/socket.h
+++ b/include/linux/socket.h
@@ -202,8 +202,9 @@ struct ucred {
 #define AF_VSOCK   40  /* vSockets */
 #define AF_KCM 41  /* Kernel Connection Multiplexor*/
 #defin

[PATCH v16 net-next 0/1] introduce Hyper-V VM Sockets(hv_sock)

2016-07-11 Thread Dexuan Cui
 send_ring_page is 2.

2) add module param max_socket_number (the default is 1024).
A user can enlarge the number to create more than 1024 hv_sock sockets.
By default, 1024 sockets take about 1024 * (3+2+1+1) * 4KB = 28M bytes.
(Here 1+1 means 1 page for send/recv buffers per connection, respectively.)

3) implement the TODO in hvsock_shutdown().

4) fix a bug in hvsock_close_connection():
   I remove "sk->sk_socket->state = SS_UNCONNECTED;" -- actually this line
is not really useful. For a connection triggered by a host app's connect(),
sk->sk_socket remains NULL before the connection is accepted by the server
app (in Linux VM): see hvsock_accept() -> hvsock_accept_wait() ->
sock_graft(connected, newsock). If the host app exits before the server
app's accept() returns, the host can send a rescind-message to close the
connection and later in the Linux VM's message handler
(i.e. vmbus_onoffer_rescind()), Linux will get a NULL dereference crash.

5) fix a bug in hvsock_open_connection()
  I move the vmbus_set_chn_rescind_callback() to a later place, because
when vmbus_open() fails, hvsock_close_connection() can do nothing and we
count on vmbus_onoffer_rescind() -> vmbus_device_unregister() to clean up
the device.

6) some stylistic modifications.
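
As a quick check of the memory estimate in item 2) above, the arithmetic can
be written out as follows. This is only a sketch: it assumes 4096-byte pages,
and hvsock_est_bytes() is a hypothetical helper, not part of the patch.

```c
#include <stddef.h>

/* Assumed page size behind the "4KB" in the estimate. */
#define HVSOCK_EST_PAGE_SIZE 4096u

/*
 * Bytes consumed by nr_sockets sockets when each one uses
 * pages_per_socket pages (ring pages plus the 1+1 buffer pages).
 */
static inline size_t hvsock_est_bytes(size_t nr_sockets,
				      size_t pages_per_socket)
{
	return nr_sockets * pages_per_socket * HVSOCK_EST_PAGE_SIZE;
}
```

With 1024 sockets and 3+2+1+1 = 7 pages each this gives 29360128 bytes,
i.e. exactly 28 MiB, which matches the "28M bytes" figure.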


Changes since v11:
1) remove the module params as David suggested.

2) use exactly 5 pages each for the VMBus send and recv rings.
The host side's design of the feature requires exactly 5 pages for each of
the recv/send rings -- this is suboptimal in terms of memory consumption,
but unfortunately we have to live with it until the host comes up with
a new design. :-(

3) remove the per-connection static send/recv buffers
Instead, we allocate and free the buffers dynamically only when we recv/send
data. This means: when a connection is idle, no memory is consumed as
recv/send buffers at all.

Changes since v12:
return ENOMEM on buffer allocation failure.

   Actually "man read/write" says "Other errors may occur, depending on the
object connected to fd". "man send/recv" indeed lists ENOMEM.
   Considering AF_HYPERV is a new socket type, ENOMEM seems OK here.
   In the long run, I think we should add a new API in the VMBus driver,
allowing data copy from VMBus ringbuffer into user mode buffer directly.
This way, we can even eliminate this temporary buffer.

Changes since v13:
fix some coding style issues pointed out by David.

Changes since v14:
Just some stylistic changes addressing comments from Joe Perches and
Olaf Hering -- thank you!
- add a GPL blurb.
- define a new macro PAGE_SIZE_4K and use it to replace PAGE_SIZE
- change sk_to_hvsock/hvsock_to_sk() from macros to inline functions
- remove a not-very-useful pr_err()
- fix some typos in comment and coding style issues.

Changes since v15:
Made stylistic changes addressing comments from Vitaly Kuznetsov.
Thank you very much for the detailed comments, Vitaly! 

Dexuan Cui (1):
  hv_sock: introduce Hyper-V Sockets

 MAINTAINERS                 |    2 +
 include/linux/hyperv.h      |   13 +
 include/linux/socket.h      |    4 +-
 include/net/af_hvsock.h     |   78 +++
 include/uapi/linux/hyperv.h |   23 +
 net/Kconfig                 |    1 +
 net/Makefile                |    1 +
 net/hv_sock/Kconfig         |   10 +
 net/hv_sock/Makefile        |    3 +
 net/hv_sock/af_hvsock.c     | 1509 +++
 10 files changed, 1643 insertions(+), 1 deletion(-)
 create mode 100644 include/net/af_hvsock.h
 create mode 100644 net/hv_sock/Kconfig
 create mode 100644 net/hv_sock/Makefile
 create mode 100644 net/hv_sock/af_hvsock.c

-- 
2.1.0


Attachment: delta_v15_vs_v16.patch


RE: [PATCH v15 net-next 1/1] hv_sock: introduce Hyper-V Sockets

2016-07-11 Thread Dexuan Cui
> From: Vitaly Kuznetsov [mailto:vkuzn...@redhat.com]
>  ...
> Some comments below. The vast majority of them are really minor, the
> only thing which bothers me a little bit is WARN() in hvsock_sendmsg()
> which I think shouldn't be there. But I may have missed something.
 
Thank you for the very detailed comments, Vitaly!

Now I see I shouldn't put pr_err() in hvsock_sendmsg() and hvsock_recvmsg(),
because IMO a malicious app can use this to generate lots of messages to slow
down the system.  I'll remove them.

I'll reply to your other comments below.

> > +#define guid_t uuid_le
> > +struct sockaddr_hv {
> > +   __kernel_sa_family_t shv_family;  /* Address family  */
> > +   u16 reserved;/* Must be Zero*/
> > +   guid_t  shv_vm_id;   /* VM ID   */
> > +   guid_t  shv_service_id;  /* Service ID  */
> > +};
> 
> I'm not sure it is worth it to introduce a new 'guid_t' type here, we
> may want to rename
> 
> shv_vm_id -> shv_vm_guid
> shv_service_id -> shv_service_guid
> 
> and use uuid_le type.

Ok. I'll make the change.

> > +config HYPERV_SOCK
> > +   tristate "Hyper-V Sockets"
> > +   depends on HYPERV
> > +   default m if HYPERV
> > +   help
> > + Hyper-V Sockets is somewhat like TCP over VMBus, allowing
> > + communication between Linux guest and Hyper-V host without TCP/IP.
> > +
> 
> I know it's hard to come up with a simple description but I'd rather
> describe it as "Socket interface for high speed communication between
> Linux guest and Hyper-V host over VMBus."

OK.

> > +static bool uuid_equals(uuid_le u1, uuid_le u2)
> > +{
> > +   return !uuid_le_cmp(u1, u2);
> > +}
> 
> why not use uuid_le_cmp directly?
OK. I will change to it.
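
For reference, a small userspace sketch of what using the compare helper
directly looks like. It assumes a memcmp-style compare (0 means equal), which
is what the !uuid_le_cmp(u1, u2) wrapper in the quoted hunk implies;
uuid_sketch and uuid_sketch_cmp() are stand-in names, not the kernel's type
or function.

```c
#include <string.h>

/* Stand-in for a 16-byte little-endian UUID; not the kernel's uuid_le. */
typedef struct {
	unsigned char b[16];
} uuid_sketch;

/* memcmp-style compare: returns 0 when the two values are identical. */
static inline int uuid_sketch_cmp(const uuid_sketch u1, const uuid_sketch u2)
{
	return memcmp(&u1, &u2, sizeof(uuid_sketch));
}
```

Callers then test equality with "if (!uuid_sketch_cmp(a, b))" at the call
site, so the extra uuid_equals() wrapper is not needed.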

> > +static unsigned int hvsock_poll(struct file *file, struct socket *sock,
> > +   poll_table *wait)
> > ...
> > +   if (channel) {
> > +   /* If there is something in the queue then we can read */
> > +   get_ringbuffer_rw_status(channel, &can_read, &can_write);
> > +
> > +   if (!can_read && hvsk->recv)
> > +   can_read = true;
> > +
> > +   if (!(sk->sk_shutdown & RCV_SHUTDOWN) && can_read)
> > +   mask |= POLLIN | POLLRDNORM;
> > +   } else {
> > +   can_read = false;
> 
> we don't use can_read below

I'll remove the can_read assignment.

> > +   channel = hvsk->channel;
> > +   if (!channel) {
> > +   WARN_ONCE(1, "NULL channel! There is a programming bug.\n");
> 
> BUG() then

OK.

> > +static int hvsock_open_connection(struct vmbus_channel *channel)
> > +{
> > + ..
> > +   if (conn_from_host) {
> > +   if (sk->sk_ack_backlog >= sk->sk_max_ack_backlog) {
> > +   ret = -EMFILE;
> 
> I'm not sure -EMFILE is appropriate, we don't really have "too many open
> files".
Here the ret value doesn't really matter, because the return value of the
function is not really used at present.

However, I agree with you that EMFILE is unsuitable.
Let me change to ECONNREFUSED, which seems better to me.

> > +static int hvsock_connect_wait(struct socket *sock,
> > +  int flags, int current_ret)
> > +{
> > +   struct sock *sk = sock->sk;
> > +   struct hvsock_sock *hvsk;
> > +   int ret = current_ret;
> > +   DEFINE_WAIT(wait);
> > +   long timeout;
> > +
> > +   hvsk = sk_to_hvsock(sk);
> > +   timeout = 30 * HZ;
> 
> We may want to introduce a define for this timeout. Does it actually
> match host's timeout?

I'll add HVSOCK_CONNECT_TIMEOUT for this.
Yes, the value is from the Hyper-V team.
 
> > +static int hvsock_accept_wait(struct sock *listener,
> > + ..
> > +
> > +   if (ret) {
> > +   release_sock(connected);
> > +   sock_put(connected);
> > +   } else {
> > +   newsock->state = SS_CONNECTED;
> > +   sock_graft(connected, newsock);
> > +   release_sock(connected);
> > +   sock_put(connected);
> 
> so we do release_sock()/sock_put() unconditionally and this piece could
> be rewritten as
> 
> if (!ret) {
> newsock->state = SS_CONNECTED;
> sock_graft(connected, newsock);
> }
> release_sock(connected);
> sock_put(connected);

Will do.


> > +static int hvsock_listen(struct socket *sock, int backlog)
> > +{
> > + ..
> > +   /* This is an artificial limit */
> > +   if (backlog > 128)
> > +   backlog = 128;
> 
> Let's do a define for it.
Ok.
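
Something along these lines, as a sketch only. HVSOCK_MAX_BACKLOG and
hvsock_clamp_backlog() are names made up here for illustration; the name used
in the next version may differ.

```c
/* Hypothetical name for the artificial limit from the quoted hunk. */
#define HVSOCK_MAX_BACKLOG 128

/* Clamp a caller-supplied listen() backlog to the artificial limit. */
static inline int hvsock_clamp_backlog(int backlog)
{
	return backlog > HVSOCK_MAX_BACKLOG ? HVSOCK_MAX_BACKLOG : backlog;
}
```

The limit then has a single definition instead of two bare 128 literals.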
 
> > +static int hvsock_sendmsg(struct socket *sock, struct msghdr *msg,
> > + size_t len)
> > +{
> > +   struct hvsock_sock *hvsk;
> > +   struct sock *sk;
> > +   int ret;
> > +
> > +   if (len == 0)
> > +   return -EINVAL;
> > +
> > +   if (msg->msg_flags & ~MSG_DONTWAIT) {
> > +   pr_err("%s: unsupported flags=0x%x\n", __func__,
> > +  msg->msg_flags);
> 
> I don't think we 

RE: [PATCH v14 net-next 1/1] hv_sock: introduce Hyper-V Sockets

2016-07-08 Thread Dexuan Cui
> From: Olaf Hering [mailto:o...@aepfle.de]
> Sent: Friday, July 8, 2016 0:02
> On Thu, Jun 30, Dexuan Cui wrote:
> 
> > +/* The MTU is 16KB per the host side's design. */
> > +struct hvsock_recv_buf {
> > +   unsigned int data_len;
> > +   unsigned int data_offset;
> > +
> > +   struct vmpipe_proto_header hdr;
> > +   u8 buf[PAGE_SIZE * 4];
> 
> Please use some macro related to the protocol rather than a Linux
> compiletime macro.
OK. I'll fix this.
 
> > +/* We send at most 4KB payload per VMBus packet. */
> > +struct hvsock_send_buf {
> > +   struct vmpipe_proto_header hdr;
> > +   u8 buf[PAGE_SIZE];
> 
> Same here.
OK. I'll fix this.

> > + * Copyright(c) 2016, Microsoft Corporation. All rights reserved.
> 
> Here the BSD license follows. I think it's required/desired to also
> include a GPL blurb like it is done in many other files:
> ...
>  * Alternatively, this software may be distributed under the terms of
>  * the GNU General Public License ("GPL") version 2 as published by the
>  * Free Software Foundation.
> 
> 
> Otherwise the MODULE_LICENSE string might be incorrect.
I'll add the GPL blurb.
 
> > +   /* Hyper-V Sockets requires at least VMBus 4.0 */
> > +   if ((vmbus_proto_version >> 16) < 4) {
> > +   pr_err("failed to load: VMBus 4 or later is required\n");
> 
> I guess this means WS 2016+, and loading in earlier host versions will
> trigger this path? I think a silent ENODEV is enough.
Yes. 
OK, I'll remove the pr_err().
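
A userspace sketch of the silent check, for illustration. The only thing
taken from the patch is the test itself, (vmbus_proto_version >> 16) < 4,
which implies the major version sits in the upper 16 bits; the helper names
and HVSOCK_MIN_VMBUS_MAJOR are hypothetical.

```c
#include <errno.h>
#include <stdint.h>

/* Assumed minimum major version, taken from the quoted check. */
#define HVSOCK_MIN_VMBUS_MAJOR 4

/* The quoted check implies the major number lives in the upper 16 bits. */
static inline unsigned int vmbus_major_sketch(uint32_t proto_version)
{
	return proto_version >> 16;
}

/* Returns 0 if the host is new enough, -ENODEV otherwise (no log message). */
static inline int hvsock_check_host_sketch(uint32_t proto_version)
{
	if (vmbus_major_sketch(proto_version) < HVSOCK_MIN_VMBUS_MAJOR)
		return -ENODEV;
	return 0;
}
```

On an older host the module load simply fails with -ENODEV and nothing is
printed.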

> 
> > +   return -ENODEV;
> 
> Olaf

I'll post v15 shortly, which will address all the comments from Joe and Olaf.

Thanks,
-- Dexuan


RE: [PATCH v15 net-next 1/1] hv_sock: introduce Hyper-V Sockets

2016-07-08 Thread Dexuan Cui
> From: Dexuan Cui
> Sent: Friday, July 8, 2016 15:47
> 
> You can also get the patch here (2764221d):
> https://github.com/dcui/linux/commits/decui/hv_sock/net-next/20160708_v15
> 
> In v14:
> fix some coding style issues pointed out by David.
> 
> In v15:
> Just some stylistic changes addressing comments from Joe Perches and
> Olaf Hering -- thank you!
> - add a GPL blurb.
> - define a new macro PAGE_SIZE_4K and use it to replace PAGE_SIZE
> - change sk_to_hvsock/hvsock_to_sk() from macros to inline functions
> - remove a not-very-useful pr_err()
> - fix some typos in comment and coding style issues.

FYI: the diff between v14 and v15 is attached: the diff is generated by 
git-diff-ing the 2 branches decui/hv_sock/net-next/20160629_v14 and 
decui/hv_sock/net-next/20160708_v15 in the above github repo.
 
Thanks,
-- Dexuan


Attachment: delta_v14_vs.v15.patch


[PATCH v15 net-next 1/1] hv_sock: introduce Hyper-V Sockets

2016-07-08 Thread Dexuan Cui
Hyper-V Sockets (hv_sock) supplies a byte-stream based communication
mechanism between the host and the guest. It's somewhat like TCP over
VMBus, but the transportation layer (VMBus) is much simpler than IP.

With Hyper-V Sockets, applications between the host and the guest can talk
to each other directly by the traditional BSD-style socket APIs.

Hyper-V Sockets is only available on new Windows hosts, like Windows Server
2016. More info is in this article "Make your own integration services":
https://msdn.microsoft.com/en-us/virtualization/hyperv_on_windows/develop/make_mgmt_service

The patch implements the necessary support in the guest side by introducing
a new socket address family AF_HYPERV.

Signed-off-by: Dexuan Cui <de...@microsoft.com>
Cc: "K. Y. Srinivasan" <k...@microsoft.com>
Cc: Haiyang Zhang <haiya...@microsoft.com>
Cc: Vitaly Kuznetsov <vkuzn...@redhat.com>
Cc: Cathy Avery <cav...@redhat.com>
---

You can also get the patch here (2764221d):
https://github.com/dcui/linux/commits/decui/hv_sock/net-next/20160708_v15

For the change log before v12, please see https://lkml.org/lkml/2016/5/15/31

In v12, the changes are mainly the following:

1) remove the module params as David suggested.

2) use exactly 5 pages each for the VMBus send and recv rings.
The host side's design of the feature requires exactly 5 pages for each of
the recv/send rings -- this is suboptimal in terms of memory consumption,
but unfortunately we have to live with it until the host comes up with
a new design. :-(

3) remove the per-connection static send/recv buffers.
Instead, we allocate and free the buffers dynamically only when we recv/send
data. This means: when a connection is idle, no memory is consumed as
recv/send buffers at all.

In v13:
I return ENOMEM on buffer allocation failure.

   Actually "man read/write" says "Other errors may occur, depending on the
object connected to fd". "man send/recv" indeed lists ENOMEM.
   Considering AF_HYPERV is a new socket type, ENOMEM seems OK here.
   In the long run, I think we should add a new API in the VMBus driver,
allowing data copy from VMBus ringbuffer into user mode buffer directly.
This way, we can even eliminate this temporary buffer.

In v14:
fix some coding style issues pointed out by David.

In v15:
Just some stylistic changes addressing comments from Joe Perches and
Olaf Hering -- thank you!
- add a GPL blurb.
- define a new macro PAGE_SIZE_4K and use it to replace PAGE_SIZE
- change sk_to_hvsock/hvsock_to_sk() from macros to inline functions
- remove a not-very-useful pr_err()
- fix some typos in comment and coding style issues.

Looking forward to your comments!

 MAINTAINERS                 |    2 +
 include/linux/hyperv.h      |   13 +
 include/linux/socket.h      |    4 +-
 include/net/af_hvsock.h     |   78 +++
 include/uapi/linux/hyperv.h |   24 +
 net/Kconfig                 |    1 +
 net/Makefile                |    1 +
 net/hv_sock/Kconfig         |   10 +
 net/hv_sock/Makefile        |    3 +
 net/hv_sock/af_hvsock.c     | 1523 +++
 10 files changed, 1658 insertions(+), 1 deletion(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 50f69ba..6eaa26f 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -5514,7 +5514,9 @@ F:drivers/pci/host/pci-hyperv.c
 F: drivers/net/hyperv/
 F: drivers/scsi/storvsc_drv.c
 F: drivers/video/fbdev/hyperv_fb.c
+F: net/hv_sock/
 F: include/linux/hyperv.h
+F: include/net/af_hvsock.h
 F: tools/hv/
 F: Documentation/ABI/stable/sysfs-bus-vmbus
 
diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h
index 50f493e..1cda6ea5 100644
--- a/include/linux/hyperv.h
+++ b/include/linux/hyperv.h
@@ -1508,5 +1508,18 @@ static inline void commit_rd_index(struct vmbus_channel *channel)
vmbus_set_event(channel);
 }
 
+struct vmpipe_proto_header {
+   u32 pkt_type;
+   u32 data_size;
+};
+
+#define HVSOCK_HEADER_LEN  (sizeof(struct vmpacket_descriptor) + \
+sizeof(struct vmpipe_proto_header))
+
+/* See 'prev_indices' in hv_ringbuffer_read(), hv_ringbuffer_write() */
+#define PREV_INDICES_LEN   (sizeof(u64))
 
+#define HVSOCK_PKT_LEN(payload_len)(HVSOCK_HEADER_LEN + \
+   ALIGN((payload_len), 8) + \
+   PREV_INDICES_LEN)
 #endif /* _HYPERV_H */
diff --git a/include/linux/socket.h b/include/linux/socket.h
index b5cc5a6..0b68b58 100644
--- a/include/linux/socket.h
+++ b/include/linux/socket.h
@@ -202,8 +202,9 @@ struct ucred {
 #define AF_VSOCK   40  /* vSockets */
 #define AF_KCM 41  /* Kernel Connection Multiplexor*/
 #define AF_QIPCRTR 42  /* Qualcomm IPC Router  */
+#define AF_HYPERV  43  /* Hyper-V Sockets  */
 
-#define AF_MAX 43  /* For now.. 

[PATCH v15 net-next 0/1] introduce Hyper-V VM Sockets(hv_sock)

2016-07-08 Thread Dexuan Cui
efault, 1024 sockets take about 1024 * (3+2+1+1) * 4KB = 28M bytes.
(Here 1+1 means 1 page for send/recv buffers per connection, respectively.)

3) implement the TODO in hvsock_shutdown().

4) fix a bug in hvsock_close_connection():
   I remove "sk->sk_socket->state = SS_UNCONNECTED;" -- actually this line
is not really useful. For a connection triggered by a host app's connect(),
sk->sk_socket remains NULL before the connection is accepted by the server
app (in Linux VM): see hvsock_accept() -> hvsock_accept_wait() ->
sock_graft(connected, newsock). If the host app exits before the server
app's accept() returns, the host can send a rescind-message to close the
connection and later in the Linux VM's message handler
(i.e. vmbus_onoffer_rescind()), Linux will get a NULL dereference crash.

5) fix a bug in hvsock_open_connection()
  I move the vmbus_set_chn_rescind_callback() to a later place, because
when vmbus_open() fails, hvsock_close_connection() can do nothing and we
count on vmbus_onoffer_rescind() -> vmbus_device_unregister() to clean up
the device.

6) some stylistic modifications.


Changes since v11:
1) remove the module params as David suggested.

2) use exactly 5 pages each for the VMBus send and recv rings.
The host side's design of the feature requires exactly 5 pages for each of
the recv/send rings -- this is suboptimal in terms of memory consumption,
but unfortunately we have to live with it until the host comes up with
a new design. :-(

3) remove the per-connection static send/recv buffers
Instead, we allocate and free the buffers dynamically only when we recv/send
data. This means: when a connection is idle, no memory is consumed as
recv/send buffers at all.

Changes since v12:
return ENOMEM on buffer allocation failure.

   Actually "man read/write" says "Other errors may occur, depending on the
object connected to fd". "man send/recv" indeed lists ENOMEM.
   Considering AF_HYPERV is a new socket type, ENOMEM seems OK here.
   In the long run, I think we should add a new API in the VMBus driver,
allowing data copy from VMBus ringbuffer into user mode buffer directly.
This way, we can even eliminate this temporary buffer.

Changes since v13:
fix some coding style issues pointed out by David.

Changes since v14:
Just some stylistic changes addressing comments from Joe Perches and
Olaf Hering -- thank you!
- add a GPL blurb.
- define a new macro PAGE_SIZE_4K and use it to replace PAGE_SIZE
- change sk_to_hvsock/hvsock_to_sk() from macros to inline functions
- remove a not-very-useful pr_err()
- fix some typos in comment and coding style issues.


Dexuan Cui (1):
  hv_sock: introduce Hyper-V Sockets

 MAINTAINERS                 |    2 +
 include/linux/hyperv.h      |   13 +
 include/linux/socket.h      |    4 +-
 include/net/af_hvsock.h     |   78 +++
 include/uapi/linux/hyperv.h |   24 +
 net/Kconfig                 |    1 +
 net/Makefile                |    1 +
 net/hv_sock/Kconfig         |   10 +
 net/hv_sock/Makefile        |    3 +
 net/hv_sock/af_hvsock.c     | 1523 +++
 10 files changed, 1658 insertions(+), 1 deletion(-)
 create mode 100644 include/net/af_hvsock.h
 create mode 100644 net/hv_sock/Kconfig
 create mode 100644 net/hv_sock/Makefile
 create mode 100644 net/hv_sock/af_hvsock.c

-- 
2.1.0


RE: [PATCH v14 net-next 1/1] hv_sock: introduce Hyper-V Sockets

2016-07-05 Thread Dexuan Cui
> From: Joe Perches [mailto:j...@perches.com]
> Sent: Tuesday, July 5, 2016 17:39
> To: Dexuan Cui <de...@microsoft.com>; da...@davemloft.net;
> gre...@linuxfoundation.org; netdev@vger.kernel.org; linux-
> ker...@vger.kernel.org; de...@linuxdriverproject.org; o...@aepfle.de;
> a...@canonical.com; jasow...@redhat.com; Vitaly Kuznetsov
> <vkuzn...@redhat.com>; Cathy Avery <cav...@redhat.com>; KY Srinivasan
> <k...@microsoft.com>
> Cc: Haiyang Zhang <haiya...@microsoft.com>; Rolf Neugebauer
> <rolf.neugeba...@docker.com>
> Subject: Re: [PATCH v14 net-next 1/1] hv_sock: introduce Hyper-V Sockets
> 
> On Tue, 2016-07-05 at 09:31 +, Dexuan Cui wrote:
> 
> > > > +/* This is the address fromat of Hyper-V Sockets.
> > > format
> > I suppose you meant I should change
> > /* This is ...
> > to
> > /*
> >   * This is ...
> > I'll fix this.
> 
> No, I just meant fromat should  be format

Oh... Got it.  Thanks!
I'll fix the typo.

-- Dexuan


RE: [PATCH v14 net-next 1/1] hv_sock: introduce Hyper-V Sockets

2016-07-05 Thread Dexuan Cui
> From: Joe Perches [mailto:j...@perches.com]
> 
> > +#define sk_to_hvsock(__sk)   ((struct hvsock_sock *)(__sk))
> > +#define hvsock_to_sk(__hvsk) ((struct sock *)(__hvsk))
> 
> Might as well be static inlines
Hi Joe,
Thank you for the suggestions (again)! :-)

I'll change them to static inlines.
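
Roughly like this, as a userspace sketch. The struct definitions are
stand-ins (the real struct sock and struct hvsock_sock live in the kernel and
in the patch), and the sketch assumes the sock is the first member of the
hv_sock structure, which is what the existing cast macros rely on.

```c
/* Stand-ins for the kernel types; only the layout relationship matters. */
struct sock_sketch {
	int sk_state;
};

struct hvsock_sock_sketch {
	struct sock_sketch sk;	/* assumed to be the first member */
	void *channel;
};

/* Typed replacement for the sk_to_hvsock() cast macro. */
static inline struct hvsock_sock_sketch *sk_to_hvsock(struct sock_sketch *sk)
{
	return (struct hvsock_sock_sketch *)sk;
}

/* Typed replacement for the hvsock_to_sk() cast macro. */
static inline struct sock_sketch *hvsock_to_sk(struct hvsock_sock_sketch *hvsk)
{
	return (struct sock_sketch *)hvsk;
}
```

Compared with the macros, the compiler now checks the argument type at each
call site.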

> > +/* We send at most 4KB payload per VMBus packet. */
> > +struct hvsock_send_buf {
> > +   struct vmpipe_proto_header hdr;
> > +   u8 buf[PAGE_SIZE];
> 
> PAGE_SIZE might not be the right define here if
> the comment is to be believed.

I'll change to something like this:

+#define HVSOCK_MAX_SND_SIZE_BY_VM (1024 * 4)
 struct hvsock_send_buf {
struct vmpipe_proto_header hdr;
-   u8 buf[PAGE_SIZE];
+   u8 buf[HVSOCK_MAX_SND_SIZE_BY_VM];
 };

> > diff --git a/include/uapi/linux/hyperv.h b/include/uapi/linux/hyperv.h
> []
> > @@ -396,4 +397,27 @@ struct hv_kvp_ip_msg {
> >     struct hv_kvp_ipaddr_value  kvp_ip_val;
> >  } __attribute__((packed));
> >
> > +/* This is the address fromat of Hyper-V Sockets.
> 
> format
I suppose you meant I should change 
/* This is ...
to 
/*
  * This is ...
I'll fix this.

> > diff --git a/net/hv_sock/af_hvsock.c b/net/hv_sock/af_hvsock.c
> []
> > @@ -0,0 +1,1519 @@
> > +/*
> > + * Hyper-V Sockets -- a socket-based communication channel between the
> > + * Hyper-V host and the virtual machines running on it.
> > + *
> > + * Copyright(c) 2016, Microsoft Corporation. All rights reserved.
> > + *
> > + * Redistribution and use in source and binary forms, with or without
> > + * modification, are permitted provided that the following conditions
> > + * are met
> .
> Is this license GPL compatible?
Yes. At the end of the file, there is a line
+MODULE_LICENSE("Dual BSD/GPL");

> > +static struct proto hvsock_proto = {
> > +   .name = "HV_SOCK",
> > +   .owner = THIS_MODULE,
> > +   .obj_size = sizeof(struct hvsock_sock),
> > +};
> 
> const?
No. In hvsock_create(), hvsock_proto is passed to sk_alloc(), which requires
a non-const argument. 

> > +static int hvsock_recvmsg_wait(struct sock *sk, struct msghdr *msg,
> > +      size_t len, int flags)
> > +{
> []
> > +   if (ret != 0 || payload_len >
> > +   sizeof(hvsk->recv->buf)) {
> 
> This could look nicer as
> 
>   if (ret != 0 ||
>       payload_len > sizeof(hvsk->recv->buf)) {
I'll fix this.

Thanks,
-- Dexuan


RE: [PATCH v14 net-next 1/1] hv_sock: introduce Hyper-V Sockets

2016-07-05 Thread Dexuan Cui
> From: David Miller [mailto:da...@davemloft.net]
> Sent: Tuesday, July 5, 2016 14:27
> To: Dexuan Cui <de...@microsoft.com>
> Subject: Re: [PATCH v14 net-next 1/1] hv_sock: introduce Hyper-V Sockets
> 
> From: Dexuan Cui <de...@microsoft.com>
> Date: Tue, 5 Jul 2016 01:58:31 +
> 
> > Not sure if you had a chance to review this version.
> 
> Why me?
I just think you're the most responsive reviewer. :-)

> Other people have to review this too.
Sure. Let me try to ask more people to review this.
 
> > Now I have a question: may I split the include/linux/socket.h change
> > and ask you to pre-allocate the number for AF_HYPERV to allow
> > backporting of Hyper-V Sockets to distro kernels, and to make sure
> > that applications using the socket type will work with the backport
> > as well as the upstream kernel?
> 
> Sorry, I'm not going to do this.
> 
> You cannot commit anything in userspace to this value anywhere
> until it is accepted upstream.
Got it. Thanks for the explanation! 

Thanks,
-- Dexuan


RE: [PATCH v14 net-next 1/1] hv_sock: introduce Hyper-V Sockets

2016-07-04 Thread Dexuan Cui
> From: linux-kernel-ow...@vger.kernel.org [mailto:linux-kernel-
> ow...@vger.kernel.org] On Behalf Of Dexuan Cui
> Sent: Thursday, June 30, 2016 23:59
> diff --git a/include/linux/socket.h b/include/linux/socket.h
> index b5cc5a6..0b68b58 100644
> --- a/include/linux/socket.h
> +++ b/include/linux/socket.h
> @@ -202,8 +202,9 @@ struct ucred {
>  #define AF_VSOCK 40  /* vSockets */
>  #define AF_KCM   41  /* Kernel Connection Multiplexor*/
>  #define AF_QIPCRTR   42  /* Qualcomm IPC Router  */
> +#define AF_HYPERV43  /* Hyper-V Sockets  */
> 
> -#define AF_MAX   43  /* For now.. */
> +#define AF_MAX   44  /* For now.. */

Hi David,
Not sure if you had a chance to review this version.
Now I have a question: may I split the include/linux/socket.h change
and ask you to pre-allocate the number for AF_HYPERV to allow
backporting of Hyper-V Sockets to distro kernels, and to make sure that
applications using the socket type will work with the backport as well
as the upstream kernel?

Thanks,
-- Dexuan


RE: [PATCH v14 net-next 1/1] hv_sock: introduce Hyper-V Sockets

2016-06-30 Thread Dexuan Cui
> From: Olaf Hering [mailto:o...@aepfle.de]
> Sent: Friday, July 1, 2016 0:12
> To: Dexuan Cui <de...@microsoft.com>
> Cc: da...@davemloft.net; gre...@linuxfoundation.org;
> netdev@vger.kernel.org; linux-ker...@vger.kernel.org;
> de...@linuxdriverproject.org; a...@canonical.com; jasow...@redhat.com;
> Vitaly Kuznetsov <vkuzn...@redhat.com>; Cathy Avery <cav...@redhat.com>;
> KY Srinivasan <k...@microsoft.com>; Haiyang Zhang
> <haiya...@microsoft.com>; j...@perches.com; Rolf Neugebauer
> <rolf.neugeba...@docker.com>
> Subject: Re: [PATCH v14 net-next 1/1] hv_sock: introduce Hyper-V Sockets
> 
> On Thu, Jun 30, Dexuan Cui wrote:
> 
> > -#define AF_MAX 43  /* For now.. */
> > +#define AF_MAX 44  /* For now.. */
> 
> Should this patch also change the places where AF_MAX is used,
> like all the arrays in net/core/sock.c?
> 
> Olaf

Thanks for the reminder, Olaf!

I think we may as well make a separate patch for this.
It is on my to-do list.
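
To illustrate the concern with a userspace sketch (the table below is made
up for this mail and is not copied from net/core/sock.c; the *_SKETCH names
are hypothetical): when a per-family table is sized by AF_MAX and AF_MAX is
bumped without adding an entry for the new family, the new slot is silently
left empty.

```c
#include <stddef.h>

#define AF_QIPCRTR_SKETCH 42
#define AF_HYPERV_SKETCH  43
#define AF_MAX_SKETCH     44

/* One slot per family plus one for AF_MAX itself. */
static const char *const family_label[AF_MAX_SKETCH + 1] = {
	[AF_QIPCRTR_SKETCH] = "AF_QIPCRTR",
	/* no entry for AF_HYPERV_SKETCH: the slot stays NULL */
	[AF_MAX_SKETCH] = "AF_MAX",
};

/* Bounds-checked lookup; NULL means the table was never extended. */
static inline const char *family_label_or_null(int family)
{
	if (family < 0 || family > AF_MAX_SKETCH)
		return NULL;
	return family_label[family];
}
```

Here family_label_or_null(AF_HYPERV_SKETCH) returns NULL, which is the kind
of gap the follow-up patch would need to close.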

Thanks,
-- Dexuan


[PATCH v14 net-next 0/1] introduce Hyper-V VM Sockets(hv_sock)

2016-06-30 Thread Dexuan Cui
efault, 1024 sockets take about 1024 * (3+2+1+1) * 4KB = 28M bytes.
(Here 1+1 means 1 page for send/recv buffers per connection, respectively.)

3) implement the TODO in hvsock_shutdown().

4) fix a bug in hvsock_close_connection():
   I remove "sk->sk_socket->state = SS_UNCONNECTED;" -- actually this line
is not really useful. For a connection triggered by a host app's connect(),
sk->sk_socket remains NULL before the connection is accepted by the server
app (in Linux VM): see hvsock_accept() -> hvsock_accept_wait() ->
sock_graft(connected, newsock). If the host app exits before the server
app's accept() returns, the host can send a rescind-message to close the
connection and later in the Linux VM's message handler
(i.e. vmbus_onoffer_rescind()), Linux will get a NULL dereference crash.

5) fix a bug in hvsock_open_connection()
  I move the vmbus_set_chn_rescind_callback() to a later place, because
when vmbus_open() fails, hvsock_close_connection() can do nothing and we
count on vmbus_onoffer_rescind() -> vmbus_device_unregister() to clean up
the device.

6) some stylistic modifications.


Changes since v11:
1) remove the module params as David suggested.

2) use exactly 5 pages each for the VMBus send and recv rings.
The host side's design of the feature requires exactly 5 pages for each of
the recv/send rings -- this is suboptimal in terms of memory consumption,
but unfortunately we have to live with it until the host comes up with
a new design. :-(

3) remove the per-connection static send/recv buffers
Instead, we allocate and free the buffers dynamically only when we recv/send
data. This means: when a connection is idle, no memory is consumed as
recv/send buffers at all.

Changes since v12:
return ENOMEM on buffer allocation failure.

   Actually "man read/write" says "Other errors may occur, depending on the
object connected to fd". "man send/recv" indeed lists ENOMEM.
   Considering AF_HYPERV is a new socket type, ENOMEM seems OK here.
   In the long run, I think we should add a new API in the VMBus driver,
allowing data copy from VMBus ringbuffer into user mode buffer directly.
This way, we can even eliminate this temporary buffer.

Changes since v13:
fix some coding style issues pointed out by David.

Dexuan Cui (1):
  hv_sock: introduce Hyper-V Sockets

 MAINTAINERS                 |    2 +
 include/linux/hyperv.h      |   13 +
 include/linux/socket.h      |    4 +-
 include/net/af_hvsock.h     |   59 ++
 include/uapi/linux/hyperv.h |   24 +
 net/Kconfig                 |    1 +
 net/Makefile                |    1 +
 net/hv_sock/Kconfig         |   10 +
 net/hv_sock/Makefile        |    3 +
 net/hv_sock/af_hvsock.c     | 1519 +++
 10 files changed, 1635 insertions(+), 1 deletion(-)
 create mode 100644 include/net/af_hvsock.h
 create mode 100644 net/hv_sock/Kconfig
 create mode 100644 net/hv_sock/Makefile
 create mode 100644 net/hv_sock/af_hvsock.c

-- 
2.1.0


[PATCH v14 net-next 1/1] hv_sock: introduce Hyper-V Sockets

2016-06-30 Thread Dexuan Cui
Hyper-V Sockets (hv_sock) supplies a byte-stream based communication
mechanism between the host and the guest. It's somewhat like TCP over
VMBus, but the transportation layer (VMBus) is much simpler than IP.

With Hyper-V Sockets, applications between the host and the guest can talk
to each other directly by the traditional BSD-style socket APIs.

Hyper-V Sockets is only available on new Windows hosts, like Windows Server
2016. More info is in this article "Make your own integration services":
https://msdn.microsoft.com/en-us/virtualization/hyperv_on_windows/develop/make_mgmt_service

The patch implements the necessary support in the guest side by introducing
a new socket address family AF_HYPERV.

Signed-off-by: Dexuan Cui <de...@microsoft.com>
Cc: "K. Y. Srinivasan" <k...@microsoft.com>
Cc: Haiyang Zhang <haiya...@microsoft.com>
Cc: Vitaly Kuznetsov <vkuzn...@redhat.com>
Cc: Cathy Avery <cav...@redhat.com>
---

You can also get the patch here (8ba95c8ec9):
https://github.com/dcui/linux/commits/decui/hv_sock/net-next/20160629_v14

For the change log before v12, please see https://lkml.org/lkml/2016/5/15/31

In v12, the changes are mainly the following:

1) remove the module params as David suggested.

2) use 5 exact pages for VMBus send/recv rings, respectively.
The host side's design of the feature requires 5 exact pages for recv/send
rings respectively -- this is suboptimal considering memory consumption,
however unluckily we have to live with it, before the host comes up with
a new design in the future. :-(

3) remove the per-connection static send/recv buffers
Instead, we allocate and free the buffers dynamically only when we recv/send
data. This means: when a connection is idle, no memory is consumed as
recv/send buffers at all.

In v13:
I return ENOMEM on buffer allocation failure.

   Actually "man read/write" says "Other errors may occur, depending on the
object connected to fd". "man send/recv" indeed lists ENOMEM.
   Considering AF_HYPERV is a new socket type, ENOMEM seems OK here.
   In the long run, I think we should add a new API in the VMBus driver,
allowing data copy from VMBus ringbuffer into user mode buffer directly.
This way, we can even eliminate this temporary buffer.
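
As a rough illustration of the v12/v13 buffer handling described above, here
is a minimal user-space sketch, not the code from af_hvsock.c. The names
conn_sketch and recv_sketch are hypothetical, and malloc() stands in for the
kernel allocator. It only shows the shape of the idea: the temporary buffer
lives for one call, and an allocation failure is reported as -ENOMEM instead
of being retried in a loop.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical stand-in for a connection; not the struct used by the patch. */
struct conn_sketch {
	const char *ring;	/* data waiting in the (simulated) VMBus ring */
	size_t ring_len;	/* bytes still unread in the ring */
};

/*
 * Copy up to 'len' bytes to 'dst' through a temporary buffer that exists
 * only for the duration of the call. Returns the number of bytes copied,
 * or -ENOMEM if the temporary buffer cannot be allocated.
 */
static ssize_t recv_sketch(struct conn_sketch *c, void *dst, size_t len)
{
	size_t n;
	char *tmp;

	assert(c && dst);
	n = len < c->ring_len ? len : c->ring_len;
	if (n == 0)
		return 0;

	tmp = malloc(n);		/* kernel code would use its own allocator */
	if (!tmp)
		return -ENOMEM;		/* fail instead of sleeping and retrying */

	memcpy(tmp, c->ring, n);	/* "read from the ring buffer" */
	memcpy(dst, tmp, n);		/* "copy to the caller's buffer" */
	free(tmp);			/* an idle connection holds no buffer */

	c->ring += n;
	c->ring_len -= n;
	return (ssize_t)n;
}
```

Because the buffer is freed before the function returns, a connection that
is not currently reading holds no buffer memory at all in this model.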

In v14:
fix some coding style issues pointed out by David.

Looking forward to your comments!
 MAINTAINERS |2 +
 include/linux/hyperv.h  |   13 +
 include/linux/socket.h  |4 +-
 include/net/af_hvsock.h |   59 ++
 include/uapi/linux/hyperv.h |   24 +
 net/Kconfig |1 +
 net/Makefile|1 +
 net/hv_sock/Kconfig |   10 +
 net/hv_sock/Makefile|3 +
 net/hv_sock/af_hvsock.c | 1519 +++
 10 files changed, 1635 insertions(+), 1 deletion(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 50f69ba..6eaa26f 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -5514,7 +5514,9 @@ F:drivers/pci/host/pci-hyperv.c
 F: drivers/net/hyperv/
 F: drivers/scsi/storvsc_drv.c
 F: drivers/video/fbdev/hyperv_fb.c
+F: net/hv_sock/
 F: include/linux/hyperv.h
+F: include/net/af_hvsock.h
 F: tools/hv/
 F: Documentation/ABI/stable/sysfs-bus-vmbus
 
diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h
index 50f493e..1cda6ea5 100644
--- a/include/linux/hyperv.h
+++ b/include/linux/hyperv.h
@@ -1508,5 +1508,18 @@ static inline void commit_rd_index(struct vmbus_channel *channel)
vmbus_set_event(channel);
 }
 
+struct vmpipe_proto_header {
+   u32 pkt_type;
+   u32 data_size;
+};
+
+#define HVSOCK_HEADER_LEN  (sizeof(struct vmpacket_descriptor) + \
+sizeof(struct vmpipe_proto_header))
+
+/* See 'prev_indices' in hv_ringbuffer_read(), hv_ringbuffer_write() */
+#define PREV_INDICES_LEN   (sizeof(u64))
 
+#define HVSOCK_PKT_LEN(payload_len)(HVSOCK_HEADER_LEN + \
+   ALIGN((payload_len), 8) + \
+   PREV_INDICES_LEN)
 #endif /* _HYPERV_H */
diff --git a/include/linux/socket.h b/include/linux/socket.h
index b5cc5a6..0b68b58 100644
--- a/include/linux/socket.h
+++ b/include/linux/socket.h
@@ -202,8 +202,9 @@ struct ucred {
 #define AF_VSOCK   40  /* vSockets */
 #define AF_KCM 41  /* Kernel Connection Multiplexor*/
 #define AF_QIPCRTR 42  /* Qualcomm IPC Router  */
+#define AF_HYPERV  43  /* Hyper-V Sockets  */
 
-#define AF_MAX 43  /* For now.. */
+#define AF_MAX 44  /* For now.. */
 
 /* Protocol families, same as address families. */
 #define PF_UNSPEC  AF_UNSPEC
@@ -251,6 +252,7 @@ struct ucred {
 #define PF_VSOCK   AF_VSOCK
 #define PF_KCM AF_KCM
 #define PF_QIPCRTR AF_QIPCRTR
+#define PF_HYPERV  AF_HYPERV
 #define PF_MAX AF_MAX
 
 /* Maxi
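
For reference, the HVSOCK_PKT_LEN() arithmetic from the include/linux/hyperv.h
hunk can be checked with a small stand-alone sketch. The descriptor struct
below is a stand-in whose 16-byte layout is an assumption about
struct vmpacket_descriptor (it is not defined in this patch), and
hvsock_pkt_len() is a hypothetical function version of the macro.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed layout of the VMBus packet descriptor; a stand-in, not the
 * kernel definition. */
struct vmpacket_descriptor_sketch {
	uint16_t type;
	uint16_t offset8;
	uint16_t len8;
	uint16_t flags;
	uint64_t trans_id;
};

/* Same fields as the struct added by the patch. */
struct vmpipe_proto_header {
	uint32_t pkt_type;
	uint32_t data_size;
};

/* Function form of HVSOCK_PKT_LEN(): headers + payload rounded up to a
 * multiple of 8 + the trailing 'prev_indices' u64. */
static size_t hvsock_pkt_len(size_t payload_len)
{
	size_t header = sizeof(struct vmpacket_descriptor_sketch) +
			sizeof(struct vmpipe_proto_header);
	size_t padded = (payload_len + 7) & ~(size_t)7;

	/* The numbers quoted below rely on this assumed size. */
	assert(sizeof(struct vmpacket_descriptor_sketch) == 16);
	return header + padded + sizeof(uint64_t);
}
```

Under that assumption the combined header is 24 bytes, so a 1-byte payload
and an 8-byte payload both occupy 40 bytes in the ring, and a 9-byte payload
occupies 48.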

RE: [PATCH v13 net-next 1/1] hv_sock: introduce Hyper-V Sockets

2016-06-30 Thread Dexuan Cui
> From: David Miller [mailto:da...@davemloft.net]
> Sent: Thursday, June 30, 2016 20:45
> To: Dexuan Cui <de...@microsoft.com>
> Cc: gre...@linuxfoundation.org; netdev@vger.kernel.org; linux-
> ker...@vger.kernel.org; de...@linuxdriverproject.org; o...@aepfle.de;
> a...@canonical.com; jasow...@redhat.com; vkuzn...@redhat.com;
> cav...@redhat.com; KY Srinivasan <k...@microsoft.com>; Haiyang Zhang
> <haiya...@microsoft.com>; j...@perches.com; rolf.neugeba...@docker.com
> Subject: Re: [PATCH v13 net-next 1/1] hv_sock: introduce Hyper-V Sockets
> 
> From: Dexuan Cui <de...@microsoft.com>
> Date: Wed, 29 Jun 2016 11:30:40 +
> 
> > @@ -1509,4 +1509,18 @@ static inline void commit_rd_index(struct
> vmbus_channel *channel)
> >  }
> >
> >
> > +struct vmpipe_proto_header {
> > +   u32 pkt_type;
> 
> It is wasteful to have two empty lines before this structure definition, one
> is sufficient.
> 
> ...

Hi David,
Thank you for pointing out the issues!

I'll fix all of them, and check all the similar issues in the patch.

Will post a new version ASAP.

Thanks,
-- Dexuan


[PATCH v13 net-next 1/1] hv_sock: introduce Hyper-V Sockets

2016-06-29 Thread Dexuan Cui
Hyper-V Sockets (hv_sock) supplies a byte-stream based communication
mechanism between the host and the guest. It's somewhat like TCP over
VMBus, but the transportation layer (VMBus) is much simpler than IP.

With Hyper-V Sockets, applications between the host and the guest can talk
to each other directly by the traditional BSD-style socket APIs.

Hyper-V Sockets is only available on new Windows hosts, like Windows Server
2016. More info is in this article "Make your own integration services":
https://msdn.microsoft.com/en-us/virtualization/hyperv_on_windows/develop/make_mgmt_service

The patch implements the necessary support in the guest side by introducing
a new socket address family AF_HYPERV.

Signed-off-by: Dexuan Cui <de...@microsoft.com>
Cc: "K. Y. Srinivasan" <k...@microsoft.com>
Cc: Haiyang Zhang <haiya...@microsoft.com>
Cc: Vitaly Kuznetsov <vkuzn...@redhat.com>
Cc: Cathy Avery <cav...@redhat.com>
---

You can also get the patch here (ae3cbdabca):
https://github.com/dcui/linux/commits/decui/hv_sock/net-next/20160629_v13

For the change log before v12, please see https://lkml.org/lkml/2016/5/15/31

In v12, the changes are mainly the following:

1) remove the module params as David suggested.

2) use 5 exact pages for VMBus send/recv rings, respectively.
The host side's design of the feature requires 5 exact pages for recv/send
rings respectively -- this is suboptimal considering memory consumption,
however unluckily we have to live with it, before the host comes up with
a new design in the future. :-(

3) remove the per-connection static send/recv buffers
Instead, we allocate and free the buffers dynamically only when we recv/send
data. This means: when a connection is idle, no memory is consumed as
recv/send buffers at all.

In v13:
I return ENOMEM on buffer allocation failure.

   Actually "man read/write" says "Other errors may occur, depending on the
object connected to fd". "man send/recv" indeed lists ENOMEM.
   Considering AF_HYPERV is a new socket type, ENOMEM seems OK here.
   In the long run, I think we should add a new API in the VMBus driver,
allowing data copy from VMBus ringbuffer into user mode buffer directly.
This way, we can even eliminate this temporary buffer.


Looking forward to your comments!


 MAINTAINERS |2 +
 include/linux/hyperv.h  |   14 +
 include/linux/socket.h  |4 +-
 include/net/af_hvsock.h |   59 ++
 include/uapi/linux/hyperv.h |   25 +
 net/Kconfig |1 +
 net/Makefile|1 +
 net/hv_sock/Kconfig |   10 +
 net/hv_sock/Makefile|3 +
 net/hv_sock/af_hvsock.c | 1519 +++
 10 files changed, 1637 insertions(+), 1 deletion(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 50f69ba..6eaa26f 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -5514,7 +5514,9 @@ F:drivers/pci/host/pci-hyperv.c
 F: drivers/net/hyperv/
 F: drivers/scsi/storvsc_drv.c
 F: drivers/video/fbdev/hyperv_fb.c
+F: net/hv_sock/
 F: include/linux/hyperv.h
+F: include/net/af_hvsock.h
 F: tools/hv/
 F: Documentation/ABI/stable/sysfs-bus-vmbus
 
diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h
index 50f493e..95d159e 100644
--- a/include/linux/hyperv.h
+++ b/include/linux/hyperv.h
@@ -1509,4 +1509,18 @@ static inline void commit_rd_index(struct vmbus_channel *channel)
 }
 
 
+struct vmpipe_proto_header {
+   u32 pkt_type;
+   u32 data_size;
+};
+
+#define HVSOCK_HEADER_LEN  (sizeof(struct vmpacket_descriptor) + \
+sizeof(struct vmpipe_proto_header))
+
+/* See 'prev_indices' in hv_ringbuffer_read(), hv_ringbuffer_write() */
+#define PREV_INDICES_LEN   (sizeof(u64))
+
+#define HVSOCK_PKT_LEN(payload_len)(HVSOCK_HEADER_LEN + \
+   ALIGN((payload_len), 8) + \
+   PREV_INDICES_LEN)
 #endif /* _HYPERV_H */
diff --git a/include/linux/socket.h b/include/linux/socket.h
index b5cc5a6..0b68b58 100644
--- a/include/linux/socket.h
+++ b/include/linux/socket.h
@@ -202,8 +202,9 @@ struct ucred {
 #define AF_VSOCK   40  /* vSockets */
 #define AF_KCM 41  /* Kernel Connection Multiplexor*/
 #define AF_QIPCRTR 42  /* Qualcomm IPC Router  */
+#define AF_HYPERV  43  /* Hyper-V Sockets  */
 
-#define AF_MAX 43  /* For now.. */
+#define AF_MAX 44  /* For now.. */
 
 /* Protocol families, same as address families. */
 #define PF_UNSPEC  AF_UNSPEC
@@ -251,6 +252,7 @@ struct ucred {
 #define PF_VSOCK   AF_VSOCK
 #define PF_KCM AF_KCM
 #define PF_QIPCRTR AF_QIPCRTR
+#define PF_HYPERV  AF_HYPERV
 #define PF_MAX AF_MAX
 
 /* Maximum queue length specifiable by listen.  */
diff --git a/include/net/af_hvsock.h b/include/net/af_

[PATCH v13 net-next 0/1] introduce Hyper-V VM Sockets(hv_sock)

2016-06-29 Thread Dexuan Cui
By default, 1024 sockets take about 1024 * (3+2+1+1) * 4KB = 28M bytes.
(Here 1+1 means 1 page for send/recv buffers per connection, respectively.)

3) implement the TODO in hvsock_shutdown().

4) fix a bug in hvsock_close_connection():
   I remove "sk->sk_socket->state = SS_UNCONNECTED;" -- actually this line
is not really useful. For a connection triggered by a host app's connect(),
sk->sk_socket remains NULL before the connection is accepted by the server
app (in Linux VM): see hvsock_accept() -> hvsock_accept_wait() ->
sock_graft(connected, newsock). If the host app exits before the server
app's accept() returns, the host can send a rescind-message to close the
connection and later in the Linux VM's message handler
(i.e. vmbus_onoffer_rescind()), Linux will get a NULL de-referencing crash.

5) fix a bug in hvsock_open_connection()
  I move the vmbus_set_chn_rescind_callback() to a later place, because
when vmbus_open() fails, hvsock_close_connection() can do nothing and we
count on vmbus_onoffer_rescind() -> vmbus_device_unregister() to clean up
the device.

6) some stylistic modifications.
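
To make item 4 concrete, here is a minimal sketch with stand-in types
(sock_sketch, socket_sketch and close_connection_sketch are hypothetical and
are not the kernel structures). It assumes the close path only needs per-sock
state, which is what makes dropping the sk->sk_socket access safe before
accept() has grafted a socket.

```c
#include <assert.h>
#include <stddef.h>

enum { SS_UNCONNECTED_SKETCH = 1, SS_CONNECTED_SKETCH = 3 };

struct socket_sketch {
	int state;
};

struct sock_sketch {
	struct socket_sketch *sk_socket;	/* NULL until accept() grafts one */
	int sk_state;
};

/* Close path that touches only state that exists before accept(). */
static void close_connection_sketch(struct sock_sketch *sk)
{
	assert(sk);
	sk->sk_state = SS_UNCONNECTED_SKETCH;
	/*
	 * The removed line was the equivalent of
	 *	sk->sk_socket->state = SS_UNCONNECTED;
	 * With sk_socket still NULL (host rescinds before accept() returns),
	 * that access is a NULL dereference.
	 */
}
```

Calling close_connection_sketch() on a sock whose sk_socket is NULL is fine
here, which models the rescind-before-accept case described in item 4.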


Changes since v11:
1) remove the module params as David suggested.

2) use 5 exact pages for VMBus send/recv rings, respectively.
The host side's design of the feature requires 5 exact pages for recv/send
rings respectively -- this is suboptimal considering memory consumption,
however unluckily we have to live with it, before the host comes up with
a new design in the future. :-(

3) remove the per-connection static send/recv buffers
Instead, we allocate and free the buffers dynamically only when we recv/send
data. This means: when a connection is idle, no memory is consumed as
recv/send buffers at all.
Dexuan Cui (1):
  hv_sock: introduce Hyper-V Sockets

Changes since v12:
return ENOMEM on buffer allocation failure

   Actually "man read/write" says "Other errors may occur, depending on the
object connected to fd". "man send/recv" indeed lists ENOMEM.
   Considering AF_HYPERV is a new socket type, ENOMEM seems OK here.
   In the long run, I think we should add a new API in the VMBus driver,
allowing data copy from VMBus ringbuffer into user mode buffer directly.
This way, we can even eliminate this temporary buffer.


 MAINTAINERS |2 +
 include/linux/hyperv.h  |   14 +
 include/linux/socket.h  |4 +-
 include/net/af_hvsock.h |   59 ++
 include/uapi/linux/hyperv.h |   25 +
 net/Kconfig |1 +
 net/Makefile|1 +
 net/hv_sock/Kconfig |   10 +
 net/hv_sock/Makefile|3 +
 net/hv_sock/af_hvsock.c | 1519 +++
 10 files changed, 1637 insertions(+), 1 deletion(-)
 create mode 100644 include/net/af_hvsock.h
 create mode 100644 net/hv_sock/Kconfig
 create mode 100644 net/hv_sock/Makefile
 create mode 100644 net/hv_sock/af_hvsock.c

-- 
2.1.0


RE: [PATCH v12 net-next 1/1] hv_sock: introduce Hyper-V Sockets

2016-06-29 Thread Dexuan Cui
> From: Rick Jones [mailto:rick.jon...@hpe.com]
> Sent: Tuesday, June 28, 2016 23:43
> To: Dexuan Cui <de...@microsoft.com>; David Miller <da...@davemloft.net>
> Cc: gre...@linuxfoundation.org; netdev@vger.kernel.org; linux-
> ker...@vger.kernel.org; de...@linuxdriverproject.org; o...@aepfle.de;
> a...@canonical.com; jasow...@redhat.com; vkuzn...@redhat.com;
> cav...@redhat.com; KY Srinivasan <k...@microsoft.com>; Haiyang Zhang
> <haiya...@microsoft.com>; j...@perches.com
> Subject: Re: [PATCH v12 net-next 1/1] hv_sock: introduce Hyper-V Sockets
> 
> On 06/28/2016 02:59 AM, Dexuan Cui wrote:
> > The idea here is: IMO the syscalls sys_read()/write() shouldn't return
> > -ENOMEM, so I have to make sure the buffer allocation succeeds?
> >
> > I tried to use kmalloc with __GFP_NOFAIL, but I hit a warning in
> > mm/page_alloc.c:
> > WARN_ON_ONCE((gfp_flags & __GFP_NOFAIL) && (order > 1));
> >
> > What error code do you think I should return?
> > EAGAIN, ERESTARTSYS, or something else?
> >
> > May I have your suggestion? Thanks!
> 
> What happens as far as errno is concerned when an application makes a
> read() call against a (say TCP) socket associated with a connection
> which has been reset? 
I suppose it is ECONNRESET (Connection reset by peer).

>  Is it limited to those errno values listed in the
> read() manpage, or does it end-up getting an errno value from those
> listed in the recv() manpage?  Or, perhaps even one not (presently)
> listed in either?
> 
> rick jones

Actually "man read/write" says "Other errors may occur, depending on the
object connected to fd".

"man send/recv" indeed lists ENOMEM.

Considering AF_HYPERV is a new socket type, ENOMEM seems OK to me
and I'm going to post a new version of the patch.

In the long run, I think we should add a new API in the VMBus driver,
allowing data copy from VMBus ringbuffer into user mode buffer directly.
This way, we can even eliminate this temporary buffer.
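
From the application's point of view, this means a read()/recv() caller
should be prepared for errno values beyond the ones it sees most often. A
minimal user-space sketch, assuming a hypothetical helper recv_once() that
retries only on EINTR and hands every other error (ENOMEM, ECONNRESET, ...)
back to the caller:

```c
#include <assert.h>
#include <errno.h>
#include <sys/socket.h>
#include <sys/types.h>

/*
 * Hypothetical helper: returns the number of bytes received, 0 on EOF,
 * or a negative errno value on failure.
 */
static ssize_t recv_once(int fd, void *buf, size_t len)
{
	ssize_t n;

	assert(buf);
	do {
		n = recv(fd, buf, len, 0);
	} while (n < 0 && errno == EINTR);	/* retry only on signal interruption */

	if (n < 0)
		return -errno;	/* e.g. -ENOMEM or -ECONNRESET: report, don't spin */
	return n;
}
```

Whether the caller then retries, backs off, or closes the socket on -ENOMEM
is a policy decision left to the application.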

Thanks,
-- Dexuan


RE: [PATCH v12 net-next 1/1] hv_sock: introduce Hyper-V Sockets

2016-06-28 Thread Dexuan Cui
> From: David Miller [mailto:da...@davemloft.net]
> Sent: Tuesday, June 28, 2016 21:45
> To: Dexuan Cui <de...@microsoft.com>
> Cc: gre...@linuxfoundation.org; netdev@vger.kernel.org; linux-
> ker...@vger.kernel.org; de...@linuxdriverproject.org; o...@aepfle.de;
> a...@canonical.com; jasow...@redhat.com; vkuzn...@redhat.com;
> cav...@redhat.com; KY Srinivasan <k...@microsoft.com>; Haiyang Zhang
> <haiya...@microsoft.com>; j...@perches.com
> Subject: Re: [PATCH v12 net-next 1/1] hv_sock: introduce Hyper-V Sockets
> 
> From: Dexuan Cui <de...@microsoft.com>
> Date: Tue, 28 Jun 2016 09:59:21 +
> 
> > The idea here is: IMO the syscalls sys_read()/write() shouldn't return
> > -ENOMEM, so I have to make sure the buffer allocation succeeds?
> 
> You have to fail if resources cannot be allocated.

OK, I'll try to fix this, probably by returning -EAGAIN or -ERESTARTSYS.

I'll report back ASAP.

Thanks,
-- Dexuan


RE: [PATCH v12 net-next 1/1] hv_sock: introduce Hyper-V Sockets

2016-06-28 Thread Dexuan Cui
> From: David Miller [mailto:da...@davemloft.net]
> Sent: Tuesday, June 28, 2016 17:34
> To: Dexuan Cui <de...@microsoft.com>
> Cc: gre...@linuxfoundation.org; netdev@vger.kernel.org; linux-
> ker...@vger.kernel.org; de...@linuxdriverproject.org; o...@aepfle.de;
> a...@canonical.com; jasow...@redhat.com; vkuzn...@redhat.com;
> cav...@redhat.com; KY Srinivasan <k...@microsoft.com>; Haiyang Zhang
> <haiya...@microsoft.com>; j...@perches.com
> Subject: Re: [PATCH v12 net-next 1/1] hv_sock: introduce Hyper-V Sockets
> 
> From: Dexuan Cui <de...@microsoft.com>
> Date: Fri, 24 Jun 2016 07:45:24 +
> 
> > +   while ((ret = vmalloc(size)) == NULL)
> > +   ssleep(1);
> 
> This is completely, and entirely, unacceptable.
> 
> If the allocation fails, you return an error and release
> your resources.
> 
> You don't just loop forever waiting for it to succeed.

Hi David,
I agree this is ugly...

The idea here is: IMO the syscalls sys_read()/write() shouldn't return
-ENOMEM, so I have to make sure the buffer allocation succeeds?

I tried to use kmalloc with __GFP_NOFAIL, but I hit a warning in
mm/page_alloc.c:
WARN_ON_ONCE((gfp_flags & __GFP_NOFAIL) && (order > 1));

What error code do you think I should return? 
EAGAIN, ERESTARTSYS, or something else?

May I have your suggestion? Thanks!

-- Dexuan



[PATCH v12 net-next 1/1] hv_sock: introduce Hyper-V Sockets

2016-06-24 Thread Dexuan Cui
Hyper-V Sockets (hv_sock) supplies a byte-stream based communication
mechanism between the host and the guest. It's somewhat like TCP over
VMBus, but the transportation layer (VMBus) is much simpler than IP.

With Hyper-V Sockets, applications between the host and the guest can talk
to each other directly by the traditional BSD-style socket APIs.

Hyper-V Sockets is only available on new Windows hosts, like Windows Server
2016. More info is in this article "Make your own integration services":
https://msdn.microsoft.com/en-us/virtualization/hyperv_on_windows/develop/make_mgmt_service

The patch implements the necessary support in the guest side by introducing
a new socket address family AF_HYPERV.

Signed-off-by: Dexuan Cui <de...@microsoft.com>
Cc: "K. Y. Srinivasan" <k...@microsoft.com>
Cc: Haiyang Zhang <haiya...@microsoft.com>
Cc: Vitaly Kuznetsov <vkuzn...@redhat.com>
Cc: Cathy Avery <cav...@redhat.com>
---

You can also get the patch here:
https://github.com/dcui/linux/commits/decui/hv_sock/net-next/20160620_v12

For the change log before v12, please see https://lkml.org/lkml/2016/5/15/31


In v12, the changes are mainly the following:

1) remove the module params as David suggested.

2) use 5 exact pages for VMBus send/recv rings, respectively.
The host side's design of the feature requires 5 exact pages for recv/send
rings respectively -- this is suboptimal considering memory consumption,
however unluckily we have to live with it, before the host comes up with
a new design in the future. :-(

3) remove the per-connection static send/recv buffers
Instead, we allocate and free the buffers dynamically only when we recv/send
data. This means: when a connection is idle, no memory is consumed as
recv/send buffers at all.

Looking forward to your comments!

 MAINTAINERS |2 +
 include/linux/hyperv.h  |   14 +
 include/linux/socket.h  |4 +-
 include/net/af_hvsock.h |   59 ++
 include/uapi/linux/hyperv.h |   25 +
 net/Kconfig |1 +
 net/Makefile|1 +
 net/hv_sock/Kconfig |   10 +
 net/hv_sock/Makefile|3 +
 net/hv_sock/af_hvsock.c | 1514 +++
 10 files changed, 1632 insertions(+), 1 deletion(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 50f69ba..6eaa26f 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -5514,7 +5514,9 @@ F:drivers/pci/host/pci-hyperv.c
 F: drivers/net/hyperv/
 F: drivers/scsi/storvsc_drv.c
 F: drivers/video/fbdev/hyperv_fb.c
+F: net/hv_sock/
 F: include/linux/hyperv.h
+F: include/net/af_hvsock.h
 F: tools/hv/
 F: Documentation/ABI/stable/sysfs-bus-vmbus
 
diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h
index 50f493e..95d159e 100644
--- a/include/linux/hyperv.h
+++ b/include/linux/hyperv.h
@@ -1509,4 +1509,18 @@ static inline void commit_rd_index(struct vmbus_channel *channel)
 }
 
 
+struct vmpipe_proto_header {
+   u32 pkt_type;
+   u32 data_size;
+};
+
+#define HVSOCK_HEADER_LEN  (sizeof(struct vmpacket_descriptor) + \
+sizeof(struct vmpipe_proto_header))
+
+/* See 'prev_indices' in hv_ringbuffer_read(), hv_ringbuffer_write() */
+#define PREV_INDICES_LEN   (sizeof(u64))
+
+#define HVSOCK_PKT_LEN(payload_len)(HVSOCK_HEADER_LEN + \
+   ALIGN((payload_len), 8) + \
+   PREV_INDICES_LEN)
 #endif /* _HYPERV_H */
diff --git a/include/linux/socket.h b/include/linux/socket.h
index b5cc5a6..0b68b58 100644
--- a/include/linux/socket.h
+++ b/include/linux/socket.h
@@ -202,8 +202,9 @@ struct ucred {
 #define AF_VSOCK   40  /* vSockets */
 #define AF_KCM 41  /* Kernel Connection Multiplexor*/
 #define AF_QIPCRTR 42  /* Qualcomm IPC Router  */
+#define AF_HYPERV  43  /* Hyper-V Sockets  */
 
-#define AF_MAX 43  /* For now.. */
+#define AF_MAX 44  /* For now.. */
 
 /* Protocol families, same as address families. */
 #define PF_UNSPEC  AF_UNSPEC
@@ -251,6 +252,7 @@ struct ucred {
 #define PF_VSOCK   AF_VSOCK
 #define PF_KCM AF_KCM
 #define PF_QIPCRTR AF_QIPCRTR
+#define PF_HYPERV  AF_HYPERV
 #define PF_MAX AF_MAX
 
 /* Maximum queue length specifiable by listen.  */
diff --git a/include/net/af_hvsock.h b/include/net/af_hvsock.h
new file mode 100644
index 000..20d23d5
--- /dev/null
+++ b/include/net/af_hvsock.h
@@ -0,0 +1,59 @@
+#ifndef __AF_HVSOCK_H__
+#define __AF_HVSOCK_H__
+
+#include 
+#include 
+#include 
+
+/* The host side's design of the feature requires 5 exact pages for recv/send
+ * rings respectively -- this is suboptimal considering memory consumption,
+ * however unluckily we have to live with it, before the host comes up with
+ * a better new design in the future.
+ */
+#define R

[PATCH v12 net-next 0/1] introduce Hyper-V VM Sockets(hv_sock)

2016-06-24 Thread Dexuan Cui
By default, 1024 sockets take about 1024 * (3+2+1+1) * 4KB = 28M bytes.
(Here 1+1 means 1 page for send/recv buffers per connection, respectively.)

3) implement the TODO in hvsock_shutdown().

4) fix a bug in hvsock_close_connection():
   I remove "sk->sk_socket->state = SS_UNCONNECTED;" -- actually this line
is not really useful. For a connection triggered by a host app's connect(),
sk->sk_socket remains NULL before the connection is accepted by the server
app (in Linux VM): see hvsock_accept() -> hvsock_accept_wait() ->
sock_graft(connected, newsock). If the host app exits before the server
app's accept() returns, the host can send a rescind-message to close the
connection and later in the Linux VM's message handler
(i.e. vmbus_onoffer_rescind()), Linux will get a NULL de-referencing crash.

5) fix a bug in hvsock_open_connection()
  I move the vmbus_set_chn_rescind_callback() to a later place, because
when vmbus_open() fails, hvsock_close_connection() can do nothing and we
count on vmbus_onoffer_rescind() -> vmbus_device_unregister() to clean up
the device.

6) some stylistic modifications.


Changes since v11:
1) remove the module params as David suggested.

2) use 5 exact pages for VMBus send/recv rings, respectively.
The host side's design of the feature requires 5 exact pages for recv/send
rings respectively -- this is suboptimal considering memory consumption,
however unluckily we have to live with it, before the host comes up with
a new design in the future. :-(

3) remove the per-connection static send/recv buffers
Instead, we allocate and free the buffers dynamically only when we recv/send
data. This means: when a connection is idle, no memory is consumed as
recv/send buffers at all.

Dexuan Cui (1):
  hv_sock: introduce Hyper-V Sockets

 MAINTAINERS |2 +
 include/linux/hyperv.h  |   14 +
 include/linux/socket.h  |4 +-
 include/net/af_hvsock.h |   59 ++
 include/uapi/linux/hyperv.h |   25 +
 net/Kconfig |1 +
 net/Makefile|1 +
 net/hv_sock/Kconfig |   10 +
 net/hv_sock/Makefile|3 +
 net/hv_sock/af_hvsock.c | 1514 +++
 10 files changed, 1632 insertions(+), 1 deletion(-)
 create mode 100644 include/net/af_hvsock.h
 create mode 100644 net/hv_sock/Kconfig
 create mode 100644 net/hv_sock/Makefile
 create mode 100644 net/hv_sock/af_hvsock.c

-- 
2.1.0



RE: [PATCH v11 net-next 0/1] introduce Hyper-V VM Sockets(hv_sock)

2016-05-18 Thread Dexuan Cui
> From: David Miller [mailto:da...@davemloft.net]
> Sent: Thursday, May 19, 2016 12:13
> To: Dexuan Cui <de...@microsoft.com>
> Cc: KY Srinivasan <k...@microsoft.com>; o...@aepfle.de;
> gre...@linuxfoundation.org; jasow...@redhat.com; linux-
> ker...@vger.kernel.org; j...@perches.com; netdev@vger.kernel.org;
> a...@canonical.com; de...@linuxdriverproject.org; Haiyang Zhang
> <haiya...@microsoft.com>
> Subject: Re: [PATCH v11 net-next 0/1] introduce Hyper-V VM Sockets(hv_sock)
> 
> 
> I'm travelling and very busy with the merge window.  So sorry I won't be able
> to think about this for some time.

David, 
Sure, I understand.

Please let me recap my last mail:

1) I'll replace my statically-allocated per-connection "send/recv bufs" with
dynamically allocated ones, so no buf is used when there is no traffic.

2) Another kind of bufs, i.e., the multi-page "VMBus send/recv ringbuffer", is
a must IMO due to the host side's design of the feature: every connection needs
its own ringbuffer, which takes several pages (2~3 pages at least. And, 5 pages
should suffice for good performance). The ringbuffer can be accessed by the
host at any time, so IMO the pages can't be swappable.

I understand net-next is closed now. I'm going to post the next version
after 4.7-rc1 is out in several weeks.

If you could give me some suggestions, I would be very happy to take them.

Thanks!
-- Dexuan


RE: [PATCH v11 net-next 0/1] introduce Hyper-V VM Sockets(hv_sock)

2016-05-18 Thread Dexuan Cui
> From: devel [mailto:driverdev-devel-boun...@linuxdriverproject.org] On Behalf
> Of Dexuan Cui
> Sent: Tuesday, May 17, 2016 10:46
> To: David Miller <da...@davemloft.net>
> Cc: o...@aepfle.de; gre...@linuxfoundation.org; jasow...@redhat.com;
> linux-ker...@vger.kernel.org; j...@perches.com; netdev@vger.kernel.org;
> a...@canonical.com; de...@linuxdriverproject.org; Haiyang Zhang
> <haiya...@microsoft.com>
> Subject: RE: [PATCH v11 net-next 0/1] introduce Hyper-V VM Sockets(hv_sock)
> 
> > From: David Miller [mailto:da...@davemloft.net]
> > Sent: Monday, May 16, 2016 1:16
> > To: Dexuan Cui <de...@microsoft.com>
> > Cc: gre...@linuxfoundation.org; netdev@vger.kernel.org; linux-
> > ker...@vger.kernel.org; de...@linuxdriverproject.org; o...@aepfle.de;
> > a...@canonical.com; jasow...@redhat.com; cav...@redhat.com; KY
> > Srinivasan <k...@microsoft.com>; Haiyang Zhang <haiya...@microsoft.com>;
> > j...@perches.com; vkuzn...@redhat.com
> > Subject: Re: [PATCH v11 net-next 0/1] introduce Hyper-V VM Sockets(hv_sock)
> >
> > From: Dexuan Cui <de...@microsoft.com>
> > Date: Sun, 15 May 2016 09:52:42 -0700
> >
> > > Changes since v10
> > >
> > > 1) add module params: send_ring_page, recv_ring_page. They can be used to
> > > enlarge the ringbuffer size to get better performance, e.g.,
> > > # modprobe hv_sock  recv_ring_page=16 send_ring_page=16
> > > By default, recv_ring_page is 3 and send_ring_page is 2.
> > >
> > > 2) add module param max_socket_number (the default is 1024).
> > > A user can enlarge the number to create more than 1024 hv_sock sockets.
> > > By default, 1024 sockets take about 1024 * (3+2+1+1) * 4KB = 28M bytes.
> > > (Here 1+1 means 1 page for send/recv buffers per connection, respectively.)
> >
> > This is papering around my objections, and creates module parameters which
> > I am fundamentally against.
> >
> > You're making the facility unusable by default, just to work around my
> > memory consumption concerns.
> >
> > What will end up happening is that everyone will simply increase the
> > values.
> >
> > You're not really addressing the core issue, and I will be ignoring your
> > future submissions of this change until you do.
> 
> David,
> I am sorry I came across as ignoring your feedback; that was not my intention.
> The current host side design for this feature is such that each socket
> connection needs its own channel, which consists of
> 
> 1. A ring buffer for host to guest communication
> 2. A ring buffer for guest to host communication
> 
> The memory for the ring buffers has to be pinned down as this will be accessed
> both from interrupt level in Linux guest and from the host OS at any time.
> 
> To address your concerns, I am planning to re-implement both the receive path
> and the send path so that no additional pinned memory will be needed.
> 
> Receive Path:
> When the application does a read on the socket, we will dynamically allocate
> the buffer and perform the read operation on the incoming ring buffer. Since
> we will be in the process context, we can sleep here and will set the
> "GFP_KERNEL | __GFP_NOFAIL" flags. This buffer will be freed once the
> application consumes all the data.
> 
> Send Path:
> On the send side, we will construct the payload to be sent directly on the
> outgoing ringbuffer.
> 
> So, with these changes, the only memory that will be pinned down will be the
> memory for the ring buffers on a per-connection basis and this memory will be
> pinned down until the connection is torn down.
> 
> Please let me know if this addresses your concerns.
> 
> -- Dexuan

Hi David,
Ping. I would really appreciate your comments.

 Thanks,
-- Dexuan


RE: [PATCH v11 net-next 0/1] introduce Hyper-V VM Sockets(hv_sock)

2016-05-16 Thread Dexuan Cui
> From: David Miller [mailto:da...@davemloft.net]
> Sent: Monday, May 16, 2016 1:16
> To: Dexuan Cui <de...@microsoft.com>
> Cc: gre...@linuxfoundation.org; netdev@vger.kernel.org; linux-
> ker...@vger.kernel.org; de...@linuxdriverproject.org; o...@aepfle.de;
> a...@canonical.com; jasow...@redhat.com; cav...@redhat.com; KY
> Srinivasan <k...@microsoft.com>; Haiyang Zhang <haiya...@microsoft.com>;
> j...@perches.com; vkuzn...@redhat.com
> Subject: Re: [PATCH v11 net-next 0/1] introduce Hyper-V VM Sockets(hv_sock)
>
> From: Dexuan Cui <de...@microsoft.com>
> Date: Sun, 15 May 2016 09:52:42 -0700
>
> > Changes since v10
> >
> > 1) add module params: send_ring_page, recv_ring_page. They can be used to
> > enlarge the ringbuffer size to get better performance, e.g.,
> > # modprobe hv_sock  recv_ring_page=16 send_ring_page=16
> > By default, recv_ring_page is 3 and send_ring_page is 2.
> >
> > 2) add module param max_socket_number (the default is 1024).
> > A user can enlarge the number to create more than 1024 hv_sock sockets.
> > By default, 1024 sockets take about 1024 * (3+2+1+1) * 4KB = 28M bytes.
> > (Here 1+1 means 1 page for send/recv buffers per connection, respectively.)
>
> This is papering around my objections, and creates module parameters which
> I am fundamentally against.
>
> You're making the facility unusable by default, just to work around my
> memory consumption concerns.
>
> What will end up happening is that everyone will simply increase the
> values.
>
> You're not really addressing the core issue, and I will be ignoring your
> future submissions of this change until you do.

David,
I am sorry I came across as ignoring your feedback; that was not my intention.
The current host side design for this feature is such that each socket
connection needs its own channel, which consists of

1. A ring buffer for host to guest communication
2. A ring buffer for guest to host communication

The memory for the ring buffers has to be pinned down as this will be accessed
both from interrupt level in Linux guest and from the host OS at any time.

To address your concerns, I am planning to re-implement both the receive path
and the send path so that no additional pinned memory will be needed.

Receive Path:
When the application does a read on the socket, we will dynamically allocate
the buffer and perform the read operation on the incoming ring buffer. Since
we will be in the process context, we can sleep here and will set the
"GFP_KERNEL | __GFP_NOFAIL" flags. This buffer will be freed once the
application consumes all the data.

Send Path:
On the send side, we will construct the payload to be sent directly on the
outgoing ringbuffer.

So, with these changes, the only memory that will be pinned down will be the
memory for the ring buffers on a per-connection basis and this memory will be
pinned down until the connection is torn down.

Please let me know if this addresses your concerns.

 Thanks,
-- Dexuan



[PATCH v11 net-next 0/1] introduce Hyper-V VM Sockets(hv_sock)

2016-05-15 Thread Dexuan Cui
By default, 1024 sockets take about 1024 * (3+2+1+1) * 4KB = 28M bytes.
(Here 1+1 means 1 page for send/recv buffers per connection, respectively.)

3) implement the TODO in hvsock_shutdown().

4) fix a bug in hvsock_close_connection():
   I remove "sk->sk_socket->state = SS_UNCONNECTED;" -- actually this line
is not really useful. For a connection triggered by a host app’s connect(),
sk->sk_socket remains NULL before the connection is accepted by the server
app (in Linux VM): see hvsock_accept() -> hvsock_accept_wait() ->
sock_graft(connected, newsock). If the host app exits before the server
app’s accept() returns, the host can send a rescind-message to close the
connection and later in the Linux VM’s message handler
(i.e. vmbus_onoffer_rescind()), Linux will get a NULL dereference crash.

5) fix a bug in hvsock_open_connection()
  I move the vmbus_set_chn_rescind_callback() to a later place, because
when vmbus_open() fails, hvsock_close_connection() can do nothing and we
count on vmbus_onoffer_rescind() -> vmbus_device_unregister() to clean up
the device.

6) some stylistic modifications.

Dexuan Cui (1):
  hv_sock: introduce Hyper-V Sockets

 MAINTAINERS                 |    2 +
 include/linux/hyperv.h      |   14 +
 include/linux/socket.h      |    4 +-
 include/net/af_hvsock.h     |   78 +++
 include/uapi/linux/hyperv.h |   25 +
 net/Kconfig                 |    1 +
 net/Makefile                |    1 +
 net/hv_sock/Kconfig         |   10 +
 net/hv_sock/Makefile        |    3 +
 net/hv_sock/af_hvsock.c     | 1520 +++
 10 files changed, 1657 insertions(+), 1 deletion(-)
 create mode 100644 include/net/af_hvsock.h
 create mode 100644 net/hv_sock/Kconfig
 create mode 100644 net/hv_sock/Makefile
 create mode 100644 net/hv_sock/af_hvsock.c

-- 
2.7.4



[PATCH v11 net-next 1/1] hv_sock: introduce Hyper-V Sockets

2016-05-15 Thread Dexuan Cui
Hyper-V Sockets (hv_sock) supplies a byte-stream based communication
mechanism between the host and the guest. It's somewhat like TCP over
VMBus, but the transportation layer (VMBus) is much simpler than IP.

With Hyper-V Sockets, applications between the host and the guest can talk
to each other directly by the traditional BSD-style socket APIs.

Hyper-V Sockets is only available on new Windows hosts, like Windows Server
2016. More info is in this article "Make your own integration services":
https://msdn.microsoft.com/en-us/virtualization/hyperv_on_windows/develop/make_mgmt_service

The patch implements the necessary support in the guest side by introducing
a new socket address family AF_HYPERV.

Signed-off-by: Dexuan Cui <de...@microsoft.com>
Cc: "K. Y. Srinivasan" <k...@microsoft.com>
Cc: Haiyang Zhang <haiya...@microsoft.com>
Cc: Vitaly Kuznetsov <vkuzn...@redhat.com>
Cc: Cathy Avery <cav...@redhat.com>
---

You can also get the patch on this branch:
https://github.com/dcui/linux/commits/decui/hv_sock/net-next/20160515_v11

For the change log before v10, please see https://lkml.org/lkml/2016/5/4/532

In v10, the main changes consist of
1) minimize struct hvsock_sock by making the send/recv buffers pointers.
   the buffers are allocated by kmalloc() in __hvsock_create().
2) minimize the sizes of the send/recv buffers and the vmbus ringbuffers.


In v11, the changes are:
1) add module params: send_ring_page, recv_ring_page. They can be used to
enlarge the ringbuffer size to get better performance, e.g.,
# modprobe hv_sock  recv_ring_page=16 send_ring_page=16
By default, recv_ring_page is 3 and send_ring_page is 2.

2) add module param max_socket_number (the default is 1024).
A user can enlarge the number to create more than 1024 hv_sock sockets.
By default, 1024 sockets take about 1024 * (3+2+1+1) * 4KB = 28M bytes.
(Here 1+1 means 1 page for send/recv buffers per connection, respectively.)

3) implement the TODO in hvsock_shutdown().

4) fix a bug in hvsock_close_connection():
   I remove "sk->sk_socket->state = SS_UNCONNECTED;" -- actually this line
is not really useful. For a connection triggered by a host app’s connect(),
sk->sk_socket remains NULL before the connection is accepted by the server
app (in Linux VM): see hvsock_accept() -> hvsock_accept_wait() ->
sock_graft(connected, newsock). If the host app exits before the server
app’s accept() returns, the host can send a rescind-message to close the
connection and later in the Linux VM’s message handler
(i.e. vmbus_onoffer_rescind()), Linux will get a NULL dereference crash.

5) fix a bug in hvsock_open_connection()
  I move the vmbus_set_chn_rescind_callback() to a later place, because
when vmbus_open() fails, hvsock_close_connection() can do nothing and we
count on vmbus_onoffer_rescind() -> vmbus_device_unregister() to clean up
the device.

6) some stylistic modifications.
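
As a quick check of the memory figure in item 2), here is a small sketch that
computes the footprint from the three module parameters. The function name
hvsock_pinned_bytes() and the 4KB page size are assumptions made for
illustration; only the parameter names and the formula come from the notes
above.

```c
#define SKETCH_PAGE_SIZE 4096ULL  /* assumed 4KB pages */

/*
 * Bytes used by 'sockets' connections: recv ring + send ring, plus one
 * page each for the send and recv buffers (the "1+1" in the formula).
 */
static unsigned long long hvsock_pinned_bytes(unsigned long long sockets,
					      unsigned long long recv_ring_page,
					      unsigned long long send_ring_page)
{
	unsigned long long pages_per_sock =
		recv_ring_page + send_ring_page + 1 + 1;

	return sockets * pages_per_sock * SKETCH_PAGE_SIZE;
}
```

With the defaults (1024, 3, 2) this gives 28 * 1024 * 1024 bytes. With
recv_ring_page=16 and send_ring_page=16, as in the modprobe example, the same
1024 sockets would take 136 * 1024 * 1024 bytes.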


 MAINTAINERS                 |    2 +
 include/linux/hyperv.h      |   14 +
 include/linux/socket.h      |    4 +-
 include/net/af_hvsock.h     |   78 +++
 include/uapi/linux/hyperv.h |   25 +
 net/Kconfig                 |    1 +
 net/Makefile                |    1 +
 net/hv_sock/Kconfig         |   10 +
 net/hv_sock/Makefile        |    3 +
 net/hv_sock/af_hvsock.c     | 1520 +++
 10 files changed, 1657 insertions(+), 1 deletion(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index b57df66..c9fe2c6 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -5271,7 +5271,9 @@ F:drivers/pci/host/pci-hyperv.c
 F: drivers/net/hyperv/
 F: drivers/scsi/storvsc_drv.c
 F: drivers/video/fbdev/hyperv_fb.c
+F: net/hv_sock/
 F: include/linux/hyperv.h
+F: include/net/af_hvsock.h
 F: tools/hv/
 F: Documentation/ABI/stable/sysfs-bus-vmbus
 
diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h
index aa0fadc..7be7237 100644
--- a/include/linux/hyperv.h
+++ b/include/linux/hyperv.h
@@ -1338,4 +1338,18 @@ extern __u32 vmbus_proto_version;
 
 int vmbus_send_tl_connect_request(const uuid_le *shv_guest_servie_id,
  const uuid_le *shv_host_servie_id);
+struct vmpipe_proto_header {
+   u32 pkt_type;
+   u32 data_size;
+};
+
+#define HVSOCK_HEADER_LEN  (sizeof(struct vmpacket_descriptor) + \
+sizeof(struct vmpipe_proto_header))
+
+/* See 'prev_indices' in hv_ringbuffer_read(), hv_ringbuffer_write() */
+#define PREV_INDICES_LEN   (sizeof(u64))
+
+#define HVSOCK_PKT_LEN(payload_len)   (HVSOCK_HEADER_LEN + \
+   ALIGN((payload_len), 8) + \
+   PREV_INDICES_LEN)
 #endif /* _HYPERV_H */
diff --git a/include/linux/socket.h b/include/linux/socket.h
index b5cc5a6..0b68b58 100644
--- a/include/linux/socket.h
+++ b/include/linux/socket.h
@@ -202,8 +202,9 @@ struct ucred {
 #define AF_VSOCK   40

[PATCH v10 net-next 1/1] hv_sock: introduce Hyper-V Sockets

2016-05-11 Thread Dexuan Cui
Hyper-V Sockets (hv_sock) supplies a byte-stream based communication
mechanism between the host and the guest. It's somewhat like TCP over
VMBus, but the transportation layer (VMBus) is much simpler than IP.

With Hyper-V Sockets, applications between the host and the guest can talk
to each other directly by the traditional BSD-style socket APIs.

Hyper-V Sockets is only available on new Windows hosts, like Windows Server
2016. More info is in this article "Make your own integration services":
https://msdn.microsoft.com/en-us/virtualization/hyperv_on_windows/develop/make_mgmt_service

The patch implements the necessary support in the guest side by introducing
a new socket address family AF_HYPERV.

Signed-off-by: Dexuan Cui <de...@microsoft.com>
Cc: "K. Y. Srinivasan" <k...@microsoft.com>
Cc: Haiyang Zhang <haiya...@microsoft.com>
Cc: Vitaly Kuznetsov <vkuzn...@redhat.com>
Cc: Cathy Avery <cav...@redhat.com>
---

You can also get the patch on this branch:
https://github.com/dcui/linux/commits/decui/hv_sock/net-next/20160512_v10

For the change log before v10, please see https://lkml.org/lkml/2016/5/4/532

In v10, the main changes consist of
1) minimize struct hvsock_sock by making the send/recv buffers pointers.
   the buffers are allocated by kmalloc() in __hvsock_create().

2) minimize the sizes of the send/recv buffers and the VMBus ringbuffers.
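
For item 2), the lower bound on the receive ringbuffer can be checked with a
little arithmetic. The sketch below mirrors the HVSOCK_PKT_LEN() macro from
this patch in user space. The two *_sketch structs are assumptions: the first
is meant to match the 16-byte layout of struct vmpacket_descriptor, which is
not shown in this patch, and the helper names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u  /* assumed 4KB pages */

/* Assumed to mirror the kernel's struct vmpacket_descriptor (16 bytes). */
struct vmpacket_descriptor_sketch {
	uint16_t type;
	uint16_t offset8;
	uint16_t len8;
	uint16_t flags;
	uint64_t trans_id;
};

/* Same fields as struct vmpipe_proto_header in this patch. */
struct vmpipe_proto_header_sketch {
	uint32_t pkt_type;
	uint32_t data_size;
};

/* Round n up to a multiple of 8, like ALIGN(n, 8). */
static size_t align8(size_t n)
{
	return (n + 7) & ~(size_t)7;
}

/* Headers + aligned payload + the trailing 8-byte 'prev_indices' field. */
static size_t hvsock_pkt_len(size_t payload_len)
{
	return sizeof(struct vmpacket_descriptor_sketch) +
	       sizeof(struct vmpipe_proto_header_sketch) +
	       align8(payload_len) +
	       sizeof(uint64_t);
}

/* One page for the shared read/write indices, the rest for data. */
static size_t min_ring_pages(size_t max_payload)
{
	size_t data_pages = (hvsock_pkt_len(max_payload) +
			     SKETCH_PAGE_SIZE - 1) / SKETCH_PAGE_SIZE;

	return 1 + data_pages;
}
```

Under these assumptions a 4096-byte payload needs 16 + 8 + 4096 + 8 = 4128
bytes in the ring, which no longer fits in one data page, so the receive ring
needs at least 3 pages in total.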

 MAINTAINERS                 |    2 +
 include/linux/hyperv.h      |   14 +
 include/linux/socket.h      |    4 +-
 include/net/af_hvsock.h     |   78 +++
 include/uapi/linux/hyperv.h |   25 +
 net/Kconfig                 |    1 +
 net/Makefile                |    1 +
 net/hv_sock/Kconfig         |   10 +
 net/hv_sock/Makefile        |    3 +
 net/hv_sock/af_hvsock.c     | 1484 +++
 10 files changed, 1621 insertions(+), 1 deletion(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index b57df66..c9fe2c6 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -5271,7 +5271,9 @@ F:drivers/pci/host/pci-hyperv.c
 F: drivers/net/hyperv/
 F: drivers/scsi/storvsc_drv.c
 F: drivers/video/fbdev/hyperv_fb.c
+F: net/hv_sock/
 F: include/linux/hyperv.h
+F: include/net/af_hvsock.h
 F: tools/hv/
 F: Documentation/ABI/stable/sysfs-bus-vmbus
 
diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h
index aa0fadc..7be7237 100644
--- a/include/linux/hyperv.h
+++ b/include/linux/hyperv.h
@@ -1338,4 +1338,18 @@ extern __u32 vmbus_proto_version;
 
 int vmbus_send_tl_connect_request(const uuid_le *shv_guest_servie_id,
  const uuid_le *shv_host_servie_id);
+struct vmpipe_proto_header {
+   u32 pkt_type;
+   u32 data_size;
+};
+
+#define HVSOCK_HEADER_LEN  (sizeof(struct vmpacket_descriptor) + \
+sizeof(struct vmpipe_proto_header))
+
+/* See 'prev_indices' in hv_ringbuffer_read(), hv_ringbuffer_write() */
+#define PREV_INDICES_LEN   (sizeof(u64))
+
+#define HVSOCK_PKT_LEN(payload_len)   (HVSOCK_HEADER_LEN + \
+   ALIGN((payload_len), 8) + \
+   PREV_INDICES_LEN)
 #endif /* _HYPERV_H */
diff --git a/include/linux/socket.h b/include/linux/socket.h
index b5cc5a6..0b68b58 100644
--- a/include/linux/socket.h
+++ b/include/linux/socket.h
@@ -202,8 +202,9 @@ struct ucred {
 #define AF_VSOCK   40  /* vSockets */
 #define AF_KCM 41  /* Kernel Connection Multiplexor*/
 #define AF_QIPCRTR 42  /* Qualcomm IPC Router  */
+#define AF_HYPERV  43  /* Hyper-V Sockets  */
 
-#define AF_MAX 43  /* For now.. */
+#define AF_MAX 44  /* For now.. */
 
 /* Protocol families, same as address families. */
 #define PF_UNSPEC  AF_UNSPEC
@@ -251,6 +252,7 @@ struct ucred {
 #define PF_VSOCK   AF_VSOCK
 #define PF_KCM AF_KCM
 #define PF_QIPCRTR AF_QIPCRTR
+#define PF_HYPERV  AF_HYPERV
 #define PF_MAX AF_MAX
 
 /* Maximum queue length specifiable by listen.  */
diff --git a/include/net/af_hvsock.h b/include/net/af_hvsock.h
new file mode 100644
index 0000000..e002397
--- /dev/null
+++ b/include/net/af_hvsock.h
@@ -0,0 +1,78 @@
+#ifndef __AF_HVSOCK_H__
+#define __AF_HVSOCK_H__
+
+#include 
+#include 
+#include 
+
+/* Note: 3-page is the minimal recv ringbuffer size:
+ *
+ * the 1st page is used as the shared read/write index etc, rather than data:
+ * see hv_ringbuffer_init();
+ *
+ * the payload length in the vmbus pipe message received from the host can
+ * be 4096 bytes, and considering the header of HVSOCK_HEADER_LEN bytes, we
+ * need at least 2 extra pages for ringbuffer data.
+ */
+#define HVSOCK_RCV_BUF_SZ  PAGE_SIZE
+#define VMBUS_RINGBUFFER_SIZE_HVSOCK_RCV (3 * PAGE_SIZE)
+
+/* As to send, here let's make sure the hvsock_send_buf struct can be held in 1
+ * page, and since we want to use 2 pages for the send rin

[PATCH v10 net-next 0/1] introduce Hyper-V VM Sockets(hv_sock)

2016-05-11 Thread Dexuan Cui
Hyper-V Sockets (hv_sock) supplies a byte-stream based communication
mechanism between the host and the guest. It's somewhat like TCP over
VMBus, but the transportation layer (VMBus) is much simpler than IP.

With Hyper-V Sockets, applications between the host and the guest can talk
to each other directly by the traditional BSD-style socket APIs.

Hyper-V Sockets is only available on new Windows hosts, like Windows Server
2016. More info is in this article "Make your own integration services":
https://msdn.microsoft.com/en-us/virtualization/hyperv_on_windows/develop/make_mgmt_service

The patch implements the necessary support in the guest side by
introducing a new socket address family AF_HYPERV.

You can also get the patch by:
https://github.com/dcui/linux/commits/decui/hv_sock/net-next/20160512_v10

Note: the VMBus driver side's supporting patches have been in the mainline
tree.

I know the kernel already has a VM Sockets driver (AF_VSOCK) based
on VMware VMCI (net/vmw_vsock/, drivers/misc/vmw_vmci), and KVM is
proposing a virtio version of AF_VSOCK:
http://marc.info/?l=linux-netdev&m=145952064004765&w=2

However, though Hyper-V Sockets may seem conceptually similar to
AF_VSOCK, there are differences in the transportation layer, and IMO these
make direct code reuse impractical:

1. In AF_VSOCK, the endpoint type is: <u32 ContextID, u32 Port>, but in
AF_HYPERV, the endpoint type is: <GUID VM_ID, GUID ServiceID>. Here GUID
is 128-bit.

2. AF_VSOCK supports SOCK_DGRAM, while AF_HYPERV doesn't.

3. AF_VSOCK supports some special sock opts, like SO_VM_SOCKETS_BUFFER_SIZE,
SO_VM_SOCKETS_BUFFER_MIN/MAX_SIZE and SO_VM_SOCKETS_CONNECT_TIMEOUT.
These are meaningless to AF_HYPERV.

4. Some of AF_VSOCK's VMCI transportation ops are meaningless to AF_HYPERV/VMBus,
like .notify_recv_init
.notify_recv_pre_block
.notify_recv_pre_dequeue
.notify_recv_post_dequeue
.notify_send_init
.notify_send_pre_block
.notify_send_pre_enqueue
.notify_send_post_enqueue
etc.

So I think we'd better introduce a new address family: AF_HYPERV.
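
To illustrate difference 1 above, here is a rough user-space sketch of what an
AF_HYPERV endpoint address could look like. Everything in it is an assumption
for illustration: the struct and field names are hypothetical, and the value
43 is simply the one this series proposes for AF_HYPERV, so none of this
should be read as an existing header.

```c
#include <stdint.h>
#include <string.h>

#define AF_HYPERV_SKETCH 43  /* value proposed in this series (assumption) */

/* A GUID is 128 bits, i.e. 16 bytes. */
struct guid_sketch {
	uint8_t b[16];
};

/* Hypothetical endpoint address: a pair of GUIDs instead of two u32s. */
struct sockaddr_hv_sketch {
	uint16_t shv_family;                /* address family */
	uint16_t reserved;                  /* kept zero in this sketch */
	struct guid_sketch shv_vm_id;       /* which VM */
	struct guid_sketch shv_service_id;  /* which service */
};

/* Fill in an address from a VM GUID and a service GUID. */
static void hv_addr_init(struct sockaddr_hv_sketch *addr,
			 const struct guid_sketch *vm_id,
			 const struct guid_sketch *service_id)
{
	memset(addr, 0, sizeof(*addr));
	addr->shv_family = AF_HYPERV_SKETCH;
	addr->shv_vm_id = *vm_id;
	addr->shv_service_id = *service_id;
}
```

An address like this would then be handed to the usual bind() or connect()
calls, in the same way an AF_VSOCK address carries its two 32-bit values.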

Please review the patch.

Looking forward to your comments, especially comments from David. :-)

Changes since v1:
- updated "[PATCH 6/7] hvsock: introduce Hyper-V VM Sockets feature"
- added __init and __exit for the module init/exit functions
- net/hv_sock/Kconfig: "default m" -> "default m if HYPERV"
- MODULE_LICENSE: "Dual MIT/GPL" -> "Dual BSD/GPL"

Changes since v2:
- fixed various coding issues pointed out by David Miller
- fixed indentation issues
- removed pr_debug in net/hv_sock/af_hvsock.c
- used reverse-Christmas-tree style for local variables.
- EXPORT_SYMBOL -> EXPORT_SYMBOL_GPL

Changes since v3:
- fixed a few coding issues pointed out by Vitaly Kuznetsov and Dan Carpenter
- fixed the ret value in vmbus_recvpacket_hvsock on error
- fixed the style of multi-line comment: vmbus_get_hvsock_rw_status()

Changes since v4 (https://lkml.org/lkml/2015/7/28/404):
- addressed all the comments about V4.
- treat the hvsock offers/channels as special VMBus devices
- add a mechanism to pass hvsock events to the hvsock driver
- fixed some corner cases with proper locking when a connection is closed
- rebased to the latest Greg's tree

Changes since v5 (https://lkml.org/lkml/2015/12/24/103):
- addressed the coding style issues (Vitaly Kuznetsov & David Miller, thanks!)
- used a better coding for the per-channel rescind callback (Thanks, Vitaly!)
- avoided the introduction of new VMBUS driver APIs vmbus_sendpacket_hvsock()
and vmbus_recvpacket_hvsock() and used vmbus_sendpacket()/vmbus_recvpacket()
in the higher level (i.e., the vmsock driver). Thanks, Vitaly!

Changes since v6 (http://lkml.iu.edu/hypermail/linux/kernel/1601.3/01813.html)
- only a few minor changes of coding style and comments

Changes since v7
- a few minor changes of coding style: thanks, Joe Perches!
- added some lines of comments about GUID/UUID before the struct sockaddr_hv.

Changes since v8
- removed the unnecessary __packed for some definitions: thanks, David!
- hvsock_open_connection: use offer.u.pipe.user_def[0] to know the connection
direction and reorganized the function
- reorganized the code according to suggestions from Cathy Avery: split big
functions into small ones, set .setsockopt and .getsockopt to
sock_no_setsockopt/sock_no_getsockopt
- inline'd some small list helper functions

Changes since v9
- minimized struct hvsock_sock by making the send/recv buffers pointers.
   the buffers are allocated by kmalloc() in __hvsock_create() now.
- minimized the sizes of the send/recv buffers and the vmbus ringbuffers.


Dexuan Cui (1):
  hv_sock: introduce Hyper-V Sockets

 MAINTAINERS                 |    2 +
 include/linux/hyperv.h      |   14 +
 include/linux/socket.h      |    4 +-
 include/net/af_hvsock.h     |   78 +++
 include/uapi/linux/hyperv.h |   25 +
 net/Kconfig                 |    1 +
 net/Makefile                |    1 +
 net/hv_sock/Kconfig         |   10 +

RE: [PATCH v9 net-next 1/2] hv_sock: introduce Hyper-V Sockets

2016-05-09 Thread Dexuan Cui
> From: David Miller [mailto:da...@davemloft.net]
> Sent: Monday, May 9, 2016 1:45
> To: Dexuan Cui <de...@microsoft.com>
> Cc: gre...@linuxfoundation.org; netdev@vger.kernel.org; linux-
> ker...@vger.kernel.org; de...@linuxdriverproject.org; o...@aepfle.de;
> a...@canonical.com; jasow...@redhat.com; cav...@redhat.com; KY
> Srinivasan <k...@microsoft.com>; Haiyang Zhang <haiya...@microsoft.com>;
> j...@perches.com; vkuzn...@redhat.com
> Subject: Re: [PATCH v9 net-next 1/2] hv_sock: introduce Hyper-V Sockets
>
> From: Dexuan Cui <de...@microsoft.com>
> Date: Sun, 8 May 2016 06:11:04 +
>
> > Thanks for pointing this out!
> > I understand, so I think I should add a module parameter, e.g.,
> > "hv_sock.max_socket_number" with a default value, say, 1024?
>
> No, you should get rid of the huge multi-page buffers.

Hi David,
Ok, how do you like the below proof-of-concept patch snippet?

I use 1 page for the recv buf and another page for send buf. They should be
allocated by kmalloc(sizeof(struct hvsock_send/recv_buf), GFP_KERNEL).

And, by default, I use 2 pages for VMBUS send/recv ringbuffers respectively.
(Note: 2 is the minimal ringbuffer size because actually 1 page of the two is
used as the shared read/write index etc, rather than data)
A module parameter will be added to allow the user to use a big ringbuffer
size, if the user cares too much about the performance.

Another parameter will be added to limit how many hvsock sockets can be
created at most. The default value can be 1024, meaning at most
1024 * (2+2+1+1) * 4KB = 24MB memory is used.

-#define VMBUS_RINGBUFFER_SIZE_HVSOCK_RECV (5 * PAGE_SIZE)
-#define VMBUS_RINGBUFFER_SIZE_HVSOCK_SEND (5 * PAGE_SIZE)
+#define VMBUS_RINGBUFFER_SIZE_HVSOCK_RECV (2 * PAGE_SIZE)
+#define VMBUS_RINGBUFFER_SIZE_HVSOCK_SEND (2 * PAGE_SIZE)

-#define HVSOCK_RCV_BUF_SZ  VMBUS_RINGBUFFER_SIZE_HVSOCK_RECV
+#define HVSOCK_RCV_BUF_SZ  PAGE_SIZE
 #define HVSOCK_SND_BUF_SZ  PAGE_SIZE

+struct hvsock_send_buf {
+   struct vmpipe_proto_header hdr;
+   u8 buf[HVSOCK_SND_BUF_SZ];
+};
+
+struct hvsock_recv_buf {
+   struct vmpipe_proto_header hdr;
+   u8 buf[HVSOCK_RCV_BUF_SZ];
+
+   unsigned int data_len;
+   unsigned int data_offset;
+};
+
@@ -35,21 +48,8 @@ struct hvsock_sock {

struct vmbus_channel *channel;

-   struct {
-   struct vmpipe_proto_header hdr;
-   u8 buf[HVSOCK_SND_BUF_SZ];
-   } send;
-
-   struct {
-   struct vmpipe_proto_header hdr;
-   u8 buf[HVSOCK_RCV_BUF_SZ];
-
-   unsigned int data_len;
-   unsigned int data_offset;
-   } recv;
+   struct hvsock_send_buf *send_buf;
+   struct hvsock_recv_buf *recv_buf;
 };

Thanks,
-- Dexuan


RE: [PATCH v9 net-next 1/2] hv_sock: introduce Hyper-V Sockets

2016-05-08 Thread Dexuan Cui
> From: David Miller [mailto:da...@davemloft.net]
> Sent: Sunday, May 8, 2016 1:41
> To: Dexuan Cui <de...@microsoft.com>
> Cc: gre...@linuxfoundation.org; netdev@vger.kernel.org; linux-
> ker...@vger.kernel.org; de...@linuxdriverproject.org; o...@aepfle.de;
> a...@canonical.com; jasow...@redhat.com; cav...@redhat.com; KY
> Srinivasan <k...@microsoft.com>; Haiyang Zhang <haiya...@microsoft.com>;
> j...@perches.com; vkuzn...@redhat.com
> Subject: Re: [PATCH v9 net-next 1/2] hv_sock: introduce Hyper-V Sockets
>
> From: Dexuan Cui <de...@microsoft.com>
> Date: Sat, 7 May 2016 10:49:25 +
>
> > I should be able to make 'send', 'recv' here to pointers and use vmalloc()
> > to allocate the memory for them.  I will do this.
>
> That's still unswappable kernel memory.
Hi David,
My understanding is: kernel pages are not swappable in Linux, so it looks like I
can't avoid unswappable kernel memory here?

> People can open N sockets, where N is something on the order of the FD
> limit the process has, per process.  This allows someone to quickly
> eat up a lot of memory and hold onto it nearly indefinitely.

Thanks for pointing this out!
I understand, so I think I should add a module parameter, e.g.,
"hv_sock.max_socket_number" with a default value, say, 1024?

1 established hv_sock connection takes less than 20 pages, including 10
pages for VMBus ringbuffer, 6 pages for send/recv buffers (I'll use
vmalloc() for this), etc.
Here the recv buf needs a size of 5 pages because potentially the host
can send the guest a VMBus packet with an up-to-5-page payload, i.e.,
the VMBus inbound ringbuffer size.

1024 hv_sock connections take less than 20*4KB * 1K = 80MB memory.

A user who needs more connections can change the module parameter
without reboot.

An hv_sock connection is designed to work only between the host and the
guest. I think 1024 connections should be plenty.

BTW, a user can't create hv_sock connections without enough privilege.
Please see

+static int hvsock_create(struct net *net, struct socket *sock,
+int protocol, int kern)
+{
+   if (!capable(CAP_SYS_ADMIN) && !capable(CAP_NET_ADMIN))
+   return -EPERM;

David, does this make sense to you?

Thanks,
-- Dexuan


RE: [PATCH v9 net-next 1/2] hv_sock: introduce Hyper-V Sockets

2016-05-07 Thread Dexuan Cui
> From: David Miller [mailto:da...@davemloft.net]
> Sent: Saturday, May 7, 2016 1:04
> To: Dexuan Cui <de...@microsoft.com>
> Cc: gre...@linuxfoundation.org; netdev@vger.kernel.org; linux-
> ker...@vger.kernel.org; de...@linuxdriverproject.org; o...@aepfle.de;
> a...@canonical.com; jasow...@redhat.com; cav...@redhat.com; KY
> Srinivasan <k...@microsoft.com>; Haiyang Zhang <haiya...@microsoft.com>;
> j...@perches.com; vkuzn...@redhat.com
> Subject: Re: [PATCH v9 net-next 1/2] hv_sock: introduce Hyper-V Sockets
> 
> From: Dexuan Cui <de...@microsoft.com>
> Date: Wed,  4 May 2016 09:56:57 -0700
> 
> > +#define VMBUS_RINGBUFFER_SIZE_HVSOCK_RECV (5 * PAGE_SIZE)
> > +#define VMBUS_RINGBUFFER_SIZE_HVSOCK_SEND (5 * PAGE_SIZE)
> > +
> > +#define HVSOCK_RCV_BUF_SZ  VMBUS_RINGBUFFER_SIZE_HVSOCK_RECV
>  ...
> > +struct hvsock_sock {
>  ...
> > +   /* The 'hdr' and 'buf' in the below 'send' and 'recv' definitions must
> > +* be consecutive: see hvsock_send_data() and hvsock_recv_data().
> > +*/
> > +   struct {
> > +   struct vmpipe_proto_header hdr;
> > +   u8 buf[HVSOCK_SND_BUF_SZ];
> > +   } send;
> > +
> > +   struct {
> > +   struct vmpipe_proto_header hdr;
> > +   u8 buf[HVSOCK_RCV_BUF_SZ];
> > +
> > +   unsigned int data_len;
> > +   unsigned int data_offset;
> > +   } recv;
> 
> I don't think allocating 5 pages of unswappable memory for every Hyper-V
> socket created is reasonable.

Thanks for the comment, David!

I should be able to turn 'send' and 'recv' here into pointers and use vmalloc()
to allocate the memory for them.  I will do this.

Thanks,
-- Dexuan


[PATCH v9 net-next 0/2] introduce Hyper-V VM Sockets(hv_sock)

2016-05-04 Thread Dexuan Cui
Hyper-V Sockets (hv_sock) supplies a byte-stream based communication
mechanism between the host and the guest. It's somewhat like TCP over
VMBus, but the transportation layer (VMBus) is much simpler than IP.

With Hyper-V Sockets, applications between the host and the guest can talk
to each other directly by the traditional BSD-style socket APIs.

Hyper-V Sockets is only available on new Windows hosts, like Windows Server
2016. More info is in this article "Make your own integration services":
https://msdn.microsoft.com/en-us/virtualization/hyperv_on_windows/develop/make_mgmt_service

The patch implements the necessary support in the guest side by
introducing a new socket address family AF_HYPERV.

You can also get the patch by:
https://github.com/dcui/linux/commits/decui/hv_sock/net-next/20160502_v09

Note: the VMBus driver side's supporting patches have been in the mainline
tree.

I know the kernel already has a VM Sockets driver (AF_VSOCK) based
on VMware VMCI (net/vmw_vsock/, drivers/misc/vmw_vmci), and KVM is
proposing a virtio version of AF_VSOCK:
http://marc.info/?l=linux-netdev&m=145952064004765&w=2

However, though Hyper-V Sockets may seem conceptually similar to
AF_VSOCK, there are differences in the transportation layer, and IMO these
make direct code reuse impractical:

1. In AF_VSOCK, the endpoint type is: <u32 ContextID, u32 Port>, but in
AF_HYPERV, the endpoint type is: <GUID VM_ID, GUID ServiceID>. Here GUID
is 128-bit.

2. AF_VSOCK supports SOCK_DGRAM, while AF_HYPERV doesn't.

3. AF_VSOCK supports some special sock opts, like SO_VM_SOCKETS_BUFFER_SIZE,
SO_VM_SOCKETS_BUFFER_MIN/MAX_SIZE and SO_VM_SOCKETS_CONNECT_TIMEOUT.
These are meaningless to AF_HYPERV.

4. Some of AF_VSOCK's VMCI transportation ops are meaningless to AF_HYPERV/VMBus,
like .notify_recv_init
.notify_recv_pre_block
.notify_recv_pre_dequeue
.notify_recv_post_dequeue
.notify_send_init
.notify_send_pre_block
.notify_send_pre_enqueue
.notify_send_post_enqueue
etc.

So I think we'd better introduce a new address family: AF_HYPERV.

Please review the patch.

Looking forward to your comments, especially comments from David. :-)

Changes since v1:
- updated "[PATCH 6/7] hvsock: introduce Hyper-V VM Sockets feature"
- added __init and __exit for the module init/exit functions
- net/hv_sock/Kconfig: "default m" -> "default m if HYPERV"
- MODULE_LICENSE: "Dual MIT/GPL" -> "Dual BSD/GPL"

Changes since v2:
- fixed various coding issues pointed out by David Miller
- fixed indentation issues
- removed pr_debug in net/hv_sock/af_hvsock.c
- used reverse-Christmas-tree style for local variables.
- EXPORT_SYMBOL -> EXPORT_SYMBOL_GPL

Changes since v3:
- fixed a few coding issues pointed out by Vitaly Kuznetsov and Dan Carpenter
- fixed the ret value in vmbus_recvpacket_hvsock on error
- fixed the style of multi-line comment: vmbus_get_hvsock_rw_status()

Changes since v4 (https://lkml.org/lkml/2015/7/28/404):
- addressed all the comments about V4.
- treat the hvsock offers/channels as special VMBus devices
- add a mechanism to pass hvsock events to the hvsock driver
- fixed some corner cases with proper locking when a connection is closed
- rebased to the latest Greg's tree

Changes since v5 (https://lkml.org/lkml/2015/12/24/103):
- addressed the coding style issues (Vitaly Kuznetsov & David Miller, thanks!)
- used a better coding for the per-channel rescind callback (Thanks, Vitaly!)
- avoided the introduction of new VMBUS driver APIs vmbus_sendpacket_hvsock()
and vmbus_recvpacket_hvsock() and used vmbus_sendpacket()/vmbus_recvpacket()
in the higher level (i.e., the vmsock driver). Thanks, Vitaly!

Changes since v6 (http://lkml.iu.edu/hypermail/linux/kernel/1601.3/01813.html)
- only a few minor changes of coding style and comments

Changes since v7
- a few minor changes of coding style: thanks, Joe Perches!
- added some lines of comments about GUID/UUID before the struct sockaddr_hv.

Changes since v8
- removed the unnecessary __packed for some definitions: thanks, David!
- hvsock_open_connection: use offer.u.pipe.user_def[0] to know the connection
direction and reorganized the function
- reorganized the code according to suggestions from Cathy Avery: split big
functions into small ones, set .setsockopt and .getsockopt to
sock_no_setsockopt/sock_no_getsockopt
- inline'd some small list helper functions

Dexuan Cui (2):
  hv_sock: introduce Hyper-V Sockets
  net: add the AF_HYPERV entries to family name tables

 MAINTAINERS                 |    2 +
 include/linux/hyperv.h      |   16 +
 include/linux/socket.h      |    5 +-
 include/net/af_hvsock.h     |   55 ++
 include/uapi/linux/hyperv.h |   25 +
 net/Kconfig                 |    1 +
 net/Makefile                |    1 +
 net/core/sock.c             |    6 +-
 net/hv_sock/Kconfig         |   10 +
 net/hv_sock/Makefile        |    3 +
 net/hv_sock/af_hvsock.c     | 1434 +++
 11 files changed, 1553 insertions(+), 5 deletions(-)
 create mo

[PATCH v9 net-next 1/2] hv_sock: introduce Hyper-V Sockets

2016-05-04 Thread Dexuan Cui
Hyper-V Sockets (hv_sock) supplies a byte-stream based communication
mechanism between the host and the guest. It's somewhat like TCP over
VMBus, but the transportation layer (VMBus) is much simpler than IP.

With Hyper-V Sockets, applications between the host and the guest can talk
to each other directly by the traditional BSD-style socket APIs.

Hyper-V Sockets is only available on new Windows hosts, like Windows Server
2016. More info is in this article "Make your own integration services":
https://msdn.microsoft.com/en-us/virtualization/hyperv_on_windows/develop/make_mgmt_service

The patch implements the necessary support in the guest side by introducing
a new socket address family AF_HYPERV.

Signed-off-by: Dexuan Cui <de...@microsoft.com>
Cc: "K. Y. Srinivasan" <k...@microsoft.com>
Cc: Haiyang Zhang <haiya...@microsoft.com>
Cc: Vitaly Kuznetsov <vkuzn...@redhat.com>
Cc: Cathy Avery <cav...@redhat.com>
---
 MAINTAINERS                 |    2 +
 include/linux/hyperv.h      |   16 +
 include/linux/socket.h      |    5 +-
 include/net/af_hvsock.h     |   55 ++
 include/uapi/linux/hyperv.h |   25 +
 net/Kconfig                 |    1 +
 net/Makefile                |    1 +
 net/hv_sock/Kconfig         |   10 +
 net/hv_sock/Makefile        |    3 +
 net/hv_sock/af_hvsock.c     | 1434 +++
 10 files changed, 1550 insertions(+), 2 deletions(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index fa02825..b32716f 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -5268,7 +5268,9 @@ F:drivers/pci/host/pci-hyperv.c
 F: drivers/net/hyperv/
 F: drivers/scsi/storvsc_drv.c
 F: drivers/video/fbdev/hyperv_fb.c
+F: net/hv_sock/
 F: include/linux/hyperv.h
+F: include/net/af_hvsock.h
 F: tools/hv/
 F: Documentation/ABI/stable/sysfs-bus-vmbus
 
diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h
index aa0fadc..e756719 100644
--- a/include/linux/hyperv.h
+++ b/include/linux/hyperv.h
@@ -1338,4 +1338,20 @@ extern __u32 vmbus_proto_version;
 
 int vmbus_send_tl_connect_request(const uuid_le *shv_guest_servie_id,
  const uuid_le *shv_host_servie_id);
+struct vmpipe_proto_header {
+   u32 pkt_type;
+   u32 data_size;
+};
+
+#define HVSOCK_HEADER_LEN  (sizeof(struct vmpacket_descriptor) + \
+sizeof(struct vmpipe_proto_header))
+
+/* See 'prev_indices' in hv_ringbuffer_read(), hv_ringbuffer_write() */
+#define PREV_INDICES_LEN   (sizeof(u64))
+
+#define HVSOCK_PKT_LEN(payload_len)   (HVSOCK_HEADER_LEN + \
+   ALIGN((payload_len), 8) + \
+   PREV_INDICES_LEN)
+#define HVSOCK_MIN_PKT_LEN HVSOCK_PKT_LEN(1)
+
 #endif /* _HYPERV_H */
diff --git a/include/linux/socket.h b/include/linux/socket.h
index 73bf6c6..88b1ccd 100644
--- a/include/linux/socket.h
+++ b/include/linux/socket.h
@@ -201,8 +201,8 @@ struct ucred {
 #define AF_NFC 39  /* NFC sockets  */
 #define AF_VSOCK   40  /* vSockets */
 #define AF_KCM 41  /* Kernel Connection Multiplexor*/
-
-#define AF_MAX 42  /* For now.. */
+#define AF_HYPERV  42  /* Hyper-V Sockets  */
+#define AF_MAX 43  /* For now.. */
 
 /* Protocol families, same as address families. */
 #define PF_UNSPEC  AF_UNSPEC
@@ -249,6 +249,7 @@ struct ucred {
 #define PF_NFC AF_NFC
 #define PF_VSOCK   AF_VSOCK
 #define PF_KCM AF_KCM
+#define PF_HYPERV  AF_HYPERV
 #define PF_MAX AF_MAX
 
 /* Maximum queue length specifiable by listen.  */
diff --git a/include/net/af_hvsock.h b/include/net/af_hvsock.h
new file mode 100644
index 0000000..04bc40c
--- /dev/null
+++ b/include/net/af_hvsock.h
@@ -0,0 +1,55 @@
+#ifndef __AF_HVSOCK_H__
+#define __AF_HVSOCK_H__
+
+#include 
+#include 
+#include 
+
+#define VMBUS_RINGBUFFER_SIZE_HVSOCK_RECV (5 * PAGE_SIZE)
+#define VMBUS_RINGBUFFER_SIZE_HVSOCK_SEND (5 * PAGE_SIZE)
+
+#define HVSOCK_RCV_BUF_SZ  VMBUS_RINGBUFFER_SIZE_HVSOCK_RECV
+#define HVSOCK_SND_BUF_SZ  PAGE_SIZE
+
+#define sk_to_hvsock(__sk)     ((struct hvsock_sock *)(__sk))
+#define hvsock_to_sk(__hvsk)   ((struct sock *)(__hvsk))
+
+struct hvsock_sock {
+   /* sk must be the first member. */
+   struct sock sk;
+
+   struct sockaddr_hv local_addr;
+   struct sockaddr_hv remote_addr;
+
+   /* protected by the global hvsock_mutex */
+   struct list_head bound_list;
+   struct list_head connected_list;
+
+   struct list_head accept_queue;
+   /* used by enqueue and dequeue */
+   struct mutex accept_queue_mutex;
+
+   struct delayed_work dwork;
+
+   u32 peer_shutdown;
+
+   struct vmbus_channel *channel;
+
+   /* The 'hdr' and 'buf' in the below 'send' and 'recv' definitions must
+* be consecutive: see hvsock_send_data() 

[PATCH v9 net-next 2/2] net: add the AF_HYPERV entries to family name tables

2016-05-04 Thread Dexuan Cui
This is for the hv_sock driver, which introduces AF_HYPERV(42).

Signed-off-by: Dexuan Cui <de...@microsoft.com>
Cc: "K. Y. Srinivasan" <k...@microsoft.com>
Cc: Haiyang Zhang <haiya...@microsoft.com>
Cc: Vitaly Kuznetsov <vkuzn...@redhat.com>
Cc: Cathy Avery <cav...@redhat.com>
---
 net/core/sock.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/net/core/sock.c b/net/core/sock.c
index e16a5db..c0884c7 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -222,7 +222,7 @@ static const char *const af_family_key_strings[AF_MAX+1] = {
   "sk_lock-AF_RXRPC" , "sk_lock-AF_ISDN" , "sk_lock-AF_PHONET"   ,
   "sk_lock-AF_IEEE802154", "sk_lock-AF_CAIF" , "sk_lock-AF_ALG"  ,
   "sk_lock-AF_NFC"   , "sk_lock-AF_VSOCK", "sk_lock-AF_KCM"  ,
-  "sk_lock-AF_MAX"
+  "sk_lock-AF_HYPERV", "sk_lock-AF_MAX"
 };
 static const char *const af_family_slock_key_strings[AF_MAX+1] = {
   "slock-AF_UNSPEC", "slock-AF_UNIX" , "slock-AF_INET" ,
@@ -239,7 +239,7 @@ static const char *const af_family_slock_key_strings[AF_MAX+1] = {
   "slock-AF_RXRPC" , "slock-AF_ISDN" , "slock-AF_PHONET"   ,
   "slock-AF_IEEE802154", "slock-AF_CAIF" , "slock-AF_ALG"  ,
   "slock-AF_NFC"   , "slock-AF_VSOCK","slock-AF_KCM"   ,
-  "slock-AF_MAX"
+  "slock-AF_HYPERV", "slock-AF_MAX"
 };
 static const char *const af_family_clock_key_strings[AF_MAX+1] = {
   "clock-AF_UNSPEC", "clock-AF_UNIX" , "clock-AF_INET" ,
@@ -256,7 +256,7 @@ static const char *const af_family_clock_key_strings[AF_MAX+1] = {
   "clock-AF_RXRPC" , "clock-AF_ISDN" , "clock-AF_PHONET"   ,
   "clock-AF_IEEE802154", "clock-AF_CAIF" , "clock-AF_ALG"  ,
   "clock-AF_NFC"   , "clock-AF_VSOCK", "clock-AF_KCM"  ,
-  "clock-AF_MAX"
+  "clock-AF_HYPERV", "clock-AF_MAX"
 };
 
 /*
-- 
2.1.0



RE: [PATCH v8 net-next 1/1] hv_sock: introduce Hyper-V Sockets

2016-04-26 Thread Dexuan Cui
> From: Cathy Avery [mailto:cav...@redhat.com]
> Sent: Wednesday, April 27, 2016 0:19
> To: Dexuan Cui <de...@microsoft.com>; gre...@linuxfoundation.org;
> da...@davemloft.net; netdev@vger.kernel.org; linux-ker...@vger.kernel.org;
> de...@linuxdriverproject.org; o...@aepfle.de; Jason Wang
> <jasow...@redhat.com>; KY Srinivasan <k...@microsoft.com>; Haiyang Zhang
> <haiya...@microsoft.com>; vkuzn...@redhat.com; j...@perches.com
> Subject: Re: [PATCH v8 net-next 1/1] hv_sock: introduce Hyper-V Sockets
> 
> Hi,
> 
> I will be working with Dexuan to possibly port this functionality into RHEL.
> 
> Here are my initial comments. Mostly stylistic. They are prefaced by CAA.
> 
> Cathy Avery

Thank you very much, Cathy!

I'll take your very good suggestions and post a new version.

Thanks,
-- Dexuan


RE: [PATCH v8 net-next 1/1] hv_sock: introduce Hyper-V Sockets

2016-04-13 Thread Dexuan Cui
> From: David Miller [mailto:da...@davemloft.net]
> Sent: Thursday, April 14, 2016 10:30
> To: Dexuan Cui <de...@microsoft.com>
> Cc: gre...@linuxfoundation.org; netdev@vger.kernel.org; linux-
> ker...@vger.kernel.org; de...@linuxdriverproject.org; o...@aepfle.de;
> a...@canonical.com; jasow...@redhat.com; cav...@redhat.com; KY
> Srinivasan <k...@microsoft.com>; Haiyang Zhang <haiya...@microsoft.com>;
> j...@perches.com; vkuzn...@redhat.com
> Subject: Re: [PATCH v8 net-next 1/1] hv_sock: introduce Hyper-V Sockets
> 
> From: Dexuan Cui <de...@microsoft.com>
> Date: Thu,  7 Apr 2016 18:36:51 -0700
> 
> > +struct vmpipe_proto_header {
> > +   u32 pkt_type;
> > +   u32 data_size;
> > +} __packed;
> 
> There is no reason to specify __packed here.
> 
> The types are strongly sized to word aligned quantities.
> No holes are possible in this structure, nor is any padding
> possible either.
> 
> Do not ever slap __packed onto protocol or HW defined structures,
> simply just define them properly with proper types and explicit
> padding when necessary.

Hi David,
Thank you very much for taking a look at the patch!
I'll remove all 3 __packed usages in my patch.
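
The point about __packed can be checked in userspace. The sketch below is only an illustration, not part of the patch: the `u32` typedef and the struct name `vmpipe_proto_header_plain` are stand-ins assumed here, and the checks need a C11 compiler. It shows that two 32-bit members leave no hole and no tail padding, so the attribute changes nothing about the layout.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Userspace stand-in for the kernel's u32 (assumption for illustration). */
typedef uint32_t u32;

/* Same members as vmpipe_proto_header, but without __packed. */
struct vmpipe_proto_header_plain {
	u32 pkt_type;
	u32 data_size;
};

/* Compile-time checks: no hole between the members, no tail padding. */
static_assert(offsetof(struct vmpipe_proto_header_plain, data_size) ==
	      sizeof(u32), "unexpected hole before data_size");
static_assert(sizeof(struct vmpipe_proto_header_plain) == 2 * sizeof(u32),
	      "unexpected tail padding");

/* Number of padding bytes the compiler added to the struct. */
static size_t header_padding_bytes(void)
{
	return sizeof(struct vmpipe_proto_header_plain) - 2 * sizeof(u32);
}
```

If either static_assert fired, explicit padding members would be the fix suggested in the review, rather than __packed.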

> > +   struct {
> > +   struct vmpipe_proto_header hdr;
> > +   char buf[HVSOCK_SND_BUF_SZ];
> > +   } __packed send;
> 
> And so on, and so forth..
I'll remove __packed and use u8 to replace the 'char' here.
 
> I'm really disappointed that I couldn't even get one hunk into this
> patch submission without finding a major problem.
David,
Could you please point out more issues in the patch? 
I'm definitely happy to fix them. :-)
 
> I expect this patch to take several more iterations before I can even
> come close to applying it.  So please set your expectations properly,
> and also it seems like nobody else wants to even review this stuff
> either.  It is you who needs to find a way to change all of this, not
> me.

A few people took a look at the early versions of the patch and did
give me good suggestions on the interface APIs with VMBus and
some coding style issues, especially Vitaly from Red Hat.

Cathy from Red Hat was also looking into the patch recently and
gave me some good feedback.

I'll try to invite more people to review the patch.

And, I'm updating the patch to address some issues:

1) the feature is only properly supported on Windows 10/2016
build 14290 and later, so I'm not going to enable the feature on
older hosts.

2) there is actually some mechanism we can use to simplify 
hvsock_open_connection() and help to better support 
hvsock_shutdown().

Thanks,
-- Dexuan


RE: [PATCH v8 net-next 1/1] hv_sock: introduce Hyper-V Sockets

2016-04-07 Thread Dexuan Cui
> From: Joe Perches [mailto:j...@perches.com]
> Sent: Friday, April 8, 2016 9:15
> On Thu, 2016-04-07 at 18:36 -0700, Dexuan Cui wrote:
> > diff --git a/include/net/af_hvsock.h b/include/net/af_hvsock.h
> []
> > +#define VMBUS_RINGBUFFER_SIZE_HVSOCK_RECV (5 * PAGE_SIZE)
> > +#define VMBUS_RINGBUFFER_SIZE_HVSOCK_SEND (5 * PAGE_SIZE)
> > +
> > +#define HVSOCK_RCV_BUF_SZ
>   VMBUS_RINGBUFFER_SIZE_HVSOCK_RECV
> > +#define HVSOCK_SND_BUF_SZ  PAGE_SIZE
> []
> > +struct hvsock_sock {
> []
> > +   struct {
> > +   struct vmpipe_proto_header hdr;
> > +   char buf[HVSOCK_SND_BUF_SZ];
> > +   } __packed send;
> > +
> > +   struct {
> > +   struct vmpipe_proto_header hdr;
> > +   char buf[HVSOCK_RCV_BUF_SZ];
> > +   unsigned int data_len;
> > +   unsigned int data_offset;
> > +   } __packed recv;
> > +};
> 
> These bufs are not page aligned and so can span pages.
> 
> Is there any value in allocating these bufs separately
> as pages instead of as a kmalloc?

The bufs are not required to be page aligned.
Here the 'hdr' and the 'buf' must be consecutive, i.e., the 'buf' must be
an array rather than a pointer: please see hvsock_send_data().

It looks to me like there is no big value in making sure the 'buf' is page
aligned: on x86_64, at least, it should already be 8-byte aligned due to the
adjacent channel pointer, so memcpy_from_msg() should work
well enough, and in hvsock_send_data() -> vmbus_sendpacket(),
we don't copy the 'buf'.
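
The "hdr and buf must be consecutive" requirement can be sketched in userspace as follows. This is a minimal illustration under stated assumptions, not the driver code: `SND_BUF_SZ` is assumed to be 4096 (a stand-in for HVSOCK_SND_BUF_SZ with a 4 KiB PAGE_SIZE), and `struct send_area` and `fill_send_area()` are hypothetical names. Because the payload array sits directly behind the header, header plus payload form one contiguous region that can be handed to a send routine with a single pointer and length.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for HVSOCK_SND_BUF_SZ; assumes a 4 KiB PAGE_SIZE. */
#define SND_BUF_SZ 4096

struct proto_header {
	uint32_t pkt_type;
	uint32_t data_size;
};

/* The payload is an embedded array, not a pointer, so it follows hdr. */
struct send_area {
	struct proto_header hdr;
	uint8_t buf[SND_BUF_SZ];
};

static_assert(offsetof(struct send_area, buf) == sizeof(struct proto_header),
	      "buf must directly follow hdr");

/*
 * Fill the header and copy the payload in behind it.
 * Returns the length of the contiguous message starting at &s->hdr,
 * or 0 if the payload does not fit.
 */
static size_t fill_send_area(struct send_area *s, uint32_t type,
			     const void *payload, size_t len)
{
	if (len > sizeof(s->buf))
		return 0;

	s->hdr.pkt_type = type;
	s->hdr.data_size = (uint32_t)len;
	memcpy(s->buf, payload, len);

	return sizeof(s->hdr) + len;
}
```

With a pointer member instead of the array, the header and payload would live in separate allocations and could not be sent as one region without an extra copy.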

Thanks,
-- Dexuan


[PATCH v8 net-next 1/1] hv_sock: introduce Hyper-V Sockets

2016-04-07 Thread Dexuan Cui
Hyper-V Sockets (hv_sock) supplies a byte-stream based communication
mechanism between the host and the guest. It's somewhat like TCP over
VMBus, but the transportation layer (VMBus) is much simpler than IP.

With Hyper-V Sockets, applications between the host and the guest can talk
to each other directly by the traditional BSD-style socket APIs.

Hyper-V Sockets is only available on new Windows hosts, like Windows Server
2016. More info is in this article "Make your own integration services":
https://msdn.microsoft.com/en-us/virtualization/hyperv_on_windows/develop/make_mgmt_service

The patch implements the necessary support in the guest side by introducing
a new socket address family AF_HYPERV.

Signed-off-by: Dexuan Cui <de...@microsoft.com>
Cc: "K. Y. Srinivasan" <k...@microsoft.com>
Cc: Haiyang Zhang <haiya...@microsoft.com>
Cc: Vitaly Kuznetsov <vkuzn...@redhat.com>
---
 MAINTAINERS |2 +
 include/linux/hyperv.h  |   16 +
 include/linux/socket.h  |5 +-
 include/net/af_hvsock.h |   51 ++
 include/uapi/linux/hyperv.h |   25 +
 net/Kconfig |1 +
 net/Makefile|1 +
 net/hv_sock/Kconfig |   10 +
 net/hv_sock/Makefile|3 +
 net/hv_sock/af_hvsock.c | 1483 +++
 10 files changed, 1595 insertions(+), 2 deletions(-)
 create mode 100644 include/net/af_hvsock.h
 create mode 100644 net/hv_sock/Kconfig
 create mode 100644 net/hv_sock/Makefile
 create mode 100644 net/hv_sock/af_hvsock.c

diff --git a/MAINTAINERS b/MAINTAINERS
index 67d99dd..7b6f203 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -5267,7 +5267,9 @@ F:drivers/pci/host/pci-hyperv.c
 F: drivers/net/hyperv/
 F: drivers/scsi/storvsc_drv.c
 F: drivers/video/fbdev/hyperv_fb.c
+F: net/hv_sock/
 F: include/linux/hyperv.h
+F: include/net/af_hvsock.h
 F: tools/hv/
 F: Documentation/ABI/stable/sysfs-bus-vmbus
 
diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h
index aa0fadc..b92439d 100644
--- a/include/linux/hyperv.h
+++ b/include/linux/hyperv.h
@@ -1338,4 +1338,20 @@ extern __u32 vmbus_proto_version;
 
 int vmbus_send_tl_connect_request(const uuid_le *shv_guest_servie_id,
  const uuid_le *shv_host_servie_id);
+struct vmpipe_proto_header {
+   u32 pkt_type;
+   u32 data_size;
+} __packed;
+
+#define HVSOCK_HEADER_LEN  (sizeof(struct vmpacket_descriptor) + \
+sizeof(struct vmpipe_proto_header))
+
+/* See 'prev_indices' in hv_ringbuffer_read(), hv_ringbuffer_write() */
+#define PREV_INDICES_LEN   (sizeof(u64))
+
+#define HVSOCK_PKT_LEN(payload_len)(HVSOCK_HEADER_LEN + \
+   ALIGN((payload_len), 8) + \
+   PREV_INDICES_LEN)
+#define HVSOCK_MIN_PKT_LEN HVSOCK_PKT_LEN(1)
+
 #endif /* _HYPERV_H */
diff --git a/include/linux/socket.h b/include/linux/socket.h
index 73bf6c6..88b1ccd 100644
--- a/include/linux/socket.h
+++ b/include/linux/socket.h
@@ -201,8 +201,8 @@ struct ucred {
 #define AF_NFC 39  /* NFC sockets  */
 #define AF_VSOCK   40  /* vSockets */
 #define AF_KCM 41  /* Kernel Connection Multiplexor*/
-
-#define AF_MAX 42  /* For now.. */
+#define AF_HYPERV  42  /* Hyper-V Sockets  */
+#define AF_MAX 43  /* For now.. */
 
 /* Protocol families, same as address families. */
 #define PF_UNSPEC  AF_UNSPEC
@@ -249,6 +249,7 @@ struct ucred {
 #define PF_NFC AF_NFC
 #define PF_VSOCK   AF_VSOCK
 #define PF_KCM AF_KCM
+#define PF_HYPERV  AF_HYPERV
 #define PF_MAX AF_MAX
 
 /* Maximum queue length specifiable by listen.  */
diff --git a/include/net/af_hvsock.h b/include/net/af_hvsock.h
new file mode 100644
index 000..a5aa28d
--- /dev/null
+++ b/include/net/af_hvsock.h
@@ -0,0 +1,51 @@
+#ifndef __AF_HVSOCK_H__
+#define __AF_HVSOCK_H__
+
+#include 
+#include 
+#include 
+
+#define VMBUS_RINGBUFFER_SIZE_HVSOCK_RECV (5 * PAGE_SIZE)
+#define VMBUS_RINGBUFFER_SIZE_HVSOCK_SEND (5 * PAGE_SIZE)
+
+#define HVSOCK_RCV_BUF_SZ  VMBUS_RINGBUFFER_SIZE_HVSOCK_RECV
+#define HVSOCK_SND_BUF_SZ  PAGE_SIZE
+
+#define sk_to_hvsock(__sk)((struct hvsock_sock *)(__sk))
+#define hvsock_to_sk(__hvsk)   ((struct sock *)(__hvsk))
+
+struct hvsock_sock {
+   /* sk must be the first member. */
+   struct sock sk;
+
+   struct sockaddr_hv local_addr;
+   struct sockaddr_hv remote_addr;
+
+   /* protected by the global hvsock_mutex */
+   struct list_head bound_list;
+   struct list_head connected_list;
+
+   struct list_head accept_queue;
+   /* used by enqueue and dequeue */
+   struct mutex accept_queue_mutex;
+
+   struct delayed_work dwork;
+
+   u32 peer_shutdown;
+
+   struct vmbus_channel *channe

[PATCH v8 net-next 0/1] introduce Hyper-V VM Sockets(hv_sock)

2016-04-07 Thread Dexuan Cui
Hyper-V Sockets (hv_sock) supplies a byte-stream based communication
mechanism between the host and the guest. It's somewhat like TCP over
VMBus, but the transportation layer (VMBus) is much simpler than IP.

With Hyper-V Sockets, applications between the host and the guest can talk
to each other directly by the traditional BSD-style socket APIs.

Hyper-V Sockets is only available on new Windows hosts, like Windows Server
2016. More info is in this article "Make your own integration services":
https://msdn.microsoft.com/en-us/virtualization/hyperv_on_windows/develop/make_mgmt_service

The patch implements the necessary support in the guest side by
introducing a new socket address family AF_HYPERV.

Note: the VMBus driver side's supporting patches have been in the mainline
tree.

I know the kernel already has a VM Sockets driver (AF_VSOCK) based
on VMware VMCI (net/vmw_vsock/, drivers/misc/vmw_vmci), and KVM is
proposing a virtio version of AF_VSOCK:
http://marc.info/?l=linux-netdev=145952064004765=2

However, though Hyper-V Sockets may seem conceptually similar to
AF_VSOCK, there are differences in the transportation layer, and IMO these
make direct code reuse impractical:

1. In AF_VSOCK, the endpoint type is: , but in
AF_HYPERV, the endpoint type is: . Here GUID
is 128-bit.

2. AF_VSOCK supports SOCK_DGRAM, while AF_HYPERV doesn't.

3. AF_VSOCK supports some special sock opts, like SO_VM_SOCKETS_BUFFER_SIZE,
SO_VM_SOCKETS_BUFFER_MIN/MAX_SIZE and SO_VM_SOCKETS_CONNECT_TIMEOUT.
These are meaningless to AF_HYPERV.

4. Some of AF_VSOCK's VMCI transportation ops are meaningless to AF_HYPERV/VMBus,
like .notify_recv_init
.notify_recv_pre_block
.notify_recv_pre_dequeue
.notify_recv_post_dequeue
.notify_send_init
.notify_send_pre_block
.notify_send_pre_enqueue
.notify_send_post_enqueue
etc.

So I think we'd better introduce a new address family: AF_HYPERV.

Please review the patch.

Looking forward to your comments!

Changes since v1:
- updated "[PATCH 6/7] hvsock: introduce Hyper-V VM Sockets feature"
- added __init and __exit for the module init/exit functions
- net/hv_sock/Kconfig: "default m" -> "default m if HYPERV"
- MODULE_LICENSE: "Dual MIT/GPL" -> "Dual BSD/GPL"

Changes since v2:
- fixed various coding issues pointed out by David Miller
- fixed indentation issues
- removed pr_debug in net/hv_sock/af_hvsock.c
- used reverse-Christmas-tree style for local variables.
- EXPORT_SYMBOL -> EXPORT_SYMBOL_GPL

Changes since v3:
- fixed a few coding issues pointed out by Vitaly Kuznetsov and Dan Carpenter
- fixed the ret value in vmbus_recvpacket_hvsock on error
- fixed the style of multi-line comment: vmbus_get_hvsock_rw_status()

Changes since v4 (https://lkml.org/lkml/2015/7/28/404):
- addressed all the comments about V4.
- treat the hvsock offers/channels as special VMBus devices
- add a mechanism to pass hvsock events to the hvsock driver
- fixed some corner cases with proper locking when a connection is closed
- rebased to the latest Greg's tree

Changes since v5 (https://lkml.org/lkml/2015/12/24/103):
- addressed the coding style issues (Vitaly Kuznetsov & David Miller, thanks!)
- used better code for the per-channel rescind callback (thanks, Vitaly!)
- avoided the introduction of new VMBUS driver APIs vmbus_sendpacket_hvsock()
and vmbus_recvpacket_hvsock() and used vmbus_sendpacket()/vmbus_recvpacket()
in the higher level (i.e., the vmsock driver). Thanks, Vitaly!

Changes since v6 (http://lkml.iu.edu/hypermail/linux/kernel/1601.3/01813.html)
- only a few minor changes of coding style and comments

Changes since v7
- a few minor changes of coding style: thanks, Joe Perches!
- added some lines of comments about GUID/UUID before the struct sockaddr_hv.

Dexuan Cui (1):
  hv_sock: introduce Hyper-V Sockets

 MAINTAINERS |2 +
 include/linux/hyperv.h  |   16 +
 include/linux/socket.h  |5 +-
 include/net/af_hvsock.h |   51 ++
 include/uapi/linux/hyperv.h |   25 +
 net/Kconfig |1 +
 net/Makefile|1 +
 net/hv_sock/Kconfig |   10 +
 net/hv_sock/Makefile|3 +
 net/hv_sock/af_hvsock.c | 1483 +++
 10 files changed, 1595 insertions(+), 2 deletions(-)
 create mode 100644 include/net/af_hvsock.h
 create mode 100644 net/hv_sock/Kconfig
 create mode 100644 net/hv_sock/Makefile
 create mode 100644 net/hv_sock/af_hvsock.c

-- 
2.1.0



RE: [PATCH v7 net-next 1/1] hv_sock: introduce Hyper-V Sockets

2016-04-07 Thread Dexuan Cui
> From: Joe Perches [mailto:j...@perches.com]
> Sent: Thursday, April 7, 2016 19:30
> To: Dexuan Cui <de...@microsoft.com>; gre...@linuxfoundation.org;
> da...@davemloft.net; netdev@vger.kernel.org; linux-ker...@vger.kernel.org;
> de...@linuxdriverproject.org; o...@aepfle.de; a...@canonical.com;
> jasow...@redhat.com; KY Srinivasan <k...@microsoft.com>; Haiyang Zhang
> <haiya...@microsoft.com>
> Cc: vkuzn...@redhat.com
> Subject: Re: [PATCH v7 net-next 1/1] hv_sock: introduce Hyper-V Sockets
> 
> On Thu, 2016-04-07 at 05:50 -0700, Dexuan Cui wrote:
> > Hyper-V Sockets (hv_sock) supplies a byte-stream based communication
> > mechanism between the host and the guest. It's somewhat like TCP over
> > VMBus, but the transportation layer (VMBus) is much simpler than IP.
> 
> style trivia:
> 
> > diff --git a/net/hv_sock/af_hvsock.c b/net/hv_sock/af_hvsock.c
> []
> > +static struct sock *__hvsock_find_bound_socket(const struct sockaddr_hv
> *addr)
> > +{
> > +   struct hvsock_sock *hvsk;
> > +
> > +   list_for_each_entry(hvsk, _bound_list, bound_list)
> > +   if (uuid_equals(addr->shv_service_id,
> > +   hvsk->local_addr.shv_service_id))
> > +   return hvsock_to_sk(hvsk);
> 
> Because there's an if, it's generally nicer to use
> braces in the list_for_each

Thanks for the suggestion, Joe!
I'll add {}.

> > +static struct sock *__hvsock_find_connected_socket_by_channel(
> > +   const struct vmbus_channel *channel)
> > +{
> > +   struct hvsock_sock *hvsk;
> > +
> > +   list_for_each_entry(hvsk, _connected_list, connected_list)
> > +   if (hvsk->channel == channel)
> > +   return hvsock_to_sk(hvsk);
> > +   return NULL;
> 
> here too
I'll fix this too.

> > +static int hvsock_sendmsg(struct socket *sock, struct msghdr *msg, size_t 
> > len)
> > +{
> []
> > +   if (msg->msg_flags & ~MSG_DONTWAIT) {
> > +   pr_err("hvsock_sendmsg: unsupported flags=0x%x\n",
> > +      msg->msg_flags);
> 
> All the pr_ messages with embedded function
> names could use "%s:", __func__
I'll fix this.
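
For reference, the "%s:", __func__ suggestion looks roughly like this in a userspace sketch. The names `FORMAT_ERR` and `demo_sendmsg` are hypothetical, and snprintf() stands in for pr_err() so the result can be inspected; this is not the driver's code.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Format an "unsupported flags" message prefixed with the name of the
 * function that uses the macro. __func__ is expanded inside the caller,
 * so the prefix stays correct if the function is renamed later.
 */
#define FORMAT_ERR(buf, size, flags) \
	snprintf((buf), (size), "%s: unsupported flags=0x%x\n", \
		 __func__, (unsigned int)(flags))

/* Hypothetical caller standing in for hvsock_sendmsg(). */
static int demo_sendmsg(char *buf, size_t size, unsigned int flags)
{
	return FORMAT_ERR(buf, size, flags);
}
```

The benefit over an embedded literal such as "hvsock_sendmsg: ..." is that the message cannot go stale when the function name changes.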

Thanks,
-- Dexuan


[PATCH v7 net-next 1/1] hv_sock: introduce Hyper-V Sockets

2016-04-07 Thread Dexuan Cui
Hyper-V Sockets (hv_sock) supplies a byte-stream based communication
mechanism between the host and the guest. It's somewhat like TCP over
VMBus, but the transportation layer (VMBus) is much simpler than IP.

With Hyper-V Sockets, applications between the host and the guest can talk
to each other directly by the traditional BSD-style socket APIs.

Hyper-V Sockets is only available on new Windows hosts, like Windows Server
2016. More info is in this article "Make your own integration services":
https://msdn.microsoft.com/en-us/virtualization/hyperv_on_windows/develop/make_mgmt_service

The patch implements the necessary support in the guest side by introducing
a new socket address family AF_HYPERV.

Signed-off-by: Dexuan Cui <de...@microsoft.com>
Cc: "K. Y. Srinivasan" <k...@microsoft.com>
Cc: Haiyang Zhang <haiya...@microsoft.com>
Cc: Vitaly Kuznetsov <vkuzn...@redhat.com>
---
 MAINTAINERS |2 +
 include/linux/hyperv.h  |   16 +
 include/linux/socket.h  |5 +-
 include/net/af_hvsock.h |   51 ++
 include/uapi/linux/hyperv.h |   16 +
 net/Kconfig |1 +
 net/Makefile|1 +
 net/hv_sock/Kconfig |   10 +
 net/hv_sock/Makefile|3 +
 net/hv_sock/af_hvsock.c | 1481 +++
 10 files changed, 1584 insertions(+), 2 deletions(-)
 create mode 100644 include/net/af_hvsock.h
 create mode 100644 net/hv_sock/Kconfig
 create mode 100644 net/hv_sock/Makefile
 create mode 100644 net/hv_sock/af_hvsock.c

diff --git a/MAINTAINERS b/MAINTAINERS
index 67d99dd..7b6f203 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -5267,7 +5267,9 @@ F:drivers/pci/host/pci-hyperv.c
 F: drivers/net/hyperv/
 F: drivers/scsi/storvsc_drv.c
 F: drivers/video/fbdev/hyperv_fb.c
+F: net/hv_sock/
 F: include/linux/hyperv.h
+F: include/net/af_hvsock.h
 F: tools/hv/
 F: Documentation/ABI/stable/sysfs-bus-vmbus
 
diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h
index aa0fadc..b92439d 100644
--- a/include/linux/hyperv.h
+++ b/include/linux/hyperv.h
@@ -1338,4 +1338,20 @@ extern __u32 vmbus_proto_version;
 
 int vmbus_send_tl_connect_request(const uuid_le *shv_guest_servie_id,
  const uuid_le *shv_host_servie_id);
+struct vmpipe_proto_header {
+   u32 pkt_type;
+   u32 data_size;
+} __packed;
+
+#define HVSOCK_HEADER_LEN  (sizeof(struct vmpacket_descriptor) + \
+sizeof(struct vmpipe_proto_header))
+
+/* See 'prev_indices' in hv_ringbuffer_read(), hv_ringbuffer_write() */
+#define PREV_INDICES_LEN   (sizeof(u64))
+
+#define HVSOCK_PKT_LEN(payload_len)(HVSOCK_HEADER_LEN + \
+   ALIGN((payload_len), 8) + \
+   PREV_INDICES_LEN)
+#define HVSOCK_MIN_PKT_LEN HVSOCK_PKT_LEN(1)
+
 #endif /* _HYPERV_H */
diff --git a/include/linux/socket.h b/include/linux/socket.h
index 73bf6c6..88b1ccd 100644
--- a/include/linux/socket.h
+++ b/include/linux/socket.h
@@ -201,8 +201,8 @@ struct ucred {
 #define AF_NFC 39  /* NFC sockets  */
 #define AF_VSOCK   40  /* vSockets */
 #define AF_KCM 41  /* Kernel Connection Multiplexor*/
-
-#define AF_MAX 42  /* For now.. */
+#define AF_HYPERV  42  /* Hyper-V Sockets  */
+#define AF_MAX 43  /* For now.. */
 
 /* Protocol families, same as address families. */
 #define PF_UNSPEC  AF_UNSPEC
@@ -249,6 +249,7 @@ struct ucred {
 #define PF_NFC AF_NFC
 #define PF_VSOCK   AF_VSOCK
 #define PF_KCM AF_KCM
+#define PF_HYPERV  AF_HYPERV
 #define PF_MAX AF_MAX
 
 /* Maximum queue length specifiable by listen.  */
diff --git a/include/net/af_hvsock.h b/include/net/af_hvsock.h
new file mode 100644
index 000..a5aa28d
--- /dev/null
+++ b/include/net/af_hvsock.h
@@ -0,0 +1,51 @@
+#ifndef __AF_HVSOCK_H__
+#define __AF_HVSOCK_H__
+
+#include 
+#include 
+#include 
+
+#define VMBUS_RINGBUFFER_SIZE_HVSOCK_RECV (5 * PAGE_SIZE)
+#define VMBUS_RINGBUFFER_SIZE_HVSOCK_SEND (5 * PAGE_SIZE)
+
+#define HVSOCK_RCV_BUF_SZ  VMBUS_RINGBUFFER_SIZE_HVSOCK_RECV
+#define HVSOCK_SND_BUF_SZ  PAGE_SIZE
+
+#define sk_to_hvsock(__sk)((struct hvsock_sock *)(__sk))
+#define hvsock_to_sk(__hvsk)   ((struct sock *)(__hvsk))
+
+struct hvsock_sock {
+   /* sk must be the first member. */
+   struct sock sk;
+
+   struct sockaddr_hv local_addr;
+   struct sockaddr_hv remote_addr;
+
+   /* protected by the global hvsock_mutex */
+   struct list_head bound_list;
+   struct list_head connected_list;
+
+   struct list_head accept_queue;
+   /* used by enqueue and dequeue */
+   struct mutex accept_queue_mutex;
+
+   struct delayed_work dwork;
+
+   u32 peer_shutdown;
+
+   struct vmbus_channel *channe

[PATCH v7 net-next 0/1] introduce Hyper-V VM Sockets(hv_sock)

2016-04-07 Thread Dexuan Cui
Hyper-V Sockets (hv_sock) supplies a byte-stream based communication
mechanism between the host and the guest. It's somewhat like TCP over
VMBus, but the transportation layer (VMBus) is much simpler than IP.

With Hyper-V Sockets, applications between the host and the guest can talk
to each other directly by the traditional BSD-style socket APIs.

Hyper-V Sockets is only available on new Windows hosts, like Windows Server
2016. More info is in this article "Make your own integration services":
https://msdn.microsoft.com/en-us/virtualization/hyperv_on_windows/develop/make_mgmt_service

The patch implements the necessary support in the guest side by
introducing a new socket address family AF_HYPERV.

Note: the VMBus driver side's supporting patches have been in the mainline
tree.

I know the kernel already has a VM Sockets driver (AF_VSOCK) based
on VMware VMCI (net/vmw_vsock/, drivers/misc/vmw_vmci), and KVM is
proposing a virtio version of AF_VSOCK:
http://marc.info/?l=linux-netdev=145952064004765=2

However, though Hyper-V Sockets may seem conceptually similar to
AF_VSOCK, there are differences in the transportation layer, and IMO these
make direct code reuse impractical:

1. In AF_VSOCK, the endpoint type is: , but in
AF_HYPERV, the endpoint type is: . Here GUID
is 128-bit.

2. AF_VSOCK supports SOCK_DGRAM, while AF_HYPERV doesn't.

3. AF_VSOCK supports some special sock opts, like SO_VM_SOCKETS_BUFFER_SIZE,
SO_VM_SOCKETS_BUFFER_MIN/MAX_SIZE and SO_VM_SOCKETS_CONNECT_TIMEOUT.
These are meaningless to AF_HYPERV.

4. Some of AF_VSOCK's VMCI transportation ops are meaningless to AF_HYPERV/VMBus,
like .notify_recv_init
.notify_recv_pre_block
.notify_recv_pre_dequeue
.notify_recv_post_dequeue
.notify_send_init
.notify_send_pre_block
.notify_send_pre_enqueue
.notify_send_post_enqueue
etc.

So I think we'd better introduce a new address family: AF_HYPERV.

Please review the patch.

Looking forward to your comments!


Changes since v1:
- updated "[PATCH 6/7] hvsock: introduce Hyper-V VM Sockets feature"
- added __init and __exit for the module init/exit functions
- net/hv_sock/Kconfig: "default m" -> "default m if HYPERV"
- MODULE_LICENSE: "Dual MIT/GPL" -> "Dual BSD/GPL"

Changes since v2:
- fixed various coding issues pointed out by David Miller
- fixed indentation issues
- removed pr_debug in net/hv_sock/af_hvsock.c
- used reverse-Christmas-tree style for local variables.
- EXPORT_SYMBOL -> EXPORT_SYMBOL_GPL

Changes since v3:
- fixed a few coding issues pointed out by Vitaly Kuznetsov and Dan Carpenter
- fixed the ret value in vmbus_recvpacket_hvsock on error
- fixed the style of multi-line comment: vmbus_get_hvsock_rw_status()

Changes since v4 (https://lkml.org/lkml/2015/7/28/404):
- addressed all the comments about V4.
- treat the hvsock offers/channels as special VMBus devices
- add a mechanism to pass hvsock events to the hvsock driver
- fixed some corner cases with proper locking when a connection is closed
- rebased to the latest Greg's tree

Changes since v5 (https://lkml.org/lkml/2015/12/24/103):
- addressed the coding style issues (Vitaly Kuznetsov & David Miller, thanks!)
- used better code for the per-channel rescind callback (thanks, Vitaly!)
- avoided the introduction of new VMBUS driver APIs vmbus_sendpacket_hvsock()
and vmbus_recvpacket_hvsock() and used vmbus_sendpacket()/vmbus_recvpacket()
in the higher level (i.e., the vmsock driver). Thanks, Vitaly!

Changes since v6 (http://lkml.iu.edu/hypermail/linux/kernel/1601.3/01813.html)
- only a few minor changes of coding style and comments

Dexuan Cui (1):
  hv_sock: introduce Hyper-V Sockets

 MAINTAINERS |2 +
 include/linux/hyperv.h  |   16 +
 include/linux/socket.h  |5 +-
 include/net/af_hvsock.h |   51 ++
 include/uapi/linux/hyperv.h |   16 +
 net/Kconfig |1 +
 net/Makefile|1 +
 net/hv_sock/Kconfig |   10 +
 net/hv_sock/Makefile|3 +
 net/hv_sock/af_hvsock.c | 1481 +++
 10 files changed, 1584 insertions(+), 2 deletions(-)
 create mode 100644 include/net/af_hvsock.h
 create mode 100644 net/hv_sock/Kconfig
 create mode 100644 net/hv_sock/Makefile
 create mode 100644 net/hv_sock/af_hvsock.c

-- 
2.1.0



RE: [PATCH net-next] net: add the AF_KCM entries to family name tables

2016-04-06 Thread Dexuan Cui
> From: David Miller [mailto:da...@davemloft.net]
> Sent: Thursday, April 7, 2016 11:59
> To: Dexuan Cui <de...@microsoft.com>
> Cc: netdev@vger.kernel.org
> Subject: Re: [PATCH net-next] net: add the AF_KCM entries to family name
> tables
> 
> From: Dexuan Cui <de...@microsoft.com>
> Date: Thu, 7 Apr 2016 01:54:18 +
> 
> > Can you please apply this to net-next too?
> 
> That will happen transparently the next time I merge 'net' into
> 'net-next'.
> 
> It will happen at a time of my own choosing, and usually occurs
> when I do a push of my 'net' tree to Linus and he takes it in,
> and I know people need some 'net' things in 'net-next'.

Thanks for the explanation!

So, at present, let me only post the single AF_HYPERV patch to
net-next and hold the patch that adds AF_HYPERV entries to the family
name tables.
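
As background on why each new address family needs a matching entry in the name tables, here is a small userspace sketch. The shortened `FAM_*` numbering and the helper `family_key_name()` are hypothetical; the kernel tables are indexed by the real AF_* values and sized AF_MAX+1.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, shortened family numbering (the kernel uses AF_*). */
enum { FAM_UNSPEC, FAM_UNIX, FAM_INET, FAM_MAX };

/*
 * One string per family number plus a final entry for FAM_MAX itself.
 * If a family is added and FAM_MAX is bumped without adding a string,
 * the trailing slot is silently left NULL by the initializer.
 */
static const char *const family_key_strings[FAM_MAX + 1] = {
	"sk_lock-FAM_UNSPEC", "sk_lock-FAM_UNIX", "sk_lock-FAM_INET",
	"sk_lock-FAM_MAX"
};

/* Look up the lock-class name for a family; NULL if out of range. */
static const char *family_key_name(int family)
{
	if (family < 0 || family > FAM_MAX)
		return NULL;
	return family_key_strings[family];
}
```

This is why a patch that bumps AF_MAX is normally paired with one that extends all three string tables.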

Thanks,
-- Dexuan


RE: [PATCH net-next] net: add the AF_KCM entries to family name tables

2016-04-06 Thread Dexuan Cui
> From: David Miller [mailto:da...@davemloft.net]
> Sent: Thursday, April 7, 2016 5:00
> To: Dexuan Cui <de...@microsoft.com>
> Cc: netdev@vger.kernel.org
> Subject: Re: [PATCH net-next] net: add the AF_KCM entries to family name
> tables
> 
> From: Dexuan Cui <de...@microsoft.com>
> Date: Tue,  5 Apr 2016 07:41:11 -0700
> 
> > This is for the recent kcm driver, which introduces AF_KCM(41) in
> > b7ac4eb(kcm: Kernel Connection Multiplexor module).
> >
> > Signed-off-by: Dexuan Cui <de...@microsoft.com>
> > Cc: Signed-off-by: Tom Herbert <t...@herbertland.com>
> 
> As this is a bug fix actually, applied to 'net'.
David, 
Can you please apply this to net-next too?

It looks like net-next is open now and I'm going to resubmit my
AF_HYPERV patchset, which needs to add AF_HYPERV entries to the 
family name tables too.

Thanks,
-- Dexuan


RE: [PATCH net-next] net: add the AF_KCM entries to family name tables

2016-04-06 Thread Dexuan Cui
> From: netdev-ow...@vger.kernel.org [mailto:netdev-ow...@vger.kernel.org]
> On Behalf Of Dexuan Cui
> Sent: Tuesday, April 5, 2016 22:41
> To: da...@davemloft.net; netdev@vger.kernel.org
> Subject: [PATCH net-next] net: add the AF_KCM entries to family name tables
> 
> This is for the recent kcm driver, which introduces AF_KCM(41) in
> b7ac4eb(kcm: Kernel Connection Multiplexor module).
> 
> Signed-off-by: Dexuan Cui <de...@microsoft.com>
> Cc: Signed-off-by: Tom Herbert <t...@herbertland.com>
> ---
>  net/core/sock.c | 9 ++---
>  1 file changed, 6 insertions(+), 3 deletions(-)
> 
> diff --git a/net/core/sock.c b/net/core/sock.c
> index b67b9ae..7e73c26 100644
> --- a/net/core/sock.c
> +++ b/net/core/sock.c
> @@ -221,7 +221,8 @@ static const char *const
> af_family_key_strings[AF_MAX+1] = {
>"sk_lock-AF_TIPC"  , "sk_lock-AF_BLUETOOTH", "sk_lock-IUCV",
>"sk_lock-AF_RXRPC" , "sk_lock-AF_ISDN" , "sk_lock-AF_PHONET"   ,
>"sk_lock-AF_IEEE802154", "sk_lock-AF_CAIF" , "sk_lock-AF_ALG"  ,
> -  "sk_lock-AF_NFC"   , "sk_lock-AF_VSOCK", "sk_lock-AF_MAX"
> +  "sk_lock-AF_NFC"   , "sk_lock-AF_VSOCK", "sk_lock-AF_KCM"  ,
> +  "sk_lock-AF_MAX"
>  };
>  static const char *const af_family_slock_key_strings[AF_MAX+1] = {
>"slock-AF_UNSPEC", "slock-AF_UNIX" , "slock-AF_INET" ,
> @@ -237,7 +238,8 @@ static const char *const
> af_family_slock_key_strings[AF_MAX+1] = {
>"slock-AF_TIPC"  , "slock-AF_BLUETOOTH", "slock-AF_IUCV" ,
>"slock-AF_RXRPC" , "slock-AF_ISDN" , "slock-AF_PHONET"   ,
>"slock-AF_IEEE802154", "slock-AF_CAIF" , "slock-AF_ALG"  ,
> -  "slock-AF_NFC"   , "slock-AF_VSOCK","slock-AF_MAX"
> +  "slock-AF_NFC"   , "slock-AF_VSOCK","slock-AF_KCM"   ,
> +  "slock-AF_MAX"
>  };
>  static const char *const af_family_clock_key_strings[AF_MAX+1] = {
>"clock-AF_UNSPEC", "clock-AF_UNIX" , "clock-AF_INET" ,
> @@ -253,7 +255,8 @@ static const char *const
> af_family_clock_key_strings[AF_MAX+1] = {
>"clock-AF_TIPC"  , "clock-AF_BLUETOOTH", "clock-AF_IUCV" ,
>"clock-AF_RXRPC" , "clock-AF_ISDN" , "clock-AF_PHONET"   ,
>"clock-AF_IEEE802154", "clock-AF_CAIF" , "clock-AF_ALG"  ,
> -  "clock-AF_NFC"   , "clock-AF_VSOCK", "clock-AF_MAX"
> +  "clock-AF_NFC"   , "clock-AF_VSOCK", "clock-AF_KCM"  ,
> +  "clock-AF_MAX"
>  };
> 
>  /*

Added Tom to Cc.

Thanks,
-- Dexuan


[PATCH net-next] net: add the AF_KCM entries to family name tables

2016-04-05 Thread Dexuan Cui
This is for the recent kcm driver, which introduces AF_KCM(41) in
b7ac4eb(kcm: Kernel Connection Multiplexor module).

Signed-off-by: Dexuan Cui <de...@microsoft.com>
Cc: Signed-off-by: Tom Herbert <t...@herbertland.com>
---
 net/core/sock.c | 9 ++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/net/core/sock.c b/net/core/sock.c
index b67b9ae..7e73c26 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -221,7 +221,8 @@ static const char *const af_family_key_strings[AF_MAX+1] = {
   "sk_lock-AF_TIPC"  , "sk_lock-AF_BLUETOOTH", "sk_lock-IUCV",
   "sk_lock-AF_RXRPC" , "sk_lock-AF_ISDN" , "sk_lock-AF_PHONET"   ,
   "sk_lock-AF_IEEE802154", "sk_lock-AF_CAIF" , "sk_lock-AF_ALG"  ,
-  "sk_lock-AF_NFC"   , "sk_lock-AF_VSOCK", "sk_lock-AF_MAX"
+  "sk_lock-AF_NFC"   , "sk_lock-AF_VSOCK", "sk_lock-AF_KCM"  ,
+  "sk_lock-AF_MAX"
 };
 static const char *const af_family_slock_key_strings[AF_MAX+1] = {
   "slock-AF_UNSPEC", "slock-AF_UNIX" , "slock-AF_INET" ,
@@ -237,7 +238,8 @@ static const char *const af_family_slock_key_strings[AF_MAX+1] = {
   "slock-AF_TIPC"  , "slock-AF_BLUETOOTH", "slock-AF_IUCV" ,
   "slock-AF_RXRPC" , "slock-AF_ISDN" , "slock-AF_PHONET"   ,
   "slock-AF_IEEE802154", "slock-AF_CAIF" , "slock-AF_ALG"  ,
-  "slock-AF_NFC"   , "slock-AF_VSOCK","slock-AF_MAX"
+  "slock-AF_NFC"   , "slock-AF_VSOCK","slock-AF_KCM"   ,
+  "slock-AF_MAX"
 };
 static const char *const af_family_clock_key_strings[AF_MAX+1] = {
   "clock-AF_UNSPEC", "clock-AF_UNIX" , "clock-AF_INET" ,
@@ -253,7 +255,8 @@ static const char *const af_family_clock_key_strings[AF_MAX+1] = {
   "clock-AF_TIPC"  , "clock-AF_BLUETOOTH", "clock-AF_IUCV" ,
   "clock-AF_RXRPC" , "clock-AF_ISDN" , "clock-AF_PHONET"   ,
   "clock-AF_IEEE802154", "clock-AF_CAIF" , "clock-AF_ALG"  ,
-  "clock-AF_NFC"   , "clock-AF_VSOCK", "clock-AF_MAX"
+  "clock-AF_NFC"   , "clock-AF_VSOCK", "clock-AF_KCM"  ,
+  "clock-AF_MAX"
 };
 
 /*
-- 
2.1.0
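
Editorial note on the patch above: a minimal C sketch of why a new address
family needs its own slot in these name tables. The tables appear to be
indexed by the socket's family number, so the new string has to sit at
index AF_KCM, immediately before the AF_MAX slot. Everything named DEMO_*
or demo_* below is hypothetical, and the four-family numbering is invented
for illustration; it is not kernel code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical miniature family numbering; the real values are in
 * include/linux/socket.h. */
enum {
	DEMO_AF_UNSPEC = 0,
	DEMO_AF_UNIX   = 1,
	DEMO_AF_INET   = 2,
	DEMO_AF_KCM    = 3,	/* the newly added family */
	DEMO_AF_MAX    = 4
};

/* One name per family, plus a final entry for the AF_MAX slot.
 * The size is left implicit so the check below can catch a missing entry. */
static const char *const demo_key_strings[] = {
	"sk_lock-AF_UNSPEC", "sk_lock-AF_UNIX", "sk_lock-AF_INET",
	"sk_lock-AF_KCM",
	"sk_lock-AF_MAX"
};

/* Fails to compile if a family was added without a matching string. */
static_assert(sizeof(demo_key_strings) / sizeof(demo_key_strings[0]) ==
	      DEMO_AF_MAX + 1, "name table out of sync with DEMO_AF_MAX");

/* Look up the name for a family; returns NULL for out-of-range values. */
static const char *demo_family_key_name(int family)
{
	if (family < 0 || family > DEMO_AF_MAX)
		return NULL;
	return demo_key_strings[family];
}
```

With this layout, demo_family_key_name(DEMO_AF_KCM) yields the KCM string;
without the added entry, that index would have returned the AF_MAX string
instead.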



Has the net-next tree been open now?

2016-04-04 Thread Dexuan Cui
Hi David,
I see that the v4.6-rc1 tag is now in net-next.git, and a bunch of stmmac
patches appeared on the tree's master branch yesterday.

Thanks,
-- Dexuan



RE: [PATCH net-next 1/3] net: add the AF_KCM entries to family name tables

2016-03-21 Thread Dexuan Cui
> From: David Miller [mailto:da...@davemloft.net]
> Sent: Monday, March 21, 2016 23:28
> To: Dexuan Cui <de...@microsoft.com>
> Cc: gre...@linuxfoundation.org; netdev@vger.kernel.org; linux-
> ker...@vger.kernel.org; de...@linuxdriverproject.org; o...@aepfle.de;
> a...@canonical.com; jasow...@redhat.com; KY Srinivasan
> <k...@microsoft.com>; Haiyang Zhang <haiya...@microsoft.com>;
> vkuzn...@redhat.com
> Subject: Re: [PATCH net-next 1/3] net: add the AF_KCM entries to family
> name tables
> 
> 
> Two things wrong with this submission:
> 
> 1) You need to provide an initial "[PATCH net-next 0/3] ..." header posting
>    explaining at a high level what this patch series is about and how it is
>    implemented and why.

Hi David,
Thanks for the reply! I'll fix this.

> 2) The net-next tree is closed at this time because we are in the merge
>    window, therefore no new feature patches should be submitted to the
>    netdev mailing list at this time.  Please wait until some (reasonable)
>    amount of time after the merge window closes to resubmit this.

OK.  I'll repost it after the merge window closes and net-next reopens -- I
suppose that would happen in 1~2 weeks, according to my reading of the
documentation.

Thanks,
-- Dexuan


[PATCH net-next 2/3] hv_sock: introduce Hyper-V Sockets

2016-03-21 Thread Dexuan Cui
Hyper-V Sockets (hv_sock) supplies a byte-stream based communication
mechanism between the host and the guest. It's somewhat like TCP over
VMBus, but the transport layer (VMBus) is much simpler than IP.

With Hyper-V Sockets, applications on the host and in the guest can talk
to each other directly through the traditional BSD-style socket APIs.

Hyper-V Sockets is only available on new Windows hosts, like Windows Server
2016. More info is in this article "Make your own integration services":
https://msdn.microsoft.com/en-us/virtualization/hyperv_on_windows/develop/make_mgmt_service

The patch implements the necessary support on the guest side by introducing
a new socket address family AF_HYPERV.

Signed-off-by: Dexuan Cui <de...@microsoft.com>
Cc: "K. Y. Srinivasan" <k...@microsoft.com>
Cc: Haiyang Zhang <haiya...@microsoft.com>
Cc: Vitaly Kuznetsov <vkuzn...@redhat.com>
---

I posted the V6 of the hv_sock patchset in Jan:
[PATCH V6 0/8] introduce Hyper-V VM Socket(hv_sock)
http://lkml.iu.edu/hypermail/linux/kernel/1601.3/01813.html

Now that all the supporting patches on the VMBus side have been merged into
the mainline tree and the net-next tree, I think it's time to re-post the
net/ side's change -- I'm not sure whether net-next is closed now, since I
haven't seen a "net-next is CLOSED" mail recently.

The patch shouldn't cause any regression because it adds a new driver, not
touching the existing code.

Please comment on the patch.

 MAINTAINERS                 |    2 +
 include/linux/hyperv.h      |   16 +
 include/linux/socket.h      |    5 +-
 include/net/af_hvsock.h     |   51 ++
 include/uapi/linux/hyperv.h |   16 +
 net/Kconfig                 |    1 +
 net/Makefile                |    1 +
 net/hv_sock/Kconfig         |   10 +
 net/hv_sock/Makefile        |    3 +
 net/hv_sock/af_hvsock.c     | 1480 +++
 10 files changed, 1583 insertions(+), 2 deletions(-)
 create mode 100644 include/net/af_hvsock.h
 create mode 100644 net/hv_sock/Kconfig
 create mode 100644 net/hv_sock/Makefile
 create mode 100644 net/hv_sock/af_hvsock.c

diff --git a/MAINTAINERS b/MAINTAINERS
index 0cbfc69..6fa438d 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -5222,7 +5222,9 @@ F:drivers/pci/host/pci-hyperv.c
 F: drivers/net/hyperv/
 F: drivers/scsi/storvsc_drv.c
 F: drivers/video/fbdev/hyperv_fb.c
+F: net/hv_sock/
 F: include/linux/hyperv.h
+F: include/net/af_hvsock.h
 F: tools/hv/
 F: Documentation/ABI/stable/sysfs-bus-vmbus
 
diff --git a/include/linux/hyperv.h b/include/linux/hyperv.h
index aa0fadc..b92439d 100644
--- a/include/linux/hyperv.h
+++ b/include/linux/hyperv.h
@@ -1338,4 +1338,20 @@ extern __u32 vmbus_proto_version;
 
 int vmbus_send_tl_connect_request(const uuid_le *shv_guest_servie_id,
  const uuid_le *shv_host_servie_id);
+struct vmpipe_proto_header {
+   u32 pkt_type;
+   u32 data_size;
+} __packed;
+
+#define HVSOCK_HEADER_LEN  (sizeof(struct vmpacket_descriptor) + \
+sizeof(struct vmpipe_proto_header))
+
+/* See 'prev_indices' in hv_ringbuffer_read(), hv_ringbuffer_write() */
+#define PREV_INDICES_LEN   (sizeof(u64))
+
+#define HVSOCK_PKT_LEN(payload_len)(HVSOCK_HEADER_LEN + \
+   ALIGN((payload_len), 8) + \
+   PREV_INDICES_LEN)
+#define HVSOCK_MIN_PKT_LEN HVSOCK_PKT_LEN(1)
+
 #endif /* _HYPERV_H */
diff --git a/include/linux/socket.h b/include/linux/socket.h
index 73bf6c6..88b1ccd 100644
--- a/include/linux/socket.h
+++ b/include/linux/socket.h
@@ -201,8 +201,8 @@ struct ucred {
 #define AF_NFC 39  /* NFC sockets  */
 #define AF_VSOCK   40  /* vSockets */
 #define AF_KCM 41  /* Kernel Connection Multiplexor*/
-
-#define AF_MAX 42  /* For now.. */
+#define AF_HYPERV  42  /* Hyper-V Sockets  */
+#define AF_MAX 43  /* For now.. */
 
 /* Protocol families, same as address families. */
 #define PF_UNSPEC  AF_UNSPEC
@@ -249,6 +249,7 @@ struct ucred {
 #define PF_NFC AF_NFC
 #define PF_VSOCK   AF_VSOCK
 #define PF_KCM AF_KCM
+#define PF_HYPERV  AF_HYPERV
 #define PF_MAX AF_MAX
 
 /* Maximum queue length specifiable by listen.  */
diff --git a/include/net/af_hvsock.h b/include/net/af_hvsock.h
new file mode 100644
index 000..a5aa28d
--- /dev/null
+++ b/include/net/af_hvsock.h
@@ -0,0 +1,51 @@
+#ifndef __AF_HVSOCK_H__
+#define __AF_HVSOCK_H__
+
+#include 
+#include 
+#include 
+
+#define VMBUS_RINGBUFFER_SIZE_HVSOCK_RECV (5 * PAGE_SIZE)
+#define VMBUS_RINGBUFFER_SIZE_HVSOCK_SEND (5 * PAGE_SIZE)
+
+#define HVSOCK_RCV_BUF_SZ  VMBUS_RINGBUFFER_SIZE_HVSOCK_RECV
+#define HVSOCK_SND_BUF_SZ  PAGE_SIZE
+
+#define sk_to_hvsock(__sk)((struct hvsock_sock *)(__sk))
+#define hvso
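
Editorial note on the HVSOCK_PKT_LEN() macro in the include/linux/hyperv.h
hunk above: a small userspace sketch of the arithmetic, which adds the two
headers, the payload rounded up to 8 bytes, and the 8-byte prev_indices
trailer. The demo_* structs are stand-ins; in particular the 16-byte layout
assumed for the packet descriptor is an assumption made for illustration
and should be checked against the real struct vmpacket_descriptor.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for struct vmpacket_descriptor (assumed 16-byte layout). */
struct demo_vmpacket_descriptor {
	uint16_t type;
	uint16_t offset8;
	uint16_t len8;
	uint16_t flags;
	uint64_t trans_id;
} __attribute__((packed));

/* Mirrors struct vmpipe_proto_header from the patch: 8 bytes. */
struct demo_vmpipe_proto_header {
	uint32_t pkt_type;
	uint32_t data_size;
} __attribute__((packed));

#define DEMO_HEADER_LEN	(sizeof(struct demo_vmpacket_descriptor) + \
			 sizeof(struct demo_vmpipe_proto_header))
#define DEMO_PREV_INDICES_LEN	(sizeof(uint64_t))

/* Round x up to the next multiple of 8, like ALIGN(x, 8). */
#define DEMO_ALIGN8(x)	(((size_t)(x) + 7u) & ~(size_t)7u)

/* Ring-buffer space consumed by one packet carrying payload_len bytes. */
static size_t demo_hvsock_pkt_len(size_t payload_len)
{
	return DEMO_HEADER_LEN + DEMO_ALIGN8(payload_len) +
	       DEMO_PREV_INDICES_LEN;
}
```

Under these assumptions the headers take 24 bytes, so a 1-byte payload
costs 24 + 8 + 8 = 40 bytes of ring space, which is what the patch calls
HVSOCK_MIN_PKT_LEN.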

[PATCH net-next 3/3] net: add the AF_HYPERV entries to family name tables

2016-03-21 Thread Dexuan Cui
This is for the hv_sock driver, which introduces AF_HYPERV(42).

Signed-off-by: Dexuan Cui <de...@microsoft.com>
Cc: "K. Y. Srinivasan" <k...@microsoft.com>
Cc: Haiyang Zhang <haiya...@microsoft.com>
---
 net/core/sock.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/net/core/sock.c b/net/core/sock.c
index 7e73c26..51ffc54 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -222,7 +222,7 @@ static const char *const af_family_key_strings[AF_MAX+1] = {
   "sk_lock-AF_RXRPC" , "sk_lock-AF_ISDN" , "sk_lock-AF_PHONET"   ,
   "sk_lock-AF_IEEE802154", "sk_lock-AF_CAIF" , "sk_lock-AF_ALG"  ,
   "sk_lock-AF_NFC"   , "sk_lock-AF_VSOCK", "sk_lock-AF_KCM"  ,
-  "sk_lock-AF_MAX"
+  "sk_lock-AF_HYPERV", "sk_lock-AF_MAX"
 };
 static const char *const af_family_slock_key_strings[AF_MAX+1] = {
   "slock-AF_UNSPEC", "slock-AF_UNIX" , "slock-AF_INET" ,
@@ -239,7 +239,7 @@ static const char *const af_family_slock_key_strings[AF_MAX+1] = {
   "slock-AF_RXRPC" , "slock-AF_ISDN" , "slock-AF_PHONET"   ,
   "slock-AF_IEEE802154", "slock-AF_CAIF" , "slock-AF_ALG"  ,
   "slock-AF_NFC"   , "slock-AF_VSOCK","slock-AF_KCM"   ,
-  "slock-AF_MAX"
+  "slock-AF_HYPERV", "slock-AF_MAX"
 };
 static const char *const af_family_clock_key_strings[AF_MAX+1] = {
   "clock-AF_UNSPEC", "clock-AF_UNIX" , "clock-AF_INET" ,
@@ -256,7 +256,7 @@ static const char *const af_family_clock_key_strings[AF_MAX+1] = {
   "clock-AF_RXRPC" , "clock-AF_ISDN" , "clock-AF_PHONET"   ,
   "clock-AF_IEEE802154", "clock-AF_CAIF" , "clock-AF_ALG"  ,
   "clock-AF_NFC"   , "clock-AF_VSOCK", "clock-AF_KCM"  ,
-  "clock-AF_MAX"
+  "clock-AF_HYPERV", "clock-AF_MAX"
 };
 
 /*
-- 
2.1.0



[PATCH net-next 1/3] net: add the AF_KCM entries to family name tables

2016-03-21 Thread Dexuan Cui
This is for the recent kcm driver, which introduces AF_KCM(41) in
b7ac4eb(kcm: Kernel Connection Multiplexor module).

Signed-off-by: Dexuan Cui <de...@microsoft.com>
Cc: Tom Herbert <t...@herbertland.com>
---
 net/core/sock.c | 9 ++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/net/core/sock.c b/net/core/sock.c
index b67b9ae..7e73c26 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -221,7 +221,8 @@ static const char *const af_family_key_strings[AF_MAX+1] = {
   "sk_lock-AF_TIPC"  , "sk_lock-AF_BLUETOOTH", "sk_lock-IUCV",
   "sk_lock-AF_RXRPC" , "sk_lock-AF_ISDN" , "sk_lock-AF_PHONET"   ,
   "sk_lock-AF_IEEE802154", "sk_lock-AF_CAIF" , "sk_lock-AF_ALG"  ,
-  "sk_lock-AF_NFC"   , "sk_lock-AF_VSOCK", "sk_lock-AF_MAX"
+  "sk_lock-AF_NFC"   , "sk_lock-AF_VSOCK", "sk_lock-AF_KCM"  ,
+  "sk_lock-AF_MAX"
 };
 static const char *const af_family_slock_key_strings[AF_MAX+1] = {
   "slock-AF_UNSPEC", "slock-AF_UNIX" , "slock-AF_INET" ,
@@ -237,7 +238,8 @@ static const char *const af_family_slock_key_strings[AF_MAX+1] = {
   "slock-AF_TIPC"  , "slock-AF_BLUETOOTH", "slock-AF_IUCV" ,
   "slock-AF_RXRPC" , "slock-AF_ISDN" , "slock-AF_PHONET"   ,
   "slock-AF_IEEE802154", "slock-AF_CAIF" , "slock-AF_ALG"  ,
-  "slock-AF_NFC"   , "slock-AF_VSOCK","slock-AF_MAX"
+  "slock-AF_NFC"   , "slock-AF_VSOCK","slock-AF_KCM"   ,
+  "slock-AF_MAX"
 };
 static const char *const af_family_clock_key_strings[AF_MAX+1] = {
   "clock-AF_UNSPEC", "clock-AF_UNIX" , "clock-AF_INET" ,
@@ -253,7 +255,8 @@ static const char *const af_family_clock_key_strings[AF_MAX+1] = {
   "clock-AF_TIPC"  , "clock-AF_BLUETOOTH", "clock-AF_IUCV" ,
   "clock-AF_RXRPC" , "clock-AF_ISDN" , "clock-AF_PHONET"   ,
   "clock-AF_IEEE802154", "clock-AF_CAIF" , "clock-AF_ALG"  ,
-  "clock-AF_NFC"   , "clock-AF_VSOCK", "clock-AF_MAX"
+  "clock-AF_NFC"   , "clock-AF_VSOCK", "clock-AF_KCM"  ,
+  "clock-AF_MAX"
 };
 
 /*
-- 
2.1.0



RE: When will net-next merge with linux-next?

2016-03-15 Thread Dexuan Cui
> From: gre...@linuxfoundation.org [mailto:gre...@linuxfoundation.org]
> Sent: Wednesday, March 16, 2016 10:41
> To: Dexuan Cui <de...@microsoft.com>
> Cc: David Miller <da...@davemloft.net>; netdev@vger.kernel.org; KY
> Srinivasan <k...@microsoft.com>; Haiyang Zhang <haiya...@microsoft.com>
> Subject: Re: When will net-next merge with linux-next?
> 
> On Wed, Mar 16, 2016 at 01:58:50AM +, Dexuan Cui wrote:
> > > From: gre...@linuxfoundation.org [mailto:gre...@linuxfoundation.org]
> > > Sent: Tuesday, March 15, 2016 23:06
> > > To: Dexuan Cui <de...@microsoft.com>
> > > Cc: David Miller <da...@davemloft.net>; netdev@vger.kernel.org; KY
> > > Srinivasan <k...@microsoft.com>; Haiyang Zhang <haiya...@microsoft.com>
> > > Subject: Re: When will net-next merge with linux-next?
> > >
> > > On Tue, Mar 15, 2016 at 11:11:34AM +, Dexuan Cui wrote:
> > > > I'm wondering whether (and when) step 2 will happen in the next 2 weeks,
> > > > that is, before the tag 4.6-rc1 is made.
> > > > If not, I guess I'll miss 4.6?
> > >
> > > You missed 4.6 as your patch was not in any of our trees a few days
> > > before 4.5 was released, sorry.
> > >
> > > greg k-h
> > Hi Greg,
> > Thanks for the reply!
> >
> > My patch has to go in net-next first, but even today's mainline and net-next
> > haven't had the supporting patches in the VMBus driver, so I can't post my
> > patch to net-next even today  -- it seems it's doomed to need 2 major
> > release cycles to push a feature that makes changes to 2 subsystems? :-(
> 
> Usually, yes, unless you talk to us ahead of time so we can coordinate,
> or have one of the patches go through a different tree (i.e. all in one
> tree.).

Greg, Thanks for your patient explanation!
I thought the patch (AF_HYPERV) must go through net-next. 
 
> Just wait until 4.6-rc1 is out and all will be fine.
> 
> greg k-h

BTW, I saw this in Documentation/development-process/2.Process:

"As a general rule, if you miss the merge window for a given feature, the
best thing to do is to wait for the next development cycle.  (An occasional
exception is made for drivers for previously-unsupported hardware; if they
touch no in-tree code, they cannot cause regressions and should be safe to
add at any time)"

I hope David could make an exception for the AF_HYPERV patch, since it
is a new driver, touching no in-tree code and unlikely to cause regressions. :-)

And actually the new driver won't be automatically loaded -- a user must
manually load it before the feature can be used.

Thanks,
-- Dexuan


RE: When will net-next merge with linux-next?

2016-03-15 Thread Dexuan Cui
> From: gre...@linuxfoundation.org [mailto:gre...@linuxfoundation.org]
> Sent: Tuesday, March 15, 2016 23:06
> To: Dexuan Cui <de...@microsoft.com>
> Cc: David Miller <da...@davemloft.net>; netdev@vger.kernel.org; KY
> Srinivasan <k...@microsoft.com>; Haiyang Zhang <haiya...@microsoft.com>
> Subject: Re: When will net-next merge with linux-next?
> 
> On Tue, Mar 15, 2016 at 11:11:34AM +, Dexuan Cui wrote:
> > I'm wondering whether (and when) step 2 will happen in the next 2 weeks,
> > that is, before the tag 4.6-rc1 is made.
> > If not, I guess I'll miss 4.6?
> 
> You missed 4.6 as your patch was not in any of our trees a few days
> before 4.5 was released, sorry.
> 
> greg k-h
Hi Greg,
Thanks for the reply!

My patch has to go in net-next first, but even today's mainline and net-next
haven't had the supporting patches in the VMBus driver, so I can't post my
patch to net-next even today  -- it seems it's doomed to need 2 major
release cycles to push a feature that makes changes to 2 subsystems? :-(

I guess Greg will send a pull request to Linus within the next 1~2 weeks, so
the supporting VMBus patches will be in the mainline after that.

Hi David,
May I know when you'll merge with the mainline kernel, and how
frequently you usually do it?

Thanks,
-- Dexuan


RE: When will net-next merge with linux-next?

2016-03-15 Thread Dexuan Cui
> From: gre...@linuxfoundation.org [mailto:gre...@linuxfoundation.org]
> Sent: Tuesday, March 15, 2016 0:22
> To: Dexuan Cui <de...@microsoft.com>
> Cc: David Miller <da...@davemloft.net>; netdev@vger.kernel.org; KY
> Srinivasan <k...@microsoft.com>; Haiyang Zhang <haiya...@microsoft.com>
> Subject: Re: When will net-next merge with linux-next?
> 
> On Mon, Mar 14, 2016 at 06:09:41AM +, Dexuan Cui wrote:
> > Hi David,
> > I have a pending patch of the hv_sock driver, which should go into the
> > kernel through the net-next tree:
> >
> > https://lkml.org/lkml/2016/2/14/7
> >
> > The VMBus side's supporting patches of hv_sock have been in Greg's tree
> > and linux-next for more than 1 month, but they haven't been in net-next
> > yet, I suppose this is because of the releasing of 4.5.
> >
> > Now 4.5 is released. Will you merge with Greg's tree or linux-next?
> 
> linux-next is a merge of all of the maintainer's trees, and it is
> rebased every day, it's impossible to merge that back into a maintainers
> tree, sorry.

Greg, thanks for the reply!
 
> > I read netdev-FAQ.txt, but still don't have a clear idea about how things
> > work in my case.
> 
> Try reading Documentation/development-process/ please.  Things will get
> merged together into Linus's tree over the next 2 weeks as we ask him to
> pull our trees.
> 
> greg k-h

I read the development-process documents. 
Since 4.5 was released and Linus's merge window is open for 4.6, I guess
what will happen next is:
1. Linus will pull from char-misc.git and net-next.git;
2. David will merge with Linus's mainline tree;
3. I can post my patch to net-next.git then.

I'm wondering whether (and when) step 2 will happen in the next 2 weeks,
that is, before the tag 4.6-rc1 is made.
If not, I guess I'll miss 4.6?

Thanks,
-- Dexuan


When will net-next merge with linux-next?

2016-03-14 Thread Dexuan Cui
Hi David,
I have a pending patch for the hv_sock driver, which should go into the
kernel through the net-next tree:
https://lkml.org/lkml/2016/2/14/7

The VMBus side's supporting patches for hv_sock have been in Greg's tree
and linux-next for more than 1 month, but they aren't in net-next
yet. I suppose this is because of the release of 4.5.

Now 4.5 is released. Will you merge with Greg's tree or linux-next?

I read netdev-FAQ.txt, but still don't have a clear idea about how things
work in my case.

Thanks,
-- Dexuan




RE: [REGRESSION, bisect] net: ipv6: unregister_netdevice: waiting for lo to become free. Usage count = 2

2016-03-02 Thread Dexuan Cui
> Hi David,
> On Wed, Mar 02, 2016 at 01:00:21PM -0800, David Ahern wrote:
> > On 3/2/16 12:31 PM, Jeremiah Mahler wrote:
> > >>On Tue, Mar 01, 2016 at 08:11:54AM +, Dexuan Cui wrote:
> > >>>Hi, I got this line every 10 seconds with today's linux-next in a Hyper-V guest, even
> > >>>when I didn't configure any NIC for the guest:
> > >>>
> > >>>[   72.604249] unregister_netdevice: waiting for lo to become free. Usage count = 2
> > >>>[   82.708170] unregister_netdevice: waiting for lo to become free. Usage count = 2
> > >>>[   92.788079] unregister_netdevice: waiting for lo to become free. Usage count = 2
> > >>>[  102.808132] unregister_netdevice: waiting for lo to become free. Usage count = 2
> > >>>[  112.928166] unregister_netdevice: waiting for lo to become free. Usage count = 2
> > >>>[  122.952069] unregister_netdevice: waiting for lo to become free. Usage count = 2
> > >>>
> > >>>I don't think this is related to the underlying host, since it's related to "lo".
> >
> > This should fix it:
> > https://patchwork.ozlabs.org/patch/591102/
> 
> > David
> 
> That patch fixes the problem on my machine.
> Thanks for the quick fix :-)
> 
> - Jeremiah Mahler

This works for me too! Thanks!

Thanks,
-- Dexuan


RE: [PATCH V6 0/8] introduce Hyper-V VM Socket(hv_sock)

2016-02-13 Thread Dexuan Cui
> From: devel [mailto:driverdev-devel-boun...@linuxdriverproject.org] On Behalf
> Of Dexuan Cui
> Sent: Tuesday, January 26, 2016 17:40
> ...
> Dexuan Cui (8):
>   Drivers: hv: vmbus: add a helper function to set a channel's pending
> send size
>   Drivers: hv: vmbus: define the new offer type for Hyper-V socket
> (hvsock)
>   Drivers: hv: vmbus: vmbus_sendpacket_ctl: hvsock: avoid unnecessary
> signaling
>   Drivers: hv: vmbus: define a new VMBus message type for hvsock
>   Drivers: hv: vmbus: add a hvsock flag in struct hv_driver
>   Drivers: hv: vmbus: add a per-channel rescind callback
>   Drivers: hv: vmbus: add an API vmbus_hvsock_device_unregister()
>   hvsock: introduce Hyper-V Socket feature

Hi David,
Greg has accepted all of my VMBus driver-side patches.
I'm going to post the net/hv_sock/ patch now.

I know I should rebase my patch onto the net-next tree, but net-next does
not yet contain my VMBus driver-side patches, which are a prerequisite for
my net/hv_sock/ patch.

It looks like I have to wait until you merge net-next with Greg's tree, or
with the mainline (after Greg pushes the changes to the mainline)?
If so, may I know when the next merge will happen (so I don't need
to check net-next every day :-) )?

Thanks,
-- Dexuan


[PATCH V6 3/8] Drivers: hv: vmbus: vmbus_sendpacket_ctl: hvsock: avoid unnecessary signaling

2016-01-26 Thread Dexuan Cui
When the hvsock channel's outbound ringbuffer is full (i.e.,
hv_ringbuffer_write() returns -EAGAIN), we should avoid unnecessarily
signaling the host.

Signed-off-by: Dexuan Cui <de...@microsoft.com>
---
 drivers/hv/channel.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/hv/channel.c b/drivers/hv/channel.c
index 1161d68..3f04533 100644
--- a/drivers/hv/channel.c
+++ b/drivers/hv/channel.c
@@ -659,6 +659,9 @@ int vmbus_sendpacket_ctl(struct vmbus_channel *channel, void *buffer,
 * If we cannot write to the ring-buffer; signal the host
 * even if we may not have written anything. This is a rare
 * enough condition that it should not matter.
+* NOTE: in this case, the hvsock channel is an exception, because
+* it looks the host side's hvsock implementation has a throttling
+* mechanism which can hurt the performance otherwise.
 */
 
if (channel->signal_policy)
@@ -666,7 +669,8 @@ int vmbus_sendpacket_ctl(struct vmbus_channel *channel, void *buffer,
else
kick_q = true;
 
-   if (((ret == 0) && kick_q && signal) || (ret))
+   if (((ret == 0) && kick_q && signal) ||
+   (ret && !is_hvsock_channel(channel)))
vmbus_setevent(channel);
 
return ret;
-- 
2.1.0
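
Editorial note on the patch above: the changed condition can be read as a
small predicate. The sketch below restates it in standalone C so the cases
are easy to check. The function and parameter names are hypothetical, and
the is_hvsock flag stands in for the result of is_hvsock_channel(channel)
in the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/*
 * Decide whether the host should be signaled after a ring-buffer write.
 *   ret         - result of the write: 0 on success, negative errno on failure
 *   kick_q      - whether the signaling policy asks for a kick
 *   ring_signal - whether the write itself reported that a signal is needed
 *   is_hvsock   - whether this is an hvsock channel
 */
static bool demo_should_signal_host(int ret, bool kick_q, bool ring_signal,
				    bool is_hvsock)
{
	/* Successful write: signal only when both conditions ask for it. */
	if (ret == 0)
		return kick_q && ring_signal;

	/* Failed write (e.g. -EAGAIN, ring full): signal anyway, except
	 * on hvsock channels. */
	return !is_hvsock;
}
```

For example, a full ring on an hvsock channel (ret == -EAGAIN) no longer
signals the host, while the same failure on any other channel still does.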


