Re: [Qemu-devel] Question about VM inner route entry's lost when vhost-user reconnect

2019-03-19 Thread Lilijun (Jerry, Cloud Networking)
Hi Stefan,

After more detailed testing, I found two things:
1) The loss of the route entry can be reproduced on both virtio-net and 
pass-through physical devices.
2) The link down event is handled by a service named NetworkManager in my VM 
(CentOS Linux 3.10.0-514-e17.x86_64).  If I stop or kill this service, the 
issue disappears.

So, if a customer's guest OS behaves like my VM, this problem will cause 
unexpected TCP disconnections when the backend process crashes.
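
For guests where NetworkManager owns the interface, one guest-side option is 
to re-add the route whenever the link comes back up.  Below is a minimal 
sketch of a NetworkManager dispatcher script, written in Python.  It assumes 
the script is installed as an executable under 
/etc/NetworkManager/dispatcher.d/, where NetworkManager calls it with the 
interface name and the action as its first two arguments.  The interface 
name, the gateway address and the helper name route_command are illustrative 
assumptions, not values taken from the affected VM.

```python
#!/usr/bin/env python3
"""Re-add a manually configured route when the interface comes back up."""
import subprocess
import sys

IFACE = "eth0"           # assumed interface name
GATEWAY = "192.168.1.2"  # assumed gateway, for illustration only


def route_command(iface, action):
    """Return the ip(8) command to run for this event, or None to do nothing."""
    if iface != IFACE or action != "up":
        return None
    # "replace" is idempotent: it adds the route or overwrites an existing one.
    return ["ip", "route", "replace", "default", "via", GATEWAY, "dev", IFACE]


if __name__ == "__main__" and len(sys.argv) >= 3:
    cmd = route_command(sys.argv[1], sys.argv[2])
    if cmd is not None:
        subprocess.run(cmd, check=True)
```

This only restores the route after the link returns; it does not prevent 
existing TCP connections from being disturbed while the link is down.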

Thanks

> -Original Message-
> From: Stefan Hajnoczi [mailto:stefa...@gmail.com]
> Sent: Wednesday, March 13, 2019 10:55 PM
> To: Lilijun (Jerry, Cloud Networking) 
> Cc: qemu-devel@nongnu.org; wangxin (U)
> ; wangyunjian
> 
> Subject: Re: [Qemu-devel] Question about VM inner route entry's lost when
> vhost-user reconnect
> 
> On Tue, Mar 12, 2019 at 02:01:04AM +, Lilijun (Jerry, Cloud Networking)
> wrote:
> >
> >
> > > -Original Message-
> > > From: Stefan Hajnoczi [mailto:stefa...@gmail.com]
> > > Sent: Monday, March 11, 2019 5:47 PM
> > > To: Lilijun (Jerry, Cloud Networking) 
> > > Cc: qemu-devel@nongnu.org; wangxin (U)
> ;
> > > wangyunjian 
> > > Subject: Re: [Qemu-devel] Question about VM inner route entry's lost
> > > when vhost-user reconnect
> > >
> > > On Fri, Mar 08, 2019 at 02:31:12AM +, Lilijun (Jerry, Cloud
> > > Networking)
> > > wrote:
> > > > This problem is related with backend vhost-user socket abnormal
> > > > cases, we
> > > shouldn't ask customers to configure it manually for backend's
> > > issues or depends on guest OS's network configuration.
> > >
> > > In Step 1 you said:
> > >
> > > > > >  1) In the VM, I add one route entry manually on the vNIC
> > > > > > eth0 using the
> > > > > linux tool route.
> > >
> > > You configured the route manually inside the guest.  Seems like a
> > > guest problem to me.
> > >
> > > If this was a physical machine that lost connectivity due to a link
> > > event, what would happen?
> >
> > Yes, the configuration can be recovered manually by customers.
> >
> > But in the virtualization machines, this configuration's lost is a result of
> backend process's software unexpected bugs or version update. So I think
> we need hide this change to customers.
> 
> My question is:
> 
> Do manually added routes disappear on a physical machine when the link
> goes down?
> 
> If yes, then the VM is acting correctly and this issue can be solved by
> configuring the guest appropriately.  (Hiding the link down event might seem
> nice in this particular situation but other users might need the event.
> Usually it's best to follow how physical machines behave and rely on existing
> solutions instead of implementing different behavior for VMs because that
> leads to new problems that are hard to foresee.)
> 
> If no, then why is the guest treating the virtio-net link down differently?
> Could be a bug.
> 
> Stefan



Re: [Qemu-devel] Question about VM inner route entry's lost when vhost-user reconnect

2019-03-11 Thread Lilijun (Jerry, Cloud Networking)



> -Original Message-
> From: Stefan Hajnoczi [mailto:stefa...@gmail.com]
> Sent: Monday, March 11, 2019 5:47 PM
> To: Lilijun (Jerry, Cloud Networking) 
> Cc: qemu-devel@nongnu.org; wangxin (U)
> ; wangyunjian
> 
> Subject: Re: [Qemu-devel] Question about VM inner route entry's lost when
> vhost-user reconnect
> 
> On Fri, Mar 08, 2019 at 02:31:12AM +, Lilijun (Jerry, Cloud Networking)
> wrote:
> > This problem is related with backend vhost-user socket abnormal cases, we
> shouldn't ask customers to configure it manually for backend's issues or
> depends on guest OS's network configuration.
> 
> In Step 1 you said:
> 
> > > >  1) In the VM, I add one route entry manually on the vNIC eth0
> > > > using the
> > > linux tool route.
> 
> You configured the route manually inside the guest.  Seems like a guest
> problem to me.
> 
> If this was a physical machine that lost connectivity due to a link event,
> what would happen?

Yes, the configuration can be recovered manually by customers.

But in a virtualized environment, the configuration is lost because of 
unexpected bugs in the backend process or a version update of it.  So I think 
we need to hide this change from customers.
> 
> Stefan



Re: [Qemu-devel] Question about VM inner route entry's lost when vhost-user reconnect

2019-03-07 Thread Lilijun (Jerry, Cloud Networking)
Hi, Stefan

This problem is caused by abnormal conditions on the backend vhost-user 
socket.  We shouldn't ask customers to work around backend issues with manual 
configuration, or depend on the guest OS's network configuration.

Thanks

> -Original Message-
> From: Stefan Hajnoczi [mailto:stefa...@gmail.com]
> Sent: Wednesday, March 06, 2019 6:07 PM
> To: Lilijun (Jerry, Cloud Networking) 
> Cc: qemu-devel@nongnu.org; wangxin (U)
> ; wangyunjian
> 
> Subject: Re: [Qemu-devel] Question about VM inner route entry's lost when
> vhost-user reconnect
> 
> On Mon, Mar 04, 2019 at 08:26:23AM +, Lilijun (Jerry, Cloud Networking)
> wrote:
> >   I am running my VM using vhost-user NIC with OVS-DPDK.  The steps of
> my question is shown as follows:
> >  1) In the VM, I add one route entry manually on the vNIC eth0 using the
> linux tool route.
> >  2) When restarting openvswitch service for the crash of the 
> > ovs-vswitchd,
> qemu's vhost-user reconnected successfully after 40s.
> >  3) Here VM's vNIC will receive link down and up events, the interval
> between the two events is about 40s.
> >  3) But that route entry disappeared and that will cause user's network
> traffic interruption and the service failed.
> >
> >  Is there some work on this problem?  Can we keep the vNIC's link up
> status when do vhost-user's reconnecting work?
> 
> Can you add the custom route to the network management tool inside the
> guest so that it will be reinstated when the link comes back up?
> 
> The details of how to do this depend on the guest's distro.
> 
> Stefan



Re: [Qemu-devel] Question about VM virtio device's link down delay when vhost-user reconnect

2019-03-07 Thread Lilijun (Jerry, Cloud Networking)
> -Original Message-
> From: Michael S. Tsirkin [mailto:m...@redhat.com]
> 
> On Wed, Mar 06, 2019 at 07:36:44AM +0000, Lilijun (Jerry, Cloud Networking)
> wrote:
> > Thanks a lot for your advice.
> >
> > Maybe there are two methods to add this option:
> > 1) Firstly, add a vhost-user protocol feature to tell Qemu if hide the
> disconnects from the guest.  Here we just need the backend such as dpdk
> vhostuser to support this option and the feature.
> > 2) Secondly, add a VM XML vhost-user nic configuration parameters for
> Qemu.  This method need more modification and other components such as
> Libvirt and Nova in openstack to configure it.
> >
> > I'd like to choose the first method,  Do you think so?
> 
> What we need to decide this is - when is it a good idea to down the link on
> disconnect.
> If it depends on vm configuration then it belongs with qemu.
> If it depends on hardware or other host configuration it might belong with
> the backend.
> 

In my opinion, vhost-user disconnects are caused by a restart of the host 
backend process or by some other close of the socket.

So we can add a host-side setting, such as an option on the OVS-DPDK 
vhostuser interface (ovs-vsctl set interface), to tell QEMU to hide the 
disconnects through vhost-user protocol feature negotiation.
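
To make the negotiation idea concrete, here is a rough Python sketch.  It 
assumes the usual pattern in which the backend offers a bitmask of protocol 
features and only the bits supported by both sides are enabled.  The feature 
bit VHOST_USER_PROTOCOL_F_HIDE_DISCONNECT and its number are hypothetical; 
no such bit exists in the vhost-user specification today.

```python
# Hypothetical feature bit, chosen only for illustration.
VHOST_USER_PROTOCOL_F_HIDE_DISCONNECT = 63


def negotiate(frontend_supported: int, backend_offered: int) -> int:
    """Return the feature bits that both QEMU and the backend support."""
    return frontend_supported & backend_offered


def hides_disconnect(negotiated: int) -> bool:
    """True if the (hypothetical) hide-disconnect feature was negotiated."""
    return bool(negotiated & (1 << VHOST_USER_PROTOCOL_F_HIDE_DISCONNECT))
```

With this shape, the host administrator would enable the option on the 
backend, the backend would offer the bit, and QEMU would hide disconnects 
from the guest only when the bit survives negotiation.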

Thanks

> 
> 
> > To monitor the status of connection, we can using the command " virsh
> qemu-monitor-command vm1 --hmp info chardev " to lookup that status.
> Another one is to add new type event for Qemu to notify libvirt or other
> upper level components.
> >
> > Jerry
> >
> > > -Original Message-
> > > From: Michael S. Tsirkin [mailto:m...@redhat.com]
> > > Sent: Tuesday, March 05, 2019 10:39 AM
> > > To: Lilijun (Jerry, Cloud Networking) 
> > > Cc: qemu-devel@nongnu.org; pbonz...@redhat.com; Liujinsong (Paul)
> > > ; lixiao (H) ;
> > > wangyunjian ; wangxin (U)
> > > ; Gonglei (Arei)
> > > 
> > > Subject: Re: Question about VM virtio device's link down delay when
> > > vhost- user reconnect
> > >
> > > On Mon, Mar 04, 2019 at 11:46:32AM +, Lilijun (Jerry, Cloud
> > > Networking)
> > > wrote:
> > > > Hi all,
> > > >
> > > >   I am running my VM using vhost-user NIC with OVS-DPDK.  The
> > > > steps of
> > > my question is shown as follows:
> > > >  1) In the VM, I add one route entry manually on the vNIC eth0
> > > > using
> > > "route add default gw 192.168.1.2".
> > > >  2) If openvswitch service was restarted, or the process
> > > > ovs-vswitchd was
> > > aborted, the new process may be started successfully after long
> > > seconds such as 40s for the initialization of DPDK huge page memory.
> > > >  3) And Qemu's vhost-user closed the connection and
> > > > reconnected
> > > successfully after 40s.
> > > >  4) Here VM's vNIC will receive link down and up events, the
> > > > interval
> > > between the two events is about 40s.
> > > >  5) Then I found that route entry disappeared unexpectedly.
> > > > This will
> > > cause some network traffic problems.
> > > >
> > > >  I have an idea about this problem. We can add a parameter "
> > > link_down_delay" for all virtio devices that use vhost-user socket
> > > such as virtio-net and virtio-blk.
> > > >
> > > > If vhost-user socket get a connection closed event when the
> > > > backend
> > > process was aborted or restarted, we don't notify VM virtio-net
> > > device link down right now.
> > > >When the vhost-user backend recover this socket's connections
> > > > before
> > > the time of "link_down_delay" ms passed, we need not do that link
> > > down notification to VM.
> > > >Or else, if that's timeout, VM can be notified the link down
> > > > event as
> > > before.
> > > >
> > > > Is there any other opinions about this solution?  Or some better 
> > > > ideas?
> > > Thanks.
> > > >
> > > > B.R.
> > > >
> > > > Jerry
> > > >
> > >
> > > Rather than hardcode a specific timeout policy, I would go further
> > > and start with an option to just hide disconnects from guest completely.
> > > Instead add commands to monitor status of connection and events to
> > > report changes.  Management tools can then mirror connection status
> > > to link if they want to.
> > >
> > > --
> > > MST



Re: [Qemu-devel] Question about VM virtio device's link down delay when vhost-user reconnect

2019-03-05 Thread Lilijun (Jerry, Cloud Networking)
Thanks a lot for your advice.

There may be two ways to add this option:
1) Add a vhost-user protocol feature that tells QEMU whether to hide 
disconnects from the guest.  Only the backend, such as DPDK vhost-user, needs 
to support this option and the feature.
2) Add a vhost-user NIC configuration parameter for QEMU to the VM XML.  
This method needs more changes, and other components such as Libvirt and 
Nova in OpenStack would have to configure it.

I'd prefer the first method.  What do you think?

To monitor the connection status, we can use the command "virsh 
qemu-monitor-command vm1 --hmp info chardev" to look it up.  Another option 
is to add a new event type so that QEMU can notify libvirt or other 
upper-level components.
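
As a rough illustration of the polling approach, the Python sketch below 
parses the text returned by "info chardev".  It assumes each line has the 
form "<id>: filename=<description>" and that a disconnected socket chardev is 
reported with a "disconnected:" prefix in the description.  The exact output 
can differ between QEMU versions, so treat the format, the chardev ids and 
the socket paths in the usage example as assumptions.

```python
def parse_chardev_status(hmp_output: str) -> dict:
    """Map each chardev id to True (connected) or False (disconnected)."""
    status = {}
    for line in hmp_output.splitlines():
        name, sep, description = line.partition(": filename=")
        if not sep:
            continue  # skip lines that do not describe a chardev
        connected = not description.strip().startswith("disconnected:")
        status[name.strip()] = connected
    return status
```

A management tool could call this on the output captured from the virsh 
command, for example:

```python
sample = (
    "charnet0: filename=disconnected:unix:/tmp/vhost0.sock,server\n"
    "charmonitor: filename=unix:/tmp/vm1.monitor,server\n"
)
print(parse_chardev_status(sample))
```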

Jerry

> -Original Message-
> From: Michael S. Tsirkin [mailto:m...@redhat.com]
> Sent: Tuesday, March 05, 2019 10:39 AM
> To: Lilijun (Jerry, Cloud Networking) 
> Cc: qemu-devel@nongnu.org; pbonz...@redhat.com; Liujinsong (Paul)
> ; lixiao (H) ; wangyunjian
> ; wangxin (U)
> ; Gonglei (Arei)
> 
> Subject: Re: Question about VM virtio device's link down delay when vhost-
> user reconnect
> 
> On Mon, Mar 04, 2019 at 11:46:32AM +, Lilijun (Jerry, Cloud Networking)
> wrote:
> > Hi all,
> >
> >   I am running my VM using vhost-user NIC with OVS-DPDK.  The steps of
> my question is shown as follows:
> >  1) In the VM, I add one route entry manually on the vNIC eth0 using
> "route add default gw 192.168.1.2".
> >  2) If openvswitch service was restarted, or the process ovs-vswitchd 
> > was
> aborted, the new process may be started successfully after long seconds
> such as 40s for the initialization of DPDK huge page memory.
> >  3) And Qemu's vhost-user closed the connection and reconnected
> successfully after 40s.
> >  4) Here VM's vNIC will receive link down and up events, the interval
> between the two events is about 40s.
> >  5) Then I found that route entry disappeared unexpectedly. This will
> cause some network traffic problems.
> >
> >  I have an idea about this problem. We can add a parameter "
> link_down_delay" for all virtio devices that use vhost-user socket such as
> virtio-net and virtio-blk.
> >
> > If vhost-user socket get a connection closed event when the backend
> process was aborted or restarted, we don't notify VM virtio-net device link
> down right now.
> >When the vhost-user backend recover this socket's connections before
> the time of "link_down_delay" ms passed, we need not do that link down
> notification to VM.
> >Or else, if that's timeout, VM can be notified the link down event as
> before.
> >
> > Is there any other opinions about this solution?  Or some better ideas?
> Thanks.
> >
> > B.R.
> >
> > Jerry
> >
> 
> Rather than hardcode a specific timeout policy, I would go further and start
> with an option to just hide disconnects from guest completely.
> Instead add commands to monitor status of connection and events to report
> changes.  Management tools can then mirror connection status to link if they
> want to.
> 
> --
> MST



[Qemu-devel] Question about VM virtio device's link down delay when vhost-user reconnect

2019-03-04 Thread Lilijun (Jerry, Cloud Networking)
Hi all,

  I am running my VM with a vhost-user NIC on OVS-DPDK.  The steps to 
reproduce my issue are as follows:
 1) In the VM, I manually add a route entry on the vNIC eth0 using "route 
add default gw 192.168.1.2". 
 2) If the openvswitch service is restarted, or the ovs-vswitchd process is 
aborted, the new process may take a long time to start, for example 40s, 
because of DPDK huge page memory initialization.
 3) QEMU's vhost-user closes the connection and reconnects successfully 
after 40s.  
 4) The VM's vNIC receives link down and link up events; the interval 
between the two events is about 40s.
 5) The route entry then disappears unexpectedly, which causes network 
traffic problems.

 I have an idea for this problem.  We can add a parameter 
"link_down_delay" for all virtio devices that use a vhost-user socket, such 
as virtio-net and virtio-blk. 

If the vhost-user socket gets a connection closed event because the backend 
process was aborted or restarted, we do not notify the VM's virtio-net device 
of link down immediately.
   If the vhost-user backend recovers the socket connection before 
"link_down_delay" ms have passed, we do not need to send the link down 
notification to the VM.
   Otherwise, once the delay expires, the VM is notified of the link down 
event as before.
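
The intended behaviour can be sketched as a small state machine.  The Python 
below is only a model of the proposal, not QEMU code: the class name, the 
method names and the use of caller-supplied millisecond timestamps are all 
assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LinkDownDebouncer:
    """Delay reporting link down to the guest after a backend disconnect."""

    link_down_delay_ms: int                   # the proposed parameter
    guest_link_up: bool = True                # link state the guest sees
    disconnected_at_ms: Optional[int] = None  # start of the grace period

    def on_disconnect(self, now_ms: int) -> None:
        # Backend socket closed: start the grace period, do not notify yet.
        if self.disconnected_at_ms is None:
            self.disconnected_at_ms = now_ms

    def on_reconnect(self) -> None:
        # Backend is back: cancel the pending link down.  If the guest was
        # already told the link is down, report it as up again.
        self.disconnected_at_ms = None
        self.guest_link_up = True

    def on_tick(self, now_ms: int) -> None:
        # Called periodically: report link down once the delay has expired.
        if self.disconnected_at_ms is None:
            return
        if now_ms - self.disconnected_at_ms >= self.link_down_delay_ms:
            self.guest_link_up = False
```

With a delay longer than the 40s restart seen above, the guest would never 
observe the link going down; with a shorter delay it would behave as today.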

Are there any other opinions about this solution, or any better ideas? 
Thanks.

B.R.

Jerry






[Qemu-devel] Question about VM inner route entry's lost when vhost-user reconnect

2019-03-04 Thread Lilijun (Jerry, Cloud Networking)

Hi all,

  I am running my VM with a vhost-user NIC on OVS-DPDK.  The steps to 
reproduce my issue are as follows:
 1) In the VM, I manually add a route entry on the vNIC eth0 using the 
Linux route tool. 
 2) When the openvswitch service is restarted after an ovs-vswitchd crash, 
QEMU's vhost-user reconnects successfully after 40s.  
 3) The VM's vNIC receives link down and link up events; the interval 
between the two events is about 40s.
 4) But the route entry has disappeared, which interrupts the user's network 
traffic and causes the service to fail.

 Is there any existing work on this problem?  Can we keep the vNIC's link 
up while vhost-user is reconnecting?

 Thanks.

Jerry