Re: [Qemu-devel] [PATCH v7 RFC] block/vxhs: Initial commit to add Veritas HyperScale VxHS block device support

Stefan Hajnoczi Wed, 30 Nov 2016 00:38:05 -0800

On Wed, Nov 30, 2016 at 04:20:03AM +0000, Rakesh Ranjan wrote:
> >>>>> Why does the client have to know about failover if it's connected to
> >>>>>a server process on the same host?  I thought the server process
> >>>>>manages networking issues (like the actual protocol to speak to other
> >>>>>VxHS nodes and for failover).
> 
> Just to comment on this, the model being followed within HyperScale is to
> allow application I/O continuity (resiliency) in various cases as
> mentioned below. It really adds value for consumer/customer and tries to
> avoid culprits for single points of failure.
> 
> 1. HyperScale storage service failure (QNIO Server)
>       - Daemon managing local storage for VMs and runs on each compute node
>       - Daemon can run as a service on Hypervisor itself as well as within VSA
> (Virtual Storage Appliance or Virtual Machine running on the hypervisor),
> which depends on ecosystem where HyperScale is supported
>       - Daemon or storage service down/crash/crash-in-loop shouldn¹t lead to 
> an
> huge impact on all the VMs running on that hypervisor or compute node
> hence providing service level resiliency is very useful for
>           application I/O continuity in such case.
> 
>    Solution:
>       - The service failure handling can be only done at the client side and
> not at the server side since service running as a server itself is down.
>       - Client detects an I/O error and depending on the logic, it does
> application I/O failover to another available/active QNIO server or
> HyperScale Storage service running on different compute node
> (reflection/replication node)
>       - Once the orig/old server comes back online, client gets/receives
> negotiated error (not a real application error) to do the application I/O
> failback to the original server or local HyperScale storage service to get
> better I/O performance.
>   
> 2. Local physical storage or media failure
>       - Once server or HyperScale storage service detects the media or local
> disk failure, depending on the vDisk (guest disk) configuration, if
> another storage copy is available
>           on different compute node then it internally handles the local
> fault and serves the application read and write requests otherwise
> application or client gets the fault.
>       - Client doesn¹t know about any I/O failure since Server or Storage
> service manages/handles the fault tolerance.
>         - In such case, in order to get some I/O performance benefit, once
> client gets a negotiated error (not an application error) from local
> server or storage service,
>           client can initiate I/O failover and can directly send
> application I/O to another compute node where storage copy is available to
> serve the application need instead of sending it locally where media is
> faulted.


Thanks for explaining the model.

The new information for me here is that the qnio server may run in a VM
instead of on the host and that the client will attempt to use a remote
qnio server if the local qnio server fails.

This means that although the discussion most recently focussed on local
I/O tap performance, there is a requirement for a network protocol too.
The local I/O tap stuff is just an optimization for when the local qnio
server can be used.

Stefan

signature.asc
Description: PGP signature

Re: [Qemu-devel] [PATCH v7 RFC] block/vxhs: Initial commit to add Veritas HyperScale VxHS block device support

Reply via email to