Re: [PATCH v2] piix: fix regression during unplug in Xen HVM domUs

2023-05-12 Thread Stefano Stabellini
On Wed, 10 May 2023, Olaf Hering wrote:
> Wed, 10 May 2023 00:58:27 +0200 Olaf Hering :
> 
> > In my debugging (with v8.0.0) it turned out the three pci_set_word
> > causes the domU to hang. In fact, it is just the last one:
> > 
> >pci_set_byte(pci_conf + 0x20, 0x01);  /* BMIBA: 20-23h */
> > 
> > It changes the value from 0xc121 to 0x1.
> 
> If I disable just "pci_set_word(pci_conf + PCI_COMMAND, 0x);" it works as 
> well.
> It changes the value from 0x5 to 0.
> 
> In general I feel it is wrong to fiddle with PCI from the host side.
> This is most likely not the intention of the Xen unplug protocol.
> I'm sure the guest does not expect such changes under the hood.
> It happens to work by luck with pvops kernels because their PCI discovery
> is done after the unplug.
> 
> So, what do we do here to get this off the table?

I don't have a concrete suggestion because I don't understand the root
cause of the issue. Looking back at Paolo's reply from 2021

https://marc.info/?l=xen-devel=161669099305992=2

I think he was right. We can either fix the root cause of the issue or
avoid calling qdev_reset_all on unplug. I am OK with either one.



Re: [Qemu-block] [PATCH 02/18] xen: introduce new 'XenBus' and 'XenDevice' object hierarchy

2018-11-28 Thread Stefano Stabellini
On Wed, 28 Nov 2018, Paul Durrant wrote:
> > -Original Message-
> > From: Kevin Wolf [mailto:kw...@redhat.com]
> > Sent: 28 November 2018 16:19
> > To: Paul Durrant 
> > Cc: qemu-block@nongnu.org; qemu-de...@nongnu.org; xen-
> > de...@lists.xenproject.org; Stefano Stabellini ;
> > Eduardo Habkost ; Michael S. Tsirkin
> > ; Marcel Apfelbaum ; Anthony
> > Perard ; Paolo Bonzini ;
> > Richard Henderson 
> > Subject: Re: [Qemu-block] [PATCH 02/18] xen: introduce new 'XenBus' and
> > 'XenDevice' object hierarchy
> > 
> > Am 21.11.2018 um 16:11 hat Paul Durrant geschrieben:
> > > This patch adds the basic boilerplate for a 'XenBus' object that will
> > act
> > > as a parent to 'XenDevice' PV backends.
> > > A new 'XenBridge' object is also added to connect XenBus to the system
> > bus.
> > >
> > > The XenBus object is instantiated by a new xen_bus_init() function
> > called
> > > from the same sites as the legacy xen_be_init() function.
> > >
> > > Subsequent patches will flesh-out the functionality of these objects.
> > >
> > > Signed-off-by: Paul Durrant 
> > 
> > > diff --git a/hw/xen/xen-bus.c b/hw/xen/xen-bus.c
> > > new file mode 100644
> > > index 00..dede2d914a
> > > --- /dev/null
> > > +++ b/hw/xen/xen-bus.c
> > > @@ -0,0 +1,125 @@
> > > +/*
> > > + * Copyright (c) Citrix Systems Inc.
> > > + * All rights reserved.
> > > + */
> > 
> > This doesn't look very compatible with the GPL. In fact it might even
> > make it illegal for the QEMU project to distribute this code. :-)
> > 
> > Other files you add throughout the series seem to have the same problem.
> > 
> 
> I was working on the assumption that a lack of explicit license meant that 
> the overall project license as described in item 2 in LICENSE. Did I 
> misinterpret that text?

It's "All rights reserved." the problem



Re: [Qemu-block] [PATCH 17/18] MAINTAINERS: add myself as a Xen maintainer

2018-11-27 Thread Stefano Stabellini
On Wed, 21 Nov 2018, Paul Durrant wrote:
> I have made many significant contributions to the Xen code in QEMU,
> particularly the recent patches introducing a new PV device framework.
> I intend to make further significant contributions, porting other PV back-
> ends to the new framework with the intent of eventually removing the
> legacy code. It therefore seems reasonable that I become a maintiner of
> the Xen code.
> 
> Signed-off-by: Paul Durrant 

Acked-by: Stefano Stabellini 

> ---
> Cc: Stefano Stabellini 
> Cc: Anthony Perard 
> Cc: Paolo Bonzini 
> ---
>  MAINTAINERS | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 5871f035c3..0b668dd205 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -382,6 +382,7 @@ Guest CPU Cores (Xen):
>  X86
>  M: Stefano Stabellini 
>  M: Anthony Perard 
> +M: Paul Durrant 
>  L: xen-de...@lists.xenproject.org
>  S: Supported
>  F: */xen*
> -- 
> 2.11.0
> 



Re: [Qemu-block] [PATCH v2] xen-disk: use g_new0 to fix build

2017-07-28 Thread Stefano Stabellini
On Fri, 28 Jul 2017, Olaf Hering wrote:
> g_malloc0_n is available since glib-2.24. To allow build with older glib
> versions use the generic g_new0, which is already used in many other
> places in the code.
> 
> Fixes commit 3284fad728 ("xen-disk: add support for multi-page shared rings")
> 
> Signed-off-by: Olaf Hering <o...@aepfle.de>

Reviewed-by: Stefano Stabellini <sstabell...@kernel.org>

I'll add it to my queue


> ---
>  hw/block/xen_disk.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/hw/block/xen_disk.c b/hw/block/xen_disk.c
> index d42ed7070d..536e2ee735 100644
> --- a/hw/block/xen_disk.c
> +++ b/hw/block/xen_disk.c
> @@ -1232,7 +1232,7 @@ static int blk_connect(struct XenDevice *xendev)
>  return -1;
>  }
>  
> -domids = g_malloc0_n(blkdev->nr_ring_ref, sizeof(uint32_t));
> +domids = g_new0(uint32_t, blkdev->nr_ring_ref);
>  for (i = 0; i < blkdev->nr_ring_ref; i++) {
>  domids[i] = blkdev->xendev.dom;
>  }
> 



Re: [Qemu-block] [Qemu-devel] [PATCH v2] xen-disk: use g_new0 to fix build

2017-07-28 Thread Stefano Stabellini
On Fri, 28 Jul 2017, Philippe Mathieu-Daudé wrote:
> Hi Olaf,
> 
> On 07/28/2017 10:11 AM, Olaf Hering wrote:
> > g_malloc0_n is available since glib-2.24. To allow build with older glib
> > versions use the generic g_new0, which is already used in many other
> > places in the code.
> 
> Can you provide information about which distrib/release/version/[packages?]
> you used? So we might add the same setup in QEMU continuous integration
> system.

FYI http://marc.info/?l=qemu-devel=150113545516942=2



> > 
> > Fixes commit 3284fad728 ("xen-disk: add support for multi-page shared
> > rings")
> > 
> > Signed-off-by: Olaf Hering 
> > ---
> >   hw/block/xen_disk.c | 2 +-
> >   1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/hw/block/xen_disk.c b/hw/block/xen_disk.c
> > index d42ed7070d..536e2ee735 100644
> > --- a/hw/block/xen_disk.c
> > +++ b/hw/block/xen_disk.c
> > @@ -1232,7 +1232,7 @@ static int blk_connect(struct XenDevice *xendev)
> >   return -1;
> >   }
> >   -domids = g_malloc0_n(blkdev->nr_ring_ref, sizeof(uint32_t));
> > +domids = g_new0(uint32_t, blkdev->nr_ring_ref);
> >   for (i = 0; i < blkdev->nr_ring_ref; i++) {
> >   domids[i] = blkdev->xendev.dom;
> >   }
> > 
> 

Re: [Qemu-block] [PATCH v2 3/3] xen-disk: use an IOThread per instance

2017-07-07 Thread Stefano Stabellini
On Fri, 7 Jul 2017, Paul Durrant wrote:
> > -Original Message-
> > From: Stefano Stabellini [mailto:sstabell...@kernel.org]
> > Sent: 22 June 2017 23:15
> > To: Paul Durrant <paul.durr...@citrix.com>
> > Cc: xen-de...@lists.xenproject.org; qemu-de...@nongnu.org; qemu-
> > bl...@nongnu.org; Stefano Stabellini <sstabell...@kernel.org>; Anthony
> > Perard <anthony.per...@citrix.com>; Kevin Wolf <kw...@redhat.com>;
> > Max Reitz <mre...@redhat.com>; afaer...@suse.de
> > Subject: Re: [PATCH v2 3/3] xen-disk: use an IOThread per instance
> > 
> > CC'ing Andreas Färber. Could you please give a quick look below at the
> > way the iothread object is instantiate and destroyed? I am no object
> > model expert and would appreaciate a second opinion.
> > 
> 
> I have not seen any response so far.
> 
> > 
> > On Wed, 21 Jun 2017, Paul Durrant wrote:
> > > This patch allocates an IOThread object for each xen_disk instance and
> > > sets the AIO context appropriately on connect. This allows processing
> > > of I/O to proceed in parallel.
> > >
> > > The patch also adds tracepoints into xen_disk to make it possible to
> > > follow the state transtions of an instance in the log.
> > >
> > > Signed-off-by: Paul Durrant <paul.durr...@citrix.com>
> > > ---
> > > Cc: Stefano Stabellini <sstabell...@kernel.org>
> > > Cc: Anthony Perard <anthony.per...@citrix.com>
> > > Cc: Kevin Wolf <kw...@redhat.com>
> > > Cc: Max Reitz <mre...@redhat.com>
> > >
> > > v2:
> > >  - explicitly acquire and release AIO context in qemu_aio_complete() and
> > >blk_bh()
> > > ---
> > >  hw/block/trace-events |  7 ++
> > >  hw/block/xen_disk.c   | 69
> > ---
> > >  2 files changed, 67 insertions(+), 9 deletions(-)
> > >
> > > diff --git a/hw/block/trace-events b/hw/block/trace-events
> > > index 65e83dc258..608b24ba66 100644
> > > --- a/hw/block/trace-events
> > > +++ b/hw/block/trace-events
> > > @@ -10,3 +10,10 @@ virtio_blk_submit_multireq(void *mrb, int start, int
> > num_reqs, uint64_t offset,
> > >  # hw/block/hd-geometry.c
> > >  hd_geometry_lchs_guess(void *blk, int cyls, int heads, int secs) "blk %p
> > LCHS %d %d %d"
> > >  hd_geometry_guess(void *blk, uint32_t cyls, uint32_t heads, uint32_t
> > secs, int trans) "blk %p CHS %u %u %u trans %d"
> > > +
> > > +# hw/block/xen_disk.c
> > > +xen_disk_alloc(char *name) "%s"
> > > +xen_disk_init(char *name) "%s"
> > > +xen_disk_connect(char *name) "%s"
> > > +xen_disk_disconnect(char *name) "%s"
> > > +xen_disk_free(char *name) "%s"
> > > diff --git a/hw/block/xen_disk.c b/hw/block/xen_disk.c
> > > index 0e6513708e..8548195195 100644
> > > --- a/hw/block/xen_disk.c
> > > +++ b/hw/block/xen_disk.c
> > > @@ -27,10 +27,13 @@
> > >  #include "hw/xen/xen_backend.h"
> > >  #include "xen_blkif.h"
> > >  #include "sysemu/blockdev.h"
> > > +#include "sysemu/iothread.h"
> > >  #include "sysemu/block-backend.h"
> > >  #include "qapi/error.h"
> > >  #include "qapi/qmp/qdict.h"
> > >  #include "qapi/qmp/qstring.h"
> > > +#include "qom/object_interfaces.h"
> > > +#include "trace.h"
> > >
> > >  /* - */
> > >
> > > @@ -128,6 +131,9 @@ struct XenBlkDev {
> > >  DriveInfo   *dinfo;
> > >  BlockBackend*blk;
> > >  QEMUBH  *bh;
> > > +
> > > +IOThread*iothread;
> > > +AioContext  *ctx;
> > >  };
> > >
> > >  /* - */
> > > @@ -599,9 +605,12 @@ static int ioreq_runio_qemu_aio(struct ioreq
> > *ioreq);
> > >  static void qemu_aio_complete(void *opaque, int ret)
> > >  {
> > >  struct ioreq *ioreq = opaque;
> > > +struct XenBlkDev *blkdev = ioreq->blkdev;
> > > +
> > > +aio_context_acquire(blkdev->ctx);
> > 
> > I think that Paolo was right that we need a aio_context_acquire here,
> > however the issue is that with the current code:
> &g

Re: [Qemu-block] [Xen-devel] [PATCH v2 0/3] xen-disk: performance improvements

2017-06-27 Thread Stefano Stabellini
On Wed, 21 Jun 2017, Paul Durrant wrote:
> Paul Durrant (3):
>   xen-disk: only advertize feature-persistent if grant copy is not
> available
>   xen-disk: add support for multi-page shared rings
>   xen-disk: use an IOThread per instance
> 
>  hw/block/trace-events |   7 ++
>  hw/block/xen_disk.c   | 228 
> +++---
>  2 files changed, 188 insertions(+), 47 deletions(-)

While waiting for an answer on patch #3, I sent a pull request for the
first 2 patches



Re: [Qemu-block] [PATCH v2 3/3] xen-disk: use an IOThread per instance

2017-06-22 Thread Stefano Stabellini
CC'ing Andreas Färber. Could you please give a quick look below at the
way the iothread object is instantiate and destroyed? I am no object
model expert and would appreaciate a second opinion.


On Wed, 21 Jun 2017, Paul Durrant wrote:
> This patch allocates an IOThread object for each xen_disk instance and
> sets the AIO context appropriately on connect. This allows processing
> of I/O to proceed in parallel.
> 
> The patch also adds tracepoints into xen_disk to make it possible to
> follow the state transtions of an instance in the log.
> 
> Signed-off-by: Paul Durrant <paul.durr...@citrix.com>
> ---
> Cc: Stefano Stabellini <sstabell...@kernel.org>
> Cc: Anthony Perard <anthony.per...@citrix.com>
> Cc: Kevin Wolf <kw...@redhat.com>
> Cc: Max Reitz <mre...@redhat.com>
> 
> v2:
>  - explicitly acquire and release AIO context in qemu_aio_complete() and
>blk_bh()
> ---
>  hw/block/trace-events |  7 ++
>  hw/block/xen_disk.c   | 69 
> ---
>  2 files changed, 67 insertions(+), 9 deletions(-)
> 
> diff --git a/hw/block/trace-events b/hw/block/trace-events
> index 65e83dc258..608b24ba66 100644
> --- a/hw/block/trace-events
> +++ b/hw/block/trace-events
> @@ -10,3 +10,10 @@ virtio_blk_submit_multireq(void *mrb, int start, int 
> num_reqs, uint64_t offset,
>  # hw/block/hd-geometry.c
>  hd_geometry_lchs_guess(void *blk, int cyls, int heads, int secs) "blk %p 
> LCHS %d %d %d"
>  hd_geometry_guess(void *blk, uint32_t cyls, uint32_t heads, uint32_t secs, 
> int trans) "blk %p CHS %u %u %u trans %d"
> +
> +# hw/block/xen_disk.c
> +xen_disk_alloc(char *name) "%s"
> +xen_disk_init(char *name) "%s"
> +xen_disk_connect(char *name) "%s"
> +xen_disk_disconnect(char *name) "%s"
> +xen_disk_free(char *name) "%s"
> diff --git a/hw/block/xen_disk.c b/hw/block/xen_disk.c
> index 0e6513708e..8548195195 100644
> --- a/hw/block/xen_disk.c
> +++ b/hw/block/xen_disk.c
> @@ -27,10 +27,13 @@
>  #include "hw/xen/xen_backend.h"
>  #include "xen_blkif.h"
>  #include "sysemu/blockdev.h"
> +#include "sysemu/iothread.h"
>  #include "sysemu/block-backend.h"
>  #include "qapi/error.h"
>  #include "qapi/qmp/qdict.h"
>  #include "qapi/qmp/qstring.h"
> +#include "qom/object_interfaces.h"
> +#include "trace.h"
>  
>  /* - */
>  
> @@ -128,6 +131,9 @@ struct XenBlkDev {
>  DriveInfo   *dinfo;
>  BlockBackend*blk;
>  QEMUBH  *bh;
> +
> +IOThread*iothread;
> +AioContext  *ctx;
>  };
>  
>  /* - */
> @@ -599,9 +605,12 @@ static int ioreq_runio_qemu_aio(struct ioreq *ioreq);
>  static void qemu_aio_complete(void *opaque, int ret)
>  {
>  struct ioreq *ioreq = opaque;
> +struct XenBlkDev *blkdev = ioreq->blkdev;
> +
> +aio_context_acquire(blkdev->ctx);

I think that Paolo was right that we need a aio_context_acquire here,
however the issue is that with the current code:

  blk_handle_requests -> ioreq_runio_qemu_aio -> qemu_aio_complete

leading to aio_context_acquire being called twice on the same lock,
which I don't think is allowed?

I think we need to get rid of the qemu_aio_complete call from
ioreq_runio_qemu_aio, but to do that we need to be careful with the
accounting of aio_inflight (today it's incremented unconditionally at
the beginning of ioreq_runio_qemu_aio, I think we would have to change
that to increment it only if presync).


>  if (ret != 0) {
> -xen_pv_printf(>blkdev->xendev, 0, "%s I/O error\n",
> +xen_pv_printf(>xendev, 0, "%s I/O error\n",
>ioreq->req.operation == BLKIF_OP_READ ? "read" : 
> "write");
>  ioreq->aio_errors++;
>  }
> @@ -610,13 +619,13 @@ static void qemu_aio_complete(void *opaque, int ret)
>  if (ioreq->presync) {
>  ioreq->presync = 0;
>  ioreq_runio_qemu_aio(ioreq);
> -return;
> +goto done;
>  }
>  if (ioreq->aio_inflight > 0) {
> -return;
> +goto done;
>  }
>  
> -if (ioreq->blkdev->feature_grant_copy) {
> +if (blkdev->feature_grant_copy) {
>  switch (ioreq->req.operation) {
>  case BLKIF_OP_READ:
>  /* in case of failure ioreq->aio_errors is increased */
> @@ -638,7 +647,7 @@ static void qemu_aio_complete(

Re: [Qemu-block] [PATCH v2 1/3] xen-disk: only advertize feature-persistent if grant copy is not available

2017-06-21 Thread Stefano Stabellini
On Wed, 21 Jun 2017, Paul Durrant wrote:
> If grant copy is available then it will always be used in preference to
> persistent maps. In this case feature-persistent should not be advertized
> to the frontend, otherwise it may needlessly copy data into persistently
> granted buffers.
> 
> Signed-off-by: Paul Durrant <paul.durr...@citrix.com>

Reviewed-by: Stefano Stabellini <sstabell...@kernel.org>


> ---
> Cc: Stefano Stabellini <sstabell...@kernel.org>
> Cc: Anthony Perard <anthony.per...@citrix.com>
> Cc: Kevin Wolf <kw...@redhat.com>
> Cc: Max Reitz <mre...@redhat.com>
> ---
>  hw/block/xen_disk.c | 15 ---
>  1 file changed, 8 insertions(+), 7 deletions(-)
> 
> diff --git a/hw/block/xen_disk.c b/hw/block/xen_disk.c
> index 3a22805fbc..9b06e3aa81 100644
> --- a/hw/block/xen_disk.c
> +++ b/hw/block/xen_disk.c
> @@ -1023,11 +1023,18 @@ static int blk_init(struct XenDevice *xendev)
>  
>  blkdev->file_blk  = BLOCK_SIZE;
>  
> +blkdev->feature_grant_copy =
> +(xengnttab_grant_copy(blkdev->xendev.gnttabdev, 0, NULL) == 
> 0);
> +
> +xen_pv_printf(>xendev, 3, "grant copy operation %s\n",
> +  blkdev->feature_grant_copy ? "enabled" : "disabled");
> +
>  /* fill info
>   * blk_connect supplies sector-size and sectors
>   */
>  xenstore_write_be_int(>xendev, "feature-flush-cache", 1);
> -xenstore_write_be_int(>xendev, "feature-persistent", 1);
> +xenstore_write_be_int(>xendev, "feature-persistent",
> +  !blkdev->feature_grant_copy);
>  xenstore_write_be_int(>xendev, "info", info);
>  
>  blk_parse_discard(blkdev);
> @@ -1202,12 +1209,6 @@ static int blk_connect(struct XenDevice *xendev)
>  
>  xen_be_bind_evtchn(>xendev);
>  
> -blkdev->feature_grant_copy =
> -(xengnttab_grant_copy(blkdev->xendev.gnttabdev, 0, NULL) == 
> 0);
> -
> -xen_pv_printf(>xendev, 3, "grant copy operation %s\n",
> -  blkdev->feature_grant_copy ? "enabled" : "disabled");
> -
>  xen_pv_printf(>xendev, 1, "ok: proto %s, ring-ref %d, "
>"remote port %d, local port %d\n",
>blkdev->xendev.protocol, blkdev->ring_ref,
> -- 
> 2.11.0
> 



Re: [Qemu-block] [PATCH v2 2/3] xen-disk: add support for multi-page shared rings

2017-06-21 Thread Stefano Stabellini
On Wed, 21 Jun 2017, Paul Durrant wrote:
> The blkif protocol has had provision for negotiation of multi-page shared
> rings for some time now and many guest OS have support in their frontend
> drivers.
> 
> This patch makes the necessary modifications to xen-disk support a shared
> ring up to order 4 (i.e. 16 pages).
> 
> Signed-off-by: Paul Durrant <paul.durr...@citrix.com>

Reviewed-by: Stefano Stabellini <sstabell...@kernel.org>

> ---
> Cc: Stefano Stabellini <sstabell...@kernel.org>
> Cc: Anthony Perard <anthony.per...@citrix.com>
> Cc: Kevin Wolf <kw...@redhat.com>
> Cc: Max Reitz <mre...@redhat.com>
> 
> v2:
>  - Fix memory leak in error path
>  - Print warning if ring-page-order exceeds limits
> ---
>  hw/block/xen_disk.c | 144 
> +---
>  1 file changed, 113 insertions(+), 31 deletions(-)
> 
> diff --git a/hw/block/xen_disk.c b/hw/block/xen_disk.c
> index 9b06e3aa81..0e6513708e 100644
> --- a/hw/block/xen_disk.c
> +++ b/hw/block/xen_disk.c
> @@ -36,8 +36,6 @@
>  
>  static int batch_maps   = 0;
>  
> -static int max_requests = 32;
> -
>  /* - */
>  
>  #define BLOCK_SIZE  512
> @@ -84,6 +82,8 @@ struct ioreq {
>  BlockAcctCookie acct;
>  };
>  
> +#define MAX_RING_PAGE_ORDER 4
> +
>  struct XenBlkDev {
>  struct XenDevicexendev;  /* must be first */
>  char*params;
> @@ -94,7 +94,8 @@ struct XenBlkDev {
>  booldirectiosafe;
>  const char  *fileproto;
>  const char  *filename;
> -int ring_ref;
> +unsigned intring_ref[1 << MAX_RING_PAGE_ORDER];
> +unsigned intnr_ring_ref;
>  void*sring;
>  int64_t file_blk;
>  int64_t file_size;
> @@ -110,6 +111,7 @@ struct XenBlkDev {
>  int requests_total;
>  int requests_inflight;
>  int requests_finished;
> +unsigned intmax_requests;
>  
>  /* Persistent grants extension */
>  gbooleanfeature_discard;
> @@ -199,7 +201,7 @@ static struct ioreq *ioreq_start(struct XenBlkDev *blkdev)
>  struct ioreq *ioreq = NULL;
>  
>  if (QLIST_EMPTY(>freelist)) {
> -if (blkdev->requests_total >= max_requests) {
> +if (blkdev->requests_total >= blkdev->max_requests) {
>  goto out;
>  }
>  /* allocate new struct */
> @@ -905,7 +907,7 @@ static void blk_handle_requests(struct XenBlkDev *blkdev)
>  ioreq_runio_qemu_aio(ioreq);
>  }
>  
> -if (blkdev->more_work && blkdev->requests_inflight < max_requests) {
> +if (blkdev->more_work && blkdev->requests_inflight < 
> blkdev->max_requests) {
>  qemu_bh_schedule(blkdev->bh);
>  }
>  }
> @@ -918,15 +920,6 @@ static void blk_bh(void *opaque)
>  blk_handle_requests(blkdev);
>  }
>  
> -/*
> - * We need to account for the grant allocations requiring contiguous
> - * chunks; the worst case number would be
> - * max_req * max_seg + (max_req - 1) * (max_seg - 1) + 1,
> - * but in order to keep things simple just use
> - * 2 * max_req * max_seg.
> - */
> -#define MAX_GRANTS(max_req, max_seg) (2 * (max_req) * (max_seg))
> -
>  static void blk_alloc(struct XenDevice *xendev)
>  {
>  struct XenBlkDev *blkdev = container_of(xendev, struct XenBlkDev, 
> xendev);
> @@ -938,11 +931,6 @@ static void blk_alloc(struct XenDevice *xendev)
>  if (xen_mode != XEN_EMULATE) {
>  batch_maps = 1;
>  }
> -if (xengnttab_set_max_grants(xendev->gnttabdev,
> -MAX_GRANTS(max_requests, BLKIF_MAX_SEGMENTS_PER_REQUEST)) < 0) {
> -xen_pv_printf(xendev, 0, "xengnttab_set_max_grants failed: %s\n",
> -  strerror(errno));
> -}
>  }
>  
>  static void blk_parse_discard(struct XenBlkDev *blkdev)
> @@ -1037,6 +1025,9 @@ static int blk_init(struct XenDevice *xendev)
>!blkdev->feature_grant_copy);
>  xenstore_write_be_int(>xendev, "info", info);
>  
> +xenstore_write_be_int(>xendev, "max-ring-page-order",
> +  MAX_RING_PAGE_ORDER);
> +
>  blk_parse_discard(blkdev);
>  
>  g_free(directiosafe);
> @@ -1058,12 +1049,25 @@ out_error:
>  return -1;
>  }
>  
> +/*
> + * We need to account for the grant allocations requiring contiguous
> + * chunks; the worst case 

Re: [Qemu-block] [PATCH 2/3] xen-disk: add support for multi-page shared rings

2017-06-20 Thread Stefano Stabellini
On Tue, 20 Jun 2017, Paul Durrant wrote:
> The blkif protocol has had provision for negotiation of multi-page shared
> rings for some time now and many guest OS have support in their frontend
> drivers.
> 
> This patch makes the necessary modifications to xen-disk support a shared
> ring up to order 4 (i.e. 16 pages).
> 
> Signed-off-by: Paul Durrant <paul.durr...@citrix.com>

Thanks for the patch!

> ---
> Cc: Stefano Stabellini <sstabell...@kernel.org>
> Cc: Anthony Perard <anthony.per...@citrix.com>
> Cc: Kevin Wolf <kw...@redhat.com>
> Cc: Max Reitz <mre...@redhat.com>
> ---
>  hw/block/xen_disk.c | 141 
> 
>  1 file changed, 110 insertions(+), 31 deletions(-)
> 
> diff --git a/hw/block/xen_disk.c b/hw/block/xen_disk.c
> index 9b06e3aa81..a9942d32db 100644
> --- a/hw/block/xen_disk.c
> +++ b/hw/block/xen_disk.c
> @@ -36,8 +36,6 @@
>  
>  static int batch_maps   = 0;
>  
> -static int max_requests = 32;
> -
>  /* - */
>  
>  #define BLOCK_SIZE  512
> @@ -84,6 +82,8 @@ struct ioreq {
>  BlockAcctCookie acct;
>  };
>  
> +#define MAX_RING_PAGE_ORDER 4
> +
>  struct XenBlkDev {
>  struct XenDevicexendev;  /* must be first */
>  char*params;
> @@ -94,7 +94,8 @@ struct XenBlkDev {
>  booldirectiosafe;
>  const char  *fileproto;
>  const char  *filename;
> -int ring_ref;
> +unsigned intring_ref[1 << MAX_RING_PAGE_ORDER];
> +unsigned intnr_ring_ref;
>  void*sring;
>  int64_t file_blk;
>  int64_t file_size;
> @@ -110,6 +111,7 @@ struct XenBlkDev {
>  int requests_total;
>  int requests_inflight;
>  int requests_finished;
> +unsigned intmax_requests;
>  
>  /* Persistent grants extension */
>  gbooleanfeature_discard;
> @@ -199,7 +201,7 @@ static struct ioreq *ioreq_start(struct XenBlkDev *blkdev)
>  struct ioreq *ioreq = NULL;
>  
>  if (QLIST_EMPTY(>freelist)) {
> -if (blkdev->requests_total >= max_requests) {
> +if (blkdev->requests_total >= blkdev->max_requests) {
>  goto out;
>  }
>  /* allocate new struct */
> @@ -905,7 +907,7 @@ static void blk_handle_requests(struct XenBlkDev *blkdev)
>  ioreq_runio_qemu_aio(ioreq);
>  }
>  
> -if (blkdev->more_work && blkdev->requests_inflight < max_requests) {
> +if (blkdev->more_work && blkdev->requests_inflight < 
> blkdev->max_requests) {
>  qemu_bh_schedule(blkdev->bh);
>  }
>  }
> @@ -918,15 +920,6 @@ static void blk_bh(void *opaque)
>  blk_handle_requests(blkdev);
>  }
>  
> -/*
> - * We need to account for the grant allocations requiring contiguous
> - * chunks; the worst case number would be
> - * max_req * max_seg + (max_req - 1) * (max_seg - 1) + 1,
> - * but in order to keep things simple just use
> - * 2 * max_req * max_seg.
> - */
> -#define MAX_GRANTS(max_req, max_seg) (2 * (max_req) * (max_seg))
> -
>  static void blk_alloc(struct XenDevice *xendev)
>  {
>  struct XenBlkDev *blkdev = container_of(xendev, struct XenBlkDev, 
> xendev);
> @@ -938,11 +931,6 @@ static void blk_alloc(struct XenDevice *xendev)
>  if (xen_mode != XEN_EMULATE) {
>  batch_maps = 1;
>  }
> -if (xengnttab_set_max_grants(xendev->gnttabdev,
> -MAX_GRANTS(max_requests, BLKIF_MAX_SEGMENTS_PER_REQUEST)) < 0) {
> -xen_pv_printf(xendev, 0, "xengnttab_set_max_grants failed: %s\n",
> -  strerror(errno));
> -}
>  }
>  
>  static void blk_parse_discard(struct XenBlkDev *blkdev)
> @@ -1037,6 +1025,9 @@ static int blk_init(struct XenDevice *xendev)
>!blkdev->feature_grant_copy);
>  xenstore_write_be_int(>xendev, "info", info);
>  
> +xenstore_write_be_int(>xendev, "max-ring-page-order",
> +  MAX_RING_PAGE_ORDER);
> +
>  blk_parse_discard(blkdev);
>  
>  g_free(directiosafe);
> @@ -1058,12 +1049,25 @@ out_error:
>  return -1;
>  }
>  
> +/*
> + * We need to account for the grant allocations requiring contiguous
> + * chunks; the worst case number would be
> + * max_req * max_seg + (max_req - 1) * (max_seg - 1) + 1,
> + * but in order to keep things simple just use
> + * 

Re: [Qemu-block] [PATCH 1/3] xen-disk: only advertize feature-persistent if grant copy is not available

2017-06-20 Thread Stefano Stabellini
On Tue, 20 Jun 2017, Paul Durrant wrote:
> If grant copy is available then it will always be used in preference to
> persistent maps. In this case feature-persistent should not be advertized
> to the frontend, otherwise it may needlessly copy data into persistently
> granted buffers.
> 
> Signed-off-by: Paul Durrant <paul.durr...@citrix.com>

CC'ing Roger.

It is true that using feature-persistent together with grant copies is a
a very bad idea.

But this change enstablishes an explicit preference of
feature_grant_copy over feature-persistent in the xen_disk backend. It
is not obvious to me that it should be the case.

Why is feature_grant_copy (without feature-persistent) better than
feature-persistent (without feature_grant_copy)? Shouldn't we simply
avoid grant copies to copy data to persistent grants?


> ---
> Cc: Stefano Stabellini <sstabell...@kernel.org>
> Cc: Anthony Perard <anthony.per...@citrix.com>
> Cc: Kevin Wolf <kw...@redhat.com>
> Cc: Max Reitz <mre...@redhat.com>
> ---
>  hw/block/xen_disk.c | 15 ---
>  1 file changed, 8 insertions(+), 7 deletions(-)
> 
> diff --git a/hw/block/xen_disk.c b/hw/block/xen_disk.c
> index 3a22805fbc..9b06e3aa81 100644
> --- a/hw/block/xen_disk.c
> +++ b/hw/block/xen_disk.c
> @@ -1023,11 +1023,18 @@ static int blk_init(struct XenDevice *xendev)
>  
>  blkdev->file_blk  = BLOCK_SIZE;
>  
> +blkdev->feature_grant_copy =
> +(xengnttab_grant_copy(blkdev->xendev.gnttabdev, 0, NULL) == 
> 0);
> +
> +xen_pv_printf(>xendev, 3, "grant copy operation %s\n",
> +  blkdev->feature_grant_copy ? "enabled" : "disabled");
> +
>  /* fill info
>   * blk_connect supplies sector-size and sectors
>   */
>  xenstore_write_be_int(>xendev, "feature-flush-cache", 1);
> -xenstore_write_be_int(>xendev, "feature-persistent", 1);
> +xenstore_write_be_int(>xendev, "feature-persistent",
> +  !blkdev->feature_grant_copy);
>  xenstore_write_be_int(>xendev, "info", info);
>  
>  blk_parse_discard(blkdev);
> @@ -1202,12 +1209,6 @@ static int blk_connect(struct XenDevice *xendev)
>  
>  xen_be_bind_evtchn(>xendev);
>  
> -blkdev->feature_grant_copy =
> -(xengnttab_grant_copy(blkdev->xendev.gnttabdev, 0, NULL) == 
> 0);
> -
> -xen_pv_printf(>xendev, 3, "grant copy operation %s\n",
> -  blkdev->feature_grant_copy ? "enabled" : "disabled");
> -
>  xen_pv_printf(>xendev, 1, "ok: proto %s, ring-ref %d, "
>"remote port %d, local port %d\n",
>blkdev->xendev.protocol, blkdev->ring_ref,
> -- 
> 2.11.0
> 



Re: [Qemu-block] [PATCH] xen_disk: convert discard input to byte ranges

2016-11-23 Thread Stefano Stabellini
On Wed, 23 Nov 2016, Olaf Hering wrote:
> On Wed, Nov 23, Olaf Hering wrote:
> 
> > > > +if (!blk_split_discard(ioreq, req->sector_number, 
> > > > req->nr_sectors)) {
> > > > +goto err;
> > > How is error handling supposed to work here?
> 
> In the guest the cmd is stuck, instead of getting an IO error:
> 
> [   91.966404] mkfs.ext4   D  0  2878   2831 
> 0x
> [   91.966406]  88002204bc48 880030530480 88002fae5800 
> 88002204c000
> [   91.966407]   7fff 8000 
> 024000c0
> [   91.966409]  88002204bc60 815dd985 880038815c00 
> 88002204bd08
> [   91.966409] Call Trace:
> [   91.966413]  [] schedule+0x35/0x80
> [   91.966416]  [] schedule_timeout+0x237/0x2d0
> [   91.966419]  [] io_schedule_timeout+0xa6/0x110
> [   91.966421]  [] wait_for_completion_io+0xa3/0x110
> [   91.966425]  [] submit_bio_wait+0x50/0x60
> [   91.966430]  [] blkdev_issue_discard+0x78/0xb0
> [   91.966433]  [] blk_ioctl_discard+0x7b/0xa0
> [   91.966436]  [] blkdev_ioctl+0x730/0x920
> [   91.966440]  [] block_ioctl+0x3d/0x40
> [   91.966444]  [] do_vfs_ioctl+0x2cd/0x4a0
> [   91.966453]  [] SyS_ioctl+0x74/0x80
> [   91.966456]  [] entry_SYSCALL_64_fastpath+0x12/0x6d

The error should be sent back to the frontend via the status field. Not
sure why blkfront is not hanlding it correctly.



Re: [Qemu-block] [PATCH v3] xen_disk: split discard input to match internal representation

2016-11-23 Thread Stefano Stabellini
On Wed, 23 Nov 2016, Olaf Hering wrote:
> The guest sends discard requests as u64 sector/count pairs, but the
> block layer operates internally with s64/s32 pairs. The conversion
> leads to IO errors in the guest, the discard request is not processed.
> 
>   domU.cfg:
>   'vdev=xvda, format=qcow2, backendtype=qdisk, target=/x.qcow2'
>   domU:
>   mkfs.ext4 -F /dev/xvda
>   Discarding device blocks: failed - Input/output error
> 
> Fix this by splitting the request into chunks of BDRV_REQUEST_MAX_SECTORS.
> Add input range checking to avoid overflow.
> 
> Fixes f313520 ("xen_disk: add discard support")
> 
> Signed-off-by: Olaf Hering <o...@aepfle.de>

Reviewed-by: Stefano Stabellini <sstabell...@kernel.org>


> v3:
>  turn tab into spaces to fix checkpatch warning
> v2:
>  adjust overflow check
>  add Fixes revspec because the initial commit also failed to convert u64 to 
> s32
>  adjust summary
> 
>  hw/block/xen_disk.c | 42 --
>  1 file changed, 36 insertions(+), 6 deletions(-)
> 
> diff --git a/hw/block/xen_disk.c b/hw/block/xen_disk.c
> index 3a7dc19..456a2d5 100644
> --- a/hw/block/xen_disk.c
> +++ b/hw/block/xen_disk.c
> @@ -660,6 +660,38 @@ static void qemu_aio_complete(void *opaque, int ret)
>  qemu_bh_schedule(ioreq->blkdev->bh);
>  }
>  
> +static bool blk_split_discard(struct ioreq *ioreq, blkif_sector_t 
> sector_number,
> +  uint64_t nr_sectors)
> +{
> +struct XenBlkDev *blkdev = ioreq->blkdev;
> +int64_t byte_offset;
> +int byte_chunk;
> +uint64_t byte_remaining, limit;
> +uint64_t sec_start = sector_number;
> +uint64_t sec_count = nr_sectors;
> +
> +/* Wrap around, or overflowing byte limit? */
> +if (sec_start + sec_count < sec_count ||
> +sec_start + sec_count > INT64_MAX >> BDRV_SECTOR_BITS) {
> +return false;
> +}
> +
> +limit = BDRV_REQUEST_MAX_SECTORS << BDRV_SECTOR_BITS;
> +byte_offset = sec_start << BDRV_SECTOR_BITS;
> +byte_remaining = sec_count << BDRV_SECTOR_BITS;
> +
> +do {
> +byte_chunk = byte_remaining > limit ? limit : byte_remaining;
> +ioreq->aio_inflight++;
> +blk_aio_pdiscard(blkdev->blk, byte_offset, byte_chunk,
> + qemu_aio_complete, ioreq);
> +byte_remaining -= byte_chunk;
> +byte_offset += byte_chunk;
> +} while (byte_remaining > 0);
> +
> +return true;
> +}
> +
>  static int ioreq_runio_qemu_aio(struct ioreq *ioreq)
>  {
>  struct XenBlkDev *blkdev = ioreq->blkdev;
> @@ -708,12 +740,10 @@ static int ioreq_runio_qemu_aio(struct ioreq *ioreq)
>  break;
>  case BLKIF_OP_DISCARD:
>  {
> -struct blkif_request_discard *discard_req = (void *)>req;
> -ioreq->aio_inflight++;
> -blk_aio_pdiscard(blkdev->blk,
> - discard_req->sector_number << BDRV_SECTOR_BITS,
> - discard_req->nr_sectors << BDRV_SECTOR_BITS,
> - qemu_aio_complete, ioreq);
> +struct blkif_request_discard *req = (void *)>req;
> +if (!blk_split_discard(ioreq, req->sector_number, req->nr_sectors)) {
> +goto err;
> +}
>  break;
>  }
>  default:
> 



Re: [Qemu-block] [Qemu-devel] [PATCH for-2.8 v3] xen_disk: split discard input to match internal representation

2016-11-23 Thread Stefano Stabellini
On Wed, 23 Nov 2016, Kevin Wolf wrote:
> Am 23.11.2016 um 12:40 hat Eric Blake geschrieben:
> > On 11/23/2016 04:39 AM, Olaf Hering wrote:
> > > The guest sends discard requests as u64 sector/count pairs, but the
> > > block layer operates internally with s64/s32 pairs. The conversion
> > > leads to IO errors in the guest, the discard request is not processed.
> > > 
> > >   domU.cfg:
> > >   'vdev=xvda, format=qcow2, backendtype=qdisk, target=/x.qcow2'
> > >   domU:
> > >   mkfs.ext4 -F /dev/xvda
> > >   Discarding device blocks: failed - Input/output error
> > > 
> > > Fix this by splitting the request into chunks of BDRV_REQUEST_MAX_SECTORS.
> > > Add input range checking to avoid overflow.
> > > 
> > > Fixes f313520 ("xen_disk: add discard support")
> > > 
> > > Signed-off-by: Olaf Hering 
> > > ---
> > 
> > Qualifies as a bug fix, so requesting 2.8 inclusion.
> > Reviewed-by: Eric Blake 
> 
> Stefano, are you going to merge this or should I take a look?

I can merge it.

Cheers,

Stefano



Re: [Qemu-block] qdevification of xen_disk

2016-09-29 Thread Stefano Stabellini
Hi Kevin,

I agree with you, and if you would be so kind to send the patches, even
untested, they would be much appreciated. Anthony or I will make sure to
test them appropriately and fix them, if they turn out to be incomplete
or partially broken. Would that be OK?

Cheers,

Stefano

P.S.
FYI Xen works on top of QEMU with PV guests. So theoretically you could
install Xen inside QEMU and use Xen to lunch a PV guest which uses
xen_block as Xen backend. Fun!! :-D


On Thu, 29 Sep 2016, Kevin Wolf wrote:
> Hi Stefano and all,
> 
> while working on some part of the QEMU block layer infrastructure that
> requires going from a BlockBackend to the qdev DeviceState, I noticed
> that xen_disk is still not qdevified after all the years. It's the last
> device, and has been for a while, that is blocking the necessary changes
> in the block layer.
> 
> Specifically, I'm talking about the blk_attach_dev_nofail() and
> blk_detach_dev() functions (and related ones, which don't seem to be
> used by xen_disk though), which must be converted from accepting a void*
> for the device to DeviceState*.
> 
> I think in theory it shouldn't be too hard to build a minimal qdev
> device between xen_disk and the block layer. If you want me to, I can
> try to do that - however, I don't have a Xen setup to actually test the
> result, so if things break, you get to keep the pieces. If someone else
> would like to look into this in the next few days, that might be the
> better option.
> 
> In any case, we need to do something to make xen_disk compatible with
> the modern (well, not _that_ modern any more) infrastructure that all
> other devices use.
> 
> Kevin
> 



Re: [Qemu-block] [PATCH] xen: piix reuse pci generic class init function

2016-04-07 Thread Stefano Stabellini
On Thu, 7 Apr 2016, John Snow wrote:
> > On Wed, Apr 06, 2016 at 04:56:12PM -0700, Stefano Stabellini wrote:
> >> On Sun, 3 Apr 2016, Michael S. Tsirkin wrote:
> >>> piix3_ide_xen_class_init is identical to piix3_ide_class_init
> >>> except it's buggy as it does not set exit and does not disable
> >>> hotplug properly.
> >>>
> >>> Switch to the generic one.
> >>>
> >>> Reviewed-by: Stefano Stabellini <sstabell...@kernel.org>
> >>> Signed-off-by: Michael S. Tsirkin <m...@redhat.com>
> >>
> >> Hey John,
> >>
> >> are you going to take the patch or do you want me to handle it?
> >>
> >> Cheers,
> >>
> >> Stefano
> > 
> > it's in my tree already.
> > 
> 
> Thank you. Apologies for being asleep at the wheel.
> I've been somewhat distracted working on something else recently.
> 
> I'll trust your judgment when it comes to XEN.

Thanks both of you.

Cheers,

Stefano


> >>
> >>>  hw/ide/piix.c | 14 +-
> >>>  1 file changed, 1 insertion(+), 13 deletions(-)
> >>>
> >>> diff --git a/hw/ide/piix.c b/hw/ide/piix.c
> >>> index df46147..0a4cbcb 100644
> >>> --- a/hw/ide/piix.c
> >>> +++ b/hw/ide/piix.c
> >>> @@ -258,22 +258,10 @@ static const TypeInfo piix3_ide_info = {
> >>>  .class_init= piix3_ide_class_init,
> >>>  };
> >>>  
> >>> -static void piix3_ide_xen_class_init(ObjectClass *klass, void *data)
> >>> -{
> >>> -DeviceClass *dc = DEVICE_CLASS(klass);
> >>> -PCIDeviceClass *k = PCI_DEVICE_CLASS(klass);
> >>> -
> >>> -k->realize = pci_piix_ide_realize;
> >>> -k->vendor_id = PCI_VENDOR_ID_INTEL;
> >>> -k->device_id = PCI_DEVICE_ID_INTEL_82371SB_1;
> >>> -k->class_id = PCI_CLASS_STORAGE_IDE;
> >>> -set_bit(DEVICE_CATEGORY_STORAGE, dc->categories);
> >>> -}
> >>> -
> >>>  static const TypeInfo piix3_ide_xen_info = {
> >>>  .name  = "piix3-ide-xen",
> >>>  .parent= TYPE_PCI_IDE,
> >>> -.class_init= piix3_ide_xen_class_init,
> >>> +.class_init= piix3_ide_class_init,
> >>>  };
> >>>  
> >>>  static void piix4_ide_class_init(ObjectClass *klass, void *data)
> >>> -- 
> >>> MST
> >>>
> 



Re: [Qemu-block] [PATCH] xen: piix reuse pci generic class init function

2016-04-06 Thread Stefano Stabellini
On Sun, 3 Apr 2016, Michael S. Tsirkin wrote:
> piix3_ide_xen_class_init is identical to piix3_ide_class_init
> except it's buggy as it does not set exit and does not disable
> hotplug properly.
> 
> Switch to the generic one.
> 
> Reviewed-by: Stefano Stabellini <sstabell...@kernel.org>
> Signed-off-by: Michael S. Tsirkin <m...@redhat.com>

Hey John,

are you going to take the patch or do you want me to handle it?

Cheers,

Stefano


>  hw/ide/piix.c | 14 +-
>  1 file changed, 1 insertion(+), 13 deletions(-)
> 
> diff --git a/hw/ide/piix.c b/hw/ide/piix.c
> index df46147..0a4cbcb 100644
> --- a/hw/ide/piix.c
> +++ b/hw/ide/piix.c
> @@ -258,22 +258,10 @@ static const TypeInfo piix3_ide_info = {
>  .class_init= piix3_ide_class_init,
>  };
>  
> -static void piix3_ide_xen_class_init(ObjectClass *klass, void *data)
> -{
> -DeviceClass *dc = DEVICE_CLASS(klass);
> -PCIDeviceClass *k = PCI_DEVICE_CLASS(klass);
> -
> -k->realize = pci_piix_ide_realize;
> -k->vendor_id = PCI_VENDOR_ID_INTEL;
> -k->device_id = PCI_DEVICE_ID_INTEL_82371SB_1;
> -k->class_id = PCI_CLASS_STORAGE_IDE;
> -set_bit(DEVICE_CATEGORY_STORAGE, dc->categories);
> -}
> -
>  static const TypeInfo piix3_ide_xen_info = {
>  .name  = "piix3-ide-xen",
>  .parent= TYPE_PCI_IDE,
> -.class_init= piix3_ide_xen_class_init,
> +.class_init= piix3_ide_class_init,
>  };
>  
>  static void piix4_ide_class_init(ObjectClass *klass, void *data)
> -- 
> MST
> 



Re: [Qemu-block] [Qemu-devel] [PATCH 06/20] xen_disk: Call blk_set_enable_write_cache() explicitly

2016-03-22 Thread Stefano Stabellini
On Fri, 18 Mar 2016, Kevin Wolf wrote:
> Signed-off-by: Kevin Wolf <kw...@redhat.com>
> ---
>  hw/block/xen_disk.c | 5 -
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/block/xen_disk.c b/hw/block/xen_disk.c
> index 635328f..c358709 100644
> --- a/hw/block/xen_disk.c
> +++ b/hw/block/xen_disk.c
> @@ -888,12 +888,14 @@ static int blk_connect(struct XenDevice *xendev)
>  struct XenBlkDev *blkdev = container_of(xendev, struct XenBlkDev, 
> xendev);
>  int pers, index, qflags;
>  bool readonly = true;
> +bool writethrough = true;
>  
>  /* read-only ? */
>  if (blkdev->directiosafe) {
>  qflags = BDRV_O_NOCACHE | BDRV_O_NATIVE_AIO;
>  } else {
> -qflags = BDRV_O_CACHE_WB;
> +qflags = 0;

You might as well initialize qflags to 0 above and leave it unchanged
here. In any case:

Acked-by: Stefano Stabellini <stefano.stabell...@eu.citrix.com>


> +writethrough = false;
>  }
>  if (strcmp(blkdev->mode, "w") == 0) {
>  qflags |= BDRV_O_RDWR;
> @@ -925,6 +927,7 @@ static int blk_connect(struct XenDevice *xendev)
>  error_free(local_err);
>  return -1;
>  }
> +blk_set_enable_write_cache(blkdev->blk, !writethrough);
>  } else {
>  /* setup via qemu cmdline -> already setup for us */
>  xen_be_printf(>xendev, 2, "get configured bdrv (cmdline 
> setup)\n");
> -- 
> 1.8.3.1
> 
> 



[Qemu-block] [PATCH] xen_disk: treat "vhd" as "vpc"

2015-12-04 Thread Stefano Stabellini
The Xen toolstack uses "vhd" to specify a disk in VHD format, however
the name of the driver in QEMU is "vpc". Replace "vhd" with "vpc", so
that QEMU can find the right driver to use for it.

Signed-off-by: Stefano Stabellini <stefano.stabell...@eu.citrix.com>

diff --git a/hw/block/xen_disk.c b/hw/block/xen_disk.c
index 267d8a8..37e14d1 100644
--- a/hw/block/xen_disk.c
+++ b/hw/block/xen_disk.c
@@ -811,6 +811,9 @@ static int blk_init(struct XenDevice *xendev)
 if (!strcmp("aio", blkdev->fileproto)) {
 blkdev->fileproto = "raw";
 }
+if (!strcmp("vhd", blkdev->fileproto)) {
+blkdev->fileproto = "vpc";
+}
 if (blkdev->mode == NULL) {
 blkdev->mode = xenstore_read_be_str(>xendev, "mode");
 }



Re: [Qemu-block] [PATCH for-2.5] xen_disk: Remove ioreq.postsync

2015-11-25 Thread Stefano Stabellini
On Wed, 25 Nov 2015, Alberto Garcia wrote:
> This code has been dead for three years (since commit 7e7b7cba1).
> 
> Signed-off-by: Alberto Garcia <be...@igalia.com>

Reviewed-by: Stefano Stabellini <stefano.stabell...@eu.citrix.com>

Kevin,

I'll send it out together with another fix today.



>  hw/block/xen_disk.c | 8 
>  1 file changed, 8 deletions(-)
> 
> diff --git a/hw/block/xen_disk.c b/hw/block/xen_disk.c
> index 02eda6e..8146650 100644
> --- a/hw/block/xen_disk.c
> +++ b/hw/block/xen_disk.c
> @@ -75,7 +75,6 @@ struct ioreq {
>  off_t   start;
>  QEMUIOVectorv;
>  int presync;
> -int postsync;
>  uint8_t mapped;
>  
>  /* grant mapping */
> @@ -144,7 +143,6 @@ static void ioreq_reset(struct ioreq *ioreq)
>  ioreq->status = 0;
>  ioreq->start = 0;
>  ioreq->presync = 0;
> -ioreq->postsync = 0;
>  ioreq->mapped = 0;
>  
>  memset(ioreq->domids, 0, sizeof(ioreq->domids));
> @@ -520,12 +518,6 @@ static void qemu_aio_complete(void *opaque, int ret)
>  if (ioreq->aio_inflight > 0) {
>  return;
>  }
> -if (ioreq->postsync) {
> -ioreq->postsync = 0;
> -ioreq->aio_inflight++;
> -blk_aio_flush(ioreq->blkdev->blk, qemu_aio_complete, ioreq);
> -return;
> -}
>  
>  ioreq->status = ioreq->aio_errors ? BLKIF_RSP_ERROR : BLKIF_RSP_OKAY;
>  ioreq_unmap(ioreq);
> -- 
> 2.6.2
> 



Re: [Qemu-block] [Qemu-devel] Question about xen disk unplug support for ahci missed in qemu

2015-10-20 Thread Stefano Stabellini
On Mon, 19 Oct 2015, Laszlo Ersek wrote:
> On 10/16/15 21:09, Laszlo Ersek wrote:
> > On 10/16/15 13:34, Fabio Fantoni wrote:
> >> Il 16/10/2015 12:47, Stefano Stabellini ha scritto:
> >>> On Fri, 16 Oct 2015, Fabio Fantoni wrote:
> >>>> Il 16/10/2015 12:13, Anthony PERARD ha scritto:
> >>>>> On Fri, Oct 16, 2015 at 10:32:44AM +0200, Fabio Fantoni wrote:
> >>>>>> Il 15/10/2015 20:02, Anthony PERARD ha scritto:
> >>>>>>> On Thu, Oct 15, 2015 at 06:27:17PM +0200, Fabio Fantoni wrote:
> >>>>>>>> Il 14/10/2015 13:06, Stefano Stabellini ha scritto:
> >>>>>>>>> I would suggest Fabio to avoid AHCI disks altogether and just use
> >>>>>>>>> OVMF
> >>>>>>>>> with PV disks only and Anthony's patch to libxl to avoid creating
> >>>>>>>>> any
> >>>>>>>>> IDE disks: http://marc.info/?l=xen-devel=144482080812353.
> >>>>>>>>>
> >>>>>>>>> Would that work for you?
> >>>>>>>> Thanks for the advice, I tried it:
> >>>>>>>> https://github.com/Fantu/Xen/commits/rebase/m2r-testing-4.6
> >>>>>>>>
> >>>>>>>> I installed W10 pro 64 bit with ide disk, installed the win pv
> >>>>>>>> drivers
> >>>>>>>> and
> >>>>>>>> after changed to xvdX instead hdX, is the only change needed, right?
> >>>>>>>> Initial boot is ok (ovmf part about pv disks seems ok) but windows
> >>>>>>>> boot
> >>>>>>>> fails with problem with pv drivers.
> >>>>>>>> In attachment full qemu log with xen_platform trace and domU's xl
> >>>>>>>> cfg.
> >>>>>>>>
> >>>>>>>> Someone have windows domUs with ovmf and pv disks only working?
> >>>>>>>> If yes
> >>>>>>>> can
> >>>>>>>> tell me the difference to understand what can be the problem please?
> >>>>>>> When I worked on the PV disk implementation in OVMF, I was able to
> >>>>>>> boot
> >>>>>>> a Windows 8 with pv disk only.
> >>>>>>>
> >>>>>>> I don't have access to the guest configuration I was using, but I
> >>>>>>> think
> >>>>>>> one
> >>>>>>> difference would be the viridian setting, I'm pretty sure I did
> >>>>>>> not set
> >>>>>>> it.
> >>>>>>>
> >>>>>> I tried with viridian disabled but did the same thing, looking
> >>>>>> cdrom as
> >>>>>> latest thing before xenbug trace in qemu log I tried also to remove
> >>>>>> it but
> >>>>>> also in this case don't boot correctly, full qemu log in attachment.
> >>>>>> I don't know if is a ovmf thing to improve (like what seems in
> >>>>>> Laszlo and
> >>>>>> Kevin mails) or xen winpv drivers unexpected case, have you tried also
> >>>>>> with
> >>>>>> latest winpv builds? (for exclude regression)
> >>>>> No, I did not tried the latest winpv drivers.
> >>>>>
> >>>>> Sorry I can help much more that that. When I install this win8 guest
> >>>>> tried
> >>>>> to boot it with pv drivers only, that was more than a year ago. I
> >>>>> have not
> >>>>> check if it's still working. (Also I can not try anything more recent,
> >>>>> right now.)
> >>>>>
> >>>> I did many other tests, retrying with ide first boot working but show pv
> >>>> devices not working, I did another reboot (with ide) and pv devices was
> >>>> working, after I retried with pv (xvdX) and boot correctly.
> >>>> After other tests I found that with empty cdrom device (required for xl
> >>>> cd-insert/cd-eject) boot stop at start (tianocore image), same result
> >>>> with ide
> >>>> instead.
> >>>>  From xl cfg:
> >>>> disk=['/mnt/vm/disks/W10UEFI.disk1.cow-sn1,qcow2,xvda,rw',',raw,xvdb,ro,cdrom']
> >>>>
> >>>> With seabi

Re: [Qemu-block] [Qemu-devel] Question about xen disk unplug support for ahci missed in qemu

2015-10-20 Thread Stefano Stabellini
On Tue, 20 Oct 2015, Laszlo Ersek wrote:
> On 10/20/15 13:59, Stefano Stabellini wrote:
> > On Mon, 19 Oct 2015, Laszlo Ersek wrote:
> >> Could that be related to the issue you are experiencing with OVMF?
> >> Because, OVMF implies HVM (or PV-on-HVM), and your report ("empty
> >> paravirt CD-ROM" or some such -- sorry, the report remains unclear)
> >> appears to match the above message.
> >>
> >> Given that this is guest code, shouldn't the same logic be mirrored in
> >> the OVMF guest driver?
> >>
> >> /* do not create a PV cdrom device if we are an HVM guest */
> >>
> >> In other words, given that OVMF implies HVM, shouldn't OVMF too forego
> >> driving a paravirt CD-ROM entirely?
> > 
> > In the case of OVMF I think we can use the PV block interface to access
> > the cdrom, the problem is just that it cannot handle empty cdrom drives
> > at the moment and XenPvBlockFrontInitialization simply returns an error.
> 
> (*)
> 
> > A simple patch like this one should prevent OVMF from getting stuck with
> > an error when an empty cdrom is found:
> > 
> > diff --git a/OvmfPkg/XenPvBlkDxe/BlockFront.c 
> > b/OvmfPkg/XenPvBlkDxe/BlockFront.c
> > index 256ac55..ae7cab9 100644
> > --- a/OvmfPkg/XenPvBlkDxe/BlockFront.c
> > +++ b/OvmfPkg/XenPvBlkDxe/BlockFront.c
> > @@ -186,6 +186,10 @@ XenPvBlockFrontInitialization (
> >}
> >FreePool (DeviceType);
> >  
> > +  Status = XenBusReadUint64 (XenBusIo, "sectors", TRUE, );
> > +  if (Dev->MediaInfo.CdRom && Status != XENSTORE_STATUS_SUCCESS)
> > + return EFI_NO_MEDIA;
> > +
> >Status = XenBusReadUint64 (XenBusIo, "backend-id", FALSE, );
> >if (Status != XENSTORE_STATUS_SUCCESS || Value > MAX_UINT16) {
> >  DEBUG ((EFI_D_ERROR, "XenPvBlk: Failed to get backend-id (%d)\n",
> 
> (1) Directly returning at that point will leak "Dev". I think you should
> set Status, and then goto Error. XenPvBlockFree() under that label will
> free Dev.

Good idea


> (2) I agree that returning an error code here will propagate through
> XenPvBlkDxeDriverBindingStart() to the caller, and ultimately prevent
> the binding. That's probably the right thing to do.
> 
> But, how does it differ from what you wrote above (*):
>
> > [...] the problem is just that it cannot handle empty cdrom drives
> > at the moment and XenPvBlockFrontInitialization simply returns an
> > error.
> 
> If XenPvBlockFrontInitialization() simply returned an error right now,
> then that would achieve the exact same thing as your proposal -- the
> driver wouldn't bind the device. So either your idea wouldn't make a
> difference, or your analysis that XenPvBlockFrontInitialization()
> currently fails is incorrect.
> 
> I think it's the latter though, and that this patch should be tested.

The patch works, but you are right, the analysis of the problem was
wrong: it gets stuck in XenPvBlkWaitForBackendState, rather than
returning an error.


> If it works in your testing, please submit it to
> <edk2-de...@lists.01.org>. (You have to be subscribed to post, sorry
> about that.) Please fix the leak (1), and add the following line to the
> commit message just before your signoff:
> 
> Contributed-under: TianoCore Contribution Agreement 1.0
> 
> What it means is explained in "OvmfPkg/Contributions.txt".

I'll rework the patch and send it to the list.
Thanks,

Stefano



Re: [Qemu-block] [Qemu-devel] [Xen-devel] Question about xen disk unplug support for ahci missed in qemu

2015-10-16 Thread Stefano Stabellini
On Fri, 16 Oct 2015, Laszlo Ersek wrote:
> On 10/16/15 11:06, Stefano Stabellini wrote:
> > On Thu, 15 Oct 2015, Kevin O'Connor wrote:
> >> On Fri, Oct 16, 2015 at 01:10:54AM +0200, Laszlo Ersek wrote:
> >>> On 10/14/15 13:27, Ian Campbell wrote:
> >>>> On Wed, 2015-10-14 at 12:06 +0100, Stefano Stabellini wrote:
> >>>>>> Can't you just teach SeaBIOS how to deal with your PV disks and then
> >>>>>> only add that to your VM and forget about IDE/AHCI? I mean, that's how
> >>>>>> it's done for virtio-blk, and it doesn't involve any insanities like
> >>>>>> ripping out non-hotpluggable devices.
> >>>>>
> >>>>> Teaching SeaBIOS to deal with PV disks can be done, in fact we already
> >>>>> support PV disks in OVMF. It is possible to boot Windows with OVMF
> >>>>> without any IDE disks (patch pending for libxl to create a VM without
> >>>>> emulated IDE disks).
> >>>>
> >>>> One stumbling block in the past has been how to know when the PV drivers 
> >>>> in
> >>>> the BIOS are no longer required, such that the ring can be torn down 
> >>>> and/or
> >>>> the connection etc handed over to the OS driver.
> >> [...]
> >>>> AFAIK the BIOS interfaces do not have anything as reliable as that.
> >>>>
> >>>> How does virtio deal with this in the BIOS case?
> >>>
> >>> It doesn't, as far as I can tell.
> >>>
> >>> I don't think it has to, though! On a BIOS box, you can always boot DOS,
> >>> or another operating system that continues to use the BIOS interfaces
> >>> forever. (Same as if you never call ExitBootServices() in UEFI.)
> >>>
> >>> Given that no starter pistol gets fired between the firmware and the OS
> >>> on such a platform, they must always respect each other. I guess this
> >>> could occur through the E820 map, or some such.
> >>
> >> One can use the "ACPI enable" SMI event to detect this if they really
> >> wanted to.  In SeaBIOS one could do this from
> >> src/fw/smm.c:handle_smi() - however, no other drivers need this
> >> notification today and it would be a bit ugly to have to handle it
> >> from an SMI.  (Assuming Xen were to support SMIs.)
> >>
> >>> No clue in what kind of E820 memory SeaBIOS allocates the virtio rings,
> >>> but I guess the Linux kernel stays away from those areas until it's past
> >>> device probing and binding.
> >>
> >> In SeaBIOS, the virtio memory is allocated from reserved memory.  (See
> >> the memalign_high() call in src/hw/virtio-pci.c - the "high" memory
> >> zone is taken from reserved memory:
> >> http://seabios.org/Memory_Model#Memory_available_during_initialization
> >> )
> >>
> >> What's the reason for the "stumbling block" that requires the BIOS to
> >> tear down the Xen ring prior to the OS being able to replace it?  The
> >> BIOS disk calls are all synchronous, so the ring wont be active when
> >> the OS brings up its own ring.  Is there some low-level interaction
> >> that prevents the OS from just resetting the ring prior to enabling
> >> it?
> > 
> > Xen only exports one PV disk interface for each disk to the guest, and
> > each PV interface only supports one frontend -- only SeaBIOS or the OS
> > can be connected to one PV disk, not both. In the case of OVMF, we
> > handle that by disconnecting the PV frontend in OVMF when
> > ExitBootServices is called, so that the OS driver can reconnect later.
> 
> Does the XenBus protocol support a device reset operation, regardless of
> what state the device is currently in? (I don't remember all the state
> transitions any longer, sorry.)

The PV block protocol doesn't unfortunately. At least the block backend
in QEMU doesn't support it.



Re: [Qemu-block] [Qemu-devel] [Xen-devel] Question about xen disk unplug support for ahci missed in qemu

2015-10-16 Thread Stefano Stabellini
On Thu, 15 Oct 2015, Kevin O'Connor wrote:
> On Fri, Oct 16, 2015 at 01:10:54AM +0200, Laszlo Ersek wrote:
> > On 10/14/15 13:27, Ian Campbell wrote:
> > > On Wed, 2015-10-14 at 12:06 +0100, Stefano Stabellini wrote:
> > >>> Can't you just teach SeaBIOS how to deal with your PV disks and then
> > >>> only add that to your VM and forget about IDE/AHCI? I mean, that's how
> > >>> it's done for virtio-blk, and it doesn't involve any insanities like
> > >>> ripping out non-hotpluggable devices.
> > >>
> > >> Teaching SeaBIOS to deal with PV disks can be done, in fact we already
> > >> support PV disks in OVMF. It is possible to boot Windows with OVMF
> > >> without any IDE disks (patch pending for libxl to create a VM without
> > >> emulated IDE disks).
> > > 
> > > One stumbling block in the past has been how to know when the PV drivers 
> > > in
> > > the BIOS are no longer required, such that the ring can be torn down 
> > > and/or
> > > the connection etc handed over to the OS driver.
> [...]
> > > AFAIK the BIOS interfaces do not have anything as reliable as that.
> > > 
> > > How does virtio deal with this in the BIOS case?
> > 
> > It doesn't, as far as I can tell.
> > 
> > I don't think it has to, though! On a BIOS box, you can always boot DOS,
> > or another operating system that continues to use the BIOS interfaces
> > forever. (Same as if you never call ExitBootServices() in UEFI.)
> > 
> > Given that no starter pistol gets fired between the firmware and the OS
> > on such a platform, they must always respect each other. I guess this
> > could occur through the E820 map, or some such.
> 
> One can use the "ACPI enable" SMI event to detect this if they really
> wanted to.  In SeaBIOS one could do this from
> src/fw/smm.c:handle_smi() - however, no other drivers need this
> notification today and it would be a bit ugly to have to handle it
> from an SMI.  (Assuming Xen were to support SMIs.)
> 
> > No clue in what kind of E820 memory SeaBIOS allocates the virtio rings,
> > but I guess the Linux kernel stays away from those areas until it's past
> > device probing and binding.
> 
> In SeaBIOS, the virtio memory is allocated from reserved memory.  (See
> the memalign_high() call in src/hw/virtio-pci.c - the "high" memory
> zone is taken from reserved memory:
> http://seabios.org/Memory_Model#Memory_available_during_initialization
> )
> 
> What's the reason for the "stumbling block" that requires the BIOS to
> tear down the Xen ring prior to the OS being able to replace it?  The
> BIOS disk calls are all synchronous, so the ring wont be active when
> the OS brings up its own ring.  Is there some low-level interaction
> that prevents the OS from just resetting the ring prior to enabling
> it?

Xen only exports one PV disk interface for each disk to the guest, and
each PV interface only supports one frontend -- only SeaBIOS or the OS
can be connected to one PV disk, not both. In the case of OVMF, we
handle that by disconnecting the PV frontend in OVMF when
ExitBootServices is called, so that the OS driver can reconnect later.



Re: [Qemu-block] [Qemu-devel] Question about xen disk unplug support for ahci missed in qemu

2015-10-16 Thread Stefano Stabellini
On Fri, 16 Oct 2015, Fabio Fantoni wrote:
> Il 16/10/2015 12:13, Anthony PERARD ha scritto:
> > On Fri, Oct 16, 2015 at 10:32:44AM +0200, Fabio Fantoni wrote:
> > > Il 15/10/2015 20:02, Anthony PERARD ha scritto:
> > > > On Thu, Oct 15, 2015 at 06:27:17PM +0200, Fabio Fantoni wrote:
> > > > > Il 14/10/2015 13:06, Stefano Stabellini ha scritto:
> > > > > > I would suggest Fabio to avoid AHCI disks altogether and just use
> > > > > > OVMF
> > > > > > with PV disks only and Anthony's patch to libxl to avoid creating
> > > > > > any
> > > > > > IDE disks: http://marc.info/?l=xen-devel=144482080812353.
> > > > > > 
> > > > > > Would that work for you?
> > > > > Thanks for the advice, I tried it:
> > > > > https://github.com/Fantu/Xen/commits/rebase/m2r-testing-4.6
> > > > > 
> > > > > I installed W10 pro 64 bit with ide disk, installed the win pv drivers
> > > > > and
> > > > > after changed to xvdX instead hdX, is the only change needed, right?
> > > > > Initial boot is ok (ovmf part about pv disks seems ok) but windows
> > > > > boot
> > > > > fails with problem with pv drivers.
> > > > > In attachment full qemu log with xen_platform trace and domU's xl cfg.
> > > > > 
> > > > > Someone have windows domUs with ovmf and pv disks only working? If yes
> > > > > can
> > > > > tell me the difference to understand what can be the problem please?
> > > > When I worked on the PV disk implementation in OVMF, I was able to boot
> > > > a Windows 8 with pv disk only.
> > > > 
> > > > I don't have access to the guest configuration I was using, but I think
> > > > one
> > > > difference would be the viridian setting, I'm pretty sure I did not set
> > > > it.
> > > > 
> > > I tried with viridian disabled but did the same thing, looking cdrom as
> > > latest thing before xenbug trace in qemu log I tried also to remove it but
> > > also in this case don't boot correctly, full qemu log in attachment.
> > > I don't know if is a ovmf thing to improve (like what seems in Laszlo and
> > > Kevin mails) or xen winpv drivers unexpected case, have you tried also
> > > with
> > > latest winpv builds? (for exclude regression)
> > No, I did not tried the latest winpv drivers.
> > 
> > Sorry I can help much more that that. When I install this win8 guest tried
> > to boot it with pv drivers only, that was more than a year ago. I have not
> > check if it's still working. (Also I can not try anything more recent,
> > right now.)
> > 
> 
> I did many other tests, retrying with ide first boot working but show pv
> devices not working, I did another reboot (with ide) and pv devices was
> working, after I retried with pv (xvdX) and boot correctly.
> After other tests I found that with empty cdrom device (required for xl
> cd-insert/cd-eject) boot stop at start (tianocore image), same result with ide
> instead.
> From xl cfg:
> disk=['/mnt/vm/disks/W10UEFI.disk1.cow-sn1,qcow2,xvda,rw',',raw,xvdb,ro,cdrom']
> With seabios domU boot also with empty cdrom.
> In qemu log I found only these that can be related:
> > xen be: qdisk-51728: error: Could not open image: No such file or directory
> > xen be: qdisk-51728: initialise() failed
> And latest xl dmesg line is:
> > (d1) Invoking OVMF ...
> 
> If you need more informations/test tell me and I'll post them.

Are you saying that without any cdrom drives, it works correctly?



Re: [Qemu-block] [Qemu-devel] Question about xen disk unplug support for ahci missed in qemu

2015-10-14 Thread Stefano Stabellini
On Wed, 14 Oct 2015, Kevin Wolf wrote:
> Am 14.10.2015 um 13:06 hat Stefano Stabellini geschrieben:
> > On Wed, 14 Oct 2015, Kevin Wolf wrote:
> > > [ CC qemu-block ]
> > > 
> > > Am 13.10.2015 um 19:10 hat Stefano Stabellini geschrieben:
> > > > On Tue, 13 Oct 2015, John Snow wrote:
> > > > > On 10/13/2015 11:55 AM, Fabio Fantoni wrote:
> > > > > > I added ahci disk support in libxl and using it for week seems that 
> > > > > > was
> > > > > > ok, after a reply of Stefano Stabellini seems that xen disk unplug
> > > > > > support only ide disks:
> > > > > > http://git.qemu.org/?p=qemu.git;a=commitdiff;h=679f4f8b178e7c66fbc2f39c905374ee8663d5d8
> > > > > > 
> > > > > > Today Paul Durrant told me that even if pv disk is ok also with 
> > > > > > ahci and
> > > > > > the emulated one is offline can be a risk:
> > > > > > http://lists.xenproject.org/archives/html/win-pv-devel/2015-10/msg00021.html
> > > > > > 
> > > > > > 
> > > > > > I tried to take a fast look in qemu code but I not understand the 
> > > > > > needed
> > > > > > thing for add the xen disk unplug support also for ahci, can 
> > > > > > someone do
> > > > > > it or tell me useful information for do it please?
> > > > > > 
> > > > > > Thanks for any reply and sorry for my bad english.
> > > > > > 
> > > > > 
> > > > > I'm not entirely sure what features you need AHCI to support in order
> > > > > for Xen to be happy.
> > > > > 
> > > > > I'd guess hotplugging, but where I get confused is that IDE disks 
> > > > > don't
> > > > > support hotplugging either, so I guess I'm not sure sure what you 
> > > > > need.
> > > > > 
> > > > > Stefano, can you help bridge my Xen knowledge gap?
> > > >  
> > > > Hi John,
> > > > 
> > > > we need something like hw/i386/xen/xen_platform.c:unplug_disks but that
> > > > can unplug AHCI disk. And by unplug, I mean "make disappear" like
> > > > pci_piix3_xen_ide_unplug does for ide.
> > > 
> > > Maybe this would be the right time to stop the craziness with your
> > > hybrid IDE/xendisk setup. It's a horrible thing that would never happen
> > > on real hardware.
> > 
> > I would be quite happy to stop (or even get rid of) the craziness.
> > 
> > 
> > > Can't you just teach SeaBIOS how to deal with your PV disks and then
> > > only add that to your VM and forget about IDE/AHCI? I mean, that's how
> > > it's done for virtio-blk, and it doesn't involve any insanities like
> > > ripping out non-hotpluggable devices.
> > 
> > Teaching SeaBIOS to deal with PV disks can be done, in fact we already
> > support PV disks in OVMF. It is possible to boot Windows with OVMF
> > without any IDE disks (patch pending for libxl to create a VM without
> > emulated IDE disks).
> > 
> > However we have to be honest that implementing PV disk support in
> > SeaBIOS is a different magnitude of effort compared to implementing AHCI
> > "unplug".
> 
> Agreed. But we generally try to do the right thing and not the easy
> thing.

Sure. I am quite happy to let other people do it though :-)


> Maybe I'm missing something, but my impression was that the hybrid setup
> was only used so that you can boot from a PV disk even though the BIOS
> doesn't have a PV driver. In that case, why would you even want to use
> AHCI instead of IDE? During the early boot performance shouldn't be much
> different as there is no parallelism, and afterwards the PV driver is
> used.

The "unplug" code and the design choice predate me -- I am not sure why
it was done like that.  In those days there was no SeaBIOS: maybe they
thought that writing a PV disk frontend in rombios would be madness? 

In addition isn't it true that some guests, once they see a PIIX3
chipset and they know that there is one disk, they might just try to
access it directly via the IDE interface?  I am thinking of some old
versions of Windows. Are they really able to cope with the case where
they can only access the disk via the legacy BIOS until non-IDE drivers
come along?


> > I would suggest Fabio to avoid AHCI disks altogether and just use OVMF
> > with PV disks only and Anthony's patch to libxl to avoid creating any
> > IDE disks: http://marc.info/?l=xen-devel=144482080812353.
> > 
> > Would that work for you?
> 
> That sounds certainly like a better step forward than adding a crude
> hack to AHCI.
> 
> And if you want to make me completely happy, plan to extend SeaBIOS so
> that we can even drop the existing hack in IDE.

There are lots of existing guests which write to the magic ioport to
"unplug" disks.  We would need some sort of deprecation plan.