[Qemu-block] [PATCH V4] pci: removed the is_express field since a uniform interface was inserted

2017-12-11 Thread Yoni Bettan
* according to Eduardo Habkost's commit
  fd3b02c8896d597dd8b9e053dec579cf0386aee1

* since all PCIe devices now implement INTERFACE_PCIE_DEVICE we
  don't need this field anymore

* Devices that were only INTERFACE_PCIE_DEVICE (is_express == 1)
  or
  devices that were only INTERFACE_CONVENTIONAL_PCI_DEVICE (is_express 
== 0)
  were not affected by the change

  The only devices that were affected are those that are hybrid and also
  had (is_express == 1) - therefore only:
- hw/vfio/pci.c
- hw/usb/hcd-xhci.c

  For both I made sure that QEMU_PCI_CAP_EXPRESS is on
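
  As an aside, the approach can be sketched outside QEMU. The struct,
  helper names, interface strings and the capability bit value below are
  invented for the sketch (this is not QEMU's actual QOM API); it only
  illustrates deriving the express capability from the class's interface
  list instead of a dedicated is_express field:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Illustrative bit value only; QEMU's real QEMU_PCI_CAP_EXPRESS differs. */
#define TOY_PCI_CAP_EXPRESS 0x4u

/* Toy stand-in for a QOM class: just a list of implemented interface names. */
typedef struct ToyPciClass {
    const char *interfaces[4];
    int num_interfaces;
} ToyPciClass;

static bool toy_class_implements(const ToyPciClass *k, const char *iface)
{
    for (int i = 0; i < k->num_interfaces; i++) {
        if (strcmp(k->interfaces[i], iface) == 0) {
            return true;
        }
    }
    return false;
}

/* Derive capability bits from the interface list at realize time,
 * instead of reading a per-class is_express field. */
static uint32_t toy_derive_cap_present(const ToyPciClass *k)
{
    uint32_t cap = 0;
    if (toy_class_implements(k, "pci-express-device")) {
        cap |= TOY_PCI_CAP_EXPRESS;
    }
    return cap;
}
```

  A hybrid device (one that lists both the conventional and the express
  interface, as vfio and xhci do) still ends up with the express
  capability set, which is the property the patch preserves.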

Signed-off-by: Yoni Bettan 
---
 docs/pcie_pci_bridge.txt   | 2 +-
 hw/block/nvme.c| 1 -
 hw/net/e1000e.c| 1 -
 hw/pci-bridge/pcie_pci_bridge.c| 1 -
 hw/pci-bridge/pcie_root_port.c | 1 -
 hw/pci-bridge/xio3130_downstream.c | 1 -
 hw/pci-bridge/xio3130_upstream.c   | 1 -
 hw/pci-host/xilinx-pcie.c  | 1 -
 hw/pci/pci.c   | 8 ++--
 hw/scsi/megasas.c  | 4 
 hw/usb/hcd-xhci.c  | 9 -
 hw/vfio/pci.c  | 5 -
 include/hw/pci/pci.h   | 3 ---
 13 files changed, 19 insertions(+), 19 deletions(-)

diff --git a/docs/pcie_pci_bridge.txt b/docs/pcie_pci_bridge.txt
index 5a4203f97c..ab35ebf3ca 100644
--- a/docs/pcie_pci_bridge.txt
+++ b/docs/pcie_pci_bridge.txt
@@ -110,5 +110,5 @@ To enable device hot-plug into the bridge on Linux there're 3 ways:
 Implementation
 ==
 The PCIE-PCI bridge is based on PCI-PCI bridge, but also accumulates PCI Express
-features as a PCI Express device (is_express=1).
+features as a PCI Express device.
 
diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index 441e21ed1f..9325bc0911 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -1087,7 +1087,6 @@ static void nvme_class_init(ObjectClass *oc, void *data)
 pc->vendor_id = PCI_VENDOR_ID_INTEL;
 pc->device_id = 0x5845;
 pc->revision = 2;
-pc->is_express = 1;
 
 set_bit(DEVICE_CATEGORY_STORAGE, dc->categories);
 dc->desc = "Non-Volatile Memory Express";
diff --git a/hw/net/e1000e.c b/hw/net/e1000e.c
index f1af279e8d..c360f0d8c9 100644
--- a/hw/net/e1000e.c
+++ b/hw/net/e1000e.c
@@ -675,7 +675,6 @@ static void e1000e_class_init(ObjectClass *class, void *data)
 c->revision = 0;
 c->romfile = "efi-e1000e.rom";
 c->class_id = PCI_CLASS_NETWORK_ETHERNET;
-c->is_express = 1;
 
 dc->desc = "Intel 82574L GbE Controller";
 dc->reset = e1000e_qdev_reset;
diff --git a/hw/pci-bridge/pcie_pci_bridge.c b/hw/pci-bridge/pcie_pci_bridge.c
index a4d827c99d..b7d9ebbec2 100644
--- a/hw/pci-bridge/pcie_pci_bridge.c
+++ b/hw/pci-bridge/pcie_pci_bridge.c
@@ -169,7 +169,6 @@ static void pcie_pci_bridge_class_init(ObjectClass *klass, void *data)
 DeviceClass *dc = DEVICE_CLASS(klass);
 HotplugHandlerClass *hc = HOTPLUG_HANDLER_CLASS(klass);
 
-k->is_express = 1;
 k->is_bridge = 1;
 k->vendor_id = PCI_VENDOR_ID_REDHAT;
 k->device_id = PCI_DEVICE_ID_REDHAT_PCIE_BRIDGE;
diff --git a/hw/pci-bridge/pcie_root_port.c b/hw/pci-bridge/pcie_root_port.c
index 9b6e4ce512..45f9e8cd4a 100644
--- a/hw/pci-bridge/pcie_root_port.c
+++ b/hw/pci-bridge/pcie_root_port.c
@@ -145,7 +145,6 @@ static void rp_class_init(ObjectClass *klass, void *data)
 DeviceClass *dc = DEVICE_CLASS(klass);
 PCIDeviceClass *k = PCI_DEVICE_CLASS(klass);
 
-k->is_express = 1;
 k->is_bridge = 1;
 k->config_write = rp_write_config;
 k->realize = rp_realize;
diff --git a/hw/pci-bridge/xio3130_downstream.c b/hw/pci-bridge/xio3130_downstream.c
index 1e09d2afb7..613a0d6bb7 100644
--- a/hw/pci-bridge/xio3130_downstream.c
+++ b/hw/pci-bridge/xio3130_downstream.c
@@ -177,7 +177,6 @@ static void xio3130_downstream_class_init(ObjectClass *klass, void *data)
 DeviceClass *dc = DEVICE_CLASS(klass);
 PCIDeviceClass *k = PCI_DEVICE_CLASS(klass);
 
-k->is_express = 1;
 k->is_bridge = 1;
 k->config_write = xio3130_downstream_write_config;
 k->realize = xio3130_downstream_realize;
diff --git a/hw/pci-bridge/xio3130_upstream.c b/hw/pci-bridge/xio3130_upstream.c
index 227997ce46..d4645bddee 100644
--- a/hw/pci-bridge/xio3130_upstream.c
+++ b/hw/pci-bridge/xio3130_upstream.c
@@ -148,7 +148,6 @@ static void xio3130_upstream_class_init(ObjectClass *klass, void *data)
 DeviceClass *dc = DEVICE_CLASS(klass);
 PCIDeviceClass *k = PCI_DEVICE_CLASS(klass);
 
-k->is_express = 1;
 k->is_bridge = 1;
 k->config_write = xio3130_upstream_write_config;
 k->realize = xio3130_upstream_realize;
diff --git a/hw/pci-host/xilinx-pcie.c b/hw/pci-host/xilinx-pcie.c
index 7659253090..a4ca3ba30f 100644
--- a/hw/pci-host/xilinx-pcie.c
+++ b/hw/pci-host/xilinx-pcie.c
@@ -298,7 +298,6 @@ static void xilinx_pcie_root_class_init(ObjectClass *klass, 

[Qemu-block] [PATCH v4] qemu-img: Document --force-share / -U

2017-12-11 Thread Fam Zheng
Signed-off-by: Fam Zheng 

---

v4: "images". [Kevin]

v3: Document that the option is not allowed for read-write. [Stefan]

v2: - "@code{qemu-img}". [Kashyap, Eric]
- "etc.." -> "etc.".
---
 qemu-img.texi | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/qemu-img.texi b/qemu-img.texi
index fdcf120f36..d93501f94f 100644
--- a/qemu-img.texi
+++ b/qemu-img.texi
@@ -57,6 +57,15 @@ exclusive with the @var{-O} parameters. It is currently required to also use
 the @var{-n} parameter to skip image creation. This restriction may be relaxed
 in a future release.
 
+@item --force-share (-U)
+
+If specified, @code{qemu-img} will open the image with shared permissions,
+which makes it less likely to conflict with a running guest's permissions due
+to image locking. For example, this can be used to get the image information
+(with 'info' subcommand) when the image is used by a running guest. Note that
+this could produce inconsistent result because of concurrent metadata changes,
+etc. This option is only allowed when opening images in read-only mode.
+
 @item fmt
 is the disk image format. It is guessed automatically in most cases. See below
 for a description of the supported disk formats.
-- 
2.14.3




Re: [Qemu-block] [PATCH] blockjob: kick jobs on set-speed

2017-12-11 Thread Jeff Cody
On Mon, Dec 11, 2017 at 06:46:09PM -0500, John Snow wrote:
> If users set an unreasonably low speed (like one byte per second), the
> calculated delay may exceed many hours. While we like to punish users
> for asking for stupid things, we do also like to allow users to correct
> their wicked ways.
> 
> When a user provides a new speed, kick the job to allow it to recalculate
> its delay.
> 
> Signed-off-by: John Snow 
> ---
>  blockjob.c | 5 +++++
>  1 file changed, 5 insertions(+)
> 
> diff --git a/blockjob.c b/blockjob.c
> index 715c2c2680..43f01ad190 100644
> --- a/blockjob.c
> +++ b/blockjob.c
> @@ -483,6 +483,7 @@ static void block_job_completed_txn_success(BlockJob *job)
>  void block_job_set_speed(BlockJob *job, int64_t speed, Error **errp)
>  {
>  Error *local_err = NULL;
> +    int64_t old_speed = job->speed;
>  
>  if (!job->driver->set_speed) {
>  error_setg(errp, QERR_UNSUPPORTED);
> @@ -495,6 +496,10 @@ void block_job_set_speed(BlockJob *job, int64_t speed, Error **errp)
>  }
>  
>  job->speed = speed;
> +    /* Kick the job to recompute its delay */
> +    if ((speed > old_speed) && timer_pending(&job->sleep_timer)) {

job->sleep_timer is protected by block_job_mutex (via
block_job_lock/unlock); is it safe for us to check it here outside the
mutex?

But in any case, I think we could get rid of the timer_pending check, and
just always kick the job if we have a speed increase.  block_job_enter()
should do the right thing (mutex protected check on job->busy and
job->sleep_timer).

> +        block_job_enter(job);
> +    }
>  }
>  
>  void block_job_complete(BlockJob *job, Error **errp)
> -- 
> 2.14.3
> 



[Qemu-block] [PATCH] blockjob: kick jobs on set-speed

2017-12-11 Thread John Snow
If users set an unreasonably low speed (like one byte per second), the
calculated delay may exceed many hours. While we like to punish users
for asking for stupid things, we do also like to allow users to correct
their wicked ways.

When a user provides a new speed, kick the job to allow it to recalculate
its delay.
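
The arithmetic behind the problem is easy to model. The following toy
functions are invented for illustration (they are not QEMU's actual
ratelimit or timer code) and show why a tiny speed yields an enormous
delay, and the patch's kick-only-on-increase condition:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NANOSECONDS_PER_SECOND 1000000000LL

/* Time needed to transfer `bytes` at `bytes_per_sec`, in nanoseconds. */
static int64_t toy_delay_ns(int64_t bytes, int64_t bytes_per_sec)
{
    return bytes * NANOSECONDS_PER_SECOND / bytes_per_sec;
}

/* The patch's condition: only wake the job when the limit was raised. */
static bool toy_should_kick(int64_t old_speed, int64_t new_speed)
{
    return new_speed > old_speed;
}
```

With a 1 MiB chunk at 1 B/s the computed sleep is over twelve days, so
without a kick the job keeps honoring the old, absurd delay long after
the user has corrected the speed.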

Signed-off-by: John Snow 
---
 blockjob.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/blockjob.c b/blockjob.c
index 715c2c2680..43f01ad190 100644
--- a/blockjob.c
+++ b/blockjob.c
@@ -483,6 +483,7 @@ static void block_job_completed_txn_success(BlockJob *job)
 void block_job_set_speed(BlockJob *job, int64_t speed, Error **errp)
 {
 Error *local_err = NULL;
+    int64_t old_speed = job->speed;
 
 if (!job->driver->set_speed) {
 error_setg(errp, QERR_UNSUPPORTED);
@@ -495,6 +496,10 @@ void block_job_set_speed(BlockJob *job, int64_t speed, Error **errp)
 }
 
 job->speed = speed;
+    /* Kick the job to recompute its delay */
+    if ((speed > old_speed) && timer_pending(&job->sleep_timer)) {
+        block_job_enter(job);
+    }
 }
 
 void block_job_complete(BlockJob *job, Error **errp)
-- 
2.14.3




Re: [Qemu-block] [Qemu-devel] [PATCH for-2.12 0/4] qmp dirty bitmap API

2017-12-11 Thread John Snow


On 12/11/2017 06:15 AM, Kevin Wolf wrote:
> Am 09.12.2017 um 01:57 hat John Snow geschrieben:
>> Here's an idea of what this API might look like without revealing
>> explicit merge/split primitives.
>>
>> A new bitmap property that lets us set retention:
>>
>> :: block-dirty-bitmap-set-retention bitmap=foo slices=10
>>
>> Or something similar, where the default property for all bitmaps is
>> zero -- the current behavior: no copies retained.
>>
>> By setting it to a non-zero positive integer, the incremental backup
>> mode will automatically save a disabled copy when possible.
> 
> -EMAGIC
> 
> Operations that create or delete user-visible objects should be
> explicit, not automatic. You're trying to implement management layer
> functionality in qemu here, but incomplete enough that the artifacts of
> it are still visible externally. (A complete solution within qemu
> wouldn't expose low-level concepts such as bitmaps on an external
> interface, but you would expose something like checkpoints.)
> 
> Usually it's not a good idea to have a design where qemu implements
> enough to restrict management tools to whatever use case we had in mind,
> but not enough to make the management tool's life substantially easier
> (by not having to care about some low-level concepts).
> 
>> "What happens if we exceed our retention?"
>>
>> (A) We push the last one out automatically, or
>> (B) We fail the operation immediately.
>>
>> A is more convenient, but potentially unsafe if the management tool or
>> user wasn't aware that was going to happen.
>> B is more annoying, but definitely more safe as it means we cannot lose
>> a bitmap accidentally.
> 
> Both mean that the management layer has not only to deal with the
> deletion of bitmaps as it wants to have them, but also to keep the
> retention counter somewhere and predict what qemu is going to do to the
> bitmaps and whether any corrective action needs to be taken.
> 
> This is making things more complex rather than simpler.
> 
>> I would argue for B with perhaps a force-cycle=true|false that defaults
>> to false to let management tools say "Yes, go ahead, remove the old one"
>> with additionally some return to let us know it happened:
>>
>> {"return": {
>>   "dropped-slices": [ {"bitmap0": 0}, ...]
>> }}
>>
>> This would introduce some concept of bitmap slices into the mix as ID'd
>> children of a bitmap. I would propose that these slices are numbered and
>> monotonically increasing. "bitmap0" as an object starts with no slices,
>> but every incremental backup creates slice 0, slice 1, slice 2, and so
>> on. Even after we start deleting some, they stay ordered. These numbers
>> then stand in for points in time.
>>
>> The counter can (must?) be reset and all slices forgotten when
>> performing a full backup while providing a bitmap argument.
>>
>> "How can a user make use of the slices once they're made?"
>>
>> Let's consider something like mode=partial in contrast to
>> mode=incremental, and an example where we have 6 prior slices:
>> 0,1,2,3,4,5, (and, unnamed, the 'active' slice.)
>>
>> mode=partial bitmap=foo slice=4
>>
>> This would create a backup from slice 4 to the current time α. This
>> includes all clusters from 4, 5, and the active bitmap.
>>
>> I don't think it is meaningful to define any end point that isn't the
>> current time, so I've omitted that as a possibility.
> 
> John, what are you doing here? This adds option after option, and even
> additional slice object, only complicating an easy thing more and more.
> I'm not sure if that was your intention, but I feel I'm starting to
> understand better how Linus's rants come about.
> 
> Let me summarise what this means for management layer:
> 
> * The management layer has to manage bitmaps. They have direct control
>   over creation and deletion of bitmaps. So far so good.
> 
> * It also has to manage slices in those bitmaps objects; and these
>   slices are what contains the actual bitmaps. In order to identify a
>   bitmap in qemu, you need:
> 
> a) the node name
> b) the bitmap ID, and
> c) the slice number
> 
>   The slice number is assigned by qemu and libvirt has to wait until
>   qemu tells it about the slice number of a newly created slice. If
>   libvirt doesn't receive the reply to the command that started the
>   block job, it needs to be able to query this information from qemu,
>   e.g. in query-block-jobs.
> 
> * Slices are automatically created when you start a backup job with a
>   bitmap. It doesn't matter whether you even intend to do an incremental
>   backup against this point in time. qemu knows better.
> 
> * In order to delete a slice that you don't need any more, you have to
>   create more slices (by doing more backups), but you don't get to
>   decide which one is dropped. qemu helpfully just drops the oldest one.
>   It doesn't matter if you want to keep an older one so you can do an
>   incremental backup for a longer timespan. Don't worry about your
>   backup 

Re: [Qemu-block] [Qemu-devel] [PATCH 08/17] iotests: Skip 103 for refcount_bits=1

2017-12-11 Thread John Snow


On 12/11/2017 12:17 PM, Max Reitz wrote:
> On 2017-12-09 02:36, John Snow wrote:
>>
>>
>> On 11/30/2017 08:23 AM, Max Reitz wrote:
>>> On 2017-11-30 04:18, Fam Zheng wrote:
 On Thu, 11/23 03:08, Max Reitz wrote:
> Signed-off-by: Max Reitz 
> ---
>  tests/qemu-iotests/103 | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/tests/qemu-iotests/103 b/tests/qemu-iotests/103
> index ecbd8ebd71..d0cfab8844 100755
> --- a/tests/qemu-iotests/103
> +++ b/tests/qemu-iotests/103
> @@ -40,6 +40,8 @@ trap "_cleanup; exit \$status" 0 1 2 3 15
>  _supported_fmt qcow2
>  _supported_proto file nfs
>  _supported_os Linux
> +# Internal snapshots are (currently) impossible with refcount_bits=1
> +_unsupported_imgopts 'refcount_bits=1[^0-9]'

 What is the "[^0-9]" part for?
>>>
>>> It's so you can specify refcount_bits=16, but not
>>> refcount_bits=1,compat=0.10 or just refcount_bits=1.
>>>
>>> Max
>>>
>>
>> Worth a comment?
> 
> There is a comment above it that says that refcount_bits=1 is the
> disallowed option. :-)
> 
> I could add a "(refcount_bits=16 is OK, though)" if that would have been
> enough for you (or any proposal of yours).
> 
> Max
> 

Not worth a re-spin.

The double negative of "unsupported" and "not 0-9" takes a hot second to
parse. Mentioning that you are looking to prohibit 1,[foo] specifically
helps.



Re: [Qemu-block] [Qemu-devel] [PATCH] blockdev-backup: enable non-root nodes for backup

2017-12-11 Thread John Snow


On 12/11/2017 12:05 PM, Max Reitz wrote:
> On 2017-12-11 17:47, John Snow wrote:
>> On 12/11/2017 11:31 AM, Max Reitz wrote:
>>> On 2017-12-08 18:09, John Snow wrote:
 On 12/08/2017 09:30 AM, Max Reitz wrote:
> On 2017-12-05 01:48, John Snow wrote:
>>
>> I would say that a bitmap attached to a BlockBackend should behave in
>> the way you say: writes to any children should change the bitmap here.
>>
>> bitmaps attached to nodes shouldn't worry about such things.
>
> Do we have bitmaps attached to BlockBackends?  I sure hope not.
>
> We should not have any interface that requires the use of BlockBackends
> by now.  If we do, that's something that has to be fixed.
>
>

 I'm not sure what the right paradigm is anymore, then.

 A node is just a node, but something has to represent the "drive" as far
 as the device model sees it. I thought that *was* the BlockBackend, but
 is it not?
>>>
>>> Yes, and on the other side the BB represents the device model for the
>>> block layer.  But the thing is that the user should be blissfully
>>> unaware...  Or do you want to make bitmaps attachable to guest devices
>>> (through the QOM path or ID) instead?
>>>
>>
>> OK, sure -- the user can specify a device model to attach it to instead
>> of a node. They don't have to be aware of the BB itself.
>>
>> The implementation though, I imagine it associates with that BB.
> 
> But that would be a whole new implementation...
> 

Yeah.

>>> (The block layer would then internally translate that to a BB.  But
>>> that's a bad internal interface because the bitmap is still attached to
>>> a BDS, and it's a bad external interface because currently you can
>>> attach bitmaps to nodes and only to nodes...)
>>
>> What if the type of bitmap we want to track trans-node changes was not
>> attached to a BDS? That'd be one way to obviously discriminate between
>> "This tracks tree-wide changes" and "This tracks node-local changes."
> 
> A new type of bitmap? :-/
> 

"type" may be too strong of a word, but... all the ones we use currently
are node-local.

>> Implementation wise I don't have any actual thought as to how this could
>> possibly be efficient. Maybe a bitmap reference at each BDS that is a
>> child of that particular BB?
>>
>> On attach, the BDS gets a set-only reference to that bitmap.
>> On detach, we remove the reference.
>>
>> Then, any writes anywhere in the tree will coagulate in one place. It
>> may or may not be particularly true or correct, because a write down the
>> tree doesn't necessarily mean the visible data has changed at the top
>> layer, but I don't think we have anything institutionally set in place
>> to determine if we are changing visible data up-graph with a write
>> down-graph.
> 
> Hmmm...  The first thing to clarify is whether we want two types of
> bitmaps.  I don't think there is much use to node-local bitmaps, all
> bitmaps should track every dirtying of their associated node (wherever
> it comes from).
> 

I don't disagree, but if that's the case then attaching bitmaps to
non-root nodes should be prohibited and we ought to shore up the
semantic idea that bitmaps must collect writes from their children.

I'm not sure that's any better inherently than expanding the idea to
include obvious differences between node-local and tree-wide bitmaps.
Currently, we just confuse the two concepts and do a poor job of either.

> However, if that is too much of a performance penalty...  Then we
> probably do have to distinguish between the two so that users only add
> tree-wide bitmaps when they need them.
> 
> OTOH, I guess that in the common case it's not a performance penalty at
> all, if done right.  Usually, a node you attach a bitmap to will not
> have child nodes that are written to by other nodes.  So in the common
> case your tree-wide bitmap is just a plain local bitmap and thus just as
> efficient.
> 
> And if some child node is indeed written to by some other node...  I
> think you always want a tree-wide bitmap anyway.
> 
> So I think all bitmaps should be tree-wide and the fact that they
> currently are not is a bug.

I could agree with this viewpoint, but that means we need to disallow
bitmaps to be attached to any non-root BDS, and then implement an
ability to give set-only references to children.

That's a bit of an extreme tactic that might prevent us from ever using
node-local bitmaps with anything resembling a sane API in the future.
Maybe that's OK, but it is a commitment.

I'm not sure there are any good reasons to have node-local bitmaps, but
the idea was one I more or less inherited when I started working in this
area because specifying specific BDS nodes with "device=" was going out
of vogue, but what exactly a BB was wasn't really defined yet, so I got
stuck with a QMP interface named "node=" and an implementation that has
an affinity for nodes.



Re: [Qemu-block] [Qemu-devel] [PATCH 08/17] iotests: Skip 103 for refcount_bits=1

2017-12-11 Thread Max Reitz
On 2017-12-09 02:36, John Snow wrote:
> 
> 
> On 11/30/2017 08:23 AM, Max Reitz wrote:
>> On 2017-11-30 04:18, Fam Zheng wrote:
>>> On Thu, 11/23 03:08, Max Reitz wrote:
 Signed-off-by: Max Reitz 
 ---
  tests/qemu-iotests/103 | 2 ++
  1 file changed, 2 insertions(+)

 diff --git a/tests/qemu-iotests/103 b/tests/qemu-iotests/103
 index ecbd8ebd71..d0cfab8844 100755
 --- a/tests/qemu-iotests/103
 +++ b/tests/qemu-iotests/103
 @@ -40,6 +40,8 @@ trap "_cleanup; exit \$status" 0 1 2 3 15
  _supported_fmt qcow2
  _supported_proto file nfs
  _supported_os Linux
 +# Internal snapshots are (currently) impossible with refcount_bits=1
 +_unsupported_imgopts 'refcount_bits=1[^0-9]'
>>>
>>> What is the "[^0-9]" part for?
>>
>> It's so you can specify refcount_bits=16, but not
>> refcount_bits=1,compat=0.10 or just refcount_bits=1.
>>
>> Max
>>
> 
> Worth a comment?

There is a comment above it that says that refcount_bits=1 is the
disallowed option. :-)

I could add a "(refcount_bits=16 is OK, though)" if that would have been
enough for you (or any proposal of yours).

Max





Re: [Qemu-block] [Qemu-devel] [PATCH] blockdev-backup: enable non-root nodes for backup

2017-12-11 Thread Max Reitz
On 2017-12-11 17:47, John Snow wrote:
> 
> 
> On 12/11/2017 11:31 AM, Max Reitz wrote:
>> On 2017-12-08 18:09, John Snow wrote:
>>>
>>>
>>> On 12/08/2017 09:30 AM, Max Reitz wrote:
 On 2017-12-05 01:48, John Snow wrote:
>
>
> On 12/04/2017 05:21 PM, Max Reitz wrote:
>> On 2017-12-04 23:15, John Snow wrote:
>>>
>>>
>>> On 12/01/2017 02:41 PM, Max Reitz wrote:
 ((By the way, I don't suppose that's how it should work...  But I don't
 suppose that we want propagation of dirtying towards the BDS roots, do
 we? :-/))
>>>
>>> I have never really satisfactorily explained to myself what bitmaps on
>>> intermediate notes truly represent or mean.
>>>
>>> The simple case is "This layer itself serviced a write request."
>>>
>>> If that information is not necessarily meaningful, I'm not sure that's a
>>> problem except in configuration.
>>>
>>>
>>> ...Now, if you wanted to talk about bitmaps that associate with a
>>> Backend instead of a Node...
>>
>> But it's not about bitmaps on intermediate nodes, quite the opposite.
>> It's about bitmaps on roots but write requests happening on intermediate
>> nodes.
>>
>
> Oh, I see what you're saying. It magically doesn't really change my
> opinion, by coincidence!
>
>> Say you have a node I and two filter nodes A and B using it (and they
>> are OK with shared writers).  There is a dirty bitmap on A.
>>
>> Now when a write request goes through B, I will obviously have changed,
>> and because A and B are filters, so will A.  But the dirty bitmap on A
>> will still be clean.
>>
>> My example was that when you run a mirror over A, you won't see dirtying
>> from B.  So you can't e.g. add a throttle driver between a mirror job
>> and the node you want to mirror, because the dirty bitmap on the
>> throttle driver will not be affected by accesses to the actual node.
>>
>> Max
>>
>
> Well, in this case I would say that a root BDS is not really any
> different from an intermediate one and can't really know what's going on
> in the world outside.
>
> At least, I think that's how we model it right now -- we pretend that we
> can record the activity of an entire drive graph by putting the bitmap
> on the root-most node we can get a hold of and assuming that all writes
> are going to go through us.

 Well, yeah, I know we do.  But I consider this counter-intuitive and if
 something is counter-intuitive it's often a bug.

> Clearly this is increasingly false the more we modularise the block graph.
>
>
> *uhm*
>
>
> I would say that a bitmap attached to a BlockBackend should behave in
> the way you say: writes to any children should change the bitmap here.
>
> bitmaps attached to nodes shouldn't worry about such things.

 Do we have bitmaps attached to BlockBackends?  I sure hope not.

 We should not have any interface that requires the use of BlockBackends
 by now.  If we do, that's something that has to be fixed.

 Max

>>>
>>> I'm not sure what the right paradigm is anymore, then.
>>>
>>> A node is just a node, but something has to represent the "drive" as far
>>> as the device model sees it. I thought that *was* the BlockBackend, but
>>> is it not?
>>
>> Yes, and on the other side the BB represents the device model for the
>> block layer.  But the thing is that the user should be blissfully
>> unaware...  Or do you want to make bitmaps attachable to guest devices
>> (through the QOM path or ID) instead?
>>
> 
> OK, sure -- the user can specify a device model to attach it to instead
> of a node. They don't have to be aware of the BB itself.
> 
> The implementation though, I imagine it associates with that BB.

But that would be a whole new implementation...

>> (The block layer would then internally translate that to a BB.  But
>> that's a bad internal interface because the bitmap is still attached to
>> a BDS, and it's a bad external interface because currently you can
>> attach bitmaps to nodes and only to nodes...)
> 
> What if the type of bitmap we want to track trans-node changes was not
> attached to a BDS? That'd be one way to obviously discriminate between
> "This tracks tree-wide changes" and "This tracks node-local changes."

A new type of bitmap? :-/

> Implementation wise I don't have any actual thought as to how this could
> possibly be efficient. Maybe a bitmap reference at each BDS that is a
> child of that particular BB?
> 
> On attach, the BDS gets a set-only reference to that bitmap.
> On detach, we remove the reference.
> 
> Then, any writes anywhere in the tree will coagulate in one place. It
> may or may not be particularly true or correct, because a write down the
> tree doesn't necessarily mean the visible data has changed at the top
> layer, but I don't 

Re: [Qemu-block] import thin provisioned disks with image upload

2017-12-11 Thread Max Reitz
On 2017-12-08 21:56, Nir Soffer wrote:
> On Fri, Dec 8, 2017 at 5:23 PM Max Reitz wrote:

[...]

> What "stat" reports as "size" I'd call the length (what qemu-img info
> calls "virtual size" for raw images).  
> 
> 
> raw images are not an issue since the virtual size is the file size.

I personally don't quite see how the file length is important for other
image types.  At least the qcow2 driver doesn't really try to keep the
file length small because it generally assumes that it's just a number
that doesn't really matter.  (That was my impression, at least.)

> What I (and qemu-img info) call
> "size" is how much disk space is actually used.  And both ls -s and du
> agree that it's 0 bytes in this case.
> 
> By the way, yes, "stat" has a different definition of "size", so that
> seems wrong.  But I argue that the "ls" option is called "--size", so
> there's a conflict already and qemu-img info is not the first tool to
> disagree with stat on this.
> 
> 
> I think if you ask most users what is a file size they will tell you what
> stat is returning. If we use the term in the way most people expect
> we can make tools easier to use.

First of all, we cannot easily change the name from "size".  In the QAPI
definition of ImageInfo it's called "actual-size", so we cannot change
it without breaking compatibility.  We could change the human-readable
output, though; maybe to something like "host disk usage".

> Users of qemu-img have no way to tell what is disk-size, the value is
> not documented, at least not in the manual or online help.

Documentation is a different issue.  There is some for the actual-size
field of ImageInfo, but even that is indeed lacking.

> I think the easy way to improve this is to show both the "allocated size"
> (st_size * block_size), and the "file size" (st_size).

Not impossible, although I personally don't really see the point.

To be honest, what I fear is that people see the file length and
complain why it's so large when it's actually just some number that
shouldn't matter whatsoever.  If people really want to find out, I don't
think ls -l is that hard to use.

OK, so let me sum up:

First, I agree that "file size" is confusing.  (I disagree that it would
be wrong.  I think it is simply ambiguous.)  So we should rename it at
least in the human-readable output.

Secondly, we should have documentation about this in the qemu-img man
page, and better documentation for ImageInfo.actual-size.

Thirdly, I don't see the point of displaying the file length in qemu-img
info.  But since you think it useful and it probably wouldn't be too
hard, we can add it.  My only fear about this is that I consider it an
arbitrary and useless number that may confuse people.  It hopefully
shouldn't, as long as they can see the actual disk usage at the same
time, though.


(I just don't know how seeing the actual image file length in qemu-img
info would have helped in your case.  The important thing would have
been to know that image files usually do contain large holes and you
need to enable hole detection when transferring image files anywhere (as
far as I have seen).)
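
The length-vs-usage distinction is easy to demonstrate with stat(2) on
a freshly truncated file. A minimal POSIX-only sketch (whether a
truncated file really allocates zero data blocks is filesystem-
dependent, so treat the numbers as typical rather than guaranteed):

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Create a sparse file of `len` bytes; report length vs. allocated bytes. */
static void sparse_demo(const char *path, off_t len,
                        off_t *length, off_t *usage)
{
    int fd = open(path, O_CREAT | O_TRUNC | O_WRONLY, 0600);
    assert(fd >= 0);
    assert(ftruncate(fd, len) == 0);     /* extend without writing data */
    struct stat st;
    assert(fstat(fd, &st) == 0);
    *length = st.st_size;                /* stat(1)/ls -l's "size" */
    *usage = (off_t)st.st_blocks * 512;  /* ls -s/du-style disk usage */
    close(fd);
    unlink(path);
}
```

On a typical Linux filesystem the 1 GiB file reports its full length in
st_size while st_blocks stays near zero, which is exactly the gap
between "virtual size"-style length and actual disk usage discussed
above.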

Max





Re: [Qemu-block] [Qemu-devel] [PATCH] blockdev-backup: enable non-root nodes for backup

2017-12-11 Thread John Snow


On 12/11/2017 11:31 AM, Max Reitz wrote:
> On 2017-12-08 18:09, John Snow wrote:
>>
>>
>> On 12/08/2017 09:30 AM, Max Reitz wrote:
>>> On 2017-12-05 01:48, John Snow wrote:


 On 12/04/2017 05:21 PM, Max Reitz wrote:
> On 2017-12-04 23:15, John Snow wrote:
>>
>>
>> On 12/01/2017 02:41 PM, Max Reitz wrote:
>>> ((By the way, I don't suppose that's how it should work...  But I don't
>>> suppose that we want propagation of dirtying towards the BDS roots, do
>>> we? :-/))
>>
>> I have never really satisfactorily explained to myself what bitmaps on
>> intermediate notes truly represent or mean.
>>
>> The simple case is "This layer itself serviced a write request."
>>
>> If that information is not necessarily meaningful, I'm not sure that's a
>> problem except in configuration.
>>
>>
>> ...Now, if you wanted to talk about bitmaps that associate with a
>> Backend instead of a Node...
>
> But it's not about bitmaps on intermediate nodes, quite the opposite.
> It's about bitmaps on roots but write requests happening on intermediate
> nodes.
>

 Oh, I see what you're saying. It magically doesn't really change my
 opinion, by coincidence!

> Say you have a node I and two filter nodes A and B using it (and they
> are OK with shared writers).  There is a dirty bitmap on A.
>
> Now when a write request goes through B, I will obviously have changed,
> and because A and B are filters, so will A.  But the dirty bitmap on A
> will still be clean.
>
> My example was that when you run a mirror over A, you won't see dirtying
> from B.  So you can't e.g. add a throttle driver between a mirror job
> and the node you want to mirror, because the dirty bitmap on the
> throttle driver will not be affected by accesses to the actual node.
>
> Max
>

 Well, in this case I would say that a root BDS is not really any
 different from an intermediate one and can't really know what's going on
 in the world outside.

 At least, I think that's how we model it right now -- we pretend that we
 can record the activity of an entire drive graph by putting the bitmap
 on the root-most node we can get a hold of and assuming that all writes
 are going to go through us.
>>>
>>> Well, yeah, I know we do.  But I consider this counter-intuitive and if
>>> something is counter-intuitive it's often a bug.
>>>
 Clearly this is increasingly false the more we modularise the block graph.


 *uhm*


 I would say that a bitmap attached to a BlockBackend should behave in
 the way you say: writes to any children should change the bitmap here.

 bitmaps attached to nodes shouldn't worry about such things.
>>>
>>> Do we have bitmaps attached to BlockBackends?  I sure hope not.
>>>
>>> We should not have any interface that requires the use of BlockBackends
>>> by now.  If we do, that's something that has to be fixed.
>>>
>>> Max
>>>
>>
>> I'm not sure what the right paradigm is anymore, then.
>>
>> A node is just a node, but something has to represent the "drive" as far
>> as the device model sees it. I thought that *was* the BlockBackend, but
>> is it not?
> 
> Yes, and on the other side the BB represents the device model for the
> block layer.  But the thing is that the user should be blissfully
> unaware...  Or do you want to make bitmaps attachable to guest devices
> (through the QOM path or ID) instead?
> 

OK, sure -- the user can specify a device model to attach it to instead
of a node. They don't have to be aware of the BB itself.

The implementation though, I imagine it associates with that BB.

> (The block layer would then internally translate that to a BB.  But
> that's a bad internal interface because the bitmap is still attached to
> a BDS, and it's a bad external interface because currently you can
> attach bitmaps to nodes and only to nodes...)

What if the kind of bitmap we want for tracking trans-node changes was not
attached to a BDS? That'd be one way to obviously discriminate between
"This tracks tree-wide changes" and "This tracks node-local changes."

Implementation wise I don't have any actual thought as to how this could
possibly be efficient. Maybe a bitmap reference at each BDS that is a
child of that particular BB?

On attach, the BDS gets a set-only reference to that bitmap.
On detach, we remove the reference.

Then, any writes anywhere in the tree will coagulate in one place. It
may or may not be particularly true or correct, because a write down the
tree doesn't necessarily mean the visible data has changed at the top
layer, but I don't think we have anything institutionally set in place
to determine if we are changing visible data up-graph with a write
down-graph.



Re: [Qemu-block] [Qemu-devel] [PATCH] blockdev-backup: enable non-root nodes for backup

2017-12-11 Thread Max Reitz
On 2017-12-08 18:09, John Snow wrote:
> 
> 
> On 12/08/2017 09:30 AM, Max Reitz wrote:
>> On 2017-12-05 01:48, John Snow wrote:
>>>
>>>
>>> On 12/04/2017 05:21 PM, Max Reitz wrote:
 On 2017-12-04 23:15, John Snow wrote:
>
>
> On 12/01/2017 02:41 PM, Max Reitz wrote:
>> ((By the way, I don't suppose that's how it should work...  But I don't
>> suppose that we want propagation of dirtying towards the BDS roots, do
>> we? :-/))
>
> I have never really satisfactorily explained to myself what bitmaps on
> intermediate nodes truly represent or mean.
>
> The simple case is "This layer itself serviced a write request."
>
> If that information is not necessarily meaningful, I'm not sure that's a
> problem except in configuration.
>
>
> ...Now, if you wanted to talk about bitmaps that associate with a
> Backend instead of a Node...

 But it's not about bitmaps on intermediate nodes, quite the opposite.
 It's about bitmaps on roots but write requests happening on intermediate
 nodes.

>>>
>>> Oh, I see what you're saying. It magically doesn't really change my
>>> opinion, by coincidence!
>>>
 Say you have a node I and two filter nodes A and B using it (and they
 are OK with shared writers).  There is a dirty bitmap on A.

 Now when a write request goes through B, I will obviously have changed,
 and because A and B are filters, so will A.  But the dirty bitmap on A
 will still be clean.

 My example was that when you run a mirror over A, you won't see dirtying
 from B.  So you can't e.g. add a throttle driver between a mirror job
 and the node you want to mirror, because the dirty bitmap on the
 throttle driver will not be affected by accesses to the actual node.

 Max

>>>
>>> Well, in this case I would say that a root BDS is not really any
>>> different from an intermediate one and can't really know what's going on
>>> in the world outside.
>>>
>>> At least, I think that's how we model it right now -- we pretend that we
>>> can record the activity of an entire drive graph by putting the bitmap
>>> on the root-most node we can get a hold of and assuming that all writes
>>> are going to go through us.
>>
>> Well, yeah, I know we do.  But I consider this counter-intuitive and if
>> something is counter-intuitive it's often a bug.
>>
>>> Clearly this is increasingly false the more we modularise the block graph.
>>>
>>>
>>> *uhm*
>>>
>>>
>>> I would say that a bitmap attached to a BlockBackend should behave in
>>> the way you say: writes to any children should change the bitmap here.
>>>
>>> bitmaps attached to nodes shouldn't worry about such things.
>>
>> Do we have bitmaps attached to BlockBackends?  I sure hope not.
>>
>> We should not have any interface that requires the use of BlockBackends
>> by now.  If we do, that's something that has to be fixed.
>>
>> Max
>>
> 
> I'm not sure what the right paradigm is anymore, then.
> 
> A node is just a node, but something has to represent the "drive" as far
> as the device model sees it. I thought that *was* the BlockBackend, but
> is it not?

Yes, and on the other side the BB represents the device model for the
block layer.  But the thing is that the user should be blissfully
unaware...  Or do you want to make bitmaps attachable to guest devices
(through the QOM path or ID) instead?

(The block layer would then internally translate that to a BB.  But
that's a bad internal interface because the bitmap is still attached to
a BDS, and it's a bad external interface because currently you can
attach bitmaps to nodes and only to nodes...)

Max



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-block] [PATCH v6 09/20] parallels: Switch to .bdrv_co_block_status()

2017-12-11 Thread Vladimir Sementsov-Ogievskiy

09.12.2017 19:39, Eric Blake wrote:

On 12/09/2017 06:31 AM, Vladimir Sementsov-Ogievskiy wrote:

07.12.2017 23:30, Eric Blake wrote:

We are gradually moving away from sector-based interfaces, towards
byte-based.  Update the parallels driver accordingly.  Note that
the internal function block_status() is still sector-based, because
it is still in use by other sector-based functions; but that's okay
because request_alignment is 512 as a result of those functions.
For now, no optimizations are added based on the mapping hint.

Signed-off-by: Eric Blake 

   {
   BDRVParallelsState *s = bs->opaque;
-    int64_t offset;
+    int count;

+    assert(QEMU_IS_ALIGNED(offset | bytes, BDRV_SECTOR_SIZE));
    qemu_co_mutex_lock(&s->lock);
-    offset = block_status(s, sector_num, nb_sectors, pnum);
+    offset = block_status(s, offset >> BDRV_SECTOR_BITS,
+                          bytes >> BDRV_SECTOR_BITS, &count);
    qemu_co_mutex_unlock(&s->lock);

+    *pnum = count * BDRV_SECTOR_SIZE;
   if (offset < 0) {
   return 0;
   }

+    *map = offset;

oh, didn't notice previous time :(, should be

*map = offset * BDRV_SECTOR_SIZE


Oh, right.


(also, if you already use ">> BDRV_SECTOR_BITS" in this function,
would not it be more consistent to use "<< BDRV_SECTOR_BITS"
for sector-to-byte conversion?)

with that fixed (with "<<" or "*", up to you),

'>> BDRV_SECTOR_BITS' always works.  You could also write '/
BDRV_SECTOR_SIZE' (the compiler should figure out that you are dividing
by a power of 2, and use the shift instruction under the hood), but I
find that a bit harder to reason about.

On the other hand, '<< BDRV_SECTOR_BITS' only produces the same size
output as the input; if the left side is 32 bits, it risks overflowing.
But '* BDRV_SECTOR_SIZE' always produces a 64-bit value.  So I've
learned (from past mistakes in other byte-conversion series) that the
multiply form is less likely to introduce unintended truncation bugs.


hm, now I'm not so sure. I've written a tiny program:

#include <stdint.h>
#include <stdio.h>

int main() {
    uint32_t blocks = 1 << 20;
    int block_bits = 15;
    uint32_t block_size = 1 << block_bits;
    uint64_t a = blocks * block_size;
    uint64_t b = blocks << block_bits;
    uint64_t c = (uint64_t)blocks * block_size;
    uint64_t d = (uint64_t)blocks << block_bits;
    printf("%lx %lx %lx %lx\n", a, b, c, d);
    return 0;
}


and it prints
0 0 800000000 800000000
for me. So, no difference between * and <<...




Reviewed-by: Vladimir Sementsov-Ogievskiy 




--
Best regards,
Vladimir




Re: [Qemu-block] [Qemu-devel] [PATCH V3] pci: removed the is_express field since a uniform interface was inserted

2017-12-11 Thread Yoni Bettan



On 12/11/2017 04:19 PM, Eduardo Habkost wrote:

On Mon, Dec 11, 2017 at 03:11:39PM +0200, Yoni Bettan wrote:


On 12/07/2017 10:58 PM, Eduardo Habkost wrote:

On Tue, Dec 05, 2017 at 07:17:06PM +0200, Yoni Bettan wrote:

  * according to Eduardo Habkost's commit
fd3b02c8896d597dd8b9e053dec579cf0386aee1

  * since all PCIEs now implement INTERFACE_PCIE_DEVICE we
don't need this field anymore

  * Devices that were only INTERFACE_PCIE_DEVICE (is_express == 1)
or
devices that were only INTERFACE_CONVENTIONAL_PCI_DEVICE 
(is_express == 0)
were not affected by the change

The only devices that were affected are those that are hybrid and 
also
had (is_express == 1) - therefore only:
  - hw/vfio/pci.c
  - hw/usb/hcd-xhci.c

For both I made sure that QEMU_PCI_CAP_EXPRESS is on

Signed-off-by: Yoni Bettan 
---
   docs/pcie_pci_bridge.txt   | 2 +-
   hw/block/nvme.c| 1 -
   hw/net/e1000e.c| 1 -
   hw/pci-bridge/pcie_pci_bridge.c| 1 -
   hw/pci-bridge/pcie_root_port.c | 1 -
   hw/pci-bridge/xio3130_downstream.c | 1 -
   hw/pci-bridge/xio3130_upstream.c   | 1 -
   hw/pci-host/xilinx-pcie.c  | 1 -
   hw/pci/pci.c   | 4 +++-
   hw/scsi/megasas.c  | 4 
   hw/usb/hcd-xhci.c  | 7 ++-
   hw/vfio/pci.c  | 3 ++-
   include/hw/pci/pci.h   | 3 ---
   13 files changed, 12 insertions(+), 18 deletions(-)

diff --git a/docs/pcie_pci_bridge.txt b/docs/pcie_pci_bridge.txt
index 5a4203f97c..ab35ebf3ca 100644
--- a/docs/pcie_pci_bridge.txt
+++ b/docs/pcie_pci_bridge.txt
@@ -110,5 +110,5 @@ To enable device hot-plug into the bridge on Linux there're 
3 ways:
   Implementation
   ==
   The PCIE-PCI bridge is based on PCI-PCI bridge, but also accumulates PCI 
Express
-features as a PCI Express device (is_express=1).
+features as a PCI Express device.
diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index 441e21ed1f..9325bc0911 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -1087,7 +1087,6 @@ static void nvme_class_init(ObjectClass *oc, void *data)
   pc->vendor_id = PCI_VENDOR_ID_INTEL;
   pc->device_id = 0x5845;
   pc->revision = 2;
-pc->is_express = 1;
   set_bit(DEVICE_CATEGORY_STORAGE, dc->categories);
   dc->desc = "Non-Volatile Memory Express";
diff --git a/hw/net/e1000e.c b/hw/net/e1000e.c
index f1af279e8d..c360f0d8c9 100644
--- a/hw/net/e1000e.c
+++ b/hw/net/e1000e.c
@@ -675,7 +675,6 @@ static void e1000e_class_init(ObjectClass *class, void 
*data)
   c->revision = 0;
   c->romfile = "efi-e1000e.rom";
   c->class_id = PCI_CLASS_NETWORK_ETHERNET;
-c->is_express = 1;
   dc->desc = "Intel 82574L GbE Controller";
   dc->reset = e1000e_qdev_reset;
diff --git a/hw/pci-bridge/pcie_pci_bridge.c b/hw/pci-bridge/pcie_pci_bridge.c
index a4d827c99d..b7d9ebbec2 100644
--- a/hw/pci-bridge/pcie_pci_bridge.c
+++ b/hw/pci-bridge/pcie_pci_bridge.c
@@ -169,7 +169,6 @@ static void pcie_pci_bridge_class_init(ObjectClass *klass, 
void *data)
   DeviceClass *dc = DEVICE_CLASS(klass);
   HotplugHandlerClass *hc = HOTPLUG_HANDLER_CLASS(klass);
-k->is_express = 1;
   k->is_bridge = 1;
   k->vendor_id = PCI_VENDOR_ID_REDHAT;
   k->device_id = PCI_DEVICE_ID_REDHAT_PCIE_BRIDGE;
diff --git a/hw/pci-bridge/pcie_root_port.c b/hw/pci-bridge/pcie_root_port.c
index 9b6e4ce512..45f9e8cd4a 100644
--- a/hw/pci-bridge/pcie_root_port.c
+++ b/hw/pci-bridge/pcie_root_port.c
@@ -145,7 +145,6 @@ static void rp_class_init(ObjectClass *klass, void *data)
   DeviceClass *dc = DEVICE_CLASS(klass);
   PCIDeviceClass *k = PCI_DEVICE_CLASS(klass);
-k->is_express = 1;
   k->is_bridge = 1;
   k->config_write = rp_write_config;
   k->realize = rp_realize;
diff --git a/hw/pci-bridge/xio3130_downstream.c 
b/hw/pci-bridge/xio3130_downstream.c
index 1e09d2afb7..613a0d6bb7 100644
--- a/hw/pci-bridge/xio3130_downstream.c
+++ b/hw/pci-bridge/xio3130_downstream.c
@@ -177,7 +177,6 @@ static void xio3130_downstream_class_init(ObjectClass 
*klass, void *data)
   DeviceClass *dc = DEVICE_CLASS(klass);
   PCIDeviceClass *k = PCI_DEVICE_CLASS(klass);
-k->is_express = 1;
   k->is_bridge = 1;
   k->config_write = xio3130_downstream_write_config;
   k->realize = xio3130_downstream_realize;
diff --git a/hw/pci-bridge/xio3130_upstream.c b/hw/pci-bridge/xio3130_upstream.c
index 227997ce46..d4645bddee 100644
--- a/hw/pci-bridge/xio3130_upstream.c
+++ b/hw/pci-bridge/xio3130_upstream.c
@@ -148,7 +148,6 @@ static void xio3130_upstream_class_init(ObjectClass *klass, 
void *data)
   DeviceClass *dc = DEVICE_CLASS(klass);
   PCIDeviceClass *k = PCI_DEVICE_CLASS(klass);
-k->is_express = 1;
   k->is_bridge = 1;
   k->config_write = 

Re: [Qemu-block] [Qemu-devel] qemu process crash: Assertion failed: QLIST_EMPTY(&bs->tracked_requests)

2017-12-11 Thread Fernando Casas Schössow
Hello Stefan,

Thanks for your reply.
Fortunately I didn’t have the problem again and it’s not clear how it can be 
consistently reproduced. Daily backups are running as usual at the moment.

If there is anything I can do from my side or if you have any ideas to try to 
reproduce it let me know.

Thanks.

Fer

Sent from my iPhone

On 11 Dec 2017, at 12:56, Stefan Hajnoczi 
> wrote:

On Thu, Dec 07, 2017 at 10:18:52AM +, Fernando Casas Schössow wrote:
Hi there,


Last night while doing a backup of a guest using the live snapshot mechanism 
the qemu process for the guest seem to had crashed.

The snapshot succeeded then the backup of the VM disk had place and also 
succeeded but the commit to the original disk after the backup seem to have 
failed.

The command I use in the script to take the snapshot is:


virsh snapshot-create-as --domain $VM backup-job.qcow2 --disk-only --atomic 
--quiesce --no-metadata


And then to commit back is:


virsh blockcommit $VM $TARGETDISK --base $DISKFILE --top $SNAPFILE --active 
--pivot


In the qemu log for the guest I found the following while the commit back was 
having place:


Assertion failed: QLIST_EMPTY(&bs->tracked_requests) 
(/home/buildozer/aports/main/qemu/src/qemu-2.10.1/block/mirror.c: mirror_run: 
884)

I'm running qemu 2.10.1 with libvirt 3.9.0 and kernel 4.9.65 on Alpine Linux 
3.7.

This is the complete guest info from the logs:


LC_ALL=C 
PATH=/bin:/sbin:/bin:/sbin:/usr/bin:/usr/sbin:/usr/bin:/usr/sbin:/usr/local/bin:/usr/local/sbin
 HOME=/root USER=root QEMU_AUDIO_DRV=spice /usr/bin/qemu-system-x86_64 -name 
guest=DOCKER01,debug-threads=on -S -object 
secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-6-DOCKER01/master-key.aes
 -machine pc-i440fx-2.8,accel=kvm,usb=off,dump-guest-core=off -cpu 
IvyBridge,ss=on,vmx=on,pcid=on,hypervisor=on,arat=on,tsc_adjust=on,xsaveopt=on 
-drive 
file=/usr/share/edk2.git/ovmf-x64/OVMF_CODE-pure-efi.fd,if=pflash,format=raw,unit=0,readonly=on
 -drive 
file=/var/lib/libvirt/qemu/nvram/DOCKER01_VARS.fd,if=pflash,format=raw,unit=1 
-m 2048 -realtime mlock=off -smp 2,sockets=2,cores=1,threads=1 -uuid 
4705b146-3b14-4c20-923c-42105d47e7fc -no-user-config -nodefaults -chardev 
socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-6-DOCKER01/monitor.sock,server,nowait
 -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc,driftfix=slew 
-global kvm-pit.lost_tick_policy=delay -no-hpet -no-shutdown -global 
PIIX4_PM.disable_s3=1 -global PIIX4_PM.disable_s4=1 -boot strict=on -device 
ich9-usb-ehci1,id=usb,bus=pci.0,addr=0x4.0x7 -device 
ich9-usb-uhci1,masterbus=usb.0,firstport=0,bus=pci.0,multifunction=on,addr=0x4 
-device ich9-usb-uhci2,masterbus=usb.0,firstport=2,bus=pci.0,addr=0x4.0x1 
-device ich9-usb-uhci3,masterbus=usb.0,firstport=4,bus=pci.0,addr=0x4.0x2 
-device ahci,id=sata0,bus=pci.0,addr=0x9 -device 
virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x5 -drive 
file=/storage/storage-ssd-vms/virtual_machines_ssd/docker01.qcow2,format=qcow2,if=none,id=drive-sata0-0-0,cache=none,aio=threads
 -device ide-hd,bus=sata0.0,drive=drive-sata0-0-0,id=sata0-0-0,bootindex=1 
-netdev tap,fd=33,id=hostnet0,vhost=on,vhostfd=35 -device 
virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:1c:af:ce,bus=pci.0,addr=0x3 
-chardev pty,id=charserial0 -device isa-serial,chardev=charserial0,id=serial0 
-chardev 
socket,id=charchannel0,path=/var/lib/libvirt/qemu/channel/target/domain-6-DOCKER01/org.qemu.guest_agent.0,server,nowait
 -device 
virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=org.qemu.guest_agent.0
 -chardev spicevmc,id=charchannel1,name=vdagent -device 
virtserialport,bus=virtio-serial0.0,nr=2,chardev=charchannel1,id=channel1,name=com.redhat.spice.0
 -spice port=5905,addr=127.0.0.1,disable-ticketing,seamless-migration=on 
-device 
qxl-vga,id=video0,ram_size=67108864,vram_size=67108864,vram64_size_mb=0,vgamem_mb=16,max_outputs=1,bus=pci.0,addr=0x2
 -chardev spicevmc,id=charredir0,name=usbredir -device 
usb-redir,chardev=charredir0,id=redir0,bus=usb.0,port=2 -chardev 
spicevmc,id=charredir1,name=usbredir -device 
usb-redir,chardev=charredir1,id=redir1,bus=usb.0,port=3 -device 
virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x7 -object 
rng-random,id=objrng0,filename=/dev/random -device 
virtio-rng-pci,rng=objrng0,id=rng0,bus=pci.0,addr=0x8 -msg timestamp=on



I was running on qemu 2.8.1 for months and didn't have any problems with the 
backups but yesterday I updated to qemu 2.10.1 and I hit this problem last 
night.


Is this a bug? Any ideas will be appreciated.

Thanks for reporting this bug.  Can you reproduce it reliably?

Stefan


[Qemu-block] [PATCH v2 0/2] virtio-blk: miscellaneous changes

2017-12-11 Thread Mark Kanda
v2: add check for maximum queue size [Stefan]

This series is for two minor virtio-blk changes. The first patch
makes the virtio-blk queue size user configurable. The second patch
rejects logical block size > physical block size configurations (similar
to a recent change in virtio-scsi).

Mark Kanda (2):
  virtio-blk: make queue size configurable
  virtio-blk: reject configs with logical block size > physical block
size

 hw/block/virtio-blk.c  | 17 -
 include/hw/virtio/virtio-blk.h |  1 +
 2 files changed, 17 insertions(+), 1 deletion(-)

-- 
1.8.3.1




Re: [Qemu-block] [PATCH v2 0/2] virtio-blk: miscellaneous changes

2017-12-11 Thread Mark Kanda



On 12/11/2017 4:30 AM, Stefan Hajnoczi wrote:

Hi Mark,
Please resend as a top level email thread so the continuous integration
and patch management tools will detect your patch series.


Apologies. I've just resent the series.

Thanks,

-Mark



[Qemu-block] [PATCH v2 1/2] virtio-blk: make queue size configurable

2017-12-11 Thread Mark Kanda
Depending on the configuration, it can be beneficial to adjust the virtio-blk
queue size to something other than the current default of 128. Add a new
property to make the queue size configurable.

Signed-off-by: Mark Kanda 
Reviewed-by: Karl Heubaum 
Reviewed-by: Martin K. Petersen 
Reviewed-by: Ameya More 
---
 hw/block/virtio-blk.c  | 10 +-
 include/hw/virtio/virtio-blk.h |  1 +
 2 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
index 05d1440..fb59f57 100644
--- a/hw/block/virtio-blk.c
+++ b/hw/block/virtio-blk.c
@@ -928,6 +928,13 @@ static void virtio_blk_device_realize(DeviceState *dev, 
Error **errp)
 error_setg(errp, "num-queues property must be larger than 0");
 return;
 }
+if (!is_power_of_2(conf->queue_size) ||
+conf->queue_size > VIRTQUEUE_MAX_SIZE) {
+error_setg(errp, "invalid queue-size property (%" PRIu16 "), "
+   "must be a power of 2 (max %d)",
+   conf->queue_size, VIRTQUEUE_MAX_SIZE);
+return;
+}
 
  blkconf_serial(&conf->conf, &conf->serial);
  blkconf_apply_backend_options(&conf->conf,
@@ -953,7 +960,7 @@ static void virtio_blk_device_realize(DeviceState *dev, 
Error **errp)
 s->sector_mask = (s->conf.conf.logical_block_size / BDRV_SECTOR_SIZE) - 1;
 
 for (i = 0; i < conf->num_queues; i++) {
-virtio_add_queue(vdev, 128, virtio_blk_handle_output);
+virtio_add_queue(vdev, conf->queue_size, virtio_blk_handle_output);
 }
  virtio_blk_data_plane_create(vdev, conf, &s->dataplane, &err);
 if (err != NULL) {
@@ -1012,6 +1019,7 @@ static Property virtio_blk_properties[] = {
 DEFINE_PROP_BIT("request-merging", VirtIOBlock, conf.request_merging, 0,
 true),
 DEFINE_PROP_UINT16("num-queues", VirtIOBlock, conf.num_queues, 1),
+DEFINE_PROP_UINT16("queue-size", VirtIOBlock, conf.queue_size, 128),
 DEFINE_PROP_LINK("iothread", VirtIOBlock, conf.iothread, TYPE_IOTHREAD,
  IOThread *),
 DEFINE_PROP_END_OF_LIST(),
diff --git a/include/hw/virtio/virtio-blk.h b/include/hw/virtio/virtio-blk.h
index d3c8a6f..5117431 100644
--- a/include/hw/virtio/virtio-blk.h
+++ b/include/hw/virtio/virtio-blk.h
@@ -39,6 +39,7 @@ struct VirtIOBlkConf
 uint32_t config_wce;
 uint32_t request_merging;
 uint16_t num_queues;
+uint16_t queue_size;
 };
 
 struct VirtIOBlockDataPlane;
-- 
1.8.3.1




[Qemu-block] [PATCH v2 2/2] virtio-blk: reject configs with logical block size > physical block size

2017-12-11 Thread Mark Kanda
virtio-blk logical block size should never be larger than physical block
size because it doesn't make sense to have such configurations. QEMU doesn't
have a way to effectively express this condition; the best it can do is
report the physical block exponent as 0 - indicating the logical block size
equals the physical block size.

This is identical to commit 3da023b5827543ee4c022986ea2ad9d1274410b2
but applied to virtio-blk (instead of virtio-scsi).

Signed-off-by: Mark Kanda 
Reviewed-by: Konrad Rzeszutek Wilk 
Reviewed-by: Ameya More 
Reviewed-by: Martin K. Petersen 
Reviewed-by: Stefan Hajnoczi 
---
 hw/block/virtio-blk.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/hw/block/virtio-blk.c b/hw/block/virtio-blk.c
index fb59f57..36c7608 100644
--- a/hw/block/virtio-blk.c
+++ b/hw/block/virtio-blk.c
@@ -952,6 +952,13 @@ static void virtio_blk_device_realize(DeviceState *dev, 
Error **errp)
 }
 blkconf_blocksizes(>conf);
 
+if (conf->conf.logical_block_size >
+conf->conf.physical_block_size) {
+error_setg(errp,
+   "logical_block_size > physical_block_size not supported");
+return;
+}
+
 virtio_init(vdev, "virtio-blk", VIRTIO_ID_BLOCK,
 sizeof(struct virtio_blk_config));
 
-- 
1.8.3.1




Re: [Qemu-block] [PATCH 3/7] scsi: store unmap offset and nb_sectors in request struct

2017-12-11 Thread Alberto Garcia
On Mon 20 Nov 2017 05:51:00 PM CET, Anton Nefedov wrote:
> it allows to report it in the error handler
>
> Signed-off-by: Anton Nefedov 
> Reviewed-by: Vladimir Sementsov-Ogievskiy 

Reviewed-by: Alberto Garcia 

Berto



Re: [Qemu-block] [PATCH v7 for-2.12 05/25] block: Respect backing bs in bdrv_refresh_filename

2017-12-11 Thread Alberto Garcia
On Mon 20 Nov 2017 09:09:44 PM CET, Max Reitz wrote:
> Basically, bdrv_refresh_filename() should respect all children of a
> BlockDriverState. However, generally those children are driver-specific,
> so this function cannot handle the general case. On the other hand,
> there are only few drivers which use other children than @file and
> @backing (that being vmdk, quorum, and blkverify).
>
> Most block drivers only use @file and/or @backing (if they use any
> children at all). Both can be implemented directly in
> bdrv_refresh_filename.
>
> The user overriding the file's filename is already handled, however, the
> user overriding the backing file is not. If this is done, opening the
> BDS with the plain filename of its file will not be correct, so we may
> not set bs->exact_filename in that case.
>
> iotests 051 and 191 contain test cases for overwriting the backing file,
> and so their output changes with this patch applied (which I consider a
> good thing). Note that in the case of 191, the implicitly opened
> (non-overridden) base file is included in the json:{} filename as well
> -- this will be remedied by a later patch.
>
> Signed-off-by: Max Reitz 

Reviewed-by: Alberto Garcia 

Berto



Re: [Qemu-block] [Qemu-devel] [PATCH V3] pci: removed the is_express field since a uniform interface was inserted

2017-12-11 Thread Eduardo Habkost
On Mon, Dec 11, 2017 at 03:11:39PM +0200, Yoni Bettan wrote:
> 
> 
> On 12/07/2017 10:58 PM, Eduardo Habkost wrote:
> > On Tue, Dec 05, 2017 at 07:17:06PM +0200, Yoni Bettan wrote:
> > >  * according to Eduardo Habkost's commit
> > >fd3b02c8896d597dd8b9e053dec579cf0386aee1
> > > 
> > >  * since all PCIEs now implement INTERFACE_PCIE_DEVICE we
> > >don't need this field anymore
> > > 
> > >  * Devices that were only INTERFACE_PCIE_DEVICE (is_express == 1)
> > >or
> > >devices that were only INTERFACE_CONVENTIONAL_PCI_DEVICE 
> > > (is_express == 0)
> > >were not affected by the change
> > > 
> > >The only devices that were affected are those that are hybrid 
> > > and also
> > >had (is_express == 1) - therefore only:
> > >  - hw/vfio/pci.c
> > >  - hw/usb/hcd-xhci.c
> > > 
> > >For both I made sure that QEMU_PCI_CAP_EXPRESS is on
> > > 
> > > Signed-off-by: Yoni Bettan 
> > > ---
> > >   docs/pcie_pci_bridge.txt   | 2 +-
> > >   hw/block/nvme.c| 1 -
> > >   hw/net/e1000e.c| 1 -
> > >   hw/pci-bridge/pcie_pci_bridge.c| 1 -
> > >   hw/pci-bridge/pcie_root_port.c | 1 -
> > >   hw/pci-bridge/xio3130_downstream.c | 1 -
> > >   hw/pci-bridge/xio3130_upstream.c   | 1 -
> > >   hw/pci-host/xilinx-pcie.c  | 1 -
> > >   hw/pci/pci.c   | 4 +++-
> > >   hw/scsi/megasas.c  | 4 
> > >   hw/usb/hcd-xhci.c  | 7 ++-
> > >   hw/vfio/pci.c  | 3 ++-
> > >   include/hw/pci/pci.h   | 3 ---
> > >   13 files changed, 12 insertions(+), 18 deletions(-)
> > > 
> > > diff --git a/docs/pcie_pci_bridge.txt b/docs/pcie_pci_bridge.txt
> > > index 5a4203f97c..ab35ebf3ca 100644
> > > --- a/docs/pcie_pci_bridge.txt
> > > +++ b/docs/pcie_pci_bridge.txt
> > > @@ -110,5 +110,5 @@ To enable device hot-plug into the bridge on Linux 
> > > there're 3 ways:
> > >   Implementation
> > >   ==
> > >   The PCIE-PCI bridge is based on PCI-PCI bridge, but also accumulates 
> > > PCI Express
> > > -features as a PCI Express device (is_express=1).
> > > +features as a PCI Express device.
> > > diff --git a/hw/block/nvme.c b/hw/block/nvme.c
> > > index 441e21ed1f..9325bc0911 100644
> > > --- a/hw/block/nvme.c
> > > +++ b/hw/block/nvme.c
> > > @@ -1087,7 +1087,6 @@ static void nvme_class_init(ObjectClass *oc, void 
> > > *data)
> > >   pc->vendor_id = PCI_VENDOR_ID_INTEL;
> > >   pc->device_id = 0x5845;
> > >   pc->revision = 2;
> > > -pc->is_express = 1;
> > >   set_bit(DEVICE_CATEGORY_STORAGE, dc->categories);
> > >   dc->desc = "Non-Volatile Memory Express";
> > > diff --git a/hw/net/e1000e.c b/hw/net/e1000e.c
> > > index f1af279e8d..c360f0d8c9 100644
> > > --- a/hw/net/e1000e.c
> > > +++ b/hw/net/e1000e.c
> > > @@ -675,7 +675,6 @@ static void e1000e_class_init(ObjectClass *class, 
> > > void *data)
> > >   c->revision = 0;
> > >   c->romfile = "efi-e1000e.rom";
> > >   c->class_id = PCI_CLASS_NETWORK_ETHERNET;
> > > -c->is_express = 1;
> > >   dc->desc = "Intel 82574L GbE Controller";
> > >   dc->reset = e1000e_qdev_reset;
> > > diff --git a/hw/pci-bridge/pcie_pci_bridge.c 
> > > b/hw/pci-bridge/pcie_pci_bridge.c
> > > index a4d827c99d..b7d9ebbec2 100644
> > > --- a/hw/pci-bridge/pcie_pci_bridge.c
> > > +++ b/hw/pci-bridge/pcie_pci_bridge.c
> > > @@ -169,7 +169,6 @@ static void pcie_pci_bridge_class_init(ObjectClass 
> > > *klass, void *data)
> > >   DeviceClass *dc = DEVICE_CLASS(klass);
> > >   HotplugHandlerClass *hc = HOTPLUG_HANDLER_CLASS(klass);
> > > -k->is_express = 1;
> > >   k->is_bridge = 1;
> > >   k->vendor_id = PCI_VENDOR_ID_REDHAT;
> > >   k->device_id = PCI_DEVICE_ID_REDHAT_PCIE_BRIDGE;
> > > diff --git a/hw/pci-bridge/pcie_root_port.c 
> > > b/hw/pci-bridge/pcie_root_port.c
> > > index 9b6e4ce512..45f9e8cd4a 100644
> > > --- a/hw/pci-bridge/pcie_root_port.c
> > > +++ b/hw/pci-bridge/pcie_root_port.c
> > > @@ -145,7 +145,6 @@ static void rp_class_init(ObjectClass *klass, void 
> > > *data)
> > >   DeviceClass *dc = DEVICE_CLASS(klass);
> > >   PCIDeviceClass *k = PCI_DEVICE_CLASS(klass);
> > > -k->is_express = 1;
> > >   k->is_bridge = 1;
> > >   k->config_write = rp_write_config;
> > >   k->realize = rp_realize;
> > > diff --git a/hw/pci-bridge/xio3130_downstream.c 
> > > b/hw/pci-bridge/xio3130_downstream.c
> > > index 1e09d2afb7..613a0d6bb7 100644
> > > --- a/hw/pci-bridge/xio3130_downstream.c
> > > +++ b/hw/pci-bridge/xio3130_downstream.c
> > > @@ -177,7 +177,6 @@ static void xio3130_downstream_class_init(ObjectClass 
> > > *klass, void *data)
> > >   DeviceClass *dc = DEVICE_CLASS(klass);
> > >   PCIDeviceClass *k = PCI_DEVICE_CLASS(klass);
> > > -k->is_express = 1;
> > >   k->is_bridge = 1;
> > 

Re: [Qemu-block] [Qemu-devel] [PATCH V3] pci: removed the is_express field since a uniform interface was inserted

2017-12-11 Thread Yoni Bettan



On 12/07/2017 10:58 PM, Eduardo Habkost wrote:

On Tue, Dec 05, 2017 at 07:17:06PM +0200, Yoni Bettan wrote:

 * according to Eduardo Habkost's commit
   fd3b02c8896d597dd8b9e053dec579cf0386aee1

 * since all PCIEs now implement INTERFACE_PCIE_DEVICE we
   don't need this field anymore

 * Devices that were only INTERFACE_PCIE_DEVICE (is_express == 1)
   or
   devices that were only INTERFACE_CONVENTIONAL_PCI_DEVICE 
(is_express == 0)
were not affected by the change

   The only devices that were affected are those that are hybrid and 
also
   had (is_express == 1) - therefore only:
 - hw/vfio/pci.c
 - hw/usb/hcd-xhci.c

   For both I made sure that QEMU_PCI_CAP_EXPRESS is on

Signed-off-by: Yoni Bettan 
---
  docs/pcie_pci_bridge.txt   | 2 +-
  hw/block/nvme.c| 1 -
  hw/net/e1000e.c| 1 -
  hw/pci-bridge/pcie_pci_bridge.c| 1 -
  hw/pci-bridge/pcie_root_port.c | 1 -
  hw/pci-bridge/xio3130_downstream.c | 1 -
  hw/pci-bridge/xio3130_upstream.c   | 1 -
  hw/pci-host/xilinx-pcie.c  | 1 -
  hw/pci/pci.c   | 4 +++-
  hw/scsi/megasas.c  | 4 
  hw/usb/hcd-xhci.c  | 7 ++-
  hw/vfio/pci.c  | 3 ++-
  include/hw/pci/pci.h   | 3 ---
  13 files changed, 12 insertions(+), 18 deletions(-)

diff --git a/docs/pcie_pci_bridge.txt b/docs/pcie_pci_bridge.txt
index 5a4203f97c..ab35ebf3ca 100644
--- a/docs/pcie_pci_bridge.txt
+++ b/docs/pcie_pci_bridge.txt
@@ -110,5 +110,5 @@ To enable device hot-plug into the bridge on Linux there're 
3 ways:
  Implementation
  ==
  The PCIE-PCI bridge is based on PCI-PCI bridge, but also accumulates PCI 
Express
-features as a PCI Express device (is_express=1).
+features as a PCI Express device.
  
diff --git a/hw/block/nvme.c b/hw/block/nvme.c

index 441e21ed1f..9325bc0911 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -1087,7 +1087,6 @@ static void nvme_class_init(ObjectClass *oc, void *data)
  pc->vendor_id = PCI_VENDOR_ID_INTEL;
  pc->device_id = 0x5845;
  pc->revision = 2;
-pc->is_express = 1;
  
  set_bit(DEVICE_CATEGORY_STORAGE, dc->categories);

  dc->desc = "Non-Volatile Memory Express";
diff --git a/hw/net/e1000e.c b/hw/net/e1000e.c
index f1af279e8d..c360f0d8c9 100644
--- a/hw/net/e1000e.c
+++ b/hw/net/e1000e.c
@@ -675,7 +675,6 @@ static void e1000e_class_init(ObjectClass *class, void 
*data)
  c->revision = 0;
  c->romfile = "efi-e1000e.rom";
  c->class_id = PCI_CLASS_NETWORK_ETHERNET;
-c->is_express = 1;
  
  dc->desc = "Intel 82574L GbE Controller";

  dc->reset = e1000e_qdev_reset;
diff --git a/hw/pci-bridge/pcie_pci_bridge.c b/hw/pci-bridge/pcie_pci_bridge.c
index a4d827c99d..b7d9ebbec2 100644
--- a/hw/pci-bridge/pcie_pci_bridge.c
+++ b/hw/pci-bridge/pcie_pci_bridge.c
@@ -169,7 +169,6 @@ static void pcie_pci_bridge_class_init(ObjectClass *klass, void *data)
  DeviceClass *dc = DEVICE_CLASS(klass);
  HotplugHandlerClass *hc = HOTPLUG_HANDLER_CLASS(klass);
  
-k->is_express = 1;
  k->is_bridge = 1;
  k->vendor_id = PCI_VENDOR_ID_REDHAT;
  k->device_id = PCI_DEVICE_ID_REDHAT_PCIE_BRIDGE;
diff --git a/hw/pci-bridge/pcie_root_port.c b/hw/pci-bridge/pcie_root_port.c
index 9b6e4ce512..45f9e8cd4a 100644
--- a/hw/pci-bridge/pcie_root_port.c
+++ b/hw/pci-bridge/pcie_root_port.c
@@ -145,7 +145,6 @@ static void rp_class_init(ObjectClass *klass, void *data)
  DeviceClass *dc = DEVICE_CLASS(klass);
  PCIDeviceClass *k = PCI_DEVICE_CLASS(klass);
  
-k->is_express = 1;
  k->is_bridge = 1;
  k->config_write = rp_write_config;
  k->realize = rp_realize;
diff --git a/hw/pci-bridge/xio3130_downstream.c b/hw/pci-bridge/xio3130_downstream.c
index 1e09d2afb7..613a0d6bb7 100644
--- a/hw/pci-bridge/xio3130_downstream.c
+++ b/hw/pci-bridge/xio3130_downstream.c
@@ -177,7 +177,6 @@ static void xio3130_downstream_class_init(ObjectClass *klass, void *data)
  DeviceClass *dc = DEVICE_CLASS(klass);
  PCIDeviceClass *k = PCI_DEVICE_CLASS(klass);
  
-k->is_express = 1;
  k->is_bridge = 1;
  k->config_write = xio3130_downstream_write_config;
  k->realize = xio3130_downstream_realize;
diff --git a/hw/pci-bridge/xio3130_upstream.c b/hw/pci-bridge/xio3130_upstream.c
index 227997ce46..d4645bddee 100644
--- a/hw/pci-bridge/xio3130_upstream.c
+++ b/hw/pci-bridge/xio3130_upstream.c
@@ -148,7 +148,6 @@ static void xio3130_upstream_class_init(ObjectClass *klass, void *data)
  DeviceClass *dc = DEVICE_CLASS(klass);
  PCIDeviceClass *k = PCI_DEVICE_CLASS(klass);
  
-k->is_express = 1;
  k->is_bridge = 1;
  k->config_write = xio3130_upstream_write_config;
  k->realize = xio3130_upstream_realize;
diff --git a/hw/pci-host/xilinx-pcie.c b/hw/pci-host/xilinx-pcie.c

Re: [Qemu-block] [Qemu-devel] [PATCH for-2.12 0/4] qmp dirty bitmap API

2017-12-11 Thread Vladimir Sementsov-Ogievskiy

11.12.2017 14:15, Kevin Wolf wrote:

Am 09.12.2017 um 01:57 hat John Snow geschrieben:

Here's an idea of what this API might look like without revealing
explicit merge/split primitives.

A new bitmap property that lets us set retention:

:: block-dirty-bitmap-set-retention bitmap=foo slices=10

Or something similar, where the default property for all bitmaps is
zero -- the current behavior: no copies retained.

By setting it to a non-zero positive integer, the incremental backup
mode will automatically save a disabled copy when possible.

-EMAGIC

Operations that create or delete user-visible objects should be
explicit, not automatic. You're trying to implement management layer
functionality in qemu here, but incomplete enough that the artifacts of
it are still visible externally. (A complete solution within qemu
wouldn't expose low-level concepts such as bitmaps on an external
interface, but you would expose something like checkpoints.)

Usually it's not a good idea to have a design where qemu implements
enough to restrict management tools to whatever use case we had in mind,
but not enough to make the management tool's life substantially easier
(by not having to care about some low-level concepts).


"What happens if we exceed our retention?"

(A) We push the last one out automatically, or
(B) We fail the operation immediately.

A is more convenient, but potentially unsafe if the management tool or
user wasn't aware that was going to happen.
B is more annoying, but definitely more safe as it means we cannot lose
a bitmap accidentally.

Both mean that the management layer has not only to deal with the
deletion of bitmaps as it wants to have them, but also to keep the
retention counter somewhere and predict what qemu is going to do to the
bitmaps and whether any corrective action needs to be taken.

This is making things more complex rather than simpler.


I would argue for B with perhaps a force-cycle=true|false that defaults
to false to let management tools say "Yes, go ahead, remove the old one"
with additionally some return to let us know it happened:

{"return": {
   "dropped-slices": [ {"bitmap0": 0}, ...]
}}

This would introduce some concept of bitmap slices into the mix as ID'd
children of a bitmap. I would propose that these slices are numbered and
monotonically increasing. "bitmap0" as an object starts with no slices,
but every incremental backup creates slice 0, slice 1, slice 2, and so
on. Even after we start deleting some, they stay ordered. These numbers
then stand in for points in time.

The counter can (must?) be reset and all slices forgotten when
performing a full backup while providing a bitmap argument.

"How can a user make use of the slices once they're made?"

Let's consider something like mode=partial in contrast to
mode=incremental, and an example where we have 6 prior slices:
0,1,2,3,4,5, (and, unnamed, the 'active' slice.)

mode=partial bitmap=foo slice=4

This would create a backup from slice 4 to the current time α. This
includes all clusters from 4, 5, and the active bitmap.

I don't think it is meaningful to define any end point that isn't the
current time, so I've omitted that as a possibility.

John, what are you doing here? This adds option after option, and even
an additional slice object, only complicating an easy thing more and more.
I'm not sure if that was your intention, but I feel I'm starting to
understand better how Linus's rants come about.

Let me summarise what this means for the management layer:

* The management layer has to manage bitmaps. They have direct control
   over creation and deletion of bitmaps. So far so good.

* It also has to manage slices in those bitmaps objects; and these
   slices are what contains the actual bitmaps. In order to identify a
   bitmap in qemu, you need:

 a) the node name
 b) the bitmap ID, and
 c) the slice number

   The slice number is assigned by qemu and libvirt has to wait until
   qemu tells it about the slice number of a newly created slice. If
   libvirt doesn't receive the reply to the command that started the
   block job, it needs to be able to query this information from qemu,
   e.g. in query-block-jobs.

* Slices are automatically created when you start a backup job with a
   bitmap. It doesn't matter whether you even intend to do an incremental
   backup against this point in time. qemu knows better.

* In order to delete a slice that you don't need any more, you have to
   create more slices (by doing more backups), but you don't get to
   decide which one is dropped. qemu helpfully just drops the oldest one.
   It doesn't matter if you want to keep an older one so you can do an
   incremental backup for a longer timespan. Don't worry about your
   backup strategy, qemu knows better.

* Of course, just creating a new backup job doesn't mean that removing
   the old slice works, even if you give the respective option. That's
   what the 'dropped-slices' return is for. So once again wait for
   

Re: [Qemu-block] [Qemu-devel] QEMU not honouring bootorder

2017-12-11 Thread Michal Privoznik
On 12/11/2017 12:41 PM, Fam Zheng wrote:
> On Thu, 12/07 13:10, Michal Privoznik wrote:
>> Dear list,
>>
>> I've encountered the following problem. I have two disks:
>>
>>   /var/lib/libvirt/images/fedora.qcow2  (which contains OS)
>>   /dev/sde (iSCSI dummy disk just for testing)
>>
>> Now, when I configure QEMU to start with both of them, QEMU/Seabios
>> tries to boot from /dev/sde which fails obviously. Even setting
>> bootorder does not help. Here's my command line:
>>
>>   qemu-system-x86_64 \
>>   -boot menu=on,strict=on \
>>   -device lsi,id=scsi0,bus=pci.0 \
>>   -drive 
>> file=/var/lib/libvirt/images/fedora.qcow2,format=qcow2,if=none,id=drive-scsi0
>>  \
>>   -device scsi-hd,bus=scsi0.0,drive=drive-scsi0,bootindex=1 \
>>   -drive file=/dev/sde,format=raw,if=none,id=drive-scsi1 \
>>   -device scsi-block,bus=scsi0.0,drive=drive-scsi1,bootindex=2
>>
>> It was found that if 'drive-scsi1' is scsi-hd instead of scsi-block
>> everything works as expected and I can boot my guest successfully.
> 
> Does it help if you add SCSI level ordering with 
> "lun={0,1},channel=0,scsi-id=0"
> for both devices?

Setting the LUN helps. On the other hand, I had to change from the LSI
controller to virtio-scsi, as LSI doesn't support more than one LUN.

Michal
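A hedged sketch of the reworked invocation Michal describes — virtio-scsi in place of LSI, with Fam's explicit channel/scsi-id/lun addressing on both devices. Paths, IDs, and the exact LUN assignment are illustrative, not taken from Michal's final setup:

```shell
qemu-system-x86_64 \
    -boot menu=on,strict=on \
    -device virtio-scsi-pci,id=scsi0,bus=pci.0 \
    -drive file=/var/lib/libvirt/images/fedora.qcow2,format=qcow2,if=none,id=drive-scsi0 \
    -device scsi-hd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0,bootindex=1 \
    -drive file=/dev/sde,format=raw,if=none,id=drive-scsi1 \
    -device scsi-block,bus=scsi0.0,channel=0,scsi-id=0,lun=1,drive=drive-scsi1,bootindex=2
```

With the LUNs pinned this way, SeaBIOS enumerates the devices in a stable order and the bootindex properties take effect as intended.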



Re: [Qemu-block] [Qemu-devel] qemu process crash: Assertion failed: QLIST_EMPTY(>tracked_requests)

2017-12-11 Thread Stefan Hajnoczi
On Thu, Dec 07, 2017 at 10:18:52AM +, Fernando Casas Schössow wrote:
> Hi there,
> 
> 
> Last night, while doing a backup of a guest using the live snapshot mechanism, 
> the qemu process for the guest seems to have crashed.
> 
> The snapshot succeeded, then the backup of the VM disk took place and also 
> succeeded, but the commit to the original disk after the backup seems to have 
> failed.
> 
> The command I use in the script to take the snapshot is:
> 
> 
> virsh snapshot-create-as --domain $VM backup-job.qcow2 --disk-only --atomic 
> --quiesce --no-metadata
> 
> 
> And then to commit back is:
> 
> 
> virsh blockcommit $VM $TARGETDISK --base $DISKFILE --top $SNAPFILE --active 
> --pivot
> 
> 
> In the qemu log for the guest I found the following while the commit back was 
> taking place:
> 
> 
> Assertion failed: QLIST_EMPTY(&s->tracked_requests) 
> (/home/buildozer/aports/main/qemu/src/qemu-2.10.1/block/mirror.c: mirror_run: 
> 884)
> 
> I'm running qemu 2.10.1 with libvirt 3.9.0 and kernel 4.9.65 on Alpine Linux 
> 3.7.
> 
> This is the complete guest info from the logs:
> 
> 
> LC_ALL=C 
> PATH=/bin:/sbin:/bin:/sbin:/usr/bin:/usr/sbin:/usr/bin:/usr/sbin:/usr/local/bin:/usr/local/sbin
>  HOME=/root USER=root QEMU_AUDIO_DRV=spice /usr/bin/qemu-system-x86_64 -name 
> guest=DOCKER01,debug-threads=on -S -object 
> secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-6-DOCKER01/master-key.aes
>  -machine pc-i440fx-2.8,accel=kvm,usb=off,dump-guest-core=off -cpu 
> IvyBridge,ss=on,vmx=on,pcid=on,hypervisor=on,arat=on,tsc_adjust=on,xsaveopt=on
>  -drive 
> file=/usr/share/edk2.git/ovmf-x64/OVMF_CODE-pure-efi.fd,if=pflash,format=raw,unit=0,readonly=on
>  -drive 
> file=/var/lib/libvirt/qemu/nvram/DOCKER01_VARS.fd,if=pflash,format=raw,unit=1 
> -m 2048 -realtime mlock=off -smp 2,sockets=2,cores=1,threads=1 -uuid 
> 4705b146-3b14-4c20-923c-42105d47e7fc -no-user-config -nodefaults -chardev 
> socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-6-DOCKER01/monitor.sock,server,nowait
>  -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc,driftfix=slew 
> -global kvm-pit.lost_tick_policy=delay -no-hpet -no-shutdown -global 
> PIIX4_PM.disable_s3=1 -global PIIX4_PM.disable_s4=1 -boot strict=on -device 
> ich9-usb-ehci1,id=usb,bus=pci.0,addr=0x4.0x7 -device 
> ich9-usb-uhci1,masterbus=usb.0,firstport=0,bus=pci.0,multifunction=on,addr=0x4
>  -device ich9-usb-uhci2,masterbus=usb.0,firstport=2,bus=pci.0,addr=0x4.0x1 
> -device ich9-usb-uhci3,masterbus=usb.0,firstport=4,bus=pci.0,addr=0x4.0x2 
> -device ahci,id=sata0,bus=pci.0,addr=0x9 -device 
> virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x5 -drive 
> file=/storage/storage-ssd-vms/virtual_machines_ssd/docker01.qcow2,format=qcow2,if=none,id=drive-sata0-0-0,cache=none,aio=threads
>  -device ide-hd,bus=sata0.0,drive=drive-sata0-0-0,id=sata0-0-0,bootindex=1 
> -netdev tap,fd=33,id=hostnet0,vhost=on,vhostfd=35 -device 
> virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:1c:af:ce,bus=pci.0,addr=0x3
>  -chardev pty,id=charserial0 -device 
> isa-serial,chardev=charserial0,id=serial0 -chardev 
> socket,id=charchannel0,path=/var/lib/libvirt/qemu/channel/target/domain-6-DOCKER01/org.qemu.guest_agent.0,server,nowait
>  -device 
> virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=org.qemu.guest_agent.0
>  -chardev spicevmc,id=charchannel1,name=vdagent -device 
> virtserialport,bus=virtio-serial0.0,nr=2,chardev=charchannel1,id=channel1,name=com.redhat.spice.0
>  -spice port=5905,addr=127.0.0.1,disable-ticketing,seamless-migration=on 
> -device 
> qxl-vga,id=video0,ram_size=67108864,vram_size=67108864,vram64_size_mb=0,vgamem_mb=16,max_outputs=1,bus=pci.0,addr=0x2
>  -chardev spicevmc,id=charredir0,name=usbredir -device 
> usb-redir,chardev=charredir0,id=redir0,bus=usb.0,port=2 -chardev 
> spicevmc,id=charredir1,name=usbredir -device 
> usb-redir,chardev=charredir1,id=redir1,bus=usb.0,port=3 -device 
> virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x7 -object 
> rng-random,id=objrng0,filename=/dev/random -device 
> virtio-rng-pci,rng=objrng0,id=rng0,bus=pci.0,addr=0x8 -msg timestamp=on
> 
> 
> 
> I was running on qemu 2.8.1 for months and didn't have any problems with the 
> backups but yesterday I updated to qemu 2.10.1 and I hit this problem last 
> night.
> 
> 
> Is this a bug? Any ideas will be appreciated.

Thanks for reporting this bug.  Can you reproduce it reliably?

Stefan




Re: [Qemu-block] [Qemu-devel] [PATCH v3] qemu-img: Document --force-share / -U

2017-12-11 Thread Fam Zheng
On Mon, 12/11 10:50, Kevin Wolf wrote:
> Am 11.12.2017 um 10:33 hat Fam Zheng geschrieben:
> > Signed-off-by: Fam Zheng 
> > 
> > ---
> > 
> > v3: Document that the option is not allowed for read-write. [Stefan]
> > 
> > v2: - "code{qemu-img}". [Kashyap, Eric]
> > - "etc.." -> "etc.".
> > ---
> >  qemu-img.texi | 9 +
> >  1 file changed, 9 insertions(+)
> > 
> > diff --git a/qemu-img.texi b/qemu-img.texi
> > index fdcf120f36..d85ace73f0 100644
> > --- a/qemu-img.texi
> > +++ b/qemu-img.texi
> > @@ -57,6 +57,15 @@ exclusive with the @var{-O} parameters. It is currently 
> > required to also use
> >  the @var{-n} parameter to skip image creation. This restriction may be 
> > relaxed
> >  in a future release.
> >  
> > +@item --force-share (-U)
> > +
> > +If specified, @code{qemu-img} will open the image with shared permissions,
> > +which makes it less likely to conflict with a running guest's permissions 
> > due
> > +to image locking. For example, this can be used to get the image 
> > information
> > +(with 'info' subcommand) when the image is used by a running guest. Note 
> > that
> > +this could produce inconsistent result because of concurrent metadata 
> > changes,
> > +etc. This option is only allowed when opening image in read-only mode.
> 
> I think "an image", "the image" or "images" could all work in this
> context, but just "image" without an article doesn't.

"Images" would be my intent if I proofread it, but you're right, all sound okay
to me. Thanks.

Fam



Re: [Qemu-block] [Qemu-devel] QEMU not honouring bootorder

2017-12-11 Thread Fam Zheng
On Thu, 12/07 13:10, Michal Privoznik wrote:
> Dear list,
> 
> I've encountered the following problem. I have two disks:
> 
>   /var/lib/libvirt/images/fedora.qcow2  (which contains OS)
>   /dev/sde (iSCSI dummy disk just for testing)
> 
> Now, when I configure QEMU to start with both of them, QEMU/Seabios
> tries to boot from /dev/sde which fails obviously. Even setting
> bootorder does not help. Here's my command line:
> 
>   qemu-system-x86_64 \
>   -boot menu=on,strict=on \
>   -device lsi,id=scsi0,bus=pci.0 \
>   -drive 
> file=/var/lib/libvirt/images/fedora.qcow2,format=qcow2,if=none,id=drive-scsi0 
> \
>   -device scsi-hd,bus=scsi0.0,drive=drive-scsi0,bootindex=1 \
>   -drive file=/dev/sde,format=raw,if=none,id=drive-scsi1 \
>   -device scsi-block,bus=scsi0.0,drive=drive-scsi1,bootindex=2
> 
> It was found that if 'drive-scsi1' is scsi-hd instead of scsi-block
> everything works as expected and I can boot my guest successfully.

Does it help if you add SCSI level ordering with "lun={0,1},channel=0,scsi-id=0"
for both devices?

Fam



Re: [Qemu-block] [Qemu-devel] [PATCH for-2.12 0/4] qmp dirty bitmap API

2017-12-11 Thread Kevin Wolf
Am 09.12.2017 um 01:57 hat John Snow geschrieben:
> Here's an idea of what this API might look like without revealing
> explicit merge/split primitives.
> 
> A new bitmap property that lets us set retention:
> 
> :: block-dirty-bitmap-set-retention bitmap=foo slices=10
> 
> Or something similar, where the default property for all bitmaps is
> zero -- the current behavior: no copies retained.
> 
> By setting it to a non-zero positive integer, the incremental backup
> mode will automatically save a disabled copy when possible.

-EMAGIC

Operations that create or delete user-visible objects should be
explicit, not automatic. You're trying to implement management layer
functionality in qemu here, but incomplete enough that the artifacts of
it are still visible externally. (A complete solution within qemu
wouldn't expose low-level concepts such as bitmaps on an external
interface, but you would expose something like checkpoints.)

Usually it's not a good idea to have a design where qemu implements
enough to restrict management tools to whatever use case we had in mind,
but not enough to make the management tool's life substantially easier
(by not having to care about some low-level concepts).

> "What happens if we exceed our retention?"
> 
> (A) We push the last one out automatically, or
> (B) We fail the operation immediately.
> 
> A is more convenient, but potentially unsafe if the management tool or
> user wasn't aware that was going to happen.
> B is more annoying, but definitely more safe as it means we cannot lose
> a bitmap accidentally.

Both mean that the management layer has not only to deal with the
deletion of bitmaps as it wants to have them, but also to keep the
retention counter somewhere and predict what qemu is going to do to the
bitmaps and whether any corrective action needs to be taken.

This is making things more complex rather than simpler.

> I would argue for B with perhaps a force-cycle=true|false that defaults
> to false to let management tools say "Yes, go ahead, remove the old one"
> with additionally some return to let us know it happened:
> 
> {"return": {
>   "dropped-slices": [ {"bitmap0": 0}, ...]
> }}
> 
> This would introduce some concept of bitmap slices into the mix as ID'd
> children of a bitmap. I would propose that these slices are numbered and
> monotonically increasing. "bitmap0" as an object starts with no slices,
> but every incremental backup creates slice 0, slice 1, slice 2, and so
> on. Even after we start deleting some, they stay ordered. These numbers
> then stand in for points in time.
> 
> The counter can (must?) be reset and all slices forgotten when
> performing a full backup while providing a bitmap argument.
> 
> "How can a user make use of the slices once they're made?"
> 
> Let's consider something like mode=partial in contrast to
> mode=incremental, and an example where we have 6 prior slices:
> 0,1,2,3,4,5, (and, unnamed, the 'active' slice.)
> 
> mode=partial bitmap=foo slice=4
> 
> This would create a backup from slice 4 to the current time α. This
> includes all clusters from 4, 5, and the active bitmap.
> 
> I don't think it is meaningful to define any end point that isn't the
> current time, so I've omitted that as a possibility.

John, what are you doing here? This adds option after option, and even
an additional slice object, only complicating an easy thing more and more.
I'm not sure if that was your intention, but I feel I'm starting to
understand better how Linus's rants come about.

Let me summarise what this means for the management layer:

* The management layer has to manage bitmaps. They have direct control
  over creation and deletion of bitmaps. So far so good.

* It also has to manage slices in those bitmaps objects; and these
  slices are what contains the actual bitmaps. In order to identify a
  bitmap in qemu, you need:

a) the node name
b) the bitmap ID, and
c) the slice number

  The slice number is assigned by qemu and libvirt has to wait until
  qemu tells it about the slice number of a newly created slice. If
  libvirt doesn't receive the reply to the command that started the
  block job, it needs to be able to query this information from qemu,
  e.g. in query-block-jobs.

* Slices are automatically created when you start a backup job with a
  bitmap. It doesn't matter whether you even intend to do an incremental
  backup against this point in time. qemu knows better.

* In order to delete a slice that you don't need any more, you have to
  create more slices (by doing more backups), but you don't get to
  decide which one is dropped. qemu helpfully just drops the oldest one.
  It doesn't matter if you want to keep an older one so you can do an
  incremental backup for a longer timespan. Don't worry about your
  backup strategy, qemu knows better.

* Of course, just creating a new backup job doesn't mean that removing
  the old slice works, even if you give the respective option. That's
  what the 'dropped-slices' 

Re: [Qemu-block] [PATCH v2 0/2] virtio-blk: miscellaneous changes

2017-12-11 Thread Stefan Hajnoczi
On Fri, Dec 08, 2017 at 09:57:25AM -0600, Mark Kanda wrote:
> v2: add check for maximum queue size [Stefan]
> 
> This series is for two minor virtio-blk changes. The first patch
> makes the virtio-blk queue size user configurable. The second patch
> rejects logical block size > physical block configurations (similar
> to a recent change in virtio-scsi).
> 
> Mark Kanda (2):
>   virtio-blk: make queue size configurable
>   virtio-blk: reject configs with logical block size > physical block
> size
> 
>  hw/block/virtio-blk.c  | 17 -
>  include/hw/virtio/virtio-blk.h |  1 +
>  2 files changed, 17 insertions(+), 1 deletion(-)

Hi Mark,
Please resend as a top level email thread so the continuous integration
and patch management tools will detect your patch series.

From https://wiki.qemu.org/Contribute/SubmitAPatch:

  "Send each new revision as a new top-level thread, rather than burying
  it in-reply-to an earlier revision, as many reviewers are not looking
  inside deep threads for new patches."

Thanks,
Stefan




Re: [Qemu-block] QEMU not honouring bootorder

2017-12-11 Thread Michal Privoznik
On 12/07/2017 01:10 PM, Michal Privoznik wrote:
> Dear list,
> 

Ping. Is there anything I can help you with to have this sorted out?

Michal




Re: [Qemu-block] [PATCH v3] qemu-img: Document --force-share / -U

2017-12-11 Thread Kevin Wolf
Am 11.12.2017 um 10:33 hat Fam Zheng geschrieben:
> Signed-off-by: Fam Zheng 
> 
> ---
> 
> v3: Document that the option is not allowed for read-write. [Stefan]
> 
> v2: - "code{qemu-img}". [Kashyap, Eric]
> - "etc.." -> "etc.".
> ---
>  qemu-img.texi | 9 +
>  1 file changed, 9 insertions(+)
> 
> diff --git a/qemu-img.texi b/qemu-img.texi
> index fdcf120f36..d85ace73f0 100644
> --- a/qemu-img.texi
> +++ b/qemu-img.texi
> @@ -57,6 +57,15 @@ exclusive with the @var{-O} parameters. It is currently 
> required to also use
>  the @var{-n} parameter to skip image creation. This restriction may be 
> relaxed
>  in a future release.
>  
> +@item --force-share (-U)
> +
> +If specified, @code{qemu-img} will open the image with shared permissions,
> +which makes it less likely to conflict with a running guest's permissions due
> +to image locking. For example, this can be used to get the image information
> +(with 'info' subcommand) when the image is used by a running guest. Note that
> +this could produce inconsistent result because of concurrent metadata 
> changes,
> +etc. This option is only allowed when opening image in read-only mode.

I think "an image", "the image" or "images" could all work in this
context, but just "image" without an article doesn't.

Kevin



[Qemu-block] [PATCH v3] qemu-img: Document --force-share / -U

2017-12-11 Thread Fam Zheng
Signed-off-by: Fam Zheng 

---

v3: Document that the option is not allowed for read-write. [Stefan]

v2: - "code{qemu-img}". [Kashyap, Eric]
- "etc.." -> "etc.".
---
 qemu-img.texi | 9 +
 1 file changed, 9 insertions(+)

diff --git a/qemu-img.texi b/qemu-img.texi
index fdcf120f36..d85ace73f0 100644
--- a/qemu-img.texi
+++ b/qemu-img.texi
@@ -57,6 +57,15 @@ exclusive with the @var{-O} parameters. It is currently 
required to also use
 the @var{-n} parameter to skip image creation. This restriction may be relaxed
 in a future release.
 
+@item --force-share (-U)
+
+If specified, @code{qemu-img} will open the image with shared permissions,
+which makes it less likely to conflict with a running guest's permissions due
+to image locking. For example, this can be used to get the image information
+(with 'info' subcommand) when the image is used by a running guest. Note that
+this could produce inconsistent result because of concurrent metadata changes,
+etc. This option is only allowed when opening image in read-only mode.
+
 @item fmt
 is the disk image format. It is guessed automatically in most cases. See below
 for a description of the supported disk formats.
-- 
2.14.3




Re: [Qemu-block] [Qemu-devel] [PATCH v2] qemu-img: Document --force-share / -U

2017-12-11 Thread Fam Zheng
On Fri, 12/08 09:54, Stefan Hajnoczi wrote:
> On Fri, Dec 08, 2017 at 09:44:56AM +0800, Fam Zheng wrote:
> > +@item --force-share (-U)
> > +
> > +If specified, @code{qemu-img} will open the image with shared permissions,
> > +which makes it less likely to conflict with a running guest's permissions 
> > due
> > +to image locking. For example, this can be used to get the image 
> > information
> > +(with 'info' subcommand) when the image is used by a running guest. Note 
> > that
> > +this could produce inconsistent result because of concurrent metadata 
> > changes,
> > +etc.
> 
> The documentation isn't clear on whether read-write is supported or just
> read-only.

block.c will emit an error if used with read-write. I will add a sentence for
that.

Fam



Re: [Qemu-block] [PATCH v6 09/20] parallels: Switch to .bdrv_co_block_status()

2017-12-11 Thread Vladimir Sementsov-Ogievskiy

09.12.2017 19:39, Eric Blake wrote:

On 12/09/2017 06:31 AM, Vladimir Sementsov-Ogievskiy wrote:

07.12.2017 23:30, Eric Blake wrote:

We are gradually moving away from sector-based interfaces, towards
byte-based.  Update the parallels driver accordingly.  Note that
the internal function block_status() is still sector-based, because
it is still in use by other sector-based functions; but that's okay
because request_alignment is 512 as a result of those functions.
For now, no optimizations are added based on the mapping hint.

Signed-off-by: Eric Blake 

   {
   BDRVParallelsState *s = bs->opaque;
-    int64_t offset;
+    int count;

+    assert(QEMU_IS_ALIGNED(offset | bytes, BDRV_SECTOR_SIZE));
   qemu_co_mutex_lock(&s->lock);
-    offset = block_status(s, sector_num, nb_sectors, pnum);
+    offset = block_status(s, offset >> BDRV_SECTOR_BITS,
+  bytes >> BDRV_SECTOR_BITS, &count);
   qemu_co_mutex_unlock(&s->lock);

+    *pnum = count * BDRV_SECTOR_SIZE;
   if (offset < 0) {
   return 0;
   }

+    *map = offset;

oh, didn't notice previous time :(, should be

*map = offset * BDRV_SECTOR_SIZE


Oh, right.


(also, if you already use ">> BDRV_SECTOR_BITS" in this function,
would not it be more consistent to use "<< BDRV_SECTOR_BITS"
for sector-to-byte conversion?)

with that fixed (with "<<" or "*", up to you),

'>> BDRV_SECTOR_BITS' always works.  You could also write '/
BDRV_SECTOR_SIZE' (the compiler should figure out that you are dividing
by a power of 2, and use the shift instruction under the hood), but I
find that a bit harder to reason about.

On the other hand, '<< BDRV_SECTOR_BITS' only produces the same size
output as the input; if the left side is 32 bits, it risks overflowing.
But '* BDRV_SECTOR_SIZE' always produces a 64-bit value.  So I've
learned (from past mistakes in other byte-conversion series) that the
multiply form is less likely to introduce unintended truncation bugs.


hmm, interesting observation, thank you




Reviewed-by: Vladimir Sementsov-Ogievskiy 




--
Best regards,
Vladimir




Re: [Qemu-block] [Qemu-devel] [PATCH for-2.12 0/4] qmp dirty bitmap API

2017-12-11 Thread Denis V. Lunev
On 12/09/2017 03:57 AM, John Snow wrote:
> This is going to be a long one. Maybe go get a cup of coffee.
>
> On 12/07/2017 04:39 AM, Vladimir Sementsov-Ogievskiy wrote:
>> 07.12.2017 03:38, John Snow wrote:
>>> I'm sorry, I don't think I understand.
>>>
>>> "customers needs a possibility to create a backup of data changed since
>>> some point in time."
>>>
>>> Is that not the existing case for a simple incremental backup? Granted,
>>> the point in time was decided when we created the bitmap or when we made
>>> the last backup, but it is "since some point in time."
>>>
>>> If you mean to say an arbitrary point in time after-the-fact, I don't
>>> see how the API presented here helps enable that functionality.
>>>
>>> (by "arbitrary point in time after-the-fact I mean for example: Say a
>>> user installs a malicious application in a VM on Thursday, but the
>>> bitmap was created on Monday. The user wants to go back to Wednesday
>>> evening, but we have no record of that point in time, so we cannot go
>>> back to it.)
>>>
>>> Can you elaborate on what you're trying to accomplish so I make sure I'm
>>> considering you carefully?
>> Yes, point in time is when we create a dirty bitmap. But we want to
>> maintain several points in time.
>> For example, it may be the last 10 incremental backups.
>> The user wants the ability to create an incremental backup containing
>> changes from a selected point in
>> time to the current moment. It is needed, for example, if a backup was
>> deleted (to save disk space) and now the user
>> wants to recreate it.
>>
>> In current scheme for incremental backup, after successful backup we
>> actually lose previous point in time,
>> and have only the last one.
> Differential backup mode may help a little in this flexibility and costs
> us basically nothing to implement:
>
> We simply always re-merge the bitmaps after creating the backup. So you
> have two options:
>
> (1) Incremental: Replace the existing point-in-time with a new one
> (2) Differential: Keep the existing point-in-time.
>
> I suspect you are wanting something a lot more powerful than this, though.
>
>> With ability to merge bitmap we can do the following:
>>
>> 1. create bitmap1
>> 2. disable bitmap1, do external backup by bitmap1, create bitmap2
>> 2.1 on backup fail: merge bitmap2 to bitmap1, enable bitmap1, delete
>> bitmap2
>> 2.2 on backup success: do nothing
> Okay, so bitmap1 and bitmap2 cover periods of time that are disjoint;
> where you have
>
> time--->
> [bitmap1][bitmap2-->
>
> so you intend to accrue a number of bitmaps representing the last N
> slices of time, with only the most recent bitmap being enabled.
>
> Functionally you intend to permanently fork a bitmap every time a backup
> operation succeeds; so on incremental backup:
>
> (1) We succeed and the forked bitmap we already made gets saved as a new
> disabled bitmap instead of being deleted.
> (2) We fail, and we roll back exactly as we always have.
>
>
> Here's an idea of what this API might look like without revealing
> explicit merge/split primitives.
>
> A new bitmap property that lets us set retention:
>
> :: block-dirty-bitmap-set-retention bitmap=foo slices=10
>
> Or something similar, where the default property for all bitmaps is zero
> -- the current behavior: no copies retained.
>
> By setting it to a non-zero positive integer, the incremental backup
> mode will automatically save a disabled copy when possible.
>
> "What happens if we exceed our retention?"
>
> (A) We push the last one out automatically, or
> (B) We fail the operation immediately.
>
> A is more convenient, but potentially unsafe if the management tool or
> user wasn't aware that was going to happen.
> B is more annoying, but definitely more safe as it means we cannot lose
> a bitmap accidentally.
>
> I would argue for B with perhaps a force-cycle=true|false that defaults
> to false to let management tools say "Yes, go ahead, remove the old one"
> with additionally some return to let us know it happened:
>
> {"return": {
>   "dropped-slices": [ {"bitmap0": 0}, ...]
> }}
>
> This would introduce some concept of bitmap slices into the mix as ID'd
> children of a bitmap. I would propose that these slices are numbered and
> monotonically increasing. "bitmap0" as an object starts with no slices,
> but every incremental backup creates slice 0, slice 1, slice 2, and so
> on. Even after we start deleting some, they stay ordered. These numbers
> then stand in for points in time.
>
> The counter can (must?) be reset and all slices forgotten when
> performing a full backup while providing a bitmap argument.
>
> "How can a user make use of the slices once they're made?"
>
> Let's consider something like mode=partial in contrast to
> mode=incremental, and an example where we have 6 prior slices:
> 0,1,2,3,4,5, (and, unnamed, the 'active' slice.)
>
> mode=partial bitmap=foo slice=4
>
> This would create a backup from slice 4 to the current time α. This
> includes all clusters from 

Re: [Qemu-block] [PATCH v2] qemu-img: Document --force-share / -U

2017-12-11 Thread Kashyap Chamarthy
On Fri, Dec 08, 2017 at 09:44:56AM +0800, Fam Zheng wrote:
> Signed-off-by: Fam Zheng 
> 
> ---
> v2: - "code{qemu-img}". [Kashyap, Eric]
> - "etc.." -> "etc.".
> ---
>  qemu-img.texi | 9 +
>  1 file changed, 9 insertions(+)

Reviewed-by: Kashyap Chamarthy 

[...]

-- 
/kashyap



Re: [Qemu-block] [Qemu-devel] [PATCH 3/3] ide: abort TRIM operation for invalid range

2017-12-11 Thread Anton Nefedov

On 8/12/2017 10:51 PM, John Snow wrote:


Looks about right, just remember that this flow won't call
block_acct_invalid because you're bypassing the return to ide_dma_cb. I
assume you'll get to that in your next series.



Yes; I meant to keep the trim accounting in ide_issue_trim_cb()


For now, this should properly reject bogus TRIM commands. When you send
your next series, may I ask for a simple test case if possible?



Sure, I'll look into it


1-3:
Reviewed-by: John Snow