Re: [Qemu-block] [PATCH v3 05/10] block: add transactional callbacks feature

2015-04-23 Thread Max Reitz

On 23.04.2015 02:04, John Snow wrote:

The goal here is to add a new method to transactions that allows
developers to specify a callback that will get invoked only once
all jobs spawned by a transaction are completed, allowing developers
the chance to perform actions conditionally pending complete success,
partial failure, or complete failure.

In order to register the new callback to be invoked, a user must request
a callback pointer and closure by calling new_action_cb_wrapper, which
creates a wrapper around an opaque pointer and callback that would have
originally been passed to e.g. backup_start().

The function will return a function pointer and a new opaque pointer to
be passed instead. The transaction system will effectively intercept the
original callbacks and perform book-keeping on the transaction after it
has delivered the original enveloped callback.

This means that Transaction Action callback methods will be called after
all callbacks triggered by all Actions in the Transactional group have
been received.

This feature has no knowledge of any jobs spawned by Actions that do not
inform the system via new_action_cb_wrapper().

For an example of how to use the feature, please skip ahead to:
'block: drive_backup transaction callback support' which serves as an example
for how to hook up a post-transaction callback to the Drive Backup action.


Note 1: Defining a callback method alone is not sufficient to have the new
 method invoked. You must call new_action_cb_wrapper() AND ensure the
 callback it returns is the one used as the callback for the job
 launched by the action.

Note 2: You can use this feature for any system that registers completions of
 an asynchronous task via a callback of the form
 (void *opaque, int ret), not just block job callbacks.

Signed-off-by: John Snow js...@redhat.com
---
  blockdev.c | 183 +++--
  1 file changed, 179 insertions(+), 4 deletions(-)


Reviewed-by: Max Reitz mre...@redhat.com



Re: [Qemu-block] [PATCH v3 08/10] qmp: Add an implementation wrapper for qmp_drive_backup

2015-04-23 Thread Max Reitz

On 23.04.2015 02:04, John Snow wrote:

We'd like to be able to specify the callback given to backup_start
manually in the case of transactions, so split apart qmp_drive_backup
into an implementation and a wrapper.

Switch drive_backup_prepare to use the new wrapper, but don't overload
the callback and closure yet.

Signed-off-by: John Snow js...@redhat.com
---
  blockdev.c | 78 +++---
  1 file changed, 59 insertions(+), 19 deletions(-)


Reviewed-by: Max Reitz mre...@redhat.com



Re: [Qemu-block] [PATCH v3 10/10] iotests: 124 - transactional failure test

2015-04-23 Thread Max Reitz

On 23.04.2015 02:04, John Snow wrote:

Use a transaction to request an incremental backup across two drives.
Coerce one of the jobs to fail, and then re-run the transaction.

Verify that no bitmap data was lost due to the partial transaction
failure.

Signed-off-by: John Snow js...@redhat.com
---
  tests/qemu-iotests/124 | 120 -
  tests/qemu-iotests/124.out |   4 +-
  2 files changed, 121 insertions(+), 3 deletions(-)


Reviewed-by: Max Reitz mre...@redhat.com



Re: [Qemu-block] [PATCH v3 09/10] block: drive_backup transaction callback support

2015-04-23 Thread Max Reitz

On 23.04.2015 02:04, John Snow wrote:

This patch actually implements the transactional callback system
for the drive_backup action.

(1) We manually pick up a reference to the bitmap if present to allow
 its cleanup to be delayed until after all drive_backup jobs launched
 by the transaction have fully completed.

(2) We create a functional closure that envelops the original drive_backup
 callback, to be able to intercept the completion status and return code
 for the job.

(3) We add the drive_backup_cb method for the drive_backup action, which
 unpacks the completion information and invokes the final cleanup.

(4) backup_transaction_complete will perform the final cleanup on the
 backup job.

(5) In the case of transaction cancellation, drive_backup_cb is still
 responsible for cleaning up the mess we may have already made.

Signed-off-by: John Snow js...@redhat.com
---
  block/backup.c|  9 
  blockdev.c| 53 ---
  include/block/block_int.h |  8 +++
  3 files changed, 67 insertions(+), 3 deletions(-)


Reviewed-by: Max Reitz mre...@redhat.com



Re: [Qemu-block] [PATCH v3 02/10] iotests: add transactional incremental backup test

2015-04-23 Thread Max Reitz

On 23.04.2015 02:04, John Snow wrote:

Test simple usage cases for using transactions to create
and synchronize incremental backups.

Signed-off-by: John Snow js...@redhat.com
---
  tests/qemu-iotests/124 | 54 ++
  tests/qemu-iotests/124.out |  4 ++--
  2 files changed, 56 insertions(+), 2 deletions(-)


Reviewed-by: Max Reitz mre...@redhat.com



Re: [Qemu-block] [PATCH v6 00/21] block: transactionless incremental backup series

2015-04-23 Thread John Snow



On 04/23/2015 09:19 AM, Stefan Hajnoczi wrote:

On Fri, Apr 17, 2015 at 07:49:48PM -0400, John Snow wrote:

===
v6:
===

01: s/underlaying/underlying/
 Removed a reference to 'disabled' bitmaps.
 Touching up inconsistent list indentation.
 Added FreeBSD Documentation License, primarily to be difficult


Please stick to the currently used set of licenses in the future, unless
you have a strong reason.  It's not a good use of anyone's time to fuss
with licenses when we have enough of them in the codebase already.

In my non-lawyer opinion the license you chose seems okay but I'd rather
avoid the risk and hassle.

Thanks,
Stefan



I know I said primarily to be difficult but I was just being 
facetious. I didn't find the GPL2+ to be suitable for documentation, 
strictly, so I went to read up on the documentation licenses that the 
fsf support/recommend.


There's the GNU documentation license, but I found that unsuitable for a 
couple reasons -- one of them was that you are forbidden(!) from 
changing the text of the license, and there are some provisions in there 
I didn't like, such as requiring the full text of the license to be 
included with compiled copies of the document. That's not something I 
care about -- a reference in source, for instance, is sufficient, or a 
copy of the license being distributed *with* the compiled source is 
fine, but I have no need for the full license to be copied with the 
compiled version.


The other documentation license the fsf recommends is the FreeBSD one, 
and that one looked appealing, short, and to the point, so it is the one 
I chose. It is essentially the FreeBSD license with words altered to 
clarify what code and source means with respect to a document.


Sorry for /actually/ being difficult; but Eric Blake was urging me to 
select a license instead of relying on the implicit GPL, so I did go out 
of my way to choose one I found appropriate.


I stand by my pick.

--js



Re: [Qemu-block] [PATCH v6 20/21] iotests: add incremental backup failure recovery test

2015-04-23 Thread Stefan Hajnoczi
On Fri, Apr 17, 2015 at 07:50:08PM -0400, John Snow wrote:
 Test the failure case for incremental backups.
 
 Signed-off-by: John Snow js...@redhat.com
 Reviewed-by: Max Reitz mre...@redhat.com
 ---
  tests/qemu-iotests/124 | 57 
 ++
  tests/qemu-iotests/124.out |  4 ++--
  2 files changed, 59 insertions(+), 2 deletions(-)

Reviewed-by: Stefan Hajnoczi stefa...@redhat.com


pgpEXpOJWS7ep.pgp
Description: PGP signature


Re: [Qemu-block] [PATCH v6 21/21] iotests: add incremental backup granularity tests

2015-04-23 Thread Stefan Hajnoczi
On Fri, Apr 17, 2015 at 07:50:09PM -0400, John Snow wrote:
 Test what happens if you fiddle with the granularity.
 
 Reviewed-by: Max Reitz mre...@redhat.com
 Signed-off-by: John Snow js...@redhat.com
 ---
  tests/qemu-iotests/124 | 58 
 +-
  tests/qemu-iotests/124.out |  4 ++--
  2 files changed, 49 insertions(+), 13 deletions(-)

Reviewed-by: Stefan Hajnoczi stefa...@redhat.com


pgpvKb06jALns.pgp
Description: PGP signature


[Qemu-block] [Question] the problem about driver-mirror

2015-04-23 Thread 张敏
I test the function of driver-mirror when run io stress test in vm.
The fio's model is 128 iodepth, 4K block-size, 100% write, 100% randon.
This is a continuous test, and then I start drive-mirror to copy data
to other image file. I find this job can't finish.
I analysis the problem and find the dirty bitmap is always set, the job
can't converge.
Any ideas to solve this problem?
Thanks




Re: [Qemu-block] [Qemu-devel] [PATCH v6 00/21] block: transactionless incremental backup series

2015-04-23 Thread John Snow



On 04/23/2015 03:18 PM, Eric Blake wrote:

On 04/23/2015 08:41 AM, John Snow wrote:


I know I said primarily to be difficult but I was just being
facetious. I didn't find the GPL2+ to be suitable for documentation,
strictly, so I went to read up on the documentation licenses that the
fsf support/recommend.

There's the GNU documentation license, but I found that unsuitable for a
couple reasons -- one of them was that you are forbidden(!) from
changing the text of the license,


Note that it is usually only the license text proper that is locked
down; the rest of the documentation is not under the same restriction
unless you declare specific invariant sections such as a cover page. But
I know that the Debian project has typically frowned upon any use of FDL
with invariant sections, and the FDL has therefore earned a somewhat
questionable reputation outside of FSF projects.



Understood; however the GNU FDL specifies within the license where and 
how the GNU FDL must be displayed. I didn't like these requirements, and 
might've used the FDL, but you are prohibited from altering the license, 
so I chose against this license.


It's too restrictive for me.


and there are some provisions in there
I didn't like, such as requiring the full text of the license to be
included with compiled copies of the document. That's not something I
care about -- a reference in source, for instance, is sufficient, or a
copy of the license being distributed *with* the compiled source is
fine, but I have no need for the full license to be copied with the
compiled version.


Yes, I like those benefits of the FreeBSD Documentation License.



The other documentation license the fsf recommends is the FreeBSD one,
and that one looked appealing, short, and to the point, so it is the one
I chose. It is essentially the FreeBSD license with words altered to
clarify what code and source means with respect to a document.


In particular, according to the FSF, the FreeBSD Documentation License
_should be_ acceptable for use with a GPLv2 program:

https://www.gnu.org/philosophy/license-list.html#FreeDocumentationLicenses

although this is probably not the right list to get a definitive answer
from a lawyer familiar with the various copyright licenses and laws.



Sorry for /actually/ being difficult; but Eric Blake was urging me to
select a license instead of relying on the implicit GPL, so I did go out
of my way to choose one I found appropriate.

I stand by my pick.


I also agree with the pick; I think that GPLv2+ on documentation is a
bit questionable - if someone else implements the same interface using
just the documentation, is their code required to be under the GPL by
virtue of using the documentation?  Using a more permissive
documentation license feels nicer to me, as it would allow non-GPL
implementations if someone is so inclined.  Sorry if encouraging the
issue has made matters more difficult.



It's too late! You've opened Pandora's Box!



Re: [Qemu-block] [PATCH COLO v3 01/14] docs: block replication's description

2015-04-23 Thread Wen Congyang
On 04/23/2015 05:26 PM, Paolo Bonzini wrote:
 
 
 On 23/04/2015 11:00, Kevin Wolf wrote:
 Because it may be the right design.

 If you're really worried about the test matrix, put a check in the
 filter block driver that its bs-file is qcow2. Of course, such an
 artificial restriction looks a bit ugly, but using a bad design just
 in order to get the same restriction is even worse.

 Stefan originally wanted to put image streaming in the QED driver. I
 think we'll agree today that it was right to reject that. It's simply
 not functionality related to the format. Adding replication logic to
 qcow2 looks similar to me in that respect.
 
 Yes, I can't deny it is similar.  Still, there is a very important
 difference: limiting colo's internal workings to qcow2 or NBD doesn't
 limit what the user can do (while streaming limited the user to image
 files in QED format).
 
 It may also depend on how the patches look like and how much the colo
 code relies on other internal state.
 
 For NBD the answer is almost nothing, and you don't even need a filter
 driver.  You only need to separate sharply the configure and open
 phases.  So it may indeed be possible to generalize the handling of the
 secondary to non-NBD.
 
 It may be the same for the primary; I admit I haven't even tried to read
 the qcow2 patch, as I couldn't do a meaningful review.

For qcow2, we need to read/write from NBD target directly after failover,
because the cache image(the format is qcow2) may be put in ramfs to get
better performance. The other thing is not changed.

For qcow2, if we use a filter driver, the bs-file-drv should support
backing file, and make_empty. So it can be the other format.

Thanks
Wen Congyang

 
 Paolo
 .
 




Re: [Qemu-block] [PATCH COLO v3 01/14] docs: block replication's description

2015-04-23 Thread Paolo Bonzini


On 23/04/2015 11:00, Kevin Wolf wrote:
 Because it may be the right design.
 
 If you're really worried about the test matrix, put a check in the
 filter block driver that its bs-file is qcow2. Of course, such an
 artificial restriction looks a bit ugly, but using a bad design just
 in order to get the same restriction is even worse.
 
 Stefan originally wanted to put image streaming in the QED driver. I
 think we'll agree today that it was right to reject that. It's simply
 not functionality related to the format. Adding replication logic to
 qcow2 looks similar to me in that respect.

Yes, I can't deny it is similar.  Still, there is a very important
difference: limiting colo's internal workings to qcow2 or NBD doesn't
limit what the user can do (while streaming limited the user to image
files in QED format).

It may also depend on how the patches look like and how much the colo
code relies on other internal state.

For NBD the answer is almost nothing, and you don't even need a filter
driver.  You only need to separate sharply the configure and open
phases.  So it may indeed be possible to generalize the handling of the
secondary to non-NBD.

It may be the same for the primary; I admit I haven't even tried to read
the qcow2 patch, as I couldn't do a meaningful review.

Paolo



Re: [Qemu-block] [PATCH COLO v3 01/14] docs: block replication's description

2015-04-23 Thread Stefan Hajnoczi
On Wed, Apr 22, 2015 at 05:28:01PM +0800, Wen Congyang wrote:
 On 04/22/2015 05:18 PM, Stefan Hajnoczi wrote:
  On Tue, Apr 21, 2015 at 05:28:01PM +0200, Paolo Bonzini wrote:
  On 21/04/2015 03:25, Wen Congyang wrote:
  Please do not introduce name+colo block drivers.  This approach is
  invasive and makes block replication specific to only a few block
  drivers, e.g. NBD or qcow2.
  NBD is used to connect to secondary qemu, so it must be used. But the 
  primary
  qemu uses quorum, so the primary disk can be any format.
  The secondary disk is nbd target, and it can also be any format. The cache
  disk(active disk/hidden disk) is an empty disk, and it is created before 
  run
  COLO. The cache disk format is qcow2 now. In theory, it can be ant format 
  which
  supports backing file. But the driver should be updated to support colo 
  mode.
 
  A cleaner approach is a QMP command or -drive options that work for any
  BlockDriverState.
 
  OK, I will add a new drive option to avoid use name+colo.
 
  Actually I liked the foo+colo names.
 
  These are just internal details of the implementations and the
  primary/secondary disks actually can be any format.
 
  Stefan, what was your worry with the +colo block drivers?
  
  Why does NBD need to know about COLO?  It should be possible to use
  iSCSI or other protocols too.
 
 Hmm, if you want to use iSCSI or other protocols, you should update the driver
 to implement block replication's control interface.
 
 Currently, we only support nbd now.

I took a quick look at the NBD patches in this series, it looks like
they are a hacky way to make quorum dynamically reconfigurable.

In other words, what you really need is a way to enable/disable a quorum
child or even add/remove children at run-time.

NBD is not the right place to implement that.  Add APIs to quorum so
COLO code can use them.

Or maybe I'm misinterpreting the patches, I only took a quick look...

Stefan


pgplmUuucC3yz.pgp
Description: PGP signature


Re: [Qemu-block] [PATCH COLO v3 01/14] docs: block replication's description

2015-04-23 Thread Paolo Bonzini


On 23/04/2015 12:17, Kevin Wolf wrote:
  Perhaps quorum is not a great match after all, and it's better to add a
  new colo driver similar to quorum but simpler and only using the read
  policy that you need for colo.  The new driver would also know how to
  use BDRV_O_NO_CONNECT.  In any case the amount of work needed would not
  be too big.

 I thought the same, but haven't looked at the details yet. But if I
 understand correctly, the plan is to take quorum and add options to turn
 off the functionality of using a quorum - that's a bit odd.

Yes, indeed.  Quorum was okay for experimenting, now it's better to cp
quorum.c colo.c and clean up the code instead of adding options to
quorum.  There's not going to be more duplication between quorum.c and
colo.c than, say, between colo.c and blkverify.c.

 What I think is really needed here is essentially an active mirror
 filter.

Yes, an active synchronous mirror.  It can be either a filter or a
device.  Has anyone ever come up with a design for filters?  Colo
doesn't need much more complexity than a toy blkverify filter.

Paolo



Re: [Qemu-block] [PATCH COLO v3 01/14] docs: block replication's description

2015-04-23 Thread Paolo Bonzini


On 23/04/2015 12:40, Kevin Wolf wrote:
 The question that is still open for me is whether it would be a colo.c
 or an active-mirror.c, i.e. if this would be tied specifically to COLO
 or if it could be kept generic enough that it could be used for other
 use cases as well.

Understood (now).

 What I think is really needed here is essentially an active mirror
 filter.

 Yes, an active synchronous mirror.  It can be either a filter or a
 device.  Has anyone ever come up with a design for filters?  Colo
 doesn't need much more complexity than a toy blkverify filter.
 
 I think what we're doing now for quorum/blkverify/blkdebug is okay.
 
 The tricky and yet unsolved part is how to add/remove filter BDSes at
 runtime (dynamic reconfiguration), but IIUC that isn't needed here.

Yes, it is.  The defer connection to NBD when replication is started
is effectively add the COLO filter (with the NBD connection as a
children) when replication is started.

Similarly close the NBD device when replication is stopped is
effectively remove the COLO filter (which brings the NBD connection
down with it).

Paolo



Re: [Qemu-block] [PATCH COLO v3 01/14] docs: block replication's description

2015-04-23 Thread Wen Congyang
On 04/23/2015 05:55 PM, Stefan Hajnoczi wrote:
 On Wed, Apr 22, 2015 at 05:28:01PM +0800, Wen Congyang wrote:
 On 04/22/2015 05:18 PM, Stefan Hajnoczi wrote:
 On Tue, Apr 21, 2015 at 05:28:01PM +0200, Paolo Bonzini wrote:
 On 21/04/2015 03:25, Wen Congyang wrote:
 Please do not introduce name+colo block drivers.  This approach is
 invasive and makes block replication specific to only a few block
 drivers, e.g. NBD or qcow2.
 NBD is used to connect to secondary qemu, so it must be used. But the 
 primary
 qemu uses quorum, so the primary disk can be any format.
 The secondary disk is nbd target, and it can also be any format. The cache
 disk(active disk/hidden disk) is an empty disk, and it is created before 
 run
 COLO. The cache disk format is qcow2 now. In theory, it can be ant format 
 which
 supports backing file. But the driver should be updated to support colo 
 mode.

 A cleaner approach is a QMP command or -drive options that work for any
 BlockDriverState.

 OK, I will add a new drive option to avoid use name+colo.

 Actually I liked the foo+colo names.

 These are just internal details of the implementations and the
 primary/secondary disks actually can be any format.

 Stefan, what was your worry with the +colo block drivers?

 Why does NBD need to know about COLO?  It should be possible to use
 iSCSI or other protocols too.

 Hmm, if you want to use iSCSI or other protocols, you should update the 
 driver
 to implement block replication's control interface.

 Currently, we only support nbd now.
 
 I took a quick look at the NBD patches in this series, it looks like
 they are a hacky way to make quorum dynamically reconfigurable.
 
 In other words, what you really need is a way to enable/disable a quorum
 child or even add/remove children at run-time.
 
 NBD is not the right place to implement that.  Add APIs to quorum so
 COLO code can use them.
 
 Or maybe I'm misinterpreting the patches, I only took a quick look...

Hmm, if we can enable/disable or add/remove a child at run-time, it is another
choice.

Thanks
Wen Congyang

 
 Stefan
 




Re: [Qemu-block] [PATCH] qcow2: do lazy allocation of the L2 cache

2015-04-23 Thread Stefan Hajnoczi
On Wed, Apr 22, 2015 at 04:37:15PM +0200, Alberto Garcia wrote:
 On Wed 22 Apr 2015 12:26:02 PM CEST, Stefan Hajnoczi wrote:
 
  Large disk images need large L2 caches in order to maximize their I/O
  performance. However setting a correct size for the cache is not
  necessarily easy since apart from the image size, it also depends on
  other factors like its usage patterns or whether it's part of a
  backing chain.
  
  In order to be able to set a very large cache size to cover the
  worst-case scenarios and yet prevent an unnecessary waste of memory,
  this patch modifies the qcow2 cache algorithm so the memory for each
  entry is allocated only when it's actually needed.
  
  This also improves the scenarios with smaller images: the current
  default L2 cache size can map a whole 8GB disk, so those smaller than
  that are allocating cache memory that can never be used.
 
  What measurable improvement does this patch make?
 
  Buffers allocated upfront are not touched, so they don't actually
  consume physical memory.
 
 For a cache size of 128MB, the PSS is actually ~10MB larger without the
 patch, which seems to come from posix_memalign().

Do you mean RSS or are you using a tool that reports a PSS number that
I don't know about?

We should understand what is going on instead of moving the code around
to hide/delay the problem.

Stefan


pgpihllvMsUoj.pgp
Description: PGP signature


Re: [Qemu-block] [PATCH COLO v3 01/14] docs: block replication's description

2015-04-23 Thread Wen Congyang
On 04/23/2015 05:00 PM, Kevin Wolf wrote:
 Am 22.04.2015 um 12:12 hat Paolo Bonzini geschrieben:
 On 22/04/2015 11:31, Kevin Wolf wrote:
 Actually I liked the foo+colo names.

 These are just internal details of the implementations and the
 primary/secondary disks actually can be any format.

 Stefan, what was your worry with the +colo block drivers?

 I haven't read the patches yet, so I may be misunderstanding, but
 wouldn't a separate filter driver be more appropriate than modifying
 qcow2 with logic that has nothing to do with the image format?

 Possibly; on the other hand, why multiply the size of the test matrix
 with options that no one will use and that will bitrot?
 
 Because it may be the right design.
 
 If you're really worried about the test matrix, put a check in the
 filter block driver that its bs-file is qcow2. Of course, such an
 artificial restriction looks a bit ugly, but using a bad design just
 in order to get the same restriction is even worse.

The bs-file-driver should support backing file, and use backing reference
already.

What about the primary side? We should control when to connect to NBD server,
not in nbd_open().

Thanks
Wen Congyang

 
 Stefan originally wanted to put image streaming in the QED driver. I
 think we'll agree today that it was right to reject that. It's simply
 not functionality related to the format. Adding replication logic to
 qcow2 looks similar to me in that respect.
 
 Kevin
 .
 




Re: [Qemu-block] [PATCH COLO v3 01/14] docs: block replication's description

2015-04-23 Thread Paolo Bonzini


On 23/04/2015 11:14, Wen Congyang wrote:
 The bs-file-driver should support backing file, and use backing reference
 already.
 
 What about the primary side? We should control when to connect to NBD server,
 not in nbd_open().

My naive suggestion could be to add a BDRV_O_NO_CONNECT option to
bdrv_open and a separate bdrv_connect callback.  Open would fail if
BDRV_O_NO_CONNECT is specified and drv-bdrv_connect is NULL.

You would then need a way to have quorum pass BDRV_O_NO_CONNECT.

Perhaps quorum is not a great match after all, and it's better to add a
new colo driver similar to quorum but simpler and only using the read
policy that you need for colo.  The new driver would also know how to
use BDRV_O_NO_CONNECT.  In any case the amount of work needed would not
be too big.

Paolo



Re: [Qemu-block] [PATCH COLO v3 01/14] docs: block replication's description

2015-04-23 Thread Dr. David Alan Gilbert
* Paolo Bonzini (pbonz...@redhat.com) wrote:
 
 
 On 23/04/2015 14:05, Dr. David Alan Gilbert wrote:
  As presented at the moment, I don't see there's any dynamic reconfiguration
  on the primary side at the moment
 
 So that means the bdrv_start_replication and bdrv_stop_replication
 callbacks are more or less redundant, at least on the primary?
 
 In fact, who calls them?  Certainly nothing in this patch set...
 :)

In the main colo set (I'm looking at the February version) there
are calls to them, the 'stop_replication' is called at failover time.

Here is I think the later version:
http://lists.nongnu.org/archive/html/qemu-devel/2015-03/msg05391.html

Dave

 
 Paolo
 
  - it starts up in the configuration with
  the quorum(disk, NBD), and that's the way it stays throughout the 
  fault-tolerant
  setup; the primary doesn't start running until the secondary is connected.
  
  Similarly the secondary startups in the configuration and stays that way;
  the interesting question to me is what happens after a failure.
  
  If the secondary fails, then your primary is still quorum(disk, NBD) but
  the NBD side is dead - so I don't think you need to do anything there
  immediately.
  
  If the primary fails, and the secondary takes over, then a lot of the
  stuff on the secondary now becomes redundent; does that stay the same
  and just operate in some form of passthrough - or does it need to
  change configuration?
  
  The hard part to me is how to bring it back into fault-tolerance now;
  after a primary failure, the secondary now needs to morph into something
  like a primary, and somehow you need to bring up a new secondary
  and get that new secondary an image of the primaries current disk.
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK



Re: [Qemu-block] [PATCH COLO v3 01/14] docs: block replication's description

2015-04-23 Thread Paolo Bonzini


On 23/04/2015 14:19, Dr. David Alan Gilbert wrote:
  So that means the bdrv_start_replication and bdrv_stop_replication
  callbacks are more or less redundant, at least on the primary?
  
  In fact, who calls them?  Certainly nothing in this patch set...
  :)
 In the main colo set (I'm looking at the February version) there
 are calls to them, the 'stop_replication' is called at failover time.
 
 Here is I think the later version:
 http://lists.nongnu.org/archive/html/qemu-devel/2015-03/msg05391.html

I think the primary shouldn't do any I/O after failover (and the
secondary should close the NBD server) so it is probably okay to ignore
the removal for now.  Inserting the filter dynamically is probably
needed though.

Paolo



Re: [Qemu-block] [PATCH v6 00/21] block: transactionless incremental backup series

2015-04-23 Thread Stefan Hajnoczi
On Fri, Apr 17, 2015 at 07:49:48PM -0400, John Snow wrote:
 ===
 v6:
 ===
 
 01: s/underlaying/underlying/
 Removed a reference to 'disabled' bitmaps.
 Touching up inconsistent list indentation.
 Added FreeBSD Documentation License, primarily to be difficult

Please stick to the currently used set of licenses in the future, unless
you have a strong reason.  It's not a good use of anyone's time to fuss
with licenses when we have enough of them in the codebase already.

In my non-lawyer opinion the license you chose seems okay but I'd rather
avoid the risk and hassle.

Thanks,
Stefan


pgpPyTMJK68Iw.pgp
Description: PGP signature


Re: [Qemu-block] [PATCH v6 06/21] hbitmap: cache array lengths

2015-04-23 Thread Stefan Hajnoczi
On Fri, Apr 17, 2015 at 07:49:54PM -0400, John Snow wrote:
 As a convenience: between incremental backups, bitmap migrations
 and bitmap persistence we seem to need to recalculate these a lot.
 
 Because the lengths are a little bit-twiddly, let's just solidly
 cache them and be done with it.
 
 Reviewed-by: Max Reitz mre...@redhat.com
 Reviewed-by: Eric Blake ebl...@redhat.com
 Signed-off-by: John Snow js...@redhat.com
 ---
  util/hbitmap.c | 4 
  1 file changed, 4 insertions(+)

Reviewed-by: Stefan Hajnoczi stefa...@redhat.com


pgpJFTFui6cyd.pgp
Description: PGP signature


Re: [Qemu-block] [PATCH v6 15/21] block: Resize bitmaps on bdrv_truncate

2015-04-23 Thread Stefan Hajnoczi
On Fri, Apr 17, 2015 at 07:50:03PM -0400, John Snow wrote:
 Signed-off-by: John Snow js...@redhat.com
 ---
  block.c| 18 ++
  include/qemu/hbitmap.h | 10 ++
  util/hbitmap.c | 48 
  3 files changed, 76 insertions(+)

Reviewed-by: Stefan Hajnoczi stefa...@redhat.com


pgpqDBe_ud8NM.pgp
Description: PGP signature


Re: [Qemu-block] [PATCH COLO v3 01/14] docs: block replication's description

2015-04-23 Thread Paolo Bonzini


On 23/04/2015 13:36, Kevin Wolf wrote:
 Crap. Then we need to figure out dynamic reconfiguration for filters
 (CCed Markus and Jeff).
 
 And this is really part of the fundamental operation mode and not just a
 way to give users a way to change their mind at runtime? Because if it
 were, we could go forward without that for the start and add dynamic
 reconfiguration in a second step.

I honestly don't know.  Wen, David?

Paolo

 Anyway, even if we move it to a second step, it looks like we need to
 design something rather soon now.



Re: [Qemu-block] [PATCH COLO v3 01/14] docs: block replication's description

2015-04-23 Thread Dr. David Alan Gilbert
* Paolo Bonzini (pbonz...@redhat.com) wrote:
 
 
 On 23/04/2015 13:36, Kevin Wolf wrote:
  Crap. Then we need to figure out dynamic reconfiguration for filters
  (CCed Markus and Jeff).
  
  And this is really part of the fundamental operation mode and not just a
  way to give users a way to change their mind at runtime? Because if it
  were, we could go forward without that for the start and add dynamic
  reconfiguration in a second step.
 
 I honestly don't know.  Wen, David?

As presented at the moment, I don't see there's any dynamic reconfiguration
on the primary side at the moment - it starts up in the configuration with
the quorum(disk, NBD), and that's the way it stays throughout the fault-tolerant
setup; the primary doesn't start running until the secondary is connected.

Similarly the secondary startups in the configuration and stays that way;
the interesting question to me is what happens after a failure.

If the secondary fails, then your primary is still quorum(disk, NBD) but
the NBD side is dead - so I don't think you need to do anything there
immediately.

If the primary fails, and the secondary takes over, then a lot of the
stuff on the secondary now becomes redundent; does that stay the same
and just operate in some form of passthrough - or does it need to
change configuration?

The hard part to me is how to bring it back into fault-tolerance now;
after a primary failure, the secondary now needs to morph into something
like a primary, and somehow you need to bring up a new secondary
and get that new secondary an image of the primaries current disk.

Dave

 Paolo
 
  Anyway, even if we move it to a second step, it looks like we need to
  design something rather soon now.
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK



Re: [Qemu-block] [PATCH COLO v3 01/14] docs: block replication's description

2015-04-23 Thread Wen Congyang
On 04/23/2015 06:44 PM, Paolo Bonzini wrote:
 
 
 On 23/04/2015 12:40, Kevin Wolf wrote:
 The question that is still open for me is whether it would be a colo.c
 or an active-mirror.c, i.e. if this would be tied specifically to COLO
 or if it could be kept generic enough that it could be used for other
 use cases as well.
 
 Understood (now).
 
 What I think is really needed here is essentially an active mirror
 filter.

 Yes, an active synchronous mirror.  It can be either a filter or a
 device.  Has anyone ever come up with a design for filters?  Colo
 doesn't need much more complexity than a toy blkverify filter.

 I think what we're doing now for quorum/blkverify/blkdebug is okay.

 The tricky and yet unsolved part is how to add/remove filter BDSes at
 runtime (dynamic reconfiguration), but IIUC that isn't needed here.
 
 Yes, it is.  The defer connection to NBD when replication is started
 is effectively add the COLO filter (with the NBD connection as a
 children) when replication is started.
 
 Similarly close the NBD device when replication is stopped is
 effectively remove the COLO filter (which brings the NBD connection
 down with it).

Hmm, I don't understand it clearly. Do you mean:
1. COLO filter is quorum's child
2. We can add/remove quorum's child at run-time.

If I misunderstand something, please correct me.

Thanks
Wen Congyang

 
 Paolo
 .
 




Re: [Qemu-block] [PATCH COLO v3 01/14] docs: block replication's description

2015-04-23 Thread Kevin Wolf
Am 23.04.2015 um 12:44 hat Paolo Bonzini geschrieben:
 On 23/04/2015 12:40, Kevin Wolf wrote:
  The question that is still open for me is whether it would be a colo.c
  or an active-mirror.c, i.e. if this would be tied specifically to COLO
  or if it could be kept generic enough that it could be used for other
  use cases as well.
 
 Understood (now).
 
  What I think is really needed here is essentially an active mirror
  filter.
 
  Yes, an active synchronous mirror.  It can be either a filter or a
  device.  Has anyone ever come up with a design for filters?  Colo
  doesn't need much more complexity than a toy blkverify filter.
  
  I think what we're doing now for quorum/blkverify/blkdebug is okay.
  
  The tricky and yet unsolved part is how to add/remove filter BDSes at
  runtime (dynamic reconfiguration), but IIUC that isn't needed here.
 
 Yes, it is.  The defer connection to NBD when replication is started
 is effectively add the COLO filter (with the NBD connection as a
 children) when replication is started.
 
 Similarly close the NBD device when replication is stopped is
 effectively remove the COLO filter (which brings the NBD connection
 down with it).

Crap. Then we need to figure out dynamic reconfiguration for filters
(CCed Markus and Jeff).

And this is really part of the fundamental operation mode and not just a
way to give users a way to change their mind at runtime? Because if it
were, we could go forward without that for the start and add dynamic
reconfiguration in a second step.

Anyway, even if we move it to a second step, it looks like we need to
design something rather soon now.

Kevin



Re: [Qemu-block] [PATCH for-2.4 V2 0/9] various improvements for the iSCSI driver

2015-04-23 Thread Stefan Hajnoczi
On Thu, Apr 16, 2015 at 04:08:24PM +0200, Peter Lieven wrote:
 v1-v2: - removed the requirement for libiscsi 1.10.0 [Paolo]
 - reworked to force_next_flush logic [Paolo]
 
 Peter Lieven (9):
   block/iscsi: do not forget to logout from target
   block/iscsi: change all iscsilun properties from uint8_t to bool
   block/iscsi: rename iscsi_write_protected and let it return void
   block/iscsi: store DPOFUA bit from the modesense command
   block/iscsi: optimize WRITE10/16 if cache.writeback is not set
   block/iscsi: increase retry count
   block/iscsi: handle SCSI_STATUS_TASK_SET_FULL
   block/iscsi: bump year in copyright notice
   block/iscsi: use the allocationmap also if cache.direct=on
 
  block/iscsi.c | 64 
 ---
  1 file changed, 44 insertions(+), 20 deletions(-)
 
 -- 
 1.9.1
 
 

Thanks, applied to my block-next tree:
https://github.com/stefanha/qemu/commits/block-next

Stefan


pgpXUlR2Tvxqi.pgp
Description: PGP signature


Re: [Qemu-block] [PATCH COLO v3 01/14] docs: block replication's description

2015-04-23 Thread Wen Congyang
On 04/24/2015 10:01 AM, Fam Zheng wrote:
 On Thu, 04/23 14:23, Paolo Bonzini wrote:


 On 23/04/2015 14:19, Dr. David Alan Gilbert wrote:
 So that means the bdrv_start_replication and bdrv_stop_replication
 callbacks are more or less redundant, at least on the primary?

 In fact, who calls them?  Certainly nothing in this patch set...
 :)
 In the main colo set (I'm looking at the February version) there
 are calls to them, the 'stop_replication' is called at failover time.

 Here is I think the later version:
 http://lists.nongnu.org/archive/html/qemu-devel/2015-03/msg05391.html

 I think the primary shouldn't do any I/O after failover (and the
 secondary should close the NBD server) so it is probably okay to ignore
 the removal for now.  Inserting the filter dynamically is probably
 needed though.
 
 Or maybe just enabling/disabling?

Hmm, after failover, the secondary qemu should become primary qemu, but we don't
know the nbd server's IP/port when we execute the secondary qemu. So we need
to inserting nbd client dynamically after failover.

Thanks
Wen Congyang

 
 Fam
 .