Re: [Qemu-devel] [RFC] block-trace Low Level Command Supporting Disk Introspection
On Wed, May 22, 2013 at 11:46:15PM +0200, Paolo Bonzini wrote:
> Il 22/05/2013 22:47, Richard W.M. Jones ha scritto:
> > > I meant if there was interest in reading from a disk that isn't fully
> > > synchronized (yet) to the original disk (it might have old blocks).
> > > Or would you only want to connect once a (complete) snapshot is
> > > available (synchronized completely to some point-in-time)?
> >
> > IIUC a disk which wasn't fully synchronized wouldn't necessarily be
> > interpretable by libguestfs, so I guess we would need the complete
> > snapshot.
>
> In the case of point-in-time backups (Stefan's block-backup) the plan is
> to have the snapshot complete from the beginning.

The way it will work is that the drive-backup target is a qcow2 image
with the guest's disk as its backing file. When the guest writes to the
disk, drive-backup copies the original data to the qcow2 image. The
qcow2 image is exported over NBD so a client can connect to access the
read-only point-in-time snapshot.

It is not necessary to populate the qcow2 file since it uses the guest
disk as its backing file - all reads to unpopulated clusters go to the
backing file.

Stefan
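To make the flow Stefan describes concrete, here is a rough sketch of
what the QMP conversation could look like once block-backup lands. All
device names, paths, and ports are made up, the commands were still an
RFC at this point, and how the qcow2 target itself gets named for NBD
export is hand-waved:

```json
{ "execute": "drive-backup",
  "arguments": { "device": "drive0",
                 "target": "/tmp/snap.qcow2",
                 "format": "qcow2",
                 "sync": "none" } }

{ "execute": "nbd-server-start",
  "arguments": { "addr": { "type": "inet",
                           "data": { "host": "127.0.0.1",
                                     "port": "10809" } } } }

{ "execute": "nbd-server-add",
  "arguments": { "device": "drive0", "writable": false } }
```

With "sync": "none", only clusters that are about to be overwritten get
copied into /tmp/snap.qcow2; all other reads fall through the backing
file, which is what makes the export a consistent point-in-time view.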
Re: [Qemu-devel] [RFC] block-trace Low Level Command Supporting Disk Introspection
On Wed, May 15, 2013 at 7:54 AM, Paolo Bonzini <pbonz...@redhat.com> wrote:
> > But does this really cover all use cases a real synchronous active
> > mirror would provide? I understood that Wolf wants to get every single
> > guest request exposed e.g. on an NBD connection.
>
> He can use throttling to limit the guest's I/O speed to the size of the
> asynchronous mirror's buffer.

Throttling is fine for me, and actually what I do today (this is the
highest source of overhead for a system that wants to see everything),
just with the tracing framework.

-- Wolf
Re: [Qemu-devel] [RFC] block-trace Low Level Command Supporting Disk Introspection
On Thu, May 16, 2013 at 9:44 AM, Richard W.M. Jones <rjo...@redhat.com> wrote:
> Ideally I'd like to issue some QMP commands which would set up the
> point-in-time snapshot, and then connect to this snapshot over (eg)
> NBD, then when I'm done, send some more QMP commands to tear down the
> snapshot.

This is actually interesting. Does the QEMU NBD server support multiple
readers?

Essentially, if you're RWMJ (not me), and you're keeping a full mirror,
it's clear that the mirror write stream goes to an NBD server, but is it
possible to attach a reader to that same NBD server and read things back
(read-only)? I know it's possible to name the volumes you attach to, so
I think conceptually with the NBD protocol this should work.

> I think this document would be better with one or more examples showing
> how this would be used.

I think the thread now has me looking at making the mirror command
'active' :-) rather than have a new QMP command.

-- Wolf
Re: [Qemu-devel] [RFC] block-trace Low Level Command Supporting Disk Introspection
Il 22/05/2013 17:51, Wolfgang Richter ha scritto:
> On Thu, May 16, 2013 at 9:44 AM, Richard W.M. Jones <rjo...@redhat.com> wrote:
> > Ideally I'd like to issue some QMP commands which would set up the
> > point-in-time snapshot, and then connect to this snapshot over (eg)
> > NBD, then when I'm done, send some more QMP commands to tear down the
> > snapshot.
>
> This is actually interesting. Does the QEMU NBD server support multiple
> readers?

Yes.

> Essentially, if you're RWMJ (not me), and you're keeping a full mirror,
> it's clear that the mirror write stream goes to an NBD server, but is
> it possible to attach a reader to that same NBD server and read things
> back (read-only)?

Yes, it can be done with both qemu-nbd and the QEMU NBD server commands.

Paolo
Re: [Qemu-devel] [RFC] block-trace Low Level Command Supporting Disk Introspection
On Wed, May 22, 2013 at 12:11 PM, Paolo Bonzini <pbonz...@redhat.com> wrote:
> > Essentially, if you're RWMJ (not me), and you're keeping a full
> > mirror, it's clear that the mirror write stream goes to an NBD server,
> > but is it possible to attach a reader to that same NBD server and read
> > things back (read-only)?
>
> Yes, it can be done with both qemu-nbd and the QEMU NBD server commands.

Then this means, if there was an active mirror (or snapshot being
created), it would be easy to attach an NBD client as a reader to it
even as it is being synchronized (perhaps dangerous?).

-- Wolf
Re: [Qemu-devel] [RFC] block-trace Low Level Command Supporting Disk Introspection
On Wed, May 22, 2013 at 11:51:16AM -0400, Wolfgang Richter wrote:
> This is actually interesting. Does the QEMU NBD server support multiple
> readers?

Yes. qemu-nbd has a -e/--shared=N option which appears to do exactly
what it says in the man page.

  $ guestfish -N fs exit
  $ ls -lh test1.img
  -rw-rw-r--. 1 rjones rjones 100M May 22 17:37 test1.img
  $ qemu-nbd -e 3 -r -t test1.img

From another shell:

  $ guestfish --format=raw -a nbd://localhost

  Welcome to guestfish, the guest filesystem shell for
  editing virtual machine filesystems and disk images.

  Type: 'help' for help on commands
        'man' to read the manual
        'quit' to quit the shell

  ><fs> run
  ><fs> list-filesystems
  /dev/sda1: ext2

Run up to two extra guestfish instances, with the same result. The
fourth guestfish instance hangs at the 'run' command until one of the
first three is told to exit.

Rich.

--
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
virt-top is 'top' for virtual machines. Tiny program with many powerful
monitoring features, net stats, disk stats, logging, etc.
http://people.redhat.com/~rjones/virt-top
Re: [Qemu-devel] [RFC] block-trace Low Level Command Supporting Disk Introspection
On Wed, May 22, 2013 at 12:42 PM, Richard W.M. Jones <rjo...@redhat.com> wrote:
> Run up to two extra guestfish instances, with the same result. The
> fourth guestfish instance hangs at the 'run' command until one of the
> first three is told to exit.

Are you interested in being notified when a snapshot is safe to read
from? Or is it valuable to try reading immediately?

-- Wolf
Re: [Qemu-devel] [RFC] block-trace Low Level Command Supporting Disk Introspection
On Wed, May 22, 2013 at 02:32:37PM -0400, Wolfgang Richter wrote:
> Are you interested in being notified when a snapshot is safe to read
> from? Or is it valuable to try reading immediately?

I'm not sure I understand the question. I assumed (maybe wrongly) that
if we had an NBD address (ie. Unix socket or IP:port) then we'd just
connect to that and go.

Rich.

--
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Re: [Qemu-devel] [RFC] block-trace Low Level Command Supporting Disk Introspection
On Wed, May 22, 2013 at 3:26 PM, Richard W.M. Jones <rjo...@redhat.com> wrote:
> > Are you interested in being notified when a snapshot is safe to read
> > from? Or is it valuable to try reading immediately?
>
> I'm not sure I understand the question. I assumed (maybe wrongly) that
> if we had an NBD address (ie. Unix socket or IP:port) then we'd just
> connect to that and go.

I meant if there was interest in reading from a disk that isn't fully
synchronized (yet) to the original disk (it might have old blocks). Or
would you only want to connect once a (complete) snapshot is available
(synchronized completely to some point-in-time)?

-- Wolf
Re: [Qemu-devel] [RFC] block-trace Low Level Command Supporting Disk Introspection
On Wed, May 22, 2013 at 03:38:33PM -0400, Wolfgang Richter wrote:
> I meant if there was interest in reading from a disk that isn't fully
> synchronized (yet) to the original disk (it might have old blocks). Or
> would you only want to connect once a (complete) snapshot is available
> (synchronized completely to some point-in-time)?

IIUC a disk which wasn't fully synchronized wouldn't necessarily be
interpretable by libguestfs, so I guess we would need the complete
snapshot.

Rich.

--
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Re: [Qemu-devel] [RFC] block-trace Low Level Command Supporting Disk Introspection
Il 22/05/2013 22:47, Richard W.M. Jones ha scritto:
> > I meant if there was interest in reading from a disk that isn't fully
> > synchronized (yet) to the original disk (it might have old blocks). Or
> > would you only want to connect once a (complete) snapshot is available
> > (synchronized completely to some point-in-time)?
>
> IIUC a disk which wasn't fully synchronized wouldn't necessarily be
> interpretable by libguestfs, so I guess we would need the complete
> snapshot.

In the case of point-in-time backups (Stefan's block-backup) the plan is
to have the snapshot complete from the beginning.

Paolo
Re: [Qemu-devel] [RFC] block-trace Low Level Command Supporting Disk Introspection
[...]

From my point of view, what I'm missing here is how I would use it.

Ideally I'd like to issue some QMP commands which would set up the
point-in-time snapshot, and then connect to this snapshot over (eg) NBD,
then when I'm done, send some more QMP commands to tear down the
snapshot.

I think this document would be better with one or more examples showing
how this would be used.

Rich.

--
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Re: [Qemu-devel] [RFC] block-trace Low Level Command Supporting Disk Introspection
Am 14.05.2013 um 18:45 hat Paolo Bonzini geschrieben:
> Il 14/05/2013 17:48, Wolfgang Richter ha scritto:
> > On Tue, May 14, 2013 at 6:04 AM, Paolo Bonzini <pbonz...@redhat.com> wrote:
> > > Il 14/05/2013 10:50, Kevin Wolf ha scritto:
> > > > Or, to translate it into our existing terminology, drive-mirror
> > > > implements a passive mirror, you're proposing an active one (which
> > > > we do want to have). With an active mirror, we'll want to have
> > > > another choice: the mirror can be synchronous (guest writes only
> > > > complete after the mirrored write has completed) or asynchronous
> > > > (completion is based only on the original image). It should be
> > > > easy enough to support both once an active mirror exists.
> > >
> > > Right, I'm waiting for Stefan's block-backup to give me the right
> > > hooks for the active mirror. The bulk phase will always be passive,
> > > but an active-asynchronous mirror has some interesting properties
> > > and it makes sense to implement it.
> >
> > Do you mean you'd model the 'active' mode after 'block-backup', or
> > actually call functions provided by 'block-backup'?
>
> No, I'll just reuse the same hooks within block/mirror.c (almost... it
> looks like I need after_write too, not just before_write :( that's a
> pity).

Makes me wonder if using a real BlockDriver for the filter from the
beginning wouldn't be better than accumulating more and more hooks and
having to find ways to pass data from 'before' to 'after' hooks...

> Basically:
>
> 1) before the write, if there is space in the job's buffers, allocate a
>    MirrorOp and a data buffer for the write. Also record whether the
>    block was dirty before;
>
> 2) after the write, do nothing if there was no room to allocate the
>    data buffer. Else clear the block from the dirty bitmap. If the
>    block was dirty, read the whole cluster from the source as in
>    passive mirroring. If it wasn't, copy the data from guest memory to
>    the preallocated buffer and write it to the destination;

Does the "if there was no room" part mean that the mirror is active only
sometimes?

And why even bother with a dirty bitmap for an active mirror? The
background job that sequentially processes the whole image only needs a
counter, no bitmap.

At which point it looks like implementing it separate from mirror.c
could make more sense.

Kevin
Re: [Qemu-devel] [RFC] block-trace Low Level Command Supporting Disk Introspection
Il 15/05/2013 09:59, Kevin Wolf ha scritto:
> > > Do you mean you'd model the 'active' mode after 'block-backup', or
> > > actually call functions provided by 'block-backup'?
> >
> > No, I'll just reuse the same hooks within block/mirror.c (almost... it
> > looks like I need after_write too, not just before_write :( that's a
> > pity).
>
> Makes me wonder if using a real BlockDriver for the filter from the
> beginning wouldn't be better than accumulating more and more hooks and
> having to find ways to pass data from 'before' to 'after' hooks...

We don't need a way to pass data from before to after hooks, a simple
scan of a linked list will do.

> > Basically:
> >
> > 1) before the write, if there is space in the job's buffers, allocate
> >    a MirrorOp and a data buffer for the write. Also record whether the
> >    block was dirty before;
> >
> > 2) after the write, do nothing if there was no room to allocate the
> >    data buffer. Else clear the block from the dirty bitmap. If the
> >    block was dirty, read the whole cluster from the source as in
> >    passive mirroring. If it wasn't, copy the data from guest memory to
> >    the preallocated buffer and write it to the destination;
>
> Does the "if there was no room" part mean that the mirror is active only
> sometimes?

Yes, otherwise the guest can allocate arbitrary amounts of memory in the
host just by starting a few very large I/O operations.

> And why even bother with a dirty bitmap for an active mirror? The
> background job that sequentially processes the whole image only needs a
> counter, no bitmap.

That's not enough for the case when the host crashes and you have to
restart the mirroring or complete it offline.

Paolo

> At which point it looks like implementing it separate from mirror.c
> could make more sense.
>
> Kevin
Re: [Qemu-devel] [RFC] block-trace Low Level Command Supporting Disk Introspection
> > We don't need a way to pass data from before to after hooks, a simple
> > scan of a linked list will do.
>
> So in this case the linked list is the way. Point taken. :)
>
> > > Does the "if there was no room" part mean that the mirror is active
> > > only sometimes?
> >
> > Yes, otherwise the guest can allocate arbitrary amounts of memory in
> > the host just by starting a few very large I/O operations.
>
> I think I would rather throttle I/O in this case, i.e. requests wait
> until they can get the space. At least for a synchronous mirror we have
> to do something like this.

Yes, but this is still asynchronous. The active part is just an
optimization to avoid write amplification (where small random writes
require I/O of an entire block as big as the bitmap granularity).

> > > And why even bother with a dirty bitmap for an active mirror? The
> > > background job that sequentially processes the whole image only
> > > needs a counter, no bitmap.
> >
> > That's not enough for the case when the host crashes and you have to
> > restart the mirroring or complete it offline.
>
> You're thinking of a persistent bitmap here? Makes sense then, I didn't
> think about that.

Yes.

Paolo
Re: [Qemu-devel] [RFC] block-trace Low Level Command Supporting Disk Introspection
Il 15/05/2013 11:46, Kevin Wolf ha scritto:
> Am 15.05.2013 um 11:16 hat Paolo Bonzini geschrieben:
> > > Does the "if there was no room" part mean that the mirror is active
> > > only sometimes?
> >
> > Yes, otherwise the guest can allocate arbitrary amounts of memory in
> > the host just by starting a few very large I/O operations.
>
> On second thought, can't you do zero copy anyway for full cluster
> writes? This means that at most two clusters per request must be
> allocated, no matter how large it is, and you can probably reuse the
> same one-cluster buffer for both.

Only for a synchronous mirror. For an asynchronous mirror, there's no
guarantee that the mirror finishes writing before the source. When that
fails, the guest can touch the memory and the mirror diverges from the
source.

> > > I think I would rather throttle I/O in this case, i.e. requests wait
> > > until they can get the space. At least for a synchronous mirror we
> > > have to do something like this.
> >
> > Yes, but this is still asynchronous. The active part is just an
> > optimization to avoid write amplification (where small random writes
> > require I/O of an entire block as big as the bitmap granularity).
>
> Yes, that sounds like a good use case. But does this really cover all
> use cases a real synchronous active mirror would provide? I understood
> that Wolf wants to get every single guest request exposed e.g. on an
> NBD connection.

He can use throttling to limit the guest's I/O speed to the size of the
asynchronous mirror's buffer.

Paolo
Re: [Qemu-devel] [RFC] block-trace Low Level Command Supporting Disk Introspection
On Mon, May 13, 2013 at 05:21:54PM -0400, Wolfgang Richter wrote:
> I'm working on a new patch series which will add a new QMP command,
> block-trace, which turns on tracing of writes for a specified block
> device and sends the stream unmodified to another block device.
>
> The 'trace' is meant to be precise, meaning that writes are not lost,
> which differentiates this command from others. It can be turned on and
> off depending on when it is needed.
>
> How is this different from block-backup or drive-mirror?
>
> block-backup is designed to create point-in-time snapshots, not to
> clone the entire write stream of a VM to a particular device. It
> implements copy-on-write to create a snapshot. Thus, whenever a write
> occurs, block-backup is designed to send the original data and not the
> contents of the new write.
>
> drive-mirror is designed to mirror a disk to another location. It
> operates by periodically scanning a dirty bitmap and cloning blocks
> when dirtied. This is efficient as it allows for batching of writes,
> but it does not maintain the order in which guest writes occurred, and
> it can miss intermediate writes when they go to the same location on
> disk.
>
> How can block-trace be used?
>
> (1) Disk introspection - systems which analyze the writes going to a
>     disk for introspection require a perfect clone of the write stream
>     to an original disk to stay in sync with updates to guest file
>     systems.
>
> (2) Replicated block device - two block devices could be maintained as
>     exact copies of each other up to a point in the disk write stream
>     that has successfully been written to the destination block device.

CCed Benoit Canet, who implemented the quorum block driver to mirror I/O
to multiple images and verify data integrity.

QEMU is accumulating many different approaches to snapshots and
mirroring. They all have their pros and cons, so it's not possible to
support only one approach for all use cases.

The suggested approach is writing a BlockDriver which mirrors I/O to two
BlockDriverStates. There has been discussion around breaking BlockDriver
into smaller interfaces, including a BlockFilter for intercepting I/O,
but this has not been implemented. blkverify is an example of a
BlockDriver that manages two child BlockDriverStates and may be a good
starting point.
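The "BlockDriver with two child BlockDriverStates" idea Stefan suggests
can be caricatured outside QEMU like this. This is a Python stand-in for
the blkverify-style structure, not QEMU's C API; every name here is
invented for illustration:

```python
# Toy model of a blkverify-style driver: one node owning two children,
# duplicating every guest write to both. The interesting property for
# block-trace is that the mirror child sees the full, ordered write
# stream, unlike a bitmap-based passive mirror.
class TeeDriver:
    def __init__(self, primary: bytearray, mirror: bytearray):
        self.primary = primary   # stand-in for the "real" child BDS
        self.mirror = mirror     # stand-in for the trace/mirror child

    def write(self, offset: int, data: bytes) -> None:
        # Every write goes to both children, in guest order.
        self.primary[offset:offset + len(data)] = data
        self.mirror[offset:offset + len(data)] = data

    def read(self, offset: int, length: int) -> bytes:
        # Reads are served by the primary child only.
        return bytes(self.primary[offset:offset + length])


primary = bytearray(8)
mirror = bytearray(8)
tee = TeeDriver(primary, mirror)
tee.write(2, b"abc")
```

After the write, both children hold identical data, which is the
invariant a replicated-block-device user would rely on.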
Re: [Qemu-devel] [RFC] block-trace Low Level Command Supporting Disk Introspection
Am 13.05.2013 um 23:21 hat Wolfgang Richter geschrieben:
> I'm working on a new patch series which will add a new QMP command,
> block-trace, which turns on tracing of writes for a specified block
> device and sends the stream unmodified to another block device.
>
> The 'trace' is meant to be precise, meaning that writes are not lost,
> which differentiates this command from others. It can be turned on and
> off depending on when it is needed.
>
> [...]
>
> drive-mirror is designed to mirror a disk to another location. It
> operates by periodically scanning a dirty bitmap and cloning blocks
> when dirtied. This is efficient as it allows for batching of writes,
> but it does not maintain the order in which guest writes occurred and
> it can miss intermediate writes when they go to the same location on
> disk.

Or, to translate it into our existing terminology, drive-mirror
implements a passive mirror; you're proposing an active one (which we do
want to have).

With an active mirror, we'll want to have another choice: the mirror can
be synchronous (guest writes only complete after the mirrored write has
completed) or asynchronous (completion is based only on the original
image). It should be easy enough to support both once an active mirror
exists.

> How can block-trace be used?
>
> (1) Disk introspection - systems which analyze the writes going to a
>     disk for introspection require a perfect clone of the write stream
>     to an original disk to stay in sync with updates to guest file
>     systems.
>
> (2) Replicated block device - two block devices could be maintained as
>     exact copies of each other up to a point in the disk write stream
>     that has successfully been written to the destination block device.

You're leaving out the most interesting section: how should block-trace
be implemented?

The first question is what the API should look like, on the QMP level. I
think originally the idea was to use drive-mirror for all kinds of
mirrors, but maybe it makes more sense indeed to keep the active mirror
separate. I don't particularly like the name block-trace for a separate
command, but let's save the bikeshedding for later.

The other question is how to implement it internally. I don't think
adding specific code for each new block job into bdrv_co_do_writev() is
acceptable. We really need a generic way to intercept I/O operations.
The keyword from earlier discussions is "block filters". Essentially the
idea is that the block job temporarily adds a BlockDriverState on top of
the format driver and becomes able to implement all the callbacks it
likes to intercept. The bad news is that the infrastructure isn't there
yet to actually make this happen in a sane way.

Kevin
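The block-filter idea Kevin describes (a BlockDriverState temporarily
inserted on top of the format driver to intercept callbacks) can be
sketched as follows. This is pure illustration in Python, under the
assumption of a simple forward-everything interface; none of these names
exist in QEMU:

```python
class FormatDriver:
    """Stand-in for the format driver at the bottom of the stack."""
    def __init__(self):
        self.data = bytearray(64)

    def write(self, offset, data):
        self.data[offset:offset + len(data)] = data


class TraceFilter:
    """A filter node: intercepts every write (here, by logging it),
    then forwards it unchanged to its child node."""
    def __init__(self, child):
        self.child = child
        self.log = []

    def write(self, offset, data):
        self.log.append((offset, bytes(data)))  # intercept the request
        self.child.write(offset, data)          # pass it down the stack


fmt = FormatDriver()
top = TraceFilter(fmt)   # the block job inserts the filter on top
top.write(8, b"hi")      # guest I/O now flows through the filter
```

Tearing the filter down is just routing I/O back to fmt directly; the
hard part in QEMU, as Kevin notes, is doing that insertion and removal
safely on a live BlockDriverState graph.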
Re: [Qemu-devel] [RFC] block-trace Low Level Command Supporting Disk Introspection
Il 14/05/2013 10:50, Kevin Wolf ha scritto:
> Or, to translate it into our existing terminology, drive-mirror
> implements a passive mirror; you're proposing an active one (which we
> do want to have).
>
> With an active mirror, we'll want to have another choice: the mirror
> can be synchronous (guest writes only complete after the mirrored write
> has completed) or asynchronous (completion is based only on the
> original image). It should be easy enough to support both once an
> active mirror exists.

Right, I'm waiting for Stefan's block-backup to give me the right hooks
for the active mirror. The bulk phase will always be passive, but an
active-asynchronous mirror has some interesting properties and it makes
sense to implement it.

Paolo
Re: [Qemu-devel] [RFC] block-trace Low Level Command Supporting Disk Introspection
On Tue, May 14, 2013 at 4:40 AM, Stefan Hajnoczi <stefa...@redhat.com> wrote:
> QEMU is accumulating many different approaches to snapshots and
> mirroring. They all have their pros and cons, so it's not possible to
> support only one approach for all use cases.
>
> The suggested approach is writing a BlockDriver which mirrors I/O to
> two BlockDriverStates. There has been discussion around breaking
> BlockDriver into smaller interfaces, including a BlockFilter for
> intercepting I/O, but this has not been implemented. blkverify is an
> example of a BlockDriver that manages two child BlockDriverStates and
> may be a good starting point.

BlockFilter sounds interesting. The main reason I proposed 'block-trace'
is because that is almost identical to what I currently have implemented
with the tracing framework - I just didn't have a nice QMP command.

-- Wolf
Re: [Qemu-devel] [RFC] block-trace Low Level Command Supporting Disk Introspection
On Tue, May 14, 2013 at 4:50 AM, Kevin Wolf <kw...@redhat.com> wrote:
> Or, to translate it into our existing terminology, drive-mirror
> implements a passive mirror; you're proposing an active one (which we
> do want to have).
>
> With an active mirror, we'll want to have another choice: the mirror
> can be synchronous (guest writes only complete after the mirrored
> write has completed) or asynchronous (completion is based only on the
> original image). It should be easy enough to support both once an
> active mirror exists.

Yes! Active mirroring is precisely what is needed to implement
block-level introspection.

> You're leaving out the most interesting section: how should block-trace
> be implemented?

Noted, although folding it into 'drive-mirror' as an 'active' option
might be best, now that Paolo has spoken up.

> The other question is how to implement it internally. I don't think
> adding specific code for each new block job into bdrv_co_do_writev() is
> acceptable. We really need a generic way to intercept I/O operations.
> The keyword from earlier discussions is "block filters". Essentially
> the idea is that the block job temporarily adds a BlockDriverState on
> top of the format driver and becomes able to implement all the
> callbacks it likes to intercept. The bad news is that the
> infrastructure isn't there yet to actually make this happen in a sane
> way.

Yeah, I'd also really love block filters, and probably would have used
them instead of the tracing subsystem originally if they had existed.
They would make implementing all kinds of block-level features much,
much easier.

-- Wolf
Re: [Qemu-devel] [RFC] block-trace Low Level Command Supporting Disk Introspection
On Tue, May 14, 2013 at 6:04 AM, Paolo Bonzini <pbonz...@redhat.com> wrote:
> Il 14/05/2013 10:50, Kevin Wolf ha scritto:
> > Or, to translate it into our existing terminology, drive-mirror
> > implements a passive mirror, you're proposing an active one (which we
> > do want to have). [...]
>
> Right, I'm waiting for Stefan's block-backup to give me the right hooks
> for the active mirror. The bulk phase will always be passive, but an
> active-asynchronous mirror has some interesting properties and it makes
> sense to implement it.

Do you mean you'd model the 'active' mode after 'block-backup', or
actually call functions provided by 'block-backup'?

If I knew more about what you had in mind, I wouldn't mind trying to add
this 'active' mode to 'drive-mirror' and test it with my use case. I
want to avoid duplicate work, so if you want to implement it yourself I
can defer this.

-- Wolf
Re: [Qemu-devel] [RFC] block-trace Low Level Command Supporting Disk Introspection
Il 14/05/2013 17:48, Wolfgang Richter ha scritto:
> On Tue, May 14, 2013 at 6:04 AM, Paolo Bonzini <pbonz...@redhat.com> wrote:
> > Right, I'm waiting for Stefan's block-backup to give me the right
> > hooks for the active mirror. The bulk phase will always be passive,
> > but an active-asynchronous mirror has some interesting properties and
> > it makes sense to implement it.
>
> Do you mean you'd model the 'active' mode after 'block-backup', or
> actually call functions provided by 'block-backup'?

No, I'll just reuse the same hooks within block/mirror.c (almost... it
looks like I need after_write too, not just before_write :( that's a
pity). Basically:

1) before the write, if there is space in the job's buffers, allocate a
   MirrorOp and a data buffer for the write. Also record whether the
   block was dirty before;

2) after the write, do nothing if there was no room to allocate the data
   buffer. Else clear the block from the dirty bitmap. If the block was
   dirty, read the whole cluster from the source as in passive
   mirroring. If it wasn't, copy the data from guest memory to the
   preallocated buffer and write it to the destination.

> If I knew more about what you had in mind, I wouldn't mind trying to
> add this 'active' mode to 'drive-mirror' and test it with my use case.
> I want to avoid duplicate work, so if you want to implement it yourself
> I can defer this.

Also the other way round. If you want to give it a shot based on the
above spec, just tell me. It should require no changes to block.c except
for adding after_write.

Paolo
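Paolo's two-step spec can be simulated outside QEMU to see how the
pieces interact. This is a Python sketch, not the block/mirror.c
implementation; MirrorJob, before_write, after_write, and the dict-based
"disk" are all stand-ins, and the buffer accounting is deliberately
simplified:

```python
class MirrorJob:
    """Simulates the before_write/after_write hooks of the proposed
    active-asynchronous mirror, with a bounded copy-buffer pool."""

    def __init__(self, buf_limit):
        self.buf_limit = buf_limit  # cap on in-flight copy buffers (bytes)
        self.buf_used = 0
        self.dirty = set()          # dirty bitmap, keyed by block number
        self.target = {}            # mirror destination: block -> bytes
        self.pending = {}           # request id -> (block, buffer, was_dirty)
        self.next_id = 0

    def before_write(self, block, data):
        """Step 1: if there is buffer space, preallocate a data buffer
        and record whether the block was dirty before the write."""
        req = self.next_id
        self.next_id += 1
        if self.buf_used + len(data) <= self.buf_limit:
            self.buf_used += len(data)
            self.pending[req] = (block, bytearray(len(data)), block in self.dirty)
        else:
            # No room: stay passive for this write, just mark it dirty
            # so the background job picks it up later.
            self.pending[req] = (block, None, None)
            self.dirty.add(block)
        return req

    def after_write(self, req, data, source):
        """Step 2: if a buffer was allocated, clear the dirty bit; a
        previously dirty block is re-read whole from the source, a clean
        one is copied straight from the guest data."""
        block, buf, was_dirty = self.pending.pop(req)
        if buf is None:
            return  # passive path: nothing more to do here
        self.dirty.discard(block)
        if was_dirty:
            self.target[block] = source[block]  # re-read whole cluster
        else:
            buf[:] = data                       # use preallocated buffer
            self.target[block] = bytes(buf)
        self.buf_used -= len(buf)


# Active path: buffer available, write is mirrored immediately.
job = MirrorJob(buf_limit=8)
source = {0: b"AAAA"}
req = job.before_write(0, b"NEW1")
source[0] = b"NEW1"              # the guest write lands on the source
job.after_write(req, b"NEW1", source)

# Passive fallback: no buffer space, only the dirty bit is set.
small = MirrorJob(buf_limit=2)
src2 = {0: b"XXXX"}
r = small.before_write(0, b"XXXX")
small.after_write(r, b"XXXX", src2)
```

This also makes Kevin's "active only sometimes" observation tangible:
whether a given write is mirrored actively depends entirely on whether
buffer space was free in step 1.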
Re: [Qemu-devel] [RFC] block-trace Low Level Command Supporting Disk Introspection
On Tue, May 14, 2013 at 12:45 PM, Paolo Bonzini <pbonz...@redhat.com> wrote:
> No, I'll just reuse the same hooks within block/mirror.c (almost... it
> looks like I need after_write too, not just before_write :( that's a
> pity). Basically:
>
> 1) before the write, if there is space in the job's buffers, allocate a
>    MirrorOp and a data buffer for the write. Also record whether the
>    block was dirty before;
>
> 2) after the write, do nothing if there was no room to allocate the
>    data buffer. Else clear the block from the dirty bitmap. If the
>    block was dirty, read the whole cluster from the source as in
>    passive mirroring. If it wasn't, copy the data from guest memory to
>    the preallocated buffer and write it to the destination.
>
> Also the other way round. If you want to give it a shot based on the
> above spec, just tell me.

Talked with my group here as well. I think I'd like to give it a shot
based on the above spec rather than refactor my code into a new command.
This way it will hopefully reduce duplicated effort, and provide extra
testing for the active mirroring code.

I'll take a pass through the mirror code to make sure I understand it
better than I currently do. Would you like to coordinate off-list until
we have a patch?

-- Wolf