Re: [Qemu-devel] RFC: moving fsfreeze support from the userland guest agent to the guest kernel

2011-08-08 Thread Luiz Capitulino
On Sun, 07 Aug 2011 21:28:17 +0300
Ronen Hod r...@redhat.com wrote:

 Well, we want to support Microsoft's VSS, and that requires a guest 
 agent that communicates with all the writers (applications), waiting 
 for them to flush their app data in order to generate a consistent 
 app-level snapshot. The VSS platform does most of the work.
 Still, the bottom line is that the agent's role is only to find the right 
 moment in time. This moment can be relayed back to libvirt, which can then 
 proceed according to your suggestion, so that the guest agent does not do 
 the freeze and is actually not a mandatory component.

I think this discussion has reached the point where patches will speak
louder than words.



Re: [Qemu-devel] RFC: moving fsfreeze support from the userland guest agent to the guest kernel

2011-08-07 Thread Ronen Hod
Well, we want to support Microsoft's VSS, and that requires a guest 
agent that communicates with all the writers (applications), waiting 
for them to flush their app data in order to generate a consistent 
app-level snapshot. The VSS platform does most of the work.
Still, the bottom line is that the agent's role is only to find the right 
moment in time. This moment can be relayed back to libvirt, which can then 
proceed according to your suggestion, so that the guest agent does not do 
the freeze and is actually not a mandatory component.


Ronen.



Re: [Qemu-devel] RFC: moving fsfreeze support from the userland guest agent to the guest kernel

2011-07-28 Thread Andrea Arcangeli
On Thu, Jul 28, 2011 at 11:53:50AM +0900, Fernando Luis Vázquez Cao wrote:
 On Wed, 2011-07-27 at 17:24 +0200, Andrea Arcangeli wrote:
  making
  sure no lib is calling any I/O function to be able to defreeze the
  filesystems later, making sure the oom killer or a wrong kill -9
  $RANDOM isn't killing the agent by mistake while the I/O is blocked
  and the copy is going.
 
 Yes, with the current API, if the agent is killed while the filesystems
 are frozen, we are screwed.
 
 I have just submitted patches that implement a new API that should make
 the virtualization use case more reliable. Basically, I am adding a new
 ioctl, FIGETFREEZEFD, which freezes the indicated filesystem and returns
 a file descriptor; as long as that file descriptor is held open, the
 filesystem remains frozen. If the freeze file descriptor is closed (be it
 through an explicit call to close(2) or as part of process exit
 housekeeping) the associated filesystem is automatically thawed.
 
 - fsfreeze: add ioctl to create a fd for freeze control
   http://marc.info/?l=linux-fsdevel&m=131175212512290&w=2
 - fsfreeze: add freeze fd ioctls
   http://marc.info/?l=linux-fsdevel&m=131175220612341&w=2
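
For illustration, a minimal userspace sketch of the proposed API (FIGETFREEZEFD and the should_freeze flag are taken from the patch descriptions above; the ioctl request number below is only a placeholder, not the real one from the series):

#include <fcntl.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* FIGETFREEZEFD is from the proposed patches, not mainline; the
 * request number here is a made-up placeholder. */
#define FIGETFREEZEFD _IOR('X', 122, int)

int main(int argc, char **argv)
{
    int should_freeze = 1;               /* freeze at open time */
    int mntfd, freezefd;

    if (argc < 2)
        return 1;
    mntfd = open(argv[1], O_RDONLY | O_DIRECTORY);
    if (mntfd < 0) { perror("open"); return 1; }

    /* returns a new fd; the fs stays frozen while that fd is open */
    freezefd = ioctl(mntfd, FIGETFREEZEFD, &should_freeze);
    if (freezefd < 0) { perror("FIGETFREEZEFD"); return 1; }

    /* ... take the snapshot here ... */

    close(freezefd);                     /* close (or process exit) thaws */
    close(mntfd);
    return 0;
}

Since the thaw happens on close(2), even a process killed by the oom killer or a stray kill -9 can no longer leave the filesystem frozen behind.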

This is probably how the API should have been implemented originally
instead of FIFREEZE/FITHAW.

It looks a bit like overkill though; I would think it'd be enough to force
the fsfreeze at FIGETFREEZEFD time and make closing the file the only way
to thaw, without requiring any of the
FS_FREEZE_FD/FS_THAW_FD/FS_ISFROZEN_FD ioctls. But I guess you have use
cases for those if you implemented them, maybe to check whether root is
stepping on its own toes by checking if the fs is already frozen before
freezing it and returning failure if it is; running an ioctl instead of
opening and closing the file isn't necessarily better. At the very least
the get_user(should_freeze, argp) doesn't seem necessary, it just
complicates the ioctl API a bit without much gain; I think it'd be
cleaner if FS_FREEZE_FD were the only way to freeze then.

It's certainly a nice reliability improvement and safer API.

Now, if you add a file descriptor that userland can open and epoll/poll
to know when an fsfreeze has been requested on a certain fs, an fsfreeze
userland agent (not necessarily virt related) could open it and run the
scripts for that filesystem before the kernel calls freeze_super().

Then a PARAVIRT_FSFREEZE=y/m driver could just invoke the fsfreeze
without any dependency on a virt-specific guest agent.

Maybe Christoph's right that there are filesystems in userland that may
make things more complicated (I'm not sure how the storage is related;
it's all about filesystems and apps as far as I can see, and it's all
blkdev agnostic), but those usually have a kernel backend too (like
fuse). I may not see the full picture of the filesystems in userland or
of how the storage agent in guest userland relates to this.

If you believe having libvirt talk QMP/QAPI over a virtio-serial
vmchannel with some virt-specific guest userland agent, bypassing qemu
entirely, is better, that's OK with me, but there should be a strong
reason for it: the paravirt_fsfreeze.ko approach, with a small qemu
backend and a qemu monitor command that starts paravirt-fsfreeze in the
guest before going ahead and blocking all I/O (to provide backwards
compatibility and reliable snapshots for guest OSes that don't have the
paravirt fsfreeze), looks more reliable, more compact and simpler to use
to me. I'll surely be OK either way though.

Thanks,
Andrea



Re: [Qemu-devel] RFC: moving fsfreeze support from the userland guest agent to the guest kernel

2011-07-28 Thread Jes Sorensen
On 07/27/11 18:40, Andrea Arcangeli wrote:
  Another thing to note is that snapshotting is not necessarily something
  that should be completely transparent to the guest. One of the planned 
  future features for the guest agent (mentioned in the snapshot wiki, and 
  a common use case that I've seen come up elsewhere as well in the 
  context of database applications), is a way for userspace applications 
  to register callbacks to be made in the event of a freeze (dumping 
  application-managed caches to disk and things along that line). The 
 Not sure if the scripts are really needed or if they would just open a
 brand new fsfreeze-specific unix domain socket (created by the
 database) to tell the database to freeze.
 
 If the latter is the case, then rather than changing the database to
 open a unix domain socket the script can connect to when invoked (or
 maybe just adding some new function to the protocol of an existing open
 unix domain socket), it'd be better to change the database to open a
 /dev/virtio-fsfreeze device, created by the virtio-fsfreeze.ko virtio
 driver through udev. The database would poll it, read the request to
 freeze, and write into it that it finished freezing when done. Then,
 when all openers of the device have finished freezing, virtio-fsfreeze.ko
 would go ahead freezing all the filesystems, and then tell qemu when
 it's finished. Then qemu can finally block all the I/O and tell libvirt
 to go ahead with the snapshot.

I think it could also be a combined operation, i.e. having the freeze
happen in the kernel, but doing the callouts using a userspace daemon. I
like the userspace daemon for the callouts because it allows providing a
more sophisticated API than if we provide just a socket-like interface.
In addition, the callouts are less critical wrt crashes than the fsfreeze
operations.

Cheers,
Jes



Re: [Qemu-devel] RFC: moving fsfreeze support from the userland guest agent to the guest kernel

2011-07-28 Thread Jes Sorensen
On 07/27/11 20:36, Christoph Hellwig wrote:
 Initiating the freeze from kernelspace doesn't make much sense.  With
 virtio we could add an in-band freeze request to the protocol, and although
 that would be a major change in the way virtio-blk works right now it's
 at least doable.  But all other real storage targets only communicate
 with their initiators over out-of-band protocols that are entirely handled
 in userspace, and given their high-level nature they'd better be - that is,
 if we know them at all, given how vendors like to keep this secret IP
 closed and just offer userspace management tools in binary form.
 
 Building new infrastructure in the kernel just for virtio, while needing
 to duplicate the same thing in userspace for all real storage, seems like
 a really bad idea.  That is in addition to the userspace freeze notifiers
 similar to what e.g. Windows has - if the freeze process is driven from
 userspace it's much easier to handle those properly compared to requiring
 kernel upcalls.
 

The freeze operation would really just be a case of walking the list of
mounted file systems and calling the FIFREEZE ioctl operation on them. I
wouldn't anticipate doing anything else in a virtio-fsfreeze.ko module.
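
For example, a minimal sketch of such a walk, using the existing FIFREEZE ioctl from linux/fs.h (real code would skip pseudo-filesystems, deduplicate bind mounts, and thaw everything with FITHAW on partial failure):

#include <linux/fs.h>      /* FIFREEZE, FITHAW */
#include <mntent.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <unistd.h>

int main(void)
{
    FILE *m = setmntent("/proc/mounts", "r");
    if (!m) { perror("setmntent"); return 1; }

    struct mntent *e;
    while ((e = getmntent(m)) != NULL) {
        int fd = open(e->mnt_dir, O_RDONLY | O_DIRECTORY);
        if (fd < 0)
            continue;
        if (ioctl(fd, FIFREEZE, 0) == 0)   /* fails on fs without freeze support */
            fprintf(stderr, "froze %s\n", e->mnt_dir);
        close(fd);
    }
    endmntent(m);
    return 0;
}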

Cheers,
Jes




Re: [Qemu-devel] RFC: moving fsfreeze support from the userland guest agent to the guest kernel

2011-07-28 Thread Michael Roth

On 07/28/2011 03:03 AM, Andrea Arcangeli wrote:

 On Thu, Jul 28, 2011 at 11:53:50AM +0900, Fernando Luis Vázquez Cao wrote:
 
  On Wed, 2011-07-27 at 17:24 +0200, Andrea Arcangeli wrote:
 
   making
   sure no lib is calling any I/O function to be able to thaw the
   filesystems later, making sure the oom killer or a wrong kill -9
   $RANDOM isn't killing the agent by mistake while the I/O is blocked
   and the copy is going.
 
  Yes, with the current API, if the agent is killed while the filesystems
  are frozen, we are screwed.
 
  I have just submitted patches that implement a new API that should make
  the virtualization use case more reliable. Basically, I am adding a new
  ioctl, FIGETFREEZEFD, which freezes the indicated filesystem and returns
  a file descriptor; as long as that file descriptor is held open, the
  filesystem remains frozen. If the freeze file descriptor is closed (be it
  through an explicit call to close(2) or as part of process exit
  housekeeping) the associated filesystem is automatically thawed.
 
  - fsfreeze: add ioctl to create a fd for freeze control
    http://marc.info/?l=linux-fsdevel&m=131175212512290&w=2
  - fsfreeze: add freeze fd ioctls
    http://marc.info/?l=linux-fsdevel&m=131175220612341&w=2
 
 This is probably how the API should have been implemented originally
 instead of FIFREEZE/FITHAW.
 
 It looks a bit like overkill though; I would think it'd be enough to force
 the fsfreeze at FIGETFREEZEFD time and make closing the file the only way
 to thaw, without requiring any of the
 FS_FREEZE_FD/FS_THAW_FD/FS_ISFROZEN_FD ioctls. But I guess you have use
 cases


One of the crappy things about the current implementation is the 
inability to determine whether or not a filesystem is frozen. At least 
in the context of the guest agent, it'd be nice if 
guest-fsfreeze-status checked the actual system state rather than some 
internal state that may not necessarily reflect reality (if we freeze, 
and some other application thaws, we currently still report the state as 
frozen).


Also in the context of the guest agent, we are indeed screwed if the 
agent gets killed while in a frozen state, and remain screwed even if 
it's restarted, since we have no way of determining whether or not we're 
in a frozen state and thus should disable logging operations.


We could check status by looking for a failure from the freeze 
operation, but if you're just interested in getting the state, having to 
potentially induce a freeze just to get at the state is really heavy-handed.


So having an open operation that doesn't force a freeze, plus the 
freeze/thaw/status operations, serves some fairly common use cases, I think.
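
For example, with the fd-based proposal something like the following would become possible (FIGETFREEZEFD and FS_ISFROZEN_FD come from Fernando's patches above; the request numbers and return convention below are assumptions for illustration):

#include <fcntl.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Placeholder request numbers; the real ones are in the patch series. */
#define FIGETFREEZEFD  _IOR('X', 122, int)
#define FS_ISFROZEN_FD _IO('X', 125)

/* Non-destructively query the real frozen state of a filesystem:
 * returns 1 if frozen, 0 if not, -1 on error (assumed semantics). */
int fs_is_frozen(const char *mntpoint)
{
    int mntfd = open(mntpoint, O_RDONLY | O_DIRECTORY);
    if (mntfd < 0)
        return -1;

    int should_freeze = 0;   /* get a control fd without freezing */
    int freezefd = ioctl(mntfd, FIGETFREEZEFD, &should_freeze);
    close(mntfd);
    if (freezefd < 0)
        return -1;

    int frozen = ioctl(freezefd, FS_ISFROZEN_FD);
    close(freezefd);         /* we never froze it, so this doesn't thaw */
    return frozen;
}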



 for those if you implemented them, maybe to check whether root is
 stepping on its own toes by checking if the fs is already frozen before
 freezing it and returning failure if it is; running an ioctl instead of
 opening and closing the file isn't necessarily better. At the very least
 the get_user(should_freeze, argp) doesn't seem necessary, it just
 complicates the ioctl API a bit without much gain; I think it'd be
 cleaner if FS_FREEZE_FD were the only way to freeze then.

 It's certainly a nice reliability improvement and safer API.
 
 Now, if you add a file descriptor that userland can open and epoll/poll
 to know when an fsfreeze has been requested on a certain fs, an fsfreeze
 userland agent (not necessarily virt related) could open it and run the
 scripts for that filesystem before the kernel calls freeze_super().
 
 Then a PARAVIRT_FSFREEZE=y/m driver could just invoke the fsfreeze
 without any dependency on a virt-specific guest agent.
 
 Maybe Christoph's right that there are filesystems in userland that may
 make things more complicated (I'm not sure how the storage is related;
 it's all about filesystems and apps as far as I can see, and it's all
 blkdev agnostic), but those usually have a kernel backend too (like
 fuse). I may not see the full picture of the filesystems in userland or
 of how the storage agent in guest userland relates to this.
 
 If you believe having libvirt talk QMP/QAPI over a virtio-serial
 vmchannel with some virt-specific guest userland agent, bypassing qemu
 entirely, is better, that's OK with me, but there should be a strong
 reason for it: the paravirt_fsfreeze.ko approach, with a small qemu
 backend and a qemu monitor command that starts paravirt-fsfreeze in the
 guest before going ahead and blocking all I/O (to provide backwards
 compatibility and reliable snapshots for guest OSes that don't have the
 paravirt fsfreeze), looks more reliable, more compact and simpler to use
 to me. I'll surely be OK either way though.
 
 Thanks,
 Andrea





Re: [Qemu-devel] RFC: moving fsfreeze support from the userland guest agent to the guest kernel

2011-07-28 Thread Michael Roth

On 07/28/2011 03:54 AM, Jes Sorensen wrote:

 On 07/27/11 18:40, Andrea Arcangeli wrote:
 
   Another thing to note is that snapshotting is not necessarily something
   that should be completely transparent to the guest. One of the planned
   future features for the guest agent (mentioned in the snapshot wiki, and
   a common use case that I've seen come up elsewhere as well in the
   context of database applications), is a way for userspace applications
   to register callbacks to be made in the event of a freeze (dumping
   application-managed caches to disk and things along that line). The
 
  Not sure if the scripts are really needed or if they would just open a
  brand new fsfreeze-specific unix domain socket (created by the
  database) to tell the database to freeze.
 
  If the latter is the case, then rather than changing the database to
  open a unix domain socket the script can connect to when invoked (or
  maybe just adding some new function to the protocol of an existing open
  unix domain socket), it'd be better to change the database to open a
  /dev/virtio-fsfreeze device, created by the virtio-fsfreeze.ko virtio
  driver through udev. The database would poll it, read the request to
  freeze, and write into it that it finished freezing when done. Then,
  when all openers of the device have finished freezing, virtio-fsfreeze.ko
  would go ahead freezing all the filesystems, and then tell qemu when
  it's finished. Then qemu can finally block all the I/O and tell libvirt
  to go ahead with the snapshot.
 
 I think it could also be a combined operation, i.e. having the freeze
 happen in the kernel, but doing the callouts using a userspace daemon. I
 like the userspace daemon for the callouts because it allows providing a
 more sophisticated API than if we provide just a socket-like interface.
 In addition, the callouts are less critical wrt crashes than the fsfreeze
 operations.



I'd prefer this approach as well. We could potentially implement it with 
a more general mechanism for executing scripts in the guest for whatever 
reason, rather than an fsfreeze-specific one.


Let the management layer handle the orchestration between the two. Whether 
the freeze is kernel-driven or not I think can go either way, though the 
potential issues I mentioned in response to Fernando's post suggest that 
those proposed changes are required for a proper guest agent 
implementation, and at that point we're talking about kernel changes 
either way for the functionality we ultimately want.


I think there may still be value in retaining the current fsfreeze 
support in the agent for older guests, however. What I'm convinced of 
now though is that the freeze operation should not be tethered to the 
application callback operation, since the latter is applicable to other 
potential fsfreeze mechanisms.



 Cheers,
 Jes





Re: [Qemu-devel] RFC: moving fsfreeze support from the userland guest agent to the guest kernel

2011-07-28 Thread Fernando Luis Vazquez Cao

Michael Roth wrote:

 On 07/28/2011 03:03 AM, Andrea Arcangeli wrote:
 
  On Thu, Jul 28, 2011 at 11:53:50AM +0900, Fernando Luis Vázquez Cao wrote:
 
   On Wed, 2011-07-27 at 17:24 +0200, Andrea Arcangeli wrote:
 
    making
    sure no lib is calling any I/O function to be able to thaw the
    filesystems later, making sure the oom killer or a wrong kill -9
    $RANDOM isn't killing the agent by mistake while the I/O is blocked
    and the copy is going.
 
   Yes, with the current API, if the agent is killed while the filesystems
   are frozen, we are screwed.
 
   I have just submitted patches that implement a new API that should make
   the virtualization use case more reliable. Basically, I am adding a new
   ioctl, FIGETFREEZEFD, which freezes the indicated filesystem and
   returns a file descriptor; as long as that file descriptor is held
   open, the filesystem remains frozen. If the freeze file descriptor is
   closed (be it through an explicit call to close(2) or as part of
   process exit housekeeping) the associated filesystem is automatically
   thawed.
 
   - fsfreeze: add ioctl to create a fd for freeze control
     http://marc.info/?l=linux-fsdevel&m=131175212512290&w=2
   - fsfreeze: add freeze fd ioctls
     http://marc.info/?l=linux-fsdevel&m=131175220612341&w=2
 
  This is probably how the API should have been implemented originally
  instead of FIFREEZE/FITHAW.
 
  It looks a bit like overkill though; I would think it'd be enough to
  force the fsfreeze at FIGETFREEZEFD time and make closing the file the
  only way to thaw, without requiring any of the
  FS_FREEZE_FD/FS_THAW_FD/FS_ISFROZEN_FD ioctls. But I guess you have use
  cases


 One of the crappy things about the current implementation is the
 inability to determine whether or not a filesystem is frozen. At least
 in the context of the guest agent, it'd be nice if
 guest-fsfreeze-status checked the actual system state rather than some
 internal state that may not necessarily reflect reality (if we freeze,
 and some other application thaws, we currently still report the state
 as frozen).
 
 Also in the context of the guest agent, we are indeed screwed if the
 agent gets killed while in a frozen state, and remain screwed even if
 it's restarted, since we have no way of determining whether or not
 we're in a frozen state and thus should disable logging operations.


That is precisely the reason I added the new API.

 We could check status by looking for a failure from the freeze
 operation, but if you're just interested in getting the state, having
 to potentially induce a freeze just to get at the state is really
 heavy-handed.
 
 So having an open operation that doesn't force a freeze, plus the
 freeze/thaw/status operations, serves some fairly common use cases, I
 think.


Yep. If you think there is something missing API-wise, let me know and I 
will implement it.


Thanks,
Fernando



[Qemu-devel] RFC: moving fsfreeze support from the userland guest agent to the guest kernel

2011-07-27 Thread Andrea Arcangeli
Hello everyone,

I've been thinking at the current design of the fsfreeze feature used
by libvirt.

It currently relies on a userland agent in the guest talking to qemu
over some vmchannel communication. The guest agent walks the
filesystems in the guest and calls the fsfreeze ioctl on them.

Fsfreeze is an optional feature; it's not required for safe
snapshots. After fsfreeze (regardless of whether it is available), QEMU
must still block all I/O for all qemu blkdevices before the image is
saved, to allow safe snapshotting of non-linux guests. Then, if a VM is
restarted from the snapshot, it becomes identical to a fault-tolerance
fallback with nfs or drbd in a highly available
configuration. Fsfreeze just provides some further (minor) benefit on
top of that (which probably won't be available for non-linux guests
any time soon).

The benefits this optional fsfreeze feature provides to the snapshot
are:

1) more peace of mind by not relying on the kernel journal replay code
when snapshotting journaled/cow filesystems like ext4/btrfs/xfs

2) all dirty outstanding cache is flushed, which reduces the chances
of running into userland journaling data replay bugs if userland is
restarted on the snapshot

3) allows safe live snapshotting of non-journaled filesystems like
vfat/ext2 on linux (not so common, and vfat on non-linux guests won't
benefit)

4) allows mounting the snapshotted image readonly without requiring
metadata journal replay

The problem is that having a daemon in guest userland is not my
preference, considering it can be done with a virtio-fsfreeze.ko
kernel module in the guest without requiring any userland modification
(and no interprocess communication through vmchannel or anything
similar).

This means a kernel upgrade in the guest that adds the
virtio-fsfreeze.ko virtio paravirt driver would be enough to be able
to provide fsfreeze during snapshots.

A virtio-fsfreeze.ko would certainly be more developer friendly, you
could just build the kernel and even boot it with -kernel bzImage
(after building it with VIRTIO_FSFREEZE=y). Then it'd just work
without any daemon or vmchannel or any other change to the guest
userland.

I could see some advantage in not having to modify qemu if libvirt was
talking directly to the guest agent, so as to avoid any knowledge of
FSFREEZE in qemu. But it's not even like that; I see FSFREEZE guest
agent patches floating around. So if qemu has to be modified and be
aware of the fsfreeze feature in the userland guest agent (rather than
just being asked to block all I/O, which doesn't require any guest
knowledge and would keep it agnostic about fsfreeze), I think it'd be
better if the fsfreeze qemu code just went into a virtio backend.

There is also an advantage in reliability, as there's no more need to
worry about mlocking the memory of the userland guest agent, making
sure no lib is calling any I/O function (to be able to thaw the
filesystems later), and making sure the oom killer or a wrong kill -9
$RANDOM isn't killing the agent by mistake while the I/O is blocked
and the copy is going. The guest kernel is a more reliable and natural
place to call fsfreeze from, through a virtio-fsfreeze guest driver,
without having to spend time worrying about the reliability of the
guest-agent feature. It'd surely also waste less memory in the guest
(not that the agent takes much memory, but the few kbytes of .text of a
kernel module for this would surely take a fraction of the mlocked
RAM the agent would take; the RAM saving is the least interesting
part of course).

If there were no hypervisor behind the kernel, only userland could
start an fsfreeze, so we shouldn't be fooled into thinking userland is
the best place from which to start an fsfreeze invocation; it most
certainly isn't, but on the host (without virt) there's no other thing
that could possibly ask for it. Here, though, we have a hypervisor
behind the guest kernel that asks for it, so starting the fsfreeze
through a virtio-fsfreeze.ko kernel module loaded into the guest
kernel (or linked into it) sounds like a cleaner and more reliable
solution (maybe simpler too).

It'd certainly be a more friendly solution for developers to test or
run: libvirt would talk only with qemu, and qemu would only talk
with the guest kernel, without requiring any modification to the guest
userland. My feeling is that what feels much simpler for developers
to use usually tends to be the better solution (not guaranteed), and to
me a virtio-fsfreeze.ko solution would look much simpler to use.

There are drawbacks, like the fact that respinning an update to the
fsfreeze code would then require an upgrade of the guest kernel
instead of a package update. But there are advantages too in terms of
coverage, as an updated kernel would also run on top of an older guest
userland that may not have an agent package to install through a
repository.

In any case if the virtio-fsfreeze.ko doesn't register into qemu
virtio-fsfreeze backend, the qemu monitor command should 

Re: [Qemu-devel] RFC: moving fsfreeze support from the userland guest agent to the guest kernel

2011-07-27 Thread Anthony Liguori

On 07/27/2011 10:24 AM, Andrea Arcangeli wrote:

 [...]

 I could see some advantage in not having to modify qemu if libvirt was
 talking directly to the guest agent, so as to avoid any knowledge of
 FSFREEZE in qemu. But it's not even like that; I see FSFREEZE guest
 agent patches floating around. So if qemu has to be modified and be
 aware of the fsfreeze feature in the userland guest agent (rather than
 just being asked to block all I/O, which doesn't require any guest
 knowledge and would keep it agnostic about fsfreeze), I think it'd be
 better if the fsfreeze qemu code just went into a virtio backend.


Currently, QEMU doesn't know about fsfreeze.  I don't think it ever will 
either.



 I understand an agent may be needed for other features, but I think
 whenever a feature is better suited to not requiring userland guest
 support, it shouldn't require it. To me, requiring modifications to the
 guest userland looks like the least transparent and most intrusive
 possible way to implement a libvirt feature, so it should only be used
 when it has advantages, and I see mostly disadvantages here.


I also dislike having to orchestrate all of the freezing stuff because 
it's extremely hard to do it reliably in userspace.


One challenge though is that it's highly desirable to have script hooks 
as part of the freeze process to let other userspace applications 
participate, which means you will always need some userspace daemon to 
kick things off.


Instead of having a virtio-fsfreeze, I think it would be better to think 
about whether the kernel needs a higher-level interface such that the 
userspace operation is dirt-simple.


But I don't see a way to avoid userspace involvement in this set of 
operations unfortunately.


Regards,

Anthony Liguori


 This is just a suggestion; I think the agent should work too.

 Thanks a lot,
 Andrea





Re: [Qemu-devel] RFC: moving fsfreeze support from the userland guest agent to the guest kernel

2011-07-27 Thread Andrea Arcangeli
Hello Michael,

On Wed, Jul 27, 2011 at 11:07:13AM -0500, Michael Roth wrote:
 One thing worth mentioning is that the current host-side interface to 
 the guest agent is not what we're hoping to build libvirt interfaces 
 around. It's a standalone, out-of-band tool for now, but when QMP is 
 converted to QAPI the guest agent interfaces will be exposed 
 transparently to the host as normal QMP commands. libvirt shouldn't be 
 able to tell the difference between a guest-agent induced fsfreeze and 
 a guest kernel induced fsfreeze (except perhaps to identify extended 
 capabilities in a particular case):
 
 http://wiki.qemu.org/Features/QAPI/GuestAgent

Sounds good.

 Another thing to note is that snapshotting is not necessarily something 
 that should be completely transparent to the guest. One of the planned 
 future features for the guest agent (mentioned in the snapshot wiki, and 
 a common use case that I've seen come up elsewhere as well in the 
 context of database applications), is a way for userspace applications 
 to register callbacks to be made in the event of a freeze (dumping 
 application-managed caches to disk and things along that line). The 

Not sure if the scripts are really needed or if they would just open a
brand new fsfreeze-specific unix domain socket (created by the
database) to tell the database to freeze.

If the latter is the case, then rather than changing the database to
open a unix domain socket the script can connect to when invoked (or
maybe just adding some new function to the protocol of an existing open
unix domain socket), it'd be better to change the database to open a
/dev/virtio-fsfreeze device, created by the virtio-fsfreeze.ko virtio
driver through udev. The database would poll it, read the request to
freeze, and write into it that it finished freezing when done. Then,
when all openers of the device have finished freezing, virtio-fsfreeze.ko
would go ahead freezing all the filesystems, and then tell qemu when
it's finished. Then qemu can finally block all the I/O and tell libvirt
to go ahead with the snapshot.
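
As a sketch of what the application side could look like (the /dev/virtio-fsfreeze node and its read/write protocol are hypothetical; no such driver exists):

#include <fcntl.h>
#include <poll.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    char req[64];
    /* hypothetical device node created by virtio-fsfreeze.ko via udev */
    int fd = open("/dev/virtio-fsfreeze", O_RDWR);
    if (fd < 0) { perror("open"); return 1; }

    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    for (;;) {
        if (poll(&pfd, 1, -1) < 0)
            break;
        ssize_t n = read(fd, req, sizeof(req));   /* freeze requested */
        if (n <= 0)
            break;

        /* flush application-managed caches to disk here, then ack;
         * once every opener has acked, the driver would freeze the
         * filesystems and notify qemu */
        write(fd, "frozen", 6);
    }
    close(fd);
    return 0;
}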

If the script hangs (with the guest user agent approach), or if the
database hangs while keeping the /dev/virtio-fsfreeze device open (with
the virtio-fsfreeze.ko approach), that would hang the whole fsfreeze
operation in the virtio-fsfreeze.ko driver; otherwise a timeout would
be required. But the general idea is that the more stuff is going to
be frozen (especially when userland is involved, and not just guest
kernel code as with virtio-fsfreeze.ko), the higher the risk of a
hang (or, alternatively, of a false positive timeout... if there's a
timeout).

If scripts are needed, then the agent starting the scripts with
execve could also open /dev/virtio-fsfreeze instead of being
invoked through communication with libvirt over QMP/QAPI etc...

The advantage at least is that if the database is killed, closing the
file will not lead to a hang or a failure of the fsfreeze. If the
agent is killed, things would go bad instead (either a hang or a
timeout).

Maybe it's more a matter of taste, and maybe my taste makes me prefer
a virtio-fsfreeze.ko that can later register a /dev/virtio-fsfreeze
device that any app can open. The permissions on the device will also
define which apps may lead to a false positive timeout of the
snapshotting, or to a hang.

 implementation of this would likely be a directory where applications 
 can place scripts that get called in the event of a freeze, something 
 that would require a user-space daemon anyway.
 
 Also, in terms of supporting older guests, the proposed guest tools ISO 
 (akin to virtualbox/vmware guest tools):
 
 http://lists.gnu.org/archive/html/qemu-devel/2011-06/msg02239.html
 
 would give us a distribution channel that doesn't require any 
 involvement from distro maintainers. A distro package to bootstrap the 
 agent would still be preferable, but the ISO approach seems to work 
 well in practice. And for managed environments getting custom packages 
 installed generally isn't as much of a problem as requiring reboots or 
 kernel changes.

Nice to see it works for more hypervisors.

I think it boils down to whether an agent is needed for fsfreeze or
not. I think it's not, but I also tend to agree it can work with the
agent. As a developer I don't have much doubt that a virtio driver
with no userland change would be so much simpler for me to use, but I
may be biased. I just don't see many cons to the kernel solution,
except perhaps the fact that to change the fsfreeze code you have to
respin a kernel update.



Re: [Qemu-devel] RFC: moving fsfreeze support from the userland guest agent to the guest kernel

2011-07-27 Thread Andrea Arcangeli
On Wed, Jul 27, 2011 at 11:34:44AM -0500, Anthony Liguori wrote:
 Currently, QEMU doesn't know about fsfreeze.  I don't think it ever will 
 either.

Ah, sorry, thanks for the correction; it's some other repo that you
were modifying (qga).

 One challenge though is that it's highly desirable to have script hooks 
 as part of the freeze process to let other userspace applications 
 participate which means you will always need some userspace daemon to 
 kick things off.
 
 Instead of having a virtio-fsfreeze, I think it would be better to think 
 about if the kernel needs a higher level interface such that the 
 userspace operation is dirt-simple.
 
 But I don't see a way to avoid userspace involvement in this set of 
 operations unfortunately.

A /dev/virtio-fsfreeze chardevice created by udev when
virtio-fsfreeze.ko is loaded may be enough to do it. Or maybe it
should be a host kernel solution, /dev/fsfreeze, that talks with
fsfreeze (not just in the virtio case).

The apps likely must be modified for this; I doubt the scripts would
do much on their own (they'd likely just tell the app to do something
through a unix domain socket), but if scripts are needed the agent
could open that chardev instead of talking QMP/QAPI.

It also depends on whether people prefer a single agent that does it
all, or an fsfreeze agent and some other agent for something else. Even
if they want a single agent for everything, they could still have it
talk QMP/QAPI on the virtio-serial vmchannel for everything else and
open /dev/virtio-fsfreeze or /dev/fsfreeze where available.

It's up to you... you understand the customer requirements better. For
me, a kernel update and no agent sounds nicer and looks more reliable,
considering what fsfreeze does.



Re: [Qemu-devel] RFC: moving fsfreeze support from the userland guest agent to the guest kernel

2011-07-27 Thread Christoph Hellwig
Initiating the freeze from kernelspace doesn't make much sense.  With
virtio we could add an in-band freeze request to the protocol, and although
that would be a major change in the way virtio-blk works right now it's
at least doable.  But all other real storage targets only communicate
with their initiators over out-of-band protocols that are entirely handled
in userspace, and given their high-level nature they'd better be - that is,
if we know them at all, given how vendors like to keep this secret IP
closed and just offer userspace management tools in binary form.

Building new infrastructure in the kernel just for virtio, while needing
to duplicate the same thing in userspace for all real storage, seems like
a really bad idea.  That is in addition to the userspace freeze notifiers
similar to what e.g. Windows has - if the freeze process is driven from
userspace it's much easier to handle those properly compared to requiring
kernel upcalls.




Re: [Qemu-devel] RFC: moving fsfreeze support from the userland guest agent to the guest kernel

2011-07-27 Thread Andrea Arcangeli
On Wed, Jul 27, 2011 at 08:36:10PM +0200, Christoph Hellwig wrote:
 Initiating the freeze from kernelspace doesn't make much sense.  With
 virtio we could add an in-band freeze request to the protocol, and although
 that would be a major change in the way virtio-blk works right now it's
 at least doable.  But all other real storage targets only communicate
 with their initiators over out-of-band protocols that are entirely handled
 in userspace, and given their high-level nature they'd better be - that is,
 if we know them at all, given how vendors like to keep this secret IP
 closed and just offer userspace management tools in binary form.

I don't see how blkdevs are related, or how virtio-blk is related to
this. Clearly there would be no ring for this, just a paravirt driver
calling into ioctl_fsfreeze().

What would those real storage targets be? It's just a matter of
looping over the superblocks and calling freeze_super() on those whose
sb->s_op->freeze_fs is not NULL. We don't even need to go through a
fake file handle to reach the fs when doing it in the guest kernel.
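
For reference, the exported in-kernel entry points for this are currently per block device rather than a plain superblock walk (the loop above would need iterate_supers(), which isn't exported to modules). A rough 2011-era sketch of the freeze side of such a paravirt driver, using the real freeze_bdev()/thaw_bdev() helpers:

#include <linux/blkdev.h>
#include <linux/err.h>
#include <linux/fs.h>

/* Sketch only: freeze/thaw one block device's filesystem from a
 * hypothetical paravirt-fsfreeze driver. freeze_bdev() returns the
 * frozen superblock (NULL if nothing is mounted) or an ERR_PTR. */
static struct super_block *pvfreeze_sb;

static int pvfreeze_freeze(struct block_device *bdev)
{
    pvfreeze_sb = freeze_bdev(bdev);
    if (IS_ERR(pvfreeze_sb))
        return PTR_ERR(pvfreeze_sb);
    return 0;
}

static int pvfreeze_thaw(struct block_device *bdev)
{
    return thaw_bdev(bdev, pvfreeze_sb);
}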

 Building new infrastructure in the kernel just for virtio, while needing

It doesn't need to be virtio as in ring. Maybe I should have called it
paravirt-fsfreeze (as in PARAVIRT_CLOCK), virtio as in doing I/O not.

 to duplicate the same thing in userspace for all real storage, seems like
 a really bad idea.  That is in addition to the userspace freeze notifiers
 similar to what e.g. Windows has - if the freeze process is driven from
 userspace it's much easier to handle those properly compared to requiring
 kernel upcalls.

I'm not sure how it is simpler to talk some protocol through
virtio-serial than to poll a /dev/fsfreeze or /dev/paravirt-fsfreeze.



Re: [Qemu-devel] RFC: moving fsfreeze support from the userland guest agent to the guest kernel

2011-07-27 Thread Fernando Luis Vázquez Cao
On Wed, 2011-07-27 at 17:24 +0200, Andrea Arcangeli wrote:
 making
 sure no lib is calling any I/O function to be able to thaw the
 filesystems later, making sure the oom killer or a wrong kill -9
 $RANDOM isn't killing the agent by mistake while the I/O is blocked
 and the copy is going.

Yes, with the current API, if the agent is killed while the filesystems
are frozen, we are screwed.

I have just submitted patches that implement a new API that should make
the virtualization use case more reliable. Basically, I am adding a new
ioctl, FIGETFREEZEFD, which freezes the indicated filesystem and returns
a file descriptor; as long as that file descriptor is held open, the
filesystem remains frozen. If the freeze file descriptor is closed (be it
through an explicit call to close(2) or as part of process exit
housekeeping) the associated filesystem is automatically thawed.

- fsfreeze: add ioctl to create a fd for freeze control
  http://marc.info/?l=linux-fsdevel&m=131175212512290&w=2
- fsfreeze: add freeze fd ioctls
  http://marc.info/?l=linux-fsdevel&m=131175220612341&w=2