Re: [openstack-dev] [nova] Fixing the console.log grows forever bug.

2014-12-08 Thread Thierry Carrez
Tony Breeds wrote:
 [...]
 So if that timeline is approximately correct:
 
 - Can we wait this long to fix the bug, as opposed to having it squashed in Kilo?
 - What do we do in nova for the next ~12 months while we know there isn't a qemu
   to fix this?
 - Then once there is a qemu that fixes the issue, do we just say 'thou must use
   qemu 2.3.0', or would nova still need to support old and new qemus?

Fixing it in qemu looks like the right way to fix this issue. If it was
simple to fix, it would have been fixed already: this is one of our
oldest bugs with security impact. So I'd say yes, this should be fixed
in qemu, even if that takes a long time to propagate.

If someone finds an interesting way to work around this issue in Nova,
then by all means, add the workaround to Kilo and deprecate it once we
can assume everyone has moved to a newer qemu. But given this bug has been
around for 3 years, I wouldn't hold my breath.

-- 
Thierry Carrez (ttx)



___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] Fixing the console.log grows forever bug.

2014-12-08 Thread Daniel P. Berrange
On Sat, Dec 06, 2014 at 04:38:52PM +1100, Tony Breeds wrote:
 Hi All,
 In the most recent team meeting we briefly discussed [1], where the
 console.log grows indefinitely, eventually causing guest stalls. I mentioned
 that I was working on a spec to fix this issue.
 
 My original plan was fairly similar to [2], in that we'd switch libvirt/qemu to
 using a unix domain socket and write a simple helper to read from that socket
 and write to disk. That helper would close and reopen the on-disk file upon
 receiving a HUP (so logrotate just works). Life would be good and we could
 all move on.
 
 However I was encouraged to investigate fixing this in qemu, such that qemu
 could process the HUP and make life better for all. This is certainly doable
 and I'm happy[3] to do this work. I've floated the idea past qemu-devel and
 they seem okay with the idea. My main concerns are the lag and supporting
 qemu/libvirt versions that can't handle this option.
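The helper in Tony's original plan (drain a unix domain socket into a file, and close/reopen the file on HUP so logrotate works) can be sketched in Python roughly as follows. This is a minimal sketch under stated assumptions, not the code from [2]: the names ReopeningLogWriter, install_hup_handler and pump are hypothetical, and the helper is assumed to already hold a connected socket carrying the guest console.

```python
import signal


class ReopeningLogWriter:
    """Append console bytes to a file, reopening it on request.

    After logrotate renames the file, reopening creates a fresh file at the
    original path, so earlier output stays in the rotated copy.
    """

    def __init__(self, path):
        self.path = path
        self._reopen_requested = False
        self._fh = open(path, "ab")

    def request_reopen(self):
        # Only set a flag: this is what the SIGHUP handler calls, and the
        # actual close/open happens later in write(), outside the handler.
        self._reopen_requested = True

    def write(self, data):
        if self._reopen_requested:
            self._reopen_requested = False
            self._fh.close()
            self._fh = open(self.path, "ab")
        self._fh.write(data)
        self._fh.flush()

    def close(self):
        self._fh.close()


def install_hup_handler(writer):
    """Route SIGHUP (e.g. sent by a logrotate postrotate script) to the writer.

    Python only allows installing signal handlers from the main thread.
    """
    signal.signal(signal.SIGHUP, lambda signum, frame: writer.request_reopen())


def pump(sock, writer, bufsize=4096):
    """Copy everything read from a connected socket into the log until EOF."""
    while True:
        chunk = sock.recv(bufsize)
        if not chunk:
            break
        writer.write(chunk)
```

In this sketch the reopen is deferred to the next write, so an idle guest keeps the rotated file open until it prints something; a real helper would probably also reopen promptly when idle.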

As mentioned in my reply on qemu-devel, I think the right long term solution
for this is to fix it in libvirt. We have a general security goal to remove
QEMU's ability to open any files whatsoever, instead having it receive all
host resources as pre-opened file descriptors from libvirt. So what we
anticipate is a new libvirt daemon for processing logs, virtlogd. Anywhere
where QEMU currently gets a file to log to (serial devices, and its
stdout/stderr), it would instead be given a FD that's connected to virtlogd.
virtlogd would simply write the data out to file and would be able to close
and re-open files to integrate with logrotate.
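The "pre-opened file descriptor" model can be illustrated with ordinary Unix fd passing. The sketch below uses Python's socket.send_fds / socket.recv_fds (Python 3.9+, Unix only) and assumes a plain pipe as the transport. It is an analogy only, not how libvirt or virtlogd are implemented, and the function names are hypothetical.

```python
import os
import socket


def hand_out_log_fd(control_sock):
    """Log-daemon role: make a pipe, keep the read end, send the write end."""
    read_fd, write_fd = os.pipe()
    socket.send_fds(control_sock, [b"log"], [write_fd])
    # The descriptor in flight keeps the pipe open, so our copy can go.
    os.close(write_fd)
    return read_fd


def receive_log_fd(control_sock):
    """Consumer role (QEMU in the description above): get a ready-made FD."""
    msg, fds, _flags, _addr = socket.recv_fds(control_sock, 16, 1)
    if msg != b"log" or len(fds) != 1:
        raise RuntimeError("unexpected control message")
    return fds[0]
```

The consumer never sees a path name; where the bytes end up, and when files rotate, is decided entirely by whoever holds the read end.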

 For the sake of discussion I'll lay out my best guess right now on fixing this
 in qemu.
 
 qemu 2.2.0 /should/ release this year (the ETA is 2014-12-09[4]), so the fix I'm
 proposing would be available in qemu 2.3.0, which I think will be available in
 June/July 2015. So we'd be into 'L' development before this fix is available,
 and possibly 'M' before the community distros (Fedora and Ubuntu)[5] include it,
 and almost certainly longer for Enterprise distros. Along with the qemu
 development I expect there to be some libvirt development as well, but right now
 I don't think that's critical to the feature or this discussion.
 
 So if that timeline is approximately correct:
 
 - Can we wait this long to fix the bug, as opposed to having it squashed in Kilo?
 - What do we do in nova for the next ~12 months while we know there isn't a qemu
   to fix this?
 - Then once there is a qemu that fixes the issue, do we just say 'thou must use
   qemu 2.3.0', or would nova still need to support old and new qemus?

FWIW, by comparison libvirt is on a monthly release schedule, so a fix done in
libvirt has potential to be available sooner, though obviously there's bigger
dev work to be done in libvirt for this.

Regards,
Daniel
-- 
|: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o- http://virt-manager.org :|
|: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc :|



Re: [openstack-dev] [nova] Fixing the console.log grows forever bug.

2014-12-08 Thread Dave Walker
On 8 December 2014 at 10:33, Daniel P. Berrange berra...@redhat.com wrote:
 [...]

Hey,

This thread started by suggesting having a scheduled task to read from
a unix socket.  I don't think this can really be considered an
acceptable fix, as the guest does indeed lock up when the buffer is
full.
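The point that a reader falling behind stalls the writer is ordinary back-pressure, and it can be seen with a plain pipe that nobody drains. This is a generic illustration assuming a pipe-like transport; it does not reproduce QEMU's character-device behaviour, and bytes_until_full is a hypothetical name.

```python
import os


def bytes_until_full(chunk=b"x" * 4096):
    """Fill a pipe nobody reads from; return how much fit before it was full.

    The write end is made non-blocking so we get BlockingIOError instead of
    hanging. With a blocking FD the next write would simply wait forever,
    which is the stall a guest sees when nothing drains its console.
    """
    read_fd, write_fd = os.pipe()
    os.set_blocking(write_fd, False)
    written = 0
    try:
        while True:
            try:
                written += os.write(write_fd, chunk)
            except BlockingIOError:
                return written
    finally:
        os.close(read_fd)
        os.close(write_fd)
```

The exact figure depends on the OS pipe capacity; the point is only that it is finite.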

Initially, I proposed a quick fix for this back in 2011 which provided
a config option to enable a kernel level ring buffer via a
non-mainline module called emlog.  This was not merged for
understandable reasons.  (pre gerrit) -
http://bazaar.launchpad.net/~davewalker/nova/832507_with_emlog/revision/1509/nova/virt/libvirt/connection.py

Later that same year, Robie Basak presented a change which introduced
similar ring buffer logic in the nova code itself, making use of
eventlet. This seems quite a reasonable fix, but there was concern it
might lock up guests: https://review.openstack.org/#/c/706/

I think shortly after this, it was pretty widely agreed that Nova is not
the correct layer to fix this in. Personally, I struggle to see qemu or
libvirt as the right layer either. I don't think that treating a
console as a flat log file is the best default behavior.

I still quite like the emlog approach, as having a ringbuffer device
type in the kernel provides exactly what we need and is pretty simple
to implement.
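The ring-buffer semantics being discussed (a fixed amount of memory that always holds the most recent console output, overwriting the oldest bytes) can be sketched in userspace Python. This illustrates the idea under assumed semantics; it is not emlog's kernel code nor the eventlet-based change mentioned above, and ByteRingBuffer is a hypothetical name.

```python
class ByteRingBuffer:
    """Keep only the most recent `capacity` bytes; older data is overwritten."""

    def __init__(self, capacity):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self._buf = bytearray(capacity)
        self._capacity = capacity
        self._start = 0  # index of the oldest valid byte
        self._size = 0   # number of valid bytes

    def write(self, data):
        # Only the tail of an oversized write can survive.
        if len(data) >= self._capacity:
            self._buf[:] = data[-self._capacity:]
            self._start = 0
            self._size = self._capacity
            return
        for b in data:
            end = (self._start + self._size) % self._capacity
            self._buf[end] = b
            if self._size < self._capacity:
                self._size += 1
            else:
                # Full: we just overwrote the oldest byte, so advance past it.
                self._start = (self._start + 1) % self._capacity

    def read(self):
        """Return the buffered bytes, oldest first."""
        end = self._start + self._size
        if end <= self._capacity:
            return bytes(self._buf[self._start:end])
        return bytes(self._buf[self._start:] + self._buf[:end - self._capacity])
```

For example, an 8-byte buffer that receives b"hello" and then b" world" holds b"lo world".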

Does anyone know if this generic ringbuffer kernel support was
proposed to mainline kernel?

--
Kind Regards,
Dave Walker



Re: [openstack-dev] [nova] Fixing the console.log grows forever bug.

2014-12-08 Thread Daniel P. Berrange
On Mon, Dec 08, 2014 at 01:20:19PM +, Dave Walker wrote:
 [...]
 
 I still quite like the emlog approach, as having a ringbuffer device
 type in the kernel provides exactly what we need and is pretty simple
 to implement.
 
 Does anyone know if this generic ringbuffer kernel support was
 proposed to mainline kernel?

The emlog approach means the data would only ever be stored in RAM on the
host, so in the event of a host reboot/crash you lose all guest logs.
While that might be ok for some people, I think we need to support the
persistent store of the logs on disk for historical / auditing record
purposes.

We don't need kernel support to provide a ring buffer. A more or less
identical solution can be done in userspace with just a pair of fixed-size
files, e.g. write to one file; when it hits a limit, switch to the
second file, then back to the original, [...]
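The two-file scheme can be sketched as below. The message is cut off in this archive, so the details are assumptions: the file names `<base>.0` / `<base>.1`, splitting writes exactly at the limit, and truncating a file when switching back to it. TwoFileRingLog is a hypothetical name.

```python
import os


class TwoFileRingLog:
    """Bounded on-disk log made of two files that are written alternately.

    Assumed behaviour: when the active file reaches `limit` bytes, writing
    switches to the other file and truncates it first. Disk use is therefore
    capped at 2 * limit bytes, and once the first switch has happened the
    pair always holds at least the most recent `limit` bytes.
    """

    def __init__(self, base_path, limit):
        if limit <= 0:
            raise ValueError("limit must be positive")
        self.paths = (base_path + ".0", base_path + ".1")
        self.limit = limit
        self.active = 0
        self._used = 0
        # Start clean: drop a leftover second file from an earlier run.
        if os.path.exists(self.paths[1]):
            os.remove(self.paths[1])
        self._fh = open(self.paths[0], "wb")

    def write(self, data):
        while data:
            room = self.limit - self._used
            if room == 0:
                self._switch()
                room = self.limit
            part, data = data[:room], data[room:]
            self._fh.write(part)
            self._used += len(part)
        self._fh.flush()

    def _switch(self):
        self._fh.close()
        self.active = 1 - self.active
        # Mode "wb" truncates the older file before it is reused.
        self._fh = open(self.paths[self.active], "wb")
        self._used = 0

    def read_all(self):
        """Return the retained output, older file first."""
        if not self._fh.closed:
            self._fh.flush()
        out = b""
        for path in (self.paths[1 - self.active], self.paths[self.active]):
            if os.path.exists(path):
                with open(path, "rb") as f:
                    out += f.read()
        return out

    def close(self):
        self._fh.close()
```

Unlike the RAM-only approach, the retained output here survives a host reboot, at the cost of a fixed amount of disk per guest.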

Re: [openstack-dev] [nova] Fixing the console.log grows forever bug.

2014-12-07 Thread Tim Bell
 -Original Message-
 From: Tony Breeds [mailto:t...@bakeyournoodle.com]
 Sent: 06 December 2014 06:39
 To: openstack-dev@lists.openstack.org
 Subject: [openstack-dev] [nova] Fixing the console.log grows forever bug.
 
...
 
 However I was encouraged to investigate fixing this in qemu, such that qemu
 could process the HUP and make life better for all. This is certainly doable and
 I'm happy[3] to do this work. I've floated the idea past qemu-devel and they
 seem okay with the idea. My main concerns are the lag and supporting qemu/libvirt
 versions that can't handle this option.
 

Would the nova view console be able to see the older versions also? Ideally,
we'd also improve on the current situation where the console contents are
limited to the current file, which causes problems around hard reboot operations
such as watchdog restarts. Thus, if qemu is logrotating the log files, the view
console OpenStack operations would ideally be able to include all the rotated
files as part of the console output.
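Including rotated files in the console output could look roughly like the sketch below. It assumes logrotate-style numeric suffixes without compression (`console.log.1` newest, highest number oldest); the function name and the `max_bytes` default are hypothetical and not taken from Nova.

```python
import glob
import os
import re


def rotated_console_output(log_path, max_bytes=100 * 1024):
    """Return the tail of the live console log plus its rotated copies.

    Assumes <log>.1 is the newest rotated copy and the highest number is
    the oldest; compressed copies (e.g. .gz) are ignored.
    """
    if max_bytes <= 0:
        return b""
    suffix = re.compile(re.escape(os.path.basename(log_path)) + r"\.(\d+)")
    rotated = []
    for candidate in glob.glob(glob.escape(log_path) + ".*"):
        m = suffix.fullmatch(os.path.basename(candidate))
        if m:
            rotated.append((int(m.group(1)), candidate))
    # Highest number first, i.e. oldest output first; the live file goes last.
    paths = [path for _, path in sorted(rotated, reverse=True)]
    if os.path.exists(log_path):
        paths.append(log_path)
    data = b""
    for path in paths:
        with open(path, "rb") as f:
            data += f.read()
    return data[-max_bytes:]
```

For simplicity this reads every file in full before trimming; a real implementation would read only as much from the end as `max_bytes` requires.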

 For the sake of discussion I'll lay out my best guess right now on fixing this in
 qemu.
 
 qemu 2.2.0 /should/ release this year (the ETA is 2014-12-09[4]), so the fix I'm
 proposing would be available in qemu 2.3.0, which I think will be available in
 June/July 2015. So we'd be into 'L' development before this fix is available,
 and possibly 'M' before the community distros (Fedora and Ubuntu)[5] include it,
 and almost certainly longer for Enterprise distros. Along with the qemu
 development I expect there to be some libvirt development as well, but right now
 I don't think that's critical to the feature or this discussion.
 
 So if that timeline is approximately correct:
 
 - Can we wait this long to fix the bug, as opposed to having it squashed in Kilo?
 - What do we do in nova for the next ~12 months while we know there isn't a qemu
   to fix this?
 - Then once there is a qemu that fixes the issue, do we just say 'thou must use
   qemu 2.3.0', or would nova still need to support old and new qemus?
 

Can we just say that the console for qemu 2.2 would remain as it is currently,
and that for the new functionality you need qemu 2.3?
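This suggestion amounts to a version gate: keep the current console behaviour on older qemu and switch to the new mode only when the host's qemu is new enough. A hypothetical sketch, assuming the fix lands in qemu 2.3.0 as estimated above and that versions are plain dotted numbers; the constant, function names and mode strings are illustrative, not Nova's.

```python
# Hypothetical threshold: the release expected to carry the qemu fix.
MIN_QEMU_FOR_ROTATING_CONSOLE = (2, 3, 0)


def parse_version(text):
    """Turn a plain dotted version such as '2.3' or '2.2.0' into a 3-tuple."""
    parts = [int(p) for p in text.split(".")[:3]]
    parts += [0] * (3 - len(parts))
    return tuple(parts)


def console_log_mode(qemu_version):
    """Keep today's behaviour on older qemu; opt in to the new one on 2.3+."""
    if parse_version(qemu_version) >= MIN_QEMU_FOR_ROTATING_CONSOLE:
        return "rotating"
    return "legacy-file"
```

Comparing integer tuples rather than strings matters here: as strings, "2.10" sorts before "2.3".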

 [1] https://bugs.launchpad.net/nova/+bug/832507
 [2] https://review.openstack.org/#/c/80865/
 [3] For some value of happy ;P
 [4] From http://wiki.qemu.org/Planning/2.2
 [5] Debian and Gentoo are a little harder to quantify in this scenario but no
 less important.
 
 Yours Tony.
 
 PS: If any of you have a secret laundry list of things qemu should do to make
 life easier for nova, put them on a wiki page so we can discuss them.
 PPS: If this is going to be a thing we do (write features and fixes in qemu),
 we're going to need a consistent plan for how we cope with that.