Re: [openstack-dev] [nova] Fixing the console.log grows forever bug.
On Mon, Dec 08, 2014 at 01:20:19PM +, Dave Walker wrote: > On 8 December 2014 at 10:33, Daniel P. Berrange wrote: > > On Sat, Dec 06, 2014 at 04:38:52PM +1100, Tony Breeds wrote: > >> Hi All, > >> In the most recent team meeting we briefly discussed: [1] where the > >> console.log grows indefinitely, eventually causing guest stalls. I > >> mentioned > >> that I was working on a spec to fix this issue. > >> > >> My original plan was fairly similar to [2] In that we'd switch > >> libvirt/qemu to > >> using a unix domain socket and write a simple helper to read from that > >> socket > >> and write to disk. That helper would close and reopen the on disk file > >> upon > >> receiving a HUP (so logrotate just works). Life would be good. and we > >> could > >> all move on. > >> > >> However I was encouraged to investigate fixing this in qemu, such that qemu > >> could process the HUP and make life better for all. This is certainly > >> doable > >> and I'm happy[3] to do this work. I've floated the idea past qemu-devel > >> and > >> they seem okay with the idea. My main concern is in lag and supporting > >> qemu/libvirt that can't handle this option. > > > > As mentioned in my reply on qemu-devel, I think the right long term solution > > for this is to fix it in libvirt. We have a general security goal to remove > > QEMU's ability to open any files whatsoever, instead having it receive all > > host resources as pre-opened file descriptors from libvirt. So what we > > anticipate is a new libvirt daemon for processing logs, virtlogd. Anywhere > > where QEMU currently gets a file to log to ( devices, and its > > stdout/stderr), it would instead be given a FD that's connected to virtlogd. > > virtlogd would simply write the data out to file & would be able to close > > & re-open files to integrate with logrotate. > > > >> For the sake of discussion I'll lay out my best guess right now on fixing > >> this > >> in qemu. 
> >> > >> qemu 2.2.0 /should/ release this year the ETA is 2014-12-09[4] so the fix > >> I'm > >> proposing would be available in qemu 2.3.0 which I think will be available > >> in > >> June/July 2015. So we'd be into 'L' development before this fix is > >> available > >> and possibly 'M' before the community distros (Fedora and Ubuntu)[5] > >> include > >> and almost certainly longer for Enterprise distros. Along with the qemu > >> development I expect there to be some libvirt development as well but > >> right now > >> I don't think that's critical to the feature or this discussion. > >> > >> So if that timeline is approximately correct: > >> > >> - Can we wait this long to fix the bug? As opposed to having it squashed > >> in Kilo. > >> - What do we do in nova for the next ~12 months while know there isn't a > >> qemu to fix this? > >> - Then once there is a qemu that fixes the issue, do we just say 'thou > >> must use > >> qemu 2.3.0' or would nova still need to support old and new qemu's ? > > > > FWIW, by comparison libvirt is on a monthly release schedule, so a fix done > > in > > libvirt has potential to be available sooner, though obviously there's > > bigger > > dev work to be done in libvirt for this. > > > > Regards, > > Daniel > > Hey, > > This thread started by suggesting having a scheduled task to read from > a unix socket. I don't think this can really be considered an > acceptable fix, as the guest does indeed lock up when the buffer is > full. > > Initially, I proposed a quick fix for this back in 2011 which provided > a config option to enable a kernel level ring buffer via a > non-mainline module called emlog. This was not merged for > understandable reasons. (pre gerrit) - > http://bazaar.launchpad.net/~davewalker/nova/832507_with_emlog/revision/1509/nova/virt/libvirt/connection.py > > Later that same year, Robie Basak presented a change which introduced > similar logic ringbuffer support in the nova code itself making use of > eventlet. 
> This seems quite a reasonable fix, but there was concern it
> might lock-up guests.. https://review.openstack.org/#/c/706/
>
> I think shortly after this, it was pretty widely agreed that fixing
> this in Nova is not the correct layer. Personally, I struggle
> thinking qemu or libvirt is right layer either. I can't think that
> treating a console as a flat log file is the best default behavior.
>
> I still quite like the emlog approach, as having a ringbuffer device
> type in the kernel provides exactly what we need and is pretty simple
> to implement.
>
> Does anyone know if this generic ringbuffer kernel support was
> proposed to mainline kernel?

The emlog approach means the data would only ever be stored in RAM on the host, so in the event of a host reboot/crash you lose all guest logs. While that might be OK for some people, I think we need to support persistent storage of the logs on disk for historical and auditing purposes.

We don't need kernel support to provide a ring buffer. An
Re: [openstack-dev] [nova] Fixing the console.log grows forever bug.
On 8 December 2014 at 10:33, Daniel P. Berrange wrote: > On Sat, Dec 06, 2014 at 04:38:52PM +1100, Tony Breeds wrote: >> Hi All, >> In the most recent team meeting we briefly discussed: [1] where the >> console.log grows indefinitely, eventually causing guest stalls. I mentioned >> that I was working on a spec to fix this issue. >> >> My original plan was fairly similar to [2] In that we'd switch libvirt/qemu >> to >> using a unix domain socket and write a simple helper to read from that socket >> and write to disk. That helper would close and reopen the on disk file upon >> receiving a HUP (so logrotate just works). Life would be good. and we could >> all move on. >> >> However I was encouraged to investigate fixing this in qemu, such that qemu >> could process the HUP and make life better for all. This is certainly doable >> and I'm happy[3] to do this work. I've floated the idea past qemu-devel and >> they seem okay with the idea. My main concern is in lag and supporting >> qemu/libvirt that can't handle this option. > > As mentioned in my reply on qemu-devel, I think the right long term solution > for this is to fix it in libvirt. We have a general security goal to remove > QEMU's ability to open any files whatsoever, instead having it receive all > host resources as pre-opened file descriptors from libvirt. So what we > anticipate is a new libvirt daemon for processing logs, virtlogd. Anywhere > where QEMU currently gets a file to log to ( devices, and its > stdout/stderr), it would instead be given a FD that's connected to virtlogd. > virtlogd would simply write the data out to file & would be able to close > & re-open files to integrate with logrotate. > >> For the sake of discussion I'll lay out my best guess right now on fixing >> this >> in qemu. >> >> qemu 2.2.0 /should/ release this year the ETA is 2014-12-09[4] so the fix I'm >> proposing would be available in qemu 2.3.0 which I think will be available in >> June/July 2015. 
>> So we'd be into 'L' development before this fix is available
>> and possibly 'M' before the community distros (Fedora and Ubuntu)[5] include
>> and almost certainly longer for Enterprise distros. Along with the qemu
>> development I expect there to be some libvirt development as well but right now
>> I don't think that's critical to the feature or this discussion.
>>
>> So if that timeline is approximately correct:
>>
>> - Can we wait this long to fix the bug? As opposed to having it squashed in
>>   Kilo.
>> - What do we do in nova for the next ~12 months while know there isn't a
>>   qemu to fix this?
>> - Then once there is a qemu that fixes the issue, do we just say 'thou must use
>>   qemu 2.3.0' or would nova still need to support old and new qemu's ?
>
> FWIW, by comparison libvirt is on a monthly release schedule, so a fix done in
> libvirt has potential to be available sooner, though obviously there's bigger
> dev work to be done in libvirt for this.
>
> Regards,
> Daniel

Hey,

This thread started by suggesting having a scheduled task to read from a unix socket. I don't think this can really be considered an acceptable fix, as the guest does indeed lock up when the buffer is full.

Initially, I proposed a quick fix for this back in 2011 which provided a config option to enable a kernel-level ring buffer via a non-mainline module called emlog. This was not merged, for understandable reasons (pre-gerrit):
http://bazaar.launchpad.net/~davewalker/nova/832507_with_emlog/revision/1509/nova/virt/libvirt/connection.py

Later that same year, Robie Basak presented a change which introduced similar ring buffer logic in the nova code itself, making use of eventlet. This seems quite a reasonable fix, but there was concern it might lock up guests:
https://review.openstack.org/#/c/706/

I think shortly after this, it was pretty widely agreed that Nova is not the correct layer in which to fix this. Personally, I struggle to see qemu or libvirt as the right layer either.
I can't think that treating a console as a flat log file is the best default behavior.

I still quite like the emlog approach, as having a ring buffer device type in the kernel provides exactly what we need and is pretty simple to implement.

Does anyone know if this generic ring buffer kernel support was proposed to the mainline kernel?

--
Kind Regards,
Dave Walker

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
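For readers unfamiliar with the ring buffer behaviour under discussion, here is a minimal userspace sketch in Python. It is not emlog and not the change from review 706; the class name `ByteRing` and its methods are hypothetical, chosen only for illustration. It keeps the most recent `capacity` bytes of console output in memory and silently discards the oldest data, so the writer never blocks on a full buffer.

```python
class ByteRing:
    """Hypothetical bounded buffer: keeps only the newest `capacity` bytes."""

    def __init__(self, capacity):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self._buf = bytearray()

    def write(self, data):
        # Append the new data, then trim the oldest bytes past capacity.
        self._buf += data
        overflow = len(self._buf) - self.capacity
        if overflow > 0:
            del self._buf[:overflow]

    def read(self):
        # Return a snapshot of the retained console output.
        return bytes(self._buf)
```

Note that this sketch holds data in RAM only, so the contents would not survive a restart of the process holding the buffer.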
Re: [openstack-dev] [nova] Fixing the console.log grows forever bug.
On Sat, Dec 06, 2014 at 04:38:52PM +1100, Tony Breeds wrote:
> Hi All,
> In the most recent team meeting we briefly discussed: [1] where the
> console.log grows indefinitely, eventually causing guest stalls. I mentioned
> that I was working on a spec to fix this issue.
>
> My original plan was fairly similar to [2] In that we'd switch libvirt/qemu to
> using a unix domain socket and write a simple helper to read from that socket
> and write to disk. That helper would close and reopen the on disk file upon
> receiving a HUP (so logrotate just works). Life would be good. and we could
> all move on.
>
> However I was encouraged to investigate fixing this in qemu, such that qemu
> could process the HUP and make life better for all. This is certainly doable
> and I'm happy[3] to do this work. I've floated the idea past qemu-devel and
> they seem okay with the idea. My main concern is in lag and supporting
> qemu/libvirt that can't handle this option.

As mentioned in my reply on qemu-devel, I think the right long-term solution for this is to fix it in libvirt. We have a general security goal to remove QEMU's ability to open any files whatsoever, instead having it receive all host resources as pre-opened file descriptors from libvirt. So what we anticipate is a new libvirt daemon for processing logs, virtlogd. Anywhere that QEMU currently gets a file to log to ( devices, and its stdout/stderr), it would instead be given an FD that's connected to virtlogd. virtlogd would simply write the data out to a file and would be able to close and re-open files to integrate with logrotate.

> For the sake of discussion I'll lay out my best guess right now on fixing this
> in qemu.
>
> qemu 2.2.0 /should/ release this year the ETA is 2014-12-09[4] so the fix I'm
> proposing would be available in qemu 2.3.0 which I think will be available in
> June/July 2015.
> So we'd be into 'L' development before this fix is available
> and possibly 'M' before the community distros (Fedora and Ubuntu)[5] include
> and almost certainly longer for Enterprise distros. Along with the qemu
> development I expect there to be some libvirt development as well but right now
> I don't think that's critical to the feature or this discussion.
>
> So if that timeline is approximately correct:
>
> - Can we wait this long to fix the bug? As opposed to having it squashed in
>   Kilo.
> - What do we do in nova for the next ~12 months while know there isn't a qemu
>   to fix this?
> - Then once there is a qemu that fixes the issue, do we just say 'thou must use
>   qemu 2.3.0' or would nova still need to support old and new qemu's ?

FWIW, by comparison, libvirt is on a monthly release schedule, so a fix done in libvirt has the potential to be available sooner, though obviously there is more development work to be done in libvirt for this.

Regards,
Daniel
--
|: http://berrange.com -o- http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org -o- http://virt-manager.org :|
|: http://autobuild.org -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org -o- http://live.gnome.org/gtk-vnc :|
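The pre-opened file descriptor idea described in this message can be illustrated with a small Python sketch. This is not libvirt or virtlogd code: the function name `run_with_preopened_fd` and the stand-in child program are hypothetical. The parent plays the role of the log daemon, owning the log file; the child plays the role of QEMU, opening no files itself and only writing to the descriptor it was handed. POSIX only.

```python
import os
import subprocess
import sys

# Hypothetical stand-in for the guest process: it writes to the FD number
# given on its command line and never opens a file itself.
CHILD = "import os, sys; os.write(int(sys.argv[1]), b'guest console output\\n')"


def run_with_preopened_fd(log_path):
    """Spawn the child with a pre-opened pipe and copy its output to disk."""
    read_fd, write_fd = os.pipe()
    proc = subprocess.Popen(
        [sys.executable, "-c", CHILD, str(write_fd)],
        pass_fds=(write_fd,),  # only this descriptor is inherited
    )
    os.close(write_fd)  # the parent keeps just the read end
    with open(log_path, "ab") as log, os.fdopen(read_fd, "rb") as src:
        # Copy until the child closes its end of the pipe (EOF).
        for chunk in iter(lambda: src.read(4096), b""):
            log.write(chunk)
    return proc.wait()
```

Because only the parent ever opens the log file, it alone decides when to close and reopen it, which is the property that makes logrotate integration possible in the design sketched above.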
Re: [openstack-dev] [nova] Fixing the console.log grows forever bug.
Tony Breeds wrote:
> [...]
> So if that timeline is approximately correct:
>
> - Can we wait this long to fix the bug? As opposed to having it squashed in
>   Kilo.
> - What do we do in nova for the next ~12 months while know there isn't a qemu
>   to fix this?
> - Then once there is a qemu that fixes the issue, do we just say 'thou must use
>   qemu 2.3.0' or would nova still need to support old and new qemu's ?

Fixing it in qemu looks like the right way to fix this issue. If it were simple to fix, it would have been fixed already: this is one of our oldest bugs with security impact. So I'd say yes, this should be fixed in qemu, even if that takes a long time to propagate.

If someone finds an interesting way to work around this issue in Nova, then by all means add the workaround to Kilo and deprecate it once we can assume everyone has moved to a newer qemu. But given that this bug has been around for 3 years, I wouldn't hold my breath.

--
Thierry Carrez (ttx)
Re: [openstack-dev] [nova] Fixing the console.log grows forever bug.
On Sun, Dec 07, 2014 at 08:47:28AM +, Tim Bell wrote:

> Would the nova view console be able to see the older versions also ? Ideally,
> we'd also improve on the current situation where the console contents are
> limited to the current file which causes problems around hard reboot
> operations such as watchdog restarts. Thus, if qemu is logrotating the log
> files, the view console OpenStack operations would ideally be able to count
> all the rotated files as part of the console output.

So I think the TL;DR is: yup, we can do that, regardless of which process owns the logfile.

Having said that, I think there are at least 2 related topics in your question. As I see it, here are the 2 issues I know about:

 - Currently, if you restart an instance the console.log is overwritten, which
   means you lose console logs from older boots.
   * With the 'helper app' this issue wouldn't happen anymore.
   * With the qemu approach, extra code would need to be added to ensure we
     also close that bug.
 - nova console-log only shows the current boot.
   * Regardless of which approach we use to solve this bug, we'd need to
     enhance nova console-log to be able to detect other logfiles and display
     them.

I assume something similar would be needed for horizon. I don't think it'd be hard to do, but I'm not promising to hack on horizon.

> Can we just say that the console for qemu 2.2 would remain as currently and
> for the new functionality, you need qemu 2.3 ?

Yes, but that leaves operators using qemu < 2.3.0 open to this bug. The LP bug was opened about 3 years ago; I'm not sure if that's a problem. I just want to know how much and what work I'll be doing to fix this.

Yours Tony.
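As a rough illustration of what "detect other logfiles and display them" could look like, here is a minimal Python sketch. It is not nova code: the function names `rotated_logs` and `read_console` are hypothetical, and it assumes logrotate's default numeric naming (console.log.1 is the newest rotation, higher numbers are older) with no compression.

```python
import glob
import os


def rotated_logs(path):
    """Return `path` and its numbered rotations, oldest first."""
    numbered = []
    for candidate in glob.glob(glob.escape(path) + ".*"):
        suffix = candidate[len(path) + 1:]
        if suffix.isdigit():  # skip anything that is not a numbered rotation
            numbered.append((int(suffix), candidate))
    # Under the assumed naming, a higher number means an older file.
    ordered = [name for _, name in sorted(numbered, reverse=True)]
    if os.path.exists(path):
        ordered.append(path)
    return ordered


def read_console(path):
    """Concatenate all rotations into one bytes object, oldest data first."""
    chunks = []
    for name in rotated_logs(path):
        with open(name, "rb") as fh:
            chunks.append(fh.read())
    return b"".join(chunks)
```

A real implementation would also need to handle compressed rotations and a cap on how much output is returned; both are left out of this sketch.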
Re: [openstack-dev] [nova] Fixing the console.log grows forever bug.
> -----Original Message-----
> From: Tony Breeds [mailto:t...@bakeyournoodle.com]
> Sent: 06 December 2014 06:39
> To: openstack-dev@lists.openstack.org
> Subject: [openstack-dev] [nova] Fixing the console.log grows forever bug.
>
> ...
>
> However I was encouraged to investigate fixing this in qemu, such that qemu
> could process the HUP and make life better for all. This is certainly doable and
> I'm happy[3] to do this work. I've floated the idea past qemu-devel and they
> seem okay with the idea. My main concern is in lag and supporting qemu/libvirt
> that can't handle this option.
>

Would the nova view console be able to see the older versions as well? Ideally, we'd also improve on the current situation, where the console contents are limited to the current file, which causes problems around hard reboot operations such as watchdog restarts. Thus, if qemu is rotating the log files, the OpenStack view-console operations would ideally be able to include all the rotated files as part of the console output.

> For the sake of discussion I'll lay out my best guess right now on fixing this in
> qemu.
>
> qemu 2.2.0 /should/ release this year the ETA is 2014-12-09[4] so the fix I'm
> proposing would be available in qemu 2.3.0 which I think will be available in
> June/July 2015. So we'd be into 'L' development before this fix is available and
> possibly 'M' before the community distros (Fedora and Ubuntu)[5] include and
> almost certainly longer for Enterprise distros. Along with the qemu
> development I expect there to be some libvirt development as well but right now
> I don't think that's critical to the feature or this discussion.
>
> So if that timeline is approximately correct:
>
> - Can we wait this long to fix the bug? As opposed to having it squashed in
>   Kilo.
> - What do we do in nova for the next ~12 months while know there isn't a qemu
>   to fix this?
> - Then once there is a qemu that fixes the issue, do we just say 'thou must use
>   qemu 2.3.0' or would nova still need to support old and new qemu's ?
>

Can we just say that the console for qemu 2.2 would remain as it is currently, and that the new functionality requires qemu 2.3?

> [1] https://bugs.launchpad.net/nova/+bug/832507
> [2] https://review.openstack.org/#/c/80865/
> [3] For some value of happy ;P
> [4] From http://wiki.qemu.org/Planning/2.2
> [5] Debian and Gentoo are a little harder to quantify in this scenario but no
>     less important.
>
> Yours Tony.
>
> PS: If any of you have a secret laundry list of things qemu should do to make
> life easier for nova. Put them on a wiki page so we can discuss them.
> PPS: If this is going to be a thing we do (write features and fixes in qemu)
> we're going to need a consistent plan on how we cope with that.
[openstack-dev] [nova] Fixing the console.log grows forever bug.
Hi All,

In the most recent team meeting we briefly discussed [1], where the console.log grows indefinitely, eventually causing guest stalls. I mentioned that I was working on a spec to fix this issue.

My original plan was fairly similar to [2], in that we'd switch libvirt/qemu to using a unix domain socket and write a simple helper to read from that socket and write to disk. That helper would close and reopen the on-disk file upon receiving a HUP (so logrotate just works). Life would be good, and we could all move on.

However, I was encouraged to investigate fixing this in qemu, such that qemu could process the HUP and make life better for all. This is certainly doable and I'm happy[3] to do this work. I've floated the idea past qemu-devel and they seem okay with it. My main concerns are the lag and supporting qemu/libvirt versions that can't handle this option.

For the sake of discussion I'll lay out my best guess right now on fixing this in qemu.

qemu 2.2.0 /should/ release this year; the ETA is 2014-12-09[4]. So the fix I'm proposing would be available in qemu 2.3.0, which I think will be available in June/July 2015. So we'd be into 'L' development before this fix is available, and possibly 'M' before the community distros (Fedora and Ubuntu)[5] include it, and almost certainly longer for Enterprise distros. Along with the qemu development I expect there to be some libvirt development as well, but right now I don't think that's critical to the feature or this discussion.

So if that timeline is approximately correct:

 - Can we wait this long to fix the bug, as opposed to having it squashed in Kilo?
 - What do we do in nova for the next ~12 months while we know there isn't a qemu to fix this?
 - Then once there is a qemu that fixes the issue, do we just say 'thou must use qemu 2.3.0', or would nova still need to support old and new qemus?

[1] https://bugs.launchpad.net/nova/+bug/832507
[2] https://review.openstack.org/#/c/80865/
[3] For some value of happy ;P
[4] From http://wiki.qemu.org/Planning/2.2
[5] Debian and Gentoo are a little harder to quantify in this scenario but no less important.

Yours Tony.

PS: If any of you have a secret laundry list of things qemu should do to make life easier for nova, put them on a wiki page so we can discuss them.
PPS: If this is going to be a thing we do (write features and fixes in qemu), we're going to need a consistent plan on how we cope with that.
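The "simple helper" from the original plan above could look roughly like the following Python sketch. This is not the code from [2] and not a proposed nova patch: the names `ReopeningLog`, `pump_once` and `serve` are hypothetical, and the sketch assumes the helper connects to a socket that the hypervisor side is listening on (the roles could equally be reversed).

```python
import signal
import socket


class ReopeningLog:
    """Append-only log file that is closed and reopened on request."""

    def __init__(self, path):
        self.path = path
        self._reopen_requested = False
        self._fh = open(path, "ab")

    def request_reopen(self, signum=None, frame=None):
        # Safe to use as a signal handler: it only sets a flag.
        self._reopen_requested = True

    def write(self, data):
        if self._reopen_requested:
            # logrotate has renamed the old file; start a fresh one.
            self._fh.close()
            self._fh = open(self.path, "ab")
            self._reopen_requested = False
        self._fh.write(data)
        self._fh.flush()

    def close(self):
        self._fh.close()


def pump_once(sock, log, bufsize=4096):
    """Copy one chunk from the socket to the log; return False on EOF."""
    data = sock.recv(bufsize)
    if not data:
        return False
    log.write(data)
    return True


def serve(socket_path, log_path):
    """Connect to the console socket and copy it to disk until EOF."""
    log = ReopeningLog(log_path)
    signal.signal(signal.SIGHUP, log.request_reopen)
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as sock:
        sock.connect(socket_path)
        while pump_once(sock, log):
            pass
    log.close()
```

With this shape, a logrotate `postrotate` script would only need to send HUP to the helper's PID. The reopen is deferred until the next write, so no console data is lost between the rename and the signal.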