Re: [libvirt] [PATCH] qemu: Refresh capabilities when creating resctrl allocation
On Fri, Feb 02, 2018 at 14:57:54 +0100, Martin Kletzander wrote:
> On Fri, Feb 02, 2018 at 02:29:03PM +0100, Peter Krempa wrote:
> > On Fri, Feb 02, 2018 at 12:44:03 +0100, Martin Kletzander wrote:
> > > Since one of the things in capabilities (info from resctrl updated
> > > with data about caches) can be changed on the system by remounting
> > > the /sys/fs/resctrl with different options, the capabilities need to
> > > be refreshed.  There is a better fix in the works, but it's going to
> > > be way bigger than this (hence the XXX note there), so for the time
> > > being let's work around this.  And in order not to slow down the
> > > domain starting, only get the capabilities if there are any
> > > cachetunes.
> > >
> > > Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1540780
> >
> > This BZ describes a crash if the filesystem is remounted, but you are
> > attempting to fix this not by fixing the code that crashed but by
> > re-loading the information if possibly somebody remounted it.
> >
> > This does not seem to be the correct fix since you still have a race
> > window, where the options can be changed after the refresh is executed
> > and prior to using them in the code where it actually crashed.
>
> Yeah, I'm looking at that as well.  It will need a restructuring (moving
> some conf code to util - it'll also look nicer), but a fix for exactly
> what is happening here is enough for now.

Well, I'm not okay with selling this as a fix for the crash described in
the bugzilla.  I might be okay with doing this as a mitigation for stale
data, but this is not a fix for the crash in any way.

We have the same kind of issue (minus the crash) with hugetlbfs mount data
or host cpu maps in the numa host description and we don't refresh the
capabilities at every start of those.

So there's the problem that the cache filesystem data is stale and your
host might fail to start, which is inconvenient but not that big of a
problem.

The second issue is the crash if the data is stale and this certainly does
not fix that.

I will be okay with this patch (since it's in an unlikely code path) if you
rewrite the commit message and completely drop the mention of the bugzilla
above.  This patch simply does not fix that BZ.

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list

Re: [libvirt] [PATCH] qemu: Refresh capabilities when creating resctrl allocation
On Fri, Feb 02, 2018 at 02:29:03PM +0100, Peter Krempa wrote:
> On Fri, Feb 02, 2018 at 12:44:03 +0100, Martin Kletzander wrote:
> > Since one of the things in capabilities (info from resctrl updated
> > with data about caches) can be changed on the system by remounting the
> > /sys/fs/resctrl with different options, the capabilities need to be
> > refreshed.  There is a better fix in the works, but it's going to be
> > way bigger than this (hence the XXX note there), so for the time being
> > let's work around this.  And in order not to slow down the domain
> > starting, only get the capabilities if there are any cachetunes.
> >
> > Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1540780
>
> This BZ describes a crash if the filesystem is remounted, but you are
> attempting to fix this not by fixing the code that crashed but by
> re-loading the information if possibly somebody remounted it.
>
> This does not seem to be the correct fix since you still have a race
> window, where the options can be changed after the refresh is executed
> and prior to using them in the code where it actually crashed.

Yeah, I'm looking at that as well.  It will need a restructuring (moving
some conf code to util - it'll also look nicer), but a fix for exactly what
is happening here is enough for now.

Martin

Re: [libvirt] [PATCH] qemu: Refresh capabilities when creating resctrl allocation
On Fri, Feb 02, 2018 at 12:44:03 +0100, Martin Kletzander wrote:
> Since one of the things in capabilities (info from resctrl updated with
> data about caches) can be changed on the system by remounting the
> /sys/fs/resctrl with different options, the capabilities need to be
> refreshed.  There is a better fix in the works, but it's going to be way
> bigger than this (hence the XXX note there), so for the time being let's
> work around this.  And in order not to slow down the domain starting,
> only get the capabilities if there are any cachetunes.
>
> Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1540780

This BZ describes a crash if the filesystem is remounted, but you are
attempting to fix this not by fixing the code that crashed but by
re-loading the information if possibly somebody remounted it.

This does not seem to be the correct fix since you still have a race
window, where the options can be changed after the refresh is executed and
prior to using them in the code where it actually crashed.
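
The race window raised in the review can be illustrated with a small,
self-contained sketch.  This is not libvirt code: HostState, CachedCaps,
AllocationError and both allocate_* functions are hypothetical names made
up for the illustration, and a Python KeyError merely stands in for the
crash.  The assumption is only what the thread states: the host state can
change between the refresh and the point of use.

```python
class HostState:
    """Hypothetical stand-in for what is currently mounted at /sys/fs/resctrl."""

    def __init__(self, levels):
        self.levels = set(levels)

    def remount(self, levels):
        # Simulates an administrator remounting with different options.
        self.levels = set(levels)


class CachedCaps:
    """Hypothetical cached snapshot of the host state (the 'capabilities')."""

    def __init__(self, host):
        self.host = host
        self.levels = set()

    def refresh(self):
        self.levels = set(self.host.levels)


class AllocationError(Exception):
    """Controlled failure: the request is rejected instead of crashing."""


def allocate_trusting_cache(caps, host, level, between=None):
    caps.refresh()                      # the workaround: refresh before use
    if between is not None:
        between()                       # race window: host changes here
    if level not in caps.levels:
        raise AllocationError(f"cache level {level} not in capabilities")
    live = {lvl: f"schemata-L{lvl}" for lvl in host.levels}
    return live[level]                  # unchecked use; KeyError models the crash


def allocate_checking_at_use(caps, host, level, between=None):
    caps.refresh()
    if between is not None:
        between()                       # same race window as above
    live = {lvl: f"schemata-L{lvl}" for lvl in host.levels}
    entry = live.get(level)             # validate at the point of use
    if entry is None:
        raise AllocationError(f"cache level {level} not available on host")
    return entry
```

In this sketch the refresh narrows the window for stale data but does not
close it: if the remount happens in between, the first function still hits
the unchecked lookup, while the second one fails with a regular error
because the check sits in the code that actually uses the data.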