On Tue, Mar 24, 2015 at 4:54 AM, Andres Lagar Cavilla <and...@lagarcavilla.org> wrote:
> On Mon, Mar 23, 2015 at 11:25 AM, Tamas K Lengyel <tkleng...@sec.in.tum.de> wrote:
>
>> On Mon, Mar 23, 2015 at 6:59 PM, Andres Lagar Cavilla <and...@lagarcavilla.org> wrote:
>>
>>> On Mon, Mar 23, 2015 at 9:10 AM, Tamas K Lengyel <tkleng...@sec.in.tum.de> wrote:
>>>
>>>> Hello everyone,
>>>> I'm trying to chase down a bug that reproducibly crashes Xen (tested
>>>> with 4.4.1). The problem is somewhere within the mem-sharing subsystem
>>>> and how it interacts with domains that are being actively saved. In my
>>>> setup I use the xl toolstack to rapidly create clones of HVM domains by
>>>> piping "xl save -c" into xl restore with a modified domain config which
>>>> updates the name/disk/vif. However, during such an operation Xen
>>>> crashes with the following log if there are already active clones.
>>>>
>>>> IMHO there should be no conflict between saving the domain and
>>>> memsharing, as long as the domain is actually just being checkpointed
>>>> ("-c"): its memory should remain as is. This is clearly not the case,
>>>> however. Any ideas?
>>>
>>> Tamas, I'm not clear on the use of memsharing in this workflow. As
>>> described, you pipe save into restore, but the internal magic is lost
>>> on me. Are you fanning out to multiple restores? That would seem to be
>>> the case, given the need to update name/disk/vif.
>>>
>>> Anyway, I'm inferring. Instead, could you elaborate?
>>>
>>> Thanks
>>> Andre
>>
>> Hi Andre,
>> thanks for getting back on this issue. The script I'm using is at
>> https://github.com/tklengyel/drakvuf/blob/master/tools/clone.pl. The
>> script simply creates a FIFO pipe (mkfifo) and saves the domain into
>> that pipe, which is immediately read by xl restore with the updated
>> configuration file. This is mainly just to eliminate having to read the
>> memory dump from disk. That part of the system works as expected, and
>> multiple save/restores running at the same time don't cause any
>> side-effects.
>> Once the domain has thus been cloned, I run memshare on every page,
>> which also works as expected. The problem only occurs when the cloning
>> procedure runs while a page unshare operation kicks in on an already
>> active clone (as you see in the log).
>
> Sorry Tamas, I'm a bit slow here. I looked at your script -- looks
> all right, no mention of memsharing in there.
>
> Re-reading ... memsharing? memshare? Is this memshrtool in tools/testing?
> How are you running it?

Hi Andre,
the memsharing happens here:
https://github.com/tklengyel/drakvuf/blob/master/src/main.c#L144
after the clone script has finished. This is effectively the same approach
as in tools/testing, just automatically looping from 0 to max_gpfn.
Afterwards, all unsharing happens automatically, either induced by the
guest itself or when I map pages into my app with xc_map_foreign_range
PROT_WRITE.

> Certainly no Xen crash should happen with user-space input. I'm just
> trying to understand what you're doing. The unshare code is not, uhmm,
> brief, so a NULL deref could happen in half a dozen places at first
> glance.

Well, let me know what I could do to help trace it down. I don't think
(potentially buggy) userspace tools should crash Xen either =)

Tamas

> Thanks
> Andres
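For reference, the FIFO-based clone flow described in the thread can be sketched as below. The xl commands are stood in for by printf and cat so the piping mechanics can run anywhere; the domain name and config path in the comments are made up. Treat this as a sketch of the mechanism, not the real clone.pl:

```shell
# Sketch of the clone flow: "xl save -c <dom>" writes the checkpoint into
# a FIFO that "xl restore <cfg>" reads concurrently, so the image never
# touches disk. Placeholder commands stand in for xl here.

fifo=$(mktemp -u)   # pick an unused pathname for the FIFO
mkfifo "$fifo"

# Writer side -- the real flow runs: xl save -c somedomain "$fifo"
printf 'fake-domain-image' > "$fifo" &

# Reader side -- the real flow runs: xl restore clone.cfg < "$fifo"
# The open-for-read unblocks the writer; both sides stream concurrently.
image=$(cat "$fifo")

wait                # reap the background writer
rm "$fifo"
echo "restored: $image"
```

Because the writer blocks until a reader opens the FIFO, the save and the restore proceed in lockstep, which is why no on-disk image is ever needed.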
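The sharing pass Tamas describes (looping from 0 to max_gpfn after the clone is up) can be sketched as a dry run in terms of the memshrtool commands from the tools tree. The domids, the tiny max_gpfn, the handle placeholders, and the share argument order are all assumptions here, so check them against the real memshrtool usage output; this script only prints the commands such a loop would issue:

```shell
# Dry-run sketch of sharing every page of a clone against its origin:
# nominate the gfn in both domains, then share the pair. Commands are
# printed, not executed; CLONE_HANDLE/ORIGIN_HANDLE are placeholders for
# the handles a real nominate would report.

origin_dom=1   # made-up domid of the origin domain
clone_dom=2    # made-up domid of the clone
max_gpfn=3     # tiny stand-in; the real loop runs to the domain's max_gpfn

cmds=0
gfn=0
while [ "$gfn" -le "$max_gpfn" ]; do
    echo "memshrtool nominate $origin_dom $gfn"
    echo "memshrtool nominate $clone_dom $gfn"
    echo "memshrtool share $clone_dom $gfn CLONE_HANDLE $origin_dom $gfn ORIGIN_HANDLE"
    cmds=$((cmds + 3))
    gfn=$((gfn + 1))
done
echo "$cmds commands for $gfn gfns"
```

After this pass every clone page is backed by the origin's frame, and any write (by the guest, or via a PROT_WRITE foreign mapping) triggers the unshare path discussed in the thread.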
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel