On 29.09.2023 15:01, Tamas K Lengyel wrote:
> On Thu, Sep 28, 2023 at 9:19 AM Jan Beulich <jbeul...@suse.com> wrote:
>>
>> On 28.09.2023 14:57, Tamas K Lengyel wrote:
>>> On Thu, Sep 28, 2023 at 7:08 AM Roger Pau Monné <roger....@citrix.com> 
>>> wrote:
>>>> On Thu, Sep 28, 2023 at 12:11:02PM +0200, Jan Beulich wrote:
>>>>> On 28.09.2023 11:51, Roger Pau Monné wrote:
>>>>>> On Thu, Sep 28, 2023 at 09:16:20AM +0200, Jan Beulich wrote:
>>>>>>> +        /*
>>>>>>> +         * Map the area into the guest. For simplicity specify the 
>>>>>>> entire range
>>>>>>> +         * up to the end of the page: All the function uses it for is 
>>>>>>> to check
>>>>>>> +         * that the range doesn't cross page boundaries. Having the 
>>>>>>> area mapped
>>>>>>> +         * in the original domain implies that it fits there and 
>>>>>>> therefore will
>>>>>>> +         * also fit in the clone.
>>>>>>> +         */
>>>>>>> +        offset = PAGE_OFFSET(d_area->map);
>>>>>>> +        ret = map_guest_area(cd_vcpu, gfn_to_gaddr(gfn) + offset,
>>>>>>> +                             PAGE_SIZE - offset, cd_area, NULL);
>>>>>>> +        if ( ret )
>>>>>>> +            return ret;
>>>>>>> +    }
>>>>>>> +    else
>>>>>>> +        cd_mfn = page_to_mfn(cd_area->pg);
>>>>>>> +
>>>>>>> +    copy_domain_page(cd_mfn, d_mfn);
>>>>>>
>>>>>> I think the page copy should be done only once, when the page is
>>>>>> populated on the child p2m.  Otherwise areas smaller than a page size
>>>>>> (like vpcu_time_info_t) that share the same page will get multiple
>>>>>> copies of the same data for no reason.
>>>>>
>>>>> I think you're right, but this would then be another issue in the original
>>>>> code that I merely didn't spot (and it's not just "copy for no reason",
>>>>> we'd actually corrupt what was put there before). IOW the copying needs to
>>>>> move ahead of map_guest_area() (or yet more precisely after the error
>>>>> checking for p2m->set_entry()), and in the original code it would have
>>>>> needed to live ahead of map_vcpu_info(). Once again I'd like Tamas to
>>>>> confirm (or otherwise) before making that change, though.
>>>>
>>>> Yes, it's already an issue in the current code.  I wonder whether
>>>> logic in the guest or Xen could malfunctions due to the fact that
>>>> map_vcpu_info() unconditionally sets evtchn_upcall_pending and injects
>>>> an event channel upcall, but the later call to copy_domain_page()
>>>> might unset evtchn_upcall_pending while the vector is already injected.
>>>
>>> Sorry but I really don't follow the discussion here. My understanding
>>> was that map_vcpu_info, as its name suggests, maps the page. We use it
>>> to map a new page into that position in case the fork hasn't set it up
>>> yet but the parent has one. Then we follow with the copy from the
>>> parent so the page content is matching. If there is already a
>>> vcpu_info page in the fork, we just do the copy.
>>>
>>> Now, if map_vcpu_info does more then mapping, then I don't know what
>>> it does, why it does it, and what happens if we skip it when the fork
>>> is reset for example. Is the suggestion to call it map_vcpu_info every
>>> time the page content is reset (ie after the copy)?
>>
>> The vCPU info area (already prior to this series) and the two other areas
>> can be updated by the hypervisor at any time. Once one such area is
>> registered within a certain page, if another such area happens to live in
>> the same page, copying the entire page again would overwrite all updates
>> that might already have been made for the first area. IOW copying ought
>> to - imo - happen exactly once, when the new page is allocated.
>>
>> As to map_vcpu_info() - just look at the function: It writes to the newly
>> registered area. Even if the function name says just "map", that's an
>> integral part of the operation. We can't just map it, but leave the area
>> untouched.
> 
> The domains are paused (both the parent and the child) when a new fork
> is being made and also during fork reset. If Xen really does write to
> these memory areas "any time", even if the domain is paused, we will
> need a lock to tell Xen not touch it cause it hasn't finished being
> constructed/reset yet.

It's not really "any" time, but "any time the vCPU is (about to) run(ning)".

> Moreover, not sure what Xen is writing to these
> areas, but if it's anything "stateful" it should be discarded or
> copied from the parent because the guest state must match the parent
> after the fork/reset op.

How that? Both are running independently once unpaused, and the data
written is derived from various bits of system and guest state. I see
no reason why the initial populating would need to clone the parent's
when immediately afterwards the data would be overwritten by child-
specific values.

Jan

Reply via email to