On 18.07.2019 15:13, Tamas K Lengyel wrote:
> On Thu, Jul 18, 2019 at 7:12 AM Jan Beulich <jbeul...@suse.com> wrote:
>>
>> On 18.07.2019 14:55, Tamas K Lengyel wrote:
>>> On Thu, Jul 18, 2019 at 4:47 AM Jan Beulich <jbeul...@suse.com> wrote:
>>>> On 17.07.2019 21:33, Tamas K Lengyel wrote:
>>>>> @@ -900,6 +895,7 @@ static int share_pages(struct domain *sd, gfn_t sgfn, shr_handle_t sh,
>>>>>         p2m_type_t smfn_type, cmfn_type;
>>>>>         struct two_gfns tg;
>>>>>         struct rmap_iterator ri;
>>>>> +    unsigned long put_count = 0;
>>>>>
>>>>>         get_two_gfns(sd, sgfn, &smfn_type, NULL, &smfn,
>>>>>                      cd, cgfn, &cmfn_type, NULL, &cmfn, 0, &tg);
>>>>> @@ -964,15 +960,6 @@ static int share_pages(struct domain *sd, gfn_t sgfn, shr_handle_t sh,
>>>>>             goto err_out;
>>>>>         }
>>>>>
>>>>> -    /* Acquire an extra reference, for the freeing below to be safe. */
>>>>> -    if ( !get_page(cpage, dom_cow) )
>>>>> -    {
>>>>> -        ret = -EOVERFLOW;
>>>>> -        mem_sharing_page_unlock(secondpg);
>>>>> -        mem_sharing_page_unlock(firstpg);
>>>>> -        goto err_out;
>>>>> -    }
>>>>> -
>>>>>         /* Merge the lists together */
>>>>>         rmap_seed_iterator(cpage, &ri);
>>>>>         while ( (gfn = rmap_iterate(cpage, &ri)) != NULL)
>>>>> @@ -984,13 +971,14 @@ static int share_pages(struct domain *sd, gfn_t sgfn, shr_handle_t sh,
>>>>>              * Don't change the type of rmap for the client page. */
>>>>>             rmap_del(gfn, cpage, 0);
>>>>>             rmap_add(gfn, spage);
>>>>> -        put_page_and_type(cpage);
>>>>> +        put_count++;
>>>>>             d = get_domain_by_id(gfn->domain);
>>>>>             BUG_ON(!d);
>>>>>             BUG_ON(set_shared_p2m_entry(d, gfn->gfn, smfn));
>>>>>             put_domain(d);
>>>>>         }
>>>>>         ASSERT(list_empty(&cpage->sharing->gfns));
>>>>> +    BUG_ON(!put_count);
>>>>>
>>>>>         /* Clear the rest of the shared state */
>>>>>         page_sharing_dispose(cpage);
>>>>> @@ -1001,7 +989,9 @@ static int share_pages(struct domain *sd, gfn_t sgfn, shr_handle_t sh,
>>>>>
>>>>>         /* Free the client page */
>>>>>         put_page_alloc_ref(cpage);
>>>>> -    put_page(cpage);
>>>>> +
>>>>> +    while ( put_count-- )
>>>>> +        put_page_and_type(cpage);
>>>>>
>>>>>         /* We managed to free a domain page. */
>>>>>         atomic_dec(&nr_shared_mfns);
>>>>> @@ -1165,19 +1155,13 @@ int __mem_sharing_unshare_page(struct domain *d,
>>>>>         {
>>>>>             if ( !last_gfn )
>>>>>                 mem_sharing_gfn_destroy(page, d, gfn_info);
>>>>> -        put_page_and_type(page);
>>>>> +
>>>>>             mem_sharing_page_unlock(page);
>>>>> +
>>>>>             if ( last_gfn )
>>>>> -        {
>>>>> -            if ( !get_page(page, dom_cow) )
>>>>> -            {
>>>>> -                put_gfn(d, gfn);
>>>>> -                domain_crash(d);
>>>>> -                return -EOVERFLOW;
>>>>> -            }
>>>>>                 put_page_alloc_ref(page);
>>>>> -            put_page(page);
>>>>> -        }
>>>>> +
>>>>> +        put_page_and_type(page);
>>>>>             put_gfn(d, gfn);
>>>>>
>>>>>             return 0;
>>>>
>>>> ... this (main, as I guess by the title) part of the change? I think
>>>> you want to explain what was wrong here and/or why the new arrangement
>>>> is better. (I'm sorry, I guess it was this way on prior versions
>>>> already, but apparently I didn't notice.)
>>>
>>> It's what the patch message says - calling put_page_and_type before
>>> mem_sharing_page_unlock can cause a deadlock. Since we are now
>>> holding a reference to the page until the end, there is no need for
>>> the extra get_page/put_page logic when we are dealing with the
>>> last_gfn.
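
A minimal sketch of the two orderings under discussion, using the
mem_sharing.c names from the hunks above (the sharing-state teardown and
error handling are elided, so this is an illustration rather than the
literal code):

    /* Pre-patch ordering: the reference is dropped while the page lock
     * for the same page is still held, which is the deadlock the patch
     * description warns about. */
    mem_sharing_page_lock(page);
    /* ... mem_sharing_gfn_destroy(page, d, gfn_info), etc. ... */
    put_page_and_type(page);
    mem_sharing_page_unlock(page);

    /* Reordered: release the lock first; the reference still held
     * across the unlock keeps the page live until the final put. */
    mem_sharing_page_lock(page);
    /* ... mem_sharing_gfn_destroy(page, d, gfn_info), etc. ... */
    mem_sharing_page_unlock(page);
    put_page_and_type(page);
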
>>
>> The title says "reorder" without any "why".
> 
> Yes, I can't reasonably fit "Calling _put_page_type while also holding
> the page_lock for that page can cause a deadlock." into the title. So
> it's spelled out in the patch message.

Of course not. And I realize _part_ of the changes is indeed what the
title says (although for share_pages() that's not obvious from the
patch alone). But you do more: You also avoid acquiring an extra
reference in share_pages().

And since you made me look at the code again: If put_page() is unsafe
with a lock held, how come the get_page_and_type() in share_pages()
is safe with two such locks held? If it really is, it surely would be
worthwhile to state in the description. There's another such instance
in mem_sharing_add_to_physmap() (plus a get_page()), and also one
where put_page_and_type() gets called with such a lock held (afaics).
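
For illustration, the shape of the share_pages() region in question,
reconstructed from the hunks quoted above rather than from the current
source (everything between lock and unlock is abbreviated):

    mem_sharing_page_lock(firstpg);
    mem_sharing_page_lock(secondpg);

    /* The pre-patch code acquired an extra reference here, i.e. with
     * both page locks held: */
    if ( !get_page(cpage, dom_cow) )
    {
        ret = -EOVERFLOW;
        mem_sharing_page_unlock(secondpg);
        mem_sharing_page_unlock(firstpg);
        goto err_out;
    }

    /* ... merge the rmap lists, update the shared p2m entries ... */

    mem_sharing_page_unlock(secondpg);
    mem_sharing_page_unlock(firstpg);

The open question is whether the get side (get_page() /
get_page_and_type()) is safe inside that region if the put side is not.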

Jan