> On 3 Nov 2021, at 15:30, Jan Beulich <jbeul...@suse.com> wrote:
>
> On 03.11.2021 16:16, Luca Fancellu wrote:
>>
>>
>>> On 3 Nov 2021, at 14:30, Jan Beulich <jbeul...@suse.com> wrote:
>>>
>>> On 03.11.2021 15:09, Luca Fancellu wrote:
>>>>> On 3 Nov 2021, at 11:28, Jan Beulich <jbeul...@suse.com> wrote:
>>>>> On 03.11.2021 11:20, Luca Fancellu wrote:
>>>>>>> On 3 Nov 2021, at 08:20, Jan Beulich <jbeul...@suse.com> wrote:
>>>>>>> On 02.11.2021 18:12, Luca Fancellu wrote:
>>>>>>>>> On 2 Nov 2021, at 14:45, Jan Beulich <jbeul...@suse.com> wrote:
>>>>>>>>> On 02.11.2021 15:05, Luca Fancellu wrote:
>>>>>>>>>> The code introduced by commit
>>>>>>>>>> a1743fc3a9fe9b68c265c45264dddf214fd9b882
>>>>>>>>>> ("arm/efi: Use dom0less configuration when using EFI boot") is
>>>>>>>>>> introducing a problem to boot Xen using Grub2 on ARM machine using
>>>>>>>>>> EDK2.
>>>>>>>>>>
>>>>>>>>>> The problem comes from the function get_parent_handle(...) that
>>>>>>>>>> inside
>>>>>>>>>> uses the HandleProtocol on loaded_image->DeviceHandle, but the last
>>>>>>>>>> is NULL, making Xen stop the UEFI boot.
>>>>>>>>>
>>>>>>>>> According to my reading the UEFI spec doesn't (explicitly) allow for
>>>>>>>>> this to be NULL. Could you clarify why this is the case? What other
>>>>>>>>> information may end up being invalid / absent? Is e.g. read_section()
>>>>>>>>> safe to use?
>>>>>>>>
>>>>>>>> My test on an arm machine running Grub2 on top of EDK2 showed that
>>>>>>>> when Xen is started, the get_parent_handle(…) call was failing and
>>>>>>>> stopping
>>>>>>>> the boot because the efi_bs->HandleProtocol(…) was called with the
>>>>>>>> loaded_image->DeviceHandle argument NULL and the call was returning
>>>>>>>> a EFI_INVALID_PARAMETER.
>>>>>>>> So the parent handle can’t be requested and the filesystem can’t be
>>>>>>>> used,
>>>>>>>> but any other code that doesn’t use the handle provided by
>>>>>>>> get_parent_handle(…)
>>>>>>>> can be used without problem like read_section(...).
>>>>>>>
>>>>>>> I understand this. My question was for the reason of ->DeviceHandle
>>>>>>> being NULL. IOW I'm wondering whether we're actually talking about a
>>>>>>> firmware or GrUB bug, in which case your change is a workaround for
>>>>>>> that rather than (primarily) a fix for the earlier Xen change.
>>>>>>
>>>>>> The issue was found only when using EDK2+Grub2, no issue when booting
>>>>>> directly from EDK2.
>>>>>> This is a fix for the regression, because without the EFI changes, Grub2
>>>>>> was
>>>>>> booting successfully Xen. Using grub2 to boot Xen on arm is a very common
>>>>>> solution so not supporting this anymore could lead to lots of people
>>>>>> having
>>>>>> issues if they update to Xen 4.16.
>>>>>
>>>>> I'm not objecting to addressing the issue. But the description needs
>>>>> to make clear where the origin of the problem lies, and afaict that's
>>>>> not the earlier Xen change. That one merely uncovered what, according
>>>>> to your reply, might then be a GrUB bug. Unless, as said earlier, I
>>>>> merely haven't been able to spot provisions in the spec for the field
>>>>> in question to be NULL.
>>>>
>>>> Maybe I can rephrase to be more specific from:
>>>>
>>>> The problem comes from the function get_parent_handle(...) that inside
>>>> uses the HandleProtocol on loaded_image->DeviceHandle, but the last
>>>> is NULL, making Xen stop the UEFI boot.
>>>>
>>>> To:
>>>>
>>>> Despite UEFI specification, EDK2+Grub2 is returning a NULL DeviceHandle,
>>>> that is used by efi_bs->HandleProtocol(…) inside get_parent_handle(…),
>>>> causing Xen to stop the boot getting an EFI_INVALID_PARAMETER error.
>>>>
>>>> Do you think it can be ok like this?
>>>
>>> Much better, yes, but I wonder what "returning" refers to. You want to
>>> describe the origin of the NULL handle as precisely as possible. And
>>> considering this turns out as a workaround, in a suitable place you
>>> will also want to add a code comment, such that a later reader won't
>>> decide this is all dead code and can be done in a simpler way.
>>
>> Ok I can write the issue from the beginning to be sure:
>>
>> Despite UEFI specification, EDK2+Grub2 is returning a NULL DeviceHandle
>> inside the interface given by the LOADED_IMAGE_PROTOCOL service, this
>> handle is used later by efi_bs->HandleProtocol(…) inside get_parent_handle(…)
>> when requesting the SIMPLE_FILE_SYSTEM_PROTOCOL interface,
>> causing Xen to stop the boot because of an EFI_INVALID_PARAMETER error.
>>
>> Regarding the comment, I can rephrase this comment:
>> /*
>> * If DeviceHandle is NULL, we can't use the SIMPLE_FILE_SYSTEM_PROTOCOL
>> * to have access to the filesystem.
>> */
>>
>> To be:
>>
>> /*
>> * If DeviceHandle is NULL, the firmware offering the UEFI services might not
>> be
>> * compliant to the standard and we can't use the SIMPLE_FILE_SYSTEM_PROTOCOL
>> * to have access to the filesystem. However the system can boot if and only
>> if it doesn’t
>> * require access to the filesystem. (e.g. Xen image has everything built in
>> or the
>> * bootloader did previously load every needed binary in memory)
>> */
>>
>> What do you think?
>
> Largely okay, albeit you don't mention GrUB at all (which isn't really part
> of the firmware, but which looks to be the culprit) and it gets a little
> too verbose. Provided the facts have been verified, how about
>
> /*
> * GrUB has been observed to supply a NULL DeviceHandle. We can't use
> * that to gain access to the filesystem. However the system can still
> * boot if it doesn’t require access to the filesystem.
> */
Ok yes this is better, I will do the changes to the commit message and the
comment for the v2, I’ll wait Stefano reply to see if I have to include also
his suggestion.
Thank you.
>
> (and it's up to you whether you include your further "e.g. ..." then, but
> if you do I think the parenthesized part belong before the final full
> stop, and the last "in" would want to be "into")? It's still dubious to me
> how they can get away with such a NULL handle (and why that happens only
> on Arm), but I guess it would go too far to try to understand the
> background.
>
> Jan