> On 3 Nov 2021, at 08:20, Jan Beulich <jbeul...@suse.com> wrote:
>
> On 02.11.2021 18:12, Luca Fancellu wrote:
>>> On 2 Nov 2021, at 14:45, Jan Beulich <jbeul...@suse.com> wrote:
>>> On 02.11.2021 15:05, Luca Fancellu wrote:
>>>> The code introduced by commit a1743fc3a9fe9b68c265c45264dddf214fd9b882
>>>> ("arm/efi: Use dom0less configuration when using EFI boot") is
>>>> introducing a problem to boot Xen using Grub2 on ARM machine using EDK2.
>>>>
>>>> The problem comes from the function get_parent_handle(...) that inside
>>>> uses the HandleProtocol on loaded_image->DeviceHandle, but the last
>>>> is NULL, making Xen stop the UEFI boot.
>>>
>>> According to my reading the UEFI spec doesn't (explicitly) allow for
>>> this to be NULL. Could you clarify why this is the case? What other
>>> information may end up being invalid / absent? Is e.g. read_section()
>>> safe to use?
>>
>> My test on an arm machine running Grub2 on top of EDK2 showed that
>> when Xen is started, the get_parent_handle(…) call was failing and stopping
>> the boot because the efi_bs->HandleProtocol(…) was called with the
>> loaded_image->DeviceHandle argument NULL and the call was returning
>> a EFI_INVALID_PARAMETER.
>> So the parent handle can’t be requested and the filesystem can’t be used,
>> but any other code that doesn’t use the handle provided by
>> get_parent_handle(…)
>> can be used without problem like read_section(...).
>
> I understand this. My question was for the reason of ->DeviceHandle
> being NULL. IOW I'm wondering whether we're actually talking about a
> firmware or GrUB bug, in which case your change is a workaround for
> that rather than (primarily) a fix for the earlier Xen change.
The issue was found only when using EDK2+Grub2, no issue when booting
directly from EDK2.
This is a fix for the regression, because without the EFI changes, Grub2 was
booting successfully Xen. Using grub2 to boot Xen on arm is a very common
solution so not supporting this anymore could lead to lots of people having
issues if they update to Xen 4.16.
>
>>>> --- a/xen/common/efi/boot.c
>>>> +++ b/xen/common/efi/boot.c
>>>> @@ -449,6 +449,13 @@ static EFI_FILE_HANDLE __init
>>>> get_parent_handle(EFI_LOADED_IMAGE *loaded_image,
>>>> CHAR16 *pathend, *ptr;
>>>> EFI_STATUS ret;
>>>>
>>>> + /*
>>>> + * If DeviceHandle is NULL, we can't use the
>>>> SIMPLE_FILE_SYSTEM_PROTOCOL
>>>> + * to have access to the filesystem.
>>>> + */
>>>> + if ( !loaded_image->DeviceHandle )
>>>> + return NULL;
>>>
>>> I couldn't find anything in the spec saying that NULL (a pointer with
>>> the numeric value zero) could actually not be a valid handle. Could
>>> you point me to text saying so?
>>
>> I am reading UEFI spec 2.8 A, section 7.3 Protocol Handler Services, when it
>> talks about
>> EFI_BOOT_SERVICES.HandleProtocol() there is a table of “Status Code
>> Returned” listing
>> the EFI_INVALID_PARAMETER when the Handle is NULL.
>
> Oh, okay. I guess I didn't search very well.
>
>>>> @@ -1333,6 +1342,9 @@ efi_start(EFI_HANDLE ImageHandle, EFI_SYSTEM_TABLE
>>>> *SystemTable)
>>>> EFI_FILE_HANDLE handle = get_parent_handle(loaded_image,
>>>> &file_name);
>>>>
>>>> + if ( !handle )
>>>> + blexit(L"Error retrieving image name: no filesystem
>>>> access");
>>>
>>> I don't think this should be fatal - see the comment ahead of the
>>> enclosing if().
>>
>> I’m not sure I get it, I put the fatal condition in part because the handle
>> was dereferenced by
>> handle->Close(handle), but also because file_name would have not being
>> modified by the call
>> and we have then *argv = file_name.
>
> Instead of you making boot fail I was trying to suggest that you insert
> a made-up name ("xen" or "xen.efi"?) alongside issuing just a warning
> message.
Oh ok now I see, yes I think I can do it
>
> Jan
>