Re: [edk2] [PATCH 0/1] MdeModulePkg/EbcDxe: add ARM support

2016-09-22 Thread Pete Batard

Hi Andrew,

On 2016.09.22 21:27, Andrew Fish wrote:

> It seems like tracking the PUSHes would work?


First of all, as Ard pointed out, I need to mention that much of my
earlier analysis, which you quoted, was based on a misreading of the
specs, which do mandate pushing 32-bit parameters as native size.
So the output I gave for AARCH64 and X64 was erroneous, since it was the
result of using PUSH32, and the problematic output disappears when using
PUSHn.


The only issue, therefore, is with 32-bit ARM.


We could update the spec to require this behavior.


If the idea is to add the tracking when processing POP/PUSH on Arm 
(which is how I originally saw it), please note that we may also have to 
track stack manipulation with MOV against R0, the EBC stack pointer, if 
someone issues something like 'MOV R0, R0(-1,0)' (equivalent to PUSHn) 
or 'MOV R0, R0(0,-8)' (equivalent to PUSH64) for some 32 or 64 bit 
optional parameter...


However, this brings us back to our original issue if there exists a
succession of optional 32-bit and 64-bit parameters, which the
application doesn't care about filling, and that may be handled with a
single 'MOV R0, R0(-1,-8)' in the assembly. If we are faced with this in
our tracking, then we still have no way of knowing which of the 32-bit
or 64-bit parameters comes first.


So, notwithstanding other issues, we'd have to mandate something very 
specific against doing this in the UEFI/EBC specs (such as only using 
individual PUSHes)...
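
To make the ambiguity concrete, here is a minimal sketch in C. It assumes the
usual EBC index encoding, where an index (n, c) means n natural units (the
native pointer size) plus c constant bytes. The function names are made up
for illustration and are not part of EbcDxe.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Byte offset encoded by an EBC index (n, c): n "natural" units, whose size
 * is the native pointer width, plus c constant bytes.
 */
int64_t ebc_index_offset(int64_t naturals, int64_t constants,
                         unsigned native_size)
{
    assert(native_size == 4 || native_size == 8);
    return naturals * (int64_t)native_size + constants;
}

/*
 * Total stack adjustment for two optional parameters reserved back to back.
 * The order of the two sizes does not affect the sum, and the sum is all
 * that a tracker observing a single combined MOV gets to see.
 */
int64_t combined_adjustment(unsigned first_size, unsigned second_size)
{
    return -((int64_t)first_size + (int64_t)second_size);
}
```

With a 32-bit native size, ebc_index_offset(-1, -8, 4) is -12, and both
combined_adjustment(4, 8) and combined_adjustment(8, 4) are also -12, so the
adjustment alone cannot tell a (UINT32, UINT64) pair from a (UINT64, UINT32)
pair.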


Regards,

/Pete



___
edk2-devel mailing list
edk2-devel@lists.01.org
https://lists.01.org/mailman/listinfo/edk2-devel


Re: [edk2] [PATCH 0/1] MdeModulePkg/EbcDxe: add ARM support

2016-09-22 Thread Andrew Fish

> On Sep 22, 2016, at 4:05 AM, Pete Batard  wrote:
> 
> On 2016.09.22 11:06, Ard Biesheuvel wrote:
>> However, there is a fundamental issue with EBC on ARM that has not
>> been addressed yet, which makes EBC support problematic:
>> ARM uses natural alignment for 64-bit types, which means it leaves
>> gaps in the stack frame, and the thunking code has no way of dealing
>> with that.
> 
> I was hoping you would comment on this, as I believe the issue is larger than 
> Arm (which is why I thought the Arm patch could be integrated), since I ran 
> into something similar for X64 and AARCH64, with the conversion from stack to 
> register parameters.
> 
> Let me start by showing a real-life example of what the current EBC 
> implementation does, on different architectures, if you CALLEX from EBC into 
> a native function call such as:
> 
> VOID MultiParamNative(
>UINT32,
>UINT64,
>UINT64,
>UINT64,
>UINT32,
>UINT32,
>UINT64
> );
> 
> with values:
> 
>0x1C1C1C1C,
>0x2B2B2B2B2A2A2A2A,
>0x3B3B3B3B3A3A3A3A,
>0x4B4B4B4B4A4A4A4A,
>0x5C5C5C5C,
>0x6C6C6C6C,
>0x7B7B7B7B7A7A7A7A
> 
> If you do that, then the parameter values seen by each Arch will be as 
> follows:
> 
> IA32:
>  p1 = 0x1C1C1C1C
>  p2 = 0x2B2B2B2B2A2A2A2A
>  p3 = 0x3B3B3B3B3A3A3A3A
>  p4 = 0x4B4B4B4B4A4A4A4A
>  p5 = 0x5C5C5C5C
>  p6 = 0x6C6C6C6C
>  p7 = 0x7B7B7B7B7A7A7A7A
> 
> X64:
>  p1 = 0x1C1C1C1C
>  p2 = 0x3A3A3A3A2B2B2B2B
>  p3 = 0x4A4A4A4A3B3B3B3B
>  p4 = 0x5C5C5C5C4B4B4B4B
>  p5 = 0x6C6C6C6C
>  p6 = 0x7B7B7B7B
>  p7 = 0x06F23E4012345678
> 
> ARM:
>  p1 = 0x1C1C1C1C
>  p2 = 0x3A3A3A3A2B2B2B2B
>  p3 = 0x4A4A4A4A3B3B3B3B
>  p4 = 0x5C5C5C5C4B4B4B4B
>  p5 = 0x6C6C6C6C
>  p6 = 0x7A7A7A7A
>  p7 = 0x446EEC467B7B7B7B
> 
> AA64:
>  p1 = 0x1C1C1C1C
>  p2 = 0x3A3A3A3A2B2B2B2B
>  p3 = 0x4A4A4A4A3B3B3B3B
>  p4 = 0x5C5C5C5C4B4B4B4B
>  p5 = 0x6C6C6C6C
>  p6 = 0x7B7B7B7B
>  p7 = 0xFF91
> 
> Note that these are real-life results gotten from a native set of drivers [1] 
> + EBC sample [2], specifically designed to test the above.
> 
> So, as you can see, only IA32 currently retrieves the parameters with their 
> expected values. All the other platforms, and not just Arm, have an issue 
> with parameter retrieval.
> 
> I too performed some analysis [3], to understand the problem, the result of 
> which can be summarized as follows:
> 
> Let's say you have native protocol function:
> 
>  ProtocolCall(UINT32, UINT64, UINT64)
> 
> to which you want to pass values:
> 
>  (0x1C1C1C1C, 0x2B2B2B2B2A2A2A2A, 0x3B3B3B3B3A3A3A3A)
> 
> With the EBC VM, the parameters then get stacked as (little endian, CDECL and 
> using 32-bit longwords to represent the stack):
> 
>   +--------+
>   |1C1C1C1C|
>   +--------+
>   |2A2A2A2A|
>   +--------+
>   |2B2B2B2B|
>   +--------+
>   |3A3A3A3A|
>   +--------+
>   |3B3B3B3B|
>   +--------+
>   |????????|
>   +--------+
> 
> Now, if you are calling into an x86_32 arch, this is no issue, as the native
> call reads the parameters off the stack, and finds each one in its expected
> location.
>
> But if, say, you are calling into native Arm, then the calling convention
> dictates that the first four 32-bit parameters must be placed into Arm
> registers r0-r3, rather than on the stack, and what's more, that if there
> exist 64-bit parameters among the register ones, they must start with an even
> register (r0 or r2).
> 
> What this means is that, with the current EBC handling, which simply maps the 
> top of the stack onto registers for native CALLEX (as the VM cannot guess the 
> parameter signature of the function it is calling into, and therefore will 
> not do anything else but a blind mapping of the stack onto registers), the 
> native Arm function effectively gets called with the following parameter 
> mapping:
> 
>   +--------+
>   |1C1C1C1C|  -> r0 (32-bit first parameter)
>   +--------+
>   |2A2A2A2A|  -> (r1/unused, since first parameter is 32-bit)
>   +--------+
>   |2B2B2B2B|  -> r2 (lower half of 64-bit second parameter)
>   +--------+
>   |3A3A3A3A|  -> r3 (upper half of 64-bit second parameter)
>   +--------+
>   |3B3B3B3B|  -> lower half of 64-bit third parameter (stack)
>   +--------+
>   |????????|  -> upper half of 64-bit third parameter (stack)
>   +--------+
> 
> The end result is that, the Arm call ends up with these values:
> 
>  (0x1C1C1C1C, 0x3A3A3A3A2B2B2B2B, 0x3B3B3B3B)
> 
> However, while we used Arm for this example, this is not an Arm specific 
> issue, as x86_64 and Arm64 also expect any of the first eight parameters to a 
> native call, that are smaller than 64-bit, to get passed as a 64-bit 
> register, which means they too have the same issue as the one illustrated 
> above.
> 
> Now, I'm not sure what the solution to that issue would be. I tend to agree 
> that, short of including a parameter signature for function calls, this 
> function argument marshalling issue between EBC and native will be difficult 
> to solve. 

Re: [edk2] [PATCH 0/1] MdeModulePkg/EbcDxe: add ARM support

2016-09-22 Thread Ard Biesheuvel
On 22 September 2016 at 12:26, Pete Batard  wrote:
> On 2016.09.22 12:14, Ard Biesheuvel wrote:
>>
>> For X64 and AARCH64, the issue does not exist because the EBC spec
>> mandates that all function arguments are widened to the native word
>> size. So when executing on a 64-bit architecture, the EBC stack looks
>> differently from what you describe above, and maps seamlessly onto the
>> register assignment mandated by the respective calling conventions.
>
>
> Ah, I see that you're right, and that I was trying to solve an issue that
> shouldn't exist:
>
> From UEFI 2.6, paragraph 21.9.3:
>
> "32-bit integers are pushed as natural size (since they
> should be passed as 64-bit parameter values on 64-bit machines)."
>
> I must admit I was a bit curious as to why this problem wouldn't have been
> picked before.
>
>
> So that leaves only the issue you mentioned. But then I'm not too hopeful
> with the timeframe for Arm/EBC integration when you say "we need language
> spec and compiler updates before we can fully support this"...
>
> Do you know if work has been started on this? Or are we just going to
> consider that this is too troublesome a problem to fix?
>

We are debating this internally. Even on AArch64, there are numerous
issues with EBC drivers, even if the code itself executes fine (this
is mainly related to PCI drivers that don't bother to enable support
for 64-bit DMA since this is never needed on Intel platforms)

So while EBC is of high importance to many Linaro members, I am not
sure if that includes 32-bit ARM support.

-- 
Ard.


Re: [edk2] [PATCH 0/1] MdeModulePkg/EbcDxe: add ARM support

2016-09-22 Thread Pete Batard

On 2016.09.22 12:14, Ard Biesheuvel wrote:

> For X64 and AARCH64, the issue does not exist because the EBC spec
> mandates that all function arguments are widened to the native word
> size. So when executing on a 64-bit architecture, the EBC stack looks
> differently from what you describe above, and maps seamlessly onto the
> register assignment mandated by the respective calling conventions.


Ah, I see that you're right, and that I was trying to solve an issue 
that shouldn't exist:


From UEFI 2.6, paragraph 21.9.3:

"32-bit integers are pushed as natural size (since they
should be passed as 64-bit parameter values on 64-bit machines)."

I must admit I was a bit curious as to why this problem wouldn't have 
been picked before.
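
For illustration, the difference between the two push styles on a 64-bit
machine can be sketched in C as below. This is a simplified model, not the
EbcDxe code: the Arg type and both function names are hypothetical, and it
assumes a little-endian host.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* One call argument: its value and its declared size in bytes (4 or 8). */
typedef struct {
    uint64_t value;
    unsigned size;
} Arg;

/*
 * Lay out arguments in memory the way successive pushes would leave them,
 * first argument at the lowest address. When widen_to_native is nonzero,
 * every argument takes a full 8-byte slot (PUSHn on a 64-bit machine);
 * otherwise 32-bit arguments take only 4 bytes (PUSH32).
 * Returns the number of bytes written. Assumes a little-endian host.
 */
size_t build_stack(const Arg *args, size_t count, int widen_to_native,
                   uint8_t *stack, size_t capacity)
{
    size_t offset = 0;
    for (size_t i = 0; i < count; i++) {
        size_t slot = widen_to_native ? 8 : args[i].size;
        assert(args[i].size == 4 || args[i].size == 8);
        assert(offset + slot <= capacity);
        memset(stack + offset, 0, slot);                       /* zero-extend */
        memcpy(stack + offset, &args[i].value, args[i].size);  /* low bytes */
        offset += slot;
    }
    return offset;
}

/* What a 64-bit callee sees as argument 'index': one 8-byte slot each. */
uint64_t read_slot64(const uint8_t *stack, size_t index)
{
    uint64_t value;
    memcpy(&value, stack + index * 8, sizeof value);
    return value;
}
```

Feeding it (0x1C1C1C1C, 0x2B2B2B2B2A2A2A2A, 0x3B3B3B3B3A3A3A3A) with
widen_to_native set to 0 makes slot 1 read back as 0x3A3A3A3A2B2B2B2B, which
is the bogus p2 from my earlier X64 and AARCH64 output. With widen_to_native
set to 1, all three slots read back as the values that were passed.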



So that leaves only the issue you mentioned. But then I'm not too 
hopeful with the timeframe for Arm/EBC integration when you say "we need 
language spec and compiler updates before we can fully support this"...


Do you know if work has been started on this? Or are we just going to 
consider that this is too troublesome a problem to fix?


Regards,

/Pete


Re: [edk2] [PATCH 0/1] MdeModulePkg/EbcDxe: add ARM support

2016-09-22 Thread Ard Biesheuvel
On 22 September 2016 at 12:05, Pete Batard  wrote:
> On 2016.09.22 11:06, Ard Biesheuvel wrote:
>>
>> However, there is a fundamental issue with EBC on ARM that has not
>> been addressed yet, which makes EBC support problematic:
>> ARM uses natural alignment for 64-bit types, which means it leaves
>> gaps in the stack frame, and the thunking code has no way of dealing
>> with that.
>
>
> I was hoping you would comment on this, as I believe the issue is larger
> than Arm (which is why I thought the Arm patch could be integrated), since I
> ran into something similar for X64 and AARCH64, with the conversion from
> stack to register parameters.
>
> Let me start by showing a real-life example of what the current EBC
> implementation does, on different architectures, if you CALLEX from EBC into
> a native function call such as:
>
> VOID MultiParamNative(
> UINT32,
> UINT64,
> UINT64,
> UINT64,
> UINT32,
> UINT32,
> UINT64
> );
>
> with values:
>
> 0x1C1C1C1C,
> 0x2B2B2B2B2A2A2A2A,
> 0x3B3B3B3B3A3A3A3A,
> 0x4B4B4B4B4A4A4A4A,
> 0x5C5C5C5C,
> 0x6C6C6C6C,
> 0x7B7B7B7B7A7A7A7A
>
> If you do that, then the parameter values seen by each Arch will be as
> follows:
>
> IA32:
>   p1 = 0x1C1C1C1C
>   p2 = 0x2B2B2B2B2A2A2A2A
>   p3 = 0x3B3B3B3B3A3A3A3A
>   p4 = 0x4B4B4B4B4A4A4A4A
>   p5 = 0x5C5C5C5C
>   p6 = 0x6C6C6C6C
>   p7 = 0x7B7B7B7B7A7A7A7A
>
> X64:
>   p1 = 0x1C1C1C1C
>   p2 = 0x3A3A3A3A2B2B2B2B
>   p3 = 0x4A4A4A4A3B3B3B3B
>   p4 = 0x5C5C5C5C4B4B4B4B
>   p5 = 0x6C6C6C6C
>   p6 = 0x7B7B7B7B
>   p7 = 0x06F23E4012345678
>
> ARM:
>   p1 = 0x1C1C1C1C
>   p2 = 0x3A3A3A3A2B2B2B2B
>   p3 = 0x4A4A4A4A3B3B3B3B
>   p4 = 0x5C5C5C5C4B4B4B4B
>   p5 = 0x6C6C6C6C
>   p6 = 0x7A7A7A7A
>   p7 = 0x446EEC467B7B7B7B
>
> AA64:
>   p1 = 0x1C1C1C1C
>   p2 = 0x3A3A3A3A2B2B2B2B
>   p3 = 0x4A4A4A4A3B3B3B3B
>   p4 = 0x5C5C5C5C4B4B4B4B
>   p5 = 0x6C6C6C6C
>   p6 = 0x7B7B7B7B
>   p7 = 0xFF91
>
> Note that these are real-life results gotten from a native set of drivers
> [1] + EBC sample [2], specifically designed to test the above.
>
> So, as you can see, only IA32 currently retrieves the parameters with their
> expected values. All the other platforms, and not just Arm, have an issue
> with parameter retrieval.
>
> I too performed some analysis [3], to understand the problem, the result of
> which can be summarized as follows:
>
> Let's say you have native protocol function:
>
>   ProtocolCall(UINT32, UINT64, UINT64)
>
> to which you want to pass values:
>
>   (0x1C1C1C1C, 0x2B2B2B2B2A2A2A2A, 0x3B3B3B3B3A3A3A3A)
>
> With the EBC VM, the parameters then get stacked as (little endian, CDECL
> and using 32-bit longwords to represent the stack):
>
>    +--------+
>    |1C1C1C1C|
>    +--------+
>    |2A2A2A2A|
>    +--------+
>    |2B2B2B2B|
>    +--------+
>    |3A3A3A3A|
>    +--------+
>    |3B3B3B3B|
>    +--------+
>    |????????|
>    +--------+
>
> Now, if you are calling into an x86_32 arch, this is no issue, as the native
> call reads the parameters off the stack, and finds each one in its expected
> location.
>
> But if, say, you are calling into native Arm, then the calling convention
> dictates that the first four 32-bit parameters must be placed into Arm
> registers r0-r3, rather than on the stack, and what's more, that if there
> exist 64-bit parameters among the register ones, they must start with an
> even register (r0 or r2).
>
> What this means is that, with the current EBC handling, which simply maps
> the top of the stack onto registers for native CALLEX (as the VM cannot
> guess the parameter signature of the function it is calling into, and
> therefore will not do anything else but a blind mapping of the stack onto
> registers), the native Arm function effectively gets called with the
> following parameter mapping:
>
>    +--------+
>    |1C1C1C1C|  -> r0 (32-bit first parameter)
>    +--------+
>    |2A2A2A2A|  -> (r1/unused, since first parameter is 32-bit)
>    +--------+
>    |2B2B2B2B|  -> r2 (lower half of 64-bit second parameter)
>    +--------+
>    |3A3A3A3A|  -> r3 (upper half of 64-bit second parameter)
>    +--------+
>    |3B3B3B3B|  -> lower half of 64-bit third parameter (stack)
>    +--------+
>    |????????|  -> upper half of 64-bit third parameter (stack)
>    +--------+
>
> The end result is that, the Arm call ends up with these values:
>
>   (0x1C1C1C1C, 0x3A3A3A3A2B2B2B2B, 0x3B3B3B3B)
>
> However, while we used Arm for this example, this is not an Arm specific
> issue, as x86_64 and Arm64 also expect any of the first eight parameters to
> a native call, that are smaller than 64-bit, to get passed as a 64-bit
> register, which means they too have the same issue as the one illustrated
> above.
>
> Now, I'm not sure what the solution to that issue would be. I tend to agree
> that, short of including a parameter signature for function calls, this
> function argument marshalling issue between EBC and native will be 

Re: [edk2] [PATCH 0/1] MdeModulePkg/EbcDxe: add ARM support

2016-09-22 Thread Pete Batard

On 2016.09.22 11:06, Ard Biesheuvel wrote:

> However, there is a fundamental issue with EBC on ARM that has not
> been addressed yet, which makes EBC support problematic:
> ARM uses natural alignment for 64-bit types, which means it leaves
> gaps in the stack frame, and the thunking code has no way of dealing
> with that.


I was hoping you would comment on this, as I believe the issue is larger 
than Arm (which is why I thought the Arm patch could be integrated), 
since I ran into something similar for X64 and AARCH64, with the 
conversion from stack to register parameters.


Let me start by showing a real-life example of what the current EBC 
implementation does, on different architectures, if you CALLEX from EBC 
into a native function call such as:


VOID MultiParamNative(
UINT32,
UINT64,
UINT64,
UINT64,
UINT32,
UINT32,
UINT64
);

with values:

0x1C1C1C1C,
0x2B2B2B2B2A2A2A2A,
0x3B3B3B3B3A3A3A3A,
0x4B4B4B4B4A4A4A4A,
0x5C5C5C5C,
0x6C6C6C6C,
0x7B7B7B7B7A7A7A7A

If you do that, then the parameter values seen by each Arch will be as 
follows:


IA32:
  p1 = 0x1C1C1C1C
  p2 = 0x2B2B2B2B2A2A2A2A
  p3 = 0x3B3B3B3B3A3A3A3A
  p4 = 0x4B4B4B4B4A4A4A4A
  p5 = 0x5C5C5C5C
  p6 = 0x6C6C6C6C
  p7 = 0x7B7B7B7B7A7A7A7A

X64:
  p1 = 0x1C1C1C1C
  p2 = 0x3A3A3A3A2B2B2B2B
  p3 = 0x4A4A4A4A3B3B3B3B
  p4 = 0x5C5C5C5C4B4B4B4B
  p5 = 0x6C6C6C6C
  p6 = 0x7B7B7B7B
  p7 = 0x06F23E4012345678

ARM:
  p1 = 0x1C1C1C1C
  p2 = 0x3A3A3A3A2B2B2B2B
  p3 = 0x4A4A4A4A3B3B3B3B
  p4 = 0x5C5C5C5C4B4B4B4B
  p5 = 0x6C6C6C6C
  p6 = 0x7A7A7A7A
  p7 = 0x446EEC467B7B7B7B

AA64:
  p1 = 0x1C1C1C1C
  p2 = 0x3A3A3A3A2B2B2B2B
  p3 = 0x4A4A4A4A3B3B3B3B
  p4 = 0x5C5C5C5C4B4B4B4B
  p5 = 0x6C6C6C6C
  p6 = 0x7B7B7B7B
  p7 = 0xFF91

Note that these are real-life results gotten from a native set of 
drivers [1] + EBC sample [2], specifically designed to test the above.


So, as you can see, only IA32 currently retrieves the parameters with 
their expected values. All the other platforms, and not just Arm, have 
an issue with parameter retrieval.


I too performed some analysis [3], to understand the problem, the result 
of which can be summarized as follows:


Let's say you have native protocol function:

  ProtocolCall(UINT32, UINT64, UINT64)

to which you want to pass values:

  (0x1C1C1C1C, 0x2B2B2B2B2A2A2A2A, 0x3B3B3B3B3A3A3A3A)

With the EBC VM, the parameters then get stacked as (little endian, 
CDECL and using 32-bit longwords to represent the stack):


   +--------+
   |1C1C1C1C|
   +--------+
   |2A2A2A2A|
   +--------+
   |2B2B2B2B|
   +--------+
   |3A3A3A3A|
   +--------+
   |3B3B3B3B|
   +--------+
   |????????|
   +--------+

Now, if you are calling into an x86_32 arch, this is no issue, as the
native call reads the parameters off the stack, and finds each one in
its expected location.


But if, say, you are calling into native Arm, then the calling
convention dictates that the first four 32-bit parameters must be
placed into Arm registers r0-r3, rather than on the stack, and what's
more, that if there exist 64-bit parameters among the register ones,
they must start with an even register (r0 or r2).


What this means is that, with the current EBC handling, which simply 
maps the top of the stack onto registers for native CALLEX (as the VM 
cannot guess the parameter signature of the function it is calling into, 
and therefore will not do anything else but a blind mapping of the stack 
onto registers), the native Arm function effectively gets called with 
the following parameter mapping:


   +--------+
   |1C1C1C1C|  -> r0 (32-bit first parameter)
   +--------+
   |2A2A2A2A|  -> (r1/unused, since first parameter is 32-bit)
   +--------+
   |2B2B2B2B|  -> r2 (lower half of 64-bit second parameter)
   +--------+
   |3A3A3A3A|  -> r3 (upper half of 64-bit second parameter)
   +--------+
   |3B3B3B3B|  -> lower half of 64-bit third parameter (stack)
   +--------+
   |????????|  -> upper half of 64-bit third parameter (stack)
   +--------+

The end result is that the Arm call ends up with these values:

  (0x1C1C1C1C, 0x3A3A3A3A2B2B2B2B, 0x3B3B3B3B)
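
The mapping above can be reproduced with a small C model. This is only a
sketch of how a 32-bit Arm callee would pick its integer arguments out of a
blindly mapped stack, not the actual thunk: decode_as_aapcs32 is a made-up
name, only 4-byte and 8-byte integer arguments are handled, and the native
stack is assumed to start on an 8-byte boundary.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Decode the arguments a 32-bit Arm callee reads when the caller has copied
 * the first four packed 32-bit stack words into r0-r3 and left the remaining
 * words on the native stack. 'words' is the packed EBC stack image, 'sizes'
 * holds each argument size in bytes (4 or 8), 'out' receives the values.
 */
void decode_as_aapcs32(const uint32_t *words, size_t word_count,
                       const unsigned *sizes, size_t arg_count,
                       uint64_t *out)
{
    size_t reg = 0;    /* next free core register number (r0-r3) */
    size_t stack = 0;  /* next free native stack word, i.e. words[4 + stack] */

    for (size_t i = 0; i < arg_count; i++) {
        size_t index;
        assert(sizes[i] == 4 || sizes[i] == 8);
        if (sizes[i] == 8) {
            reg = (reg + 1) & ~(size_t)1;          /* start on an even register */
            if (reg + 2 <= 4) {
                index = reg;
                reg += 2;
            } else {
                reg = 4;                           /* registers are exhausted */
                stack = (stack + 1) & ~(size_t)1;  /* 8-byte stack alignment */
                index = 4 + stack;
                stack += 2;
            }
            assert(index + 1 < word_count);
            out[i] = (uint64_t)words[index] |
                     ((uint64_t)words[index + 1] << 32);
        } else {
            index = (reg < 4) ? reg++ : 4 + stack++;
            assert(index < word_count);
            out[i] = words[index];
        }
    }
}
```

Given the six stack words from the diagram and sizes {4, 8, 8}, this yields
0x1C1C1C1C, then 0x3A3A3A3A2B2B2B2B, then a value whose lower half is
0x3B3B3B3B and whose upper half is whatever follows on the stack. Running it
on the packed words of the MultiParamNative call gives the same p1 to p6 as
the ARM output listed earlier.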

However, while we used Arm for this example, this is not an Arm specific 
issue, as x86_64 and Arm64 also expect any of the first eight parameters 
to a native call, that are smaller than 64-bit, to get passed as a 
64-bit register, which means they too have the same issue as the one 
illustrated above.


Now, I'm not sure what the solution to that issue would be. I tend to 
agree that, short of including a parameter signature for function calls, 
this function argument marshalling issue between EBC and native will be 
difficult to solve. A possible half-workaround I thought of could be to 
keep track of all the PUSHes having been carried out before a CALLEX, 
and *assume* (or mandate in the specs) that all the arguments were 
pushed individually and that the size of the PUSH matches the desired 
size for a register argument, but 

Re: [edk2] [PATCH 0/1] MdeModulePkg/EbcDxe: add ARM support

2016-09-22 Thread Ard Biesheuvel
On 22 September 2016 at 11:06, Ard Biesheuvel  wrote:
> On 22 September 2016 at 10:43, Pete Batard  wrote:
>> Hi,
>>
>> The following is an updated/fixed version of the patch(es), put forward by
>> Ard Biesheuvel on August 9 ([1], [2]), and re-submitted for formal
>> inclusion, so that the EDK2 can provide EBC functionality for all of IA32,
>> IA64, X64, AARCH64 and ARM at last.
>>
>> This updated patch now includes the necessary corollary dsc/fdf updates as
>> well as fixes to the assembly's EbcLLCALLEXNative, as I found the following
>> issues there:
>> - At least gcc5 didn't seem to like the manually optimized branching for all
>> register args ("sub r1, r1, r3, lsr #1"), and one can never be sure of the
>> actual size instructions will be assembled into, in case of assembler
>> internal alignment/optimization, so I broke it down into actual labelled
>> branches. There are only 4 of those anyway.
>> - For register + stack calls, while 8 x 64 bit registers on AARCH64 do
>> equate to #64 bytes that need to be taken off the stack, on ARM the 4 x 32
>> bit registers equate to #16 bytes, not #32
>> - Even after fixing the above, I found some issues with the manual stack
>> duplication assembly code, so I switched to using a call to CopyMem(), like
>> IA32 does.
>>
>> With these changes, I believe that the ARM/EBC feature should be fully
>> functional, especially as I have heavily tested multiparameter calls from
>> EBC into native, using an fasmg-based EBC assembler [3], to confirm that
>> they performed just as well with ARM as with AARCH64, IA32 or X64.
>>
>
> Hello Pete,
>
> Thanks a lot for this contribution. I had spotted (and fixed) some of
> the above issues as well.
>
> However, there is a fundamental issue with EBC on ARM that has not
> been addressed yet, which makes EBC support problematic:
> ARM uses natural alignment for 64-bit types, which means it leaves
> gaps in the stack frame, and the thunking code has no way of dealing
> with that.
>
> I am pasting my analysis below, which I sent out internally a couple
> of weeks ago. In summary, we need language spec and compiler updates
> before we can fully support this on 32-bit ARM.
>

BTW, the EDK2 tree has an EBC version of the FAT filesystem driver,
which is what I have been using to test EBC. I have a Frankenstein
version of the 32-bit ARM one (shared below) that deals with the
padding of known protocol methods that contain UINT64 arguments at odd
positions, but it is not pretty, and a clear example of why the spec
needs to be updated to accommodate ARM.

https://git.linaro.org/people/ard.biesheuvel/uefi-next.git/shortlog/refs/heads/ebc3


Re: [edk2] [PATCH 0/1] MdeModulePkg/EbcDxe: add ARM support

2016-09-22 Thread Ard Biesheuvel
On 22 September 2016 at 10:43, Pete Batard  wrote:
> Hi,
>
> The following is an updated/fixed version of the patch(es), put forward by
> Ard Biesheuvel on August 9 ([1], [2]), and re-submitted for formal
> inclusion, so that the EDK2 can provide EBC functionality for all of IA32,
> IA64, X64, AARCH64 and ARM at last.
>
> This updated patch now includes the necessary corollary dsc/fdf updates as
> well as fixes to the assembly's EbcLLCALLEXNative, as I found the following
> issues there:
> - At least gcc5 didn't seem to like the manually optimized branching for all
> register args ("sub r1, r1, r3, lsr #1"), and one can never be sure of the
> actual size instructions will be assembled into, in case of assembler
> internal alignment/optimization, so I broke it down into actual labelled
> branches. There are only 4 of those anyway.
> - For register + stack calls, while 8 x 64 bit registers on AARCH64 do
> equate to #64 bytes that need to be taken off the stack, on ARM the 4 x 32
> bit registers equate to #16 bytes, not #32
> - Even after fixing the above, I found some issues with the manual stack
> duplication assembly code, so I switched to using a call to CopyMem(), like
> IA32 does.
>
> With these changes, I believe that the ARM/EBC feature should be fully
> functional, especially as I have heavily tested multiparameter calls from
> EBC into native, using an fasmg-based EBC assembler [3], to confirm that
> they performed just as well with ARM as with AARCH64, IA32 or X64.
>

Hello Pete,

Thanks a lot for this contribution. I had spotted (and fixed) some of
the above issues as well.

However, there is a fundamental issue with EBC on ARM that has not
been addressed yet, which makes EBC support problematic:
ARM uses natural alignment for 64-bit types, which means it leaves
gaps in the stack frame, and the thunking code has no way of dealing
with that.

I am pasting my analysis below, which I sent out internally a couple
of weeks ago. In summary, we need language spec and compiler updates
before we can fully support this on 32-bit ARM.

Thanks,
Ard.

--

This compares the EBC argument stack with the argument assignment across
registers and stack expected by the respective Procedure Call Standards
for AArch64 and AArch32.

Since the EBC thunking layer is not aware of the actual prototype signature
of the function that is being called (it does not even know which part of the
stack frame consists of outgoing arguments, and so it needs to assume that the
entire stack frame needs to be copied into arguments and the native stack), the
calls can only execute correctly if the EBC stack frame happens to align all
arguments natively, in which case the AAPCS happens to agree with the EBC
cdecl calling conventions (although the first 8 or 4 arguments, respectively,
are passed via registers). In the diagrams below, this is the case if the
diagrams line up horizontally.

In summary, EBC on AArch64 seems to be OK (although more testing is needed),
as long as we don't pass arguments whose size exceeds 64 bits (which the EBC
compiler is unlikely to support anyway).

EBC on AArch32 happens to work as long as no UINT64 values appear as the
return value or as an odd-numbered argument. Since it is impossible to infer
from EBC bytecode whether any such function calls are being performed, the only
way to fix this is to update the EBC spec (and the compiler) to insert hints
into the bytecode when such problematic values occur.
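
As a rough illustration of the argument half of that condition, the check
below walks a prototype and reports the first 64-bit argument that does not
land on an 8-byte boundary in a packed 32-bit EBC argument stack. It is a
sketch only: first_misaligned_arg is a hypothetical helper, the stack is
assumed to start 8-byte aligned, and return values are not modelled.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Return the index of the first 64-bit argument whose offset in a packed
 * EBC argument stack (32-bit natural size) is not a multiple of 8, or -1
 * if every 64-bit argument is naturally aligned.
 */
int first_misaligned_arg(const unsigned *sizes, size_t count)
{
    size_t offset = 0;  /* byte offset of the current argument */

    for (size_t i = 0; i < count; i++) {
        assert(sizes[i] == 4 || sizes[i] == 8);
        if (sizes[i] == 8 && offset % 8 != 0) {
            return (int)i;
        }
        offset += sizes[i];
    }
    return -1;
}
```

For a prototype shaped like EFI_BLOCK_READ on 32-bit, with sizes
{4, 4, 8, 4, 4}, this returns -1, since Lba sits at offset 8. For a
(UINT32, UINT64, UINT64) prototype it returns 1, because the first UINT64
sits at offset 4.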

Below is a comparison between the stack frame layouts of various protocol
entry points that are relevant to EBC drivers (i.e., PCI I/O, block I/O and
network I/O).



typedef
EFI_STATUS
(EFIAPI *EFI_BLOCK_READ)(
  IN EFI_BLOCK_IO_PROTOCOL  *This,
  IN UINT32                 MediaId,
  IN EFI_LBA                Lba,
  IN UINTN                  BufferSize,
  OUT VOID                  *Buffer
  );

Executing on 64-bit (ok)


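         EBC stack               AArch64 registers
0x00 +----------------+       +----------------+
     |      This      |    x0 |      This      |
0x08 +----------------+       +----------------+
     |    MediaId     |    x1 |    MediaId     |
0x10 +----------------+       +----------------+
     |      Lba       |    x2 |      Lba       |
0x18 +----------------+       +----------------+
     |   BufferSize   |    x3 |   BufferSize   |
0x20 +----------------+       +----------------+
     |     Buffer     |    x4 |     Buffer     |
0x28 +----------------+       +----------------+
             :                        :
     +----------------+       +----------------+
 R7  |  Return value  |    x0 |  Return value  |
     +----------------+       +----------------+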

Executing on 32-bit (ok)