Re: ZynqMP and Versal crash clearing coherent cache memory with memset

2021-10-18 Thread Chris Johns
On 19/10/21 8:59 am, Joel Sherrill wrote:
> On Mon, Oct 18, 2021 at 4:28 PM Chris Johns  wrote:
>>
>> On 19/10/21 3:53 am, Kinsey Moore wrote:
>>> On 10/18/2021 00:44, Chris Johns wrote:
>>>> Hi,
>>>>
>>>> I cannot run libbsd on real hardware because the cadence rx descriptor cache
>>>> coherent allocation crashes in `memset`. It is used to clear the memory.
>>>>
>>>> The rtemsbsd allocator call optionally clears the memory and it seems the newlib
>>>> aarch64 memset code crashes when doing this. A basic loop with 8bit or 32bit
>>>> writes does not crash. The memset call happily clears an array in cached memory
>>>> with different offsets.
>>>>
>>>> I have posted a patch to spcache01 that generates the crash on Versal and ZynqMP
>>>> hardware. The crash dump is:
>>>>
>>>> test cache coherent allocation
>>>> clear cache coherent with memset: 0x1fe00050
>>>>
>>>>
>>>> *** FATAL ***
>>>> fatal source: 9 (RTEMS_FATAL_SOURCE_EXCEPTION)
>>>>
>>>>
>>>> X0   = 0x1fe00050 X17  = 0x000c
>>>> X1   = 0x X18  = 0x17b0
>>>> X2   = 0x0110 X19  = 0x1fe00050
>>>> X3   = 0x1fe000c0 X20  = 0x1fdfff80
>>>> X4   = 0x1fe00250 X21  = 0x10013ab0
>>>> X5   = 0x0004 X22  = 0x
>>>> X6   = 0x0001 X23  = 0x
>>>> X7   = 0x X24  = 0x10103140
>>>> X8   = 0x X25  = 0x
>>>> X9   = 0xff80ffc8 X26  = 0x
>>>> X10  = 0x X27  = 0x
>>>> X11  = 0x1010ca78 X28  = 0x
>>>> X12  = 0x0001 FP   = 0x1010cc30
>>>> X13  = 0x1fe00050 LR   = 0x10001f94
>>>> X14  = 0x SP   = 0x1010cc30
>>>> X15  = 0x0004 PC   = 0x100125c0
>>>> X16  = 0x1000f700 DAIF = 0x03c0
>>>> VEC  = 0x0004 CPSR = 0x6005
>>>> ESR  = EC: 0b100101 IL: 0b1 ISS: 0b00111
>>>> Data Abort taken without a change in Exception level
>>>> FAR  = 0x1fe000c0
>>>> FPCR = 0x FPSR = 0x0010
>>>>
>>>> The Versal (A72) fails in exactly the same way. The allocated address is
>>>> 0x1fe00050 and the FAR is 0x1fe000c0 so I am not sure if the "0xc0 - 0x50"
>>>> section is aligning the pointer to a larger word size for better performance and
>>>> that first part is OK but the different word size breaks.
>>>
>>> I'm running with a toolchain that was built with
>>> --targetcflags="-DPREFER_SIZE_OVER_SPEED" which affects the content of the
>>> memset function, so my memset is just loops of writes and seems to work fine.
>>
>> Oh. Maybe the eng manual needs a piece on this. Using flags on tool chains like
>> this is fine for a user because it is use at your own peril however I believe
>> patches need to be tested with the defaults for all tools. It is way too hard to
>> baseline a BSP if tweaks are needed here and there.
> 
> We did try to merge this to the RSB as a temporary workaround for ilp32 issues.
> Kinsey may have realized it had this impact also but I don't recall
> being aware of it.

Sure and we need to accommodate this but I think as a policy we need to make
sure patches are tested with default tool sets. I cannot see how we can make
things work without having this happen?

> We didn't want it to be a local hack. :)

It may have to be just that. It seems to me we have an ILP32 BSP that needs a
special set of tools and that constrains any other aarch64 BSPs if it became the
default. Do we want that? If the cached memory gets a performance boost from a
better memset, memcpy etc then I hope that is available to me by default.

>>> Just out of curiosity, what instruction was at that PC address? If it was "dc
>>> zva", then I had seen this a while back during initial AArch64 bringup and had
>>> assumed it was fixed since the addition of the MMU code since that instruction
>>> doesn't work on device memory.
>>
>> It is that instruction ...
>>
>> 100125c0:   d50b7423    dc  zva, x3
>>
>> Looks like it is not fixed.
> 
> I think your suggestion that FreeBSD should not use memset for device memory
> is the right path though. But that could be in a lot of places. :(

I do not know. The allocation is under the bus space DMA allocator and that
interface is complicated. Maybe memset is not suitable?
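
The observation earlier in the thread that a basic loop of 8-bit writes does
not crash suggests one possible shape for a clear that avoids memset. The
following is only a minimal sketch under that assumption: device_memory_clear
is a hypothetical helper name, not an existing RTEMS, libbsd or newlib
function. The volatile qualifier is there so the compiler does not turn the
loop back into a memset call.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not part of RTEMS, libbsd or newlib).
 * Clears a buffer one byte at a time with plain stores. Byte stores are
 * always naturally aligned, and the volatile access keeps the compiler
 * from replacing the loop with a call to memset().
 */
static void device_memory_clear(volatile void *dst, size_t size)
{
  volatile uint8_t *p = dst;
  size_t i;

  assert(dst != NULL || size == 0);

  for (i = 0; i < size; ++i) {
    p[i] = 0;
  }
}
```

A caller could use this in place of memset(ptr, 0, size) for buffers returned
by the cache coherent allocator. It trades speed for safety, so it would only
make sense on paths that are known to touch device memory.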

Chris
___
devel mailing list
devel@rtems.org
http://lists.rtems.org/mailman/listinfo/devel


Re: ZynqMP and Versal crash clearing coherent cache memory with memset

2021-10-18 Thread Joel Sherrill
On Mon, Oct 18, 2021 at 4:28 PM Chris Johns  wrote:
>
> On 19/10/21 3:53 am, Kinsey Moore wrote:
> > On 10/18/2021 00:44, Chris Johns wrote:
> >> Hi,
> >>
> >> I cannot run libbsd on real hardware because the cadence rx descriptor cache
> >> coherent allocation crashes in `memset`. It is used to clear the memory.
> >>
> >> The rtemsbsd allocator call optionally clears the memory and it seems the newlib
> >> aarch64 memset code crashes when doing this. A basic loop with 8bit or 32bit
> >> writes does not crash. The memset call happily clears an array in cached memory
> >> with different offsets.
> >>
> >> I have posted a patch to spcache01 that generates the crash on Versal and ZynqMP
> >> hardware. The crash dump is:
> >>
> >> test cache coherent allocation
> >> clear cache coherent with memset: 0x1fe00050
> >>
> >>
> >> *** FATAL ***
> >> fatal source: 9 (RTEMS_FATAL_SOURCE_EXCEPTION)
> >>
> >>
> >> X0   = 0x1fe00050 X17  = 0x000c
> >> X1   = 0x X18  = 0x17b0
> >> X2   = 0x0110 X19  = 0x1fe00050
> >> X3   = 0x1fe000c0 X20  = 0x1fdfff80
> >> X4   = 0x1fe00250 X21  = 0x10013ab0
> >> X5   = 0x0004 X22  = 0x
> >> X6   = 0x0001 X23  = 0x
> >> X7   = 0x X24  = 0x10103140
> >> X8   = 0x X25  = 0x
> >> X9   = 0xff80ffc8 X26  = 0x
> >> X10  = 0x X27  = 0x
> >> X11  = 0x1010ca78 X28  = 0x
> >> X12  = 0x0001 FP   = 0x1010cc30
> >> X13  = 0x1fe00050 LR   = 0x10001f94
> >> X14  = 0x SP   = 0x1010cc30
> >> X15  = 0x0004 PC   = 0x100125c0
> >> X16  = 0x1000f700 DAIF = 0x03c0
> >> VEC  = 0x0004 CPSR = 0x6005
> >> ESR  = EC: 0b100101 IL: 0b1 ISS: 0b00111
> >> Data Abort taken without a change in Exception level
> >> FAR  = 0x1fe000c0
> >> FPCR = 0x FPSR = 0x0010
> >>
> >> The Versal (A72) fails in exactly the same way. The allocated address is
> >> 0x1fe00050 and the FAR is 0x1fe000c0 so I am not sure if the "0xc0 - 0x50"
> >> section is aligning the pointer to a larger word size for better performance and
> >> that first part is OK but the different word size breaks.
> >
> > I'm running with a toolchain that was built with
> > --targetcflags="-DPREFER_SIZE_OVER_SPEED" which affects the content of the
> > memset function, so my memset is just loops of writes and seems to work fine.
>
> Oh. Maybe the eng manual needs a piece on this. Using flags on tool chains like
> this is fine for a user because it is use at your own peril however I believe
> patches need to be tested with the defaults for all tools. It is way too hard to
> baseline a BSP if tweaks are needed here and there.

We did try to merge this to the RSB as a temporary workaround for ilp32 issues.
Kinsey may have realized it had this impact also but I don't recall
being aware of it.

We didn't want it to be a local hack. :)

> > Just out of curiosity, what instruction was at that PC address? If it was "dc
> > zva", then I had seen this a while back during initial AArch64 bringup and had
> > assumed it was fixed since the addition of the MMU code since that instruction
> > doesn't work on device memory.
>
> It is that instruction ...
>
> 100125c0:   d50b7423    dc  zva, x3
>
> Looks like it is not fixed.

I think your suggestion that FreeBSD should not use memset for device memory
is the right path though. But that could be in a lot of places. :(

>
> > It looks like bsp_section_nocacheheap sits inside bsp_section_nocachenoload
> > which is mapped as device memory which wouldn't play nicely with the "dc zva"
> > instruction.
> Cache coherent memory is mapped as device memory. The descriptors are mapped
> into it so checks of bits and updates are not affected by caches.
>
> Chris


Re: ZynqMP and Versal crash clearing coherent cache memory with memset

2021-10-18 Thread Chris Johns
On 19/10/21 3:53 am, Kinsey Moore wrote:
> On 10/18/2021 00:44, Chris Johns wrote:
>> Hi,
>>
>> I cannot run libbsd on real hardware because the cadence rx descriptor cache
>> coherent allocation crashes in `memset`. It is used to clear the memory.
>>
>> The rtemsbsd allocator call optionally clears the memory and it seems the newlib
>> aarch64 memset code crashes when doing this. A basic loop with 8bit or 32bit
>> writes does not crash. The memset call happily clears an array in cached memory
>> with different offsets.
>>
>> I have posted a patch to spcache01 that generates the crash on Versal and ZynqMP
>> hardware. The crash dump is:
>>
>> test cache coherent allocation
>> clear cache coherent with memset: 0x1fe00050
>>
>>
>> *** FATAL ***
>> fatal source: 9 (RTEMS_FATAL_SOURCE_EXCEPTION)
>>
>>
>> X0   = 0x1fe00050 X17  = 0x000c
>> X1   = 0x X18  = 0x17b0
>> X2   = 0x0110 X19  = 0x1fe00050
>> X3   = 0x1fe000c0 X20  = 0x1fdfff80
>> X4   = 0x1fe00250 X21  = 0x10013ab0
>> X5   = 0x0004 X22  = 0x
>> X6   = 0x0001 X23  = 0x
>> X7   = 0x X24  = 0x10103140
>> X8   = 0x X25  = 0x
>> X9   = 0xff80ffc8 X26  = 0x
>> X10  = 0x X27  = 0x
>> X11  = 0x1010ca78 X28  = 0x
>> X12  = 0x0001 FP   = 0x1010cc30
>> X13  = 0x1fe00050 LR   = 0x10001f94
>> X14  = 0x SP   = 0x1010cc30
>> X15  = 0x0004 PC   = 0x100125c0
>> X16  = 0x1000f700 DAIF = 0x03c0
>> VEC  = 0x0004 CPSR = 0x6005
>> ESR  = EC: 0b100101 IL: 0b1 ISS: 0b00111
>>     Data Abort taken without a change in Exception level
>> FAR  = 0x1fe000c0
>> FPCR = 0x FPSR = 0x0010
>>
>> The Versal (A72) fails in exactly the same way. The allocated address is
>> 0x1fe00050 and the FAR is 0x1fe000c0 so I am not sure if the "0xc0 - 0x50"
>> section is aligning the pointer to a larger word size for better performance and
>> that first part is OK but the different word size breaks.
> 
> I'm running with a toolchain that was built with
> --targetcflags="-DPREFER_SIZE_OVER_SPEED" which affects the content of the
> memset function, so my memset is just loops of writes and seems to work fine.

Oh. Maybe the eng manual needs a piece on this. Using flags on tool chains like
this is fine for a user because it is use at your own peril however I believe
patches need to be tested with the defaults for all tools. It is way too hard to
baseline a BSP if tweaks are needed here and there.

> Just out of curiosity, what instruction was at that PC address? If it was "dc
> zva", then I had seen this a while back during initial AArch64 bringup and had
> assumed it was fixed since the addition of the MMU code since that instruction
> doesn't work on device memory.

It is that instruction ...

100125c0:   d50b7423    dc  zva, x3

Looks like it is not fixed.
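
As a cross-check on the disassembly, the instruction word d50b7423 can be
decoded by hand. The sketch below assumes the A64 encoding of "DC ZVA, Xt" as
a SYS instruction with op1=3, CRn=7, CRm=4, op2=1, which gives a base word of
0xd50b7420 with the register number in the low five bits. The function name
is_dc_zva is made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Base encoding of "DC ZVA, Xt" with Rt = 0 (SYS #3, C7, C4, #1, Xt). */
#define DC_ZVA_BASE UINT32_C(0xd50b7420)

/* The Rt field occupies bits [4:0] of the instruction word. */
#define A64_RT_MASK UINT32_C(0x1f)

/*
 * Hypothetical helper: returns true if insn is a DC ZVA and stores the
 * number of the address register in *rt.
 */
static bool is_dc_zva(uint32_t insn, unsigned *rt)
{
  assert(rt != NULL);

  if ((insn & ~A64_RT_MASK) != DC_ZVA_BASE) {
    return false;
  }

  *rt = (unsigned) (insn & A64_RT_MASK);
  return true;
}
```

For the word at PC 0x100125c0 this reports a DC ZVA using register 3, which
agrees with X3 = 0x1fe000c0 being equal to the FAR in the crash dump.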

> It looks like bsp_section_nocacheheap sits inside bsp_section_nocachenoload
> which is mapped as device memory which wouldn't play nicely with the "dc zva"
> instruction.
Cache coherent memory is mapped as device memory. The descriptors are mapped
into it so checks of bits and updates are not affected by caches.

Chris

Re: ZynqMP and Versal crash clearing coherent cache memory with memset

2021-10-18 Thread Kinsey Moore

On 10/18/2021 00:44, Chris Johns wrote:

Hi,

I cannot run libbsd on real hardware because the cadence rx descriptor cache
coherent allocation crashes in `memset`. It is used to clear the memory.

The rtemsbsd allocator call optionally clears the memory and it seems the newlib
aarch64 memset code crashes when doing this. A basic loop with 8bit or 32bit
writes does not crash. The memset call happily clears an array in cached memory
with different offsets.

I have posted a patch to spcache01 that generates the crash on Versal and ZynqMP
hardware. The crash dump is:

test cache coherent allocation
clear cache coherent with memset: 0x1fe00050


*** FATAL ***
fatal source: 9 (RTEMS_FATAL_SOURCE_EXCEPTION)


X0   = 0x1fe00050 X17  = 0x000c
X1   = 0x X18  = 0x17b0
X2   = 0x0110 X19  = 0x1fe00050
X3   = 0x1fe000c0 X20  = 0x1fdfff80
X4   = 0x1fe00250 X21  = 0x10013ab0
X5   = 0x0004 X22  = 0x
X6   = 0x0001 X23  = 0x
X7   = 0x X24  = 0x10103140
X8   = 0x X25  = 0x
X9   = 0xff80ffc8 X26  = 0x
X10  = 0x X27  = 0x
X11  = 0x1010ca78 X28  = 0x
X12  = 0x0001 FP   = 0x1010cc30
X13  = 0x1fe00050 LR   = 0x10001f94
X14  = 0x SP   = 0x1010cc30
X15  = 0x0004 PC   = 0x100125c0
X16  = 0x1000f700 DAIF = 0x03c0
VEC  = 0x0004 CPSR = 0x6005
ESR  = EC: 0b100101 IL: 0b1 ISS: 0b00111
Data Abort taken without a change in Exception level
FAR  = 0x1fe000c0
FPCR = 0x FPSR = 0x0010

The Versal (A72) fails in exactly the same way. The allocated address is
0x1fe00050 and the FAR is 0x1fe000c0 so I am not sure if the "0xc0 - 0x50"
section is aligning the pointer to a larger word size for better performance and
that first part is OK but the different word size breaks.


I'm running with a toolchain that was built with 
--targetcflags="-DPREFER_SIZE_OVER_SPEED" which affects the content of 
the memset function, so my memset is just loops of writes and seems to 
work fine. Just out of curiosity, what instruction was at that PC 
address? If it was "dc zva", then I had seen this a while back during 
initial AArch64 bringup and had assumed it was fixed since the addition 
of the MMU code since that instruction doesn't work on device memory.



It looks like bsp_section_nocacheheap sits inside 
bsp_section_nocachenoload which is mapped as device memory which 
wouldn't play nicely with the "dc zva" instruction.
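
For reference, the block size that "dc zva" clears is reported by the
DCZID_EL0 system register. The sketch below decodes a value of that register,
assuming the architectural layout: BS in bits [3:0] holds log2 of the block
size in 4-byte words, and DZP in bit [4] is set when the instruction is
prohibited. The helper names are hypothetical. Note that this register says
nothing about memory types, so it cannot tell whether a given address is
mapped as device memory.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* DCZID_EL0.BS, bits [3:0]: log2 of the block size in 4-byte words. */
#define DCZID_BS_MASK UINT64_C(0xf)

/* DCZID_EL0.DZP, bit [4]: set when DC ZVA is prohibited. */
#define DCZID_DZP UINT64_C(0x10)

/* Hypothetical helper: true if the register value prohibits DC ZVA. */
static bool dczid_prohibited(uint64_t dczid)
{
  return (dczid & DCZID_DZP) != 0;
}

/* Hypothetical helper: block size in bytes written by one DC ZVA. */
static uint64_t dczid_block_bytes(uint64_t dczid)
{
  uint64_t bs = dczid & DCZID_BS_MASK;

  /* The largest architected value is 9, which is a 2 KiB block. */
  assert(bs <= 9);

  return UINT64_C(4) << bs;
}
```

On a target the input would come from reading the register, for example with
"mrs x0, dczid_el0". A value of 0x4 decodes to a 64-byte block with the
instruction permitted.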



Kinsey



ZynqMP and Versal crash clearing coherent cache memory with memset

2021-10-17 Thread Chris Johns
Hi,

I cannot run libbsd on real hardware because the cadence rx descriptor cache
coherent allocation crashes in `memset`. It is used to clear the memory.

The rtemsbsd allocator call optionally clears the memory and it seems the newlib
aarch64 memset code crashes when doing this. A basic loop with 8bit or 32bit
writes does not crash. The memset call happily clears an array in cached memory
with different offsets.

I have posted a patch to spcache01 that generates the crash on Versal and ZynqMP
hardware. The crash dump is:

test cache coherent allocation
clear cache coherent with memset: 0x1fe00050


*** FATAL ***
fatal source: 9 (RTEMS_FATAL_SOURCE_EXCEPTION)


X0   = 0x1fe00050 X17  = 0x000c
X1   = 0x X18  = 0x17b0
X2   = 0x0110 X19  = 0x1fe00050
X3   = 0x1fe000c0 X20  = 0x1fdfff80
X4   = 0x1fe00250 X21  = 0x10013ab0
X5   = 0x0004 X22  = 0x
X6   = 0x0001 X23  = 0x
X7   = 0x X24  = 0x10103140
X8   = 0x X25  = 0x
X9   = 0xff80ffc8 X26  = 0x
X10  = 0x X27  = 0x
X11  = 0x1010ca78 X28  = 0x
X12  = 0x0001 FP   = 0x1010cc30
X13  = 0x1fe00050 LR   = 0x10001f94
X14  = 0x SP   = 0x1010cc30
X15  = 0x0004 PC   = 0x100125c0
X16  = 0x1000f700 DAIF = 0x03c0
VEC  = 0x0004 CPSR = 0x6005
ESR  = EC: 0b100101 IL: 0b1 ISS: 0b00111
   Data Abort taken without a change in Exception level
FAR  = 0x1fe000c0
FPCR = 0x FPSR = 0x0010

The Versal (A72) fails in exactly the same way. The allocated address is
0x1fe00050 and the FAR is 0x1fe000c0 so I am not sure if the "0xc0 - 0x50"
section is aligning the pointer to a larger word size for better performance and
that first part is OK but the different word size breaks.
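
The address arithmetic behind the "0xc0 - 0x50" remark can be written out as
a small sketch. The two addresses are taken from the crash dump above. The
64-byte block size is an assumption used only for illustration, and the
helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Values taken from the crash dump: X0 (allocation) and FAR (fault). */
#define ALLOC_ADDR UINT64_C(0x1fe00050)
#define FAULT_ADDR UINT64_C(0x1fe000c0)

/* Assumption for illustration: a 64-byte block for the wide clear. */
#define ASSUMED_BLOCK_BYTES UINT64_C(64)

/* Number of bytes between the start of the buffer and the fault. */
static uint64_t fault_distance(void)
{
  return FAULT_ADDR - ALLOC_ADDR;
}

/* Offset of addr inside a power-of-two sized block. */
static uint64_t offset_in_block(uint64_t addr, uint64_t block)
{
  assert(block != 0 && (block & (block - 1)) == 0);

  return addr & (block - 1);
}
```

With these values the fault is 0x70 bytes into the buffer. The faulting
address sits on a 64-byte boundary while the allocation starts 0x10 bytes
into a 64-byte block, which would be consistent with the first part of the
buffer being written with ordinary stores before a block-aligned method takes
over.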

Chris