On 10/04/2015 7:12 am, Paul Fertser wrote:
> Hi,
> 
> Oleksij is working on making OpenOCD work properly for debugging the
> kernel and userspace on what seems to be the toughest ARM
> configuration at the moment: dual-core SMP Cortex-A9 (i.MX6) with
> MMU, L1 and L2 caches enabled and an AHB-AP available. We spent some
> time today experimenting and discussing the issues faced, and I'd
> like to sum it up here.
> 

Many thanks for addressing this issue. Using a recent OpenOCD on a
Xilinx Zynq device with the RTEMS real-time operating system does not
work: the data shown is wrong and single-stepping is broken, and this
is on a single core. Note, RTEMS sets up the MMU/caches after OpenOCD
starts running.

> My idea is that we should first agree on how it should be done
> properly, then try to implement it; current patches we have on Gerrit,
> are not cutting it at all, AFAICT.
> 
> I have never used OpenOCD with Cortex-A myself, so I might be wrong
> here and there, please correct me if you see a mistake.
> 
> Some basic terminology (without strict definitions) first:
> 
> * AHB-AP access: fast access to main memory, without involving any CPU
>   caches; available on some targets;
> * APB-AP access: slower memory access going basically through the CPU
>   core, equivalent to what the target firmware itself would perform;
>   available on all targets;
> * L1 cache: per-core instruction and data caches;
> * L2 cache: common cache for all SMP cores;
> * Cache invalidation/flush: drops current cache data by marking it
>   invalid;
> * Cache cleaning: pushes current cache contents (if it was written to)
>   out to the next cache level (or RAM) without invalidating anything;
> * Software breakpoint: a breakpoint implemented by temporarily
>   changing a target firmware instruction in RAM to a BKPT instruction.
> 
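Since 'clean' vs 'invalidate' keeps coming up below, here is how the
two map onto the per-line ARMv7-A CP15 operations. This is a minimal
bare-metal sketch, assuming privileged ARMv7-A code; OpenOCD itself
would issue the equivalent CP15 operations through the debug interface
rather than run anything like this:

#include <stdint.h>

/* Smallest D-cache line size in bytes, from CTR.DminLine (log2 of words). */
static inline uint32_t dcache_line_size(void)
{
    uint32_t ctr;
    __asm__ volatile ("mrc p15, 0, %0, c0, c0, 1" : "=r" (ctr));
    return 4u << ((ctr >> 16) & 0xf);
}

/* Clean: write dirty lines back towards RAM; lines stay valid (DCCMVAC). */
static void dcache_clean_range(uintptr_t start, uintptr_t end)
{
    uint32_t line = dcache_line_size();
    for (uintptr_t va = start & ~(uintptr_t)(line - 1); va < end; va += line)
        __asm__ volatile ("mcr p15, 0, %0, c7, c10, 1" : : "r" (va) : "memory");
    __asm__ volatile ("dsb" : : : "memory");
}

/* Invalidate: drop lines without writing anything back (DCIMVAC). */
static void dcache_inval_range(uintptr_t start, uintptr_t end)
{
    uint32_t line = dcache_line_size();
    for (uintptr_t va = start & ~(uintptr_t)(line - 1); va < end; va += line)
        __asm__ volatile ("mcr p15, 0, %0, c7, c6, 1" : : "r" (va) : "memory");
    __asm__ volatile ("dsb" : : : "memory");
}

The I-cache counterpart is ICIMVAU (c7, c5, 1). Note these CP15 ops
only cover L1; the outer L2 on i.MX6 is a PL310 controller programmed
through memory-mapped registers and has to be maintained separately.
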
> Current issues: cache handling in general is a mess and
> inconsistent. One specific problem worth mentioning is that L2 cache
> management requires APB access, so when AHB access is in use, the L2
> maintenance currently ends up doing nothing.
> 
> Now I'll try to describe what the desired behaviour is, in my
> opinion. The main rationale is that, for the end-user to be least
> surprised, we should ensure that OpenOCD manipulates exactly the same
> data as the target firmware on the currently active core. The three
> major use cases I describe here are:
> 
> 1. Regular memory access (mdw/mww);
> 2. Loading and dumping big chunks of data;
> 3. Software breakpoints.
> 
> I'll discuss them one by one, in order.
> 
> 
> 1. Regular memory access
> 
> To make this correct via APB, no cache operations should be performed
> at all. Rationale: regular data reads and writes are expected to be
> done per-core. Implicit cache operations are not needed here, as the
> user shouldn't expect other cores to be fully in sync at arbitrary
> points in time.
> 
> When reading via AHB: clean affected dcache, clean affected l2 cache,
> read via AHB;
> 
> When writing via AHB: write data, invalidate affected dcache and
> icache on the current core and l2 cache
> 
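To pin down the ordering, here is a rough sketch of those two AHB
sequences. All helper names are hypothetical, not the real OpenOCD
API; the point is only the order of operations, and note that on the
write path the outer level should be invalidated before L1 so that L1
cannot refill from a stale L2 line:

#include <stddef.h>
#include <stdint.h>

struct target;

/* Assumed primitives (hypothetical names): */
int dap_ahb_read(struct target *t, uint32_t addr, size_t len, uint8_t *buf);
int dap_ahb_write(struct target *t, uint32_t addr, size_t len,
                  const uint8_t *buf);
void l1_dcache_clean(struct target *t, uint32_t addr, size_t len);
void l1_dcache_inval(struct target *t, uint32_t addr, size_t len);
void l1_icache_inval(struct target *t, uint32_t addr, size_t len);
void l2_clean(struct target *t, uint32_t addr, size_t len);  /* needs APB */
void l2_inval(struct target *t, uint32_t addr, size_t len);  /* needs APB */

/* Read via AHB: push dirty data down to RAM first, then read behind
 * the caches, so the debugger sees what the current core sees. */
int ahb_read_coherent(struct target *t, uint32_t addr, size_t len,
                      uint8_t *buf)
{
    l1_dcache_clean(t, addr, len);          /* L1 -> L2 */
    l2_clean(t, addr, len);                 /* L2 -> RAM */
    return dap_ahb_read(t, addr, len, buf);
}

/* Write via AHB: RAM now differs from the caches, so drop the stale
 * lines, outer level first. */
int ahb_write_coherent(struct target *t, uint32_t addr, size_t len,
                       const uint8_t *buf)
{
    int retval = dap_ahb_write(t, addr, len, buf);
    l2_inval(t, addr, len);
    l1_dcache_inval(t, addr, len);
    l1_icache_inval(t, addr, len);
    return retval;
}

One caveat worth flagging: a plain invalidate on the write path can
destroy unrelated dirty data sharing a cache line with the written
range, so a clean+invalidate (DCCIMVAC) might be the safer choice there.
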
> An open question: is using AHB here ever worth it? Should at least
> the "phys" operations always use APB?
> 

I think support should first be based around the access method that is
supported on all targets, APB. If AHB proves a performance win for
large reads/writes on hardware that supports it, maybe it can be added
later. On the other hand, if you add it now I will not complain. :)

> 2. Loading and dumping big chunks of data
> 
> Here the user likely expects all cores to see the same picture, so:
> 
> APB read: clean affected dcache on all cores, clean affected l2 cache,
> perform read;
> AHB read: same;

What does 'clean' mean? Does this mean you flush and then invalidate,
so the CPU is given priority over DMA-type hardware writing into the
same area?

> APB write: do the write; clean affected dcache, clean affected l2
> cache, invalidate affected icache and dcache on all cores;
> AHB write: do the write; invalidate affected icache and dcache on all
> cores, invalidate affected l2 cache;
> 
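The SMP-wide variant would then just wrap the same primitives in a
loop over the cores of one SMP group, something like this (again
hypothetical names, reusing the prototypes from the earlier sketch):

/* A minimal stand-in for however the SMP group ends up represented. */
struct core_list {
    struct target *core;
    struct core_list *next;
};

int dap_apb_write(struct target *t, uint32_t addr, size_t len,
                  const uint8_t *buf);

/* Bulk APB write that every core should observe. */
int bulk_apb_write(struct target *t, struct core_list *smp,
                   uint32_t addr, size_t len, const uint8_t *buf)
{
    int retval = dap_apb_write(t, addr, len, buf);
    if (retval != 0)
        return retval;

    l1_dcache_clean(t, addr, len);  /* writing core's D-cache -> L2 */
    l2_clean(t, addr, len);         /* L2 -> RAM */

    /* Make every core refetch both data and instructions. */
    for (struct core_list *c = smp; c != NULL; c = c->next) {
        l1_dcache_inval(c->core, addr, len);
        l1_icache_inval(c->core, addr, len);
    }
    return 0;
}
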
> Oleksij also says that for the AHB operations here he'd like to have
> an option to omit cache maintenance due to potential performance
> issues, but I have the impression there are none.

In terms of debugging a real-time application, hitting any breakpoint
means anything real-time has already gone. Showing correct data in the
debugger is important. I do not think performance is an issue.

> 3. Software breakpoints
> 
> Breakpoints are special in SMP configuration because the end-user
> expects all cores to be affected by the same set of breakpoints.

I think for now this is ok, but I wonder if in time this will change. I
see use cases such as some cores stopping while others are left
untouched as nice to have, but the rules for use would need to be
clearly defined to make it possible to implement. For example,
asymmetric multi-processing on SMP hardware, that is, different
operating systems running on separate cores in separate memory spaces.

> 
> Setting and clearing a breakpoint via APB: write memory, clean
> affected dcache, clean affected l2 cache, invalidate affected icache
> on all cores
> 
> Setting and clearing a breakpoint via AHB: write memory, invalidate
> affected icache on all cores, invalidate affected l2 cache
> 
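For concreteness, a sketch of the APB variant using the same
hypothetical primitives as above. The BKPT encodings are the
architectural ones (A32 0xe1200070, T16 0xbe00); endianness handling
is ignored for brevity:

#define ARM_BKPT   0xe1200070u  /* A32 BKPT #0 */
#define THUMB_BKPT 0xbe00u      /* T16 BKPT #0; Thumb needs a halfword write */

int dap_apb_read(struct target *t, uint32_t addr, size_t len, uint8_t *buf);

/* Set a software breakpoint via APB; 'saved' keeps the original
 * instruction so the breakpoint can be removed again later. */
int swbp_set(struct target *t, struct core_list *smp,
             uint32_t addr, uint32_t *saved)
{
    int retval = dap_apb_read(t, addr, 4, (uint8_t *)saved);
    if (retval != 0)
        return retval;

    uint32_t bkpt = ARM_BKPT;
    retval = dap_apb_write(t, addr, 4, (const uint8_t *)&bkpt);
    if (retval != 0)
        return retval;

    l1_dcache_clean(t, addr, 4);    /* D-side -> L2 */
    l2_clean(t, addr, 4);           /* L2 -> RAM */
    for (struct core_list *c = smp; c != NULL; c = c->next)
        l1_icache_inval(c->core, addr, 4);  /* every core must refetch */
    return 0;
}

Strictly speaking, the ARMv7-A self-modifying-code sequence also asks
for a branch predictor invalidate (BPIALL) after replacing
instructions, so that probably belongs in the list too.
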
> Is it ever worth doing this via AHB?

For me this depends on the cost to implement. I think APB is ok for
breakpoints. I suspect the JTAG and USB overheads will be higher than
the overhead differences between the buses. I suspect you will always
need APB support so if you have this working first you can move to AHB
in time if practicable.

> 
> There's a possible optimisation opportunity for when a step or resume
> operation is performed after the target was stopped on a breakpoint:
> currently OpenOCD uses a generic unset (to restore the original
> instruction), then single-steps the current core, and then sets the
> breakpoint again. When using APB, these operations can be spared the
> SMP-wide cache maintenance, as only the current core should be
> affected anyway.
> 

If you say the cores all see the same breakpoints, then I assume you
will have to stop all cores when one stops. It also means you cannot
touch any memory-based breakpoints until all have stopped. To do as you
suggest, all cores need to be halted, and then the single step happens
only on the single core being addressed. If you do this, and you can
ensure the memory is the same after the completed step process, the
caches of the other cores do not need to be touched. Note, you must
make sure no interrupt path is allowed to be followed when single
stepping at the instruction level.
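
Roughly, that step-over-breakpoint path would then look like the
following, reusing the hypothetical primitives from the sketches above
(all cores assumed already halted, so only the stepping core's caches
need maintenance, given memory ends up identical afterwards). For the
interrupt problem, OpenOCD already has the cortex_a maskisr command,
which if I remember correctly exists for exactly this:

int core_step(struct target *t);  /* hypothetical single-step of one core */

int swbp_step_over(struct target *t, uint32_t addr, uint32_t saved)
{
    /* Restore the original instruction; only this core will execute it. */
    int retval = dap_apb_write(t, addr, 4, (const uint8_t *)&saved);
    if (retval != 0)
        return retval;
    l1_dcache_clean(t, addr, 4);
    l2_clean(t, addr, 4);
    l1_icache_inval(t, addr, 4);

    retval = core_step(t);  /* single step, interrupts masked */
    if (retval != 0)
        return retval;

    /* Re-arm the breakpoint the same way. */
    uint32_t bkpt = 0xe1200070u;  /* A32 BKPT #0 */
    retval = dap_apb_write(t, addr, 4, (const uint8_t *)&bkpt);
    if (retval != 0)
        return retval;
    l1_dcache_clean(t, addr, 4);
    l2_clean(t, addr, 4);
    l1_icache_inval(t, addr, 4);
    return 0;
}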

> 
> It's getting late here, so I might be messing something up
> again. Please correct, discuss and let's develop a consolidated sane
> approach to finally make OpenOCD do the right thing on those powerful
> ARM cores.
> 

Again many thanks for addressing this issue.

Chris
