[OpenOCD-devel] Cortex-A cache handling thoughts

Paul Fertser Thu, 09 Apr 2015 14:16:14 -0700

Hi,

Oleksij is working on making OpenOCD work properly for debugging the
kernel and userspace on what seems to be the toughest ARM configuration
at the moment: a dual-core SMP Cortex-A9 (i.MX6) with the MMU, L1 and L2
caches enabled and an AHB-AP available. We spent some time today
experimenting and discussing the issues we ran into, and I'd like to sum
them up here.


My idea is that we should first agree on how it should be done
properly and then try to implement it; the current patches we have on
Gerrit are not cutting it at all, AFAICT.

I have never used OpenOCD with Cortex-A myself, so I might be wrong
here and there, please correct me if you see a mistake.

Some basic terminology (without strict definitions) first:

* AHB-AP access: fast access to main memory without involving any CPU
  caches, available on some targets;
* APB-AP access: slower memory access that basically goes through the
  CPU core, equivalent to what the target firmware would perform,
  available on all targets;
* L1 cache: per-core instruction and data caches;
* L2 cache: common cache shared by all SMP cores;
* Cache invalidation/flush: drops the current cache data by marking it
  invalid;
* Cache cleaning: pushes the current cache contents (if they were
  written to) to the next cache level (or RAM) without invalidating
  anything;
* Software breakpoint: a breakpoint implemented by temporarily
  replacing a target firmware instruction in RAM with a BKPT
  instruction.
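
For reference, on ARMv7-A the per-core L1 operations above are CP15 MCR
instructions; over the APB-AP a debugger typically executes them on the
halted core by writing them to the debug Instruction Transfer Register.
The sketch below only shows the encoding (ARM state, condition AL). The
enum and function names are illustrative, not OpenOCD's own, although
OpenOCD's arm_opcodes.h has a comparable ARMV4_5_MCR macro.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative names for a few ARMv7-A CP15 c7 cache maintenance ops. */
enum cp15_cache_op {
	DCCMVAC,   /* clean D-cache line by MVA to PoC:            c7, c10, 1 */
	DCIMVAC,   /* invalidate D-cache line by MVA to PoC:       c7, c6,  1 */
	DCCIMVAC,  /* clean + invalidate D-cache line by MVA, PoC: c7, c14, 1 */
	ICIMVAU,   /* invalidate I-cache line by MVA to PoU:       c7, c5,  1 */
	ICIALLUIS, /* invalidate all I-caches, Inner Shareable:    c7, c1,  0 */
};

/* Encode "MCR p<cp>, <opc1>, R<rt>, c<crn>, c<crm>, <opc2>" (A1, cond = AL). */
static uint32_t encode_mcr(unsigned cp, unsigned opc1, unsigned rt,
			   unsigned crn, unsigned crm, unsigned opc2)
{
	assert(cp < 16 && opc1 < 8 && rt < 16 && crn < 16 && crm < 16 && opc2 < 8);
	return 0xEE000010u | (opc1 << 21) | (crn << 16) | (rt << 12) |
	       (cp << 8) | (opc2 << 5) | crm;
}

/* MCR performing `op`; by-MVA ops take the virtual address from R<rt>. */
static uint32_t cache_op_insn(enum cp15_cache_op op, unsigned rt)
{
	switch (op) {
	case DCCMVAC:   return encode_mcr(15, 0, rt, 7, 10, 1);
	case DCIMVAC:   return encode_mcr(15, 0, rt, 7, 6, 1);
	case DCCIMVAC:  return encode_mcr(15, 0, rt, 7, 14, 1);
	case ICIMVAU:   return encode_mcr(15, 0, rt, 7, 5, 1);
	case ICIALLUIS: return encode_mcr(15, 0, rt, 7, 1, 0);
	}
	return 0; /* not reached for valid enum values */
}
```

Whether a by-MVA operation issued on one core also reaches the other
cores depends on the core's maintenance-broadcast configuration, so "on
all cores" may mean either one broadcast operation or running the
operation on each core in turn.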

Current issues: cache handling in general is a mess and
inconsistent. One specific problem worth mentioning is that L2 cache
management requires APB access, so when AHB is available the L2
maintenance ends up doing nothing.
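
For context, the i.MX6 L2 is an Arm L2C-310 (PL310) controller, which is
maintained by writing physical addresses to its memory-mapped registers
rather than through CP15. A minimal sketch of building the register
writes for "clean affected L2 cache" or "invalidate affected L2 cache",
assuming 32-byte lines and the L2C-310 offsets (Invalidate Line by PA at
0x770, Clean Line by PA at 0x7B0, Cache Sync at 0x730). The struct and
function names are hypothetical, and the controller base address is left
as a parameter:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define L2C310_LINE_SIZE     32u
#define L2C310_CACHE_SYNC    0x730u /* drains pending maintenance */
#define L2C310_INV_LINE_PA   0x770u
#define L2C310_CLEAN_LINE_PA 0x7B0u

/* One 32-bit register write the debugger has to issue (hypothetical). */
struct reg_write {
	uint32_t addr;
	uint32_t value;
};

/*
 * Fill `out` with one by-PA write per L2 line touched by [pa, pa + len),
 * followed by a Cache Sync. Returns the number of writes, or 0 if `out`
 * (of capacity `max`) is too small.
 */
static size_t l2c310_range_op(uint32_t base, uint32_t op_offset,
			      uint32_t pa, uint32_t len,
			      struct reg_write *out, size_t max)
{
	assert(len > 0 && (uint64_t)pa + len <= UINT64_C(0x100000000));
	uint32_t first = pa & ~(L2C310_LINE_SIZE - 1);
	uint32_t last = (pa + len - 1) & ~(L2C310_LINE_SIZE - 1);
	size_t n = 0;

	for (uint32_t line = first; ; line += L2C310_LINE_SIZE) {
		if (n == max)
			return 0;
		out[n++] = (struct reg_write){ base + op_offset, line };
		if (line == last)
			break;
	}
	if (n == max)
		return 0;
	out[n++] = (struct reg_write){ base + L2C310_CACHE_SYNC, 0 };
	return n;
}
```

The sketch only produces the writes; how they are issued to the
controller is left to the caller.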

Now I'll try to describe what I think the desired behaviour is. The
main rationale is that, for the end user to be least surprised, OpenOCD
should manipulate exactly the same data as the target firmware running
on the currently active core. The three major use cases I describe here
are:

1. Regular memory access (mdw/mww);
2. Loading and dumping big chunks of data;
3. Software breakpoints.

I'll discuss them one by one, in order.


1. Regular memory access

To make it correct with APB, no cache operations should be performed at
all. Rationale: regular data reads and writes are expected to be done
per-core. Implicit cache operations are not needed here, as the user
shouldn't expect other cores to be fully in sync at arbitrary points
in time.

When reading via AHB: clean affected dcache, clean affected L2 cache,
read via AHB;

When writing via AHB: write data, invalidate affected dcache and
icache on the current core, then invalidate affected L2 cache.
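
The two AHB sequences above, together with the "no maintenance" APB
case, can be written down as an ordered list of steps. A minimal sketch
with hypothetical step names; each step would map onto the CP15 or L2
controller operations of the target at hand:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical step names for the regular-access sequences. */
enum step {
	STEP_CLEAN_DCACHE_CUR, /* clean affected D-cache lines, current core */
	STEP_INV_DCACHE_CUR,   /* invalidate affected D-cache lines, current core */
	STEP_INV_ICACHE_CUR,   /* invalidate affected I-cache lines, current core */
	STEP_CLEAN_L2,         /* clean affected L2 lines */
	STEP_INV_L2,           /* invalidate affected L2 lines */
	STEP_ACCESS,           /* the actual read or write */
};

/* Steps for a regular (mdw/mww-style) access; returns the step count. */
static size_t regular_access_plan(bool via_ahb, bool is_write, enum step out[6])
{
	size_t n = 0;

	if (!via_ahb) {
		/* APB goes through the core, so it already sees what the core sees. */
		out[n++] = STEP_ACCESS;
	} else if (!is_write) {
		/* Push dirty lines out to RAM first, then read RAM directly. */
		out[n++] = STEP_CLEAN_DCACHE_CUR;
		out[n++] = STEP_CLEAN_L2;
		out[n++] = STEP_ACCESS;
	} else {
		/* Write RAM directly, then drop the stale cached copies. */
		out[n++] = STEP_ACCESS;
		out[n++] = STEP_INV_DCACHE_CUR;
		out[n++] = STEP_INV_ICACHE_CUR;
		out[n++] = STEP_INV_L2;
	}
	return n;
}
```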

An open question: is using AHB here ever worth it? Should at least
"phys" operations always use APB?


2. Loading and dumping big chunks of data

Here the user likely expects all cores to see the same picture, so:

APB read: clean affected dcache on all cores, clean affected L2 cache,
perform read;
AHB read: same;

APB write: do the write; clean affected dcache, clean affected L2
cache, invalidate affected icache and dcache on all cores;
AHB write: do the write; invalidate affected icache and dcache on all
cores, invalidate affected L2 cache.

Oleksij also says that for AHB operations here he'd like to have an
option to omit cache maintenance due to potential performance issues,
but I have the impression there are none.
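
A sketch of the bulk-transfer sequences above for an arbitrary number of
cores, including the opt-out Oleksij asks for (modelled here as a
skip_maintenance flag that only applies to AHB, which is an assumption).
The names are hypothetical, and buffer bounds are checked with assert
only, to keep it short:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical operations; `core` says where a per-core operation runs. */
enum op {
	OP_CLEAN_DCACHE, /* clean affected D-cache lines on `core` */
	OP_INV_DCACHE,   /* invalidate affected D-cache lines on `core` */
	OP_INV_ICACHE,   /* invalidate affected I-cache lines on `core` */
	OP_CLEAN_L2,     /* clean affected lines in the shared L2 */
	OP_INV_L2,       /* invalidate affected lines in the shared L2 */
	OP_TRANSFER,     /* the bulk read or write itself */
};

struct action {
	enum op op;
	int core; /* -1 when the action is not tied to a core */
};

static void push(struct action *out, size_t *n, size_t max, enum op op, int core)
{
	assert(*n < max);
	out[(*n)++] = (struct action){ op, core };
}

/*
 * Sequence for a bulk load/dump that all `ncores` cores should then see
 * consistently; `cur` is the core an APB access goes through.
 */
static size_t bulk_plan(bool via_ahb, bool is_write, int ncores, int cur,
			bool skip_maintenance, struct action *out, size_t max)
{
	size_t n = 0;

	if (via_ahb && skip_maintenance) {
		push(out, &n, max, OP_TRANSFER, -1);
		return n;
	}
	if (!is_write) {
		/* APB and AHB reads alike: push every core's dirty data to RAM. */
		for (int c = 0; c < ncores; c++)
			push(out, &n, max, OP_CLEAN_DCACHE, c);
		push(out, &n, max, OP_CLEAN_L2, -1);
		push(out, &n, max, OP_TRANSFER, -1);
		return n;
	}
	push(out, &n, max, OP_TRANSFER, -1);
	if (!via_ahb) {
		/* The APB write sits in the current core's D-cache: push it out. */
		push(out, &n, max, OP_CLEAN_DCACHE, cur);
		push(out, &n, max, OP_CLEAN_L2, -1);
	}
	for (int c = 0; c < ncores; c++) {
		push(out, &n, max, OP_INV_ICACHE, c);
		push(out, &n, max, OP_INV_DCACHE, c);
	}
	if (via_ahb)
		push(out, &n, max, OP_INV_L2, -1);
	return n;
}
```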


3. Software breakpoints

Breakpoints are special in SMP configuration because the end-user
expects all cores to be affected by the same set of breakpoints.

Setting and clearing a breakpoint via APB: write memory, clean
affected dcache, clean affected L2 cache, invalidate affected icache
on all cores;

Setting and clearing a breakpoint via AHB: write memory, invalidate
affected icache on all cores, invalidate affected L2 cache.
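
A sketch of the pieces involved: the ARM and Thumb BKPT encodings, a toy
insertion on a single word of memory, and the maintenance each path
needs according to the two sequences above. The flag and function names
are hypothetical:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* ARM-state BKPT #imm16 (A1 encoding; the cond field must be AL). */
static uint32_t arm_bkpt(uint16_t imm16)
{
	return 0xE1200070u | ((uint32_t)(imm16 >> 4) << 8) | (imm16 & 0xFu);
}

/* Thumb BKPT #imm8 (16-bit encoding). */
static uint16_t thumb_bkpt(uint8_t imm8)
{
	return (uint16_t)(0xBE00u | imm8);
}

/* Hypothetical flags naming the maintenance each path needs. */
enum {
	MAINT_CLEAN_DCACHE_CUR = 1u << 0,
	MAINT_CLEAN_L2         = 1u << 1,
	MAINT_INV_ICACHE_ALL   = 1u << 2,
	MAINT_INV_L2           = 1u << 3,
};

/* Maintenance needed after writing (or restoring) a breakpoint insn. */
static unsigned bp_write_maintenance(bool via_ahb)
{
	if (via_ahb)
		/* Write went straight to RAM: L2 and every I-cache may be stale. */
		return MAINT_INV_ICACHE_ALL | MAINT_INV_L2;
	/* Write sits in the current core's D-cache: push it to RAM, then
	 * make every core refetch the instruction. */
	return MAINT_CLEAN_DCACHE_CUR | MAINT_CLEAN_L2 | MAINT_INV_ICACHE_ALL;
}

/* Toy model: put an ARM BKPT into one word of target memory and return
 * the original instruction so it can be restored later. */
static uint32_t set_arm_bp(uint32_t *word, uint16_t imm16)
{
	uint32_t orig = *word;
	*word = arm_bkpt(imm16);
	return orig;
}
```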

Is it ever worth doing via AHB?

There's a possible optimisation opportunity for when a step or resume
operation is performed after the target was stopped on a breakpoint:
currently OpenOCD uses the generic unset (to restore the original
instruction), then single-steps the current core, and then sets the
breakpoint again. When using APB, these operations can skip cache
maintenance, as only the current core should be affected anyway.
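
One reading of this optimisation, assuming the other cores stay halted
while the current core steps and that the I-cache is not kept coherent
with the D-cache by hardware (ARMv7-A does not require it): the
all-cores and L2 maintenance can be dropped, but the current core still
has to see its own restored instruction. A sketch with hypothetical
action names:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical actions for stepping off a software breakpoint. */
enum so_action {
	SO_RESTORE_ORIG,  /* write back the original instruction */
	SO_SYNC_LOCAL,    /* clean D-cache + invalidate I-cache, current core only */
	SO_SYNC_ALL,      /* full maintenance: L1 on all cores plus L2 */
	SO_STEP,          /* single-step the current core */
	SO_REINSERT_BKPT, /* write the BKPT instruction again */
};

/* Build the unset/step/set sequence; returns the number of actions. */
static size_t step_over_plan(bool via_apb, bool optimise, enum so_action out[5])
{
	/* With APB the write lands in this core's own D-cache, and (assuming
	 * the other cores stay halted) the BKPT is back before they run. */
	bool local = via_apb && optimise;
	size_t n = 0;

	out[n++] = SO_RESTORE_ORIG;
	out[n++] = local ? SO_SYNC_LOCAL : SO_SYNC_ALL;
	out[n++] = SO_STEP;
	out[n++] = SO_REINSERT_BKPT;
	out[n++] = local ? SO_SYNC_LOCAL : SO_SYNC_ALL;
	return n;
}
```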


It's getting late here, so I might be messing something up
again. Please correct me, discuss, and let's develop a consolidated,
sane approach to finally make OpenOCD do the right thing on those
powerful ARM cores.

HTH
-- 
Be free, use free (http://www.gnu.org/philosophy/free-sw.html) software!

_______________________________________________
OpenOCD-devel mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/openocd-devel
