Hi Adrien,

OpenOCD creates a link between the CPU and GDB. Neither GDB nor OpenOCD has the knowledge required to access the RAM directly (apart from the Cortex-M case, where the RAM is mapped in the same address space as the debug registers).
By the way, although it is not relevant here, there are cases where having GDB access the RAM directly is required, but this is work in progress in https://review.openocd.org/c/openocd/+/8815
Today, GDB memory requests are handled through the CPU. For armv8r, OpenOCD injects instructions into the CPU pipeline to read and write memory through CPU registers. Memory is read and written by the CPU, possibly passing through the D-Cache. The I-Cache is not affected, so after a memory write the D-Cache must be flushed and the I-Cache invalidated before executing new code. The D-Cache flush also requires flushing any write buffer present in the CPU. The I-Cache invalidation also requires invalidating the instruction pre-fetch buffer in the CPU.
The sequence provided by ARM should guarantee that everything is properly synchronized. The SW sequence in OpenOCD is incorrect; it misses at least the invalidation of the pre-fetch buffer.

Regards,
Antonio

On Mon, May 5, 2025, 14:16 Adrien CHARRUEL <acharr...@nanoxplore.com> wrote:

> Hi Antonio,
>
> Sorry, the word "sideloading" may not be appropriate in my case.
>
> What I meant is that, to my understanding, OpenOCD is loading the ELF file through the DAP of the SoC.
> The DAP is linked to the interconnect and has access to the embedded RAM to write the binary.
>
> The loading of the second ELF file is done without resetting the board.
> Doing so bypasses the caches of the CPUs, thus compromising coherency. So the caches have to be invalidated before running the CPUs.
>
> Here is a schematic of our architecture.
> I hope this makes sense.
>
>   OPENOCD
>      │
>      │
>      │                          ┌─────────────┐
>      │                          │             │
>      │                          │    eRAM     │
>      │                          │             │
>      │                          └──────┬──────┘
> ┌────┴─────┐     ┌─────────────────────┴─────────────────────────────────┐
> │          │     │                                                       │
> │   DAP    ├─────┤               ARM Network Interconnect                │
> │          │     │                                                       │
> └──────────┘     └────────────────────────┬──────────────────────────────┘
>                                           │
>                      ┌────────────┐ ┌─┌───┴───┐─┐
>                      │            │ │ │ CACHE │ │
>                      │   CORE0    │ │ └───────┘ │
>                      │   TCMA     ├─┤           │
>                      │            │ │           │
>                      └────────────┘ │           │
>                      ┌────────────┐ │   CPU0    │
>                      │            │ │           │
>                      │   CORE0    │ │           │
>                      │   TCMB     ├─┤           │
>                      │            │ │           │
>                      └────────────┘ └───────────┘
>
> Thanks for your time.
> Best regards,
>
> Adrien Charruel
> ------------------------------
> *From:* Antonio Borneo <borneo.anto...@gmail.com>
> *Sent:* Sunday, May 4, 2025 13:03
> *To:* Adrien CHARRUEL <acharr...@nanoxplore.com>
> *Cc:* openocd-devel@lists.sourceforge.net <openocd-devel@lists.sourceforge.net>
> *Subject:* Re: ARMv8r: Unable to Load ELF With Cache Enabled
>
> On Tue, Apr 29, 2025 at 4:12 PM Adrien CHARRUEL <acharr...@nanoxplore.com> wrote:
> >
> > Hi,
> >
> > I'd like to raise an issue my team is having with OpenOCD. I'm not able to load an ELF file from GDB once some code is already running and caches have been enabled.
> >
> > Here is a more complete description of my use case:
> >
> > Start OpenOCD and connect with GDB.
> > Load a first ELF file and run it through GDB. This first program enables caches.
> > Halt the target.
> > Load a second ELF file through GDB with the "load" command.
> > Run it => fails... Sometimes I get a crash. For very small programs, the old one still resides in the cache and is executed instead of the new one.
> >
> > The issue is that I'm sideloading the binary through the DAP and memory coherency is not preserved.
>
> Hi,
> what do you mean by "sideloading the binary through the DAP"?
> The GDB command "load" in your email uses the current target for the load. No sideloading through other APs is implemented.
>
> In the latest ARM reference manual for armv8a
> https://developer.arm.com/documentation/ddi0487/latest/
> in chapter B2.7.4.2 for 64 bits, "Synchronization and coherency issues between data and instruction accesses",
> and in E2.5.3.2 for 32 bits, "Synchronization and coherency issues between data and instruction accesses",
> there are the sequences of instructions that guarantee coherency.
> I see that the OpenOCD code does not follow them.
> On Cortex-A35, during step-by-step execution, replacing the next instruction with a SW breakpoint fails because the CPU has already pre-fetched the next instructions.
> I cannot find information on such a prefetch, but it's reasonable to assume that it exists and that the correct sequence from ARM is required to invalidate it.
>
> Checking the extra documents for armv8r, DDI0568 and DDI0600, I don't find any specific info.
>
> Updating the sequence in OpenOCD for my SW breakpoint issue will probably fix your issue too.
>
> Antonio
>
> > Here are some more details on my setup:
> >
> > Target: quad cortex-r52 from our in-house chip. It's a standard armv8r core. We use our in-house JTAG probe, Angie, which is upstreamed in openocd.
> > Configuration script is "ngultra.cfg", attached to this email.
> > Command line:
> >     ./src/openocd -f tcl/interface/angie.cfg -f tcl/target/ngultra.cfg
> > List of GDB commands:
> >     target remote :3333
> >     load apps/test_led/out/test_led.elf
> >     file apps/test_led/out/test_led.elf
> >     continue
> >     CTRL+C
> >     load apps/test_led2/out/test_led2.elf
> >     file apps/test_led2/out/test_led2.elf
> >     continue
> > Expected: the second LED test should run fine, but it does not.
> > OpenOCD logs are attached as "openocd.log".
> >
> > Version of OpenOCD is SHA1 "169d463a3d3c91f62c980aba287b5e110b310ad0" with extra patches available here:
> >
> > https://review.openocd.org/q/topic:nx-armv8r
> > https://review.openocd.org/q/topic:nx-angie
> >
> > Potential solution and discussion:
> >
> > As there is already a discussion on this topic with Antonio (https://review.openocd.org/c/openocd/+/8656), I removed this part from my patchset.
> > Indeed, Antonio's point is good: it's too slow to attempt a cache clear at every memory write. It's not an option.
> >
> > I prepared another patch that adds a "clear_caches" command for aarch64.
> > The user will have to call it manually before loading another binary. It might be a better solution.
> >
> > I'm still wondering if the cache clear could be called from another function, maybe in the aarch64_halt() callback, but I can't tell if there will be side effects.
> >
> > Moreover, this behaviour is not only relevant for armv8r but for armv8a as well. I reproduced it by using aarch64 instead of armv8r in the target configuration.
> >
> > I hope that my question is clear enough. I remain available to discuss this point.
> > Maybe this is expected behaviour; I'd like to hear from the community how to deal with this issue and to understand it better.
> >
> > Thanks a lot.
> > Best regards,
> >
> > Adrien Charruel
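
As an illustration of the failure discussed in this thread, here is a toy Python model. It is not OpenOCD code and not the ARM instruction sequence itself. It assumes a core with a write-back D-cache, an I-cache and an instruction pre-fetch buffer. The class ToyCore, its methods, the demo() helper and the address 0x1000 are all hypothetical names made up for this sketch. It shows why cleaning the D-cache and invalidating the I-cache may not be enough while the pre-fetch buffer still holds the old instruction.

```python
class ToyCore:
    """Hypothetical toy core: RAM, write-back D-cache, I-cache, pre-fetch buffer."""

    def __init__(self):
        self.ram = {}       # backing memory: address -> word
        self.dcache = {}    # write-back data cache
        self.icache = {}    # instruction cache, filled from RAM on a miss
        self.prefetch = {}  # instructions fetched ahead of execution

    def write(self, addr, word):
        """A memory write executed by the CPU lands in the D-cache only."""
        self.dcache[addr] = word

    def fetch(self, addr):
        """Instruction fetch: pre-fetch buffer first, then I-cache, then RAM."""
        if addr not in self.prefetch:
            if addr not in self.icache:
                self.icache[addr] = self.ram[addr]
            self.prefetch[addr] = self.icache[addr]
        return self.prefetch[addr]

    def clean_dcache(self):
        """Write the D-cache content back to RAM."""
        self.ram.update(self.dcache)

    def invalidate_icache(self):
        """Drop all I-cache lines so the next miss reads RAM again."""
        self.icache.clear()

    def flush_prefetch(self):
        """Drop instructions that were fetched before the write."""
        self.prefetch.clear()


def demo():
    """Load a new word over an already executed one, with three maintenance variants."""
    sequences = {
        "no maintenance": [],
        "clean + invalidate only": ["clean_dcache", "invalidate_icache"],
        "full sequence": ["clean_dcache", "invalidate_icache", "flush_prefetch"],
    }
    results = {}
    for name, ops in sequences.items():
        core = ToyCore()
        core.ram[0x1000] = "old"
        core.fetch(0x1000)          # the first program ran: "old" is cached
        core.write(0x1000, "new")   # the second program is loaded via the CPU
        for op in ops:
            getattr(core, op)()
        results[name] = core.fetch(0x1000)
    return results


if __name__ == "__main__":
    for name, word in demo().items():
        print(f"{name}: fetched {word!r}")
```

In this model only the full sequence fetches the new word; the two shorter variants still return the old one. Real hardware additionally relies on barriers (DSB, ISB) for ordering, which the model does not represent.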