On 10/19/2016 06:01 PM, Stephen Warren wrote:
> On 10/19/2016 04:32 PM, york sun wrote:
>> On 10/19/2016 12:18 PM, Stephen Warren wrote:
>>> On 10/19/2016 09:25 AM, Stephen Warren wrote:
>>>> On 10/14/2016 02:17 PM, York Sun wrote:
>>>>> Current code turns off the d-cache first, then flushes all levels
>>>>> of cache. This results in data loss. As soon as the d-cache is
>>>>> off, the dirty cache is discarded, according to the test on
>>>>> LS2080A. This issue was not seen as long as the external L3 cache
>>>>> was flushed to push the data to main memory. However, the external
>>>>> L3 cache is not guaranteed to have the data. To fix this, flush
>>>>> the d-cache by set/way first to make sure the cache is clean
>>>>> before turning it off.
>>>>
>>>>> diff --git a/arch/arm/cpu/armv8/cache_v8.c
>>>>> b/arch/arm/cpu/armv8/cache_v8.c
>>>>
>>>>> @@ -478,9 +478,9 @@ void dcache_disable(void)
>>>>
>>>>> +	flush_dcache_all();
>>>>> 	set_sctlr(sctlr & ~(CR_C|CR_M));
>>>>>
>>>>> -	flush_dcache_all();
>>>>> 	__asm_invalidate_tlb_all();
>>>>
>>>> I talked to Mark Rutland at ARM, and I believe the current code is
>>>> correct. Here's my interpretation of what he said:
>>>>
>>>> The dcache must be disabled first. This prevents allocation of new
>>>> entries in the cache during the flush operation, which prevents the
>>>> race conditions that were mentioned in the other thread.
>>>>
>>>> Then, the flush operation must be invoked. Since the cache is now
>>>> disabled, this can fully flush the cache without worrying about
>>>> racing with things being added to the cache.
>>>>
>>>> This all implies that the implementations of dcache_disable(),
>>>> set_sctlr(), flush_dcache_all(), and any code they call must not
>>>> access data in DRAM at all; because the dcache is off, any DRAM
>>>> access will[1] read potentially stale data from DRAM, rather than
>>>> any dirty data that might be in the cache.
>>>>
>>>> [1] I'm not sure if that's "will" or "may", i.e.
>>>> whether this is architecturally guaranteed in ARMv8 or is
>>>> implementation defined. At least the Cortex-A72 TRM says "will" for
>>>> that CPU; not sure about others.
>>>>
>>>> Perhaps the most obvious upshot of this is that the stack can't be
>>>> used. This implies to me that we need to recode all those functions
>>>> purely in assembly, or just possibly play some tricks to 100% force
>>>> gcc not to touch memory anywhere inside dcache_disable() or the
>>>> functions it calls. We're just getting lucky here right now, since
>>>> everything happens to be inlined, but I don't think we're doing
>>>> anything to 100% guarantee this.
>>>>
>>>> What worries me here is that, at least on Tegra, a "flush the
>>>> entire dcache" operation requires an SMC call to the secure
>>>> monitor. That will certainly access DRAM when the secure monitor
>>>> runs, but perhaps this doesn't matter since that's at a different
>>>> exception level, and we know the secure monitor accesses DRAM
>>>> regions that are separate from U-Boot's DRAM? I suspect life isn't
>>>> that convenient. I'm wondering if this all implies that, like patch
>>>> 2 in this series, we need to get 100% away from flush-by-set/way,
>>>> even with SoC-specific hooks to make that work reliably, and just
>>>> flush everything by VA, which IIRC is architecturally guaranteed to
>>>> work without SoC-specific logic. That way, we can encapsulate
>>>> everything into an assembly function without worrying about calling
>>>> SMCs or SoC-specific hook functions without using DRAM. Of course,
>>>> how that assembly function knows which VAs to flush without
>>>> decoding the page tables or other data structures is another
>>>> matter :-(
>>>
>>> Re: the last paragraph there:
>>>
>>> After reading the ARMv8 ARM, I see that EL1, EL2, and EL3 all have
>>> separate cache enable bits in SCTLR_ELx. I believe U-Boot only needs
>>> to flush/...
>>> its own RAM out of the dcache, since that's all that's relevant at
>>> the EL U-Boot is running at, and it doesn't care about anything EL3
>>> might have mapped cached. So, it's safe to invoke SMCs from the
>>> cache flush code in U-Boot even if the EL3 code touches its own
>>> DRAM. There might be a corner case where this isn't true if EL3 has
>>> some EL1/EL2-owned RAM mapped, but I don't expect that to be the
>>> case here.
>>>
>>> Re: my other series to add more cache hooks: I'll re-implement the
>>> Tegra hook in assembly so it's guaranteed not to touch RAM, retest,
>>> and resubmit.
>>>
>>> If it turns out that dcache_disable() ever starts touching DRAM at
>>> the wrong time, we can deal with that then; it doesn't now, at
>>> least.
>>>
>>
>> Stephen,
>>
>> I think one of our differences is whether the data is returned
>> correctly with a dirty dcache being disabled. With the current code
>> flow, the dcache is disabled, followed by flushing/invalidating by
>> set/way, then the L3 cache is flushed. I see a correct stack after
>> these steps. During my recent attempt to remove flushing L3, I added
>> code to flush the stack by VA (which I already confirmed pushes the
>> data to main memory) to replace flushing L3. I found that as soon as
>> the dcache is disabled, the dirty cache data is lost when accessed
>> from the core (debugged with a JTAG tool). This makes me think the
>> order may be wrong. By reversing the order, i.e. flushing the dcache
>> first and then disabling it, I can see correct data in main memory.
>> I understand the proposed new code requires not touching the stack
>> or any data after the first flush. I prefer not to make this change
>> if we have an explanation of what I saw.
>
> I think the confusion is: what does "lost" mean?
>
> On ARMv8 at least:
>
> Starting off with dcache enabled and containing some dirty data:
>
> CPU reads will read the dirty data from the cache.
>
> Now if dcache is disabled:
>
> CPU reads will be of the uncached/uncacheable type since the dcache
> is off. The dcache will not respond. So, the CPU will receive
> (potentially) stale data directly from RAM.
>
> In this state, it is not possible to access the dirty data in the
> caches. However, it's not (permanently) lost...
>
> Now if the dcache is fully flushed:
>
> All dirty data that was in the dcache will be written to RAM. After
> this process is complete, any reads from RAM will return the most
> up-to-date data; the temporarily "lost" dirty data is now no longer
> lost.
>
>
> The cache disable process must proceed as above; we can't flush the
> dcache and then disable it. Attempting to do things in that order
> would introduce race conditions. At all times before the dcache is
> disabled, any CPU writes to memory will allocate new lines in the
> dcache. Also, the dcache is fully responsive to the cache coherency
> protocol, so some other agent on the bus (another CPU, cluster, DMA
> agent, ...) could cause a (clean or dirty) line to be brought into
> the cache. This might happen after that part of the cache had
> already been flushed. In other words, it's impossible to fully flush
> the cache (by set/way) while it's enabled.
>
>
> I'm not sure that flushing by VA instead of by set/way solves
> anything much here. I believe flushing by VA runs less risk of
> interference due to the coherency protocol bringing lines back into
> the cache, since flush by VA will affect other caches. However, this
> still doesn't allow use of the CPU's own stack or other DRAM writes;
> if we:
>
> a) Flush by VA
> (this step is assumed to perform some function calls/returns, or
> other DRAM writes)
>
> b) Disable cache
>
> ... then irrespective of how step (a) was performed, the DRAM writes
> performed in that step are likely still in the cache as dirty data,
> so step (b) will cause them to be "lost" until a full flush
> operation is performed.
> Thus we must always flush after disabling the cache, so there's no
> point also flushing before.
>
> Stephen,
Sorry for the late response. I am still traveling...

I understand the data in the dirty cache is not lost when the dcache
is disabled; it is just not accessible. In my case, after flushing
L1/L2 by set/way, the data is probably in the L3 cache. Without
flushing L3, I have stale data (including the stack) in main memory.

My previous test was trying to prove that I can skip flushing L3 if I
flush the necessary VAs. Now that I recall it carefully, I think I
made a mistake by flushing by VA _after_ flushing the cache by
set/way. I might have had a positive result if I had flushed the
cache by VA first. I will repeat this test when I get back to prove
this theory.

That being said, do you or anyone else see any value in skipping the
L3 flush? My target was to avoid making an SMC call from EL2/EL1, and
to avoid implementing an L3 flush function in U-Boot (even though we
already have one for the CCN-504). I understand this is different
from what dcache_disable() implies.

York

_______________________________________________
U-Boot mailing list
U-Boot@lists.denx.de
http://lists.denx.de/mailman/listinfo/u-boot