Hi,

On 22/04/16 13:09, Hans de Goede wrote:
> Hi,
>
> On 22-04-16 13:46, Andre Przywara wrote:
>> Hi Hans,
>>
>> thanks for the information and the heads up!
>>
>> On 22/04/16 11:48, Hans de Goede wrote:
>>> Hi,
>>>
>>> On 22-04-16 11:32, Ian Campbell wrote:
>>>> On Fri, 2016-04-15 at 09:34 +0200, Hans de Goede wrote:
>>>>>> I wonder if what you are observing could possibly be explained
>>>>>> by just a usual data corruption problem? Which may be happening
>>>>>> when the DRAM clock speed is set higher than this particular
>>>>>> device is able to handle in a reliable way. Inserting just one
>>>>>> or more NOP instructions instead of the barrier could possibly
>>>>>> change some timings too.
>>>>>>
>>>>>> If this patch helps, then it's fine. But I wonder if it is not
>>>>>> merely making the problem latent instead of fixing the root
>>>>>> cause?
>>>>> I do believe that this patch addresses a real problem and is not
>>>>> hiding some DRAM timing issues. I might be wrong about the write
>>>>> buffer being the cause; it could simply be that the compiler is
>>>>> doing something bad (despite the accesses being marked as
>>>>> volatile) and that the DSB stops the compiler from optimizing
>>>>> things too much.
>>>>
>>>> I have a _very_ vague memory of seeing something not dissimilar
>>>> to this (apparent write-buffer interactions with the MMU disabled)
>>>> in the early days of Xen development, but that was probably on
>>>> models and so may not have been representative of the intended
>>>> behaviour of eventual silicon.
>>>>
>>>> It might be interesting to have a look at the generated assembly
>>>> and see if it differs by more or less than the addition of the
>>>> single instruction, and perhaps experiment with just a compiler
>>>> barrier.
>>>>
>>>> Andre, do you have any insights on this?
>>
>> Agree on the compiler barrier; frankly I don't see how this should
>> break with caches on or off, unless the actual instruction order is
>> wrong or the compiler optimized something away.
>> Regardless of the write buffer, the core should make sure that
>> subsequent reads return the value written before - especially if we
>> are talking UP here.
>
> "the core should make sure the subsequent reads return the value
> written before"
> that is exactly the problem: we are writing 2 different values to
> DRAM_BASE and DRAM_BASE + 512MiB, then read them both back and
> compare them, expecting them to be the same (both reads returning
> the last written value) if the RAM size is 512MiB (this is used in
> several places in the dram controller code to auto-configure the
> number of rows, columns, etc.).
>
> But the core seems to just return the last written value, rather
> than actually going out to the RAM and reading it from there, which
> results in the function always returning false (i.o.w. it claims no
> DRAM phys address wraparound is happening at 512MiB).
Oh, right, I missed that part, sorry.

So this is about physical aliasing? The DRAM controller has only n
address lines connected, and changing a line >n shouldn't make a
difference, right? And the write succeeds and doesn't trigger an
asynchronous abort?

In this case you would indeed need some kind of "flushing"; with
caches on I'd say a DCCIMVAC (Clean and Invalidate data or unified
cache line by MVA to PoC).

So I did a quick poll around the office and people say that "dsb" is
the right thing to do here (with the MMU off). As this is backed by
practical experience, I'd just say: good to go!

> The DSB seems to fix this, but it might very well be the
> compiler being too clever (although all accesses are done
> through volatile pointers, so it really should not).

Plus those writel and readl macros already have a compiler barrier,
though on the "wrong" side for our purpose (before the write and
after the read).

Cheers,
Andre.

>
> I'll try the barrier() fix when I've some time.
>
> Regards,
>
> Hans
>
>>
>>>
>>> Andre, here is the original mail/patch for reference:
>>>
>>> sunxi: mctl_mem_matches: Add missing memory barrier
>>>
>>> We are running with the caches disabled when mctl_mem_matches gets
>>> called, but the cpu's write buffer is still there and can still
>>> get in the way, add a memory barrier to fix this.
>>>
>>> This avoids mctl_mem_matches always returning false in some cases,
>>> which was resulting in:
>>>
>>> <snip>
>>>
>>> @@ -31,6 +32,7 @@ bool mctl_mem_matches(u32 offset)
>>>      /* Try to write different values to RAM at two addresses */
>>>      writel(0, CONFIG_SYS_SDRAM_BASE);
>>>      writel(0xaa55aa55, (ulong)CONFIG_SYS_SDRAM_BASE + offset);
>>> +    DSB;
>>>      /* Check if the same value is actually observed when reading back */
>>>      return readl(CONFIG_SYS_SDRAM_BASE) ==
>>>             readl((ulong)CONFIG_SYS_SDRAM_BASE + offset);
>>>
>>> What this code is trying to do is determine the RAM (chip) size by
>>> seeing when writing to RAM wraps around.
>>>
>>> This works with the DSB but not without it (without it, it always
>>> returns false); this is on a Cortex-A7 with the MMU (and data
>>> caches) disabled.
>>>
>>> Ian, I can try using just a compiler barrier, but I've never done
>>> so before, how do I insert one?
>>
>> barrier();
>>
>> I am busy at the moment, but will take a look later.
>>
>> Cheers,
>> Andre.
>>

_______________________________________________
U-Boot mailing list
U-Boot@lists.denx.de
http://lists.denx.de/mailman/listinfo/u-boot
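
For reference, the wraparound check discussed in the thread boils down
to something like the sketch below. This is a minimal illustration
under stated assumptions, not the exact sunxi source: the function
name mem_aliases_at and the DSB definition here are hypothetical; the
real code is mctl_mem_matches() in U-Boot's sunxi DRAM helpers, using
writel()/readl() and CONFIG_SYS_SDRAM_BASE as shown in the quoted
patch.

    #include <stdbool.h>
    #include <stdint.h>

    /* Assumed ARMv7 definition; "dsb" waits until all outstanding
     * memory accesses, including buffered writes, have completed. */
    #define DSB asm volatile ("dsb" : : : "memory")

    /*
     * Detect DRAM physical address wraparound: if fewer address
     * lines are wired up than "offset" needs, base and base + offset
     * alias to the same DRAM cell, so both reads return the last
     * value written (0xaa55aa55).
     */
    static bool mem_aliases_at(uintptr_t base, uint32_t offset)
    {
            volatile uint32_t *lo = (volatile uint32_t *)base;
            volatile uint32_t *hi = (volatile uint32_t *)(base + offset);

            *lo = 0;           /* write two different patterns */
            *hi = 0xaa55aa55;
            DSB;               /* drain the write buffer before reading back */
            return *lo == *hi; /* equal => the writes aliased */
    }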
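
On the compiler-barrier experiment Ian suggests: a barrier() only
constrains the compiler's ordering and emits no instruction, while a
DSB additionally orders the hardware. A rough sketch of the two (the
barrier() definition is the usual GCC idiom as used in U-Boot and
Linux; the DSB definition is again an assumption about the macro used
in the patch):

    /* Compiler barrier: forbids the compiler from reordering or
     * caching memory accesses across this point; no instruction is
     * emitted, so the CPU's write buffer is unaffected. */
    #define barrier() __asm__ __volatile__("" : : : "memory")

    /* Data Synchronization Barrier: also makes the CPU complete all
     * outstanding memory accesses, draining the write buffer, before
     * any subsequent instruction executes. */
    #define DSB asm volatile ("dsb" : : : "memory")

If the failure were purely a compiler problem, replacing the DSB with
barrier() in mctl_mem_matches() should fix it too; if it is really the
write buffer with the MMU off, only the DSB would help, which is the
experiment Hans says he will try.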