[linux-sunxi] Re: Allwinner A10: the L2 cache may not keep up with the speed at 1GHz
On Tue, 6 May 2014 12:34:45 +0300
Siarhei Siamashka wrote:

> On Sun, 4 May 2014 11:36:10 +0300
> Siarhei Siamashka wrote:
>
> > Hello,
> >
> > Yesterday I have been trying to debug what's causing the XFCE desktop
> > background artefacts on my A10-Lime, which look like this:
> >
> >     http://people.freedesktop.org/~siamashka/files/20140504/a10-l2-cache-fail-artefacts-in-xfce.png
> >
> > And narrowed them down to ARM Cortex-A8 L2 cache failures, which
> > are reproducible when doing JPEG decoding:
> >
> >     $ djpeg -v
> >     libjpeg-turbo version 1.3.0 (build 20130811)
> >
> >     $ wget http://linux-sunxi.org/images/8/83/A10-LIME.jpg
> >
> >     $ djpeg A10-LIME.jpg | md5sum
> >     691497bd2e5d36976c1ea3150de89df6  -
> >
> >     $ djpeg A10-LIME.jpg | md5sum
> >     6a874af750f92e1e3c019f2df7edf3f7  -
> >
> >     $ djpeg A10-LIME.jpg | md5sum
> >     297b98ba10233cbbcea2566e1c4fd7c7  -
> >
> > Please note that the md5sum of the decoded JPEG file is different for
> > each run.
> >
> > There are other ways to reproduce it (the FFmpeg test suite can detect
> > this problem too), but the djpeg test is very simple and fast to do.
> > In the case if somebody does not have the djpeg tool from libjpeg-turbo
> > in their distro, I have a static djpeg binary here for extra
> > convenience:
> >     http://people.freedesktop.org/~siamashka/files/20140504/djpeg-static
> > It has been built using:
> >
> >     http://people.freedesktop.org/~siamashka/files/20140504/build-static-djpeg.sh
> >
> > On my collection of just three Allwinner A10 based devices, I get the
> > following results with the libjpeg-turbo djpeg test (and the default
> > CPU core voltage):
> >     A10-Lime    - fails at 1008MHz (960MHz is fine)
> >     Mele A2000  - fails at 1152MHz (1104MHz is fine)
> >     Cubieboard1 - fails at 1152MHz (1104MHz is fine)
> >
> > Why is it likely related to the L2 cache? Because this problem goes
> > away if we disable the L2 cache by adding something like
> >     mrc p15, 0, r10, c1, c0, 1
> >     bic r10, r10, #(1 << 1)
> >     mcr p15, 0, r10, c1, c0, 1
> > to the code around
> >
> >     https://github.com/linux-sunxi/linux-sunxi/blob/sunxi-v3.4.86-r0/arch/arm/mm/proc-v7.S#L248
> >
> > It is also interesting that sun4i and sun5i have different L2 cache
> > latency parameters configured there. I have tried increasing the
> > latencies in the L2 Cache Auxiliary Control Register, but these
> > changes did not seem to affect anything. It looks like the only
> > important factors are the CPU clock speed and the CPU core
> > voltage (increasing it to 1.45V from 1.4V also fixes the problem
> > on my A10-Lime).
> >
> > Anyway, with the sample size of just 3 devices, 33% of them appear to
> > be unable to run stable at 1GHz and 1.4V core voltage. I wonder, how
> > common is this problem in general? Are there any other Allwinner A10
> > devices failing the libjpeg-turbo djpeg test at 1GHz?
> >
> > Also it would make sense to run reliability tests for all the cpufreq
> > operating points, because any frequency+voltage pair can be a weak link.
>
> Implemented an automated script for running tests at different
> operating points:
>     https://github.com/ssvb/cpuburn-arm/blob/master/cpufreq-ljt-stress-test
>
> Only 1008MHz appears to be really problematic on my A10-Lime device. An
> example of running it:
>
> lime ~ # ./cpufreq-ljt-stress-test
> CPU stress test, which is doing JPEG decoding by libjpeg-turbo
> at different cpufreq operating points.
>
> Testing CPU 0
>  1488 MHz SKIPPED
>  1440 MHz SKIPPED
>  1392 MHz SKIPPED
>  1344 MHz SKIPPED
>  1296 MHz SKIPPED
>  1248 MHz SKIPPED
>  1200 MHz SKIPPED
>  1152 MHz SKIPPED
>  1104 MHz SKIPPED
>  1056 MHz SKIPPED
>  1008 MHz . FAILED
>   960 MHz OK
>   912 MHz OK

A follow up to this (better late than never).
Olliver Schinagl has run the cpufreq-ljt-stress-test script on multiple
A10-Lime devices today:

    http://irclog.whitequark.org/linux-sunxi/2014-07-03#9499336;

It appears that the test failed on his revA A10-Lime and worked fine on
eight other revC A10-Lime boards. Together with my revA A10-Lime, that
makes a two-out-of-two failure rate for revA boards. All the other A10
based devices (Olliver's eight revC A10-Lime boards, my Cubieboard1 and
Mele A2000, and also lioka's hackberry) pass the test.

Now that we have finally collected the long awaited statistics, it looks
pretty obvious that there is something wrong specifically with
revision A of the A10-Lime board. Revision A was a pre-production
'developer' batch of the A10-Lime board, so very few people should be
affected (I got one donated to me for free, so can't really complain).

Kudos to Koen Kooi for providing us with a hint about the possible
voltage drop on the power line connecting the AXP209 PMIC and the A10
SoC, which seems to explain the problem:
https://www.mail-archive.com/linux-sunxi@
[linux-sunxi] Re: Allwinner A10: the L2 cache may not keep up with the speed at 1GHz
On Tue, 6 May 2014 12:34:45 +0300
Siarhei Siamashka wrote:

> Implemented an automated script for running tests at different
> operating points:
>     https://github.com/ssvb/cpuburn-arm/blob/master/cpufreq-ljt-stress-test

And also added a script to parse the cpufreq tables from the sunxi-3.4
kernel to generate a plot picture via gnuplot:
    https://github.com/ssvb/cpuburn-arm/blob/547c229284f3/gnuplot-sunxi-cpufreq

Using the cpufreq-ljt-stress-test script I have found the quasi-stable
pairs of the CPU clock frequency and voltage for my A10-OLinuXino-LIME
to populate the following table:

#define SAFETY_MARGIN 0 /* should be set to at least 25 */
static struct cpufreq_dvfs sun4i_poorlime_dvfs_table[] = {
    {.freq = 100800, .volt = 1450 + SAFETY_MARGIN},
    {.freq = 96000,  .volt = 1400 + SAFETY_MARGIN},
    {.freq = 91200,  .volt = 1350 + SAFETY_MARGIN},
    {.freq = 86400,  .volt = 1300 + SAFETY_MARGIN},
    {.freq = 69600,  .volt = 1275 + SAFETY_MARGIN},
    {.freq = 64800,  .volt = 1250 + SAFETY_MARGIN},
    {.freq = 52800,  .volt = 1225 + SAFETY_MARGIN},
    {.freq = 48000,  .volt = 1200 + SAFETY_MARGIN},
    {.freq = 43200,  .volt = 1175 + SAFETY_MARGIN},
    {.freq = 40800,  .volt = 1125 + SAFETY_MARGIN},
    {.freq = 38400,  .volt = 1025 + SAFETY_MARGIN},
    {.freq = 0,      .volt = 1000}, /* end of cpu dvfs table */
};

Quasi-stable here means that the cpufreq-ljt-stress-test can run
without failures for something like 10 minutes at each of these points,
but increasing the clock frequency or decreasing the core voltage makes
it fail. Surely, running longer may reveal that some of these operating
points are actually not stable. And also we can't be sure that
libjpeg-turbo is really the toughest possible workload one can find. So
it makes sense to additionally increase the voltage by at least one
extra step (0.025V) to have some safety margin.

Anyway, the generated plot is attached ('sun4i_poorlime' represents the
A10-OLinuXino-LIME cpufreq table that I tried to make).
Anyone should be able to generate the same picture too by running:

    gnuplot-sunxi-cpufreq result.png \
        arch/arm/mach-sun7i/cpu-freq/cpu-freq.c \
        arch/arm/plat-sunxi/cpu-freq/cpu-freq-table.c \
        some_file_with_sun4i_poorlime_dvfs_table.c

-- 
Best regards,
Siarhei Siamashka
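The safety-margin reasoning in this message can be sketched in a few lines of Python. This is only an illustration, not code from the sunxi-3.4 kernel. It assumes that `.freq = 100800` in sun4i_poorlime_dvfs_table stands for 1008 MHz (and so on for the other rows), and the lookup rule in `voltage_for` (use the slowest table entry that is still at least as fast as the requested clock) is a hypothetical choice made for the example.

```python
# Measured (frequency in MHz, core voltage in mV) pairs from the
# sun4i_poorlime table. Assumption: ".freq = 100800" is read as 1008 MHz,
# ".freq = 96000" as 960 MHz, and so on.
MEASURED = [
    (1008, 1450), (960, 1400), (912, 1350), (864, 1300),
    (696, 1275), (648, 1250), (528, 1225), (480, 1200),
    (432, 1175), (408, 1125), (384, 1025),
]

STEP_MV = 25  # one voltage step (0.025 V), as mentioned in the message


def with_margin(table, steps=1):
    """Raise every voltage in the table by `steps` voltage steps."""
    return [(freq, volt + steps * STEP_MV) for freq, volt in table]


def voltage_for(freq_mhz, table):
    """Hypothetical lookup: voltage of the slowest entry with freq >= freq_mhz."""
    candidates = [(f, v) for f, v in table if f >= freq_mhz]
    if not candidates:
        raise ValueError("frequency is above the fastest table entry")
    # min() compares tuples by frequency first, so this picks the slowest one.
    return min(candidates)[1]
```

For example, `voltage_for(900, with_margin(MEASURED))` picks the 912 MHz row and returns its measured voltage plus one 25 mV step.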
[linux-sunxi] Re: Allwinner A10: the L2 cache may not keep up with the speed at 1GHz
On Tue, 6 May 2014 12:34:45 +0300
Siarhei Siamashka wrote:

> Implemented an automated script for running tests at different
> operating points:
>     https://github.com/ssvb/cpuburn-arm/blob/master/cpufreq-ljt-stress-test
>
> Only 1008MHz appears to be really problematic on my A10-Lime device.

And on the Cubietruck with Allwinner A20 I get:

cubietruck ~ # ./cpufreq-ljt-stress-test
CPU stress test, which is doing JPEG decoding by libjpeg-turbo
at different cpufreq operating points.

Testing CPU 0
 1488 MHz SKIPPED
 1440 MHz SKIPPED
 1392 MHz SKIPPED
 1344 MHz SKIPPED
 1296 MHz SKIPPED
 1248 MHz SKIPPED
 1200 MHz SKIPPED
 1152 MHz SKIPPED
 1104 MHz SKIPPED
 1056 MHz SKIPPED
 1008 MHz SKIPPED
  960 MHz SKIPPED
  912 MHz OK
  864 MHz OK
  816 MHz OK
  768 MHz OK
  744 MHz OK
  720 MHz OK
  696 MHz OK
  672 MHz OK
  648 MHz OK
  600 MHz OK
  528 MHz OK
  480 MHz OK
  408 MHz OK
  384 MHz OK
  360 MHz OK
  336 MHz OK
  288 MHz OK
  264 MHz OK
  240 MHz OK
  216 MHz OK
  204 MHz OK
  192 MHz .. FAILED
  180 MHz OK
  168 MHz OK
  156 MHz OK
  144 MHz OK
  132 MHz OK
  120 MHz OK
   96 MHz .

Which means that the test has spotted data corruption issues at 192MHz
and deadlocked at 96MHz, without ever finishing.

It is interesting to compare the fex files from the Cubieboard2 and
the Cubietruck:
    https://github.com/linux-sunxi/sunxi-boards/blob/c36a1c2186b4/sys_config/a20/cubieboard2.fex
    https://github.com/linux-sunxi/sunxi-boards/blob/c36a1c2186b4/sys_config/a20/cubietruck.fex

The Cubietruck uses "min_freq = 6000", which means that it can try to
go as low as 60MHz. While for the Cubieboard2 we have "min_freq = 4",
which means that 400MHz is the lowest limit.

Just as I have long suspected and recently pointed out in [1], cpufreq
is a reliability hazard in its current implementation used by the
sunxi-3.4 kernel. This may explain some of the mysterious deadlocks
experienced by the users who are suicidal enough to run their A20
hardware with the 'ondemand', 'interactive' or 'fantasy' cpufreq
governors.
Unfortunately this also includes innocent bystanders, who are just
using sunxi-3.4 defconfigs :-(

1. https://www.mail-archive.com/linux-sunxi%40googlegroups.com/msg03612.html

-- 
Best regards,
Siarhei Siamashka
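To make it easier to spot the weak operating points in output like the Cubietruck log in this message, here is a minimal Python sketch that parses the "NNN MHz ... STATUS" entries. It is not part of cpufreq-ljt-stress-test. The function names are made up for this example, and treating an entry without a verdict (such as "96 MHz .") as a hang is an interpretation of the log, as is the heuristic for suggesting a lower frequency limit.

```python
import re

# One entry looks like "912 MHz OK", "192 MHz .. FAILED" or "96 MHz ."
# (the last form has no verdict because the run never finished).
_POINT = re.compile(r"(\d+) MHz[ .]*(SKIPPED|OK|FAILED)?")


def parse_results(text):
    """Return [(freq_mhz, status)]; status is None when no verdict was printed."""
    return [(int(freq), status or None) for freq, status in _POINT.findall(text)]


def problem_points(results):
    """Operating points that failed or never produced a verdict."""
    return [(f, s) for f, s in results if s not in ("OK", "SKIPPED")]


def lowest_ok_above_failures(results):
    """Heuristic: lowest OK frequency that is above every problem point."""
    bad = [f for f, _ in problem_points(results)]
    ok = [f for f, s in results if s == "OK"]
    if not bad:
        return min(ok) if ok else None
    above = [f for f in ok if f > max(bad)]
    return min(above) if above else None
```

Fed with the Cubietruck log, `problem_points` reports the 192 MHz failure and the 96 MHz entry without a verdict, and `lowest_ok_above_failures` returns 204, which could serve as a starting point for a more conservative min_freq.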
[linux-sunxi] Re: Allwinner A10: the L2 cache may not keep up with the speed at 1GHz
On Sun, 4 May 2014 11:36:10 +0300
Siarhei Siamashka wrote:

> Hello,
>
> Yesterday I have been trying to debug what's causing the XFCE desktop
> background artefacts on my A10-Lime, which look like this:
>
>     http://people.freedesktop.org/~siamashka/files/20140504/a10-l2-cache-fail-artefacts-in-xfce.png
>
> And narrowed them down to ARM Cortex-A8 L2 cache failures, which
> are reproducible when doing JPEG decoding:
>
>     $ djpeg -v
>     libjpeg-turbo version 1.3.0 (build 20130811)
>
>     $ wget http://linux-sunxi.org/images/8/83/A10-LIME.jpg
>
>     $ djpeg A10-LIME.jpg | md5sum
>     691497bd2e5d36976c1ea3150de89df6  -
>
>     $ djpeg A10-LIME.jpg | md5sum
>     6a874af750f92e1e3c019f2df7edf3f7  -
>
>     $ djpeg A10-LIME.jpg | md5sum
>     297b98ba10233cbbcea2566e1c4fd7c7  -
>
> Please note that the md5sum of the decoded JPEG file is different for
> each run.
>
> There are other ways to reproduce it (the FFmpeg test suite can detect
> this problem too), but the djpeg test is very simple and fast to do.
> In the case if somebody does not have the djpeg tool from libjpeg-turbo
> in their distro, I have a static djpeg binary here for extra
> convenience:
>     http://people.freedesktop.org/~siamashka/files/20140504/djpeg-static
> It has been built using:
>
>     http://people.freedesktop.org/~siamashka/files/20140504/build-static-djpeg.sh
>
> On my collection of just three Allwinner A10 based devices, I get the
> following results with the libjpeg-turbo djpeg test (and the default
> CPU core voltage):
>     A10-Lime    - fails at 1008MHz (960MHz is fine)
>     Mele A2000  - fails at 1152MHz (1104MHz is fine)
>     Cubieboard1 - fails at 1152MHz (1104MHz is fine)
>
> Why is it likely related to the L2 cache? Because this problem goes
> away if we disable the L2 cache by adding something like
>     mrc p15, 0, r10, c1, c0, 1
>     bic r10, r10, #(1 << 1)
>     mcr p15, 0, r10, c1, c0, 1
> to the code around
>
>     https://github.com/linux-sunxi/linux-sunxi/blob/sunxi-v3.4.86-r0/arch/arm/mm/proc-v7.S#L248
>
> It is also interesting that sun4i and sun5i have different L2 cache
> latency parameters configured there. I have tried increasing the
> latencies in the L2 Cache Auxiliary Control Register, but these
> changes did not seem to affect anything. It looks like the only
> important factors are the CPU clock speed and the CPU core
> voltage (increasing it to 1.45V from 1.4V also fixes the problem
> on my A10-Lime).
>
> Anyway, with the sample size of just 3 devices, 33% of them appear to
> be unable to run stable at 1GHz and 1.4V core voltage. I wonder, how
> common is this problem in general? Are there any other Allwinner A10
> devices failing the libjpeg-turbo djpeg test at 1GHz?
>
> Also it would make sense to run reliability tests for all the cpufreq
> operating points, because any frequency+voltage pair can be a weak link.

Implemented an automated script for running tests at different
operating points:
    https://github.com/ssvb/cpuburn-arm/blob/master/cpufreq-ljt-stress-test

Only 1008MHz appears to be really problematic on my A10-Lime device. An
example of running it:

lime ~ # ./cpufreq-ljt-stress-test
CPU stress test, which is doing JPEG decoding by libjpeg-turbo
at different cpufreq operating points.

Testing CPU 0
 1488 MHz SKIPPED
 1440 MHz SKIPPED
 1392 MHz SKIPPED
 1344 MHz SKIPPED
 1296 MHz SKIPPED
 1248 MHz SKIPPED
 1200 MHz SKIPPED
 1152 MHz SKIPPED
 1104 MHz SKIPPED
 1056 MHz SKIPPED
 1008 MHz . FAILED
  960 MHz OK
  912 MHz OK
  864 MHz OK
  816 MHz OK
  768 MHz OK
  744 MHz OK
  720 MHz OK
  696 MHz OK
  672 MHz OK
  648 MHz OK
  600 MHz OK
  576 MHz OK
  528 MHz OK
  480 MHz OK
  432 MHz OK
  408 MHz OK
  384 MHz OK
  360 MHz OK
  336 MHz OK
  300 MHz OK
  288 MHz ...
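The repeated "djpeg A10-LIME.jpg | md5sum" check from this message boils down to decoding the same file several times and comparing the hashes. A minimal Python sketch of that idea follows. It is not the cpufreq-ljt-stress-test script, the function names are made up for the example, and it assumes that the decoder writes the decoded image to stdout (as djpeg does when no output file is given).

```python
import hashlib
import subprocess


def md5_hex(data: bytes) -> str:
    """Return the MD5 hex digest of a byte string (the value md5sum prints)."""
    return hashlib.md5(data).hexdigest()


def decode_digests(command, runs=3):
    """Run `command` several times and collect the MD5 of its stdout."""
    digests = []
    for _ in range(runs):
        result = subprocess.run(command, stdout=subprocess.PIPE, check=True)
        digests.append(md5_hex(result.stdout))
    return digests


def is_deterministic(digests):
    """True when every run produced exactly the same output."""
    return len(set(digests)) == 1
```

On a board with djpeg installed this could be used as `is_deterministic(decode_digests(["djpeg", "A10-LIME.jpg"], runs=10))`. The three different md5sums quoted in this message would make `is_deterministic` return False.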