[linux-sunxi] Re: Allwinner A10: the L2 cache may not keep up with the speed at 1GHz

2014-07-03 Thread Siarhei Siamashka
On Tue, 6 May 2014 12:34:45 +0300
Siarhei Siamashka  wrote:

> On Sun, 4 May 2014 11:36:10 +0300
> Siarhei Siamashka  wrote:
> 
> > Hello,
> > 
> > Yesterday I was trying to debug what's causing the XFCE desktop
> > background artefacts on my A10-Lime, which look like this:
> > 
> > http://people.freedesktop.org/~siamashka/files/20140504/a10-l2-cache-fail-artefacts-in-xfce.png
> > 
> > And narrowed them down to ARM Cortex-A8 L2 cache failures, which
> > are reproducible when doing JPEG decoding:
> > 
> > $ djpeg -v   
> > libjpeg-turbo version 1.3.0 (build 20130811)
> > 
> > $ wget http://linux-sunxi.org/images/8/83/A10-LIME.jpg
> > 
> > $ djpeg A10-LIME.jpg | md5sum
> > 691497bd2e5d36976c1ea3150de89df6  -
> > 
> > $ djpeg A10-LIME.jpg | md5sum
> > 6a874af750f92e1e3c019f2df7edf3f7  -
> > 
> > $ djpeg A10-LIME.jpg | md5sum
> > 297b98ba10233cbbcea2566e1c4fd7c7  -
> > 
> > Please note that the md5sum of the decoded JPEG file is different for
> > each run.
> > 
> > There are other ways to reproduce it (the FFmpeg test suite can detect
> > this problem too), but the djpeg test is very simple and fast to do.
> > In case somebody does not have the djpeg tool from libjpeg-turbo
> > in their distro, I have a static djpeg binary here for extra
> > convenience:
> > http://people.freedesktop.org/~siamashka/files/20140504/djpeg-static
> > It has been built using:
> > 
> > http://people.freedesktop.org/~siamashka/files/20140504/build-static-djpeg.sh
> > 
> > On my collection of just three Allwinner A10 based devices, I get the
> > following results with the libjpeg-turbo djpeg test (and the default
> > CPU core voltage):
> > A10-Lime    - fails at 1008MHz (960MHz is fine)
> > Mele A2000  - fails at 1152MHz (1104MHz is fine)
> > Cubieboard1 - fails at 1152MHz (1104MHz is fine)
> > 
> > Why is it likely related to the L2 cache? Because this problem goes
> > away if we disable the L2 cache by adding something like
> > mrc p15, 0, r10, c1, c0, 1
> > bic r10, r10, #(1 << 1)
> > mcr p15, 0, r10, c1, c0, 1
> > to the code around
> >
> > https://github.com/linux-sunxi/linux-sunxi/blob/sunxi-v3.4.86-r0/arch/arm/mm/proc-v7.S#L248
> > 
> > It is also interesting that sun4i and sun5i have different L2 cache
> > latency parameters configured there. I have tried increasing the
> > latencies in the L2 Cache Auxiliary Control Register, but these
> > changes did not seem to affect anything. It looks like the only
> > important factors are the CPU clock speed and the CPU core
> > voltage (increasing it to 1.45V from 1.4V also fixes the problem
> > on my A10-Lime).
> > 
> > Anyway, with the sample size of just 3 devices, 33% of them appear to
> > be unable to run stable at 1GHz and 1.4V core voltage. I wonder, how
> > common is this problem in general? Are there any other Allwinner A10
> > devices failing the libjpeg-turbo djpeg test at 1GHz?
> > 
> > Also it would make sense to run reliability tests for all the cpufreq
> > operating points, because any frequency+voltage pair can be a weak link.
> 
> Implemented an automated script for running tests at different
> operating points:
> https://github.com/ssvb/cpuburn-arm/blob/master/cpufreq-ljt-stress-test
> 
> Only 1008MHz appears to be really problematic on my A10-Lime device. An
> example of running it:
> 
> lime ~ # ./cpufreq-ljt-stress-test
> CPU stress test, which is doing JPEG decoding by libjpeg-turbo
> at different cpufreq operating points.
> 
> Testing CPU 0
>  1488 MHz SKIPPED
>  1440 MHz SKIPPED
>  1392 MHz SKIPPED
>  1344 MHz SKIPPED
>  1296 MHz SKIPPED
>  1248 MHz SKIPPED
>  1200 MHz SKIPPED
>  1152 MHz SKIPPED
>  1104 MHz SKIPPED
>  1056 MHz SKIPPED
>  1008 MHz . FAILED
>   960 MHz  OK
>   912 MHz  OK

A follow-up to this (better late than never). Olliver Schinagl has run
the cpufreq-ljt-stress-test script on multiple A10-Lime devices today:

http://irclog.whitequark.org/linux-sunxi/2014-07-03#9499336;

It appears that the test failed on his revA A10-Lime and worked fine on
eight other revC A10-Lime boards. Together with my revA A10-Lime, this
gives revA a perfect two out of two failure rate. All the other A10
based devices (Olliver's eight revC A10-Lime boards, my Cubieboard1 and
Mele A2000, and also lioka's Hackberry) pass the test.
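
To make the tally above explicit, here is a small Python sketch that
groups the results reported in this thread by board and revision. The
data layout and the function name are made up for illustration, and
boards whose revision is not stated in the thread are recorded with
None.

```python
from collections import defaultdict

# (board, revision, passed the test at 1008MHz) as reported in this thread.
# A revision of None means the thread does not state one.
RESULTS = (
    [("A10-Lime", "revA", False)] * 2
    + [("A10-Lime", "revC", True)] * 8
    + [
        ("Cubieboard1", None, True),
        ("Mele A2000", None, True),
        ("Hackberry", None, True),
    ]
)


def failure_rates(results):
    """Return {(board, revision): (failures, total)}."""
    counts = defaultdict(lambda: [0, 0])
    for board, revision, passed in results:
        entry = counts[(board, revision)]
        if not passed:
            entry[0] += 1
        entry[1] += 1
    return {key: tuple(value) for key, value in counts.items()}
```

With this data, only the ("A10-Lime", "revA") group has any failures,
which is what points at the board revision rather than at the A10
itself.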

Now that we have finally collected the long-awaited statistics, it looks
pretty obvious that there is something wrong specifically with
revision A of the A10-Lime board. Revision A was a pre-production
'developer' batch of the A10-Lime board, so very few people should be
affected (I got one donated to me for free, so I can't really complain).
Kudos to Koen Kooi for providing us with a hint about the possible
voltage drop on the power line connecting the AXP209 PMIC and the
A10 SoC, which seems to explain the problem:

https://www.mail-archive.com/linux-sunxi@

[linux-sunxi] Re: Allwinner A10: the L2 cache may not keep up with the speed at 1GHz

2014-05-12 Thread Siarhei Siamashka
On Tue, 6 May 2014 12:34:45 +0300
Siarhei Siamashka  wrote:

> Implemented an automated script for running tests at different
> operating points:
> https://github.com/ssvb/cpuburn-arm/blob/master/cpufreq-ljt-stress-test

And also added a script to parse the cpufreq tables from the sunxi-3.4
kernel to generate a plot picture via gnuplot:
https://github.com/ssvb/cpuburn-arm/blob/547c229284f3/gnuplot-sunxi-cpufreq

Using the cpufreq-ljt-stress-test script I have found the quasi-stable
pairs of the CPU clock frequency and voltage for my A10-OLinuXino-LIME
to populate the following table:

#define SAFETY_MARGIN 0 /* in mV; should be set to at least 25 */
/* .freq is in Hz, .volt is in mV */
static struct cpufreq_dvfs sun4i_poorlime_dvfs_table[] = {
    {.freq = 1008000000, .volt = 1450 + SAFETY_MARGIN},
    {.freq =  960000000, .volt = 1400 + SAFETY_MARGIN},
    {.freq =  912000000, .volt = 1350 + SAFETY_MARGIN},
    {.freq =  864000000, .volt = 1300 + SAFETY_MARGIN},
    {.freq =  696000000, .volt = 1275 + SAFETY_MARGIN},
    {.freq =  648000000, .volt = 1250 + SAFETY_MARGIN},
    {.freq =  528000000, .volt = 1225 + SAFETY_MARGIN},
    {.freq =  480000000, .volt = 1200 + SAFETY_MARGIN},
    {.freq =  432000000, .volt = 1175 + SAFETY_MARGIN},
    {.freq =  408000000, .volt = 1125 + SAFETY_MARGIN},
    {.freq =  384000000, .volt = 1025 + SAFETY_MARGIN},
    {.freq =          0, .volt = 1000}, /* end of cpu dvfs table */
};

Quasi-stable here means that the cpufreq-ljt-stress-test can run
without failures for something like 10 minutes at each of these
points, but increasing the clock frequency or decreasing the core
voltage makes it fail.
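
The search for such a quasi-stable point can be sketched roughly as
follows. This is only an illustration in Python, not how the table was
actually produced: `passes` is a hypothetical callback that runs the
stress test at one operating point and returns True on success, and the
voltage bounds are placeholders that the caller has to choose. The
sketch also assumes that once a voltage fails, every lower voltage
fails too.

```python
def lowest_passing_voltage(passes, freq_mhz, *, v_max_mv, v_min_mv, step_mv=25):
    """Step the voltage down from v_max_mv and return the last value that passed.

    `passes(freq_mhz, volt_mv)` is a hypothetical callback that runs the
    stress test at one operating point and returns True on success.
    Returns None if even v_max_mv fails.
    """
    best = None
    volt = v_max_mv
    while volt >= v_min_mv:
        if not passes(freq_mhz, volt):
            break  # assume every lower voltage would fail as well
        best = volt
        volt -= step_mv
    return best
```

The returned value is the quasi-stable voltage for that frequency: it
passed the test, while one step lower did not (or the lower bound was
reached).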

Of course, running longer may reveal that some of these operating points
are actually not stable. We also can't be sure that libjpeg-turbo is
really the toughest possible workload one can find. So it makes sense to
additionally increase the voltage by at least one extra step (0.025V)
to have some safety margin.
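
As a worked example of that margin, the following Python sketch adds
one 25 mV step to every quasi-stable point from the table above. The
frequencies are written in MHz for readability, and the list layout and
function name are only illustrative.

```python
STEP_MV = 25  # one regulator step, i.e. the 0.025V mentioned above

# (frequency in MHz, lowest voltage in mV that passed the short test),
# copied from the quasi-stable table in this message.
QUASI_STABLE = [
    (1008, 1450), (960, 1400), (912, 1350), (864, 1300),
    (696, 1275), (648, 1250), (528, 1225), (480, 1200),
    (432, 1175), (408, 1125), (384, 1025),
]


def with_margin(table, steps=1, step_mv=STEP_MV):
    """Return a copy of the table with `steps` extra voltage steps added."""
    return [(freq, volt + steps * step_mv) for freq, volt in table]
```

With a single step of margin the 1008MHz point ends up at 1475 mV and
the 960MHz point at 1425 mV.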

Anyway, the generated plot is attached ('sun4i_poorlime' represents
the A10-OLinuXino-LIME cpufreq table that I tried to make). Anyone
should be able to generate the same picture too by running:

gnuplot-sunxi-cpufreq result.png \
  arch/arm/mach-sun7i/cpu-freq/cpu-freq.c \
  arch/arm/plat-sunxi/cpu-freq/cpu-freq-table.c \
  some_file_with_sun4i_poorlime_dvfs_table.c

-- 
Best regards,
Siarhei Siamashka

-- 
You received this message because you are subscribed to the Google Groups 
"linux-sunxi" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to linux-sunxi+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


[linux-sunxi] Re: Allwinner A10: the L2 cache may not keep up with the speed at 1GHz

2014-05-06 Thread Siarhei Siamashka
On Tue, 6 May 2014 12:34:45 +0300
Siarhei Siamashka  wrote:

> Implemented an automated script for running tests at different
> operating points:
> https://github.com/ssvb/cpuburn-arm/blob/master/cpufreq-ljt-stress-test
> 
> Only 1008MHz appears to be really problematic on my A10-Lime device.

And on the Cubietruck with Allwinner A20 I get:

cubietruck ~ # ./cpufreq-ljt-stress-test
CPU stress test, which is doing JPEG decoding by libjpeg-turbo
at different cpufreq operating points.

Testing CPU 0
 1488 MHz SKIPPED
 1440 MHz SKIPPED
 1392 MHz SKIPPED
 1344 MHz SKIPPED
 1296 MHz SKIPPED
 1248 MHz SKIPPED
 1200 MHz SKIPPED
 1152 MHz SKIPPED
 1104 MHz SKIPPED
 1056 MHz SKIPPED
 1008 MHz SKIPPED
  960 MHz SKIPPED
  912 MHz  OK
  864 MHz  OK
  816 MHz  OK
  768 MHz  OK
  744 MHz  OK
  720 MHz  OK
  696 MHz  OK
  672 MHz  OK
  648 MHz  OK
  600 MHz  OK
  528 MHz  OK
  480 MHz  OK
  408 MHz  OK
  384 MHz  OK
  360 MHz  OK
  336 MHz  OK
  288 MHz  OK
  264 MHz  OK
  240 MHz  OK
  216 MHz  OK
  204 MHz  OK
  192 MHz .. FAILED
  180 MHz  OK
  168 MHz  OK
  156 MHz  OK
  144 MHz  OK
  132 MHz  OK
  120 MHz  OK
   96 MHz .

This means that the test spotted data corruption at 192MHz and then
deadlocked at 96MHz, so it never finished.
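
For anyone who wants to post-process such logs, here is a rough Python
sketch that turns the output above into a per-frequency verdict. It
only assumes the line layout visible in this message (a frequency,
"MHz", optional progress dots and a final status); the "INCOMPLETE"
label for a line without a status is my own naming, not something the
script prints.

```python
import re

LINE_RE = re.compile(r"^\s*(\d+)\s+MHz\b(.*)$")


def parse_results(text):
    """Map each frequency in MHz to 'OK', 'FAILED', 'SKIPPED' or 'INCOMPLETE'."""
    results = {}
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if not match:
            continue  # banner lines such as "Testing CPU 0"
        freq = int(match.group(1))
        # Drop the progress dots; whatever is left is the verdict.
        status = match.group(2).replace(".", "").strip()
        results[freq] = status or "INCOMPLETE"
    return results


def problem_points(results):
    """Return the frequencies that failed or never finished, highest first."""
    bad = [f for f, s in results.items() if s in ("FAILED", "INCOMPLETE")]
    return sorted(bad, reverse=True)
```

Applied to the Cubietruck log above, this would flag 192MHz as failed
and 96MHz as incomplete.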

It is interesting to compare the fex files from the Cubieboard2 and the
Cubietruck:

https://github.com/linux-sunxi/sunxi-boards/blob/c36a1c2186b4/sys_config/a20/cubieboard2.fex

https://github.com/linux-sunxi/sunxi-boards/blob/c36a1c2186b4/sys_config/a20/cubietruck.fex
The Cubietruck uses "min_freq = 60000000", which means that it
can try to go as low as 60MHz. For the Cubieboard2 we have
"min_freq = 400000000", which means that 400MHz is the lowest limit.

Just as I have suspected for a long time and recently pointed out
in [1], cpufreq is a reliability hazard in its current implementation
used by the sunxi-3.4 kernel. This may explain some of the mysterious
deadlocks experienced by users who are suicidal enough to run
their A20 hardware with the 'ondemand', 'interactive' or 'fantasy'
cpufreq governors. Unfortunately this also includes innocent bystanders
who are just using sunxi-3.4 defconfigs :-(

1. https://www.mail-archive.com/linux-sunxi%40googlegroups.com/msg03612.html

-- 
Best regards,
Siarhei Siamashka



[linux-sunxi] Re: Allwinner A10: the L2 cache may not keep up with the speed at 1GHz

2014-05-06 Thread Siarhei Siamashka
On Sun, 4 May 2014 11:36:10 +0300
Siarhei Siamashka  wrote:

> Hello,
> 
> Yesterday I was trying to debug what's causing the XFCE desktop
> background artefacts on my A10-Lime, which look like this:
> 
> http://people.freedesktop.org/~siamashka/files/20140504/a10-l2-cache-fail-artefacts-in-xfce.png
> 
> And narrowed them down to ARM Cortex-A8 L2 cache failures, which
> are reproducible when doing JPEG decoding:
> 
> $ djpeg -v   
> libjpeg-turbo version 1.3.0 (build 20130811)
> 
> $ wget http://linux-sunxi.org/images/8/83/A10-LIME.jpg
> 
> $ djpeg A10-LIME.jpg | md5sum
> 691497bd2e5d36976c1ea3150de89df6  -
> 
> $ djpeg A10-LIME.jpg | md5sum
> 6a874af750f92e1e3c019f2df7edf3f7  -
> 
> $ djpeg A10-LIME.jpg | md5sum
> 297b98ba10233cbbcea2566e1c4fd7c7  -
> 
> Please note that the md5sum of the decoded JPEG file is different for
> each run.
> 
> There are other ways to reproduce it (the FFmpeg test suite can detect
> this problem too), but the djpeg test is very simple and fast to do.
> In case somebody does not have the djpeg tool from libjpeg-turbo
> in their distro, I have a static djpeg binary here for extra
> convenience:
> http://people.freedesktop.org/~siamashka/files/20140504/djpeg-static
> It has been built using:
> 
> http://people.freedesktop.org/~siamashka/files/20140504/build-static-djpeg.sh
> 
> On my collection of just three Allwinner A10 based devices, I get the
> following results with the libjpeg-turbo djpeg test (and the default
> CPU core voltage):
> A10-Lime    - fails at 1008MHz (960MHz is fine)
> Mele A2000  - fails at 1152MHz (1104MHz is fine)
> Cubieboard1 - fails at 1152MHz (1104MHz is fine)
> 
> Why is it likely related to the L2 cache? Because this problem goes
> away if we disable the L2 cache by adding something like
> mrc p15, 0, r10, c1, c0, 1
> bic r10, r10, #(1 << 1)
> mcr p15, 0, r10, c1, c0, 1
> to the code around
>
> https://github.com/linux-sunxi/linux-sunxi/blob/sunxi-v3.4.86-r0/arch/arm/mm/proc-v7.S#L248
> 
> It is also interesting that sun4i and sun5i have different L2 cache
> latency parameters configured there. I have tried increasing the
> latencies in the L2 Cache Auxiliary Control Register, but these
> changes did not seem to affect anything. It looks like the only
> important factors are the CPU clock speed and the CPU core
> voltage (increasing it to 1.45V from 1.4V also fixes the problem
> on my A10-Lime).
> 
> Anyway, with the sample size of just 3 devices, 33% of them appear to
> be unable to run stable at 1GHz and 1.4V core voltage. I wonder, how
> common is this problem in general? Are there any other Allwinner A10
> devices failing the libjpeg-turbo djpeg test at 1GHz?
> 
> Also it would make sense to run reliability tests for all the cpufreq
> operating points, because any frequency+voltage pair can be a weak link.

Implemented an automated script for running tests at different
operating points:
https://github.com/ssvb/cpuburn-arm/blob/master/cpufreq-ljt-stress-test
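
The manual check quoted above (decode the same JPEG several times and
compare the checksums) boils down to something like the following
Python sketch. It is not a description of how the script itself works,
and the function names are made up for illustration.

```python
import hashlib
import subprocess


def output_digest(cmd):
    """Run `cmd` and return the MD5 hex digest of its standard output."""
    result = subprocess.run(cmd, stdout=subprocess.PIPE, check=True)
    return hashlib.md5(result.stdout).hexdigest()


def is_reproducible(digests):
    """True if every run produced the same digest."""
    return len(set(digests)) == 1
```

For example, collecting output_digest(["djpeg", "A10-LIME.jpg"]) three
times and passing the list to is_reproducible() returns False on a
board that shows the corruption, because a correct decoder must produce
identical output on every run.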

Only 1008MHz appears to be really problematic on my A10-Lime device. An
example of running it:

lime ~ # ./cpufreq-ljt-stress-test
CPU stress test, which is doing JPEG decoding by libjpeg-turbo
at different cpufreq operating points.

Testing CPU 0
 1488 MHz SKIPPED
 1440 MHz SKIPPED
 1392 MHz SKIPPED
 1344 MHz SKIPPED
 1296 MHz SKIPPED
 1248 MHz SKIPPED
 1200 MHz SKIPPED
 1152 MHz SKIPPED
 1104 MHz SKIPPED
 1056 MHz SKIPPED
 1008 MHz . FAILED
  960 MHz  OK
  912 MHz  OK
  864 MHz  OK
  816 MHz  OK
  768 MHz  OK
  744 MHz  OK
  720 MHz  OK
  696 MHz  OK
  672 MHz  OK
  648 MHz  OK
  600 MHz  OK
  576 MHz  OK
  528 MHz  OK
  480 MHz  OK
  432 MHz  OK
  408 MHz  OK
  384 MHz  OK
  360 MHz  OK
  336 MHz  OK
  300 MHz  OK
  288 MHz ...