Re: [PATCH 21.02] ipq806x: backport cpufreq changes to 5.4

2021-09-08 Thread Ansuel Smith
On Wed, Sep 8, 2021 at 02:11 Shane Synan
 wrote:

Re: [PATCH 21.02] ipq806x: backport cpufreq changes to 5.4

2021-09-07 Thread Shane Synan
On 8/24/21 7:21 PM, Shane Synan wrote:
> The fix hasn't been found, but progress has been made!
> 
> After further testing, I think I've found a way to recreate this issue
> with just the router itself, no external USB HDD, no Déjà Dup backup
> over SFTP, and possibly no extra changes beyond a stock NBG6817
> OpenWRT build (not confirmed as this router runs my home network,
> including SQM QoS, VLANs with another WiFi AP, etc).

So far, I've attempted all three suggested fixes, but I had trouble
implementing one and I'm unsure if I tried the other two correctly.
Additionally, pinning to "performance" for 1.75 GHz does not solve
the issue either - more on that near the end.

I've put all of my commits into one branch for easier reference:
https://github.com/openwrt/openwrt/compare/master...digitalcircuit:ft-fix-ipq8065-reset

And I've used my simplified automatic QA script for verification:
https://github.com/digitalcircuit/openwrt-ipq806x-qa-cpu-reset#readme

(In theory, anyone should be able to reproduce the issue with this
script on a stock OpenWRT build.  I'll still do testing with the
Déjà Dup SFTP backup workload.)


Suggestions and results in order of attempt:
(Ignore "ipq8065: force CPUs to share DVFS scaling", wrong method.)


1.  Raising clock latency (commits with "clock latency" in subject)

I've tried raising the clock-latency-ns in the ipq8065 DTS by 100
nanoseconds, a deliberately excessive value in the hopes of it being
enough to notice any issues.

I've tried this for...

* 1.4 GHz and 1.75 GHz
  (ipq8065: raise 1.4 & 1.75 GHz clock latency)
* All CPU frequencies
  (previous + ipq8065: raise all clock latency)
* All CPU frequencies and L2 cache latency
  (previous + ipq8065: raise L2 cache, CPU core clock latency)

Unfortunately, as noted in the revert commit, this seemed to have no
impact on the results from the QA script.

I don't know if I've correctly implemented this suggestion.
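For what it's worth, here is one possible reason a larger clock-latency-ns could leave the QA results unchanged. This is a minimal sketch, assuming the krait cpufreq driver passes the DTS clock-latency-ns through to the policy's transition latency the way cpufreq-dt does, and that the default governor interval follows Linux 5.4's cpufreq_policy_transition_delay_us() (latency in microseconds times LATENCY_MULTIPLIER = 1000, capped at 10 ms). The function and constant names below are my own, and the governor may also apply a tick-based lower bound, which this sketch ignores.

```python
# Sketch of how (as I understand it) Linux 5.4 derives the default
# governor re-evaluation delay from the driver-reported transition latency.
# Assumption: clock-latency-ns from the DTS ends up as transition_latency.

NSEC_PER_USEC = 1000
LATENCY_MULTIPLIER = 1000     # same value as the kernel constant of that name
MAX_DEFAULT_DELAY_US = 10000  # 10 ms cap applied by the 5.4 helper


def default_transition_delay_us(transition_latency_ns, transition_delay_us=0):
    """Return the default governor re-evaluation delay in microseconds."""
    if transition_delay_us:
        # A delay set explicitly by the driver always wins.
        return transition_delay_us
    latency_us = transition_latency_ns // NSEC_PER_USEC
    if latency_us:
        return min(latency_us * LATENCY_MULTIPLIER, MAX_DEFAULT_DELAY_US)
    # Sub-microsecond (or unknown) latency falls back to the multiplier.
    return LATENCY_MULTIPLIER


if __name__ == "__main__":
    for ns in (500, 5_000, 100_000, 1_000_000):
        print(f"{ns:>9} ns -> {default_transition_delay_us(ns)} us")
```

If those assumptions hold, any latency of 10 microseconds or more already lands on the 10 ms cap, so adding more latency would not change how often ondemand re-evaluates the frequency. That would be consistent with, but not proof of, the unchanged results.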

QA script log on b1870c2 (.tar.xz due to 12.2 MiB uncompressed size):
https://zorro.casa/sync/Hosting/Utilities/Development/OpenWRT/mailing-list/ipq806x_%20backport%20cpufreq%20changes%20to%205.4/debug-cpufreq%20-%20clock%20default%20test%20case1%20-%202021-08-30%2022-37-50%20-%20r17395-b1870c2530-branch-ft-fix-ipq8065-reset%20-%20date%20segfault%2C%20reboot%20-%20public.tar.xz


2.  Run both cores at the same frequency (most promising?)

I tried to do this (ipq806x: Force CPU cores to share frequency), but
I think I didn't modify the cpufreq driver in the correct way.

As noted in the revert commit, this didn't appear to force CPUs to
share frequency, whether manually using the performance governor or
periodically observing the ondemand governor - the CPU cores were at
different frequencies.
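Rather than observing the frequencies periodically, it may be easier to check how cpufreq grouped the cores: CPUs that share a clock are normally reported under a single policyN directory whose "related_cpus" lists all of them. A minimal sketch using the standard "related_cpus" and "scaling_cur_freq" sysfs attributes, assuming python3 is available on the router; the helper names are hypothetical, and the check only detects sharing implemented as one policy spanning both cores.

```python
from pathlib import Path

# Standard cpufreq sysfs location; "root" can point at a copy for testing.
CPUFREQ_ROOT = Path("/sys/devices/system/cpu/cpufreq")


def read_policies(root=CPUFREQ_ROOT):
    """Map each policyN name to (list of related CPUs, current kHz)."""
    policies = {}
    for policy in sorted(Path(root).glob("policy*")):
        cpus = [int(c) for c in (policy / "related_cpus").read_text().split()]
        cur_khz = int((policy / "scaling_cur_freq").read_text())
        policies[policy.name] = (cpus, cur_khz)
    return policies


def has_shared_policy(policies):
    """True if any single policy covers more than one CPU."""
    return any(len(cpus) > 1 for cpus, _khz in policies.values())
```

On the router, has_shared_policy(read_policies()) returning False means each core still has its own policy, so the governor is free to pick a different frequency per core.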

I'll need help figuring out how to implement this in the cpufreq
driver correctly.  It seems promising given that in the past,
dual-core bursty workloads didn't seem to trigger the crash.

NOTE: Before diving into implementing this, read the conclusion below
as I've noticed reboots happen without changing CPU frequency as well.

I'm also not sure how to debug the cpufreq driver in general.  With
dynamic debugging, I can turn on messages about the cpufreq governor,
but I'm not sure of the right way to add dynamic debugging print
messages to the cpufreq driver.

Example of dynamic debugging:
echo "file drivers/cpufreq/* =p" > /sys/kernel/debug/dynamic_debug/control
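Before adding new pr_debug() calls, it may help to list which call sites under drivers/cpufreq/ the kernel already exposes to dynamic debug, and which are currently printing. A minimal sketch that parses the same control file, assuming the 5.4-era line format 'file:line [module]function =flags "format"' and a kernel built with CONFIG_DYNAMIC_DEBUG; the helper name is hypothetical. A driver file that never shows up has no pr_debug()/dev_dbg() statements compiled in, so enabling "file drivers/cpufreq/*" cannot make it print anything.

```python
import re
from collections import Counter

CONTROL = "/sys/kernel/debug/dynamic_debug/control"

# Assumed line format, e.g.:
#   drivers/cpufreq/cpufreq.c:1325 [cpufreq]cpufreq_online =_ "..."
LINE_RE = re.compile(
    r'^(?P<file>\S+):(?P<line>\d+) \[(?P<mod>[^\]]*)\](?P<func>\S+) =(?P<flags>\S+) '
)


def cpufreq_sites(lines, prefix="drivers/cpufreq/"):
    """Return {file: (total call sites, sites with the 'p' flag set)}."""
    total, enabled = Counter(), Counter()
    for raw in lines:
        m = LINE_RE.match(raw)
        if not m or not m["file"].startswith(prefix):
            continue  # header line, other subsystems, or unexpected format
        total[m["file"]] += 1
        if "p" in m["flags"]:
            enabled[m["file"]] += 1
    return {f: (total[f], enabled[f]) for f in sorted(total)}
```

On the router, passing it an open handle to CONTROL (usually requires root and a mounted debugfs) shows per-file counts of available versus enabled messages.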

QA script log on 1fdabd9 (.tar.xz due to 4.4 MiB uncompressed size):
https://zorro.casa/sync/Hosting/Utilities/Development/OpenWRT/mailing-list/ipq806x_%20backport%20cpufreq%20changes%20to%205.4/debug-cpufreq%20-%20clock%20default%20test%20case1%20-%202021-08-31%2020-38-04%20-%20r17397-1fdabd95db-branch-ft-fix-ipq8065-reset%20-%20date%20segfault%2C%20reboot%20-%20public.tar.xz


3.  Add forced frequency transitions between 1.0 GHz and 1.75 GHz

I'm not sure if I implemented this correctly.  I made a first attempt
(ipq806x: Add transitions to 1.0 <> 1.4 <> 1.75 GHz), but if the
frequency transitions happen, they're too fast to observe.  And as
noted above, I'm not yet sure of the right way to add dynamic
debugging messages.
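To make forced intermediate steps easier to verify, it may help to prototype the selection logic outside the kernel first. A hypothetical sketch (the helper name and the "always stop at every bridge frequency in between" rule are my own assumptions, not what the commit does): given a current and a target frequency in kHz, it returns the steps that pass through 1.0 GHz and 1.4 GHz on the way. Logging each returned step from the driver would then show whether the transitions actually happen.

```python
# Bridge frequencies in kHz (1.0 GHz and 1.4 GHz), as in the commit subject.
BRIDGES_KHZ = (1_000_000, 1_400_000)


def bridged_path(cur_khz, target_khz, bridges=BRIDGES_KHZ):
    """Steps from cur to target, stopping at each bridge strictly in between."""
    lo, hi = sorted((cur_khz, target_khz))
    stops = sorted(b for b in bridges if lo < b < hi)
    if target_khz < cur_khz:
        stops.reverse()  # ramping down: visit bridges from high to low
    return stops + [target_khz]


if __name__ == "__main__":
    print(bridged_path(800_000, 1_725_000))
    print(bridged_path(1_725_000, 800_000))
```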

Running the QA script in "case1" (toggle 800 MHz to 1.75 GHz) still
crashes.

QA script log on 52f4f77 (.tar.xz due to 471.8 KiB uncompressed size):
https://zorro.casa/sync/Hosting/Utilities/Development/OpenWRT/mailing-list/ipq806x_%20backport%20cpufreq%20changes%20to%205.4/debug-cpufreq%20-%20clock%20default%20test%20case1%20-%202021-09-07%2019-58-07%20-%20r17399-52f4f77518-branch-ft-fix-ipq8065-reset%20-%20reboot%20-%20public.tar.xz

Separately, I updated the QA script to add a "ramp1" case which
smoothly ramps the CPU core frequency up/down from 600 MHz to
1.75 GHz, stopping at every frequency in between.  Unfortunately,
this still crashes.
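For anyone reproducing a ramp like "ramp1" without the QA script, the sequence can be built from a policy's "scaling_available_frequencies" list. A minimal sketch, assuming frequencies in kHz as sysfs reports them; it is not the QA script's actual implementation.

```python
def ramp_sequence(available_khz, lo_khz=600_000, hi_khz=1_725_000):
    """One up-then-down sweep over every available frequency in [lo, hi]."""
    steps = sorted(f for f in set(available_khz) if lo_khz <= f <= hi_khz)
    # Go up through every step, then back down without repeating the peak.
    return steps + steps[-2::-1]


if __name__ == "__main__":
    # Frequencies (kHz) as they appear in the trans_table header in this thread.
    available = [384000, 600000, 800000, 1000000, 1400000, 1725000]
    print(ramp_sequence(available))
```

Each value would then be applied in turn, e.g. through the "userspace" governor's "scaling_setspeed", with a pause between steps.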

Interestingly, the crash again happens when CPU core frequencies are
distant from each other (1.75 GHz and 800 MHz).  This lends credence
to the idea of locking CPU frequencies together for 1.4 an

Re: [PATCH 21.02] ipq806x: backport cpufreq changes to 5.4

2021-08-24 Thread Shane Synan
On 8/10/21 6:02 PM, Shane Synan wrote:
> I've tried multiple other things (mentioned below) which haven't
> worked, so it's on to modifying the CPU voltage!  […]

The fix hasn't been found, but progress has been made!

After further testing, I think I've found a way to recreate this issue
with just the router itself, no external USB HDD, no Déjà Dup backup
over SFTP, and possibly no extra changes beyond a stock NBG6817
OpenWRT build (not confirmed as this router runs my home network,
including SQM QoS, VLANs with another WiFi AP, etc).

I've put the scripts and documentation into a dedicated repository:
https://github.com/digitalcircuit/openwrt-ipq806x-qa-cpu-reset

(In brief: toggling individual CPU cores between 1.75 GHz and 800 MHz)
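The toggle itself can be driven through standard cpufreq sysfs knobs: select the "userspace" governor for one policy and write the desired kHz value to "scaling_setspeed". A minimal sketch, assuming the userspace governor is built in (CONFIG_CPU_FREQ_GOV_USERSPACE), that each core has its own policyN directory, and that it runs as root; the helper names and the "root" argument are illustrative, not taken from the QA repository.

```python
import time
from pathlib import Path

# Standard cpufreq sysfs location; each policyN is assumed to be one core.
CPUFREQ_ROOT = Path("/sys/devices/system/cpu/cpufreq")


def toggle_plan(freqs_khz, cycles):
    """Frequencies to apply, in order: cycle through freqs_khz `cycles` times."""
    return [freqs_khz[i % len(freqs_khz)] for i in range(cycles * len(freqs_khz))]


def set_policy_khz(policy, khz, root=CPUFREQ_ROOT):
    """Pin one policy to a fixed frequency via the "userspace" governor."""
    pdir = Path(root) / policy
    if (pdir / "scaling_governor").read_text().strip() != "userspace":
        (pdir / "scaling_governor").write_text("userspace")
    (pdir / "scaling_setspeed").write_text(str(khz))


def toggle(policy, freqs_khz=(1_725_000, 800_000), cycles=10, dwell_s=0.5,
           root=CPUFREQ_ROOT):
    """Alternate a single policy between the given frequencies."""
    for khz in toggle_plan(freqs_khz, cycles):
        set_policy_khz(policy, khz, root)
        time.sleep(dwell_s)
```

For example, toggle("policy1") exercises one core while the other stays under its normal governor; 1725000 kHz is how the 1.75 GHz OPP appears in the trans_table quoted elsewhere in this thread.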

For others interested, feel free to try to recreate the issue with
your own IPQ8065 routers!  Possibly even IPQ806x (untested).

I'll follow up here again once I've made notable progress.

As part of my research, I found a minor oversight with the OPP CPU
voltages, which has been fixed as of the following commit:
https://github.com/openwrt/openwrt/commit/9baca410644b3f0fe94e2d5b6558c9c4bf61e712

(OPP voltage triplets are <target min max>, not <min target max> like
one would expect)
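The devicetree OPP binding defines a three-entry opp-microvolt as <target min max>. A small sketch for sanity-checking triplets copied out of a .dtsi (parsing the DTS itself is left out, and the node names and voltages in the example are made up): under that ordering every entry needs min <= target <= max, so a triplet accidentally written as <min target max> is flagged whenever min and target differ.

```python
def check_opp_microvolt(triplets):
    """Return (name, reason) for each triplet violating <target min max>."""
    problems = []
    for name, (target, vmin, vmax) in triplets.items():
        if not vmin <= vmax:
            problems.append((name, "min above max"))
        elif not vmin <= target <= vmax:
            problems.append((name, "target outside [min, max]"))
    return problems


if __name__ == "__main__":
    example = {  # hypothetical values in microvolts
        "opp-example-ok": (1_100_000, 1_050_000, 1_150_000),
        "opp-example-swapped": (1_050_000, 1_100_000, 1_150_000),
    }
    print(check_opp_microvolt(example))
```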

Ansuel has offered several promising suggestions on how to potentially
fix the reboot issue in the pull request for that commit:
https://github.com/openwrt/openwrt/pull/4464#issuecomment-903178662

In essence:
* Transitioning through multiple frequencies going to/from high speeds
* Forcing both CPU cores to same frequency in some/all situations
* Increasing clock latency in DTS

I'll explore these next!

I've also determined that raising the CPU voltages by 2 microvolts
did not resolve the issue.  I may revisit this with higher voltages in
the future:
https://github.com/openwrt/openwrt/compare/openwrt-21.02...digitalcircuit:openwrt-21.02-cpufreq-dtsivolt-cache-fix-opp-order

(Above obsoleted by "ipq806x: fix min<>target opp-microvolt DTS mixup"
being merged to "master"; I'll create a new backport branch later.)

Tune in next time to find out what happens on...
...debugging with digitalcircuit!

And thanks to everyone who has offered ideas, encouragement, and
patience as I learn many new things.

Regards,
Shane Synan

___
openwrt-devel mailing list
openwrt-devel@lists.openwrt.org
https://lists.openwrt.org/mailman/listinfo/openwrt-devel


Re: [PATCH 21.02] ipq806x: backport cpufreq changes to 5.4

2021-08-10 Thread Shane Synan
On 7/26/21 11:19 PM, Shane Synan wrote:
> Thank you!  In the upcoming weeks, I'll look into tinkering with the
> voltage settings for the CPU in the OPP table.
>
> […]
I've tried multiple other things (mentioned below) which haven't
worked, so it's on to modifying the CPU voltage!  I admit I held off
out of concern for damaging the router, but I'm at the point of
being willing to replace it, so if I mess up, it's fine.

Just to confirm, increasing CPU voltage in the OPP table involves
modifying the "opp-microvolt" line within "[...]/qcom-ipq8065.dtsi"...

opp-14 {
	[...]
	opp-microvolt = <115>;
	[...]
};

...correct?

And if so, do you remember how much you increased the voltage?

I'll research this regardless, so if it'd be a hassle to find it, etc,
no obligation.  As before, thank you for your time!

Regards,
Shane Synan


P.S. The rest of this is merely publicly keeping track of what I've
tried that has not worked:

* Replacing the router power supply (tried a "HitLights 12VDC 60W" unit,
i.e. 5 amps vs. the router's OEM 3.5 amps)

The reboot with nothing shown on the serial console still happens.  It
is possible I picked a bad power supply; I don't (yet) have a convenient
way to measure it while it's connected to the router, but it at least
seemed reasonable.  I can revisit this one if desired.

* Using a powered USB 3.0 hub connected to the NBG6817's USB 3.0 port
to remove all USB power draw from the router (peak measured current:
0.001 amps)

Issue still happens.

* Using an SSD (Samsung 850 Pro 256 GB) via the USB 3.0 hub instead
of the spinning 1 TB HDD

Issue still happens.

(Note: the SSD connected directly to the NBG6817's USB 3.0 port
without the powered USB 3.0 hub in between resulted in the Linux
kernel losing connection and resetting the USB port, possibly due to
sharper current spikes than the spinning disk.  I'm treating that as
an unrelated issue.)

* Using the same SSD via the USB 3.0 hub connected to the NBG6817's
USB 2.0 port

Issue still happens - it's likely not the router's USB 3.0 port.

Older tests:
* Measuring and recreating the 1 TB USB 3.0 HDD's peak power draw
(670-ish mA) and/or the USB 3.0 spec's maximum power draw (900 mA), and
manually toggling a test load rapidly on/off while the router is under
CPU load

No quick crashes.  I'd need to automate the USB test load to perform
a long-term (multi-hour) test, though.

Future tests:
Aside from the OPP table adjustment, I may try other testing
combinations too, e.g. an IRC suggestion of turning off the WiFi radios.
Other ideas include putting the USB HDD on the USB 2.0 port with and
without the powered USB 3.0 hub, fitting a measurement device between
the router's power supply and the router itself, etc.


As a refresher for others and myself, my backport efforts are
currently in these two links:

Backport CPU governor with 1.4 GHz L2 cache enabled (broken for me):
https://github.com/openwrt/openwrt/compare/openwrt-21.02...digitalcircuit:openwrt-21.02-cpufreq-dtsivolt-cache

...versus...

Backport CPU governor with 1.4 GHz L2 cache DISABLED (works for me):
https://github.com/openwrt/openwrt/compare/openwrt-21.02...digitalcircuit:openwrt-21.02-cpufreq-dtsivolt-cache-no-1.4ghz

(As Ansuel noted months earlier, this should be a performance
regression from 19.07.  While I've not noticed any appreciable
degradation, I've also not run iperf3/etc to confirm throughput, and
my ISP's claimed 200 Mbit/s download speed is hardly top tier.)



Re: [PATCH 21.02] ipq806x: backport cpufreq changes to 5.4

2021-07-26 Thread Shane Synan
[Since the mailing list didn't seem to pick up Ansuel's mobile reply,
I've reformatted here for easier future reference.]

On 7/20/21 8:36 PM EST, Ansuel Smith wrote:
> Sorry for the bad answer (I'm on my phone).
> Anyway, what you can try is to tweak the OPP table and increase your
> CPU voltage...
> I don't know if it was related, but some time ago I tested the router
> with an overclock and, strangely, the router was more stable: 28 days
> of uptime, while normally I get 3-4 days.
> My idea is that some chip degradation or a bad power supply causes the
> crash.  Increasing the voltage seems to fix the problem.

Thank you!  In the upcoming weeks, I'll look into tinkering with the
voltage settings for the CPU in the OPP table.

With your advice in mind, I'm also suspecting it may be the combination
of CPU power draw and my USB 3.0 HDD's power drain (an old 1 TB Seagate
SRD00F1), as the latter is bus-powered rather than using an external
power supply.  I've acquired a digital USB test load, and a USB 3.0
voltage/current meter is on the way (estimated arrival August 15th).
With any luck, I hope to reliably recreate the issue by running a
CPU-intensive task combined with a fixed current load to mimic the HDD
drawing from the NBG6817's USB 3.0 port, instead of the convoluted
Déjà Dup SFTP backup test that I've been doing.

I'll also look at measuring the DC side of the power supply while it's
under load.  It might even be a combination of the bus-powered USB 3.0
HDD, chip degradation, and/or failing power supply.  I've had the
NBG6817 as my primary home router since November 2019 (bought new).

Again, thank you for your patience and time spent on this tricky issue!
I'll follow up if I uncover anything of interest, and I'll keep updating
folks in the OFTC/#openwrt-devel IRC channel too.

~ Shane



Re: [PATCH 21.02] ipq806x: backport cpufreq changes to 5.4

2021-07-20 Thread Shane Synan
On 7/1/21 4:55 PM, Shane Synan wrote:
> On 6/28/21 2:10 AM, Shane Synan wrote:
> [...]
> 
> Given these commits work just fine on the "master" branch, and on
> "21.02" it worked to change the CPU governor *without* the DTSI
> backports and cache (i.e. only the first commit), I strongly suspect I
> goofed up and missed backporting something important.

After further attempts, I still encounter the issue on OpenWRT "master"
with Linux kernel 5.4 as well; I just hadn't tried enough iterations of
SFTP upload testing.  Disabling the 1.4 GHz L2 cache frequency by
removing that "opp_table_l2" entry from the .dtsi
(i.e. a git revert of "ipq806x: fix missing 1.4ghz cache freq for ipq8065 SoC")
*so far* seems more stable.

This applies to OpenWRT 21.02 as well if I try backporting the changes:
https://github.com/openwrt/openwrt/compare/openwrt-21.02...digitalcircuit:openwrt-21.02-cpufreq-dtsivolt-cache
...versus...
https://github.com/openwrt/openwrt/compare/openwrt-21.02...digitalcircuit:openwrt-21.02-cpufreq-dtsivolt-cache-no-1.4ghz
(stress-ng currently fails to build from source on the "master" branch,
hence trying to narrow it down on 21.02 as well)

Unfortunately, I still haven't managed to recreate this issue in a
simple manner.  I'm continuing to experiment with stress-ng parameters
in various loops, scripting GNOME's GIO backend via Python for repeated
SFTP uploads, etc, but haven't succeeded yet.  I'll continue to explore
my options.

Would you have any suggestions for how to diagnose this issue at a lower
level?

I've not gotten any cpufreq-related error messages from the serial
console.  I've noticed that the cpufreq driver mentions transitions
to/from 384 MHz as well, despite having the /etc/init.d/cpufreq changes
applied raising scaling_min_freq to 600 MHz - perhaps I'm somehow
bypassing the check that tries to avoid the hardware constraint?
   From  :To
 :3840006080   100   140   1725000 
   384000: 014 3 611 9 
   60:15 0 49834 23113 7106477 
   80: 7 64708 0 1430134 28653 
  100:10 32739 13774 037 24203 
  140: 6 82162 017 
  1725000: 5 81977 44070 3328125 0
(Source: /sys/devices/system/cpu/cpufreq/policy[0..1]/stats/trans_table)
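Since the archived copy of this table lost its alignment, here is a sketch for parsing trans_table into a matrix and totalling every transition that touches a given frequency (e.g. 384000 kHz). It assumes the layout printed by the kernel's cpufreq_stats: a "From : To" banner, a row of target frequencies, then one "freq: counts..." row per source frequency. The helper names are hypothetical, and the sample data used to check it is invented, not the numbers above.

```python
def parse_trans_table(text):
    """Parse cpufreq_stats trans_table text into {from_khz: {to_khz: count}}."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    # lines[0] is the "From : To" banner; lines[1] lists the target frequencies.
    targets = [int(x) for x in lines[1].split(":", 1)[1].split()]
    table = {}
    for row in lines[2:]:
        src, counts = row.split(":", 1)
        table[int(src)] = dict(zip(targets, (int(c) for c in counts.split())))
    return table


def transitions_touching(table, khz):
    """Transitions out of plus into one frequency (self-transitions ignored)."""
    outgoing = sum(n for dst, n in table.get(khz, {}).items() if dst != khz)
    incoming = sum(row.get(khz, 0) for src, row in table.items() if src != khz)
    return outgoing + incoming
```

On the router, feeding it the contents of each policyN/stats/trans_table would show how often 384 MHz is still being entered or left despite the raised scaling_min_freq.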

Alternatively, is it possible to adjust the maximum L2 cache frequency
at runtime, similar to setting the CPU governor min/max?

To be clear, I appreciate your efforts to keep ipq806x alive, especially
after Qualcomm abandoned the platform (I hadn't realized this until
someone on #openwrt-devel pointed it out).  I don't intend to seem
ungrateful.  If you'd prefer to consider this bizarrely hard-to-simplify
test case unsolvable, that's okay!  I can continue to run custom builds
with 1.4 GHz L2 cache frequency disabled, and the majority of others
without my bizarrely-nondeterministic issues can enjoy the higher speed.

Regardless, thank you for your time spent looking into this!  I hope
my attempts to diagnose this have not caused you any trouble.



Re: [PATCH 21.02] ipq806x: backport cpufreq changes to 5.4

2021-07-01 Thread Shane Synan
On 6/28/21 2:10 AM, Shane Synan wrote:
> On 6/27/21 6:10 PM, Ansuel Smith wrote:
>> It's good, but consider that the DTS changes are also required, and
>> there is also a pending PR that has to be merged or ipq8065 devices
>> will run at a lower speed.
>> The new cpufreq driver has changed the node definition.
>> This is the PR I'm talking about:
>> https://github.com/openwrt/openwrt/pull/4192
>>
>> So in short, this backport lacks the DTS changes, and PR #4192 is
>> mandatory or we will introduce a performance regression in 21.02.
>>
> 
> Thank you for pointing out the details of the DTS changes and CPU
> performance.  Originally I was trying to achieve the smallest
> backport, but in hindsight, it makes far more sense to avoid
> frankensteining with partial changes.  I appreciate your patience and
> advice!
> 
> Meanwhile, until PR #4192 can be reviewed/merged to master (so there's
> commits to actually backport), I've thrown together a rough draft of
> what changes would need to be backported:

Now that https://github.com/openwrt/openwrt/pull/4192 is merged,
including four successful tests on Linux kernel v5.4, I've tried
backporting all the CPU frequency/DTS/etc changes that appeared
relevant, as shown here:
https://github.com/openwrt/openwrt/compare/openwrt-21.02...digitalcircuit:openwrt-21.02-cpufreq-dtsivolt-cache
(some commits were modified to drop kernel v5.10 patches)

Unfortunately, my initial attempt is unstable again.  I've gotten
3 reboots so far within a few hours of SFTP testing.  Unlike before,
nothing displays on the router's soldered-on serial console headers.

Given these commits work just fine on the "master" branch, and on
"21.02" it worked to change the CPU governor *without* the DTSI
backports and cache (i.e. only the first commit), I strongly suspect I
goofed up and missed backporting something important.

Ansuel, do you have any suggestions on what else I should look for or
try tinkering with?

Meanwhile, just to see what happens in an attempt to narrow things
down, I'm testing a build without the DTSI backports (all commits linked
above except for "ipq806x: refresh dtsi patches" and "ipq806x: apply
correct voltage tolerance").  So far, it boots, though I haven't run a
full test.  Regardless, it's not a proper solution, just a
troubleshooting attempt.



Re: [PATCH 21.02] ipq806x: backport cpufreq changes to 5.4

2021-06-27 Thread Shane Synan
On 6/27/21 6:10 PM, Ansuel Smith wrote:
> On Sun, Jun 27, 2021 at 23:41 Baptiste Jonglez
>  wrote:
>>
>> Ansuel, do you have any feedback on this backport?  It seems to help things.
>>
> It's good, but consider that the DTS changes are also required, and
> there is also a pending PR that has to be merged or ipq8065 devices
> will run at a lower speed.
> The new cpufreq driver has changed the node definition.
> This is the PR I'm talking about:
> https://github.com/openwrt/openwrt/pull/4192
>
> So in short, this backport lacks the DTS changes, and PR #4192 is
> mandatory or we will introduce a performance regression in 21.02.
> 

Thank you for pointing out the details of the DTS changes and CPU
performance.  Originally I was trying to achieve the smallest
backport, but in hindsight, it makes far more sense to avoid
frankensteining with partial changes.  I appreciate your patience and
advice!

Meanwhile, until PR #4192 can be reviewed/merged to master (so there's
commits to actually backport), I've thrown together a rough draft of
what changes would need to be backported:

https://github.com/openwrt/openwrt/compare/openwrt-21.02...digitalcircuit:openwrt-21.02-cpufreq-dtsivolt-cache
Cherry-pick of existing DTS changes and commits in PR #4192:
https://github.com/openwrt/openwrt/pull/4192
[NOTE: I might be missing commits!]

This isn't a real backport, it's just so I can try out these changes
right now.  So far, it builds and runs, and I've started the first
seven-hour SFTP backup test as described in FS#3099.

If there's anything I'm overlooking, or anything else I could try that
would be helpful, let me know!

No disrespect meant, either.  I realize the scope of these changes
means this might need to wait for 21.02.1, and meanwhile I'm happy to
continue making and testing custom builds.



Re: [PATCH 21.02] ipq806x: backport cpufreq changes to 5.4

2021-06-27 Thread Ansuel Smith
On Sun, Jun 27, 2021 at 23:41 Baptiste Jonglez
 wrote:
>
> Hi,
>
> On 20-06-21, Shane Synan wrote:
> > In the time since submitting this, I've continued testing this
> > change on my ZyXEL NBG6817.  I'm reasonably confident this fixes my
> > issue (11/11 successes), and if there's any further testing that
> > would help, let me know!
>
> Thanks for the patch and testing, I'd like to merge it in the next few
> days (before 21.02.0 final).  I'll update the commit message.
>
> Ansuel, do you have any feedback on this backport?  It seems to help things.
>
> Thanks,
> Baptiste

It's good, but consider that the DTS changes are also required, and
there is also a pending PR that has to be merged or ipq8065 devices
will run at a lower speed.
The new cpufreq driver has changed the node definition.
This is the PR I'm talking about:
https://github.com/openwrt/openwrt/pull/4192

So in short, this backport lacks the DTS changes, and PR #4192 is
mandatory or we will introduce a performance regression in 21.02.



Re: [PATCH 21.02] ipq806x: backport cpufreq changes to 5.4

2021-06-27 Thread Baptiste Jonglez
Hi,

On 20-06-21, Shane Synan wrote:
> In the time since submitting this, I've continued testing this
> change on my ZyXEL NBG6817.  I'm reasonably confident this fixes my
> issue (11/11 successes), and if there's any further testing that
> would help, let me know!

Thanks for the patch and testing, I'd like to merge it in the next few
days (before 21.02.0 final).  I'll update the commit message.

Ansuel, do you have any feedback on this backport?  It seems to help things.

Thanks,
Baptiste




Re: [PATCH 21.02] ipq806x: backport cpufreq changes to 5.4

2021-06-20 Thread Shane Synan
> -----Original Message-----
> Sent: 6/17/21 1:33 AM
> To: openwrt-devel@lists.openwrt.org
> Cc: Ansuel Smith 
> Subject: [PATCH 21.02] ipq806x: backport cpufreq changes to 5.4
> 

In the time since submitting this, I've continued testing this
change on my ZyXEL NBG6817.  I'm reasonably confident this fixes my
issue (11/11 successes), and if there's any further testing that
would help, let me know!

I have tried to recreate this issue with stress-ng.  Unfortunately,
my efforts so far have only discovered new and seemingly-unrelated
ways to cause kernel panics.  Though this patch restores pre-21.02
stability for me, it's clearly not the only issue on ipq806x.

Others on the OFTC/#openwrt-devel IRC channel have kindly informed me
it would have helped to include a version of my test results and
rationale in the commit message itself - feel free to amend when
merging, or if it would be easier, I'm happy to submit a PATCHv2
instead!

In the interim, I'll keep verifying this.  Given the frequency of
crashes without this patch, I won't be able to return to official
21.02 builds in the meantime.

I've also CC'd Baptiste due to this May mailing list post:
https://lists.openwrt.org/pipermail/openwrt-devel/2021-May/035153.html
Pardon if this was out of place.


Test case:

OpenSSH configured as a secondary SSH server, locked to SFTP and
chroot'd to an external USB HDD.  Déjà Dup (a duplicity frontend)
runs on my Linux computers, backing up roughly 223 GB compressed and
encrypted in about 7 hours.  This happens in 25 MB chunks, so the
router sees bursts of CPU/memory/IO activity with pauses in between.

See "Environment" below as well.

* WITHOUT this patch, running on 21.02rc1/rc2:

5/30 SFTP transfers succeed, with at most 2 succeeding in a row.

21.02.0rc1: 4/25 backups succeeded
21.02.0rc2: 1/5 backups succeeded

Noted in the bug report here:
https://bugs.openwrt.org/index.php?do=details&task_id=3099#comment9712

I've been investigating this for longer than a month.  If desired, I
can retest with the same "openwrt/openwrt" commit, minus the CPU
patch.

* WITH this patch, built atop 21.02 branch commit
072d0afb8fa6359541568081c23fe2d8d411651c:

11/11 SFTP transfers succeed all consecutively, as noted in the bug
report:
https://bugs.openwrt.org/index.php?do=details&task_id=3099#comment9820

Note: I did not need to apply the DTSI patches for this success.
However, they did not cause any issue, either - my previous time spent
running a May 18th snapshot build went just fine.
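As a rough sanity check on "11/11 with the patch vs. 5/30 without", assuming each backup run is an independent trial and the unpatched success rate really is 5/30 (both simplifications, since crashes may well cluster), the chance of 11 consecutive successes by luck alone is (5/30)^11:

```python
from fractions import Fraction

p_unpatched = Fraction(5, 30)  # 5 of 30 SFTP transfers succeeded without the patch
p_streak = p_unpatched ** 11   # probability of 11/11 if nothing had changed
print(p_streak, float(p_streak))
```

That is 1/362797056, roughly 3 in a billion, so under those assumptions the patched result is very unlikely to be chance.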


Environment:

ZyXEL NBG6817 as router and access point connected to ISP modem:

* 3 local networks (primary, guest, openwireless.org) with
3x(2.4+5 GHz) corresponding WiFi SSIDs

* Additional packages:
nano ncdu luci-ssl openssh-sftp-server mosh-server
luci-app-advanced-reboot luci-app-ddns luci-app-sqm luci-app-openvpn
openvpn-openssl luci-app-nlbwmon stubby hostapd-utils wget-ssl
openssh-server shadow-useradd iperf3

* Additional packages for USB 3 external HDD:
block-mount e2fsprogs kmod-fs-ext4 kmod-usb-storage kmod-usb2
kmod-usb3 kmod-usb-storage-uas

* Dropbear remains the primary SSH daemon.  OpenSSH is configured to
run on a different port, locked down to SFTP access only via chroot
to the USB HDD.

* ASUS RT-AC68U running FreshTomato is connected via Ethernet as a
secondary 3x(2.4+5 GHz) AP, using VLAN tagging

* Other seemingly small configuration tweaks which I can share if it
would help


I recognize that OpenWRT 21.02.0 stable is nearly upon us.  If
there's anything else I can do to help verify this change, let me
know.  Your time is appreciated.


Thanks,
Shane



[PATCH 21.02] ipq806x: backport cpufreq changes to 5.4

2021-06-16 Thread Shane Synan
Date: Thu, 8 Apr 2021 16:20:36 +0200
From: Ansuel Smith 

The new cpufreq driver requires different dts bindings.
Backport the new driver to kernel 5.4

Signed-off-by: Ansuel Smith 
Tested-by: Shane Synan 
(cherry picked from commit 6e411b8416388a9c8be1b2291be9b5adeeb07784)
---
I've tested these changes on the ZyXEL NBG6817 and they've resolved a
5.4 kernel crash for me, triggered by SFTP uploads via OpenSSH server
to an attached USB 3.0 drive, as documented in FS#3099.

https://bugs.openwrt.org/index.php?do=details&task_id=3099#comment9820

However, I'm not certain this resolves the crashes for others, so I
have not added a "Fixes" reference for the issue to the commit message.

This is my first submission to the openwrt-devel mailing list, so let
me know if anything's amiss!  I'm also unsure whether the original
patch author is properly credited.

 target/linux/ipq806x/config-5.4   |   1 +
 ...rt-adjusting-OPP-voltages-at-runtime.patch | 153 
 ...per-to-get-an-opp-regulator-for-devi.patch |  52 --
 ...e-voltage-tolerance-when-adjusting-t.patch |  47 --
 ...-dt-Handle-OPP-voltage-adjust-events.patch | 118 ---
 ...-dt-Add-L2-frequency-scaling-support.patch | 199 -
 ...056-cpufreq-dt-Add-missing-rcu-locks.patch |  23 -
 ...qcom-cpufreq-nvmem-support-specific-.patch |  51 ++
 ...q-add-Krait-dedicated-scaling-driver.patch | 679 ++
 ...ufreq-add-qcom-krait-cpufreq-binding.patch | 237 ++
 ...dd-fab-scaling-support-with-cpufreq.patch} |  34 +-
 11 files changed, 985 insertions(+), 609 deletions(-)
 delete mode 100644 target/linux/ipq806x/patches-5.4/0049-PM-OPP-Support-adjusting-OPP-voltages-at-runtime.patch
 delete mode 100644 target/linux/ipq806x/patches-5.4/0051-PM-OPP-Add-a-helper-to-get-an-opp-regulator-for-devi.patch
 delete mode 100644 target/linux/ipq806x/patches-5.4/0052-PM-OPP-Update-the-voltage-tolerance-when-adjusting-t.patch
 delete mode 100644 target/linux/ipq806x/patches-5.4/0054-cpufreq-dt-Handle-OPP-voltage-adjust-events.patch
 delete mode 100644 target/linux/ipq806x/patches-5.4/0055-cpufreq-dt-Add-L2-frequency-scaling-support.patch
 delete mode 100644 target/linux/ipq806x/patches-5.4/0056-cpufreq-dt-Add-missing-rcu-locks.patch
 create mode 100644 target/linux/ipq806x/patches-5.4/093-drivers-cpufreq-qcom-cpufreq-nvmem-support-specific-.patch
 create mode 100644 target/linux/ipq806x/patches-5.4/098-1-cpufreq-add-Krait-dedicated-scaling-driver.patch
 create mode 100644 target/linux/ipq806x/patches-5.4/098-2-Documentation-cpufreq-add-qcom-krait-cpufreq-binding.patch
 rename target/linux/ipq806x/patches-5.4/{0057-add-fab-scaling-support-with-cpufreq.patch => 098-3-add-fab-scaling-support-with-cpufreq.patch} (93%)

diff --git a/target/linux/ipq806x/config-5.4 b/target/linux/ipq806x/config-5.4
index 68ed6ec0c7..d410cd3829 100644
--- a/target/linux/ipq806x/config-5.4
+++ b/target/linux/ipq806x/config-5.4
@@ -62,6 +62,7 @@ CONFIG_ARM_MODULE_PLTS=y
 CONFIG_ARM_PATCH_IDIV=y
 CONFIG_ARM_PATCH_PHYS_VIRT=y
 # CONFIG_ARM_QCOM_CPUFREQ_HW is not set
+CONFIG_ARM_QCOM_CPUFREQ_KRAIT=y
 CONFIG_ARM_QCOM_CPUFREQ_NVMEM=y
 CONFIG_ARM_QCOM_CPUIDLE=y
 # CONFIG_ARM_SMMU is not set
diff --git a/target/linux/ipq806x/patches-5.4/0049-PM-OPP-Support-adjusting-OPP-voltages-at-runtime.patch b/target/linux/ipq806x/patches-5.4/0049-PM-OPP-Support-adjusting-OPP-voltages-at-runtime.patch
deleted file mode 100644
index 9efbd583b4..0000000000
--- a/target/linux/ipq806x/patches-5.4/0049-PM-OPP-Support-adjusting-OPP-voltages-at-runtime.patch
+++ /dev/null
@@ -1,153 +0,0 @@
-From: Sylwester Nawrocki 
-To: k...@kernel.org, vire...@kernel.org, robh...@kernel.org
-Cc: sb...@kernel.org, roger...@mediatek.com,
-   linux...@vger.kernel.org, linux-arm-ker...@lists.infradead.org,
-   linux-samsung-...@vger.kernel.org, devicet...@vger.kernel.org,
-   b.zolnier...@samsung.com, m.szyprow...@samsung.com,
-   Stephen Boyd ,
-   Sylwester Nawrocki 
-Subject: [PATCH v5 1/4] PM / OPP: Support adjusting OPP voltages at runtime
-Date: Wed, 16 Oct 2019 16:57:53 +0200
-Message-ID: <20191016145756.16004-2-s.nawro...@samsung.com> (raw)
-In-Reply-To: <20191016145756.16004-1-s.nawro...@samsung.com>
-
-From: Stephen Boyd 
-
-On some SoCs the Adaptive Voltage Scaling (AVS) technique is
-employed to optimize the operating voltage of a device. At a
-given frequency, the hardware monitors dynamic factors and either
-makes a suggestion for how much to adjust a voltage for the
-current frequency, or it automatically adjusts the voltage
-without software intervention. Add an API to the OPP library for
-the former case, so that AVS type devices can update the voltages
-for an OPP when the hardware determines the voltage should
-change. The assumption is that drivers like CPUfreq or devfreq
-will register for the OPP notifiers and adjust the voltage
-according to suggestions that AVS makes.
-
-This patch is derived from [1] submitted by Stephen.
-[1] https://lore.kernel.org/patchwor