Re: [PATCH 21.02] ipq806x: backport cpufreq changes to 5.4
On Wed, 8 Sep 2021 at 02:11, Shane Synan wrote: > [...]
Re: [PATCH 21.02] ipq806x: backport cpufreq changes to 5.4
On 8/24/21 7:21 PM, Shane Synan wrote: > The fix hasn't been found, but progress has been made! > > After further testing, I think I've found a way to recreate this issue > with just the router itself, no external USB HDD, no Déjà Dup backup > over SFTP, and possibly no extra changes beyond a stock NBG6817 > OpenWRT build (not confirmed as this router runs my home network, > including SQM QoS, VLANs with another WiFi AP, etc). So far, I've attempted all three suggested fixes, but I had trouble implementing one and I'm unsure if I tried the other two correctly. Additionally, pinning to "performance" for 1.75 GHz does not solve the issue either - more on that near the end. I've put all of my commits into one branch for easier reference: https://github.com/openwrt/openwrt/compare/master...digitalcircuit:ft-fix-ipq8065-reset And I've used my simplified automatic QA script for verification: https://github.com/digitalcircuit/openwrt-ipq806x-qa-cpu-reset#readme (In theory, anyone should be able to reproduce the issue with this script on a stock OpenWRT build. I'll still do testing with the Déjà Dup SFTP backup workload.) Suggestions and results in order of attempt: (Ignore "ipq8065: force CPUs to share DVFS scaling", wrong method.) 1. Raising clock latency (commits with "clock latency" in subject) I've tried raising the clock-latency-ns in the ipq8065 DTS by 100 nanoseconds, a deliberately excessive value in the hopes of it being enough to notice any issues. I've tried this for... * 1.4 GHz and 1.75 GHz (ipq8065: raise 1.4 & 1.75 GHz clock latency) * All CPU frequencies (previous + ipq8065: raise all clock latency) * All CPU frequencies and L2 cache latency (previous + ipq8065: raise L2 cache, CPU core clock latency) Unfortunately, as noted in the revert commit, this seemed to have no impact on the results from the QA script. I don't know if I've correctly implemented this suggestion. 
QA script log on b1870c2 (.tar.xz due to 12.2 MiB uncompressed size): https://zorro.casa/sync/Hosting/Utilities/Development/OpenWRT/mailing-list/ipq806x_%20backport%20cpufreq%20changes%20to%205.4/debug-cpufreq%20-%20clock%20default%20test%20case1%20-%202021-08-30%2022-37-50%20-%20r17395-b1870c2530-branch-ft-fix-ipq8065-reset%20-%20date%20segfault%2C%20reboot%20-%20public.tar.xz 2. Run both cores at the same frequency (most promising?) I tried to do this (ipq806x: Force CPU cores to share frequency), but I think I didn't modify the cpufreq driver in the correct way. As noted in the revert commit, this didn't appear to force CPUs to share frequency, whether manually using the performance governor or periodically observing the ondemand governor - the CPU cores were at different frequencies. I'll need help figuring out how to implement this in the cpufreq driver correctly. It seems promising given that in the past, dual-core bursty workloads didn't seem to trigger the crash. NOTE: Before diving into implementing this, read the conclusion below as I've noticed reboots happen without changing CPU frequency as well. I'm also not sure how to debug the cpufreq driver in general. With dynamic debugging, I can turn on messages about the cpufreq governor, but I'm not sure of the right way to add dynamic debugging print messages to the cpufreq driver. Example of dynamic debugging: echo "file drivers/cpufreq/* =p" > /sys/kernel/debug/dynamic_debug/control QA script log on 1fdabd9 (.tar.xz due to 4.4 MiB uncompressed size): https://zorro.casa/sync/Hosting/Utilities/Development/OpenWRT/mailing-list/ipq806x_%20backport%20cpufreq%20changes%20to%205.4/debug-cpufreq%20-%20clock%20default%20test%20case1%20-%202021-08-31%2020-38-04%20-%20r17397-1fdabd95db-branch-ft-fix-ipq8065-reset%20-%20date%20segfault%2C%20reboot%20-%20public.tar.xz 3. Add forced frequency transitions between 1.0 GHz and 1.75 GHz I'm not sure if I implemented this correctly. 
I made a first attempt (ipq806x: Add transitions to 1.0 <> 1.4 <> 1.75 GHz), but if the frequency transitions happen, they're too fast to observe. And as noted above, I'm not yet sure of the right way to add dynamic debugging messages. Running the QA script in "case1" (toggle 800 MHz to 1.75 GHz) still crashes. QA script log on 52f4f77 (.tar.xz due to 471.8 KiB uncompressed size): https://zorro.casa/sync/Hosting/Utilities/Development/OpenWRT/mailing-list/ipq806x_%20backport%20cpufreq%20changes%20to%205.4/debug-cpufreq%20-%20clock%20default%20test%20case1%20-%202021-09-07%2019-58-07%20-%20r17399-52f4f77518-branch-ft-fix-ipq8065-reset%20-%20reboot%20-%20public.tar.xz Separately, I updated the QA script to add a "ramp1" case which smoothly ramps the CPU core frequency up/down from 600 MHz to 1.75 GHz, stopping at every frequency in between. Unfortunately, this still crashes. Interestingly, the crash again happens when CPU core frequencies are distant from each other (1.75 GHz and 800 MHz). This lends credence to the idea of locking CPU frequencies together for 1.4 an
Re: [PATCH 21.02] ipq806x: backport cpufreq changes to 5.4
On 8/10/21 6:02 PM, Shane Synan wrote:
> I've tried multiple other things (mentioned below) which haven't
> worked, so it's on to modifying the CPU voltage!

[…]

The fix hasn't been found, but progress has been made!

After further testing, I think I've found a way to recreate this issue
with just the router itself, no external USB HDD, no Déjà Dup backup
over SFTP, and possibly no extra changes beyond a stock NBG6817
OpenWRT build (not confirmed as this router runs my home network,
including SQM QoS, VLANs with another WiFi AP, etc).

I've put the scripts and documentation into a dedicated repository:
https://github.com/digitalcircuit/openwrt-ipq806x-qa-cpu-reset
(In brief: toggling individual CPU cores between 1.75 GHz and 800 MHz)

For others interested, feel free to try to recreate the issue with your
own IPQ8065 routers! Possibly even IPQ806x (untested). I'll follow up
here again once I've made notable progress.

As part of my research, I found a minor oversight with the OPP CPU
voltages, which has been fixed as of the following commit:
https://github.com/openwrt/openwrt/commit/9baca410644b3f0fe94e2d5b6558c9c4bf61e712
(OPP voltage triplets are <target min max>, not <min target max> like
one would expect)

Ansuel has offered several promising suggestions on how to potentially
fix the reboot issue in the pull request for that commit:
https://github.com/openwrt/openwrt/pull/4464#issuecomment-903178662

In essence:
* Transitioning through multiple frequencies going to/from high speeds
* Forcing both CPU cores to same frequency in some/all situations
* Increasing clock latency in DTS

I'll explore these next!

I've also determined that raising the CPU voltages by 2 microvolts did
not resolve the issue.
I may revisit this with higher voltages in the future: https://github.com/openwrt/openwrt/compare/openwrt-21.02...digitalcircuit:openwrt-21.02-cpufreq-dtsivolt-cache-fix-opp-order (Above obsoleted by "ipq806x: fix min<>target opp-microvolt DTS mixup" being merged to "master"; I'll create a new backport branch later.) Tune in next time to find out what happens on... ...debugging with digitalcircuit! And thanks to everyone who has offered ideas, encouragement, and patience as I learn many new things. Regards, Shane Synan ___ openwrt-devel mailing list openwrt-devel@lists.openwrt.org https://lists.openwrt.org/mailman/listinfo/openwrt-devel
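For readers who want a feel for the kind of per-core frequency toggling described above without opening the QA repository, here is a minimal sketch. It is not the author's QA script. It assumes the standard cpufreq sysfs files `scaling_min_freq` and `scaling_max_freq` under the per-policy directories; `CPUFREQ_ROOT`, `pin_policy_freq` and `toggle_policy` are hypothetical names introduced only for this illustration. Frequencies are in kHz, so 800 MHz is 800000 and the 1.75 GHz step is 1725000 as it appears in the cpufreq stats.

```sh
#!/bin/sh
# Sketch only: pin a cpufreq policy to one frequency by setting min == max.
# CPUFREQ_ROOT is a hypothetical override so this can be tried on a scratch dir.
CPUFREQ_ROOT="${CPUFREQ_ROOT:-/sys/devices/system/cpu/cpufreq}"

pin_policy_freq() {
    # $1 = policy number (0 or 1 on a dual-core SoC), $2 = frequency in kHz
    _dir="$CPUFREQ_ROOT/policy$1"
    _cur_max=$(cat "$_dir/scaling_max_freq")
    if [ "$2" -gt "$_cur_max" ]; then
        # Raising: lift the ceiling first so min never exceeds max.
        echo "$2" > "$_dir/scaling_max_freq"
        echo "$2" > "$_dir/scaling_min_freq"
    else
        # Lowering (or equal): drop the floor first, then the ceiling.
        echo "$2" > "$_dir/scaling_min_freq"
        echo "$2" > "$_dir/scaling_max_freq"
    fi
}

toggle_policy() {
    # $1 = policy, $2 = low kHz, $3 = high kHz, $4 = iterations,
    # $5 = seconds to hold each frequency (defaults to 1)
    _i=0
    while [ "$_i" -lt "$4" ]; do
        pin_policy_freq "$1" "$3"
        sleep "${5:-1}"
        pin_policy_freq "$1" "$2"
        sleep "${5:-1}"
        _i=$((_i + 1))
    done
}
```

A call such as `toggle_policy 1 800000 1725000 100` would bounce the second core between the two frequencies while the first core is left alone. Whether that alone is enough to trigger the reboot is exactly what the thread is trying to establish, so treat it as a starting point rather than a reproducer.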
Re: [PATCH 21.02] ipq806x: backport cpufreq changes to 5.4
On 7/26/21 11:19 PM, Shane Synan wrote: > Thank you! In the upcoming weeks, I'll look into tinkering with the > voltage settings for the CPU in the OPP table. > > […] I've tried multiple other things (mentioned below) which haven't worked, so it's on to modifying the CPU voltage! I admit I held off out of concern for damaging the router, but I'm at the point of being willing to replace it, so if I mess up, it's fine. Just to confirm, increasing CPU voltage in the OPP table involves modifying the "opp-microvolt" line within "[...]/qcom-ipq8065.dtsi"... opp-14 { [...] opp-microvolt = <115>; [...] } ...correct? And if so, do you remember how much you increased the voltage? I'll research this regardless, so if it'd be a hassle to find it, etc, no obligation. As before, thank you for your time! Regards, Shane Synan P.S. The rest of this is merely publicly keeping track of what I've tried that has not worked: * Replacing the router power supply (tried "HitLights 12VDC 60W", e.g. 5 amp vs. router's OEM 3.5 amp) Nothing-shown-in-serial-console reboot still happens. It is possible I picked a bad power supply, I don't (yet) have a convenient way to measure it while connected to the router, but it at least seemed reasonable. I can revisit this one if desired. * Using a powered USB 3.0 hub connected to the USB 3.0 port on the NBG6817 to remove all power draw (peak measured current: 0.001 amps) Issue still happens. * Using an SSD (Samsung 850 Pro 256 GB) via the USB 3.0 hub instead of the spinning 1 TB HDD Issue still happens. (Note: the SSD connected directly to the NBG6817's USB 3.0 port without the powered USB 3.0 hub in between resulted in the Linux kernel losing connection and resetting the USB port, possibly due to sharper current spikes than the spinning disk. I'm treating that as an unrelated issue.) * Using the same SSD via the USB 3.0 hub connected to the NBG6817's USB 2.0 port Issue still happens - it's likely not the router's USB 3.0 port. 
Older tests: * Measuring and recreating the 1 TB USB 3.0 HDD's peak power draw (670-ish mA) and/or the maximum USB 3's spec power draw (900 mA) and manually toggling a testing load rapidly on/off while router's under CPU load No quick crashes. I'd need to automate the USB test load to perform a long term (multiple hours) test though. Future tests: Asides from the OPP table adjustment, I may try other testing combinations too, e.g. an IRC suggestion of turning off WiFi radios. Other ideas include putting the USB HDD on USB 2.0 port without/with the powered USB 3.0 hub, fitting a measurement device between the router's power supply and the router itself, etc. As a refresher for others and myself, my backport efforts are currently in these two links: Backport CPU governor with 1.4 GHz L2 cache enabled (broken for me): https://github.com/openwrt/openwrt/compare/openwrt-21.02...digitalcircuit:openwrt-21.02-cpufreq-dtsivolt-cache ...versus... Backport CPU governor with 1.4 GHz L2 cache DISABLED (works for me): https://github.com/openwrt/openwrt/compare/openwrt-21.02...digitalcircuit:openwrt-21.02-cpufreq-dtsivolt-cache-no-1.4ghz (As Ansuel noted months earlier, this should be a performance regression from 19.07. While I've not noticed any appreciable degradation, I've also not run iperf3/etc to confirm throughput, and my ISP's claimed 200 mbit/s download speed is hardly top tier.) ___ openwrt-devel mailing list openwrt-devel@lists.openwrt.org https://lists.openwrt.org/mailman/listinfo/openwrt-devel
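Before and after editing "opp-microvolt" in the .dtsi, it can help to confirm which values the running kernel actually received. One way, sketched below, is to read the property back from the live device tree under /proc/device-tree. Device tree cells are stored as big-endian 32-bit integers, and per the kernel's OPP binding a three-cell entry is ordered target, min, max. `decode_cells` is a hypothetical helper name, the sketch assumes `od` and `awk` are available (a stock image may only ship `hexdump`, so copying the property file to a development machine is an option), and the exact node path depends on the DTS and is deliberately not filled in here.

```sh
#!/bin/sh
# Sketch only: print a device tree property as unsigned 32-bit big-endian cells.
decode_cells() {
    # $1 = path to a raw property file, e.g. .../opp-microvolt
    # od emits each byte as an unsigned decimal; awk folds every 4 bytes
    # into one big-endian value and prints the cells space-separated.
    od -An -v -tu1 "$1" | awk '
        {
            for (i = 1; i <= NF; i++) {
                v = v * 256 + $i
                if (++n % 4 == 0) {
                    printf "%s%d", sep, v
                    sep = " "
                    v = 0
                }
            }
        }
        END { if (n) print "" }
    '
}
```

Pointing `decode_cells` at the `opp-microvolt` file of one OPP node should print either one value (target only) or three values (target, min, max), which makes a swapped ordering easy to spot.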
Re: [PATCH 21.02] ipq806x: backport cpufreq changes to 5.4
[Since the mailing list didn't seem to pick up Ansuel's mobile reply, I've reformatted here for easier future reference.] On 7/20/21 8:36 PM EST, Ansuel Smith wrote: > Sorry for the bad answer (I'm from phone) > Anyway what you can try is tweak the OPP table and increase your CPU > voltage... > Don't know if it was related but some time ago I tested the router > with overclock and strangely the router was more stable - 28 days of > uptime while normally I have 3-4 days. > My idea is some chip degradation or bad power supply and this cause > crash. Increasing the voltage seems to fix the problem. Thank you! In the upcoming weeks, I'll look into tinkering with the voltage settings for the CPU in the OPP table. With your advice in mind, I'm also suspecting it may be the combination of CPU power draw and my USB 3.0 HDD's power drain (an old 1 TB Seagate SRD00F1), as the latter is bus-powered rather than using an external power supply. I've acquired a digital USB test load and a USB 3.0 voltage/current meter is on the way (estimated arrival August 15th). With any luck, I hope to reliably recreate the issue by running a CPU-intensive task combined with a fixed current load to mimic the HDD drawing from the NBG6817's USB 3.0 port, instead of the convoluted Deja Dup SFTP backup test that I've been doing. I'll also look at measuring the DC side of the power supply while it's under load. It might even be a combination of the bus-powered USB 3.0 HDD, chip degradation, and/or failing power supply. I've had the NBG6817 as my primary home router since November 2019 (bought new). Again, thank you for your patience and time spent on this tricky issue! I'll follow up if I uncover anything of interest, and I'll keep updating folks in the OFTC/#openwrt-devel IRC channel too. ~ Shane ___ openwrt-devel mailing list openwrt-devel@lists.openwrt.org https://lists.openwrt.org/mailman/listinfo/openwrt-devel
Re: [PATCH 21.02] ipq806x: backport cpufreq changes to 5.4
On 7/1/21 4:55 PM, Shane Synan wrote: > On 6/28/21 2:10 AM, Shane Synan wrote: > [...] > > Given these commits work just fine on the "master" branch, and on > "21.02" it worked to change the CPU governor *without* the DTSI > backports and cache (i.e. only the first commit), I strongly suspect I > goofed up and missed backporting something important. After further attempts, I still encounter the issue on OpenWRT "master" with Linux kernel 5.4 as well, I just didn't try enough iterations of SFTP upload testing. Disabling 1.4 GHz L2 cache by removing that "opp_table_l2" entry from the .dtsi (git revert'ng "ipq806x: fix missing 1.4ghz cache freq for ipq8065 SoC") *so far* seems more stable. This applies to OpenWRT 21.02 as well if I try backporting the changes: https://github.com/openwrt/openwrt/compare/openwrt-21.02...digitalcircuit:openwrt-21.02-cpufreq-dtsivolt-cache ...versus... https://github.com/openwrt/openwrt/compare/openwrt-21.02...digitalcircuit:openwrt-21.02-cpufreq-dtsivolt-cache-no-1.4ghz (stress-ng currently fails to build from source on the "master" branch, hence trying to narrow it down on 21.02 as well) Unfortunately, I still haven't managed to recreate this issue in a simple manner. I'm continuing to experiment with stress-ng parameters in various loops, scripting GNOME's GIO backend via Python for repeated SFTP uploads, etc, but haven't succeeded yet. I'll continue to explore my options. Would you have any suggestions for how to diagnose this issue at a lower level? I've not gotten any cpufreq-related error messages from the serial console. I've noticed that the cpufreq driver mentions transitions to/from 384 MHz as well, despite having the /etc/init.d/cpufreq changes applied raising scaling_min_freq to 600 MHz - perhaps I'm somehow bypassing the check that tries to avoid the hardware constraint? 
From :To
:3840006080 100 140 1725000
384000: 014 3 611 9
60:15 0 49834 23113 7106477
80: 7 64708 0 1430134 28653
100:10 32739 13774 037 24203
140: 6 82162 017
1725000: 5 81977 44070 3328125 0
(Source: /sys/devices/system/cpu/cpufreq/policy[0..1]/stats/trans_table)

Alternatively, is it possible to adjust the maximum L2 cache frequency
at runtime, similar to setting the CPU governor min/max?

To be clear, I appreciate your efforts to keep ipq806x alive, especially
after Qualcomm had abandoned the platform (I hadn't realized this until
someone on #openwrt-devel pointed it out). I don't intend to seem
ungrateful.

If you'd prefer to consider this bizarrely hard-to-simplify test case
unsolvable, that's okay! I can continue to run custom builds with
1.4 GHz L2 cache frequency disabled, and the majority of others without
my bizarrely-nondeterministic issues can enjoy the higher speed.

Regardless, thank you for your time spent looking into this! I hope my
attempts to diagnose this have not caused you any trouble.
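To put a number on how often a policy entered 384 MHz despite the 600 MHz floor, the column for that frequency in trans_table can be summed. The sketch below assumes the usual layout of the stats file: a "From : To" banner line, a header row of frequencies in kHz, then one row per source frequency. `transitions_into` is a hypothetical helper name, not part of any existing tool.

```sh
#!/bin/sh
# Sketch only: count transitions INTO a given frequency from a trans_table.
transitions_into() {
    # $1 = target frequency in kHz (e.g. 384000), $2 = path to trans_table
    awk -v target="$1" '
        NR == 1 { next }                 # "From : To" banner
        NR == 2 {                        # header row: ":  f1  f2 ..."
            for (i = 2; i <= NF; i++)
                if ($i == target) col = i
            next
        }
        col { total += $col }            # data row: "from:  n1  n2 ..."
        END {
            if (!col) {
                print "frequency not found in header" > "/dev/stderr"
                exit 1
            }
            print total + 0
        }
    ' "$2"
}
```

For example, `transitions_into 384000 /sys/devices/system/cpu/cpufreq/policy0/stats/trans_table` run once before and once after a test would show whether new transitions into 384 MHz occurred while scaling_min_freq was set to 600 MHz, or whether the counts only date from early boot.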
Re: [PATCH 21.02] ipq806x: backport cpufreq changes to 5.4
On 6/28/21 2:10 AM, Shane Synan wrote: > On 6/27/21 6:10 PM, Ansuel Smith wrote: >> It's good but consider that also the dts changes are required and also there >> is a pending pr that has to be merged or ipq8065 device will run at lower >> speed. >> The new cpufreq have changed the node definition >> This is the pr I'm talking about. >> https://github.com/openwrt/openwrt/pull/4192 >> >> So in short this backport lacks the dts changes and the pr4192 is mandatory >> or >> we will introduce perf regression in 21 >> > > Thank you for pointing out the details of the DTS changes and CPU > performance. Originally I was trying to achieve the smallest > backport, but in hindsight, it makes far more sense to avoid > frankensteining with partial changes. I appreciate your patience and > advice! > > Meanwhile, until PR #4192 can be reviewed/merged to master (so there's > commits to actually backport), I've thrown together a rough draft of > what changes would need backported: Now that https://github.com/openwrt/openwrt/pull/4192 is merged, including four successful tests on Linux kernel v5.4, I've tried backporting all the CPU frequency/DTS/etc changes that appeared relevant, as shown here: https://github.com/openwrt/openwrt/compare/openwrt-21.02...digitalcircuit:openwrt-21.02-cpufreq-dtsivolt-cache (some commits were modified to drop kernel v5.10 patches) Unfortunately, my initial attempt is unstable again. I've gotten 3 reboots so far within a few hours of SFTP testing. Unlike before, nothing displays on the router's soldered-on serial console headers. Given these commits work just fine on the "master" branch, and on "21.02" it worked to change the CPU governor *without* the DTSI backports and cache (i.e. only the first commit), I strongly suspect I goofed up and missed backporting something important. Ansuel, do you have any suggestions on what else I should look for or try tinkering with? 
Meanwhile, just to see what happens in an attempt to narrow things down, I'm testing a build without the DTSI backports (all commits linked above except for "ipq806x: refresh dtsi patches" and "ipq806x: apply correct voltage tolerance"). So far, it boots, though I haven't run a full test. Regardless, it's not a proper solution, just a troubleshooting attempt.
Re: [PATCH 21.02] ipq806x: backport cpufreq changes to 5.4
On 6/27/21 6:10 PM, Ansuel Smith wrote:
> On Sun, 27 Jun 2021 at 23:41, Baptiste Jonglez wrote:
>>
>> Ansuel, do you have any feedback on this backport? It seems to help things.
>>
> It's good but consider that also the dts changes are required and also there
> is a pending pr that has to be merged or ipq8065 device will run at lower
> speed.
> The new cpufreq have changed the node definition
> This is the pr I'm talking about.
> https://github.com/openwrt/openwrt/pull/4192
>
> So in short this backport lacks the dts changes and the pr4192 is mandatory or
> we will introduce perf regression in 21
>

Thank you for pointing out the details of the DTS changes and CPU
performance. Originally I was trying to achieve the smallest backport,
but in hindsight, it makes far more sense to avoid frankensteining with
partial changes. I appreciate your patience and advice!

Meanwhile, until PR #4192 can be reviewed/merged to master (so there are
commits to actually backport), I've thrown together a rough draft of
what changes would need to be backported:
https://github.com/openwrt/openwrt/compare/openwrt-21.02...digitalcircuit:openwrt-21.02-cpufreq-dtsivolt-cache

Cherry-pick of existing DTS changes and commits in PR #4192:
https://github.com/openwrt/openwrt/pull/4192
[NOTE: I might be missing commits!]

This isn't a real backport; it's just so I can try out these changes
right now. So far, it builds and runs, and I've started the first
seven-hour SFTP backup test as described in FS#3099.

If there's anything I'm overlooking, or anything else I could try that
would be helpful, let me know!

No disrespect meant, either. I realize the scope of these changes means
this might need to wait for 21.02.1, and meanwhile I'm happy to continue
making and testing custom builds.
Re: [PATCH 21.02] ipq806x: backport cpufreq changes to 5.4
On Sun, 27 Jun 2021 at 23:41, Baptiste Jonglez wrote:
>
> Hi,
>
> On 20-06-21, Shane Synan wrote:
> > In the time since submitting this, I've continued testing this
> > change on my ZyXEL NBG6817. I'm reasonably confident this fixes my
> > issue (11/11 successes), and if there's any further testing that
> > would help, let me know!
>
> Thanks for the patch and testing, I'd like to merge it in the next few
> days (before 21.02.0 final). I'll update the commit message.
>
> Ansuel, do you have any feedback on this backport? It seems to help things.
>
> Thanks,
> Baptiste

It's good but consider that also the dts changes are required and also there
is a pending pr that has to be merged or ipq8065 device will run at lower
speed.
The new cpufreq have changed the node definition
This is the pr I'm talking about.
https://github.com/openwrt/openwrt/pull/4192

So in short this backport lacks the dts changes and the pr4192 is mandatory or
we will introduce perf regression in 21
Re: [PATCH 21.02] ipq806x: backport cpufreq changes to 5.4
Hi,

On 20-06-21, Shane Synan wrote:
> In the time since submitting this, I've continued testing this
> change on my ZyXEL NBG6817. I'm reasonably confident this fixes my
> issue (11/11 successes), and if there's any further testing that
> would help, let me know!

Thanks for the patch and testing, I'd like to merge it in the next few
days (before 21.02.0 final). I'll update the commit message.

Ansuel, do you have any feedback on this backport? It seems to help things.

Thanks,
Baptiste
Re: [PATCH 21.02] ipq806x: backport cpufreq changes to 5.4
> -Original Message- > Sent: 6/17/21 1:33 AM > To: openwrt-devel@lists.openwrt.org > Cc: Ansuel Smith > Subject: [PATCH 21.02] ipq806x: backport cpufreq changes to 5.4 > In the time since submitting this, I've continued testing this change on my ZyXEL NBG6817. I'm reasonably confident this fixes my issue (11/11 successes), and if there's any further testing that would help, let me know! I have tried to recreate this issue with stress-ng. Unfortunately, my efforts so far have only discovered new and seemingly-unrelated ways to cause kernel panics. Though this patch restores pre-21.02 stability for me, it's clearly not the only issue on ipq806x. Others on the OFTC/#openwrt-devel IRC channel have kindly informed me it would have helped to include a version of my test results and rationale in the commit message itself - feel free to amend when merging, or if it would be easier, I'm happy to submit a PATCHv2 instead! In the interim, I'll keep verifying this. Given the frequency of crashes without this patch, I won't be able to return to official 21.02 builds meanwhile. I've also CC'd Baptiste due to this May mailing post: https://lists.openwrt.org/pipermail/openwrt-devel/2021-May/035153.html Pardon if this was out of place. Test case: OpenSSH configured as secondary SSH server, locked to SFTP and chroot'd to an external USB HDD. Deja Dup (duplicity frontend) runs on my Linux computers, backing up roughly 223 GB compressed and encrypted in about 7 hours. This happens in 25 MB chunks, so the router sees bursts of CPU/memory/IO activity with pauses in between. See "Environment" below as well. * WITHOUT this patch, running on 21.02rc1/rc2: 5/30 SFTP transfers succeed, with at most 2 succeeding in a row. 21.02.0rc1: 4/25 backups succeeded 21.02.0rc2: 1/5 backups succeeded Noted in the bug report here: https://bugs.openwrt.org/index.php?do=details&task_id=3099#comment9712 I've been investigating this for longer than a month. 
If desired, I can retest with the same "openwrt/openwrt" commit, minus the CPU patch. * WITH this patch, built atop 21.02 branch commit 072d0afb8fa6359541568081c23fe2d8d411651c: 11/11 SFTP transfers succeed all consecutively, as noted in the bug report: https://bugs.openwrt.org/index.php?do=details&task_id=3099#comment9820 Note: I did not need to apply the DTSI patches for this success. However, they did not cause any issue, either - my previous time spent running a May 18th snapshot build went just fine. Environment: ZyXEL NBG6817 as router and access point connected to ISP modem: * 3 local networks (primary, guest, openwireless.org) with 3x(2.4+5 GHz) corresponding WiFi SSIDs * Additional packages: nano ncdu luci-ssl openssh-sftp-server mosh-server luci-app-advanced-reboot luci-app-ddns luci-app-sqm luci-app-openvpn openvpn-openssl luci-app-nlbwmon stubby hostapd-utils wget-ssl openssh-server shadow-useradd iperf3 * Additional packages for USB 3 external HDD: block-mount e2fsprogs kmod-fs-ext4 kmod-usb-storage kmod-usb2 kmod-usb3 kmod-usb-storage-uas * Dropbear remains the primary SSH daemon. OpenSSH is configured to run on a different port, locked down to SFTP access only via chroot to the USB HDD. * ASUS RT-AC68U running FreshTomato is connected via Ethernet as a secondary 3x(2.4+5 GHz) AP, using VLAN tagging * Other seemingly small configuration tweaks which I can share if it would help I recognize that OpenWRT 21.02.0 stable is nearly upon us. If there's anything else I can do to help verify this change, let me know. Your time is appreciated. Thanks, Shane ___ openwrt-devel mailing list openwrt-devel@lists.openwrt.org https://lists.openwrt.org/mailman/listinfo/openwrt-devel
[PATCH 21.02] ipq806x: backport cpufreq changes to 5.4
Date: Thu, 8 Apr 2021 16:20:36 +0200 From: Ansuel Smith The new cpufreq driver requires different dts bindings. Backport the new driver to kernel 5.4 Signed-off-by: Ansuel Smith Tested-by: Shane Synan (cherry picked from commit 6e411b8416388a9c8be1b2291be9b5adeeb07784) --- I've tested these changes on the ZyXEL NBG6817 and they've resolved a 5.4 kernel crash for me, triggered by SFTP uploads via OpenSSH server to an attached USB 3.0 drive, as documented in FS#3099. https://bugs.openwrt.org/index.php?do=details&task_id=3099#comment9820 However, I'm not certain this resolves the crashes for others, hence not mentioning a "Fixes" for the issue in the commit directly. This is my first attempt at my first submission to the openwrt-devel mailing list, let me know if anything's amiss! I'm unsure if attribution is properly credited to the original patch author. target/linux/ipq806x/config-5.4 | 1 + ...rt-adjusting-OPP-voltages-at-runtime.patch | 153 ...per-to-get-an-opp-regulator-for-devi.patch | 52 -- ...e-voltage-tolerance-when-adjusting-t.patch | 47 -- ...-dt-Handle-OPP-voltage-adjust-events.patch | 118 --- ...-dt-Add-L2-frequency-scaling-support.patch | 199 - ...056-cpufreq-dt-Add-missing-rcu-locks.patch | 23 - ...qcom-cpufreq-nvmem-support-specific-.patch | 51 ++ ...q-add-Krait-dedicated-scaling-driver.patch | 679 ++ ...ufreq-add-qcom-krait-cpufreq-binding.patch | 237 ++ ...dd-fab-scaling-support-with-cpufreq.patch} | 34 +- 11 files changed, 985 insertions(+), 609 deletions(-) delete mode 100644 target/linux/ipq806x/patches-5.4/0049-PM-OPP-Support-adjusting-OPP-voltages-at-runtime.patch delete mode 100644 target/linux/ipq806x/patches-5.4/0051-PM-OPP-Add-a-helper-to-get-an-opp-regulator-for-devi.patch delete mode 100644 target/linux/ipq806x/patches-5.4/0052-PM-OPP-Update-the-voltage-tolerance-when-adjusting-t.patch delete mode 100644 target/linux/ipq806x/patches-5.4/0054-cpufreq-dt-Handle-OPP-voltage-adjust-events.patch delete mode 100644 
target/linux/ipq806x/patches-5.4/0055-cpufreq-dt-Add-L2-frequency-scaling-support.patch delete mode 100644 target/linux/ipq806x/patches-5.4/0056-cpufreq-dt-Add-missing-rcu-locks.patch create mode 100644 target/linux/ipq806x/patches-5.4/093-drivers-cpufreq-qcom-cpufreq-nvmem-support-specific-.patch create mode 100644 target/linux/ipq806x/patches-5.4/098-1-cpufreq-add-Krait-dedicated-scaling-driver.patch create mode 100644 target/linux/ipq806x/patches-5.4/098-2-Documentation-cpufreq-add-qcom-krait-cpufreq-binding.patch rename target/linux/ipq806x/patches-5.4/{0057-add-fab-scaling-support-with-cpufreq.patch => 098-3-add-fab-scaling-support-with-cpufreq.patch} (93%) diff --git a/target/linux/ipq806x/config-5.4 b/target/linux/ipq806x/config-5.4 index 68ed6ec0c7..d410cd3829 100644 --- a/target/linux/ipq806x/config-5.4 +++ b/target/linux/ipq806x/config-5.4 @@ -62,6 +62,7 @@ CONFIG_ARM_MODULE_PLTS=y CONFIG_ARM_PATCH_IDIV=y CONFIG_ARM_PATCH_PHYS_VIRT=y # CONFIG_ARM_QCOM_CPUFREQ_HW is not set +CONFIG_ARM_QCOM_CPUFREQ_KRAIT=y CONFIG_ARM_QCOM_CPUFREQ_NVMEM=y CONFIG_ARM_QCOM_CPUIDLE=y # CONFIG_ARM_SMMU is not set diff --git a/target/linux/ipq806x/patches-5.4/0049-PM-OPP-Support-adjusting-OPP-voltages-at-runtime.patch b/target/linux/ipq806x/patches-5.4/0049-PM-OPP-Support-adjusting-OPP-voltages-at-runtime.patch deleted file mode 100644 index 9efbd583b4..00 --- a/target/linux/ipq806x/patches-5.4/0049-PM-OPP-Support-adjusting-OPP-voltages-at-runtime.patch +++ /dev/null @@ -1,153 +0,0 @@ -From: Sylwester Nawrocki -To: k...@kernel.org, vire...@kernel.org, robh...@kernel.org -Cc: sb...@kernel.org, roger...@mediatek.com, - linux...@vger.kernel.org, linux-arm-ker...@lists.infradead.org, - linux-samsung-...@vger.kernel.org, devicet...@vger.kernel.org, - b.zolnier...@samsung.com, m.szyprow...@samsung.com, - Stephen Boyd , - Sylwester Nawrocki -Subject: [PATCH v5 1/4] PM / OPP: Support adjusting OPP voltages at runtime -Date: Wed, 16 Oct 2019 16:57:53 +0200 -Message-ID: 
<20191016145756.16004-2-s.nawro...@samsung.com> (raw) -In-Reply-To: <20191016145756.16004-1-s.nawro...@samsung.com> - -From: Stephen Boyd - -On some SoCs the Adaptive Voltage Scaling (AVS) technique is -employed to optimize the operating voltage of a device. At a -given frequency, the hardware monitors dynamic factors and either -makes a suggestion for how much to adjust a voltage for the -current frequency, or it automatically adjusts the voltage -without software intervention. Add an API to the OPP library for -the former case, so that AVS type devices can update the voltages -for an OPP when the hardware determines the voltage should -change. The assumption is that drivers like CPUfreq or devfreq -will register for the OPP notifiers and adjust the voltage -according to suggestions that AVS makes. - -This patch is derived from [1] submitted by Stephen. -[1] https://lore.kernel.org/patchwor