Re: [PATCH] x86/AMD: Apply erratum 688 on machines without a BIOS fix
On Mon, Oct 31, 2016 at 11:54:44PM +0200, sonofa...@openmailbox.org wrote:
> No, will simply crash without running something special! If you get no
> issues then that is bad news. I get frequent and repeatable crashes. I
> forgot to mention that all those crashes occur at program launch. If the
> program launches, it does not crash. Unfortunately with Ubuntu it is not
> possible to keep oopses from the error report program.

Can't you enable crash dumps?

$ ulimit -c unlimited

That should reenable core dumping which can then be examined with gdb. You
could put the program executable and the core somewhere on the web so that I
can take a look...

In any case, I'd like to see what those crashes look like. Can you send dmesg,
does it even say something in dmesg related to those crashes?

> On the laptop we have a Debian installation, I will switch to it and
> get crash information there so that we figure out why it behaves that
> way. Besides that, I have some more tests to do but I am running out
> of ideas so I might not be able to help you more on it as my laptop
> appears to be really broken!

Maybe a hw issue? RAM broken, cooling failing...

> Poor performance might be the result here and not the cause of my
> issues. The 688 fix just makes the system respond better. As far as I
> am concerned, I do not wish any module options for turning on the fix.
> I would prefer to use DMI matching for this specific machine and thus
> having the fix automatically.

The problem with DMI strings is that then we have to always go and update
them. And that's always a PITA.

I'm just trying to avoid an unnecessary performance penalty to users with the
erratum workaround where the erratum itself didn't even occur in the first
place. Like in my case, for example. I've never had any issues with that
machine for the time I've been using it.

> All subsystems of the laptop appear to be good (RAM has been tested and
> the HDD has passed our test). There is a sound issue on the HDA but
> such issues are common on most laptops and will be dealt with soon.

If by "tested" you mean, you ran memtest on it, memtest is notorious for not
always catching faulty DIMMs.

> > Is it a desktop system or a laptop?
> I got no reply on this question so I suppose you have a desktop.

Oh sorry, I must've missed that question. No, the Ontario I have is a laptop,
something like this one:

https://support.lenovo.com/de/en/documents/pd015763

x121e with an AMD CPU, I *think* it is E-350.

--
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
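Whether "ulimit -c unlimited" actually leaves a core file behind also depends on the kernel's core_pattern setting: a pattern that starts with "|" hands the dump to a helper program instead of writing a file. The sketch below only classifies such a string. The function name core_pattern_kind is made up for illustration, and the commented usage lines assume a Linux system with /proc mounted.

```shell
#!/bin/sh
# Classify a core_pattern string (hypothetical helper, not from the thread).
#   pipe  - the kernel pipes the dump to a helper program
#   empty - no pattern is set
#   file  - the dump is written to a file named after the pattern
core_pattern_kind() {
    case "$1" in
        "|"*) echo "pipe" ;;
        "")   echo "empty" ;;
        *)    echo "file" ;;
    esac
}

# Possible usage on the affected machine (not run here):
#   ulimit -c unlimited
#   core_pattern_kind "$(cat /proc/sys/kernel/core_pattern)"
```

If the result is "pipe", the core is most likely being consumed by a crash reporter, which would be consistent with the report above that the dumps do not stay around on Ubuntu.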
Re: [PATCH] x86/AMD: Apply erratum 688 on machines without a BIOS fix
> Ok, Ubuntu 16.04.1 is running on the box now, no issues so far. Any
> special workload I should run?

No, will simply crash without running something special! If you get no issues
then that is bad news. I get frequent and repeatable crashes. I forgot to
mention that all those crashes occur at program launch. If the program
launches, it does not crash. Unfortunately with Ubuntu it is not possible to
keep oopses from the error report program.

On the laptop we have a Debian installation, I will switch to it and get crash
information there so that we figure out why it behaves that way. Besides that,
I have some more tests to do but I am running out of ideas so I might not be
able to help you more on it as my laptop appears to be really broken!

Poor performance might be the result here and not the cause of my issues. The
688 fix just makes the system respond better. As far as I am concerned, I do
not wish any module options for turning on the fix. I would prefer to use DMI
matching for this specific machine and thus having the fix automatically.

All subsystems of the laptop appear to be good (RAM has been tested and the
HDD has passed our test). There is a sound issue on the HDA but such issues
are common on most laptops and will be dealt with soon.

> Is it a desktop system or a laptop?

I got no reply on this question so I suppose you have a desktop. Since my APU
is installed on a laptop, I expect different behaviour. There were many
Intel-based laptops that had fewer lanes on the DMI interconnect bridging
northbridge and southbridge. Maybe my laptop has the A-Link in reduced mode.
That could explain my performance issues. It must be easy to verify that as
all documents are available for its northbridge and southbridge. I will check
the settings of both chips thoroughly.

My brother has spotted a C70 board. Normally I would not buy it but it has a
better but slower (without CPB) F14 CPU and I am curious if it will behave
better like your board does. I might order it if it is still available.
Re: [PATCH] x86/AMD: Apply erratum 688 on machines without a BIOS fix
On Mon, Oct 24, 2016 at 07:14:50PM +0200, Borislav Petkov wrote:
> > Yes, using Ubuntu 16.04 will just crash everything! For example I had
> > crashes with the software updater program. Moreover firefox would become
> > unresponsive even with one tab.
>
> Ok, lemme install 16.04 on that box and see if I can reproduce.

Ok, Ubuntu 16.04.1 is running on the box now, no issues so far. Any special
workload I should run?

--
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
Re: [PATCH] x86/AMD: Apply erratum 688 on machines without a BIOS fix
> Why not? It all depends on the load type, working set and the access
> patterns. There's no strong correlation between the load of a machine and
> the amount of branch misses...

Yes, I did not say that there is a linear correlation, but that does not mean
that those two numbers move opposite to each other. On all our systems,
running more tasks that consume more CPU and memory results in increased
branch misses. It is normal as one thread might block another and a third
thread might wait for the first thread to finish in order to resume. It is not
normal to have increased misses only when the OS is loaded and running in idle
without doing anything. Unless you are talking about AMD F14.

I wonder if we should just flush the L2 and disable it completely on AMD F14.
Since this is an APU I have no idea if the onboard graphics can operate
properly without L2.

> setpci -s 0x18.4 0x164.l
>
> and looking at bit 2. If it is set, the erratum is fixed.

Will do, but there is no point, as I already told you in the first mail that
D18F4x164 is 0003h. It will not change.

> No, I don't mean that - I'm talking about *not* applying it by default and
> when people start seeing issues like that, they can boot their machines
> with something like "enable_e688_workaround" or so and it will get applied
> then. I.e., an "opt-in" deal.

Yes, I got it. I have no problem, you are free to do what you think is the
best solution. Just ensure that it will not be possible to apply the fix to
F16. Even if you decide to not include the fix at all in the kernel, I still
have the patch for my system and it works.

Did you get any crashes on your B0 box with Ubuntu? Is it a desktop system or
a laptop?

The irony is that this laptop was bought without USB3 on purpose to achieve
maximum stability... Luckily we didn't stick to the original plan to buy two
laptops :)
Re: [PATCH] x86/AMD: Apply erratum 688 on machines without a BIOS fix
On Mon, Oct 24, 2016 at 11:39:47PM +0300, sonofa...@openmailbox.org wrote:
> It does to me! That cpu family is "broken" both on B0 and C0. I think
> that a CPU at 30% load should not have >31% branch misses. For example
> with 5% CPU usage you can't expect to get 10% branch-misses...

Why not? It all depends on the load type, working set and the access patterns.
There's no strong correlation between the load of a machine and the amount of
branch misses...

> Yes but on C0 I got better results. Maybe the BIOS vendor got similar
> results and did not apply the fix.

Well, there's a C0 stepping which doesn't need the fix because it was fixed in
the silicon. You can check that by doing:

setpci -s 0x18.4 0x164.l

and looking at bit 2. If it is set, the erratum is fixed.

> They use the same BIOS for all machines B0, C0 and that could be the
> reason for not applying the 688 workaround. I think we are going to
> the wrong place here but I will not try to influence you at all. I
> only apply the fix once per boot and I think that we are not supposed
> to apply, remove and then reapply workarounds on the fly.

No, I don't mean that - I'm talking about *not* applying it by default and
when people start seeing issues like that, they can boot their machines with
something like "enable_e688_workaround" or so and it will get applied then.
I.e., an "opt-in" deal.

--
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
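The "look at bit 2" step can be done with shell arithmetic instead of by eye. This is a minimal sketch: check_bit is a hypothetical helper name, and it only does the bit test on a value that has already been read. The commented usage line assumes setpci prints the register as bare hex digits and that the command is run as root.

```shell
#!/bin/sh
# Print 1 if bit number $2 is set in the integer $1, else 0.
# $1 may be written in hex with a 0x prefix.
check_bit() {
    echo $(( ($1 >> $2) & 1 ))
}

# Possible usage with the command from the mail (not run here):
#   check_bit "0x$(setpci -s 0x18.4 0x164.l)" 2
```

For the value reported elsewhere in this thread, D18F4x164 = 0003h, check_bit 0x0003 2 prints 0: only bits 0 and 1 are set, so by the rule above the erratum is not marked as fixed in silicon on that machine.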
Re: [PATCH] x86/AMD: Apply erratum 688 on machines without a BIOS fix
> so that doesn't tell me a whole lot.

It does to me! That cpu family is "broken" both on B0 and C0. I think that a
CPU at 30% load should not have >31% branch misses. For example with 5% CPU
usage you can't expect to get 10% branch-misses...

> Well, Ontario is a small core and with the erratum workaround in place, it
> does get a bit worse too, apparently.

Yes but on C0 I got better results. Maybe the BIOS vendor got similar results
and did not apply the fix. They use the same BIOS for all machines B0, C0 and
that could be the reason for not applying the 688 workaround. I think we are
going to the wrong place here but I will not try to influence you at all. I
only apply the fix once per boot and I think that we are not supposed to
apply, remove and then reapply workarounds on the fly. Be careful, you might
hang your machine, brick your board or destroy your APU!

The truth is that my system behaves better with the patch. The problem is that
there is no way to get what I need! That is the E-300 datasheet... They give
everything for the north and the south but we have poor documentation for the
APU itself... I will contact AMD to see if I can get the APU datasheet so that
we have a clue what those bits actually do.

> Hohumm, yeah, the workaround impacts the number of branch misses. It
> probably disables some branch predictor optimization or so, which is
> "problematic" in certain scenarios.

That is obvious. You can't say what it does, it might disable an internal
buffer or force a CPU subsystem to run at a lower frequency, who knows?

> I guess we still want it because first we should not explode and then go
> fast :)

Exactly. I agree with that as I want to eliminate the crashes. Keep in mind
that speed is something that all those APUs do not have and will never have,
stability is what we are trying to improve.

> I'm thinking currently that if it is not easily triggerable, I could make
> the erratum workaround off by default and have a command line option which
> people can enable in case they experience any of the issues...

No problem, it is up to you. As I said above, I will not try to change your
mind.
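If the workaround did become an opt-in command line option, checking whether a given boot had it enabled would come down to looking for a whole word on the kernel command line. The sketch below assumes the placeholder name "enable_e688_workaround" that is floated in this thread; nothing here shows that a kernel parameter with that name exists. has_boot_flag is likewise a hypothetical helper.

```shell
#!/bin/sh
# Print "yes" if the flag $2 appears as a whole word in the
# command-line string $1, otherwise print "no".
has_boot_flag() {
    case " $1 " in
        *" $2 "*) echo "yes" ;;
        *)        echo "no" ;;
    esac
}

# Possible usage on a running system (not run here):
#   has_boot_flag "$(cat /proc/cmdline)" enable_e688_workaround
```

Padding both sides with spaces keeps a longer option that merely contains the flag name from matching.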
Re: [PATCH] x86/AMD: Apply erratum 688 on machines without a BIOS fix
On Mon, Oct 24, 2016 at 04:13:25PM +0300, sonofa...@openmailbox.org wrote:
> No command needed, just type: sudo perf stat -a and immediately exit
> with ctrl+C. That will give you a glimpse. See "% of all branches"

$ ./perf stat -a --repeat 10 sleep 1s

 Performance counter stats for 'system wide' (10 runs):

       2013.974964      cpu-clock (msec)          #    1.999 CPUs utilized          ( +-  0.02% )
                88      context-switches          #    0.044 K/sec                  ( +-  2.05% )
                 2      cpu-migrations            #    0.001 K/sec                  ( +-  8.55% )
                75      page-faults               #    0.037 K/sec                  ( +-  0.42% )
        81,177,296      cycles                    #    0.040 GHz                    ( +-  0.76% )  (66.62%)
                 0      stalled-cycles-frontend                                                    (66.63%)
                 0      stalled-cycles-backend    #    0.00% backend cycles idle                   (66.64%)
         9,602,846      instructions              #    0.12  insn per cycle         ( +-  2.08% )  (66.65%)
         1,698,414      branches                  #    0.843 M/sec                  ( +-  4.26% )  (66.75%)
           327,945      branch-misses             #   19.31% of all branches        ( +-  1.76% )  (66.72%)

       1.007545371 seconds time elapsed                                             ( +-  0.02% )

Now disable erratum workaround:

$ wrmsr --all 0xc0011021 0x10008000
$ rdmsr --all 0xc0011021
10008000
10008000

$ ./perf stat -a --repeat 10 sleep 1s

 Performance counter stats for 'system wide' (10 runs):

       2012.521775      cpu-clock (msec)          #    1.999 CPUs utilized          ( +-  0.02% )
                91      context-switches          #    0.045 K/sec                  ( +-  2.62% )
                 3      cpu-migrations            #    0.001 K/sec                  ( +- 13.07% )
                75      page-faults               #    0.037 K/sec                  ( +-  0.66% )
        82,215,531      cycles                    #    0.041 GHz                    ( +-  1.08% )  (66.60%)
                 0      stalled-cycles-frontend                                                    (66.60%)
                 0      stalled-cycles-backend    #    0.00% backend cycles idle                   (66.62%)
         9,444,884      instructions              #    0.11  insn per cycle         ( +-  2.11% )  (66.70%)
         1,484,480      branches                  #    0.738 M/sec                  ( +-  5.16% )  (66.78%)
           303,382      branch-misses             #   20.44% of all branches        ( +-  1.44% )  (66.70%)

       1.006812225 seconds time elapsed                                             ( +-  0.02% )

so that doesn't tell me a whole lot.

> next open firefox, rerun the same command after firefox launches and
> immediately exit with ctrl+C
>
> On that piece of crap I get branch-misses above 10% from boot without
> executing anything and perf does not like it so it displays it with red
> colour. On my quad core kabini APU, in order to get 9% branch-misses I
> have to open 50 tabs on firefox. Something is terribly wrong here.

Well, Ontario is a small core and with the erratum workaround in place, it
does get a bit worse too, apparently. Let's see how many branch misses we get
when starting firefox:

* with workaround:

$ echo 3 > /proc/sys/vm/drop_caches && ./perf stat ./firefox.sh

 Performance counter stats for './firefox.sh':

        257.037242      task-clock (msec)         #    0.103 CPUs utilized
               332      context-switches          #    0.001 M/sec
                 6      cpu-migrations            #    0.023 K/sec
             1,022      page-faults               #    0.004 M/sec
       213,464,893      cycles                    #    0.830 GHz                    (63.29%)
                 0      stalled-cycles-frontend                                     (62.76%)
                 0      stalled-cycles-backend    #    0.00% backend cycles idle    (66.88%)
       106,763,405      instructions              #    0.50  insn per cycle         (73.54%)
        23,794,511      branches                  #   92.572 M/sec                  (73.32%)
         2,629,193      branch-misses             #   11.05% of all branches        (66.16%)

       2.501140816 seconds time elapsed

* without it:

$ echo 3 > /proc/sys/vm/drop_caches && ./perf stat ./firefox.sh

 Performance counter stats for './firefox.sh':

        196.561165      task-clock (msec)         #    0.082 CPUs utilized
               276      context-switches          #    0.001 M/sec
                 9      cpu-migrations            #    0.046 K/sec
               932      page-faults               #    0.005 M/sec
       162,697,731      cycles                    #    0.828 GHz                    (70.27%)
                 0      stalled-cycles-frontend
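The "% of all branches" figure that both sides keep quoting is simply branch-misses divided by branches. A small sketch that recomputes it from the raw counters, so runs can be compared without relying on perf's colouring; branch_miss_pct is a hypothetical helper name and the counters must be passed without thousands separators.

```shell
#!/bin/sh
# Print branch-misses as a percentage of branches, two decimals.
# $1 = branch-misses count, $2 = branches count.
branch_miss_pct() {
    awk -v misses="$1" -v branches="$2" \
        'BEGIN { if (branches > 0) printf "%.2f\n", 100 * misses / branches }'
}

# Counters taken from the perf runs in this mail:
branch_miss_pct 327945 1698414     # idle, with workaround
branch_miss_pct 303382 1484480     # idle, workaround disabled
branch_miss_pct 2629193 23794511   # firefox start, with workaround
```

These reproduce the 19.31%, 20.44% and 11.05% that perf printed. The two idle figures differ by about one percentage point, on branch counts that themselves vary by roughly 4-5% between runs.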
Re: [PATCH] x86/AMD: Apply erratum 688 on machines without a BIOS fix
> Sure, give me the exact command you're executing so that I can do it here.

No command needed, just type: sudo perf stat -a and immediately exit with
ctrl+C. That will give you a glimpse. See "% of all branches"

next open firefox, rerun the same command after firefox launches and
immediately exit with ctrl+C

On that piece of crap I get branch-misses above 10% from boot without
executing anything and perf does not like it so it displays it with red
colour. On my quad core kabini APU, in order to get 9% branch-misses I have to
open 50 tabs on firefox. Something is terribly wrong here.

> Out of pure interest: do you remember how exactly you did reproduce this
> issue?

Yes, using Ubuntu 16.04 will just crash everything! For example I had crashes
with the software updater program. Moreover firefox would become unresponsive
even with one tab. Luckily initial tests of 16.10 seem promising as it is
lighter and consumes 3~5% less RAM! Debian, which was lighter, was more
responsive and had no crashes except an oops from adobe flash. I believe that
the bug is triggered by the unusually high branch-misses specific to this
machine. After the fix, I got better OS and program responsiveness.
Re: [PATCH] x86/AMD: Apply erratum 688 on machines without a BIOS fix
On Mon, Oct 24, 2016 at 02:38:06PM +0300, sonofa...@openmailbox.org wrote:
> The patch is not equivalent to the original. As a result it behaves
> differently. To be specific, using dmesg I get the expected value from the
> affected MSR with the original patch. With the latest patch, patching of the
> MSR occurs after dmesg prints the MSR information. That is why I thought it
> did nothing.

Gah, that "show_msr" is crap - it gets issued too early and we can - and we
do - set MSRs later too. Oh and it prints only the BSP. I should probably rip
it out - there's msr-tools for that which is much better.

> rdmsr --all 0xc0011021 returns the expected results on all CPUs with both
> patches. I have the impression that the system boots slower because the fix
> is applied later compared to the original patch.

Could be - setting those bits 3 and 14 in that MSR is probably disabling some
hw features which may impact performance.

> Could you please use perf and tell me what values do you get at perf
> branch-misses right after boot on your ON-B0 box? Launching firefox with
> only one tab gives you similar numbers?

Sure, give me the exact command you're executing so that I can do it here.

> If you need anything more, feel free to ask.

Out of pure interest: do you remember how exactly you did reproduce this
issue?

Thanks.

--
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
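To see what "setting bits 3 and 14" does to a concrete register value, the arithmetic can be sketched in the shell. set_bits is a hypothetical helper that only computes the new value; it does not touch any MSR. The starting value 0x10008000 is the one written with wrmsr elsewhere in this thread to disable the workaround, and treating it as the "before" state is an assumption made for illustration.

```shell
#!/bin/sh
# Print $1 with the given bit numbers ORed in, as 0x-prefixed hex.
# Usage: set_bits VALUE BIT [BIT...]
set_bits() {
    value=$(( $1 ))
    shift
    for bit in "$@"; do
        value=$(( value | (1 << bit) ))
    done
    printf '0x%08x\n' "$value"
}

set_bits 0x10008000 3 14
```

Bit 3 contributes 0x8 and bit 14 contributes 0x4000, so the result differs from the input by 0x4008. Actually writing a value back would still be done with wrmsr from msr-tools, with the hang and brick risks mentioned in the thread.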
Re: [PATCH] x86/AMD: Apply erratum 688 on machines without a BIOS fix
> Hmm, so did you apply the patch correctly?

Yes

The patch is not equivalent to the original. As a result it behaves differently. To be specific, using dmesg I get the expected value from the affected MSR with the original patch. With the latest patch, patching of the MSR occurs after dmesg prints the MSR information. That is why I thought it did nothing.

rdmsr --all 0xc0011021 returns the expected results on all CPUs with both patches. I have the impression that the system boots slower because the fix is applied later compared to the original patch.

Since the code works there is no need to attach the compile config so I attach a dmesg and rdmsr --all 0xc0011021.

Could you please use perf and tell me what values do you get at perf branch-misses right after boot on your ON-B0 box? Launching firefox with only one tab gives you similar numbers?

If you need anything more, feel free to ask.

$ rdmsr --all 0xc0011021
1020c008
1020c008

$ dmesg
[0.00] Linux version 4.8.4-vnl-14h-688-amd64 (root@FXLSI) (gcc version 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.2) ) #1 SMP Sun Oct 23 16:19:41 EEST 2016
[0.00] Command line: BOOT_IMAGE=/boot/vmlinuz-4.8.4-vnl-14h-688-amd64 root=UUID=124d207f-6ec4-4270-a1a3-2878e0756f25 ro quiet show_msr=1 clocksource=hpet hpet=verbose acpi_sleep=s3_beep mce=bootlog pcie_aspm.policy=powersave debug=y splash vt.handoff=7
[0.00] KERNEL supported cpus:
[0.00] Intel GenuineIntel
[0.00] AMD AuthenticAMD
[0.00] Centaur CentaurHauls
[0.00] x86/fpu: Legacy x87 FPU detected.
[0.00] x86/fpu: Using 'eager' FPU context switches.
[0.00] e820: BIOS-provided physical RAM map:
[0.00] BIOS-e820: [mem 0x-0x0009f7ff] usable
[0.00] BIOS-e820: [mem 0x0009f800-0x0009] reserved
[0.00] BIOS-e820: [mem 0x000e-0x000f] reserved
[0.00] BIOS-e820: [mem 0x0010-0xdfb3efff] usable
[0.00] BIOS-e820: [mem 0xdfb3f000-0xdfbbefff] reserved
[0.00] BIOS-e820: [mem 0xdfbbf000-0xdfebefff] ACPI NVS
[0.00] BIOS-e820: [mem 0xdfebf000-0xdfef4fff] ACPI data
[0.00] BIOS-e820: [mem 0xdfef5000-0xdfef] usable
[0.00] BIOS-e820: [mem 0xdff0-0xdfff] reserved
[0.00] BIOS-e820: [mem 0xf800-0xfbff] reserved
[0.00] BIOS-e820: [mem 0xfec0-0xfec00fff] reserved
[0.00] BIOS-e820: [mem 0xfec1-0xfec10fff] reserved
[0.00] BIOS-e820: [mem 0xfee0-0xfee00fff] reserved
[0.00] BIOS-e820: [mem 0xffe0-0x] reserved
[0.00] BIOS-e820: [mem 0x0001-0x000206ff] usable
[0.00] BIOS-e820: [mem 0x00020700-0x00021eff] reserved
[0.00] NX (Execute Disable) protection: active
[0.00] SMBIOS 2.7 present.
[0.00] DMI: Hewlett-Packard Presario CQ57 Notebook PC/3577, BIOS F.47 12/17/2011
[0.00] e820: update [mem 0x-0x0fff] usable ==> reserved
[0.00] e820: remove [mem 0x000a-0x000f] usable
[0.00] e820: last_pfn = 0x207000 max_arch_pfn = 0x4
[0.00] MTRR default type: uncachable
[0.00] MTRR fixed ranges enabled:
[0.00] 0-9 write-back
[0.00] A-B uncachable
[0.00] C-F write-through
[0.00] MTRR variable ranges enabled:
[0.00] 0 base 0 mask F8000 write-back
[0.00] 1 base 08000 mask FC000 write-back
[0.00] 2 base 0C000 mask FE000 write-back
[0.00] 3 base 0DFEBD000 mask FF000 uncachable
[0.00] 4 base 0FFE0 mask FFFE0 write-protect
[0.00] 5 disabled
[0.00] 6 disabled
[0.00] 7 disabled
[0.00] TOM2: 00021f00 aka 8688M
[0.00] x86/PAT: Configuration [0-7]: WB WC UC- UC WB WC UC- WT
[0.00] e820: last_pfn = 0xdff00 max_arch_pfn = 0x4
[0.00] found SMP MP-table at [mem 0x000fe1b0-0x000fe1bf] mapped at [a0e7400fe1b0]
[0.00] Scanning 1 areas for low memory corruption
[0.00] Base memory trampoline at [a0e740099000] 99000 size 24576
[0.00] Using GB pages for direct mapping
[0.00] BRK [0x8222a000, 0x8222afff] PGTABLE
[0.00] BRK [0x8222b000, 0x8222bfff] PGTABLE
[0.00] BRK [0x8222c000, 0x8222cfff] PGTABLE
[0.00] BRK [0x8222d000, 0x8222dfff] PGTABLE
[0.00] BRK [0x8222e000, 0x8222efff] PGTABLE
[0.00] BRK [0x8222f000, 0x8222] PGTABLE
[0.00] RAMDISK: [mem 0x33a42000-0x35d18fff]
[0.00] ACPI: Early table checksum verification disabled
[0.00] ACPI: RSDP 0x000FE020 24 (v02 HPQOEM)
[
Re: [PATCH] x86/AMD: Apply erratum 688 on machines without a BIOS fix
On Mon, Oct 24, 2016 at 12:02:39AM +0300, sonofa...@openmailbox.org wrote:
> Good to hear but something is still wrong on my laptop as nothing worked as
> expected :(

Hmm, so did you apply the patch correctly?

Send me arch/x86/kernel/amd_nb.c after you've applied the patch. Then, boot the kernel with my patch applied, send me full dmesg, the .config used and do as root:

$ rdmsr --all 0xc0011021

and paste the output here please. For that you'd need the msr-tools package and you'd need to modprobe msr.ko if you haven't done so.

Thanks.

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
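The rdmsr check requested above can be turned into a quick pass/fail test. What follows is only a sketch: the helper name `e688_bits_set` is hypothetical, and it assumes the only thing of interest is whether bits 3 and 14 of the value are both set, which is what the patch in this thread writes.

```shell
#!/bin/sh
# Hypothetical helper: succeed if bits 3 and 14 are both set in a hex MSR value
# as printed by rdmsr (with or without a leading 0x).
e688_bits_set() {
    val=$(( 0x${1#0x} ))
    mask=$(( (1 << 3) | (1 << 14) ))
    [ $(( val & mask )) -eq "$mask" ]
}

# Value reported in this thread for both cores:
if e688_bits_set 1020c008; then
    echo "bits 3 and 14 set"
else
    echo "workaround bits missing"
fi

# On a live machine (as root, msr-tools installed, msr module loaded), one line per CPU:
#   rdmsr --all 0xc0011021 | while read -r v; do
#       e688_bits_set "$v" && echo ok || echo MISSING
#   done
```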
Re: [PATCH] x86/AMD: Apply erratum 688 on machines without a BIOS fix
> In any case, I tested it on my ON-B0 box and it looked good.

Good to hear but something is still wrong on my laptop as nothing worked as expected :(

Since I have a working custom kernel including the fix from my original patch it was clear from boot that the last patched kernel did not touch the MSR we want to modify at all. The machine was slower compared with my kernel using the original patch. As I use the show_msr option, a quick look at the dmesg proved that easily. Now that processors have many cores, I wonder if the kernel should report which CPU's MSRs are displayed in dmesg.

Take your time to see what is wrong, we already have one working kernel for our machine :)
Re: [PATCH] x86/AMD: Apply erratum 688 on machines without a BIOS fix
On Sun, Oct 23, 2016 at 08:06:44PM +0300, sonofa...@openmailbox.org wrote:
> I use the patchwork site and my brother uses an LKML mirror site. He gets
> patches from there. This worked with the first two patches but the last
> one was a big one and that site truncated some bytes from a line...Sorry for
> the trouble.

You can simply save the email text if your mail client doesn't mangle white space. Alternatively, there's

https://patchwork.kernel.org/project/LKML/list/

which people do use.

> Kernel is now ready and moved to USB stick. Testing is about to begin. If
> everything works as expected I shall send V2 late at night! Thanks!!

Good. But you don't need to send v2 - you just need to say whether my version fixes it for you or not. If not, then I need to stare at it more. :)

In any case, I tested it on my ON-B0 box and it looked good.

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
Re: [PATCH] x86/AMD: Apply erratum 688 on machines without a BIOS fix
> Are you sure you did it right?

Yes and no. I use the patchwork site and my brother uses an LKML mirror site. He gets patches from there. This worked with the first two patches but the last one was a big one and that site truncated some bytes from a line...Sorry for the trouble. For reasons I cannot explain I haven't used git till now even though I have downloaded it with its source files from the very first versions that got released years ago.

Kernel is now ready and moved to USB stick. Testing is about to begin. If everything works as expected I shall send V2 late at night! Thanks!!
Re: [PATCH] x86/AMD: Apply erratum 688 on machines without a BIOS fix
On Sun, Oct 23, 2016 at 12:39:37PM +0300, sonofa...@openmailbox.org wrote:
> Last night attempt failed as patch does not apply to 4.8. Neither 4.8.1 nor
> 4.8.4. Did you switch to 4.9? Please use 4.8 as we prefer to avoid rc
> kernels as we had casualties in the past. Do you want to add changes by
> hand?

Are you sure you did it right?

I saved the mail I sent you before in /tmp/e688.mail.

$ git checkout v4.8.1
Previous HEAD position was e58b634ca001... Merge branch 'tip-microcode-rc1+' into rc1+1
HEAD is now at a7fac751ddba... Linux 4.8.1
$ patch -p1 --dry-run -i /tmp/e688.mail
checking file arch/x86/kernel/amd_nb.c
$ git checkout v4.8.3
Previous HEAD position was a7fac751ddba... Linux 4.8.1
HEAD is now at 1888926ea8d2... Linux 4.8.3
$ patch -p1 --dry-run -i /tmp/e688.mail
checking file arch/x86/kernel/amd_nb.c
$ git checkout v4.8
Previous HEAD position was 1888926ea8d2... Linux 4.8.3
HEAD is now at c8d2bc9bc39e... Linux 4.8
$ patch -p1 --dry-run -i /tmp/e688.mail
checking file arch/x86/kernel/amd_nb.c

Now you only have to remove "--dry-run"

Paste here the error messages when trying to apply it.

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
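The dry-run-then-apply sequence above can be wrapped so the tree is only touched when the dry run is clean. A minimal sketch, assuming it is run from the top of the kernel source tree; the function name `apply_if_clean` is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: apply a patch only if a dry run succeeds first.
# Usage (from the top of the source tree): apply_if_clean /tmp/e688.mail
apply_if_clean() {
    if patch -p1 --dry-run -i "$1"; then
        # Dry run was clean, so apply for real.
        patch -p1 -i "$1"
    else
        echo "patch does not apply cleanly: $1" >&2
        return 1
    fi
}
```

If the dry run fails, the messages patch printed are exactly what is being asked for in the mail above, and the tree is left unmodified.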
Re: [PATCH] x86/AMD: Apply erratum 688 on machines without a BIOS fix
Last night attempt failed as patch does not apply to 4.8. Neither 4.8.1 nor 4.8.4. Did you switch to 4.9? Please use 4.8 as we prefer to avoid rc kernels as we had casualties in the past. Do you want to add changes by hand?
Re: [PATCH] x86/AMD: Apply erratum 688 on machines without a BIOS fix
On Sat, Oct 22, 2016 at 02:16:41PM +0300, sonofa...@openmailbox.org wrote:
> Patch does not compile.

Yeah, it needs more work. Try the version below. It needs to be done differently because we need PCI extended config space access to be enabled in order to check bit 2.

> To be honest I can't say. My brother's machine(s) has random crashes from
> time to time. We suspect that this erratum is to blame. He must have kept
> some information at least from one of those crashes but there was no time to
> analyze them till now. Finding those logs on our disks needs a big effort
> but it will be done! We are willing to discover the trouble maker no matter
> what it takes. To do that the machine must be stripped off from all cards
> and then put on quarantine. Then we can connect it with another machine
> with an RS-232 cable to see what is wrong. After that we must test it on a
> different motherboard we have. I think we have one with a BIOS from a
> different BIOS vendor. We will surely inform you on this one as we can't do
> such a patch. So we will focus on triggering this bug. One thing is sure,
> its BIOS has no workaround for that erratum.

Ok, good. Let me know how it goes.

Thanks.

---
From ddce976ba7fc44922a6c4e9e58bbdf65c65c4ae4 Mon Sep 17 00:00:00 2001
From: Borislav Petkov
Date: Sat, 22 Oct 2016 15:23:54 +0200
Subject: [PATCH] E688, v1

Signed-off-by: Borislav Petkov
---
diff --git a/arch/x86/kernel/amd_nb.c b/arch/x86/kernel/amd_nb.c
index 4fdf6230d93c..bfde06b1a587 100644
--- a/arch/x86/kernel/amd_nb.c
+++ b/arch/x86/kernel/amd_nb.c
@@ -15,6 +15,8 @@
 static u32 *flush_words;

+#define PCI_DEVICE_ID_AMD_CNB17H_F4 0x1704
+
 const struct pci_device_id amd_nb_misc_ids[] = {
 	{ PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_K8_NB_MISC) },
 	{ PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_10H_NB_MISC) },
@@ -24,6 +26,7 @@ const struct pci_device_id amd_nb_misc_ids[] = {
 	{ PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_15H_M60H_NB_F3) },
 	{ PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_16H_NB_F3) },
 	{ PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_16H_M30H_NB_F3) },
+	{ PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_CNB17H_F3) },
 	{}
 };
 EXPORT_SYMBOL(amd_nb_misc_ids);
@@ -34,6 +37,7 @@ static const struct pci_device_id amd_nb_link_ids[] = {
 	{ PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_15H_M60H_NB_F4) },
 	{ PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_16H_NB_F4) },
 	{ PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_16H_M30H_NB_F4) },
+	{ PCI_DEVICE(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_CNB17H_F4) },
 	{}
 };

@@ -274,11 +278,46 @@ void amd_flush_garts(void)
 }
 EXPORT_SYMBOL_GPL(amd_flush_garts);

+static void __fix_erratum_688(void *info)
+{
+#define MSR_AMD64_IC_CFG 0xC0011021
+
+	msr_set_bit(MSR_AMD64_IC_CFG, 3);
+	msr_set_bit(MSR_AMD64_IC_CFG, 14);
+}
+
+/* Apply erratum 688 fix so machines without a BIOS fix work. */
+static __init void fix_erratum_688(void)
+{
+	struct pci_dev *F4;
+	u32 val;
+
+	if (boot_cpu_data.x86 != 0x14)
+		return;
+
+	if (!amd_northbridges.num)
+		return;
+
+	F4 = node_to_amd_nb(0)->link;
+	if (!F4)
+		return;
+
+	if (pci_read_config_dword(F4, 0x164, &val))
+		return;
+
+	if (val & BIT(2))
+		return;
+
+	on_each_cpu(__fix_erratum_688, NULL, 0);
+}
+
 static __init int init_amd_nbs(void)
 {
 	amd_cache_northbridges();
 	amd_cache_gart();

+	fix_erratum_688();
+
 	return 0;
 }

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
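To make the control flow of the patch easier to follow, here is the same decision written as plain shell arithmetic. This is only an illustration on numbers and touches no hardware. The helper names `needs_e688_fix` and `e688_fixed_value` are hypothetical, and the starting MSR value 0x10208000 is made up for the example; only the family check, the bit 2 test on D18F4x164 and the two MSR bits come from the patch itself.

```shell
#!/bin/sh
# Hypothetical helpers mirroring the checks in the patch, on plain numbers.

# Succeed if the workaround would be applied.
#   $1 = CPU family (e.g. 0x14), $2 = value read from D18F4x164
needs_e688_fix() {
    family=$(( $1 ))
    reg=$(( $2 ))
    [ "$family" -eq $(( 0x14 )) ] || return 1   # patch only acts on family 0x14
    [ $(( reg & (1 << 2) )) -eq 0 ]             # patch returns early if bit 2 is set
}

# Print an MSR value with bits 3 and 14 set, as 8 hex digits.
e688_fixed_value() {
    printf '%08x\n' $(( $1 | (1 << 3) | (1 << 14) ))
}

# Illustrative input only: family 0x14, bit 2 clear, made-up starting MSR value.
if needs_e688_fix 0x14 0x0; then
    e688_fixed_value 0x10208000
fi
```

With these made-up inputs the result is 1020c008, which matches the value rdmsr reports on the patched laptop elsewhere in this thread.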
Re: [PATCH] x86/AMD: Apply erratum 688 on machines without a BIOS fix
Patch does not compile. I tried to add pci.h but it did nothing. I converted pci_read_config to pci_read_config_dword but again nothing.

On 2016-10-22 02:01, Borislav Petkov wrote:
> Do you have a way to trigger that one?

To be honest I can't say. My brother's machine(s) has random crashes from time to time. We suspect that this erratum is to blame. He must have kept some information at least from one of those crashes but there was no time to analyze them till now. Finding those logs on our disks needs a big effort but it will be done! We are willing to discover the trouble maker no matter what it takes. To do that the machine must be stripped off from all cards and then put on quarantine. Then we can connect it with another machine with an RS-232 cable to see what is wrong. After that we must test it on a different motherboard we have. I think we have one with a BIOS from a different BIOS vendor. We will surely inform you on this one as we can't do such a patch. So we will focus on triggering this bug. One thing is sure, its BIOS has no workaround for that erratum.
Re: [PATCH] x86/AMD: Apply erratum 688 on machines without a BIOS fix
On Sat, Oct 22, 2016 at 12:51:32AM +0300, sonofa...@openmailbox.org wrote:
> Thank you for your time! I have chosen reply to list and all recipients, it
> must work now.

Yes, exactly what I had in mind.

> My brother rejected the proposed patch because it does not provide
> equivalent functionality with the original.
>
> Our initial patch would fix 3 broken models and 1 working model. Your patch
> will only work for 1 model. Only machines having our APU will be fixed. All
> B0 APUs will be unpatched. This is not right. Check the revision guide to
> verify that.

Right you are: I read too much into the description of bit 2 of D18F4x164. Of course we want to apply that fix to ON-Bs too.

> To avoid unneeded complexity we propose this patch as V2, do you agree?
>
> +#define MSR_AMD64_IC_CFG 0xC0011021
> +
> +static void init_amd_on(struct cpuinfo_x86 *c)
> +{
> +	/*
> +	 * Apply erratum 688 fix so machines without a BIOS
> +	 * fix work.
> +	 */
> +
> +	u32 val = pci_read_config(0, 0x18, 0x4, 0x164);
> +
> +	if (!(val & BIT(2))) {
> +		msr_set_bit(MSR_AMD64_IC_CFG, 3);
> +		msr_set_bit(MSR_AMD64_IC_CFG, 14);

Yes, that should work fine. Btw, there's missing a closing } for the if-test here.

> +}
>  static void init_amd_bd(struct cpuinfo_x86 *c)
>  {
>  	u64 value;
> @@ -738,6 +750,7 @@ static void init_amd(struct cpuinfo_x86
>  	case 0xf: init_amd_k8(c); break;
>  	case 0x10: init_amd_gh(c); break;
>  	case 0x12: init_amd_ln(c); break;
> +	case 0x14: init_amd_on(c); break;
>  	case 0x15: init_amd_bd(c); break;
>  	}
>
> Please advise to proceed!

Right, please send a tested version of the above with the explanation text from your initial submission.

Thanks.

> erratum 721 :-(

Hmm, interesting. Do you have a way to trigger that one?

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
Re: [PATCH] x86/AMD: Apply erratum 688 on machines without a BIOS fix
Thank you for your time! I have chosen reply to list and all recipients, it must work now.

My brother rejected the proposed patch because it does not provide equivalent functionality with the original.

Our initial patch would fix 3 broken models and 1 working model. Your patch will only work for 1 model. Only machines having our APU will be fixed. All B0 APUs will be unpatched. This is not right. Check the revision guide to verify that.

To avoid unneeded complexity we propose this patch as V2, do you agree?

+#define MSR_AMD64_IC_CFG 0xC0011021
+
+static void init_amd_on(struct cpuinfo_x86 *c)
+{
+	/*
+	 * Apply erratum 688 fix so machines without a BIOS
+	 * fix work.
+	 */
+
+	u32 val = pci_read_config(0, 0x18, 0x4, 0x164);
+
+	if (!(val & BIT(2))) {
+		msr_set_bit(MSR_AMD64_IC_CFG, 3);
+		msr_set_bit(MSR_AMD64_IC_CFG, 14);
+}
 static void init_amd_bd(struct cpuinfo_x86 *c)
 {
 	u64 value;
@@ -738,6 +750,7 @@ static void init_amd(struct cpuinfo_x86
 	case 0xf: init_amd_k8(c); break;
 	case 0x10: init_amd_gh(c); break;
 	case 0x12: init_amd_ln(c); break;
+	case 0x14: init_amd_on(c); break;
 	case 0x15: init_amd_bd(c); break;
 	}

Please advise to proceed!

> Why, what's wrong with that one? That one should be all fixed! :-) I have
> such box too and it runs fine.

erratum 721 :-(
Re: [PATCH] x86/AMD: Apply erratum 688 on machines without a BIOS fix
Hi Ioannis,

first of all, when you reply to a mail on lkml, please use the
"reply-to-all" functionality of your mail client - otherwise replies might
get missed on such a high volume mailing list.

On Fri, Oct 21, 2016 at 07:19:07PM +0300, sonofa...@openmailbox.org wrote:
> Sorry for the late reply! This machine has caused nothing but trouble. HP
> will not fix it and we will not choose their laptops anymore...

You're not the only one who has had this experience.

> My brother told me that we apply a quirk to the last Ontario APUs that do
> not need it but I did not think it would be an issue since they have fixed
> the error.

No, you need to apply the fix only on the models which need it.

> It seems better this way so that only affected APUs are patched. Be patient,
> we are compiling the new patch right now but compiling is run on a different
> high end AMD machine of my brother. Tomorrow I will have access to the
> laptop and I will update the kernel and send you the V2 patch. Compiling on
> that laptop would possibly need a whole day even with AC power!

You can build somewhere else and copy the kernel to the laptop. That's how I
do it.

> Do you want /proc/cpuinfo on the V2 patch e-mail? Both CPUs needed?

No, I just wanted to see them and you've pasted them here. Thanks.

> Here is a dump from an older installation some months ago I kept on my
> disk(tomorrow I will dump it again if you want):

No need, one is enough :)

> > Then, keep that *whole* changelog above when sending v2 of the patch
> What do you mean? It is not clear to me, Do you mean all the info we wrote
> on the e-mail, your comments or both?

All the info you wrote in the first mail.

> We have many AMD machines and we will need your help next week to patch our
> Phenom(tm) II X6.

Why, what's wrong with that one? That one should be all fixed! :-) I have
such a box too and it runs fine.

> Let's finish this patch first and we will fix that too but it appears
> to be much more difficult...

Don't hesitate to ask if you need help...

HTH.

--
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
Re: [PATCH] x86/AMD: Apply erratum 688 on machines without a BIOS fix
Sorry for the late reply! This machine has caused nothing but trouble. HP
will not fix it and we will not choose their laptops anymore...

My brother told me that we apply a quirk to the last Ontario APUs that do
not need it but I did not think it would be an issue since they have fixed
the error.

It seems better this way so that only affected APUs are patched. Be patient,
we are compiling the new patch right now but compiling is run on a different
high end AMD machine of my brother. Tomorrow I will have access to the
laptop and I will update the kernel and send you the V2 patch. Compiling on
that laptop would possibly need a whole day even with AC power!

Do you want /proc/cpuinfo on the V2 patch e-mail? Both CPUs needed?

Here is a dump from an older installation some months ago I kept on my
disk (tomorrow I will dump it again if you want):

processor : 0
vendor_id : AuthenticAMD
cpu family : 20
model : 2
model name : AMD E-300 APU with Radeon(tm) HD Graphics
stepping : 0
microcode : 0x5000119
cpu MHz : 1300.000
cache size : 512 KB
physical id : 0
siblings : 2
core id : 0
cpu cores : 2
apicid : 0
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 6
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc extd_apicid aperfmperf pni monitor ssse3 cx16 popcnt lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch ibs skinit wdt hw_pstate vmmcall arat npt lbrv svm_lock nrip_save pausefilter
bugs : fxsave_leak sysret_ss_attrs
bogomips : 2594.69
TLB size : 1024 4K pages
clflush size : 64
cache_alignment : 64
address sizes : 36 bits physical, 48 bits virtual
power management: ts ttp tm stc 100mhzsteps hwpstate

processor : 1
vendor_id : AuthenticAMD
cpu family : 20
model : 2
model name : AMD E-300 APU with Radeon(tm) HD Graphics
stepping : 0
microcode : 0x5000119
cpu MHz : 1300.000
cache size : 512 KB
physical id : 0
siblings : 2
core id : 1
cpu cores : 2
apicid : 1
initial apicid : 1
fpu : yes
fpu_exception : yes
cpuid level : 6
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc extd_apicid aperfmperf pni monitor ssse3 cx16 popcnt lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch ibs skinit wdt hw_pstate vmmcall arat npt lbrv svm_lock nrip_save pausefilter
bugs : fxsave_leak sysret_ss_attrs
bogomips : 2594.69
TLB size : 1024 4K pages
clflush size : 64
cache_alignment : 64
address sizes : 36 bits physical, 48 bits virtual
power management: ts ttp tm stc 100mhzsteps hwpstate

> Then, keep that *whole* changelog above when sending v2 of the patch

What do you mean? It is not clear to me. Do you mean all the info we wrote
on the e-mail, your comments or both?

We have many AMD machines and we will need your help next week to patch our
Phenom(tm) II X6. Let's finish this patch first and we will fix that too but
it appears to be much more difficult...
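The dump above reports "cpu family : 20", which is 0x14 in hex, the value the patch switches on. As a rough illustration of where those numbers come from, the sketch below decodes family, model and stepping from a CPUID Fn0000_0001 EAX value using the usual AMD convention (extended fields are added in when the base family is 0xF). The helper name `decode_cpuid_eax()` and the example signature 0x00500F20 are assumptions made for this sketch; the signature was chosen to match the fields in the dump and does not appear in this thread.

```c
#include <stdint.h>

/* Decoded identification fields, as /proc/cpuinfo would report them. */
struct cpu_sig {
	unsigned int family;
	unsigned int model;
	unsigned int stepping;
};

/* Hypothetical helper: decode CPUID Fn0000_0001 EAX (AMD convention). */
static struct cpu_sig decode_cpuid_eax(uint32_t eax)
{
	struct cpu_sig s;
	unsigned int base_family = (eax >> 8) & 0xF;
	unsigned int base_model  = (eax >> 4) & 0xF;
	unsigned int ext_family  = (eax >> 20) & 0xFF;
	unsigned int ext_model   = (eax >> 16) & 0xF;

	s.stepping = eax & 0xF;
	s.family = base_family;
	s.model = base_model;

	/* Extended fields only count when the base family is saturated. */
	if (base_family == 0xF) {
		s.family += ext_family;
		s.model |= ext_model << 4;
	}
	return s;
}
```

Under that assumption, a signature of 0x00500F20 decodes to family 0x14 (20 decimal), model 2, stepping 0.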
Re: [PATCH] x86/AMD: Apply erratum 688 on machines without a BIOS fix
On Wed, Oct 19, 2016 at 04:58:08PM +0300, sonofa...@openmailbox.org wrote:
>
> AMD F14h machines have an erratum which can cause unpredictable program
> behaviour under specific branch conditions. The workaround is to set
> MSRC001_1021[14] and MSRC001_1021[3]. Both bits are reserved for this MSR,
> so we trust AMD suggestions. Since there is no BIOS update containing that
> workaround for some machines, we do it ourselves unconditionally on this
> family too. Our Compaq CQ57 laptop which has broken firmware in various
> areas does not contain both workarounds(MSRc0011021: 10208000)...

...

> +#define MSR_AMD64_IC_CFG 0xC0011021
> +
> +static void init_amd_on(struct cpuinfo_x86 *c)
> +{
> +	/*
> +	 * Apply erratum 688 fix unconditionally so machines without a BIOS
> +	 * fix work.
> +	 */
> +	msr_set_bit(MSR_AMD64_IC_CFG, 3);
> +	msr_set_bit(MSR_AMD64_IC_CFG, 14);
> +}

You can't force this unconditionally. Look at the suggested workaround:

"BIOS should set MSRC001_1021[14] = 1b and MSRC001_1021[3] = 1b. This
workaround is required only when bit 2 of Fixed Errata Status Register
(D18F4x164[2]) = 0b."

So you need to do something like this:

	if (c->x86_model == 2 && c->x86_mask == 0) {
		u32 val = pci_read_config(0, 0x18, 0x4, 0x164);

		if (!(val & BIT(2))) {
			msr_set_bit(MSR_AMD64_IC_CFG, 3);
			msr_set_bit(MSR_AMD64_IC_CFG, 14);
		}
	}

Also, please paste /proc/cpuinfo from that machine.

Then, keep that *whole* changelog above when sending v2 of the patch - I
like the level of detail of your explanation! ;-)

Thanks.

--
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
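The condition suggested above can be restated as a pure function, which makes it easy to see which inputs lead to the MSR being touched. This is only a userspace sketch: `erratum_688_needed()` and `apply_fix_gated()` are hypothetical names, and the model/stepping arguments stand in for `c->x86_model` and `c->x86_mask`.

```c
#include <stdbool.h>
#include <stdint.h>

/* D18F4x164[2]: reads 0b when the workaround is still required. */
#define FIXED_ERRATA_688  (1u << 2)

/* True when the Fixed Errata Status Register says the erratum is not fixed. */
static bool erratum_688_needed(uint32_t d18f4x164)
{
	return !(d18f4x164 & FIXED_ERRATA_688);
}

/* The suggested check: only model 2, stepping 0, and only if bit 2 is clear. */
static bool apply_fix_gated(unsigned int model, unsigned int stepping,
			    uint32_t d18f4x164)
{
	return model == 2 && stepping == 0 && erratum_688_needed(d18f4x164);
}
```

Note that with the model and stepping gate in place, any other model is skipped regardless of what the status register reports.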
[PATCH] x86/AMD: Apply erratum 688 on machines without a BIOS fix
AMD F14h machines have an erratum which can cause unpredictable program
behaviour under specific branch conditions. The workaround is to set
MSRC001_1021[14] and MSRC001_1021[3]. Both bits are reserved for this MSR,
so we trust AMD suggestions. Since there is no BIOS update containing that
workaround for some machines, we do it ourselves unconditionally on this
family too. Our Compaq CQ57 laptop which has broken firmware in various
areas does not contain both workarounds (MSRc0011021: 10208000)...

HP does not release a proper BIOS even though we have contacted them and
requested an updated BIOS that will fix all errors we spotted. As it is not
currently covered by any warranty, they do not support it. HP does not care,
but the Linux kernel cares to patch out-of-warranty hardware with crappy
firmware!

Thanks to the author of commit d1992996753132e2dafe955cccb2fb0714d3cfc4
(x86/AMD: Apply erratum 665 on machines without a BIOS fix) as he paved the
way to this fix. That patch was not applicable on our machine but it brought
back to the surface a long standing bug of our E-300 laptop. Poor
performance under Debian was observed and things got worse after switching
to Ubuntu as crashes became more frequent! As a result the laptop got
replaced with a desktop. After some time, we decided to dig deeper and see
what is wrong with our laptop. Actually perf proved that something was
terribly wrong as branch-misses reached 40% within a minute after booting
the E-300 Ontario C0 APU! Disabling the second CPU did not help either. CPU
Revision Guide erratum 688 seemed promising as it described our issues and
we prepared a fix. Now the laptop works and has both workarounds
(MSRc0011021: 1020c008)!

Since this erratum affects many laptops and some tablets, we request to
backport it to stable kernels.

Tested on Compaq CQ57-499 laptop.

Signed-off-by: Ioannis Barkas
Signed-off-by: Nikos Barkas
Cc: Borislav Petkov
Cc:
---
Hello, we are Ioannis Barkas (sonofa...@openmailbox.org) and Nikos Barkas
(level...@gmail.com). This patch was sent from my yahoo e-mail in the
morning and got rejected! Why? Resending...

We have had poor performance on our AMD laptop with Debian for some years.
Initial value of MSRc0011021 is 10208000h and D18F4x164 is 0003h. Our laptop
was not usable even with Ubuntu 16.04 using the radeon driver. What is
worse, opening firefox with https://planefinder.net/ after booting Ubuntu
resulted in firefox crashes again and again. After this patch we have not
met any problem with that webpage and firefox. Unfortunately linux-tools
were not present for our custom kernel and perf could not be launched :(
When the patch arrives on Ubuntu 16.10 kernel, we shall recheck it. If
branch-misses remain above 10%, we will open a bug for it.

--- a/arch/x86/kernel/cpu/amd.c 2016-10-07 16:03:33.0 +0300
+++ b/arch/x86/kernel/cpu/amd.c 2016-10-12 13:25:34.791720549 +0300
@@ -680,6 +680,18 @@ static void init_amd_ln(struct cpuinfo_x
 	msr_set_bit(MSR_AMD64_DE_CFG, 31);
 }
 
+#define MSR_AMD64_IC_CFG 0xC0011021
+
+static void init_amd_on(struct cpuinfo_x86 *c)
+{
+	/*
+	 * Apply erratum 688 fix unconditionally so machines without a BIOS
+	 * fix work.
+	 */
+	msr_set_bit(MSR_AMD64_IC_CFG, 3);
+	msr_set_bit(MSR_AMD64_IC_CFG, 14);
+}
+
 static void init_amd_bd(struct cpuinfo_x86 *c)
 {
 	u64 value;
@@ -738,6 +750,7 @@ static void init_amd(struct cpuinfo_x86
 	case 0xf: init_amd_k8(c); break;
 	case 0x10: init_amd_gh(c); break;
 	case 0x12: init_amd_ln(c); break;
+	case 0x14: init_amd_on(c); break;
 	case 0x15: init_amd_bd(c); break;
 	}
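The two MSR values quoted in the changelog can be checked by hand: 10208000h has neither bit 3 nor bit 14 set, and 1020c008h has both. A small userspace sketch of that check follows; the macro and function names are hypothetical and chosen only for this illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* The two MSRC001_1021 bits named in the erratum 688 workaround. */
#define IC_CFG_BIT3    (1ULL << 3)
#define IC_CFG_BIT14   (1ULL << 14)
#define IC_CFG_688     (IC_CFG_BIT3 | IC_CFG_BIT14)

/* True only when both workaround bits are already set. */
static bool has_both_workaround_bits(uint64_t ic_cfg)
{
	return (ic_cfg & IC_CFG_688) == IC_CFG_688;
}

/* Returns the workaround bits that are still clear in ic_cfg. */
static uint64_t missing_workaround_bits(uint64_t ic_cfg)
{
	return IC_CFG_688 & ~ic_cfg;
}
```

For 10208000h the missing bits are 4008h, and OR-ing them in gives 1020c008h, which matches the value reported after the patch.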