Re: [PATCH 0/4] 8xx: Optimize TLB Miss code.
Heiko Schocher h...@denx.de wrote on 2010/03/08 08:46:29:
> Hello Joakim,
>
> Joakim Tjernlund wrote:
> [...]
>> What would be interesting is to skip patch 3 and turn off MODULES,
>> add PIN_TLB, and compare that against your unpatched .33 but with
>> MODULES off and PIN_TLB on
>
> run   version
> 1-4   Linux 2.6.33-rc without module support and PIN_TLB=on
> 5-8   Linux 2.6.33-rc without module support and PIN_TLB=on + patches 1,2,4
>
> L M B E N C H  3 . 0   S U M M A R Y
> (Alpha software, do not distribute)

Hmm, these results vary a lot. The only stable result I can see is:

> Memory latencies in nanoseconds - smaller is better
> (WARNING - may not be correct, check graphs)
>
> Host    OS             Mhz  L1 $   L2 $   Main mem  Rand mem  Guesses
> tqm8xx  Linux 2.6.33-   66  31.7   183.2  184.0     1163.0    No L2 cache?
> tqm8xx  Linux 2.6.33-   66  31.7   183.2  184.0     1164.8    No L2 cache?
> tqm8xx  Linux 2.6.33-   66  31.7   183.2  184.0     1163.2    No L2 cache?
> tqm8xx  Linux 2.6.33-   66  31.7   183.2  183.8     1163.7    No L2 cache?
> tqm8xx  Linux 2.6.33-   66  31.8   172.4  173.2     1147.3    No L2 cache?
> tqm8xx  Linux 2.6.33-   66  31.8   172.5  173.2     1148.3    No L2 cache?
> tqm8xx  Linux 2.6.33-   66  31.8   172.5  173.1     1146.9    No L2 cache?
> tqm8xx  Linux 2.6.33-   66  31.8   172.5  173.2     1147.3    No L2 cache?

I don't see why the other results vary so much. Are you using NFS or
having much network traffic?

Jocke
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH 0/4] 8xx: Optimize TLB Miss code.
Hello Joakim,

Joakim Tjernlund wrote:
> Heiko Schocher h...@denx.de wrote on 2010/03/08 08:46:29:
>> Hello Joakim,
>>
>> Joakim Tjernlund wrote:
>> [...]
>>> What would be interesting is to skip patch 3 and turn off MODULES,
>>> add PIN_TLB, and compare that against your unpatched .33 but with
>>> MODULES off and PIN_TLB on
>>
>> run   version
>> 1-4   Linux 2.6.33-rc without module support and PIN_TLB=on
>> 5-8   Linux 2.6.33-rc without module support and PIN_TLB=on + patches 1,2,4
>>
>> L M B E N C H  3 . 0   S U M M A R Y
>> (Alpha software, do not distribute)
>
> Hmm, these results vary a lot. The only stable result I can see is:
>
>> Memory latencies in nanoseconds - smaller is better
>> (WARNING - may not be correct, check graphs)
>>
>> Host    OS             Mhz  L1 $   L2 $   Main mem  Rand mem  Guesses
>> tqm8xx  Linux 2.6.33-   66  31.7   183.2  184.0     1163.0    No L2 cache?
>> tqm8xx  Linux 2.6.33-   66  31.7   183.2  184.0     1164.8    No L2 cache?
>> tqm8xx  Linux 2.6.33-   66  31.7   183.2  184.0     1163.2    No L2 cache?
>> tqm8xx  Linux 2.6.33-   66  31.7   183.2  183.8     1163.7    No L2 cache?
>> tqm8xx  Linux 2.6.33-   66  31.8   172.4  173.2     1147.3    No L2 cache?
>> tqm8xx  Linux 2.6.33-   66  31.8   172.5  173.2     1148.3    No L2 cache?
>> tqm8xx  Linux 2.6.33-   66  31.8   172.5  173.1     1146.9    No L2 cache?
>> tqm8xx  Linux 2.6.33-   66  31.8   172.5  173.2     1147.3    No L2 cache?
>
> I don't see why the other results vary so much. Are you using NFS or
> having much network traffic?

I use NFS.

bye
Heiko
-- 
DENX Software Engineering GmbH, MD: Wolfgang Denk & Detlev Zundel
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Re: [PATCH 0/4] 8xx: Optimize TLB Miss code.
Heiko Schocher h...@denx.de wrote on 2010/03/08 10:06:39:
> Hello Joakim,
>
> Joakim Tjernlund wrote:
>> Heiko Schocher h...@denx.de wrote on 2010/03/08 08:46:29:
>>> Hello Joakim,
>>>
>>> Joakim Tjernlund wrote:
>>> [...]
>>>> What would be interesting is to skip patch 3 and turn off MODULES,
>>>> add PIN_TLB, and compare that against your unpatched .33 but with
>>>> MODULES off and PIN_TLB on
>>>
>>> run   version
>>> 1-4   Linux 2.6.33-rc without module support and PIN_TLB=on
>>> 5-8   Linux 2.6.33-rc without module support and PIN_TLB=on + patches 1,2,4
>>>
>>> L M B E N C H  3 . 0   S U M M A R Y
>>> (Alpha software, do not distribute)
>>
>> Hmm, these results vary a lot. The only stable result I can see is:
>>
>>> Memory latencies in nanoseconds - smaller is better
>>> (WARNING - may not be correct, check graphs)
>>>
>>> Host    OS             Mhz  L1 $   L2 $   Main mem  Rand mem  Guesses
>>> tqm8xx  Linux 2.6.33-   66  31.7   183.2  184.0     1163.0    No L2 cache?
>>> tqm8xx  Linux 2.6.33-   66  31.7   183.2  184.0     1164.8    No L2 cache?
>>> tqm8xx  Linux 2.6.33-   66  31.7   183.2  184.0     1163.2    No L2 cache?
>>> tqm8xx  Linux 2.6.33-   66  31.7   183.2  183.8     1163.7    No L2 cache?
>>> tqm8xx  Linux 2.6.33-   66  31.8   172.4  173.2     1147.3    No L2 cache?
>>> tqm8xx  Linux 2.6.33-   66  31.8   172.5  173.2     1148.3    No L2 cache?
>>> tqm8xx  Linux 2.6.33-   66  31.8   172.5  173.1     1146.9    No L2 cache?
>>> tqm8xx  Linux 2.6.33-   66  31.8   172.5  173.2     1147.3    No L2 cache?
>>
>> I don't see why the other results vary so much. Are you using NFS or
>> having much network traffic?
>
> I use NFS.

Then I think it is possible that NFS gets in the way of stable
measurements. Does anyone have experience with running lmbench on NFS?

Jocke
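One rough way to put a number on "these results vary a lot" is the coefficient of variation (sample standard deviation divided by the mean) of each column across repeated runs. The Python sketch below is not from the thread. It uses the "Main mem" values quoted above for runs 1-4 and, for contrast, the "sig hndl" values that Heiko posted for the same four runs; the assignment of those values to the "sig hndl" column is read off a flattened table and is therefore an assumption, as is the helper name `coefficient_of_variation`.

```python
from statistics import mean, stdev

# Values copied from the thread, runs 1-4 (no modules, PIN_TLB=on, unpatched).
main_mem_ns = [184.0, 184.0, 184.0, 183.8]  # "Main mem" latency, nanoseconds
# Assumed to be the "sig hndl" column of the Processor/Processes table.
sig_hndl_us = [92.1, 85.3, 87.3, 86.7]      # microseconds


def coefficient_of_variation(samples):
    """Return the sample standard deviation divided by the mean (unitless)."""
    return stdev(samples) / mean(samples)


cv_main_mem = coefficient_of_variation(main_mem_ns)
cv_sig_hndl = coefficient_of_variation(sig_hndl_us)

print(f"Main mem CV: {cv_main_mem:.3%}")
print(f"sig hndl CV: {cv_sig_hndl:.3%}")
```

A column whose spread within one configuration is comparable to the difference between configurations cannot support a conclusion about the patches, which is why only the memory latency table is treated as reliable here.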
Re: [PATCH 0/4] 8xx: Optimize TLB Miss code.
Dear Joakim Tjernlund,

In message of1413a940.58e7b20e-onc12576e0.003a9000-c12576e0.003ac...@transmode.se you wrote:
>> I use NFS.
>
> Then I think it is possible that NFS gets in the way of stable
> measurements. Does anyone have experience with running lmbench on NFS?

NFS may have some influence here, but I doubt it is the primary cause
of these variations. The network where Heiko is running these tests is
mostly idle, so it should provide fairly constant conditions. Of
course, the use of the network on the MPC8xx itself will add to the
variation, but again I would not expect such big differences.

Heiko - there is a 10 GB disk attached to the tqm8xx system; I think
there should be a usable root file system on it, but I cannot remember
the actual state. Maybe we can use that. Please contact me on jabber
this afternoon!

Best regards,

Wolfgang Denk

-- 
DENX Software Engineering GmbH, MD: Wolfgang Denk & Detlev Zundel
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: w...@denx.de
Living on Earth may be expensive, but it includes an annual free trip
around the Sun.
Re: [PATCH 0/4] 8xx: Optimize TLB Miss code.
Heiko Schocher h...@denx.de wrote on 2010/03/04 17:30:07:
> From: Heiko Schocher h...@denx.de
> To: Joakim Tjernlund joakim.tjernl...@transmode.se
> Cc: Wolfgang Denk w...@denx.de, Klaus-Jürgen heyd...@kieback-peter.de, linuxppc-...@ozlabs.org, Scott Wood scottw...@freescale.com
> Date: 2010/03/04 17:30
> Subject: Re: [PATCH 0/4] 8xx: Optimize TLB Miss code.
>
> Hello Joakim,
>
> Joakim Tjernlund wrote:
>> Wolfgang Denk w...@denx.de wrote on 2010/03/04 13:16:56:
>>> From: Wolfgang Denk w...@denx.de
>>> To: h...@denx.de
>>> Cc: Joakim Tjernlund joakim.tjernl...@transmode.se, Klaus-Jürgen heyd...@kieback-peter.de, linuxppc-...@ozlabs.org, Scott Wood scottw...@freescale.com
>>> Date: 2010/03/04 13:17
>>> Subject: Re: [PATCH 0/4] 8xx: Optimize TLB Miss code.
>>>
>>> Dear Heiko,
>>>
>>> thanks for running the tests.
>>>
>>> In message 4b8f8bb4.6070...@denx.de you wrote:
>>>> here the results:
>>>>
>>>> run    version
>>>> 1-4    2.6.33-rc6 without your patches
>>>> 5-8    2.6.33-rc6 with all your patches
>>>> 9-12   2.6.33-rc6 with patches 1, 2 and 4
>>>>        (without "8xx: Don't touch ACCESSED when no SWAP")
>>>> 13-16  2.6.33-rc6 with all your patches and CONFIG_PIN_TLB=y
>>>
>>> So CONFIG_PIN_TLB improves the performance as expected, while the
>>> other patches don't show any measurable improvement - or am I
>>> reading the results incorrectly?

BTW, I have implemented all of the newer 2.6 TLB/MMU fixes (including
the dcbX fixup) for 2.4 as well. If there is any interest, I can polish
them and submit them for 2.4. I do need an external tester for that,
though.

Jocke
Re: [PATCH 0/4] 8xx: Optimize TLB Miss code.
Hello Joakim, Joakim Tjernlund wrote: [...] What would be interesting is to skip patch 3 and turn off MODULES add PIN_TLB and compare that against your unpatched .33 but with MODULES off and PIN_TLB on run version 1-4 Linux2.6.33-rc without module support and PIN_TLB=on 5-8 Linux2.6.33-rc without module support and PIN_TLB=on + patches 1,2,4 L M B E N C H 3 . 0 S U M M A R Y (Alpha software, do not distribute) Basic system parameters -- Host OS Description Mhz tlb cache mem scal pages line par load bytes - - --- - - -- tqm8xxLinux 2.6.33- powerpc-linux-gnu 662816 1.01001 tqm8xxLinux 2.6.33- powerpc-linux-gnu 662816 1.04001 tqm8xxLinux 2.6.33- powerpc-linux-gnu 662816 1.03001 tqm8xxLinux 2.6.33- powerpc-linux-gnu 662816 1.01001 tqm8xxLinux 2.6.33- powerpc-linux-gnu 662816 1.04001 tqm8xxLinux 2.6.33- powerpc-linux-gnu 662816 1.04001 tqm8xxLinux 2.6.33- powerpc-linux-gnu 66 716 1.04001 tqm8xxLinux 2.6.33- powerpc-linux-gnu 662816 1.01001 Processor, Processes - times in microseconds - smaller is better -- Host OS Mhz null null open slct sig sig fork exec sh call I/O stat clos TCP inst hndl proc proc proc - - tqm8xxLinux 2.6.33- 66 2.97 8.91 127. 1238 270. 22.3 92.1 6386 27.K 83.K tqm8xxLinux 2.6.33- 66 3.05 8.99 129. 1208 261. 22.3 85.3 6418 27.K 83.K tqm8xxLinux 2.6.33- 66 3.05 8.81 128. 1205 270. 22.3 87.3 6342 27.K 82.K tqm8xxLinux 2.6.33- 66 3.05 8.82 132. 1215 270. 23.1 86.7 6357 27.K 82.K tqm8xxLinux 2.6.33- 66 3.28 9.29 128. 1257 260. 23.9 83.7 6511 28.K 84.K tqm8xxLinux 2.6.33- 66 3.34 9.35 126. 1264 271. 23.1 86.6 6437 27.K 84.K tqm8xxLinux 2.6.33- 66 3.19 8.97 130. 1212 271. 23.1 95.3 6480 27.K 84.K tqm8xxLinux 2.6.33- 66 3.28 8.76 127. 1229 269. 
22.9 90.9 6293 27.K 82.K Basic integer operations - times in nanoseconds - smaller is better --- Host OS intgr intgr intgr intgr intgr bit addmuldivmod - - -- -- -- -- -- tqm8xxLinux 2.6.33- 15.2 17.9 1.2500 124.1 202.4 tqm8xxLinux 2.6.33- 15.6 18.0 1.1900 124.1 196.4 tqm8xxLinux 2.6.33- 15.2 17.9 1.2400 124.9 202.5 tqm8xxLinux 2.6.33- 15.2 17.9 1.2400 124.2 196.8 tqm8xxLinux 2.6.33- 15.7 17.9 1.5500 124.2 203.6 tqm8xxLinux 2.6.33- 15.7 17.9 1.5500 124.2 202.1 tqm8xxLinux 2.6.33- 15.7 17.9 1.5700 125.0 202.2 tqm8xxLinux 2.6.33- 15.7 17.9 1.5500 121.1 196.4 Basic uint64 operations - times in nanoseconds - smaller is better -- Host OS int64 int64 int64 int64 int64 bitaddmuldivmod - - -- -- -- -- -- tqm8xxLinux 2.6.33-15. 12.9 1944.1 1895.2 tqm8xxLinux 2.6.33-15. 12.9 1886.3 1894.4 tqm8xxLinux 2.6.33-15. 12.9 1944.1 1895.2 tqm8xxLinux 2.6.33-15. 12.9 1886.3 1894.8 tqm8xxLinux 2.6.33-15. 13.2 1944.1 1894.4 tqm8xxLinux 2.6.33-15. 13.2 1944.8 1896.3 tqm8xxLinux 2.6.33-15. 13.2 1945.2 1837.4 tqm8xxLinux 2.6.33-15. 13.2 1957.8 1907.4 Basic float operations - times in nanoseconds - smaller is better - Host OS float float float float addmuldivbogo - - -- -- -- -- tqm8xxLinux 2.6.33- 1011.0 1620.2 5467.0 9868.0 tqm8xxLinux 2.6.33- 1004.5 1630.1 5468.0 9852.0 tqm8xxLinux 2.6.33- 1012.2 1620.5 5472.0 9855.0 tqm8xxLinux 2.6.33- 1011.0 1620.2 5469.0 9866.0 tqm8xxLinux 2.6.33- 1004.8 1617.3 5503.0 9856.0 tqm8xxLinux 2.6.33- 1004.9 1577.1 5469.0 9859.0 tqm8xxLinux 2.6.33- 1011.4 1618.5 5470.0 9859.0 tqm8xxLinux 2.6.33- 1004.9 1620.5 5471.0 9904.0 Basic double operations - times in nanoseconds - smaller is better -- Host OS double double
Re: [PATCH 0/4] 8xx: Optimize TLB Miss code.
Heiko Schocher h...@denx.de wrote on 2010/03/04 17:30:07:
> Hello Joakim,
>
> Joakim Tjernlund wrote:
>> Wolfgang Denk w...@denx.de wrote on 2010/03/04 13:16:56:
>>> From: Wolfgang Denk w...@denx.de
>>> To: h...@denx.de
>>> Cc: Joakim Tjernlund joakim.tjernl...@transmode.se, Klaus-Jürgen heyd...@kieback-peter.de, linuxppc-...@ozlabs.org, Scott Wood scottw...@freescale.com
>>> Date: 2010/03/04 13:17
>>> Subject: Re: [PATCH 0/4] 8xx: Optimize TLB Miss code.
>>>
>>> Dear Heiko,
>>>
>>> thanks for running the tests.
>>>
>>> In message 4b8f8bb4.6070...@denx.de you wrote:
>>>> here the results:
>>>>
>>>> run    version
>>>> 1-4    2.6.33-rc6 without your patches
>>>> 5-8    2.6.33-rc6 with all your patches
>>>> 9-12   2.6.33-rc6 with patches 1, 2 and 4
>>>>        (without "8xx: Don't touch ACCESSED when no SWAP")
>>>> 13-16  2.6.33-rc6 with all your patches and CONFIG_PIN_TLB=y
>>>
>>> So CONFIG_PIN_TLB improves the performance as expected, while the
>>> other patches don't show any measurable improvement - or am I
>>> reading the results incorrectly?
>>
>> Close but not quite. What stands out most is:
>>
>>>> Memory latencies in nanoseconds - smaller is better
>>>> (WARNING - may not be correct, check graphs)
>>>>
>>>> Host    OS             Mhz  L1 $   L2 $   Main mem  Rand mem  Guesses
>>>> tqm8xx  Linux 2.6.33-   66  31.8   141.0  184.0     1165.7
>>>> tqm8xx  Linux 2.6.33-   66  31.8   141.2  184.2     1165.3
>>>> tqm8xx  Linux 2.6.33-   66  31.8   141.3  184.3     1165.6
>>>> tqm8xx  Linux 2.6.33-   66  31.8   141.3  184.2     1166.2
>>>> tqm8xx  Linux 2.6.33-   66  31.8   141.0  171.8     1100.5    No L2 cache?
>>>> tqm8xx  Linux 2.6.33-   66  31.8   141.0  171.8     1102.5    No L2 cache?
>>>> tqm8xx  Linux 2.6.33-   66  31.8   141.0  171.8     1101.7    No L2 cache?
>>>> tqm8xx  Linux 2.6.33-   66  31.8   141.0  171.8     1101.6    No L2 cache?
>>>> tqm8xx  Linux 2.6.33-   66  31.8   141.1  173.4     1149.1    No L2 cache?
>>>> tqm8xx  Linux 2.6.33-   66  31.8   141.1  173.4     1149.0    No L2 cache?
>>>> tqm8xx  Linux 2.6.33-   66  31.7   141.1  173.4     1148.7    No L2 cache?
>>>> tqm8xx  Linux 2.6.33-   66  31.7   141.1  173.4     1148.2    No L2 cache?
>>>> tqm8xx  Linux 2.6.33-   66  31.8   171.1  171.7     1099.8    No L2 cache?
>>>> tqm8xx  Linux 2.6.33-   66  31.8   171.1  171.6     1100.5    No L2 cache?
>>>> tqm8xx  Linux 2.6.33-   66  31.7   171.0  171.7     1101.0    No L2 cache?
>>>> tqm8xx  Linux 2.6.33-   66  31.8   171.0  171.6     1101.3    No L2 cache?
>>
>> Besides the numbers, note how the first group doesn't have a Guesses
>> entry. Is there something odd with the results for the first group?
>
> Hmm.. just to be safe, I made this test again, but it also shows no
> entry in Guesses ... Hardware, Linux source, rootFS, lmbench sources,
> all the same ...

OK

>> Also, since you are using MODULES, patch 2 is nullified. Patch 1 is
>> very minor and should not show, I think. This leaves patches 3 & 4.
>> There appears to be something funny with patch 3, "Don't touch
>> ACCESSED when no SWAP", as it yields bad numbers for Prot Fault, so
>> perhaps I am missing something that needs ACCESSED even if NO_SWAP.
>> Perhaps someone who knows MM in Linux knows?
>> Are there any messages in the kernel log (dmesg)?
>
> I couldn't find anything in the output of dmesg ... but if you want
> this output, I can send it to you.

No, if you can't find anything in there, I won't either.

What would be interesting is to skip patch 3 and turn off MODULES,
add PIN_TLB, and compare that against your unpatched .33 but with
MODULES off and PIN_TLB on

Jocke
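To make the "what stands out" observation concrete, the four run groups can be reduced to a mean "Main mem" latency and a percentage change against the unpatched runs. The Python sketch below is an illustration added here, not something posted in the thread; the numbers are copied from the 16-row table quoted above, while the dictionary layout and the helper `percent_change` are hypothetical.

```python
from statistics import mean

# "Main mem" latency in nanoseconds, copied from the 16-run table in the thread.
main_mem_ns = {
    "runs 1-4: unpatched": [184.0, 184.2, 184.3, 184.2],
    "runs 5-8: all patches": [171.8, 171.8, 171.8, 171.8],
    "runs 9-12: patches 1, 2 and 4": [173.4, 173.4, 173.4, 173.4],
    "runs 13-16: all patches + CONFIG_PIN_TLB=y": [171.7, 171.6, 171.7, 171.6],
}


def percent_change(baseline, value):
    """Relative change of value against baseline, in percent."""
    return (value - baseline) / baseline * 100.0


group_means = {label: mean(samples) for label, samples in main_mem_ns.items()}
baseline = group_means["runs 1-4: unpatched"]
changes = {label: percent_change(baseline, m) for label, m in group_means.items()}

for label, m in group_means.items():
    print(f"{label:45s} mean={m:7.2f} ns  change={changes[label]:+6.2f}%")
```

Because the runs inside each group agree to within a few tenths of a nanosecond, a difference of several percent between groups is well outside the run-to-run noise for this particular column.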
Re: [PATCH 0/4] 8xx: Optimize TLB Miss code.
Hello Joakim, Joakim Tjernlund wrote: Could you try reverting patch: 8xx: Don't touch ACCESSED when no SWAP. and see if that makes a difference? [...] Turning on pinned TLBs(you must turn on ADVANCED_OPTIONS first) could be an improvement, regardless of my patches. here the results: run version 1-4 2.6.33-rc6 without your patches 5-8 2.6.33-rc6 with all your patches 9-122.6.33-rc6 with patches 1,2 and 4 (without 8xx: Don't touch ACCESSED when no SWAP) 13-16 2.6.33-rc6 with all your patches and CONFIG_PIN_TLB=y Turning on pinned TLBs(you must turn on ADVANCED_OPTIONS first) could be an improvement, regardless of my patches. make[1]: Entering directory `/home/hs/lmbench-3.0-a9/results' L M B E N C H 3 . 0 S U M M A R Y (Alpha software, do not distribute) Basic system parameters -- Host OS Description Mhz tlb cache mem scal pages line par load bytes - - --- - - -- tqm8xxLinux 2.6.33- powerpc-linux-gnu 663216 1.04001 tqm8xxLinux 2.6.33- powerpc-linux-gnu 66 716 1.04001 tqm8xxLinux 2.6.33- powerpc-linux-gnu 66 716 1.04001 tqm8xxLinux 2.6.33- powerpc-linux-gnu 663216 1.04001 tqm8xxLinux 2.6.33- powerpc-linux-gnu 663216 1.04001 tqm8xxLinux 2.6.33- powerpc-linux-gnu 66 716 1.04001 tqm8xxLinux 2.6.33- powerpc-linux-gnu 66 716 1.04001 tqm8xxLinux 2.6.33- powerpc-linux-gnu 663216 1.04001 tqm8xxLinux 2.6.33- powerpc-linux-gnu 663216 1.04001 tqm8xxLinux 2.6.33- powerpc-linux-gnu 663216 1.04001 tqm8xxLinux 2.6.33- powerpc-linux-gnu 663216 1.01001 tqm8xxLinux 2.6.33- powerpc-linux-gnu 663216 1.01001 tqm8xxLinux 2.6.33- powerpc-linux-gnu 662816 1.17001 tqm8xxLinux 2.6.33- powerpc-linux-gnu 66 716 1.01001 tqm8xxLinux 2.6.33- powerpc-linux-gnu 662816 1.04001 tqm8xxLinux 2.6.33- powerpc-linux-gnu 66 716 1.04001 Processor, Processes - times in microseconds - smaller is better -- Host OS Mhz null null open slct sig sig fork exec sh call I/O stat clos TCP inst hndl proc proc proc - - tqm8xxLinux 2.6.33- 66 2.97 10.3 129. 1377 272. 
21.8 91.3 6949 29.K 89.K tqm8xxLinux 2.6.33- 66 3.06 10.5 124. 1375 273. 21.8 91.3 7136 30.K 89.K tqm8xxLinux 2.6.33- 66 3.06 10.6 129. 1365 272. 21.2 96.6 6889 29.K 89.K tqm8xxLinux 2.6.33- 66 3.06 10.5 124. 1309 272. 21.8 101. 6896 29.K 89.K tqm8xxLinux 2.6.33- 66 2.97 8.86 126. 1336 273. 21.7 84.2 6785 29.K 88.K tqm8xxLinux 2.6.33- 66 3.06 8.90 130. 1343 263. 21.3 84.7 7080 29.K 88.K tqm8xxLinux 2.6.33- 66 3.52 8.97 129. 1339 270. 22.4 84.4 6823 29.K 88.K tqm8xxLinux 2.6.33- 66 2.97 8.99 127. 1333 261. 22.4 87.0 7037 29.K 87.K tqm8xxLinux 2.6.33- 66 3.06 8.83 128. 1355 269. 20.7 89.2 6927 29.K 87.K tqm8xxLinux 2.6.33- 66 3.05 8.84 127. 1344 271. 21.6 90.5 6868 29.K 88.K tqm8xxLinux 2.6.33- 66 3.06 8.84 131. 1376 260. 21.4 88.1 7119 29.K 87.K tqm8xxLinux 2.6.33- 66 3.05 8.90 122. 1342 272. 21.4 88.6 6847 29.K 88.K tqm8xxLinux 2.6.33- 66 3.19 9.10 122. 1205 265. 20.9 90.3 6358 27.K 83.K tqm8xxLinux 2.6.33- 66 3.28 9.10 124. 1208 270. 20.9 95.2 6217 27.K 82.K tqm8xxLinux 2.6.33- 66 3.19 8.98 125. 1210 270. 21.1 87.9 6364 27.K 83.K tqm8xxLinux 2.6.33- 66 3.19 8.86 124. 1237 262. 21.3 90.7 6311 27.K 84.K Basic integer operations - times in nanoseconds - smaller is better --- Host OS intgr intgr intgr intgr intgr bit addmuldivmod - - -- -- -- -- -- tqm8xxLinux 2.6.33- 15.7 18.0 1.5600 124.2 203.1 tqm8xxLinux 2.6.33- 15.7 17.4 1.5800 121.1 202.8 tqm8xxLinux 2.6.33- 15.2 17.9 1.6200 124.2 202.7 tqm8xxLinux 2.6.33- 15.2 17.9 1.6000 125.0 204.0 tqm8xxLinux 2.6.33- 15.7 18.1 1.5600 124.7 204.4 tqm8xxLinux 2.6.33- 15.7 18.1 1.5800 124.2 202.8 tqm8xxLinux 2.6.33- 15.7 17.9 1.5500 124.2 203.2 tqm8xxLinux 2.6.33- 15.7 18.1 1.5500 124.5 202.0 tqm8xxLinux 2.6.33- 15.7 18.1 1.5500
Re: [PATCH 0/4] 8xx: Optimize TLB Miss code.
Dear Heiko,

thanks for running the tests.

In message 4b8f8bb4.6070...@denx.de you wrote:
> here the results:
>
> run    version
> 1-4    2.6.33-rc6 without your patches
> 5-8    2.6.33-rc6 with all your patches
> 9-12   2.6.33-rc6 with patches 1, 2 and 4
>        (without "8xx: Don't touch ACCESSED when no SWAP")
> 13-16  2.6.33-rc6 with all your patches and CONFIG_PIN_TLB=y

So CONFIG_PIN_TLB improves the performance as expected, while the
other patches don't show any measurable improvement - or am I reading
the results incorrectly?

Best regards,

Wolfgang Denk

-- 
DENX Software Engineering GmbH, MD: Wolfgang Denk & Detlev Zundel
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Phone: (+49)-8142-66989-10 Fax: (+49)-8142-66989-80 Email: w...@denx.de
"And now remains  That we find out the cause of this effect, Or rather
say, the cause of this defect..." -- Hamlet, Act II, Scene 2
Re: [PATCH 0/4] 8xx: Optimize TLB Miss code.
Wolfgang Denk w...@denx.de wrote on 2010/03/04 13:16:56:
> From: Wolfgang Denk w...@denx.de
> To: h...@denx.de
> Cc: Joakim Tjernlund joakim.tjernl...@transmode.se, Klaus-Jürgen heyd...@kieback-peter.de, linuxppc-...@ozlabs.org, Scott Wood scottw...@freescale.com
> Date: 2010/03/04 13:17
> Subject: Re: [PATCH 0/4] 8xx: Optimize TLB Miss code.
>
> Dear Heiko,
>
> thanks for running the tests.
>
> In message 4b8f8bb4.6070...@denx.de you wrote:
>> here the results:
>>
>> run    version
>> 1-4    2.6.33-rc6 without your patches
>> 5-8    2.6.33-rc6 with all your patches
>> 9-12   2.6.33-rc6 with patches 1, 2 and 4
>>        (without "8xx: Don't touch ACCESSED when no SWAP")
>> 13-16  2.6.33-rc6 with all your patches and CONFIG_PIN_TLB=y
>
> So CONFIG_PIN_TLB improves the performance as expected, while the
> other patches don't show any measurable improvement - or am I reading
> the results incorrectly?

Close but not quite. What stands out most is:

>> Memory latencies in nanoseconds - smaller is better
>> (WARNING - may not be correct, check graphs)
>>
>> Host    OS             Mhz  L1 $   L2 $   Main mem  Rand mem  Guesses
>> tqm8xx  Linux 2.6.33-   66  31.8   141.0  184.0     1165.7
>> tqm8xx  Linux 2.6.33-   66  31.8   141.2  184.2     1165.3
>> tqm8xx  Linux 2.6.33-   66  31.8   141.3  184.3     1165.6
>> tqm8xx  Linux 2.6.33-   66  31.8   141.3  184.2     1166.2
>> tqm8xx  Linux 2.6.33-   66  31.8   141.0  171.8     1100.5    No L2 cache?
>> tqm8xx  Linux 2.6.33-   66  31.8   141.0  171.8     1102.5    No L2 cache?
>> tqm8xx  Linux 2.6.33-   66  31.8   141.0  171.8     1101.7    No L2 cache?
>> tqm8xx  Linux 2.6.33-   66  31.8   141.0  171.8     1101.6    No L2 cache?
>> tqm8xx  Linux 2.6.33-   66  31.8   141.1  173.4     1149.1    No L2 cache?
>> tqm8xx  Linux 2.6.33-   66  31.8   141.1  173.4     1149.0    No L2 cache?
>> tqm8xx  Linux 2.6.33-   66  31.7   141.1  173.4     1148.7    No L2 cache?
>> tqm8xx  Linux 2.6.33-   66  31.7   141.1  173.4     1148.2    No L2 cache?
>> tqm8xx  Linux 2.6.33-   66  31.8   171.1  171.7     1099.8    No L2 cache?
>> tqm8xx  Linux 2.6.33-   66  31.8   171.1  171.6     1100.5    No L2 cache?
>> tqm8xx  Linux 2.6.33-   66  31.7   171.0  171.7     1101.0    No L2 cache?
>> tqm8xx  Linux 2.6.33-   66  31.8   171.0  171.6     1101.3    No L2 cache?

Besides the numbers, note how the first group doesn't have a Guesses
entry. Is there something odd with the results for the first group?

Also, since you are using MODULES, patch 2 is nullified. Patch 1 is
very minor and should not show, I think. This leaves patches 3 & 4.
There appears to be something funny with patch 3, "Don't touch ACCESSED
when no SWAP", as it yields bad numbers for Prot Fault, so perhaps I am
missing something that needs ACCESSED even if NO_SWAP. Perhaps someone
who knows MM in Linux knows?
Are there any messages in the kernel log (dmesg)?

Jocke
Re: [PATCH 0/4] 8xx: Optimize TLB Miss code.
Hello Joakim,

Joakim Tjernlund wrote:
> Wolfgang Denk w...@denx.de wrote on 2010/03/04 13:16:56:
>> From: Wolfgang Denk w...@denx.de
>> To: h...@denx.de
>> Cc: Joakim Tjernlund joakim.tjernl...@transmode.se, Klaus-Jürgen heyd...@kieback-peter.de, linuxppc-...@ozlabs.org, Scott Wood scottw...@freescale.com
>> Date: 2010/03/04 13:17
>> Subject: Re: [PATCH 0/4] 8xx: Optimize TLB Miss code.
>>
>> Dear Heiko,
>>
>> thanks for running the tests.
>>
>> In message 4b8f8bb4.6070...@denx.de you wrote:
>>> here the results:
>>>
>>> run    version
>>> 1-4    2.6.33-rc6 without your patches
>>> 5-8    2.6.33-rc6 with all your patches
>>> 9-12   2.6.33-rc6 with patches 1, 2 and 4
>>>        (without "8xx: Don't touch ACCESSED when no SWAP")
>>> 13-16  2.6.33-rc6 with all your patches and CONFIG_PIN_TLB=y
>>
>> So CONFIG_PIN_TLB improves the performance as expected, while the
>> other patches don't show any measurable improvement - or am I
>> reading the results incorrectly?
>
> Close but not quite. What stands out most is:
>
>>> Memory latencies in nanoseconds - smaller is better
>>> (WARNING - may not be correct, check graphs)
>>>
>>> Host    OS             Mhz  L1 $   L2 $   Main mem  Rand mem  Guesses
>>> tqm8xx  Linux 2.6.33-   66  31.8   141.0  184.0     1165.7
>>> tqm8xx  Linux 2.6.33-   66  31.8   141.2  184.2     1165.3
>>> tqm8xx  Linux 2.6.33-   66  31.8   141.3  184.3     1165.6
>>> tqm8xx  Linux 2.6.33-   66  31.8   141.3  184.2     1166.2
>>> tqm8xx  Linux 2.6.33-   66  31.8   141.0  171.8     1100.5    No L2 cache?
>>> tqm8xx  Linux 2.6.33-   66  31.8   141.0  171.8     1102.5    No L2 cache?
>>> tqm8xx  Linux 2.6.33-   66  31.8   141.0  171.8     1101.7    No L2 cache?
>>> tqm8xx  Linux 2.6.33-   66  31.8   141.0  171.8     1101.6    No L2 cache?
>>> tqm8xx  Linux 2.6.33-   66  31.8   141.1  173.4     1149.1    No L2 cache?
>>> tqm8xx  Linux 2.6.33-   66  31.8   141.1  173.4     1149.0    No L2 cache?
>>> tqm8xx  Linux 2.6.33-   66  31.7   141.1  173.4     1148.7    No L2 cache?
>>> tqm8xx  Linux 2.6.33-   66  31.7   141.1  173.4     1148.2    No L2 cache?
>>> tqm8xx  Linux 2.6.33-   66  31.8   171.1  171.7     1099.8    No L2 cache?
>>> tqm8xx  Linux 2.6.33-   66  31.8   171.1  171.6     1100.5    No L2 cache?
>>> tqm8xx  Linux 2.6.33-   66  31.7   171.0  171.7     1101.0    No L2 cache?
>>> tqm8xx  Linux 2.6.33-   66  31.8   171.0  171.6     1101.3    No L2 cache?
>
> Besides the numbers, note how the first group doesn't have a Guesses
> entry. Is there something odd with the results for the first group?

Hmm.. just to be safe, I made this test again, but it also shows no
entry in Guesses ... Hardware, Linux source, rootFS, lmbench sources,
all the same ...

> Also, since you are using MODULES, patch 2 is nullified. Patch 1 is
> very minor and should not show, I think. This leaves patches 3 & 4.
> There appears to be something funny with patch 3, "Don't touch
> ACCESSED when no SWAP", as it yields bad numbers for Prot Fault, so
> perhaps I am missing something that needs ACCESSED even if NO_SWAP.
> Perhaps someone who knows MM in Linux knows?
> Are there any messages in the kernel log (dmesg)?

I couldn't find anything in the output of dmesg ... but if you want
this output, I can send it to you.

bye
Heiko
-- 
DENX Software Engineering GmbH, MD: Wolfgang Denk & Detlev Zundel
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Re: [PATCH 0/4] 8xx: Optimize TLB Miss code.
Hello Joakim, I tried your 4 patches on a MPC855M based system: -bash-3.2# cat /proc/cpuinfo processor : 0 cpu : 8xx clock : 66.00MHz revision: 0.0 (pvr 0050 ) bogomips: 8.25 timebase: 4125000 platform: TQM8xx model : TQM8xx Memory : 32 MB -bash-3.2# cat /proc/version Linux version 2.6.33-rc6-01500-gbddcb41-dirty (h...@xpert.denx.de) (gcc version 4.2.2) #9 Tue Mar 2 18:08:49 CET 2010 -bash-3.2# First I looked for the Boottime: Booting Linux: 2.6.33 2.6.33tunned ... until Freeing unused kernel memory message (= enter user space) ~4s ~4s ... until login: message (= full multi-user mode) 56s 56s and I did a Performance test with lmbench, see: http://sourceforge.net/projects/lmbench Here the results: (The first 4 rows are the results for the kernel without your patches, the next 4 rows are the results for the kernel with your patches) make[1]: Entering directory `/home/hs/lmbench-3.0-a9/results' L M B E N C H 3 . 0 S U M M A R Y (Alpha software, do not distribute) Basic system parameters -- Host OS Description Mhz tlb cache mem scal pages line par load bytes - - --- - - -- tqm8xxLinux 2.6.33- powerpc-linux-gnu 663216 1.04001 tqm8xxLinux 2.6.33- powerpc-linux-gnu 66 716 1.04001 tqm8xxLinux 2.6.33- powerpc-linux-gnu 66 716 1.04001 tqm8xxLinux 2.6.33- powerpc-linux-gnu 663216 1.04001 tqm8xxLinux 2.6.33- powerpc-linux-gnu 663216 1.04001 tqm8xxLinux 2.6.33- powerpc-linux-gnu 66 716 1.04001 tqm8xxLinux 2.6.33- powerpc-linux-gnu 66 716 1.04001 tqm8xxLinux 2.6.33- powerpc-linux-gnu 663216 1.04001 Processor, Processes - times in microseconds - smaller is better -- Host OS Mhz null null open slct sig sig fork exec sh call I/O stat clos TCP inst hndl proc proc proc - - tqm8xxLinux 2.6.33- 66 2.97 10.3 129. 1377 272. 21.8 91.3 6949 29.K 89.K tqm8xxLinux 2.6.33- 66 3.06 10.5 124. 1375 273. 21.8 91.3 7136 30.K 89.K tqm8xxLinux 2.6.33- 66 3.06 10.6 129. 1365 272. 21.2 96.6 6889 29.K 89.K tqm8xxLinux 2.6.33- 66 3.06 10.5 124. 1309 272. 21.8 101. 
6896 29.K 89.K tqm8xxLinux 2.6.33- 66 2.97 8.86 126. 1336 273. 21.7 84.2 6785 29.K 88.K tqm8xxLinux 2.6.33- 66 3.06 8.90 130. 1343 263. 21.3 84.7 7080 29.K 88.K tqm8xxLinux 2.6.33- 66 3.52 8.97 129. 1339 270. 22.4 84.4 6823 29.K 88.K tqm8xxLinux 2.6.33- 66 2.97 8.99 127. 1333 261. 22.4 87.0 7037 29.K 87.K Basic integer operations - times in nanoseconds - smaller is better --- Host OS intgr intgr intgr intgr intgr bit addmuldivmod - - -- -- -- -- -- tqm8xxLinux 2.6.33- 15.7 18.0 1.5600 124.2 203.1 tqm8xxLinux 2.6.33- 15.7 17.4 1.5800 121.1 202.8 tqm8xxLinux 2.6.33- 15.2 17.9 1.6200 124.2 202.7 tqm8xxLinux 2.6.33- 15.2 17.9 1.6000 125.0 204.0 tqm8xxLinux 2.6.33- 15.7 18.1 1.5600 124.7 204.4 tqm8xxLinux 2.6.33- 15.7 18.1 1.5800 124.2 202.8 tqm8xxLinux 2.6.33- 15.7 17.9 1.5500 124.2 203.2 tqm8xxLinux 2.6.33- 15.7 18.1 1.5500 124.5 202.0 Basic uint64 operations - times in nanoseconds - smaller is better -- Host OS int64 int64 int64 int64 int64 bitaddmuldivmod - - -- -- -- -- -- tqm8xxLinux 2.6.33-15. 13.3 1952.2 1838.2 tqm8xxLinux 2.6.33-15. 13.2 1951.5 1837.8 tqm8xxLinux 2.6.33-15. 13.2 1886.7 1907.8 tqm8xxLinux 2.6.33-15. 13.2 1951.5 1838.2 tqm8xxLinux 2.6.33-15. 13.3 1887.0 1902.2 tqm8xxLinux 2.6.33-15. 13.3 1887.4 1901.5 tqm8xxLinux 2.6.33-15. 13.3 1886.7 1893.0 tqm8xxLinux 2.6.33-15. 13.3 1950.0 1900.4 Basic float operations - times in nanoseconds - smaller is better - Host
Re: [PATCH 0/4] 8xx: Optimize TLB Miss code.
Heiko Schocher h...@denx.de wrote on 2010/03/03 09:02:47: Hello Joakim, I tried your 4 patches on a MPC855M based system: Thanks a lot for testing this for me! -bash-3.2# cat /proc/cpuinfo processor : 0 cpu : 8xx clock : 66.00MHz revision: 0.0 (pvr 0050 ) bogomips: 8.25 timebase: 4125000 platform: TQM8xx model : TQM8xx Memory : 32 MB -bash-3.2# cat /proc/version Linux version 2.6.33-rc6-01500-gbddcb41-dirty (h...@xpert.denx.de) (gcc version 4.2.2) #9 Tue Mar 2 18:08:49 CET 2010 -bash-3.2# First I looked for the Boottime: Booting Linux: 2.6.33 2.6.33tunned ... until Freeing unused kernel memory message (= enter user space)~4s ~4s ... until login: message (= full multi-user mode) 56s56s and I did a Performance test with lmbench, see: http://sourceforge.net/projects/lmbench Here the results: (The first 4 rows are the results for the kernel without your patches, the next 4 rows are the results for the kernel with your patches) make[1]: Entering directory `/home/hs/lmbench-3.0-a9/results' I see both ups and downs in this test, don't quite understand why. What is your config w.r.t SWAP, MODULES, CPU6 and CPU15? L M B E N C H 3 . 0 S U M M A R Y (Alpha software, do not distribute) Basic system parameters -- Host OS Description Mhz tlb cache mem scal pages line par load bytes - - --- - - -- tqm8xxLinux 2.6.33- powerpc-linux-gnu 663216 1.04001 tqm8xxLinux 2.6.33- powerpc-linux-gnu 66 716 1.04001 tqm8xxLinux 2.6.33- powerpc-linux-gnu 66 716 1.04001 tqm8xxLinux 2.6.33- powerpc-linux-gnu 663216 1.04001 tqm8xxLinux 2.6.33- powerpc-linux-gnu 663216 1.04001 tqm8xxLinux 2.6.33- powerpc-linux-gnu 66 716 1.04001 tqm8xxLinux 2.6.33- powerpc-linux-gnu 66 716 1.04001 tqm8xxLinux 2.6.33- powerpc-linux-gnu 663216 1.04001 Processor, Processes - times in microseconds - smaller is better -- Host OS Mhz null null open slct sig sig fork exec sh call I/O stat clos TCP inst hndl proc proc proc - - tqm8xxLinux 2.6.33- 66 2.97 10.3 129. 1377 272. 
21.8 91.3 6949 29.K 89.K tqm8xxLinux 2.6.33- 66 3.06 10.5 124. 1375 273. 21.8 91.3 7136 30.K 89.K tqm8xxLinux 2.6.33- 66 3.06 10.6 129. 1365 272. 21.2 96.6 6889 29.K 89.K tqm8xxLinux 2.6.33- 66 3.06 10.5 124. 1309 272. 21.8 101. 6896 29.K 89.K tqm8xxLinux 2.6.33- 66 2.97 8.86 126. 1336 273. 21.7 84.2 6785 29.K 88.K tqm8xxLinux 2.6.33- 66 3.06 8.90 130. 1343 263. 21.3 84.7 7080 29.K 88.K tqm8xxLinux 2.6.33- 66 3.52 8.97 129. 1339 270. 22.4 84.4 6823 29.K 88.K tqm8xxLinux 2.6.33- 66 2.97 8.99 127. 1333 261. 22.4 87.0 7037 29.K 87.K [SNIP integer/float test, these are not relevant] Context switching - times in microseconds - smaller is better - Host OS 2p/0K 2p/16K 2p/64K 8p/16K 8p/64K 16p/16K 16p/64K ctxsw ctxsw ctxsw ctxsw ctxsw ctxsw ctxsw - - -- -- -- -- -- --- --- tqm8xxLinux 2.6.33- 92.6 109.6 110.9 137.5 173.8 151.8 199.3 tqm8xxLinux 2.6.33- 95.8 108.5 104.7 137.1 172.7 150.9 194.7 tqm8xxLinux 2.6.33- 95.8 118.8 97.5 146.4 162.0 160.8 190.1 tqm8xxLinux 2.6.33- 92.9 111.9 101.0 138.1 166.6 152.3 192.0 tqm8xxLinux 2.6.33- 90.8 108.5 116.2 134.3 171.8 147.1 210.0 tqm8xxLinux 2.6.33- 100.1 111.4 105.0 136.4 173.1 148.3 200.8 tqm8xxLinux 2.6.33- 98.7 111.3 111.8 135.7 172.5 147.9 200.9 tqm8xxLinux 2.6.33- 92.0 117.9 109.9 141.6 170.4 154.9 196.4 *Local* Communication latencies in microseconds - smaller is better - Host OS 2p/0K Pipe AF UDP RPC/ TCP RPC/ TCP ctxsw UNIX UDP TCP conn - - - - - - - - tqm8xxLinux 2.6.33- 92.6 338.4 581. 720.1 1047. 2749 tqm8xxLinux 2.6.33- 95.8
Re: [PATCH 0/4] 8xx: Optimize TLB Miss code.
Heiko Schocher h...@denx.de wrote on 2010/03/03 09:02:47:
> Hello Joakim,
>
> I tried your 4 patches on a MPC855M based system:

Thanks a lot for testing this for me!

> -bash-3.2# cat /proc/cpuinfo
> processor : 0
> cpu       : 8xx
> clock     : 66.00MHz
> revision  : 0.0 (pvr 0050 )
> bogomips  : 8.25
> timebase  : 4125000
> platform  : TQM8xx
> model     : TQM8xx
> Memory    : 32 MB
> -bash-3.2# cat /proc/version
> Linux version 2.6.33-rc6-01500-gbddcb41-dirty (h...@xpert.denx.de) (gcc version 4.2.2) #9 Tue Mar 2 18:08:49 CET 2010
> -bash-3.2#
>
> First I looked at the boot time:
>
> Booting Linux:                                      2.6.33   2.6.33 tuned
> ... until "Freeing unused kernel memory" message
>     (= enter user space)                            ~4s      ~4s
> ... until "login:" message
>     (= full multi-user mode)                        56s      56s
>
> and I did a performance test with lmbench, see:
> http://sourceforge.net/projects/lmbench
>
> Here the results:
> (The first 4 rows are the results for the kernel without your patches,
> the next 4 rows are the results for the kernel with your patches)
>
> make[1]: Entering directory `/home/hs/lmbench-3.0-a9/results'

I see both ups and downs in this test, don't quite understand why.
What is your config w.r.t SWAP, MODULES, CPU6 and CPU15?

Forgot to ask for PIN_TLB too
Re: [PATCH 0/4] 8xx: Optimize TLB Miss code.
Hello Joakim,

Joakim Tjernlund wrote:

Heiko Schocher h...@denx.de wrote on 2010/03/03 09:02:47:

[...]

Here the results:
(The first 4 rows are the results for the kernel without your patches,
the next 4 rows are the results for the kernel with your patches)

make[1]: Entering directory `/home/hs/lmbench-3.0-a9/results'

I see both ups and downs in this test, don't quite understand why.
What is your config w.r.t SWAP, MODULES, CPU6 and CPU15?

Sorry, forgot to say, where to find the sources. You can find them here:

http://git.denx.de/?p=linux-2.6-denx.git;a=shortlog;h=refs/heads/tqm8xx

bye
Heiko
--
DENX Software Engineering GmbH, MD: Wolfgang Denk Detlev Zundel
HRB 165235 Munich, Office: Kirchenstr.5, D-82194 Groebenzell, Germany
Re: [PATCH 0/4] 8xx: Optimize TLB Miss code.
Heiko Schocher h...@denx.de wrote on 2010/03/03 11:10:10:

Hello Joakim,

Joakim Tjernlund wrote:

Heiko Schocher h...@denx.de wrote on 2010/03/03 09:02:47:

[...]

Here the results:
(The first 4 rows are the results for the kernel without your patches,
the next 4 rows are the results for the kernel with your patches)

make[1]: Entering directory `/home/hs/lmbench-3.0-a9/results'

I see both ups and downs in this test, don't quite understand why.
What is your config w.r.t SWAP, MODULES, CPU6 and CPU15?

Sorry, forgot to say, where to find the sources. You can find them here:

http://git.denx.de/?p=linux-2.6-denx.git;a=shortlog;h=refs/heads/tqm8xx

OK, so you got SWAP=no, MODULES=yes, CPU6=no, CPU15=no.
PIN_TLB isn't listed in your defconfig so I assume it is no?

MODULES=yes nullifies one optimization.

I don't understand the bad numbers for Prot Fault:

File & VM system latencies in microseconds - smaller is better
---------------------------------------------------------------------------------------
Host      OS            0K File 0K File 10K File 10K File    Mmap    Prot    Page   100fd
                         Create  Delete   Create   Delete Latency   Fault   Fault   selct
--------- ------------- ------- ------- -------- -------- ------- ------- ------- -------
tqm8xx    Linux 2.6.33-  5917.2  3968.3    31.2K   4329.0  4147.0    18.8    34.1   135.2
tqm8xx    Linux 2.6.33-  5714.3  3937.0    32.3K   6060.6  4210.0    14.2    34.5   131.4
tqm8xx    Linux 2.6.33-  5747.1  4000.0    31.2K   4329.0  4114.0   7.692    34.0   133.1
tqm8xx    Linux 2.6.33-  5747.1  4081.6    30.3K   4273.5  4100.0    18.2    34.2   135.0
tqm8xx    Linux 2.6.33-  5714.3  3952.6    31.2K   4273.5  4130.0    33.5    35.1   136.1
tqm8xx    Linux 2.6.33-  5714.3  3906.2    31.2K   6060.6  4105.0    25.7    35.5   135.9
tqm8xx    Linux 2.6.33-  5681.8  3921.6    32.3K   4255.3  4144.0    23.5    35.0   134.9
tqm8xx    Linux 2.6.33-  5649.7  3937.0    30.3K   4237.3  4116.0    21.6    35.3   135.3

Could you try reverting patch:
  8xx: Don't touch ACCESSED when no SWAP.
and see if that makes a difference?

Turning on pinned TLBs (you must turn on ADVANCED_OPTIONS first) could be
an improvement, regardless of my patches.

 Jocke
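To put a number on the Prot Fault difference discussed above, the two groups of four runs can be compared by mean and spread. This is only a rough sketch in Python: the values are transcribed by hand from the Prot Fault and Page Fault columns of the quoted table, and where the archive ran two columns together the split used here is an assumption. The names `RUNS`, `summarize` and `relative_change` are illustrative.

```python
from statistics import mean, stdev

# Latencies in microseconds, transcribed by hand from the quoted lmbench table.
# "unpatched" = first 4 rows, "patched" = next 4 rows, as stated in the thread.
# Splitting fused columns (e.g. Prot Fault / Page Fault) is an ASSUMPTION.
RUNS = {
    "Prot Fault": {
        "unpatched": [18.8, 14.2, 7.692, 18.2],
        "patched": [33.5, 25.7, 23.5, 21.6],
    },
    "Page Fault": {
        "unpatched": [34.1, 34.5, 34.0, 34.2],
        "patched": [35.1, 35.5, 35.0, 35.3],
    },
}


def summarize(samples):
    """Return (mean, sample standard deviation) of a list of latencies."""
    return mean(samples), stdev(samples)


def relative_change(before, after):
    """Percent change of the mean latency, positive meaning slower."""
    return (mean(after) - mean(before)) / mean(before) * 100.0


for name, groups in RUNS.items():
    m0, s0 = summarize(groups["unpatched"])
    m1, s1 = summarize(groups["patched"])
    change = relative_change(groups["unpatched"], groups["patched"])
    print(f"{name}: {m0:.2f} +/- {s0:.2f} -> {m1:.2f} +/- {s1:.2f} ({change:+.1f}%)")
```

With only four runs per kernel the standard deviation is a coarse indicator, so a difference is only worth chasing when it is large compared with the spread within each group.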
[PATCH 0/4] 8xx: Optimize TLB Miss code.
This set of patches tries to optimize the TLB code on 8xx even more.
If they work, it should be a noticeable performance boost.

I would be very happy if you could test them for me.

 - v2: Since Scott has done some testing of these patches,
       I resend them with my SOB.
       Scott, can you bless these patches too?

Joakim Tjernlund (4):
  8xx: Optimze TLB Miss handlers
  8xx: Avoid testing for kernel space in ITLB Miss.
  8xx: Don't touch ACCESSED when no SWAP.
  8xx: Use SPRG2 and DAR registers to stash r11 and cr.

 arch/powerpc/kernel/head_8xx.S |   70 +++-
 1 files changed, 47 insertions(+), 23 deletions(-)