Re: 2 x quad-core system is slower that 2 x dual core on FreeBSD
Reko Turja wrote on Sun, Dec 02, 2007 at 12:23:15AM +0200:
> On Sat, 01 Dec 2007 23:37:32 +0200, Alexey Vlasov [EMAIL PROTECTED] wrote:
>> kernel:
>> machine i386
>> cpu I686_CPU
>> ident F1RNT1
>> options PAE
>
> One very probable culprit for slowness

Sorry for the late reply, but here are some results for PAE. The slowdown isn't dramatic. Of course, this is just 3 GB of normal RAM + 1 GB of PAE, so 3 GB normal + 9 GB PAE would look worse.

http://www.cons.org/cracauer/crabench/pae.user.html

Martin
-- 
%%% Martin Cracauer [EMAIL PROTECTED] http://www.cons.org/cracauer/
FreeBSD - where you want to go, today. http://www.freebsd.org/
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to [EMAIL PROTECTED]
Re: 2 x quad-core system is slower that 2 x dual core on FreeBSD
Hi

Robert Watson wrote:
> Evidence in-hand seems to suggest that 8 core systems work very well for most users, and reflect a significant performance increase with 7.0 over previous FreeBSD releases.

I disagree with that. Heavily loaded Apache, MySQL, and Postgres do not work well.

>>> The right path forward at this point is to diagnose the problems and work on fixing them in 8-CURRENT and, assuming they are not highly disruptive, MFC them for FreeBSD 7.1.
>> I believe at least the bug with lockmgr contention should be fixed before release.
> Could you point me at the specific proposed change in question? I don't think I've seen it come across re@ as a potential merge request. Changing locking primitives close to a release is, FYI, a risky business, as while it may improve performance in specific cases, we may not have a lot of information about more general cases. We also risk opening up previously nascent race conditions in lock consumers.

Kris sent me a proof-of-concept patch that helped a lot against high lockmgr contention. After applying this patch, the 8-core server became faster than the 4-core one. But, again, it's still slower than Linux.

Here's the patch:
http://lists.freebsd.org/pipermail/freebsd-stable/2007-November/038449.html

Here's Kris saying that it helps:
http://lists.freebsd.org/pipermail/freebsd-stable/2007-November/038672.html

I'm not sure it will help MySQL and Postgres, but the symptoms are mostly identical.

With best regards,
Alexey Popov
Re: 2 x quad-core system is slower that 2 x dual core on FreeBSD
Alexey Popov wrote:
> Hi
>
> Robert Watson wrote:
>> Evidence in-hand seems to suggest that 8 core systems work very well for most users, and reflect a significant performance increase with 7.0 over previous FreeBSD releases.
> I disagree with that. Heavily loaded Apache, MySQL, and Postgres do not work well.

There is another report of such problems:
http://blog.insidesystems.net/articles/2007/04/09/what-did-i-do-wrong
Re: 2 x quad-core system is slower that 2 x dual core on FreeBSD
Krassimir Slavchev wrote:
> There is another report of such problems:
> http://blog.insidesystems.net/articles/2007/04/09/what-did-i-do-wrong

Of course: FreeBSD 6.x is really bad at SMP when the number of CPUs is larger than about 2 and the load includes a lot of kernel work (e.g. I/O, context switches). Numeric tasks (SSL) don't depend on the kernel, so they scale OK. See http://people.freebsd.org/~kris/scaling/Scalability%20Update.pdf for details.

Another interesting issue in this thread is that 7.0 apparently also has a well-defined workload where it fails.
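Ivan's distinction between kernel-heavy loads and purely numeric ones can be framed, as a rough model that nobody in the thread actually invoked, with Amdahl's law. Assume a fraction s of the work is effectively serialized (for instance, behind one contended kernel lock) and the remainder spreads perfectly over N CPUs; the best-case speedup is then as below. The two values of s are illustrative assumptions, not measurements from any system in this thread.

```latex
\begin{aligned}
S(N) &= \frac{1}{s + \frac{1 - s}{N}} \\
s = 0.05:\quad S(8) &= \frac{1}{0.05 + 0.95/8} = \frac{1}{0.16875} \approx 5.93 \\
s = 0.50:\quad S(8) &= \frac{1}{0.50 + 0.50/8} = \frac{1}{0.5625} \approx 1.78
\end{aligned}
```

The model treats the serialized part as a fixed cost, so S(N) never decreases as N grows. Real lock contention tends to cost more as CPUs are added (spinning, cache-line transfers), which is how an 8-core box can end up slower than a 4-core one, an outcome plain Amdahl's law cannot produce.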
Re: 2 x quad-core system is slower that 2 x dual core on FreeBSD
On Tue, 4 Dec 2007, Krassimir Slavchev wrote:
>>> Evidence in-hand seems to suggest that 8 core systems work very well for most users, and reflect a significant performance increase with 7.0 over previous FreeBSD releases.
>> I disagree with that. Heavily loaded Apache, MySQL, and Postgres do not work well.
> There is another report of such problems:
> http://blog.insidesystems.net/articles/2007/04/09/what-did-i-do-wrong

A casual reading suggests that this article is about FreeBSD 6.2, and not FreeBSD 7.0. Am I misreading?

Robert N M Watson
Computer Laboratory
University of Cambridge
Re: 2 x quad-core system is slower that 2 x dual core on FreeBSD
On Tue, 2007-12-04 at 13:00 +0100, Ivan Voras wrote:
> Krassimir Slavchev wrote:
>> There is another report of such problems:
>> http://blog.insidesystems.net/articles/2007/04/09/what-did-i-do-wrong
> [...]
> Another interesting issue in this thread is that 7.0 apparently also has a well-defined workload where it fails.

There is also his follow-up to that post, comparing Postgres on 6.2 with 7.0 (ULE and 4BSD schedulers):
http://blog.insidesystems.net/articles/2007/04/11/postgresql-scaling-on-6-2-and-7-0

I'm very excited about getting some 7.0 servers into testing prior to deployment as production MySQL boxes. Having run 7-CURRENT on my laptop for the best part of 15 months, I think it's super-smashing-great :)

Tom
Re: 2 x quad-core system is slower that 2 x dual core on FreeBSD
Robert Watson wrote:
> On Tue, 4 Dec 2007, Krassimir Slavchev wrote:
>> There is another report of such problems:
>> http://blog.insidesystems.net/articles/2007/04/09/what-did-i-do-wrong
> A casual reading suggests that this article is about FreeBSD 6.2, and not FreeBSD 7.0. Am I misreading?

No, but these tests can be performed on FreeBSD 7.0 4/8-core systems.
Re: 2 x quad-core system is slower that 2 x dual core on FreeBSD
On Tue, 4 Dec 2007, Krassimir Slavchev wrote:
>> A casual reading suggests that this article is about FreeBSD 6.2, and not FreeBSD 7.0. Am I misreading?
> No, but these tests can be performed on FreeBSD 7.0 4/8-core systems.

These are precisely the sorts of tests we have been running. You can read a bit about the tests in Kris's BSDCon.tr presentation:
http://people.freebsd.org/~kris/scaling/7.0%20Preview.pdf

We can't promise improvement on every workload, but we have seen real improvements on a great many workloads. I don't think anyone would argue that there isn't more work to be done, but at some point you have to stabilize and cut a release so that people can use something in the meantime. Releasing a perfect operating system in ten years helps no one. :-)

The real issue at hand is whether we've hit a critical problem that justifies delaying the release in order to refine, test, and merge a change to a critical locking primitive in the kernel. Changing locking primitives, as I mentioned in an earlier post, is a risky thing: after all, it intentionally changes the timing for critical kernel data structures in the file system code. I've given Stephan, the author of the patch, a ping to ask him about this, but late in a release cycle, conservatism is the watchword.

Robert N M Watson
Computer Laboratory
University of Cambridge
Re: 2 x quad-core system is slower that 2 x dual core on FreeBSD
On Tue, 4 Dec 2007, Ivan Voras wrote:
> Krassimir Slavchev wrote:
>> There is another report of such problems:
>> http://blog.insidesystems.net/articles/2007/04/09/what-did-i-do-wrong
> [...]
> Another interesting issue in this thread is that 7.0 apparently also has a well-defined workload where it fails.

There are several known contention points that are high on the list of targets for the 8-CURRENT branch, some hopefully with MFCs in time for 7.1. These include contention on the tcbinfo lock, which protects global TCP data structures, and route table locking, which can affect high packets-per-second transmission on multiple CPUs at a time. lockmgr is also high on the list for optimization, especially since it's an older-style sleep lock constructed out of a mutex and msleep. When we optimized file descriptor locking in 7 (which mostly impacted threaded applications, and was one of the primary sources of improvement for MySQL), it had a construction very similar to the one lockmgr currently has, and the optimization made a very big difference.

Robert N M Watson
Computer Laboratory
University of Cambridge
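Robert describes lockmgr as an older-style sleep lock built from a mutex and msleep. As a rough illustration only (this is Python, not FreeBSD kernel code, and it ignores lockmgr's shared/exclusive modes and its actual wakeup policy), the sketch below builds an exclusive sleep lock the same general way: one interlock mutex guards the lock state, and waiters sleep on a condition variable standing in for msleep()/wakeup(). The class name `SleepLock` and the choice to wake every sleeper on release are assumptions made for the sketch.

```python
import threading


class SleepLock:
    """Toy exclusive sleep lock built from a mutex plus a condition variable.

    Hypothetical illustration: the interlock plays the role of the kernel
    mutex and the condition variable plays the role of msleep()/wakeup().
    Every acquire and release must first take the single interlock.
    """

    def __init__(self) -> None:
        self._interlock = threading.Lock()
        self._cv = threading.Condition(self._interlock)
        self._held = False
        self.sleeps = 0  # number of times a caller had to block

    def acquire(self) -> None:
        with self._interlock:
            while self._held:
                self.sleeps += 1
                # Releases the interlock while asleep, as msleep() does.
                self._cv.wait()
            self._held = True

    def release(self) -> None:
        with self._interlock:
            self._held = False
            # Assumed wake-all policy: every sleeper wakes and re-checks,
            # but only one of them can win the lock.
            self._cv.notify_all()


if __name__ == "__main__":
    lock = SleepLock()
    counter = [0]

    def bump(n: int) -> None:
        for _ in range(n):
            lock.acquire()
            counter[0] += 1
            lock.release()

    workers = [threading.Thread(target=bump, args=(1000,)) for _ in range(8)]
    for t in workers:
        t.start()
    for t in workers:
        t.join()
    print(f"counter={counter[0]} sleeps={lock.sleeps}")
```

Every operation, contended or not, funnels through the one interlock, and a wake-all release lets every sleeper race for a lock only one can win; both costs rise with the number of contending threads. Python's GIL means this will not reproduce real multi-core effects; it only shows the structure.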
Re: 2 x quad-core system is slower that 2 x dual core on FreeBSD
Tom Evans wrote:
> [...]
> There is also his follow-up to that post, comparing Postgres on 6.2 with 7.0 (ULE and 4BSD schedulers):
> http://blog.insidesystems.net/articles/2007/04/11/postgresql-scaling-on-6-2-and-7-0
>
> I'm very excited about getting some 7.0 servers into testing prior to deployment as production MySQL boxes. Having run 7-CURRENT on my laptop for the best part of 15 months, I think it's super-smashing-great :)

I know this thread is about SMP scaling, but most of my machines are UP (Sun Fire X2100), so I ran my own synthetic benchmarks (super-smack on MySQL 5.0.45 and ab on Apache 2.2.6) on an old box with an AMD Barton 2500+ and 512 MB RAM. I was a little disappointed, because FreeBSD 6.2 UP behaves better than FreeBSD 7.0-BETA3 (both 4BSD and ULE tested).

super-smack on 6.2:
Query_type    num_queries  max_time  min_time  q_per_s
select_index  600          0         0         15061.63

super-smack on 7.0 (no matter whether 4BSD or ULE):
Query_type    num_queries  max_time  min_time  q_per_s
select_index  600          0         0         14320.31

Command used (results are from the second run):
super-smack select-key.smack 10 30

The Apache Benchmark result was the same on 6.2 and 7.0 (about 165 req/s), but on 7.0 Apache forks more processes (MPM prefork was used) than on 6.2. On 6.2 Apache has about 40 httpd processes running, but on 7.0 it has about 130, and console response was very, very bad.
Example of top from 7.0 while running ab -c 15 -n 5 http://192.168.1.164/phpinfo.php:

last pid:  1650;  load averages: 83.80, 33.58, 13.71  up 0+00:32:44  12:09:16
170 processes: 126 running, 43 sleeping, 1 zombie
CPU states: 65.9% user,  0.0% nice, 14.1% system, 19.9% interrupt,  0.0% idle
Mem: 140M Active, 20M Inact, 64M Wired, 5028K Cache, 34M Buf, 9708K Free
Swap: 512M Total, 41M Used, 470M Free, 8% Inuse

Console response was better with ULE than with 4BSD, but still not as smooth as in 6.2. So I will postpone the upgrade of all my 6.2 UP machines until 7.x UP behaves better or 6.x reaches EOL.

Miroslav Lachman
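To put the two super-smack figures reported above on a common scale, the small calculation below expresses the 7.0 result as a percentage change from 6.2. The helper name `relative_change` is hypothetical; the two inputs are the q_per_s values from the report.

```python
def relative_change(before: float, after: float) -> float:
    """Percentage change from `before` to `after` (negative means slower)."""
    if before == 0:
        raise ValueError("baseline must be non-zero")
    return (after - before) / before * 100.0


qps_62 = 15061.63  # super-smack select_index q_per_s on 6.2 (from the report)
qps_70 = 14320.31  # same test on 7.0-BETA3, 4BSD or ULE (from the report)

print(f"7.0 vs 6.2: {relative_change(qps_62, qps_70):+.2f}% queries/s")
# prints "7.0 vs 6.2: -4.92% queries/s"
```

That is roughly a 5% drop. With one reported run per system, this alone cannot say whether the gap exceeds normal run-to-run variation; repeating each run several times would settle that.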
Re: 2 x quad-core system is slower that 2 x dual core on FreeBSD
Robert Watson wrote:
> Changing locking primitives, as I mentioned in an earlier post, is a risky thing: after all, it intentionally changes the timing for critical kernel data structures in the file system code. I've given Stephan, the author of the patch, a ping to ask him about this, but late in a release cycle, conservatism is the watchword.

Agreed, but it would be a shame to miss out on the momentum 7.0 has acquired for performance. Web servers are so common that there's a huge chance one of the first things people will do with 7.0 is run some kind of web benchmark, especially after this thread on [EMAIL PROTECTED]. Though (as I read the thread) the patch won't bring FreeBSD in line with Linux, it will keep it from being so slow it's silly.

Re: timings: would looking at past instances give insight into the future? I don't remember the timeline accurately, but in the past, when VFS was converted to MPSAFE and the locking re-engineered, were there such problems? Maybe Peter Holm can run a week or so of constant stress testing (24 hours a day) with the patch to verify it, at least in the short term?
Re: 2 x quad-core system is slower that 2 x dual core on FreeBSD
On Tue, 2007-12-04 at 14:11 +0100, Ivan Voras wrote:
> Robert Watson wrote:
>> [...] late in a release cycle, conservatism is the watchword.
> Agreed, but it would be a shame to miss out on the momentum 7.0 has acquired for performance.
> [...]
> Maybe Peter Holm can run a week or so of constant stress testing (24 hours a day) with the patch to verify it, at least in the short term?

I need to agree with Robert on this one. At some point you need to stop fiddling with nits, cut the release, and then fiddle with the nits in preparation for the next release. As we get closer to the point where we think we can actually do the release, RE needs to weigh the benefits of commit requests against the risks. One of the biggest factors in our evaluation of the benefits is whether a change addresses an issue that completely blocks functionality (due to the bug, the system panics or otherwise fails to do something it should) or merely improves on something. The latter we really need to consider extremely carefully, because it's *possible* that the adjustment would introduce new bugs of the blocks-functionality kind.
And this thread demonstrates, to some degree, exactly why a week of Peter Holm's stress testing doesn't leave us with the warm fuzzy feeling that an adjustment is perfect. It shows it's OK for his synthetic workload. But synthetic workloads of various forms showed improvements in throughput with 7.0 versus 6.3, while other workloads (e.g. the one that started off this thread...) don't. Whether or not 7.0 helps with people's workloads, there is one thing in common throughout this thread: nobody here has been saying the system fails completely (note I said *this* *thread*... :-). RE values that over people getting improved performance for specific workloads at *this* phase of a release cycle.

-- 
Ken Smith
- From there to here, from here to   |  [EMAIL PROTECTED]
  there, funny things are everywhere. |
                   - Theodore Geisel  |
Re: 2 x quad-core system is slower that 2 x dual core on FreeBSD
> Now we also have terribly performing PostgreSQL on an 8-core server. We noticed the slowdown after moving PostgreSQL from a 2xXeon 3.0 Apache+PostgreSQL server to a dedicated PostgreSQL server. I collected some stats (see attachment) before moving to Linux.

I'm sure that some code optimization could help applications benefit from multiple cores. The RapidMind example is proof that there is room to make things better. I read something about the Sun compiler generating code that makes better use of multiple cores to gain speed, and Intel has tried to do the same. Maybe the same amount of work should be put into application optimization, alongside OS changes. Although I like BSD better, I must say that Linux doesn't sleep and wait.

Zoran
Re: 2 x quad-core system is slower that 2 x dual core on FreeBSD
Hi

Mark Linimon wrote:
>> I used 7.0-BETA3 and it is much worse.
> Ouch. A lot of systems see improvement. Thanks for trying it out. I hope that one of the people who have been doing the actual work can now comment (I am just an onlooker), and that you can be patient in the meantime. Unfortunately, Kris, who often looks at these kinds of issues, is traveling for all of December and thus off the net.

Is there any other FreeBSD developer who can take care of performance problems on many-core systems? It seems the upcoming 7-RELEASE and 6.3-RELEASE would be completely unusable for us on that kind of system, i.e. on nearly all modern hardware.

Now we also have terribly performing PostgreSQL on an 8-core server. We noticed the slowdown after moving PostgreSQL from a 2xXeon 3.0 Apache+PostgreSQL server to a dedicated PostgreSQL server. I collected some stats (see attachment) before moving to Linux.

With best regards,
Alexey Popov

last pid: 58755;  load averages: 26.42, 20.88, 14.00  up 25+22:12:42  11:51:11
84 processes:  29 running, 55 sleeping
CPU states:  % user,  % nice,  % system,  % interrupt,  % idle
Mem: 1149M Active, 1971M Inact, 464M Wired, 120M Cache, 214M Buf, 161M Free
Swap: 2048M Total, 72K Used, 2048M Free

  PID USERNAME  THR PRI NICE   SIZE    RES STATE  C   TIME   WCPU COMMAND
58541 pgsql       1  -4    0  1068M   655M semwai 5   0:17 27.08% postgres
58664 pgsql       1   4    0  1068M   458M sblock 0   0:05 25.49% postgres
58677 pgsql       1 129    0  1067M   291M RUN    2   0:04 24.55% postgres
58713 pgsql       1 130    0  1067M   210M RUN    5   0:03 23.99% postgres
58705 pgsql       1 130    0  1069M   214M CPU7   4   0:03 23.03% postgres
58679 pgsql       1 129    0  1068M   306M RUN    1   0:04 22.45% postgres
58724 pgsql       1 130    0  1068M   179M RUN    4   0:02 22.19% postgres
58698 pgsql       1 129    0  1068M   238M RUN    0   0:03 22.19% postgres
58715 pgsql       1 130    0  1068M   188M RUN    0   0:02 21.68% postgres
58727 pgsql       1 131    0  1069M   119M RUN    1   0:01 20.15% postgres
58658 pgsql       1 125    0  1069M   304M CPU0   0   0:03 19.99% postgres
58728 pgsql       1 131    0  1068M   104M RUN    3   0:01 19.57% postgres
58726 pgsql       1  -4    0  1067M   140M semwai 6   0:01 18.83% postgres
58730 pgsql       1 131    0  1067M 96504K RUN    2   0:01 17.42% postgres
58695 pgsql       1 128    0  1069M   194M RUN    0   0:02 16.37% postgres
58731 pgsql       1 131    0  1068M 57016K CPU2   4   0:01 14.77% postgres
58737 pgsql       1 131    0  1067M 53680K RUN    3   0:01 13.45% postgres
58738 pgsql       1 131    0  1067M 50508K RUN    4   0:00 13.45% postgres
58743 pgsql       1 131    0  1067M 29588K CPU4   2   0:00  9.74% postgres
58712 pgsql       1  -4    0  1069M 60488K semwai 6   0:01  9.57% postgres
58733 pgsql       1 131    0  1068M 42968K RUN    6   0:00  8.61% postgres
58742 pgsql       1 131    0  1067M 27284K RUN    1   0:00  6.65% postgres
58740 pgsql       1 131    0  1067M 20096K RUN    7   0:00  5.60% postgres
58736 pgsql       1  -4    0  1067M 26164K semwai 6   0:00  5.38% postgres
58734 pgsql       1 130    0  1068M 33496K RUN    7   0:00  4.04% postgres
58741 pgsql       1   4    0  1067M 23308K sbwait 7   0:00  3.85% postgres
58735 pgsql       1  -4    0  1067M 26152K semwai 5   0:00  3.50% postgres
47990 pgsql       1 132    0  1066M  4300K select 6 163:53  1.51% postgres
58750 pgsql       1 131    0  1067M  6816K RUN    5   0:00  1.00% postgres
58751 pgsql       1 131    0  1067M  6368K RUN    6   0:00  1.00% postgres
58748 pgsql       1 131    0  1067M  6456K CPU6   6   0:00  1.00% postgres
58732 pgsql       1   4    0  1067M  6772K sbwait 4   0:00  0.88% postgres
58744 pgsql       1  -4    0  1067M 10956K semwai 6   0:00  0.51% postgres
58745 pgsql       1   4    0  1067M  6804K sbwait 1   0:00  0.51% postgres

[systat -vmstat output, garbled in the archive. Recoverable figures: 2 users; load 27.56 21.69 14.53 (Dec 3 11:51); 75.9% Sys, 0.4% Intr, 21.3% User, 0.0% Nice, 2.4% Idle; 16147 total interrupts.]
Re: 2 x quad-core system is slower that 2 x dual core on FreeBSD
Hi

Alexey Popov wrote:
> Now we also have terribly performing PostgreSQL on an 8-core server. We noticed the slowdown after moving PostgreSQL from a 2xXeon 3.0 Apache+PostgreSQL server to a dedicated PostgreSQL server. I collected some stats (see attachment) before moving to Linux.

Sorry for the broken top output in my previous message. Here's the correct one.

last pid: 70857;  load averages: 35.05, 37.11, 33  up 25+23:08:00  12:46:29
94 processes:  46 running, 48 sleeping
CPU: 17.0% user,  0.0% nice, 80.5% system,  0.2% interrupt,  2.3% idle
Mem: 1209M Active, 1890M Inact, 494M Wired, 143M Cache, 214M Buf, 127M Free
Swap: 2048M Total, 72K Used, 2048M Free

  PID USERNAME  THR PRI NICE   SIZE    RES STATE  C   TIME   WCPU COMMAN
70557 pgsql       1 131    0  1068M   662M RUN    0   0:23 26.03% postgr
70840 pgsql       1 132    0  1070M   167M CPU6   5   0:02 21.01% postgr
70761 pgsql       1 128    0  1069M   384M RUN    0   0:04 18.91% postgr
70766 pgsql       1 129    0  1069M   414M RUN    7   0:05 18.17% postgr
70784 pgsql       1 128    0  1071M   374M RUN    3   0:04 17.31% postgr
70758 pgsql       1 128    0  1075M   443M RUN    0   0:05 16.91% postgr
70783 pgsql       1 128    0  1073M   393M RUN    0   0:05 16.86% postgr
70781 pgsql       1 128    0  1073M   389M RUN    0   0:05 16.71% postgr
70755 pgsql       1 128    0  1067M   387M RUN    4   0:05 16.67% postgr
70765 pgsql       1 128    0  1075M   424M CPU5   0   0:05 16.51% postgr
70764 pgsql       1 128    0  1069M   388M RUN    6   0:05 16.45% postgr
70786 pgsql       1 128    0  1069M   361M RUN    1   0:04 16.23% postgr
70785 pgsql       1 128    0  1071M   358M RUN    4   0:04 15.76% postgr
70788 pgsql       1 128    0  1069M   330M RUN    0   0:04 15.46% postgr
70795 pgsql       1 -16    0  1068M   300M vmpfw  0   0:04 15.07% postgr
70803 pgsql       1 128    0  1068M   250M RUN    7   0:03 14.71% postgr
70802 pgsql       1 128    0  1068M   268M RUN    0   0:03 14.19% postgr
70805 pgsql       1 128    0  1068M   249M RUN    0   0:03 14.04% postgr
70798 pgsql       1 128    0  1070M   297M RUN    0   0:03 13.92% postgr
70792 pgsql       1 128    0  1068M   288M RUN    0   0:04 13.90% postgr
70804 pgsql       1 128    0  1068M   238M RUN    0   0:03 13.29% postgr
70808 pgsql       1 128    0  1068M   216M RUN    3   0:03 13.28% postgr
70811 pgsql       1 128    0  1069M   212M RUN    0   0:03 12.81% postgr
70833 pgsql       1 130    0  1068M   133M CPU2   3   0:02 12.77% postgr
70843 pgsql       1 131    0  1068M 57636K RUN    7   0:01 12.13% postgr
70834 pgsql       1 130    0  1068M   111M CPU1   2   0:01 11.53% postgr
70850 pgsql       1 132    0  1067M 46620K RUN    1   0:00 11.28% postgr
70817 pgsql       1 128    0  1068M   150M RUN    0   0:02 10.45% postgr
70844 pgsql       1 131    0  1067M 73296K RUN    4   0:01 10.14% postgr
70815 pgsql       1 128    0  1068M   143M RUN    2   0:01  9.57% postgr
70819 pgsql       1 128    0  1068M   158M RUN    4   0:02  9.43% postgr
70832 pgsql       1 129    0  1067M    99M RUN    1   0:01  9.42% postgr

With best regards,
Alexey Popov
Re: 2 x quad-core system is slower that 2 x dual core on FreeBSD
On 3/12/2007 8:50 PM, Alexey Popov wrote:
> Hi
>
> Alexey Popov wrote:
>> Now we also have terribly performing PostgreSQL on an 8-core server. We noticed the slowdown after moving PostgreSQL from a 2xXeon 3.0 Apache+PostgreSQL server to a dedicated PostgreSQL server. I collected some stats (see attachment) before moving to Linux.
> Sorry for the broken top output in my previous message. Here's the correct one.
>
> last pid: 70857;  load averages: 35.05, 37.11, 33  up 25+23:08:00  12:46:29
> 94 processes:  46 running, 48 sleeping
> CPU: 17.0% user,  0.0% nice, 80.5% system,  0.2% interrupt,  2.3% idle
> Mem: 1209M Active, 1890M Inact, 494M Wired, 143M Cache, 214M Buf, 127M Free
> Swap: 2048M Total, 72K Used, 2048M Free

Have you tried testing with different values for kern.hz? I am by no means an expert, but I have stumbled across various postings over the past few years suggesting that the high value (1000) used by modern (5.x+?) kernels can hurt performance for some workloads...

You could try testing with some other values by setting it in /boot/loader.conf, e.g.:

kern.hz=100

Testing 100 and 200 to see how they fare against the default value of 1000 would at least provide some indication as to whether this has any bearing on performance. Someone with better knowledge of the kernel internals may be able to support or dismiss this idea, but as Kris is off on holiday I figured any suggestion was worthwhile! ;-)

I'd also like to say thanks for your efforts to help test and track down the cause of these performance problems. In the end the whole community benefits, so the more you are able to test and help resolve these things, the better for us all... :-)

--Antony
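kern.hz is the number of clock ticks per second, so the values Antony suggests trade tick resolution for fewer clock interrupts. The arithmetic below simply converts each candidate value into the length of one tick; the function name `tick_period_ms` is hypothetical, and whether a coarser tick actually helps this PostgreSQL load is exactly the open question the test would answer.

```python
def tick_period_ms(hz: int) -> float:
    """Length of one clock tick in milliseconds for a given kern.hz value."""
    if hz <= 0:
        raise ValueError("hz must be positive")
    return 1000.0 / hz


# The three values discussed in the message: the default and two lower ones.
for hz in (1000, 200, 100):
    print(f"kern.hz={hz}: {tick_period_ms(hz):g} ms per tick, {hz} ticks per second")
```

Dropping from 1000 to 100 makes each tick ten times longer (1 ms to 10 ms), which means ten times fewer clock ticks to service but coarser timeouts and scheduling granularity.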
Re: 2 x quad-core system is slower that 2 x dual core on FreeBSD
On Mon, 3 Dec 2007, Alexey Popov wrote:
> Mark Linimon wrote:
>>> I used 7.0-BETA3 and it is much worse.
>> Ouch. A lot of systems see improvement.
>> [...]
>> Unfortunately, Kris, who often looks at these kinds of issues, is traveling for all of December and thus off the net.
> Is there any other FreeBSD developer who can take care of performance problems on many-core systems? It seems the upcoming 7-RELEASE and 6.3-RELEASE would be completely unusable for us on that kind of system, i.e. on nearly all modern hardware.

There are many FreeBSD developers who care a great deal about the performance of many-core systems. However, it's also very late in the release cycle for 7.0, and this sort of analysis requires a lot of time, so I don't think we will (or should) see any substantial changes at this point, as they would require us to significantly extend the release cycle in order to test them properly. The right path forward at this point is to diagnose the problems and work on fixing them in 8-CURRENT and, assuming they are not highly disruptive, MFC them for FreeBSD 7.1.

In general, the most important factor in optimizing performance is to get a good collaboration going between developers and someone who can reproduce the problem, ideally in a way that can be shared with developers so they can also reproduce it, and who can provide testing and feedback over an extended period (several months) while the changes are developed and refined. This is part of the role Kris has been playing with a number of FreeBSD developers (Jeff, Attilio, myself, etc.): he set up highly reproducible performance measurements and then worked with us to evaluate various patches to improve performance.
That kind of dynamic is invaluable, but it requires users who care a lot about performance (or whatever other factor it is) to spend a fair amount of time helping us. Whether that means providing a potted benchmark for developers to try out or providing access to a test environment on their own systems, it's still critical. I know from previous messages in the thread that you can't provide access to the actual application, but can you provide some sort of potted substitute that has similar performance properties, be it PHP page sizes, database query load traces that can be replayed, etc.?

Robert N M Watson
Computer Laboratory
University of Cambridge
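A potted, repeatable benchmark of the kind Robert asks for can start very small. The sketch below (plain sh plus awk) repeats the same ApacheBench run several times and prints only the mean "Requests per second" figure, which makes before/after numbers across patches or schedulers easier to compare and post. The script name in the usage comment and the default of three runs are illustrative assumptions. The -n flag is deliberately left out: the ab man page states that -t implies -n 50000 internally, so the 30-second limit is what bounds each run.

```sh
#!/bin/sh
# Sketch: repeat one ApacheBench run and report only the mean requests/second.

# Read ab output on stdin and print the "Requests per second" value.
ab_rps() {
    awk -F: '/^Requests per second/ { split($2, f, " "); print f[1] }'
}

# Usage (script name is hypothetical): ./abloop.sh URL [RUNS]
if [ $# -ge 1 ]; then
    url=$1
    runs=${2:-3}
    i=1
    while [ "$i" -le "$runs" ]; do
        # Same concurrency and time limit as the runs quoted in this thread.
        printf 'run %d: %s rps\n' "$i" "$(ab -c 20 -t 30 "$url" 2>/dev/null | ab_rps)"
        i=$((i + 1))
    done
fi
```

Keeping the parsing in a separate function means the same filter also works on saved ab logs, e.g. `ab_rps < freebsd-ule.txt`.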
Re: 2 x quad-core system is slower that 2 x dual core on FreeBSD
Hi

Alexey Popov wrote:
> Now we also have terribly performing PostgreSQL on an 8-core server. We noticed the slowdown after moving PostgreSQL from a 2xXeon 3.0 Apache+PostgreSQL server to a dedicated PostgreSQL server. I collected some stats (see attachment) before moving to Linux.

FYI, here's top output from the same server with only 2 cores enabled (it works much better than with 8 cores):

  last pid: 38266;  load averages:  7.11,  5.01,  4.   up 0+01:33:40  15:51:20
  53 processes:  7 running, 46 sleeping
  CPU: 69.1% user,  0.0% nice, 29.8% system,  0.4% interrupt,  0.7% idle
  Mem: 835M Active, 1743M Inact, 443M Wired, 168K Cache, 214M Buf, 882M Free
  Swap: 2048M Total, 2048M Free

    PID USERNAME THR PRI NICE   SIZE    RES STATE   C   TIME   WCPU COMMAN
  38139 pgsql      1 132    0  1126M   639M RUN     0   0:06 29.84% postgr
  38245 pgsql      1 130    0  1071M   361M RUN     1   0:01 19.61% postgr
  38249 pgsql      1 130    0  1068M   320M CPU1    0   0:01 15.76% postgr
  38251 pgsql      1 130    0  1069M   317M RUN     1   0:01 15.06% postgr
  38254 pgsql      1 130    0  1067M   161M RUN     1   0:01 14.36% postgr
    694 pgsql      1 130    0  1066M  4816K select  0   0:37  0.00% postgr
    698 pgsql      1  96    0 15588K  4724K select  0   0:28  0.00% postgr
    697 pgsql      1  96    0  1066M   298M select  0   0:14  0.00% postgr
   1748 root       1  96    0  7044K  2252K select  0   0:05  0.00% top
  18775 root       1  96    0  7008K  2220K select  0   0:02  0.00% top
    627 root       1  96    0 18120K  5872K select  0   0:02  0.00% snmpd
   1704 vich       1  96    0 30616K  4248K select  0   0:00  0.00% sshd
    695 pgsql      1  96    0 15392K  4496K select  1   0:00  0.00% postgr
  16781 null       1  96    0 30616K  4220K select  0   0:00  0.00% sshd
    655 root       1  96    0  7732K  2324K select  0   0:00  0.00% ntpd
    556 root       1  96    0  3652K  1192K select  1   0:00  0.00% syslog

With best regards,
Alexey Popov
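The contrast Alexey describes is visible directly in top's CPU summary line: on 8 cores, system time is several times user time, while on 2 cores user time dominates. A small helper that reduces that line to a system/user ratio makes such runs easy to compare side by side. This is an illustrative sketch: the function name is hypothetical, and it assumes the FreeBSD top summary formats ("CPU:" and "CPU states:") quoted in this thread.

```sh
#!/bin/sh
# Sketch: print the system/user CPU time ratio from a top(1) CPU summary line.
# Only the first line starting with "CPU" is used.

sys_user_ratio() {
    awk '/^CPU/ {
        for (i = 2; i <= NF; i++) {
            v = $(i - 1)
            sub(/%$/, "", v)                      # "80.5%" -> "80.5"
            if ($i ~ /^user,?$/)   user = v + 0
            if ($i ~ /^system,?$/) sys  = v + 0
        }
        if (user > 0) printf "%.2f\n", sys / user
        exit
    }'
}

# Usage: take the last of two batch-mode displays, which reflects the
# 2-second sampling interval:
#   top -b -d 2 | grep '^CPU' | tail -n 1 | sys_user_ratio
```

For the two snapshots in this thread it prints 4.74 for the 8-core server (80.5% system vs. 17.0% user) and 0.43 for the same server with 2 cores (29.8% vs. 69.1%).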
Re: 2 x quad-core system is slower that 2 x dual core on FreeBSD
Antony Mawer wrote:
> Have you tried testing with different values for kern.hz? I am by no means an expert, but over the past few years I have stumbled across various postings suggesting that the high value (1000) used by modern (5.x+?) kernels can be detrimental for some workloads...
>
> You could try other values by setting them in /boot/loader.conf, e.g.:
>
>   kern.hz=100

AFAIK this was tried and found irrelevant for this particular load. It may still help others.
Re: 2 x quad-core system is slower that 2 x dual core on FreeBSD
Hi.

Robert Watson wrote:
>> Is there any other FreeBSD developer who can take care of performance problems on many-core systems? It seems the upcoming 7.0-RELEASE and 6.3-RELEASE will be completely unusable for us on that kind of system, i.e. on almost all modern hardware.
>
> There are many FreeBSD developers who care a great deal about the performance of many-core systems. However, it's also very late in the release cycle for 7.0, and this sort of analysis requires a lot of time, so I don't think we will (or should) see any substantial changes at this point, as they would require us to significantly extend the release cycle in order to test them properly.

Is there a reason to release a system that is unable to work on 8-core machines? What will people think when they can't run their old projects after moving to new hardware?

> The right path forward at this point is to diagnose the problems and work on fixing them in 8-CURRENT and, assuming the fixes are not highly disruptive, MFC them for FreeBSD 7.1.

I believe at least the lockmgr contention bug should be fixed before release.

> In general, the most important factor in optimizing performance is to get a good collaboration going with someone who can reproduce the problem (ideally in a way that can be shared with developers, so they can reproduce it too) and who can provide testing and feedback over an extended period (several months) while the changes are developed and refined. This is part of the role Kris has been playing with a number of FreeBSD developers (Jeff, Attilio, myself, etc.): he set up highly reproducible performance measurements and then worked with us to evaluate various patches to improve performance. That kind of dynamic is invaluable, but it requires users who care a lot about performance (or whatever other factor it is) to spend a fair amount of time helping us.
> Whether that means providing a potted benchmark for developers to try out or providing access to a test environment on their own systems, it's still critical. I know from previous messages in the thread that you can't provide access to the actual application, but can you provide some sort of potted substitute that has similar performance properties, be it PHP page sizes, database query load traces that can be replayed, etc.?

I can try to produce synthetic benchmarks based on my workload, but I'm really more interested in real-workload performance. I'm ready to test changes, measure differences and provide any benchmark and profiling information. Apart from the lockmgr contention bug, there seems to be a lot of optimization work to do, because on my workload FreeBSD with the patched lockmgr is still 1.5 times slower than Linux.

With best regards,
Alexey Popov
Re: 2 x quad-core system is slower that 2 x dual core on FreeBSD
On Mon, 3 Dec 2007, Alexey Popov wrote:

> Robert Watson wrote:
>>> Is there any other FreeBSD developer who can take care of performance problems on many-core systems? It seems the upcoming 7.0-RELEASE and 6.3-RELEASE will be completely unusable for us on that kind of system, i.e. on almost all modern hardware.
>>
>> There are many FreeBSD developers who care a great deal about the performance of many-core systems. However, it's also very late in the release cycle for 7.0, and this sort of analysis requires a lot of time, so I don't think we will (or should) see any substantial changes at this point, as they would require us to significantly extend the release cycle in order to test them properly.
>
> Is there a reason to release a system that is unable to work on 8-core machines? What will people think when they can't run their old projects after moving to new hardware?

Evidence in hand seems to suggest that 8-core systems work very well for most users, and show a significant performance increase with 7.0 over previous FreeBSD releases. Obviously, this is not true in all cases, but part of the point of doing a .0 release is to get the technology into the hands of people who want to use it, and part of the point of continuing to support the 6.x release series is to provide the less aggressive feature development that some users need.

>> The right path forward at this point is to diagnose the problems and work on fixing them in 8-CURRENT and, assuming the fixes are not highly disruptive, MFC them for FreeBSD 7.1.
>
> I believe at least the lockmgr contention bug should be fixed before release.

Could you point me at the specific proposed change in question? I don't think I've seen it come across re@ as a potential merge request. Changing locking primitives close to a release is, FYI, a risky business: while it may improve performance in specific cases, we may not have a lot of information about more general cases.
We also risk exposing previously latent race conditions in lock consumers.

>> In general, the most important factor in optimizing performance is to get a good collaboration going with someone who can reproduce the problem (ideally in a way that can be shared with developers, so they can reproduce it too) and who can provide testing and feedback over an extended period (several months) while the changes are developed and refined. This is part of the role Kris has been playing with a number of FreeBSD developers (Jeff, Attilio, myself, etc.): he set up highly reproducible performance measurements and then worked with us to evaluate various patches to improve performance. That kind of dynamic is invaluable, but it requires users who care a lot about performance (or whatever other factor it is) to spend a fair amount of time helping us. Whether that means providing a potted benchmark for developers to try out or providing access to a test environment on their own systems, it's still critical. I know from previous messages in the thread that you can't provide access to the actual application, but can you provide some sort of potted substitute that has similar performance properties, be it PHP page sizes, database query load traces that can be replayed, etc.?
>
> I can try to produce synthetic benchmarks based on my workload, but I'm really more interested in real-workload performance. I'm ready to test changes, measure differences and provide any benchmark and profiling information. Apart from the lockmgr contention bug, there seems to be a lot of optimization work to do, because on my workload FreeBSD with the patched lockmgr is still 1.5 times slower than Linux.
Obviously, we are interested in the real workload too, but there are times when we have to accept whatever synthetic benchmarks we can get our hands on instead of real benchmarks that people won't give us, because they incorporate proprietary technology or business-sensitive information, or are simply too complex to reproduce. If you can give us the exact workload to reproduce on our systems, that's much better than a synthetic benchmark; if you can't, then a synthetic benchmark is what we'll have to work with.

I suggest we move this thread to the performance@ mailing list. If possible, could you start it there with a summary of the workload and the investigation so far?

Robert N M Watson
Computer Laboratory
University of Cambridge
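One way to approximate the "load traces that can be replayed" that Robert mentions, without handing over the application itself, is to extract request paths from an Apache access log and replay them against a test copy of the site. This sketch assumes the common/combined log format, where the quoted method and the request path are the 6th and 7th whitespace-separated fields, and keeps only GET requests; the function name and the host name in the usage comment are placeholders. A sequential replay like this does not reproduce the original concurrency, so it complements an ab run rather than replacing it.

```sh
#!/bin/sh
# Sketch: turn an Apache access log (common/combined format) into a list of
# GET request paths that can be replayed later.

log_paths() {
    # $6 is the quoted method, e.g. "GET ; $7 is the request path.
    awk '$6 == "\"GET" { print $7 }'
}

# Usage (test-host is a placeholder):
#   log_paths < access_log > paths.txt
#   while read -r p; do fetch -q -o /dev/null "http://test-host$p"; done < paths.txt
```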
Re: 2 x quad-core system is slower that 2 x dual core on FreeBSD
On Sat, Dec 01, 2007 at 11:04:41PM +0100, Daniel Gerzo wrote:
> Please try with RELENG_7 (aka. FreeBSD 7.0-BETA3) and the ULE scheduler.

I used 7.0-BETA3 and it is much worse. ULE, without PAE (or with PAE):

  # ./ab -n 100 -c 20 -t 30 http://somesite-freebsd.com/ab/
  This is ApacheBench, Version 2.0.40-dev $Revision: 1.146 $ apache-2.0
  Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
  Copyright 2006 The Apache Software Foundation, http://www.apache.org/

  Benchmarking test-f1-apache-aux2.1gb.ru (be patient)
  Finished 17 requests

  Server Software:        Apache/2.2.3
  Server Hostname:        somesite-freebsd.com
  Server Port:            80

  Document Path:          /ab/
  Document Length:        41451 bytes

  Concurrency Level:      20
  Time taken for tests:   30.448737 seconds
  Complete requests:      17
  Failed requests:        0
  Write errors:           0
  Total transferred:      1191762 bytes
  HTML transferred:       1178622 bytes
  Requests per second:    0.56 [#/sec] (mean)
  Time per request:       35822.043 [ms] (mean)
  Time per request:       1791.102 [ms] (mean, across all concurrent requests)
  Transfer rate:          38.20 [Kbytes/sec] received

  Connection Times (ms)
                min  mean[+/-sd] median   max
  Connect:        0    0   0.9      0       2
  Processing:   490 4160 8103.9    640   25972
  Waiting:       91  125  70.4    110     394
  Total:        490 4160 8103.8    640   25972

  Percentage of the requests served within a certain time (ms)
    50%    631
    66%    709
    75%    721
    80%    734
    90%  19495
    95%  25972
    98%  25972
    99%  25972
   100%  25972 (longest request)

Do you have any more ideas? I know that I could switch to amd64, but I'm sure that won't solve my problems. As far as I remember, it didn't help Alexey Popov (the author of this thread). And by the way, I couldn't launch Zend Optimizer (3.3.0) on amd64; it gave me "Segmentation fault: 11 (core dumped)":

http://www.zend.com/forums/index.php?t=msggoto=13585S=a322ef7edb5d49c70f431607e648fb57srch=amd64+freebsd#msg_13585

And without it, as you understand, virtual hosting is nothing. Looking through the FreeBSD mailing lists, I noticed the same description of the same problem I have:
http://lists.freebsd.org/pipermail/freebsd-performance/2007-July/002781.html

--
BRGDS. Alexey Vlasov.
Re: 2 x quad-core system is slower that 2 x dual core on FreeBSD
On Sunday 02 December 2007, Alexey Vlasov wrote:
> I used 7.0-BETA3 and it is much worse.

Ouch. A lot of systems see improvement. Thanks for trying it out.

I hope that one of the people who have been doing the actual work can now comment (I am just an onlooker), and that you can be patient in the meantime. Unfortunately, Kris, who often looks at these kinds of issues, is traveling for all of December and thus off the net.

mcl
Re: 2 x quad-core system is slower that 2 x dual core on FreeBSD
Hi,

It seems that I'm not the only one who has found that FreeBSD performs poorly on multiprocessor platforms.

I use Linux on my hosting web servers. All servers use the same motherboard, an S5000PAL (SR1500), with two quad-core Xeon E5320 or E5345 CPUs and 8 GB RAM. I decided to install FreeBSD 6.2 i386 on one of the servers, and the result was very poor performance.

Software used: Apache 2.2.6 (worker) as the frontend proxy, and Apache 2.2.3 (prefork) as the backend.

By the time we understood that something was wrong with FreeBSD, about 10 high-load sites and about a hundred ordinary ones had already been placed on it, and that was the limit for FreeBSD. It came with a huge number of context switches, about a hundred thousand. I attached a log of what was happening on FreeBSD at the time.

After playing with ab (ApacheBench) options, it turned out that even the following options can bring the server down completely:

  ./ab -n 100 -c 20 -t 30 http://somesite-freebsd.com

At the same time I copied somesite.com (PHP scripts) to a Linux server, launched ab with the same options, and saw that it had no influence on the server's work.
(And by the way, about 1.5 virtual hosts run on that server.)

All Apache options are the same on Linux and FreeBSD.

FreeBSD:

  This is ApacheBench, Version 2.0.40-dev $Revision: 1.146 $ apache-2.0
  Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
  Copyright 2006 The Apache Software Foundation, http://www.apache.org/

  Benchmarking somesite-freebsd.com (be patient)
  Finished 29 requests

  Server Software:        Apache/2.2.3
  Server Hostname:        somesite-freebsd.com
  Server Port:            80

  Document Path:          /ab/
  Document Length:        41450 bytes

  Concurrency Level:      20
  Time taken for tests:   30.44765 seconds
  Complete requests:      29
  Failed requests:        22
     (Connect: 0, Length: 22, Exceptions: 0)
  Write errors:           0
  Total transferred:      1529557 bytes
  HTML transferred:       1513497 bytes
  Requests per second:    0.97 [#/sec] (mean)
  Time per request:       20720.527 [ms] (mean)
  Time per request:       1036.026 [ms] (mean, across all concurrent requests)
  Transfer rate:          49.69 [Kbytes/sec] received

  Connection Times (ms)
                min  mean[+/-sd] median   max
  Connect:        0  1760 1503.4   3002    3002
  Processing:   866 13328 9460.5  13853   26246
  Waiting:      139  2286 2319.0   1129    6764
  Total:        871 15089 10642.6 16855   29248

  Percentage of the requests served within a certain time (ms)
    50%  16705
    66%  22670
    75%  25439
    80%  26342
    90%  29160
    95%  29188
    98%  29248
    99%  29248
   100%  29248 (longest request)

Linux (the same site):

  This is ApacheBench, Version 2.0.40-dev $Revision: 1.146 $ apache-2.0
  Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
  Copyright 2006 The Apache Software Foundation, http://www.apache.org/

  Benchmarking linux.1gb.ru (be patient)
  Finished 814 requests

  Server Software:        Apache/2.2.3
  Server Hostname:        somesite-linux.com
  Server Port:            80

  Document Path:          /ab/
  Document Length:        41451 bytes

  Concurrency Level:      20
  Time taken for tests:   30.3216 seconds
  Complete requests:      814
  Failed requests:        759
     (Connect: 0, Length: 759, Exceptions: 0)
  Write errors:           0
  Non-2xx responses:      1
  Total transferred:      34430291 bytes
  HTML transferred:       34126461 bytes
  Requests per second:    27.13 [#/sec] (mean)
  Time per request:       737.180 [ms] (mean)
  Time per request:       36.859 [ms] (mean, across all concurrent requests)
  Transfer rate:          1120.65 [Kbytes/sec] received

  Connection Times (ms)
                min  mean[+/-sd] median   max
  Connect:        0    0   0.0      0       0
  Processing:   214  725 575.2    557    5001
  Waiting:       41  265 376.0    131    3280
  Total:        214  725 575.2    557    5001

  Percentage of the requests served within a certain time (ms)
    50%    557
    66%    716
    75%    863
    80%    967
    90%   1398
    95%   1749
    98%   2529
    99%   3064
   100%   5001 (longest request)

  # cat /etc/sysctl.conf
  security.bsd.see_other_uids=0
  kern.maxfiles=204800
  kern.maxfilesperproc=202400

kernel:

  machine   i386
  cpu       I686_CPU
  ident     F1RNT1
  options   PAE
  options   SMP
  options   SCHED_4BSD
  options   PREEMPTION
  options   INET
  options   FFS
  options   SOFTUPDATES
  options   UFS_ACL
  options   UFS_DIRHASH
  options   NULLFS
  options   MD_ROOT
  options   CD9660
  options   PROCFS
  options   PSEUDOFS
  options   GEOM_GPT
  options   GEOM_LABEL
  options   GEOM_MIRROR
  options   COMPAT_43
  options   COMPAT_FREEBSD4
  options   COMPAT_FREEBSD5
  options   SCSI_DELAY=5000
  options   KTRACE
  options   SYSVSHM
  options   SYSVMSG
  options   SYSVSEM
  options   _KPOSIX_PRIORITY_SCHEDULING
  options
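The context-switch storm described in this report can be watched live with vmstat(8), whose "cs" column shows the context-switch rate. The helper below is a sketch that prints only that column; it locates the column by name in the header rather than by a fixed position, because the number of disk columns varies between machines. It assumes the data fields are whitespace-separated and line up one-to-one with the header fields, which can break if very large values run together; the function name is hypothetical.

```sh
#!/bin/sh
# Sketch: print only the context-switch ("cs") column of vmstat(8) output.

vmstat_cs() {
    awk '!col && / cs / {
             # Header line: remember which field is named "cs".
             for (i = 1; i <= NF; i++) if ($i == "cs") col = i
             next
         }
         # Data lines only; repeated header lines are skipped.
         col && $col ~ /^[0-9]+$/ { print $col }'
}

# Usage: vmstat 1 | vmstat_cs
```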
Re: 2 x quad-core system is slower that 2 x dual core on FreeBSD
On Sun, Dec 02, 2007 at 12:37:32AM +0300, Alexey Vlasov wrote:
> I decided to install FreeBSD 6.2 i386 on one of the servers, and the result was totally non-productive.

The 6.x series was intended to get us back to the stability that we had before SMP integration. I believe we mostly succeeded. One of the major thrusts of 7.0 development was to fix the performance regressions that had been introduced.

From the results that I have seen (I am not one of the participants), there has been major progress over the past 2 years in removing one bottleneck after another. Recent tests show us to be on a par with Linux on a number of benchmarks; of course, we need more people testing 7.0 in real-world environments to confirm this.

You may want to try the 7.0 release candidate on a testbed to see if your results have improved as much as we think they will have.

mcl
Re: 2 x quad-core system is slower that 2 x dual core on FreeBSD
> I use OS Linux on my hosting for web-servers, base for all servers is the same m/b S5000PAL ( SR1500), 2 quad kernel cpu Xeon E5320 or E5345, 8Gb RAM. I decided to install FreeBSD 6.2 i386 on one of the servers,

To be a bit more specific than my previous reply: in order to use SCHED_ULE you need to be running 7.x (which is quite stable already, even as a beta). And of course with 64-bit hardware it's best to run the amd64 version of the OS.

-Reko
Re: 2 x quad-core system is slower that 2 x dual core on FreeBSD
On Sat, 01 Dec 2007 23:37:32 +0200, Alexey Vlasov [EMAIL PROTECTED] wrote:
> kernel:
> machine i386
> cpu I686_CPU
> ident F1RNT1
> options PAE

One very probable culprit for slowness.

> options SMP
> options SCHED_4BSD

Using SCHED_ULE might yield a bit more performance as well.

> # cat /etc/make.conf
> CPUTYPE?=nocona
> CFLAGS=-O2 -pipe

I think the recommended practice these days is either to use CFLAGS+=your flags or to put the local compiler tweaks in COPTFLAGS. Not sure if this affects performance, though.

-Reko
Re[2]: 2 x quad-core system is slower that 2 x dual core on FreeBSD
Hello Alexey,

Saturday, December 1, 2007, 10:37:32 PM, you wrote:
> I use OS Linux on my hosting for web-servers, base for all servers is the same m/b S5000PAL ( SR1500), 2 quad kernel cpu Xeon E5320 or E5345, 8Gb RAM. I decided to install FreeBSD 6.2 i386 on one of the servers, and the result was totally non productive.

Please try with RELENG_7 (aka. FreeBSD 7.0-BETA3) and the ULE scheduler.

--
Best regards,
Daniel                            mailto:[EMAIL PROTECTED]
Re: 2 x quad-core system is slower that 2 x dual core on FreeBSD
>> options PAE
>
> One very probable culprit for slowness

I'd say it IS the culprit. PAE is known to decrease performance, and this is probably 95% of the cause.

> Using SCHED_ULE might yield a bit more performance as well

Yes, in 7.0-BETA3 I'm seeing a 7% increase in performance (sysbench with 8 threads on a 4-core system) with ULE over 4BSD.

Both great suggestions. If he needs the high memory support, I would test without PAE just to measure the performance (along with changing to the ULE scheduler), then rebuild the system later with amd64 so he doesn't have to use the PAE hack.

Regards,
Josh
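Josh's suggestion (measure without PAE and with ULE before moving to amd64) can be set up by deriving a second kernel config from the existing one. The sketch below does that mechanically with awk; it assumes the one-option-per-line layout of FreeBSD kernel config files and the F1RNT1 config name quoted earlier in the thread, and the function name is hypothetical. As noted in the thread, ULE is only a reasonable choice on 7.x, and an i386 kernel without PAE can address at most 4 GB of physical memory (less in practice), so this is for measurement, not for the 8 GB production setup.

```sh
#!/bin/sh
# Sketch: derive a test kernel config without PAE and with SCHED_ULE.
# Reads a kernel config on stdin, writes the modified config on stdout.

to_ule_nopae() {
    awk '$1 == "options" && $2 == "PAE"        { next }
         $1 == "options" && $2 == "SCHED_4BSD" { print "options\tSCHED_ULE"; next }
         $1 == "ident"                          { print "ident\t" $2 "_ULE"; next }
         { print }'
}

# Usage (as root, after sourcing this file):
#   cd /usr/src/sys/i386/conf && to_ule_nopae < F1RNT1 > F1RNT1_ULE
#   cd /usr/src && make buildkernel KERNCONF=F1RNT1_ULE && make installkernel KERNCONF=F1RNT1_ULE
```

Renaming the ident keeps the test kernel distinguishable in `uname -a` output when posting results.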
Re: 2 x quad-core system is slower that 2 x dual core on FreeBSD
Alexey Popov wrote:
> Hi
>
> Kris Kennaway wrote:
>> One more patch which may or may not help is: http://www.freebsd.org/~jhb/patches/namei_rwlock.patch (it may also require porting, since it was against an older version of 7.0-CURRENT). When I have tested this in the past it was a performance loss for reasons that I think I understand (basically, it is locally a performance improvement for the name cache but also requires a fixed lockmgr to avoid an overall performance loss), but I don't remember if I tested it in conjunction with the lockmgr patch.
>
> This patch doesn't apply to 7-STABLE because /sys/kern/vfs_cache.c has changed significantly since rev. 1.108. I tried to patch it manually but don't know what to do with the cache_lookup() changes.

OK, I am about to go on vacation so I am not able to help with either of these things.

Kris

>> There are patches you need to enable it on woodcrest. They are in my p4 branch (kris-contention) but I don't have time right now to extract them.
>
> I think it would be very useful, because I can't see any other way to profile FreeBSD on modern many-core machines.
>
>> You can extract the changeset from my branch via http://perforce.freebsd.org. Unfortunately I don't have time to do it myself.
>
> I'll try it if it does not also need porting.
>
> With best regards,
> Alexey Popov
Re: 2 x quad-core system is slower that 2 x dual core on FreeBSD
Hi

Kris Kennaway wrote:
> One more patch which may or may not help is: http://www.freebsd.org/~jhb/patches/namei_rwlock.patch (it may also require porting, since it was against an older version of 7.0-CURRENT). When I have tested this in the past it was a performance loss for reasons that I think I understand (basically, it is locally a performance improvement for the name cache but also requires a fixed lockmgr to avoid an overall performance loss), but I don't remember if I tested it in conjunction with the lockmgr patch.

This patch doesn't apply to 7-STABLE because /sys/kern/vfs_cache.c has changed significantly since rev. 1.108. I tried to patch it manually but don't know what to do with the cache_lookup() changes.

> There are patches you need to enable it on woodcrest. They are in my p4 branch (kris-contention) but I don't have time right now to extract them.

I think it would be very useful, because I can't see any other way to profile FreeBSD on modern many-core machines.

> You can extract the changeset from my branch via http://perforce.freebsd.org. Unfortunately I don't have time to do it myself.

I'll try it if it does not also need porting.

With best regards,
Alexey Popov
Re: 2 x quad-core system is slower that 2 x dual core on FreeBSD
Alexey Popov wrote:
> Hi
>
> Kris Kennaway wrote:
>>>>> Now FreeBSD 7-STABLE ULE 8-core server without optimized PHP realpath_cache_size (producing 2000+ lstats per request) can handle up to ~24 rps as opposed to max. 17 rps without your patch. %sys never grows over %user with your patch. On the server with optimized realpath_cache_size there's no visible influence of your patch.
>>>> You said 20 before for this configuration, so I'm a bit suspicious about how seriously to treat your measurements :)
>>> Sorry, my mistake. s/ULE/4BSD.
>> OK, please compare ULE to ULE with and without my patch (and remember to enable the sysctl), and obtain lock profiling traces in both cases under identical workload durations. That is what I need to proceed with this issue.
>
> I didn't measure the exact requests-per-second values on ULE with and without the patch, but at first glance the benefits of the patch are similar to 4BSD. If you need these values, I'll obtain them.
>
> Here you can find lock profiling results for the 7-BETA3 GENERIC kernel with SCHED_ULE, running optimized and unoptimized PHP, with and without your patch:
>
> http://83.167.98.162/gprof/lockmgr/
>
> This data was collected by the following script:
>
>   (sysctl debug.lock.prof.reset=1
>    sysctl debug.lock.prof.enable=1
>    sleep 60
>    sysctl debug.lock.prof.enable=0
>    sysctl debug.lock.prof.stats
>    top -d 2 -b | tail -25)
>
> AFAIU there's still high contention on the lockbuilder mtxpool with the patch applied. But hopefully the lockmgr:ufs contention, which I believe produced the 80% system CPU load, is gone with your patch.

Looks to me like lockmgr-related contention was reduced by 1 to 2 orders of magnitude, which is the expected result. This surely must have a measurable impact on your workload. Further lockmgr improvement will have to wait until the lockmgr replacement work proceeds.

One more patch which may or may not help is: http://www.freebsd.org/~jhb/patches/namei_rwlock.patch (it may also require porting, since it was against an older version of 7.0-CURRENT).
When I have tested this in the past it was a performance loss for reasons that I think I understand (basically, it is locally a performance improvement for the name cache but also requires a fixed lockmgr to avoid an overall performance loss), but I don't remember if I tested it in conjunction with the lockmgr patch.

>> There are patches you need to enable it on woodcrest. They are in my p4 branch (kris-contention) but I don't have time right now to extract them.
>
> I think it would be very useful, because I can't see any other way to profile FreeBSD on modern many-core machines.

You can extract the changeset from my branch via http://perforce.freebsd.org. Unfortunately I don't have time to do it myself.

Kris
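The lock-profiling sequence quoted in this exchange can be wrapped in a function so that every sample covers exactly the same duration, which is what Kris asks for when comparing patched and unpatched kernels. This is a sketch: it assumes a kernel built with `options LOCK_PROFILING`, which is what provides the debug.lock.prof.* sysctls. The function name and the SYSCTL variable are hypothetical; SYSCTL exists only so the sequence can be dry-run with `SYSCTL=echo`. The top(1) snapshot from the original one-liner is left to the caller.

```sh
#!/bin/sh
# Sketch: take one lock-profiling sample of a fixed length.
# $1 = sample length in seconds (default 60).
# SYSCTL (hypothetical hook) = command used instead of sysctl(8).

lockprof_sample() {
    _lp_secs=${1:-60}
    _lp_sysctl=${SYSCTL:-sysctl}
    "$_lp_sysctl" debug.lock.prof.reset=1     # clear old statistics
    "$_lp_sysctl" debug.lock.prof.enable=1    # start collecting
    sleep "$_lp_secs"
    "$_lp_sysctl" debug.lock.prof.enable=0    # stop collecting
    "$_lp_sysctl" debug.lock.prof.stats       # dump the results
}

# Usage, one file per configuration under identical load:
#   lockprof_sample 60 > lockprof-ule-patched.txt
#   top -d 2 -b | tail -25 >> lockprof-ule-patched.txt
```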
Re: 2 x quad-core system is slower that 2 x dual core on FreeBSD
Hi.

Kris Kennaway wrote:
>> Now FreeBSD 7-STABLE ULE 8-core server without optimized PHP realpath_cache_size (producing 2000+ lstats per request) can handle up to ~24 rps as opposed to max. 17 rps without your patch. %sys never grows over %user with your patch. On the server with optimized realpath_cache_size there's no visible influence of your patch.
>
> You said 20 before for this configuration, so I'm a bit suspicious about how seriously to treat your measurements :)

Sorry, my mistake. s/ULE/4BSD.

> Anyway, please obtain another lock profiling trace using the same conditions as the previous one (same workload duration, etc.), so we can compare what changed.

OK, I'll do it a little later.

Also, while trying to find what else is slow in FreeBSD, I tried hwpmc both as a module and compiled into the kernel, but it fails with this error:

  pmc: Unknown Intel CPU.
  module_register_init: MOD_LOAD (hwpmc, 0x804833e0, 0x809338a0) error 78

This is related to http://www.freebsd.org/cgi/query-pr.cgi?pr=amd64%2F111994cat= and makes it impossible to use hwpmc with modern CPUs. Is kgmon profiling usable on FreeBSD 7?

With best regards,
Alexey Popov
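The figure of 2000+ lstat calls per request (without a tuned realpath_cache_size) can be checked with ktrace(1) and kdump(1). The helper below only counts lstat CALL records; it assumes kdump's default line layout of PID, command name, record type, then syscall(args). The function name is hypothetical, and PID in the usage comment is a placeholder for one httpd or php-cgi process. Dividing the count by the number of requests served during the trace gives a rough per-request figure; keep in mind that tracing itself slows the process down.

```sh
#!/bin/sh
# Sketch: count lstat(2) calls in kdump(1) output read from stdin.

count_lstat() {
    awk '$3 == "CALL" && $4 ~ /^lstat\(/ { n++ } END { print n + 0 }'
}

# Usage (PID is a placeholder):
#   ktrace -t c -p PID; sleep 10; ktrace -C    # trace syscalls for ~10 s
#   kdump -f ktrace.out | count_lstat
```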
Re: 2 x quad-core system is slower that 2 x dual core on FreeBSD
Joseph Koshy wrote:
>>> Also, I tried to find what else is slow in FreeBSD. I tried hwpmc as a module and in the kernel, but it fails with this error:
>>>
>>>   pmc: Unknown Intel CPU.
>>>   module_register_init: MOD_LOAD (hwpmc, 0x804833e0, 0x809338a0) error 78
>>
>> There are patches you need to enable it on woodcrest. They are in my p4 branch (kris-contention) but I don't have time right now to extract them.
>
> These patches make hwpmc treat these CPUs as possessing Pentium-Pro class PMCs. Unfortunately, this is easy to do, but incorrect:
>
> - There are differences in the legal bit values that may be loaded into PMC registers for many hardware events.
> - hwpmc needs to be taught to support measurements on CPUs with multiple cores per package.
>
> And then there is additional work to support these CPUs at the same level as the current set:
>
> - The supported hardware events are named differently; the documentation and libpmc's event selector parsing code need to be changed to suit.
> - The hardware supports a new class of fixed-function PMCs that hwpmc needs to support.

Well, this is all true, but it overlooks the point that it does minimally work, which is of critical importance to people with one of these CPUs who want to actually use your tool ;)

Kris
Re: 2 x quad-core system is slower that 2 x dual core on FreeBSD
Krassimir Slavchev wrote:
> That's true, but if the tests are the same then they can be compared.
>
>> - the code is most likely checking for changes in PHP libraries)
>
> This is not recommended for production systems.

PHP code accelerators / caches do that all the time. require_once() also does it.

> Yes, maybe it is easier to write Perl/PHP scripts.

I'm glad you're volunteering :)
Re: 2 x quad-core system is slower that 2 x dual core on FreeBSD
Ivan Voras wrote:
> On 23/11/2007, Krassimir Slavchev [EMAIL PROTECTED] wrote:
>> Would someone define what exact tests should be performed? OK, using ab is fine, but with what parameters, and against what: a script or static HTML? It would be good to have some Perl,
>
> In this thread, it's always PHP code, with database backends.
>
>> PHP ... scripts or C programs written which simulate some kind of 'real world' work.
>
> The problem is that a realistic application does a lot of things that are not easily simulated:

That's true, but if the tests are the same then they can be compared.

> - it usually has a lot of code, lots of include files, libraries, etc. (so it stresses file systems, as was shown with fstat() in the thread - the code is most likely checking for changes in PHP libraries)

This is not recommended for production systems.

> - it uses a database, which is populated with real-world data (so it has a lot of IPC of very varied sizes)
> - it uses some kind of caching, both of compiled PHP code (eAccelerator, pecl-APC) and of data (eAccelerator, memcached) (which uses SysV SHM and IPC).
>
> Reducing all that to a C file that does all of it is very nontrivial.

Yes, maybe it is easier to write Perl/PHP scripts.

> For classic setups with mod_php, it's not uncommon that httpd processes grow to 100 MB or more each, with all the heavy stuff brought in.

Yes, that is true for mod_perl too. However, it is hard to simulate a real workload.

I will soon have two 2x quad-core (X5450) systems with 8 GB RAM (DL380G5), and about a month to play with them before putting them into production. If someone wishes, I can run specific tests on them.

Best Regards
Re: 2 x quad-core system is slower that 2 x dual core on FreeBSD
Alexey Popov wrote: Hi. Kris Kennaway wrote: Now the FreeBSD 7-STABLE ULE 8-core server without an optimized PHP realpath_cache_size (producing 2000+ lstat calls per request) can handle up to ~24 rps, as opposed to a maximum of 17 rps without your patch. %sys never exceeds %user with your patch. On the server with an optimized realpath_cache_size, your patch has no visible effect. You said 20 before for this configuration, so I'm a bit suspicious about how seriously to treat your measurements :) Sorry, my mistake. s/ULE/4BSD. OK, please compare ULE to ULE with and without my patch (remembering to enable the sysctl), and obtain lock profiling traces in both cases under identical workload durations. That is what I need to proceed with this issue. Anyway, please obtain another lock profiling trace using the same conditions as the previous one (same workload duration, etc.), so we can compare what changed. OK, I'll do it a little later. I also tried to find what else is slow in FreeBSD. I tried hwpmc both as a module and compiled into the kernel, but it fails with the error: pmc: Unknown Intel CPU. module_register_init: MOD_LOAD (hwpmc, 0x804833e0, 0x809338a0) error 78 There are patches you need to enable it on Woodcrest. They are in my p4 branch (kris-contention), but I don't have time right now to extract them. This is related to http://www.freebsd.org/cgi/query-pr.cgi?pr=amd64%2F111994cat= and it is impossible to use hwpmc with modern CPUs. Sounds like it. Is kgmon profiling usable on FreeBSD 7? I've never bothered; it is likely to be quite slow, so it could completely change the workload you are trying to profile. Kris
Re: 2 x quad-core system is slower that 2 x dual core on FreeBSD
Ivan Voras wrote: On 20/11/2007, Alexey Popov [EMAIL PROTECTED] wrote: CPU states: 5.9% user, 0.0% nice, 81.3% system, 0.0% interrupt, 12.8% idle CPU states: 82.2% user, 0.0% nice, 13.8% system, 0.0% interrupt, 4.0% idle Interesting coincidence: 1 CPU generates almost 8x less sys time than 8 CPUs. But it seems that you have found something real. Inspired by your problem, I've done a simple measurement (ab) on a 4-CPU (2x2-core Opteron 2216 HE, PAE) machine I maintain, under these circumstances: Would someone define exactly what tests should be performed? OK, using ab is fine, but with what parameters, and against what: a script or static HTML? It would be good to have some Perl, PHP ... scripts or C programs that simulate some kind of 'real world' work. There are a lot of people thinking 'it is good for me' (including me), but what can be done with such hardware? Best Regards - a heavy PHP application - FastCGI - in this case, a load of 4 clients - on 6-STABLE and I'm reporting similar findings:
last pid:  2254;  load averages:  1.43,  0.92,  0.69   up 71+08:23:06  18:00:31
153 processes: 8 running, 144 sleeping, 1 zombie
CPU states: 38.8% user,  0.0% nice, 48.4% system,  3.2% interrupt,  9.6% idle
Mem: 2321M Active, 1135M Inact, 313M Wired, 139M Cache, 112M Buf, 93M Free
Swap: 4500M Total, 336K Used, 4500M Free

  PID USERNAME THR PRI NICE   SIZE    RES STATE  C   TIME   WCPU COMMAND
 2208 www        1  99    0   115M 19808K RUN    1   0:06 36.83% php-cgi
 2207 www        1 100    0   114M 19348K RUN    3   0:05 32.66% php-cgi
 1715 www        1  99    0   115M 23672K CPU0   0   0:24 27.83% php-cgi
 1710 www        1 101    0   114M 23460K RUN    1   0:31 22.17% php-cgi
 1882 www        1  99    0   115M 23392K CPU2   3   0:18 21.34% php-cgi
 1718 www        1   4    0   114M 22556K sbwait 0   0:21 19.14% php-cgi
 2677 pgsql      1   4    0   977M 55768K sbwait 0   0:00 28.00% postgres

We are not as performance bound as you, so I didn't do measurements earlier. I cannot play with settings on this machine as it is in production, but ~50% sys time (the measurement fluctuates around 45% +/- 10%) seems too much. On another 4-CPU machine (2x2 Xeon 5110, AMD64) with the same application and benchmark setup, but RELENG_7, which is not yet in production, the results are slightly different:
last pid: 66564;  load averages:  1.87,  0.48,  0.18   up 15+05:27:03  17:09:09
113 processes: 9 running, 104 sleeping
CPU states: 49.0% user,  0.0% nice, 28.8% system,  0.0% interrupt, 22.1% idle
Mem: 555M Active, 295M Inact, 884M Wired, 98M Cache, 213M Buf, 135M Free
Swap: 2047M Total, 2047M Free

  PID USERNAME THR PRI NICE   SIZE    RES STATE  C   TIME   WCPU COMMAND
66557 www        1 109    0   105M 25340K RUN    3   0:14 64.99% php-cgi
66559 www        1 109    0   105M 25308K RUN    2   0:14 62.99% php-cgi
66561 www        1  98    0   105M 22196K RUN    0   0:01 12.99% php-cgi
66562 www        1  98    0   105M 22196K RUN    1   0:01 11.96% php-cgi
59043 nobody     1  47    0  7012K  3744K select 2   0:27  5.96% sqlcached
  774 pgsql      1  44    0   437M   112M select 2   3:55  0.00% postgres
Re: 2 x quad-core system is slower that 2 x dual core on FreeBSD
I also tried to find what else is slow in FreeBSD. I tried hwpmc both as a module and compiled into the kernel, but it fails with the error: pmc: Unknown Intel CPU. module_register_init: MOD_LOAD (hwpmc, 0x804833e0, 0x809338a0) error 78 There are patches you need to enable it on Woodcrest. They are in my p4 branch (kris-contention), but I don't have time right now to extract them. These patches make hwpmc treat these CPUs as possessing Pentium Pro-class PMCs. Unfortunately, this is easy to do, but incorrect: - There are differences in the legal bit values that may be loaded into PMC registers for many hardware events. - hwpmc needs to be taught to support measurements on CPUs with multiple cores per package. And then there is additional work to support these CPUs at the same level as the current set: - The hardware events supported are named differently; documentation and libpmc's event-selector parsing code need to be changed to suit. - The hardware supports a new class of fixed-function PMCs that hwpmc needs to support. -- FreeBSD Volunteer, http://people.freebsd.org/~jkoshy
Re: 2 x quad-core system is slower that 2 x dual core on FreeBSD
On 23/11/2007, Krassimir Slavchev [EMAIL PROTECTED] wrote: Would someone define exactly what tests should be performed? OK, using ab is fine, but with what parameters, and against what: a script or static HTML? It would be good to have some Perl, In this thread, it's always PHP code, with database backends. PHP ... scripts or C programs that simulate some kind of 'real world' work. The problem is that a realistic application does a lot of things that are not easily simulated: - usually has a lot of code, lots of include files, libraries, etc. (so it stresses file systems, as was shown with fstat() in the thread - the code is most likely checking for changes in PHP libraries) - uses a database, which is populated with real-world data (so it has a lot of IPC of very varied sizes) - uses some kind of caching, both of compiled PHP code (eAccelerator, pecl-APC) and of data (eAccelerator, memcached) (which uses SysV SHM and IPC). Reducing all that to a C file that does all of it is very nontrivial. For classic setups with mod_php, it's not uncommon for httpd processes to grow to 100 MB or more each, with all the heavy stuff brought in.
Re: 2 x quad-core system is slower that 2 x dual core on FreeBSD
Hi. Kris Kennaway wrote: Now the FreeBSD 7-STABLE ULE 8-core server without an optimized PHP realpath_cache_size (producing 2000+ lstat calls per request) can handle up to ~24 rps, as opposed to a maximum of 17 rps without your patch. %sys never exceeds %user with your patch. On the server with an optimized realpath_cache_size, your patch has no visible effect. You said 20 before for this configuration, so I'm a bit suspicious about how seriously to treat your measurements :) Sorry, my mistake. s/ULE/4BSD. OK, please compare ULE to ULE with and without my patch (remembering to enable the sysctl), and obtain lock profiling traces in both cases under identical workload durations. That is what I need to proceed with this issue. I didn't measure the exact requests-per-second values on ULE with and without the patch, but at first glance the benefits of the patch are similar to those on 4BSD. If you need these values, I'll obtain them. Here you can find lock profiling results for a 7-BETA3 GENERIC kernel with SCHED_ULE running optimized and unoptimized PHP, with and without your patch: http://83.167.98.162/gprof/lockmgr/ This data was collected by the following script: (sysctl debug.lock.prof.reset=1; sysctl debug.lock.prof.enable=1; sleep 60; sysctl debug.lock.prof.enable=0; sysctl debug.lock.prof.stats; top -d 2 -b | tail -25) AFAIU there's still high contention on lockbuilder mtxpool with the patch applied. But hopefully the lockmgr:ufs contention, which I believe produced the 80% sys CPU load, is gone with your patch. I also tried to find what else is slow in FreeBSD. I tried hwpmc both as a module and compiled into the kernel, but it fails with the error: pmc: Unknown Intel CPU. There are patches you need to enable it on Woodcrest. They are in my p4 branch (kris-contention), but I don't have time right now to extract them. I think it would be very useful, because I can't see any other way to profile FreeBSD on modern many-core machines.
With best regards, Alexey Popov
Re: 2 x quad-core system is slower that 2 x dual core on FreeBSD
On Tuesday 20 November 2007, Kris Kennaway wrote: Kris Kennaway wrote: Kris Kennaway wrote: In the meantime there is unfortunately not a lot that can be done, AFAICT. There is one hack that I will send you later but it is not likely to help much. I will also think about how to track down the cause of the contention further (the profiling trace only shows that it comes mostly from vget/vput but doesn't show where these are called from). Actually this patch might help. It doesn't replace lockmgr but it does fix a silly thundering herd behaviour. It probably needs some adjustment to get it to apply cleanly (it is about 7 months old), and I apparently stopped using it because I ran into deadlocks. It might be stable enough to at least see how much it helps. Set the vfs.lookup_shared=1 sysctl to enable the other half of the patch. Kris Try this one instead, it applies to HEAD. You'll need to manually enter the paths though because of how p4 mangles diffs. I rolled a tiny, simple, possibly braindamaged benchmark (but then again PHP code tends to be braindamaged): test.php includes 1000 different, essentially empty files and is started over and over from a shell script which counts the runs completed within 60 seconds. 1-8 and 128 scripts are started in parallel. On a 2x dual-core Opteron running amd64 I get:
stock RELENG_7 w/o patch, ULE:
jobs  sum runs  gain
   1       617  1
   2       784  1.27
   3       939  1.52
   4      1015  1.65
   5       658  1.07
   6       642  1.04
   7       666  1.08
   8       696  1.13
 128       726  1.18
RELENG_7 patched, ULE, vfs.lookup_shared=1:
jobs  sum runs  gain
   1       637  1
   2       784  1.23
   3       973  1.53
   4      1104  1.73
   5       708  1.11
   6       733  1.15
   7       776  1.22
   8       840  1.32
 128       936  1.47
So there is still a lot of room for improvement here. I'll rebuild with lock profiling tomorrow and see what I can gather. Anything you'd like to see in particular?
-- Best regards, Max Laier | [EMAIL PROTECTED] | ICQ #67774661 | http://pf4freebsd.love2party.net/ | [EMAIL PROTECTED]
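A harness along the lines Max describes (a fixed time window, N parallel jobs, count the completed runs, then normalize to the single-job total) could be sketched in Python as follows. This is not Max's actual shell script; the command to run (for example php test.php) and the helper names count_runs() and gains() are assumptions for illustration.

```python
import subprocess
import threading
import time


def count_runs(cmd, jobs, duration):
    """Run `jobs` workers in parallel; each re-runs `cmd` until `duration`
    seconds have passed. Return the total number of runs that finished
    inside the window."""
    deadline = time.monotonic() + duration
    counts = [0] * jobs  # one counter per worker, summed at the end

    def worker(slot):
        while time.monotonic() < deadline:
            subprocess.run(cmd, stdout=subprocess.DEVNULL, check=False)
            if time.monotonic() <= deadline:  # only count runs that ended in time
                counts[slot] += 1

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(jobs)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(counts)


def gains(runs_by_jobs):
    """Express each total relative to the 1-job total (the 'gain' column)."""
    base = runs_by_jobs[1]
    return {jobs: round(runs / base, 2) for jobs, runs in runs_by_jobs.items()}
```

For instance, gains({1: 617, 2: 784, 4: 1015}) gives {1: 1.0, 2: 1.27, 4: 1.65}, matching the gain column for 1, 2 and 4 jobs in the unpatched table. Keeping one counter per worker avoids sharing a single counter between threads.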
Re: 2 x quad-core system is slower that 2 x dual core on FreeBSD
Hi. Kris Kennaway wrote: In the meantime there is unfortunately not a lot that can be done, AFAICT. There is one hack that I will send you later but it is not likely to help much. I will also think about how to track down the cause of the contention further (the profiling trace only shows that it comes mostly from vget/vput but doesn't show where these are called from). Actually this patch might help. It doesn't replace lockmgr but it does fix a silly thundering herd behaviour. It probably needs some adjustment to get it to apply cleanly (it is about 7 months old), and I apparently stopped using it because I ran into deadlocks. It might be stable enough to at least see how much it helps. Try this one instead, it applies to HEAD. You'll need to manually enter the paths though because of how p4 mangles diffs. Finally I tried your patch and it seems to help a little. Now the FreeBSD 7-STABLE ULE 8-core server without an optimized PHP realpath_cache_size (producing 2000+ lstat calls per request) can handle up to ~24 rps, as opposed to a maximum of 17 rps without your patch. %sys never exceeds %user with your patch. On the server with an optimized realpath_cache_size, your patch has no visible effect. However, Linux is still twice as fast for my workload, and there should be other ways to optimize. With best regards, Alexey Popov
Re: 2 x quad-core system is slower that 2 x dual core on FreeBSD
Kris Kennaway wrote: Ivan Voras wrote: On 21/11/2007, Kris Kennaway [EMAIL PROTECTED] wrote: Ivan Voras wrote: Yes, but I had to verify it anyway :) You haven't verified anything until you look at how much work the system is doing, before and after. I have, and it's roughly the same (50 +/- 2 queries/s). (meaning that I'm not interested in exact statistics here, but in order-of-magnitude changes, which didn't happen). OK, let's take a step back here. Did you obtain the lock profiling trace and verify that you're seeing the same problem as Alexey? Can I see the trace? Here it is: http://ivoras.sharanet.org/stuff/lock_profile.txt This is without your patch. There are a lot of ZFS locks in there, but it seems lockmgr:ufs and lockmgr:zfs have the largest records: 299117621 1474776121 148663 1042821 1414 0 513 440 /usr/src/sys/kern/vfs_subr.c:2035 (lockmgr:ufs) 117958368847566147 1820932676 31672868 948 374 /usr/src/sys/kern/vfs_vnops.c:515 (lockmgr:zfs) That is surprising, since all the working-set file systems are on ZFS; only the root and /tmp are on UFS. /tmp also holds sockets for the databases. Your reading of the lock profile will be appreciated.
Re: 2 x quad-core system is slower that 2 x dual core on FreeBSD
Alexey Popov wrote: Hi. Kris Kennaway wrote: In the meantime there is unfortunately not a lot that can be done, AFAICT. There is one hack that I will send you later but it is not likely to help much. I will also think about how to track down the cause of the contention further (the profiling trace only shows that it comes mostly from vget/vput but doesn't show where these are called from). Actually this patch might help. It doesn't replace lockmgr but it does fix a silly thundering herd behaviour. It probably needs some adjustment to get it to apply cleanly (it is about 7 months old), and I apparently stopped using it because I ran into deadlocks. It might be stable enough to at least see how much it helps. Try this one instead, it applies to HEAD. You'll need to manually enter the paths though because of how p4 mangles diffs. Finally I tried your patch and it seems to help a little. Now the FreeBSD 7-STABLE ULE 8-core server without an optimized PHP realpath_cache_size (producing 2000+ lstat calls per request) can handle up to ~24 rps, as opposed to a maximum of 17 rps without your patch. %sys never exceeds %user with your patch. On the server with an optimized realpath_cache_size, your patch has no visible effect. You said 20 before for this configuration, so I'm a bit suspicious about how seriously to treat your measurements :) Anyway, please obtain another lock profiling trace using the same conditions as the previous one (same workload duration, etc.), so we can compare what changed. Kris
Re: 2 x quad-core system is slower that 2 x dual core on FreeBSD
Ivan Voras wrote: On 22/11/2007, Kris Kennaway [EMAIL PROTECTED] wrote: Ivan Voras wrote: Kris Kennaway wrote: OK, let's take a step back here. Did you obtain the lock profiling trace and verify that you're seeing the same problem as Alexey? Can I see the trace? Here it is: http://ivoras.sharanet.org/stuff/lock_profile.txt This is without your patch. Your reading of the lock profile will be appreciated. OK, how about with? The machine is going into production and I can't do such interventions on it any more. Based on the lock trace, do you think it's the same problem as Alexey's? It looks like lockmgr, and the patch should definitely have helped. Maybe you forgot to enable vfs.lookup_shared? Kris
Re: 2 x quad-core system is slower that 2 x dual core on FreeBSD
On 22/11/2007, Kris Kennaway [EMAIL PROTECTED] wrote: It looks like lockmgr, and the patch should definitely have helped. Maybe you forgot to enable vfs.lookup_shared? No, I haven't. But the machine I tested it on is only 4-core; maybe it would help on 8-core machines.
Re: 2 x quad-core system is slower that 2 x dual core on FreeBSD
On 22/11/2007, Kris Kennaway [EMAIL PROTECTED] wrote: Ivan Voras wrote: Kris Kennaway wrote: OK, let's take a step back here. Did you obtain the lock profiling trace and verify that you're seeing the same problem as Alexey? Can I see the trace? Here it is: http://ivoras.sharanet.org/stuff/lock_profile.txt This is without your patch. Your reading of the lock profile will be appreciated. OK, how about with? The machine is going into production and I can't do such interventions on it any more. Based on the lock trace, do you think it's the same problem as Alexey's?
Re: 2 x quad-core system is slower that 2 x dual core on FreeBSD
Ivan Voras wrote: Kris Kennaway wrote: Ivan Voras wrote: On 21/11/2007, Kris Kennaway [EMAIL PROTECTED] wrote: Ivan Voras wrote: Yes, but I had to verify it anyway :) You haven't verified anything until you look at how much work the system is doing, before and after. I have, and it's roughly the same (50 +/- 2 queries/s). (meaning that I'm not interested in exact statistics here, but in order-of-magnitude changes, which didn't happen). OK, let's take a step back here. Did you obtain the lock profiling trace and verify that you're seeing the same problem as Alexey? Can I see the trace? Here it is: http://ivoras.sharanet.org/stuff/lock_profile.txt This is without your patch. There are a lot of ZFS locks in there, but it seems lockmgr:ufs and lockmgr:zfs have the largest records: 299117621 1474776121 148663 1042821 1414 0 513 440 /usr/src/sys/kern/vfs_subr.c:2035 (lockmgr:ufs) 117958368847566147 1820932676 31672868 948 374 /usr/src/sys/kern/vfs_vnops.c:515 (lockmgr:zfs) That is surprising, since all the working-set file systems are on ZFS; only the root and /tmp are on UFS. /tmp also holds sockets for the databases. Your reading of the lock profile will be appreciated. OK, how about with? Kris
Re: 2 x quad-core system is slower that 2 x dual core on FreeBSD
Kris Kennaway wrote: Try this one instead, it applies to HEAD. You'll need to manually enter the paths though because of how p4 mangles diffs. It doesn't help, at least in my case (only 4 clients) - the sys time is still around 30% on a 4-CPU machine.
Re: 2 x quad-core system is slower that 2 x dual core on FreeBSD
Ivan Voras wrote: Kris Kennaway wrote: Try this one instead, it applies to HEAD. You'll need to manually enter the paths though because of how p4 mangles diffs. It doesn't help, at least in my case (only 4 clients) - the sys time is still around 30% on a 4-CPU machine. I've already explained why that is meaningless. Kris
Re: 2 x quad-core system is slower that 2 x dual core on FreeBSD
Kris Kennaway wrote: Ivan Voras wrote: Kris Kennaway wrote: Try this one instead, it applies to HEAD. You'll need to manually enter the paths though because of how p4 mangles diffs. It doesn't help, at least in my case (only 4 clients) - the sys time is still around 30% on a 4-CPU machine. I've already explained why that is meaningless. Yes, but I had to verify it anyway :)
Re: 2 x quad-core system is slower that 2 x dual core on FreeBSD
Ivan Voras wrote: Kris Kennaway wrote: Ivan Voras wrote: Kris Kennaway wrote: Try this one instead, it applies to HEAD. You'll need to manually enter the paths though because of how p4 mangles diffs. It doesn't help, at least in my case (only 4 clients) - the sys time is still around 30% on a 4-CPU machine. I've already explained why that is meaningless. Yes, but I had to verify it anyway :) You haven't verified anything until you look at how much work the system is doing, before and after. Kris
Re: 2 x quad-core system is slower that 2 x dual core on FreeBSD
Hi. Kris Kennaway wrote: In the meantime there is unfortunately not a lot that can be done, AFAICT. There is one hack that I will send you later but it is not likely to help much. I will also think about how to track down the cause of the contention further (the profiling trace only shows that it comes mostly from vget/vput but doesn't show where these are called from). Actually this patch might help. It doesn't replace lockmgr but it does fix a silly thundering herd behaviour. It probably needs some adjustment to get it to apply cleanly (it is about 7 months old), and I apparently stopped using it because I ran into deadlocks. It might be stable enough to at least see how much it helps. Sorry, I haven't tried your patch yet, but I have other news. As mentioned in the description of your patch, there is probably a scalability problem with the stat() syscall on FreeBSD. The PHP code of our site consists of a large number of modules. I think this is true for many other large PHP sites. I found out that PHP calls lstat() for every path element of each file it opens, including modules. Truss output shows that PHP makes more than 2000 lstat() calls for one /index.php request. After investigating, I found out that the lstat() calls come from the libc realpath() function. It turned out that PHP has a realpath cache, but its default size is 16K, which is not enough for my files. I set realpath_cache_size to 256K, and now there are far fewer lstat() calls. Performance of the 8-core machine grew by ~50% for me on 7-STABLE. Now it can handle 30 or more requests per second. I have similar results with 6-STABLE. %sys is no longer as high as it was before (see attached top output). Nevertheless, Linux with its 50 rps is still far ahead of FreeBSD. Linux makes those 2000+ lstat() calls without problems. There are still stat(), open(), gettimeofday() and close() syscalls for each include file in PHP that I cannot switch off.
It is also unclear to me what to do about MySQL, which turned out to have the same problems. With best regards, Alexey Popov
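To see why an undersized realpath cache hurts, here is a simplified Python model of a component-by-component realpath(): one lstat() per path element, optionally skipping elements that have already been checked. It ignores symlink resolution and is not PHP's or libc's actual implementation; path_prefixes() and resolve_all() are hypothetical helpers.

```python
import os


def path_prefixes(path):
    """Every leading part of an absolute path: /a, /a/b, /a/b/c."""
    parts = [p for p in path.split("/") if p]
    return ["/" + "/".join(parts[:i]) for i in range(1, len(parts) + 1)]


def resolve_all(paths, cache=None):
    """lstat() each element of each path, like a realpath() that walks
    components, and return the number of lstat() calls made. If `cache`
    (a set) is given, elements already checked are skipped, loosely like
    a realpath cache that is large enough to hold them."""
    calls = 0
    for path in paths:
        for prefix in path_prefixes(path):
            if cache is not None and prefix in cache:
                continue
            os.lstat(prefix)  # raises FileNotFoundError for a missing element
            calls += 1
            if cache is not None:
                cache.add(prefix)
    return calls
```

Without the cache, files in the same directory tree pay for their shared directory prefixes again and again; with deep include trees and hundreds of included files per request, that is how one request can add up to 2000+ lstat() calls.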
Re: 2 x quad-core system is slower that 2 x dual core on FreeBSD
Alexey Popov wrote: Also, could you explain what to look for in the lock profiling results? Do large wait_total values indicate a problem, or do other columns? All of the columns (well, maybe except for the lock name ;-) can indicate potential problems of various kinds, so you have to look at them all to identify possible abnormalities. For example, if you are acquiring a mutex many times, you might look for ways to reduce the frequency of acquisitions. If it is being held for long periods of time, this can increase contention for other consumers. If there is high lock contention, then processes will block waiting for it. If processes are spending a lot of time waiting for the lock, then they are not getting work done. Kris
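One way to follow this advice and rank locks by one column at a time is a small Python sketch like the one below. It assumes the stats output starts with a header line of column names whose last field is the lock name, with every other field an integer; the exact column set of debug.lock.prof.stats differs between FreeBSD versions, so check the header of your own trace. top_locks() is a hypothetical helper.

```python
def top_locks(profile_text, column="wait_total", limit=5):
    """Return (lock name, value) pairs with the largest values in `column`.

    Assumes the first non-empty line is a header of column names whose
    last entry is the lock name, and that every other field is an integer.
    The name itself may contain spaces, e.g. 'file.c:123 (lockmgr:ufs)'.
    """
    lines = [ln for ln in profile_text.splitlines() if ln.strip()]
    header = lines[0].split()
    idx = header.index(column)
    numeric = len(header) - 1  # number of numeric columns before the name
    rows = []
    for ln in lines[1:]:
        fields = ln.split()
        name = " ".join(fields[numeric:])
        rows.append((name, int(fields[idx])))
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows[:limit]
```

Passing another column name from your header (an acquisition count or a hold-time column, for example) instead of the default shows which locks are taken most often or held longest, which covers the other kinds of problems listed above.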
Re: 2 x quad-core system is slower that 2 x dual core on FreeBSD
On 21/11/2007, Kris Kennaway [EMAIL PROTECTED] wrote: Ivan Voras wrote: Yes, but I had to verify it anyway :) You haven't verified anything until you look at how much work the system is doing, before and after. I have, and it's roughly the same (50 +/- 2 queries/s). (meaning that I'm not interested in exact statistics here, but in order-of-magnitude changes, which didn't happen).
Re: 2 x quad-core system is slower that 2 x dual core on FreeBSD
Hi all. Sorry, I forgot to attach the top and vmstat output from the 8-core 7-STABLE server with an optimized PHP realpath_cache_size. With best regards, Alexey Popov
last pid: 91239;  load averages:  4.64,  4.72,  7.82   up 0+19:13:37  14:07:50
53 processes: 7 running, 46 sleeping
CPU states: 78.0% user,  0.0% nice, 21.5% system,  0.0% interrupt,  0.4% idle
Mem: 341M Active, 181M Inact, 225M Wired, 272K Cache, 186M Buf, 3158M Free
Swap: 2048M Total, 2048M Free

  PID USERNAME THR PRI NICE   SIZE    RES STATE  C   TIME   WCPU COMMAND
91238 www        1   4    0 99644K 48752K sbwait 4   0:03 64.04% httpd
91233 www        1 100    0    98M 46124K select 3   0:03 55.79% httpd
91236 www        1 100    0 92476K 41000K select 3   0:03 54.69% httpd
91234 www        1 100    0 99644K 47212K select 7   0:03 54.02% httpd
91235 www        1 100    0 93500K 42508K CPU1   0   0:03 52.26% httpd
91232 www        1 100    0 92476K 41980K select 4   0:03 45.17% httpd
91231 www        1 100    0 97596K 43656K CPU7   1   0:03 41.22% httpd
91226 www        1 100    0 99644K 49524K select 6   0:05 37.63% httpd
91228 www        1 100    0 98620K 46516K select 5   0:05 37.02% httpd
91237 www        1  98    0 99644K 43588K select 1   0:01 32.83% httpd
91223 www        1 101    0 96572K 47312K select 5   0:07 31.85% httpd
91229 www        1  99    0 99644K 46632K select 4   0:03 30.42% httpd
91135 www        1 101    0    98M 51528K select 7   0:33 29.86% httpd
91227 www        1 100    0    99M 47212K select 2   0:03 28.02% httpd
91225 www        1 100    0    98M 47068K CPU5   5   0:06 27.56% httpd
91180 www        1 100    0   113M 62152K CPU2   3   0:21 26.41% httpd
91224 www        1 100    0    99M 50260K select 5   0:06 24.21% httpd
91214 www        1  99    0 99644K 49576K select 5   0:11 23.49% httpd
91212 www        1 100    0    99M 51548K select 2   0:10 22.89% httpd
91230 www        1  98    0    98M 46200K select 7   0:02 21.31% httpd
91077 www        1  99    0    99M 51940K CPU6   4   0:50 20.41% httpd
91209 www        1  99    0 97596K 46316K CPU3   3   0:10 20.08% httpd
91196 www        1  98    0 99648K 50412K select 7   0:13 18.59% httpd
91239 www        1  96    0 99644K 45728K select 1   0:01 11.57% httpd
18052 llp        1  96    0 32928K  4544K select 5   0:25  0.00% sshd
  698 root       1  96    0  8952K  2516K select 0   0:02  0.00% ntpd
  779 root       1  96    0 20952K  3740K select 0   0:01  0.00% sshd
89816 root       1 100    0 86332K 13340K select 6   0:01  0.00% httpd
18074 root       1  20    0  9616K  3208K pause  0   0:01  0.00% csh
  786 root       1   8    0  5736K  1388K nanslp 2   0:01  0.00% cron
  765 root       1   4    0  4852K  1640K kqread 5   0:00  0.00% master

[systat -vmstat output, garbled in the archive. Recoverable values: load 7.14 7.38 6.83 (Nov 21 16:20); 8.2% Sys, 0.1% Intr, 48.3% User, 0.0% Nice, 43.5% Idle; name cache: 88017 calls, 88017 hits (100%).]
Re: 2 x quad-core system is slower that 2 x dual core on FreeBSD
On Wed, Nov 21, 2007 at 02:13:19PM +0300, Alexey Popov wrote: As mentioned in the description of your patch there is probably a scalability problem with stat() syscall on FreeBSD. I wrote a quick tool to lstat() path elements on an otherwise idle dual-core system (1.6GHz Turion64x2, FreeBSD 6.3/amd64). One instance: ~62k lstat/sec, 99% sys. Two instances, same path: ~43k lstat/sec/instance, 97% sys. Two instances, different path, same fs: ~50k lstat/sec/instance, 97% sys. Two instances, different fs: ~53k lstat/sec/instance, 98% sys. The slowdowns, especially in the same-path case, are worse than I would have hoped. makes those 2000+ lstat calls without problems. There are still stat(), open(), gettimeofday(), close() syscalls for each include file in PHP that I cannot switch off. Note that gettimeofday() is known to be much slower (and more accurate) on FreeBSD than on Linux. Robert Watson (if I recall correctly) has done some work on building a framework to allow a choice between slow-and-accurate and fast-and-less-precise timestamps. I don't have the reference to hand, but a check of the archives should turn it up. -- Peter Jeremy Please excuse any delays as the result of my ISP's inability to implement an MTA that is either RFC2821-compliant or matches their claimed behaviour.
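Peter's tool was not posted. A rough Python equivalent of the idea (lstat() every element of a path in a tight loop and report calls per second) is sketched below; lstat_rate() is a hypothetical name. Interpreter overhead means the absolute numbers will be far below those of a C version, so this is only useful for relative comparisons.

```python
import os
import time


def lstat_rate(path, seconds=1.0):
    """lstat() every element of `path` repeatedly for `seconds` and return
    lstat() calls per second."""
    parts = [p for p in path.split("/") if p]
    prefixes = ["/" + "/".join(parts[:i]) for i in range(1, len(parts) + 1)] or ["/"]
    calls = 0
    start = time.perf_counter()
    deadline = start + seconds
    while time.perf_counter() < deadline:
        for prefix in prefixes:
            os.lstat(prefix)
        calls += len(prefixes)
    return calls / (time.perf_counter() - start)
```

Run one copy, then two copies concurrently as separate processes (same path, then paths on different file systems), and compare the per-instance rates, as in Peter's four cases.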
Re: 2 x quad-core system is slower that 2 x dual core on FreeBSD
Alexey Popov wrote: Hi. Kris Kennaway wrote: In the meantime there is unfortunately not a lot that can be done, AFAICT. There is one hack that I will send you later but it is not likely to help much. I will also think about how to track down the cause of the contention further (the profiling trace only shows that it comes mostly from vget/vput but doesn't show where these are called from). Actually this patch might help. It doesn't replace lockmgr but it does fix a silly thundering herd behaviour. It probably needs some adjustment to get it to apply cleanly (it is about 7 months old), and I apparently stopped using it because I ran into deadlocks. It might be stable enough to at least see how much it helps. Sorry, I haven't tried your patch yet, but I have other news. As mentioned in the description of your patch, there is probably a scalability problem with the stat() syscall on FreeBSD. Not as such; that was just a random example I chose to illustrate the lockmgr problems I described earlier. Try the patch I posted, it should help. Kris
Re: 2 x quad-core system is slower that 2 x dual core on FreeBSD
Ivan Voras wrote: On 21/11/2007, Kris Kennaway [EMAIL PROTECTED] wrote: Ivan Voras wrote: Yes, but I had to verify it anyway :) You haven't verified anything until you look at how much work the system is doing, before and after. I have, and it's roughly the same (50 +/- 2 queries/s). (meaning that I'm not interested in exact statistics here, but in order-of-magnitude changes, which didn't happen). OK, let's take a step back here. Did you obtain the lock profiling trace and verify that you're seeing the same problem as Alexey? Can I see the trace? Kris
Re: 2 x quad-core system is slower that 2 x dual core on FreeBSD
Hi. Max Laier wrote: I rolled a tiny, simple, possibly braindamaged benchmark (but then again PHP code tends to be braindamaged): test.php includes 1000 different, essentially empty files and is started over and over from a shell script which counts the runs completed within 60 seconds. 1-8, 128 scripts are started in parallel. On a 2x dual Opteron running amd64 I get: This problem is almost invisible for me on 4-core servers. Could you try your benchmark on a server with 8 or more cores? With best regards, Alexey Popov
Re: 2 x quad-core system is slower that 2 x dual core on FreeBSD
Hi. Kris Kennaway wrote: CPU states: 9.5% user, 0.0% nice, 82.0% system, 0.5% interrupt, 8.0% idle A wild idea that might not help: try reducing kern.hz in loader.conf to something like 100 and see if something significant changes. Now it runs with hz=100; the number of context switches has roughly halved, but there's still 90% system CPU load (see attachment). System CPU usage doesn't tell you anything by itself; you need to look at how much work the system is actually doing (pages served/second, or whatever). For example, when your kernel is getting more work done, system CPU usage will also be higher. Usually on PHP backends, slow PHP code eats most of the CPU time, and %user is much bigger than %system in the CPU states line. But now %system is much bigger than %user, so I conclude that on the 8-core server FreeBSD consumes more CPU time than PHP. With best regards, Alexey Popov
Re: 2 x quad-core system is slower that 2 x dual core on FreeBSD
Alexey Popov wrote: Hi. Kris Kennaway wrote: CPU states: 9.5% user, 0.0% nice, 82.0% system, 0.5% interrupt, 8.0% idle A wild idea that might not help: try reducing kern.hz in loader.conf to something like 100 and see if something significant changes. Now it runs with hz=100; the number of context switches has roughly halved, but there's still 90% system CPU load (see attachment). System CPU usage doesn't tell you anything by itself; you need to look at how much work the system is actually doing (pages served/second, or whatever). For example, when your kernel is getting more work done, system CPU usage will also be higher. Usually on PHP backends, slow PHP code eats most of the CPU time, and %user is much bigger than %system in the CPU states line. But now %system is much bigger than %user, so I conclude that on the 8-core server FreeBSD consumes more CPU time than PHP. That is one possibility, but you still need to look at the actual throughput on these machines before making conclusions about which is performing better. Can you please provide those numbers for 6.x, 7.x with ULE and 4BSD on the 4-core and 8-core systems? Kris
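Kris's point can be made concrete by normalising CPU time to throughput. A minimal sketch: since the top CPU states line averages over all CPUs, the CPU time spent per request is the number of cores times the busy fraction, divided by the request rate. The function name and the example inputs are hypothetical, not measurements from this thread.

```c
#include <assert.h>

/* Hypothetical helper: milliseconds of CPU time of one kind (e.g. the
 * %system figure from top) consumed per request.  ncpu cores at busy_pct
 * percent each give ncpu * busy_pct / 100 CPU-seconds per wall-clock second;
 * dividing by the request rate gives CPU-seconds per request. */
double cpu_ms_per_request(int ncpu, double busy_pct, double requests_per_sec)
{
    assert(ncpu > 0);
    assert(busy_pct >= 0.0 && busy_pct <= 100.0);
    assert(requests_per_sec > 0.0);
    return (double)ncpu * (busy_pct / 100.0) / requests_per_sec * 1000.0;
}
```

For instance, a hypothetical 8-core box at 80% system time serving 20 requests/s spends about 320 ms of system CPU per request. A kernel change that raised %system but let the same box serve more requests could still lower this figure, which is why %system alone says little.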
Re: 2 x quad-core system is slower that 2 x dual core on FreeBSD
Hi Kris Kennaway wrote: CPU states: 9.5% user, 0.0% nice, 82.0% system, 0.5% interrupt, 8.0% idle A wild idea that might not help: try reducing kern.hz in loader.conf to something like 100 and see if something significant changes. Usually on PHP backends, slow PHP code eats most of the CPU time, and %user is much bigger than %system in the CPU states line. But now %system is much bigger than %user, so I conclude that on the 8-core server FreeBSD consumes more CPU time than PHP. That is one possibility, but you still need to look at the actual throughput on these machines before making conclusions about which is performing better. Can you please provide those numbers for 6.x, 7.x with ULE and 4BSD on the 4-core and 8-core systems? OK, here are the results of my practical research. The following is the approximate maximum qps that the backends can survive with my workload:

System                Max qps
7-STABLE quad ULE     20
7-STABLE quad 4BSD    17
6-STABLE quad         14
6-STABLE dual         21
Linux CentOS 5 quad   50

With best regards, Alexey Popov
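The table can also be read per core. A small sketch, assuming (as the rest of the thread suggests) that "quad" denotes the 2 x quad-core, 8-core machines and "dual" the 2 x dual-core, 4-core ones; the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Max qps figures from the message above.  Core counts are an
 * interpretation: "quad" = 2 x quad-core (8 cores), "dual" = 2 x dual-core
 * (4 cores). */
struct result {
    const char *system;
    int cores;
    double max_qps;
};

static const struct result results[] = {
    { "7-STABLE quad ULE",   8, 20.0 },
    { "7-STABLE quad 4BSD",  8, 17.0 },
    { "6-STABLE quad",       8, 14.0 },
    { "6-STABLE dual",       4, 21.0 },
    { "Linux CentOS 5 quad", 8, 50.0 },
};

/* Requests per second delivered by each core at the saturation point. */
double qps_per_core(const struct result *r)
{
    assert(r != NULL && r->cores > 0);
    return r->max_qps / (double)r->cores;
}

void print_per_core(void)
{
    for (size_t i = 0; i < sizeof(results) / sizeof(results[0]); i++)
        printf("%-20s %5.2f qps/core\n", results[i].system,
               qps_per_core(&results[i]));
}
```

On that reading, the 4-core 6-STABLE box delivers 5.25 qps per core, the best 8-core FreeBSD configuration (7-STABLE with ULE) 2.5, and Linux on the 8-core hardware 6.25.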
Re: 2 x quad-core system is slower that 2 x dual core on FreeBSD
Hi Alexey, Can you please send a dmesg from FreeBSD 7 on this server? I'm a little puzzled about what you mean by 7-stable :) Alexey Popov wrote: Hi. I have a large pool of web backends (Apache + mod_php5) with 2 x Xeon 3.2GHz processors and 2 x Xeon 5120 dual-core processors. The workload is mostly CPU-bound. I'm using 6-STABLE-amd64 and also tried 7-STABLE. Now I'm trying to use new hardware with 2 x Xeon 5320 (quad-core), but it cannot handle the same load as the dual-core machines. It shows up to 80% system CPU load in top: -- Best Wishes, Stefan Lambrev ICQ# 24134177
Re: 2 x quad-core system is slower that 2 x dual core on FreeBSD
Hi. Ivan Voras wrote: Some more ideas: How is your disk load (iostat, systat -vm, diskinfo -t) during the load? You don't use NFS for the web directories, do you? Can you run bonnie++ while the machine is idle (i.e. Apache is stopped) just to verify it isn't a stupid problem with the disks or the driver? There's almost no disk load except writing ~15 strings per second to logs. All PHP code fits in memory and there's no need to read from disk. atime is turned off. NFS is not used. So, did you pull the CPU out of the motherboard and plug in another one? If not, you can't be sure that some other thing isn't wrong. I know you tried it on Linux, but it might use slightly different commands in the driver that don't trigger the error. I'm very surprised that both 6.x and 7.x behave almost the same under your load: since they are very different in how they support multiple CPUs, I'd expect a big difference in this case (in favour of 7.x), not a small one. This might indicate that the problem is not in the OS itself, but maybe in the hardware or in some driver. I didn't change the CPUs myself, but I think these 4-core and 8-core servers (Intel SR1500 platform) differ only in their CPUs. You can see it in the dmesg at the root of this thread. You don't have WITNESS, INVARIANTS, DIAGNOSTICS or something similar enabled? Can you try a generic SMP kernel (called SMP in 6.x; the GENERIC in 7.x has SMP by default) and see how it works? Can you disable SMP and try with only one CPU (on the 2xquad machine)? You can do it in loader.conf by setting kern.smp.disabled=1, or perhaps in BIOS. If there's a problem in some hardware or a driver, you'd still get a big load on sys time. You might also want to halt certain logical CPUs in the OS itself (see smp(4) man page) and see if there's a certain relationship between how many CPUs are running and what the sys load is. Thank you. I need some time to try all this. I'll report back if I find something.
With best regards, Alexey Popov
Re: 2 x quad-core system is slower that 2 x dual core on FreeBSD
Hi. Ivan Voras wrote: Many people (including me) have run FreeBSD on machines like yours without such problems, so let's dig further. You don't have WITNESS, INVARIANTS, DIAGNOSTICS or something similar enabled? Can you try a generic SMP kernel (called SMP in 6.x; the GENERIC in 7.x has SMP by default) and see how it works? Can you disable SMP and try with only one CPU (on the 2xquad machine)? You can do it in loader.conf by setting kern.smp.disabled=1, or perhaps in BIOS. If there's a problem in some hardware or a driver, you'd still get a big load on sys time. You might also want to halt certain logical CPUs in the OS itself (see smp(4) man page) and see if there's a certain relationship between how many CPUs are running and what the sys load is. Now I'm running yesterday's FreeBSD 7.0-BETA3 amd64 with a GENERIC kernel. I rebuilt the kernel and world with a clean make.conf. Also I rebuilt Apache, PHP and eAccelerator from scratch. I tried APC as well. No success. I tried 7-STABLE with a UP kernel (GENERIC built without the SMP config option). It works fine and can handle around 5-10 requests per second. Its %sys time is much less than its %user time (see top output in attachment), i.e. it seems to work well as a simple server with a not-so-powerful CPU. After that I rebuilt with the SMP GENERIC kernel and put twice as many requests on that server as UP could handle. At first it worked well. Then I increased the load to 2.5 times what UP could handle. Immediately the Apache child count rose to MaxClients (24), most of them in RUN state, and %sys became greater than %user (see attachment). I think that beyond some load threshold FreeBSD spends more CPU time managing running processes than running them. Also I tried to halt CPUs with the machdep.hlt_cpus sysctl, but in that case %sys in top was still much greater than %user.
With best regards, Alexey Popov

last pid: 1100; load averages: 8.55, 5.20, 2.35 up 0+00:05:39 18:59:52
48 processes: 22 running, 26 sleeping
CPU states: 5.9% user, 0.0% nice, 81.3% system, 0.0% interrupt, 12.8% idle
Mem: 245M Active, 14M Inact, 102M Wired, 108K Cache, 48M Buf, 3543M Free
Swap: 2048M Total, 2048M Free

  PID USERNAME THR PRI NICE   SIZE    RES STATE  C   TIME   WCPU COMMAND
 1093 www        1 105    0 94524K 39716K CPU1   7   0:09 34.41% httpd
 1094 www        1 105    0 91452K 39732K select 4   0:09 34.01% httpd
 1097 www        1  -4    0    98M 48392K RUN    7   0:10 33.41% httpd
 1098 www        1 105    0 92476K 43176K CPU4   7   0:09 33.27% httpd
 1099 www        1  -4    0 92476K 40784K RUN    7   0:09 33.21% httpd
 1100 www        1  -4    0 92476K 41080K RUN    4   0:09 32.87% httpd
 1095 www        1  -4    0 92476K 40824K RUN    6   0:09 32.74% httpd
 1090 www        1  -4    0 96572K 42700K RUN    5   0:09 32.54% httpd
 1089 www        1  -4    0 93504K 42032K RUN    7   0:09 32.41% httpd
 1091 www        1  -4    0 95548K 44900K RUN    4   0:09 31.95% httpd
 1096 www        1  -4    0 98620K 47160K RUN    6   0:09 31.86% httpd
 1086 www        1  -4    0 96572K 45752K RUN    6   0:10 30.92% httpd
 1087 www        1 104    0 92476K 41016K CPU7   6   0:09 30.70% httpd
 1088 www        1 104    0 92476K 38332K CPU2   6   0:10 30.51% httpd
 1085 www        1 105    0 97596K 44416K CPU5   5   0:09 30.23% httpd
 1092 www        1  -4    0 92476K 40172K RUN    6   0:08 29.45% httpd

last pid: 2203; load averages: 9.71, 10.08, 7.38 up 0+00:17:48 18:50:39
35 processes: 5 running, 30 sleeping
CPU states: 82.2% user, 0.0% nice, 13.8% system, 0.0% interrupt, 4.0% idle
Mem: 128M Active, 15M Inact, 109M Wired, 132K Cache, 88M Buf, 3657M Free
Swap: 2048M Total, 2048M Free

  PID USERNAME THR PRI NICE   SIZE    RES STATE    TIME   WCPU COMMAND
 2201 www        1 117    0 92476K 40352K RUN      0:02 10.48% httpd
 2203 www        1 116    0 96572K 40912K RUN      0:02 10.05% httpd
 2195 www        1 117    0 93500K 43012K select   0:02  9.61% httpd
 2202 www        1  20    0 91452K 41056K lockf    0:02  8.95% httpd
 2194 www        1 117    0   102M 49440K RUN      0:03  8.10% httpd
 2192 www        1 114    0 99648K 46168K select   0:03  6.76% httpd
 2179 www        1  20    0 92476K 41672K lockf    0:04  6.69% httpd
 2173 www        1 118    0   100M 48920K RUN      0:04  5.92% httpd
 2174 www        1  20    0 92476K 42964K lockf    0:04  5.67% httpd
  878 llp        1  96    0 32928K  4576K select   0:00  0.00% sshd
  891 root       1  20    0  9616K  2924K pause    0:00  0.00% csh
  691 root       1  96    0  8952K  2528K select   0:00  0.00% ntpd
 2161 root       1 131    0 86332K 13080K select   0:00  0.00% httpd
 2178 root       1  96    0  7656K  2168K RUN      0:00  0.00% top
  875 root       1   4    0 32928K  4512K sbwait   0:00  0.00% sshd
  774 root       1   4    0  4852K  1652K kqread   0:00  0.00% master
Re: 2 x quad-core system is slower that 2 x dual core on FreeBSD
On Tue, 2007-11-20 at 19:27 +0300, Alexey Popov wrote: Hi. [snip] After that I rebuilt with the SMP GENERIC kernel and put twice as many requests on that server as UP could handle. At first it worked well. Then I increased the load to 2.5 times what UP could handle. Immediately the Apache child count rose to MaxClients (24), most of them in RUN state, and %sys became greater than %user (see attachment). I think that beyond some load threshold FreeBSD spends more CPU time managing running processes than running them. Also I tried to halt CPUs with the machdep.hlt_cpus sysctl, but in that case %sys in top was still much greater than %user. MaxClients of 24 seems very low for an 8-CPU box running the prefork MPM. On our quad-CPU boxes, running custom Apache modules, we use:

MaxClients 70
MinSpareServers 5
MaxSpareServers 15
StartServers 20

Perhaps you are seeing high system load because the system is having to maintain a lot of queued connections. Certainly, our load remains within comfortable margins, except when heavily stressed. Tom
Re: 2 x quad-core system is slower that 2 x dual core on FreeBSD
On 20/11/2007, Alexey Popov [EMAIL PROTECTED] wrote:
CPU states: 5.9% user, 0.0% nice, 81.3% system, 0.0% interrupt, 12.8% idle
CPU states: 82.2% user, 0.0% nice, 13.8% system, 0.0% interrupt, 4.0% idle

Interesting coincidence: 1 CPU generates almost 8x less sys time than 8 CPUs. But it seems that you have found something real. Inspired by your problem I've done a simple measurement (ab) on a 4-CPU (2x2-core Opteron 2216 HE, PAE) machine I maintain, under these circumstances:

- a heavy PHP application
- FastCGI
- in this case, a load of 4 clients
- on 6-STABLE

and I'm reporting similar findings:

last pid: 2254; load averages: 1.43, 0.92, 0.69 up 71+08:23:06 18:00:31
153 processes: 8 running, 144 sleeping, 1 zombie
CPU states: 38.8% user, 0.0% nice, 48.4% system, 3.2% interrupt, 9.6% idle
Mem: 2321M Active, 1135M Inact, 313M Wired, 139M Cache, 112M Buf, 93M Free
Swap: 4500M Total, 336K Used, 4500M Free

  PID USERNAME THR PRI NICE   SIZE    RES STATE  C   TIME   WCPU COMMAND
 2208 www        1  99    0   115M 19808K RUN    1   0:06 36.83% php-cgi
 2207 www        1 100    0   114M 19348K RUN    3   0:05 32.66% php-cgi
 1715 www        1  99    0   115M 23672K CPU0   0   0:24 27.83% php-cgi
 1710 www        1 101    0   114M 23460K RUN    1   0:31 22.17% php-cgi
 1882 www        1  99    0   115M 23392K CPU2   3   0:18 21.34% php-cgi
 1718 www        1   4    0   114M 22556K sbwait 0   0:21 19.14% php-cgi
 2677 pgsql      1   4    0   977M 55768K sbwait 0   0:00 28.00% postgres

We are not as performance-bound as you, so I didn't take measurements earlier. I cannot play with settings on this machine as it is in production, but ~50% sys time (the measurement varies around 45% +/- 10%) seems too much.
On another 4-CPU machine (2x2-core Xeon 5110, amd64) with the same application and benchmark setup, but running RELENG_7, which is not yet in production, the results are slightly different:

last pid: 66564; load averages: 1.87, 0.48, 0.18 up 15+05:27:03 17:09:09
113 processes: 9 running, 104 sleeping
CPU states: 49.0% user, 0.0% nice, 28.8% system, 0.0% interrupt, 22.1% idle
Mem: 555M Active, 295M Inact, 884M Wired, 98M Cache, 213M Buf, 135M Free
Swap: 2047M Total, 2047M Free

  PID USERNAME THR PRI NICE   SIZE    RES STATE  C   TIME   WCPU COMMAND
66557 www        1 109    0   105M 25340K RUN    3   0:14 64.99% php-cgi
66559 www        1 109    0   105M 25308K RUN    2   0:14 62.99% php-cgi
66561 www        1  98    0   105M 22196K RUN    0   0:01 12.99% php-cgi
66562 www        1  98    0   105M 22196K RUN    1   0:01 11.96% php-cgi
59043 nobody     1  47    0  7012K  3744K select 2   0:27  5.96% sqlcached
  774 pgsql      1  44    0   437M   112M select 2   3:55  0.00% postgres
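Several messages compare top(1) "CPU states" lines by eye. A small sketch that parses such a line and reports system time relative to user time, assuming exactly the layout quoted in these messages (other top versions may print different fields); the struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Fields of a top(1) "CPU states:" line as quoted in this thread, in percent. */
struct cpu_states {
    double user, nice, system, interrupt, idle;
};

/* Parse a line such as
 * "CPU states: 5.9% user, 0.0% nice, 81.3% system, 0.0% interrupt, 12.8% idle".
 * Returns 0 on success, -1 if the line does not match that layout. */
int parse_cpu_states(const char *line, struct cpu_states *cs)
{
    assert(line != NULL && cs != NULL);
    int n = sscanf(line,
                   "CPU states: %lf%% user, %lf%% nice, %lf%% system, "
                   "%lf%% interrupt, %lf%% idle",
                   &cs->user, &cs->nice, &cs->system,
                   &cs->interrupt, &cs->idle);
    return n == 5 ? 0 : -1;
}

/* System time as a multiple of user time: larger means more kernel overhead
 * per unit of application work. */
double sys_to_user_ratio(const struct cpu_states *cs)
{
    assert(cs != NULL && cs->user > 0.0);
    return cs->system / cs->user;
}
```

Applied to the lines quoted in this thread, the ratio is roughly 13.8 for Alexey's SMP attachment (81.3 / 5.9), 0.17 for his UP one (13.8 / 82.2), 1.25 for Ivan's 6-STABLE machine (48.4 / 38.8) and 0.59 for his RELENG_7 machine (28.8 / 49.0).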
Re: 2 x quad-core system is slower that 2 x dual core on FreeBSD
Thank you for your research. I think you can get more %sys with 4-core processors. For me 2xquad-core systems are now completely unusable as PHP backends. I am getting very alarmed by this discussion, as we just took delivery of ten 2x quad-core systems to be deployed as heavy webservers in order to replace the dual-core ones, probably under 7.0, as 6.3 won't boot PAE and they have 16 gigs of memory. Anyway I'm happy that I'm not alone with this problem. But what can we do about it? When I get a webserver up and running I will also do some benchmarking and see if I get the same results. I am simply running straightforward Obj-C code on mine. -pete.
Re: 2 x quad-core system is slower that 2 x dual core on FreeBSD
Hi. Ivan Voras wrote: CPU states: 5.9% user, 0.0% nice, 81.3% system, 0.0% interrupt, 12.8% idle CPU states: 82.2% user, 0.0% nice, 13.8% system, 0.0% interrupt, 4.0% idle Interesting coincidence: 1 CPU generates almost 8x less sys time than 8 CPUs. But it seems that you have found something real. Inspired by your problem I've done a simple measurement (ab) on a 4-CPU (2x2-core Opteron 2216 HE, PAE) machine I maintain, under these circumstances: a heavy PHP application, FastCGI, in this case a load of 4 clients, on 6-STABLE; and I'm reporting similar findings: CPU states: 38.8% user, 0.0% nice, 48.4% system, 3.2% interrupt, 9.6% idle We are not as performance-bound as you, so I didn't take measurements earlier. I cannot play with settings on this machine as it is in production, but ~50% sys time (the measurement varies around 45% +/- 10%) seems too much. Thank you for your research. I think you can get more %sys with 4-core processors. For me 2xquad-core systems are now completely unusable as PHP backends. Anyway I'm happy that I'm not alone with this problem. But what can we do about it? With best regards, Alexey Popov
Re: 2 x quad-core system is slower that 2 x dual core on FreeBSD
Hi. Tom Evans wrote: After that I rebuilt with the SMP GENERIC kernel and put twice as many requests on that server as UP could handle. At first it worked well. Then I increased the load to 2.5 times what UP could handle. Immediately the Apache child count rose to MaxClients (24), most of them in RUN state, and %sys became greater than %user (see attachment). I think that beyond some load threshold FreeBSD spends more CPU time managing running processes than running them. MaxClients of 24 seems very low for an 8-CPU box running the prefork MPM. On our quad-CPU boxes, running custom Apache modules, we use MaxClients 70, MinSpareServers 5, MaxSpareServers 15, StartServers 20. Perhaps you are seeing high system load because the system is having to maintain a lot of queued connections. Certainly, our load remains within comfortable margins, except when heavily stressed. I believe an 8-core FreeBSD server is able to maintain 1024 waiting TCP connections without measurable CPU load. As for this problem: increasing MaxClients leads to a growing %sys share of the CPU load. Generally a large MaxClients value is useful when most Apache children are waiting for I/O or something other than CPU. With best regards, Alexey Popov
Re: 2 x quad-core system is slower that 2 x dual core on FreeBSD
Thank you for your research. I think you can get more %sys with 4-core processors. For me 2xquad-core systems are now completely unusable as PHP backends. I am getting very alarmed by this discussion, as we just took delivery of ten 2x quad-core systems to be deployed as heavy webservers in order to replace the dual-core ones, probably under 7.0, as 6.3 won't boot PAE and they have 16 gigs of memory. I'm running two DL360 G5 webservers, each with two quad-core CPUs. Each has 8 GB of RAM; one is 2 GHz and the other is 2.33 GHz. They run just fine. These two webservers have twice the weight of three Opterons with two dual-core CPUs on our Coyote load-balancer. The servers are so fast I had to adjust the read and write sizes in fstab for the NFS mounts. Using 8 KB I got 'nfs server not responding - alive again' during peak on the new servers. Using a 2 KB size instead solved my problem. They handle twice as much traffic as the 4-way Opterons. The quad-cores run 7.0-BETA2 with ULE, PHP and Apache. -- regards Claus When lenity and cruelty play for a kingdom, the gentlest gamester is the soonest winner. Shakespeare
Re: 2 x quad-core system is slower that 2 x dual core on FreeBSD
Claus Guttesen wrote: I'm running two DL360 G5 webservers, each with two quad-core CPUs. Each has 8 GB of RAM; one is 2 GHz and the other is 2.33 GHz. They run just fine. These two webservers have twice the weight of three Opterons with two dual-core CPUs on our Coyote load-balancer. The issue in this thread is not whether they are fast, but whether they could be made faster by shortening sys time :) (btw, what is your sys time under stress?) The systems I reported about are fast enough for me, but they are not for the person who started the thread, and I agree that it looks like there could be a problem.
Re: 2 x quad-core system is slower that 2 x dual core on FreeBSD
Alexey Popov wrote: Hi Kris Kennaway wrote: CPU states: 9.5% user, 0.0% nice, 82.0% system, 0.5% interrupt, 8.0% idle A wild idea that might not help: try reducing kern.hz in loader.conf to something like 100 and see if something significant changes. Usually on PHP backends, slow PHP code eats most of the CPU time, and %user is much bigger than %system in the CPU states line. But now %system is much bigger than %user, so I conclude that on the 8-core server FreeBSD consumes more CPU time than PHP. That is one possibility, but you still need to look at the actual throughput on these machines before making conclusions about which is performing better. Can you please provide those numbers for 6.x, 7.x with ULE and 4BSD on the 4-core and 8-core systems? OK, here are the results of my practical research. The following is the approximate maximum qps that the backends can survive with my workload:

System                Max qps
7-STABLE quad ULE     20
7-STABLE quad 4BSD    17
6-STABLE quad         14
6-STABLE dual         21
Linux CentOS 5 quad   50

OK, so 7.x is an improvement compared to 6.x on the 8-core machine, and ULE is an improvement over 4BSD. This much is in line with expectations. Neither shows an improvement vs 4 cores. It is hard to say for certain without a direct profile comparison of the workload, but it is probably due to lockmgr contention. lockmgr is used for various locking operations to do with VFS data structures. It is known to have poor performance and to scale very badly. It is interesting that you are running into this on a real workload, though; so far I have only encountered it as a limiting factor in synthetic microbenchmarks. There was some work done over the summer on replacing lockmgr with something reasonable, but unfortunately it is not yet ready for testing. I am CC'ing the developer who was working on that (Attilio Rao). Depending on his availability it will probably be at least a couple of months before it is ready, though. In the meantime there is unfortunately not a lot that can be done, AFAICT.
There is one hack that I will send you later but it is not likely to help much. I will also think about how to track down the cause of the contention further (the profiling trace only shows that it comes mostly from vget/vput but doesn't show where these are called from). Kris With best regards, Alexey Popov
Re: 2 x quad-core system is slower that 2 x dual core on FreeBSD
Kris Kennaway wrote: In the meantime there is unfortunately not a lot that can be done, AFAICT. There is one hack that I will send you later but it is not likely to help much. I will also think about how to track down the cause of the contention further (the profiling trace only shows that it comes mostly from vget/vput but doesn't show where these are called from). Actually this patch might help. It doesn't replace lockmgr but it does fix a silly thundering herd behaviour. It probably needs some adjustment to get it to apply cleanly (it is about 7 months old), and I apparently stopped using it because I ran into deadlocks. It might be stable enough to at least see how much it helps. Set the vfs.lookup_shared=1 sysctl to enable the other half of the patch. Kris

Change 117289 by [EMAIL PROTECTED] on 2007/04/03 18:43:03

Rewrite of lockmgr to avoid the silly multiple wakeups. On a microbenchmark designed for high lockmgr contention (80 processes doing 1000 stat() calls of the same file on an 8-core amd64), this reduces system time by 95% and real time by 82%. This was ported from 6.x so support for LOCK_PROFILING is currently removed.

Also add a hack to fake up shared lookups on UFS: acquire the lock in shared mode and then upgrade it to exclusive mode. This has a small benefit on my tests, and it is claimed there is a very large benefit in some workloads.

Submitted by: ups

Affected files ...

... //depot/user/kris/contention/sys/conf/options#10 edit
... //depot/user/kris/contention/sys/kern/kern_lock.c#7 edit
... //depot/user/kris/contention/sys/kern/subr_lock.c#6 edit
... //depot/user/kris/contention/sys/kern/vfs_default.c#4 edit
... //depot/user/kris/contention/sys/sys/lockmgr.h#5 edit
... //depot/user/kris/contention/sys/ufs/ffs/ffs_vfsops.c#6 edit
... //depot/user/kris/contention/sys/ufs/ffs/ffs_vnops.c#6 edit
... //depot/user/kris/contention/sys/ufs/ufs/ufs_lookup.c#4 edit

Differences ...
//depot/user/kris/contention/sys/conf/options#10 (text+ko)

@@ -248,6 +248,9 @@
 # Enable gjournal-based UFS journal.
 UFS_GJOURNAL	opt_ufs.h
 
+# Disable shared lookups for UFS
+NO_UFS_LOOKUP_SHARED	opt_ufs.h
+
 # The below sentence is not in English, and neither is this one.
 # We plan to remove the static dependences above, with a
 # filesystem_ROOT option to control if it usable as root.  This list

//depot/user/kris/contention/sys/kern/kern_lock.c#7 (text+ko)

@@ -41,10 +41,9 @@
  */
 
 #include <sys/cdefs.h>
-__FBSDID("$FreeBSD: src/sys/kern/kern_lock.c,v 1.109 2007/03/30 18:07:24 jhb Exp $");
+__FBSDID("$FreeBSD: src/sys/kern/kern_lock.c,v 1.89.2.5 2006/10/09 20:04:45 tegge Exp $");
 
 #include "opt_ddb.h"
-#include "opt_global.h"
 
 #include <sys/param.h>
 #include <sys/kdb.h>
@@ -55,52 +54,21 @@
 #include <sys/mutex.h>
 #include <sys/proc.h>
 #include <sys/systm.h>
-#include <sys/lock_profile.h>
 #ifdef DEBUG_LOCKS
 #include <sys/stack.h>
 #endif
 #ifdef DDB
 #include <ddb/ddb.h>
-static void	db_show_lockmgr(struct lock_object *lock);
-#endif
-static void	lock_lockmgr(struct lock_object *lock, int how);
-static int	unlock_lockmgr(struct lock_object *lock);
-
-struct lock_class lock_class_lockmgr = {
-	.lc_name = "lockmgr",
-	.lc_flags = LC_SLEEPLOCK | LC_SLEEPABLE | LC_RECURSABLE | LC_UPGRADABLE,
-#ifdef DDB
-	.lc_ddb_show = db_show_lockmgr,
 #endif
-	.lc_lock = lock_lockmgr,
-	.lc_unlock = unlock_lockmgr,
-};
 /*
  * Locking primitives implementation.
  * Locks provide shared/exclusive sychronization.
  */
-void
-lock_lockmgr(struct lock_object *lock, int how)
-{
-
-	panic("lockmgr locks do not support sleep interlocking");
-}
-
-int
-unlock_lockmgr(struct lock_object *lock)
-{
-
-	panic("lockmgr locks do not support sleep interlocking");
-}
-
 #define	COUNT(td, x)	if ((td)) (td)->td_locks += (x)
-#define LK_ALL (LK_HAVE_EXCL | LK_WANT_EXCL | LK_WANT_UPGRADE | \
-	LK_SHARE_NONZERO | LK_WAIT_NONZERO)
-static int acquire(struct lock **lkpp, int extflags, int wanted, int *contested, uint64_t *waittime);
 static int acquiredrain(struct lock *lkp, int extflags) ;
 static __inline void
@@ -117,60 +85,16 @@
 	COUNT(td, -decr);
 	if (lkp->lk_sharecount == decr) {
-		lkp->lk_flags &= ~LK_SHARE_NONZERO;
-		if (lkp->lk_flags & (LK_WANT_UPGRADE | LK_WANT_EXCL)) {
-			wakeup(lkp);
-		}
+		if (lkp->lk_exclusivewait != 0)
+			wakeup_one(&lkp->lk_exclusivewait);
 		lkp->lk_sharecount = 0;
 	} else {
 		lkp->lk_sharecount -= decr;
+		if (lkp->lk_sharecount == 1 && lkp->lk_flags & LK_WANT_UPGRADE)
+			wakeup(&lkp->lk_flags);
 	}
 }
-static int
-acquire(struct lock **lkpp, int extflags, int wanted, int *contested, uint64_t *waittime)
-{
-
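The core of the change is which sleepers a release wakes. In the removed lines, releasing the last shared hold calls wakeup(lkp), waking every thread sleeping on the lock so they all race for it; the added lines instead call wakeup_one() on the exclusive-waiter channel. The following is a userland analogue with POSIX threads, not the kernel code: pthread_cond_signal() plays the role of wakeup_one() and pthread_cond_broadcast() that of wakeup(). The `demo_lock` type and its functions are hypothetical, and the sketch leaves out lock upgrades, draining and recursion.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical shared/exclusive lock illustrating "wake one exclusive
 * waiter" instead of "wake everybody" on release. */
struct demo_lock {
    pthread_mutex_t mtx;
    pthread_cond_t  excl_cv;     /* exclusive waiters sleep here */
    pthread_cond_t  shared_cv;   /* shared waiters sleep here */
    int sharecount;              /* current shared holders */
    int exclusive;               /* 1 while held exclusively */
    int exclusivewait;           /* sleeping exclusive waiters */
};

void demo_init(struct demo_lock *lk)
{
    pthread_mutex_init(&lk->mtx, NULL);
    pthread_cond_init(&lk->excl_cv, NULL);
    pthread_cond_init(&lk->shared_cv, NULL);
    lk->sharecount = lk->exclusive = lk->exclusivewait = 0;
}

void demo_slock(struct demo_lock *lk)
{
    pthread_mutex_lock(&lk->mtx);
    /* Waiting writers take precedence over newly arriving readers. */
    while (lk->exclusive || lk->exclusivewait > 0)
        pthread_cond_wait(&lk->shared_cv, &lk->mtx);
    lk->sharecount++;
    pthread_mutex_unlock(&lk->mtx);
}

void demo_sunlock(struct demo_lock *lk)
{
    pthread_mutex_lock(&lk->mtx);
    assert(lk->sharecount > 0);
    /* Only the last shared holder wakes anyone, and only one writer. */
    if (--lk->sharecount == 0 && lk->exclusivewait > 0)
        pthread_cond_signal(&lk->excl_cv);
    pthread_mutex_unlock(&lk->mtx);
}

void demo_xlock(struct demo_lock *lk)
{
    pthread_mutex_lock(&lk->mtx);
    lk->exclusivewait++;
    while (lk->exclusive || lk->sharecount > 0)
        pthread_cond_wait(&lk->excl_cv, &lk->mtx);
    lk->exclusivewait--;
    lk->exclusive = 1;
    pthread_mutex_unlock(&lk->mtx);
}

void demo_xunlock(struct demo_lock *lk)
{
    pthread_mutex_lock(&lk->mtx);
    assert(lk->exclusive);
    lk->exclusive = 0;
    if (lk->exclusivewait > 0)
        pthread_cond_signal(&lk->excl_cv);      /* hand off to one writer */
    else
        pthread_cond_broadcast(&lk->shared_cv); /* readers can all proceed */
    pthread_mutex_unlock(&lk->mtx);
}
```

Readers are released together only when no writer is waiting, so this sketch prefers writers; a steady stream of writers could starve readers, a trade-off a real implementation would need to weigh.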
Re: 2 x quad-core system is slower that 2 x dual core on FreeBSD
Bob Bishop wrote: Hi, FWIW, we are seeing 2 x quad-core 2.66GHz outperform (per core) 2 x dual-core 3GHz on the same type of m/b, apparently because of better bandwidth to memory. However, this is on a compute-intensive workload running 1 job per core so would be pretty insensitive to scheduler/locking issues. Alexey's problem is pretty specific to filesystem performance. Good to hear though :) Kris
Re: 2 x quad-core system is slower that 2 x dual core on FreeBSD
FWIW, we are seeing 2 x quad-core 2.66GHz outperform (per core) 2 x dual-core 3GHz on the same type of m/b, apparently because of better bandwidth to memory. However, this is on a compute-intensive workload running 1 job per core so would be pretty insensitive to scheduler/locking issues. Alexey's problem is pretty specific to filesystem performance. Good to hear though :) If that is the conclusion, wouldn't it make sense to try a different disk controller then? -- regards Claus
Re: 2 x quad-core system is slower that 2 x dual core on FreeBSD
Claus Guttesen wrote: FWIW, we are seeing 2 x quad-core 2.66GHz outperform (per core) 2 x dual-core 3GHz on the same type of m/b, apparently because of better bandwidth to memory. However, this is on a compute-intensive workload running 1 job per core so would be pretty insensitive to scheduler/locking issues. Alexey's problem is pretty specific to filesystem performance. Good to hear though :) If that is the conclusion, wouldn't it make sense to try a different disk controller then? Filesystem, not disk. See my earlier email for more detailed discussion. Kris
Re: 2 x quad-core system is slower that 2 x dual core on FreeBSD
The issue in this thread is not whether they are fast, but whether they could be made faster by shortening sys time :) Yes, I'm aware of that. :-) My comment was a reply to the earlier mail from someone who had become uncertain after reading this thread. (btw, what is your sys time under stress?) I'll take a look on Sunday. That is our busiest day (evening). -- regards Claus
Re: 2 x quad-core system is slower that 2 x dual core on FreeBSD
Hi, FWIW, we are seeing 2 x quad-core 2.66GHz outperform (per core) 2 x dual-core 3GHz on the same type of m/b, apparently because of better bandwidth to memory. However, this is on a compute-intensive workload running 1 job per core so would be pretty insensitive to scheduler/locking issues. -- Bob Bishop +44 (0)118 940 1243 [EMAIL PROTECTED] fax +44 (0)118 940 1295
Re: 2 x quad-core system is slower that 2 x dual core on FreeBSD
Kris Kennaway wrote: Kris Kennaway wrote: In the meantime there is unfortunately not a lot that can be done, AFAICT. There is one hack that I will send you later but it is not likely to help much. I will also think about how to track down the cause of the contention further (the profiling trace only shows that it comes mostly from vget/vput but doesn't show where these are called from). Actually this patch might help. It doesn't replace lockmgr but it does fix a silly thundering herd behaviour. It probably needs some adjustment to get it to apply cleanly (it is about 7 months old), and I apparently stopped using it because I ran into deadlocks. It might be stable enough to at least see how much it helps. Set the vfs.lookup_shared=1 sysctl to enable the other half of the patch. Kris Try this one instead, it applies to HEAD. You'll need to manually enter the paths though because of how p4 mangles diffs. Kris

//depot/user/kris/contention/sys/kern/kern_lock.c#10 - /zoo/kris/contention/kern/kern_lock.c
@@ -109,7 +109,6 @@
 #define LK_ALL (LK_HAVE_EXCL | LK_WANT_EXCL | LK_WANT_UPGRADE | \
 	LK_SHARE_NONZERO | LK_WAIT_NONZERO)
-static int acquire(struct lock **lkpp, int extflags, int wanted, int *contested, uint64_t *waittime);
 static int acquiredrain(struct lock *lkp, int extflags) ;
 static __inline void
@@ -126,61 +125,17 @@
 	COUNT(td, -decr);
 	if (lkp->lk_sharecount == decr) {
-		lkp->lk_flags &= ~LK_SHARE_NONZERO;
-		if (lkp->lk_flags & (LK_WANT_UPGRADE | LK_WANT_EXCL)) {
-			wakeup(lkp);
-		}
+		if (lkp->lk_exclusivewait != 0)
+			wakeup_one(&lkp->lk_exclusivewait);
 		lkp->lk_sharecount = 0;
 	} else {
 		lkp->lk_sharecount -= decr;
+		if (lkp->lk_sharecount == 1 && lkp->lk_flags & LK_WANT_UPGRADE)
+			wakeup(&lkp->lk_flags);
 	}
 }
-static int
-acquire(struct lock **lkpp, int extflags, int wanted, int *contested, uint64_t *waittime)
-{
-	struct lock *lkp = *lkpp;
-	int error;
-	CTR3(KTR_LOCK,
-	    "acquire(): lkp == %p, extflags == 0x%x, wanted == 0x%x",
-	    lkp, extflags, wanted);
-	if ((extflags & LK_NOWAIT) && (lkp->lk_flags & wanted))
-		return EBUSY;
-	error = 0;
-	if ((lkp->lk_flags & wanted) != 0)
-		lock_profile_obtain_lock_failed(&lkp->lk_object, contested, waittime);
-
-	while ((lkp->lk_flags & wanted) != 0) {
-		CTR2(KTR_LOCK,
-		    "acquire(): lkp == %p, lk_flags == 0x%x sleeping",
-		    lkp, lkp->lk_flags);
-		lkp->lk_flags |= LK_WAIT_NONZERO;
-		lkp->lk_waitcount++;
-		error = msleep(lkp, lkp->lk_interlock, lkp->lk_prio,
-		    lkp->lk_wmesg,
-		    ((extflags & LK_TIMELOCK) ? lkp->lk_timo : 0));
-		lkp->lk_waitcount--;
-		if (lkp->lk_waitcount == 0)
-			lkp->lk_flags &= ~LK_WAIT_NONZERO;
-		if (error)
-			break;
-		if (extflags & LK_SLEEPFAIL) {
-			error = ENOLCK;
-			break;
-		}
-		if (lkp->lk_newlock != NULL) {
-			mtx_lock(lkp->lk_newlock->lk_interlock);
-			mtx_unlock(lkp->lk_interlock);
-			if (lkp->lk_waitcount == 0)
-				wakeup((void *)(&lkp->lk_newlock));
-			*lkpp = lkp = lkp->lk_newlock;
-		}
-	}
-	mtx_assert(lkp->lk_interlock, MA_OWNED);
-	return (error);
-}
-
 /*
  * Set, change, or release a lock.
  *
@@ -189,16 +144,16 @@
  * accepted shared locks and shared-to-exclusive upgrades to go away.
  */
 int
-_lockmgr(struct lock *lkp, u_int flags, struct mtx *interlkp,
-	 struct thread *td, char *file, int line)
-
+lockmgr(lkp, flags, interlkp, td)
+	struct lock *lkp;
+	u_int flags;
+	struct mtx *interlkp;
+	struct thread *td;
 {
 	int error;
 	struct thread *thr;
-	int extflags, lockflags;
-	int contested = 0;
-	uint64_t waitstart = 0;
-
+	int extflags;
+
 	error = 0;
 	if (td == NULL)
 		thr = LK_KERNPROC;
@@ -226,7 +181,7 @@
 	if ((flags & (LK_NOWAIT|LK_RELEASE)) == 0)
 		WITNESS_WARN(WARN_GIANTOK | WARN_SLEEPOK,
-		    &lkp->lk_interlock->lock_object,
+		    &lkp->lk_interlock->mtx_object,
 		    "Acquiring lockmgr lock \"%s\"", lkp->lk_wmesg);
 	if (panicstr != NULL) {
@@ -253,16 +208,30 @@
 		 * lock itself ).
 		 */
 		if (lkp->lk_lockholder != thr) {
-			lockflags = LK_HAVE_EXCL;
-			if (td != NULL && !(td->td_pflags & TDP_DEADLKTREAT))
-				lockflags |=
Re: 2 x quad-core system is slower that 2 x dual core on FreeBSD
Alexey Popov wrote: Hi. I have a large pool of web backends (Apache + mod_php5) with 2 x Xeon 3.2GHz processors and 2 x Xeon 5120 dual-core processors. The workload is mostly CPU-bound. I'm using 6-STABLE-amd64 and also tried 7-STABLE. If you haven't tried mod_fcgid, give it a try - it can dramatically benefit PHP applications. And with mod_fcgid, you can use apache with a multi-threaded MPM (i.e. worker-mpm). Now I'm trying to use new hardware with 2 x Xeon 5320 (quad-core), but it can not work under the same load as dual-core. It shows up to 80% system CPU load in top: On what version of FreeBSD is this? If it's 6-STABLE, this might be expected. CPU states: 9.5% user, 0.0% nice, 79.9% system, 1.2% interrupt, 9.5% idle Can you try hitting S to see if a kernel process is gobbling up CPU time? Here's the output from 2xdual-core backend running under the same load and with the same software: CPU states: 0.0% user, 0.0% nice, 0.0% system, 0.0% interrupt, 0.0% idle This line is bogus - where is the load? What can I do to make FreeBSD run faster on many-CPU systems??? Except for trying 7-STABLE, there's not much you can do.
Re: 2 x quad-core system is slower that 2 x dual core on FreeBSD
On Mon, 19 Nov 2007, Alexey Popov wrote: I tried Linux and it works much better than the old (2 x dual-core) backends. It handles 2 times more requests than FreeBSD on the old backends. So there's a real scalability problem in FreeBSD: the more processors it has, the more CPU time it consumes. I also faced the same problem when moving a heavily loaded MySQL server to new hardware. At that time I thought the problem was in mysql-server itself, and I had to install Linux. See attached: mutex statistics for the quad-core system, and dmesg and vmstat for the dual- and quad-core systems. What can I do to make FreeBSD run faster on many-CPU systems??? Have you configured libmap.conf to force MySQL to use libthr instead of libpthread? libpthread is known to have serious performance bottlenecks for MySQL as compared to libthr. FreeBSD 7 contains significant optimization for increased numbers of cores, and is where a lot of the work optimizing MySQL has ended up. I see you're trying out a 6.3 beta, any chance you could try out a 7.0 beta instead? Also, consider switching to options SCHED_ULE in the 7.0 kernel rather than options SCHED_4BSD. Robert N M Watson Computer Laboratory University of Cambridge
Re: 2 x quad-core system is slower that 2 x dual core on FreeBSD
Hi. Ivan Voras wrote: I have a large pool of web backends (Apache + mod_php5) with 2 x Xeon 3.2GHz processors and 2 x Xeon 5120 dual-core processors. The workload is mostly CPU-bound. I'm using 6-STABLE-amd64 and also tried 7-STABLE. If you haven't tried mod_fcgid, give it a try - it can dramatically benefit PHP applications. And with mod_fcgid, you can use apache with a multi-threaded MPM (i.e. worker-mpm). We tried running PHP + nginx via the FastCGI interface without Apache at all, but the improvement was too small (~10% more requests per second) to give up the advantages of Apache. Now I'm trying to use new hardware with 2 x Xeon 5320 (quad-core), but it can not work under the same load as dual-core. It shows up to 80% system CPU load in top: On what version of FreeBSD is this? If it's 6-STABLE, this might be expected. I get almost identical results on 6-STABLE and 7-STABLE; maybe 7-STABLE performs a little better. CPU states: 9.5% user, 0.0% nice, 79.9% system, 1.2% interrupt, 9.5% idle Can you try hitting S to see if a kernel process is gobbling up CPU time? 
There's no such process:

last pid:  5266;  load averages: 24.67, 22.65, 17.44   up 0+03:56:38  17:09:37
121 processes: 41 running, 62 sleeping, 18 waiting
CPU states:  9.5% user,  0.0% nice, 82.0% system,  0.5% interrupt,  8.0% idle
Mem: 439M Active, 27M Inact, 80M Wired, 108K Cache, 58M Buf, 3341M Free
Swap: 2048M Total, 2048M Free

  PID USERNAME  PRI NICE   SIZE    RES STATE   C   TIME   WCPU COMMAND
 5090 www        -4    0 96572K 49464K RUN     5   2:59 23.39% httpd
 3748 www        -4    0 96172K 50060K RUN     4  14:21 23.19% httpd
 5092 www        -4    0 96412K 48060K RUN     4   2:57 23.19% httpd
 5095 www        -4    0 98148K 50688K RUN     5   2:57 22.75% httpd
 5088 www        -4    0 96664K 49120K RUN     4   3:02 22.56% httpd
 5098 www        -4    0 97404K 49864K RUN     3   2:57 22.56% httpd
 5106 www       118    0 97908K 49972K CPU7    6   2:57 22.51% httpd
 5084 www        -4    0 96012K 48164K RUN     5   3:01 22.46% httpd
 5081 www        -4    0 96636K 49700K RUN     0   3:01 22.36% httpd
 5109 www        -4    0 96844K 49188K RUN     3   2:51 22.36% httpd
 5108 www        -4    0 95808K 47508K RUN     5   3:00 22.31% httpd
 5085 www        -4    0 98244K 49560K RUN     4   2:58 21.88% httpd
 5104 www        -4    0 96836K 48956K CPU5    5   2:55 21.88% httpd
 5086 www       118    0 99140K 51264K CPU0    3   3:00 21.78% httpd
 5111 www        -4    0 96360K 48532K RUN     0   2:56 21.78% httpd
 5105 www        -4    0 96364K 47356K RUN     0   2:58 21.73% httpd
 5099 www        -4    0     9K 47156K RUN     4   2:55 21.73% httpd
 5096 www        -4    0 96004K 48324K RUN     4   2:56 21.68% httpd
 5083 www       117    0 97712K 50344K RUN     2   3:03 21.63% httpd
 5094 www       118    0 97196K 49348K CPU3    6   2:56 21.58% httpd
 5103 www        -4    0 96040K 48808K RUN     4   2:58 21.48% httpd
 5089 www       118    0 96084K 47808K CPU2    4   2:59 21.34% httpd
 5082 www       117    0 96412K 48520K CPU6    5   3:00 21.29% httpd
 5107 www        -4    0 98172K 50332K RUN     4   2:55 21.29% httpd
 5091 www        -4    0 97460K 49504K RUN     0   2:56 20.95% httpd
 5100 www        -4    0 97188K 49400K RUN     4   2:56 20.65% httpd
 5110 www        -4    0 95168K 47436K RUN     5   2:59 20.56% httpd
 5087 www       116    0 98432K 51172K CPU4    5   2:55 20.31% httpd
 5097 www        -4    0 96428K 49124K RUN     4   2:59 20.21% httpd
 5102 www       117    0 96344K 48512K CPU3    4   3:01 19.82% httpd
 5093 www        -4    0 96512K 49948K RUN     4   2:55 19.82% httpd
 5101 www        -4    0 96012K 48968K RUN     3   3:01 19.48% httpd
   10 root      171   52     0K    16K RUN     7 174:56  7.86% idle: cpu7
   12 root      171   52     0K    16K RUN     5 174:44  7.86% idle: cpu5
   14 root      171   52     0K    16K RUN     3 175:04  7.62% idle: cpu3

Here's the output from 2xdual-core backend running under the same load and with the same software: CPU states: 0.0% user, 0.0% nice, 0.0% system, 0.0% interrupt, 0.0% idle This line is bogus - where is the load? Sorry, that was probably a copy-paste mistake on my part:

last pid: 54690;  load averages:  3.47,  4.89,  5.18   up 42+02:07:51  17:00:00
47 processes:  3 running, 43 sleeping, 1 zombie
CPU states: 56.0% user,  0.0% nice, 16.7% system,  1.6% interrupt, 25.7% idle
Mem: 2268M Active, 416M Inact, 277M Wired, 186M Cache, 214M Buf, 664M Free
Swap: 2048M Total, 1408K Used, 2047M Free

  PID USERNAME THR PRI NICE   SIZE    RES STATE   C   TIME   WCPU COMMAND
54681 www        1 106    0 96916K 47792K CPU3    0   0:10 33.45% httpd
54652 www        1  20    0 97716K 48144K lockf   1   0:24 31.61% httpd
54680 www        1 106    0 96416K 46832K select  1   0:10 31.37% httpd
54686 www        1  20    0 97640K 45604K lockf   1   0:04 31.13% httpd
54651 www        1 104    0 96552K 46924K CPU1    1   0:25 29.50% httpd
54685 www        1 107    0 99124K 47300K select  3
Re: 2 x quad-core system is slower that 2 x dual core on FreeBSD
Hi. Robert Watson wrote: Also I faced the same problem moving heavily loaded MySQL-server to new hardware. That time I thought that the problem is in the mysql-server itself and I had to install Linux. What can I do to make FreeBSD run faster on many-CPU systems??? Have you configured libmap.conf to force MySQL to use libthr instead of libpthread? libpthread is known to have serious performance bottlenecks for MySQL as compared to libthr. I always use libthr with MySQL on 6-STABLE and it really helps. But that time with MySQL (and this time with Apache) the bottleneck was somewhere else. FreeBSD 7 contains significant optimization for increased numbers of cores, and is where a lot of the work optimizing MySQL has ended up. I see you're trying out a 6.3 beta, any chance you could try out a 7.0 beta instead? Also, consider switching to options SCHED_ULE in the 7.0 kernel rather than options SCHED_4BSD. I tried 7-BETA with SCHED_4BSD and it did not help. Now I'll try SCHED_ULE, thanks. With best regards, Alexey Popov
Re: 2 x quad-core system is slower that 2 x dual core on FreeBSD
Alexey Popov wrote:

last pid:  5266;  load averages: 24.67, 22.65, 17.44   up 0+03:56:38  17:09:37
121 processes: 41 running, 62 sleeping, 18 waiting
CPU states:  9.5% user,  0.0% nice, 82.0% system,  0.5% interrupt,  8.0% idle
Mem: 439M Active, 27M Inact, 80M Wired, 108K Cache, 58M Buf, 3341M Free
Swap: 2048M Total, 2048M Free

  PID USERNAME  PRI NICE   SIZE    RES STATE   C   TIME   WCPU COMMAND
 5090 www        -4    0 96572K 49464K RUN     5   2:59 23.39% httpd
 3748 www        -4    0 96172K 50060K RUN     4  14:21 23.19% httpd
 5092 www        -4    0 96412K 48060K RUN     4   2:57 23.19% httpd
 5095 www        -4    0 98148K 50688K RUN     5   2:57 22.75% httpd
 5088 www        -4    0 96664K 49120K RUN     4   3:02 22.56% httpd

This is really unusual - the number of processes is not that high, but if I'm reading the line from systat correctly, you have unusually many context switches:

r p d s w Csw Trp Sys Int Sof Flt  cow  16839 total
27 1 39 137k 3390 33k 2490 313 2519  2519 zfod  sio0 irq4

nginx or similar asynchronous web servers should dramatically reduce inter-process contention and context switches, but you say that it didn't help much, so the problem might be somewhere else. Try sending about 10 seconds of vmstat output to confirm this. If you can, attach ktrace(1) to one of the httpd processes that consumes CPU, and send the processed kdump output. Also, did you try configuring and running pecl-APC for PHP?
Re: 2 x quad-core system is slower that 2 x dual core on FreeBSD
Alexey Popov wrote: CPU states: 9.5% user, 0.0% nice, 82.0% system, 0.5% interrupt, 8.0% idle A wild idea that might not help: try reducing kern.hz in loader.conf to something like 100 and see if something significant changes.
Re: 2 x quad-core system is slower that 2 x dual core on FreeBSD
Hi Robert Watson wrote: FreeBSD 7 contains significant optimization for increased numbers of cores, and is where a lot of the work optimizing MySQL has ended up. I see you're trying out a 6.3 beta, any chance you could try out a 7.0 beta instead? Also, consider switching to options SCHED_ULE in the 7.0 kernel rather than options SCHED_4BSD. I tried SCHED_ULE, but got no difference:

last pid:  1063;  load averages: 22.75, 13.76,  6.31   up 0+00:07:24  17:53:49
56 processes: 33 running, 23 sleeping
CPU states: 26.5% user,  0.0% nice, 68.1% system,  0.3% interrupt,  5.1% idle
Mem: 365M Active, 20M Inact, 102M Wired, 664K Cache, 46M Buf, 3419M Free
Swap: 2048M Total, 2048M Free

  PID USERNAME THR PRI NICE   SIZE    RES STATE   C   TIME   WCPU COMMAND
 1019 www        1 101    0   101M 51244K RUN     6   0:37 26.86% httpd
 1040 www        1  -4    0 92476K 42956K RUN     1   0:36 26.76% httpd
 1004 www        1  -4    0 92476K 42864K RUN     4   0:38 25.98% httpd
 1018 www        1 101    0 91452K 41736K CPU3    3   0:37 25.68% httpd
 1000 www        1 101    0 92476K 42544K RUN     0   0:36 25.29% httpd
 1026 www        1 101    0 93500K 39900K CPU0    0   0:35 25.20% httpd
 1021 www        1 101    0   101M 49432K RUN     4   0:37 25.10% httpd
 1024 www        1 101    0 93500K 44416K RUN     5   0:37 25.10% httpd
 1020 www        1 101    0 94524K 43684K RUN     0   0:37 25.00% httpd
 1030 www        1 101    0 96576K 46004K RUN     3   0:36 25.00% httpd
 1031 www        1 101    0   101M 50956K RUN     3   0:37 24.66% httpd
 1025 www        1 101    0 94524K 43880K RUN     5   0:36 24.56% httpd
 1041 www        1 101    0 92476K 41792K RUN     2   0:36 24.56% httpd
 1022 www        1 101    0   101M 48932K RUN     5   0:36 24.27% httpd

With best regards, Alexey Popov
Re: 2 x quad-core system is slower that 2 x dual core on FreeBSD
Hi. Ivan Voras wrote: CPU states: 9.5% user, 0.0% nice, 82.0% system, 0.5% interrupt, 8.0% idle A wild idea that might not help: try reducing kern.hz in loader.conf to something like 100 and see if something significant changes. Now it runs with hz=100. The number of context switches has roughly halved, but there's still 90% system CPU load (see attachment). With best regards, Alexey Popov

Attached systat -vmstat output (recoverable values):
1 users    Load 16.36 12.24 6.14    Nov 19 18:08
Proc (r p d s w): 31 24    Csw 98k   Trp 35k   Sys 95k   Int 2315   Sof 100   Flt 29k
48.6% Sys   1.0% Intr   49.5% User   0.0% Nice   1.0% Idle
Interrupts: 3917 total; em0 irq256 2311; mfi0 irq18 5; cpu0-cpu7 time 200 each (cpu5 201)
VM: cow 1, zfod 29919, prcfr 1709, totfr 23140
Buffers/vnodes: dtbuf 4, desvn 10, numvn 1494, frevn 158
Namei: 147517 calls, 147514 name-cache hits (100%)
Disk mfid0: 20.20 KB/t, 5 tps, 0.10 MB/s, 1% busy
Memory (KB): wire 106840, act 355720, inact 21248, cache 1228, free 3514792, buf 65056
Re: 2 x quad-core system is slower that 2 x dual core on FreeBSD
On Mon, 19 Nov 2007 15:54:32 +0100, Alexey Popov [EMAIL PROTECTED] wrote: Hi Robert Watson wrote: FreeBSD 7 contains significant optimization for increased numbers of cores, and is where a lot of the work optimizing MySQL has ended up. I see you're trying out a 6.3 beta, any chance you could try out a 7.0 beta instead? Also, consider switching to options SCHED_ULE in the 7.0 kernel rather than options SCHED_4BSD. I tried SCHED_ULE, but got no difference:

last pid:  1063;  load averages: 22.75, 13.76,  6.31   up 0+00:07:24  17:53:49
56 processes: 33 running, 23 sleeping
CPU states: 26.5% user,  0.0% nice, 68.1% system,  0.3% interrupt,  5.1% idle
Mem: 365M Active, 20M Inact, 102M Wired, 664K Cache, 46M Buf, 3419M Free
Swap: 2048M Total, 2048M Free

You have a lot of free memory. Maybe you can wait a little to let it fill the cache or let it use more bufs. That could explain why the system is spending so much time in 'system'. Ronald. -- Ronald Klop Amsterdam, The Netherlands
Re: 2 x quad-core system is slower that 2 x dual core on FreeBSD
On Mon, 19 Nov 2007, Alexey Popov wrote: Robert Watson wrote: FreeBSD 7 contains significant optimization for increased numbers of cores, and is where a lot of the work optimizing MySQL has ended up. I see you're trying out a 6.3 beta, any chance you could try out a 7.0 beta instead? Also, consider switching to options SCHED_ULE in the 7.0 kernel rather than options SCHED_4BSD. I tried SCHED_ULE, but got no difference: Did you see no change in throughput, or no change in reported CPU use? We should probably take this thread to performance@ and get Kris involved. He may be interested in trying to reproduce your workload in our testbed so we can perform measurements of our own, as well as getting you to provide profiling information. One of the things we'd most like to have are nice potted benchmarks for real-world workloads, as that allows us to easily replay them, perform measurements, optimize, etc. Thanks, Robert N M Watson Computer Laboratory University of Cambridge
Re: 2 x quad-core system is slower that 2 x dual core on FreeBSD
Hi. Ivan Voras wrote:

last pid:  5266;  load averages: 24.67, 22.65, 17.44   up 0+03:56:38
121 processes: 41 running, 62 sleeping, 18 waiting
CPU states:  9.5% user,  0.0% nice, 82.0% system,  0.5% interrupt,  8.0% idle

This is really unusual - the number of processes is not that high, but if I'm reading the line from systat correctly, you have unusually many context switches:

r p d s w Csw Trp Sys Int Sof Flt  cow  16839 total
27 1 39 137k 3390 33k 2490 313 2519  2519 zfod  sio0 irq4

nginx or similar asynchronous web servers should reduce inter-process contention context switches dramatically, but you say that it didn't work as such so the problem might be somewhere else. Try sending a 10-second or so output from vmstat to confirm this problem. Yes, there really are many context switches:

% vmstat 1
 procs      memory      page                    disk   faults          cpu
 r b w     avm     fre   flt  re  pi  po    fr  sr mf0   in    sy     cs us sy id
23 1 0  615284 3581456 15980   0   0   0 15964   0   0 1414 58211 115230 25 60 15
24 0 0  631668 3564976  9940   0   0   0  5793   0   0  664 30036 158059 11 79 10
20 0 0  655220 3545516 22146   0   0   0 16731   0   0 1992 77638 116627 31 65  4
23 0 0  622452 3579700 18248   0   0   0 27451   0   0 1839 80646 115798 38 59  3
15 9 0  614260 3587484  4795   0   0   0  6765   0   0  352 23938 159993  6 83 11
21 0 0  625524 3567948 10154   0   0   0  5308   0   0  653 32718 159119 11 81  8
13 3 0  627572 3571924 15266   0   0   0 16278   0   0 1031 50321 142111 20 69 11
21 0 0  605044 3591860  9008   0   0   0 14021   0   0  873 42083 160441 13 79  8
19 1 0  611188 3593404  7498   0   0   0  7920   0   0  489 30012 158176 10 77 13
24 0 0  610164 3592360  5855   0   0   0  5602   0   0  666 26627 162937  8 81 11
20 3 0  622452 3587456  6372   0   0   0  5144   0   0  362 23705 161257 10 81 10
^C

If you can, attach a ktrace(1) to one of the httpd processes that consumes CPU, and send the processed kdump output. Here it is: http://83.167.98.162/gprof/kdump.txt.gz Also, did you try configuring and running pecl-APC for PHP? I'm using eAccelerator. Again, the same software works well on systems with fewer CPUs and on Linux. With best regards, Alexey Popov
Re: 2 x quad-core system is slower that 2 x dual core on FreeBSD
Hi. Robert Watson wrote: I tried SCHED_ULE, but got no difference: Did you see no change in throughput, or no change in reported CPU use? No significant changes. We should probably take this thread to performance@ and get Kris involved. He may be interested in trying to reproduce your workload in our testbed so we can perform measurements of our own, as well as getting you to provide profiling information. One of the things we'd most like to have are nice potted benchmarks for real-world workloads, as that allows us to easily replay them, perform measurements, optimize, etc. I can provide any profiling or configuration information you ask for, except the PHP source code of the site. Now I'm in a situation where I can't install FreeBSD on the new servers, because they are all based on 2 x quad-core processors and I can't be sure it would work well. With best regards, Alexey Popov
Re: 2 x quad-core system is slower that 2 x dual core on FreeBSD
Hi All, What version of Apache do you use, and what are the StartServers, MinSpareServers, MaxSpareServers, MaxClients and KeepAliveTimeout settings in both configurations? Best Regards Alexey Popov wrote: Hi. I have a large pool of web backends (Apache + mod_php5) with 2 x Xeon 3.2GHz processors and 2 x Xeon 5120 dual-core processors. The workload is mostly CPU-bound. I'm using 6-STABLE-amd64 and also tried 7-STABLE. Now I'm trying to use new hardware with 2 x Xeon 5320 (quad-core), but it can not work under the same load as dual-core. It shows up to 80% system CPU load in top:
Re: 2 x quad-core system is slower that 2 x dual core on FreeBSD
On Nov 19, 2007 2:32 PM, Alexey Popov [EMAIL PROTECTED] wrote: Hi. I have a large pool of web backends (Apache + mod_php5) with 2 x Xeon 3.2GHz processors and 2 x Xeon 5120 dual-core processors. The workload is mostly CPU-bound. I'm using 6-STABLE-amd64 and also tried 7-STABLE. Now I'm trying to use new hardware with 2 x Xeon 5320 (quad-core), but it can not work under the same load as dual-core. It shows up to 80% system CPU load in top: last pid: 3850; load averages: 22.51, 19.75, 12.18 Very high load. Could it be the RAID controller? I had a DB server with horrible performance due to a cheap RAID controller. Moving to a ciss controller (DL380 G5) solved all my issues. My load decreased 100-fold. -- regards Claus
Re: 2 x quad-core system is slower that 2 x dual core on FreeBSD
Alexey Popov wrote: Hi. Robert Watson wrote: I tried SCHED_ULE, but got no difference: Did you see no change in throughput, or no change in reported CPU use? No significant changes. We should probably take this thread to performance@ and get Kris involved. He may be interested in trying to reproduce your workload in our testbed so we can perform measurements of our own, as well as getting you to provide profiling information. One of the things we'd most like to have are nice potted benchmarks for real-world workloads, as that allows us to easily replay them, perform measurements, optimize, etc. I can provide all profiling or configuration information you ask for. Except I can't provide PHP site source codes. Now I'm in situation that I can't install FreeBSD on all new servers because they are all based on 2xquad-core processors and I can't be sure it would work good. Running mutex profiling for e.g. 1 minute of representative load would be a useful starting point, as well as hwpmc profiling for the same duration. My guess is that you're hitting contention in the TCP send path, but I missed the start of this conversation so I don't know what problems you are seeing. Kris
Re: 2 x quad-core system is slower that 2 x dual core on FreeBSD
Alexey Popov wrote: Here is it: http://83.167.98.162/gprof/kdump.txt.gz I don't see anything unusual there. Some more ideas: How is your disk load (iostat, systat -vm, diskinfo -t) during the load? You don't use NFS for the web directories, do you? Can you run bonnie++ while the machine is idle (i.e. Apache is stopped) just to verify it isn't a stupid problem with the disks or the driver? Also, did you try configuring and running pecl-APC for PHP? I'm using eAccelerator. Again, the same soft works good on less-CPU system and on Linux. So, did you take the CPUs out of the motherboard and plug in other ones? If not, you can't be sure that something else isn't wrong. I know you tried it on Linux, but it might use slightly different commands in the driver that don't trigger the error. I'm very surprised that 6.x and 7.x behave almost the same under your load: since they are very different in how they support multiple CPUs, I'd expect a big difference in this case (in favour of 7.x), not a small one. This might indicate that the problem is not in the OS itself, but in the hardware or in some driver. Many people (including me) have run FreeBSD on machines like yours without such problems, so let's dig further. You don't have WITNESS, INVARIANTS, DIAGNOSTIC or something similar enabled? Can you try a generic SMP kernel (called SMP in 6.x; GENERIC in 7.x has SMP by default) and see how it works? Can you disable SMP and try with only one CPU (on the 2 x quad machine)? You can do it in loader.conf by setting kern.smp.disabled=1, or perhaps in the BIOS. If there's a problem in some hardware or a driver, you'd still see a big sys time load. You might also want to halt certain logical CPUs in the OS itself (see the smp(4) man page) and see if there's a relationship between how many CPUs are running and what the sys load is.
Re: 2 x quad-core system is slower that 2 x dual core on FreeBSD
On Mon, Nov 19, 2007 at 07:35:09PM +0100, Ivan Voras wrote: Some more ideas: How is your disk load (iostat, systat -vm, diskinfo -t) during the load? You don't use NFS for the web directories, do you? Don't forget about gstat(8), which (if the issue is an I/O bottleneck) may help pinpoint what particular disk device is being utilised too heavily. -- | Jeremy Chadwick jdc at parodius.com | | Parodius Networking http://www.parodius.com/ | | UNIX Systems Administrator Mountain View, CA, USA | | Making life hard for others since 1977. PGP: 4BD6C0CB |
Re: 2 x quad-core system is slower that 2 x dual core on FreeBSD
Kris Kennaway wrote: My guess is that you're hitting contention in the TCP send path, but I missed the start of this conversation so I don't know what problems you are seeing. Here it is: http://lists.freebsd.org/pipermail/freebsd-stable/2007-November/038371.html There's some mutex profiling there. Offtopic: How do you read the output from debug.mutex.prof.stats? Is cnt_lock the number of times an attempt to acquire a lock found it unavailable?