Re: 2 x quad-core system is slower that 2 x dual core on FreeBSD

2007-12-10 Thread Martin Cracauer
Reko Turja wrote on Sun, Dec 02, 2007 at 12:23:15AM +0200: 
 On Sat, 01 Dec 2007 23:37:32 +0200, Alexey Vlasov [EMAIL PROTECTED] wrote:
 
 kernel:
 machine i386
 cpu I686_CPU
 ident   F1RNT1
 
 options PAE
 
 One very probable culprit for slowness

Sorry for the late reply, but here are some results for PAE.  The
slowdown isn't dramatic.  Of course this is just 3 GB normal RAM + 1
GB PAE, so 3 GB normal + 9 GB PAE would look worse.

http://www.cons.org/cracauer/crabench/pae.user.html

Martin
-- 
%%%
Martin Cracauer [EMAIL PROTECTED]   http://www.cons.org/cracauer/
FreeBSD - where you want to go, today.  http://www.freebsd.org/
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: 2 x quad-core system is slower that 2 x dual core on FreeBSD

2007-12-04 Thread Alexey Popov

Hi

Robert Watson wrote:
Evidence in-hand seems to suggest that 8 core systems work very well for 
most users, and reflect a significant performance increase with 7.0 over 
previous FreeBSD releases.

I disagree with that. Heavily loaded Apache, MySQL, and Postgres do not work well.

 The right path forward at this point is to diagnose the problems 
and work on fixing them in 8-CURRENT, and assuming they are not 
highly disruptive, MFC them for FreeBSD 7.1.


I believe at least the bug with lockmgr contention should be fixed 
before release.
Could you point me at the specific proposed change in question?  I don't 
think I've seen it come across re@ as a potential merge request.  
Changing locking primitives close to a release is, FYI, a risky 
business, as while it may improve performance in specific cases, we may 
not have a lot of information about more general cases.  We also risk 
opening up previously nascent race conditions in lock consumers.
Kris sent me a proof-of-concept patch that helped a lot against high lockmgr 
contention. After applying this patch, the 8-core server became faster than the 
4-core one. But, again, it's still slower than Linux.


Here's the patch:
http://lists.freebsd.org/pipermail/freebsd-stable/2007-November/038449.html

Here's Kris saying that it helps:
http://lists.freebsd.org/pipermail/freebsd-stable/2007-November/038672.html

I'm not sure it will help MySQL and Postgres, but the symptoms are mostly 
identical.

With best regards,
Alexey Popov


Re: 2 x quad-core system is slower that 2 x dual core on FreeBSD

2007-12-04 Thread Krassimir Slavchev

Alexey Popov wrote:
 Hi
 
 Robert Watson wrote:
 Evidence in-hand seems to suggest that 8 core systems work very well
 for most users, and reflect a significant performance increase with
 7.0 over previous FreeBSD releases.
 I disagree with that. Heavily loaded Apache, MySQL, Postgres does not
 work well.

There is another report for such problems:

http://blog.insidesystems.net/articles/2007/04/09/what-did-i-do-wrong

 


Re: 2 x quad-core system is slower that 2 x dual core on FreeBSD

2007-12-04 Thread Ivan Voras
Krassimir Slavchev wrote:

 There is another report for such problems:
 
 http://blog.insidesystems.net/articles/2007/04/09/what-did-i-do-wrong

Of course - FreeBSD 6.x is really bad at SMP where number of CPUs is
larger than about 2 and the loads include much kernel work (e.g. IO,
context switches). Numeric tasks (SSL) don't depend on the kernel and so
they scale ok. See
http://people.freebsd.org/~kris/scaling/Scalability%20Update.pdf for
details.

Another issue is interesting in this thread: that apparently 7.0 also
has a well defined workload where it fails.




Re: 2 x quad-core system is slower that 2 x dual core on FreeBSD

2007-12-04 Thread Robert Watson


On Tue, 4 Dec 2007, Krassimir Slavchev wrote:

Evidence in-hand seems to suggest that 8 core systems work very well for 
most users, and reflect a significant performance increase with 7.0 over 
previous FreeBSD releases.


I disagree with that. Heavily loaded Apache, MySQL, Postgres does not work 
well.


There is another report for such problems:

http://blog.insidesystems.net/articles/2007/04/09/what-did-i-do-wrong


A casual reading suggests that this article is about FreeBSD 6.2, and not 
FreeBSD 7.0.  Am I misreading?


Robert N M Watson
Computer Laboratory
University of Cambridge








Re: 2 x quad-core system is slower that 2 x dual core on FreeBSD

2007-12-04 Thread Tom Evans
On Tue, 2007-12-04 at 13:00 +0100, Ivan Voras wrote:
 Krassimir Slavchev wrote:
 
  There is another report for such problems:
  
  http://blog.insidesystems.net/articles/2007/04/09/what-did-i-do-wrong
 
 Of course - FreeBSD 6.x is really bad at SMP where number of CPUs is
 larger than about 2 and the loads include much kernel work (e.g. IO,
 context switches). Numeric tasks (SSL) don't depend on the kernel and so
 they scale ok. See
 http://people.freebsd.org/~kris/scaling/Scalability%20Update.pdf for
 details.
 
 Another issue is interesting in this thread: that apparently 7.0 also
 has a well defined workload where it fails.
 

There is also his follow up to that post, comparing postgres on 6.2 with
7.0 (ULE and 4BSD schedulers).

http://blog.insidesystems.net/articles/2007/04/11/postgresql-scaling-on-6-2-and-7-0

I'm very excited about getting some 7.0 servers into testing prior to
deployment as production mysql boxes. Having run 7-CURRENT on my lappy
for the best part of 15 months, I think it's supersmashinggreat :)

Tom




Re: 2 x quad-core system is slower that 2 x dual core on FreeBSD

2007-12-04 Thread Krassimir Slavchev

Robert Watson wrote:
 
 On Tue, 4 Dec 2007, Krassimir Slavchev wrote:
 
 Evidence in-hand seems to suggest that 8 core systems work very well
 for most users, and reflect a significant performance increase with
 7.0 over previous FreeBSD releases.

 I disagree with that. Heavily loaded Apache, MySQL, Postgres does not
 work well.

 There is another report for such problems:

 http://blog.insidesystems.net/articles/2007/04/09/what-did-i-do-wrong
 
 A casual reading suggests that this article is about FreeBSD 6.2, and
 not FreeBSD 7.0.  Am I misreading?

No, but these tests can be performed on FreeBSD 7.0 4/8 core systems.

 
 Robert N M Watson
 Computer Laboratory
 University of Cambridge
 




Re: 2 x quad-core system is slower that 2 x dual core on FreeBSD

2007-12-04 Thread Robert Watson


On Tue, 4 Dec 2007, Krassimir Slavchev wrote:

Evidence in-hand seems to suggest that 8 core systems work very well for 
most users, and reflect a significant performance increase with 7.0 over 
previous FreeBSD releases.


I disagree with that. Heavily loaded Apache, MySQL, Postgres does not 
work well.


There is another report for such problems:

http://blog.insidesystems.net/articles/2007/04/09/what-did-i-do-wrong


A casual reading suggests that this article is about FreeBSD 6.2, and not 
FreeBSD 7.0.  Am I misreading?


No, but these tests can be performed on FreeBSD 7.0 4/8 core systems.


These are precisely the sorts of tests we have been running.  You can read a 
bit about the test in Kris's BSDCon.tr presentation:


  http://people.freebsd.org/~kris/scaling/7.0%20Preview.pdf

We can't promise improvement on every workload, but we have seen real 
improvements on a great many workloads.  I don't think anyone would argue that 
there isn't more work to be done, but at some point you have to stabilize and 
cut a release so that people can use something in the mean time.  Releasing a 
perfect operating system in ten years helps no one. :-)  The real issue at 
hand is whether we've hit a critical problem that justifies delaying the 
release in order to refine, test, and merge a change of a critical locking 
primitive in the kernel.  Changing locking primitives, as I mentioned in an 
earlier post, is a risky thing: after all, it intentionally changes the timing 
for critical kernel data structures in the file system code.  I've given 
Stephan, the author of the patch, a ping to ask him about this, but late in a 
release cycle, conservativism is the watch-word.


Robert N M Watson
Computer Laboratory
University of Cambridge


Re: 2 x quad-core system is slower that 2 x dual core on FreeBSD

2007-12-04 Thread Robert Watson


On Tue, 4 Dec 2007, Ivan Voras wrote:


Krassimir Slavchev wrote:


There is another report for such problems:

http://blog.insidesystems.net/articles/2007/04/09/what-did-i-do-wrong


Of course - FreeBSD 6.x is really bad at SMP where number of CPUs is larger 
than about 2 and the loads include much kernel work (e.g. IO, context 
switches). Numeric tasks (SSL) don't depend on the kernel and so they scale 
ok. See http://people.freebsd.org/~kris/scaling/Scalability%20Update.pdf for 
details.


Another issue is interesting in this thread: that apparently 7.0 also has a 
well defined workload where it fails.


There are several known contention points that are high on the list of targets 
for the 8-CURRENT branch, some hopefully with MFCs in time for 7.1.  These 
include contention on the tcbinfo lock, which protects global TCP data 
structures, and route table locking, which can affect high packets-per-second 
transmission on multiple CPUs at a time.  lockmgr is high on the list for 
optimization also, especially since it's an older-style sleep lock constructed 
out of a mutex and msleep.  When we optimized file descriptor locking in 7 
(which mostly impacted threaded applications, and was one of the primary 
sources of improvement for MySQL), it had a very similar construction as 
lockmgr currently has, and optimization made a very big difference.
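
As a rough illustration of why a sleep lock "constructed out of a mutex and msleep" can become a contention point, here is a minimal Python sketch. It is an analogy only: the class name `SleepLock` and its methods are hypothetical and are not FreeBSD's lockmgr code or API. The point it tries to show is that every operation, including a shared (read) acquisition, first has to take the single internal mutex, so all CPUs serialize on it.

```python
import threading


class SleepLock:
    """Shared/exclusive lock built from one mutex plus sleep/wakeup.

    Hypothetical illustration only; not FreeBSD's lockmgr implementation.
    """

    def __init__(self):
        # One mutex acts as the interlock; the condition is the sleep channel.
        self._cv = threading.Condition(threading.Lock())
        self._readers = 0
        self._writer = False

    def acquire_shared(self):
        with self._cv:                  # even readers serialize on the interlock
            while self._writer:
                self._cv.wait()         # roughly analogous to msleep()
            self._readers += 1

    def release_shared(self):
        with self._cv:
            self._readers -= 1
            if self._readers == 0:
                self._cv.notify_all()   # roughly analogous to wakeup()

    def acquire_exclusive(self):
        with self._cv:
            while self._writer or self._readers:
                self._cv.wait()
            self._writer = True

    def release_exclusive(self):
        with self._cv:
            self._writer = False
            self._cv.notify_all()


# Small demo: 8 threads (one per core in the thread's 8-core example)
# update a shared counter under the exclusive lock.
lock = SleepLock()
state = {"count": 0}


def worker():
    for _ in range(1000):
        lock.acquire_exclusive()
        state["count"] += 1
        lock.release_exclusive()


threads = [threading.Thread(target=worker) for _ in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(state["count"])  # 8000
```

In this construction the interlock is taken on every acquire and release, so adding cores adds waiters on the same mutex rather than adding throughput; that is the general shape of the problem, not a measurement of it.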


Robert N M Watson
Computer Laboratory
University of Cambridge


Re: 2 x quad-core system is slower that 2 x dual core on FreeBSD

2007-12-04 Thread Miroslav Lachman

Tom Evans wrote:

On Tue, 2007-12-04 at 13:00 +0100, Ivan Voras wrote:


Krassimir Slavchev wrote:



There is another report for such problems:

http://blog.insidesystems.net/articles/2007/04/09/what-did-i-do-wrong


Of course - FreeBSD 6.x is really bad at SMP where number of CPUs is
larger than about 2 and the loads include much kernel work (e.g. IO,
context switches). Numeric tasks (SSL) don't depend on the kernel and so
they scale ok. See
http://people.freebsd.org/~kris/scaling/Scalability%20Update.pdf for
details.

Another issue is interesting in this thread: that apparently 7.0 also
has a well defined workload where it fails.




There is also his follow up to that post, comparing postgres on 6.2 with
7.0 (ULE and 4BSD schedulers).

http://blog.insidesystems.net/articles/2007/04/11/postgresql-scaling-on-6-2-and-7-0

I'm very excited about getting some 7.0 servers into testing prior to
deployment as production mysql boxes. Having run 7-CURRENT on my lappy
for the best part of 15 months, I think it's supersmashinggreat :)


I know this thread is about SMP scaling, but most of my machines are UP 
(Sun Fire X2100), so I ran my own synthetic benchmarks (super-smack on 
MySQL 5.0.45 and ab on Apache 2.2.6) on an old box with an AMD Barton 2500+ 
and 512MB RAM.
I was a little disappointed, because FreeBSD 6.2 UP behaves better than 
FreeBSD 7.0-BETA3 (4BSD and ULE tested).


super-smack on 6.2
Query_type    num_queries  max_time  min_time  q_per_s
select_index  600          0         0         15061.63

super-smack on 7.0 (no matter if 4BSD or ULE)
Query_type    num_queries  max_time  min_time  q_per_s
select_index  600          0         0         14320.31

used command: super-smack select-key.smack 10 30
results are from the second run
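
To put the two super-smack figures above in proportion, the following small sketch computes the relative change in queries per second. Only the two q_per_s values come from the results above; the function name `relative_change` is made up for this illustration.

```python
def relative_change(baseline: float, candidate: float) -> float:
    """Return the change from baseline to candidate as a percentage of baseline."""
    return (candidate - baseline) / baseline * 100.0


# q_per_s values reported above for select-key.smack (second run).
qps_62 = 15061.63  # FreeBSD 6.2
qps_70 = 14320.31  # FreeBSD 7.0-BETA3 (4BSD or ULE)

change = relative_change(qps_62, qps_70)
print(f"7.0 vs 6.2: {change:+.2f}% queries/s")  # 7.0 vs 6.2: -4.92% queries/s
```

So on this uniprocessor box the 7.0 result is roughly 5% lower than 6.2 for this one query mix; a single run pair like this does not say how much of that is noise.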

The Apache Benchmark result was the same on 6.2 and 7.0 (about 165 req/s), 
but on 7.0 Apache forks more processes (MPM prefork was used) than on 6.2.
On 6.2 Apache has about 40 httpd processes running, but on 7.0 it has 
about 130 and console response was very, very bad.


example of top from 7.0 running ab -c 15 -n 5 http://192.168.1.164/phpinfo.php

last pid:  1650;  load averages: 83.80, 33.58, 13.71  up 0+00:32:44  12:09:16
170 processes: 126 running, 43 sleeping, 1 zombie
CPU states: 65.9% user,  0.0% nice, 14.1% system, 19.9% interrupt,  0.0% idle

Mem: 140M Active, 20M Inact, 64M Wired, 5028K Cache, 34M Buf, 9708K Free
Swap: 512M Total, 41M Used, 470M Free, 8% Inuse

Console response was better with ULE than 4BSD, but still not as smooth 
as in 6.2.


So I will postpone the upgrade of all my 6.2 UP machines until 7.x UP 
behaves better or 6.x reaches EOL.


Miroslav Lachman


Re: 2 x quad-core system is slower that 2 x dual core on FreeBSD

2007-12-04 Thread Ivan Voras
Robert Watson wrote:

 Changing
 locking primitives, as I mentioned in an earlier post, is a risky thing:
 after all, it intentionally changes the timing for critical kernel data
 structures in the file system code.  I've given Stephan, the author of
 the patch, a ping to ask him about this, but late in a release cycle,
 conservativism is the watch-word.

Agreed, but it would be a shame to miss out on the momentum 7.0 has acquired
for performance. Web servers are so common that there's a huge chance
one of the first things people will do with 7.0 will be some kind of
web benchmark, especially after this thread on [EMAIL PROTECTED] Though (as I
read the thread) the patch won't bring FreeBSD in line with Linux, it
will help it not to be so slow it's silly.

Re: timings: Would looking at past instances give insight into future? I
don't remember the time accurately, but in the past, when VFS was
translated to MPSAFE and the locking reengineered, were there such problems?

Maybe Peter Holm can run a week or so of constant stress testing
(24-hours-a-day) with the patch to verify it at least in short term?






Re: 2 x quad-core system is slower that 2 x dual core on FreeBSD

2007-12-04 Thread Ken Smith

On Tue, 2007-12-04 at 14:11 +0100, Ivan Voras wrote:
 Robert Watson wrote:
 
  Changing
  locking primitives, as I mentioned in an earlier post, is a risky thing:
  after all, it intentionally changes the timing for critical kernel data
  structures in the file system code.  I've given Stephan, the author of
  the patch, a ping to ask him about this, but late in a release cycle,
  conservativism is the watch-word.
 
 Agreed, but it would be a shame to miss on the momentum 7.0 has acquired
  for performance. Web servers are so common that there's a huge chance
 one of the first things people will do with 7.0 would be some kind of
 web-benchmarks, especially after this thread on [EMAIL PROTECTED] Though (as I
 read the thread) the patch won't bring FreeBSD in line with Linux, it
 will help it not to be so slow it's silly.
 
 Re: timings: Would looking at past instances give insight into future? I
 don't remember the time accurately, but in the past, when VFS was
 translated to MPSAFE and the locking reengineered, were there such problems?
 
 Maybe Peter Holm can run a week or so of constant stress testing
 (24-hours-a-day) with the patch to verify it at least in short term?
 

I need to agree with Robert on this one.  At some point you need to stop
fiddling with nits, cut the release, and then fiddle with the nits in
preparation for the next release.  As we get closer to the point we
think we can actually do the release RE needs to weigh the benefits of
commit requests versus the risks.  One of the biggest factors in our
evaluation of the benefits is whether it's addressing an issue that
completely blocks functionality (due to the bugs the system panics or
otherwise does not do something it should) or if it merely improves on
something.  The latter we really need to consider extremely carefully
because it's *possible* that adjustment would lead to the introduction
of new bugs of the blocks functionality form.

And this thread demonstrates to some degree exactly why a week of Peter
Holm's stress testing doesn't leave us with the warm fuzzy feeling that
an adjustment is perfect.  It shows it's OK for his synthetic workload.
But synthetic workloads of various forms showed improvements in
throughput with 7.0 versus 6.3 while other workloads (e.g. the one that
started off this thread...) don't.  Whether 7.0 helps with people's
workloads or not there is one thing in common throughout this thread and
that's nobody here has been saying the system fails completely (note I
said *this* *thread*... :-).  RE values that over people getting
improved performance for specific workloads at *this* phase of a release
cycle.

-- 
Ken Smith
- From there to here, from here to  |   [EMAIL PROTECTED]
  there, funny things are everywhere.   |
  - Theodore Geisel |





Re: 2 x quad-core system is slower that 2 x dual core on FreeBSD

2007-12-04 Thread Zoran Kolic
  Now we also have terribly performing PostgreSQL on 8-core server. We 
  noticed the slowdown after moving PostgreSQL from 2xXeon 3.0 
  Apache+PostgreSQL server to dedicated PostgreSQL server. I collected 
  some stats (see attach) before moving to Linux.

I'm sure that some code optimization could help to get
a multicore benefit. The RapidMind example is proof
that there is room to make things better.
I saw some text about the Sun compiler making code better able to
use cores and gain speed. Intel also tried to do the
same. Maybe the same amount of work should be put into application
optimization, in addition to OS changes.
Although I like BSD better, I must say that Linux does not
sleep and wait.

   Zoran



Re: 2 x quad-core system is slower that 2 x dual core on FreeBSD

2007-12-03 Thread Alexey Popov

Hi

Mark Linimon wrote:

I used 7.0-BETA3 and it is much worse.

Ouch.  A lot of systems see improvement.  Thanks for trying it
out.  I hope that one of the people that has been doing the actual
work can now comment (I am just an onlooker), and that you can be
patient in the meantime.
Unfortunately, Kris, who often looks at these kind of issues, is
traveling for all of December and thus off the net.
Is there any other FreeBSD developer who can take care of performance 
problems on many-core systems? It seems the upcoming 7-RELEASE and 
6.3-RELEASE will be completely unusable for us on that kind of system, 
i.e. on almost all modern hardware.


Now we also have a terribly performing PostgreSQL on an 8-core server. We 
noticed the slowdown after moving PostgreSQL from a 2xXeon 3.0 
Apache+PostgreSQL server to a dedicated PostgreSQL server. I collected 
some stats (see attached) before moving to Linux.


With best regards,
Alexey Popov
last pid: 58755;  load averages: 26.42, 20.88, 14.00  up 25+22:12:42  11:51:11
84 processes:  29 running, 55 sleeping
CPU states: % user, % nice, % system, % interrupt, % idle
Mem: 1149M Active, 1971M Inact, 464M Wired, 120M Cache, 214M Buf, 161M Free
Swap: 2048M Total, 72K Used, 2048M Free

  PID USERNAME THR PRI NICE   SIZE    RES STATE  C   TIME   WCPU COMMAND
58541 pgsql      1  -4    0  1068M   655M semwai 5   0:17 27.08% postgres
58664 pgsql      1   4    0  1068M   458M sblock 0   0:05 25.49% postgres
58677 pgsql      1 129    0  1067M   291M RUN    2   0:04 24.55% postgres
58713 pgsql      1 130    0  1067M   210M RUN    5   0:03 23.99% postgres
58705 pgsql      1 130    0  1069M   214M CPU7   4   0:03 23.03% postgres
58679 pgsql      1 129    0  1068M   306M RUN    1   0:04 22.45% postgres
58724 pgsql      1 130    0  1068M   179M RUN    4   0:02 22.19% postgres
58698 pgsql      1 129    0  1068M   238M RUN    0   0:03 22.19% postgres
58715 pgsql      1 130    0  1068M   188M RUN    0   0:02 21.68% postgres
58727 pgsql      1 131    0  1069M   119M RUN    1   0:01 20.15% postgres
58658 pgsql      1 125    0  1069M   304M CPU0   0   0:03 19.99% postgres
58728 pgsql      1 131    0  1068M   104M RUN    3   0:01 19.57% postgres
58726 pgsql      1  -4    0  1067M   140M semwai 6   0:01 18.83% postgres
58730 pgsql      1 131    0  1067M 96504K RUN    2   0:01 17.42% postgres
58695 pgsql      1 128    0  1069M   194M RUN    0   0:02 16.37% postgres
58731 pgsql      1 131    0  1068M 57016K CPU2   4   0:01 14.77% postgres
58737 pgsql      1 131    0  1067M 53680K RUN    3   0:01 13.45% postgres
58738 pgsql      1 131    0  1067M 50508K RUN    4   0:00 13.45% postgres
58743 pgsql      1 131    0  1067M 29588K CPU4   2   0:00  9.74% postgres
58712 pgsql      1  -4    0  1069M 60488K semwai 6   0:01  9.57% postgres
58733 pgsql      1 131    0  1068M 42968K RUN    6   0:00  8.61% postgres
58742 pgsql      1 131    0  1067M 27284K RUN    1   0:00  6.65% postgres
58740 pgsql      1 131    0  1067M 20096K RUN    7   0:00  5.60% postgres
58736 pgsql      1  -4    0  1067M 26164K semwai 6   0:00  5.38% postgres
58734 pgsql      1 130    0  1068M 33496K RUN    7   0:00  4.04% postgres
58741 pgsql      1   4    0  1067M 23308K sbwait 7   0:00  3.85% postgres
58735 pgsql      1  -4    0  1067M 26152K semwai 5   0:00  3.50% postgres
47990 pgsql      1 132    0  1066M  4300K select 6 163:53  1.51% postgres
58750 pgsql      1 131    0  1067M  6816K RUN    5   0:00  1.00% postgres
58751 pgsql      1 131    0  1067M  6368K RUN    6   0:00  1.00% postgres
58748 pgsql      1 131    0  1067M  6456K CPU6   6   0:00  1.00% postgres
58732 pgsql      1   4    0  1067M  6772K sbwait 4   0:00  0.88% postgres
58744 pgsql      1  -4    0  1067M 10956K semwai 6   0:00  0.51% postgres
58745 pgsql      1   4    0  1067M  6804K sbwait 1   0:00  0.51% postgres


2 usersLoad 27.56 21.69 14.53  Dec  3 11:51

Mem:KBREALVIRTUAL   VN PAGER   SWAP PAGER
Tot   Share  TotShareFree   in   out in   out
Act 12729446956  137390410528  156944  count
All 15162248576  590766818076  pages
Proc:Interrupts
  r   p   d   s   w   Csw  Trp  Sys  Int  Sof  Flt222 cow   16147 total
 38  46  50  9210  29k 2349 1313  223  28k   3475 zfodatkbd0 1
 3449 ozfod   ata0 irq14
75.9%Sys   0.4%Intr 21.3%User  0.0%Nice  2.4%Idle  99%ozfod   161 em0 mfi0 1
|||||||||||   daefr  1999 cpu0: time
==1893 prcfr  1999 cpu1: time
   282 dtbuf 4131 totfr  1998 cpu2: time
Namei 

Re: 2 x quad-core system is slower that 2 x dual core on FreeBSD

2007-12-03 Thread Alexey Popov

Hi

Alexey Popov wrote:
Now we also have terribly performing PostgreSQL on 8-core server. We 
noticed the slowdown after moving PostgreSQL from 2xXeon 3.0 
Apache+PostgreSQL server to dedicated PostgreSQL server. I collected 
some stats (see attach) before moving to Linux.

Sorry for the broken top output in the previous message. Here's the correct one.

last pid: 70857;  load averages: 35.05, 37.11, 33 up 25+23:08:00  12:46:29
94 processes:  46 running, 48 sleeping
CPU: 17.0% user,  0.0% nice, 80.5% system,  0.2% interrupt,  2.3% idle
Mem: 1209M Active, 1890M Inact, 494M Wired, 143M Cache, 214M Buf, 127M Free
Swap: 2048M Total, 72K Used, 2048M Free

  PID USERNAME THR PRI NICE   SIZE    RES STATE  C   TIME   WCPU COMMAN
70557 pgsql      1 131    0  1068M   662M RUN    0   0:23 26.03% postgr
70840 pgsql      1 132    0  1070M   167M CPU6   5   0:02 21.01% postgr
70761 pgsql      1 128    0  1069M   384M RUN    0   0:04 18.91% postgr
70766 pgsql      1 129    0  1069M   414M RUN    7   0:05 18.17% postgr
70784 pgsql      1 128    0  1071M   374M RUN    3   0:04 17.31% postgr
70758 pgsql      1 128    0  1075M   443M RUN    0   0:05 16.91% postgr
70783 pgsql      1 128    0  1073M   393M RUN    0   0:05 16.86% postgr
70781 pgsql      1 128    0  1073M   389M RUN    0   0:05 16.71% postgr
70755 pgsql      1 128    0  1067M   387M RUN    4   0:05 16.67% postgr
70765 pgsql      1 128    0  1075M   424M CPU5   0   0:05 16.51% postgr
70764 pgsql      1 128    0  1069M   388M RUN    6   0:05 16.45% postgr
70786 pgsql      1 128    0  1069M   361M RUN    1   0:04 16.23% postgr
70785 pgsql      1 128    0  1071M   358M RUN    4   0:04 15.76% postgr
70788 pgsql      1 128    0  1069M   330M RUN    0   0:04 15.46% postgr
70795 pgsql      1 -16    0  1068M   300M vmpfw  0   0:04 15.07% postgr
70803 pgsql      1 128    0  1068M   250M RUN    7   0:03 14.71% postgr
70802 pgsql      1 128    0  1068M   268M RUN    0   0:03 14.19% postgr
70805 pgsql      1 128    0  1068M   249M RUN    0   0:03 14.04% postgr
70798 pgsql      1 128    0  1070M   297M RUN    0   0:03 13.92% postgr
70792 pgsql      1 128    0  1068M   288M RUN    0   0:04 13.90% postgr
70804 pgsql      1 128    0  1068M   238M RUN    0   0:03 13.29% postgr
70808 pgsql      1 128    0  1068M   216M RUN    3   0:03 13.28% postgr
70811 pgsql      1 128    0  1069M   212M RUN    0   0:03 12.81% postgr
70833 pgsql      1 130    0  1068M   133M CPU2   3   0:02 12.77% postgr
70843 pgsql      1 131    0  1068M 57636K RUN    7   0:01 12.13% postgr
70834 pgsql      1 130    0  1068M   111M CPU1   2   0:01 11.53% postgr
70850 pgsql      1 132    0  1067M 46620K RUN    1   0:00 11.28% postgr
70817 pgsql      1 128    0  1068M   150M RUN    0   0:02 10.45% postgr
70844 pgsql      1 131    0  1067M 73296K RUN    4   0:01 10.14% postgr
70815 pgsql      1 128    0  1068M   143M RUN    2   0:01  9.57% postgr
70819 pgsql      1 128    0  1068M   158M RUN    4   0:02  9.43% postgr
70832 pgsql      1 129    0  1067M    99M RUN    1   0:01  9.42% postgr
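
The summary lines of the top output above can be read mechanically. The sketch below parses the `CPU:` line and relates the 1-minute load average to the core count; the helper name `parse_cpu_states` is hypothetical, and the core count of 8 is taken from the thread's description of the server rather than from the output itself.

```python
import re


def parse_cpu_states(line: str) -> dict:
    """Parse a top(1) 'CPU:' summary line into a {state: percent} mapping."""
    return {name: float(pct) for pct, name in re.findall(r"([\d.]+)%\s+(\w+)", line)}


# Summary values copied from the top output above.
cpu_line = "CPU: 17.0% user,  0.0% nice, 80.5% system,  0.2% interrupt,  2.3% idle"
load_1min = 35.05
cores = 8  # assumption: the 8-core server described in the thread

states = parse_cpu_states(cpu_line)
busy = 100.0 - states["idle"]
system_share = states["system"] / busy

print(f"system share of busy CPU time: {system_share:.0%}")   # 82%
print(f"runnable processes per core:   {load_1min / cores:.1f}")  # 4.4
```

Most of the non-idle CPU time is spent in the kernel rather than in PostgreSQL itself, which is consistent with the lock contention discussed elsewhere in the thread, although the top output alone does not identify which lock.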

With best regards,
Alexey Popov


Re: 2 x quad-core system is slower that 2 x dual core on FreeBSD

2007-12-03 Thread Antony Mawer

On 3/12/2007 8:50 PM, Alexey Popov wrote:

Hi

Alexey Popov wrote:
Now we also have terribly performing PostgreSQL on 8-core server. We 
noticed the slowdown after moving PostgreSQL from 2xXeon 3.0 
Apache+PostgreSQL server to dedicated PostgreSQL server. I collected 
some stats (see attach) before moving to Linux.
Sorry for the broken top output in the previous message. Here's the correct 
one.


last pid: 70857;  load averages: 35.05, 37.11, 33 up 25+23:08:00  12:46:29
94 processes:  46 running, 48 sleeping
CPU: 17.0% user,  0.0% nice, 80.5% system,  0.2% interrupt,  2.3% idle
Mem: 1209M Active, 1890M Inact, 494M Wired, 143M Cache, 214M Buf, 127M Free
Swap: 2048M Total, 72K Used, 2048M Free


Have you tried testing with different values for kern.hz? I am by no 
means an expert, but have stumbled across various postings over the past 
few years that suggest the high value (1000) used by modern (5.x+?) 
kernels can be pessimistic for some workloads...


If you could try testing with some other values by setting in 
/boot/loader.conf, eg:


kern.hz=100

Perhaps testing 100 and 200 to see how they fare against the default 
value of 1000, would at least provide some indicator as to whether this 
has any bearing on performance.


Someone with better knowledge of the kernel internals may be able to 
support or dismiss this idea, but as Kris is off on holidays I figured 
any suggestion was worthwhile! ;-)


I'd also like to say thanks for your efforts to help test and track down 
the cause of these performance problems - in the end the whole community 
benefits, so the more you are able to test and help resolve these things 
the better for us all... :-)


--Antony


Re: 2 x quad-core system is slower that 2 x dual core on FreeBSD

2007-12-03 Thread Robert Watson


On Mon, 3 Dec 2007, Alexey Popov wrote:


Mark Linimon wrote:

I used 7.0-BETA3 and it is much worse.
Ouch.  A lot of systems see improvement.  Thanks for trying it out.  I hope 
that one of the people that has been doing the actual work can now comment 
(I am just an onlooker), and that you can be patient in the meantime. 
Unfortunately, Kris, who often looks at these kinds of issues, is traveling 
for all of December and thus off the net.


Is there any other FreeBSD developer who can take care of performance 
problems on many-core systems? It seems the upcoming 7-RELEASE and 
6.3-RELEASE would be completely unusable for us on that kind of system, i.e. 
on almost all modern hardware.


There are many FreeBSD developers who care a great deal about the performance 
of many-core systems.  However, it's also very late in the release cycle for 
7.0, and this sort of analysis requires a lot of time, so I don't think we 
will (or should) see any substantial changes at this point as they would 
require us to significantly extend the release cycle in order to test them 
properly.  The right path forward at this point is to diagnose the problems 
and work on fixing them in 8-CURRENT, and assuming they are not highly 
disruptive, MFC them for FreeBSD 7.1.


In general, the most important factor in optimizing performance is to get a 
good collaboration going between someone who can reproduce the problem, 
ideally in a way that can be shared with developers so they can also reproduce 
the problem, and provide testing and feedback over an extended period (several 
months) while the changes are developed and refined.  This is part of the role 
Kris has been playing with a number of FreeBSD developers -- Jeff, Attilio, 
myself, etc -- he set up highly reproducible performance measurements and 
then worked with us to evaluate various patches to improve performance.  That 
kind of dynamic is invaluable, but it requires users who care a lot about 
performance (or whatever other factor it is) to spend a fair amount of time 
helping us.  Whether this is by providing a potted benchmark for developers to 
try out, or if this is by providing access to the test environment on their 
own systems, it's still critical.


I know from previous messages in the thread that you can't provide access to 
the actual application, but can you provide some sort of potted substitute 
that has similar performance properties -- be it php page sizes, database 
query load traces that can be replayed, etc?


Robert N M Watson
Computer Laboratory
University of Cambridge


Re: 2 x quad-core system is slower that 2 x dual core on FreeBSD

2007-12-03 Thread Alexey Popov

Hi

Alexey Popov wrote:
Now we also have terribly performing PostgreSQL on 8-core server. We 
noticed the slowdown after moving PostgreSQL from 2xXeon 3.0 
Apache+PostgreSQL server to dedicated PostgreSQL server. I collected 
some stats (see attach) before moving to Linux.
FYI, here is the top output from the same server with only 2 cores enabled (it 
works much better than with 8 cores):


last pid: 38266;  load averages:  7.11,  5.01,  4. up 0+01:33:40  15:51:20
53 processes:  7 running, 46 sleeping
CPU: 69.1% user,  0.0% nice, 29.8% system,  0.4% interrupt,  0.7% idle
Mem: 835M Active, 1743M Inact, 443M Wired, 168K Cache, 214M Buf, 882M Free
Swap: 2048M Total, 2048M Free

  PID USERNAME THR PRI NICE   SIZE    RES STATE  C   TIME   WCPU COMMAND
38139 pgsql      1 132    0  1126M   639M RUN    0   0:06 29.84% postgr
38245 pgsql      1 130    0  1071M   361M RUN    1   0:01 19.61% postgr
38249 pgsql      1 130    0  1068M   320M CPU1   0   0:01 15.76% postgr
38251 pgsql      1 130    0  1069M   317M RUN    1   0:01 15.06% postgr
38254 pgsql      1 130    0  1067M   161M RUN    1   0:01 14.36% postgr
  694 pgsql      1 130    0  1066M  4816K select 0   0:37  0.00% postgr
  698 pgsql      1  96    0 15588K  4724K select 0   0:28  0.00% postgr
  697 pgsql      1  96    0  1066M   298M select 0   0:14  0.00% postgr
 1748 root       1  96    0  7044K  2252K select 0   0:05  0.00% top
18775 root       1  96    0  7008K  2220K select 0   0:02  0.00% top
  627 root       1  96    0 18120K  5872K select 0   0:02  0.00% snmpd
 1704 vich       1  96    0 30616K  4248K select 0   0:00  0.00% sshd
  695 pgsql      1  96    0 15392K  4496K select 1   0:00  0.00% postgr
16781 null       1  96    0 30616K  4220K select 0   0:00  0.00% sshd
  655 root       1  96    0  7732K  2324K select 0   0:00  0.00% ntpd
  556 root       1  96    0  3652K  1192K select 1   0:00  0.00% syslog
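
One rough way to compare this output with the 8-core top header quoted earlier in the thread (17.0% user, 80.5% system) is the ratio of system to user CPU time. The following is only an illustrative sketch: the function name is made up for this example, and the percentages are copied from the "CPU:" lines of the two top outputs.

```python
def sys_to_user_ratio(user_pct: float, system_pct: float) -> float:
    """Return how much kernel (system) time is spent per unit of user time."""
    return system_pct / user_pct


# Values copied from the "CPU:" lines of the two top outputs in this thread.
eight_cores = sys_to_user_ratio(user_pct=17.0, system_pct=80.5)
two_cores = sys_to_user_ratio(user_pct=69.1, system_pct=29.8)

print(f"8 cores: {eight_cores:.2f} units of system time per unit of user time")
print(f"2 cores: {two_cores:.2f} units of system time per unit of user time")
```

With 8 cores the kernel burns several times more CPU than the application itself, while with 2 cores the relation is reversed, which is consistent with the lock contention discussed elsewhere in the thread.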

With best regards,
Alexey Popov


Re: 2 x quad-core system is slower that 2 x dual core on FreeBSD

2007-12-03 Thread Ivan Voras
Antony Mawer wrote:

 Have you tried testing with different values for kern.hz? I am by no
 means an expert, but have stumbled across various postings over the past
 few years that suggest the high value (1000) used by modern (5.x+?)
 kernels can be pessimistic for some workloads...
 
 If you could try testing with some other values by setting in
 /boot/loader.conf, eg:
 
 kern.hz=100

AFAIK this was tried and found irrelevant for this particular load. It
may still help others.



Re: 2 x quad-core system is slower that 2 x dual core on FreeBSD

2007-12-03 Thread Alexey Popov

Hi.

Robert Watson wrote:
Is there any other FreeBSD developer who can take care of performance 
problems on many-core systems? It seems the upcoming 7-RELEASE and 
6.3-RELEASE would be completely unusable for us on that kind of 
system, i.e. on almost all modern hardware.


There are many FreeBSD developers who care a great deal about the 
performance of many-core systems.  However, it's also very late in the 
release cycle for 7.0, and this sort of analysis requires a lot of time, 
so I don't think we will (or should) see any substantial changes at this 
point as they would require us to significantly extend the release cycle 
in order to test them properly.
Is there a reason to release a system that is unable to work on 8-core systems? 
What would people think when they can't run their old projects after 
moving to new hardware?


 The right path forward at this point 
is to diagnose the problems and work on fixing them in 8-CURRENT, and 
assuming they are not highly disruptive, MFC them for FreeBSD 7.1.

I believe at least the bug with lockmgr contention should be fixed before 
release.

In general, the most important factor in optimizing performance is to 
get a good collaboration going between someone who can reproduce the 
problem, ideally in a way that can be shared with developers so they can 
also reproduce the problem, and provide testing and feedback over an 
extended period (several months) while the changes are developed and 
refined.  This is part of the role Kris has been playing with a number 
of FreeBSD developers -- Jeff, Attilio, myself, etc -- he set up highly 
reproducible performance measurements and then worked with us to 
evaluate various patches to improve performance.  That kind of dynamic 
is invaluable, but it requires users who care a lot about performance 
(or whatever other factor it is) to spend a fair amount of time helping 
us.  Whether this is by providing a potted benchmark for developers to 
try out, or if this is by providing access to the test environment on 
their own systems, it's still critical.
I know from previous messages in the thread that you can't provide 
access to the actual application, but can you provide some sort of 
potted substitute that has similar performance properties -- be it php 
page sizes, database query load traces that can be replayed, etc?
I can try to produce synthetic benchmarks based on my workload, but I'm really 
more interested in real-workload performance. I'm ready to test changes, measure 
differences and provide any benchmark and profiling information. Apart from the 
lockmgr contention bug there seems to be a lot of optimization work left to do, 
because FreeBSD with the patched lockmgr is still 1.5 times slower than Linux 
on my workload.


With best regards,
Alexey Popov


Re: 2 x quad-core system is slower that 2 x dual core on FreeBSD

2007-12-03 Thread Robert Watson


On Mon, 3 Dec 2007, Alexey Popov wrote:


Robert Watson wrote:

Is there any other FreeBSD developer who can take care of performance 
problems on many-core systems? It seems the upcoming 7-RELEASE and 
6.3-RELEASE would be completely unusable for us on that kind of system, 
i.e. on almost all modern hardware.


There are many FreeBSD developers who care a great deal about the 
performance of many-core systems.  However, it's also very late in the 
release cycle for 7.0, and this sort of analysis requires a lot of time, so 
I don't think we will (or should) see any substantial changes at this point 
as they would require us to significantly extend the release cycle in order 
to test them properly.


Is there a reason to release a system that is unable to work on 8-core systems? 
What would people think when they can't run their old projects 
after moving to new hardware?


Evidence in-hand seems to suggest that 8 core systems work very well for most 
users, and reflect a significant performance increase with 7.0 over previous 
FreeBSD releases.  Obviously, this is not true in all cases, but part of the 
point of doing a .0 release is to get the technology into the hands of people 
who want to use it, and part of the point of continuing to support the 6.x 
release series is to provide the less aggressive feature development that some 
users need.


 The right path forward at this point is to diagnose the problems and 
work on fixing them in 8-CURRENT, and assuming they are not highly 
disruptive, MFC them for FreeBSD 7.1.


I believe at least the bug with lockmgr contention should be fixed before 
release.


Could you point me at the specific proposed change in question?  I don't think 
I've seen it come across re@ as a potential merge request.  Changing locking 
primitives close to a release is, FYI, a risky business, as while it may 
improve performance in specific cases, we may not have a lot of information 
about more general cases.  We also risk opening up previously latent race 
conditions in lock consumers.


In general, the most important factor in optimizing performance is to get a 
good collaboration going between someone who can reproduce the problem, 
ideally in a way that can be shared with developers so they can also 
reproduce the problem, and provide testing and feedback over an extended 
period (several months) while the changes are developed and refined.  This 
is part of the role Kris has been playing with a number of FreeBSD 
developers -- Jeff, Attilio, myself, etc -- he set up highly reproducible 
performance measurements and then worked with us to evaluate various 
patches to improve performance.  That kind of dynamic is invaluable, but it 
requires users who care a lot about performance (or whatever other factor 
it is) to spend a fair amount of time helping us.  Whether this is by 
providing a potted benchmark for developers to try out, or if this is by 
providing access to the test environment on their own systems, it's still 
critical. I know from previous messages in the thread that you can't 
provide access to the actual application, but can you provide some sort of 
potted substitute that has similar performance properties -- be it php page 
sizes, database query load traces that can be replayed, etc?


I can try to produce synthetic benchmarks based on my workload, but I'm really 
more interested in real-workload performance. I'm ready to test changes, 
measure differences and provide any benchmark and profiling information. 
Apart from the lockmgr contention bug there seems to be a lot of optimization 
work left to do, because FreeBSD with the patched lockmgr is still 1.5 times 
slower than Linux on my workload.


Obviously, we are interested in the real workload also, but there are times 
when we have to accept synthetic benchmarks we can get our hands on instead of 
real benchmarks that people won't give to us because they incorporate 
proprietary technology, business-sensitive information, or are simply too 
complex to reproduce, etc.  If you can give us the exact workload to reproduce 
on our systems, that's much better than a synthetic benchmark, but if you 
can't, then a synthetic benchmark is what we'll have to work with.


I suggest we move this thread to the performance@ mailing list, and if 
possible, could you begin the thread over there with a summary of the workload 
and investigation to date.


Robert N M Watson
Computer Laboratory
University of Cambridge


Re: 2 x quad-core system is slower that 2 x dual core on FreeBSD

2007-12-02 Thread Alexey Vlasov
On Sat, Dec 01, 2007 at 11:04:41PM +0100, Daniel Gerzo wrote:
 Please try with RELENG_7 (aka. FreeBSD 7.0-BETA3) and ULE scheduler.

I used 7.0-BETA3 and it is much worse.
ULE, w/o PAE (or with PAE)

# ./ab -n 100 -c 20 -t 30 http://somesite-freebsd.com/ab/
This is ApacheBench, Version 2.0.40-dev $Revision: 1.146 $ apache-2.0
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Copyright 2006 The Apache Software Foundation, http://www.apache.org/

Benchmarking test-f1-apache-aux2.1gb.ru (be patient)
Finished 17 requests


Server Software:        Apache/2.2.3
Server Hostname:        somesite-freebsd.com
Server Port:            80

Document Path:          /ab/
Document Length:        41451 bytes

Concurrency Level:      20
Time taken for tests:   30.448737 seconds
Complete requests:      17
Failed requests:        0
Write errors:           0
Total transferred:      1191762 bytes
HTML transferred:       1178622 bytes
Requests per second:    0.56 [#/sec] (mean)
Time per request:       35822.043 [ms] (mean)
Time per request:       1791.102 [ms] (mean, across all concurrent requests)
Transfer rate:          38.20 [Kbytes/sec] received

Connection Times (ms)
              min   mean [+/-sd]  median    max
Connect:        0      0     0.9       0      2
Processing:   490   4160  8103.9     640  25972
Waiting:       91    125    70.4     110    394
Total:        490   4160  8103.8     640  25972

Percentage of the requests served within a certain time (ms)
  50%    631
  66%    709
  75%    721
  80%    734
  90%  19495
  95%  25972
  98%  25972
  99%  25972
 100%  25972 (longest request)
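
For readers checking these figures, the three headline numbers follow directly from the request count, the elapsed time and the concurrency level. The sketch below recomputes them; the function and key names are made up for illustration, and only the input values come from the ab report above.

```python
def ab_summary(complete: int, elapsed_s: float, concurrency: int) -> dict:
    """Recompute ab's headline figures from its raw counters."""
    return {
        # Completed requests divided by wall-clock time.
        "requests_per_second": complete / elapsed_s,
        # Mean latency as seen by one of the concurrent client slots.
        "ms_per_request": concurrency * elapsed_s / complete * 1000.0,
        # Mean time between completions across all slots together.
        "ms_per_request_all": elapsed_s / complete * 1000.0,
    }


# Inputs copied from the 7.0-BETA3 run: 17 requests, 30.448737 s, -c 20.
summary = ab_summary(complete=17, elapsed_s=30.448737, concurrency=20)
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```

The results agree with the report (about 0.56 requests per second, roughly 35.8 s per request per client slot), so the very low rate is not a reporting artifact: with 20 clients, only 17 requests completed in 30 seconds.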

Do you have any more ideas?

I know that I can try switching to amd64,
but I'm sure that this won't solve my problems. As far as I remember it
didn't help Alexey Popov (the author of this thread).

And by the way, I couldn't launch Zend Optimizer (3.3.0) on amd64. It
gave me "Segmentation fault: 11 (core dumped)".
http://www.zend.com/forums/index.php?t=msggoto=13585S=a322ef7edb5d49c70f431607e648fb57srch=amd64+freebsd#msg_13585
And without it, as you understand, virtual hosting is worth nothing.

Looking through the FreeBSD mailing lists, I found a description of
the same problem that I have:
http://lists.freebsd.org/pipermail/freebsd-performance/2007-July/002781.html

--
BRGDS. Alexey Vlasov.


Re: 2 x quad-core system is slower that 2 x dual core on FreeBSD

2007-12-02 Thread Mark Linimon
On Sunday 02 December 2007, Alexey Vlasov wrote:
 I used 7.0-BETA3 and it is much worse.

Ouch.  A lot of systems see improvement.  Thanks for trying it
out.  I hope that one of the people that has been doing the actual
work can now comment (I am just an onlooker), and that you can be
patient in the meantime.

Unfortunately, Kris, who often looks at these kinds of issues, is
traveling for all of December and thus off the net.

mcl


Re: 2 x quad-core system is slower that 2 x dual core on FreeBSD

2007-12-01 Thread Alexey Vlasov
Hi,

It seems that I'm not the only one who has found that FreeBSD
performs poorly on multiprocessor platforms.

I run Linux on my web-hosting servers. All servers share the same base:
an S5000PAL motherboard (SR1500), two quad-core Xeon E5320 or E5345 CPUs,
and 8 GB RAM. I decided to install FreeBSD 6.2 i386 on one of the servers,
and the result was totally unproductive.
Software used: Apache 2.2.6 (worker) as a front-end proxy, with Apache
2.2.3 (prefork) as the backend.

By the time we understood that something was wrong with FreeBSD, about
10 high-load sites and about a hundred ordinary ones had already been
placed on it. And this was the limit for FreeBSD. It was accompanied by
a huge number of context switches, about a hundred thousand.
I attached a log of what was happening on FreeBSD at that time.
After playing with ab (ApacheBench) options, it turned out that even
the following invocation can bring the server down completely:

./ab -n 100 -c 20 -t 30 http://somesite-freebsd.com

At the same time I copied somesite.com (PHP scripts) to a Linux server,
ran ab with the same options, and saw that it had no effect on the
operation of the server. (And by the way, about 1.5 virtual hosts run
on that server.)

All options for Apache on Linux and FreeBSD are the same:


FreeBSD:

This is ApacheBench, Version 2.0.40-dev $Revision: 1.146 $ apache-2.0
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Copyright 2006 The Apache Software Foundation, http://www.apache.org/

Benchmarking somesite-freebsd.com (be patient)
Finished 29 requests


Server Software:        Apache/2.2.3
Server Hostname:        somesite-freebsd.com
Server Port:            80

Document Path:          /ab/
Document Length:        41450 bytes

Concurrency Level:      20
Time taken for tests:   30.44765 seconds
Complete requests:      29
Failed requests:        22
   (Connect: 0, Length: 22, Exceptions: 0)
Write errors:           0
Total transferred:      1529557 bytes
HTML transferred:       1513497 bytes
Requests per second:    0.97 [#/sec] (mean)
Time per request:       20720.527 [ms] (mean)
Time per request:       1036.026 [ms] (mean, across all concurrent requests)
Transfer rate:          49.69 [Kbytes/sec] received

Connection Times (ms)
              min   mean [+/-sd]  median    max
Connect:        0   1760  1503.4    3002   3002
Processing:   866  13328  9460.5   13853  26246
Waiting:      139   2286  2319.0    1129   6764
Total:        871  15089 10642.6   16855  29248

Percentage of the requests served within a certain time (ms)
  50%  16705
  66%  22670
  75%  25439
  80%  26342
  90%  29160
  95%  29188
  98%  29248
  99%  29248
 100%  29248 (longest request)


Linux: (the same site)
This is ApacheBench, Version 2.0.40-dev $Revision: 1.146 $ apache-2.0
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Copyright 2006 The Apache Software Foundation, http://www.apache.org/

Benchmarking linux.1gb.ru (be patient)
Finished 814 requests

Server Software:        Apache/2.2.3
Server Hostname:        somesite-linux.com
Server Port:            80

Document Path:          /ab/
Document Length:        41451 bytes

Concurrency Level:      20
Time taken for tests:   30.3216 seconds
Complete requests:      814
Failed requests:        759
   (Connect: 0, Length: 759, Exceptions: 0)
Write errors:           0
Non-2xx responses:      1
Total transferred:      34430291 bytes
HTML transferred:       34126461 bytes
Requests per second:    27.13 [#/sec] (mean)
Time per request:       737.180 [ms] (mean)
Time per request:       36.859 [ms] (mean, across all concurrent requests)
Transfer rate:          1120.65 [Kbytes/sec] received

Connection Times (ms)
              min   mean [+/-sd]  median    max
Connect:        0      0     0.0       0      0
Processing:   214    725   575.2     557   5001
Waiting:       41    265   376.0     131   3280
Total:        214    725   575.2     557   5001

Percentage of the requests served within a certain time (ms)
  50%    557
  66%    716
  75%    863
  80%    967
  90%   1398
  95%   1749
  98%   2529
  99%   3064
 100%   5001 (longest request)
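
To put the two runs side by side, the sketch below derives the elapsed time and request rate of each run from the per-request means that ab printed. Read literally, the "Time taken for tests" lines (30.44765 and 30.3216) do not match the reported rates; one possible explanation, which is only an assumption here, is that the microseconds field was printed without zero padding, i.e. the real values were 30.044765 and 30.003216 seconds. The function name is made up for illustration.

```python
def elapsed_from_mean(complete: int, ms_per_request_all: float) -> float:
    """Elapsed seconds implied by ab's
    'Time per request (mean, across all concurrent requests)' line."""
    return complete * ms_per_request_all / 1000.0


# Inputs copied from the two ab reports above.
freebsd_elapsed = elapsed_from_mean(complete=29, ms_per_request_all=1036.026)
linux_elapsed = elapsed_from_mean(complete=814, ms_per_request_all=36.859)

freebsd_rps = 29 / freebsd_elapsed
linux_rps = 814 / linux_elapsed

print(f"FreeBSD 6.2: {freebsd_elapsed:.3f} s elapsed, {freebsd_rps:.2f} req/s")
print(f"Linux:       {linux_elapsed:.3f} s elapsed, {linux_rps:.2f} req/s")
print(f"Linux/FreeBSD throughput ratio: {linux_rps / freebsd_rps:.1f}")
```

Under that reading both runs lasted about 30 seconds, and the Linux server completed roughly 28 times as many requests as the FreeBSD 6.2 server under the same ab options.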

# cat /etc/sysctl.conf
security.bsd.see_other_uids=0
kern.maxfiles=204800
kern.maxfilesperproc=202400

kernel:
machine i386
cpu I686_CPU
ident   F1RNT1

options PAE
options SMP

options SCHED_4BSD
options PREEMPTION
options INET
options FFS
options SOFTUPDATES
options UFS_ACL
options UFS_DIRHASH
options NULLFS
options MD_ROOT
options CD9660
options PROCFS
options PSEUDOFS
options GEOM_GPT
options GEOM_LABEL
options GEOM_MIRROR
options COMPAT_43
options COMPAT_FREEBSD4
options COMPAT_FREEBSD5
options SCSI_DELAY=5000
options KTRACE
options SYSVSHM
options SYSVMSG
options SYSVSEM
options _KPOSIX_PRIORITY_SCHEDULING
options 

Re: 2 x quad-core system is slower that 2 x dual core on FreeBSD

2007-12-01 Thread Mark Linimon
On Sun, Dec 02, 2007 at 12:37:32AM +0300, Alexey Vlasov wrote:
 I decided to install FreeBSD 6.2 i386 on one of the servers, and the
 result was totally unproductive.

The 6.x series was intended to get us back to the stability that we had
had pre-SMP integration.  I believe we mostly succeeded.

One of the major thrusts for 7.0 development was to fix the performance
regressions that had been introduced.  From the results that I have seen
(I am not one of the participants), there has been major progress over
the past 2 years in removing one bottleneck after another.  Recent
tests show us to be on a par with Linux on a number of benchmarks; of
course, we need more people testing 7.0 in real-world environments to
confirm this.

You may want to try the 7.0 release candidate on a testbed to see if
your results have improved as much as we think that they will have.

mcl


Re: 2 x quad-core system is slower that 2 x dual core on FreeBSD

2007-12-01 Thread Reko Turja



I run Linux on my web-hosting servers. All servers share the same base:
an S5000PAL motherboard (SR1500), two quad-core Xeon E5320 or E5345 CPUs,
and 8 GB RAM. I decided to install FreeBSD 6.2 i386 on one of the servers,


To be a bit more specific than in my previous reply: in order to use SCHED_ULE
you need to be running 7.x (which is already quite stable despite being a
beta).


And of course with 64-bit hardware it's best to run the amd64 version of the
OS.


-Reko


Re: 2 x quad-core system is slower that 2 x dual core on FreeBSD

2007-12-01 Thread Reko Turja

On Sat, 01 Dec 2007 23:37:32 +0200, Alexey Vlasov [EMAIL PROTECTED] wrote:


kernel:
machine i386
cpu I686_CPU
ident   F1RNT1

options PAE


One very probable culprit for slowness


options SMP

options SCHED_4BSD


Using _ULE might yield a bit more performance as well


# cat /etc/make.conf
CPUTYPE?=nocona

CFLAGS=-O2 -pipe


I think the recommended practice these days is to either use
CFLAGS+=your flags or put the local compiler tweaks in COPTFLAGS.
Not sure if this affects performance, though.


-Reko


Re[2]: 2 x quad-core system is slower that 2 x dual core on FreeBSD

2007-12-01 Thread Daniel Gerzo
Hello Alexey,

Saturday, December 1, 2007, 10:37:32 PM, you wrote:

 I run Linux on my web-hosting servers. All servers share the same base:
 an S5000PAL motherboard (SR1500), two quad-core Xeon E5320 or E5345 CPUs,
 and 8 GB RAM. I decided to install FreeBSD 6.2 i386 on one of the servers,
 and the result was totally unproductive.

Please try with RELENG_7 (aka. FreeBSD 7.0-BETA3) and ULE scheduler.

-- 
Best regards,
 Daniel                          mailto:[EMAIL PROTECTED]



Re: 2 x quad-core system is slower that 2 x dual core on FreeBSD

2007-12-01 Thread Josh Carroll
  options PAE

 One very probable culprit for slowness

I'd say it IS the culprit. PAE is known to decrease performance, and
this is probably 95% of the cause.

 Using _ULE might yield a bit more performance as well

Yes, in 7.0-BETA3 I'm seeing a 7% increase in performance (sysbench
with 8 threads on a 4-core system) with ULE over 4BSD.

Both great suggestions. If he needs the high memory support, I would
test without PAE just to test the performance (along with changing to
the ULE scheduler), then rebuild the system later with amd64 so he
doesn't have to use the PAE hack.

Regards,
Josh


Re: 2 x quad-core system is slower that 2 x dual core on FreeBSD

2007-11-30 Thread Kris Kennaway

Alexey Popov wrote:

Hi

Kris Kennaway wrote:

One more patch which may or may not help is:

  http://www.freebsd.org/~jhb/patches/namei_rwlock.patch

(may also require porting since it was against an older version of 
7.0-CURRENT).  When I have tested this in the past it was a 
performance loss for reasons that I think I understand (basically, it 
is locally a performance improvement for the name cache but also 
requires a fixed lockmgr to avoid an overall performance loss), but I 
don't remember if I tested it in conjunction with the lockmgr patch.
This patch doesn't apply to 7-STABLE because /sys/kern/vfs_cache.c has 
changed significantly since rev. 1.108. I tried to patch it manually but 
don't know what to do with the cache_lookup() changes.


OK, I am about to go on vacation so I am not able to help with either of 
these things.


Kris



There are patches you need to enable it on woodcrest.  They are in 
my p4 branch (kris-contention) but I don't have time right now to 
extract them.
I think it would be very useful because I can't see any other way to 
profile FreeBSD on modern many-core machines.


You can extract the changeset from my branch via 
http://perforce.freebsd.org.  Unfortunately I don't have time to do it 
myself.

I'll try it if it does not also need porting.

With best regards,
Alexey Popov






Re: 2 x quad-core system is slower that 2 x dual core on FreeBSD

2007-11-30 Thread Alexey Popov

Hi

Kris Kennaway wrote:

One more patch which may or may not help is:

  http://www.freebsd.org/~jhb/patches/namei_rwlock.patch

(may also require porting since it was against an older version of 
7.0-CURRENT).  When I have tested this in the past it was a performance 
loss for reasons that I think I understand (basically, it is locally a 
performance improvement for the name cache but also requires a fixed 
lockmgr to avoid an overall performance loss), but I don't remember if I 
tested it in conjunction with the lockmgr patch.
This patch doesn't apply to 7-STABLE because /sys/kern/vfs_cache.c has 
changed significantly since rev. 1.108. I tried to patch it manually but 
don't know what to do with the cache_lookup() changes.


There are patches you need to enable it on woodcrest.  They are in my 
p4 branch (kris-contention) but I don't have time right now to 
extract them.
I think it would be very useful because I can't see any other way to 
profile FreeBSD on modern many-core machines.


You can extract the changeset from my branch via 
http://perforce.freebsd.org.  Unfortunately I don't have time to do it 
myself.

I'll try it if it does not also need porting.

With best regards,
Alexey Popov


Re: 2 x quad-core system is slower that 2 x dual core on FreeBSD

2007-11-29 Thread Kris Kennaway

Alexey Popov wrote:

Hi

Kris Kennaway wrote:
Now FreeBSD 7-STABLE ULE 8-core server without optimized PHP 
realpath_cache_size (producing 2000+ lstats per request) can handle 
up to ~24 rps as opposed to  max. 17 rps without your patch. %sys 
never grows over %user with your patch. On the server with 
optimized realpath_cache_size there's no visible influence of your 
patch.
You said 20 before for this configuration, so I'm a bit suspicious 
about how seriously to treat your measurements :)

Sorry, my mistake. s/ULE/4BSD.
OK, please compare ULE to ULE with and without my patch (and 
remembering to enable the sysctl), and obtain lock profiling traces in 
both cases under identical workloads & durations.  That is what I need 
to proceed with this issue.
I didn't measure the exact requests-per-second values on ULE with and 
without the patch, but at first glance the benefits of the patch 
are similar to those on 4BSD. If you need these values, I'll obtain them.


Here you can find lock profiling results for 7-BETA3 GENERIC kernel with 
SCHED_ULE running optimized PHP and unoptimized, with your patch and 
without it: http://83.167.98.162/gprof/lockmgr/


This data was collected by the following script:
(sysctl debug.lock.prof.reset=1
sysctl debug.lock.prof.enable=1
sleep 60
sysctl debug.lock.prof.enable=0
sysctl debug.lock.prof.stats
top -d 2 -b | tail -25)

AFAIU there's still high contention on the lockbuilder mtxpool with the patch 
applied. But hopefully the lockmgr:ufs contention, which I believe produced 
the 80% sys CPU load, is gone with your patch.


Looks to me like lockmgr-related contention was reduced by 1 to 2 orders 
of magnitude, which is the expected result.  This surely must have a 
measurable impact on your workload.  Further lockmgr improvement will 
have to wait until the lockmgr replacement work proceeds.


One more patch which may or may not help is:

  http://www.freebsd.org/~jhb/patches/namei_rwlock.patch

(may also require porting since it was against an older version of 
7.0-CURRENT).  When I have tested this in the past it was a performance 
loss for reasons that I think I understand (basically, it is locally a 
performance improvement for the name cache but also requires a fixed 
lockmgr to avoid an overall performance loss), but I don't remember if I 
tested it in conjunction with the lockmgr patch.


There are patches you need to enable it on woodcrest.  They are in my 
p4 branch (kris-contention) but I don't have time right now to extract 
them.
I think it would be very useful because I can't see any other way to 
profile FreeBSD on modern many-core machines.


You can extract the changeset from my branch via 
http://perforce.freebsd.org.  Unfortunately I don't have time to do it 
myself.


Kris



Re: 2 x quad-core system is slower that 2 x dual core on FreeBSD

2007-11-23 Thread Alexey Popov

Hi.

Kris Kennaway wrote:

Now FreeBSD 7-STABLE ULE 8-core server without optimized PHP 
realpath_cache_size (producing 2000+ lstats per request) can handle up 
to ~24 rps as opposed to  max. 17 rps without your patch. %sys never 
grows over %user with your patch. On the server with optimized 
realpath_cache_size there's no visible influence of your patch.


You said 20 before for this configuration, so I'm a bit suspicious 
about how seriously to treat your measurements :)

Sorry, my mistake. s/ULE/4BSD.

Anyway, please obtain another lock profiling trace using the same 
conditions as the previous one (same workload & duration, etc), so we 
can compare what changed.

OK, I'll make it a little bit later.

Also I tried to find what else is slow in FreeBSD, I tried hwpmc as 
module and in kernel, but it fails with error:

pmc: Unknown Intel CPU.
module_register_init: MOD_LOAD (hwpmc, 0x804833e0, 
0x809338a0) error 78


This is related to 
http://www.freebsd.org/cgi/query-pr.cgi?pr=amd64%2F111994cat=

and it is impossible to use hwpmc with modern CPUs.

Is kgmon profiling usable on FreeBSD 7?

With best regards,
Alexey Popov


Re: 2 x quad-core system is slower that 2 x dual core on FreeBSD

2007-11-23 Thread Kris Kennaway

Joseph Koshy wrote:

Also I tried to find what else is slow in FreeBSD, I tried hwpmc as
module and in kernel, but it fails with error:
pmc: Unknown Intel CPU.
module_register_init: MOD_LOAD (hwpmc, 0x804833e0,
0x809338a0) error 78



There are patches you need to enable it on woodcrest.  They are in my p4
branch (kris-contention) but I don't have time right now to extract them.


These patches make hwpmc treat these CPUs as possessing Pentium-Pro class
PMCs.

Unfortunately, this is easy to do, but incorrect:
- There are differences in the legal bit values that may be loaded into
  PMC registers for many hardware events.
- hwpmc needs to be taught to support measurements on CPUs with
  multiple cores per package.

And then there is additional work to support these CPUs
at the same level as the current set:
- The hardware events supported are named differently; documentation,
   libpmc's event selector parsing code need to be changed to suit.
- The hardware supports a new class of fixed function PMCs that
   hwpmc needs to support.



Well, this is all true, but overlooks the point that it does minimally 
work, which is of critical importance to people with one of these CPUs 
who want to actually use your tool ;)


Kris


Re: 2 x quad-core system is slower that 2 x dual core on FreeBSD

2007-11-23 Thread Ivan Voras
Krassimir Slavchev wrote:

 That's true but if the tests are same then they can be compared.
 

 - the code is most likely checking for changes in PHP libraries)
 
 This is not recommended for production systems.

PHP code accelerators / caches do that all the time. require_once() also
does it.

 Yes, may be it is easier to write perl/php scripts.

I'm glad you're volunteering :)






Re: 2 x quad-core system is slower that 2 x dual core on FreeBSD

2007-11-23 Thread Krassimir Slavchev

Ivan Voras wrote:
 On 23/11/2007, Krassimir Slavchev [EMAIL PROTECTED] wrote:
 
 Would someone define what exact tests to be performed.
 Ok, using ab is fine but with what parameters it is used and against
 what, script or static html? It will be good to have written some perl,
 
 In this thread, it's always PHP code, with database backends.
 
 php ... scripts or C programs which simulates some kind of 'real world'
 work.
 
 The problem is that a realistic applications does a lot of things that
 are not easily simulated:

That's true but if the tests are same then they can be compared.

 
 - usually has a lot of code, lots of include files, libraries, etc.
 (so it stresses file systems, as was shown with fstat() in the thread
 - the code is most likely checking for changes in PHP libraries)

This is not recommended for production systems.

 - uses a database, which is populated with real-world data (so it has
 a lot of IPC of very varied sizes)
 - uses some kind of caching, both of compiled PHP code (eAccelerator,
 pecl-APC) and of data (eAccelerator, memcached) (which uses SysV SHM
 and IPC).
 
 Reducing all that to a C file that does all of it is very nontrivial.

Yes, maybe it is easier to write perl/php scripts.

 For classic setups with mod_php, it's not uncommon that httpd
 processes grow to 100 MB or more each, with all the heavy stuff
 brought in.
 
Yes, that is true for mod_perl too.

However, it is hard to simulate a real workload.

I will have two 2 x quad-core (X5450) systems with 8G RAM (DL380G5) soon and
will have about a month to play with them before they are put in production.
If someone wishes, I can run specific tests on them.

Best Regards



Re: 2 x quad-core system is slower that 2 x dual core on FreeBSD

2007-11-23 Thread Kris Kennaway

Alexey Popov wrote:

Hi.

Kris Kennaway wrote:

Now FreeBSD 7-STABLE ULE 8-core server without optimized PHP 
realpath_cache_size (producing 2000+ lstats per request) can handle 
up to ~24 rps as opposed to  max. 17 rps without your patch. %sys 
never grows over %user with your patch. On the server with optimized 
realpath_cache_size there's no visible influence of your patch.


You said 20 before for this configuration, so I'm a bit suspicious 
about how seriously to treat your measurements :)

Sorry, my mistake. s/ULE/4BSD.


OK, please compare ULE to ULE with and without my patch (and remembering 
to enable the sysctl), and obtain lock profiling traces in both cases 
under identical workloads & durations.  That is what I need to proceed 
with this issue.


Anyway, please obtain another lock profiling trace using the same 
conditions as the previous one (same workload & duration, etc), so we 
can compare what changed.

OK, I'll make it a little bit later.

Also I tried to find what else is slow in FreeBSD, I tried hwpmc as 
module and in kernel, but it fails with error:

pmc: Unknown Intel CPU.
module_register_init: MOD_LOAD (hwpmc, 0x804833e0, 
0x809338a0) error 78


There are patches you need to enable it on woodcrest.  They are in my p4 
branch (kris-contention) but I don't have time right now to extract them.


This is related to 
http://www.freebsd.org/cgi/query-pr.cgi?pr=amd64%2F111994cat=

and it is impossible to use hwpmc with modern CPUs.


Sounds like it.


Is kgmon profiling usable on FreeBSD 7?


I've never bothered, it is likely to be quite slow, so it can totally 
change the workload you are trying to profile.


Kris


Re: 2 x quad-core system is slower that 2 x dual core on FreeBSD

2007-11-23 Thread Krassimir Slavchev

Ivan Voras wrote:
 On 20/11/2007, Alexey Popov [EMAIL PROTECTED] wrote:
 
 CPU states:  5.9% user,  0.0% nice, 81.3% system,  0.0% interrupt, 12.8% idle
 CPU states: 82.2% user,  0.0% nice, 13.8% system,  0.0% interrupt,  4.0% idle
 
 Interesting coincidence: 1 CPU generates almost 8x less sys time than 8 
 CPUs.
 
 But it seems that you have found something real. Inspired by your
 problem I've done a simple measurement (ab) on a 4-CPU (2x2 core
 Opterons 2216 HE, PAE) machine I maintain, under these circumstances:

Would someone define exactly what tests should be performed?
OK, using ab is fine, but with what parameters is it used and against
what, a script or static HTML? It would be good to have some perl,
php ... scripts or C programs written which simulate some kind of
'real world' work.
There are a lot of people who think 'it is good for me' (including me),
but what can be done with such hardware?

Best Regards

 
 - a heavy PHP application
 - FastCGI
 - in this case, load of 4 clients
 - on 6-STABLE
 
 and I'm reporting similar findings:
 
 last pid:  2254;  load averages:  1.43,  0.92,  0.69   up 71+08:23:06  
 18:00:31
 153 processes: 8 running, 144 sleeping, 1 zombie
 CPU states: 38.8% user,  0.0% nice, 48.4% system,  3.2% interrupt,  9.6% idle
 Mem: 2321M Active, 1135M Inact, 313M Wired, 139M Cache, 112M Buf, 93M Free
 Swap: 4500M Total, 336K Used, 4500M Free
 
   PID USERNAME  THR PRI NICE   SIZE    RES STATE  C   TIME   WCPU COMMAND
  2208 www         1  99    0   115M 19808K RUN    1   0:06 36.83% php-cgi
  2207 www         1 100    0   114M 19348K RUN    3   0:05 32.66% php-cgi
  1715 www         1  99    0   115M 23672K CPU0   0   0:24 27.83% php-cgi
  1710 www         1 101    0   114M 23460K RUN    1   0:31 22.17% php-cgi
  1882 www         1  99    0   115M 23392K CPU2   3   0:18 21.34% php-cgi
  1718 www         1   4    0   114M 22556K sbwait 0   0:21 19.14% php-cgi
  2677 pgsql       1   4    0   977M 55768K sbwait 0   0:00 28.00% postgres
 
 We are not so performance bound as you so I didn't do measurements
 earlier. I cannot play with settings on this machine as it is in
 production, but ~~50% sys time (the measurement changes around 45% +/-
 10%) seems too much.
 
 On another 4-CPU machine (2x2 Xeons 5110, AMD64) with the same
 application and benchmark setup, but RELENG_7, which is not yet in
 production, the results are slightly different:
 
 last pid: 66564;  load averages:  1.87,  0.48,  0.18   up 15+05:27:03  
 17:09:09
 113 processes: 9 running, 104 sleeping
 CPU states: 49.0% user,  0.0% nice, 28.8% system,  0.0% interrupt, 22.1% idle
 Mem: 555M Active, 295M Inact, 884M Wired, 98M Cache, 213M Buf, 135M Free
 Swap: 2047M Total, 2047M Free
 
   PID USERNAME  THR PRI NICE   SIZE    RES STATE  C   TIME   WCPU COMMAND
 66557 www         1 109    0   105M 25340K RUN    3   0:14 64.99% php-cgi
 66559 www         1 109    0   105M 25308K RUN    2   0:14 62.99% php-cgi
 66561 www         1  98    0   105M 22196K RUN    0   0:01 12.99% php-cgi
 66562 www         1  98    0   105M 22196K RUN    1   0:01 11.96% php-cgi
 59043 nobody      1  47    0  7012K  3744K select 2   0:27  5.96% sqlcached
   774 pgsql       1  44    0   437M   112M select 2   3:55  0.00% postgres
 




Re: 2 x quad-core system is slower that 2 x dual core on FreeBSD

2007-11-23 Thread Joseph Koshy
  Also I tried to find what else is slow in FreeBSD, I tried hwpmc as
  module and in kernel, but it fails with error:
  pmc: Unknown Intel CPU.
  module_register_init: MOD_LOAD (hwpmc, 0x804833e0,
  0x809338a0) error 78

 There are patches you need to enable it on woodcrest.  They are in my p4
 branch (kris-contention) but I don't have time right now to extract them.

These patches make hwpmc treat these CPUs as possessing Pentium-Pro class
PMCs.

Unfortunately, this is easy to do, but incorrect:
- There are differences in the legal bit values that may be loaded into
  PMC registers for many hardware events.
- hwpmc needs to be taught to support measurements on CPUs with
  multiple cores per package.

And then there is additional work to support these CPUs
at the same level as the current set:
- The hardware events supported are named differently; documentation,
   libpmc's event selector parsing code need to be changed to suit.
- The hardware supports a new class of fixed function PMCs that
   hwpmc needs to support.

-- 
FreeBSD Volunteer, http://people.freebsd.org/~jkoshy


Re: 2 x quad-core system is slower that 2 x dual core on FreeBSD

2007-11-23 Thread Ivan Voras
On 23/11/2007, Krassimir Slavchev [EMAIL PROTECTED] wrote:

 Would someone define what exact tests to be performed.
 Ok, using ab is fine but with what parameters it is used and against
 what, script or static html? It will be good to have written some perl,

In this thread, it's always PHP code, with database backends.

 php ... scripts or C programs which simulates some kind of 'real world'
 work.

The problem is that realistic applications do a lot of things that
are not easily simulated:

- usually has a lot of code, lots of include files, libraries, etc.
(so it stresses file systems, as was shown with fstat() in the thread
- the code is most likely checking for changes in PHP libraries)
- uses a database, which is populated with real-world data (so it has
a lot of IPC of very varied sizes)
- uses some kind of caching, both of compiled PHP code (eAccelerator,
pecl-APC) and of data (eAccelerator, memcached) (which uses SysV SHM
and IPC).

Reducing all that to a C file that does all of it is very nontrivial.
For classic setups with mod_php, it's not uncommon that httpd
processes grow to 100 MB or more each, with all the heavy stuff
brought in.


Re: 2 x quad-core system is slower that 2 x dual core on FreeBSD

2007-11-23 Thread Alexey Popov

Hi

Kris Kennaway wrote:
Now FreeBSD 7-STABLE ULE 8-core server without optimized PHP 
realpath_cache_size (producing 2000+ lstats per request) can handle 
up to ~24 rps as opposed to  max. 17 rps without your patch. %sys 
never grows over %user with your patch. On the server with optimized 
realpath_cache_size there's no visible influence of your patch.
You said 20 before for this configuration, so I'm a bit suspicious 
about how seriously to treat your measurements :)

Sorry, my mistake. s/ULE/4BSD.
OK, please compare ULE to ULE with and without my patch (and remembering 
to enable the sysctl), and obtain lock profiling traces in both cases 
under identical workloads & durations.  That is what I need to proceed 
with this issue.
I didn't measure the exact values of requests per second on ULE with 
and without the patch, but at first glance the benefits of the patch 
are similar to 4BSD. If you need these values, I'll obtain them.


Here you can find lock profiling results for 7-BETA3 GENERIC kernel with 
SCHED_ULE running optimized PHP and unoptimized, with your patch and 
without it: http://83.167.98.162/gprof/lockmgr/


This data was collected by the following script:
(sysctl debug.lock.prof.reset=1
sysctl debug.lock.prof.enable=1
sleep 60
sysctl debug.lock.prof.enable=0
sysctl debug.lock.prof.stats
top -d 2 -b | tail -25)

AFAIU there's still high contention on lockbuilder mtxpool with the patch 
applied. But hopefully the lockmgr:ufs contention, which I believe produced 
the 80% sys CPU load, is gone with your patch.


Also I tried to find what else is slow in FreeBSD, I tried hwpmc as 
module and in kernel, but it fails with error:

pmc: Unknown Intel CPU.
There are patches you need to enable it on woodcrest.  They are in my p4 
branch (kris-contention) but I don't have time right now to extract them.
I think it would be very useful because I can't see any other way to 
profile FreeBSD on modern many-core machines.


With best regards,
Alexey Popov


Re: 2 x quad-core system is slower that 2 x dual core on FreeBSD

2007-11-22 Thread Max Laier
On Tuesday 20 November 2007, Kris Kennaway wrote:
 Kris Kennaway wrote:
  Kris Kennaway wrote:
  In the meantime there is unfortunately not a lot that can be done,
  AFAICT.  There is one hack that I will send you later but it is not
  likely to help much.  I will also think about how to track down the
  cause of the contention further (the profiling trace only shows that
  it comes mostly from vget/vput but doesn't show where these are
  called from).
 
  Actually this patch might help.  It doesn't replace lockmgr but it
  does fix a silly thundering herd behaviour.  It probably needs some
  adjustment to get it to apply cleanly (it is about 7 months old), and
  I apparently stopped using it because I ran into deadlocks.  It might
  be stable enough to at least see how much it helps.
 
  Set the vfs.lookup_shared=1 sysctl to enable the other half of the
  patch.
 
  Kris

 Try this one instead, it applies to HEAD.  You'll need to manually
 enter the paths though because of how p4 mangles diffs.

I rolled a tiny, simple, possibly braindamaged benchmark (but then again 
php code tends to be braindamaged): test.php includes 1000 different, 
essentially empty files and is started over and over from a shell script 
which counts the runs completed within 60 seconds.  1-8 and 128 scripts 
are started in parallel.

On a 2x dual Opteron running amd64 I get:

stock RELENG_7 w/o patch ULE:
jobs  sum runs   gain
  1        617   1
  2        784   1.27
  3        939   1.52
  4       1015   1.65
  5        658   1.07
  6        642   1.04
  7        666   1.08
  8        696   1.13
128        726   1.18

RELENG_7 patched ULE vfs.lookup_shared=1:
jobs  sum runs   gain
  1        637   1
  2        784   1.23
  3        973   1.53
  4       1104   1.73
  5        708   1.11
  6        733   1.15
  7        776   1.22
  8        840   1.32
128        936   1.47

So there is still a lot of room for improvement here.  I'll rebuild with 
lock profiling tomorrow and see what I can gather.  Anything you'd like 
to see in particular?
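
For anyone who wants to re-check the gain column, here is a rough Python sketch. It assumes each table row reads as a run count followed by a gain, and that gain is simply the run count divided by the single-job run count. The function names are made up for illustration; the numbers are the ones from the two tables in this message.

```python
# Run counts per number of parallel jobs, taken from the two tables in this message.
stock = {1: 617, 2: 784, 3: 939, 4: 1015, 5: 658, 6: 642, 7: 666, 8: 696, 128: 726}
patched = {1: 637, 2: 784, 3: 973, 4: 1104, 5: 708, 6: 733, 7: 776, 8: 840, 128: 936}


def gains(runs):
    """Return runs at each job count divided by the single-job run count."""
    base = runs[1]
    return {jobs: round(total / base, 2) for jobs, total in runs.items()}


def patch_speedup(before, after):
    """Return the patched/stock run ratio for each job count."""
    return {jobs: round(after[jobs] / before[jobs], 2) for jobs in before}


if __name__ == "__main__":
    for name, runs in (("stock", stock), ("patched", patched)):
        print(name, gains(runs))
    print("patched/stock", patch_speedup(stock, patched))
```

Looked at this way, the patched kernel is level with stock at 2 jobs and pulls ahead as the job count goes past the number of cores, but neither kernel gets anywhere near a 4x gain on 4 cores.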

-- 
/\  Best regards,  | [EMAIL PROTECTED]
\ /  Max Laier  | ICQ #67774661
 X   http://pf4freebsd.love2party.net/  | [EMAIL PROTECTED]
/ \  ASCII Ribbon Campaign  | Against HTML Mail and News


signature.asc
Description: This is a digitally signed message part.


Re: 2 x quad-core system is slower that 2 x dual core on FreeBSD

2007-11-22 Thread Alexey Popov

Hi.

Kris Kennaway wrote:
In the meantime there is unfortunately not a lot that can be done, 
AFAICT.  There is one hack that I will send you later but it is not 
likely to help much.  I will also think about how to track down the 
cause of the contention further (the profiling trace only shows that 
it comes mostly from vget/vput but doesn't show where these are 
called from).


Actually this patch might help.  It doesn't replace lockmgr but it 
does fix a silly thundering herd behaviour.  It probably needs some 
adjustment to get it to apply cleanly (it is about 7 months old), and 
I apparently stopped using it because I ran into deadlocks.  It might 
be stable enough to at least see how much it helps.
Try this one instead, it applies to HEAD.  You'll need to manually enter 
the paths though because of how p4 mangles diffs.

Finally I tried your patch and it seems to help a little.

Now FreeBSD 7-STABLE ULE 8-core server without optimized PHP 
realpath_cache_size (producing 2000+ lstats per request) can handle up 
to ~24 rps as opposed to  max. 17 rps without your patch. %sys never 
grows over %user with your patch. On the server with optimized 
realpath_cache_size there's no visible influence of your patch.


However, Linux is still 2 times faster for my workload, and there should 
be other avenues for optimization as well.


With best regards,
Alexey Popov


Re: 2 x quad-core system is slower that 2 x dual core on FreeBSD

2007-11-22 Thread Ivan Voras
Kris Kennaway wrote:
 Ivan Voras wrote:
 On 21/11/2007, Kris Kennaway [EMAIL PROTECTED] wrote:
 Ivan Voras wrote:

 Yes, but I had to verify it anyway :)
 You haven't verified anything until you look at how much work the system
 is doing, before and after.

 I have, and it's roughly the same (50 +/- 2 queries/s).

 (meaning that I'm not interested in exact statistics here, but in
 order-of-magnitude changes, which didn't happen).
 
 OK, let's take a step back here.  Did you obtain the lock profiling
 trace and verify that you're seeing the same problem as Alexey?  Can I
 see the trace?

Here it is:

http://ivoras.sharanet.org/stuff/lock_profile.txt

This is without your patch.

There's a lot of ZFS locks in there, but it seems lockmgr:ufs and
lockmgr:zfs have the largest records:

299117621   1474776121   148663 1042821  1414 0  513
 440 /usr/src/sys/kern/vfs_subr.c:2035 (lockmgr:ufs)

117958368847566147   1820932676 31672868
948  374 /usr/src/sys/kern/vfs_vnops.c:515 (lockmgr:zfs)

Which is surprising since all the working-set file systems are on ZFS,
only the root and /tmp are on UFS. /tmp also holds sockets for the
databases.

Your reading of the lock profile will be appreciated.



Re: 2 x quad-core system is slower that 2 x dual core on FreeBSD

2007-11-22 Thread Kris Kennaway

Alexey Popov wrote:

Hi.

Kris Kennaway wrote:
In the meantime there is unfortunately not a lot that can be done, 
AFAICT.  There is one hack that I will send you later but it is not 
likely to help much.  I will also think about how to track down the 
cause of the contention further (the profiling trace only shows that 
it comes mostly from vget/vput but doesn't show where these are 
called from).


Actually this patch might help.  It doesn't replace lockmgr but it 
does fix a silly thundering herd behaviour.  It probably needs some 
adjustment to get it to apply cleanly (it is about 7 months old), and 
I apparently stopped using it because I ran into deadlocks.  It might 
be stable enough to at least see how much it helps.
Try this one instead, it applies to HEAD.  You'll need to manually 
enter the paths though because of how p4 mangles diffs.

Finally I tried your patch and it seems to help a little.

Now FreeBSD 7-STABLE ULE 8-core server without optimized PHP 
realpath_cache_size (producing 2000+ lstats per request) can handle up 
to ~24 rps as opposed to  max. 17 rps without your patch. %sys never 
grows over %user with your patch. On the server with optimized 
realpath_cache_size there's no visible influence of your patch.


You said 20 before for this configuration, so I'm a bit suspicious 
about how seriously to treat your measurements :)


Anyway, please obtain another lock profiling trace using the same 
conditions as the previous one (same workload & duration, etc), so we 
can compare what changed.


Kris


Re: 2 x quad-core system is slower that 2 x dual core on FreeBSD

2007-11-22 Thread Kris Kennaway

Ivan Voras wrote:

On 22/11/2007, Kris Kennaway [EMAIL PROTECTED] wrote:

Ivan Voras wrote:

Kris Kennaway wrote:



OK, let's take a step back here.  Did you obtain the lock profiling
trace and verify that you're seeing the same problem as Alexey?  Can I
see the trace?

Here it is:

http://ivoras.sharanet.org/stuff/lock_profile.txt

This is without your patch.




Your reading of the lock profile will be appreciated.

OK, how about with?


The machine is going into production and I can't do such interventions
on it any more. Based on the lock trace, do you think it's the same
problem as Alexey's?


It looks like lockmgr, and the patch should definitely have helped. 
Maybe you forgot to enable vfs.lookup_shared?


Kris


Re: 2 x quad-core system is slower that 2 x dual core on FreeBSD

2007-11-22 Thread Ivan Voras
On 22/11/2007, Kris Kennaway [EMAIL PROTECTED] wrote:

 It looks like lockmgr, and the patch should definitely have helped.
 Maybe you forgot to enable vfs.lookup_shared?

No, I haven't.

But the machine I tested it on is only 4-core; maybe it would help on
8-core machines.


Re: 2 x quad-core system is slower that 2 x dual core on FreeBSD

2007-11-22 Thread Ivan Voras
On 22/11/2007, Kris Kennaway [EMAIL PROTECTED] wrote:
 Ivan Voras wrote:
  Kris Kennaway wrote:

  OK, let's take a step back here.  Did you obtain the lock profiling
  trace and verify that you're seeing the same problem as Alexey?  Can I
  see the trace?
 
  Here it is:
 
  http://ivoras.sharanet.org/stuff/lock_profile.txt
 
  This is without your patch.
 

  Your reading of the lock profile will be appreciated.

 OK, how about with?

The machine is going into production and I can't do such interventions
on it any more. Based on the lock trace, do you think it's the same
problem as Alexey's?


Re: 2 x quad-core system is slower that 2 x dual core on FreeBSD

2007-11-22 Thread Kris Kennaway

Ivan Voras wrote:

Kris Kennaway wrote:

Ivan Voras wrote:

On 21/11/2007, Kris Kennaway [EMAIL PROTECTED] wrote:

Ivan Voras wrote:

Yes, but I had to verify it anyway :)

You haven't verified anything until you look at how much work the system
is doing, before and after.

I have, and it's roughly the same (50 +/- 2 queries/s).

(meaning that I'm not interested in exact statistics here, but in
order-of-magnitude changes, which didn't happen).

OK, let's take a step back here.  Did you obtain the lock profiling
trace and verify that you're seeing the same problem as Alexey?  Can I
see the trace?


Here it is:

http://ivoras.sharanet.org/stuff/lock_profile.txt

This is without your patch.

There's a lot of ZFS locks in there, but it seems lockmgr:ufs and
lockmgr:zfs have the largest records:

299117621   1474776121   148663 1042821  1414 0  513
 440 /usr/src/sys/kern/vfs_subr.c:2035 (lockmgr:ufs)

117958368847566147   1820932676 31672868
948  374 /usr/src/sys/kern/vfs_vnops.c:515 (lockmgr:zfs)

Which is surprising since all the working-set file systems are on ZFS,
only the root and /tmp are on UFS. /tmp also holds sockets for the
databases.

Your reading of the lock profile will be appreciated.


OK, how about with?

Kris



Re: 2 x quad-core system is slower that 2 x dual core on FreeBSD

2007-11-21 Thread Ivan Voras
Kris Kennaway wrote:

 Try this one instead, it applies to HEAD.  You'll need to manually enter
 the paths though because of how p4 mangles diffs.

It doesn't help, at least in my case (only 4 clients) - the sys time is
still around 30% on a 4-CPU machine.



Re: 2 x quad-core system is slower that 2 x dual core on FreeBSD

2007-11-21 Thread Kris Kennaway

Ivan Voras wrote:

Kris Kennaway wrote:


Try this one instead, it applies to HEAD.  You'll need to manually enter
the paths though because of how p4 mangles diffs.


It doesn't help, at least in my case (only 4 clients) - the sys time is
still around 30% on a 4-CPU machine.


I've already explained why that is meaningless.

Kris



Re: 2 x quad-core system is slower that 2 x dual core on FreeBSD

2007-11-21 Thread Ivan Voras
Kris Kennaway wrote:
 Ivan Voras wrote:
 Kris Kennaway wrote:

 Try this one instead, it applies to HEAD.  You'll need to manually enter
 the paths though because of how p4 mangles diffs.

 It doesn't help, at least in my case (only 4 clients) - the sys time is
 still around 30% on a 4-CPU machine.
 
 I've already explained why that is meaningless.

Yes, but I had to verify it anyway :)



Re: 2 x quad-core system is slower that 2 x dual core on FreeBSD

2007-11-21 Thread Kris Kennaway

Ivan Voras wrote:

Kris Kennaway wrote:

Ivan Voras wrote:

Kris Kennaway wrote:


Try this one instead, it applies to HEAD.  You'll need to manually enter
the paths though because of how p4 mangles diffs.

It doesn't help, at least in my case (only 4 clients) - the sys time is
still around 30% on a 4-CPU machine.

I've already explained why that is meaningless.


Yes, but I had to verify it anyway :)


You haven't verified anything until you look at how much work the system 
is doing, before and after.


Kris



Re: 2 x quad-core system is slower that 2 x dual core on FreeBSD

2007-11-21 Thread Alexey Popov

Hi.

Kris Kennaway wrote:
In the meantime there is unfortunately not a lot that can be done, 
AFAICT.  There is one hack that I will send you later but it is not 
likely to help much.  I will also think about how to track down the 
cause of the contention further (the profiling trace only shows that 
it comes mostly from vget/vput but doesn't show where these are called 
from).
Actually this patch might help.  It doesn't replace lockmgr but it does 
fix a silly thundering herd behaviour.  It probably needs some 
adjustment to get it to apply cleanly (it is about 7 months old), and I 
apparently stopped using it because I ran into deadlocks.  It might be 
stable enough to at least see how much it helps.

Sorry, I didn't try your patch yet, but I have other news.

As mentioned in the description of your patch, there is probably a 
scalability problem with the stat() syscall on FreeBSD.


The PHP code of our site consists of a large number of modules. I think 
this is true for many other large PHP sites.


I found out that PHP calls lstat() for every path element of each file 
it opens, including modules. Truss output shows that PHP makes more than 
2000 lstat() calls for one /index.php request. After investigation I found 
that lstat() is called from the realpath() libc function. It turned out 
that PHP has a realpath cache, but its size by default is 16K, which is 
not enough for my files. I set realpath_cache_size to 256K and now there 
are far fewer lstat() calls.
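
To illustrate how the lstat() count adds up, here is a toy Python model. It is not PHP's actual realpath cache: it assumes one lstat() per path component, and it models the cache as an unbounded set of already-resolved prefixes, which is a simplification. The directory layout and file names are hypothetical.

```python
def path_prefixes(path):
    """Return every prefix of an absolute path: /a/b/c -> ['/a', '/a/b', '/a/b/c']."""
    parts = [p for p in path.split("/") if p]
    return ["/" + "/".join(parts[:i]) for i in range(1, len(parts) + 1)]


def count_lstats(paths, use_cache):
    """Count simulated lstat() calls, one per path component of each file.

    With use_cache=True a prefix that was already resolved is skipped
    (an unbounded cache, unlike a real size-limited one).
    """
    seen = set()
    calls = 0
    for path in paths:
        for prefix in path_prefixes(path):
            if use_cache and prefix in seen:
                continue
            calls += 1
            seen.add(prefix)
    return calls


# Hypothetical site: 250 include files, each 8 path components deep.
includes = [
    f"/usr/local/www/site/lib/modules/pkg{i % 10}/mod{i}.php" for i in range(250)
]

if __name__ == "__main__":
    print("without cache:", count_lstats(includes, use_cache=False))
    print("with cache:   ", count_lstats(includes, use_cache=True))
```

With these made-up numbers, 250 includes at 8 components each give 2000 calls per request without a cache, which is the same order as what truss showed. Once the cache is too small to hold the working set, the count drifts back toward the uncached figure.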


Performance of the 8-core machine grew by ~50% for me on 7-STABLE. Now it 
can handle 30 or more requests per second. I have similar results 
with 6-STABLE. The %sys values are no longer as large as they were before 
(see the attached top output).


Nevertheless, FreeBSD is still far behind Linux with its 50 rps. 
Linux makes those 2000+ lstat() calls without a problem. There are still 
stat(), open(), gettimeofday(), close() syscalls for each include file in 
PHP that I cannot switch off.


It is also unclear to me what to do with MySQL, which turned out to 
have the same problems for me.


With best regards,
Alexey Popov


Re: 2 x quad-core system is slower that 2 x dual core on FreeBSD

2007-11-21 Thread Kris Kennaway

Alexey Popov wrote:

Also, could you explain what to look for in the lock profiling results? 
Do large wait_total values indicate a problem, or should I look at other columns?


All of the columns (well, maybe except for the lock name ;-) can 
indicate potential problems of various kinds, so you have to look at 
them all to identify possible abnormalities.


For example, if you are acquiring a mutex many times you might look for 
ways to reduce the frequency of acquisitions.  If it is being held for 
long periods of time this can increase contention for other consumers. 
If there is high lock contention then processes will block waiting for 
it.  If processes are spending a lot of time waiting for the lock then 
they are not getting work done.
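
One way to apply the checklist above to a table of per-lock counters, as a
rough sketch only: the row layout (name, acquisition count, total hold time,
total wait time) and every sample number below are invented for
illustration and are not taken from any real profiling output.

```python
from collections import namedtuple

# Hypothetical per-lock counters; only "wait_total" is named in the thread.
LockRow = namedtuple("LockRow", "name count hold_total wait_total")


def rank_locks(rows):
    """Derive per-acquisition averages and sort by total wait time."""
    ranked = []
    for row in rows:
        if row.count == 0:
            continue  # never acquired: nothing to average
        ranked.append({
            "name": row.name,
            "count": row.count,                       # how often it is taken
            "avg_hold": row.hold_total / row.count,   # how long it is held
            "avg_wait": row.wait_total / row.count,   # how long callers block
            "wait_total": row.wait_total,             # total time lost waiting
        })
    ranked.sort(key=lambda r: r["wait_total"], reverse=True)
    return ranked


# Invented sample data, in arbitrary time units.
sample = [
    LockRow("lock_a", 1000, 5000, 200),
    LockRow("lock_b", 50000, 20000, 90000),
    LockRow("lock_c", 10, 40000, 1000),
]

for r in rank_locks(sample):
    print("%-8s count=%-6d avg_hold=%-8.1f avg_wait=%-6.2f wait_total=%d"
          % (r["name"], r["count"], r["avg_hold"], r["avg_wait"],
             r["wait_total"]))
```

Sorting by total wait time is only one possible ordering; a lock that is
held for a long time but rarely contended (lock_c in the sample) shows up
through avg_hold instead.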


Kris


Re: 2 x quad-core system is slower that 2 x dual core on FreeBSD

2007-11-21 Thread Ivan Voras
On 21/11/2007, Kris Kennaway [EMAIL PROTECTED] wrote:
 Ivan Voras wrote:

  Yes, but I had to verify it anyway :)

 You haven't verified anything until you look at how much work the system
 is doing, before and after.

I have, and it's roughly the same (50 +/- 2 queries/s).

(meaning that I'm not interested in exact statistics here, but in
order-of-magnitude changes, which didn't happen).


Re: 2 x quad-core system is slower that 2 x dual core on FreeBSD

2007-11-21 Thread Alexey Popov

Hi all.

Sorry, I forgot to attach the top and vmstat output from the 8-core
7-STABLE machine with the tuned PHP realpath_cache_size.

With best regards,
Alexey Popov

last pid: 91239;  load averages:  4.64,  4.72,  7.82    up 0+19:13:37  14:07:50
53 processes:  7 running, 46 sleeping
CPU states: 78.0% user,  0.0% nice, 21.5% system,  0.0% interrupt,  0.4% idle
Mem: 341M Active, 181M Inact, 225M Wired, 272K Cache, 186M Buf, 3158M Free
Swap: 2048M Total, 2048M Free

  PID USERNAME   THR PRI NICE   SIZE    RES STATE  C   TIME   WCPU COMMAND
91238 www          1   4    0 99644K 48752K sbwait 4   0:03 64.04% httpd
91233 www          1 100    0    98M 46124K select 3   0:03 55.79% httpd
91236 www          1 100    0 92476K 41000K select 3   0:03 54.69% httpd
91234 www          1 100    0 99644K 47212K select 7   0:03 54.02% httpd
91235 www          1 100    0 93500K 42508K CPU1   0   0:03 52.26% httpd
91232 www          1 100    0 92476K 41980K select 4   0:03 45.17% httpd
91231 www          1 100    0 97596K 43656K CPU7   1   0:03 41.22% httpd
91226 www          1 100    0 99644K 49524K select 6   0:05 37.63% httpd
91228 www          1 100    0 98620K 46516K select 5   0:05 37.02% httpd
91237 www          1  98    0 99644K 43588K select 1   0:01 32.83% httpd
91223 www          1 101    0 96572K 47312K select 5   0:07 31.85% httpd
91229 www          1  99    0 99644K 46632K select 4   0:03 30.42% httpd
91135 www          1 101    0    98M 51528K select 7   0:33 29.86% httpd
91227 www          1 100    0    99M 47212K select 2   0:03 28.02% httpd
91225 www          1 100    0    98M 47068K CPU5   5   0:06 27.56% httpd
91180 www          1 100    0   113M 62152K CPU2   3   0:21 26.41% httpd
91224 www          1 100    0    99M 50260K select 5   0:06 24.21% httpd
91214 www          1  99    0 99644K 49576K select 5   0:11 23.49% httpd
91212 www          1 100    0    99M 51548K select 2   0:10 22.89% httpd
91230 www          1  98    0    98M 46200K select 7   0:02 21.31% httpd
91077 www          1  99    0    99M 51940K CPU6   4   0:50 20.41% httpd
91209 www          1  99    0 97596K 46316K CPU3   3   0:10 20.08% httpd
91196 www          1  98    0 99648K 50412K select 7   0:13 18.59% httpd
91239 www          1  96    0 99644K 45728K select 1   0:01 11.57% httpd
18052 llp          1  96    0 32928K  4544K select 5   0:25  0.00% sshd
  698 root         1  96    0  8952K  2516K select 0   0:02  0.00% ntpd
  779 root         1  96    0 20952K  3740K select 0   0:01  0.00% sshd
89816 root         1 100    0 86332K 13340K select 6   0:01  0.00% httpd
18074 root         1  20    0  9616K  3208K pause  0   0:01  0.00% csh
  786 root         1   8    0  5736K  1388K nanslp 2   0:01  0.00% cron
  765 root         1   4    0  4852K  1640K kqread 5   0:00  0.00% master


2 usersLoad  7.14  7.38  6.83  Nov 21 16:20

Mem:KBREALVIRTUAL   VN PAGER   SWAP PAGER
Tot   Share  TotShareFree   in   out in   out
Act  337476   33184   63070037128 3249600  count
All  397780   35216  490046448580  pages
Proc:Interrupts
  r   p   d   s   w   Csw  Trp  Sys  Int  Sof  Flt331 cow   20208 total
  4   2  43   2   28k  33k 112k 4228  989  31k  31162 zfodsio0 irq4
  ozfod   ata0 irq14
 8.2%Sys   0.1%Intr 48.3%User  0.0%Nice 43.5%Idle%ozfod   mfi0 irq18
|||||||||||   daefr   uhci0 uhci
 3413 prcfr  1972 cpu0: time
 9 dtbuf26138 totfr  4228 em0 irq256
Namei Name-cache   Dir-cache10 desvn  react  2021 cpu2: time
   Callshits   %hits   % 10544 numvn  pdwak  2021 cpu3: time
   88017   88017 100   283 frevn  pdpgs  1983 cpu6: time
  intrn  1983 cpu7: time
Disks mfid0241756 wire   1972 cpu1: time
KB/t   0.00313748 act2014 cpu4: time
tps   0196500 inact  2014 cpu5: time
MB/s   0.00   256 cache
%busy 0   3247264 free
   206320 buf



Re: 2 x quad-core system is slower that 2 x dual core on FreeBSD

2007-11-21 Thread Peter Jeremy
On Wed, Nov 21, 2007 at 02:13:19PM +0300, Alexey Popov wrote:
As mentioned in the description of your patch there is probably a 
scalability problem with stat() syscall on FreeBSD.

I wrote a quick tool to lstat() path elements on an otherwise idle
dual-core system (1.6GHz Turion64x2, FreeBSD6.3/amd64).
One instance:  ~62k lstat/sec.  99% sys
Two instances, same path: ~43k lstat/sec/instance.  97%sys
Two instances, different path, same fs: ~50k lstat/sec/instance.  97%sys
Two instances, different fs: ~53k lstat/sec/instance.  98%sys

The slowdowns, especially the same path instance, are worse than I would
have hoped.
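
A measurement of this kind can be approximated as below. This is a minimal
sketch, not the tool used for the figures above: it assumes a POSIX system,
the names path_elements and lstat_rate are made up here, and interpreter
overhead means the absolute rates are not comparable with the numbers
quoted. Running two copies at once on the same or on different paths
reproduces the shape of the experiment.

```python
import os
import time


def path_elements(path):
    """Return each prefix of `path`, e.g. /usr, /usr/local, /usr/local/www."""
    parts = [p for p in os.path.abspath(path).split(os.sep) if p]
    prefixes = [os.sep + os.sep.join(parts[:i])
                for i in range(1, len(parts) + 1)]
    return prefixes or [os.sep]  # "/" itself has no named elements


def lstat_rate(path, duration=1.0):
    """lstat() every element of `path` in a loop; return calls per second."""
    elements = path_elements(path)
    calls = 0
    start = time.monotonic()
    while time.monotonic() - start < duration:
        for element in elements:
            os.lstat(element)
            calls += 1
    elapsed = time.monotonic() - start
    return calls / elapsed


if __name__ == "__main__":
    # Any existing path will do; the current directory is just a default.
    print("%.0f lstat/sec" % lstat_rate(os.getcwd(), duration=0.5))
```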

makes that 2000+ lstat's without problem. There's still stat(), open(), 
gettimeofday(), close() syscalls for each include file in PHP that i can 
not switch off.

Note that gettimeofday() is known to be much slower (and more
accurate) on FreeBSD than on Linux.  Robert Watson (if I recall
correctly) has done some work on building a framework to allow a
choice between slow-and-accurate and fast-and-less-precise timestamps.
I don't have the reference to hand but a check of the archives should
turn it up.

-- 
Peter Jeremy
Please excuse any delays as the result of my ISP's inability to implement
an MTA that is either RFC2821-compliant or matches their claimed behaviour.




Re: 2 x quad-core system is slower that 2 x dual core on FreeBSD

2007-11-21 Thread Kris Kennaway

Alexey Popov wrote:

Hi.

Kris Kennaway wrote:
In the meantime there is unfortunately not a lot that can be done, 
AFAICT.  There is one hack that I will send you later but it is not 
likely to help much.  I will also think about how to track down the 
cause of the contention further (the profiling trace only shows that 
it comes mostly from vget/vput but doesn't show where these are 
called from).
Actually this patch might help.  It doesn't replace lockmgr but it 
does fix a silly thundering herd behaviour.  It probably needs some 
adjustment to get it to apply cleanly (it is about 7 months old), and 
I apparently stopped using it because I ran into deadlocks.  It might 
be stable enough to at least see how much it helps.

Sorry, I haven't tried your patch yet, but I have other news.

As mentioned in the description of your patch, there is probably a 
scalability problem with the stat() syscall on FreeBSD.


Not as such, that was just a random example I chose to illustrate the 
lockmgr problems I described earlier.


Try the patch I posted, it should help.

Kris



Re: 2 x quad-core system is slower that 2 x dual core on FreeBSD

2007-11-21 Thread Kris Kennaway

Ivan Voras wrote:

On 21/11/2007, Kris Kennaway [EMAIL PROTECTED] wrote:

Ivan Voras wrote:



Yes, but I had to verify it anyway :)

You haven't verified anything until you look at how much work the system
is doing, before and after.


I have, and it's roughly the same (50 +/- 2 queries/s).

(meaning that I'm not interested in exact statistics here, but in
order-of-magnitude changes, which didn't happen).


OK, let's take a step back here.  Did you obtain the lock profiling 
trace and verify that you're seeing the same problem as Alexey?  Can I 
see the trace?


Kris



Re: 2 x quad-core system is slower that 2 x dual core on FreeBSD

2007-11-21 Thread Alexey Popov

Hi.

Max Laier wrote:
I rolled a tiny, simple, possibly braindamaged benchmark (but then again 
php code tends to be braindamaged): test.php includes 1000 different, 
essentially empty files and is started over and over from a shell script 
which counts the runs completed within 60 seconds.  1-8,128 scripts are 
started in parallel.

On a 2x dual Opteron running amd64 I get:

This problem is almost invisible for me on 4-core servers. Could you try 
your benchmark on a server with 8 or more cores?
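
The file layout that benchmark describes can be generated with something
like the following. It is only a sketch of the setup: the function name,
the inc0000.php naming scheme and the temporary directory are assumptions
made here, and the shell script that runs test.php repeatedly and counts
completed runs is not reproduced.

```python
import os
import tempfile


def make_include_benchmark(directory, n_files=1000):
    """Create n_files essentially empty PHP files plus a test.php that
    includes each of them once. Returns the path of test.php."""
    os.makedirs(directory, exist_ok=True)
    names = []
    for i in range(n_files):
        name = "inc%04d.php" % i  # hypothetical naming scheme
        with open(os.path.join(directory, name), "w") as f:
            f.write("<?php\n")    # essentially empty include file
        names.append(name)
    test_php = os.path.join(directory, "test.php")
    with open(test_php, "w") as f:
        f.write("<?php\n")
        for name in names:
            f.write("include '%s';\n" % name)
    return test_php


if __name__ == "__main__":
    workdir = tempfile.mkdtemp(prefix="phpbench-")
    print("wrote", make_include_benchmark(workdir))
```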


With best regards,
Alexey Popov


Re: 2 x quad-core system is slower that 2 x dual core on FreeBSD

2007-11-20 Thread Alexey Popov

Hi.

Kris Kennaway wrote:
CPU states:  9.5% user,  0.0% nice, 82.0% system,  0.5% interrupt,  
8.0% idle

A wild idea that might not help: try reducing kern.hz in loader.conf to
something like 100 and see if something significant changes.

Now it runs with hz=100, number of context switches became ~ 2 times
less, but still there's 90% system CPU load (see attach).


System CPU usage doesn't tell you anything by itself, you need to look 
at how much work the system is actually doing (pages served/second, or 
whatever).  For example, when your kernel is getting more work done, 
system CPU usage will also be higher.
Usually on PHP backends slow PHP code eats most of the CPU time. I have 
%user much bigger than %system in CPU states.


But now %system is much bigger than %user and I can conclude that on 
8-core server FreeBSD consumes more CPU time than PHP.


With best regards,
Alexey Popov


Re: 2 x quad-core system is slower that 2 x dual core on FreeBSD

2007-11-20 Thread Kris Kennaway

Alexey Popov wrote:

Hi.

Kris Kennaway wrote:
CPU states:  9.5% user,  0.0% nice, 82.0% system,  0.5% interrupt,  
8.0% idle

A wild idea that might not help: try reducing kern.hz in loader.conf to
something like 100 and see if something significant changes.

Now it runs with hz=100, number of context switches became ~ 2 times
less, but still there's 90% system CPU load (see attach).


System CPU usage doesn't tell you anything by itself, you need to look 
at how much work the system is actually doing (pages served/second, or 
whatever).  For example, when your kernel is getting more work done, 
system CPU usage will also be higher.
Usually on PHP backends slow PHP code eats most of the CPU time. I have 
%user much bigger than %system in CPU states.


But now %system is much bigger than %user and I can conclude that on 
8-core server FreeBSD consumes more CPU time than PHP.


That is one possibility, but you still need to look at the actual 
throughput on these machines before making conclusions about which is 
performing better.  Can you please provide those numbers for 6.x, 7.x 
with ULE and 4BSD on the 4-core and 8-core systems?


Kris



Re: 2 x quad-core system is slower that 2 x dual core on FreeBSD

2007-11-20 Thread Alexey Popov

Hi

Kris Kennaway wrote:
CPU states:  9.5% user,  0.0% nice, 82.0% system,  0.5% 
interrupt,  8.0% idle
A wild idea that might not help: try reducing kern.hz in 
loader.conf to

something like 100 and see if something significant changes.
Usually on PHP backends slow PHP code eats most of the CPU time. I 
have %user much bigger than %system in CPU states.
But now %system is much bigger than %user and I can conclude that on 
8-core server FreeBSD consumes more CPU time than PHP.
That is one possibility, but you still need to look at the actual 
throughput on these machines before making conclusions about which is 
performing better.  Can you please provide those numbers for 6.x, 7.x 
with ULE and 4BSD on the 4-core and 8-core systems?
Ok, here are the results of practical testing. The following is the 
approximate maximum qps that the backends can sustain with my workload:


7-STABLE quad ULE   20
7-STABLE quad 4BSD  17
6-STABLE quad   14
6-STABLE dual   21
Linux CentOS 5 quad 50
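
To make the comparison easier to read, the figures above can be expressed
relative to one row. The sketch below uses the five numbers from the table
as given; choosing "6-STABLE dual" as the baseline is an arbitrary
assumption made here.

```python
# Approximate maximum qps per configuration, copied from the table above.
qps = {
    "7-STABLE quad ULE": 20,
    "7-STABLE quad 4BSD": 17,
    "6-STABLE quad": 14,
    "6-STABLE dual": 21,
    "Linux CentOS 5 quad": 50,
}

baseline = qps["6-STABLE dual"]  # arbitrary reference row
relative = {name: round(value / baseline, 2) for name, value in qps.items()}

for name, ratio in sorted(relative.items(), key=lambda kv: kv[1]):
    print("%-20s %5.2fx of 6-STABLE dual" % (name, ratio))
```

Read this way, none of the FreeBSD quad rows exceeds the dual baseline,
while the Linux row is a little under 2.4 times it.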

With best regards,
Alexey Popov
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to [EMAIL PROTECTED]


Re: 2 x quad-core system is slower that 2 x dual core on FreeBSD

2007-11-20 Thread Stefan Lambrev

Hi Alexey,

Can you please send a dmesg from FreeBSD 7 on this server?

I'm a little puzzled by what you mean by 7-stable :)

Alexey Popov wrote:

Hi.

I have a large pool of web backends (Apache + mod_php5) with
2 x Xeon 3.2GHz processors and 2 x Xeon 5120 dual-core processors. The
workload is mostly CPU-bound. I'm using 6-STABLE-amd64 and also tried
7-STABLE.

Now I'm trying to use new hardware with 2 x Xeon 5320 (quad-core), but
it can not work under the same load as dual-core. It shows up to 80%
system CPU load in top:





--

Best Wishes,
Stefan Lambrev
ICQ# 24134177



Re: 2 x quad-core system is slower that 2 x dual core on FreeBSD

2007-11-20 Thread Alexey Popov

Hi.

Ivan Voras wrote:

Some more ideas: How is your disk load (iostat, systat -vm, diskinfo -t)
during the load? You don't use NFS for the web directories, do you?
Can you run bonnie++ while the machine is idle (i.e. apache is stopped)
just to verify it isn't a stupid problem with the disks or the driver?
There's almost no disk load except writing ~15 lines per second to the 
logs. All PHP code fits in memory, so there's no need to read from disk. 
atime is turned off. NFS is not used.



So, you pick the CPU out of the motherboard and plug in another one? If
not, you can't be sure that some other thing isn't wrong. I know you
tried it on Linux, but it might use slightly different commands in the
driver that don't trigger the error. I'm very surprised that both 6.x
and 7.x behave almost the same on your load: since they are very
different in how they support multiple CPU-s, I'd expect a big
difference in this case (in favour of 7.x), not a small one. This might
point that the problem is not in the OS itself, but maybe in the
hardware or in some driver.
I didn't change the CPUs myself, but I think these 4-core and 8-core 
servers (Intel SR1500 platform) differ only in their CPUs. You can see it 
in the dmesg at the root of this thread.



You don't have WITNESS, INVARIANTS, DIAGNOSTICS or something similar
enabled? Can you try a generic SMP kernel (called SMP in 6.x; the
GENERIC in 7.x has SMP by default) and see how it works?
Can you disable SMP and try with only one CPU (on the 2xquad machine)?
You can do it in loader.conf by setting kern.smp.disabled=1, or perhaps
in BIOS. If there's a problem in some hardware or a driver, you'd still
get a big load on sys time. You might also want to halt certain logical
CPUs in the OS itself (see smp(4) man page) and see if there's a certain
relationship between how many CPUs are running and what the sys load is.

Thank you.
I need some time to try all this. I'll report back if I find something.

With best regards,
Alexey Popov


Re: 2 x quad-core system is slower that 2 x dual core on FreeBSD

2007-11-20 Thread Alexey Popov

Hi.

Ivan Voras wrote:
  Many people (including me) have run FreeBSD on machines like yours

without such problems, so let's dig further.

You don't have WITNESS, INVARIANTS, DIAGNOSTICS or something similar
enabled? Can you try a generic SMP kernel (called SMP in 6.x; the
GENERIC in 7.x has SMP by default) and see how it works?

Can you disable SMP and try with only one CPU (on the 2xquad machine)?
You can do it in loader.conf by setting kern.smp.disabled=1, or perhaps
in BIOS. If there's a problem in some hardware or a driver, you'd still
get a big load on sys time. You might also want to halt certain logical
CPUs in the OS itself (see smp(4) man page) and see if there's a certain
relationship between how many CPUs are running and what the sys load is.
Now I'm running yesterday's FreeBSD 7.0-BETA3 amd64 with the GENERIC 
kernel. I rebuilt kernel and world with a clean make.conf. I also rebuilt 
Apache, PHP and eAccelerator from scratch. I tried APC as well. No success.


I tried 7-STABLE with a UP kernel (GENERIC built without the SMP config 
option). It works fine and can handle around 5-10 requests per second. 
Its %sys time is much lower than its %user time (see the attached top 
output), i.e. it seems to work well as a simple server with a not so 
powerful CPU.


After that I rebuilt with the SMP GENERIC kernel and sent that server 2 
times more requests than UP could handle. At first it worked well. Then 
I increased the load to 2.5 times more than UP. Immediately the Apache 
child count increased to MaxClients (24), most of them in RUN state, and 
%sys became greater than %user (see attachment). I think that after some 
load threshold FreeBSD spends more CPU time managing running processes 
than running them.


I also tried halting CPUs with the machdep.hlt_cpus sysctl, but in that 
case %sys in top was still much greater than %user.


With best regards,
Alexey Popov

last pid:  1100;  load averages:  8.55,  5.20,  2.35    up 0+00:05:39  18:59:52
48 processes:  22 running, 26 sleeping
CPU states:  5.9% user,  0.0% nice, 81.3% system,  0.0% interrupt, 12.8% idle
Mem: 245M Active, 14M Inact, 102M Wired, 108K Cache, 48M Buf, 3543M Free
Swap: 2048M Total, 2048M Free

  PID USERNAME   THR PRI NICE   SIZE    RES STATE  C   TIME   WCPU COMMAND
 1093 www          1 105    0 94524K 39716K CPU1   7   0:09 34.41% httpd
 1094 www          1 105    0 91452K 39732K select 4   0:09 34.01% httpd
 1097 www          1  -4    0    98M 48392K RUN    7   0:10 33.41% httpd
 1098 www          1 105    0 92476K 43176K CPU4   7   0:09 33.27% httpd
 1099 www          1  -4    0 92476K 40784K RUN    7   0:09 33.21% httpd
 1100 www          1  -4    0 92476K 41080K RUN    4   0:09 32.87% httpd
 1095 www          1  -4    0 92476K 40824K RUN    6   0:09 32.74% httpd
 1090 www          1  -4    0 96572K 42700K RUN    5   0:09 32.54% httpd
 1089 www          1  -4    0 93504K 42032K RUN    7   0:09 32.41% httpd
 1091 www          1  -4    0 95548K 44900K RUN    4   0:09 31.95% httpd
 1096 www          1  -4    0 98620K 47160K RUN    6   0:09 31.86% httpd
 1086 www          1  -4    0 96572K 45752K RUN    6   0:10 30.92% httpd
 1087 www          1 104    0 92476K 41016K CPU7   6   0:09 30.70% httpd
 1088 www          1 104    0 92476K 38332K CPU2   6   0:10 30.51% httpd
 1085 www          1 105    0 97596K 44416K CPU5   5   0:09 30.23% httpd
 1092 www          1  -4    0 92476K 40172K RUN    6   0:08 29.45% httpd

last pid:  2203;  load averages:  9.71, 10.08,  7.38    up 0+00:17:48  18:50:39
35 processes:  5 running, 30 sleeping
CPU states: 82.2% user,  0.0% nice, 13.8% system,  0.0% interrupt,  4.0% idle
Mem: 128M Active, 15M Inact, 109M Wired, 132K Cache, 88M Buf, 3657M Free
Swap: 2048M Total, 2048M Free

  PID USERNAME   THR PRI NICE   SIZE    RES STATE    TIME   WCPU COMMAND
 2201 www          1 117    0 92476K 40352K RUN      0:02 10.48% httpd
 2203 www          1 116    0 96572K 40912K RUN      0:02 10.05% httpd
 2195 www          1 117    0 93500K 43012K select   0:02  9.61% httpd
 2202 www          1  20    0 91452K 41056K lockf    0:02  8.95% httpd
 2194 www          1 117    0   102M 49440K RUN      0:03  8.10% httpd
 2192 www          1 114    0 99648K 46168K select   0:03  6.76% httpd
 2179 www          1  20    0 92476K 41672K lockf    0:04  6.69% httpd
 2173 www          1 118    0   100M 48920K RUN      0:04  5.92% httpd
 2174 www          1  20    0 92476K 42964K lockf    0:04  5.67% httpd
  878 llp          1  96    0 32928K  4576K select   0:00  0.00% sshd
  891 root         1  20    0  9616K  2924K pause    0:00  0.00% csh
  691 root         1  96    0  8952K  2528K select   0:00  0.00% ntpd
 2161 root         1 131    0 86332K 13080K select   0:00  0.00% httpd
 2178 root         1  96    0  7656K  2168K RUN      0:00  0.00% top
  875 root         1   4    0 32928K  4512K sbwait   0:00  0.00% sshd
  774 root         1   4    0  4852K  1652K kqread   0:00  0.00% master


Re: 2 x quad-core system is slower that 2 x dual core on FreeBSD

2007-11-20 Thread Tom Evans
On Tue, 2007-11-20 at 19:27 +0300, Alexey Popov wrote:
 Hi.
 snip
 After that I rebuilt with SMP GENERIC kernel and put on that server 2 
 times more requests that UP could handle. For the first time it worked 
 good. Then I increased load to 2.5 times more than UP. Immediately 
 Apache child count increased to MaxClients (24), most of them in RUN 
 state, and %sys became greater than %user (see attach). I think after 
 some threshold of load FreeBSD is paying more CPU time to the management 
 of running processes than to run them.
 
 Also I tried to halt CPUs by machdep.hlt_cpus sysctl, but in that case 
 %sys in top was still much greater than %user.

MaxClients of 24 seems very low for an 8 CPU box running the prefork MPM. On
our quad CPU boxes, running custom apache modules, we use 
  MaxClients 70
  MinSpareServers 5
  MaxSpareServers 15
  StartServers 20

Perhaps you are seeing high system load because the system has to
maintain a lot of queued connections. Certainly, our load remains
within comfortable margins, except when heavily stressed.

Tom




Re: 2 x quad-core system is slower that 2 x dual core on FreeBSD

2007-11-20 Thread Ivan Voras
On 20/11/2007, Alexey Popov [EMAIL PROTECTED] wrote:

 CPU states:  5.9% user,  0.0% nice, 81.3% system,  0.0% interrupt, 12.8% idle
 CPU states: 82.2% user,  0.0% nice, 13.8% system,  0.0% interrupt,  4.0% idle

Interesting coincidence: 1 CPU generates almost 8x less sys time than 8 CPUs.

But it seems that you have found something real. Inspired by your
problem I've done a simple measurement (ab) on a 4-CPU (2x2 core
Opterons 2216 HE, PAE) machine I maintain, under these circumstances:

- a heavy PHP application
- FastCGI
- in this case, load of 4 clients
- on 6-STABLE

and I'm reporting similar findings:

last pid:  2254;  load averages:  1.43,  0.92,  0.69   up 71+08:23:06  18:00:31
153 processes: 8 running, 144 sleeping, 1 zombie
CPU states: 38.8% user,  0.0% nice, 48.4% system,  3.2% interrupt,  9.6% idle
Mem: 2321M Active, 1135M Inact, 313M Wired, 139M Cache, 112M Buf, 93M Free
Swap: 4500M Total, 336K Used, 4500M Free

  PID USERNAME  THR PRI NICE   SIZE    RES STATE  C   TIME   WCPU COMMAND
 2208 www         1  99    0   115M 19808K RUN    1   0:06 36.83% php-cgi
 2207 www         1 100    0   114M 19348K RUN    3   0:05 32.66% php-cgi
 1715 www         1  99    0   115M 23672K CPU0   0   0:24 27.83% php-cgi
 1710 www         1 101    0   114M 23460K RUN    1   0:31 22.17% php-cgi
 1882 www         1  99    0   115M 23392K CPU2   3   0:18 21.34% php-cgi
 1718 www         1   4    0   114M 22556K sbwait 0   0:21 19.14% php-cgi
 2677 pgsql       1   4    0   977M 55768K sbwait 0   0:00 28.00% postgres

We are not as performance bound as you are, so I didn't do measurements
earlier. I cannot play with settings on this machine as it is in
production, but ~50% sys time (the measurement varies around 45% +/-
10%) seems too much.

On another 4-CPU machine (2x2 Xeons 5110, AMD64) with the same
application and benchmark setup, but RELENG_7, which is not yet in
production, the results are slightly different:

last pid: 66564;  load averages:  1.87,  0.48,  0.18   up 15+05:27:03  17:09:09
113 processes: 9 running, 104 sleeping
CPU states: 49.0% user,  0.0% nice, 28.8% system,  0.0% interrupt, 22.1% idle
Mem: 555M Active, 295M Inact, 884M Wired, 98M Cache, 213M Buf, 135M Free
Swap: 2047M Total, 2047M Free

  PID USERNAME   THR PRI NICE   SIZE    RES STATE  C   TIME   WCPU COMMAND
66557 www          1 109    0   105M 25340K RUN    3   0:14 64.99% php-cgi
66559 www          1 109    0   105M 25308K RUN    2   0:14 62.99% php-cgi
66561 www          1  98    0   105M 22196K RUN    0   0:01 12.99% php-cgi
66562 www          1  98    0   105M 22196K RUN    1   0:01 11.96% php-cgi
59043 nobody       1  47    0  7012K  3744K select 2   0:27  5.96% sqlcached
  774 pgsql        1  44    0   437M   112M select 2   3:55  0.00% postgres


Re: 2 x quad-core system is slower that 2 x dual core on FreeBSD

2007-11-20 Thread Pete French
 Thank you for your research. I think you can get more %sys with 4-core 
 processors. For me 2xquad-core systems are now completely unusable as 
 PHP backends.

I am getting very alarmed by this discussion, as we just took delivery
of ten 2x quad-core systems to be deployed as heavy webservers in order to
replace the dual-core ones, probably under 7.0, as 6.3 won't boot PAE and
they have 16 gigs of memory.

 Anyway I'm happy that I'm not alone with this problem. But what can we 
 do about it?

When I get a webserver up and running I will also do some benchmarking and
see if I get the same results. I am simply running straightforward Obj-C 
code on mine.

-pete.


Re: 2 x quad-core system is slower that 2 x dual core on FreeBSD

2007-11-20 Thread Alexey Popov

Hi.

Ivan Voras wrote:

CPU states:  5.9% user,  0.0% nice, 81.3% system,  0.0% interrupt, 12.8% idle
CPU states: 82.2% user,  0.0% nice, 13.8% system,  0.0% interrupt,  4.0% idle

Interesting coincidence: 1 CPU generates almost 8x less sys time then 8 CPUs.
But it seems that you have found something real. Inspired by your
problem I've done a simple measurement (ab) on a 4-CPU (2x2 core
Opterons 2216 HE, PAE) machine I maintain, under these circumstances:

- a heavy PHP application
- FastCGI
- in this case, load of 4 clients
- on 6-STABLE

and I'm reporting similar findings:
CPU states: 38.8% user,  0.0% nice, 48.4% system,  3.2% interrupt,  9.6% idle
We are not so performance bound as you so I didn't do measurements
earlier. I cannot play with settings on this machine as it is in
production, but ~~50% sys time (the measurement changes around 45% +/-
10%) seems too much.
Thank you for your research. I think you can get more %sys with 4-core 
processors. For me 2xquad-core systems are now completely unusable as 
PHP backends.


Anyway I'm happy that I'm not alone with this problem. But what can we 
do about it?


With best regards,
Alexey Popov


Re: 2 x quad-core system is slower that 2 x dual core on FreeBSD

2007-11-20 Thread Alexey Popov

Hi.

Tom Evans wrote:
After that I rebuilt with SMP GENERIC kernel and put on that server 2 
times more requests that UP could handle. For the first time it worked 
good. Then I increased load to 2.5 times more than UP. Immediately 
Apache child count increased to MaxClients (24), most of them in RUN 
state, and %sys became greater than %user (see attach). I think after 
some threshold of load FreeBSD is paying more CPU time to the management 
of running processes than to run them.

MaxClients of 24 seems very low for a 8 cpu box, running prefork MPM. On
our quad CPU boxes, running custom apache modules, we use 
  MaxClients 70

  MinSpareServers 5
  MaxSpareServers 15
  StartServers 20
Perhaps you are seeing high system load because the system is having to
maintain a lot of queued connections. Certainly, our load remains
in-between comfortable margins, except when heavily stressed.
I believe an 8-core FreeBSD server is able to maintain 1024 waiting TCP 
connections without measurable CPU load.


As for this problem: increasing MaxClients leads to a growing %sys share 
of CPU load. Generally a large MaxClients value is useful when most Apache 
children are waiting for I/O or something other than the CPU.


With best regards,
Alexey Popov


Re: 2 x quad-core system is slower that 2 x dual core on FreeBSD

2007-11-20 Thread Claus Guttesen
  Thank you for your research. I think you can get more %sys with 4-core
  processors. For me 2xquad-core systems are now completely unusable as
  PHP backends.

 I am getting very alarmed by this discussion as we just took delivery
 of ten 2x quad core systems to be deployes as heavy webservers in order to
 replace the dual core ones. probably under 7.0 as 6.3 wont boot PAE and
 they have 16 gigs of memory.

I'm running two DL360 G5 webservers each with two quad-core cpu's.
Each have 8 GB of ram, one is 2 Ghz and the other is 2.33 Ghz. They
run just fine. These two webservers have twice the weight of three
opterons with two dual-core cpu's on our coyote load-balancer.

The servers are so fast I had to adjust the read- and write-size in
fstab for the nfs-mounts. Using 8kb I got the 'nfs server not
responding - alive again' during peak on the new servers. Using a 2kb
size instead solved my problem. They handle twice as much traffic as
the 4-way-opterons.

The quad-cores run 7.0 beta2 with ule, php, apache.

-- 
regards
Claus

When lenity and cruelty play for a kingdom,
the gentlest gamester is the soonest winner.

Shakespeare


Re: 2 x quad-core system is slower that 2 x dual core on FreeBSD

2007-11-20 Thread Ivan Voras
Claus Guttesen wrote:

 I'm running two DL360 G5 webservers each with two quad-core cpu's.
 Each have 8 GB of ram, one is 2 Ghz and the other is 2.33 Ghz. They
 run just fine. These two webservers have twice the weight of three
 opterons with two dual-core cpu's on our coyote load-balancer.

The issue in this thread is not if they are fast, but could they be made
faster by shortening sys time :)

(btw. what is your sys time under stress?)

The systems I reported about are fast enough for me, but they are not
for the person who started the thread, and I agree that it looks like
there could be a problem.





Re: 2 x quad-core system is slower that 2 x dual core on FreeBSD

2007-11-20 Thread Kris Kennaway

Alexey Popov wrote:

Hi

Kris Kennaway wrote:
CPU states:  9.5% user,  0.0% nice, 82.0% system,  0.5% interrupt,  8.0% idle

A wild idea that might not help: try reducing kern.hz in loader.conf to
something like 100 and see if something significant changes.

Usually on PHP backends slow PHP code eats most of the CPU time, and
%user is much bigger than %system in CPU states.
But now %system is much bigger than %user, and I can conclude that on the
8-core server FreeBSD consumes more CPU time than PHP.

That is one possibility, but you still need to look at the actual 
throughput on these machines before making conclusions about which is 
performing better.  Can you please provide those numbers for 6.x, 7.x 
with ULE and 4BSD on the 4-core and 8-core systems?

OK, here are the results of practical testing. The following is the approximate 
maximum qps that the backends can sustain with my workload:


7-STABLE quad ULE       20
7-STABLE quad 4BSD      17
6-STABLE quad           14
6-STABLE dual           21
Linux CentOS 5 quad     50


OK, so 7.x is an improvement compared to 6.x on the 8 core machine, and 
ULE is an improvement over 4BSD.  This much is in line with expectations.


Neither shows an improvement vs 4 cores.  It is hard to say for certain 
without a direct profile comparison of the workload, but it is probably 
due to lockmgr contention.  lockmgr is used for various locking 
operations to do with VFS data structures.  It is known to have poor 
performance and scale very badly.  It is interesting that you are 
running into this on a real workload, though; so far I have only 
encountered it as a limiting factor in synthetic microbenchmarks.
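
For reference, the qps figures above can be normalized per core. The sketch below is only an illustration: it assumes "quad" means the 2 x quad-core (8 core) machine and "dual" the 2 x dual-core (4 core) machine, and the struct and function names are made up for this example.

```c
#include <stddef.h>
#include <stdio.h>

/* Approximate maximum qps figures quoted in the message above. */
struct result {
    const char *config;
    int cores;          /* assumption: quad = 8 cores, dual = 4 cores */
    double max_qps;
};

static const struct result results[] = {
    { "7-STABLE quad ULE",   8, 20.0 },
    { "7-STABLE quad 4BSD",  8, 17.0 },
    { "6-STABLE quad",       8, 14.0 },
    { "6-STABLE dual",       4, 21.0 },
    { "Linux CentOS 5 quad", 8, 50.0 },
};

/* Throughput divided by the number of cores in the machine. */
double qps_per_core(const struct result *r)
{
    return r->max_qps / r->cores;
}

/* Print one line per configuration. */
void print_per_core(void)
{
    for (size_t i = 0; i < sizeof(results) / sizeof(results[0]); i++)
        printf("%-20s %5.2f qps/core\n",
            results[i].config, qps_per_core(&results[i]));
}
```

With these numbers, 7-STABLE with ULE on 8 cores works out to 2.5 qps per core against 5.25 for 6-STABLE on 4 cores, which is the lack of scaling described above.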


There was some work done over the summer on replacing lockmgr with 
something reasonable, but unfortunately it is not yet ready for testing.  
I am CC'ing the developer who was working on that (Attilio Rao). 
Depending on his availability it will probably be at least a couple of 
months before it is ready though.


In the meantime there is unfortunately not a lot that can be done, 
AFAICT.  There is one hack that I will send you later but it is not 
likely to help much.  I will also think about how to track down the 
cause of the contention further (the profiling trace only shows that it 
comes mostly from vget/vput but doesn't show where these are called from).


Kris



Re: 2 x quad-core system is slower that 2 x dual core on FreeBSD

2007-11-20 Thread Kris Kennaway

Kris Kennaway wrote:
In the meantime there is unfortunately not a lot that can be done, 
AFAICT.  There is one hack that I will send you later but it is not 
likely to help much.  I will also think about how to track down the 
cause of the contention further (the profiling trace only shows that it 
comes mostly from vget/vput but doesn't show where these are called from).


Actually this patch might help.  It doesn't replace lockmgr but it does 
fix a silly thundering herd behaviour.  It probably needs some 
adjustment to get it to apply cleanly (it is about 7 months old), and I 
apparently stopped using it because I ran into deadlocks.  It might be 
stable enough to at least see how much it helps.
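
To see why waking every sleeper is costly, here is a toy model in plain C. It is not the lockmgr code and not the patch; `count_wakeups` and its parameters are hypothetical names. It assumes N threads sleep on one lock and that each release lets exactly one of them proceed.

```c
/*
 * Toy model, not kernel code: nwaiters threads sleep on one lock, and the
 * lock is released repeatedly until every sleeper has acquired it once.
 *
 * wake_all != 0: each release wakes all remaining sleepers (wakeup() style);
 *                one wins the lock and the rest go back to sleep.
 * wake_all == 0: each release wakes exactly one sleeper (wakeup_one() style).
 *
 * Returns the total number of thread wakeups.
 */
unsigned long count_wakeups(unsigned int nwaiters, int wake_all)
{
    unsigned long wakeups = 0;
    unsigned int sleeping = nwaiters;

    while (sleeping > 0) {
        wakeups += wake_all ? sleeping : 1;
        sleeping--;     /* exactly one woken thread gets the lock */
    }
    return wakeups;
}
```

In this model, 80 waiters cost 3240 wakeups when everyone is woken on each release, against 80 when one thread is woken at a time. A real kernel behaves less regularly, so treat the numbers as an intuition only.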


Set the vfs.lookup_shared=1 sysctl to enable the other half of the patch.

Kris

Change 117289 by [EMAIL PROTECTED] on 2007/04/03 18:43:03

Rewrite of lockmgr to avoid the silly multiple wakeups.  On a
microbenchmark designed for high lockmgr contention (80 processes
doing 1000 stat() calls of the same file on an 8-core amd64), this
reduces system time by 95% and real time by 82%.

This was ported from 6.x so support for LOCK_PROFILING is currently
removed.

Also add a hack to fake up shared lookups on UFS: acquire the lock
in shared mode and then upgrade it to exclusive mode.  This has
a small benefit on my tests, and it is claimed there is a very
large benefit in some workloads.

Submitted by:   ups
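
A microbenchmark of the kind described in the change message could look roughly like this. It is a sketch, not the benchmark that was actually used: `run_stat_bench` is a hypothetical helper, and the file path, process count and call count are whatever the caller passes in.

```c
#include <sys/types.h>
#include <sys/stat.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Fork nprocs children; each calls stat() on path ncalls times.
 * Returns the number of children that completed every call successfully,
 * or -1 if not all children could be started.
 */
int run_stat_bench(const char *path, int nprocs, int ncalls)
{
    int started = 0, ok = 0;

    for (int i = 0; i < nprocs; i++) {
        pid_t pid = fork();
        if (pid < 0)
            break;                      /* fork failed, stop starting more */
        if (pid == 0) {
            struct stat sb;
            for (int j = 0; j < ncalls; j++)
                if (stat(path, &sb) != 0)
                    _exit(1);
            _exit(0);
        }
        started++;
    }

    /* Reap every child that was started, even if a later fork() failed. */
    for (int i = 0; i < started; i++) {
        int status;
        if (wait(&status) > 0 && WIFEXITED(status) &&
            WEXITSTATUS(status) == 0)
            ok++;
    }
    return (started == nprocs) ? ok : -1;
}
```

Calling `run_stat_bench(path, 80, 1000)` from a small `main()` and running the program under time(1) should give system and real time figures that can be compared before and after a kernel change.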

Affected files ...

... //depot/user/kris/contention/sys/conf/options#10 edit
... //depot/user/kris/contention/sys/kern/kern_lock.c#7 edit
... //depot/user/kris/contention/sys/kern/subr_lock.c#6 edit
... //depot/user/kris/contention/sys/kern/vfs_default.c#4 edit
... //depot/user/kris/contention/sys/sys/lockmgr.h#5 edit
... //depot/user/kris/contention/sys/ufs/ffs/ffs_vfsops.c#6 edit
... //depot/user/kris/contention/sys/ufs/ffs/ffs_vnops.c#6 edit
... //depot/user/kris/contention/sys/ufs/ufs/ufs_lookup.c#4 edit

Differences ...

 //depot/user/kris/contention/sys/conf/options#10 (text+ko) 

@@ -248,6 +248,9 @@
 # Enable gjournal-based UFS journal.
 UFS_GJOURNAL   opt_ufs.h
 
+# Disable shared lookups for UFS
+NO_UFS_LOOKUP_SHARED   opt_ufs.h   
+
 # The below sentence is not in English, and neither is this one.
 # We plan to remove the static dependences above, with a
 # filesystem_ROOT option to control if it usable as root.  This list

 //depot/user/kris/contention/sys/kern/kern_lock.c#7 (text+ko) 

@@ -41,10 +41,9 @@
  */
 
 #include <sys/cdefs.h>
-__FBSDID("$FreeBSD: src/sys/kern/kern_lock.c,v 1.109 2007/03/30 18:07:24 jhb Exp $");
+__FBSDID("$FreeBSD: src/sys/kern/kern_lock.c,v 1.89.2.5 2006/10/09 20:04:45 tegge Exp $");
 
 #include "opt_ddb.h"
-#include "opt_global.h"
 
 #include <sys/param.h>
 #include <sys/kdb.h>
@@ -55,52 +54,21 @@
 #include <sys/mutex.h>
 #include <sys/proc.h>
 #include <sys/systm.h>
-#include <sys/lock_profile.h>
 #ifdef DEBUG_LOCKS
 #include <sys/stack.h>
 #endif
 
 #ifdef DDB
 #include <ddb/ddb.h>
-static void    db_show_lockmgr(struct lock_object *lock);
-#endif
-static void    lock_lockmgr(struct lock_object *lock, int how);
-static int unlock_lockmgr(struct lock_object *lock);
-
-struct lock_class lock_class_lockmgr = {
-   .lc_name = "lockmgr",
-   .lc_flags = LC_SLEEPLOCK | LC_SLEEPABLE | LC_RECURSABLE | LC_UPGRADABLE,
-#ifdef DDB
-   .lc_ddb_show = db_show_lockmgr,
 #endif
-   .lc_lock = lock_lockmgr,
-   .lc_unlock = unlock_lockmgr,
-};
 
 /*
  * Locking primitives implementation.
  * Locks provide shared/exclusive sychronization.
  */
 
-void
-lock_lockmgr(struct lock_object *lock, int how)
-{
-
-   panic("lockmgr locks do not support sleep interlocking");
-}
-
-int
-unlock_lockmgr(struct lock_object *lock)
-{
-
-   panic("lockmgr locks do not support sleep interlocking");
-}
-
 #define    COUNT(td, x)    if ((td)) (td)->td_locks += (x)
-#define LK_ALL (LK_HAVE_EXCL | LK_WANT_EXCL | LK_WANT_UPGRADE | \
-   LK_SHARE_NONZERO | LK_WAIT_NONZERO)
 
-static int acquire(struct lock **lkpp, int extflags, int wanted, int *contested, uint64_t *waittime);
 static int acquiredrain(struct lock *lkp, int extflags) ;
 
 static __inline void
 
 static __inline void
@@ -117,60 +85,16 @@
 
COUNT(td, -decr);
if (lkp->lk_sharecount == decr) {
-   lkp->lk_flags &= ~LK_SHARE_NONZERO;
-   if (lkp->lk_flags & (LK_WANT_UPGRADE | LK_WANT_EXCL)) {
-   wakeup(lkp);
-   }
+   if (lkp->lk_exclusivewait != 0)
+   wakeup_one(&lkp->lk_exclusivewait);
lkp->lk_sharecount = 0;
} else {
lkp->lk_sharecount -= decr;
+   if (lkp->lk_sharecount == 1 && lkp->lk_flags & LK_WANT_UPGRADE)
+   wakeup(&lkp->lk_flags);
}
 }
 
-static int
-acquire(struct lock **lkpp, int extflags, int wanted, int *contested, uint64_t *waittime)
-{
-   

Re: 2 x quad-core system is slower that 2 x dual core on FreeBSD

2007-11-20 Thread Kris Kennaway

Bob Bishop wrote:

Hi,

FWIW, we are seeing 2 x quad-core 2.66GHz outperform (per core) 2 x 
dual-core 3GHz on the same type of m/b, apparently because of better 
bandwidth to memory. However, this is on a compute-intensive workload 
running 1 job per core so would be pretty insensitive to 
scheduler/locking issues.


Alexey's problem is pretty specific to filesystem performance.  Good to 
hear though :)


Kris



Re: 2 x quad-core system is slower that 2 x dual core on FreeBSD

2007-11-20 Thread Claus Guttesen
  FWIW, we are seeing 2 x quad-core 2.66GHz outperform (per core) 2 x
  dual-core 3GHz on the same type of m/b, apparently because of better
  bandwidth to memory. However, this is on a compute-intensive workload
  running 1 job per core so would be pretty insensitive to
  scheduler/locking issues.

 Alexey's problem is pretty specific to filesystem performance.  Good to
 hear though :)

If that is the conclusion, wouldn't it make sense trying a different
disk-controller then?

-- 
regards
Claus

When lenity and cruelty play for a kingdom,
the gentlest gamester is the soonest winner.

Shakespeare


Re: 2 x quad-core system is slower that 2 x dual core on FreeBSD

2007-11-20 Thread Kris Kennaway

Claus Guttesen wrote:

FWIW, we are seeing 2 x quad-core 2.66GHz outperform (per core) 2 x
dual-core 3GHz on the same type of m/b, apparently because of better
bandwidth to memory. However, this is on a compute-intensive workload
running 1 job per core so would be pretty insensitive to
scheduler/locking issues.

Alexey's problem is pretty specific to filesystem performance.  Good to
hear though :)


If that is the conclusion, wouldn't it make sense trying a different
disk-controller then?


Filesystem, not disk.  See my earlier email for more detailed discussion.

Kris



Re: 2 x quad-core system is slower that 2 x dual core on FreeBSD

2007-11-20 Thread Claus Guttesen
 The issue in this thread is not if they are fast, but could they be made
 faster by shortening sys time :)

Yes, I'm aware of that. :-) The comment was related to the former mail
where some uncertainty came along when he read this thread.

 (btw. what is your sys time under stress?)

I'll take a look on sunday. That is our busiest day (evening).

-- 
regards
Claus

When lenity and cruelty play for a kingdom,
the gentlest gamester is the soonest winner.

Shakespeare


Re: 2 x quad-core system is slower that 2 x dual core on FreeBSD

2007-11-20 Thread Bob Bishop

Hi,

FWIW, we are seeing 2 x quad-core 2.66GHz outperform (per core) 2 x  
dual-core 3GHz on the same type of m/b, apparently because of better  
bandwidth to memory. However, this is on a compute-intensive workload  
running 1 job per core so would be pretty insensitive to scheduler/ 
locking issues.


--
Bob Bishop  +44 (0)118 940 1243
[EMAIL PROTECTED] fax +44 (0)118 940 1295






Re: 2 x quad-core system is slower that 2 x dual core on FreeBSD

2007-11-20 Thread Kris Kennaway

Kris Kennaway wrote:

Kris Kennaway wrote:
In the meantime there is unfortunately not a lot that can be done, 
AFAICT.  There is one hack that I will send you later but it is not 
likely to help much.  I will also think about how to track down the 
cause of the contention further (the profiling trace only shows that 
it comes mostly from vget/vput but doesn't show where these are called 
from).


Actually this patch might help.  It doesn't replace lockmgr but it does 
fix a silly thundering herd behaviour.  It probably needs some 
adjustment to get it to apply cleanly (it is about 7 months old), and I 
apparently stopped using it because I ran into deadlocks.  It might be 
stable enough to at least see how much it helps.


Set the vfs.lookup_shared=1 sysctl to enable the other half of the patch.

Kris



Try this one instead, it applies to HEAD.  You'll need to manually enter 
the paths though because of how p4 mangles diffs.


Kris
 //depot/user/kris/contention/sys/kern/kern_lock.c#10 - 
/zoo/kris/contention/kern/kern_lock.c 
@@ -109,7 +109,6 @@
 #define LK_ALL (LK_HAVE_EXCL | LK_WANT_EXCL | LK_WANT_UPGRADE | \
LK_SHARE_NONZERO | LK_WAIT_NONZERO)
 
-static int acquire(struct lock **lkpp, int extflags, int wanted, int *contested, uint64_t *waittime);
 static int acquiredrain(struct lock *lkp, int extflags) ;
 
 static __inline void
@@ -126,61 +125,17 @@
 
COUNT(td, -decr);
if (lkp->lk_sharecount == decr) {
-   lkp->lk_flags &= ~LK_SHARE_NONZERO;
-   if (lkp->lk_flags & (LK_WANT_UPGRADE | LK_WANT_EXCL)) {
-   wakeup(lkp);
-   }
+   if (lkp->lk_exclusivewait != 0)
+   wakeup_one(&lkp->lk_exclusivewait);
lkp->lk_sharecount = 0;
} else {
lkp->lk_sharecount -= decr;
+   if (lkp->lk_sharecount == 1 && lkp->lk_flags & LK_WANT_UPGRADE)
+   wakeup(&lkp->lk_flags);
}
 }
 
-static int
-acquire(struct lock **lkpp, int extflags, int wanted, int *contested, uint64_t *waittime)
-{
-   struct lock *lkp = *lkpp;
-   int error;
-   CTR3(KTR_LOCK,
-   "acquire(): lkp == %p, extflags == 0x%x, wanted == 0x%x",
-   lkp, extflags, wanted);
 
-   if ((extflags & LK_NOWAIT) && (lkp->lk_flags & wanted))
-   return EBUSY;
-   error = 0;
-   if ((lkp->lk_flags & wanted) != 0)
-   lock_profile_obtain_lock_failed(&lkp->lk_object, contested, waittime);
-   
-   while ((lkp->lk_flags & wanted) != 0) {
-   CTR2(KTR_LOCK,
-   "acquire(): lkp == %p, lk_flags == 0x%x sleeping",
-   lkp, lkp->lk_flags);
-   lkp->lk_flags |= LK_WAIT_NONZERO;
-   lkp->lk_waitcount++;
-   error = msleep(lkp, lkp->lk_interlock, lkp->lk_prio,
-   lkp->lk_wmesg, 
-   ((extflags & LK_TIMELOCK) ? lkp->lk_timo : 0));
-   lkp->lk_waitcount--;
-   if (lkp->lk_waitcount == 0)
-   lkp->lk_flags &= ~LK_WAIT_NONZERO;
-   if (error)
-   break;
-   if (extflags & LK_SLEEPFAIL) {
-   error = ENOLCK;
-   break;
-   }
-   if (lkp->lk_newlock != NULL) {
-   mtx_lock(lkp->lk_newlock->lk_interlock);
-   mtx_unlock(lkp->lk_interlock);
-   if (lkp->lk_waitcount == 0)
-   wakeup((void *)(lkp->lk_newlock));
-   *lkpp = lkp = lkp->lk_newlock;
-   }
-   }
-   mtx_assert(lkp->lk_interlock, MA_OWNED);
-   return (error);
-}
-
 /*
  * Set, change, or release a lock.
  *
@@ -189,16 +144,16 @@
  * accepted shared locks and shared-to-exclusive upgrades to go away.
  */
 int
-_lockmgr(struct lock *lkp, u_int flags, struct mtx *interlkp, 
-struct thread *td, char *file, int line)
-
+lockmgr(lkp, flags, interlkp, td)
+   struct lock *lkp;
+   u_int flags;
+   struct mtx *interlkp;
+   struct thread *td;
 {
int error;
struct thread *thr;
-   int extflags, lockflags;
-   int contested = 0;
-   uint64_t waitstart = 0;
-   
+   int extflags;
+
error = 0;
if (td == NULL)
thr = LK_KERNPROC;
@@ -226,7 +181,7 @@
 
if ((flags & (LK_NOWAIT|LK_RELEASE)) == 0)
WITNESS_WARN(WARN_GIANTOK | WARN_SLEEPOK,
-   &lkp->lk_interlock->lock_object,
+   &lkp->lk_interlock->mtx_object,
"Acquiring lockmgr lock \"%s\"", lkp->lk_wmesg);
 
if (panicstr != NULL) {
@@ -253,16 +208,30 @@
 * lock itself ).
 */
if (lkp->lk_lockholder != thr) {
-   lockflags = LK_HAVE_EXCL;
-   if (td != NULL && !(td->td_pflags & TDP_DEADLKTREAT))
-   lockflags |= 

Re: 2 x quad-core system is slower that 2 x dual core on FreeBSD

2007-11-19 Thread Ivan Voras
Alexey Popov wrote:
 Hi.
 
 I have a large pool of web backends (Apache + mod_php5) with
 2 x Xeon 3.2GHz processors and 2 x Xeon 5120 dual-core processors. The
 workload is mostly CPU-bound. I'm using 6-STABLE-amd64 and also tried
 7-STABLE.

If you haven't tried mod_fcgid, give it a try - it can dramatically
benefit PHP applications. And with mod_fcgid, you can use apache with a
multi-threaded MPM (i.e. worker-mpm).

 Now I'm trying to use new hardware with 2 x Xeon 5320 (quad-core), but
 it can not work under the same load as dual-core. It shows up to 80%
 system CPU load in top:

On what version of FreeBSD is this? If it's 6-STABLE, this might be
expected.

 CPU states:  9.5% user,  0.0% nice, 79.9% system,  1.2% interrupt,  9.5%
 idle

Can you try hitting S to see if a kernel process is gobbling up CPU time?

 Here's the output from 2xdual-core backend running under the same load
 and with the same software:

 CPU states:  0.0% user,  0.0% nice,  0.0% system,  0.0% interrupt,  0.0%
 idle

This line is bogus - where is the load?
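
A quick sanity check for pasted top(1) output is that the five percentages add up to about 100. The following is a minimal sketch, assuming the "CPU states:" line format shown in this thread; the struct and function names are hypothetical.

```c
#include <stdio.h>

struct cpu_states {
    double user, nice, system, interrupt, idle;
};

/* Parse a top(1) "CPU states:" line; returns 0 on success, -1 otherwise. */
int parse_cpu_states(const char *line, struct cpu_states *cs)
{
    int n = sscanf(line,
        "CPU states: %lf%% user, %lf%% nice, %lf%% system, "
        "%lf%% interrupt, %lf%% idle",
        &cs->user, &cs->nice, &cs->system, &cs->interrupt, &cs->idle);

    return (n == 5) ? 0 : -1;
}

/* A real sample should sum to roughly 100%; allow 1% of rounding slack. */
int cpu_states_plausible(const struct cpu_states *cs)
{
    double sum = cs->user + cs->nice + cs->system +
        cs->interrupt + cs->idle;

    return sum >= 99.0 && sum <= 101.0;
}
```

The quad-core line quoted above (9.5, 0.0, 79.9, 1.2, 9.5) sums to 100.1 and passes, while the all-zero line does not.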

 What can I do to make FreeBSD run faster on many-CPU systems???

Except for trying 7-STABLE, there's not much you can do.



Re: 2 x quad-core system is slower that 2 x dual core on FreeBSD

2007-11-19 Thread Robert Watson


On Mon, 19 Nov 2007, Alexey Popov wrote:

I tried Linux and it works much better than old (2 x dual-core) backends. It 
handles 2 times more requests than FreeBSD on the old backends. So there's a 
real scalability problem in FreeBSD. The more processors it have the more 
CPU time it consumes.


Also I faced the same problem moving heavily loaded MySQL-server to new 
hardware. That time I thought that the problem is in the mysql-server itself 
and I had to install Linux.


See in attach: mutex statistics for quad-core system and dmesg and vmstat 
for dual- and quad-core systems.


What can I do to make FreeBSD run faster on many-CPU systems???


Have you configured libmap.conf to force MySQL to use libthr instead of 
libpthread?  libpthread is known to have serious performance bottlenecks for 
MySQL as compared to libthr.


FreeBSD 7 contains significant optimization for increased numbers of cores, 
and is where a lot of the work optimizing MySQL has ended up.  I see you're 
trying out a 6.3 beta, any chance you could try out a 7.0 beta instead? 
Also, consider switching to options SCHED_ULE in the 7.0 kernel rather than 
options SCHED_4BSD.


Robert N M Watson
Computer Laboratory
University of Cambridge


Re: 2 x quad-core system is slower that 2 x dual core on FreeBSD

2007-11-19 Thread Alexey Popov

Hi.

Ivan Voras wrote:

I have a large pool of web backends (Apache + mod_php5) with
2 x Xeon 3.2GHz processors and 2 x Xeon 5120 dual-core processors. The
workload is mostly CPU-bound. I'm using 6-STABLE-amd64 and also tried
7-STABLE. 

If you haven't tried mod_fcgid, give it a try - it can dramatically
benefit PHP applications. And with mod_fcgid, you can use apache with a
multi-threaded MPM (i.e. worker-mpm).
We tried running PHP behind nginx via the FastCGI interface, without Apache at all, 
but the improvement was too small (~10% more requests per second) to justify 
giving up the advantages of Apache.



Now I'm trying to use new hardware with 2 x Xeon 5320 (quad-core), but
it can not work under the same load as dual-core. It shows up to 80%
system CPU load in top:

On what version of FreeBSD is this? If it's 6-STABLE, this might be
expected.
I have almost identical results on 6-STABLE and 7-STABLE. Maybe 7-STABLE 
performs a little better.



CPU states:  9.5% user,  0.0% nice, 79.9% system,  1.2% interrupt,  9.5%
idle

Can you try hitting S to see if a kernel process is gobbling up CPU time?

There is no such process:

last pid:  5266;  load averages: 24.67, 22.65, 17.44   up 0+03:56:38  17:09:37
121 processes: 41 running, 62 sleeping, 18 waiting
CPU states:  9.5% user,  0.0% nice, 82.0% system,  0.5% interrupt,  8.0% idle
Mem: 439M Active, 27M Inact, 80M Wired, 108K Cache, 58M Buf, 3341M Free
Swap: 2048M Total, 2048M Free

  PID USERNAME  PRI NICE   SIZE    RES STATE  C   TIME   WCPU COMMAND
 5090 www        -4    0 96572K 49464K RUN    5   2:59 23.39% httpd
 3748 www        -4    0 96172K 50060K RUN    4  14:21 23.19% httpd
 5092 www        -4    0 96412K 48060K RUN    4   2:57 23.19% httpd
 5095 www        -4    0 98148K 50688K RUN    5   2:57 22.75% httpd
 5088 www        -4    0 96664K 49120K RUN    4   3:02 22.56% httpd
 5098 www        -4    0 97404K 49864K RUN    3   2:57 22.56% httpd
 5106 www       118    0 97908K 49972K CPU7   6   2:57 22.51% httpd
 5084 www        -4    0 96012K 48164K RUN    5   3:01 22.46% httpd
 5081 www        -4    0 96636K 49700K RUN    0   3:01 22.36% httpd
 5109 www        -4    0 96844K 49188K RUN    3   2:51 22.36% httpd
 5108 www        -4    0 95808K 47508K RUN    5   3:00 22.31% httpd
 5085 www        -4    0 98244K 49560K RUN    4   2:58 21.88% httpd
 5104 www        -4    0 96836K 48956K CPU5   5   2:55 21.88% httpd
 5086 www       118    0 99140K 51264K CPU0   3   3:00 21.78% httpd
 5111 www        -4    0 96360K 48532K RUN    0   2:56 21.78% httpd
 5105 www        -4    0 96364K 47356K RUN    0   2:58 21.73% httpd
 5099 www        -4    0     9K 47156K RUN    4   2:55 21.73% httpd
 5096 www        -4    0 96004K 48324K RUN    4   2:56 21.68% httpd
 5083 www       117    0 97712K 50344K RUN    2   3:03 21.63% httpd
 5094 www       118    0 97196K 49348K CPU3   6   2:56 21.58% httpd
 5103 www        -4    0 96040K 48808K RUN    4   2:58 21.48% httpd
 5089 www       118    0 96084K 47808K CPU2   4   2:59 21.34% httpd
 5082 www       117    0 96412K 48520K CPU6   5   3:00 21.29% httpd
 5107 www        -4    0 98172K 50332K RUN    4   2:55 21.29% httpd
 5091 www        -4    0 97460K 49504K RUN    0   2:56 20.95% httpd
 5100 www        -4    0 97188K 49400K RUN    4   2:56 20.65% httpd
 5110 www        -4    0 95168K 47436K RUN    5   2:59 20.56% httpd
 5087 www       116    0 98432K 51172K CPU4   5   2:55 20.31% httpd
 5097 www        -4    0 96428K 49124K RUN    4   2:59 20.21% httpd
 5102 www       117    0 96344K 48512K CPU3   4   3:01 19.82% httpd
 5093 www        -4    0 96512K 49948K RUN    4   2:55 19.82% httpd
 5101 www        -4    0 96012K 48968K RUN    3   3:01 19.48% httpd
   10 root      171   52     0K    16K RUN    7 174:56  7.86% idle: cpu7
   12 root      171   52     0K    16K RUN    5 174:44  7.86% idle: cpu5
   14 root      171   52     0K    16K RUN    3 175:04  7.62% idle: cpu3


Here's the output from 2xdual-core backend running under the same load
and with the same software:



CPU states:  0.0% user,  0.0% nice,  0.0% system,  0.0% interrupt,  0.0%
idle


This line is bogus - where is the load?

Sorry, that was probably my copy-paste mistake.

last pid: 54690;  load averages:  3.47,  4.89,  5.18   up 42+02:07:51  17:00:00
47 processes:  3 running, 43 sleeping, 1 zombie
CPU states: 56.0% user,  0.0% nice, 16.7% system,  1.6% interrupt, 25.7% idle
Mem: 2268M Active, 416M Inact, 277M Wired, 186M Cache, 214M Buf, 664M Free
Swap: 2048M Total, 1408K Used, 2047M Free

  PID USERNAME  THR PRI NICE   SIZE    RES STATE  C   TIME   WCPU COMMAND
54681 www         1 106    0 96916K 47792K CPU3   0   0:10 33.45% httpd
54652 www         1  20    0 97716K 48144K lockf  1   0:24 31.61% httpd
54680 www         1 106    0 96416K 46832K select 1   0:10 31.37% httpd
54686 www         1  20    0 97640K 45604K lockf  1   0:04 31.13% httpd
54651 www         1 104    0 96552K 46924K CPU1   1   0:25 29.50% httpd
54685 www         1 107    0 99124K 47300K select 3   

Re: 2 x quad-core system is slower that 2 x dual core on FreeBSD

2007-11-19 Thread Alexey Popov

Hi.

Robert Watson wrote:
Also I faced the same problem moving heavily loaded MySQL-server to 
new hardware. That time I thought that the problem is in the 
mysql-server itself and I had to install Linux.

What can I do to make FreeBSD run faster on many-CPU systems???
Have you configured libmap.conf to force MySQL to use libthr instead of 
libpthread?  libpthread is known to have serious performance bottlenecks 
for MySQL as compared to libthr.
I'm always using libthr with MySQL on 6-STABLE and it really helps. But 
that time with MySQL (and this time with Apache) the bottleneck was 
somewhere else.


FreeBSD 7 contains significant optimization for increased numbers of 
cores, and is where a lot of the work optimizing MySQL has ended up.  I 
see you're trying out a 6.3 beta, any chance you could try out a 7.0 
beta instead? Also, consider switching to options SCHED_ULE in the 7.0 
kernel rather than options SCHED_4BSD.
I tried 7-BETA with SCHED_4BSD and it did not help. Now I'll try 
SCHED_ULE, thanks.


With best regards,
Alexey Popov


Re: 2 x quad-core system is slower that 2 x dual core on FreeBSD

2007-11-19 Thread Ivan Voras
Alexey Popov wrote:

 last pid:  5266;  load averages: 24.67, 22.65, 17.44   up 0+03:56:38
  17:09:37
 121 processes: 41 running, 62 sleeping, 18 waiting
 CPU states:  9.5% user,  0.0% nice, 82.0% system,  0.5% interrupt,  8.0%
 idle
 Mem: 439M Active, 27M Inact, 80M Wired, 108K Cache, 58M Buf, 3341M Free
 Swap: 2048M Total, 2048M Free
 
   PID USERNAME  PRI NICE   SIZERES STATE  C   TIME   WCPU COMMAND
  5090 www-40 96572K 49464K RUN5   2:59 23.39% httpd
  3748 www-40 96172K 50060K RUN4  14:21 23.19% httpd
  5092 www-40 96412K 48060K RUN4   2:57 23.19% httpd
  5095 www-40 98148K 50688K RUN5   2:57 22.75% httpd
  5088 www-40 96664K 49120K RUN4   3:02 22.56% httpd

This is really unusual - the number of processes is not that high, but
if I'm reading the line from systat correctly, you have unusually many
context switches:

  r   p   d   s   w   Csw  Trp  Sys  Int  Sof  Fltcow   16839 total
 27   1  39  137k 3390  33k 2490  313 2519   2519 zfod
sio0 irq4

nginx or similar asynchronous web servers should reduce inter-process
contention context switches dramatically, but you say that it didn't
work as such so the problem might be somewhere else.

Try sending a 10-second or so output from vmstat to confirm this problem.

If you can, attach a ktrace(1) to one of the httpd processes that
consumes CPU, and send the processed kdump output.

Also, did you try configuring and running pecl-APC for PHP?



Re: 2 x quad-core system is slower that 2 x dual core on FreeBSD

2007-11-19 Thread Ivan Voras
Alexey Popov wrote:

 CPU states:  9.5% user,  0.0% nice, 82.0% system,  0.5% interrupt,  8.0%
 idle

A wild idea that might not help: try reducing kern.hz in loader.conf to
something like 100 and see if something significant changes.



Re: 2 x quad-core system is slower that 2 x dual core on FreeBSD

2007-11-19 Thread Alexey Popov

Hi

Robert Watson wrote:
FreeBSD 7 contains significant optimization for increased numbers of 
cores, and is where a lot of the work optimizing MySQL has ended up.  I 
see you're trying out a 6.3 beta, any chance you could try out a 7.0 
beta instead? Also, consider switching to options SCHED_ULE in the 7.0 
kernel rather than options SCHED_4BSD.

I tried SCHED_ULE, but got no difference:

last pid:  1063;  load averages: 22.75, 13.76,  6.31    up 0+00:07:24  17:53:49
56 processes:  33 running, 23 sleeping
CPU states: 26.5% user,  0.0% nice, 68.1% system,  0.3% interrupt,  5.1% idle
Mem: 365M Active, 20M Inact, 102M Wired, 664K Cache, 46M Buf, 3419M Free
Swap: 2048M Total, 2048M Free

  PID USERNAME  THR PRI NICE   SIZE    RES STATE  C   TIME   WCPU COMMAND
 1019 www         1 101    0   101M 51244K RUN    6   0:37 26.86% httpd
 1040 www         1  -4    0 92476K 42956K RUN    1   0:36 26.76% httpd
 1004 www         1  -4    0 92476K 42864K RUN    4   0:38 25.98% httpd
 1018 www         1 101    0 91452K 41736K CPU3   3   0:37 25.68% httpd
 1000 www         1 101    0 92476K 42544K RUN    0   0:36 25.29% httpd
 1026 www         1 101    0 93500K 39900K CPU0   0   0:35 25.20% httpd
 1021 www         1 101    0   101M 49432K RUN    4   0:37 25.10% httpd
 1024 www         1 101    0 93500K 44416K RUN    5   0:37 25.10% httpd
 1020 www         1 101    0 94524K 43684K RUN    0   0:37 25.00% httpd
 1030 www         1 101    0 96576K 46004K RUN    3   0:36 25.00% httpd
 1031 www         1 101    0   101M 50956K RUN    3   0:37 24.66% httpd
 1025 www         1 101    0 94524K 43880K RUN    5   0:36 24.56% httpd
 1041 www         1 101    0 92476K 41792K RUN    2   0:36 24.56% httpd
 1022 www         1 101    0   101M 48932K RUN    5   0:36 24.27% httpd

With best regards,
Alexey Popov


Re: 2 x quad-core system is slower that 2 x dual core on FreeBSD

2007-11-19 Thread Alexey Popov

Hi.

Ivan Voras wrote:

CPU states:  9.5% user,  0.0% nice, 82.0% system,  0.5% interrupt,  8.0%
idle

A wild idea that might not help: try reducing kern.hz in loader.conf to
something like 100 and see if something significant changes.

Now it runs with hz=100. The number of context switches dropped by roughly
half, but there is still 90% system CPU load (see attachment).

With best regards,
Alexey Popov











1 usersLoad 16.36 12.24  6.14  Nov 19 18:08

Mem:KBREALVIRTUAL   VN PAGER   SWAP PAGER
Tot   Share  TotShareFree   in   out in   out
Act  366988   16952   83428837248 3515624  count 1
All  423228   18472  508956841144  pages 4
Proc:Interrupts
  r   p   d   s   w   Csw  Trp  Sys  Int  Sof  Flt  1 cow3917 total
 31  24   98k  35k  95k 2315  100  29k  29919 zfodsio0 irq4
  ozfod   ata0 irq14
48.6%Sys   1.0%Intr 49.5%User  0.0%Nice  1.0%Idle%ozfod 5 mfi0 irq18
|||||||||||   daefr   uhci0 uhci
+   1709 prcfr   200 cpu0: time
 4 dtbuf23140 totfr  2311 em0 irq256
Namei Name-cache   Dir-cache10 desvn  react   200 cpu2: time
   Callshits   %hits   %  1494 numvn  pdwak   200 cpu3: time
  147517  147514 100   158 frevn  pdpgs   200 cpu1: time
  intrn   200 cpu4: time
Disks mfid0106840 wire200 cpu7: time
KB/t  20.20355720 act 201 cpu5: time
tps   5 21248 inact   200 cpu6: time
MB/s   0.10  1228 cache
%busy 1   3514792 free
65056 buf






Re: 2 x quad-core system is slower that 2 x dual core on FreeBSD

2007-11-19 Thread Ronald Klop

On Mon, 19 Nov 2007 15:54:32 +0100, Alexey Popov [EMAIL PROTECTED] wrote:


Hi

Robert Watson wrote:
FreeBSD 7 contains significant optimization for increased numbers of  
cores, and is where a lot of the work optimizing MySQL has ended up.  I  
see you're trying out a 6.3 beta, any chance you could try out a 7.0  
beta instead? Also, consider switching to options SCHED_ULE in the  
7.0 kernel rather than options SCHED_4BSD.

I tried SCHED_ULE, but got no difference:

last pid:  1063;  load averages: 22.75, 13.76,  6.31up 0+00:07:24  
17:53:49

56 processes:  33 running, 23 sleeping
CPU states: 26.5% user,  0.0% nice, 68.1% system,  0.3% interrupt,  5.1%  
idle

Mem: 365M Active, 20M Inact, 102M Wired, 664K Cache, 46M Buf, 3419M Free
Swap: 2048M Total, 2048M Free

   PID USERNAME   THR PRI NICE   SIZERES STATE  C   TIME   WCPU  
COMMAND

  1019 www  1 1010   101M 51244K RUN6   0:37 26.86% httpd
  1040 www  1  -40 92476K 42956K RUN1   0:36 26.76% httpd
  1004 www  1  -40 92476K 42864K RUN4   0:38 25.98% httpd
  1018 www  1 1010 91452K 41736K CPU3   3   0:37 25.68% httpd
  1000 www  1 1010 92476K 42544K RUN0   0:36 25.29% httpd
  1026 www  1 1010 93500K 39900K CPU0   0   0:35 25.20% httpd
  1021 www  1 1010   101M 49432K RUN4   0:37 25.10% httpd
  1024 www  1 1010 93500K 44416K RUN5   0:37 25.10% httpd
  1020 www  1 1010 94524K 43684K RUN0   0:37 25.00% httpd
  1030 www  1 1010 96576K 46004K RUN3   0:36 25.00% httpd
  1031 www  1 1010   101M 50956K RUN3   0:37 24.66% httpd
  1025 www  1 1010 94524K 43880K RUN5   0:36 24.56% httpd
  1041 www  1 1010 92476K 41792K RUN2   0:36 24.56% httpd
  1022 www  1 1010   101M 48932K RUN5   0:36 24.27% httpd


You have a lot of free memory. Maybe you can wait a little to let it fill  
the cache or let it use more bufs. This could explain why the system is  
spending a lot of time in 'system'.


Ronald.

--
 Ronald Klop
 Amsterdam, The Netherlands


Re: 2 x quad-core system is slower that 2 x dual core on FreeBSD

2007-11-19 Thread Robert Watson

On Mon, 19 Nov 2007, Alexey Popov wrote:


Robert Watson wrote:
FreeBSD 7 contains significant optimization for increased numbers of cores, 
and is where a lot of the work optimizing MySQL has ended up.  I see you're 
trying out a 6.3 beta, any chance you could try out a 7.0 beta instead? 
Also, consider switching to options SCHED_ULE in the 7.0 kernel rather 
than options SCHED_4BSD.

I tried SCHED_ULE, but got no difference:


Did you see no change in throughput, or no change in reported CPU use?

We should probably take this thread to performance@ and get Kris involved.  He 
may be interested in trying to reproduce your workload in our testbed so we 
can perform measurements of our own, as well as getting you to provide 
profiling information.  One of the things we'd most like to have are nice 
potted benchmarks for real-world workloads, as that allows us to easily replay 
them, perform measurements, optimize, etc.


Thanks,

Robert N M Watson
Computer Laboratory
University of Cambridge



last pid:  1063;  load averages: 22.75, 13.76,  6.31up 0+00:07:24 
17:53:49

56 processes:  33 running, 23 sleeping
CPU states: 26.5% user,  0.0% nice, 68.1% system,  0.3% interrupt,  5.1% idle
Mem: 365M Active, 20M Inact, 102M Wired, 664K Cache, 46M Buf, 3419M Free
Swap: 2048M Total, 2048M Free

  PID USERNAME  THR PRI NICE   SIZE    RES STATE  C   TIME   WCPU COMMAND
 1019 www         1 101    0   101M 51244K RUN    6   0:37 26.86% httpd
 1040 www         1  -4    0 92476K 42956K RUN    1   0:36 26.76% httpd
 1004 www         1  -4    0 92476K 42864K RUN    4   0:38 25.98% httpd
 1018 www         1 101    0 91452K 41736K CPU3   3   0:37 25.68% httpd
 1000 www         1 101    0 92476K 42544K RUN    0   0:36 25.29% httpd
 1026 www         1 101    0 93500K 39900K CPU0   0   0:35 25.20% httpd
 1021 www         1 101    0   101M 49432K RUN    4   0:37 25.10% httpd
 1024 www         1 101    0 93500K 44416K RUN    5   0:37 25.10% httpd
 1020 www         1 101    0 94524K 43684K RUN    0   0:37 25.00% httpd
 1030 www         1 101    0 96576K 46004K RUN    3   0:36 25.00% httpd
 1031 www         1 101    0   101M 50956K RUN    3   0:37 24.66% httpd
 1025 www         1 101    0 94524K 43880K RUN    5   0:36 24.56% httpd
 1041 www         1 101    0 92476K 41792K RUN    2   0:36 24.56% httpd
 1022 www         1 101    0   101M 48932K RUN    5   0:36 24.27% httpd
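
For anyone comparing several such snapshots, the "CPU states" line of the
top output above can be pulled apart mechanically. The following is only a
minimal Python sketch: the function name parse_cpu_states and the variable
names are hypothetical, and the sample line is copied from the output above.

```python
import re


def parse_cpu_states(line):
    """Turn a top(1) 'CPU states:' line into a dict of name -> percentage."""
    # Keep only the part after the first colon, then pick out "NN.N% name" pairs.
    _, _, rest = line.partition(":")
    states = {}
    for pct, name in re.findall(r"([\d.]+)%\s+(\w+)", rest):
        states[name] = float(pct)
    return states


# Sample line copied from the top output in this message.
LINE = ("CPU states: 26.5% user,  0.0% nice, 68.1% system,"
        "  0.3% interrupt,  5.1% idle")

states = parse_cpu_states(LINE)

# Time spent in the kernel on behalf of processes plus interrupt handling.
kernel_share = states["system"] + states["interrupt"]

if __name__ == "__main__":
    print(states)
    print("kernel share: %.1f%%" % kernel_share)
```

Running the same function over snapshots taken on the dual-core and
quad-core machines would give directly comparable numbers for the system
share.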

With best regards,
Alexey Popov




Re: 2 x quad-core system is slower that 2 x dual core on FreeBSD

2007-11-19 Thread Alexey Popov

Hi.

Ivan Voras wrote:

last pid:  5266;  load averages: 24.67, 22.65, 17.44   up 0+03:56:38
121 processes: 41 running, 62 sleeping, 18 waiting
CPU states:  9.5% user,  0.0% nice, 82.0% system,  0.5% interrupt,  8.0% idle

This is really unusual - the number of processes is not that high, but
if I'm reading the line from systat correctly, you have unusually many
context switches:
  r   p   d   s   w   Csw  Trp  Sys  Int  Sof  Flt    cow   16839 total
 27   1  39  137k 3390  33k 2490  313 2519   2519 zfod
sio0 irq4
nginx or similar asynchronous web servers should reduce inter-process
contention context switches dramatically, but you say that it didn't
work as such so the problem might be somewhere else.
Try sending a 10-second or so output from vmstat to confirm this problem.

Yes, there really are many context switches:

%vmstat 1
 procs      memory      page                    disk   faults        cpu
 r b w     avm     fre   flt  re  pi  po    fr  sr mf0   in    sy     cs us sy id
23 1 0  615284 3581456 15980   0   0   0 15964   0   0 1414 58211 115230 25 60 15
24 0 0  631668 3564976  9940   0   0   0  5793   0   0  664 30036 158059 11 79 10
20 0 0  655220 3545516 22146   0   0   0 16731   0   0 1992 77638 116627 31 65  4
23 0 0  622452 3579700 18248   0   0   0 27451   0   0 1839 80646 115798 38 59  3
15 9 0  614260 3587484  4795   0   0   0  6765   0   0  352 23938 159993  6 83 11
21 0 0  625524 3567948 10154   0   0   0  5308   0   0  653 32718 159119 11 81  8
13 3 0  627572 3571924 15266   0   0   0 16278   0   0 1031 50321 142111 20 69 11
21 0 0  605044 3591860  9008   0   0   0 14021   0   0  873 42083 160441 13 79  8
19 1 0  611188 3593404  7498   0   0   0  7920   0   0  489 30012 158176 10 77 13
24 0 0  610164 3592360  5855   0   0   0  5602   0   0  666 26627 162937  8 81 11
20 3 0  622452 3587456  6372   0   0   0  5144   0   0  362 23705 161257 10 81 10

^C
%
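
To put a single number on the samples above, the cs (context switches per
second) and sy (system CPU) columns can be averaged. This is only a rough
Python sketch: it assumes the wrapped vmstat lines have been rejoined into
one row per sample, and the names VMSTAT_ROWS, COLUMNS, parse_rows and
summarize are hypothetical helpers, not part of any FreeBSD tool.

```python
# Rows copied from the "vmstat 1" output in this message, one sample per line.
VMSTAT_ROWS = """\
23 1 0  615284 3581456 15980   0   0   0 15964   0   0 1414 58211 115230 25 60 15
24 0 0  631668 3564976  9940   0   0   0  5793   0   0  664 30036 158059 11 79 10
20 0 0  655220 3545516 22146   0   0   0 16731   0   0 1992 77638 116627 31 65  4
23 0 0  622452 3579700 18248   0   0   0 27451   0   0 1839 80646 115798 38 59  3
15 9 0  614260 3587484  4795   0   0   0  6765   0   0  352 23938 159993  6 83 11
21 0 0  625524 3567948 10154   0   0   0  5308   0   0  653 32718 159119 11 81  8
13 3 0  627572 3571924 15266   0   0   0 16278   0   0 1031 50321 142111 20 69 11
21 0 0  605044 3591860  9008   0   0   0 14021   0   0  873 42083 160441 13 79  8
19 1 0  611188 3593404  7498   0   0   0  7920   0   0  489 30012 158176 10 77 13
24 0 0  610164 3592360  5855   0   0   0  5602   0   0  666 26627 162937  8 81 11
20 3 0  622452 3587456  6372   0   0   0  5144   0   0  362 23705 161257 10 81 10
"""

# Column order as printed in this thread. "sy" appears twice in vmstat
# (syscalls and system CPU), so the second one is renamed here.
COLUMNS = ["r", "b", "w", "avm", "fre", "flt", "re", "pi", "po", "fr",
           "sr", "mf0", "in", "sy", "cs", "us", "sy_cpu", "id"]


def parse_rows(text):
    """Return one dict per numeric data row; header or partial lines are skipped."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != len(COLUMNS):
            continue
        if not all(f.isdigit() for f in fields):
            continue
        rows.append(dict(zip(COLUMNS, (int(f) for f in fields))))
    return rows


def summarize(rows):
    """Average the columns most relevant to the contention question."""
    n = len(rows)
    return {
        "samples": n,
        "avg_cs": sum(r["cs"] for r in rows) / n,
        "max_cs": max(r["cs"] for r in rows),
        "avg_sys_pct": sum(r["sy_cpu"] for r in rows) / n,
        "avg_runq": sum(r["r"] for r in rows) / n,
    }


if __name__ == "__main__":
    for key, value in summarize(parse_rows(VMSTAT_ROWS)).items():
        print(key, value)
```

The same summary taken on one of the dual-core machines under similar load
would show whether the context switch rate itself differs, or only the
time spent handling it.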


If you can, attach a ktrace(1) to one of the httpd processes that
consumes CPU, and send the processed kdump output.

Here it is: http://83.167.98.162/gprof/kdump.txt.gz


Also, did you try configuring and running pecl-APC for PHP?
I'm using eAccelerator. Again, the same software works well on systems
with fewer CPUs and on Linux.


With best regards,
Alexey Popov


Re: 2 x quad-core system is slower that 2 x dual core on FreeBSD

2007-11-19 Thread Alexey Popov

Hi.

Robert Watson wrote:

I tried SCHED_ULE, but got no difference:

Did you see no change in throughput, or no change in reported CPU use?

No significant changes.

We should probably take this thread to performance@ and get Kris 
involved.  He may be interested in trying to reproduce your workload in 
our testbed so we can perform measurements of our own, as well as 
getting you to provide profiling information.  One of the things we'd 
most like to have are nice potted benchmarks for real-world workloads, 
as that allows us to easily replay them, perform measurements, optimize, 
etc.
I can provide any profiling or configuration information you ask for,
except the source code of the PHP site.


I'm now in a situation where I can't install FreeBSD on the new servers,
because they are all based on 2 x quad-core processors and I can't be
sure it would work well.


With best regards,
Alexey Popov


Re: 2 x quad-core system is slower that 2 x dual core on FreeBSD

2007-11-19 Thread Krassimir Slavchev

Hi All,

Which version of Apache do you use, and what are the:
StartServers
MinSpareServers
MaxSpareServers
MaxClients

KeepAliveTimeout

settings in both configurations?

Best Regards

Alexey Popov wrote:
 Hi.
 
 I have a large pool of web backends (Apache + mod_php5) with
 2 x Xeon 3.2GHz processors and 2 x Xeon 5120 dual-core processors. The
 workload is mostly CPU-bound. I'm using 6-STABLE-amd64 and also tried
 7-STABLE.
 
 Now I'm trying to use new hardware with 2 x Xeon 5320 (quad-core), but
 it can not work under the same load as dual-core. It shows up to 80%
 system CPU load in top:
 



Re: 2 x quad-core system is slower that 2 x dual core on FreeBSD

2007-11-19 Thread Claus Guttesen
On Nov 19, 2007 2:32 PM, Alexey Popov [EMAIL PROTECTED] wrote:
 Hi.

 I have a large pool of web backends (Apache + mod_php5) with
 2 x Xeon 3.2GHz processors and 2 x Xeon 5120 dual-core processors. The
 workload is mostly CPU-bound. I'm using 6-STABLE-amd64 and also tried
 7-STABLE.

 Now I'm trying to use new hardware with 2 x Xeon 5320 (quad-core), but
 it can not work under the same load as dual-core. It shows up to 80%
 system CPU load in top:

 last pid:  3850;  load averages: 22.51, 19.75, 12.18

Very high load. Could it be the RAID controller? I had a db-server
with horrible performance due to a cheap RAID controller. Moving to a
ciss controller (DL380 G5) solved all my issues. My load decreased
100-fold.

-- 
regards
Claus

When lenity and cruelty play for a kingdom,
the gentlest gamester is the soonest winner.

Shakespeare


Re: 2 x quad-core system is slower that 2 x dual core on FreeBSD

2007-11-19 Thread Kris Kennaway

Alexey Popov wrote:

Hi.

Robert Watson wrote:

I tried SCHED_ULE, but got no difference:

Did you see no change in throughput, or no change in reported CPU use?

No significant changes.

We should probably take this thread to performance@ and get Kris 
involved.  He may be interested in trying to reproduce your workload 
in our testbed so we can perform measurements of our own, as well as 
getting you to provide profiling information.  One of the things we'd 
most like to have are nice potted benchmarks for real-world workloads, 
as that allows us to easily replay them, perform measurements, 
optimize, etc.
I can provide any profiling or configuration information you ask for,
except the source code of the PHP site.


I'm now in a situation where I can't install FreeBSD on the new servers,
because they are all based on 2 x quad-core processors and I can't be
sure it would work well.


Running mutex profiling for e.g. 1 minute of representative load would 
be a useful starting point, as well as hwpmc profiling for the same 
duration.


My guess is that you're hitting contention in the TCP send path, but I 
missed the start of this conversation so I don't know what problems you 
are seeing.


Kris


Re: 2 x quad-core system is slower that 2 x dual core on FreeBSD

2007-11-19 Thread Ivan Voras
Alexey Popov wrote:

 Here it is: http://83.167.98.162/gprof/kdump.txt.gz

I don't see anything unusual there.

Some more ideas: How is your disk load (iostat, systat -vm, diskinfo -t)
during the load? You don't use NFS for the web directories, do you?

Can you run bonnie++ while the machine is idle (i.e. apache is stopped)
just to verify it isn't a stupid problem with the disks or the driver?

 Also, did you try configuring and running pecl-APC for PHP?
 I'm using eAccelerator. Again, the same software works well on systems
 with fewer CPUs and on Linux.

So, did you take the CPU out of the motherboard and plug in another one?
If not, you can't be sure that something else isn't wrong. I know you
tried it on Linux, but it might use slightly different commands in the
driver that don't trigger the error. I'm very surprised that both 6.x
and 7.x behave almost the same under your load: since they are very
different in how they support multiple CPUs, I'd expect a big
difference in this case (in favour of 7.x), not a small one. This might
indicate that the problem is not in the OS itself, but maybe in the
hardware or in some driver.

Many people (including me) have run FreeBSD on machines like yours
without such problems, so let's dig further.

You don't have WITNESS, INVARIANTS, DIAGNOSTIC or something similar
enabled? Can you try a generic SMP kernel (called SMP in 6.x; GENERIC
in 7.x has SMP by default) and see how it works?

Can you disable SMP and try with only one CPU (on the 2xquad machine)?
You can do it in loader.conf by setting kern.smp.disabled=1, or perhaps
in BIOS. If there's a problem in some hardware or a driver, you'd still
get a big load on sys time. You might also want to halt certain logical
CPUs in the OS itself (see smp(4) man page) and see if there's a certain
relationship between how many CPUs are running and what the sys load is.







Re: 2 x quad-core system is slower that 2 x dual core on FreeBSD

2007-11-19 Thread Jeremy Chadwick
On Mon, Nov 19, 2007 at 07:35:09PM +0100, Ivan Voras wrote:
 Some more ideas: How is your disk load (iostat, systat -vm, diskinfo -t)
 during the load? You don't use NFS for the web directories, do you?

Don't forget about gstat(8), which (if the issue is an I/O bottleneck)
may help pinpoint what particular disk device is being utilised too
heavily.

-- 
| Jeremy Chadwick                                    jdc at parodius.com |
| Parodius Networking                           http://www.parodius.com/ |
| UNIX Systems Administrator                      Mountain View, CA, USA |
| Making life hard for others since 1977.                  PGP: 4BD6C0CB |



Re: 2 x quad-core system is slower that 2 x dual core on FreeBSD

2007-11-19 Thread Ivan Voras
Kris Kennaway wrote:

 My guess is that you're hitting contention in the TCP send path, but I
 missed the start of this conversation so I don't know what problems you
 are seeing.

Here it is:
http://lists.freebsd.org/pipermail/freebsd-stable/2007-November/038371.html

there's some mutex profiling there.

Off-topic: how do you read the output from debug.mutex.prof.stats? Is
cnt_lock the number of times an attempt was made to acquire a lock that
wasn't available?





