Re: Performance issue

2005-05-10 Thread Ewan Todd
 
 I think I've found the problem: Python uses setjmp/longjmp to protect
 against SIGFPE every time it does floating point operations.  The Python
 script does not actually use threads, and libpthread assumes
 non-threaded processes are system scope.  So it would end up using the
 sigprocmask syscall even though it doesn't really need to.
 The diff at
 http://people.freebsd.org/~ssouhlal/testing/thr_sigmask-20050509.diff
 fixes this by making sure the process is threaded before using the
 syscall.
 
 Note that the setjmp/longjmp code is only active if Python is
 ./configure'd with --with-fpectl, which has been standard for the
 ports-built Python for a long time.
 
 ISTR that this was because FreeBSD didn't mask SIGFPE by default, while
 Linux and many other OSes do.  I also seem to recall that this may have 
 changed in the evolution of 5.x.  If so, perhaps use of this configure
 option in the port needs to be reviewed for 5.x and later.

Well, I don't know what else it breaks, but for this microbenchmark,
compiling python-2.4.1 without --with-fpectl works swimmingly well
for me.  Not only does it bring the system time way down, but the user
time is down too, to about 5/7 of its previous value:
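(An aside for anyone checking their own build: the fpectl module is only
compiled in when Python was ./configure'd --with-fpectl, so a simple
import probe tells you which kind of build you have.  A minimal sketch;
note fpectl is a Python 2.x-era module, removed entirely in CPython 3.7:)

```python
# Probe for fpectl: the module exists only when Python was built
# --with-fpectl.  On such builds, fpectl.turnon_sigfpe() is the switch
# that enables the setjmp/longjmp SIGFPE protection discussed here.
try:
    import fpectl
    print("built --with-fpectl")
except ImportError:
    print("built without fpectl")
```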

5.3-RELEASE / without --with-fpectl

   48.78 real  48.22 user   0.15 sys
 23372  maximum resident set size
   657  average shared memory size
 20817  average unshared data size
   128  average unshared stack size
  5402  page reclaims
 0  page faults
 0  swaps
 0  block input operations
 0  block output operations
 0  messages sent
 0  messages received
 0  signals received
 0  voluntary context switches
  4889  involuntary context switches

compared with 

5.3-RELEASE / with --with-fpectl

  106.59 real  67.25 user  38.57 sys
 23140  maximum resident set size
   660  average shared memory size
 20818  average unshared data size
   128  average unshared stack size
  5402  page reclaims
 0  page faults
 0  swaps
 0  block input operations
 0  block output operations
 0  messages sent
 0  messages received
 0  signals received
 0  voluntary context switches
 10678  involuntary context switches
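The "about 5/7" figure checks out against the user times in the two
rusage blocks above:

```python
# User time from the two runs above: 67.25 s with --with-fpectl,
# 48.22 s without.
with_fpectl_user = 67.25
without_fpectl_user = 48.22
ratio = without_fpectl_user / with_fpectl_user
print(round(ratio, 3))   # ~0.717, versus 5/7 ~ 0.714
```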

I tentatively second Andrew's proposal that the use of this configure
option in the port needs to be reviewed for 5.x and later, pending
independent confirmation of the efficacy of this fix.

-e
___
freebsd-stable@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-stable
To unsubscribe, send any mail to [EMAIL PROTECTED]


Performance issue

2005-05-09 Thread Ewan Todd

Hi All,

I have what I think is a serious performance issue with fbsd 5.3
release.  I've read about threading issues, and it seems to me that
that is what I'm looking at, but I'm not confident enough to rule out
that it might be a hardware issue, a kernel configuration issue, or
something to do with the Python port.  I'd appreciate it if someone
would point it out if I'm overlooking something obvious.  Otherwise,
if it is the problem I think it is, then there seems to be entirely
too little acknowledgement of a major issue.

Here's the background.  I just got a new (to me) AMD machine and put
5.3 release on it.  I'd been very happy with the way my old Intel
machine had been performing with 4.10 stable, and I decided to run a
simple performance diagnostic on both machines, to wow myself with the
amazing performance of the new hardware / kernel combination.
However, the result was pretty disappointing.

Here are what I think are the pertinent dmesg details.

Old rig:

  FreeBSD 4.10-RELEASE #0: Thu Jul  1 22:47:08 EDT 2004
  Timecounter "i8254"  frequency 1193182 Hz
  Timecounter "TSC"  frequency 449235058 Hz
  CPU: Pentium III/Pentium III Xeon/Celeron (449.24-MHz 686-class CPU)

New rig:

  FreeBSD 5.3-RELEASE #0: Fri Nov  5 04:19:18 UTC 2004
  Timecounter "i8254" frequency 1193182 Hz quality 0
  CPU: AMD Athlon(tm) Processor (995.77-MHz 686-class CPU)
  Timecounter "ACPI-fast" frequency 3579545 Hz quality 1000
  Timecounter "TSC" frequency 995767383 Hz quality 800
  Timecounters tick every 10.000 msec

The diagnostic I selected was a python program to generate 1 million
pseudo-random numbers and then to perform a heap sort on them.  That
code is included at the foot of this email.  I named the file
heapsort.py.  I ran it on both machines, using the time utility in
/usr/bin/ (not the builtin tcsh time).  So the command line was

  /usr/bin/time -al -o heapsort.data ./heapsort.py 100
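(For anyone reproducing this, the same counters that /usr/bin/time -al
prints can also be read from inside the interpreter via the standard
resource module; a minimal sketch:)

```python
import resource

# Burn a little CPU, then read this process's own rusage counters --
# the same fields reported by /usr/bin/time -al.
x = 0.0
for i in range(1, 200000):
    x += 1.0 / i

ru = resource.getrusage(resource.RUSAGE_SELF)
print(ru.ru_utime)    # user CPU time, seconds
print(ru.ru_stime)    # system CPU time, seconds
print(ru.ru_nivcsw)   # involuntary context switches
```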

A typical result for the old rig was

  130.78 real   129.86 user 0.11 sys
 22344  maximum resident set size
   608  average shared memory size
 20528  average unshared data size
   128  average unshared stack size
  5360  page reclaims
 0  page faults
 0  swaps
 0  block input operations
 0  block output operations
 0  messages sent
 0  messages received
 0  signals received
 0  voluntary context switches
  2386  involuntary context switches

Whereas, the typical result for the new rig looked more like

  105.36 real  71.10 user  33.41 sys
 23376  maximum resident set size
   659  average shared memory size
 20796  average unshared data size
   127  average unshared stack size
  5402  page reclaims
 0  page faults
 0  swaps
 0  block input operations
 0  block output operations
 0  messages sent
 0  messages received
 0  signals received
 0  voluntary context switches
 10548  involuntary context switches

You'll notice that the new rig is indeed a little faster (times in
seconds): 105.36 real (new rig) compared with 130.78 real (old rig).

However, the new rig spends about 33.41 seconds on system overhead
compared with just 0.11 seconds on the old rig.  Comparing the rusage
stats, the only significant difference is the involuntary context
switches field, where the old rig has 2386 and the new rig has a
whopping 10548.  Further, I noticed that the number of context
switches on the new rig seems to be more or less exactly one per 10
msec of real time, that is, one per timecounter tick.  (I saw this
when comparing heapsort.py runs with arguments other than 100.)
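The arithmetic behind that observation, for the run above:

```python
# 105.36 s of real time at one tick per 10 msec (HZ=100) is ~10536
# ticks, within about 0.1% of the 10548 involuntary context switches
# reported for the new rig -- i.e. roughly one switch per tick.
real_seconds = 105.36
ticks = real_seconds * 1000.0 / 10.0   # 10 msec per tick
observed_switches = 10548
print(ticks)
print(abs(observed_switches - ticks) / observed_switches)
```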

I think the new rig ought to execute this task in about 70 seconds:
just over the amount of user time.  Assuming that I'm not overlooking
something obvious, and that I'm not interpreting a feature as a bug, 
this business with the context switches strikes me as a bit of a
show-stopper.  If that's right, it appears to be severely underplayed
in the release documentation.

I'll be happy if someone would kindly explain to me what's going on
here.  I'll be even happier to hear of a fix or workaround to remedy
the situation.

Thanks in advance,

-e




heapsort.py:

#!/usr/local/bin/python -O
# $Id: heapsort-python-3.code,v 1.3 2005/04/04 14:56:45 bfulgham Exp $
#
# The Great Computer Language Shootout
# http://shootout.alioth.debian.org/
#
# Updated by Valentino Volonghi for Python 2.4
# Reworked by Kevin Carson to produce correct results and same intent

import sys

IM = 139968
IA =   3877
IC =  29573

LAST = 42
def gen_random(max) :
    global LAST
    LAST = (LAST * IA + IC) % IM
    return( (max * LAST) / IM )

def heapsort(n, ra) :
    ir = n
    l = (n >> 1) + 1

    while True :
        if l > 1 :
            l -= 1
            rra = ra[l]
        else :
            rra = ra[ir]
            ra[ir] = ra[1]
            ir -= 1
            if ir == 1 :
                ra[1] = rra
                return

        i = l
        j = l << 1
        while j <= ir :
            if j < ir and ra[j] < ra[j + 1] :
                j += 1
            if rra < ra[j] :
                ra[i] = ra[j]
                i = j
                j += j
            else :
                j = ir + 1
        ra[i] = rra

def main() :
    N = int(sys.argv[1])

    ra = [None] * (N + 1)
    for i in xrange(1, N + 1) :
        ra[i] = gen_random(1.0)

    heapsort(N, ra)

    print "%.10f" % ra[N]

main()
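A quick standalone sanity check of the shootout heapsort logic
(1-based array, slot 0 unused), on a tiny hand-made input rather than
the benchmark's million random floats:

```python
# Same algorithm as in heapsort.py, checked on a small 1-based array.
def heapsort(n, ra):
    ir = n
    l = (n >> 1) + 1
    while True:
        if l > 1:
            l -= 1
            rra = ra[l]
        else:
            rra = ra[ir]
            ra[ir] = ra[1]
            ir -= 1
            if ir == 1:
                ra[1] = rra
                return
        i = l
        j = l << 1
        while j <= ir:
            if j < ir and ra[j] < ra[j + 1]:
                j += 1
            if rra < ra[j]:
                ra[i] = ra[j]
                i = j
                j += j
            else:
                j = ir + 1
        ra[i] = rra

data = [None, 5, 3, 8, 1, 9, 2]   # slot 0 unused
heapsort(6, data)
print(data[1:])                    # [1, 2, 3, 5, 8, 9]
```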

Re: Performance issue

2005-05-09 Thread Ewan Todd
 
 Whereas, the typical result for the new rig looked more like
 
    105.36 real  71.10 user  33.41 sys
  ...
  10548  involuntary context switches
 
 
 
 First of all, make sure that you have WITNESS and INVARIANTS off in your
 kernel.  You might also want to recompile your kernel with the SMP 
 option turned off.
 
 Scott

First of all, thanks to Mike Tancsa for suggesting 5.4 RC4 and to Pete
French for running the test independently on the higher spec machines
with 5.4 RC4 on them, confirming the system time thing, ruling out an
AMD problem, dissociating the system time result from the context
switching, and saving me the trouble of rediscovering the same problem
on 5.4 RC4.  

This is my first foray into the public world of FreeBSD discussion
lists, and I am encouraged by the helpfulness of the response.

Scott, the 5.3 kernel I had was essentially a GENERIC release
kernel, with about 100 options commented out.  WITNESS and INVARIANTS
are off by default, which I confirmed by looking through `sysctl -a`.
However, I was curious to see what I would get if I switched them on,
so I added these options and recompiled the kernel:

  options KDB
  options DDB
  options INVARIANTS
  options INVARIANT_SUPPORT
  options WITNESS
  options WITNESS_SKIPSPIN

The result, below, has essentially the same user time (or just less,
if that makes any sense), but tripled system time.  The context
switches are consistent with the one-per-10msec I saw before.  Is
there anything useful I can do while I still have the kernel debug
options on?

-e


  172.29 real  67.53 user  103.07 sys
 23376  maximum resident set size
   659  average shared memory size
 20805  average unshared data size
   127  average unshared stack size
  5402  page reclaims
 0  page faults
 0  swaps
 0  block input operations
 0  block output operations
 0  messages sent
 0  messages received
 0  signals received
 0  voluntary context switches
 17234  involuntary context switches




Re: Performance issue

2005-05-09 Thread Ewan Todd
 
 5.3 ships with SMP turned on, which makes lock operations rather 
 expensive on single-processor machines.  4.x does not have SMP
 turned on by default.  Would you be able to re-run your test with
 SMP turned off?
 

I'm pretty sure there's no SMP in this kernel.

  #cd /usr/src/sys/i386/conf
  #fgrep SMP MYKERNEL
  #

GENERIC has no SMP in it, but there's a second GENERIC kernel conf
called SMP, which simply says:

  include GENERIC
  options SMP

However, sysctl seems to show SMP as not active, though not explicitly
disabled.  Is that anything to worry about?

  #sysctl -a | grep smp
  kern.smp.maxcpus: 1
  kern.smp.active: 0
  kern.smp.disabled: 0
  kern.smp.cpus: 1
  debug.psm.pkterrthresh: 2


-e
