Re: [PERFORM] Wierd context-switching issue on Xeon

Joe Conway Mon, 19 Apr 2004 20:22:58 -0700

Tom Lane wrote:

> Here is a test case.  To set up, run the "test_setup.sql" script once;
> then launch two copies of the "test_run.sql" script.  (For those of
> you with more than two CPUs, see whether you need one per CPU to make
> trouble, or whether two test_runs are enough.)  Check that you get a
> nestloops-with-index-scans plan shown by the EXPLAIN in test_run.

Check.

> In isolation, test_run.sql should do essentially no syscalls at all once
> it's past the initial ramp-up.  On a machine that's functioning per
> expectations, multiple copies of test_run show a relatively low rate of
> semop() calls --- a few per second, at most --- and maybe a delaying
> select() here and there.
>
> What I actually see on Josh's client's machine is a context swap storm:
> "vmstat 1" shows CS rates around 170K/sec.  strace'ing the backends
> shows a corresponding rate of semop() syscalls, with a few delaying
> select()s sprinkled in.  top(1) shows system CPU percent of 25-30
> and idle CPU percent of 16-20.

Your test case works perfectly. I ran 4 concurrent psql sessions on a hyperthreaded quad Xeon (IBM x445, 2.8GHz, 4GB RAM). Here's what 'top' looks like:

177 processes: 173 sleeping, 3 running, 1 zombie, 0 stopped
CPU states:  cpu    user    nice  system    irq  softirq  iowait    idle
           total   35.9%    0.0%    7.2%   0.0%     0.0%    0.0%   56.8%
           cpu00   19.6%    0.0%    4.9%   0.0%     0.0%    0.0%   75.4%
           cpu01   44.1%    0.0%    7.8%   0.0%     0.0%    0.0%   48.0%
           cpu02    0.0%    0.0%    0.0%   0.0%     0.0%    0.0%  100.0%
           cpu03   32.3%    0.0%   13.7%   0.0%     0.0%    0.0%   53.9%
           cpu04   21.5%    0.0%   10.7%   0.0%     0.0%    0.0%   67.6%
           cpu05   42.1%    0.0%    9.8%   0.0%     0.0%    0.0%   48.0%
           cpu06  100.0%    0.0%    0.0%   0.0%     0.0%    0.0%    0.0%
           cpu07   27.4%    0.0%   10.7%   0.0%     0.0%    0.0%   61.7%
Mem: 4123700k av, 3933896k used, 189804k free, 0k shrd, 221948k buff
                  2492124k actv,  760612k in_d,   41416k in_c
Swap: 2040244k av, 5632k used, 2034612k free 3113272k cached

Note that the load on cpu06 is not from a postgres process. The output of vmstat looks like this:

# vmstat 1
procs                       memory      swap          io      system         cpu
 r  b   swpd   free   buff   cache   si   so    bi    bo   in     cs us sy id wa
 4  0   5632 184264 221948 3113308    0    0     0     0    0      0  0  0  0  0
 3  0   5632 184264 221948 3113308    0    0     0     0  112 211894 36  9 55  0
 5  0   5632 184264 221948 3113308    0    0     0     0  125 222071 39  8 53  0
 4  0   5632 184264 221948 3113308    0    0     0     0  110 215097 39 10 52  0
 1  0   5632 184588 221948 3113308    0    0     0    96  139 187561 35 10 55  0
 3  0   5632 184588 221948 3113308    0    0     0     0  114 241731 38 10 52  0
 3  0   5632 184920 221948 3113308    0    0     0     0  132 257168 40  9 51  0
 1  0   5632 184912 221948 3113308    0    0     0     0  114 251802 38  9 54  0
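
For anyone who wants to compare runs by a single number rather than eyeballing the cs column, here is a rough sketch in Python that averages it. This is not part of the test case: the function name mean_context_switches and the SAMPLE string are made up for illustration, and SAMPLE simply repeats the vmstat figures from this message. The first sample is skipped on the assumption that it is not a live one-second reading (it is all zeros here).

```python
# Hypothetical helper, not part of the original test case: average the
# "cs" (context switches per second) column of `vmstat 1` output.

def mean_context_switches(vmstat_text, skip_first=True):
    """Return the mean of the cs column, or 0.0 if no samples are found."""
    cs_index = None
    samples = []
    for line in vmstat_text.splitlines():
        fields = line.split()
        if "cs" in fields:
            # Column-name header row: remember where cs sits.
            cs_index = fields.index("cs")
            continue
        if cs_index is None or len(fields) <= cs_index:
            continue  # group header, blank line, or short line
        if not fields[cs_index].isdigit():
            continue  # not a data row
        samples.append(int(fields[cs_index]))
    if skip_first:
        # Assumption: the first row is not a live one-second sample.
        samples = samples[1:]
    return sum(samples) / len(samples) if samples else 0.0


# The figures reported in this message, copied as-is.
SAMPLE = """\
procs memory swap io system cpu
r b swpd free buff cache si so bi bo in cs us sy id wa
4 0 5632 184264 221948 3113308 0 0 0 0 0 0 0 0 0 0
3 0 5632 184264 221948 3113308 0 0 0 0 112 211894 36 9 55 0
5 0 5632 184264 221948 3113308 0 0 0 0 125 222071 39 8 53 0
4 0 5632 184264 221948 3113308 0 0 0 0 110 215097 39 10 52 0
1 0 5632 184588 221948 3113308 0 0 0 96 139 187561 35 10 55 0
3 0 5632 184588 221948 3113308 0 0 0 0 114 241731 38 10 52 0
3 0 5632 184920 221948 3113308 0 0 0 0 132 257168 40 9 51 0
1 0 5632 184912 221948 3113308 0 0 0 0 114 251802 38 9 54 0
"""

if __name__ == "__main__":
    print(f"mean cs/sec: {mean_context_switches(SAMPLE):.0f}")
```

The column position is taken from the header row rather than hard-coded, since vmstat layouts differ between versions.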

> Note the test case assumes you've got shared_buffers set to at least
> 1000; with smaller values, you may get some I/O syscalls, which will
> probably skew the results.


 shared_buffers
----------------
 16384
(1 row)

I found that killing three of the four concurrent queries dropped the context-switch rate to about 70,000 to 100,000 per second. Two or more sessions bring it up to 200K+ per second.

Joe

