[PERFORM] CPU maxed out!

2004-09-18 Thread Qing Zhao
Hi there,
I am running PostgreSQL 7.3.4 on Mac OS X (G5, dual processors, 8GB
memory). The shared buffer was set to 512MB.
The database had been running great until about 10 days ago, when our
developers added some indexes to some tables to speed up certain
upload operations.
Now CPU usage is constantly at 100% whenever a few users access their
information via SELECTs on the database. If I REINDEX all the indexes,
performance improves a bit, but before long it goes bad again.

My suspicion is that now that a few indexes have been added, the
planner considers those indexes when calculating cost for every
operation, and this leads to the performance degradation.
What do you think? What is a possible solution?
Thanks!
Qing
The following is the output from the top command:
Processes:  92 total, 4 running, 88 sleeping... 180 threads               13:09:18
Load Avg:  2.81, 2.73, 2.50     CPU usage:  95.2% user, 4.8% sys, 0.0% idle
SharedLibs: num =  116, resident = 11.5M code, 1.66M data, 4.08M LinkEdit
MemRegions: num = 12132, resident =  148M + 2.82M private,  403M shared
PhysMem:   435M wired, 5.04G active, 2.22G inactive, 7.69G used,  316M free
VM: 32.7G + 81.5M   5281127(13) pageins, 8544145(0) pageouts

  PID COMMAND      %CPU   TIME     #TH #PRTS #MREGS RPRVT  RSHRD  RSIZE  VSIZE
27314 postgres     92.2%  2:14.75    1     9    49  12.8M+  396M  75.0M+  849M
26099 postgres     91.1% 19:28.04    1     9    67  15.9M+  396M   298M+  850M
24754 top           2.8%  4:48.33    1    29    26   272K   404K   648K  27.1M
    0 kernel_tas    1.9%  2:12:05   40     2  8476  67.1M     0K   281M  1.03G
  294 hwmond        0.5%  2:26:34    8    75    57   240K   544K  1.09M  31.0M
  347 lookupd       0.3%  1:52:28    2    35    73  3.05M   648K  3.14M  33.6M
   89 configd       0.1% 53:05.16    3   126   151   304K   644K   832K  29.2M
26774 servermgrd    0.1%  0:02.93    1    10    40   344K- 1.17M+ 1.86M  28.2M
  170 coreservic    0.1%  0:09.04    1    40    93   152K   532K  2.64M  28.5M
  223 DirectoryS    0.1% 19:42.47    8    84   135   880K+ 1.44M  4.60M+ 37.1M+
  125 dynamic_pa    0.0%  0:26.79    1    12    17    16K   292K    28K  17.7M
   87 kextd         0.0%  0:01.23    2    17    21     0K   292K    36K  28.2M
  122 update        0.0% 14:27.71    1     9    15    16K   300K    44K  17.6M
    1 init          0.0%  0:00.03    1    12    16    28K   320K    76K  17.6M
    2 mach_init     0.0%  3:36.18    2    95    18    76K   320K   148K  18.2M
   81 syslogd       0.0%  0:19.96    1    10    17    96K   320K   148K  17.7M
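
One way to investigate the suspicion above is to ask the statistics
collector which indexes are actually being scanned. A sketch (requires
the stats collector to be enabled; run in psql against the database in
question):

```sql
-- Per-index scan counts: indexes with idx_scan near zero are not
-- being used by queries and are pure write-time overhead.
SELECT relname, indexrelname, idx_scan, idx_tup_read
  FROM pg_stat_user_indexes
 ORDER BY idx_scan;

-- Compare sequential vs. index scans per table.
SELECT relname, seq_scan, idx_scan
  FROM pg_stat_user_tables;
```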

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
 subscribe-nomail command to [EMAIL PROTECTED] so that your
 message can get through to the mailing list cleanly


[PERFORM] Trying to match Solaris-Oracle performance with directio?

2004-09-18 Thread Mischa Sandberg
Our product (Sophos PureMessage) runs on a Postgres database.
Some of our Solaris customers have Oracle licenses, and they've 
commented on the performance difference between Oracle and Postgresql
on such boxes. In-house, we've noticed the 2:1 (sometimes 5:1)
performance difference in inserting rows (mostly 2-4K), between
Postgresql on Solaris 8 and on Linux, for machines with comparable
CPU's and RAM.

These (big) customers are starting to ask, why don't we just port our 
dataserver to Oracle for them? I'd like to avoid that, if possible :-)

What we can test on in-house are leetle Sun workstations, while some of
our customers have BIG Sun iron --- so I have no means to date to
reproduce what their bottleneck is :-( Yes, it has been recommended that
we talk to Sun about their iForce test lab ... that's in the pipe.

In the meantime, what I gather from browsing mail archives is that 
postgresql on Solaris seems to get hung up on IO rather than CPU.
Furthermore, I notice that Oracle and now MySQL use directio to bypass 
the system cache, when doing heavy writes to the disk; and Postgresql 
does not.

Not wishing to alter backend/storage/file for this test, I figured I could
get a customer to mount the UFS volume for pg_xlog with the option
forcedirectio.
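
For reference, a sketch of what that might look like (the device and
mount point here are made up for illustration; Solaris UFS syntax):

```
# Remount the filesystem holding pg_xlog with forcedirectio
# (device and mount point are hypothetical):
mount -F ufs -o remount,forcedirectio /dev/dsk/c0t0d0s5 /var/pgsql/pg_xlog

# Or make it persistent via an /etc/vfstab entry (last field is the
# mount options):
/dev/dsk/c0t0d0s5  /dev/rdsk/c0t0d0s5  /var/pgsql/pg_xlog  ufs  2  yes  forcedirectio
```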

Any comments on this? I'm not considering wal_sync_method at this
point; presumably it's defaulting to fdatasync on Solaris.
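
Rather than presuming, wal_sync_method is worth pinning down and
benchmarking; a postgresql.conf sketch (7.4-era options, values
illustrative only):

```
# postgresql.conf: candidate WAL sync methods to benchmark.
# With forcedirectio on the pg_xlog volume, open_datasync may behave
# very differently from fdatasync -- measure rather than presume.
wal_sync_method = fdatasync   # alternatives: fsync, open_sync, open_datasync
```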

BTW, this is Postgres 7.4.1, and our customers are on Solaris 8 and 9.
---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
   (send unregister YourEmailAddressHere to [EMAIL PROTECTED])


Re: [PERFORM] Trying to match Solaris-Oracle performance with directio?

2004-09-18 Thread Tom Lane
Mischa Sandberg <[EMAIL PROTECTED]> writes:
> Our product (Sophos PureMessage) runs on a Postgres database.
> Some of our Solaris customers have Oracle licenses, and they've
> commented on the performance difference between Oracle and Postgresql
> on such boxes. In-house, we've noticed the 2:1 (sometimes 5:1)
> performance difference in inserting rows (mostly 2-4K), between
> Postgresql on Solaris 8 and on Linux, for machines with comparable
> CPU's and RAM.

You haven't given any evidence at all to say that I/O is where the
problem is.  I think it would be good first to work through the
conventional issues such as configuration parameters, foreign key
problems, etc.  Give us some more detail about the slow INSERT
queries ...

regards, tom lane



[PERFORM] Planner having way wrong estimate for group aggregate

2004-09-18 Thread Steinar H. Gunderson
Hi,

I'm using PostgreSQL 7.4 on a table with ~700,000 rows looking like this:

              Table "public.enkeltsalg"
   Column   |           Type           |                     Modifiers
------------+--------------------------+-----------------------------------------------------
 id         | integer                  | not null default nextval('enkeltsalg_id_seq'::text)
 kommentar  | text                     | not null default ''::text
 antall     | numeric(14,4)            | not null
 belop      | numeric(10,0)            | not null
 type       | character(1)             | not null
 tid        | timestamp with time zone | default now()
 eksternid  | integer                  |
 kasseid    | integer                  |
 baraapning | integer                  |
 salgspris  | integer                  |
 firma      | integer                  |
 bongid     | integer                  |
Indexes:
    "enkeltsalg_pkey" primary key, btree (id)
    "enkeltsalg_aapn" btree (baraapning)
    "enkeltsalg_aapn_pris" btree (baraapning, salgspris)
    "enkeltsalg_aapn_type" btree (baraapning, type)
    "enkeltsalg_pris" btree (salgspris)
Check constraints:
    "enkeltsalg_type_valid" CHECK (type = 'K'::bpchar OR type = 'B'::bpchar OR type = 'M'::bpchar OR type = 'T'::bpchar)

And I'm doing the query (after VACUUM ANALYZE)

smt=# explain analyze select sum(belop) as omsetning, date_trunc('day',tid) as dato from enkeltsalg group by date_trunc('day',tid);
                                                            QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------
 GroupAggregate  (cost=108062.34..114477.98 rows=172735 width=17) (actual time=20977.544..23890.020 rows=361 loops=1)
   ->  Sort  (cost=108062.34..109912.99 rows=740263 width=17) (actual time=20947.372..21627.107 rows=710720 loops=1)
         Sort Key: date_trunc('day'::text, tid)
         ->  Seq Scan on enkeltsalg  (cost=0.00..18010.29 rows=740263 width=17) (actual time=0.091..7180.528 rows=710720 loops=1)
 Total runtime: 23908.538 ms
(5 rows)

Now, as you can see, the GroupAggregate row estimate is _way_ off
(172735 estimated vs. 361 actual), so the planner makes the wrong
choice (it should do a hash aggregate). If I set sort_mem to 131072
instead of 16384, it does a hash aggregate (which takes 10 seconds
instead of 24), but I can't have sort_mem that high generally.
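
The session-local version of that workaround, as a sketch (7.4 syntax;
sort_mem is in kilobytes, so 131072 is 128MB for this session only):

```sql
-- Raise sort_mem just for this session, run the aggregate, then reset,
-- so the planner can afford a HashAggregate without a global change.
SET sort_mem = 131072;
SELECT sum(belop) AS omsetning, date_trunc('day', tid) AS dato
  FROM enkeltsalg
 GROUP BY date_trunc('day', tid);
RESET sort_mem;
```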

Now, my first notion was creating a functional index to help the planner:

smt=# create index enkeltsalg_dag on enkeltsalg ( date_trunc('day',tid) );
CREATE INDEX  
smt=# vacuum analyze;
VACUUM

However, this obviously didn't help the planner (this came as a surprise
to me, but probably won't come as a surprise to the more seasoned users
here :-) ):

smt=# explain analyze select sum(belop) as omsetning, date_trunc('day',tid) as dato from enkeltsalg group by date_trunc('day',tid);
                                                            QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------
 GroupAggregate  (cost=103809.15..110017.11 rows=175512 width=17) (actual time=21061.357..23917.370 rows=361 loops=1)
   ->  Sort  (cost=103809.15..105585.95 rows=710720 width=17) (actual time=21032.239..21695.674 rows=710720 loops=1)
         Sort Key: date_trunc('day'::text, tid)
         ->  Seq Scan on enkeltsalg  (cost=0.00..17641.00 rows=710720 width=17) (actual time=0.091..7231.387 rows=710720 loops=1)
 Total runtime: 23937.791 ms
(5 rows)

I also tried to increase the statistics on the tid column:

smt=# alter table enkeltsalg alter column tid set statistics 500;
ALTER TABLE
smt=# analyze enkeltsalg;
ANALYZE

However, this only made the planner's estimate _worse_:

smt=# explain analyze select sum(belop) as omsetning, date_trunc('day',tid) as dato from enkeltsalg group by date_trunc('day',tid);
                                                            QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------
 GroupAggregate  (cost=107906.59..114449.09 rows=199715 width=17) (actual time=20947.197..23794.389 rows=361 loops=1)
   ->  Sort  (cost=107906.59..109754.56 rows=739190 width=17) (actual time=20918.001..21588.735 rows=710720 loops=1)
         Sort Key: date_trunc('day'::text, tid)
         ->  Seq Scan on enkeltsalg  (cost=0.00..17996.88 rows=739190 width=17) (actual time=0.092..7166.488 rows=710720 loops=1)
 Total runtime: 23814.624 ms
(5 rows)

Actually, it seems that the higher I set statistics on tid, the worse the
estimate becomes.

Also, I was told (on #postgresql :-) ) to include the following information:

smt=# select n_distinct from pg_stats 

Re: [PERFORM] Trying to match Solaris-Oracle performance with directio?

2004-09-18 Thread Shachar Shemesh
Mischa Sandberg wrote:
> In the meantime, what I gather from browsing mail archives is that
> postgresql on Solaris seems to get hung up on IO rather than CPU.
Well, people more knowledgeable in the secrets of Postgres seem
confident that this is not your problem. Fortunately, however, there is
a simple way to find out.

Just download the utinyint var type from pgfoundry
(http://pgfoundry.org/projects/sql2pg/). There is some stuff there you
will need to compile yourself from CVS; I'm sorry, but I haven't done a
proper release just yet. In any case, the utinyint type should provide
you with the data type you seek, and thus allow you to find out whether
this is, indeed, the problem.

--
Shachar Shemesh
Lingnu Open Source Consulting ltd.
http://www.lingnu.com/


Re: [PERFORM] Planner having way wrong estimate for group aggregate

2004-09-18 Thread Tom Lane
Steinar H. Gunderson <[EMAIL PROTECTED]> writes:
> Now, my first notion was creating a functional index to help the planner:
> ...
> However, this obviously didn't help the planner (this came as a surprise to
> me, but probably won't come as a surprise to the more seasoned users here :-)

7.4 doesn't have any statistics on expression indexes.  8.0 will do what
you want though.  (I just fixed an oversight that prevented it from
doing so...)

> Actually, it seems that the higher I set statistics on tid, the worse the
> estimate becomes.

I believe that the estimate of number of groups will be exactly the same
as the estimate of the number of values of tid --- there's no knowledge
that date_trunc() might reduce the number of distinct values.
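
One way to see that gap directly, as a sketch:

```sql
-- Compare what the planner knows (n_distinct for tid) with the actual
-- number of distinct days; the group-count estimate tracks the former.
SELECT n_distinct
  FROM pg_stats
 WHERE tablename = 'enkeltsalg' AND attname = 'tid';

SELECT count(DISTINCT date_trunc('day', tid)) FROM enkeltsalg;
```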

> Any ideas for speeding this up?

In 7.4, the only way I can see to force this to use a hash aggregate is
to temporarily set enable_sort false or raise sort_mem.
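
The enable_sort variant, sketched (session-local, so other queries in
the same session can reset it afterwards):

```sql
-- Discourage sort-based plans just for this query; the planner then
-- prefers a HashAggregate even with a modest sort_mem.
SET enable_sort = false;
SELECT sum(belop) AS omsetning, date_trunc('day', tid) AS dato
  FROM enkeltsalg
 GROUP BY date_trunc('day', tid);
RESET enable_sort;
```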

regards, tom lane

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [PERFORM] Planner having way wrong estimate for group aggregate

2004-09-18 Thread Steinar H. Gunderson
On Sat, Sep 18, 2004 at 03:48:13PM -0400, Tom Lane wrote:
> 7.4 doesn't have any statistics on expression indexes.  8.0 will do what
> you want though.  (I just fixed an oversight that prevented it from
> doing so...)

OK, so I'll have to wait for 8.0.0beta3 or 8.0.0 (I tried 8.0.0beta2, it gave
me zero difference) -- fortunately, I can probably wait at the rate
everything else is progressing here. :-)

/* Steinar */
-- 
Homepage: http://www.sesse.net/

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster