Re: [PERFORM] Estimation problem with a LIKE clause containing a /
Tom,

On Nov 8, 2007 12:14 AM, Tom Lane <[EMAIL PROTECTED]> wrote:
> I've applied a patch that might help you:
> http://archives.postgresql.org/pgsql-committers/2007-11/msg00104.php

AFAICS, it doesn't seem to fix the problem. I just compiled the REL8_1_STABLE branch and I still have the following behaviour:

lbo=# ANALYZE cms_items;
ANALYZE
lbo=# explain analyze select * from cms_items where ancestors LIKE '1062/%';
 Seq Scan on cms_items  (cost=0.00..688.26 rows=1 width=103) (actual time=0.009..22.258 rows=11326 loops=1)
   Filter: ((ancestors)::text ~~ '1062/%'::text)
 Total runtime: 29.835 ms
(3 rows)

lbo=# show lc_collate;
 lc_collate
-------------
 fr_FR.UTF-8
(1 row)

Do you see any reason why your patch doesn't change anything in this case?

Thanks.

--
Guillaume

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at
http://www.postgresql.org/about/donate
Re: [PERFORM] Need to run CLUSTER to keep performance
Rafael Martinez wrote:

This is a question about something we have seen sometimes in the last months. It happens with tables with a large amount of updates/selects compared with the amount of inserts/deletes. The sizes of these tables are small and the amount of rows too.

The 'problem' is that performance decreases during the day and the only thing that helps is to run CLUSTER on the table with problems. VACUUM ANALYZE does not help.

Some information that can help to find out why this happens:

- PostgreSQL version: 8.1.9
--
scanorama=# SELECT pg_size_pretty(pg_relation_size('hosts'));
 pg_size_pretty
----------------
 12 MB
--
scanorama=# SELECT count(*) FROM hosts ;
 count
-------
 16402
--
scanorama=# EXPLAIN ANALYZE SELECT * FROM hosts;
 Seq Scan on hosts  (cost=0.00..2771.56 rows=66756 width=314) (actual time=0.008..2013.415 rows=16402 loops=1)
 Total runtime: 2048.486 ms
--
scanorama=# VACUUM ANALYZE ;
VACUUM
--
scanorama=# EXPLAIN ANALYZE SELECT * FROM hosts;
 Seq Scan on hosts  (cost=0.00..2718.57 rows=61357 width=314) (actual time=0.008..1676.283 rows=16402 loops=1)
 Total runtime: 1700.826 ms
--
scanorama=# CLUSTER hosts_pkey ON hosts ;
CLUSTER
--
scanorama=# EXPLAIN ANALYZE SELECT * FROM hosts;
 Seq Scan on hosts  (cost=0.00..680.02 rows=16402 width=314) (actual time=0.008..31.205 rows=16402 loops=1)
 Total runtime: 53.635 ms
--
scanorama=# SELECT * from pg_stat_all_tables WHERE relname LIKE 'hosts';
 relid  | schemaname | relname | seq_scan | seq_tup_read | idx_scan | idx_tup_fetch | n_tup_ins | n_tup_upd | n_tup_del
--------+------------+---------+----------+--------------+----------+---------------+-----------+-----------+-----------
 105805 | public     | hosts   |  1996430 |  32360280252 |  2736391 |       3301856 |       948 |   1403325 |       737

The information from pg_stat_all_tables is from the last 20 days.
--
INFO:  analyzing "public.hosts"
INFO:  "hosts": scanned 2536 of 2536 pages, containing 16410 live rows and 57042 dead rows; 16410 rows in sample, 16410 estimated total rows
INFO:  free space map contains 191299 pages in 786 relations
DETAIL:  A total of 174560 page slots are in use (including overhead).
174560 page slots are required to track all free space.
Current limits are:  200 page slots, 4000 relations, using 12131 KB.
--
The tables with this 'problem' are not big, so CLUSTER finishes very fast and it does not have an impact on access because of locking. But we wonder why this happens.

2 seconds for seq scanning 12 MB worth of data sounds like a lot. Have you increased shared_buffers from the default? Which operating system are you using? Shared memory access is known to be slower on Windows.

On a small table like that you could run VACUUM every few minutes without much impact on performance. That should keep the table size in check.

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?
http://www.postgresql.org/docs/faq
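A minimal sketch of the two checks Heikki suggests, runnable from psql (the table name comes from the thread; how often to vacuum is a judgement call, not something the post prescribes):

  SHOW shared_buffers;    -- the 8.1 default is very small; compare it against available RAM
  VACUUM ANALYZE hosts;   -- cheap on a 12 MB table, so it could be scheduled every few minutes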
Re: [PERFORM] Need to run CLUSTER to keep performance
Performance problems with heavily modified tables (UPDATE or DELETE) are usually caused by not vacuuming. There are two main modes VACUUM can run in (plain or full), and the latter works in a much more aggressive way (exclusive locking, etc). Try to run VACUUM FULL VERBOSE on the table and see if it helps.

A way to fix this is usually a proper setting of the pg_autovacuum daemon - it may work on the tables that are not modified heavily, but it does not work for the heavily modified ones. Do you have the autovacuum daemon enabled? What are its settings? Try to make it a little bit more aggressive (this can be done on a table level).

The stats from pg_stat_all_tables are nice, but I guess the stats that matter are located in the pg_class catalog, the most interesting being the reltuples and relpages columns - run

SELECT relname, relpages, reltuples FROM pg_class WHERE relname LIKE 'hosts';

and observe the number of pages before and after the VACUUM FULL (or CLUSTER). I guess the number of pages increases quite fast and the autovacuum daemon is not able to reclaim that - and this is probably the cause why scanning 12 MB of data takes 2 sec, which is way too much (the table is actually much bigger, as it contains a lot of dead data).

Tomas

Hello

This is a question about something we have seen sometimes in the last months. It happens with tables with a large amount of updates/selects compared with the amount of inserts/deletes. The sizes of these tables are small and the amount of rows too.

The 'problem' is that performance decreases during the day and the only thing that helps is to run CLUSTER on the table with problems. VACUUM ANALYZE does not help.

Some information that can help to find out why this happens:

- PostgreSQL version: 8.1.9
--
scanorama=# SELECT pg_size_pretty(pg_relation_size('hosts'));
 pg_size_pretty
----------------
 12 MB
--
scanorama=# SELECT count(*) FROM hosts ;
 count
-------
 16402
--
scanorama=# EXPLAIN ANALYZE SELECT * FROM hosts;
 Seq Scan on hosts  (cost=0.00..2771.56 rows=66756 width=314) (actual time=0.008..2013.415 rows=16402 loops=1)
 Total runtime: 2048.486 ms
--
scanorama=# VACUUM ANALYZE ;
VACUUM
--
scanorama=# EXPLAIN ANALYZE SELECT * FROM hosts;
 Seq Scan on hosts  (cost=0.00..2718.57 rows=61357 width=314) (actual time=0.008..1676.283 rows=16402 loops=1)
 Total runtime: 1700.826 ms
--
scanorama=# CLUSTER hosts_pkey ON hosts ;
CLUSTER
--
scanorama=# EXPLAIN ANALYZE SELECT * FROM hosts;
 Seq Scan on hosts  (cost=0.00..680.02 rows=16402 width=314) (actual time=0.008..31.205 rows=16402 loops=1)
 Total runtime: 53.635 ms
--
scanorama=# SELECT * from pg_stat_all_tables WHERE relname LIKE 'hosts';
 relid  | schemaname | relname | seq_scan | seq_tup_read | idx_scan | idx_tup_fetch | n_tup_ins | n_tup_upd | n_tup_del
--------+------------+---------+----------+--------------+----------+---------------+-----------+-----------+-----------
 105805 | public     | hosts   |  1996430 |  32360280252 |  2736391 |       3301856 |       948 |   1403325 |       737

The information from pg_stat_all_tables is from the last 20 days.
--
INFO:  analyzing "public.hosts"
INFO:  "hosts": scanned 2536 of 2536 pages, containing 16410 live rows and 57042 dead rows; 16410 rows in sample, 16410 estimated total rows
INFO:  free space map contains 191299 pages in 786 relations
DETAIL:  A total of 174560 page slots are in use (including overhead).
174560 page slots are required to track all free space.
Current limits are:  200 page slots, 4000 relations, using 12131 KB.
--
The tables with this 'problem' are not big, so CLUSTER finishes very fast and it does not have an impact on access because of locking. But we wonder why this happens.

Do you need more information?
Thanks in advance. regards ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
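A short sketch of the before/after check Tomas describes above, using the object names from this thread (illustrative only):

  SELECT relname, relpages, reltuples FROM pg_class WHERE relname = 'hosts';
  VACUUM FULL VERBOSE hosts;   -- takes an exclusive lock while it runs
  SELECT relname, relpages, reltuples FROM pg_class WHERE relname = 'hosts';
  -- a large drop in relpages confirms the table was mostly dead space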
[PERFORM] Need to run CLUSTER to keep performance
Hello

This is a question about something we have seen sometimes in the last months. It happens with tables with a large amount of updates/selects compared with the amount of inserts/deletes. The sizes of these tables are small and the amount of rows too.

The 'problem' is that performance decreases during the day and the only thing that helps is to run CLUSTER on the table with problems. VACUUM ANALYZE does not help.

Some information that can help to find out why this happens:

- PostgreSQL version: 8.1.9
--
scanorama=# SELECT pg_size_pretty(pg_relation_size('hosts'));
 pg_size_pretty
----------------
 12 MB
--
scanorama=# SELECT count(*) FROM hosts ;
 count
-------
 16402
--
scanorama=# EXPLAIN ANALYZE SELECT * FROM hosts;
 Seq Scan on hosts  (cost=0.00..2771.56 rows=66756 width=314) (actual time=0.008..2013.415 rows=16402 loops=1)
 Total runtime: 2048.486 ms
--
scanorama=# VACUUM ANALYZE ;
VACUUM
--
scanorama=# EXPLAIN ANALYZE SELECT * FROM hosts;
 Seq Scan on hosts  (cost=0.00..2718.57 rows=61357 width=314) (actual time=0.008..1676.283 rows=16402 loops=1)
 Total runtime: 1700.826 ms
--
scanorama=# CLUSTER hosts_pkey ON hosts ;
CLUSTER
--
scanorama=# EXPLAIN ANALYZE SELECT * FROM hosts;
 Seq Scan on hosts  (cost=0.00..680.02 rows=16402 width=314) (actual time=0.008..31.205 rows=16402 loops=1)
 Total runtime: 53.635 ms
--
scanorama=# SELECT * from pg_stat_all_tables WHERE relname LIKE 'hosts';
 relid  | schemaname | relname | seq_scan | seq_tup_read | idx_scan | idx_tup_fetch | n_tup_ins | n_tup_upd | n_tup_del
--------+------------+---------+----------+--------------+----------+---------------+-----------+-----------+-----------
 105805 | public     | hosts   |  1996430 |  32360280252 |  2736391 |       3301856 |       948 |   1403325 |       737

The information from pg_stat_all_tables is from the last 20 days.
--
INFO:  analyzing "public.hosts"
INFO:  "hosts": scanned 2536 of 2536 pages, containing 16410 live rows and 57042 dead rows; 16410 rows in sample, 16410 estimated total rows
INFO:  free space map contains 191299 pages in 786 relations
DETAIL:  A total of 174560 page slots are in use (including overhead).
174560 page slots are required to track all free space.
Current limits are:  200 page slots, 4000 relations, using 12131 KB.
--
The tables with this 'problem' are not big, so CLUSTER finishes very fast and it does not have an impact on access because of locking. But we wonder why this happens.

Do you need more information?

Thanks in advance.
regards
--
Rafael Martinez, <[EMAIL PROTECTED]>
Center for Information Technology Services
University of Oslo, Norway
PGP Public Key: http://folk.uio.no/rafael/

---(end of broadcast)---
TIP 4: Have you searched our list archives?
http://archives.postgresql.org
Re: [PERFORM] Need to run CLUSTER to keep performance
Rafael Martinez wrote:
> Hello
>
> This is a question about something we have seen sometimes in the last
> months. It happens with tables with a large amount of updates/selects
> compared with the amount of inserts/deletes. The sizes of these tables
> are small and the amount of rows too.
>
> The 'problem' is that performance decreases during the day and the only
> thing that helps is to run CLUSTER on the table with problems. VACUUM
> ANALYZE does not help.

Probably because all the live tuples are clustered at the end of the table, and the initial pages are polluted with dead tuples. Try vacuuming the table much more often, say every few minutes.

Your table is 2536 pages long, but it could probably be in the vicinity of 700 ...

--
Alvaro Herrera                         http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] Estimation problem with a LIKE clause containing a /
"Guillaume Smet" <[EMAIL PROTECTED]> writes: > On Nov 8, 2007 12:14 AM, Tom Lane <[EMAIL PROTECTED]> wrote: >> I've applied a patch that might help you: >> http://archives.postgresql.org/pgsql-committers/2007-11/msg00104.php > AFAICS, it doesn't seem to fix the problem. Hmm, can we see the pg_stats row for the ancestors column? regards, tom lane ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PERFORM] Need to run CLUSTER to keep performance
Heikki Linnakangas wrote:
> Rafael Martinez wrote:
>> The tables with this 'problem' are not big, so CLUSTER finishes very fast
>> and it does not have an impact on access because of locking. But we
>> wonder why this happens.
>
> 2 seconds for seq scanning 12 MB worth of data sounds like a lot. Have
> you increased shared_buffers from the default? Which operating system
> are you using? Shared memory access is known to be slower on Windows.
>

This is a server with 8GB of ram, we are using 25% as shared_buffers. Linux RHELAS4 with a 2.6.9-55.0.9.ELsmp kernel / x86_64.

> On a small table like that you could run VACUUM every few minutes
> without much impact on performance. That should keep the table size in
> check.
>

Ok, we run VACUUM ANALYZE only one time a day, every night. But we would expect the performance to get ok again after running vacuum, and it doesn't. Only CLUSTER helps.

I cannot see that we need to change the max_fsm_pages parameter; pg_class and ANALYZE give us this information today (not long ago a CLUSTER was executed):

--
scanorama=# VACUUM VERBOSE ANALYZE hosts;
INFO:  vacuuming "public.hosts"
INFO:  index "hosts_pkey" now contains 20230 row versions in 117 pages
DETAIL:  0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  "hosts": found 0 removable, 20230 nonremovable row versions in 651 pages
DETAIL:  3790 dead row versions cannot be removed yet.
There were 0 unused item pointers.
0 pages are entirely empty.
CPU 0.00s/0.00u sec elapsed 0.01 sec.
INFO:  vacuuming "pg_toast.pg_toast_376127"
INFO:  index "pg_toast_376127_index" now contains 131 row versions in 2 pages
DETAIL:  0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  "pg_toast_376127": found 0 removable, 131 nonremovable row versions in 33 pages
DETAIL:  0 dead row versions cannot be removed yet.
There were 0 unused item pointers.
0 pages are entirely empty.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  analyzing "public.hosts"
INFO:  "hosts": scanned 651 of 651 pages, containing 16440 live rows and 3790 dead rows; 16440 rows in sample, 16440 estimated total rows
VACUUM

scanorama=# SELECT relname, relpages, reltuples from pg_class WHERE relname LIKE 'hosts';
 relname | relpages | reltuples
---------+----------+-----------
 hosts   |      651 |     20230
--

Any more ideas?

regards,
--
Rafael Martinez, <[EMAIL PROTECTED]>
Center for Information Technology Services
University of Oslo, Norway
PGP Public Key: http://folk.uio.no/rafael/

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster
Re: [PERFORM] Need to run CLUSTER to keep performance
Heikki Linnakangas wrote: > > If the table is already bloated, a VACUUM won't usually shrink it. It > only makes the space available for reuse, but a sequential scan still > needs to go through a lot of pages. > > CLUSTER on the other hand repacks the tuples and gets rid of all the > unused space on pages. You need to run CLUSTER or VACUUM FULL once to > shrink the relation, but after that frequent-enough VACUUMs should keep > the table size down. > Ok, thanks for the advice. We will try this and will come back with more information. -- Rafael Martinez, <[EMAIL PROTECTED]> Center for Information Technology Services University of Oslo, Norway PGP Public Key: http://folk.uio.no/rafael/ ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
Re: [PERFORM] Need to run CLUSTER to keep performance
In response to Rafael Martinez <[EMAIL PROTECTED]>:
> Heikki Linnakangas wrote:
> > Rafael Martinez wrote:
> >> The tables with this 'problem' are not big, so CLUSTER finishes very fast
> >> and it does not have an impact on access because of locking. But we
> >> wonder why this happens.
> >
> > 2 seconds for seq scanning 12 MB worth of data sounds like a lot. Have
> > you increased shared_buffers from the default? Which operating system
> > are you using? Shared memory access is known to be slower on Windows.
> >
>
> This is a server with 8GB of ram, we are using 25% as shared_buffers.
> Linux RHELAS4 with a 2.6.9-55.0.9.ELsmp kernel / x86_64.
>
> > On a small table like that you could run VACUUM every few minutes
> > without much impact on performance. That should keep the table size in
> > check.
> >
>
> Ok, we run VACUUM ANALYZE only one time a day, every night. But we would
> expect the performance to get ok again after running vacuum, and it
> doesn't. Only CLUSTER helps.

If you have a large value for max_fsm_pages, but only vacuum once a day, you could end up with considerable bloat on a small table, but not enough to exceed max_fsm_pages (thus you wouldn't see any warnings/errors).

I recommend either:
a) autovacuum, with aggressive settings for that table
b) a more aggressive schedule for that particular table, maybe a cron job that vacuums that table every 5 minutes.

You could also do a combination, i.e. enable autovacuum with conservative settings and set a cron job to vacuum the table every 10 minutes. Vacuuming once a day is usually only enough if you have very minimal updates.

--
Bill Moran
Collaborative Fusion Inc.
http://people.collaborativefusion.com/~wmoran/
[EMAIL PROTECTED]
Phone: 412-422-3463x4023

---(end of broadcast)---
TIP 6: explain analyze is your friend
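A hedged sketch of option (a) on 8.1, where per-table autovacuum settings live in the pg_autovacuum system catalog. The column list below is assumed from the 8.1 catalog layout and the thresholds are purely illustrative; check the catalog definition on your own server before relying on it:

  INSERT INTO pg_autovacuum
         (vacrelid, enabled,
          vac_base_thresh, vac_scale_factor,
          anl_base_thresh, anl_scale_factor,
          vac_cost_delay, vac_cost_limit)
  VALUES ('hosts'::regclass, true,
          200, 0.05,      -- vacuum once roughly 200 + 5% of the rows are dead
          200, 0.05,      -- analyze on the same schedule
          -1, -1);        -- -1 means fall back to the global cost settings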
Re: [PERFORM] Need to run CLUSTER to keep performance
Rafael Martinez wrote:
> Heikki Linnakangas wrote:
>> On a small table like that you could run VACUUM every few minutes
>> without much impact on performance. That should keep the table size in
>> check.
>
> Ok, we run VACUUM ANALYZE only one time a day, every night. But we would
> expect the performance to get ok again after running vacuum, and it
> doesn't. Only CLUSTER helps.

If the table is already bloated, a VACUUM won't usually shrink it. It only makes the space available for reuse, but a sequential scan still needs to go through a lot of pages.

CLUSTER on the other hand repacks the tuples and gets rid of all the unused space on pages. You need to run CLUSTER or VACUUM FULL once to shrink the relation, but after that frequent-enough VACUUMs should keep the table size down.

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings
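A small illustration of the distinction Heikki draws, using the objects from this thread (the sizes in the comments echo the figures posted earlier; exact numbers will vary):

  VACUUM hosts;                                       -- dead space becomes reusable, file size stays put
  SELECT pg_size_pretty(pg_relation_size('hosts'));   -- still ~12 MB
  CLUSTER hosts_pkey ON hosts;                        -- rewrites the table compactly (exclusive lock)
  SELECT pg_size_pretty(pg_relation_size('hosts'));   -- now roughly the size of the live data only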
Re: [PERFORM] Need to run CLUSTER to keep performance
Alvaro Herrera wrote:
> Rafael Martinez wrote:
>> The 'problem' is that performance decreases during the day and the only
>> thing that helps is to run CLUSTER on the table with problems. VACUUM
>> ANALYZE does not help.
>
> Probably because all the live tuples are clustered at the end of the
> table, and the initial pages are polluted with dead tuples. Try
> vacuuming the table much more often, say every few minutes.
>
> Your table is 2536 pages long, but it could probably be in the vicinity
> of 700 ...
>

We ran VACUUM ANALYZE every 10 minutes for 2-3 days to see if it helped, but when it didn't we went back to the old configuration (once a day).

Yes, after a CLUSTER we are using 517 pages. But the table does not grow much; it is always around 12-20MB, so it looks like vacuum works without problems.

regards,
--
Rafael Martinez, <[EMAIL PROTECTED]>
Center for Information Technology Services
University of Oslo, Norway
PGP Public Key: http://folk.uio.no/rafael/

---(end of broadcast)---
TIP 6: explain analyze is your friend
Re: [PERFORM] dell versus hp
On Nov 6, 2007, at 1:10 PM, Greg Smith wrote:

elsewhere. But once you have enough disks in an array to spread all the load over that itself may improve write throughput enough to still be a net improvement.

This has been my experience with 14+ disks in an array (both RAID10 and RAID5). The difference is barely noticeable.

---(end of broadcast)---
TIP 6: explain analyze is your friend
Re: [PERFORM] Estimation problem with a LIKE clause containing a /
"Guillaume Smet" <[EMAIL PROTECTED]> writes: > On Nov 8, 2007 12:14 AM, Tom Lane <[EMAIL PROTECTED]> wrote: >> I've applied a patch that might help you: >> http://archives.postgresql.org/pgsql-committers/2007-11/msg00104.php > AFAICS, it doesn't seem to fix the problem. I just compiled > REL8_1_STABLE branch and I still has the following behaviour: OK, I tried it in fr_FR locale and what I find is that regression=# select '123/' < '1230'::text; ?column? -- t (1 row) so make_greater_string() will still think that its first try at generating an upper-bound string is good enough. However regression=# select '123/1' < '1230'::text; ?column? -- f (1 row) so the data starting with '123/' is still outside the generated range, leading to a wrong estimate. I didn't see this behavior yesterday but I was experimenting with en_US which I guess has different rules. What I am tempted to do about this is have make_greater_string tack "zz" onto the supplied prefix, so that it would have to find a string that compares greater than "123/zz" before reporting success. This is getting pretty klugy though, so cc'ing to pgsql-hackers to see if anyone has a better idea. regards, tom lane ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] Need to run CLUSTER to keep performance
[EMAIL PROTECTED] (Rafael Martinez) writes: > Heikki Linnakangas wrote: >> On a small table like that you could run VACUUM every few minutes >> without much impact on performance. That should keep the table size in >> check. >> > > Ok, we run VACUUM ANALYZE only one time a day, every night. But we would > espect the performance to get ok again after running vacuum, and it > doesn't. Only CLUSTER helps. You have characterized the shape of the problem Right There. If you only VACUUM that table once a day, then it has a whole day to get cluttered with dead tuples, which increases its size to encompass 651 pages, and NOTHING ever allows it to shrink back to a small size. Plain VACUUM (or VACUUM ANALYZE) does not attempt to shrink table sizes. Only VACUUM FULL and CLUSTER do that. Here are some options to "parameterize" your choices: - If you vacuum the table often enough that only 10% of the table consists of dead tuples, then you can expect the table to perpetually have 10% of dead space. - If you vacuum the table seldom enough that 90% of the table may be expected to consist of dead tuples, then you can expect this table to consistently have 90% of its space be "dead." It sounds like this particular table needs to be vacuumed quite a bit more frequently than once a day. On our systems, we have certain tables where tuples get killed off so frequently that we find it worthwhile to vacuum those tables once every two to three minutes. If we didn't, we'd see application performance bog down until it forced us to CLUSTER or VACUUM FULL the table. -- "cbbrowne","@","acm.org" http://linuxfinances.info/info/linux.html "How much more helpful could I be than to provide you with the appropriate e-mail address? I could engrave it on a clue-by-four and deliver it to you in Chicago, I suppose." -- Seen on Slashdot... ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [PERFORM] dell versus hp
On Nov 6, 2007, at 5:12 AM, Tore Halset wrote: Here are our current alternatives: Two things I recommend. If the drives are made by western digital, run away. If the PERC5/i is an Adaptec card, run away. Max out your cache RAM on the RAID card. 256 is the minimum when you have such big data sets that need the big disks you're looking at. ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [PERFORM] Need to run CLUSTER to keep performance
Rafael Martinez <[EMAIL PROTECTED]> writes: > Heikki Linnakangas wrote: >> On a small table like that you could run VACUUM every few minutes >> without much impact on performance. That should keep the table size in >> check. > Ok, we run VACUUM ANALYZE only one time a day, every night. There's your problem. Reading between the lines I gather that you think an update is "free" in the sense of not creating a need for vacuum. It's not --- it's exactly equivalent to an insert + a delete, and it leaves behind a dead row that needs to be vacuumed. If you do a lot of updates, you need to vacuum. regards, tom lane ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
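A hedged demonstration of Tom's point with a throwaway table (the names are made up for the example; any 8.x release should behave the same way):

  CREATE TEMP TABLE upd_demo (id int PRIMARY KEY, v int);
  INSERT INTO upd_demo SELECT g, 0 FROM generate_series(1, 10000) g;
  UPDATE upd_demo SET v = v + 1;   -- leaves 10000 dead row versions behind
  UPDATE upd_demo SET v = v + 1;   -- and another 10000; the table file keeps growing
  VACUUM VERBOSE upd_demo;         -- reports the dead versions and makes the space reusable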
Re: [PERFORM] Estimation problem with a LIKE clause containing a /
On Nov 8, 2007 4:01 PM, Tom Lane <[EMAIL PROTECTED]> wrote:
> Hmm, can we see the pg_stats row for the ancestors column?

Sure:

 schemaname        | public
 tablename         | cms_items
 attname           | ancestors
 null_frac         | 0
 avg_width         | 32
 n_distinct        | -1
 most_common_vals  |
 most_common_freqs |
 histogram_bounds  | {10011/10010/10009/10018/2554055/,10011/10010/84022/23372040/,10011/2233043/2233042/2233041/,10011/3985097/5020039/,10011/872018/13335056/1051/,1062/22304709/22304714/,1062/2489/2492/27861901/,1062/2527/2530/29658392/,1062/2698/2705/6014040/,1062/52354/52355/255038/255037/,9846852/}
 correlation       | -0.151713

I can provide the data if needed, there's nothing confidential in them.

--
Guillaume

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at
http://www.postgresql.org/about/donate
Re: [PERFORM] dell versus hp
On Nov 8, 2007 10:43 AM, Vivek Khera <[EMAIL PROTECTED]> wrote: > > On Nov 6, 2007, at 1:10 PM, Greg Smith wrote: > > > elsewhere. But once you have enough disks in an array to spread all > > the load over that itself may improve write throughput enough to > > still be a net improvement. > > This has been my expeience with 14+ disks in an array (both RAID10 and > RAID5). The difference is barely noticeable. Mine too. I would suggest though, that by the time you get to 14 disks, you switch from RAID-5 to RAID-6 so you have double redundancy. Performance of a degraded array is better in RAID6 than RAID5, and you can run your rebuilds much slower since you're still redundant. > If the PERC5/i is an Adaptec card, run away. I've heard the newest adaptecs, even the perc implementations aren't bad. Of course, that doesn't mean I'm gonna use one, but who knows? They might have made a decent card after all. ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] hp ciss on freebsd
On Nov 5, 2007, at 8:19 AM, Claus Guttesen wrote: I will get four 72 GB sas-disks at 15K rpm. Reading the archives suggest raid 1+0 for optimal read/write performance, but with a solid raid-controller raid 5 will also perform very well when reading. If you only have 4 drives, I'd recommend not to go with RAID5. You want to max out spindles. The 256k RAM may not be enough of a cache too. ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
Re: [PERFORM] Join performance
"Steinar H. Gunderson" <[EMAIL PROTECTED]> writes: > On Thu, Nov 08, 2007 at 04:47:09PM -0600, Pepe Barbe wrote: >> I am having an issue on PostgreSQL 8.0.12. In the past we had performance >> issues with the query planner for queries on some tables where we knew we >> had indexes and it was doing a sequential scan, and for this reason we >> issue "SET enable_seqscan = FALSE" for some queries. > This is a bad idea in general. Indeed. A less brute-force way of getting the planner to favor indexscans is to reduce random_page_cost ... have you experimented with that? Also, consider updating to 8.2.x, which has an improved cost model for indexscans and will more often make the correct choice without such shenanigans. regards, tom lane ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [PERFORM] dell versus hp
On Nov 8, 2007 2:56 PM, Alan Hodgson <[EMAIL PROTECTED]> wrote:
> On Thursday 08 November 2007, Dimitri Fontaine <[EMAIL PROTECTED]>
> > Is raid6 better than raid10 in term of overall performances, or a better
> > cut when you need capacity more than throughput?
>
> You can't touch RAID 10 for performance or reliability. The only reason to
> use RAID 5 or RAID 6 is to get more capacity out of the same drives.

Actually, RAID6 is about the same on reliability, since it has double parity and theoretically ANY TWO disks could fail, and RAID6 will still have all your data. If the right two disks fail in a RAID-10 you lose everything. Admittedly, that's a pretty remote possibility, but so is three drives failing at once in a RAID-6.

For performance RAID-10 is still pretty much the best choice.

---(end of broadcast)---
TIP 6: explain analyze is your friend
Re: [PERFORM] Join performance
On Thu, Nov 08, 2007 at 04:47:09PM -0600, Pepe Barbe wrote: > I am having an issue on PostgreSQL 8.0.12. In the past we had performance > issues with the query planner for queries on some tables where we knew we > had indexes and it was doing a sequential scan, and for this reason we > issue "SET enable_seqscan = FALSE" for some queries. This is a bad idea in general. Did you really measure that this made queries run faster? Generally, using an index is not always a win, and the planner tries to figure out when it isn't. Setting it globally is seldom a good idea anyway; if it really _is_ a win for a given query, you could always set it locally in that session. > Any ideas what could I try to fix this problem? Re-enable seqscan? /* Steinar */ -- Homepage: http://www.sesse.net/ ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
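A minimal sketch of the per-query approach Steinar mentions, scoping the override to a single transaction instead of the whole installation (query taken from the original post):

  BEGIN;
  SET LOCAL enable_seqscan = off;   -- reverts automatically at COMMIT or ROLLBACK
  SELECT * FROM gsm_sector_metrics
    NATURAL JOIN gsm_amr_metrics
    INNER JOIN temp_busy_hr USING (start_time, bsc_id, sect_id);
  COMMIT;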
Re: [PERFORM] Join performance
Ooops, sorry about the multiple copies there --- not sure what happened. regards, tom lane ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
[PERFORM] Join performance
Hello,

I am having an issue on PostgreSQL 8.0.12. In the past we had performance issues with the query planner for queries on some tables where we knew we had indexes and it was doing a sequential scan, and for this reason we issue "SET enable_seqscan = FALSE" for some queries.

Recently we have stumbled upon one of these kinds of queries that is giving terrible performance, because seqscan is disabled. I've reduced the problem to a command like this one:

SELECT * from gsm_sector_metrics NATURAL JOIN gsm_amr_metrics INNER JOIN temp_busy_hr USING(start_time,bsc_id,sect_id);

Where temp_busy_hr is a temporary table. If the previous query is issued with enable_seqscan = TRUE, it runs within a reasonable time; otherwise it runs forever.

The query plan for the previous query with enable_seqscan = TRUE:

 Limit  (cost=0.00..384555.98 rows=1 width=3092)
   ->  Nested Loop  (cost=0.00..384555.98 rows=1 width=3092)
         Join Filter: (("inner".bsc_id = "outer".bsc_id) AND ("inner".site_id = "outer".site_id) AND ("inner".sect_id = "outer".sect_id))
         ->  Nested Loop  (cost=0.00..368645.64 rows=28 width=1192)
               Join Filter: (("outer".sect_id = "inner".sect_id) AND ("outer".bsc_id = "inner".bsc_id))
               ->  Seq Scan on temp_busy_hr  (cost=0.00..24.00 rows=1400 width=24)
               ->  Index Scan using gsm_amr_start_time_idx on gsm_amr_metrics  (cost=0.00..226.66 rows=2094 width=1168)
                     Index Cond: ("outer".start_time = gsm_amr_metrics.start_time)
         ->  Index Scan using gsm_sector_start_time_idx on gsm_sector_metrics t1  (cost=0.00..528.77 rows=1973 width=1936)
               Index Cond: (t1.start_time = "outer".start_time)
(10 rows)

and the plan for enable_seqscan = FALSE:

 Limit  (cost=10097.16..100720844.01 rows=1 width=3092)
   ->  Nested Loop  (cost=10097.16..100720844.01 rows=1 width=3092)
         Join Filter: (("inner".bsc_id = "outer".bsc_id) AND ("inner".site_id = "outer".site_id) AND ("inner".sect_id = "outer".sect_id))
         ->  Merge Join  (cost=10097.16..100704933.67 rows=28 width=1192)
               Merge Cond: ("outer".start_time = "inner".start_time)
               Join Filter: (("inner".sect_id = "outer".sect_id) AND ("inner".bsc_id = "outer".bsc_id))
               ->  Index Scan using gsm_amr_start_time_idx on gsm_amr_metrics  (cost=0.00..631211.45 rows=6005551 width=1168)
               ->  Sort  (cost=10097.16..10100.66 rows=1400 width=24)
                     Sort Key: temp_busy_hr.start_time
                     ->  Seq Scan on temp_busy_hr  (cost=1.00..10024.00 rows=1400 width=24)
         ->  Index Scan using gsm_sector_start_time_idx on gsm_sector_metrics t1  (cost=0.00..528.77 rows=1973 width=1936)
               Index Cond: (t1.start_time = "outer".start_time)
(12 rows)

Any ideas on what I could try to fix this problem?

Thanks,
Pepe

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?
http://www.postgresql.org/docs/faq
Re: [PERFORM] dell versus hp
On Thursday 08 November 2007, Dimitri Fontaine <[EMAIL PROTECTED]> > Is raid6 better than raid10 in term of overall performances, or a better > cut when you need capacity more than throughput? You can't touch RAID 10 for performance or reliability. The only reason to use RAID 5 or RAID 6 is to get more capacity out of the same drives. -- Alan ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [PERFORM] dell versus hp
>>> On Thu, Nov 8, 2007 at 2:14 PM, in message <[EMAIL PROTECTED]>, Dimitri Fontaine <[EMAIL PROTECTED]> wrote: > The Dell 2900 5U machine has 10 spindles max, that would make 2 for the OS > (raid1) and 8 for mixing WAL and data... not enough to benefit from the > move, > or still to test? From our testing and various posts on the performance list, you can expect a good battery backed caching RAID controller will probably eliminate most of the performance difference between separate WAL drives and leaving them on the same RAID array with the rest of the database. See, for example: http://archives.postgresql.org/pgsql-performance/2007-02/msg00026.php Ben found a difference of "a few percent"; I remember seeing a post from someone who did a lot of testing and found a difference of 1%. As stated in the above referenced posting, it will depend on your workload (and your hardware) so it is best if you can do some realistic tests. -Kevin ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PERFORM] dell versus hp
On Thursday 08 November 2007 19:22:48, Scott Marlowe wrote:
> On Nov 8, 2007 10:43 AM, Vivek Khera <[EMAIL PROTECTED]> wrote:
> > On Nov 6, 2007, at 1:10 PM, Greg Smith wrote:
> > > elsewhere. But once you have enough disks in an array to spread all
> > > the load over that itself may improve write throughput enough to
> > > still be a net improvement.
> >
> > This has been my experience with 14+ disks in an array (both RAID10 and
> > RAID5). The difference is barely noticeable.
>
> Mine too.

May we conclude from this that mixing WAL and data onto the same array is a good idea starting at 14 spindles?

The Dell 2900 5U machine has 10 spindles max, that would make 2 for the OS (raid1) and 8 for mixing WAL and data... not enough to benefit from the move, or still to test?

> I would suggest though, that by the time you get to 14
> disks, you switch from RAID-5 to RAID-6 so you have double redundancy.
> Performance of a degraded array is better in RAID6 than RAID5, and
> you can run your rebuilds much slower since you're still redundant.

Is raid6 better than raid10 in terms of overall performance, or a better cut when you need capacity more than throughput?

Thanks for sharing the knowledge, regards,
--
dim

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] dell versus hp
On Nov 8, 2007, at 1:22 PM, Scott Marlowe wrote: I've heard the newest adaptecs, even the perc implementations aren't bad. I have a pair of Adaptec 2230SLP cards. Worst. Just replaced them on Tuesday with fibre channel cards connected to external RAID enclosures. Much nicer. ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] Estimation problem with a LIKE clause containing a /
Gregory Stark <[EMAIL PROTECTED]> writes: > Doesn't really strike at the core reason that this is so klugy though. Surely > the "right" thing is to push the concept of open versus closed end-points > through deeper into the estimation logic? No, the right thing is to take the folk who defined "dictionary sort order" out behind the barn and shoot 'em ;-). This has got nothing to do with open/closed endpoints and everything to do with the bizarre sorting rules used by some locales. In particular the reason I want to append a letter is that some locales discriminate against non-letter characters in the first pass of sorting. I did do some experimentation and found that among the ASCII characters (ie, codes 32-126), nearly all the non-C locales on my Fedora machine sort Z last and z next-to-last or vice versa. Most of the remainder sort digits last and z or Z as the last non-digit character. Since Z is not that close to the end of the sort order in C locale, however, z seems the best bet. regards, tom lane ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
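A hedged sketch of the experiment Tom describes: rank the printable ASCII characters under the database's collation and look at what sorts last (the ordering follows the cluster's lc_collate, so results differ by locale):

  SELECT code, chr(code) AS ch
  FROM generate_series(32, 126) AS code
  ORDER BY chr(code) DESC
  LIMIT 5;   -- in many non-C locales this shows z/Z (or the digits) at the top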
Re: [PERFORM] Estimation problem with a LIKE clause containing a /
"Tom Lane" <[EMAIL PROTECTED]> writes: > What I am tempted to do about this is have make_greater_string tack "zz" > onto the supplied prefix, so that it would have to find a string that > compares greater than "123/zz" before reporting success. This is > getting pretty klugy though, so cc'ing to pgsql-hackers to see if anyone > has a better idea. Hm, instead of "zz" is there a convenient way to find out what actual character sorts last amongst all the single characters in the locale's encoding? Doesn't really strike at the core reason that this is so klugy though. Surely the "right" thing is to push the concept of open versus closed end-points through deeper into the estimation logic? -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's RemoteDBA services! ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
Re: [HACKERS] [PERFORM] Estimation problem with a LIKE clause containing a /
I wrote: > I did do some experimentation and found that among the ASCII characters > (ie, codes 32-126), nearly all the non-C locales on my Fedora machine > sort Z last and z next-to-last or vice versa. Most of the remainder > sort digits last and z or Z as the last non-digit character. Since Z is > not that close to the end of the sort order in C locale, however, z > seems the best bet. With still further experimentation, it seems that doesn't work very well, because the locales that sort digits last also seem not to discriminate against digits in their first pass. What did seem to work was: * Determine which of the strings "Z", "z", "y", "9" is seen as largest by strcoll(). * Append this string to the given input. * Search (using the CVS-HEAD make_greater_string logic) for a string greater than that. This rule works for all the locales I have installed ... but I don't have any Far Eastern locales installed. Also, my test cases are only covering ASCII characters, and I believe many locales have some non-ASCII letters that sort after 'Z'. I'm not sure how hard we need to try to cover those corner cases, though. It is ultimately only an estimate... regards, tom lane ---(end of broadcast)--- TIP 6: explain analyze is your friend
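The first step of that rule can be mimicked in SQL as a hedged illustration of what the backend would decide via strcoll() (written with 8.0-compatible syntax; the candidate list is exactly the one Tom names):

  SELECT s AS suffix_to_append
  FROM (SELECT 'Z'::text AS s
        UNION ALL SELECT 'z'
        UNION ALL SELECT 'y'
        UNION ALL SELECT '9') AS candidates
  ORDER BY s DESC     -- uses the locale's collation, like strcoll()
  LIMIT 1;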
Re: [PERFORM] How to avoid hashjoin and mergejoin
Larry, Considering these recommendations, let's try setting shared_buffers to 2GB and work_mem to 16MB. The thing is that work_mem is per connection, and if we get too aggressive and we get a lot of simultaneous users, we can potentially eat up a lot of memory. So 2GB + (100 * 16MB) = 3.6GB total RAM eaten up under peak load for these two values alone. If we wanted to get more aggressive, we can raise work_mem. Carlo -Original Message- From: Scott Marlowe [mailto:[EMAIL PROTECTED] Sent: November 1, 2007 5:39 PM To: Carlo Stonebanks Cc: pgsql-performance@postgresql.org Subject: Re: [PERFORM] How to avoid hashjoin and mergejoin On 11/1/07, Carlo Stonebanks <[EMAIL PROTECTED]> wrote: > I am comparing the same query on two different PG 8.2 servers, one Linux > (8GB RAM) and one Windows (32GB RAM). Both have similar drives and CPU's. > > The Windows posgrestsql.config is pretty well tuned but it looks like > someone had wiped out the Linux config so the default one was re-installed. > All performance-related memory allocation values seem to be set to the > defaults, but mods have been made: max_connections = 100 and shared_buffers > = 32MB. > > The performance for this query is terrible on the Linux server, and good on > the Windows server - presumably because the original Linux PG config has > been lost. This query requires: that "set enable_seqscan to 'off';" Have you run analyze on the server yet? A few general points on performance tuning. With 8.2 you should set shared_buffers to a pretty big chunk of memory on linux, up to 25% or so. That means 32 Meg shared buffers is REAL low for a linux server. Try running anywhere from 512Meg up to 1Gig for starters and see if that helps too. Also turn up work_mem to something like 16 to 32 meg then restart the server after making these changes. Then give us the explain analyze output with all the enable_xxx set to ON. summary: analyze, increase shared_buffers and work_mem, give us explain analyze.
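The arithmetic above, written out as a hedged sanity check from psql (figures copied from the message; 100 is the max_connections value being assumed, and the '16MB' unit syntax needs 8.2):

  SET work_mem = '16MB';   -- per-sort/per-hash allowance, can also be set per session
  SELECT pg_size_pretty(((2 * 1024 + 100 * 16)::bigint) * 1024 * 1024)
         AS worst_case_footprint;   -- shared_buffers + max_connections * work_mem, about 3.6 GB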
Re: [PERFORM] Hardware for PostgreSQL
On Wednesday 31 October 2007 12:45, Ketema wrote:
> I am trying to build a very Robust DB server that will support 1000+
> concurrent users (already have seen a max of 237, no pooling being
> used). I have read so many articles now that I am just saturated. I
> have a general idea but would like feedback from others.
>

Most of the other answers you've gotten have been pretty good, but I had some questions on the above; specifically, is there a reason you're avoiding pooling? (Something like pgbouncer can work wonders.) Are your 1000+ concurrent users working in something like a web environment, where they won't need a 1:1 user:connection map to service them all, or are these going to be more permanent connections into the system? FWIW I've done 1000 simultaneous connections on pretty basic hardware, but you need to have the right kind of workloads to be able to do it.

>
> Who has built the biggest baddest Pg server out there and what do you
> use?
>

While I'm not sure this will be that much help, I'd feel remiss if I didn't point you to it...
http://www.lethargy.org/~jesus/archives/66-Big-Bad-PostgreSQL.html

--
Robert Treat
Build A Brighter LAMP :: Linux Apache {middleware} PostgreSQL

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster
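A hedged way to answer the question of whether a 1:1 user-to-connection mapping is really needed: count how many backends are actually doing work at any instant (column names as in 8.x, and idle sessions only show '<IDLE>' when command-string stats collection is enabled):

  SELECT count(*) AS open_connections,
         sum(CASE WHEN current_query <> '<IDLE>' THEN 1 ELSE 0 END) AS actively_working
  FROM pg_stat_activity;
  -- if actively_working stays far below open_connections, a pooler such as
  -- pgbouncer can front the same users with a much smaller number of real connections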
[PERFORM] Help understanding stat numbers
I have a database where I dropped all indexes on a table last night and built a new set of indexes. The goal is to try to let the database have fewer indexes and use them more. I removed a bunch of indexes that were surviving from our 7.3 days whose functionality will now be covered by 8.1's use of multiple indexes. Anyway, except for the primary key, all indexes were dropped and then the new indexes were created.

However, I am confused by what pg_stat_user_indexes and pg_statio_user_indexes are telling me. Which one is correct?

pg_stat_user_indexes is reporting this:

"indexrelname";"idx_scan";"idx_tup_read";"idx_tup_fetch"
"clmhdr_pkey";1583576;1577152;1577027
"hdr_clm_status_partial_idx";5243;6999857;372251
"hdr_create_dt_idx";1010;1420708;3656
"hdr_user_id_idx";71;928074;918439
"hdr_pat_cntl_nbr_idx";14;42;29
"hdr_clm_type_idx";1;673982;0
"hdr_process_dt_idx";1;22050;0
"erb_hdr_create_dt_idx";0;0;0
"erb_hdr_process_dt_idx";0;0;0
"erb_hdr_stmt_from_dt_idx";0;0;0
"erb_hdr_stmt_thru_dt_idx";0;0;0
"erb_hdr_transmit_dt_idx";0;0;0
"hdr_accepted_dt_idx";0;0;0
"hdr_asc_resp_rpt_cd_idx";0;0;0
"hdr_bill_type_idx";0;0;0
"hdr_fss_clm_status_idx";0;0;0
"hdr_fss_process_dt_idx";0;0;0
"hdr_submit_mode_idx";0;0;0
"patient_name_idx";0;0;0
"statement_date_idx";0;0;0

pg_statio_user_indexes is reporting:

"indexrelname";"idx_blks_read";"idx_blks_hit"
"hdr_clm_status_partial_idx";182;59455
"clmhdr_pkey";115382;6540557
"erb_hdr_process_dt_idx";7943;32679
"erb_hdr_create_dt_idx";8000;32042
"erb_hdr_transmit_dt_idx";7953;31511
"erb_hdr_stmt_thru_dt_idx";8667;30924
"hdr_create_dt_idx";11988;42617
"erb_hdr_stmt_from_dt_idx";8632;30173
"hdr_fss_clm_status_idx";9920;32774
"hdr_bill_type_idx";9949;32730
"hdr_asc_resp_rpt_cd_idx";9916;32387
"hdr_clm_type_idx";11777;33130
"hdr_fss_process_dt_idx";11891;33423
"hdr_accepted_dt_idx";11913;32876
"hdr_process_dt_idx";11976;33049
"hdr_submit_mode_idx";13815;32932
"hdr_user_id_idx";17372;34188
"hdr_pat_cntl_nbr_idx";15061;29137
"statement_date_idx";18838;29834
"patient_name_idx";21619;26182

If there have been no scans on an index (according to pg_stat_user_indexes), why is pg_statio_user_indexes showing non-zero values in idx_blks_hit/read?

Please help me understand this apparent contradiction.

Thanks,
Chris

PG 8.1.3

---(end of broadcast)---
TIP 4: Have you searched our list archives?
http://archives.postgresql.org
Re: [PERFORM] Help understanding stat numbers
"Chris Hoover" <[EMAIL PROTECTED]> writes: > If there has been no scans on an index (as according to > pg_stat_user_indexes), why is pg_statio_user_indexes showing non 0 > values in idx_blks_hit/read? I grow weary, but I think that "scan" is only incremented by commencing a SELECT search using the index, whereas the block-level counts are also incremented when the index is modified by an insert or update. You may be looking at indexes that are eating update cycles but not being used for anything important... regards, tom lane ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
Re: [HACKERS] [PERFORM] Estimation problem with a LIKE clause containing a /
On Nov 9, 2007 3:08 AM, Tom Lane <[EMAIL PROTECTED]> wrote: > This rule works for all the locales I have installed ... but I don't > have any Far Eastern locales installed. Also, my test cases are only > covering ASCII characters, and I believe many locales have some non-ASCII > letters that sort after 'Z'. I'm not sure how hard we need to try to > cover those corner cases, though. It is ultimately only an estimate... My opinion is that it's acceptable to fix the problem for most cases in most locales because, as you said, it's only an estimate. We didn't have any report of this problem for years so it seems that it's not a common case or at least it's not common that the bad estimate leads to noticeably bad plans. As far as I understand what you plan to do, it doesn't seem to be something that prevents us to fix the problem afterwards if someone comes with an example which doesn't fit in the schema you're proposing and has a real performance problem with it. -- Guillaume ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings