Re: [PERFORM] Can this query go faster???

2005-12-06 Thread Markus Wollny
Hi,

 -----Original Message-----
 From: [EMAIL PROTECTED] 
 [mailto:[EMAIL PROTECTED] On Behalf 
 Of Joost Kraaijeveld
 Sent: Tuesday, December 6, 2005 10:44
 To: Pgsql-Performance
 Subject: [PERFORM] Can this query go faster???
 
 SELECT customers.objectid FROM prototype.customers, 
 prototype.addresses WHERE customers.contactaddress = 
 addresses.objectid ORDER BY zipCode asc, housenumber asc 
 LIMIT 1 OFFSET 283745
 
 Explain:
 
 Limit  (cost=90956.71..90956.71 rows=1 width=55)
   ->  Sort  (cost=90247.34..91169.63 rows=368915 width=55)
         Sort Key: addresses.zipcode, addresses.housenumber
         ->  Hash Join  (cost=14598.44..56135.75 rows=368915 width=55)
               Hash Cond: ("outer".contactaddress = "inner".objectid)
               ->  Seq Scan on customers  (cost=0.00..31392.15 rows=368915 width=80)
               ->  Hash  (cost=13675.15..13675.15 rows=369315 width=55)
                     ->  Seq Scan on addresses  (cost=0.00..13675.15 rows=369315 width=55)
 
 The customers table has an index on contactaddress and objectid.
 The addresses table has an index on zipcode+housenumber and objectid.

The planner chooses sequential scans on customers and addresses instead of using 
the indexes on contactaddress and objectid. In order to determine whether this is 
a sane decision, you should run EXPLAIN ANALYZE on this query twice, once with 
SET enable_seqscan = on; and once with SET enable_seqscan = off;. If the query is 
significantly faster with seqscans disabled, then something is amiss - either you 
haven't run ANALYZE recently enough, so the statistics are out of date, or you 
have random_page_cost set too high (look for the setting in postgresql.conf) - 
these two are the usual suspects.
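
For the record, a minimal sketch of that test; the SET is session-local, so it
won't affect other connections:

SET enable_seqscan = off;
EXPLAIN ANALYZE
SELECT customers.objectid
FROM prototype.customers, prototype.addresses
WHERE customers.contactaddress = addresses.objectid
ORDER BY zipCode asc, housenumber asc
LIMIT 1 OFFSET 283745;
RESET enable_seqscan;  -- back to the default (on)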

Kind regards

   Markus



Re: [PERFORM] Queries taking ages in PG 8.1, have been much faster in PG<=8.0

2005-12-05 Thread Markus Wollny
Hi! 

 -----Original Message-----
 From: Tom Lane [mailto:[EMAIL PROTECTED]] 
 Sent: Sunday, December 4, 2005 19:32
 To: Markus Wollny
 Cc: pgsql-performance@postgresql.org
 Subject: Re: [PERFORM] Queries taking ages in PG 8.1, have 
 been much faster in PG<=8.0 

 The data is not quite the same, right?  I notice different 
 numbers of rows being returned.  

No, you're right, I didn't manage to restore the 8.1 dump into the 8.0.3 
cluster, so I took the quick route and restored the last dump from my 8.0 
installation. The numbers should be roughly within the same range, though:

Table answer has got 8,646,320 rows (counted and estimated, as this db is not 
live, obviously), table participant has got 173,998 rows; for comparison:
The production db had an estimated 8,872,130, counted 8,876,648 rows for table 
answer, and estimated 178,165, counted 178,248 rows for participant. As the 
numbers are a mere 2% apart, I should think that this wouldn't make that much 
difference.

 It seems that checking question_id/value via the index, 
 rather than directly on the fetched tuple, is a net loss 
 here.  It looks like 8.1 would have made the right plan 
 choice if it had made a better estimate of the combined 
 selectivity of the question_id and value conditions, so 
 ultimately this is another manifestation of the lack of 
 cross-column statistics.  What I find interesting though is 
 that the plain index scan in 8.0 is so enormously cheaper 
 than it's estimated to be.  Perhaps the answer table in your 
 8.0 installation is almost perfectly ordered by session_id?

Not quite - there may be several concurrent sessions at any one time, but 
ordinarily the answers for one session_id would be quite close together, in a 
lot of cases even in perfect sequence, so "almost perfectly" might be a fair 
description, depending on the exact definition of "almost" :)

 Are you using default values for the planner cost parameters? 

I have to admit that I did tune the random_page_cost and effective_cache_size 
settings ages ago (7.1-ish) to values that seemed to work best then - and 
haven't touched them since, although my data pool has grown quite a bit over 
time. cpu_tuple_cost, cpu_index_tuple_cost and cpu_operator_cost are at their 
default values.

  It looks like reducing random_page_cost would help bring the 
 planner estimates into line with reality on your machines.

I had already set random_page_cost to 1.4, so I doubt that reducing the value 
further would do much good - reading the docs and the usual tuning suggestions, 
I would have thought that I should actually consider increasing it a bit, as 
not all of my data fits in memory any more. Do you nevertheless want me to try 
what happens if I reduce random_page_cost even further?
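
A minimal sketch of such an experiment, affecting only the current session:

SHOW random_page_cost;       -- currently 1.4 here
SET random_page_cost = 1.2;  -- hypothetical lower value, session-local
-- re-run EXPLAIN ANALYZE on the problem query and compare, then:
RESET random_page_cost;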

Kind regards

   Markus



Re: [PERFORM] Queries taking ages in PG 8.1, have been much faster in PG<=8.0

2005-12-05 Thread Markus Wollny
 

 -----Original Message-----
 From: Tom Lane [mailto:[EMAIL PROTECTED]] 
 Sent: Monday, December 5, 2005 15:33
 To: Markus Wollny
 Cc: pgsql-performance@postgresql.org
 Subject: Re: AW: [PERFORM] Queries taking ages in PG 8.1, 
 have been much faster in PG<=8.0 
 
 Could we see the pg_stats row for answer.session_id in both 
 8.0 and 8.1?

Here you are:

select null_frac
, avg_width
, n_distinct
, most_common_vals
, most_common_freqs
, histogram_bounds
, Correlation
from pg_stats
where schemaname = 'survey'
and tablename = 'answer'
and attname = 'session_id';

8.1:
null_frac   0
avg_width   4
n_distinct  33513
most_common_vals
{1013854,1017890,1021551,1098817,764249,766938,776353,780954,782232,785985}
most_common_freqs   
{0.001,0.001,0.001,0.001,0.00067,0.00067,0.00067,0.00067,0.00067,0.00067}
histogram_bounds
{757532,819803,874935,938170,1014421,1081507,1164659,1237281,1288267,1331016,1368939}
Correlation -0.0736492

8.0.3:
null_frac   0
avg_width   4
n_distinct  29287
most_common_vals
{765411,931762,983933,1180453,1181959,1229963,1280249,1288736,1314970,764901}
most_common_freqs   
{0.001,0.001,0.001,0.001,0.001,0.001,0.001,0.001,0.001,0.00067}
histogram_bounds
{757339,822949,875834,939085,1004782,1065251,1140682,1218336,1270024,1312170,1353082}
Correlation -0.237136

Kind regards

   Markus



Re: [PERFORM] Queries taking ages in PG 8.1, have been much faster in PG<=8.0

2005-12-05 Thread Markus Wollny
 -----Original Message-----
 From: Tom Lane [mailto:[EMAIL PROTECTED]] 
 Sent: Monday, December 5, 2005 16:12
 To: Markus Wollny
 Cc: pgsql-performance@postgresql.org
 Subject: Re: AW: AW: [PERFORM] Queries taking ages in PG 8.1, 
 have been much faster in PG<=8.0 
 
 Markus Wollny <[EMAIL PROTECTED]> writes:
  Could we see the pg_stats row for answer.session_id in 
 both 8.0 and 
  8.1?
 
  Here you are:
 
  8.1:
  Correlation -0.0736492
 
  8.0.3:
  Correlation -0.237136
 
 Interesting --- if the 8.1 database is a dump and restore of 
 the 8.0, you'd expect the physical ordering to be similar.  

I dumped the data from my 8.0.1 cluster on 2005-11-18 00:23 using pg_dumpall 
with no further options; the dump was passed through iconv to clear up some 
UTF-8 encoding issues, then restored into a fresh 8.1 cluster, which then went 
into production; I used the very same dump to restore the 8.0.3 cluster. So 
there is a difference between the two datasets: an additional 230.328 rows in 
the answers table.

 Why is 8.1 showing a significantly lower correlation?  That 
 has considerable impact on the estimated cost of an indexscan 
 (plain not bitmap), and so it might explain why 8.1 is 
 mistakenly avoiding the indexscan ...

I just ran a VACUUM ANALYZE on the table, just to make sure that the stats are 
up to date (forgot that on the previous run, thanks to pg_autovacuum...), and 
the correlation on the 8.1 installation is now calculated as -0.158921. That's 
still noticeably weaker than the -0.237136 of the 8.0 db. I don't know whether 
that is significant, though.
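
For completeness, this is roughly what I ran to refresh and re-check the statistic:

VACUUM ANALYZE survey.answer;
SELECT correlation
FROM pg_stats
WHERE schemaname = 'survey'
AND tablename = 'answer'
AND attname = 'session_id';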

Kind regards

   Markus



Re: [PERFORM] Queries taking ages in PG 8.1, have been much faster in PG<=8.0

2005-12-05 Thread Markus Wollny
 
 -----Original Message-----
 From: [EMAIL PROTECTED] 
 [mailto:[EMAIL PROTECTED] On Behalf 
 Of Markus Wollny
 Sent: Monday, December 5, 2005 16:41
 To: Tom Lane
 Cc: pgsql-performance@postgresql.org
 Subject: Re: [PERFORM] Queries taking ages in PG 8.1, have 
 been much faster in PG<=8.0 

 an additional 230.328 rows in the answers-table.

That was supposed to read 230,328 rows, sorry.



Re: [PERFORM] Queries taking ages in PG 8.1, have been much faster in PG<=8.0

2005-12-04 Thread Markus Wollny
Hi!

 -----Original Message-----
 From: Tom Lane [mailto:[EMAIL PROTECTED]]
 Sent: Thursday, December 1, 2005 17:26
 To: Markus Wollny
 Cc: pgsql-performance@postgresql.org
 Subject: Re: [PERFORM] Queries taking ages in PG 8.1, have
 been much faster in PG<=8.0

 It looks like set enable_nestloop = 0 might be a workable
 hack for the immediate need.

 Once you're not under deadline,
 I'd like to investigate more closely to find out why 8.1 does
 worse than 8.0 here.


I've just set up a PostgreSQL 8.0.3 installation ...

select version();
                                          version
---------------------------------------------------------------------------------------------
 PostgreSQL 8.0.3 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 3.3.5 (Debian 1:3.3.5-13)
(1 row)

...and restored a dump there; here's the explain analyze of the query for 8.0.3:

                                                                 QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=5193.63..5193.63 rows=3 width=16) (actual time=7365.107..7365.110 rows=3 loops=1)
   Sort Key: source.position
   ->  HashAggregate  (cost=5193.59..5193.60 rows=3 width=16) (actual time=7365.034..7365.041 rows=3 loops=1)
         ->  Nested Loop  (cost=0.00..5193.57 rows=3 width=16) (actual time=3190.642..7300.820 rows=11086 loops=1)
               ->  Nested Loop  (cost=0.00..3602.44 rows=4 width=20) (actual time=3169.968..5875.153 rows=11087 loops=1)
                     ->  Nested Loop  (cost=0.00..1077.95 rows=750 width=16) (actual time=36.599..2778.129 rows=158288 loops=1)
                           ->  Seq Scan on handy_java source  (cost=0.00..1.03 rows=3 width=14) (actual time=6.503..6.514 rows=3 loops=1)
                           ->  Index Scan using idx02_performance on answer  (cost=0.00..355.85 rows=250 width=8) (actual time=10.071..732.746 rows=52763 loops=3)
                                 Index Cond: ((answer.question_id = 16) AND (answer.value = "outer".id))
                     ->  Index Scan using pk_participant on participant  (cost=0.00..3.35 rows=1 width=4) (actual time=0.016..0.016 rows=0 loops=158288)
                           Index Cond: (participant.session_id = "outer".session_id)
                           Filter: ((status = 1) AND (date_trunc('month'::text, created) = date_trunc('month'::text, (now() - '2 mons'::interval))))
               ->  Index Scan using idx_answer_session_id on answer  (cost=0.00..397.77 rows=1 width=4) (actual time=0.080..0.122 rows=1 loops=11087)
                     Index Cond: ("outer".session_id = answer.session_id)
                     Filter: ((question_id = 6) AND (value = 1))
 Total runtime: 7365.461 ms
(16 rows)

Does this tell you anything useful? It's not on the same machine, mind you, but the PostgreSQL configuration is absolutely identical (apart from the autovacuum lines, which 8.0.3 doesn't like).

Kind regards

 Markus





[PERFORM] Queries taking ages in PG 8.1, have been much faster in PG<=8.0

2005-12-01 Thread Markus Wollny
Hi!

I've got an urgent problem with an application which is evaluating a
monthly survey; it's running quite a lot of queries like this:

select  SOURCE.NAME as TYPE,
   count(PARTICIPANT.SESSION_ID) as TOTAL
from  (
   select  PARTICIPANT.SESSION_ID
   from   survey.PARTICIPANT,
  survey.ANSWER
   where  PARTICIPANT.STATUS = 1
   and   date_trunc('month', PARTICIPANT.CREATED) = date_trunc('month',
now()-'1 month'::interval)
   and   PARTICIPANT.SESSION_ID = ANSWER.SESSION_ID
   and   ANSWER.QUESTION_ID = 6
   and   ANSWER.VALUE = 1
   )
   as PARTICIPANT,
   survey.ANSWER,
   survey.HANDY_JAVA SOURCE
where  PARTICIPANT.SESSION_ID = ANSWER.SESSION_ID
and   ANSWER.QUESTION_ID = 16
and   ANSWER.VALUE = SOURCE.ID
group by SOURCE.NAME,
   SOURCE.POSITION
order by  SOURCE.POSITION asc;

My current PostgreSQL version is PostgreSQL 8.1.0 on i686-pc-linux-gnu,
compiled by GCC gcc (GCC) 3.2. Up to 8.0, a query like this took a
couple of seconds, maybe even up to a minute. In 8.1 a query like this
takes from 30 minutes up to two hours to complete, depending on its
complexity. I've got autovacuum enabled and run a nightly VACUUM ANALYZE
over all of my databases. Here's some information about the relevant
tables: table answer has got ~ 8.9M rows (estimated 8,872,130, counted
8,876,648), participant has got ~ 178K rows (estimated 178,165, counted
178,248), HANDY_JAVA has got three rows. This is the
explain-analyze output for the above:

Sort  (cost=11383.09..11383.10 rows=3 width=16) (actual time=1952676.858..1952676.863 rows=3 loops=1)
  Sort Key: source.position
  ->  HashAggregate  (cost=11383.03..11383.07 rows=3 width=16) (actual time=1952676.626..1952676.635 rows=3 loops=1)
        ->  Nested Loop  (cost=189.32..11383.00 rows=5 width=16) (actual time=6975.812..1952371.782 rows=9806 loops=1)
              ->  Nested Loop  (cost=3.48..3517.47 rows=42 width=20) (actual time=6819.716..15419.930 rows=9806 loops=1)
                    ->  Nested Loop  (cost=3.48..1042.38 rows=738 width=16) (actual time=258.434..6233.039 rows=162723 loops=1)
                          ->  Seq Scan on handy_java source  (cost=0.00..1.03 rows=3 width=14) (actual time=0.093..0.118 rows=3 loops=1)
                          ->  Bitmap Heap Scan on answer  (cost=3.48..344.04 rows=246 width=8) (actual time=172.381..1820.499 rows=54241 loops=3)
                                Recheck Cond: ((answer.question_id = 16) AND (answer.value = "outer".id))
                                ->  Bitmap Index Scan on idx02_performance  (cost=0.00..3.48 rows=246 width=0) (actual time=98.321..98.321 rows=54245 loops=3)
                                      Index Cond: ((answer.question_id = 16) AND (answer.value = "outer".id))
                    ->  Index Scan using idx01_perf_0006 on participant  (cost=0.00..3.34 rows=1 width=4) (actual time=0.049..0.050 rows=0 loops=162723)
                          Index Cond: (participant.session_id = "outer".session_id)
                          Filter: ((status = 1) AND (date_trunc('month'::text, created) = date_trunc('month'::text, (now() - '1 mon'::interval))))
              ->  Bitmap Heap Scan on answer  (cost=185.85..187.26 rows=1 width=4) (actual time=197.490..197.494 rows=1 loops=9806)
                    Recheck Cond: (("outer".session_id = answer.session_id) AND (answer.question_id = 6) AND (answer.value = 1))
                    ->  BitmapAnd  (cost=185.85..185.85 rows=1 width=0) (actual time=197.421..197.421 rows=0 loops=9806)
                          ->  Bitmap Index Scan on idx_answer_session_id  (cost=0.00..2.83 rows=236 width=0) (actual time=0.109..0.109 rows=49 loops=9806)
                                Index Cond: ("outer".session_id = answer.session_id)
                          ->  Bitmap Index Scan on idx02_performance  (cost=0.00..182.77 rows=20629 width=0) (actual time=195.742..195.742 rows=165697 loops=9806)
                                Index Cond: ((question_id = 6) AND (value = 1))
Total runtime: 1952678.393 ms

I am really sorry, but currently I haven't got any 8.0-installation
left, so I cannot provide the explain (analyze) output for 8.0. 

I fiddled a little with the statement and managed to speed things up
quite a lot:
 
select  SOURCE.NAME as TYPE,
   count(ANSWER.SESSION_ID) as TOTAL
from survey.ANSWER,
   survey.HANDY_JAVA SOURCE
where  ANSWER.QUESTION_ID = 16
and   ANSWER.VALUE = SOURCE.ID
and ANSWER.SESSION_ID in (
   select  PARTICIPANT.SESSION_ID
   from   survey.PARTICIPANT,
  survey.ANSWER
   where  PARTICIPANT.STATUS = 1
   and   date_trunc('month', PARTICIPANT.CREATED) = date_trunc('month',
now()-'1 month'::interval)
   and   PARTICIPANT.SESSION_ID = ANSWER.SESSION_ID
   and   ANSWER.QUESTION_ID = 6
   and   ANSWER.VALUE = 1
   )
group by SOURCE.NAME,
   SOURCE.POSITION
order by  SOURCE.POSITION asc;

Here's the explain analyze output:
Sort  (cost=27835.39..27835.39 rows=3 width=16) (actual time=9609.207..9609.212 rows=3 loops=1)
  Sort Key: source.position
  ->  HashAggregate  

Re: [PERFORM] Queries taking ages in PG 8.1, have been much faster in PG<=8.0

2005-12-01 Thread Markus Wollny

 -----Original Message-----
 From: Tom Lane [mailto:[EMAIL PROTECTED]] 
 Sent: Thursday, December 1, 2005 17:26
 To: Markus Wollny
 Cc: pgsql-performance@postgresql.org
 Subject: Re: [PERFORM] Queries taking ages in PG 8.1, have 
 been much faster in PG<=8.0 
 
 It looks like set enable_nestloop = 0 might be a workable 
 hack for the immediate need.  

Wow - that works miracles :)

Sort  (cost=81813.13..81813.14 rows=3 width=16) (actual time=7526.745..7526.751 rows=3 loops=1)
  Sort Key: source.position
  ->  HashAggregate  (cost=81813.07..81813.11 rows=3 width=16) (actual time=7526.590..7526.601 rows=3 loops=1)
        ->  Merge Join  (cost=81811.40..81813.03 rows=5 width=16) (actual time=7423.289..7479.175 rows=9806 loops=1)
              Merge Cond: ("outer".id = "inner".value)
              ->  Sort  (cost=1.05..1.06 rows=3 width=14) (actual time=0.085..0.091 rows=3 loops=1)
                    Sort Key: source.id
                    ->  Seq Scan on handy_java source  (cost=0.00..1.03 rows=3 width=14) (actual time=0.039..0.049 rows=3 loops=1)
              ->  Sort  (cost=81810.35..81811.81 rows=583 width=8) (actual time=7423.179..7440.062 rows=9806 loops=1)
                    Sort Key: mafo.answer.value
                    ->  Hash Join  (cost=27164.31..81783.57 rows=583 width=8) (actual time=6757.521..7360.822 rows=9806 loops=1)
                          Hash Cond: ("outer".session_id = "inner".session_id)
                          ->  Bitmap Heap Scan on answer  (cost=506.17..54677.92 rows=88334 width=8) (actual time=379.245..2660.344 rows=162809 loops=1)
                                Recheck Cond: (question_id = 16)
                                ->  Bitmap Index Scan on idx_answer_question_id  (cost=0.00..506.17 rows=88334 width=0) (actual time=274.632..274.632 rows=162814 loops=1)
                                      Index Cond: (question_id = 16)
                          ->  Hash  (cost=26655.21..26655.21 rows=1175 width=8) (actual time=3831.362..3831.362 rows=9806 loops=1)
                                ->  Hash Join  (cost=4829.33..26655.21 rows=1175 width=8) (actual time=542.227..3800.985 rows=9806 loops=1)
                                      Hash Cond: ("outer".session_id = "inner".session_id)
                                      ->  Bitmap Heap Scan on answer  (cost=182.84..21429.34 rows=20641 width=4) (actual time=292.067..2750.376 rows=165762 loops=1)
                                            Recheck Cond: ((question_id = 6) AND (value = 1))
                                            ->  Bitmap Index Scan on idx02_performance  (cost=0.00..182.84 rows=20641 width=0) (actual time=167.306..167.306 rows=165769 loops=1)
                                                  Index Cond: ((question_id = 6) AND (value = 1))
                                      ->  Hash  (cost=4621.13..4621.13 rows=10141 width=4) (actual time=182.842..182.842 rows=11134 loops=1)
                                            ->  Index Scan using idx01_perf_0005 on participant  (cost=0.01..4621.13 rows=10141 width=4) (actual time=0.632..136.126 rows=11134 loops=1)
                                                  Index Cond: (date_trunc('month'::text, created) = date_trunc('month'::text, (now() - '1 mon'::interval)))
                                                  Filter: (status = 1)
Total runtime: 7535.398 ms

 Once you're not under deadline, 
 I'd like to investigate more closely to find out why 8.1 does 
 worse than 8.0 here.

Please tell me what I can do to help clear up this issue; I'd be very happy to 
help! Heck, I am happy anyway that there's such a quick fix, even if it's not a 
beautiful one :)
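
For the archives, the full extent of the hack as applied here, per session,
before running the report queries:

SET enable_nestloop = 0;  -- discourage nested-loop joins for this session only
-- ... run the survey queries ...
RESET enable_nestloop;    -- restore the default afterwards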

Kind regards

   Markus



[PERFORM] Strange planner decision on quite simple select

2005-10-25 Thread Markus Wollny
Hello!

I've got a table BOARD_MESSAGE (message_id int8, thread_id int8, ...)
with a primary key on message_id and a non-unique index on the not-null
column thread_id. A count(*) on BOARD_MESSAGE currently yields a total of
1231171 rows; the planner estimated a total of 1232530 rows in this table.
I've got pg_autovacuum running on the database and run an additional
VACUUM ANALYZE over it every night.

I've got a few queries of the following type:

select   * 
from     PUBLIC.BOARD_MESSAGE 
where    THREAD_ID = 3354253 
order by MESSAGE_ID asc 
limit    20 
offset   0; 


There are currently roughly 4500 rows with this thread_id in
BOARD_MESSAGE. Explain-output is like so:

                                      QUERY PLAN
--------------------------------------------------------------------------------------
 Limit  (cost=0.00..3927.22 rows=20 width=1148) 
   ->  Index Scan using pk_board_message on board_message  (cost=0.00..1100800.55 rows=5606 width=1148) 
         Filter: (thread_id = 3354253) 
(3 rows) 

I didn't have the patience to actually complete an explain analyze on
that one - I cancelled the query on several attempts after more than 40
minutes runtime. Now I fiddled a little with this statement and tried
nudging the planner in the right direction like so:

explain analyze select * from (
    select   * 
    from     PUBLIC.BOARD_MESSAGE 
    where    THREAD_ID = 3354253 
    order by MESSAGE_ID asc
) as foo 
limit  20 
offset 0; 
 
                                                                 QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=8083.59..8083.84 rows=20 width=464) (actual time=1497.455..1498.466 rows=20 loops=1) 
   ->  Subquery Scan foo  (cost=8083.59..8153.67 rows=5606 width=464) (actual time=1497.447..1498.408 rows=20 loops=1) 
         ->  Sort  (cost=8083.59..8097.61 rows=5606 width=1148) (actual time=1497.326..1497.353 rows=20 loops=1) 
               Sort Key: message_id 
               ->  Index Scan using nidx_bm_thread_id on board_message  (cost=0.00..7734.54 rows=5606 width=1148) (actual time=0.283..1431.752 rows=4215 loops=1)
                     Index Cond: (thread_id = 3354253) 
 Total runtime: 1502.138 ms 

Now this is much more like it. As far as I interpret the explain output,
in the former case the planner decides to read the whole table, all 1.2m
rows, in the order of its primary key on message_id and then filter out
the few thousand rows matching the requested thread_id. In the latter
case, it selects the few thousand rows with the matching thread_id
_first_ and _then_ sorts them according to their message_id. The former
attempt involves walking through more than a million rows and filtering
the result; the latter just uses the index to retrieve a few thousand
rows and sorts those - which is much more efficient.

What's more puzzling is that the results vary somewhat depending on the
overall load situation. When using the first approach without the
subselect, sometimes the planner chooses exactly the same plan as it
does with the second approach - with equally satisfying results in
regard to total execution time; sometimes it does use the first plan and
completes with a very acceptable execution time, too. But sometimes
(when overall load is sufficiently high, I presume) it just runs and
runs for minutes on end - I've had this thing running for more than one
hour on several occasions, until I made some changes to my app which
limit the maximum execution time for a query to no more than 55
seconds.

With this IMHO quite ugly subselect workaround, performance is
reproducibly stable and sufficiently good under any load, so I chose to
stick with it for the time being - but I'd still like to know whether I
could have done anything to make the planner choose the evidently better
plan for the first query without such a workaround.

Kind regards

   Markus



Re: [PERFORM] Strange planner decision on quite simple select

2005-10-25 Thread Markus Wollny
 
Hi!

 -----Original Message-----
 From: [EMAIL PROTECTED] 
 [mailto:[EMAIL PROTECTED] On Behalf 
 Of Richard Huxton
 Sent: Tuesday, October 25, 2005 12:07
 To: Markus Wollny
 Cc: pgsql-performance@postgresql.org
 Subject: Re: [PERFORM] Strange planner decision on quite simple select
 
 Hmm - it shouldn't take that long. If I'm reading this right, 
 it's expecting to have to fetch 5606 rows to match 
 thread_id=3354253 the 20 times you've asked for. Now, what it 
 probably doesn't know is that thread_id is correlated with 
 message_id quite highly (actually, I don't know that, I'm 
 guessing). So - it starts at message_id=1 and works along, 
 but I'm figuring that it needs to reach message_id's in the 
 3-4 million range to see any of the required thread.

Reading this I tried adding an AND MESSAGE_ID >= THREAD_ID to the 
WHERE clause; as you've guessed quite correctly, both message_id and thread_id 
are derived from the same sequence, and thread_id equals the lowest message_id 
in a thread. This alone did quite a lot to improve things - I got stable 
execution times down from an average 12 seconds to a mere 2 seconds - just 
about the same as with the subselect.

 Suggestions:
 1. Try ORDER BY thread_id,message_id and see if that nudges 
 things your way.
 2. Keep #1 and try replacing the index on (thread_id) with
 (thread_id,message_id)

Did both (though adding such an index during ordinary workload took some time 
as did the VACUUM ANALYZE afterwards) and that worked like a charm - I've got 
execution times down to as little as a few milliseconds - wow! Thank you very 
much for providing such insightful hints!
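
For the archives, this is essentially what that boils down to - statements as I
ran them, give or take the exact index name:

CREATE INDEX nidx_bm_thread_id_message_id
    ON PUBLIC.BOARD_MESSAGE (THREAD_ID, MESSAGE_ID);
VACUUM ANALYZE PUBLIC.BOARD_MESSAGE;
-- the old single-column index on thread_id is now redundant:
-- DROP INDEX nidx_bm_thread_id;

select   * 
from     PUBLIC.BOARD_MESSAGE 
where    THREAD_ID = 3354253 
order by THREAD_ID, MESSAGE_ID asc 
limit    20 
offset   0;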

Kind regards

   Markus



Re: [PERFORM] Help tuning postgres

2005-10-13 Thread Markus Wollny
[EMAIL PROTECTED] wrote:

 Have you tried reindexing your active tables?

 It will cause some performance hit while you are doing it. It
 sounds like something is bloating rapidly on your system and
 the indexes are one possible place where that could be happening.

You might consider using contrib/oid2name to monitor physical growth of
tables and indexes. There have been some issues with bloat in PostgreSQL
versions prior to 8.0; however, there might still be some issues under
certain circumstances even now, so it does pay to cast an eye on what's
going on. If you haven't run vacuum regularly, this might lead to
regular vacuums not reclaiming enough dead tuples in one go, so if
you've had quite a lot of UPDATE/DELETE activity going on in the past and
only just started to use pg_autovacuum after the DB has been in
production for quite a while, you might indeed have to run a VACUUM FULL
and/or REINDEX on the affected tables, both of which will more or less
lock out any client access to the tables as long as they're running.
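
A sketch of that kind of one-time cleanup - the table name is just a
placeholder, and both commands take exclusive locks, so schedule a
maintenance window:

VACUUM FULL VERBOSE public.mytable;  -- compacts the table and reclaims dead space
REINDEX TABLE public.mytable;        -- rebuilds all indexes on the table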

Kind regards

   Markus



[PERFORM] Forums tsearch2 - best practices reg. concurrency

2005-06-22 Thread Markus Wollny
Hello!
 
We're using PostgreSQL 8.0.1 as general backend for all of our websites,
including our online forums (aka bulletin boards or whatever you wish to
call them). As for full text search capabilities, we've chosen to
implement this via tsearch2. However, the tables themselves are quite
large, and as there's lots of weird user input in them (there's just no
way of limiting our users to proper orthography), so are the indexes; we
have already split up the main posting table in two, one containing the
more recent messages (6 months) and one for everything else.

Search capabilities have been limited to accessing only one of those,
either recent or archive. Still, the tsearch2-GiST-index for a table is
around 325MB in size; the recent messages table itself without any
indices weighs in at about 1.8GB containing over one million rows, the
archive-table is a little over 3GB and contains about 1.3 million rows.
A full text search in the table with the recent postings can take up to
five minutes.

This wouldn't be much of a problem, as we're providing other, quicker
search options (like searching for an author or a full text search just
on the topics); the problem with the full text search lies in the
locking mechanisms: As long as there's a search going on, all the
subsequent INSERTs or UPDATEs on that table fail due to timeout. This
means that currently, whenever we allow full text searching, there may
be a timeframe of more than one hour, during which users cannot write
any new postings in our forum or edit (i.e. update) anything. This is
hardly acceptable...

This is what I did to actually diagnose that simple tsearch2-related
SELECTs where causing the write-locks:

First I started a full text search query which I knew would run for over
four minutes. Then I waited for other users to try and post some
messages; soon enough a 'ps ax|grep wait' showed several waiting
INSERT/UPDATE backends. So I took a look at the locks:

select s.current_query as statement,
l.mode as lock_mode,
l.granted as lock_granted,
c.relname as locked_relation,
c.relnamespace as locked_relnamespace,
c.reltype as locked_reltype
from pg_stat_activity s,
pg_locks l,
pg_class c
where
l.pid = s.procpid
and
l.relation = c.oid
order by age(s.query_start) desc;

I found four locks for the search query at the very beginning of the
resultset - all of them of the AccessShareLock persuasion and granted
alright: one on the message-table, one on the thread-table, one on the
tsearch2-index and another one on the primary key index of the
thread-table.

The hanging inserts/updates were waiting for an AccessExclusiveLock on
the tsearch2-index - all the other locks of these queries were marked as
granted.
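
A slight variation of the query above that lists only the waiters, i.e. locks
requested but not yet granted:

select c.relname as locked_relation,
l.mode as lock_mode,
l.pid
from pg_locks l,
pg_class c
where l.relation = c.oid
and not l.granted;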

As far as I understand from some of the previous messages on the mailing
list regarding concurrency issues with GiST-type indices, any SELECT
that's using a tsearch2-index would completely lock write-access to that
index for the runtime of the query - is that correct so far?

Now I'd like to find out about possible solutions or workarounds for
this issue. Surely some of you must have encountered quite similar
situations, so what did you do about it? I have already pondered the idea
of a separate insert/update queue table which would then be processed by
a cron job, thus separating the information entry from the actual insert
into the table that's blocked due to the lock on the index. Another
possibility (which I find a little more compelling) would involve
replicating the message table via Slony-I to another database, which
could then be used as the sole target for any search queries that require
use of the GiST index. Would this provide the needed asynchronicity to
avoid the contention between the AccessShareLock from the
search-SELECT and the AccessExclusiveLock from the write access queries?

I'd be very glad to know your opinions on this matter.

Kind regards

   Markus
