Re: [PERFORM] Query seem to slow if table have more than 200 million rows

2005-10-03 Thread Qingqing Zhou

Ahmad Fajar [EMAIL PROTECTED] wrote
 Hi Qingqing,

 I don't know whether the statistics I got are bad or good; here they are:

Please do it this way:

1. Start the postmaster with stats_start_collector=true and
stats_block_level=true.
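
For reference, the corresponding postgresql.conf entries (as they exist in
the 8.0-era configuration; later releases renamed or dropped these
parameters) would be:

  # postgresql.conf -- statistics collector settings (PostgreSQL 8.0 era)
  stats_start_collector = true   # run the statistics collector
  stats_block_level = true       # gather block-level I/O statistics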

2. Connect with psql and do something like this:

test=# select pg_stat_reset();
 pg_stat_reset
---------------
 t
(1 row)

test=# select * from pg_statio_user_indexes;
 relid | indexrelid | schemaname | relname | indexrelname | idx_blks_read | idx_blks_hit
-------+------------+------------+---------+--------------+---------------+--------------
 16385 |      16390 | public     | test    | test_idx     |             0 |            0
(1 row)

test=# select count(*) from test where a = 1234;
 count
-------
  7243
(1 row)

test=# select * from pg_statio_user_indexes;
 relid | indexrelid | schemaname | relname | indexrelname | idx_blks_read | idx_blks_hit
-------+------------+------------+---------+--------------+---------------+--------------
 16385 |      16390 | public     | test    | test_idx     |            55 |            0
(1 row)


This shows that to answer select count(*) from test where a = 1234, I had to
read 55 index blocks (no index blocks were hit in cache, since I had just
restarted the postmaster and the buffer pool was empty).
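
A follow-up sketch (my own, not part of the original message): the same view
can report a per-index buffer-cache hit ratio, e.g.

  select indexrelname, idx_blks_read, idx_blks_hit,
         round(idx_blks_hit::numeric
               / nullif(idx_blks_hit + idx_blks_read, 0), 4) as hit_ratio
  from pg_statio_user_indexes
  order by idx_blks_read desc;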


Regards,
Qingqing





Re: [PERFORM] Query seem to slow if table have more than 200 million rows

2005-09-27 Thread Ahmad Fajar
Hi Qingqing,

I don't know whether the statistics I got are bad or good; here they are:
scooby=# select a.relid, a.relname, b.indexrelid, b.indexrelname,
scooby-# c.idx_scan, c.idx_tup_read, c.idx_tup_fetch,
scooby-# a.heap_blks_read, a.heap_blks_hit, a.idx_blks_read, a.idx_blks_hit,
scooby-# a.toast_blks_read, a.toast_blks_hit, a.tidx_blks_read, a.tidx_blks_hit,
scooby-# b.idx_blks_read, b.idx_blks_hit
scooby-# from pg_statio_user_tables a, pg_statio_user_indexes b, pg_stat_all_indexes c
scooby-# where a.relid=b.relid and a.relid=c.relid and b.indexrelid=c.indexrelid
scooby-# and a.relname=b.relname and a.relname=c.relname and a.relname='fti_dict1';
  relid   |  relname  | indexrelid | indexrelname | idx_scan | idx_tup_read | idx_tup_fetch | heap_blks_read | heap_blks_hit | idx_blks_read | idx_blks_hit | toast_blks_read | toast_blks_hit | tidx_blks_read | tidx_blks_hit | idx_blks_read | idx_blks_hit
----------+-----------+------------+--------------+----------+--------------+---------------+----------------+---------------+---------------+--------------+-----------------+----------------+----------------+---------------+---------------+--------------
 22880226 | fti_dict1 |   22880231 | idx_dict3    |        0 |            0 |             0 |              0 |             0 |             0 |            0 |                 |                |                |               |             0 |            0
 22880226 | fti_dict1 |   22880230 | idx_dict2    |        7 |       592799 |        592799 |              0 |             0 |             0 |            0 |                 |                |                |               |             0 |            0
 22880226 | fti_dict1 |   22880229 | idx_dict1    |        0 |            0 |             0 |              0 |             0 |             0 |            0 |                 |                |                |               |             0 |            0
(3 rows)
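
A simpler sketch, for reference, that pulls just the per-index I/O counters
from pg_statio_user_indexes alone (avoiding the duplicated idx_blks_* columns
the join above produces):

  select indexrelname, idx_blks_read, idx_blks_hit
  from pg_statio_user_indexes
  where relname = 'fti_dict1';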

I have tried the query several times with different keywords, but only
idx_tup_read and idx_tup_fetch change; the others stay zero.
The indexes are:
ids (idx_dict1),
keywords (idx_dict2, varchar_ops),
keywords (idx_dict3, varchar_pattern_ops) <== I use this index for queries of
the form keywords like 'blabla%', just for testing purposes; the equivalent
DDL is sketched below.
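
The equivalent index definitions, reconstructed from the description above (a
sketch of the DDL, not the original statements), would be roughly:

  create index idx_dict1 on dict (ids);
  create index idx_dict2 on dict (keywords varchar_ops);
  create index idx_dict3 on dict (keywords varchar_pattern_ops);
  -- idx_dict3 is the index usable for prefix matches such as:
  --   select ids, keywords from dict where keywords like 'blabla%';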

Regards,
ahmad fajar

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Qingqing Zhou
Sent: Tuesday, 27 September 2005 8:43
To: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Query seem to slow if table have more than 200
million rows


Ahmad Fajar [EMAIL PROTECTED] wrote

 Select ids, keywords from dict where keywords='blabla' ('blabla' is a
 single word);

 The table has 200 million rows, and I have indexed the keywords field. The
 first time I run the query it seems very slow, taking 15-60 sec to return a
 result, but if I repeat the query I get a fast result. My question is: why
 is the query so slow the first time?

 Table structure is quite simple:

 Ids bigint, keywords varchar(150), weight varchar(1), dpos int.


The slowness on the first run is almost certainly caused by disk I/O; the
second run is faster because all the data pages it needs are already in the
buffer pool. 200 million rows is not a problem for a btree index: even if
your client tool appends some spaces to your keywords at insertion time, the
ideal btree is at most 5 to 6 levels deep. Can you show the I/O statistics
for the index from your statistics views?
http://www.postgresql.org/docs/8.0/static/monitoring-stats.html#MONITORING-STATS-VIEWS

Regards,
Qingqing





Re: [PERFORM] Query seem to slow if table have more than 200 million rows

2005-09-26 Thread Qingqing Zhou

Ahmad Fajar [EMAIL PROTECTED] wrote

 Select ids, keywords from dict where keywords='blabla' ('blabla' is a
 single word);

 The table has 200 million rows, and I have indexed the keywords field. The
 first time I run the query it seems very slow, taking 15-60 sec to return a
 result, but if I repeat the query I get a fast result. My question is: why
 is the query so slow the first time?

 Table structure is quite simple:

 Ids bigint, keywords varchar(150), weight varchar(1), dpos int.


The slowness on the first run is almost certainly caused by disk I/O; the
second run is faster because all the data pages it needs are already in the
buffer pool. 200 million rows is not a problem for a btree index: even if
your client tool appends some spaces to your keywords at insertion time, the
ideal btree is at most 5 to 6 levels deep. Can you show the I/O statistics
for the index from your statistics views?
http://www.postgresql.org/docs/8.0/static/monitoring-stats.html#MONITORING-STATS-VIEWS
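
A back-of-the-envelope check of that depth claim (my own arithmetic, not from
the original message): assuming 8 KB pages and index entries of up to roughly
160 bytes, about 50 entries fit per page, so a btree with fan-out 50 covers
200 million entries in ceil(ln(200000000)/ln(50)) = 5 levels. As a runnable
one-liner:

  select ceil(ln(200000000) / ln(50)) as btree_levels;  -- => 5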

Regards,
Qingqing


