Re: [PERFORM] Query seem to slow if table have more than 200 million rows
Ahmad Fajar <[EMAIL PROTECTED]> wrote:
> Hi Qingqing, I don't know whether the statistics I got are good or bad.
> Here they are:

Please do it this way:

1. Start the postmaster with stats_start_collector=true and stats_block_level=true.
2. Connect with psql and do something like this:

test=# select pg_stat_reset();
 pg_stat_reset
---------------
 t
(1 row)

test=# select * from pg_statio_user_indexes;
 relid | indexrelid | schemaname | relname | indexrelname | idx_blks_read | idx_blks_hit
-------+------------+------------+---------+--------------+---------------+--------------
 16385 |      16390 | public     | test    | test_idx     |             0 |            0
(1 row)

test=# select count(*) from test where a = 1234;
 count
-------
  7243
(1 row)

test=# select * from pg_statio_user_indexes;
 relid | indexrelid | schemaname | relname | indexrelname | idx_blks_read | idx_blks_hit
-------+------------+------------+---------+--------------+---------------+--------------
 16385 |      16390 | public     | test    | test_idx     |            55 |            0
(1 row)

This shows that to answer "select count(*) from test where a = 1234" I had to read 55 index blocks, with no index block hits, since I had just restarted the postmaster and the buffer pool was empty.

Regards,
Qingqing

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster
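The arithmetic behind those 55 cold index-block reads can be sketched as follows. The per-read latencies are illustrative assumptions (roughly what a 7200 RPM disk and a buffer-pool hit cost), not figures from the thread:

```python
# Rough cost of reading index blocks cold (from disk) vs. hot (from the
# buffer pool). Assumed figures: ~8 ms per random 8 KB disk read on a
# 7200 RPM drive, ~0.01 ms per buffer-pool hit.

def scan_time_ms(blks_read, blks_hit, random_read_ms=8.0, buffer_hit_ms=0.01):
    """Estimate scan time from pg_statio_user_indexes block counters."""
    return blks_read * random_read_ms + blks_hit * buffer_hit_ms

cold = scan_time_ms(blks_read=55, blks_hit=0)   # empty buffer pool, as above
warm = scan_time_ms(blks_read=0, blks_hit=55)   # all blocks already cached
print(f"cold: {cold:.1f} ms, warm: {warm:.2f} ms")
```

The same 55 blocks cost hundreds of milliseconds cold but well under a millisecond warm, which is why repeating a query is so much faster than its first run.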
Re: [PERFORM] Query seem to slow if table have more than 200 million rows
Hi Qingqing,

I don't know whether the statistics I got are good or bad. Here they are:

scooby=# select a.relid, a.relname, b.indexrelid, b.indexrelname, c.idx_scan, c.idx_tup_read, c.idx_tup_fetch,
scooby-#        a.heap_blks_read, a.heap_blks_hit, a.idx_blks_read, a.idx_blks_hit,
scooby-#        a.toast_blks_read, a.toast_blks_hit, a.tidx_blks_read, a.tidx_blks_hit,
scooby-#        b.idx_blks_read, b.idx_blks_hit
scooby-# from pg_statio_user_tables a, pg_statio_user_indexes b, pg_stat_all_indexes c
scooby-# where a.relid=b.relid and a.relid=c.relid and b.indexrelid=c.indexrelid
scooby-#       and a.relname=b.relname and a.relname=c.relname and a.relname='fti_dict1';

  relid   |  relname  | indexrelid | indexrelname | idx_scan | idx_tup_read | idx_tup_fetch
----------+-----------+------------+--------------+----------+--------------+---------------
 22880226 | fti_dict1 |   22880231 | idx_dict3    |        0 |            0 |             0
 22880226 | fti_dict1 |   22880230 | idx_dict2    |        7 |       592799 |        592799
 22880226 | fti_dict1 |   22880229 | idx_dict1    |        0 |            0 |             0
(3 rows)

(The heap_blks_*, idx_blks_*, toast_blks_*, and tidx_blks_* columns are zero or empty for every row, so they are omitted above.)

I have tried the query several times with different keywords, but only idx_tup_read and idx_tup_fetch change; the other counters stay at zero.

The indexes are: ids (idx_dict1), keywords (idx_dict2, varchar_ops), and keywords (idx_dict3, varchar_pattern_ops). I use idx_dict3 for queries like "keywords like 'blabla%'", just for testing purposes.

Regards,
ahmad fajar

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Qingqing Zhou
Sent: Tuesday, 27 September 2005 8:43
To: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Query seem to slow if table have more than 200 million rows

Ahmad Fajar <[EMAIL PROTECTED]> wrote:
> Select ids, keywords from dict where keywords='blabla' ('blabla' is a single word);
> The table has 200 million rows, and I have indexed the keywords field. The
> first time I run the query it is slow, about 15-60 sec to get the result,
> but if I repeat the query I get a fast result. My question is why the query
> seems so slow the first time. The table structure is quite simple:
> ids bigint, keywords varchar(150), weight varchar(1), dpos int.

The first-time slowness is obviously caused by disk I/O; the second run is faster because all the data pages it needs are already in the buffer pool. 200 million rows is not a problem for a btree index: even if your client tool appends some spaces to your keywords at insertion time, the ideal btree is at most 5 to 6 levels high. Can you show the I/O stats of the index from your statistics view?

http://www.postgresql.org/docs/8.0/static/monitoring-stats.html#MONITORING-STATS-VIEWS

Regards,
Qingqing
Re: [PERFORM] Query seem to slow if table have more than 200 million rows
Ahmad Fajar <[EMAIL PROTECTED]> wrote:
> Select ids, keywords from dict where keywords='blabla' ('blabla' is a single word);
> The table has 200 million rows, and I have indexed the keywords field. The
> first time I run the query it is slow, about 15-60 sec to get the result,
> but if I repeat the query I get a fast result. My question is why the query
> seems so slow the first time. The table structure is quite simple:
> ids bigint, keywords varchar(150), weight varchar(1), dpos int.

The first-time slowness is obviously caused by disk I/O; the second run is faster because all the data pages it needs are already in the buffer pool. 200 million rows is not a problem for a btree index: even if your client tool appends some spaces to your keywords at insertion time, the ideal btree is at most 5 to 6 levels high. Can you show the I/O stats of the index from your statistics view?

http://www.postgresql.org/docs/8.0/static/monitoring-stats.html#MONITORING-STATS-VIEWS

Regards,
Qingqing
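Qingqing's "5 to 6 levels" figure is easy to check with a back-of-the-envelope calculation. In an ideal btree, each level multiplies capacity by the per-page fanout, so the height is a logarithm of the row count. The fanout values below are assumptions for illustration: short keys pack on the order of 200 entries into an 8 KB page, while space-padded varchar(150) keys pack far fewer:

```python
import math

def btree_height(rows, fanout):
    """Levels in an ideal btree: each level multiplies capacity by `fanout`."""
    return math.ceil(math.log(rows) / math.log(fanout))

rows = 200_000_000
# Assumed fanouts: ~200 entries/page for short keys, ~50 or ~30 for wide,
# space-padded varchar keys.
for fanout in (200, 50, 30):
    print(f"fanout {fanout:>3}: {btree_height(rows, fanout)} levels")
```

Even with a pessimistic fanout of 30 the tree is only 6 levels deep, matching the claim: a single-key lookup touches only a handful of index pages, so the index itself is not the bottleneck on a table this size.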
[PERFORM] Query seem to slow if table have more than 200 million rows
If I do a simple query like:

Select ids, keywords from dict where keywords='blabla';

('blabla' is a single word.) The table has 200 million rows, and I have indexed the keywords field. The first time I run the query it is slow, about 15-60 sec to get the result, but if I repeat the query I get a fast result. My question is why the query seems so slow the first time.

The table structure is quite simple: ids bigint, keywords varchar(150), weight varchar(1), dpos int.

I use the latest pgAdmin3 to test all queries. My Linux box is RedHat 4 AS, kernel 2.6.9-11, PostgreSQL 8.0.3, with 2x200 GB SATA 7200 RPM disks configured as RAID0 with an ext3 file system for the PostgreSQL data only, and an 80 GB EIDE 7200 RPM disk with ext3 for the OS only. The server has 2 GB RAM and a P4 3.2 GHz.

If I run this query on MS SQL Server, with the same hardware specification and the same data, MS SQL Server beats PostgreSQL: the query takes about 0-4 sec to get the result. What is wrong with my PostgreSQL?

wassalam,
ahmad fajar
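The 15-60 second first run followed by a fast repeat is consistent with random heap I/O on a cold cache: each matching row can live on a different heap page, and every cold page costs one disk seek. A minimal sketch of that worst case, where both the number of matching rows and the per-read latency are assumptions rather than numbers from this message:

```python
# Why a first run can take tens of seconds while the repeat is near-instant:
# in the worst case every matching row triggers one random heap-page read.
# Assumed figures: 7,000 matching rows for a keyword, ~8 ms per random read.

def cold_query_seconds(matching_rows, ms_per_random_read=8.0):
    """Worst-case cold-cache time: one random heap-page read per matching row."""
    return matching_rows * ms_per_random_read / 1000.0

print(f"~{cold_query_seconds(7_000):.0f} s worst case")
```

Under these assumptions the worst case is about 56 seconds, squarely in the observed 15-60 s range; on the repeat run the same pages are buffer-pool hits and the query returns in well under a second.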