Re: [PERFORM] Slow SELECT by primary key? Postgres 9.1.2
On Thu, May 30, 2013 at 10:22 AM, Igor Neyman wrote: >> -Original Message- >> From: pgsql-performance-ow...@postgresql.org [mailto:pgsql-performance- >> ow...@postgresql.org] On Behalf Of Merlin Moncure >> Sent: Thursday, May 30, 2013 11:14 AM >> To: John Mudd >> Cc: pgsql-performance@postgresql.org >> Subject: Re: [PERFORM] Slow SELECT by primary key? Postgres 9.1.2 >> >> On Mon, May 27, 2013 at 9:02 AM, John Mudd wrote: >> > Postgres 9.1.2 on Ubuntu 12.04 >> > >> > Any reason why a select by primary key would be slower than a select >> > that includes an ORDER BY? I was really hoping using the primary key >> > would give me a boost. >> > >> > I stopped the server and cleared the O/S cache using "sync; echo 3 > >> > /proc/sys/vm/drop_caches" between the runs. >> > >> > >> > >> > test=# VACUUM ANALYZE test_select; >> > VACUUM >> > >> > (stopped postgres; reset O/S cache; started postgres) >> > >> > test=# explain analyze SELECT * FROM test_select WHERE key1 >= 50 >> > ORDER BY key1, key2, key3, id LIMIT 1; >> > QUERY >> > PLAN >> > - >> - >> > >> > Limit (cost=0.00..0.08 rows=1 width=21) (actual time=12.599..12.600 >> > rows=1 >> > loops=1) >> >-> Index Scan using my_key on test_select (cost=0.00..41895.49 >> > rows=498724 width=21) (actual time=12.597..12.597 rows=1 loops=1) >> > Index Cond: (key1 >= 50) >> > Total runtime: 12.678 ms >> > >> > (stopped postgres; reset O/S cache; started postgres) >> >> >> why are you flushing postgres/os cache? when you do that, you are >> measuring raw read time from disks. Typical disk seek time is measured >> in milliseconds so the timings are completely appropriate once you >> remove caching effects. Hard drives (at least, the spinning >> kind) are slow and one of the major challenges of database and hardware >> engineering is working around their limitations. Fortunately it looks >> like faster storage will soon be commonplace for reasonable prices. >> >> merlin >> > > True. 
> But, on the other hand (back to the original question), > the execution plans that John got before and after the suggested change in > configuration parameters are exactly the same; the timing is different, but > only due to the buffer cache issue. Right. Well, I think Matheus's answer is the right one. But my point was that what's going on here is we are measuring the number of raw uncached seeks needed to satisfy the query on index A vs. B. Pure luck in terms of how the index data is organized could throw it off one way or the other. But the test methodology is bogus because the root index pages will stay hot, so the more compact pkey will likely be slightly faster in real-world usage. (but, I prefer the composite key style of design, especially for range searching). merlin -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
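Merlin's "root index pages will stay hot" point can be made concrete with a rough B-tree depth estimate. The page capacities below are illustrative assumptions, not measured PostgreSQL values; the takeaway is only that both indexes stay equally shallow at this row count:

```python
import math

def btree_depth(rows, leaf_entries_per_page, internal_fanout):
    """Rough number of B-tree levels (leaf level included).

    Page-capacity numbers are assumptions for illustration only.
    """
    pages = math.ceil(rows / leaf_entries_per_page)  # leaf pages
    depth = 1
    while pages > 1:
        # each internal level references `internal_fanout` child pages
        pages = math.ceil(pages / internal_fanout)
        depth += 1
    return depth

# A narrow single-column pkey index packs more entries per page than the
# four-column my_key index, yet with ~1M rows both are 3 levels deep, so
# once root and internal pages are cached the per-lookup gap is tiny.
print(btree_depth(1_000_000, 366, 400))  # pkey-like index -> 3
print(btree_depth(1_000_000, 200, 250))  # composite-key-like index -> 3
```

This is why repeated flushing of the caches exaggerates the difference: it forces the descent through every level to hit disk each time, which never happens in steady-state operation.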
Re: [PERFORM] Slow SELECT by primary key? Postgres 9.1.2
I flushed the caches in an attempt to get meaningful results. I've seen complaints about previous posts that didn't include clearing the caches. I agree this tends to be artificial in the opposite direction. I will strive to come up with a more realistic test environment next time. Maybe I could perform many random reads initially to fill the caches with random blocks. That might allow for minimal assistance from the cache and be more realistic. On Thu, May 30, 2013 at 11:13 AM, Merlin Moncure wrote: > On Mon, May 27, 2013 at 9:02 AM, John Mudd wrote: > > Postgres 9.1.2 on Ubuntu 12.04 > > > > Any reason why a select by primary key would be slower than a select that > > includes an ORDER BY? I was really hoping using the primary key would > give > > me a boost. > > > > I stopped the server and cleared the O/S cache using "sync; echo 3 > > > /proc/sys/vm/drop_caches" between the runs. > > > > > > > > test=# VACUUM ANALYZE test_select; > > VACUUM > > > > (stopped postgres; reset O/S cache; started postgres) > > > > test=# explain analyze SELECT * FROM test_select WHERE key1 >= 50 > ORDER > > BY key1, key2, key3, id LIMIT 1; > > QUERY PLAN > > > -- > > Limit (cost=0.00..0.08 rows=1 width=21) (actual time=12.599..12.600 > rows=1 > > loops=1) > >-> Index Scan using my_key on test_select (cost=0.00..41895.49 > > rows=498724 width=21) (actual time=12.597..12.597 rows=1 loops=1) > > Index Cond: (key1 >= 50) > > Total runtime: 12.678 ms > > > > (stopped postgres; reset O/S cache; started postgres) > > > why are you flushing postgres/os cache? when you do that, you are > measuring raw read time from disks. Typical disk seek time is > measured in milliseconds so the timings are completely appropriate > once you remove caching effects. Hard drives (at least, the spinning > kind) are slow and one of the major challenges of database and > hardware engineering is working around their limitations. Fortunately > it looks like faster storage will soon be commonplace for reasonable > prices. 
> > merlin >
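John's warm-up idea could be sketched like this. Everything here is hypothetical rather than code from the thread: the query shape comes from the test schema above, while the probe count and key range are arbitrary assumptions; executing the pairs through a psycopg2 cursor before the timed runs would pull a random mix of index and heap blocks into cache:

```python
import random

# Same query shape as the thread's test, parameterized on the key1 bound.
WARMUP_SQL = (
    "SELECT * FROM test_select "
    "WHERE key1 >= %s ORDER BY key1, key2, key3, id LIMIT 1"
)

def warmup_probes(count, key_max=1_000_000):
    """Build (sql, params) pairs for cache warm-up.

    `count` and `key_max` are guesses; run each pair via
    cursor.execute(sql, params) before the measured queries.
    """
    return [(WARMUP_SQL, (random.randint(0, key_max),))
            for _ in range(count)]

probes = warmup_probes(10_000)
```

The same set of probes should be replayed before both the my_key test and the pkey test so neither index gets a head start.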
Re: [PERFORM] Slow SELECT by primary key? Postgres 9.1.2
> -Original Message- > From: pgsql-performance-ow...@postgresql.org [mailto:pgsql-performance- > ow...@postgresql.org] On Behalf Of Merlin Moncure > Sent: Thursday, May 30, 2013 11:14 AM > To: John Mudd > Cc: pgsql-performance@postgresql.org > Subject: Re: [PERFORM] Slow SELECT by primary key? Postgres 9.1.2 > > On Mon, May 27, 2013 at 9:02 AM, John Mudd wrote: > > Postgres 9.1.2 on Ubuntu 12.04 > > > > Any reason why a select by primary key would be slower than a select > > that includes an ORDER BY? I was really hoping using the primary key > > would give me a boost. > > > > I stopped the server and cleared the O/S cache using "sync; echo 3 > > > /proc/sys/vm/drop_caches" between the runs. > > > > > > > > test=# VACUUM ANALYZE test_select; > > VACUUM > > > > (stopped postgres; reset O/S cache; started postgres) > > > > test=# explain analyze SELECT * FROM test_select WHERE key1 >= 50 > > ORDER BY key1, key2, key3, id LIMIT 1; > > QUERY > > PLAN > > - > - > > > > Limit (cost=0.00..0.08 rows=1 width=21) (actual time=12.599..12.600 > > rows=1 > > loops=1) > >-> Index Scan using my_key on test_select (cost=0.00..41895.49 > > rows=498724 width=21) (actual time=12.597..12.597 rows=1 loops=1) > > Index Cond: (key1 >= 50) > > Total runtime: 12.678 ms > > > > (stopped postgres; reset O/S cache; started postgres) > > > why are you flushing postgres/os cache? when you do that, you are > measuring raw read time from disks. Typical disk seek time is measured > in milliseconds so the timings are completely appropriate once you > remove caching effects. Hard drives (at least, the spinning > kind) are slow and one of the major challenges of database and hardware > engineering is working around their limitations. Fortunately it looks > like faster storage will soon be commonplace for reasonable prices. > > merlin > True. 
But, on the other hand (back to the original question), the execution plans that John got before and after the suggested change in configuration parameters are exactly the same; the timing is different only due to the buffer cache issue. Igor Neyman
Re: [PERFORM] Slow SELECT by primary key? Postgres 9.1.2
On Mon, May 27, 2013 at 9:02 AM, John Mudd wrote: > Postgres 9.1.2 on Ubuntu 12.04 > > Any reason why a select by primary key would be slower than a select that > includes an ORDER BY? I was really hoping using the primary key would give > me a boost. > > I stopped the server and cleared the O/S cache using "sync; echo 3 > > /proc/sys/vm/drop_caches" between the runs. > > > > test=# VACUUM ANALYZE test_select; > VACUUM > > (stopped postgres; reset O/S cache; started postgres) > > test=# explain analyze SELECT * FROM test_select WHERE key1 >= 50 ORDER > BY key1, key2, key3, id LIMIT 1; > QUERY PLAN > -- > Limit (cost=0.00..0.08 rows=1 width=21) (actual time=12.599..12.600 rows=1 > loops=1) >-> Index Scan using my_key on test_select (cost=0.00..41895.49 > rows=498724 width=21) (actual time=12.597..12.597 rows=1 loops=1) > Index Cond: (key1 >= 50) > Total runtime: 12.678 ms > > (stopped postgres; reset O/S cache; started postgres) why are you flushing postgres/os cache? when you do that, you are measuring raw read time from disks. Typical disk seek time is measured in milliseconds so the timings are completely appropriate once you remove caching effects. Hard drives (at least, the spinning kind) are slow and one of the major challenges of database and hardware engineering is working around their limitations. Fortunately it looks like faster storage will soon be commonplace for reasonable prices. merlin
Re: [PERFORM] Slow SELECT by primary key? Postgres 9.1.2
On Mon, May 27, 2013 at 11:02 AM, John Mudd wrote: > Postgres 9.1.2 on Ubuntu 12.04 > > Any reason why a select by primary key would be slower than a select that > includes an ORDER BY? I was really hoping using the primary key would give > me a boost. > > I stopped the server and cleared the O/S cache using "sync; echo 3 > > /proc/sys/vm/drop_caches" between the runs. > > > > test=# VACUUM ANALYZE test_select; > VACUUM > > (stopped postgres; reset O/S cache; started postgres) > > test=# explain analyze SELECT * FROM test_select WHERE key1 >= 50 > ORDER BY key1, key2, key3, id LIMIT 1; > QUERY PLAN > > -- > Limit (cost=0.00..0.08 rows=1 width=21) (actual time=12.599..12.600 > rows=1 loops=1) >-> Index Scan using my_key on test_select (cost=0.00..41895.49 > rows=498724 width=21) (actual time=12.597..12.597 rows=1 loops=1) > Index Cond: (key1 >= 50) > Total runtime: 12.678 ms > > (stopped postgres; reset O/S cache; started postgres) > > test=# explain analyze SELECT * FROM test_select WHERE id = 50; >QUERY PLAN > > > - > Index Scan using test_select_pkey on test_select (cost=0.00..8.36 rows=1 > width=21) (actual time=31.396..31.398 rows=1 loops=1) >Index Cond: (id = 50) > Total runtime: 31.504 ms > > > > Schema: > > test=# \d test_select > Table "public.test_select" > Column | Type |Modifiers > > > +--+-- > id | integer | not null default > nextval('test_select_id_seq'::regclass) > key1 | integer | > key2 | integer | > key3 | integer | > data | character(4) | > Indexes: > "test_select_pkey" PRIMARY KEY, btree (id) > "my_key" btree (key1, key2, key3, id) > > test=# > > > > Sample data: > > test=# SELECT * FROM test_select LIMIT 10; > id | key1 | key2 | key3 | data > ++++-- > 1 | 984966 | 283954 | 772063 | x > 2 | 817668 | 393533 | 924888 | x > 3 | 751039 | 798753 | 454309 | x > 4 | 128505 | 329643 | 280553 | x > 5 | 105600 | 257225 | 710015 | x > 6 | 323891 | 615614 | 83206 | x > 7 | 194054 | 63506 | 353171 | x > 8 | 212068 | 881225 | 271804 | x > 9 | 644180 | 26693 | 
200738 | x > 10 | 136586 | 498699 | 554417 | x > (10 rows) > > > To me it looks like the "my_key" index should be better than the PK in this case, for a few reasons: 1. You are using an ORDER BY that has the same fields (and in the same order) as your index, so PG only needs to navigate the index. 2. You are using LIMIT 1, which means PG only needs to fetch the first element for which key1>=5 (and stop the search right after it). In the case of your PK, PG will also need to navigate through the index and return only one value, but in this case the number of entries it needs to look at is bigger, because "id" has more distinct values than "key1". Regards, -- Matheus de Oliveira Analista de Banco de Dados Dextra Sistemas - MPS.Br nível F! www.dextra.com.br/postgres
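Matheus's two points can be modeled in a few lines: my_key entries sort lexicographically on (key1, key2, key3, id), so the first tuple with key1 at or above some bound is reached by a single binary descent, which Python's bisect stands in for here (the rows are made up for illustration, loosely echoing the sample data):

```python
import bisect

# Toy stand-in for the my_key index: (key1, key2, key3, id), kept sorted
# exactly as a btree on those four columns would keep its entries.
index = sorted([
    (105600, 257225, 710015, 5),
    (128505, 329643, 280553, 4),
    (499999, 881225, 271804, 8),
    (500000, 26693, 200738, 9),
    (500000, 615614, 83206, 6),
    (751039, 798753, 454309, 3),
])

# WHERE key1 >= 500000 ORDER BY key1, key2, key3, id LIMIT 1:
# (500000,) sorts before every 4-tuple beginning with 500000, so
# bisect_left lands on the first qualifying entry in one descent,
# and LIMIT 1 means the scan stops right there.
pos = bisect.bisect_left(index, (500000,))
first = index[pos]
print(first)  # -> (500000, 26693, 200738, 9)
```

The pkey lookup is the same kind of descent, just over a different, taller-keyed ordering, which is why the plans look alike and only I/O luck separates the timings.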
Re: [PERFORM] Slow SELECT by primary key? Postgres 9.1.2
John, And can you please include BUFFERS in the ANALYZE? Regards, Roman Konoval On Tue, May 28, 2013 at 9:48 AM, Evgeniy Shishkin wrote: > > > > > On 28.05.2013, at 2:17, John Mudd wrote: > > Thanks again. > > Well, I have two problems with using the CLUSTER option. It's only > temporary since any updates, depending how much free space is reserved per > page, requires re-running the CLUSTER. And my primary concern is that it > arbitrarily gives an unfair advantage to the primary key SELECT. Still, > it's easy to test so here are the results. The primary key still loses > even with the CLUSTER. Granted it is close, but considering this is now an > unfair comparison it still doesn't make sense to me. How can a search for a > specific row that should be fairly straightforward take longer than a > search that includes an ORDER BY clause? > > > Well, you do just a regular index scan because of LIMIT 1. > > And now it is just a matter of index size and table organization. > > I also don't understand why you consider CLUSTER unfair - the way you > populated the table was a natural cluster over my_key. > > But it bothers me why my_key is always better. Can you please test it on > different values but the same rows? Because now it is two different tuples > and you count every IO. 
> > > test=# CLUSTER test_select USING test_select_pkey ; > CLUSTER > test=# VACUUM ANALYZE test_select ; > VACUUM > > (stopped postgres; reset O/S cache; started postgres) > > > test=# explain analyze SELECT * FROM test_select WHERE key1 >= 50 > ORDER BY key1, key2, key3, id LIMIT 1; > QUERY PLAN > > -- > Limit (cost=0.00..0.08 rows=1 width=21) (actual time=19.430..19.431 > rows=1 loops=1) >-> Index Scan using my_key on test_select (cost=0.00..41938.15 > rows=42 width=21) (actual time=19.428..19.428 rows=1 loops=1) > Index Cond: (key1 >= 50) > Total runtime: 19.526 ms > > > (stopped postgres; reset O/S cache; started postgres) > > > test=# explain analyze SELECT * FROM test_select WHERE id = 50; >QUERY PLAN > > > - > Index Scan using test_select_pkey on test_select (cost=0.00..8.36 rows=1 > width=21) (actual time=21.070..21.072 rows=1 loops=1) >Index Cond: (id = 50) > Total runtime: 21.178 ms > > > > > On Mon, May 27, 2013 at 10:59 AM, Evgeny Shishkin wrote: > >> >> On May 27, 2013, at 6:35 PM, John Mudd wrote: >> >> Thanks, that's easy enough to test. Didn't seem to help though. >> >> >> Ok. And if you CLUSTER tables USING PK? 
>> >> >> test=# REINDEX index test_select_pkey; >> REINDEX >> test=# VACUUM ANALYZE test_select ; >> VACUUM >> >> >> (stopped postgres; reset O/S cache; started postgres) >> >> test=# explain analyze SELECT * FROM test_select WHERE key1 >= 50 >> ORDER BY key1, key2, key3, id LIMIT 1; >> QUERY PLAN >> >> >> -- >> Limit (cost=0.00..0.08 rows=1 width=21) (actual time=16.368..16.369 >> rows=1 loops=1) >>-> Index Scan using my_key on test_select (cost=0.00..41981.16 >> rows=501333 width=21) (actual time=16.366..16.366 rows=1 loops=1) >> Index Cond: (key1 >= 50) >> Total runtime: 16.444 ms >> >> >> (stopped postgres; reset O/S cache; started postgres) >> >> test=# explain analyze SELECT * FROM test_select WHERE id = 50; >>QUERY PLAN >> >> - >> Index Scan using test_select_pkey on test_select (cost=0.00..8.36 >> rows=1 width=21) (actual time=23.072..23.074 rows=1 loops=1) >>Index Cond: (id = 50) >> Total runtime: 23.192 ms >> >> >> >> >> On Mon, May 27, 2013 at 10:21 AM, Evgeny Shishkin >> wrote: >> >>> >>> On May 27, 2013, at 6:02 PM, John Mudd wrote: >>> >>> > Postgres 9.1.2 on Ubuntu 12.04 >>> > >>> > Any reason why a select by primary key would be slower than a select >>> that includes an ORDER BY? I was really hoping using the primary key would >>> give me a boost. >>> > >>> >>> You created my_key after data loading, and PK was there all the time. >>> If you REINDEX PK, i bet it will be as fast. >>> >>> > I stopped the server and cleared the O/S cache using "sync; echo 3 > >>> /proc/sys/vm/drop_caches" between the runs. >>> > >>> > >>> > >>> > test=# VACUUM ANALYZE test_select; >>> > VACUUM >>> > >>> > (stopped postgres; reset O/S cache; started postgres) >>> > >>> > test=# explai
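For reference, Roman's request maps onto the parenthesized EXPLAIN options syntax (available since PostgreSQL 9.0, so fine on the 9.1.2 server in question); a sketch using the two queries from this thread:

```sql
-- BUFFERS adds shared hit/read counts per plan node, so cold disk
-- reads become directly visible instead of being inferred from timing.
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM test_select
WHERE key1 >= 50 ORDER BY key1, key2, key3, id LIMIT 1;

EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM test_select WHERE id = 50;
```

Comparing "read" counts between the two plans would show whether the pkey lookup is really touching more uncached blocks.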
Re: [PERFORM] Slow SELECT by primary key? Postgres 9.1.2
On 28.05.2013, at 2:17, John Mudd wrote: > Thanks again. > > Well, I have two problems with using the CLUSTER option. It's only temporary > since any updates, depending how much free space is reserved per page, > requires re-running the CLUSTER. And my primary concern is that it > arbitrarily gives an unfair advantage to the primary key SELECT. Still, it's > easy to test so here are the results. The primary key still loses even with > the CLUSTER. Granted it is close, but considering this is now an unfair > comparison it still doesn't make sense to me. How can a search for a specific > row that should be fairly straightforward take longer than a search that > includes an ORDER BY clause? > Well, you do just a regular index scan because of LIMIT 1. And now it is just a matter of index size and table organization. I also don't understand why you consider CLUSTER unfair - the way you populated the table was a natural cluster over my_key. But it bothers me why my_key is always better. Can you please test it on different values but the same rows? Because now it is two different tuples and you count every IO. 
> > test=# CLUSTER test_select USING test_select_pkey ; > CLUSTER > test=# VACUUM ANALYZE test_select ; > VACUUM > >> (stopped postgres; reset O/S cache; started postgres) > > > test=# explain analyze SELECT * FROM test_select WHERE key1 >= 50 ORDER > BY key1, key2, key3, id LIMIT 1; > QUERY PLAN > -- > Limit (cost=0.00..0.08 rows=1 width=21) (actual time=19.430..19.431 rows=1 > loops=1) >-> Index Scan using my_key on test_select (cost=0.00..41938.15 > rows=42 width=21) (actual time=19.428..19.428 rows=1 loops=1) > Index Cond: (key1 >= 50) > Total runtime: 19.526 ms > > >> (stopped postgres; reset O/S cache; started postgres) > > > test=# explain analyze SELECT * FROM test_select WHERE id = 50; >QUERY PLAN > > - > Index Scan using test_select_pkey on test_select (cost=0.00..8.36 rows=1 > width=21) (actual time=21.070..21.072 rows=1 loops=1) >Index Cond: (id = 50) > Total runtime: 21.178 ms > > > > > On Mon, May 27, 2013 at 10:59 AM, Evgeny Shishkin > wrote: >> >> On May 27, 2013, at 6:35 PM, John Mudd wrote: >> >>> Thanks, that's easy enough to test. Didn't seem to help though. >> >> Ok. And if you CLUSTER tables USING PK? 
>> >>> >>> test=# REINDEX index test_select_pkey; >>> REINDEX >>> test=# VACUUM ANALYZE test_select ; >>> VACUUM >>> >>> >>> (stopped postgres; reset O/S cache; started postgres) >>> >>> test=# explain analyze SELECT * FROM test_select WHERE key1 >= 50 ORDER >>> BY key1, key2, key3, id LIMIT 1; >>> QUERY PLAN >>> >>> -- >>> Limit (cost=0.00..0.08 rows=1 width=21) (actual time=16.368..16.369 >>> rows=1 loops=1) >>>-> Index Scan using my_key on test_select (cost=0.00..41981.16 >>> rows=501333 width=21) (actual time=16.366..16.366 rows=1 loops=1) >>> Index Cond: (key1 >= 50) >>> Total runtime: 16.444 ms >>> >>> >>> (stopped postgres; reset O/S cache; started postgres) >>> >>> test=# explain analyze SELECT * FROM test_select WHERE id = 50; >>>QUERY PLAN >>> - >>> Index Scan using test_select_pkey on test_select (cost=0.00..8.36 rows=1 >>> width=21) (actual time=23.072..23.074 rows=1 loops=1) >>>Index Cond: (id = 50) >>> Total runtime: 23.192 ms >>> >>> >>> >>> >>> On Mon, May 27, 2013 at 10:21 AM, Evgeny Shishkin >>> wrote: On May 27, 2013, at 6:02 PM, John Mudd wrote: > Postgres 9.1.2 on Ubuntu 12.04 > > Any reason why a select by primary key would be slower than a select > that includes an ORDER BY? I was really hoping using the primary key > would give me a boost. > You created my_key after data loading, and PK was there all the time. If you REINDEX PK, i bet it will be as fast. > I stopped the server and cleared the O/S cache using "sync; echo 3 > > /proc/sys/vm/drop_caches" between the runs. > > > > test=# VACUUM ANALYZE test_select; > VACUUM > > (stopped p
Re: [PERFORM] Slow SELECT by primary key? Postgres 9.1.2
Thanks again. Well, I have two problems with using the CLUSTER option. It's only temporary since any updates, depending how much free space is reserved per page, requires re-running the CLUSTER. And my primary concern is that it arbitrarily gives an unfair advantage to the primary key SELECT. Still, it's easy to test so here are the results. The primary key still loses even with the CLUSTER. Granted it is close, but considering this is now an unfair comparison it still doesn't make sense to me. How can a search for a specific row that should be fairly straightforward take longer than a search that includes an ORDER BY clause? test=# CLUSTER test_select USING test_select_pkey ; CLUSTER test=# VACUUM ANALYZE test_select ; VACUUM (stopped postgres; reset O/S cache; started postgres) test=# explain analyze SELECT * FROM test_select WHERE key1 >= 50 ORDER BY key1, key2, key3, id LIMIT 1; QUERY PLAN -- Limit (cost=0.00..0.08 rows=1 width=21) (actual time=19.430..19.431 rows=1 loops=1) -> Index Scan using my_key on test_select (cost=0.00..41938.15 rows=42 width=21) (actual time=19.428..19.428 rows=1 loops=1) Index Cond: (key1 >= 50) Total runtime: 19.526 ms (stopped postgres; reset O/S cache; started postgres) test=# explain analyze SELECT * FROM test_select WHERE id = 50; QUERY PLAN - Index Scan using test_select_pkey on test_select (cost=0.00..8.36 rows=1 width=21) (actual time=21.070..21.072 rows=1 loops=1) Index Cond: (id = 50) Total runtime: 21.178 ms On Mon, May 27, 2013 at 10:59 AM, Evgeny Shishkin wrote: > > On May 27, 2013, at 6:35 PM, John Mudd wrote: > > Thanks, that's easy enough to test. Didn't seem to help though. > > > Ok. And if you CLUSTER tables USING PK? 
> > > test=# REINDEX index test_select_pkey; > REINDEX > test=# VACUUM ANALYZE test_select ; > VACUUM > > > (stopped postgres; reset O/S cache; started postgres) > > test=# explain analyze SELECT * FROM test_select WHERE key1 >= 50 > ORDER BY key1, key2, key3, id LIMIT 1; > QUERY PLAN > > > -- > Limit (cost=0.00..0.08 rows=1 width=21) (actual time=16.368..16.369 > rows=1 loops=1) >-> Index Scan using my_key on test_select (cost=0.00..41981.16 > rows=501333 width=21) (actual time=16.366..16.366 rows=1 loops=1) > Index Cond: (key1 >= 50) > Total runtime: 16.444 ms > > > (stopped postgres; reset O/S cache; started postgres) > > test=# explain analyze SELECT * FROM test_select WHERE id = 50; >QUERY PLAN > > - > Index Scan using test_select_pkey on test_select (cost=0.00..8.36 > rows=1 width=21) (actual time=23.072..23.074 rows=1 loops=1) >Index Cond: (id = 50) > Total runtime: 23.192 ms > > > > > On Mon, May 27, 2013 at 10:21 AM, Evgeny Shishkin wrote: > >> >> On May 27, 2013, at 6:02 PM, John Mudd wrote: >> >> > Postgres 9.1.2 on Ubuntu 12.04 >> > >> > Any reason why a select by primary key would be slower than a select >> that includes an ORDER BY? I was really hoping using the primary key would >> give me a boost. >> > >> >> You created my_key after data loading, and PK was there all the time. >> If you REINDEX PK, i bet it will be as fast. >> >> > I stopped the server and cleared the O/S cache using "sync; echo 3 > >> /proc/sys/vm/drop_caches" between the runs. 
>> > >> > >> > >> > test=# VACUUM ANALYZE test_select; >> > VACUUM >> > >> > (stopped postgres; reset O/S cache; started postgres) >> > >> > test=# explain analyze SELECT * FROM test_select WHERE key1 >= 50 >> ORDER BY key1, key2, key3, id LIMIT 1; >> > QUERY PLAN >> > >> -- >> > Limit (cost=0.00..0.08 rows=1 width=21) (actual time=12.599..12.600 >> rows=1 loops=1) >> >-> Index Scan using my_key on test_select (cost=0.00..41895.49 >> rows=498724 width=21) (actual time=12.597..12.597 rows=1 loops=1) >> > Index Cond: (key1 >= 50) >> > Total runtime: 12.678 ms >> > >> > (stopped postgres; reset O/S cache; started postgres) >> > >> > test=# explain analyze SELECT * FROM test_select WHERE id = 50; >> >
Re: [PERFORM] Slow SELECT by primary key? Postgres 9.1.2
On May 27, 2013, at 6:35 PM, John Mudd wrote: > Thanks, that's easy enough to test. Didn't seem to help though. > Ok. And if you CLUSTER tables USING PK? > > test=# REINDEX index test_select_pkey; > REINDEX > test=# VACUUM ANALYZE test_select ; > VACUUM > > > (stopped postgres; reset O/S cache; started postgres) > > test=# explain analyze SELECT * FROM test_select WHERE key1 >= 50 ORDER > BY key1, key2, key3, id LIMIT 1; > QUERY PLAN > > -- > Limit (cost=0.00..0.08 rows=1 width=21) (actual time=16.368..16.369 rows=1 > loops=1) >-> Index Scan using my_key on test_select (cost=0.00..41981.16 > rows=501333 width=21) (actual time=16.366..16.366 rows=1 loops=1) > Index Cond: (key1 >= 50) > Total runtime: 16.444 ms > > > (stopped postgres; reset O/S cache; started postgres) > > test=# explain analyze SELECT * FROM test_select WHERE id = 50; >QUERY PLAN > - > Index Scan using test_select_pkey on test_select (cost=0.00..8.36 rows=1 > width=21) (actual time=23.072..23.074 rows=1 loops=1) >Index Cond: (id = 50) > Total runtime: 23.192 ms > > > > > On Mon, May 27, 2013 at 10:21 AM, Evgeny Shishkin > wrote: > > On May 27, 2013, at 6:02 PM, John Mudd wrote: > > > Postgres 9.1.2 on Ubuntu 12.04 > > > > Any reason why a select by primary key would be slower than a select that > > includes an ORDER BY? I was really hoping using the primary key would give > > me a boost. > > > > You created my_key after data loading, and PK was there all the time. > If you REINDEX PK, i bet it will be as fast. > > > I stopped the server and cleared the O/S cache using "sync; echo 3 > > > /proc/sys/vm/drop_caches" between the runs. 
> > > > > > > > test=# VACUUM ANALYZE test_select; > > VACUUM > > > > (stopped postgres; reset O/S cache; started postgres) > > > > test=# explain analyze SELECT * FROM test_select WHERE key1 >= 50 ORDER > > BY key1, key2, key3, id LIMIT 1; > > QUERY PLAN > > -- > > Limit (cost=0.00..0.08 rows=1 width=21) (actual time=12.599..12.600 > > rows=1 loops=1) > >-> Index Scan using my_key on test_select (cost=0.00..41895.49 > > rows=498724 width=21) (actual time=12.597..12.597 rows=1 loops=1) > > Index Cond: (key1 >= 50) > > Total runtime: 12.678 ms > > > > (stopped postgres; reset O/S cache; started postgres) > > > > test=# explain analyze SELECT * FROM test_select WHERE id = 50; > >QUERY PLAN > > - > > Index Scan using test_select_pkey on test_select (cost=0.00..8.36 rows=1 > > width=21) (actual time=31.396..31.398 rows=1 loops=1) > >Index Cond: (id = 50) > > Total runtime: 31.504 ms > > > > > > > > Schema: > > > > test=# \d test_select > > Table "public.test_select" > > Column | Type |Modifiers > > +--+-- > > id | integer | not null default > > nextval('test_select_id_seq'::regclass) > > key1 | integer | > > key2 | integer | > > key3 | integer | > > data | character(4) | > > Indexes: > > "test_select_pkey" PRIMARY KEY, btree (id) > > "my_key" btree (key1, key2, key3, id) > > > > test=# > > > > > > > > Sample data: > > > > test=# SELECT * FROM test_select LIMIT 10; > > id | key1 | key2 | key3 | data > > ++++-- > > 1 | 984966 | 283954 | 772063 | x > > 2 | 817668 | 393533 | 924888 | x > > 3 | 751039 | 798753 | 454309 | x > > 4 | 128505 | 329643 | 280553 | x > > 5 | 105600 | 257225 | 710015 | x > > 6 | 323891 | 615614 | 83206 | x > > 7 | 194054 | 63506 | 353171 | x > > 8 | 212068 | 881225 | 271804 | x > > 9 | 644180 | 26693 | 200738 | x > > 10 | 136586 | 498699 | 554417 | x > > (10 rows) > > > > > > > > > > Here's how I populated the table: > > > > import psycopg2 > > > > conn = psycopg2.connect('dbname=test') > > > > cur = conn.cursor() > > > > def random_int(): > > n = 
100 > > return random.randint(0,n) > > > > def random_key(): > > return random_int(), random_int(), random_int() > > > > def create_table(): > > cur.execute(''' > >
Re: [PERFORM] Slow SELECT by primary key? Postgres 9.1.2
Thanks, that's easy enough to test. Didn't seem to help though. test=# REINDEX index test_select_pkey; REINDEX test=# VACUUM ANALYZE test_select ; VACUUM (stopped postgres; reset O/S cache; started postgres) test=# explain analyze SELECT * FROM test_select WHERE key1 >= 50 ORDER BY key1, key2, key3, id LIMIT 1; QUERY PLAN -- Limit (cost=0.00..0.08 rows=1 width=21) (actual time=16.368..16.369 rows=1 loops=1) -> Index Scan using my_key on test_select (cost=0.00..41981.16 rows=501333 width=21) (actual time=16.366..16.366 rows=1 loops=1) Index Cond: (key1 >= 50) Total runtime: 16.444 ms (stopped postgres; reset O/S cache; started postgres) test=# explain analyze SELECT * FROM test_select WHERE id = 50; QUERY PLAN - Index Scan using test_select_pkey on test_select (cost=0.00..8.36 rows=1 width=21) (actual time=23.072..23.074 rows=1 loops=1) Index Cond: (id = 50) Total runtime: 23.192 ms On Mon, May 27, 2013 at 10:21 AM, Evgeny Shishkin wrote: > > On May 27, 2013, at 6:02 PM, John Mudd wrote: > > > Postgres 9.1.2 on Ubuntu 12.04 > > > > Any reason why a select by primary key would be slower than a select > that includes an ORDER BY? I was really hoping using the primary key would > give me a boost. > > > > You created my_key after data loading, and PK was there all the time. > If you REINDEX PK, i bet it will be as fast. > > > I stopped the server and cleared the O/S cache using "sync; echo 3 > > /proc/sys/vm/drop_caches" between the runs. 
> > > > > > > > test=# VACUUM ANALYZE test_select; > > VACUUM > > > > (stopped postgres; reset O/S cache; started postgres) > > > > test=# explain analyze SELECT * FROM test_select WHERE key1 >= 50 > ORDER BY key1, key2, key3, id LIMIT 1; > > QUERY PLAN > > > -- > > Limit (cost=0.00..0.08 rows=1 width=21) (actual time=12.599..12.600 > rows=1 loops=1) > >-> Index Scan using my_key on test_select (cost=0.00..41895.49 > rows=498724 width=21) (actual time=12.597..12.597 rows=1 loops=1) > > Index Cond: (key1 >= 50) > > Total runtime: 12.678 ms > > > > (stopped postgres; reset O/S cache; started postgres) > > > > test=# explain analyze SELECT * FROM test_select WHERE id = 50; > >QUERY PLAN > > > - > > Index Scan using test_select_pkey on test_select (cost=0.00..8.36 > rows=1 width=21) (actual time=31.396..31.398 rows=1 loops=1) > >Index Cond: (id = 50) > > Total runtime: 31.504 ms > > > > > > > > Schema: > > > > test=# \d test_select > > Table "public.test_select" > > Column | Type |Modifiers > > > +--+-- > > id | integer | not null default > nextval('test_select_id_seq'::regclass) > > key1 | integer | > > key2 | integer | > > key3 | integer | > > data | character(4) | > > Indexes: > > "test_select_pkey" PRIMARY KEY, btree (id) > > "my_key" btree (key1, key2, key3, id) > > > > test=# > > > > > > > > Sample data: > > > > test=# SELECT * FROM test_select LIMIT 10; > > id | key1 | key2 | key3 | data > > ++++-- > > 1 | 984966 | 283954 | 772063 | x > > 2 | 817668 | 393533 | 924888 | x > > 3 | 751039 | 798753 | 454309 | x > > 4 | 128505 | 329643 | 280553 | x > > 5 | 105600 | 257225 | 710015 | x > > 6 | 323891 | 615614 | 83206 | x > > 7 | 194054 | 63506 | 353171 | x > > 8 | 212068 | 881225 | 271804 | x > > 9 | 644180 | 26693 | 200738 | x > > 10 | 136586 | 498699 | 554417 | x > > (10 rows) > > > > > > > > > > Here's how I populated the table: > > > > import psycopg2 > > > > conn = psycopg2.connect('dbname=test') > > > > cur = conn.cursor() > > > > def random_int(): > > n = 100 > 
> return random.randint(0,n) > > > > def random_key(): > > return random_int(), random_int(), random_int() > > > > def create_table(): > > cur.execute(''' > > DROP TABLE IF EXISTS test_select; > > > > CREATE TABLE test_select ( > > id SERIAL PRIMARY KEY, > > key1INTEGER, > > key2INTEGER
Re: [PERFORM] Slow SELECT by primary key? Postgres 9.1.2
On May 27, 2013, at 6:02 PM, John Mudd wrote:

> Postgres 9.1.2 on Ubuntu 12.04
>
> Any reason why a select by primary key would be slower than a select that
> includes an ORDER BY? I was really hoping using the primary key would give
> me a boost.

You created my_key after the data loading, and the PK was there all the time. If you REINDEX the PK, I bet it will be as fast.

> I stopped the server and cleared the O/S cache using "sync; echo 3 >
> /proc/sys/vm/drop_caches" between the runs.
>
> test=# VACUUM ANALYZE test_select;
> VACUUM
>
> (stopped postgres; reset O/S cache; started postgres)
>
> test=# explain analyze SELECT * FROM test_select WHERE key1 >= 50
>        ORDER BY key1, key2, key3, id LIMIT 1;
>                                QUERY PLAN
> --------------------------------------------------------------------------
>  Limit  (cost=0.00..0.08 rows=1 width=21) (actual time=12.599..12.600 rows=1 loops=1)
>    ->  Index Scan using my_key on test_select  (cost=0.00..41895.49 rows=498724 width=21) (actual time=12.597..12.597 rows=1 loops=1)
>          Index Cond: (key1 >= 50)
>  Total runtime: 12.678 ms
>
> (stopped postgres; reset O/S cache; started postgres)
>
> test=# explain analyze SELECT * FROM test_select WHERE id = 50;
>                                QUERY PLAN
> --------------------------------------------------------------------------
>  Index Scan using test_select_pkey on test_select  (cost=0.00..8.36 rows=1 width=21) (actual time=31.396..31.398 rows=1 loops=1)
>    Index Cond: (id = 50)
>  Total runtime: 31.504 ms
>
> Schema:
>
> test=# \d test_select
>                          Table "public.test_select"
>  Column |     Type     |                        Modifiers
> --------+--------------+-----------------------------------------------------------
>  id     | integer      | not null default nextval('test_select_id_seq'::regclass)
>  key1   | integer      |
>  key2   | integer      |
>  key3   | integer      |
>  data   | character(4) |
> Indexes:
>     "test_select_pkey" PRIMARY KEY, btree (id)
>     "my_key" btree (key1, key2, key3, id)
>
> test=#
>
> Sample data:
>
> test=# SELECT * FROM test_select LIMIT 10;
>  id |  key1  |  key2  |  key3  | data
> ----+--------+--------+--------+------
>   1 | 984966 | 283954 | 772063 | x
>   2 | 817668 | 393533 | 924888 | x
>   3 | 751039 | 798753 | 454309 | x
>   4 | 128505 | 329643 | 280553 | x
>   5 | 105600 | 257225 | 710015 | x
>   6 | 323891 | 615614 |  83206 | x
>   7 | 194054 |  63506 | 353171 | x
>   8 | 212068 | 881225 | 271804 | x
>   9 | 644180 |  26693 | 200738 | x
>  10 | 136586 | 498699 | 554417 | x
> (10 rows)
>
> Here's how I populated the table:
>
> import psycopg2
> import random
>
> conn = psycopg2.connect('dbname=test')
>
> cur = conn.cursor()
>
> def random_int():
>     n = 100
>     return random.randint(0, n)
>
> def random_key():
>     return random_int(), random_int(), random_int()
>
> def create_table():
>     cur.execute('''
>         DROP TABLE IF EXISTS test_select;
>
>         CREATE TABLE test_select (
>             id   SERIAL PRIMARY KEY,
>             key1 INTEGER,
>             key2 INTEGER,
>             key3 INTEGER,
>             data char(4)
>         );
>     ''')
>     conn.commit()
>
>     n = 100
>     for i in range(n):
>         cur.execute("INSERT INTO test_select(key1, key2, key3, data) VALUES(%s, %s, %s, 'x')", random_key())
>     conn.commit()
>
>     cur.execute('CREATE INDEX my_key ON test_select(key1, key2, key3, id)')
>     conn.commit()
>
> create_table()

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
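The REINDEX suggestion above can be sketched concretely (the size-comparison query is illustrative, not from the original thread):

```sql
-- Rebuild the primary-key index; if it bloated while rows were being
-- inserted one at a time, the rebuilt copy will be more compact.
REINDEX INDEX test_select_pkey;

-- Optional sanity check: compare the two index sizes afterwards. A PK
-- index much larger than my_key (which covers more columns) suggests
-- the PK was bloated before the rebuild.
SELECT pg_size_pretty(pg_relation_size('test_select_pkey')) AS pkey_size,
       pg_size_pretty(pg_relation_size('my_key'))           AS my_key_size;
```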
[PERFORM] Slow SELECT by primary key? Postgres 9.1.2
Postgres 9.1.2 on Ubuntu 12.04

Any reason why a select by primary key would be slower than a select that includes an ORDER BY? I was really hoping using the primary key would give me a boost.

I stopped the server and cleared the O/S cache using "sync; echo 3 > /proc/sys/vm/drop_caches" between the runs.

test=# VACUUM ANALYZE test_select;
VACUUM

(stopped postgres; reset O/S cache; started postgres)

test=# explain analyze SELECT * FROM test_select WHERE key1 >= 50
       ORDER BY key1, key2, key3, id LIMIT 1;
                               QUERY PLAN
--------------------------------------------------------------------------
 Limit  (cost=0.00..0.08 rows=1 width=21) (actual time=12.599..12.600 rows=1 loops=1)
   ->  Index Scan using my_key on test_select  (cost=0.00..41895.49 rows=498724 width=21) (actual time=12.597..12.597 rows=1 loops=1)
         Index Cond: (key1 >= 50)
 Total runtime: 12.678 ms

(stopped postgres; reset O/S cache; started postgres)

test=# explain analyze SELECT * FROM test_select WHERE id = 50;
                               QUERY PLAN
--------------------------------------------------------------------------
 Index Scan using test_select_pkey on test_select  (cost=0.00..8.36 rows=1 width=21) (actual time=31.396..31.398 rows=1 loops=1)
   Index Cond: (id = 50)
 Total runtime: 31.504 ms

Schema:

test=# \d test_select
                         Table "public.test_select"
 Column |     Type     |                        Modifiers
--------+--------------+-----------------------------------------------------------
 id     | integer      | not null default nextval('test_select_id_seq'::regclass)
 key1   | integer      |
 key2   | integer      |
 key3   | integer      |
 data   | character(4) |
Indexes:
    "test_select_pkey" PRIMARY KEY, btree (id)
    "my_key" btree (key1, key2, key3, id)

test=#

Sample data:

test=# SELECT * FROM test_select LIMIT 10;
 id |  key1  |  key2  |  key3  | data
----+--------+--------+--------+------
  1 | 984966 | 283954 | 772063 | x
  2 | 817668 | 393533 | 924888 | x
  3 | 751039 | 798753 | 454309 | x
  4 | 128505 | 329643 | 280553 | x
  5 | 105600 | 257225 | 710015 | x
  6 | 323891 | 615614 |  83206 | x
  7 | 194054 |  63506 | 353171 | x
  8 | 212068 | 881225 | 271804 | x
  9 | 644180 |  26693 | 200738 | x
 10 | 136586 | 498699 | 554417 | x
(10 rows)

Here's how I populated the table:

import psycopg2
import random

conn = psycopg2.connect('dbname=test')

cur = conn.cursor()

def random_int():
    n = 100
    return random.randint(0, n)

def random_key():
    return random_int(), random_int(), random_int()

def create_table():
    cur.execute('''
        DROP TABLE IF EXISTS test_select;

        CREATE TABLE test_select (
            id   SERIAL PRIMARY KEY,
            key1 INTEGER,
            key2 INTEGER,
            key3 INTEGER,
            data char(4)
        );
    ''')
    conn.commit()

    n = 100
    for i in range(n):
        cur.execute("INSERT INTO test_select(key1, key2, key3, data) VALUES(%s, %s, %s, 'x')", random_key())
    conn.commit()

    cur.execute('CREATE INDEX my_key ON test_select(key1, key2, key3, id)')
    conn.commit()

create_table()
Re: [PERFORM] Slow SELECT on small table
Martin Boese wrote:

> The table has only ~1400 rows. A count(*) takes more than 70
> seconds. Other tables are fast as usual.
>
> When this happens I can also see my system's disks are suffering.
> 'systat -vm' shows 100% disk load at ~4MB/sec data rates.
>
> A simple VACUUM does *not* fix it, a VACUUM FULL however does. See
> the textfile attached.

This is almost certainly a result of bloat on this table. Autovacuum should normally protect you from that, but there are a few things which can prevent it from doing so, like long-running transactions or repeated updates against the entire table in a short time. There has also been a bug found recently which, as I understand it, can cause autovacuum to become less aggressive over time, which might possibly contribute to this sort of problem.

You appear to have snipped the portion of the vacuum output which might have confirmed and quantified the problem. If you get into this state again, the entire output of this would be informative:

VACUUM VERBOSE public.circuit;

The goal would be to try to prevent the bloat in the first place so that you don't need to use aggressive maintenance like VACUUM FULL to recover. Manual vacuums or tweaking the autovacuum parameters may help. Also, keep an eye out for maintenance releases for 9.0; there's likely to be a fix coming which will help you with this.

-Kevin
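Besides VACUUM VERBOSE, the statistics views give a cheap way to watch for the bloat building up again (a sketch; it assumes the statistics collector is enabled, and takes the table name "circuit" from the thread):

```sql
-- Dead vs. live tuple counts and the last (auto)vacuum times.
SELECT n_live_tup, n_dead_tup, last_vacuum, last_autovacuum
FROM pg_stat_user_tables
WHERE relname = 'circuit';

-- On-disk size of the heap. The healthy table was 41 pages (~328 kB),
-- so a size in the tens or hundreds of MB for ~1400 rows means bloat.
SELECT pg_size_pretty(pg_relation_size('circuit'));
```

A rising n_dead_tup with a stale last_autovacuum timestamp would point at autovacuum being blocked or lagging, which is exactly the failure mode described above.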
[PERFORM] Slow SELECT on small table
Hi,

I am using PostgreSQL 9.0.1 with PostGIS 1.5 on FreeBSD 7.0. I have at least one table on which SELECTs turn terribly slow from time to time. This has happened at least three times, also on version 8.4.

The table has only ~1400 rows. A count(*) takes more than 70 seconds. Other tables are fast as usual.

When this happens I can also see my system's disks are suffering. 'systat -vm' shows 100% disk load at ~4MB/sec data rates.

A simple VACUUM does *not* fix it, a VACUUM FULL however does. See the textfile attached.

My postgresql.conf is untouched as per distribution.

Can someone hint me how I can troubleshoot this problem?

Thanks!

Martin

mbms=# select version();
 PostgreSQL 9.0.1 on amd64-portbld-freebsd7.0, compiled by GCC cc (GCC) 4.2.1 20070719 [FreeBSD], 64-bit
(1 row)

mbms=# EXPLAIN ANALYZE select count(*) from circuit;
                               QUERY PLAN
--------------------------------------------------------------------------
 Aggregate  (cost=22213.20..22213.21 rows=1 width=0) (actual time=72847.131..72847.133 rows=1 loops=1)
   ->  Seq Scan on circuit  (cost=0.00..22209.56 rows=1456 width=0) (actual time=15527.799..72844.633 rows=1456 loops=1)
 Total runtime: 72922.484 ms
(3 rows)

mbms=# VACUUM FULL VERBOSE ANALYZE ;
(..snip..)
INFO:  vacuuming "public.circuit"
INFO:  analyzing "public.circuit"
INFO:  "circuit": scanned 41 of 41 pages, containing 1456 live rows and 0 dead rows; 1456 rows in sample, 1456 estimated total rows
VACUUM

mbms=# EXPLAIN ANALYZE select count(*) from circuit;
                               QUERY PLAN
--------------------------------------------------------------------------
 Aggregate  (cost=59.20..59.21 rows=1 width=0) (actual time=4.416..4.418 rows=1 loops=1)
   ->  Seq Scan on circuit  (cost=0.00..55.56 rows=1456 width=0) (actual time=0.013..2.302 rows=1456 loops=1)
 Total runtime: 4.470 ms
(3 rows)

mbms=# \d circuit
                         Table "public.circuit"
           Column            |            Type             | Modifiers
-----------------------------+-----------------------------+-------------------------------------------------------
 id                          | integer                     | not null default nextval('circuit_id_seq'::regclass)
 router_id                   | integer                     |
 customer_id                 | integer                     |
 site                        | character varying(256)      |
 service                     | character varying(48)       |
 last_ifdescr                | character varying(256)      |
 last_ifalias                | character varying(256)      |
 bandwidth_ul                | integer                     | default 0
 bandwidth_dl                | integer                     | default 0
 last_seen                   | timestamp without time zone |
 last_ifindex                | integer                     | default 0
 termination_point_id        | integer                     |
 last_adminstatus            | smallint                    | default 1
 status                      | character(1)                | default 'A'::bpchar
 location                    | geometry                    |
 circuit_type                | character(1)                | default 'R'::bpchar
 p_local                     | character varying(256)      |
 p_remote                    | character varying(256)      |
 p_technology                | character varying(64)       |
 x_provider                  | character varying(256)      |
 x_provider_reference        | character varying(128)      |
 termination_point_reference | character varying(64)       |
 is_monitored                | boolean                     | default false
 temp_config                 | boolean                     | default false
 activation_date             | timestamp without time zone |
 s_last_did                  | character varying(64)       |
 td_cache_class              | character varying(128)      |
 td_cache_id                 | integer                     |
 td_cache_lastupdate         | timestamp without time zone |
 commercial_service_id       | integer                     |
Indexes:
    "circuit_id_key" UNIQUE, btree (id)
    "id_circuit_status" btree (status)
    "idx_circuit_customer_id" btree (customer_id)
    "idx_circuit_site" btree (site)
Check constraints:
    "enforce_dims_location" CHECK (ndims(location) = 2)
    "enforce_geotype_location" CHECK (geom
Re: [PERFORM] Slow "Select count(*) ..." query on table with 60 Mio. rows
Kevin Grittner wrote:

> Greg Smith wrote:
>> There's a whole list of FAQs that are documented on the wiki but not
>> in the main FAQ yet, leftover from before the main FAQ was hosted
>> there. You can see them all at
>> http://wiki.postgresql.org/wiki/Frequently_Asked_Questions
>>
>> I just haven't had time to merge those all usefully into the main FAQ.
>
> Well, unless you object to the way I did it, there's one down. Should
> I remove it from the list of "Other FAQs" on the page you cite?

Sure; everyone should feel free to assimilate into the main FAQ and wipe out anything on that smaller list. Those are mainly topics where the discussion of workarounds and approaches can be much longer than standard FAQ length, so I suspect many of the answers are going to be a very brief summary with a link to longer discussion. If you come across a really small one, we might even wipe out the original page once it's merged in.

--
Greg Smith    2ndQuadrant    Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com    www.2ndQuadrant.com
Re: [PERFORM] Slow "Select count(*) ..." query on table with 60 Mio. rows
Greg Smith wrote:
> There's a whole list of FAQs that are documented on the wiki but
> not in the main FAQ yet leftover from before the main FAQ was
> hosted there. You can see them all at
> http://wiki.postgresql.org/wiki/Frequently_Asked_Questions
>
> I just haven't had time to merge those all usefully into the main
> FAQ.

Well, unless you object to the way I did it, there's one down. Should I remove it from the list of "Other FAQs" on the page you cite?

(Of course, it goes without saying that you're welcome to improve upon anything I put in there.)

-Kevin
Re: [PERFORM] Slow "Select count(*) ..." query on table with 60 Mio. rows
Kevin Grittner wrote:

> Greg Smith wrote:
>> The content was already there, just not linked into the main FAQ yet:
>> http://wiki.postgresql.org/wiki/Slow_Counting
>
> For a question asked this frequently, it should probably be in the
> FAQ. I'll add a link from there to the more thorough write-up.

There's a whole list of FAQs that are documented on the wiki but not in the main FAQ yet, leftover from before the main FAQ was hosted there. You can see them all at http://wiki.postgresql.org/wiki/Frequently_Asked_Questions

I just haven't had time to merge those all usefully into the main FAQ.

--
Greg Smith    2ndQuadrant    Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com    www.2ndQuadrant.com
Re: [PERFORM] Slow "Select count(*) ..." query on table with 60 Mio. rows
Greg Smith wrote:
> The content was already there, just not linked into the main FAQ
> yet:
> http://wiki.postgresql.org/wiki/Slow_Counting

For a question asked this frequently, it should probably be in the FAQ. I'll add a link from there to the more thorough write-up.

-Kevin
Re: [PERFORM] Slow "Select count(*) ..." query on table with 60 Mio. rows
Kevin Grittner wrote:

> Matthew Wakeling wrote:
>> This is an FAQ.
>
> I just added it to the wiki FAQ page:
> http://wiki.postgresql.org/wiki/FAQ#Why_is_.22SELECT_count.28.2A.29_FROM_bigtable.3B.22_slow.3F

The content was already there, just not linked into the main FAQ yet:
http://wiki.postgresql.org/wiki/Slow_Counting

--
Greg Smith    2ndQuadrant    Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com    www.2ndQuadrant.com
Re: [PERFORM] Slow "Select count(*) ..." query on table with 60 Mio. rows
Ivan Voras wrote:
> Maybe you could add a short note why an estimation like from the
> pg_class table is usually enough.

OK. Will do.

-Kevin
Re: [PERFORM] Slow "Select count(*) ..." query on table with 60 Mio. rows
Kevin Grittner wrote:

> Matthew Wakeling wrote:
>> This is an FAQ.
>
> I just added it to the wiki FAQ page:
> http://wiki.postgresql.org/wiki/FAQ#Why_is_.22SELECT_count.28.2A.29_FROM_bigtable.3B.22_slow.3F

Maybe you could add a short note why an estimation like from the pg_class table is usually enough.
Re: [PERFORM] Slow "Select count(*) ..." query on table with 60 Mio. rows
Matthew Wakeling wrote:
> This is an FAQ.

I just added it to the wiki FAQ page:
http://wiki.postgresql.org/wiki/FAQ#Why_is_.22SELECT_count.28.2A.29_FROM_bigtable.3B.22_slow.3F

-Kevin
Re: [PERFORM] Slow "Select count(*) ..." query on table with 60 Mio. rows
In response to tom:

> Hi,
>
> === Problem ===
>
> i have a db-table "data_measurand" with about 60,000,000 (60 Millions)
> rows and the following query takes about 20-30 seconds (with psql):
>
> mydb=# select count(*) from data_measurand;
>   count
> ----------
>  60846187
> (1 row)
>
> === Question ===
>
> - What can i do to improve the performance for the data_measurand table?

Short answer: nothing.

Long answer: PG has to check the visibility of each record, so it forces a seq. scan. But you can get an estimation: ask pg_class (a system table); the column reltuples there contains an estimated row count.

http://www.postgresql.org/docs/current/static/catalog-pg-class.html

If you really need the correct row count, you should create a TRIGGER and count all INSERTs and DELETEs with it.

Regards, Andreas
--
Andreas Kretschmer
Kontakt:  Heynitz: 035242/47150,   D1: 0160/7141639 (mehr: -> Header)
GnuPG: 0x31720C99, 1006 CCB4 A326 1D42 6431 2EB0 389D 1DC2 3172 0C99
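Both suggestions above can be sketched concretely (the counter table and trigger names below are illustrative, not from the thread; the trigger adds write overhead and serializes concurrent writers on the counter row):

```sql
-- Cheap estimate from the catalog, kept current by VACUUM/ANALYZE:
SELECT reltuples::bigint AS estimated_rows
FROM pg_class
WHERE relname = 'data_measurand';

-- Exact count maintained by a trigger:
CREATE TABLE rowcount (tbl name PRIMARY KEY, cnt bigint NOT NULL);
INSERT INTO rowcount SELECT 'data_measurand', count(*) FROM data_measurand;

CREATE OR REPLACE FUNCTION count_rows() RETURNS trigger AS $$
BEGIN
    IF TG_OP = 'INSERT' THEN
        UPDATE rowcount SET cnt = cnt + 1 WHERE tbl = TG_TABLE_NAME;
    ELSE  -- DELETE
        UPDATE rowcount SET cnt = cnt - 1 WHERE tbl = TG_TABLE_NAME;
    END IF;
    RETURN NULL;  -- AFTER trigger; return value is ignored
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER data_measurand_count
AFTER INSERT OR DELETE ON data_measurand
FOR EACH ROW EXECUTE PROCEDURE count_rows();
```

After that, "SELECT cnt FROM rowcount WHERE tbl = 'data_measurand'" returns the exact count in constant time.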
Re: [PERFORM] Slow "Select count(*) ..." query on table with 60 Mio. rows
On Thu, 14 Jan 2010, tom wrote:

> i have a db-table "data_measurand" with about 60,000,000 (60 Millions)
> rows and the following query takes about 20-30 seconds (with psql):
>
> mydb=# select count(*) from data_measurand;
>   count
> ----------
>  60846187
> (1 row)

Sounds pretty reasonable to me. Looking at your table, the rows are maybe 200 bytes wide? That's 12GB of data for Postgres to munch through. 30 seconds is really rather quick for that (400MB/s). What sort of RAID array is managing to give you that much?

> I use a software raid and LVM for Logical Volume Management. Filesystem
> is ext3

Ditch lvm.

This is an FAQ. Counting the rows in a table is an expensive operation in Postgres. It can't be answered directly from an index. If you want, you can keep track of the number of rows yourself with triggers, but beware that this will slow down write access to the table.

Matthew

--
Nog: Look! They've made me into an ensign!
O'Brien: I didn't know things were going so badly.
Nog: Frightening, isn't it?
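Matthew's back-of-envelope figures check out (a quick sanity check; the 200-byte row width is his guess, not a measured value):

```python
# Sanity-check the throughput estimate: 60.8M rows at ~200 bytes each,
# scanned in ~30 seconds.
rows = 60846187        # from the count(*) result in the thread
row_bytes = 200        # assumed average row width
seconds = 30

total_gb = rows * row_bytes / 1e9
mb_per_second = rows * row_bytes / seconds / 1e6

print(round(total_gb, 1))      # ~12.2 GB of table data to read
print(round(mb_per_second))    # ~406 MB/s sustained read rate
```

Roughly 400 MB/s sustained sequential read is indeed fast for 2010-era spinning disks, which is the point being made.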
[PERFORM] Slow "Select count(*) ..." query on table with 60 Mio. rows
Hi,

=== Problem ===

i have a db-table "data_measurand" with about 60,000,000 (60 Millions) rows and the following query takes about 20-30 seconds (with psql):

mydb=# select count(*) from data_measurand;
  count
----------
 60846187
(1 row)

=== Question ===

- What can i do to improve the performance for the data_measurand table?

=== Background ===

I created an application with django 1.1 ( http://djangoproject.com ) to collect, analyze and visualize measurement data.

=== My System ===

= Postgres Version =

postgres=# select version();
 PostgreSQL 8.3.9 on i486-pc-linux-gnu, compiled by GCC gcc-4.3.real (Debian 4.3.2-1.1) 4.3.2
(1 row)

I installed postgres with apt-get from debian lenny without any modifications.

= Debian Lenny Kernel Version =

lenny:~# uname -a
Linux or.ammonit.com 2.6.26-2-686-bigmem #1 SMP Wed Nov 4 21:12:12 UTC 2009 i686 GNU/Linux

= Hardware =

model name : AMD Athlon(tm) 64 X2 Dual Core Processor 6000+
cpu MHz    : 1000.000
cache size : 512 KB
MemTotal   : 8281516 kB (8 GB)

I use a software raid and LVM for Logical Volume Management.
Filesystem is ext3

=== My Table Definitions ===

mydb=# \d data_measurand;
              Table "public.data_measurand"
     Column      |          Type          | Modifiers
-----------------+------------------------+--------------------------------------------------------------
 id              | integer                | not null default nextval('data_measurand_id_seq'::regclass)
 entry_id        | integer                | not null
 sensor_id       | integer                | not null
 avg_value       | numeric(10,4)          |
 avg_count_value | integer                |
 min_value       | numeric(10,4)          |
 max_value       | numeric(10,4)          |
 sigma_value     | numeric(10,4)          |
 unit            | character varying(20)  | not null
 status          | integer                | not null
 comment         | character varying(255) | not null
Indexes:
    "data_measurand_pkey" PRIMARY KEY, btree (id)
    "data_measurand_entry_id_68e2e3fe" UNIQUE, btree (entry_id, sensor_id)
    "data_measurand_avg_count_value" btree (avg_count_value)
    "data_measurand_avg_value" btree (avg_value)
    "data_measurand_comment" btree (comment)
    "data_measurand_entry_id" btree (entry_id)
    "data_measurand_max_value" btree (max_value)
    "data_measurand_min_value" btree (min_value)
    "data_measurand_sensor_id" btree (sensor_id)
    "data_measurand_sigma_value" btree (sigma_value)
    "data_measurand_status" btree (status)
    "data_measurand_unit" btree (unit)
Foreign-key constraints:
    "entry_id_refs_id_50fa9bdf" FOREIGN KEY (entry_id) REFERENCES data_entry(id) DEFERRABLE INITIALLY DEFERRED
    "sensor_id_refs_id_5ed84c7c" FOREIGN KEY (sensor_id) REFERENCES sensor_sensor(id) DEFERRABLE INITIALLY DEFERRED

mydb=# \d data_entry;
              Table "public.data_entry"
      Column      |           Type           | Modifiers
------------------+--------------------------+-----------------------------------------------------------
 id               | integer                  | not null default nextval('data_entry_id_seq'::regclass)
 project_id       | integer                  | not null
 logger_id        | integer                  | not null
 original_file_id | integer                  | not null
 datetime         | timestamp with time zone | not null
Indexes:
    "data_entry_pkey" PRIMARY KEY, btree (id)
    "data_entry_logger_id_197f5d41" UNIQUE, btree (logger_id, datetime)
    "data_entry_datetime" btree (datetime)
    "data_entry_logger_id" btree (logger_id)
    "data_entry_original_file_id" btree (original_file_id)
    "data_entry_project_id" btree (project_id)
Foreign-key constraints:
    "logger_id_refs_id_5f73cf46" FOREIGN KEY (logger_id) REFERENCES logger_logger(id) DEFERRABLE INITIALLY DEFERRED
    "original_file_id_refs_id_44e8d3b1" FOREIGN KEY (original_file_id) REFERENCES data_originalfile(id) DEFERRABLE INITIALLY DEFERRED
    "project_id_refs_id_719fb302" FOREIGN KEY (project_id) REFERENCES project_project(id) DEFERRABLE INITIALLY DEFERRED

mydb=# \d project_project;
              Table "public.project_project"
    Column     |          Type          | Modifiers
---------------+------------------------+---------------------------------------------------------------
 id            | integer                | not null default nextval('project_project_id_seq'::regclass)
 auth_group_id | integer                | not null
 name          | character varying(200) | not null
 timezone      | character varying(200) |
 longitude     | double precision       |
 latitude      | double precision       |
 altitude      | double precision       |
 co
Re: [PERFORM] Slow select times on select with xpath
astro77 writes:

> Kevin Grittner wrote:
>> I would try to minimize how many XML values it had to read, parse, and
>> search. The best approach that comes to mind would be to use tsearch2
>> techniques (with a GIN or GiST index on the tsvector) to identify
>> which rows contain 'fdc3da1f-060f-4c34-9c30-d9334d9272ae', and use AND
>> to combine that with your xpath search.
>
> Thanks Kevin. I thought about using tsearch2 but I need to be able to
> select exact values on other numerical queries and cannot use "contains"
> queries. It's got to be fast so I cannot have lots of records returned
> and have to do secondary processing on the xml for the records which
> contain the exact value I'm looking for. This is one of the reasons I
> moved from using Lucene for searching. I hope this makes sense.

I think he meant something following this skeleton:

  SELECT ...
    FROM (SELECT ...
            FROM ...
           WHERE /* insert preliminary filtering here */
         )
   WHERE /* insert xpath related filtering here */

Hopefully you have a preliminary filtering available that's restrictive enough for the xpath filtering to only have to check few rows. Kevin proposes that this preliminary filtering be based on Tsearch with an adequate index (GiST for data changing a lot, GIN for a pretty static set). As you can see, the two-steps filtering can be done in a single SQL query.

Regards,
--
dim
Re: [PERFORM] Slow select times on select with xpath
astro77 wrote:
> Thanks Kevin. I thought about using tsearch2 but I need to be able to
> select exact values on other numerical queries and cannot use "contains"
> queries.

You might be able to make use of a custom parser for tsearch2 that creates something like a single "word" for xml fragments like 1, which would let you quickly find exact matches for those words/phrases.

> It's got to be fast so I cannot have lots of records returned and have
> to do secondary processing on the xml for the records which contain the
> exact value I'm looking for. This is one of the reasons I moved from
> using Lucene for searching. I hope this makes sense.
>
> Kevin Grittner wrote:
>> I would try to minimize how many XML values it had to read, parse, and
>> search. The best approach that comes to mind would be to use tsearch2
>> techniques (with a GIN or GiST index on the tsvector) to identify
>> which rows contain 'fdc3da1f-060f-4c34-9c30-d9334d9272ae', and use AND
>> to combine that with your xpath search.
>>
>> -Kevin
Re: [PERFORM] Slow select times on select with xpath
As a follow-up, when I try to create the index like this...

CREATE INDEX concurrently idx_object_nodeid2
  ON object
  USING btree(
    xpath('/a:root/a:Identification/b:ObjectId/text()', serialized_object,
      ARRAY
      [
        ARRAY['a', 'http://schemas.datacontract.org/2004/07/Objects'],
        ARRAY['b', 'http://schemas.datacontract.org/2004/07/Security']
      ])
  );

The index begins to build but fails after about 90 seconds with this error:

ERROR: could not identify a comparison function for type xml
SQL state: 42883

Robert Haas wrote:
> On Thu, Sep 3, 2009 at 4:06 PM, astro77 wrote:
>> I was receiving an error that an XML field does not support the various
>> indexes available in postgresql.
>
> Please post what happens when you try.
>
>> Is there an example of how to do this properly?
>
> Not sure.
>
> ...Robert
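One way around that error (a sketch, not from the thread): btree cannot compare xml or xml[] values, but it can index the result once it is cast to a comparable scalar type, and the query must then repeat the identical expression for the planner to use the index. The index name below is invented:

```sql
-- Index the first matched node, cast to plain text instead of xml[].
CREATE INDEX CONCURRENTLY idx_object_nodeid_text ON object (
    (
        (xpath('/a:root/a:Identification/b:ObjectId/text()',
               serialized_object,
               ARRAY[ARRAY['a', 'http://schemas.datacontract.org/2004/07/Objects'],
                     ARRAY['b', 'http://schemas.datacontract.org/2004/07/Security']]
        ))[1]::text
    )
);

-- The query must use the same (xpath(...))[1]::text expression in its
-- WHERE clause, compared against the target value, for this index to
-- be considered.
```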
Re: [PERFORM] Slow select times on select with xpath
Thanks Kevin. I thought about using tsearch2 but I need to be able to select exact values on other numerical queries and cannot use "contains" queries. It's got to be fast so I cannot have lots of records returned and have to do secondary processing on the xml for the records which contain the exact value I'm looking for. This is one of the reasons I moved from using Lucene for searching. I hope this makes sense.

Kevin Grittner wrote:
> I would try to minimize how many XML values it had to read, parse, and
> search. The best approach that comes to mind would be to use tsearch2
> techniques (with a GIN or GiST index on the tsvector) to identify
> which rows contain 'fdc3da1f-060f-4c34-9c30-d9334d9272ae', and use AND
> to combine that with your xpath search.
>
> -Kevin
Re: [PERFORM] Slow select times on select with xpath
CREATE INDEX CONCURRENTLY idx_serializedxml
  ON "object" (serialized_object ASC NULLS LAST);

yields the error:

ERROR: data type xml has no default operator class for access method "btree"

The same error occurs when I try to use the other access methods as well.

> On Thu, Sep 3, 2009 at 4:06 PM, astro77 wrote:
>> I was receiving an error that an XML field does not support the various
>> indexes available in postgresql.
>
> Please post what happens when you try.
Re: [PERFORM] Slow select times on select with xpath
On Thu, Sep 3, 2009 at 4:06 PM, astro77 wrote:
> I was receiving an error that an XML field does not support the various
> indexes available in postgresql.

Please post what happens when you try.

> Is there an example of how to do this properly?

Not sure.

...Robert
Re: [PERFORM] Slow select times on select with xpath
I was receiving an error that an XML field does not support the various indexes available in postgresql. Is there an example of how to do this properly?

Robert Haas wrote:
> On Wed, Sep 2, 2009 at 11:04 AM, astro77 wrote:
>> I've got a table set up with an XML field that I would like to search
>> on with 2.5 million records. The xml are serialized objects from my
>> application which are too complex to break out into separate tables.
>> I'm trying to run a query similar to this:
>>
>> SELECT serialized_object as outVal
>>   from object where
>>   (
>>     array_to_string(xpath('/a:root/a:Identification/b:ObjectId/text()',
>>       serialized_object,
>>       ARRAY
>>       [
>>         ARRAY['a', 'http://schemas.datacontract.org/2004/07/Objects'],
>>         ARRAY['b', 'http://schemas.datacontract.org/2004/07/Security']
>>       ]), ' ') = 'fdc3da1f-060f-4c34-9c30-d9334d9272ae'
>>   )
>>   limit 1000;
>>
>> I've also set up an index on the xpath query like this...
>>
>> CREATE INDEX concurrently idx_object_nodeid
>>   ON object
>>   USING btree(
>>     cast(xpath('/a:root/a:Identification/b:ObjectId/text()', serialized_object,
>>       ARRAY
>>       [
>>         ARRAY['a', 'http://schemas.datacontract.org/2004/07/Objects'],
>>         ARRAY['b', 'http://schemas.datacontract.org/2004/07/Security']
>>       ]) as text[])
>>   );
>>
>> The query takes around 30 minutes to complete with or without the index
>> in place and does not cache the query. Additionally the EXPLAIN says
>> that the index is not being used. I've looked everywhere but can't seem
>> to find solid info on how to achieve this. Any ideas would be greatly
>> appreciated.
>
> Why do you have a cast in the index definition?
>
> ...Robert
Re: [PERFORM] Slow select times on select with xpath
On Wed, Sep 2, 2009 at 11:04 AM, astro77 wrote:
> I've got a table set up with an XML field that I would like to search
> on with 2.5 million records. The xml are serialized objects from my
> application which are too complex to break out into separate tables.
> I'm trying to run a query similar to this:
>
> SELECT serialized_object as outVal
>   from object where
>   (
>     array_to_string(xpath('/a:root/a:Identification/b:ObjectId/text()',
>       serialized_object,
>       ARRAY
>       [
>         ARRAY['a', 'http://schemas.datacontract.org/2004/07/Objects'],
>         ARRAY['b', 'http://schemas.datacontract.org/2004/07/Security']
>       ]), ' ') = 'fdc3da1f-060f-4c34-9c30-d9334d9272ae'
>   )
>   limit 1000;
>
> I've also set up an index on the xpath query like this...
>
> CREATE INDEX concurrently idx_object_nodeid
>   ON object
>   USING btree(
>     cast(xpath('/a:root/a:Identification/b:ObjectId/text()', serialized_object,
>       ARRAY
>       [
>         ARRAY['a', 'http://schemas.datacontract.org/2004/07/Objects'],
>         ARRAY['b', 'http://schemas.datacontract.org/2004/07/Security']
>       ]) as text[])
>   );
>
> The query takes around 30 minutes to complete with or without the index
> in place and does not cache the query. Additionally the EXPLAIN says
> that the index is not being used. I've looked everywhere but can't seem
> to find solid info on how to achieve this. Any ideas would be greatly
> appreciated.

Why do you have a cast in the index definition?

...Robert
Re: [PERFORM] Slow select times on select with xpath
astro77 wrote:
> I've got a table set up with an XML field that I would like to search
> on with 2.5 million records. The xml are serialized objects from my
> application which are too complex to break out into separate tables.
> I'm trying to run a query similar to this:
>
> SELECT serialized_object as outVal
>   from object where
>   (
>     array_to_string(xpath('/a:root/a:Identification/b:ObjectId/text()',
>       serialized_object,
>       ARRAY
>       [
>         ARRAY['a', 'http://schemas.datacontract.org/2004/07/Objects'],
>         ARRAY['b', 'http://schemas.datacontract.org/2004/07/Security']
>       ]), ' ') = 'fdc3da1f-060f-4c34-9c30-d9334d9272ae'
>   )
>   limit 1000;

I would try to minimize how many XML values it had to read, parse, and search. The best approach that comes to mind would be to use tsearch2 techniques (with a GIN or GiST index on the tsvector) to identify which rows contain 'fdc3da1f-060f-4c34-9c30-d9334d9272ae', and use AND to combine that with your xpath search.

-Kevin
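Kevin's two-step idea could look roughly like this (a sketch under assumptions: the `search_vec` column, index name, and use of the 'simple' configuration are invented; the xml is flattened to text for indexing):

```sql
-- Maintain a tsvector alongside the xml and index it with GIN.
ALTER TABLE object ADD COLUMN search_vec tsvector;
UPDATE object SET search_vec = to_tsvector('simple', serialized_object::text);
CREATE INDEX idx_object_search_vec ON object USING gin (search_vec);

-- The cheap GIN lookup narrows the candidate rows; the expensive xpath
-- test then runs only on those few rows.
SELECT serialized_object AS outVal
  FROM object
 WHERE search_vec @@ plainto_tsquery('simple', 'fdc3da1f-060f-4c34-9c30-d9334d9272ae')
   AND array_to_string(xpath('/a:root/a:Identification/b:ObjectId/text()',
         serialized_object,
         ARRAY[ARRAY['a', 'http://schemas.datacontract.org/2004/07/Objects'],
               ARRAY['b', 'http://schemas.datacontract.org/2004/07/Security']]),
         ' ') = 'fdc3da1f-060f-4c34-9c30-d9334d9272ae'
 LIMIT 1000;
```

A trigger (or generated expression) would be needed to keep search_vec current on writes; that part is omitted here.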
[PERFORM] Slow select times on select with xpath
I've got a table set up with an XML field that I would like to search on with 2.5 million records. The xml are serialized objects from my application which are too complex to break out into separate tables. I'm trying to run a query similar to this:

SELECT serialized_object as outVal
  from object where
  (
    array_to_string(xpath('/a:root/a:Identification/b:ObjectId/text()',
      serialized_object,
      ARRAY
      [
        ARRAY['a', 'http://schemas.datacontract.org/2004/07/Objects'],
        ARRAY['b', 'http://schemas.datacontract.org/2004/07/Security']
      ]), ' ') = 'fdc3da1f-060f-4c34-9c30-d9334d9272ae'
  )
  limit 1000;

I've also set up an index on the xpath query like this...

CREATE INDEX concurrently idx_object_nodeid
  ON object
  USING btree(
    cast(xpath('/a:root/a:Identification/b:ObjectId/text()', serialized_object,
      ARRAY
      [
        ARRAY['a', 'http://schemas.datacontract.org/2004/07/Objects'],
        ARRAY['b', 'http://schemas.datacontract.org/2004/07/Security']
      ]) as text[])
  );

The query takes around 30 minutes to complete with or without the index in place and does not cache the query. Additionally the EXPLAIN says that the index is not being used. I've looked everywhere but can't seem to find solid info on how to achieve this. Any ideas would be greatly appreciated.
Re: [PERFORM] Slow select performance despite seemingly reasonable query plan
2009/5/7 David Brain :

Hi,

> Some answers in-line:
>
>> One concern I might have with a big setup like that is how big the database
>> directory has got, and whether directory lookups are taking time. Check to
>> see if you have the directory_index option enabled on your ext3 filesystem.
>
> That's a thought, I doubt the option is set (I didn't set it and I
> don't _think_ rhel does by default), however the 'base' directory only
> contains ~5500 items total, so it's not getting too out of hand.

The default RHEL ext3 options (in both 4.x and 5.x) are:

Filesystem features: has_journal ext_attr resize_inode dir_index filetype needs_recovery sparse_super large_file

so dir_index is enabled out of the box. See tune2fs -l /dev/sdXY.

Laurent.
Re: [PERFORM] Slow select performance despite seemingly reasonable query plan
On Thu, May 7, 2009 at 11:19 AM, Matthew Wakeling wrote:
> On Thu, 7 May 2009, David Brain wrote:
>
>>> Certainly random access like this index scan can be extremely slow. 2-4
>>> MB/s is quite reasonable if you're fetching one 8kB block per disc seek -
>>> no more than 200 per second.
>>
>> We have read ahead set pretty aggressively high as the SAN seems to
>> 'like' this, given some testing we did:
>>
>> /sbin/blockdev --getra /dev/sdb
>> 16384
>
> Read-ahead won't really help with completely random access.

That's a shame, because it would be really nice to get the entire index into shared memory or the OS cache. Most of the time queries are on data from the past few months, and all of the indexes for the past few months should fit in cache. Did something happen to flush those indexes from the cache? Were they in the cache before?

> I think a much more interesting line of enquiry will be trying to work out
> what has changed, and why it was fast before.
>
> How much of the data you're accessing are you expecting to be in the OS
> cache?
>
> Is the table you're index scanning on ordered at all? Could that have
> changed recently?

I wrote the application that puts data in that table. It's sort of ordered by that timestamp: every five minutes it adds, in no particular order, the rows that need to be added. The rows added in each five-minute batch are ordered by another timestamp that is correlated to, but not the same as, the indexed timestamp.

>> That's a thought, I doubt the option is set (I didn't set it and I
>> don't _think_ rhel does by default), however the 'base' directory only
>> contains ~5500 items total, so it's not getting too out of hand.
>
> I think quite a few systems do set it by default now.
>
> Matthew
Re: [PERFORM] Slow select performance despite seemingly reasonable query plan
On Thu, 7 May 2009, David Brain wrote:
>> Certainly random access like this index scan can be extremely slow. 2-4 MB/s
>> is quite reasonable if you're fetching one 8kB block per disc seek - no more
>> than 200 per second.
>
> We have read ahead set pretty aggressively high as the SAN seems to
> 'like' this, given some testing we did:
>
> /sbin/blockdev --getra /dev/sdb
> 16384

Read-ahead won't really help with completely random access.

I think a much more interesting line of enquiry will be trying to work out what has changed, and why it was fast before.

How much of the data you're accessing are you expecting to be in the OS cache?

Is the table you're index scanning on ordered at all? Could that have changed recently?

> That's a thought, I doubt the option is set (I didn't set it and I
> don't _think_ rhel does by default), however the 'base' directory only
> contains ~5500 items total, so it's not getting too out of hand.

I think quite a few systems do set it by default now.

Matthew

--
Me... a skeptic? I trust you have proof?
Re: [PERFORM] Slow select performance despite seemingly reasonable query plan
> Nested Loop Left Join (cost=0.00..6462463.96 rows=1894 width=110)
>   -> Append (cost=0.00..6453365.66 rows=1894 width=118)
>     -> Seq Scan on datatable sum (cost=0.00..10.75 rows=1 width=118)
>          Filter: ((datapointdate >= '2009-04-01 00:00:00'::timestamp without time zone) AND (datapointdate <= '2009-04-30 23:59:59'::timestamp without time zone) AND ((customerid)::text = ''::text) AND (NOT CASE WHEN (NOT obsolete) THEN false ELSE CASE WHEN (obsoletereasonid IS NULL) THEN true WHEN (obsoletereasonid = 1) THEN true WHEN (obsoletereasonid = 2) THEN true WHEN (cdrdatasourceid = 1) THEN false ELSE true END END))
>     -> Index Scan using datatable_20090328_customeriddatapointdate_idx on datatable_20090328 sum (cost=0.00..542433.51 rows=180 width=49)
>          Index Cond: ((datapointdate >= '2009-04-01 00:00:00'::timestamp without time zone) AND (datapointdate <= '2009-04-30 23:59:59'::timestamp without time zone) AND ((customerid)::text = ''::text))
>          Filter: (NOT CASE WHEN (NOT obsolete) THEN false ELSE CASE WHEN (obsoletereasonid IS NULL) THEN true WHEN (obsoletereasonid = 1) THEN true WHEN (obsoletereasonid = 2) THEN true WHEN (cdrdatasourceid = 1) THEN false ELSE true END END)
>     -> Index Scan using datatable_20090404_customeriddatapointdate_idx on datatable_20090404 sum (cost=0.00..1322098.74 rows=405 width=48)
>          Index Cond: ((datapointdate >= '2009-04-01 00:00:00'::timestamp without time zone) AND (datapointdate <= '2009-04-30 23:59:59'::timestamp without time zone) AND ((customerid)::text = ''::text))
>          Filter: (NOT CASE WHEN (NOT obsolete) THEN false ELSE CASE WHEN (obsoletereasonid IS NULL) THEN true WHEN (obsoletereasonid = 1) THEN true WHEN (obsoletereasonid = 2) THEN true WHEN (cdrdatasourceid = 1) THEN false ELSE true END END)
>     -> Index Scan using datatable_20090411_customeriddatapointdate_idx on datatable_20090411 sum (cost=0.00..1612744.29 rows=450 width=48)
>          Index Cond: ((datapointdate >= '2009-04-01 00:00:00'::timestamp without time zone) AND (datapointdate <= '2009-04-30 23:59:59'::timestamp without time zone) AND ((customerid)::text = ''::text))
>          Filter: (NOT CASE WHEN (NOT obsolete) THEN false ELSE CASE WHEN (obsoletereasonid IS NULL) THEN true WHEN (obsoletereasonid = 1) THEN true WHEN (obsoletereasonid = 2) THEN true WHEN (cdrdatasourceid = 1) THEN false ELSE true END END)
>     -> Index Scan using datatable_20090418_customeriddatapointdate_idx on datatable_20090418 sum (cost=0.00..1641913.58 rows=469 width=49)
>          Index Cond: ((datapointdate >= '2009-04-01 00:00:00'::timestamp without time zone) AND (datapointdate <= '2009-04-30 23:59:59'::timestamp without time zone) AND ((customerid)::text = ''::text))
>          Filter: (NOT CASE WHEN (NOT obsolete) THEN false ELSE CASE WHEN (obsoletereasonid IS NULL) THEN true WHEN (obsoletereasonid = 1) THEN true WHEN (obsoletereasonid = 2) THEN true WHEN (cdrdatasourceid = 1) THEN false ELSE true END END)
>     -> Index Scan using datatable_20090425_customeriddatapointdate_idx on datatable_20090425 sum (cost=0.00..1334164.80 rows=389 width=49)
>          Index Cond: ((datapointdate >= '2009-04-01 00:00:00'::timestamp without time zone) AND (datapointdate <= '2009-04-30 23:59:59'::timestamp without time zone) AND ((customerid)::text = ''::text))
>          Filter: (NOT CASE WHEN (NOT obsolete) THEN false ELSE CASE WHEN (obsoletereasonid IS NULL) THEN true WHEN (obsoletereasonid = 1) THEN true WHEN (obsoletereasonid = 2) THEN true WHEN (cdrdatasourceid = 1) THEN false ELSE true END END)
>   -> Index Scan using pk_cdrextension on cdrextension ext (cost=0.00..4.77 rows=1 width=8)
>        Index Cond: (sum.id = ext.datatableid)

Something doesn't look right. Why is it doing an index scan on datatable_20090404 when the constraint for that table puts it entirely within the date range? Shouldn't it just seq scan the partition, or use the partition's customerid index?
Re: [PERFORM] Slow select performance despite seemingly reasonable query plan
Hi,

Some answers in-line:

> Has there been a performance *change*, or are you just concerned about a
> query which doesn't seem to use "enough" disc bandwidth?

Performance has degraded noticeably over the past few days.

> Certainly random access like this index scan can be extremely slow. 2-4 MB/s
> is quite reasonable if you're fetching one 8kB block per disc seek - no more
> than 200 per second.

We have read ahead set pretty aggressively high as the SAN seems to 'like' this, given some testing we did:

/sbin/blockdev --getra /dev/sdb
16384

> One concern I might have with a big setup like that is how big the database
> directory has got, and whether directory lookups are taking time. Check to
> see if you have the directory_index option enabled on your ext3 filesystem.

That's a thought, I doubt the option is set (I didn't set it and I don't _think_ rhel does by default), however the 'base' directory only contains ~5500 items total, so it's not getting too out of hand.

David
Re: [PERFORM] Slow select performance despite seemingly reasonable query plan
On Thu, 7 May 2009, David Brain wrote:
> This has been working reasonably well, however in the last few days I've
> been seeing extremely slow performance on what are essentially fairly
> simple 'index hitting' selects on this data. From the host side I see
> that the postgres query process is mostly in IO wait, however there is
> very little data actually being transferred (maybe 2-4 MB/s) - when a
> different query (say a select count(*) from datatable) will yield a
> sustained 150+ MB/s.

Has there been a performance *change*, or are you just concerned about a query which doesn't seem to use "enough" disc bandwidth?

> 1. Problem with random versus sequential reads on storage system.

Certainly random access like this index scan can be extremely slow. 2-4 MB/s is quite reasonable if you're fetching one 8kB block per disc seek - no more than 200 per second.

> 3. Problem with the host environment - one suspicion I have here is that
> we are >90% full on the storage drives (ext3), I'm not sure if that is
> impacting performance.

One concern I might have with a big setup like that is how big the database directory has got, and whether directory lookups are taking time. Check to see if you have the directory_index option enabled on your ext3 filesystem.

Matthew

--
The third years are wandering about all worried at the moment because they have to hand in their final projects. Please be sympathetic to them, say things like "ha-ha-ha", but in a sympathetic tone of voice -- Computer Science Lecturer
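Matthew's 2-4 MB/s figure is just seek arithmetic. As a back-of-envelope check (the 200 seeks/s is a generic figure for a single spinning disk, not a measurement of this SAN):

```sql
-- Throughput ceiling when every 8 kB block read costs one disk seek,
-- at roughly 200 seeks per second:
SELECT 8 * 200 / 1024.0 AS max_random_read_mb_per_sec;  -- about 1.6 MB/s
```

So a pure random-access index scan tops out in the low single-digit MB/s range per spindle, which is the same order as the 2-4 MB/s observed; the sequential count(*) at 150+ MB/s pays almost no seeks at all.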
Re: [PERFORM] Slow select performance despite seemingly reasonable query plan
Hi,

Interesting - for one index on one partition:

idx_scan: 329
idx_tup_fetch: 8905730

So maybe a reindex would help?

David.

On Thu, May 7, 2009 at 10:26 AM, Scott Mead wrote:
> On Thu, May 7, 2009 at 10:14 AM, David Brain wrote:
>>
>> Hi,
>>
>> Some context: we have a _lot_ of data, > 1TB, mostly in 1 'table' -
>> the 'datatable' in the example below - although in order to improve
>> performance this table is partitioned (by date range) into a number of
>> partition tables. Each partition contains up to 20GB of data (tens of
>> millions of rows), with an additional ~3GB of indexes; all this is
>> served off a fairly high performance server (8 core, 32GB, with FC
>> attached SAN storage). PostgreSQL version is 8.3.5 (running on 64bit
>> RHEL 5.2).
>>
>> This has been working reasonably well, however in the last few days
>> I've been seeing extremely slow performance on what are essentially
>> fairly simple 'index hitting' selects on this data.
>
> Have you re-indexed any of your partitioned tables? If your index is
> fragmented, you'll be incurring extra I/Os per index access. Take a look
> at the pgstattuple contrib module for some functions to determine index
> fragmentation. You can also take a look at the pg_stat_all_indexes
> tables. If your number of tuples fetched is 100x more than your
> idx_scans, you *may* consider reindexing.
>
> --Scott

--
David Brain
dbr...@bandwidth.com
919.297.1078
Re: [PERFORM] Slow select performance despite seemingly reasonable query plan
On Thu, May 7, 2009 at 10:14 AM, David Brain wrote:
> Hi,
>
> Some context: we have a _lot_ of data, > 1TB, mostly in 1 'table' -
> the 'datatable' in the example below - although in order to improve
> performance this table is partitioned (by date range) into a number of
> partition tables. Each partition contains up to 20GB of data (tens of
> millions of rows), with an additional ~3GB of indexes; all this is
> served off a fairly high performance server (8 core, 32GB, with FC
> attached SAN storage). PostgreSQL version is 8.3.5 (running on 64bit
> RHEL 5.2).
>
> This has been working reasonably well, however in the last few days
> I've been seeing extremely slow performance on what are essentially
> fairly simple 'index hitting' selects on this data.

Have you re-indexed any of your partitioned tables? If your index is fragmented, you'll be incurring extra I/Os per index access. Take a look at the pgstattuple contrib module for some functions to determine index fragmentation. You can also take a look at the pg_stat_all_indexes tables. If your number of tuples fetched is 100x more than your idx_scans, you *may* consider reindexing.

--Scott
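Scott's rule of thumb can be checked directly from the statistics views. A sketch (column names are as in pg_stat_all_indexes; the 100x threshold is only a heuristic, since legitimately range-scanned indexes also fetch many tuples per scan):

```sql
-- Rank indexes by tuples fetched per index scan; very large ratios are
-- candidates for a closer look with pgstattuple or a REINDEX.
SELECT schemaname, relname, indexrelname, idx_scan, idx_tup_fetch,
       round(idx_tup_fetch::numeric / idx_scan, 1) AS tup_per_scan
  FROM pg_stat_all_indexes
 WHERE idx_scan > 0
 ORDER BY tup_per_scan DESC
 LIMIT 20;
```

For David's numbers above, 8905730 / 329 is roughly 27,000 tuples per scan, which is what prompted the reindex question.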
[PERFORM] Slow select performance despite seemingly reasonable query plan
Hi,

Some context: we have a _lot_ of data, > 1TB, mostly in 1 'table' - the 'datatable' in the example below - although in order to improve performance this table is partitioned (by date range) into a number of partition tables. Each partition contains up to 20GB of data (tens of millions of rows), with an additional ~3GB of indexes; all this is served off a fairly high performance server (8 core, 32GB, with FC attached SAN storage). PostgreSQL version is 8.3.5 (running on 64bit RHEL 5.2).

This has been working reasonably well, however in the last few days I've been seeing extremely slow performance on what are essentially fairly simple 'index hitting' selects on this data. From the host side I see that the postgres query process is mostly in IO wait, however there is very little data actually being transferred (maybe 2-4 MB/s) - when a different query (say a select count(*) from datatable) will yield a sustained 150+ MB/s.

There have been no configuration changes during this time, although of course the database has grown as data is added on a daily basis.

I'm not sure of the best way to diagnose this issue - the possible causes I can think of are:

1. Problem with random versus sequential reads on the storage system.
2. 'Something' with PostgreSQL itself.
3. Problem with the host environment - one suspicion I have here is that we are >90% full on the storage drives (ext3); I'm not sure if that is impacting performance.

Any thoughts as to how to proceed from here would be very welcome.
Here is an example query plan - it looks reasonable to me, and seems to be making use of the indexes and of constraint exclusion on the partition tables:

Nested Loop Left Join (cost=0.00..6462463.96 rows=1894 width=110)
  -> Append (cost=0.00..6453365.66 rows=1894 width=118)
    -> Seq Scan on datatable sum (cost=0.00..10.75 rows=1 width=118)
         Filter: ((datapointdate >= '2009-04-01 00:00:00'::timestamp without time zone) AND (datapointdate <= '2009-04-30 23:59:59'::timestamp without time zone) AND ((customerid)::text = ''::text) AND (NOT CASE WHEN (NOT obsolete) THEN false ELSE CASE WHEN (obsoletereasonid IS NULL) THEN true WHEN (obsoletereasonid = 1) THEN true WHEN (obsoletereasonid = 2) THEN true WHEN (cdrdatasourceid = 1) THEN false ELSE true END END))
    -> Index Scan using datatable_20090328_customeriddatapointdate_idx on datatable_20090328 sum (cost=0.00..542433.51 rows=180 width=49)
         Index Cond: ((datapointdate >= '2009-04-01 00:00:00'::timestamp without time zone) AND (datapointdate <= '2009-04-30 23:59:59'::timestamp without time zone) AND ((customerid)::text = ''::text))
         Filter: (NOT CASE WHEN (NOT obsolete) THEN false ELSE CASE WHEN (obsoletereasonid IS NULL) THEN true WHEN (obsoletereasonid = 1) THEN true WHEN (obsoletereasonid = 2) THEN true WHEN (cdrdatasourceid = 1) THEN false ELSE true END END)
    -> Index Scan using datatable_20090404_customeriddatapointdate_idx on datatable_20090404 sum (cost=0.00..1322098.74 rows=405 width=48)
         Index Cond: ((datapointdate >= '2009-04-01 00:00:00'::timestamp without time zone) AND (datapointdate <= '2009-04-30 23:59:59'::timestamp without time zone) AND ((customerid)::text = ''::text))
         Filter: (NOT CASE WHEN (NOT obsolete) THEN false ELSE CASE WHEN (obsoletereasonid IS NULL) THEN true WHEN (obsoletereasonid = 1) THEN true WHEN (obsoletereasonid = 2) THEN true WHEN (cdrdatasourceid = 1) THEN false ELSE true END END)
    -> Index Scan using datatable_20090411_customeriddatapointdate_idx on datatable_20090411 sum (cost=0.00..1612744.29 rows=450 width=48)
         Index Cond: ((datapointdate >= '2009-04-01 00:00:00'::timestamp without time zone) AND (datapointdate <= '2009-04-30 23:59:59'::timestamp without time zone) AND ((customerid)::text = ''::text))
         Filter: (NOT CASE WHEN (NOT obsolete) THEN false ELSE CASE WHEN (obsoletereasonid IS NULL) THEN true WHEN (obsoletereasonid = 1) THEN true WHEN (obsoletereasonid = 2) THEN true WHEN (cdrdatasourceid = 1) THEN false ELSE true END END)
    -> Index Scan using datatable_20090418_customeriddatapointdate_idx on datatable_20090418 sum (cost=0.00..1641913.58 rows=469 width=49)
         Index Cond: ((datapointdate >= '2009-04-01 00:00:00'::timestamp without time zone) AND (datapointdate <= '2009-04-30 23:59:59'::timestamp without time zone) AND ((customerid)::text = ''::text))
         Filter: (NOT CASE WHEN (NOT obsolete) THEN false ELSE CASE WHEN (obsoletereasonid IS NULL) THEN true WHEN (obsoletereasonid = 1) THEN true WHEN (obsoletereasonid = 2) THEN true WHEN (cdrdatasourceid = 1) THEN false ELSE true END END)
    -> Index Scan using datatable_20090425_customeriddatapointdate_idx on datatable_20090425 sum (cost=0.00..1334164.80 rows=389 width=49)
         Index Cond: ((datapointdate >= '2009-04-01 00:00:00'::timestamp without time zone) AND (datapointdate <= '2009-04-30 23:59:59'::timestamp without time zone) AND ((customerid)::text = ''::text))
         Filter: (NOT CASE WHEN (NOT obsolete) THEN false ELSE CASE WHEN (obsoletereasonid IS NULL) THEN true WHEN (obsoletereasonid = 1) THEN true WHEN (obsoletereasonid = 2) THEN true WHEN (cdrdatasourceid = 1) THEN false ELSE true END END)
  -> Index Scan using pk_cdrextension on cdrextension ext (cost=0.00..4.77 rows=1 width=8)
       Index Cond: (sum.id = ext.datatableid)
Re: [PERFORM] Slow SELECT on three or more clients
Hi Markus,

Thank you very much for the information. I was able to make it fast by correcting the indices - I created indexes on the frequently filtered fields. It now runs at 0.05 seconds on average, much faster than the 0.86 before. I will also upgrade to 8.1.5.

Once again, thank you very much. It helped me a lot.

Amir

On 11/15/06, Markus Schaber <[EMAIL PROTECTED]> wrote:

Hi, Amir,

AMIR FRANCO D. JOVEN wrote:
> My current project uses PostgreSQL 7.3.4.

By all means, please upgrade. The newest 7.3 series version is 7.3.16, which fixes lots of critical bugs, and can be used as a drop-in replacement for 7.3.4 (see the Release Notes at http://www.postgresql.org/docs/7.3/interactive/release.html ). The newest stable release is 8.1.5, and 8.2 is just around the corner...

> I have a table with 94 fields and a select with only one resultset in
> only one client consumes about 0.86 seconds.

"with only one resultset"? You mean "with only one returned row", I presume. Each SELECT has exactly one result set, which can contain zero to many rows.

Please check the following:
- Did you create the appropriate indices?
- Version 7.3.x may suffer from index bloat, so REINDEX might help.
- Did you VACUUM and ANALYZE the table properly?
- Are your free space map settings, the statistics targets, and other config options tuned to fit your environment?
- Maybe a VACUUM FULL or a CLUSTER command may help you.

> for a single select with one field in one resultset, is 0.86 seconds normal?

That depends on the circumstances.

Markus
--
Markus Schaber | Logical Tracking&Tracing International AG
Dipl. Inf. | Software Development GIS

Fight against software patents in Europe! www.ffii.org www.nosoftwarepatents.org

--
AMIR FRANCO D. JOVEN
Software Engineer
DIGI Software (PHILS.) Inc.
Re: [PERFORM] Slow SELECT on three or more clients
Hi, Amir,

AMIR FRANCO D. JOVEN wrote:
> My current project uses PostgreSQL 7.3.4.

By all means, please upgrade. The newest 7.3 series version is 7.3.16, which fixes lots of critical bugs, and can be used as a drop-in replacement for 7.3.4 (see the Release Notes at http://www.postgresql.org/docs/7.3/interactive/release.html ). The newest stable release is 8.1.5, and 8.2 is just around the corner...

> I have a table with 94 fields and a select with only one resultset in
> only one client consumes about 0.86 seconds.

"with only one resultset"? You mean "with only one returned row", I presume. Each SELECT has exactly one result set, which can contain zero to many rows.

Please check the following:
- Did you create the appropriate indices?
- Version 7.3.x may suffer from index bloat, so REINDEX might help.
- Did you VACUUM and ANALYZE the table properly?
- Are your free space map settings, the statistics targets, and other config options tuned to fit your environment?
- Maybe a VACUUM FULL or a CLUSTER command may help you.

> for a single select with one field in one resultset, is 0.86 seconds normal?

That depends on the circumstances.

Markus
--
Markus Schaber | Logical Tracking&Tracing International AG
Dipl. Inf. | Software Development GIS

Fight against software patents in Europe! www.ffii.org www.nosoftwarepatents.org

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
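The checklist above, as commands (the table name is a placeholder for the poster's table; note that on 7.3-era releases VACUUM FULL, CLUSTER, and REINDEX all take exclusive locks, so they belong in a maintenance window):

```sql
VACUUM ANALYZE mytable;   -- reclaim dead space and refresh planner statistics
REINDEX TABLE mytable;    -- rebuild indexes; counters 7.3-series index bloat
-- If the heap itself is badly bloated:
VACUUM FULL mytable;      -- compacts the table; exclusive lock
```

If VACUUM VERBOSE regularly reports more dead tuples than the free space map can track, the FSM settings in postgresql.conf (max_fsm_pages, max_fsm_relations in releases of that era) need to be raised as well.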
Re: [PERFORM] Slow SELECT on three or more clients
On 11/15/06, AMIR FRANCO D. JOVEN <[EMAIL PROTECTED]> wrote:
> Hi!
>
> Im new to PostgreSQL.
>
> My current project uses PostgreSQL 7.3.4.
>
> the problem is like this:
>
> I have a table with 94 fields and a select with only one resultset in
> only one client consumes about 0.86 seconds.
> The client executes three 'select' statements to perform the task which
> consumes 2.58 seconds.
> With only one client this is acceptable, but the real problem is as i add
> more clients, it goes more and more slower.
>
> for a single select with one field in one resultset, is 0.86 seconds
> normal?
>
> I tried vacuuming and reindexing but to no avail.
> the total record count in that particular table is 456,541.

Returning 450k rows in around 1 second is about right for a result set with one field. IMO, your best bet is to try to break up and reorganize your table so you don't have to query the whole thing every time. Why do you need to return all the rows over and over?

merlin
Re: [PERFORM] Slow SELECT on three or more clients
Operating system and some of the basic PostgreSQL config settings would be helpful, plus any info you have on your disks, the size of the relevant tables, their structure, indexes, and vacuum/analyze status... plus what others have said: upgrade! There have been considerable improvements in, well, *everything*, since 7.3 (we have some databases at 7.4.x and I consider them out-of-date). Hopefully this list can provide help to get you through whatever your immediate crisis is, but do consider planning for this as soon as time and resources permit. Data integrity is a _good_ thing!

Greg Williamson
DBA, GlobeXplorer LLC

-----Original Message-----
From: [EMAIL PROTECTED] on behalf of Russell Smith
Sent: Wed 11/15/2006 5:31 AM
To: AMIR FRANCO D. JOVEN
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Slow SELECT on three or more clients

AMIR FRANCO D. JOVEN wrote:
> Hi!
>
> Im new to PostgreSQL.
>
> My current project uses PostgreSQL 7.3.4.

Upgrading your version of PostgreSQL to 8.1 will give you significant performance benefits.

> the problem is like this:
>
> I have a table with 94 fields and a select with only one resultset in
> only one client consumes about 0.86 seconds.
> The client executes three 'select' statements to perform the task
> which consumes 2.58 seconds.
> With only one client this is acceptable, but the real problem is as i
> add more clients, it goes more and more slower.
>
> for a single select with one field in one resultset, is 0.86 seconds
> normal?

You will need to attach the query: EXPLAIN ANALYZE SELECT ..., where SELECT ... is your query. That will help us work out what the problem is. 0.86 seconds might be slow for a query that returns 1 row; it might be fast for a query that returns a large set with complex joins and where conditions. Fast and slow are not objective terms - they are very dependent on the query.

> I tried vacuuming and reindexing but to no avail.
> the total record count in that particular table is 456,541.
456,541 is not all that many records. But again you will need to post more information for us to be able to assist.

> Thanks in advance.
Re: [PERFORM] Slow SELECT on three or more clients
AMIR FRANCO D. JOVEN wrote:
> Hi!
>
> Im new to PostgreSQL.
>
> My current project uses PostgreSQL 7.3.4.

Upgrading your version of PostgreSQL to 8.1 will give you significant performance benefits.

> the problem is like this:
>
> I have a table with 94 fields and a select with only one resultset in
> only one client consumes about 0.86 seconds.
> The client executes three 'select' statements to perform the task
> which consumes 2.58 seconds.
> With only one client this is acceptable, but the real problem is as i
> add more clients, it goes more and more slower.
>
> for a single select with one field in one resultset, is 0.86 seconds
> normal?

You will need to attach the query: EXPLAIN ANALYZE SELECT ..., where SELECT ... is your query. That will help us work out what the problem is. 0.86 seconds might be slow for a query that returns 1 row; it might be fast for a query that returns a large set with complex joins and where conditions. Fast and slow are not objective terms - they are very dependent on the query.

> I tried vacuuming and reindexing but to no avail.
> the total record count in that particular table is 456,541.

456,541 is not all that many records. But again you will need to post more information for us to be able to assist.

> Thanks in advance.
Re: [PERFORM] Slow SELECT on three or more clients
* AMIR FRANCO D. JOVEN <[EMAIL PROTECTED]> [061115 12:44]:
> Hi!
>
> Im new to PostgreSQL.
>
> My current project uses PostgreSQL 7.3.4.

Ancient. Upgrade it, especially if it's a new database.

> the problem is like this:
>
> I have a table with 94 fields and a select with only one resultset in only
> one client consumes about 0.86 seconds.
> The client executes three 'select' statements to perform the task which
> consumes 2.58 seconds.
> With only one client this is acceptable, but the real problem is as i add
> more clients, it goes more and more slower.

That depends upon:

a) your table schema,
b) the data in the tables (e.g. how big the rows are, and how many there are),
c) the size of the result sets,
d) your indexes.

Andreas
[PERFORM] Slow SELECT on three or more clients
Hi!

I'm new to PostgreSQL.

My current project uses PostgreSQL 7.3.4.

The problem is like this: I have a table with 94 fields, and a select with only one resultset in only one client consumes about 0.86 seconds. The client executes three 'select' statements to perform the task, which consumes 2.58 seconds. With only one client this is acceptable, but the real problem is that as I add more clients, it goes more and more slowly.

For a single select with one field in one resultset, is 0.86 seconds normal?

I tried vacuuming and reindexing but to no avail. The total record count in that particular table is 456,541.

Thanks in advance.
Re: [PERFORM] Slow select, insert, update
Does that mean REINDEX is not needed for PG version 7.4? In what kinds of situations under PG 7.4 is REINDEX still worthwhile?

Thanks.

Here is the doc from 7.3:

"PostgreSQL is unable to reuse B-tree index pages in certain cases. The problem is that if indexed rows are deleted, those index pages can only be reused by rows with similar values. For example, if indexed rows are deleted and newly inserted/updated rows have much higher values, the new rows can't use the index space made available by the deleted rows. Instead, such new rows must be placed on new index pages. In such cases, disk space used by the index will grow indefinitely, even if VACUUM is run frequently. As a solution, you can use the REINDEX command periodically to discard pages used by deleted rows. There is also contrib/reindexdb which can reindex an entire database."

The counterpart in 7.4 is:

"In some situations it is worthwhile to rebuild indexes periodically with the REINDEX command. (There is also contrib/reindexdb which can reindex an entire database.) However, PostgreSQL 7.4 has substantially reduced the need for this activity compared to earlier releases."

--- Doug McNaught <[EMAIL PROTECTED]> wrote:
> Paul Langard <[EMAIL PROTECTED]> writes:
>
>> Having trouble with one table (see time to count records below!).
>>
>> Fairly new to postgres so any help much appreciated.
>>
>> It only contains 9,106 records - as you can see from:
>>
>> select count(id) from project
>>
>> count
>> 9106
>> 1 row(s)
>> Total runtime: 45,778.813 ms
>>
>> ... the database is regularly vacuumed.
>
> Hmm. You might try a VACUUM FULL and a REINDEX on the table (you
> don't say what version you are running - REINDEX is sometimes needed
> on 7.3 and below).
>
> Also, use EXPLAIN ANALYZE on your query and post the result - that's
> helpful diagnostic information.
>
> -Doug
Re: [PERFORM] Slow select, insert, update
Paul Langard <[EMAIL PROTECTED]> writes:
> select count(id) from project
>
> count
> 9106
> 1 row(s)
> Total runtime: 45,778.813 ms

Yipes. The only explanation I can think of is tremendous table bloat. What do you get from "vacuum verbose project" - in particular, how many pages are in the table?

> ... the database is regularly vacuumed.

Not regularly enough, perhaps... or else you need to increase the free space map size parameters. In any case you'll probably need to do one round of "vacuum full" to get this table back within bounds.

regards, tom lane
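Tom's suggestion, as concrete commands (project is the poster's table; the page count reported by VACUUM VERBOSE is the number to compare against what ~9,100 short rows should need - on the order of a few hundred 8 kB pages, not tens of thousands):

```sql
VACUUM VERBOSE project;   -- note the reported page count for the table
VACUUM FULL project;      -- one-off compaction to shed the dead space (exclusive lock)
REINDEX TABLE project;    -- indexes bloat along with the heap
```

After the cleanup, raising the FSM settings and vacuuming more often keeps the table from bloating again.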
Re: [PERFORM] Slow select, insert, update
Paul,

Paul Langard wrote:
> Having trouble with one table (see time to count records below!).
> Fairly new to postgres so any help much appreciated.
> It only contains 9,106 records - as you can see from:
>
> select count(id) from project
> count
> 9106
> 1 row(s)
> Total runtime: 45,778.813 ms
>
> ... the database is regularly vacuumed.

Have you tried doing a VACUUM FULL, CLUSTER, or drop/restore on the table? This sounds symptomatic of a table with a bunch of dead tuples not in the FSM (free space map). Only tuples in the FSM are reclaimed by a regular VACUUM. If your FSM parameters in postgresql.conf are not big enough for your ratio of UPDATE/DELETE operations to VACUUM frequency, you will end up with dead tuples that will only be reclaimed by a VACUUM FULL.

To prevent this problem in the future, look at increasing your FSM size and possibly vacuuming more frequently or using pg_autovacuum.

Good Luck,

Bill Montgomery
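The FSM parameters Bill mentions live in postgresql.conf on 7.x releases; a hedged sketch, with values that are purely illustrative rather than recommendations for any particular workload:

```
# postgresql.conf (7.x-era names; restart required; values illustrative)
max_fsm_pages = 50000      # total pages with free space tracked cluster-wide
max_fsm_relations = 1000   # number of tables/indexes tracked in the FSM
```

If routine VACUUMs report more dead pages than max_fsm_pages can track, the excess space is not remembered for reuse, which is how a 9,000-row table can bloat indefinitely.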
Re: [PERFORM] Slow select, insert, update
Paul Langard <[EMAIL PROTECTED]> writes:
> Having trouble with one table (see time to count records below!).
>
> Fairly new to postgres so any help much appreciated.
>
> It only contains 9,106 records - as you can see from:
>
> select count(id) from project
> count
> 9106
> 1 row(s)
> Total runtime: 45,778.813 ms
>
> ... the database is regularly vacuumed.

Hmm. You might try a VACUUM FULL and a REINDEX on the table (you don't say what version you are running--REINDEX is sometimes needed on 7.3 and below).

Also, use EXPLAIN ANALYZE on your query and post the result--that's helpful diagnostic information.

-Doug
--
Let us cross over the river, and rest under the shade of the trees.
--T. J. Jackson, 1863
[PERFORM] Slow select, insert, update
Having trouble with one table (see time to count records below!). Fairly new to postgres so any help much appreciated.

It only contains 9,106 records - as you can see from:

select count(id) from project
count
9106
1 row(s)
Total runtime: 45,778.813 ms

There are only 3 fields:

id             integer   nextval('id'::text)
projectnumber  text
description    text

There is one index:

id_project_ukey
CREATE UNIQUE INDEX id_project_ukey ON project USING btree (id)

... the database is regularly vacuumed.
Re: [PERFORM] slow select
Vivek Khera wrote:
>>> "JB" == Josh Berkus <[EMAIL PROTECTED]> writes:
> JB> Actually, what OS's can't use all idle ram for kernel cache? I
> JB> should note that in my performance docs
>
> FreeBSD. Limited by the value of "sysctl vfs.hibufspace" from what I understand. This value is set at boot based on available RAM and some other tuning parameters.

Actually, I have wanted to ask this question for a long time. Can we have guidelines about how to set effective cache size for various OSes?

Linux is pretty simple: everything free is buffer cache. FreeBSD is not so straightforward, but there is a sysctl. How about HP-UX, Solaris and AIX? Other BSDs? And, most importantly, Windows?

That could add much value to the tuning guide, couldn't it?

Shridhar
Re: [PERFORM] slow select
>>> "JB" == Josh Berkus <[EMAIL PROTECTED]> writes:

JB> Vivek,
>> Assuming your OS will use that much RAM for the cache... the whole
>> world's not Linux :-)
JB> It's not? Darn! :-)
JB> Actually, what OS's can't use all idle ram for kernel cache? I
JB> should note that in my performance docs

FreeBSD. Limited by the value of "sysctl vfs.hibufspace" from what I understand. This value is set at boot based on available RAM and some other tuning parameters.
Re: [PERFORM] slow select
Vivek,

> Assuming your OS will use that much RAM for the cache... the whole
> world's not Linux :-)

It's not? Darn!

Actually, what OS's can't use all idle ram for kernel cache? I should note that in my performance docs

--
Josh Berkus
Aglio Database Solutions
San Francisco
Re: [PERFORM] slow select
"Medora Schauer" <[EMAIL PROTECTED]> writes:
> Merge Join (cost=0.00..287726.10 rows=100221 width=58) (actual time=61.60..5975.63 rows=100425 loops=1)
>   Merge Cond: (("outer".shot_line_num = "inner".shot_line_num) AND ("outer".shotpoint = "inner".shotpoint))
>   -> Index Scan using hsot_record_idx on shot_record r (cost=0.00..123080.11 rows=100425 width=46) (actual time=24.15..2710.31 rows=100425 loops=1)
>   -> Index Scan using shotpoint_idx on shotpoint p (cost=0.00..467924.54 rows=290106 width=12) (actual time=37.38..1379.64 rows=100749 loops=1)
> Total runtime: 6086.32 msec
>
> So why were the indices not used before, when they yield a better plan?

There's another reason. Notice it thinks the second table will return 290k records. In fact it only returns 100k records. So it's optimizing on the assumption that it will have to read 3x as many records as it actually will... I'm not clear if there's anything you can do to improve this estimate, though.

--
greg
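One generic knob for row-estimate problems (a sketch, not something the thread confirms will help here, since the mismatch comes from the merge join stopping early rather than from bad column statistics) is to raise the planner's per-column sample size and re-analyze:

```sql
-- Raise the statistics target on the join columns (default was 10 in 7.3),
-- then refresh the statistics the planner uses
ALTER TABLE shotpoint ALTER COLUMN shot_line_num SET STATISTICS 100;
ALTER TABLE shotpoint ALTER COLUMN shotpoint SET STATISTICS 100;
ANALYZE shotpoint;
```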
Re: [PERFORM] slow select
>>> "JB" == Josh Berkus <[EMAIL PROTECTED]> writes:

JB> Medora,
>> Increasing effective_cache_size to 1 did it.
JB> That would be 78MB RAM. If you have more than that available, you can
JB> increase it further. Ideally, it should be about 2/3 to 3/4 of available RAM.

Assuming your OS will use that much RAM for the cache... the whole world's not Linux :-)

--
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Vivek Khera, Ph.D.              Khera Communications, Inc.
Internet: [EMAIL PROTECTED]     Rockville, MD  +1-240-453-8497
AIM: vivekkhera Y!: vivek_khera http://www.khera.org/~vivek/
Re: [PERFORM] slow select
Medora,

> Increasing effective_cache_size to 1 did it.

That would be 78MB RAM. If you have more than that available, you can increase it further. Ideally, it should be about 2/3 to 3/4 of available RAM.

> The query now takes 4 secs. I left random_page_cost at the default value of 4.
> I thought, mistakenly apparently, that our database was relatively
> itty bitty and so haven't messed with the .conf file.

Actually, for an itty bitty database on a fast machine, you definitely want to lower random_page_cost. It's a large database that would make you cautious about this.

--
-Josh Berkus
Aglio Database Solutions
San Francisco
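Josh's guidance translated into postgresql.conf terms: on 7.3, effective_cache_size is expressed in 8 kB disk pages, which is why Josh can convert the setting to "78MB RAM". The numbers below are illustrative for a machine with roughly 256 MB of free RAM, not values from the thread:

```
# postgresql.conf (7.3; effective_cache_size is counted in 8 kB pages)
effective_cache_size = 24000   # 24000 * 8 kB ~= 187 MB, about 3/4 of 256 MB
random_page_cost = 2           # below the default 4 for a small, well-cached DB
```

A lower random_page_cost tells the planner that random index probes are not much more expensive than sequential reads, which tips borderline plans toward index scans.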
Re: [PERFORM] slow select
Josh,

>> So why were the indices not used before, when they yield a better plan?
>
> Your .conf settings, most likely. I'd lower your random_page_cost and raise
> your effective_cache_size.

Increasing effective_cache_size to 1 did it. The query now takes 4 secs. I left random_page_cost at the default value of 4.

I thought, mistakenly apparently, that our database was relatively itty bitty and so haven't messed with the .conf file. Guess I better take a look at all the settings (I know where the docs are).

Thanks for your help,

Medora

***
Medora Schauer
Sr. Software Engineer
Fairfield Industries
***
Re: [PERFORM] slow select
Medora,

> So why were the indices not used before, when they yield a better plan?

Your .conf settings, most likely. I'd lower your random_page_cost and raise your effective_cache_size.

--
-Josh Berkus
Aglio Database Solutions
San Francisco
Re: [PERFORM] slow select
> Medora,
>
> > I'm using pg 7.3.4 to do a select involving a join on 2 tables.
> > The query is taking 15 secs which seems extreme to me considering
> > the indices that exist on the two tables. EXPLAIN ANALYZE shows
> > that the indices aren't being used. I've done VACUUM ANALYZE on the
> > db with no change in results. Shouldn't the indices be used?
>
> No. You're selecting 100,000 records. For such a large record dump, a seq
> scan is usually faster.
>
> If you don't believe me, try setting enable_seqscan=false and
> see how long the query takes.

I did as you suggested (set enable_seqscan = false) and the query now takes 6 secs (vs 15 secs before):

Merge Join (cost=0.00..287726.10 rows=100221 width=58) (actual time=61.60..5975.63 rows=100425 loops=1)
  Merge Cond: (("outer".shot_line_num = "inner".shot_line_num) AND ("outer".shotpoint = "inner".shotpoint))
  -> Index Scan using hsot_record_idx on shot_record r (cost=0.00..123080.11 rows=100425 width=46) (actual time=24.15..2710.31 rows=100425 loops=1)
  -> Index Scan using shotpoint_idx on shotpoint p (cost=0.00..467924.54 rows=290106 width=12) (actual time=37.38..1379.64 rows=100749 loops=1)
Total runtime: 6086.32 msec

So why were the indices not used before, when they yield a better plan?
Re: [PERFORM] slow select
Medora,

> I'm using pg 7.3.4 to do a select involving a join on 2 tables.
> The query is taking 15 secs which seems extreme to me considering
> the indices that exist on the two tables. EXPLAIN ANALYZE shows
> that the indices aren't being used. I've done VACUUM ANALYZE on the
> db with no change in results. Shouldn't the indices be used?

No. You're selecting 100,000 records. For such a large record dump, a seq scan is usually faster.

If you don't believe me, try setting enable_seqscan=false and see how long the query takes.

--
Josh Berkus
Aglio Database Solutions
San Francisco
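The experiment Josh describes can be run in one psql session, where the setting lasts only for that session (the SELECT column list below is abbreviated from Medora's full query):

```sql
-- Disable sequential scans for this session only, then compare plans
SET enable_seqscan = false;

EXPLAIN ANALYZE
SELECT r.shot_line_num, r.shotpoint
FROM shot_record r, shotpoint p
WHERE p.shot_line_num = r.shot_line_num
  AND p.shotpoint = r.shotpoint;

-- Restore the default afterwards
SET enable_seqscan = true;
```

This is a diagnostic tool, not a tuning knob: if the forced plan wins, the right fix is to adjust cost parameters such as random_page_cost and effective_cache_size, not to leave enable_seqscan off.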
[PERFORM] slow select
I'm using pg 7.3.4 to do a select involving a join on 2 tables. The query is taking 15 secs which seems extreme to me considering the indices that exist on the two tables. EXPLAIN ANALYZE shows that the indices aren't being used. I've done VACUUM ANALYZE on the db with no change in results. Shouldn't the indices be used?

Below is what I believe to be the relevant information. I haven't included the definitions of the tables involved in the foreign key definitions because I don't think they matter. Any help will be greatly appreciated.

CREATE TABLE shotpoint (
    shot_line_num FLOAT4,
    shotpoint     FLOAT4,
    x             FLOAT4,
    y             FLOAT4,
    template_id   INT4,
    num_chans     INT4)

CREATE TABLE shot_record (
    shot_line_num FLOAT4,
    shotpoint     FLOAT4,
    index         INT2,
    dev           INT4,
    dev_offset    INT8,
    bin           INT4,
    shot_time     INT8,
    record_length INT4,
    nav_x         FLOAT4,
    nav_y         FLOAT4,
    num_rus       INT4,
    status        INT4 DEFAULT 0,
    reel_num      INT4,
    file_num      INT4,
    nav_status    INT2,
    nav_shot_line FLOAT4,
    nav_shotpoint FLOAT4,
    nav_depth     FLOAT4,
    sample_skew   INT4,
    trace_count   INT4,
    PRIMARY KEY (shot_line_num, shotpoint, index))

ALTER TABLE shotpoint ADD CONSTRAINT shot_line_fk
    FOREIGN KEY (shot_line_num) REFERENCES shot_line(shot_line_num)

CREATE UNIQUE INDEX shotpoint_idx ON shotpoint(shot_line_num, shotpoint)

ALTER TABLE shot_record ADD CONSTRAINT shot_record_shotpoint_index_fk
    FOREIGN KEY (shot_line_num, shotpoint) REFERENCES shotpoint(shot_line_num, shotpoint)

EXPLAIN ANALYZE SELECT r.shot_line_num, r.shotpoint, index, shot_time,
    record_length, dev, dev_offset, num_rus, bin, template_id, trace_count
FROM shot_record r, shotpoint p
WHERE p.shot_line_num = r.shot_line_num AND p.shotpoint = r.shotpoint;

Merge Join (cost=49902.60..52412.21 rows=100221 width=58) (actual time=12814.28..15000.65 rows=100425 loops=1)
  Merge Cond: (("outer".shot_line_num = "inner".shot_line_num) AND ("outer".shotpoint = "inner".shotpoint))
  -> Sort (cost=13460.90..13711.97 rows=100425 width=46) (actual time=3856.94..4157.01 rows=100425 loops=1)
       Sort Key: r.shot_line_num, r.shotpoint
       -> Seq Scan on shot_record r (cost=0.00..2663.25 rows=100425 width=46) (actual time=18.00..1089.00 rows=100425 loops=1)
  -> Sort (cost=36441.70..37166.96 rows=290106 width=12) (actual time=8957.19..9224.09 rows=100749 loops=1)
       Sort Key: p.shot_line_num, p.shotpoint
       -> Seq Scan on shotpoint p (cost=0.00..5035.06 rows=290106 width=12) (actual time=7.55..2440.06 rows=290106 loops=1)
Total runtime: 15212.05 msec

***
Medora Schauer
Sr. Software Engineer
Fairfield Industries
14100 Southwest Freeway Suite 600
Sugar Land, Tx 77478-3469 USA
[EMAIL PROTECTED]
phone: 281-275-7664  fax: 281-275-7551
***