Re: [PERFORM] Slow SELECT by primary key? Postgres 9.1.2

2013-05-30 Thread Merlin Moncure
On Thu, May 30, 2013 at 10:22 AM, Igor Neyman  wrote:
>> -Original Message-
>> From: pgsql-performance-ow...@postgresql.org [mailto:pgsql-performance-
>> ow...@postgresql.org] On Behalf Of Merlin Moncure
>> Sent: Thursday, May 30, 2013 11:14 AM
>> To: John Mudd
>> Cc: pgsql-performance@postgresql.org
>> Subject: Re: [PERFORM] Slow SELECT by primary key? Postgres 9.1.2
>>
>> On Mon, May 27, 2013 at 9:02 AM, John Mudd  wrote:
>> > Postgres 9.1.2 on Ubuntu 12.04
>> >
>> > Any reason why a select by primary key would be slower than a select
>> > that includes an ORDER BY? I was really hoping using the primary key
>> > would give me a boost.
>> >
>> > I stopped the server and cleared the O/S cache using "sync; echo 3 >
>> > /proc/sys/vm/drop_caches" between the runs.
>> >
>> >
>> >
>> > test=# VACUUM ANALYZE test_select;
>> > VACUUM
>> >
>> > (stopped postgres; reset O/S cache; started postgres)
>> >
>> > test=# explain analyze SELECT * FROM test_select WHERE key1 >= 50
>> > ORDER BY key1, key2, key3, id LIMIT 1;
>> >   QUERY PLAN
>> > ------------------------------------------------------------------
>> >  Limit  (cost=0.00..0.08 rows=1 width=21) (actual time=12.599..12.600
>> > rows=1
>> > loops=1)
>> >->  Index Scan using my_key on test_select  (cost=0.00..41895.49
>> > rows=498724 width=21) (actual time=12.597..12.597 rows=1 loops=1)
>> >  Index Cond: (key1 >= 50)
>> >  Total runtime: 12.678 ms
>> >
>> > (stopped postgres; reset O/S cache; started postgres)
>>
>>
>> Why are you flushing the postgres/OS cache?  When you do that, you are
>> measuring raw read time from disk.  Typical disk seek time is measured
>> in milliseconds, so the timings are entirely to be expected once you
>> remove caching effects.  Hard drives (at least, the spinning
>> kind) are slow, and one of the major challenges of database and hardware
>> engineering is working around their limitations.  Fortunately it looks
>> like faster storage will soon be commonplace at reasonable prices.
>>
>> merlin
>>
>
> True.
> But, on the other hand (back to the original question),
> the execution plans that John got before and after the suggested change in
> configuration parameters are exactly the same; the timings differ only
> because of buffer cache effects.

Right.  Well, I think Matheus's answer is the right one.  But my
point was that what's going on here is that we are measuring the number
of raw uncached seeks needed to satisfy the query on index A vs. index B.
Pure luck in terms of how the index data is organized could throw it off
one way or the other.  But the test methodology is bogus because the root
index pages will stay hot, so the more compact pkey will likely be
slightly faster in real-world usage.  (That said, I prefer the
composite-key style of design, especially for range searching.)
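
For illustration, one quick way to compare how compact the two indexes
are on disk (a sketch using catalog functions; the object names come
from the schema posted earlier in the thread):

    -- compare the on-disk size of the pkey index vs the composite index
    SELECT relname, pg_size_pretty(pg_relation_size(oid)) AS on_disk
      FROM pg_class
     WHERE relname IN ('test_select_pkey', 'my_key');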

merlin




Re: [PERFORM] Slow SELECT by primary key? Postgres 9.1.2

2013-05-30 Thread John Mudd
I flushed the caches in an attempt to get meaningful results. I've seen
complaints about previous posts that don't include clearing the caches.

I agree this tends to be artificial in the other direction. I will strive
to come up with a more realistic test environment next time. Maybe
performing many random reads up front to fill the caches with random
blocks; that would allow only minimal assistance from the cache and be
more realistic.
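
For example, something along these lines could serve as a crude warm-up
(a sketch only: it assumes ids are roughly contiguous starting at 1, and
the constants are illustrative):

    DO $$
    DECLARE
        r test_select%ROWTYPE;
    BEGIN
        -- probe 10,000 random ids so random index and heap blocks land in cache
        FOR i IN 1..10000 LOOP
            SELECT * INTO r
              FROM test_select
             WHERE id = (random() * 999999)::int + 1;  -- upper bound is illustrative
        END LOOP;
    END;
    $$;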



On Thu, May 30, 2013 at 11:13 AM, Merlin Moncure  wrote:

> On Mon, May 27, 2013 at 9:02 AM, John Mudd  wrote:
> > Postgres 9.1.2 on Ubuntu 12.04
> >
> > Any reason why a select by primary key would be slower than a select that
> > includes an ORDER BY? I was really hoping using the primary key would
> give
> > me a boost.
> >
> > I stopped the server and cleared the O/S cache using "sync; echo 3 >
> > /proc/sys/vm/drop_caches" between the runs.
> >
> >
> >
> > test=# VACUUM ANALYZE test_select;
> > VACUUM
> >
> > (stopped postgres; reset O/S cache; started postgres)
> >
> > test=# explain analyze SELECT * FROM test_select WHERE key1 >= 50
> ORDER
> > BY key1, key2, key3, id LIMIT 1;
> >   QUERY PLAN
> >
> --
> >  Limit  (cost=0.00..0.08 rows=1 width=21) (actual time=12.599..12.600
> rows=1
> > loops=1)
> >->  Index Scan using my_key on test_select  (cost=0.00..41895.49
> > rows=498724 width=21) (actual time=12.597..12.597 rows=1 loops=1)
> >  Index Cond: (key1 >= 50)
> >  Total runtime: 12.678 ms
> >
> > (stopped postgres; reset O/S cache; started postgres)
>
>
> Why are you flushing the postgres/OS cache?  When you do that, you are
> measuring raw read time from disk.  Typical disk seek time is
> measured in milliseconds, so the timings are entirely to be expected
> once you remove caching effects.  Hard drives (at least, the spinning
> kind) are slow, and one of the major challenges of database and
> hardware engineering is working around their limitations.  Fortunately
> it looks like faster storage will soon be commonplace at reasonable
> prices.
>
> merlin
>


Re: [PERFORM] Slow SELECT by primary key? Postgres 9.1.2

2013-05-30 Thread Igor Neyman
> -Original Message-
> From: pgsql-performance-ow...@postgresql.org [mailto:pgsql-performance-
> ow...@postgresql.org] On Behalf Of Merlin Moncure
> Sent: Thursday, May 30, 2013 11:14 AM
> To: John Mudd
> Cc: pgsql-performance@postgresql.org
> Subject: Re: [PERFORM] Slow SELECT by primary key? Postgres 9.1.2
> 
> On Mon, May 27, 2013 at 9:02 AM, John Mudd  wrote:
> > Postgres 9.1.2 on Ubuntu 12.04
> >
> > Any reason why a select by primary key would be slower than a select
> > that includes an ORDER BY? I was really hoping using the primary key
> > would give me a boost.
> >
> > I stopped the server and cleared the O/S cache using "sync; echo 3 >
> > /proc/sys/vm/drop_caches" between the runs.
> >
> >
> >
> > test=# VACUUM ANALYZE test_select;
> > VACUUM
> >
> > (stopped postgres; reset O/S cache; started postgres)
> >
> > test=# explain analyze SELECT * FROM test_select WHERE key1 >= 50
> > ORDER BY key1, key2, key3, id LIMIT 1;
> >   QUERY PLAN
> > ------------------------------------------------------------------
> >  Limit  (cost=0.00..0.08 rows=1 width=21) (actual time=12.599..12.600
> > rows=1
> > loops=1)
> >->  Index Scan using my_key on test_select  (cost=0.00..41895.49
> > rows=498724 width=21) (actual time=12.597..12.597 rows=1 loops=1)
> >  Index Cond: (key1 >= 50)
> >  Total runtime: 12.678 ms
> >
> > (stopped postgres; reset O/S cache; started postgres)
> 
> 
> Why are you flushing the postgres/OS cache?  When you do that, you are
> measuring raw read time from disk.  Typical disk seek time is measured
> in milliseconds, so the timings are entirely to be expected once you
> remove caching effects.  Hard drives (at least, the spinning
> kind) are slow, and one of the major challenges of database and hardware
> engineering is working around their limitations.  Fortunately it looks
> like faster storage will soon be commonplace at reasonable prices.
> 
> merlin
> 

True.
But, on the other hand (back to the original question),
the execution plans that John got before and after the suggested change in
configuration parameters are exactly the same; the timings differ only
because of buffer cache effects.

Igor Neyman




Re: [PERFORM] Slow SELECT by primary key? Postgres 9.1.2

2013-05-30 Thread Merlin Moncure
On Mon, May 27, 2013 at 9:02 AM, John Mudd  wrote:
> Postgres 9.1.2 on Ubuntu 12.04
>
> Any reason why a select by primary key would be slower than a select that
> includes an ORDER BY? I was really hoping using the primary key would give
> me a boost.
>
> I stopped the server and cleared the O/S cache using "sync; echo 3 >
> /proc/sys/vm/drop_caches" between the runs.
>
>
>
> test=# VACUUM ANALYZE test_select;
> VACUUM
>
> (stopped postgres; reset O/S cache; started postgres)
>
> test=# explain analyze SELECT * FROM test_select WHERE key1 >= 50 ORDER
> BY key1, key2, key3, id LIMIT 1;
>   QUERY PLAN
> --
>  Limit  (cost=0.00..0.08 rows=1 width=21) (actual time=12.599..12.600 rows=1
> loops=1)
>->  Index Scan using my_key on test_select  (cost=0.00..41895.49
> rows=498724 width=21) (actual time=12.597..12.597 rows=1 loops=1)
>  Index Cond: (key1 >= 50)
>  Total runtime: 12.678 ms
>
> (stopped postgres; reset O/S cache; started postgres)


Why are you flushing the postgres/OS cache?  When you do that, you are
measuring raw read time from disk.  Typical disk seek time is
measured in milliseconds, so the timings are entirely to be expected
once you remove caching effects.  Hard drives (at least, the spinning
kind) are slow, and one of the major challenges of database and
hardware engineering is working around their limitations.  Fortunately
it looks like faster storage will soon be commonplace at reasonable
prices.

merlin




Re: [PERFORM] Slow SELECT by primary key? Postgres 9.1.2

2013-05-28 Thread Matheus de Oliveira
On Mon, May 27, 2013 at 11:02 AM, John Mudd  wrote:

> Postgres 9.1.2 on Ubuntu 12.04
>
> Any reason why a select by primary key would be slower than a select that
> includes an ORDER BY? I was really hoping using the primary key would give
> me a boost.
>
> I stopped the server and cleared the O/S cache using "sync; echo 3 >
> /proc/sys/vm/drop_caches" between the runs.
>
>
>
>  test=# VACUUM ANALYZE test_select;
> VACUUM
>
> (stopped postgres; reset O/S cache; started postgres)
>
> test=# explain analyze SELECT * FROM test_select WHERE key1 >= 50
> ORDER BY key1, key2, key3, id LIMIT 1;
>   QUERY PLAN
>
> --
>  Limit  (cost=0.00..0.08 rows=1 width=21) (actual time=12.599..12.600
> rows=1 loops=1)
>->  Index Scan using my_key on test_select  (cost=0.00..41895.49
> rows=498724 width=21) (actual time=12.597..12.597 rows=1 loops=1)
>  Index Cond: (key1 >= 50)
>  Total runtime: 12.678 ms
>
> (stopped postgres; reset O/S cache; started postgres)
>
> test=# explain analyze SELECT * FROM test_select WHERE id = 50;
>QUERY PLAN
>
>
> -
>  Index Scan using test_select_pkey on test_select  (cost=0.00..8.36 rows=1
> width=21) (actual time=31.396..31.398 rows=1 loops=1)
>Index Cond: (id = 50)
>  Total runtime: 31.504 ms
>
>
>
> Schema:
>
> test=# \d test_select
> Table "public.test_select"
>  Column | Type |Modifiers
>
>
> +--+--
>  id | integer  | not null default
> nextval('test_select_id_seq'::regclass)
>  key1   | integer  |
>  key2   | integer  |
>  key3   | integer  |
>  data   | character(4) |
> Indexes:
> "test_select_pkey" PRIMARY KEY, btree (id)
> "my_key" btree (key1, key2, key3, id)
>
> test=#
>
>
>
> Sample data:
>
> test=# SELECT * FROM test_select LIMIT 10;
>  id |  key1  |  key2  |  key3  | data
> ++++--
>   1 | 984966 | 283954 | 772063 | x
>   2 | 817668 | 393533 | 924888 | x
>   3 | 751039 | 798753 | 454309 | x
>   4 | 128505 | 329643 | 280553 | x
>   5 | 105600 | 257225 | 710015 | x
>   6 | 323891 | 615614 |  83206 | x
>   7 | 194054 |  63506 | 353171 | x
>   8 | 212068 | 881225 | 271804 | x
>   9 | 644180 |  26693 | 200738 | x
>  10 | 136586 | 498699 | 554417 | x
> (10 rows)
>
>
>
To me it looks like the "my_key" index should be better than the PK in this
case, for a few reasons:

1. You are using an ORDER BY that has the same fields (and in the same
order) as your index, so PG only needs to navigate the index.
2. You are using LIMIT 1, which means PG only needs to fetch the first
element where key1 >= 50 (and can stop the search right after it).

In the case of your PK, PG also needs to navigate through the index and
return only one value, but there the number of entries it needs to look
at is bigger, because "id" has more distinct values than "key1".

Regards,
-- 
Matheus de Oliveira
Analista de Banco de Dados
Dextra Sistemas - MPS.Br nível F!
www.dextra.com.br/postgres


Re: [PERFORM] Slow SELECT by primary key? Postgres 9.1.2

2013-05-28 Thread Roman Konoval
John,

And can you please include BUFFERS to ANALYZE?
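
For reference, that option looks like this (EXPLAIN's parenthesized
option list, including BUFFERS, is available from PostgreSQL 9.0 on):

    EXPLAIN (ANALYZE, BUFFERS)
    SELECT * FROM test_select WHERE id = 50;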

Regards,
Roman Konoval


On Tue, May 28, 2013 at 9:48 AM, Evgeniy Shishkin wrote:

>
>
>
>
> On 28.05.2013, at 2:17, John Mudd  wrote:
>
> Thanks again.
>
> Well, I have two problems with using the CLUSTER option. It's only
> temporary, since any updates, depending on how much free space is reserved
> per page, require re-running the CLUSTER. And my primary concern is that it
> arbitrarily gives an unfair advantage to the primary key SELECT. Still,
> it's easy to test, so here are the results. The primary key still loses
> even with the CLUSTER. Granted it is close, but considering this is now an
> unfair comparison it still doesn't make sense to me. How can a search for a
> specific row that should be fairly straightforward take longer than a
> search that includes an ORDER BY clause?
>
>
> Well, you get just a regular index scan because of LIMIT 1.
>
> And now it is just a matter of index size and table organization.
>
> I also don't understand why you consider CLUSTER unfair - the way you
> populated the table was a natural clustering over my_key.
>
> But it bothers me that my_key is always better. Can you please test it on
> different values but the same rows? Right now the two queries fetch two
> different tuples, and you are counting every I/O.
>
>
> test=# CLUSTER test_select USING test_select_pkey ;
> CLUSTER
> test=# VACUUM ANALYZE test_select ;
> VACUUM
>
> (stopped postgres; reset O/S cache; started postgres)
>
>
> test=# explain analyze SELECT * FROM test_select WHERE key1 >= 50
> ORDER BY key1, key2, key3, id LIMIT 1;
>   QUERY PLAN
>
> --
>  Limit  (cost=0.00..0.08 rows=1 width=21) (actual time=19.430..19.431
> rows=1 loops=1)
>->  Index Scan using my_key on test_select  (cost=0.00..41938.15
> rows=42 width=21) (actual time=19.428..19.428 rows=1 loops=1)
>  Index Cond: (key1 >= 50)
>  Total runtime: 19.526 ms
>
>
> (stopped postgres; reset O/S cache; started postgres)
>
>
> test=# explain analyze SELECT * FROM test_select WHERE id = 50;
>QUERY PLAN
>
>
> -
>  Index Scan using test_select_pkey on test_select  (cost=0.00..8.36 rows=1
> width=21) (actual time=21.070..21.072 rows=1 loops=1)
>Index Cond: (id = 50)
>  Total runtime: 21.178 ms
>
>
>
>
> On Mon, May 27, 2013 at 10:59 AM, Evgeny Shishkin wrote:
>
>>
>> On May 27, 2013, at 6:35 PM, John Mudd  wrote:
>>
>> Thanks, that's easy enough to test. Didn't seem to help though.
>>
>>
>> Ok. And if you CLUSTER tables USING PK?
>>
>>
>> test=# REINDEX index test_select_pkey;
>> REINDEX
>> test=# VACUUM ANALYZE test_select ;
>> VACUUM
>>
>>
>> (stopped postgres; reset O/S cache; started postgres)
>>
>> test=# explain analyze SELECT * FROM test_select WHERE key1 >= 50
>> ORDER BY key1, key2, key3, id LIMIT 1;
>>   QUERY PLAN
>>
>>
>> --
>>  Limit  (cost=0.00..0.08 rows=1 width=21) (actual time=16.368..16.369
>> rows=1 loops=1)
>>->  Index Scan using my_key on test_select  (cost=0.00..41981.16
>> rows=501333 width=21) (actual time=16.366..16.366 rows=1 loops=1)
>>  Index Cond: (key1 >= 50)
>>  Total runtime: 16.444 ms
>>
>>
>> (stopped postgres; reset O/S cache; started postgres)
>>
>> test=# explain analyze SELECT * FROM test_select WHERE id = 50;
>>QUERY PLAN
>>
>> -
>>   Index Scan using test_select_pkey on test_select  (cost=0.00..8.36
>> rows=1 width=21) (actual time=23.072..23.074 rows=1 loops=1)
>>Index Cond: (id = 50)
>>  Total runtime: 23.192 ms
>>
>>
>>
>>
>> On Mon, May 27, 2013 at 10:21 AM, Evgeny Shishkin 
>> wrote:
>>
>>>
>>> On May 27, 2013, at 6:02 PM, John Mudd  wrote:
>>>
>>> > Postgres 9.1.2 on Ubuntu 12.04
>>> >
>>> > Any reason why a select by primary key would be slower than a select
>>> that includes an ORDER BY? I was really hoping using the primary key would
>>> give me a boost.
>>> >
>>>
>>> You created my_key after data loading, and the PK was there all the time.
>>> If you REINDEX the PK, I bet it will be as fast.
>>>
>>> > I stopped the server and cleared the O/S cache using "sync; echo 3 >
>>> /proc/sys/vm/drop_caches" between the runs.
>>> >
>>> >
>>> >
>>> > test=# VACUUM ANALYZE test_select;
>>> > VACUUM
>>> >
>>> > (stopped postgres; reset O/S cache; started postgres)
>>> >
>>> > test=# explai

Re: [PERFORM] Slow SELECT by primary key? Postgres 9.1.2

2013-05-27 Thread Evgeniy Shishkin




On 28.05.2013, at 2:17, John Mudd  wrote:

> Thanks again.
> 
> Well, I have two problems with using the CLUSTER option. It's only temporary,
> since any updates, depending on how much free space is reserved per page,
> require re-running the CLUSTER. And my primary concern is that it
> arbitrarily gives an unfair advantage to the primary key SELECT. Still, it's
> easy to test, so here are the results. The primary key still loses even with
> the CLUSTER. Granted it is close, but considering this is now an unfair
> comparison it still doesn't make sense to me. How can a search for a specific
> row that should be fairly straightforward take longer than a search that
> includes an ORDER BY clause?
> 

Well, you get just a regular index scan because of LIMIT 1.

And now it is just a matter of index size and table organization.

I also don't understand why you consider CLUSTER unfair - the way you populated
the table was a natural clustering over my_key.

But it bothers me that my_key is always better. Can you please test it on
different values but the same rows? Right now the two queries fetch two
different tuples, and you are counting every I/O.
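
For example, a sketch of that test (the id value in step 2 is
illustrative):

    -- step 1: find which row the my_key query actually returns
    SELECT id FROM test_select
     WHERE key1 >= 50
     ORDER BY key1, key2, key3, id LIMIT 1;

    -- step 2: after a cache reset, time the pkey lookup for that same row
    -- (123456 stands in for the id returned by step 1)
    EXPLAIN ANALYZE SELECT * FROM test_select WHERE id = 123456;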

> 
> test=# CLUSTER test_select USING test_select_pkey ;
> CLUSTER
> test=# VACUUM ANALYZE test_select ;
> VACUUM
> 
>> (stopped postgres; reset O/S cache; started postgres)
> 
> 
> test=# explain analyze SELECT * FROM test_select WHERE key1 >= 50 ORDER 
> BY key1, key2, key3, id LIMIT 1;
>   QUERY PLAN
> --
>  Limit  (cost=0.00..0.08 rows=1 width=21) (actual time=19.430..19.431 rows=1 
> loops=1)
>->  Index Scan using my_key on test_select  (cost=0.00..41938.15 
> rows=42 width=21) (actual time=19.428..19.428 rows=1 loops=1)
>  Index Cond: (key1 >= 50)
>  Total runtime: 19.526 ms
> 
> 
>> (stopped postgres; reset O/S cache; started postgres)
> 
> 
> test=# explain analyze SELECT * FROM test_select WHERE id = 50;
>QUERY PLAN 
>
> -
>  Index Scan using test_select_pkey on test_select  (cost=0.00..8.36 rows=1 
> width=21) (actual time=21.070..21.072 rows=1 loops=1)
>Index Cond: (id = 50)
>  Total runtime: 21.178 ms
> 
> 
> 
> 
> On Mon, May 27, 2013 at 10:59 AM, Evgeny Shishkin  
> wrote:
>> 
>> On May 27, 2013, at 6:35 PM, John Mudd  wrote:
>> 
>>> Thanks, that's easy enough to test. Didn't seem to help though.
>> 
>> Ok. And if you CLUSTER tables USING PK?
>> 
>>> 
>>> test=# REINDEX index test_select_pkey;
>>> REINDEX
>>> test=# VACUUM ANALYZE test_select ;
>>> VACUUM
>>> 
>>> 
>>> (stopped postgres; reset O/S cache; started postgres)
>>> 
>>> test=# explain analyze SELECT * FROM test_select WHERE key1 >= 50 ORDER 
>>> BY key1, key2, key3, id LIMIT 1;
>>>   QUERY PLAN
>>>   
>>> --
>>>  Limit  (cost=0.00..0.08 rows=1 width=21) (actual time=16.368..16.369 
>>> rows=1 loops=1)
>>>->  Index Scan using my_key on test_select  (cost=0.00..41981.16 
>>> rows=501333 width=21) (actual time=16.366..16.366 rows=1 loops=1)
>>>  Index Cond: (key1 >= 50)
>>>  Total runtime: 16.444 ms
>>> 
>>> 
>>> (stopped postgres; reset O/S cache; started postgres)
>>> 
>>> test=# explain analyze SELECT * FROM test_select WHERE id = 50;
>>>QUERY PLAN
>>> -
>>>  Index Scan using test_select_pkey on test_select  (cost=0.00..8.36 rows=1 
>>> width=21) (actual time=23.072..23.074 rows=1 loops=1)
>>>Index Cond: (id = 50)
>>>  Total runtime: 23.192 ms
>>> 
>>> 
>>> 
>>> 
>>> On Mon, May 27, 2013 at 10:21 AM, Evgeny Shishkin  
>>> wrote:
 
 On May 27, 2013, at 6:02 PM, John Mudd  wrote:
 
 > Postgres 9.1.2 on Ubuntu 12.04
 >
 > Any reason why a select by primary key would be slower than a select 
 > that includes an ORDER BY? I was really hoping using the primary key 
 > would give me a boost.
 >
 
 You created my_key after data loading, and the PK was there all the time.
 If you REINDEX the PK, I bet it will be as fast.
 
 > I stopped the server and cleared the O/S cache using "sync; echo 3 > 
 > /proc/sys/vm/drop_caches" between the runs.
 >
 >
 >
 > test=# VACUUM ANALYZE test_select;
 > VACUUM
 >
 > (stopped p

Re: [PERFORM] Slow SELECT by primary key? Postgres 9.1.2

2013-05-27 Thread John Mudd
Thanks again.

Well, I have two problems with using the CLUSTER option. It's only
temporary, since any updates, depending on how much free space is reserved
per page, require re-running the CLUSTER. And my primary concern is that it
arbitrarily gives an unfair advantage to the primary key SELECT. Still,
it's easy to test, so here are the results. The primary key still loses
even with the CLUSTER. Granted it is close, but considering this is now an
unfair comparison it still doesn't make sense to me. How can a search for a
specific row that should be fairly straightforward take longer than a
search that includes an ORDER BY clause?


test=# CLUSTER test_select USING test_select_pkey ;
CLUSTER
test=# VACUUM ANALYZE test_select ;
VACUUM

(stopped postgres; reset O/S cache; started postgres)


test=# explain analyze SELECT * FROM test_select WHERE key1 >= 50 ORDER
BY key1, key2, key3, id LIMIT 1;
  QUERY PLAN
--
 Limit  (cost=0.00..0.08 rows=1 width=21) (actual time=19.430..19.431
rows=1 loops=1)
   ->  Index Scan using my_key on test_select  (cost=0.00..41938.15
rows=42 width=21) (actual time=19.428..19.428 rows=1 loops=1)
 Index Cond: (key1 >= 50)
 Total runtime: 19.526 ms


(stopped postgres; reset O/S cache; started postgres)


test=# explain analyze SELECT * FROM test_select WHERE id = 50;
   QUERY PLAN

-
 Index Scan using test_select_pkey on test_select  (cost=0.00..8.36 rows=1
width=21) (actual time=21.070..21.072 rows=1 loops=1)
   Index Cond: (id = 50)
 Total runtime: 21.178 ms




On Mon, May 27, 2013 at 10:59 AM, Evgeny Shishkin wrote:

>
> On May 27, 2013, at 6:35 PM, John Mudd  wrote:
>
> Thanks, that's easy enough to test. Didn't seem to help though.
>
>
> Ok. And if you CLUSTER tables USING PK?
>
>
> test=# REINDEX index test_select_pkey;
> REINDEX
> test=# VACUUM ANALYZE test_select ;
> VACUUM
>
>
> (stopped postgres; reset O/S cache; started postgres)
>
> test=# explain analyze SELECT * FROM test_select WHERE key1 >= 50
> ORDER BY key1, key2, key3, id LIMIT 1;
>   QUERY PLAN
>
>
> --
>  Limit  (cost=0.00..0.08 rows=1 width=21) (actual time=16.368..16.369
> rows=1 loops=1)
>->  Index Scan using my_key on test_select  (cost=0.00..41981.16
> rows=501333 width=21) (actual time=16.366..16.366 rows=1 loops=1)
>  Index Cond: (key1 >= 50)
>  Total runtime: 16.444 ms
>
>
> (stopped postgres; reset O/S cache; started postgres)
>
> test=# explain analyze SELECT * FROM test_select WHERE id = 50;
>QUERY PLAN
>
> -
>   Index Scan using test_select_pkey on test_select  (cost=0.00..8.36
> rows=1 width=21) (actual time=23.072..23.074 rows=1 loops=1)
>Index Cond: (id = 50)
>  Total runtime: 23.192 ms
>
>
>
>
> On Mon, May 27, 2013 at 10:21 AM, Evgeny Shishkin wrote:
>
>>
>> On May 27, 2013, at 6:02 PM, John Mudd  wrote:
>>
>> > Postgres 9.1.2 on Ubuntu 12.04
>> >
>> > Any reason why a select by primary key would be slower than a select
>> that includes an ORDER BY? I was really hoping using the primary key would
>> give me a boost.
>> >
>>
>> You created my_key after data loading, and the PK was there all the time.
>> If you REINDEX the PK, I bet it will be as fast.
>>
>> > I stopped the server and cleared the O/S cache using "sync; echo 3 >
>> /proc/sys/vm/drop_caches" between the runs.
>> >
>> >
>> >
>> > test=# VACUUM ANALYZE test_select;
>> > VACUUM
>> >
>> > (stopped postgres; reset O/S cache; started postgres)
>> >
>> > test=# explain analyze SELECT * FROM test_select WHERE key1 >= 50
>> ORDER BY key1, key2, key3, id LIMIT 1;
>> >   QUERY PLAN
>> >
>> --
>> >  Limit  (cost=0.00..0.08 rows=1 width=21) (actual time=12.599..12.600
>> rows=1 loops=1)
>> >->  Index Scan using my_key on test_select  (cost=0.00..41895.49
>> rows=498724 width=21) (actual time=12.597..12.597 rows=1 loops=1)
>> >  Index Cond: (key1 >= 50)
>> >  Total runtime: 12.678 ms
>> >
>> > (stopped postgres; reset O/S cache; started postgres)
>> >
>> > test=# explain analyze SELECT * FROM test_select WHERE id = 50;
>> > 

Re: [PERFORM] Slow SELECT by primary key? Postgres 9.1.2

2013-05-27 Thread Evgeny Shishkin

On May 27, 2013, at 6:35 PM, John Mudd  wrote:

> Thanks, that's easy enough to test. Didn't seem to help though.
> 

Ok. And if you CLUSTER tables USING PK?

> 
> test=# REINDEX index test_select_pkey;
> REINDEX
> test=# VACUUM ANALYZE test_select ;
> VACUUM
> 
> 
> (stopped postgres; reset O/S cache; started postgres)
> 
> test=# explain analyze SELECT * FROM test_select WHERE key1 >= 50 ORDER 
> BY key1, key2, key3, id LIMIT 1;
>   QUERY PLAN  
> 
> --
>  Limit  (cost=0.00..0.08 rows=1 width=21) (actual time=16.368..16.369 rows=1 
> loops=1)
>->  Index Scan using my_key on test_select  (cost=0.00..41981.16 
> rows=501333 width=21) (actual time=16.366..16.366 rows=1 loops=1)
>  Index Cond: (key1 >= 50)
>  Total runtime: 16.444 ms
> 
> 
> (stopped postgres; reset O/S cache; started postgres)
> 
> test=# explain analyze SELECT * FROM test_select WHERE id = 50;
>QUERY PLAN
> -
>  Index Scan using test_select_pkey on test_select  (cost=0.00..8.36 rows=1 
> width=21) (actual time=23.072..23.074 rows=1 loops=1)
>Index Cond: (id = 50)
>  Total runtime: 23.192 ms
> 
> 
> 
> 
> On Mon, May 27, 2013 at 10:21 AM, Evgeny Shishkin  
> wrote:
> 
> On May 27, 2013, at 6:02 PM, John Mudd  wrote:
> 
> > Postgres 9.1.2 on Ubuntu 12.04
> >
> > Any reason why a select by primary key would be slower than a select that 
> > includes an ORDER BY? I was really hoping using the primary key would give 
> > me a boost.
> >
> 
> You created my_key after data loading, and PK was there all the time.
> If you REINDEX PK, i bet it will be as fast.
> 
> > I stopped the server and cleared the O/S cache using "sync; echo 3 > 
> > /proc/sys/vm/drop_caches" between the runs.
> >
> >
> >
> > test=# VACUUM ANALYZE test_select;
> > VACUUM
> >
> > (stopped postgres; reset O/S cache; started postgres)
> >
> > test=# explain analyze SELECT * FROM test_select WHERE key1 >= 50 ORDER 
> > BY key1, key2, key3, id LIMIT 1;
> >   QUERY PLAN
> > --
> >  Limit  (cost=0.00..0.08 rows=1 width=21) (actual time=12.599..12.600 
> > rows=1 loops=1)
> >->  Index Scan using my_key on test_select  (cost=0.00..41895.49 
> > rows=498724 width=21) (actual time=12.597..12.597 rows=1 loops=1)
> >  Index Cond: (key1 >= 50)
> >  Total runtime: 12.678 ms
> >
> > (stopped postgres; reset O/S cache; started postgres)
> >
> > test=# explain analyze SELECT * FROM test_select WHERE id = 50;
> >QUERY PLAN
> > -
> >  Index Scan using test_select_pkey on test_select  (cost=0.00..8.36 rows=1 
> > width=21) (actual time=31.396..31.398 rows=1 loops=1)
> >Index Cond: (id = 50)
> >  Total runtime: 31.504 ms
> >
> >
> >
> > Schema:
> >
> > test=# \d test_select
> > Table "public.test_select"
> >  Column | Type |Modifiers
> > +--+--
> >  id | integer  | not null default 
> > nextval('test_select_id_seq'::regclass)
> >  key1   | integer  |
> >  key2   | integer  |
> >  key3   | integer  |
> >  data   | character(4) |
> > Indexes:
> > "test_select_pkey" PRIMARY KEY, btree (id)
> > "my_key" btree (key1, key2, key3, id)
> >
> > test=#
> >
> >
> >
> > Sample data:
> >
> > test=# SELECT * FROM test_select LIMIT 10;
> >  id |  key1  |  key2  |  key3  | data
> > ++++--
> >   1 | 984966 | 283954 | 772063 | x
> >   2 | 817668 | 393533 | 924888 | x
> >   3 | 751039 | 798753 | 454309 | x
> >   4 | 128505 | 329643 | 280553 | x
> >   5 | 105600 | 257225 | 710015 | x
> >   6 | 323891 | 615614 |  83206 | x
> >   7 | 194054 |  63506 | 353171 | x
> >   8 | 212068 | 881225 | 271804 | x
> >   9 | 644180 |  26693 | 200738 | x
> >  10 | 136586 | 498699 | 554417 | x
> > (10 rows)
> >
> >
> >
> >
> > Here's how I populated the table:
> >
> > import psycopg2
> >
> > conn = psycopg2.connect('dbname=test')
> >
> > cur = conn.cursor()
> >
> > def random_int():
> > n = 100
> > return random.randint(0,n)
> >
> > def random_key():
> > return random_int(), random_int(), random_int()
> >
> > def create_table():
> > cur.execute('''
> >

Re: [PERFORM] Slow SELECT by primary key? Postgres 9.1.2

2013-05-27 Thread John Mudd
Thanks, that's easy enough to test. Didn't seem to help though.


test=# REINDEX index test_select_pkey;
REINDEX
test=# VACUUM ANALYZE test_select ;
VACUUM


(stopped postgres; reset O/S cache; started postgres)

test=# explain analyze SELECT * FROM test_select WHERE key1 >= 50 ORDER
BY key1, key2, key3, id LIMIT 1;
  QUERY PLAN

--
 Limit  (cost=0.00..0.08 rows=1 width=21) (actual time=16.368..16.369
rows=1 loops=1)
   ->  Index Scan using my_key on test_select  (cost=0.00..41981.16
rows=501333 width=21) (actual time=16.366..16.366 rows=1 loops=1)
 Index Cond: (key1 >= 50)
 Total runtime: 16.444 ms


(stopped postgres; reset O/S cache; started postgres)

test=# explain analyze SELECT * FROM test_select WHERE id = 50;
   QUERY PLAN
-
 Index Scan using test_select_pkey on test_select  (cost=0.00..8.36 rows=1
width=21) (actual time=23.072..23.074 rows=1 loops=1)
   Index Cond: (id = 50)
 Total runtime: 23.192 ms




On Mon, May 27, 2013 at 10:21 AM, Evgeny Shishkin wrote:

>
> On May 27, 2013, at 6:02 PM, John Mudd  wrote:
>
> > Postgres 9.1.2 on Ubuntu 12.04
> >
> > Any reason why a select by primary key would be slower than a select
> that includes an ORDER BY? I was really hoping using the primary key would
> give me a boost.
> >
>
> You created my_key after data loading, and the PK was there all the time.
> If you REINDEX the PK, I bet it will be as fast.
>
> > I stopped the server and cleared the O/S cache using "sync; echo 3 >
> /proc/sys/vm/drop_caches" between the runs.
> >
> >
> >
> > test=# VACUUM ANALYZE test_select;
> > VACUUM
> >
> > (stopped postgres; reset O/S cache; started postgres)
> >
> > test=# explain analyze SELECT * FROM test_select WHERE key1 >= 50
> ORDER BY key1, key2, key3, id LIMIT 1;
> >   QUERY PLAN
> >
> --
> >  Limit  (cost=0.00..0.08 rows=1 width=21) (actual time=12.599..12.600
> rows=1 loops=1)
> >->  Index Scan using my_key on test_select  (cost=0.00..41895.49
> rows=498724 width=21) (actual time=12.597..12.597 rows=1 loops=1)
> >  Index Cond: (key1 >= 50)
> >  Total runtime: 12.678 ms
> >
> > (stopped postgres; reset O/S cache; started postgres)
> >
> > test=# explain analyze SELECT * FROM test_select WHERE id = 50;
> >QUERY PLAN
> >
> -
> >  Index Scan using test_select_pkey on test_select  (cost=0.00..8.36
> rows=1 width=21) (actual time=31.396..31.398 rows=1 loops=1)
> >Index Cond: (id = 50)
> >  Total runtime: 31.504 ms
> >
> >
> >
> > Schema:
> >
> > test=# \d test_select
> > Table "public.test_select"
> >  Column | Type |Modifiers
> >
> +--+--
> >  id | integer  | not null default
> nextval('test_select_id_seq'::regclass)
> >  key1   | integer  |
> >  key2   | integer  |
> >  key3   | integer  |
> >  data   | character(4) |
> > Indexes:
> > "test_select_pkey" PRIMARY KEY, btree (id)
> > "my_key" btree (key1, key2, key3, id)
> >
> > test=#
> >
> >
> >
> > Sample data:
> >
> > test=# SELECT * FROM test_select LIMIT 10;
> >  id |  key1  |  key2  |  key3  | data
> > ++++--
> >   1 | 984966 | 283954 | 772063 | x
> >   2 | 817668 | 393533 | 924888 | x
> >   3 | 751039 | 798753 | 454309 | x
> >   4 | 128505 | 329643 | 280553 | x
> >   5 | 105600 | 257225 | 710015 | x
> >   6 | 323891 | 615614 |  83206 | x
> >   7 | 194054 |  63506 | 353171 | x
> >   8 | 212068 | 881225 | 271804 | x
> >   9 | 644180 |  26693 | 200738 | x
> >  10 | 136586 | 498699 | 554417 | x
> > (10 rows)
> >
> >
> >
> >
> > Here's how I populated the table:
> >
> > import psycopg2
> >
> > conn = psycopg2.connect('dbname=test')
> >
> > cur = conn.cursor()
> >
> > def random_int():
> > n = 100
> > return random.randint(0,n)
> >
> > def random_key():
> > return random_int(), random_int(), random_int()
> >
> > def create_table():
> > cur.execute('''
> > DROP TABLE IF EXISTS test_select;
> >
> > CREATE TABLE test_select (
> > id  SERIAL PRIMARY KEY,
> > key1INTEGER,
> > key2INTEGER

Re: [PERFORM] Slow SELECT by primary key? Postgres 9.1.2

2013-05-27 Thread Evgeny Shishkin

On May 27, 2013, at 6:02 PM, John Mudd  wrote:

> Postgres 9.1.2 on Ubuntu 12.04
> 
> Any reason why a select by primary key would be slower than a select that 
> includes an ORDER BY? I was really hoping using the primary key would give me 
> a boost. 
> 

You created my_key after data loading, and the PK was there all the time.
If you REINDEX the PK, I bet it will be as fast.

> I stopped the server and cleared the O/S cache using "sync; echo 3 > 
> /proc/sys/vm/drop_caches" between the runs.
> 
> 
> 
> test=# VACUUM ANALYZE test_select;
> VACUUM
> 
> (stopped postgres; reset O/S cache; started postgres)
> 
> test=# explain analyze SELECT * FROM test_select WHERE key1 >= 50 ORDER 
> BY key1, key2, key3, id LIMIT 1;
>   QUERY PLAN
> --
>  Limit  (cost=0.00..0.08 rows=1 width=21) (actual time=12.599..12.600 rows=1 
> loops=1)
>->  Index Scan using my_key on test_select  (cost=0.00..41895.49 
> rows=498724 width=21) (actual time=12.597..12.597 rows=1 loops=1)
>  Index Cond: (key1 >= 50)
>  Total runtime: 12.678 ms
> 
> (stopped postgres; reset O/S cache; started postgres)
> 
> test=# explain analyze SELECT * FROM test_select WHERE id = 50;
>QUERY PLAN 
>
> -
>  Index Scan using test_select_pkey on test_select  (cost=0.00..8.36 rows=1 
> width=21) (actual time=31.396..31.398 rows=1 loops=1)
>Index Cond: (id = 50)
>  Total runtime: 31.504 ms
> 
> 
> 
> Schema:
> 
> test=# \d test_select
> Table "public.test_select"
>  Column | Type |Modifiers 
> 
> +--+--
>  id | integer  | not null default 
> nextval('test_select_id_seq'::regclass)
>  key1   | integer  | 
>  key2   | integer  | 
>  key3   | integer  | 
>  data   | character(4) | 
> Indexes:
> "test_select_pkey" PRIMARY KEY, btree (id)
> "my_key" btree (key1, key2, key3, id)
> 
> test=# 
> 
> 
> 
> Sample data:
> 
> test=# SELECT * FROM test_select LIMIT 10;
>  id |  key1  |  key2  |  key3  | data 
> ++++--
>   1 | 984966 | 283954 | 772063 | x   
>   2 | 817668 | 393533 | 924888 | x   
>   3 | 751039 | 798753 | 454309 | x   
>   4 | 128505 | 329643 | 280553 | x   
>   5 | 105600 | 257225 | 710015 | x   
>   6 | 323891 | 615614 |  83206 | x   
>   7 | 194054 |  63506 | 353171 | x   
>   8 | 212068 | 881225 | 271804 | x   
>   9 | 644180 |  26693 | 200738 | x   
>  10 | 136586 | 498699 | 554417 | x   
> (10 rows)
> 
> 
> 
> 
> Here's how I populated the table:
> 
> import psycopg2
> import random  # used by random_int()
> 
> conn = psycopg2.connect('dbname=test')
> 
> cur = conn.cursor()
> 
> def random_int():
>     n = 100
>     return random.randint(0, n)
> 
> def random_key():
>     return random_int(), random_int(), random_int()
> 
> def create_table():
>     cur.execute('''
>         DROP TABLE IF EXISTS test_select;
> 
>         CREATE TABLE test_select (
>             id      SERIAL PRIMARY KEY,
>             key1    INTEGER,
>             key2    INTEGER,
>             key3    INTEGER,
>             data    char(4)
>         );
>     ''')
>     conn.commit()
> 
>     n = 100
>     for i in range(n):
>         cur.execute("INSERT INTO test_select(key1, key2, key3, data) VALUES(%s, %s, %s, 'x')", random_key())
>     conn.commit()
> 
>     cur.execute('CREATE INDEX my_key ON test_select(key1, key2, key3, id)')
>     conn.commit()
> 
> create_table()
> 





[PERFORM] Slow SELECT by primary key? Postgres 9.1.2

2013-05-27 Thread John Mudd
Postgres 9.1.2 on Ubuntu 12.04

Any reason why a select by primary key would be slower than a select that
includes an ORDER BY? I was really hoping using the primary key would give
me a boost.

I stopped the server and cleared the O/S cache using "sync; echo 3 >
/proc/sys/vm/drop_caches" between the runs.



test=# VACUUM ANALYZE test_select;
VACUUM

(stopped postgres; reset O/S cache; started postgres)

test=# explain analyze SELECT * FROM test_select WHERE key1 >= 50 ORDER
BY key1, key2, key3, id LIMIT 1;
  QUERY PLAN
--
 Limit  (cost=0.00..0.08 rows=1 width=21) (actual time=12.599..12.600
rows=1 loops=1)
   ->  Index Scan using my_key on test_select  (cost=0.00..41895.49
rows=498724 width=21) (actual time=12.597..12.597 rows=1 loops=1)
 Index Cond: (key1 >= 50)
 Total runtime: 12.678 ms

(stopped postgres; reset O/S cache; started postgres)

test=# explain analyze SELECT * FROM test_select WHERE id = 50;
   QUERY PLAN

-
 Index Scan using test_select_pkey on test_select  (cost=0.00..8.36 rows=1
width=21) (actual time=31.396..31.398 rows=1 loops=1)
   Index Cond: (id = 50)
 Total runtime: 31.504 ms



Schema:

test=# \d test_select
                            Table "public.test_select"
 Column |     Type     |                        Modifiers
--------+--------------+-----------------------------------------------------------
 id     | integer      | not null default nextval('test_select_id_seq'::regclass)
 key1   | integer      |
 key2   | integer      |
 key3   | integer      |
 data   | character(4) |
Indexes:
    "test_select_pkey" PRIMARY KEY, btree (id)
    "my_key" btree (key1, key2, key3, id)

test=#



Sample data:

test=# SELECT * FROM test_select LIMIT 10;
 id |  key1  |  key2  |  key3  | data
++++--
  1 | 984966 | 283954 | 772063 | x
  2 | 817668 | 393533 | 924888 | x
  3 | 751039 | 798753 | 454309 | x
  4 | 128505 | 329643 | 280553 | x
  5 | 105600 | 257225 | 710015 | x
  6 | 323891 | 615614 |  83206 | x
  7 | 194054 |  63506 | 353171 | x
  8 | 212068 | 881225 | 271804 | x
  9 | 644180 |  26693 | 200738 | x
 10 | 136586 | 498699 | 554417 | x
(10 rows)




Here's how I populated the table:

import psycopg2
import random  # used by random_int()

conn = psycopg2.connect('dbname=test')

cur = conn.cursor()

def random_int():
    n = 100
    return random.randint(0, n)

def random_key():
    return random_int(), random_int(), random_int()

def create_table():
    cur.execute('''
        DROP TABLE IF EXISTS test_select;

        CREATE TABLE test_select (
            id      SERIAL PRIMARY KEY,
            key1    INTEGER,
            key2    INTEGER,
            key3    INTEGER,
            data    char(4)
        );
    ''')
    conn.commit()

    n = 100
    for i in range(n):
        cur.execute("INSERT INTO test_select(key1, key2, key3, data) VALUES(%s, %s, %s, 'x')", random_key())
    conn.commit()

    cur.execute('CREATE INDEX my_key ON test_select(key1, key2, key3, id)')
    conn.commit()

create_table()


Re: [PERFORM] Slow SELECT on small table

2010-11-22 Thread Kevin Grittner
Martin Boese  wrote:
 
> The table has only ~1400 rows. A count(*) takes more than 70
> seconds.  Other tables are fast as usual.
> 
> When this happens I can also see my system's disks are suffering.
> 'systat -vm' shows 100% disk load at ~4MB/sec data rates.
> 
> A simple VACUUM does *not* fix it, a VACUUM FULL however does. See
> the textfile attached.
 
This is almost certainly a result of bloat on this table. 
Autovacuum should normally protect you from that, but there are a
few things which can prevent it from doing so, like long-running
transactions or repeated updates against the entire table in a short
time.  There has also been a bug found recently which, as I
understand it, can cause autovacuum to become less aggressive over
time, which might possibly contribute to this sort of problem.
 
You appear to have snipped the portion of the vacuum output which
might have confirmed and quantified the problem.  If you get into
this state again, the entire output of this would be informative:
 
VACUUM VERBOSE public.circuit;
 
The goal would be to try to prevent the bloat in the first place so
that you don't need to use aggressive maintenance like VACUUM FULL
to recover.  Manual vacuums or tweaking the autovacuum parameters
may help.  Also, keep an eye out for maintenance releases for 9.0;
there's likely to be a fix coming which will help you with this.
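
When it does recur, the statistics collector can also give a quick
picture of how many dead tuples have accumulated and when autovacuum
last ran (a sketch; these columns exist in 9.0):

    SELECT relname, n_live_tup, n_dead_tup, last_vacuum, last_autovacuum
      FROM pg_stat_user_tables
     WHERE relname = 'circuit';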
 
-Kevin



[PERFORM] Slow SELECT on small table

2010-11-22 Thread Martin Boese
Hi,

I am using PostgreSQL 9.0.1 with PostGIS 1.5 on FreeBSD 7.0. I have at
least one table on which SELECTs turn terribly slow from time to time.
This has happened at least three times, also on version 8.4.

The table has only ~1400 rows. A count(*) takes more than 70 seconds.
Other tables are fast as usual.

When this happens I can also see my system's disks are suffering.
'systat -vm' shows 100% disk load at ~4MB/sec data rates.

A simple VACUUM does *not* fix it, a VACUUM FULL however does. See the
textfile attached.

My postgresql.conf is untouched as per distribution.

Can someone give me a hint on how to troubleshoot this problem?

Thanks!

Martin

mbms=# select version();
 version
  
--
 PostgreSQL 9.0.1 on amd64-portbld-freebsd7.0, compiled by GCC cc (GCC) 4.2.1 
20070719  [FreeBSD], 64-bit
(1 row)




mbms=# EXPLAIN ANALYZE select count(*) from circuit; 
   QUERY PLAN   
 
-
 Aggregate  (cost=22213.20..22213.21 rows=1 width=0) (actual 
time=72847.131..72847.133 rows=1 loops=1)
   ->  Seq Scan on circuit  (cost=0.00..22209.56 rows=1456 width=0) (actual 
time=15527.799..72844.633 rows=1456 loops=1)
 Total runtime: 72922.484 ms
(3 rows)



mbms=# VACUUM FULL VERBOSE ANALYZE ;

(..snip..)

INFO:  vacuuming "public.circuit"
INFO:  analyzing "public.circuit"
INFO:  "circuit": scanned 41 of 41 pages, containing 1456 live rows and 0 dead 
rows; 1456 rows in sample, 1456 estimated total rows
VACUUM


mbms=# EXPLAIN ANALYZE select count(*) from circuit;
  QUERY PLAN
  
--
 Aggregate  (cost=59.20..59.21 rows=1 width=0) (actual time=4.416..4.418 rows=1 
loops=1)
   ->  Seq Scan on circuit  (cost=0.00..55.56 rows=1456 width=0) (actual 
time=0.013..2.302 rows=1456 loops=1)
 Total runtime: 4.470 ms
(3 rows)

mbms=# \d circuit
 
  Table "public.circuit"
   Column|Type |
  Modifiers   
-+-+--
 id  | integer | not null default 
nextval('circuit_id_seq'::regclass)
 router_id   | integer | 
 customer_id | integer | 
 site| character varying(256)  | 
 service | character varying(48)   | 
 last_ifdescr| character varying(256)  | 
 last_ifalias| character varying(256)  | 
 bandwidth_ul| integer | default 0
 bandwidth_dl| integer | default 0
 last_seen   | timestamp without time zone | 
 last_ifindex| integer | default 0
 termination_point_id| integer | 
 last_adminstatus| smallint| default 1
 status  | character(1)| default 'A'::bpchar
 location| geometry| 
 circuit_type| character(1)| default 'R'::bpchar
 p_local | character varying(256)  | 
 p_remote| character varying(256)  | 
 p_technology| character varying(64)   | 
 x_provider  | character varying(256)  | 
 x_provider_reference| character varying(128)  | 
 termination_point_reference | character varying(64)   | 
 is_monitored| boolean | default false
 temp_config | boolean | default false
 activation_date | timestamp without time zone | 
 s_last_did  | character varying(64)   | 
 td_cache_class  | character varying(128)  | 
 td_cache_id | integer | 
 td_cache_lastupdate | timestamp without time zone | 
 commercial_service_id   | integer | 
Indexes:
"circuit_id_key" UNIQUE, btree (id)
"id_circuit_status" btree (status)
"idx_circuit_customer_id" btree (customer_id)
"idx_circuit_site" btree (site)
Check constraints:
"enforce_dims_location" CHECK (ndims(location) = 2)
"enforce_geotype_location" CHECK (geom

Re: [PERFORM] Slow "Select count(*) ..." query on table with 60 Mio. rows

2010-01-14 Thread Greg Smith

Kevin Grittner wrote:
> Greg Smith  wrote:
>> There's a whole list of FAQs that are documented on the wiki but
>> not in the main FAQ yet, leftover from before the main FAQ was
>> hosted there. You can see them all at
>> http://wiki.postgresql.org/wiki/Frequently_Asked_Questions
>>
>> I just haven't had time to merge those all usefully into the main
>> FAQ.
>
> Well, unless you object to the way I did it, there's one down.
> Should I remove it from the list of "Other FAQs" on the page you
> cite?

Sure; everyone should feel free to assimilate into the main FAQ and wipe
out anything on that smaller list. Those are mainly topics where the
discussion of workarounds and approaches can be much longer than
standard FAQ length, so I suspect many of the answers are going to be a
very brief summary with a link to longer discussion. If you come across
a really small one, we might even wipe out the original page once it's
merged in.


--
Greg Smith    2ndQuadrant    Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com  www.2ndQuadrant.com



Re: [PERFORM] Slow "Select count(*) ..." query on table with 60 Mio. rows

2010-01-14 Thread Kevin Grittner
Greg Smith  wrote:
 
> There's a whole list of FAQs that are documented on the wiki but
> not in the main FAQ yet leftover from before the main FAQ was
> hosted there. You can see them all at 
> http://wiki.postgresql.org/wiki/Frequently_Asked_Questions
> 
> I just haven't had time to merge those all usefully into the main
> FAQ.
 
Well, unless you object to the way I did it, there's one down. 
Should I remove it from the list of "Other FAQs" on the page you
cite?
 
(Of course, it goes without saying that you're welcome to improve
upon anything I put in there.)
 
-Kevin



Re: [PERFORM] Slow "Select count(*) ..." query on table with 60 Mio. rows

2010-01-14 Thread Greg Smith

Kevin Grittner wrote:
> Greg Smith  wrote:
>> The content was already there, just not linked into the main FAQ
>> yet: http://wiki.postgresql.org/wiki/Slow_Counting
>
> For a question asked this frequently, it should probably be in the
> FAQ.  I'll add a link from there to the more thorough write-up.

There's a whole list of FAQs that are documented on the wiki but not in
the main FAQ yet, leftover from before the main FAQ was hosted there. You
can see them all at
http://wiki.postgresql.org/wiki/Frequently_Asked_Questions

I just haven't had time to merge those all usefully into the main FAQ.

--
Greg Smith    2ndQuadrant    Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com  www.2ndQuadrant.com



Re: [PERFORM] Slow "Select count(*) ..." query on table with 60 Mio. rows

2010-01-14 Thread Kevin Grittner
Greg Smith  wrote:
 
> The content was already there, just not linked into the main FAQ
> yet: 
> http://wiki.postgresql.org/wiki/Slow_Counting
 
For a question asked this frequently, it should probably be in the
FAQ.  I'll add a link from there to the more thorough write-up.
 
-Kevin



Re: [PERFORM] Slow "Select count(*) ..." query on table with 60 Mio. rows

2010-01-14 Thread Greg Smith

Kevin Grittner wrote:
> Matthew Wakeling  wrote:
>> This is an FAQ.
>
> I just added it to the wiki FAQ page:
> http://wiki.postgresql.org/wiki/FAQ#Why_is_.22SELECT_count.28.2A.29_FROM_bigtable.3B.22_slow.3F

The content was already there, just not linked into the main FAQ yet:
http://wiki.postgresql.org/wiki/Slow_Counting


--
Greg Smith    2ndQuadrant    Baltimore, MD
PostgreSQL Training, Services and Support
g...@2ndquadrant.com  www.2ndQuadrant.com



Re: [PERFORM] Slow "Select count(*) ..." query on table with 60 Mio. rows

2010-01-14 Thread Kevin Grittner
Ivan Voras  wrote:
 
> Maybe you could add a short note why an estimation like from the 
> pg_class table is usually enough.
 
OK.  Will do.
 
-Kevin



Re: [PERFORM] Slow "Select count(*) ..." query on table with 60 Mio. rows

2010-01-14 Thread Ivan Voras

Kevin Grittner wrote:
> Matthew Wakeling  wrote:
>> This is an FAQ.
>
> I just added it to the wiki FAQ page:
> http://wiki.postgresql.org/wiki/FAQ#Why_is_.22SELECT_count.28.2A.29_FROM_bigtable.3B.22_slow.3F

Maybe you could add a short note on why an estimate, such as the one from
the pg_class table, is usually enough.






Re: [PERFORM] Slow "Select count(*) ..." query on table with 60 Mio. rows

2010-01-14 Thread Kevin Grittner
Matthew Wakeling  wrote:
 
> This is an FAQ.
 
I just added it to the wiki FAQ page:
 
http://wiki.postgresql.org/wiki/FAQ#Why_is_.22SELECT_count.28.2A.29_FROM_bigtable.3B.22_slow.3F
 
-Kevin




Re: [PERFORM] Slow "Select count(*) ..." query on table with 60 Mio. rows

2010-01-14 Thread A. Kretschmer
In response to tom :
> Hi,
> 
> === Problem ===
> 
> I have a db table "data_measurand" with about 60,000,000 (60 million)
> rows, and the following query takes about 20-30 seconds (with psql):
> 
> mydb=# select count(*) from data_measurand;
>   count   
> --
>  60846187
> (1 row)
> 
> 
> === Question ===
> 
> - What can i do to improve the performance for the data_measurand table?

Short answer: nothing.

Long answer: PG has to check the visibility of each record, so it
forces a seq scan.

But you can get an estimate: ask pg_class (a system table); its column
reltuples contains an estimated row count.
http://www.postgresql.org/docs/current/static/catalog-pg-class.html

If you really need the correct row count, you should create a TRIGGER
and, with this trigger, count all INSERTs and DELETEs.
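
A minimal sketch of both suggestions (the counter table, function, and
trigger names are illustrative, not from this thread; note that TRUNCATE
would not be tracked):

    -- fast estimate from the planner statistics
    SELECT reltuples::bigint AS estimated_rows
      FROM pg_class
     WHERE relname = 'data_measurand';

    -- exact count maintained by a trigger
    CREATE TABLE rowcount (tablename text PRIMARY KEY, cnt bigint NOT NULL);

    INSERT INTO rowcount
    SELECT 'data_measurand', count(*) FROM data_measurand;

    CREATE OR REPLACE FUNCTION count_rows() RETURNS trigger AS $$
    BEGIN
        IF TG_OP = 'INSERT' THEN
            UPDATE rowcount SET cnt = cnt + 1 WHERE tablename = TG_TABLE_NAME;
        ELSE  -- DELETE
            UPDATE rowcount SET cnt = cnt - 1 WHERE tablename = TG_TABLE_NAME;
        END IF;
        RETURN NULL;
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER data_measurand_count
        AFTER INSERT OR DELETE ON data_measurand
        FOR EACH ROW EXECUTE PROCEDURE count_rows();

Under heavy concurrent writes the single counter row becomes a hot spot,
which is why trigger-based counting slows down write access to the table.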


Regards, Andreas
-- 
Andreas Kretschmer
Kontakt:  Heynitz: 035242/47150,   D1: 0160/7141639 (mehr: -> Header)
GnuPG: 0x31720C99, 1006 CCB4 A326 1D42 6431  2EB0 389D 1DC2 3172 0C99



Re: [PERFORM] Slow "Select count(*) ..." query on table with 60 Mio. rows

2010-01-14 Thread Matthew Wakeling

On Thu, 14 Jan 2010, tom wrote:
> I have a db table "data_measurand" with about 60,000,000 (60 million)
> rows, and the following query takes about 20-30 seconds (with psql):
>
> mydb=# select count(*) from data_measurand;
>   count
> ----------
>  60846187
> (1 row)

Sounds pretty reasonable to me. Looking at your table, the rows are maybe
200 bytes wide? That's 12GB of data for Postgres to munch through. 30
seconds is really rather quick for that (400MB/s). What sort of RAID array
is managing to give you that much?

> I use a software raid and LVM for Logical Volume Management. Filesystem
> is ext3

Ditch lvm.

This is an FAQ. Counting the rows in a table is an expensive operation in
Postgres. It can't be answered directly from an index. If you want, you
can keep track of the number of rows yourself with triggers, but beware
that this will slow down write access to the table.


Matthew

--
Nog: Look! They've made me into an ensign!
O'Brien: I didn't know things were going so badly.
Nog: Frightening, isn't it?



[PERFORM] Slow "Select count(*) ..." query on table with 60 Mio. rows

2010-01-14 Thread tom
Hi,

=== Problem ===

I have a db table "data_measurand" with about 60,000,000 (60 million)
rows, and the following query takes about 20-30 seconds (with psql):

mydb=# select count(*) from data_measurand;
  count   
--
 60846187
(1 row)


=== Question ===

- What can I do to improve the performance of the data_measurand table?
 
=== Background ===

I created an application with Django 1.1 ( http://djangoproject.com ) to
collect, analyze, and visualize measurement data.

=== My System ===

= Postgres Version =
postgres=# select version();
version
-
 PostgreSQL 8.3.9 on i486-pc-linux-gnu, compiled by GCC gcc-4.3.real
(Debian 4.3.2-1.1) 4.3.2
(1 row)

I installed postgres with apt-get from debian lenny without any
modifications.

= Debian Lenny Kernel Version =
lenny:~# uname  -a
Linux or.ammonit.com 2.6.26-2-686-bigmem #1 SMP Wed Nov 4 21:12:12 UTC
2009 i686 GNU/Linux

= Hardware = 
model name  : AMD Athlon(tm) 64 X2 Dual Core Processor 6000+
cpu MHz : 1000.000
cache size  : 512 KB
MemTotal: 8281516 kB (8 GB)

I use a software raid and LVM for Logical Volume Management. Filesystem
is ext3



=== My Table Definitions ===

mydb=# \d data_measurand;
 Table "public.data_measurand"
 Column  |  Type  |
Modifiers  
-++-
 id  | integer| not null default
nextval('data_measurand_id_seq'::regclass)
 entry_id| integer| not null
 sensor_id   | integer| not null
 avg_value   | numeric(10,4)  | 
 avg_count_value | integer| 
 min_value   | numeric(10,4)  | 
 max_value   | numeric(10,4)  | 
 sigma_value | numeric(10,4)  | 
 unit| character varying(20)  | not null
 status  | integer| not null
 comment | character varying(255) | not null
Indexes:
"data_measurand_pkey" PRIMARY KEY, btree (id)
"data_measurand_entry_id_68e2e3fe" UNIQUE, btree (entry_id,
sensor_id)
"data_measurand_avg_count_value" btree (avg_count_value)
"data_measurand_avg_value" btree (avg_value)
"data_measurand_comment" btree (comment)
"data_measurand_entry_id" btree (entry_id)
"data_measurand_max_value" btree (max_value)
"data_measurand_min_value" btree (min_value)
"data_measurand_sensor_id" btree (sensor_id)
"data_measurand_sigma_value" btree (sigma_value)
"data_measurand_status" btree (status)
"data_measurand_unit" btree (unit)
Foreign-key constraints:
"entry_id_refs_id_50fa9bdf" FOREIGN KEY (entry_id) REFERENCES
data_entry(id) DEFERRABLE INITIALLY DEFERRED
"sensor_id_refs_id_5ed84c7c" FOREIGN KEY (sensor_id) REFERENCES
sensor_sensor(id) DEFERRABLE INITIALLY DEFERRED



mydb=# \d data_entry;
   Table "public.data_entry"
  Column  |   Type   |
Modifiers
--+--+-
 id   | integer  | not null default
nextval('data_entry_id_seq'::regclass)
 project_id   | integer  | not null
 logger_id| integer  | not null
 original_file_id | integer  | not null
 datetime | timestamp with time zone | not null
Indexes:
"data_entry_pkey" PRIMARY KEY, btree (id)
"data_entry_logger_id_197f5d41" UNIQUE, btree (logger_id, datetime)
"data_entry_datetime" btree (datetime)
"data_entry_logger_id" btree (logger_id)
"data_entry_original_file_id" btree (original_file_id)
"data_entry_project_id" btree (project_id)
Foreign-key constraints:
"logger_id_refs_id_5f73cf46" FOREIGN KEY (logger_id) REFERENCES
logger_logger(id) DEFERRABLE INITIALLY DEFERRED
"original_file_id_refs_id_44e8d3b1" FOREIGN KEY (original_file_id)
REFERENCES data_originalfile(id) DEFERRABLE INITIALLY DEFERRED
"project_id_refs_id_719fb302" FOREIGN KEY (project_id) REFERENCES
project_project(id) DEFERRABLE INITIALLY DEFERRED



mydb=# \d project_project;
Table "public.project_project"
Column |  Type  |
Modifiers   
---++--
 id| integer| not null default
nextval('project_project_id_seq'::regclass)
 auth_group_id | integer| not null
 name  | character varying(200) | not null
 timezone  | character varying(200) | 
 longitude | double precision   | 
 latitude  | double precision   | 
 altitude  | double precision   | 
 co

Re: [PERFORM] Slow select times on select with xpath

2009-09-22 Thread Dimitri Fontaine
astro77  writes:
> Kevin Grittner wrote:
>> I would try to minimize how many XML values it had to read, parse, and
>> search.  The best approach that comes to mind would be to use tsearch2
>> techniques (with a GIN or GiST index on the tsvector) to identify
>> which rows contain 'fdc3da1f-060f-4c34-9c30-d9334d9272ae', and use AND
>> to combine that with your xpath search.
>
> Thanks Kevin. I thought about using tsearch2 but I need to be able to select
> exact values on other numerical queries and cannot use "contains" queries.
> It's got to be fast so I cannot have lots of records returned and have to do
> secondary processing on the xml for the records which contain the exact
> value I'm looking for. This is one of the reasons I moved from using Lucene
> for searching. I hope this makes sense.

I think he meant something following this skeleton:

  SELECT ...
    FROM ( SELECT ...
             FROM ...
            WHERE /* insert preliminary filtering here */
         )
   WHERE /* insert xpath related filtering here */

Hopefully you have a preliminary filter available that's restrictive
enough that the xpath filtering only has to check a few rows. Kevin
proposes that this preliminary filter be based on Tsearch with an
adequate index (GiST for data that changes a lot, GIN for a fairly
static set).

As you can see, the two-step filtering can be done in a single SQL query.
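
Concretely, the combined query might look like this (a sketch: the
tsvector expression, the 'simple' configuration, and the matching GIN
index are assumptions; the xpath and namespace arrays are taken from the
index attempt posted elsewhere in the thread):

    -- assumed prerequisite, matching the pre-filter expression below:
    -- CREATE INDEX object_fts ON object
    --     USING gin (to_tsvector('simple', serialized_object::text));

    SELECT *
      FROM ( SELECT *
               FROM object
              WHERE to_tsvector('simple', serialized_object::text)
                    @@ plainto_tsquery('simple', 'fdc3da1f-060f-4c34-9c30-d9334d9272ae')
           ) AS pre
     WHERE (xpath('/a:root/a:Identification/b:ObjectId/text()', serialized_object,
                  ARRAY[ARRAY['a', 'http://schemas.datacontract.org/2004/07/Objects'],
                        ARRAY['b', 'http://schemas.datacontract.org/2004/07/Security']])
           )[1]::text = 'fdc3da1f-060f-4c34-9c30-d9334d9272ae';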

Regards,
-- 
dim



Re: [PERFORM] Slow select times on select with xpath

2009-09-21 Thread Ron Mayer
astro77 wrote:
> Thanks Kevin. I thought about using tsearch2 but I need to be able to select
> exact values on other numerical queries and cannot use "contains" queries.

You might be able to make use of a custom parser for tsearch2 that creates
something like a single "word" for xml fragments like <ObjectId>1</ObjectId>,
which would let you quickly find exact matches for those words/phrases.
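
An alternative sketch that avoids a custom parser entirely (an editor's
illustration, not from the thread): index the one extracted value as plain
text, which gives exact-match lookups without any "contains" semantics.

CREATE INDEX CONCURRENTLY idx_object_nodeid_txt ON object
  ( ((xpath('/a:root/a:Identification/b:ObjectId/text()', serialized_object,
            ARRAY[ARRAY['a', 'http://schemas.datacontract.org/2004/07/Objects'],
                  ARRAY['b', 'http://schemas.datacontract.org/2004/07/Security']]))[1]::text) );

-- A query written against exactly the same expression can then use it:
SELECT serialized_object
  FROM object
 WHERE (xpath('/a:root/a:Identification/b:ObjectId/text()', serialized_object,
              ARRAY[ARRAY['a', 'http://schemas.datacontract.org/2004/07/Objects'],
                    ARRAY['b', 'http://schemas.datacontract.org/2004/07/Security']]))[1]::text
       = 'fdc3da1f-060f-4c34-9c30-d9334d9272ae';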

> It's got to be fast so I cannot have lots of records returned and have to do
> secondary processing on the xml for the records which contain the exact
> value I'm looking for. This is one of the reasons I moved from using Lucene
> for searching. I hope this makes sense.
> 
> 
> Kevin Grittner wrote:
>>  wrote:
>>  
>>  
>> I would try to minimize how many XML values it had to read, parse, and
>> search.  The best approach that comes to mind would be to use tsearch2
>> techniques (with a GIN or GiST index on the tsvector) to identify
>> which rows contain 'fdc3da1f-060f-4c34-9c30-d9334d9272ae', and use AND
>> to combine that with your xpath search.
>>  
>> -Kevin
>>
>>
>>
> 


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Slow select times on select with xpath

2009-09-21 Thread astro77

As a follow-up, when I try to create the index like this...

CREATE INDEX concurrently
idx_object_nodeid2
 ON
 object
 USING
 btree(
 xpath('/a:root/a:Identification/b:ObjectId/text()', serialized_object,
 ARRAY
 [
 ARRAY['a', 'http://schemas.datacontract.org/2004/07/Objects'],
 ARRAY['b', 'http://schemas.datacontract.org/2004/07/Security']
 ])
 ) ; 

The index begins to build but fails after about 90 seconds with this error:

ERROR: could not identify a comparison function for type xml
SQL state: 42883
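
That error is expected here: xpath() returns xml[], and type xml has no
comparison operators, so btree has nothing to sort with (which is what SQL
state 42883 is reporting). A sketch of the minimal fix, mirroring the cast
used in the earlier idx_object_nodeid definition:

CREATE INDEX CONCURRENTLY idx_object_nodeid2 ON object
  USING btree(
    cast(xpath('/a:root/a:Identification/b:ObjectId/text()', serialized_object,
               ARRAY[ARRAY['a', 'http://schemas.datacontract.org/2004/07/Objects'],
                     ARRAY['b', 'http://schemas.datacontract.org/2004/07/Security']]) as text[])
  );

-- Note: queries must compare against the same text[] expression for the
-- planner to consider this index at all.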



Robert Haas wrote:
> 
> On Thu, Sep 3, 2009 at 4:06 PM, astro77 wrote:
>> I was receiving an error that an XML field does not support the various
>> indexes available in postgresql.
> 
> Please post what happens when you try.
> 
>> Is there an example of how to do this
>> properly?
> 
> Not sure.
> 
> ...Robert
> 
> -- 
> Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance
> 
> 

-- 
View this message in context: 
http://www.nabble.com/Slow-select-times-on-select-with-xpath-tp25259351p25530455.html
Sent from the PostgreSQL - performance mailing list archive at Nabble.com.


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Slow select times on select with xpath

2009-09-21 Thread astro77

Thanks Kevin. I thought about using tsearch2 but I need to be able to select
exact values on other numerical queries and cannot use "contains" queries.
It's got to be fast so I cannot have lots of records returned and have to do
secondary processing on the xml for the records which contain the exact
value I'm looking for. This is one of the reasons I moved from using Lucene
for searching. I hope this makes sense.


Kevin Grittner wrote:
>  wrote:
>  
>  
> I would try to minimize how many XML values it had to read, parse, and
> search.  The best approach that comes to mind would be to use tsearch2
> techniques (with a GIN or GiST index on the tsvector) to identify
> which rows contain 'fdc3da1f-060f-4c34-9c30-d9334d9272ae', and use AND
> to combine that with your xpath search.
>  
> -Kevin
> 
> 
> 

-- 
View this message in context: 
http://www.nabble.com/Slow-select-times-on-select-with-xpath-tp25259351p25530439.html
Sent from the PostgreSQL - performance mailing list archive at Nabble.com.


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Slow select times on select with xpath

2009-09-21 Thread astro77

CREATE INDEX CONCURRENTLY idx_serializedxml
   ON "object" (serialized_object ASC NULLS LAST);

yields the error:
ERROR: data type xml has no default operator class for access method "btree"

The same error occurs when I try to use the other access methods as well.


On Thu, Sep 3, 2009 at 4:06 PM, astro77 wrote:
> I was receiving an error that an XML field does not support the various
> indexes available in postgresql.

Please post what happens when you try.


-- 
View this message in context: 
http://www.nabble.com/Slow-select-times-on-select-with-xpath-tp25259351p25530433.html
Sent from the PostgreSQL - performance mailing list archive at Nabble.com.


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Slow select times on select with xpath

2009-09-03 Thread Robert Haas
On Thu, Sep 3, 2009 at 4:06 PM, astro77 wrote:
> I was receiving an error that an XML field does not support the various
> indexes available in postgresql.

Please post what happens when you try.

> Is there an example of how to do this
> properly?

Not sure.

...Robert

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Slow select times on select with xpath

2009-09-03 Thread astro77

I was receiving an error that an XML field does not support the various
indexes available in postgresql. Is there an example of how to do this
properly?


Robert Haas wrote:
> 
> On Wed, Sep 2, 2009 at 11:04 AM, astro77 wrote:
>>
>> I've got a table set up with an XML field that I would like to search on
>> with
>> 2.5 million records. The xml are serialized objects from my application
>> which are too complex to break out into separate tables. I'm trying to
>> run a
>> query similar to this:
>>
>>        SELECT  serialized_object as outVal
>>         from object  where
>>        (
>>      
>>  array_to_string(xpath('/a:root/a:Identification/b:ObjectId/text()',
>> serialized_object,
>>             ARRAY
>>             [
>>             ARRAY['a',
>> 'http://schemas.datacontract.org/2004/07/Objects'],
>>             ARRAY['b',
>> 'http://schemas.datacontract.org/2004/07/Security']
>>
>>             ]), ' ') = 'fdc3da1f-060f-4c34-9c30-d9334d9272ae'
>>
>>        )
>>        limit 1000;
>>
>> I've also set up an index on the xpath query like this...
>>
>> CREATE INDEX concurrently
>> idx_object_nodeid
>> ON
>> object
>> USING
>> btree(
>>
>>  cast(xpath('/a:root/a:Identification/b:ObjectId/text()',
>> serialized_object,
>>             ARRAY
>>             [
>>             ARRAY['a',
>> 'http://schemas.datacontract.org/2004/07/Objects'],
>>             ARRAY['b',
>> 'http://schemas.datacontract.org/2004/07/Security']
>>
>>             ])as text[])
>> );
>>
>> The query takes around 30 minutes to complete with or without the index
>> in
>> place and does not cache the query. Additionally the EXPLAIN say that the
>> index is not being used. I've looked everywhere but can't seem to find
>> solid
>> info on how to achieve this. Any ideas would be greatly appreciated.
> 
> Why do you have a cast in the index definition?
> 
> ...Robert
> 
> -- 
> Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance
> 
> 

-- 
View this message in context: 
http://www.nabble.com/Slow-select-times-on-select-with-xpath-tp25259351p25283175.html
Sent from the PostgreSQL - performance mailing list archive at Nabble.com.


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Slow select times on select with xpath

2009-09-03 Thread Robert Haas
On Wed, Sep 2, 2009 at 11:04 AM, astro77 wrote:
>
> I've got a table set up with an XML field that I would like to search on with
> 2.5 million records. The xml are serialized objects from my application
> which are too complex to break out into separate tables. I'm trying to run a
> query similar to this:
>
>        SELECT  serialized_object as outVal
>         from object  where
>        (
>        array_to_string(xpath('/a:root/a:Identification/b:ObjectId/text()',
> serialized_object,
>             ARRAY
>             [
>             ARRAY['a', 'http://schemas.datacontract.org/2004/07/Objects'],
>             ARRAY['b', 'http://schemas.datacontract.org/2004/07/Security']
>
>             ]), ' ') = 'fdc3da1f-060f-4c34-9c30-d9334d9272ae'
>
>        )
>        limit 1000;
>
> I've also set up an index on the xpath query like this...
>
> CREATE INDEX concurrently
> idx_object_nodeid
> ON
> object
> USING
> btree(
>
>  cast(xpath('/a:root/a:Identification/b:ObjectId/text()', serialized_object,
>             ARRAY
>             [
>             ARRAY['a', 'http://schemas.datacontract.org/2004/07/Objects'],
>             ARRAY['b', 'http://schemas.datacontract.org/2004/07/Security']
>
>             ])as text[])
> );
>
> The query takes around 30 minutes to complete with or without the index in
> place and does not cache the query. Additionally the EXPLAIN say that the
> index is not being used. I've looked everywhere but can't seem to find solid
> info on how to achieve this. Any ideas would be greatly appreciated.

Why do you have a cast in the index definition?

...Robert

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Slow select times on select with xpath

2009-09-03 Thread Kevin Grittner
astro77  wrote:
 
> I've got a table set up with an XML field that I would like to search on
> with 2.5 million records. The xml are serialized objects from my
> application which are too complex to break out into separate tables.
> I'm trying to run a query similar to this:
> 
>   SELECT  serialized_object as outVal
>    from object  where
>   (
>   array_to_string(xpath('/a:root/a:Identification/b:ObjectId/text()',
>   serialized_object, 
>    ARRAY
>    [
>    ARRAY['a', 'http://schemas.datacontract.org/2004/07/Objects'],
>    ARRAY['b', 'http://schemas.datacontract.org/2004/07/Security']
> 
>    ]), ' ') = 'fdc3da1f-060f-4c34-9c30-d9334d9272ae'
> 
>   )
>   limit 1000;
 
I would try to minimize how many XML values it had to read, parse, and
search.  The best approach that comes to mind would be to use tsearch2
techniques (with a GIN or GiST index on the tsvector) to identify
which rows contain 'fdc3da1f-060f-4c34-9c30-d9334d9272ae', and use AND
to combine that with your xpath search.
 
-Kevin

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] Slow select times on select with xpath

2009-09-02 Thread astro77

I've got a table set up with an XML field that I would like to search on with
2.5 million records. The xml are serialized objects from my application
which are too complex to break out into separate tables. I'm trying to run a
query similar to this:

SELECT  serialized_object as outVal
 from object  where
(
array_to_string(xpath('/a:root/a:Identification/b:ObjectId/text()',
serialized_object, 
 ARRAY
 [
 ARRAY['a', 'http://schemas.datacontract.org/2004/07/Objects'],
 ARRAY['b', 'http://schemas.datacontract.org/2004/07/Security']
 
 ]), ' ') = 'fdc3da1f-060f-4c34-9c30-d9334d9272ae'

)
limit 1000;

I've also set up an index on the xpath query like this...

CREATE INDEX concurrently
idx_object_nodeid
ON
object
USING
btree(

 cast(xpath('/a:root/a:Identification/b:ObjectId/text()', serialized_object, 
 ARRAY
 [
 ARRAY['a', 'http://schemas.datacontract.org/2004/07/Objects'],
 ARRAY['b', 'http://schemas.datacontract.org/2004/07/Security']
 
 ])as text[])
);

The query takes around 30 minutes to complete with or without the index in
place and does not cache the query. Additionally the EXPLAIN say that the
index is not being used. I've looked everywhere but can't seem to find solid
info on how to achieve this. Any ideas would be greatly appreciated.
-- 
View this message in context: 
http://www.nabble.com/Slow-select-times-on-select-with-xpath-tp25259351p25259351.html
Sent from the PostgreSQL - performance mailing list archive at Nabble.com.


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Slow select performance despite seemingly reasonable query plan

2009-05-09 Thread Laurent Wandrebeck
2009/5/7 David Brain :
> Hi,
Hi,
>
> Some answers in-line:
>
>>
>> Has there been a performance *change*, or are you just concerned about a
>> query which doesn't seem to use "enough" disc bandwidth?
>
> Performance has degraded noticeably over the past few days.
>
>> Certainly random access like this index scan can be extremely slow. 2-4 MB/s
>> is quite reasonable if you're fetching one 8kB block per disc seek - no more
>> than 200 per second.
>
> We have read ahead set pretty aggressively high as the SAN seems to
> 'like' this, given some testing we did:
>
> /sbin/blockdev --getra /dev/sdb
> 16384
>
>
>> One concern I might have with a big setup like that is how big the database
>> directory has got, and whether directory lookups are taking time. Check to
>> see if you have the directory_index option enabled on your ext3 filesystem.
>>
>
> That's a thought, I doubt the option is set (I didn't set it and I
> don't _think_ rhel does by default), however the 'base' directory only
> contains ~5500 items total, so it's not getting too out of hand.
default RHEL ext3 options (in 4.x and 5.x) are:
Filesystem features:  has_journal ext_attr resize_inode dir_index
filetype needs_recovery sparse_super large_file
See tune2fs -l /dev/sdXY
Laurent.

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Slow select performance despite seemingly reasonable query plan

2009-05-07 Thread Nikolas Everett
On Thu, May 7, 2009 at 11:19 AM, Matthew Wakeling wrote:

> On Thu, 7 May 2009, David Brain wrote:
>
>> Certainly random access like this index scan can be extremely slow. 2-4
>>> MB/s
>>> is quite reasonable if you're fetching one 8kB block per disc seek - no
>>> more
>>> than 200 per second.
>>>
>>
>> We have read ahead set pretty aggressively high as the SAN seems to
>> 'like' this, given some testing we did:
>>
>> /sbin/blockdev --getra /dev/sdb
>> 16384
>>
>
> Read-ahead won't really help with completely random access.


That's a shame because it would be really nice to get the entire index into
shared memory or OS cache.  Most of the time queries are on data in the past
few months.  All of the indexes in the past few months should fit in cache.

Did something happen to get those indexes flushed from the cache?  Were they
in the cache before?


> I think a much more interesting line of enquiry will be trying to work out
> what has changed, and why it was fast before.
>
> How much of the data you're accessing are you expecting to be in the OS
> cache?
>
> Is the table you're index scanning on ordered at all? Could that have
> changed recently?


I wrote the application that puts data in that table.  It's sort of ordered
by that timestamp.  Every five minutes it adds rows in no particular order
that need to be added.  The rows that need to be added every five minutes
are ordered by another timestamp that is correlated to but not the same as
the indexed timestamp.


>
>
>  That's a thought, I doubt the option is set (I didn't set it and I
>> don't _think_ rhel does by default), however the 'base' directory only
>> contains ~5500 items total, so it's not getting too out of hand.
>>
>
> I think quite a few systems do set it by default now.
>
> Matthew
>
> --
> Me... a skeptic?  I trust you have proof?
>
>
> --
> Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-performance
>


Re: [PERFORM] Slow select performance despite seemingly reasonable query plan

2009-05-07 Thread Matthew Wakeling

On Thu, 7 May 2009, David Brain wrote:

Certainly random access like this index scan can be extremely slow. 2-4 MB/s
is quite reasonable if you're fetching one 8kB block per disc seek - no more
than 200 per second.


We have read ahead set pretty aggressively high as the SAN seems to
'like' this, given some testing we did:

/sbin/blockdev --getra /dev/sdb
16384


Read-ahead won't really help with completely random access.

I think a much more interesting line of enquiry will be trying to work out 
what has changed, and why it was fast before.


How much of the data you're accessing are you expecting to be in the OS 
cache?


Is the table you're index scanning on ordered at all? Could that have 
changed recently?



That's a thought, I doubt the option is set (I didn't set it and I
don't _think_ rhel does by default), however the 'base' directory only
contains ~5500 items total, so it's not getting too out of hand.


I think quite a few systems do set it by default now.

Matthew

--
Me... a skeptic?  I trust you have proof?

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Slow select performance despite seemingly reasonable query plan

2009-05-07 Thread Nikolas Everett
>
> Nested Loop Left Join  (cost=0.00..6462463.96 rows=1894 width=110)
>   ->  Append  (cost=0.00..6453365.66 rows=1894 width=118)
> ->  Seq Scan on datatable sum  (cost=0.00..10.75 rows=1 width=118)
>   Filter: ((datapointdate >= '2009-04-01
> 00:00:00'::timestamp without time zone) AND (datapointdate <=
> '2009-04-30 23:59:59'::timestamp without time zone) AND
> ((customerid)::text = ''::text) AND (NOT CASE WHEN (NOT obsolete)
> THEN false ELSE CASE WHEN (obsoletereasonid IS NULL) THEN true WHEN
> (obsoletereasonid = 1) THEN true WHEN (obsoletereasonid = 2) THEN true
> WHEN (cdrdatasourceid = 1) THEN false ELSE true END END))
> ->  Index Scan using
> datatable_20090328_customeriddatapointdate_idx on datatable_20090328
> sum  (cost=0.00..542433.51 rows=180 width=49)
>   Index Cond: ((datapointdate >= '2009-04-01
> 00:00:00'::timestamp without time zone) AND (datapointdate <=
> '2009-04-30 23:59:59'::timestamp without time zone) AND
> ((customerid)::text = ''::text))
>   Filter: (NOT CASE WHEN (NOT obsolete) THEN false ELSE
> CASE WHEN (obsoletereasonid IS NULL) THEN true WHEN (obsoletereasonid
> = 1) THEN true WHEN (obsoletereasonid = 2) THEN true WHEN
> (cdrdatasourceid = 1) THEN false ELSE true END END)
> ->  Index Scan using
> datatable_20090404_customeriddatapointdate_idx on datatable_20090404
> sum  (cost=0.00..1322098.74 rows=405 width=48)
>   Index Cond: ((datapointdate >= '2009-04-01
> 00:00:00'::timestamp without time zone) AND (datapointdate <=
> '2009-04-30 23:59:59'::timestamp without time zone) AND
> ((customerid)::text = ''::text))
>   Filter: (NOT CASE WHEN (NOT obsolete) THEN false ELSE
> CASE WHEN (obsoletereasonid IS NULL) THEN true WHEN (obsoletereasonid
> = 1) THEN true WHEN (obsoletereasonid = 2) THEN true WHEN
> (cdrdatasourceid = 1) THEN false ELSE true END END)
> ->  Index Scan using
> datatable_20090411_customeriddatapointdate_idx on datatable_20090411
> sum  (cost=0.00..1612744.29 rows=450 width=48)
>   Index Cond: ((datapointdate >= '2009-04-01
> 00:00:00'::timestamp without time zone) AND (datapointdate <=
> '2009-04-30 23:59:59'::timestamp without time zone) AND
> ((customerid)::text = ''::text))
>   Filter: (NOT CASE WHEN (NOT obsolete) THEN false ELSE
> CASE WHEN (obsoletereasonid IS NULL) THEN true WHEN (obsoletereasonid
> = 1) THEN true WHEN (obsoletereasonid = 2) THEN true WHEN
> (cdrdatasourceid = 1) THEN false ELSE true END END)
> ->  Index Scan using
> datatable_20090418_customeriddatapointdate_idx on datatable_20090418
> sum  (cost=0.00..1641913.58 rows=469 width=49)
>   Index Cond: ((datapointdate >= '2009-04-01
> 00:00:00'::timestamp without time zone) AND (datapointdate <=
> '2009-04-30 23:59:59'::timestamp without time zone) AND
> ((customerid)::text = ''::text))
>   Filter: (NOT CASE WHEN (NOT obsolete) THEN false ELSE
> CASE WHEN (obsoletereasonid IS NULL) THEN true WHEN (obsoletereasonid
> = 1) THEN true WHEN (obsoletereasonid = 2) THEN true WHEN
> (cdrdatasourceid = 1) THEN false ELSE true END END)
> ->  Index Scan using
> datatable_20090425_customeriddatapointdate_idx on datatable_20090425
> sum  (cost=0.00..1334164.80 rows=389 width=49)
>   Index Cond: ((datapointdate >= '2009-04-01
> 00:00:00'::timestamp without time zone) AND (datapointdate <=
> '2009-04-30 23:59:59'::timestamp without time zone) AND
> ((customerid)::text = ''::text))
>   Filter: (NOT CASE WHEN (NOT obsolete) THEN false ELSE
> CASE WHEN (obsoletereasonid IS NULL) THEN true WHEN (obsoletereasonid
> = 1) THEN true WHEN (obsoletereasonid = 2) THEN true WHEN
> (cdrdatasourceid = 1) THEN false ELSE true END END)
>   ->  Index Scan using pk_cdrextension on cdrextension ext
> (cost=0.00..4.77 rows=1 width=8)
> Index Cond: (sum.id = ext.datatableid)
>
>
Something doesn't look right.  Why is it doing an index scan on
datatable_20090404 when the constraint for that table puts it entirely within
the date range?  Shouldn't it just seq scan the partition or use the
partition's customerid index?


Re: [PERFORM] Slow select performance despite seemingly reasonable query plan

2009-05-07 Thread David Brain
Hi,

Some answers in-line:

>
> Has there been a performance *change*, or are you just concerned about a
> query which doesn't seem to use "enough" disc bandwidth?

Performance has degraded noticeably over the past few days.

> Certainly random access like this index scan can be extremely slow. 2-4 MB/s
> is quite reasonable if you're fetching one 8kB block per disc seek - no more
> than 200 per second.

We have read ahead set pretty aggressively high as the SAN seems to
'like' this, given some testing we did:

/sbin/blockdev --getra /dev/sdb
16384


> One concern I might have with a big setup like that is how big the database
> directory has got, and whether directory lookups are taking time. Check to
> see if you have the directory_index option enabled on your ext3 filesystem.
>

That's a thought, I doubt the option is set (I didn't set it and I
don't _think_ rhel does by default), however the 'base' directory only
contains ~5500 items total, so it's not getting too out of hand.

David

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Slow select performance despite seemingly reasonable query plan

2009-05-07 Thread Matthew Wakeling

On Thu, 7 May 2009, David Brain wrote:

This has been working reasonably well, however in the last few days
I've been seeing extremely slow performance on what are essentially
fairly simple 'index hitting' selects on this data.  From the host
side I see that the postgres query process is mostly in IO wait,
however there is very little data actually being transferred (maybe
2-4 MB/s) - when a different query (say a select count(*) form
datatable) will yield a sustained 150+ MB/s.


Has there been a performance *change*, or are you just concerned about a 
query which doesn't seem to use "enough" disc bandwidth?



1. Problem with random versus sequential reads on storage system.


Certainly random access like this index scan can be extremely slow. 2-4 
MB/s is quite reasonable if you're fetching one 8kB block per disc seek - 
no more than 200 per second.



3. Problem with the host environment - one suspicion I have here is
that we are >90% full on the storage drives (ext3), I'm not sure if
that is impacting performance.


One concern I might have with a big setup like that is how big the 
database directory has got, and whether directory lookups are taking time. 
Check to see if you have the directory_index option enabled on your ext3 
filesystem.


Matthew

--
The third years are wandering about all worried at the moment because they
have to hand in their final projects. Please be sympathetic to them, say
things like "ha-ha-ha", but in a sympathetic tone of voice 
   -- Computer Science Lecturer


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Slow select performance despite seemingly reasonable query plan

2009-05-07 Thread David Brain
Hi,

Interesting, for one index on one partition:

idx_scan: 329
idx_tup_fetch: 8905730

So maybe a reindex would help?

David.

On Thu, May 7, 2009 at 10:26 AM, Scott Mead
 wrote:
> On Thu, May 7, 2009 at 10:14 AM, David Brain  wrote:
>>
>> Hi,
>>
>> Some context, we have a _lot_ of data, > 1TB, mostly in 1 'table' -
>> the 'datatable' in the example below although in order to improve
>> performance this table is partitioned (by date range) into a number of
>> partition tables.  Each partition contains up to 20GB of data (tens of
>> millons of rows), with an additional ~3GB of indexes, all this is
>> served off a fairly high performance server (8 core 32Gb, with FC
>> attached SAN storage).  PostgreSQL version is 8.3.5 (running on 64bit
>> RHEL 5.2)
>>
>> This has been working reasonably well, however in the last few days
>> I've been seeing extremely slow performance on what are essentially
>> fairly simple 'index hitting' selects on this data.
>
>    Have you re-indexed any of your partitioned tables?  If your index is
> fragmented, you'll be incurring extra I/Os per index access.  Take a look
> at the pgstattuple contrib for some functions to determine index
> fragmentation.  You can also take a look at the pg_stat_all_indexes tables.
> If your number of tuples fetched is 100x more than your idx_scans, you *may*
> consider reindexing.
>
> --Scott
>
>



-- 
David Brain
dbr...@bandwidth.com
919.297.1078

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Slow select performance despite seemingly reasonable query plan

2009-05-07 Thread Scott Mead
On Thu, May 7, 2009 at 10:14 AM, David Brain  wrote:

> Hi,
>
> Some context, we have a _lot_ of data, > 1TB, mostly in 1 'table' -
> the 'datatable' in the example below although in order to improve
> performance this table is partitioned (by date range) into a number of
> partition tables.  Each partition contains up to 20GB of data (tens of
> millons of rows), with an additional ~3GB of indexes, all this is
> served off a fairly high performance server (8 core 32Gb, with FC
> attached SAN storage).  PostgreSQL version is 8.3.5 (running on 64bit
> RHEL 5.2)
>
> This has been working reasonably well, however in the last few days
> I've been seeing extremely slow performance on what are essentially
> fairly simple 'index hitting' selects on this data.


   Have you re-indexed any of your partitioned tables?  If your index is
fragmented, you'll be incurring extra I/Os per index access.  Take a look
at the pgstattuple contrib for some functions to determine index
fragmentation.  You can also take a look at the pg_stat_all_indexes tables.
If your number of tuples fetched is 100x more than your idx_scans, you *may*
consider reindexing.

--Scott
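
As a sketch, the two checks Scott describes might look like this (the
partition and index names are taken from the plan earlier in the thread;
pgstatindex() comes from the pgstattuple contrib module):

-- Tuples fetched versus index scans, per index:
SELECT schemaname, relname, indexrelname, idx_scan, idx_tup_fetch
  FROM pg_stat_all_indexes
 WHERE relname LIKE 'datatable_%'
 ORDER BY idx_tup_fetch DESC;

-- Btree fragmentation details for one partition's index:
SELECT * FROM pgstatindex('datatable_20090404_customeriddatapointdate_idx');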


[PERFORM] Slow select performance despite seemingly reasonable query plan

2009-05-07 Thread David Brain
Hi,

Some context, we have a _lot_ of data, > 1TB, mostly in 1 'table' -
the 'datatable' in the example below although in order to improve
performance this table is partitioned (by date range) into a number of
partition tables.  Each partition contains up to 20GB of data (tens of
millons of rows), with an additional ~3GB of indexes, all this is
served off a fairly high performance server (8 core 32Gb, with FC
attached SAN storage).  PostgreSQL version is 8.3.5 (running on 64bit
RHEL 5.2)

This has been working reasonably well, however in the last few days
I've been seeing extremely slow performance on what are essentially
fairly simple 'index hitting' selects on this data.  From the host
side I see that the postgres query process is mostly in IO wait,
however there is very little data actually being transferred (maybe
2-4 MB/s) - when a different query (say a select count(*) form
datatable) will yield a sustained 150+ MB/s.  There have been no
configuration changes during this time, although of course the
database has grown as data is added on a daily basis.

I'm not sure of the best way to diagnose this issue - the possible
causes  I can think of are:

1. Problem with random versus sequential reads on storage system.
2. 'Something' with PostgreSQL itself.
3. Problem with the host environment - one suspicion I have here is
that we are >90% full on the storage drives (ext3), I'm not sure if
that is impacting performance.

Any thoughts as to how to procede from here would be very welcome.

Here is an example query plan - looks reasonable to me, seems is
making use of the indexes and the constraint exclusion on the
partition tables:

Nested Loop Left Join  (cost=0.00..6462463.96 rows=1894 width=110)
   ->  Append  (cost=0.00..6453365.66 rows=1894 width=118)
 ->  Seq Scan on datatable sum  (cost=0.00..10.75 rows=1 width=118)
   Filter: ((datapointdate >= '2009-04-01
00:00:00'::timestamp without time zone) AND (datapointdate <=
'2009-04-30 23:59:59'::timestamp without time zone) AND
((customerid)::text = ''::text) AND (NOT CASE WHEN (NOT obsolete)
THEN false ELSE CASE WHEN (obsoletereasonid IS NULL) THEN true WHEN
(obsoletereasonid = 1) THEN true WHEN (obsoletereasonid = 2) THEN true
WHEN (cdrdatasourceid = 1) THEN false ELSE true END END))
 ->  Index Scan using
datatable_20090328_customeriddatapointdate_idx on datatable_20090328
sum  (cost=0.00..542433.51 rows=180 width=49)
   Index Cond: ((datapointdate >= '2009-04-01
00:00:00'::timestamp without time zone) AND (datapointdate <=
'2009-04-30 23:59:59'::timestamp without time zone) AND
((customerid)::text = ''::text))
   Filter: (NOT CASE WHEN (NOT obsolete) THEN false ELSE
CASE WHEN (obsoletereasonid IS NULL) THEN true WHEN (obsoletereasonid
= 1) THEN true WHEN (obsoletereasonid = 2) THEN true WHEN
(cdrdatasourceid = 1) THEN false ELSE true END END)
 ->  Index Scan using
datatable_20090404_customeriddatapointdate_idx on datatable_20090404
sum  (cost=0.00..1322098.74 rows=405 width=48)
   Index Cond: ((datapointdate >= '2009-04-01
00:00:00'::timestamp without time zone) AND (datapointdate <=
'2009-04-30 23:59:59'::timestamp without time zone) AND
((customerid)::text = ''::text))
   Filter: (NOT CASE WHEN (NOT obsolete) THEN false ELSE
CASE WHEN (obsoletereasonid IS NULL) THEN true WHEN (obsoletereasonid
= 1) THEN true WHEN (obsoletereasonid = 2) THEN true WHEN
(cdrdatasourceid = 1) THEN false ELSE true END END)
 ->  Index Scan using
datatable_20090411_customeriddatapointdate_idx on datatable_20090411
sum  (cost=0.00..1612744.29 rows=450 width=48)
   Index Cond: ((datapointdate >= '2009-04-01
00:00:00'::timestamp without time zone) AND (datapointdate <=
'2009-04-30 23:59:59'::timestamp without time zone) AND
((customerid)::text = ''::text))
   Filter: (NOT CASE WHEN (NOT obsolete) THEN false ELSE
CASE WHEN (obsoletereasonid IS NULL) THEN true WHEN (obsoletereasonid
= 1) THEN true WHEN (obsoletereasonid = 2) THEN true WHEN
(cdrdatasourceid = 1) THEN false ELSE true END END)
 ->  Index Scan using
datatable_20090418_customeriddatapointdate_idx on datatable_20090418
sum  (cost=0.00..1641913.58 rows=469 width=49)
   Index Cond: ((datapointdate >= '2009-04-01
00:00:00'::timestamp without time zone) AND (datapointdate <=
'2009-04-30 23:59:59'::timestamp without time zone) AND
((customerid)::text = ''::text))
   Filter: (NOT CASE WHEN (NOT obsolete) THEN false ELSE
CASE WHEN (obsoletereasonid IS NULL) THEN true WHEN (obsoletereasonid
= 1) THEN true WHEN (obsoletereasonid = 2) THEN true WHEN
(cdrdatasourceid = 1) THEN false ELSE true END END)
 ->  Index Scan using
datatable_20090425_customeriddatapointdate_idx on datatable_20090425
sum  (cost=0.00..1334164.80 rows=389 width=49)
   Index Cond: ((datapointdate >= '2009-04-01
00:00:00'::timestamp without time zone) AND (datapointdate <=
'2

Re: [PERFORM] Slow SELECT on three or more clients

2006-11-16 Thread AMIR FRANCO D. JOVEN

Hi Markus,

Thank you very much for the information.

I was able to make it fast by correcting the indices: I created indexes on
the frequently filtered fields.
It now runs at 0.05 seconds on average, much faster than the 0.86 seconds before.

I will also upgrade to 8.1.5.

Once again, thank you very much. it helped me a lot.

Amir

On 11/15/06, Markus Schaber <[EMAIL PROTECTED]> wrote:


Hi, Amir,

AMIR FRANCO D. JOVEN wrote:

> My current project uses PostgreSQL 7.3.4.

By all means, please upgrade.

The newest 7.3 series version is 7.3.16, which fixes lots of critical
bugs, and can be used as a drop-in replacement for 7.3.4 (see Release
Notes at http://www.postgresql.org/docs/7.3/interactive/release.html )

The newest stable release is 8.1.5, and 8.2 is just on the roads...

> I have a table with 94 fields and a select with only one resultset in
> only one client consumes about 0.86 seconds.

"with only on resultset"?

You mean "with only one returned row", I presume.

Each SELECT has exactly one resultset, which can contain zero to many
rows.

Please check the following:

- Did you create the appropriate indices?

- Version 7.3.X may suffer from index bloat, so REINDEX might help.

- Did you VACUUM and ANALYZE the table properly?

- Is your free space map setting, the statistics targets, and other
config options tuned to fit your environment?

- Maybe a VACUUM FULL or a CLUSTER command may help you.

> for a single select with one field in one resultset, is 0.86 seconds
> normal?

That depends on the circumstances.

Markus

--
Markus Schaber | Logical Tracking&Tracing International AG
Dipl. Inf. | Software Development GIS

Fight against software patents in Europe! www.ffii.org
www.nosoftwarepatents.org





--
AMIR FRANCO D. JOVEN
Software Engineer
DIGI Software (PHILS.) Inc.


Re: [PERFORM] Slow SELECT on three or more clients

2006-11-15 Thread Markus Schaber
Hi, Amir,

AMIR FRANCO D. JOVEN wrote:

> My current project uses PostgreSQL 7.3.4.

By all means, please upgrade.

The newest 7.3 series version is 7.3.16, which fixes lots of critical
bugs, and can be used as a drop-in replacement for 7.3.4 (see Release
Notes at http://www.postgresql.org/docs/7.3/interactive/release.html )

The newest stable release is 8.1.5, and 8.2 is just on the roads...

> I have a table with 94 fields and a select with only one resultset in
> only one client consumes about 0.86 seconds.

"with only on resultset"?

You mean "with only one returned row", I presume.

Each SELECT has exactly one resultset, which can contain zero to many rows.

Please check the following:

- Did you create the appropriate indices?

- Version 7.3.X may suffer from index bloat, so REINDEX might help.

- Did you VACUUM and ANALYZE the table properly?

- Is your free space map setting, the statistics targets, and other
config options tuned to fit your environment?

- Maybe a VACUUM FULL or a CLUSTER command may help you.
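
In SQL, that checklist boils down to something like the sketch below
(table and index names are placeholders, and the values of any config
settings still need tuning by hand):

VACUUM ANALYZE mytable;           -- refresh statistics, reclaim free space
REINDEX TABLE mytable;            -- counters 7.3-era index bloat
-- and, if the table itself is badly bloated:
VACUUM FULL mytable;
CLUSTER mytable_pkey ON mytable;  -- 7.3 syntax: CLUSTER indexname ON tablename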

> for a single select with one field in one resultset, is 0.86 seconds normal?

That depends on the circumstances.

Markus

-- 
Markus Schaber | Logical Tracking&Tracing International AG
Dipl. Inf. | Software Development GIS

Fight against software patents in Europe! www.ffii.org
www.nosoftwarepatents.org

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PERFORM] Slow SELECT on three or more clients

2006-11-15 Thread Merlin Moncure

On 11/15/06, AMIR FRANCO D. JOVEN <[EMAIL PROTECTED]> wrote:

Hi!

 I'm new to PostgreSQL.

 My current project uses PostgreSQL 7.3.4.

 the problem is like this:

 I have a table with 94 fields and a select with only one resultset in only
one client consumes about 0.86 seconds.
 The client executes three 'select' statements to perform the task which
consumes 2.58 seconds.
 With only one client this is acceptable, but the real problem is as I add
more clients, it gets slower and slower.

 for a single select with one field in one resultset, is 0.86 seconds
normal?

 I tried vacuuming and reindexing but to no avail.
 the total record count in that particular table is 456,541.


returning 450k rows in around 1 second is about right for a result set
with one field.  imo, your best bet is to try and break up your table
and reorganize it so you don't have to query the whole thing every
time.  why do you need to return all the rows over and over?

merlin

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

   http://www.postgresql.org/about/donate


Re: [PERFORM] Slow SELECT on three or more clients

2006-11-15 Thread Gregory S. Williamson
Operating system and some of the basic PostgreSQL config settings would be 
helpful, plus any info you have on your disks, the size of the relevant tables, 
their structure and indexes & vacuum/analyze status ... plus what others have 
said:

Upgrade!

There are considerable improvements in, well, *everything*, since 7.3 (we 
have some databases at 7.4.x and I consider them out-of-date). Hopefully this 
list can provide help to get you through whatever your immediate crisis is, but 
do consider planning for this as soon as time and resources permit.

Data integrity is a _good_ thing!

Greg Williamson
DBA
GlobeXplorer LLC


-Original Message-
From:   [EMAIL PROTECTED] on behalf of Russell Smith
Sent:   Wed 11/15/2006 5:31 AM
To: AMIR FRANCO D. JOVEN
Cc: pgsql-performance@postgresql.org
Subject:Re: [PERFORM] Slow SELECT on three or more clients

AMIR FRANCO D. JOVEN wrote:
> Hi!
>
> I'm new to PostgreSQL.
>
> My current project uses PostgreSQL 7.3.4.
Upgrading your version of PostgreSQL to 8.1 will give you significant 
performance benefits.
>
> the problem is like this:
>
> I have a table with 94 fields and a select with only one resultset in 
> only one client consumes about 0.86 seconds.
> The client executes three 'select' statements to perform the task 
> which consumes 2.58 seconds.
> With only one client this is acceptable, but the real problem is as I 
> add more clients, it gets slower and slower.
>
> for a single select with one field in one resultset, is 0.86 seconds 
> normal?
You will need to attach the query.
EXPLAIN ANALYZE SELECT ...

where SELECT ... is your query.  That will help us work out what the 
problem is. 

0.86 seconds might be slow for a query that returns 1 row; it might be 
fast for a query that returns a large set with complex joins and where 
conditions.  Fast and slow are not objective terms.  They are very 
dependent on the query.

>
> I tried vacuuming and reindexing but to no avail.
> the total record count in that particular table is 456,541.
>
456,541 is not all that many records.  But again you will need to post 
more information for us to be able to assist.
> Thanks in advance.
>


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings



---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] Slow SELECT on three or more clients

2006-11-15 Thread Russell Smith

AMIR FRANCO D. JOVEN wrote:

Hi!

I'm new to PostgreSQL.

My current project uses PostgreSQL 7.3.4.
Upgrading your version of PostgreSQL to 8.1 will give you significant 
performance benefits.


the problem is like this:

I have a table with 94 fields and a select with only one resultset in 
only one client consumes about 0.86 seconds.
The client executes three 'select' statements to perform the task 
which consumes 2.58 seconds.
With only one client this is acceptable, but the real problem is as I 
add more clients, it gets slower and slower.


for a single select with one field in one resultset, is 0.86 seconds 
normal?

You will need to attach the query.
EXPLAIN ANALYZE SELECT ...

where SELECT ... is your query.  That will help us work out what the 
problem is. 

0.86 seconds might be slow for a query that returns 1 row; it might be 
fast for a query that returns a large set with complex joins and where 
conditions.  Fast and slow are not objective terms.  They are very 
dependent on the query.




I tried vacuuming and reindexing but to no avail.
the total record count in that particular table is 456,541.

456,541 is not all that many records.  But again you will need to post 
more information for us to be able to assist.

Thanks in advance.




---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] Slow SELECT on three or more clients

2006-11-15 Thread Andreas Kostyrka
* AMIR FRANCO D. JOVEN <[EMAIL PROTECTED]> [061115 12:44]:
> Hi!
> 
> I'm new to PostgreSQL.
> 
> My current project uses PostgreSQL 7.3.4.
Ancient. Upgrade it, especially if it's a new database.

> 
> the problem is like this:
> 
> I have a table with 94 fields and a select with only one resultset in only
> one client consumes about 0.86 seconds.
> The client executes three 'select' statements to perform the task which
> consumes 2.58 seconds.
> With only one client this is acceptable, but the real problem is as I add
> more clients, it gets slower and slower.
That depends upon:
a) your table schema.
b) the data in the tables. E.g. how big are rows, how many rows.
c) the size of the result sets.
d) your indexes?

Andreas

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


[PERFORM] Slow SELECT on three or more clients

2006-11-15 Thread AMIR FRANCO D. JOVEN
Hi!

I'm new to PostgreSQL.

My current project uses PostgreSQL 7.3.4.

the problem is like this:

I have a table with 94 fields and a select with only one resultset in only one client consumes about 0.86 seconds.
The client executes three 'select' statements to perform the task which consumes 2.58 seconds.
With only one client this is acceptable, but the real problem is as I add more clients, it gets slower and slower.

for a single select with one field in one resultset, is 0.86 seconds normal?

I tried vacuuming and reindexing but to no avail.
the total record count in that particular table is 456,541.

Thanks in advance.



Re: [PERFORM] Slow select, insert, update

2004-08-10 Thread Litao Wu
Does that mean reindex is not needed
for PG version 7.4?

In what kinds of situations under PG 7.4
is reindex worthwhile?

Thanks,
 

Here is doc from 7.3:
PostgreSQL is unable to reuse B-tree index pages in
certain cases. The problem is that if indexed rows are
deleted, those index pages can only be reused by rows
with similar values. For example, if indexed rows are
deleted and newly inserted/updated rows have much
higher values, the new rows can't use the index space
made available by the deleted rows. Instead, such new
rows must be placed on new index pages. In such cases,
disk space used by the index will grow indefinitely,
even if VACUUM is run frequently. 

As a solution, you can use the REINDEX command
periodically to discard pages used by deleted rows.
There is also contrib/reindexdb which can reindex an
entire database. 

The counterpart of 7.4 is:
In some situations it is worthwhile to rebuild indexes
periodically with the REINDEX command. (There is also
contrib/reindexdb which can reindex an entire
database.) However, PostgreSQL 7.4 has substantially
reduced the need for this activity compared to earlier
releases. 
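
For reference, the commands themselves are simple (object names below are
placeholders):

REINDEX INDEX my_index;   -- rebuild one index
REINDEX TABLE my_table;   -- rebuild every index on a table
-- contrib/reindexdb wraps the same thing for an entire database.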


--- Doug McNaught <[EMAIL PROTECTED]> wrote:

> Paul Langard <[EMAIL PROTECTED]> writes:
> 
> > Having trouble with one table (see time to count
> records below!).
> >
> > Fairly new to postgres so any help much
> appreciated.
> >
> > It only contains 9,106 records - as you can see
> from:
> >
> >
> > select count(id) from project
> >
> > count
> > 9106
> > 1 row(s)
> > Total runtime: 45,778.813 ms
> 
> > ... the database is regularly vacuumed.
> 
> Hmm.  You might try a VACUUM FULL and a REINDEX on
> the table (you
> don't say what version you are running--REINDEX is
> sometimes needed on
> 7.3 and below).
> 
> Also, use EXPLAIN ANALYZE on your query and post the
> result--that's
> helpful diagnostic information.
> 
> -Doug
> -- 
> Let us cross over the river, and rest under the
> shade of the trees.
>--T. J. Jackson, 1863
> 
> 
> ---(end of
> broadcast)---
> TIP 8: explain analyze is your friend
> 





---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [PERFORM] Slow select, insert, update

2004-08-10 Thread Tom Lane
Paul Langard <[EMAIL PROTECTED]> writes:
> select count(id) from project
> count
> 9106
> 1 row(s)
> Total runtime: 45,778.813 ms

Yipes.  The only explanation I can think of is tremendous table bloat.
What do you get from "vacuum verbose project" --- in particular, how
many pages in the table?

> ... the database is regularly vacuumed.

Not regularly enough, perhaps ... or else you need to increase the free
space map size parameters.  In any case you'll probably need to do one
round of "vacuum full" to get this table back within bounds.

regards, tom lane

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [PERFORM] Slow select, insert, update

2004-08-10 Thread Bill Montgomery
Paul,
Paul Langard wrote:
Having trouble with one table (see time to count records below!).
Fairly new to postgres so any help much appreciated.
It only contains 9,106 records - as you can see from:
select count(id) from project
*count
*9106
1 row(s)
Total runtime: 45,778.813 ms

... the database is regularly vacuumed. 

Have you tried doing a VACUUM FULL, CLUSTER, or drop/restore on the 
table? This sounds symptomatic of a table with a bunch of dead tuples 
not in the FSM (free space map). Only tuples in the FSM are reclaimed by 
a regular VACUUM. If your FSM parameters in postgresql.conf are not big 
enough for your ratio of UPDATE/DELETE operations to VACUUM frequency, 
you will end up with dead tuples that will only be reclaimed by a VACUUM 
FULL.

To prevent this problem in the future, look at increasing your FSM size 
and possibly vacuuming more frequently or using pg_autovacuum.

Good Luck,
Bill Montgomery
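
A sketch of that recover-then-prevent sequence for the table in this thread
(the FSM values are illustrative only, not recommendations):

VACUUM FULL VERBOSE project;  -- reclaim the dead tuples the FSM missed
REINDEX TABLE project;        -- shrink the index bloat left behind
-- then, in postgresql.conf, size the free space map to match the churn:
--   max_fsm_pages = 200000
--   max_fsm_relations = 1000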
---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?
  http://www.postgresql.org/docs/faqs/FAQ.html


Re: [PERFORM] Slow select, insert, update

2004-08-10 Thread Doug McNaught
Paul Langard <[EMAIL PROTECTED]> writes:

> Having trouble with one table (see time to count records below!).
>
> Fairly new to postgres so any help much appreciated.
>
> It only contains 9,106 records - as you can see from:
>
>
> select count(id) from project
>
> count
> 9106
> 1 row(s)
> Total runtime: 45,778.813 ms

> ... the database is regularly vacuumed.

Hmm.  You might try a VACUUM FULL and a REINDEX on the table (you
don't say what version you are running--REINDEX is sometimes needed on
7.3 and below).

Also, use EXPLAIN ANALYZE on your query and post the result--that's
helpful diagnostic information.

-Doug
-- 
Let us cross over the river, and rest under the shade of the trees.
   --T. J. Jackson, 1863


---(end of broadcast)---
TIP 8: explain analyze is your friend


[PERFORM] Slow select, insert, update

2004-08-10 Thread Paul Langard
Having trouble with one table (see time to count records below!).

Fairly new to postgres so any help much appreciated.

It only contains 9,106 records - as you can see from:


select count(id) from project

count
9106
1 row(s)
Total runtime: 45,778.813 ms


There are only 3 fields:

id
integer
nextval('id'::text)

projectnumber
text

description
text


There is one index:

id_project_ukey
CREATE UNIQUE INDEX id_project_ukey ON project USING btree (id)

... the database is regularly vacuumed.

Re: [PERFORM] slow select

2003-10-26 Thread Shridhar Daithankar
Vivek Khera wrote:
"JB" == Josh Berkus <[EMAIL PROTECTED]> writes:
JB> Actually, what OS's can't use all idle ram for kernel cache?  I
JB> should note that in my performance docs 
FreeBSD.  Limited by the value of "sysctl vfs.hibufspace" from what I
understand.  This value is set at boot based on available RAM and some
other tuning parameters.
Actually, I have wanted to ask this question for a long time. Can we have guidelines 
about how to set effective cache size for various OSs?

Linux is pretty simple. Everything free is buffer cache. FreeBSD, not so 
straightforward but there is a sysctl..

How about HP-UX, Solaris, and AIX? Other BSDs? And, most importantly, Windows?

That could add much value to the tuning guide, couldn't it?

 Shridhar

---(end of broadcast)---
TIP 6: Have you searched our list archives?
  http://archives.postgresql.org


Re: [PERFORM] slow select

2003-10-24 Thread Vivek Khera
> "JB" == Josh Berkus <[EMAIL PROTECTED]> writes:

JB> Vivek,
>> Assuming your OS will use that much RAM for the cache... the whole
>> world's not Linux :-)

JB> It's not?   Darn!

:-)

JB> Actually, what OS's can't use all idle ram for kernel cache?  I
JB> should note that in my performance docs 

FreeBSD.  Limited by the value of "sysctl vfs.hibufspace" from what I
understand.  This value is set at boot based on available RAM and some
other tuning parameters.

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])


Re: [PERFORM] slow select

2003-10-24 Thread Josh Berkus
Vivek,

> Assuming your OS will use that much RAM for the cache... the whole
> world's not Linux :-)

It's not?   Darn!

Actually, what OS's can't use all idle ram for kernel cache?   I should note 
that in my performance docs 

-- 
Josh Berkus
Aglio Database Solutions
San Francisco

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [PERFORM] slow select

2003-10-24 Thread Greg Stark

"Medora Schauer" <[EMAIL PROTECTED]> writes:

> Merge Join  (cost=0.00..287726.10 rows=100221 width=58) (actual time=61.60..5975.63 
> rows=100425 loops=1)
>Merge Cond: (("outer".shot_line_num = "inner".shot_line_num) AND 
> ("outer".shotpoint = "inner".shotpoint))
>->  Index Scan using hsot_record_idx on shot_record r  (cost=0.00..123080.11 
> rows=100425 width=46) (actual time=24.15..2710.31 rows=100425 loops=1)
>->  Index Scan using shotpoint_idx on shotpoint p  (cost=0.00..467924.54 
> rows=290106 width=12) (actual time=37.38..1379.64 rows=100749 loops=1)
>  Total runtime: 6086.32 msec
> 
> So why were the indices not used before when they yield a better plan?

There's another reason. Notice it thinks the second table will return 290k
records. In fact it only returns 100k records. So it's optimizing on the
assumption that it will have to read 3x as many records as it actually will...

I'm not clear if there's anything you can do to improve this estimate though.

-- 
greg


---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] slow select

2003-10-23 Thread Vivek Khera
> "JB" == Josh Berkus <[EMAIL PROTECTED]> writes:

JB> Medora,
>> Increasing effective_cache_size to 10000 did it.  

JB> That would be 78MB RAM.  If you have more than that available, you can 
JB> increase it further.  Ideally, it should be about 2/3 to 3/4 of available 
JB> RAM.

Assuming your OS will use that much RAM for the cache... the whole
world's not Linux :-)

-- 
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Vivek Khera, Ph.D.Khera Communications, Inc.
Internet: [EMAIL PROTECTED]   Rockville, MD   +1-240-453-8497
AIM: vivekkhera Y!: vivek_khera   http://www.khera.org/~vivek/

---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [PERFORM] slow select

2003-10-22 Thread Josh Berkus
Medora,

> Increasing effective_cache_size to 10000 did it.  

That would be 78MB RAM.  If you have more than that available, you can 
increase it further.  Ideally, it should be about 2/3 to 3/4 of available 
RAM.
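
For reference, the arithmetic behind that figure, assuming the default 8kB
page size (effective_cache_size is measured in pages in this era): 10000
pages x 8kB = 78MB. Scaling the same sketch to, say, a machine with 1GB of
RAM and a 2/3 target:

SET effective_cache_size = 87000;  -- roughly 680MB in 8kB pages; illustrative only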

>The query now
> takes 4 secs.  I left random_page_cost at the default value of 4.  
> I thought, mistakenly apparently, that our database was relatively 
> itty bitty and so haven't messed with the .conf file. 

Actually, for a itty bitty database on a fast machine, you definitely want to 
lower random_page_cost.  It's a large database that would make you cautious 
about this.

-- 
-Josh Berkus
 Aglio Database Solutions
 San Francisco


---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [PERFORM] slow select

2003-10-22 Thread Medora Schauer

Josh,

> > So why were the indices not used before when they yield
> > a better plan?
> 
> Your .conf settings, most likely.  I'd lower your 
> random_page_cost and raise 
> your effective_cache_size.

Increasing effective_cache_size to 10000 did it.  The query now
takes 4 secs.  I left random_page_cost at the default value of 4.  
I thought, mistakenly apparently, that our database was relatively 
itty bitty and so haven't messed with the .conf file.  Guess I 
better take a look at all the settings (I know where the docs are).

Thanks for your help,

Medora

***
Medora Schauer
Sr. Software Engineer

Fairfield Industries
***

---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] slow select

2003-10-22 Thread Josh Berkus
Medora,

> So why were the indices not used before when they yield a better plan?

Your .conf settings, most likely.  I'd lower your random_page_cost and raise 
your effective_cache_size.

-- 
-Josh Berkus
 Aglio Database Solutions
 San Francisco


---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html


Re: [PERFORM] slow select

2003-10-22 Thread Medora Schauer


 
> 
> Medora,
> 
> > I'm using pg 7.3.4 to do a select involving a join on 2 tables.
> > The query is taking 15 secs which seems extreme to me considering
> > the indices that exist on the two tables.  EXPLAIN ANALYZE shows
> > that the indices aren't being used.  I've done VACUUM ANALYZE on the
> > db with no change in results.  Shouldn't the indices be used?
> 
> No.  You're selecting 100,000 records.   For such a large 
> record dump, a seq 
> scan is usually faster.
> 
> If you don't believe me, try setting enable_seqscan=false and 
> see how long the 
> query takes.

I did as you suggested (set enable_seqscan = false) and the query now takes 6 secs (vs
15 secs before):

Merge Join  (cost=0.00..287726.10 rows=100221 width=58) (actual time=61.60..5975.63 
rows=100425 loops=1)
   Merge Cond: (("outer".shot_line_num = "inner".shot_line_num) AND ("outer".shotpoint 
= "inner".shotpoint))
   ->  Index Scan using hsot_record_idx on shot_record r  (cost=0.00..123080.11 
rows=100425 width=46) (actual time=24.15..2710.31 rows=100425 loops=1)
   ->  Index Scan using shotpoint_idx on shotpoint p  (cost=0.00..467924.54 
rows=290106 width=12) (actual time=37.38..1379.64 rows=100749 loops=1)
 Total runtime: 6086.32 msec

So why were the indices not used before when they yield a better plan?



---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PERFORM] slow select

2003-10-22 Thread Josh Berkus
Medora,

> I'm using pg 7.3.4 to do a select involving a join on 2 tables.
> The query is taking 15 secs which seems extreme to me considering
> the indices that exist on the two tables.  EXPLAIN ANALYZE shows
> that the indices aren't being used.  I've done VACUUM ANALYZE on the
> db with no change in results.  Shouldn't the indices be used?

No.  You're selecting 100,000 records.   For such a large record dump, a seq 
scan is usually faster.

If you don't believe me, try setting enable_seqscan=false and see how long the 
query takes.
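
Spelled out against this thread's query (the column list is abbreviated;
RESET restores the default afterwards):

SET enable_seqscan = false;
EXPLAIN ANALYZE
SELECT r.shot_line_num, r.shotpoint, index, shot_time
  FROM shot_record r, shotpoint p
 WHERE p.shot_line_num = r.shot_line_num
   AND p.shotpoint = r.shotpoint;
RESET enable_seqscan;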

-- 
Josh Berkus
Aglio Database Solutions
San Francisco

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html


[PERFORM] slow select

2003-10-22 Thread Medora Schauer

I'm using pg 7.3.4 to do a select involving a join on 2 tables.  
The query is taking 15 secs which seems extreme to me considering 
the indices that exist on the two tables.  EXPLAIN ANALYZE shows 
that the indices aren't being used.  I've done VACUUM ANALYZE on the 
db with no change in results.  Shouldn't the indices be used?

Below is what I believe to be the relevant information.  I haven't
included the definitions of the tables involved in the foreign
key definititions because I don't think they matter.  

Any help will be greatly appreciated.

 CREATE TABLE shotpoint ( 
  shot_line_num FLOAT4, 
  shotpoint FLOAT4, 
  x FLOAT4, 
  y FLOAT4, 
  template_id INT4, 
  num_chans INT4)

CREATE TABLE shot_record ( 
  shot_line_num FLOAT4, 
  shotpoint FLOAT4, 
  index INT2, 
  dev INT4, 
  dev_offset INT8, 
  bin INT4, 
  shot_time INT8, 
  record_length INT4,
  nav_x FLOAT4,
  nav_y FLOAT4,
  num_rus INT4,
  status INT4 DEFAULT 0, 
  reel_num INT4,
  file_num INT4,
  nav_status INT2,
  nav_shot_line FLOAT4,
  nav_shotpoint FLOAT4,
  nav_depth FLOAT4,
  sample_skew INT4, 
  trace_count INT4,  
  PRIMARY KEY (shot_line_num, shotpoint, index)) 

ALTER TABLE shotpoint ADD CONSTRAINT shot_line_fk 
  FOREIGN KEY (shot_line_num) 
  REFERENCES shot_line(shot_line_num)

CREATE UNIQUE INDEX shotpoint_idx 
  ON shotpoint(shot_line_num, shotpoint)

ALTER TABLE shot_record ADD CONSTRAINT shot_record_shotpoint_index_fk 
  FOREIGN KEY (shot_line_num, shotpoint) 
  REFERENCES shotpoint(shot_line_num, shotpoint)
 

 EXPLAIN ANALYZE SELECT r.shot_line_num, r.shotpoint, index, 
shot_time, 
record_length, dev, 
dev_offset, num_rus, bin, template_id, trace_count
   FROM shot_record r, shotpoint p 
   WHERE p.shot_line_num = r.shot_line_num 
   AND p.shotpoint = r.shotpoint; 

 

Merge Join  (cost=49902.60..52412.21 rows=100221 width=58) (actual 
time=12814.28..15000.65 rows=100425 loops=1)
   Merge Cond: (("outer".shot_line_num = "inner".shot_line_num) AND ("outer".shotpoint 
= "inner".shotpoint))
   ->  Sort  (cost=13460.90..13711.97 rows=100425 width=46) (actual 
time=3856.94..4157.01 rows=100425 loops=1)
 Sort Key: r.shot_line_num, r.shotpoint
 ->  Seq Scan on shot_record r  (cost=0.00..2663.25 rows=100425 width=46) 
(actual time=18.00..1089.00 rows=100425 loops=1)
   ->  Sort  (cost=36441.70..37166.96 rows=290106 width=12) (actual 
time=8957.19..9224.09 rows=100749 loops=1)
 Sort Key: p.shot_line_num, p.shotpoint
 ->  Seq Scan on shotpoint p  (cost=0.00..5035.06 rows=290106 width=12) 
(actual time=7.55..2440.06 rows=290106 loops=1)
 Total runtime: 15212.05 msec


***
Medora Schauer
Sr. Software Engineer

Fairfield Industries
14100 Southwest Freeway
Suite 600
Sugar Land, Tx  77478-3469
USA

[EMAIL PROTECTED]
phone: 281-275-7664
fax: 281-275-7551
***


---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster