Re: [PERFORM] slow seqscan

2004-04-21 Thread Tom Lane
Edoardo Ceccarelli <[EMAIL PROTECTED]> writes:
> I wasn't able to make this 2 field index with lower:

> dba400=# CREATE INDEX annuncio400_rubric_testo_idx ON 
> annuncio400(rubric, LOWER(testo));
> ERROR:  parser: parse error at or near "(" at character 71

> seems impossible to create 2 field indexes with lower function.

You need 7.4 to do that; previous releases don't support multi-column
functional indexes.

regards, tom lane

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])

Re: [PERFORM] slow seqscan

2004-04-21 Thread Edoardo Ceccarelli

> > can't understand this policy:
> >
> > dba400=# SELECT count(*) from annuncio400 where rubric='DD';
> >  count
> > ---
> >   6753
> > (1 row)
> > dba400=# SELECT count(*) from annuncio400 where rubric='MA';
> >  count
> > ---
> >   2165
> > (1 row)
> > so it's using the index on 2000 rows and not for 6000?  it's not that
> > big a difference, is it?
>
> It's a question of how many pages it thinks it's going to have to retrieve
> in order to handle the request.  If it, say, needs (or thinks it needs) to
> retrieve 50% of the pages, then given a random_page_cost of 4, it's going
> to expect the index scan to be about twice the cost.
> Generally speaking one good way to compare is to try the query with
> explain analyze and then change parameters like enable_seqscan and try the
> query with explain analyze again and compare the estimated rows and costs.
> That'll give an idea of how it expects the two versions of the query to
> compare speed-wise.

OK, then how do you explain this?
I just created a copy of the same database.

Slow seqscan query executed on dba400:

dba400=# explain analyze SELECT *, oid FROM annuncio400 WHERE  rubric = 
'DD' AND LOWER(testo) Like LOWER('cbr%') OFFSET 0 LIMIT 11;

Limit  (cost=0.00..3116.00 rows=11 width=546) (actual time=46.66..51.40 
rows=11 loops=1)
 ->  Seq Scan on annuncio400  (cost=0.00..35490.60 rows=125 width=546) 
(actual time=46.66..51.38 rows=12 loops=1)
   Filter: ((rubric = 'DD'::bpchar) AND (lower((testo)::text) ~~ 
Total runtime: 51.46 msec
(4 rows)

Faster index scan query on dba400b (exact copy of dba400):

dba400b=# explain analyze SELECT *, oid FROM annuncio400 WHERE  rubric = 
'DD' AND LOWER(testo) Like LOWER('cbr%') OFFSET 0 LIMIT 11;

Limit  (cost=0.00..7058.40 rows=9 width=546) (actual time=1.36..8.18 
rows=11 loops=1)
 ->  Index Scan using rubric on annuncio400  (cost=0.00..7369.42 rows=9 
width=546) (actual time=1.35..8.15 rows=12 loops=1)
   Index Cond: (rubric = 'DD'::bpchar)
   Filter: (lower((testo)::text) ~~ 'cbr%'::text)
Total runtime: 8.28 msec
(5 rows)

Anyway, shall I try to lower the random_page_cost value so that I get an 
index scan? In my case I've already noticed that with an index scan 
the query runs in about 1/10 of the seqscan time.

Thank you

Re: [PERFORM] slow seqscan

2004-04-21 Thread Stephan Szabo

On Wed, 21 Apr 2004, Edoardo Ceccarelli wrote:

> > What happens if you go:
> >
> > CREATE INDEX annuncio400_rubric_testo_idx ON annuncio400(rubric,
> > LOWER(testo));
> >
> > or even just:
> >
> > CREATE INDEX annuncio400_rubric_testo_idx ON annuncio400(LOWER(testo));
> >
> I wasn't able to make this 2 field index with lower:
> dba400=# CREATE INDEX annuncio400_rubric_testo_idx ON
> annuncio400(rubric, LOWER(testo));
> ERROR:  parser: parse error at or near "(" at character 71

That's a 7.4 feature I think (and I think the version with two columns
may need extra parens around the lower()). I think the only way to do
something equivalent in 7.3 is to make a function that concatenates the
two in some fashion after having applied the lower to the one part and
then using that in the queries as well.  Plus, if you're not in "C"
locale, I'm not sure that it'd help in 7.3 anyway.

> >> But the strangest thing ever is that if I change the filter with
> >> another one that represent a smaller amount of data  it uses the
> >> index scan!!!
> >
> >
> > What's strange about that?  The less data is going to be retrieved,
> > the more likely postgres is to use the index.
> >
> can't understand this policy:
> dba400=# SELECT count(*) from annuncio400 where rubric='DD';
>  count
> ---
>   6753
> (1 row)
> dba400=# SELECT count(*) from annuncio400 where rubric='MA';
>  count
> ---
>   2165
> (1 row)
> so it's using the index on 2000 rows and not for 6000?  it's not that
> big difference, isn't it?

It's a question of how many pages it thinks it's going to have to retrieve
in order to handle the request.  If it, say, needs (or thinks it needs) to
retrieve 50% of the pages, then given a random_page_cost of 4, it's going
to expect the index scan to be about twice the cost.
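
The 50% figure can be checked with a deliberately simplified cost model. This is only a sketch: it counts page fetches and nothing else, whereas the real planner also charges per-tuple CPU costs and accounts for index pages. The function names and the table size are made up for illustration; the random_page_cost of 4 is the value mentioned above.

```python
# Toy model of the page-fetch part of a planner cost estimate.
# Assumption: a sequential page read costs 1.0 and a random page read
# costs random_page_cost; CPU and index-page costs are ignored.

def seq_scan_cost(total_pages):
    """A sequential scan reads every page once, at cost 1.0 each."""
    return total_pages * 1.0


def index_scan_cost(total_pages, fraction_fetched, random_page_cost=4.0):
    """An index scan fetches only some pages, but each fetch is random."""
    return total_pages * fraction_fetched * random_page_cost


pages = 10_000  # hypothetical table size in pages
ratio = index_scan_cost(pages, 0.5) / seq_scan_cost(pages)
print(ratio)  # 2.0: the index scan is estimated at twice the seq scan cost
```

Under the same assumptions the two estimates are equal when the index scan touches 25% of the pages, so the planner's choice depends heavily on how many pages it believes a given rubric value is spread over.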

Generally speaking, one good way to compare is to run the query with
explain analyze, then change parameters like enable_seqscan, run the
query with explain analyze again, and compare the estimated rows and costs.
That'll give an idea of how it expects the two versions of the query to
compare speed-wise.


Re: [PERFORM] slow seqscan

2004-04-21 Thread Edoardo Ceccarelli
I just created a copy of the same database, and it shows that it is the 
ANALYZE that is messing things up:

Slow seqscan query executed on dba400

dba400=# explain analyze SELECT *, oid FROM annuncio400 WHERE  rubric = 
'DD' AND LOWER(testo) Like LOWER('cbr%') OFFSET 0 LIMIT 11;

Limit  (cost=0.00..3116.00 rows=11 width=546) (actual time=46.66..51.40 
rows=11 loops=1)
  ->  Seq Scan on annuncio400  (cost=0.00..35490.60 rows=125 width=546) 
(actual time=46.66..51.38 rows=12 loops=1)
        Filter: ((rubric = 'DD'::bpchar) AND (lower((testo)::text) ~~ 
Total runtime: 51.46 msec
(4 rows)

Faster index scan query on dba400b (exact copy of dba400):

dba400b=# explain analyze SELECT *, oid FROM annuncio400 WHERE  rubric = 
'DD' AND LOWER(testo) Like LOWER('cbr%') OFFSET 0 LIMIT 11;
Limit  (cost=0.00..7058.40 rows=9 width=546) (actual time=1.36..8.18 
rows=11 loops=1)
  ->  Index Scan using rubric on annuncio400  (cost=0.00..7369.42 
rows=9 width=546) (actual time=1.35..8.15 rows=12 loops=1)
        Index Cond: (rubric = 'DD'::bpchar)
        Filter: (lower((testo)::text) ~~ 'cbr%'::text)
Total runtime: 8.28 msec
(5 rows)

What about the index you suggested? It gives me a syntax error when I 
try to create it:

CREATE INDEX annuncio400_rubric_testo_idx ON annuncio400(rubric, 
LOWER(testo));

Christopher Kings-Lynne wrote:

> > enable_seqscan = false
> >
> > and I'm having all index scans, timing has improved from 600ms to 18ms
> >
> > wondering what other implications I might expect.
>
> Lots of really bad's really not a good idea.



Re: [PERFORM] slow seqscan

2004-04-21 Thread Christopher Kings-Lynne

> enable_seqscan = false
>
> and I'm having all index scans, timing has improved from 600ms to 18ms
>
> wondering what other implications I might expect.

Lots of really bad's really not a good idea.



Re: [PERFORM] slow seqscan

2004-04-21 Thread Edoardo Ceccarelli
I tried setting

enable_seqscan = false

and now I'm getting all index scans; timing has improved from 600ms to 18ms.

I'm wondering what other implications I might expect.


Re: [PERFORM] slow seqscan

2004-04-21 Thread Edoardo Ceccarelli

> What happens if you go:
>
> CREATE INDEX annuncio400_rubric_testo_idx ON annuncio400(rubric, 
> LOWER(testo));
>
> or even just:
>
> CREATE INDEX annuncio400_rubric_testo_idx ON annuncio400(LOWER(testo));

I wasn't able to create this 2 field index with lower:

dba400=# CREATE INDEX annuncio400_rubric_testo_idx ON 
annuncio400(rubric, LOWER(testo));
ERROR:  parser: parse error at or near "(" at character 71

It seems impossible to create 2 field indexes with the lower function.

The other one does not make it use the index.

> > But the strangest thing ever is that if I change the filter with 
> > another one that represent a smaller amount of data  it uses the 
> > index scan!!!
>
> What's strange about that?  The less data is going to be retrieved, 
> the more likely postgres is to use the index.

I can't understand this policy:

dba400=# SELECT count(*) from annuncio400 where rubric='DD';
 count
---
  6753
(1 row)
dba400=# SELECT count(*) from annuncio400 where rubric='MA';
 count
---
  2165
(1 row)

So it's using the index for 2000 rows and not for 6000?  That's not that 
big a difference, is it?

> I suggest maybe increasing the amount of stats recorded for your 
> rubric column:
>
> ALTER TABLE annuncio400 ALTER rubric SET STATISTICS 100;
> ANALYZE annuncio400;

Done; almost the same, still not using the index.

> You could also try reducing the random_page_cost value in your 
> postgresql.conf a little, say to 3 (if it's currently 4).  That will 
> make postgres more likely to use index scans over seq scans.

I changed the setting in postgresql.conf and restarted the server;
nothing has changed.
What about setting this to false?

#enable_seqscan = true

Thanks again

Re: [PERFORM] slow seqscan

2004-04-21 Thread Christopher Kings-Lynne
> dba400=# explain analyze SELECT *, oid FROM annuncio400 WHERE  rubric = 
> 'DD' AND LOWER(testo) Like LOWER('cbr%') OFFSET 0 LIMIT 11;
>
> Limit  (cost=0.00..3116.00 rows=11 width=546) (actual time=51.47..56.42 
> rows=11 loops=1)
>  ->  Seq Scan on annuncio400  (cost=0.00..35490.60 rows=125 width=546) 
> (actual time=51.47..56.40 rows=12 loops=1)
>    Filter: ((rubric = 'DD'::bpchar) AND (lower((testo)::text) ~~ 
> Total runtime: 56.53 msec
> (4 rows)

What happens if you go:

CREATE INDEX annuncio400_rubric_testo_idx ON annuncio400(rubric, 
LOWER(testo));

or even just:

CREATE INDEX annuncio400_rubric_testo_idx ON annuncio400(LOWER(testo));

> But the strangest thing ever is that if I change the filter with another 
> one that represent a smaller amount of data  it uses the index scan!!!

What's strange about that?  The less data is going to be retrieved, the 
more likely postgres is to use the index.

I suggest maybe increasing the amount of stats recorded for your rubric 
column:

ALTER TABLE annuncio400 ALTER rubric SET STATISTICS 100;
ANALYZE annuncio400;

You could also try reducing the random_page_cost value in your 
postgresql.conf a little, say to 3 (if it's currently 4).  That will 
make postgres more likely to use index scans over seq scans.
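
To see roughly why a lower random_page_cost tips the choice towards index scans, here is a small sketch. It assumes a toy model that counts only page fetches (a sequential page costs 1.0, a random page costs random_page_cost) and ignores the CPU and index-page costs the real planner includes. The helper name is hypothetical.

```python
# Toy model: an index scan is estimated as cheaper when
#   fraction_of_pages * random_page_cost < 1.0
# so the break-even fraction is 1 / random_page_cost.

def break_even_fraction(random_page_cost):
    """Largest fraction of pages an index scan may touch and still be
    estimated as cheaper than a full sequential scan (toy model only)."""
    return 1.0 / random_page_cost


for rpc in (4.0, 3.0):
    print(f"random_page_cost={rpc}: index scan preferred below "
          f"{break_even_fraction(rpc):.0%} of pages")
```

In this model, going from 4 to 3 moves the break-even point from a quarter of the pages to about a third, which is a modest shift; a query well past the threshold will still get a seq scan.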



Re: [PERFORM] slow seqscan

2004-04-21 Thread Edoardo Ceccarelli

> In general we are going to need more information, like what kind of 
> search filters you are using on the text field and an EXPLAIN ANALYZE. 
> But can you try and run the following, bearing in mind it will take a 
> while to complete.
>
> From what I remember there were issues with index space not being 
> reclaimed in a vacuum. I believe this was fixed in 7.4. By not 
> reclaiming the space the indexes grow larger and larger over time, 
> causing PG to prefer a sequential scan over an index scan (I think).

The query is this:

SELECT *, oid FROM annuncio400
WHERE  rubric = 'DD' AND LOWER(testo) Like LOWER('cbr%')

dba400=# explain analyze SELECT *, oid FROM annuncio400 WHERE  rubric = 
'DD' AND LOWER(testo) Like LOWER('cbr%') OFFSET 0 LIMIT 11;

Limit  (cost=0.00..3116.00 rows=11 width=546) (actual time=51.47..56.42 
rows=11 loops=1)
 ->  Seq Scan on annuncio400  (cost=0.00..35490.60 rows=125 width=546) 
(actual time=51.47..56.40 rows=12 loops=1)
   Filter: ((rubric = 'DD'::bpchar) AND (lower((testo)::text) ~~ 
Total runtime: 56.53 msec
(4 rows)

But the strangest thing is that if I change the filter to one that 
matches a smaller amount of data, it uses the index scan!
Check this (same table, same query, but with rubric = 'MA'):

dba400=# explain analyze SELECT *, oid FROM annuncio400 WHERE  rubric = 
'MA' AND LOWER(testo) Like LOWER('cbr%') OFFSET 0 LIMIT 11; 

Limit  (cost=0.00..6630.72 rows=9 width=546) (actual time=42.74..42.74 
rows=0 loops=1)
 ->  Index Scan using rubric on annuncio400  (cost=0.00..6968.48 rows=9 
width=546) (actual time=42.73..42.73 rows=0 loops=1)
   Index Cond: (rubric = 'MA'::bpchar)
   Filter: (lower((testo)::text) ~~ 'cbr%'::text)
Total runtime: 42.81 msec
(5 rows)

Thanks for your help


Re: [PERFORM] slow seqscan

2004-04-21 Thread Nick Barr
In general we are going to need more information, like what kind of 
search filters you are using on the text field and an EXPLAIN ANALYZE. 
But can you try and run the following, bearing in mind it will take a 
while to complete.


From what I remember there were issues with index space not being 
reclaimed in a vacuum. I believe this was fixed in 7.4. By not 
reclaiming the space the indexes grow larger and larger over time, 
causing PG to prefer a sequential scan over an index scan (I think).

Hope that helps



Re: [PERFORM] slow seqscan

2004-04-21 Thread Christopher Kings-Lynne
Hi Edoardo,

> The table is well indexed so that most of the queries are executed with
> index scan but since there is a big text field in the table (360chars)
> some search operation (with certain filters) ends up with seq scans.

Please paste the exact SELECT query that uses a seqscan, plus the 
EXPLAIN ANALYZE of the SELECT, and the psql output of \d .

> This table is not written during normal operation: twice per week there
> is a batch program that insert about 35.000 records and updates another

After such an update, you need to run VACUUM ANALYZE ;  Run it 
before the update as well, if it doesn't take that long.

> Last Friday morning, after that batch had been executed, the database 
> started responding really slowly to queries (especially seq scans); 
> after a "vacuum full analyze" things got somewhat better.
> Yesterday the same: before the batch everything was perfect, after it 
> every query was really slow. I vacuumed it again and now it is OK.
> Until now the db was working fine; it's 4 months old with two updates 
> per week and I vacuum about once per month.

You need to vacuum analyze (NOT full) once an HOUR, not once a month. 
Add this command to your crontab to run once an hour and verify that 
it's working:

vacuumdb -a -z -q

Otherwise, install the auto vacuum utility found in 
contrib/pg_autovacuum in the postgres source.  Set this up.  It will 
monitor postgres and run vacuums and analyzes when necessary.  You can 
then remove your cron job.

> I am using version 7.3; do I need to upgrade to 7.4? Also, I was thinking
> about setting this table to a kind of "read-only" mode to improve
> performance. Is this possible?

There's no read-only mode to improve performance.

Upgrading to 7.4 will more than likely improve the performance of your 
database in general.  Be careful to read the upgrade notes because there 
were a few incompatibilities.



[PERFORM] slow seqscan

2004-04-21 Thread Edoardo Ceccarelli
My first post to this list :)

I have a database used only with search queries with only one table that
holds about 450.000/500.000 records.
The table is well indexed so that most of the queries are executed with
index scan but since there is a big text field in the table (360chars)
some search operation (with certain filters) ends up with seq scans.
This table is not written during normal operation: twice per week there
is a batch program that inserts about 35.000 records and updates another
Last Friday morning, after that batch had been executed, the database 
started responding really slowly to queries (especially seq scans); 
after a "vacuum full analyze" things got somewhat better.
Yesterday the same: before the batch everything was perfect, after it 
every query was really slow. I vacuumed it again and now it is OK.
Until now the db was working fine; it's 4 months old with two updates 
per week and I vacuum about once per month.

I am using version 7.3; do I need to upgrade to 7.4? Also, I was thinking
about setting this table to a kind of "read-only" mode to improve
performance. Is this possible?
Thank you for your help
Edoardo Ceccarelli