[SQL] Sequence vs. Index Scan

2007-05-05 Thread Aaron Bono

I have two schemas, both with the exact same DDL.  One performs great and
the other has problems with a specific query:

SELECT *
FROM
   branch_active_vw
WHERE branch_id = get_branch_for_zip ( '22151' )
ORDER BY branch_name
;

I am not defining the view here because the explain plans show the real
problem.  I can post the view as well if it will help.

The explain plans are as follows:

Fast Schema:
1.  Sort  (cost=17.50..17.51 rows=1 width=680) (actual time=2838.583..2838.586 rows=1 loops=1)
2.    Sort Key: branch.branch_name
3.    ->  Nested Loop Left Join  (cost=0.00..17.49 rows=1 width=680) (actual time=2838.060..2838.093 rows=1 loops=1)
4.          Join Filter: ("inner".branch_group_id = "outer".branch_group_id)
5.          ->  Nested Loop  (cost=0.00..11.45 rows=1 width=647) (actual time=2837.776..2837.804 rows=1 loops=1)
6.                ->  Nested Loop  (cost=0.00..7.88 rows=1 width=618) (actual time=2837.697..2837.716 rows=1 loops=1)
7.                      Join Filter: ("inner".locale_id = "outer".locale_id)
8.                      ->  Nested Loop  (cost=0.00..6.86 rows=1 width=598) (actual time=2837.666..2837.676 rows=1 loops=1)
9.                            Join Filter: ("inner".corp_id = "outer".corp_id)
10.                           ->  Index Scan using branch_pkey on branch  (cost=0.00..5.84 rows=1 width=560) (actual time=2837.621..2837.624 rows=1 loops=1)
11.                                 Index Cond: (branch_id = get_branch_for_zip('22151'::character varying))
12.                                 Filter: ((start_day <= now()) AND ((end_day IS NULL) OR (end_day >= now())))
13.                           ->  Seq Scan on corp  (cost=0.00..1.01 rows=1 width=46) (actual time=0.015..0.017 rows=1 loops=1)
14.                     ->  Seq Scan on locale  (cost=0.00..1.01 rows=1 width=28) (actual time=0.014..0.016 rows=1 loops=1)
15.               ->  Index Scan using zip_cd_pkey on zip_cd branch_address_zip_cd  (cost=0.00..3.55 rows=1 width=37) (actual time=0.066..0.069 rows=1 loops=1)
16.                     Index Cond: (branch_address_zip_cd.zip_cd_id = "outer".branch_address_zip_id)
17.         ->  Seq Scan on branch_group  (cost=0.00..1.07 rows=7 width=41) (actual time=0.013..0.029 rows=7 loops=1)
18.   SubPlan
19.     ->  Seq Scan on branch_area  (cost=0.00..4.89 rows=1 width=6) (actual time=0.132..0.137 rows=2 loops=1)
20.           Filter: (branch_id = $0)
21. Total runtime: 2839.044 ms

Slow Schema:
Sort  (cost=12.77..12.78 rows=1 width=1380) (actual time=157492.513..157492.515 rows=1 loops=1)
1.    Sort Key: branch.branch_name
2.    ->  Nested Loop Left Join  (cost=0.00..12.76 rows=1 width=1380) (actual time=130130.384..157492.484 rows=1 loops=1)
3.          Join Filter: ("inner".branch_group_id = "outer".branch_group_id)
4.          ->  Nested Loop  (cost=0.00..10.34 rows=1 width=1360) (actual time=130130.157..157492.253 rows=1 loops=1)
5.                Join Filter: ("inner".locale_id = "outer".locale_id)
6.                ->  Nested Loop  (cost=0.00..9.31 rows=1 width=1340) (actual time=130130.127..157492.213 rows=1 loops=1)
7.                      ->  Nested Loop  (cost=0.00..5.75 rows=1 width=1311) (actual time=130130.042..157492.119 rows=1 loops=1)
8.                            Join Filter: ("inner".corp_id = "outer".corp_id)
9.                            ->  Seq Scan on branch  (cost=0.00..4.72 rows=1 width=1281) (actual time=130129.988..157492.057 rows=1 loops=1)
10.                                 Filter: ((start_day <= now()) AND ((end_day IS NULL) OR (end_day >= now())) AND (branch_id = get_branch_for_zip('22151'::character varying)))
11.                           ->  Seq Scan on corp  (cost=0.00..1.01 rows=1 width=38) (actual time=0.022..0.024 rows=1 loops=1)
12.                     ->  Index Scan using zip_cd_pkey on zip_cd branch_address_zip_cd  (cost=0.00..3.55 rows=1 width=37) (actual time=0.070..0.073 rows=1 loops=1)
13.                           Index Cond: (branch_address_zip_cd.zip_cd_id = "outer".branch_address_zip_id)
14.               ->  Seq Scan on locale  (cost=0.00..1.01 rows=1 width=28) (actual time=0.013..0.015 rows=1 loops=1)
15.         ->  Seq Scan on branch_group  (cost=0.00..1.01 rows=1 width=28) (actual time=0.013..0.015 rows=1 loops=1)
16.   SubPlan
17.     ->  Seq Scan on branch_area  (cost=0.00..1.40 rows=1 width=5) (actual time=0.077..0.084 rows=1 loops=1)
18.           Filter: (branch_id = $0)
19. Total runtime: 157492.890 ms

The problem is that lines 10-12 of the fast schema's plan show an index
scan, while lines 9-10 of the slow schema's plan show a sequential scan.
The sequential scan on branch_id, combined with the rest of the filter,
takes forever.  I have checked, and there IS an index, specifically a
primary key index, on branch_id in both schemas, so I cannot figure out why
the optimizer treats them differently.  In fact, the table branch_id comes
from has the exact same indices and foreign keys in both schemas.

Any direction would be deeply appreciated.

Re: [SQL] Sequence vs. Index Scan

2007-05-05 Thread Andrew Sullivan
On Sat, May 05, 2007 at 04:05:52PM -0500, Aaron Bono wrote:
> I have two schemas, both with the exact same DDL.  One performs great and

Do they have the same data?

>  I have
> checked and there IS an index, specifically a primary key index, on the
> branch_id in both schemas so I cannot figure out why the optimizer is
> looking at these differently.  In fact, the table the branch_id comes from
> has the exact same indices and foreign keys on both schemas.

Different data?  Different statistics?  Different numbers of
(possibly dead) rows?

A

-- 
Andrew Sullivan  | [EMAIL PROTECTED]
Everything that happens in the world happens at some place.
--Jane Jacobs 

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [SQL] Sequence vs. Index Scan

2007-05-05 Thread Aaron Bono

On 5/5/07, Andrew Sullivan <[EMAIL PROTECTED]> wrote:

> On Sat, May 05, 2007 at 04:05:52PM -0500, Aaron Bono wrote:
> > I have two schemas, both with the exact same DDL.  One performs great and
>
> Do they have the same data?

They have different data.  The fast one has about 150 rows and the slow one
has about 40 rows.  The field in question, branch_id, is a BIGSERIAL in
both.


> > I have
> > checked and there IS an index, specifically a primary key index, on the
> > branch_id in both schemas so I cannot figure out why the optimizer is
> > looking at these differently.  In fact, the table the branch_id comes
> > from has the exact same indices and foreign keys on both schemas.
>
> Different data?  Different statistics?  Different numbers of
> (possibly dead) rows?

We don't allow deletes, and updates are fairly infrequent.  I also ran a
VACUUM ANALYZE to no effect.

What can I do to get more statistics that would help?
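For anyone hitting the same question later: one knob worth trying is the
per-column statistics target, raised before re-analyzing.  A minimal
sketch, using the branch table from the plans above (the target value of
100 is only an example):

```sql
-- Gather a larger sample for the planner's estimates on branch_id
-- (the default statistics target in 8.1 is 10), then refresh stats
ALTER TABLE branch ALTER COLUMN branch_id SET STATISTICS 100;
ANALYZE branch;
```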

Thanks for the help,
Aaron

--
==
  Aaron Bono
  Aranya Software Technologies, Inc.
  http://www.aranya.com
  http://codeelixir.com
==


Re: [SQL] Sequence vs. Index Scan

2007-05-05 Thread Tom Lane
"Aaron Bono" <[EMAIL PROTECTED]> writes:
> 9.   ->  Seq Scan on branch  (cost=0.00..4.72 rows=1 width=1281) (actual time=130129.988..157492.057 rows=1 loops=1)
> 10.        Filter: ((start_day <= now()) AND ((end_day IS NULL) OR (end_day >= now())) AND (branch_id = get_branch_for_zip('22151'::character varying)))

There is something *awfully* wacko about that entry --- the fact that
the cost estimate is less than 5 units means that the planner thinks
there's 4 or fewer pages; either that's way wrong or the
get_branch_for_zip function is taking enormous amounts of time per row.
Have you tried timing that function on its own?

One possible reason for the performance difference is if you have 
get_branch_for_zip marked as stable in one database and volatile in the
other --- volatile would prevent it from being used in an indexqual as
you'd like.
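For reference, on versions that allow changing function properties in
place, the volatility can be flipped without dropping and re-creating the
function.  A sketch, assuming the function takes a single varchar argument:

```sql
-- Mark the function stable so the planner may use it in an index qual;
-- the argument type here is an assumption
ALTER FUNCTION get_branch_for_zip(character varying) STABLE;
```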

regards, tom lane



Re: [SQL] Sequence vs. Index Scan

2007-05-05 Thread Andrew Sullivan
On Sat, May 05, 2007 at 05:00:53PM -0500, Aaron Bono wrote:
> 
> They have different data.  The fast one has about 150 rows and the  slow one
> has about 40 rows.  The field in question here, the branch_id, is a
> BIGSERIAL in both.

I'd be astonished if a table of 40 rows ever got index scanned.  It's
probably more efficient to read the whole table.  But it seems your
case may be strange.  

> We don't allow deletes and updates are fairly infrequent.  I also did a
> vacuum analyze to no effect.

How do you "not allow" deletes?  Does this happen because of a failed
foreign key?  If so, you can end up with dead tuples.  I'd look at
the output of VACUUM VERBOSE to make sure you don't have a lot of
dead tuples.  That said, I wonder if fiddling with the statistics on
your tables might help.

A



Re: [SQL] Sequence vs. Index Scan

2007-05-05 Thread Aaron Bono

On 5/5/07, Tom Lane <[EMAIL PROTECTED]> wrote:

> "Aaron Bono" <[EMAIL PROTECTED]> writes:
> > 9.   ->  Seq Scan on branch  (cost=0.00..4.72 rows=1 width=1281) (actual time=130129.988..157492.057 rows=1 loops=1)
> > 10.        Filter: ((start_day <= now()) AND ((end_day IS NULL) OR (end_day >= now())) AND (branch_id = get_branch_for_zip('22151'::character varying)))
>
> There is something *awfully* wacko about that entry --- the fact that
> the cost estimate is less than 5 units means that the planner thinks
> there's 4 or fewer pages; either that's way wrong or the
> get_branch_for_zip function is taking enormous amounts of time per row.
> Have you tried timing that function on its own?


Yes, the first thing I did was run it on its own.  It takes about 2-6
seconds, which is not great but acceptable, and the function is marked as
stable.  The timing is the same, +/- a second or two, in each of the two
schemas.  The function is a plpgsql function that executes a dynamic SQL
string.  Would it be better to change it to a straight SQL function?  Would
that help the planner?  I may be able to change it if so.
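For what it's worth, a straight SQL version would look something like the
sketch below.  The body is purely illustrative: the real lookup logic never
appears in this thread, so the branch_area/zip_cd join is a guess at its
shape, and any column not visible in the plans above is an assumption.

```sql
-- Hypothetical SQL-language rewrite of get_branch_for_zip; in 8.1,
-- SQL functions refer to arguments positionally ($1)
CREATE OR REPLACE FUNCTION get_branch_for_zip(character varying)
RETURNS bigint AS $$
    SELECT ba.branch_id
    FROM branch_area ba
    JOIN zip_cd z ON z.zip_cd_id = ba.zip_cd_id  -- assumed join column
    WHERE z.zip_cd = $1                          -- assumed zip column
    LIMIT 1;
$$ LANGUAGE sql STABLE;
```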

> One possible reason for the performance difference is if you have
> get_branch_for_zip marked as stable in one database and volatile in the
> other --- volatile would prevent it from being used in an indexqual as
> you'd like.

I just switched the functions in both schemas to stable (before I posted
the original question); they were volatile.  After doing that, the fast
schema worked great, but the slow one is still underperforming.  I even
went back to the slow schema and dropped and recreated the function, with
no discernible difference.

This is PostgreSQL 8.1.3, and the results are the same on the Windows and
Linux versions.  The Windows version is a restore of the Linux version's
backup.



Re: [SQL] Sequence vs. Index Scan

2007-05-05 Thread Aaron Bono

On 5/5/07, Andrew Sullivan <[EMAIL PROTECTED]> wrote:

> On Sat, May 05, 2007 at 05:00:53PM -0500, Aaron Bono wrote:
>
> > We don't allow deletes and updates are fairly infrequent.  I also did a
> > vacuum analyze to no effect.
>
> How do you "not allow" deletes?


Permissions on the tables only allow the application to
select/insert/update.  We use start/end dates to indicate that records are
no longer active and have the application run against views that filter out
inactive records.  This also allows "removal" of records at some point in
the future, gives us the ability to "undelete" records, and keeps us from
having to code the application to clean up after foreign key constraints.
There are other advantages as well.

> I'd look at the output of VACUUM VERBOSE to make sure you don't have a
> lot of dead tuples.



Fast Schema:
psql:vacuumverbose.sql:1: INFO:  vacuuming "fast_schema.branch"
psql:vacuumverbose.sql:1: INFO:  index "branch_pkey" now contains 150 row versions in 2 pages
DETAIL:  0 index pages have been deleted, 0 are currently reusable.

Slow Schema:
psql:vacuumverbose.sql:1: INFO:  vacuuming "slow_schema.branch"
psql:vacuumverbose.sql:1: INFO:  index "branch_pkey" now contains 29 row versions in 2 pages
DETAIL:  0 index pages have been deleted, 0 are currently reusable.

Like Tom said earlier, I really think it has something to do with the
volatile/stable marking on the function, but I already changed that.  The
function in the slow schema acts as if it were still volatile, while the
fast schema clearly treats it as stable.

Is there another way to analyze how the database is using the function in
these selects, or a way to see how the optimizer is treating it?
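One trick that sidesteps per-row evaluation entirely, however the
volatility flag is being honored, is to wrap the call in a scalar
sub-select so it is evaluated once.  A sketch against the original query:

```sql
SELECT *
FROM branch_active_vw
WHERE branch_id = (SELECT get_branch_for_zip('22151'))
ORDER BY branch_name;
```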

I do appreciate the feedback.



Re: [SQL] Sequence vs. Index Scan

2007-05-05 Thread Aaron Bono

On 5/5/07, Tom Lane <[EMAIL PROTECTED]> wrote:

> "Aaron Bono" <[EMAIL PROTECTED]> writes:
> > 9.   ->  Seq Scan on branch  (cost=0.00..4.72 rows=1 width=1281) (actual time=130129.988..157492.057 rows=1 loops=1)
> > 10.        Filter: ((start_day <= now()) AND ((end_day IS NULL) OR (end_day >= now())) AND (branch_id = get_branch_for_zip('22151'::character varying)))
>
> There is something *awfully* wacko about that entry --- the fact that
> the cost estimate is less than 5 units means that the planner thinks
> there's 4 or fewer pages; either that's way wrong or the
> get_branch_for_zip function is taking enormous amounts of time per row.
> Have you tried timing that function on its own?
>
> One possible reason for the performance difference is if you have
> get_branch_for_zip marked as stable in one database and volatile in the
> other --- volatile would prevent it from being used in an indexqual as
> you'd like.



I verified it by putting a RAISE NOTICE in the function.  The fast schema
runs the function twice (odd; I would have thought it would run only once).
The slow schema runs it 30 times (the number of records returned + 1).  I
know I created the functions in both schemas as stable, and I even dropped
and recreated the function.  I then verified with EMS Manager, and it shows
the function's DDL in the database as stable.  Is there something I can do
to tell PostgreSQL that I really did mean stable?

Thanks for all the help,
Aaron



Re: [SQL] Sequence vs. Index Scan

2007-05-05 Thread Jaime Casanova

On 5/5/07, Aaron Bono <[EMAIL PROTECTED]> wrote:

> On 5/5/07, Tom Lane <[EMAIL PROTECTED]> wrote:
> > "Aaron Bono" <[EMAIL PROTECTED]> writes:
> > > 9.   ->  Seq Scan on branch  (cost=0.00..4.72 rows=1 width=1281) (actual time=130129.988..157492.057 rows=1 loops=1)
> > > 10.        Filter: ((start_day <= now()) AND ((end_day IS NULL) OR (end_day >= now())) AND (branch_id = get_branch_for_zip('22151'::character varying)))
> >
> > There is something *awfully* wacko about that entry --- the fact that
> > the cost estimate is less than 5 units means that the planner thinks
> > there's 4 or fewer pages; either that's way wrong or the
> > get_branch_for_zip function is taking enormous amounts of time per row.
> > Have you tried timing that function on its own?
> >
> > One possible reason for the performance difference is if you have
> > get_branch_for_zip marked as stable in one database and volatile in the
> > other --- volatile would prevent it from being used in an indexqual as
> > you'd like.
>
> I verified it by putting a RAISE NOTICE in the function.  The fast schema
> runs the function twice (odd, I would think it would run only once).  The
> slow schema runs it 30 times (the number of records returned + 1).  I know
> I put the functions into both schemas as stable and even dropped and
> recreated the function.  Then I verified with EMS Manager and it tells me
> the DDL for the function in the database is set to stable.  Is there
> something I can do to tell PostgreSQL that I really did mean stable?


Maybe this is silly, but you can verify what the database thinks of the
function by selecting from pg_proc:

SELECT pronamespace, provolatile
FROM pg_proc
WHERE proname = 'get_branch_for_zip';

--
regards,
Jaime Casanova

"Programming today is a race between software engineers striving to
build bigger and better idiot-proof programs and the universe trying
to produce bigger and better idiots.
So far, the universe is winning."
  Richard Cook
