Re: [PERFORM] cross table indexes or something?

2003-12-02 Thread Christopher Kings-Lynne
4) Currently, pg_dump does *not* back up statistics settings.


Is this a TODO?
Oops - sorry I thought you meant 'pg_dump does not back up statistics'. 
   Probably still should be a TODO :)

Chris



---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
   (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])


Re: [PERFORM] cross table indexes or something?

2003-12-02 Thread Christopher Kings-Lynne
4) Currently, pg_dump does *not* back up statistics settings.
Is this a TODO?

Chris



---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [PERFORM] cross table indexes or something?

2003-12-02 Thread Neil Conway
Josh Berkus <[EMAIL PROTECTED]> writes:
> Oh, good.   Was this a 7.4 improvement?

No, it was in 7.3

-Neil


---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html


Re: [PERFORM] cross table indexes or something?

2003-12-02 Thread Josh Berkus
Neil,

> > 1) to keep it working, you will probably need to run ANALZYE more
> >often than you have been;
> 
> I'm not sure why this would be the case -- can you elaborate?

For the more granular stats to be useful, they have to be accurate; otherwise 
you'll go back to a nestloop as soon as the query planner encounters a value 
that it doens't think is in the table at all. 

> 
> > 4) Currently, pg_dump does *not* back up statistics settings.
> 
> Yes, it does.

Oh, good.   Was this a 7.4 improvement?   I missed that in the changelogs 

-- 
-Josh Berkus
 Aglio Database Solutions
 San Francisco


---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [PERFORM] cross table indexes or something?

2003-12-02 Thread Neil Conway
Josh Berkus <[EMAIL PROTECTED]> writes:
> 1) to keep it working, you will probably need to run ANALZYE more
>often than you have been;

I'm not sure why this would be the case -- can you elaborate?

> 4) Currently, pg_dump does *not* back up statistics settings.

Yes, it does.

-Neil


---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [PERFORM] cross table indexes or something?

2003-12-02 Thread Josh Berkus
Jeremiah,

> Thanks to all, I had already run analyze.  But the STATISTICS setting
> seems to have worked. I'm just not sure what it did..? Would anyone care
> to explain.

The STATISTICS setting improves the granularity of statistics kept by the 
query planner on that column; increasing the granularity (i.e. more random 
samples) can significantly improve things in cases where you have data whose 
distribution is significantly skewed.Certainly whenever you see the query 
planner using a slow nestloop becuase of a bad row-return estimate, it is one 
of the first things to try.

Its drawbacks are 4-fold:
1) to keep it working, you will probably need to run ANALZYE more often than 
you have been;
2) these ANALYZEs will take longer, and have the annoying side effect of 
flooring your CPU while they do;
3) You will have to be sure that your vacuum plan includes vacuuming the 
pg_statistic table as the database superuser, as that table will be getting 
updated more often.
4) Currently, pg_dump does *not* back up statistics settings.  So you will 
need to save a script which does this in preparation for having to restore 
your database.

Which is why the stats are set low by default.

-- 
Josh Berkus
Aglio Database Solutions
San Francisco

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [PERFORM] cross table indexes or something?

2003-12-02 Thread Jeremiah Jahn
Thanks to all, I had already run analyze.  But the STATISTICS setting
seems to have worked. I'm just not sure what it did..? Would anyone care
to explain. 


On Mon, 2003-12-01 at 13:47, Josh Berkus wrote:
> Jeremiah,
> 
> > I've attached the Analyze below. I have no idea why the db thinks there
> > is only 1 judge named simth. Is there some what I can inform the DB
> > about this. In actuality, there aren't any judges named smith at the
> > moment, but there are 22K people named smith.
> 
> No, Hannu meant that you may need to run the following command:
> 
> ANALYZE actor;
> 
> ... to update the database statistics on the actors table.   That is a 
> maintainence task that needs to be run periodically.
> 
> If that doesn't fix the bad plan, then the granularity of statistics on the 
> full_name column needs updating; I suggest:
> 
> ALTER TABLE actor ALTER COLUMN full_name SET STATISTICS 100;
> ANALYZE actor;
> 
> And if it's still choosing  a slow nested loop, up the stats to 250.
-- 
Jeremiah Jahn <[EMAIL PROTECTED]>


---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [PERFORM] cross table indexes or something?

2003-12-01 Thread Josh Berkus
Jeremiah,

> I've attached the Analyze below. I have no idea why the db thinks there
> is only 1 judge named simth. Is there some what I can inform the DB
> about this. In actuality, there aren't any judges named smith at the
> moment, but there are 22K people named smith.

No, Hannu meant that you may need to run the following command:

ANALYZE actor;

... to update the database statistics on the actors table.   That is a 
maintainence task that needs to be run periodically.

If that doesn't fix the bad plan, then the granularity of statistics on the 
full_name column needs updating; I suggest:

ALTER TABLE actor ALTER COLUMN full_name SET STATISTICS 100;
ANALYZE actor;

And if it's still choosing  a slow nested loop, up the stats to 250.

-- 
Josh Berkus
Aglio Database Solutions
San Francisco

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [PERFORM] cross table indexes or something?

2003-12-01 Thread Arjen van der Meijden
> Jeremiah Jahn wrote:
>
> > Have you run ANALYZE ? Why does DB think that there is only 
> one judge 
> > with name like SMITH% ?
> I've attached the Analyze below. I have no idea why the db 
> thinks there is only 1 judge named simth. Is there some what 
> I can inform the DB about this. In actuality, there aren't 
> any judges named smith at the moment, but there are 22K 
> people named smith.
> 

I think you're mistaking the command EXPLAIN ANALYZE for the command
ANALYZE.
Have you actually run the command ANALYZE or perhaps even better if you
haven't vacuumed before: VACUUM FULL ANALYZE

If you have no idea what vacuum is, check the manual. If you've already
run such a VACUUM/ANALYZE-command, then ignore this message :)

Best regards,

Arjen van der Meijden




---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [PERFORM] cross table indexes or something?

2003-12-01 Thread Richard Huxton
On Monday 01 December 2003 14:29, Jeremiah Jahn wrote:
> On Wed, 2003-11-26 at 16:32, Hannu Krosing wrote:
> > Jeremiah Jahn kirjutas K, 26.11.2003 kell 22:14:
> > > I was wondering if there is something I can do that would act similar
> > > to a index over more than one table.
> > >
> > > I have about 3 million people in my DB at the moment, they all have
> > > roles, and many of them have more than one name.
> > >
> > > for example, a Judge will only have one name, but a Litigant could have
> > > multiple aliases. Things go far to slow when I do a query on a judge
> > > named smith.
> >
> > If you dont need all the judges named smith you could try to use LIMIT.
>
> Unfortunately I do need all of the judges named smith.
>
> > Have you run ANALYZE ? Why does DB think that there is only one judge
> > with name like SMITH% ?
>
> I've attached the Analyze below. I have no idea why the db thinks there
> is only 1 judge named simth. Is there some what I can inform the DB
> about this. In actuality, there aren't any judges named smith at the
> moment, but there are 22K people named smith.

It's guessing there's approximately 1. I don't think PG measures 
cross-correlation of various columns cross-table.

If role_class_code on table actor? If so, try:

CREATE INDEX test_judge_idx ON actor (actor_id) WHERE role_class_code = 
'Judge';

And then similar for the other class-codes (assuming you've not got too many 
of them). Or even just an index on (actor_id,role_class_code).

If role_class_code is on a different table, can you say which one? The problem 
is clearly this step:

> ->  Index Scan using actor_speed on
> actor  (cost=0.00..5.42 rows=1 width=50) (actual time=4.883..4.883 rows=0
> loops=22436)
>  Index Cond: (("outer".actor_id)::text =
> (actor.actor_id)::text) Filter: ((role_class_code)::text = 'Judge'::text)

Thats 4.883 * 22436 loops = 109555 milliseconds.
-- 
  Richard Huxton
  Archonet Ltd

---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] cross table indexes or something?

2003-12-01 Thread Jeremiah Jahn
On Wed, 2003-11-26 at 16:32, Hannu Krosing wrote:
> Jeremiah Jahn kirjutas K, 26.11.2003 kell 22:14:
> > I was wondering if there is something I can do that would act similar to
> > a index over more than one table. 
> > 
> > I have about 3 million people in my DB at the moment, they all have
> > roles, and many of them have more than one name. 
> > 
> > for example, a Judge will only have one name, but a Litigant could have
> > multiple aliases. Things go far to slow when I do a query on a judge
> > named smith.
> 
> If you dont need all the judges named smith you could try to use LIMIT.
Unfortunately I do need all of the judges named smith.


> 
> Have you run ANALYZE ? Why does DB think that there is only one judge
> with name like SMITH% ?
I've attached the Analyze below. I have no idea why the db thinks there
is only 1 judge named simth. Is there some what I can inform the DB
about this. In actuality, there aren't any judges named smith at the
moment, but there are 22K people named smith.


> 
> -
> Hannu
> 
> P.S. 
> Always send EXPLAIN ANALYZE output if asking for advice on [PERFORM]
 EXPLAIN ANALYZE select distinct 
actor.actor_id,court.id,court.name,role_class_code,full_name from 
actor,identity,court,event,event_actor where role_class_code = 'Judge' and full_name 
like 'SMITH%' and identity.actor_id = actor.actor_id and identity.court_ori = 
actor.court_ori and actor.court_ori = court.id and actor.actor_id = 
event_actor.actor_id and event_actor.event_id = event.event_id and event_date_time > 
'20021126' order by full_name;
  QUERY 
PLAN   
   
---
 Unique  (cost=686.42..686.44 rows=1 width=92) (actual time=111923.877..111923.877 
rows=0 loops=1)
   ->  Sort  (cost=686.42..686.43 rows=1 width=92) (actual time=111923.873..111923.873 
rows=0 loops=1)
 Sort Key: identity.full_name, actor.actor_id, court.id, court.name, 
actor.role_class_code
 ->  Nested Loop  (cost=8.45..686.41 rows=1 width=92) (actual 
time=111923.836..111923.836 rows=0 loops=1)
   ->  Nested Loop  (cost=8.45..680.57 rows=1 width=144) (actual 
time=109958.426..57.822 rows=2449 loops=1)
 ->  Hash Join  (cost=8.45..9.62 rows=1 width=117) (actual 
time=109945.754..109945.896 rows=6 loops=1)
   Hash Cond: (("outer".id)::text = ("inner".court_ori)::text)
   ->  Seq Scan on court  (cost=0.00..1.10 rows=10 width=34) 
(actual time=0.015..0.048 rows=10 loops=1)
   ->  Hash  (cost=8.45..8.45 rows=1 width=109) (actual 
time=109940.161..109940.161 rows=0 loops=1)
 ->  Nested Loop  (cost=0.00..8.45 rows=1 width=109) 
(actual time=10.367..109940.079 rows=7 loops=1)
   Join Filter: (("outer".court_ori)::text = 
("inner".court_ori)::text)
   ->  Index Scan using name_speed on identity  
(cost=0.00..3.01 rows=1 width=59) (actual time=10.202..238.497 rows=22436 loops=1)
 Index Cond: (((full_name)::text >= 
'SMITH'::character varying) AND ((full_name)::text < 'SMITI'::character varying))
 Filter: ((full_name)::text ~~ 
'SMITH%'::text)
   ->  Index Scan using actor_speed on actor  
(cost=0.00..5.42 rows=1 width=50) (actual time=4.883..4.883 rows=0 loops=22436)
 Index Cond: (("outer".actor_id)::text = 
(actor.actor_id)::text)
 Filter: ((role_class_code)::text = 
'Judge'::text)
 ->  Index Scan using event_actor_speed on event_actor  
(cost=0.00..655.59 rows=1229 width=73) (actual time=11.815..198.759 rows=408 loops=6)
   Index Cond: ((event_actor.actor_id)::text = 
("outer".actor_id)::text)
   ->  Index Scan using event_pkey on event  (cost=0.00..5.83 rows=1 
width=52) (actual time=0.308..0.308 rows=0 loops=2449)
 Index Cond: (("outer".event_id)::text = (event.event_id)::text)
 Filter: (event_date_time > '20021126'::bpchar)
 Total runtime: 111924.833 ms
(23 rows)



> 
> -
> Hannu
> 
> ---(end of broadcast)---
> TIP 4: Don't 'kill -9' the postmaster
-- 
Jeremiah Jahn <[EMAIL PROTECTED]>


---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECT

Re: [PERFORM] cross table indexes or something?

2003-11-26 Thread Hannu Krosing
Jeremiah Jahn kirjutas K, 26.11.2003 kell 22:14:
> I was wondering if there is something I can do that would act similar to
> a index over more than one table. 
> 
> I have about 3 million people in my DB at the moment, they all have
> roles, and many of them have more than one name. 
> 
> for example, a Judge will only have one name, but a Litigant could have
> multiple aliases. Things go far to slow when I do a query on a judge
> named smith.

If you dont need all the judges named smith you could try to use LIMIT.

Have you run ANALYZE ? Why does DB think that there is only one judge
with name like SMITH% ?

-
Hannu

P.S. 
Always send EXPLAIN ANALYZE output if asking for advice on [PERFORM]

-
Hannu

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [PERFORM] cross table indexes or something?

2003-11-26 Thread Marc A. Leith
Sybase IQ lets you build "joined indexsets". This is amazing but pricey
and really intended more for Data Warehousing than OLTP, although they did 
release a version which permitted writes on-the-fly. (This was implemented 
using a multi-concurrency solution much like PostreSQL uses.)

It essentially pre-joined the data.

Marc A. Leith
redboxdata inc.
E-mail:[EMAIL PROTECTED]

Quoting Jeremiah Jahn <[EMAIL PROTECTED]>:

> I was wondering if there is something I can do that would act similar to
> a index over more than one table. 
> 
> I have about 3 million people in my DB at the moment, they all have
> roles, and many of them have more than one name. 
> 
> for example, a Judge will only have one name, but a Litigant could have
> multiple aliases. Things go far to slow when I do a query on a judge
> named smith. Does any one know a possible way to speed this up? 
> 
> I would think that In a perfect world there would be a way to create an
> index on commonly used joins, or something of that nature. I've tried
> partial indexes, but the optimizer feels that it would be quicker to do
> an index scan for smith% then join using the pkey of the person to get
> their role. For litigants, this makes since, for non-litigants, this
> doesn't. 
> 
> thanx for any insight,
> -jj-
> 
> the basic schema
> 
> actor
>   actor_id PK
>   role_class_code
> 
> identity
>   actor_id FK
>   identity_id PK
>   full_name
> 
> event
>   event_date_time
>   event_id PK
> 
> event_actor
>   event_id FK
>   actor_id FK
> 
> 
> explain select distinct
> actor.actor_id,court.id,court.name,role_class_code,full_name from
> actor,identity,court,event,event_actor where role_class_code = 'Judge' and
> full_name like 'SMITH%' and identity.actor_id = actor.actor_id and
> identity.court_ori = actor.court_ori and actor.court_ori = court.id and
> actor.actor_id = event_actor.actor_id and event_actor.event_id =
> event.event_id and event_date_time > '20021126' order by full_name;
>
> QUERY PLAN
> --


>  Unique  (cost=726.57..726.58 rows=1 width=92)
>->  Sort  (cost=726.57..726.57 rows=1 width=92)
>  Sort Key: identity.full_name, actor.actor_id, court.id, court.name,
> actor.role_class_code
>  ->  Nested Loop  (cost=3.02..726.56 rows=1 width=92)
>->  Nested Loop  (cost=3.02..720.72 rows=1 width=144)
>  ->  Nested Loop  (cost=3.02..9.62 rows=1 width=117)
>Join Filter: (("outer".court_ori)::text =
> ("inner".court_ori)::text)
>->  Hash Join  (cost=3.02..4.18 rows=1 width=93)
>  Hash Cond: (("outer".id)::text =
> ("inner".court_ori)::text)
>  ->  Seq Scan on court  (cost=0.00..1.10
> rows=10 width=34)
>  ->  Hash  (cost=3.01..3.01 rows=1 width=59)
>->  Index Scan using name_speed on
> identity  (cost=0.00..3.01 rows=1 width=59)
>  Index Cond: (((full_name)::text
> >= 'SMITH'::character varying) AND ((full_name)::text < 'SMITI'::character
> varying))
>  Filter: ((full_name)::text ~~
> 'SMITH%'::text)
>->  Index Scan using actor_speed on actor 
> (cost=0.00..5.43 rows=1 width=50)
>  Index Cond: (("outer".actor_id)::text =
> (actor.actor_id)::text)
>  Filter: ((role_class_code)::text =
> 'Judge'::text)
>  ->  Index Scan using event_actor_speed on event_actor 
> (cost=0.00..695.15 rows=1275 width=73)
>Index Cond: ((event_actor.actor_id)::text =
> ("outer".actor_id)::text)
>->  Index Scan using event_pkey on event  (cost=0.00..5.83
> rows=1 width=52)
>  Index Cond: (("outer".event_id)::text =
> (event.event_id)::text)
>  Filter: (event_date_time > '20021126'::bpchar)
> 
> 
> -- 
> "You can't make a program without broken egos."
> -- 
> Jeremiah Jahn <[EMAIL PROTECTED]>
> 
> 
> ---(end of broadcast)---
> TIP 3: if posting/reading through Usenet, please send an appropriate
>   subscribe-nomail command to [EMAIL PROTECTED] so that your
>   message can get through to the mailing list cleanly
> 


---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])


[PERFORM] cross table indexes or something?

2003-11-26 Thread Jeremiah Jahn
I was wondering if there is something I can do that would act similar to
a index over more than one table. 

I have about 3 million people in my DB at the moment, they all have
roles, and many of them have more than one name. 

for example, a Judge will only have one name, but a Litigant could have
multiple aliases. Things go far to slow when I do a query on a judge
named smith. Does any one know a possible way to speed this up? 

I would think that In a perfect world there would be a way to create an
index on commonly used joins, or something of that nature. I've tried
partial indexes, but the optimizer feels that it would be quicker to do
an index scan for smith% then join using the pkey of the person to get
their role. For litigants, this makes since, for non-litigants, this
doesn't. 

thanx for any insight,
-jj-

the basic schema

actor
actor_id PK
role_class_code

identity
actor_id FK
identity_id PK
full_name

event
event_date_time
event_id PK

event_actor
event_id FK
actor_id FK


explain select distinct actor.actor_id,court.id,court.name,role_class_code,full_name 
from actor,identity,court,event,event_actor where role_class_code = 'Judge' and 
full_name like 'SMITH%' and identity.actor_id = actor.actor_id and identity.court_ori 
= actor.court_ori and actor.court_ori = court.id and actor.actor_id = 
event_actor.actor_id and event_actor.event_id = event.event_id and event_date_time > 
'20021126' order by full_name;
QUERY PLAN
--
 Unique  (cost=726.57..726.58 rows=1 width=92)
   ->  Sort  (cost=726.57..726.57 rows=1 width=92)
 Sort Key: identity.full_name, actor.actor_id, court.id, court.name, 
actor.role_class_code
 ->  Nested Loop  (cost=3.02..726.56 rows=1 width=92)
   ->  Nested Loop  (cost=3.02..720.72 rows=1 width=144)
 ->  Nested Loop  (cost=3.02..9.62 rows=1 width=117)
   Join Filter: (("outer".court_ori)::text = 
("inner".court_ori)::text)
   ->  Hash Join  (cost=3.02..4.18 rows=1 width=93)
 Hash Cond: (("outer".id)::text = 
("inner".court_ori)::text)
 ->  Seq Scan on court  (cost=0.00..1.10 rows=10 
width=34)
 ->  Hash  (cost=3.01..3.01 rows=1 width=59)
   ->  Index Scan using name_speed on identity  
(cost=0.00..3.01 rows=1 width=59)
 Index Cond: (((full_name)::text >= 
'SMITH'::character varying) AND ((full_name)::text < 'SMITI'::character varying))
 Filter: ((full_name)::text ~~ 
'SMITH%'::text)
   ->  Index Scan using actor_speed on actor  (cost=0.00..5.43 
rows=1 width=50)
 Index Cond: (("outer".actor_id)::text = 
(actor.actor_id)::text)
 Filter: ((role_class_code)::text = 'Judge'::text)
 ->  Index Scan using event_actor_speed on event_actor  
(cost=0.00..695.15 rows=1275 width=73)
   Index Cond: ((event_actor.actor_id)::text = 
("outer".actor_id)::text)
   ->  Index Scan using event_pkey on event  (cost=0.00..5.83 rows=1 
width=52)
 Index Cond: (("outer".event_id)::text = (event.event_id)::text)
 Filter: (event_date_time > '20021126'::bpchar)


-- 
"You can't make a program without broken egos."
-- 
Jeremiah Jahn <[EMAIL PROTECTED]>


---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly