Re: [PERFORM] cross table indexes or something?

2003-12-02 Thread Christopher Kings-Lynne
4) Currently, pg_dump does *not* back up statistics settings.


Is this a TODO?
Oops - sorry I thought you meant 'pg_dump does not back up statistics'. 
   Probably still should be a TODO :)

Chris



---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
   (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])


Re: [PERFORM] cross table indexes or something?

2003-12-02 Thread Christopher Kings-Lynne
4) Currently, pg_dump does *not* back up statistics settings.
Is this a TODO?

Chris



---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [PERFORM] cross table indexes or something?

2003-12-02 Thread Neil Conway
Josh Berkus <[EMAIL PROTECTED]> writes:
> Oh, good.   Was this a 7.4 improvement?

No, it was in 7.3

-Neil


---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html


Re: [PERFORM] cross table indexes or something?

2003-12-02 Thread Josh Berkus
Neil,

> > 1) to keep it working, you will probably need to run ANALZYE more
> >often than you have been;
> 
> I'm not sure why this would be the case -- can you elaborate?

For the more granular stats to be useful, they have to be accurate; otherwise 
you'll go back to a nestloop as soon as the query planner encounters a value 
that it doens't think is in the table at all. 

> 
> > 4) Currently, pg_dump does *not* back up statistics settings.
> 
> Yes, it does.

Oh, good.   Was this a 7.4 improvement?   I missed that in the changelogs 

-- 
-Josh Berkus
 Aglio Database Solutions
 San Francisco


---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [PERFORM] cross table indexes or something?

2003-12-02 Thread Neil Conway
Josh Berkus <[EMAIL PROTECTED]> writes:
> 1) to keep it working, you will probably need to run ANALZYE more
>often than you have been;

I'm not sure why this would be the case -- can you elaborate?

> 4) Currently, pg_dump does *not* back up statistics settings.

Yes, it does.

-Neil


---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [PERFORM] cross table indexes or something?

2003-12-02 Thread Josh Berkus
Jeremiah,

> Thanks to all, I had already run analyze.  But the STATISTICS setting
> seems to have worked. I'm just not sure what it did..? Would anyone care
> to explain.

The STATISTICS setting improves the granularity of statistics kept by the 
query planner on that column; increasing the granularity (i.e. more random 
samples) can significantly improve things in cases where you have data whose 
distribution is significantly skewed.Certainly whenever you see the query 
planner using a slow nestloop becuase of a bad row-return estimate, it is one 
of the first things to try.

Its drawbacks are 4-fold:
1) to keep it working, you will probably need to run ANALZYE more often than 
you have been;
2) these ANALYZEs will take longer, and have the annoying side effect of 
flooring your CPU while they do;
3) You will have to be sure that your vacuum plan includes vacuuming the 
pg_statistic table as the database superuser, as that table will be getting 
updated more often.
4) Currently, pg_dump does *not* back up statistics settings.  So you will 
need to save a script which does this in preparation for having to restore 
your database.

Which is why the stats are set low by default.

-- 
Josh Berkus
Aglio Database Solutions
San Francisco

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [PERFORM] cross table indexes or something?

2003-12-02 Thread Jeremiah Jahn
Thanks to all, I had already run analyze.  But the STATISTICS setting
seems to have worked. I'm just not sure what it did..? Would anyone care
to explain. 


On Mon, 2003-12-01 at 13:47, Josh Berkus wrote:
> Jeremiah,
> 
> > I've attached the Analyze below. I have no idea why the db thinks there
> > is only 1 judge named simth. Is there some what I can inform the DB
> > about this. In actuality, there aren't any judges named smith at the
> > moment, but there are 22K people named smith.
> 
> No, Hannu meant that you may need to run the following command:
> 
> ANALYZE actor;
> 
> ... to update the database statistics on the actors table.   That is a 
> maintainence task that needs to be run periodically.
> 
> If that doesn't fix the bad plan, then the granularity of statistics on the 
> full_name column needs updating; I suggest:
> 
> ALTER TABLE actor ALTER COLUMN full_name SET STATISTICS 100;
> ANALYZE actor;
> 
> And if it's still choosing  a slow nested loop, up the stats to 250.
-- 
Jeremiah Jahn <[EMAIL PROTECTED]>


---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [PERFORM] cross table indexes or something?

2003-12-01 Thread Josh Berkus
Jeremiah,

> I've attached the Analyze below. I have no idea why the db thinks there
> is only 1 judge named simth. Is there some what I can inform the DB
> about this. In actuality, there aren't any judges named smith at the
> moment, but there are 22K people named smith.

No, Hannu meant that you may need to run the following command:

ANALYZE actor;

... to update the database statistics on the actors table.   That is a 
maintainence task that needs to be run periodically.

If that doesn't fix the bad plan, then the granularity of statistics on the 
full_name column needs updating; I suggest:

ALTER TABLE actor ALTER COLUMN full_name SET STATISTICS 100;
ANALYZE actor;

And if it's still choosing  a slow nested loop, up the stats to 250.

-- 
Josh Berkus
Aglio Database Solutions
San Francisco

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [PERFORM] cross table indexes or something?

2003-12-01 Thread Arjen van der Meijden
> Jeremiah Jahn wrote:
>
> > Have you run ANALYZE ? Why does DB think that there is only 
> one judge 
> > with name like SMITH% ?
> I've attached the Analyze below. I have no idea why the db 
> thinks there is only 1 judge named simth. Is there some what 
> I can inform the DB about this. In actuality, there aren't 
> any judges named smith at the moment, but there are 22K 
> people named smith.
> 

I think you're mistaking the command EXPLAIN ANALYZE for the command
ANALYZE.
Have you actually run the command ANALYZE or perhaps even better if you
haven't vacuumed before: VACUUM FULL ANALYZE

If you have no idea what vacuum is, check the manual. If you've already
run such a VACUUM/ANALYZE-command, then ignore this message :)

Best regards,

Arjen van der Meijden




---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [PERFORM] cross table indexes or something?

2003-12-01 Thread Richard Huxton
On Monday 01 December 2003 14:29, Jeremiah Jahn wrote:
> On Wed, 2003-11-26 at 16:32, Hannu Krosing wrote:
> > Jeremiah Jahn kirjutas K, 26.11.2003 kell 22:14:
> > > I was wondering if there is something I can do that would act similar
> > > to a index over more than one table.
> > >
> > > I have about 3 million people in my DB at the moment, they all have
> > > roles, and many of them have more than one name.
> > >
> > > for example, a Judge will only have one name, but a Litigant could have
> > > multiple aliases. Things go far to slow when I do a query on a judge
> > > named smith.
> >
> > If you dont need all the judges named smith you could try to use LIMIT.
>
> Unfortunately I do need all of the judges named smith.
>
> > Have you run ANALYZE ? Why does DB think that there is only one judge
> > with name like SMITH% ?
>
> I've attached the Analyze below. I have no idea why the db thinks there
> is only 1 judge named simth. Is there some what I can inform the DB
> about this. In actuality, there aren't any judges named smith at the
> moment, but there are 22K people named smith.

It's guessing there's approximately 1. I don't think PG measures 
cross-correlation of various columns cross-table.

If role_class_code on table actor? If so, try:

CREATE INDEX test_judge_idx ON actor (actor_id) WHERE role_class_code = 
'Judge';

And then similar for the other class-codes (assuming you've not got too many 
of them). Or even just an index on (actor_id,role_class_code).

If role_class_code is on a different table, can you say which one? The problem 
is clearly this step:

> ->  Index Scan using actor_speed on
> actor  (cost=0.00..5.42 rows=1 width=50) (actual time=4.883..4.883 rows=0
> loops=22436)
>  Index Cond: (("outer".actor_id)::text =
> (actor.actor_id)::text) Filter: ((role_class_code)::text = 'Judge'::text)

Thats 4.883 * 22436 loops = 109555 milliseconds.
-- 
  Richard Huxton
  Archonet Ltd

---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] cross table indexes or something?

2003-12-01 Thread Jeremiah Jahn
On Wed, 2003-11-26 at 16:32, Hannu Krosing wrote:
> Jeremiah Jahn kirjutas K, 26.11.2003 kell 22:14:
> > I was wondering if there is something I can do that would act similar to
> > a index over more than one table. 
> > 
> > I have about 3 million people in my DB at the moment, they all have
> > roles, and many of them have more than one name. 
> > 
> > for example, a Judge will only have one name, but a Litigant could have
> > multiple aliases. Things go far to slow when I do a query on a judge
> > named smith.
> 
> If you dont need all the judges named smith you could try to use LIMIT.
Unfortunately I do need all of the judges named smith.


> 
> Have you run ANALYZE ? Why does DB think that there is only one judge
> with name like SMITH% ?
I've attached the Analyze below. I have no idea why the db thinks there
is only 1 judge named simth. Is there some what I can inform the DB
about this. In actuality, there aren't any judges named smith at the
moment, but there are 22K people named smith.


> 
> -
> Hannu
> 
> P.S. 
> Always send EXPLAIN ANALYZE output if asking for advice on [PERFORM]
 EXPLAIN ANALYZE select distinct 
actor.actor_id,court.id,court.name,role_class_code,full_name from 
actor,identity,court,event,event_actor where role_class_code = 'Judge' and full_name 
like 'SMITH%' and identity.actor_id = actor.actor_id and identity.court_ori = 
actor.court_ori and actor.court_ori = court.id and actor.actor_id = 
event_actor.actor_id and event_actor.event_id = event.event_id and event_date_time > 
'20021126' order by full_name;
  QUERY 
PLAN   
   
---
 Unique  (cost=686.42..686.44 rows=1 width=92) (actual time=111923.877..111923.877 
rows=0 loops=1)
   ->  Sort  (cost=686.42..686.43 rows=1 width=92) (actual time=111923.873..111923.873 
rows=0 loops=1)
 Sort Key: identity.full_name, actor.actor_id, court.id, court.name, 
actor.role_class_code
 ->  Nested Loop  (cost=8.45..686.41 rows=1 width=92) (actual 
time=111923.836..111923.836 rows=0 loops=1)
   ->  Nested Loop  (cost=8.45..680.57 rows=1 width=144) (actual 
time=109958.426..57.822 rows=2449 loops=1)
 ->  Hash Join  (cost=8.45..9.62 rows=1 width=117) (actual 
time=109945.754..109945.896 rows=6 loops=1)
   Hash Cond: (("outer".id)::text = ("inner".court_ori)::text)
   ->  Seq Scan on court  (cost=0.00..1.10 rows=10 width=34) 
(actual time=0.015..0.048 rows=10 loops=1)
   ->  Hash  (cost=8.45..8.45 rows=1 width=109) (actual 
time=109940.161..109940.161 rows=0 loops=1)
 ->  Nested Loop  (cost=0.00..8.45 rows=1 width=109) 
(actual time=10.367..109940.079 rows=7 loops=1)
   Join Filter: (("outer".court_ori)::text = 
("inner".court_ori)::text)
   ->  Index Scan using name_speed on identity  
(cost=0.00..3.01 rows=1 width=59) (actual time=10.202..238.497 rows=22436 loops=1)
 Index Cond: (((full_name)::text >= 
'SMITH'::character varying) AND ((full_name)::text < 'SMITI'::character varying))
 Filter: ((full_name)::text ~~ 
'SMITH%'::text)
   ->  Index Scan using actor_speed on actor  
(cost=0.00..5.42 rows=1 width=50) (actual time=4.883..4.883 rows=0 loops=22436)
 Index Cond: (("outer".actor_id)::text = 
(actor.actor_id)::text)
 Filter: ((role_class_code)::text = 
'Judge'::text)
 ->  Index Scan using event_actor_speed on event_actor  
(cost=0.00..655.59 rows=1229 width=73) (actual time=11.815..198.759 rows=408 loops=6)
   Index Cond: ((event_actor.actor_id)::text = 
("outer".actor_id)::text)
   ->  Index Scan using event_pkey on event  (cost=0.00..5.83 rows=1 
width=52) (actual time=0.308..0.308 rows=0 loops=2449)
 Index Cond: (("outer".event_id)::text = (event.event_id)::text)
 Filter: (event_date_time > '20021126'::bpchar)
 Total runtime: 111924.833 ms
(23 rows)



> 
> -
> Hannu
> 
> ---(end of broadcast)---
> TIP 4: Don't 'kill -9' the postmaster
-- 
Jeremiah Jahn <[EMAIL PROTECTED]>


---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECT

Re: [PERFORM] cross table indexes or something?

2003-11-26 Thread Hannu Krosing
Jeremiah Jahn kirjutas K, 26.11.2003 kell 22:14:
> I was wondering if there is something I can do that would act similar to
> a index over more than one table. 
> 
> I have about 3 million people in my DB at the moment, they all have
> roles, and many of them have more than one name. 
> 
> for example, a Judge will only have one name, but a Litigant could have
> multiple aliases. Things go far to slow when I do a query on a judge
> named smith.

If you dont need all the judges named smith you could try to use LIMIT.

Have you run ANALYZE ? Why does DB think that there is only one judge
with name like SMITH% ?

-
Hannu

P.S. 
Always send EXPLAIN ANALYZE output if asking for advice on [PERFORM]

-
Hannu

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [PERFORM] cross table indexes or something?

2003-11-26 Thread Marc A. Leith
Sybase IQ lets you build "joined indexsets". This is amazing but pricey
and really intended more for Data Warehousing than OLTP, although they did 
release a version which permitted writes on-the-fly. (This was implemented 
using a multi-concurrency solution much like PostreSQL uses.)

It essentially pre-joined the data.

Marc A. Leith
redboxdata inc.
E-mail:[EMAIL PROTECTED]

Quoting Jeremiah Jahn <[EMAIL PROTECTED]>:

> I was wondering if there is something I can do that would act similar to
> a index over more than one table. 
> 
> I have about 3 million people in my DB at the moment, they all have
> roles, and many of them have more than one name. 
> 
> for example, a Judge will only have one name, but a Litigant could have
> multiple aliases. Things go far to slow when I do a query on a judge
> named smith. Does any one know a possible way to speed this up? 
> 
> I would think that In a perfect world there would be a way to create an
> index on commonly used joins, or something of that nature. I've tried
> partial indexes, but the optimizer feels that it would be quicker to do
> an index scan for smith% then join using the pkey of the person to get
> their role. For litigants, this makes since, for non-litigants, this
> doesn't. 
> 
> thanx for any insight,
> -jj-
> 
> the basic schema
> 
> actor
>   actor_id PK
>   role_class_code
> 
> identity
>   actor_id FK
>   identity_id PK
>   full_name
> 
> event
>   event_date_time
>   event_id PK
> 
> event_actor
>   event_id FK
>   actor_id FK
> 
> 
> explain select distinct
> actor.actor_id,court.id,court.name,role_class_code,full_name from
> actor,identity,court,event,event_actor where role_class_code = 'Judge' and
> full_name like 'SMITH%' and identity.actor_id = actor.actor_id and
> identity.court_ori = actor.court_ori and actor.court_ori = court.id and
> actor.actor_id = event_actor.actor_id and event_actor.event_id =
> event.event_id and event_date_time > '20021126' order by full_name;
>
> QUERY PLAN
> --


>  Unique  (cost=726.57..726.58 rows=1 width=92)
>->  Sort  (cost=726.57..726.57 rows=1 width=92)
>  Sort Key: identity.full_name, actor.actor_id, court.id, court.name,
> actor.role_class_code
>  ->  Nested Loop  (cost=3.02..726.56 rows=1 width=92)
>->  Nested Loop  (cost=3.02..720.72 rows=1 width=144)
>  ->  Nested Loop  (cost=3.02..9.62 rows=1 width=117)
>Join Filter: (("outer".court_ori)::text =
> ("inner".court_ori)::text)
>->  Hash Join  (cost=3.02..4.18 rows=1 width=93)
>  Hash Cond: (("outer".id)::text =
> ("inner".court_ori)::text)
>  ->  Seq Scan on court  (cost=0.00..1.10
> rows=10 width=34)
>  ->  Hash  (cost=3.01..3.01 rows=1 width=59)
>->  Index Scan using name_speed on
> identity  (cost=0.00..3.01 rows=1 width=59)
>  Index Cond: (((full_name)::text
> >= 'SMITH'::character varying) AND ((full_name)::text < 'SMITI'::character
> varying))
>  Filter: ((full_name)::text ~~
> 'SMITH%'::text)
>->  Index Scan using actor_speed on actor 
> (cost=0.00..5.43 rows=1 width=50)
>  Index Cond: (("outer".actor_id)::text =
> (actor.actor_id)::text)
>  Filter: ((role_class_code)::text =
> 'Judge'::text)
>  ->  Index Scan using event_actor_speed on event_actor 
> (cost=0.00..695.15 rows=1275 width=73)
>Index Cond: ((event_actor.actor_id)::text =
> ("outer".actor_id)::text)
>->  Index Scan using event_pkey on event  (cost=0.00..5.83
> rows=1 width=52)
>  Index Cond: (("outer".event_id)::text =
> (event.event_id)::text)
>  Filter: (event_date_time > '20021126'::bpchar)
> 
> 
> -- 
> "You can't make a program without broken egos."
> -- 
> Jeremiah Jahn <[EMAIL PROTECTED]>
> 
> 
> ---(end of broadcast)---
> TIP 3: if posting/reading through Usenet, please send an appropriate
>   subscribe-nomail command to [EMAIL PROTECTED] so that your
>   message can get through to the mailing list cleanly
> 


---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])