Re: [PERFORM] cross table indexes or something?
4) Currently, pg_dump does *not* back up statistics settings. Is this a TODO? Oops - sorry I thought you meant 'pg_dump does not back up statistics'. Probably still should be a TODO :) Chris ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])
Re: [PERFORM] cross table indexes or something?
4) Currently, pg_dump does *not* back up statistics settings. Is this a TODO? Chris ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [PERFORM] cross table indexes or something?
Josh Berkus <[EMAIL PROTECTED]> writes: > Oh, good. Was this a 7.4 improvement? No, it was in 7.3 -Neil ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faqs/FAQ.html
Re: [PERFORM] cross table indexes or something?
Neil, > > 1) to keep it working, you will probably need to run ANALZYE more > >often than you have been; > > I'm not sure why this would be the case -- can you elaborate? For the more granular stats to be useful, they have to be accurate; otherwise you'll go back to a nestloop as soon as the query planner encounters a value that it doens't think is in the table at all. > > > 4) Currently, pg_dump does *not* back up statistics settings. > > Yes, it does. Oh, good. Was this a 7.4 improvement? I missed that in the changelogs -- -Josh Berkus Aglio Database Solutions San Francisco ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
Re: [PERFORM] cross table indexes or something?
Josh Berkus <[EMAIL PROTECTED]> writes: > 1) to keep it working, you will probably need to run ANALZYE more >often than you have been; I'm not sure why this would be the case -- can you elaborate? > 4) Currently, pg_dump does *not* back up statistics settings. Yes, it does. -Neil ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] cross table indexes or something?
Jeremiah, > Thanks to all, I had already run analyze. But the STATISTICS setting > seems to have worked. I'm just not sure what it did..? Would anyone care > to explain. The STATISTICS setting improves the granularity of statistics kept by the query planner on that column; increasing the granularity (i.e. more random samples) can significantly improve things in cases where you have data whose distribution is significantly skewed.Certainly whenever you see the query planner using a slow nestloop becuase of a bad row-return estimate, it is one of the first things to try. Its drawbacks are 4-fold: 1) to keep it working, you will probably need to run ANALZYE more often than you have been; 2) these ANALYZEs will take longer, and have the annoying side effect of flooring your CPU while they do; 3) You will have to be sure that your vacuum plan includes vacuuming the pg_statistic table as the database superuser, as that table will be getting updated more often. 4) Currently, pg_dump does *not* back up statistics settings. So you will need to save a script which does this in preparation for having to restore your database. Which is why the stats are set low by default. -- Josh Berkus Aglio Database Solutions San Francisco ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [PERFORM] cross table indexes or something?
Thanks to all, I had already run analyze. But the STATISTICS setting seems to have worked. I'm just not sure what it did..? Would anyone care to explain. On Mon, 2003-12-01 at 13:47, Josh Berkus wrote: > Jeremiah, > > > I've attached the Analyze below. I have no idea why the db thinks there > > is only 1 judge named simth. Is there some what I can inform the DB > > about this. In actuality, there aren't any judges named smith at the > > moment, but there are 22K people named smith. > > No, Hannu meant that you may need to run the following command: > > ANALYZE actor; > > ... to update the database statistics on the actors table. That is a > maintainence task that needs to be run periodically. > > If that doesn't fix the bad plan, then the granularity of statistics on the > full_name column needs updating; I suggest: > > ALTER TABLE actor ALTER COLUMN full_name SET STATISTICS 100; > ANALYZE actor; > > And if it's still choosing a slow nested loop, up the stats to 250. -- Jeremiah Jahn <[EMAIL PROTECTED]> ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] cross table indexes or something?
Jeremiah, > I've attached the Analyze below. I have no idea why the db thinks there > is only 1 judge named simth. Is there some what I can inform the DB > about this. In actuality, there aren't any judges named smith at the > moment, but there are 22K people named smith. No, Hannu meant that you may need to run the following command: ANALYZE actor; ... to update the database statistics on the actors table. That is a maintainence task that needs to be run periodically. If that doesn't fix the bad plan, then the granularity of statistics on the full_name column needs updating; I suggest: ALTER TABLE actor ALTER COLUMN full_name SET STATISTICS 100; ANALYZE actor; And if it's still choosing a slow nested loop, up the stats to 250. -- Josh Berkus Aglio Database Solutions San Francisco ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] cross table indexes or something?
> Jeremiah Jahn wrote: > > > Have you run ANALYZE ? Why does DB think that there is only > one judge > > with name like SMITH% ? > I've attached the Analyze below. I have no idea why the db > thinks there is only 1 judge named simth. Is there some what > I can inform the DB about this. In actuality, there aren't > any judges named smith at the moment, but there are 22K > people named smith. > I think you're mistaking the command EXPLAIN ANALYZE for the command ANALYZE. Have you actually run the command ANALYZE or perhaps even better if you haven't vacuumed before: VACUUM FULL ANALYZE If you have no idea what vacuum is, check the manual. If you've already run such a VACUUM/ANALYZE-command, then ignore this message :) Best regards, Arjen van der Meijden ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [PERFORM] cross table indexes or something?
On Monday 01 December 2003 14:29, Jeremiah Jahn wrote: > On Wed, 2003-11-26 at 16:32, Hannu Krosing wrote: > > Jeremiah Jahn kirjutas K, 26.11.2003 kell 22:14: > > > I was wondering if there is something I can do that would act similar > > > to a index over more than one table. > > > > > > I have about 3 million people in my DB at the moment, they all have > > > roles, and many of them have more than one name. > > > > > > for example, a Judge will only have one name, but a Litigant could have > > > multiple aliases. Things go far to slow when I do a query on a judge > > > named smith. > > > > If you dont need all the judges named smith you could try to use LIMIT. > > Unfortunately I do need all of the judges named smith. > > > Have you run ANALYZE ? Why does DB think that there is only one judge > > with name like SMITH% ? > > I've attached the Analyze below. I have no idea why the db thinks there > is only 1 judge named simth. Is there some what I can inform the DB > about this. In actuality, there aren't any judges named smith at the > moment, but there are 22K people named smith. It's guessing there's approximately 1. I don't think PG measures cross-correlation of various columns cross-table. If role_class_code on table actor? If so, try: CREATE INDEX test_judge_idx ON actor (actor_id) WHERE role_class_code = 'Judge'; And then similar for the other class-codes (assuming you've not got too many of them). Or even just an index on (actor_id,role_class_code). If role_class_code is on a different table, can you say which one? The problem is clearly this step: > -> Index Scan using actor_speed on > actor (cost=0.00..5.42 rows=1 width=50) (actual time=4.883..4.883 rows=0 > loops=22436) > Index Cond: (("outer".actor_id)::text = > (actor.actor_id)::text) Filter: ((role_class_code)::text = 'Judge'::text) Thats 4.883 * 22436 loops = 109555 milliseconds. -- Richard Huxton Archonet Ltd ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] cross table indexes or something?
On Wed, 2003-11-26 at 16:32, Hannu Krosing wrote: > Jeremiah Jahn kirjutas K, 26.11.2003 kell 22:14: > > I was wondering if there is something I can do that would act similar to > > a index over more than one table. > > > > I have about 3 million people in my DB at the moment, they all have > > roles, and many of them have more than one name. > > > > for example, a Judge will only have one name, but a Litigant could have > > multiple aliases. Things go far to slow when I do a query on a judge > > named smith. > > If you dont need all the judges named smith you could try to use LIMIT. Unfortunately I do need all of the judges named smith. > > Have you run ANALYZE ? Why does DB think that there is only one judge > with name like SMITH% ? I've attached the Analyze below. I have no idea why the db thinks there is only 1 judge named simth. Is there some what I can inform the DB about this. In actuality, there aren't any judges named smith at the moment, but there are 22K people named smith. > > - > Hannu > > P.S. > Always send EXPLAIN ANALYZE output if asking for advice on [PERFORM] EXPLAIN ANALYZE select distinct actor.actor_id,court.id,court.name,role_class_code,full_name from actor,identity,court,event,event_actor where role_class_code = 'Judge' and full_name like 'SMITH%' and identity.actor_id = actor.actor_id and identity.court_ori = actor.court_ori and actor.court_ori = court.id and actor.actor_id = event_actor.actor_id and event_actor.event_id = event.event_id and event_date_time > '20021126' order by full_name; QUERY PLAN --- Unique (cost=686.42..686.44 rows=1 width=92) (actual time=111923.877..111923.877 rows=0 loops=1) -> Sort (cost=686.42..686.43 rows=1 width=92) (actual time=111923.873..111923.873 rows=0 loops=1) Sort Key: identity.full_name, actor.actor_id, court.id, court.name, actor.role_class_code -> Nested Loop (cost=8.45..686.41 rows=1 width=92) (actual time=111923.836..111923.836 rows=0 loops=1) -> Nested Loop (cost=8.45..680.57 rows=1 width=144) (actual time=109958.426..57.822 rows=2449 loops=1) -> Hash Join (cost=8.45..9.62 rows=1 width=117) (actual time=109945.754..109945.896 rows=6 loops=1) Hash Cond: (("outer".id)::text = ("inner".court_ori)::text) -> Seq Scan on court (cost=0.00..1.10 rows=10 width=34) (actual time=0.015..0.048 rows=10 loops=1) -> Hash (cost=8.45..8.45 rows=1 width=109) (actual time=109940.161..109940.161 rows=0 loops=1) -> Nested Loop (cost=0.00..8.45 rows=1 width=109) (actual time=10.367..109940.079 rows=7 loops=1) Join Filter: (("outer".court_ori)::text = ("inner".court_ori)::text) -> Index Scan using name_speed on identity (cost=0.00..3.01 rows=1 width=59) (actual time=10.202..238.497 rows=22436 loops=1) Index Cond: (((full_name)::text >= 'SMITH'::character varying) AND ((full_name)::text < 'SMITI'::character varying)) Filter: ((full_name)::text ~~ 'SMITH%'::text) -> Index Scan using actor_speed on actor (cost=0.00..5.42 rows=1 width=50) (actual time=4.883..4.883 rows=0 loops=22436) Index Cond: (("outer".actor_id)::text = (actor.actor_id)::text) Filter: ((role_class_code)::text = 'Judge'::text) -> Index Scan using event_actor_speed on event_actor (cost=0.00..655.59 rows=1229 width=73) (actual time=11.815..198.759 rows=408 loops=6) Index Cond: ((event_actor.actor_id)::text = ("outer".actor_id)::text) -> Index Scan using event_pkey on event (cost=0.00..5.83 rows=1 width=52) (actual time=0.308..0.308 rows=0 loops=2449) Index Cond: (("outer".event_id)::text = (event.event_id)::text) Filter: (event_date_time > '20021126'::bpchar) Total runtime: 111924.833 ms (23 rows) > > - > Hannu > > ---(end of broadcast)--- > TIP 4: Don't 'kill -9' the postmaster -- Jeremiah Jahn <[EMAIL PROTECTED]> ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to [EMAIL PROTECT
Re: [PERFORM] cross table indexes or something?
Jeremiah Jahn kirjutas K, 26.11.2003 kell 22:14: > I was wondering if there is something I can do that would act similar to > a index over more than one table. > > I have about 3 million people in my DB at the moment, they all have > roles, and many of them have more than one name. > > for example, a Judge will only have one name, but a Litigant could have > multiple aliases. Things go far to slow when I do a query on a judge > named smith. If you dont need all the judges named smith you could try to use LIMIT. Have you run ANALYZE ? Why does DB think that there is only one judge with name like SMITH% ? - Hannu P.S. Always send EXPLAIN ANALYZE output if asking for advice on [PERFORM] - Hannu ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [PERFORM] cross table indexes or something?
Sybase IQ lets you build "joined indexsets". This is amazing but pricey and really intended more for Data Warehousing than OLTP, although they did release a version which permitted writes on-the-fly. (This was implemented using a multi-concurrency solution much like PostreSQL uses.) It essentially pre-joined the data. Marc A. Leith redboxdata inc. E-mail:[EMAIL PROTECTED] Quoting Jeremiah Jahn <[EMAIL PROTECTED]>: > I was wondering if there is something I can do that would act similar to > a index over more than one table. > > I have about 3 million people in my DB at the moment, they all have > roles, and many of them have more than one name. > > for example, a Judge will only have one name, but a Litigant could have > multiple aliases. Things go far to slow when I do a query on a judge > named smith. Does any one know a possible way to speed this up? > > I would think that In a perfect world there would be a way to create an > index on commonly used joins, or something of that nature. I've tried > partial indexes, but the optimizer feels that it would be quicker to do > an index scan for smith% then join using the pkey of the person to get > their role. For litigants, this makes since, for non-litigants, this > doesn't. > > thanx for any insight, > -jj- > > the basic schema > > actor > actor_id PK > role_class_code > > identity > actor_id FK > identity_id PK > full_name > > event > event_date_time > event_id PK > > event_actor > event_id FK > actor_id FK > > > explain select distinct > actor.actor_id,court.id,court.name,role_class_code,full_name from > actor,identity,court,event,event_actor where role_class_code = 'Judge' and > full_name like 'SMITH%' and identity.actor_id = actor.actor_id and > identity.court_ori = actor.court_ori and actor.court_ori = court.id and > actor.actor_id = event_actor.actor_id and event_actor.event_id = > event.event_id and event_date_time > '20021126' order by full_name; > > QUERY PLAN > -- > Unique (cost=726.57..726.58 rows=1 width=92) >-> Sort (cost=726.57..726.57 rows=1 width=92) > Sort Key: identity.full_name, actor.actor_id, court.id, court.name, > actor.role_class_code > -> Nested Loop (cost=3.02..726.56 rows=1 width=92) >-> Nested Loop (cost=3.02..720.72 rows=1 width=144) > -> Nested Loop (cost=3.02..9.62 rows=1 width=117) >Join Filter: (("outer".court_ori)::text = > ("inner".court_ori)::text) >-> Hash Join (cost=3.02..4.18 rows=1 width=93) > Hash Cond: (("outer".id)::text = > ("inner".court_ori)::text) > -> Seq Scan on court (cost=0.00..1.10 > rows=10 width=34) > -> Hash (cost=3.01..3.01 rows=1 width=59) >-> Index Scan using name_speed on > identity (cost=0.00..3.01 rows=1 width=59) > Index Cond: (((full_name)::text > >= 'SMITH'::character varying) AND ((full_name)::text < 'SMITI'::character > varying)) > Filter: ((full_name)::text ~~ > 'SMITH%'::text) >-> Index Scan using actor_speed on actor > (cost=0.00..5.43 rows=1 width=50) > Index Cond: (("outer".actor_id)::text = > (actor.actor_id)::text) > Filter: ((role_class_code)::text = > 'Judge'::text) > -> Index Scan using event_actor_speed on event_actor > (cost=0.00..695.15 rows=1275 width=73) >Index Cond: ((event_actor.actor_id)::text = > ("outer".actor_id)::text) >-> Index Scan using event_pkey on event (cost=0.00..5.83 > rows=1 width=52) > Index Cond: (("outer".event_id)::text = > (event.event_id)::text) > Filter: (event_date_time > '20021126'::bpchar) > > > -- > "You can't make a program without broken egos." > -- > Jeremiah Jahn <[EMAIL PROTECTED]> > > > ---(end of broadcast)--- > TIP 3: if posting/reading through Usenet, please send an appropriate > subscribe-nomail command to [EMAIL PROTECTED] so that your > message can get through to the mailing list cleanly > ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])