Re: [PERFORM] postgres eating CPU
Josh Berkus wrote: in March there was an interesting discussion on the list with the subject "postgres eating CPU on HP9000". http://archives.postgresql.org/pgsql-performance/2004-03/msg00380.php -- Command Prompt, Inc., home of Mammoth PostgreSQL - S/ODBC and S/JDBC Postgresql support, programming shared hosting and dedicated hosting. +1-503-667-4564 - [EMAIL PROTECTED] - http://www.commandprompt.com PostgreSQL Replicator -- production quality replication for PostgreSQL begin:vcard fn:Joshua Drake n:Drake;Joshua org:Command Prompt, Inc. adr:;;PO Box 215 ;Cascade Locks;OR;97014;US email;internet:[EMAIL PROTECTED] title:Consultant tel;work:503-667-4564 tel;fax:503-210-0334 x-mozilla-html:FALSE url:http://www.commandprompt.com version:2.1 end:vcard ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] Analyzer is clueless
>> I've seen cases where it seems the >> planer doesn't think it'll be getting a unique value or a small set of >> values even though stats indicates that it should be. A test case exhibiting the problem would be helpful. regards, tom lane ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] Analyzer is clueless
Jim, > Is there still a good reason to have the histogram stats so low? Should > the default be changed to more like 100 at this point? Low overhead. This is actually a TODO for me for 8.1. I need to find some test cases to set a differential level of histogram access for indexed fields, so like 10 for most fields but 100/150/200 for indexed fields. However, I got stalled on finding test cases and then ran out of time. > Also, how extensively does the planner use n_distinct, null_frac, > reltuples and the histogram to see what the odds are of finding a unique > value or a low number of values? I've seen cases where it seems the > planer doesn't think it'll be getting a unique value or a small set of > values even though stats indicates that it should be. > > One final question... would there be interest in a process that would > dynamically update the histogram settings for tables based on how > distinct/unique each field was? Well, the process by which the analyzer decides that a field is unique could probably use some troubleshooting. And we always, always could use suggestions/tests/help with the query planner. -- --Josh Josh Berkus Aglio Database Solutions San Francisco ---(end of broadcast)--- TIP 8: explain analyze is your friend
Re: [PERFORM] Analyzer is clueless
On Wed, Nov 17, 2004 at 10:32:48PM +, Simon Riggs wrote: > The main issue is that PostgreSQL's default histogram statistics setting > is lower than other RDBMS. This means that it is less able to > discriminate between cases such as yours that are close to the edge. > This is a trade-off between run-time of the ANALYZE command and the > benefit it produces. As Joshua suggests, increasing the statistics > target for this table will likely allow the optimizer to correctly > determine the selectivity of the index and take the right path. Is there still a good reason to have the histogram stats so low? Should the default be changed to more like 100 at this point? Also, how extensively does the planner use n_distinct, null_frac, reltuples and the histogram to see what the odds are of finding a unique value or a low number of values? I've seen cases where it seems the planer doesn't think it'll be getting a unique value or a small set of values even though stats indicates that it should be. One final question... would there be interest in a process that would dynamically update the histogram settings for tables based on how distinct/unique each field was? -- Jim C. Nasby, Database Consultant [EMAIL PROTECTED] Give your computer some brain candy! www.distributed.net Team #1828 Windows: "Where do you want to go today?" Linux: "Where do you want to go tomorrow?" FreeBSD: "Are you guys coming, or what?" ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])
Re: [PERFORM] postgres eating CPU
"[EMAIL PROTECTED]" <[EMAIL PROTECTED]> writes: > in March there was an interesting discussion on the list with the > subject "postgres eating CPU on HP9000". >http://archives.postgresql.org/pgsql-performance/2004-03/msg00380.php Reviewing that, the problem is most likely that (a) they didn't have max_fsm_pages set high enough to cover the database, and (b) they were running 7.3.* which is prone to index bloat. regards, tom lane ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] postgres eating CPU
> >>in March there was an interesting discussion on the list with the > >>subject "postgres eating CPU on HP9000". Aha, this one. Yeah, I believe that they upgraded to 7.4 inorder to deal with REINDEX issues. -- --Josh Josh Berkus Aglio Database Solutions San Francisco ---(end of broadcast)--- TIP 8: explain analyze is your friend
Re: [PERFORM] postgres eating CPU
Josh Berkus wrote: in March there was an interesting discussion on the list with the subject "postgres eating CPU on HP9000". Link, please? http://archives.postgresql.org/pgsql-performance/2004-03/msg00380.php ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] Analyzer is clueless
On Thu, 2004-11-18 at 02:08, David Brown wrote: > Statistics were run on each table before query execution. The random page > cost was lowered to 2, but as you can see, the estimated costs are wild > anyway. > > As a comparison, MS SQL Server took less than 15 seconds, or 4 times faster. > > MySQL (InnoDB) took 2 seconds, which is 30 times faster. > > The query looks straightforward to me (it might be clearer with a subselect), > so what on earth is wrong? The query is, as you say, straightforward. You are clearly working with a query that is on the very edge of the decision between using an index or not. The main issue is that PostgreSQL's default histogram statistics setting is lower than other RDBMS. This means that it is less able to discriminate between cases such as yours that are close to the edge. This is a trade-off between run-time of the ANALYZE command and the benefit it produces. As Joshua suggests, increasing the statistics target for this table will likely allow the optimizer to correctly determine the selectivity of the index and take the right path. If this is a general RDBMS comparison, you may wish to extend the system's default_statistics_target = 80 or at least > 10. To improve this query, you may wish to extend the table's statistics target using: ALTER TABLE "drinv" ALTER COLUMN OrderDate SET STATISTICS 100; which should allow the planner to more accurately estimate statistics and thereby select an index, if appropriate. The doco has recently been changed with regard to effective_cache_size; you don't mention what beta release level you're using. That is the only planner parameter that takes cache size into account, so any other changes would certainly have zero effect on this *plan* though might still benefit execution time. Please post EXPLAIN ANALYZE output for any further questions. -- Best Regards, Simon Riggs ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] postgres eating CPU
> in March there was an interesting discussion on the list with the > subject "postgres eating CPU on HP9000". Link, please? -- --Josh Josh Berkus Aglio Database Solutions San Francisco ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
[PERFORM] postgres eating CPU
Hi, in March there was an interesting discussion on the list with the subject "postgres eating CPU on HP9000". Now I'm the same problem on a Dell dual processor machine. Anybody know if there was a solution? Thanks Piergiorgio ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [PERFORM] sort_mem affect on inserts?
David, > I understand that the sort_mem conf setting affects queries with order by, > etc., and the doc mentions that it is used in create index. Does sort_mem > affect the updating of indexes, i.e., can the sort_mem setting affect the > performance of inserts? Only if the table has Foriegn Keys whose lookup might require a large sort. Otherwise, no. -- --Josh Josh Berkus Aglio Database Solutions San Francisco ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faqs/FAQ.html
[PERFORM] sort_mem affect on inserts?
I understand that the sort_mem conf setting affects queries with order by, etc., and the doc mentions that it is used in create index. Does sort_mem affect the updating of indexes, i.e., can the sort_mem setting affect the performance of inserts? - DAP -- David ParkerTazz Networks(401) 709-5130 ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] memcached and PostgreSQL
Michael, > Still, it seems like a convenient way to maintain cache coherency, > assuming that your application doesn't already have a clean way to do > that. Precisely.The big problem with memory caching is the cache getting out of sync with the database. Updating the cache through database triggers helps ameliorate that. However, our inability to pass messages with NOTIFY somewhat limits the the utility of this solution Sean wants "on commit triggers", but there's some major issues to work out with that. Passing messages with NOTIFY would be easier and almost as good. -- --Josh Josh Berkus Aglio Database Solutions San Francisco ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faqs/FAQ.html
Re: [PERFORM] Analyzer is clueless
Hello, Have you tried increasing the statistics target for orderdate and rerunning analyze? Sincerely, Joshua D. Drake David Brown wrote: I'm doing some performance profiling with a simple two-table query: SELECT L."ProductID", sum(L."Amount") FROM "drinv" H JOIN "drinvln" L ON L."OrderNo" = H."OrderNo" WHERE ("OrderDate" between '2003-01-01' AND '2003-04-30') GROUP BY L."ProductID" drinv and drinvln have about 100,000 and 3,500,000 rows respectively. Actual data size in the large table is 500-600MB. OrderNo is indexed in both tables, as is OrderDate. The environment is PGSQL 8 on Win2k with 512MB RAM (results are similar to 7.3 from Mammoth). I've tried tweaking various conf parameters, but apart from using up memory, nothing seems to have had a tangible effect - the Analyzer doesn't seem to take resources into account like some of the doco suggests. The date selection represents about 5% of the range. Here's the plan summaries: Three months (2003-01-01 to 2003-03-30) = 1 second HashAggregate (cost=119365.53..119368.74 rows=642 width=26) -> Nested Loop (cost=0.00..118791.66 rows=114774 width=26) -> Index Scan using "drinv_OrderDate" on drinv h (cost=0.00..200.27 rows=3142 width=8) Index Cond: (("OrderDate" >= '2003-01-01'::date) AND ("OrderDate" <= '2003-03-30'::date)) -> Index Scan using "drinvln_OrderNo" on drinvln l (cost=0.00..28.73 rows=721 width=34) Index Cond: (l."OrderNo" = "outer"."OrderNo") Four months (2003-01-01 to 2003-04-30) = 60 seconds HashAggregate (cost=126110.53..126113.74 rows=642 width=26) -> Hash Join (cost=277.55..125344.88 rows=153130 width=26) Hash Cond: ("outer"."OrderNo" = "inner"."OrderNo") -> Seq Scan on drinvln l (cost=0.00..106671.35 rows=3372935 width=34) -> Hash (cost=267.07..267.07 rows=4192 width=8) -> Index Scan using "drinv_OrderDate" on drinv h (cost=0.00..267.07 rows=4192 width=8) Index Cond: (("OrderDate" >= '2003-01-01'::date) AND ("OrderDate" <= '2003-04-30'::date)) Four months (2003-01-01 to 2003-04-30) with Seq_scan disabled = 75 seconds HashAggregate (cost=130565.83..130569.04 rows=642 width=26) -> Merge Join (cost=519.29..129800.18 rows=153130 width=26) Merge Cond: ("outer"."OrderNo" = "inner"."OrderNo") -> Sort (cost=519.29..529.77 rows=4192 width=8) Sort Key: h."OrderNo" -> Index Scan using "drinv_OrderDate" on drinv h (cost=0.00..267.07 rows=4192 width=8) Index Cond: (("OrderDate" >= '2003-01-01'::date) AND ("OrderDate" <= '2003-04-30'::date)) -> Index Scan using "drinvln_OrderNo" on drinvln l (cost=0.00..119296.29 rows=3372935 width=34) Statistics were run on each table before query execution. The random page cost was lowered to 2, but as you can see, the estimated costs are wild anyway. As a comparison, MS SQL Server took less than 15 seconds, or 4 times faster. MySQL (InnoDB) took 2 seconds, which is 30 times faster. The query looks straightforward to me (it might be clearer with a subselect), so what on earth is wrong? ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED] -- Command Prompt, Inc., home of PostgreSQL Replication, and plPHP. Postgresql support, programming shared hosting and dedicated hosting. +1-503-667-4564 - [EMAIL PROTECTED] - http://www.commandprompt.com Mammoth PostgreSQL Replicator. Integrated Replication for PostgreSQL begin:vcard fn:Joshua D. Drake n:Drake;Joshua D. org:Command Prompt, Inc. adr:;;PO Box 215;Cascade Locks;Oregon;97014;USA email;internet:[EMAIL PROTECTED] title:Consultant tel;work:503-667-4564 tel;fax:503-210-0334 note:Command Prompt, Inc. is the largest and oldest US based commercial PostgreSQL support provider. We provide the only commercially viable integrated PostgreSQL replication solution, but also custom programming, and support. We authored the book Practical PostgreSQL, the procedural language plPHP, and adding trigger capability to plPerl. x-mozilla-html:FALSE url:http://www.commandprompt.com/ version:2.1 end:vcard ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faqs/FAQ.html
Re: [PERFORM] memcached and PostgreSQL
On Wed, Nov 17, 2004 at 09:13:09AM -0800, Darcy Buskermolen wrote: > On November 16, 2004 08:00 pm, Michael Adler wrote: > > http://pugs.postgresql.org/sfpug/archives/21.html > > > > I noticed that some of you left coasters were talking about memcached > > and pgsql. I'm curious to know what was discussed. > > Have a look at the pdf presentation found on the following site: > > http://people.freebsd.org/~seanc/pgmemcache/ Thanks for that. That presentation was rather broad and the API seems rather general purpose, but I wonder why you would really want access the cache by way of the DB? If one major point of memcache is to allocate RAM to a low-overhead server instead of to the RDBMS's disk cache, why would you add the overhead of the RDBMS to the process? (this is a bit of straw man, but just trying to flesh-out the pros and cons) Still, it seems like a convenient way to maintain cache coherency, assuming that your application doesn't already have a clean way to do that. (just my uninformed opinion, though...) -Mike ---(end of broadcast)--- TIP 8: explain analyze is your friend
Re: [PERFORM] Analyzer is clueless
Dan, > I'm doing some performance profiling with a simple two-table query: Please send EXPLAIN ANALYZE for each query, and not just EXPLAIN. Thanks! -- Josh Berkus Aglio Database Solutions San Francisco ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] query plan question
Well based on the autovacuum log that you attached, all of those tables are insert only (at least during the time period included in the log. Is that correct? If so, autovacuum will never do a vacuum (unless required by xid wraparound issues) on those tables. So this doesn't appear to be an autovacuum problem. I'm not sure about the missing pg_statistic entries anyone else care to field that one? Matthew David Parker wrote: Thanks. The tables I'm concerned with are named: 'schema', 'usage', 'usageparameter', and 'flow'. It looks like autovacuum is performing analyzes: % grep "Performing: " logs/.db.tazz.vacuum.log [2004-11-17 12:05:58 PM] Performing: ANALYZE "public"."scriptlibrary_library" [2004-11-17 12:15:59 PM] Performing: ANALYZE "public"."scriptlibraryparm" [2004-11-17 12:15:59 PM] Performing: ANALYZE "public"."usageparameter" [2004-11-17 12:21:00 PM] Performing: ANALYZE "public"."usageproperty" [2004-11-17 12:21:00 PM] Performing: ANALYZE "public"."route" [2004-11-17 12:21:00 PM] Performing: ANALYZE "public"."usageparameter" [2004-11-17 12:21:00 PM] Performing: ANALYZE "public"."scriptlibrary_library" [2004-11-17 12:26:01 PM] Performing: ANALYZE "public"."usage" [2004-11-17 12:26:01 PM] Performing: ANALYZE "public"."usageparameter" [2004-11-17 12:31:04 PM] Performing: ANALYZE "public"."usageproperty" [2004-11-17 12:36:04 PM] Performing: ANALYZE "public"."route" [2004-11-17 12:36:04 PM] Performing: ANALYZE "public"."service_usage" [2004-11-17 12:36:04 PM] Performing: ANALYZE "public"."usageparameter" But when I run the following: select * from pg_statistic where starelid in (select oid from pg_class where relname in ('schema','usageparameter','flow','usage')) it returns no records. Shouldn't it? It doesn't appear to be doing a vacuum anywhere, which makes sense because none of these tables have over the default threshold of 1000. Are there statistics which only get generated by vacuum? I've attached a gzip of the pg_autovacuum log file, with -d 3. Thanks again. - DAP -Original Message- From: Matthew T. O'Connor [mailto:[EMAIL PROTECTED] Sent: Wednesday, November 17, 2004 11:41 AM To: David Parker Cc: Tom Lane; Jeff; Russell Smith; [EMAIL PROTECTED] Subject: Re: [PERFORM] query plan question David Parker wrote: We're using postgresql 7.4.5. I've only recently put pg_autovacuum in place as part of our installation, and I'm basically taking the defaults. I doubt it's a problem with autovacuum itself, but rather with my configuration of it. I have some reading to do, so any pointers to existing autovacuum threads would be greatly appreciated! Well the first thing to do is increase the verbosity of the pg_autovacuum logging output. If you use -d2 or higher, pg_autovacuum will print out a lot of detail on what it thinks the thresholds are and why it is or isn't performing vacuums and analyzes. Attach some of the log and I'll take a look at it. ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [PERFORM] Tsearch2 really slower than ilike ?
1;2c1;2c1;2cOn Wed, 17 Nov 2004, [iso-8859-15] Herv? Piedvache wrote: Oleg, Sorry but when I do your request I get : # select id_site from site where idx_site_name @@ 'livejourn'; ERROR: type " " does not exist no idea :) btw, what version of postgresql and OS you're running. Could you try minimal test - check sql commands from tsearch2 sources, some basic queries from tsearch2 documentation, tutorials. btw, your query should looks like select id_site from site_rss where idx_site_name @@ 'livejourn'; How did you run your queries at all ? I mean your first message about poor tsearch2 performance. 1;2c1;2c1;2c What is this ? (private: I don't know what happend with my mail, but I do nothing special to disturb the contains when I'm writting to you ...) Le Mardi 16 Novembre 2004 22:13, Oleg Bartunov a ?crit : ok, I downloaded dump of table and here is what I found: zz=# select count(*) from tt; count 183956 (1 row) zz=# select * from stat('select tt from tt') order by ndoc desc, nentry desc,wo rd limit 10; word | ndoc | nentry --+---+ blog | 12710 | 12835 weblog | 4857 | 4859 news | 4402 | 4594 life | 4136 | 4160 world| 1980 | 1986 journal | 1882 | 1883 livejourn| 1737 | 1737 thought | 1669 | 1677 web | 1154 | 1161 scotsman.com | 1138 | 1138 (10 rows) zz=# explain analyze select tt from tt where tt @@ 'blog'; QUERY PLAN --- --- Index Scan using tt_idx on tt (cost=0.00..728.83 rows=184 width=32) (actual time=0.047..141.110 rows=12710 loops=1) Index Cond: (tt @@ '\'blog\''::tsquery) Filter: (tt @@ '\'blog\''::tsquery) Total runtime: 154.105 ms (4 rows) It's really fast ! So, I don't understand your problem. I run query on my desktop machine, nothing special. Oleg On Tue, 16 Nov 2004, [iso-8859-15] Herv? Piedvache wrote: Hi, I'm completly dispointed with Tsearch2 ... I have a table like this : Table "public.site" Column |Type | Modifiers ---+-+--- id_site | integer | not null default nextval('public.site_id_site_seq'::text) site_name | text| site_url | text| url| text| language | text| datecrea | date| default now() id_category | integer | time_refresh | integer | active| integer | error | integer | description | text| version | text| idx_site_name | tsvector| lastcheck | date| lastupdate| timestamp without time zone | Indexes: "site_id_site_key" unique, btree (id_site) "ix_idx_site_name" gist (idx_site_name) Triggers: tsvectorupdate_site_name BEFORE INSERT OR UPDATE ON site FOR EACH ROW EXECUTE PROCEDURE tsearch2('idx_site_name', 'site_name') I have 183 956 records in the database ... SELECT s.site_name, s.id_site, s.description, s.site_url, case when exists (select id_user from user_choice u where u.id_site=s.id_site and u.id_user = 1) then 1 else 0 end as bookmarked FROM site s WHERE s.idx_site_name @@ to_tsquery('atari'); Explain Analyze : QUERY PLAN - - Index Scan using ix_idx_site_name on site s (cost=0.00..1202.12 rows=184 width=158) (actual time=4687.674..4698.422 rows=1 loops=1) Index Cond: (idx_site_name @@ '\'atari\''::tsquery) Filter: (idx_site_name @@ '\'atari\''::tsquery) SubPlan -> Seq Scan on user_choice u (cost=0.00..3.46 rows=1 width=4) (actual time=0.232..0.232 rows=0 loops=1) Filter: ((id_site = $0) AND (id_user = 1)) Total runtime: 4698.608 ms First time I run the request I have a result in about 28 seconds. SELECT s.site_name, s.id_site, s.description, s.site_url, case when exists (select id_user from user_choice u where u.id_site=s.id_site and u.id_user = 1) then 1 else 0 end as bookmarked FROM site_rss s WHERE s.site_name ilike '%atari%'
Re: [PERFORM] Tsearch2 really slower than ilike ?
Oleg, Sorry but when I do your request I get : # select id_site from site where idx_site_name @@ 'livejourn'; ERROR: type " " does not exist What is this ? (private: I don't know what happend with my mail, but I do nothing special to disturb the contains when I'm writting to you ...) Le Mardi 16 Novembre 2004 22:13, Oleg Bartunov a écrit : > ok, I downloaded dump of table and here is what I found: > > zz=# select count(*) from tt; > count > > 183956 > (1 row) > > zz=# select * from stat('select tt from tt') order by ndoc desc, nentry > desc,wo > rd limit 10; > word | ndoc | nentry > --+---+ > blog | 12710 | 12835 > weblog | 4857 | 4859 > news | 4402 | 4594 > life | 4136 | 4160 > world| 1980 | 1986 > journal | 1882 | 1883 > livejourn| 1737 | 1737 > thought | 1669 | 1677 > web | 1154 | 1161 > scotsman.com | 1138 | 1138 > (10 rows) > > zz=# explain analyze select tt from tt where tt @@ 'blog'; >QUERY PLAN > --- >--- Index Scan using tt_idx on tt > (cost=0.00..728.83 rows=184 width=32) (actual time=0.047..141.110 > rows=12710 loops=1) Index Cond: (tt @@ '\'blog\''::tsquery) > Filter: (tt @@ '\'blog\''::tsquery) > Total runtime: 154.105 ms > (4 rows) > > It's really fast ! So, I don't understand your problem. > I run query on my desktop machine, nothing special. > > > Oleg > > On Tue, 16 Nov 2004, [iso-8859-15] Herv? Piedvache wrote: > > Hi, > > > > I'm completly dispointed with Tsearch2 ... > > > > I have a table like this : > > Table "public.site" > >Column |Type | > > Modifiers > > ---+-+--- > > id_site | integer > > | not null default > > nextval('public.site_id_site_seq'::text) > > site_name | text| > > site_url | text| > > url| text| > > language | text| > > datecrea | date| default now() > > id_category | integer | > > time_refresh | integer | > > active| integer | > > error | integer | > > description | text| > > version | text| > > idx_site_name | tsvector| > > lastcheck | date| > > lastupdate| timestamp without time zone | > > Indexes: > >"site_id_site_key" unique, btree (id_site) > >"ix_idx_site_name" gist (idx_site_name) > > Triggers: > >tsvectorupdate_site_name BEFORE INSERT OR UPDATE ON site FOR EACH ROW > > EXECUTE PROCEDURE tsearch2('idx_site_name', 'site_name') > > > > I have 183 956 records in the database ... > > > > SELECT s.site_name, s.id_site, s.description, s.site_url, > >case when exists (select id_user > >from user_choice u > > where u.id_site=s.id_site > >and u.id_user = 1) > > then 1 else 0 end as bookmarked > > FROM site s > > WHERE s.idx_site_name @@ to_tsquery('atari'); > > > > Explain Analyze : > > QUERY PLAN > > - > >- Index > > Scan using ix_idx_site_name on site s (cost=0.00..1202.12 rows=184 > > width=158) (actual time=4687.674..4698.422 rows=1 loops=1) > > Index Cond: (idx_site_name @@ '\'atari\''::tsquery) > > Filter: (idx_site_name @@ '\'atari\''::tsquery) > > SubPlan > > -> Seq Scan on user_choice u (cost=0.00..3.46 rows=1 width=4) > > (actual time=0.232..0.232 rows=0 loops=1) > > Filter: ((id_site = $0) AND (id_user = 1)) > > Total runtime: 4698.608 ms > > > > First time I run the request I have a result in about 28 seconds. > > > > SELECT s.site_name, s.id_site, s.description, s.site_url, > >case when exists (select id_user > >from user_choice u > > where u.id_site=s.id_site > >and u.id_user = 1) > > then 1 else 0 end as bookmarked > > FROM site_rss s > > WHERE s.site_name ilike '%atari%' > > > > QUERY PLAN > > ---
Re: [PERFORM] memcached and PostgreSQL
On November 16, 2004 08:00 pm, Michael Adler wrote: > http://pugs.postgresql.org/sfpug/archives/21.html > > I noticed that some of you left coasters were talking about memcached > and pgsql. I'm curious to know what was discussed. > > In reading about memcached, it seems that many people are using it to > circumvent the scalability problems of MySQL (lack of MVCC). > > from their site: > > > Shouldn't the database do this? > > Regardless of what database you use (MS-SQL, Oracle, Postgres, > MysQL-InnoDB, etc..), there's a lot of overhead in implementing ACID > properties in a RDBMS, especially when disks are involved, which means > queries are going to block. For databases that aren't ACID-compliant > (like MySQL-MyISAM), that overhead doesn't exist, but reading threads > block on the writing threads. memcached never blocks. > > > So What does memcached offer pgsql users? It would still seem to offer > the benefit of a multi-machined cache. Have a look at the pdf presentation found on the following site: http://people.freebsd.org/~seanc/pgmemcache/ > > -Mike > > ---(end of broadcast)--- > TIP 7: don't forget to increase your free space map settings -- Darcy Buskermolen Wavefire Technologies Corp. ph: 250.717.0200 fx: 250.763.1759 http://www.wavefire.com ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faqs/FAQ.html
Re: [PERFORM] query plan question
David Parker wrote: We're using postgresql 7.4.5. I've only recently put pg_autovacuum in place as part of our installation, and I'm basically taking the defaults. I doubt it's a problem with autovacuum itself, but rather with my configuration of it. I have some reading to do, so any pointers to existing autovacuum threads would be greatly appreciated! Well the first thing to do is increase the verbosity of the pg_autovacuum logging output. If you use -d2 or higher, pg_autovacuum will print out a lot of detail on what it thinks the thresholds are and why it is or isn't performing vacuums and analyzes. Attach some of the log and I'll take a look at it. ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [PERFORM] memcached and PostgreSQL
On 17 Nov 2004 03:08:20 -0500, Greg Stark <[EMAIL PROTECTED]> wrote: > Josh Berkus <[EMAIL PROTECTED]> writes: > > > So memcached becomes a very good place to stick data that's read often but > > not > > updated often, or alternately data that changes often but is disposable. > > An > > example of the former is a user+ACL list; and example of the latter is web > > session information ... or simple materialized views. > > I would like very much to use something like memcached for a materialized view > I have. The problem is that I have to join it against other tables. > > I've thought about providing a SRF in postgres to read records out of > memcached but I'm unclear it would it really help at all. > > Has anyone tried anything like this? I haven't tried it yet, but I plan too. An intersting case might be to use plperlu to interface with memcached and store hashes in the cache via some external process, like a CGI script. Then just define a TYPE for the perl SRF to return, and store the data as an array of hashes with keys matching the TYPE. A (perhaps useless) example could then be something like: CREATE TYPE user_info AS ( sessionid TEXT, userid INT, lastaccess TIMESTAMP, lastrequest TEXT); CREATE FUNCTION get_user_info_by_session ( TEXT) RETURNS SETOF user_info AS $$ use Cache::Memcached; my $session = shift; my $c = $_SHARED{memcached} || Cache::Memcached->new( {servers => '127.0.0.1:'} ); my $user_info = $m->get('web_access_list'); # $user_info looks like # [ {userid => 5, lastrequest => 'http://...', lastaccess => localtime(), #sessionid => '123456789'}, { ...} ] # and is stored by a CGI. @info = grep {$$_{sessionid} eq $session} @$user_info; return [EMAIL PROTECTED]; $$ LANGUAGE 'plperlu'; SELECT u.username, f.lastrequest FROM users u, get_user_info_by_session('123456789') WHERE f.userid = u.userid; Any thoughts? > > -- > greg > > ---(end of broadcast)--- > TIP 5: Have you checked our extensive FAQ? > >http://www.postgresql.org/docs/faqs/FAQ.html > -- Mike Rylander [EMAIL PROTECTED] GPLS -- PINES Development Database Developer ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])
[PERFORM] Analyzer is clueless
I'm doing some performance profiling with a simple two-table query: SELECT L."ProductID", sum(L."Amount") FROM "drinv" H JOIN "drinvln" L ON L."OrderNo" = H."OrderNo" WHERE ("OrderDate" between '2003-01-01' AND '2003-04-30') GROUP BY L."ProductID" drinv and drinvln have about 100,000 and 3,500,000 rows respectively. Actual data size in the large table is 500-600MB. OrderNo is indexed in both tables, as is OrderDate. The environment is PGSQL 8 on Win2k with 512MB RAM (results are similar to 7.3 from Mammoth). I've tried tweaking various conf parameters, but apart from using up memory, nothing seems to have had a tangible effect - the Analyzer doesn't seem to take resources into account like some of the doco suggests. The date selection represents about 5% of the range. Here's the plan summaries: Three months (2003-01-01 to 2003-03-30) = 1 second HashAggregate (cost=119365.53..119368.74 rows=642 width=26) -> Nested Loop (cost=0.00..118791.66 rows=114774 width=26) -> Index Scan using "drinv_OrderDate" on drinv h (cost=0.00..200.27 rows=3142 width=8) Index Cond: (("OrderDate" >= '2003-01-01'::date) AND ("OrderDate" <= '2003-03-30'::date)) -> Index Scan using "drinvln_OrderNo" on drinvln l (cost=0.00..28.73 rows=721 width=34) Index Cond: (l."OrderNo" = "outer"."OrderNo") Four months (2003-01-01 to 2003-04-30) = 60 seconds HashAggregate (cost=126110.53..126113.74 rows=642 width=26) -> Hash Join (cost=277.55..125344.88 rows=153130 width=26) Hash Cond: ("outer"."OrderNo" = "inner"."OrderNo") -> Seq Scan on drinvln l (cost=0.00..106671.35 rows=3372935 width=34) -> Hash (cost=267.07..267.07 rows=4192 width=8) -> Index Scan using "drinv_OrderDate" on drinv h (cost=0.00..267.07 rows=4192 width=8) Index Cond: (("OrderDate" >= '2003-01-01'::date) AND ("OrderDate" <= '2003-04-30'::date)) Four months (2003-01-01 to 2003-04-30) with Seq_scan disabled = 75 seconds HashAggregate (cost=130565.83..130569.04 rows=642 width=26) -> Merge Join (cost=519.29..129800.18 rows=153130 width=26) Merge Cond: ("outer"."OrderNo" = "inner"."OrderNo") -> Sort (cost=519.29..529.77 rows=4192 width=8) Sort Key: h."OrderNo" -> Index Scan using "drinv_OrderDate" on drinv h (cost=0.00..267.07 rows=4192 width=8) Index Cond: (("OrderDate" >= '2003-01-01'::date) AND ("OrderDate" <= '2003-04-30'::date)) -> Index Scan using "drinvln_OrderNo" on drinvln l (cost=0.00..119296.29 rows=3372935 width=34) Statistics were run on each table before query execution. The random page cost was lowered to 2, but as you can see, the estimated costs are wild anyway. As a comparison, MS SQL Server took less than 15 seconds, or 4 times faster. MySQL (InnoDB) took 2 seconds, which is 30 times faster. The query looks straightforward to me (it might be clearer with a subselect), so what on earth is wrong? ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [PERFORM] query plan question
We're using postgresql 7.4.5. I've only recently put pg_autovacuum in place as part of our installation, and I'm basically taking the defaults. I doubt it's a problem with autovacuum itself, but rather with my configuration of it. I have some reading to do, so any pointers to existing autovacuum threads would be greatly appreciated! Thanks. - DAP >-Original Message- >From: Tom Lane [mailto:[EMAIL PROTECTED] >Sent: Wednesday, November 17, 2004 10:46 AM >To: David Parker >Cc: Jeff; Russell Smith; [EMAIL PROTECTED] >Subject: Re: [PERFORM] query plan question > >"David Parker" <[EMAIL PROTECTED]> writes: >> So I guess I need to figure out why pg_autovacuum isn't analyzing >> those tables. > >Which autovacuum version are you using? The early releases >had some nasty bugs that would allow it to skip tables >sometimes. I think all the known problems are fixed as of >recent 7.4.x updates. > > regards, tom lane > ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] query plan question
"David Parker" <[EMAIL PROTECTED]> writes: > So I guess I need to figure out why pg_autovacuum isn't analyzing those > tables. Which autovacuum version are you using? The early releases had some nasty bugs that would allow it to skip tables sometimes. I think all the known problems are fixed as of recent 7.4.x updates. regards, tom lane ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [PERFORM] query plan question
Hmm, I'm really a beginner at this... It turns out that the pg_statistic table in my good database has records in it for the tables in the query, while the pg_statistic table in my bad database has no records for those tables at all! So I guess I need to figure out why pg_autovacuum isn't analyzing those tables. - DAP >-Original Message- >From: David Parker >Sent: Wednesday, November 17, 2004 9:44 AM >To: 'Jeff' >Cc: Russell Smith; [EMAIL PROTECTED] >Subject: RE: [PERFORM] query plan question > >I've got pg_autovacuum running on both platforms. I've >verified that the tables involved in the query have the same >number of rows on both databases. > >I'm not sure where to look to see how the stats might be >different. The "good" database's pg_statistic table has 24 >more rows than that in the "bad" database, so there's >definitely a difference. The good database's pg_statistic has >rows for 2 extra tables, but they are not tables involved in >the query in question... > >So something must be up with stats, but can you tell me what >the most signicant columns in the pg_statistic table are for >the planner making its decision? I'm sure this has been >discussed before, so if there's a thread you can point me to, >that would be great - I realize it's a big general question. > >Thanks for your time. > >- DAP > >>-Original Message- >>From: Jeff [mailto:[EMAIL PROTECTED] >>Sent: Wednesday, November 17, 2004 9:01 AM >>To: David Parker >>Cc: Russell Smith; [EMAIL PROTECTED] >>Subject: Re: [PERFORM] query plan question >> >> >>On Nov 17, 2004, at 7:32 AM, David Parker wrote: >> >>> Oh, I didn't realize that analyze gave that much more info. >>I've got a >>> lot to learn about this tuning stuff ;-) >>> >>> I've attached the output. I see from the new output where the slow >>> query is taking its time (the nested loop at line 10), but I still >>> have no idea why this plan is getting chosen >>> >> >>looks like your stats are incorrect on the sparc. >>Did you forget to run vacuum analyze on it? >> >>also, do both db's have the same data loaded? >>there are some very different numbers in terms of actual rows >floating >>around there... >> >>-- >>Jeff Trout <[EMAIL PROTECTED]> >>http://www.jefftrout.com/ >>http://www.stuarthamm.net/ >> >> ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] query plan question
I've got pg_autovacuum running on both platforms. I've verified that the tables involved in the query have the same number of rows on both databases. I'm not sure where to look to see how the stats might be different. The "good" database's pg_statistic table has 24 more rows than that in the "bad" database, so there's definitely a difference. The good database's pg_statistic has rows for 2 extra tables, but they are not tables involved in the query in question... So something must be up with stats, but can you tell me what the most signicant columns in the pg_statistic table are for the planner making its decision? I'm sure this has been discussed before, so if there's a thread you can point me to, that would be great - I realize it's a big general question. Thanks for your time. - DAP >-Original Message- >From: Jeff [mailto:[EMAIL PROTECTED] >Sent: Wednesday, November 17, 2004 9:01 AM >To: David Parker >Cc: Russell Smith; [EMAIL PROTECTED] >Subject: Re: [PERFORM] query plan question > > >On Nov 17, 2004, at 7:32 AM, David Parker wrote: > >> Oh, I didn't realize that analyze gave that much more info. >I've got a >> lot to learn about this tuning stuff ;-) >> >> I've attached the output. I see from the new output where the slow >> query is taking its time (the nested loop at line 10), but I still >> have no idea why this plan is getting chosen >> > >looks like your stats are incorrect on the sparc. >Did you forget to run vacuum analyze on it? > >also, do both db's have the same data loaded? >there are some very different numbers in terms of actual rows floating >around there... > >-- >Jeff Trout <[EMAIL PROTECTED]> >http://www.jefftrout.com/ >http://www.stuarthamm.net/ > > ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])
Re: [PERFORM] query plan question
On Nov 17, 2004, at 7:32 AM, David Parker wrote: Oh, I didn't realize that analyze gave that much more info. I've got a lot to learn about this tuning stuff ;-) I've attached the output. I see from the new output where the slow query is taking its time (the nested loop at line 10), but I still have no idea why this plan is getting chosen looks like your stats are incorrect on the sparc. Did you forget to run vacuum analyze on it? also, do both db's have the same data loaded? there are some very different numbers in terms of actual rows floating around there... -- Jeff Trout <[EMAIL PROTECTED]> http://www.jefftrout.com/ http://www.stuarthamm.net/ ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [PERFORM] query plan question
>If they are the same and PostgreSQL are the same, are the >intel machines Xeons? Yup, dual 3.06-GHz Intel Xeon Processors. I'm not sure off the top of my head what the sparcs are exactly. We're in the process of moving completely to intel, but we still have to support our app on sparc, and we are seeing these weird differences... - DAP ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])
Re: [PERFORM] query plan question
Oh, I didn't realize that analyze gave that much more info. I've got a lot to learn about this tuning stuff ;-) I've attached the output. I see from the new output where the slow query is taking its time (the nested loop at line 10), but I still have no idea why this plan is getting chosen Thanks! - DAP >-Original Message- >From: [EMAIL PROTECTED] >[mailto:[EMAIL PROTECTED] On Behalf Of >Russell Smith >Sent: Tuesday, November 16, 2004 11:36 PM >To: [EMAIL PROTECTED] >Subject: Re: [PERFORM] query plan question > >On Wed, 17 Nov 2004 02:54 pm, you wrote: >> I have a query for which postgres is generating a different >plan on different machines. The database schema is the same, >the dataset is the same, the configuration is the same (e.g., >pg_autovacuum running in both cases), both systems are Solaris >9. The main difference in the two systems is that one is sparc >and the other is intel. >> >> The query runs in about 40 ms on the intel box, but takes >about 18 seconds on the sparc box. Now, the intel boxes we >have are certainly faster, but I'm curious why the query plan >might be different. >> >> For the intel: >> >> QUERY PLAN >> Unique (cost=11.50..11.52 rows=2 width=131) >> -> Sort (cost=11.50..11.50 rows=2 width=131) >> Sort Key: up.prefix, s.name, s.tuid, s.foundryversion >> -> Hash Join (cost=10.42..11.49 rows=2 width=131) >> Hash Cond: ("outer".dbid = "inner"."schema") >> -> Seq Scan on "schema" s (cost=0.00..1.02 >rows=2 width=128) >> -> Hash (cost=10.41..10.41 rows=4 width=11) >> -> Nested Loop (cost=0.00..10.41 >rows=4 width=11) >> -> Nested Loop (cost=0.00..2.14 >rows=4 width=4) >> -> Seq Scan on flow fl >(cost=0.00..0.00 rows=1 width=4) >> Filter: (servicetype = 646) >> -> Index Scan using >usage_flow_i on "usage" u (cost=0.00..2.06 rows=6 width=8) >> Index Cond: (u.flow = >"outer".dbid) >> -> Index Scan using >usageparameter_usage_i on usageparameter up (cost=0.00..2.06 >rows=1 width=15) >> Index Cond: (up."usage" = >"outer".dbid) >> Filter: ((prefix)::text <> >> 'xsd'::text) >> >> For the sparc: >> >> QUERY PLAN >> Unique (cost=10.81..10.83 rows=1 width=167) >> -> Sort (cost=10.81..10.82 rows=1 width=167) >> Sort Key: up.prefix, s.name, s.tuid, s.foundryversion >> -> Nested Loop (cost=9.75..10.80 rows=1 width=167) >> Join Filter: ("outer".flow = "inner".dbid) >> -> Hash Join (cost=9.75..10.79 rows=1 width=171) >> Hash Cond: ("outer".dbid = "inner"."schema") >> -> Seq Scan on "schema" s >(cost=0.00..1.02 rows=2 width=128) >> -> Hash (cost=9.75..9.75 rows=1 width=51) >> -> Nested Loop (cost=0.00..9.75 >rows=1 width=51) >> Join Filter: >("inner"."usage" = "outer".dbid) >> -> Index Scan using >usage_flow_i on "usage" u (cost=0.00..4.78 rows=1 width=8) >> -> Index Scan using >usageparameter_schema_i on usageparameter up (cost=0.00..4.96 >rows=1 width=51) >> Filter: >((prefix)::text <> 'xsd'::text) >> -> Seq Scan on flow fl (cost=0.00..0.00 >rows=1 width=4) >> Filter: (servicetype = 646) >> >Unique (cost=11.50..11.52 rows=2 width=131) Unique >(cost=10.81..10.83 rows=1 width=167) > >The estimations for the cost is basically the same, 10ms for >the first row. Can you supply Explain analyze to see what >it's actually doing? > >Russell Smith > >---(end of >broadcast)--- >TIP 6: Have you searched our list archives? > > http://archives.postgresql.org > QUERY PLAN "Unique (cost=10.81..10.83 rows=1 width=167) (actual time=19390.684..19390.687 rows=1 loops=1)" " -> Sort (cost=10.81..10.82 rows=1 width=167) (actual time=19390.678..19390.679 rows=1 loops=1)" "Sort Key: up.prefix, s.name, s.tuid, s.foundryversion" "-> Nested Loop (cost=9.75..10.80 rows=1 width=167) (actual time=19377.051..19390.391 rows=1 loops=1)" " Join Filter: ("outer".flow = "inner".dbid)" " -> Hash Join (cost=9.75..10.79 rows=1 width=171) (actual time=19173.684..19181.827 rows=770 loops=1)" "Hash Cond: ("outer".dbid = "inner"."schema")" "-> Seq Scan on "schema" s (cost=0.00..1.02 rows=2 width=128) (actual time=0.240..0.693 rows=20 loops=1)" "-> Hash (cost=9.75..9.75 rows=1 width=51) (actual time=19173.354..19173.354 rows=0 loops=1)" "
Re: [PERFORM] mis-estimation on data-warehouse aggregate creation
On Tue, 16 Nov 2004, Simon Riggs wrote: > The join condition has so many ANDed predicates that we assume that this > will reduce the selectivity considerably. It does not, and so you pay > the cost dearly later on. > Yes, that makes a lot of sense. Without some incredibly good cross-column statistics there is no way it could expect all of the rows to match. Thanks for the analysis. Kris Jurka ---(end of broadcast)--- TIP 8: explain analyze is your friend
Re: [PERFORM] memcached and PostgreSQL
Josh Berkus <[EMAIL PROTECTED]> writes: > So memcached becomes a very good place to stick data that's read often but > not > updated often, or alternately data that changes often but is disposable. An > example of the former is a user+ACL list; and example of the latter is web > session information ... or simple materialized views. I would like very much to use something like memcached for a materialized view I have. The problem is that I have to join it against other tables. I've thought about providing a SRF in postgres to read records out of memcached but I'm unclear it would it really help at all. Has anyone tried anything like this? -- greg ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faqs/FAQ.html