Re: [PERFORM] Hardware/OS recommendations for large databases (
Mark,

See the results below and analysis - the pure HeapScan gets 94.1% of the max
available read bandwidth (cool!). Nothing wrong with heapscan in the presence
of large readahead, which is good news. That says it's something else in the
path. As you probably know, there is a page lock taken, a copy of the tuple
from the page, the lock removed, and a count incremented for every iteration
of the agg node on a count(*). Is the same true of a count(1)?

I recall that the profile is full of memcpy and memory context calls. It
would be nice to put some tracers into the executor and see where the time
is going. I'm also curious about the impact of the new 8.1 virtual tuples in
reducing the executor overhead. In this case my bet's on the agg node
itself, what do you think?

- Luke

On 11/21/05 9:10 PM, Mark Kirkwood [EMAIL PROTECTED] wrote:

> Luke Lonergan wrote:
>> So that leaves the question - why not more than 64% of the I/O scan rate?
>> And why is it a flat 64% as the I/O subsystem increases in speed from
>> 333-400MB/s?
>
> It might be interesting to see what effect reducing the cpu consumption
> entailed by the count aggregation has - by (say) writing a little bit of
> code to heap scan the desired relation (sample attached).

OK - here are results for a slightly smaller (still bigger than RAM)
lineitem on the same machine, using the same xfs filesystem that achieved
407MB/s:

12.9GB of DBT-3 data from the lineitem table:

llonergan=# select relpages from pg_class where relname='lineitem';
 relpages
----------
  1579270
(1 row)

1579270*8192/1000000 = 12937 million bytes, or 12.9GB

llonergan=# \timing
Timing is on.
llonergan=# select count(1) from lineitem;
   count
----------
 59986052
(1 row)
Time: 197870.105 ms

llonergan=# select count(1) from lineitem;
   count
----------
 59986052
(1 row)
Time: 49912.164 ms

llonergan=# select count(1) from lineitem;
   count
----------
 59986052
(1 row)
Time: 49218.739 ms

llonergan=# select fastcount('lineitem');
 fastcount
-----------
  59986052
(1 row)
Time: 33752.778 ms

llonergan=# select fastcount('lineitem');
 fastcount
-----------
  59986052
(1 row)
Time: 34543.646 ms

llonergan=# select fastcount('lineitem');
 fastcount
-----------
  59986052
(1 row)
Time: 34528.053 ms

Analysis:

            Bandwidth   Percent of max
  dd Read   407MB/s     100%
  Count(1)  263MB/s     64.6%
  HeapScan  383MB/s     94.1%

Wow - looks like the HeapScan gets almost all of the available bandwidth!

- Luke

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
       choose an index scan if your joining column's datatypes do not
       match
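The analysis table above can be re-derived directly from the session output. A quick sketch of the arithmetic, using decimal megabytes (1000*1000) to match the dd baseline and the best-of-three timings quoted above:

```python
# Re-derive the analysis table from the session above. Units are decimal
# MB (1000*1000) to match the dd baseline; timings are the best of three.
relpages = 1579270
table_mb = relpages * 8192 / (1000 * 1000)   # ~12937 MB, i.e. 12.9 GB

dd_mb_per_s = 407.0                          # measured dd read rate

count_rate = round(table_mb / 49.2)          # best count(1) run  -> 263 MB/s
fastcount_rate = round(table_mb / 33.75)     # best fastcount run -> 383 MB/s

count_pct = round(100 * count_rate / dd_mb_per_s, 1)          # 64.6 %
fastcount_pct = round(100 * fastcount_rate / dd_mb_per_s, 1)  # 94.1 %
```
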
Re: [PERFORM] Hardware/OS recommendations for large databases (
Luke Lonergan wrote:

> 12.9GB of DBT-3 data from the lineitem table
>
> llonergan=# select relpages from pg_class where relname='lineitem';
>  relpages
> ----------
>   1579270
> (1 row)
>
> 1579270*8192/1000000 = 12937 million bytes, or 12.9GB
>
> llonergan=# \timing
> Timing is on.
> llonergan=# select count(1) from lineitem;
> Time: 197870.105 ms

So 198 seconds is the uncached read time with count (just for clarity, did
you clear the Pg and filesystem caches, or unmount / remount the
filesystem?).

> llonergan=# select count(1) from lineitem;
> Time: 49912.164 ms
> llonergan=# select count(1) from lineitem;
> Time: 49218.739 ms

and ~50 seconds is the (partially) cached read time with count,

> llonergan=# select fastcount('lineitem');
> Time: 33752.778 ms
> llonergan=# select fastcount('lineitem');
> Time: 34543.646 ms
> llonergan=# select fastcount('lineitem');
> Time: 34528.053 ms

so ~34 seconds is the (partially) cached read time for fastcount - I
calculate this to give a ~362MB/s effective IO rate (I'm dividing by
1024*1024, not 1000*1000) FWIW.

While this is interesting, you probably want to stop Pg, unmount the
filesystem, and restart Pg to get the uncached time for fastcount too (and
how does this compare to an uncached read with dd using the same block
size?). But at this stage it certainly looks like the heapscan code is
pretty efficient - great!

Oh - and do you want to try out a 32K block size? I'm interested to see
what level of improvement you get (as my system is hopelessly cpu
bound...)!

> Analysis:
>
>             Bandwidth   Percent of max
>   dd Read   407MB/s     100%
>   Count(1)  263MB/s     64.6%
>   HeapScan  383MB/s     94.1%

Cheers

Mark

---(end of broadcast)---
TIP 6: explain analyze is your friend
Re: [PERFORM] Hardware/OS recommendations for large databases (
Luke Lonergan wrote:

> Mark,
>
> It would be nice to put some tracers into the executor and see where the
> time is going. I'm also curious about the impact of the new 8.1 virtual
> tuples in reducing the executor overhead. In this case my bet's on the
> agg node itself, what do you think?

Yeah - it's pretty clear that the count aggregate is fairly expensive wrt
cpu. However, I am not sure if all agg nodes suffer this way (guess we
could try a trivial aggregate that does nothing for all tuples bar the
last, and just reports the final value it sees).

Cheers

Mark

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings
Re: [PERFORM] Hardware/OS recommendations for large databases (
Mark,

>> Time: 197870.105 ms
>
> So 198 seconds is the uncached read time with count (just for clarity,
> did you clear the Pg and filesystem caches or unmount / remount the
> filesystem?)

Nope - the longer time is due to the known second-write issue with
Postgres: it writes the data to the table, but all of the pages are marked
dirty(?), so on the first scan after loading they are always written again.
This is clear as you watch vmstat - the pattern on the first seq scan is
half read / half write.

>> Time: 49218.739 ms
>
> and ~50 seconds is the (partially) cached read time with count

Again - the pattern here is pure read and completely non-cached. You see a
very nearly constant I/O rate when watching vmstat for the entire scan.

>> Time: 34528.053 ms
>
> so ~34 seconds is the (partially) cached read time for fastcount - I
> calculate this to give a ~362MB/s effective IO rate (I'm dividing by
> 1024*1024, not 1000*1000) FWIW.

The dd number uses 1000*1000, so I maintained that for the percentage of
max.

> While this is interesting, you probably want to stop Pg, unmount the
> filesystem, and restart Pg to get the uncached time for fastcount too
> (and how does this compare to uncached read with dd using the same block
> size?).

I'll do it again sometime, but I've already deleted the file. I've done the
following in the past to validate this though:

- Reboot machine
- Rerun scan

And we get identical results.

> But at this stage it certainly looks like the heapscan code is pretty
> efficient - great!

Yep.

> Oh - and do you want to try out 32K block size? I'm interested to see
> what level of improvement you get (as my system is hopelessly cpu
> bound...)!

Yah - I've done so in the past and not seen any - I was waiting for Alan to
post his results.

> Analysis:
>
>             Bandwidth   Percent of max
>   dd Read   407MB/s     100%
>   Count(1)  263MB/s     64.6%
>   HeapScan  383MB/s     94.1%

Note these are all in consistent 1000x1000 units.

Thanks for the test - neat trick! We'll use it to do some more profiling
some time soon...
- Luke

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster
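The gap between Mark's ~362 MB/s figure and the 383 MB/s in the analysis table is almost entirely the 1024*1024 vs 1000*1000 divisor Luke mentions. A quick check, taking ~34 s as the approximate fastcount() time:

```python
# Same measurement, two unit conventions: binary MiB (1024*1024) as Mark
# used, vs decimal MB (1000*1000) as dd reports.
table_bytes = 1579270 * 8192   # relpages * 8 KB pages
seconds = 34.0                 # approximate fastcount() time

rate_decimal = table_bytes / seconds / (1000 * 1000)   # ~380 MB/s
rate_binary = table_bytes / seconds / (1024 * 1024)    # ~363 MiB/s

# The two conventions always differ by the same fixed factor of ~4.9%.
unit_gap = (1024 * 1024) / (1000 * 1000)               # 1.048576
```
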
Re: [PERFORM] Hardware/OS recommendations for large databases (
Luke Lonergan wrote:

> That says it's something else in the path. As you probably know there is
> a page lock taken, a copy of the tuple from the page, lock removed, count
> incremented for every iteration of the agg node on a count(*). Is the
> same true of a count(1)?

Sorry Luke - message 3 - I seem to be suffering from a very small working
memory buffer myself right now; I think it's after a day of working with
DB2 ... :-)

Anyway, as I read src/backend/parser/gram.y:6542 - count(*) is transformed
into count(1), so these two are identical.

Cheers (last time tonight, promise!)

Mark

---(end of broadcast)---
TIP 6: explain analyze is your friend
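The equivalence Mark reads out of gram.y is easy to sanity-check in any SQL engine - count(*) and count(1) both count every row, while count(col) skips NULLs. A toy sketch using SQLite (chosen only because it ships with Python's standard library; PostgreSQL behaves the same way):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t(v INTEGER)")
conn.executemany("INSERT INTO t VALUES (?)", [(1,), (None,), (3,)])

count_star = conn.execute("SELECT count(*) FROM t").fetchone()[0]
count_one = conn.execute("SELECT count(1) FROM t").fetchone()[0]
count_col = conn.execute("SELECT count(v) FROM t").fetchone()[0]

print(count_star, count_one, count_col)  # 3 3 2
```
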
Re: [PERFORM] Hardware/OS recommendations for large databases (
Luke Lonergan wrote:

> Mark,
>
>>> Time: 197870.105 ms
>>
>> So 198 seconds is the uncached read time with count (just for clarity,
>> did you clear the Pg and filesystem caches or unmount / remount the
>> filesystem?)
>
> Nope - the longer time is due to the known second-write issue with
> Postgres: it writes the data to the table, but all of the pages are
> marked dirty(?), so on the first scan after loading they are always
> written again. This is clear as you watch vmstat - the pattern on the
> first seq scan is half read / half write.

Ah - indeed - first access after a COPY, no? I should have thought of that,
sorry!

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?
       http://www.postgresql.org/docs/faq
Re: [PERFORM] 8.1 count(*) distinct: IndexScan/SeqScan
> THIS MAY SEEM SILLY but vacuum is misspelled below, and presumably there
> was never any ANALYZE done.
>
>> postgres=# vaccum full verbose analyze;

I did run the vacUUm full verbose analyze; I just copy/pasted the wrong
line.

Cordialement,
Jean-Gérard Pailloncy

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?
       http://www.postgresql.org/docs/faq
Re: [PERFORM] 8.1 count(*) distinct: IndexScan/SeqScan
> Pailloncy Jean-Gerard [EMAIL PROTECTED] writes:
>> Why the stupid indexscan plan on the whole table ?
>
> Pray tell, what are you using for the planner cost parameters? The only
> way I can come close to duplicating your numbers is by setting
> random_page_cost to somewhere around 0.01 ...

I did not change the costs.

$ grep cost postgresql.conf
# note: increasing max_connections costs ~400 bytes of shared memory per
# note: increasing max_prepared_transactions costs ~600 bytes of shared memory
#vacuum_cost_delay = 0                  # 0-1000 milliseconds
#vacuum_cost_page_hit = 1               # 0-10000 credits
#vacuum_cost_page_miss = 10             # 0-10000 credits
#vacuum_cost_page_dirty = 20            # 0-10000 credits
#vacuum_cost_limit = 200                # 0-10000 credits
#random_page_cost = 4                   # units are one sequential page fetch
                                        # cost
#cpu_tuple_cost = 0.01                  # (same)
#cpu_index_tuple_cost = 0.001           # (same)
#cpu_operator_cost = 0.0025             # (same)
#autovacuum_vacuum_cost_delay = -1      # default vacuum cost delay for
                                        # vacuum_cost_delay
#autovacuum_vacuum_cost_limit = -1      # default vacuum cost limit for
                                        # vacuum_cost_limit

Cordialement,
Jean-Gérard Pailloncy

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster
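To see why Tom zeroes in on random_page_cost: a deliberately simplified sketch of the seq-vs-index trade-off. This is NOT the real 8.1 cost model - just the leading page and per-tuple terms, using the default parameter values visible in the postgresql.conf above, with made-up table sizes:

```python
# Simplified planner arithmetic: sequential scan vs whole-table index scan.
def seq_cost(pages, tuples, seq_page_cost=1.0, cpu_tuple_cost=0.01):
    # Every page read at sequential cost, every tuple examined.
    return pages * seq_page_cost + tuples * cpu_tuple_cost

def index_cost(pages, tuples, random_page_cost=4.0,
               cpu_tuple_cost=0.01, cpu_index_tuple_cost=0.001):
    # Full index scan touching heap pages in (assumed) random order.
    return pages * random_page_cost + tuples * (cpu_tuple_cost
                                                + cpu_index_tuple_cost)

pages, tuples = 10_000, 1_000_000

# With the default random_page_cost = 4, a whole-table index scan costs far
# more than a seq scan...
default_ratio = index_cost(pages, tuples) / seq_cost(pages, tuples)

# ...but at random_page_cost = 0.01 the index scan suddenly looks cheaper,
# which is the kind of setting Tom says would reproduce the reported plan.
tiny = index_cost(pages, tuples, random_page_cost=0.01)
```
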
[PERFORM] xlog flush request error
Hi,

I get the following error on doing anything with the database after
starting it. Can anyone suggest how I can fix this?

  xlog flush request 7/7D02338C is not satisfied --- flushed only to
  3/2471E324

Vipul Gupta
[PERFORM] Very slow queries - please help.
Hi Folks,

I'm new to Postgresql. I'm having great difficulties getting the
performance I had hoped for from Postgresql 8.0. The typical query below
takes ~20 minutes !! I hope an expert out there will tell me what I'm doing
wrong - I hope *I* am doing something wrong.

Hardware

Single processor, Intel Xeon 3.06 GHz machine running Red Hat Ent. 4, with
1.5 GB of RAM. The machine is dedicated to running Postgresql 8.0 and
Apache/mod_perl etc. The database is being accessed for report generation
via a web form. The web server talks to Pg over TCP/IP (I know that I don't
need to do this if they are all on the same machine, but I have good
reasons for this and don't suspect that this is where my problems are - I
have the same poor performance when running from psql on the server.)

Database

Very simple, not fully normalized set of two tables. The first table is
very small (2000 lines of 4 cols, with very few chars and integers in each
col). The other is quite a bit larger (50 lines with 15 cols, with the
largest fields ~ 256 chars).

Typical query

SELECT n.name FROM node n
WHERE n.name LIKE '56x%'
  AND n.type='H'
  AND n.usage='TEST'
  AND n.node_id NOT IN
      (select n.node_id
       FROM job_log j INNER JOIN node n ON j.node_id = n.node_id
       WHERE n.name LIKE '56x%'
         AND n.type='H'
         AND n.usage='TEST'
         AND j.job_name = 'COPY FILES'
         AND j.job_start = '2005-11-14 00:00:00'
         AND (j.job_stop = '2005-11-22 09:31:10' OR j.job_stop IS NULL))
ORDER BY n.name

The node table is the small table and the job_log table is the large
table.

I've tried all the basic things that I found in the documentation, like
VACUUM ANALYZE, EXPLAIN etc., but I suspect there is something terribly
wrong with what I'm doing, and these measures will not shave 19 minutes and
50 seconds off the query time.

Any help and comments would be very much appreciated.
Bealach

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
       subscribe-nomail command to [EMAIL PROTECTED] so that your message
       can get through to the mailing list cleanly
Re: [PERFORM] Very slow queries - please help.
> Typical query
>
> SELECT n.name FROM node n
> WHERE n.name LIKE '56x%' AND n.type='H' AND n.usage='TEST'
>   AND n.node_id NOT IN
>       (select n.node_id
>        FROM job_log j INNER JOIN node n ON j.node_id = n.node_id
>        WHERE n.name LIKE '56x%' AND n.type='H' AND n.usage='TEST'
>          AND j.job_name = 'COPY FILES'
>          AND j.job_start = '2005-11-14 00:00:00'
>          AND (j.job_stop = '2005-11-22 09:31:10' OR j.job_stop IS NULL))
> ORDER BY n.name

Do you have any indexes?

regards
Claus

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
       subscribe-nomail command to [EMAIL PROTECTED] so that your message
       can get through to the mailing list cleanly
Re: [PERFORM] High context switches occurring
Hi Anjan,

I can back up Scott: you should turn off HT if you see high values for CS.
I have a few customers running a web-based 3-tier application with
PostgreSQL, and we had to turn off HT to get better overall performance.
The issue is the behavior under high load - I've noticed that with HT on,
the server collapses faster.

Just a question: which version of XEON do you have, and what memory
architecture does the server have? I think Dual-Core XEONs are not an
issue. One of our customers has been using a 4-way Dual-Core Opteron 875
for a few months. We have Pg 8.0.3 and it runs perfectly. I have to say
that we use a special patch from Tom which fixes an issue with the locking
of shared buffers on the Opteron. I notice that this patch is also useful
for XEONs with EM64T.

Best regards
Sven.

Anjan Dave schrieb:

> Yes, it's turned on; unfortunately it got overlooked during the setup,
> and until now...! It's mostly a 'read' application. I increased the
> vm.max-readahead to 2048 from the default 256, after which I've not seen
> the CS storm, though it could be incidental.
>
> Thanks,
> Anjan
>
> -----Original Message-----
> From: Scott Marlowe [mailto:[EMAIL PROTECTED]]
> Sent: Tuesday, November 22, 2005 3:38 PM
> To: Anjan Dave
> Cc: Tom Lane; Vivek Khera; Postgresql Performance
> Subject: Re: [PERFORM] High context switches occurring
>
> On Tue, 2005-11-22 at 14:33, Anjan Dave wrote:
>> Is there any way to get a temporary relief from this Context Switching
>> storm? Does restarting postmaster help? It seems that I can recreate
>> the heavy CS with just one SELECT statement... and then when multiple
>> such SELECT queries are coming in, things just get hosed up until we
>> cancel a bunch of queries...
>
> Is your machine a hyperthreaded one? Some folks have found that turning
> off hyperthreading helps. I knew it made my servers better behaved in
> the past.
---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster

Sven Geisler [EMAIL PROTECTED]    Tel +49.30.5362.1627  Fax .1638
Senior Developer, AEC/communications GmbH, Berlin, Germany

---(end of broadcast)---
TIP 6: explain analyze is your friend
Re: [PERFORM] Very slow queries - please help.
Hi,

Thanks for your comments. I've not explicitly made any indexes, but the
default ones are:

[EMAIL PROTECTED] \di
            List of relations
 Schema |      Name       | Type  | Owner |  Table
--------+-----------------+-------+-------+---------
 user   | job_log_id_pkey | index | user  | job_log
 user   | node_id_pkey    | index | user  | node
 user   | node_name_key   | index | user  | node
(3 rows)

I'm also sending the EXPLAIN outputs:

explain SELECT n.name, n.type, n.usage, j.status, j.job_start, j.job_stop,
j.nfiles_in_job, j.job_name
FROM job_log j INNER JOIN node n ON j.node_id = n.node_id
WHERE n.name LIKE '56x%'
  AND n.type = 'K'
  AND n.usage = 'LIVE'
  AND j.job_name = 'COPY FILES'
  AND j.job_start = '2005-11-14 00:00:00'
  AND (j.job_stop = '2005-11-14 05:00:00' OR j.job_stop IS NULL)
ORDER BY n.name;

                               QUERY PLAN
------------------------------------------------------------------------
 Nested Loop  (cost=0.00..75753.31 rows=1 width=461)
   Join Filter: ("inner".node_id = "outer".node_id)
   ->  Index Scan using node_name_key on node n  (cost=0.00..307.75 rows=1 width=181)
         Filter: ((name ~~ '56x%'::text) AND (type = 'K'::bpchar) AND (usage = 'LIVE'::bpchar))
   ->  Seq Scan on job_log j  (cost=0.00..75445.54 rows=1 width=288)
         Filter: ((job_name = 'COPY FILES'::bpchar) AND (job_start = '2005-11-14 00:00:00'::timestamp without time zone) AND ((job_stop = '2005-11-14 05:00:00'::timestamp without time zone) OR (job_stop IS NULL)))
(6 rows)

explain SELECT n.name, n.type, n.usage
FROM node n
WHERE n.name LIKE '56x%'
  AND n.type = 'K'
  AND n.usage = 'LIVE'
  AND n.node_id NOT IN
      (SELECT n.node_id
       FROM job_log j INNER JOIN node n ON j.node_id = n.node_id
       WHERE n.name LIKE '56x%'
         AND n.type = 'K'
         AND n.usage = 'LIVE'
         AND j.job_name = 'COPY FILES'
         AND j.job_start = '2005-11-14 00:00:00'
         AND (j.job_stop = '2005-11-14 05:00:00' OR j.job_stop IS NULL))
ORDER BY n.name;

                               QUERY PLAN
------------------------------------------------------------------------
 Index Scan using node_name_key on node n  (cost=75451.55..75764.94 rows=1 width=177)
   Filter: ((name ~~ '56x%'::text) AND (type = 'K'::bpchar) AND (usage = 'LIVE'::bpchar) AND (NOT (hashed subplan)))
   SubPlan
     ->  Nested Loop  (cost=0.00..75451.54 rows=1 width=4)
           ->  Seq Scan on job_log j  (cost=0.00..75445.54 rows=1 width=4)
                 Filter: ((job_name = 'COPY FILES'::bpchar) AND (job_start = '2005-11-14 00:00:00'::timestamp without time zone) AND ((job_stop = '2005-11-14 05:00:00'::timestamp without time zone) OR (job_stop IS NULL)))
           ->  Index Scan using node_id_pkey on node n  (cost=0.00..5.99 rows=1 width=4)
                 Index Cond: ("outer".node_id = n.node_id)
                 Filter: ((name ~~ '56x%'::text) AND (type = 'K'::bpchar) AND (usage = 'LIVE'::bpchar))

Yours,

Bealach

> From: Claus Guttesen [EMAIL PROTECTED]
> To: Bealach-na Bo [EMAIL PROTECTED]
> CC: pgsql-performance@postgresql.org
> Subject: Re: [PERFORM] Very slow queries - please help.
> Date: Thu, 24 Nov 2005 14:23:38 +0100
>
>> Typical query
>> [snip]
>
> Do you have any indexes?
>
> regards
> Claus

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
       subscribe-nomail command to [EMAIL PROTECTED] so that your message
       can get through to the mailing list cleanly
Re: [PERFORM] Very slow queries - please help.
Hi,

> I'm also sending the EXPLAIN outputs.

Please provide EXPLAIN ANALYZE outputs instead of EXPLAIN - you will have
more information.

Indexes on your tables are obviously missing. You should try adding:

CREATE INDEX idx_node_filter ON node(name, type, usage);
CREATE INDEX idx_job_log_filter ON job_log(job_name, job_start, job_stop);

I'm not so sure it's a good idea to add job_stop to this index, as you have
an IS NULL in your query, so I'm not sure it can be used. You should try it
anyway and remove it if not needed.

I added all your search fields to the indexes, but it depends a lot on the
selectivity of your conditions. I don't know your data, but I think you
understand the idea.

HTH

--
Guillaume

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings
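As a toy illustration of what such an index does to plan choice, here is a sketch using SQLite's EXPLAIN QUERY PLAN (chosen only because it runs anywhere; the principle is the same in PostgreSQL). The column list is cut down to the two equality filters, since index use under a leading LIKE column is a separate question:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE node(node_id INTEGER, name TEXT, type TEXT, usage TEXT)")

query = "SELECT name FROM node WHERE type = 'K' AND usage = 'LIVE'"

# Without any index, the only option is a full table scan.
before = " ".join(row[-1]
                  for row in conn.execute("EXPLAIN QUERY PLAN " + query))

# With an index on the equality columns, the planner switches to it.
conn.execute("CREATE INDEX idx_node_filter ON node(type, usage)")
after = " ".join(row[-1]
                 for row in conn.execute("EXPLAIN QUERY PLAN " + query))

print(before)  # a SCAN of node
print(after)   # a SEARCH of node USING INDEX idx_node_filter
```
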
Re: [PERFORM] Very slow queries - please help.
Bealach-na Bo [EMAIL PROTECTED] writes:

> I'm having great difficulties getting the performance I had hoped for
> from Postgresql 8.0. The typical query below takes ~20 minutes !!

You need to show us the table definition (including indexes) and the
EXPLAIN ANALYZE results for the query. It seems likely that the NOT IN is
the source of your problems, but it's hard to be sure without EXPLAIN
results.

regards, tom lane

---(end of broadcast)---
TIP 6: explain analyze is your friend
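For what it's worth, a rewrite commonly suggested on this list when NOT IN misbehaves is NOT EXISTS with a correlated subquery. A toy sketch that the two forms return the same rows (SQLite via Python's standard library, schema cut down to two columns; note they only agree when the subquery cannot return NULLs):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE node(node_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE job_log(node_id INTEGER, job_name TEXT);
INSERT INTO node VALUES (1,'56xa'),(2,'56xb'),(3,'56xc');
INSERT INTO job_log VALUES (1,'COPY FILES'),(3,'OTHER');
""")

not_in = """
SELECT n.name FROM node n
WHERE n.node_id NOT IN (SELECT j.node_id FROM job_log j
                        WHERE j.job_name = 'COPY FILES')
ORDER BY n.name"""

not_exists = """
SELECT n.name FROM node n
WHERE NOT EXISTS (SELECT 1 FROM job_log j
                  WHERE j.node_id = n.node_id
                    AND j.job_name = 'COPY FILES')
ORDER BY n.name"""

rows_in = conn.execute(not_in).fetchall()        # nodes with no COPY FILES job
rows_exists = conn.execute(not_exists).fetchall()
print(rows_in == rows_exists)  # True
```
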
Re: [PERFORM] xlog flush request error
[EMAIL PROTECTED] writes:

> Can anyone suggest how do i fix this
> xlog flush request 7/7D02338C is not satisfied --- flushed only to
> 3/2471E324

This looks like corrupt data to me --- specifically, garbage in the LSN
field of a page header. Is that all you get? PG 7.4 and up should tell you
the problem page number in a CONTEXT: line.

regards, tom lane

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster
Re: [PERFORM] Hardware/OS recommendations for large databases (
Mark Kirkwood [EMAIL PROTECTED] writes:

> Yeah - it's pretty clear that the count aggregate is fairly expensive wrt
> cpu - However, I am not sure if all agg nodes suffer this way (guess we
> could try a trivial aggregate that does nothing for all tuples bar the
> last and just reports the final value it sees).

As you mention, count(*) and count(1) are the same thing.

Last I heard, the reason count(*) was so expensive was because its state
variable was a bigint. That means it doesn't fit in a Datum and has to be
alloced and stored as a pointer. And because of the Aggregate API, that
means it has to be allocated and freed for every tuple processed.

There was some talk of having a special-case API for count(*) and maybe
sum(...) to avoid having to do this. There was also some talk of making
Datum 8 bytes wide on platforms where that was natural (I guess AMD64,
Sparc64, Alpha, Itanic).

Afaik none of these items have happened, but I don't know for sure.

--
greg

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
       subscribe-nomail command to [EMAIL PROTECTED] so that your message
       can get through to the mailing list cleanly
Re: [PERFORM] Hardware/OS recommendations for large databases (
Greg Stark [EMAIL PROTECTED] writes:

> Last I heard the reason count(*) was so expensive was because its state
> variable was a bigint. That means it doesn't fit in a Datum and has to be
> alloced and stored as a pointer. And because of the Aggregate API that
> means it has to be allocated and freed for every tuple processed.

There's a hack in 8.1 to avoid the palloc overhead (courtesy of Neil
Conway IIRC).

regards, tom lane

---(end of broadcast)---
TIP 6: explain analyze is your friend
Re: [PERFORM] Hardware/OS recommendations for large databases (
Tom Lane [EMAIL PROTECTED] writes:

> Greg Stark [EMAIL PROTECTED] writes:
>> Last I heard the reason count(*) was so expensive was because its state
>> variable was a bigint. That means it doesn't fit in a Datum and has to
>> be alloced and stored as a pointer. And because of the Aggregate API
>> that means it has to be allocated and freed for every tuple processed.
>
> There's a hack in 8.1 to avoid the palloc overhead (courtesy of Neil
> Conway IIRC).

ah, cool, missed that.

--
greg

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
       choose an index scan if your joining column's datatypes do not
       match
Re: [PERFORM] Hardware/OS recommendations for large databases (
The same 12.9GB distributed across 4 machines using Bizgres MPP fits into
I/O cache. The interesting result is that the query select count(1) is
limited in speed to 280 MB/s per CPU when run on the lineitem table. So
when I run it spread over 4 machines, one CPU per machine, I get this:

== Bizgres MPP, 4 data segments, 1 per 2 CPUs ==

llonergan=# explain select count(1) from lineitem;
                               QUERY PLAN
------------------------------------------------------------------------
 Aggregate  (cost=582452.00..582452.00 rows=1 width=0)
   ->  Gather Motion  (cost=582452.00..582452.00 rows=1 width=0)
         ->  Aggregate  (cost=582452.00..582452.00 rows=1 width=0)
               ->  Seq Scan on lineitem  (cost=0.00..544945.00 rows=15002800 width=0)
(4 rows)

llonergan=# \timing
Timing is on.
llonergan=# select count(1) from lineitem;
   count
----------
 59986052
(1 row)
Time: 12191.435 ms
llonergan=# select count(1) from lineitem;
   count
----------
 59986052
(1 row)
Time: 11986.109 ms
llonergan=# select count(1) from lineitem;
   count
----------
 59986052
(1 row)
Time: 11448.941 ms

==

That's 12,937 MB in 11.45 seconds, or 1,130 MB/s. When you divide out the
number of Postgres instances (4), that's 283 MB/s per Postgres instance. To
verify that this has nothing to do with MPP, I ran it in a special internal
mode on one instance and got the same result.

So - we should be able to double this rate by running one segment per CPU,
or two per host:

== Bizgres MPP, 8 data segments, 1 per CPU ==

llonergan=# select count(1) from lineitem;
   count
----------
 59986052
(1 row)
Time: 6484.594 ms
llonergan=# select count(1) from lineitem;
   count
----------
 59986052
(1 row)
Time: 6156.729 ms
llonergan=# select count(1) from lineitem;
   count
----------
 59986052
(1 row)
Time: 6063.416 ms

==

That's 12,937 MB in 6.06 seconds, or 2,134 MB/s. When you divide out the
number of Postgres instances (8), that's 267 MB/s per Postgres instance.

So, if you want to select count(1), using more CPUs is a good idea! For
most complex queries, having lots of CPUs + MPP is a good combo.
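The per-instance rates quoted above can be double-checked from the raw timings (decimal MB again, best run of three):

```python
# Verify the aggregate and per-instance scan rates from the MPP runs.
table_mb = 12937.0              # same 12.9 GB lineitem table

rate_4seg = table_mb / 11.45    # best 4-segment run  -> ~1130 MB/s
rate_8seg = table_mb / 6.063    # best 8-segment run  -> ~2134 MB/s

per_instance_4 = rate_4seg / 4  # ~283 MB/s
per_instance_8 = rate_8seg / 8  # ~267 MB/s
```
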
Here is an example of a sorting plan - this should probably be done with a hash aggregation, but using 8 CPUs makes it go 8x faster: - Luke ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PERFORM] Very slow queries - please help
OK. The consensus seems to be that I need more indexes and I also need to
look into the NOT IN statement as a possible bottleneck.

I've introduced the indexes, which has led to a DRAMATIC change in response
time. Now I have to experiment with INNER JOIN - OUTER JOIN variations and
SET ENABLE_SEQSCAN=OFF.

Forgive me for not mentioning each person individually and by name. You
have all contributed to confirming what I had suspected (and hoped): that
*I* have a lot to learn! I'm attaching table descriptions, the first few
lines of top output while the queries were running, index lists, sample
queries and EXPLAIN ANALYSE output BEFORE and AFTER the introduction of the
indexes. As I said, DRAMATIC :)

I notice that the CPU usage does not vary very much (it's nearly 100%
anyway), but the memory usage drops markedly, which is another very nice
result of the index introduction.

Any more comments and tips would be very welcome. Thank you all for your
input.

Bealach.

[EMAIL PROTECTED] \d job_log
                         Table blouser.job_log
     Column     |            Type             |       Modifiers
----------------+-----------------------------+--------------------------------------------------
 job_log_id     | integer                     | not null default nextval('job_log_id_seq'::text)
 first_registry | timestamp without time zone |
 blogger_name   | character(50)               |
 node_id        | integer                     |
 job_type       | character(50)               |
 job_name       | character(256)              |
 job_start      | timestamp without time zone |
 job_timeout    | interval                    |
 job_stop       | timestamp without time zone |
 nfiles_in_job  | integer                     |
 status         | integer                     |
 error_code     | smallint                    |
Indexes:
    job_log_id_pkey PRIMARY KEY, btree (job_log_id)
Check constraints:
    job_log_status_check CHECK (status = 0 OR status = 1 OR status = 8 OR
        status = 9)
Foreign-key constraints:
    legal_node FOREIGN KEY (node_id) REFERENCES node(node_id)

[EMAIL PROTECTED] \d node
                      Table blouser.node
 Column  |     Type      |                   Modifiers
---------+---------------+-----------------------------------------------
 node_id | integer       | not null default nextval('node_id_seq'::text)
 name    | character(50) |
 type    | character(1)  |
 usage   | character(4)  |
Indexes:
    node_id_pkey PRIMARY KEY, btree (node_id)
    node_name_key UNIQUE, btree (name)
Check constraints:
    node_type_check CHECK (type = 'B'::bpchar OR type = 'K'::bpchar OR
        type = 'C'::bpchar OR type = 'T'::bpchar OR type = 'R'::bpchar)
    node_usage_check CHECK (usage = 'TEST'::bpchar OR usage = 'LIVE'::bpchar)

# before new indexes were created

Tasks: 114 total, 2 running, 112 sleeping, 0 stopped, 0 zombie
Cpu(s): 25.7% us, 24.5% sy, 0.0% ni, 49.4% id, 0.3% wa, 0.0% hi, 0.0% si
Mem:   1554788k total, 1513576k used,   41212k free,   31968k buffers
Swap:  1020024k total,   27916k used,  992108k free,  708728k cached

  PID USER     PR NI  VIRT  RES  SHR S %CPU %MEM   TIME+  COMMAND
25883 postgres 25  0 20528  12m  11m R 99.7  0.8  4:54.91 postmaster

[EMAIL PROTECTED] \di
             List of relations
 Schema  |      Name       | Type  |  Owner  |  Table
---------+-----------------+-------+---------+---------
 blouser | job_log_id_pkey | index | blouser | job_log
 blouser | node_id_pkey    | index | blouser | node
 blouser | node_name_key   | index | blouser | node
(3 rows)

EXPLAIN ANALYSE SELECT n.name, n.type, n.usage, j.status, j.job_start,
j.job_stop, j.nfiles_in_job, j.job_name
FROM job_log j INNER JOIN node n ON j.node_id = n.node_id
WHERE n.name LIKE '711%'
  AND n.type = 'K'
  AND n.usage = 'LIVE'
  AND j.job_name = 'COPY FILES'
  AND j.job_start = '2005-11-14 00:00:00'
  AND (j.job_stop = '2005-11-14 05:00:00' OR j.job_stop IS NULL)
ORDER BY n.name;

                               QUERY PLAN
------------------------------------------------------------------------
 Nested Loop  (cost=0.00..75753.31 rows=1 width=461) (actual time=270486.692..291662.350 rows=3 loops=1)
   Join Filter: ("inner".node_id = "outer".node_id)
   ->  Index Scan using node_name_key on node n  (cost=0.00..307.75 rows=1 width=181) (actual time=0.135..11.034 rows=208 loops=1)
         Filter: ((name ~~ '711%'::text) AND (type = 'K'::bpchar) AND (usage = 'LIVE'::bpchar))
   ->  Seq Scan on job_log j  (cost=0.00..75445.54 rows=1 width=288) (actual time=273.374..1402.089 rows=22 loops=208)
         Filter: ((job_name = 'COPY FILES'::bpchar) AND (job_start = '2005-11-14 00:00:00'::timestamp without time zone) AND ((job_stop = '2005-11-14
[PERFORM] Very slow queries - please help
A quick note to say that I'm very grateful for Tom Lane's input also. Tom,
I did put you on the list of recipients for my last posting to
pgsql-performance, but got:

--- cut here ---
This is an automatically generated Delivery Status Notification.
Delivery to the following recipients failed.
[EMAIL PROTECTED]
--- cut here ---

Many regards,

Bealach

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
       subscribe-nomail command to [EMAIL PROTECTED] so that your message
       can get through to the mailing list cleanly
Re: [PERFORM] Hardware/OS recommendations for large databases (
Tom Lane wrote:

> Greg Stark [EMAIL PROTECTED] writes:
>> Last I heard the reason count(*) was so expensive was because its state
>> variable was a bigint. That means it doesn't fit in a Datum and has to
>> be alloced and stored as a pointer. And because of the Aggregate API
>> that means it has to be allocated and freed for every tuple processed.
>
> There's a hack in 8.1 to avoid the palloc overhead (courtesy of Neil
> Conway IIRC).

It certainly makes quite a difference as I measure it. Doing select
count(1) from a 181000 page table (completely uncached) on my PIII:

  8.0 : 32 s
  8.1 : 25 s

Note that the 'fastcount()' function takes 21 s in both cases - so all the
improvement seems to be from the count overhead reduction.

Cheers

Mark

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings
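Mark's three numbers let you split the runtime into scan cost vs aggregate overhead, under the assumption that fastcount() approximates the pure heap-scan cost:

```python
# Decompose the PIII timings above: fastcount() ~ pure heap-scan cost,
# so whatever is left is count() aggregate-node overhead.
scan_s = 21.0   # fastcount(), same in 8.0 and 8.1
pg80_s = 32.0   # select count(1), 8.0
pg81_s = 25.0   # select count(1), 8.1

overhead_80 = pg80_s - scan_s            # ~11 s of aggregate overhead in 8.0
overhead_81 = pg81_s - scan_s            # ~4 s in 8.1
removed = 1 - overhead_81 / overhead_80  # ~64% of the overhead eliminated
```
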
Re: [PERFORM] 8.1 count(*) distinct: IndexScan/SeqScan
I redid the test with a freshly initialized data directory. Same result.

Note: this is the full log; I have only removed my typing mistakes (such as "sl" for "ls").

Jean-Gérard Pailloncy

Last login: Thu Nov 24 12:52:32 2005 from 192.168.0.1
OpenBSD 3.8 (WDT) #2: Tue Nov 8 00:52:38 CET 2005

Welcome to OpenBSD: The proactively secure Unix-like operating system.

Please use the sendbug(1) utility to report bugs in the system.
Before reporting a bug, please try to reproduce it with the latest
version of the code. With bug reports, please try to ensure that
enough information to reproduce the problem is enclosed, and if a
known fix for it exists, include that as well.

Terminal type? [xterm-color]
# cd /mnt2/pg/install/bin/
# mkdir /mnt2/pg/data
# chown -R _pgsql:_pgsql /mnt2/pg/data
# su _pgsql
$ ls
clusterdb   droplang  pg_config       pg_resetxlog  reindexdb
createdb    dropuser  pg_controldata  pg_restore    vacuumdb
createlang  ecpg      pg_ctl          postgres
createuser  initdb    pg_dump         postmaster
dropdb      ipcclean  pg_dumpall      psql
$ ./initdb -D /mnt2/pg/data
The files belonging to this database system will be owned by user _pgsql.
This user must also own the server process.

The database cluster will be initialized with locale C.

fixing permissions on existing directory /mnt2/pg/data ... ok
creating directory /mnt2/pg/data/global ... ok
creating directory /mnt2/pg/data/pg_xlog ... ok
creating directory /mnt2/pg/data/pg_xlog/archive_status ... ok
creating directory /mnt2/pg/data/pg_clog ... ok
creating directory /mnt2/pg/data/pg_subtrans ... ok
creating directory /mnt2/pg/data/pg_twophase ... ok
creating directory /mnt2/pg/data/pg_multixact/members ... ok
creating directory /mnt2/pg/data/pg_multixact/offsets ... ok
creating directory /mnt2/pg/data/base ... ok
creating directory /mnt2/pg/data/base/1 ... ok
creating directory /mnt2/pg/data/pg_tblspc ... ok
selecting default max_connections ... 100
selecting default shared_buffers ... 1000
creating configuration files ... ok
creating template1 database in /mnt2/pg/data/base/1 ... ok
initializing pg_authid ... ok
enabling unlimited row size for system tables ... ok
initializing dependencies ... ok
creating system views ... ok
loading pg_description ... ok
creating conversions ... ok
setting privileges on built-in objects ... ok
creating information schema ... ok
vacuuming database template1 ... ok
copying template1 to template0 ... ok
copying template1 to postgres ... ok

WARNING: enabling "trust" authentication for local connections
You can change this by editing pg_hba.conf or using the -A option the
next time you run initdb.

Success. You can now start the database server using:

    ./postmaster -D /mnt2/pg/data
or
    ./pg_ctl -D /mnt2/pg/data -l logfile start

$ ./pg_ctl -D /mnt2/pg/data -l /mnt2/pg/data/logfile start
postmaster starting
$ ./psql postgres
Welcome to psql 8.1.0, the PostgreSQL interactive terminal.

Type:  \copyright for distribution terms
       \h for help with SQL commands
       \? for help with psql commands
       \g or terminate with semicolon to execute query
       \q to quit

postgres=# create table test (id serial, val integer);
NOTICE:  CREATE TABLE will create implicit sequence "test_id_seq" for serial column "test.id"
CREATE TABLE
postgres=# create unique index testid on test (id);
CREATE INDEX
postgres=# create index testval on test (val);
CREATE INDEX
postgres=# insert into test (val) values (round(random() *1024*1024*1024));
INSERT 0 1
postgres=# vacuum full analyze;
VACUUM
postgres=# select count(1) from test;
 count
-------
     1
(1 row)

postgres=# explain select count(*) from (select distinct on (val) * from test) as foo;
                           QUERY PLAN
-----------------------------------------------------------------
 Aggregate  (cost=1.04..1.05 rows=1 width=0)
   ->  Unique  (cost=1.02..1.03 rows=1 width=8)
         ->  Sort  (cost=1.02..1.02 rows=1 width=8)
               Sort Key: test.val
               ->  Seq Scan on test  (cost=0.00..1.01 rows=1 width=8)
(5 rows)

postgres=# insert into test (val) select round(random() *1024*1024*1024) from test;
INSERT 0 1
postgres=# insert into test (val) select round(random() *1024*1024*1024) from test;
INSERT 0 2
postgres=# insert into test (val) select round(random() *1024*1024*1024) from test;
INSERT 0 4
postgres=# insert into test (val) select round(random() *1024*1024*1024) from test;
INSERT 0 8
postgres=# insert into test (val) select round(random() *1024*1024*1024) from test;
INSERT 0 16
postgres=# insert into test (val) select round(random() *1024*1024*1024) from test;
INSERT 0 32
postgres=# vacuum full analyze;
VACUUM
postgres=# explain select count(*) from (select distinct on (val) * from test) as foo;
                           QUERY PLAN
-----------------------------------------------------------------
 Aggregate  (cost=4.68..4.69
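The population trick in the session above doubles the table on each step: every "insert into test (val) select ... from test" adds one new random row per existing row, so the INSERT counts run 1, 2, 4, 8, 16, 32. A small Python sketch (illustrative only, not PostgreSQL code) of that growth, continued up to the 2048-row table size discussed in the reply below:

```python
# Each self-insert doubles the row count, mimicking repeated
# "insert into test (val) select round(random()*1024*1024*1024) from test".
import random

rows = [random.randrange(1024 ** 3)]    # the single seed row
doublings = 0
while len(rows) < 2048:                 # the size Tom Lane's reply tests at
    # one new random value per existing row, like INSERT ... SELECT ... FROM test
    rows += [random.randrange(1024 ** 3) for _ in rows]
    doublings += 1

print(len(rows), doublings)  # -> 2048 11
```

So reaching 2K rows takes 11 of these doubling inserts, and reaching 2M rows would take a further 10.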
Re: [PERFORM] 8.1 count(*) distinct: IndexScan/SeqScan
Pailloncy Jean-Gerard [EMAIL PROTECTED] writes:
> I redo the test, with a freshly installed data directory. Same result.

What same result?  You only ran it up to 2K rows, not 2M.

In any case, EXPLAIN without ANALYZE is pretty poor ammunition for
complaining that the planner made the wrong choice.  I ran the same test
case, and AFAICS the indexscan is the right choice at 2K rows:

regression=# explain analyze select count(*) from (select distinct on (val) * from test) as foo;
                                                             QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=105.24..105.25 rows=1 width=0) (actual time=41.561..41.565 rows=1 loops=1)
   ->  Unique  (cost=0.00..79.63 rows=2048 width=8) (actual time=0.059..32.459 rows=2048 loops=1)
         ->  Index Scan using testval on test  (cost=0.00..74.51 rows=2048 width=8) (actual time=0.049..13.197 rows=2048 loops=1)
 Total runtime: 41.683 ms
(4 rows)

regression=# set enable_indexscan TO 0;
SET
regression=# explain analyze select count(*) from (select distinct on (val) * from test) as foo;
                                                      QUERY PLAN
----------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=179.96..179.97 rows=1 width=0) (actual time=59.567..59.571 rows=1 loops=1)
   ->  Unique  (cost=144.12..154.36 rows=2048 width=8) (actual time=21.438..50.434 rows=2048 loops=1)
         ->  Sort  (cost=144.12..149.24 rows=2048 width=8) (actual time=21.425..30.589 rows=2048 loops=1)
               Sort Key: test.val
               ->  Seq Scan on test  (cost=0.00..31.48 rows=2048 width=8) (actual time=0.014..9.902 rows=2048 loops=1)
 Total runtime: 60.265 ms
(6 rows)

			regards, tom lane

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
       subscribe-nomail command to [EMAIL PROTECTED] so that your
       message can get through to the mailing list cleanly
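The two plans compared above differ only in how the rows reach the Unique node: the index scan delivers them already ordered by val, while the seq scan needs an explicit Sort first. A Python sketch of the two shapes (illustrative only, not PostgreSQL internals):

```python
# "Unique over presorted input" (index-scan plan) versus
# "Sort -> Unique" (seq-scan plan): both yield the same distinct count,
# but the first skips the explicit sort step.
import random

def count_unique_presorted(sorted_vals):
    """Unique node: count value changes in an already-sorted stream."""
    count = 0
    prev = object()                 # sentinel unequal to any value
    for v in sorted_vals:
        if v != prev:
            count += 1
            prev = v
    return count

def count_unique_sort_first(vals):
    """Seq-scan plan: sort the unordered rows, then apply Unique."""
    return count_unique_presorted(sorted(vals))

random.seed(42)
vals = [random.randrange(1024 ** 3) for _ in range(2048)]  # like the test table
index_order = sorted(vals)          # a btree index stores values in order

assert count_unique_presorted(index_order) == count_unique_sort_first(vals)
```

At 2048 rows the sort is cheap either way, which is why the runtimes (41.7 ms vs. 60.3 ms) are close; the interesting question in the thread is what happens at 2M rows.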
Re: [PERFORM] 8.1 count(*) distinct: IndexScan/SeqScan
Tom Lane wrote:
> What same result?  You only ran it up to 2K rows, not 2M.  In any case,
> EXPLAIN without ANALYZE is pretty poor ammunition for complaining that
> the planner made the wrong choice.  I ran the same

Hello, sorry to jump in mid-stream, but this reminded me of something.

I have hit cases where I have a query for which there is a somewhat
obvious (to a human...) query plan that should make it possible to get
an answer pretty quickly, yet the query never finishes (or rather, after
hours of waiting I finally kill it). I assume this is because of a
sub-optimal query plan. But it appears that an EXPLAIN ANALYZE runs the
actual query, so it takes as long as the actual query. In such a case,
how can I go about tracking down the issue, up to and including a
complaint about the query planner? :-)

(Overall, I'm pretty pleased with the PG query planner; it often gets
better results than another, popular commercial DBMS we use here. That
is just a general impression, though, not the result of setting up the
same schema in each for a comparison.)

Kyle Cordes
www.kylecordes.com

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings
Re: [PERFORM] xlog flush request error
Hi Tom,

Basically, when I run any query against the database, say

    select count(*) from table1;

it gives me the following error trace:

WARNING:  could not write block 297776 of 1663/2110743/2110807
DETAIL:  Multiple failures --- write error may be permanent.
ERROR:  xlog flush request 7/7D02338C is not satisfied --- flushed only to 3/2471E324
writing block 297776 of relation 1663/2110743/2110807
xlog flush request 7/7D02338C is not satisfied --- flushed only to 3/2471E324
xlog flush request 7/7D02338C is not satisfied --- flushed only to 3/2471E324

I tried using pg_resetxlog but, to date, have not been able to solve
this problem.

Regards,
Vipul Gupta

Tom Lane [EMAIL PROTECTED] wrote on 11/24/2005 09:07 PM:
To: [EMAIL PROTECTED]
cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] xlog flush request error

[EMAIL PROTECTED] writes:
> Can anyone suggest how do I fix this
> xlog flush request 7/7D02338C is not satisfied --- flushed only to 3/2471E324

This looks like corrupt data to me --- specifically, garbage in the LSN
field of a page header.  Is that all you get?  PG 7.4 and up should tell
you the problem page number in a CONTEXT: line.

			regards, tom lane
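To see why Tom reads this as a garbage LSN: PostgreSQL prints WAL locations as two hex halves "X/Y" (high and low 32 bits of a 64-bit byte position). A quick sketch decoding the two positions from the error message:

```python
# The damaged page header claims LSN 7/7D02338C, but WAL has only been
# flushed to 3/2471E324 -- the page points roughly 17 GB past the end of
# the real WAL stream, which a sane page cannot do.

def parse_lsn(text: str) -> int:
    """Convert an 'X/Y' WAL location into a 64-bit byte position."""
    high, low = text.split("/")
    return (int(high, 16) << 32) | int(low, 16)

requested = parse_lsn("7/7D02338C")   # LSN stored in the damaged page header
flushed = parse_lsn("3/2471E324")     # how far WAL has actually been flushed

gap_gb = (requested - flushed) / (1 << 30)
print(f"page LSN is {gap_gb:.1f} GB beyond the flushed WAL position")
```

Since no page can legitimately carry an LSN beyond the end of WAL, the LSN field itself must be corrupt, which also explains why pg_resetxlog does not help: the damage is in the data file, not in the WAL.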
Re: [PERFORM] xlog flush request error
[EMAIL PROTECTED] writes:
> ERROR:  xlog flush request 7/7D02338C is not satisfied --- flushed only to 3/2471E324
> writing block 297776 of relation 1663/2110743/2110807

You need to fix or zero out that data block ...

			regards, tom lane
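Zeroing out the block first requires locating it on disk. A hedged sketch of the arithmetic, assuming the compile-time defaults of an 8 kB block size and 1 GB relation segment files (verify both for your build, and take a file-level backup before touching anything): relation 1663/2110743/2110807 is tablespace/database/relfilenode, so the files live under $PGDATA/base/2110743/.

```python
# Locate block 297776 of relfilenode 2110807, assuming the default
# 8 kB BLCKSZ and 1 GB segment size (both are compile-time options).
BLOCK_SIZE = 8192
SEGMENT_BLOCKS = (1 << 30) // BLOCK_SIZE   # blocks per 1 GB segment file

def locate_block(relfilenode: int, block: int):
    """Return (segment file name, byte offset within it) for a block."""
    segment, block_in_seg = divmod(block, SEGMENT_BLOCKS)
    name = str(relfilenode) if segment == 0 else f"{relfilenode}.{segment}"
    return name, block_in_seg * BLOCK_SIZE

print(locate_block(2110807, 297776))  # -> ('2110807.2', 291897344)
```

That is, the bad block sits 291,897,344 bytes into the third segment file of the relation; overwriting those 8192 bytes with zeros (server stopped) makes PostgreSQL treat the page as empty, at the cost of whatever rows it held.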