Re: [PERFORM] 8.1 count(*) distinct: IndexScan/SeqScan
Pailloncy Jean-Gerard <[EMAIL PROTECTED]> wrote ..

[snip]

THIS MAY SEEM SILLY, but "vacuum" is misspelled below, and presumably there
was never any ANALYZE done.

> > postgres=# vaccum full verbose analyze;

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?
http://www.postgresql.org/docs/faq
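[Not part of the original message: for readers following along, the corrected command is spelled as follows. Because the typo'd command errors out, the table has no statistics, so the planner is working from default estimates.]

```sql
-- Corrected spelling of the command from the original post.
-- Without a successful ANALYZE, the planner has no statistics for "test".
VACUUM FULL VERBOSE ANALYZE;

-- Or scoped to the single table in question:
VACUUM ANALYZE test;
```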
Re: [PERFORM] Hardware/OS recommendations for large databases (
Luke Lonergan wrote:
> Mark,
>
> This is an excellent idea – unfortunately I'm in Maui right now (Mahalo!)
> and I'm not getting to testing with this.  My first try was with 8.0.3 and
> it's an 8.1 function I presume.  Not to be lazy – but any hint as to how
> to do the same thing for 8.0?

Yeah, it's 8.1 - I didn't think to check against 8.0. The attached variant
works with 8.0.4 (textToQualifiedNameList needs 2 args).

cheers

Mark

P.s. Maui eh, sounds real nice.

/*
 * fastcount.c
 *
 * Do a count that uses considerably less CPU time than an aggregate.
 *
 * (Variant for 8.0.x - textToQualifiedNameList needs 2 args)
 */

#include "postgres.h"

#include "funcapi.h"
#include "access/heapam.h"
#include "catalog/namespace.h"
#include "utils/builtins.h"

extern Datum fastcount(PG_FUNCTION_ARGS);

PG_FUNCTION_INFO_V1(fastcount);

Datum
fastcount(PG_FUNCTION_ARGS)
{
    text       *relname = PG_GETARG_TEXT_P(0);
    RangeVar   *relrv;
    Relation    rel;
    HeapScanDesc scan;
    HeapTuple   tuple;
    int64       result = 0;

    /* Use the name to get a suitable range variable and open the relation. */
    relrv = makeRangeVarFromNameList(textToQualifiedNameList(relname, ""));
    rel = heap_openrv(relrv, AccessShareLock);

    /* Start a heap scan on the relation. */
    scan = heap_beginscan(rel, SnapshotNow, 0, NULL);
    while ((tuple = heap_getnext(scan, ForwardScanDirection)) != NULL)
    {
        result++;
    }

    /* End the scan and close up the relation. */
    heap_endscan(scan);
    heap_close(rel, AccessShareLock);

    PG_RETURN_INT64(result);
}
Re: [PERFORM] 8.1 count(*) distinct: IndexScan/SeqScan
Pailloncy Jean-Gerard <[EMAIL PROTECTED]> writes:
> Why the stupid indexscan plan on the whole table ?

Pray tell, what are you using for the planner cost parameters?  The only way
I can come close to duplicating your numbers is by setting random_page_cost
to somewhere around 0.01 ...

			regards, tom lane
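[Not part of the original message: a minimal sketch of how the planner cost parameters Tom is asking about can be inspected and reset from psql. The default value shown is for 8.1; verify against your own installation.]

```sql
-- Inspect the current setting (8.1 ships with random_page_cost = 4;
-- the cost of a sequential page fetch is fixed at 1.0).
SHOW random_page_cost;

-- Restore the default for the current session:
SET random_page_cost = 4;

-- Or make it stick for one database:
ALTER DATABASE postgres SET random_page_cost = 4;
```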
Re: [PERFORM] Hardware/OS recommendations for large databases (
Mark,

This is an excellent idea – unfortunately I'm in Maui right now (Mahalo!)
and I'm not getting to testing with this.  My first try was with 8.0.3 and
it's an 8.1 function I presume.  Not to be lazy – but any hint as to how to
do the same thing for 8.0?

- Luke

On 11/21/05 9:10 PM, "Mark Kirkwood" <[EMAIL PROTECTED]> wrote:

Luke Lonergan wrote:
> So that leaves the question - why not more than 64% of the I/O scan rate?
> And why is it a flat 64% as the I/O subsystem increases in speed from
> 333-400MB/s?

It might be interesting to see what effect reducing the cpu consumption
entailed by the count aggregation has - by (say) writing a little bit of
code to heap scan the desired relation (sample attached).

Cheers

Mark

/*
 * fastcount.c
 *
 * Do a count that uses considerably less CPU time than an aggregate.
 */

#include "postgres.h"

#include "funcapi.h"
#include "access/heapam.h"
#include "catalog/namespace.h"
#include "utils/builtins.h"

extern Datum fastcount(PG_FUNCTION_ARGS);

PG_FUNCTION_INFO_V1(fastcount);

Datum
fastcount(PG_FUNCTION_ARGS)
{
    text       *relname = PG_GETARG_TEXT_P(0);
    RangeVar   *relrv;
    Relation    rel;
    HeapScanDesc scan;
    HeapTuple   tuple;
    int64       result = 0;

    /* Use the name to get a suitable range variable and open the relation. */
    relrv = makeRangeVarFromNameList(textToQualifiedNameList(relname));
    rel = heap_openrv(relrv, AccessShareLock);

    /* Start a heap scan on the relation. */
    scan = heap_beginscan(rel, SnapshotNow, 0, NULL);
    while ((tuple = heap_getnext(scan, ForwardScanDirection)) != NULL)
    {
        result++;
    }

    /* End the scan and close up the relation. */
    heap_endscan(scan);
    heap_close(rel, AccessShareLock);

    PG_RETURN_INT64(result);
}
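[Not part of the original message: the thread doesn't show how the sample function gets wired up. A minimal sketch of registering and calling a C-language function like this one; the shared-library path is an assumption, so adjust it to wherever fastcount.so is installed.]

```sql
-- Register the compiled C function (library path is hypothetical):
CREATE FUNCTION fastcount(text) RETURNS bigint
    AS '/usr/local/pgsql/lib/fastcount', 'fastcount'
    LANGUAGE C STRICT;

-- Then compare the bare heap scan against the regular aggregate:
SELECT fastcount('test');
SELECT count(*) FROM test;
```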
Re: [PERFORM] Hardware/OS recommendations for large databases (
Alan,

On 11/23/05 2:00 PM, "Alan Stange" <[EMAIL PROTECTED]> wrote:
> Luke Lonergan wrote:
>> Why not contribute something - put up proof of your stated 8KB versus
>> 32KB page size improvement.
>
> I did observe that 32KB block sizes were a significant win "for our
> usage patterns".  It might be a win for any of the following reasons:
> (* big snip *)

Though all of what you relate is interesting, it seems irrelevant to your
earlier statement here:

>> Alan Stange <[EMAIL PROTECTED]> writes:
>> If your goal is sequential IO, then one must use larger block sizes.
>> No one would use 8KB IO for achieving high sequential IO rates.  Simply
>> put, read() is about the slowest way to get 8KB of data.  Switching
>> to 32KB blocks reduces all the system call overhead by a large margin.
>> Larger blocks would be better still, up to the stripe size of your
>> mirror.  (Of course, you're using a mirror and not raid5 if you care
>> about performance.)

And I am interested in seeing if your statement is correct.  Do you have any
proof of this to share?

- Luke
[PERFORM] 8.1 count(*) distinct: IndexScan/SeqScan
Hi,

PostgreSQL 8.1 fresh install on a freshly installed OpenBSD 3.8 box.

postgres=# CREATE DATABASE test;
CREATE DATABASE
postgres=# create table test (id serial, val integer);
NOTICE:  CREATE TABLE will create implicit sequence "test_id_seq" for serial column "test.id"
CREATE TABLE
postgres=# create unique index testid on test (id);
CREATE INDEX
postgres=# create index testval on test (val);
CREATE INDEX
postgres=# insert into test (val) values (round(random() *1024*1024*1024));
INSERT 0 1
[...] insert many random values
postgres=# vaccum full verbose analyze;
postgres=# select count(1) from test;
  count
---------
 2097152
(1 row)

postgres=# explain select count(*) from (select distinct on (val) * from test) as foo;
                                        QUERY PLAN
------------------------------------------------------------------------------------------
 Aggregate  (cost=66328.72..66328.73 rows=1 width=0)
   ->  Unique  (cost=0.00..40114.32 rows=2097152 width=8)
         ->  Index Scan using testval on test  (cost=0.00..34871.44 rows=2097152 width=8)
(3 rows)

postgres=# set enable_indexscan=off;
postgres=# explain analyze select count(*) from (select distinct on (val) * from test) as foo;
                                        QUERY PLAN
------------------------------------------------------------------------------------------
 Aggregate  (cost=280438.64..280438.65 rows=1 width=0) (actual time=39604.107..39604.108 rows=1 loops=1)
   ->  Unique  (cost=243738.48..254224.24 rows=2097152 width=8) (actual time=30281.004..37746.488 rows=2095104 loops=1)
         ->  Sort  (cost=243738.48..248981.36 rows=2097152 width=8) (actual time=30280.999..33744.197 rows=2097152 loops=1)
               Sort Key: test.val
               ->  Seq Scan on test  (cost=0.00..23537.52 rows=2097152 width=8) (actual time=11.550..3262.433 rows=2097152 loops=1)
 Total runtime: 39624.094 ms
(6 rows)

postgres=# set enable_indexscan=on;
postgres=# explain analyze select count(*) from (select distinct on (val) * from test where val<1000) as foo;
                                        QUERY PLAN
------------------------------------------------------------------------------------------
 Aggregate  (cost=4739.58..4739.59 rows=1 width=0) (actual time=4686.472..4686.473 rows=1 loops=1)
   ->  Unique  (cost=4380.56..4483.14 rows=20515 width=8) (actual time=4609.046..4669.289 rows=19237 loops=1)
         ->  Sort  (cost=4380.56..4431.85 rows=20515 width=8) (actual time=4609.041..4627.976 rows=19255 loops=1)
               Sort Key: test.val
               ->  Bitmap Heap Scan on test  (cost=88.80..2911.24 rows=20515 width=8) (actual time=130.954..4559.244 rows=19255 loops=1)
                     Recheck Cond: (val < 1000)
                     ->  Bitmap Index Scan on testval  (cost=0.00..88.80 rows=20515 width=0) (actual time=120.041..120.041 rows=19255 loops=1)
                           Index Cond: (val < 1000)
 Total runtime: 4690.513 ms
(9 rows)

postgres=# explain select count(*) from (select distinct on (val) * from test where val<1) as foo;
                                        QUERY PLAN
------------------------------------------------------------------------------------------
 Aggregate  (cost=16350.20..16350.21 rows=1 width=0)
   ->  Unique  (cost=0.00..13748.23 rows=208158 width=8)
         ->  Index Scan using testval on test  (cost=0.00..13227.83 rows=208158 width=8)
               Index Cond: (val < 1)
(4 rows)

postgres=# set enable_indexscan=off;
postgres=# explain analyze select count(*) from (select distinct on (val) * from test where val<1) as foo;
                                        QUERY PLAN
------------------------------------------------------------------------------------------
 Aggregate  (cost=28081.27..28081.28 rows=1 width=0) (actual time=6444.650..6444.651 rows=1 loops=1)
   ->  Unique  (cost=24438.50..25479.29 rows=208158 width=8) (actual time=5669.118..6277.206 rows=194142 loops=1)
         ->  Sort  (cost=24438.50..24958.89 rows=208158 width=8) (actual time=5669.112..5852.351 rows=194342 loops=1)
               Sort Key: test.val
               ->  Bitmap Heap Scan on test  (cost=882.55..6050.53 rows=208158 width=8) (actual time=1341.114..4989.840 rows=194342 loops=1)
                     Recheck Cond: (val < 1)
                     ->  Bitmap Index Scan on testval  (cost=0.00..882.55 rows=208158 width=0) (actual time=1339.707..1339.707 rows=194342 loops=1)
                           Index Cond: (val < 1)
 Total runtime: 6487.114 ms
(9 rows)

postgres=# explain analyze select count(*) from (select distinct on (val) * from test where val<75
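[Not part of the original message: the DISTINCT ON subquery above is being used purely to count distinct values of val. Equivalent, simpler spellings exist and may be planned differently; worth comparing when investigating plans like these.]

```sql
-- Equivalent ways to count distinct values of val; the plans (and
-- therefore the runtimes) may differ from the DISTINCT ON form.
SELECT count(DISTINCT val) FROM test;

SELECT count(*) FROM (SELECT DISTINCT val FROM test) AS foo;
```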
Re: [PERFORM] Hardware/OS recommendations for large databases (
Luke Lonergan wrote:
> Why not contribute something - put up proof of your stated 8KB versus
> 32KB page size improvement.

I did observe that 32KB block sizes were a significant win "for our usage
patterns".  It might be a win for any of the following reasons:

0) The preliminaries:  ~300GB database with about ~50GB daily turnover.  Our
data is fairly reasonably grouped.  If we're getting one item on a page we're
usually looking at the other items as well.

1) we can live with a smaller FSM size.  We were often leaking pages with a
10M page FSM setting.  With 32K pages, a 10M FSM size is sufficient.  Yes,
the solution to this is "run vacuum more often", but when the vacuum was
taking 10 hours at a time, that was hard to do.

2) The typical datum size in our largest table is about 2.8KB, which is more
than 1/4 page size thus resulting in the use of a toast table.  Switching to
32KB pages allows us to get a decent storage of this data into the main
tables, thus avoiding another table and associated large index.  Not having
the extra index in memory for a table with 90M rows is probably beneficial.

3) vacuum time has been substantially reduced.  Vacuum analyze now runs in
the 2 to 3 hour range depending on load.

4) less cpu time spent in the kernel.  We're basically doing 1/4 as many
system calls.

Overall the system has now been working well.  We used to see the database
being a bottleneck at times, but now it's keeping up nicely.

Hope this helps.

Happy Thanksgiving!

-- Alan
Re: [PERFORM] High context switches occurring
Simon,

I tested it by running two of those simultaneous queries (the 'unoptimized'
one), and it doesn't make any difference whether vm.max-readahead is 256 or
2048...the modified query runs in a snap.

Thanks,
Anjan

-----Original Message-----
From: Anjan Dave
Sent: Wednesday, November 23, 2005 1:33 PM
To: Simon Riggs
Cc: Scott Marlowe; Tom Lane; Vivek Khera; Postgresql Performance
Subject: Re: [PERFORM] High context switches occurring

The offending SELECT query that invoked the CS storm was optimized by folks
here last night, so it's hard to say if the VM setting made a difference.
I'll give it a try anyway.

Thanks,
Anjan

-----Original Message-----
From: Simon Riggs [mailto:[EMAIL PROTECTED]
Sent: Wednesday, November 23, 2005 1:14 PM
To: Anjan Dave
Cc: Scott Marlowe; Tom Lane; Vivek Khera; Postgresql Performance
Subject: Re: [PERFORM] High context switches occurring

On Tue, 2005-11-22 at 18:17 -0500, Anjan Dave wrote:
> It's mostly a 'read' application, I increased the vm.max-readahead to
> 2048 from the default 256, after which I've not seen the CS storm,
> though it could be incidental.

Can you verify this, please?  Turn it back down again, try the test, then
reset and try the test.  If that is a repeatable way of recreating one
manifestation of the problem then we will be further ahead than we are now.

Thanks,

Best Regards, Simon Riggs
Re: [PERFORM] High context switches occurring
The offending SELECT query that invoked the CS storm was optimized by folks
here last night, so it's hard to say if the VM setting made a difference.
I'll give it a try anyway.

Thanks,
Anjan

-----Original Message-----
From: Simon Riggs [mailto:[EMAIL PROTECTED]
Sent: Wednesday, November 23, 2005 1:14 PM
To: Anjan Dave
Cc: Scott Marlowe; Tom Lane; Vivek Khera; Postgresql Performance
Subject: Re: [PERFORM] High context switches occurring

On Tue, 2005-11-22 at 18:17 -0500, Anjan Dave wrote:
> It's mostly a 'read' application, I increased the vm.max-readahead to
> 2048 from the default 256, after which I've not seen the CS storm,
> though it could be incidental.

Can you verify this, please?  Turn it back down again, try the test, then
reset and try the test.  If that is a repeatable way of recreating one
manifestation of the problem then we will be further ahead than we are now.

Thanks,

Best Regards, Simon Riggs
Re: [PERFORM] High context switches occurring
On Tue, 2005-11-22 at 18:17 -0500, Anjan Dave wrote:
> It's mostly a 'read' application, I increased the vm.max-readahead to
> 2048 from the default 256, after which I've not seen the CS storm,
> though it could be incidental.

Can you verify this, please?  Turn it back down again, try the test, then
reset and try the test.  If that is a repeatable way of recreating one
manifestation of the problem then we will be further ahead than we are now.

Thanks,

Best Regards, Simon Riggs
Re: [PERFORM] Hardware/OS recommendations for large databases (
Alan,

Why not contribute something - put up proof of your stated 8KB versus 32KB
page size improvement.

- Luke
Re: [PERFORM] Hardware/OS recommendations for large databases (
Bruce,

On 11/22/05 4:13 PM, "Bruce Momjian" wrote:
> Perfect summary.  We have a background writer now.  Ideally we would
> have a background reader, that reads-ahead blocks into the buffer cache.
> The problem is that while there is a relatively long time between a
> buffer being dirtied and the time it must be on disk (checkpoint time),
> the read-ahead time is much shorter, requiring some kind of quick
> "create a thread" approach that could easily bog us down as outlined
> above.

Yes, the question is "how much read-ahead buffer is needed to equate to the
38% of I/O wait time in the current executor profile?"

The idea of asynchronous buffering would seem appropriate if the executor
would use the 38% of time as useful work.

A background reader is an interesting approach - it would require admin
management of buffers where AIO would leave that in the kernel.  The
advantage over AIO would be more universal platform support I suppose?

> Right now the file system will do read-ahead for a heap scan (but not an
> index scan), but even then, there is time required to get that kernel
> block into the PostgreSQL shared buffers, backing up Luke's observation
> of heavy memcpy() usage.

As evidenced by the 16MB readahead setting still resulting in only 36% IO
wait.

> So what are our options?  mmap()?  I have no idea.  Seems larger page
> size does help.

Not sure about that, we used to run with 32KB page size and I didn't see a
benefit on seq scan at all.  I haven't seen tests in this thread that
compare 8K to 32K.

- Luke