Re: [PERFORM] 8.1 count(*) distinct: IndexScan/SeqScan

2005-11-23 Thread andrew
Pailloncy Jean-Gerard <[EMAIL PROTECTED]> wrote ..
[snip]

THIS MAY SEEM SILLY, but "vacuum" is misspelled below, so presumably the ANALYZE never actually ran.

> 
> postgres=# vaccum full verbose analyze;
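
For reference, the correctly spelled command (which would actually run and gather statistics) is:

```sql
VACUUM FULL VERBOSE ANALYZE;
```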


---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] Hardware/OS recommendations for large databases (

2005-11-23 Thread Mark Kirkwood

Luke Lonergan wrote:

Mark,

This is an excellent idea – unfortunately I’m in Maui right now 
(Mahalo!) and I’m not getting to testing with this.  My first try was 
with 8.0.3 and it’s an 8.1 function I presume.


Not to be lazy – but any hint as to how to do the same thing for 8.0?



Yeah, it's 8.1 - I didn't think to check against 8.0. The attached 
variant works with 8.0.4 (in 8.0, textToQualifiedNameList() takes two arguments).


cheers

Mark

P.S. Maui, eh - sounds real nice.
/*
 * fastcount.c
 *
 * Do a count that uses considerably less CPU time than an aggregate.
 *
 * (Variant for 8.0.x - textToQualifiedNameList needs 2 args)
 */

#include "postgres.h"

#include "funcapi.h"
#include "access/heapam.h"
#include "catalog/namespace.h"
#include "utils/builtins.h"


extern Datum fastcount(PG_FUNCTION_ARGS);


PG_FUNCTION_INFO_V1(fastcount);
Datum
fastcount(PG_FUNCTION_ARGS)
{
    text       *relname = PG_GETARG_TEXT_P(0);
    RangeVar   *relrv;
    Relation    rel;
    HeapScanDesc scan;
    HeapTuple   tuple;
    int64       result = 0;

    /* Use the name to get a suitable range variable and open the relation. */
    relrv = makeRangeVarFromNameList(textToQualifiedNameList(relname, ""));
    rel = heap_openrv(relrv, AccessShareLock);

    /* Start a heap scan on the relation and count every visible tuple. */
    scan = heap_beginscan(rel, SnapshotNow, 0, NULL);
    while ((tuple = heap_getnext(scan, ForwardScanDirection)) != NULL)
    {
        result++;
    }

    /* End the scan and close up the relation. */
    heap_endscan(scan);
    heap_close(rel, AccessShareLock);

    PG_RETURN_INT64(result);
}
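
For completeness, here is a sketch of how a function like this would typically be registered and invoked from SQL once compiled into a shared library - the library path below is an illustrative assumption, not something from the original post:

```sql
-- Assumes fastcount.c has been compiled into a shared library,
-- e.g. fastcount.so; the path is hypothetical.
CREATE FUNCTION fastcount(text) RETURNS bigint
    AS '/usr/local/pgsql/lib/fastcount', 'fastcount'
    LANGUAGE C STRICT;

-- A bare heap-scan count of a table is then simply:
SELECT fastcount('test');
```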



Re: [PERFORM] 8.1 count(*) distinct: IndexScan/SeqScan

2005-11-23 Thread Tom Lane
Pailloncy Jean-Gerard <[EMAIL PROTECTED]> writes:
> Why the stupid indexscan plan on the whole table ?

Pray tell, what are you using for the planner cost parameters?
The only way I can come close to duplicating your numbers is
by setting random_page_cost to somewhere around 0.01 ...

regards, tom lane
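
For readers following along, the planner cost parameters Tom refers to can be inspected and reset per session; a brief sketch (4 is the documented 8.1 default for random_page_cost):

```sql
-- Inspect the relevant planner cost settings.
SHOW random_page_cost;        -- 8.1 default is 4
SHOW effective_cache_size;
SHOW cpu_tuple_cost;

-- Restore the default for the current session if it was lowered:
SET random_page_cost = 4;
```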



Re: [PERFORM] Hardware/OS recommendations for large databases (

2005-11-23 Thread Luke Lonergan



Mark,

This is an excellent idea – unfortunately I’m in Maui right now (Mahalo!) and I’m not getting to testing with this.  My first try was with 8.0.3 and it’s an 8.1 function I presume.

Not to be lazy – but any hint as to how to do the same thing for 8.0?

- Luke


On 11/21/05 9:10 PM, "Mark Kirkwood" <[EMAIL PROTECTED]> wrote:

Luke Lonergan wrote:

> So that leaves the question - why not more than 64% of the I/O scan rate?
> And why is it a flat 64% as the I/O subsystem increases in speed from
> 333-400MB/s?
>

It might be interesting to see what effect reducing the CPU consumption entailed by the count aggregation has - by (say) writing a little bit of code to heap scan the desired relation (sample attached).

Cheers

Mark





/*
 * fastcount.c
 *
 * Do a count that uses considerably less CPU time than an aggregate.
 */

#include "postgres.h"

#include "funcapi.h"
#include "access/heapam.h"
#include "catalog/namespace.h"
#include "utils/builtins.h"


extern Datum fastcount(PG_FUNCTION_ARGS);


PG_FUNCTION_INFO_V1(fastcount);
Datum
fastcount(PG_FUNCTION_ARGS)
{
    text       *relname = PG_GETARG_TEXT_P(0);
    RangeVar   *relrv;
    Relation    rel;
    HeapScanDesc scan;
    HeapTuple   tuple;
    int64       result = 0;

    /* Use the name to get a suitable range variable and open the relation. */
    relrv = makeRangeVarFromNameList(textToQualifiedNameList(relname));
    rel = heap_openrv(relrv, AccessShareLock);

    /* Start a heap scan on the relation and count every visible tuple. */
    scan = heap_beginscan(rel, SnapshotNow, 0, NULL);
    while ((tuple = heap_getnext(scan, ForwardScanDirection)) != NULL)
    {
        result++;
    }

    /* End the scan and close up the relation. */
    heap_endscan(scan);
    heap_close(rel, AccessShareLock);

    PG_RETURN_INT64(result);
}







Re: [PERFORM] Hardware/OS recommendations for large databases (

2005-11-23 Thread Luke Lonergan
Alan,

On 11/23/05 2:00 PM, "Alan Stange" <[EMAIL PROTECTED]> wrote:

> Luke Lonergan wrote:
>> Why not contribute something - put up proof of your stated 8KB versus
>> 32KB page size improvement.
> 
> I did observe that 32KB block sizes were a significant win "for our
> usage patterns".   It might be a win for any of the following reasons:
> (* big snip *)

Though all of what you relate is interesting, it seems irrelevant to your
earlier statement here:

>> Alan Stange <[EMAIL PROTECTED]> writes:
>> If your goal is sequential IO, then one must use larger block sizes.
>> No one would use 8KB IO for achieving high sequential IO rates.   Simply
>> put, read() is about the slowest way to get 8KB of data. Switching
>> to 32KB blocks reduces all the system call overhead by a large margin.
>> Larger blocks would be better still, up to the stripe size of your
>> mirror.   (Of course, you're using a mirror and not raid5 if you care
>> about performance.)

And I am interested in seeing if your statement is correct.  Do you have any
proof of this to share?

- Luke





[PERFORM] 8.1 count(*) distinct: IndexScan/SeqScan

2005-11-23 Thread Pailloncy Jean-Gerard

Hi,

PostgreSQL 8.1 fresh install on a freshly installed OpenBSD 3.8 box.

postgres=# CREATE DATABASE test;
CREATE DATABASE
postgres=# create table test (id serial, val integer);
NOTICE:  CREATE TABLE will create implicit sequence "test_id_seq" for serial column "test.id"
CREATE TABLE
postgres=# create unique index testid on test (id);
CREATE INDEX
postgres=# create index testval on test (val);
CREATE INDEX
postgres=# insert into test (val) values (round(random()*1024*1024*1024));
INSERT 0 1

[...] insert many random values

postgres=# vaccum full verbose analyze;
postgres=# select count(1) from test;
  count
---------
 2097152
(1 row)

postgres=# explain select count(*) from (select distinct on (val) * from test) as foo;
                                       QUERY PLAN
-----------------------------------------------------------------------------------------
 Aggregate  (cost=66328.72..66328.73 rows=1 width=0)
   ->  Unique  (cost=0.00..40114.32 rows=2097152 width=8)
         ->  Index Scan using testval on test  (cost=0.00..34871.44 rows=2097152 width=8)
(3 rows)

postgres=# set enable_indexscan=off;
postgres=# explain analyze select count(*) from (select distinct on (val) * from test) as foo;
                                                           QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=280438.64..280438.65 rows=1 width=0) (actual time=39604.107..39604.108 rows=1 loops=1)
   ->  Unique  (cost=243738.48..254224.24 rows=2097152 width=8) (actual time=30281.004..37746.488 rows=2095104 loops=1)
         ->  Sort  (cost=243738.48..248981.36 rows=2097152 width=8) (actual time=30280.999..33744.197 rows=2097152 loops=1)
               Sort Key: test.val
               ->  Seq Scan on test  (cost=0.00..23537.52 rows=2097152 width=8) (actual time=11.550..3262.433 rows=2097152 loops=1)
 Total runtime: 39624.094 ms
(6 rows)

postgres=# set enable_indexscan=on;
postgres=# explain analyze select count(*) from (select distinct on (val) * from test where val<1000) as foo;
                                                              QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=4739.58..4739.59 rows=1 width=0) (actual time=4686.472..4686.473 rows=1 loops=1)
   ->  Unique  (cost=4380.56..4483.14 rows=20515 width=8) (actual time=4609.046..4669.289 rows=19237 loops=1)
         ->  Sort  (cost=4380.56..4431.85 rows=20515 width=8) (actual time=4609.041..4627.976 rows=19255 loops=1)
               Sort Key: test.val
               ->  Bitmap Heap Scan on test  (cost=88.80..2911.24 rows=20515 width=8) (actual time=130.954..4559.244 rows=19255 loops=1)
                     Recheck Cond: (val < 1000)
                     ->  Bitmap Index Scan on testval  (cost=0.00..88.80 rows=20515 width=0) (actual time=120.041..120.041 rows=19255 loops=1)
                           Index Cond: (val < 1000)
 Total runtime: 4690.513 ms
(9 rows)

postgres=# explain select count(*) from (select distinct on (val) * from test where val<1) as foo;
                                        QUERY PLAN
------------------------------------------------------------------------------------------
 Aggregate  (cost=16350.20..16350.21 rows=1 width=0)
   ->  Unique  (cost=0.00..13748.23 rows=208158 width=8)
         ->  Index Scan using testval on test  (cost=0.00..13227.83 rows=208158 width=8)
               Index Cond: (val < 1)
(4 rows)

postgres=# set enable_indexscan=off;
postgres=# explain analyze select count(*) from (select distinct on (val) * from test where val<1) as foo;
                                                              QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=28081.27..28081.28 rows=1 width=0) (actual time=6444.650..6444.651 rows=1 loops=1)
   ->  Unique  (cost=24438.50..25479.29 rows=208158 width=8) (actual time=5669.118..6277.206 rows=194142 loops=1)
         ->  Sort  (cost=24438.50..24958.89 rows=208158 width=8) (actual time=5669.112..5852.351 rows=194342 loops=1)
               Sort Key: test.val
               ->  Bitmap Heap Scan on test  (cost=882.55..6050.53 rows=208158 width=8) (actual time=1341.114..4989.840 rows=194342 loops=1)
                     Recheck Cond: (val < 1)
                     ->  Bitmap Index Scan on testval  (cost=0.00..882.55 rows=208158 width=0) (actual time=1339.707..1339.707 rows=194342 loops=1)
                           Index Cond: (val < 1)
 Total runtime: 6487.114 ms
(9 rows)

postgres=# explain analyze select count(*) from (select distinct on  
(val) * from test where val<75

Re: [PERFORM] Hardware/OS recommendations for large databases (

2005-11-23 Thread Alan Stange

Luke Lonergan wrote:
Why not contribute something - put up proof of your stated 8KB versus 
32KB page size improvement.


I did observe that 32KB block sizes were a significant win "for our 
usage patterns".   It might be a win for any of the following reasons:


0) The preliminaries:   ~300GB database with about ~50GB daily 
turnover.   Our data is fairly reasonably grouped.  If we're getting one 
item on a page we're usually looking at the other items as well.


1) We can live with a smaller FSM size. We were often leaking pages 
with a 10M page FSM setting. With 32K pages, a 10M FSM size is 
sufficient. Yes, the solution to this is "run vacuum more often", but 
when the vacuum was taking 10 hours at a time, that was hard to do.


2) The typical datum size in our largest table is about 2.8KB, which is 
more than 1/4 of the page size and thus forces the use of a TOAST table. 
Switching to 32KB pages lets us store this data directly in the main 
table, avoiding the extra table and its associated large index. Not 
having that extra index in memory for a table with 90M rows is probably 
beneficial.


3) Vacuum time has been substantially reduced. VACUUM ANALYZE now runs 
in the 2 to 3 hour range, depending on load.


4) Less CPU time is spent in the kernel. We're basically doing 1/4 as 
many system calls.

Overall the system has now been working well.  We used to see the 
database being a bottleneck at times, but now it's keeping up nicely.
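
As an aside, the page size a running server was compiled with can be confirmed from SQL; changing it in the 8.x sources means editing BLCKSZ in src/include/pg_config_manual.h, rebuilding, and re-initdb'ing. A minimal sketch:

```sql
-- Report the page size (in bytes) the server was built with:
SHOW block_size;
```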


Hope this helps.

Happy Thanksgiving!

-- Alan



Re: [PERFORM] High context switches occurring

2005-11-23 Thread Anjan Dave
Simon,

I tested it by running two of those simultaneous queries (the
'unoptimized' one), and it doesn't make any difference whether
vm.max-readahead is 256 or 2048...the modified query runs in a snap.

Thanks,
Anjan

-Original Message-
From: Anjan Dave 
Sent: Wednesday, November 23, 2005 1:33 PM
To: Simon Riggs
Cc: Scott Marlowe; Tom Lane; Vivek Khera; Postgresql Performance
Subject: Re: [PERFORM] High context switches occurring

The offending SELECT query that invoked the CS storm was optimized by
folks here last night, so it's hard to say if the VM setting made a
difference. I'll give it a try anyway.

Thanks,
Anjan

-Original Message-
From: Simon Riggs [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, November 23, 2005 1:14 PM
To: Anjan Dave
Cc: Scott Marlowe; Tom Lane; Vivek Khera; Postgresql Performance
Subject: Re: [PERFORM] High context switches occurring

On Tue, 2005-11-22 at 18:17 -0500, Anjan Dave wrote:

> It's mostly a 'read' application, I increased the vm.max-readahead to
> 2048 from the default 256, after which I've not seen the CS storm,
> though it could be incidental.

Can you verify this, please?

Turn it back down again, try the test, then reset and try the test.

If that is a repeatable way of recreating one manifestation of the
problem then we will be further ahead than we are now.

Thanks,

Best Regards, Simon Riggs





Re: [PERFORM] High context switches occurring

2005-11-23 Thread Anjan Dave
The offending SELECT query that invoked the CS storm was optimized by
folks here last night, so it's hard to say if the VM setting made a
difference. I'll give it a try anyway.

Thanks,
Anjan

-Original Message-
From: Simon Riggs [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, November 23, 2005 1:14 PM
To: Anjan Dave
Cc: Scott Marlowe; Tom Lane; Vivek Khera; Postgresql Performance
Subject: Re: [PERFORM] High context switches occurring

On Tue, 2005-11-22 at 18:17 -0500, Anjan Dave wrote:

> It's mostly a 'read' application, I increased the vm.max-readahead to
> 2048 from the default 256, after which I've not seen the CS storm,
> though it could be incidental.

Can you verify this, please?

Turn it back down again, try the test, then reset and try the test.

If that is a repeatable way of recreating one manifestation of the
problem then we will be further ahead than we are now.

Thanks,

Best Regards, Simon Riggs





Re: [PERFORM] High context switches occurring

2005-11-23 Thread Simon Riggs
On Tue, 2005-11-22 at 18:17 -0500, Anjan Dave wrote:

> It's mostly a 'read' application, I increased the vm.max-readahead to
> 2048 from the default 256, after which I've not seen the CS storm,
> though it could be incidental.

Can you verify this, please?

Turn it back down again, try the test, then reset and try the test.

If that is a repeatable way of recreating one manifestation of the
problem then we will be further ahead than we are now.

Thanks,

Best Regards, Simon Riggs




Re: [PERFORM] Hardware/OS recommendations for large databases (

2005-11-23 Thread Luke Lonergan



Alan,

Why not contribute something - put up proof of your stated 8KB versus 32KB page size improvement.

- Luke





Re: [PERFORM] Hardware/OS recommendations for large databases (

2005-11-23 Thread Luke Lonergan
Bruce,

On 11/22/05 4:13 PM, "Bruce Momjian"  wrote:

> Perfect summary.  We have a background writer now.  Ideally we would
> have a background reader, that reads-ahead blocks into the buffer cache.
> The problem is that while there is a relatively long time between a
> buffer being dirtied and the time it must be on disk (checkpoint time),
> the read-ahead time is much shorter, requiring some kind of quick
> "create a thread" approach that could easily bog us down as outlined
> above.

Yes, the question is "how much read-ahead buffer is needed to equate to the
38% of I/O wait time in the current executor profile?"

The idea of asynchronous buffering would seem appropriate if the executor
would use the 38% of time as useful work.

A background reader is an interesting approach - it would require admin
management of buffers where AIO would leave that in the kernel.  The
advantage over AIO would be more universal platform support I suppose?

> Right now the file system will do read-ahead for a heap scan (but not an
> index scan), but even then, there is time required to get that kernel
> block into the PostgreSQL shared buffers, backing up Luke's observation
> of heavy memcpy() usage.

As evidenced by the 16MB readahead setting still resulting in only 36% IO
wait.

> So what are our options?  mmap()?  I have no idea.  Seems larger page
> size does help.

Not sure about that - we used to run with a 32KB page size and I didn't see a
benefit on seq scan at all.  I haven't seen tests in this thread that
compare 8K to 32K.

- Luke


