Re: [HACKERS] Seq scans roadmap

2007-05-17 Thread Luke Lonergan
Hi Jeff, On 5/16/07 4:56 PM, Jeff Davis [EMAIL PROTECTED] wrote: The main benefit of a sync scan will be the ability to start the scan where other scans have already filled the I/O cache with useful blocks. This will require some knowledge of the size of the I/O cache by the syncscan logic,
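The position-sharing idea described above can be sketched in a few lines (a minimal illustration only, not PostgreSQL's actual synchronized-scan code; `scan_hint`, `report_scan_pos`, and `start_block` are hypothetical names): each scan periodically reports the block it is reading into a small shared table, and a new scan on the same relation starts from that block instead of block 0, riding on pages the earlier scan has already pulled into the I/O cache.

```c
/* Hypothetical sketch of synchronized-scan position sharing. In a real
 * backend this table would live in shared memory with locking; here it is
 * a plain static array to show the idea. */
#include <stddef.h>

#define MAX_RELS 16

typedef struct {
    unsigned rel_oid;     /* relation being scanned */
    unsigned last_block;  /* block most recently reported */
    int      valid;
} scan_hint;

static scan_hint hints[MAX_RELS];

/* Report the block a scan is currently reading. */
void report_scan_pos(unsigned rel_oid, unsigned block)
{
    for (size_t i = 0; i < MAX_RELS; i++) {
        if (hints[i].valid && hints[i].rel_oid == rel_oid) {
            hints[i].last_block = block;
            return;
        }
    }
    for (size_t i = 0; i < MAX_RELS; i++) {
        if (!hints[i].valid) {
            hints[i].rel_oid = rel_oid;
            hints[i].last_block = block;
            hints[i].valid = 1;
            return;
        }
    }
}

/* Where should a new seq scan on rel_oid begin? Block 0 if no hint exists. */
unsigned start_block(unsigned rel_oid)
{
    for (size_t i = 0; i < MAX_RELS; i++)
        if (hints[i].valid && hints[i].rel_oid == rel_oid)
            return hints[i].last_block;
    return 0;
}
```

As the mail notes, how far "behind" the reported position a new scan can usefully start depends on the size of the external I/O cache, which this sketch deliberately ignores.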

Re: [HACKERS] Seq scans roadmap

2007-05-16 Thread Zeugswetter Andreas ADI SD
32 buffers = 1MB with 32KB blocksize, which spoils the CPU L2 cache effect. I'd say in a scenario where 32K pages are indicated you will also want larger-than-average L2 caches. How about using 256/blocksize? The reading ahead uses 1/4 ring size. To the best of our knowledge,
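The sizing arithmetic under discussion can be written down directly (a sketch; `ring_buffers` and `readahead_buffers` are illustrative names, not the patch's): a 256KB ring works out to 32 buffers at the default 8KB blocksize and 8 buffers at 32KB, small enough to stay friendly to the L2 cache, and read-ahead uses a quarter of the ring.

```c
/* Ring sized so that nbuffers * blocksize = 256 KB, per the "256/blocksize"
 * suggestion above. The minimum of 2 reflects the observation elsewhere in
 * the thread that a 1-buffer ring performed very poorly. */
unsigned ring_buffers(unsigned blocksize)
{
    unsigned n = (256u * 1024u) / blocksize;
    return n < 2 ? 2 : n;
}

/* "The reading ahead uses 1/4 ring size." */
unsigned readahead_buffers(unsigned ring_size)
{
    return ring_size / 4;
}
```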

Re: [HACKERS] Seq scans roadmap

2007-05-16 Thread Luke Lonergan
I think the analysis on syncscan needs to take the external I/O cache into account. I believe it is not necessary or desirable to keep the scans in lock step within the PG bufcache. The main benefit of a sync scan will be the ability to start the scan where other scans have already filled the

Re: [HACKERS] Seq scans roadmap

2007-05-16 Thread Jeff Davis
On Wed, 2007-05-16 at 10:31 -0700, Luke Lonergan wrote: I think the analysis on syncscan needs to take the external I/O cache into account. I believe it is not necessary or desirable to keep the scans in lock step within the PG bufcache. I partially agree. I don't think we need any huge

Re: [HACKERS] Seq scans roadmap

2007-05-15 Thread Heikki Linnakangas
Just to keep you guys informed, I've been busy testing and pondering over different buffer ring strategies for vacuum, seqscans and copy. Here's what I'm going to do: Use a fixed size ring. Fixed as in doesn't change after the ring is initialized, however different kinds of scans use

Re: [HACKERS] Seq scans roadmap

2007-05-15 Thread Luke Lonergan
Just to keep you guys informed, I've been busy testing and pondering over different buffer ring strategies for vacuum, seqscans and copy. Here's what I'm going

Re: [HACKERS] Seq scans roadmap

2007-05-15 Thread Heikki Linnakangas
Luke Lonergan wrote: 32 buffers = 1MB with 32KB blocksize, which spoils the CPU L2 cache effect. How about using 256/blocksize? Sounds reasonable. We need to check the effect on the synchronized scans, though. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com

Re: [HACKERS] Seq scans roadmap

2007-05-15 Thread Jeff Davis
On Tue, 2007-05-15 at 10:42 +0100, Heikki Linnakangas wrote: Luke Lonergan wrote: 32 buffers = 1MB with 32KB blocksize, which spoils the CPU L2 cache effect. How about using 256/blocksize? Sounds reasonable. We need to check the effect on the synchronized scans, though. I am a

Re: [HACKERS] Seq scans roadmap

2007-05-15 Thread Jim C. Nasby
On Tue, May 15, 2007 at 10:25:35AM -0700, Jeff Davis wrote: On Tue, 2007-05-15 at 10:42 +0100, Heikki Linnakangas wrote: Luke Lonergan wrote: 32 buffers = 1MB with 32KB blocksize, which spoils the CPU L2 cache effect. How about using 256/blocksize? Sounds reasonable. We need

Re: [HACKERS] Seq scans roadmap

2007-05-15 Thread Heikki Linnakangas
Jeff Davis wrote: On Tue, 2007-05-15 at 10:42 +0100, Heikki Linnakangas wrote: Luke Lonergan wrote: 32 buffers = 1MB with 32KB blocksize, which spoils the CPU L2 cache effect. How about using 256/blocksize? Sounds reasonable. We need to check the effect on the synchronized scans, though.

Re: [HACKERS] Seq scans roadmap

2007-05-14 Thread Heikki Linnakangas
Simon Riggs wrote: On Fri, 2007-05-11 at 22:59 +0100, Heikki Linnakangas wrote: For comparison, here's the test results with vanilla CVS HEAD: copy-head | 00:06:21.533137 copy-head | 00:05:54.141285 I'm slightly worried that the results for COPY aren't anywhere near as

Re: [HACKERS] Seq scans roadmap

2007-05-13 Thread CK Tan
Hi All, COPY/INSERT are also bottlenecked on record-at-a-time insertion into the heap, and on checking for pre-insert triggers, post-insert triggers and constraints. To speed things up, we really need to special-case insertions without triggers and constraints, [probably allow for unique

Re: [HACKERS] Seq scans roadmap

2007-05-13 Thread Tom Lane
CK Tan [EMAIL PROTECTED] writes: COPY/INSERT are also bottlenecked on record-at-a-time insertion into the heap, and on checking for pre-insert triggers, post-insert triggers and constraints. To speed things up, we really need to special-case insertions without triggers and constraints,

Re: [HACKERS] Seq scans roadmap

2007-05-13 Thread CK Tan
Sorry, I should have been clearer. I meant because we need to check for trigger firing pre/post insertion, and the trigger definitions expect tuples to be inserted one by one, therefore we cannot insert N tuples at a time into the heap. Checking for triggers itself is not taking up much

Re: [HACKERS] Seq scans roadmap

2007-05-12 Thread Simon Riggs
On Fri, 2007-05-11 at 22:59 +0100, Heikki Linnakangas wrote: For comparison, here's the test results with vanilla CVS HEAD: copy-head | 00:06:21.533137 copy-head | 00:05:54.141285 I'm slightly worried that the results for COPY aren't anywhere near as good as the SELECT

Re: [HACKERS] Seq scans roadmap

2007-05-12 Thread Luke Lonergan
Hi Simon, On 5/12/07 12:35 AM, Simon Riggs [EMAIL PROTECTED] wrote: I'm slightly worried that the results for COPY aren't anywhere near as good as the SELECT and VACUUM results. It isn't clear from those numbers that the benefit really is significant. COPY is bottlenecked on datum formation

Re: [HACKERS] Seq scans roadmap

2007-05-11 Thread Zeugswetter Andreas ADI SD
Sorry, 16x8K page ring is too small indeed. The reason we selected 16 is because greenplum db runs on 32K page size, so we are indeed reading 128K at a time. The #pages in the ring should be made relative to the page size, so you achieve 128K per read. Ah, ok. New disks here also have a

Re: [HACKERS] Seq scans roadmap

2007-05-11 Thread Heikki Linnakangas
I wrote: I'll review my test methodology and keep testing... I ran a set of tests on a 100 warehouse TPC-C stock table that is ~3.2 GB in size and the server has 4 GB of memory. IOW the table fits in OS cache, but not in shared_buffers (set at 1 GB). copy - COPY from a file select - SELECT

Re: [HACKERS] Seq scans roadmap

2007-05-10 Thread Zeugswetter Andreas ADI SD
In reference to the seq scans roadmap, I have just submitted a patch that addresses some of the concerns. The patch does this: 1. for small relation (smaller than 60% of bufferpool), use the current logic 2. for big relation: - use a ring buffer in heap scan - pin first

Re: [HACKERS] Seq scans roadmap

2007-05-10 Thread Heikki Linnakangas
Zeugswetter Andreas ADI SD wrote: In reference to the seq scans roadmap, I have just submitted a patch that addresses some of the concerns. The patch does this: 1. for small relation (smaller than 60% of bufferpool), use the current logic 2. for big relation: - use a ring buffer in

Re: [HACKERS] Seq scans roadmap

2007-05-10 Thread Zeugswetter Andreas ADI SD
Also, that patch doesn't address the VACUUM issue at all. And using a small fixed-size ring with scans that do updates can be devastating. I'm experimenting with different ring sizes for COPY at the moment. Too small a ring leads to a lot of WAL flushes; it's basically the same problem we

Re: [HACKERS] Seq scans roadmap

2007-05-10 Thread Heikki Linnakangas
Zeugswetter Andreas ADI SD wrote: Also, that patch doesn't address the VACUUM issue at all. And using a small fixed-size ring with scans that do updates can be devastating. I'm experimenting with different ring sizes for COPY at the moment. Too small a ring leads to a lot of WAL flushes; it's

Re: [HACKERS] Seq scans roadmap

2007-05-10 Thread Heikki Linnakangas
Heikki Linnakangas wrote: But all these assumptions need to be validated. I'm setting up tests with different ring sizes and queries to get a clear picture of this: - VACUUM on a clean table - VACUUM on a table with 1 dead tuple per page - read-only scan, large table - read-only scan, table

Re: [HACKERS] Seq scans roadmap

2007-05-10 Thread Heikki Linnakangas
Heikki Linnakangas wrote: However, it caught me by total surprise that the performance with 1 buffer is so horrible. Using 2 buffers is enough to avoid whatever the issue is with just 1 buffer. I have no idea what's causing that. There must be some interaction that I don't understand. Ok, I

Re: [HACKERS] Seq scans roadmap

2007-05-10 Thread CK Tan
The patch has no effect on scans that do updates. The KillAndReadBuffer routine does not force out a buffer if the dirty bit is set. So updated pages revert to the current performance characteristics. -cktan GreenPlum, Inc. On May 10, 2007, at 5:22 AM, Heikki Linnakangas wrote:
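The dirty-bit check described here can be sketched as follows (KillAndReadBuffer is the routine named in the mail, but this body is an illustrative reconstruction, not the patch's code): a ring slot is evicted and reused only when clean, so updated pages fall back to the normal buffer-manager path and recycling never forces a WAL flush.

```c
/* Illustrative sketch of the recycling decision: reuse a ring buffer for a
 * new page only if its dirty bit is clear. */
typedef struct {
    int page;   /* page currently held in this buffer */
    int dirty;  /* set if the page has been modified */
} ring_buf;

/* Returns 1 if the slot was recycled for new_page, 0 if it was left alone
 * (the caller should fall back to the shared buffer pool instead). */
int kill_and_read(ring_buf *b, int new_page)
{
    if (b->dirty)
        return 0;          /* never force out a dirty buffer */
    b->page = new_page;    /* clean: evict and reuse in place */
    return 1;
}
```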

Re: [HACKERS] Seq scans roadmap

2007-05-10 Thread CK Tan
Sorry, 16x8K page ring is too small indeed. The reason we selected 16 is because greenplum db runs on 32K page size, so we are indeed reading 128K at a time. The #pages in the ring should be made relative to the page size, so you achieve 128K per read. Also agree that KillAndReadBuffer

Re: [HACKERS] Seq scans roadmap

2007-05-09 Thread Zeugswetter Andreas ADI SD
Are you filling multiple buffers in the buffer cache with a single read-call? yes, needs vector or ScatterGather IO. I would expect that to get only moderate improvement. The vast improvement comes from 256k blocksize. To get the full benefit I would think you would want to
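Vector (scatter/gather) I/O of the kind mentioned here lets one system call fill several separate page-sized buffers, so the OS sees a single large contiguous request. A minimal POSIX sketch (the sizes are illustrative; PostgreSQL does not read this way in the code under discussion):

```c
/* Fill several page-sized buffers with a single readv(2) call, so the OS
 * receives one contiguous 32K request instead of four 8K ones. */
#include <sys/uio.h>
#include <unistd.h>
#include <fcntl.h>   /* for open() in a usage example */

#define PAGESZ 8192
#define NPAGES 4     /* 4 x 8K pages = one 32K request */

/* Read NPAGES contiguous pages from fd into separate buffers in one call.
 * Returns the total byte count read, or -1 on error. */
ssize_t read_pages(int fd, char bufs[NPAGES][PAGESZ])
{
    struct iovec iov[NPAGES];
    for (int i = 0; i < NPAGES; i++) {
        iov[i].iov_base = bufs[i];
        iov[i].iov_len  = PAGESZ;
    }
    return readv(fd, iov, NPAGES);
}
```

As the mail argues, whether this helps much beyond OS readahead with a large effective blocksize is exactly what is in question.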

Re: [HACKERS] Seq scans roadmap

2007-05-09 Thread Simon Riggs
On Tue, 2007-05-08 at 11:40 +0100, Heikki Linnakangas wrote: Here's my roadmap for the scan-resistant buffer cache and synchronized scans patches. 1. Fix the current vacuum behavior of throwing dirty buffers to the freelist, forcing a lot of WAL flushes. Instead, use a backend-private

Re: [HACKERS] Seq scans roadmap

2007-05-09 Thread CK Tan
Hi, In reference to the seq scans roadmap, I have just submitted a patch that addresses some of the concerns. The patch does this: 1. for small relation (smaller than 60% of bufferpool), use the current logic 2. for big relation: - use a ring buffer in heap scan - pin

[HACKERS] Seq scans roadmap

2007-05-08 Thread Heikki Linnakangas
Here's my roadmap for the scan-resistant buffer cache and synchronized scans patches. 1. Fix the current vacuum behavior of throwing dirty buffers to the freelist, forcing a lot of WAL flushes. Instead, use a backend-private ring of shared buffers that are recycled. This is what Simon's
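Step 1 above, a backend-private ring of recycled buffers, can be sketched as follows (an illustration under assumed names, not Simon's or Heikki's actual patch): the scan fills a fixed set of slots, then cycles through them round-robin, so a vacuum or large seq scan reuses the same few buffers instead of flooding the shared cache or the freelist.

```c
/* Minimal sketch of a fixed-size, backend-private buffer ring. */
#define RING_SIZE 32

typedef struct {
    int slots[RING_SIZE];  /* buffer IDs owned by this scan */
    int filled;            /* slots used so far */
    int next;              /* oldest slot, next to recycle once full */
} buf_ring;

void ring_init(buf_ring *r)
{
    r->filled = 0;
    r->next = 0;
}

/* Get a buffer for the next page: while the ring is still filling, adopt
 * the freshly allocated new_buf; once full, recycle the oldest slot and
 * ignore new_buf (in a real implementation no new buffer would be
 * allocated at that point at all). */
int ring_get(buf_ring *r, int new_buf)
{
    if (r->filled < RING_SIZE) {
        r->slots[r->filled++] = new_buf;
        return new_buf;
    }
    int victim = r->slots[r->next];
    r->next = (r->next + 1) % RING_SIZE;
    return victim;
}
```

The WAL interaction discussed later in the thread comes from this recycling: if a slot is dirty when its turn comes, it must be flushed before reuse, which is why very small rings hurt COPY.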

Re: [HACKERS] Seq scans roadmap

2007-05-08 Thread Luke Lonergan
that is ~ 32KB to minimize L2 cache pollution - Luke

Re: [HACKERS] Seq scans roadmap

2007-05-08 Thread Heikki Linnakangas
Luke Lonergan wrote: On 3A: In practice, the popular modern OS'es (BSD/Linux/Solaris/etc) implement dynamic I/O caching. The experiments have shown that benefit of re-using PG buffer cache on large sequential scans is vanishingly small when the buffer cache size is small compared to the system

Re: [HACKERS] Seq scans roadmap

2007-05-08 Thread Luke Lonergan
Heikki, That's interesting. Care to share the results of the experiments you ran? I was thinking of running tests of my own with varying table sizes. Yah - it may take a while - you might get there faster. There are some interesting effects to look at between I/O cache performance and PG

Re: [HACKERS] Seq scans roadmap

2007-05-08 Thread Heikki Linnakangas
Luke Lonergan wrote: What do you mean with using readahead inside the heapscan? Starting an async read request? Nope - just reading N buffers ahead for seqscans. Subsequent calls use previously read pages. The objective is to issue contiguous reads to the OS in sizes greater than the PG page

Re: [HACKERS] Seq scans roadmap

2007-05-08 Thread Zeugswetter Andreas ADI SD
Nope - just reading N buffers ahead for seqscans. Subsequent calls use previously read pages. The objective is to issue contiguous reads to the OS in sizes greater than the PG page size (which is much smaller than what is needed for fast sequential I/O). Problem here is that either

Re: [HACKERS] Seq scans roadmap

2007-05-08 Thread Zeugswetter Andreas ADI SD
What do you mean with using readahead inside the heapscan? Starting an async read request? Nope - just reading N buffers ahead for seqscans. Subsequent calls use previously read pages. The objective is to issue contiguous reads to the OS in sizes greater than the PG page size

Re: [HACKERS] Seq scans roadmap

2007-05-08 Thread Gregory Stark
Zeugswetter Andreas ADI SD [EMAIL PROTECTED] writes: Are you filling multiple buffers in the buffer cache with a single read-call? yes, needs vector or ScatterGather IO. I would expect that to get only moderate improvement. To get the full benefit I would think you would want to either

Re: [HACKERS] Seq scans roadmap

2007-05-08 Thread Jeff Davis
On Tue, 2007-05-08 at 11:40 +0100, Heikki Linnakangas wrote: I'm going to do this incrementally, and we'll see how far we get for 8.3. We might push 3A and/or 3B to 8.4. First, I'm going to finish up Simon's patch (step 1), run some performance tests with vacuum, and submit a patch for

Re: [HACKERS] Seq scans roadmap

2007-05-08 Thread Jeff Davis
On Tue, 2007-05-08 at 07:47 -0400, Luke Lonergan wrote: Heikki, On 3A: In practice, the popular modern OS'es (BSD/Linux/Solaris/etc) implement dynamic I/O caching. The experiments have shown that benefit of re-using PG buffer cache on large sequential scans is vanishingly small when the