Hi Jeff,
On 5/16/07 4:56 PM, Jeff Davis [EMAIL PROTECTED] wrote:
The main benefit of a sync scan will be the ability to start the scan where
other scans have already filled the I/O cache with useful blocks. This will
require the syncscan logic to have some knowledge of the size of the I/O cache.
32 buffers = 1MB with a 32KB blocksize, which spoils the CPU L2
cache effect.
I'd say that in a scenario where 32K pages are indicated, you will also want
larger-than-average L2 caches.
How about using 256/blocksize?
The read-ahead uses 1/4 of the ring size. To the best of our knowledge,
I think the analysis on syncscan needs to take the external I/O cache into
account. I believe it is not necessary or desirable to keep the scans in
lock step within the PG bufcache.
On Wed, 2007-05-16 at 10:31 -0700, Luke Lonergan wrote:
I think the analysis on syncscan needs to take the external I/O cache into
account. I believe it is not necessary or desirable to keep the scans in
lock step within the PG bufcache.
I partially agree. I don't think we need any huge
Just to keep you guys informed, I've been busy testing and pondering
over different buffer ring strategies for vacuum, seqscans and copy.
Here's what I'm going to do:
Use a fixed size ring. Fixed as in doesn't change after the ring is
initialized, however different kinds of scans use
Luke Lonergan wrote:
32 buffers = 1MB with 32KB blocksize, which spoils the CPU L2 cache
effect.
How about using 256/blocksize?
Sounds reasonable. We need to check the effect on the synchronized
scans, though.
--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
Simon Riggs wrote:
On Fri, 2007-05-11 at 22:59 +0100, Heikki Linnakangas wrote:
For comparison, here's the test results with vanilla CVS HEAD:
copy-head | 00:06:21.533137
copy-head | 00:05:54.141285
I'm slightly worried that the results for COPY aren't anywhere near as
Hi All,
COPY/INSERT are also bottlenecked on record-at-a-time insertion into the
heap, and on checking for pre-insert triggers, post-insert triggers, and
constraints.
To speed things up, we really need to special-case insertions without
triggers and constraints, [probably allow for unique
Sorry, I should have been clearer. I meant that because we need to check
for trigger firing pre/post insertion, and the trigger definitions
expect tuples to be inserted one by one, we cannot insert N tuples
at a time into the heap. Checking for triggers itself is not
taking up much
Hi Simon,
On 5/12/07 12:35 AM, Simon Riggs [EMAIL PROTECTED] wrote:
I'm slightly worried that the results for COPY aren't anywhere near as
good as the SELECT and VACUUM results. It isn't clear from those numbers
that the benefit really is significant.
COPY is bottlenecked on datum formation
Sorry, a 16 x 8K page ring is too small indeed. The reason we
selected 16 is that Greenplum DB runs on a 32K page size, so
we are indeed reading 128K at a time. The number of pages in the ring
should be made relative to the page size, so you achieve 128K
per read.
Ah, ok. New disks here also have a
I wrote:
I'll review my test methodology and keep testing...
I ran a set of tests on a 100 warehouse TPC-C stock table that is ~3.2
GB in size and the server has 4 GB of memory. IOW the table fits in OS
cache, but not in shared_buffers (set at 1 GB).
copy - COPY from a file
select - SELECT
In reference to the seq scans roadmap, I have just submitted
a patch that addresses some of the concerns.
The patch does this:
1. for small relations (smaller than 60% of the buffer pool), use the current logic
2. for big relations:
   - use a ring buffer in heap scan
   - pin first
Also, that patch doesn't address the VACUUM issue at all. And
using a small fixed-size ring with scans that do updates can
be devastating. I'm experimenting with different ring sizes
for COPY at the moment. Too small a ring leads to a lot of WAL
flushes; it's basically the same problem we
Heikki Linnakangas wrote:
But all these assumptions need to be validated. I'm setting up tests
with different ring sizes and queries to get a clear picture of this:
- VACUUM on a clean table
- VACUUM on a table with 1 dead tuple per page
- read-only scan, large table
- read-only scan, table
Heikki Linnakangas wrote:
However, it caught me by total surprise that the performance with 1
buffer is so horrible. Using 2 buffers is enough to avoid whatever the
issue is with just 1 buffer. I have no idea what's causing that. There
must be some interaction that I don't understand.
Ok, I
The patch has no effect on scans that do updates. The
KillAndReadBuffer routine does not force out a buffer if the dirty
bit is set. So updated pages revert to the current performance
characteristics.
-cktan
GreenPlum, Inc.
On May 10, 2007, at 5:22 AM, Heikki Linnakangas wrote:
Also agree that KillAndReadBuffer
Are you filling multiple buffers in the buffer cache with a single
read-call?
yes, needs vector or ScatterGather IO.
I would expect that to get only moderate improvement.
The vast improvement comes from 256k blocksize.
To get the full benefit I would think you would want to
Here's my roadmap for the scan-resistant buffer cache and
synchronized scans patches.
1. Fix the current vacuum behavior of throwing dirty buffers to the
freelist, forcing a lot of WAL flushes. Instead, use a backend-private
ring of shared buffers that are recycled. This is what Simon's
that is ~32KB to minimize L2 cache pollution
- Luke
-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of
Heikki Linnakangas
Sent: Tuesday, May 08, 2007 3:40 AM
To: PostgreSQL-development
Cc: Jeff Davis; Simon Riggs
Subject: [HACKERS] Seq scans roadmap
Luke Lonergan wrote:
On 3A: In practice, the popular modern OSes (BSD/Linux/Solaris/etc.)
implement dynamic I/O caching. The experiments have shown that the benefit
of re-using the PG buffer cache on large sequential scans is vanishingly
small when the buffer cache size is small compared to the system
Heikki,
That's interesting. Care to share the results of the
experiments you ran? I was thinking of running tests of my
own with varying table sizes.
Yah - it may take a while - you might get there faster.
There are some interesting effects to look at between I/O cache
performance and PG
Luke Lonergan wrote:
What do you mean with using readahead inside the heapscan?
Starting an async read request?
Nope - just reading N buffers ahead for seqscans. Subsequent calls use
previously read pages. The objective is to issue contiguous reads to
the OS in sizes greater than the PG page size (which is much smaller
than what is needed for fast sequential I/O).
Problem here is that either
On Tue, 2007-05-08 at 11:40 +0100, Heikki Linnakangas wrote:
I'm going to do this incrementally, and we'll see how far we get for
8.3. We might push 3A and/or 3B to 8.4. First, I'm going to finish up
Simon's patch (step 1), run some performance tests with vacuum, and
submit a patch for