Re: [HACKERS] Dirty Buffer Writing [was Proposed LogWriter Scheme]

2002-10-08 Thread Greg Copeland

Bruce,

Are there remarks along these lines in the performance tuning section of
the docs?  Based on what's coming out of this, it would seem that
stressing the importance of leaving a notable (rule of thumb here?)
amount of RAM for general OS/kernel needs is pretty important.


Greg


On Tue, 2002-10-08 at 09:50, Tom Lane wrote:
> (This is, BTW, one of the reasons for discouraging people from pushing
> Postgres' shared buffer cache up to a large fraction of total RAM;
> starving the kernel of disk buffers is just plain not a good idea.)





Re: [HACKERS] Dirty Buffer Writing [was Proposed LogWriter Scheme]

2002-10-08 Thread Tom Lane

"Curtis Faith" <[EMAIL PROTECTED]> writes:
> Do you not think this is a potential performance problem to be explored?

I agree that there's a problem if the kernel runs short of buffer space.
I am not sure whether that's really an issue in practical situations,
nor whether we can do much about it at the application level if it is
--- but by all means look for solutions if you are concerned.

(This is, BTW, one of the reasons for discouraging people from pushing
Postgres' shared buffer cache up to a large fraction of total RAM;
starving the kernel of disk buffers is just plain not a good idea.)

regards, tom lane

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/users-lounge/docs/faq.html



Re: [HACKERS] Dirty Buffer Writing [was Proposed LogWriter Scheme]

2002-10-08 Thread Curtis Faith

> So you think if I try to write a 1 gig file, it will write enough to
> fill up the buffers, then wait while the sync'er writes out a few blocks
> every second, free up some buffers, then write some more?
>
> Take a look at vfs_bio::getnewbuf() on *BSD and you will see that when
> it can't get a buffer, it will async write a dirty buffer to disk.

We've addressed this scenario before. If I recall, the point Greg made
earlier is that buffers getting full means writes become synchronous.

What I was trying to point out is that the buffers are very likely to
fill even when they are large, and that the writes are going to be driven
out not by efficient ganging but by something approaching LRU flushing,
with an occasional once-a-second, slightly more efficient write of 1/32nd
of the buffers.

Once the buffers get full, all subsequent writes turn into synchronous
writes. Even if the kernel writes asynchronously (meaning it can do
other work), the writing process can't complete; it has to wait until the
buffer has been flushed and is free for the copy. So the relatively poor
implementation (for database inserts at least) of the syncer mechanism will
cost a lot of performance if we get to this synchronous write mode due to a
full buffer. It appears this scenario is much more likely than I had
thought.
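
To illustrate the argument above, here is a minimal sketch of a writer feeding a fixed pool of kernel buffers that a background flusher drains at a constant rate. It is a toy model, not FreeBSD's actual buffer code: the function name `simulate_writer` and all of the numbers are assumptions chosen only to show the shape of the effect.

```python
def simulate_writer(pool_blocks, write_rate, flush_rate, seconds):
    """Return the number of blocks the writer gets accepted each second.

    pool_blocks: buffers available for dirty data (hypothetical figure)
    write_rate:  blocks the application tries to write per second
    flush_rate:  blocks the background flusher cleans per second
    """
    dirty = 0
    accepted_per_second = []
    for _ in range(seconds):
        # The flusher frees some buffers first.
        dirty -= min(dirty, flush_rate)
        # The writer can only copy into buffers that are free; anything
        # beyond that has to wait, which is the "synchronous" mode.
        accepted = min(write_rate, pool_blocks - dirty)
        dirty += accepted
        accepted_per_second.append(accepted)
    return accepted_per_second


# Made-up numbers: 100 buffers, writer wants 40 blocks/s, flusher cleans 10/s.
print(simulate_writer(pool_blocks=100, write_rate=40, flush_rate=10, seconds=6))
# prints [40, 40, 40, 10, 10, 10]
```

In this model the writer runs at full speed only until the pool fills. From then on it is throttled to whatever the flusher manages, so the efficiency of the flushing policy sets the sustained write rate.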

Do you not think this is a potential performance problem to be explored?

I'm only pursuing this as hard as I am because I feel like it's deja vu all
over again. I've done this before and found a huge improvement (12X to 20X
for bulk inserts). I'm not necessarily expecting that level of improvement
here but my gut tells me there is more here than seems obvious on the
surface.

> As far as this AIO conversation is concerned, I want to see someone come
> up with some performance improvement that we can only do with AIO.
> Unless I see it, I am not interested in pursuing this thread.

If I come up with something via aio that helps, I'd be more than happy if
someone else points out a non-aio way to accomplish the same thing. I'm by
no means married to any particular solution; I care about getting problems
solved. And I'll stop trying to sell anyone on aio.

- Curtis





Re: [HACKERS] Dirty Buffer Writing [was Proposed LogWriter Scheme]

2002-10-07 Thread Bruce Momjian

Curtis Faith wrote:
> > This is the trickle syncer.  It prevents bursts of disk activity every
> > 30 seconds.  It is for non-fsync writes, of course, and I assume if the
> > kernel buffers get low, it starts to flush faster.
> 
> AFAICT, the syncer only speeds up when virtual memory paging fills the
> buffers past a threshold and even in that event it only speeds it up by
> a factor of two.
> 
> I can't find any provision for speeding up flushing of the dirty buffers
> when they fill for normal file system writes, so I don't think that
> happens.

So you think if I try to write a 1 gig file, it will write enough to
fill up the buffers, then wait while the sync'er writes out a few blocks
every second, free up some buffers, then write some more?

Take a look at vfs_bio::getnewbuf() on *BSD and you will see that when
it can't get a buffer, it will async write a dirty buffer to disk.

As far as this AIO conversation is concerned, I want to see someone come
up with some performance improvement that we can only do with AIO. 
Unless I see it, I am not interested in pursuing this thread.

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073




Re: [HACKERS] Dirty Buffer Writing [was Proposed LogWriter Scheme]

2002-10-07 Thread Curtis Faith

> Greg Copeland <[EMAIL PROTECTED]> writes:
> > Doesn't this also increase the likelihood that people will be
> > running in a buffer-poor environment more frequently than I
> > previously asserted, especially in very heavily I/O bound
> > systems?  Unless I'm mistaken, that opens the door for a
> > general case of why an aio implementation should be looked into.

Neil Conway replies:
> Well, at least for *this specific situation*, it doesn't really change
> anything -- since FreeBSD doesn't implement POSIX AIO as far as I
> know, we can't use that as an alternative.

I haven't tried it yet, but there does seem to be an aio implementation that
conforms to POSIX in FreeBSD 4.6.2.  It's part of the kernel and can be
found in:
/usr/src/sys/kern/vfs_aio.c

> However, I'd suspect that the FreeBSD kernel allows for some way to
> tune the behavior of the syncer. If that's the case, we could do some
> research into what settings are more appropriate for FreeBSD, and
> recommend those in the docs. I don't run FreeBSD, however -- would
> someone like to volunteer to take a look at this?

I didn't see anything obvious in the docs but I still believe there's some
way to tune it. I'll let everyone know if I find some better settings.

> BTW Curtis, did you happen to check whether this behavior has been
> changed in FreeBSD 5.0?

I haven't checked but I will.





Re: [HACKERS] Dirty Buffer Writing [was Proposed LogWriter Scheme]

2002-10-07 Thread Neil Conway

Greg Copeland <[EMAIL PROTECTED]> writes:
> Doesn't this also increase the likelihood that people will be running in
> a buffer-poor environment more frequently than I previously asserted,
> especially in very heavily I/O bound systems?  Unless I'm mistaken, that
> opens the door for a general case of why an aio implementation should be
> looked into.

Well, at least for *this specific situation*, it doesn't really change
anything -- since FreeBSD doesn't implement POSIX AIO as far as I
know, we can't use that as an alternative.

However, I'd suspect that the FreeBSD kernel allows for some way to
tune the behavior of the syncer. If that's the case, we could do some
research into what settings are more appropriate for FreeBSD, and
recommend those in the docs. I don't run FreeBSD, however -- would
someone like to volunteer to take a look at this?

BTW Curtis, did you happen to check whether this behavior has been
changed in FreeBSD 5.0?

> Also, on a side note, IIRC, linux kernel 2.5.x has a new priority
> elevator which is said to be MUCH better at saturating disks than ever
> before.

Yeah, there are lots of new & interesting features for database
systems in the new kernel -- I'm looking forward to when 2.6 is widely
deployed...

Cheers,

Neil

-- 
Neil Conway <[EMAIL PROTECTED]> || PGP Key ID: DB3C29FC





Re: [HACKERS] Dirty Buffer Writing [was Proposed LogWriter Scheme]

2002-10-07 Thread Greg Copeland

On Mon, 2002-10-07 at 15:28, Bruce Momjian wrote:
> This is the trickle syncer.  It prevents bursts of disk activity every
> 30 seconds.  It is for non-fsync writes, of course, and I assume if the
> kernel buffers get low, it starts to flush faster.

Doesn't this also increase the likelihood that people will be running in
a buffer-poor environment more frequently than I previously asserted,
especially in very heavily I/O bound systems?  Unless I'm mistaken, that
opens the door for a general case of why an aio implementation should be
looked into.

Also, on a side note, IIRC, linux kernel 2.5.x has a new priority
elevator which is said to be MUCH better at saturating disks than ever
before.  Once 2.6 (or whatever its number will be) is released, it may
not be as much of a problem as it seems to be for FreeBSD (I think
that's the one you're using).


Greg






Re: [HACKERS] Dirty Buffer Writing [was Proposed LogWriter Scheme]

2002-10-07 Thread Curtis Faith

> This is the trickle syncer.  It prevents bursts of disk activity every
> 30 seconds.  It is for non-fsync writes, of course, and I assume if the
> kernel buffers get low, it starts to flush faster.

AFAICT, the syncer only speeds up when virtual memory paging fills the
buffers past a threshold and even in that event it only speeds it up by
a factor of two.

I can't find any provision for speeding up flushing of the dirty buffers
when they fill for normal file system writes, so I don't think that
happens.

- Curtis





Re: [HACKERS] Dirty Buffer Writing [was Proposed LogWriter Scheme]

2002-10-07 Thread Bruce Momjian

Curtis Faith wrote:
> Good points.
> 
> Now for some surprising news (at least it surprised me).
> 
> I researched the file system source on my system (FreeBSD 4.6) and found
> that the behavior was optimized for non-database access to eliminate
> unnecessary writes when temp files are created and deleted rapidly. It was
> not optimized to get data to the disk in the most efficient manner.
> 
> The syncer on FreeBSD appears to place dirtied filesystem buffers into
> work queues that range from 1 to SYNCER_MAXDELAY. Each second the syncer
> processes one of the queues and increments a counter syncer_delayno.
> 
> On my system the setting for SYNCER_MAXDELAY is 32. So each second 1/32nd
> of the writes that were buffered are processed. If the syncer gets behind
> and the writes for a given second exceed one second to process the syncer
> does not wait but begins processing the next queue.
> 
> AFAICT this means that there is no opportunity to have writes combined by
> the disk since they are processed in buckets based on the time the writes
> came in.
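
The queue scheme described in the quoted text can be sketched as a small time wheel. This is a toy model based only on that description, not the FreeBSD source: the class `SyncerWheel` and its methods are hypothetical names, and only `SYNCER_MAXDELAY` (32 on Curtis's system) and the counter `syncer_delayno` come from the message.

```python
from collections import deque

SYNCER_MAXDELAY = 32  # value reported in the message for FreeBSD 4.6


class SyncerWheel:
    """Toy time-bucketed syncer: one work queue per second of delay."""

    def __init__(self, maxdelay=SYNCER_MAXDELAY):
        self.queues = [deque() for _ in range(maxdelay)]
        self.delayno = 0  # stands in for syncer_delayno

    def dirty(self, block, delay):
        """Queue a dirtied block `delay` slots ahead of the current one."""
        slot = (self.delayno + delay) % len(self.queues)
        self.queues[slot].append(block)

    def tick(self):
        """One second passes: flush the current queue, move to the next."""
        queue = self.queues[self.delayno]
        self.delayno = (self.delayno + 1) % len(self.queues)
        batch = list(queue)
        queue.clear()
        return batch


# Two adjacent disk blocks dirtied one second apart, on a small 4-slot wheel.
wheel = SyncerWheel(maxdelay=4)
wheel.dirty(100, delay=2)
flushed = [wheel.tick()]
wheel.dirty(101, delay=2)
flushed += [wheel.tick() for _ in range(3)]
print(flushed)
# prints [[], [], [100], [101]]
```

Blocks 100 and 101 are neighbours on disk, but because they were dirtied in different seconds they land in different buckets and are written in separate passes. That is the lost opportunity for combining writes that the message points to.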

This is the trickle syncer.  It prevents bursts of disk activity every
30 seconds.  It is for non-fsync writes, of course, and I assume if the
kernel buffers get low, it starts to flush faster.

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073
