Re: [HACKERS] Dirty Buffer Writing [was Proposed LogWriter Scheme]

2002-10-08 Thread Curtis Faith

 So you think if I try to write a 1 gig file, it will write enough to
 fill up the buffers, then wait while the sync'er writes out a few blocks
 every second, free up some buffers, then write some more?

 Take a look at vfs_bio::getnewbuf() on *BSD and you will see that when
 it can't get a buffer, it will async write a dirty buffer to disk.

We've addressed this scenario before; if I recall, the point Greg made
earlier was that once the buffers fill up, writes become synchronous.

What I was trying to point out was that the buffers are very likely to fill
even with a large buffer cache, and that the writes are then going to be
driven out not by efficient ganging but by something approaching LRU
flushing, with an occasional, once-per-second, slightly more efficient write
of 1/32nd of the buffers.

Once the buffers get full, all subsequent writes turn into synchronous
writes: even if the kernel writes asynchronously (meaning it can do other
work), the writing process can't complete; it has to wait until a buffer has
been flushed and is free for the copy. So the relatively poor implementation
(for database inserts, at least) of the syncer mechanism will cost a lot of
performance if we get into this synchronous write mode because of a full
buffer cache. It appears this scenario is much more likely than I had
thought.

Do you not think this is a potential performance problem to be explored?

I'm only pursuing this as hard as I am because I feel like it's deja vu all
over again. I've done this before and found a huge improvement (12X to 20X
for bulk inserts). I'm not necessarily expecting that level of improvement
here but my gut tells me there is more here than seems obvious on the
surface.

 As far as this AIO conversation is concerned, I want to see someone come
 up with some performance improvement that we can only do with AIO.
 Unless I see it, I am not interested in pursuing this thread.

If I come up with something via aio that helps I'd be more than happy if
someone else points out a non-aio way to accomplish the same thing. I'm by
no means married to any particular solutions, I care about getting problems
solved. And I'll stop trying to sell anyone on aio.

- Curtis





Re: [HACKERS] Dirty Buffer Writing [was Proposed LogWriter Scheme]

2002-10-08 Thread Tom Lane

Curtis Faith [EMAIL PROTECTED] writes:
 Do you not think this is a potential performance problem to be explored?

I agree that there's a problem if the kernel runs short of buffer space.
I am not sure whether that's really an issue in practical situations,
nor whether we can do much about it at the application level if it is
--- but by all means look for solutions if you are concerned.

(This is, BTW, one of the reasons for discouraging people from pushing
Postgres' shared buffer cache up to a large fraction of total RAM;
starving the kernel of disk buffers is just plain not a good idea.)

regards, tom lane




Re: [HACKERS] Dirty Buffer Writing [was Proposed LogWriter Scheme]

2002-10-08 Thread Greg Copeland

Bruce,

Are there remarks along these lines in the performance tuning section of
the docs?  Based on what's coming out of this, it would seem pretty
important to stress leaving a notable amount of memory (rule of thumb
here?) for general OS/kernel needs.


Greg


On Tue, 2002-10-08 at 09:50, Tom Lane wrote:
 (This is, BTW, one of the reasons for discouraging people from pushing
 Postgres' shared buffer cache up to a large fraction of total RAM;
 starving the kernel of disk buffers is just plain not a good idea.)






[HACKERS] Dirty Buffer Writing [was Proposed LogWriter Scheme]

2002-10-07 Thread Curtis Faith

 On Sun, 2002-10-06 at 11:46, Tom Lane wrote:
  I can't personally get excited about something that only helps if your
  server is starved for RAM --- who runs servers that aren't fat on RAM
  anymore?  But give it a shot if you like.  Perhaps your analysis is
  pessimistic.

 snipped I don't find it far-fetched to
 imagine situations where people may commit large amounts of memory to
 the database yet marginally starve available memory for file system
 buffers.  Especially so on heavily I/O-bound systems, or where other
 types of non-database file activity may occur sporadically.

 snipped Of course, that opens the door to simply adding more memory
 and/or slightly reducing the amount of memory available to the database
 (thus making it available elsewhere).  Now, after all that's said and
 done, having something like aio in use would seemingly allow it to be
 somewhat more self-tuning from a performance perspective.

Good points.

Now for some surprising news (at least it surprised me).

I researched the file system source on my system (FreeBSD 4.6) and found
that the behavior was optimized for non-database access to eliminate
unnecessary writes when temp files are created and deleted rapidly. It was
not optimized to get data to the disk in the most efficient manner.

The syncer on FreeBSD appears to place dirtied filesystem buffers into
work queues that range from 1 to SYNCER_MAXDELAY. Each second the syncer
processes one of the queues and increments a counter syncer_delayno.

On my system the setting for SYNCER_MAXDELAY is 32, so each second 1/32nd
of the writes that were buffered are processed. If the syncer gets behind
and the writes queued for a given second take more than a second to
process, it does not wait but begins processing the next queue.
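
To make the shape of this concrete, here is a little user-space toy model
of what I believe the syncer is doing (this is emphatically not the kernel
code -- the names and bucket sizes are mine -- but it captures the
one-bucket-per-second behavior I'm describing):

/* Toy model of the 4.x syncer behavior described above.  Dirty buffers
 * are hashed into one of SYNCER_MAXDELAY buckets based on when they were
 * dirtied, and once a second exactly one bucket is flushed.  Note that
 * nothing here ever looks at how full the cache is. */
#include <stdio.h>

#define SYNCER_MAXDELAY 32
#define MAXPERBUCKET 4096

static int bucket[SYNCER_MAXDELAY][MAXPERBUCKET]; /* buffer ids, per bucket */
static int nbucket[SYNCER_MAXDELAY];
static int syncer_delayno;                         /* bucket being flushed */

/* Called when a buffer is dirtied: schedule it "delay" ticks out. */
static void
dirty_buffer(int bufid, int delay)
{
    int slot = (syncer_delayno + delay) % SYNCER_MAXDELAY;

    bucket[slot][nbucket[slot]++] = bufid;
}

/* Called once per second: flush the current bucket only, then advance.
 * If a bucket takes longer than a second, the real syncer just moves on
 * to the next bucket; it never drains more than one slot per tick. */
static void
syncer_tick(void)
{
    int slot = syncer_delayno;
    int i;

    syncer_delayno = (syncer_delayno + 1) % SYNCER_MAXDELAY;
    for (i = 0; i < nbucket[slot]; i++)
        printf("flushing buffer %d\n", bucket[slot][i]); /* stands in for the async write */
    nbucket[slot] = 0;
}

int
main(void)
{
    int i;

    for (i = 0; i < 100; i++)
        dirty_buffer(i, 30);    /* dirtied now, flushed ~30 ticks from now */
    syncer_tick();              /* prints nothing: the data just sits there */
    return 0;
}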

AFAICT this means there is no opportunity for writes to be combined by the
disk, since they are processed in buckets based on the time the writes came
in.

Also, it seems very likely that many installations won't have enough
buffers for 30 seconds' worth of changes, and that there would be some level
of SYNCHRONOUS writing because of this delay and the syncer process getting
backed up. This might happen once per second as the buffers fill up before
the syncer has started on that second's interval.

Linux might handle this better. I saw some emails exchanged a year or so
ago about starting writes immediately in a low-priority way, but I'm not
sure whether those patches got applied to the Linux kernel or not. The
source I had access to seems to do something analogous to FreeBSD, but
using fixed percentages of the dirty blocks or a minimum number of blocks.
They appear to be handled in LRU order, however.

On-disk caches are much, much larger these days, so it seems that some way
of getting the data out to the drive sooner would result in better write
performance from the cache. My newer drive is a 10K RPM IBM Ultrastar SCSI
and it has a 4MB cache. I don't see these caches getting smaller over time,
so not letting the disk see writes will become more and more of a
performance drain.

- Curtis





Re: [HACKERS] Dirty Buffer Writing [was Proposed LogWriter Scheme]

2002-10-07 Thread Bruce Momjian

Curtis Faith wrote:
 Good points.
 
 Now for some surprising news (at least it surprised me).
 
 I researched the file system source on my system (FreeBSD 4.6) and found
 that the behavior was optimized for non-database access to eliminate
 unnecessary writes when temp files are created and deleted rapidly. It was
 not optimized to get data to the disk in the most efficient manner.
 
 The syncer on FreeBSD appears to place dirtied filesystem buffers into
 work queues that range from 1 to SYNCER_MAXDELAY. Each second the syncer
 processes one of the queues and increments a counter syncer_delayno.
 
 On my system the setting for SYNCER_MAXDELAY is 32, so each second 1/32nd
 of the writes that were buffered are processed. If the syncer gets behind
 and the writes queued for a given second take more than a second to
 process, it does not wait but begins processing the next queue.
 
 AFAICT this means there is no opportunity for writes to be combined by the
 disk, since they are processed in buckets based on the time the writes came
 in.

This is the trickle syncer.  It prevents bursts of disk activity every
30 seconds.  It is for non-fsync writes, of course, and I assume if the
kernel buffers get low, it starts to flush faster.

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073




Re: [HACKERS] Dirty Buffer Writing [was Proposed LogWriter Scheme]

2002-10-07 Thread Curtis Faith

 This is the trickle syncer.  It prevents bursts of disk activity every
 30 seconds.  It is for non-fsync writes, of course, and I assume if the
 kernel buffers get low, it starts to flush faster.

AFAICT, the syncer only speeds up when virtual memory paging fills the
buffers past a threshold, and even in that event it only speeds up by a
factor of two.

I can't find any provision for speeding up the flushing of dirty buffers
when they fill up from normal file system writes, so I don't think that
happens.

- Curtis





Re: [HACKERS] Dirty Buffer Writing [was Proposed LogWriter Scheme]

2002-10-07 Thread Greg Copeland

On Mon, 2002-10-07 at 15:28, Bruce Momjian wrote:
 This is the trickle syncer.  It prevents bursts of disk activity every
 30 seconds.  It is for non-fsync writes, of course, and I assume if the
 kernel buffers get low, it starts to flush faster.

Doesn't this also increase the likelihood that people will be running in
a buffer-poor environment more frequently than I previously asserted,
especially on very heavily I/O-bound systems?  Unless I'm mistaken, that
opens the door for a general case of why an aio implementation should be
looked into.

Also, on a side note, IIRC, the Linux 2.5.x kernel has a new priority
elevator which is said to be MUCH better at saturating disks than ever
before.  Once 2.6 (or whatever its number will be) is released, this may
not be as much of a problem as it seems to be for FreeBSD (I think
that's the one you're using).


Greg






Re: [HACKERS] Dirty Buffer Writing [was Proposed LogWriter Scheme]

2002-10-07 Thread Curtis Faith

 Greg Copeland [EMAIL PROTECTED] writes:
  Doesn't this also increase the likelihood that people will be
  running in a buffer-poor environment more frequently than I
  previously asserted, especially on very heavily I/O-bound
  systems?  Unless I'm mistaken, that opens the door for a
  general case of why an aio implementation should be looked into.

Neil Conway replies:
 Well, at least for *this specific situation*, it doesn't really change
 anything -- since FreeBSD doesn't implement POSIX AIO as far as I
 know, we can't use that as an alternative.

I haven't tried it yet but there does seem to be an aio implementation that
conforms to POSIX in FreeBSD 4.6.2.  It's part of the kernel and can be
found in:
/usr/src/sys/kern/vfs_aio.c
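
A trivial standalone test along these lines should show whether the
syscalls are actually wired up (this is plain POSIX aio, nothing
FreeBSD-specific; I believe the 4.x kernel needs to be built with
'options VFS_AIO' for it to succeed, and the file name below is just made
up):

/* Minimal POSIX aio sanity check: queue one asynchronous write and poll
 * for its completion.  If aio isn't available in the kernel, aio_write()
 * should fail with an error instead. */
#include <aio.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int
main(void)
{
    static char data[8192];
    struct aiocb cb;
    int fd;
    long nbytes;

    fd = open("aio_test.dat", O_CREAT | O_WRONLY | O_TRUNC, 0644);
    if (fd < 0) {
        perror("open");
        return 1;
    }

    memset(data, 'x', sizeof(data));
    memset(&cb, 0, sizeof(cb));
    cb.aio_fildes = fd;
    cb.aio_buf = data;
    cb.aio_nbytes = sizeof(data);
    cb.aio_offset = 0;

    if (aio_write(&cb) != 0) {
        perror("aio_write");
        return 1;
    }

    /* The write proceeds in the background; poll until it completes. */
    while (aio_error(&cb) == EINPROGRESS)
        usleep(1000);

    nbytes = (long) aio_return(&cb);
    printf("aio_write completed: %ld bytes\n", nbytes);
    close(fd);
    return 0;
}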

 However, I'd suspect that the FreeBSD kernel allows for some way to
 tune the behavior of the syncer. If that's the case, we could do some
 research into what settings are more appropriate for FreeBSD, and
 recommend those in the docs. I don't run FreeBSD, however -- would
 someone like to volunteer to take a look at this?

I didn't see anything obvious in the docs but I still believe there's some
way to tune it. I'll let everyone know if I find some better settings.

 BTW Curtis, did you happen to check whether this behavior has been
 changed in FreeBSD 5.0?

I haven't checked but I will.





Re: [HACKERS] Dirty Buffer Writing [was Proposed LogWriter Scheme]

2002-10-07 Thread Neil Conway

Greg Copeland [EMAIL PROTECTED] writes:
 Doesn't this also increase the likelihood that people will be running in
 a buffer-poor environment more frequently than I previously asserted,
 especially on very heavily I/O-bound systems?  Unless I'm mistaken, that
 opens the door for a general case of why an aio implementation should be
 looked into.

Well, at least for *this specific situation*, it doesn't really change
anything -- since FreeBSD doesn't implement POSIX AIO as far as I
know, we can't use that as an alternative.

However, I'd suspect that the FreeBSD kernel allows for some way to
tune the behavior of the syncer. If that's the case, we could do some
research into what settings are more appropriate for FreeBSD, and
recommend those in the docs. I don't run FreeBSD, however -- would
someone like to volunteer to take a look at this?

BTW Curtis, did you happen to check whether this behavior has been
changed in FreeBSD 5.0?

 Also, on a side note, IIRC, the Linux 2.5.x kernel has a new priority
 elevator which is said to be MUCH better at saturating disks than ever
 before.

Yeah, there are lots of new and interesting features for database
systems in the new kernel -- I'm looking forward to when 2.6 is widely
deployed...

Cheers,

Neil

-- 
Neil Conway [EMAIL PROTECTED] || PGP Key ID: DB3C29FC





Re: [HACKERS] Dirty Buffer Writing [was Proposed LogWriter Scheme]

2002-10-07 Thread Bruce Momjian

Curtis Faith wrote:
  This is the trickle syncer.  It prevents bursts of disk activity every
  30 seconds.  It is for non-fsync writes, of course, and I assume if the
  kernel buffers get low, it starts to flush faster.
 
 AFAICT, the syncer only speeds up when virtual memory paging fills the
 buffers past a threshold, and even in that event it only speeds up by a
 factor of two.
 
 I can't find any provision for speeding up the flushing of dirty buffers
 when they fill up from normal file system writes, so I don't think that
 happens.

So you think if I try to write a 1 gig file, it will write enough to
fill up the buffers, then wait while the sync'er writes out a few blocks
every second, free up some buffers, then write some more?

Take a look at vfs_bio::getnewbuf() on *BSD and you will see that when
it can't get a buffer, it will async write a dirty buffer to disk.
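
The shape of the logic is roughly this (a toy user-space model, not the
actual vfs_bio.c code):

/* Toy model of the getnewbuf() fallback: hand out a clean buffer if one
 * exists, otherwise push a dirty buffer out with an async write so it
 * can be reused.  The printf stands in for bawrite(). */
#include <stdio.h>

#define NBUF 8

static int cleanbuf[NBUF], nclean;  /* reusable buffers */
static int dirtybuf[NBUF], ndirty;  /* buffers waiting to be written */

static int
getnewbuf(void)
{
    if (nclean > 0)
        return cleanbuf[--nclean];          /* easy case: reuse directly */
    if (ndirty > 0) {
        int id = dirtybuf[--ndirty];

        printf("async write of dirty buffer %d\n", id); /* bawrite(bp) */
        return id;      /* in the real code, reusable once the write completes */
    }
    return -1;          /* nothing available at all; the kernel would sleep here */
}

int
main(void)
{
    int i;

    nclean = 1; cleanbuf[0] = 0;
    ndirty = 2; dirtybuf[0] = 1; dirtybuf[1] = 2;
    for (i = 0; i < 4; i++)
        printf("got buffer %d\n", getnewbuf());
    return 0;
}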

As far as this AIO conversation is concerned, I want to see someone come
up with some performance improvement that we can only do with AIO. 
Unless I see it, I am not interested in pursuing this thread.

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073
