Re: [HACKERS] Dirty Buffer Writing [was Proposed LogWriter Scheme]
> So you think if I try to write a 1 gig file, it will write enough to fill up the buffers, then wait while the sync'er writes out a few blocks every second, free up some buffers, then write some more?
>
> Take a look at vfs_bio::getnewbuf() on *BSD and you will see that when it can't get a buffer, it will async write a dirty buffer to disk.

We've addressed this scenario before, if I recall; the point Greg made earlier is that buffers getting full means writes become synchronous.

What I was trying to point out was that it is very likely that the buffers will fill even when they are large, and that the writes are going to be driven out not by efficient ganging but by something approaching LRU flushing, with an occasional, slightly more efficient once-a-second write of 1/32nd of the buffers.

Once the buffers are full, all subsequent writes turn into synchronous writes: even if the kernel writes asynchronously (meaning it can do other work), the writing process can't complete its write. It has to wait until a buffer has been flushed and is free for the copy.

So the relatively poor implementation of the syncer mechanism (for database inserts, at least) will cost a lot of performance if we get into this synchronous write mode because of a full buffer. This scenario appears much more likely than I had thought. Do you not think this is a potential performance problem to be explored?

I'm only pursuing this as hard as I am because it feels like deja vu all over again. I've done this before and found a huge improvement (12X to 20X for bulk inserts). I'm not necessarily expecting that level of improvement here, but my gut tells me there is more here than is obvious on the surface.

> As far as this AIO conversation is concerned, I want to see someone come up with some performance improvement that we can only do with AIO. Unless I see it, I am not interested in pursuing this thread.
If I come up with something via aio that helps, I'd be more than happy if someone else points out a non-aio way to accomplish the same thing. I'm by no means married to any particular solution; I care about getting problems solved. And I'll stop trying to sell anyone on aio.

- Curtis

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster
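A toy model may make the "full buffers turn writes synchronous" argument concrete. This is a minimal sketch under stated assumptions, not FreeBSD's getnewbuf(): NBUF, struct cache, cache_write() and syncer_flush() are hypothetical names, buffers are reused round-robin as a stand-in for LRU, and each "forced write" stands for a write the caller has to wait on before its own copy can proceed.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy model; not the real vfs_bio data structures. */
#define NBUF 8

struct cache {
    bool dirty[NBUF];   /* buffer holds data not yet on disk */
    int next_victim;    /* round-robin stand-in for LRU reuse */
    int forced_writes;  /* writes the writer had to wait for */
};

/* One syncer pass: clean up to n dirty buffers. */
void syncer_flush(struct cache *c, int n) {
    assert(n >= 0);
    for (int i = 0; i < NBUF && n > 0; i++) {
        if (c->dirty[i]) {
            c->dirty[i] = false;
            n--;
        }
    }
}

/* Copy one block of application data into the cache. */
void cache_write(struct cache *c) {
    for (int i = 0; i < NBUF; i++) {
        if (!c->dirty[i]) {      /* clean buffer: just a memory copy */
            c->dirty[i] = true;
            return;
        }
    }
    /* No clean buffer left: push a dirty one out first. Even if that
       disk write is issued asynchronously, this writer cannot copy its
       data until the victim is clean, so it effectively waits. */
    int v = c->next_victim;
    c->next_victim = (v + 1) % NBUF;
    c->forced_writes++;
    c->dirty[v] = true;          /* victim now holds the new block */
}
```

In this model the first NBUF writes cost nothing but a copy; every write after that waits on the disk until syncer_flush() happens to free something, which is the behaviour being argued about above.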
Re: [HACKERS] Dirty Buffer Writing [was Proposed LogWriter Scheme]
Curtis Faith [EMAIL PROTECTED] writes:
> Do you not think this is a potential performance problem to be explored?

I agree that there's a problem if the kernel runs short of buffer space. I am not sure whether that's really an issue in practical situations, nor whether we can do much about it at the application level if it is --- but by all means look for solutions if you are concerned.

(This is, BTW, one of the reasons for discouraging people from pushing Postgres' shared buffer cache up to a large fraction of total RAM; starving the kernel of disk buffers is just plain not a good idea.)

regards, tom lane
Re: [HACKERS] Dirty Buffer Writing [was Proposed LogWriter Scheme]
Bruce,

Are there remarks along these lines in the performance tuning section of the docs? Based on what's coming out of this, it would seem pretty important to stress leaving a notable amount of memory (is there a rule of thumb here?) for general OS/kernel needs.

Greg

On Tue, 2002-10-08 at 09:50, Tom Lane wrote:
> (This is, BTW, one of the reasons for discouraging people from pushing Postgres' shared buffer cache up to a large fraction of total RAM; starving the kernel of disk buffers is just plain not a good idea.)
[HACKERS] Dirty Buffer Writing [was Proposed LogWriter Scheme]
> On Sun, 2002-10-06 at 11:46, Tom Lane wrote:
>> I can't personally get excited about something that only helps if your server is starved for RAM --- who runs servers that aren't fat on RAM anymore? But give it a shot if you like. Perhaps your analysis is pessimistic.
> [snipped]
> I don't find it far-fetched to imagine situations where people may commit large amounts of memory for the database yet marginally starve available memory for file system buffers. Especially so on heavily I/O bound systems or where other types of non-database file activity may sporadically occur.
> [snipped]
> Of course, that opens the door for simply adding more memory and/or slightly reducing the amount of memory available to the database (thus making it available elsewhere). Now, after all that's said and done, having something like aio in use would seemingly allow it to be somewhat more self-tuning from a potential performance perspective.

Good points.

Now for some surprising news (at least it surprised me).

I researched the file system source on my system (FreeBSD 4.6) and found that the behavior was optimized for non-database access, to eliminate unnecessary writes when temp files are created and deleted rapidly. It was not optimized to get data to the disk in the most efficient manner.

The syncer on FreeBSD appears to place dirtied filesystem buffers into work queues that range from 1 to SYNCER_MAXDELAY. Each second the syncer processes one of the queues and increments a counter, syncer_delayno.

On my system the setting for SYNCER_MAXDELAY is 32, so each second 1/32nd of the writes that were buffered are processed. If the syncer gets behind and the writes for a given second take more than one second to process, the syncer does not wait but begins processing the next queue.

AFAICT this means that there is no opportunity to have writes combined by the disk, since they are processed in buckets based on the time the writes came in.
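The queue scheme described above is essentially a timing wheel. The sketch below is a simplified model, assuming one counter per slot and a one-second tick; SYNCER_MAXDELAY and syncer_delayno are the names quoted from the FreeBSD source, while struct syncer, syncer_enqueue() and syncer_tick() are hypothetical and do not reproduce the kernel code.

```c
#include <assert.h>

/* SYNCER_MAXDELAY and syncer_delayno are the names from the message;
   everything else here is a hypothetical simplification. */
#define SYNCER_MAXDELAY 32

struct syncer {
    int slot_count[SYNCER_MAXDELAY]; /* dirty buffers queued per slot */
    int syncer_delayno;              /* slot to be serviced next */
};

/* Queue a dirtied buffer to be written `delay` seconds from now. */
void syncer_enqueue(struct syncer *s, int delay) {
    assert(delay >= 0 && delay < SYNCER_MAXDELAY);
    s->slot_count[(s->syncer_delayno + delay) % SYNCER_MAXDELAY]++;
}

/* One-second tick: write out only the current slot, then advance.
   Buffers in other slots are untouched, so writes are grouped by when
   they were queued, not by where they live on disk.
   Returns the number of buffers written this tick. */
int syncer_tick(struct syncer *s) {
    int n = s->slot_count[s->syncer_delayno];
    s->slot_count[s->syncer_delayno] = 0;
    s->syncer_delayno = (s->syncer_delayno + 1) % SYNCER_MAXDELAY;
    return n;
}
```

Two buffers queued with delay 0 and one with delay 30 are written on the first tick and the thirty-first tick respectively, even if all three are adjacent on disk; that is the "buckets based on time" point in the paragraph above.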
Also, it seems very likely that many installations won't have enough buffers for 30 seconds' worth of changes, and that there would be some level of SYNCHRONOUS writing because of this delay and the syncer process getting backed up. This might happen once per second, as the buffers get full and the syncer has not yet started on that second's interval.

Linux might handle this better. I saw some emails exchanged a year or so ago about starting writes immediately in a low-priority way, but I'm not sure whether those patches were applied to the Linux kernel. The source I had access to seems to do something analogous to FreeBSD, but using fixed percentages of the dirty blocks or a minimum number of blocks. They appear to be handled in LRU order, however.

On-disk caches are much, much larger these days, so it seems that some way of getting the data out sooner would let the drive's cache deliver better write performance. My newer drive is a 10K RPM IBM Ultrastar SCSI and it has a 4M cache. I don't see these caches getting smaller over time, so not letting the disk see writes will become more and more of a performance drain.

- Curtis
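The "not enough buffers for 30 seconds' worth of changes" concern is simple arithmetic: a steady writer fills the cache in cache size divided by write rate seconds. This is a back-of-the-envelope sketch that assumes, pessimistically, that nothing drains until the delay window has passed; the function name and all numbers are hypothetical.

```c
#include <assert.h>

/* Hypothetical model: newly dirtied data sits for delay_s seconds
   before the syncer writes it, so the cache must absorb everything
   written in that window. Returns the seconds a steady writer needs to
   fill the cache, or -1.0 if the cache can hold a full delay window. */
double seconds_until_full(double cache_mb, double write_mb_per_s,
                          double delay_s) {
    assert(cache_mb >= 0 && write_mb_per_s > 0 && delay_s >= 0);
    double t = cache_mb / write_mb_per_s;
    return t < delay_s ? t : -1.0;
}
```

With an illustrative 64 MB of spare cache and a bulk load producing 8 MB/s, seconds_until_full(64, 8, 30) gives 8 seconds, well inside a 30-second window; after that point, in this model, every further write waits on the disk.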
Re: [HACKERS] Dirty Buffer Writing [was Proposed LogWriter Scheme]
Curtis Faith wrote:
> Good points.
>
> Now for some surprising news (at least it surprised me).
>
> I researched the file system source on my system (FreeBSD 4.6) and found that the behavior was optimized for non-database access, to eliminate unnecessary writes when temp files are created and deleted rapidly. It was not optimized to get data to the disk in the most efficient manner.
>
> The syncer on FreeBSD appears to place dirtied filesystem buffers into work queues that range from 1 to SYNCER_MAXDELAY. Each second the syncer processes one of the queues and increments a counter, syncer_delayno.
>
> On my system the setting for SYNCER_MAXDELAY is 32, so each second 1/32nd of the writes that were buffered are processed. If the syncer gets behind and the writes for a given second take more than one second to process, the syncer does not wait but begins processing the next queue.
>
> AFAICT this means that there is no opportunity to have writes combined by the disk, since they are processed in buckets based on the time the writes came in.

This is the trickle syncer. It prevents bursts of disk activity every 30 seconds. It is for non-fsync writes, of course, and I assume if the kernel buffers get low, it starts to flush faster.

-- 
Bruce Momjian                        |  http://candle.pha.pa.us
[EMAIL PROTECTED]                    |  (610) 359-1001
  +  If your life is a hard drive,   |  13 Roberts Road
  +  Christ can be your backup.      |  Newtown Square, Pennsylvania 19073
Re: [HACKERS] Dirty Buffer Writing [was Proposed LogWriter Scheme]
> This is the trickle syncer. It prevents bursts of disk activity every 30 seconds. It is for non-fsync writes, of course, and I assume if the kernel buffers get low, it starts to flush faster.

AFAICT, the syncer only speeds up when virtual memory paging fills the buffers past a threshold, and even then it only speeds up by a factor of two. I can't find any provision for speeding up the flushing of dirty buffers when they fill up from normal file system writes, so I don't think that happens.

- Curtis
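To put the "factor of two" in perspective, here is a sketch of the behaviour as described in the reply above; it has not been checked against the kernel source. It assumes the syncer services one wheel slot per second normally and two under VM paging pressure, and that ordinary file writes never trigger the speed-up. Both function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the described behaviour: one slot per second
   normally, two when VM paging pushes dirty buffers past a threshold.
   Ordinary file-system writes never set paging_pressure. */
int slots_per_second(bool paging_pressure) {
    return paging_pressure ? 2 : 1;
}

/* Seconds needed to work through `full_slots` backed-up slots. */
int seconds_to_drain(int full_slots, bool paging_pressure) {
    assert(full_slots >= 0);
    int rate = slots_per_second(paging_pressure);
    return (full_slots + rate - 1) / rate; /* round up */
}
```

Under this model a fully backed-up 32-slot wheel takes 32 seconds to clear normally and still 16 seconds at the rushed rate, which is why a doubling alone does not obviously prevent writers from stalling.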
Re: [HACKERS] Dirty Buffer Writing [was Proposed LogWriter Scheme]
On Mon, 2002-10-07 at 15:28, Bruce Momjian wrote:
> This is the trickle syncer. It prevents bursts of disk activity every 30 seconds. It is for non-fsync writes, of course, and I assume if the kernel buffers get low, it starts to flush faster.

Doesn't this also increase the likelihood that people will be running in a buffer-poor environment more frequently than I previously asserted, especially in very heavily I/O bound systems? Unless I'm mistaken, that opens the door for a general case of why an aio implementation should be looked into.

Also, on a side note, IIRC, Linux kernel 2.5.x has a new priority elevator which is said to be MUCH better at saturating disks than ever before. Once 2.6 (or whatever its number will be) is released, it may not be as much of a problem as it seems to be for FreeBSD (I think that's the one you're using).

Greg
Re: [HACKERS] Dirty Buffer Writing [was Proposed LogWriter Scheme]
Greg Copeland [EMAIL PROTECTED] writes:
>> Doesn't this also increase the likelihood that people will be running in a buffer-poor environment more frequently than I previously asserted, especially in very heavily I/O bound systems? Unless I'm mistaken, that opens the door for a general case of why an aio implementation should be looked into.

Neil Conway replies:
> Well, at least for *this specific situation*, it doesn't really change anything -- since FreeBSD doesn't implement POSIX AIO as far as I know, we can't use that as an alternative.

I haven't tried it yet, but there does seem to be an aio implementation that conforms to POSIX in FreeBSD 4.6.2. It's part of the kernel and can be found in:

/usr/src/sys/kern/vfs_aio.c

> However, I'd suspect that the FreeBSD kernel allows for some way to tune the behavior of the syncer. If that's the case, we could do some research into what settings are more appropriate for FreeBSD, and recommend those in the docs. I don't run FreeBSD, however -- would someone like to volunteer to take a look at this?

I didn't see anything obvious in the docs, but I still believe there's some way to tune it. I'll let everyone know if I find some better settings.

> BTW Curtis, did you happen to check whether this behavior has been changed in FreeBSD 5.0?

I haven't checked, but I will.
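For readers unfamiliar with the POSIX interface being discussed, this is a minimal sketch of a single asynchronous write using aio_write(), aio_error(), aio_suspend() and aio_return(). It assumes the platform provides <aio.h>; whether a given FreeBSD kernel has AIO enabled is left as an assumption to check. write_async() is a hypothetical helper, not anything in PostgreSQL.

```c
#define _POSIX_C_SOURCE 200809L
#include <aio.h>
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: start one asynchronous write, leave room for
   other work, then block until it completes. Returns bytes written,
   or -1 on failure. */
ssize_t write_async(int fd, const void *data, size_t len, off_t offset) {
    struct aiocb cb;
    memset(&cb, 0, sizeof cb);
    cb.aio_fildes = fd;
    cb.aio_buf = (volatile void *)data;
    cb.aio_nbytes = len;
    cb.aio_offset = offset;

    if (aio_write(&cb) != 0) {   /* request could not be queued */
        perror("aio_write");
        return -1;
    }

    /* ...the caller could prepare the next block here... */

    const struct aiocb *list[1] = { &cb };
    while (aio_error(&cb) == EINPROGRESS)
        aio_suspend(list, 1, NULL);  /* may return early on EINTR */
    return aio_return(&cb);      /* final status; call exactly once */
}
```

The point of the interface is the gap between aio_write() and the wait: the process can keep filling the next buffer instead of stalling in write(). On older glibc systems the program must be linked with -lrt.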
Re: [HACKERS] Dirty Buffer Writing [was Proposed LogWriter Scheme]
Greg Copeland [EMAIL PROTECTED] writes:
> Doesn't this also increase the likelihood that people will be running in a buffer-poor environment more frequently than I previously asserted, especially in very heavily I/O bound systems? Unless I'm mistaken, that opens the door for a general case of why an aio implementation should be looked into.

Well, at least for *this specific situation*, it doesn't really change anything -- since FreeBSD doesn't implement POSIX AIO as far as I know, we can't use that as an alternative.

However, I'd suspect that the FreeBSD kernel allows for some way to tune the behavior of the syncer. If that's the case, we could do some research into what settings are more appropriate for FreeBSD, and recommend those in the docs. I don't run FreeBSD, however -- would someone like to volunteer to take a look at this?

BTW Curtis, did you happen to check whether this behavior has been changed in FreeBSD 5.0?

> Also, on a side note, IIRC, Linux kernel 2.5.x has a new priority elevator which is said to be MUCH better at saturating disks than ever before.

Yeah, there are lots of new interesting features for database systems in the new kernel -- I'm looking forward to when 2.6 is widely deployed...

Cheers,

Neil

-- 
Neil Conway [EMAIL PROTECTED] || PGP Key ID: DB3C29FC
Re: [HACKERS] Dirty Buffer Writing [was Proposed LogWriter Scheme]
Curtis Faith wrote:
>> This is the trickle syncer. It prevents bursts of disk activity every 30 seconds. It is for non-fsync writes, of course, and I assume if the kernel buffers get low, it starts to flush faster.
>
> AFAICT, the syncer only speeds up when virtual memory paging fills the buffers past a threshold and even in that event it only speeds it up by a factor of two. I can't find any provision for speeding up flushing of the dirty buffers when they fill for normal file system writes, so I don't think that happens.

So you think if I try to write a 1 gig file, it will write enough to fill up the buffers, then wait while the sync'er writes out a few blocks every second, free up some buffers, then write some more?

Take a look at vfs_bio::getnewbuf() on *BSD and you will see that when it can't get a buffer, it will async write a dirty buffer to disk.

As far as this AIO conversation is concerned, I want to see someone come up with some performance improvement that we can only do with AIO. Unless I see it, I am not interested in pursuing this thread.

-- 
Bruce Momjian