Re: [HACKERS] PANIC caused by open_sync on Linux
Added to TODO:

* Be more aggressive about creating WAL files

  http://archives.postgresql.org/pgsql-hackers/2007-10/msg01325.php

---

Tom Lane wrote:
> Greg Smith [EMAIL PROTECTED] writes:
> > On Fri, 26 Oct 2007, ITAGAKI Takahiro wrote:
> > > Mixed usage of buffered and direct I/O is legal, but imposes complexity
> > > on kernels. If we simplify it, things would be more relaxed. For example,
> > > dropping zero-filling and using only direct I/O. Is it possible?
>
> > It's possible, but performance suffers considerably. I played around with
> > this at one point when looking into doing all database writes as sync
> > writes. Having to wait until the entire 16MB WAL segment made its way to
> > disk before more WAL could be written can cause a nasty pause in activity,
> > even with direct I/O sync writes. Even the current buffered zero-filled
> > write of that size can be a bit of a drag on performance for the clients
> > that get caught behind it; making it any sort of sync write will be far
> > worse.
>
> This ties into a loose end we didn't get to yet: being more aggressive
> about creating future WAL segments. ISTM there is no good reason for
> clients ever to have to wait for WAL segment creation --- the bgwriter,
> or possibly the walwriter, ought to handle that in the background. But
> we only check for the case once per checkpoint and we don't create a
> segment unless there's very little space left.
>
> regards, tom lane

--
Bruce Momjian [EMAIL PROTECTED]  http://momjian.us
EnterpriseDB  http://postgres.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +
Re: [HACKERS] PANIC caused by open_sync on Linux
On Fri, Oct 26, 2007 at 10:39:12PM -0400, Greg Smith wrote:
> There are a couple of potential to-do list ideas that build on the changes
> in this area in 8.3:

I think that's the right way to go. It's too bad that this may still
happen in 8.3, but we're way past the point where this is a bug fix, IMO.

A

--
Andrew Sullivan | [EMAIL PROTECTED]
The plural of anecdote is not data.
                --Roger Brinner
Re: [HACKERS] PANIC caused by open_sync on Linux
Greg Smith [EMAIL PROTECTED] wrote:
> There are a couple of potential to-do list ideas that build on the changes
> in this area in 8.3:
> -Aggressively pre-allocate WAL segments
> -Space out checkpoint fsync requests in addition to disk writes
> -Consider re-inserting a smarter bgwriter all-scan that writes sorted by
>  usage count during idle periods

I'd like to add:
- Remove filling with zero before we recycle WAL segments.

If it is not needed, we can avoid buffered I/O with open_sync except for
the first allocation of segments. I think we can do it if we have more
robust WAL records that can ignore garbage data written earlier.

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center
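ITAGAKI's proposal depends on a reader being able to tell a page written for the current WAL position apart from stale bytes left behind by an earlier use of a recycled segment. A minimal sketch of that check, assuming a simplified, hypothetical page header that records the WAL position each page was written for. `wal_page_header`, `page_addr`, `WAL_MAGIC` and the 8kB page size are illustrative assumptions, not PostgreSQL's actual layout:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define WAL_PAGE_SIZE 8192u  /* assumed page size */
#define WAL_MAGIC 0xABCDu    /* arbitrary illustrative format marker */

/* Hypothetical, simplified WAL page header; not PostgreSQL's real layout. */
typedef struct {
    uint16_t magic;      /* identifies the page format */
    uint64_t page_addr;  /* WAL stream position this page was written for */
} wal_page_header;

/* A page counts as current only if it carries the expected position.
 * Pages left over from a segment's previous life carry an older position,
 * so they are rejected without the file ever being zero-filled. */
bool page_is_current(const wal_page_header *hdr, uint64_t expected_addr)
{
    return hdr->magic == WAL_MAGIC && hdr->page_addr == expected_addr;
}

/* Number of consecutive current pages starting at start_addr; replay would
 * stop at the first stale or garbage page. */
size_t count_valid_pages(const wal_page_header *pages, size_t npages,
                         uint64_t start_addr)
{
    size_t i = 0;
    while (i < npages &&
           page_is_current(&pages[i], start_addr + (uint64_t)i * WAL_PAGE_SIZE))
        i++;
    return i;
}
```

Replay simply stops at the first page whose stamp does not match, so old contents never have to be zeroed out first. A real design would also checksum each record; that is left out of this sketch.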
Re: [HACKERS] PANIC caused by open_sync on Linux
ITAGAKI Takahiro [EMAIL PROTECTED] writes:
> I'd like to add:
> - Remove filling with zero before we recycle WAL segments.

Huh? We have never done that.

regards, tom lane
Re: [HACKERS] PANIC caused by open_sync on Linux
Tom Lane [EMAIL PROTECTED] wrote:
> ITAGAKI Takahiro [EMAIL PROTECTED] writes:
> > I'd like to add:
> > - Remove filling with zero before we recycle WAL segments.
>
> Huh? We have never done that.

Oh, sorry. I misread the code.
I could avoid the PANIC if I had enough segments at startup.
I'll test that configuration.

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center
Re: [HACKERS] PANIC caused by open_sync on Linux
Greg Smith [EMAIL PROTECTED] writes:
> On Fri, 26 Oct 2007, ITAGAKI Takahiro wrote:
> > Mixed usage of buffered and direct I/O is legal, but imposes complexity
> > on kernels. If we simplify it, things would be more relaxed. For example,
> > dropping zero-filling and using only direct I/O. Is it possible?

> It's possible, but performance suffers considerably. I played around with
> this at one point when looking into doing all database writes as sync
> writes. Having to wait until the entire 16MB WAL segment made its way to
> disk before more WAL could be written can cause a nasty pause in activity,
> even with direct I/O sync writes. Even the current buffered zero-filled
> write of that size can be a bit of a drag on performance for the clients
> that get caught behind it; making it any sort of sync write will be far
> worse.

This ties into a loose end we didn't get to yet: being more aggressive
about creating future WAL segments. ISTM there is no good reason for
clients ever to have to wait for WAL segment creation --- the bgwriter,
or possibly the walwriter, ought to handle that in the background. But
we only check for the case once per checkpoint and we don't create a
segment unless there's very little space left.

regards, tom lane
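The gap Tom describes is largely one of policy: segment creation is checked once per checkpoint, and only when space is nearly exhausted. A minimal sketch of a more aggressive rule that a background process could evaluate on every cycle, assuming segments are identified by increasing numbers. `segments_to_precreate` and `target_spare` are hypothetical names, not existing PostgreSQL code:

```c
#include <stdint.h>

/* Hypothetical policy helper for a background process (bgwriter or
 * walwriter) to evaluate on every cycle, not just once per checkpoint.
 *   insert_seg        - segment currently receiving WAL inserts
 *   last_existing_seg - highest-numbered segment already on disk
 *                       (assumed >= insert_seg)
 *   target_spare      - how many ready segments to keep ahead
 * Returns how many new segments should be created now. */
uint64_t segments_to_precreate(uint64_t insert_seg, uint64_t last_existing_seg,
                               uint64_t target_spare)
{
    /* Segments already prepared beyond the one being written. */
    uint64_t spare = last_existing_seg > insert_seg
                         ? last_existing_seg - insert_seg
                         : 0;
    return spare >= target_spare ? 0 : target_spare - spare;
}
```

Run from a background loop, this keeps a fixed cushion of ready segments, so a client backend only has to create one itself if the cushion runs out. With `target_spare = 1` it reduces to Andrew Sullivan's suggestion in this thread of always keeping one extra segment around.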
Re: [HACKERS] PANIC caused by open_sync on Linux
On 10/26/07, Tom Lane [EMAIL PROTECTED] wrote:
> This ties into a loose end we didn't get to yet: being more aggressive
> about creating future WAL segments. ISTM there is no good reason for
> clients ever to have to wait for WAL segment creation --- the bgwriter,
> or possibly the walwriter, ought to handle that in the background.

Agreed.

--
Jonah H. Harris, Sr. Software Architect | phone: 732.331.1324
EnterpriseDB Corporation                | fax: 732.331.1301
499 Thornall Street, 2nd Floor          | [EMAIL PROTECTED]
Edison, NJ 08837                        | http://www.enterprisedb.com/
Re: [HACKERS] PANIC caused by open_sync on Linux
On Fri, Oct 26, 2007 at 08:34:49AM -0400, Tom Lane wrote:
> we only check for the case once per checkpoint and we don't create a
> segment unless there's very little space left.

Sort of a filthy hack, but what about always having an _extra_ segment
around? The bgwriter could do that, no?

A

--
Andrew Sullivan | [EMAIL PROTECTED]
Re: [HACKERS] PANIC caused by open_sync on Linux
On Fri, 26 Oct 2007, Andrew Sullivan wrote:

> Sort of a filthy hack, but what about always having an _extra_ segment
> around? The bgwriter could do that, no?

Now it could. The bgwriter in <=8.2 stops executing when there's a
checkpoint going on, and needing more WAL segments because a checkpoint
is taking too long is one of the major failure cases where proactively
creating additional segments would be most helpful. The 8.3 bgwriter
keeps running even during checkpoints, so it's feasible to add such a
feature now. But that only became true well into the 8.3 feature freeze,
after some changes Heikki made just before the load distributed
checkpoint patch was committed. Before that, it was hard to implement
this feature; afterwards, it was too late to fit the change into the 8.3
release. Should be easy enough to add to 8.4 one day.

--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD
Re: [HACKERS] PANIC caused by open_sync on Linux
Greg Smith [EMAIL PROTECTED] writes:
> The 8.3 bgwriter keeps running even during checkpoints, so it's
> feasible to add such a feature now.

I wonder though whether the walwriter wouldn't be a better place for it.

regards, tom lane
Re: [HACKERS] PANIC caused by open_sync on Linux
On Fri, 26 Oct 2007, Tom Lane wrote:

>> The 8.3 bgwriter keeps running even during checkpoints, so it's
>> feasible to add such a feature now.
>
> I wonder though whether the walwriter wouldn't be a better place for it.

I do, too, but that wasn't available until too late in the 8.3 cycle to
consider adding this feature there either. There are a couple of potential
to-do list ideas that build on the changes in this area in 8.3:

-Aggressively pre-allocate WAL segments
-Space out checkpoint fsync requests in addition to disk writes
-Consider re-inserting a smarter bgwriter all-scan that writes sorted by
 usage count during idle periods

--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD
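The second to-do item, spacing out checkpoint fsync requests, can be illustrated with a simple even-spacing scheme. This is a minimal sketch, not the checkpoint code's actual logic: it assumes the pending requests are already available as open file descriptors and that a fixed time budget is given. `fsync_slot_ns` and `spaced_fsync` are hypothetical helpers:

```c
#define _XOPEN_SOURCE 700
#include <stddef.h>
#include <stdint.h>
#include <time.h>
#include <unistd.h>

/* Split the time budget into n equal slots, one fsync() per slot. */
uint64_t fsync_slot_ns(uint64_t budget_ns, size_t n)
{
    return n == 0 ? 0 : budget_ns / n;
}

/* Issue fsync() on each descriptor, sleeping one slot between calls instead
 * of firing them back to back. Returns the number of failed fsync() calls. */
int spaced_fsync(const int *fds, size_t n, uint64_t budget_ns)
{
    uint64_t slot = fsync_slot_ns(budget_ns, n);
    int failures = 0;

    for (size_t i = 0; i < n; i++) {
        if (fsync(fds[i]) != 0)
            failures++;
        /* No sleep after the last request. */
        if (i + 1 < n && slot > 0) {
            struct timespec ts;
            ts.tv_sec = (time_t)(slot / 1000000000u);
            ts.tv_nsec = (long)(slot % 1000000000u);
            nanosleep(&ts, NULL); /* an interrupted sleep is simply cut short */
        }
    }
    return failures;
}
```

Even spacing is only the simplest choice; a real implementation would more plausibly pace the fsyncs against overall checkpoint progress, in the spirit of the 8.3 load distributed checkpoint changes for writes.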
Re: [HACKERS] PANIC caused by open_sync on Linux
On Fri, 26 Oct 2007, ITAGAKI Takahiro wrote:

> My nearby Linux guy says mixed usage of buffered I/O and direct I/O
> could cause errors (EIO) on many versions of the Linux kernel.

I'd be curious to get some more information about this--specifically
which versions have the problems. I'd heard about some weird bugs in the
sync write code in versions between RHEL 4 (2.6.9) and 5 (2.6.18), but I
wasn't aware of anything wrong with those two stable ones in this area.
I have a RHEL 5 system here; I'll see if I can replicate this EIO error.

> Mixed usage of buffered and direct I/O is legal, but imposes complexity
> on kernels. If we simplify it, things would be more relaxed. For example,
> dropping zero-filling and using only direct I/O. Is it possible?

It's possible, but performance suffers considerably. I played around with
this at one point when looking into doing all database writes as sync
writes. Having to wait until the entire 16MB WAL segment made its way to
disk before more WAL could be written can cause a nasty pause in activity,
even with direct I/O sync writes. Even the current buffered zero-filled
write of that size can be a bit of a drag on performance for the clients
that get caught behind it; making it any sort of sync write will be far
worse.

--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD
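To make the cost Greg describes concrete, here is a minimal POSIX sketch of creating a 16MB segment the buffered way (zero-fill through the page cache, one `fsync()` at the end) and then reopening it for synchronous WAL writes with `O_SYNC`, roughly the situation under `wal_sync_method = open_sync`. The function names, the 8kB chunk size and the error handling are assumptions for illustration; this is not PostgreSQL's segment-creation code. `O_DIRECT` is deliberately left out because it is Linux-specific and requires suitably aligned buffers:

```c
#define _XOPEN_SOURCE 700
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* 16MB, the WAL segment size mentioned in the thread. */
#define SEGMENT_SIZE (16 * 1024 * 1024)
/* Hypothetical write granularity for the zero-fill. */
#define CHUNK_SIZE 8192

/* Create the segment with ordinary buffered writes of zeros, then force it
 * to disk once with fsync(). Returns 0 on success, -1 on any error. */
int zero_fill_segment(const char *path)
{
    char zeros[CHUNK_SIZE] = {0};

    int fd = open(path, O_WRONLY | O_CREAT | O_EXCL, 0600);
    if (fd < 0)
        return -1;

    for (long done = 0; done < SEGMENT_SIZE; done += CHUNK_SIZE) {
        /* A short write is treated as an error to keep the sketch simple. */
        if (write(fd, zeros, CHUNK_SIZE) != CHUNK_SIZE) {
            close(fd);
            return -1;
        }
    }

    if (fsync(fd) != 0) {
        close(fd);
        return -1;
    }
    return close(fd);
}

/* Reopen the segment for WAL writes with O_SYNC: each later write()
 * returns only after the data has reached stable storage. */
int open_segment_sync(const char *path)
{
    return open(path, O_WRONLY | O_SYNC);
}
```

Every write on the descriptor returned by `open_segment_sync()` blocks until the data is on disk, so doing the 16MB zero-fill through that descriptor would make the creating client wait on the disk for every chunk rather than once at the final `fsync()`.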