Re: [HACKERS] PANIC caused by open_sync on Linux

2008-03-24 Thread Bruce Momjian

Added to TODO:

* Be more aggressive about creating WAL files

  http://archives.postgresql.org/pgsql-hackers/2007-10/msg01325.php


---

Tom Lane wrote:
 Greg Smith [EMAIL PROTECTED] writes:
  On Fri, 26 Oct 2007, ITAGAKI Takahiro wrote:
  Mixed usage of buffered and direct i/o is legal, but imposes complexity
  on kernels. If we simplify it, things would be more relaxed. For
  example, dropping zero-filling and only using direct i/o. Is it possible?
 
  It's possible, but performance suffers considerably.  I played around with 
  this at one point when looking into doing all database writes as sync 
  writes.  Having to wait until the entire 16MB WAL segment made its way to 
  disk before more WAL could be written can cause a nasty pause in activity, 
  even with direct I/O sync writes.  Even the current buffered zero-filled 
  write of that size can be a bit of a drag on performance for the clients 
  that get caught behind it; making it any sort of sync write will be far
  worse.
 
 This ties into a loose end we didn't get to yet: being more aggressive
 about creating future WAL segments.  ISTM there is no good reason for
 clients ever to have to wait for WAL segment creation --- the bgwriter,
 or possibly the walwriter, ought to handle that in the background.  But
 we only check for the case once per checkpoint and we don't create a
 segment unless there's very little space left.
 
   regards, tom lane
 
 ---(end of broadcast)---
 TIP 1: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your
message can get through to the mailing list cleanly

-- 
  Bruce Momjian  [EMAIL PROTECTED]        http://momjian.us
  EnterpriseDB http://postgres.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PANIC caused by open_sync on Linux

2007-10-29 Thread Andrew Sullivan
On Fri, Oct 26, 2007 at 10:39:12PM -0400, Greg Smith wrote:
 There's a couple of potential to-do list ideas that build on the changes 
 in this area in 8.3:

I think that's the right way to go.  It's too bad that this may still
happen in 8.3, but we're way past the point that this is a bug fix,
IMO.

A

-- 
Andrew Sullivan  | [EMAIL PROTECTED]
The plural of anecdote is not data.
--Roger Brinner



Re: [HACKERS] PANIC caused by open_sync on Linux

2007-10-28 Thread ITAGAKI Takahiro

Greg Smith [EMAIL PROTECTED] wrote:

 There's a couple of potential to-do list ideas that build on the changes 
 in this area in 8.3:
 
 -Aggressively pre-allocate WAL segments 
 -Space out checkpoint fsync requests in addition to disk writes
 -Consider re-inserting a smarter bgwriter all-scan that writes sorted by 
 usage count during idle periods

I'd like to add:
- Remove filling with zero before we recycle WAL segments.

If it is not needed, we can avoid buffered i/o on open_sync except for the
first allocation of segments. I think we can do it if we have more
robust WAL records that can ignore garbage data written before.
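
To make the "records that can ignore garbage" idea concrete, here is a minimal sketch of a reader that stops at the first record whose CRC or expected position does not match. The header layout, the names encode_record and read_valid_records, and the segment_base parameter are all assumptions made for illustration; this is not PostgreSQL's actual WAL record format.

```python
import struct
import zlib

# Hypothetical header: payload length, absolute record position, CRC32.
HEADER = struct.Struct("<IQI")


def _crc(length, position, payload):
    # The CRC covers the length and position as well as the payload.
    return zlib.crc32(struct.pack("<IQ", length, position) + payload)


def encode_record(position, payload):
    """Serialize one record that claims to live at absolute `position`."""
    return HEADER.pack(len(payload), position, _crc(len(payload), position, payload)) + payload


def read_valid_records(buf, segment_base):
    """Return payloads up to the first record that fails validation.

    Leftover bytes from a recycled segment fail either the position check
    (they carry the old segment's positions) or the CRC check.
    """
    records = []
    offset = 0
    while offset + HEADER.size <= len(buf):
        length, position, crc = HEADER.unpack_from(buf, offset)
        end = offset + HEADER.size + length
        if length == 0 or end > len(buf) or position != segment_base + offset:
            break
        payload = buf[offset + HEADER.size:end]
        if _crc(length, position, payload) != crc:
            break
        records.append(payload)
        offset = end
    return records
```

With a check like this, a segment reused without zero-filling still reads correctly, because the stale tail is rejected rather than replayed.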

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center





Re: [HACKERS] PANIC caused by open_sync on Linux

2007-10-28 Thread Tom Lane
ITAGAKI Takahiro [EMAIL PROTECTED] writes:
 I'd like to add:
 - Remove filling with zero before we recycle WAL segments.

Huh?  We have never done that.

regards, tom lane



Re: [HACKERS] PANIC caused by open_sync on Linux

2007-10-28 Thread ITAGAKI Takahiro

Tom Lane [EMAIL PROTECTED] wrote:

 ITAGAKI Takahiro [EMAIL PROTECTED] writes:
  I'd like to add:
  - Remove filling with zero before we recycle WAL segments.
 
 Huh?  We have never done that.

Oh, sorry. I misread the code.

I could avoid the PANIC if I had enough segments at startup.
I'll test the configuration.

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center





Re: [HACKERS] PANIC caused by open_sync on Linux

2007-10-26 Thread Tom Lane
Greg Smith [EMAIL PROTECTED] writes:
 On Fri, 26 Oct 2007, ITAGAKI Takahiro wrote:
 Mixed usage of buffered and direct i/o is legal, but imposes complexity
 on kernels. If we simplify it, things would be more relaxed. For
 example, dropping zero-filling and only using direct i/o. Is it possible?

 It's possible, but performance suffers considerably.  I played around with 
 this at one point when looking into doing all database writes as sync 
 writes.  Having to wait until the entire 16MB WAL segment made its way to 
 disk before more WAL could be written can cause a nasty pause in activity, 
 even with direct I/O sync writes.  Even the current buffered zero-filled 
 write of that size can be a bit of a drag on performance for the clients 
 that get caught behind it; making it any sort of sync write will be far
 worse.

This ties into a loose end we didn't get to yet: being more aggressive
about creating future WAL segments.  ISTM there is no good reason for
clients ever to have to wait for WAL segment creation --- the bgwriter,
or possibly the walwriter, ought to handle that in the background.  But
we only check for the case once per checkpoint and we don't create a
segment unless there's very little space left.
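
A rough sketch of the difference between the two policies, purely for illustration. The function names, the threshold_fraction value, and the lookahead parameter are assumptions, not what the PostgreSQL source does.

```python
SEGMENT_SIZE = 16 * 1024 * 1024  # 16MB WAL segment, as discussed in this thread


def should_precreate_conservative(bytes_used_in_current, threshold_fraction=0.75):
    """Checkpoint-time check: create the next segment only when the
    current one is nearly full. The 0.75 threshold is a made-up value."""
    return bytes_used_in_current >= threshold_fraction * SEGMENT_SIZE


def segments_to_precreate(current_segno, existing_segnos, lookahead=2):
    """Aggressive check, meant to run often in a background process:
    list every segment number within `lookahead` that does not exist yet."""
    wanted = range(current_segno + 1, current_segno + 1 + lookahead)
    return [segno for segno in wanted if segno not in existing_segnos]
```

The first function can return False right up until a client is about to run out of space; the second keeps reporting missing future segments regardless of how full the current one is, so the creation cost lands on the background process.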

regards, tom lane



Re: [HACKERS] PANIC caused by open_sync on Linux

2007-10-26 Thread Jonah H. Harris
On 10/26/07, Tom Lane [EMAIL PROTECTED] wrote:
 This ties into a loose end we didn't get to yet: being more aggressive
 about creating future WAL segments.  ISTM there is no good reason for
 clients ever to have to wait for WAL segment creation --- the bgwriter,
 or possibly the walwriter, ought to handle that in the background.

Agreed.

-- 
Jonah H. Harris, Sr. Software Architect | phone: 732.331.1324
EnterpriseDB Corporation                | fax: 732.331.1301
499 Thornall Street, 2nd Floor          | [EMAIL PROTECTED]
Edison, NJ 08837                        | http://www.enterprisedb.com/



Re: [HACKERS] PANIC caused by open_sync on Linux

2007-10-26 Thread Andrew Sullivan
On Fri, Oct 26, 2007 at 08:34:49AM -0400, Tom Lane wrote:
 we only check for the case once per checkpoint and we don't create a
 segment unless there's very little space left.

Sort of a filthy hack, but what about always having an _extra_
segment around?  The bgwriter could do that, no?
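
Something like the following is what "always keep an extra segment around" could look like, as a minimal sketch. The function name ensure_spare_segments, the 8-hex-digit file naming, and the ".tmp" suffix are assumptions for illustration only; real WAL file names and creation logic differ.

```python
import os


def ensure_spare_segments(wal_dir, current_segno, spares=1,
                          segment_size=16 * 1024 * 1024):
    """Create any missing zero-filled segment files just past current_segno.

    Returns the list of paths created on this call.
    """
    created = []
    for segno in range(current_segno + 1, current_segno + 1 + spares):
        path = os.path.join(wal_dir, f"{segno:08X}")  # hypothetical naming
        if os.path.exists(path):
            continue
        tmp_path = path + ".tmp"
        with open(tmp_path, "wb") as f:
            f.write(b"\0" * segment_size)  # buffered zero-fill
            f.flush()
            os.fsync(f.fileno())           # one sync at the end
        # Rename only after the data is on disk, so a half-written file
        # is never mistaken for a usable segment.
        os.rename(tmp_path, path)
        created.append(path)
    return created
```

A background process would call this on every loop iteration; when the spare already exists the call is just a cheap existence check.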

A

-- 
Andrew Sullivan  | [EMAIL PROTECTED]



Re: [HACKERS] PANIC caused by open_sync on Linux

2007-10-26 Thread Greg Smith

On Fri, 26 Oct 2007, Andrew Sullivan wrote:


Sort of a filthy hack, but what about always having an _extra_
segment around?  The bgwriter could do that, no?


Now it could.  The bgwriter in <=8.2 stops executing when there's a 
checkpoint going on, and needing more WAL segments because a checkpoint is 
taking too long is one of the major failure cases where proactively 
creating additional segments would be most helpful.


The 8.3 bgwriter keeps running even during checkpoints, so it's feasible 
to add such a feature now.  But that only became true well into the 8.3 
feature freeze, after some changes Heikki made just before the load 
distributed checkpoint patch was committed.  Before that, it was hard to 
implement this feature; afterwards, it was too late to fit the change into 
the 8.3 release.  Should be easy enough to add to 8.4 one day.


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD



Re: [HACKERS] PANIC caused by open_sync on Linux

2007-10-26 Thread Tom Lane
Greg Smith [EMAIL PROTECTED] writes:
 The 8.3 bgwriter keeps running even during checkpoints, so it's feasible 
 to add such a feature now.

I wonder though whether the walwriter wouldn't be a better place for it.

regards, tom lane



Re: [HACKERS] PANIC caused by open_sync on Linux

2007-10-26 Thread Greg Smith

On Fri, 26 Oct 2007, Tom Lane wrote:


The 8.3 bgwriter keeps running even during checkpoints, so it's feasible
to add such a feature now.

I wonder though whether the walwriter wouldn't be a better place for it.


I do, too, but that wasn't available until too late in the 8.3 cycle to 
consider adding this feature there either.


There's a couple of potential to-do list ideas that build on the changes 
in this area in 8.3:


-Aggressively pre-allocate WAL segments 
-Space out checkpoint fsync requests in addition to disk writes
-Consider re-inserting a smarter bgwriter all-scan that writes sorted by 
usage count during idle periods


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD



Re: [HACKERS] PANIC caused by open_sync on Linux

2007-10-25 Thread Greg Smith

On Fri, 26 Oct 2007, ITAGAKI Takahiro wrote:

My nearby Linux guy says mixed usage of buffered I/O and direct I/O 
could cause errors (EIO) on many versions of the Linux kernel.


I'd be curious to get some more information about this--specifically which 
versions have the problems.  I'd heard about some weird bugs in the sync 
write code in versions between RHEL 4 (2.6.9) and 5 (2.6.18), but I wasn't 
aware of anything wrong with those two stable ones in this area.  I have a 
RHEL 5 system here, will see if I can replicate this EIO error.


Mixed usage of buffered and direct i/o is legal, but imposes complexity
on kernels. If we simplify it, things would be more relaxed. For
example, dropping zero-filling and only using direct i/o. Is it possible?


It's possible, but performance suffers considerably.  I played around with 
this at one point when looking into doing all database writes as sync 
writes.  Having to wait until the entire 16MB WAL segment made its way to 
disk before more WAL could be written can cause a nasty pause in activity, 
even with direct I/O sync writes.  Even the current buffered zero-filled 
write of that size can be a bit of a drag on performance for the clients 
that get caught behind it; making it any sort of sync write will be far 
worse.
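
For anyone who wants to try the comparison themselves, here is a minimal sketch of the two ways of zero-filling a segment: buffered writes followed by a single fsync, versus opening the file with O_SYNC so that every write waits for the disk. The function name zero_fill and its parameters are assumptions for illustration; it does not use O_DIRECT, which would additionally need aligned buffers.

```python
import os


def zero_fill(path, size, block_size=8192, sync_each_write=False):
    """Zero-fill a new file of `size` bytes and return the number of writes.

    sync_each_write=False: buffered writes, then one fsync at the end.
    sync_each_write=True:  open with O_SYNC where the platform has it, so
                           each write blocks until the data is on disk.
    """
    flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC
    if sync_each_write:
        flags |= getattr(os, "O_SYNC", 0)  # O_SYNC is not defined everywhere
    fd = os.open(path, flags, 0o600)
    block = bytes(block_size)
    writes = 0
    try:
        remaining = size
        while remaining > 0:
            written = os.write(fd, block[:min(block_size, remaining)])
            remaining -= written
            writes += 1
        os.fsync(fd)
    finally:
        os.close(fd)
    return writes
```

Timing zero_fill(path, 16 * 1024 * 1024) with and without sync_each_write on the target disk gives a feel for how long a client would be stalled behind the segment creation in each case.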


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD
