Re: [HACKERS] Use of sync() [was Re: Potential Large Performance Gain in WAL synching]

2002-10-07 Thread Doug McNaught

Tom Lane [EMAIL PROTECTED] writes:

 Doug McNaught [EMAIL PROTECTED] writes:

  In my understanding, it means all currently dirty blocks in the file
  cache are queued to the disk driver.  The queued writes will
  eventually complete, but not necessarily before sync() returns.  I
  don't think subsequent write()s will block, unless the system is low
  on buffers and has to wait until dirty blocks are freed by the driver.
 
 We don't need later write()s to block.  We only need them to not hit
 disk before the sync-queued writes hit disk.  So I guess the question
 boils down to what "queued to the disk driver" means --- has the order
 of writes been determined at that point?

It's certainly possible that new write(s) get put into the queue
alongside old ones--I think the Linux block layer tries to do this
when it can, for one.  According to the manpage, Linux used to wait
until everything was written to return from sync(), though I don't
*think* it does anymore.  But that's not mandated by the specs.

So I don't think we can rely on such behavior (not reordering writes
across a sync()), though it will probably happen in practice a lot of
the time.  AFAIK there isn't anything better than sync() + sleep() as
far as the specs go.  Yes, it kinda sucks.  ;)

-Doug

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]



[HACKERS] Use of sync() [was Re: Potential Large Performance Gain in WAL synching]

2002-10-05 Thread Mats Lofkvist

[EMAIL PROTECTED] (Tom Lane) writes:

[snip]
 On a filesystem that does have that kind of problem, can't you avoid it
 just by using O_DSYNC on the WAL files?  Then there's no need to call
 fsync() at all, except during checkpoints (which actually issue sync()
 not fsync(), anyway).

This comment on using sync() instead of fsync() makes me slightly
worried since sync() doesn't in any way guarantee that all data is
written immediately. E.g. on *BSD with softupdates, it doesn't even
guarantee that data is written within some deterministic time as
far as I know (*).

With a quick check of the code I found

/*
 *  mdsync() -- Sync storage.
 *
 */
int
mdsync()
{
    sync();
    if (IsUnderPostmaster)
        sleep(2);
    sync();
    return SM_SUCCESS;
}


which is ugly (imho) even if sync() starts an immediate and complete
file system flush (which I don't think it does with softupdates).

It seems to be used only by

/*
 * FlushBufferPool
 *
 * Flush all dirty blocks in buffer pool to disk
 * at the checkpoint time
 *
 */
void
FlushBufferPool(void)
{
    BufferSync();
    smgrsync();  /* calls mdsync() */
}


so the question that remains is what kinds of guarantees
FlushBufferPool() really expects and needs from smgrsync() ?

If smgrsync() is called to make up for lack of fsync() calls
in BufferSync(), I'm getting really worried :-)

  _
Mats Lofkvist
[EMAIL PROTECTED]


(*) See for example
http://groups.google.com/groups?th=bfc8a0dc5373ed6e




Re: [HACKERS] Use of sync() [was Re: Potential Large Performance Gain in WAL synching]

2002-10-05 Thread Tom Lane

Mats Lofkvist [EMAIL PROTECTED] writes:
 [ mdsync is ugly and not completely reliable ]

Yup, it is.  Do you have a better solution?

fsync is not the answer, since the checkpoint process has no way to know
what files may have been touched since the last checkpoint ... and even
if it could find that out, a string of retail fsync calls would kill
performance, cf. Curtis Faith's complaint.
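
To make the "string of retail fsync calls" concrete, here is a rough sketch of what that alternative would look like if the checkpoint process did somehow have a list of touched files. The function fsync_file_list and its path-list argument are hypothetical and not part of the backend. The sketch also assumes that fsync() on a freshly opened descriptor flushes dirty data written through other descriptors for the same file, which is common in practice but worth checking per platform.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Hypothetical sketch: fsync() every file in a caller-supplied list.
 * Each call may block until that file's dirty blocks are on disk, which
 * is where the performance concern comes from.
 * Returns the number of files that could not be opened or synced.
 */
int
fsync_file_list(const char *const *paths, size_t npaths)
{
    int     failures = 0;
    size_t  i;

    assert(paths != NULL || npaths == 0);

    for (i = 0; i < npaths; i++)
    {
        int fd = open(paths[i], O_RDWR);

        if (fd < 0)
        {
            failures++;
            continue;
        }
        if (fsync(fd) != 0)
            failures++;
        close(fd);
    }
    return failures;
}
```

The missing piece is exactly the one described above: nothing in this sketch says where the list of paths would come from.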

In practice I am not sure there is a problem.  The local man page for
sync() says

 The writing, although scheduled, is not necessarily complete upon
 return from sync.

Now if "scheduled" means "will occur before any subsequently-commanded
write occurs" then we're fine.  I don't know if that's true though ...

regards, tom lane




Re: [HACKERS] Use of sync() [was Re: Potential Large Performance Gain in WAL synching]

2002-10-05 Thread Doug McNaught

Tom Lane [EMAIL PROTECTED] writes:

 In practice I am not sure there is a problem.  The local man page for
 sync() says
 
  The writing, although scheduled, is not necessarily complete upon
  return from sync.
 
 Now if "scheduled" means "will occur before any subsequently-commanded
 write occurs" then we're fine.  I don't know if that's true though ...

In my understanding, it means all currently dirty blocks in the file
cache are queued to the disk driver.  The queued writes will
eventually complete, but not necessarily before sync() returns.  I
don't think subsequent write()s will block, unless the system is low
on buffers and has to wait until dirty blocks are freed by the driver.

-Doug




Re: [HACKERS] Use of sync() [was Re: Potential Large Performance Gain in WAL synching]

2002-10-05 Thread Tom Lane

Doug McNaught [EMAIL PROTECTED] writes:
 Tom Lane [EMAIL PROTECTED] writes:
  In practice I am not sure there is a problem.  The local man page for
  sync() says
  
   The writing, although scheduled, is not necessarily complete upon
   return from sync.
  
  Now if "scheduled" means "will occur before any subsequently-commanded
  write occurs" then we're fine.  I don't know if that's true though ...

 In my understanding, it means all currently dirty blocks in the file
 cache are queued to the disk driver.  The queued writes will
 eventually complete, but not necessarily before sync() returns.  I
 don't think subsequent write()s will block, unless the system is low
 on buffers and has to wait until dirty blocks are freed by the driver.

We don't need later write()s to block.  We only need them to not hit
disk before the sync-queued writes hit disk.  So I guess the question
boils down to what "queued to the disk driver" means --- has the order
of writes been determined at that point?

regards, tom lane
