* Win32, with fsync, write-cache disabled: no data corruption
* Win32, with fsync, write-cache enabled: no data corruption
* Win32, with osync, write cache disabled: no data corruption
* Win32, with osync, write cache enabled: no data corruption. Once I
got:
Magnus Hagander wrote:
This indicated to me that open_sync did not require any
changes beyond our current fsync.
fsync and open_sync both write through the write cache in the operating
system. Only fsync=off turns this off.
fsync also writes through the hardware write cache.
Bruce Momjian pgman@candle.pha.pa.us writes:
However, I do prefer this patch and let Win32 have the same write cache
issues as Unix, for consistency.
I agree that the open flag is more nearly O_DSYNC than O_SYNC.
ISTM Windows' idea of fsync is quite different from Unix's and therefore
we
Tom Lane wrote:
Bruce Momjian pgman@candle.pha.pa.us writes:
However, I do prefer this patch and let Win32 have the same write cache
issues as Unix, for consistency.
I agree that the open flag is more nearly O_DSYNC than O_SYNC.
ISTM Windows' idea of fsync is quite different from Unix's
Bruce Momjian pgman@candle.pha.pa.us writes:
Tom Lane wrote:
we should name the wal_sync_method that invokes it something different
than fsync. write_through or some such? We already have precedent
that not all wal_sync_method values are available on all platforms.
Yes, I am thinking that
, 2005 10:53 AM
To: Tom Lane
Cc: Magnus Hagander; Michael Paesold; pgsql-hackers@postgresql.org;
[EMAIL PROTECTED]; Merlin Moncure
Subject: Re: [pgsql-hackers-win32] [HACKERS] win32 performance - fsync
question
Tom Lane wrote:
Bruce Momjian pgman@candle.pha.pa.us writes:
However, I do prefer
Tom Lane wrote:
Bruce Momjian pgman@candle.pha.pa.us writes:
Tom Lane wrote:
ISTM Windows' idea of fsync is quite different from Unix's and therefore
we should name the wal_sync_method that invokes it something different
than fsync. write_through or some such?
Ah, I remember now. On
Bruce Momjian pgman@candle.pha.pa.us writes:
Tom Lane wrote:
ISTM Windows' idea of fsync is quite different from Unix's and therefore
we should name the wal_sync_method that invokes it something different
than fsync. write_through or some such?
Ah, I remember now. On Win32 our fsync is:
Michael Paesold wrote:
Magnus Hagander wrote:
Magnus Hagander wrote:
Magnus prepared a trivial patch which added the O_SYNC flag
for windows and mapped it to FILE_FLAG_WRITE_THROUGH in
win32_open.c.
[snip]
Michael Paesold wrote:
The original patch did not have any
Bruce Momjian wrote:
Michael Paesold wrote:
Magnus Hagander wrote:
[snip]
Michael, I am not sure why you come to the conclusion that open_sync
requires turning off the disk write cache. I saw nothing to indicate
that in the thread:
I was just seeing his error message below...
-Original Message-
From: [EMAIL PROTECTED] on behalf of Bruce Momjian
Sent: Sun 2/27/2005 12:54 AM
To: Magnus Hagander
Cc: Tom Lane; pgsql-hackers@postgresql.org; [EMAIL PROTECTED]; Merlin Moncure
Subject: Re: [pgsql-hackers-win32] [HACKERS] win32 performance - fsync question
Patch
Patch applied. Thanks.
I assume this is not appropriate for 8.0.X.
---
Magnus Hagander wrote:
Magnus prepared a trivial patch which added the O_SYNC flag
for windows and mapped it to FILE_FLAG_WRITE_THROUGH in
win32_open.c.
Bruce Momjian wrote:
Patch applied. Thanks.
I assume this is not appropriate for 8.0.X.
---
Magnus Hagander wrote:
Magnus prepared a trivial patch which added the O_SYNC flag
for windows and mapped it to FILE_FLAG_WRITE_THROUGH in
Magnus Hagander wrote:
Magnus Hagander wrote:
Magnus prepared a trivial patch which added the O_SYNC flag
for windows and mapped it to FILE_FLAG_WRITE_THROUGH in
win32_open.c.
[snip]
Michael Paesold wrote:
The original patch did not have any documentation. Have you
added some? Since this has
Moncure
Subject: Re: [pgsql-hackers-win32] [HACKERS] win32 performance
- fsync question
Patch applied. Thanks.
I assume this is not appropriate for 8.0.X.
---
Magnus Hagander wrote:
Magnus prepared a trivial patch which
Patch applied. Thanks.
I assume this is not appropriate for 8.0.X.
---
Magnus Hagander wrote:
Magnus prepared a trivial patch which added the O_SYNC flag
for windows and mapped it to FILE_FLAG_WRITE_THROUGH in
Are you verifying that all the data that was committed was actually stored?
Or just verifying that the database works properly after rebooting?
I verified the data.
Does pg startup increase the xid by some amount (say 1000 xids) after crash ?
Else I think you would also need to rollback a
Magnus prepared a trivial patch which added the O_SYNC flag for
windows and mapped it to FILE_FLAG_WRITE_THROUGH in win32_open.c.
Attached is this trivial patch. As Merlin says, it needs some
more reliability testing. But the numbers are at least reasonable - it
*seems* like it's doing
In the final test, the BIOS decided the disk was giving up and
reassigned it as 0Mb.. Required two extra cold boots, then it was back
up to 20Gb. Still no data loss.
I think it would be fun to re-run these tests with MySQL...
Chris
---(end of
My results are:
First, baseline:
* Linux, with fsync (default), write-cache disabled: no data corruption
* Linux, with fsync (default), write-cache enabled: usually no data
corruption, but two runs which had
* Win32, with fsync, write-cache disabled: no data corruption
* Win32, with fsync,
Magnus Hagander [EMAIL PROTECTED] writes:
My results are:
First, baseline:
* Linux, with fsync (default), write-cache disabled: no data corruption
* Linux, with fsync (default), write-cache enabled: usually no data
corruption, but two runs which had
That makes sense.
* Win32, with fsync,
* Win32, with fsync, write-cache disabled: no data corruption
* Win32, with fsync, write-cache enabled: no data corruption
* Win32, with osync, write cache disabled: no data corruption
* Win32, with osync, write cache enabled: no data corruption. Once I
got:
2005-02-24 12:19:54 LOG:
Magnus Hagander [EMAIL PROTECTED] writes:
* Linux, with fsync (default), write-cache enabled: usually no data
corruption, but two runs which had
Are you verifying that all the data that was committed was actually stored? Or
just verifying that the database works properly after rebooting?
I'm
Greg Stark [EMAIL PROTECTED] writes:
I'm a bit surprised that the write-cache led to a corrupt database, and not
merely lost transactions. I had the impression that drives still handled the
writes in the order received.
There'd be little point in having a cache if they did, I should think.
I
Tom Lane [EMAIL PROTECTED] writes:
Greg Stark [EMAIL PROTECTED] writes:
I'm a bit surprised that the write-cache led to a corrupt database, and not
merely lost transactions. I had the impression that drives still handled the
writes in the order received.
There'd be little point in
You may find, if you check this case again, that the "usually no data
corruption" is actually "usually lost transactions but no corruption".
That's a good point, but it seems difficult to be sure of the last
reportedly-committed transaction in a powerfail situation. Maybe if
you drive the
* Linux, with fsync (default), write-cache enabled: usually no data
corruption, but two runs which had
Are you verifying that all the data that was committed was actually stored?
Or just verifying that the database works properly after rebooting?
I verified the data.
I'm a bit surprised
Magnus Hagander [EMAIL PROTECTED] writes:
I'm a bit surprised that the write-cache led to a corrupt database, and
not merely lost transactions. I had the impression that drives still
handled the writes in the order received.
In this case, it was lost transactions, not data corruption.
One point that I no longer recall the reasoning behind is that xlog.c
doesn't think O_SYNC is a preferable default over fsync.
For larger (8k) transactions O_SYNC|O_DIRECT is only good with the recent
pending patch to group WAL writes together. The fsync method gives the OS a
chance
On win32 (which started this discussion), fsync will sync the directory
entry as well, which will lead to *at least* two seeks on the disk.
Writing two blocks after each other to an O_SYNC opened file should give
exactly two seeks.
I think you are making the following not maintainable
Magnus prepared a trivial patch which added the O_SYNC flag
for windows and mapped it to FILE_FLAG_WRITE_THROUGH in
win32_open.c.
Attached is this trivial patch. As Merlin says, it needs some more
reliability testing. But the numbers are at least reasonable - it
*seems* like it's doing the
Portability, or rather the complete lack of it. Stuff that isn't in the
Single Unix Spec is a hard sell.
O_DIRECT is reasonably common among modern Unixen (it is supported by
Linux, FreeBSD, and probably a couple of the commercial variants like
AIX or IRIX); it should also be reasonably
One point that I no longer recall the reasoning behind is that xlog.c
doesn't think O_SYNC is a preferable default over fsync.
For larger (8k) transactions O_SYNC|O_DIRECT is only good with the recent
pending patch to group WAL writes together. The fsync method gives the OS a
chance to do
Tom Lane wrote:
Portability, or rather the complete lack of it. Stuff that isn't in the
Single Unix Spec is a hard sell.
O_DIRECT is reasonably common among modern Unixen (it is supported by
Linux, FreeBSD, and probably a couple of the commercial variants like
AIX or IRIX); it should also be
One point that I no longer recall the reasoning behind is that xlog.c
doesn't think O_SYNC is a preferable default over fsync.
For larger (8k) transactions O_SYNC|O_DIRECT is only good with the recent
pending patch to group WAL writes together. The fsync method gives the OS a
chance to do
Magnus prepared a trivial patch which added the O_SYNC flag for windows
and mapped it to FILE_FLAG_WRITE_THROUGH in win32_open.c. We pg_benched
it and here are the results of our test on my WinXP workstation on a 10k
raptor:
Settings were pgbench -t 100 -c 10.
fsync = off:
~ 280 tps
fsync on,
Magnus prepared a trivial patch which added the O_SYNC flag
for windows and mapped it to FILE_FLAG_WRITE_THROUGH in
win32_open.c.
Attached is this trivial patch. As Merlin says, it needs some more
reliability testing. But the numbers are at least reasonable - it
*seems* like it's doing the
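The idea behind the patch discussed in this thread, translating the POSIX-style O_SYNC open flag into the Win32 FILE_FLAG_WRITE_THROUGH attribute, can be sketched roughly as follows. This is not the actual win32_open.c change: the function name open_flags_to_file_attrs and the SKETCH_ constants are hypothetical, and the numeric values are copied from the Windows SDK headers only so that the sketch compiles on any platform that defines O_SYNC.

```c
#include <fcntl.h>

/* Win32 CreateFile attribute values, redefined under hypothetical
 * SKETCH_ names so this compiles without <windows.h>. */
#define SKETCH_FILE_ATTRIBUTE_NORMAL    0x00000080UL
#define SKETCH_FILE_FLAG_WRITE_THROUGH  0x80000000UL

/* Hypothetical helper: derive the flags-and-attributes word that would be
 * passed to CreateFile from a set of open() flags. */
unsigned long
open_flags_to_file_attrs(int oflags)
{
    unsigned long attrs = SKETCH_FILE_ATTRIBUTE_NORMAL;

#ifdef O_SYNC
    /* Compare against the full mask: on some systems O_SYNC is a
     * superset of the O_DSYNC bits. */
    if ((oflags & O_SYNC) == O_SYNC)
        attrs |= SKETCH_FILE_FLAG_WRITE_THROUGH;
#endif
    return attrs;
}
```

A caller would pass the result as the flags-and-attributes argument of CreateFile; whether write-through also bypasses the drive's own write cache is exactly the question the reliability tests in this thread try to answer.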
Hi,
looking for a way to increase performance on a Windows XP
box, I found the parameters
#fsync = true                   # turns forced synchronization on or off
#wal_sync_method = fsync        # the default varies across platforms:
                                # fsync,
Things worth experimenting with (these are all untested, so please
report any successes):
1) Try reformatting with a cluster size of 8Kb (the pg page size), if
you can.
What about recompiling pg with a 4k block size. Win32 file cluster
sizes and memory allocation units are both on 4k
The general question is: does PostgreSQL really need fsync? I suppose it
is a design question, not a platform-specific one. It sounds like the only
scenario in which fsync is useful is interprocess communication via an
open file. But PostgreSQL uses IPC for this, so is fsync really
On Thu, 17 Feb 2005, Magnus Hagander wrote:
Hi,
looking for a way to increase performance on a Windows XP
box, I found the parameters
#fsync = true                   # turns forced synchronization on or off
#wal_sync_method = fsync        # the default varies across platforms:
On Thu, 17 Feb 2005, Christopher Kings-Lynne wrote:
The general question is: does PostgreSQL really need fsync? I suppose it
is a design question, not a platform-specific one. It sounds like the only
scenario in which fsync is useful is interprocess communication via an
open file. But PostgreSQL
On Thu, 17 Feb 2005 17:54:38 +0300 (MSK)
E.Rodichev [EMAIL PROTECTED] wrote:
On Thu, 17 Feb 2005, Christopher Kings-Lynne wrote:
The general question is: does PostgreSQL really need fsync? I
suppose it is a design question, not a platform-specific one. It
sounds like the only scenario
E.Rodichev [EMAIL PROTECTED] writes:
On Thu, 17 Feb 2005, Christopher Kings-Lynne wrote:
Fsync is so that when your computer loses power without warning, you
will have no data loss.
If you turn it off, you run the risk of losing data if you lose power.
Chris
This problem is addressed by
E.Rodichev wrote:
This problem is addressed by file system (fsck, journalling etc.).
Is it reasonable to handle it directly within application?
In the words of the Duke of Wellington, If you believe that you'll
believe anything.
Please review past discussions on the mailing lists on this
In [EMAIL PROTECTED], on 02/17/05
at 10:21 AM, Andrew Dunstan [EMAIL PROTECTED] said:
E.Rodichev wrote:
This problem is addressed by file system (fsck, journalling etc.).
Is it reasonable to handle it directly within application?
In the words of the Duke of Wellington, If you believe
So by all means turn off fsync if you want the performance gain *and*
you accept the risk. But if you do, don't come crying later that your
data has been lost or corrupted.
(the results are interesting, though - with fsync off Windows and Linux
are in the same performance ballpark.)
Yes,
This is what we have discovered. AFAIK, all other major databases or
other similar apps (like exchange or AD) all open files with
FILE_FLAG_WRITE_THROUGH and do *not* use fsync. It might give noticeably
better performance with an O_DIRECT style WAL logging at least. But I'm
unsure if the
Magnus Hagander [EMAIL PROTECTED] writes:
This is what we have discovered. AFAIK, all other major databases or
other similar apps (like exchange or AD) all open files with
FILE_FLAG_WRITE_THROUGH and do *not* use fsync. It might give noticeably
better performance with an O_DIRECT style WAL
Things worth experimenting with (these are all untested, so please
report any successes):
1) Try reformatting with a cluster size of 8Kb (the pg page size), if
you can.
2) Disable the last access time (like noatime on linux):
   fsutil behavior set disablelastaccess 1
3) Disable 8.3 filenames
On Thu, 17 Feb 2005, Andrew Dunstan wrote:
(the results are interesting, though - with fsync off Windows and Linux are
in the same performance ballpark.)
Some addition:
WinXP fsync = true 20-28 tps
WinXP fsync = false 600 tps
Linux fsync = true 800 tps
Linux fsync = false
Doesn't Windows support O_SYNC (or even better O_DSYNC) flag
to open()?
That should be the Posixy spelling of FILE_FLAG_WRITE_THROUGH, if the
latter means what I suppose it does.
They should, but someone said it didn't work. I haven't
followed up on it, though, so it is quite possible it works.
Doesn't Windows support O_SYNC (or even better O_DSYNC) flag to
open()?
That should be the Posixy spelling of FILE_FLAG_WRITE_THROUGH, if the
latter means what I suppose it does.
They should, but someone said it didn't work. I haven't followed up on
it, though, so it is quite possible it
Magnus Hagander [EMAIL PROTECTED] writes:
Oh, and finally. The win32 commands have the following options:
FILE_FLAG_NO_BUFFERING. This disables the cache completely. It also has
lots of limits, like every read and write has to be on a sector boundary
etc. It gives great performance with async
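The sector-boundary restriction mentioned for FILE_FLAG_NO_BUFFERING (and, similarly, for O_DIRECT on Unix) can be checked before a write is issued. A minimal sketch, assuming the caller already knows the sector size; the helper name is_sector_aligned is hypothetical, and real code would query the device for its actual sector size rather than assume 512 bytes.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns 1 when the buffer address, the file offset
 * and the transfer length are all multiples of sector_size, else 0. */
int
is_sector_aligned(const void *buf, uint64_t offset, size_t len,
                  size_t sector_size)
{
    if (sector_size == 0)
        return 0;
    return ((uintptr_t) buf % sector_size == 0) &&
           (offset % sector_size == 0) &&
           (len % sector_size == 0);
}
```

WAL code that always writes whole, aligned blocks from an aligned buffer would pass such a check; anything that writes a partial block would not.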
Some addition:
WinXP fsync = true 20-28 tps
WinXP fsync = false 600 tps
Linux fsync = true 800 tps
Linux fsync = false 980 tps
Wow, that's terrible on Windows. If there's a solution, it'd be nice to
backport it...
Chris
There are two different concerns here.
1. transaction loss because of unexpected power loss and/or system failure
2. inconsistent database state
For many applications (1) is fairly acceptable, and (2) is not.
So I'd like to formulate my questions another way.
- if PostgreSQL is running without
WinXP fsync = true 20-28 tps
WinXP fsync = false 600 tps
Linux fsync = true 800 tps
Linux fsync = false 980 tps
Wow, that's terrible on Windows. If there's a solution, it'd be nice to
backport it...
There is. I just rigged up a test benchmark comparing sync
Christopher Kings-Lynne [EMAIL PROTECTED] writes:
WinXP fsync = true 20-28 tps
WinXP fsync = false 600 tps
Linux fsync = true 800 tps
Linux fsync = false 980 tps
Wow, that's terrible on Windows. If there's a solution, it'd be nice to
backport it...
Actually, the
Evgeny Rodichev wrote:
There are two different concerns here.
1. transaction loss because of unexpected power loss and/or system failure
2. inconsistent database state
For many applications (1) is fairly acceptable, and (2) is not.
So I'd like to formulate my questions another way.
- if
WinXP fsync = true 20-28 tps
WinXP fsync = false 600 tps
Linux fsync = true 800 tps
Linux fsync = false 980 tps
Wow, that's terrible on Windows. If there's a solution, it'd be nice to
backport it...
There is. I just rigged up a test benchmark comparing sync
One point that I no longer recall the reasoning behind is that xlog.c
doesn't think O_SYNC is a preferable default over fsync. We'd certainly
want to hack xlog.c to change its mind about that, at least on Windows;
assuming that the FILE_FLAG way is indeed faster.
I also confirmed that the
Magnus Hagander [EMAIL PROTECTED] writes:
Tom, if you look at all the requirements of FILE_FLAG_NO_BUFFERING on
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/fileio/base/createfile.asp,
can you say offhand if the WAL code fulfills them?
If I'm reading it right, you are
On Thu, 17 Feb 2005, Tom Lane wrote:
Christopher Kings-Lynne [EMAIL PROTECTED] writes:
WinXP fsync = true 20-28 tps
WinXP fsync = false 600 tps
Linux fsync = true 800 tps
Linux fsync = false 980 tps
Wow, that's terrible on Windows. If there's a solution, it'd be nice to
Magnus Hagander [EMAIL PROTECTED] writes:
Tom, if you look at all the requirements of FILE_FLAG_NO_BUFFERING on
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/fileio/base/createfile.asp,
can you say offhand if the WAL code fulfills them?
If I'm reading it right, you
Evgeny Rodichev [EMAIL PROTECTED] writes:
Any claimed TPS rate exceeding your disk drive's rotation rate is a
red flag.
Write cache has been enabled under Linux by default for as long as I have
dealt with it (since 1993).
You're playing with fire.
fsync() really works fine as I switch off my
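The rotation-rate rule of thumb quoted above can be made concrete with a little arithmetic. As a simplifying assumption, suppose each synchronous commit from a single session has to wait for at least one full platter rotation before the next WAL write can land; the ceiling is then the spindle speed in RPM divided by 60. The function name below is hypothetical.

```c
/* Upper bound on serial synchronous commits per second for a drive
 * spinning at `rpm`, assuming at most one commit per rotation. */
double
max_serial_commits_per_sec(double rpm)
{
    return rpm / 60.0;
}
```

For a 7200 rpm drive this gives 120 commits per second, and for a 10,000 rpm drive roughly 167. Several concurrent sessions can share one flush, so this is a sanity check on a reported TPS figure rather than a hard limit.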
After multiple runs with different block sizes (a few anomalous results
aside), I didn't see a whole lot of difference between
FILE_FLAG_NO_BUFFERING being on or off for writing performance.
However, with NO_BUFFERING set, the file is not *read* cached at all.
While the performance is on not terrible
Magnus Hagander [EMAIL PROTECTED] writes:
Is there actually a reason why we don't use O_DIRECT on Unix?
Portability, or rather the complete lack of it. Stuff that isn't in the
Single Unix Spec is a hard sell.
regards, tom lane
Evgeny Rodichev wrote:
Write cache has been enabled under Linux by default for as long as I have
dealt with it (since 1993).
It doesn't interfere with fsync(), as linux kernel uses cache flush for
fsync.
The problem is that most IDE drives lie (or perhaps you could say the
specification is ambiguous)
Magnus Hagander [EMAIL PROTECTED] writes:
Is there actually a reason why we don't use O_DIRECT on Unix?
Portability, or rather the complete lack of it. Stuff that isn't in the
Single Unix Spec is a hard sell.
Well, how about this (ok, maybe I'm way out in left field):
Change fsync option
Oliver Jowett [EMAIL PROTECTED] writes:
So Linux is indeed doing a cache flush on fsync
Actually I think the root of the problem was precisely that Linux does not
issue any sort of cache flush commands to drives on fsync. There was some talk
on linux-kernel of what how they could take
Greg Stark wrote:
Oliver Jowett [EMAIL PROTECTED] writes:
So Linux is indeed doing a cache flush on fsync
Actually I think the root of the problem was precisely that Linux does not
issue any sort of cache flush commands to drives on fsync. There was some talk
on linux-kernel of what how they
On Thu, 17 Feb 2005, Tom Lane wrote:
Evgeny Rodichev [EMAIL PROTECTED] writes:
Any claimed TPS rate exceeding your disk drive's rotation rate is a
red flag.
Write cache has been enabled under Linux by default for as long as I have
dealt with it (since 1993).
You're playing with fire.
Yes. I'm lucky in
On Fri, 18 Feb 2005, Oliver Jowett wrote:
Evgeny Rodichev wrote:
Write cache has been enabled under Linux by default for as long as I have
dealt with it (since 1993).
It doesn't interfere with fsync(), as linux kernel uses cache flush for
fsync.
The problem is that most IDE drives lie (or perhaps you could
On Fri, 17 Feb 2005, Greg Stark wrote:
Oliver Jowett [EMAIL PROTECTED] writes:
So Linux is indeed doing a cache flush on fsync
Actually I think the root of the problem was precisely that Linux does not
issue any sort of cache flush commands to drives on fsync.
No, it does. Let's try the simplest
Magnus Hagander [EMAIL PROTECTED]
news:[EMAIL PROTECTED]
This is what we have discovered. AFAIK, all other major databases or
other similar apps (like exchange or AD) all open files with
FILE_FLAG_WRITE_THROUGH and do *not* use fsync. It might give noticeably
better performance with an
Evgeny Rodichev [EMAIL PROTECTED] writes:
No, it does. Let's try the simplest test:
for (i = 0; i < LEN; i++) {
    write (fd, buf, 512);
    if (sync) fsync (fd);
}
with sync = 0 and 1, and you'll see the difference.
Uh, I'm sure you'll see a difference, one will be limited by the i/o
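The loop quoted above can be fleshed out into something compilable. The following is only a sketch for POSIX systems, not the code anyone in the thread actually ran: the function name timed_write_loop, the file mode, and the error handling are assumptions added for illustration.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L   /* for clock_gettime() and fsync() */
#endif
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/* Writes `count` 512-byte blocks to `path`, calling fsync() after each
 * write when do_sync is nonzero. Returns elapsed wall-clock seconds,
 * or -1.0 on any I/O error. */
double
timed_write_loop(const char *path, int count, int do_sync)
{
    char            buf[512];
    struct timespec start, end;
    int             fd, i;

    memset(buf, 'x', sizeof(buf));
    fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1.0;

    clock_gettime(CLOCK_MONOTONIC, &start);
    for (i = 0; i < count; i++)
    {
        if (write(fd, buf, sizeof(buf)) != (ssize_t) sizeof(buf) ||
            (do_sync && fsync(fd) != 0))
        {
            close(fd);
            return -1.0;
        }
    }
    clock_gettime(CLOCK_MONOTONIC, &end);
    close(fd);

    return (double) (end.tv_sec - start.tv_sec) +
           (double) (end.tv_nsec - start.tv_nsec) / 1e9;
}
```

Running it once with do_sync = 0 and once with do_sync = 1 shows how much time the fsync calls add. As the replies in this thread point out, a slower synced run shows only that fsync waits for something; it does not show that the data has reached the platter rather than the drive's write cache.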
Hi,
looking for a way to increase performance on a Windows XP box, I found
the parameters
#fsync = true                   # turns forced synchronization on or off
#wal_sync_method = fsync        # the default varies across platforms:
                                # fsync, fdatasync,
looking for a way to increase performance on a Windows XP box, I found
the parameters
#fsync = true                   # turns forced synchronization on or off
#wal_sync_method = fsync        # the default varies across platforms:
                                # fsync, fdatasync,