Re: [HACKERS] Design notes for BufMgrLock rewrite
Tom Lane wrote:
> Jim C. Nasby [EMAIL PROTECTED] writes:
>> The advantage of using a counter instead of a simple active bit is that buffers that are (or have been) used heavily will be able to go through several sweeps of the clock before being freed. Infrequently used buffers (such as those from a vacuum or seq. scan), would get marked as inactive the first time they were hit by the clock hand.
>
> What I'm envisioning is that pinning (actually unpinning) a buffer increments the counter (up to some limit), and the clock sweep decrements it (down to zero), and only buffers with count zero are taken by the sweep for recycling.

Would there be any value in incrementing by 2 for index accesses and 1 for seq-scans/vacuums? Actually, it should probably be a ratio based on random_page_cost, shouldn't it?

--
Richard Huxton
Archonet Ltd

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
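The counter scheme described in the quoted text can be sketched roughly as follows. This is a toy model under stated assumptions, not PostgreSQL's buffer manager: the names `Buffer`, `ClockSweepPool` and `MAX_USAGE` are hypothetical, and the cap of 5 is an arbitrary stand-in for "some limit".

```python
from dataclasses import dataclass

MAX_USAGE = 5  # assumed cap; the thread only says "up to some limit"


@dataclass
class Buffer:
    usage_count: int = 0  # bumped on unpin, decremented by the sweep
    pin_count: int = 0    # pinned buffers are never recycled


class ClockSweepPool:
    """Toy model of a clock sweep that uses a usage counter per buffer."""

    def __init__(self, nbuffers: int) -> None:
        self.buffers = [Buffer() for _ in range(nbuffers)]
        self.hand = 0  # index the clock hand will look at next

    def pin(self, idx: int) -> None:
        self.buffers[idx].pin_count += 1

    def unpin(self, idx: int) -> None:
        buf = self.buffers[idx]
        buf.pin_count -= 1
        # Unpinning increments the counter, up to the limit.
        buf.usage_count = min(buf.usage_count + 1, MAX_USAGE)

    def find_victim(self) -> int:
        """Advance the hand until an unpinned buffer with count zero is found."""
        if all(b.pin_count > 0 for b in self.buffers):
            raise RuntimeError("all buffers are pinned")
        while True:
            idx = self.hand
            buf = self.buffers[idx]
            self.hand = (self.hand + 1) % len(self.buffers)
            if buf.pin_count > 0:
                continue  # in use, skip without touching the counter
            if buf.usage_count == 0:
                return idx  # only count-zero buffers are recycled
            buf.usage_count -= 1  # heavily used buffers survive this pass
```

A buffer that was unpinned three times survives three passes of the hand, while one touched once (for example by a sequential scan) is recycled on the second pass. Richard's variation would amount to making the increment in `unpin` depend on the access type.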
Re: [HACKERS] Urgent problem: Unicode characters greater than or
> When I try to input a unicode character whose code is greater than U+2, phpPgAdmin returns the following error message:
>
> ERROR: Unicode characters greater than or equal to 0x1 are not supported
>
> Could someone fix this problem? If yes, would you please tell me where I can download the new postgre debian package.

All I can say, as the developer of phpPgAdmin, is that it's a PostgreSQL restriction, not a phpPgAdmin one.

Chris
Re: [HACKERS] Help me recovering data
> And most databases get a mix of updates and selects. I would expect it would be pretty hard to go that long with any significant level of update activity and no vacuums and not notice the performance problems from the dead tuples.
>
> I think the people who've managed to shoot themselves in the foot this way are those who decided to optimize their cron jobs to only vacuum their user tables, and forgot about the system catalogs.

That's certainly the case with one of the people we helped in IRC - 3 user tables only being vacuumed.

Chris
[HACKERS] Terminating a SETOF function call sequence
Some SQL constructs will be satisfied before all rows of a set have been examined. I'm thinking of, for instance:

EXISTS(SELECT * FROM y WHERE y.a 0)

If the first row of collection y fulfills the WHERE predicate, there's no reason to continue perusing the rest of the rows. Now, what if 'y' is a function returning SETOF something?

I see only one possible way for a C-function to detect that it doesn't need to return more rows, and that would be if the FuncCallContext call_cntr reaches max_calls.

My question is, what happens when the evaluator doesn't need more rows? Will it:

a) call the function with call_cntr = max_calls?
b) continue calling until the set is exhausted anyway?
c) simply stop calling?

a) seems unlikely since max_calls is set by the user, b) doesn't seem very optimal, and c) would be very bad since it doesn't give me any chance to release the resources that were used in order to produce the rows.

Regards,
Thomas Hallgren
Re: [HACKERS] Strange RETURN NEXT behaviour in Postgres 8.0
> It looks like the code that handles returning a RECORD variable doesn't cope with dropped columns in the function result rowtype. (If you instead declare rec as usno%rowtype, you get a different set of misbehaviors after adding/dropping columns, so that code path isn't perfect either :-()

Isn't it amazing, Tom, that that column dropping code that we did up for 7.3 is STILL causing bugs :D

Chris
Re: [HACKERS] Terminating a SETOF function call sequence
> ... c) would be very bad since it doesn't give me any chance to release the resources that were used in order to produce the rows.

You are supposed to free resources used to produce the rows before SRF_RETURN_NEXT(). The actual rows are pfree()'d by pg (and so are any other palloc()'d resources, but I'd recommend freeing them anyway, especially if you're going to use the function in an index or transactions, since resources are not freed till the end of the transaction).

...

John
Re: [HACKERS] win32 performance - fsync question
> Hi,
>
> looking for a way to increase performance on a Windows XP box, I found the parameters
>
> #fsync = true                  # turns forced synchronization on or off
> #wal_sync_method = fsync       # the default varies across platforms:
>                                # fsync, fdatasync, open_sync, or open_datasync
>
> I have no idea how it works with win32. May I try fsync = false, or is it dangerous? Which wal_sync_method may I try on WinXP?

You can try it, but it is dangerous.

fsync is the correct wal_sync_method.

For some reason the syncing is quite a lot slower on win32. One reason might be that it does flush metadata about the file as well, which I believe at least Linux doesn't.

If it wasn't clear already, if you're running antivirus, try uninstalling it. Note that you may need to uninstall it to get all performance back; just disabling is often *not* enough, as the kernel driver is still loaded.

Things worth experimenting with (these are all untested, so please report any successes):
1) Try reformatting with a cluster size of 8 kB (the pg page size), if you can.
2) Disable the last access time (like noatime on linux).
   fsutil behavior set disablelastaccess 1
3) Disable 8.3 filenames
   fsutil behavior set disable8dot3 1

2 and 3 may require a reboot.

(2 and 3 can be done on earlier windows through registry settings only, in HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\FileSystem)

//Magnus
Re: [HACKERS] Terminating a SETOF function call sequence
John,

> You are supposed to free resources used to produce the rows before SRF_RETURN_NEXT();

I can (and must) free up the resources used to produce one single row at that time, yes, but I might have resources that are common to all rows. Let's assume that I have a file open, for instance. I read one row at a time from that file. I need to know when to close the file.

Regards,
Thomas Hallgren
Re: [HACKERS] Design notes for BufMgrLock rewrite
> Would there be any value in incrementing by 2 for index accesses and 1 for seq-scans/vacuums? Actually, it should probably be a ratio based on random_page_cost, shouldn't it?

What happens with very small hot tables that are only a few pages and thus have no index defined? I think it would not be good to treat such data pages as less important than index pages.

Andreas
Re: [HACKERS] Help me recovering data
> Gaetano Mendola [EMAIL PROTECTED] writes:
>> We do ~4000 txn/minute so in 6 months you are screwed up...
>
> Sure, but if you ran without vacuuming for 6 months, wouldn't you notice the huge slowdowns from all those dead tuples before that?

I would think that only applies to databases where UPDATE and DELETE are done often. What about databases that are 99.999% inserts? A DBA lightly going over the docs may not even know that vacuum needs to be run.
Re: [HACKERS] win32 performance - fsync question
> Things worth experimenting with (these are all untested, so please report any successes):
> 1) Try reformatting with a cluster size of 8 kB (the pg page size), if you can.

What about recompiling pg with a 4k block size? Win32 file cluster sizes and memory allocation units are both on 4k boundaries.

Merlin
Re: [HACKERS] win32 performance - fsync question
> The general question is - does PostgreSQL really need fsync? I suppose it is a design question, not a platform-specific one. It sounds like the only scenario in which fsync is useful is interprocess communication via an open file. But PostgreSQL uses IPC for this, so is fsync really required?

NO!

Fsync is so that when your computer loses power without warning, you will have no data loss.

If you turn it off, you run the risk of losing data if you lose power.

Chris
Re: [HACKERS] win32 performance - fsync question
On Thu, 17 Feb 2005, Magnus Hagander wrote:

>> Hi,
>>
>> looking for a way to increase performance on a Windows XP box, I found the parameters
>>
>> #fsync = true                  # turns forced synchronization on or off
>> #wal_sync_method = fsync       # the default varies across platforms:
>>                                # fsync, fdatasync, open_sync, or open_datasync
>>
>> I have no idea how it works with win32. May I try fsync = false, or is it dangerous? Which wal_sync_method may I try on WinXP?
>
> You can try it, but it is dangerous.
>
> fsync is the correct wal_sync_method.
>
> For some reason the syncing is quite a lot slower on win32. One reason might be that it does flush metadata about the file as well, which I believe at least Linux doesn't.
>
> If it wasn't clear already, if you're running antivirus, try uninstalling it. Note that you may need to uninstall it to get all performance back; just disabling is often *not* enough, as the kernel driver is still loaded.

No, I do not have any resident disk-related stuff.

> Things worth experimenting with (these are all untested, so please report any successes):
> 1) Try reformatting with a cluster size of 8 kB (the pg page size), if you can.
> 2) Disable the last access time (like noatime on linux).
>    fsutil behavior set disablelastaccess 1
> 3) Disable 8.3 filenames
>    fsutil behavior set disable8dot3 1
>
> 2 and 3 may require a reboot.
>
> (2 and 3 can be done on earlier windows through registry settings only, in HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\FileSystem)

I've repeated the test under 2 and 3 - no noticeable difference. With disablelastaccess I got about 10% - 15% better results, but it is not too significant.

Finally I tried fsync = false and got 580-620 tps.

So, the short summary:

WinXP fsync = true    20-28 tps
WinXP fsync = false   600 tps
Linux                 800 tps

The general question is - does PostgreSQL really need fsync? I suppose it is a design question, not a platform-specific one. It sounds like the only scenario in which fsync is useful is interprocess communication via an open file. But PostgreSQL uses IPC for this, so is fsync really required?

E.R.
Evgeny Rodichev, Sternberg Astronomical Institute, Moscow State University
email: [EMAIL PROTECTED]
Phone: 007 (095) 939 2383
Fax: 007 (095) 932 8841
http://www.sai.msu.su/~er
Re: [HACKERS] win32 performance - fsync question
On Thu, 17 Feb 2005, Christopher Kings-Lynne wrote:

>> The general question is - does PostgreSQL really need fsync? I suppose it is a design question, not a platform-specific one. It sounds like the only scenario in which fsync is useful is interprocess communication via an open file. But PostgreSQL uses IPC for this, so is fsync really required?
>
> NO!
>
> Fsync is so that when your computer loses power without warning, you will have no data loss.
>
> If you turn it off, you run the risk of losing data if you lose power.
>
> Chris

This problem is addressed by the file system (fsck, journalling etc.). Is it reasonable to handle it directly within the application?

Regards,
E.R.
Evgeny Rodichev, Sternberg Astronomical Institute, Moscow State University
email: [EMAIL PROTECTED]
Phone: 007 (095) 939 2383
Fax: 007 (095) 932 8841
http://www.sai.msu.su/~er
Re: [HACKERS] win32 performance - fsync question
On Thu, 17 Feb 2005 17:54:38 +0300 (MSK) E.Rodichev [EMAIL PROTECTED] wrote:
> On Thu, 17 Feb 2005, Christopher Kings-Lynne wrote:
>>> The general question is - does PostgreSQL really need fsync? I suppose it is a design question, not a platform-specific one. It sounds like the only scenario in which fsync is useful is interprocess communication via an open file. But PostgreSQL uses IPC for this, so is fsync really required?
>>
>> NO!
>>
>> Fsync is so that when your computer loses power without warning, you will have no data loss.
>>
>> If you turn it off, you run the risk of losing data if you lose power.
>>
>> Chris
>
> This problem is addressed by the file system (fsck, journalling etc.). Is it reasonable to handle it directly within the application?

NO again! Fsck only fixes up file system pointers after a crash. If the data did not make it to the disk, no amount of fscking will put it there.

I'm not positive, but I think that journalled file systems also need fsync to guarantee that the information gets journalled. In any case, journalling only helps if you have a journalled file system. Not everyone does.

This is not to say that fsync is always required, just that it solves a different problem than all those other tools.

--
D'Arcy J.M. Cain darcy@druid.net | Democracy is three wolves
http://www.druid.net/darcy/      | and a sheep voting on
+1 416 425 1212 (DoD#0082)(eNTP) | what's for dinner.
Re: [HACKERS] win32 performance - fsync question
E.Rodichev [EMAIL PROTECTED] writes:
> On Thu, 17 Feb 2005, Christopher Kings-Lynne wrote:
>> Fsync is so that when your computer loses power without warning, you will have no data loss.
>>
>> If you turn it off, you run the risk of losing data if you lose power.
>>
>> Chris
>
> This problem is addressed by the file system (fsck, journalling etc.). Is it reasonable to handle it directly within the application?

No, it's not addressed by the file system. fsync() tells the OS to make sure the data is on disk. Without that, the OS is free to just keep the WAL data in memory cache, and a power failure could cause data from committed transactions to be lost (we don't report commit success until fsync() tells us the file data is on disk).

-Doug
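The ordering Doug describes (write the log record, fsync, and only then report the commit) can be illustrated with a small sketch. This is not PostgreSQL code: `write_all`, `commit` and `demo` are hypothetical names, and the "WAL" here is just an append-only file in a temporary directory.

```python
import os
import tempfile


def write_all(fd: int, data: bytes) -> None:
    """Write every byte, looping because os.write may write fewer bytes."""
    offset = 0
    while offset < len(data):
        offset += os.write(fd, data[offset:])


def commit(wal_fd: int, record: bytes, durable: bool = True) -> bool:
    """Append a commit record and report success.

    With durable=True, success is reported only after os.fsync() returns,
    i.e. after the OS says the data has been handed to the disk. With
    durable=False (the analogue of fsync = off) the record may still be
    sitting in the OS cache when we claim the transaction committed.
    """
    write_all(wal_fd, record)
    if durable:
        os.fsync(wal_fd)
    return True


def demo() -> bytes:
    """Append two records to a scratch log file and return its contents."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "wal.log")
        fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
        try:
            commit(fd, b"COMMIT 1;")
            commit(fd, b"COMMIT 2;", durable=False)
        finally:
            os.close(fd)
        with open(path, "rb") as f:
            return f.read()
```

Both calls leave the same bytes in the file as long as nothing crashes; the difference only shows up after a power failure, which is why the two settings look equivalent in ordinary testing.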
Re: [HACKERS] win32 performance - fsync question
E.Rodichev wrote:
> This problem is addressed by the file system (fsck, journalling etc.). Is it reasonable to handle it directly within the application?

In the words of the Duke of Wellington, "If you believe that you'll believe anything."

Please review past discussions on the mailing lists on this point.

BTW, most journalling file systems do not guarantee file integrity, only file metadata integrity. In particular, I believe this is true of NTFS (and whether it even does that has been debated).

So by all means turn off fsync if you want the performance gain *and* you accept the risk. But if you do, don't come crying later that your data has been lost or corrupted.

(the results are interesting, though - with fsync off Windows and Linux are in the same performance ballpark.)

cheers

andrew
Re: [HACKERS] win32 performance - fsync question
In [EMAIL PROTECTED], on 02/17/05 at 10:21 AM, Andrew Dunstan [EMAIL PROTECTED] said:

> E.Rodichev wrote:
>> This problem is addressed by the file system (fsck, journalling etc.). Is it reasonable to handle it directly within the application?
>
> In the words of the Duke of Wellington, "If you believe that you'll believe anything."
>
> Please review past discussions on the mailing lists on this point.
>
> BTW, most journalling file systems do not guarantee file integrity, only file metadata integrity. In particular, I believe this is true of NTFS (and whether it even does that has been debated).
>
> So by all means turn off fsync if you want the performance gain *and* you accept the risk. But if you do, don't come crying later that your data has been lost or corrupted.
>
> (the results are interesting, though - with fsync off Windows and Linux are in the same performance ballpark.)
>
> cheers
>
> andrew

In anything I've done, Windows is very slow when you use fsync or the Windows API equivalent.

If you need the performance, you had better have the machine hooked up to a UPS (probably a good idea in any case) and set up something that is triggered by the UPS running down to signal PostgreSQL to do an immediate shutdown.

--
[EMAIL PROTECTED]
Re: [HACKERS] Terminating a SETOF function call sequence
Thomas Hallgren [EMAIL PROTECTED] writes:
> My question is, what happens when the evaluator doesn't need more rows? Will it:
> a) call the function with call_cntr = max_calls?
> b) continue calling until the set is exhausted anyway?
> c) simply stop calling?

(c)

> a) seems unlikely since max_calls is set by the user, b) doesn't seem very optimal, and c) would be very bad since it doesn't give me any chance to release the resources that were used in order to produce the rows.

This is what RegisterExprContextCallback is for.

regards, tom lane
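The idea behind a shutdown callback can be modelled outside the backend. The sketch below is only an analogy, assuming a much simplified executor: `ExprContext`, `register_callback`, `next_row` and `exists` are hypothetical Python stand-ins, not the real C API. In the backend, the corresponding step would be calling RegisterExprContextCallback from the set-returning function when it first acquires the resource.

```python
import io


class ExprContext:
    """Toy stand-in for an expression context that runs cleanup callbacks."""

    def __init__(self) -> None:
        self._callbacks = []

    def register_callback(self, func, arg) -> None:
        self._callbacks.append((func, arg))

    def shutdown(self) -> None:
        # Run callbacks in reverse registration order, each exactly once.
        while self._callbacks:
            func, arg = self._callbacks.pop()
            func(arg)


def next_row(econtext: ExprContext, state: dict, open_source):
    """One call of a 'set-returning function' that reads a row per call."""
    if "src" not in state:
        # First call: acquire the shared resource and register its cleanup,
        # because the caller may simply stop calling (case c).
        state["src"] = open_source()
        econtext.register_callback(lambda src: src.close(), state["src"])
    line = state["src"].readline()
    return line.strip() if line else None


def exists(econtext: ExprContext, predicate, open_source) -> bool:
    """EXISTS-like evaluation: stop at the first row matching predicate."""
    state = {}
    try:
        while True:
            row = next_row(econtext, state, open_source)
            if row is None:
                return False
            if predicate(row):
                return True  # no further calls are made
    finally:
        econtext.shutdown()  # the registered cleanup closes the source
```

With `io.StringIO("5\n0\n7\n")` as the source and a predicate of `int(row) > 0`, evaluation stops after the first row, and the source is still closed because the cleanup was registered up front rather than tied to reaching the end of the set.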
Re: [HACKERS] Terminating a SETOF function call sequence
Tom Lane wrote:
> Thomas Hallgren [EMAIL PROTECTED] writes:
>> My question is, what happens when the evaluator doesn't need more rows? Will it:
>> a) call the function with call_cntr = max_calls?
>> b) continue calling until the set is exhausted anyway?
>> c) simply stop calling?
>
> (c)
>
>> a) seems unlikely since max_calls is set by the user, b) doesn't seem very optimal, and c) would be very bad since it doesn't give me any chance to release the resources that were used in order to produce the rows.
>
> This is what RegisterExprContextCallback is for.
>
> regards, tom lane

Thanks Tom,

This is exactly what I need. I didn't know that such a callback existed. Perhaps it should be mentioned in the documentation chapter that talks about SETOF and C-functions?

Regards,
Thomas Hallgren
Re: [HACKERS] win32 performance - fsync question
> So by all means turn off fsync if you want the performance gain *and* you accept the risk. But if you do, don't come crying later that your data has been lost or corrupted.
>
> (the results are interesting, though - with fsync off Windows and Linux are in the same performance ballpark.)

Yes, this is definitely interesting. It confirms Merlin's signs of I/O being what kills the win32 version. IPC etc is a bit slower, but not significantly.

> In anything I've done, Windows is very slow when you use fsync or the Windows API equivalent.

This is what we have discovered. AFAIK, all other major databases or other similar apps (like exchange or AD) open files with FILE_FLAG_WRITE_THROUGH and do *not* use fsync. It might give noticeably better performance with an O_DIRECT style WAL logging at least. But I'm unsure if the current code for O_DIRECT works on win32 - I think it needs some fixing for that. Which might be worth looking at for 8.1.

Not much to do about the bgwriter; the way it is designed it *has* to fsync during checkpoint. The Other Databases implement their own cache and write data files directly also, but pg is designed to have the OS cache helping out. Bypassing it would not be good for performance.

> If you need the performance, you had better have the machine hooked up to a UPS (probably a good idea in any case) and set up something that is triggered by the UPS running down to signal PostgreSQL to do an immediate shutdown.

UPS will not help you. UPS does not help you if the OS crashes (hey, you're on windows, this *does* happen). UPS does not help you if somebody accidentally pulls the plug between the UPS and the server. UPS does not help you if your server overheats and shuts down.

Bottom line, there are lots of cases when a UPS does not help. Having a UPS (preferably redundant UPSes feeding redundant power supplies - this is not at all expensive today) is certainly a good thing, but it is *not* a replacement for fsync. On *any* platform.

//Magnus
Re: [HACKERS] win32 performance - fsync question
>> This is what we have discovered. AFAIK, all other major databases or other similar apps (like exchange or AD) open files with FILE_FLAG_WRITE_THROUGH and do *not* use fsync. It might give noticeably better performance with an O_DIRECT style WAL logging at least. But I'm unsure if the current code for O_DIRECT works on win32 - I think it needs some fixing for that. Which might be worth looking at for 8.1.
>
> Doesn't Windows support O_SYNC (or even better O_DSYNC) flag to open()? That should be the Posixy spelling of FILE_FLAG_WRITE_THROUGH, if the latter means what I suppose it does.

They should, but someone said it didn't work. I haven't followed up on it, though, so it is quite possible it works. If so, it is definitely worth trying.

>> Not much to do about the bgwriter, the way it is designed it *has* to fsync during checkpoint.
>
> Theoretically at least, the fsync during checkpoints should not be a performance killer.

If you run a tight benchmark past a checkpoint, it will have an effect if the fsync takes twice as long as it does on unix. If the checkpoint happens when other I/O is fairly low then it should not have an effect.

Merlin, was that by any chance you? We've been talking about these things quite a lot :-)

> So: try O_SYNC instead of fsync for WAL, ie, wal_sync_method = open_sync or open_datasync.

Definitely worth checking out.

//Magnus
Re: [HACKERS] win32 performance - fsync question
Magnus Hagander [EMAIL PROTECTED] writes:
> This is what we have discovered. AFAIK, all other major databases or other similar apps (like exchange or AD) open files with FILE_FLAG_WRITE_THROUGH and do *not* use fsync. It might give noticeably better performance with an O_DIRECT style WAL logging at least. But I'm unsure if the current code for O_DIRECT works on win32 - I think it needs some fixing for that. Which might be worth looking at for 8.1.

Doesn't Windows support O_SYNC (or even better O_DSYNC) flag to open()? That should be the Posixy spelling of FILE_FLAG_WRITE_THROUGH, if the latter means what I suppose it does.

> Not much to do about the bgwriter, the way it is designed it *has* to fsync during checkpoint.

Theoretically at least, the fsync during checkpoints should not be a performance killer. The issue that's at hand here is fsyncing the WAL, and the reason we need that is (a) to be sure a transaction is committed when we say it is, and (b) to be sure that WAL writes hit disk before associated data file updates do (it's write AHEAD log, remember). Direct writes of WAL should be fine.

So: try O_SYNC instead of fsync for WAL, ie, wal_sync_method = open_sync or open_datasync.

regards, tom lane
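The difference between "write, then fsync" and "open with O_SYNC" can be shown in a few lines. This is a sketch under assumptions, not how xlog.c does it: `open_wal`, `append_record` and `demo` are hypothetical names, and since `os.O_DSYNC` / `os.O_SYNC` exist only on platforms that provide them, the code falls back to an explicit fsync elsewhere.

```python
import os
import tempfile

# Prefer O_DSYNC (data only), then O_SYNC; 0 means neither is available.
SYNC_FLAG = getattr(os, "O_DSYNC", 0) or getattr(os, "O_SYNC", 0)


def open_wal(path: str) -> int:
    """Open a log file so that each write is synchronous, if possible."""
    flags = os.O_WRONLY | os.O_CREAT | os.O_APPEND | SYNC_FLAG
    return os.open(path, flags, 0o600)


def append_record(fd: int, record: bytes) -> None:
    """Append one record; durability comes from the open flag or from fsync."""
    offset = 0
    while offset < len(record):
        offset += os.write(fd, record[offset:])
    if not SYNC_FLAG:
        os.fsync(fd)  # fallback when no synchronous open flag exists


def demo() -> bytes:
    """Append two records to a scratch file and return its contents."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "wal.log")
        fd = open_wal(path)
        try:
            append_record(fd, b"rec1;")
            append_record(fd, b"rec2;")
        finally:
            os.close(fd)
        with open(path, "rb") as f:
            return f.read()
```

With a synchronous open flag, each write call returns only once the OS reports the data as written, so there is one system call per record instead of two. Whether that is actually faster on a given platform is exactly what the thread sets out to measure.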
Re: [HACKERS] win32 performance - fsync question
>> Things worth experimenting with (these are all untested, so please report any successes):
>> 1) Try reformatting with a cluster size of 8 kB (the pg page size), if you can.
>> 2) Disable the last access time (like noatime on linux).
>>    fsutil behavior set disablelastaccess 1
>> 3) Disable 8.3 filenames
>>    fsutil behavior set disable8dot3 1
>>
>> 2 and 3 may require a reboot.
>>
>> (2 and 3 can be done on earlier windows through registry settings only, in HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\FileSystem)
>
> I've repeated the test under 2 and 3 - no noticeable difference. With disablelastaccess I got about 10% - 15% better results, but it is not too significant.

Actually, that's enough to care about in a real world deployment.

> Finally I tried fsync = false and got 580-620 tps.
>
> So, the short summary:
>
> WinXP fsync = true    20-28 tps
> WinXP fsync = false   600 tps
> Linux                 800 tps

This Linux figure should really be compared to the WinXP fsync=false one, since you have write caching on. The interesting one to compare with is the other one you did:

Linux w/o write cache   80-90 tps

Which is still faster than windows, but not as much faster.

> The general question is - does PostgreSQL really need fsync? I suppose it is a design question, not a platform-specific one. It sounds like the only scenario in which fsync is useful is interprocess communication via an open file. But PostgreSQL uses IPC for this, so is fsync really required?

No, fsync is used to make sure your data is committed to disk once you commit a transaction. IPC is handled through shared memory and named pipes.

//Magnus
Re: [HACKERS] win32 performance - fsync question
On Thu, 17 Feb 2005, Andrew Dunstan wrote:
> (the results are interesting, though - with fsync off Windows and Linux are in the same performance ballpark.)

Some addition:

WinXP fsync = true    20-28 tps
WinXP fsync = false   600 tps
Linux fsync = true    800 tps
Linux fsync = false   980 tps

Regards,
E.R.
Evgeny Rodichev, Sternberg Astronomical Institute, Moscow State University
email: [EMAIL PROTECTED]
Phone: 007 (095) 939 2383
Fax: 007 (095) 932 8841
http://www.sai.msu.su/~er
Re: [HACKERS] win32 performance - fsync question
>> Doesn't Windows support O_SYNC (or even better O_DSYNC) flag to open()? That should be the Posixy spelling of FILE_FLAG_WRITE_THROUGH, if the latter means what I suppose it does.
>
> They should, but someone said it didn't work. I haven't followed up on it, though, so it is quite possible it works. If so, it is definitely worth trying.

Update on that. There is no O_SYNC nor O_DSYNC. They just aren't there.

However, we already have win32_open (in port/open.c), which is used to open these files. We could probably add code there to check for O_SYNC and map it to the correct win32 flags for CreateFile (because the support certainly is there).

To make this happen, is it enough to define O_DSYNC in the win32 port include file, and then implement it in the open call? Or do I need to hack xlog.c? The comment claims it's hackery ;-), so I figured I should verify that before actually testing things.

Oh, and finally. The win32 commands have the following options:

FILE_FLAG_NO_BUFFERING: This disables the cache completely. It also has lots of limits, like every read and write has to be on a sector boundary etc. It gives great performance with async I/O, because it bypasses the memory manager. It appears to be like O_DIRECT on linux?

FILE_FLAG_WRITE_THROUGH: Instructs the system to write through any intermediate cache and go directly to disk. If FILE_FLAG_NO_BUFFERING is not also specified, so that system caching is in effect, then the data is written to the system cache, but is flushed to disk without delay. If FILE_FLAG_NO_BUFFERING is also specified, so that system caching is not in effect, then the data is immediately flushed to disk without going through the system cache. The operating system also requests a write-through of the hard disk cache to persistent media. However, not all hardware supports this write-through capability.

It seems to me FILE_FLAG_NO_BUFFERING is the same as O_DSYNC. (A different place in the docs says "Also, the file metadata may still be cached. To flush the metadata to disk, use the FlushFileBuffers function.", so it seems it's more DSYNC than SYNC)

//Magnus
Re: [HACKERS] win32 performance - fsync question
>> Doesn't Windows support O_SYNC (or even better O_DSYNC) flag to open()? That should be the Posixy spelling of FILE_FLAG_WRITE_THROUGH, if the latter means what I suppose it does.
>
> They should, but someone said it didn't work. I haven't followed up on it, though, so it is quite possible it works. If so, it is definitely worth trying.

Yes, and the other issue is that FlushFileBuffers() does not play nice with raid controllers; it actually overrides their write caching, so that you cannot get around the fsync performance issue using raid + bbu on most configurations.

>>> Not much to do about the bgwriter, the way it is designed it *has* to fsync during checkpoint.
>>
>> Theoretically at least, the fsync during checkpoints should not be a performance killer.

I agree: it's the WAL sync that is the problem. I don't mind a slower sync during checkpoint because that is controllable. However, there is also the raid issue.

> If you run a tight benchmark past a checkpoint, it will have an effect if the fsync takes twice as long as it does on unix. If the checkpoint happens when other I/O is fairly low then it should not have an effect.
>
> Merlin, was that by any chance you? We've been talking about these things quite a lot :-)
>
>> So: try O_SYNC instead of fsync for WAL, ie, wal_sync_method = open_sync or open_datasync.
>
> Definitely worth checking out.

Yeah.

Merlin
Re: [HACKERS] win32 performance - fsync question
Magnus Hagander [EMAIL PROTECTED] writes:
> Oh, and finally. The win32 commands have the following options:
>
> FILE_FLAG_NO_BUFFERING: This disables the cache completely. It also has lots of limits, like every read and write has to be on a sector boundary etc. It gives great performance with async I/O, because it bypasses the memory manager. It appears to be like O_DIRECT on linux?
>
> FILE_FLAG_WRITE_THROUGH: Instructs the system to write through any intermediate cache and go directly to disk. If FILE_FLAG_NO_BUFFERING is not also specified, so that system caching is in effect, then the data is written to the system cache, but is flushed to disk without delay. If FILE_FLAG_NO_BUFFERING is also specified, so that system caching is not in effect, then the data is immediately flushed to disk without going through the system cache. The operating system also requests a write-through of the hard disk cache to persistent media. However, not all hardware supports this write-through capability.

AFAICS it would make sense for us to specify both of those flags for WAL writes.

We could either hack win32_open() to translate O_SYNC to those flags, or make xlog.c aware of the Windows spellings of the flags. Probably the former is less painful given that open.c already does wholesale translations of open() flags.

One point that I no longer recall the reasoning behind is that xlog.c doesn't think O_SYNC is a preferable default over fsync. We'd certainly want to hack xlog.c to change its mind about that, at least on Windows; assuming that the FILE_FLAG way is indeed faster.

regards, tom lane
Re: [HACKERS] win32 performance - fsync question
> Some addition:
>
> WinXP fsync = true    20-28 tps
> WinXP fsync = false   600 tps
> Linux fsync = true    800 tps
> Linux fsync = false   980 tps

Wow, that's terrible on Windows. If there's a solution, it'd be nice to backport it...

Chris
Re: [HACKERS] win32 performance - fsync question
There are two different concerns here: 1. transaction loss because of unexpected power loss and/or system failure; 2. inconsistent database state. For many applications (1) is fairly acceptable, and (2) is not. So I'd like to formulate my questions another way. - If PostgreSQL is running without fsync and a power loss occurs, which kind of damage is possible? 1, 2, or both? - It looks like with a proper fwrite/fflush policy it is possible to guarantee that only transaction loss may occur, while the database keeps some consistent state as of before the last (several) transactions. Is that true for PostgreSQL? Regards, E.R. _ Evgeny Rodichev Sternberg Astronomical Institute email: [EMAIL PROTECTED] Moscow State University Phone: 007 (095) 939 2383 Fax: 007 (095) 932 8841 http://www.sai.msu.su/~er
Re: [HACKERS] win32 performance - fsync question
WinXP fsync = true 20-28 tps WinXP fsync = false 600 tps Linux fsync = true 800 tps Linux fsync = false 980 tps Wow, that's terrible on Windows. If there's a solution, it'd be nice to backport it... There is. I just rigged up a test benchmark comparing sync methods. I ran it on 2 boxes: my XP workstation on a 10k Raptor, and a win2k server on 3ware RAID 5 (also on 10k Raptors).
Workstation:
did 1000 FILE_FLAG_WRITE_THROUGH | FILE_FLAG_NO_BUFFERING writes in 5.729633 seconds
did 1000 FILE_FLAG_WRITE_THROUGH writes in 0.593322 seconds
did 1000 flushfilebuffers writes in 15.898989 seconds
Server:
did 1000 FILE_FLAG_WRITE_THROUGH | FILE_FLAG_NO_BUFFERING writes in 16.501076 seconds
did 1000 FILE_FLAG_WRITE_THROUGH writes in 16.104133 seconds
did 1000 flushfilebuffers writes in 18.962439 seconds
Server after running super ultra secret dskcache '+p' mode:
did 1000 FILE_FLAG_WRITE_THROUGH | FILE_FLAG_NO_BUFFERING writes in 0.256574 seconds
did 1000 FILE_FLAG_WRITE_THROUGH writes in 2.627602 seconds
did 1000 flushfilebuffers writes in 15.290967 seconds
dskcache.exe is required to enable power protect mode (unbypassing RAID controller write cache settings) on win2k. Enjoy. Merlin
Re: [HACKERS] win32 performance - fsync question
Christopher Kings-Lynne [EMAIL PROTECTED] writes: WinXP fsync = true 20-28 tps WinXP fsync = false 600 tps Linux fsync = true 800 tps Linux fsync = false 980 tps Wow, that's terrible on Windows. If there's a solution, it'd be nice to backport it... Actually, the number that's way out of line there is the Linux w/fsync one. I infer that he's got disk write cache enabled and therefore the transactions aren't really being synced to disk at all. Any claimed TPS rate exceeding your disk drive's rotation rate is a red flag. regards, tom lane ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
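A rough illustration of the rotation-rate rule of thumb above (an editorial sketch, not code from the thread): if every commit has to wait for its WAL write to reach the platter, a single client can commit at most about once per disk rotation, i.e. rpm / 60 times per second. The function names are hypothetical, and the sketch assumes one client, one flush per commit, and no battery-backed cache in the path.

```c
#include <assert.h>

/*
 * Upper bound on serial, truly synced commits per second for one spinning
 * disk: at most one WAL flush per platter rotation.
 * Assumption: a single client, one flush per commit, no battery-backed cache.
 */
double
sketch_max_synced_commits_per_sec(double rpm)
{
    assert(rpm > 0.0);
    return rpm / 60.0;
}

/* Returns 1 when a reported TPS figure exceeds that ceiling, else 0. */
int
sketch_tps_is_suspicious(double reported_tps, double rpm)
{
    return reported_tps > sketch_max_synced_commits_per_sec(rpm);
}
```

The thread does not say which drive produced the Linux numbers. For an assumed 7200 rpm drive the ceiling works out to 120 commits per second, so the reported 800 tps with fsync on would be flagged, while the 20-28 tps WinXP figure would not.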
Re: [HACKERS] win32 performance - fsync question
Evgeny Rodichev wrote: There are two different concerns here. 1. transaction loss because of unexpected power loss and/or system failure 2. inconsistent database state For many applications (1) is fairly acceptable, and (2) is not. So I'd like to formulate my questions another way. - if PostgreSQL is running without fsync, and power loss occurs, which kind of damage is possible? 1, 2, or both? Both. If 1 can happen then 2 can happen. - it looks like with a proper fwrite/fflush policy it is possible to guarantee that only transaction loss may occur, but the database keeps some consistent state as of before the last (several) transactions. Is it true for PostgreSQL? No - if fsync is on and the transaction is reported as committed then it should still be there when the power returns. Provided you don't suffer hardware failure you should be able to rely on a committed transaction actually being written to disk. That's what fsync does for you. -- Richard Huxton Archonet Ltd
Re: [HACKERS] win32 performance - fsync question
WinXP fsync = true 20-28 tps WinXP fsync = false 600 tps Linux fsync = true 800 tps Linux fsync = false 980 tps Wow, that's terrible on Windows. If there's a solution, it'd be nice to backport it... there is. I just rigged up a test benchmark comparing sync methods. I ran on 2 boxes, my xp workstation on 10k raptor and a win2k server on 3ware raid 5 (also on 10k raptors).
Workstation:
did 1000 FILE_FLAG_WRITE_THROUGH | FILE_FLAG_NO_BUFFERING writes in 5.729633 seconds
did 1000 FILE_FLAG_WRITE_THROUGH writes in 0.593322 seconds
did 1000 flushfilebuffers writes in 15.898989 seconds
Server:
did 1000 FILE_FLAG_WRITE_THROUGH | FILE_FLAG_NO_BUFFERING writes in 16.501076 seconds
did 1000 FILE_FLAG_WRITE_THROUGH writes in 16.104133 seconds
did 1000 flushfilebuffers writes in 18.962439 seconds
Server after running super ultra secret dskcache '+p' mode:
did 1000 FILE_FLAG_WRITE_THROUGH | FILE_FLAG_NO_BUFFERING writes in 0.256574 seconds
did 1000 FILE_FLAG_WRITE_THROUGH writes in 2.627602 seconds
did 1000 flushfilebuffers writes in 15.290967 seconds
dskcache.exe is required to enable power protect mode (unbypassing raid controller write cache settings) on win2k.
I draw the following conclusions: 1) Using just FILE_FLAG_WRITE_THROUGH is not enough. It sends the data out of the cache, but it returns to the application before the data has hit disk. AFAIK, that's not good enough for us. 2) Using both, we can get a *significant* speed boost. Tom, if you look at all the requirements of FILE_FLAG_NO_BUFFERING on http://msdn.microsoft.com/library/default.asp?url=/library/en-us/fileio/base/createfile.asp, can you say offhand if the WAL code fulfills them? If it does, we can probably just hack it in win32_open (at least for testing and a possible backpatch). If not, then we'll need to stuff code in xlog.c. (Specifically, I'm most worried about the memory alignment requirement.) //Magnus
Re: [HACKERS] win32 performance - fsync question
One point that I no longer recall the reasoning behind is that xlog.c doesn't think O_SYNC is a preferable default over fsync. We'd certainly want to hack xlog.c to change its mind about that, at least on Windows; assuming that the FILE_FLAG way is indeed faster. I also confirmed that the totally un-cached mode in windows (FILE_FLAG_WRITE_THROUGH | FILE_FLAG_NO_BUFFERING) will only work if the amount of data written is some multiple of 512 bytes. Can WAL work under this restriction? Merlin ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED])
Re: [HACKERS] win32 performance - fsync question
Magnus Hagander [EMAIL PROTECTED] writes: Tom, if you look at all the requirements of FILE_FLAG_NO_BUFFERING on http://msdn.microsoft.com/library/default.asp?url=/library/en-us/fileio/base/createfile.asp, can you say offhand if the WAL code fulfills them? If I'm reading it right, you are referring to: (1) File access must begin at byte offsets within the file that are integer multiples of the volume's sector size. (2) File access must be for numbers of bytes that are integer multiples of the volume's sector size. For example, if the sector size is 512 bytes, an application can request reads and writes of 512, 1024, or 2048 bytes, but not of 335, 981, or 7171 bytes. (3) Buffer addresses for read and write operations should be sector aligned (aligned on addresses in memory that are integer multiples of the volume's sector size). Depending on the disk, this requirement may not be enforced. 1 and 2 should be no problem since we only read or write integral pages (8K). 3 is a bit bogus IMHO, or even a lot bogus. You can set ALIGNOF_BUFFER in src/include/pg_config_manual.h to whatever you think the alignment requirement really needs to be (I'd try 512). regards, tom lane
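The three FILE_FLAG_NO_BUFFERING rules quoted above can be checked mechanically. What follows is a minimal editorial sketch in portable C11, not win32 code and not anything from xlog.c; the helper names are hypothetical, and the 512-byte sector size in the usage note is the example value from the quoted documentation, not a measured one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical helper: nonzero if a request satisfies all three rules
 * (offset, length, and buffer address are multiples of the sector size).
 */
int
sketch_unbuffered_io_ok(uint64_t file_offset, size_t nbytes,
                        const void *buf, size_t sector_size)
{
    assert(sector_size > 0);
    return file_offset % sector_size == 0
        && nbytes % sector_size == 0
        && (uintptr_t) buf % sector_size == 0;
}

/*
 * Allocate a buffer whose address meets rule 3. C11 aligned_alloc wants
 * the size to be a multiple of the alignment, which rule 2 requires anyway.
 * The caller releases the buffer with free().
 */
void *
sketch_alloc_sector_aligned(size_t nbytes, size_t sector_size)
{
    assert(sector_size > 0 && nbytes % sector_size == 0);
    return aligned_alloc(sector_size, nbytes);
}
```

With an 8K page at an 8K offset and a 512-byte sector size, all three checks pass, which matches the remark that whole-page I/O already satisfies rules 1 and 2; only the buffer address needs extra care.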
Re: [HACKERS] win32 performance - fsync question
On Thu, 17 Feb 2005, Tom Lane wrote: Christopher Kings-Lynne [EMAIL PROTECTED] writes: WinXP fsync = true 20-28 tps WinXP fsync = false 600 tps Linux fsync = true 800 tps Linux fsync = false 980 tps Wow, that's terrible on Windows. If there's a solution, it'd be nice to backport it... Actually, the number that's way out of line there is the Linux w/fsync one. I infer that he's got disk write cache enabled and therefore the transactions aren't really being synced to disk at all. Any claimed TPS rate exceeding your disk drive's rotation rate is a red flag. Write cache has been enabled under Linux by default for as long as I have dealt with it (since 1993). It doesn't interfere with fsync(), as the Linux kernel uses a cache flush for fsync. I have a 2.6.10 kernel running *without* any additional patches, and without any specific hdparm settings. fsync() really works fine, as I switch off my notebook 2-3 times every day and have never had any data loss :) The related stuff from dmesg is: hda: cache flushes supported Regards, E.R. _ Evgeny Rodichev Sternberg Astronomical Institute email: [EMAIL PROTECTED] Moscow State University Phone: 007 (095) 939 2383 Fax: 007 (095) 932 8841 http://www.sai.msu.su/~er
Re: [HACKERS] win32 performance - fsync question
Magnus Hagander [EMAIL PROTECTED] writes: Tom, if you look at all the requirements of FILE_FLAG_NO_BUFFERING on http://msdn.microsoft.com/library/default.asp?url=/library/en-us/fileio/base/createfile.asp, can you say offhand if the WAL code fulfills them? If I'm reading it right, you are referring to: (1) File access must begin at byte offsets within the file that are integer multiples of the volume's sector size. (2) File access must be for numbers of bytes that are integer multiples of the volume's sector size. For example, if the sector size is 512 bytes, an application can request reads and writes of 512, 1024, or 2048 bytes, but not of 335, 981, or 7171 bytes. (3) Buffer addresses for read and write operations should be sector aligned (aligned on addresses in memory that are integer multiples of the volume's sector size). Depending on the disk, this requirement may not be enforced. 1 and 2 should be no problem since we only read or write integral pages (8K). 3 is a bit bogus IMHO, or even a lot bogus. You can set ALIGNOF_BUFFER in src/include/pg_config_manual.h to whatever you think the alignment requirement really needs to be (I'd try 512). After multiple runs on different block sizes (a few anomalous results aside), I didn't see a whole lot of difference between FILE_FLAG_NO_BUFFERING being on or off for writing performance. However, with NO_BUFFERING set, the file is not *read* cached at all. While the performance is not terrible for reads, some careful consideration would have to be given to using it outside of WAL. For WAL, though, it seems perfect. If my results are to be believed, we can expect up to 30 (yes, that's three + zero) times faster sync performance by ditching FlushFileBuffers (although probably far less in practice). Applying FILE_FLAG_WRITE_THROUGH to non-WAL data files will give similar speedups to checkpoints, but right now I'm making no assumptions about the safety issue.
I'd like to point out here that using the FlushFileBuffers() sync approach it was impossible to get my 3ware RAID controller to cache the writes at all. This means that unless we change the sync method for data files, win32 will always have horrible checkpoint performance (and I do mean horrible). My suggestion would be to use FILE_FLAG_NO_BUFFERING | FILE_FLAG_WRITE_THROUGH for WAL, and FILE_FLAG_WRITE_THROUGH for everything else. Then it's time to power-fail test etc. and make sure things work the way they are supposed to. By the way, by some quirk of fate, 8k seems to be a fairly good choice of block size. 4k block sizes give slightly lower latency but not nearly as much throughput. Merlin
Re: [HACKERS] win32 performance - fsync question
Evgeny Rodichev [EMAIL PROTECTED] writes: Any claimed TPS rate exceeding your disk drive's rotation rate is a red flag. Write cache is enabled under Linux by default all the time I make deal with it (since 1993). You're playing with fire. fsync() really works fine as I switch off my notebook everyday 2-3 times, and never had any data loss :) Given that it's a notebook, it's possible that the hardware is smart enough not to power down the disk until the disk is done writing everything it's cached. Do you care to try some experiments with pulling out the battery while Postgres is busy making updates? regards, tom lane ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] win32 performance - fsync question
After multiple runs on different block sizes (a few anomalous results aside), I didn't see a whole lot of difference between FILE_FLAG_NO_BUFFERING being on or off for writing performance. However, with NO_BUFFERING set, the file is not *read* cached at all. While the performance is not terrible for reads, some careful consideration would have to be given to using it outside of WAL. For WAL, though, it seems perfect. If my results are to be believed, we can expect up to 30 (yes, that's three + zero) times faster sync performance by ditching FlushFileBuffers (although probably far less in practice). Yes, for WAL it won't blow away read-cache stuff, since we normally don't expect to read the data that's in WAL. Is there actually a reason why we don't use O_DIRECT on Unix? From what I can tell, O_SYNC does the write-through but also puts the data in the cache, whereas O_DIRECT doesn't waste cache on it? I was thinking of using O_DIRECT as the compatibility flag for the combination of FILE_FLAG_WRITE_THROUGH and NO_BUFFERING, and using O_SYNC for just the WRITE_THROUGH. Reasonable? //Magnus
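The flag mapping proposed above could look roughly like the following. This is an editorial sketch under stated assumptions, not the actual win32_open() code: the SKETCH_O_* request bits are hypothetical stand-ins for O_SYNC and O_DIRECT (whose real values differ by platform), and the function name is invented. The two attribute constants carry the values the Windows SDK headers define for FILE_FLAG_WRITE_THROUGH and FILE_FLAG_NO_BUFFERING, redefined here so the sketch compiles anywhere.

```c
#include <assert.h>
#include <stdint.h>

/* Same values as the CreateFile flags in the Windows SDK headers. */
#define SKETCH_FILE_FLAG_WRITE_THROUGH 0x80000000u
#define SKETCH_FILE_FLAG_NO_BUFFERING  0x20000000u

/* Hypothetical request bits; NOT the real O_SYNC / O_DIRECT values. */
#define SKETCH_O_SYNC   0x1u
#define SKETCH_O_DIRECT 0x2u

/*
 * Translate the sync-related open() request bits into CreateFile
 * attribute flags: O_SYNC maps to write-through only, O_DIRECT maps to
 * write-through plus no-buffering.
 */
uint32_t
sketch_translate_sync_flags(unsigned open_flags)
{
    uint32_t attrs = 0;

    /* Only the two sketch bits are understood here. */
    assert((open_flags & ~(SKETCH_O_SYNC | SKETCH_O_DIRECT)) == 0);

    if (open_flags & SKETCH_O_SYNC)
        attrs |= SKETCH_FILE_FLAG_WRITE_THROUGH;
    if (open_flags & SKETCH_O_DIRECT)
        attrs |= SKETCH_FILE_FLAG_WRITE_THROUGH | SKETCH_FILE_FLAG_NO_BUFFERING;
    return attrs;
}
```

Keeping the translation in one small function mirrors the earlier remark in the thread that the open() wrapper already translates flags wholesale, so xlog.c would not need to know the Windows spellings.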
Re: [HACKERS] win32 performance - fsync question
Magnus Hagander [EMAIL PROTECTED] writes: Is there actually a reason why we don't use O_DIRECT on Unix? Portability, or rather the complete lack of it. Stuff that isn't in the Single Unix Spec is a hard sell. regards, tom lane ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [HACKERS] win32 performance - fsync question
Evgeny Rodichev wrote: Write cache is enabled under Linux by default all the time I make deal with it (since 1993). It doesn't interfere with fsync(), as linux kernel uses cache flush for fsync. The problem is that most IDE drives lie (or perhaps you could say the specification is ambiguous) about completion of the cache-flush command -- they say Yeah, I've flushed when they have not actually written the data to the media and have no provision for making sure it will get there in the event of power failure. So Linux is indeed doing a cache flush on fsync, but the hardware is not behaving as expected. By turning off the write-cache on the disk via hdparm, you manage to get the hardware to behave better. The kernel is caching anyway, so the loss of the drive's write cache doesn't make a big difference. There was some work done for better IDE write-barrier support (related to TCQ/SATA support?) in the kernel, but I'm not sure how far that has progressed. -O ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] win32 performance - fsync question
Magnus Hagander [EMAIL PROTECTED] writes: Is there actually a reason why we don't use O_DIRECT on Unix? Portability, or rather the complete lack of it. Stuff that isn't in the Single Unix Spec is a hard sell. Well, how about this (ok, maybe I'm way out in left field): Change the fsync option from on/off to on/off/O_SYNC. On win32 we treat O_SYNC as opened with FILE_FLAG_WRITE_THROUGH. When we are in O_SYNC mode, all files, WAL or otherwise, are assumed to be synced when written and are therefore not synced during pg_fsync(). WAL syncing may of course be overridden using alternate sync methods in postgresql.conf. I suspect that this will drastically alter Windows performance, especially on RAID systems. What is TBD is the safety aspect. What I like about this is that we are now not dealing with a win32-only hack; any Unix system now has another performance setting to play with. We also don't touch the O_DIRECT flag (on win32: FILE_FLAG_WRITE_THROUGH | FILE_FLAG_NO_BUFFERING), leaving that can of worms for another day. Under normal situations, we would expect O_SYNCing everything all the time to slow stuff down, especially during checkpoints, but it might actually help on a caching RAID controller. On win32, it will help because the performance of fsync() sucks so horribly, even on RAID. Merlin
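The on/off/O_SYNC idea above amounts to a three-state setting that decides two things: whether files are opened write-through, and whether an explicit flush is still needed afterwards. A minimal editorial sketch follows; the enum and function names are hypothetical and do not correspond to any existing PostgreSQL setting or to pg_fsync() itself.

```c
#include <assert.h>

/* Hypothetical three-state version of the fsync setting. */
typedef enum
{
    SKETCH_FSYNC_OFF,       /* never force data to disk */
    SKETCH_FSYNC_ON,        /* buffered writes, explicit flush afterwards */
    SKETCH_FSYNC_OPEN_SYNC  /* every file opened write-through */
} SketchFsyncMode;

/* Should files be opened with the write-through (O_SYNC-style) flag? */
int
sketch_open_write_through(SketchFsyncMode mode)
{
    return mode == SKETCH_FSYNC_OPEN_SYNC;
}

/* Does a pg_fsync()-style call still need to issue an explicit flush? */
int
sketch_needs_explicit_flush(SketchFsyncMode mode)
{
    switch (mode)
    {
        case SKETCH_FSYNC_OFF:
            return 0;
        case SKETCH_FSYNC_ON:
            return 1;
        case SKETCH_FSYNC_OPEN_SYNC:
            return 0;       /* writes are assumed synced when they return */
    }
    assert(!"unknown fsync mode");
    return 0;
}
```

The third state skips the explicit flush only on the assumption that a write-through write is durable when it returns, which is exactly the safety question the message leaves open.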
Re: [HACKERS] Strange RETURN NEXT behaviour in Postgres 8.0
On Wed, 16 Feb 2005, Tom Lane wrote: Richard Huxton dev@archonet.com writes: I seem to remember some subtle problems with dropped columns and plpgsql functions - could be one of those still left. It looks like the code that handles returning a RECORD variable doesn't cope with dropped columns in the function result rowtype. (If you instead declare rec as usno%rowtype, you get a different set of misbehaviors after adding/dropping columns, so that code path isn't perfect either :-() Finally, I want to clarify that after copying my usno table into another one, the problems have disappeared. So I had experienced exactly the bug with dropped columns. So, is there a chance that this bug will be fixed in some 8.x Postgres release? Sergey
Re: [HACKERS] win32 performance - fsync question
Oliver Jowett [EMAIL PROTECTED] writes: So Linux is indeed doing a cache flush on fsync Actually I think the root of the problem was precisely that Linux does not issue any sort of cache flush commands to drives on fsync. There was some talk on linux-kernel of how they could take advantage of new ATA features planned for the new SATA drives coming out now to solve this. But they didn't seem to think it was urgent or worth the performance hit of doing a complete cache flush. -- greg
Re: [HACKERS] win32 performance - fsync question
Greg Stark wrote: Oliver Jowett [EMAIL PROTECTED] writes: So Linux is indeed doing a cache flush on fsync Actually I think the root of the problem was precisely that Linux does not issue any sort of cache flush commands to drives on fsync. There was some talk on linux-kernel of how they could take advantage of new ATA features planned for the new SATA drives coming out now to solve this. But they didn't seem to think it was urgent or worth the performance hit of doing a complete cache flush. Oh, ok. I haven't really kept up to date with it; I just run with write-cache disabled on my IDE drives as a matter of course. I did see this: http://www.ussg.iu.edu/hypermail/linux/kernel/0304.1/0471.html which implies you're never going to get an implementation that is safe across all IDE hardware :( -O
Re: [HACKERS] win32 performance - fsync question
On Thu, 17 Feb 2005, Tom Lane wrote: Evgeny Rodichev [EMAIL PROTECTED] writes: Any claimed TPS rate exceeding your disk drive's rotation rate is a red flag. Write cache has been enabled under Linux by default for as long as I have dealt with it (since 1993). You're playing with fire. Yes. I'm lucky in this play :) More seriously, we (with Oleg Bartunov) investigated many platforms/OSes for commercial, scientific and other applications during the past 10-12 years. I suppose virtually all of them, excluding modern mainframes. For reliability, Linux + PostgreSQL was found to be the best (including environments with very frequent unexpected power-offs, as at some astronomical observatories in the high mountains). Hence, I'm lucky :) fsync() really works fine, as I switch off my notebook 2-3 times every day and have never had any data loss :) Given that it's a notebook, it's possible that the hardware is smart enough not to power down the disk until the disk is done writing everything it's cached. Do you care to try some experiments with pulling out the battery while Postgres is busy making updates? Yes, you are exactly right. All modern HDDs (not entry-level ones) have a huge cache (at the device, not at the controller), and provide a safe hardware flush of the cache *after* power-off (thanks to capacitors). My HDD has a 16MB cache, and it is the reason for the excellent performance. Regards, E.R. _ Evgeny Rodichev Sternberg Astronomical Institute email: [EMAIL PROTECTED] Moscow State University Phone: 007 (095) 939 2383 Fax: 007 (095) 932 8841 http://www.sai.msu.su/~er
Re: [HACKERS] win32 performance - fsync question
On Fri, 18 Feb 2005, Oliver Jowett wrote: Evgeny Rodichev wrote: Write cache has been enabled under Linux by default for as long as I have dealt with it (since 1993). It doesn't interfere with fsync(), as the Linux kernel uses a cache flush for fsync. The problem is that most IDE drives lie (or perhaps you could say the specification is ambiguous) about completion of the cache-flush command -- they say "Yeah, I've flushed" when they have not actually written the data to the media and have no provision for making sure it will get there in the event of power failure. Yes, I agree. But in my real SA practice I have 50-100 times met the situation where HDDs were unexpectedly physically damaged (the heads touched the surface), with no possibility of recovery. And I have never met any corruption caused by a possible hardware lie. So Linux is indeed doing a cache flush on fsync, but the hardware is not behaving as expected. By turning off the write-cache on the disk via hdparm, you manage to get the hardware to behave better. The kernel is caching anyway, so the loss of the drive's write cache doesn't make a big difference. Again, in practice, it is different. FreeBSD had a true flush (at least 2-3 years ago, not sure about the modern versions), and for write-intensive applications it was a bit slower (compared with Linux), but it never was more reliable (since 1996, at least). Another practical example is Google :) Isn't it reliable? There was some work done for better IDE write-barrier support (related to TCQ/SATA support?) in the kernel, but I'm not sure how far that has progressed. Yes, but IMHO it is not stable enough at the moment. Regards, E.R. _ Evgeny Rodichev Sternberg Astronomical Institute email: [EMAIL PROTECTED] Moscow State University Phone: 007 (095) 939 2383 Fax: 007 (095) 932 8841 http://www.sai.msu.su/~er
Re: [HACKERS] win32 performance - fsync question
On Fri, 17 Feb 2005, Greg Stark wrote: Oliver Jowett [EMAIL PROTECTED] writes: So Linux is indeed doing a cache flush on fsync Actually I think the root of the problem was precisely that Linux does not issue any sort of cache flush commands to drives on fsync. No, it does. Let's try the simplest test: for (i = 0; i < LEN; i++) { write (fd, buf, 512); if (sync) fsync (fd); } with sync = 0 and 1, and you'll see the difference. There was some talk on linux-kernel of how they could take advantage of new ATA features planned for the new SATA drives coming out now to solve this. But they didn't seem to think it was urgent or worth the performance hit of doing a complete cache flush. It was a bit different topic. Regards, E.R. _ Evgeny Rodichev Sternberg Astronomical Institute email: [EMAIL PROTECTED] Moscow State University Phone: 007 (095) 939 2383 Fax: 007 (095) 932 8841 http://www.sai.msu.su/~er
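The write/fsync loop in the message above can be fleshed out into a small timing harness. This is an editorial sketch for POSIX systems, not the poster's actual test program: the function name, the temporary file location under /tmp, and the use of clock_gettime are all assumptions added to make the loop self-contained.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

#define SKETCH_BLOCK 512

/*
 * Write `count` 512-byte blocks to a scratch file, calling fsync() after
 * each write when do_sync is nonzero. Returns the elapsed wall-clock time
 * in seconds, or -1.0 if any call fails.
 */
double
sketch_timed_writes(int count, int do_sync)
{
    char            path[] = "/tmp/fsync_sketch_XXXXXX";
    char            buf[SKETCH_BLOCK];
    struct timespec t0, t1;
    double          elapsed = -1.0;
    int             fd, i;

    assert(count >= 0);
    memset(buf, 'x', sizeof buf);

    fd = mkstemp(path);
    if (fd < 0)
        return -1.0;
    unlink(path);               /* file disappears once fd is closed */

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (i = 0; i < count; i++)
    {
        if (write(fd, buf, sizeof buf) != (ssize_t) sizeof buf)
            goto done;
        if (do_sync && fsync(fd) != 0)
            goto done;
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);
    elapsed = (double) (t1.tv_sec - t0.tv_sec)
            + (double) (t1.tv_nsec - t0.tv_nsec) / 1e9;

done:
    close(fd);
    return elapsed;
}
```

Comparing sketch_timed_writes(1000, 0) with sketch_timed_writes(1000, 1) shows whether fsync() costs anything at all on a given system. As the replies in this thread point out, a difference alone does not prove the data reached the platter, since the drive's own write cache may still be acknowledging writes early.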
Re: [HACKERS] Help me recovering data
Greg Stark wrote: Gaetano Mendola [EMAIL PROTECTED] writes: We do ~4000 txn/minute so in 6 months you are screwed... Sure, but if you ran without vacuuming for 6 months, wouldn't you notice the huge slowdowns from all those dead tuples before that? In my applications, yes, for sure I see the huge slowdown after 2 days without it, but given the fact that someone crossed the limit, I imagine that it is possible without a performance loss. Regards Gaetano Mendola
Re: [HACKERS] win32 performance - fsync question
Magnus Hagander [EMAIL PROTECTED] news:[EMAIL PROTECTED] This is what we have discovered. AFAIK, all other major databases or other similar apps (like Exchange or AD) open files with FILE_FLAG_WRITE_THROUGH and do *not* use fsync. It might give noticeably better performance with O_DIRECT-style WAL logging at least. But I'm unsure if the current code for O_DIRECT works on win32 - I think it needs some fixing for that. Which might be worth looking at for 8.1. UPS will not help you. UPS does not help you if the OS crashes (hey, you're on Windows, this *does* happen). UPS does not help you if somebody accidentally pulls the plug between the UPS and the server. UPS does not help you if your server overheats and shuts down. Bottom line, there are lots of cases when a UPS does not help. Having a UPS (preferably redundant UPSes feeding redundant power supplies - this is not at all expensive today) is certainly a good thing, but it is *not* a replacement for fsync. On *any* platform. //Magnus Oracle9 and SQL Server 2000 use this flag. Some comments on the lost-data concern about FILE_FLAG_WRITE_THROUGH: (1) Assume you just use ordinary SCSI disks with write-back cache on: you will lose your data if the server suddenly loses power; you will *not* lose your data when the OS crashes, the server resets, or whatever, as long as the server has power. This has been verified with Oracle9 and SQL Server 2000. (2) Turn off the write-back cache in the disks and you will not lose data, but you will see your performance decrease. (3) If you use some advanced expensive disks like the battery-equipped ones, then you can safely enable write-back cache. So a UPS is useful for ordinary SCSI disks when write-back cache is enabled, but make sure you don't let the unfortunate "somebody accidentally pulls the plug between the UPS and the server" thing happen.
Re: [HACKERS] win32 performance - fsync question
Evgeny Rodichev [EMAIL PROTECTED] writes: No, it does. Let's try the simplest test: for (i = 0; i < LEN; i++) { write (fd, buf, 512); if (sync) fsync (fd); } with sync = 0 and 1, and you'll see the difference. Uh, I'm sure you'll see a difference: one will be limited by the i/o throughput the IDE interface is capable of, the other will be limited purely by the memory bandwidth and kernel syscall latency. Try it with sync=1 and write caching disabled on your IDE drive and you should see an even larger difference. However, no filesystem and ide driver combination in linux 2.4 and afaik none in 2.6 either issue any special ATA commands to force the drive to There was some talk on linux-kernel of how they could take advantage of new ATA features planned for the new SATA drives coming out now to solve this. But they didn't seem to think it was urgent or worth the performance hit of doing a complete cache flush. It was a bit different topic. Well, there's no way to tell if we're talking about the same threads. But in the discussion I saw, it was clear they were talking about adding an interface to drivers for filesystems to issue cache flushes when necessary to guarantee filesystem integrity. They still didn't seem to get that users cared about their data too, not just filesystem integrity. -- greg
Re: [HACKERS] Help me recovering data
Tom Lane wrote: Gaetano Mendola [EMAIL PROTECTED] writes: BTW, why not do an automatic vacuum instead of a shutdown? At least the DB would not stop working until someone studies what the problem is and how to solve it. No, the entire point of this discussion is to whup the DBA upside the head with a big enough cluestick to get him to install autovacuum. Once autovacuum is default, it won't matter anymore. I have a concern about this that I hope is just based on some misunderstanding on my part. My concern is: suppose that a database is modified extremely infrequently? So infrequently, in fact, that over a billion read transactions occur before the next write transaction. Once that write transaction occurs, you're hosed, right? Autovacuum won't catch this because it takes action based on the write activity that occurs in the tables. So: will autovacuum be coded to explicitly look for transaction wraparound, or to automatically vacuum every N transactions (e.g., 500 million)? -- Kevin Brown [EMAIL PROTECTED]
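The "vacuum every N transactions" idea in the question above could be expressed as an age check on the transaction counter. This is an editorial sketch, not autovacuum code: the function names and the notion of an "oldest unvacuumed XID" are hypothetical, the counter is assumed to be a 32-bit value that wraps, and the 500 million threshold is simply the example figure from the message.

```c
#include <assert.h>
#include <stdint.h>

/* Example threshold from the message above: 500 million transactions. */
#define SKETCH_FORCE_VACUUM_AGE 500000000u

/*
 * Number of transaction IDs consumed since `oldest_xid`, computed modulo
 * 2^32 so the result stays correct after the counter wraps around.
 */
uint32_t
sketch_xid_age(uint32_t current_xid, uint32_t oldest_xid)
{
    return current_xid - oldest_xid;
}

/* Nonzero when enough transactions have passed to force a vacuum. */
int
sketch_needs_wraparound_vacuum(uint32_t current_xid, uint32_t oldest_xid)
{
    /* The trigger must fire well before half the 32-bit space is used. */
    assert(SKETCH_FORCE_VACUUM_AGE < 2147483648u);
    return sketch_xid_age(current_xid, oldest_xid) >= SKETCH_FORCE_VACUUM_AGE;
}
```

Because the check depends only on how far the counter has advanced, it fires for a read-mostly database as well, which is the case the question worries about; whether autovacuum should work this way is left open by the thread.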