Re: [HACKERS] Analysis of ganged WAL writes

2002-10-18 Thread Mats Lofkvist
[EMAIL PROTECTED] (Tom Lane) writes: [snip] So this does seem to be a nice win, and unless I hear objections I will apply it ... It does indeed look like a great improvement, so is the fix going to be merged to the 7.3 branch or is it too late for that? _ Mats Lofkvist [EMAIL

Re: [HACKERS] Analysis of ganged WAL writes

2002-10-18 Thread Tom Lane
Mats Lofkvist [EMAIL PROTECTED] writes: It does indeed look like a great improvement, so is the fix going to be merged to the 7.3 branch or is it too late for that? Yes, been there done that ... regards, tom lane ---(end of

Re: [HACKERS] Analysis of ganged WAL writes

2002-10-08 Thread Curtis Faith
Since in your case all transactions A-E want the same buffer written, the memory (not its content) will also be the same. But no, it won't: the successive writes will ask to write different snapshots of the same buffer. Successive writes would write different NON-OVERLAPPING sections of

Re: [HACKERS] Analysis of ganged WAL writes

2002-10-08 Thread Tom Lane
Curtis Faith [EMAIL PROTECTED] writes: Successive writes would write different NON-OVERLAPPING sections of the same log buffer. It wouldn't make sense to send three separate copies of the entire block. That could indeed cause problems. So you're going to undo the code's present property that

Re: [HACKERS] Analysis of ganged WAL writes

2002-10-08 Thread Curtis Faith
Curtis Faith [EMAIL PROTECTED] writes: Successive writes would write different NON-OVERLAPPING sections of the same log buffer. It wouldn't make sense to send three separate copies of the entire block. That could indeed cause problems. So you're going to undo the code's present property

Re: [HACKERS] Analysis of ganged WAL writes

2002-10-08 Thread Curtis Faith
Curtis Faith [EMAIL PROTECTED] writes: I'm not really worried about doing page-in reads because the disk's internal buffers should contain most of the blocks surrounding the end of the log file. If the successive partial writes exceed a block (which they will in heavy use) then most of

Re: [HACKERS] Analysis of ganged WAL writes

2002-10-08 Thread Greg Copeland
On Tue, 2002-10-08 at 04:15, Zeugswetter Andreas SB SD wrote: Can the magic be, that kaio directly writes from user space memory to the disk ? Since in your case all transactions A-E want the same buffer written, the memory (not its content) will also be the same. This would automatically

Re: [HACKERS] Analysis of ganged WAL writes

2002-10-08 Thread Tom Lane
Zeugswetter Andreas SB SD [EMAIL PROTECTED] writes: Can the magic be, that kaio directly writes from user space memory to the disk ? This makes more assumptions about the disk drive's behavior than I think are justified... Since in your case all transactions A-E want the same buffer

Re: [HACKERS] Analysis of ganged WAL writes

2002-10-08 Thread Curtis Faith
Your example of 1 trx/proc/rev will work _only_ if no more and no less than 1/4 of platter is filled by _other_ log writers. Not really, if 1/2 the platter has been filled we'll still get one more commit in for a given rotation. If more than a rotation's worth of writing has occurred that

Re: [HACKERS] Analysis of ganged WAL writes

2002-10-08 Thread Zeugswetter Andreas SB SD
Can the magic be, that kaio directly writes from user space memory to the disk ? This makes more assumptions about the disk drive's behavior than I think are justified... No, no assumption about the drive, only the kaio implementation, namely, that the kaio implementation reads the

Re: [HACKERS] Analysis of ganged WAL writes

2002-10-08 Thread Hannu Krosing
Curtis Faith kirjutas T, 08.10.2002 kell 01:04: I may be missing something obvious, but I don't see a way to get more than 1 trx/process/revolution, as each previous transaction in that process must be written to disk before the next can start, and the only way it can be written to the

Re: [HACKERS] Analysis of ganged WAL writes

2002-10-08 Thread Zeugswetter Andreas SB SD
ISTM aio_write only improves the picture if there's some magic in-kernel processing that makes this same kind of judgment as to when to issue the ganged write for real, and is able to do it on time because it's in the kernel. I haven't heard anything to make me think that that feature

Re: [HACKERS] Analysis of ganged WAL writes

2002-10-07 Thread Hannu Krosing
Tom Lane kirjutas E, 07.10.2002 kell 01:07: To test this, I made a modified version of pgbench in which each transaction consists of a simple insert into table_NNN values(0); where each client thread has a separate insertion target table. This is about the simplest transaction I

Re: [HACKERS] Analysis of ganged WAL writes

2002-10-07 Thread Tom Lane
Hannu Krosing [EMAIL PROTECTED] writes: in an ideal world this would be 5*120=600 tps. Have you any good ideas what holds it back for the other 300 tps ? Well, recall that the CPU usage was about 20% in the single-client test. (The reason I needed a variant version of pgbench is that this

Re: [HACKERS] Analysis of ganged WAL writes

2002-10-07 Thread Tom Lane
I wrote: That says that the best possible throughput on this test scenario is 5 transactions per disk rotation --- the CPU is just not capable of doing more. I am actually getting about 4 xact/rotation for 10 or more clients (in fact it seems to reach that plateau at 8 clients, and be close
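The ceiling discussed here can be reproduced with a few lines of arithmetic. This is only a sketch under stated assumptions: the 120 rotations per second comes from the "5*120=600 tps" figure quoted in the thread (it matches a 7200 rpm drive), and the 20% figure is the single-client CPU usage, read here as the CPU cost of one transaction per rotation. The function names are made up for illustration.

```python
# Back-of-the-envelope model of the CPU-bound commit ceiling.
# Assumption: one transaction per rotation costs about 20% of the CPU
# (the single-client figure quoted in the thread).
ROTATIONS_PER_SEC = 120        # from "5*120=600 tps"; 7200 rpm / 60
CPU_PERCENT_PER_XACT = 20      # single-client CPU usage, in percent


def max_xacts_per_rotation(cpu_percent_per_xact):
    """How many transactions fit in one rotation before the CPU saturates."""
    return 100 // cpu_percent_per_xact


def tps(xacts_per_rotation, rotations_per_sec=ROTATIONS_PER_SEC):
    """Transactions per second for a given number of commits per rotation."""
    return xacts_per_rotation * rotations_per_sec


ceiling = max_xacts_per_rotation(CPU_PERCENT_PER_XACT)
print("CPU-bound ceiling:", ceiling, "xact/rotation,", tps(ceiling), "tps")
print("At the reported plateau of about 4 xact/rotation:", tps(4), "tps")
```

Under these assumptions the reported plateau of about 4 transactions per rotation sits at four fifths of the CPU-bound ceiling.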

Re: [HACKERS] Analysis of ganged WAL writes

2002-10-07 Thread Curtis Faith
Tom, first of all, excellent job improving the current algorithm. I'm glad you looked at the WALCommitLock code. This must be so because the backends that are released at the end of any given disk revolution will not be able to participate in the next group commit, if there is already at least

Re: [HACKERS] Analysis of ganged WAL writes

2002-10-07 Thread Tom Lane
Curtis Faith [EMAIL PROTECTED] writes: Even the theoretical limit you mention of one transaction per revolution per committing process seems like a significant bottleneck. Well, too bad. If you haven't gotten your commit record down to disk, then *you have not committed*. This is not

Re: [HACKERS] Analysis of ganged WAL writes

2002-10-07 Thread Hannu Krosing
On Tue, 2002-10-08 at 00:12, Curtis Faith wrote: Tom, first of all, excellent job improving the current algorithm. I'm glad you looked at the WALCommitLock code. This must be so because the backends that are released at the end of any given disk revolution will not be able to participate

Re: [HACKERS] Analysis of ganged WAL writes

2002-10-07 Thread Hannu Krosing
On Tue, 2002-10-08 at 01:27, Tom Lane wrote: The scheme we now have (with my recent patch) essentially says that the commit delay seen by any one transaction is at most two disk rotations. Unfortunately it's also at least one rotation :-(, except in the case where there is no contention, ie,

Re: [HACKERS] Analysis of ganged WAL writes

2002-10-07 Thread Curtis Faith
Well, too bad. If you haven't gotten your commit record down to disk, then *you have not committed*. This is not negotiable. (If you think it is, then turn off fsync and quit worrying ;-)) I've never disputed this, so if I seem to be suggesting that, I've been unclear. I'm just assuming

Re: [HACKERS] Analysis of ganged WAL writes

2002-10-07 Thread Greg Copeland
On Mon, 2002-10-07 at 16:06, Curtis Faith wrote: Well, too bad. If you haven't gotten your commit record down to disk, then *you have not committed*. This is not negotiable. (If you think it is, then turn off fsync and quit worrying ;-)) At this point, I think we've come full circle.

Re: [HACKERS] Analysis of ganged WAL writes

2002-10-07 Thread Justin Clift
Greg Copeland wrote: snip If so, I assume it would become a configure option (--with-aio)? Or maybe a GUC use_aio ? :-) Regards and best wishes, Justin Clift Regards, Greg

Re: [HACKERS] Analysis of ganged WAL writes

2002-10-07 Thread Tom Lane
Curtis Faith [EMAIL PROTECTED] writes: Well, too bad. If you haven't gotten your commit record down to disk, then *you have not committed*. This is not negotiable. (If you think it is, then turn off fsync and quit worrying ;-)) I've never disputed this, so if I seem to be suggesting that,

Re: [HACKERS] Analysis of ganged WAL writes

2002-10-07 Thread Greg Copeland
Well, I was thinking that aio may not be available on all platforms, thus the conditional compile option. On the other hand, wouldn't you pretty much want it either on or off for all instances? I can see that it would be nice for testing though. ;) Greg On Mon, 2002-10-07 at 16:23, Justin

Re: [HACKERS] Analysis of ganged WAL writes

2002-10-07 Thread Curtis Faith
I may be missing something obvious, but I don't see a way to get more than 1 trx/process/revolution, as each previous transaction in that process must be written to disk before the next can start, and the only way it can be written to the disk is when the disk heads are on the right place

Re: [HACKERS] Analysis of ganged WAL writes

2002-10-06 Thread Tom Lane
I said: There is a simple error in the current code that is easily corrected: in XLogFlush(), the wait to acquire WALWriteLock should occur before, not after, we try to acquire WALInsertLock and advance our local copy of the write request pointer. (To be exact, xlog.c lines 1255-1269 in CVS
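The effect of the lock ordering described above can be illustrated with a toy model. This is not the xlog.c code: it is a deterministic sketch with hypothetical names, and it assumes that every waiting backend's commit record is already in the WAL buffer by the time the first waiter gets the write lock. It also simplifies the stale request pointer down to each backend's own commit position.

```python
def count_flushes(commit_positions, sample_after_lock):
    """Count physical flushes needed to commit all queued backends.

    commit_positions: WAL position each backend needs flushed, listed in
        the order the backends obtain the write lock.
    sample_after_lock: if True, a backend reads the newest request
        pointer after it holds the write lock (the corrected ordering);
        if False, it uses the value it saw before waiting (modelled here
        as its own commit position).
    """
    flushed_upto = 0
    newest = max(commit_positions)  # everything already inserted in the buffer
    flushes = 0
    for pos in commit_positions:
        if pos <= flushed_upto:
            continue                # an earlier flush already covered us
        target = newest if sample_after_lock else pos
        flushed_upto = target
        flushes += 1
    return flushes


waiters = [10, 20, 30, 40, 50]
print("pointer sampled before waiting:", count_flushes(waiters, False))
print("pointer sampled after waiting: ", count_flushes(waiters, True))
```

In this model, sampling the pointer only after the write lock is held lets the first waiter flush everything inserted so far, so the backends queued behind it find their records already on disk and return without writing.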

Re: [HACKERS] Analysis of ganged WAL writes

2002-10-06 Thread Greg Copeland
On Sun, 2002-10-06 at 18:07, Tom Lane wrote: CPU loading goes from 80% idle at 1 client to 50% idle at 5 clients to 10% idle at 10 or more. So this does seem to be a nice win, and unless I hear objections I will apply it ... Wow Tom! That's wonderful! On the other hand, maybe people

Re: [HACKERS] Analysis of ganged WAL writes

2002-10-06 Thread Rod Taylor
On Sun, 2002-10-06 at 19:35, Greg Copeland wrote: On Sun, 2002-10-06 at 18:07, Tom Lane wrote: CPU loading goes from 80% idle at 1 client to 50% idle at 5 clients to 10% idle at 10 or more. So this does seem to be a nice win, and unless I hear objections I will apply it ...