Re: [HACKERS] [PERFORM] insert performance for win32
On Fri, 2005-11-04 at 13:21 -0500, Bruce Momjian wrote: David Fetter wrote: On Fri, Nov 04, 2005 at 01:01:20PM -0500, Tom Lane wrote: I'm inclined to treat this as an outright bug, not just a minor performance issue, because it implies that a sufficiently long psql script would probably crash a Windows machine. Ouch. In light of this, are we *sure* what we've got a is a candidate for release? Good point. It is something we would fix in a minor release, so it doesn't seem worth doing another RC just for that. Will this be documented in the release notes? If we put unimplemented features in TODO, where do we list things we regard as bugs? Best Regards, Simon Riggs ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [PERFORM] insert performance for win32
You mean something like the attached? not quite: attached is a file to generate test. to do it: psql yadda \i timeit.sql \t \o dump.sql select make_dump(5, false); \q cat dump.sql | psql -q yadda and see what pops out. I had to do it that way because redirecting psql to dump file caused psql sit forever waiting on more with cpu load... Merlin timeit.sql Description: timeit.sql ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] insert performance for win32
You mean something like the attached? oh, btw I ran timeit.c and performance is flat and fairly fast. I'm pretty sure psql is the culprit here. Merlin ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [PERFORM] insert performance for win32
Merlin Moncure [EMAIL PROTECTED] writes: You mean something like the attached? not quite: attached is a file to generate test. cat dump.sql | psql -q yadda Ah. Does your psql have readline support? if so, does adding -n to that command change anything? regards, tom lane ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] insert performance for win32
not quite: attached is a file to generate test. cat dump.sql | psql -q yadda Ah. Does your psql have readline support? if so, does adding -n to that command change anything? It doesn't, and it doesn't. :/ Ok, here's where it gets interesting. I removed all the newlines from the test output (dump.sql) and got flat times ;). Merlin ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] insert performance for win32
Merlin Moncure [EMAIL PROTECTED] writes: It doesn't, and it doesn't. :/ Ok, here's where it gets interesting. I removed all the newlines from the test output (dump.sql) and got flat times ;). That's bizarre ... I'd have thought a very long line would be more likely to trigger internal performance problems than the original. What happens if you read the file with psql -f dump.sql instead of cat/stdin? BTW, I get flat times for your psql test case on Unix, again both with local and remote client. So whatever is going on here, it's Windows-specific. regards, tom lane ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [PERFORM] insert performance for win32
That's bizarre ... I'd have thought a very long line would be more likely to trigger internal performance problems than the original. What happens if you read the file with psql -f dump.sql instead of cat/stdin? non-flat. Also ran via \i and got non flat times. BTW, I get flat times for your psql test case on Unix, again both with local and remote client. So whatever is going on here, it's Windows-specific. yeah. I'm guessing problem is in the mingw flex/bison (which I really, really hope is not the case) or some other win32 specific block of code. I'm snooping around there... Merlin ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [PERFORM] insert performance for win32
Merlin Moncure [EMAIL PROTECTED] writes: yeah. I'm guessing problem is in the mingw flex/bison (which I really, really hope is not the case) or some other win32 specific block of code. I'm snooping around there... Maybe I'm confused here, but I thought we had established that the local and remote cases behave differently for you? If so I'd suppose that it must be a networking issue, and there's little point in looking inside psql. If the problem is internal to psql, gprof or similar tool would be helpful ... got anything like that on Windows? regards, tom lane ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] insert performance for win32
Merlin Moncure [EMAIL PROTECTED] writes: yeah. I'm guessing problem is in the mingw flex/bison (which I really, really hope is not the case) or some other win32 specific block of code. I'm snooping around there... Maybe I'm confused here, but I thought we had established that the local and remote cases behave differently for you? If so I'd suppose that it must be a networking issue, and there's little point in looking inside psql. The local case is *worse*...presumably because psql is competing with the server for cpu time...cpu load is pegged at 100%. On the remote case, I'm getting 50-60% cpu load which is way to high. The problem is definitely in psql. Merlin ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] insert performance for win32
ok, here is gprof output from newlines/no newlines [newlines] % cumulative self self total time seconds secondscalls s/call s/call name 19.03 0.67 0.671 0.67 3.20 MainLoop 17.61 1.29 0.62 500031 0.00 0.00 yylex 15.63 1.84 0.55 1500094 0.00 0.00 GetVariable 11.08 2.23 0.39 250018 0.00 0.00 SendQuery 4.26 2.38 0.15 750051 0.00 0.00 GetVariableBool 3.41 2.50 0.12 250024 0.00 0.00 SetVariable 2.56 2.59 0.09 250015 0.00 0.00 gets_fromFile 2.27 2.67 0.08 750044 0.00 0.00 yy_switch_to_buffer 2.27 2.75 0.08 500031 0.00 0.00 psql_scan 2.27 2.83 0.08 pg_strcasecmp 1.70 2.89 0.06 4250078 0.00 0.00 emit 1.70 2.95 0.06 500031 0.00 0.00 VariableEquals 1.70 3.01 0.06 250018 0.00 0.00 AcceptResult 1.42 3.06 0.05 250018 0.00 0.00 ResetCancelConn [no newlines] % cumulative self self total time seconds secondscalls s/call s/call name 23.01 0.26 0.26 250019 0.00 0.00 yylex 19.47 0.48 0.22 250018 0.00 0.00 SendQuery 11.50 0.61 0.13 170 0.00 0.00 GetVariable 9.73 0.72 0.11 250042 0.00 0.00 pg_strdup 9.73 0.83 0.11 250024 0.00 0.00 SetVariable 6.19 0.90 0.07 500039 0.00 0.00 GetVariableBool 5.31 0.96 0.06 pg_strcasecmp 4.42 1.01 0.05 4250078 0.00 0.00 emit 2.65 1.04 0.031 0.03 1.01 MainLoop ok, mingw gprof is claiming MainLoop is a culprit here, along with general efficiency penalty otherwise in several things (twice many calls to yylex, 33%more to getvariable, etc). Just for fun I double checked string len of query input to SendQuery and everything is the right length. Same # calls to SendQuery, but 2.5 times call time in newlines case...anything jump out? Merlin ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] insert performance for win32
Merlin Moncure [EMAIL PROTECTED] writes: ok, mingw gprof is claiming MainLoop is a culprit here, The only thing I can see that would be different for Windows is the SetConsoleCtrlHandler kernel call ... could that be expensive? Why do we have either sigsetjmp or setup_cancel_handler inside the per-line loop, rather than just before it? There is a lot of stuff in MainLoop that doesn't seem like it really needs to be done on every single line, particularly not the repeated fetching of psql variables that couldn't possibly change except inside HandleSlashCmds. But that all ought to be the same on Unix or Windows. regards, tom lane ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] insert performance for win32
Nailed it. problem is in mainloop.c - setup_cancel_handler. Apparently you can have multiple handlers and windows keeps track of them all, even if they do the same thing. Keeping track of so many system handles would naturally slow the whole process down. Commenting that line times are flat as a pancake. I am thinking keeping track of a global flag would be appropriate. Merlin ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] insert performance for win32
Merlin Moncure [EMAIL PROTECTED] writes: Nailed it. problem is in mainloop.c - setup_cancel_handler. Apparently you can have multiple handlers and windows keeps track of them all, even if they do the same thing. Keeping track of so many system handles would naturally slow the whole process down. Yipes. So we really want to do that only once. AFAICS it is appropriate to move the sigsetjmp and setup_cancel_handler calls in front of the per-line loop inside MainLoop --- can anyone see a reason not to? I'm inclined to treat this as an outright bug, not just a minor performance issue, because it implies that a sufficiently long psql script would probably crash a Windows machine. regards, tom lane ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [PERFORM] insert performance for win32
Merlin Moncure [EMAIL PROTECTED] writes: Nailed it. problem is in mainloop.c - setup_cancel_handler. Apparently you can have multiple handlers and windows keeps track of them all, even if they do the same thing. Keeping track of so many system handles would naturally slow the whole process down. Yipes. So we really want to do that only once. AFAICS it is appropriate to move the sigsetjmp and setup_cancel_handler calls in front of the per-line loop inside MainLoop --- can anyone see a reason not to? hm. mainloop is re-entrant, right? That means each \i would reset the handler...what is downside to keeping global flag? I'm inclined to treat this as an outright bug, not just a minor certainly... performance issue, because it implies that a sufficiently long psql script would probably crash a Windows machine. actually, it's worse than that, it's more of a dos on the whole system, as windows will eventually stop granting handles, but there is a good chance of side effects on other applications. Merlin ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] insert performance for win32
Merlin Moncure [EMAIL PROTECTED] writes: AFAICS it is appropriate to move the sigsetjmp and setup_cancel_handler calls in front of the per-line loop inside MainLoop --- can anyone see a reason not to? hm. mainloop is re-entrant, right? That means each \i would reset the handler...what is downside to keeping global flag? Ah, right, and in fact I'd missed the comment at line 325 pointing out that we're relying on the sigsetjmp to be re-executed every time through. That could be improved on, likely, but not right before a release. Does the flag need to be global? I'm thinking void setup_cancel_handler(void) { + static bool done = false; + + if (!done) SetConsoleCtrlHandler(consoleHandler, TRUE); + done = true; } regards, tom lane ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] insert performance for win32
Tom Lane wrote: Merlin Moncure [EMAIL PROTECTED] writes: Nailed it. problem is in mainloop.c - setup_cancel_handler. Apparently you can have multiple handlers and windows keeps track of them all, even if they do the same thing. Keeping track of so many system handles would naturally slow the whole process down. Yipes. So we really want to do that only once. AFAICS it is appropriate to move the sigsetjmp and setup_cancel_handler calls in front of the per-line loop inside MainLoop --- can anyone see a reason not to? Nope. I'm inclined to treat this as an outright bug, not just a minor performance issue, because it implies that a sufficiently long psql script would probably crash a Windows machine. Agreed. -- Bruce Momjian| http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup.| Newtown Square, Pennsylvania 19073 ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] [PERFORM] insert performance for win32
David Fetter [EMAIL PROTECTED] writes: On Fri, Nov 04, 2005 at 01:01:20PM -0500, Tom Lane wrote: I'm inclined to treat this as an outright bug, not just a minor performance issue, because it implies that a sufficiently long psql script would probably crash a Windows machine. Ouch. In light of this, are we *sure* what we've got a is a candidate for release? Sure. This problem exists in 8.0.* too. Pre-existing bugs don't disqualify an RC in my mind --- we fix them and move on, same as we would do at any other time. regards, tom lane ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] [PERFORM] insert performance for win32
David Fetter wrote: On Fri, Nov 04, 2005 at 01:01:20PM -0500, Tom Lane wrote: Merlin Moncure [EMAIL PROTECTED] writes: Nailed it. problem is in mainloop.c - setup_cancel_handler. Apparently you can have multiple handlers and windows keeps track of them all, even if they do the same thing. Keeping track of so many system handles would naturally slow the whole process down. Yipes. So we really want to do that only once. AFAICS it is appropriate to move the sigsetjmp and setup_cancel_handler calls in front of the per-line loop inside MainLoop --- can anyone see a reason not to? I'm inclined to treat this as an outright bug, not just a minor performance issue, because it implies that a sufficiently long psql script would probably crash a Windows machine. Ouch. In light of this, are we *sure* what we've got a is a candidate for release? Good point. It is something we would fix in a minor release, so it doesn't seem worth doing another RC just for that. -- Bruce Momjian| http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001 + If your life is a hard drive, | 13 Roberts Road + Christ can be your backup.| Newtown Square, Pennsylvania 19073 ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [PERFORM] insert performance for win32
I'm inclined to treat this as an outright bug, not just a minor certainly... performance issue, because it implies that a sufficiently long psql script would probably crash a Windows machine. actually, it's worse than that, it's more of a dos on the whole system, as windows will eventually stop granting handles, but there is a good chance of side effects on other applications. Does it actually use up *handles* there? I don't see anything in the docs that says it should do that - and they usually do document when handles are used. You should be seeing a *huge* increase in system handles very fast if it does, right? That said, I definitly agree with calling it a bug :-) //Magnus ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [PERFORM] insert performance for win32
Merlin Moncure [EMAIL PROTECTED] wrote ok, here is gprof output from newlines/no newlines [newlines] % cumulative self self total time seconds secondscalls s/call s/call name 19.03 0.67 0.671 0.67 3.20 MainLoop 17.61 1.29 0.62 500031 0.00 0.00 yylex 15.63 1.84 0.55 1500094 0.00 0.00 GetVariable 11.08 2.23 0.39 250018 0.00 0.00 SendQuery 4.26 2.38 0.15 750051 0.00 0.00 GetVariableBool 3.41 2.50 0.12 250024 0.00 0.00 SetVariable 2.56 2.59 0.09 250015 0.00 0.00 gets_fromFile 2.27 2.67 0.08 750044 0.00 0.00 yy_switch_to_buffer 2.27 2.75 0.08 500031 0.00 0.00 psql_scan 2.27 2.83 0.08 pg_strcasecmp 1.70 2.89 0.06 4250078 0.00 0.00 emit 1.70 2.95 0.06 500031 0.00 0.00 VariableEquals 1.70 3.01 0.06 250018 0.00 0.00 AcceptResult 1.42 3.06 0.05 250018 0.00 0.00 ResetCancelConn Maybe I missed some threads do you think it is interesting to test the *absoulte* time difference of the same machine on Windows/Linux by using timeit.c? I wonder if windows is slower than Linux ... Regards, Qingqing ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] [PERFORM] insert performance for win32
AFAICS it is appropriate to move the sigsetjmp and setup_cancel_handler calls in front of the per-line loop inside MainLoop --- can anyone see a reason not to? hm. mainloop is re-entrant, right? That means each \i would reset the handler...what is downside to keeping global flag? Ah, right, and in fact I'd missed the comment at line 325 pointing out that we're relying on the sigsetjmp to be re-executed every time through. That could be improved on, likely, but not right before a release. Does the flag need to be global? I'm thinking void setup_cancel_handler(void) { + static bool done = false; + + if (!done) SetConsoleCtrlHandler(consoleHandler, TRUE); + done = true; } Seems like a simple enough solution, don't see why it shouldn't work. As long as psql is single-threaded, which it is... (Actually, that code seems to re-set done=true on every call which seems unnecessary - but that might be optimised away, I guess) //Magnus ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] insert performance for win32
Magnus Hagander [EMAIL PROTECTED] wrote I'd like to use the win32 provided recv(), send() functions instead of redirect them to pgwin32_recv()/pgwin32_send(), just like libpq does. If we do this, we will lose some functionalities, but I'd like to see the performance difference first. -- do you think that will be any difference? I doubt there will be much performance difference, as you hav eto hit the kernel anyway (in the recv/send call). But that part is just a guess :-) On a separate line -- I verified Magnus's doubt -- revert pgwin32_recv() to recv() does not improve performance visiblly. Regards, Qingqing ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] insert performance for win32
On Wed, 2 Nov 2005, Merlin Moncure wrote: If you put client/server on the same machine, then we don't know how the CPU is splitted. Can you take a look at the approximate number by observing the task manager data while running? ok, I generated a test case which was 250k inserts to simple two column table all in single transaction. Every 50k inserts, time is recorded via timeofday(). Running from remote, Time progression is: First 50k: 20 sec Second: 29 sec [...] final:: 66 sec so, clear upward progression of time/rec. Initial time is 2.5k inserts/sec which is decent but not great for such a narrow table. CPU time on server starts around 50% and drops in exact proportion to insert performance. My earlier gprof test also suggest there is no smoking gun sucking down all the cpu time. cpu time on the client is very volatile but with a clear increase over time starting around 20 and ending perhaps 60. My client box is pretty quick, 3ghz p4. Running the script locally, from the server, cpu time is pegged at 100% and stays...first 50k is 23 sec with a much worse decomposition to almost three minutes for final 50k. Merlin If communication code is the suspect, can we measure the difference if we disable the redefinition of recv()/send() etc in port/win32.h (may require change related code a little bit as well). In this way, the socket will not be able to pickup signals, but let see if there is any performance difference first. Regards, Qingqing [OK, I'm bringing this back on-list, and bringing it to QingQing's attention, who I secretly hope is the right person to be looking at this problem :)] P.s. You scared me ;-) ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] insert performance for win32
Both win32 send/recv have pgwin32_poll_signals() in them. This is glorified WaitForSingleObjectEx on global pgwin32_signal_event. This is probably part of the problem. Can we work some of the same magic you put into check interrupts macro? Whoop! following a cvs update I see this is already nailed :) Back to the drawing board... Merlin ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] insert performance for win32
On Thu, 3 Nov 2005, Magnus Hagander wrote: Sorry, I don't follow you here - what do you mean to do? Remove the event completely so we can't wait on it? I'd like to use the win32 provided recv(), send() functions instead of redirect them to pgwin32_recv()/pgwin32_send(), just like libpq does. If we do this, we will lose some functionalities, but I'd like to see the performance difference first. -- do you think that will be any difference? Regards, Qingqing ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [PERFORM] insert performance for win32
Both win32 send/recv have pgwin32_poll_signals() in them. This is glorified WaitForSingleObjectEx on global pgwin32_signal_event. This is probably part of the problem. Can we work some of the same magic you put into check interrupts macro? Uh, we already do that, don't we? http://developer.postgresql.org/cvsweb.cgi/pgsql/src/backend/port/win3 2/ socket.c?rev=1.10 has: Yeah, we did this. I am thinking of just use simple mechanism of the win32 sockets, which could not pick up signals, but I would like to see if there is any difference -- do you think there is any point to try this? Sorry, I don't follow you here - what do you mean to do? Remove the event completely so we can't wait on it? //Magnus ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [PERFORM] insert performance for win32
On Thu, 3 Nov 2005, Magnus Hagander wrote: Sorry, I don't follow you here - what do you mean to do? Remove the event completely so we can't wait on it? I'd like to use the win32 provided recv(), send() functions instead of redirect them to pgwin32_recv()/pgwin32_send(), just like libpq does. If we do this, we will lose some functionalities, but I'd like to see the performance difference first. -- do you think that will be any difference? Doesn't work, really. It will no longer be possible to send a signal to an idle backend. The idle backend will be blocking on recv(), that's how it works. So unless we can get around that somehow, it's a non-starter I think. Yeah, agreed. An alternative is set tiemout like 100 ms or so. When timeout happens, check the signals. But I guess you will be strongly against it. I doubt there will be much performance difference, as you hav eto hit the kernel anyway (in the recv/send call). But that part is just a guess :-) I know what you mean ... I will take a look -- if the patch (not including fix signaling problem), if doesn't change much, I will give it a try. Regards, Qingqing ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] insert performance for win32
I'd like to use the win32 provided recv(), send() functions instead of redirect them to pgwin32_recv()/pgwin32_send(), just like libpq does. If we do this, we will lose some functionalities, but I'd like to see the performance difference first. -- do you think that will be any difference? Doesn't work, really. It will no longer be possible to send a signal to an idle backend. The idle backend will be blocking on recv(), that's how it works. So unless we can get around that somehow, it's a non-starter I think. Yeah, agreed. An alternative is set tiemout like 100 ms or so. When timeout happens, check the signals. But I guess you will be strongly against it. Not on principle, but I don't think it'll give us enough gain for the cost. But if it does, I'm certainly not against it. //Magnus ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [PERFORM] insert performance for win32
Merlin Moncure [EMAIL PROTECTED] wrote Running from remote, Time progression is: First 50k: 20 sec Second: 29 sec [...] final:: 66 sec This may due to the maintainence cost of a big transaction, I am not sure ... Tom? so, clear upward progression of time/rec. Initial time is 2.5k inserts/sec which is decent but not great for such a narrow table. CPU time on server starts around 50% and drops in exact proportion to insert performance. My earlier gprof test also suggest there is no smoking gun sucking down all the cpu time. Not to 100%, so this means the server is always starving. It is waiting on something -- of couse not lock. That's why I think there is some problem on network communication. Another suspect will be the write - I knwo NTFS system will issue an internal log when extending a file. To remove the second suspect, I will try to hack the source to do a fake write ... Regards, Qingqing ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PERFORM] insert performance for win32
Qingqing Zhou [EMAIL PROTECTED] wrote Not to 100%, so this means the server is always starving. It is waiting on something -- of couse not lock. That's why I think there is some problem on network communication. Another suspect will be the write - I knwo NTFS system will issue an internal log when extending a file. To remove the second suspect, I will try to hack the source to do a fake write ... To patch: - Here is a quite straight hack to implement fake write for both relation and xlog. Now the server becomes pure CPU play. 1. RelationGetBufferForTuple()/hio.c: remove line (if you do not enable cassert, then doesn't matter): - Assert(PageIsNew((PageHeader) pageHeader)); 2. ReadBuffer()/bufmgr.c: remove line - smgrextend(reln-rd_smgr, blockNum, (char *) bufBlock, - reln-rd_istemp); 3. XLogWrite()/xlog.c errno = 0; + goto fake; if (write(openLogFile, from, nbytes) != nbytes) { /* if write didn't set errno, assume no disk space */ ... } + + fake: /* Update state for write */ To use it: - 1. have several copies of a correct data; 2. patch the server; 3. when you startup postmaster, use the following parameters: postmaster -ccheckpoint_timeout=3600 -cbgwriter_all_percent=0 -Ddata Note now the database server is one-shoot usable -- after you shutdown, it won't startup again. Just run begin; many inserts; end; To observe: - (1) In this case, what's the remote server CPU usage -- 100%? I don't have several machines to test it. In my single machine, I run 35000 insert commands from psql by cut and paste into it and could observe that: --- 25% kernel time 75% user time 20% postgresql (--enable-debug --enable-cassert) 65% psql (as same above) 10% csrss (system process, manage graphics commands (not sure, just googled it), etc) 5% system (system process) --- (2) In this case, Linux still keeps almost 10 times faster? After this, we may need more observations like comparison of simple select 1; to reduce the code space we may want to explore ... Regards, Qingqing ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] insert performance for win32
On Thu, 3 Nov 2005, Magnus Hagander wrote: Both win32 send/recv have pgwin32_poll_signals() in them. This is glorified WaitForSingleObjectEx on global pgwin32_signal_event. This is probably part of the problem. Can we work some of the same magic you put into check interrupts macro? Uh, we already do that, don't we? http://developer.postgresql.org/cvsweb.cgi/pgsql/src/backend/port/win32/ socket.c?rev=1.10 has: Yeah, we did this. I am thinking of just use simple mechanism of the win32 sockets, which could not pick up signals, but I would like to see if there is any difference -- do you think there is any point to try this? Regards, Qingqing ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [PERFORM] insert performance for win32
Merlin Moncure [EMAIL PROTECTED] writes: ok, I generated a test case which was 250k inserts to simple two column table all in single transaction. Every 50k inserts, time is recorded via timeofday(). You mean something like the attached? Running from remote, Time progression is: First 50k: 20 sec Second: 29 sec [...] final:: 66 sec On Unix I get a dead flat line (within measurement noise), both local loopback and across my LAN. after 5 30.20 sec after 10 31.67 sec after 15 30.98 sec after 20 29.64 sec after 25 29.83 sec top shows nearly constant CPU usage over the run, too. With a local connection it's pretty well pegged, with LAN connection the server's about 20% idle and the client about 90% (client machine is much faster than server which may affect this, but I'm too lazy to try it in the other direction). I think it's highly likely that you are looking at some strange behavior of the Windows TCP stack. regards, tom lane binU7zmyKjiMS.bin Description: timeit.c ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [PERFORM] insert performance for win32
On Thu, 3 Nov 2005, Tom Lane wrote: On Unix I get a dead flat line (within measurement noise), both local loopback and across my LAN. after 5 30.20 sec after 10 31.67 sec after 15 30.98 sec after 20 29.64 sec after 25 29.83 sec Confirmed in Linux. And on a winxp machine(sp2) with server, client together, with (see almost no performance difference) or without my fake write, the observation is still hold for both cases: after 5 25.21 sec after 10 26.26 sec after 15 25.23 sec after 20 26.25 sec after 25 26.58 sec In both cases, postgres 67% cpu, psql 15~20%, rest: system process. Kernel time is 40+% -- where from? Regards, Qingqing ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [PERFORM] insert performance for win32
I've done the tests with rc1. This is still as slow on windows ... about 6-10 times slower thant linux (via Ip socket). (depending on using prepared queries, etc...) By the way, we've tried to insert into the windows database from a linux psql client, via the network. In this configuration, inserting is only about 2 times slower than inserting locally (the linux client had a slower CPU 1700Mhz agains 3000). Could it be related to a problem in the windows psql client ? [OK, I'm bringing this back on-list, and bringing it to QingQing's attention, who I secretly hope is the right person to be looking at this problem :)] Just to recap Marc and I have been looking at the performance disparity between windows and linux for a single transaction statement by statement insert on a very narrow table with no keys from a remote client. Marc's observations showed (and I verified) that windows is much slower in this case than it should be. I gprof'ed both the psql client and the server during the insert and didn't see anything seriously out of order...unfortunately QQ's latest win32 performance tweaks haven't helped. Marc's observation that by switching to a linux client drops time down drastically is really intersing! Merlin ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] insert performance for win32
I've done the tests with rc1. This is still as slow on windows ... about 6-10 times slower thant linux (via Ip socket). (depending on using prepared queries, etc...) By the way, we've tried to insert into the windows database from a linux psql client, via the network. In this configuration, inserting is only about 2 times slower than inserting locally (the linux client had a slower CPU 1700Mhz agains 3000). Could it be related to a problem in the windows psql client ? [OK, I'm bringing this back on-list, and bringing it to QingQing's attention, who I secretly hope is the right person to be looking at this problem :)] Just to recap Marc and I have been looking at the performance disparity between windows and linux for a single transaction statement by statement insert on a very narrow table with no keys from a remote client. Marc's observations showed (and I verified) that windows is much slower in this case than it should be. I gprof'ed both the psql client and the server during the insert and didn't see anything seriously out of order...unfortunately QQ's latest win32 performance tweaks haven't helped. Marc's observation that by switching to a linux client drops time down drastically is really intersing! Could this be a case of the network being slow, as we've seen a couple of times before? And if you run psql on the local box, you get it double. Do you get a speed difference between the local windows box and a remote wnidows box? //Magnus ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [PERFORM] insert performance for win32
Le Mercredi 02 Novembre 2005 14:54, Magnus Hagander a écrit : I've done the tests with rc1. This is still as slow on windows ... about 6-10 times slower thant linux (via Ip socket). (depending on using prepared queries, etc...) By the way, we've tried to insert into the windows database from a linux psql client, via the network. In this configuration, inserting is only about 2 times slower than inserting locally (the linux client had a slower CPU 1700Mhz agains 3000). Could it be related to a problem in the windows psql client ? [OK, I'm bringing this back on-list, and bringing it to QingQing's attention, who I secretly hope is the right person to be looking at this problem :)] Just to recap Marc and I have been looking at the performance disparity between windows and linux for a single transaction statement by statement insert on a very narrow table with no keys from a remote client. Marc's observations showed (and I verified) that windows is much slower in this case than it should be. I gprof'ed both the psql client and the server during the insert and didn't see anything seriously out of order...unfortunately QQ's latest win32 performance tweaks haven't helped. Marc's observation that by switching to a linux client drops time down drastically is really intersing! Could this be a case of the network being slow, as we've seen a couple of times before? And if you run psql on the local box, you get it double. Do you get a speed difference between the local windows box and a remote wnidows box? //Magnus The Windows-Windows test is local (via loopback interface) The Linux (client) - Windows (server) is via network (100Mbits) I can't test with 2 windows box ... I haven't got that much (all machines linux, except the test one...) ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [PERFORM] insert performance for win32
Magnus Hagander [EMAIL PROTECTED] writes: Marc's observation that by switching to a linux client drops time down drastically is really intersing! Could this be a case of the network being slow, I'm wondering about nonstandard junk lurking in the TCP stack of the Windows client machine. Also, I seem to recall something about a QOS patch that people are supposed to apply, or not apply as the case may be, to get Windows' TCP stack to behave reasonably ... ring a bell? regards, tom lane ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] insert performance for win32
On Wed, 2 Nov 2005, Merlin Moncure wrote: By the way, we've tried to insert into the windows database from a linux psql client, via the network. In this configuration, inserting is only about 2 times slower than inserting locally (the linux client had a slower CPU 1700Mhz agains 3000). Could it be related to a problem in the windows psql client ? If you put client/server on the same machine, then we don't know how the CPU is splitted. Can you take a look at the approximate number by observing the task manager data while running? If communication code is the suspect, can we measure the difference if we disable the redefinition of recv()/send() etc in port/win32.h (may require change related code a little bit as well). In this way, the socket will not be able to pickup signals, but let see if there is any performance difference first. Regards, Qingqing [OK, I'm bringing this back on-list, and bringing it to QingQing's attention, who I secretly hope is the right person to be looking at this problem :)] P.s. You scared me ;-) ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] insert performance for win32
One thing I did notice that in a 250k insert transaction the insert time grows with #recs inserted. Time to insert first 50k recs is about 27 sec and last 50 k recs is 77 sec. I also confimed that size of table is not playing a role here. Marc, can you do select timeofday() every 50k recs from linux? Also a gprof trace from linux would be helpful. Here's the timeofday ... i'll do the gprof as soon as I can. Every 5 rows... Were those all in a single transaction? Merlin ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [PERFORM] insert performance for win32
On Tuesday 06 September 2005 19:11, Merlin Moncure wrote: This makes me wonder if we are looking in the wrong place. Maybe the problem is coming from psql? More results to follow. problem is not coming from psql. One thing I did notice that in a 250k insert transaction the insert time grows with #recs inserted. Time to insert first 50k recs is about 27 sec and last 50 k recs is 77 sec. I also confimed that size of table is not playing a role here. Marc, can you do select timeofday() every 50k recs from linux? Also a gprof trace from linux would be helpful. Here's the timeofday ... i'll do the gprof as soon as I can. Every 5 rows... Wed Sep 07 13:58:13.860378 2005 CEST Wed Sep 07 13:58:20.926983 2005 CEST Wed Sep 07 13:58:27.928385 2005 CEST Wed Sep 07 13:58:35.472813 2005 CEST Wed Sep 07 13:58:42.825709 2005 CEST Wed Sep 07 13:58:50.789486 2005 CEST Wed Sep 07 13:58:57.553869 2005 CEST Wed Sep 07 13:59:04.298136 2005 CEST Wed Sep 07 13:59:11.066059 2005 CEST Wed Sep 07 13:59:19.368694 2005 CEST Merlin ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] insert performance for win32
On Tuesday 06 September 2005 19:11, Merlin Moncure wrote: Here's the timeofday ... i'll do the gprof as soon as I can. Every 5 rows... Wed Sep 07 13:58:13.860378 2005 CEST Wed Sep 07 13:58:20.926983 2005 CEST Wed Sep 07 13:58:27.928385 2005 CEST Wed Sep 07 13:58:35.472813 2005 CEST Wed Sep 07 13:58:42.825709 2005 CEST Wed Sep 07 13:58:50.789486 2005 CEST Wed Sep 07 13:58:57.553869 2005 CEST Wed Sep 07 13:59:04.298136 2005 CEST Wed Sep 07 13:59:11.066059 2005 CEST Wed Sep 07 13:59:19.368694 2005 CEST ok, I've been in crunching profile profile graphs, and so far have been only been able to draw following conclusions. For bulk, 'in-transaction' insert: 1. win32 is slower than linux. win32 time for each insert grows with # inserts in xact, linux does not (or grows much slower). Win32 starts out about 3x slower and grows to 10x slower after 250k inserts. 2. ran a 50k profile vs. 250k profile. Nothing jumps out as being slower or faster: most time is spent in yyparse on either side. From this my preliminary conclusion is that there is something going on in the win32 api which is not showing in the profile. 3. The mingw gprof cumulative seconds does not show measurable growth in cpu time/insert in 50k/250k profile. I'm now talking suggestions about where to look for performance problems :(. Merlin ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
[PERFORM] insert performance for win32
Hi, I usually use PostgreSQL coupled with Linux, but I have to use Windows for a perticular project. So I wanted to do some tests to know if the performance will be acceptable (I don't need PostgreSQL to be as fast with windows as with linux, but it has to be usable...). I started with trying to do lots of inserts, and I'm quite astonished by the catastrophics results ... The test : The computer was the same (my workstation, a PIV Dell with SATA disk), dual boot The windows OS is XP. Both Oses are PostgreSQL 8.0.3 Both PostgreSQL clusters (windows and linux) have the same tuning (shared_buffers=2, wal_buffers=128, checkpoint_segments=10) Before each test, the clusters are vacuum analyzed, and the test database is recreated. The script is quite dumb : BEGIN; CREATE TABLE test (col1 serial, col2 text); INSERT INTO test (col2) values ('test'); INSERT INTO test (col2) values ('test'); INSERT INTO test (col2) values ('test'); INSERT INTO test (col2) values ('test'); INSERT INTO test (col2) values ('test'); .. 500,000 times Then COMMIT. I know it isn't realistic, but I needed to start with something :) The results are as follows : Linux : 1'9'' Windows : 9'38'' What I've tried to solve, and didn't work : - Deactivate antivirus on windows - fsync=no - raise the checkpoint_segments value (32) - remove hyperthreading (who knows...) I don't know what could cause this (I'm not a windows admin...at all). All I see is a very high kernel load during the execution of this script, but I can't determine where it comes from. I'd like to know if this is a know problem, if there is something I can do, etc... Thanks a lot. ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PERFORM] insert performance for win32
Hi, I usually use PostgreSQL coupled with Linux, but I have to use Windows for a perticular project. So I wanted to do some tests to know if the performance will be acceptable (I don't need PostgreSQL to be as fast with windows as with linux, but it has to be usable...). In my experience win32 is par with linux generally with a few gotchas on either side. Are your times with fsync=no? It's much harder to give apples-apples comparison with fsync=on for various reasons. Are you running stats_command_string=on? Try disabling and compare results. Is your loading app running locally or on the server? I am very interesting in discovering sources of high cpu load problems on win32. If you are still having problems could you get a gprof profile together? There is a recent thread on win32-hackers discussing how to do this. Merlin ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] insert performance for win32
In my experience win32 is par with linux generally with a few gotchas on either side. Are your times with fsync=no? It's much harder to give apples-apples comparison with fsync=on for various reasons. It is with fsync=off on windows, fsync=on on linux Are you running stats_command_string=on? Try disabling and compare results. Deactivated on windows, activated on linux Is your loading app running locally or on the server? Yes I am very interesting in discovering sources of high cpu load problems on win32. If you are still having problems could you get a gprof profile together? There is a recent thread on win32-hackers discussing how to do this. I'll give it a look Merlin ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [PERFORM] insert performance for win32
In my experience win32 is par with linux generally with a few gotchas on either side. Are your times with fsync=no? It's much harder to give apples-apples comparison with fsync=on for various reasons. It is with fsync=off on windows, fsync=on on linux well, inside a transaction this shouldn't have mattered anyways. Are you running stats_command_string=on? Try disabling and compare results. Deactivated on windows, activated on linux Is your loading app running locally or on the server? Yes hm :(. Well, you had me curious so I went ahead and re-ran your test case and profiled it (on windows). I got similar results time wise. It's interesting to note that the command I used to generate the test table before dumping w/inserts insert into test select nextval('test_id_seq'), 'test' from generate_series(1,50) ran in just a few seconds. Well, I cut the #recs down to 50k and here is profile trace: % cumulative self self total time seconds secondscalls s/call s/call name 10.78 0.62 0.6250001 0.00 0.00 yyparse 5.39 0.93 0.31 5101422 0.00 0.00 AllocSetAlloc 4.52 1.19 0.26 799970 0.00 0.00 base_yylex 2.78 1.35 0.16 28 0.00 0.00 SearchCatCache 2.43 1.49 0.14 554245 0.00 0.00 hash_search 2.26 1.62 0.1349998 0.00 0.00 XLogInsert 1.74 1.72 0.10 453363 0.00 0.00 LWLockAcquire 1.74 1.82 0.10 299988 0.00 0.00 ScanKeywordLookup This makes me wonder if we are looking in the wrong place. Maybe the problem is coming from psql? More results to follow. Merlin ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [PERFORM] insert performance for win32
This makes me wonder if we are looking in the wrong place. Maybe the problem is coming from psql? More results to follow. problem is not coming from psql. One thing I did notice that in a 250k insert transaction the insert time grows with #recs inserted. Time to insert first 50k recs is about 27 sec and last 50 k recs is 77 sec. I also confimed that size of table is not playing a role here. Marc, can you do select timeofday() every 50k recs from linux? Also a gprof trace from linux would be helpful. Merlin ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match