Re: [PERFORM] Wierd context-switching issue on Xeon patch for 7.4.1
Dave,

> Are you testing this with Tom's code? You need to do a baseline
> measurement with 10 and then increase it. You will still get lots of cs,
> but it will be less.

No, that was just a test of 1000 straight up. Tom outlined a method, but I didn't see any code that would help me find a better level, other than just trying each +100 increase one at a time. This would take days of testing ...

--
Josh Berkus
Aglio Database Solutions
San Francisco

---(end of broadcast)---
TIP 6: Have you searched our list archives?
       http://archives.postgresql.org
Re: [PERFORM] Wierd context-switching issue on Xeon patch for 7.4.1
Josh,

I think you can safely increase by orders of magnitude here, instead of by +100. My wild-ass guess is that the sweet spot is where the spin time is approximately the time it takes to consume the resource. So if you have a really fast machine, the spin count should be higher. You also have to take your memory bus speed into consideration: with the pause instruction inserted in the loop, the timing is now dependent on memory speed.

But... you need a baseline first.

Dave

On Tue, 2004-04-27 at 14:05, Josh Berkus wrote:
> Dave,
>
> > Are you testing this with Tom's code? You need to do a baseline
> > measurement with 10 and then increase it. You will still get lots of cs,
> > but it will be less.
>
> No, that was just a test of 1000 straight up. Tom outlined a method, but I
> didn't see any code that would help me find a better level, other than just
> trying each +100 increase one at a time. This would take days of testing ...

--
Dave Cramer
519 939 0336
ICQ # 14675561
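Dave's rule of thumb (spin for roughly as long as the lock is normally held) can be turned into a starting value instead of stepping by +100. The sketch below is only an illustration of that arithmetic: `spins_for_hold_time` is a hypothetical helper, not anything in the PostgreSQL tree, and both inputs (average hold time and cost of one spin iteration, in nanoseconds) are assumed to have been measured separately on the target machine.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: smallest spin count whose total spin time covers
 * the expected lock hold time.  Both arguments are measurements the
 * caller is assumed to have taken on the target hardware.
 */
static uint32_t
spins_for_hold_time(uint64_t hold_ns, uint64_t spin_ns)
{
    uint64_t spins;

    assert(spin_ns > 0);        /* a zero-cost spin makes no sense */

    /* ceiling division without risking overflow in the numerator */
    spins = hold_ns / spin_ns + (hold_ns % spin_ns != 0);

    if (spins < 1)
        spins = 1;              /* always try the lock at least once */
    if (spins > UINT32_MAX)
        spins = UINT32_MAX;

    return (uint32_t) spins;
}
```

With a 1000 ns hold time and 10 ns per spin this gives 100; a faster CPU (smaller `spin_ns`) gives a proportionally larger count, which matches the "faster machine, higher spin count" intuition. The result would still only be a starting point for the baseline comparison.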
Re: [PERFORM] Wierd context-switching issue on Xeon patch for 7.4.1
Dave,

> But... you need a baseline first.

A baseline on CS? I have that

--
-Josh Berkus
 Aglio Database Solutions
 San Francisco
Re: [PERFORM] Wierd context-switching issue on Xeon patch for 7.4.1
Dave,

> Yeah, I did some more testing myself, and actually get better numbers
> with increasing spins per delay to 1000, but my suspicion is that it is
> highly dependent on finding the right delay for the processor you are
> on.

Well, it certainly didn't help here:

procs             memory               swap        io         system       cpu
 r  b   swpd     free   buff   cache   si   so    bi    bo   in     cs us sy id wa
 2  0      0 14870744 123872 1129912    0    0     0     0 1027 187341 48 27 26  0
 2  0      0 14869912 123872 1129912    0    0     0    48 1030 126490 65 18 16  0
 2  0      0 14867032 123872 1129912    0    0     0     0 1021 106046 72 16 12  0
 2  0      0 14869912 123872 1129912    0    0     0     0 1025  90256 76 14 10  0
 2  0      0 14870424 123872 1129912    0    0     0     0 1022 135249 63 22 16  0
 2  0      0 14872664 123872 1129912    0    0     0     0 1023     13 63 20 17  0
 1  0      0 14871128 123872 1129912    0    0     0    48 1024 155728 57 22 20  0
 2  0      0 14871128 123872 1129912    0    0     0     0 1028 189655 49 29 22  0
 2  0      0 14871064 123872 1129912    0    0     0     0 1018 190744 48 29 23  0
 2  0      0 14871064 123872 1129912    0    0     0     0 1027 186812 51 26 23  0

--
-Josh Berkus
 Aglio Database Solutions
 San Francisco
Re: [PERFORM] Wierd context-switching issue on Xeon patch for 7.4.1
Are you testing this with Tom's code? You need to do a baseline measurement with 10 and then increase it. You will still get lots of cs, but it will be less.

Dave

On Mon, 2004-04-26 at 20:03, Josh Berkus wrote:
> Dave,
>
> > Yeah, I did some more testing myself, and actually get better numbers
> > with increasing spins per delay to 1000, but my suspicion is that it is
> > highly dependent on finding the right delay for the processor you are
> > on.
>
> Well, it certainly didn't help here:
>
> procs             memory               swap        io         system       cpu
>  r  b   swpd     free   buff   cache   si   so    bi    bo   in     cs us sy id wa
>  2  0      0 14870744 123872 1129912    0    0     0     0 1027 187341 48 27 26  0
>  2  0      0 14869912 123872 1129912    0    0     0    48 1030 126490 65 18 16  0
>  2  0      0 14867032 123872 1129912    0    0     0     0 1021 106046 72 16 12  0
>  2  0      0 14869912 123872 1129912    0    0     0     0 1025  90256 76 14 10  0
>  2  0      0 14870424 123872 1129912    0    0     0     0 1022 135249 63 22 16  0
>  2  0      0 14872664 123872 1129912    0    0     0     0 1023     13 63 20 17  0
>  1  0      0 14871128 123872 1129912    0    0     0    48 1024 155728 57 22 20  0
>  2  0      0 14871128 123872 1129912    0    0     0     0 1028 189655 49 29 22  0
>  2  0      0 14871064 123872 1129912    0    0     0     0 1018 190744 48 29 23  0
>  2  0      0 14871064 123872 1129912    0    0     0     0 1027 186812 51 26 23  0

--
Dave Cramer
519 939 0336
ICQ # 14675561
Re: [PERFORM] Wierd context-switching issue on Xeon patch for 7.4.1
On Thu, 2004-04-22 at 10:37 -0700, Josh Berkus wrote:
> Tom,
>
> > The tricky part is that a slow adaptation rate means we can't have every
> > backend figuring this out for itself --- the right value would have to
> > be maintained globally, and I'm not sure how to do that without adding a
> > lot of overhead.
>
> This may be a moot point, since you've stated that changing the loop timing
> won't solve the problem, but what about making the test part of make? I
> don't think too many systems are going to change processor architectures
> once in production, and those that do can be told to re-compile.

Sure they do - PostgreSQL is regularly provided as a pre-compiled distribution. I haven't compiled PostgreSQL for years, and we have it running on dozens of machines, some SMP, some not, but most running Debian Linux.

Even having a compiler _installed_ on one of our clients' database servers would usually be considered against security procedures, and would get a black mark when the auditors came through.

Regards,
Andrew McMillan

Andrew @ Catalyst .Net .NZ Ltd, PO Box 11-053, Manners St, Wellington
WEB: http://catalyst.net.nz/    PHYS: Level 2, 150-154 Willis St
DDI: +64(4)916-7201    MOB: +64(21)635-694    OFFICE: +64(4)499-2267
Planning an election? Call us!
Re: [PERFORM] Wierd context-switching issue on Xeon patch for 7.4.1
Yeah, I did some more testing myself, and actually get better numbers with increasing spins per delay to 1000, but my suspicion is that it is highly dependent on finding the right delay for the processor you are on.

My hypothesis is that if you spin approximately the same or more time than the average time it takes to get finished with the shared resource then this should reduce cs.

Certainly more ideas are required here.

Dave

On Wed, 2004-04-21 at 22:35, Tom Lane wrote:
> Dave Cramer [EMAIL PROTECTED] writes:
> > diff -c -r1.16 s_lock.c
> > *** backend/storage/lmgr/s_lock.c 8 Aug 2003 21:42:00 - 1.16
> > --- backend/storage/lmgr/s_lock.c 21 Apr 2004 20:27:34 -
> > ***************
> > *** 76,82 ****
> >    * The select() delays are measured in centiseconds (0.01 sec) because 10
> >    * msec is a common resolution limit at the OS level.
> >    */
> > ! #define SPINS_PER_DELAY     100
> >   #define NUM_DELAYS          1000
> >   #define MIN_DELAY_CSEC      1
> >   #define MAX_DELAY_CSEC      100
> > --- 76,82 ----
> >    * The select() delays are measured in centiseconds (0.01 sec) because 10
> >    * msec is a common resolution limit at the OS level.
> >    */
> > ! #define SPINS_PER_DELAY     10
> >   #define NUM_DELAYS          1000
> >   #define MIN_DELAY_CSEC      1
> >   #define MAX_DELAY_CSEC      100
>
> As far as I can tell, this does reduce the rate of semop's
> significantly, but it does so by bringing the overall processing rate
> to a crawl :-(. I see 97% CPU idle time when using this patch.
> I believe what is happening is that the select() delay in s_lock.c is
> being hit frequently because the spin loop isn't allowed to run long
> enough to let the other processor get out of the spinlock.
>
> regards, tom lane

--
Dave Cramer
519 939 0336
ICQ # 14675561
Re: [PERFORM] Wierd context-switching issue on Xeon patch for 7.4.1
More data

On a dual Xeon with HTT enabled:

I tried increasing the NUM_SPINS to 1000 and it works better.

NUM_SPINLOCKS   CS     ID    pgbench
100             250K   59%   230 TPS
1000            125K   55%   228 TPS

This is certainly heading in the right direction? Although it looks like it is highly dependent on the system you are running on.

--dc--

On Wed, 2004-04-21 at 22:53, Josh Berkus wrote:
> Tom,
>
> > As far as I can tell, this does reduce the rate of semop's
> > significantly, but it does so by bringing the overall processing rate
> > to a crawl :-(. I see 97% CPU idle time when using this patch.
> > I believe what is happening is that the select() delay in s_lock.c is
> > being hit frequently because the spin loop isn't allowed to run long
> > enough to let the other processor get out of the spinlock.
>
> Also, I tested it on production data, and it reduces the CSes by about 40%.
> An improvement, but not a magic bullet.

--
Dave Cramer
519 939 0336
ICQ # 14675561
Re: [PERFORM] Wierd context-switching issue on Xeon patch for 7.4.1
Dave Cramer [EMAIL PROTECTED] writes:
> My hypothesis is that if you spin approximately the same or more time
> than the average time it takes to get finished with the shared resource
> then this should reduce cs.

The only thing we use spinlocks for nowadays is to protect LWLocks, so the average time involved is fairly small and stable --- or at least that was the design intention. What we seem to be seeing is that on SMP machines, cache coherency issues cause the TAS step itself to be expensive and variable. However, in the experiments I did, strace'ing showed that actual spin timeouts (manifested by the execution of a delaying select()) weren't actually that common; the big source of context switches is semop(), which indicates contention at the LWLock level rather than the spinlock level. So while tuning the spinlock limit count might be a useful thing to do in general, I think it will have only negligible impact on the particular problems we're discussing in this thread.

regards, tom lane
Re: [PERFORM] Wierd context-switching issue on Xeon patch for 7.4.1
Josh Berkus wrote:
> Tom,
>
> > Having to recompile to run on single- vs dual-processor machines doesn't
> > seem like it would fly.
>
> Oh, I don't know. Many applications require compiling for a target
> architecture; SQL Server, for example, won't use a 2nd processor without
> re-installation. I'm not sure about Oracle.
>
> It certainly wasn't too long ago that Linux gurus were espousing re-compiling
> the kernel for the machine.
>
> And it's not like they would *have* to re-compile to use PostgreSQL after
> adding an additional processor. Just if they wanted to maximize performance
> benefit.
>
> Also, this is a fairly rare circumstance, I think; to judge by my clients,
> once a database server is in production nobody touches the hardware.

A much simpler solution would be for the postmaster to run a test during startup.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  [EMAIL PROTECTED]                    |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
Re: [PERFORM] Wierd context-switching issue on Xeon patch for 7.4.1
Tom,

> Having to recompile to run on single- vs dual-processor machines doesn't
> seem like it would fly.

Oh, I don't know. Many applications require compiling for a target architecture; SQL Server, for example, won't use a 2nd processor without re-installation. I'm not sure about Oracle.

It certainly wasn't too long ago that Linux gurus were espousing re-compiling the kernel for the machine.

And it's not like they would *have* to re-compile to use PostgreSQL after adding an additional processor. Just if they wanted to maximize performance benefit.

Also, this is a fairly rare circumstance, I think; to judge by my clients, once a database server is in production nobody touches the hardware.

--
-Josh Berkus
 Aglio Database Solutions
 San Francisco
Re: [PERFORM] Wierd context-switching issue on Xeon patch for 7.4.1
Tom,

> The tricky part is that a slow adaptation rate means we can't have every
> backend figuring this out for itself --- the right value would have to
> be maintained globally, and I'm not sure how to do that without adding a
> lot of overhead.

This may be a moot point, since you've stated that changing the loop timing won't solve the problem, but what about making the test part of make? I don't think too many systems are going to change processor architectures once in production, and those that do can be told to re-compile.

--
Josh Berkus
Aglio Database Solutions
San Francisco
Re: [PERFORM] Wierd context-switching issue on Xeon patch for 7.4.1
On Thu, 2004-04-22 at 13:55, Tom Lane wrote:
> Josh Berkus [EMAIL PROTECTED] writes:
> > This may be a moot point, since you've stated that changing the loop timing
> > won't solve the problem, but what about making the test part of make? I
> > don't think too many systems are going to change processor architectures
> > once in production, and those that do can be told to re-compile.
>
> Having to recompile to run on single- vs dual-processor machines doesn't
> seem like it would fly.

Is it something the postmaster could quickly determine and set a global during the startup cycle?
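For what a startup-time check might look like, here is a minimal sketch. It assumes `sysconf(_SC_NPROCESSORS_ONLN)` is available, which is a common extension on Linux and the BSDs rather than something every platform provides. The names `choose_spins_per_delay`, `init_spins_per_delay`, and the global `spins_per_delay` are hypothetical, and collapsing to a spin count of 1 on a uniprocessor is an assumption about what the postmaster would want to do with the answer.

```c
#include <assert.h>
#include <unistd.h>

#define DEFAULT_SPINS_PER_DELAY 100     /* stock value from s_lock.c */

/* Hypothetical global, set once at startup and read by the spin loop. */
static int spins_per_delay = DEFAULT_SPINS_PER_DELAY;

/*
 * Pick a spin count from the detected CPU count.  On one CPU the lock
 * holder cannot make progress while we spin, so spinning is pointless.
 * Anything else, including a failed detection (ncpus <= 0), keeps the
 * compiled-in default.
 */
static int
choose_spins_per_delay(long ncpus)
{
    if (ncpus == 1)
        return 1;
    return DEFAULT_SPINS_PER_DELAY;
}

/* Would be called once, early in startup, before any backend is forked. */
static void
init_spins_per_delay(void)
{
    long ncpus = -1;            /* "unknown" unless the platform can tell us */

#ifdef _SC_NPROCESSORS_ONLN
    ncpus = sysconf(_SC_NPROCESSORS_ONLN);
#endif

    spins_per_delay = choose_spins_per_delay(ncpus);
    assert(spins_per_delay >= 1);
}
```

This only distinguishes one CPU from more than one. It says nothing about the right count on a given SMP box, which is the harder question in this thread.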
Re: [PERFORM] Wierd context-switching issue on Xeon patch for 7.4.1
attached.

--
Dave Cramer
519 939 0336
ICQ # 14675561

Index: backend/storage/lmgr/s_lock.c
===================================================================
RCS file: /usr/local/cvs/pgsql-server/src/backend/storage/lmgr/s_lock.c,v
retrieving revision 1.16
diff -c -r1.16 s_lock.c
*** backend/storage/lmgr/s_lock.c 8 Aug 2003 21:42:00 - 1.16
--- backend/storage/lmgr/s_lock.c 21 Apr 2004 20:27:34 -
***************
*** 76,82 ****
   * The select() delays are measured in centiseconds (0.01 sec) because 10
   * msec is a common resolution limit at the OS level.
   */
! #define SPINS_PER_DELAY     100
  #define NUM_DELAYS          1000
  #define MIN_DELAY_CSEC      1
  #define MAX_DELAY_CSEC      100
--- 76,82 ----
   * The select() delays are measured in centiseconds (0.01 sec) because 10
   * msec is a common resolution limit at the OS level.
   */
! #define SPINS_PER_DELAY     10
  #define NUM_DELAYS          1000
  #define MIN_DELAY_CSEC      1
  #define MAX_DELAY_CSEC      100
***************
*** 88,93 ****
--- 88,94 ----
      while (TAS(lock))
      {
+         __asm__ __volatile__ ("rep;nop": : :"memory");
          if (++spins > SPINS_PER_DELAY)
          {
              if (++delays > NUM_DELAYS)
Index: include/storage/s_lock.h
===================================================================
RCS file: /usr/local/cvs/pgsql-server/src/include/storage/s_lock.h,v
retrieving revision 1.115.2.1
diff -c -r1.115.2.1 s_lock.h
*** include/storage/s_lock.h 4 Nov 2003 09:43:56 - 1.115.2.1
--- include/storage/s_lock.h 21 Apr 2004 20:26:25 -
***************
*** 103,110 ****
      register slock_t _res = 1;

      __asm__ __volatile__(
!         "   lock            \n"
          "   xchgb   %0,%1   \n"
  :       "=q"(_res), "=m"(*lock)
  :       "0"(_res));
      return (int) _res;
--- 103,113 ----
      register slock_t _res = 1;

      __asm__ __volatile__(
!         "   cmpb    $0,%1   \n"
!         "   jne     1f      \n"
!         "   lock            \n"
          "   xchgb   %0,%1   \n"
+         "1:\n"
  :       "=q"(_res), "=m"(*lock)
  :       "0"(_res));
      return (int) _res;
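The patch makes two changes: it peeks at the lock byte before issuing the locked `xchgb` (a "test-and-test-and-set"), and it puts a `rep;nop` (the x86 PAUSE hint) in the spin loop. For readers who do not want to decode the inline assembly, the same idea can be sketched in portable C11 atomics. This is an illustration only, not the PostgreSQL code: all `demo_*` names are hypothetical, and the PAUSE hint is compiled in only for GCC-compatible compilers on x86.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

typedef atomic_uchar demo_slock_t;      /* 0 = free, 1 = held */

/*
 * Test-and-test-and-set.  Returns nonzero if the lock was already held
 * (same convention as TAS() in the patch), zero if we just acquired it.
 */
static int
demo_tas(demo_slock_t *lock)
{
    assert(lock != NULL);

    /* Cheap read first: if the lock is visibly held, skip the
     * bus-locking exchange entirely (the "cmpb / jne" in the patch). */
    if (atomic_load_explicit(lock, memory_order_relaxed) != 0)
        return 1;

    /* Looks free: now do the real atomic swap ("lock; xchgb"). */
    return atomic_exchange_explicit(lock, 1, memory_order_acquire);
}

static void
demo_unlock(demo_slock_t *lock)
{
    atomic_store_explicit(lock, 0, memory_order_release);
}

/* Spin-wait hint; a no-op where the x86 PAUSE instruction is unavailable. */
static inline void
demo_cpu_relax(void)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    __asm__ __volatile__("rep; nop" ::: "memory");
#endif
}

/*
 * Try up to max_spins times.  Returns 1 if the lock was acquired and 0 if
 * the spin budget ran out, which is where s_lock.c would fall back to a
 * select() delay.
 */
static int
demo_spin_acquire(demo_slock_t *lock, int max_spins)
{
    for (int spins = 0; spins < max_spins; spins++)
    {
        if (!demo_tas(lock))
            return 1;
        demo_cpu_relax();
    }
    return 0;
}
```

The point of the initial plain read is that waiters spin on their own cached copy of the lock byte instead of repeatedly issuing locked bus operations.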
Re: [PERFORM] Wierd context-switching issue on Xeon patch for 7.4.1
Dave Cramer [EMAIL PROTECTED] writes:
> diff -c -r1.16 s_lock.c
> *** backend/storage/lmgr/s_lock.c 8 Aug 2003 21:42:00 - 1.16
> --- backend/storage/lmgr/s_lock.c 21 Apr 2004 20:27:34 -
> ***************
> *** 76,82 ****
>    * The select() delays are measured in centiseconds (0.01 sec) because 10
>    * msec is a common resolution limit at the OS level.
>    */
> ! #define SPINS_PER_DELAY     100
>   #define NUM_DELAYS          1000
>   #define MIN_DELAY_CSEC      1
>   #define MAX_DELAY_CSEC      100
> --- 76,82 ----
>    * The select() delays are measured in centiseconds (0.01 sec) because 10
>    * msec is a common resolution limit at the OS level.
>    */
> ! #define SPINS_PER_DELAY     10
>   #define NUM_DELAYS          1000
>   #define MIN_DELAY_CSEC      1
>   #define MAX_DELAY_CSEC      100

As far as I can tell, this does reduce the rate of semop's significantly, but it does so by bringing the overall processing rate to a crawl :-(. I see 97% CPU idle time when using this patch. I believe what is happening is that the select() delay in s_lock.c is being hit frequently because the spin loop isn't allowed to run long enough to let the other processor get out of the spinlock.

regards, tom lane
Re: [PERFORM] Wierd context-switching issue on Xeon patch for 7.4.1
Bruce Momjian [EMAIL PROTECTED] writes:
> For BSDOS it has:
>
> #if (CLIENT_OS == OS_FREEBSD) || (CLIENT_OS == OS_BSDOS) || \
>     (CLIENT_OS == OS_OPENBSD) || (CLIENT_OS == OS_NETBSD)
>   { /* comment out if inappropriate for your *bsd - cyp (25/may/1999) */
>     int ncpus; size_t len = sizeof(ncpus);
>     int mib[2]; mib[0] = CTL_HW; mib[1] = HW_NCPU;
>     if (sysctl( &mib[0], 2, &ncpus, &len, NULL, 0 ) == 0)
>     //if (sysctlbyname("hw.ncpu", &ncpus, &len, NULL, 0 ) == 0)
>       cpucount = ncpus;
>   }

Multiplied by how many platforms? Ewww...

I was wondering about some sort of dynamic adaptation, roughly along the lines of: whenever a spin loop successfully gets the lock after spinning, decrease the allowed loop count by one; whenever we fail to get the lock after spinning, increase by 100; if the loop count reaches, say, 1, decide we are on a uniprocessor and irreversibly set it to 1. As written this would tend to incur a select() delay once per hundred spinlock acquisitions, which is way too much, but I think we could make it work with a sufficiently slow adaptation rate.

The tricky part is that a slow adaptation rate means we can't have every backend figuring this out for itself --- the right value would have to be maintained globally, and I'm not sure how to do that without adding a lot of overhead.

regards, tom lane
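The adaptation rule Tom describes is small enough to write down directly. The sketch below follows the numbers in his message (minus one on success, plus 100 on failure, latch at 1), but everything else is an assumption made for illustration: the `spin_tuner` struct, the function name, and the `SPIN_TUNER_MAX` cap are hypothetical, and the state here is per-process, so it does not address the "maintained globally" problem he raises.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SPIN_TUNER_MAX 1000000  /* hypothetical upper bound, not from the thread */

typedef struct
{
    int  spins;         /* current allowed spin count, always >= 1 */
    bool uniprocessor;  /* latched once spins has decayed to 1 */
} spin_tuner;

/*
 * Update the allowed spin count after one contended acquisition.
 * got_lock_by_spinning is true if the spin loop obtained the lock, and
 * false if the budget ran out and the caller had to fall back to a delay.
 */
static void
spin_tuner_update(spin_tuner *t, bool got_lock_by_spinning)
{
    assert(t != NULL && t->spins >= 1);

    if (t->uniprocessor)
        return;                         /* decision is irreversible */

    if (got_lock_by_spinning)
    {
        if (t->spins > 1)
            t->spins -= 1;              /* success: shrink slowly */
        if (t->spins == 1)
            t->uniprocessor = true;     /* decayed all the way: stop adapting */
    }
    else
    {
        t->spins += 100;                /* failure: grow quickly */
        if (t->spins > SPIN_TUNER_MAX)
            t->spins = SPIN_TUNER_MAX;
    }
}
```

With these step sizes the count settles where roughly one acquisition in a hundred fails, which is the "select() delay once per hundred spinlock acquisitions" Tom calls too much. A slower adaptation rate would mean a larger ratio between the two step sizes.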