Re: [PATCHES] WIN32_CONSOLE usage
Bruce Momjian wrote: I think we ought to detect the console type automatically anyway. Certainly we don't want people to have to set this variables all the time. Agreed. Automatic is ideal. Any ideas out there? Does input from the console work correctly right now? What about setting the code page of the console to the ansi code page? Or to the code page that postgresql uses internally. SetConsoleCP(GetACP()); SetConsoleOutputCP(GetACP()); might be sufficient. -- Manfred ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [PATCHES] WIN32_CONSOLE usage
Christoph Dalitz wrote: On Thu, 11 Sep 2003 20:51:24 +0200 Manfred Spraul <[EMAIL PROTECTED]> wrote: - OemToChar() and CharToOem() convert all console input/output. In the long run this might be the better solution, if it works entirely without user intervention. I'm not sure if it's possible to get all corner cases right. I do not think that it is possible to handle all cases automatically. Even restricting the conversion to input/output on stdin/stdout will fail in some circumstances; eg. with the command "psql < script.sql" when the script has been written with a windows editor (notepad, emacs or whatever). It would be very hard to get right. Perhaps something like isatty(), except that it's probably called GetHandleIsConsoleFlag() or something like that, followed by a check of the code page. But I didn't find a suitable function yet. -- Manfred ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])
Re: [PATCHES] WIN32_CONSOLE usage
Peter Eisentraut wrote: If you can detect that they are different, why can't you adjust the code page in that case only? What should we do if we detect that they differ: - set the console code page to the ansi code page. This has two drawbacks: It doesn't work with Indic, because Indic doesn't have an ansi code page. And the user must still switch the console font. But: if the user must change the configuration, then he can as easily change both the font and the code page. Which means: SetConsoleCP is the wrong approach. - OemToChar() and CharToOem() convert all console input/output. In the long run this might be the better solution, if it works entirely without user intervention. I'm not sure if it's possible to get all corner cases right. -- Manfred ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])
[PATCHES] Align large shared memory allocations
Attached is a patch that aligns large shared memory allocations beyond MAXIMUM_ALIGNOF. The reason for this is that Intel's cpus have a fast path for bulk memory copies that only works with aligned addresses. It's possible that other cpus have similar restrictions. With 7.3.4, it achives a 5% performance gain with pgbench. It has no effect with 7.3.3, because the buffers are already aligned by chance. I haven't properly tested 7.4cvs yet. One problem is the "32" - it's arbitrary, it probably belongs into an arch dependant header file. But where? -- Manfred diff -u pgsql.orig/src/backend/storage/ipc/shmem.c pgsql/src/backend/storage/ipc/shmem.c --- pgsql.orig/src/backend/storage/ipc/shmem.c 2003-09-20 20:17:08.0 +0200 +++ pgsql/src/backend/storage/ipc/shmem.c 2003-09-20 20:34:21.0 +0200 @@ -131,6 +131,7 @@ void * ShmemAlloc(Size size) { + uint32 newStart; uint32 newFree; void *newSpace; @@ -146,10 +147,21 @@ SpinLockAcquire(ShmemLock); - newFree = shmemseghdr->freeoffset + size; + newStart = shmemseghdr->freeoffset; + if (size >= BLCKSZ) + { + /* Align BLCKSZ sized buffers even further: +* - the costs are small +* - some cpus (most notably Intel Pentium III) +* prefer well-aligned addresses for memory copies +*/ + newStart = TYPEALIGN(32, newStart); + } + + newFree = newStart + size; if (newFree <= shmemseghdr->totalsize) { - newSpace = (void *) MAKE_PTR(shmemseghdr->freeoffset); + newSpace = (void *) MAKE_PTR(newStart); shmemseghdr->freeoffset = newFree; } else ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [PATCHES] Align large shared memory allocations
Tom Lane wrote: Manfred Spraul <[EMAIL PROTECTED]> writes: Attached is a patch that aligns large shared memory allocations beyond MAXIMUM_ALIGNOF. The reason for this is that Intel's cpus have a fast path for bulk memory copies that only works with aligned addresses. This patch is missing a demonstration that it's actually worth anything. What kind of performance gain do you get? 7.4cvs on a 1.13 GHz Intel Celeron mobile, 384 MB RAM, "Severn" RedHat Linux 2.4 beta, postmaster -N 30 -B 64, data directory on ramdisk, pgbench -c 10 -s 11 -t 1000: Without the patch: 124 tps with the patch: 130 tps. I've reduced the buffer setting to 64 because without that, a too large part of the database was cached by postgres. I expect that with all Intel Pentium III chips, it will be worth 10-20% less system time. I had around 30% system time after reducing the number of buffers, thus the ~5% performance improvement. We don't really have arch-dependent header files. What I'd be inclined to do is "#define ALIGNOF_BUFFER 32" in pg_config_manual.h, then #define BUFFERALIGN(LEN) to parallel the other TYPEALIGN macros in c.h, and finally use that in the ShmemAlloc code. Ok, new patch attached. -- Manfred Index: src/backend/storage/ipc/shmem.c === RCS file: /projects/cvsroot/pgsql-server/src/backend/storage/ipc/shmem.c,v retrieving revision 1.70 diff -u -r1.70 shmem.c --- src/backend/storage/ipc/shmem.c 4 Aug 2003 02:40:03 - 1.70 +++ src/backend/storage/ipc/shmem.c 21 Sep 2003 07:53:13 - @@ -131,6 +131,7 @@ void * ShmemAlloc(Size size) { + uint32 newStart; uint32 newFree; void *newSpace; @@ -146,10 +147,14 @@ SpinLockAcquire(ShmemLock); - newFree = shmemseghdr->freeoffset + size; + newStart = shmemseghdr->freeoffset; + if (size >= BLCKSZ) + newStart = BUFFERALIGN(newStart); + + newFree = newStart + size; if (newFree <= shmemseghdr->totalsize) { - newSpace = (void *) MAKE_PTR(shmemseghdr->freeoffset); + newSpace = (void *) MAKE_PTR(newStart); shmemseghdr->freeoffset = newFree; } else Index: src/include/c.h === RCS file: /projects/cvsroot/pgsql-server/src/include/c.h,v retrieving revision 1.152 diff -u -r1.152 c.h --- src/include/c.h 4 Aug 2003 02:40:10 - 1.152 +++ src/include/c.h 21 Sep 2003 07:53:14 - @@ -529,6 +529,7 @@ #define LONGALIGN(LEN) TYPEALIGN(ALIGNOF_LONG, (LEN)) #define DOUBLEALIGN(LEN) TYPEALIGN(ALIGNOF_DOUBLE, (LEN)) #define MAXALIGN(LEN) TYPEALIGN(MAXIMUM_ALIGNOF, (LEN)) +#define BUFFERALIGN(LEN) TYPEALIGN(ALIGNOF_BUFFER, (LEN)) /* Index: src/include/pg_config_manual.h === RCS file: /projects/cvsroot/pgsql-server/src/include/pg_config_manual.h,v retrieving revision 1.5 diff -u -r1.5 pg_config_manual.h --- src/include/pg_config_manual.h 4 Aug 2003 00:43:29 - 1.5 +++ src/include/pg_config_manual.h 21 Sep 2003 07:53:14 - @@ -176,6 +176,14 @@ */ #define MAX_RANDOM_VALUE (0x7FFF) +/* + * Alignment of the disk blocks in the shared memory area. + * A significant amount of the total system time is required for + * copying disk blocks between the os buffers and the cache in the + * shared memory area. Some cpus (most notably Intel Pentium III) + * prefer well-aligned addresses for memory copies. + */ +#define ALIGNOF_BUFFER 32 /* * ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PATCHES] fix for strict-alias warnings
Tom Lane wrote: "Andrew Dunstan" <[EMAIL PROTECTED]> writes: Of course, the linux kernel is aimed at a limited set of compilers - as I understand it basically gcc although it has been made to build with Intel compilers icc once compiled the kernel. But they had to teach it quite a lots of gccisms. - which makes things somewhat easier for them. What is our target set of compilers? What is our target version of C? "Pretty much anything that speaks ANSI C" is my usual feeling about that. As yet we have not heard of any non-gcc compilers in which this is a problem, although you have a point that some compiler somewhere may do this and not have a way to turn it off :-( Intel's icc compiler supports strict alias analysis, but the default was off. Also note that uninhibited casting between types can still cause alignment problems, We understand that issue, we solved it years ago. BTW, I haven't looked at the problem spots in detail. How many of them are due to the use of MemSet in conjunction with other access to a chunk of memory? ISTM that we need not worry about code motion around a MemSet call, since that would require the compiler to prove that the memset() path through the macro wouldn't be affected, which I doubt it would think. gcc is quite good at propagating constants around. This is heavily used in the linux-kernel: __builtin_constant(x), and then large switch statements that are completely evaluated at compile time. There is a good chance that gcc figures out that MemSet(,0,sizeof(double)) are two writes to two integer values, and then decides that they can't alias with reads/write to the double. I'll search for a suitable gcc list and post the memset macro - that might give a definitive answer. -- Manfred ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PATCHES] fix for strict-alias warnings
I've asked the question on the gcc devel list. The first reply was that MemSet violates strict aliasing rules: http://gcc.gnu.org/ml/gcc/2003-10/msg00524.html I think we must either add -fno-strict-aliasing, or switch to the c compiler memset functions for gcc. -- Manfred ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])
Re: [PATCHES] fix for strict-alias warnings
Tom Lane wrote: Given that gcc is smart enough not to move any code across the memset() call, I doubt that it would be moving anything across the whole if() construct. Now if the if-condition were such that the memset code path could be optimized away, then we'd have a problem, but in practice I do not believe gcc is smart enough to realize that the alignment check is always true. gcc-3.2.2 optimizes the memset away - that's a simple exercise for gcc. gcc-3.2.2 isn't smart enough to replace everything - it didn't like the pointer arithmetics. After some massaging, I've succeeded in generating bad code using a slightly modified MemSetAligned macro (parameters -O2 -fstrict-aliasing): gcc pipelined the x*x around the memset. Annotated asm output with gcc -O99 -fomit-frame-pointer -fstrict-aliasing: 08048328 : 8048328: 83 ec 18sub$0x18,%esp > stack setup for automatic variables. 804832b: c7 44 24 0c 00 00 00movl $0x0,0xc(%esp,1) 8048332: 00 8048333: c7 44 24 10 00 00 00movl $0x4000,0x10(%esp,1) 804833a: 40 x = 1.0; 804833b: dd 44 24 0c fldl 0xc(%esp,1) 804833f: d8 c8 fmul %st(0),%st x = x*x; 8048341: c7 44 24 0c 00 00 00movl $0x0,0xc(%esp,1) 8048348: 00 8048349: c7 44 24 10 00 00 00movl $0x0,0x10(%esp,1) 8048350: 00 > MemSetAligned(): optimized to storing two ints. 8048351: dd 54 24 0c fstl 0xc(%esp,1) >>> write back the result of x*x to the stack 8048355: dd 1c 24fstpl (%esp,1) >>> push x*x for printf call 8048358: 68 54 84 04 08 push $0x8048454 >>> push pointer to "square is %f.\n" 804835d: e8 06 ff ff ff call 8048268 <_init+0x38> call printf 8048362: 83 c4 1cadd$0x1c,%esp 8048365: c3 ret > and exit. 8048366: 89 f6 mov%esi,%esi To paraphrase the ISO C line: gcc is not your grandfather's gcc. It's within 10% of the best compilers for SpecInt - the propagation and analysis of constants it quite good, and several bugs were fixed sinced 3.2.2. What is the oldest gcc versions still supported by postgres? It seems that the strict alias analysis is from the egcs tree. Probably first supported by egcs-1.1.2 - is that gcc-2.91? http://groups.google.de/groups?q=g:thl2087564510d&dq=&hl=de&lr=&ie=UTF-8&oe=UTF-8&selm=fa.fjlldvv.l7m7hk%40ifi.uio.no -- Manfred #include #include typedef signed int int32; typedef int Size; #define INT_ALIGN_MASK (sizeof(int)-1) #define MEMSET_LOOP_LIMIT 1024 #define MemSetAligned(start, val, len) \ do \ { \ int32 * _start = (int32 *) (start); \ int _val = (val); \ Size _len = (len); \ \ if ((_len & INT_ALIGN_MASK) == 0 && \ _val == 0 && \ _len <= MEMSET_LOOP_LIMIT) \ { \ Sizeoffset; \ _len = _len/sizeof(int); \ for (offset=0;offset<_len;offset++) \ _start[offset] = 0; \ } \ else \ memset_is_this_function_optimized_away_question_mark((char *) _start, _val, _len); \ } while (0) void test2(void) { double x; x = 2.0; MemSetAligned(&x, 0, sizeof(double)); x = x*x; printf("square is %f.\n", x); } int main(void) { test2(); return 1; } ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [PATCHES] fix for strict-alias warnings
Tom Lane wrote: Manfred Spraul <[EMAIL PROTECTED]> writes: After some massaging, I've succeeded in generating bad code using a slightly modified MemSetAligned macro (parameters -O2 -fstrict-aliasing): gcc pipelined the x*x around the memset. As I already explained, we do not care about the MemSetAligned case. Is gcc 3.3 smart enough to optimize away the pointer alignment test in the full macro? 3.2 optimizes away the pointer alignment test, but then doesn't pipeline the "x*x" calculation. It might be due to a known (and now fixed) bug in gcc where is lost track of constants, and thus didn't succeed in optimizing long calculations. I don't have gcc 3.3 installed, but IMHO it would be insane to leave strict alias analysis enabled - writing to *(int32*)addr violates the alias rules, the bad code generated with MemSetAligned proved that. Is someone around with 3.3 who could test MemSet? -- Manfred ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faqs/FAQ.html
[PATCHES] SIGPIPE handling, take two.
pqsecure_write tries to catch SIGPIPE signals generated by network disconnects by setting the signal handler to SIG_IGN. The current approach causes several problems: - it always sets SA_RESTART when it restores the old handler. - it's not reliable for multi threaded apps, because another thread could change the signal handler inbetween. - it's slow, because after setting a signal handler to SIG_IGN the kernel must enumerate all threads and clear all pending signals (at least FreeBSD-5.1 and linux-2.6 do that. Earlier linux kernels don't - their signal handling is known to be broken for multithreaded apps). Initially I proposed a new option for PQconnectdb, but Tom didn't like that. The attached patch autodetects if it should set the signal handler, Tom proposed that. The code doesn't try to check if the signal is "handled" by blocking it, because I haven't figured out how to check that: sigprocmask is undefined for multithreaded apps and calling pthread_sigmask would force every libpq user to link against libpthread. -- Manfred ? src/interfaces/libpq/libpq.so.3.1 Index: src/interfaces/libpq/fe-connect.c === RCS file: /projects/cvsroot/pgsql-server/src/interfaces/libpq/fe-connect.c,v retrieving revision 1.263 diff -c -r1.263 fe-connect.c *** src/interfaces/libpq/fe-connect.c 18 Oct 2003 05:02:06 - 1.263 --- src/interfaces/libpq/fe-connect.c 2 Nov 2003 18:29:40 - *** *** 41,46 --- 41,47 #include #endif #include + #include #endif #include "libpq/ip.h" *** *** 951,956 --- 952,983 else if (conn->sslmode[0] == 'a') /* "allow" */ conn->wait_ssl_try = true; #endif + /* +* Autodetect SIGPIPE signal handling: +* The default action per Unix spec is kill current process and +* that's not acceptable. If the current setting is not the default, +* then assume that the caller knows what he's doing and leave the +* signal handler unchanged. Otherwise set the signal handler to +* SIG_IGN around each send() syscall. Unfortunately this is both +* unreliable and slow for multithreaded apps. +*/ + conn->do_sigaction = true; + #if !defined(HAVE_POSIX_SIGNALS) + { + pqsigfunc old; + old = signal(SIGPIPE, SIG_IGN); + if (old != SIG_DFL) + conn->do_sigaction = false; + signal(SIGPIPE, old); + } + #else + { + struct sigaction oact; + + if (sigaction(SIGPIPE, NULL, &oact) == 0 && oact.sa_handler != SIG_DFL) + conn->do_sigaction = false; + } + #endif /* !HAVE_POSIX_SIGNALS */ /* * Set up to try to connect, with protocol 3.0 as the first attempt. Index: src/interfaces/libpq/fe-secure.c === RCS file: /projects/cvsroot/pgsql-server/src/interfaces/libpq/fe-secure.c,v retrieving revision 1.32 diff -c -r1.32 fe-secure.c *** src/interfaces/libpq/fe-secure.c29 Sep 2003 16:38:04 - 1.32 --- src/interfaces/libpq/fe-secure.c2 Nov 2003 18:29:41 - *** *** 348,354 ssize_t n; #ifndef WIN32 ! pqsigfunc oldsighandler = pqsignal(SIGPIPE, SIG_IGN); #endif #ifdef USE_SSL --- 348,357 ssize_t n; #ifndef WIN32 ! pqsigfunc oldsighandler = NULL; ! ! if (conn->do_sigaction) ! oldsighandler = pqsignal(SIGPIPE, SIG_IGN); #endif #ifdef USE_SSL *** *** 408,414 n = send(conn->sock, ptr, len, 0); #ifndef WIN32 ! pqsignal(SIGPIPE, oldsighandler); #endif return n; --- 411,418 n = send(conn->sock, ptr, len, 0); #ifndef WIN32 ! if (conn->do_sigaction) ! pqsignal(SIGPIPE, oldsighandler); #endif return n; Index: src/interfaces/libpq/libpq-int.h === RCS file: /projects/cvsroot/pgsql-server/src/interfaces/libpq/libpq-int.h,v retrieving revision 1.82 diff -c -r1.82 libpq-int.h *** src/interfaces/libpq/libpq-int.h5 Sep 2003 02:08:36 - 1.82 --- src/interfaces/libpq/libpq-int.h2 Nov 2003 18:29:42 - *** *** 329,334 --- 329,335 charpeer_dn[256 + 1]; /* peer distinguished name */ charpeer_cn[SM_USER + 1]; /* peer common name */ #endif + booldo_sigaction; /* set SIGPIPE to SIG_IGN around every send() call */ /* Buffer for current error message */ PQExpBufferData errorMessage; /* expansible string */ ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
[PATCHES] sigpipe handling
Attatched is the latest version of my patch that makes the signal(SIG_PIPE, SIG_IGN) calls around the send() syscall conditional: they are not sufficient to ensure that multithreaded libpq users are not killed by SIGPIPE signals, and they cause a noticable slowdown. I've switched to a global flag, and two functions to get/set the flag. Any other ideas how to protect libpq against SIGPIPE? -- Manfred Index: contrib/pgbench/README.pgbench === RCS file: /projects/cvsroot/pgsql-server/contrib/pgbench/README.pgbench,v retrieving revision 1.9 diff -c -r1.9 README.pgbench *** contrib/pgbench/README.pgbench 10 Jun 2003 09:07:15 - 1.9 --- contrib/pgbench/README.pgbench 8 Nov 2003 21:43:53 - *** *** 112,117 --- 112,121 might be a security hole since ps command will show the password. Use this for TESTING PURPOSE ONLY. + -a + Disable SIGPIPE delivery globally instead of within each + libpq operation. + -n No vacuuming and cleaning the history table prior to the test is performed. Index: contrib/pgbench/pgbench.c === RCS file: /projects/cvsroot/pgsql-server/contrib/pgbench/pgbench.c,v retrieving revision 1.27 diff -c -r1.27 pgbench.c *** contrib/pgbench/pgbench.c 27 Sep 2003 19:15:34 - 1.27 --- contrib/pgbench/pgbench.c 8 Nov 2003 21:43:54 - *** *** 28,33 --- 28,34 #else #include #include + #include #ifdef HAVE_GETOPT_H #include *** *** 105,112 static void usage() { ! fprintf(stderr, "usage: pgbench [-h hostname][-p port][-c nclients][-t ntransactions][-s scaling_factor][-n][-C][-v][-S][-N][-l][-U login][-P password][-d][dbname]\n"); ! fprintf(stderr, "(initialize mode): pgbench -i [-h hostname][-p port][-s scaling_factor][-U login][-P password][-d][dbname]\n"); } /* random number generator */ --- 106,113 static void usage() { ! fprintf(stderr, "usage: pgbench [-h hostname][-p port][-c nclients][-t ntransactions][-s scaling_factor][-n][-C][-v][-S][-N][-l][-a][-U login][-P password][-d][dbname]\n"); ! fprintf(stderr, "(initialize mode): pgbench -i [-h hostname][-p port][-s scaling_factor][-U login][-P password][-d][dbname][-a]\n"); } /* random number generator */ *** *** 703,712 else if ((env = getenv("PGUSER")) != NULL && *env != '\0') login = env; ! while ((c = getopt(argc, argv, "ih:nvp:dc:t:s:U:P:CNSl")) != -1) { switch (c) { case 'i': is_init_mode++; break; --- 704,719 else if ((env = getenv("PGUSER")) != NULL && *env != '\0') login = env; ! while ((c = getopt(argc, argv, "aih:nvp:dc:t:s:U:P:CNSl")) != -1) { switch (c) { + case 'a': + #ifndef WIN32 + signal(SIGPIPE, SIG_IGN); + #endif + PQsetsighandling(0); + break; case 'i': is_init_mode++; break; Index: doc/src/sgml/libpq.sgml === RCS file: /projects/cvsroot/pgsql-server/doc/src/sgml/libpq.sgml,v retrieving revision 1.141 diff -c -r1.141 libpq.sgml *** doc/src/sgml/libpq.sgml 1 Nov 2003 01:56:29 - 1.141 --- doc/src/sgml/libpq.sgml 8 Nov 2003 21:43:56 - *** *** 645,650 --- 645,693 + + PQsetsighandlingPQsetsighandling + PQgetsighandlingPQgetsighandling + + +Set/query SIGPIPE signal handling. + + void PQsetsighandling(int internal_sigign); + + + int PQgetsighandling(void); + + + + + These functions allow to query and set the SIGPIPE signal handling + of libpq: by default, Unix systems generate a (fatal) SIGPIPE signal + on write attempts to a disconnected socket. Most callers expect a + normal error return instead of the signal. A normal error return can + be achieved by blocking or ignoring the SIGPIPE signal. This can be + done either globally in the application or inside libpq. + + + If internal signal handling is enabled (this is the default), then + libpq sets the SIGPIPE handler to SIG_IGN before every socket send + operation and restores it afterwards. This prevents libpq from + killing the application, at the cost of a slight performance + decrease. This approach is not reliable for multithreaded applications. + + + If internal signal handling is disabled, then the caller is + respons
Re: [PATCHES] SIGPIPE handling, take two.
Tom Lane wrote: Bruce Momjian <[EMAIL PROTECTED]> writes: I think this is the patch I like. The #if coding is messy and unnecessary. You could do the test as per the non-POSIX variant using two calls of pqsignal(), and not have any system dependence here, nor a need for . What about multithreaded apps? old = pgsignal(SIPEPIPE, SIG_IGN); ** another thread calls sigaction(SIGPIPE,,); pgsignal(SIGPIPE, old); And the signal state is corrupted. What about extending pgsignal: pgsignal(signo, SIG_ERR); reads the current signal handler. I'll update my patch. From your other mail: No, because this patch does not have any global effect on the signal handling. It might be unnecessary to check per-connection, but it doesn't hurt, and on grounds of cleanliness I'd prefer to avoid a global variable. I agree - global state would require global synchronization. -- Manfred ---(end of broadcast)--- TIP 8: explain analyze is your friend
Re: [PATCHES] SIGPIPE handling, take two.
Tom Lane wrote: I don't think we need to complicate pqsignal's API for this. Instead we'd better document that SIGPIPE handling has to be set up and kept stable before doing any libpq operations in a multithread app. Not reliable. An app could install it's own signal handler and block SIGPIPE around all libpq calls. Signal blocking is per-thread. But the SIG_IGN/restore sequence affects the whole app - PQconnectdb calls would result in randomly dropped SIGPIPE signals. -- Manfred ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faqs/FAQ.html
[PATCHES] SIGPIPE handling
Hi, attached is an update of my automatic sigaction patch: I've moved the actual sigaction calls into pqsignal.c and added a helper function (pgsignalinquire(signo)). I couldn't remove the include from fe-connect.c: it's required for the SIGPIPE definition. Additionally I've added a -a flag for pgbench that sets the signal handler before calling PQconnectdb. Tested on Fedora Core 1 (Redhat Linux) with pgbench. -- Manfred Index: src/interfaces/libpq/fe-connect.c === RCS file: /projects/cvsroot/pgsql-server/src/interfaces/libpq/fe-connect.c,v retrieving revision 1.263 diff -c -r1.263 fe-connect.c *** src/interfaces/libpq/fe-connect.c 18 Oct 2003 05:02:06 - 1.263 --- src/interfaces/libpq/fe-connect.c 16 Nov 2003 11:44:47 - *** *** 41,46 --- 41,48 #include #endif #include + #include + #include "pqsignal.h" #endif #include "libpq/ip.h" *** *** 881,886 --- 883,891 struct addrinfo hint; const char *node = NULL; int ret; + #ifndef WIN32 + pqsigfunc pipehandler; + #endif if (!conn) return 0; *** *** 950,955 --- 955,976 conn->allow_ssl_try = false; else if (conn->sslmode[0] == 'a') /* "allow" */ conn->wait_ssl_try = true; + #endif + #ifndef WIN32 + /* +* Autodetect SIGPIPE signal handling: +* The default action per Unix spec is kill current process and +* that's not acceptable. If the current setting is not the default, +* then assume that the caller knows what he's doing and leave the +* signal handler unchanged. Otherwise set the signal handler to +* SIG_IGN around each send() syscall. Unfortunately this is both +* unreliable and slow for multithreaded apps. +*/ + pipehandler = pqsignalinquire(SIGPIPE); + if (pipehandler == SIG_DFL || pipehandler == SIG_ERR) + conn->do_sigaction = true; + else + conn->do_sigaction = false; #endif /* Index: src/interfaces/libpq/fe-secure.c === RCS file: /projects/cvsroot/pgsql-server/src/interfaces/libpq/fe-secure.c,v retrieving revision 1.32 diff -c -r1.32 fe-secure.c *** src/interfaces/libpq/fe-secure.c29 Sep 2003 16:38:04 - 1.32 --- src/interfaces/libpq/fe-secure.c16 Nov 2003 11:44:47 - *** *** 348,354 ssize_t n; #ifndef WIN32 ! pqsigfunc oldsighandler = pqsignal(SIGPIPE, SIG_IGN); #endif #ifdef USE_SSL --- 348,357 ssize_t n; #ifndef WIN32 ! pqsigfunc oldsighandler = NULL; ! ! if (conn->do_sigaction) ! oldsighandler = pqsignal(SIGPIPE, SIG_IGN); #endif #ifdef USE_SSL *** *** 408,414 n = send(conn->sock, ptr, len, 0); #ifndef WIN32 ! pqsignal(SIGPIPE, oldsighandler); #endif return n; --- 411,418 n = send(conn->sock, ptr, len, 0); #ifndef WIN32 ! if (conn->do_sigaction) ! pqsignal(SIGPIPE, oldsighandler); #endif return n; Index: src/interfaces/libpq/libpq-int.h === RCS file: /projects/cvsroot/pgsql-server/src/interfaces/libpq/libpq-int.h,v retrieving revision 1.82 diff -c -r1.82 libpq-int.h *** src/interfaces/libpq/libpq-int.h5 Sep 2003 02:08:36 - 1.82 --- src/interfaces/libpq/libpq-int.h16 Nov 2003 11:44:48 - *** *** 329,334 --- 329,337 charpeer_dn[256 + 1]; /* peer distinguished name */ charpeer_cn[SM_USER + 1]; /* peer common name */ #endif + #ifndef WIN32 + booldo_sigaction; /* set SIGPIPE to SIG_IGN around every send() call */ + #endif /* Buffer for current error message */ PQExpBufferData errorMessage; /* expansible string */ Index: src/interfaces/libpq/pqsignal.c === RCS file: /projects/cvsroot/pgsql-server/src/interfaces/libpq/pqsignal.c,v retrieving revision 1.17 diff -c -r1.17 pqsignal.c *** src/interfaces/libpq/pqsignal.c 4 Aug 2003 02:40:20 - 1.17 --- src/interfaces/libpq/pqsignal.c 16 Nov 2003 11:44:48 - *** *** 40,42 --- 40,61 return oact.sa_handler; #endif /* !HAVE_POSIX_SIGNALS */ } + + pqsigfunc + pqsignalinquire(int signo) + { + #if !defined(HAVE_POSIX_SIGNALS) + pqsigfunc old; + old = signal(SIGPIPE, SIG_IGN); + signal(SIGPIPE, old); + return old; + #else + struct sigaction oact; + + if (sigaction(SIGPIPE, NULL, &oact) != 0) + return SIG_ERR; +
Re: [PATCHES] SIGPIPE handling
Bruce Momjian wrote: Better. However, I am confused over when we do sigaction. I thought we were going to do it only if they had a signal handler defined, meaning if (pipehandler != SIG_DFL && pipehandler != SIG_IGN && pipehandler != SIG_ERR) conn->do_sigaction = true; else conn->do_sigaction = false; By doing this, we don't do sigaction in the default case where no handler was defined. No. If no handler was definied, then libpq must define a handler. Without a handler, a network disconnect would result in a SIGPIE that kills the app. I thought we would just set the entire application to SIGPIPE <= SIG_IGN. This gives us good performance in all cases except when a signal handler is defined. I don't want to change the whole app - perhaps someone expects that sigpipe works? Perhaps psql for the console input, or something similar? Is running the rest of the application with SIGPIPE <= SIG_IGN a problem? I think that depends on the application, and libpq shouldn't mandate that SIGPIPE must be SIG_IGN. Right now libpq tries to catch sigpipe signals by manually installing/restoring a signal handler around send() calls. This doesn't work for multithreaded apps, because the signal handlers are per-process, not per-thread. Thus for multithreaded apps, the libpq user is responsible for handling sigpipe. The API change should be a big problem - the current system doesn't work, and there shouldn't be many multithreaded apps. But how should libpq notice that the caller handles sigpipe signals? a) autodetection - if the sigpipe handler is not the default, then the caller knows what he's doing. b) a new PGsetsignalhandler() function. c) an additional flag passed to PGconnectdb. Tom preferred a). One problem is that the autodetection is not perfect: an app could block the signal with sigprocmask, or it could install a handler that doesn't expect sigpipe signals from within libpq. I would prefer b), because it guarantees that the patch has no effect on existing apps. c) is bad, Tom explained that the connect string is often directly specified by the user. -- Manfred ---(end of broadcast)--- TIP 8: explain analyze is your friend
Re: [PATCHES] SRA Win32 sync() code
Tom Lane wrote: Seriously though, if we can move the bulk of the writing work into background processes then I don't believe that there will be any significant penalty for regular backends. And I believe that it would be a huge advantage from a correctness point of view if we could stop depending on sync(). Which function guarantees that renames of WAL files arrived on the disk? AFAIK sync() is the only function that guarantees that. What about the sync app from sysinternals? It seems Mark Russinovich figured out how to implement sync on Win32: http://www.sysinternals.com/ntw2k/source/misc.shtml#Sync It requires administrative priveledges, but it shouldn't be that difficult to write a tiny service that runs in the LocalSystem account, listens to a pipe and syncs all disks when asked. -- Manfred ---(end of broadcast)--- TIP 8: explain analyze is your friend
Re: [PATCHES] SIGPIPE handling
Bruce Momjian wrote: I thought it should be global too, basically testing on the first connection request. What if two PQconnect calls happen at the same time? I would really prefer the manual approach with a new PQsetsighandler function - the autodetection is fragile, it's trivial to find a special case where it breaks. Bruce, you wrote that a new function would be overdesign. Are you sure? Your simpler proposals all fail with multithreaded apps. I've attached the patch that implements the global flag with two special function that access it. -- Manfred Index: contrib/pgbench/README.pgbench === RCS file: /projects/cvsroot/pgsql-server/contrib/pgbench/README.pgbench,v retrieving revision 1.9 diff -c -r1.9 README.pgbench *** contrib/pgbench/README.pgbench 10 Jun 2003 09:07:15 - 1.9 --- contrib/pgbench/README.pgbench 8 Nov 2003 21:43:53 - *** *** 112,117 --- 112,121 might be a security hole since ps command will show the password. Use this for TESTING PURPOSE ONLY. + -a + Disable SIGPIPE delivery globally instead of within each + libpq operation. + -n No vacuuming and cleaning the history table prior to the test is performed. Index: contrib/pgbench/pgbench.c === RCS file: /projects/cvsroot/pgsql-server/contrib/pgbench/pgbench.c,v retrieving revision 1.27 diff -c -r1.27 pgbench.c *** contrib/pgbench/pgbench.c 27 Sep 2003 19:15:34 - 1.27 --- contrib/pgbench/pgbench.c 8 Nov 2003 21:43:54 - *** *** 28,33 --- 28,34 #else #include #include + #include #ifdef HAVE_GETOPT_H #include *** *** 105,112 static void usage() { ! fprintf(stderr, "usage: pgbench [-h hostname][-p port][-c nclients][-t ntransactions][-s scaling_factor][-n][-C][-v][-S][-N][-l][-U login][-P password][-d][dbname]\n"); ! fprintf(stderr, "(initialize mode): pgbench -i [-h hostname][-p port][-s scaling_factor][-U login][-P password][-d][dbname]\n"); } /* random number generator */ --- 106,113 static void usage() { ! fprintf(stderr, "usage: pgbench [-h hostname][-p port][-c nclients][-t ntransactions][-s scaling_factor][-n][-C][-v][-S][-N][-l][-a][-U login][-P password][-d][dbname]\n"); ! fprintf(stderr, "(initialize mode): pgbench -i [-h hostname][-p port][-s scaling_factor][-U login][-P password][-d][dbname][-a]\n"); } /* random number generator */ *** *** 703,712 else if ((env = getenv("PGUSER")) != NULL && *env != '\0') login = env; ! while ((c = getopt(argc, argv, "ih:nvp:dc:t:s:U:P:CNSl")) != -1) { switch (c) { case 'i': is_init_mode++; break; --- 704,719 else if ((env = getenv("PGUSER")) != NULL && *env != '\0') login = env; ! while ((c = getopt(argc, argv, "aih:nvp:dc:t:s:U:P:CNSl")) != -1) { switch (c) { + case 'a': + #ifndef WIN32 + signal(SIGPIPE, SIG_IGN); + #endif + PQsetsighandling(0); + break; case 'i': is_init_mode++; break; Index: doc/src/sgml/libpq.sgml === RCS file: /projects/cvsroot/pgsql-server/doc/src/sgml/libpq.sgml,v retrieving revision 1.141 diff -c -r1.141 libpq.sgml *** doc/src/sgml/libpq.sgml 1 Nov 2003 01:56:29 - 1.141 --- doc/src/sgml/libpq.sgml 8 Nov 2003 21:43:56 - *** *** 645,650 --- 645,693 + + PQsetsighandlingPQsetsighandling + PQgetsighandlingPQgetsighandling + + +Set/query SIGPIPE signal handling. + + void PQsetsighandling(int internal_sigign); + + + int PQgetsighandling(void); + + + + + These functions allow to query and set the SIGPIPE signal handling + of libpq: by default, Unix systems generate a (fatal) SIGPIPE signal + on write attempts to a disconnected socket. Most callers expect a + normal error return instead of the signal. A normal error return can + be achieved by blocking or ignoring the SIGPIPE signal. This can be + done either globally in the application or inside libpq. + + + If internal signal handling is enabled (this is the default), then + libpq sets the SIGPIPE handler to SIG_IGN before every socket send + operation and restores it afterwards. This prevents libpq from + killing the application, at the cost of a slight performance + dec
Re: [PATCHES] SIGPIPE handling
Bruce Momjian wrote: Here is my logic --- 99% of apps don't install a SIGPIPE signal handler, and 90% will not add a SIGPIPE/SIG_IGN call to their applications. I guess I am looking for something that would allow the performance benefit of not doing a pgsignal() call around very send() for the majority of our apps. What was the speed improvement? Around 10% for a heavily multithreaded app on an 8-way Xeon server. Far less for a single threaded app and far less for uniprocessor systems: the kernel must update the pending queue of all threads and that causes lots of contention for the (per-process) spinlock that protects the signal handlers. Granted, we need to do something because our current setup isn't even thread-safe. Also, how is your patch more thread-safe than the old one? The detection is thread-safe, but I don't see how the use is. First function in main(): signal(SIGPIPE, SIG_IGN); PQsetsighandling(1); This results in perfectly thread-safe sigpipe handling. If it's a multithreaded app that needs correct correct per-thread delivery of SIGPIPE signals for console IO, then the libpq user must implement the sequence I describe below. If you still pgsignal around the calls, I don't see how two threads couldn't do: thread 1thread 2 pgsignal(SIGPIPE, SIG_IGN); pgsignal(SIGPIPE, SIG_DFL); send(); pgsignal(SIGPIPE, SIG_DFL); send(); pgsignal(SIGPIPE, SIG_DFL); This runs thread1 with SIGPIPE as SIG_DFL. Correct. A thread safe sequence might be something like: pthread_sigmask(SIG_BLOCK,{SIGPIPE}); send(); if (sigpending(SIGPIPE) { sigwait({SIGPIPE},); } pthread_sigmask(SIG_UNBLOCK,{SIGPIPE}); But this sequence only works for users that link against libpthread. And the same sequence with sigprocmask is undefined for multithreaded apps. -- Manfred ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PATCHES] SIGPIPE handling
Bruce Momjian wrote: OK, I know you had a flag for pgbench, and that doesn't use threads. What speedup do you see there? Tiny. I added the flag to check that my implementation works, not as a benchmark tool. I would not expect a library to require me to do something in my code to be thread-safe --- either it is or it isn't. The library is thread-safe. Just the SIGPIPE handling differs: - single thread: handled by libpq. - multi thread: caller must handle SIGPIPE for libpq. Rationale: posix is broken. Per-thread signal handling is too ugly to think about. Again, let's get it working perfect if they say they are going to use threads with libpq. Does it work OK if the app doesn't use threading? No. pthread_sigmask is part of libpthread - libpq would have to link unconditionally against libpthread. Or use __attribute__((weak, alias())), but that would only work with gcc. Does sigpending/sigwait work efficiently for threads? Another idea is to go with a thread-local storage boolean for each thread, and check that in a signal handler we install. I think installing a signal handler is not an option - libpq is a library, the signal handler is global. Seems synchronous signals like SIGPIPE are delivered to the thread that invoked them, and we can check thread-local storage to see if we were in a send() loop at the time of signal delivery. IMHO way to fragile. -- Manfred ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [PATCHES] initdb copyright notice
Neil Conway wrote: A quick grep of the source tree indicates that the following files claim to be covered by the 4 clause BSD license: $ grep -rlI 'This product includes software developed' * contrib/mysql/my2pg.pl contrib/pgcrypto/README.pgcrypto contrib/pgcrypto/blf.c You must be careful with 3 clause vs. 4 clause BSD licenses: The advertising clause for UC Berkeley is now void, but all other advertising clauses are still in force. i.e. blf.c contains the line "This product includes software developed by Niels Provos", and that must be obeyed. -- Manfred ---(end of broadcast)--- TIP 8: explain analyze is your friend
Re: [pgsql-hackers-win32] [PATCHES] SRA Win32 sync() code
Shridhar Daithankar wrote: Does 30% difference above count as significant? No. It's Linux, we can look at the sources: there is no per-fd cache, the page cache is global. Thus fsync() syncs the whole cache to disk. A problem could only occur if the file cache is not global - perhaps a per-node file cache on NUMA systems - IRIX on an Origin 2000 cluster or something similar. But as I read the unix spec, fsync is guaranteed to sync all data to disk: Draft 6 of the posix-200x spec: SIO If _POSIX_SYNCHRONIZED_IO is defined, the fsync( ) function shall force all currently queued I/O operations associated with the file indicated by file descriptor fildes to the synchronized I/O completion state. All I/O operations shall be completed as defined for synchronized I/O file integrity completion. "All I/O operations associated with the file", not all operations associated with the file descriptor. -- Manfred ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faqs/FAQ.html
[PATCHES] update i386 spinlock for hyperthreading
Hi, Intel recommends to add a special pause instruction into spinlock busy loops. It's necessary for hyperthreading - without it, the cpu can't figure out that a logical thread does no useful work and incorrectly awards lots of execution resources to that thread. Additionally, it's supposed to reduce the time the cpu needs to recover from the (mispredicted) branch after the spinlock was obtained. The attached patch adds a new platform hook and implements it for i386. The new instruction is backward compatible, thus no cpu detection is necessary. Additionally I've increased the number of loops from 100 to 1000 - a 3 GHz Pentium 4 might execute 100 loops faster than a single bus transaction. I don't know if this change is appropriate for all platforms, or if SPINS_PER_DELAY should be made platform specific. Mark did a test run with his dbt-2 benchmark on a 4-way Xeon with HT enabled, and the patch resulted in a 10% performance increase: Before: http://developer.osdl.org/markw/dbt2-pgsql/284/ After: http://developer.osdl.org/markw/dbt2-pgsql/300/ -- Manfred Index: ./src/backend/storage/lmgr/s_lock.c === RCS file: /projects/cvsroot/pgsql-server/src/backend/storage/lmgr/s_lock.c,v retrieving revision 1.22 diff -c -r1.22 s_lock.c *** ./src/backend/storage/lmgr/s_lock.c 23 Dec 2003 22:15:07 - 1.22 --- ./src/backend/storage/lmgr/s_lock.c 26 Dec 2003 22:24:52 - *** *** 76,82 * The select() delays are measured in centiseconds (0.01 sec) because 10 * msec is a common resolution limit at the OS level. */ ! #define SPINS_PER_DELAY 100 #define NUM_DELAYS1000 #define MIN_DELAY_CSEC1 #define MAX_DELAY_CSEC100 --- 76,82 * The select() delays are measured in centiseconds (0.01 sec) because 10 * msec is a common resolution limit at the OS level. */ ! #define SPINS_PER_DELAY 1000 #define NUM_DELAYS1000 #define MIN_DELAY_CSEC1 #define MAX_DELAY_CSEC100 *** *** 111,116 --- 111,117 spins = 0; } + CPU_DELAY(); } } Index: ./src/include/storage/s_lock.h === RCS file: /projects/cvsroot/pgsql-server/src/include/storage/s_lock.h,v retrieving revision 1.123 diff -c -r1.123 s_lock.h *** ./src/include/storage/s_lock.h 23 Dec 2003 22:15:07 - 1.123 --- ./src/include/storage/s_lock.h 26 Dec 2003 22:24:52 - *** *** 52,57 --- 52,66 *in assembly language to execute a hardware atomic-test-and-set *instruction. Equivalent OS-supplied mutex routines could be used too. * + *Additionally, a platform specific delay function can be defined: + * + *void CPU_DELAY(void) + *Notification that the cpu is executing a busy loop. + * + *Some platforms need such an indication. One example are platforms + *that implement SMT, i.e. multiple logical threads that share + *execution resources in one physical cpu. + * *If no system-specific TAS() is available (ie, HAVE_SPINLOCKS is not *defined), then we fall back on an emulation that uses SysV semaphores *(see spin.c). This emulation will be MUCH MUCH slower than a proper TAS() *** *** 115,120 --- 124,140 return (int) _res; } + #define HAS_CPU_DELAY + + #define CPU_DELAY() cpu_delay() + + static __inline__ void + cpu_delay(void) + { + __asm__ __volatile__( + " rep; nop \n" + : : : "memory"); + } #endif /* __i386__ || __x86_64__ */ *** *** 715,720 --- 735,748 #define TAS(lock) tas(lock) #endif /* TAS */ + #ifndef HAS_CPU_DELAY + #define CPU_DELAY() cpu_delay() + + static __inline__ void + cpu_delay(void) + { + } + #endif /* * Platform-independent out-of-line support routines ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [PATCHES] update i386 spinlock for hyperthreading
Tom Lane wrote: Manfred Spraul <[EMAIL PROTECTED]> writes: Intel recommends to add a special pause instruction into spinlock busy loops. It's necessary for hyperthreading - without it, the cpu can't figure out that a logical thread does no useful work and incorrectly awards lots of execution resources to that thread. Additionally, it's supposed to reduce the time the cpu needs to recover from the (mispredicted) branch after the spinlock was obtained. Don't you have to put it in a specific place in the loop to make that work? If not, why not? I doubt that rep;nop is magic enough to recognize the loop that will be generated from s_lock()'s code. Rep;nop is just a short delay - that's all. It means that the cpu pipelines have a chance to drain, and that the other thread gets enough cpu resources. Below is the full instruction documentation, from the latest ia32 doc set from Intel: <<< Improves the performance of spin-wait loops. When executing a spin-wait loop, a Pentium 4 or Intel Xeon processor suffers a severe performance penalty when exiting the loop because it detects a possible memory order violation. The PAUSE instruction provides a hint to the processor that the code sequence is a spin-wait loop. The processor uses this hint to avoid the memory order violation in most situations, which greatly improves processor performance. For this reason, it is recommended that a PAUSE instruction be placed in all spin-wait loops. An additional function of the PAUSE instruction is to reduce the power consumed by a Pentium 4 processor while executing a spin loop. The Pentium 4 processor can execute a spin-wait loop extremely quickly, causing the processor to consume a lot of power while it waits for the resource it is spinning on to become available. Inserting a pause instruction in a spin-wait loop greatly reduces the processor s power consumption. This instruction was introduced in the Pentium 4 processors, but is backward compatible with all IA-32 processors. In earlier IA-32 processors, the PAUSE instruction operates like a NOP instruction. The Pentium 4 and Intel Xeon processors implement the PAUSE instruction as a pre-defined delay. The delay is finite and can be zero for some processors. This instruction does not change the architectural state of the processor (that is, it performs essentially a delaying noop operation). <<< I think a separate function is better than adding it into TAS: if it's part of tas, then it would automatically be included by every SpinLockAcquire call - unnecessary .text bloat. Additionally, there might be other busy loops, in addition to TAS, that could use a delay function. I'll post a new patch that doesn't rely on __inline__ in the i386 independant part. -- Manfred ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [PATCHES] update i386 spinlock for hyperthreading
Tom Lane wrote: Manfred Spraul <[EMAIL PROTECTED]> writes: Tom Lane wrote: Don't you have to put it in a specific place in the loop to make that work? If not, why not? Rep;nop is just a short delay - that's all. That view seems to me to be directly contradicted by this statement: The PAUSE instruction provides a hint to the processor that the code sequence is a spin-wait loop. The processor uses this hint to avoid the memory order violation in most situations, which greatly improves processor performance. It's not apparent to me how a short delay translates into avoiding a memory order violation (possibly some docs on what that means exactly might help...). I suspect strongly that there needs to be some near proximity between the PAUSE instruction and the lock-test instruction for this to work as advertised. It would help if Intel were less coy about what the instruction really does. My guess: Pentium 4 cpu support something like 250 uops in flight - it will have a dozend of the spinlock loops in it's pipeline. When the spinlock is released, it must figure out which of the loops should get it, and gets lost. My guess is that rep;nop delays the cpu buy at least 100 cpu ticks, and thus the pipeline will be empty before it proceeds. I don't have a Pentium 4, and the HP testdrive is down. Someone around who could run my test app? This instruction does not change the architectural state of the processor (that is, it performs essentially a delaying noop operation). This can be rephrased as "we're not telling you what this instruction really does, because its interesting effects are below the level of the instruction set architecture". Great. How are we supposed to know how to use it? There was a w_spinlock.pdf document with reference code. google still finds it, but the links are dead :-( I think a separate function is better than adding it into TAS: if it's part of tas, then it would automatically be included by every SpinLockAcquire call - unnecessary .text bloat. Why do you think it's unnecessary? One thing that I find particularly vague in the quoted documentation is the statement that the PAUSE instruction is needed to avoid a delay when *exiting* the spin-wait loop. Doesn't this mean that a PAUSE is needed in the success path when the first TAS succeeds (i.e, the normal no-contention path)? IIRC: No. If not, why not? If so, does it go before or after the lock instruction? Neither: somewhere in the failure path. Also, if the principal effect is a "short delay", do we really need it at all considering that our inner loop in s_lock is rather more than an "xchgb" followed by a conditional branch? There will be time for the write queue to drain while we're incrementing and testing our spin counter (which I trust is in a register...). The reason I'm so full of questions is that I spent some time several days ago looking at exactly this issue, and came away with only the conclusion that I had to find some more-detailed documentation before I could figure out what we should do about the spinlocks for Xeons. I'll try to find some more docs and post links. The 2nd thing I would change is to add a nonatomic test in the slow path: locked instructions generate lots of bus traffic, and that's a waste of resources. Another question: regardless of the placement of rep;nop - 10% speedup means that the postgres spends far too much time in the spinlock code. I've looked at the oprofile dumps, and something like 1.2% of the total cpu time is spent it the TAS macro in LWLockAcquire. That's the hottest instruction in the whole profile, it eats more cpu cycles than the memcpy() calls that transfer data to/from kernel. Is there an easy way find out which LWLock is contended? -- Manfred /* * skel.cpp. skeleton for rdtsc benchmarks * * Copyright (C) 1999, 2001 by Manfred Spraul. * All rights reserved except the rights granted by the GPL. * * Redistribution of this file is permitted under the terms of the GNU * General Public License (GPL) version 2 or later. * $Header: /pub/home/manfred/cvs-tree/timetest/rep_nop.cpp,v 1.1 2001/04/07 19:38:33 manfred Exp $ */ #include #include #include #include #include // disable local interrupts during benchmark #undef USE_CLI // define a cache flushing function #undef CACHE_FLUSH #ifdef USE_CLI #include #define CLI "cli\n\t" #define STI "sti\n\t" #else #define CLI #define STI #define iopl(a) do { } while(0) #endif // Intel recommends that a serializing instruction // should be called before and after rdtsc. // CPUID is a serializing instruction. // ".align 128:" P 4 L2 cache line size #define read_rdtsc_before(time) \ __asm__ __volatile__( \ ".align 128\n\t" \ "xor %%eax,%%eax\n\t" \ CLI \ "cpuid\n\t" \
Re: [PATCHES] update i386 spinlock for hyperthreading
Tom Lane wrote: Anyway, I've committed your patch with some changes. Thanks. BTW, I noticed a lot of concern in the Intel app notes about reserving 64 or even 128 bytes for each spinlock to avoid cache line conflicts. That seems excessive to me (we use a lot of spinlocks for buffers), but perhaps it is worth looking into. This recommendation usually ignored in the Linux kernel. A few very hot spinlocks have an exclusive cacheline, but most don't. Is there an easy way find out which LWLock is contended? Not from oprofile output, as far as I can think. I've suspected for some time that the BufMgrLock is a major bottleneck, but have no proof. I'll try to write a patch that dumps the LWLock usage and ask mark to run it. -- Manfred ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [PATCHES] update i386 spinlock for hyperthreading
Tom Lane wrote: Is there an easy way find out which LWLock is contended? Not from oprofile output, as far as I can think. I've suspected for some time that the BufMgrLock is a major bottleneck, but have no proof. Mark ran a DBT-2 testrun with the attached statistics patch applied: It collects stats about all lightweight locks and dumps them during shutdown. The hottest locks are Lock Acquire %contention sleep calls 8(WALInsertLock) 8679205 0.030410263934 1(LockMgrLock) 640894180.0797835113215 5(SInvalLock) 683964700.00129888812 0(BufMgrLock) 246307425 0.12029329629089 The lock numbers are from 7.4, i.e. without the patch that removes ShmemIndexLock. I've check that 8 is really WALInsertLock in the assembly output. The scary part from the system perspective are the 35 million context switches that were generated by the BufMgrLock and the LockMgrLock. I remember there were patches that tried other algorithms instead of the simple LRU for the buffer manager. Has anyone tried to change the locking of the buffer manager? The effect of padding the lightweight locks to a full cacheline appears to be negligable: With the padding, there were around 4 million performance monitor hits on the 'lock xchg' instructions. Without it (test run 300), there were 4.2 million hits. The complete data is at http://developer.osdl.org/markw/dbt2-pgsql/303/ The db log with the lock stats is at http://developer.osdl.org/markw/dbt2-pgsql/303/db/log (Warning: 6.9 MB) -- Manfred Index: src/backend/storage/lmgr/lwlock.c === RCS file: /projects/cvsroot/pgsql-server/src/backend/storage/lmgr/lwlock.c,v retrieving revision 1.19 diff -u -r1.19 lwlock.c --- src/backend/storage/lmgr/lwlock.c 20 Dec 2003 17:31:21 - 1.19 +++ src/backend/storage/lmgr/lwlock.c 27 Dec 2003 22:51:36 - @@ -36,6 +36,11 @@ PGPROC *head; /* head of list of waiting PGPROCs */ PGPROC *tail; /* tail of list of waiting PGPROCs */ /* tail is undefined when head is NULL */ + unsigned long long stat_acquire_total; + unsigned long long stat_acquire_fail; + unsigned long long stat_release_total; + unsigned long long stat_release_wakeup; + int fill[20]; } LWLock; /* @@ -159,6 +164,10 @@ lock->shared = 0; lock->head = NULL; lock->tail = NULL; + lock->stat_acquire_total = 0; + lock->stat_acquire_fail = 0; + lock->stat_release_total = 0; + lock->stat_release_wakeup = 0; } /* @@ -245,6 +254,10 @@ if (retry) lock->releaseOK = true; + lock->stat_acquire_total++; + if (retry) + lock->stat_acquire_fail++; + /* If I can get the lock, do so quickly. */ if (mode == LW_EXCLUSIVE) { @@ -440,6 +453,7 @@ Assert(lock->shared > 0); lock->shared--; } + lock->stat_release_total++; /* * See if I need to awaken any waiters. If I released a non-last @@ -477,6 +491,8 @@ } } + if (head) + lock->stat_release_wakeup++; /* We are done updating shared state of the lock itself. */ SpinLockRelease_NoHoldoff(&lock->mutex); @@ -517,5 +533,19 @@ HOLD_INTERRUPTS(); /* match the upcoming RESUME_INTERRUPTS */ LWLockRelease(held_lwlocks[num_held_lwlocks - 1]); + } +} + +void LWLockPrintStats(void); +void +LWLockPrintStats(void) +{ + int i; + for (i=0;istat_acquire_total, lock->stat_acquire_fail, +lock->stat_release_total, lock->stat_release_wakeup); } } Index: src/backend/postmaster/postmaster.c === RCS file: /projects/cvsroot/pgsql-server/src/backend/postmaster/postmaster.c,v retrieving revision 1.353 diff -u -r1.353 postmaster.c --- src/backend/postmaster/postmaster.c 25 Dec 2003 03:52:51 - 1.353 +++ src/backend/postmaster/postmaster.c 27 Dec 2003 22:51:38 - @@ -1701,7 +1701,7 @@ errno = save_errno; } - +void LWLockPrintStats(void); /* * pmdie -- signal handler for processing various postmaster signals. @@ -1733,6 +1733,7 @@ Shutdown = SmartShutdown; ereport(LOG, (errmsg("received smart shutdown request"))); + LWLockPrintStats(); if (DLGetHead(BackendList)) /* let reaper() handle this */ break; @@ -1766,6 +1767,7 @@
Re: [PATCHES] SIGPIPE handling
Bruce Momjian wrote: + /* +* We could lose a signal during this test. +* In a multi-threaded application, this might +* be a problem. Do any non-threaded platforms Threaded or non-threaded? +* lack sigaction()? +*/ Additionally, the problem is not restricted to multithreaded apps: signal(,SIG_IGN) clears all pending signals. -- Manfred ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [PATCHES] [HACKERS] libpq thread safety
Bruce Momjian wrote: I could not get this patch to compile. I am getting a failure because BSD/OS doesn't have pthread_rwlock_wrlock(). I am concerned other platforms might not have it either. I feared that. I'll switch to pthread_mutex_lock()+_unlock(). pthread_rwlock_wrlock()+_unlock was faster in some tests - wrlocks do not need to be async signal safe. I'll send a new patch. -- Manfred ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])
Re: [PATCHES] Libpq ssl fix
Hi, I'll retest the patch. I didn't have a working ssl test setup, thus I stopped when I ran into the same errors as without my patch. Probably printfs in initialize_SSL(), but no tests that the code beyond initialize_SSL() actually runs - sorry. Btw, --enable-thread-safety on Linux (RedHat Fedora Core 1) fails in configure with configure: error: *** Thread test program failed. Your platform is not thread-safe. *** Check the file 'config.log'for the exact reason. -- Manfred Tom Lane wrote: Andreas Pflug <[EMAIL PROTECTED]> writes: init_ssl_system will return 0 on success and -1 on failure, which will be interpreted just the other way round in initialize_SSL. Patch appended. Hmm, that looks backwards to me too, but this would seem to imply that Manfred Spraul failed to test his last patch at all. Manfred, care to explain why we shouldn't revert that patch in toto? 2004-03-23 22:44 momjian * doc/src/sgml/libpq.sgml, src/backend/libpq/md5.c, src/interfaces/libpq/fe-auth.c, src/interfaces/libpq/fe-connect.c, src/interfaces/libpq/fe-secure.c, src/interfaces/libpq/libpq-fe.h, src/interfaces/libpq/libpq-int.h: Add thread locking to SSL and Kerberos connections. I have removed the docs mentioning that SSL and Kerberos are not thread-safe. Manfred Spraul regards, tom lane ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
Re: [PATCHES] Libpq ssl fix
Bruce Momjian wrote: - patch for thread_test.c needed posted some hours ago. Applied. The current CVS tree work again, Andreas' patch fixed the configure failure. Additionally Andreas' libpq patch fixes ssl. I've tested the locking, too: ssl calls pq_lockingcallback. I've tested it by adding ssl support into pgbench. Should I clean up the change and post a patch? -- Manfred ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])
Re: [PATCHES] Compiling libpq with VisualC
Andreas Pflug wrote: I don't really think so. That mutex_initialized is a globally static variable, not a thread local one. Thread interruption between testing mutex_initialized and setting it is very unlikely and still wouldn't do much harm if it actually does happen. Very unlikely is not a good argument. And you haven't considered multiprocessor systems. They aren't that rare: all newer Pentium 4 systems have two logical cores. The harm would be a failed connection attempt - I don't think that this is acceptable. Hmm. Is libpq a .DLL? Then you could initialize the mutex in DllMain(). Otherwise create a C++ class with one instance and a constructor. Then initialize the mutex from the constructor. -- Manfred ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])
Re: [PATCHES] Compiling libpq with VisualC
Andreas Pflug wrote: Bruce Momjian wrote: Agreed. My pthread book says pthread_mutex_init() should be called only once, and we have to guarantee that. If the Windows implentation allows it to be called multiple times, just create a function to be called only by Win32 that does that and leave the Unix safe. Ok, so here's the win32 workaround with the unix stuff left untouched. There's no memory interlocking api in win32 that wouldn't need some initializing api call itself, so we'd have to go for assembly level test-and-set code. Wrong. There are portable test-and-set functions in the Win32 SDK: for example InterlockedCompareExchange. They operate on ints and do not need initialization. or introduce a mandatory global libpq initializing api. There is a third option: Add one C++ file and use the constructor to initialize the mutex. VisualC is always a C++ compiler, thus C++ support shouldn't be an issue. I've attached an example app. Considering the probably quite low usage of kerberos/ssl together with threads under win32, and the very low probability of two threads/processors (!) trying to initiate a connection at the same time, it doesn't seem to be worth the compiler hassle with assembly inline. This argument is insane: Given enough installations, even a very low probability will cause failures. But this is a minor point, I think the patch should go in and I'll write with a proposal to close the race, either based on InterlockedCompareExchange or on a C++ file. -- Manfred #include #include int g_mutex; extern "C" int main(void) { printf("in main. Value now %d.\n", g_mutex); } class init_me { public: init_me::init_me(void) { g_mutex = 99; /* CreateMutex(), or whatever. */ } }; static class init_me instance_one; ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
[PATCHES] [PATCH] fix libpq mutex initialization for multithreaded win32 libs
Hi, win32 doesn't support a static initializer for mutexes, thus the first user must initialize the lock. The problem are concurrent "first" users - the pthread_mutex_t initialization must be synchronized. The current implementation is broken, the attached patches fixes that: mutex_initlock is a spinlock. If the pthread_mutex_t mutex is not initialized, then the spinlock is acquired, if the pthread_mutex_t is initialized if it's not yet initialized and then the spinlock is dropped. Again untested due to lack of Visual C++. -- Manfre ? GNUmakefile ? config.log ? config.status ? src/Makefile.global ? src/include/pg_config.h ? src/include/stamp-h ? src/port/pg_config_paths.h Index: src/interfaces/libpq/fe-connect.c === RCS file: /projects/cvsroot/pgsql-server/src/interfaces/libpq/fe-connect.c,v retrieving revision 1.275 diff -c -r1.275 fe-connect.c *** src/interfaces/libpq/fe-connect.c 19 Jun 2004 04:22:17 - 1.275 --- src/interfaces/libpq/fe-connect.c 20 Jun 2004 16:38:38 - *** *** 3193,3202 #ifndef WIN32 static pthread_mutex_t singlethread_lock = PTHREAD_MUTEX_INITIALIZER; #else ! static pthread_mutex_t singlethread_lock; ! static long mutex_initialized = 0; ! if (!InterlockedExchange(&mutex_initialized, 1L)) ! pthread_mutex_init(&singlethread_lock, NULL); #endif if (acquire) pthread_mutex_lock(&singlethread_lock); --- 3193,3208 #ifndef WIN32 static pthread_mutex_t singlethread_lock = PTHREAD_MUTEX_INITIALIZER; #else ! static pthread_mutex_t singlethread_lock = NULL; ! static long mutex_initlock = 0; ! ! if (singlethread_lock == NULL) { ! while(InterlockedExchange(&mutex_initlock, 1) == 1) ! /* loop, another thread own the lock */ ; ! if (singlethread_lock == NULL) ! pthread_mutex_init(&singlethread_lock, NULL); ! InterlockedExchange(&mutex_initlock,0); ! } #endif if (acquire) pthread_mutex_lock(&singlethread_lock); Index: src/interfaces/libpq/fe-secure.c === RCS file: /projects/cvsroot/pgsql-server/src/interfaces/libpq/fe-secure.c,v retrieving revision 1.42 diff -c -r1.42 fe-secure.c *** src/interfaces/libpq/fe-secure.c19 Jun 2004 04:22:17 - 1.42 --- src/interfaces/libpq/fe-secure.c20 Jun 2004 16:38:39 - *** *** 867,876 #ifndef WIN32 static pthread_mutex_t init_mutex = PTHREAD_MUTEX_INITIALIZER; #else ! static pthread_mutex_t init_mutex; ! static long mutex_initialized = 0L; ! if (!InterlockedExchange(&mutex_initialized, 1L)) ! pthread_mutex_init(&init_mutex, NULL); #endif pthread_mutex_lock(&init_mutex); --- 867,882 #ifndef WIN32 static pthread_mutex_t init_mutex = PTHREAD_MUTEX_INITIALIZER; #else ! static pthread_mutex_t init_mutex = NULL; ! static long mutex_initlock = 0; ! ! if (init_mutex == NULL) { ! while(InterlockedExchange(&mutex_initlock, 1) == 1) ! /* loop, another thread own the lock */ ; ! if (init_mutex == NULL) ! pthread_mutex_init(&init_mutex, NULL); ! InterlockedExchange(&mutex_initlock,0); ! } #endif pthread_mutex_lock(&init_mutex); ---(end of broadcast)--- TIP 8: explain analyze is your friend
[PATCHES] [PATCH] s_lock support for win32
Hi, The win32 port doesn't have a native user space spinlock implementation yet. Attached is an untested patch - could someone test it? I don't have Visual C++. -- Manfred Index: src/include/storage/s_lock.h === RCS file: /projects/cvsroot/pgsql-server/src/include/storage/s_lock.h,v retrieving revision 1.126 diff -c -r1.126 s_lock.h *** src/include/storage/s_lock.h19 Jun 2004 23:02:32 - 1.126 --- src/include/storage/s_lock.h30 Jun 2004 17:14:08 - *** *** 648,653 --- 648,661 #endif/* !defined(HAS_TEST_AND_SET) */ + #if defined(WIN32) + #define HAS_TEST_AND_SET + + typedef long slock_t; + + #define TAS(lock) (InterlockedExchange(lock, 1)) + #define S_UNLOCK(lock)(InterlockedExchange(lock, 0)) + #endif /* Blow up if we didn't have any way to do spinlocks */ #ifndef HAS_TEST_AND_SET ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faqs/FAQ.html
Re: [PATCHES] [PATCH] s_lock support for win32
Bruce Momjian wrote: Because we don't support non-gcc Win32 builds of the backend, adding this patch doesn't make sense. If we ever start to support non-gcc Win32 backends we can add this. Ok. I wasn't aware that the backend is gcc-only. But what about my libpq patch? Races in the library startup just ask for corruptions. -- Manfred ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org