[PATCHES] [PATCH] fix libpq mutex initialization for multithreaded win32 libs
Hi, win32 doesn't support a static initializer for mutexes, thus the first user must initialize the lock. The problem are concurrent first users - the pthread_mutex_t initialization must be synchronized. The current implementation is broken, the attached patches fixes that: mutex_initlock is a spinlock. If the pthread_mutex_t mutex is not initialized, then the spinlock is acquired, if the pthread_mutex_t is initialized if it's not yet initialized and then the spinlock is dropped. Again untested due to lack of Visual C++. -- Manfre ? GNUmakefile ? config.log ? config.status ? src/Makefile.global ? src/include/pg_config.h ? src/include/stamp-h ? src/port/pg_config_paths.h Index: src/interfaces/libpq/fe-connect.c === RCS file: /projects/cvsroot/pgsql-server/src/interfaces/libpq/fe-connect.c,v retrieving revision 1.275 diff -c -r1.275 fe-connect.c *** src/interfaces/libpq/fe-connect.c 19 Jun 2004 04:22:17 - 1.275 --- src/interfaces/libpq/fe-connect.c 20 Jun 2004 16:38:38 - *** *** 3193,3202 #ifndef WIN32 static pthread_mutex_t singlethread_lock = PTHREAD_MUTEX_INITIALIZER; #else ! static pthread_mutex_t singlethread_lock; ! static long mutex_initialized = 0; ! if (!InterlockedExchange(mutex_initialized, 1L)) ! pthread_mutex_init(singlethread_lock, NULL); #endif if (acquire) pthread_mutex_lock(singlethread_lock); --- 3193,3208 #ifndef WIN32 static pthread_mutex_t singlethread_lock = PTHREAD_MUTEX_INITIALIZER; #else ! static pthread_mutex_t singlethread_lock = NULL; ! static long mutex_initlock = 0; ! ! if (singlethread_lock == NULL) { ! while(InterlockedExchange(mutex_initlock, 1) == 1) ! /* loop, another thread own the lock */ ; ! if (singlethread_lock == NULL) ! pthread_mutex_init(singlethread_lock, NULL); ! InterlockedExchange(mutex_initlock,0); ! } #endif if (acquire) pthread_mutex_lock(singlethread_lock); Index: src/interfaces/libpq/fe-secure.c === RCS file: /projects/cvsroot/pgsql-server/src/interfaces/libpq/fe-secure.c,v retrieving revision 1.42 diff -c -r1.42 fe-secure.c *** src/interfaces/libpq/fe-secure.c19 Jun 2004 04:22:17 - 1.42 --- src/interfaces/libpq/fe-secure.c20 Jun 2004 16:38:39 - *** *** 867,876 #ifndef WIN32 static pthread_mutex_t init_mutex = PTHREAD_MUTEX_INITIALIZER; #else ! static pthread_mutex_t init_mutex; ! static long mutex_initialized = 0L; ! if (!InterlockedExchange(mutex_initialized, 1L)) ! pthread_mutex_init(init_mutex, NULL); #endif pthread_mutex_lock(init_mutex); --- 867,882 #ifndef WIN32 static pthread_mutex_t init_mutex = PTHREAD_MUTEX_INITIALIZER; #else ! static pthread_mutex_t init_mutex = NULL; ! static long mutex_initlock = 0; ! ! if (init_mutex == NULL) { ! while(InterlockedExchange(mutex_initlock, 1) == 1) ! /* loop, another thread own the lock */ ; ! if (init_mutex == NULL) ! pthread_mutex_init(init_mutex, NULL); ! InterlockedExchange(mutex_initlock,0); ! } #endif pthread_mutex_lock(init_mutex); ---(end of broadcast)--- TIP 8: explain analyze is your friend
[PATCHES] [PATCH] s_lock support for win32
Hi, The win32 port doesn't have a native user space spinlock implementation yet. Attached is an untested patch - could someone test it? I don't have Visual C++. -- Manfred Index: src/include/storage/s_lock.h === RCS file: /projects/cvsroot/pgsql-server/src/include/storage/s_lock.h,v retrieving revision 1.126 diff -c -r1.126 s_lock.h *** src/include/storage/s_lock.h19 Jun 2004 23:02:32 - 1.126 --- src/include/storage/s_lock.h30 Jun 2004 17:14:08 - *** *** 648,653 --- 648,661 #endif/* !defined(HAS_TEST_AND_SET) */ + #if defined(WIN32) + #define HAS_TEST_AND_SET + + typedef long slock_t; + + #define TAS(lock) (InterlockedExchange(lock, 1)) + #define S_UNLOCK(lock)(InterlockedExchange(lock, 0)) + #endif /* Blow up if we didn't have any way to do spinlocks */ #ifndef HAS_TEST_AND_SET ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faqs/FAQ.html
Re: [PATCHES] Compiling libpq with VisualC
Andreas Pflug wrote: I don't really think so. That mutex_initialized is a globally static variable, not a thread local one. Thread interruption between testing mutex_initialized and setting it is very unlikely and still wouldn't do much harm if it actually does happen. Very unlikely is not a good argument. And you haven't considered multiprocessor systems. They aren't that rare: all newer Pentium 4 systems have two logical cores. The harm would be a failed connection attempt - I don't think that this is acceptable. Hmm. Is libpq a .DLL? Then you could initialize the mutex in DllMain(). Otherwise create a C++ class with one instance and a constructor. Then initialize the mutex from the constructor. -- Manfred ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED])
Re: [PATCHES] Compiling libpq with VisualC
Andreas Pflug wrote: Bruce Momjian wrote: Agreed. My pthread book says pthread_mutex_init() should be called only once, and we have to guarantee that. If the Windows implentation allows it to be called multiple times, just create a function to be called only by Win32 that does that and leave the Unix safe. Ok, so here's the win32 workaround with the unix stuff left untouched. There's no memory interlocking api in win32 that wouldn't need some initializing api call itself, so we'd have to go for assembly level test-and-set code. Wrong. There are portable test-and-set functions in the Win32 SDK: for example InterlockedCompareExchange. They operate on ints and do not need initialization. or introduce a mandatory global libpq initializing api. There is a third option: Add one C++ file and use the constructor to initialize the mutex. VisualC is always a C++ compiler, thus C++ support shouldn't be an issue. I've attached an example app. Considering the probably quite low usage of kerberos/ssl together with threads under win32, and the very low probability of two threads/processors (!) trying to initiate a connection at the same time, it doesn't seem to be worth the compiler hassle with assembly inline. This argument is insane: Given enough installations, even a very low probability will cause failures. But this is a minor point, I think the patch should go in and I'll write with a proposal to close the race, either based on InterlockedCompareExchange or on a C++ file. -- Manfred #include stdio.h #include stdlib.h int g_mutex; extern C int main(void) { printf(in main. Value now %d.\n, g_mutex); } class init_me { public: init_me::init_me(void) { g_mutex = 99; /* CreateMutex(), or whatever. */ } }; static class init_me instance_one; ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PATCHES] Libpq ssl fix
Hi, I'll retest the patch. I didn't have a working ssl test setup, thus I stopped when I ran into the same errors as without my patch. Probably printfs in initialize_SSL(), but no tests that the code beyond initialize_SSL() actually runs - sorry. Btw, --enable-thread-safety on Linux (RedHat Fedora Core 1) fails in configure with configure: error: *** Thread test program failed. Your platform is not thread-safe. *** Check the file 'config.log'for the exact reason. -- Manfred Tom Lane wrote: Andreas Pflug [EMAIL PROTECTED] writes: init_ssl_system will return 0 on success and -1 on failure, which will be interpreted just the other way round in initialize_SSL. Patch appended. Hmm, that looks backwards to me too, but this would seem to imply that Manfred Spraul failed to test his last patch at all. Manfred, care to explain why we shouldn't revert that patch in toto? 2004-03-23 22:44 momjian * doc/src/sgml/libpq.sgml, src/backend/libpq/md5.c, src/interfaces/libpq/fe-auth.c, src/interfaces/libpq/fe-connect.c, src/interfaces/libpq/fe-secure.c, src/interfaces/libpq/libpq-fe.h, src/interfaces/libpq/libpq-int.h: Add thread locking to SSL and Kerberos connections. I have removed the docs mentioning that SSL and Kerberos are not thread-safe. Manfred Spraul regards, tom lane ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
Re: [PATCHES] Libpq ssl fix
Bruce Momjian wrote: - patch for thread_test.c needed posted some hours ago. Applied. The current CVS tree work again, Andreas' patch fixed the configure failure. Additionally Andreas' libpq patch fixes ssl. I've tested the locking, too: ssl calls pq_lockingcallback. I've tested it by adding ssl support into pgbench. Should I clean up the change and post a patch? -- Manfred ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED])
Re: [PATCHES] [HACKERS] libpq thread safety
Bruce Momjian wrote: I could not get this patch to compile. I am getting a failure because BSD/OS doesn't have pthread_rwlock_wrlock(). I am concerned other platforms might not have it either. I feared that. I'll switch to pthread_mutex_lock()+_unlock(). pthread_rwlock_wrlock()+_unlock was faster in some tests - wrlocks do not need to be async signal safe. I'll send a new patch. -- Manfred ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED])
Re: [PATCHES] SIGPIPE handling
Bruce Momjian wrote: + /* +* We could lose a signal during this test. +* In a multi-threaded application, this might +* be a problem. Do any non-threaded platforms Threaded or non-threaded? +* lack sigaction()? +*/ Additionally, the problem is not restricted to multithreaded apps: signal(,SIG_IGN) clears all pending signals. -- Manfred ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [PATCHES] update i386 spinlock for hyperthreading
Tom Lane wrote: Is there an easy way find out which LWLock is contended? Not from oprofile output, as far as I can think. I've suspected for some time that the BufMgrLock is a major bottleneck, but have no proof. Mark ran a DBT-2 testrun with the attached statistics patch applied: It collects stats about all lightweight locks and dumps them during shutdown. The hottest locks are Lock Acquire %contention sleep calls 8(WALInsertLock) 8679205 0.030410263934 1(LockMgrLock) 640894180.0797835113215 5(SInvalLock) 683964700.00129888812 0(BufMgrLock) 246307425 0.12029329629089 The lock numbers are from 7.4, i.e. without the patch that removes ShmemIndexLock. I've check that 8 is really WALInsertLock in the assembly output. The scary part from the system perspective are the 35 million context switches that were generated by the BufMgrLock and the LockMgrLock. I remember there were patches that tried other algorithms instead of the simple LRU for the buffer manager. Has anyone tried to change the locking of the buffer manager? The effect of padding the lightweight locks to a full cacheline appears to be negligable: With the padding, there were around 4 million performance monitor hits on the 'lock xchg' instructions. Without it (test run 300), there were 4.2 million hits. The complete data is at http://developer.osdl.org/markw/dbt2-pgsql/303/ The db log with the lock stats is at http://developer.osdl.org/markw/dbt2-pgsql/303/db/log (Warning: 6.9 MB) -- Manfred Index: src/backend/storage/lmgr/lwlock.c === RCS file: /projects/cvsroot/pgsql-server/src/backend/storage/lmgr/lwlock.c,v retrieving revision 1.19 diff -u -r1.19 lwlock.c --- src/backend/storage/lmgr/lwlock.c 20 Dec 2003 17:31:21 - 1.19 +++ src/backend/storage/lmgr/lwlock.c 27 Dec 2003 22:51:36 - @@ -36,6 +36,11 @@ PGPROC *head; /* head of list of waiting PGPROCs */ PGPROC *tail; /* tail of list of waiting PGPROCs */ /* tail is undefined when head is NULL */ + unsigned long long stat_acquire_total; + unsigned long long stat_acquire_fail; + unsigned long long stat_release_total; + unsigned long long stat_release_wakeup; + int fill[20]; } LWLock; /* @@ -159,6 +164,10 @@ lock-shared = 0; lock-head = NULL; lock-tail = NULL; + lock-stat_acquire_total = 0; + lock-stat_acquire_fail = 0; + lock-stat_release_total = 0; + lock-stat_release_wakeup = 0; } /* @@ -245,6 +254,10 @@ if (retry) lock-releaseOK = true; + lock-stat_acquire_total++; + if (retry) + lock-stat_acquire_fail++; + /* If I can get the lock, do so quickly. */ if (mode == LW_EXCLUSIVE) { @@ -440,6 +453,7 @@ Assert(lock-shared 0); lock-shared--; } + lock-stat_release_total++; /* * See if I need to awaken any waiters. If I released a non-last @@ -477,6 +491,8 @@ } } + if (head) + lock-stat_release_wakeup++; /* We are done updating shared state of the lock itself. */ SpinLockRelease_NoHoldoff(lock-mutex); @@ -517,5 +533,19 @@ HOLD_INTERRUPTS(); /* match the upcoming RESUME_INTERRUPTS */ LWLockRelease(held_lwlocks[num_held_lwlocks - 1]); + } +} + +void LWLockPrintStats(void); +void +LWLockPrintStats(void) +{ + int i; + for (i=0;iLWLockCounter[0];i++) { + volatile LWLock *lock = LWLockArray + i; + elog(LOG, Lock %d): acquire_total %Ld acquire_fail %Ld release_total %Ld release_wakeup %Ld\n, +i, +lock-stat_acquire_total, lock-stat_acquire_fail, +lock-stat_release_total, lock-stat_release_wakeup); } } Index: src/backend/postmaster/postmaster.c === RCS file: /projects/cvsroot/pgsql-server/src/backend/postmaster/postmaster.c,v retrieving revision 1.353 diff -u -r1.353 postmaster.c --- src/backend/postmaster/postmaster.c 25 Dec 2003 03:52:51 - 1.353 +++ src/backend/postmaster/postmaster.c 27 Dec 2003 22:51:38 - @@ -1701,7 +1701,7 @@ errno = save_errno; } - +void LWLockPrintStats(void); /* * pmdie -- signal handler for processing various postmaster signals. @@ -1733,6 +1733,7 @@ Shutdown = SmartShutdown; ereport(LOG, (errmsg(received smart
Re: [PATCHES] update i386 spinlock for hyperthreading
Tom Lane wrote: Manfred Spraul [EMAIL PROTECTED] writes: Tom Lane wrote: Don't you have to put it in a specific place in the loop to make that work? If not, why not? Rep;nop is just a short delay - that's all. That view seems to me to be directly contradicted by this statement: The PAUSE instruction provides a hint to the processor that the code sequence is a spin-wait loop. The processor uses this hint to avoid the memory order violation in most situations, which greatly improves processor performance. It's not apparent to me how a short delay translates into avoiding a memory order violation (possibly some docs on what that means exactly might help...). I suspect strongly that there needs to be some near proximity between the PAUSE instruction and the lock-test instruction for this to work as advertised. It would help if Intel were less coy about what the instruction really does. My guess: Pentium 4 cpu support something like 250 uops in flight - it will have a dozend of the spinlock loops in it's pipeline. When the spinlock is released, it must figure out which of the loops should get it, and gets lost. My guess is that rep;nop delays the cpu buy at least 100 cpu ticks, and thus the pipeline will be empty before it proceeds. I don't have a Pentium 4, and the HP testdrive is down. Someone around who could run my test app? This instruction does not change the architectural state of the processor (that is, it performs essentially a delaying noop operation). This can be rephrased as we're not telling you what this instruction really does, because its interesting effects are below the level of the instruction set architecture. Great. How are we supposed to know how to use it? There was a w_spinlock.pdf document with reference code. google still finds it, but the links are dead :-( I think a separate function is better than adding it into TAS: if it's part of tas, then it would automatically be included by every SpinLockAcquire call - unnecessary .text bloat. Why do you think it's unnecessary? One thing that I find particularly vague in the quoted documentation is the statement that the PAUSE instruction is needed to avoid a delay when *exiting* the spin-wait loop. Doesn't this mean that a PAUSE is needed in the success path when the first TAS succeeds (i.e, the normal no-contention path)? IIRC: No. If not, why not? If so, does it go before or after the lock instruction? Neither: somewhere in the failure path. Also, if the principal effect is a short delay, do we really need it at all considering that our inner loop in s_lock is rather more than an xchgb followed by a conditional branch? There will be time for the write queue to drain while we're incrementing and testing our spin counter (which I trust is in a register...). The reason I'm so full of questions is that I spent some time several days ago looking at exactly this issue, and came away with only the conclusion that I had to find some more-detailed documentation before I could figure out what we should do about the spinlocks for Xeons. I'll try to find some more docs and post links. The 2nd thing I would change is to add a nonatomic test in the slow path: locked instructions generate lots of bus traffic, and that's a waste of resources. Another question: regardless of the placement of rep;nop - 10% speedup means that the postgres spends far too much time in the spinlock code. I've looked at the oprofile dumps, and something like 1.2% of the total cpu time is spent it the TAS macro in LWLockAcquire. That's the hottest instruction in the whole profile, it eats more cpu cycles than the memcpy() calls that transfer data to/from kernel. Is there an easy way find out which LWLock is contended? -- Manfred /* * skel.cpp. skeleton for rdtsc benchmarks * * Copyright (C) 1999, 2001 by Manfred Spraul. * All rights reserved except the rights granted by the GPL. * * Redistribution of this file is permitted under the terms of the GNU * General Public License (GPL) version 2 or later. * $Header: /pub/home/manfred/cvs-tree/timetest/rep_nop.cpp,v 1.1 2001/04/07 19:38:33 manfred Exp $ */ #include stdio.h #include stdlib.h #include string.h #include unistd.h #include getopt.h // disable local interrupts during benchmark #undef USE_CLI // define a cache flushing function #undef CACHE_FLUSH #ifdef USE_CLI #include sys/io.h #define CLI cli\n\t #define STI sti\n\t #else #define CLI #define STI #define iopl(a) do { } while(0) #endif // Intel recommends that a serializing instruction // should be called before and after rdtsc. // CPUID is a serializing instruction. // .align 128: P 4 L2 cache line size #define read_rdtsc_before(time) \ __asm__ __volatile__( \ .align 128\n\t \ xor %%eax,%%eax\n\t \ CLI \ cpuid\n\t \ rdtsc\n\t \ mov %%eax,(%0)\n\t \ mov %%edx,4(%0)\n\t \ xor %%eax,%%eax\n\t \ cpuid\n\t \ : /* no output */ \ : S(time) \ : eax
Re: [PATCHES] update i386 spinlock for hyperthreading
Tom Lane wrote: Anyway, I've committed your patch with some changes. Thanks. BTW, I noticed a lot of concern in the Intel app notes about reserving 64 or even 128 bytes for each spinlock to avoid cache line conflicts. That seems excessive to me (we use a lot of spinlocks for buffers), but perhaps it is worth looking into. This recommendation usually ignored in the Linux kernel. A few very hot spinlocks have an exclusive cacheline, but most don't. Is there an easy way find out which LWLock is contended? Not from oprofile output, as far as I can think. I've suspected for some time that the BufMgrLock is a major bottleneck, but have no proof. I'll try to write a patch that dumps the LWLock usage and ask mark to run it. -- Manfred ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [PATCHES] update i386 spinlock for hyperthreading
Tom Lane wrote: Manfred Spraul [EMAIL PROTECTED] writes: Intel recommends to add a special pause instruction into spinlock busy loops. It's necessary for hyperthreading - without it, the cpu can't figure out that a logical thread does no useful work and incorrectly awards lots of execution resources to that thread. Additionally, it's supposed to reduce the time the cpu needs to recover from the (mispredicted) branch after the spinlock was obtained. Don't you have to put it in a specific place in the loop to make that work? If not, why not? I doubt that rep;nop is magic enough to recognize the loop that will be generated from s_lock()'s code. Rep;nop is just a short delay - that's all. It means that the cpu pipelines have a chance to drain, and that the other thread gets enough cpu resources. Below is the full instruction documentation, from the latest ia32 doc set from Intel: Improves the performance of spin-wait loops. When executing a spin-wait loop, a Pentium 4 or Intel Xeon processor suffers a severe performance penalty when exiting the loop because it detects a possible memory order violation. The PAUSE instruction provides a hint to the processor that the code sequence is a spin-wait loop. The processor uses this hint to avoid the memory order violation in most situations, which greatly improves processor performance. For this reason, it is recommended that a PAUSE instruction be placed in all spin-wait loops. An additional function of the PAUSE instruction is to reduce the power consumed by a Pentium 4 processor while executing a spin loop. The Pentium 4 processor can execute a spin-wait loop extremely quickly, causing the processor to consume a lot of power while it waits for the resource it is spinning on to become available. Inserting a pause instruction in a spin-wait loop greatly reduces the processor s power consumption. This instruction was introduced in the Pentium 4 processors, but is backward compatible with all IA-32 processors. In earlier IA-32 processors, the PAUSE instruction operates like a NOP instruction. The Pentium 4 and Intel Xeon processors implement the PAUSE instruction as a pre-defined delay. The delay is finite and can be zero for some processors. This instruction does not change the architectural state of the processor (that is, it performs essentially a delaying noop operation). I think a separate function is better than adding it into TAS: if it's part of tas, then it would automatically be included by every SpinLockAcquire call - unnecessary .text bloat. Additionally, there might be other busy loops, in addition to TAS, that could use a delay function. I'll post a new patch that doesn't rely on __inline__ in the i386 independant part. -- Manfred ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [pgsql-hackers-win32] [PATCHES] SRA Win32 sync() code
Shridhar Daithankar wrote: Does 30% difference above count as significant? No. It's Linux, we can look at the sources: there is no per-fd cache, the page cache is global. Thus fsync() syncs the whole cache to disk. A problem could only occur if the file cache is not global - perhaps a per-node file cache on NUMA systems - IRIX on an Origin 2000 cluster or something similar. But as I read the unix spec, fsync is guaranteed to sync all data to disk: Draft 6 of the posix-200x spec: SIO If _POSIX_SYNCHRONIZED_IO is defined, the fsync( ) function shall force all currently queued I/O operations associated with the file indicated by file descriptor fildes to the synchronized I/O completion state. All I/O operations shall be completed as defined for synchronized I/O file integrity completion. All I/O operations associated with the file, not all operations associated with the file descriptor. -- Manfred ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faqs/FAQ.html
Re: [PATCHES] initdb copyright notice
Neil Conway wrote: A quick grep of the source tree indicates that the following files claim to be covered by the 4 clause BSD license: $ grep -rlI 'This product includes software developed' * contrib/mysql/my2pg.pl contrib/pgcrypto/README.pgcrypto contrib/pgcrypto/blf.c You must be careful with 3 clause vs. 4 clause BSD licenses: The advertising clause for UC Berkeley is now void, but all other advertising clauses are still in force. i.e. blf.c contains the line This product includes software developed by Niels Provos, and that must be obeyed. -- Manfred ---(end of broadcast)--- TIP 8: explain analyze is your friend
[PATCHES] SIGPIPE handling
Hi, attached is an update of my automatic sigaction patch: I've moved the actual sigaction calls into pqsignal.c and added a helper function (pgsignalinquire(signo)). I couldn't remove the include signal.h from fe-connect.c: it's required for the SIGPIPE definition. Additionally I've added a -a flag for pgbench that sets the signal handler before calling PQconnectdb. Tested on Fedora Core 1 (Redhat Linux) with pgbench. -- Manfred Index: src/interfaces/libpq/fe-connect.c === RCS file: /projects/cvsroot/pgsql-server/src/interfaces/libpq/fe-connect.c,v retrieving revision 1.263 diff -c -r1.263 fe-connect.c *** src/interfaces/libpq/fe-connect.c 18 Oct 2003 05:02:06 - 1.263 --- src/interfaces/libpq/fe-connect.c 16 Nov 2003 11:44:47 - *** *** 41,46 --- 41,48 #include netinet/tcp.h #endif #include arpa/inet.h + #include signal.h + #include pqsignal.h #endif #include libpq/ip.h *** *** 881,886 --- 883,891 struct addrinfo hint; const char *node = NULL; int ret; + #ifndef WIN32 + pqsigfunc pipehandler; + #endif if (!conn) return 0; *** *** 950,955 --- 955,976 conn-allow_ssl_try = false; else if (conn-sslmode[0] == 'a') /* allow */ conn-wait_ssl_try = true; + #endif + #ifndef WIN32 + /* +* Autodetect SIGPIPE signal handling: +* The default action per Unix spec is kill current process and +* that's not acceptable. If the current setting is not the default, +* then assume that the caller knows what he's doing and leave the +* signal handler unchanged. Otherwise set the signal handler to +* SIG_IGN around each send() syscall. Unfortunately this is both +* unreliable and slow for multithreaded apps. +*/ + pipehandler = pqsignalinquire(SIGPIPE); + if (pipehandler == SIG_DFL || pipehandler == SIG_ERR) + conn-do_sigaction = true; + else + conn-do_sigaction = false; #endif /* Index: src/interfaces/libpq/fe-secure.c === RCS file: /projects/cvsroot/pgsql-server/src/interfaces/libpq/fe-secure.c,v retrieving revision 1.32 diff -c -r1.32 fe-secure.c *** src/interfaces/libpq/fe-secure.c29 Sep 2003 16:38:04 - 1.32 --- src/interfaces/libpq/fe-secure.c16 Nov 2003 11:44:47 - *** *** 348,354 ssize_t n; #ifndef WIN32 ! pqsigfunc oldsighandler = pqsignal(SIGPIPE, SIG_IGN); #endif #ifdef USE_SSL --- 348,357 ssize_t n; #ifndef WIN32 ! pqsigfunc oldsighandler = NULL; ! ! if (conn-do_sigaction) ! oldsighandler = pqsignal(SIGPIPE, SIG_IGN); #endif #ifdef USE_SSL *** *** 408,414 n = send(conn-sock, ptr, len, 0); #ifndef WIN32 ! pqsignal(SIGPIPE, oldsighandler); #endif return n; --- 411,418 n = send(conn-sock, ptr, len, 0); #ifndef WIN32 ! if (conn-do_sigaction) ! pqsignal(SIGPIPE, oldsighandler); #endif return n; Index: src/interfaces/libpq/libpq-int.h === RCS file: /projects/cvsroot/pgsql-server/src/interfaces/libpq/libpq-int.h,v retrieving revision 1.82 diff -c -r1.82 libpq-int.h *** src/interfaces/libpq/libpq-int.h5 Sep 2003 02:08:36 - 1.82 --- src/interfaces/libpq/libpq-int.h16 Nov 2003 11:44:48 - *** *** 329,334 --- 329,337 charpeer_dn[256 + 1]; /* peer distinguished name */ charpeer_cn[SM_USER + 1]; /* peer common name */ #endif + #ifndef WIN32 + booldo_sigaction; /* set SIGPIPE to SIG_IGN around every send() call */ + #endif /* Buffer for current error message */ PQExpBufferData errorMessage; /* expansible string */ Index: src/interfaces/libpq/pqsignal.c === RCS file: /projects/cvsroot/pgsql-server/src/interfaces/libpq/pqsignal.c,v retrieving revision 1.17 diff -c -r1.17 pqsignal.c *** src/interfaces/libpq/pqsignal.c 4 Aug 2003 02:40:20 - 1.17 --- src/interfaces/libpq/pqsignal.c 16 Nov 2003 11:44:48 - *** *** 40,42 --- 40,61 return oact.sa_handler; #endif /* !HAVE_POSIX_SIGNALS */ } + + pqsigfunc + pqsignalinquire(int signo) + { + #if !defined(HAVE_POSIX_SIGNALS) + pqsigfunc old; + old = signal(SIGPIPE, SIG_IGN); + signal(SIGPIPE, old); + return old; + #else + struct sigaction oact; + + if (sigaction(SIGPIPE, NULL, oact) != 0) +
Re: [PATCHES] SRA Win32 sync() code
Tom Lane wrote: Seriously though, if we can move the bulk of the writing work into background processes then I don't believe that there will be any significant penalty for regular backends. And I believe that it would be a huge advantage from a correctness point of view if we could stop depending on sync(). Which function guarantees that renames of WAL files arrived on the disk? AFAIK sync() is the only function that guarantees that. What about the sync app from sysinternals? It seems Mark Russinovich figured out how to implement sync on Win32: http://www.sysinternals.com/ntw2k/source/misc.shtml#Sync It requires administrative priveledges, but it shouldn't be that difficult to write a tiny service that runs in the LocalSystem account, listens to a pipe and syncs all disks when asked. -- Manfred ---(end of broadcast)--- TIP 8: explain analyze is your friend
Re: [PATCHES] SIGPIPE handling
Bruce Momjian wrote: I thought it should be global too, basically testing on the first connection request. What if two PQconnect calls happen at the same time? I would really prefer the manual approach with a new PQsetsighandler function - the autodetection is fragile, it's trivial to find a special case where it breaks. Bruce, you wrote that a new function would be overdesign. Are you sure? Your simpler proposals all fail with multithreaded apps. I've attached the patch that implements the global flag with two special function that access it. -- Manfred Index: contrib/pgbench/README.pgbench === RCS file: /projects/cvsroot/pgsql-server/contrib/pgbench/README.pgbench,v retrieving revision 1.9 diff -c -r1.9 README.pgbench *** contrib/pgbench/README.pgbench 10 Jun 2003 09:07:15 - 1.9 --- contrib/pgbench/README.pgbench 8 Nov 2003 21:43:53 - *** *** 112,117 --- 112,121 might be a security hole since ps command will show the password. Use this for TESTING PURPOSE ONLY. + -a + Disable SIGPIPE delivery globally instead of within each + libpq operation. + -n No vacuuming and cleaning the history table prior to the test is performed. Index: contrib/pgbench/pgbench.c === RCS file: /projects/cvsroot/pgsql-server/contrib/pgbench/pgbench.c,v retrieving revision 1.27 diff -c -r1.27 pgbench.c *** contrib/pgbench/pgbench.c 27 Sep 2003 19:15:34 - 1.27 --- contrib/pgbench/pgbench.c 8 Nov 2003 21:43:54 - *** *** 28,33 --- 28,34 #else #include sys/time.h #include unistd.h + #include signal.h #ifdef HAVE_GETOPT_H #include getopt.h *** *** 105,112 static void usage() { ! fprintf(stderr, usage: pgbench [-h hostname][-p port][-c nclients][-t ntransactions][-s scaling_factor][-n][-C][-v][-S][-N][-l][-U login][-P password][-d][dbname]\n); ! fprintf(stderr, (initialize mode): pgbench -i [-h hostname][-p port][-s scaling_factor][-U login][-P password][-d][dbname]\n); } /* random number generator */ --- 106,113 static void usage() { ! fprintf(stderr, usage: pgbench [-h hostname][-p port][-c nclients][-t ntransactions][-s scaling_factor][-n][-C][-v][-S][-N][-l][-a][-U login][-P password][-d][dbname]\n); ! fprintf(stderr, (initialize mode): pgbench -i [-h hostname][-p port][-s scaling_factor][-U login][-P password][-d][dbname][-a]\n); } /* random number generator */ *** *** 703,712 else if ((env = getenv(PGUSER)) != NULL *env != '\0') login = env; ! while ((c = getopt(argc, argv, ih:nvp:dc:t:s:U:P:CNSl)) != -1) { switch (c) { case 'i': is_init_mode++; break; --- 704,719 else if ((env = getenv(PGUSER)) != NULL *env != '\0') login = env; ! while ((c = getopt(argc, argv, aih:nvp:dc:t:s:U:P:CNSl)) != -1) { switch (c) { + case 'a': + #ifndef WIN32 + signal(SIGPIPE, SIG_IGN); + #endif + PQsetsighandling(0); + break; case 'i': is_init_mode++; break; Index: doc/src/sgml/libpq.sgml === RCS file: /projects/cvsroot/pgsql-server/doc/src/sgml/libpq.sgml,v retrieving revision 1.141 diff -c -r1.141 libpq.sgml *** doc/src/sgml/libpq.sgml 1 Nov 2003 01:56:29 - 1.141 --- doc/src/sgml/libpq.sgml 8 Nov 2003 21:43:56 - *** *** 645,650 --- 645,693 /listitem /varlistentry + varlistentry + termfunctionPQsetsighandling/functionindextermprimaryPQsetsighandling///term + termfunctionPQgetsighandling/functionindextermprimaryPQgetsighandling///term + listitem +para +Set/query SIGPIPE signal handling. + synopsis + void PQsetsighandling(int internal_sigign); + /synopsis + synopsis + int PQgetsighandling(void); + /synopsis + /para + + para + These functions allow to query and set the SIGPIPE signal handling + of libpq: by default, Unix systems generate a (fatal) SIGPIPE signal + on write attempts to a disconnected socket. Most callers expect a + normal error return instead of the signal. A normal error return can + be achieved by blocking or ignoring the SIGPIPE signal. This can be + done either globally in the application or inside libpq. +/para +para + If internal signal handling is enabled (this is the default), then + libpq sets
Re: [PATCHES] SIGPIPE handling, take two.
Tom Lane wrote: I don't think we need to complicate pqsignal's API for this. Instead we'd better document that SIGPIPE handling has to be set up and kept stable before doing any libpq operations in a multithread app. Not reliable. An app could install it's own signal handler and block SIGPIPE around all libpq calls. Signal blocking is per-thread. But the SIG_IGN/restore sequence affects the whole app - PQconnectdb calls would result in randomly dropped SIGPIPE signals. -- Manfred ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faqs/FAQ.html
[PATCHES] sigpipe handling
Attatched is the latest version of my patch that makes the signal(SIG_PIPE, SIG_IGN) calls around the send() syscall conditional: they are not sufficient to ensure that multithreaded libpq users are not killed by SIGPIPE signals, and they cause a noticable slowdown. I've switched to a global flag, and two functions to get/set the flag. Any other ideas how to protect libpq against SIGPIPE? -- Manfred Index: contrib/pgbench/README.pgbench === RCS file: /projects/cvsroot/pgsql-server/contrib/pgbench/README.pgbench,v retrieving revision 1.9 diff -c -r1.9 README.pgbench *** contrib/pgbench/README.pgbench 10 Jun 2003 09:07:15 - 1.9 --- contrib/pgbench/README.pgbench 8 Nov 2003 21:43:53 - *** *** 112,117 --- 112,121 might be a security hole since ps command will show the password. Use this for TESTING PURPOSE ONLY. + -a + Disable SIGPIPE delivery globally instead of within each + libpq operation. + -n No vacuuming and cleaning the history table prior to the test is performed. Index: contrib/pgbench/pgbench.c === RCS file: /projects/cvsroot/pgsql-server/contrib/pgbench/pgbench.c,v retrieving revision 1.27 diff -c -r1.27 pgbench.c *** contrib/pgbench/pgbench.c 27 Sep 2003 19:15:34 - 1.27 --- contrib/pgbench/pgbench.c 8 Nov 2003 21:43:54 - *** *** 28,33 --- 28,34 #else #include sys/time.h #include unistd.h + #include signal.h #ifdef HAVE_GETOPT_H #include getopt.h *** *** 105,112 static void usage() { ! fprintf(stderr, usage: pgbench [-h hostname][-p port][-c nclients][-t ntransactions][-s scaling_factor][-n][-C][-v][-S][-N][-l][-U login][-P password][-d][dbname]\n); ! fprintf(stderr, (initialize mode): pgbench -i [-h hostname][-p port][-s scaling_factor][-U login][-P password][-d][dbname]\n); } /* random number generator */ --- 106,113 static void usage() { ! fprintf(stderr, usage: pgbench [-h hostname][-p port][-c nclients][-t ntransactions][-s scaling_factor][-n][-C][-v][-S][-N][-l][-a][-U login][-P password][-d][dbname]\n); ! fprintf(stderr, (initialize mode): pgbench -i [-h hostname][-p port][-s scaling_factor][-U login][-P password][-d][dbname][-a]\n); } /* random number generator */ *** *** 703,712 else if ((env = getenv(PGUSER)) != NULL *env != '\0') login = env; ! while ((c = getopt(argc, argv, ih:nvp:dc:t:s:U:P:CNSl)) != -1) { switch (c) { case 'i': is_init_mode++; break; --- 704,719 else if ((env = getenv(PGUSER)) != NULL *env != '\0') login = env; ! while ((c = getopt(argc, argv, aih:nvp:dc:t:s:U:P:CNSl)) != -1) { switch (c) { + case 'a': + #ifndef WIN32 + signal(SIGPIPE, SIG_IGN); + #endif + PQsetsighandling(0); + break; case 'i': is_init_mode++; break; Index: doc/src/sgml/libpq.sgml === RCS file: /projects/cvsroot/pgsql-server/doc/src/sgml/libpq.sgml,v retrieving revision 1.141 diff -c -r1.141 libpq.sgml *** doc/src/sgml/libpq.sgml 1 Nov 2003 01:56:29 - 1.141 --- doc/src/sgml/libpq.sgml 8 Nov 2003 21:43:56 - *** *** 645,650 --- 645,693 /listitem /varlistentry + varlistentry + termfunctionPQsetsighandling/functionindextermprimaryPQsetsighandling///term + termfunctionPQgetsighandling/functionindextermprimaryPQgetsighandling///term + listitem +para +Set/query SIGPIPE signal handling. + synopsis + void PQsetsighandling(int internal_sigign); + /synopsis + synopsis + int PQgetsighandling(void); + /synopsis + /para + + para + These functions allow to query and set the SIGPIPE signal handling + of libpq: by default, Unix systems generate a (fatal) SIGPIPE signal + on write attempts to a disconnected socket. Most callers expect a + normal error return instead of the signal. A normal error return can + be achieved by blocking or ignoring the SIGPIPE signal. This can be + done either globally in the application or inside libpq. +/para +para + If internal signal handling is enabled (this is the default), then + libpq sets the SIGPIPE handler to SIG_IGN before every socket send + operation and restores it afterwards. This prevents libpq from + killing the application, at
[PATCHES] SIGPIPE handling, take two.
pqsecure_write tries to catch SIGPIPE signals generated by network disconnects by setting the signal handler to SIG_IGN. The current approach causes several problems: - it always sets SA_RESTART when it restores the old handler. - it's not reliable for multi threaded apps, because another thread could change the signal handler inbetween. - it's slow, because after setting a signal handler to SIG_IGN the kernel must enumerate all threads and clear all pending signals (at least FreeBSD-5.1 and linux-2.6 do that. Earlier linux kernels don't - their signal handling is known to be broken for multithreaded apps). Initially I proposed a new option for PQconnectdb, but Tom didn't like that. The attached patch autodetects if it should set the signal handler, Tom proposed that. The code doesn't try to check if the signal is handled by blocking it, because I haven't figured out how to check that: sigprocmask is undefined for multithreaded apps and calling pthread_sigmask would force every libpq user to link against libpthread. -- Manfred ? src/interfaces/libpq/libpq.so.3.1 Index: src/interfaces/libpq/fe-connect.c === RCS file: /projects/cvsroot/pgsql-server/src/interfaces/libpq/fe-connect.c,v retrieving revision 1.263 diff -c -r1.263 fe-connect.c *** src/interfaces/libpq/fe-connect.c 18 Oct 2003 05:02:06 - 1.263 --- src/interfaces/libpq/fe-connect.c 2 Nov 2003 18:29:40 - *** *** 41,46 --- 41,47 #include netinet/tcp.h #endif #include arpa/inet.h + #include signal.h #endif #include libpq/ip.h *** *** 951,956 --- 952,983 else if (conn-sslmode[0] == 'a') /* allow */ conn-wait_ssl_try = true; #endif + /* +* Autodetect SIGPIPE signal handling: +* The default action per Unix spec is kill current process and +* that's not acceptable. If the current setting is not the default, +* then assume that the caller knows what he's doing and leave the +* signal handler unchanged. Otherwise set the signal handler to +* SIG_IGN around each send() syscall. Unfortunately this is both +* unreliable and slow for multithreaded apps. +*/ + conn-do_sigaction = true; + #if !defined(HAVE_POSIX_SIGNALS) + { + pqsigfunc old; + old = signal(SIGPIPE, SIG_IGN); + if (old != SIG_DFL) + conn-do_sigaction = false; + signal(SIGPIPE, old); + } + #else + { + struct sigaction oact; + + if (sigaction(SIGPIPE, NULL, oact) == 0 oact.sa_handler != SIG_DFL) + conn-do_sigaction = false; + } + #endif /* !HAVE_POSIX_SIGNALS */ /* * Set up to try to connect, with protocol 3.0 as the first attempt. Index: src/interfaces/libpq/fe-secure.c === RCS file: /projects/cvsroot/pgsql-server/src/interfaces/libpq/fe-secure.c,v retrieving revision 1.32 diff -c -r1.32 fe-secure.c *** src/interfaces/libpq/fe-secure.c29 Sep 2003 16:38:04 - 1.32 --- src/interfaces/libpq/fe-secure.c2 Nov 2003 18:29:41 - *** *** 348,354 ssize_t n; #ifndef WIN32 ! pqsigfunc oldsighandler = pqsignal(SIGPIPE, SIG_IGN); #endif #ifdef USE_SSL --- 348,357 ssize_t n; #ifndef WIN32 ! pqsigfunc oldsighandler = NULL; ! ! if (conn-do_sigaction) ! oldsighandler = pqsignal(SIGPIPE, SIG_IGN); #endif #ifdef USE_SSL *** *** 408,414 n = send(conn-sock, ptr, len, 0); #ifndef WIN32 ! pqsignal(SIGPIPE, oldsighandler); #endif return n; --- 411,418 n = send(conn-sock, ptr, len, 0); #ifndef WIN32 ! if (conn-do_sigaction) ! pqsignal(SIGPIPE, oldsighandler); #endif return n; Index: src/interfaces/libpq/libpq-int.h === RCS file: /projects/cvsroot/pgsql-server/src/interfaces/libpq/libpq-int.h,v retrieving revision 1.82 diff -c -r1.82 libpq-int.h *** src/interfaces/libpq/libpq-int.h5 Sep 2003 02:08:36 - 1.82 --- src/interfaces/libpq/libpq-int.h2 Nov 2003 18:29:42 - *** *** 329,334 --- 329,335 charpeer_dn[256 + 1]; /* peer distinguished name */ charpeer_cn[SM_USER + 1]; /* peer common name */ #endif + booldo_sigaction; /* set SIGPIPE to SIG_IGN around every send() call */ /* Buffer for current error message */ PQExpBufferData errorMessage; /* expansible string */ ---(end of broadcast)--- TIP 4: Don't 'kill -9' the
Re: [PATCHES] fix for strict-alias warnings
Tom Lane wrote: Andrew Dunstan [EMAIL PROTECTED] writes: Of course, the linux kernel is aimed at a limited set of compilers - as I understand it basically gcc although it has been made to build with Intel compilers icc once compiled the kernel. But they had to teach it quite a lots of gccisms. - which makes things somewhat easier for them. What is our target set of compilers? What is our target version of C? Pretty much anything that speaks ANSI C is my usual feeling about that. As yet we have not heard of any non-gcc compilers in which this is a problem, although you have a point that some compiler somewhere may do this and not have a way to turn it off :-( Intel's icc compiler supports strict alias analysis, but the default was off. Also note that uninhibited casting between types can still cause alignment problems, We understand that issue, we solved it years ago. BTW, I haven't looked at the problem spots in detail. How many of them are due to the use of MemSet in conjunction with other access to a chunk of memory? ISTM that we need not worry about code motion around a MemSet call, since that would require the compiler to prove that the memset() path through the macro wouldn't be affected, which I doubt it would think. gcc is quite good at propagating constants around. This is heavily used in the linux-kernel: __builtin_constant(x), and then large switch statements that are completely evaluated at compile time. There is a good chance that gcc figures out that MemSet(,0,sizeof(double)) are two writes to two integer values, and then decides that they can't alias with reads/write to the double. I'll search for a suitable gcc list and post the memset macro - that might give a definitive answer. -- Manfred ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PATCHES] fix for strict-alias warnings
Tom Lane wrote: Manfred Spraul [EMAIL PROTECTED] writes: After some massaging, I've succeeded in generating bad code using a slightly modified MemSetAligned macro (parameters -O2 -fstrict-aliasing): gcc pipelined the x*x around the memset. As I already explained, we do not care about the MemSetAligned case. Is gcc 3.3 smart enough to optimize away the pointer alignment test in the full macro? 3.2 optimizes away the pointer alignment test, but then doesn't pipeline the x*x calculation. It might be due to a known (and now fixed) bug in gcc where is lost track of constants, and thus didn't succeed in optimizing long calculations. I don't have gcc 3.3 installed, but IMHO it would be insane to leave strict alias analysis enabled - writing to *(int32*)addr violates the alias rules, the bad code generated with MemSetAligned proved that. Is someone around with 3.3 who could test MemSet? -- Manfred ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faqs/FAQ.html
Re: [PATCHES] WIN32_CONSOLE usage
Christoph Dalitz wrote: On Thu, 11 Sep 2003 20:51:24 +0200 Manfred Spraul [EMAIL PROTECTED] wrote: - OemToChar() and CharToOem() convert all console input/output. In the long run this might be the better solution, if it works entirely without user intervention. I'm not sure if it's possible to get all corner cases right. I do not think that it is possible to handle all cases automatically. Even restricting the conversion to input/output on stdin/stdout will fail in some circumstances; eg. with the command psql script.sql when the script has been written with a windows editor (notepad, emacs or whatever). It would be very hard to get right. Perhaps something like isatty(), except that it's probably called GetHandleIsConsoleFlag() or something like that, followed by a check of the code page. But I didn't find a suitable function yet. -- Manfred ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED])
Re: [PATCHES] WIN32_CONSOLE usage
Peter Eisentraut wrote: If you can detect that they are different, why can't you adjust the code page in that case only? What should we do if we detect that they differ: - set the console code page to the ansi code page. This has two drawbacks: It doesn't work with Indic, because Indic doesn't have an ansi code page. And the user must still switch the console font. But: if the user must change the configuration, then he can as easily change both the font and the code page. Which means: SetConsoleCP is the wrong approach. - OemToChar() and CharToOem() convert all console input/output. In the long run this might be the better solution, if it works entirely without user intervention. I'm not sure if it's possible to get all corner cases right. -- Manfred ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED])