Re: [HACKERS] Problem with locks
Gregory Stark [EMAIL PROTECTED] writes:
> Gregory Stark [EMAIL PROTECTED] writes:
>> Tom Lane [EMAIL PROTECTED] writes:
>>> Gregory Stark [EMAIL PROTECTED] writes:
>>>> We're seeing a problem where occasionally a process appears to be granted a lock but miss its semaphore signal.
>>>
>>> Kernel bug maybe? What's the platform?

I've written a synthetic test program to check for lost semaphore wakeups. I can't seem to produce any on my machine but haven't had a chance to run it yet on the benchmark machine that's been showing the problem. If I can't produce any lost wakeups with a program like this it looks more like it might be a Postgres or GCC bug than a Linux bug.

It would be helpful if people could run this on various architectures and various versions of Linux (or other OSes). I've been running it with 40 processes for an hour, but even shorter runs would be useful. It will drive the load on your machine through the roof but doesn't cause any i/o.

$ gcc -Wall ipctest.c -lrt
$ ./a.out 40 3600

#include <stdlib.h>
#include <stdio.h>
#include <semaphore.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/shm.h>
#include <signal.h>

#define IPCProtection (0600) /* access/modify by user only */

int nthreads;
sem_t *sems;
unsigned char *wakers;

static void worker(int n);

int main(int argc, char *argv[])
{
    int i, shmid1, shmid2, runtime;
    pid_t *pids;

    if (argc <= 1)
        nthreads = 10;
    else
        nthreads = atoi(argv[1]);
    if (argc <= 2)
        runtime = 10;
    else
        runtime = atoi(argv[2]);
    if (nthreads <= 0 || nthreads > 200)
        exit(1);
    if (runtime <= 0)
        exit(1);

    printf("running with %d processes for %ds\n", nthreads, runtime);

    shmid1 = shmget(IPC_PRIVATE, nthreads*sizeof(sem_t), IPC_CREAT | IPC_EXCL | IPCProtection);
    if (shmid1 == -1) {
        perror("shmget");
        exit(1);
    }
    sems = shmat(shmid1, NULL, 0);
    for (i=0;i<nthreads;i++)
        if (sem_init(&sems[i], 1, 0) < 0) {
            perror("sem_init");
            exit(1);
        }

    shmid2 = shmget(IPC_PRIVATE, nthreads*sizeof(unsigned char), IPC_CREAT | IPC_EXCL | IPCProtection);
    if (shmid2 == -1) {
        perror("shmget");
        exit(1);
    }
    wakers = shmat(shmid2, NULL, 0);
    for (i=0;i<nthreads;i++)
        wakers[i] = 255;

    pids = malloc(sizeof(pid_t)*nthreads);
    for (i=0;i<nthreads;i++) {
        /*printf("forking thread %d\n", i);*/
        switch(pids[i] = fork()) {
        case 0:
            worker(i);
            exit(0);
        case -1:
            perror("fork");
            exit(1);
        default:
            /*printf("successfully forked thread %d as pid %d\n", i, pids[i]);*/
            break;
        }
    }

    sleep(runtime);
    kill(pids[0], 3);
    sleep(1);
    for (i=1;i<nthreads;i++) {
        if (wakers[i] == 255)
            printf("thread %d lost a wakeup!!!\n", i);
        kill(pids[i], 3);
    }

    if (shmctl(shmid1, IPC_RMID, NULL) < 0)
        perror("shmctl ipc_rmid");
    if (shmctl(shmid2, IPC_RMID, NULL) < 0)
        perror("shmctl ipc_rmid");
    exit(0);
}

static void worker(int n)
{
    srandom(getpid());

    for(;;) {
        int waker;
        int i;

        /* wake anyone following us waiting for us to wake them */
        for (i=n+1;i<nthreads;i++) {
            if (wakers[i] == n) {
                /*printf("thread %d waking thread %d\n", n, i);*/
                wakers[i] = 255;
                if (sem_post(&sems[i]) < 0) {
                    perror("sem_post");
                    exit(1);
                }
            }
        }

        if (n == 0) {
            /* we're the first thread so we just sleep and then go around waking people again */
            continue;
        }

        /* otherwise pick a random thread earlier than us to wake us and go to sleep until awoken by it */
        waker = random()%n;
        /*printf("thread %d sleeping waiting for %d to wake us\n", n, waker);*/
        wakers[n] = waker;
        if (sem_wait(&sems[n]) < 0) {
            perror("sem_wait");
            exit(1);
        }
        if ((waker = wakers[n]) != 255) {
            printf("thread %d awake but waker is still set to %d \n", n, waker);
            exit(1);
        }
        /*printf("thread %d awoken\n", n);*/
    }
}

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faq
Re: [HACKERS] Problem with locks
Gregory Stark wrote:
> Gregory Stark [EMAIL PROTECTED] writes:
>> Gregory Stark [EMAIL PROTECTED] writes:
>>> Tom Lane [EMAIL PROTECTED] writes:
>>>> Gregory Stark [EMAIL PROTECTED] writes:
>>>>> We're seeing a problem where occasionally a process appears to be granted a lock but miss its semaphore signal.
>>>>
>>>> Kernel bug maybe? What's the platform?
>
> I've written a synthetic test program to check for lost semaphore wakeups. I can't seem to produce any on my machine but haven't had a chance to run it yet on the benchmark machine that's been showing the problem. If I can't produce any lost wakeups with a program like this it looks more like it might be a Postgres or GCC bug than a Linux bug.
>
> It would be helpful if people could run this on various architectures and various versions of Linux (or other OSes). I've been running it with 40 processes for an hour, but even shorter runs would be useful. It will drive the load on your machine through the roof but doesn't cause any i/o.

Doesn't work on OpenBSD:

$ gcc -o ipctest ipctest.c -lpthread
$ ./ipctest 40 3600
running with 40 processes for 3600s
sem_init: Operation not permitted

    This implementation does not support shared semaphores, and reports
    this fact by setting errno to EPERM. This is perhaps a stretch of the
    intention of POSIX, but is compliant, with the caveat that sem_init()
    always reports a permissions error when an attempt to create a shared
    semaphore is made.

Stefan
Re: [HACKERS] proper way to fix information_schema.key_column_usage view
April Lorenzen wrote:
> I had to feel my way carrying out this fix, and I don't know if I did it right - I only know that it appears I no longer have the error. Please confirm whether I was supposed to execute all of share/information_schema.sql --- or just the portion that CREATEs or REPLACEs key_column_usage view.

If you don't have any dependencies on the information schema (that is, objects you have created yourself that refer to the information schema, which should be rare), it is safe to just drop the schema, that is,

DROP SCHEMA information_schema CASCADE;

and then reload it:

psql -f .../information_schema.sql $PGDATABASE

--
Peter Eisentraut
http://developer.postgresql.org/~petere/
Re: [HACKERS] Problem with locks
2007/8/12, Gregory Stark [EMAIL PROTECTED]:
> Gregory Stark [EMAIL PROTECTED] writes:
>> Tom Lane [EMAIL PROTECTED] writes:
>>> Gregory Stark [EMAIL PROTECTED] writes:
>>>> We're seeing a problem where occasionally a process appears to be granted a lock but miss its semaphore signal.
>>>
>>> Kernel bug maybe? What's the platform?
>
> I've written a synthetic test program to check for lost semaphore wakeups. I can't seem to produce any on my machine but haven't had a chance to run it yet on the benchmark machine that's been showing the problem. If I can't produce any lost wakeups with a program like this it looks more like it might be a Postgres or GCC bug than a Linux bug.
>
> It would be helpful if people could run this on various architectures and various versions of Linux (or other OSes). I've been running it with 40 processes for an hour, but even shorter runs would be useful. It will drive the load on your machine through the roof but doesn't cause any i/o.
>
> $ gcc -Wall ipctest.c -lrt
> $ ./a.out 40 3600

Hello

[EMAIL PROTECTED] ~]$ cat /proc/sys/kernel/osrelease
2.6.22.1-41.fc7
[EMAIL PROTECTED] tmp]$ ./a.out 40 3600
running with 40 processes for 3600s
[EMAIL PROTECTED] tmp]$

without any problem

Regards
Pavel Stehule
Re: [HACKERS] Problem with locks
Gregory Stark [EMAIL PROTECTED] writes:
> I've written a synthetic test program to check for lost semaphore wakeups.

Seems to me this proves nothing much, since it doesn't use the same SysV semaphore API PG does. Please adjust so that it looks more like our code --- in particular there should be multiple processes having semaphores in the same semid group.

Also, I think you have race conditions at shutdown --- the appearance of the "thread %d lost a wakeup" message would not convince me there was a bug in the least. You need to make sure the workers exit at a known point in their loop.

    regards, tom lane
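Both requests can be sketched together under stated assumptions: every worker's semaphore lives in one SysV set created with semget(), and shutdown goes through a flag in shared memory that a worker checks only right after each wakeup, instead of killing workers at arbitrary points with kill(). run_workers, sem_adjust, union sem_arg and MAX_WORKERS are hypothetical names; this is not PostgreSQL's sysv_sema.c, and it leaves out the random wake-chain logic of the original test.

```c
/* Hypothetical sketch: N forked workers share ONE SysV semaphore set (one
 * semaphore each) and exit only at a known point, right after a wakeup. */
#define _DEFAULT_SOURCE            /* exposes MAP_ANONYMOUS on glibc */
#include <errno.h>
#include <stdatomic.h>
#include <sys/ipc.h>
#include <sys/mman.h>
#include <sys/sem.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define MAX_WORKERS 64

/* semctl()'s optional fourth argument, named so it cannot clash with
 * platforms whose headers already define union semun. */
union sem_arg {
    int val;
    struct semid_ds *buf;
    unsigned short *array;
};

/* Add delta to semaphore semnum of the set, retrying if a signal interrupts. */
static int sem_adjust(int semid, int semnum, int delta)
{
    struct sembuf op = { .sem_num = (unsigned short) semnum,
                         .sem_op = (short) delta, .sem_flg = 0 };
    while (semop(semid, &op, 1) < 0)
        if (errno != EINTR)
            return -1;
    return 0;
}

/* Returns how many workers exited cleanly, or -1 if setup failed. */
int run_workers(int nworkers, int rounds)
{
    if (nworkers < 1 || nworkers > MAX_WORKERS)
        return -1;

    int semid = semget(IPC_PRIVATE, nworkers, IPC_CREAT | IPC_EXCL | 0600);
    if (semid < 0)
        return -1;

    /* Initialise explicitly; semget() is not required to zero the values. */
    union sem_arg arg = { .val = 0 };
    for (int i = 0; i < nworkers; i++)
        semctl(semid, i, SETVAL, arg);

    atomic_int *stop = mmap(NULL, sizeof *stop, PROT_READ | PROT_WRITE,
                            MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (stop == MAP_FAILED) {
        semctl(semid, 0, IPC_RMID);
        return -1;
    }
    atomic_store(stop, 0);

    pid_t pids[MAX_WORKERS];
    int started = 0;
    for (int i = 0; i < nworkers; i++) {
        pid_t pid = fork();
        if (pid < 0)
            break;
        if (pid == 0) {
            /* Worker: the only way out is right after a wakeup. */
            for (;;) {
                if (sem_adjust(semid, i, -1) < 0)
                    _exit(2);
                if (atomic_load(stop))
                    _exit(0);
            }
        }
        pids[started++] = pid;
    }

    /* Ordinary wakeups. */
    for (int r = 0; r < rounds; r++)
        for (int i = 0; i < started; i++)
            sem_adjust(semid, i, 1);

    /* Shutdown: raise the flag first, then one more wakeup per worker so
     * none is left blocked in semop(). */
    atomic_store(stop, 1);
    for (int i = 0; i < started; i++)
        sem_adjust(semid, i, 1);

    int clean = 0;
    for (int i = 0; i < started; i++) {
        int status;
        if (waitpid(pids[i], &status, 0) == pids[i] &&
            WIFEXITED(status) && WEXITSTATUS(status) == 0)
            clean++;
    }

    munmap(stop, sizeof *stop);
    semctl(semid, 0, IPC_RMID);
    return clean;
}
```

Because the flag is set before the final round of increments, whichever wakeup a worker consumes after that point sends it out through the same exit, so any end-of-run check done after waitpid() never sees a worker that was stopped mid-iteration.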
Re: [HACKERS] Wrote a connect-by feature
Hi,

On Saturday, 11 Aug 2007, 19:57:59 -0400, Andrew Dunstan wrote:
> You say you read the Developers FAQ, but you clearly ignored this entry:
>
> http://www.postgresql.org/docs/faqs.FAQ_DEV.html#item1.4
>
> [...]
>
> you didn't seem to understand [...]

Yes. I ignore and I don't understand.

Thanks. Goodbye!

Bertram

--
Bertram Scharpf
Stuttgart, Deutschland/Germany
http://www.bertram-scharpf.de
Re: [HACKERS] Problem with locks
Tom Lane [EMAIL PROTECTED] writes:
> Gregory Stark [EMAIL PROTECTED] writes:
>> I've written a synthetic test program to check for lost semaphore wakeups.
>
> Seems to me this proves nothing much, since it doesn't use the same SysV semaphore API PG does. Please adjust so that it looks more like our code --- in particular there should be multiple processes having semaphores in the same semid group.

I was trying to copy the semaphore API exactly assuming USE_NAMED_POSIX_SEMAPHORES was *not* defined. According to the comments we prefer not to use named semaphores if possible.

> Also, I think you have race conditions at shutdown --- the appearance of the "thread %d lost a wakeup" message would not convince me there was a bug in the least. You need to make sure the workers exit at a known point in their loop.

I intended to try to recreate the dynamics of the deadlock timeout timer signal. This was just a first cut.

--
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
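The timer-signal dynamics mentioned here could be approximated along these lines. This is a sketch under the assumption that the relevant effect is a signal arriving while a process sleeps in a SysV semop() wait; wait_through_timer_signals, the 10 ms interval and the "three ticks, then post" rule are illustrative choices, not taken from PostgreSQL. On Linux, semop() is never restarted after a signal handler runs, so the wait loop must treat EINTR as "retry", not as "woken".

```c
/* Hypothetical sketch: a periodic SIGALRM (standing in for a deadlock-timeout
 * timer) keeps interrupting a process blocked in a SysV semop() wait. */
#define _DEFAULT_SOURCE            /* sigaction/setitimer on glibc */
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/ipc.h>
#include <sys/sem.h>
#include <sys/time.h>

union sem_arg {                    /* semctl()'s optional fourth argument */
    int val;
    struct semid_ds *buf;
    unsigned short *array;
};

static volatile sig_atomic_t ticks = 0;

static void on_alarm(int signo)
{
    (void) signo;
    ticks++;                       /* the handler only records that it ran */
}

/* Blocks on a semaphore while a 10 ms interval timer fires; once at least
 * three ticks have been seen, a simulated waker posts the semaphore.
 * Returns how many times semop() failed with EINTR, or -1 on error. */
int wait_through_timer_signals(void)
{
    int semid = semget(IPC_PRIVATE, 1, IPC_CREAT | IPC_EXCL | 0600);
    if (semid < 0)
        return -1;
    union sem_arg arg = { .val = 0 };
    if (semctl(semid, 0, SETVAL, arg) < 0) {
        semctl(semid, 0, IPC_RMID);
        return -1;
    }

    struct sigaction sa, old_sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_alarm;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;               /* deliberately no SA_RESTART */
    sigaction(SIGALRM, &sa, &old_sa);

    struct itimerval every_10ms = {
        .it_interval = { .tv_sec = 0, .tv_usec = 10000 },
        .it_value    = { .tv_sec = 0, .tv_usec = 10000 },
    };
    struct itimerval off;
    memset(&off, 0, sizeof off);
    ticks = 0;
    setitimer(ITIMER_REAL, &every_10ms, NULL);

    struct sembuf down = { .sem_num = 0, .sem_op = -1, .sem_flg = 0 };
    struct sembuf up   = { .sem_num = 0, .sem_op = 1,  .sem_flg = 0 };
    int interrupted = 0;
    for (;;) {
        if (semop(semid, &down, 1) == 0)
            break;                 /* acquired the semaphore */
        if (errno != EINTR) {
            interrupted = -1;
            break;
        }
        interrupted++;             /* timer fired mid-wait: just retry */
        if (ticks >= 3)
            semop(semid, &up, 1);  /* the simulated waker finally posts */
    }

    setitimer(ITIMER_REAL, &off, NULL);
    sigaction(SIGALRM, &old_sa, NULL);
    semctl(semid, 0, IPC_RMID);
    return interrupted;
}
```

A lost-wakeup test run under timer pressure would need the same retry, otherwise it would report ordinary interruptions as failures.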
Re: [HACKERS] Problem with locks
Gregory Stark [EMAIL PROTECTED] writes:
> Tom Lane [EMAIL PROTECTED] writes:
>> Seems to me this proves nothing much, since it doesn't use the same SysV semaphore API PG does.
>
> I was trying to copy the semaphore API exactly assuming USE_NAMED_POSIX_SEMAPHORES was *not* defined. According to the comments we prefer not to use named semaphores if possible.

What you seem to have copied is the posix_sema.c code, which AFAIK is only used on Darwin. sysv_sema.c is what to look at ... unless your benchmark machine is a Mac.

    regards, tom lane
Re: [HACKERS] Interesting misbehavior of repalloc()
I wrote:
> Gregory Stark [EMAIL PROTECTED] writes:
>> We could also only do the realloc-in-place only if there isn't a 4k chunk in the 4k freelist. I'm imagining that usually there wouldn't be.
>
> Or in general, if there's a free chunk of the right size then copy to it, else consider realloc-in-place. Counterintuitive but it might work. I'm not sure how often there wouldn't be a free chunk though ...

I experimented with this a bit. Not doing enlarge-in-place when there's a suitable free chunk turns out to be practically a one-line addition to AllocSetRealloc, but the question is whether that forty-line block of code is pulling its weight at all. I added some debug code to log when the different cases happen, and ran the regression tests. (Which maybe aren't very representative of real-world usage, but it's the best easy test I can think of.) What I got was

     380 successful enlarge in place
     438 blocked by new rule about available chunk
    6078 other reallocs of small chunks

The "other reallocs" are ones where one of the existing limitations prevents us from using realloc-in-place. The successful enlargements broke down like this:

     12 realloc enlarge 16 -> 24
      1 realloc enlarge 16 -> 32
      1 realloc enlarge 16 -> 40
      1 realloc enlarge 16 -> 64
      1 realloc enlarge 16 -> 80
    139 realloc enlarge 256 -> 512
    119 realloc enlarge 512 -> 1024
     80 realloc enlarge 1024 -> 2048
     26 realloc enlarge 2048 -> 4096

Bearing in mind that the first size in each pair is the number of bytes of data we'd have to copy if we don't enlarge-in-place, we're not saving that much work. (Cases involving larger chunks are passed off to libc's realloc(), so there's never anything bigger than 2K of copying at stake, at least when power-of-2 request sizes are used.)

I drilled down a bit deeper and found that most of the larger reallocs are coming from just two places: enlargement of StringInfo buffers (initially 256 bytes) and enlargement of scan.l's literalbuf (initially 128 bytes).
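The "not saving that much work" point can be made concrete by tallying that profile: multiplying each count by the old chunk size gives the bytes the enlarge-in-place path avoided copying over the whole regression run, taking the old size as the amount a copying realloc would move, as the message does. The struct and function names are illustrative only; the numbers are copied from Tom's first profile.

```c
/* Successful enlarge-in-place cases from Tom's first profile:
 * how many times each (old size -> new size) transition happened. */
struct realloc_case {
    int count;
    int old_size;   /* bytes a copying realloc would have had to move */
    int new_size;
};

static const struct realloc_case first_profile[] = {
    { 12,   16,   24 }, {  1,   16,   32 }, {  1,   16,   40 },
    {  1,   16,   64 }, {  1,   16,   80 }, {139,  256,  512 },
    {119,  512, 1024 }, { 80, 1024, 2048 }, { 26, 2048, 4096 },
};

/* Total number of enlarge-in-place calls in the profile. */
long total_calls(const struct realloc_case *c, int n)
{
    long calls = 0;
    for (int i = 0; i < n; i++)
        calls += c[i].count;
    return calls;
}

/* Bytes the in-place path avoided copying: sum of count x old size. */
long bytes_not_copied(const struct realloc_case *c, int n)
{
    long bytes = 0;
    for (int i = 0; i < n; i++)
        bytes += (long) c[i].count * c[i].old_size;
    return bytes;
}
```

For these figures the call count comes to 380, matching the "380 successful enlarge in place" line, and the copying avoided totals 231,936 bytes (about 226 KB) for the entire regression run.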
I changed the initial allocations to 1K for each of these, and then the profile of successful realloc-in-place changes to

     12 realloc enlarge 16 -> 24
      1 realloc enlarge 16 -> 32
      1 realloc enlarge 16 -> 40
      1 realloc enlarge 16 -> 64
      1 realloc enlarge 16 -> 80
     81 realloc enlarge 1024 -> 2048
     25 realloc enlarge 2048 -> 4096

Here, all of the remaining larger reallocs are happening during CREATE VIEW operations (while constructing the pg_rewrite rule text), which probably need not be considered a performance-critical path.

Based on this, I conclude that the realloc-in-place code doesn't pull its weight. We should just remove it, and increase those penurious initial allocations in stringinfo.c and scan.l to avoid most of the use-cases for repalloc in the first place.

Does anyone have any other test cases to suggest? Stuff like pgbench isn't interesting --- it doesn't cause repalloc to be invoked at all.

    regards, tom lane
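Why a larger starting allocation removes most of the repalloc traffic can be seen with a toy doubling buffer that counts its reallocations. This is a hypothetical sketch (growbuf and grows_for are illustrative names, not PostgreSQL's StringInfo API); 256 and 1024 mirror the old and new starting sizes discussed above, and the 1500-byte payload is an arbitrary choice.

```c
/* Hypothetical doubling buffer used only to count reallocations. */
#include <stdlib.h>
#include <string.h>

struct growbuf {
    char  *data;
    size_t len;     /* bytes used, excluding the trailing NUL */
    size_t cap;     /* bytes allocated */
    int    grows;   /* how many times realloc() was needed */
};

static int growbuf_init(struct growbuf *b, size_t initial)
{
    b->data = malloc(initial);
    if (b->data == NULL)
        return -1;
    b->data[0] = '\0';
    b->len = 0;
    b->cap = initial;
    b->grows = 0;
    return 0;
}

static int growbuf_append(struct growbuf *b, const char *s, size_t n)
{
    size_t need = b->len + n + 1;          /* +1 for the trailing NUL */
    if (need > b->cap) {
        size_t newcap = b->cap;
        while (newcap < need)
            newcap *= 2;                   /* power-of-2 growth */
        char *p = realloc(b->data, newcap);
        if (p == NULL)
            return -1;
        b->data = p;
        b->cap = newcap;
        b->grows++;
    }
    memcpy(b->data + b->len, s, n);
    b->len += n;
    b->data[b->len] = '\0';
    return 0;
}

/* Appends `total` bytes one at a time; returns the realloc count, or -1. */
int grows_for(size_t initial, size_t total)
{
    struct growbuf b;
    if (initial == 0 || growbuf_init(&b, initial) < 0)
        return -1;
    for (size_t i = 0; i < total; i++) {
        if (growbuf_append(&b, "x", 1) < 0) {
            free(b.data);
            return -1;
        }
    }
    int grows = b.grows;
    free(b.data);
    return grows;
}
```

Appending 1,500 bytes takes three reallocs (256 -> 512 -> 1024 -> 2048) from a 256-byte start but only one from 1024, while short strings need none either way; the larger start mainly costs some idle memory for short strings.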
Re: [HACKERS] regexp_matches and regexp_split are inconsistent
Stephan Szabo [EMAIL PROTECTED] writes:
> On Fri, 10 Aug 2007, Tom Lane wrote:
>> Is this what we want? Arguably regexp_split is doing the most reasonable thing for its intended usage, but the strict definition of regexp matching seems to require what regexp_matches does. I think we need to either change one function to match the other, or else document the inconsistency.
>
> I'm not sure how many languages do this, but at least perl seems to work similarly, which makes me guess that it's probably similar in a bunch of languages. If it is, then we should probably just document the inconsistency.

The Perl precedent is good enough for me. Documented...

    regards, tom lane