Re: [PATCHES] [HACKERS] 'Waiting on lock'
* Tom Lane ([EMAIL PROTECTED]) wrote:
>> Yeah, I wouldn't want one per second. Do we already track how long
>> we've been waiting?
>
> No, because we're *asleep*. You'd have to add an additional
> timeout-interrupt reason. Plus there's a ton of interesting questions
> about what's safe to do from an interrupt service routine. In fact, I
> am scandalized to see that someone has inserted a boatload of elog
> calls into CheckDeadLock since 8.2 --- that seems entirely unsafe.
> [ checks revision history... ]

Attached is a patch which moves the messages to ProcSleep(). To do this I had to make the semaphore be signalled unconditionally whenever CheckDeadLock() is called, regardless of whether it finds a hard deadlock. I'm not 100% sure that's correct, but AFAIK we only use semaphores to signal state changes, and we deal with spurious semaphore firings everywhere.

Incidentally, in looking at this I found that the early deadlock detection never seems to fire. Reading the source, it seems it ought to be firing whenever we have a simple two-process deadlock, but instead I only get the timeout-based detection.

checkpoint-log-messages-fix.patch.gz
Description: Binary data

-- 
Gregory Stark
EnterpriseDB  http://www.enterprisedb.com

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PATCHES] boolean = text explicit casts
On May 30, 2007, at 3:40 PM, Neil Conway wrote:
> On Wed, 2007-30-05 at 21:23 +0200, Peter Eisentraut wrote:
>> I'm not sure what your rationale was for creating lower-case words
>> instead of upper case, except that it looks nicer. Is there a
>> technical reason?
>
> There's no real technical reason: the standard says upper-case, but
> PG's general philosophy of case folding would suggest folding to
> lower-case. If we were compliant with the spec's case folding
> requirements then emitting upper-case would be the clear choice, but
> since we aren't, I don't have strong feelings either way.

Sorry for the late reply... I'm worried that this would make us incompatible with cross-database code. Granted, such code should probably be using a proper boolean representation, but I'm not sure that's universally true. And if we find out later that lower case is a problem, it won't be possible to change it without messing with the rest of our users. I think it'd be best to go with the spec.

-- 
Jim Nasby  [EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)
Re: [PATCHES] Load Distributed Checkpoints, revised patch
On Jun 17, 2007, at 4:39 AM, Simon Riggs wrote:
>>> pg_start_backup() should be a normal checkpoint I think. No need for
>>> backup to be an intrusive process.
>>
>> Good point. A spread-out checkpoint can take a long time to finish,
>> though. Is there a risk of running into a timeout or something if it
>> takes, say, 10 minutes for a call to pg_start_backup to finish?
>
> That would be annoying, but the alternative is for backups to
> seriously affect performance, which would defeat the object of the hot
> backup. It's not like it's immediate right now, so we'd probably be
> moving from 2-3 mins to 10 mins in your example. Most people are
> expecting their backups to take a long time anyway, so that's OK.

We should document it, though; otherwise I can see a bunch of confused users wondering why pg_start_backup takes so long. Remember that with longer checkpoints, the odds of them calling pg_start_backup during one and having to wait are much greater.

-- 
Jim Nasby  [EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)
Re: [PATCHES] Load Distributed Checkpoints, revised patch
Jim Nasby wrote:
> On Jun 17, 2007, at 4:39 AM, Simon Riggs wrote:
>>>> pg_start_backup() should be a normal checkpoint I think. No need
>>>> for backup to be an intrusive process.
>>>
>>> Good point. A spread-out checkpoint can take a long time to finish,
>>> though. Is there a risk of running into a timeout or something if it
>>> takes, say, 10 minutes for a call to pg_start_backup to finish?
>>
>> That would be annoying, but the alternative is for backups to
>> seriously affect performance, which would defeat the object of the
>> hot backup. It's not like it's immediate right now, so we'd probably
>> be moving from 2-3 mins to 10 mins in your example. Most people are
>> expecting their backups to take a long time anyway, so that's OK.
>
> We should document it, though; otherwise I can see a bunch of confused
> users wondering why pg_start_backup takes so long. Remember that with
> longer checkpoints, the odds of them calling pg_start_backup during
> one and having to wait are much greater.

If pg_start_backup initiates a non-immediate, smoothed checkpoint, what about a checkpoint that's already in progress when pg_start_backup is called? Should that one be hurried, so we can start the backup sooner? Probably not, which means we'll need yet another mode for RequestCheckpoint: request a non-immediate checkpoint, but if there's a checkpoint already running, don't rush it.

-- 
Heikki Linnakangas
EnterpriseDB  http://www.enterprisedb.com
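One way to express the extra mode Heikki describes is to make the checkpoint request carry a set of flags rather than booleans. The names below are hypothetical (at the time, RequestCheckpoint() took boolean arguments); this is only a sketch of the decision table, not the actual API:

```c
/*
 * Hypothetical checkpoint-request flags.  A backup-initiated request is
 * non-immediate, and additionally must not hurry a checkpoint that is
 * already in progress.
 */
typedef enum
{
	CKPT_IMMEDIATE = 1 << 0,	/* finish as fast as possible */
	CKPT_WAIT = 1 << 1,			/* caller blocks until the checkpoint is done */
	CKPT_CAUSE_BACKUP = 1 << 2	/* requested by pg_start_backup */
} ckpt_flags;

/*
 * Should an already-running checkpoint be sped up when this request
 * arrives?  For a backup-initiated request, no: let the in-progress
 * checkpoint finish at its leisurely pace.
 */
int
hurry_existing_checkpoint(int flags)
{
	if (flags & CKPT_CAUSE_BACKUP)
		return 0;				/* don't rush it */
	return (flags & CKPT_IMMEDIATE) != 0;
}
```

A shutdown checkpoint would pass CKPT_IMMEDIATE and get the current behavior; pg_start_backup would pass CKPT_CAUSE_BACKUP | CKPT_WAIT and simply wait out whatever checkpoint is running.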
Re: [PATCHES] [HACKERS] 'Waiting on lock'
Gregory Stark <[EMAIL PROTECTED]> writes:
> Incidentally, in looking at this I found that the early deadlock
> detection never seems to fire. Reading the source, it seems it ought
> to be firing whenever we have a simple two-process deadlock. But
> instead I only get the timeout-based detection.

OK, I understand now that early deadlock detection only kicks in when doing something like LOCK TABLE, and even then only if you're deadlocking because you're upgrading a lock. So this works as intended, though it's much less useful than I thought.

-- 
Gregory Stark
EnterpriseDB  http://www.enterprisedb.com
Re: [PATCHES] Preliminary GSSAPI Patches
On Sun, May 20, 2007 at 01:28:40AM -0700, Henry B. Hotz wrote:
> I finally got to testing that updated patch. It's fine per se, but was
> missing the updated README.GSSAPI file. Herewith fixed.

I've been reviewing and updating this patch for a while now. I've changed quite a bit around, and I have it working fine, but I have one question: is there a way to provoke GSSAPI into sending multiple packets during authentication? It doesn't seem to do that for me, and ISTM that the code as it stands is broken in that case -- but I'd like to verify it.

Basically, pg_GSS_recvauth() is supposed to loop and read all continuing exchange packets, right? But the reading of packets from the network sits *outside* the loop. So it basically just loops over and over on the same data, which ISTM is wrong. It does send a proper ask-for-continue message to the frontend inside the loop, but I can't figure out how it's supposed to read the response.

It looks like the fix should be as simple as moving the packet reading into the loop, but again, I'd like a way to test it :)

//Magnus
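The control flow Magnus proposes can be sketched generically: read one fresh authentication packet from the client on every iteration, rather than reading once before the loop and reprocessing the same data. Here accept_step() stands in for gss_accept_sec_context() and read_packet() for pulling one auth message off the socket; both names are illustrative, not the actual pg_GSS_recvauth() code:

```c
#include <stddef.h>

/*
 * Generic multi-round authentication loop.  The key point: the packet
 * read happens INSIDE the loop, so each call to the accept step sees the
 * client's next token, not the first one over and over.
 */
typedef enum
{
	AUTH_CONTINUE,				/* more round trips needed */
	AUTH_COMPLETE,				/* security context established */
	AUTH_ERROR
} auth_status;

typedef auth_status (*accept_fn) (const char *token);
typedef const char *(*read_fn) (void);

/* Returns the number of packets consumed, or -1 on failure. */
int
run_auth_loop(accept_fn accept_step, read_fn read_packet)
{
	int			packets = 0;

	for (;;)
	{
		const char *token = read_packet();	/* read a fresh packet each round */

		if (token == NULL)
			return -1;			/* client hung up mid-exchange */
		packets++;

		switch (accept_step(token))
		{
			case AUTH_COMPLETE:
				return packets;
			case AUTH_ERROR:
				return -1;
			case AUTH_CONTINUE:
				/* here the server would send its ask-for-continue
				 * message to the frontend, then loop back to read the
				 * client's next packet */
				break;
		}
	}
}

/* Demo: a fake three-packet exchange that completes on the third token. */
static int	demo_pos = 0;
static const char *demo_packets[] = {"tok1", "tok2", "tok3"};

const char *
demo_read(void)
{
	return demo_packets[demo_pos++];
}

auth_status
demo_accept(const char *token)
{
	(void) token;
	return demo_pos < 3 ? AUTH_CONTINUE : AUTH_COMPLETE;
}
```

With the read outside the loop, the same first token would be fed to the accept step on every iteration, which is exactly the symptom Magnus describes.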
Re: [PATCHES] WIP: rewrite numeric division
Gregory Stark <[EMAIL PROTECTED]> writes:
> Tom Lane <[EMAIL PROTECTED]> writes:
>> Yeah, my proposed patch is schoolbook division. I don't think I'd
>> trust Newton's method to give anything but a close approximation,
>> which is what we have in HEAD already.
>
> Well, unless we start storing rational numbers it'll always be limited
> to a finite number of decimals. The key is whether we can guarantee a
> lower bound on the number of accurate decimal places. As long as we
> can, then we can keep going until the decimal places we're going to
> return are all accurate.

This is nonsense. Consider an approximate division that gives ...123.99... You can keep generating 9's forever, but you'll never know whether the accurate result of trunc() should be 123 or 124. Unless you use a method that gives you trustworthy digits to start with.

> It looks like multiplication can also generate incorrect results.

It can, but only if told to produce fewer digits than would be in the exact result, so I'm not worried about that.

			regards, tom lane
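Tom's point about the trailing 9's can be made concrete with doubles standing in for numeric: when the exact quotient is an integer, an approximation that is low by even a few ulps truncates a whole unit low, and no number of additional approximate digits can settle which answer is right.

```c
/*
 * Truncation toward zero, conceptually what trunc() does on a numeric
 * quotient.  Plain double arithmetic here -- an illustration, not the
 * numeric.c code.
 */
long
truncate_quotient(double q)
{
	return (long) q;
}
```

If the exact quotient is 124 but an approximate method returns 123.9999999999999, truncation yields 123: off by a whole unit. Only a method with guaranteed-accurate digits (such as the schoolbook division in the proposed patch) lets you truncate safely.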
Re: [PATCHES] [HACKERS] 'Waiting on lock'
Gregory Stark <[EMAIL PROTECTED]> writes:
> Attached is a patch which moves the messages to ProcSleep(). To do
> this I had to make the semaphore be signalled unconditionally whenever
> CheckDeadLock() is called, regardless of whether it finds a hard
> deadlock. I'm not 100% sure that's correct, but AFAIK we only use
> semaphores to signal state changes, and we deal with spurious
> semaphore firings everywhere.

It's a bit itchy-seeming, but I think what you are actually doing is going over to a regime where every wait on the semaphore checks and re-waits if it didn't get the condition it wants. Before, I think we did that for every use of the semaphore except this one (which is an outgrowth of the fact that originally this was indeed the only use of the sema, and the others got shoehorned in over time). I'll check it over.

			regards, tom lane
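The regime Tom describes can be sketched with a POSIX semaphore: every wait re-checks its condition after waking, so a "spurious" post aimed at some other purpose is simply absorbed and the process goes back to sleep. The function and callback names here are hypothetical, not PostgreSQL's actual API (which goes through PGSemaphoreLock on the process's PGPROC semaphore):

```c
#include <semaphore.h>

/*
 * Wait on the semaphore until cond(arg) is satisfied, tolerating posts
 * that were not meant for us: after every wakeup, re-check the condition
 * and go back to sleep if it still does not hold.
 *
 * Returns the number of semaphore waits that were consumed.
 */
typedef int (*cond_fn) (void *arg);

int
wait_for_condition(sem_t *sem, cond_fn cond, void *arg)
{
	int			waits = 0;

	for (;;)
	{
		if (cond(arg))			/* condition satisfied: done */
			return waits;
		sem_wait(sem);			/* sleep; this post may be spurious */
		waits++;				/* woke up: loop around and re-check */
	}
}

/* Demo condition: becomes true on the third check (each check counts). */
int
count_to_three(void *arg)
{
	int		   *counter = (int *) arg;

	return ++(*counter) >= 3;
}
```

Under this discipline it is safe for CheckDeadLock() to post the semaphore unconditionally: a waiter that did not actually get its lock just re-checks and re-waits.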
Re: [PATCHES] WIP: rewrite numeric division
Tom Lane wrote:
> I wrote:
>> ...
>
> Now it's unlikely that real-world applications are going to be as
> dependent on the speed of div_var for long inputs as numeric_big is.
> And getting the right answer has to take priority over speed anyway.
> Still, this is a bit annoying. Anyone see a way to speed it up, or
> have another approach?

+1 for the change from me. We use PostgreSQL for financial accounting stuff, including plpgsql triggers and functions etc., and we use numeric for all that. I always thought that numeric division was exact! :-) I never saw a problem, perhaps because we round to very few digits. Please apply this patch. I can accept the performance drop, as long as there won't be bad surprises with correctness.

Best Regards
Michael Paesold
[PATCHES] more autovacuum fixes
I attach a patch to fix some issues in the autovac process handling code.

Mainly, instead of rechecking periodically in the launcher whether a worker was able to start or not, what we do is signal it from the postmaster when it fails. In order to do this I had to make the postmaster set a flag in shared memory, much like the pmsignal.c code does. According to previous discussions this is an acceptable thing for the postmaster to do.

I must admit I did not try to fill shared memory with garbage and see if it caused a postmaster crash. What I did test was creating a database, making sure that the launcher was picking it up, and then deleting the database's directory. This correctly causes a worker to log errors when trying to connect, and it is handled well -- i.e., other databases are not starved. (One remaining problem I see is that a database in this state, or similar, that fails to be vacuumed *will* cause starvation if in danger of Xid wraparound.)

Another thing I noticed is that I can change the autovacuum parameter in postgresql.conf and send SIGHUP to the postmaster, which correctly starts the launcher if it was previously disabled. To make the shutdown case act accordingly, I made the launcher check the start-daemon flag after rereading the config file, and quit if it's off. I tried multiple combinations of having it set and unset, changing stats_collect_tuple_level, and it works fine.

One problem with the patch is this (new code):

  bn = (Backend *) malloc(sizeof(Backend));
! if (bn)
! {
! 	bn->pid = StartAutoVacWorker();
! 	bn->is_autovacuum = true;
! 	/* we don't need a cancel key */
! 	if (bn->pid > 0)
! 	{
! 		/* FIXME -- unchecked memory allocation here */
! 		DLAddHead(BackendList, DLNewElem(bn));

If the palloc() inside DLNewElem() fails, we will fail to report a fork failure to the launcher. I am not sure how serious this is. One idea that came to mind was using a PG_TRY block, sending the signal in the CATCH block, and then rethrowing the exception. Is this acceptable?
All in all, this patch increases the reliability of autovacuum, so I intend to apply it shortly unless there are objections. Further improvements might come later as I become aware of possible fixes.

-- 
Alvaro Herrera                http://www.amazon.com/gp/registry/DXLWNGRJD34J
Hay quien adquiere la mala costumbre de ser infeliz (M. A. Evans)

Index: src/backend/postmaster/autovacuum.c
===================================================================
RCS file: /home/alvherre/Code/cvs/pgsql/src/backend/postmaster/autovacuum.c,v
retrieving revision 1.50
diff -c -p -r1.50 autovacuum.c
*** src/backend/postmaster/autovacuum.c	13 Jun 2007 21:24:55 -	1.50
--- src/backend/postmaster/autovacuum.c	19 Jun 2007 17:30:00 -
***************
*** 4,16 ****
  *
  * PostgreSQL Integrated Autovacuum Daemon
  *
  *
  * Portions Copyright (c) 1996-2007, PostgreSQL Global Development Group
  * Portions Copyright (c) 1994, Regents of the University of California
  *
  *
  * IDENTIFICATION
! *	  $PostgreSQL: pgsql/src/backend/postmaster/autovacuum.c,v 1.50 2007-06-13 21:24:55 alvherre Exp $
  *
  *-
  */
--- 4,52 ----
  *
  * PostgreSQL Integrated Autovacuum Daemon
  *
+ * The autovacuum system is structured in two different kinds of processes: the
+ * autovacuum launcher and the autovacuum worker.  The launcher is an
+ * always-running process, started by the postmaster when the autovacuum GUC
+ * parameter is set.  The launcher schedules autovacuum workers to be started
+ * when appropriate.  The workers are the processes which execute the actual
+ * vacuuming; they connect to a database as determined in the launcher, and
+ * once connected they examine the catalogs to select the tables to vacuum.
+ *
+ * The autovacuum launcher cannot start the worker processes by itself,
+ * because doing so would cause robustness issues (namely, failure to shut
+ * them down on exceptional conditions; and apart from that, since the launcher
+ * is connected to shared memory and is thus subject to corruption there, it is
+ * not as robust as the postmaster which is not).  So it leaves that task to
+ * the postmaster.
+ *
+ * There is an autovacuum shared memory area, where the launcher stores
+ * information about the database it wants vacuumed.  When it wants a new
+ * worker to start, it sets a flag in shared memory and sends a special signal
+ * to the postmaster.  Then postmaster knows nothing more than it must start a
+ * worker; so it forks a new child, which turns into a worker and examines
+ * shared memory to know what database it must connect to.
+ *
+ * If the fork() call fails, the postmaster sets a flag in the shared memory
+ * area, and it sends a signal to the launcher.  The launcher, upon noticing
+ * the flag, can try starting the worker again.  Note that the failure
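The fork-failure signalling protocol described in the comment above can be sketched minimally: the postmaster records the failure in a flag (in PostgreSQL this lives in the autovacuum shared memory area) and signals the launcher, which test-and-clears the flag and retries. The names below are illustrative, not the patch's actual identifiers:

```c
#include <signal.h>

/*
 * Stand-in for the flag in the autovacuum shared memory area.  A plain
 * sig_atomic_t here; in PostgreSQL it is a field in shared memory.
 */
static volatile sig_atomic_t av_fork_failed = 0;

/* Postmaster side: called when fork() of a worker fails. */
void
record_fork_failure(void)
{
	av_fork_failed = 1;
	/* ...followed by signalling the launcher, e.g. kill(launcher_pid, sig) */
}

/* Launcher side: on wakeup, test-and-clear the flag. */
int
check_fork_failure(void)
{
	if (av_fork_failed)
	{
		av_fork_failed = 0;
		return 1;				/* a worker failed to start: retry it */
	}
	return 0;
}
```

Because the launcher clears the flag when it notices it, a single failure produces exactly one retry decision, replacing the periodic rechecking the patch removes.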
Re: [PATCHES] [HACKERS] 'Waiting on lock'
Gregory Stark <[EMAIL PROTECTED]> writes:
> Attached is a patch which moves the messages to ProcSleep().

BTW, with the messages moved out of the ISR, it would be safe to make it identify the specific lock being waited on (in the same terms used in the existing deadlock messages). Is there a reason not to do that? I suppose it would eat a few more cycles ...

			regards, tom lane
Re: [PATCHES] [HACKERS] 'Waiting on lock'
Gregory Stark <[EMAIL PROTECTED]> writes:
> * Tom Lane ([EMAIL PROTECTED]) wrote:
>> In fact, I am scandalized to see that someone has inserted a boatload
>> of elog calls into CheckDeadLock since 8.2 --- that seems entirely
>> unsafe. [ checks revision history... ]
>
> Attached is a patch which moves the messages to ProcSleep().

Applied with some further revisions to improve the usefulness of the log messages. There's now one issued when the deadlock timeout elapses, and another when the lock is finally obtained:

LOG: process 14945 still waiting for AccessExclusiveLock on relation 86076 of database 86042 after 1003.354 ms
...
LOG: process 14945 acquired AccessExclusiveLock on relation 86076 of database 86042 after 5918.002 ms

although I just realized that the wording of the second one is misleading; it actually comes out when the lock wait ends, whether we acquired the lock or not. (The other possibility is that our statement was aborted, e.g. by SIGINT or statement_timeout.) Is it worth having two messages for the two cases? I'm tempted to just not log anything if the statement is aborted --- the cause of the abort should be reflected in some later error message, and reporting how long we waited before giving up seems not within the charter of log_lock_waits.

			regards, tom lane
Re: [PATCHES] [HACKERS] 'Waiting on lock'
On Tue, 2007-06-19 at 16:24 -0400, Tom Lane wrote:
> Gregory Stark <[EMAIL PROTECTED]> writes:
>> * Tom Lane ([EMAIL PROTECTED]) wrote:
>>> In fact, I am scandalized to see that someone has inserted a
>>> boatload of elog calls into CheckDeadLock since 8.2 --- that seems
>>> entirely unsafe. [ checks revision history... ]
>>
>> Attached is a patch which moves the messages to ProcSleep().

Thanks, Greg, for taking this on; it would still be in my queue now if you hadn't, so much appreciated.

> Applied with some further revisions to improve the usefulness of the
> log messages. There's now one issued when the deadlock timeout
> elapses, and another when the lock is finally obtained:
>
> LOG: process 14945 still waiting for AccessExclusiveLock on relation
> 86076 of database 86042 after 1003.354 ms
> ...
> LOG: process 14945 acquired AccessExclusiveLock on relation 86076 of
> database 86042 after 5918.002 ms
>
> although I just realized that the wording of the second one is
> misleading; it actually comes out when the lock wait ends, whether we
> acquired the lock or not. (The other possibility is that our statement
> was aborted, e.g. by SIGINT or statement_timeout.) Is it worth having
> two messages for the two cases? I'm tempted to just not log anything
> if the statement is aborted --- the cause of the abort should be
> reflected in some later error message, and reporting how long we
> waited before giving up seems not within the charter of
> log_lock_waits.

Sounds good; thanks, Tom.

Related TODO items:
- add a WAIT n clause in the same SQL locations as NOWAIT
- add a lock_wait_timeout (USERSET), default = 0 (unlimited waiting), to provide better control over lock waits

-- 
Simon Riggs
EnterpriseDB  http://www.enterprisedb.com
Re: [PATCHES] Preliminary GSSAPI Patches
Such timing! I just spent most of yesterday stepping through the GSSAPI sample apps in Java 1.4 with someone here at work. Was thinking I needed to get back to the JDBC client and do what I promised. Also finished filtering the PG lists for stuff just before seeing this email.

On Jun 19, 2007, at 6:04 AM, Magnus Hagander wrote:
> On Sun, May 20, 2007 at 01:28:40AM -0700, Henry B. Hotz wrote:
>> I finally got to testing that updated patch. It's fine per se, but
>> was missing the updated README.GSSAPI file. Herewith fixed.
>
> I've been reviewing and updating this patch for a while now. I've
> changed quite a bit around, and I have it working fine, but I have one
> question.

Be curious to see what you've done, but if you're actively changing things I'll let them settle.

> Is there a way to provoke GSSAPI into sending multiple packets in the
> authentication? It doesn't seem to do that for me, and ISTM that the
> code as it stands is broken in that case -- but I'd like to verify it.

I remember wondering about that myself. For SASL, if you turn on all the options you get an extra round trip. Not for GSSAPI/Krb5, which is pretty efficient in that respect. The loop logic for SASL is just different enough that I can imagine messing up, but I would have thought it would have made me get the logic right.

The only thing I can think of is to use a different GSSAPI mechanism. That opens an interesting can of worms that has nothing to do with PostgreSQL. First of all, it means you need a GSSAPI implementation that supports multiple mechanisms (which precludes Java for now). That in turn means either Sun, or MIT 1.6+, or Heimdal 0.8+. Second, you need another mechanism to install. That means the commercial Entrust SPKM mechanism on Sun, or the UMICH SPKM mechanism, or some SPNEGO mechanism. Love says there are problems with the SPKM RFC, and the UMICH implementation won't interoperate with other implementations as a result (but it works with itself).

I also know he's been experimenting with other mechanisms. Looking at the latest Heimdal snapshot I have, it seems to have both SPNEGO and NTLM mechanism code in it. A configuration that uses SPNEGO to negotiate Kerberos 5 ought to take two round trips, at least. Feel like trying it? I never tested my patches against a Heimdal GSSAPI library. My bad. If you try the patches against Heimdal GSSAPI/SPNEGO/Krb5, that ought to be a really good test case. It would be a good server case to try with a native Windows client as well.

> Basically, pg_GSS_recvauth() is supposed to loop and read all
> continuing exchange packets, right? But the reading of packets from
> the network sits *outside* the loop. So it basically just loops over
> and over on the same data, which ISTM is wrong. It does send a proper
> ask-for-continue message to the frontend inside the loop, but I can't
> figure out how it's supposed to read the response.

I remember having to stand on my head to get it (apparently) right on the client side. I also remember the server side was a lot simpler, but you're saying I may have oversimplified? I can't answer the question about the server side without reviewing the code.

> It looks like the fix should be as simple as moving the packet reading
> into the loop, but again, I'd like a way to test it :)

Well, probably, but that doesn't sound that simple to me (assuming I really didn't do that to begin with).

> //Magnus

The opinions expressed in this message are mine, not those of Caltech, JPL, NASA, or the US Government. [EMAIL PROTECTED], or [EMAIL PROTECTED]
[PATCHES] Cancel autovacuum conflicting with DROP TABLE
Here is a patch that cancels autovacuum workers conflicting with DROP TABLE, TRUNCATE and CLUSTER. It was discussed here:
http://archives.postgresql.org/pgsql-hackers/2007-06/msg00556.php

Before backends drop a table, they search for autovacuum workers that are running on the table. If found, they send a SIGINT signal to the autovacuum worker to avoid waiting for the worthless vacuum. When an autovacuum worker receives SIGINT, it skips only the table being vacuumed and continues with the remaining jobs.

Now SIGINT and SIGTERM have different meanings, and their log messages have changed:

SIGINT -- cancel the current job:
  DEBUG: autovacuum on relation schema.table is canceled

SIGTERM -- cancel all jobs:
  FATAL: terminating autovacuum due to administrator command

We can see the second (SIGTERM) log message on shutdown and DROP DATABASE; in those cases we send SIGTERM to autovacuum workers instead of SIGINT. I'd like to bring the error level down to WARNING or less, actually, but I have no idea how to do that, because canceling queries and error levels are tightly coupled in the present implementation. Instead, I added the special FATAL message for autovacuum workers so that we can distinguish which lines are ignorable.

Comments welcome.

---
ITAGAKI Takahiro
NTT Open Source Software Center

cancel_autovac_on_drop.patch
Description: Binary data
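The two-level cancellation Itagaki describes can be sketched as a pair of signal flags in the worker: SIGINT asks it to abandon only the table it is currently vacuuming, while SIGTERM asks it to stop entirely. Handler and helper names here are illustrative, not the patch's:

```c
#include <signal.h>

/* Per-worker flags set from signal handlers. */
static volatile sig_atomic_t cancel_current_table = 0;	/* SIGINT */
static volatile sig_atomic_t terminate_worker = 0;		/* SIGTERM */

void
worker_sigint_handler(int sig)
{
	(void) sig;
	cancel_current_table = 1;	/* skip only the table being vacuumed */
}

void
worker_sigterm_handler(int sig)
{
	(void) sig;
	terminate_worker = 1;		/* cancel all jobs and exit */
}

typedef enum
{
	WORKER_CONTINUE,			/* keep vacuuming */
	WORKER_SKIP_TABLE,			/* abandon current table, go on to the next */
	WORKER_EXIT					/* shut down entirely */
} worker_action;

/*
 * Decide what to do at the next cancellation check.  SIGTERM wins over
 * SIGINT; the per-table flag is cleared once acted upon so the worker
 * resumes with the remaining jobs.
 */
worker_action
next_worker_action(void)
{
	if (terminate_worker)
		return WORKER_EXIT;
	if (cancel_current_table)
	{
		cancel_current_table = 0;
		return WORKER_SKIP_TABLE;
	}
	return WORKER_CONTINUE;
}
```

A backend about to drop a table would send SIGINT to a conflicting worker (producing a skip), while shutdown and DROP DATABASE send SIGTERM (producing an exit), matching the DEBUG/FATAL split in the log messages above.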