Re: [PATCHES] [HACKERS] 'Waiting on lock'

2007-06-19 Thread Gregory Stark

 * Tom Lane ([EMAIL PROTECTED]) wrote:

 Yeah, I wouldn't want one per second.  Do we already track how long
 we've been waiting?

 No, because we're *asleep*.  You'd have to add an additional
 timeout-interrupt reason.  Plus there's a ton of interesting questions
 about what's safe to do from an interrupt service routine.

 In fact, I am scandalized to see that someone has inserted a boatload
 of elog calls into CheckDeadLock since 8.2 --- that seems entirely
 unsafe.  [ checks revision history... ]

Attached is a patch which moves the messages to ProcSleep(). To do this I had
to make the semaphore be signalled unconditionally whenever CheckDeadLock()
is called, regardless of whether it finds a hard deadlock. I'm not 100% sure
that's correct, but AFAIK we only use semaphores to signal state changes, and
we deal with spurious semaphore firings everywhere.
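The pattern being relied on here can be sketched as follows. This is an illustration, not the patch's code: the names (wakeup_sem, lock_granted, granter) are made up, and the "granter" thread stands in for CheckDeadLock() posting the semaphore without an actual deadlock. The point is that the waiter treats the semaphore purely as a wake-up hint and re-checks the real condition after every wakeup, so a spurious post is harmless.

```c
#include <semaphore.h>
#include <pthread.h>
#include <stdbool.h>
#include <unistd.h>

static sem_t wakeup_sem;
static volatile bool lock_granted = false;   /* the real state change */
static int wakeups_seen = 0;

/* Waiter: re-check the condition after every wakeup; spurious posts
 * simply cause another trip around the loop. */
static void wait_for_lock(void)
{
    while (!lock_granted)
    {
        sem_wait(&wakeup_sem);
        wakeups_seen++;
    }
}

/* Simulates CheckDeadLock firing without finding a deadlock (a spurious
 * post), followed by the lock actually being granted. */
static void *granter(void *arg)
{
    (void) arg;
    sem_post(&wakeup_sem);       /* spurious: condition still false */
    usleep(10000);
    lock_granted = true;         /* the real state change... */
    sem_post(&wakeup_sem);       /* ...followed by its wakeup */
    return NULL;
}
```

Either ordering of the posts relative to the waiter's checks terminates: the waiter only exits the loop once lock_granted is actually true.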

Incidentally in looking at this I found that the early deadlock detection
never seems to fire. Reading the source it seems it ought to be firing
whenever we have a simple two-process deadlock. But instead I only get the
timeout-based detection.



checkpoint-log-messages-fix.patch.gz
Description: Binary data

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [PATCHES] boolean = text explicit casts

2007-06-19 Thread Jim Nasby

On May 30, 2007, at 3:40 PM, Neil Conway wrote:

On Wed, 2007-30-05 at 21:23 +0200, Peter Eisentraut wrote:

I'm not sure what your rationale was for creating lower-case words
instead of upper case, except that it looks nicer.  Is there a technical
reason?


There's no real technical reason: the standard says upper-case, but PG's
general philosophy of case folding would suggest folding to lower-case.

If we were compliant with the spec's case folding requirements then
emitting uppercase would be the clear choice, but since we aren't, I
don't have strong feelings either way.


Sorry for the late reply...

I'm worried that this would make us incompatible with cross-database
code. Granted, such code should probably be using a boolean representation,
but I'm not sure if that's universally true. And if we find out later that
lower case is a problem, it won't be possible to change it without messing
with the rest of our users. I think it'd be best to go with the spec.

--
Jim Nasby  [EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)





Re: [PATCHES] Load Distributed Checkpoints, revised patch

2007-06-19 Thread Jim Nasby

On Jun 17, 2007, at 4:39 AM, Simon Riggs wrote:

pg_start_backup() should be a normal checkpoint I think. No need for
backup to be an intrusive process.


Good point. A spread-out checkpoint can take a long time to finish,
though. Is there a risk of running into a timeout or something if a call
to pg_start_backup takes, say, 10 minutes to finish?


That would be annoying, but the alternative is for backups to seriously
affect performance, which would defeat the object of the hot backup.
It's not like it's immediate right now, so we'd probably be moving from
2-3 mins to 10 mins in your example. Most people are expecting their
backups to take a long time anyway, so that's OK.


We should document it, though; otherwise I can see a bunch of confused
users wondering why pg_start_backup takes so long. Remember that with
longer checkpoints, the odds of them calling pg_start_backup during one
and having to wait are much greater.

--
Jim Nasby  [EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)





Re: [PATCHES] Load Distributed Checkpoints, revised patch

2007-06-19 Thread Heikki Linnakangas

Jim Nasby wrote:

On Jun 17, 2007, at 4:39 AM, Simon Riggs wrote:

pg_start_backup() should be a normal checkpoint I think. No need for
backup to be an intrusive process.


Good point. A spread-out checkpoint can take a long time to finish,
though. Is there a risk of running into a timeout or something if a call
to pg_start_backup takes, say, 10 minutes to finish?


That would be annoying, but the alternative is for backups to seriously
affect performance, which would defeat the object of the hot backup.
It's not like it's immediate right now, so we'd probably be moving from
2-3 mins to 10 mins in your example. Most people are expecting their
backups to take a long time anyway, so that's OK.


We should document it, though; otherwise I can see a bunch of confused 
users wondering why pg_start_backup takes so long. Remember that with 
longer checkpoints, the odds of them calling pg_start_backup during one 
and having to wait are much greater.


If pg_start_backup initiates a non-immediate, smoothed checkpoint, what
should happen to a checkpoint that's already in progress when
pg_start_backup is called? Should that one be hurried, so we can start the
backup sooner? Probably not, which means we'll need yet another mode for
RequestCheckpoint: request a non-immediate checkpoint, but if there's a
checkpoint already running, don't rush it.
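One way to express such request modes is as bit flags that callers combine. The following is a sketch only; the flag names and the decision function are assumptions for illustration, not the names used in the patch or in RequestCheckpoint itself.

```c
#include <stdbool.h>

#define CHKPT_IMMEDIATE   0x01   /* hurry: skip checkpoint smoothing */
#define CHKPT_WAIT        0x02   /* caller blocks until checkpoint is done */
#define CHKPT_NO_HURRY    0x04   /* don't speed up an already-running one */

/* Decide whether an in-progress checkpoint should switch to immediate
 * mode when a new request arrives.  A pg_start_backup-style request
 * would pass CHKPT_WAIT | CHKPT_NO_HURRY. */
static bool should_hurry_running_checkpoint(int new_request_flags)
{
    if (new_request_flags & CHKPT_NO_HURRY)
        return false;
    return (new_request_flags & CHKPT_IMMEDIATE) != 0;
}
```

With this shape, a shutdown-time request (CHKPT_IMMEDIATE) still hurries a running checkpoint, while a backup request leaves it to finish at its smoothed pace.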


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [PATCHES] [HACKERS] 'Waiting on lock'

2007-06-19 Thread Gregory Stark
Gregory Stark [EMAIL PROTECTED] writes:

 Incidentally in looking at this I found that the early deadlock detection
 never seems to fire. Reading the source it seems it ought to be firing
 whenever we have a simple two-process deadlock. But instead I only get the
 timeout-based detection.

Ok, I understand now that early deadlock detection only kicks in when doing
something like LOCK TABLE, and even then only if you're deadlocking because
you're upgrading a lock. So this works as intended, though it's much less
useful than I thought.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com




Re: [PATCHES] Preliminary GSSAPI Patches

2007-06-19 Thread Magnus Hagander
On Sun, May 20, 2007 at 01:28:40AM -0700, Henry B. Hotz wrote:
 I finally got to testing that updated patch.  It's fine per-se, but  
 was missing the updated README.GSSAPI file.  Herewith fixed.
 

I've been reviewing and updating this patch for a while now. I've changed
quite a bit around, and I have it working fine, but I have one question.

Is there a way to provoke GSSAPI into sending multiple packets in the
authentication? It doesn't seem to do that for me, and ISTM that the code
as it stands is broken in that case - but I'd like to verify it.

Basically, pg_GSS_recvauth() is supposed to loop and read all continuing
exchange packets, right? But the reading of packets from the network sits
*outside* the loop. So it basically just loops over and over on the same
data, which ISTM is wrong. It does send a proper ask-for-continue message
to the frontend inside the loop, but I can't figure out how it's supposed
to read the response.

It looks like the fix should be as simple as moving the packet reading into
the loop, but again I'd like a way to test it :)
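The control-flow fix being described can be sketched like this. None of it is the actual pg_GSS_recvauth() code: the token type, the mock transport, and gss_accept_step (standing in for gss_accept_sec_context) are all illustrative assumptions. The point is only the loop shape: a fresh packet is read from the client on every continue-needed iteration, instead of once before the loop.

```c
#include <stdbool.h>

typedef struct { int token; } GssToken;

/* Mock transport: pretends the client sends three tokens, the last of
 * which completes the handshake. */
static int mock_packets_left = 3;

static bool read_token_from_client(GssToken *tok)
{
    if (mock_packets_left <= 0)
        return false;
    tok->token = mock_packets_left--;
    return true;
}

/* Mock GSS step: asks for continuation until the final token arrives. */
static int gss_accept_step(const GssToken *in, bool *continue_needed)
{
    *continue_needed = (in->token > 1);
    return in->token;
}

/* The fixed shape: packet reading sits INSIDE the loop, so each
 * iteration processes new data rather than re-processing the same
 * buffer. */
static int negotiate(void)
{
    GssToken tok;
    bool more = true;
    int status = -1;

    while (more)
    {
        if (!read_token_from_client(&tok))
            return -1;                      /* client hung up mid-exchange */
        status = gss_accept_step(&tok, &more);
    }
    return status;
}
```

With the read outside the loop, the mock above would spin forever on the first token; with it inside, the exchange consumes all three packets and terminates.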

//Magnus




Re: [PATCHES] WIP: rewrite numeric division

2007-06-19 Thread Tom Lane
Gregory Stark [EMAIL PROTECTED] writes:
 Tom Lane [EMAIL PROTECTED] writes:
 Yeah, my proposed patch is schoolbook division.  I don't think I'd trust
 Newton's method to give anything but a close approximation, which is
 what we have in HEAD already.

 Well, unless we start storing rational numbers it'll always be limited to a
 finite number of decimals. The key is whether we can guarantee a lower bound
 on the number of accurate decimal places. As long as we can, we can keep
 going until the decimal places we're going to return are all accurate.

This is nonsense.  Consider an approximate division that gives

...123.99...

You can keep generating 9's forever, but you'll never know whether the
accurate result of trunc() should be 123 or 124.  Unless you use a
method that gives you trustworthy digits to start with.

 It looks like multiplication can also generate incorrect results.

It can, but only if told to produce fewer digits than would be in the
exact result, so I'm not worried about that.

regards, tom lane



Re: [PATCHES] [HACKERS] 'Waiting on lock'

2007-06-19 Thread Tom Lane
Gregory Stark [EMAIL PROTECTED] writes:
 Attached is a patch which moves the messages to ProcSleep(). To do this I had
 to make the semaphore be signalled unconditionally whenever CheckDeadLock()
 is called, regardless of whether it finds a hard deadlock. I'm not 100% sure
 that's correct, but AFAIK we only use semaphores to signal state changes, and
 we deal with spurious semaphore firings everywhere.

It's a bit itchy-seeming, but I think what you are actually doing is
going over to a regime where every wait on the semaphore checks and
re-waits if it didn't get the condition it wants.  Before, I think we
did that for every use of the semaphore except this one (which is an
outgrowth of the fact that originally this was indeed the only use of
the sema, and the others got shoehorned in over time).

I'll check it over.

regards, tom lane



Re: [PATCHES] WIP: rewrite numeric division

2007-06-19 Thread Michael Paesold

Tom Lane wrote:

I wrote:

...

Now it's unlikely that real-world applications are going to be as
dependent on the speed of div_var for long inputs as numeric_big is.
And getting the right answer has to take priority over speed anyway.
Still this is a bit annoying.  Anyone see a way to speed it up, or
have another approach?

regards, tom lane


+1 for the change from me.

We use PostgreSQL for financial accounting stuff, including plpgsql
triggers and functions etc. And we use numeric for all that. I always
thought that numeric division was exact! :-) I never saw a problem,
perhaps because we round to very few digits, but well.


Please apply this patch. I can accept the performance drop, as long as 
there won't be bad surprises with correctness.


Best Regards
Michael Paesold




[PATCHES] more autovacuum fixes

2007-06-19 Thread Alvaro Herrera
I attach a patch to fix some issues in the autovac process handling
code.  Mainly, instead of the launcher periodically rechecking whether a
worker was able to start, the postmaster now signals the launcher when a
worker fails to start.

In order to do this I had to make the postmaster set a flag in shared
memory, much like the pmsignal.c code does.  According to previous
discussions this is an acceptable thing for the postmaster to do.
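The flag-plus-signal pattern described above can be sketched as follows. This is an illustration, not the patch's code (the names are invented, and here the "postmaster" and "launcher" sides run in one process, with raise() standing in for a cross-process kill()): the handler only records that something happened, and all real work happens in the main loop.

```c
#include <signal.h>
#include <stdbool.h>

static volatile sig_atomic_t got_signal = 0;
static volatile sig_atomic_t worker_start_failed = 0;  /* the shared flag */

/* Launcher's handler: async-signal-safe, just records the event. */
static void launcher_sigusr1(int sig)
{
    (void) sig;
    got_signal = 1;
}

/* Postmaster side: report a fork() failure.  Set the flag first, then
 * signal, so the launcher never sees the signal without the flag. */
static void report_fork_failure(void)
{
    worker_start_failed = 1;
    raise(SIGUSR1);
}

/* Launcher side: called from its main loop (never from the handler);
 * returns true when a worker start should be retried. */
static bool check_worker_failure(void)
{
    if (got_signal && worker_start_failed)
    {
        got_signal = 0;
        worker_start_failed = 0;
        return true;
    }
    return false;
}
```

Consuming the flag in the main loop rather than in the handler avoids doing anything unsafe from signal context, which echoes the CheckDeadLock/elog discussion elsewhere in this digest.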

I must admit I did not try to fill shared memory with garbage and see if
it caused a postmaster crash.  What I did test was creating a database,
making sure that the launcher was picking it up, and then deleting the
database's directory.  This correctly causes a worker to log errors when
trying to connect, and it is handled well -- i.e. other databases are
not starved.  (One remaining problem I see is that a database in this
state, or a similar one, that fails to be vacuumed *will* cause starvation
if it is in danger of Xid wraparound.)

Another thing I noticed is that I can change the autovacuum parameter
in postgresql.conf and send SIGHUP to the postmaster, which correctly
starts the launcher if it was previously disabled.  To make the shutdown
case act accordingly, I made the launcher check the start-daemon flag
after rereading the config file, and quit if it's off.  I tried multiple
combinations of having it set and unset, changing
stats_collect_tuple_level, and it works fine.

One problem with the patch is this (new code):

bn = (Backend *) malloc(sizeof(Backend));
!   if (bn)
{
!   bn->pid = StartAutoVacWorker();
!   bn->is_autovacuum = true;
!   /* we don't need a cancel key */

!   if (bn->pid > 0)
!   {
!   /* FIXME -- unchecked memory allocation here */
!   DLAddHead(BackendList, DLNewElem(bn));


If the palloc() inside DLNewElem fails, we will fail to report a fork
failure to the launcher.  I am not sure how serious this is.  One idea
that came to mind was using a PG_TRY block, sending the signal in the
CATCH block, and then rethrowing the exception.  Is this acceptable?
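The catch-notify-rethrow idea can be shown in miniature with setjmp/longjmp, which is the mechanism underlying PostgreSQL's PG_TRY/PG_CATCH macros. This standalone sketch is not the backend's actual macros or the patch's code; every name in it is invented, and "rethrowing" is modeled by reporting rather than propagating the jump further.

```c
#include <setjmp.h>
#include <stdbool.h>

static jmp_buf catch_buf;
static bool signal_sent = false;       /* did we notify the launcher? */
static bool alloc_should_fail = false; /* test knob: simulate palloc failure */

/* Stand-in for DLNewElem's allocation: throws on failure. */
static void *try_alloc_elem(void)
{
    if (alloc_should_fail)
        longjmp(catch_buf, 1);         /* "palloc failed": throw */
    static int elem;
    return &elem;
}

/* Returns true on success.  On allocation failure, the catch path
 * notifies the launcher and then "rethrows" (recorded via *rethrown). */
static bool add_backend_entry(bool *rethrown)
{
    *rethrown = false;
    if (setjmp(catch_buf) == 0)
    {
        (void) try_alloc_elem();       /* PG_TRY body */
        return true;
    }
    /* PG_CATCH: make sure the fork-failure signal still goes out,
     * then re-throw so normal error handling proceeds. */
    signal_sent = true;
    *rethrown = true;
    return false;
}
```

The essential property is that the notification happens on both the success and failure paths out of the allocation, which is exactly the gap the FIXME comment is pointing at.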


All in all, this patch increases the reliability of autovacuum, so I
intend to apply it shortly unless there are objections.  Further
improvements might come later as I become aware of possible fixes.

-- 
Alvaro Herrera http://www.amazon.com/gp/registry/DXLWNGRJD34J
Hay quien adquiere la mala costumbre de ser infeliz (M. A. Evans)
Index: src/backend/postmaster/autovacuum.c
===
RCS file: /home/alvherre/Code/cvs/pgsql/src/backend/postmaster/autovacuum.c,v
retrieving revision 1.50
diff -c -p -r1.50 autovacuum.c
*** src/backend/postmaster/autovacuum.c	13 Jun 2007 21:24:55 -	1.50
--- src/backend/postmaster/autovacuum.c	19 Jun 2007 17:30:00 -
***
*** 4,16 
   *
   * PostgreSQL Integrated Autovacuum Daemon
   *
   *
   * Portions Copyright (c) 1996-2007, PostgreSQL Global Development Group
   * Portions Copyright (c) 1994, Regents of the University of California
   *
   *
   * IDENTIFICATION
!  *	  $PostgreSQL: pgsql/src/backend/postmaster/autovacuum.c,v 1.50 2007-06-13 21:24:55 alvherre Exp $
   *
   *-
   */
--- 4,52 
   *
   * PostgreSQL Integrated Autovacuum Daemon
   *
+  * The autovacuum system is structured in two different kinds of processes: the
+  * autovacuum launcher and the autovacuum worker.  The launcher is an
+  * always-running process, started by the postmaster when the autovacuum GUC
+  * parameter is set.  The launcher schedules autovacuum workers to be started
+  * when appropriate.  The workers are the processes which execute the actual
+  * vacuuming; they connect to a database as determined in the launcher, and
+  * once connected they examine the catalogs to select the tables to vacuum.
+  *
+  * The autovacuum launcher cannot start the worker processes by itself,
+  * because doing so would cause robustness issues (namely, failure to shut
+  * them down on exceptional conditions; and apart from that, since the launcher
+  * is connected to shared memory and is thus subject to corruption there, it is
+  * not as robust as the postmaster which is not).  So it leaves that task to
+  * the postmaster.
+  *
+  * There is an autovacuum shared memory area, where the launcher stores
+  * information about the database it wants vacuumed.  When it wants a new
+  * worker to start, it sets a flag in shared memory and sends a special signal
+  * to the postmaster.  The postmaster then knows nothing more than that it
+  * must start a worker; so it forks a new child, which turns into a worker
+  * and examines shared memory to learn what database it must connect to.
+  *
+  * If the fork() call fails, the postmaster sets a flag in the shared memory
+  * area, and it sends a signal to the launcher.  The launcher, upon noticing
+  * the flag, can try starting the worker again.  Note that the failure 

Re: [PATCHES] [HACKERS] 'Waiting on lock'

2007-06-19 Thread Tom Lane
Gregory Stark [EMAIL PROTECTED] writes:
 Attached is a patch which moves the messages to ProcSleep().

BTW, with the messages moved out of the ISR it would be safe to make it
identify the specific lock being waited on (in the same terms used in
the existing deadlock messages).  Is there a reason not to do that?
I suppose it would eat a few more cycles ...

regards, tom lane



Re: [PATCHES] [HACKERS] 'Waiting on lock'

2007-06-19 Thread Tom Lane
Gregory Stark [EMAIL PROTECTED] writes:
 * Tom Lane ([EMAIL PROTECTED]) wrote:
 In fact, I am scandalized to see that someone has inserted a boatload
 of elog calls into CheckDeadLock since 8.2 --- that seems entirely
 unsafe.  [ checks revision history... ]

 Attached is a patch which moves the messages to ProcSleep().

Applied with some further revisions to improve the usefulness of the log
messages.  There's now one issued when the deadlock timeout elapses, and
another when the lock is finally obtained:

LOG:  process 14945 still waiting for AccessExclusiveLock on relation 86076 of database 86042 after 1003.354 ms
...
LOG:  process 14945 acquired AccessExclusiveLock on relation 86076 of database 86042 after 5918.002 ms

although I just realized that the wording of the second one is
misleading; it actually comes out when the lock wait ends, whether we
acquired the lock or not.  (The other possibility is that our statement
was aborted, eg by SIGINT or statement_timeout.)

Is it worth having two messages for the two cases?  I'm tempted to just
not log anything if the statement is aborted --- the cause of the abort
should be reflected in some later error message, and reporting how long
we waited before giving up seems not within the charter of
log_lock_waits.

regards, tom lane



Re: [PATCHES] [HACKERS] 'Waiting on lock'

2007-06-19 Thread Simon Riggs
On Tue, 2007-06-19 at 16:24 -0400, Tom Lane wrote:
 Gregory Stark [EMAIL PROTECTED] writes:
  * Tom Lane ([EMAIL PROTECTED]) wrote:
  In fact, I am scandalized to see that someone has inserted a boatload
  of elog calls into CheckDeadLock since 8.2 --- that seems entirely
  unsafe.  [ checks revision history... ]
 
  Attached is a patch which moves the messages to ProcSleep().

Thanks Greg for taking this on; it would still be in my queue now if you
hadn't, so much appreciated.

 Applied with some further revisions to improve the usefulness of the log
 messages.  There's now one issued when the deadlock timeout elapses, and
 another when the lock is finally obtained:
 
 LOG:  process 14945 still waiting for AccessExclusiveLock on relation 86076 of database 86042 after 1003.354 ms
 ...
 LOG:  process 14945 acquired AccessExclusiveLock on relation 86076 of database 86042 after 5918.002 ms
 
 although I just realized that the wording of the second one is
 misleading; it actually comes out when the lock wait ends, whether we
 acquired the lock or not.  (The other possibility is that our statement
 was aborted, eg by SIGINT or statement_timeout.)
 
 Is it worth having two messages for the two cases?  I'm tempted to just
 not log anything if the statement is aborted --- the cause of the abort
 should be reflected in some later error message, and reporting how long
 we waited before giving up seems not within the charter of
 log_lock_waits.

Sounds good; thanks Tom.


related TODO items:
- add a WAIT n clause in same SQL locations as NOWAIT
- add a lock_wait_timeout (USERSET), default = 0 (unlimited waiting)

to provide better control over lock waits.

-- 
  Simon Riggs 
  EnterpriseDB   http://www.enterprisedb.com





Re: [PATCHES] Preliminary GSSAPI Patches

2007-06-19 Thread Henry B. Hotz

Such timing!

I just spent most of yesterday stepping through the gssapi sample apps in
Java 1.4 with someone here at work. I was thinking I needed to get back to
the JDBC client and do what I promised. I also finished filtering the PG
lists for stuff just before seeing this email.


On Jun 19, 2007, at 6:04 AM, Magnus Hagander wrote:


On Sun, May 20, 2007 at 01:28:40AM -0700, Henry B. Hotz wrote:

I finally got to testing that updated patch.  It's fine per-se, but
was missing the updated README.GSSAPI file.  Herewith fixed.



I've been reviewing and updating this patch for a while now. I've changed
quite a bit around, and I have it working fine, but I have one question.


I'd be curious to see what you've done, but if you're actively changing
things I'll let them settle.



Is there a way to provoke GSSAPI into sending multiple packets in the
authentication? It doesn't seem to do that for me, and ISTM that the code
as it stands is broken in that case - but I'd like to verify it.


I remember wondering about that myself. For SASL, if you turn on all the
options you get an extra round trip. Not for GSSAPI/Krb5, which is pretty
efficient in that respect. The loop logic for SASL is just different
enough that I can imagine messing up, but I would have thought it would
have made me get the logic right.


The only thing I can think of is to use a different GSSAPI mechanism.
That opens an interesting can of worms that has nothing to do with
PostgreSQL. First of all, that means you need to use a GSSAPI
implementation that supports multiple mechanisms (which precludes Java
for now). That in turn means either Sun, or MIT 1.6+, or Heimdal 0.8+.
Second, you need another mechanism to install. That means the commercial
Entrust SPKM mechanism on Sun, or the UMICH SPKM mechanism, or some
SPNEGO mechanism.


Love says there are problems with the SPKM RFC, and the UMICH
implementation won't interoperate with other implementations as a result
(but it works with itself). I also know he's been experimenting with
other mechanisms. Looking at the latest Heimdal snapshot I have, it seems
to have both SPNEGO and NTLM mechanism code in it.


A configuration that used SPNEGO to negotiate Kerberos 5 ought to take
two round trips, at least. Feel like trying it?


I never tested my patches against a Heimdal GSSAPI library. My bad. If
you try the patches against Heimdal GSSAPI/SPNEGO/Krb5, that ought to be
a really good test case. It would be a good server case to try with a
native Windows client as well.


Basically, pg_GSS_recvauth() is supposed to loop and read all continuing
exchange packets, right? But the reading of packets from the network sits
*outside* the loop. So it basically just loops over and over on the same
data, which ISTM is wrong. It does send a proper ask-for-continue message
to the frontend inside the loop, but I can't figure out how it's supposed
to read the response.


I remember having to stand on my head to get it (apparently) right on
the client side. I also remember the server side was a lot simpler, but
you're saying I may have oversimplified? I can't answer the question
about the server side without reviewing the code.


It looks like the fix should be as simple as moving the packet reading
into the loop, but again I'd like a way to test it :)


Well, probably, but that doesn't sound that simple to me (assuming I
really didn't do that to begin with).



//Magnus



The opinions expressed in this message are mine,
not those of Caltech, JPL, NASA, or the US Government.
[EMAIL PROTECTED], or [EMAIL PROTECTED]





[PATCHES] Cancel autovacuum conflicting with DROP TABLE

2007-06-19 Thread ITAGAKI Takahiro
Here is a patch that cancels autovacuum workers conflicting with
DROP TABLE, TRUNCATE and CLUSTER. It was discussed here:
http://archives.postgresql.org/pgsql-hackers/2007-06/msg00556.php

Before backends drop a table, they search for autovacuum workers that
are running on the table. If one is found, they send SIGINT to the
autovacuum worker to avoid waiting for the now-pointless vacuum.

When an autovacuum worker receives SIGINT, it skips only the table
currently being vacuumed and continues with its remaining jobs. SIGINT
and SIGTERM now have different meanings, and their log messages have
changed.

  SIGINT -- Cancel the current job.
  DEBUG: autovacuum on relation schema.table is canceled
  SIGTERM -- Cancel all jobs.
  FATAL: terminating autovacuum due to administrator command

We can see the second, SIGTERM log on shutdown and DROP DATABASE; in
those cases we send SIGTERM to autovacuum workers instead of SIGINT.
I'd actually like to bring the error level down to WARNING or less, but
I have no idea how to do that, because canceling queries and error
levels are tightly coupled in the present implementation. Instead, I
added the special FATAL message for autovacuum workers so that we can
tell that those lines are ignorable.
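The two-signal policy can be sketched as follows. This is an illustration, not the patch's code: the flags, handlers, and job loop are invented names, and "vacuuming a table" is reduced to a counter. The shape is the usual one where handlers only set flags and the main loop acts on them.

```c
#include <signal.h>

static volatile sig_atomic_t cancel_current_job = 0;  /* SIGINT */
static volatile sig_atomic_t shutdown_requested = 0;  /* SIGTERM */

static void on_sigint(int sig)  { (void) sig; cancel_current_job = 1; }
static void on_sigterm(int sig) { (void) sig; shutdown_requested = 1; }

/* "Vacuum" ntables tables; returns how many were actually completed.
 * SIGINT skips only the current table; SIGTERM abandons all jobs. */
static int run_jobs(int ntables)
{
    int done = 0;
    for (int i = 0; i < ntables; i++)
    {
        if (shutdown_requested)
            break;                    /* SIGTERM: cancel all jobs */
        if (cancel_current_job)
        {
            cancel_current_job = 0;   /* SIGINT: skip this table only */
            continue;
        }
        done++;                       /* vacuum table i */
    }
    return done;
}
```

Keeping the SIGINT flag per-job (cleared after one skip) is what lets the worker continue with the rest of its list instead of dying.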

Comments welcome.

---
ITAGAKI Takahiro
NTT Open Source Software Center


cancel_autovac_on_drop.patch
Description: Binary data
