Re: [PATCHES] fork/exec patch: pre-CreateProcess finalization

2003-12-26 Thread Tom Lane
Claudio Natoli [EMAIL PROTECTED] writes:
 This has required some reworking of the existing code base, particularly to
 BackendFork (which now actually does the fork()). As such, I've been
 anticipating that this will be the most controversial of the fork/exec
 patches, so critique away :-)

You haven't explained why that's necessary.  Given the fact that this
patch seems to hugely complicate the postmaster logic --- not so much
either path individually, but the messy #ifdef interleaving of two
radically different programs --- I am inclined to reject it on style
grounds alone.

We should either find a way to make the fork/exec path more like the
existing code, or bite the bullet and change them both.  Doing it the
way you have here will create an unmaintainable mess.  I'm not prepared
to work on a postmaster in which a step as basic as fork() happens in
two different places depending on an #ifdef.

If you want to change them both, let's first see the reason why it's
necessary.

regards, tom lane

---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [PATCHES] fork/exec patch: pre-CreateProcess finalization

2003-12-26 Thread Tom Lane
I said:
 We should either find a way to make the fork/exec path more like the
 existing code, or bite the bullet and change them both.

It occurs to me that you probably need some concrete suggestions about
what to do.  Here's one.

There are really three separate sections in this code: the postmaster
(which no longer does much except fork off a subprocess on receipt of a
new connection), the sub-postmaster which does client authentication,
and the backend.

Historical note: formerly the sub-postmaster actions were done in the
parent postmaster process, with a sort of poor man's threading logic
to allow the postmaster to interleave multiple client authentication
operations.  We had to change that because various libraries like SSL,
PAM, and Kerberos weren't amenable to pseudo-threading.

It is worth maintaining a clean distinction between sub-postmaster and
backend because when we launch a standalone backend, we don't want the
sub-postmaster part of the code.  The command line syntax accepted by
PostgresMain() is largely designed for the convenience of the standalone
backend case, with a couple of simple hacks for the under-postmaster case.

Now it seems to me that a lot of the mess in your latest patch comes
from trying to set up the backend command string before you've forked,
and therefore before you've done any of the sub-postmaster actions.
That's not gonna work.

What I'd suggest is adding a new entry point from main.c, say
SubPostmasterMain(), with its own command line syntax.  Perhaps
postmaster -fork ... additional args ...
(where -fork is what cues main.c where to dispatch to, and the
additional args are just those that need to be passed from the
postmaster to the sub-postmaster, without any of the additional
information that will be learned by the sub-postmaster during
client authentication.)

In the Windows case the postmaster would fork/exec to this routine,
which would re-load as much of the postmaster context as was needed
and then call BackendFork (which we really oughta rename to something
else).  BackendFork would execute the client auth process and then
simply call PostgresMain with a correctly constructed argument list.

In the Unix case we don't bother compiling SubPostmasterMain, we just
fork and go straight to BackendFork as the code does now.

In essence, this design makes SubPostmasterMain responsible for
emulating Unix-style fork(), ie, fork without loss of static variable
contents.  It would need to reload as many variables as we needed.

I think this is a cleaner approach since it creates a layer of code
which is simply there in one case and not there in the other, rather
than making arbitrary changes in the layers above and below depending
on #ifdefs.

regards, tom lane

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [PATCHES] update i386 spinlock for hyperthreading

2003-12-26 Thread Tom Lane
Manfred Spraul [EMAIL PROTECTED] writes:
 Intel recommends to add a special pause instruction into spinlock busy 
 loops. It's necessary for hyperthreading - without it, the cpu can't 
 figure out that a logical thread does no useful work and incorrectly 
 awards lots of execution resources to that thread. Additionally, it's 
 supposed to reduce the time the cpu needs to recover from the 
 (mispredicted) branch after the spinlock was obtained.

Don't you have to put it in a specific place in the loop to make that
work?  If not, why not?  I doubt that rep;nop is magic enough to
recognize the loop that will be generated from s_lock()'s code.

My guess is that it'd be more useful to insert the rep;nop into the
failure branch of the TAS macro and forget about the separate CPU_DELAY
construct.  This would allow you to control where exactly rep;nop
appears relative to the xchgb.

 Additionally I've increased the number of loops from 100 to 1000 

I think this change is almost certainly counterproductive; for any
platform other than the Xeon, remove almost.

 + #ifndef HAS_CPU_DELAY
 + #define CPU_DELAY() cpu_delay()
 + 
 + static __inline__ void
 + cpu_delay(void)
 + {
 + }
 + #endif

This breaks every non-gcc compiler in the world (or at least all those
that don't recognize __inline__).  If you really want to keep CPU_DELAY,
consider

#ifndef CPU_DELAY
#define CPU_DELAY()
#endif

but as stated above, I'm dubious that the bottom of the s_lock loop
is the place to be adding anything anyway.

regards, tom lane

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html


Re: [PATCHES] update i386 spinlock for hyperthreading

2003-12-26 Thread Manfred Spraul
Tom Lane wrote:

Manfred Spraul [EMAIL PROTECTED] writes:
 

Intel recommends to add a special pause instruction into spinlock busy 
loops. It's necessary for hyperthreading - without it, the cpu can't 
figure out that a logical thread does no useful work and incorrectly 
awards lots of execution resources to that thread. Additionally, it's 
supposed to reduce the time the cpu needs to recover from the 
(mispredicted) branch after the spinlock was obtained.
   

Don't you have to put it in a specific place in the loop to make that
work?  If not, why not?  I doubt that rep;nop is magic enough to
recognize the loop that will be generated from s_lock()'s code.
 

Rep;nop is just a short delay - that's all. It means that the cpu 
pipelines have a chance to drain, and that the other thread gets enough 
cpu resources. Below is the full instruction documentation, from the 
latest ia32 doc set from Intel:

Improves the performance of spin-wait loops. When executing a  spin-wait 
loop,  a Pentium 4 or Intel Xeon processor suffers a severe performance 
penalty when exiting the loop because it detects a possible memory order 
violation. The PAUSE instruction provides a hint to the processor that 
the code sequence is a spin-wait loop. The processor uses this hint to 
avoid the memory order violation in most situations, which greatly 
improves processor performance. For this reason, it is recommended that 
a PAUSE instruction be placed in all spin-wait loops. An additional 
function of the PAUSE instruction is to reduce the power consumed by a 
Pentium 4 processor while executing a spin loop. The Pentium 4 processor 
can execute a spin-wait loop extremely quickly, causing the processor to 
consume a lot of power while it waits for the resource it is spinning on 
to become available. Inserting a pause instruction in a spin-wait loop 
greatly reduces the processor s power consumption. This instruction was 
introduced in the Pentium 4 processors, but is backward compatible with 
all IA-32 processors. In earlier IA-32 processors, the PAUSE instruction 
operates like a NOP instruction. The Pentium 4 and Intel Xeon processors 
implement the PAUSE instruction as a pre-defined delay. The delay is 
finite and can be zero for some processors. This instruction does not 
change the architectural state of the processor (that is, it performs 
essentially a delaying noop operation).


I think a separate function is better than adding it into TAS: if it's 
part of tas, then it would automatically be included by every 
SpinLockAcquire call - unnecessary .text bloat. Additionally, there 
might be other busy loops, in addition to TAS, that could use a delay 
function.

I'll post a new patch that doesn't rely on __inline__ in the i386 
independant part.

--
   Manfred
---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [PATCHES] update i386 spinlock for hyperthreading

2003-12-26 Thread Tom Lane
Manfred Spraul [EMAIL PROTECTED] writes:
 Tom Lane wrote:
 Don't you have to put it in a specific place in the loop to make that
 work?  If not, why not?
 
 Rep;nop is just a short delay - that's all.

That view seems to me to be directly contradicted by this statement:

 The PAUSE instruction provides a hint to the processor that 
 the code sequence is a spin-wait loop. The processor uses this hint to 
 avoid the memory order violation in most situations, which greatly 
 improves processor performance.

It's not apparent to me how a short delay translates into avoiding a
memory order violation (possibly some docs on what that means exactly
might help...).  I suspect strongly that there needs to be some near
proximity between the PAUSE instruction and the lock-test instruction
for this to work as advertised.  It would help if Intel were less coy
about what the instruction really does.

 This instruction does not change the architectural state of the
 processor (that is, it performs essentially a delaying noop
 operation).

This can be rephrased as we're not telling you what this instruction
really does, because its interesting effects are below the level of the
instruction set architecture.  Great.  How are we supposed to know
how to use it?

 I think a separate function is better than adding it into TAS: if it's 
 part of tas, then it would automatically be included by every 
 SpinLockAcquire call - unnecessary .text bloat.

Why do you think it's unnecessary?  One thing that I find particularly
vague in the quoted documentation is the statement that the PAUSE
instruction is needed to avoid a delay when *exiting* the spin-wait
loop.  Doesn't this mean that a PAUSE is needed in the success path
when the first TAS succeeds (i.e, the normal no-contention path)?
If not, why not?  If so, does it go before or after the lock
instruction?

Also, if the principal effect is a short delay, do we really need it
at all considering that our inner loop in s_lock is rather more than
an xchgb followed by a conditional branch?  There will be time for
the write queue to drain while we're incrementing and testing our
spin counter (which I trust is in a register...).

The reason I'm so full of questions is that I spent some time several
days ago looking at exactly this issue, and came away with only the
conclusion that I had to find some more-detailed documentation before
I could figure out what we should do about the spinlocks for Xeons.
You have not convinced me that you know more about the issue than I do.
A 10% speedup is nice, but how do we know that that's what we should
expect to get?  Maybe there's a lot more to be won by doing it correctly
(for some value of correctly).

regards, tom lane

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PATCHES] fork/exec patch: pre-CreateProcess finalization

2003-12-26 Thread Claudio Natoli

Thanks for your comments Tom.


Tom Lane writes:
 Claudio Natoli [EMAIL PROTECTED] writes:
  This has required some reworking of the existing code base, particularly
to
  BackendFork (which now actually does the fork()). As such, I've been
  anticipating that this will be the most controversial of 
 the fork/exec patches, so critique away :-)
 
 You haven't explained why that's necessary.  Given the fact that this
 patch seems to hugely complicate the postmaster logic --- not so much
 either path individually, but the messy #ifdef interleaving of two
 radically different programs --- I am inclined to reject it on style
 grounds alone.

I have to agree, as I wasn't particularly happy with the BackendFork section
of this patch. You didn't really seem to comment on the pgstat changes,
which were much cleaner, so perhaps (like me) you were happier with these
changes.

So, could you give me an indication as to whether or not making changes to
the BackendFork logic as done for the pgstat logic (specifically, replacing
fork() call with a new, argument marshalling routine to do fork/exec) would
be more acceptable?  The reason I avoided this in the BackendFork case was,
predominantly, code duplication. Creating a new BackendForkExec would, I
think, result in a lot of similarity between the two routines. [yes,
BackendFork is poorly named, although I'd still think of moving the fork()
call into BackendFork anyway, thereby justifying its name. But, much more
importantly, for symmetry with the code format that will inevitably arise in
the fork/exec case; see below]


 We should either find a way to make the fork/exec path more like the
 existing code, or bite the bullet and change them both.  Doing it the
 way you have here will create an unmaintainable mess.  I'm not prepared
 to work on a postmaster in which a step as basic as fork() happens in
 two different places depending on an #ifdef.

[unmaintainable mess: yeah, I had the exact same thought when I'd
finished. No one's ever going to be able to rework this properly.]

I'm afraid that, short of removing all SubPostmaster actions from
BackendFork, the need to do fork in two different places (at least,
physically, if not logically different) will remain. After fork/exec'ing,
the child process can't recover the context needed to create the argument
list which it'll need to build up the PostgresMain arg list.

So, in summary, I see two solutions:

a) do something similar to BackendFork as done for pgstat, specifically:
- move the fork call into BackendFork
- create a fork/exec equivalent to BackendFork
- #ifdef the call to the appropriate function in BackendStartup
  PRO: child is forked in the same logical place
  CON: possible duplication in code between BackendFork + BackendForkExec

b) delay the fork() call to the end of BackendFork in both fork/exec and
normal cases, turning it into solely an argument formatter for PostgresMain
  - move BackendInit, port-cmdline_options parsing + MemoryContext
calls into PostgresMain
  PRO: forking in same (physical) location in code; no duplication
  CON: additional complexity at the start of PostgresMain to avoid these
calls in stand-alone backend case

Which is the lesser of two evils? FWIW, I prefer the latter, as it makes the
normal + fork/exec cases effectively identical, for a little added
complexity to PostgresMain.


There's also the alternative you outlined, but I think it is less workable
than the above options:

 Now it seems to me that a lot of the mess in your latest patch comes
 from trying to set up the backend command string before you've forked,
 and therefore before you've done any of the sub-postmaster actions.
 That's not gonna work.

I agree that this is where the mess comes from, but I think this (setting up
backend command string before fork) is *exactly* what needs to occur in the
fork/exec case (leading into the Win32 CreateProcess calls). If we want this
code to work in the Win32 case, we basically can't do any processing between
the fork + exec calls, and I think recovering all the context to do the
command string marshalling post fork/exec is untenable.


 What I'd suggest is adding a new entry point from main.c, say
 SubPostmasterMain(), with its own command line syntax.  Perhaps
   postmaster -fork ... additional args ...
 (where -fork is what cues main.c where to dispatch to, and the
 additional args are just those that need to be passed from the
 postmaster to the sub-postmaster, without any of the additional
 information that will be learned by the sub-postmaster during
 client authentication.)
 
 In the Windows case the postmaster would fork/exec to this routine,
 which would re-load as much of the postmaster context as was needed
 and then call BackendFork (which we really oughta rename to something
 else).  BackendFork would execute the client auth process and then
 simply call PostgresMain with a correctly constructed argument list.

Here's the critical 

Re: [PATCHES] fork/exec patch: pre-CreateProcess finalization

2003-12-26 Thread Tom Lane
Claudio Natoli [EMAIL PROTECTED] writes:
 Ok. So, in conclusion we are in agreement on at least one thing: the current
 patch sucks.

Good, I'm glad we see eye to eye on that ;-)

 Here's the critical difference in our thinking: 

   ISTM that we'll have to go to a lot of work to get the fork/exec
 case to re-load the context required to format up the argument list to
 PostgresMain. Certainly, it is a lot more work than allowing the postmaster
 itself (not the forked child) to create the argument list, albeit moving
 auth to PostgresMain.

I don't follow your thinking here.  The information that has to be
reloaded *must* be passed across the fork/exec boundary in either case.
For example, there is no alternative to telling the backend process
where PGDATA is: it's got to be passed down some way or another.  The
Unix implementation assumes it can just inherit the value of a static
variable, while the Windows version will have to do something else.
I take no position here on whether we pass it as a command line item,
or through a temporary file, or whatever: it has to be passed down.

The reason your patch is messy is that the PostgresMain command line
string involves both information passed down from the postmaster and
information acquired from the client's connection-request packet.
We could consider redesigning that command line behavior, but we don't
really want to since it would impact the standalone-backend case.  So
I'm proposing substituting two levels of command-line processing within
the exec'd backend process.  The first level is responsible for
transferring information inherited from the postmaster (which we are
going to have to do *anyway*, somewhere, and this provides a nice
localized place to do it) and then the second level is fully compatible
with the standalone-backend case and the Unix-fork case.

 I'm afraid that, short of removing all SubPostmaster actions from
 BackendFork, the need to do fork in two different places (at least,
 physically, if not logically different) will remain.

I am not by any means saying that the fork() call has to stay exactly
where it is in the existing code.  We clearly want to refactor things
so that fork() is close to the invocation of FooMain() (or equivalently
exec() in the Windows case).  I think it's a historical artifact that
these steps got separated in the existing code base.

 After fork/exec'ing,
 the child process can't recover the context needed to create the argument
 list which it'll need to build up the PostgresMain arg list.

If that's literally true, then we are screwed because things cannot
work.  Pardon me for doubting this statement.  All that information
*must* be available in the child, once it has analyzed the information
it must be able to collect from the postmaster and performed the client
authentication handshake.

 So, in summary, I see two solutions:

I still think there's a third answer: create an intermediate layer
that's responsible for recovering the information that has to be
inherited from the parent process.  I don't say this requires zero
rearrangement of existing code, I just say it's a more maintainable
design than either of the alternatives you suggest.

In particular, this alternative is quite unworkable:

 b) delay the fork() call to the end of BackendFork in both fork/exec and
 normal cases, turning it into solely an argument formatter for PostgresMain

We can *not* postpone the fork() until after client authentication.
That loses all of the work that's been done since circa 7.1 to eliminate
cases where the postmaster gets blocked waiting for a single client to
respond.  SSL, PAM, and I think Kerberos all create denial-of-service
risks if we do that.

I believe most of what you are presently struggling with revolves
around the fact that the fork() got pushed up to happen before client
authentication, while the interface to PostgresMain didn't change.
I do not want to alter either of those decisions, however.  What we need
is to provide an alternative exec-able interface that encapsulates the
processing that now occurs between these steps.

regards, tom lane

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [PATCHES] Quoting of psql \d output

2003-12-26 Thread Tom Lane
Peter Eisentraut [EMAIL PROTECTED] writes:
 Christopher Kings-Lynne writes:
 Now you've lost me - how is a user-inputted object name translatable?

 Not all languages use ... as quote symbols, but if you make them part of
 some string that comes from the backend, it becomes prohibitively hard to
 translate it correctly.

Hm.  This gets back to the point we've discussed before: there is some
confusion between SQL's use of quoted identifiers and the customary
English use of quote marks to set off text that should be distinguished
from the surrounding sentence.

Essentially, Bruce's proposed patch moves the use of quotes in \d table
headers to conform to SQL's technical use of quotes, while you are
arguing for sticking with the message style guidelines' human-oriented
use of quotes.

I can see merit in both positions.  But I also see merit in the
compromise position of not using quotes at all.  I don't see a strong
need to demarcate table name from surrounding text in a context as
simple as this --- is
Table foo
really any easier to read than
Table foo
?  And Peter is correct that the former introduces translation issues
when you think about languages that don't customarily use ... as
quotation marks.

regards, tom lane

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match