Re: [PATCHES] fork/exec patch: pre-CreateProcess finalization
Claudio Natoli [EMAIL PROTECTED] writes: This has required some reworking of the existing code base, particularly to BackendFork (which now actually does the fork()). As such, I've been anticipating that this will be the most controversial of the fork/exec patches, so critique away :-) You haven't explained why that's necessary. Given the fact that this patch seems to hugely complicate the postmaster logic --- not so much either path individually, but the messy #ifdef interleaving of two radically different programs --- I am inclined to reject it on style grounds alone. We should either find a way to make the fork/exec path more like the existing code, or bite the bullet and change them both. Doing it the way you have here will create an unmaintainable mess. I'm not prepared to work on a postmaster in which a step as basic as fork() happens in two different places depending on an #ifdef. If you want to change them both, let's first see the reason why it's necessary. regards, tom lane ---(end of broadcast)--- TIP 8: explain analyze is your friend
Re: [PATCHES] fork/exec patch: pre-CreateProcess finalization
I said: We should either find a way to make the fork/exec path more like the existing code, or bite the bullet and change them both. It occurs to me that you probably need some concrete suggestions about what to do. Here's one. There are really three separate sections in this code: the postmaster (which no longer does much except fork off a subprocess on receipt of a new connection), the sub-postmaster which does client authentication, and the backend. Historical note: formerly the sub-postmaster actions were done in the parent postmaster process, with a sort of poor man's threading logic to allow the postmaster to interleave multiple client authentication operations. We had to change that because various libraries like SSL, PAM, and Kerberos weren't amenable to pseudo-threading. It is worth maintaining a clean distinction between sub-postmaster and backend because when we launch a standalone backend, we don't want the sub-postmaster part of the code. The command line syntax accepted by PostgresMain() is largely designed for the convenience of the standalone backend case, with a couple of simple hacks for the under-postmaster case. Now it seems to me that a lot of the mess in your latest patch comes from trying to set up the backend command string before you've forked, and therefore before you've done any of the sub-postmaster actions. That's not gonna work. What I'd suggest is adding a new entry point from main.c, say SubPostmasterMain(), with its own command line syntax. Perhaps postmaster -fork ... additional args ... (where -fork is what cues main.c where to dispatch to, and the additional args are just those that need to be passed from the postmaster to the sub-postmaster, without any of the additional information that will be learned by the sub-postmaster during client authentication.) In the Windows case the postmaster would fork/exec to this routine, which would re-load as much of the postmaster context as was needed and then call BackendFork (which we really oughta rename to something else). BackendFork would execute the client auth process and then simply call PostgresMain with a correctly constructed argument list. In the Unix case we don't bother compiling SubPostmasterMain, we just fork and go straight to BackendFork as the code does now. In essence, this design makes SubPostmasterMain responsible for emulating Unix-style fork(), ie, fork without loss of static variable contents. It would need to reload as many variables as we needed. I think this is a cleaner approach since it creates a layer of code which is simply there in one case and not there in the other, rather than making arbitrary changes in the layers above and below depending on #ifdefs. regards, tom lane ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
Re: [PATCHES] update i386 spinlock for hyperthreading
Manfred Spraul [EMAIL PROTECTED] writes: Intel recommends to add a special pause instruction into spinlock busy loops. It's necessary for hyperthreading - without it, the cpu can't figure out that a logical thread does no useful work and incorrectly awards lots of execution resources to that thread. Additionally, it's supposed to reduce the time the cpu needs to recover from the (mispredicted) branch after the spinlock was obtained. Don't you have to put it in a specific place in the loop to make that work? If not, why not? I doubt that rep;nop is magic enough to recognize the loop that will be generated from s_lock()'s code. My guess is that it'd be more useful to insert the rep;nop into the failure branch of the TAS macro and forget about the separate CPU_DELAY construct. This would allow you to control where exactly rep;nop appears relative to the xchgb. Additionally I've increased the number of loops from 100 to 1000 I think this change is almost certainly counterproductive; for any platform other than the Xeon, remove almost. + #ifndef HAS_CPU_DELAY + #define CPU_DELAY() cpu_delay() + + static __inline__ void + cpu_delay(void) + { + } + #endif This breaks every non-gcc compiler in the world (or at least all those that don't recognize __inline__). If you really want to keep CPU_DELAY, consider #ifndef CPU_DELAY #define CPU_DELAY() #endif but as stated above, I'm dubious that the bottom of the s_lock loop is the place to be adding anything anyway. regards, tom lane ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faqs/FAQ.html
Re: [PATCHES] update i386 spinlock for hyperthreading
Tom Lane wrote: Manfred Spraul [EMAIL PROTECTED] writes: Intel recommends to add a special pause instruction into spinlock busy loops. It's necessary for hyperthreading - without it, the cpu can't figure out that a logical thread does no useful work and incorrectly awards lots of execution resources to that thread. Additionally, it's supposed to reduce the time the cpu needs to recover from the (mispredicted) branch after the spinlock was obtained. Don't you have to put it in a specific place in the loop to make that work? If not, why not? I doubt that rep;nop is magic enough to recognize the loop that will be generated from s_lock()'s code. Rep;nop is just a short delay - that's all. It means that the cpu pipelines have a chance to drain, and that the other thread gets enough cpu resources. Below is the full instruction documentation, from the latest ia32 doc set from Intel: Improves the performance of spin-wait loops. When executing a spin-wait loop, a Pentium 4 or Intel Xeon processor suffers a severe performance penalty when exiting the loop because it detects a possible memory order violation. The PAUSE instruction provides a hint to the processor that the code sequence is a spin-wait loop. The processor uses this hint to avoid the memory order violation in most situations, which greatly improves processor performance. For this reason, it is recommended that a PAUSE instruction be placed in all spin-wait loops. An additional function of the PAUSE instruction is to reduce the power consumed by a Pentium 4 processor while executing a spin loop. The Pentium 4 processor can execute a spin-wait loop extremely quickly, causing the processor to consume a lot of power while it waits for the resource it is spinning on to become available. Inserting a pause instruction in a spin-wait loop greatly reduces the processor s power consumption. This instruction was introduced in the Pentium 4 processors, but is backward compatible with all IA-32 processors. In earlier IA-32 processors, the PAUSE instruction operates like a NOP instruction. The Pentium 4 and Intel Xeon processors implement the PAUSE instruction as a pre-defined delay. The delay is finite and can be zero for some processors. This instruction does not change the architectural state of the processor (that is, it performs essentially a delaying noop operation). I think a separate function is better than adding it into TAS: if it's part of tas, then it would automatically be included by every SpinLockAcquire call - unnecessary .text bloat. Additionally, there might be other busy loops, in addition to TAS, that could use a delay function. I'll post a new patch that doesn't rely on __inline__ in the i386 independant part. -- Manfred ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [PATCHES] update i386 spinlock for hyperthreading
Manfred Spraul [EMAIL PROTECTED] writes: Tom Lane wrote: Don't you have to put it in a specific place in the loop to make that work? If not, why not? Rep;nop is just a short delay - that's all. That view seems to me to be directly contradicted by this statement: The PAUSE instruction provides a hint to the processor that the code sequence is a spin-wait loop. The processor uses this hint to avoid the memory order violation in most situations, which greatly improves processor performance. It's not apparent to me how a short delay translates into avoiding a memory order violation (possibly some docs on what that means exactly might help...). I suspect strongly that there needs to be some near proximity between the PAUSE instruction and the lock-test instruction for this to work as advertised. It would help if Intel were less coy about what the instruction really does. This instruction does not change the architectural state of the processor (that is, it performs essentially a delaying noop operation). This can be rephrased as we're not telling you what this instruction really does, because its interesting effects are below the level of the instruction set architecture. Great. How are we supposed to know how to use it? I think a separate function is better than adding it into TAS: if it's part of tas, then it would automatically be included by every SpinLockAcquire call - unnecessary .text bloat. Why do you think it's unnecessary? One thing that I find particularly vague in the quoted documentation is the statement that the PAUSE instruction is needed to avoid a delay when *exiting* the spin-wait loop. Doesn't this mean that a PAUSE is needed in the success path when the first TAS succeeds (i.e, the normal no-contention path)? If not, why not? If so, does it go before or after the lock instruction? Also, if the principal effect is a short delay, do we really need it at all considering that our inner loop in s_lock is rather more than an xchgb followed by a conditional branch? There will be time for the write queue to drain while we're incrementing and testing our spin counter (which I trust is in a register...). The reason I'm so full of questions is that I spent some time several days ago looking at exactly this issue, and came away with only the conclusion that I had to find some more-detailed documentation before I could figure out what we should do about the spinlocks for Xeons. You have not convinced me that you know more about the issue than I do. A 10% speedup is nice, but how do we know that that's what we should expect to get? Maybe there's a lot more to be won by doing it correctly (for some value of correctly). regards, tom lane ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PATCHES] fork/exec patch: pre-CreateProcess finalization
Thanks for your comments Tom. Tom Lane writes: Claudio Natoli [EMAIL PROTECTED] writes: This has required some reworking of the existing code base, particularly to BackendFork (which now actually does the fork()). As such, I've been anticipating that this will be the most controversial of the fork/exec patches, so critique away :-) You haven't explained why that's necessary. Given the fact that this patch seems to hugely complicate the postmaster logic --- not so much either path individually, but the messy #ifdef interleaving of two radically different programs --- I am inclined to reject it on style grounds alone. I have to agree, as I wasn't particularly happy with the BackendFork section of this patch. You didn't really seem to comment on the pgstat changes, which were much cleaner, so perhaps (like me) you were happier with these changes. So, could you give me an indication as to whether or not making changes to the BackendFork logic as done for the pgstat logic (specifically, replacing fork() call with a new, argument marshalling routine to do fork/exec) would be more acceptable? The reason I avoided this in the BackendFork case was, predominantly, code duplication. Creating a new BackendForkExec would, I think, result in a lot of similarity between the two routines. [yes, BackendFork is poorly named, although I'd still think of moving the fork() call into BackendFork anyway, thereby justifying its name. But, much more importantly, for symmetry with the code format that will inevitably arise in the fork/exec case; see below] We should either find a way to make the fork/exec path more like the existing code, or bite the bullet and change them both. Doing it the way you have here will create an unmaintainable mess. I'm not prepared to work on a postmaster in which a step as basic as fork() happens in two different places depending on an #ifdef. [unmaintainable mess: yeah, I had the exact same thought when I'd finished. No one's ever going to be able to rework this properly.] I'm afraid that, short of removing all SubPostmaster actions from BackendFork, the need to do fork in two different places (at least, physically, if not logically different) will remain. After fork/exec'ing, the child process can't recover the context needed to create the argument list which it'll need to build up the PostgresMain arg list. So, in summary, I see two solutions: a) do something similar to BackendFork as done for pgstat, specifically: - move the fork call into BackendFork - create a fork/exec equivalent to BackendFork - #ifdef the call to the appropriate function in BackendStartup PRO: child is forked in the same logical place CON: possible duplication in code between BackendFork + BackendForkExec b) delay the fork() call to the end of BackendFork in both fork/exec and normal cases, turning it into solely an argument formatter for PostgresMain - move BackendInit, port-cmdline_options parsing + MemoryContext calls into PostgresMain PRO: forking in same (physical) location in code; no duplication CON: additional complexity at the start of PostgresMain to avoid these calls in stand-alone backend case Which is the lesser of two evils? FWIW, I prefer the latter, as it makes the normal + fork/exec cases effectively identical, for a little added complexity to PostgresMain. There's also the alternative you outlined, but I think it is less workable than the above options: Now it seems to me that a lot of the mess in your latest patch comes from trying to set up the backend command string before you've forked, and therefore before you've done any of the sub-postmaster actions. That's not gonna work. I agree that this is where the mess comes from, but I think this (setting up backend command string before fork) is *exactly* what needs to occur in the fork/exec case (leading into the Win32 CreateProcess calls). If we want this code to work in the Win32 case, we basically can't do any processing between the fork + exec calls, and I think recovering all the context to do the command string marshalling post fork/exec is untenable. What I'd suggest is adding a new entry point from main.c, say SubPostmasterMain(), with its own command line syntax. Perhaps postmaster -fork ... additional args ... (where -fork is what cues main.c where to dispatch to, and the additional args are just those that need to be passed from the postmaster to the sub-postmaster, without any of the additional information that will be learned by the sub-postmaster during client authentication.) In the Windows case the postmaster would fork/exec to this routine, which would re-load as much of the postmaster context as was needed and then call BackendFork (which we really oughta rename to something else). BackendFork would execute the client auth process and then simply call PostgresMain with a correctly constructed argument list. Here's the critical
Re: [PATCHES] fork/exec patch: pre-CreateProcess finalization
Claudio Natoli [EMAIL PROTECTED] writes: Ok. So, in conclusion we are in agreement on at least one thing: the current patch sucks. Good, I'm glad we see eye to eye on that ;-) Here's the critical difference in our thinking: ISTM that we'll have to go to a lot of work to get the fork/exec case to re-load the context required to format up the argument list to PostgresMain. Certainly, it is a lot more work than allowing the postmaster itself (not the forked child) to create the argument list, albeit moving auth to PostgresMain. I don't follow your thinking here. The information that has to be reloaded *must* be passed across the fork/exec boundary in either case. For example, there is no alternative to telling the backend process where PGDATA is: it's got to be passed down some way or another. The Unix implementation assumes it can just inherit the value of a static variable, while the Windows version will have to do something else. I take no position here on whether we pass it as a command line item, or through a temporary file, or whatever: it has to be passed down. The reason your patch is messy is that the PostgresMain command line string involves both information passed down from the postmaster and information acquired from the client's connection-request packet. We could consider redesigning that command line behavior, but we don't really want to since it would impact the standalone-backend case. So I'm proposing substituting two levels of command-line processing within the exec'd backend process. The first level is responsible for transferring information inherited from the postmaster (which we are going to have to do *anyway*, somewhere, and this provides a nice localized place to do it) and then the second level is fully compatible with the standalone-backend case and the Unix-fork case. I'm afraid that, short of removing all SubPostmaster actions from BackendFork, the need to do fork in two different places (at least, physically, if not logically different) will remain. I am not by any means saying that the fork() call has to stay exactly where it is in the existing code. We clearly want to refactor things so that fork() is close to the invocation of FooMain() (or equivalently exec() in the Windows case). I think it's a historical artifact that these steps got separated in the existing code base. After fork/exec'ing, the child process can't recover the context needed to create the argument list which it'll need to build up the PostgresMain arg list. If that's literally true, then we are screwed because things cannot work. Pardon me for doubting this statement. All that information *must* be available in the child, once it has analyzed the information it must be able to collect from the postmaster and performed the client authentication handshake. So, in summary, I see two solutions: I still think there's a third answer: create an intermediate layer that's responsible for recovering the information that has to be inherited from the parent process. I don't say this requires zero rearrangement of existing code, I just say it's a more maintainable design than either of the alternatives you suggest. In particular, this alternative is quite unworkable: b) delay the fork() call to the end of BackendFork in both fork/exec and normal cases, turning it into solely an argument formatter for PostgresMain We can *not* postpone the fork() until after client authentication. That loses all of the work that's been done since circa 7.1 to eliminate cases where the postmaster gets blocked waiting for a single client to respond. SSL, PAM, and I think Kerberos all create denial-of-service risks if we do that. I believe most of what you are presently struggling with revolves around the fact that the fork() got pushed up to happen before client authentication, while the interface to PostgresMain didn't change. I do not want to alter either of those decisions, however. What we need is to provide an alternative exec-able interface that encapsulates the processing that now occurs between these steps. regards, tom lane ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
Re: [PATCHES] Quoting of psql \d output
Peter Eisentraut [EMAIL PROTECTED] writes: Christopher Kings-Lynne writes: Now you've lost me - how is a user-inputted object name translatable? Not all languages use ... as quote symbols, but if you make them part of some string that comes from the backend, it becomes prohibitively hard to translate it correctly. Hm. This gets back to the point we've discussed before: there is some confusion between SQL's use of quoted identifiers and the customary English use of quote marks to set off text that should be distinguished from the surrounding sentence. Essentially, Bruce's proposed patch moves the use of quotes in \d table headers to conform to SQL's technical use of quotes, while you are arguing for sticking with the message style guidelines' human-oriented use of quotes. I can see merit in both positions. But I also see merit in the compromise position of not using quotes at all. I don't see a strong need to demarcate table name from surrounding text in a context as simple as this --- is Table foo really any easier to read than Table foo ? And Peter is correct that the former introduces translation issues when you think about languages that don't customarily use ... as quotation marks. regards, tom lane ---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match