Re: [HACKERS] Bad timestamp external representation
On Thu, Jul 26, 2001 at 05:38:23PM -0400, Bruce Momjian wrote:
> Nathan Myers wrote:
> > Bruce wrote:
> > > I can confirm that current CVS sources have the same bug. It's a
> > > bug in timestamp output.
> > >
> > >     # select '2001-07-24 15:55:59.999'::timestamp;
> > >     ?column?
> > >     ---------------------------
> > >      2001-07-24 15:55:60.00-04
> > >     (1 row)
> > >
> > > Richard Huxton wrote:
> > > > From: tamsin [EMAIL PROTECTED]
> > > > > Hi, just created a db from a pg_dump file and got this error:
> > > > >
> > > > >     ERROR: copy: line 602, Bad timestamp external representation
> > > > >     '2000-10-03 09:01:60.00+00'
> > > > >
> > > > > I guess it's a bad representation because 09:01:60.00+00 is
> > > > > actually 09:02, but how could it have got into my database, and
> > > > > can I do anything about it? The value must have been inserted
> > > > > by my app via JDBC; I can't insert that value directly via psql.
> > > >
> > > > I seem to remember a bug in either pg_dump or timestamp rendering
> > > > causing rounding-up problems like this. If no-one else comes up
> > > > with a definitive answer, check the list archives. If you're not
> > > > running the latest release, check the change-log.
> >
> > It is not a bug, in general, to generate or accept times like
> > 09:01:60. Leap seconds are inserted as the 60th second of a minute.
> > ANSI C defines the range of struct member tm.tm_sec as "seconds
> > after the minute [0-61]", inclusive, and strftime format %S as "the
> > second as a decimal number (00-61)". A footnote mentions that the
> > range [0-61] for tm_sec allows for as many as two leap seconds.
> >
> > This is not to say that pg_dump should misrepresent stored times,
> > but rather that PG should not reject those misrepresented times as
> > being ill-formed. We were lucky that PG has the bug which causes it
> > to reject these times, as it led to the other bug in pg_dump being
> > noticed.
>
> We should accept :60 seconds, but we should round 59.99 to 1:00, right?

If the xx:59.999 occurred immediately before a leap second, rounding it
up to (xx+1):00.00 would introduce an error of 1.001 seconds. As I
understand it, the problem is in trying to round 59.999 to two digits.

My question is: why is pg_dump representing times with less precision
than PostgreSQL's internal format? Should pg_dump be lossy?

Nathan Myers [EMAIL PROTECTED]
---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster
Re: [HACKERS] Re: RPM source files should be in CVS (was Re: [GENERAL] psql -l)
On Fri, Jul 20, 2001 at 07:05:46PM -0400, Trond Eivind Glomsrød wrote:
> Tom Lane [EMAIL PROTECTED] writes:
> > BTW, the only python shebangs I can find in CVS look like
> >     #! /usr/bin/env python
> > Isn't that OK on RedHat?
> It is.

Probably the perl scripts should say, likewise:

    #!/usr/bin/env perl

Nathan Myers [EMAIL PROTECTED]
Re: [HACKERS] MySQL Gemini code
On Wed, Jul 18, 2001 at 11:45:54AM -0400, Bruce Momjian wrote:
> > And this press release http://www.nusphere.com/releases/071601.htm ...
> On a more significant note, I hear the word "fork" clearly suggested
> in that text. It is almost like MySQL AB GPL'ed the MySQL code and now
> they may not be able to keep control of it.

Anybody is free to fork MySQL or PostgreSQL alike. The only difference
is that all published MySQL forks must remain public, where PostgreSQL
forks need not. MySQL AB is demonstrating their legal right to keep as
much control as they choose, and NuSphere will lose if it goes to
court.

The interesting event here is that since NuSphere violated the license
terms, they no longer have any rights to use or distribute the MySQL AB
code, and won't until they get forgiveness from MySQL AB. MySQL AB
would be within their rights to demand that the copyright to Gemini be
signed over before offering forgiveness.

If Red Hat forks PostgreSQL, nobody will have any grounds for
complaint. (It's been forked lots of times already, less visibly.)

Nathan Myers [EMAIL PROTECTED]
Re: [HACKERS] MySQL Gemini code
On Wed, Jul 18, 2001 at 08:35:58AM -0400, Jan Wieck wrote:
> And this press release http://www.nusphere.com/releases/071601.htm
> also explains why they had to do it this way.

They were always free to fork, but doing it the way they did, by
violating MySQL AB's license, they shot the dog. The lesson? Ask
somebody competent first, before you bet your company playing license
games.

Nathan Myers [EMAIL PROTECTED]
[HACKERS] dependent dependants
For the record:
http://www.lineone.net/dictionaryof/englishusage/d0081889.html

    dependent or dependant

    "Dependent" is the adjective, used for a person or thing that
    depends on someone or something: "Admission to college is dependent
    on A-level results." "Dependant" is the noun, and is a person who
    relies on someone for financial support: "Do you have any
    dependants?"

This is not for mailing-list pedantry, but just to make sure that the
right spelling gets into the code. (The page mentioned above was found
by entering "dependent dependant" into Google.)

Nathan Myers [EMAIL PROTECTED]
Re: [HACKERS] MySQL Gemini code
On Wed, Jul 18, 2001 at 06:37:48PM -0400, Trond Eivind Glomsrød wrote:
> Michael Widenius [EMAIL PROTECTED] writes:
> > Assigning over the code is also something that the FSF requires for
> > all code contributions. If you criticize us at MySQL AB, you should
> > also criticize the above.
> This is slightly different: the FSF wants it so it will have a legal
> position to defend its programs. ... MySQL and TrollTech require
> copyright assignment in order to sell non-open licenses. Some people
> will have a problem with this, while not having a problem with the
> FSF copyright assignment.

Nobody who works on MySQL is unaware of MySQL AB's business model.
Anybody who contributes to the core server has to expect that MySQL AB
will need to relicense anything accepted into the core; that's their
right as originators. Everybody who contributes has a choice to make:
fork, or sign over. (With the GPL, forking remains possible; the Apple
and Sun community licenses don't allow it.)

Anybody who contributes to PG has to make the same choice: fork, or put
your code under the PG license. The latter choice is equivalent to
signing it over to all proprietary vendors, who are then free to take
your code proprietary. Some of us like that.

> > I had actually hoped to get support from you guys at PostgreSQL
> > regarding this. You may have similar experience or at least
> > understand our position. The RedHat database may be a good thing for
> > PostgreSQL, but I am not sure it's a good thing for RedHat or for
> > the main developers of PostgreSQL.
> This isn't even a remotely similar situation: ...

It's similar enough. One difference is that PG users are less afraid to
fork. Another is that without the GPL, we have elected not to (and
indeed cannot) stop any company from doing with PG what NuSphere is
doing with MySQL. This is why characterizing the various licenses as
more or less "business-friendly" is misleading (i.e. dishonest): it
evades the question, "friendly to whom?". Businesses sometimes
compete...

Nathan Myers [EMAIL PROTECTED]
Re: [HACKERS] Re: SOMAXCONN (was Re: Solaris source code)
On Thu, Jul 12, 2001 at 11:08:34PM +0200, Peter Eisentraut wrote:
> Nathan Myers writes:
> > When the system is too heavily loaded (however measured), any
> > further login attempts will fail. What I suggested is, instead of
> > the postmaster accept()ing the connection, why not leave the
> > connection attempt in the queue until we can afford a back end to
> > handle it?
> Because the new connection might be a cancel request.

Supporting cancel requests seems like a poor reason to ignore the
load-shedding support operating systems provide. To support cancel
requests, it would suffice for PG to listen at another socket dedicated
to administrative requests. (It might even ignore MaxBackends for
connections on that socket.)

Nathan Myers [EMAIL PROTECTED]
Re: [HACKERS] Re: SOMAXCONN (was Re: Solaris source code)
On Sat, Jul 14, 2001 at 11:38:51AM -0400, Tom Lane wrote:
> The state of affairs in current sources is that the listen queue
> parameter is MIN(MaxBackends * 2, PG_SOMAXCONN), where PG_SOMAXCONN
> is a constant defined in config.h --- it's 10000, hence a non-factor,
> by default, but could be reduced if you have a kernel that doesn't
> cope well with large listen-queue requests. We probably won't know if
> there are any such systems until we get some field experience with
> the new code, but we could have configure select a platform-dependent
> value if we find such problems.

Considering the Apache comment about some systems truncating, instead
of limiting, an over-large backlog request, maybe 10239 would be a
better choice, or 16383 (0x3fff).

> So, having thought that through, I'm still of the opinion that
> holding off accept is of little or no benefit to us. But it's not as
> simple as it looks at first glance. Anyone have a different take on
> what the behavior is likely to be?

After doing some more reading, I find that most OSes do not reject
connect requests that would exceed the specified backlog; instead, they
ignore the connection request and assume the client will retry later.
Therefore, it appears we cannot use a small backlog to shed load unless
we assume that clients will time out quickly by themselves.

OTOH, maybe it's reasonable to assume that clients will time out, and
that in the normal case authentication happens quickly. Then we can use
a small listen() backlog, and never accept() if we have more than
MaxBackends back ends. The OS will keep a small queue corresponding to
our small backlog, and the clients will do our load shedding for us.

Nathan Myers [EMAIL PROTECTED]
Re: [HACKERS] Re: SOMAXCONN
On Fri, Jul 13, 2001 at 10:36:13AM +0200, Zeugswetter Andreas SB wrote:
> > When the system is too heavily loaded (however measured), any
> > further login attempts will fail. What I suggested is, instead of
> > the postmaster accept()ing the connection, why not leave the
> > connection attempt in the queue until we can afford a back end to
> > handle it?
> Because the clients would time out?

It takes a long time for half-open connections to time out, by default.
Probably most clients would time out, themselves, first, if PG took too
long to get to them. That would be a Good Thing. Once the SOMAXCONN
threshold is reached (which would only happen when the system is very
heavily loaded, because when it's not, nothing stays in the queue for
long), new connection attempts would fail immediately, another Good
Thing. When the system is very heavily loaded, we don't want to spare
attention for clients we can't serve.

> > Then, the argument to listen() will determine how many attempts can
> > be in the queue before the network stack itself rejects them,
> > without the postmaster involved.
> You cannot change the argument to listen() at runtime, or are you
> suggesting to close and reopen the socket when maxbackends is
> reached? I think that would be nonsense.

Of course that would not work, and indeed nobody suggested it. If the
postmaster behaved a little differently, not accept()ing when the
system is too heavily loaded, then it would be reasonable to call
listen() (once!) with PG_SOMAXCONN set to (e.g.) N=20. When the system
is not too heavily loaded, the postmaster accept()s the connection
attempts from the queue very quickly, and the number of half-open
connections never builds up to N. (This is how PG has been running
already, under light load -- except that on Solaris with Unix sockets N
has been too small.) When the system *is* heavily loaded, the first N
attempts would be queued, and then the OS would automatically reject
the rest.

This is better than accept()ing any number of attempts and then
refusing to authenticate. The N half-open connections in the queue
would be picked up by the postmaster as existing back ends drop off, or
would time out and give up if that happens too slowly.

> I liked the idea of min(MaxBackends, PG_SOMAXCONN), since there is no
> use in accepting more than your total allowed connections
> concurrently.

That might not have the effect you imagine where many short-lived
connections are being made. In some cases it would mean that clients
are rejected that could have been served after a very short delay.

Nathan Myers [EMAIL PROTECTED]
[HACKERS] Re: SOMAXCONN (was Re: Solaris source code)
On Fri, Jul 13, 2001 at 07:53:02AM -0400, mlw wrote:
> Zeugswetter Andreas SB wrote:
> > I liked the idea of min(MaxBackends, PG_SOMAXCONN), since there is
> > no use in accepting more than your total allowed connections
> > concurrently.
> I have been following this thread and I am confused why the queue
> argument to listen() has anything to do with max backends. All the
> parameter to listen() does is specify how long a list of sockets,
> open and waiting for connection, can be. It has nothing to do with
> the number of back-end sockets which are open.

Correct.

> If you have a limit of 128 back-end connections, and you have 127 of
> them open, a listen with queue size of 128 will still allow 128
> sockets to wait for connection before turning others away.

Correct.

> It should be a parameter based on the time-out of a socket connection
> vs. the ability to answer connection requests within that period of
> time.

It's not really meaningful at all, at present.

> There are two ways to think about this. Either you make this
> parameter tunable to give a proper estimate of the usability of the
> system, i.e. tailor the listen queue parameter to reject sockets when
> some number of sockets are waiting, or you say no one should ever be
> denied: accept everyone and let them time out if we are not fast
> enough. This debate could go on; why not make it a parameter in the
> config file that defaults to some system variable, i.e. SOMAXCONN?

With the postmaster's current behavior there is no benefit in setting
the listen() argument to anything less than 1000. With a small change
in postmaster behavior, a tunable system variable becomes useful. But
using SOMAXCONN blindly is always wrong; that is often 5, which is
demonstrably too small.

> BTW: on Linux, the backlog queue parameter is silently truncated to
> 128 anyway.

The 128 limit is common, applied on BSD and Solaris as well. It will
probably increase in future releases.

Nathan Myers [EMAIL PROTECTED]
Re: [HACKERS] Re: SOMAXCONN (was Re: Solaris source code)
On Wed, Jul 11, 2001 at 12:26:43PM -0400, Tom Lane wrote:
> Peter Eisentraut [EMAIL PROTECTED] writes:
> > Tom Lane writes:
> > > Right. Okay, it seems like just making it a hand-configurable
> > > entry in config.h.in is good enough for now. When and if we find
> > > that that's inadequate in a real-world situation, we can improve
> > > on it...
> > Would anything computed from the maximum number of allowed
> > connections make sense?
> [ looks at code ... ] Hmm, MaxBackends is indeed set before we arrive
> at the listen(), so it'd be possible to use MaxBackends to compute
> the parameter. Offhand I would think that MaxBackends, or at most
> 2*MaxBackends, would be a reasonable value. Question, though: is this
> better than having a hardwired constant? The only case I can think of
> where it might not be is if some platform out there throws an error
> from listen() when the parameter is too large for it, rather than
> silently reducing the value to what it can handle. A value set in
> config.h.in would be simpler to adapt for such a platform.

The question is really whether you ever want a client to get a
"rejected" result from an open attempt, or whether you'd rather they
got a report from the back end telling them they can't log in. The
second is more polite but a lot more expensive. That expense might
really matter if you have MaxBackends already running. I doubt most
clients have tested either failure case more thoroughly than the other
(or at all), but the lower-level code is more likely to have been
cut-and-pasted from well-tested code. :-)

Maybe PG should avoid accept()ing connections once it has MaxBackends
back ends already running (as hinted at by Ian), so that the listen()
parameter actually has some meaningful effect, and excess connections
can be rejected more cheaply. That might also make it easier to respond
more adaptively to true load than we do now.

> BTW, while I'm thinking about it: why doesn't pqcomm.c test for a
> failure return from the listen() call? Is this just an oversight, or
> is there a good reason to ignore errors?

The failure of listen() seems impossible. In the Linux, NetBSD, and
Solaris man pages, none of the error returns mentioned are possible
with PG's current use of the function. It seems as if the most that
might be needed now would be to add a comment to the call to socket()
noting that if any other address families are supported (besides
AF_INET and AF_LOCAL, aka AF_UNIX), the call to listen() might need to
be looked at. AF_INET6 (which PG will need to support someday) doesn't
seem to change matters. Probably if listen() did fail, then one or
another of bind(), accept(), and read() would fail too.

Nathan Myers [EMAIL PROTECTED]
Re: SOMAXCONN (was Re: [HACKERS] Solaris source code)
On Tue, Jul 10, 2001 at 05:06:28PM -0400, Bruce Momjian wrote:
> > Mathijs Brands [EMAIL PROTECTED] writes:
> > > OK, I tried using 1024 (and later 128) instead of SOMAXCONN
> > > (defined to be 5 on Solaris) in src/backend/libpq/pqcomm.c and
> > > ran a few regression tests on two different Sparc boxes (Solaris
> > > 7 and 8). The regression test still fails, but for a different
> > > reason. The abstime test fails; not only on Solaris but also on
> > > FreeBSD (4.3-RELEASE).
> > The abstime diff is to be expected (if you look closely, the test
> > is comparing 'current' to 'June 30, 2001'. Oops). If that's the
> > only diff then you are in good shape. Based on this and previous
> > discussions, I am strongly tempted to remove the use of SOMAXCONN
> > and instead use, say,
> >     #define PG_SOMAXCONN 1000
> > defined in config.h.in. That would leave room for configure to
> > twiddle it, if that proves necessary. Does anyone know of a
> > platform where this would cause problems? AFAICT, all versions of
> > listen(2) are claimed to be willing to reduce the passed parameter
> > to whatever they can handle.
> Could we test SOMAXCONN and set PG_SOMAXCONN to 1000 only if
> SOMAXCONN is less than 1000?

All the OSes we know of fold it to 128, currently. We can jump it to
10240 now, or later when there are 20GHz CPUs. If you want to make it
more complicated, it would be more useful to be able to set the value
lower for runtime environments where PG is competing for OS resources
with another daemon that deserves higher priority.

Nathan Myers [EMAIL PROTECTED]
Re: SOMAXCONN (was Re: [HACKERS] Solaris source code)
On Tue, Jul 10, 2001 at 06:36:21PM -0400, Tom Lane wrote:
> [EMAIL PROTECTED] (Nathan Myers) writes:
> > All the OSes we know of fold it to 128, currently. We can jump it
> > to 10240 now, or later when there are 20GHz CPUs. If you want to
> > make it more complicated, it would be more useful to be able to set
> > the value lower for runtime environments where PG is competing for
> > OS resources with another daemon that deserves higher priority.
> Hmm, good point. Does anyone have a feeling for the amount of kernel
> resources that are actually sucked up by an accept-queue entry? If
> 128 is the customary limit, is it actually worth worrying about
> whether we are setting it to 128 vs. something smaller?

I don't think the issue is the resources consumed by an accept-queue
entry. Rather, it's a tuning knob to help shed load at the entry point
to the system, before significant resources have been committed. An
administrator would tune it according to actual system and traffic
characteristics. It is easy enough for somebody to change, if they
care, that it seems to me we have already devoted more time to it than
it deserves right now.

Nathan Myers [EMAIL PROTECTED]
Re: [HACKERS] Re: Backup and Recovery
On Fri, Jul 06, 2001 at 06:52:49AM -0400, Bruce Momjian wrote:
> Nathan wrote:
> > How hard would it be to turn these row records into updates against
> > a pg_dump image, assuming access to a good table-image file?
> pg_dump is very hard because WAL contains only tids. No way to match
> that to pg_dump-loaded rows.

Maybe pg_dump can write out a mapping of TIDs to line numbers, and the
back end can create a map of inserted records' line numbers when the
dump is reloaded, so that the original TIDs can be traced to the new
TIDs. I guess this would require a new option on IMPORT. I suppose the
mappings could be temporary tables.

Nathan Myers [EMAIL PROTECTED]
Re: [HACKERS] RE: Row Versioning, for jdbc updateable result sets
On Fri, Jun 15, 2001 at 10:21:37AM -0400, Tom Lane wrote:
> Dave Cramer [EMAIL PROTECTED] writes:
> > I had no idea that xmin even existed, but having a quick look I
> > think this is what I am looking for. Can I assume that if xmin has
> > changed, then another process has changed the underlying data?
> xmin is a transaction ID, not a process ID, but looking at it should
> work for your purposes at present. There has been talk of redefining
> xmin as part of a solution to the XID-overflow problem: what would
> happen is that all sufficiently old tuples would get relabeled with
> the same special xmin, so that only recent transactions would need to
> have distinguishable xmin values. If that happens then your code
> would break, at least if you want to check for changes only at long
> intervals.

A simpler alternative would be to change all sufficiently old tuples to
have an xmin value, N, equal to the oldest that would need to be
distinguished. xmin values could then be compared using normal
arithmetic: less(xminA, xminB) is just ((xminA - N) < (xminB - N)),
with no special cases.

> A hack that comes to mind is that when relabeling an old tuple this
> way, we could copy its original xmin into cmin while setting xmin to
> the permanently-valid XID. Then, if you compare both xmin and cmin,
> you have only about a 1-in-2^32 chance of being fooled. (At least if
> we use a wraparound style of allocating XIDs. I think Vadim is
> advocating resetting the XID counter to 0 at each system restart, so
> the active range of XIDs might be a lot smaller than 2^32 in that
> scenario.)

That assumes a pretty frequent system restart. Many of us prefer to
code to the goal of a system that could run for decades.

Nathan Myers [EMAIL PROTECTED]
Re: [HACKERS] What (not) to do in signal handlers
On Thu, Jun 14, 2001 at 02:18:40PM -0400, Tom Lane wrote:
> Peter Eisentraut [EMAIL PROTECTED] writes:
> > I notice that the signal handlers in postmaster.c do quite a lot of
> > work, much more than what they teach you in school they should do.
> Yes, they're pretty ugly. However, we have not recently heard any
> complaints suggesting problems with it. Since we block signals
> everywhere except just around the select() for new input, there's not
> really any risk of recursive resource use AFAICS.
> > ISTM that most of these, esp. pmdie(), can be written more like the
> > SIGHUP handler, i.e., set a global variable and evaluate it right
> > after the select().
> I would love to see it done that way, *if* you can show me a way to
> guarantee that the signal response will happen promptly. AFAIK
> there's no portable way to ensure that we don't end up sitting and
> waiting for a new client message before we get past the select().

It could open a pipe, and write(2) a byte to it in the signal handler,
and then have select(2) watch that pipe. (SIGHUP could use the same
pipe.) Writing to and reading from your own pipe can be a recipe for
deadlock, but here it would be safe if the signal handler knows not to
get too far ahead of select(). (The easy way would be to allow no more
than one byte in the pipe per signal handler.) Of course this is still
a system call in a signal handler, but it can't (modulo coding bugs)
fail.

See Stevens, "Unix Network Programming, Vol. 2: Interprocess
Communication", p. 91, Figure 5.10, "Functions that are
async-signal-safe". The figure lists write() among others. Sample code
implementing the above appears on page 94. Examples using other
techniques (sigwait, nonblocking mq_receive) are presented also.

A pipe per backend might be considered pretty expensive. Does UNIX
allocate a pipe buffer before there's anything to put in it?

Nathan Myers [EMAIL PROTECTED]
Re: [HACKERS] What (not) to do in signal handlers
On Thu, Jun 14, 2001 at 04:27:14PM -0400, Tom Lane wrote:
> [EMAIL PROTECTED] (Nathan Myers) writes:
> > It could open a pipe, and write(2) a byte to it in the signal
> > handler, and then have select(2) watch that pipe. (SIGHUP could use
> > the same pipe.) Of course this is still a system call in a signal
> > handler, but it can't (modulo coding bugs) fail.
> Hm. That's one way, but is it really any cleaner than our existing
> technique? Since you still need to assume you can do a system call in
> a signal handler, it doesn't seem like a real gain in bulletproofness
> to me.

Quoting Stevens (UNPv2, p. 90): "Posix uses the term
*async-signal-safe* to describe the functions that may be called from a
signal handler. Figure 5.10 lists these Posix functions, along with a
few that were added by Unix98. Functions not listed may not be called
from a signal handler. Note that none of the standard I/O functions
... are listed. Of all the IPC functions covered in this text, only
sem_post, read, and write are listed (we are assuming the latter two
would be used with pipes and FIFOs)."

Restricting the handler to those in the approved list seems like an
automatic improvement to me, even in the apparent absence of evidence
of problems on those platforms that happen to get tested most.

> > A pipe per backend might be considered pretty expensive.
> Pipe per postmaster, no? That doesn't seem like a huge cost.

I haven't looked at how complex the signal handling in the backends is;
maybe they don't need anything this fancy. (OTOH, maybe they should be
using a pipe to communicate with the postmaster, instead of using
signals.)

> I'd be more concerned about the two extra kernel calls (write and
> read) per signal received, actually. Are there so many signals
> flying around?

The signal handler would check a flag before writing, so a storm of
signals would result in only one call to write(), and one call to
read(), per select() loop.

Nathan Myers [EMAIL PROTECTED]
Re: [HACKERS] What (not) to do in signal handlers
On Thu, Jun 14, 2001 at 05:10:58PM -0400, Tom Lane wrote:
> Doug McNaught [EMAIL PROTECTED] writes:
> > Tom Lane [EMAIL PROTECTED] writes:
> > > Hm. That's one way, but is it really any cleaner than our
> > > existing technique? Since you still need to assume you can do a
> > > system call in a signal handler, it doesn't seem like a real gain
> > > in bulletproofness to me.
> > Doing write() in a signal handler is safe; doing fprintf() (and
> > friends) is not.
> If we were calling the signal handlers from random places, then I'd
> agree. But we're not: we use sigblock to ensure that signals are only
> serviced at the place in the postmaster main loop where select() is
> called. So there's no actual risk of reentrant use of non-reentrant
> library functions. Please recall that in practice the postmaster is
> extremely reliable. The single bug we have seen with the signal
> handlers in recent releases was the problem that they were clobbering
> errno, which was easily fixed by saving/restoring errno. This same
> bug would have arisen (though at such low probability we'd likely
> never have solved it) in a signal handler that only invoked write().
> So I find it difficult to buy the argument that there's any net gain
> in robustness to be had here. In short: this code isn't broken, and
> so I'm not convinced we should fix it.

Formally speaking, it *is* broken: we depend on semantics that are
documented as unportable and undefined. In a sense, we have been so
unlucky as not to have perceived, thus far, the undefined effects. This
is no different from depending on finding a NUL at *(char*)0, or on
being able to say "free(p); p = p->next;". Yes, it appears to work, at
the moment, on some platforms, but that doesn't make it correct. It may
not be terribly urgent to fix it right now, but that's far from "isn't
broken". It at least merits a TODO entry.

Nathan Myers [EMAIL PROTECTED]
[HACKERS] Re: Australian timezone configure option
On Thu, Jun 14, 2001 at 12:23:22AM +0000, Thomas Lockhart wrote:
> > Surely the correct solution is to have a config file somewhere that
> > gets read on startup? That way us Australians don't have to be the
> > only ones in the world that need a custom-built postgres.
> I will point out that you Australians, and, well, us 'mericans, are
> the only countries without the sense to choose unique conventions
> for time zone names. It sounds like having a second lookup table for
> the Australian rules is a possibility, and this sounds fairly
> reasonable to me. Btw, is there an Australian convention for
> referring to North American time zones, for those zones with naming
> conflicts?

For years I've been on the TZ list, the announcement list for a
community-maintained database of time zones. One point they have firmly
established is that there is no reasonable hope of making anything like
a standard system of time zone name abbreviations work. Legislators and
dictators compete for arbitrariness in their time zone manipulations.
Even if you assign, for your own use, an abbreviation to a particular
administrative region, you still need a history of legislation for that
region to know what any particular time record (particularly an April
or September one) really means.

The best practice for annotating times is to tag them with the numeric
offset from UTC at the time the sample is formed. If the time sample is
the present time, you don't have to know very much to make or use it.
If it's in the past, you have to know the legislative history of the
place to form a proper time record, but not to use it. If the time is
in the future, you cannot know what offset will be in popular use at
that time, but at least you can be precise about what actual time you
really mean, even if you can't be sure about what the wall clock will
say. (Actual wall clock times are not reliably predictable, a fact that
occasionally makes things tough on airline passengers.)

Things are a little more stable in some places (e.g. in Europe it is
improving) but worldwide all is chaos. Assigning some country's current
abbreviations at compile time is madness.

Nathan Myers [EMAIL PROTECTED]
Re: [HACKERS] Idea: quicker abort after loss of client connection
On Tue, Jun 05, 2001 at 08:01:02PM -0400, Tom Lane wrote:
> Thoughts? Is there anything about this that might be unsafe? Should
> QueryCancel be set after *any* failure of recv() or send(), or only
> if certain errno codes are detected (and if so, which ones)?

Stevens identifies some errno codes that are not significant; in
particular, EINTR, EAGAIN, and EWOULDBLOCK. Of these, maybe only the
first occurs on a blocking socket.

Nathan Myers [EMAIL PROTECTED]
Re: [HACKERS] Re: Interesting Atricle
On Sat, Jun 02, 2001 at 10:59:20AM -0400, Vince Vielhaber wrote: On Fri, 1 Jun 2001, Bruce Momjian wrote: Thought some people might find this article interesting. http://www.zend.com/zend/art/databases.php The only interesting thing I noticed is how fast it crashes my Netscape-4.76 browser ;) Yours too? I turned off Java/Javascript to get it to load and I am on BSD/OS. Strange it so universally crashes. Really odd. I have Java/Javascript with FreeBSD and Netscape 4.76 and read it just fine. One difference tho probably, I keep style sheets shut off. Netscape crashes about 1% as often as it used to. This is getting off-topic, but ... I keep CSS, Javascript, Java, dynamic fonts, and images turned off, and Netscape 4.77 stays up for many weeks at a time. I also have no Flash plugin. All together it makes for a far more pleasant web experience. I didn't notice any problem with the Zend page. Nathan Myers [EMAIL PROTECTED] ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] Re: Interesting Atricle
On Mon, Jun 04, 2001 at 04:55:13PM -0400, Bruce Momjian wrote: This is getting off-topic, but ... I keep CSS, Javascript, Java, dynamic fonts, and images turned off, and Netscape 4.77 stays up for many weeks at a time. I also have no Flash plugin. All together it makes for a far more pleasant web experience. I didn't notice any problem with the Zend page. You are running no images! You may as well have Netscape minimized and say it is running for weeks. :-) Over 98% of the images on the web are either pr0n or wankage. If you don't need to see that, you can save a lot of time. But it's usually Javascript that crashes Netscape. (CSS appears to be implemented using Javascript, because if you turn off Javascript, then CSS stops working (and crashing).) That's not to say that Java doesn't also crash Netscape; it's just that pages with Java in them are not very common. There's little point in bookmarking a site that depends on client-side Javascript or Java, because it won't be up for very long. But this is *really* off topic, now. Nathan Myers [EMAIL PROTECTED] ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/users-lounge/docs/faq.html
Re: [HACKERS] Imperfect solutions
On Thu, May 31, 2001 at 10:07:36AM -0400, Tom Lane wrote: Bruce Momjian [EMAIL PROTECTED] writes: What got me thinking about this is that I don't think my gram.y fix would be accepted given the current review process, Not to put too fine a point on it: the project has advanced a long way since you did that code. Our standards *should* be higher than they were then. and that is bad because we would have to live with no LIKE optimization for 1-2 years until we learned how to do it right. We still haven't learned how to do it right, actually. I think the history of the LIKE indexing problem is a perfect example of why fixes that work for some people but not others don't survive long. We put out several attempts at making it work reliably in non-ASCII locales, but none of them have withstood the test of actual usage. I think there are a few rules we can use to decide how to deal with imperfect solutions: You forgot * will the fix institutionalize user-visible behavior that will in the long run be considered the wrong thing? * will the fix contort new code that is written in the same vicinity, thereby making it harder and harder to replace as time goes on? The first of these is the core of my concern about %TYPE. This list points up a problem that needs a better solution than a list: you have to put in questionable features now to get the usage experience you need to do it right later. The set of prospective features that meet that description does not resemble the set that would pass all the criteria in the list. This is really a familiar problem, with a familiar solution. When a feature is added that is wrong, make sure it's marked somehow -- at worst, in the documentation, but ideally with a NOTICE or something when it's used -- as experimental. If anybody complains later, when you rip it out and redo it correctly, that you broke his code, you can just laugh and add, if you're feeling charitable, that experimental features are not to be depended on.
-- Nathan Myers [EMAIL PROTECTED] ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://www.postgresql.org/search.mpl
Re: [HACKERS] Re: charin(), text_char() should return something else for empty input
On Mon, May 28, 2001 at 02:37:32PM -0400, Tom Lane wrote: I wrote: I propose that both of these operations should return a space character for an empty input string. This is by analogy to space-padding as you'd get with char(1). Any objections? An alternative approach is to make charin and text_char map empty strings to the null character (\0), and conversely make charout and char_text map the null character to empty strings. charout already acts that way, in effect, since it has to produce a null-terminated C string. This way would have the advantage that there would still be a reversible dump and reload representation for a char field containing '\0', whereas space-padding would cause such a field to become ' ' after reload. But it's a little strange if you think that char ought to behave the same as char(1). Does the standard require any particular behavior with NUL characters? I'd like to see PG move toward treating them as ordinary control characters. I realize that at best it will take a long time to get there. C is irretrievably mired in the "NUL is a terminator" swamp, but SQL isn't C. Nathan Myers [EMAIL PROTECTED] ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [HACKERS] BSD gettext
On Thu, May 24, 2001 at 10:30:01AM -0400, Bruce Momjian wrote: The HPUX man page for mmap documents its failure return value as -1, so I hacked around this with #ifndef MAP_FAILED #define MAP_FAILED ((void *) (-1)) #endif whereupon it built and passed the simple self-test you suggested. However, I think it's pretty foolish to depend on mmap for such little reason as this code does. I suggest ripping out the mmap usage and just reading the file with good old read(2). Agreed. Let read() use mmap() internally if it wants to. The reason mmap() is faster than read() is that it can avoid copying data to the place you specify. read() can use mmap() internally only in cases rare enough to hardly be worth checking for. Stdio is often able to use mmap() internally for parsing, and in glibc-2.x (and, I think, on recent Solaris and BSDs) it does. Usually, therefore, it would be better to use stdio functions (except fread()!) in place of read(), where possible, to allow this optimization. Using mmap() in place of disk read() almost always results in enough performance improvement to make doing so worth a lot of disruption. Today mmap() is used heavily enough, in important programs, that worries about unreliability are no better founded than worries about read(). Nathan Myers [EMAIL PROTECTED] ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
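A read(2)-based loader of the sort being suggested could look like this sketch. It is not the actual gettext code, and the function name is ours; error handling is deliberately minimal:

```c
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Read an entire file into a malloc'd buffer using plain read(2),
 * with no dependence on mmap() or the portability of MAP_FAILED.
 * Returns NULL on error; on success *len receives the file size. */
char *slurp_file(const char *path, size_t *len)
{
    struct stat st;
    char *buf;
    ssize_t got, total = 0;
    int fd = open(path, O_RDONLY);

    if (fd < 0)
        return NULL;
    if (fstat(fd, &st) < 0)
    {
        close(fd);
        return NULL;
    }
    buf = malloc(st.st_size > 0 ? st.st_size : 1);
    if (buf == NULL)
    {
        close(fd);
        return NULL;
    }
    while (total < st.st_size)
    {
        got = read(fd, buf + total, st.st_size - total);
        if (got < 0 && errno == EINTR)
            continue;       /* interrupted: just retry */
        if (got <= 0)
        {                   /* hard error or unexpected EOF */
            free(buf);
            close(fd);
            return NULL;
        }
        total += got;
    }
    close(fd);
    *len = (size_t) total;
    return buf;
}
```

For a file read once at startup, any copy-avoidance benefit of mmap() is negligible next to this kind of portability.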
Re: [HACKERS] More pgindent follies
On Wed, May 23, 2001 at 11:58:51AM -0400, Bruce Momjian wrote: I don't see the problem here. My assumption is that the comment is not part of the define, right? Well, that's the question. ANSI C requires comments to be replaced by whitespace before preprocessor commands are detected/executed, but there was an awful lot of variation in preprocessor behavior before ANSI. I suspect there are still preprocessors out there that might misbehave on this input --- for example, by leaving the text "* end-of-string */" present in the preprocessor output. Now we still go to considerable lengths to support not-quite-ANSI preprocessors. I don't like the idea that all the work done by configure and c.h in that direction might be wasted because of pgindent carelessness. I agree, but in a certain sense, we would have found those compilers already. This is not new behaviour as far as I know, and clearly this would throw a compiler error. This is good news! Maybe this process can be formalized. That is, each official release might contain a source file with various modern constructs which we suspect might break old compilers. A comment block at the top requests that any breakage be reported. A configure option would allow a user to avoid compiling it, and a comment in the file would explain how to use the option. After a major release, any modern construct that caused no trouble in the last release is considered OK to use. This process makes it easy to leave behind obsolete language restrictions: if you wonder if it's OK now to use a feature that once broke some crufty platform, drop it in modern.c and forget about it. After the next release, you know the answer. Nathan Myers [EMAIL PROTECTED] ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/users-lounge/docs/faq.html
Re: [HACKERS] C++ Headers
On Tue, May 22, 2001 at 12:19:41AM -0400, Bruce Momjian wrote: This in fact has happened within ECPG. But since sizeof(bool) is passed to libecpg it was possible to figure out which 'bool' is requested. Another issue of C++ compatibility would be cleaning up the usage of 'const' declarations. C++ is really strict about 'const'ness. But I don't know whether postgres' internal headers would need such a cleanup. (I suspect that in ecpg there is an oddity left with respect to host variable declaration. I'll check that later) We have added more const-ness to libpq++ for 7.2. Breaking link compatibility without bumping the major version number on the library seems to me a serious no-no. To const-ify member functions without breaking link compatibility, you have to add another, overloaded member that is const, and turn the non-const function into a wrapper. For example:

    void Foo::bar() { ... }    // existing interface

becomes

    void Foo::bar() { ((const Foo *) this)->bar(); }
    void Foo::bar() const { ... }

Nathan Myers [EMAIL PROTECTED] ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [HACKERS] C++ Headers
On Tue, May 22, 2001 at 05:52:20PM -0400, Bruce Momjian wrote: On Tue, May 22, 2001 at 12:19:41AM -0400, Bruce Momjian wrote: This in fact has happened within ECPG. But since sizeof(bool) is passed to libecpg it was possible to figure out which 'bool' is requested. Another issue of C++ compatibility would be cleaning up the usage of 'const' declarations. C++ is really strict about 'const'ness. But I don't know whether postgres' internal headers would need such a cleanup. (I suspect that in ecpg there is an oddity left with respect to host variable declaration. I'll check that later) We have added more const-ness to libpq++ for 7.2. Breaking link compatibility without bumping the major version number on the library seems to me a serious no-no. To const-ify member functions without breaking link compatibility, you have to add another, overloaded member that is const, and turn the non-const function into a wrapper. For example:

    void Foo::bar() { ... }    // existing interface

becomes

    void Foo::bar() { ((const Foo *) this)->bar(); }
    void Foo::bar() const { ... }

Thanks. That was my problem, not knowing when I break link compatibility in C++. Major updated. Wouldn't it be better to add the forwarding function and keep the same major number? It's quite disruptive to change the major number for what are really very minor changes. Otherwise you accumulate lots of near-copies of almost-identical libraries to be able to run old binaries. A major-number bump should usually be something planned for and scheduled. Nathan Myers [EMAIL PROTECTED] ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
[HACKERS] storage density
When organizing available free storage for re-use, we will probably have a choice whether to favor using space in (mostly-) empty blocks, or in mostly-full blocks. Empty and mostly-empty blocks are quicker -- you can put lots of rows in them before they fill up and you have to choose another. Preferring mostly-full blocks improves active-storage and cache density because a table tends to occupy fewer total blocks. Does anybody know of papers that analyze the tradeoffs involved? Nathan Myers [EMAIL PROTECTED] ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/users-lounge/docs/faq.html
Re: [HACKERS] Upgrade issue (again).
On Thu, May 17, 2001 at 12:43:49PM -0400, Rod Taylor wrote: Best way to upgrade might be to do something as simple as get the master to master replication working. Master-to-master replication is not simple, and (fortunately) isn't strictly necessary. The minimal sequence is:
1. Start a backup and a redo log at the same time.
2. Start the new database and read the backup.
3. Get the new database consuming the redo logs.
4. When the new database catches up, make it a hot failover for the old.
5. Turn off the old database and fail over.
The nice thing about this approach is that all the parts used are essential parts of an enterprise database anyway, regardless of their usefulness in upgrading. Master-to-master replication is nice for load balancing, but not necessary for failover. Its chief benefit, there, is that you wouldn't need to abort the uncompleted transactions on the old database when you make the switch. But master-to-master replication is *hard* to make work, and intrusive besides. Nathan Myers [EMAIL PROTECTED] ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/users-lounge/docs/faq.html
Re: [HACKERS] Plans for solving the VACUUM problem
On Fri, May 18, 2001 at 06:10:10PM -0700, Mikheev, Vadim wrote: Vadim, can you remind me what UNDO is used for? Ok, last reminder -:)) On transaction abort, read WAL records and undo (rollback) changes made in storage. Would allow:
1. Reclaim space allocated by aborted transactions.
2. Implement SAVEPOINTs. Just to remind -:) - in the event of an error discovered by the server (duplicate key, deadlock, command mistyping, etc.), the transaction will be rolled back to the nearest implicit savepoint set just before query execution; - or the transaction can be rolled back by a ROLLBACK TO savepoint_name command to some explicit savepoint set by the user. A transaction rolled back to a savepoint may be continued.
3. Reuse transaction IDs on postmaster restart.
4. Split pg_log into small files with the ability to remove old ones (which do not hold statuses for any running transactions).
I missed the original discussions; apologies if this has already been beaten into the ground. But... mightn't sub-transactions be a better-structured way to expose this service? Nathan Myers [EMAIL PROTECTED] ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
[HACKERS] End-to-end paper
For those of you who have missed it, here http://www.google.com/search?q=cache:web.mit.edu/Saltzer/www/publications/endtoend/endtoend.pdf+clark+end+to+endhl=en is the paper some of us mention, END-TO-END ARGUMENTS IN SYSTEM DESIGN by Saltzer, Reed, and Clark. The abstract is: This paper presents a design principle that helps guide placement of functions among the modules of a distributed computer system. The principle, called the end-to-end argument, suggests that functions placed at low levels of a system may be redundant or of little value when compared with the cost of providing them at that low level. Examples discussed in the paper include bit error recovery, security using encryption, duplicate message suppression, recovery from system crashes, and delivery acknowledgement. Low level mechanisms to support these functions are justified only as performance enhancements. It was written in 1981 and is undiminished by the subsequent decades. Nathan Myers [EMAIL PROTECTED] ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED])
Re: [HACKERS] Re: End-to-end paper
On Thu, May 17, 2001 at 06:04:54PM +0800, Lincoln Yeoh wrote: At 12:24 AM 17-05-2001 -0700, Nathan Myers wrote: For those of you who have missed it, here http://www.google.com/search?q=cache:web.mit.edu/Saltzer/www/publications/endtoend/endtoend.pdf+clark+end+to+endhl=en is the paper some of us mention, END-TO-END ARGUMENTS IN SYSTEM DESIGN by Saltzer, Reed, and Clark. The abstract is: This paper presents a design principle that helps guide placement of functions among the modules of a distributed computer system. The principle, called the end-to-end argument, suggests that functions placed at low levels of a system may be redundant or of little value when compared with the cost of providing them at that low level. Examples discussed in the paper include bit error recovery, security using encryption, duplicate message suppression, recovery from system crashes, and delivery acknowledgement. Low level mechanisms to support these functions are justified only as performance enhancements. It was written in 1981 and is undiminished by the subsequent decades. Maybe I don't understand the paper. Yes. It bears re-reading. The end-to-end argument might be true if taking the monolithic approach. I find more useful ideas gleaned from the RFCs, TCP/IP and the OSI 7 layer model: modularity, useful standard interfaces, Be liberal in what you accept, and conservative in what you send and so on. The end-to-end principle has had profound effects on the design of Internet protocols, perhaps most importantly in keeping them simpler than OSI's. Within a module I figure the end to end argument might hold, The end-to-end principle isn't particularly applicable within a module. It's a system-design principle. Its prescription for individual modules is: don't imagine that anybody else gets much value from your complex error recovery shenanigans; they have to do their own error recovery anyway. You provide more value by making a good effort. 
but the author keeps talking about networks and networking. Of course networking is just an example, but it's a particularly good example. Data storage (e.g. disk) is another good example; in the context of the paper it may be thought of as a mechanism for communicating with other (later) times. The point there is that the CRCs and ECC performed by the disk are not sufficient to ensure reliability for the system (e.g. database service); for that, end-to-end measures such as hot-failover, backups, redo logs, and block- or record-level CRCs are needed. The purpose of the disk CRCs is not reliability, a job they cannot do alone, but performance: they help make the need to use the backups and redo logs infrequent enough to be tolerable. SSL and TCP are useful. The various CRC checks down the IP stack to the datalink layer have their uses too. Yes, of course they are useful. The authors say so in the paper, and they say precisely how (and how not). By splitting stuff up at appropriate points, adding or substituting objects at various layers becomes so much easier. People can download Postgresql over token ring, Gigabit ethernet, X.25 and so on. As noted in the paper, the principle is most useful in helping to decide what goes in each layer. Splitting stuff up does mean that the bits and pieces now do have a certain responsibility. If those responsibilities involve some redundancies in error checking or encryption or whatever, so be it, because if done well people can use those bits and pieces in interesting ways never dreamed of initially. For example SSL over TCP over IPSEC over encrypted WAP works (even though IPSEC is way too complicated :)). There's so much redundancy there, but at the same time it's not a far fetched scenario - just someone ordering online on a notebook pc. The authors quote a similar example in the paper, even though it was written twenty years ago. 
But if a low level module never bothered with error correction/detection/handling or whatever and was optimized for an application specific purpose, it's harder to use it for other purposes. And if you do, some chap could post an article to Bugtraq on it, mentioning exploit, DoS or buffer overflow. The point is that leaving that stuff _out_ is how you keep low-level mechanisms useful for a variety of purposes. Putting in complicated error-recovery stuff might suit it better for a particular application, but make it less suitable for others. This is why, at the IP layer, packets get tossed at the first sign of congestion. It's why TCP connections often get dropped at the first sign of a data-format violation. This is a very deep principle; understanding it thoroughly will make you a much better system designer. Nathan Myers [EMAIL PROTECTED] ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send unregister
Re: [HACKERS] Configurable path to look up dynamic libraries
On Tue, May 15, 2001 at 05:53:36PM -0400, Bruce Momjian wrote: But, if I may editorialize a little myself, this is just indicative of a 'Fortress PostgreSQL' attitude that is easy to get into. 'We've always I have to admit I like the sound of 'Fortress PostgreSQL'. :-) Ye Olde PostgreSQL Shoppe The PostgreSQL of Giza Our Lady of PostgreSQL, Ascendant PostgreSQL International Airport PostgreSQL Galactica PostgreSQL's Tavern ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
[HACKERS] Cursor support in pl/pg
Now that 7.1 is safely in the can, is it time to consider this patch? It provides cursor support in PL. http://www.airs.com/ian/postgresql-cursor.patch Nathan Myers [EMAIL PROTECTED] ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED])
[HACKERS] tables/indexes/logs on different volumes
On Wed, Apr 25, 2001 at 09:41:57AM -0300, The Hermit Hacker wrote: On Tue, 24 Apr 2001, Nathan Myers wrote: On Tue, Apr 24, 2001 at 11:28:17PM -0300, The Hermit Hacker wrote: I have a Dual-866, 1gig of RAM and strip'd file systems ... this past week, I've hit many times where CPU usage is 100%, RAM is 500Meg free and disks are pretty much sitting idle ... Assuming strip'd above means striped, it strikes me that you might be much better off operating the drives independently, with the various tables, indexes, and logs scattered each entirely on one drive. have you ever tried to maintain a database doing this? PgSQL is definitely not designed for this sort of setup, I had symlinks going everywhere, and with the new numbering schema, this is even more difficult to try and do :) Clearly you need to build a tool to organize it. It would help a lot if PG itself could provide some basic assistance, such as calling a stored procedure to generate the pathname of the file. Has there been any discussion of anything like that? Nathan Myers [EMAIL PROTECTED] ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [HACKERS] refusing connections based on load ...
On Mon, Apr 23, 2001 at 03:09:53PM -0300, The Hermit Hacker wrote: Anyone thought of implementing this, similar to how sendmail does it? If load > n, refuse connections? ... If nobody is working on something like this, does anyone but me feel that it has merit to make use of? I'll play with it if so ... I agree that it would be useful. Even more useful would be soft load shedding, where once some load average level is exceeded the postmaster delays a bit (proportionately) before accepting a connection. Nathan Myers [EMAIL PROTECTED] ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [HACKERS] refusing connections based on load ...
On Mon, Apr 23, 2001 at 10:50:42PM -0400, Tom Lane wrote: Basically, if we do this then we are abandoning the notion that Postgres runs as an unprivileged user. I think that's a BAD idea, especially in an environment that's open enough that you might feel the need to load-throttle your users. By definition you do not trust them, eh? No. It's not a case of trust, but of providing an adaptive way to keep performance reasonable. The users may have no independent way to cooperate to limit load, but the DB can provide that. A less dangerous way of approaching it might be to have an option whereby the postmaster invokes 'uptime' via system() every so often (maybe once a minute?) and throttles on the basis of the results. The reaction time would be poorer, but security would be a whole lot better. Yes, this alternative looks much better to me. On Linux you have the much more efficient alternative, /proc/loadavg. (I wouldn't use system(), though.) Nathan Myers [EMAIL PROTECTED] ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED])
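Reading /proc/loadavg directly, as suggested, might look like the sketch below. It is Linux-specific; the function name and the sampling policy are ours, not anything in the postmaster:

```c
#include <stdio.h>

/* Return the 1-minute load average from /proc/loadavg (Linux),
 * or -1.0 if it cannot be read.  Far cheaper and safer than
 * invoking system("uptime") and parsing its output. */
double get_loadavg(void)
{
    double load;
    FILE *fp = fopen("/proc/loadavg", "r");

    if (fp == NULL)
        return -1.0;
    if (fscanf(fp, "%lf", &load) != 1)
        load = -1.0;
    fclose(fp);
    return load;
}
```

The postmaster could sample this every minute or so and refuse, or proportionately delay, new connections while the value stays above a configured limit.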
Re: [HACKERS] Is it possible to mirror the db in Postgres?
On Fri, Apr 20, 2001 at 03:33:38PM -0700, G. Anthony Reina wrote: We use Postgres 7.0.3 to store data for our scientific research. We have two other labs in St. Louis, MO and Tempe, AZ. I'd like to see if there's a way for them to mirror our database. They would be able to update our database when they received new results and we would be able to update theirs. So, in effect, we'd have 3 copies of the same db. Each copy would be able to update the other. Any thoughts on if this is possible? Does the replication have to be reliable? Are you equipped to reconcile databases that have got out of sync, if not? Will the different labs ever try to update the same existing record, or insert conflicting (unique-key) records? Symmetric replication is easy or impossible, but usually somewhere in between, depending on many details. Usually when it's made to work, it runs on a LAN. Reliable WAN replication is harder. Most of the proprietary database companies will tell you they can do it, but their customers will tell you they can't. Nathan Myers [EMAIL PROTECTED] ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] Re: Is it possible to mirror the db in Postgres?
On Fri, Apr 20, 2001 at 04:53:43PM -0700, G. Anthony Reina wrote: Nathan Myers wrote: Does the replication have to be reliable? Are you equipped to reconcile databases that have got out of sync, when it's not? Will the different labs ever try to update the same existing record, or insert conflicting (unique-key) records? (1) Yes, of course. (2) Willing--yes; equipped--dunno. (3) Yes, probably. Hmm, good luck. Replication, by itself, is not hard, but it's only a tiny part of the job. Most of the job is in handling failures and conflicts correctly, for some (usually enormous) definition of "correctly". Reliable WAN replication is harder. Most of the proprietary database companies will tell you they can do it, but their customers will tell you they can't. Joel Burton suggested the rserv utility. I don't know how well it would work over a wide network. The point about WANs is that things which work nicely in the lab, on a LAN, behave very differently when the communication medium is, like the Internet, only fitfully reliable. You will tend to have events occurring in unexpected order, and communications lost, and queues toppling over, and conflicting entries in different instances which you must somehow reconcile after the fact. Reconciliation by shipping the whole database across the WAN is often impractical, particularly when you're trying to use it at the same time. WAN replication is an important part of Zembu's business, and it's hard. I would expect the rserv utility (about which I admit I know little) not to have been designed for the job. Nathan Myers [EMAIL PROTECTED] ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/users-lounge/docs/faq.html
Re: [HACKERS] CRN article not updated
On Wed, Apr 18, 2001 at 02:22:48PM -0400, Bruce Momjian wrote: I just checked the CRN PostgreSQL article at: http://www.crn.com/Sections/Fast_Forward/fast_forward.asp?ArticleID=25670 I see no changes to the article, even though Vince our webmaster, Geoff Davidson of PostgreSQL, Inc, and Dave Mele of Great Bridge have requested it be fixed. If _you_ had been deluged with that kind of vitriol, what kind of favors would you feel like doing? Not sure what we can do now. It's too late. "We" screwed it up. (Thanks again, guys.) The responses have done far more lasting damage than any article could ever have done. The horse is dead. The best we can do is to plan for the future. 1. What happens the next time a slightly inaccurate article is published? 2. What happens when an openly hostile article is published? Will our posse ride off again with guns blazing, making more enemies? Will they make us all look to potential users like a bunch of hotheaded, childish nobodies? Or will we have somebody appointed, already, to write a measured, rational, mature clarification? Will we have articles already written, and handed to more responsible reporters, so that an isolated badly-done article can do little damage? We're not even on Oracle's radar yet. When PG begins to threaten their income, their marketing department will go on the offensive. Oracle marketing is very, very skillful, and very, very nasty. If they find that by seeding the press with reasonable-sounding criticisms of PG, they can prod the PG community into making itself look like idiots, they will go to town on it. Nathan Myers [EMAIL PROTECTED] ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/users-lounge/docs/faq.html
Re: [HACKERS] timeout on lock feature
On Wed, Apr 18, 2001 at 09:54:11AM +0200, Zeugswetter Andreas SB wrote: In short, I think lock timeout is a solution searching in vain for a problem. If we implement it, we are just encouraging bad application design. I agree with Tom completely here. In any real-world application the database is the key component of a larger system: the work it does is the most finicky, and any mistakes (either internally or, more commonly, from misuse) have the most far-reaching consequences. The responsibility of the database is to provide a reliable and easily described and understood mechanism to build on. It is not something that makes anything unreliable or less robust. It is also simple: "I (the client) request that you (the backend) don't wait for any lock longer than x seconds" Many things that are easy to say have complicated consequences. Timeouts are a system-level mechanism that to be useful must refer to system-level events that are far above anything that PG knows about. I think you are talking about different kinds of timeouts here. Exactly. I'm talking about useful, meaningful timeouts, not random timeouts attached to invisible events within the database. The only way PG could apply reasonable timeouts would be for the application to dictate them, That is exactly what we are talking about here. No. You wrote elsewhere that the application sets "30 seconds" and leaves it. But that 30 seconds doesn't have any application-level meaning -- an operation could take twelve hours without tripping your 30-second timeout. For the application to dictate the timeouts reasonably, PG would have to expose all its lock events to the client and expect it to deduce how they affect overall behavior. but the application can better implement them itself. It can, but it makes the program more complicated (needs timers or threads, which violates your last statement "simplest interface").
It is good for the program to be more complicated if it is doing a more complicated thing -- if it means the database may remain simple. People building complex systems have an even greater need for simple components than people building little ones. What might be a reasonable alternative would be a BEGIN timeout: report failure as soon as possible after N seconds unless the timer is reset, such as by a commit. Such a timeout would be meaningful at the database-interface level. It could serve as a useful building block for application-level timeouts when the client environment has trouble applying timeouts on its own. Nathan Myers [EMAIL PROTECTED] ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/users-lounge/docs/faq.html
Re: [HACKERS] timeout on lock feature
On Wed, Apr 18, 2001 at 07:33:24PM -0400, Bruce Momjian wrote: What might be a reasonable alternative would be a BEGIN timeout: report failure as soon as possible after N seconds unless the timer is reset, such as by a commit. Such a timeout would be meaningful at the database-interface level. It could serve as a useful building block for application-level timeouts when the client environment has trouble applying timeouts on its own. Now that is a nifty idea. Just put it on one command, BEGIN, and have it apply for the whole transaction. We could just set an alarm and do a longjump out on timeout. Of course, it begs the question why the client couldn't do that itself, and leave PG out of the picture. But that's what we've been talking about all along. Nathan Myers [EMAIL PROTECTED] ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://www.postgresql.org/search.mpl
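The alarm-and-longjmp scheme Bruce describes is easy to sketch on the client side too. A sketch in Python rather than the C the backend would use; the StatementTimeout exception stands in for the longjmp target, and it is Unix-only since it relies on SIGALRM:

```python
import signal
import time

class StatementTimeout(Exception):
    """Stands in for the longjmp target: raised when the alarm fires."""

def run_with_timeout(seconds, work):
    """Run work() under a deadline (Unix only: relies on SIGALRM).
    Returns work()'s result, or raises StatementTimeout if the timer fires."""
    def on_alarm(signum, frame):
        raise StatementTimeout          # the "longjmp out" of whatever was blocked
    previous = signal.signal(signal.SIGALRM, on_alarm)
    signal.setitimer(signal.ITIMER_REAL, seconds)
    try:
        return work()
    finally:
        signal.setitimer(signal.ITIMER_REAL, 0)   # reset the timer, as a commit would
        signal.signal(signal.SIGALRM, previous)

print(run_with_timeout(1.0, lambda: "done"))      # completes well inside the deadline
try:
    run_with_timeout(0.1, lambda: time.sleep(5))  # stands in for waiting on a lock
except StatementTimeout:
    print("timed out")
```

The interesting part is the finally clause: the timer must be cancelled whether the work finished or was aborted, which is exactly the reset-on-commit bookkeeping a BEGIN timeout would need inside the backend.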
Re: [HACKERS] Another news story in need of 'enlightenment'
On Tue, Apr 17, 2001 at 01:31:43PM -0400, Lamar Owen wrote: This one probably needs the 'iron hand and the velvet paw' touch. The iron hand to pound some sense into the author, and the velvet paw to make him like having sense pounded into him. Title of article is 'Open Source Databases Won't Fly' -- http://www.dqindia.com/content/enterprise/datawatch/101041201.asp This one is best just ignored. It's content-free, just his frightened opinions. The only thing that will change his mind is the improvements planned for releases 7.2 and 7.3, and lots of deployments. Few will read his rambling. Nathan Myers [EMAIL PROTECTED] ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])
Re: [HACKERS] timeout on lock feature
On Tue, Apr 17, 2001 at 12:56:11PM -0400, Tom Lane wrote: In short, I think lock timeout is a solution searching in vain for a problem. If we implement it, we are just encouraging bad application design. I agree with Tom completely here. In any real-world application the database is the key component of a larger system: the work it does is the most finicky, and any mistakes (either internally or, more commonly, from misuse) have the most far-reaching consequences. The responsibility of the database is to provide a reliable and easily described and understood mechanism to build on. Timeouts are a system-level mechanism that to be useful must refer to system-level events that are far above anything that PG knows about. The only way PG could apply reasonable timeouts would be for the application to dictate them, but the application can better implement them itself. You can think of this as another aspect of the "end-to-end" principle: any system-level construct duplicated in a lower-level system component can only improve efficiency, not provide the corresponding high-level service. If we have timeouts in the database, they should be there to enable the database to better implement its abstraction, and not pretend to be a substitute for system-level timeouts. There's no upper limit on how complicated a database interface can become (cf. Oracle). The database serves its users best by having the simplest interface that can possibly provide the needed service. Nathan Myers [EMAIL PROTECTED] ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])
Re: [HACKERS] Fast Forward (fwd)
On Sun, Apr 15, 2001 at 01:17:15AM -0400, Vince Vielhaber wrote: Here's my response to the inaccurate article cmp produced. After chatting with Marc I decided to post it myself. ... Where do you get your info? Do you just make it up? PostgreSQL is not a product of Great Bridge and never has been. It's 100% independent. Is Linux a keyword you figure you can use to draw readers? Won't take long before folks determine you're full of it. The PostgreSQL team takes great pride (not to be confused with great bridge) in ensuring that the work we do runs on ALL platforms; be it Mac's OSX, FreeBSD 4.3, or even Windows 2000. So why do you figure this is a Great Bridge product? Why do you figure it's Linux only? What is it with you writers lately? Are you getting lazy and simply using Linux as a quick out for a paycheck? This is probably a good time to point out that this is the _worst_ _possible_ response to erroneous reportage. The perception by readers will not be that the reporter failed, but that PostgreSQL advocates are rabid weasels who don't appreciate favorable attention, and are dangerous to write anything about. You can bet this reporter and her editor will treat the topic very circumspectly (i.e. avoid it) in the future. When they have to mention it, their reporting will be colored by their personal experience. They (and their readers) don't run the code, so they must get their impressions from those who do. Most reporters are ignorant, most reporters are lazy, and many are both. It's part of the job description. Getting angry about it is like getting angry at birds for fouling their cage. Their job is to regurgitate what they're given, and quickly. They have no time to learn the depths, or to write coherently about it, or even to check facts. None of the errors in the article matter. Nobody will develop an enduring impression of PG from them. What matters is that PG is being mentioned in the same article with Oracle.
In her limited way, she did the PG community the biggest favor in her power, and all we can do is attack? It will be harder than the original mailings, but I urge each who wrote to write again and apologize for attacking her. Thank her graciously for making an effort, and offer to help her check her facts next time. PostgreSQL needs friends in the press, even if they are ignorant or lazy. It doesn't need any enemies in the press. Nathan Myers [EMAIL PROTECTED] ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [HACKERS] Fast Forward (fwd)
On Sun, Apr 15, 2001 at 11:44:48AM -0300, The Hermit Hacker wrote: On Sat, 14 Apr 2001, Nathan Myers wrote: This is probably a good time to point out that this is the _worst_ _possible_ response to erroneous reportage. The perception by readers will not be that the reporter failed, but that PostgreSQL advocates are rabid weasels who don't appreciate favorable attention, and are favorable attention?? Yes, totally favorable. There wasn't a hint of the condescension typically accorded free software. All of the details you find so objectionable (April vs. June? "The" marketing arm vs. "a" marketing arm?) would not even be noticed by a non-cultist. dangerous to write anything about. You can bet this reporter and her editor will treat the topic very circumspectly (i.e. avoid it) in the future. woo hoo, if that is the result, then I think Vince did us a great service, not dis-service ... False. This may have been the reporter's and the editor's first direct exposure to free software advocates. You guys came across as hate-filled religious whackos, and that reflects on all of us. Most reporters are ignorant, most reporters are lazy, and many are both. It's part of the job description. Getting angry about it is like getting angry at birds for fouling their cage. Their job is to regurgitate what they're given, and quickly. They have no time to learn the depths, or to write coherently about it, or even to check facts. Out of all the articles on PgSQL that I've read over the years, this one should have been shot before it hit the paper (so to say) ... it was the most blatantly inaccurate article I've ever read ... It had a number of minor errors, easily corrected. The next will probably talk about what a bunch of nasty cranks and lunatics PostgreSQL fans are, unless you who wrote can display a lot more finesse in your apologies. Thanks a lot, guys. It will be harder than the original mailings, but I urge each who wrote to write again and apologize for attacking her. 
In a way, I think you are right .. I think the attack was aimed at the wrong ppl :( She obviously didn't get *any* of her information from ppl that belong *in* the Pg community, or that have any knowledge of how it works, or of its history :( How is this reporter going to have developed contacts within the community? She has just started. Now you've burnt her to a crisp, and she will figure the less contact with that "community" she has, the happier she'll be. Her editor will know that mentioning PG in any context will result in a raft of hate mail from cranks, and will treat press releases from our community with the scorn they have earned. Reporters are fragile creatures, and must be gently guided toward the light. They will always get facts wrong, but that matters not at all. The overall tone of the writing is the only thing that stays with their equally dim audience. That dim audience controls the budgets for technology deployment, including databases. Next time you propose a deployment on PG instead of Oracle, thank Vince et al. when it's dismissed as a crank toy. Finally, their talkback page was most probably implemented _not_ with MySQL, but with MS SQL Server. These intramural squabbles (between MySQL and PG, between Linux and BSD, between NetBSD and OpenBSD) are justifiably seen as pathetic in the outside world. Respectful attention among projects doesn't just create a better impression, it also allows you, maybe, to learn something. (MySQL is not objectively as good as PG, but those guys are doing something right, in their presentation, that some of us could learn from.) Nathan Myers [EMAIL PROTECTED] ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [HACKERS] Re: Hey guys, check this out.
On Sun, Apr 15, 2001 at 10:05:46PM -0400, Vince Vielhaber wrote: On Mon, 16 Apr 2001, Lincoln Yeoh wrote: Maybe you guys should get some Great Bridge marketing/PR person to handle stuff like this. After reading Ned's comments I figured that's how it got that way in the first place. But that's just speculation. You probably figured wrong. All those publications have editors who generally feel they're not doing their job if they don't introduce errors, usually without even talking to the reporter. That's probably how the "FreeBSD" reference got in there: somebody saw "Berkeley" and decided "FreeBSD" would look more "techie". It's stupid, but nothing to excoriate the reporter about. Sam Williams's articles read completely differently according to who publishes them. Typically the Linux magazines print what he writes, and thereby get it mostly right, but the finance magazines mangle them to total nonsense. Nathan Myers [EMAIL PROTECTED] ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
[HACKERS] Truncation of object names
On Fri, Apr 13, 2001 at 01:16:43AM -0400, Tom Lane wrote: [EMAIL PROTECTED] (Nathan Myers) writes: We have noticed here also that object (e.g. table) names get truncated in some places and not others. If you create a table with a long name, PG truncates the name and creates a table with the shorter name; but if you refer to the table by the same long name, PG reports an error. Example please? This is clearly a bug. Sorry, false alarm. When I got the test case, it turned out to be the more familiar problem: create table foo_..._bar1 (id1 ...); [notice, "foo_..._bar1" truncated to "foo_..._bar"] create table foo_..._bar (id2 ...); [error, foo_..._bar already exists] create index foo_..._bar_ix on foo_..._bar(id2); [notice, "foo_..._bar_ix" truncated to "foo_..._bar"] [error, foo_..._bar already exists] [error, attribute "id2" not found] It would be more helpful for the first "create" to fail so we don't end up cluttered with objects that shouldn't exist, and which interfere with operations on objects which should. But I'm not proposing that for 7.1. Nathan Myers [EMAIL PROTECTED] ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])
Re: [HACKERS] Truncation of object names
On Fri, Apr 13, 2001 at 02:54:47PM -0400, Tom Lane wrote: [EMAIL PROTECTED] (Nathan Myers) writes: Sorry, false alarm. When I got the test case, it turned out to be the more familiar problem: create table foo_..._bar1 (id1 ...); [notice, "foo_..._bar1" truncated to "foo_..._bar"] create table foo_..._bar (id2 ...); [error, foo_..._bar already exists] create index foo_..._bar_ix on foo_..._bar(id2); [notice, "foo_..._bar_ix" truncated to "foo_..._bar"] [error, foo_..._bar already exists] [error, attribute "id2" not found] It would be more helpful for the first "create" to fail so we don't end up cluttered with objects that shouldn't exist, and which interfere with operations on objects which should. Seems to me that if you want a bunch of CREATEs to be mutually dependent, then you wrap them all in a BEGIN/END block. Yes, but... The second and third commands weren't supposed to be related to the first at all, never mind dependent on it. They were made dependent by PG crushing the names together. We are thinking about working around the name length limitation (encountered in migrating from other dbs) by allowing "foo.bar.baz" name syntax, as a sort of rudimentary namespace mechanism. It ain't schemas, but it's better than "foo__bar__baz". Nathan Myers [EMAIL PROTECTED] ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
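The collision described above is easy to reproduce in miniature. A sketch in Python; the constant and helper are illustrative, not PG code, assuming the era's default NAMEDATALEN of 32 (31 usable characters):

```python
NAMEDATALEN = 32        # assumed: the backend's compiled-in limit of that era

def truncate_name(name):
    """Mimic the backend's silent identifier truncation to NAMEDATALEN - 1 chars."""
    return name[:NAMEDATALEN - 1]

table = "foo_" + "x" * 24 + "_bar"          # 32 characters: one over the limit
index = table + "_ix"

# Both the "different" table name and the index name collapse onto the
# truncated table name, producing the "already exists" errors quoted above.
print(truncate_name(table + "1") == truncate_name(table))   # True
print(truncate_name(index) == truncate_name(table))         # True
```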
Re: [HACKERS] Anyone have any good addresses ... ?
On Fri, Apr 13, 2001 at 06:32:26PM -0400, Trond Eivind Glomsrød wrote: The Hermit Hacker [EMAIL PROTECTED] writes: Here is what we've always sent to to date ... anyone have any good ones to add? Addresses : [EMAIL PROTECTED], [EMAIL PROTECTED], [EMAIL PROTECTED], [EMAIL PROTECTED], [EMAIL PROTECTED], [EMAIL PROTECTED], [EMAIL PROTECTED], [EMAIL PROTECTED], [EMAIL PROTECTED] Freshmeat, linuxtoday. If the release includes RPMs for Red Hat Linux, redhat-announce is also a suitable location. Linux Journal: [EMAIL PROTECTED] Freshmeat: [EMAIL PROTECTED] LinuxToday: http://linuxtoday.com/contribute.php3 -- Nathan Myers [EMAIL PROTECTED] ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] Re: Hand written parsers
On Wed, Apr 11, 2001 at 10:44:59PM -0700, Ian Lance Taylor wrote: Mark Butler [EMAIL PROTECTED] writes: ... The advantages of using a hand written recursive descent parser lie in 1) ease of implementing grammar changes 2) ease of debugging 3) ability to handle unusual cases 4) ability to support context sensitive grammars ... Another nice capability is the ability to enable and disable grammar rules at run time ... On the other hand, recursive descent parsers tend to be more ad hoc, they tend to be harder to maintain, and they tend to be less efficient. ... And I note that despite the difficulties, the g++ parser is yacc based. Yacc and yacc-like programs are most useful when the target grammar (or your understanding of it) is not very stable. With Yacc you can make sweeping changes much more easily; big changes can be a lot of work in a hand-coded parser. Once your grammar stabilizes, though, hand coding can provide flexibility that is inconceivable in a parser generator, albeit at some cost in speed and compact description. (I doubt parser speed is an issue for PG.) G++ has flirted seriously with switching to a recursive-descent parser, largely to be able to offer meaningful error messages and to recover better from errors, as well as to be able to parse some problematic but conformant (if unlikely) programs. Note that the choice is not just between Yacc and a hand-coded parser. Since Yacc, many more powerful parser generators have been released, one of which might be just right for PG. Nathan Myers [EMAIL PROTECTED] ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://www.postgresql.org/search.mpl
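For a feel of the trade-off, a minimal hand-written recursive-descent parser for arithmetic expressions might look like this (a sketch in Python, unrelated to PG's actual grammar); note how cheaply it produces a precise error message, which is exactly where table-driven Yacc parsers struggle:

```python
import re

def tokenize(s):
    """Integers, parens, and the four operators; a toy lexer for the sketch."""
    return re.findall(r"\d+|[()+*/-]", s)

class Parser:
    """Recursive descent for:  expr   := term (('+'|'-') term)*
                               term   := factor (('*'|'/') factor)*
                               factor := NUMBER | '(' expr ')'   """
    def __init__(self, text):
        self.toks = tokenize(text)
        self.pos = 0

    def peek(self):
        return self.toks[self.pos] if self.pos < len(self.toks) else None

    def eat(self, expected=None):
        tok = self.peek()
        if tok is None or (expected is not None and tok != expected):
            # Producing a precise, contextual message here is trivial.
            raise SyntaxError(f"expected {expected!r}, got {tok!r}")
        self.pos += 1
        return tok

    def expr(self):
        value = self.term()
        while self.peek() in ("+", "-"):
            value = value + self.term() if self.eat() == "+" else value - self.term()
        return value

    def term(self):
        value = self.factor()
        while self.peek() in ("*", "/"):
            value = value * self.factor() if self.eat() == "*" else value // self.factor()
        return value

    def factor(self):
        if self.peek() == "(":
            self.eat("(")
            value = self.expr()
            self.eat(")")
            return value
        return int(self.eat())

print(Parser("2+3*(4-1)").expr())   # 11
```

Changing precedence or adding an operator means editing one small method, which illustrates both the flexibility claim and the maintenance cost: the grammar lives implicitly in the code rather than in a declarative table.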
Re: [HACKERS] Truncation of char, varchar types
On Mon, Apr 09, 2001 at 09:20:42PM +0200, Peter Eisentraut wrote: Excessively long values are currently silently truncated when they are inserted into char or varchar fields. This makes the entire notion of specifying a length limit for these types kind of useless, IMO. Needless to say, it's also not in compliance with SQL. How do people feel about changing this to raise an error in this situation? Does anybody rely on silent truncation? Should this be user-settable, or can those people resort to using triggers? Yes, detecting and reporting errors early is a Good Thing. You don't do anybody any favors by pretending to save data, but really throwing it away. We have noticed here also that object (e.g. table) names get truncated in some places and not others. If you create a table with a long name, PG truncates the name and creates a table with the shorter name; but if you refer to the table by the same long name, PG reports an error. (Very long names may show up in machine- generated schemas.) Would patches for this, e.g. to refuse to create a table with an impossible name, be welcome? Nathan Myers [EMAIL PROTECTED] ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
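The policy choice under discussion fits in a few lines of illustrative pseudologic (Python; the function and error text are hypothetical, not PG's):

```python
def store_varchar(value, limit, strict=True):
    """Hypothetical sketch of the two policies for an over-length varchar(limit)
    value: raise early (the SQL-compliant behavior) or truncate silently."""
    if len(value) > limit:
        if strict:
            raise ValueError(f"value too long for varchar({limit})")
        return value[:limit]        # the current behavior: silent data loss
    return value

print(store_varchar("abcdef", 5, strict=False))   # abcde
```

Making `strict` a per-session setting would be the "user-settable" middle ground; the alternative is leaving the lenient path to user-written triggers.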
Re: [HACKERS] Re: TODO list
On Thu, Apr 05, 2001 at 04:25:42PM -0400, Ken Hirsch wrote: TODO updated. I know we did number 2, but did we agree on #1 and is it done? #2 is indeed done. #1 is not done, and possibly not agreed to --- I think Vadim had doubts about its usefulness, though personally I'd like to see it. That was my recollection too. This was the discussion about testing the disk hardware. #1 removed. What is recommended in the bible (Gray and Reuter), especially for larger disk block sizes that may not be written atomically, is to have a word at the end of the block that must match a word at the beginning of the block. It gets changed each time you write the block. That only works if your blocks are atomic. Even SCSI disks reorder sector writes, and they are free to write the first and last sectors of an 8k-32k block, and not have written the intermediate sectors before the power goes out. On IDE disks it is of course far worse. (On many (most?) IDE drives, even when they have been told to report write completion only after data is physically on the platter, they will "forget" if they see activity that looks like benchmarking. Others just ignore the command, and in any case they all default to unsafe mode.) If the reason that a block CRC isn't on the TODO list is that Vadim objects, maybe we should hear some reasons why he objects? Maybe the objections could be dealt with, and everyone satisfied. Nathan Myers [EMAIL PROTECTED] ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/users-lounge/docs/faq.html
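The torn-write scenario can be simulated directly. In this sketch the page layout is made up and zlib.crc32 stands in for whatever checksum might be chosen; the Gray-and-Reuter stamp is fooled by a write that lands the first and last sectors but not the middle ones, while a whole-page CRC notices:

```python
import zlib

SECTOR = 512
NSECT = 16                                  # one 8 KB page = 16 disk sectors

def make_page(stamp, payload):
    """Gray-and-Reuter-style page: a version stamp in the first and last bytes,
    changed on every write, with filler payload in between."""
    body = (payload * (SECTOR * NSECT // len(payload) + 1))[:SECTOR * NSECT - 8]
    return stamp.to_bytes(4, "big") + body + stamp.to_bytes(4, "big")

def stamps_match(page):
    return page[:4] == page[-4:]

old = make_page(1, b"old data ")
new = make_page(2, b"new data ")

# The drive reorders sector writes: the first and last sectors of the new
# version reach the platter, but the middle sectors never do.
torn = new[:SECTOR] + old[SECTOR:-SECTOR] + new[-SECTOR:]

print(stamps_match(torn))                   # True -- the stamp check is fooled
print(zlib.crc32(torn) == zlib.crc32(new))  # False -- a whole-page CRC notices
```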
Re: [HACKERS] Re: TODO list
On Thu, Apr 05, 2001 at 02:27:48PM -0700, Mikheev, Vadim wrote: If the reason that a block CRC isn't on the TODO list is that Vadim objects, maybe we should hear some reasons why he objects? Maybe the objections could be dealt with, and everyone satisfied. Unordered disk writes are covered by backing up modified blocks in log. It allows not only catch such writes, as would CRC do, but *avoid* them. So, for what CRC could be used? To catch disk damages? Disk has its own CRC for this. OK, this was already discussed, maybe while Vadim was absent. Should I re-post the previous text? Nathan Myers [EMAIL PROTECTED] ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])
Re: [HACKERS] Re: TODO list
On Thu, Apr 05, 2001 at 06:25:17PM -0400, Tom Lane wrote: "Mikheev, Vadim" [EMAIL PROTECTED] writes: If the reason that a block CRC isn't on the TODO list is that Vadim objects, maybe we should hear some reasons why he objects? Maybe the objections could be dealt with, and everyone satisfied. Unordered disk writes are covered by backing up modified blocks in log. It allows not only catch such writes, as would CRC do, but *avoid* them. So, for what CRC could be used? To catch disk damages? Disk has its own CRC for this. Blocks that have recently been written, but failed to make it down to the disk platter intact, should be restorable from the WAL log. So we do not need a block-level CRC to guard against partial writes. If a block is missing some sectors in the middle, how would you know to reconstruct it from the WAL, without a block CRC telling you that the block is corrupt? A block-level CRC might be useful to guard against long-term data lossage, but Vadim thinks that the disk's own CRCs ought to be sufficient for that (and I can't say I disagree). The people who make the disks don't agree. They publish the error rate they guarantee, and they meet it, more or less. They publish a rate that is _just_ low enough to satisfy noncritical requirements (on the correct assumption that they can't satisfy critical requirements in any case) and high enough not to interfere with benchmarks. They assume that if you need better reliability you can and will provide it yourself, and rely on their CRC only as a performance optimization. At the raw sector level, they get (and correct) errors very frequently; when they are not getting "enough" errors, they pack the bits more densely until they do, and sell a higher-density drive. So the only real benefit of a block-level CRC would be to guard against bits dropped in transit from the disk surface to someplace else, ie, during read or during a "cp -r" type copy of the database to another location. 
That's not a totally negligible risk, but is it worth the overhead of updating and checking block CRCs? Seems dubious at best. Vadim didn't want to re-open this discussion until after 7.1 is out the door, but that "dubious at best" demands an answer. See the archive posting: http://www.postgresql.org/mhonarc/pgsql-hackers/2001-01/msg00473.html ... Incidentally, is the page at http://www.postgresql.org/mhonarc/pgsql-hackers/2001-01/ the best place to find old messages? It's never worked right for me. Nathan Myers [EMAIL PROTECTED] ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/users-lounge/docs/faq.html
Re: [HACKERS] Final call for platform testing
On Tue, Apr 03, 2001 at 03:31:25PM +, Thomas Lockhart wrote: OK. So we are close to a final tally of supported machines. ... Here are the up-to-date platforms:

AIX 4.3.3            RS6000    7.1   2001-03-21, Gilles Darold
BeOS 5.0.4           x86       7.1   2000-12-18, Cyril Velter
BSDI 4.01            x86       7.1   2001-03-19, Bruce Momjian
Compaq Tru64 4.0g    Alpha     7.1   2001-03-19, Brent Verner
FreeBSD 4.3          x86       7.1   2001-03-19, Vince Vielhaber
HPUX                 PA-RISC   7.1   2001-03-19, 10.20 Tom Lane, 11.00 Giles Lean
IRIX 6.5.11          MIPS      7.1   2001-03-22, Robert Bruccoleri
Linux 2.2.x          Alpha     7.1   2001-01-23, Ryan Kirkpatrick
Linux 2.2.x          armv4l    7.1   2001-03-22, Mark Knox
Linux 2.0.x          MIPS      7.1   2001-03-30, Dominic Eidson
Linux 2.2.18         PPC74xx   7.1   2001-03-19, Tom Lane
Linux 2.2.x          S/390     7.1   2000-11-17, Neale Ferguson
Linux 2.2.15         Sparc     7.1   2001-01-30, Ryan Kirkpatrick
Linux 2.2.16         x86       7.1   2001-03-19, Thomas Lockhart
MacOS X Darwin       PPC       7.1   2000-12-11, Peter Bierman
NetBSD 1.5           Alpha     7.1   2001-03-22, Giles Lean
NetBSD 1.5E          arm32     7.1   2001-03-21, Patrick Welche
NetBSD               m68k      7.0   2000-04-10 (Henry has lost machine)
NetBSD               Sparc     7.0   2000-04-13, Tom I. Helbekkmo
NetBSD               VAX       7.1   2001-03-30, Tom I. Helbekkmo
NetBSD 1.5           x86       7.1   2001-03-23, Giles Lean
OpenBSD 2.8          Sparc     7.1   2001-03-23, Brandon Palmer
OpenBSD 2.8          x86       7.1   2001-03-22, Brandon Palmer
SCO OpenServer 5     x86       7.1   2001-03-13, Billy Allie
SCO UnixWare 7.1.1   x86       7.1   2001-03-19, Larry Rosenman
Solaris 2.7-8        Sparc     7.1   2001-03-22, Marc Fournier
Solaris              x86       7.1   2001-03-27, Mathijs Brands
SunOS 4.1.4          Sparc     7.1   2001-03-23, Tatsuo Ishii
WinNT/Cygwin         x86       7.1   2001-03-16, Jason Tishler

And the "unsupported platforms":

DGUX                 m88k
MkLinux DR1          PPC750    7.0   2000-04-13, Tatsuo Ishii
NextStep             x86
QNX 4.25             x86       7.0   2000-04-01, Dr. Andreas Kardos
System V R4          m88k
System V R4          MIPS
Ultrix               MIPS      7.1   2001-03-26, Alexander Klimov
Windows/Win32        x86       7.1   2001-03-26, Magnus Hagander (clients only)

I saw three separate reports of successful builds on Linux 2.4.2 on x86 (including mine), but it isn't listed here.
-- Nathan Myers [EMAIL PROTECTED] ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] Re: Final call for platform testing
On Tue, Apr 03, 2001 at 11:19:04PM +, Thomas Lockhart wrote: I saw three separate reports of successful builds on Linux 2.4.2 on x86 (including mine), but it isn't listed here. It is listed in the comments in the real docs. At least one report was for an extensively patched 2.4.2, and I'm not sure of the true lineage of the others. You could ask. Just to ignore reports that you have asked for is not polite. My report was based on a virgin, unpatched 2.4.2 kernel, and (as noted) the Debian-packaged glibc-2.2.2. If you are trying to trim your list, it would be reasonable to drop Linux-2.0.x, because that version is not being maintained any more. I *could* remove the version info from the x86 listing, and mention both 2.2.x and 2.4.x in the comments. Linux-2.2 and Linux-2.4 are different codebases. It is worth noting, besides, the glibc version tested along with each Linux kernel version. Nathan Myers [EMAIL PROTECTED] ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/users-lounge/docs/faq.html
Re: [HACKERS] Re: Changing the default value of an inherited column
On Sun, Apr 01, 2001 at 03:15:56PM -0400, Tom Lane wrote: Christopher Masto [EMAIL PROTECTED] writes: Another thing that seems kind of interesting would be to have: CREATE TABLE base (table_id CHAR(8) NOT NULL [, etc.]); CREATE TABLE foo (table_id CHAR(8) NOT NULL DEFAULT 'foo'); CREATE TABLE bar (table_id CHAR(8) NOT NULL DEFAULT 'bar'); Then a function on "base" could look at table_id and know which table it's working on. A waste of space, but I can think of uses for it. This particular need is superseded in 7.1 by the 'tableoid' pseudo-column. However you can certainly imagine variants of this that tableoid doesn't handle, for example columns where the subtable creator can provide a useful-but-not-always-correct default value. A bit of O-O doctrine... when you find yourself tempted to do something like the above, it usually means you're trying to do the wrong thing. You may not have a choice, in some cases, but you should know you are on the way to architecture meltdown. "She'll blow, Cap'n!" Nathan Myers [EMAIL PROTECTED] ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [HACKERS] Re: Changing the default value of an inherited column
On Mon, Apr 02, 2001 at 01:27:06PM -0400, Tom Lane wrote: Philip: the rule that pg_dump needs to apply w.r.t. defaults for inherited fields is that if an inherited field has a default and either (a) no parent table supplies a default, or (b) any parent table supplies a default different from the child's, then pg_dump had better emit the child field explicitly. The rule above appears to work even if inherited-default conflicts are not taken as an error, but just result in a derived-table column with no default. Nathan Myers [EMAIL PROTECTED] ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://www.postgresql.org/search.mpl
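Stated as code, the rule is just the following (a sketch; the names are mine, not pg_dump's):

```python
def child_must_emit_default(child_default, parent_defaults):
    """Sketch of the rule above: dump the child column's DEFAULT explicitly
    iff the child has one and either (a) no parent supplies a default, or
    (b) some parent's default differs from the child's."""
    if child_default is None:
        return False
    if not any(d is not None for d in parent_defaults):
        return True                                   # case (a)
    return any(d is not None and d != child_default
               for d in parent_defaults)              # case (b)

print(child_must_emit_default("'foo'", []))          # True: no parent default
print(child_must_emit_default("'foo'", ["'foo'"]))   # False: inherited as-is
print(child_must_emit_default("'foo'", ["'bar'"]))   # True: conflicting parent
```

Under this formulation the conflict case simply forces an explicit child DEFAULT, which is consistent with treating the conflict as "no inherited default" rather than as an error.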
Re: [HACKERS] MIPS test-and-set
On Mon, Mar 26, 2001 at 07:09:38PM -0500, Tom Lane wrote: Thomas Lockhart [EMAIL PROTECTED] writes: That is not already available from the Irix support code? What we have for IRIX is ... Doesn't look to me like it's likely to work on anything but IRIX ... I have attached linuxthreads/sysdeps/mips/pt-machine.h from glibc-2.2.2 below. (Glibc linuxthreads has alpha, arm, hppa, i386, ia64, m68k, mips, powerpc, s390, SH, and SPARC support, at least in some degree.) Since the actual instruction sequence is probably lifted from the MIPS manual, it's probably much freer than GPL. For the paranoid, the actual instructions, extracted, are just

1:      ll      %0,%3
        bnez    %0,2f
        li      %1,1
        sc      %1,%2
        beqz    %1,1b
2:

Nathan Myers [EMAIL PROTECTED]

---

/* Machine-dependent pthreads configuration and inline functions.
   Copyright (C) 1996, 1997, 1998, 2000 Free Software Foundation, Inc.
   This file is part of the GNU C Library.
   Contributed by Ralf Baechle [EMAIL PROTECTED].
   Based on the Alpha version by Richard Henderson [EMAIL PROTECTED].

   The GNU C Library is free software; you can redistribute it and/or
   modify it under the terms of the GNU Library General Public License as
   published by the Free Software Foundation; either version 2 of the
   License, or (at your option) any later version.

   The GNU C Library is distributed in the hope that it will be useful,
   but WITHOUT ANY WARRANTY; without even the implied warranty of
   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
   Library General Public License for more details.

   You should have received a copy of the GNU Library General Public
   License along with the GNU C Library; see the file COPYING.LIB.  If not,
   write to the Free Software Foundation, Inc.,
   59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.  */

#include <sgidefs.h>
#include <sys/tas.h>

#ifndef PT_EI
# define PT_EI extern inline
#endif

/* Memory barrier.  */
#define MEMORY_BARRIER() __asm__ ("" : : : "memory")

/* Spinlock implementation; required.  */
#if (_MIPS_ISA >= _MIPS_ISA_MIPS2)

PT_EI long int
testandset (int *spinlock)
{
  long int ret, temp;

  __asm__ __volatile__
    ("/* Inline spinlock test & set */\n\t"
     "1:\n\t"
     "ll   %0,%3\n\t"
     ".set push\n\t"
     ".set noreorder\n\t"
     "bnez %0,2f\n\t"
     " li  %1,1\n\t"
     ".set pop\n\t"
     "sc   %1,%2\n\t"
     "beqz %1,1b\n"
     "2:\n\t"
     "/* End spinlock test & set */"
     : "=r" (ret), "=r" (temp), "=m" (*spinlock)
     : "m" (*spinlock)
     : "memory");

  return ret;
}

#else /* !(_MIPS_ISA >= _MIPS_ISA_MIPS2) */

PT_EI long int
testandset (int *spinlock)
{
  return _test_and_set (spinlock, 1);
}

#endif /* !(_MIPS_ISA >= _MIPS_ISA_MIPS2) */

/* Get some notion of the current stack.  Need not be exactly the top
   of the stack, just something somewhere in the current frame.  */
#define CURRENT_STACK_FRAME stack_pointer
register char * stack_pointer __asm__ ("$29");

/* Compare-and-swap for semaphores.  */
#if (_MIPS_ISA >= _MIPS_ISA_MIPS2)

#define HAS_COMPARE_AND_SWAP
PT_EI int
__compare_and_swap (long int *p, long int oldval, long int newval)
{
  long int ret;

  __asm__ __volatile__
    ("/* Inline compare & swap */\n\t"
     "1:\n\t"
     "ll   %0,%4\n\t"
     ".set push\n"
     ".set noreorder\n\t"
     "bne  %0,%2,2f\n\t"
     " move %0,%3\n\t"
     ".set pop\n\t"
     "sc   %0,%1\n\t"
     "beqz %0,1b\n"
     "2:\n\t"
     "/* End compare & swap */"
     : "=r" (ret), "=m" (*p)
     : "r" (oldval), "r" (newval), "m" (*p)
     : "memory");

  return ret;
}
#endif /* (_MIPS_ISA >= _MIPS_ISA_MIPS2) */

---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/users-lounge/docs/faq.html
Re: [HACKERS] Banner links not working (fwd)
On Mon, Mar 12, 2001 at 08:05:26PM +, Peter Mount wrote: At 11:41 12/03/01 -0500, Vince Vielhaber wrote: On Mon, 12 Mar 2001, Peter Mount wrote: Bottom of every page (part of the template) is both my name and email address ;-) Can we slightly enlarge the font? Can do. What size do you think is best? I've always used size=1 for that line... Absolute font sizes in HTML are always a mistake. size="-1" would do. -- Nathan Myers [EMAIL PROTECTED] ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])
Re: [HACKERS] Internationalized dates (was Internationalized error messages)
On Mon, Mar 12, 2001 at 11:11:46AM +0100, Karel Zak wrote: On Fri, Mar 09, 2001 at 10:58:02PM +0100, Kaare Rasmussen wrote: Now you're talking about i18n, maybe someone could think about input and output of dates in the local language. As far as I can tell, PostgreSQL will only use English for dates, eg January, February, and weekdays, Monday, Tuesday etc. Not the local names. Maybe add a special mask to to_char() and use locales for this, but I'm not sure. It isn't easy -- arbitrary sizes of strings, to_char's cache problems -- and parsing input with locales is more difficult still. The other thing is speed... A solution is to use number-based dates without names :-( ISO has published a standard on date/time formats, ISO 8601. Dates look like "2001-03-22". Times look like "12:47:03". The only unfortunate feature is its standard format for a combined date/time: "2001-03-22T12:47:03". To me the ISO date format is far better than anything involving month names. I'd like to see ISO 8601 as the default date format. -- Nathan Myers [EMAIL PROTECTED] ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://www.postgresql.org/search.mpl
Re: [HACKERS] Uh, this is *not* a 64-bit CRC ...
On Mon, Mar 05, 2001 at 02:00:59PM -0500, Tom Lane wrote: [EMAIL PROTECTED] (Nathan Myers) writes: The CRC-64 code used in the SWISS-PROT genetic database is (now) at: ftp://ftp.ebi.ac.uk/pub/software/swissprot/Swissknife/old/SPcrc.tar.gz From the README: The code in this package has been derived from the BTLib package obtained from Christian Iseli [EMAIL PROTECTED]. From his mail: The reference is: W. H. Press, S. A. Teukolsky, W. T. Vetterling, and B. P. Flannery, "Numerical recipes in C", 2nd ed., Cambridge University Press. Pages 896ff. The generator polynomial is x^64 + x^4 + x^3 + x^1 + 1. Nathan (or anyone else with a copy of "Numerical recipes in C", which I'm embarrassed to admit I don't own), is there any indication in there that anyone spent any effort on choosing that particular generator polynomial? As far as I can see, it violates one of the standard guidelines for choosing a polynomial, namely that it be a multiple of (x + 1) ... which in modulo-2 land is equivalent to having an even number of terms, which this ain't got. See Ross Williams' A PAINLESS GUIDE TO CRC ERROR DETECTION ALGORITHMS, available from ftp://ftp.rocksoft.com/papers/crc_v3.txt among other places, which is by far the most thorough and readable thing I've ever seen on CRCs. I spent some time digging around the net for standard CRC64 polynomials, and the only thing I could find that looked like it might have been picked by someone who understood what they were doing is in the DLT (digital linear tape) standard, ECMA-182 (available from http://www.ecma.ch/ecma1/STAND/ECMA-182.HTM): x^64 + x^62 + x^57 + x^55 + x^54 + x^53 + x^52 + x^47 + x^46 + x^45 + x^40 + x^39 + x^38 + x^37 + x^35 + x^33 + x^32 + x^31 + x^29 + x^27 + x^24 + x^23 + x^22 + x^21 + x^19 + x^17 + x^13 + x^12 + x^10 + x^9 + x^7 + x^4 + x + 1 I'm sorry to have taken so long to reply.
The polynomial chosen for SWISS-PROT turns out to be presented, in Numerical Recipes, just as an example of a primitive polynomial of that degree; no assertion is made about its desirability for error checking. It is (in turn) drawn from E. J. Watson, "Mathematics of Computation", vol. 16, pp368-9. Having (x + 1) as a factor guarantees to catch all errors in which an odd number of bits have been changed. Presumably you are then infinitesimally less likely to catch all errors in which an even number of bits have been changed. I would have posted the ECMA-182 polynomial if I had found it. (That was good searching!) One hopes that the ECMA polynomial was chosen more carefully than entirely at random. High-degree codes are often chosen by Monte Carlo methods, by applying statistical tests to randomly-chosen values, because the search space is so large. I have verified that Tom transcribed the polynomial correctly from the PDF image. The ECMA document doesn't say whether their polynomial is applied "bit-reversed", but the check would be equally strong either way. Nathan Myers [EMAIL PROTECTED] ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/users-lounge/docs/faq.html
Re: [HACKERS] WAL SHM principles
Sorry for taking so long to reply... On Wed, Mar 07, 2001 at 01:27:34PM -0800, Mikheev, Vadim wrote: Nathan wrote: It is possible to build a logging system so that you mostly don't care when the data blocks get written [after being changed, as long as they get written by an fsync]; a particular data block on disk is considered garbage until the next checkpoint, so that you How to know if a particular data page was modified if there is no log record for that modification? (Ie how to know where is garbage? -:)) In such a scheme, any block on disk not referenced up to (and including) the last checkpoint is garbage, and is either blank or reflects a recent logged or soon-to-be-logged change. Everything written (except in the log) after the checkpoint thus has to happen in blocks not otherwise referenced from on-disk structures -- except in other post-checkpoint blocks. During recovery, the log contents get written back to those pages at startup. Blocks that actually got written before the crash are not changed by being overwritten from the log, but that's ok. If they got written before the corresponding log entry, too, nothing references them, so they are considered blank. might as well allow the blocks to be written any time, even before the log entry. And what to do with index tuples pointing to unupdated heap pages after that? Maybe index pages are cached in shm and copied to mmapped blocks after it is ok for them to be written. What platforms does PG run on that don't have mmap()? Nathan Myers [EMAIL PROTECTED] ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
[HACKERS] doxygen PG
Is this page http://members.fortunecity.com/nymia/postgres/dox/backend/html/ common knowledge? It appears to be an automatically-generated cross-reference documentation web site. My impression is that appropriately-marked comments in the code get extracted to the web pages, too, so it is also a way to automate internal documentation. Nathan Myers [EMAIL PROTECTED] ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])
Re: [HACKERS] doxygen PG
On Sat, Mar 10, 2001 at 06:29:37PM -0500, Tom Lane wrote: [EMAIL PROTECTED] (Nathan Myers) writes: Is this page http://members.fortunecity.com/nymia/postgres/dox/backend/html/ common knowledge? Interesting, but bizarrely incomplete. (Yeah, we have only ~100 struct types ... sure ...) It does say "version 0.0.1". What was interesting to me is that the interface seems a lot more helpful than the current CVS web gateway. If it were to be completed, and could be kept up to date automatically, something like it could be very useful. Nathan Myers [EMAIL PROTECTED] ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] Internationalized error messages
On Fri, Mar 09, 2001 at 12:05:22PM -0500, Tom Lane wrote: Gettext takes care of this. In the source you'd write elog(ERROR, "2200G", gettext("type mismatch in CASE expression (%s vs %s)"), string, string); Duh. For some reason I was envisioning the localization substitution as occurring on the client side, but of course we'd want to do it on the server side, and before parameters are substituted into the message. Sorry for the noise. I am not sure we can/should use gettext (possible license problems?), but certainly something like this could be cooked up. I've been assuming that PG's needs are specialized enough that the project wouldn't use gettext directly, but instead something inspired by it. If you look at my last posting on the subject, by the way, you will see that it could work without a catalog underneath; integrating a catalog would just require changes in a header file (and the programs to generate the catalog, of course). That quality seems to me essential to allow the changeover to be phased in gradually, and to allow different underlying catalog implementations to be tried out. Nathan ncm ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/users-lounge/docs/faq.html
Re: [HACKERS] Use SIGQUIT instead of SIGUSR1?
On Thu, Mar 08, 2001 at 04:06:16PM -0500, Tom Lane wrote: To implement the idea of performing a checkpoint after every so many XLOG megabytes (as well as after every so many seconds), I need to pick an additional signal number for the postmaster to accept. Seems like the most appropriate choice for this is SIGUSR1, which isn't currently being used at the postmaster level. However, if I just do that, then SIGUSR1 and SIGQUIT will have completely different meanings for the postmaster and for the backends, in fact SIGQUIT to the postmaster means send SIGUSR1 to the backends. This seems hopelessly confusing. I think it'd be a good idea to change the code so that SIGQUIT is the per-backend quickdie() signal, not SIGUSR1, to bring the postmaster and backend signals back into some semblance of agreement. For the moment we could leave the backends also accepting SIGUSR1 as quickdie, just in case someone out there is in the habit of sending that signal manually to individual backends. Eventually backend SIGUSR1 might be reassigned to mean something else. (I suspect Bruce is coveting it already ;-).) The number and variety of signals used in PG is already terrifying. Attaching a specific meaning to SIGQUIT may be dangerous if the OS and its daemons also send SIGQUIT to mean something subtly different. I'd rather see a reduction in the use of signals, and a movement toward more modern, better behaved interprocess communication mechanisms. Still, "if it were done when 'tis done, then 'twere well It were done" cleanly. -- Nathan Myers [EMAIL PROTECTED] ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://www.postgresql.org/search.mpl
Re: [HACKERS] Proposed WAL changes
On Wed, Mar 07, 2001 at 11:09:25AM -0500, Tom Lane wrote: "Vadim Mikheev" [EMAIL PROTECTED] writes: * Store two past checkpoint locations, not just one, in pg_control. On startup, we fall back to the older checkpoint if the newer one is unreadable. Also, a physical copy of the newest checkpoint record And what to do if older one is unreadable too? (Isn't it like using 2 x CRC32 instead of CRC64 ? -:)) Then you lose --- but two checkpoints gives you twice the chance of recovery (probably more, actually, since it's much more likely that the previous checkpoint will have reached disk safely). Actually far more: if the checkpoints are minutes apart, even the worst disk drive will certainly have flushed any blocks written for the earlier checkpoint. -- Nathan Myers [EMAIL PROTECTED] ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://www.postgresql.org/search.mpl
Re: [HACKERS] WAL SHM principles
On Wed, Mar 07, 2001 at 11:21:37AM -0500, Tom Lane wrote: Bruce Momjian [EMAIL PROTECTED] writes: The only problem is that we would no longer have control over which pages made it to disk. The OS would perhaps write pages as we modified them. Not sure how important that is. Unfortunately, this alone is a *fatal* objection. See nearby discussions about WAL behavior: we must be able to control the relative timing of WAL write/flush and data page writes. Not so fast! It is possible to build a logging system so that you mostly don't care when the data blocks get written; a particular data block on disk is considered garbage until the next checkpoint, so that you might as well allow the blocks to be written any time, even before the log entry. Letting the OS manage sharing of disk block images via mmap should be an enormous win vs. a fixed shm and manual scheduling by PG. If that requires changes in the logging protocol, it's worth it. (What supported platforms don't have mmap?) Nathan Myers [EMAIL PROTECTED] ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [HACKERS] Proposed WAL changes
On Wed, Mar 07, 2001 at 12:03:41PM -0800, Mikheev, Vadim wrote: Ian wrote: I feel that the fact that WAL can't help in the event of disk errors is often overlooked. This is true in general. But, nevertheless, WAL can be written to protect against predictable disk errors, when possible. Failing to write a couple of disk blocks when the system crashes (or, more likely, when power drops; a system crash shouldn't keep the disk from draining its buffers) is a reasonably predictable disk error. WAL should ideally be written to work correctly in that situation. But what can be done if fsync returns before pages flushed? Just what Tom has done: preserve a little more history. If it's not too expensive, then it doesn't hurt you when running on sound hardware, but it offers a good chance of preventing embarrassments for (the overwhelming fraction of) users on garbage hardware. Nathan Myers [EMAIL PROTECTED] ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] Red Hat bashing
On Tue, Mar 06, 2001 at 04:20:13PM -0500, Lamar Owen wrote: Nathan Myers wrote: That is why there is no problem with version skew in the syscall argument structures on a correctly-configured Linux system. (On a Red Hat system it is very easy to get them out of sync, but RH fans are used to problems.) Is RedHat bashing really necessary here? I recognize that my last seven words above contributed nothing. In the future I will only post strictly factual statements about Red Hat and similarly charged topics, and keep the opinions to myself. I value the collegiality of this list too much to risk it further. I offer my apologies for violating it. By the way... do they call Red Hat "RedHat" at Red Hat? Nathan Myers [EMAIL PROTECTED] ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] How to shoot yourself in the foot: kill -9 postmaster
On Mon, Mar 05, 2001 at 08:55:41PM -0500, Tom Lane wrote: Bruce Momjian [EMAIL PROTECTED] writes: killproc should send a kill -15 to the process, wait a few seconds for it to exit. If it does not, try kill -1, and if that doesn't kill it, then kill -9. Tell it to the Linux people ... this is their boot-script code we're talking about. Not to be a zealot, but this isn't _Linux_ boot-script code, it's _Red Hat_ boot-script code. Red Hat would like for us all to confuse the two, but they jes' ain't the same. (As a rule of thumb, where it works right, credit Linux; where it doesn't, blame Red Hat. :-) Nathan Myers [EMAIL PROTECTED] ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [HACKERS] WAL RC1 status
On Fri, Mar 02, 2001 at 10:54:04AM -0500, Bruce Momjian wrote: Bruce Momjian [EMAIL PROTECTED] writes: Is there a version number in the WAL file? catversion.h will do fine, no? Can we put conditional code in there to create new log file records with an updated format? The WAL stuff is *far* too complex already. I've spent a week studying it and I only partially understand it. I will not consent to trying to support multiple log file formats concurrently. Well, I was thinking a few things. Right now, if we update the catversion.h, we will require a dump/reload. If we can update just the WAL version stamp, that will allow us to fix WAL format problems without requiring people to dump/reload. I can imagine this would be valuable if we find we need to make changes in 7.1.1, where we can not require dump/reload. It Seems to Me that after an orderly shutdown, the WAL files should be, effectively, slag -- they should contain no deltas from the current table contents. In practice that means the only part of the format that *should* matter is whatever it takes to discover that they really are slag. That *should* mean that, at worst, a change to the WAL file format should only require doing an orderly shutdown, and then (perhaps) running a simple program to generate a new-format empty WAL. It ought not to require an initdb. Of course the details of the current implementation may interfere with that ideal, but it seems a worthy goal for the next beta, if it's not possible already. Given the opportunity to change the current WAL format, it ought to be possible to avoid even needing to run a program to generate an empty WAL. Nathan Myers [EMAIL PROTECTED] ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [HACKERS] Uh, this is *not* a 64-bit CRC ...
On Wed, Feb 28, 2001 at 04:53:09PM -0500, Tom Lane wrote: I just took a close look at the COMP_CRC64 macro in xlog.c. This isn't a 64-bit CRC. It's two independent 32-bit CRCs, one done on just the odd-numbered bytes and one on just the even-numbered bytes of the datastream. That's hardly any stronger than a single 32-bit CRC; it's certainly not what I thought we had agreed to implement. We can't change this algorithm without forcing an initdb, which would be a rather unpleasant thing to do at this late stage of the release cycle. But I'm not happy with it. Comments? This might be a good time to update: The CRC-64 code used in the SWISS-PROT genetic database is (now) at: ftp://ftp.ebi.ac.uk/pub/software/swissprot/Swissknife/old/SPcrc.tar.gz From the README: The code in this package has been derived from the BTLib package obtained from Christian Iseli [EMAIL PROTECTED]. From his mail: The reference is: W. H. Press, S. A. Teukolsky, W. T. Vetterling, and B. P. Flannery, "Numerical recipes in C", 2nd ed., Cambridge University Press. Pages 896ff. The generator polynomial is x^64 + x^4 + x^3 + x^1 + 1. I would suggest that if you don't change the algorithm, at least change the name in the sources. Were you to #ifdef in a real crc-64, and make a compile-time option to select the old one, you could allow users who wish to avoid the initdb a way to continue with the existing pair of CRC-32s. Nathan Myers [EMAIL PROTECTED]
Re: [HACKERS] Uh, this is *not* a 64-bit CRC ...
On Wed, Feb 28, 2001 at 09:17:19PM -0500, Bruce Momjian wrote: On Wed, Feb 28, 2001 at 04:53:09PM -0500, Tom Lane wrote: I just took a close look at the COMP_CRC64 macro in xlog.c. This isn't a 64-bit CRC. It's two independent 32-bit CRCs, one done on just the odd-numbered bytes and one on just the even-numbered bytes of the datastream. That's hardly any stronger than a single 32-bit CRC; it's certainly not what I thought we had agreed to implement. We can't change this algorithm without forcing an initdb, which would be a rather unpleasant thing to do at this late stage of the release cycle. But I'm not happy with it. Comments? This might be a good time to update: The CRC-64 code used in the SWISS-PROT genetic database is (now) at: ftp://ftp.ebi.ac.uk/pub/software/swissprot/Swissknife/old/SPcrc.tar.gz From the README: The code in this package has been derived from the BTLib package obtained from Christian Iseli [EMAIL PROTECTED]. From his mail: The reference is: W. H. Press, S. A. Teukolsky, W. T. Vetterling, and B. P. Flannery, "Numerical recipes in C", 2nd ed., Cambridge University Press. Pages 896ff. The generator polynomial is x^64 + x^4 + x^3 + x^1 + 1. I would suggest that if you don't change the algorithm, at least change the name in the sources. Were you to #ifdef in a real crc-64, and make a compile-time option to select the old one, you could allow users who wish to avoid the initdb a way to continue with the existing pair of CRC-32s. Added to TODO: * Correct CRC WAL code to be normal CRC32 algorithm Um, how about * Correct CRC WAL code to be a real CRC64 algorithm instead? Nathan Myers [EMAIL PROTECTED]
Re: [HACKERS] CommitDelay performance improvement
On Sun, Feb 25, 2001 at 12:41:28AM -0500, Tom Lane wrote: Attached are graphs from more thorough runs of pgbench with a commit delay that occurs only when at least N other backends are running active transactions. ... It's not entirely clear what set of parameters is best, but it is absolutely clear that a flat zero-commit-delay policy is NOT best. The test conditions are postmaster options -N 100 -B 1024, pgbench scale factor 10, pgbench -t (transactions per client) 100. (Hence the results for a single client rely on only 100 transactions, and are pretty noisy. The noise level should decrease as the number of clients increases.) It's hard to interpret these results. In particular, "delay 10k, sibs 20" (10k,20), or cyan-triangle, is almost the same as "delay 50k, sibs 1" (50k,1), or green X. Those are pretty different parameters to get such similar results. The only really bad performers were (0), (10k,1), (100k,20). The best were (30k,1) and (30k,10), although (30k,5) also did well except at 40. Why would 30k be a magic delay, regardless of siblings? What happened at 40? At low loads, it seems (100k,1) (brown +) did best by far, which seems very odd. Even more odd, it did pretty well at very high loads but had problems at intermediate loads. Nathan Myers [EMAIL PROTECTED]
Re: [HACKERS] CommitDelay performance improvement
On Sat, Feb 24, 2001 at 01:07:17AM -0500, Tom Lane wrote: [EMAIL PROTECTED] (Nathan Myers) writes: I see, I had it backwards: N=0 corresponds to "always delay", and N=infinity (~0) is "never delay", or what you call zero delay. N=1 is not interesting. N=M/2 or N=sqrt(M) or N=log(M) might be interesting, where M is the number of backends, or the number of backends with begun transactions, or something. N=10 would be conservative (and maybe pointless) just because it would hardly ever trigger a delay. Why is N=1 not interesting? That requires at least one other backend to be in a transaction before you'll delay. That would seem to be the minimum useful value --- N=0 (always delay) seems clearly to be too stupid to be useful. N=1 seems arbitrarily aggressive. It assumes any open transaction will commit within a few milliseconds; otherwise the delay is wasted. On a fairly busy system, it seems to me to impose a strict upper limit on transaction rate for any client, regardless of actual system I/O load. (N=0 would impose that strict upper limit even for a single client.) Delaying isn't free, because it means that the client can't turn around and do even a cheap query for a while. In a sense, when you delay you are charging the committer a tax to try to improve overall throughput. If the delay lets you reduce I/O churn enough to increase the total bandwidth, then it was worthwhile; if not, you just cut system performance, and responsiveness to each client, for nothing. The above suggests that maybe N should depend on recent disk I/O activity, so you get a larger N (and thus less likely delay and more certain payoff) for a more lightly-loaded system. On a system that has maxed its I/O bandwidth, clients will suffer delays anyhow, so they might as well suffer controlled delays that result in better total throughput. On a lightly-loaded system there's no need, or payoff, for such throttling. Can we measure disk system load by averaging the times taken for fsyncs? 
Nathan Myers [EMAIL PROTECTED]
Re: [HACKERS] regression test form
On Fri, Feb 23, 2001 at 03:53:14PM -0500, Vince Vielhaber wrote: Yes there are some extra linuxes, if no one comes up with another distro I'll lop the extras off. BTW, is VA Linux a distribution or just a tool company?? Debian is a pretty important Linux distribution, probably second only to Red Hat in number of installations. PG is packaged for it by Oliver Elphick, who is on this list. Debian is currently supported on x86, SPARC, PowerPC, M68K, ARM, and Alpha architectures. VA Linux is a hardware vendor. They ship with any of Red Hat, Debian, or Suse distributions installed, per customer preference. Nathan Myers [EMAIL PROTECTED]
Re: [HACKERS] CommitDelay performance improvement
On Fri, Feb 23, 2001 at 11:32:21AM -0500, Tom Lane wrote: A further refinement, still quite cheap to implement since the info is in the PROC struct, would be to not count backends that are blocked waiting for locks. These guys are less likely to be ready to commit in the next few milliseconds than the guys who are actively running; indeed they cannot commit until someone else has committed/aborted to release the lock they need. Comments? What should the threshold N be ... or do we need to make that a tunable parameter? Once you make it tuneable, you're stuck with it. You can always add a knob later, after somebody discovers a real need. Nathan Myers [EMAIL PROTECTED]
Re: [HACKERS] CommitDelay performance improvement
On Fri, Feb 23, 2001 at 05:18:19PM -0500, Tom Lane wrote: [EMAIL PROTECTED] (Nathan Myers) writes: Comments? What should the threshold N be ... or do we need to make that a tunable parameter? Once you make it tuneable, you're stuck with it. You can always add a knob later, after somebody discovers a real need. If we had a good idea what the default level should be, I'd be willing to go without a knob. I'm thinking of a default of about 5 (ie, at least 5 other active backends to trigger a commit delay) ... but I'm not so confident of that that I think it needn't be tunable. It's really dependent on your average and peak transaction lengths, and that's going to vary across installations, so unless we want to try to make it self-adjusting, a knob seems like a good idea. A self-adjusting delay might well be a great idea, BTW, but I'm trying to be conservative about how much complexity we should add right now. When thinking about tuning N, I like to consider what are the interesting possible values for N: 0: Ignore any other potential committers. 1: The minimum possible responsiveness to other committers. 5: Tom's guess for what might be a good choice. 10: Harry's guess. ~0: Always delay. I would rather release with N=1 than with 0, because it actually responds to conditions. What N might best be, 1, probably varies on a lot of hard-to-guess parameters. It seems to me that comparing various choices (and other, more interesting, algorithms) to the N=1 case would be more productive than comparing them to the N=0 case, so releasing at N=1 would yield better statistics for actually tuning in 7.2. Nathan Myers [EMAIL PROTECTED]
Re: [HACKERS] CommitDelay performance improvement
On Fri, Feb 23, 2001 at 06:37:06PM -0500, Bruce Momjian wrote: When thinking about tuning N, I like to consider what are the interesting possible values for N: 0: Ignore any other potential committers. 1: The minimum possible responsiveness to other committers. 5: Tom's guess for what might be a good choice. 10: Harry's guess. ~0: Always delay. I would rather release with N=1 than with 0, because it actually responds to conditions. What N might best be, 1, probably varies on a lot of hard-to-guess parameters. It seems to me that comparing various choices (and other, more interesting, algorithms) to the N=1 case would be more productive than comparing them to the N=0 case, so releasing at N=1 would yield better statistics for actually tuning in 7.2. We don't release code because it has better tuning opportunities for later releases. What we can do is give people parameters where the default is safe, and they can play and report to us. Perhaps I misunderstood. I had perceived N=1 as a conservative choice that was nevertheless preferable to N=0. Nathan Myers [EMAIL PROTECTED]
Re: [HACKERS] CommitDelay performance improvement
On Fri, Feb 23, 2001 at 09:05:20PM -0500, Bruce Momjian wrote: It seems to me that comparing various choices (and other, more interesting, algorithms) to the N=1 case would be more productive than comparing them to the N=0 case, so releasing at N=1 would yield better statistics for actually tuning in 7.2. We don't release code because it has better tuning opportunities for later releases. What we can do is give people parameters where the default is safe, and they can play and report to us. Perhaps I misunderstood. I had perceived N=1 as a conservative choice that was nevertheless preferable to N=0. I think zero delay is the conservative choice at this point, unless we hear otherwise from testers. I see, I had it backwards: N=0 corresponds to "always delay", and N=infinity (~0) is "never delay", or what you call zero delay. N=1 is not interesting. N=M/2 or N=sqrt(M) or N=log(M) might be interesting, where M is the number of backends, or the number of backends with begun transactions, or something. N=10 would be conservative (and maybe pointless) just because it would hardly ever trigger a delay. Nathan Myers [EMAIL PROTECTED]
Re: [HACKERS] GPL, readline, and static/dynamic linking
On Thu, Feb 22, 2001 at 10:50:17AM -0500, Bruce Momjian wrote: Let me add I don't agree with this, and find the whole GPL heavy-handedness very distasteful. Please, not this again. Is there a piss-and-moan-about-the-GPL schedule posted somewhere? Either PG is in compliance, or it's not. Only libreadline's copyright holder has the right to complain if it's not. There is no need to speculate; if we care about compliance, we need only ask the owner. If the owner says we're violating his license, then we can comply, or negotiate, or stop using the code. The GPL is no different from any other license, that way. Complaining about the terms on something you got for nothing has to be the biggest waste of time and attention I've seen on this list. Nathan Myers [EMAIL PROTECTED]
Re: [HACKERS] Re: WAL and commit_delay
On Sun, Feb 18, 2001 at 11:51:50AM -0500, Tom Lane wrote: Adriaan Joubert [EMAIL PROTECTED] writes: fdatasync() is available on Tru64 and according to the man-page behaves as Tom expects. So it should be a win for us. Careful ... HPUX's man page also claims that fdatasync does something useful, but it doesn't. I'd recommend an experiment. Does today's snapshot run any faster for you (without -F) than before? It's worth noting in documentation that systems that don't have fdatasync(), or that have the phony implementation, can get the same benefit by using a raw volume (partition) for the log file. This applies even on Linux 2.0 and 2.2 without the "raw-i/o" patch. Using raw volumes would have other performance benefits, even on systems that do fully support fdatasync, through bypassing the buffer cache. (The above assumes I understood correctly Vadim's postings about changes he made to support putting logs on raw volumes.) Nathan Myers [EMAIL PROTECTED]
Re: [HACKERS] WAL and commit_delay
On Sat, Feb 17, 2001 at 03:45:30PM -0500, Bruce Momjian wrote: Right now the WAL preallocation code (XLogFileInit) is not good enough because it does lseek to the 16MB position and then writes 1 byte there. On an implementation that supports holes in files (which is most Unixen) that doesn't cause physical allocation of the intervening space. We'd have to actually write zeroes into all 16MB to ensure the space is allocated ... but that's just a couple more lines of code. Are OS's smart enough to not allocate zero-written blocks? No, but some disks are. Writing zeroes is a bit faster on smart disks. This has no real implications for PG, but it is one of the reasons that writing zeroes doesn't really wipe a disk, for forensic purposes. Nathan Myers [EMAIL PROTECTED]
Re: [HACKERS] Microsecond sleeps with select()
On Sat, Feb 17, 2001 at 12:26:31PM -0500, Tom Lane wrote: Bruce Momjian [EMAIL PROTECTED] writes: A comment on microsecond delays using select(). Most Unix kernels run at 100Hz, meaning that they have a programmable timer that interrupts the CPU every 10 milliseconds. Right --- this probably also explains my observation that some kernels seem to add an extra 10msec to the requested sleep time. Actually they're interpreting a one-clock-tick select() delay as "wait till the next clock tick, plus one tick". The actual delay will be between one and two ticks depending on just when you went to sleep. ... In short: s_spincycle in its current form does not do anything anywhere near what the author thought it would. It's wasted complexity. I am thinking about simplifying s_lock_sleep down to simple wait-one-tick-on-every-call logic. An alternative is to keep s_spincycle, but populate it with, say, 1, 2 and larger entries, which would offer some hope of actual random-backoff behavior. Either change would clearly be a win on single-CPU machines, and I doubt it would hurt on multi-CPU machines. Comments? I don't believe that most kernels schedule only on clock ticks. They schedule on a clock tick *or* whenever a process yields, which on a loaded system may happen much more frequently. The question is whether, when scheduling, the kernel considers processes that have asked to sleep for less than a clock tick as "ready" once their actual requested time has expired. On V7 Unix, the answer was no, because the kernel had no way to measure any time shorter than a tick, so it rounded all sleeps up to "the next tick". Certainly there are machines and kernels that count time more precisely (isn't PG ported to QNX?). We do users of such kernels no favors by pretending they only count clock ticks. Furthermore, a 1ms clock tick is pretty common, e.g. on Alpha boxes. A 10ms initial delay is ten clock ticks, far longer than seems appropriate.
This argues for yielding the minimum discernable amount of time (1us) and then backing off to a less-minimal time (1ms). On systems that chug at 10ms, this is equivalent to a sleep of up-to-10ms (i.e. until the next tick), then a sequence of 10ms sleeps; on dumbOS Alphas, it's equivalent to a sequence of 1ms sleeps; and on a smartOS on an Alpha it's equivalent to a short, variable time (long enough for other runnable processes to run and yield) followed by a sequence of 1ms sleeps. (Some of the numbers above are doubled on really dumb kernels, as Tom noted.) Nathan Myers [EMAIL PROTECTED]
Re: [HACKERS] Re: WAL and commit_delay
On Sat, Feb 17, 2001 at 06:30:12PM -0500, Brent Verner wrote: On 17 Feb 2001 at 17:56 (-0500), Tom Lane wrote: [snipped] | Is anyone out there running a 2.4 Linux kernel? Would you try pgbench | with current sources, commit_delay=0, -B at least 1024, no -F, and see | how the results change when pg_fsync is made to call fdatasync instead | of fsync? (It's in src/backend/storage/file/fd.c) I've not run this requested test, but glibc-2.2 provides this bit of code for fdatasync, so it /appears/ to me that kernel version will not affect the test case.

[glibc-2.2/sysdeps/generic/fdatasync.c]

int
fdatasync (int fildes)
{
  return fsync (fildes);
}

In the 2.4 kernel it says (fs/buffer.c):

	/* this needs further work, at the moment it is identical to fsync() */
	down(&inode->i_sem);
	err = file->f_op->fsync(file, dentry);
	up(&inode->i_sem);

We can probably expect this to be fixed in an upcoming 2.4.x, i.e. well before 2.6. This is moot, though, if you're writing to a raw volume, which you will be if you are really serious. Then, fsync really is equivalent to fdatasync. Nathan Myers [EMAIL PROTECTED]
Re: [HACKERS] locale support
On Mon, Feb 12, 2001 at 09:59:37PM -0500, Tom Lane wrote: Tatsuo Ishii [EMAIL PROTECTED] writes: I know this is not PostgreSQL's fault but the broken locale data on certain platforms. The problem makes it impossible to use PostgreSQL RPMs in Japan. I'm looking for solutions/workarounds for this problem. Build a set of RPMs without locale support? Run it with LC_ALL="C". Nathan Myers [EMAIL PROTECTED]
Re: [HACKERS] Syslog and pg_options (for RPMs)
On Thu, Feb 08, 2001 at 11:36:38PM -0500, Vince Vielhaber wrote: On 8 Feb 2001, Ian Lance Taylor wrote: Unfortunately, the license [to splogger] probably precludes including it with Postgres. Fortunately, it's only 72 lines long, and would be trivial to recreate. I missed most of this, but has anyone actually ASKED Dan for permission? What's the point? I've attached an independent implementation. It recognizes tags for all seven levels. It needs no command-line arguments. Untagged messages end up logged as "LOG_NOTICE". Use it freely. Nathan Myers [EMAIL PROTECTED]

--

/* pglogger: stdin-to-syslog gateway for postgresql.
 *
 * Copyright 2001 by Nathan Myers [EMAIL PROTECTED]
 * Permission is granted to make copies for any purpose if
 * this copyright notice is retained unchanged.
 */

#include <stdio.h>
#include <stddef.h>
#include <syslog.h>
#include <string.h>

char *levels[] = {
    "", "emerg:", "alert:", "crit:", "err:",
    "warning:", "notice:", "info:", "debug:"
};
int lengths[] = {
    0, sizeof("emerg"), sizeof("alert"), sizeof("crit"), sizeof("err"),
    sizeof("warning"), sizeof("notice"), sizeof("info"), sizeof("debug")
};
int priorities[] = {
    LOG_NOTICE, LOG_EMERG, LOG_ALERT, LOG_CRIT, LOG_ERR,
    LOG_WARNING, LOG_NOTICE, LOG_INFO, LOG_DEBUG
};

int main()
{
    char buf[301];
    int c;
    char *pos = buf;
    int colon = 0;

#ifndef DEBUG
    openlog("postgresql", LOG_CONS, LOG_LOCAL1);
#endif
    while ((c = getchar()) != EOF) {
        if (c == '\r') {
            continue;
        }
        if (c == '\n') {
            int level = (colon ? sizeof(levels)/sizeof(*levels) : 1);
            char *bol;

            *pos = 0;
            while (--level) {
                if (pos - buf >= lengths[level]
                    && strncmp(buf, levels[level], lengths[level]) == 0) {
                    break;
                }
            }
            bol = buf + lengths[level];
            if (bol > buf && *bol == ' ') {
                ++bol;
            }
            if (pos - bol > 0) {
#ifndef DEBUG
                syslog(priorities[level], "%s", bol);
#else
                printf("%d/%s\n", priorities[level], bol);
#endif
            }
            pos = buf;
            colon = 0;
            continue;
        }
        if (c == ':') {
            colon = 1;
        }
        if ((size_t)(pos - buf) < sizeof(buf) - 1) {
            *pos++ = c;
        }
    }
    return 0;
}
Re: [HACKERS] Btree runtime recovery. Stuck spins.
On Fri, Feb 09, 2001 at 01:23:35PM -0500, Tom Lane wrote: Bruce Momjian [EMAIL PROTECTED] writes: Our spinlocks don't go into an infinite test loop, right? They back off and retest at random intervals. Not very random --- either 0 or 10 milliseconds. (I think there was some discussion of changing that, but it died off without agreeing on anything.) ... I think we agreed that 0 was just wrong, but nobody changed it. Changing it to 1 microsecond would be the smallest reasonable change. As it is, it just does a bunch of no-op syscalls each time it wakes up after a 10ms sleep, without yielding the CPU. Nathan Myers [EMAIL PROTECTED]
Re: [HACKERS] Syslog and pg_options (for RPMs)
Here's the latest version of the pg_logger utility. The particular questions that come to my mind are: 1. Do the prefixes it watches for match what PG produces? 2. Should it log to LOG_LOCAL1 or to some other LOG_LOCALn? 3. Is the ident string ("postgresql") right? 4. Are the openlog() args right? (E.g. should it ask for LOG_PID too?) 5. What am I failing to ask about? I'd like to turn it over to whoever can answer those questions. Nathan Myers [EMAIL PROTECTED]

-

/* pg_logger: stdin-to-syslog gateway for postgresql.
 *
 * Copyright 2001 by Nathan Myers [EMAIL PROTECTED]
 * This software is distributed free of charge with no warranty of any kind.
 * You have permission to make copies for any purpose, provided that (1)
 * this copyright notice is retained unchanged, and (2) you agree to
 * absolve the author of all responsibility for all consequences arising
 * from any use.
 */

#include <stdio.h>
#include <stddef.h>
#include <syslog.h>
#include <string.h>

struct {
    char *name;
    int size;
    int priority;
} tags[] = {
    { "",         0,                 LOG_NOTICE },
    { "emerg:",   sizeof("emerg"),   LOG_EMERG },
    { "alert:",   sizeof("alert"),   LOG_ALERT },
    { "crit:",    sizeof("crit"),    LOG_CRIT },
    { "err:",     sizeof("err"),     LOG_ERR },
    { "error:",   sizeof("error"),   LOG_ERR },
    { "warning:", sizeof("warning"), LOG_WARNING },
    { "notice:",  sizeof("notice"),  LOG_NOTICE },
    { "info:",    sizeof("info"),    LOG_INFO },
    { "debug:",   sizeof("debug"),   LOG_DEBUG }
};

int main()
{
    char buf[301];
    int c;
    char *pos = buf;
    const char *colon = 0;

#ifndef DEBUG
    openlog("postgresql", LOG_CONS, LOG_LOCAL1);
#endif
    while ((c = getchar()) != EOF) {
        if (c == '\r') {
            continue;
        }
        if (c == '\n') {
            int level = sizeof(tags)/sizeof(*tags);
            char *bol;

            if (colon == 0 || (size_t)(colon - buf) > sizeof("warning")) {
                level = 1;
            }
            *pos = 0;
            while (--level) {
                if (pos - buf >= tags[level].size
                    && strncmp(buf, tags[level].name, tags[level].size) == 0) {
                    break;
                }
            }
            bol = buf + tags[level].size;
            if (bol > buf && *bol == ' ') {
                ++bol;
            }
            if (pos - bol > 0) {
#ifndef DEBUG
                syslog(tags[level].priority, "%s", bol);
#else
                printf("%d/%s\n", tags[level].priority, bol);
#endif
            }
            pos = buf;
            colon = (char const *)0;
            continue;
        }
        if (c == ':' && !colon) {
            colon = pos;
        }
        if ((size_t)(pos - buf) < sizeof(buf) - 1) {
            *pos++ = c;
        }
    }
    return 0;
}
Re: [HACKERS] Syslog and pg_options (for RPMs)
On Thu, Feb 08, 2001 at 04:00:12PM -0500, Lamar Owen wrote: "Dominic J. Eidson" wrote: On Thu, 8 Feb 2001, Lamar Owen wrote: A syslogger of stderr would make a nice place to pipe the output :-). 'postmaster 2>&1 | output-to-syslog-program -f facility.desired' or 2>&1 | logger -p facility.level [snip] Logger provides a shell command interface to the syslog(3) system log module. Good. POSIX required, and part of the base system (basically, guaranteed to be there on any Linux box). Thanks for the pointer. Not so fast... logger just writes its arguments to syslog. I don't see any indication that it (portably) reads its standard input. It's meant for use in shellscripts. You could write: ... 2>&1 | while read i; do logger -p local1.warning -t 'PG ' -- "$i"; done but syslog is pretty high-overhead already without starting up logger on every message. Maybe stderr messages are infrequent enough that it doesn't matter. Nathan Myers [EMAIL PROTECTED]
Re: [HACKERS] PL/pgsql EXECUTE 'SELECT INTO ...'
On Wed, Feb 07, 2001 at 10:15:02PM -0500, Tom Lane wrote: I have looked a little bit at what it'd take to make SELECT INTO inside an EXECUTE work the same as it does in plain plpgsql ... If we do nothing now, and then implement this feature in 7.2, we will have a backwards compatibility problem: EXECUTE 'SELECT INTO ...' will completely change in meaning. I am inclined to keep our options open by forbidding EXECUTE 'SELECT INTO ...' for now. ... if [not] I think we'll regret it later. I agree, disable it. But put a backpatch into contrib along with a reference to this last e-mail. Anybody who cares enough can apply the patch, and will be prepared for the incompatibility. Nathan Myers [EMAIL PROTECTED]
Re: [HACKERS] OID from insert has extra letter
On Tue, Feb 06, 2001 at 01:21:00PM -0500, Bruce Momjian wrote:

*** fe-exec.c	2001/01/24 19:43:30	1.98
--- fe-exec.c	2001/02/06 02:02:27	1.100
***************
*** 2035,2041 ****
  	if (len > 23)
  		len = 23;
  	strncpy(buf, res->cmdStatus + 7, len);
! 	buf[23] = '\0';
  	return buf;
  }
--- 2035,2041 ----
  	if (len > 23)
  		len = 23;
  	strncpy(buf, res->cmdStatus + 7, len);
! 	buf[len] = '\0';
  	return buf;
  }

Hmm, is there some undocumented feature of strncpy that I don't know about, where it modifies the passed length variable (which would be hard, since it's pass by value)? Otherwise, doesn't this patch just replace the constant '23' with the variable 'len', set to 23? What if len < 23? If len < 23, then strncpy will have terminated the destination already. Poking out buf[23] just compensates for a particular bit of brain damage in strncpy. Read the man page: The strncpy() function is similar [to strcpy], except that not more than n bytes of src are copied. Thus, if there is no null byte among the first n bytes of src, the result will not be null-terminated. Thus, the original code is OK, except probably the literal "23" in place of what should be a meaningful symbolic constant, or (at least!) sizeof(buf) - 1. BTW, that static buffer in PGoidStatus is likely to upset threaded client code... <ob-ed> To null-terminate strings is an Abomination. </ob-ed> Nathan Myers [EMAIL PROTECTED]
Re: [HACKERS] using the same connection?
On Tue, Feb 06, 2001 at 11:08:49AM -0500, Mathieu Dube wrote: Hi y'all, Is it a bad idea for an app to keep just a couple of connections to a database, put semaphore/mutex on them and reuse them all through the program? Of course I would check if their PQstatus isnt at CONNECTION_BAD and reconnect if they were... You would have to hold the lock from BEGIN until COMMIT. Otherwise, connection re-use is normal. Nathan Myers [EMAIL PROTECTED]
Re: [HACKERS] configure problem with krb4 and ssl when compiling 7.1beta4
On Fri, Feb 02, 2001 at 12:03:14PM +, Jun Kuwamura wrote: Furthermore, the newest version of PyGreSQL is 3.1 instead of 2.5. Is this on the TODO-7.1 list? Nathan Myers [EMAIL PROTECTED]
Re: [HACKERS] beta3 Solaris 7 (SPARC) port report
On Thu, Jan 25, 2001 at 09:47:16PM +0100, Frank Joerdens wrote: On Thu, Jan 25, 2001 at 12:04:40PM -0800, Ian Lance Taylor wrote: [ . . . ] for the /tmp directory, which looks distinctly odd to me. What kind of device is swap (I know what swap is normally but I didn't know you could mount stuff there . . . )?? That is a tmpfs file system which uses swap space for /tmp storage. Both swap usage and /tmp compete for the same partition on the disk. If you have a lot of swapping programs, you don't get to put much in /tmp. If you have a lot of files in /tmp, you don't get to run many programs. As far as I can recall, this is a Sun specific thing. It's a reasonable idea on a stable system. It's a pretty crummy idea on a development system, or one with unpredictable loads. My experience is that either something goes crazy and fills up /tmp and then you can't run anything else and you have to reboot, or something goes crazy and fills up swap and then you can't write any /tmp files and daemon processes start to silently die and you have to reboot. Very peculiar, or crummy, indeed. This system is not used by anyone else besides myself at the moment (cuz it's just being built up), as far as I can tell, and is ludicrously overpowered (3 CPUs, 768 MB RAM) for the mundane uses I am subjecting it to (installing and testing Postgresql). I doubt you can blame any problems on tmpfs, here. tmpfs has been in Solaris for many years, and has had plenty of time to stabilize. With 768M of RAM and running only PG, you are not using any swap space at all, and unix sockets don't use any appreciable space either, so the conflicts Ian describes are impossible in your case. Nathan Myers [EMAIL PROTECTED]
Re: AW: [HACKERS] like and optimization
On Mon, Jan 22, 2001 at 05:46:09PM -0500, Tom Lane wrote: Hannu Krosing [EMAIL PROTECTED] writes: Is there any possibility to use, in a portable way, only our own locale definition files, without reimplementing all the sorts uppercases etc. ? The situation is not too much different for timezones, BTW. Might make sense to deal with both of those problems in the same way. The timezone situation is much better, in that there is a separate organization which maintains a timezone database and code to operate on it. It wouldn't be necessary to include the package with PG, because it can be got at a standard place. You would only need scripts to download, build, and integrate it. Are there any BSD-license locale and/or timezone libraries that we might assimilate in this way? We could use an LGPL'd library if there is no other alternative, but I'd just as soon not open up the license issue. Posix systems include a set of commands for dumping locales in a standard format, and building from them. Instead of shipping locales and code to operate on them, one might include a script to run these tools (where they exist) to dump an existing locale, edit it a bit, and build a more PG-friendly locale. Nathan Myers [EMAIL PROTECTED]