Re: [HACKERS] ALL, ANY bug?
Tatsuo Ishii [EMAIL PROTECTED] writes:
> You are right. However, SQL92 spec allows:
>     select * from t1 where i all values(0,1);

No, I still disagree.  The ANY/ALL constructs compare a single row on the
left side with all the rows produced by the query on the right.
"values(0,1)" is effectively the same as "SELECT 0,1", ie, it is a query
that happens to produce just one row.  The above is illegal in SQL92
because the lefthand side is not a two-item row.

The semantics you seem to be looking for is "scalar IN/NOT IN list",
which we do have.  ANY/ALL is a different animal.  It's true that we
don't accept values(...) as a subquery, but that's only one aspect of
this example.

			regards, tom lane
Re: [HACKERS] ALL, ANY bug?
Tom Lane wrote:
> No, I still disagree.  The ANY/ALL constructs compare a single row on
> the left side with all the rows produced by the query on the right.
> "values(0,1)" is effectively the same as "SELECT 0,1", ie, it is a
> query that happens to produce just one row.  The above is illegal in
> SQL92 because the lefthand side is not a two-item row.

Oh, I see your point: values(0,1) = "SELECT 0,1", which is a one-row
constructor.  Sorry for the confusion.
--
Tatsuo Ishii
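The distinction drawn in this thread can be illustrated with a short SQL sketch. This is editorial illustration, not code from the thread; the column j is hypothetical, and support for row constructors with ANY varies by implementation and version.

```sql
-- Scalar IN list: one value compared against a list of scalars.
-- This is the construct that actually matches the intent above.
SELECT * FROM t1 WHERE i IN (0, 1);

-- ANY/ALL takes a subquery on the right.  "VALUES (0, 1)" behaves
-- like "SELECT 0, 1": one row with two columns.  Per SQL92 the left
-- side must then be a two-item row constructor, e.g.:
SELECT * FROM t1 WHERE (i, j) = ANY (SELECT 0, 1);
```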
Re: [HACKERS] pg_dump return status..
Nathan Myers writes:
> On Fri, Jan 05, 2001 at 11:20:43AM -0500, Tom Lane wrote:
> > Philip Warner [EMAIL PROTECTED] writes:
> > > how do I check for a failed write in a way that works on all
> > > Unixes?  Is the following OK:
> > > - fwrite: ok if return value equals item count
> > > - fprintf: ok if return value >= 0
> > > - fputc: ok if != EOF
> >
> > Probably fprintf() >= 0 --- according to my specs, it returns the
> > number of chars emitted, or a negative value on error.  The other
> > two are correct.
>
> An fprintf returning 0 is a suspicious event; it's easy to imagine
> cases where it makes sense, but I don't think I have ever coded one.
> Probably >= N (where N is the smallest reasonable output, defaulting
> to 1) may be a better test in real code.
>
> As I recall, on SunOS 4 the printf()s don't return the number of
> characters written.  I don't recall what they do instead, and have no
> access to such machines any more.  Other old BSD-derived systems are
> likely to have wonky return values/types on the printf()s.  Looking at
> the list of supported platforms, none jump out as likely candidates,
> but in the "unsupported" list, Ultrix and NextStep do.  (Do we care?)
> If SunOS 4 is to remain a supported platform, the printf checks may
> need to be special-cased for it.

Current Solaris is liable to problems still, though these are not
relevant to this thread.  printf() and fprintf() have always returned
the number of characters transmitted, or EOF for failure.  It is
sprintf() that has problems.

There are two versions of sprintf() available in SunOS 4 - 8.  The
standard one (ANSI C) in libc returns an int, the number of characters
written (excluding '\0').  The BSD version returns a char* which points
to the target.  If you have a -lbsd on your link line then you get the
BSD version.  There are no compiler errors, just run time errors if you
rely on the return from sprintf() being the number of characters.
The workaround is to put an extra -lc on the link line before the -lbsd
if your code needs both standard sprintf() and some other BSD function.
Ultrix is documented as having the same behaviour as Solaris.  I don't
know about NeXTSTEP/OPENSTEP/GNUstep.
--
Pete Forman                        -./\.-  Disclaimer: This post is originated
WesternGeco                          -./\.-  by myself and does not represent
[EMAIL PROTECTED]                      -./\.-  opinion of Schlumberger, Baker
http://www.crosswinds.net/~petef       -./\.-  Hughes or their divisions.
Re: [HACKERS] patch: contrib/pgcrypto sanity
On Mon, Jan 08, 2001 at 10:03:25AM +, Pete Forman wrote:
> Marko Kreen writes:
> > On Mon, Jan 08, 2001 at 04:06:09AM +0200, Marko Kreen wrote:
> > > Well, actually they do: glibc in stdint.h and NetBSD in
> > > sys/inttypes.h, which is a mess, all right.  Problem is that
> > > postgres.h does not know about this.  I guess that C99 forgot to
> > > specify _where_ they should be defined.
> >
> > Correction, they both have inttypes.h, which probably is the right
> > location for this.
>
> stdint.h is adequate to pick up uint*_t.  inttypes.h is defined to
> include stdint.h.  Of course all this C99 stuff is new and existing
> implementations may have the typedefs in different files or not have
> them at all.

But as I said, NetBSD does not have it.  So what is the
correct/portable/standard location for it?  Can anyone with the C99
standard in hand find that out?  E.g. Tom Lane has some OS where these
types are in std{io|lib|def|arg} but on NetBSD and glibc/Linux you must
include a separate header file for them.
--
marko
Re: [HACKERS] heap_update is broken in current sources
How are we on this?

On Sun, 7 Jan 2001, Tom Lane wrote:
> heap_update() currently ends with
>
>     if (newbuf != buffer)
>     {
>         LockBuffer(newbuf, BUFFER_LOCK_UNLOCK);
>         WriteBuffer(newbuf);
>     }
>     LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
>     WriteBuffer(buffer);
>
>     /* invalidate caches */
>     RelationInvalidateHeapTuple(relation, oldtup);
>     RelationMark4RollbackHeapTuple(relation, newtup);
>
>     return HeapTupleMayBeUpdated;
>
> This is broken because WriteBuffer releases our refcounts on the
> buffer pages that are holding the old and new tuples.  By the time
> RelationInvalidateHeapTuple gets to do its thing, some other backend
> may have swapped a new disk page into the shared buffer that oldtup
> points at.  catcache.c will then be using the wrong data to compute
> the hash index of the old tuple.  This will at minimum result in
> failure to invalidate the old tuple out of our catcache (because we'll
> be searching the wrong hashchains), and can lead to a flat-out crash
> or Assert failure due to invalid data being fed to the hashing code.
>
> I have seen several nonrepeatable failures in the parallel regress
> tests in recent weeks, which I now believe are all traceable to this
> error.  I will commit a fix for this error shortly, and have
> recommended to Marc that he re-roll the beta2 tarball before
> announcing it...
>
> 			regards, tom lane

Marc G. Fournier                 ICQ#7615664            IRC Nick: Scrappy
Systems Administrator @ hub.org
primary: [EMAIL PROTECTED]        secondary: scrappy@{freebsd|postgresql}.org
Re: [HACKERS] heap_update is broken in current sources
The Hermit Hacker [EMAIL PROTECTED] writes: How are we on this? It's fixed. I've also run the regress tests with bufmgr.c hacked up to discard pages (with forcible overwriting) as soon as their refcount goes to zero. That didn't disclose any similar bugs, although the coverage of the tests leaves much to be desired :-( regards, tom lane
Re: [HACKERS] pg_dump return status..
Pete Forman [EMAIL PROTECTED] writes: Philip Warner writes: All I need to know is how to detect an error. Does it return EOF on error? The standard sprintf() returns a negative int on error. I thought we were talking about fprintf. sprintf can't really detect any errors anyway, except maybe a bad format string. regards, tom lane
Re: [HACKERS] Quite strange crash
Denis Perchine [EMAIL PROTECTED] writes:
> On Monday 08 January 2001 00:08, Tom Lane wrote:
> > > FATAL: s_lock(401f7435) at bufmgr.c:2350, stuck spinlock. Aborting.
> >
> > Were there any errors before that?
>
> No...  Just clean log (I redirect the log from stderr/out to a file,
> and all other to syslog).

The error messages would be in the syslog then, not in stderr.

> And the last query was:
> Jan  7 04:27:53 mx postgres[1008]: query: select message_id from pop3 where server_id = 22615

How about the prior queries of other processes?  Keep in mind that the
spinlock could have been left locked by any backend, not only the one
that complained about it.

			regards, tom lane
Re: [HACKERS] heap_update is broken in current sources
okay, will bundle up beta2 and announce it tonight when I get home ... gives about 6 hrs or so to "halt the presses" *grin* On Mon, 8 Jan 2001, Tom Lane wrote: The Hermit Hacker [EMAIL PROTECTED] writes: How are we on this? It's fixed. I've also run the regress tests with bufmgr.c hacked up to discard pages (with forcible overwriting) as soon as their refcount goes to zero. That didn't disclose any similar bugs, although the coverage of the tests leaves much to be desired :-( regards, tom lane Marc G. Fournier ICQ#7615664 IRC Nick: Scrappy Systems Administrator @ hub.org primary: [EMAIL PROTECTED] secondary: scrappy@{freebsd|postgresql}.org
Re: [HACKERS] Quite strange crash
> > > > FATAL: s_lock(401f7435) at bufmgr.c:2350, stuck spinlock. Aborting.
> > > Were there any errors before that?
> > No...  Just clean log (I redirect the log from stderr/out to a file,
> > and all other to syslog).
> The error messages would be in the syslog then, not in stderr.

Hmmm...  The only strange errors I see are:

Jan  7 04:22:14 mx postgres[679]: query: insert into statistic (date, visit_count, variant_id) values (now(), 1, 2)
Jan  7 04:22:14 mx postgres[631]: query: insert into statistic (date, visit_count, variant_id) values (now(), 1, 2)
Jan  7 04:22:14 mx postgres[700]: query: insert into statistic (date, visit_count, variant_id) values (now(), 1, 2)
Jan  7 04:22:14 mx postgres[665]: query: insert into statistic (date, visit_count, variant_id) values (now(), 1, 2)
Jan  7 04:22:14 mx postgres[633]: query: insert into statistic (date, visit_count, variant_id) values (now(), 1, 2)
Jan  7 04:22:14 mx postgres[629]: query: insert into statistic (date, visit_count, variant_id) values (now(), 1, 2)
Jan  7 04:22:14 mx postgres[736]: query: commit
Jan  7 04:22:14 mx postgres[736]: ProcessUtility: commit
Jan  7 04:22:14 mx postgres[700]: ERROR: Cannot insert a duplicate key into unique index statistic_date_vid_key
Jan  7 04:22:14 mx postgres[700]: query: update users set rcpt_ip='213.75.35.129',rcptdate=now() where id=1428067
Jan  7 04:22:14 mx postgres[700]: NOTICE: current transaction is aborted, queries ignored until end of transaction block
Jan  7 04:22:14 mx postgres[679]: query: commit
Jan  7 04:22:14 mx postgres[679]: ProcessUtility: commit
Jan  7 04:22:14 mx postgres[679]: query: update users set rcpt_ip='213.75.55.185',rcptdate=now() where id=1430836
Jan  7 04:22:14 mx postgres[665]: ERROR: Cannot insert a duplicate key into unique index statistic_date_vid_key
Jan  7 04:22:14 mx postgres[665]: query: update users set rcpt_ip='202.156.121.139',rcptdate=now() where id=1271397
Jan  7 04:22:14 mx postgres[665]: NOTICE: current transaction is aborted, queries ignored until end of transaction block
Jan  7 04:22:14 mx postgres[631]: ERROR: Cannot insert a duplicate key into unique index statistic_date_vid_key
Jan  7 04:22:14 mx postgres[631]: query: update users set rcpt_ip='24.20.53.63',rcptdate=now() where id=1451254
Jan  7 04:22:14 mx postgres[631]: NOTICE: current transaction is aborted, queries ignored until end of transaction block
Jan  7 04:22:14 mx postgres[633]: ERROR: Cannot insert a duplicate key into unique index statistic_date_vid_key
Jan  7 04:22:14 mx postgres[633]: query: update users set rcpt_ip='213.116.168.173',rcptdate=now() where id=1378049
Jan  7 04:22:14 mx postgres[633]: NOTICE: current transaction is aborted, queries ignored until end of transaction block
Jan  7 04:22:14 mx postgres[630]: query: select id,msg,next from alert
Jan  7 04:22:14 mx postgres[630]: query: select email,type from email where variant_id=2
Jan  7 04:22:14 mx postgres[630]: query: select * from users where senderdate now()-'10days'::interval AND variant_id=2 AND crypt='21AN6KRffJdFRFc511'
Jan  7 04:22:14 mx postgres[629]: ERROR: Cannot insert a duplicate key into unique index statistic_date_vid_key
Jan  7 04:22:14 mx postgres[629]: query: update users set rcpt_ip='213.42.45.81',rcptdate=now() where id=1441046
Jan  7 04:22:14 mx postgres[629]: NOTICE: current transaction is aborted, queries ignored until end of transaction block
Jan  7 04:22:15 mx postgres[711]: query: select message_id from pop3 where server_id = 17746
Jan  7 04:22:15 mx postgres[711]: ERROR: Relation 'pop3' does not exist

They popped up 4 minutes before.  And the most interesting thing is
that relation pop3 does exist!

> > And the last query was:
> > Jan  7 04:27:53 mx postgres[1008]: query: select message_id from pop3 where server_id = 22615
> How about the prior queries of other processes?

I do not want to flood the mailing list (it would be too much info).  I
can send you the complete log file from Jan 7.  It is 128Mb
uncompressed; with gz it is 8Mb.  Maybe it will be smaller with bz2.

> Keep in mind that the spinlock could have been left locked by any
> backend, not only the one that complained about it.

Actually you can have a look at the logs yourself.  Remember I gave you
a password for the postgres user.  This is the same postgres.  Logs are
in /var/log/postgres.  You will need postgres.log.1.gz.
--
Sincerely Yours,
Denis Perchine

E-Mail: [EMAIL PROTECTED]
HomePage: http://www.perchine.com/dyp/
FidoNet: 2:5000/120.5
Re: [HACKERS] bootstrap tables
On Sat, Jan 06, 2001 at 03:50:03AM +0100, Peter Eisentraut wrote:
> Ross J. Reedstrom writes:
> > > Do you really need the thing to be a bootstrap table, and not a
> > > plain system table?
> >
> > Yup, 'cause it's going to store the schema info, including the
> > system schema.  I foresee it needing to be accessed immediately
> > during bootstrap.
>
> Does "schema info" mean SQL schemas or merely additional schema
> information along the lines of pg_class, etc.?

I thought that might prick your ears up.  Yes, I'm looking at just how
horrible it might be to implement SQL schemas.  As it turns out, I
think Tom is right, and I can just use a regular system table.  I've
got that working, and some of the grammar modified to accept the
schema.table notation; now I'm working from both ends on all the places
in between that need to know about the schema.  Vadim's work replacing
relnames with relfilenode helps to some extent.  I'm taking the
simplistic approach of adding the schema name wherever the relname is
currently used, in parallel, and introducing a new global: the current
default schema.

As I said in my first note, I'm trying this out regardless of whether
it's the best way to implement the feature (why is it that I only seem
to find time to work on new stuff in pgsql when we're in beta?).  We
can debate a better implementation after I have roughly working code,
or have given up on it as a bad idea.  As it happens, my pgsql time
this week is taken up by another task, so I won't be working on this
until the weekend, at the earliest.

Ross
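For readers unfamiliar with the feature under discussion, the SQL92 schema notation being wired into the grammar looks roughly like this. The sketch is editorial and illustrative (schema and table names are invented); PostgreSQL itself did not ship schema support until well after this thread.

```sql
-- A schema is a namespace within a database:
CREATE SCHEMA accounting;
CREATE TABLE accounting.invoices (id integer, total numeric);

-- Fully qualified reference using the schema.table notation:
SELECT id FROM accounting.invoices;

-- The "current default schema" global mentioned above lets
-- unqualified names resolve against a chosen schema:
SELECT id FROM invoices;
```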
Re: [HACKERS] Quite strange crash
Denis Perchine [EMAIL PROTECTED] writes: FATAL: s_lock(401f7435) at bufmgr.c:2350, stuck spinlock. Aborting. Were there any errors before that? Actually you can have a look on the logs yourself. Well, I found a smoking gun: Jan 7 04:27:51 mx postgres[2501]: FATAL 1: The system is shutting down PID 2501 had been running: Jan 7 04:25:44 mx postgres[2501]: query: vacuum verbose lazy; What seems to have happened is that 2501 curled up and died, leaving one or more buffer spinlocks locked. Roughly one spinlock timeout later, at 04:29:07, we have 1008 complaining of a stuck spinlock. So that fits. The real question is what happened to 2501? None of the other backends reported a SIGTERM signal, so the signal did not come from the postmaster. Another interesting datapoint: there is a second place in this logfile where one single backend reports SIGTERM while its brethren keep running: Jan 7 04:30:47 mx postgres[4269]: query: vacuum verbose; ... Jan 7 04:38:16 mx postgres[4269]: FATAL 1: The system is shutting down There is something pretty fishy about this. You aren't by any chance running the postmaster under a ulimit setting that might cut off individual backends after a certain amount of CPU time, are you? What signal does a ulimit violation deliver on your machine, anyway? regards, tom lane
[HACKERS] Re: Should heap_update/heap_delete hold buffer locks while toasting?
Tom Lane wrote:
> The way that heap_update() and heap_delete() are currently coded, they
> hold the buffer context lock on the buffer containing the old tuple
> while they invoke heap_tuple_toast_attrs().  This strikes me as at
> least inefficient and at worst a source of deadlock.  Is it possible
> to avoid holding the buffer lock while doing the TOAST manipulations?

Since the TOAST table access is doing its own locking on the TOAST
tables, I think it'd be possible to move it outside of the buffer lock.

Jan
--
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== [EMAIL PROTECTED] #
Re: [HACKERS] is_view seems unnecessarily slow
Tom Lane wrote:
> backend/commands/command.c has a routine is_view() that tests for
> view-ness by scanning pg_rewrite (all of it) to see if the given
> relation has any ON SELECT rules.  This is only used to disallow
> AlterTableAddConstraint and LockTableCommand on views.  While I don't
> care much about the performance of AlterTableAddConstraint, it does
> bug me that this might slow down LOCK TABLE a good deal.  Any
> objection to replacing this routine by a test for relkind = VIEW?

No objections.  These checks came from the days when views still had
relkind 'r' and looking up pg_rewrite was the only way to know.

Jan
--
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== [EMAIL PROTECTED] #
Re: [HACKERS] Assuming that TAS() will succeed the first time is verboten
One last followup on that bizarreness about shutdown's checkpoint failing on Alpha platforms --- After changing the checkpoint code to loop, rather than assuming TAS() must succeed the first time, I noticed that it always looped exactly once. This didn't make sense to me at the time, but after querying some Alpha experts at DEC^H^H^HCompaq, it does now. If a new process's first write to a shared memory page is a stq_c, that stq_c is guaranteed to fail (at least on Tru64 Unix), because it will page fault. The shared memory page is inherited read-only and is converted to read-write on first fault. This doesn't seem really necessary, but I suppose it's done to share code with the copy-on-write case for non-shared pages that are inherited via fork(). It makes sense that the checkpoint process's first write to shared memory would be stq_c, because after all it shouldn't be scribbling on shared memory until it's got the spinlock, n'est ce pas? So a failure the first time through the TAS loop is entirely expected for Alpha. I wouldn't be surprised to see similar behavior on other architectures, now that I see the first-write-from-a-process connection. Bottom line is the same: always call TAS() in a retry loop. regards, tom lane
Re: [HACKERS] Quite strange crash
On Mon, Jan 08, 2001 at 12:21:38PM -0500, Tom Lane wrote: Denis Perchine [EMAIL PROTECTED] writes: FATAL: s_lock(401f7435) at bufmgr.c:2350, stuck spinlock. Aborting. Were there any errors before that? Actually you can have a look on the logs yourself. Well, I found a smoking gun: ... What seems to have happened is that 2501 curled up and died, leaving one or more buffer spinlocks locked. ... There is something pretty fishy about this. You aren't by any chance running the postmaster under a ulimit setting that might cut off individual backends after a certain amount of CPU time, are you? What signal does a ulimit violation deliver on your machine, anyway? It's worth noting here that modern Unixes run around killing user-level processes more or less at random when free swap space (and sometimes just RAM) runs low. AIX was the first such, but would send SIGDANGER to processes first to try to reclaim some RAM; critical daemons were expected to explicitly ignore SIGDANGER. Other Unixes picked up the idea without picking up the SIGDANGER behavior. The reason for this common pathological behavior is usually traced to sloppy resource accounting. It manifests as the bad policy of having malloc() (and sbrk() or mmap() underneath) return a valid pointer rather than NULL, on the assumption that most of the memory asked for won't be used just yet. Anyhow, the system doesn't know how much memory is really available at that moment. Usually the problem is explained with the example of a very large process that forks, suddenly demanding twice as much memory. (Apache is particularly egregious this way, allocating lots of memory and then forking several times.) Instead of failing the fork, the kernel waits for a process to touch memory it was granted and then see if any RAM/swap has turned up to satisfy it, and then kill the process (or some random other process!) if not. Now that programs have come to depend on this behavior, it has become very hard to fix it. 
The implication for the rest of us is that we should expect our processes to be killed at random, just for touching memory granted, or for no reason at all. (Kernel people say, "They're just user-level programs, restart them;" or, "Maybe we can designate some critical processes that don't get killed".) In Linux they try to invent heuristics to avoid killing the X server, because so many programs depend on it. It's a disgraceful mess, really. The relevance to the issue at hand is that processes dying during heavy memory load is a documented feature of our supported platforms. Nathan Myers [EMAIL PROTECTED]
[HACKERS] Dec TRU64/PG 7.1
Has anyone tried 7.1Beta on Tru64? I've got an app that will be moving to this platform, and would like to not have any surprises (It's the first production app with PG in the dallas office, and would like to not make trouble). Thanks! LER -- Larry Rosenman http://www.lerctr.org/~ler Phone: +1 972-414-9812 E-Mail: [EMAIL PROTECTED] US Mail: 1905 Steamboat Springs Drive, Garland, TX 75044-6749
[HACKERS] README.mb
I think doc/README.mb is now deprecated and I would like to remove it.
Also I would like to place the Chinese (Big5) version of README.mb
recently posted by Chih-Chang Hsieh [EMAIL PROTECTED] as
doc/README.mb.big5.  Comments?  Objections?
--
Tatsuo Ishii
[HACKERS] Compaq open source database benchmark
Compaq has released an open-source database benchmark at: http://opensource.compaq.com/sourceforge/project/?group_id=19 I must say, Compaq has been more involved with PostgreSQL than any of the other computer vendors. They have contributed equipment, and now benchmark source code. -- Bruce Momjian| http://candle.pha.pa.us [EMAIL PROTECTED] | (610) 853-3000 + If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup.| Drexel Hill, Pennsylvania 19026
Re: [HACKERS] Assuming that TAS() will succeed the first time is verboten
One last followup on that bizarreness about shutdown's checkpoint failing on Alpha platforms --- After changing the checkpoint code to loop, rather than assuming TAS() must succeed the first time, I noticed that it always looped exactly once. This didn't make sense to me at the time, but after querying some Alpha experts at DEC^H^H^HCompaq, it does now. If a new process's first write to a shared memory page is a stq_c, that stq_c is guaranteed to fail (at least on Tru64 Unix), because it will page fault. The shared memory page is inherited read-only and is converted to read-write on first fault. This doesn't seem really necessary, but I suppose it's done to share code with the copy-on-write case for non-shared pages that are inherited via fork(). This seems quite bizarre. Why would the process fail on the write, and not just pause and wait for the fault to bring in the page? Doesn't the CPU halt the instruction to fetch in the page and restart the instruction? -- Bruce Momjian| http://candle.pha.pa.us [EMAIL PROTECTED] | (610) 853-3000 + If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup.| Drexel Hill, Pennsylvania 19026
Re: [HACKERS] Assuming that TAS() will succeed the first time is verboten
Bruce Momjian [EMAIL PROTECTED] writes: After changing the checkpoint code to loop, rather than assuming TAS() must succeed the first time, I noticed that it always looped exactly once. This didn't make sense to me at the time, but after querying some Alpha experts at DEC^H^H^HCompaq, it does now. If a new process's first write to a shared memory page is a stq_c, that stq_c is guaranteed to fail (at least on Tru64 Unix), because it will page fault. The shared memory page is inherited read-only and is converted to read-write on first fault. This doesn't seem really necessary, but I suppose it's done to share code with the copy-on-write case for non-shared pages that are inherited via fork(). This seems quite bizarre. Why would the process fail on the write, and not just pause and wait for the fault to bring in the page? An ordinary write would be re-executed and would succeed after the page fault. stq_c is different, because it's only supposed to succeed if the processor has managed to hold an access lock on the target address continuously since the ldq_l. It would be very bad form to try to hold the lock during a page fault. (stq_c will also fail if the processor is interrupted between ldq_l and stq_c, so occasional failures are to be expected. What was surprising me was the consistency of the failure pattern.) See the Alpha Architecture Manual if you really want to discuss this. regards, tom lane
Re: [HACKERS] Assuming that TAS() will succeed the first time is verboten
Oh, thanks. That makes sense. Bruce Momjian [EMAIL PROTECTED] writes: After changing the checkpoint code to loop, rather than assuming TAS() must succeed the first time, I noticed that it always looped exactly once. This didn't make sense to me at the time, but after querying some Alpha experts at DEC^H^H^HCompaq, it does now. If a new process's first write to a shared memory page is a stq_c, that stq_c is guaranteed to fail (at least on Tru64 Unix), because it will page fault. The shared memory page is inherited read-only and is converted to read-write on first fault. This doesn't seem really necessary, but I suppose it's done to share code with the copy-on-write case for non-shared pages that are inherited via fork(). This seems quite bizarre. Why would the process fail on the write, and not just pause and wait for the fault to bring in the page? An ordinary write would be re-executed and would succeed after the page fault. stq_c is different, because it's only supposed to succeed if the processor has managed to hold an access lock on the target address continuously since the ldq_l. It would be very bad form to try to hold the lock during a page fault. (stq_c will also fail if the processor is interrupted between ldq_l and stq_c, so occasional failures are to be expected. What was surprising me was the consistency of the failure pattern.) See the Alpha Architecture Manual if you really want to discuss this. regards, tom lane -- Bruce Momjian| http://candle.pha.pa.us [EMAIL PROTECTED] | (610) 853-3000 + If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup.| Drexel Hill, Pennsylvania 19026
Re: [HACKERS] README.mb
Tatsuo Ishii [EMAIL PROTECTED] writes:
> I think doc/README.mb is now deprecated and I would like to remove it.
> Also I would like to place the Chinese (Big5) version of README.mb
> recently posted by Chih-Chang Hsieh [EMAIL PROTECTED] as
> doc/README.mb.big5.

Go for it.  I was surprised you hadn't done so already.

			regards, tom lane
Re: [HACKERS] Dec TRU64/PG 7.1
Larry Rosenman [EMAIL PROTECTED] writes: Has anyone tried 7.1Beta on Tru64? beta1 will not work, but recent snapshots are reported to pass regression tests. Feel free to pound away on it ... regards, tom lane
Re: [HACKERS] Quite strange crash
> > Well, I found a smoking gun: ... What seems to have happened is that
> > 2501 curled up and died, leaving one or more buffer spinlocks
> > locked. ... There is something pretty fishy about this.  You aren't
> > by any chance running the postmaster under a ulimit setting that
> > might cut off individual backends after a certain amount of CPU
> > time, are you?  What signal does a ulimit violation deliver on your
> > machine, anyway?
>
> It's worth noting here that modern Unixes run around killing
> user-level processes more or less at random when free swap space (and
> sometimes just RAM) runs low.  AIX was the first such, but would send
> SIGDANGER to processes first to try to reclaim some RAM; critical
> daemons were expected to explicitly ignore SIGDANGER.  Other Unixes
> picked up the idea without picking up the SIGDANGER behavior.

That's not the case for sure.  There are 512Mb on the machine, and when
I had this problem it was completely unloaded (300Mb in caches).
--
Sincerely Yours,
Denis Perchine

E-Mail: [EMAIL PROTECTED]
HomePage: http://www.perchine.com/dyp/
FidoNet: 2:5000/120.5
Re: [HACKERS] Quite strange crash
Denis Perchine [EMAIL PROTECTED] writes:
> > It's worth noting here that modern Unixes run around killing
> > user-level processes more or less at random when free swap space
> > (and sometimes just RAM) runs low.
>
> That's not the case for sure.  There are 512Mb on the machine, and
> when I had this problem it was completely unloaded (300Mb in caches).

The fact that VACUUM processes seemed to be preferential victims
suggests a resource limit of some sort.  I had suggested a CPU-time
limit, but perhaps it could also be disk-pages-written.

			regards, tom lane
Re: [HACKERS] Quite strange crash
On Monday 08 January 2001 23:21, Tom Lane wrote:
> Denis Perchine [EMAIL PROTECTED] writes:
> > > > FATAL: s_lock(401f7435) at bufmgr.c:2350, stuck spinlock. Aborting.
> > > Were there any errors before that?
> > Actually you can have a look at the logs yourself.
>
> Well, I found a smoking gun:
> Jan  7 04:27:51 mx postgres[2501]: FATAL 1:  The system is shutting down
> PID 2501 had been running:
> Jan  7 04:25:44 mx postgres[2501]: query: vacuum verbose lazy;

Hmmm... actually this is a real problem with vacuum lazy.  Sometimes it
just does something for an enormous amount of time (I have mailed a
sample database to Vadim, but did not get any response yet).  It is
possible that it was me who killed the backend.

> What seems to have happened is that 2501 curled up and died, leaving
> one or more buffer spinlocks locked.  Roughly one spinlock timeout
> later, at 04:29:07, we have 1008 complaining of a stuck spinlock.  So
> that fits.  The real question is what happened to 2501?  None of the
> other backends reported a SIGTERM signal, so the signal did not come
> from the postmaster.  Another interesting datapoint: there is a second
> place in this logfile where one single backend reports SIGTERM while
> its brethren keep running:
> Jan  7 04:30:47 mx postgres[4269]: query: vacuum verbose;
> ...
> Jan  7 04:38:16 mx postgres[4269]: FATAL 1:  The system is shutting down

Hmmm... Maybe this also was me... But I am not sure here.

> There is something pretty fishy about this.  You aren't by any chance
> running the postmaster under a ulimit setting that might cut off
> individual backends after a certain amount of CPU time, are you?

[postgres@mx postgres]$ ulimit -a
core file size (blocks)     100
data seg size (kbytes)      unlimited
file size (blocks)          unlimited
max memory size (kbytes)    unlimited
stack size (kbytes)         8192
cpu time (seconds)          unlimited
max user processes          2048
pipe size (512 bytes)       8
open files                  1024
virtual memory (kbytes)     2105343

No, there are no ulimits.

> What signal does a ulimit violation deliver on your machine, anyway?
    if (psecs / HZ > p->rlim[RLIMIT_CPU].rlim_cur) {
        /* Send SIGXCPU every second.. */
        if (!(psecs % HZ))
            send_sig(SIGXCPU, p, 1);
        /* and SIGKILL when we go over max.. */
        if (psecs / HZ > p->rlim[RLIMIT_CPU].rlim_max)
            send_sig(SIGKILL, p, 1);
    }

This part of the kernel shows the logic.  It means that the process
will get SIGXCPU each second while it is above the soft limit, and
SIGKILL when it goes above the hard limit.
--
Sincerely Yours,
Denis Perchine

E-Mail: [EMAIL PROTECTED]
HomePage: http://www.perchine.com/dyp/
FidoNet: 2:5000/120.5
Re: [HACKERS] Quite strange crash
Denis Perchine [EMAIL PROTECTED] writes:
> Hmmm... actually this is a real problem with vacuum lazy.  Sometimes
> it just does something for an enormous amount of time (I have mailed a
> sample database to Vadim, but did not get any response yet).  It is
> possible that it was me who killed the backend.

Killing an individual backend with SIGTERM is bad luck.  The backend
will assume that it's being killed by the postmaster, and will exit
without a whole lot of concern for cleaning up shared memory --- the
expectation is that as soon as all the backends are dead, the
postmaster will reinitialize shared memory.  You can get away with
sending SIGINT (QueryCancel) to an individual backend.  Anything else
voids the warranty ;=)

But, having said that --- this VACUUM process had only been running for
two minutes of real time.  Seems unlikely that you'd have chosen to
kill it so quickly.

			regards, tom lane
RE: [HACKERS] Quite strange crash
> Killing an individual backend with SIGTERM is bad luck. The backend
> will assume that it's being killed by the postmaster, and will exit
> without a whole lot of concern for cleaning up shared memory --- the

What code will be returned to postmaster in this case?

Vadim
Re: [HACKERS] Quite strange crash
"Mikheev, Vadim" [EMAIL PROTECTED] writes:
>> Killing an individual backend with SIGTERM is bad luck. The backend
>> will assume that it's being killed by the postmaster, and will exit
>> without a whole lot of concern for cleaning up shared memory --- the
> What code will be returned to postmaster in this case?

Right at the moment, the backend will exit with status 0. I think you are thinking the same thing I am: maybe a backend that receives SIGTERM ought to exit with nonzero status. That would mean that killing an individual backend would instantly translate into an installation-wide restart. I am not sure whether that's a good idea. Perhaps this cure is worse than the disease. Comments anyone?

regards, tom lane
RE: [HACKERS] Quite strange crash
>>> Killing an individual backend with SIGTERM is bad luck. The backend
>>> will assume that it's being killed by the postmaster, and will exit
>>> without a whole lot of concern for cleaning up shared memory --- the

SIGTERM -> die() -> elog(FATAL)

Is it true that elog(FATAL) doesn't clean up shmem etc.? This would be very bad...

>> What code will be returned to postmaster in this case?
>
> Right at the moment, the backend will exit with status 0. I think you
> are thinking the same thing I am: maybe a backend that receives
> SIGTERM ought to exit with nonzero status. That would mean that
> killing an individual backend would instantly translate into an
> installation-wide restart. I am not sure whether that's a good idea.
> Perhaps this cure is worse than the disease.

Well, it's not a good idea, because SIGTERM is used for ABORT + EXIT (pg_ctl -m fast stop) --- but shouldn't ABORT clean up everything?

Vadim
Re: [HACKERS] Quite strange crash
"Mikheev, Vadim" [EMAIL PROTECTED] writes:
>>>> Killing an individual backend with SIGTERM is bad luck.
> SIGTERM -> die() -> elog(FATAL)
> Is it true that elog(FATAL) doesn't clean up shmem etc.? This would be
> very bad...

It tries, but I don't think it's possible to make a complete guarantee without an unreasonable amount of overhead. The case at hand was a stuck spinlock because die() -> elog(FATAL) had neglected to release that particular spinlock before exiting. To guarantee that all spinlocks will be released by die(), we'd need something like

    START_CRIT_SECTION;
    S_LOCK(spinlock);
    record that we own spinlock;
    END_CRIT_SECTION;

around every existing S_LOCK() call, and the reverse around every S_UNLOCK(). Are you willing to pay that kind of overhead? I'm not sure this'd be enough anyway. Guaranteeing that you have consistent state at every instant that an ISR could interrupt you is not easy.

regards, tom lane
Re: [HACKERS] Quite strange crash
* Mikheev, Vadim [EMAIL PROTECTED] [010108 23:08] wrote:
>>>> Killing an individual backend with SIGTERM is bad luck. The backend
>>>> will assume that it's being killed by the postmaster, and will exit
>>>> without a whole lot of concern for cleaning up shared memory --- the
>
> SIGTERM -> die() -> elog(FATAL)
> Is it true that elog(FATAL) doesn't clean up shmem etc.? This would be
> very bad...
>
>>> What code will be returned to postmaster in this case?
>>
>> Right at the moment, the backend will exit with status 0. I think you
>> are thinking the same thing I am: maybe a backend that receives
>> SIGTERM ought to exit with nonzero status. That would mean that
>> killing an individual backend would instantly translate into an
>> installation-wide restart. I am not sure whether that's a good idea.
>> Perhaps this cure is worse than the disease.
>
> Well, it's not a good idea, because SIGTERM is used for ABORT + EXIT
> (pg_ctl -m fast stop) --- but shouldn't ABORT clean up everything?

Er, shouldn't ABORT leave the system in the exact state that it's in, so that one can get a crashdump/traceback on a wedged process without it trying to clean up after itself?

--
-Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]]
"I have the heart of a child; I keep it in a jar on my desk."