Re: [HACKERS] ALL, ANY bug?

2001-01-08 Thread Tom Lane

Tatsuo Ishii [EMAIL PROTECTED] writes:
 You are right. However, SQL92 spec allows:
 select * from t1 where i > all values(0,1);

No, I still disagree.  The ANY/ALL constructs compare a single row on
the left side with all the rows produced by the query on the right.
"values(0,1)" is effectively the same as "SELECT 0,1", ie, it is a
query that happens to produce just one row.  The above is illegal in
SQL92 because the lefthand side is not a two-item row.

The semantics you seem to be looking for is "scalar IN/NOT IN list",
which we do have.  ANY/ALL is a different animal.

It's true that we don't accept values(...) as a subquery, but that's
only one aspect of this example.

regards, tom lane



Re: [HACKERS] ALL, ANY bug?

2001-01-08 Thread Tatsuo Ishii

 No, I still disagree.  The ANY/ALL constructs compare a single row on
 the left side with all the rows produced by the query on the right.
 "values(0,1)" is effectively the same as "SELECT 0,1", ie, it is a
 query that happens to produce just one row.  The above is illegal in
 SQL92 because the lefthand side is not a two-item row.

Oh, I see your point. values(0,1) = "SELECT 0,1", which is a one-row
constructor. Sorry for the confusion.
--
Tatsuo Ishii



Re: [HACKERS] pg_dump return status..

2001-01-08 Thread Pete Forman

Nathan Myers writes:
  On Fri, Jan 05, 2001 at 11:20:43AM -0500, Tom Lane wrote:
   Philip Warner [EMAIL PROTECTED] writes:
how do I
check for a failed write in a way that works on all Unixes? Is the
following OK:
   
- fwrite: ok if return value equals item count
- fprintf: ok if return value > 0.
- fputc: ok if != EOF
   
   Probably fprintf() >= 0 --- according to my specs, it returns the number
   of chars emitted, or a negative value on error.  The other two are
   correct.
  
  An fprintf returning 0 is a suspicious event; it's easy to imagine 
  cases where it makes sense, but I don't think I have ever coded one.
  Probably >= N (where N is the smallest reasonable output, defaulting 
  to 1) may be a better test in real code.
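  
  For illustration, the checks being discussed amount to something like the
  following (a plain C89 stdio sketch, not pg_dump's actual code; write_chunk
  is a made-up name):
  
      #include <stdio.h>
  
      /* Returns 0 on success, -1 on any write error. */
      static int
      write_chunk(FILE *fp, const void *buf, size_t len)
      {
          if (fwrite(buf, 1, len, fp) != len)     /* ok only if count matches */
              return -1;
          if (fprintf(fp, "-- chunk end\n") < 0)  /* negative value means error */
              return -1;
          if (fputc('\n', fp) == EOF)             /* EOF means error */
              return -1;
          return 0;
      }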
  
  As I recall, on SunOS 4 the printf()s don't return the number of 
  characters written.  I don't recall what they do instead, and have
  no access to such machines any more.
  
  Other old BSD-derived systems are likely to have wonky return 
  values/types on the printf()s.  Looking at the list of supported 
  platforms, none jump out as likely candidates, but in the "unsupported" 
  list, Ultrix and NextStep do.  (Do we care?)
  
  If SunOS 4 is to remain a supported platform, the printf checks may 
  need to be special-cased for it.

Current Solaris is still liable to problems, though these are not
relevant to this thread.  printf() and fprintf() have always returned
the number of characters transmitted, or EOF for failure.  It is
sprintf() that has problems.

There are two versions of sprintf() available in SunOS 4 - 8.  The
standard one (ANSI C) in libc returns an int, the number of characters
written (excluding '\0').  The BSD version returns a char* which
points to the target.  If you have a -lbsd on your link line then you
get the BSD version.  There are no compiler errors, just run time
errors if you rely on the return from sprintf() being the number of
characters.  The workaround is to put an extra -lc on the link line
before the -lbsd if your code needs both standard sprintf() and some
other BSD function.

Ultrix is documented as having the same behaviour as Solaris.  I don't
know about NeXTSTEP/OPENSTEP/GNUStep.
-- 
Pete Forman -./\.- Disclaimer: This post is originated
WesternGeco   -./\.-  by myself and does not represent
[EMAIL PROTECTED] -./\.-  opinion of Schlumberger, Baker
http://www.crosswinds.net/~petef  -./\.-  Hughes or their divisions.



Re: [HACKERS] patch: contrib/pgcrypto sanity

2001-01-08 Thread Marko Kreen

On Mon, Jan 08, 2001 at 10:03:25AM +, Pete Forman wrote:
 Marko Kreen writes:
   On Mon, Jan 08, 2001 at 04:06:09AM +0200, Marko Kreen wrote:
Well, actually they do.  glibc in stdint.h and NetBSD in
sys/inttypes.h, which is a mess, all right.  Problem is that
postgres.h does not know about this.  I guess that C99 forgot to
specify _where_ they should be defined.
   
   Correction, they both have inttypes.h which probably is the right
   location for this.
 
 stdint.h is adequate to pick up uint*_t.  inttypes.h is defined to
 include stdint.h.  Of course all this C99 stuff is new and existing
 implementations may have the typedefs in different files or not have
 them at all.

But as I said, NetBSD does not have it.  So what is the
correct/portable/standard location for it?  Can anyone with C99
standard in hand find that out?

E.g. Tom Lane has some OS where these types are in
std{io|lib|def|arg} but on NetBSD and glibc/Linux you must include a
separate header file for them.
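
For what it's worth, a configure-style guard would sidestep the question; a
rough sketch (HAVE_STDINT_H and friends are the usual autoconf macro names,
not necessarily something postgres.h defines today):

    /* Pick up uint8_t & co. wherever this platform keeps them. */
    #if defined(HAVE_STDINT_H)
    #include <stdint.h>
    #elif defined(HAVE_INTTYPES_H)
    #include <inttypes.h>
    #elif defined(HAVE_SYS_INTTYPES_H)
    #include <sys/inttypes.h>            /* e.g. some BSDs */
    #else
    typedef unsigned char   uint8_t;     /* last-resort fallbacks */
    typedef unsigned short  uint16_t;
    typedef unsigned int    uint32_t;
    #endif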

-- 
marko




Re: [HACKERS] heap_update is broken in current sources

2001-01-08 Thread The Hermit Hacker


How are we on this?

On Sun, 7 Jan 2001, Tom Lane wrote:

 heap_update() currently ends with

 if (newbuf != buffer)
 {
 LockBuffer(newbuf, BUFFER_LOCK_UNLOCK);
 WriteBuffer(newbuf);
 }
 LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
 WriteBuffer(buffer);

 /* invalidate caches */
 RelationInvalidateHeapTuple(relation, oldtup);
 RelationMark4RollbackHeapTuple(relation, newtup);

 return HeapTupleMayBeUpdated;

 This is broken because WriteBuffer releases our refcounts on the buffer
 pages that are holding the old and new tuples.  By the time
 RelationInvalidateHeapTuple gets to do its thing, some other backend may
 have swapped a new disk page into the shared buffer that oldtup points
 at.  catcache.c will then be using the wrong data to compute the hash
 index of the old tuple.  This will at minimum result in failure to
 invalidate the old tuple out of our catcache (because we'll be searching
 the wrong hashchains), and can lead to a flat-out crash or Assert
 failure due to invalid data being fed to the hashing code.
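
 A sketch of one way to close that window -- do the cache invalidation while
 the buffer pins are still held, before WriteBuffer drops the refcounts
 (an illustration of the ordering only, not necessarily the committed fix):

     /* invalidate caches while oldtup/newtup still point at pinned pages */
     RelationInvalidateHeapTuple(relation, oldtup);
     RelationMark4RollbackHeapTuple(relation, newtup);

     if (newbuf != buffer)
     {
         LockBuffer(newbuf, BUFFER_LOCK_UNLOCK);
         WriteBuffer(newbuf);
     }
     LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
     WriteBuffer(buffer);

     return HeapTupleMayBeUpdated;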

 I have seen several nonrepeatable failures in the parallel regress tests
 in recent weeks, which I now believe are all traceable to this error.

 I will commit a fix for this error shortly, and have recommended to Marc
 that he re-roll the beta2 tarball before announcing it...

   regards, tom lane


Marc G. Fournier   ICQ#7615664   IRC Nick: Scrappy
Systems Administrator @ hub.org
primary: [EMAIL PROTECTED]   secondary: scrappy@{freebsd|postgresql}.org




Re: [HACKERS] heap_update is broken in current sources

2001-01-08 Thread Tom Lane

The Hermit Hacker [EMAIL PROTECTED] writes:
 How are we on this?

It's fixed.

I've also run the regress tests with bufmgr.c hacked up to discard
pages (with forcible overwriting) as soon as their refcount goes to
zero.  That didn't disclose any similar bugs, although the coverage
of the tests leaves much to be desired :-(

regards, tom lane



Re: [HACKERS] pg_dump return status..

2001-01-08 Thread Tom Lane

Pete Forman [EMAIL PROTECTED] writes:
 Philip Warner writes:
 All I need to know is how to detect an error. Does it return EOF on
 error?

 The standard sprintf() returns a negative int on error.

I thought we were talking about fprintf.  sprintf can't really detect
any errors anyway, except maybe a bad format string.

regards, tom lane



Re: [HACKERS] Quite strange crash

2001-01-08 Thread Tom Lane

Denis Perchine [EMAIL PROTECTED] writes:
 On Monday 08 January 2001 00:08, Tom Lane wrote:
 FATAL: s_lock(401f7435) at bufmgr.c:2350, stuck spinlock. Aborting.
 
 Were there any errors before that?

 No... Just clean log (I redirect log from stderr/out to file, and all
 other to syslog).

The error messages would be in the syslog then, not in stderr.

 And the last query was:
 Jan  7 04:27:53 mx postgres[1008]: query: select message_id from pop3 where 
 server_id = 22615

How about the prior queries of other processes?  Keep in mind that the
spinlock could have been left locked by any backend, not only the one
that complained about it.

regards, tom lane



Re: [HACKERS] heap_update is broken in current sources

2001-01-08 Thread The Hermit Hacker


okay, will bundle up beta2 and announce it tonight when I get home ...
gives about 6 hrs or so to "halt the presses" *grin*

On Mon, 8 Jan 2001, Tom Lane wrote:

 The Hermit Hacker [EMAIL PROTECTED] writes:
  How are we on this?

 It's fixed.

 I've also run the regress tests with bufmgr.c hacked up to discard
 pages (with forcible overwriting) as soon as their refcount goes to
 zero.  That didn't disclose any similar bugs, although the coverage
 of the tests leaves much to be desired :-(

   regards, tom lane


Marc G. Fournier   ICQ#7615664   IRC Nick: Scrappy
Systems Administrator @ hub.org
primary: [EMAIL PROTECTED]   secondary: scrappy@{freebsd|postgresql}.org




Re: [HACKERS] Quite strange crash

2001-01-08 Thread Denis Perchine

  FATAL: s_lock(401f7435) at bufmgr.c:2350, stuck spinlock. Aborting.
 
  Were there any errors before that?
 
  No... Just clean log (I redirect log from stderr/out to file, and all
  other to syslog).

 The error messages would be in the syslog then, not in stderr.

Hmmm... The only strange errors I see are:

Jan  7 04:22:14 mx postgres[679]: query: insert into statistic (date, 
visit_count, variant_id) values (now(), 1, 2)
Jan  7 04:22:14 mx postgres[631]: query: insert into statistic (date, 
visit_count, variant_id) values (now(), 1, 2)
Jan  7 04:22:14 mx postgres[700]: query: insert into statistic (date, 
visit_count, variant_id) values (now(), 1, 2)
Jan  7 04:22:14 mx postgres[665]: query: insert into statistic (date, 
visit_count, variant_id) values (now(), 1, 2)
Jan  7 04:22:14 mx postgres[633]: query: insert into statistic (date, 
visit_count, variant_id) values (now(), 1, 2)
Jan  7 04:22:14 mx postgres[629]: query: insert into statistic (date, 
visit_count, variant_id) values (now(), 1, 2)
Jan  7 04:22:14 mx postgres[736]: query: commit
Jan  7 04:22:14 mx postgres[736]: ProcessUtility: commit
Jan  7 04:22:14 mx postgres[700]: ERROR:  Cannot insert a duplicate key into 
unique index statistic_date_vid_key
Jan  7 04:22:14 mx postgres[700]: query: update users set 
rcpt_ip='213.75.35.129',rcptdate=now() where id=1428067
Jan  7 04:22:14 mx postgres[700]: NOTICE:  current transaction is aborted, 
queries ignored until end of transaction block
Jan  7 04:22:14 mx postgres[679]: query: commit
Jan  7 04:22:14 mx postgres[679]: ProcessUtility: commit
Jan  7 04:22:14 mx postgres[679]: query: update users set 
rcpt_ip='213.75.55.185',rcptdate=now() where id=1430836
Jan  7 04:22:14 mx postgres[665]: ERROR:  Cannot insert a duplicate key into 
unique index statistic_date_vid_key
Jan  7 04:22:14 mx postgres[665]: query: update users set 
rcpt_ip='202.156.121.139',rcptdate=now() where id=1271397
Jan  7 04:22:14 mx postgres[665]: NOTICE:  current transaction is aborted, 
queries ignored until end of transaction block
Jan  7 04:22:14 mx postgres[631]: ERROR:  Cannot insert a duplicate key into 
unique index statistic_date_vid_key
Jan  7 04:22:14 mx postgres[631]: query: update users set 
rcpt_ip='24.20.53.63',rcptdate=now() where id=1451254
Jan  7 04:22:14 mx postgres[631]: NOTICE:  current transaction is aborted, 
queries ignored until end of transaction block
Jan  7 04:22:14 mx postgres[633]: ERROR:  Cannot insert a duplicate key into 
unique index statistic_date_vid_key
Jan  7 04:22:14 mx postgres[633]: query: update users set 
rcpt_ip='213.116.168.173',rcptdate=now() where id=1378049
Jan  7 04:22:14 mx postgres[633]: NOTICE:  current transaction is aborted, 
queries ignored until end of transaction block
Jan  7 04:22:14 mx postgres[630]: query: select id,msg,next from alert
Jan  7 04:22:14 mx postgres[630]: query: select email,type from email where 
variant_id=2
Jan  7 04:22:14 mx postgres[630]: query:
select * from users where senderdate  now()-'10days'::interval AND
variant_id=2 AND crypt='21AN6KRffJdFRFc511'
 
Jan  7 04:22:14 mx postgres[629]: ERROR:  Cannot insert a duplicate key into 
unique index statistic_date_vid_key
Jan  7 04:22:14 mx postgres[629]: query: update users set 
rcpt_ip='213.42.45.81',rcptdate=now() where id=1441046
Jan  7 04:22:14 mx postgres[629]: NOTICE:  current transaction is aborted, 
queries ignored until end of transaction block
Jan  7 04:22:15 mx postgres[711]: query: select message_id from pop3 where 
server_id = 17746
Jan  7 04:22:15 mx postgres[711]: ERROR:  Relation 'pop3' does not exist

They popped up 4 minutes before. And the most interesting thing is that relation 
pop3 does exist!

  And the last query was:
  Jan  7 04:27:53 mx postgres[1008]: query: select message_id from pop3
  where server_id = 22615

 How about the prior queries of other processes?

I do not want to flood the mailing list (it would be too much info). I can send you 
the complete log file from Jan 7. It is 128Mb uncompressed. With gz it is 8Mb. 
Maybe it will be smaller with bz2.

  Keep in mind that the
 spinlock could have been left locked by any backend, not only the one
 that complained about it.

Actually you can have a look on the logs yourself. Remember I gave you a 
password from postgres user. This is the same postgres. Logs are in 
/var/log/postgres. You will need postgres.log.1.gz.

-- 
Sincerely Yours,
Denis Perchine

--
E-Mail: [EMAIL PROTECTED]
HomePage: http://www.perchine.com/dyp/
FidoNet: 2:5000/120.5
--



Re: [HACKERS] bootstrap tables

2001-01-08 Thread Ross J. Reedstrom

On Sat, Jan 06, 2001 at 03:50:03AM +0100, Peter Eisentraut wrote:
 Ross J. Reedstrom writes:
 
   Do you really need the thing to be a bootstrap table, and not a plain
   system table?
 
  Yup, 'cause it's going to store the schema info, including the system
  schema. I foresee it needing to be accessed immediately during bootstrap.
 
 Does "schema info" mean SQL schemas or merely additional schema
 information along the lines of pg_class, etc.?
 

I thought that might prick your ears up. Yes, I'm looking at just how
horrible it might be to implement SQL schemas. As it turns out, I think
Tom is right, and I can just use a regular system table. I've got that
working, and some of the grammar modified to accept the schema.table
notation: now I'm working from both ends on all the places in between
that need to know about the schema. Vadim's work replacing relnames with
relfilenode helps to some extent.

I'm taking the simplistic approach of adding the schema name wherever
the relname is currently used, in parallel, and introducing a new global,
the current default schema.

As I said in my first note, I'm trying this out, regardless of whether it's the
best way to implement the feature (why is it that I only seem to find
time to work on new stuff in pgsql when we're in beta?) We can debate a
better implementation after I have roughly working code, or have given
up on it as a bad idea.

As it happens, my pgsql time this week is taken up by another task, so I
won't be working on this until the weekend, at the earliest.

Ross



Re: [HACKERS] Quite strange crash

2001-01-08 Thread Tom Lane

Denis Perchine [EMAIL PROTECTED] writes:
 FATAL: s_lock(401f7435) at bufmgr.c:2350, stuck spinlock. Aborting.
 
 Were there any errors before that?

 Actually you can have a look on the logs yourself.

Well, I found a smoking gun:

Jan  7 04:27:51 mx postgres[2501]: FATAL 1:  The system is shutting down

PID 2501 had been running:

Jan  7 04:25:44 mx postgres[2501]: query: vacuum verbose lazy;

What seems to have happened is that 2501 curled up and died, leaving
one or more buffer spinlocks locked.  Roughly one spinlock timeout
later, at 04:29:07, we have 1008 complaining of a stuck spinlock.
So that fits.

The real question is what happened to 2501?  None of the other backends
reported a SIGTERM signal, so the signal did not come from the
postmaster.

Another interesting datapoint: there is a second place in this logfile
where one single backend reports SIGTERM while its brethren keep running:

Jan  7 04:30:47 mx postgres[4269]: query: vacuum verbose;
...
Jan  7 04:38:16 mx postgres[4269]: FATAL 1:  The system is shutting down

There is something pretty fishy about this.  You aren't by any chance
running the postmaster under a ulimit setting that might cut off
individual backends after a certain amount of CPU time, are you?
What signal does a ulimit violation deliver on your machine, anyway?

regards, tom lane



[HACKERS] Re: Should heap_update/heap_delete hold buffer locks while toasting?

2001-01-08 Thread Jan Wieck

Tom Lane wrote:
 The way that heap_update() and heap_delete() are currently coded, they
 hold the buffer context lock on the buffer containing the old tuple
 while they invoke heap_tuple_toast_attrs().  This strikes me as at least
 inefficient and at worst a source of deadlock.  Is it possible to avoid
 holding the buffer lock while doing the TOAST manipulations?

Since the TOAST table access is doing its own locking on the
TOAST tables, I think it'd be possible to move it outside of
the buffer lock.


Jan

--

#==#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.  #
#== [EMAIL PROTECTED] #





Re: [HACKERS] is_view seems unnecessarily slow

2001-01-08 Thread Jan Wieck

Tom Lane wrote:
 backend/commands/command.c has a routine is_view() that tests for
 view-ness by scanning pg_rewrite (all of it) to see if the given
 relation has any ON SELECT rules.

 This is only used to disallow AlterTableAddConstraint and
 LockTableCommand on views.  While I don't care much about the
 performance of AlterTableAddConstraint, it does bug me that this
 might slow down LOCK TABLE a good deal.

 Any objection to replacing this routine by a test for relkind = VIEW?

No objections.

These checks came from the days when views still had relkind
'r' and looking up pg_rewrite was the only way to know.
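
Presumably something as simple as this would do (rough sketch; the real
routine's signature may differ):

    /* Much cheaper than scanning pg_rewrite: just look at pg_class.relkind. */
    static bool
    is_view(Relation rel)
    {
        return rel->rd_rel->relkind == RELKIND_VIEW;
    }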


Jan

--

#==#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.  #
#== [EMAIL PROTECTED] #





Re: [HACKERS] Assuming that TAS() will succeed the first time is verboten

2001-01-08 Thread Tom Lane

One last followup on that bizarreness about shutdown's checkpoint
failing on Alpha platforms ---

After changing the checkpoint code to loop, rather than assuming TAS()
must succeed the first time, I noticed that it always looped exactly
once.  This didn't make sense to me at the time, but after querying some
Alpha experts at DEC^H^H^HCompaq, it does now.  If a new process's first
write to a shared memory page is a stq_c, that stq_c is guaranteed to
fail (at least on Tru64 Unix), because it will page fault.  The shared
memory page is inherited read-only and is converted to read-write on
first fault.  This doesn't seem really necessary, but I suppose it's
done to share code with the copy-on-write case for non-shared pages
that are inherited via fork().

It makes sense that the checkpoint process's first write to shared
memory would be stq_c, because after all it shouldn't be scribbling
on shared memory until it's got the spinlock, n'est ce pas?

So a failure the first time through the TAS loop is entirely expected
for Alpha.  I wouldn't be surprised to see similar behavior on other
architectures, now that I see the first-write-from-a-process connection.

Bottom line is the same: always call TAS() in a retry loop.
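
In code form, the idiom is simply (sketch only; the real S_LOCK adds spin
delays and a stuck-spinlock timeout):

    /*
     * Never assume the first TAS() succeeds -- on Alpha the first write
     * to a shared page from a new process is guaranteed to fail, as above.
     */
    while (TAS(lock))
        ;                       /* spin until the lock is acquired */

    /* ... critical section ... */

    S_UNLOCK(lock);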

regards, tom lane



Re: [HACKERS] Quite strange crash

2001-01-08 Thread Nathan Myers

On Mon, Jan 08, 2001 at 12:21:38PM -0500, Tom Lane wrote:
 Denis Perchine [EMAIL PROTECTED] writes:
  FATAL: s_lock(401f7435) at bufmgr.c:2350, stuck spinlock. Aborting.
  
  Were there any errors before that?
 
  Actually you can have a look on the logs yourself.
 
 Well, I found a smoking gun: ...
 What seems to have happened is that 2501 curled up and died, leaving
 one or more buffer spinlocks locked.  ...
 There is something pretty fishy about this.  You aren't by any chance
 running the postmaster under a ulimit setting that might cut off
 individual backends after a certain amount of CPU time, are you?
 What signal does a ulimit violation deliver on your machine, anyway?

It's worth noting here that modern Unixes run around killing user-level
processes more or less at random when free swap space (and sometimes
just RAM) runs low.  AIX was the first such, but would send SIGDANGER
to processes first to try to reclaim some RAM; critical daemons were
expected to explicitly ignore SIGDANGER. Other Unixes picked up the 
idea without picking up the SIGDANGER behavior.

The reason for this common pathological behavior is usually traced
to sloppy resource accounting.  It manifests as the bad policy of 
having malloc() (and sbrk() or mmap() underneath) return a valid 
pointer rather than NULL, on the assumption that most of the memory 
asked for won't be used just yet.  Anyhow, the system doesn't know 
how much memory is really available at that moment.

Usually the problem is explained with the example of a very large
process that forks, suddenly demanding twice as much memory. (Apache
is particularly egregious this way, allocating lots of memory and
then forking several times.)  Instead of failing the fork, the kernel
waits for a process to touch memory it was granted and then see if 
any RAM/swap has turned up to satisfy it, and then kill the process 
(or some random other process!) if not.

Now that programs have come to depend on this behavior, it has become
very hard to fix it. The implication for the rest of us is that we 
should expect our processes to be killed at random, just for touching 
memory granted, or for no reason at all. (Kernel people say, "They're 
just user-level programs, restart them;" or, "Maybe we can designate 
some critical processes that don't get killed".)  In Linux they try 
to invent heuristics to avoid killing the X server, because so many 
programs depend on it.  It's a disgraceful mess, really.

The relevance to the issue at hand is that processes dying during 
heavy memory load is a documented feature of our supported platforms.

Nathan Myers 
[EMAIL PROTECTED]



[HACKERS] Dec TRU64/PG 7.1

2001-01-08 Thread Larry Rosenman

Has anyone tried 7.1Beta on Tru64? 

I've got an app that will be moving to this platform, and would like
not to have any surprises (it's the first production app with PG in
the Dallas office, and I'd like not to make trouble).

Thanks!

LER


-- 
Larry Rosenman http://www.lerctr.org/~ler
Phone: +1 972-414-9812 E-Mail: [EMAIL PROTECTED]
US Mail: 1905 Steamboat Springs Drive, Garland, TX 75044-6749



[HACKERS] README.mb

2001-01-08 Thread Tatsuo Ishii

I think doc/README.mb is now deprecated and I would like to remove it.
Also I would like to place the Chinese (Big5) version of README.mb
recently posted by Chih-Chang Hsieh [EMAIL PROTECTED] as
doc/README.mb.big5.

Comments? Objections?
--
Tatsuo Ishii



[HACKERS] Compaq open source database benchmark

2001-01-08 Thread Bruce Momjian

Compaq has released an open-source database benchmark at:

http://opensource.compaq.com/sourceforge/project/?group_id=19

I must say, Compaq has been more involved with PostgreSQL than any of
the other computer vendors.  They have contributed equipment, and now
benchmark source code.

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 853-3000
  +  If your life is a hard drive, |  830 Blythe Avenue
  +  Christ can be your backup.|  Drexel Hill, Pennsylvania 19026



Re: [HACKERS] Assuming that TAS() will succeed the first time is verboten

2001-01-08 Thread Bruce Momjian

 One last followup on that bizarreness about shutdown's checkpoint
 failing on Alpha platforms ---
 
 After changing the checkpoint code to loop, rather than assuming TAS()
 must succeed the first time, I noticed that it always looped exactly
 once.  This didn't make sense to me at the time, but after querying some
 Alpha experts at DEC^H^H^HCompaq, it does now.  If a new process's first
 write to a shared memory page is a stq_c, that stq_c is guaranteed to
 fail (at least on Tru64 Unix), because it will page fault.  The shared
 memory page is inherited read-only and is converted to read-write on
 first fault.  This doesn't seem really necessary, but I suppose it's
 done to share code with the copy-on-write case for non-shared pages
 that are inherited via fork().

This seems quite bizarre.  Why would the process fail on the write, and
not just pause and wait for the fault to bring in the page?  Doesn't the
CPU halt the instruction to fetch in the page and restart the
instruction?

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 853-3000
  +  If your life is a hard drive, |  830 Blythe Avenue
  +  Christ can be your backup.|  Drexel Hill, Pennsylvania 19026



Re: [HACKERS] Assuming that TAS() will succeed the first time is verboten

2001-01-08 Thread Tom Lane

Bruce Momjian [EMAIL PROTECTED] writes:
 After changing the checkpoint code to loop, rather than assuming TAS()
 must succeed the first time, I noticed that it always looped exactly
 once.  This didn't make sense to me at the time, but after querying some
 Alpha experts at DEC^H^H^HCompaq, it does now.  If a new process's first
 write to a shared memory page is a stq_c, that stq_c is guaranteed to
 fail (at least on Tru64 Unix), because it will page fault.  The shared
 memory page is inherited read-only and is converted to read-write on
 first fault.  This doesn't seem really necessary, but I suppose it's
 done to share code with the copy-on-write case for non-shared pages
 that are inherited via fork().

 This seems quite bizarre.  Why would the process fail on the write, and
 not just pause and wait for the fault to bring in the page?

An ordinary write would be re-executed and would succeed after the
page fault.  stq_c is different, because it's only supposed to succeed
if the processor has managed to hold an access lock on the target
address continuously since the ldq_l.  It would be very bad form to try
to hold the lock during a page fault.  (stq_c will also fail if the
processor is interrupted between ldq_l and stq_c, so occasional failures
are to be expected.  What was surprising me was the consistency of the
failure pattern.)

See the Alpha Architecture Manual if you really want to discuss this.

regards, tom lane



Re: [HACKERS] Assuming that TAS() will succeed the first time is verboten

2001-01-08 Thread Bruce Momjian


Oh, thanks.  That makes sense.

 Bruce Momjian [EMAIL PROTECTED] writes:
  After changing the checkpoint code to loop, rather than assuming TAS()
  must succeed the first time, I noticed that it always looped exactly
  once.  This didn't make sense to me at the time, but after querying some
  Alpha experts at DEC^H^H^HCompaq, it does now.  If a new process's first
  write to a shared memory page is a stq_c, that stq_c is guaranteed to
  fail (at least on Tru64 Unix), because it will page fault.  The shared
  memory page is inherited read-only and is converted to read-write on
  first fault.  This doesn't seem really necessary, but I suppose it's
  done to share code with the copy-on-write case for non-shared pages
  that are inherited via fork().
 
  This seems quite bizarre.  Why would the process fail on the write, and
  not just pause and wait for the fault to bring in the page?
 
 An ordinary write would be re-executed and would succeed after the
 page fault.  stq_c is different, because it's only supposed to succeed
 if the processor has managed to hold an access lock on the target
 address continuously since the ldq_l.  It would be very bad form to try
 to hold the lock during a page fault.  (stq_c will also fail if the
 processor is interrupted between ldq_l and stq_c, so occasional failures
 are to be expected.  What was surprising me was the consistency of the
 failure pattern.)
 
 See the Alpha Architecture Manual if you really want to discuss this.
 
   regards, tom lane
 


-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 853-3000
  +  If your life is a hard drive, |  830 Blythe Avenue
  +  Christ can be your backup.|  Drexel Hill, Pennsylvania 19026



Re: [HACKERS] README.mb

2001-01-08 Thread Tom Lane

Tatsuo Ishii [EMAIL PROTECTED] writes:
 I think doc/REAME.mb is now deprecated and I would like to remove it.
 Also I would like to place the Chinese (Big5) version of README.mb
 recently posted by Chih-Chang Hsieh [EMAIL PROTECTED] as
 doc/README.mb.big5.

Go for it.  I was surprised you hadn't done so already.

regards, tom lane



Re: [HACKERS] Dec TRU64/PG 7.1

2001-01-08 Thread Tom Lane

Larry Rosenman [EMAIL PROTECTED] writes:
 Has anyone tried 7.1Beta on Tru64? 

beta1 will not work, but recent snapshots are reported to pass
regression tests.  Feel free to pound away on it ...

regards, tom lane



Re: [HACKERS] Quite strange crash

2001-01-08 Thread Denis Perchine

  Well, I found a smoking gun: ...
  What seems to have happened is that 2501 curled up and died, leaving
  one or more buffer spinlocks locked.  ...
  There is something pretty fishy about this.  You aren't by any chance
  running the postmaster under a ulimit setting that might cut off
  individual backends after a certain amount of CPU time, are you?
  What signal does a ulimit violation deliver on your machine, anyway?

 It's worth noting here that modern Unixes run around killing user-level
 processes more or less at random when free swap space (and sometimes
 just RAM) runs low.  AIX was the first such, but would send SIGDANGER
 to processes first to try to reclaim some RAM; critical daemons were
 expected to explicitly ignore SIGDANGER. Other Unixes picked up the
 idea without picking up the SIGDANGER behavior.

That's not the case for sure. There are 512Mb on the machine, and when I had 
this problem it was completely unloaded (300Mb in caches).

-- 
Sincerely Yours,
Denis Perchine

--
E-Mail: [EMAIL PROTECTED]
HomePage: http://www.perchine.com/dyp/
FidoNet: 2:5000/120.5
--



Re: [HACKERS] Quite strange crash

2001-01-08 Thread Tom Lane

Denis Perchine [EMAIL PROTECTED] writes:
 It's worth noting here that modern Unixes run around killing user-level
 processes more or less at random when free swap space (and sometimes
 just RAM) runs low.

 That's not the case for sure. There are 512Mb on the machine, and when I had 
 this problem it was compltely unloaded (300Mb in caches).

The fact that VACUUM processes seemed to be preferential victims
suggests a resource limit of some sort.  I had suggested a CPU-time
limit, but perhaps it could also be disk-pages-written.

regards, tom lane



Re: [HACKERS] Quite strange crash

2001-01-08 Thread Denis Perchine

On Monday 08 January 2001 23:21, Tom Lane wrote:
 Denis Perchine [EMAIL PROTECTED] writes:
  FATAL: s_lock(401f7435) at bufmgr.c:2350, stuck spinlock. Aborting.
 
  Were there any errors before that?
 
  Actually you can have a look on the logs yourself.

 Well, I found a smoking gun:

 Jan  7 04:27:51 mx postgres[2501]: FATAL 1:  The system is shutting down

 PID 2501 had been running:

 Jan  7 04:25:44 mx postgres[2501]: query: vacuum verbose lazy;

Hmmm... actually this is a real problem with vacuum lazy. Sometimes it just does 
something for an enormous amount of time (I have mailed a sample database to 
Vadim, but did not get any response yet). It is possible that it was me who 
killed the backend.

 What seems to have happened is that 2501 curled up and died, leaving
 one or more buffer spinlocks locked.  Roughly one spinlock timeout
 later, at 04:29:07, we have 1008 complaining of a stuck spinlock.
 So that fits.

 The real question is what happened to 2501?  None of the other backends
 reported a SIGTERM signal, so the signal did not come from the
 postmaster.

 Another interesting datapoint: there is a second place in this logfile
 where one single backend reports SIGTERM while its brethren keep running:

 Jan  7 04:30:47 mx postgres[4269]: query: vacuum verbose;
 ...
 Jan  7 04:38:16 mx postgres[4269]: FATAL 1:  The system is shutting down

Hmmm... Maybe this also was me... But I am not sure here.

 There is something pretty fishy about this.  You aren't by any chance
 running the postmaster under a ulimit setting that might cut off
 individual backends after a certain amount of CPU time, are you?

[postgres@mx postgres]$ ulimit -a
core file size (blocks)  100
data seg size (kbytes)   unlimited
file size (blocks)   unlimited
max memory size (kbytes) unlimited
stack size (kbytes)  8192
cpu time (seconds)   unlimited
max user processes   2048
pipe size (512 bytes)8
open files   1024
virtual memory (kbytes)  2105343

No, there are no ulimits.

 What signal does a ulimit violation deliver on your machine, anyway?

if (psecs / HZ > p->rlim[RLIMIT_CPU].rlim_cur) {
    /* Send SIGXCPU every second.. */
    if (!(psecs % HZ))
        send_sig(SIGXCPU, p, 1);
    /* and SIGKILL when we go over max.. */
    if (psecs / HZ > p->rlim[RLIMIT_CPU].rlim_max)
        send_sig(SIGKILL, p, 1);
}

This part of the kernel shows the logic. It means that a process will get 
SIGXCPU each second while it is above the soft limit, and SIGKILL when it goes 
above the hard limit.
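
For reference, the same mechanism seen from the process's side (standard
POSIX setrlimit/SIGXCPU, nothing PostgreSQL-specific):

    #include <signal.h>
    #include <sys/resource.h>

    static void on_xcpu(int sig) { /* soft CPU limit exceeded; still running */ }

    int
    main(void)
    {
        struct rlimit rl = { 5, 10 };      /* soft: 5 s CPU, hard: 10 s CPU */

        signal(SIGXCPU, on_xcpu);          /* delivered each second past the soft limit */
        setrlimit(RLIMIT_CPU, &rl);
        for (;;)
            ;                              /* burn CPU; SIGKILL arrives at the hard limit */
    }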

-- 
Sincerely Yours,
Denis Perchine

--
E-Mail: [EMAIL PROTECTED]
HomePage: http://www.perchine.com/dyp/
FidoNet: 2:5000/120.5
--



Re: [HACKERS] Quite strange crash

2001-01-08 Thread Tom Lane

Denis Perchine [EMAIL PROTECTED] writes:
 Hmmm... actually this is a real problem with vacuum lazy. Sometimes it
 just does something for an enormous amount of time (I have mailed a sample
 database to Vadim, but did not get any response yet). It is possible
 that it was me who killed the backend.

Killing an individual backend with SIGTERM is bad luck.  The backend
will assume that it's being killed by the postmaster, and will exit
without a whole lot of concern for cleaning up shared memory --- the
expectation is that as soon as all the backends are dead, the postmaster
will reinitialize shared memory.

You can get away with sending SIGINT (QueryCancel) to an individual
backend.  Anything else voids the warranty ;=)

But, having said that --- this VACUUM process had only been running
for two minutes of real time.  Seems unlikely that you'd have chosen
to kill it so quickly.

regards, tom lane



RE: [HACKERS] Quite strange crash

2001-01-08 Thread Mikheev, Vadim

 Killing an individual backend with SIGTERM is bad luck.  The backend
 will assume that it's being killed by the postmaster, and will exit
 without a whole lot of concern for cleaning up shared memory --- the

What code will be returned to postmaster in this case?

Vadim



Re: [HACKERS] Quite strange crash

2001-01-08 Thread Tom Lane

"Mikheev, Vadim" [EMAIL PROTECTED] writes:
 Killing an individual backend with SIGTERM is bad luck.  The backend
 will assume that it's being killed by the postmaster, and will exit
 without a whole lot of concern for cleaning up shared memory --- the

 What code will be returned to postmaster in this case?

Right at the moment, the backend will exit with status 0.  I think you
are thinking the same thing I am: maybe a backend that receives SIGTERM
ought to exit with nonzero status.

That would mean that killing an individual backend would instantly
translate into an installation-wide restart.  I am not sure whether
that's a good idea.  Perhaps this cure is worse than the disease.
Comments anyone?

regards, tom lane



RE: [HACKERS] Quite strange crash

2001-01-08 Thread Mikheev, Vadim

  Killing an individual backend with SIGTERM is bad luck.  
  The backend will assume that it's being killed by the postmaster,
  and will exit without a whole lot of concern for cleaning up shared 
  memory --- the

SIGTERM --> die() --> elog(FATAL)

Is it true that elog(FATAL) doesn't clean up shmem etc?
This would be very bad...

  What code will be returned to postmaster in this case?
 
 Right at the moment, the backend will exit with status 0.  I think you
 are thinking the same thing I am: maybe a backend that 
 receives SIGTERM ought to exit with nonzero status.
 
 That would mean that killing an individual backend would instantly
 translate into an installation-wide restart.  I am not sure whether
 that's a good idea.  Perhaps this cure is worse than the disease.

Well, it's not a good idea because SIGTERM is used for ABORT + EXIT
(pg_ctl -m fast stop), but shouldn't ABORT clean up everything?

Vadim



Re: [HACKERS] Quite strange crash

2001-01-08 Thread Tom Lane

"Mikheev, Vadim" [EMAIL PROTECTED] writes:
 Killing an individual backend with SIGTERM is bad luck.  

 SIGTERM --> die() --> elog(FATAL)

 Is it true that elog(FATAL) doesn't clean up shmem etc?
 This would be very bad...

It tries, but I don't think it's possible to make a complete guarantee
without an unreasonable amount of overhead.  The case at hand was a
stuck spinlock because die() --> elog(FATAL) had neglected to release
that particular spinlock before exiting.  To guarantee that all
spinlocks will be released by die(), we'd need something like

START_CRIT_SECTION;
S_LOCK(spinlock);
record that we own spinlock;
END_CRIT_SECTION;

around every existing S_LOCK() call, and the reverse around every
S_UNLOCK.  Are you willing to pay that kind of overhead?  I'm not
sure this'd be enough anyway.  Guaranteeing that you have consistent
state at every instant that an ISR could interrupt you is not easy.
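
Spelled out as a wrapper, that bookkeeping would look roughly like this
(hypothetical names -- held_spinlocks[] does not exist in the sources):

    #define S_LOCK_TRACKED(lock) \
        do { \
            START_CRIT_SECTION; \
            S_LOCK(lock); \
            held_spinlocks[n_held_spinlocks++] = (lock); \
            END_CRIT_SECTION; \
        } while (0)

    /*
     * die() could then walk held_spinlocks[] and S_UNLOCK each entry before
     * exiting -- but the bookkeeping is paid on every S_LOCK/S_UNLOCK call.
     */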

regards, tom lane



Re: [HACKERS] Quite strange crash

2001-01-08 Thread Alfred Perlstein

* Mikheev, Vadim [EMAIL PROTECTED] [010108 23:08] wrote:
   Killing an individual backend with SIGTERM is bad luck.  
   The backend will assume that it's being killed by the postmaster,
   and will exit without a whole lot of concern for cleaning up shared 
   memory --- the
 
  SIGTERM --> die() --> elog(FATAL)
 
 Is it true that elog(FATAL) doesn't clean up shmem etc?
 This would be very bad...
 
   What code will be returned to postmaster in this case?
  
  Right at the moment, the backend will exit with status 0.  I think you
  are thinking the same thing I am: maybe a backend that 
  receives SIGTERM ought to exit with nonzero status.
  
  That would mean that killing an individual backend would instantly
  translate into an installation-wide restart.  I am not sure whether
  that's a good idea.  Perhaps this cure is worse than the disease.
 
 Well, it's not good idea because of SIGTERM is used for ABORT + EXIT
 (pg_ctl -m fast stop), but shouldn't ABORT clean up everything?

Er, shouldn't ABORT leave the system in the exact state that it's
in so that one can get a crashdump/traceback on a wedged process
without it trying to clean up after itself?

-- 
-Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]]
"I have the heart of a child; I keep it in a jar on my desk."