Re: [HACKERS] Quite strange crash

2001-01-08 Thread Alfred Perlstein

* Mikheev, Vadim <[EMAIL PROTECTED]> [010108 23:08] wrote:
> > >> Killing an individual backend with SIGTERM is bad luck.  
> > >> The backend will assume that it's being killed by the postmaster,
> > >> and will exit without a whole lot of concern for cleaning up shared 
> > >> memory --- the
> 
> SIGTERM --> die() --> elog(FATAL)
> 
> Is it true that elog(FATAL) doesn't clean up shmem etc?
> This would be very bad...
> 
> > > What code will be returned to postmaster in this case?
> > 
> > Right at the moment, the backend will exit with status 0.  I think you
> > are thinking the same thing I am: maybe a backend that 
> > receives SIGTERM ought to exit with nonzero status.
> > 
> > That would mean that killing an individual backend would instantly
> > translate into an installation-wide restart.  I am not sure whether
> > that's a good idea.  Perhaps this cure is worse than the disease.
> 
> Well, it's not a good idea, because SIGTERM is used for ABORT + EXIT
> (pg_ctl -m fast stop), but shouldn't ABORT clean up everything?

Er, shouldn't ABORT leave the system in the exact state that it's
in so that one can get a crashdump/traceback on a wedged process
without it trying to clean up after itself?

-- 
-Alfred Perlstein - [[EMAIL PROTECTED]|[EMAIL PROTECTED]]
"I have the heart of a child; I keep it in a jar on my desk."



Re: [HACKERS] Quite strange crash

2001-01-08 Thread Tom Lane

"Mikheev, Vadim" <[EMAIL PROTECTED]> writes:
> Killing an individual backend with SIGTERM is bad luck.  

> SIGTERM --> die() --> elog(FATAL)

> Is it true that elog(FATAL) doesn't clean up shmem etc?
> This would be very bad...

It tries, but I don't think it's possible to make a complete guarantee
without an unreasonable amount of overhead.  The case at hand was a
stuck spinlock because die() --> elog(FATAL) had neglected to release
that particular spinlock before exiting.  To guarantee that all
spinlocks will be released by die(), we'd need something like

START_CRIT_SECTION;
S_LOCK(spinlock);
record that we own spinlock;
END_CRIT_SECTION;

around every existing S_LOCK() call, and the reverse around every
S_UNLOCK.  Are you willing to pay that kind of overhead?  I'm not
sure this'd be enough anyway.  Guaranteeing that you have consistent
state at every instant that an ISR could interrupt you is not easy.
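
To make the cost concrete, such a wrapper would look something like this
(a sketch only; held_locks and friends are hypothetical, with S_LOCK/S_UNLOCK
and the critical-section macros as above):

#define MAX_HELD_SPINLOCKS 8

/* hypothetical per-backend registry of held spinlocks, so that
 * die() could release whatever is still held on the way out */
static volatile slock_t *held_locks[MAX_HELD_SPINLOCKS];
static int n_held = 0;

static void
s_lock_recorded(volatile slock_t *lock)
{
    START_CRIT_SECTION;         /* hold off die()/cancel interrupts */
    S_LOCK(lock);
    held_locks[n_held++] = lock;
    END_CRIT_SECTION;
}

static void
s_unlock_recorded(volatile slock_t *lock)
{
    START_CRIT_SECTION;
    held_locks[--n_held] = NULL;    /* assumes LIFO lock discipline */
    S_UNLOCK(lock);
    END_CRIT_SECTION;
}

/* what die() would have to run before exiting */
static void
release_held_spinlocks(void)
{
    while (n_held > 0)
        S_UNLOCK(held_locks[--n_held]);
}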

regards, tom lane



RE: [HACKERS] Quite strange crash

2001-01-08 Thread Mikheev, Vadim

> >> Killing an individual backend with SIGTERM is bad luck.  
> >> The backend will assume that it's being killed by the postmaster,
> >> and will exit without a whole lot of concern for cleaning up shared 
> >> memory --- the

SIGTERM --> die() --> elog(FATAL)

Is it true that elog(FATAL) doesn't clean up shmem etc?
This would be very bad...

> > What code will be returned to postmaster in this case?
> 
> Right at the moment, the backend will exit with status 0.  I think you
> are thinking the same thing I am: maybe a backend that 
> receives SIGTERM ought to exit with nonzero status.
> 
> That would mean that killing an individual backend would instantly
> translate into an installation-wide restart.  I am not sure whether
> that's a good idea.  Perhaps this cure is worse than the disease.

Well, it's not a good idea, because SIGTERM is used for ABORT + EXIT
(pg_ctl -m fast stop), but shouldn't ABORT clean up everything?

Vadim



Re: [HACKERS] Quite strange crash

2001-01-08 Thread Tom Lane

"Mikheev, Vadim" <[EMAIL PROTECTED]> writes:
>> Killing an individual backend with SIGTERM is bad luck.  The backend
>> will assume that it's being killed by the postmaster, and will exit
>> without a whole lot of concern for cleaning up shared memory --- the

> What code will be returned to postmaster in this case?

Right at the moment, the backend will exit with status 0.  I think you
are thinking the same thing I am: maybe a backend that receives SIGTERM
ought to exit with nonzero status.

That would mean that killing an individual backend would instantly
translate into an installation-wide restart.  I am not sure whether
that's a good idea.  Perhaps this cure is worse than the disease.
Comments anyone?
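
For illustration, the proposal is a one-line change in spirit (hypothetical
sketch only; the real backend goes through elog(FATAL) rather than a bare
signal handler):

#include <signal.h>
#include <unistd.h>

/* sketch: make SIGTERM produce a nonzero exit status, so that the
 * postmaster would treat the death as abnormal and do an
 * installation-wide restart (reinitializing shared memory) */
static void
die(int signo)
{
    /* ... transaction abort and shmem cleanup would go here ... */
    _exit(1);               /* currently this path exits with status 0 */
}

int
main(void)
{
    signal(SIGTERM, die);
    /* ... backend main loop ... */
    return 0;
}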

regards, tom lane



RE: [HACKERS] Quite strange crash

2001-01-08 Thread Mikheev, Vadim

> Killing an individual backend with SIGTERM is bad luck.  The backend
> will assume that it's being killed by the postmaster, and will exit
> without a whole lot of concern for cleaning up shared memory --- the

What code will be returned to postmaster in this case?

Vadim



Re: [HACKERS] Quite strange crash

2001-01-08 Thread Tom Lane

Denis Perchine <[EMAIL PROTECTED]> writes:
> Hmmm... actually this is a real problem with vacuum lazy. Sometimes it
> just does something for an enormous amount of time (I have mailed a sample
> database to Vadim, but did not get any response yet). It is possible
> that it was me who killed the backend.

Killing an individual backend with SIGTERM is bad luck.  The backend
will assume that it's being killed by the postmaster, and will exit
without a whole lot of concern for cleaning up shared memory --- the
expectation is that as soon as all the backends are dead, the postmaster
will reinitialize shared memory.

You can get away with sending SIGINT (QueryCancel) to an individual
backend.  Anything else voids the warranty ;=)

But, having said that --- this VACUUM process had only been running
for two minutes of real time.  Seems unlikely that you'd have chosen
to kill it so quickly.

regards, tom lane



Re: [HACKERS] Quite strange crash

2001-01-08 Thread Denis Perchine

On Monday 08 January 2001 23:21, Tom Lane wrote:
> Denis Perchine <[EMAIL PROTECTED]> writes:
> >>> FATAL: s_lock(401f7435) at bufmgr.c:2350, stuck spinlock. Aborting.
> >
> > Were there any errors before that?
> >
> > Actually you can have a look at the logs yourself.
>
> Well, I found a smoking gun:
>
> Jan  7 04:27:51 mx postgres[2501]: FATAL 1:  The system is shutting down
>
> PID 2501 had been running:
>
> Jan  7 04:25:44 mx postgres[2501]: query: vacuum verbose lazy;

Hmmm... actually this is a real problem with vacuum lazy. Sometimes it just 
does something for an enormous amount of time (I have mailed a sample database 
to Vadim, but did not get any response yet). It is possible that it was me who 
killed the backend.

> What seems to have happened is that 2501 curled up and died, leaving
> one or more buffer spinlocks locked.  Roughly one spinlock timeout
> later, at 04:29:07, we have 1008 complaining of a stuck spinlock.
> So that fits.
>
> The real question is what happened to 2501?  None of the other backends
> reported a SIGTERM signal, so the signal did not come from the
> postmaster.
>
> Another interesting datapoint: there is a second place in this logfile
> where one single backend reports SIGTERM while its brethren keep running:
>
> Jan  7 04:30:47 mx postgres[4269]: query: vacuum verbose;
> ...
> Jan  7 04:38:16 mx postgres[4269]: FATAL 1:  The system is shutting down

Hmmm... Maybe this was also me... But I am not sure here.

> There is something pretty fishy about this.  You aren't by any chance
> running the postmaster under a ulimit setting that might cut off
> individual backends after a certain amount of CPU time, are you?

[postgres@mx postgres]$ ulimit -a
core file size (blocks)  100
data seg size (kbytes)   unlimited
file size (blocks)   unlimited
max memory size (kbytes) unlimited
stack size (kbytes)  8192
cpu time (seconds)   unlimited
max user processes   2048
pipe size (512 bytes)8
open files   1024
virtual memory (kbytes)  2105343

No, there aren't any ulimits.

> What signal does a ulimit violation deliver on your machine, anyway?

if (psecs / HZ > p->rlim[RLIMIT_CPU].rlim_cur) {
        /* Send SIGXCPU every second.. */
        if (!(psecs % HZ))
                send_sig(SIGXCPU, p, 1);
        /* and SIGKILL when we go over max.. */
        if (psecs / HZ > p->rlim[RLIMIT_CPU].rlim_max)
                send_sig(SIGKILL, p, 1);
}

This part of the kernel shows the logic: a process will get SIGXCPU every 
second while it is above the soft limit, and SIGKILL once it goes above the 
hard limit.
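
(The same check can be made from inside the process with plain POSIX
getrlimit(); a small sketch:)

#include <stdio.h>
#include <sys/time.h>
#include <sys/resource.h>

int
main(void)
{
    struct rlimit rl;

    /* RLIMIT_CPU is the limit the kernel code above enforces */
    if (getrlimit(RLIMIT_CPU, &rl) != 0)
    {
        perror("getrlimit");
        return 1;
    }
    if (rl.rlim_cur == RLIM_INFINITY)
        printf("cpu time: unlimited\n");
    else
        printf("cpu time: soft %ld s, hard %ld s\n",
               (long) rl.rlim_cur, (long) rl.rlim_max);
    return 0;
}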

-- 
Sincerely Yours,
Denis Perchine

--
E-Mail: [EMAIL PROTECTED]
HomePage: http://www.perchine.com/dyp/
FidoNet: 2:5000/120.5
--



Re: [HACKERS] Quite strange crash

2001-01-08 Thread Tom Lane

Denis Perchine <[EMAIL PROTECTED]> writes:
>> It's worth noting here that modern Unixes run around killing user-level
>> processes more or less at random when free swap space (and sometimes
>> just RAM) runs low.

> That's not the case for sure. There are 512Mb on the machine, and when I had 
> this problem it was completely unloaded (>300Mb in caches).

The fact that VACUUM processes seemed to be preferential victims
suggests a resource limit of some sort.  I had suggested a CPU-time
limit, but perhaps it could also be disk-pages-written.

regards, tom lane



Re: [HACKERS] Quite strange crash

2001-01-08 Thread Denis Perchine

> > Well, I found a smoking gun: ...
> > What seems to have happened is that 2501 curled up and died, leaving
> > one or more buffer spinlocks locked.  ...
> > There is something pretty fishy about this.  You aren't by any chance
> > running the postmaster under a ulimit setting that might cut off
> > individual backends after a certain amount of CPU time, are you?
> > What signal does a ulimit violation deliver on your machine, anyway?
>
> It's worth noting here that modern Unixes run around killing user-level
> processes more or less at random when free swap space (and sometimes
> just RAM) runs low.  AIX was the first such, but would send SIGDANGER
> to processes first to try to reclaim some RAM; critical daemons were
> expected to explicitly ignore SIGDANGER. Other Unixes picked up the
> idea without picking up the SIGDANGER behavior.

That's not the case for sure. There are 512Mb on the machine, and when I had 
this problem it was completely unloaded (>300Mb in caches).

-- 
Sincerely Yours,
Denis Perchine

--
E-Mail: [EMAIL PROTECTED]
HomePage: http://www.perchine.com/dyp/
FidoNet: 2:5000/120.5
--



Re: [HACKERS] V7.0 Error: OID ##### no longer exists in pg_database

2001-01-08 Thread Bruce Momjian

I think this was fixed today.

> 
> > Any ideas on why this error would occur after dropping an existing
> > database and then re-creating it (createdb)?  The only workaround so far
> > is to completely destroy the postgres instance and start over.  
> > 
> > It appears that pg_database may be getting corrupted.
> > 
> > Sandy Barnes
> > Honeywell
> > Business & General Aviation Engineering
> > One Technology Center
> > 23500 West 105th Street
> > Olathe, KS. 66061
> > tel 913.712.2027
> > fax913.712.1347
> > email[EMAIL PROTECTED]
> > 
> 


-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 853-3000
  +  If your life is a hard drive, |  830 Blythe Avenue
  +  Christ can be your backup.|  Drexel Hill, Pennsylvania 19026



Re: [HACKERS] Dec TRU64/PG 7.1

2001-01-08 Thread Tom Lane

Larry Rosenman <[EMAIL PROTECTED]> writes:
> Has anyone tried 7.1Beta on Tru64? 

beta1 will not work, but recent snapshots are reported to pass
regression tests.  Feel free to pound away on it ...

regards, tom lane



Re: [HACKERS] README.mb

2001-01-08 Thread Tom Lane

Tatsuo Ishii <[EMAIL PROTECTED]> writes:
> I think doc/README.mb is now deprecated and I would like to remove it.
> Also I would like to place the Chinese (Big5) version of README.mb
> recently posted by Chih-Chang Hsieh <[EMAIL PROTECTED]> as
> doc/README.mb.big5.

Go for it.  I was surprised you hadn't done so already.

regards, tom lane



Re: [HACKERS] Assuming that TAS() will succeed the first time is verboten

2001-01-08 Thread Bruce Momjian


Oh, thanks.  That makes sense.

> Bruce Momjian <[EMAIL PROTECTED]> writes:
> >> After changing the checkpoint code to loop, rather than assuming TAS()
> >> must succeed the first time, I noticed that it always looped exactly
> >> once.  This didn't make sense to me at the time, but after querying some
> >> Alpha experts at DEC^H^H^HCompaq, it does now.  If a new process's first
> >> write to a shared memory page is a stq_c, that stq_c is guaranteed to
> >> fail (at least on Tru64 Unix), because it will page fault.  The shared
> >> memory page is inherited read-only and is converted to read-write on
> >> first fault.  This doesn't seem really necessary, but I suppose it's
> >> done to share code with the copy-on-write case for non-shared pages
> >> that are inherited via fork().
> 
> > This seems quite bizarre.  Why would the process fail on the write, and
> > not just pause and wait for the fault to bring in the page?
> 
> An ordinary write would be re-executed and would succeed after the
> page fault.  stq_c is different, because it's only supposed to succeed
> if the processor has managed to hold an access lock on the target
> address continuously since the ldq_l.  It would be very bad form to try
> to hold the lock during a page fault.  (stq_c will also fail if the
> processor is interrupted between ldq_l and stq_c, so occasional failures
> are to be expected.  What was surprising me was the consistency of the
> failure pattern.)
> 
> See the Alpha Architecture Manual if you really want to discuss this.
> 
>   regards, tom lane
> 


-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 853-3000
  +  If your life is a hard drive, |  830 Blythe Avenue
  +  Christ can be your backup.|  Drexel Hill, Pennsylvania 19026



Re: [HACKERS] Assuming that TAS() will succeed the first time is verboten

2001-01-08 Thread Tom Lane

Bruce Momjian <[EMAIL PROTECTED]> writes:
>> After changing the checkpoint code to loop, rather than assuming TAS()
>> must succeed the first time, I noticed that it always looped exactly
>> once.  This didn't make sense to me at the time, but after querying some
>> Alpha experts at DEC^H^H^HCompaq, it does now.  If a new process's first
>> write to a shared memory page is a stq_c, that stq_c is guaranteed to
>> fail (at least on Tru64 Unix), because it will page fault.  The shared
>> memory page is inherited read-only and is converted to read-write on
>> first fault.  This doesn't seem really necessary, but I suppose it's
>> done to share code with the copy-on-write case for non-shared pages
>> that are inherited via fork().

> This seems quite bizarre.  Why would the process fail on the write, and
> not just pause and wait for the fault to bring in the page?

An ordinary write would be re-executed and would succeed after the
page fault.  stq_c is different, because it's only supposed to succeed
if the processor has managed to hold an access lock on the target
address continuously since the ldq_l.  It would be very bad form to try
to hold the lock during a page fault.  (stq_c will also fail if the
processor is interrupted between ldq_l and stq_c, so occasional failures
are to be expected.  What was surprising me was the consistency of the
failure pattern.)

See the Alpha Architecture Manual if you really want to discuss this.

regards, tom lane



Re: [HACKERS] Assuming that TAS() will succeed the first time is verboten

2001-01-08 Thread Nathan Myers

On Mon, Jan 08, 2001 at 10:15:30PM -0500, Bruce Momjian wrote:
> > One last followup on that bizarreness about shutdown's checkpoint
> > failing on Alpha platforms ---
> > 
> > After changing the checkpoint code to loop, rather than assuming TAS()
> > must succeed the first time, I noticed that it always looped exactly
> > once.  This didn't make sense to me at the time, but after querying some
> > Alpha experts at DEC^H^H^HCompaq, it does now.  If a new process's first
> > write to a shared memory page is a stq_c, that stq_c is guaranteed to
> > fail (at least on Tru64 Unix), because it will page fault.  The shared
> > memory page is inherited read-only and is converted to read-write on
> > first fault.  This doesn't seem really necessary, but I suppose it's
> > done to share code with the copy-on-write case for non-shared pages
> > that are inherited via fork().
> 
> This seems quite bizarre.  Why would the process fail on the write, and
> not just pause and wait for the fault to bring in the page?  Doesn't the
> CPU halt the instruction to fetch in the page and restart the
> instruction?

This is normal, although non-intuitive.  (Good detective work, Tom.)  
The definition of load-locked/store-conditional says that if there's 
been an interrupt or trap (e.g. page fault) since the load-locked 
instruction executed, the store-conditional instruction fails.  That 
way you don't overwrite something that might have been written by 
another process that ran during the interval before you got the CPU
again.

Thus, the instruction does get restarted, but the lock has been 
(correctly) cleared, resulting in the need for failure/retry.  It's 
not a performance issue, because it only happens once per process.
Think of it as part of the cost of forking.

Nathan Myers
[EMAIL PROTECTED]



Re: [HACKERS] Assuming that TAS() will succeed the first time is verboten

2001-01-08 Thread Bruce Momjian

> One last followup on that bizarreness about shutdown's checkpoint
> failing on Alpha platforms ---
> 
> After changing the checkpoint code to loop, rather than assuming TAS()
> must succeed the first time, I noticed that it always looped exactly
> once.  This didn't make sense to me at the time, but after querying some
> Alpha experts at DEC^H^H^HCompaq, it does now.  If a new process's first
> write to a shared memory page is a stq_c, that stq_c is guaranteed to
> fail (at least on Tru64 Unix), because it will page fault.  The shared
> memory page is inherited read-only and is converted to read-write on
> first fault.  This doesn't seem really necessary, but I suppose it's
> done to share code with the copy-on-write case for non-shared pages
> that are inherited via fork().

This seems quite bizarre.  Why would the process fail on the write, and
not just pause and wait for the fault to bring in the page?  Doesn't the
CPU halt the instruction to fetch in the page and restart the
instruction?

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 853-3000
  +  If your life is a hard drive, |  830 Blythe Avenue
  +  Christ can be your backup.|  Drexel Hill, Pennsylvania 19026



[HACKERS] README.mb

2001-01-08 Thread Tatsuo Ishii

I think doc/README.mb is now deprecated and I would like to remove it.
Also I would like to place the Chinese (Big5) version of README.mb
recently posted by Chih-Chang Hsieh <[EMAIL PROTECTED]> as
doc/README.mb.big5.

Comments? Objections?
--
Tatsuo Ishii



[HACKERS] Dec TRU64/PG 7.1

2001-01-08 Thread Larry Rosenman

Has anyone tried 7.1Beta on Tru64? 

I've got an app that will be moving to this platform, and would like
to not have any surprises (it's the first production app with PG in
the Dallas office, and I would like to not make trouble).

Thanks!

LER


-- 
Larry Rosenman http://www.lerctr.org/~ler
Phone: +1 972-414-9812 E-Mail: [EMAIL PROTECTED]
US Mail: 1905 Steamboat Springs Drive, Garland, TX 75044-6749



[HACKERS] Compaq open source database benchmark

2001-01-08 Thread Bruce Momjian

Compaq has released an open-source database benchmark at:

http://opensource.compaq.com/sourceforge/project/?group_id=19

I must say, Compaq has been more involved with PostgreSQL than any of
the other computer vendors.  They have contributed equipment, and now
benchmark source code.

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 853-3000
  +  If your life is a hard drive, |  830 Blythe Avenue
  +  Christ can be your backup.|  Drexel Hill, Pennsylvania 19026



Re: [HACKERS] Quite strange crash

2001-01-08 Thread Nathan Myers

On Mon, Jan 08, 2001 at 12:21:38PM -0500, Tom Lane wrote:
> Denis Perchine <[EMAIL PROTECTED]> writes:
> >>> FATAL: s_lock(401f7435) at bufmgr.c:2350, stuck spinlock. Aborting.
> > 
> > Were there any errors before that?
> 
> > Actually you can have a look at the logs yourself.
> 
> Well, I found a smoking gun: ...
> What seems to have happened is that 2501 curled up and died, leaving
> one or more buffer spinlocks locked.  ...
> There is something pretty fishy about this.  You aren't by any chance
> running the postmaster under a ulimit setting that might cut off
> individual backends after a certain amount of CPU time, are you?
> What signal does a ulimit violation deliver on your machine, anyway?

It's worth noting here that modern Unixes run around killing user-level
processes more or less at random when free swap space (and sometimes
just RAM) runs low.  AIX was the first such, but would send SIGDANGER
to processes first to try to reclaim some RAM; critical daemons were
expected to explicitly ignore SIGDANGER. Other Unixes picked up the 
idea without picking up the SIGDANGER behavior.

The reason for this common pathological behavior is usually traced
to sloppy resource accounting.  It manifests as the bad policy of 
having malloc() (and sbrk() or mmap() underneath) return a valid 
pointer rather than NULL, on the assumption that most of the memory 
asked for won't be used just yet.  Anyhow, the system doesn't know 
how much memory is really available at that moment.

Usually the problem is explained with the example of a very large
process that forks, suddenly demanding twice as much memory. (Apache
is particularly egregious this way, allocating lots of memory and
then forking several times.)  Instead of failing the fork, the kernel
waits for a process to touch memory it was granted, then sees if 
any RAM/swap has turned up to satisfy it, and then kills the process 
(or some random other process!) if not.

Now that programs have come to depend on this behavior, it has become
very hard to fix it. The implication for the rest of us is that we 
should expect our processes to be killed at random, just for touching 
memory granted, or for no reason at all. (Kernel people say, "They're 
just user-level programs, restart them;" or, "Maybe we can designate 
some critical processes that don't get killed".)  In Linux they try 
to invent heuristics to avoid killing the X server, because so many 
programs depend on it.  It's a disgraceful mess, really.
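
The behavior is easy to demonstrate with a toy program (a sketch; on an
overcommitting kernel, expect an OOM kill rather than a NULL return):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int
main(void)
{
    const size_t chunk = 64UL * 1024 * 1024;    /* 64 Mb per request */
    size_t total = 0;
    char *p;

    /* malloc() keeps returning valid pointers long past RAM+swap... */
    while ((p = malloc(chunk)) != NULL)
    {
        /* ...it is touching the pages that forces the kernel to find
         * real memory, and to kill this (or some other!) process when
         * it can't, instead of ever failing the malloc() */
        memset(p, 0xaa, chunk);
        total += chunk;
        fprintf(stderr, "touched %lu Mb\n", (unsigned long) (total >> 20));
    }
    perror("malloc");       /* often never reached */
    return 0;
}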

The relevance to the issue at hand is that processes dying during 
heavy memory load is a documented feature of our supported platforms.

Nathan Myers 
[EMAIL PROTECTED]



Re: [HACKERS] Assuming that TAS() will succeed the first time is verboten

2001-01-08 Thread Tom Lane

One last followup on that bizarreness about shutdown's checkpoint
failing on Alpha platforms ---

After changing the checkpoint code to loop, rather than assuming TAS()
must succeed the first time, I noticed that it always looped exactly
once.  This didn't make sense to me at the time, but after querying some
Alpha experts at DEC^H^H^HCompaq, it does now.  If a new process's first
write to a shared memory page is a stq_c, that stq_c is guaranteed to
fail (at least on Tru64 Unix), because it will page fault.  The shared
memory page is inherited read-only and is converted to read-write on
first fault.  This doesn't seem really necessary, but I suppose it's
done to share code with the copy-on-write case for non-shared pages
that are inherited via fork().

It makes sense that the checkpoint process's first write to shared
memory would be stq_c, because after all it shouldn't be scribbling
on shared memory until it's got the spinlock, n'est-ce pas?

So a failure the first time through the TAS loop is entirely expected
for Alpha.  I wouldn't be surprised to see similar behavior on other
architectures, now that I see the first-write-from-a-process connection.

Bottom line is the same: always call TAS() in a retry loop.
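
In code, the rule comes down to this (a sketch; slock_t and TAS() as in
s_lock.h, with the stuck-spinlock timeout check elided):

static void
acquire_spinlock(volatile slock_t *lock)
{
    /* never assume the first TAS() succeeds: a nonzero result --
     * including the guaranteed first-write failure on Alpha --
     * just means "try again" */
    while (TAS(lock))
        ;       /* spin; the real s_lock() also detects stuck locks */
}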

regards, tom lane



Re: [HACKERS] is_view seems unnecessarily slow

2001-01-08 Thread Jan Wieck

Tom Lane wrote:
> backend/commands/command.c has a routine is_view() that tests for
> view-ness by scanning pg_rewrite (all of it) to see if the given
> relation has any ON SELECT rules.
>
> This is only used to disallow AlterTableAddConstraint and
> LockTableCommand on views.  While I don't care much about the
> performance of AlterTableAddConstraint, it does bug me that this
> might slow down LOCK TABLE a good deal.
>
> Any objection to replacing this routine by a test for relkind = VIEW?

No objections.

These checks came from the days when views still had relkind
'r' and looking up pg_rewrite was the only way to know.
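
For illustration, the proposed test is essentially one line (a sketch using
the usual catalog names; not necessarily the committed code):

/* sketch: now that views carry their own relkind, is_view() can test
 * the relcache entry instead of scanning all of pg_rewrite */
static bool
is_view(Relation rel)
{
    return rel->rd_rel->relkind == RELKIND_VIEW;
}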


Jan

--

#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== [EMAIL PROTECTED] #





[HACKERS] Re: Should heap_update/heap_delete hold buffer locks while toasting?

2001-01-08 Thread Jan Wieck

Tom Lane wrote:
> The way that heap_update() and heap_delete() are currently coded, they
> hold the buffer context lock on the buffer containing the old tuple
> while they invoke heap_tuple_toast_attrs().  This strikes me as at least
> inefficient and at worst a source of deadlock.  Is it possible to avoid
> holding the buffer lock while doing the TOAST manipulations?

Since the TOAST table access is doing its own locking on the
TOAST tables, I think it'd be possible to move it outside of
the buffer lock.
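
In heap_update() terms that would be roughly this reordering (a sketch only,
using the calls named in the question above):

/* sketch: release the buffer context lock before toasting; the
 * TOAST-table accesses take their own locks, and the buffer pin
 * (refcount) is still held across the call */
LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
heap_tuple_toast_attrs(relation, newtup, &oldtup);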


Jan

--

#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== [EMAIL PROTECTED] #





Re: [HACKERS] Quite strange crash

2001-01-08 Thread Tom Lane

Denis Perchine <[EMAIL PROTECTED]> writes:
>>> FATAL: s_lock(401f7435) at bufmgr.c:2350, stuck spinlock. Aborting.
> 
> Were there any errors before that?

> Actually you can have a look at the logs yourself.

Well, I found a smoking gun:

Jan  7 04:27:51 mx postgres[2501]: FATAL 1:  The system is shutting down

PID 2501 had been running:

Jan  7 04:25:44 mx postgres[2501]: query: vacuum verbose lazy;

What seems to have happened is that 2501 curled up and died, leaving
one or more buffer spinlocks locked.  Roughly one spinlock timeout
later, at 04:29:07, we have 1008 complaining of a stuck spinlock.
So that fits.

The real question is what happened to 2501?  None of the other backends
reported a SIGTERM signal, so the signal did not come from the
postmaster.

Another interesting datapoint: there is a second place in this logfile
where one single backend reports SIGTERM while its brethren keep running:

Jan  7 04:30:47 mx postgres[4269]: query: vacuum verbose;
...
Jan  7 04:38:16 mx postgres[4269]: FATAL 1:  The system is shutting down

There is something pretty fishy about this.  You aren't by any chance
running the postmaster under a ulimit setting that might cut off
individual backends after a certain amount of CPU time, are you?
What signal does a ulimit violation deliver on your machine, anyway?

regards, tom lane



Re: [HACKERS] bootstrap tables

2001-01-08 Thread Ross J. Reedstrom

On Sat, Jan 06, 2001 at 03:50:03AM +0100, Peter Eisentraut wrote:
> Ross J. Reedstrom writes:
> 
> > > Do you really need the thing to be a bootstrap table, and not a plain
> > > system table?
> >
> > Yup, 'cause it's going to store the schema info, including the system
> > schema. I foresee it needing to be accessed immediately during bootstrap.
> 
> Does "schema info" mean SQL schemas or merely additional schema
> information along the lines of pg_class, etc.?
> 

I thought that might prick your ears up. Yes, I'm looking at just how
horrible it might be to implement SQL schemas. As it turns out, I think
Tom is right, and I can just use a regular system table. I've got that
working, and some of the grammar modified to accept the schema.table
notation: now I'm working from both ends on all the places in between
that need to know about the schema. Vadim's work replacing relnames with
relfilenode helps to some extent.

I'm taking the simplistic approach of adding the schema name wherever
the relname is currently used, in parallel, and introducing a new global,
the current default schema.

As I said in my first note, I'm trying this out, regardless of whether it's the
best way to implement the feature (why is it that I only seem to find
time to work on new stuff in pgsql when we're in beta?) We can debate a
better implementation after I have roughly working code, or have given
up on it as a bad idea.

As it happens, my pgsql time this week is taken up by another task, so I
won't be working on this until the weekend, at the earliest.

Ross



Re: [HACKERS] Quite strange crash

2001-01-08 Thread Denis Perchine

> >>> FATAL: s_lock(401f7435) at bufmgr.c:2350, stuck spinlock. Aborting.
> >>
> >> Were there any errors before that?
> >
> > No... Just a clean log (I redirect the log from stderr/out to a file, and
> > all other output to syslog).
>
> The error messages would be in the syslog then, not in stderr.

Hmmm... The only strange errors I see are:

Jan  7 04:22:14 mx postgres[679]: query: insert into statistic (date, 
visit_count, variant_id) values (now(), 1, 2)
Jan  7 04:22:14 mx postgres[631]: query: insert into statistic (date, 
visit_count, variant_id) values (now(), 1, 2)
Jan  7 04:22:14 mx postgres[700]: query: insert into statistic (date, 
visit_count, variant_id) values (now(), 1, 2)
Jan  7 04:22:14 mx postgres[665]: query: insert into statistic (date, 
visit_count, variant_id) values (now(), 1, 2)
Jan  7 04:22:14 mx postgres[633]: query: insert into statistic (date, 
visit_count, variant_id) values (now(), 1, 2)
Jan  7 04:22:14 mx postgres[629]: query: insert into statistic (date, 
visit_count, variant_id) values (now(), 1, 2)
Jan  7 04:22:14 mx postgres[736]: query: commit
Jan  7 04:22:14 mx postgres[736]: ProcessUtility: commit
Jan  7 04:22:14 mx postgres[700]: ERROR:  Cannot insert a duplicate key into 
unique index statistic_date_vid_key
Jan  7 04:22:14 mx postgres[700]: query: update users set 
rcpt_ip='213.75.35.129',rcptdate=now() where id=1428067
Jan  7 04:22:14 mx postgres[700]: NOTICE:  current transaction is aborted, 
queries ignored until end of transaction block
Jan  7 04:22:14 mx postgres[679]: query: commit
Jan  7 04:22:14 mx postgres[679]: ProcessUtility: commit
Jan  7 04:22:14 mx postgres[679]: query: update users set 
rcpt_ip='213.75.55.185',rcptdate=now() where id=1430836
Jan  7 04:22:14 mx postgres[665]: ERROR:  Cannot insert a duplicate key into 
unique index statistic_date_vid_key
Jan  7 04:22:14 mx postgres[665]: query: update users set 
rcpt_ip='202.156.121.139',rcptdate=now() where id=1271397
Jan  7 04:22:14 mx postgres[665]: NOTICE:  current transaction is aborted, 
queries ignored until end of transaction block
Jan  7 04:22:14 mx postgres[631]: ERROR:  Cannot insert a duplicate key into 
unique index statistic_date_vid_key
Jan  7 04:22:14 mx postgres[631]: query: update users set 
rcpt_ip='24.20.53.63',rcptdate=now() where id=1451254
Jan  7 04:22:14 mx postgres[631]: NOTICE:  current transaction is aborted, 
queries ignored until end of transaction block
Jan  7 04:22:14 mx postgres[633]: ERROR:  Cannot insert a duplicate key into 
unique index statistic_date_vid_key
Jan  7 04:22:14 mx postgres[633]: query: update users set 
rcpt_ip='213.116.168.173',rcptdate=now() where id=1378049
Jan  7 04:22:14 mx postgres[633]: NOTICE:  current transaction is aborted, 
queries ignored until end of transaction block
Jan  7 04:22:14 mx postgres[630]: query: select id,msg,next from alert
Jan  7 04:22:14 mx postgres[630]: query: select email,type from email where 
variant_id=2
Jan  7 04:22:14 mx postgres[630]: query:
select * from users where senderdate > now()-'10days'::interval AND
variant_id=2 AND crypt='21AN6KRffJdFRFc511'
 
Jan  7 04:22:14 mx postgres[629]: ERROR:  Cannot insert a duplicate key into 
unique index statistic_date_vid_key
Jan  7 04:22:14 mx postgres[629]: query: update users set 
rcpt_ip='213.42.45.81',rcptdate=now() where id=1441046
Jan  7 04:22:14 mx postgres[629]: NOTICE:  current transaction is aborted, 
queries ignored until end of transaction block
Jan  7 04:22:15 mx postgres[711]: query: select message_id from pop3 where 
server_id = 17746
Jan  7 04:22:15 mx postgres[711]: ERROR:  Relation 'pop3' does not exist

They popped up 4 minutes before. And the most interesting thing is that 
relation pop3 does exist!

> > And the last query was:
> > Jan  7 04:27:53 mx postgres[1008]: query: select message_id from pop3
> > where server_id = 22615
>
> How about the prior queries of other processes?

I do not want to flood the mailing list (it would be too much info). I can send 
you the complete log file from Jan 7. It is 128Mb uncompressed; with gz it is 
8Mb. Maybe it will be smaller with bz2.

>  Keep in mind that the
> spinlock could have been left locked by any backend, not only the one
> that complained about it.

Actually you can have a look at the logs yourself. Remember, I gave you the 
password for the postgres user. This is the same postgres. Logs are in 
/var/log/postgres. You will need postgres.log.1.gz.

-- 
Sincerely Yours,
Denis Perchine

--
E-Mail: [EMAIL PROTECTED]
HomePage: http://www.perchine.com/dyp/
FidoNet: 2:5000/120.5
--



Re: [HACKERS] heap_update is broken in current sources

2001-01-08 Thread The Hermit Hacker


okay, will bundle up beta2 and announce it tonight when I get home ...
gives about 6 hrs or so to "halt the presses" *grin*

On Mon, 8 Jan 2001, Tom Lane wrote:

> The Hermit Hacker <[EMAIL PROTECTED]> writes:
> > How are we on this?
>
> It's fixed.
>
> I've also run the regress tests with bufmgr.c hacked up to discard
> pages (with forcible overwriting) as soon as their refcount goes to
> zero.  That didn't disclose any similar bugs, although the coverage
> of the tests leaves much to be desired :-(
>
>   regards, tom lane
>

Marc G. Fournier   ICQ#7615664   IRC Nick: Scrappy
Systems Administrator @ hub.org
primary: [EMAIL PROTECTED]   secondary: scrappy@{freebsd|postgresql}.org




Re: [HACKERS] Quite strange crash

2001-01-08 Thread Tom Lane

Denis Perchine <[EMAIL PROTECTED]> writes:
> On Monday 08 January 2001 00:08, Tom Lane wrote:
>>> FATAL: s_lock(401f7435) at bufmgr.c:2350, stuck spinlock. Aborting.
>> 
>> Were there any errors before that?

> No... Just a clean log (I redirect the log from stderr/out to a file, and
> all other output to syslog).

The error messages would be in the syslog then, not in stderr.

> And the last query was:
> Jan  7 04:27:53 mx postgres[1008]: query: select message_id from pop3 where 
> server_id = 22615

How about the prior queries of other processes?  Keep in mind that the
spinlock could have been left locked by any backend, not only the one
that complained about it.

regards, tom lane



Re: [HACKERS] pg_dump return status..

2001-01-08 Thread Tom Lane

Pete Forman <[EMAIL PROTECTED]> writes:
> Philip Warner writes:
>> All I need to know is how to detect an error. Does it return EOF on
>> error?

> The standard sprintf() returns a negative int on error.

I thought we were talking about fprintf.  sprintf can't really detect
any errors anyway, except maybe a bad format string.

regards, tom lane



Re: [HACKERS] heap_update is broken in current sources

2001-01-08 Thread Tom Lane

The Hermit Hacker <[EMAIL PROTECTED]> writes:
> How are we on this?

It's fixed.

I've also run the regress tests with bufmgr.c hacked up to discard
pages (with forcible overwriting) as soon as their refcount goes to
zero.  That didn't disclose any similar bugs, although the coverage
of the tests leaves much to be desired :-(

regards, tom lane



Re: [HACKERS] heap_update is broken in current sources

2001-01-08 Thread The Hermit Hacker


How are we on this?

On Sun, 7 Jan 2001, Tom Lane wrote:

> heap_update() currently ends with
>
> if (newbuf != buffer)
> {
> LockBuffer(newbuf, BUFFER_LOCK_UNLOCK);
> WriteBuffer(newbuf);
> }
> LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
> WriteBuffer(buffer);
>
> /* invalidate caches */
> RelationInvalidateHeapTuple(relation, &oldtup);
> RelationMark4RollbackHeapTuple(relation, newtup);
>
> return HeapTupleMayBeUpdated;
>
> This is broken because WriteBuffer releases our refcounts on the buffer
> pages that are holding the old and new tuples.  By the time
> RelationInvalidateHeapTuple gets to do its thing, some other backend may
> have swapped a new disk page into the shared buffer that oldtup points
> at.  catcache.c will then be using the wrong data to compute the hash
> index of the old tuple.  This will at minimum result in failure to
> invalidate the old tuple out of our catcache (because we'll be searching
> the wrong hashchains), and can lead to a flat-out crash or Assert
> failure due to invalid data being fed to the hashing code.
>
> I have seen several nonrepeatable failures in the parallel regress tests
> in recent weeks, which I now believe are all traceable to this error.
>
> I will commit a fix for this error shortly, and have recommended to Marc
> that he re-roll the beta2 tarball before announcing it...
>
>   regards, tom lane
>

Marc G. Fournier   ICQ#7615664   IRC Nick: Scrappy
Systems Administrator @ hub.org
primary: [EMAIL PROTECTED]   secondary: scrappy@{freebsd|postgresql}.org
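
For reference, the fix presumably amounts to doing the cache invalidation
while the buffer refcounts are still held, i.e. reordering the tail of
heap_update() roughly like this (a sketch, not the committed diff):

/* invalidate caches BEFORE WriteBuffer() releases our refcounts,
 * so oldtup still points at the right disk page when catcache.c
 * computes its hash index */
RelationInvalidateHeapTuple(relation, &oldtup);
RelationMark4RollbackHeapTuple(relation, newtup);

if (newbuf != buffer)
{
    LockBuffer(newbuf, BUFFER_LOCK_UNLOCK);
    WriteBuffer(newbuf);
}
LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
WriteBuffer(buffer);

return HeapTupleMayBeUpdated;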




Re: [HACKERS] patch: contrib/pgcrypto sanity

2001-01-08 Thread Marko Kreen

On Mon, Jan 08, 2001 at 10:03:25AM +0000, Pete Forman wrote:
> Marko Kreen writes:
>  > On Mon, Jan 08, 2001 at 04:06:09AM +0200, Marko Kreen wrote:
>  > > Well, actually they do.  glibc in <stdint.h> and NetBSD in
>  > > <inttypes.h>, which is a mess, all right.  Problem is that
>  > > postgres.h does not know about this.  I guess that C99 forgot to
>  > > specify _where_ they should be defined.
>  > 
>  > Correction, they both have <inttypes.h> which probably is the right
>  > location for this.
> 
<stdint.h> is adequate to pick up uint*_t.  <inttypes.h> is defined to
include <stdint.h>.  Of course all this C99 stuff is new and existing
> implementations may have the typedefs in different files or not have
> them at all.

But as I said, NetBSD does not have it.  So what is the
correct/portable/standard location for it?  Can anyone with C99
standard in hand find that out?

E.g. Tom Lane has some OS where these types are in
std{io|lib|def|arg}, but on NetBSD and glibc/Linux you must include
a separate header file for them.

-- 
marko




Re: [HACKERS] pg_dump return status..

2001-01-08 Thread Pete Forman

Philip Warner writes:
 > At 09:36 8/01/01 +, Pete Forman wrote:
 > >There are no compiler errors, just run time errors if you rely on
 > >the return from sprintf() being the number of characters.
 > 
 > All I need to know is how to detect an error. Does it return EOF on
 > error?

The standard sprintf() returns a negative int on error.  That value
may or may not be EOF.  EOF technically indicates the end of an input
stream.  It has no direct connection with errors writing to streams.

The "BSD" version in Solaris always returns the char* pointer that is
the first argument.  There is no way to check for an error.

Other BSD OSs such as FreeBSD, NetBSD and OpenBSD all conform to C89,
i.e. sprintf() returns an int which is the number of characters
written (excluding '\0') or a negative number for failure.
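
Putting the thread's answers together, the portable checks look like this
(a sketch):

#include <stdio.h>

/* portable error checks for the calls discussed in this thread:
 *   fwrite:  ok if the return value equals the item count
 *   fprintf: ok if the return value is >= 0 (negative means error)
 *   fputc:   ok if the result is not EOF */
static int
write_all(FILE *fp, const char *buf, size_t n)
{
    if (fwrite(buf, 1, n, fp) != n)
        return -1;
    if (fprintf(fp, "wrote %lu bytes\n", (unsigned long) n) < 0)
        return -1;
    if (fputc('\n', fp) == EOF)
        return -1;
    return 0;
}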
-- 
Pete Forman -./\.- Disclaimer: This post is originated
WesternGeco   -./\.-  by myself and does not represent
[EMAIL PROTECTED] -./\.-  opinion of Schlumberger, Baker
http://www.crosswinds.net/~petef  -./\.-  Hughes or their divisions.



Re: [HACKERS] pg_dump return status..

2001-01-08 Thread Philip Warner

At 09:36 8/01/01 +, Pete Forman wrote:
>There are no compiler errors, just run time
>errors if you rely on the return from sprintf() being the number of
>characters.

All I need to know is how to detect an error. Does it return EOF on error?



Philip Warner
Albatross Consulting Pty. Ltd.
(A.B.N. 75 008 659 498)
Tel: (+61) 0500 83 82 81
Fax: (+61) 0500 83 82 82
Http://www.rhyme.com.au
PGP key available upon request, and from pgp5.ai.mit.edu:11371



Re: [HACKERS] patch: contrib/pgcrypto sanity

2001-01-08 Thread Pete Forman

Marko Kreen writes:
 > On Mon, Jan 08, 2001 at 04:06:09AM +0200, Marko Kreen wrote:
 > > On Sun, Jan 07, 2001 at 08:09:07PM -0500, Tom Lane wrote:
 > > > Marko Kreen <[EMAIL PROTECTED]> writes:
 > > > > Which basically changes u_int*_t -> uint*_t, so now it does
 > > > > not compile under either Debian 2.2 or NetBSD 1.5,
 > > > > which is platform independent, all right.
 > > > 
 > > > Well, that's annoying.  I guess those platforms are out of step
 > > > with the C99 standard, which specifies uint*_t not u_int*_t
 > > > (cf. C99 7.4.1.1).  I agree with your solution of switching to
 > > > Postgres-supplied typenames.
 > > 
 > > Well, actually they do.  glibc in <stdint.h> and NetBSD in
 > > <inttypes.h>, which is a mess, all right.  Problem is that
 > > postgres.h does not know about this.  I guess that C99 forgot to
 > > specify _where_ they should be defined.
 > 
 > Correction, they both have <inttypes.h> which probably is the right
 > location for this.

<stdint.h> is adequate to pick up uint*_t.  <inttypes.h> is defined to
include <stdint.h>.  Of course all this C99 stuff is new and existing
implementations may have the typedefs in different files or not have
them at all.
-- 
Pete Forman -./\.- Disclaimer: This post is originated
WesternGeco   -./\.-  by myself and does not represent
[EMAIL PROTECTED] -./\.-  opinion of Schlumberger, Baker
http://www.crosswinds.net/~petef  -./\.-  Hughes or their divisions.



Re: [HACKERS] pg_dump return status..

2001-01-08 Thread Pete Forman

Nathan Myers writes:
 > On Fri, Jan 05, 2001 at 11:20:43AM -0500, Tom Lane wrote:
 > > Philip Warner <[EMAIL PROTECTED]> writes:
 > > > how do I
 > > > check for a failed write in a way that works on all Unixes? Is the
 > > > following OK:
 > > 
 > > > - fwrite: ok if return value equals item count
 > > > - fprintf: ok if return value > 0.
 > > > - fputc: ok if != EOF
 > > 
 > > Probably fprintf() >= 0 --- according to my specs, it returns the number
 > > of chars emitted, or a negative value on error.  The other two are
 > > correct.
 > 
 > An fprintf returning 0 is a suspicious event; it's easy to imagine 
 > cases where it makes sense, but I don't think I have ever coded one.
 > Probably >N (where N is the smallest reasonable output, defaulting 
 > to 1) may be a better test in real code.
 > 
 > As I recall, on SunOS 4 the printf()s don't return the number of 
 > characters written.  I don't recall what they do instead, and have
 > no access to such machines any more.
 > 
 > Other old BSD-derived systems are likely to have wonky return 
 > values/types on the printf()s.  Looking at the list of supported 
 > platforms, none jump out as likely candidates, but in the "unsupported" 
 > list, Ultrix and NextStep do.  (Do we care?)
 > 
 > If SunOS 4 is to remain a supported platform, the printf checks may 
 > need to be special-cased for it.

Current Solaris is liable to problems still, though these are not
relevant to this thread.  printf() and fprintf() have always returned
the number of characters transmitted, or EOF for failure.  It is
sprintf() that has problems.

There are two versions of sprintf() available in SunOS 4 - 8.  The
standard one (ANSI C) in libc returns an int, the number of characters
written (excluding '\0').  The BSD version returns a char* which
points to the target.  If you have a -lbsd on your link line then you
get the BSD version.  There are no compiler errors, just run time
errors if you rely on the return from sprintf() being the number of
characters.  The workaround is to put an extra -lc on the link line
before the -lbsd if your code needs both standard sprintf() and some
other BSD function.

Ultrix is documented as having the same behaviour as Solaris.  I don't
know about NeXTSTEP/OPENSTEP/GNUStep.
-- 
Pete Forman -./\.- Disclaimer: This post is originated
WesternGeco   -./\.-  by myself and does not represent
[EMAIL PROTECTED] -./\.-  opinion of Schlumberger, Baker
http://www.crosswinds.net/~petef  -./\.-  Hughes or their divisions.



Re: [HACKERS] ALL, ANY bug?

2001-01-08 Thread Tatsuo Ishii

> No, I still disagree.  The ANY/ALL constructs compare a single row on
> the left side with all the rows produced by the query on the right.
> "values(0,1)" is effectively the same as "SELECT 0,1", ie, it is a
> query that happens to produce just one row.  The above is illegal in
> SQL92 because the lefthand side is not a two-item row.

Oh, I see your point: values(0,1) = "SELECT 0,1", which is a one-row
constructor. Sorry for the confusion.
--
Tatsuo Ishii



Re: [HACKERS] ALL, ANY bug?

2001-01-08 Thread Tom Lane

Tatsuo Ishii <[EMAIL PROTECTED]> writes:
> You are right. However, SQL92 spec allows:
> select * from t1 where i <> all values(0,1);

No, I still disagree.  The ANY/ALL constructs compare a single row on
the left side with all the rows produced by the query on the right.
"values(0,1)" is effectively the same as "SELECT 0,1", ie, it is a
query that happens to produce just one row.  The above is illegal in
SQL92 because the lefthand side is not a two-item row.

The semantics you seem to be looking for is "scalar IN/NOT IN list",
which we do have.  ANY/ALL is a different animal.

It's true that we don't accept values(...) as a subquery, but that's
only one aspect of this example.

regards, tom lane