Re: [HACKERS] Instability in TRUNCATE regression test

2006-06-28 Thread Alvaro Herrera
Tom Lane wrote:

 1. Find a way to make the processing order consistent (eg by driving it
 off OID ordering).  Doesn't seem easy, but maybe I'm missing an idea.

Hmm, what about

1. get the complete list of tables to truncate, AccessShareLock'ed, get
their names
2. release locks
3. sort the list lexicographically (or by Oid, whatever)
4. acquire the stronger locks, in list order, taking care of not
aborting if a table is no longer there
5. truncate

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] Instability in TRUNCATE regression test

2006-06-28 Thread Tom Lane
Alvaro Herrera [EMAIL PROTECTED] writes:
 Tom Lane wrote:
 1. Find a way to make the processing order consistent (eg by driving it
 off OID ordering).  Doesn't seem easy, but maybe I'm missing an idea.

 Hmm, what about

 1. get the complete list of tables to truncate, AccessShareLock'ed, get
 their names
 2. release locks
 3. sort the list lexicographically (or by Oid, whatever)
 4. acquire the stronger locks, in list order, taking care of not
 aborting if a table is no longer there
 5. truncate

Releasing locks is no good ... what if someone adds/drops FK constraints
while you've not got any lock?

One thing I was toying with was to add an index to pg_constraint on,
say, (confrelid, conrelid), and to replace the existing seqscans for FK
constraints with scans using this index.  The second-column ordering
would guarantee everybody visits the entries in the same order.  Not
sure about overall performance implications ... in a small database,
several indexscans might take more time than one seqscan.

regards, tom lane

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] Instability in TRUNCATE regression test

2006-06-28 Thread Jim C. Nasby
On Wed, Jun 28, 2006 at 01:13:42PM -0400, Tom Lane wrote:
 One thing I was toying with was to add an index to pg_constraint on,
 say, (confrelid, conrelid), and to replace the existing seqscans for FK
 constraints with scans using this index.  The second-column ordering
 would guarantee everybody visits the entries in the same order.  Not
 sure about overall performance implications ... in a small database,
 several indexscans might take more time than one seqscan.

In a small database, both operations are likely to be plenty fast for
TRUNCATE, though. Surely the performance impact of getting the requisite
locks would far exceed any catalog scan times, no? And if you were doing
TRUNCATE's very often, I'd expect the right pages to be in cache
anyway...
-- 
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software  http://pervasive.comwork: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] Instability in TRUNCATE regression test

2006-06-28 Thread Andrew Dunstan

Tom Lane wrote:


Buildfarm member platypus is showing a regression failure that I'm
surprised we have not seen before:

http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=platypusdt=2006-06-28%2014:05:01

Basically what this is showing is that when there is more than one
referencing table, the order in which things get done is dependent
on chance locations of system catalog entries.  That results in
cosmetic differences in which of multiple violations gets reported,
or in the order of truncate cascades to notices.

Given our push to have the buildfarm all green all the time,
I don't think I want to just live with occasional failures.
Seems like the alternatives are

1. Find a way to make the processing order consistent (eg by driving it
off OID ordering).  Doesn't seem easy, but maybe I'm missing an idea.

2. Install multiple expected files for the truncate test.

3. Dumb down the test cases so that they don't test multiple-cascade
situations.

Don't much care for any of these :-(.

Also, it seems possible that not-so-cosmetic problems could occur, for
instance deadlock between two backends trying to truncate the same
tables in different orders.  That suggests that answer #1 would be the
best way to fix it, but that would mean ordering the tables consistently
before we even get any locks on them, which seems hard.

Thoughts?


 



If this were a significant risk wouldn't we have seen many such failures 
before now? I guess we don't expect to see concurrent truncates being 
run. Probably worth protecting against, but also probably something of a 
corner case.


In the absence of a fix I'd go for the extra regression result file.

cheers

andrew


---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] Instability in TRUNCATE regression test

2006-06-28 Thread Alvaro Herrera
Tom Lane wrote:
 Alvaro Herrera [EMAIL PROTECTED] writes:
  Tom Lane wrote:
  1. Find a way to make the processing order consistent (eg by driving it
  off OID ordering).  Doesn't seem easy, but maybe I'm missing an idea.
 
  Hmm, what about
 
  1. get the complete list of tables to truncate, AccessShareLock'ed, get
  their names
  2. release locks
  3. sort the list lexicographically (or by Oid, whatever)
  4. acquire the stronger locks, in list order, taking care of not
  aborting if a table is no longer there
  5. truncate
 
 Releasing locks is no good ... what if someone adds/drops FK constraints
 while you've not got any lock?

Recheck after acquiring the stronger locks, unlock and drop from list.

 One thing I was toying with was to add an index to pg_constraint on,
 say, (confrelid, conrelid), and to replace the existing seqscans for FK
 constraints with scans using this index.  The second-column ordering
 would guarantee everybody visits the entries in the same order.  Not
 sure about overall performance implications ... in a small database,
 several indexscans might take more time than one seqscan.

I think there is more than one place that would benefit from such an
index.  Probably turn into a syscache as well?

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] Instability in TRUNCATE regression test

2006-06-28 Thread Tom Lane
Andrew Dunstan [EMAIL PROTECTED] writes:
 If this were a significant risk wouldn't we have seen many such failures 
 before now?

Hard to tell.  It's possibly architecture-dependent, for one thing
(MAXALIGN will affect space availability).  Since this happened in a
parallel regression run, it could also be a matter of timing relative to
the concurrent tests.  I've often thought that we are not getting as much
mileage out of the parallel-testing facility as we could, because it's
really not exercising variations in timing all that much.  It'd be
interesting to throw in a small random delay at the start of each member
of a concurrent set of tests.

regards, tom lane

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] Instability in TRUNCATE regression test

2006-06-28 Thread Alvaro Herrera
Alvaro Herrera wrote:
 Tom Lane wrote:
  Alvaro Herrera [EMAIL PROTECTED] writes:
   Tom Lane wrote:
   1. Find a way to make the processing order consistent (eg by driving it
   off OID ordering).  Doesn't seem easy, but maybe I'm missing an idea.
  
   Hmm, what about
  
   1. get the complete list of tables to truncate, AccessShareLock'ed, get
   their names
   2. release locks
   3. sort the list lexicographically (or by Oid, whatever)
   4. acquire the stronger locks, in list order, taking care of not
   aborting if a table is no longer there
   5. truncate
  
  Releasing locks is no good ... what if someone adds/drops FK constraints
  while you've not got any lock?
 
 Recheck after acquiring the stronger locks, unlock and drop from list.

Oops, this doesn't cover the add FK constraints case, only drop.  I
think it would work to keep the locks on the tables initially mentioned
in the command (i.e. those not followed by CASCADE).  Hmm, but it fails
if it cascades more than once, so scratch that.

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] Instability in TRUNCATE regression test

2006-06-28 Thread Tom Lane
Alvaro Herrera [EMAIL PROTECTED] writes:
 Tom Lane wrote:
 One thing I was toying with was to add an index to pg_constraint on,
 say, (confrelid, conrelid), and to replace the existing seqscans for FK
 constraints with scans using this index.

 I think there is more than one place that would benefit from such an
 index.  Probably turn into a syscache as well?

Yeah, that was in the back of my mind too, but I haven't looked through
the code to see.  A syscache wouldn't work because it's not a unique key.

regards, tom lane

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] Instability in TRUNCATE regression test

2006-06-28 Thread Tom Lane
Andrew Dunstan [EMAIL PROTECTED] writes:
 Tom Lane wrote:
 Buildfarm member platypus is showing a regression failure that I'm
 surprised we have not seen before:
 http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=platypusdt=2006-06-28%2014:05:01

 If this were a significant risk wouldn't we have seen many such failures 
 before now?

mongoose just failed with almost the exact same symptoms:

http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=mongoosedt=2006-06-28%2021:30:02

It's not quite the same diffs, which is unsurprising given the presumed
mechanism behind the failure, but that probably shoots down the add
another expected file response.

I imagine some recent change has made the probability of this behavior
much higher than it was before; perhaps there's more pg_constraint
update traffic in concurrent tests?  Anyway, it's now up to must fix
in my estimation.  I'll look into the new-pg_constraint-index idea.

I think someone should also take a hard look at that idea of introducing
more timing variability into the parallel tests.  Any volunteers?

regards, tom lane

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] Instability in TRUNCATE regression test

2006-06-28 Thread Tom Lane
I wrote:
 Alvaro Herrera [EMAIL PROTECTED] writes:
 Tom Lane wrote:
 One thing I was toying with was to add an index to pg_constraint on,
 say, (confrelid, conrelid), and to replace the existing seqscans for FK
 constraints with scans using this index.

 I think there is more than one place that would benefit from such an
 index.  Probably turn into a syscache as well?

 Yeah, that was in the back of my mind too, but I haven't looked through
 the code to see.  A syscache wouldn't work because it's not a unique key.

Having looked through the code, the only two places that currently seem
to have any need for an index on confrelid are the two paths in TRUNCATE
that find/check for FK relationships.  So I'm hesitant to add an index
just for that; seems like too much overhead to put onto all other
updates of pg_constraint.

What we can perhaps do instead is pull out the related OIDs (ie, a
function that given a rel OID returns a list of rels that have FK
dependencies on that rel) and then sort that list into OID order before
acting on it.

Note: the OID-sort-order concept is not perfect; if the OID counter were
to wrap around while the regression tests are running, you could get a
bogus failure of this type.  That seems low enough probability to live
with, though.  Anyway it'll never happen in the buildfarm's usage, since
buildfarm only runs the tests in freshly-initdb'd databases.

regards, tom lane

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match