Re: [HACKERS] pg_dump restore time and Foreign Keys

2008-06-09 Thread Decibel!

On Jun 7, 2008, at 2:00 PM, Andrew Dunstan wrote:
If we go down this road then I would far rather we tried to devise  
some safe (or semi-safe) way of doing it instead of simply  
providing expert (a.k.a. footgun) mode.


For instance, I'm wondering if we could do something with checksums  
of the input lines or something else that would make this difficult  
to do in circumstances other than pg_restore.



Yes, but that provides no help at all outside of pg_dump. Being able  
to add a FK with NO CHECK would be tremendously useful outside of  
pg_dump. Actually, in the interest of stating the problem and not the  
solution, what we need is a way to add FKs that doesn't lock  
everything up to perform the key checks. Perhaps there is some semi- 
safe way that the constraint could be added and the checks done in  
the background...


As for the footgun aspect, are we the enterprise-class OSS database  
or the one that caters itself to noobs that will go out of their way  
to make life hard on themselves? I'm all in favor of not adding  
footguns that don't have value, but this one holds a lot of value for  
anyone trying to maintain a large database in a 24/7 environment. To  
put this in perspective, the amount of revenue we would loose from  
adding just one FK to one of our larger tables would more than cover  
paying someone to develop this feature.

--
Decibel!, aka Jim C. Nasby, Database Architect  [EMAIL PROTECTED]
Give your computer some brain candy! www.distributed.net Team #1828




smime.p7s
Description: S/MIME cryptographic signature


Re: [HACKERS] pg_dump restore time and Foreign Keys

2008-06-09 Thread Andrew Dunstan



Decibel! wrote:



Yes, but that provides no help at all outside of pg_dump. Being able 
to add a FK with NO CHECK would be tremendously useful outside of 
pg_dump. Actually, in the interest of stating the problem and not the 
solution, what we need is a way to add FKs that doesn't lock 
everything up to perform the key checks. Perhaps there is some 
semi-safe way that the constraint could be added and the checks done 
in the background...


I had some thoughts along the same lines.

But how do you propose to recover when the check fails? What should 
pg_restore do if the dump is corrupt causing an FK check to fail?


I suppose we could have some sort of marking for FK constraints along 
the lines of {checked, unchecked, invalid}.




As for the footgun aspect, are we the enterprise-class OSS database or 
the one that caters itself to noobs that will go out of their way to 
make life hard on themselves? 


We are the database that tries very hard to keep its promises. If you 
want to change or relax those promises then the implications need to be 
very very clear.


I'm all in favor of not adding footguns that don't have value, but 
this one holds a lot of value for anyone trying to maintain a large 
database in a 24/7 environment. To put this in perspective, the amount 
of revenue we would loose from adding just one FK to one of our larger 
tables would more than cover paying someone to develop this feature.





Come up with a good proposal and I'm your man :-)  I haven't seen one yet.

cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_dump restore time and Foreign Keys

2008-06-09 Thread Tom Lane
Decibel! [EMAIL PROTECTED] writes:
 Actually, in the interest of stating the problem and not the  
 solution, what we need is a way to add FKs that doesn't lock  
 everything up to perform the key checks.

Ah, finally a useful comment.  I think it might be possible to do an
add FK concurrently type of command that would take exclusive lock
for just long enough to add the triggers, then scan the tables with just
AccessShareLock to see if the existing rows meet the constraint, and
if so finally mark the constraint valid.  Meanwhile the constraint
would be enforced against newly-added rows by the triggers, so nothing
gets missed.  You'd still get a small hiccup in system performance
from the transient exclusive lock, but nothing like as bad as it is
now.  Would that solve your problem?

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_dump restore time and Foreign Keys

2008-06-09 Thread Simon Riggs

On Mon, 2008-06-09 at 10:57 -0400, Tom Lane wrote:
 Decibel! [EMAIL PROTECTED] writes:
  Actually, in the interest of stating the problem and not the  
  solution, what we need is a way to add FKs that doesn't lock  
  everything up to perform the key checks.
 
 Ah, finally a useful comment.  I think it might be possible to do an
 add FK concurrently type of command that would take exclusive lock
 for just long enough to add the triggers, then scan the tables with just
 AccessShareLock to see if the existing rows meet the constraint, and
 if so finally mark the constraint valid.  Meanwhile the constraint
 would be enforced against newly-added rows by the triggers, so nothing
 gets missed.  You'd still get a small hiccup in system performance
 from the transient exclusive lock, but nothing like as bad as it is
 now.  Would that solve your problem?

That's good, but it doesn't solve the original user complaint about
needing to re-run many, many large queries to which we already know the
answer.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_dump restore time and Foreign Keys

2008-06-09 Thread Andrew Dunstan



Simon Riggs wrote:

On Mon, 2008-06-09 at 10:57 -0400, Tom Lane wrote:
  

Decibel! [EMAIL PROTECTED] writes:

Actually, in the interest of stating the problem and not the  
solution, what we need is a way to add FKs that doesn't lock  
everything up to perform the key checks.
  

Ah, finally a useful comment.  I think it might be possible to do an
add FK concurrently type of command that would take exclusive lock
for just long enough to add the triggers, then scan the tables with just
AccessShareLock to see if the existing rows meet the constraint, and
if so finally mark the constraint valid.  Meanwhile the constraint
would be enforced against newly-added rows by the triggers, so nothing
gets missed.  You'd still get a small hiccup in system performance
from the transient exclusive lock, but nothing like as bad as it is
now.  Would that solve your problem?



That's good, but it doesn't solve the original user complaint about
needing to re-run many, many large queries to which we already know the
answer.

  


But we don't know it for dead sure, we only think we do. What if the 
data for one or other of the tables is corrupted? We'll end up with data 
we believe is consistent but in fact is not, ISTM. If you can somehow 
guarantee the integrity of data in both tables then we might be 
justified in assuming that the FK constraint will be consistent - that's 
why I suggested some sort of checksum mechanism might serve the purpose.


cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_dump restore time and Foreign Keys

2008-06-09 Thread Tom Lane
Simon Riggs [EMAIL PROTECTED] writes:
 On Mon, 2008-06-09 at 10:57 -0400, Tom Lane wrote:
 Ah, finally a useful comment.  I think it might be possible to do an
 add FK concurrently type of command that would take exclusive lock

 That's good, but it doesn't solve the original user complaint about
 needing to re-run many, many large queries to which we already know the
 answer.

No, we are running a large query to which the user *thinks* he knows the
answer.  There are any number of reasons why he might be wrong.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_dump restore time and Foreign Keys

2008-06-09 Thread Simon Riggs

On Mon, 2008-06-09 at 11:23 -0400, Andrew Dunstan wrote:
 
 Simon Riggs wrote:
  On Mon, 2008-06-09 at 10:57 -0400, Tom Lane wrote:

  Decibel! [EMAIL PROTECTED] writes:
  
  Actually, in the interest of stating the problem and not the  
  solution, what we need is a way to add FKs that doesn't lock  
  everything up to perform the key checks.

  Ah, finally a useful comment.  I think it might be possible to do an
  add FK concurrently type of command that would take exclusive lock
  for just long enough to add the triggers, then scan the tables with just
  AccessShareLock to see if the existing rows meet the constraint, and
  if so finally mark the constraint valid.  Meanwhile the constraint
  would be enforced against newly-added rows by the triggers, so nothing
  gets missed.  You'd still get a small hiccup in system performance
  from the transient exclusive lock, but nothing like as bad as it is
  now.  Would that solve your problem?
  
 
  That's good, but it doesn't solve the original user complaint about
  needing to re-run many, many large queries to which we already know the
  answer.
 
 But we don't know it for dead sure, we only think we do. What if the 
 data for one or other of the tables is corrupted? We'll end up with data 
 we believe is consistent but in fact is not, ISTM. If you can somehow 
 guarantee the integrity of data in both tables then we might be 
 justified in assuming that the FK constraint will be consistent - that's 
 why I suggested some sort of checksum mechanism might serve the purpose.

Agreed.

Can we get COPY to output the checksum of its output as part of the
command tag? How else can we return the checksum? In $file.cksum for any
given output file?

We can then use an explicit checksum option in the COPY when we reload,
with CHECKSUM option.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_dump restore time and Foreign Keys

2008-06-09 Thread Simon Riggs

On Mon, 2008-06-09 at 11:33 -0400, Tom Lane wrote:
 Simon Riggs [EMAIL PROTECTED] writes:
  On Mon, 2008-06-09 at 10:57 -0400, Tom Lane wrote:
  Ah, finally a useful comment.  I think it might be possible to do an
  add FK concurrently type of command that would take exclusive lock
 
  That's good, but it doesn't solve the original user complaint about
  needing to re-run many, many large queries to which we already know the
  answer.
 
 No, we are running a large query to which the user *thinks* he knows the
 answer.  There are any number of reasons why he might be wrong.

Of course. I should have said to which we already know the answer to
indicate I'm passing on others' criticisms of us.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_dump restore time and Foreign Keys

2008-06-09 Thread Tom Lane
Simon Riggs [EMAIL PROTECTED] writes:
 On Mon, 2008-06-09 at 11:33 -0400, Tom Lane wrote:
 No, we are running a large query to which the user *thinks* he knows the
 answer.  There are any number of reasons why he might be wrong.

 Of course. I should have said to which we already know the answer to
 indicate I'm passing on others' criticisms of us.

[ shrug... ]  We don't know the answer either, and anyone who says
we do is merely betraying his ignorance of the number of ways to load
a foot-gun.

I don't have any confidence in the checksum proposal either, as it's
still naively assuming that changes in the data are the only possible
problem.  Consider that you are loading the data into a new database,
which might be running under a different locale setting, might contain
a new implementation of a datatype with subtly (or not so subtly)
different semantics, or might just already contain data in the target
tables.  pg_dump scripts are not nearly bright enough to defend against
these types of threats.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_dump restore time and Foreign Keys

2008-06-09 Thread Filip RembiaƂkowski
2008/6/9 Simon Riggs [EMAIL PROTECTED]:


 On Mon, 2008-06-09 at 10:57 -0400, Tom Lane wrote:
  Decibel! [EMAIL PROTECTED] writes:
   Actually, in the interest of stating the problem and not the
   solution, what we need is a way to add FKs that doesn't lock
   everything up to perform the key checks.
 
  Ah, finally a useful comment.  I think it might be possible to do an
  add FK concurrently type of command that would take exclusive lock
  for just long enough to add the triggers, then scan the tables with just
  AccessShareLock to see if the existing rows meet the constraint, and
  if so finally mark the constraint valid.  Meanwhile the constraint
  would be enforced against newly-added rows by the triggers, so nothing
  gets missed.  You'd still get a small hiccup in system performance
  from the transient exclusive lock, but nothing like as bad as it is
  now.  Would that solve your problem?

 That's good, but it doesn't solve the original user complaint about
 needing to re-run many, many large queries to which we already know the
 answer.


just a guess, but maybe create FK concurrently feature combined with
synchronized scan feature _does_ resolve original problem.

if you run many create FK concurrently one after another, wouldn't the seq
scan be reused?












 --
  Simon Riggs   www.2ndQuadrant.com
  PostgreSQL Training, Services and Support


 --
 Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-hackers




-- 
Filip RembiaƂkowski


Re: [HACKERS] pg_dump restore time and Foreign Keys

2008-06-09 Thread Andrew Dunstan



Simon Riggs wrote:
But we don't know it for dead sure, we only think we do. What if the 
data for one or other of the tables is corrupted? We'll end up with data 
we believe is consistent but in fact is not, ISTM. If you can somehow 
guarantee the integrity of data in both tables then we might be 
justified in assuming that the FK constraint will be consistent - that's 
why I suggested some sort of checksum mechanism might serve the purpose.



Agreed.

Can we get COPY to output the checksum of its output as part of the
command tag? How else can we return the checksum? In $file.cksum for any
given output file?
  


It seems a reasonable idea to use the command tag, unless that's going 
to break lots of stuff. I think the only thing we can usefully checksum 
is the output lines in the client encoding.



We can then use an explicit checksum option in the COPY when we reload,
with CHECKSUM option.

  


We need rather more than this to make sure your facility isn't abused. 
That's the part that I haven't been able to think of a good answer for 
(yet).


cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_dump restore time and Foreign Keys

2008-06-09 Thread Robert Treat
On Monday 09 June 2008 11:59:27 Tom Lane wrote:
 Simon Riggs [EMAIL PROTECTED] writes:
  On Mon, 2008-06-09 at 11:33 -0400, Tom Lane wrote:
  No, we are running a large query to which the user *thinks* he knows the
  answer.  There are any number of reasons why he might be wrong.
 
  Of course. I should have said to which we already know the answer to
  indicate I'm passing on others' criticisms of us.

 [ shrug... ]  We don't know the answer either, and anyone who says
 we do is merely betraying his ignorance of the number of ways to load
 a foot-gun.


I think the more realistic scenario (based on the FK idea) is that you want to 
prevent any future rows from coming without validating the FK, and you're 
willing to clean up any violators after the fact, since you can make that 
an out of the critical path operation.

if you extend this to a more general create constraint concurrently (to 
handle normal constraint, not null constraints, etc...), it would certainly 
be a big win, and i think most would see it as a reasonable compromise. 

-- 
Robert Treat
Build A Brighter LAMP :: Linux Apache {middleware} PostgreSQL

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_dump restore time and Foreign Keys

2008-06-09 Thread Simon Riggs

On Mon, 2008-06-09 at 12:37 -0400, Robert Treat wrote:
 On Monday 09 June 2008 11:59:27 Tom Lane wrote:
  Simon Riggs [EMAIL PROTECTED] writes:
   On Mon, 2008-06-09 at 11:33 -0400, Tom Lane wrote:
   No, we are running a large query to which the user *thinks* he knows the
   answer.  There are any number of reasons why he might be wrong.
  
   Of course. I should have said to which we already know the answer to
   indicate I'm passing on others' criticisms of us.
 
  [ shrug... ]  We don't know the answer either, and anyone who says
  we do is merely betraying his ignorance of the number of ways to load
  a foot-gun.
 
 
 I think the more realistic scenario (based on the FK idea) is that you want 
 to 
 prevent any future rows from coming without validating the FK, and you're 
 willing to clean up any violators after the fact, since you can make that 
 an out of the critical path operation.
 
 if you extend this to a more general create constraint concurrently (to 
 handle normal constraint, not null constraints, etc...), it would certainly 
 be a big win, and i think most would see it as a reasonable compromise. 

Agreed. I think the out of the critical path action is more likely to
be the intended path rather than the never check at all route.

If we break down the action into two parts.

ALTER TABLE ... ADD CONSTRAINT foo FOREIGN KEY ... NOVALIDATE;
which holds exclusive lock, but only momentarily
After this runs any new data is validated at moment of data change, but
the older data has yet to be validated.

ALTER TABLE ... VALIDATE CONSTRAINT foo
which runs lengthy check, though only grabs lock as last part of action

This way we have the ability to add them concurrently if we choose by
running one after the other, or we can run first part only for now and
run the other one at a more convenient moment.

On a full set of checks on a large complex database can easily take
hours or even days.

We should allow this. It's not a footgun, its an honest attempt by
people to add RI checks to their database. The only other alternative
for some people is to not add FKs at all, which is also a footgun, but
we don't seem bothered that they might take that option.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_dump restore time and Foreign Keys

2008-06-09 Thread Alvaro Herrera
Simon Riggs wrote:

 If we break down the action into two parts.
 
 ALTER TABLE ... ADD CONSTRAINT foo FOREIGN KEY ... NOVALIDATE;
 which holds exclusive lock, but only momentarily
 After this runs any new data is validated at moment of data change, but
 the older data has yet to be validated.
 
 ALTER TABLE ... VALIDATE CONSTRAINT foo
 which runs lengthy check, though only grabs lock as last part of action

The problem I see with this approach in general (two-phase FK creation)
is that you have to keep the same transaction for the first and second
command, but you really want concurrent backends to see the tuple for
the not-yet-validated constraint row.

Another benefit that could arise from this is that the hypothetical
VALIDATE CONSTRAINT step could validate more than one constraint at a
time, possibly processing all the constraints with a single table scan.

Perhaps VALIDATE CONSTRAINT could be handled as an automatic commit-time
action.

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_dump restore time and Foreign Keys

2008-06-09 Thread Gregory Stark
Alvaro Herrera [EMAIL PROTECTED] writes:

 The problem I see with this approach in general (two-phase FK creation)
 is that you have to keep the same transaction for the first and second
 command, but you really want concurrent backends to see the tuple for
 the not-yet-validated constraint row.

Do you? It seems like having a constraint which is enforced on any new
operations but which doesn't guarantee that existing records satisfy it is a
useful feature in itself -- separating the two concepts this property is true
for all records and any action taken must leave the record with this
property

ISTM you can validate an invalid constraint using any snapshot taken at any
time = the original snapshot. As long as the constraint is being enforced for
all transactions which start after the validating snapshot's xmin then when
it's done it can know the constraint is valid.

Taking a lock on the table to create the constraint certainly leaves that
property fulfilled. Actually it seems we could not take any lock and just
check when it comes time to do the validation that the snapshot's xmin is =
the xmin on the constraint. I'm starting to get leery of all these tightly
argued bits of logic though. Each one on its own is safe but the resulting
system is getting to be quite complex.

 Another benefit that could arise from this is that the hypothetical
 VALIDATE CONSTRAINT step could validate more than one constraint at a
 time, possibly processing all the constraints with a single table scan.

Interesting.

 Perhaps VALIDATE CONSTRAINT could be handled as an automatic commit-time
 action.

I don't really like this, at least not as the only option, because as I said
above and Robert Treat also said, it could be useful to have the constraint in
place for new operations but check it for the existing data at some later
date. (Or even never)

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com
  Ask me about EnterpriseDB's Slony Replication support!

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_dump restore time and Foreign Keys

2008-06-09 Thread Simon Riggs

On Mon, 2008-06-09 at 14:07 -0400, Alvaro Herrera wrote:
 Simon Riggs wrote:
 
  If we break down the action into two parts.
  
  ALTER TABLE ... ADD CONSTRAINT foo FOREIGN KEY ... NOVALIDATE;
  which holds exclusive lock, but only momentarily
  After this runs any new data is validated at moment of data change, but
  the older data has yet to be validated.
  
  ALTER TABLE ... VALIDATE CONSTRAINT foo
  which runs lengthy check, though only grabs lock as last part of action
 
 The problem I see with this approach in general (two-phase FK creation)
 is that you have to keep the same transaction for the first and second
 command, but you really want concurrent backends to see the tuple for
 the not-yet-validated constraint row.

Well, they *must* be in separate transactions if we are to avoid holding
an AccessExclusiveLock while we perform the check. Plus the whole idea
is to perform the second part at some other non-critical time, though we
all agree that never performing the check at all is foolhardy.

Maybe we say that you can defer the check, but after a while autovacuum
runs it for you if you haven't done so. It would certainly be useful to
run the VALIDATE part as a background task with vacuum wait enabled.

 Another benefit that could arise from this is that the hypothetical
 VALIDATE CONSTRAINT step could validate more than one constraint at a
 time, possibly processing all the constraints with a single table scan.

Good thought, though not as useful for FK checks.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_dump restore time and Foreign Keys

2008-06-09 Thread Alvaro Herrera
Simon Riggs wrote:

 Maybe we say that you can defer the check, but after a while autovacuum
 runs it for you if you haven't done so. It would certainly be useful to
 run the VALIDATE part as a background task with vacuum wait enabled.

It would be useful if there was anywhere to report the error to, or an
action that could be taken automatically.

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_dump restore time and Foreign Keys

2008-06-08 Thread Robert Treat
On Saturday 07 June 2008 16:22:56 Tom Lane wrote:
 Simon Riggs [EMAIL PROTECTED] writes:
  Perhaps we need a GUC that says expert_mode = on. ...  Another idea
  might be to make such command options superuser only, to ensure the
  power is available, yet only in the hands of, by-definition, the trusted
  few.

 This all seems pretty useless, as the sort of user most likely to shoot
 himself in the foot will also always be running as superuser.


yeah, i'm not a big fan of set enable_footgun=true since the people likely 
to get tripped up are going to blindly enable these modes. 

otoh, if we do such a thing, i would be a big fan of calling 
it enable_footgun :-)

 I'd much rather see us expend more effort on speeding up the checks
 than open holes in the system.


and i'm sure no one is against that idea, but you're never going to be able to 
match the performance of just avoiding the check. 

-- 
Robert Treat
Build A Brighter LAMP :: Linux Apache {middleware} PostgreSQL

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_dump restore time and Foreign Keys

2008-06-08 Thread Tom Lane
Robert Treat [EMAIL PROTECTED] writes:
 and i'm sure no one is against that idea, but you're never going to be able 
 to 
 match the performance of just avoiding the check. 

We'll never be able to match the performance of not having transactions,
either, but the community has never for a moment considered having a
no transactions mode.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_dump restore time and Foreign Keys

2008-06-08 Thread Robert Treat
On Sunday 08 June 2008 20:12:15 Tom Lane wrote:
 Robert Treat [EMAIL PROTECTED] writes:
  and i'm sure no one is against that idea, but you're never going to be
  able to match the performance of just avoiding the check.

 We'll never be able to match the performance of not having transactions,
 either, but the community has never for a moment considered having a
 no transactions mode.


it's unclear what a no transaction mode would mean, but I'd be willing to 
guess some people have consider aspects of it (we've just never had 
agreement)

-- 
Robert Treat
Build A Brighter LAMP :: Linux Apache {middleware} PostgreSQL

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_dump restore time and Foreign Keys

2008-06-07 Thread Robert Treat
On Thursday 05 June 2008 08:56:35 Simon Riggs wrote:
 On Thu, 2008-06-05 at 07:57 -0400, Andrew Dunstan wrote:
  Simon Riggs wrote:
   pg_dump restore times can be high when they include many ALTER TABLE
   ADD FORIEGN KEY statements, since each statement checks the data to see
   if it is fully valid in all cases.
  
   I've been asked why we run that at all?, since if we dumped the
   tables together, we already know they match.
  
   If we had a way of pg_dump passing on the information that the test
   already passes, we would be able to skip the checks.
  
   Proposal:
  
   * Introduce a new mode for ALTER TABLE ADD FOREIGN KEY [WITHOUT CHECK];
   When we run WITHOUT CHECK, iff both the source and target table are
   newly created in this transaction, then we skip the check. If the check
   is skipped we mark the constraint as being unchecked, so we can tell
   later if this has been used.
  
   * Have pg_dump write the new syntax into its dumps, when both the
   source and target table are dumped in same run
  
   I'm guessing that the WITHOUT CHECK option would not be acceptable as
   an unprotected trap for our lazy and wicked users. :-)
 
  This whole proposal would be a major footgun which would definitely be
  abused, IMNSHO.

 OK, understood. Two negatives is enough to sink it.


Heh, I would have argued that the idea should go the other way and just make 
this part of the normal syntax.  Oracle DBA's have been doing this for years 
(MS SQL supports it too actually) and it really helps working around having 
to hold locks on large relations for lengthy periods of times. Heck, I'd like 
to see a no check option for all constraints really. 

-- 
Robert Treat
Build A Brighter LAMP :: Linux Apache {middleware} PostgreSQL

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_dump restore time and Foreign Keys

2008-06-07 Thread Simon Riggs

On Sat, 2008-06-07 at 13:08 -0400, Robert Treat wrote:
 On Thursday 05 June 2008 08:56:35 Simon Riggs wrote:
  On Thu, 2008-06-05 at 07:57 -0400, Andrew Dunstan wrote:

 Heh, I would have argued that the idea should go the other way and
 just make this part of the normal syntax.  Oracle DBA's have been
 doing this for years (MS SQL supports it too actually) and it really
 helps working around having to hold locks on large relations for
 lengthy periods of times. Heck, I'd like to see a no check option for
 all constraints really. 

Interesting that SQL Server does it also.

Holding the lock for a long period is just one more problem. :-)

I'm always torn between the I-know-what-Im-doing-so-give-me-the-option
viewpoint and the some-dumbass-will-abuse-it viewpoint. I see the
results of both viewpoints daily.

Perhaps we need a GUC that says expert_mode = on. In expert_mode we are
allowed to do a range of things that are normally avoided - there would
be an explicit list. Managers can then take a single considered decision
as to whether the situation warrants extreme action and their DBA is
good enough to handle it. That might resolve our continued angst about
whether our users our smart enough to avoid the gotchas, or just smart
enough to win a DBA's Darwin Award. 

The UNIX philosophy has always been to allow the power to exist, yet
seek to minimise the number of people who exercise it. Another idea
might be to make such command options superuser only, to ensure the
power is available, yet only in the hands of, by-definition, the trusted
few.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_dump restore time and Foreign Keys

2008-06-07 Thread Andrew Dunstan



Simon Riggs wrote:

On Sat, 2008-06-07 at 13:08 -0400, Robert Treat wrote:
  

On Thursday 05 June 2008 08:56:35 Simon Riggs wrote:


On Thu, 2008-06-05 at 07:57 -0400, Andrew Dunstan wrote:
  


  

Heh, I would have argued that the idea should go the other way and
just make this part of the normal syntax.  Oracle DBA's have been
doing this for years (MS SQL supports it too actually) and it really
helps working around having to hold locks on large relations for
lengthy periods of times. Heck, I'd like to see a no check option for
all constraints really. 



Interesting that SQL Server does it also.

Holding the lock for a long period is just one more problem. :-)

I'm always torn between the I-know-what-Im-doing-so-give-me-the-option
viewpoint and the some-dumbass-will-abuse-it viewpoint. I see the
results of both viewpoints daily.

Perhaps we need a GUC that says expert_mode = on. In expert_mode we are
allowed to do a range of things that are normally avoided - there would
be an explicit list. Managers can then take a single considered decision
as to whether the situation warrants extreme action and their DBA is
good enough to handle it. That might resolve our continued angst about
whether our users our smart enough to avoid the gotchas, or just smart
enough to win a DBA's Darwin Award. 


The UNIX philosophy has always been to allow the power to exist, yet
seek to minimise the number of people who exercise it. Another idea
might be to make such command options superuser only, to ensure the
power is available, yet only in the hands of, by-definition, the trusted
few.

  


If we go down this road then I would far rather we tried to devise some 
safe (or semi-safe) way of doing it instead of simply providing expert 
(a.k.a. footgun) mode.


For instance, I'm wondering if we could do something with checksums of 
the input lines or something else that would make this difficult to do 
in circumstances other than pg_restore.


cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_dump restore time and Foreign Keys

2008-06-07 Thread Tom Lane
Simon Riggs [EMAIL PROTECTED] writes:
 Perhaps we need a GUC that says expert_mode = on. ...  Another idea
 might be to make such command options superuser only, to ensure the
 power is available, yet only in the hands of, by-definition, the trusted
 few.

This all seems pretty useless, as the sort of user most likely to shoot
himself in the foot will also always be running as superuser.

I'd much rather see us expend more effort on speeding up the checks
than open holes in the system.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_dump restore time and Foreign Keys

2008-06-05 Thread Heikki Linnakangas

Simon Riggs wrote:

I'm guessing that the WITHOUT CHECK option would not be acceptable as an
unprotected trap for our lazy and wicked users. :-)


Yes, that sounds scary.

Instead, I'd suggest finding ways to speed up the ALTER TABLE ADD 
FOREIGN KEY. Or speeding up COPY into a table with foreign keys already 
defined. For example, you might want to build an in-memory hash table of 
the keys in the target table, instead of issuing a query on each INSERT, 
if the target table isn't huge.


Nothing beats the speed of simply not checking the constraint, of 
course, but I'd hate to lose the protection it gives.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_dump restore time and Foreign Keys

2008-06-05 Thread Simon Riggs

On Thu, 2008-06-05 at 10:19 +0300, Heikki Linnakangas wrote:
 Simon Riggs wrote:
  I'm guessing that the WITHOUT CHECK option would not be acceptable as an
  unprotected trap for our lazy and wicked users. :-)
 
 Yes, that sounds scary.
 
 Instead, I'd suggest finding ways to speed up the ALTER TABLE ADD 
 FOREIGN KEY. 

I managed a suggestion for improving it for integers only, but if
anybody has any other ideas, I'm all ears. 

 Or speeding up COPY into a table with foreign keys already 
 defined. For example, you might want to build an in-memory hash table of 
 the keys in the target table, instead of issuing a query on each INSERT, 
 if the target table isn't huge.

No, that's not the problem, but I agree that is a problem also.

 Nothing beats the speed of simply not checking the constraint, of 
 course, but I'd hate to lose the protection it gives.

Are you saying you don't like the rest of the proposal, or just don't
like the idea of having that added as an unprotected option, but find
the proposal acceptable? 

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_dump restore time and Foreign Keys

2008-06-05 Thread Richard Huxton

Simon Riggs wrote:


If we had a way of pg_dump passing on the information that the test
already passes, we would be able to skip the checks.

Proposal:

* Introduce a new mode for ALTER TABLE ADD FOREIGN KEY [WITHOUT CHECK];



* Have pg_dump write the new syntax into its dumps, when both the source
and target table are dumped in same I've been known to manually tweak dumps before now. I can see me 

forgetting this.

What about pg_dump writing out a row-count and MD5 of the rows in the 
COPY (just a textual calculation). Iff the restore checksum matches the 
dump checksum for both tables then the foreign-keys can be skipped.


If the restore checksum doesn't match the dump then it can issue a 
warning, but continue and run the full fkey check.


--
  Richard Huxton
  Archonet Ltd

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_dump restore time and Foreign Keys

2008-06-05 Thread Andrew Dunstan



Simon Riggs wrote:

pg_dump restore times can be high when they include many ALTER TABLE ADD
FORIEGN KEY statements, since each statement checks the data to see if
it is fully valid in all cases.

I've been asked why we run that at all?, since if we dumped the tables
together, we already know they match.

If we had a way of pg_dump passing on the information that the test
already passes, we would be able to skip the checks.

Proposal:

* Introduce a new mode for ALTER TABLE ADD FOREIGN KEY [WITHOUT CHECK];
When we run WITHOUT CHECK, iff both the source and target table are
newly created in this transaction, then we skip the check. If the check
is skipped we mark the constraint as being unchecked, so we can tell
later if this has been used.

* Have pg_dump write the new syntax into its dumps, when both the source
and target table are dumped in same run

I'm guessing that the WITHOUT CHECK option would not be acceptable as an
unprotected trap for our lazy and wicked users. :-)
  


This whole proposal would be a major footgun which would definitely be 
abused, IMNSHO.


I think Heikki's idea of speeding up the check using a hash table of the 
foreign keys possibly has merit.


cheers

andrew

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_dump restore time and Foreign Keys

2008-06-05 Thread Heikki Linnakangas

Simon Riggs wrote:

Are you saying you don't like the rest of the proposal, or just don't
like the idea of having that added as an unprotected option, but find
the proposal acceptable? 


I don't like the idea of having an unprotected option. If we were going 
to have one, I wouldn't bother with the extra checks you proposed; it's 
going to be unsafe anyway.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_dump restore time and Foreign Keys

2008-06-05 Thread Simon Riggs

On Thu, 2008-06-05 at 07:57 -0400, Andrew Dunstan wrote:
 
 Simon Riggs wrote:
  pg_dump restore times can be high when they include many ALTER TABLE ADD
  FORIEGN KEY statements, since each statement checks the data to see if
  it is fully valid in all cases.
 
  I've been asked why we run that at all?, since if we dumped the tables
  together, we already know they match.
 
  If we had a way of pg_dump passing on the information that the test
  already passes, we would be able to skip the checks.
 
  Proposal:
 
  * Introduce a new mode for ALTER TABLE ADD FOREIGN KEY [WITHOUT CHECK];
  When we run WITHOUT CHECK, iff both the source and target table are
  newly created in this transaction, then we skip the check. If the check
  is skipped we mark the constraint as being unchecked, so we can tell
  later if this has been used.
 
  * Have pg_dump write the new syntax into its dumps, when both the source
  and target table are dumped in same run
 
  I'm guessing that the WITHOUT CHECK option would not be acceptable as an
  unprotected trap for our lazy and wicked users. :-)

 
 This whole proposal would be a major footgun which would definitely be 
 abused, IMNSHO.

OK, understood. Two negatives is enough to sink it.

 I think Heikki's idea of speeding up the check using a hash table of the 
 foreign keys possibly has merit.

The query is sent through SPI, so if there was a way to speed this up,
we would already be using it implicitly. If we find a way to speed up
joins it will improve the FK check also.

The typical join plan for the check query is already a hash join,
assuming the target table is small enough. If not, its a huge sort/merge
join. So in a way, we already follow the suggestion.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_dump restore time and Foreign Keys

2008-06-05 Thread Heikki Linnakangas

Simon Riggs wrote:

On Thu, 2008-06-05 at 10:19 +0300, Heikki Linnakangas wrote:

Simon Riggs wrote:

I'm guessing that the WITHOUT CHECK option would not be acceptable as an
unprotected trap for our lazy and wicked users. :-)

Yes, that sounds scary.

Instead, I'd suggest finding ways to speed up the ALTER TABLE ADD 
FOREIGN KEY. 


I managed a suggestion for improving it for integers only, but if
anybody has any other ideas, I'm all ears. 


Well, one idea would be to allow adding multiple foreign keys in one 
command, and checking them all at once with one SQL query instead of one 
per foreign key. Right now we need one seq scan over the table per 
foreign key, by checking all references at once we would only need one 
seq scan to check them all.


Or speeding up COPY into a table with foreign keys already 
defined. For example, you might want to build an in-memory hash table of 
the keys in the target table, instead of issuing a query on each INSERT, 
if the target table isn't huge.


No, that's not the problem, but I agree that is a problem also.


It is related, because if we can make COPY into a table with foreign 
keys fast enough, we could rearrange dumps so that foreign keys are 
created before loading data. That would save the seqscan over the table 
altogether.


Thinking about this idea a bit more, instead of loading the whole target 
table into memory, it would probably make more sense to keep a hash 
table as just a cache of the most recent keys that have been referenced.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_dump restore time and Foreign Keys

2008-06-05 Thread Simon Riggs

On Thu, 2008-06-05 at 16:01 +0300, Heikki Linnakangas wrote:
 Simon Riggs wrote:
  On Thu, 2008-06-05 at 10:19 +0300, Heikki Linnakangas wrote:
  Simon Riggs wrote:
  I'm guessing that the WITHOUT CHECK option would not be acceptable as an
  unprotected trap for our lazy and wicked users. :-)
  Yes, that sounds scary.
 
  Instead, I'd suggest finding ways to speed up the ALTER TABLE ADD 
  FOREIGN KEY. 
  
  I managed a suggestion for improving it for integers only, but if
  anybody has any other ideas, I'm all ears. 
 
 Well, one idea would be to allow adding multiple foreign keys in one 
 command, and checking them all at once with one SQL query instead of one 
 per foreign key. Right now we need one seq scan over the table per 
 foreign key, by checking all references at once we would only need one 
 seq scan to check them all.

No need. Just parallelise the restore with concurrent psql. Which would
speed up the index creation also. Does Greg have plans for further work?

  Or speeding up COPY into a table with foreign keys already 
  defined. For example, you might want to build an in-memory hash table of 
  the keys in the target table, instead of issuing a query on each INSERT, 
  if the target table isn't huge.
  
  No, that's not the problem, but I agree that is a problem also.
 
 It is related, because if we can make COPY into a table with foreign 
 keys fast enough, we could rearrange dumps so that foreign keys are 
 created before loading data. That would save the seqscan over the table 
 altogether.

True.

 Thinking about this idea a bit more, instead of loading the whole target 
 table into memory, it would probably make more sense to keep a hash 
 table as just a cache of the most recent keys that have been referenced.

If you can think of a way of improving hash joins generally, then it
will work for this specific case also.

-- 
 Simon Riggs   www.2ndQuadrant.com
 PostgreSQL Training, Services and Support


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_dump restore time and Foreign Keys

2008-06-05 Thread Heikki Linnakangas

Simon Riggs wrote:

On Thu, 2008-06-05 at 16:01 +0300, Heikki Linnakangas wrote:
Well, one idea would be to allow adding multiple foreign keys in one 
command, and checking them all at once with one SQL query instead of one 
per foreign key. Right now we need one seq scan over the table per 
foreign key, by checking all references at once we would only need one 
seq scan to check them all.


No need. Just parallelise the restore with concurrent psql. Which would
speed up the index creation also.


True, you could do that.


 Does Greg have plans for further work?


I believe he's busy with other stuff at the moment.

Thinking about this idea a bit more, instead of loading the whole target 
table into memory, it would probably make more sense to keep a hash 
table as just a cache of the most recent keys that have been referenced.


If you can think of a way of improving hash joins generally, then it
will work for this specific case also.


Individual RI checks performed on inserts/COPY don't do a hash join. The 
bulk check done by ALTER TABLE ADD FOREIGN KEY does, but that's 
different issue.


This hash table would be a specific trick to speed up RI checks. If 
you're anyway I/O bound, it wouldn't help, and you'd already be better 
off creating the foreign key first and loading the data after that.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers