Re: [HACKERS] pg_dump in 7.4

2002-11-18 Thread Rod Taylor
On Sat, 2002-11-16 at 15:49, Alvaro Herrera wrote:
 On Fri, Nov 15, 2002 at 11:43:47AM -0500, Rod Taylor wrote:
 
  Below is a summary of what pg_depend tracks that might be useful. 
  Skipped a number of dependencies that are internal only (ie. toast table
  dependencies) as they will be regenerated correctly if their 'owners'
  are generated correctly.
  
  
  Tables depend on:
  - Namespace
  - Parent tables (if inheritance)
 
 And columns?

I only see table dependencies.

tablecmds.c line 943

  Indexes depend on:
  - Constraint (where unique / primary key constraint)
  - Index procedure
  - Index operator
  - Attributes of indexed relation
 
 On function if functional maybe? (Is this procedure?) 

Yes, forgot that marker beside 'Index Procedure'.

-- 
Rod Taylor [EMAIL PROTECTED]

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster



Re: [HACKERS] pg_dump in 7.4

2002-11-16 Thread Philip Warner
At 11:43 AM 15/11/2002 -0500, Rod Taylor wrote:

Below is a summary of what pg_depend tracks that might be useful.


This looks excellent. If people are happy with my earlier outline, it 
should be reasonably easy to proceed...




Philip Warner| __---_
Albatross Consulting Pty. Ltd.   |/   -  \
(A.B.N. 75 008 659 498)  |  /(@)   __---_
Tel: (+61) 0500 83 82 81 | _  \
Fax: (+61) 03 5330 3172  | ___ |
Http://www.rhyme.com.au  |/   \|
 |----
PGP key available upon request,  |  /
and from pgp5.ai.mit.edu:11371   |/


---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [HACKERS] pg_dump in 7.4

2002-11-15 Thread Rod Taylor
On Wed, 2002-11-13 at 23:37, Philip Warner wrote:
 At 02:53 PM 13/11/2002 -0500, Rod Taylor wrote:
 I can make a complete list tonight of whats captured.
 
 Sounds good

Below is a summary of what pg_depend tracks that might be useful. 
Skipped a number of dependencies that are internal only (ie. toast table
dependencies) as they will be regenerated correctly if their 'owners'
are generated correctly.


Expression Dependencies include:
- Operators
- Functions
- Relations (and columns)
- Aggregates


Attributes (Columns) depend on:
- Type of attribute

Tables depend on:
- Namespace
- Parent tables (if inheritance)

Default expressions depend on:
- Table
- Expression Dependencies

Indexes depend on:
- Constraint (where unique / primary key constraint)
- Index procedure
- Index operator
- Attributes of indexed relation

Aggregates depend on:
- Transformation function
- Final function (if required)

Foreign Keys depend on:
- Foreign key'd relation and its attributes
- Constrained relation and its attributes
- Unique Index on the foreign key'd relation

Check Constraints depend on:
- Expression Dependencies - includes parent relation
- Domain type (if check constraint on domain -- v7.4)

Operators depend on:
- Namespace
- Left operator type
- Right operator type
- Result operator type
- Code function
- Rest function
- Join function

Functions depend on:
- Namespace
- Language
- Return type
- Argument types (all)

Types (domains included) depend on:
- Namespace
- Input type
- Output type
- Element type (if array)
- Base type (if domain)
- Default value - Expression Dependencies

Casts depend on:
- Source type
- Target type
- Cast function

Operator Classes depend on:
- Namespaces
- Input type
- Key data type (if different than input type)
- Dependencies on operators in the class
- Dependencies on procedures in the class

Languages depend on:
- Call function
- Validation function

Triggers depend on:
- Trigger function
- Relation trigger is on
- Constrained relation (if constraint trigger)

Rules depend on:
- Relation rule is on
- Qualifying condition - Expression Dependencies
- resulting Query Tree - Expression Dependencies

Missing:
- Body of all functions

-- 
Rod Taylor [EMAIL PROTECTED]

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster



Re: [HACKERS] pg_dump in 7.4

2002-11-14 Thread Tom Lane
Philip Warner [EMAIL PROTECTED] writes:
 At 12:39 AM 14/11/2002 -0500, Neil Conway wrote:
 Perhaps when the function is defined, we run all the SQL queries in
 the function body through the parser/analyzer/rewriter, and then
 generate dependencies on the Query trees we get back?

 Won't work for functions that build dynamic queries.

That's irrelevant for SQL functions, though, and pg_dump does not need
to worry about dependencies inside other types of functions (since only
SQL functions' bodies are examined during CREATE FUNCTION).

Neil's solution is exactly what I was thinking we should do.  Or almost
exactly, anyway: we don't want to run the rewriter before extracting
dependencies.

regards, tom lane

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your
message can get through to the mailing list cleanly



Re: [HACKERS] pg_dump in 7.4

2002-11-13 Thread Philip Warner
At 01:50 PM 13/11/2002 +0800, Christopher Kings-Lynne wrote:

Well, the problem is that you can add a new type and then add a column to a
really old table that uses that type - that causes pain\


You may have misunderstood; I meant to add each type used by the table to 
the deps list for a table (and each function used by a view etc etc). 
Current implementation leaves the deps list blank for tables, and only 
lists the tables for a view (I think).

The problem with the current deps list is that (a) it assumes that OID 
order is important and (b) it does not do any analysis of the topology of 
the dependencies.

The latter will be substantially improved if we can get pg_depend deps into 
the dump file, and if we can do a useful analysis of the dependencies.



Philip Warner| __---_
Albatross Consulting Pty. Ltd.   |/   -  \
(A.B.N. 75 008 659 498)  |  /(@)   __---_
Tel: (+61) 0500 83 82 81 | _  \
Fax: (+61) 03 5330 3172  | ___ |
Http://www.rhyme.com.au  |/   \|
 |----
PGP key available upon request,  |  /
and from pgp5.ai.mit.edu:11371   |/


---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your
message can get through to the mailing list cleanly


Re: [HACKERS] pg_dump in 7.4

2002-11-13 Thread Philip Warner
At 08:52 AM 13/11/2002 -0500, Rod Taylor wrote:

The biggest trick will be trying to re-combine the ALTER ... ADD
CONSTRAINT and ALTER ... SET DEFAULT statements back into CREATE TABLE


I'm not sure this would be worth the effort - I'll grant it would be cute, 
but  getting pg_dump to understand SQL seems a little ambitious. We'd 
probably end up defining a portable schema definition language just for 
dump files.

To achieve Tom's suggestion it might be simpler to store two versions - the 
'full' version, and the 'fully deconstructed' version. If our analysis of 
the dependencies  meant we needed to break up an object, then we use the 
latter.




Philip Warner| __---_
Albatross Consulting Pty. Ltd.   |/   -  \
(A.B.N. 75 008 659 498)  |  /(@)   __---_
Tel: (+61) 0500 83 82 81 | _  \
Fax: (+61) 03 5330 3172  | ___ |
Http://www.rhyme.com.au  |/   \|
 |----
PGP key available upon request,  |  /
and from pgp5.ai.mit.edu:11371   |/


---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
   (send unregister YourEmailAddressHere to [EMAIL PROTECTED])


Re: [HACKERS] pg_dump in 7.4

2002-11-13 Thread Rod Taylor
On Wed, 2002-11-13 at 09:08, Philip Warner wrote:
 At 08:52 AM 13/11/2002 -0500, Rod Taylor wrote:
 The biggest trick will be trying to re-combine the ALTER ... ADD
 CONSTRAINT and ALTER ... SET DEFAULT statements back into CREATE TABLE
 
 I'm not sure this would be worth the effort - I'll grant it would be cute, 
 but  getting pg_dump to understand SQL seems a little ambitious. We'd 
 probably end up defining a portable schema definition language just for 
 dump files.

 To achieve Tom's suggestion it might be simpler to store two versions - the 
 'full' version, and the 'fully deconstructed' version. If our analysis of 
 the dependencies  meant we needed to break up an object, then we use the 
 latter.

Different approaches to the same result.  To me, the dependency tree is
already (mostly) broken up to start with.  So at some point you need to
teach something to re-combine in the pg_attrdef - pg_class dependencies
among others. But the opposite approach of starting with the large
objects and breaking up where required would be just as good, especially
if you only breakup the little bits that are required.

Starting with all functions broken up into their two parts (definition
and body) with the body dependent on the definition and recombining
later if they sort side by side seems much easier than trying to resolve
cycles and break it up at a later time.

An ALAP scheduling algorithm will almost always sort these things to be
side by side to allow combining on a second pass by something with the
intelligence.

-- 
  Rod Taylor


---(end of broadcast)---
TIP 6: Have you searched our list archives?

http://archives.postgresql.org



Re: [HACKERS] pg_dump in 7.4

2002-11-13 Thread Tom Lane
Philip Warner [EMAIL PROTECTED] writes:
 The suggestion of breaking items into create/alter etc is interesting - I 
 assume you are thinking of function bodies? Or is there something else?

Let's see --- foreign-key constraints are an obvious source of possible
circularities, but I see pg_dump already dumps those as separate
objects.  I recall thinking that column default and constraint clauses
might need to be broken out too, but I'm not sure why I thought that
(maybe because they can call SQL functions?).  Anything else?

A simple-minded approach would be to *always* add these things via
ALTER commands at the end, but to keep dumps legible it would be
nicer to keep them in the original table definition whenever
possible.

regards, tom lane

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]



Re: [HACKERS] pg_dump in 7.4

2002-11-13 Thread Philip Warner
At 09:29 AM 13/11/2002 -0500, Rod Taylor wrote:

An ALAP scheduling algorithm will almost always sort these things to be
side by side to allow combining on a second pass by something with the
intelligence.


Do we have a list of dependency data that we collect? eg. do we know about 
functions used in views and indexes? At this stage it's probably worth 
making a list of what we think is achievable in 7.4, and what we want to 
achieve ultimately. I certainly agree about recombining function headers 
and bodies, but there are a bunch of things that I think we can leave 
deconstructed and always move to the end of the restore, eg:

- constraints
- sequences set (not really a dependency problem)
- indexes
- comments

AFAIR, we can only split function bodies for non-SQL functions.

For views we need to know the types of casts within the view, the types 
returned by the view, the functions used by the view as well as the tables 
referenced. AFAIR, there is no way to break up a view, so it has to go 
after each ancestor.

For a table, it should be sufficient to know the constraints  types; we 
can add constraints later, but I'd be reluctant to get into doing 'ALTER 
TABLE ADD COLUMN...'.

Indexes may have a function and/or a cast? Create Index I on T( cast(id as 
my_type) )?

I'd guess constraints can depend on multiple tables, views(?), types,  
functions. Not sure what else. We can't really break these down.

So it looks like the only contentious item might be table attrs? is that right?






Philip Warner| __---_
Albatross Consulting Pty. Ltd.   |/   -  \
(A.B.N. 75 008 659 498)  |  /(@)   __---_
Tel: (+61) 0500 83 82 81 | _  \
Fax: (+61) 03 5330 3172  | ___ |
Http://www.rhyme.com.au  |/   \|
 |----
PGP key available upon request,  |  /
and from pgp5.ai.mit.edu:11371   |/


---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
   (send unregister YourEmailAddressHere to [EMAIL PROTECTED])


Re: [HACKERS] pg_dump in 7.4

2002-11-13 Thread Rod Taylor
 Do we have a list of dependency data that we collect? eg. do we know about 
 functions used in views and indexes? At this stage it's probably worth 

 - constraints
 - sequences set (not really a dependency problem)
 - indexes
 - comments

I can make a complete list tonight of whats captured.   Shall we tack
the list onto the end of section 3.13 (pg_depend):

http://developer.postgresql.org/docs/postgres/catalog-pg-depend.html

 For a table, it should be sufficient to know the constraints  types; we 
 can add constraints later, but I'd be reluctant to get into doing 'ALTER 
 TABLE ADD COLUMN...'.

Shouldn't ever need to do an ALTER TABLE ADD COLUMN.  But I can
certainly come up with a case for ALTER TABLE SET DEFAULT.

 Indexes may have a function and/or a cast? Create Index I on T( cast(id as 
 my_type) )?
 
 I'd guess constraints can depend on multiple tables, views(?), types,  
 functions. Not sure what else. We can't really break these down.

They can via functions.  And you can break down a function and table,
but not really types or views.


CREATE FUNCTION func  'SELECT TRUE;' LANGUAGE 'sql';

CREATE items requiring function;

-- Fill in function body.
CREATE OR REPLACE FUNCTION func ... 'real query' LANGUAGE 'sql';

 So it looks like the only contentious item might be table attrs? is that right?

More likely to be functions.  As everything else (I can think of) is
easily altered into place.

-- 
  Rod Taylor


---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])



Re: [HACKERS] pg_dump in 7.4

2002-11-13 Thread Philip Warner
At 02:53 PM 13/11/2002 -0500, Rod Taylor wrote:

I can make a complete list tonight of whats captured.


Sounds good.

If you can also indicate which parts of functions are captured - arguments, 
return type and body? IIRC, only SQL functions are compiled at define-time, 
so other functions *shouldn't* be a major problem if they are out of order 
in *most* cases.

Since pg_dump already has some intrinsic knowledge of dependencies, if 
there is much missing from pg_depend, we could probably add it to pg_dump 
(it many cases it will just be a matter of getting oids as well as names in 
SQL statements).

In terms of supporting older dump files, I think it should all work: they 
already have space for deps, mostly empty, and as Tom suggested, we should 
be able to dump leaf nodes in OID order. From the PoV of pg_dump, the 
algorthim becomes:

1. generate in-memory TOC in a convenient format.
2. pick the lowest OID *leaf* node. If none, goto 5.
3. remove it from deps of other TOC entries
4. goto 2.

[cyclic]
5. pick the lowest OID node. See if we can break it. If not repeat with 
more nodes until we can break one. If none, then goto step 8.
6. Break up the node.
7. Goto 2.

[cyclic, no resolution]
8. Pick the lowest OID node.
9. Goto 3.


There are a few issues here:

(a) we need to know dependencies of *parts* of objects in order to do the 
breakup. To me this implies we should break them up at dump time (ie. have 
FUNCTION_DEFINITION and FUNCTION_BODY TOC entries; it gets nastier with 
tables, but TABLE_DEFINITION, TABLE_CONSTRAINT, ATTRIBUTE_CONSTRAINT, and 
ATTRIBUTE_DEFAULT come to mind. So time later we can write cunning code to 
recombine them (volunteers?).

(b) Step 6 may be pointless. We probably need to verify that we will end up 
with a leaf node as a result of the breakup: I presume someone has a good 
algorithm. But if we do have a cycle, I'd guess we should just revert to an 
OID ordering, since while we may pick the topologically optimal node, it 
may well not be optimal from PostgreSQL point of view: if the node fails to 
be defined, then everything else that depends on it will fail. This has the 
advantage of being simple.





Philip Warner| __---_
Albatross Consulting Pty. Ltd.   |/   -  \
(A.B.N. 75 008 659 498)  |  /(@)   __---_
Tel: (+61) 0500 83 82 81 | _  \
Fax: (+61) 03 5330 3172  | ___ |
Http://www.rhyme.com.au  |/   \|
 |----
PGP key available upon request,  |  /
and from pgp5.ai.mit.edu:11371   |/


---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your
message can get through to the mailing list cleanly


Re: [HACKERS] pg_dump in 7.4

2002-11-13 Thread Neil Conway
Tom Lane [EMAIL PROTECTED] writes:
 * We don't store dependencies for SQL functions to things mentioned in
 the SQL function body.  (Maybe we should, but we don't.)  So there's
 data missing in that case, and possibly other cases.

This might be interesting to do, and we could tie it into the need to
invalidate PL/PgSQL functions that depend on a database object when
the object is changed.

Perhaps when the function is defined, we run all the SQL queries in
the function body through the parser/analyzer/rewriter, and then
generate dependencies on the Query trees we get back?

In any case, there would be a limit to what we could divine from the
function definition (e.g. we'd get practically no info about a
function defined in C) -- but this might make things a little nicer,
anyway.

Cheers,

Neil

-- 
Neil Conway [EMAIL PROTECTED] || PGP Key ID: DB3C29FC


---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/users-lounge/docs/faq.html



Re: [HACKERS] pg_dump in 7.4

2002-11-13 Thread Philip Warner
At 12:39 AM 14/11/2002 -0500, Neil Conway wrote:

Perhaps when the function is defined, we run all the SQL queries in
the function body through the parser/analyzer/rewriter, and then
generate dependencies on the Query trees we get back?


Won't work for functions that build dynamic queries. But it might be 
interesting/worthwhile to allow user-specified dependencies; that way if a 
user has problems with database dumps etc, they could manually add 
dependencies for C functions, dynamic query functions (where possible) etc. 
It's probably more trouble than it's worth, but worth considering.






Philip Warner| __---_
Albatross Consulting Pty. Ltd.   |/   -  \
(A.B.N. 75 008 659 498)  |  /()   __---_
Tel: (+61) 0500 83 82 81 | _  \
Fax: (+61) 03 5330 3172  | ___ |
Http://www.rhyme.com.au  |/   \|
 |----
PGP key available upon request,  |  /
and from pgp5.ai.mit.edu:11371   |/


---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [HACKERS] pg_dump in 7.4

2002-11-13 Thread Neil Conway
Philip Warner [EMAIL PROTECTED] writes:
 Won't work for functions that build dynamic queries.

Granted, but the the intent is to

(a) solve some, but not necessarily all, of the dump-order
problems

(b) drop functions that depend on a database object when the
database object itself is dropped -- i.e. you don't *want*
to worry about dynamically built queries, as they are fine
already

Cheers,

Neil

-- 
Neil Conway [EMAIL PROTECTED] || PGP Key ID: DB3C29FC


---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]



[HACKERS] pg_dump in 7.4

2002-11-12 Thread Christopher Kings-Lynne
Hi,

Has anyone given much thought to improving pg_dump's object order algorithm
for 7.4?  It seems that now we have dependencies, it should just be a matter
of doing a breadth-first or depth-first search over the pg_depend table to
generate a valid order of oids.

To allow for mess-ups in that table, the next step would be to add to the
end of the list of oids any objects that for whatever reason aren't in the
dependency system.  (Is this possible?  Manual hacking can do it
methinks...)

Does this sound like an idea?

I've just become rather frustrated trying to do a test reload of our 7.2.3
dump into 7.3b5.  The problem is all the tsearch types are declared after
the tables that actually use them!

Chris


---(end of broadcast)---
TIP 6: Have you searched our list archives?

http://archives.postgresql.org



Re: [HACKERS] pg_dump in 7.4

2002-11-12 Thread Philip Warner
At 01:33 PM 13/11/2002 +0800, Christopher Kings-Lynne wrote:

Does this sound like an idea?


It does, but in keeping with allowing pg_restore to be quite flexible, I'd 
like to see the dependency data stored in the dump file, then processed at 
restore-time.


I've just become rather frustrated trying to do a test reload of our 7.2.3
dump into 7.3b5.  The problem is all the tsearch types are declared after
the tables that actually use them!


pg_dump already has rudimentary dependency tracking (one level deep); each 
item can have a list of oid's it depends on. You *could* patch it to add 
the types to the table dependencies.

In the future I'd imagine we'll just dump the OIDs of all first level 
dependencies for each object, then at restore-time, process them in 
whatever order the user requests (defaulting to dependency-order).




Philip Warner| __---_
Albatross Consulting Pty. Ltd.   |/   -  \
(A.B.N. 75 008 659 498)  |  /()   __---_
Tel: (+61) 0500 83 82 81 | _  \
Fax: (+61) 03 5330 3172  | ___ |
Http://www.rhyme.com.au  |/   \|
 |----
PGP key available upon request,  |  /
and from pgp5.ai.mit.edu:11371   |/


---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [HACKERS] pg_dump in 7.4

2002-11-12 Thread Christopher Kings-Lynne
 pg_dump already has rudimentary dependency tracking (one level
 deep); each
 item can have a list of oid's it depends on. You *could* patch it to add
 the types to the table dependencies.

 In the future I'd imagine we'll just dump the OIDs of all first level
 dependencies for each object, then at restore-time, process them in
 whatever order the user requests (defaulting to dependency-order).

Well, the problem is that you can add a new type and then add a column to a
really old table that uses that type - that causes pain.  Lots of other
people have also reported the view dumped before table it is based on
problem.

Chris


---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/users-lounge/docs/faq.html