Re: [HACKERS] RFC: Restructuring pg_aggregate

2002-04-17 Thread Bruce Momjian


Thread added to TODO.detail/drop.

---

Christopher Kings-Lynne wrote:
  Actually, what we need to do to reclaim space is to enable table
  recreation without the column, now that we have relfilenode for file
  renaming.  It isn't hard to do, but no one has focused on it.  I want to
  focus on it, but have not had the time, obviously, and would be very
  excited to assist someone else.
 
  Hiroshi's fine idea of marking certain columns as unused would not have
  reclaimed the missing space, just as my idea of physical/logical column
  distinction would not reclaim the space either.  Again, my
  physical/logical idea is more for fixing other problems and
  optimization, not DROP COLUMN.
 
 Hmmm.  Personally, I think that a DROP COLUMN that cannot reclaim space is
 kinda useless - you may as well just use a view!!!
 
 So how would this occur?:
 
 1. Lock target table for writing (allow reads)
 2. Begin a table scan on target table, writing
a new file with a particular filenode
 3. Delete the attribute row from pg_attribute
 4. Point the table in the catalog to the new filenode
 5. Release locks
 6. Commit transaction
 7. Delete orhpan filenode
 
 i. Upon postmaster startup, remove any orphaned filenodes
 
 The real problem here is the fact that there are now missing attnos in
 pg_attribute.  Either that's handled or we renumber the attnos - which is
 also quite hard?
 
 This, of course, suffers from the double size data problem - but I believe
 that it does not matter - we just need to document it.
 
 Interestingly enough, Oracle support
 
 ALTER TABLE foo SET UNUSED col;
 
 Which invalidates the attribute entry, and:
 
 ALTER TABLE foo DROP col CHECKPOINT 1000;
 
 Which actually reclaims the space.  The optional CHECKPOINT [n] clause
 tells Oracle to do a checkpoint every [n] rows.
 
 Checkpointing cuts down the amount of undo logs accumulated during the
 drop column operation to avoid running out of rollback segment space.
 However, if this statement is interrupted after a checkpoint has been
 applied, the table remains in an unusable state. While the table is
 unusable, the only operations allowed on it are DROP TABLE, TRUNCATE
 TABLE, and ALTER TABLE DROP COLUMNS CONTINUE (described below). 
 
 Chris
 
 
 

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 853-3000
  +  If your life is a hard drive, |  830 Blythe Avenue
  +  Christ can be your backup.|  Drexel Hill, Pennsylvania 19026

---(end of broadcast)---
TIP 6: Have you searched our list archives?

http://archives.postgresql.org



Re: [HACKERS] RFC: Restructuring pg_aggregate

2002-04-14 Thread Hiroshi Inoue
Christopher Kings-Lynne wrote:
 
 Also, it seems to me that at some point we are forced to break client
 compatibility.

It's not a users' consensus at all. I'm suspicious if
DROP COLUMN is such a significant feature to break
client compatibility at our ease.

 Either we add attisdropped field to pg_attribute, or we use
 Hiroshi's (-1 * attnum - offset) idea.  Both Tom and Hiroshi have good
 reasons for each of these - would it be possible for you guys to post with
 your reasons for and against both the techniques. 

I don't object to adding attisdropped field. What
I meant to say is that the differene is very small.

regards,
Hiroshi Inoue

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [HACKERS] RFC: Restructuring pg_aggregate

2002-04-13 Thread Christopher Kings-Lynne

 Updating pg_attribute per se is not so hard --- just store new copies of
 all the rows for the table.  However, propagating the changes into other
 places could be quite painful (I'm thinking of column numbers in stored
 constraints, rules, etc).

 It seems to me that reducing the column to NULLs already gets you the
 majority of the space savings.  I don't think there is a case to be made
 that getting back that last bit is worth the pain involved, either in
 implementation effort or direct runtime costs (do you really want a DROP
 COLUMN to force an immediate rewrite of the whole table?)

OK, sounds fair.  However, is there a more aggressive way of reclaiming the
space?  The problem with updating all the rows to null for that column is
that the on-disk size is doubled anyway, right?  So, could a VACUUM FULL
process do the nulling for us?  Vacuum works outside of normal transaction
constraints anyway...?

Also, it seems to me that at some point we are forced to break client
compatibility.  Either we add attisdropped field to pg_attribute, or we use
Hiroshi's (-1 * attnum - offset) idea.  Both Tom and Hiroshi have good
reasons for each of these - would it be possible for you guys to post with
your reasons for and against both the techniques.  I just want to get to an
implementation specification we all agree on that can be written up and then
the coding can proceed.  Maybe we should add it to the 'Postgres Major
Projects' page - and remove those old ones that have already been
implemented.

Chris



---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster



Re: [HACKERS] RFC: Restructuring pg_aggregate

2002-04-12 Thread Dave Page



 -Original Message-
 From: Tom Lane [mailto:[EMAIL PROTECTED]] 
 Sent: 12 April 2002 03:54
 To: Bruce Momjian
 Cc: Hiroshi Inoue; Christopher Kings-Lynne; 
 [EMAIL PROTECTED]
 Subject: Re: RFC: Restructuring pg_aggregate 
 
 
 Bruce Momjian [EMAIL PROTECTED] writes:
  I think that is why Tom was suggesting making all the column values 
  NULL and removing the pg_attribute row for the column.
 
 That was not my suggestion.
 
  With a NULL value, it
  doesn't take up any room in the tuple, and with the pg_attribute 
  column gone, no one will see that row.  The only problem is 
 the gap in 
  attno numbering.  How big a problem is that?
 
 You can't do it that way unless you're intending to rewrite 
 all rows of the relation before committing the ALTER; which 
 would be the worst of both worlds.  The pg_attribute row 
 *must* be retained to show the datatype of the former column, 
 so that we can correctly skip over it in tuples where the 
 column isn't yet nulled out.
 
 Hiroshi did this by renumbering the attnum; I propose leaving 
 attnum alone and instead adding an attisdropped flag.  That 
 would avoid creating a gap in the column numbers, but either 
 way is likely to affect some applications that inspect pg_attribute.

Applications like pgAdmin that inspect pg_attribute are being seriously
hacked to incorporate schema support anyway for 7.3. Personnally I'd be glad
to spend some time re-coding to allow for this, just to not have to answer
the numerous 'how do I drop a column' emails I get reguarly.

Regards, Dave.

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])



Re: [HACKERS] RFC: Restructuring pg_aggregate

2002-04-11 Thread Christopher Kings-Lynne

 Actually, what we need to do to reclaim space is to enable table
 recreation without the column, now that we have relfilenode for file
 renaming.  It isn't hard to do, but no one has focused on it.  I want to
 focus on it, but have not had the time, obviously, and would be very
 excited to assist someone else.

 Hiroshi's fine idea of marking certain columns as unused would not have
 reclaimed the missing space, just as my idea of physical/logical column
 distinction would not reclaim the space either.  Again, my
 physical/logical idea is more for fixing other problems and
 optimization, not DROP COLUMN.

Hmmm.  Personally, I think that a DROP COLUMN that cannot reclaim space is
kinda useless - you may as well just use a view!!!

So how would this occur?:

1. Lock target table for writing (allow reads)
2. Begin a table scan on target table, writing
   a new file with a particular filenode
3. Delete the attribute row from pg_attribute
4. Point the table in the catalog to the new filenode
5. Release locks
6. Commit transaction
7. Delete orhpan filenode

i. Upon postmaster startup, remove any orphaned filenodes

The real problem here is the fact that there are now missing attnos in
pg_attribute.  Either that's handled or we renumber the attnos - which is
also quite hard?

This, of course, suffers from the double size data problem - but I believe
that it does not matter - we just need to document it.

Interestingly enough, Oracle support

ALTER TABLE foo SET UNUSED col;

Which invalidates the attribute entry, and:

ALTER TABLE foo DROP col CHECKPOINT 1000;

Which actually reclaims the space.  The optional CHECKPOINT [n] clause
tells Oracle to do a checkpoint every [n] rows.

Checkpointing cuts down the amount of undo logs accumulated during the
drop column operation to avoid running out of rollback segment space.
However, if this statement is interrupted after a checkpoint has been
applied, the table remains in an unusable state. While the table is
unusable, the only operations allowed on it are DROP TABLE, TRUNCATE
TABLE, and ALTER TABLE DROP COLUMNS CONTINUE (described below). 

Chris



---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])



Re: [HACKERS] RFC: Restructuring pg_aggregate

2002-04-11 Thread Tom Lane

Peter Eisentraut [EMAIL PROTECTED] writes:
 Tom Lane writes:
 Why shouldn't aggregate functions have entries in pg_proc?  Then one
 search would cover both possibilities, and we could eliminate some
 duplicate code in the parser.

 Furthermore, we could make the new function privileges apply to aggregates
 as well.

GRANT/REVOKE FUNCTION will now work on aggregate functions too (is there
any value in making a variant syntax for aggregates?).  However, I
didn't implement enforcement of the EXECUTE privilege yet.  I was
slightly bemused to notice that your implementation of it for regular
functions tests the privilege at plan startup but doesn't actually throw
the error until the function is called.  What's the point of that?
Seems like we might as well throw the error in init_fcache and not
bother with storing a boolean.

regards, tom lane

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your
message can get through to the mailing list cleanly



Re: [HACKERS] RFC: Restructuring pg_aggregate

2002-04-11 Thread Peter Eisentraut

Tom Lane writes:

 I was slightly bemused to notice that your implementation of it for
 regular functions tests the privilege at plan startup but doesn't
 actually throw the error until the function is called.  What's the
 point of that? Seems like we might as well throw the error in
 init_fcache and not bother with storing a boolean.

Yeah, it's a bit funny.  I wanted to keep the fcache code from doing
anything not to do with caching, and I wanted to keep the permission check
in the executor, like it is for tables.

There were a couple of cases, which I have not fully explored yet, for
which this seemed like a good idea, such as some functions being in the
plan but not being executed, or the permission check being avoided for
some functions (e.g., cast functions).

-- 
Peter Eisentraut   [EMAIL PROTECTED]


---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/users-lounge/docs/faq.html



Re: [HACKERS] RFC: Restructuring pg_aggregate

2002-04-11 Thread Hiroshi Inoue
Bruce Momjian wrote:
 
 Hiroshi Inoue wrote:
  Christopher Kings-Lynne wrote:
  
 I think that is why Tom was suggesting making all the column values NULL
 and removing the pg_attribute row for the column.  With a NULL value, it
 doesn't take up any room in the tuple, and with the pg_attribute column
 gone, no one will see that row.  The only problem is the gap in attno
 numbering.  How big a problem is that?

There's no problem with applications which don't inquire
of system catalogs(pg_attribute). Unfortunately we have 
been very tolerant of users' access on system tables and
there would be pretty many applications which inquire of
pg_attribute.

regards,
Hiroshi Inoue

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/users-lounge/docs/faq.html


Re: [HACKERS] RFC: Restructuring pg_aggregate

2002-04-11 Thread Tom Lane

Christopher Kings-Lynne [EMAIL PROTECTED] writes:
 The real problem here is the fact that there are now missing attnos in
 pg_attribute.  Either that's handled or we renumber the attnos - which is
 also quite hard?

Updating pg_attribute per se is not so hard --- just store new copies of
all the rows for the table.  However, propagating the changes into other
places could be quite painful (I'm thinking of column numbers in stored
constraints, rules, etc).

It seems to me that reducing the column to NULLs already gets you the
majority of the space savings.  I don't think there is a case to be made
that getting back that last bit is worth the pain involved, either in
implementation effort or direct runtime costs (do you really want a DROP
COLUMN to force an immediate rewrite of the whole table?)

regards, tom lane

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster



Re: [HACKERS] RFC: Restructuring pg_aggregate

2002-04-11 Thread Bruce Momjian

Christopher Kings-Lynne wrote:
  Actually, what we need to do to reclaim space is to enable table
  recreation without the column, now that we have relfilenode for file
  renaming.  It isn't hard to do, but no one has focused on it.  I want to
  focus on it, but have not had the time, obviously, and would be very
  excited to assist someone else.
 
  Hiroshi's fine idea of marking certain columns as unused would not have
  reclaimed the missing space, just as my idea of physical/logical column
  distinction would not reclaim the space either.  Again, my
  physical/logical idea is more for fixing other problems and
  optimization, not DROP COLUMN.
 
 Hmmm.  Personally, I think that a DROP COLUMN that cannot reclaim space is
 kinda useless - you may as well just use a view!!!

Yep, kind of a problem.  It is a tradeoff between double diskspace/speed
and removing column from disk.  I guess that's why Oracle has both.

 
 So how would this occur?:
 
 1. Lock target table for writing (allow reads)
 2. Begin a table scan on target table, writing
a new file with a particular filenode
 3. Delete the attribute row from pg_attribute
 4. Point the table in the catalog to the new filenode
 5. Release locks
 6. Commit transaction
 7. Delete orhpan filenode

Yep, something like that.  CLUSTER is a good start.  DROP COLUMN just
deals with the attno too.  You would have to renumber them to fill the
gap.

 i. Upon postmaster startup, remove any orphaned filenodes

Actually, we don't have a good solution for finding orphaned filenodes
right now.  I do have some code that tries to do this as part of VACUUM
but it was not 100% perfect, so it was rejected.  I am willing to open
the discussion to see if a perfect solution can be found.


-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 853-3000
  +  If your life is a hard drive, |  830 Blythe Avenue
  +  Christ can be your backup.|  Drexel Hill, Pennsylvania 19026

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])



Re: [HACKERS] RFC: Restructuring pg_aggregate

2002-04-11 Thread Bruce Momjian

Tom Lane wrote:
 Christopher Kings-Lynne [EMAIL PROTECTED] writes:
  The real problem here is the fact that there are now missing attnos in
  pg_attribute.  Either that's handled or we renumber the attnos - which is
  also quite hard?
 
 Updating pg_attribute per se is not so hard --- just store new copies of
 all the rows for the table.  However, propagating the changes into other
 places could be quite painful (I'm thinking of column numbers in stored
 constraints, rules, etc).
 
 It seems to me that reducing the column to NULLs already gets you the
 majority of the space savings.  I don't think there is a case to be made
 that getting back that last bit is worth the pain involved, either in
 implementation effort or direct runtime costs (do you really want a DROP
 COLUMN to force an immediate rewrite of the whole table?)

That is an excellent point about having to fix all the places that refer
to attno.  In fact, we have been moving away from attname references to
attno references for a while, so it only gets worse.  Tom is also
correct that setting it to NULL removes the problem of disk space usage
quite easily.

That only leaves the problem of having gaps in the pg_attribute for that
relation, and as I remember, that was the problem for Hiroshi's DROP
COLUMN change, but at this point, after years of delay with no great
solution on the horizon, we may as well just get this working.

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 853-3000
  +  If your life is a hard drive, |  830 Blythe Avenue
  +  Christ can be your backup.|  Drexel Hill, Pennsylvania 19026

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your
message can get through to the mailing list cleanly



Re: [HACKERS] RFC: Restructuring pg_aggregate

2002-04-11 Thread Hiroshi Inoue
Christopher Kings-Lynne wrote:
 
 Hmmm.  Personally, I think that a DROP COLUMN that cannot reclaim space is
 kinda useless - you may as well just use a view!!!
 
 So how would this occur?:
 
 1. Lock target table for writing (allow reads)
 2. Begin a table scan on target table, writing
a new file with a particular filenode
 3. Delete the attribute row from pg_attribute
 4. Point the table in the catalog to the new filenode
 5. Release locks
 6. Commit transaction
 7. Delete orhpan filenode
 
 i. Upon postmaster startup, remove any orphaned filenodes
 
 The real problem here is the fact that there are now missing attnos in
 pg_attribute.  Either that's handled or we renumber the attnos - which is
 also quite hard?

The attnos should be renumbered and it's easy.
But the above seems only 20% of the total implementation.
If the attnos are renumbered, all objects which refer to 
the numbers must be invalidated or re-compiled ...
For example the parameters of foreign key constraints
triggers are consist of relname and colnames currently.
There has been a proposal that change to use relid or
column numbers instead. Certainly it makes RENAME happy
but DROP COLUMN unhappy. If there's a foreign key a_rel/1/3
and the second column of the relation is dropped the
parameter must be changed to be a_rel/1/2. If neither
foreign key stuff nor DROP COLUMN take the other into
account, the consistency is easily broken. 

regards,
Hiroshi Inoue

---(end of broadcast)---
TIP 6: Have you searched our list archives?

http://archives.postgresql.org


Re: [HACKERS] RFC: Restructuring pg_aggregate

2002-04-11 Thread Bruce Momjian

Hiroshi Inoue wrote:
 Christopher Kings-Lynne wrote:
  
  Hmmm.  Personally, I think that a DROP COLUMN that cannot reclaim space is
  kinda useless - you may as well just use a view!!!
  
  So how would this occur?:
  
  1. Lock target table for writing (allow reads)
  2. Begin a table scan on target table, writing
 a new file with a particular filenode
  3. Delete the attribute row from pg_attribute
  4. Point the table in the catalog to the new filenode
  5. Release locks
  6. Commit transaction
  7. Delete orhpan filenode
  
  i. Upon postmaster startup, remove any orphaned filenodes
  
  The real problem here is the fact that there are now missing attnos in
  pg_attribute.  Either that's handled or we renumber the attnos - which is
  also quite hard?
 
 The attnos should be renumbered and it's easy.
 But the above seems only 20% of the total implementation.
 If the attnos are renumbered, all objects which refer to 
 the numbers must be invalidated or re-compiled ...
 For example the parameters of foreign key constraints
 triggers are consist of relname and colnames currently.
 There has been a proposal that change to use relid or
 column numbers instead. Certainly it makes RENAME happy
 but DROP COLUMN unhappy. If there's a foreign key a_rel/1/3
 and the second column of the relation is dropped the
 parameter must be changed to be a_rel/1/2. If neither
 foreign key stuff nor DROP COLUMN take the other into
 account, the consistency is easily broken. 

I think that is why Tom was suggesting making all the column values NULL
and removing the pg_attribute row for the column.  With a NULL value, it
doesn't take up any room in the tuple, and with the pg_attribute column
gone, no one will see that row.  The only problem is the gap in attno
numbering.  How big a problem is that?

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 853-3000
  +  If your life is a hard drive, |  830 Blythe Avenue
  +  Christ can be your backup.|  Drexel Hill, Pennsylvania 19026

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])



Re: [HACKERS] RFC: Restructuring pg_aggregate

2002-04-11 Thread Tom Lane

Bruce Momjian [EMAIL PROTECTED] writes:
 I think that is why Tom was suggesting making all the column values NULL
 and removing the pg_attribute row for the column.

That was not my suggestion.

 With a NULL value, it
 doesn't take up any room in the tuple, and with the pg_attribute column
 gone, no one will see that row.  The only problem is the gap in attno
 numbering.  How big a problem is that?

You can't do it that way unless you're intending to rewrite all rows of
the relation before committing the ALTER; which would be the worst of
both worlds.  The pg_attribute row *must* be retained to show the
datatype of the former column, so that we can correctly skip over it
in tuples where the column isn't yet nulled out.

Hiroshi did this by renumbering the attnum; I propose leaving attnum
alone and instead adding an attisdropped flag.  That would avoid
creating a gap in the column numbers, but either way is likely to affect
some applications that inspect pg_attribute.

regards, tom lane

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]



Re: [HACKERS] RFC: Restructuring pg_aggregate

2002-04-10 Thread Hiroshi Inoue
Tom Lane wrote:
 
 Hiroshi Inoue [EMAIL PROTECTED] writes:
  Tom Lane wrote:
  That means that
  a lot of low-level places *do* need to know about the dropped-column
  convention, else they can't make any sense of tuple layouts.
 
  Why ? As you already mentioned, there were not that many places
  to be changed.
 
 There are not many places to change if the implementation uses
 attisdropped, because we *only* have to hide the existence of the column
 at the parser level.  The guts of the system don't know anything funny
 is going on; a dropped column looks the same as an undropped one
 throughout the executor.  But with negative attnums, even such basic
 routines as heap_formtuple have to know about it, no?

When a tuple descriptor is made, the info of
dropped columns are placed at (their physical
position - 1) index in the same way as ordinary
columns. There are few places where conversions
between negative attnums and the physical positions
are needed.

The following is my posting more than 2 years ago.
What's changed since then.

regards,
Hiroshi Inoue

  I don't want a final implementation this time.
  What I want is to provide a quick hack for both others and me
  to judge whether this direction is good or not.

  My idea is essentially an invisible column implementation.
  DROP COLUMN would change the target pg_attribute tuple
  as follows..

attnum  -  an offset - attnum;
atttypid -  0

  We would be able to see where to change by tracking error/
  crashes caused by this change.

  I would also change attname to '*already dropped %d' for
  examle to avoid duplicate attname.

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [HACKERS] RFC: Restructuring pg_aggregate

2002-04-10 Thread Hiroshi Inoue
Bruce Momjian wrote:
 
 Hiroshi Inoue wrote:
  Tom Lane wrote:
 
   Hiroshi's "DROP_COLUMN_HACK" was essentially along this line, but
   I think he made a representational mistake by trying to change the
   attnums of dropped columns to be negative values.
 
  Negative attnums had 2 advantages then. It had a big
  advantage that initdb isn't needed. Note that it was
  only a trial hack and there was no consensus on the way.
  It was very easy to change the implementation to use
  attisdropped. OTOH physical/logical attnums approach
  needed the change on pg_class, pg_attribute and so
  I've never had a chance to open the patch to public.
  It was also more sensitive about oversights of needed
  changes than the attisdropped flag approach.
 
   That means that
   a lot of low-level places *do* need to know about the dropped-column
   convention, else they can't make any sense of tuple layouts.
 
  Why ? As you already mentioned, there were not that many places
  to be changed.
 
  Well what's changed since then ?
 
 Here is an old email from me that outlines the idea of having a
 physical/logical attribute numbering system, and the advantages. 

I already tried physical/logical attribute implementation 
pretty long ago. Where are new ideas to solve the problems
that the approach has ?

regards,
Hiroshi Inoue

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [HACKERS] RFC: Restructuring pg_aggregate

2002-04-10 Thread Bruce Momjian

Hiroshi Inoue wrote:
   Why ? As you already mentioned, there were not that many places
   to be changed.
  
   Well what's changed since then ?
  
  Here is an old email from me that outlines the idea of having a
  physical/logical attribute numbering system, and the advantages. 
 
 I already tried physical/logical attribute implementation 
 pretty long ago. Where are new ideas to solve the problems
 that the approach has ?

Good question.  I am suggesting more than just the drop column fix.  It
could be used for smaller data files to reduce padding, fix for
inheritance problems with ADD COLUMN, and performance of moving
varlena's to the end of the row.

Also, my idea was to have the physical/logical mapping happen closer to
the client, so the backend mostly only deals with physical.  I was
thinking of having the libpq backend communication layer actually do the
reordering of the return results.

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 853-3000
  +  If your life is a hard drive, |  830 Blythe Avenue
  +  Christ can be your backup.|  Drexel Hill, Pennsylvania 19026

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/users-lounge/docs/faq.html



Re: [HACKERS] RFC: Restructuring pg_aggregate

2002-04-10 Thread Hiroshi Inoue
Bruce Momjian wrote:
 
 Hiroshi Inoue wrote:
Why ? As you already mentioned, there were not that many places
to be changed.
   
Well what's changed since then ?
  
   Here is an old email from me that outlines the idea of having a
   physical/logical attribute numbering system, and the advantages.
 
  I already tried physical/logical attribute implementation
  pretty long ago. Where are new ideas to solve the problems
  that the approach has ?
 
 Good question.  I am suggesting more than just the drop column fix.  It
 could be used for smaller data files to reduce padding, fix for
 inheritance problems with ADD COLUMN, and performance of moving
 varlena's to the end of the row.
 
 Also, my idea was to have the physical/logical mapping happen closer to
 the client, so the backend mostly only deals with physical.

If the client has to bear the some part, isn't the invisible
column approach much simpler ?

I've put a pretty much time into DROP COLUMN feature but
I am really disappointed to see the comments in this thread.
What DROP COLUMN has brought me seems only a waste of time.

Possibly I must have introduced either implementation forcibly.

regards,
Hiroshi Inoue

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])


Re: [HACKERS] RFC: Restructuring pg_aggregate

2002-04-10 Thread Bruce Momjian

Hiroshi Inoue wrote:
 If the client has to bear the some part, isn't the invisible
 column approach much simpler ?
 
 I've put a pretty much time into DROP COLUMN feature but
 I am really disappointed to see the comments in this thread.
 What DROP COLUMN has brought me seems only a waste of time.
 
 Possibly I must have introduced either implementation forcibly.

I understand.  I personally think maybe we have been a little to picky
about patches being accepted.  Sometimes when something is not 100%
perfect, we do nothing rather than accept the patch, and replace or
improve it later.  The DROP COLUMN approach you had clearly is one of
them.

Personally, now that we have relfilenode, I think we should implement
drop of columns by just recreating the table without the column.

The big problem with DROP COLUMN was that we couldn't decide on what to
do, so we did nothing, which is probably worse than just choosing one
and doing it.

Our code is littered with my 80% solutions for LIKE optimization,
optimizer statistics, BETWEEN, and lots of other solutions that have met
a need and are now being replaced with better code.  My code was not
great, but I hadn't dont them, PostgreSQL would have had even more
missing features than we do now.  DROP COLUMN is clearly one where we
missed getting something that works and would keep people happy.

As far as my proposal, my idea was not to do it in the client, but
rather to do it just before the data is sent from/to the client.  Maybe
that is a stupid idea.  I never really researched it.  My idea was more
to make the physical/logical column numbers distinct so certain tricks
could be performed.  It wasn't for DROP COLUMN specifically, and in fact
to do DROP COLUMN with my code, there would have to be more code similar
to what you had where clients would see a column and have to skip it.  I
was focusing more on physical/logical to enable other features.

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 853-3000
  +  If your life is a hard drive, |  830 Blythe Avenue
  +  Christ can be your backup.|  Drexel Hill, Pennsylvania 19026

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])



Re: [HACKERS] RFC: Restructuring pg_aggregate

2002-04-10 Thread Bruce Momjian

Christopher Kings-Lynne wrote:
  If the client has to bear the some part, isn't the invisible
  column approach much simpler ?
 
  I've put a pretty much time into DROP COLUMN feature but
  I am really disappointed to see the comments in this thread.
  What DROP COLUMN has brought me seems only a waste of time.
 
 I kind of agree with Hiroshi here.  All I want to be able to do is drop
 columns from my tables, and reclaim the space.  I've got all sorts of
 production tables with columns just sitting there doing nothing, awaiting
 the time that I can happily drop them.  It seems to me that whatever we do
 will require some kind of client breakage.

Actually, what we need to do to reclaim space is to enable table
recreation without the column, now that we have relfilenode for file
renaming.  It isn't hard to do, but no one has focused on it.  I want to
focus on it, but have not had the time, obviously, and would be very
excited to assist someone else.

Hiroshi's fine idea of marking certain columns as unused would not have
reclaimed the missing space, just as my idea of physical/logical column
distinction would not reclaim the space either.  Again, my
physical/logical idea is more for fixing other problems and
optimization, not DROP COLUMN.

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 853-3000
  +  If your life is a hard drive, |  830 Blythe Avenue
  +  Christ can be your backup.|  Drexel Hill, Pennsylvania 19026

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your
message can get through to the mailing list cleanly



Re: [HACKERS] RFC: Restructuring pg_aggregate

2002-04-10 Thread Bruce Momjian

Christopher Kings-Lynne wrote:
  Actually, what we need to do to reclaim space is to enable table
  recreation without the column, now that we have relfilenode for file
  renaming.  It isn't hard to do, but no one has focused on it.  I want to
  focus on it, but have not had the time, obviously, and would be very
  excited to assist someone else.
 
 I'm happy to help - depends if it's within my skill level or not tho.  Most
 of the time the problem I have is finding where to make the changes, not
 actually making the changes themselves.  So, count me in.

OK, let me mention that I have had great success with chat sessions with
PostgreSQL developers.  They can code and ask questions and I can answer
quickly.  Seems to be speeding things along for some people.  I am:

AIM bmomjian
ICQ 151255111
Yahoo   bmomjian
MSN [EMAIL PROTECTED]

I am also on the PostgreSQL IRC channel.  As far as where to start, I
think the CLUSTER command would be a good start because it just reorders
the existing table.  Then DROP COLUMN can come out of that by removing
the column during the copy, and removing mention of the column from
pg_attribute, and of course renumbering the gap.

  Hiroshi's fine idea of marking certain columns as unused would not have
  reclaimed the missing space, just as my idea of physical/logical column
  distinction would not reclaim the space either.  Again, my
  physical/logical idea is more for fixing other problems and
  optimization, not DROP COLUMN.
 
 Question: Is it _possible_ to reclaim the space during a VACUUM FULL?  I do
 not know enough about the file format to know this.  What happens if the
 VACUUM is stopped halfway thru reclaiming a column in a table?

Not really.  I moves only whole tuples, and only certain ones.

 Bruce: WRT modifying libpq to do the translation - won't this cause probs
 for JDBC and ODBC people?

No, not in libpq, but rather in backend/libpq, the backend part of the
connection.  My idea is for the user to think things are in a different
order in the row than they actually appear on disk.  I haven't really
researched it enough to understand its validity.

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 853-3000
  +  If your life is a hard drive, |  830 Blythe Avenue
  +  Christ can be your backup.|  Drexel Hill, Pennsylvania 19026

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster



Re: [HACKERS] RFC: Restructuring pg_aggregate

2002-04-10 Thread Hiroshi Inoue
Christopher Kings-Lynne wrote:
 
  If the client has to bear the some part, isn't the invisible
  column approach much simpler ?
 
  I've put a pretty much time into DROP COLUMN feature but
  I am really disappointed to see the comments in this thread.
  What DROP COLUMN has brought me seems only a waste of time.
 
 I kind of agree with Hiroshi here.  All I want to be able to do is drop
 columns from my tables, and reclaim the space.  I've got all sorts of
 production tables with columns just sitting there doing nothing, awaiting
 the time that I can happily drop them.

 It seems to me that whatever we do
 will require some kind of client breakage.

Physical/logical attnum approach was mainly to not break
clients. I implemented it on trial but the implementation
was hard to maintain unfortunately. It's pretty difficult
to decide whether the number is physical or logical in
many cases.

regards,
Hiroshi Inoue

---(end of broadcast)---
TIP 6: Have you searched our list archives?

http://archives.postgresql.org


Re: [HACKERS] RFC: Restructuring pg_aggregate

2002-04-10 Thread Bruce Momjian

Hiroshi Inoue wrote:
  It seems to me that whatever we do
  will require some kind of client breakage.
 
 Physical/logical attnum approach was mainly to not break
 clients. I implemented it on trial but the implementation
 was hard to maintain unfortunately. It's pretty difficult
 to decide whether the number is physical or logical in
 many cases.

How many cases do we have that use logical numering?  Hiroshi, I know
you are the expert on this.  I know 'SELECT *' uses it, but are their
other places that need to know about the logical ordering of the
columns?

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 853-3000
  +  If your life is a hard drive, |  830 Blythe Avenue
  +  Christ can be your backup.|  Drexel Hill, Pennsylvania 19026

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your
message can get through to the mailing list cleanly



Re: [HACKERS] RFC: Restructuring pg_aggregate

2002-04-10 Thread Hiroshi Inoue
Bruce Momjian wrote:
 
 Hiroshi Inoue wrote:
  If the client has to bear the some part, isn't the invisible
  column approach much simpler ?
 
  I've put a pretty much time into DROP COLUMN feature but
  I am really disappointed to see the comments in this thread.
  What DROP COLUMN has brought me seems only a waste of time.
 
  Possibly I must have introduced either implementation forcibly.
 
 I understand.  I personally think maybe we have been a little to picky
 about patches being accepted.  Sometimes when something is not 100%
 perfect, we do nothing rather than accept the patch, and replace or
 improve it later.  The DROP COLUMN approach you had clearly is one of
 them.

I don't complain about the rejection of my patch.
If it has an essential flaw we had better reject it.
What I'm complaining is why it is OK now whereas
there's nothing new.

regards,
Hiroshi Inoue

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [HACKERS] RFC: Restructuring pg_aggregate

2002-04-10 Thread Bruce Momjian

Hiroshi Inoue wrote:
 Bruce Momjian wrote:
  
  Hiroshi Inoue wrote:
   If the client has to bear the some part, isn't the invisible
   column approach much simpler ?
  
   I've put a pretty much time into DROP COLUMN feature but
   I am really disappointed to see the comments in this thread.
   What DROP COLUMN has brought me seems only a waste of time.
  
   Possibly I must have introduced either implementation forcibly.
  
  I understand.  I personally think maybe we have been a little to picky
  about patches being accepted.  Sometimes when something is not 100%
  perfect, we do nothing rather than accept the patch, and replace or
  improve it later.  The DROP COLUMN approach you had clearly is one of
  them.
 
 I don't complain about the rejection of my patch.
 If it has an essential flaw we had better reject it.
 What I'm complaining is why it is OK now whereas
 there's nothing new.

Sure, I understand.

My physical/logical idea may have the same problems as your DROP COLUMN
idea, and may be as rapidly rejected.  I am just throwing it out for
discussion.

I am not sure I like it.  :-)

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 853-3000
  +  If your life is a hard drive, |  830 Blythe Avenue
  +  Christ can be your backup.|  Drexel Hill, Pennsylvania 19026

---(end of broadcast)---
TIP 6: Have you searched our list archives?

http://archives.postgresql.org



Re: [HACKERS] RFC: Restructuring pg_aggregate

2002-04-10 Thread Tom Lane

Hiroshi Inoue [EMAIL PROTECTED] writes:
 Tom Lane wrote:
 That means that
 a lot of low-level places *do* need to know about the dropped-column
 convention, else they can't make any sense of tuple layouts.

 Why ? As you already mentioned, there were not that many places
 to be changed.

There are not many places to change if the implementation uses
attisdropped, because we *only* have to hide the existence of the column
at the parser level.  The guts of the system don't know anything funny
is going on; a dropped column looks the same as an undropped one
throughout the executor.  But with negative attnums, even such basic
routines as heap_formtuple have to know about it, no?

regards, tom lane

---(end of broadcast)---
TIP 6: Have you searched our list archives?

http://archives.postgresql.org



Re: [HACKERS] RFC: Restructuring pg_aggregate

2002-04-07 Thread Christopher Kings-Lynne

 That seems like pure speculation to me, if not outright wrong.  If we
 can't renumber the attnums it'll be because the table's tuples still
 have data at a particular column position.  Since we'll need to know
 the datatype of that data (if only to skip over it correctly), there
 will still have to be a pg_attribute entry for the dropped column.
 Thus, what people will more likely have to watch out for is pg_attribute
 rows marked deleted in some fashion.

You know there is a way to do this and not break client compatibility.
Rename the current pg_attribute relation to pg_baseatt or something.  Fix
all references to it in the code.  Create a system view called pg_attribute
which is SELECT * (except attisdropped) FROM pg_baseattr WHERE NOT
attisdropped.

More work though, of course.

 We are actually not that far away from being able to do DROP COLUMN,
 if people don't mind being slow to recover the space used by a dropped
 column.  It'd work like this:

Logical vs. physical column numbers would still be quite handy tho.  If
you're going to break compatibility, may as well do all breaks at once?

 1. Add an attisdropped boolean to pg_attribute.

 2. DROP COLUMN sets this flag and changes attname to something like
 ***deleted_NNN.  (Changing attname is only necessary to allow the
 same column name to be reused without drawing a unique-index error.)
 That's it --- it's done.

 3. Column lookup, expansion of *, etc have to be taught to ignore
 columns marked attisdropped.

 The idea is that the extant data sits there but is invisible.  Inserts
 of new rows in the table would always insert a NULL in the dropped
 column (which'd fall out more or less for free, there being no way
 to tell the system to insert anything else).  Over time, UPDATEs of
 extant rows would also replace the dropped data with NULLs.

Would it be possible to modify VACUUM FULL in some way so as to permanently
remove these tuples?  Surely people would like an actual space-saving column
drop?

Chris


---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your
message can get through to the mailing list cleanly



Re: [HACKERS] RFC: Restructuring pg_aggregate

2002-04-07 Thread Tom Lane

Christopher Kings-Lynne [EMAIL PROTECTED] writes:
 You know there is a way to do this and not break client compatibility.
 Rename the current pg_attribute relation to pg_baseatt or something.  Fix
 all references to it in the code.  Create a system view called pg_attribute
 which is SELECT * (except attisdropped) FROM pg_baseattr WHERE NOT
 attisdropped.

Wasn't your original concern that the attnums wouldn't be consecutive?
How is this view going to hide that?

 Logical vs. physical column numbers would still be quite handy tho.

But confusing as all hell, at *all* levels of the code ... I've thought
about that quite a bit, and I can't see that we could expect to make it
work without a lot of hard-to-find bugs.  Too many places where it's
not instantly obvious which set of numbers you should be using.

regards, tom lane

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your
message can get through to the mailing list cleanly



Re: [HACKERS] RFC: Restructuring pg_aggregate

2002-04-06 Thread Christopher Kings-Lynne

 A more serious objection is that this will break client applications
 that know about the pg_aggregate table.  However, 7.3 is already going
 to force a lot of reprogramming of clients that inspect system tables,
 because of the addition of namespaces.  Restructuring pg_aggregate
 doesn't seem like it makes life all that much worse.

How about putting a note in the 7.3 release that tells them not to rely on
sequential attnums in tn pg_attribute anymore.  That should make it easier
to implement column dropping in the future?

Chris



---(end of broadcast)---
TIP 6: Have you searched our list archives?

http://archives.postgresql.org



Re: [HACKERS] RFC: Restructuring pg_aggregate

2002-04-06 Thread Tom Lane

Christopher Kings-Lynne [EMAIL PROTECTED] writes:
 How about putting a note in the 7.3 release that tells them not to rely on
 sequential attnums in tn pg_attribute anymore.  That should make it easier
 to implement column dropping in the future?

That seems like pure speculation to me, if not outright wrong.  If we
can't renumber the attnums it'll be because the table's tuples still
have data at a particular column position.  Since we'll need to know
the datatype of that data (if only to skip over it correctly), there
will still have to be a pg_attribute entry for the dropped column.
Thus, what people will more likely have to watch out for is pg_attribute
rows marked deleted in some fashion.

We are actually not that far away from being able to do DROP COLUMN,
if people don't mind being slow to recover the space used by a dropped
column.  It'd work like this:

1. Add an attisdropped boolean to pg_attribute.

2. DROP COLUMN sets this flag and changes attname to something like
***deleted_NNN.  (Changing attname is only necessary to allow the
same column name to be reused without drawing a unique-index error.)
That's it --- it's done.

3. Column lookup, expansion of *, etc have to be taught to ignore
columns marked attisdropped.

The idea is that the extant data sits there but is invisible.  Inserts
of new rows in the table would always insert a NULL in the dropped
column (which'd fall out more or less for free, there being no way
to tell the system to insert anything else).  Over time, UPDATEs of
extant rows would also replace the dropped data with NULLs.

I suspect there are only about half a dozen key places that would have
to explicitly check attisdropped.  None of the low-level executor
machinery would care at all, since it's dealing with real tuples where
the attribute is still there, at least as a NULL.

Hiroshi's DROP_COLUMN_HACK was essentially along this line, but
I think he made a representational mistake by trying to change the
attnums of dropped columns to be negative values.  That means that
a lot of low-level places *do* need to know about the dropped-column
convention, else they can't make any sense of tuple layouts.  The
negative-attnum idea might have been a little easier for clients
inspecting pg_attribute to cope with, but in practice I think they'd
need to be taught about dropped columns anyway --- as evidenced by
your remark suggesting that gaps in the sequence of positive attnums
would break clients.

regards, tom lane

PS: Once you have that, try this on for size: ALTER COLUMN is

ALTER DROP COLUMN;
ALTER ADD COLUMN newtype;
UPDATE foo SET newcol = coercion_fn(oldcol);

That last couldn't be expressed as an SQL statement because the parser
wouldn't allow access to oldcol, but there's nothing stopping it at the
implementation level.

This approach changes the user-visible column ordering, which'd be
a tad annoying, so probably something based on building a new version of
the table would be better.  But as a quick hack this would be doable.

Actually, given the DROP feature a user could do it for himself:

ALTER ADD COLUMN tempcol newtype;
UPDATE foo SET tempcol = coercion_fn(oldcol);
ALTER DROP COLUMN oldcol;
ALTER RENAME COLUMN tempcol to oldcol;

which seems like an okay approach, especially since it'd allow the
UPDATE computing the new column values to be of arbitrary complexity,
not just a simple coercion of one existing column.

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster



Re: [HACKERS] RFC: Restructuring pg_aggregate

2002-04-06 Thread Peter Eisentraut

Tom Lane writes:

 Why shouldn't aggregate functions have entries in pg_proc?  Then one
 search would cover both possibilities, and we could eliminate some
 duplicate code in the parser.

Furthermore, we could make the new function privileges apply to aggregates
as well.

-- 
Peter Eisentraut   [EMAIL PROTECTED]


---(end of broadcast)---
TIP 6: Have you searched our list archives?

http://archives.postgresql.org



Re: [HACKERS] RFC: Restructuring pg_aggregate

2002-04-06 Thread Tom Lane

Peter Eisentraut [EMAIL PROTECTED] writes:
 Tom Lane writes:
 Why shouldn't aggregate functions have entries in pg_proc?

 Furthermore, we could make the new function privileges apply to aggregates
 as well.

Good point.  Another thing that would fall out for free is that the
aggregate type-coercion rules would become exactly like the function
type-coercion rules; right now they are a tad stupider.

regards, tom lane

---(end of broadcast)---
TIP 6: Have you searched our list archives?

http://archives.postgresql.org